NLTK - Error getting classifier accuracy


Keywords: python


Question: 

Fairly new to Python/NLTK so forgive me if this is a basic question.

The classifier appears to be working fine, but when I try to retrieve the accuracy via nltk.classify.accuracy, I get a ValueError.

Is this related to the training set being contained within [({xxx})] while the test set is contained within [xxx]?

The error states:

results = classifier.classify_many([fs for (fs, l) in gold])
ValueError: too many values to unpack (expected 2)

The code:

import nltk
from nltk.tokenize import word_tokenize

train = [('train', 'train'),
('next train in', 'train'),
('When is the next train', 'train'),
('How long until the next train', 'train'),
("Where is the next train", 'train'),
('dart', 'train'),
('next dart in', 'train'),
('When is the next dart', 'train'),
('How long until the next dart', 'train'),
("Where is the next dart", 'train'),
("Show me where", 'map'),
("Directions to", 'map'),
('map', 'map')]


all_words = set(word.lower() for passage in train for word in word_tokenize(passage[0]))
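# Lowercase each passage before tokenizing so it matches the lowercased vocabulary in all_words
t = [({word: (word in word_tokenize(x[0].lower())) for word in all_words}, x[1]) for x in train]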
classifier = nltk.NaiveBayesClassifier.train(t)
classifier.show_most_informative_features()


test_sentence = 'Whatever my message is, hopefully something about trains'

test_sent_features = {word.lower(): (word in word_tokenize(test_sentence.lower())) for word in all_words}

print(classifier.classify(test_sent_features))
print(nltk.classify.accuracy(classifier, test_sent_features))

I'm sure there's something simple I'm overlooking, but I can't seem to spot it. I would appreciate any input on this, thanks.


2 Answers: 

Use the enumerate function on your for loop.
for index, item in enumerate(yourlist):



Yeah, you're doing it wrong. Think about it: How would the classifier module be able to calculate the accuracy, unless you give it the answers?

The accuracy() function must be called with a list of labeled data, i.e. (featureset, label) pairs where the "label" is the desired classification, in the same format you pass to train(). It needs a whole list of them (not just one sentence's feature dict), so that it can tell you what fraction of the answers it computed are correct.
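
To make that concrete, here is a minimal sketch of a labeled test set passed to accuracy(). Judging by the traceback line in the question, accuracy() unpacks each element of its second argument as (fs, l). Given a bare feature dict, it iterates over the dict's string keys instead, and unpacking a key such as 'train' into two names raises the ValueError. The test sentences, their labels, and the tokenize() and features() helpers below are made up for illustration. Plain str.split() stands in for word_tokenize only so the sketch runs without downloading tokenizer data.

```python
import nltk

# Training data in the same (text, label) shape as the question (shortened).
train = [
    ('next train in', 'train'),
    ('When is the next train', 'train'),
    ('Where is the next dart', 'train'),
    ('Show me where', 'map'),
    ('Directions to', 'map'),
    ('map', 'map'),
]

# Hypothetical labeled test set: each sentence is paired with the answer we expect.
test = [
    ('When is the next dart', 'train'),
    ('Show me the map', 'map'),
]

def tokenize(text):
    # Assumption: lowercase whitespace splitting as a stand-in for word_tokenize.
    return text.lower().split()

# Vocabulary built from the training sentences only.
all_words = {word for text, _ in train for word in tokenize(text)}

def features(text):
    # One boolean feature per vocabulary word: is the word present in the text?
    tokens = set(tokenize(text))
    return {word: (word in tokens) for word in all_words}

# Both sets are lists of (featureset, label) pairs.
train_set = [(features(text), label) for text, label in train]
test_set = [(features(text), label) for text, label in test]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# accuracy() returns the fraction of test items whose predicted label matches.
acc = nltk.classify.accuracy(classifier, test_set)
print(acc)
```

classify() still takes a single feature dict, so print(classifier.classify(features('some new sentence'))) works as before. Only accuracy() needs the labeled list.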