Python 2.7 - getting the score from difflib.get_close_matches


Keywords: python 2.7


Question: 

I am trying to get the score of the best match using difflib.get_close_matches:

import difflib

best_match = difflib.get_close_matches(my_str, str_list, 1)[0]

I know about the 'cutoff' parameter, but I couldn't find a way to get the actual score of the match once the threshold is set. Am I missing something? Is there a better way to match unicode strings?


1 Answer: 

difflib.get_close_matches is probably the simplest way to fuzzy-match strings with the standard library, although there are more advanced third-party libraries such as fuzzywuzzy, which you mentioned in the comments.

get_close_matches returns only the matching strings, not their scores. If you want to stay with difflib, you can compute the score yourself with difflib.SequenceMatcher:

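import difflib

my_str = 'apple'
str_list = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']

# get_close_matches returns a list; [0] raises IndexError if nothing reaches the cutoff (default 0.6)
best_match = difflib.get_close_matches(my_str, str_list, 1)[0]
# Recompute the similarity ratio (between 0.0 and 1.0) for the chosen match
score = difflib.SequenceMatcher(None, my_str, best_match).ratio()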

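In this example, the best match for 'apple' in the list is 'ape', and the score is 0.75. ratio() returns 2.0*M/T, where M is the number of matching characters and T is the combined length of both strings; here that is 2*3/(5+3) = 0.75.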
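
If both the match and its score are needed from one call, the two steps above can be folded into a small helper. The following is a minimal sketch, not part of difflib: the name best_match_with_score is hypothetical, and its default cutoff of 0.6 simply mirrors the default of get_close_matches. It passes the candidate as the first sequence because, as far as I can tell from the CPython source, that is the order get_close_matches uses internally, and the difflib documentation warns that ratio() may depend on the order of its arguments.

```python
import difflib


def best_match_with_score(query, candidates, cutoff=0.6):
    """Return (match, score) for the best candidate, or None if none reaches cutoff.

    Hypothetical helper for illustration; not part of difflib.
    """
    scored = [
        (difflib.SequenceMatcher(None, candidate, query).ratio(), candidate)
        for candidate in candidates
    ]
    if not scored:
        return None
    # max() on (score, candidate) tuples picks the highest score;
    # equal scores are broken by comparing the candidate strings.
    score, match = max(scored)
    if score < cutoff:
        return None
    return match, score


result = best_match_with_score('apple', ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd'])
print(result)  # ('ape', 0.75)
```

Returning None instead of indexing with [0] avoids the IndexError when nothing in the list is close enough.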

You can also loop through the list and compute all the scores to check:

for word in str_list:
    score = difflib.SequenceMatcher(None, my_str, word).ratio()
    print("score for: {0} vs. {1} = {2}".format(my_str, word, score))

For this example, you get the following:

score for: apple vs. ape = 0.75
score for: apple vs. fjsdf = 0.0
score for: apple vs. aerewtg = 0.333333333333
score for: apple vs. dgyow = 0.0
score for: apple vs. paepd = 0.4
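
When one string is compared against many, the difflib documentation suggests reusing a single SequenceMatcher: it caches information about the second sequence, so the query can be set once with set_seq2() and each candidate swapped in with set_seq1(). The sketch below applies that idea to the loop above and sorts the results from best to worst; rank_candidates is a hypothetical name, not a difflib function.

```python
import difflib


def rank_candidates(query, candidates):
    """Return (candidate, score) pairs sorted from best to worst score.

    Hypothetical helper for illustration; not part of difflib.
    """
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(query)  # cached once and reused for every candidate
    scored = []
    for candidate in candidates:
        matcher.set_seq1(candidate)
        scored.append((candidate, matcher.ratio()))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


words = ['ape', 'fjsdf', 'aerewtg', 'dgyow', 'paepd']
for candidate, score in rank_candidates('apple', words):
    print("{0}: {1}".format(candidate, score))
```

The first pair in the returned list is the best match together with its score, so no second SequenceMatcher call is needed.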

Documentation for difflib can be found here: