Levenshtein distance. Max distance exception


Keywords:c# 


Question: 

I have this levenstein algorithm:

public static int? GetLevenshteinDistance(string input, string output, int maxDistance)
        {
            var stringOne = String.Empty;
            var stringTwo = String.Empty;

            if (input.Length >= output.Length)
            {
                stringOne = input;
                stringTwo = output;
            }
            else
            {
                stringOne = output;
                stringTwo = input;
            }

            var stringOneLength = stringOne.Length;
            var stringTwoLength = stringTwo.Length;

            var matrix = new int[stringOneLength + 1, stringTwoLength + 1];

            for (var i = 0; i <= stringOneLength; matrix[i, 0] = i++) { }
            for (var j = 0; j <= stringTwoLength; matrix[0, j] = j++) { }

            for (var i = 1; i <= stringOneLength; i++)
            {
                bool isBreak = true;

                for (var j = 1; j <= stringTwoLength; j++)
                {
                    var cost = (stringTwo[j - 1] == stringOne[i - 1]) ? 0 : 1;

                    matrix[i, j] = Math.Min(
                        Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1),
                        matrix[i - 1, j - 1] + cost);

                    if (matrix[i, j] < maxDistance)
                    {
                        isBreak = false;
                    }
                }

                if (isBreak)
                {
                    return null;
                }
            }

            return matrix[stringOneLength, stringTwoLength];
        }

I checked each values and if it > max distance I break for. But it does not always work correctly.

For example:

string1 = "#rewRPAF"
string2 = "#rewQVRZP"
maxDistance = 4

I get value 5 but don't null.

This solution i get this - Levenstein distance limit


2 Answers: 

We don't fix code on here, but I will help you fix it yourself.

Change this

            if (matrix[i, j] < maxDistance)
            {
                isBreak = false;
            }

to

            if (matrix[i, j] < maxDistance)
            {
                isBreak = false;
            } else {
                System.Diagnostics.Debugger.Break();
            }

that should break the debugger when you get to maxDistance, when that happens step forward in the debugger and follow what your program does. That should allow you to see what is happening that you don't want.



Look at what happens the first time around the inner loop. At this point the cost can not exceed one. Thus IsBreak is always being set to false if MaxDistance is greater than 1.

My gut says:

scrap everything to do with IsBreak

int Distance = matrix[stringOneLength, stringTwoLength];
return Distance > MaxDistance ? null : Distance;

but I haven't tried it.

Alternately (I haven't done enough with Levenshtein to be confident of this approach):

scrap everything to do with IsBreak

if (matrix[i, j] < maxDistance)
    {
        isBreak = false;
    }

becomes

if (matrix[i, j] > maxDistance)
    {
        return null;
    }

(Note that your termination test had an off-by-one.)