Take a name like "Hugo Boss", whose variants in the data include "Hugo Bos", "Huggo Boss", and "Hugo Boss Ltd". All of these get the same Soundex (phonetic algorithm) value except the last one with "Ltd", at least when the code is not truncated to the usual four characters.
You could match on the Soundex values of the business names. This should work for "Hugo Boss", "Hugo Bos", and "Huggo Boss", but "Hugo Boss Ltd" may not match the others because of the "Ltd" at the end. This technique has worked well for fuzzy matching where I work, and the results have been useful when comparing first and last names to establish identity.
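As a rough illustration, here is a minimal American Soundex sketch. The function `soundex` and its `length` parameter are my own, not the API of any particular package, and it assumes that spaces and punctuation are simply dropped, so a multi-word name is encoded as one string. With the standard four-character code, all four "Hugo Boss" variants collapse to H212; the trailing "Ltd" only produces a different key (H21243) when the code is left untruncated (`length=None`).

```python
import string
from typing import Optional

# American Soundex digit for each consonant; vowels, Y, H and W have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str, length: Optional[int] = 4) -> str:
    """Return the Soundex code of `name`; length=None keeps the full code."""
    # Assumption: spaces and punctuation are dropped, not treated as word breaks.
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")  # a repeat of the first letter's digit is skipped
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so the same digit may repeat after it
    key = letters[0] + "".join(digits)
    if length is None:
        return key
    return (key + "0" * length)[:length]  # pad with zeros, then truncate


for variant in ("Hugo Boss", "Hugo Bos", "Huggo Boss", "Hugo Boss Ltd"):
    print(variant, soundex(variant), soundex(variant, length=None))
```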
Keep in mind, though, that Soundex won't work for things like social security numbers: it only encodes letters, so its domain is narrower than that of a distance measure such as edit distance, which works on any string.
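For identifiers such as social security numbers, an edit distance is the better tool. This minimal Levenshtein sketch (the helper name `levenshtein` is hypothetical, and the numbers below are made-up examples) counts the insertions, deletions, and substitutions needed to turn one string into another, and it works on digits as well as letters:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


# Made-up SSNs: two transposed digits are two substitutions apart.
print(levenshtein("123-45-6789", "123-45-6798"))
print(levenshtein("Hugo Boss", "Huggo Boss"))
```

A threshold on this distance (say, at most 1 or 2 edits) is a judgment call that depends on how noisy your data is.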
You could probably also strip off words like "Ltd", "LLC", and "Corp" that are common to the business names in your data set. This would help a Soundex matching framework because it removes trailing words that would otherwise change the key.
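A minimal suffix-stripping sketch, assuming a hypothetical `LEGAL_SUFFIXES` list that you would tune to your own data. It removes trailing legal-form words repeatedly, so "Acme Corp, Inc." becomes "Acme", while a name that merely ends in the same letters (such as "Costco") is left alone because the suffix must be a separate word:

```python
import re

# Hypothetical list of legal-form words; tune it to what is common in your data.
LEGAL_SUFFIXES = ("ltd", "llc", "corp", "inc", "co")

# One suffix at the end of the name, preceded by spaces and/or a comma,
# optionally followed by a period, e.g. " Ltd" or ", Inc.".
_SUFFIX_RE = re.compile(
    r"[\s,]+(?:" + "|".join(LEGAL_SUFFIXES) + r")\.?$", re.IGNORECASE)


def strip_legal_suffixes(name: str) -> str:
    """Remove trailing legal-form words until none are left."""
    name = name.strip()
    while True:
        stripped = _SUFFIX_RE.sub("", name)
        if stripped == name:
            return name
        name = stripped


for n in ("Hugo Boss Ltd", "Hugo Boss Ltd.", "Acme Corp, Inc.", "Costco"):
    print(repr(n), "->", repr(strip_legal_suffixes(n)))
```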
In addition, you could compare letter n-grams, as Thomas recommended in his record linkage answer; stripping those suffixes would also reduce the number of n-grams to test.
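One way to compare letter n-grams is the Jaccard similarity of their character bigram sets. This is a sketch of that general idea, not necessarily the exact method from the answer mentioned above; `char_ngrams` and `jaccard` are hypothetical helpers, and padding with one space is an assumption that lets the first and last letters form n-grams of their own. Note how the trailing "Ltd" drags the similarity down (10 shared bigrams out of 14), which is why stripping it helps:

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams of `text`, lowercased and padded with one
    space on each side so the first and last letters form their own n-grams."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Share of n-grams the two strings have in common (1.0 = identical sets)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


print(jaccard("Hugo Boss", "Hugo Bos"))
print(jaccard("Hugo Boss", "Hugo Boss Ltd"))
```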
Here is the NYSIIS algorithm, as described in New York State Identification and Intelligence System:
1. Translate first characters of name: MAC → MCC, KN → N, K → C, PH, PF → FF, SCH → SSS
2. Translate last characters of name: EE → Y, IE → Y, DT, RT, RD, NT, ND → D
3. First character of key = first character of name.
4. Translate the remaining characters by the following rules, advancing one character at a time:
   1. EV → AF, else A, E, I, O, U → A
   2. Q → G, Z → S, M → N
   3. KN → N, else K → C
   4. SCH → SSS, PH → FF
   5. H → if previous or next is a non-vowel, previous.
   6. W → if previous is a vowel, A.
   7. Add the current character to the key if it is not the same as the last key character.
5. If last character is S, remove it.
6. If last characters are AY, replace with Y.
7. If last character is A, remove it.
8. Append the translated key (which does not include the first character) to the value from step 3.
9. If longer than 6 characters, truncate to the first 6 characters. (This is only needed for true NYSIIS; some versions use the full key.)
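The nine steps above can be sketched in Python as follows. This is one reading of the rules, not a reference implementation: where they are ambiguous (for example, whether "previous" means the original or the already-translated character, or what to do with an H at the very end) I made assumptions that are noted in the comments, so its keys may differ from those produced by packages such as fuzzy.

```python
from typing import Optional

VOWELS = "AEIOU"


def nysiis(name: str, max_len: Optional[int] = 6) -> str:
    """Sketch of NYSIIS following the nine listed steps.
    max_len=None returns the full (untruncated) key."""
    s = "".join(c for c in name.upper() if "A" <= c <= "Z")  # letters only (assumption)
    if not s:
        return ""
    # Step 1: translate the first characters (KN is checked before K).
    for old, new in (("MAC", "MCC"), ("KN", "N"), ("K", "C"),
                     ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS")):
        if s.startswith(old):
            s = new + s[len(old):]
            break
    # Step 2: translate the last characters.
    for old, new in (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"),
                     ("RD", "D"), ("NT", "D"), ("ND", "D")):
        if s.endswith(old):
            s = s[:-len(old)] + new
            break
    # Step 3: the key starts with the first character of the name.
    chars = list(s)
    key = [chars[0]]
    # Step 4: scan the rest one position at a time; translations are written back
    # into `chars`, so "previous" means the already-translated character (assumption).
    for i in range(1, len(chars)):
        c, prev = chars[i], chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if c == "E" and nxt == "V":                          # 4.1 EV -> AF
            chars[i:i + 2] = ["A", "F"]
        elif c in VOWELS:                                    # 4.1 other vowels -> A
            chars[i] = "A"
        elif c in "QZM":                                     # 4.2 Q -> G, Z -> S, M -> N
            chars[i] = {"Q": "G", "Z": "S", "M": "N"}[c]
        elif c == "K":                                       # 4.3 KN -> N, else K -> C
            chars[i] = "N" if nxt == "N" else "C"
        elif c == "S" and chars[i + 1:i + 3] == ["C", "H"]:  # 4.4 SCH -> SSS
            chars[i:i + 3] = ["S", "S", "S"]
        elif c == "P" and nxt == "H":                        # 4.4 PH -> FF
            chars[i:i + 2] = ["F", "F"]
        elif c == "H" and (prev not in VOWELS or (nxt and nxt not in VOWELS)):
            chars[i] = prev                                  # 4.5 (a final H checks only prev)
        elif c == "W" and prev in VOWELS:                    # 4.6 W after a vowel -> A
            chars[i] = "A"
        if chars[i] != key[-1]:                              # 4.7 skip repeats
            key.append(chars[i])
    result = "".join(key)
    if len(result) > 1 and result.endswith("S"):             # Step 5
        result = result[:-1]
    if result.endswith("AY"):                                # Step 6
        result = result[:-2] + "Y"
    if len(result) > 1 and result.endswith("A"):             # Step 7
        result = result[:-1]
    # Step 8 is implicit: `key` already began with the first character.
    return result if max_len is None else result[:max_len]  # Step 9 (optional)


for n in ("Catherine", "Katherine", "Katarina", "Johnathan", "Jonathan", "John"):
    print(n, nysiis(n))
```

Under these assumptions, "Hugo Boss", "Hugo Bos", and "Huggo Boss" all map to HAGAB, while "Hugo Boss Ltd" maps to HAGABA, so the "Ltd" does split the group here.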
Soundex packages are available in many high-level programming languages. In Python you can try the fuzzy package, which also implements NYSIIS:
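    import fuzzy  # third-party package: pip install fuzzy

    names = ['Catherine', 'Katherine', 'Katarina',
             'Johnathan', 'Jonathan', 'John']

    for n in names:
        # left-align the name in 10 columns, then print its NYSIIS key
        print('%-10s' % n, fuzzy.nysiis(n))

Saved as show_nysiis.py, the script is run with:

    $ python show_nysiis.py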
The example above can be found here:
You can build keys from, and match on, either the n-grams or the full names.
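Keying on n-grams is a common way to avoid comparing every pair of names (often called blocking). The sketch below builds an inverted index from character bigrams to names and keeps only the pairs that share at least `min_shared` bigrams; the helper names and the threshold of 3 are illustrative assumptions. To key on full names instead, you could group on a phonetic code of the whole name rather than on its n-grams.

```python
from collections import defaultdict
from itertools import combinations


def char_ngrams(text: str, n: int = 2) -> set:
    """Character n-grams of `text`, lowercased and padded with one space."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def candidate_pairs(names, n=2, min_shared=3):
    """Pairs of names sharing at least `min_shared` character n-grams."""
    index = defaultdict(set)            # n-gram -> indices of names containing it
    for idx, name in enumerate(names):
        for gram in char_ngrams(name, n):
            index[gram].add(idx)
    shared = defaultdict(int)           # (i, j) -> number of n-grams in common
    for ids in index.values():
        for i, j in combinations(sorted(ids), 2):
            shared[(i, j)] += 1
    return sorted((names[i], names[j])
                  for (i, j), count in shared.items() if count >= min_shared)


print(candidate_pairs(["Hugo Boss", "Huggo Boss", "Gucci", "Prada"]))
```

Only the candidate pairs then need a more expensive comparison such as Soundex, NYSIIS, or an edit distance.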
Finally, you can normalize the name field by replacing each matched name with the mode (most frequent spelling) within its group in the data, or with some other method.
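A minimal sketch of mode-based normalization: group the names by a matching key and replace each one with the most frequent spelling in its group. The `key` argument is an assumption; in practice it would be a phonetic code such as Soundex or NYSIIS, but the usage below passes a hypothetical lowercase, letters-only key (`letters_only`) to keep the example self-contained.

```python
from collections import Counter, defaultdict


def normalize_by_mode(names, key):
    """Replace each name with the most frequent spelling among names with the same key."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    # most_common(1) breaks ties in favour of the spelling seen first.
    canonical = {k: Counter(v).most_common(1)[0][0] for k, v in groups.items()}
    return [canonical[key(name)] for name in names]


def letters_only(name):
    """Hypothetical stand-in key: lowercase letters only."""
    return "".join(ch for ch in name.lower() if ch.isalpha())


names = ["Hugo Boss", "HUGO BOSS", "Hugo Boss", "hugo  boss", "Gucci"]
print(normalize_by_mode(names, letters_only))
```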