Levenshtein edit distance

posted Jan 3, 2011 12:37 PM by Amin Ariana   [ updated Feb 4, 2011 5:21 PM ]
Tags: Difficult, dynamic programming

Problem description

(From Wikipedia) In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The term edit distance is often used to refer specifically to Levenshtein distance.


The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.
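The definition above is commonly restated as a recurrence over string prefixes. The notation here is an assumption for illustration and does not come from the original problem statement: D(i, j) is the distance between the first i characters of a string a and the first j characters of a string b, and [a_i ≠ b_j] is 1 when the two characters differ and 0 otherwise.

```latex
\begin{aligned}
D(i, 0) &= i, \qquad D(0, j) = j, \\
D(i, j) &= \min
\begin{cases}
D(i-1, j) + 1 & \text{(deletion)} \\
D(i, j-1) + 1 & \text{(insertion)} \\
D(i-1, j-1) + [a_i \neq b_j] & \text{(substitution or match)}
\end{cases}
\end{aligned}
```

The distance between the full strings is then D(|a|, |b|).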

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

  1. kitten → sitten (substitution of 'k' with 's')
  2. sitten → sittin (substitution of 'e' with 'i')
  3. sittin → sitting (insertion of 'g' at the end)
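The example can be checked with a short dynamic programming routine. What follows is a minimal sketch in Python, standing in for the pseudo-code the grading criteria allow; the function name levenshtein and the full-table layout are choices made here, not part of the original problem.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b (full-table DP)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the distance between a[:i] and b[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1]


print(levenshtein("kitten", "sitting"))  # 3
```

For strings of lengths m and n, this version fills an (m + 1) by (n + 1) table, so it takes O(mn) time and O(mn) space.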


Given a list of strings representing all the words and phrases in English, and a query string, return the string from the list with the smallest edit distance to the query string.
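One possible baseline for this task is a linear scan over the list. The sketch below is an assumption-laden starting point, not necessarily the most efficient solution the interviewer had in mind: the names closest_match and levenshtein are hypothetical, ties go to the first word found, and an empty list returns None. It keeps only two rows of the table at a time and skips any word whose length difference from the query already rules it out.

```python
from typing import Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string along the row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def closest_match(words: Iterable[str], query: str) -> Optional[str]:
    """Return the word closest to query; the first word found wins ties."""
    best_word, best_dist = None, None
    for word in words:
        # The length difference is a lower bound on the distance,
        # so words that cannot beat the current best are skipped.
        if best_dist is not None and abs(len(word) - len(query)) >= best_dist:
            continue
        d = levenshtein(word, query)
        if best_dist is None or d < best_dist:
            best_word, best_dist = word, d
            if d == 0:
                break  # exact match, nothing can be closer
    return best_word


print(closest_match(["mitten", "sitting", "kitchen", "bitten"], "kitten"))  # mitten
```

Each comparison costs O(mn) time and O(min(m, n)) space, and the scan still visits every word in the worst case. For a dictionary the size of English, an index such as a trie or a BK-tree may reduce the number of comparisons, though that goes beyond this sketch.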

Grading

  • Completeness of the C# or pseudo-code algorithm (25%)
  • Correctness of the algorithm, demonstrated by quick manual unit tests (25%)
  • Analysis of the time and space complexity of the algorithm (25%)
  • Efficiency: whether the most efficient solution is offered (25%)

References

Lars, a Google engineer and a University of Waterloo alumnus, asked me this during a two-hour Google screening interview.

Levenshtein distance. Wikipedia. http://en.wikipedia.org/wiki/Levenshtein_distance (accessed Jan. 2011)

The Levenshtein-Algorithm. Levenshtein. http://www.levenshtein.net/ (accessed Jan. 2011)