How to modify Levenshtein algorithm, to know if it inserted, deleted, or substituted a character?

Tags: , ,



So I am trying to devise a spin off of the Levenshtein algorithm, where I keep track of what transformations I did in the string(inserted a, or substitute a for b).

Example:

Basically, say I am computing the edit distance of “bbd” and “bcd”

The edit distance will be 1 and the transformation will be “substitude b for c”

Question: How would I approach this problem since the implementations i’ve seen do not concern themselves with knowing what kind of operation it was but only the total cost?

Answer

You can use this module – there is an editops function there, which returns a list with the operations needed to transform one string to another.

Example:

Levenshtein.editops("FBBDE", "BCDASD")
[('delete', 0, 0), ('replace', 2, 1), ('insert', 4, 3), ('insert', 4, 4), ('replace', 4, 5)]

From the docs:

Find sequence of edit operations transforming one string to another.

editops(source_string, destination_string) editops(edit_operations, source_length, destination_length)

The result is a list of triples (operation, spos, dpos), where operation is one of ‘equal’, ‘replace’, ‘insert’, or ‘delete’; spos and dpos are position of characters in the first (source) and the second (destination) strings. These are operations on single characters. In fact the returned list doesn’t contain the ‘equal’, but all the related functions accept both lists with and without ‘equal’s.



Source: stackoverflow