Skip to content
Advertisement

Edit Distance w/ operational weights in Python

I am learning about edit distance for the first time and have only been coding for a few months. I’m trying to modify the algorithm such that the different editing operations carry different weights as follows: insertion weighs 20, deletion weighs 20 and replacement weighs 5.

I have been able to implement the basic code that calculates minimum edit distance if all operations were equal in weight (levenshtein distance). But how would one implement it if they are different as stated above? This is what I have at the moment:

str1="algorithms"
str2="alligator"
m=len(str1)
n=len(str2)

def editdistance(str1, str2, m, n):
  table=[[0 for x in range(n+1)] for x in range(m+1)]
  
  for i in range(m+1):
    for j in range(n+1):

      if i==0:
        table[i][j]=j

      elif j==0:
        table[i][j]=i

      elif str1[i-1]==str2[j-1]:
        table[i][j]=table[i-1][j-1]

      else:
         table[i][j] = min(20+table[i][j-1], 20+table[i-1][j], 5+table[i-1][j-1])
        

  return table[m][n]

print(editdistance(str1, str2, m, n)) 

Output is 46, which is obviously wrong as the answer should be a multiple of 5. What am I missing here? Any help would be greatly appreciated.

Advertisement

Answer

You have the base costs for when i = 0 and j = 0 to be j and i respectively, which are not multiples of 5. Then you should be multiplying them by 20 since not using the letters is essentially the same as deleting or inserting them for the purposes of edit distance. So you should try something like this:

str1="algorithms"
str2="alligator"
m=len(str1)
n=len(str2)

def editdistance(str1, str2, m, n):
  table=[[0 for x in range(n+1)] for x in range(m+1)]
  
  for i in range(m+1):
    for j in range(n+1):

      if i==0:
        table[i][j]=j*20

      elif j==0:
        table[i][j]=i*20

      elif str1[i-1]==str2[j-1]:
        table[i][j]=table[i-1][j-1]

      else:
         table[i][j] = min(20+table[i][j-1], 20+table[i-1][j], 5+table[i-1][j-1])
        

  return table[m][n]

print(editdistance(str1, str2, m, n)) 
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement