I have a similarity matrix of words and would like to apply an algorithm that can put the words in clusters. Here’s the example I have so far: Obviously this is a very simple dummy example, but I would expect the output to be two clusters, one with ‘The Bachelor’, ‘The Bachelorette’, ‘The Bachelor Special’, and the other with ‘SportsCenter’, ‘SportsCenter
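One possible way to get clusters out of a similarity matrix is to link every pair of words whose similarity clears a threshold and then take the connected components. The sketch below is only an illustration under assumptions, not the asker's code: the similarity is a normalized Levenshtein score, the threshold of 0.5 is arbitrary, the function names are made up for this example, and 'SportsCenter Live' is a hypothetical second title added so the second cluster has two members.

```python
def levenshtein(a, b):
    """Plain edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def similarity_matrix(words):
    """Assumed similarity: 1 - distance / length of the longer word."""
    n = len(words)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(words[i]), len(words[j])) or 1
            score = 1.0 - levenshtein(words[i], words[j]) / longest
            sim[i][j] = sim[j][i] = score
    return sim


def cluster_by_threshold(words, sim, threshold=0.5):
    """Group words into connected components of the graph that links
    every pair with similarity >= threshold (threshold is an assumption)."""
    n = len(words)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = []
        while stack:
            i = stack.pop()
            members.append(words[i])
            for j in range(n):
                if j not in seen and sim[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# 'SportsCenter Live' is a hypothetical title used only for illustration.
words = ["The Bachelor", "The Bachelorette", "The Bachelor Special",
         "SportsCenter", "SportsCenter Live"]
clusters = cluster_by_threshold(words, similarity_matrix(words))
print(clusters)
```

Because `cluster_by_threshold` only reads the matrix, an existing similarity matrix can be passed in directly instead of the one built here. For larger data, a library clustering routine may be a better fit than this threshold approach.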
Tag: levenshtein-distance
Replace values in a column with similar values in another column with different size – Python
I have a dataframe with different values in a column (about 6,000 rows), which I need to replace with similar (but different) values found in another dataframe, which has fewer rows. Store Values to replace Store A 05/15/21 Store A The Store B 04/01/21 Store B Store letter B 11/12/21 Store C Store C 10/24/21 Store D Store D 09/30/21
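A rough sketch of one way to approach this kind of replacement is to build a lookup from each messy value to its closest canonical value. It uses the standard library's `difflib.get_close_matches` on plain lists rather than dataframes; the function name `build_replacement_map`, the cutoff of 0.6, and the sample lists are assumptions for illustration, loosely based on the store names in the question.

```python
from difflib import get_close_matches


def build_replacement_map(values, canonical, cutoff=0.6):
    """Map each value to its closest canonical value.

    Values with no candidate scoring at least `cutoff` are left unchanged.
    The cutoff of 0.6 is difflib's default and may need tuning.
    """
    mapping = {}
    for value in values:
        matches = get_close_matches(value, canonical, n=1, cutoff=cutoff)
        mapping[value] = matches[0] if matches else value
    return mapping


# Sample data assumed for illustration.
messy = ["Store A", "The Store B", "Store letter B", "Store C"]
canonical = ["Store A", "Store B", "Store C", "Store D"]

replacements = build_replacement_map(messy, canonical)
print(replacements)
```

With pandas, a dictionary like this could then be applied to the column with `Series.map` or `Series.replace`. Building the map from the unique values only keeps the number of fuzzy comparisons down on a 6,000-row column.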
Edit Distance w/ operational weights in Python
I am learning about edit distance for the first time and have only been coding for a few months. I’m trying to modify the algorithm such that the different editing operations carry different weights as follows: insertion weighs 20, deletion weighs 20 and replacement weighs 5. I have been able to implement the basic code that calculates minimum edit distance
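For reference, here is a minimal sketch of how the weights described above could be plugged into the usual dynamic-programming table. It is not the asker's code; the function name and parameter names are made up for this example, and the defaults follow the weights stated in the question (insertion 20, deletion 20, replacement 5).

```python
def weighted_edit_distance(source, target,
                           insert_cost=20, delete_cost=20, replace_cost=5):
    """Minimum total cost to turn `source` into `target`."""
    m, n = len(source), len(target)
    # dp[i][j] = cheapest way to turn source[:i] into target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * delete_cost      # delete every remaining source char
    for j in range(1, n + 1):
        dp[0][j] = j * insert_cost      # insert every target char
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dp[i][j] = min(
                dp[i - 1][j] + delete_cost,
                dp[i][j - 1] + insert_cost,
                dp[i - 1][j - 1] + (0 if same else replace_cost),
            )
    return dp[m][n]


print(weighted_edit_distance("cat", "cut"))         # one replacement: 5
print(weighted_edit_distance("cat", "cats"))        # one insertion: 20
print(weighted_edit_distance("kitten", "sitting"))  # two replacements + one insertion: 30
```

The only change from the unweighted version is that the border rows and the three candidate moves add the operation's weight instead of 1.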
How to modify Levenshtein algorithm, to know if it inserted, deleted, or substituted a character?
So I am trying to devise a spin-off of the Levenshtein algorithm, where I keep track of what transformations I did in the string (inserted a, or substituted a for b). Example: say I am computing the edit distance of “bbd” and “bcd”. The edit distance will be 1 and the transformation will be “substitute b for c”. Question:
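One common way to recover the transformations is to fill the full distance table and then walk back from the bottom-right corner, recording which move produced each cell. The sketch below assumes unit costs and reports each operation as a tuple; the function name and the tuple layout are choices made for this example, not part of any library.

```python
def edit_operations(source, target):
    """Return (distance, ops) where ops lists the edits turning source
    into target as (kind, source_char, target_char) tuples."""
    m, n = len(source), len(target)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i
    for j in range(1, n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute or match

    # Walk back from dp[m][n], preferring match, then substitute,
    # then delete, then insert when several moves are equally cheap.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            i, j = i - 1, j - 1                      # characters match, no edit
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("substitute", source[i - 1], target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", source[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, target[j - 1]))
            j -= 1
    ops.reverse()
    return dp[m][n], ops


print(edit_operations("bbd", "bcd"))  # (1, [('substitute', 'b', 'c')])
```

When several edit scripts have the same minimal length, the tie-breaking order in the walk-back decides which one is reported, so other equally valid scripts may exist.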