
How would I make clusters from a Levenshtein similarity matrix?

I have a similarity matrix of words and would like to apply an algorithm that can put the words in clusters.

Here’s the example I have so far:

JavaScript

Obviously this is a very simple dummy example, but I would expect the output to be two clusters: one with 'The Bachelor', 'The Bachelorette', 'The Bachelor Special', and the other with 'SportsCenter', 'SportsCenter 8 PM', 'SportsCenter Sunday'.

Can anyone help me with this?


Answer

Using the matrix that your code generates, you can loop through it to compare each word with every other word, and group together the words whose Levenshtein distance falls under a chosen threshold. The code should look similar to the following:

JavaScript
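A minimal sketch of that loop, offered as an illustration rather than as the answer's original code. It assumes a precomputed square matrix `dist` of Levenshtein distances and a hypothetical helper named `clusterByThreshold`. The sample distances are for the three 'Bachelor' titles from the question.

```javascript
// Hypothetical helper (name not from the original post).
// words: array of strings; dist[i][j]: distance between words[i] and words[j].
// Builds one candidate cluster per row, so a word may land in several clusters.
function clusterByThreshold(words, dist, threshold) {
  const seen = new Set();
  const clusters = [];
  for (let i = 0; i < words.length; i++) {
    const cluster = [];
    for (let j = 0; j < words.length; j++) {
      if (dist[i][j] <= threshold) cluster.push(words[j]);
    }
    // Rows that produce exactly the same group are reported only once.
    const key = JSON.stringify(cluster);
    if (!seen.has(key)) {
      seen.add(key);
      clusters.push(cluster);
    }
  }
  return clusters;
}

const words = ['The Bachelor', 'The Bachelorette', 'The Bachelor Special'];
// Raw Levenshtein distances between the three titles above.
const dist = [
  [0, 4, 8],
  [4, 0, 7],
  [8, 7, 0],
];

console.log(clusterByThreshold(words, dist, 4));
// [ [ 'The Bachelor', 'The Bachelorette' ], [ 'The Bachelor Special' ] ]
```

Raising the threshold to 8 puts all three titles in a single cluster, which shows how sensitive the result is to that one number.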

The threshold variable must be tuned to the sensitivity you want. The code assumes that a word may appear in more than one cluster as long as it meets the similarity threshold. I also recommend converting the matrix values to percentages (for example, by dividing each distance by the length of the longer word) so that word length has less impact on the threshold. Finally, if the full matrix is not needed for anything else, the clusters can be built while the Levenshtein distances are being computed.
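To illustrate the percentage idea, here is a rough sketch. Dividing by the length of the longer word is one common choice of normalisation and is an assumption here, not something the answer specifies. The names `levenshtein`, `normalizedDistance`, and `normalizedMatrix` are hypothetical.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: normalise by the longer word, giving a value from 0 (identical) up to 1.
function normalizedDistance(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : levenshtein(a, b) / longest;
}

// Hypothetical helper: full matrix of normalised distances for a word list.
function normalizedMatrix(words) {
  return words.map((a) => words.map((b) => normalizedDistance(a, b)));
}

console.log(normalizedDistance('The Bachelor', 'The Bachelorette'));
// 0.25 (4 edits over 16 characters)
```

With values on a 0 to 1 scale, one threshold (say 0.4, purely as a starting point to tune) means roughly the same thing for short and long titles.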

Happy Coding!

User contributions licensed under: CC BY-SA