
How to count letter-based similarity on a pandas dataframe

Here’s my first dataframe df1


Here’s my second dataframe df2


Expected similarity matrix: the columns are the Id values from df1 and the rows are the Id values from df2


Note:

The value is 0 in (1,1), (2,1) and (3,2) because no letters match.

The value 0.25 in (3,1) is because only 1 letter from `raUw` is available in the 4-letter `dnag` (1/4 equals 0.25).

0.5 is counted because 2 of 4 letters match.

0.66 is counted because 2 of 3 letters match.
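
To make the arithmetic in the note concrete, here is a rough sketch of the score for a single pair of strings. The function name `letter_similarity` is made up for illustration, and it assumes the score is the number of distinct shared letters divided by the length of the word being checked against, with a case-sensitive comparison. Add `.lower()` to both arguments if case should be ignored.

```python
def letter_similarity(source: str, target: str) -> float:
    """Share of `target`'s length covered by distinct letters also found in `source`.

    Assumption: the denominator is len(target), as in the 1/4 example for 'dnag'.
    """
    if not target:
        return 0.0  # avoid dividing by zero for an empty string
    shared = set(source).intersection(target)  # distinct letters present in both
    return len(shared) / len(target)


print(letter_similarity("raUw", "dnag"))  # 0.25 (only 'a' is shared, 1/4)
```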


Answer

IIUC, one option is to use set.intersection in a nested list comprehension:
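
A minimal sketch of that idea, not necessarily the exact code from the answer. The column name `Text` and every string except `raUw` and `dnag` are placeholders invented for the example, and the denominator is assumed to be the length of the df1 string.

```python
import pandas as pd

# Hypothetical data: only 'raUw' and 'dnag' come from the question.
df1 = pd.DataFrame({"Id": [1, 2], "Text": ["dnag", "abc"]})
df2 = pd.DataFrame({"Id": [1, 2, 3], "Text": ["xyz", "bcq", "raUw"]})

# Outer loop builds the rows (df2), inner loop builds the columns (df1).
# Each cell: distinct letters shared by both strings / length of the df1 string.
sim = pd.DataFrame(
    [
        [len(set(a).intersection(b)) / len(b) for b in df1["Text"]]
        for a in df2["Text"]
    ],
    index=df2["Id"].tolist(),
    columns=df1["Id"].tolist(),
)

print(sim.round(2))
```

With this layout, `sim.loc[3, 1]` is the score for df2 Id 3 against df1 Id 1. The comparison here is case-sensitive. Lowercasing both columns first with `.str.lower()` would make it case-insensitive.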


Output:

User contributions licensed under: CC BY-SA