How to count word similarity between two pandas DataFrames

Here’s my first dataframe df1

Here’s my second dataframe df2

The expected similarity matrix, where the columns are the Id values from df1 and the rows are the Id values from df2:

Note:

The value is 0 at (1,1) and (3,2) because the texts share no words.

The value is 1 at (3,1) because `Bersatu` and `Kita` (Id `1` in `df2`) are both present in Id `3` of `df1`.

0.33 means that 1 of 3 words match.

0.66 means that 2 of 3 words match.
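
The scoring rule described in these notes can be sketched for a single pair of texts. This is only an illustration under assumptions: `overlap_score` is a hypothetical helper, words are split on whitespace and compared case-insensitively, and the denominator is taken to be the number of distinct words in the `df2` text (the reading that gives 1 for the `Bersatu Kita` case).

```python
def overlap_score(text_df1: str, text_df2: str) -> float:
    """Fraction of the distinct words in text_df2 that also occur in text_df1.

    Assumption: the denominator is the word count of the df2 text.
    """
    words_df1 = set(text_df1.lower().split())
    words_df2 = set(text_df2.lower().split())
    if not words_df2:
        return 0.0  # avoid dividing by zero for empty text
    return len(words_df1 & words_df2) / len(words_df2)


# Both words of the df2 text appear in the df1 text: 2 / 2
full_match = overlap_score("Teguh Kita Bersatu", "Bersatu Kita")

# Hypothetical three-word df2 text sharing one word with the df1 text: 1 / 3
partial_match = overlap_score("Teguh Kita Bersatu", "kita merah putih")
```

With this rule, a score of 1/3 or 2/3 only arises when the `df2` text has three distinct words, which matches the "1 of 3" and "2 of 3" wording above.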

Answer

IIUC, you need to compute a set intersection:

output:

NB. The condition on the denominator is not fully clear: for {teguh, kita, bersatu} vs {kita, bersatu} I count 2/3 ≈ 0.667.
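
A minimal sketch of the set-intersection idea, with the denominator left as a choice since it is ambiguous. Everything specific here is an assumption for illustration: the sample frames, the column names `Id` and `text`, and the helper names `to_word_sets` and `similarity_matrix` are hypothetical and not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data; column names "Id" and "text" are assumptions.
df1 = pd.DataFrame({"Id": [1, 2, 3],
                    "text": ["merah putih", "kita satu", "teguh kita bersatu"]})
df2 = pd.DataFrame({"Id": [1, 2],
                    "text": ["bersatu kita", "putih biru hijau"]})


def to_word_sets(df):
    """Map each Id to the set of lowercase words in its text."""
    return df.set_index("Id")["text"].str.lower().str.split().map(set)


def similarity_matrix(df1, df2, denominator="df2"):
    """Rows are df2 Ids, columns are df1 Ids.

    denominator: "df2" or "df1" divides by that side's word count;
    any other value divides by the size of the union (Jaccard index).
    """
    sets1, sets2 = to_word_sets(df1), to_word_sets(df2)
    rows = {}
    for id2, words2 in sets2.items():
        row = {}
        for id1, words1 in sets1.items():
            shared = len(words1 & words2)  # set intersection
            if denominator == "df2":
                total = len(words2)
            elif denominator == "df1":
                total = len(words1)
            else:
                total = len(words1 | words2)
            row[id1] = shared / total if total else 0.0
        rows[id2] = row
    return pd.DataFrame.from_dict(rows, orient="index")


by_df2 = similarity_matrix(df1, df2)                     # {kita, bersatu} row vs Id 3: 2/2
by_df1 = similarity_matrix(df1, df2, denominator="df1")  # same cell: 2/3
```

Dividing by the `df2` word count gives 1 for {teguh, kita, bersatu} vs {kita, bersatu}, while dividing by the `df1` word count (or the union) gives 2/3, which is the discrepancy noted above.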

User contributions licensed under: CC BY-SA