I would like to check for a substring match between the comments and keywords columns, and find out whether any of the keywords is present in that particular row.
Input:
   name               comments                keywords
0  paul      account is active  active,activated,activ
1  john   account is activated  active,activated,activ
2   max  account is activateds  active,activated,activ
Expected output:
match
True
True
True
Answer
The most efficient approach here is a plain Python loop. For whole-word matches you can use set intersection:
df['match'] = [set(c.split()).intersection(k.split(',')) > set()
               for c, k in zip(df['comments'], df['keywords'])]
Output:
   name               comments                keywords  match
0  paul      account is active  active,activated,activ   True
1  john   account is activated  active,activated,activ   True
2   max  account is activateds  active,activated,activ  False
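The False in the last row follows from how set intersection works: it compares whole whitespace-separated tokens, so "activateds" is not equal to any keyword. A minimal sketch of that single row in plain Python (the variable names are illustrative, not from the answer above):

```python
comment = "account is activateds"
keywords = "active,activated,activ"

tokens = set(comment.split())       # {'account', 'is', 'activateds'}
wanted = set(keywords.split(","))   # {'active', 'activated', 'activ'}

# Intersection keeps only tokens that equal a keyword exactly.
common = tokens & wanted

# bool(common) has the same truth value as `common > set()`:
# a set is a proper superset of the empty set only if it is non-empty.
whole_word_match = bool(common)
print(whole_word_match)  # False
```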
Used input:
import pandas as pd

df = pd.DataFrame({'name': ['paul', 'john', 'max'],
                   'comments': ['account is active', 'account is activated', 'account is activateds'],
                   'keywords': ['active,activated,activ', 'active,activated,activ', 'active,activated,activ']})
With a minor variation you can check for a substring match instead (“activ” would match “activateds”):
df['substring'] = [any(w in c for w in k.split(','))
                   for c, k in zip(df['comments'], df['keywords'])]
Output:
   name               comments                keywords  substring
0  paul      account is active  active,activated,activ       True
1  john   account is activated  active,activated,activ       True
2   max  account is activateds  active,activated,activ       True
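If the real data is messier than the sample, the substring check may need a few guards. The sketch below is one possible variant, not part of the original answer: `contains_any_keyword` is a hypothetical helper, and the case-insensitive comparison, the stripping of blanks around keywords, and the handling of missing cells are all assumptions about what the data might need.

```python
def contains_any_keyword(comment, keywords, sep=","):
    """Return True if any keyword occurs as a substring of comment.

    Hypothetical helper: case-insensitive, ignores blanks around
    keywords, and treats non-string cells (e.g. None/NaN) as no match.
    """
    if not isinstance(comment, str) or not isinstance(keywords, str):
        return False
    text = comment.lower()
    words = [w.strip().lower() for w in keywords.split(sep)]
    # Skip empty keywords: '' is a substring of every string.
    return any(w in text for w in words if w)


# Sample rows from the question plus two made-up edge cases.
comments = ["account is active", "account is activated",
            "account is activateds", "Account closed", None]
keywords = ["active,activated,activ"] * 3 + ["active, closed", "active"]

result = [contains_any_keyword(c, k) for c, k in zip(comments, keywords)]
print(result)  # [True, True, True, True, False]
```

With the DataFrame above, the same list comprehension over zip(df['comments'], df['keywords']) can be assigned to df['substring'].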