I would like to check for a substring match between the comments and keywords columns, and find out whether any of the keywords is present in the comment of that particular row.
input
   name               comments                keywords
0  paul      account is active  active,activated,activ
1  john   account is activated  active,activated,activ
2   max  account is activateds  active,activated,activ
expected output
   match
0   True
1   True
2   True
Answer
The most efficient approach here is a plain Python loop. For whole-word matches, you can use a set intersection:
df['match'] = [set(c.split()).intersection(k.split(',')) > set() for c,k in zip(df['comments'], df['keywords'])]
Output:
   name               comments                keywords  match
0  paul      account is active  active,activated,activ   True
1  john   account is activated  active,activated,activ   True
2   max  account is activateds  active,activated,activ  False
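To see why the last row comes out False, it may help to run the same comparison on a single row by hand. This is only a sketch in plain Python using the sample values from the question; the variable names are illustrative and not part of the original answer.

```python
# Whole-word matching on one row (values taken from the question's sample data).
comment = "account is activateds"
keywords = "active,activated,activ"

words = set(comment.split())        # {'account', 'is', 'activateds'}
wanted = set(keywords.split(","))   # {'active', 'activated', 'activ'}

common = words & wanted             # same as words.intersection(wanted)
has_match = bool(common)            # an empty set is falsy

print(common)     # set()
print(has_match)  # False
```

In the one-liner, `> set()` is a strict-superset test against the empty set, so it is True exactly when the intersection is non-empty. `bool(...)` is an equivalent spelling.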
Used input:
import pandas as pd

df = pd.DataFrame({'name': ['paul', 'john', 'max'],
                   'comments': ['account is active', 'account is activated', 'account is activateds'],
                   'keywords': ['active,activated,activ', 'active,activated,activ', 'active,activated,activ']})
With a minor variation, you can check for a substring match instead (“activ” would match “activateds”):
df['substring'] = [any(w in c for w in k.split(',')) for c,k in zip(df['comments'], df['keywords'])]
Output:
   name               comments                keywords  substring
0  paul      account is active  active,activated,activ       True
1  john   account is activated  active,activated,activ       True
2   max  account is activateds  active,activated,activ       True
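If the real data is messier than the sample, the substring check could be wrapped in a small helper. The sketch below is an assumption about what such data might look like (mixed case, spaces after the commas, missing values). The function name `has_keyword` and its `sep` parameter are hypothetical, not something from the question or from pandas.

```python
def has_keyword(comment, keywords, sep=","):
    """Return True if any keyword occurs as a substring of the comment.

    Hypothetical helper: compares case-insensitively, strips spaces around
    each keyword, and treats non-string values (None, NaN) as no match.
    """
    if not isinstance(comment, str) or not isinstance(keywords, str):
        return False
    text = comment.lower()
    parts = (k.strip().lower() for k in keywords.split(sep))
    # Skip empty keywords, because '' is a substring of every string.
    return any(k in text for k in parts if k)


# Made-up rows for illustration only.
comments = ["Account is ACTIVE", "account closed", None]
keyword_lists = ["active, activated", "active,activated", "active"]

result = [has_keyword(c, k) for c, k in zip(comments, keyword_lists)]
print(result)  # [True, False, False]
```

It would slot into the same list comprehension as before: `df['substring'] = [has_keyword(c, k) for c, k in zip(df['comments'], df['keywords'])]`.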