Skip to content
Advertisement

Find if words from one sentence are found in corresponding row of another column also containing sentences (Pandas)

I have dataframe that looks like this:

                         email                 account_name
0                          NaN  weichert, realtors   mnsota
1  jhawkins sterling group com               sterling group
2        lbaltz baltzchevy com              baltz chevrolet

and I have this code that works as a solution but it takes forever on larger datasets and I know there has to be an easier way to solve it so just looking to see if anyone knows of a more concise/elegant way to do find a count of matching words between corresponding rows of both columns. Thanks

test = prod_nb_wcomps_2.sample(3, random_state=10).reset_index(drop = True)
test = test[['email','account_name']]
print(test)
lst = []

for i in test.index:
    if not isinstance(test['email'].iloc[i], float):
        for word in test['email'].iloc[i].split(' '):
            if not isinstance(test['account_name'].iloc[i], float):
                 for word2 in test['account_name'].iloc[i].split(' '):  
                    if word in word2:
                         lst.append({'index':i, 'bool_col': True})
                    else: lst.append({'index':i, 'bool_col': False})

df_dct = pd.DataFrame(lst)
df_dct = df_dct.loc[df_dct['bool_col'] == True]
df_dct['number of matches_per_row'] = df_dct.groupby('index')['bool_col'].transform('size')
df_dct.set_index('index', inplace=True, drop=True)
df_dct.drop(['bool_col'], inplace=True, axis =1)
test_ = pd.merge(test, df_dct, left_index=True, right_index=True)
test_

the resulting dataframe test_ looks like this

enter image description here

Advertisement

Answer

This solves your query.

import pandas as pd

df = pd.DataFrame({'email': ['', 'jhawkins sterling group com', 'lbaltz baltzchevy com'], 'name': ['John', 'sterling group', 'Linda']})

for index, row in df.iterrows():
    matches = sum([1 for x in row['email'].split() if x in row['name'].split()])
    df.loc[index, 'matches'] = matches

Output:

                         email            name  matches
0                                         John      0.0
1  jhawkins sterling group com  sterling group      2.0
2        lbaltz baltzchevy com           Linda      0.0
Advertisement