I have dataframe that looks like this:
email account_name 0 NaN weichert, realtors mnsota 1 jhawkins sterling group com sterling group 2 lbaltz baltzchevy com baltz chevrolet
and I have this code that works as a solution but it takes forever on larger datasets and I know there has to be an easier way to solve it so just looking to see if anyone knows of a more concise/elegant way to do find a count of matching words between corresponding rows of both columns. Thanks
test = prod_nb_wcomps_2.sample(3, random_state=10).reset_index(drop = True) test = test[['email','account_name']] print(test) lst = [] for i in test.index: if not isinstance(test['email'].iloc[i], float): for word in test['email'].iloc[i].split(' '): if not isinstance(test['account_name'].iloc[i], float): for word2 in test['account_name'].iloc[i].split(' '): if word in word2: lst.append({'index':i, 'bool_col': True}) else: lst.append({'index':i, 'bool_col': False}) df_dct = pd.DataFrame(lst) df_dct = df_dct.loc[df_dct['bool_col'] == True] df_dct['number of matches_per_row'] = df_dct.groupby('index')['bool_col'].transform('size') df_dct.set_index('index', inplace=True, drop=True) df_dct.drop(['bool_col'], inplace=True, axis =1) test_ = pd.merge(test, df_dct, left_index=True, right_index=True) test_
the resulting dataframe test_
looks like this
Advertisement
Answer
This solves your query.
import pandas as pd df = pd.DataFrame({'email': ['', 'jhawkins sterling group com', 'lbaltz baltzchevy com'], 'name': ['John', 'sterling group', 'Linda']}) for index, row in df.iterrows(): matches = sum([1 for x in row['email'].split() if x in row['name'].split()]) df.loc[index, 'matches'] = matches
Output:
email name matches 0 John 0.0 1 jhawkins sterling group com sterling group 2.0 2 lbaltz baltzchevy com Linda 0.0