How to check if a substring in a pandas dataframe column exists in a substring of another column in the same dataframe?

Question

I have a dataframe with columns A and B. I want to create a list of the values from A that match values from B. The list should look like [- 5923FoxRd, Saratoga Street, Suite 200...]. What is the easiest way to do this?

Accepted Answer

To make a little go a long way, do the following:

1. Create a new series for each column and pass the regex pattern \W+ to str.replace() to strip every non-word character.
2. Use str.lower().
3. Create replace lists to normalize drive to dr, avenue to ave, etc.

Passing regex=True explicitly matters here, because newer pandas versions treat the pattern as a literal string by default.

    s1 = df['A'].str.replace(r'\W+', '', regex=True).str.lower()
    s2 = df['B'].str.replace(r'\W+', '', regex=True).str.lower()
    lst = [*df[s1==s2]['A']]
    lst
    Out[1]: ['- 5923FoxRd', 'Saratoga Street, Suite 200']

This is what s1 and s2 look like:

    print(s1,s2)
    0                 5923foxrd
    1            631newhavenave
    2    saratogastreetsuite200
    Name: A, dtype: object
    0                 5923foxrd
    1                   modesto
    2    saratogastreetsuite200
    Name: B, dtype: object

From there, you might want to create some replace values in order to normalize your data even further, like:

    to_replace = ['drive', 'avenue', 'street']
    replaced = ['dr', 'ave', 'str']
    s1 = df['A'].str.replace(r'\W+', '', regex=True).str.lower().replace(to_replace, replaced, regex=True)
    s2 = df['B'].str.replace(r'\W+', '', regex=True).str.lower().replace(to_replace, replaced, regex=True)
    lst = [*df[s1==s2]['A']]
    lst

    print(s1,s2)
    0              5923foxrd
    1         631newhavenave
    2    saratogastrsuite200
    Name: A, dtype: object
    0              5923foxrd
    1                modesto
    2    saratogastrsuite200
    Name: B, dtype: object
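For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The rows in df are hypothetical (the question's original table is not available, so they are made up to resemble the printed output), and the helper name normalize is an assumption introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data: the question's original table is not shown,
# so these rows are invented to resemble the output in the answer.
df = pd.DataFrame({
    "A": ["- 5923FoxRd", "631 New Haven Ave", "Saratoga Street, Suite 200", "12 Oak Drive"],
    "B": ["5923 Fox Rd", "Modesto", "Saratoga Str Suite 200", "12 Oak Dr."],
})

to_replace = ["drive", "avenue", "street"]
replaced = ["dr", "ave", "str"]

def normalize(col):
    # Drop every run of non-word characters, lowercase the result, then
    # shorten the long street-type words to their abbreviations.
    cleaned = col.str.replace(r"\W+", "", regex=True).str.lower()
    return cleaned.replace(to_replace, replaced, regex=True)

s1 = normalize(df["A"])
s2 = normalize(df["B"])

# Keep the original (un-normalized) values of A where the normalized keys agree.
matches = df.loc[s1 == s2, "A"].tolist()
print(matches)
```

Note that the abbreviation step replaces substrings anywhere in the normalized text, so a value containing a word such as "driveway" would also be shortened. Whether that is acceptable depends on your data.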

