How to check if a substring in a pandas dataframe column exists in a substring of another column in the same dataframe?

I have a dataframe with address strings in two columns, A and B.

I want to create a list of the values in A that match values in B. The list should look like [- 5923FoxRd, Saratoga Street, Suite 200…]. What is the easiest way to do this?
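As a baseline, here is a minimal sketch of an exact substring check in Python. It assumes the comparison is row by row (each value of A against the value of B in the same row), and the dataframe contents are made-up sample values, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical sample data; the question's actual dataframe is not shown.
df = pd.DataFrame({
    "A": ["Saratoga Street", "Suite 200", "5923 Fox Rd."],
    "B": ["12 Saratoga Street", "Suite 200, Floor 3", "5923 FOX RD"],
})

# Keep each value of A that appears verbatim inside B in the same row.
matches = [a for a, b in zip(df["A"], df["B"]) if a in b]
print(matches)  # ['Saratoga Street', 'Suite 200']
```

The third row is missed only because case and punctuation differ, which is why the answer below normalizes both columns before comparing them.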

Answer

A little normalization goes a long way here. Do the following:

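  1. Create a new series from each column by passing the regex pattern \W+ to str.replace(), which removes every run of non-word characters (spaces, punctuation).
  2. Use str.lower() so the comparison ignores case.
  3. Create replacement lists to normalize variants such as drive to dr and avenue to ave.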
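Steps 1 and 2 might look like the following minimal sketch. The sample rows are hypothetical, and the final row-wise substring test (the mask and matches names) is an assumption about how the normalized series s1 and s2 are then compared.

```python
import pandas as pd

# Hypothetical sample rows (not from the question).
df = pd.DataFrame({
    "A": ["5923 Fox Rd.", "Saratoga Street", "Suite 200", "Main St"],
    "B": ["5923 fox rd, Apt 1", "10 SARATOGA STREET", "Suite #200", "34 Oak Avenue"],
})

# Step 1: delete every run of non-word characters (spaces, punctuation).
# Step 2: lowercase so the comparison ignores case.
s1 = df["A"].str.replace(r"\W+", "", regex=True).str.lower()
s2 = df["B"].str.replace(r"\W+", "", regex=True).str.lower()

# Assumed comparison: is the normalized A a substring of the normalized B
# in the same row? Keep the original (un-normalized) A values that match.
mask = [a in b for a, b in zip(s1, s2)]
matches = df.loc[mask, "A"].tolist()
print(matches)  # ['5923 Fox Rd.', 'Saratoga Street', 'Suite 200']
```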

This produces two normalized series, s1 and s2, one per column.

From there, you might want to define replacement values that normalize your data even further.
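One way to apply step 3 is sketched below. The replacement lists (long_forms, short_forms), the normalize helper, and the sample rows are all hypothetical. Unlike the order of the steps above, this sketch lowercases and swaps whole words before stripping non-word characters: once spaces are removed, the \b word boundaries no longer exist, and a plain replace could also hit words such as driveway.

```python
import re

import pandas as pd

# Hypothetical replacement lists; extend them to fit your data.
long_forms = ["drive", "avenue", "street", "road"]
short_forms = ["dr", "ave", "st", "rd"]

def normalize(s: pd.Series) -> pd.Series:
    """Hypothetical helper: lowercase, abbreviate whole words, strip non-word chars."""
    s = s.str.lower()
    # Swap whole words only (\b boundaries) while spaces are still present.
    for long_form, short_form in zip(long_forms, short_forms):
        s = s.str.replace(rf"\b{re.escape(long_form)}\b", short_form, regex=True)
    # Then remove every run of non-word characters, as in step 1.
    return s.str.replace(r"\W+", "", regex=True)

# Hypothetical sample rows.
df = pd.DataFrame({
    "A": ["12 Oak Avenue", "5923 Fox Rd", "Saratoga Street"],
    "B": ["12 oak ave, Unit 4", "5923 Fox Road", "Elm Drive"],
})

s1 = normalize(df["A"])
s2 = normalize(df["B"])
# Row-wise substring test on the normalized values; keep the original A values.
matches = df.loc[[a in b for a, b in zip(s1, s2)], "A"].tolist()
print(matches)  # ['12 Oak Avenue', '5923 Fox Rd']
```

The first two rows now match even though one side spells out the street type and the other abbreviates it.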

User contributions licensed under: CC BY-SA