Skip to content
Advertisement

How to drop rows in one DataFrame based on one similar column in another Dataframe that has a different number of rows

I have two DataFrames that are completely dissimilar except for certain values in one particular column:

df
  First  Last   Email             Age
0 Adam   Smith  email1@email.com   30
1 John   Brown  email2@email.com  35
2 Joe    Max    email3@email.com  40
3 Will   Bill   email4@email.com  25
4 Johnny Jacks  email5@email.com  50
df2
  ID   Location  Contact
0 5435 Austin    email5@email.com
1 4234 Atlanta   email1@email.com
2 7896 Toronto   email3@email.com

How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match?

Output I’m looking for (index numbering doesn’t matter):

df1
  First  Last   Email             Age
1 John   Brown  email2@email.com  35
3 Will   Bill   email4@email.com  25

I’ve been able to identify matches using a few different methods like:

Changing the column names to be identical

common = df.merge(df2,on=['Email'])
df3 = df[(~df['Email'].isin(common['Email']))]

But df3 still shows all the rows from df.

I’ve also tried:

common = df['Email'].isin(df2['Contact'])
df.drop(df[common].index, inplace = True)

And again, identifies the matches but df still contains all original rows.

So the main thing I’m having difficulty with is updating df with the matches dropped or creating a new DataFrame that contains only the rows with dissimilar values when comparing the Email column from df and the Contact column in df2. Appreciate any suggestions.

Advertisement

Answer

As mentioned in the comments(@Arkadiusz), it is enough to filter your data using the following

df3 = df[(~df['Email'].isin(df2.Contact))].copy()
print(df3)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement