I have two DataFrames that are completely dissimilar except for certain values in one particular column:
df First Last Email Age 0 Adam Smith email1@email.com 30 1 John Brown email2@email.com 35 2 Joe Max email3@email.com 40 3 Will Bill email4@email.com 25 4 Johnny Jacks email5@email.com 50
df2 ID Location Contact 0 5435 Austin email5@email.com 1 4234 Atlanta email1@email.com 2 7896 Toronto email3@email.com
How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match?
Output I’m looking for (index numbering doesn’t matter):
df1 First Last Email Age 1 John Brown email2@email.com 35 3 Will Bill email4@email.com 25
I’ve been able to identify matches using a few different methods like:
Changing the column names to be identical
common = df.merge(df2,on=['Email']) df3 = df[(~df['Email'].isin(common['Email']))]
But df3 still shows all the rows from df.
I’ve also tried:
common = df['Email'].isin(df2['Contact']) df.drop(df[common].index, inplace = True)
And again, identifies the matches but df still contains all original rows.
So the main thing I’m having difficulty with is updating df with the matches dropped or creating a new DataFrame that contains only the rows with dissimilar values when comparing the Email column from df and the Contact column in df2. Appreciate any suggestions.
Advertisement
Answer
As mentioned in the comments(@Arkadiusz), it is enough to filter your data using the following
df3 = df[(~df['Email'].isin(df2.Contact))].copy() print(df3)