Skip to content
Advertisement

Uncommon rows based on a column in pandas

Suppose I have two dataframes:

d1 = {'col1':['a','b','c'],
    'col2':[1,2,3],
    'col3':[4,5,6]}
df1 = pd.DataFrame(d1)
  col1  col2  col3
0    a     1     4
1    b     2     5
2    c     3     6

and

d2 = {'col1':['a','b']}
df2 = pd.DataFrame(d2)
  col1
0    a
1    b

I want to use the second df as reference and drop those rows that exist in df2 from df1, so the result would be

  col1  col2  col3
0    c     3     6

I tried:

df2.merge(df1, how='left', on='col1')

but this gives me the following:

  col1  col2  col3
0    a     1     4
1    b     2     5

Advertisement

Answer

Use Series.isin with inverted mask by ~ in boolean indexing, working well if need test only one column:

df = df1[~df1['col1'].isin(df2['col1'])]
print (df)
  col1  col2  col3
2    c     3     6

If need test 2 or more columns use DataFrame.merge with indicator parameter and then test in if column is not both:

df = df2.merge(df1, how='outer', on='col1', indicator=True)
df = df[df.pop('_merge').ne('both')]
print (df)
  col1  col2  col3
2    c     3     6
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement