Suppose I have two dataframes:
d1 = {'col1':['a','b','c'],
'col2':[1,2,3],
'col3':[4,5,6]}
df1 = pd.DataFrame(d1)
col1 col2 col3
0 a 1 4
1 b 2 5
2 c 3 6
and
d2 = {'col1':['a','b']}
df2 = pd.DataFrame(d2)
col1
0 a
1 b
I want to use the second df as reference and drop those rows that exist in df2 from df1, so the result would be
col1 col2 col3 0 c 3 6
I tried:
df2.merge(df1, how='left', on='col1')
but this gives me the following:
col1 col2 col3 0 a 1 4 1 b 2 5
Advertisement
Answer
Use Series.isin with inverted mask by ~ in boolean indexing, working well if need test only one column:
df = df1[~df1['col1'].isin(df2['col1'])] print (df) col1 col2 col3 2 c 3 6
If need test 2 or more columns use DataFrame.merge with indicator parameter and then test in if column is not both:
df = df2.merge(df1, how='outer', on='col1', indicator=True)
df = df[df.pop('_merge').ne('both')]
print (df)
col1 col2 col3
2 c 3 6