Suppose I have two dataframes:
```python
import pandas as pd

d1 = {'col1': ['a', 'b', 'c'],
      'col2': [1, 2, 3],
      'col3': [4, 5, 6]}
df1 = pd.DataFrame(d1)
#   col1  col2  col3
# 0    a     1     4
# 1    b     2     5
# 2    c     3     6
```
and
```python
d2 = {'col1': ['a', 'b']}
df2 = pd.DataFrame(d2)
#   col1
# 0    a
# 1    b
```
I want to use the second DataFrame as a reference and drop from `df1` the rows whose `col1` value exists in `df2`, so the result would be:
```
  col1  col2  col3
0    c     3     6
```
I tried:
```python
df2.merge(df1, how='left', on='col1')
```
but this gives me the following:
```
  col1  col2  col3
0    a     1     4
1    b     2     5
```
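The attempt returns only `a` and `b` because `how='left'` keeps every row of the left frame, which in that call is `df2`. A minimal sketch, assuming the same `df1` and `df2` as above, that puts `df1` on the left and adds `indicator=True` so each row is labelled by where its key was found (`attempt` and `flagged` are just illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c'],
                    'col2': [1, 2, 3],
                    'col3': [4, 5, 6]})
df2 = pd.DataFrame({'col1': ['a', 'b']})

# how='left' keeps every row of the LEFT frame; with df2 on the left,
# only 'a' and 'b' can ever appear in the result.
attempt = df2.merge(df1, how='left', on='col1')

# With df1 on the left, the indicator column marks each row as
# 'both' (key also in df2) or 'left_only' (key only in df1).
flagged = df1.merge(df2, how='left', on='col1', indicator=True)
print(flagged)
```

Rows marked `left_only` are exactly the ones the question wants to keep.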
Answer
Use `Series.isin` with the mask inverted by `~` in boolean indexing. This works well if you only need to test one column:
```python
df = df1[~df1['col1'].isin(df2['col1'])]
print(df)
#   col1  col2  col3
# 2    c     3     6
```
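Boolean indexing keeps the original index label (`2`), while the expected output in the question starts at `0`. If the renumbering matters, one option is to chain `reset_index(drop=True)`; a minimal sketch assuming the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c'],
                    'col2': [1, 2, 3],
                    'col3': [4, 5, 6]})
df2 = pd.DataFrame({'col1': ['a', 'b']})

# True where df1's key does NOT appear anywhere in df2['col1'].
mask = ~df1['col1'].isin(df2['col1'])

# drop=True discards the old labels instead of adding them as a column,
# so the surviving row is renumbered from 0.
result = df1[mask].reset_index(drop=True)
print(result)
```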
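If you need to test two or more columns, use `DataFrame.merge` with the `indicator` parameter (passing a list such as `on=['col1', 'col2']`) and then keep only the rows whose `_merge` value is `right_only`, i.e. present in `df1` but not in `df2`:

```python
df = df2.merge(df1, how='outer', on='col1', indicator=True)
# keep rows found only in df1; 'left_only' rows (keys only in df2) are excluded too
df = df[df.pop('_merge').eq('right_only')]
print(df)
#   col1  col2  col3
# 2    c     3     6
```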
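The same idea can be wrapped in a small reusable function for any number of key columns. This is a sketch: `anti_join` is a hypothetical helper name (not part of pandas), and the two-column `df2` below is made-up data chosen so that a match on `col1` alone is not enough. It keeps `df1` on the left, so the result follows `df1`'s row and column order:

```python
import pandas as pd

def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` whose `on` columns have no match in `right`."""
    # Deduplicate the right-hand keys so the merge cannot duplicate left rows.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, how='left', on=on, indicator=True)
    # 'left_only' marks key combinations that exist only in `left`.
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Made-up data: rows must match on BOTH col1 and col2 to be dropped.
df1 = pd.DataFrame({'col1': ['a', 'b', 'c'],
                    'col2': [1, 2, 3],
                    'col3': [4, 5, 6]})
df2 = pd.DataFrame({'col1': ['a', 'b'],
                    'col2': [1, 9]})

result = anti_join(df1, df2, on=['col1', 'col2'])
print(result)
```

Here `('b', 2)` survives because `df2` only has `('b', 9)`; only `('a', 1)` matches on both columns and is dropped.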