Suppose I have two dataframes:
d1 = {'col1':['a','b','c'], 'col2':[1,2,3], 'col3':[4,5,6]} df1 = pd.DataFrame(d1) col1 col2 col3 0 a 1 4 1 b 2 5 2 c 3 6
and
d2 = {'col1':['a','b']} df2 = pd.DataFrame(d2) col1 0 a 1 b
I want to use the second df
as reference and drop those rows that exist in df2
from df1
, so the result would be
col1 col2 col3 0 c 3 6
I tried:
df2.merge(df1, how='left', on='col1')
but this gives me the following:
col1 col2 col3 0 a 1 4 1 b 2 5
Advertisement
Answer
Use Series.isin
with inverted mask by ~
in boolean indexing
, working well if need test only one column:
df = df1[~df1['col1'].isin(df2['col1'])] print (df) col1 col2 col3 2 c 3 6
If need test 2 or more columns use DataFrame.merge
with indicator
parameter and then test in if column is not both
:
df = df2.merge(df1, how='outer', on='col1', indicator=True) df = df[df.pop('_merge').ne('both')] print (df) col1 col2 col3 2 c 3 6