I have two dataframes:
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2013-11-24', '2013-11-24', '2013-11-25', '2013-11-25'],
    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery'],
    'Num': [22.1, 8.6, 7.6, 10.2],
    'Color': ['Yellow', 'Orange', 'Green', 'Green'],
})
print(df1)

         Date   Fruit   Num   Color
0  2013-11-24  Banana  22.1  Yellow
1  2013-11-24  Orange   8.6  Orange
2  2013-11-25   Apple   7.6   Green
3  2013-11-25  Celery  10.2   Green

df2 = pd.DataFrame({
    'Date': ['2013-11-25', '2013-11-25', '2013-11-25', '2013-11-25', '2013-11-25', '2013-11-25'],
    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'X', 'Y'],
    'Num': [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'Red', 'Orange'],
})
print(df2)

         Date   Fruit   Num   Color
0  2013-11-25  Banana  22.1  Yellow
1  2013-11-25  Orange   8.6  Orange
2  2013-11-25   Apple   7.6   Green
3  2013-11-25  Celery  10.2   Green
4  2013-11-25       X  22.1     Red
5  2013-11-25       Y   8.6  Orange
I am trying to find the rows that differ between these two dataframes, based on the Fruit column.
This is what I am doing now, but I am not getting the expected output:
mapped_df = pd.concat([df1, df2], ignore_index=True).drop_duplicates(keep=False)
print(mapped_df)
Expected output:

         Date Fruit   Num   Color
8  2013-11-25     X  22.1     Red
9  2013-11-25     Y   8.6  Orange
Answer
You can use a negated isin:
output = df2.loc[~df2['Fruit'].isin(df1['Fruit'])]
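The attempt in the question most likely fails because drop_duplicates compares whole rows by default: Banana and Orange have a different Date in each dataframe, so those rows are not treated as duplicates and stay in the result. The sketch below rebuilds the question's data and shows the isin filter next to a variant of the original concat approach restricted to the Fruit column. The names in_df1 and mapped_df_fixed are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': ['2013-11-24', '2013-11-24', '2013-11-25', '2013-11-25'],
    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery'],
    'Num': [22.1, 8.6, 7.6, 10.2],
    'Color': ['Yellow', 'Orange', 'Green', 'Green'],
})
df2 = pd.DataFrame({
    'Date': ['2013-11-25'] * 6,
    'Fruit': ['Banana', 'Orange', 'Apple', 'Celery', 'X', 'Y'],
    'Num': [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    'Color': ['Yellow', 'Orange', 'Green', 'Green', 'Red', 'Orange'],
})

# Boolean mask: True where a df2 fruit also appears in df1.
in_df1 = df2['Fruit'].isin(df1['Fruit'])

# Keep only the df2 rows whose Fruit is absent from df1.
output = df2.loc[~in_df1]
print(output)

# Variant of the original attempt: compare on Fruit only, not on whole rows.
# This keeps fruits that occur exactly once across both frames.
mapped_df_fixed = (
    pd.concat([df1, df2], ignore_index=True)
      .drop_duplicates(subset='Fruit', keep=False)
)
print(mapped_df_fixed)
```

The isin version keeps df2's own index labels (4 and 5), while the concat version keeps the labels 8 and 9 shown in the expected output. Note that the two are not equivalent in general: the concat variant also returns fruits that exist only in df1, and it drops any fruit that is repeated within a single dataframe.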