I have a similar question as here Comparing two data frames and getting the differences But columns in df1 is a subset of columns in df2.
df1: Date Fruit Num Color 2013-11-24 Banana 22.1 Yellow 2013-11-24 Orange 8.6 Orange 2013-11-24 Apple 7.6 Green 2013-11-24 Celery 10.2 Green df2: Date Fruit Num Color A 2013-11-24 Banana 22.1 Yellow 1 2013-11-24 Orange 8.6 Orange 2 2013-11-24 Apple 7.6 Green 3 2013-11-24 Celery 10.2 Green 4 2013-11-25 Apple 22.1 Red 5 2013-11-25 Orange 8.6 Orange 6
I would like to get the difference the two df by comparing those columns in common only. So the result I expect to get is
Date Fruit Num Color A 4 2013-11-25 Apple 22.1 Red 5 5 2013-11-25 Orange 8.6 Orange 6
Is there a way to do so? Any help is appreciated.
Advertisement
Answer
First you get the column names of df1
df1_columns = df1.columns # ["Date", "Fruit", "Num", "Color"]
Now you create a new df2 dataframe with only df1 columns
df2_filtered = df2[df1_columns]
And now you can apply the solution from this other question.
#concatenate both dataframes df = pd.concat([df1, df2_filtered]) df = df.reset_index(drop=True) #group by df_gpby = df.groupby(list(df.columns)) # get index of unique records idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1] #filter df.reindex(idx)
Hope it helps!