I have a question similar to "Comparing two data frames and getting the differences", but in my case the columns of df1 are a subset of the columns of df2.
    df1:
    Date        Fruit   Num   Color
    2013-11-24  Banana  22.1  Yellow
    2013-11-24  Orange  8.6   Orange
    2013-11-24  Apple   7.6   Green
    2013-11-24  Celery  10.2  Green

    df2:
    Date        Fruit   Num   Color   A
    2013-11-24  Banana  22.1  Yellow  1
    2013-11-24  Orange  8.6   Orange  2
    2013-11-24  Apple   7.6   Green   3
    2013-11-24  Celery  10.2  Green   4
    2013-11-25  Apple   22.1  Red     5
    2013-11-25  Orange  8.6   Orange  6
I would like to get the difference between the two DataFrames by comparing only the columns they have in common. The result I expect is:
       Date        Fruit   Num   Color   A
    4  2013-11-25  Apple   22.1  Red     5
    5  2013-11-25  Orange  8.6   Orange  6
Is there a way to do so? Any help is appreciated.
Answer
First, get the column names of df1:
    df1_columns = df1.columns  # ["Date", "Fruit", "Num", "Color"]
Next, create a new DataFrame from df2 that contains only df1's columns:
    df2_filtered = df2[df1_columns]
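Selecting `df2[df1_columns]` relies on every df1 column also existing in df2, which holds in the question. If that were not guaranteed, one option is to compute the shared column names explicitly first. The sketch below uses hypothetical miniature frames (the `Extra` column is invented purely for illustration) and is only one way to do it:

```python
import pandas as pd

# Hypothetical miniature frames: "Extra" exists only in df1, "A" only in df2.
df1 = pd.DataFrame({"Date": ["2013-11-24"], "Fruit": ["Banana"], "Extra": [0]})
df2 = pd.DataFrame({"Date": ["2013-11-24"], "Fruit": ["Banana"], "A": [1]})

# Keep df1's column order, but only the names that df2 also has.
common = [col for col in df1.columns if col in df2.columns]

# Restrict both frames to the shared columns before comparing them.
df1_common = df1[common]
df2_filtered = df2[common]
```

With the frames from the question, `common` would simply equal df1's column list, so this step changes nothing there.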
Now you can apply the solution from the other question:
    import pandas as pd

    # Stack both frames; a row present in both now occurs twice
    df = pd.concat([df1, df2_filtered])
    df = df.reset_index(drop=True)

    # Group identical rows together
    df_gpby = df.groupby(list(df.columns))

    # Collect the index of every row that occurs exactly once
    idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]

    # Select those rows from the concatenated frame
    df.reindex(idx)
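Note that the groupby approach returns rows from the concatenated frame, so the result has only df1's columns and carries the concatenated index rather than df2's. Column A and the index labels 4 and 5 from the expected output are therefore missing. A possible alternative, sketched here with the question's data, is a left merge with `indicator=True` that is then used as a mask on df2. It assumes exact equality on the shared columns (including the float `Num` column) is the intended comparison; the names `common`, `merged`, and `only_in_df2` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
    "A": [1, 2, 3, 4, 5, 6],
})

# Columns to compare on: df1's columns, which are a subset of df2's.
common = list(df1.columns)

# Left-merge df2 against df1's unique rows. The "_merge" column then says
# whether each df2 row was found in df1 ("both") or not ("left_only").
merged = df2.merge(
    df1[common].drop_duplicates(), on=common, how="left", indicator=True
)

# Because the right side has unique keys, merged has one row per df2 row in
# the same order, so the mask can be applied to df2 by position.
only_in_df2 = df2[(merged["_merge"] == "left_only").to_numpy()]
print(only_in_df2)
```

Filtering df2 itself, rather than the merged frame, is what keeps column A and df2's original index labels in the result.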
Hope it helps!