I have two DFs:
df1:
Date        Fruit   Num   Color
2013-11-24  Banana  22.1  Yellow
2013-11-24  Orange  8.6   Orange
2013-11-24  Apple   7.6   Green
2013-11-24  Celery  10.2  Green

df2:
Date        Fruit   Num   Color
2013-11-24  Banana  22.1  Yellow
2013-11-24  Orange  8.6   Orange
2013-11-24  Apple   7.6   Green
2013-11-24  Celery  10.2  Green
2013-11-25  Apple   22.1  Red
2013-11-25  Orange  8.6   Orange
Now I would like to compare the two DataFrames and add a 'Match' column to df2 that is True when the row's Color value also appears in df1's Color column.
Desired output:
Date        Fruit   Num   Color   Match
2013-11-24  Banana  22.1  Yellow  True
2013-11-24  Orange  8.6   Orange  True
2013-11-24  Apple   7.6   Green   True
2013-11-24  Celery  10.2  Green   True
2013-11-25  Apple   22.1  Red     False
2013-11-25  Orange  8.6   Orange  True
I came up with the following:
df2['Match'] = np.where(df2['Match'] == df1, True, False)
However, I got the following error:
ValueError: Can only compare identically-labeled Series objects
I also tried the following:
flat_user_data['Match'] = np.where(df2['Color'].isin(df1['Color']), True, False)
which raised:
ValueError: Length of values (5) does not match length of index (10798)
Answer
IIUC, use Series.isin:
df2['Match'] = df2['Color'].isin(df1['Color'])
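For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies `Series.isin`. The construction of `df1` and `df2` is my own reconstruction of the sample data (dates are kept as plain strings for simplicity), not the asker's actual loading code.

```python
import pandas as pd

# Sample data reconstructed from the question (dates kept as strings).
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
})

# isin tests membership per row, so the two frames may differ in length.
# The result is a boolean Series aligned to df2's index.
df2["Match"] = df2["Color"].isin(df1["Color"])
print(df2)
```

Only the 'Red' row should come out False, since every other colour in df2 also occurs somewhere in df1. Note that wrapping the result in `np.where(..., True, False)` is not needed, because `isin` already returns booleans.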
Or np.isin:
df2['Match'] = np.isin(df2['Color'], df1['Color'])
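As far as I can tell, the two errors in the question come from different places, and the sketch below tries to reproduce both. It assumes the first error came from an element-wise `==` between Series of different lengths, and the second from assigning the mask to a frame other than `df2`. The names `colors1`, `colors2` and `other` are hypothetical stand-ins (`other` plays the role of `flat_user_data`).

```python
import numpy as np
import pandas as pd

colors1 = pd.Series(["Yellow", "Orange", "Green", "Green"])
colors2 = pd.Series(["Yellow", "Orange", "Green", "Green", "Red", "Orange"])

# Element-wise == requires both Series to carry identical labels;
# here the indexes differ (4 rows vs 6 rows), so pandas raises ValueError.
try:
    colors2 == colors1
    eq_error = None
except ValueError as exc:
    eq_error = str(exc)

# np.isin is a membership test, so lengths do not need to match.
# It returns a plain NumPy boolean array with one entry per row of colors2.
mask = np.isin(colors2, colors1)

# Hypothetical frame with a different number of rows than the mask.
other = pd.DataFrame({"x": range(3)})
try:
    other["Match"] = mask  # 6 values into a 3-row frame
    assign_error = None
except ValueError as exc:
    assign_error = str(exc)

print(eq_error)
print(mask)
print(assign_error)
```

So the `isin` expression in the second attempt looks fine in itself; what probably failed is that the result was assigned to `flat_user_data` instead of `df2`. Because `np.isin` returns a bare array with no index, it is assigned by position, which is why the target frame has to have exactly as many rows as `df2`.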