I have two DFs:
df1: Date Fruit Num Color 2013-11-24 Banana 22.1 Yellow 2013-11-24 Orange 8.6 Orange 2013-11-24 Apple 7.6 Green 2013-11-24 Celery 10.2 Green df2: Date Fruit Num Color 2013-11-24 Banana 22.1 Yellow 2013-11-24 Orange 8.6 Orange 2013-11-24 Apple 7.6 Green 2013-11-24 Celery 10.2 Green 2013-11-25 Apple 22.1 Red 2013-11-25 Orange 8.6 Orange
Now I would like to compare the two dfs and put a column ‘True’ in df2 when the color column of df2 is residing in df1.
desired output:
Date Fruit Num Color Match 2013-11-24 Banana 22.1 Yellow True 2013-11-24 Orange 8.6 Orange True 2013-11-24 Apple 7.6 Green True 2013-11-24 Celery 10.2 Green True 2013-11-25 Apple 22.1 Red False 2013-11-25 Orange 8.6 Orange True
I came up with the following:
df2['Match'] = np.where(df2['Match'] == df1, True, False)
However got the following error:
ValueError: Can only compare identically-labeled Series objects
And tried the following
flat_user_data['Match'] = np.where(df2['Color'].isin(df1['Color']), True, False)
ValueError: Length of values (5) does not match length of index (10798)
Advertisement
Answer
IIUC, Series.isin
:
df2['Match'] = df2['Color'].isin(df1['Color'])
Or np.isin
:
df2['Match'] = np.isin(df2['Color'], df1['Color'])