Skip to content
Advertisement

How to search a dataframe value based on another dataframe value?

I have two DFs:

df1:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green

df2:
Date       Fruit  Num  Color 
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange  8.6 Orange
2013-11-24 Apple   7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple  22.1 Red
2013-11-25 Orange  8.6 Orange

Now I would like to compare the two dfs and put a column ‘True’ in df2 when the color column of df2 is residing in df1.

desired output:

Date       Fruit  Num  Color  Match
2013-11-24 Banana 22.1 Yellow True
2013-11-24 Orange  8.6 Orange True
2013-11-24 Apple   7.6 Green  True
2013-11-24 Celery 10.2 Green  True
2013-11-25 Apple  22.1 Red    False
2013-11-25 Orange  8.6 Orange True

I came up with the following:

df2['Match'] = np.where(df2['Match'] == df1, True, False)

However got the following error:

ValueError: Can only compare identically-labeled Series objects

And tried the following

flat_user_data['Match'] = np.where(df2['Color'].isin(df1['Color']), True, False)

ValueError: Length of values (5) does not match length of index (10798)

Advertisement

Answer

IIUC, Series.isin:

df2['Match'] = df2['Color'].isin(df1['Color'])

Or np.isin:

df2['Match'] = np.isin(df2['Color'], df1['Color'])
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement