Skip to content
Advertisement

How to match multiple columns from two dataframes that have different sizes?

One of the solutions that is similar is found in here where the asker only have a single dataframe and their requirements was to match a fixed string value:

result = df.loc[(df['Col1'] =='Team2') & (df['Col2']=='Medium'), 'Col3'].values[0]

However, the problem I encountered with the .loc method is that it requires the 2 dataframes to have the same size because it will only match values on the same row position of each dataframe. So if the orders of the rows are mixed in either of the dataframes, it will not work as expected.

Sample of this situation is shown below:

df1df1 = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]})

enter image description here

df2df2 = pd.DataFrame({'a': [1, 3, 2], 'b': [4, 6, 5]})

enter image description here

Using, df1.loc[(df1['a'] == df2['a']) & (df1['b'] == df2['b']), 'Status'] = 'Included' will yield:

enter image description here

But i’m looking for something like this:

enter image description here

I have looked into methods such as .lookup but is deprecated as of December of the year 2020 (which also requires similar sized dataframes).

Advertisement

Answer

Use DataFrame.merge with indicator parameter for new column with this information, if need change values e.g. use numpy.where:

df = df1.merge(df2, indicator='status', how='left')
df['status'] = np.where(df['status'].eq('both'), 'included', 'not included')
print (df)
   a  b    status
0  1  4  included
1  2  5  included
2  3  6  included
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement