I asked quite a similar question here but was wondering if there is a way to tackle the issue if one has to rely on multiple columns to perform the append. So the dataframes look as follows,
import pandas as pd d1 ={'col1': ['I ate dinner','I ate dinner', 'the play was inetresting','the play was inetresting'], 'col2': ['I ate dinner','I went to school', 'the play was inetresting for her','the gold is shining'], 'col3': ['I went out','I did not stay at home', 'the play was inetresting for her','the house is nice'], 'col4': ['min', 'max', 'mid','min'], 'col5': ['min', 'max', 'max','max']} d2 ={'col1': ['I ate dinner',' the glass is shattered', 'the play was inetresting'], 'col2': ['I ate dinner',' the weather is nice', 'the gold is shining'], 'col3': ['I went out',' the house was amazing', 'the house is nice'], 'col4': ['min', 'max', 'max'], 'col5': ['max', 'min', 'mid']} df1 = pd.DataFrame(d1) df2 = pd.DataFrame(d2)
so this time, i would like to append the rows from df2 under similar rows in df1 only if the rows are similar in all col1, col2, col3. so the output is,
col1 col2 col3 col4 col5 0 I ate dinner I ate dinner I went out min min 1 I ate dinner I ate dinner I went out min max 2 the play was inetresting the gold is shining the house is nice min max 3 the play was inetresting the gold is shining the house is nice max mid
so I tried the following,
df = pd.concat(df1[df1.set_index(['col1','col2','col3']).index.isin(df2.set_index(['col1','col2','col3']).index)]).sort_values(df1.set_index(['col1','col2','col3']).index, ignore_index=True)
but I get this error,
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
Advertisement
Answer
Another solution is to use pd.merge
and pd.wide_to_long
:
out = ( pd.wide_to_long( pd.merge(df1, df2, how='inner', on=['col1', 'col2', 'col3']).reset_index(), stubnames=['col4', 'col5'], i='index', j='val', sep='_', suffix=r'[xy]') .sort_index().reset_index(drop=True)[df1.columns] )
Output:
>>> out col1 col2 col3 col4 col5 0 I ate dinner I ate dinner I went out min min 1 I ate dinner I ate dinner I went out min max 2 the play was inetresting the gold is shining the house is nice min max 3 the play was inetresting the gold is shining the house is nice max mid
Step by step
# Step 1: merge >>> out = pd.merge(df1, df2, how='inner', on=['col1', 'col2', 'col3']).reset_index() index col1 col2 col3 col4_x col5_x col4_y col5_y 0 0 I ate dinner I ate dinner I went out min min min max 1 1 the play was inetresting the gold is shining the house is nice min max max mid # Step 2: wide_to_long >>> out = pd.wide_to_long(out, stubnames=['col4', 'col5'], i='index', j='val', sep='_', suffix=r'[xy]') col3 col2 col1 col4 col5 index val 0 x I went out I ate dinner I ate dinner min min 1 x the house is nice the gold is shining the play was inetresting min max 0 y I went out I ate dinner I ate dinner min max 1 y the house is nice the gold is shining the play was inetresting max mid # Step 3: reorder dataframe >>> out = out.sort_index().reset_index(drop=True)[df1.columns] col1 col2 col3 col4 col5 0 I ate dinner I ate dinner I went out min min 1 I ate dinner I ate dinner I went out min max 2 the play was inetresting the gold is shining the house is nice min max 3 the play was inetresting the gold is shining the house is nice max mid