I asked quite a similar question here but was wondering if there is a way to tackle the issue if one has to rely on multiple columns to perform the append. So the dataframes look as follows,
JavaScript
x
16
16
1
import pandas as pd
2
d1 ={'col1': ['I ate dinner','I ate dinner', 'the play was inetresting','the play was inetresting'],
3
'col2': ['I ate dinner','I went to school', 'the play was inetresting for her','the gold is shining'],
4
'col3': ['I went out','I did not stay at home', 'the play was inetresting for her','the house is nice'],
5
'col4': ['min', 'max', 'mid','min'],
6
'col5': ['min', 'max', 'max','max']}
7
8
d2 ={'col1': ['I ate dinner',' the glass is shattered', 'the play was inetresting'],
9
'col2': ['I ate dinner',' the weather is nice', 'the gold is shining'],
10
'col3': ['I went out',' the house was amazing', 'the house is nice'],
11
'col4': ['min', 'max', 'max'],
12
'col5': ['max', 'min', 'mid']}
13
14
df1 = pd.DataFrame(d1)
15
df2 = pd.DataFrame(d2)
16
so this time, i would like to append the rows from df2 under similar rows in df1 only if the rows are similar in all col1, col2, col3. so the output is,
JavaScript
1
6
1
col1 col2 col3 col4 col5
2
0 I ate dinner I ate dinner I went out min min
3
1 I ate dinner I ate dinner I went out min max
4
2 the play was inetresting the gold is shining the house is nice min max
5
3 the play was inetresting the gold is shining the house is nice max mid
6
so I tried the following,
JavaScript
1
2
1
df = pd.concat(df1[df1.set_index(['col1','col2','col3']).index.isin(df2.set_index(['col1','col2','col3']).index)]).sort_values(df1.set_index(['col1','col2','col3']).index, ignore_index=True)
2
but I get this error,
JavaScript
1
2
1
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
2
Advertisement
Answer
Another solution is to use pd.merge
and pd.wide_to_long
:
JavaScript
1
7
1
out = (
2
pd.wide_to_long(
3
pd.merge(df1, df2, how='inner', on=['col1', 'col2', 'col3']).reset_index(),
4
stubnames=['col4', 'col5'], i='index', j='val', sep='_', suffix=r'[xy]')
5
.sort_index().reset_index(drop=True)[df1.columns]
6
)
7
Output:
JavaScript
1
7
1
>>> out
2
col1 col2 col3 col4 col5
3
0 I ate dinner I ate dinner I went out min min
4
1 I ate dinner I ate dinner I went out min max
5
2 the play was inetresting the gold is shining the house is nice min max
6
3 the play was inetresting the gold is shining the house is nice max mid
7
Step by step
JavaScript
1
23
23
1
# Step 1: merge
2
>>> out = pd.merge(df1, df2, how='inner', on=['col1', 'col2', 'col3']).reset_index()
3
index col1 col2 col3 col4_x col5_x col4_y col5_y
4
0 0 I ate dinner I ate dinner I went out min min min max
5
1 1 the play was inetresting the gold is shining the house is nice min max max mid
6
7
# Step 2: wide_to_long
8
>>> out = pd.wide_to_long(out, stubnames=['col4', 'col5'], i='index', j='val', sep='_', suffix=r'[xy]')
9
col3 col2 col1 col4 col5
10
index val
11
0 x I went out I ate dinner I ate dinner min min
12
1 x the house is nice the gold is shining the play was inetresting min max
13
0 y I went out I ate dinner I ate dinner min max
14
1 y the house is nice the gold is shining the play was inetresting max mid
15
16
# Step 3: reorder dataframe
17
>>> out = out.sort_index().reset_index(drop=True)[df1.columns]
18
col1 col2 col3 col4 col5
19
0 I ate dinner I ate dinner I went out min min
20
1 I ate dinner I ate dinner I went out min max
21
2 the play was inetresting the gold is shining the house is nice min max
22
3 the play was inetresting the gold is shining the house is nice max mid
23