I would like to join two files by key but I would like some columns to be joined together for example:
File1:
JavaScript
x
10
10
1
df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
2
'Color' : ['red', 'red', 'blue','yellow', 'red']})
3
4
List Color
5
0 P111 red
6
1 P999 red
7
2 P111;P999;P777 blue
8
3 P555 yellow
9
4 P666;P111;P333 red
10
File2:
JavaScript
1
13
13
1
df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
2
'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
3
'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})
4
5
Cod Animal Letter
6
0 P111 DOG A,F
7
1 P222 CAT C
8
2 P333 BUG S,M
9
3 P444 SNAKE,DOG F,L
10
4 P555 CAT,BUG C,A
11
5 P666 DOG M,C
12
6 P777 SNAKE Z,L
13
I would like to merge with primary key List (from file1) and Cod (from file2), to get:
JavaScript
1
7
1
List Color Animal Letter
2
0 P111 red DOG,FROG A,F
3
1 P999 red - -
4
2 P111;P999;P777 blue DOG,FROG|-|SNAKE A,F|-|Z,L
5
3 P555 yellow CAT,BUG C,A
6
4 P666;P111;P333 red DOG|DOG,FROG|BUG M,C|A,F|S,M
7
I think we need something like a left join and an agragation but I don’t know how. In the final table I would like that in the values not found there was a – . While I would like a | to separate values in “aggregated” columns
Which is the best way?
Advertisement
Answer
Idea is use DataFrame.explode
by splitted values of List
, then use left join and aggregate first values with join
for expected ouput:
JavaScript
1
19
19
1
df = (df1.assign(Cod = df1['List'].str.split(';'))
2
.explode('Cod')
3
.reset_index()
4
.merge(df2, how='left', on='Cod')
5
.fillna('-')
6
.groupby('index')
7
.agg(List=('List','first'),
8
Color=('Color','first'),
9
Animal=('Animal','|'.join),
10
Letter=('Letter','|'.join))
11
.rename_axis(None))
12
print (df)
13
List Color Animal Letter
14
0 P111 red DOG A,F
15
1 P999 red - -
16
2 P111;P999;P777 blue DOG|-|SNAKE A,F|-|Z,L
17
3 P555 yellow CAT,BUG C,A
18
4 P666;P111;P333 red DOG|DOG|BUG M,C|A,F|S,M
19