Skip to content
Advertisement

Merge two rows and put the results in the same columns

I would like to join two files by key but I would like some columns to be joined together for example:

File1:

    df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
               'Color' : ['red', 'red', 'blue','yellow', 'red']})

        List        Color
0  P111              red
1  P999              red
2  P111;P999;P777    blue
3  P555              yellow
4  P666;P111;P333    red

File2:

    df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
               'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
               'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})

    Cod     Animal     Letter
 0  P111    DOG         A,F
 1  P222    CAT         C
 2  P333    BUG         S,M
 3  P444    SNAKE,DOG   F,L
 4  P555    CAT,BUG     C,A
 5  P666    DOG         M,C
 6  P777    SNAKE       Z,L

I would like to merge with primary key List (from file1) and Cod (from file2), to get:

     List           Color     Animal             Letter
0  P111              red      DOG,FROG           A,F
1  P999              red      -                  -
2  P111;P999;P777    blue     DOG,FROG|-|SNAKE   A,F|-|Z,L
3  P555              yellow   CAT,BUG            C,A 
4  P666;P111;P333    red      DOG|DOG,FROG|BUG   M,C|A,F|S,M

I think we need something like a left join and an agragation but I don’t know how. In the final table I would like that in the values ​​not found there was a . While I would like a | to separate values ​​in “aggregated” columns

Which is the best way?

Advertisement

Answer

Idea is use DataFrame.explode by splitted values of List, then use left join and aggregate first values with join for expected ouput:

df = (df1.assign(Cod = df1['List'].str.split(';'))
         .explode('Cod')
         .reset_index()
         .merge(df2, how='left', on='Cod')
         .fillna('-')
         .groupby('index')
         .agg(List=('List','first'),
              Color=('Color','first'),
              Animal=('Animal','|'.join),
              Letter=('Letter','|'.join))
         .rename_axis(None))
print (df)
             List   Color       Animal       Letter
0            P111     red          DOG          A,F
1            P999     red            -            -
2  P111;P999;P777    blue  DOG|-|SNAKE    A,F|-|Z,L
3            P555  yellow      CAT,BUG          C,A
4  P666;P111;P333     red  DOG|DOG|BUG  M,C|A,F|S,M
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement