Pandas merging/joining tables with multiple key columns and duplicating rows where necessary

Question

I have several tables that contain lab results, with a 'master' table of sample data with things like a description. The results tables are also broken down by specimen (sub-samples). They contain multiple results columns - I'm just showing one here. I want to combine all the results tables into one dataframe, like this: I currently have a solution for

Accepted Answer

The first idea is to stack df2 and df3 with concat, aggregate the rows that share the same 'Location', 'Sample', 'Specimen' keys with sum, and finally merge the result with df1:

    import pandas as pd

    # Stack both results tables, then collapse duplicate key rows.
    # min_count=1 leaves NaN where a group has no values (a plain sum would give 0).
    df23 = (pd.concat([df2, df3])
              .groupby(['Location','Sample','Specimen'], as_index=False, sort=False)
              .sum(min_count=1))
    # Attach the sample-level columns; df1 rows repeat once per specimen.
    df = df1.merge(df23, on=['Location','Sample'])
    print(df)

       Location Sample Description Specimen  Result1  Result2
    0         1      A      Yellow        x      5.0      4.0
    1         1      A      Yellow        y      6.0      NaN
    2         1      A      Yellow        q      NaN      6.0
    3         1      B         Red        x     10.0      NaN
    4         1      B         Red        k      NaN      8.0
    5         2      A        Blue        x      1.0      NaN
    6         2      B      Violet        z      NaN      5.0

Or, if every combination of ['Location','Sample','Specimen'] is unique within df2 and within df3, the solution is simpler:

    # Align the results tables side by side on the three key columns.
    df23 = pd.concat([df2.set_index(['Location','Sample','Specimen']),
                      df3.set_index(['Location','Sample','Specimen'])], axis=1)
    df = df1.merge(df23.reset_index(), on=['Location','Sample'])
    print(df)

       Location Sample Description Specimen  Result1  Result2
    0         1      A      Yellow        q      NaN      6.0
    1         1      A      Yellow        x      5.0      4.0
    2         1      A      Yellow        y      6.0      NaN
    3         1      B         Red        k      NaN      8.0
    4         1      B         Red        x     10.0      NaN
    5         2      A        Blue        x      1.0      NaN
    6         2      B      Violet        z      NaN      5.0

EDIT: With the new data, the second solution works well:

    df23 = pd.concat([df2.set_index(['Location','Sample','Specimen']),
                      df3.set_index(['Location','Sample','Specimen'])], axis=1)
    df = df1.merge(df23.reset_index(), on=['Location','Sample'])
    print(df)

       Location Sample Description Specimen  Result1  Result2
    0         1      A      Yellow        q      NaN     Soft
    1         1      A      Yellow        x      5.0    Heavy
    2         1      A      Yellow        y      6.0      NaN
    3         1      B         Red        k      NaN     Grey
    4         1      B         Red        x     10.0      NaN
    5         2      A        Blue        x      1.0      NaN
    6         2      B      Violet        z      NaN  Bananas
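
For anyone who wants to try both approaches end to end, here is a minimal, self-contained sketch. The input frames are an assumption: the question's original tables are not shown here, so df1, df2 and df3 are reconstructed from the numeric output printed in the answer. The `keys` list and the duplicate check are additions for illustration and are not part of the original answer. Because the row order that `concat(axis=1)` produces may vary between pandas versions, the sketch sorts explicitly.

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the output printed in the answer.
df1 = pd.DataFrame({
    "Location": [1, 1, 2, 2],
    "Sample": ["A", "B", "A", "B"],
    "Description": ["Yellow", "Red", "Blue", "Violet"],
})
df2 = pd.DataFrame({
    "Location": [1, 1, 1, 2],
    "Sample": ["A", "A", "B", "A"],
    "Specimen": ["x", "y", "x", "x"],
    "Result1": [5, 6, 10, 1],
})
df3 = pd.DataFrame({
    "Location": [1, 1, 1, 2],
    "Sample": ["A", "A", "B", "B"],
    "Specimen": ["x", "q", "k", "z"],
    "Result2": [4, 6, 8, 5],
})

keys = ["Location", "Sample", "Specimen"]

# The concat(axis=1) approach assumes each key combination appears
# at most once per results table, so check that first.
for name, frame in [("df2", df2), ("df3", df3)]:
    if frame.duplicated(subset=keys).any():
        raise ValueError(f"{name} has repeated {keys} rows; use the groupby approach")

# Approach 2: align the results tables on the three keys.
df23 = pd.concat([df2.set_index(keys), df3.set_index(keys)], axis=1).reset_index()

# Attach the sample description; each df1 row repeats once per specimen.
combined = (
    df1.merge(df23, on=["Location", "Sample"])
       .sort_values(keys)
       .reset_index(drop=True)
)
print(combined)

# Approach 1: stack the tables and collapse rows that share the same keys.
# min_count=1 keeps NaN for groups with no values instead of summing to 0.
df23_sum = (
    pd.concat([df2, df3])
      .groupby(keys, as_index=False, sort=False)
      .sum(min_count=1)
)
combined_sum = df1.merge(df23_sum, on=["Location", "Sample"])
print(combined_sum)
```

On this sample data both approaches should give the same rows, differing only in order. The groupby version aggregates with sum, so it is probably best kept for numeric result columns; for text results like those in the EDIT, the concat(axis=1) version avoids aggregating at all.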

Answer