Skip to content
Advertisement

Pandas merging/joining tables with multiple key columns and duplicating rows where necessary

I have several tables that contain lab results, with a ‘master’ table of sample data with things like a description. The results tables are also broken down by specimen (sub-samples). They contain multiple results columns – I’m just showing one here. I want to combine all the results tables into one dataframe, like this:

JavaScript

I currently have a solution for this using iterrows() and df.append(), but these are both slow operations and when there are thousands of results it takes too long. Is there better way? I have tried using join() and merge() but I can’t seem to get the result I want.

Quick code to reproduce my dataframes:

JavaScript

Advertisement

Answer

First idea is join df2, df3 together by concat and for unique 'Location','Sample','Specimen' rows are rows aggregated by sum, last merge to df1:

JavaScript

Or if all rows in df2,df3 per columns ['Location','Sample','Specimen'] are unique, solution is simplier:

JavaScript

EDIT: With new data second solution working well:

JavaScript
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement