Combine dataframes based on multiple conditions in Python

Question

Table A, Table B and Table C are example tables that represent dataframes that I'm going to create from separate Excel sheets. Basically there's a many-to-many relationship going on, and I want to be able to create a combined sheet that will roll up the "amount" total (from Table A) for each year of the item (Table B).

Accepted Answer

IIUC, you can use a merge and post-process to remove the duplicates per year:

    out = (dfB
        .merge(dfA.rename(columns={'Item': 'item'})
                  .groupby(['item', 'year'], as_index=False).sum(), how='left')
        .assign(amount=lambda d: d['amount']
                                 .mask(d.groupby('year').cumcount().gt(0), 0)
                                 .fillna(0)
               )
    )

output:

      item  year desc  amount
    0    A  2011  xxx   210.0
    1    A  2011  xxx     0.0
    2    A  2012  xxx   150.0
    3    B  2011  xxx     0.0
    4    B  2012  xxx     0.0
    5    B  2013  xxx    54.0
    6    B  2014  xxx     0.0
    7    C  2020  xxx    55.0
    8    D  2022  xxx    68.0
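To see what each step of that chain contributes, here is a minimal sketch that splits it into named stages. The sample data is made up for illustration (it is not the asker's tables), and the intermediate names `totals`, `merged` and `is_repeat` are hypothetical helpers, not part of the original answer.

```python
import pandas as pd

# Hypothetical stand-ins for Table A (amounts) and Table B (items per year).
dfA = pd.DataFrame({
    'Item':   ['A', 'A', 'A', 'B'],
    'year':   [2011, 2011, 2012, 2013],
    'amount': [100, 110, 150, 54],
})
dfB = pd.DataFrame({
    'item': ['A', 'A', 'A', 'B', 'B'],
    'year': [2011, 2011, 2012, 2013, 2014],
    'desc': ['xxx'] * 5,
})

# Step 1: align the column name and roll up the amount per (item, year).
totals = (dfA.rename(columns={'Item': 'item'})
             .groupby(['item', 'year'], as_index=False)['amount'].sum())

# Step 2: left merge so every row of dfB is kept, matched or not.
merged = dfB.merge(totals, on=['item', 'year'], how='left')

# Step 3: flag every row after the first within each year, as in the answer.
is_repeat = merged.groupby('year').cumcount().gt(0)

# Step 4: zero out repeated rows, then fill unmatched rows (NaN) with 0.
out = merged.assign(amount=merged['amount'].mask(is_repeat, 0).fillna(0))

print(out)
```

Note that `groupby('year')` numbers rows within each year regardless of item, exactly as in the answer. If several items can share a year in your data, grouping on `['item', 'year']` in step 3 may be closer to what you want.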

Answer