Table A
Item amount year A 100 2011 A 110 2011 A 125 2012 A 25 2012 B 54 2013 C 55 2020 D 68 2022
Table B
item year desc A 2011 xxx A 2011 xxx A 2012 xxx B 2011 xxx B 2012 xxx B 2013 xxx B 2014 xxx C 2020 xxx D 2022 xxx
Table C
item year desc Total A 2011 xxx 220 A 2011 xxx 0 A 2012 xxx 150 B 2011 xxx 0 B 2012 xxx 0 B 2013 xxx 54 B 2014 xxx 0 C 2020 xxx 55 D 2022 xxx 68
These are example tables that represent dataframes that I’m going to create from seperate excel sheets.
Basically there’s a many to many relationship going on and I want to be able to create a combined sheet that will roll up the “amount” total (from Table A) for each year of the item (Table B). I don’t want to duplicate the amounts for multiple items with the same year as you see in table A, item A has two 2011 entries.
What I’d like to get is a result like Table C below
I’m currently trying to use pandas merge but not having much luck. Can anyone lend a hand?
Advertisement
Answer
IIUC, you can use a merge
and post-process to remove the duplicates per year:
out = (dfB .merge(dfA.rename(columns={'Item': 'item'}) .groupby(['item', 'year'], as_index=False).sum(), how='left') .assign(amount=lambda d: d['amount'] .mask(d.groupby('year').cumcount().gt(0), 0) .fillna(0) ) )
output:
item year desc amount 0 A 2011 xxx 210.0 1 A 2011 xxx 0.0 2 A 2012 xxx 150.0 3 B 2011 xxx 0.0 4 B 2012 xxx 0.0 5 B 2013 xxx 54.0 6 B 2014 xxx 0.0 7 C 2020 xxx 55.0 8 D 2022 xxx 68.0