Table A
JavaScript
x
9
1
Item amount year
2
A 100 2011
3
A 110 2011
4
A 125 2012
5
A 25 2012
6
B 54 2013
7
C 55 2020
8
D 68 2022
9
Table B
JavaScript
1
11
11
1
item year desc
2
A 2011 xxx
3
A 2011 xxx
4
A 2012 xxx
5
B 2011 xxx
6
B 2012 xxx
7
B 2013 xxx
8
B 2014 xxx
9
C 2020 xxx
10
D 2022 xxx
11
Table C
JavaScript
1
11
11
1
item year desc Total
2
A 2011 xxx 220
3
A 2011 xxx 0
4
A 2012 xxx 150
5
B 2011 xxx 0
6
B 2012 xxx 0
7
B 2013 xxx 54
8
B 2014 xxx 0
9
C 2020 xxx 55
10
D 2022 xxx 68
11
These are example tables that represent dataframes that I’m going to create from seperate excel sheets.
Basically there’s a many to many relationship going on and I want to be able to create a combined sheet that will roll up the “amount” total (from Table A) for each year of the item (Table B). I don’t want to duplicate the amounts for multiple items with the same year as you see in table A, item A has two 2011 entries.
What I’d like to get is a result like Table C below
I’m currently trying to use pandas merge but not having much luck. Can anyone lend a hand?
Advertisement
Answer
IIUC, you can use a merge
and post-process to remove the duplicates per year:
JavaScript
1
9
1
out = (dfB
2
.merge(dfA.rename(columns={'Item': 'item'})
3
.groupby(['item', 'year'], as_index=False).sum(), how='left')
4
.assign(amount=lambda d: d['amount']
5
.mask(d.groupby('year').cumcount().gt(0), 0)
6
.fillna(0)
7
)
8
)
9
output:
JavaScript
1
11
11
1
item year desc amount
2
0 A 2011 xxx 210.0
3
1 A 2011 xxx 0.0
4
2 A 2012 xxx 150.0
5
3 B 2011 xxx 0.0
6
4 B 2012 xxx 0.0
7
5 B 2013 xxx 54.0
8
6 B 2014 xxx 0.0
9
7 C 2020 xxx 55.0
10
8 D 2022 xxx 68.0
11