I have a database with more than 300 duplicates that look like this:
rate shipment_id original_cost
8.14 500410339210 5.93
7.81 500410339221 5.93
8.53 500410339232 7.07
8.53 500410339243 14.31
2.76 500410345319 68.87

8.46 987506030619 7.36
8.46 987506030620 7.36
7.32 987506030630 6.80
27.82 997311250164181 144.44
7.32 997355250064942 19.83
For each duplicate shipment_id, I want only the original_cost values to be added together, while the rates stay as they are. Take these duplicates, for example:
rate shipment_id original_cost
3.06 926401748430 2.54
3.06 926401748430 19.60
16.34 926401748430 2.54
16.34 926401748430 19.60
For the duplicates above, the result should look something like this:
rate shipment_id original_cost
3.06 926401748430 22.14
3.06 926401748430 22.14
16.34 926401748430 22.14
16.34 926401748430 22.14
Is there any way to do this?
Answer
Group by the duplicate values (['shipment_id', 'rate']) and use transform on the "original_cost" column to calculate the sum:
df['original_cost'] = df.groupby(['shipment_id', 'rate'])['original_cost'].transform('sum')
Example input:
   rate shipment_id original_cost
0 3.06 926401748430 2.54
1 3.06 926401748430 19.60
2 16.34 926401748430 2.54
3 16.34 926401748430 19.60
Example output:
   rate shipment_id original_cost
0 3.06 926401748430 22.14
1 3.06 926401748430 22.14
2 16.34 926401748430 22.14
3 16.34 926401748430 22.14
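To try this end to end, here is a minimal, self-contained sketch that rebuilds the four duplicate rows from the question and applies the same groupby/transform call. The helper name `sum_cost_within_group` is only an illustration and not part of pandas. It works on a copy so the original frame stays untouched.

```python
import pandas as pd

# The four duplicate rows from the question.
df = pd.DataFrame({
    "rate": [3.06, 3.06, 16.34, 16.34],
    "shipment_id": [926401748430] * 4,
    "original_cost": [2.54, 19.60, 2.54, 19.60],
})


def sum_cost_within_group(frame):
    """Hypothetical helper: replace original_cost with its per-group sum.

    Groups are defined by (shipment_id, rate). transform('sum') returns a
    result aligned with the input rows, so no rows are dropped and the
    rate column is left unchanged.
    """
    out = frame.copy()
    out["original_cost"] = (
        out.groupby(["shipment_id", "rate"])["original_cost"].transform("sum")
    )
    return out


result = sum_cost_within_group(df)
print(result)
```

Note that grouping by 'shipment_id' alone would add up all four costs for this shipment. Including 'rate' in the group keys is what keeps the sum at 2.54 + 19.60 for each rate, which matches the output requested in the question.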