I have a DataFrame with more than 300 duplicates that look like this:
 rate       shipment_id  original_cost
 8.14      500410339210           5.93
 7.81      500410339221           5.93
 8.53      500410339232           7.07
 8.53      500410339243          14.31
 2.76      500410345319          68.87
  ...               ...            ...
 8.46      987506030619           7.36
 8.46      987506030620           7.36
 7.32      987506030630           6.80
27.82   997311250164181         144.44
 7.32   997355250064942          19.83
For each duplicate shipment_id, I want only the original_cost values to be summed together, while the rates remain as they are.
For example, these duplicates:

 rate   shipment_id  original_cost
 3.06  926401748430           2.54
 3.06  926401748430          19.60
16.34  926401748430           2.54
16.34  926401748430          19.60

should look something like this:

 rate   shipment_id  original_cost
 3.06  926401748430          22.14
 3.06  926401748430          22.14
16.34  926401748430          22.14
16.34  926401748430          22.14
Is there any way to do this?
Answer
Group by the duplicated columns (['shipment_id', 'rate']) and use transform on the "original_cost" column to calculate the sum. Unlike agg, transform broadcasts each group's sum back to every row of that group, so the DataFrame keeps its original shape:
df['original_cost'] = df.groupby(['shipment_id', 'rate'])['original_cost'].transform('sum')
Example input:
    rate   shipment_id  original_cost
0   3.06  926401748430           2.54
1   3.06  926401748430          19.60
2  16.34  926401748430           2.54
3  16.34  926401748430          19.60
Example output:
    rate   shipment_id  original_cost
0   3.06  926401748430          22.14
1   3.06  926401748430          22.14
2  16.34  926401748430          22.14
3  16.34  926401748430          22.14
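For reference, here is a minimal self-contained sketch of the whole thing; the data is taken from the example above, and the DataFrame name df is just illustrative:

import pandas as pd

# Example data from the question
df = pd.DataFrame({
    'rate': [3.06, 3.06, 16.34, 16.34],
    'shipment_id': [926401748430] * 4,
    'original_cost': [2.54, 19.60, 2.54, 19.60],
})

# transform('sum') computes the sum per (shipment_id, rate) group
# and broadcasts it back to every row, so the shape is unchanged.
df['original_cost'] = (
    df.groupby(['shipment_id', 'rate'])['original_cost'].transform('sum')
)

print(df)
#     rate   shipment_id  original_cost
# 0   3.06  926401748430          22.14
# 1   3.06  926401748430          22.14
# 2  16.34  926401748430          22.14
# 3  16.34  926401748430          22.14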