Summing duplicate rows

Question

I have a database with more than 300 duplicate rows. For each duplicated shipment_id, I want only original_cost to be added together, while the rates remain as they are. Is there any way to do this?

Accepted Answer

Group by the columns that identify a duplicate (['shipment_id', 'rate']) and use transform on the "original_cost" column to calculate the sum:

    df['original_cost'] = df.groupby(['shipment_id', 'rate'])['original_cost'].transform('sum')

Example input:

        rate   shipment_id  original_cost
    0   3.06  926401748430          22.14
    1   3.06  926401748430          22.14
    2  16.34  926401748430          22.14
    3  16.34  926401748430          22.14

Example output:

        rate   shipment_id  original_cost
    0   3.06  926401748430          44.28
    1   3.06  926401748430          44.28
    2  16.34  926401748430          44.28
    3  16.34  926401748430          44.28
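For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the example input from the answer and applies the same groupby/transform call. The last step, which keeps one row per (shipment_id, rate) pair via drop_duplicates, is an assumption about what the asker may ultimately want and is not part of the accepted answer; the name `collapsed` is purely illustrative.

```python
import pandas as pd

# Sample data copied from the example input in the answer.
df = pd.DataFrame({
    "rate": [3.06, 3.06, 16.34, 16.34],
    "shipment_id": [926401748430] * 4,
    "original_cost": [22.14, 22.14, 22.14, 22.14],
})

# Sum original_cost within each (shipment_id, rate) group. transform
# broadcasts the group total back onto every row, so the row count stays the same.
df["original_cost"] = (
    df.groupby(["shipment_id", "rate"])["original_cost"].transform("sum")
)

# Optional follow-up (an assumption, not part of the answer): keep a single
# row per (shipment_id, rate) pair now that each row carries the group total.
collapsed = (
    df.drop_duplicates(subset=["shipment_id", "rate"])
      .reset_index(drop=True)
)

print(df)
print(collapsed)
```

Because transform returns a result aligned with the original index, every row ends up carrying its group's summed original_cost, and the rate column is left untouched. If only one row per group is needed, the drop_duplicates step (or a plain groupby(...).sum()) would be one way to get there.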
