Summing up all repeated values in a dataset

I have a dataset in which one column holds the name of a person and another column holds the amount they were paid for a given service. I’d like to build a list with the names of all people ordered by the total amount they were paid, regardless of the service they performed. Example:

Ann     100
John    200
Matt    150
John    150
John    150
Ann     300
Erik    150

===========
John    500
Ann     400
Matt    150
Erik    150

I figured this involves looking for all repeated instances of each person's name, collecting the values paid in the other column, and eventually summing everything up. The problem is that my list is too big to check individual names. That is, I can’t define a particular string for each name to be checked; rather, I’d like the program to find the repeated instances by itself and return the ordered list in the manner I described. Is there any way to do this? I know a bit of Python and R, so any method described in these languages would be particularly helpful.

Answer

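If you load your dataset into a pandas DataFrame, this is easily done with groupby: group the rows by name, sum the amounts within each group, and sort the totals in descending order.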

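import pandas as pd

# names and paid are the two columns of the dataset, as equal-length lists
# (filled here with the example values from the question)
names = ['Ann', 'John', 'Matt', 'John', 'John', 'Ann', 'Erik']
paid = [100, 200, 150, 150, 150, 300, 150]
df = pd.DataFrame({'name': names, 'paid': paid})

# Sum the payments per name, then sort from highest to lowest total
total_pay = df.groupby(by='name').sum().sort_values('paid', ascending=False)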
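
If you would rather not depend on pandas, the same idea can be sketched with the standard library alone. This is only an illustration, not part of the approach above: the function name total_paid_by_name and the rows list are hypothetical, and the sketch assumes the data is already available as (name, amount) pairs.

```python
from collections import defaultdict


def total_paid_by_name(rows):
    """Return (name, total) pairs sorted from highest to lowest total.

    rows is assumed to be an iterable of (name, amount) pairs.
    """
    totals = defaultdict(int)
    for name, amount in rows:
        # Every occurrence of a name adds to that name's running total,
        # so no name has to be spelled out in advance.
        totals[name] += amount
    # sorted() is stable, so names with equal totals keep first-seen order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Example data taken from the question
rows = [
    ('Ann', 100), ('John', 200), ('Matt', 150), ('John', 150),
    ('John', 150), ('Ann', 300), ('Erik', 150),
]

ranking = total_paid_by_name(rows)
for name, total in ranking:
    print(name, total)
```

For the example data this prints John 500, Ann 400, Matt 150, Erik 150, which is the list asked for in the question. For a large dataset the pandas version is likely to be more convenient, since it also handles reading the file.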
User contributions licensed under: CC BY-SA