I have a dataset in which one column holds the name of a person and another column holds the amount that person was paid for a given service. I'd like to build a list with the names of all people ordered by the total amount they were paid, regardless of the service they performed. Example:
Ann 100
John 200
Matt 150
John 150
John 150
Ann 300
Erik 150
===========
John 500
Ann 400
Matt 150
Erik 150
I figured this involves finding all repeated instances of each person's name and summing the amounts paid in the other column. The problem is that my list is too big to check individual names. That is, I can't define a particular string for each name to be checked; rather, I'd like the program to find the repeated instances by itself and return the ordered list in the manner I described. Is there any way to do this? I know a bit of Python and R, so any method described in these languages would be particularly helpful.
Answer
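If you load your dataset into a pandas DataFrame, this is easily done with groupby followed by a sort:
import pandas as pd
# names and paid are your two columns, as equal-length lists
df = pd.DataFrame({'name': names, 'paid': paid})
# sum the payments per person, then order from highest to lowest total
total_pay = df.groupby('name')['paid'].sum().sort_values(ascending=False)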
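If you'd rather not depend on pandas, the same idea can be sketched with only the standard library. This is a different technique from the groupby above (a plain dictionary accumulator), shown with the sample rows from the question; the variable names rows, totals and ranked are just illustrative.

```python
from collections import defaultdict

# Sample (name, amount) rows taken from the question.
rows = [
    ("Ann", 100), ("John", 200), ("Matt", 150), ("John", 150),
    ("John", 150), ("Ann", 300), ("Erik", 150),
]

# Accumulate a running total per name; names never seen before start at 0,
# so there is no need to list the names in advance.
totals = defaultdict(int)
for name, amount in rows:
    totals[name] += amount

# Order by total, highest first. Ties keep the order of first appearance.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for name, total in ranked:
    print(name, total)
```

For a large dataset the pandas version will probably be more convenient, but this shows that the program can discover the repeated names by itself in a single pass.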