I have a dataset in which one column holds a person's name and another column holds the amount she was paid for a given service. I'd like to build a list of all the people, ordered by the total amount they were paid regardless of the service they performed. Example:
Ann 100
John 200
Matt 150
John 150
John 150
Ann 300
Erik 150
===========
John 500
Ann 400
Matt 150
Erik 150
I figured this involves finding all repeated instances of each person's name, collecting the amounts paid in the other column, and eventually summing everything up. The problem is that my list is too big to check individual names. That is, I can't define a particular string for each name to be checked; rather, I'd like the program to find the repeated names by itself and return the ordered list in the manner I described. Is there any way to do this? I know a bit of Python and R, so any method described in these languages would be particularly helpful.
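The idea described above (let the program discover each name on its own and accumulate its payments) can be sketched in plain Python with only the standard library. This is a minimal sketch, assuming the two columns have already been read into (name, amount) pairs; the variable name rows is hypothetical and holds the example data typed in by hand.

```python
from collections import defaultdict

# Hypothetical input: the example data as (name, amount) pairs
rows = [
    ("Ann", 100), ("John", 200), ("Matt", 150), ("John", 150),
    ("John", 150), ("Ann", 300), ("Erik", 150),
]

# Accumulate each person's total; new names are picked up automatically,
# so no name ever has to be spelled out in the code
totals = defaultdict(int)
for name, amount in rows:
    totals[name] += amount

# Sort people by total, highest first (ties keep their first-seen order,
# because Python's sort is stable even with reverse=True)
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for name, total in ranking:
    print(name, total)
```

With the example data this prints John 500, Ann 400, Matt 150, Erik 150, matching the expected result above.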
Answer
If you load your dataset into a pandas DataFrame, this is easily done with groupby:
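import pandas as pd

# names and paid are equal-length lists taken from your two columns
df = pd.DataFrame({'name': names, 'paid': paid})
# Sum the payments per person, then order from highest to lowest total
total_pay = df.groupby(by='name').sum().sort_values(by='paid', ascending=False)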
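As a usage sketch, here is the groupby approach applied to the example data from the question, with the totals sorted from highest to lowest. The names and paid lists are typed in by hand from the example; in practice they would come from your dataset (for instance a file loaded with pd.read_csv).

```python
import pandas as pd

# Example data from the question, typed in by hand
names = ["Ann", "John", "Matt", "John", "John", "Ann", "Erik"]
paid = [100, 200, 150, 150, 150, 300, 150]

df = pd.DataFrame({"name": names, "paid": paid})

# Total paid per person (a Series indexed by name), sorted highest first
total_pay = (
    df.groupby("name")["paid"]
    .sum()
    .sort_values(ascending=False)
)

print(total_pay)
```

John (500) comes first and Ann (400) second. Matt and Erik are tied at 150, so their relative order depends on the sorting algorithm; passing kind="stable" to sort_values is one way to make tie handling predictable.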