
Pandas Dataframe – Sum values for a specific date then divide by the count of that date

I have a pandas dataframe with several dates and several values for each date. I'm trying to sum the values for each date, then divide by the number of records for that same date.

Example:

date value
2022-09-16 1
2022-09-16 2
2022-09-16 3
2022-09-15 6
2022-09-15 2
2022-09-15 2
2022-09-14 7

The expected result is a new dataframe containing, for each date, the sum of its values divided by the count of its values. E.g.:

date value
2022-09-16 2
2022-09-15 3.3333
2022-09-14 7

2022-09-16 would be: (1+2+3)/3

My code for this function so far:

import pandas as pd

# getmacrofactorvalues (SQL query), conn (DB connection) and
# zscoreoutliers are defined elsewhere (not shown).
dfyieldd = pd.read_sql_query(getmacrofactorvalues, conn)
dfyieldd.to_csv('dfyieldd_raw.csv')

resultseries = pd.Series(data=dfyieldd['rawvalue'])
resultzscored = zscoreoutliers(resultseries)

dfyieldd['value'] = resultzscored
dfyieldd.to_csv('dfyieldd_zcored.csv')

# set_index returns a new frame, so the result must be assigned back
dfyieldd = dfyieldd.set_index('datadate')
dfyieldd.index = pd.to_datetime(dfyieldd.index)

My question is whether there is a way to do this without an explicit loop. I could achieve it by looping through the dates, but that doesn't feel like the proper way to handle it.

Thanks

Answer

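Use pandas.DataFrame.groupby to aggregate the sum and count per date, then pandas.Series.div to divide one column by the other. The snippets below assume the columns are named date and value, as in the example; adjust the names if your frame uses datadate instead.

Try this: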

out = (
        dfyieldd.groupby('date')['value'].agg(['sum','count'])
                .assign(value=lambda x: x.pop('sum').div(x.pop('count')))
                .reset_index()
      )
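
To make the intermediate step visible, here is a self-contained sketch that rebuilds the sample from the question and runs the same sum/count aggregation in two explicit steps. The names `df` and `stats` are assumptions for illustration, standing in for `dfyieldd` and the aggregated frame:

```python
import pandas as pd

# Sample data copied from the question; `df` stands in for dfyieldd.
df = pd.DataFrame({
    "date": ["2022-09-16"] * 3 + ["2022-09-15"] * 3 + ["2022-09-14"],
    "value": [1, 2, 3, 6, 2, 2, 7],
})

# One row per date, with the per-date sum and record count as columns.
stats = df.groupby("date")["value"].agg(["sum", "count"])

# Divide sum by count, then drop the helper columns.
out = (
    stats.assign(value=stats["sum"].div(stats["count"]))
         .drop(columns=["sum", "count"])
         .reset_index()
)
print(out)
```

Printing `stats` before the division shows the per-date sum and count side by side, which can help when checking the result by hand.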

Or more simply, since the sum divided by the count is the mean, take the mean of each group:

out = dfyieldd.groupby('date', as_index=False)['value'].mean()

# Output:

print(out)

         date     value
0  2022-09-14  7.000000
1  2022-09-15  3.333333
2  2022-09-16  2.000000
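
Note that groupby sorts the group keys in ascending order by default, while the expected result in the question lists the newest date first. A minimal sketch of two ways to get that order, assuming the same sample data (the names `df`, `by_appearance`, and `newest_first` are illustrative):

```python
import pandas as pd

# Sample data copied from the question; `df` stands in for dfyieldd.
df = pd.DataFrame({
    "date": ["2022-09-16"] * 3 + ["2022-09-15"] * 3 + ["2022-09-14"],
    "value": [1, 2, 3, 6, 2, 2, 7],
})

# Option 1: sort=False keeps groups in order of first appearance.
by_appearance = df.groupby("date", as_index=False, sort=False)["value"].mean()

# Option 2: sort explicitly, newest date first, regardless of input order.
newest_first = (
    df.groupby("date", as_index=False)["value"].mean()
      .sort_values("date", ascending=False, ignore_index=True)
)
print(newest_first)
```

Option 1 only matches the expected order because the input rows already happen to be newest first; option 2 does not depend on the input order.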
User contributions licensed under: CC BY-SA