Python

I have a pandas DataFrame with several dates and several values for each date. I'm trying to sum the values for each date and then divide by the number of records for that same date.

Example:

date	value
2022-09-16	1
2022-09-16	2
2022-09-16	3
2022-09-15	6
2022-09-15	2
2022-09-15	2
2022-09-14	7

The expected result would be a new DataFrame containing, for each date, the sum of its values divided by the count of its values. E.g.:

date	value
2022-09-16	2
2022-09-15	3.3333
2022-09-14	7

For example, 2022-09-16 would be (1+2+3)/3 = 2.

My current code for this function so far is the following:

dfyieldd = pd.read_sql_query(getmacrofactorvalues, conn)
dfyieldd.to_csv('dfyieldd_raw.csv')

resultseries = pd.Series(data=dfyieldd['rawvalue'])
resultzscored = zscoreoutliers(resultseries)

dfyieldd['value'] = resultzscored
dfyieldd.to_csv('dfyieldd_zcored.csv')

# set_index returns a new DataFrame, so assign the result back
dfyieldd = dfyieldd.set_index('datadate')
dfyieldd.index = pd.to_datetime(dfyieldd.index)


My question is whether there is a way to do this without an explicit loop. I could achieve it by looping through the dates, but that doesn't feel like the proper way to handle it.

Thanks

Answer

Use pandas.DataFrame.groupby to aggregate, then pandas.Series.div to divide the sum column by the count column.

Try this:

out = (
    dfyieldd.groupby('date')['value'].agg(['sum', 'count'])
            .assign(value=lambda x: x.pop('sum').div(x.pop('count')))
            .reset_index()
)
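
If the pop calls inside assign feel hard to follow, the same sum-then-divide idea can be sketched with named aggregation, which keeps the intermediate columns visible. This is only an illustration on the sample data from the question; the names df, stats, total and n are made up here, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2022-09-16"] * 3 + ["2022-09-15"] * 3 + ["2022-09-14"],
    "value": [1, 2, 3, 6, 2, 2, 7],
})

# Named aggregation keeps the per-date sum and count as explicit columns.
stats = df.groupby("date")["value"].agg(total="sum", n="count")

# Divide sum by count, then turn the date index back into a column.
out = stats["total"].div(stats["n"]).rename("value").reset_index()
print(out)
```

Inspecting stats before the division is a quick way to check that each date has the number of records you expect.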


Or, more simply, take the mean of each group, which is exactly the sum divided by the count:

out = dfyieldd.groupby('date', as_index=False)['value'].mean()
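
The code in the question moves the date into the index (column datadate) rather than keeping it as a column named date. In that case a possible variant is to group on the index level. The frame below is a hypothetical stand-in for the SQL result, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by the SQL query in the question.
df = pd.DataFrame({
    "datadate": ["2022-09-16", "2022-09-16", "2022-09-16",
                 "2022-09-15", "2022-09-15", "2022-09-15", "2022-09-14"],
    "value": [1, 2, 3, 6, 2, 2, 7],
})

# set_index returns a new frame, so the result has to be assigned back.
df = df.set_index("datadate")
df.index = pd.to_datetime(df.index)

# Group on the index level instead of a column and average each date.
daily_mean = df.groupby(level=0)["value"].mean()
print(daily_mean)
```

The result here is a Series indexed by date; call reset_index() on it if you prefer a two-column DataFrame.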


Output:

print(out)

         date     value
0  2022-09-14  7.000000
1  2022-09-15  3.333333
2  2022-09-16  2.000000
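
Note that groupby sorts the group keys by default, so the dates come out in ascending order, while the expected result in the question lists the newest date first. If the original order matters, one option is sort=False, which keeps groups in order of first appearance. A minimal sketch on the sample data (df is an illustrative name):

```python
import pandas as pd

# Sample data copied from the question, newest date first.
df = pd.DataFrame({
    "date": ["2022-09-16"] * 3 + ["2022-09-15"] * 3 + ["2022-09-14"],
    "value": [1, 2, 3, 6, 2, 2, 7],
})

# sort=False keeps groups in order of first appearance instead of sorting the keys.
out = df.groupby("date", as_index=False, sort=False)["value"].mean()
print(out)
```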


Pandas Dataframe – Sum values for a specific date then divide by the count of that date
