Pandas Grouping by Hostname: Average of Sessions (on Host) by Hour

Question

The dataframe looks like this. What I am trying to do is show the average number of sessions per hour for each individual hostname, so I would get something back like this. I think I'm getting my grouping wrong: when I try this, what I typically end up with is the largest average value per hour for any given hostname, ordered by date and hour.

Accepted Answer

Here is an example based on the data you have provided. I have added steps to convert the dates to datetime (in case they were stored as objects) and to set the datetime column as a DatetimeIndex so that resample can be used. It would go something like this:

```python
import pandas as pd

d = {'datetime': ['2020-10-27 00:00:05', '2020-10-27 00:00:10', '2020-10-27 00:00:15',
                  '2020-10-27 01:00:05', '2020-10-27 01:00:10', '2020-10-27 01:00:15',
                  '2020-10-27 00:00:05', '2020-10-27 00:00:10', '2020-10-27 00:00:15',
                  '2020-10-27 01:00:05', '2020-10-27 01:00:10', '2020-10-27 01:00:15'],
     'hostname': ['server001', 'server001', 'server001', 'server001', 'server001', 'server001',
                  'server002', 'server002', 'server002', 'server002', 'server002', 'server002'],
     'sessions': [22, 25, 21, 30, 30, 35, 15, 10, 11, 19, 22, 18]}

df = pd.DataFrame(data=d)
df['datetime'] = pd.to_datetime(df['datetime'])  # strings/objects -> datetime64
df = df.set_index('datetime')                    # resample needs a DatetimeIndex

# Group by hostname first, then resample each host's sessions into hourly bins
hourly = df.groupby('hostname')['sessions'].resample('h').mean()
```

You can adapt this example to other purposes. As I understand your question, you want the hourly mean number of sessions for each hostname; check the resample documentation if you need other groupings.

The alternative is to separate the date and time into their own columns and then take the mean:

```python
df = pd.DataFrame(data=d)                        # start again from the raw data
df['datetime'] = pd.to_datetime(df['datetime'])
df['Date'] = df['datetime'].dt.strftime('%Y-%m-%d')
df['Time'] = df['datetime'].dt.strftime('%H:00')  # hour bucket, e.g. '01:00'
df_1 = df.groupby(['Date', 'Time', 'hostname'])['sessions'].mean()
```

which gives the mean number of sessions for each combination of date, hour and hostname.
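For comparison, here is a minimal sketch of a third variant that needs neither a DatetimeIndex nor separate Date/Time columns: group on the hostname together with a `pd.Grouper` that bins the `datetime` column by hour, then `unstack` so each hostname becomes its own column. It reuses the sample data from the answer; the names `hourly_mean_by_host` and `wide` are illustrative only, and the wide layout is just one possible presentation of the result.

```python
import pandas as pd

# Same sample data as in the answer
d = {'datetime': ['2020-10-27 00:00:05', '2020-10-27 00:00:10', '2020-10-27 00:00:15',
                  '2020-10-27 01:00:05', '2020-10-27 01:00:10', '2020-10-27 01:00:15',
                  '2020-10-27 00:00:05', '2020-10-27 00:00:10', '2020-10-27 00:00:15',
                  '2020-10-27 01:00:05', '2020-10-27 01:00:10', '2020-10-27 01:00:15'],
     'hostname': ['server001'] * 6 + ['server002'] * 6,
     'sessions': [22, 25, 21, 30, 30, 35, 15, 10, 11, 19, 22, 18]}

df = pd.DataFrame(d)
df['datetime'] = pd.to_datetime(df['datetime'])

# Group on hostname and on hourly bins of the 'datetime' column,
# then average the sessions inside each (hostname, hour) group
hourly_mean_by_host = (
    df.groupby(['hostname', pd.Grouper(key='datetime', freq='h')])['sessions']
      .mean()
)

# Reshape: one row per hour, one column per hostname
wide = hourly_mean_by_host.unstack('hostname')
print(wide.round(2))
```

With this data, server001 averages (22 + 25 + 21) / 3 ≈ 22.67 sessions in the 00:00 hour and 95 / 3 ≈ 31.67 in the 01:00 hour, while server002 averages 36 / 3 = 12.0 and 59 / 3 ≈ 19.67. Because each host keeps its own column, the per-host averages are never mixed together, which is the problem described in the question.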


Answer