Pandas – What datatype should a duration column (mm:ss) be to use aggregates on it?

I’m doing some NBA analysis and have a “Minutes Played” column for players in a mm:ss format. What dtype should this column be to perform aggregate functions (mean, min, max, etc…) on it? The df has over 20,000 rows, so here is a sample of the column in question:

    Minutes
0   18:30
1   24:50
2   33:21
3   28:39
4   27:30

I ran this code to convert the column to datetime:

df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')

It changed the dtype successfully, but I am still unable to perform operations on the column. I get this error when trying to aggregate it:

DataError: No numeric types to aggregate

My code for the aggregate:

df2 = df.groupby(['Name', 'Team']).agg({'Minutes' : 'mean'})

I would like to see the average number of minutes and retain the mm:ss format.

Any help is appreciated.

Answer

A duration is a timedelta rather than a datetime, so convert the column with pd.to_timedelta; a timedelta column can be aggregated directly:

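import numpy as np
import pandas as pd

data = {
    'Minutes': ['18:30', '24:50', '33:21', '28:39', '27:30'],
    'Team': ['team1', 'team2', 'team1', 'team1', 'team2']
}

df = pd.DataFrame(data)
# Prefix '00:' so each mm:ss value parses as hh:mm:ss; empty strings become NaT.
df['Minutes'] = pd.to_timedelta('00:' + df['Minutes'].replace('', np.nan))
# timedelta64 columns support mean, min, max, etc.
df.groupby('Team')['Minutes'].mean()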

Output:

Team
team1   0 days 00:26:50
team2   0 days 00:26:10
Name: Minutes, dtype: timedelta64[ns]
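
The result above is a timedelta, which prints as "0 days 00:26:50" rather than mm:ss. If the mm:ss display matters, one option is to aggregate on the timedelta column and only convert back to strings at the end. The following is a minimal sketch, not part of the original answer: `format_mmss` is a hypothetical helper name, it assumes non-negative durations, and it rounds to the nearest second.

```python
import pandas as pd


def format_mmss(td):
    """Render a Timedelta as 'mm:ss' (minutes may exceed 59); NaT becomes ''."""
    if pd.isna(td):
        return ''
    total_seconds = int(round(td.total_seconds()))
    minutes, seconds = divmod(total_seconds, 60)
    return f'{minutes:02d}:{seconds:02d}'


df = pd.DataFrame({
    'Minutes': ['18:30', '24:50', '33:21', '28:39', '27:30'],
    'Team': ['team1', 'team2', 'team1', 'team1', 'team2'],
})
df['Minutes'] = pd.to_timedelta('00:' + df['Minutes'])

# Aggregate while the column is still timedelta64, then format for display.
stats = df.groupby('Team')['Minutes'].agg(['mean', 'min', 'max'])
formatted = stats.apply(lambda col: col.map(format_mmss))
print(formatted)
```

The formatted columns are plain strings, so keep the timedelta version around for any further arithmetic.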
User contributions licensed under: CC BY-SA