I’m doing some NBA analysis and have a “Minutes Played” column for players in mm:ss format. What dtype should this column be so I can run aggregate functions (mean, min, max, etc.) on it? The df has over 20,000 rows, so here is a sample of the column in question:
```
  Minutes
0   18:30
1   24:50
2   33:21
3   28:39
4   27:30
```
I ran this code to convert the column to datetime:
```python
df['Minutes'] = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')
```
It changed the dtype successfully, but I still can't perform operations on the column. I get this error when I try to aggregate it:
```
DataError: No numeric types to aggregate
```
My code for the aggregation:
```python
df2 = df.groupby(['Name', 'Team']).agg({'Minutes': 'mean'})
```
I would like to see the average number of minutes while keeping the mm:ss format.
Any help is appreciated.
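The sketch below shows what the `to_datetime` conversion in the question actually produces, using three values from the sample. Because `'%M:%S'` has no date part, each value is parsed as a clock time on the default date 1900-01-01. That is a point in time, not a duration. Subtracting that date converts each value into an elapsed time (timedelta64), which represents "minutes played" better. The `DataError` suggests that the pandas version in use will not average a datetime64 column in a groupby, but this is an inference. One more caveat: `%M` only accepts 00 to 59, so any value of 60 minutes or more (possible in multi-overtime games) would silently become `NaT` under `errors='coerce'`.

```python
import pandas as pd

df = pd.DataFrame({'Minutes': ['18:30', '24:50', '33:21']})

# The question's conversion: with no date in the format, each value is
# parsed as a clock time on the default date 1900-01-01.
as_datetime = pd.to_datetime(df['Minutes'], format='%M:%S', errors='coerce')
print(as_datetime.iloc[0])  # 1900-01-01 00:18:30

# Subtracting that default date turns each timestamp into an elapsed
# duration (a timedelta64 column), which mean/min/max can work on.
as_duration = as_datetime - pd.Timestamp('1900-01-01')
print(as_duration.dtype)
print(as_duration.mean())
```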
Answer
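```python
import numpy as np
import pandas as pd

data = {
    'Minutes': ['18:30', '24:50', '33:21', '28:39', '27:30'],
    'Team': ['team1', 'team2', 'team1', 'team1', 'team2']
}

df = pd.DataFrame(data)
# Prefix an hours field so 'mm:ss' becomes 'hh:mm:ss', which to_timedelta parses.
# Empty strings are replaced with NaN first, so '00:' + NaN stays NaN and becomes NaT
# instead of the unparseable string '00:'.
df['Minutes'] = pd.to_timedelta('00:' + df['Minutes'].replace('', np.nan))
print(df.groupby('Team')['Minutes'].mean())
```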
Output:
```
Team
team1 0 days 00:26:50
team2 0 days 00:26:10
Name: Minutes, dtype: timedelta64[ns]
```
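The question also asks for min and max and wants to keep the mm:ss format. The following is a minimal sketch built on the answer's timedelta conversion and sample data. It runs several aggregations in one `agg` call and then formats each timedelta as an mm:ss string for display. `to_mmss` is a hypothetical helper written for this example, not a pandas function. It rounds to whole seconds and lets the minutes field grow past 59. The formatted columns are strings, so keep the timedelta columns for any further arithmetic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Minutes': ['18:30', '24:50', '33:21', '28:39', '27:30'],
    'Team': ['team1', 'team2', 'team1', 'team1', 'team2'],
})
# Same conversion as the answer: prefix an hours field so '18:30' reads as 00:18:30.
df['Minutes'] = pd.to_timedelta('00:' + df['Minutes'].replace('', np.nan))

# Several aggregations at once; each resulting column keeps the timedelta dtype.
stats = df.groupby('Team')['Minutes'].agg(['mean', 'min', 'max'])


def to_mmss(seconds: float) -> str:
    """Hypothetical helper: format seconds as mm:ss (minutes may exceed 59)."""
    if pd.isna(seconds):
        return ''
    minutes, secs = divmod(int(round(seconds)), 60)
    return f'{minutes:02d}:{secs:02d}'


# Convert each timedelta column to float seconds, then to mm:ss strings for display.
stats_mmss = stats.apply(lambda col: col.dt.total_seconds().map(to_mmss))
print(stats_mmss)
```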