Skip to content
Advertisement

pandas pivot_tables doesn’t work with date data (No numeric types to aggregate)

I have the following dataframe:

index   id    code    data    date
0      AZ234  B213    apple   2020-09-01 <- duplicate id, code, data
1      AZ234  B213    apple   2022-02-02 <- duplicate id, code, data
2      AZ234  B213    banana  2020-07-01
3      AZ234  B213    orange  2020-05-11
4      AL612  B309    apple   2020-12-05
5      AL612  B309    banana  2020-07-21
6      AL612  B309    orange  2020-09-21

...

I want to create pivot table to get the following table:

id    code    apple         banana        orange
AZ234  B213   2020-09-01    2020-07-01     2020-05-11
AL612  B309   2020-12-05    2020-07-21     2020-09-21
...

I have tried to do this using pivot_table (pandas):

pd.pivot_table(df, values='date', index=['id','code'],
                       columns=['data'])

but I get this error:

DataError: No numeric types to aggregate

I have read this post but it seems to be a bit different as I don’t want to change the columns and also I got error when I tried to set_index with code and id ( ” ValueError: Index contains duplicate entries, cannot reshape”).

My goal is to create pivot table with dates as values of the table.

Advertisement

Answer

There are duplicates per id, date, data so is necessary add some aggregate function:

If there are datetimes:

df['date'] = pd.to_datetime(df['date'])

df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='first')

df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='max')

If there are strings:

print (df['date'].dtype)

df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc=','.join)
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement