I have the following dataframe:
index id code data date 0 AZ234 B213 apple 2020-09-01 <- duplicate id, code, data 1 AZ234 B213 apple 2022-02-02 <- duplicate id, code, data 2 AZ234 B213 banana 2020-07-01 3 AZ234 B213 orange 2020-05-11 4 AL612 B309 apple 2020-12-05 5 AL612 B309 banana 2020-07-21 6 AL612 B309 orange 2020-09-21 ...
I want to create pivot table to get the following table:
id code apple banana orange AZ234 B213 2020-09-01 2020-07-01 2020-05-11 AL612 B309 2020-12-05 2020-07-21 2020-09-21 ...
I have tried to do this using pivot_table (pandas):
pd.pivot_table(df, values='date', index=['id','code'], columns=['data'])
but I get this error:
DataError: No numeric types to aggregate
I have read this post but it seems to be a bit different as I don’t want to change the columns and also I got error when I tried to set_index with code and id ( ” ValueError: Index contains duplicate entries, cannot reshape”).
My goal is to create pivot table with dates as values of the table.
Advertisement
Answer
There are duplicates per id, date, data
so is necessary add some aggregate function:
If there are datetimes:
df['date'] = pd.to_datetime(df['date']) df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='first') df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc='max')
If there are strings:
print (df['date'].dtype) df.pivot_table(values='date', index=['id','code'], columns=['data'], aggfunc=','.join)