I have a Dataframe with three columns Customer
, Purchase amount
and Purchase date
.
I need to get sum of Purchase amount
of customer purchases for each day, then get mean of purchases for each customer for one month.
I’ve tried to use df.pivot_table()
but it takes mean of purchases for one day and then mean of purchases for one month.
Hope you can help me.
My Dataframe:
lst = [['Customer1', 100, '10.01.2020'], ['Customer1', 210, '10.01.2020'], ['Customer1', 122, '10.01.2020'], ['Customer1', 259, '10.13.2020'], ['Customer1', 160, '10.13.2020'], ['Customer1', 225, '10.13.2020'], ['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'], ['Customer2', 50, '10.21.2020'], ['Customer2', 70, '10.21.2020'], ['Customer2', 100, '10.21.2020'], ['Customer2', 120, '10.25.2020'], ['Customer2', 140, '10.25.2020'], ['Customer2', 155, '10.25.2020'], ['Customer2', 40, '10.27.2020'], ['Customer2', 30, '10.27.2020'], ['Customer2', 80, '10.27.2020']] df = pd.DataFrame(lst, columns = ['Customer', 'Amount', 'Purchase date']) Customer Amount Purchase date 0 Customer1 100 10.01.2020 1 Customer1 210 10.01.2020 2 Customer1 122 10.01.2020 3 Customer1 259 10.13.2020 4 Customer1 160 10.13.2020 5 Customer1 225 10.13.2020 6 Customer1 310 10.15.2020 7 Customer1 310 10.15.2020 8 Customer1 310 10.15.2020 9 Customer2 50 10.21.2020 10 Customer2 70 10.21.2020 11 Customer2 100 10.21.2020 12 Customer2 120 10.25.2020 13 Customer2 140 10.25.2020 14 Customer2 155 10.25.2020 15 Customer2 40 10.27.2020 16 Customer2 30 10.27.2020 17 Customer2 80 10.27.2020
I’ve tried to use:
df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'mean', margins = True)
Result:
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 Mean Customer Customer1 144 214 310 NaN NaN NaN 222 Customer2 NaN NaN NaN 73 138 50 87
Here is what I need to get:
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 Mean Customer1 432 644 930 NaN NaN NaN 668 Customer2 NaN NaN NaN 220 415 150.0 261
Advertisement
Answer
df['Purchase date'] = pd.to_datetime(df['Purchase date']) # create a month column df['month'] = df['Purchase date'].apply(lambda x: x.month)
For getting the sum for each day use groupby
df.groupby(by=['Purchase date', 'Customer']).sum()
output
Purchase date Customer Amount month 2020-10-01 Customer1 432 30 2020-10-13 Customer1 644 30 2020-10-15 Customer1 930 30 2020-10-21 Customer2 220 30 2020-10-25 Customer2 415 30 2020-10-27 Customer2 150 30
For monthly average again use groupby
df.groupby(by=['month','Customer'] ).mean()
output
month Customer Amount 10 Customer1 222.888889 Customer2 87.222222
Looking at what you are trying to achieve can be done using pivot table
as
df = df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'sum', margins = True, margins_name='mean') df['mean'] = df.apply(lambda x: round(x['mean']/(x.count()-1)), axis=1)
output
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 mean Customer Customer1 432.0 644.0 930.0 NaN NaN NaN 669.0 Customer2 NaN NaN NaN 220.0 415.0 150.0 262.0 mean 432.0 644.0 930.0 220.0 415.0 150.0 930.0