Skip to content
Advertisement

Getting column mean by summing values of another column

I have a Dataframe with three columns Customer, Purchase amount and Purchase date.

I need to get sum of Purchase amount of customer purchases for each day, then get mean of purchases for each customer for one month.

I’ve tried to use df.pivot_table() but it takes mean of purchases for one day and then mean of purchases for one month.

Hope you can help me.

My Dataframe:

lst = [['Customer1', 100, '10.01.2020'], ['Customer1', 210, '10.01.2020'], ['Customer1', 122, '10.01.2020'],
       ['Customer1', 259, '10.13.2020'], ['Customer1', 160, '10.13.2020'], ['Customer1', 225, '10.13.2020'],
       ['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'],
       ['Customer2', 50, '10.21.2020'], ['Customer2', 70, '10.21.2020'], ['Customer2', 100, '10.21.2020'],
       ['Customer2', 120, '10.25.2020'], ['Customer2', 140, '10.25.2020'], ['Customer2', 155, '10.25.2020'],
       ['Customer2', 40, '10.27.2020'], ['Customer2', 30, '10.27.2020'], ['Customer2', 80, '10.27.2020']]

df = pd.DataFrame(lst, columns = ['Customer', 'Amount', 'Purchase date'])



   Customer  Amount Purchase date
0   Customer1     100    10.01.2020
1   Customer1     210    10.01.2020
2   Customer1     122    10.01.2020
3   Customer1     259    10.13.2020
4   Customer1     160    10.13.2020
5   Customer1     225    10.13.2020
6   Customer1     310    10.15.2020
7   Customer1     310    10.15.2020
8   Customer1     310    10.15.2020
9   Customer2      50    10.21.2020
10  Customer2      70    10.21.2020
11  Customer2     100    10.21.2020
12  Customer2     120    10.25.2020
13  Customer2     140    10.25.2020
14  Customer2     155    10.25.2020
15  Customer2      40    10.27.2020
16  Customer2      30    10.27.2020
17  Customer2      80    10.27.2020

I’ve tried to use:

df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'mean', margins = True)

Result:

Purchase date   10.01.2020  10.13.2020  10.15.2020  10.21.2020  10.25.2020  10.27.2020  Mean
Customer                            
Customer1          144          214        310          NaN         NaN         NaN     222
Customer2           NaN         NaN        NaN          73          138         50       87

Here is what I need to get:

  Purchase date 10.01.2020  10.13.2020  10.15.2020  10.21.2020  10.25.2020  10.27.2020  Mean                        
  Customer1        432          644        930         NaN          NaN        NaN      668
  Customer2        NaN          NaN        NaN         220           415      150.0     261
   

Advertisement

Answer

df['Purchase date'] = pd.to_datetime(df['Purchase date'])
# create a month column
df['month'] = df['Purchase date'].apply(lambda x: x.month)

For getting the sum for each day use groupby

df.groupby(by=['Purchase date', 'Customer']).sum()

output

Purchase date   Customer Amount month       
2020-10-01  Customer1   432     30
2020-10-13  Customer1   644     30
2020-10-15  Customer1   930     30
2020-10-21  Customer2   220     30
2020-10-25  Customer2   415     30
2020-10-27  Customer2   150     30

For monthly average again use groupby

df.groupby(by=['month','Customer'] ).mean()

output

        
month   Customer    Amount
10  Customer1   222.888889
Customer2   87.222222

Looking at what you are trying to achieve can be done using pivot table as

df = df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'sum', margins = True, margins_name='mean')
df['mean'] = df.apply(lambda x: round(x['mean']/(x.count()-1)), axis=1)

output

Purchase date   10.01.2020  10.13.2020  10.15.2020  10.21.2020  10.25.2020  10.27.2020  mean
Customer                            
Customer1   432.0   644.0   930.0   NaN    NaN      NaN     669.0
Customer2   NaN     NaN     NaN     220.0  415.0    150.0   262.0
mean       432.0    644.0   930.0   220.0   415.0   150.0   930.0
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement