Skip to content
Advertisement

Calculate the weighted average using groupby in Python

here is the dataframe I’m currently working on :

df_weight_0

enter image description here

What I’d like to calculate is the average of the variable “avg_lag” weighted by “tot_SKU” in each product_basket for both SMB and CORP groups. This means that, taking CORP as an example, I want to calculate something as:

(585,134 * 46.09 + 147,398 * 104.55 + … + 1,112,941 * 75.73) / (585,134 + 147,398 + … + 1,112,941)

and the same will need to be done for SMB.

Ideally, I’d like to have this two numbers in the dataframe above in a new column (SMB and CORP rows will have their weighted average repeated according to the two values calculated as shown above).

P.S. I will go deeper on more levels in my analysis so the most general the approach, the better.

Thanks in advance,

Stefano

Advertisement

Answer

So this should do the trick I think

import pandas as pd


def calculator(df, columns):
    weighted_sum = (df[columns[0]]*df[columns[1]]).sum()/df[columns[0]].sum()
    return weighted_sum

cols = ['tot_SKU', 'avg_lag']

Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols))
df.join(Sums.rename(('sums')), on='SF_type')

Edit: Added the requested merge with the old dataframe

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement