here is the dataframe I’m currently working on :
df_weight_0
What I’d like to calculate is the average of the variable “avg_lag” weighted by “tot_SKU” in each product_basket for both SMB and CORP groups. This means that, taking CORP as an example, I want to calculate something as:
(585,134 * 46.09 + 147,398 * 104.55 + … + 1,112,941 * 75.73) / (585,134 + 147,398 + … + 1,112,941)
and the same will need to be done for SMB.
Ideally, I’d like to have this two numbers in the dataframe above in a new column (SMB and CORP rows will have their weighted average repeated according to the two values calculated as shown above).
P.S. I will go deeper on more levels in my analysis so the most general the approach, the better.
Thanks in advance,
Stefano
Advertisement
Answer
So this should do the trick I think
import pandas as pd def calculator(df, columns): weighted_sum = (df[columns[0]]*df[columns[1]]).sum()/df[columns[0]].sum() return weighted_sum cols = ['tot_SKU', 'avg_lag'] Sums = df.groupby('SF_type').apply(lambda x: calculator(x, cols)) df.join(Sums.rename(('sums')), on='SF_type')
Edit: Added the requested merge with the old dataframe