Pandas – Groupby and Standardize

Question

I have been trying to tackle this for quite some time, but haven't found a Pythonic way to do it with the built-in groupby and transform methods from pandas. The goal is to group the data by the columns ex_date and id, then, within each group, standardize the column ref_value_1 against the value found in the row where calc_date equals ex_date.

Accepted Answer

You can mask the non-matching values and fill per group using groupby + transform to get the reference. Then simply divide your data by the reference.

    ref = df['ref_value_1'].where(df['calc_date'].eq(df['ex_date'])).groupby(df['id']).transform('first')
    df['standardized_val'] = df['ref_value_1'].div(ref)

Output:

       calc_date   ex_date  id  ref_value_1  bins  standardized_val
    0   1/1/2021  2/1/2021   1          1.5     1             0.500
    1   2/1/2021  2/1/2021   1          3.0     1             1.000
    2   3/1/2021  2/1/2021   1          4.5     1             1.500
    3   1/1/2021  2/1/2021   2          5.0     1             0.500
    4   2/1/2021  2/1/2021   2         10.0     1             1.000
    5   3/1/2021  2/1/2021   2         15.0     1             1.500
    6   1/1/2021  2/1/2021   3         15.0     2             0.375
    7   2/1/2021  2/1/2021   3         40.0     2             1.000
    8   3/1/2021  2/1/2021   3         60.0     2             1.500
    9   1/1/2021  2/1/2021   4         75.0     3             0.750
    10  2/1/2021  2/1/2021   4        100.0     3             1.000
    11  3/1/2021  2/1/2021   4        120.0     3             1.200
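For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question inline and applies the same mask-then-transform idea. One deliberate difference, which is an assumption on my part: it groups by both ex_date and id, as the question's wording asks, whereas the answer's one-liner groups by id only. With this sample the two give the same result because every row shares one ex_date. The dates are compared as plain strings here; if your columns are real datetimes the comparison works the same way.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the snippet is self-contained.
df = pd.DataFrame({
    "calc_date": ["1/1/2021", "2/1/2021", "3/1/2021"] * 4,
    "ex_date": ["2/1/2021"] * 12,
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "ref_value_1": [1.5, 3.0, 4.5, 5.0, 10.0, 15.0,
                    15.0, 40.0, 60.0, 75.0, 100.0, 120.0],
    "bins": [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Keep ref_value_1 only on rows where calc_date equals ex_date;
# every other row becomes NaN.
masked = df["ref_value_1"].where(df["calc_date"].eq(df["ex_date"]))

# 'first' skips NaN, so each (ex_date, id) group gets its single
# reference value broadcast to all of its rows.
# Grouping by ex_date as well as id is an assumption taken from the question text.
ref = masked.groupby([df["ex_date"], df["id"]]).transform("first")

# Divide every value by its group's reference value.
df["standardized_val"] = df["ref_value_1"].div(ref)

print(df)
```

If a group has no row where calc_date equals ex_date, its reference is NaN and so is standardized_val for that whole group, which is probably what you want rather than a silent wrong number.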

calc_date	ex_date	id	ref_value_1	bins
1/1/2021	2/1/2021	1	1.5	1
2/1/2021	2/1/2021	1	3	1
3/1/2021	2/1/2021	1	4.5	1
1/1/2021	2/1/2021	2	5	1
2/1/2021	2/1/2021	2	10	1
3/1/2021	2/1/2021	2	15	1
1/1/2021	2/1/2021	3	15	2
2/1/2021	2/1/2021	3	40	2
3/1/2021	2/1/2021	3	60	2
1/1/2021	2/1/2021	4	75	3
2/1/2021	2/1/2021	4	100	3
3/1/2021	2/1/2021	4	120	3
