Pandas – Groupby and Standardize

I have been trying to tackle this for quite some time, but haven’t found a pythonic way to do it with the built-in groupby and transform methods from pandas.

The goal is to group the data by the columns ex_date and id, then, within each group, standardize the column ref_value_1 by dividing it by the value found in the row where df['calc_date'] == df['ex_date'].

Here’s a sample input:

calc_date  ex_date   id  ref_value_1  bins
1/1/2021   2/1/2021  1   1.5          1
2/1/2021   2/1/2021  1   3            1
3/1/2021   2/1/2021  1   4.5          1
1/1/2021   2/1/2021  2   5            1
2/1/2021   2/1/2021  2   10           1
3/1/2021   2/1/2021  2   15           1
1/1/2021   2/1/2021  3   15           2
2/1/2021   2/1/2021  3   40           2
3/1/2021   2/1/2021  3   60           2
1/1/2021   2/1/2021  4   75           3
2/1/2021   2/1/2021  4   100          3
3/1/2021   2/1/2021  4   120          3
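
For anyone who wants to reproduce this, the table above can be rebuilt as a DataFrame roughly as follows. This is a minimal sketch, not the original snippet: it assumes the dates are stored as plain strings, which is enough for the equality check between calc_date and ex_date.

```python
import pandas as pd

# Sample data reconstructed from the table above.
# Assumption: dates are kept as strings rather than parsed to datetime.
df = pd.DataFrame({
    "calc_date": ["1/1/2021", "2/1/2021", "3/1/2021"] * 4,
    "ex_date": ["2/1/2021"] * 12,
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "ref_value_1": [1.5, 3, 4.5, 5, 10, 15, 15, 40, 60, 75, 100, 120],
    "bins": [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Rows where calc_date equals ex_date hold the per-group reference values.
print(df[df["calc_date"] == df["ex_date"]])
```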

And expected output:

calc_date  ex_date   id  ref_value_1  bins  standardized_val
1/1/2021   2/1/2021  1   1.5          1     0.5
2/1/2021   2/1/2021  1   3            1     1
3/1/2021   2/1/2021  1   4.5          1     1.5
1/1/2021   2/1/2021  2   5            1     0.5
2/1/2021   2/1/2021  2   10           1     1
3/1/2021   2/1/2021  2   15           1     1.5
1/1/2021   2/1/2021  3   15           2     0.375
2/1/2021   2/1/2021  3   40           2     1
3/1/2021   2/1/2021  3   60           2     1.5
1/1/2021   2/1/2021  4   75           3     0.75
2/1/2021   2/1/2021  4   100          3     1
3/1/2021   2/1/2021  4   120          3     1.2

Answer

You can mask the non-matching values (the rows where calc_date differs from ex_date) and fill them per group using groupby + transform to get the reference value on every row. Then simply divide ref_value_1 by that reference.
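
One way this could look in code is sketched below. It is an illustration of the mask-then-transform idea rather than a definitive implementation: the intermediate names masked and reference are made up for readability, the sample data is rebuilt with dates as plain strings, and it assumes each (ex_date, id) group has at most one row where calc_date equals ex_date.

```python
import pandas as pd

# Sample data from the question (dates kept as strings, an assumption).
df = pd.DataFrame({
    "calc_date": ["1/1/2021", "2/1/2021", "3/1/2021"] * 4,
    "ex_date": ["2/1/2021"] * 12,
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "ref_value_1": [1.5, 3, 4.5, 5, 10, 15, 15, 40, 60, 75, 100, 120],
    "bins": [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Keep ref_value_1 only where calc_date equals ex_date; all other rows become NaN.
masked = df["ref_value_1"].where(df["calc_date"].eq(df["ex_date"]))

# "first" returns the first non-null value of each group, and transform
# broadcasts it back to every row of that (ex_date, id) group.
reference = masked.groupby([df["ex_date"], df["id"]]).transform("first")

# Divide each value by its group's reference value.
df["standardized_val"] = df["ref_value_1"] / reference

print(df)
```

With the sample data, standardized_val matches the expected output in the question. A group with no row where calc_date equals ex_date would get NaN for every row, since there is no reference value to divide by.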

User contributions licensed under: CC BY-SA