I have tried to tackle this for quite some time, but haven’t been able to get a pythonic way around it by using the built-in groupby
and transform
methods from pandas.
The goal is to group the data by columns ex_date
and id
, then within the groups identified, standardize the column called ref_value_1
against the value found in the row where df['calc_date'] == df['ex_date']
Here’s a sample input:
JavaScript
x
2
1
df = pd.DataFrame({'calc_date': ['1/1/2021', '2/1/2021', '3/1/2021', '1/1/2021', '2/1/2021', '3/1/2021', '1/1/2021', '2/1/2021', '3/1/2021', '1/1/2021', '2/1/2021', '3/1/2021'], 'ex_date': ['2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021', '2/1/2021'], 'id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], 'ref_value_1': [1.5, 3.0, 4.5, 5.0, 10.0, 15.0, 15.0, 40.0, 60.0, 75.0, 100.0, 120.0], 'bins': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]})
2
which looks like:
calc_date | ex_date | id | ref_value_1 | bins |
---|---|---|---|---|
1/1/2021 | 2/1/2021 | 1 | 1.5 | 1 |
2/1/2021 | 2/1/2021 | 1 | 3 | 1 |
3/1/2021 | 2/1/2021 | 1 | 4.5 | 1 |
1/1/2021 | 2/1/2021 | 2 | 5 | 1 |
2/1/2021 | 2/1/2021 | 2 | 10 | 1 |
3/1/2021 | 2/1/2021 | 2 | 15 | 1 |
1/1/2021 | 2/1/2021 | 3 | 15 | 2 |
2/1/2021 | 2/1/2021 | 3 | 40 | 2 |
3/1/2021 | 2/1/2021 | 3 | 60 | 2 |
1/1/2021 | 2/1/2021 | 4 | 75 | 3 |
2/1/2021 | 2/1/2021 | 4 | 100 | 3 |
3/1/2021 | 2/1/2021 | 4 | 120 | 3 |
And expected output:
calc_date | ex_date | id | ref_value_1 | bins | standardized_val |
---|---|---|---|---|---|
1/1/2021 | 2/1/2021 | 1 | 1.5 | 1 | 0.5 |
2/1/2021 | 2/1/2021 | 1 | 3 | 1 | 1 |
3/1/2021 | 2/1/2021 | 1 | 4.5 | 1 | 1.5 |
1/1/2021 | 2/1/2021 | 2 | 5 | 1 | 0.5 |
2/1/2021 | 2/1/2021 | 2 | 10 | 1 | 1 |
3/1/2021 | 2/1/2021 | 2 | 15 | 1 | 1.5 |
1/1/2021 | 2/1/2021 | 3 | 15 | 2 | 0.375 |
2/1/2021 | 2/1/2021 | 3 | 40 | 2 | 1 |
3/1/2021 | 2/1/2021 | 3 | 60 | 2 | 1.5 |
1/1/2021 | 2/1/2021 | 4 | 75 | 3 | 0.75 |
2/1/2021 | 2/1/2021 | 4 | 100 | 3 | 1 |
3/1/2021 | 2/1/2021 | 4 | 120 | 3 | 1.2 |
Advertisement
Answer
You can mask the non matching values and fill per group using groupby
+transform
to get the reference. Then simply divide your data with the reference.
JavaScript
1
4
1
ref = df['ref_value_1'].where(df['calc_date'].eq(df['ex_date'])).groupby(df['id']).transform('first')
2
3
df['standardized_val'] = df['ref_value_1'].div(ref)
4
Output:
JavaScript
1
14
14
1
calc_date ex_date id ref_value_1 bins standardized_val
2
0 1/1/2021 2/1/2021 1 1.5 1 0.500
3
1 2/1/2021 2/1/2021 1 3.0 1 1.000
4
2 3/1/2021 2/1/2021 1 4.5 1 1.500
5
3 1/1/2021 2/1/2021 2 5.0 1 0.500
6
4 2/1/2021 2/1/2021 2 10.0 1 1.000
7
5 3/1/2021 2/1/2021 2 15.0 1 1.500
8
6 1/1/2021 2/1/2021 3 15.0 2 0.375
9
7 2/1/2021 2/1/2021 3 40.0 2 1.000
10
8 3/1/2021 2/1/2021 3 60.0 2 1.500
11
9 1/1/2021 2/1/2021 4 75.0 3 0.750
12
10 2/1/2021 2/1/2021 4 100.0 3 1.000
13
11 3/1/2021 2/1/2021 4 120.0 3 1.200
14