I have a DataFrame that is a time series every minutes (duration: up to today) All ts are in datetime64[ns]
Input:
timestamp A 2021-01-01 00:00:00.000 2000 2021-01-01 00:01:00.000 2005 2021-01-01 00:02:00.000 2010 ...
I create a new column with categorical data consisting of periods of 40 days like below: ( I am using pd.cut) , output is:
timestamp A period 2021-01-01 00:00:00.000 2000 (2021-12-31 23:59:59.999999999, 2022-02-10] 2021-01-01 00:01:00.000 2005 (2021-12-31 23:59:59.999999999, 2022-02-10] 2021-01-01 00:02:00.000 2010 (2021-12-31 23:59:59.999999999, 2022-02-10] ...
The set of period is as below:
[(2021-12-31 23:59:59.999999999, 2022-02-10] < (2022-02-10, 2022-03-22] < (2022-03-22, 2022-05-01]]
I want to create a new column in which I get the first value of this category For example:
timestamp A period A_ref 2021-01-01 00:00:00.000 2000 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000 2021-01-01 00:01:00.000 2005 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000 2021-01-01 00:02:00.000 2010 (2021-12-31 23:59:59.999999999, 2022-02-10] 2000 ...
It could be that the beginning of the period is not an exact match.
What I have tried: df.resample(’40D’) taking ‘first’ as aggregator. Doing this I can get the value I am interested in but I really want is fill this value across all of the rows. Maybe the approach of using a category is not necessary?
Thank you
Advertisement
Answer
df.merge(df.groupby('period')['A'].first().reset_index(), on='period', suffixes = ('', '_ref')) timestamp ... A_ref 0 2021-01-01 00:00:00.000 ... 2000 1 2021-01-01 00:01:00.000 ... 2000 2 2021-01-01 00:02:00.000 ... 2000