Creating a new column if a condition is satisfied in the last N days python pandas

Question

I have a dataframe like this: timestamp value id. 2020-12-16 25 1 2020-12-17 45 1 2020-12-31 40 1 2021-01-31 37 1 2020-12-15 12 2 2020-12-16 78. 2. I want to create a new column outcome which takes a value yes is the id doesn&#8217;t have any entry for the last 25 days. For e.g., this is the expected output t…

Accepted Answer

Because your calculation requires sorting we can avoid grouping. Sort, take a row-difference and use where to NaN the values that cross groups (i.e. the earliest row for every ID). Because you want the first difference to be relative to '2020-12-15' we can use fillna to find the difference from that date and use np.where to assign your strings values based on your condition.import pandas as pdimport numpy as npdf['timestamp'] = pd.to_datetime(df['timestamp'])df = df.sort_values(['id', 'timestamp'])s = (df['timestamp'].diff()       .where(df['id'].eq(df['id'].shift()))       .fillna(df['timestamp'] - pd.to_datetime('2020-12-15')))#0    1 days#1    1 days#2   14 days#3   31 days#4    0 days#5    1 daysdf['outcome'] = np.where(s <= pd.Timedelta(25, 'D'), 'yes', 'no')#   timestamp  value  id outcome#0 2020-12-16     25   1     yes#1 2020-12-17     45   1     yes#2 2020-12-31     40   1     yes#3 2021-01-31     37   1      no#4 2020-12-15     12   2     yes#5 2020-12-16     78   2     yes

timestamp	value	id.
2020-12-16	25	1
2020-12-17	45	1
2020-12-31	40	1
2021-01-31	37	1
2020-12-15	12	2
2020-12-16	78.	2.

timestamp	value	id.	outcome
2020-12-16	25	1	yes
2020-12-17	45	1	yes.
2020-12-31	40	1	yes.
2021-01-31	37	1	no.
2020-12-15	12	2	yes.
2020-12-16	78.	2.	yes.

Advertisement

Answer