Creating a new column if a condition is satisfied in the last N days (Python, pandas)

Question

I have a dataframe like this:

timestamp	value	id
2020-12-16	25	1
2020-12-17	45	1
2020-12-31	40	1
2021-01-31	37	1
2020-12-15	12	2
2020-12-16	78	2

I want to create a new column, outcome, which takes the value yes if the id doesn't have any entry for the last 25 days. For example, this is the expected output:

timestamp	value	id	outcome
2020-12-16	25	1	yes
2020-12-17	45	1	yes
2020-12-31	40	1	yes
2021-01-31	37	1	no
2020-12-15	12	2	yes
2020-12-16	78	2	yes

Accepted Answer

Because your calculation requires sorting, we can avoid grouping. Sort by id and timestamp, take the row-to-row difference, and use where to blank out (set to NaT) the differences that cross groups, i.e. the earliest row for every id. Because you want the first difference to be relative to '2020-12-15', we can use fillna to fill those rows with the difference from that date, and then use np.where to assign your string values based on your condition.

    import pandas as pd
    import numpy as np

    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.sort_values(['id', 'timestamp'])

    s = (df['timestamp'].diff()
           .where(df['id'].eq(df['id'].shift()))
           .fillna(df['timestamp'] - pd.to_datetime('2020-12-15')))
    #0    1 days
    #1    1 days
    #2   14 days
    #3   31 days
    #4    0 days
    #5    1 days

    df['outcome'] = np.where(s <= pd.Timedelta(25, 'D'), 'yes', 'no')
    #   timestamp  value  id outcome
    #0 2020-12-16     25   1     yes
    #1 2020-12-17     45   1     yes
    #2 2020-12-31     40   1     yes
    #3 2021-01-31     37   1      no
    #4 2020-12-15     12   2     yes
    #5 2020-12-16     78   2     yes
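For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same sort, diff, mask, and fill steps applied to the sample data from the question. The helper name add_outcome and its parameters start and window_days are hypothetical, introduced only for illustration. It assumes the rule implied by the expected output: a row gets yes when the gap to the previous entry of the same id (or to the start date, for the first entry of an id) is at most 25 days.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "timestamp": ["2020-12-16", "2020-12-17", "2020-12-31",
                  "2021-01-31", "2020-12-15", "2020-12-16"],
    "value": [25, 45, 40, 37, 12, 78],
    "id": [1, 1, 1, 1, 2, 2],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])


def add_outcome(frame, start="2020-12-15", window_days=25):
    """Hypothetical helper (not part of pandas): label each row 'yes' when
    the gap to the previous row of the same id is at most `window_days`.
    The first row of each id is measured against `start` instead."""
    out = frame.sort_values(["id", "timestamp"]).copy()

    # Row-to-row difference over the whole sorted frame.
    gap = out["timestamp"].diff()

    # Keep a difference only when the previous row has the same id;
    # the first row of every id becomes NaT.
    gap = gap.where(out["id"].eq(out["id"].shift()))

    # Fill those first rows with the distance from the reference date.
    gap = gap.fillna(out["timestamp"] - pd.Timestamp(start))

    out["outcome"] = np.where(gap <= pd.Timedelta(days=window_days), "yes", "no")
    return out


result = add_outcome(df)
print(result)
```

On the sample data the outcome column comes out as yes, yes, yes, no, yes, yes, matching the output shown above. Keeping the reference date and the window as parameters is only a convenience for trying other values; the logic is otherwise unchanged.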
