Skip to content
Advertisement

Creating a new column if a condition is satisfied in the last N days python pandas

I have a dataframe like this:

timestamp value id.
2020-12-16 25 1
2020-12-17 45 1
2020-12-31 40 1
2021-01-31 37 1
2020-12-15 12 2
2020-12-16 78. 2.

I want to create a new column outcome which takes a value yes is the id doesn’t have any entry for the last 25 days. For e.g., this is the expected output

timestamp value id. outcome
2020-12-16 25 1 yes
2020-12-17 45 1 yes.
2020-12-31 40 1 yes.
2021-01-31 37 1 no.
2020-12-15 12 2 yes.
2020-12-16 78. 2. yes.

In the above example, we can assume that the start date is 2020-12-15. So, our comparisons start after this date for all the ids. For id 1, starting from 2020-12-15, it has entries in the last 25 days except for 2021-01-31. The entry that happened before 2021-01-31 is on 2020-12-31 which is > 25 days in this example. So, the outcome variable takes a value no. For the rest of the rows, it is yes because for each of the dates, there’s a prior entry that falls within the 25 day difference.

I am very confused about how to write an algorithm for this.

I would have to groupby id, order by timestamp and then do rolling something on the timestamp to create the outcome column?

I am very confused about this and any help will be much appreciated!

Thanks in advance!

Advertisement

Answer

Because your calculation requires sorting we can avoid grouping. Sort, take a row-difference and use where to NaN the values that cross groups (i.e. the earliest row for every ID). Because you want the first difference to be relative to '2020-12-15' we can use fillna to find the difference from that date and use np.where to assign your strings values based on your condition.

JavaScript
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement