I have the a dataframe df that looks like sorted by ID and Date in ascending and descending order respectively:
ID Date Place 0 1 2021-04-28 1 1 1 2021-02-28 2 2 1 2020-12-23 11 3 1 2020-11-29 1 4 2 2021-07-07 1 5 2 2021-06-20 4 6 2 2021-05-26 8 7 2 2021-04-08 1 8 2 2021-03-03 3 9 2 2021-02-03 1 10 2 2021-01-13 9 11 2 2020-12-23 12 12 3 2021-06-02 1 13 3 2021-05-08 1 14 3 2021-04-08 9 15 3 2021-01-17 1 16 3 2020-12-23 4 17 3 2020-12-02 1 18 3 2020-11-14 2
I would like to add two new columns called Number of 1 which is defined to be cumulative count of number 1 occurred in Place for each ID and another column called Recent Number of 1 which is defined to be the cumcount of 1’s in Place but the count restarts every year.
So the outcome looks like
ID Date Place Number of 1 Recent Number of 1 0 1 2021-04-28 1 2 1 1 1 2021-02-28 2 1 0 2 1 2020-12-23 11 1 1 3 1 2020-11-29 1 1 1 4 2 2021-07-07 1 3 3 5 2 2021-06-20 4 2 2 6 2 2021-05-26 8 2 2 7 2 2021-04-08 1 2 2 8 2 2021-03-03 3 1 1 9 2 2021-02-03 1 1 1 10 2 2021-01-13 9 0 0 11 2 2020-12-23 12 0 0 12 3 2021-06-02 1 4 2 13 3 2021-05-08 1 3 1 14 3 2021-04-08 9 2 1 15 3 2021-01-17 1 2 1 16 3 2020-12-23 4 1 1 17 3 2020-12-02 1 1 1 18 3 2020-11-14 2 0 0
I think for the recent number it should have something to do with
df.sort_values(by = ['ID', 'Date'], ascending = [True, False], inplace = True)
df['Date'] = df['Date'].dt.year
df.groupby(['ID','Date']).cumcount something
Thank you so much for your help.
Advertisement
Answer
here is one way to do it
the result don’t match the expected outcome, as the first table in question don’t match the table under expected.
The below is from the first table in the question
df['Date']=pd.to_datetime(df['Date']) df['number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby(df['ID']).cumsum() df['Recent number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby([df['ID'],df['Date'].dt.year]).cumsum() df
ID Date Place number of 1 Recent number of 1 0 1 2021-04-28 8 2 1 1 1 2021-02-28 1 2 1 2 1 2020-12-23 11 1 1 3 1 2020-11-29 1 1 1 4 2 2021-07-07 1 3 3 5 2 2021-06-20 4 2 2 6 2 2021-05-26 8 2 2 7 2 2021-04-08 1 2 2 8 2 2021-03-03 3 1 1 9 2 2021-02-03 1 1 1 10 2 2021-01-13 9 0 0 11 2 2020-12-23 12 0 0 12 3 2021-06-02 1 3 2 13 3 2021-05-08 8 2 1 14 3 2021-04-08 9 2 1 15 3 2021-01-17 1 2 1 16 3 2020-12-23 4 1 1 17 3 2020-12-02 7 1 1 18 3 2020-11-14 1 1 1