I have the a dataframe df
that looks like sorted by ID
and Date
in ascending and descending order respectively:
ID Date Place 0 1 2021-04-28 1 1 1 2021-02-28 2 2 1 2020-12-23 11 3 1 2020-11-29 1 4 2 2021-07-07 1 5 2 2021-06-20 4 6 2 2021-05-26 8 7 2 2021-04-08 1 8 2 2021-03-03 3 9 2 2021-02-03 1 10 2 2021-01-13 9 11 2 2020-12-23 12 12 3 2021-06-02 1 13 3 2021-05-08 1 14 3 2021-04-08 9 15 3 2021-01-17 1 16 3 2020-12-23 4 17 3 2020-12-02 1 18 3 2020-11-14 2
I would like to add two new columns called Number of 1
which is defined to be cumulative count of number 1 occurred in Place
for each ID
and another column called Recent Number of 1
which is defined to be the cumcount
of 1’s in Place but the count restarts every year.
So the outcome looks like
ID Date Place Number of 1 Recent Number of 1 0 1 2021-04-28 1 2 1 1 1 2021-02-28 2 1 0 2 1 2020-12-23 11 1 1 3 1 2020-11-29 1 1 1 4 2 2021-07-07 1 3 3 5 2 2021-06-20 4 2 2 6 2 2021-05-26 8 2 2 7 2 2021-04-08 1 2 2 8 2 2021-03-03 3 1 1 9 2 2021-02-03 1 1 1 10 2 2021-01-13 9 0 0 11 2 2020-12-23 12 0 0 12 3 2021-06-02 1 4 2 13 3 2021-05-08 1 3 1 14 3 2021-04-08 9 2 1 15 3 2021-01-17 1 2 1 16 3 2020-12-23 4 1 1 17 3 2020-12-02 1 1 1 18 3 2020-11-14 2 0 0
I think for the recent number it should have something to do with
df.sort_values(by = ['ID', 'Date'], ascending = [True, False], inplace = True)
df['Date'] = df['Date'].dt.year
df.groupby(['ID','Date']).cumcount
something
Thank you so much for your help.
Advertisement
Answer
here is one way to do it
the result don’t match the expected outcome, as the first table in question don’t match the table under expected.
The below is from the first table in the question
df['Date']=pd.to_datetime(df['Date']) df['number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby(df['ID']).cumsum() df['Recent number of 1']=df.sort_values(['ID','Date'],ascending=[1,1])['Place'].eq(1).groupby([df['ID'],df['Date'].dt.year]).cumsum() df
ID Date Place number of 1 Recent number of 1 0 1 2021-04-28 8 2 1 1 1 2021-02-28 1 2 1 2 1 2020-12-23 11 1 1 3 1 2020-11-29 1 1 1 4 2 2021-07-07 1 3 3 5 2 2021-06-20 4 2 2 6 2 2021-05-26 8 2 2 7 2 2021-04-08 1 2 2 8 2 2021-03-03 3 1 1 9 2 2021-02-03 1 1 1 10 2 2021-01-13 9 0 0 11 2 2020-12-23 12 0 0 12 3 2021-06-02 1 3 2 13 3 2021-05-08 8 2 1 14 3 2021-04-08 9 2 1 15 3 2021-01-17 1 2 1 16 3 2020-12-23 4 1 1 17 3 2020-12-02 7 1 1 18 3 2020-11-14 1 1 1