Cumulative count of column based on Month

I have a dataframe that looks like this:

import pandas as pd

dict_1 = {"Code" : ['A', 'A', 'A', 'A', 'A', 'A'],
          'Period' : ['2022-04-29','2022-04-29', '2022-04-30', '2022-05-01', '2022-05-01', '2022-05-01']}
df_1 = pd.DataFrame(dict_1)
# Parse the dates, then format them back to 'YYYY-MM-DD' strings
df_1['Period'] = pd.to_datetime(df_1['Period']).dt.strftime("%Y-%m-%d")
df_1.head(10)

Code	Period
A	2022-04-29
A	2022-04-29
A	2022-04-30
A	2022-05-01
A	2022-05-01
A	2022-05-01

I need to add a new column, Count, that holds a cumulative count and restarts whenever a new month begins.

Here is what I have tried so far.

df_2 = df_1.groupby(['Period', 'Code'], as_index=False).size()
df_2.head()

Code	Period	size
A	2022-04-29	2
A	2022-04-30	1
A	2022-05-01	3

def Cumulative(lists):
    # Running total over the whole column; it never resets at a month boundary
    length = len(lists)
    cu_list = [sum(lists[0:x]) for x in range(0, length+1)]
    return cu_list[1:]

df_2['Count'] = Cumulative(df_2['size'])
df_2.head()

Code	Period	size	Count
A	2022-04-29	2	2
A	2022-04-30	1	3
A	2022-05-01	3	6

For the row with a Period of 2022-05-01, the total count should be 3 instead of 6 because a new month has started.

Answer

Group by the year and month extracted from Period (including the year keeps the same month of different years apart), then apply cumsum to size within each group:

period = pd.to_datetime(df_2['Period'])  # parse the date strings once
year_col = period.dt.year
month_col = period.dt.month
df_2['count'] = df_2.groupby([year_col, month_col])['size'].cumsum()

Result:

        Period  Code  size  count
0   2022-04-29     A     2      2
1   2022-04-30     A     1      3
2   2022-05-01     A     3      3
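
A possible variation, offered as a sketch rather than the one right way: the year and month can be combined into a single key with dt.to_period("M"), and Code can be added to the grouping so that each code gets its own running total. The sample data only has one code, so grouping by Code is an assumption about what you would want with more codes. The name month_key is made up for this example. It also assumes the rows are in date order, which holds here because groupby sorts by its keys and ISO date strings sort chronologically.

```python
import pandas as pd

# Sample data from the question.
df_1 = pd.DataFrame({
    "Code": ["A"] * 6,
    "Period": ["2022-04-29", "2022-04-29", "2022-04-30",
               "2022-05-01", "2022-05-01", "2022-05-01"],
})

# Rows per (Code, Period), equivalent to the question's df_2.
df_2 = df_1.groupby(["Code", "Period"], as_index=False).size()

# One year-month key (e.g. 2022-04) instead of separate year and month keys.
month_key = pd.to_datetime(df_2["Period"]).dt.to_period("M")

# Running total of size that restarts for each Code and each month.
df_2["count"] = df_2.groupby(["Code", month_key])["size"].cumsum()
print(df_2)
```

On the sample data this gives the same count values as the result above (2, 3, 3).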
