Q1. Given data frame 1, I am trying to get, grouped by month, the count of unique IDs that occur for the first time, plus another column that gives the count of IDs that already existed before that month.
ID  Date
1   Jan-2020
2   Feb-2020
3   Feb-2020
1   Mar-2020
2   Mar-2020
3   Mar-2020
4   Apr-2020
5   Apr-2020
Expected output, with the count of newly added unique IDs per month (ID_Count) and the count of IDs already seen in earlier months (Existing_count):
Date      ID_Count  Existing_count
Jan-2020  1         0
Feb-2020  2         1
Mar-2020  0         3
Apr-2020  2         3
Note: ID_Count for Mar-2020 is zero because IDs 1, 2, and 3 were all present in previous months.
Note: Existing_count is 0 for Jan-2020 because no IDs existed before January. It is 1 for Feb-2020 because only one ID existed before February. For Mar-2020 it is 3, the sum of the new IDs from Jan (1) and Feb (2), and so on.
Answer
I think you can do it like this:
import pandas as pd

# Parse the 'Mon-YYYY' strings so that months sort chronologically
df['month'] = pd.to_datetime(df['Date'], format='%b-%Y')
df = df.sort_values('month')  # cumcount below relies on chronological row order

# Find new IDs: only the first row of each ID is flagged True
df['new'] = df.groupby('ID').cumcount() == 0

# Count new IDs by month
df_ct = df.groupby('month')['new'].sum().to_frame(name='ID_Count')

# Count all previous new IDs
df_ct['Existing_cnt'] = df_ct['ID_Count'].shift().cumsum().fillna(0).astype(int)
df_ct.index = df_ct.index.strftime('%b-%Y')
df_ct
Output:
          ID_Count  Existing_cnt
month
Jan-2020         1             0
Feb-2020         2             1
Mar-2020         0             3
Apr-2020         2             3
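For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It builds the sample frame from the question and uses a slightly different route than the answer: it takes the first month each ID is seen and counts those per month. The names first_seen, months, and result are just illustrative, and it assumes an English locale for the %b month abbreviations.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 1, 2, 3, 4, 5],
    "Date": ["Jan-2020", "Feb-2020", "Feb-2020", "Mar-2020",
             "Mar-2020", "Mar-2020", "Apr-2020", "Apr-2020"],
})
df["month"] = pd.to_datetime(df["Date"], format="%b-%Y")

# Month in which each ID is first seen.
first_seen = df.groupby("ID")["month"].min()

# Every month present in the data, in chronological order.
months = pd.DatetimeIndex(df["month"].unique(), name="month").sort_values()

# New IDs per month; months without any new ID are filled with 0.
id_count = first_seen.value_counts().reindex(months, fill_value=0)

result = pd.DataFrame({"ID_Count": id_count})
# IDs first seen strictly before each month: running total minus the current month.
result["Existing_cnt"] = result["ID_Count"].cumsum() - result["ID_Count"]
result.index = result.index.strftime("%b-%Y")
print(result)
```

Because this works from each ID's earliest month rather than from row order, it should not depend on how the input rows are sorted.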