Q1. Given data frame 1, I am trying to get group-by unique new occurrences & another column that gives me existing ID count per month
ID Date 1 Jan-2020 2 Feb-2020 3 Feb-2020 1 Mar-2020 2 Mar-2020 3 Mar-2020 4 Apr-2020 5 Apr-2020
Expected output for unique newly added group-by ID values & for existing sum of ID values
Date ID_Count Existing_count Jan-2020 1 0 Feb-2020 2 1 Mar-2020 0 3 Apr-2020 2 3
Note: Mar-2020 ID_Count is ZERO because ID 1, 2, and 3 were present in previous months.
Note: Existing count is 0 for Jan-2020 because there were zero IDs before Jan. The existing count for Feb-2020 is 1 because before Feb there was only 1. Mar-2020 has 3 existing counts as it adds Jan + Feb and so on
Advertisement
Answer
I think you can do it like this:
df['month'] = pd.to_datetime(df['Date'], format='%b-%Y') # Find new IDs df['new'] = df.groupby('ID').cumcount()==0 # Count new IDs by month df_ct = df.groupby('month')['new'].sum().to_frame(name='ID_Count') # Count all previous new IDs df_ct['Existing_cnt'] = df_ct['ID_Count'].shift().cumsum().fillna(0).astype(int) df_ct.index = df_ct.index.strftime('%b-%Y') df_ct
Output:
ID_Count Existing_cnt month Jan-2020 1 0 Feb-2020 2 1 Mar-2020 0 3 Apr-2020 2 3