Extract duplicity without rearranging the column and find cumsum in Python

Question

I have a dataset with 4000 rows, some of which are duplicated (e.g. 2, 3, or 4 times). I want to find the cumulative sum (cumsum) of the duplicates over time. I have used this code to assign each row its duplicate count (duplicity), but it rearranged the order of the ID column in the output. I want to add the duplicity as a new column while the ID order stays the same.

Accepted Answer

Use groupby and transform:

    df['Duplicity'] = df.groupby(['ID', 'Time'])['ID'].transform('size')
    print(df)

    # Output
          ID  Time  Duplicity
    0  34696  2020          3
    1  12345  2020          2
    2  12345  2020          2
    3  34696  2020          3
    4  34696  2020          3
    5  34567  2021          1
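For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the output printed in the answer. The `RunningCount` column is a hypothetical name and only one possible reading of "cumsum of the duplicates": it numbers each occurrence within its `(ID, Time)` group using `cumcount`, which may or may not be what the asker meant. Both `transform` and `cumcount` return results aligned to the original index, so the rows are not reordered.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the answer
df = pd.DataFrame({
    "ID": [34696, 12345, 12345, 34696, 34696, 34567],
    "Time": [2020, 2020, 2020, 2020, 2020, 2021],
})

# Group size broadcast back to every row; the original row order is kept
df["Duplicity"] = df.groupby(["ID", "Time"])["ID"].transform("size")

# Assumption: "cumsum" is read as a running count of occurrences
# within each (ID, Time) group; RunningCount is a hypothetical column name
df["RunningCount"] = df.groupby(["ID", "Time"]).cumcount() + 1

print(df)
```

If the cumulative sum should run over a different column or grouping, `groupby(...)[col].cumsum()` follows the same pattern and also preserves row order.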

Answer