Python Pandas: running sum based on the previous row's value, grouped by ID

I have a pandas DataFrame along these lines, recording where a customer service case sits before it is closed. Every time the case is edited, an audit trail entry is captured. I want to generate a counter that increases each time the Department of a case changes from the department it was previously in.

ID  Department  Start Date  End Date
A   Sales       01/01/2022  02/01/2022
A   Sales       02/01/2022  03/01/2022
A   Operations  03/01/2022  04/01/2022
A   Sales       04/01/2022  05/01/2022
B   Finance     01/01/2022  02/01/2022
B   Risk        02/01/2022  03/01/2022

The output I want to achieve is shown below. The part I am struggling with is getting the ‘Count of Department Change’ value to increase when the ticket returns to a department it has already been in.

ID  Department  Start Date  End Date    Count of Department Change
A   Sales       01/01/2022  02/01/2022  0
A   Sales       02/01/2022  03/01/2022  0
A   Operations  03/01/2022  04/01/2022  1
A   Sales       04/01/2022  05/01/2022  2
B   Finance     01/01/2022  02/01/2022  0
B   Risk        02/01/2022  03/01/2022  1

Using the following code I am able to flag when the department changes for a given case.

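A flag of this kind could be built roughly as follows. This is only a minimal sketch, not necessarily the code used in the question: it assumes the columns are named `ID` and `Department` as in the table above, that the rows are already in chronological order within each ID, and that the flag is stored in `Dept_Change_Count` (the name mentioned in the question). The date columns are omitted because they do not affect the flag.

```python
import pandas as pd

# Sample data from the question (date columns left out for brevity).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Department": ["Sales", "Sales", "Operations", "Sales", "Finance", "Risk"],
})

# Previous row's department within the same ID (NaN on each ID's first row).
prev_dept = df.groupby("ID")["Department"].shift()

# 1 where the department differs from the previous row of the same ID, else 0.
# The notna() check keeps the first row of each ID from counting as a change.
df["Dept_Change_Count"] = (
    df["Department"].ne(prev_dept) & prev_dept.notna()
).astype(int)

print(df)
```

With the sample data this marks rows 3 and 4 of case A and row 2 of case B, so a per-ID running sum of the flag would give the desired counter.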

I’m thinking I could use df['Dept_Change_Count'] with a running sum within each ID to generate the output I’m after, but I haven’t had much luck so far.

Any help greatly appreciated!

Answer

Compare each row's Department with the previous row's Department within the same ID to flag the changes, then group by ID again and take the cumulative sum (cumsum) of those flags to generate the counter.

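The two-step approach described above can be sketched like this. It is an illustrative version under stated assumptions, not necessarily the answer's exact code: the column names `ID` and `Department` and the output name `Count of Department Change` are taken from the question, and the rows are assumed to be sorted chronologically within each ID.

```python
import pandas as pd

# Sample data from the question (date columns left out for brevity).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Department": ["Sales", "Sales", "Operations", "Sales", "Finance", "Risk"],
})

# Step 1: compare each row with the previous row of the same ID.
prev_dept = df.groupby("ID")["Department"].shift()
changed = df["Department"].ne(prev_dept) & prev_dept.notna()

# Step 2: running sum of the change flags, restarted for every ID.
df["Count of Department Change"] = (
    changed.astype(int).groupby(df["ID"]).cumsum()
)

print(df)
```

Because the comparison is always against the immediately preceding row, returning to a department the case has already been in (Sales, row 4 of case A) still counts as a new change.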

Alternative approach using a single groupby with a lambda function to calculate the cumsum:

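A single-groupby variant might look like the sketch below. Again this is an assumption-laden illustration rather than the answer's exact code: it uses the column names from the question and relies on the rows being in chronological order within each ID. Inside each group, the first row always compares unequal to its (missing) predecessor, so the running sum starts at 1 and is shifted down by one.

```python
import pandas as pd

# Sample data from the question (date columns left out for brevity).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B"],
    "Department": ["Sales", "Sales", "Operations", "Sales", "Finance", "Risk"],
})

# For each ID: flag rows that differ from the previous row, take the
# running sum, and subtract 1 so the first row of every ID starts at 0.
df["Count of Department Change"] = (
    df.groupby("ID")["Department"]
      .transform(lambda s: s.ne(s.shift()).cumsum() - 1)
      .astype(int)
)

print(df)
```

This is more compact than the two-step version, though a Python-level lambda called once per group can be slower on frames with very many IDs.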

User contributions licensed under: CC BY-SA