Skip to content
Advertisement

Python Pandas, Running Sum, based on previous rows value and grouped

I have a pandas dataframe along these lines, based on where a customer service case sits before being closed. Every time the case is edited and audit trial is captured. I want to generate a counter for each time the Department of a case changes from the department it was previously in.

ID Department Start Date End Date
A Sales 01/01/2022 02/01/2022
A Sales 02/01/2022 03/01/2022
A Operations 03/01/2022 04/01/2022
A Sales 04/01/2022 05/01/2022
B Finance 01/01/2022 02/01/2022
B Risk 02/01/2022 03/01/2022

The output I want to achieve is shown below, the part I am struggling with is getting the ‘Count of Department Change’ value to increase when the ticket returns to a department it has already been in.

ID Department Start Date End Date Count of Department Change
A Sales 01/01/2022 02/01/2022 0
A Sales 02/01/2022 03/01/2022 0
A Operations 03/01/2022 04/01/2022 1
A Sales 04/01/2022 05/01/2022 2
B Finance 01/01/2022 02/01/2022 0
B Risk 02/01/2022 03/01/2022 1

Using the following code I am able to flag when the department changes for a given case.

df['Dept_Change_Count'] = np.where((df['Department'] != df['Department'].shift(1)) & (df['ID'] == df['ID'].shift(1)), '1', '0')

I’m thinking I could use the df[‘Dept_Change_Count’] and a running sum along the ID to generate the output I’m after but I haven’t had much luck so far.

Any help greatly appreciated!

Advertisement

Answer

Compare previous and current row in Department per ID then again group by ID and calculate cumsum to generate counter

m = df['Department'] != df.groupby('ID')['Department'].shift()
df['Dept_Change_Count'] = m.groupby(df['ID']).cumsum() - 1

Alternative approach using a single groupby with lambda func to calculate cumsum:

df['Dept_Change_Count'] = df.groupby('ID')['Department']
                            .apply(lambda s: (s != s.shift()).cumsum()) - 1

  ID  Department  Start Date    End Date  Dept_Change_Count
0  A       Sales  01/01/2022  02/01/2022                  0
1  A       Sales  02/01/2022  03/01/2022                  0
2  A  Operations  03/01/2022  04/01/2022                  1
3  A       Sales  04/01/2022  05/01/2022                  2
4  B     Finance  01/01/2022  02/01/2022                  0
5  B        Risk  02/01/2022  03/01/2022                  1
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement