I have a pandas dataframe along these lines, based on where a customer service case sits before being closed. Every time the case is edited and audit trial is captured. I want to generate a counter for each time the Department of a case changes from the department it was previously in.
ID | Department | Start Date | End Date |
---|---|---|---|
A | Sales | 01/01/2022 | 02/01/2022 |
A | Sales | 02/01/2022 | 03/01/2022 |
A | Operations | 03/01/2022 | 04/01/2022 |
A | Sales | 04/01/2022 | 05/01/2022 |
B | Finance | 01/01/2022 | 02/01/2022 |
B | Risk | 02/01/2022 | 03/01/2022 |
The output I want to achieve is shown below, the part I am struggling with is getting the ‘Count of Department Change’ value to increase when the ticket returns to a department it has already been in.
ID | Department | Start Date | End Date | Count of Department Change |
---|---|---|---|---|
A | Sales | 01/01/2022 | 02/01/2022 | 0 |
A | Sales | 02/01/2022 | 03/01/2022 | 0 |
A | Operations | 03/01/2022 | 04/01/2022 | 1 |
A | Sales | 04/01/2022 | 05/01/2022 | 2 |
B | Finance | 01/01/2022 | 02/01/2022 | 0 |
B | Risk | 02/01/2022 | 03/01/2022 | 1 |
Using the following code I am able to flag when the department changes for a given case.
df['Dept_Change_Count'] = np.where((df['Department'] != df['Department'].shift(1)) & (df['ID'] == df['ID'].shift(1)), '1', '0')
I’m thinking I could use the df[‘Dept_Change_Count’] and a running sum along the ID to generate the output I’m after but I haven’t had much luck so far.
Any help greatly appreciated!
Advertisement
Answer
Compare previous and current row in Department
per ID
then again group
by ID
and calculate cumsum
to generate counter
m = df['Department'] != df.groupby('ID')['Department'].shift() df['Dept_Change_Count'] = m.groupby(df['ID']).cumsum() - 1
Alternative approach using a single groupby
with lambda func to calculate cumsum:
df['Dept_Change_Count'] = df.groupby('ID')['Department'] .apply(lambda s: (s != s.shift()).cumsum()) - 1
ID Department Start Date End Date Dept_Change_Count 0 A Sales 01/01/2022 02/01/2022 0 1 A Sales 02/01/2022 03/01/2022 0 2 A Operations 03/01/2022 04/01/2022 1 3 A Sales 04/01/2022 05/01/2022 2 4 B Finance 01/01/2022 02/01/2022 0 5 B Risk 02/01/2022 03/01/2022 1