I need to calculate Total Hours and Hours by Status per Week using Python / Pandas GROUP BY.
Id Week Status Hours 1 01/10/2022 - 01/16/2022 On 5 2 01/10/2022 - 01/16/2022 Off 2 3 01/17/2022 - 01/23/2022 Off 6 4 01/17/2022 - 01/23/2022 On 1 5 01/17/2022 - 01/23/2022 On 5 6 01/03/2022 - 01/09/2022 On 10 7 01/10/2022 - 01/16/2022 Off 9 8 01/03/2022 - 01/09/2022 On 3 9 01/24/2022 - 01/30/2022 Off 4 10 01/24/2022 - 01/30/2022 On 7
test_data = {'Id': [1,2,3,4,5,6,7,8,9,10], 'Week': ['01/10/2022 - 01/16/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/17/2022 - 01/23/2022', '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/03/2022 - 01/09/2022', '01/24/2022 - 01/30/2022', '01/24/2022 - 01/30/2022'], 'Status': ['On', 'Off', 'Off', 'On', 'On', 'On', 'Off', 'On', 'Off', 'On'], 'Hours': [5,2,6,1,5,10,9,3,4,7]} test_df = pd.DataFrame(data=test_data)
I can get Total Hours by each Week:
test_df.groupby(by=['Week'], as_index=False).agg({"Hours": "sum"})
But I don’t know how to also group by Status, so it will be 2 additional columns (On Status Hours and Off Status Hours)
If I add Status column just to the groupby part, it creates extra rows (I understand why)
test_df.groupby(by=['Week', 'Status'], as_index=False).agg({"Hours": "sum"})
Output I want:
Week | Total Hours | On Status Hours | Off Status Hours |
---|---|---|---|
01/03/2022 – 01/09/2022 | 13 | 13 | 0 |
01/10/2022 – 01/16/2022 | 16 | 5 | 11 |
01/17/2022 – 01/23/2022 | 12 | 6 | 6 |
01/24/2022 – 01/30/2022 | 11 | 7 | 4 |
Advertisement
Answer
You can use:
(test_df .groupby(['Week', 'Status'])['Hours'] .sum() .unstack(1, fill_value=0) .add_suffix(' Status Hours') .assign(**{'Total Hours': lambda d: d.sum(1)}) )
Output:
Status Off Status Hours On Status Hours Total Hours Week 01/03/2022 - 01/09/2022 0 13 13 01/10/2022 - 01/16/2022 11 5 16 01/17/2022 - 01/23/2022 6 6 12 01/24/2022 - 01/30/2022 4 7 11