I have data that look like the following
JavaScript
x
13
13
1
Device Time Condition
2
D1 01/11/2019 00:00 issue
3
D1 01/11/2019 00:15 issue
4
D1 01/11/2019 00:30 issue
5
D1 01/11/2019 00:45 issue
6
D1 01/11/2019 01:00 issue
7
D1 01/11/2019 01:15 Resolved
8
D1 01/11/2019 01:30 Resolved
9
D2 01/11/2019 01:45 issue
10
D2 01/11/2019 02:00 Resolved
11
D1 01/11/2019 01:45 issue
12
D1 01/11/2019 02:00 Resolved
13
I need to create a new column that will find the time between the first issue and the first resolved. I need a groupby statement that will keep the first issue and the first resolved for all the issues. Then find the time – When I use group by Device and condition it just kept one issue per device.
The desired output is like the following
JavaScript
1
13
13
1
Device Time Condition durationTofix
2
D1 01/11/2019 00:00 issue
3
D1 01/11/2019 00:15 issue
4
D1 01/11/2019 00:30 issue
5
D1 01/11/2019 00:45 issue
6
D1 01/11/2019 01:00 issue
7
D1 01/11/2019 01:15 Resolved 01:15
8
D1 01/11/2019 01:30 Resolved
9
D2 01/11/2019 01:45 issue
10
D2 01/11/2019 02:00 Resolved 00:15
11
D1 01/11/2019 01:45 issue
12
D1 01/11/2019 02:00 Resolved 00:15
13
As groupby Device and Condition is not enough I thought to create an index column
JavaScript
1
2
1
data["index"] = data.groupby(['Device','condition']).??? #Something like cumcount() but it is not cumcount in this case
2
Then use pivot table for the time calculations
JavaScript
1
3
1
H = data.pivot_table(index=['index','Device'], columns=['condition'], values='Timestamp',aggfunc=lambda x: x)
2
H['durationTofix'] = H['Resolved']- H['issue']
3
Advertisement
Answer
The biggest problem is how to group your issues/resolved properly, which can be done by a reversed cumsum
:
JavaScript
1
21
21
1
df["Time"] = pd.to_datetime(df["Time"])
2
3
df["group"] = (df["Condition"].eq("Resolved")&df["Condition"].shift(-1).eq("issue"))[::-1].cumsum()[::-1]
4
5
df["diff"] = (df[~df.duplicated(["Condition","group"])].groupby("group")["Time"].transform(lambda d: d.diff()))
6
7
print (df)
8
9
Device Time Condition group diff
10
0 D1 2019-01-11 00:00:00 issue 2 NaT
11
1 D1 2019-01-11 00:15:00 issue 2 NaT
12
2 D1 2019-01-11 00:30:00 issue 2 NaT
13
3 D1 2019-01-11 00:45:00 issue 2 NaT
14
4 D1 2019-01-11 01:00:00 issue 2 NaT
15
5 D1 2019-01-11 01:15:00 Resolved 2 01:15:00
16
6 D1 2019-01-11 01:30:00 Resolved 2 NaT
17
7 D2 2019-01-11 01:45:00 issue 1 NaT
18
8 D2 2019-01-11 02:00:00 Resolved 1 00:15:00
19
9 D1 2019-01-11 01:45:00 issue 0 NaT
20
10 D1 2019-01-11 02:00:00 Resolved 0 00:15:00
21