Here’s some made up time series data on 1 minute intervals:
JavaScript
x
12
12
1
import pandas as pd
2
import numpy as np
3
import random
4
5
random.seed(5)
6
7
8
rows,cols = 8760,3
9
data = np.random.rand(rows,cols)
10
tidx = pd.date_range('2019-01-01', periods=rows, freq='1T')
11
df = pd.DataFrame(data, columns=['condition1','condition2','condition3'], index=tidx)
12
This is just some code to create some Boolean columns
JavaScript
1
7
1
df['condition1_bool'] = df['condition1'].lt(.1)
2
df['condition2_bool'] = df['condition2'].lt(df['condition1']) & df['condition2'].gt(df['condition3'])
3
df['condition3_bool'] = df['condition3'].gt(.9)
4
5
df = df[['condition1_bool','condition2_bool','condition3_bool']]
6
df = df.astype(int)
7
On my screen this prints:
JavaScript
1
7
1
condition1_bool condition2_bool condition3_bool
2
2019-01-01 00:00:00 0 0 0
3
2019-01-01 00:01:00 0 0 1 <---- Count as same event!
4
2019-01-01 00:02:00 0 0 1 <---- Count as same event!
5
2019-01-01 00:03:00 1 0 0
6
2019-01-01 00:04:00 0 0 0
7
What I am trying to figure out is how to rollup per hour cumulative events (True or 1
) but if there is no 0 between events, its the same event! Hopefully that makes sense what I was describing above on the <---- Count as same event!
If I do:
JavaScript
1
2
1
df = df.resample('H').sum()
2
This will just resample and count all events, right regardless of the time series commitment I was trying to highlight with the <---- Count as same event!
Thanks for any tips!!
Advertisement
Answer
Check if the current row (“2019-01-01 00:02:00”) equals to 1 and check if the previous row (“2019-01-01 00:01:00”) is not equal to 1. This removes consecutive 1 of the sum.
JavaScript
1
17
17
1
>>> df.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum())
2
3
condition1_bool condition2_bool condition3_bool
4
2019-01-01 00:00:00 4 8 4
5
2019-01-01 01:00:00 9 7 6
6
2019-01-01 02:00:00 7 14 4
7
2019-01-01 03:00:00 2 8 7
8
2019-01-01 04:00:00 4 9 5
9
10
2019-01-06 21:00:00 4 8 2
11
2019-01-06 22:00:00 3 11 4
12
2019-01-06 23:00:00 6 11 4
13
2019-01-07 00:00:00 8 7 8
14
2019-01-07 01:00:00 4 9 6
15
16
[146 rows x 3 columns]
17
Using your code:
JavaScript
1
17
17
1
>>> df.resample('H').sum()
2
3
condition1_bool condition2_bool condition3_bool
4
2019-01-01 00:00:00 5 8 5
5
2019-01-01 01:00:00 9 8 6
6
2019-01-01 02:00:00 7 14 5
7
2019-01-01 03:00:00 2 9 7
8
2019-01-01 04:00:00 4 11 5
9
10
2019-01-06 21:00:00 5 11 3
11
2019-01-06 22:00:00 3 15 4
12
2019-01-06 23:00:00 6 12 4
13
2019-01-07 00:00:00 8 7 10
14
2019-01-07 01:00:00 4 9 7
15
16
[146 rows x 3 columns]
17
Check:
JavaScript
1
5
1
dti = pd.date_range('2021-11-15 21:00:00', '2021-11-15 22:00:00',
2
closed='left', freq='T')
3
4
df1 = pd.DataFrame({'c1': 1}, index=dti)
5
JavaScript
1
8
1
>>> df1.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum())
2
c1
3
2021-11-15 21:00:00 1
4
5
>>> df1.resample('H').sum()
6
c1
7
2021-11-15 21:00:00 60
8