Here’s some made up time series data on 1 minute intervals:
import pandas as pd import numpy as np import random random.seed(5) rows,cols = 8760,3 data = np.random.rand(rows,cols) tidx = pd.date_range('2019-01-01', periods=rows, freq='1T') df = pd.DataFrame(data, columns=['condition1','condition2','condition3'], index=tidx)
This is just some code to create some Boolean columns
df['condition1_bool'] = df['condition1'].lt(.1) df['condition2_bool'] = df['condition2'].lt(df['condition1']) & df['condition2'].gt(df['condition3']) df['condition3_bool'] = df['condition3'].gt(.9) df = df[['condition1_bool','condition2_bool','condition3_bool']] df = df.astype(int)
On my screen this prints:
condition1_bool condition2_bool condition3_bool 2019-01-01 00:00:00 0 0 0 2019-01-01 00:01:00 0 0 1 <---- Count as same event! 2019-01-01 00:02:00 0 0 1 <---- Count as same event! 2019-01-01 00:03:00 1 0 0 2019-01-01 00:04:00 0 0 0
What I am trying to figure out is how to rollup per hour cumulative events (True or 1
) but if there is no 0 between events, its the same event! Hopefully that makes sense what I was describing above on the <---- Count as same event!
If I do:
df = df.resample('H').sum()
This will just resample and count all events, right regardless of the time series commitment I was trying to highlight with the <---- Count as same event!
Thanks for any tips!!
Advertisement
Answer
Check if the current row (“2019-01-01 00:02:00”) equals to 1 and check if the previous row (“2019-01-01 00:01:00”) is not equal to 1. This removes consecutive 1 of the sum.
>>> df.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum()) condition1_bool condition2_bool condition3_bool 2019-01-01 00:00:00 4 8 4 2019-01-01 01:00:00 9 7 6 2019-01-01 02:00:00 7 14 4 2019-01-01 03:00:00 2 8 7 2019-01-01 04:00:00 4 9 5 ... ... ... ... 2019-01-06 21:00:00 4 8 2 2019-01-06 22:00:00 3 11 4 2019-01-06 23:00:00 6 11 4 2019-01-07 00:00:00 8 7 8 2019-01-07 01:00:00 4 9 6 [146 rows x 3 columns]
Using your code:
>>> df.resample('H').sum() condition1_bool condition2_bool condition3_bool 2019-01-01 00:00:00 5 8 5 2019-01-01 01:00:00 9 8 6 2019-01-01 02:00:00 7 14 5 2019-01-01 03:00:00 2 9 7 2019-01-01 04:00:00 4 11 5 ... ... ... ... 2019-01-06 21:00:00 5 11 3 2019-01-06 22:00:00 3 15 4 2019-01-06 23:00:00 6 12 4 2019-01-07 00:00:00 8 7 10 2019-01-07 01:00:00 4 9 7 [146 rows x 3 columns]
Check:
dti = pd.date_range('2021-11-15 21:00:00', '2021-11-15 22:00:00', closed='left', freq='T') df1 = pd.DataFrame({'c1': 1}, index=dti)
>>> df1.resample('H').apply(lambda x: (x.eq(1) & x.shift().ne(1)).sum()) c1 2021-11-15 21:00:00 1 >>> df1.resample('H').sum() c1 2021-11-15 21:00:00 60