Given the following DataFrame of pandas in Python: Displays the switching on and off of 3 light bulbs at different times using date and time objects. I want to add a new column, called cost_days. This column will include only for rows where the variable using_time is different from NaT. Information on how many times the light bulb has been

Calculate activity interval for a pandas DataFrame with datetime rows

Given the following DataFrame of pandas in Python:

Displays the switching on and off of 3 light bulbs at different times using date and time objects.

                        date       ID_bulb     switch       using_time
1  2022-03-27 15:30:21+00:00             1        ON               NaT
2  2022-03-29 17:05:21+00:00             1       OFF   2 days 01:35:00
3  2022-04-07 17:05:21+00:00             1       OFF               NaT
4  2022-04-06 16:10:21+00:00             2        ON               NaT
5  2022-04-07 15:30:21+00:00             2       OFF   0 days 23:20:00
6  2022-02-15 23:10:21+00:00             3        ON               NaT
7  2022-02-16 02:10:21+00:00             3       OFF   0 days 04:00:00
8  2022-02-16 02:50:01+00:00             3        ON               NaT
9  2022-02-18 10:50:01+00:00             3       OFF   2 days 07:00:00
10 2022-02-04 19:40:21+00:00             4        ON               NaT
11 2022-02-06 15:35:21+00:00             4       OFF   1 days 19:55:00
12 2022-02-23 20:10:21+00:00             4        ON               NaT
13 2022-02-24 02:10:21+00:00             4       OFF   0 days 10:00:00
14 2022-03-14 12:10:21+00:00             5        ON               NaT
15 2022-03-15 00:10:21+00:00             5        ON               NaT
16 2022-03-16 05:10:21+00:00             5       OFF   0 days 05:00:00

I want to add a new column, called cost_days. This column will include only for rows where the variable using_time is different from NaT. Information on how many times the light bulb has been on during at least n hours in a row between a night period defined by start_time to end_time.

Example of the resulting DataFrame:

Given for example:

add_costdays_column(df, 5, "22:00:00", "07:00:00")

                        date       ID_bulb     switch       using_time    cost_days
1  2022-03-27 15:30:21+00:00             1        ON               NaT         0
2  2022-03-29 17:05:21+00:00             1       OFF   2 days 01:35:00         2
3  2022-04-07 17:05:21+00:00             1       OFF               NaT         0
4  2022-04-06 16:10:21+00:00             2        ON               NaT         0
5  2022-04-07 15:30:21+00:00             2       OFF   0 days 23:20:00         1
6  2022-02-15 23:10:21+00:00             3        ON               NaT         0
7  2022-02-16 02:10:21+00:00             3       OFF   0 days 04:00:00         0
8  2022-02-16 02:50:01+00:00             3        ON               NaT         0
9  2022-02-18 10:50:01+00:00             3       OFF   2 days 08:00:00         2
10 2022-02-04 19:40:21+00:00             4        ON               NaT         0
11 2022-02-06 15:35:21+00:00             4       OFF   1 days 19:55:00         2
12 2022-02-23 20:10:21+00:00             4        ON               NaT         0
13 2022-02-24 02:10:21+00:00             4       OFF   0 days 06:00:00         0
14 2022-03-14 12:10:21+00:00             5        ON               NaT         0
15 2022-03-15 00:10:21+00:00             5        ON               NaT         0
16 2022-03-16 05:10:21+00:00             5       OFF   0 days 05:00:00         1

Only the row ON before the row OFF with a value other than NaT in the using_time column is taken into account for switching on the bulb. This is to simplify the problem, I will adapt it later to the specific case.

Answer

Use:

import pandas as pd
df = pd.read_csv('TEST_STACK.csv')
df = df.iloc[:df[df['using_time'].notna()].index[-1]+1]
#solution
g = (df['using_time'].notna()).sort_index(ascending=False).cumsum()
g = (g-max(g)).abs()
import numpy as np
def rounder(x):
      v = pd.date_range(list(x)[-2], list(x)[-1], freq='1h')
      temp = pd.Series(v, index = v).between_time('22:00', '07:00')
      temp = len(temp)/9
      return np.floor(temp) if np.mod(temp, 1.0) < 6/9 else np.ceil(temp)/9
temp = df.groupby(g)['date'].apply(lambda x: rounder(x))
df.loc[df[df['using_time'].notna()].index, 'new col']=temp.values
df['new col'] = df['new col'].fillna(0)

Advertisement

Answer