Given the following DataFrame of pandas in Python:
Displays the switching on and off of 3 light bulbs at different times using date and time objects.
date ID_bulb switch using_time 1 2022-03-27 15:30:21+00:00 1 ON NaT 2 2022-03-29 17:05:21+00:00 1 OFF 2 days 01:35:00 3 2022-04-07 17:05:21+00:00 1 OFF NaT 4 2022-04-06 16:10:21+00:00 2 ON NaT 5 2022-04-07 15:30:21+00:00 2 OFF 0 days 23:20:00 6 2022-02-15 23:10:21+00:00 3 ON NaT 7 2022-02-16 02:10:21+00:00 3 OFF 0 days 04:00:00 8 2022-02-16 02:50:01+00:00 3 ON NaT 9 2022-02-18 10:50:01+00:00 3 OFF 2 days 07:00:00 10 2022-02-04 19:40:21+00:00 4 ON NaT 11 2022-02-06 15:35:21+00:00 4 OFF 1 days 19:55:00 12 2022-02-23 20:10:21+00:00 4 ON NaT 13 2022-02-24 02:10:21+00:00 4 OFF 0 days 10:00:00 14 2022-03-14 12:10:21+00:00 5 ON NaT 15 2022-03-15 00:10:21+00:00 5 ON NaT 16 2022-03-16 05:10:21+00:00 5 OFF 0 days 05:00:00
I want to add a new column, called cost_days
. This column will include only for rows where the variable using_time
is different from NaT
. Information on how many times the light bulb has been on during at least n
hours in a row between a night period defined by start_time
to end_time
.
Example of the resulting DataFrame:
Given for example:
add_costdays_column(df, 5, "22:00:00", "07:00:00")
date ID_bulb switch using_time cost_days 1 2022-03-27 15:30:21+00:00 1 ON NaT 0 2 2022-03-29 17:05:21+00:00 1 OFF 2 days 01:35:00 2 3 2022-04-07 17:05:21+00:00 1 OFF NaT 0 4 2022-04-06 16:10:21+00:00 2 ON NaT 0 5 2022-04-07 15:30:21+00:00 2 OFF 0 days 23:20:00 1 6 2022-02-15 23:10:21+00:00 3 ON NaT 0 7 2022-02-16 02:10:21+00:00 3 OFF 0 days 04:00:00 0 8 2022-02-16 02:50:01+00:00 3 ON NaT 0 9 2022-02-18 10:50:01+00:00 3 OFF 2 days 08:00:00 2 10 2022-02-04 19:40:21+00:00 4 ON NaT 0 11 2022-02-06 15:35:21+00:00 4 OFF 1 days 19:55:00 2 12 2022-02-23 20:10:21+00:00 4 ON NaT 0 13 2022-02-24 02:10:21+00:00 4 OFF 0 days 06:00:00 0 14 2022-03-14 12:10:21+00:00 5 ON NaT 0 15 2022-03-15 00:10:21+00:00 5 ON NaT 0 16 2022-03-16 05:10:21+00:00 5 OFF 0 days 05:00:00 1
Only the row ON before the row OFF with a value other than NaT
in the using_time
column is taken into account for switching on the bulb. This is to simplify the problem, I will adapt it later to the specific case.
Advertisement
Answer
Use:
import pandas as pd df = pd.read_csv('TEST_STACK.csv') df = df.iloc[:df[df['using_time'].notna()].index[-1]+1] #solution g = (df['using_time'].notna()).sort_index(ascending=False).cumsum() g = (g-max(g)).abs() import numpy as np def rounder(x): v = pd.date_range(list(x)[-2], list(x)[-1], freq='1h') temp = pd.Series(v, index = v).between_time('22:00', '07:00') temp = len(temp)/9 return np.floor(temp) if np.mod(temp, 1.0) < 6/9 else np.ceil(temp)/9 temp = df.groupby(g)['date'].apply(lambda x: rounder(x)) df.loc[df[df['using_time'].notna()].index, 'new col']=temp.values df['new col'] = df['new col'].fillna(0)