I have hourly data on electricity consumption for a specific day. I would like to use this data to “predict” the hourly electricity consumption for the following days. The value for the following day should be the value from the same hour the day before, multiplied by a scaling factor f
(e.g. 2).
The dataframe df
that I have looks something like this:
load_kWh 2021-01-01 00:00:00 1.0 2021-01-01 01:00:00 1.0 2021-01-01 02:00:00 1.0 2021-01-01 03:00:00 1.0 2021-01-01 04:00:00 1.0 2021-01-01 05:00:00 1.0 2021-01-01 06:00:00 1.0 2021-01-01 07:00:00 3.0 2021-01-01 08:00:00 3.0 2021-01-01 09:00:00 3.0 2021-01-01 10:00:00 3.0 2021-01-01 11:00:00 3.0 2021-01-01 12:00:00 3.0 2021-01-01 13:00:00 3.0 2021-01-01 14:00:00 3.0 2021-01-01 15:00:00 3.0 2021-01-01 16:00:00 3.0 2021-01-01 17:00:00 3.0 2021-01-01 18:00:00 3.0 2021-01-01 19:00:00 3.0 2021-01-01 20:00:00 1.0 2021-01-01 21:00:00 1.0 2021-01-01 22:00:00 1.0 2021-01-01 23:00:00 1.0
And I would like the output dataframe df_ex
to look something like this:
load_kWh 2021-01-01 00:00:00 1.0 2021-01-01 01:00:00 1.0 2021-01-01 02:00:00 1.0 2021-01-01 03:00:00 1.0 2021-01-01 04:00:00 1.0 2021-01-01 05:00:00 1.0 2021-01-01 06:00:00 1.0 2021-01-01 07:00:00 3.0 2021-01-01 08:00:00 3.0 2021-01-01 09:00:00 3.0 2021-01-01 10:00:00 3.0 2021-01-01 11:00:00 3.0 2021-01-01 12:00:00 3.0 2021-01-01 13:00:00 3.0 2021-01-01 14:00:00 3.0 2021-01-01 15:00:00 3.0 2021-01-01 16:00:00 3.0 2021-01-01 17:00:00 3.0 2021-01-01 18:00:00 3.0 2021-01-01 19:00:00 3.0 2021-01-01 20:00:00 1.0 2021-01-01 21:00:00 1.0 2021-01-01 22:00:00 1.0 2021-01-01 23:00:00 1.0 2021-01-02 00:00:00 2.0 2021-01-02 01:00:00 2.0 2021-01-02 02:00:00 2.0 2021-01-02 03:00:00 2.0 2021-01-02 04:00:00 2.0 2021-01-02 05:00:00 2.0 2021-01-02 06:00:00 2.0 2021-01-02 07:00:00 6.0 2021-01-02 08:00:00 6.0 2021-01-02 09:00:00 6.0 2021-01-02 10:00:00 6.0 2021-01-02 11:00:00 6.0 2021-01-02 12:00:00 6.0 2021-01-02 13:00:00 6.0 2021-01-02 14:00:00 6.0 2021-01-02 15:00:00 6.0 2021-01-02 16:00:00 6.0 2021-01-02 17:00:00 6.0 2021-01-02 18:00:00 6.0 2021-01-02 19:00:00 6.0 2021-01-02 20:00:00 2.0 2021-01-02 21:00:00 2.0 2021-01-02 22:00:00 2.0 2021-01-02 23:00:00 2.0 2021-01-03 00:00:00 4.0 2021-01-03 01:00:00 4.0 2021-01-03 02:00:00 4.0 2021-01-03 03:00:00 4.0 2021-01-03 04:00:00 4.0 2021-01-03 05:00:00 4.0 2021-01-03 06:00:00 4.0 2021-01-03 07:00:00 12.0 2021-01-03 08:00:00 12.0 2021-01-03 09:00:00 12.0 2021-01-03 10:00:00 12.0 2021-01-03 11:00:00 12.0 2021-01-03 12:00:00 12.0 2021-01-03 13:00:00 12.0 2021-01-03 14:00:00 12.0 2021-01-03 15:00:00 12.0 2021-01-03 16:00:00 4.0 2021-01-03 17:00:00 4.0 2021-01-03 18:00:00 4.0 2021-01-03 19:00:00 4.0 2021-01-03 20:00:00 4.0 2021-01-03 21:00:00 4.0 2021-01-03 22:00:00 4.0 2021-01-03 23:00:00 4.0
I have tried the following solution (df
as defined above):
import pandas as pd import datetime start = '2021-01-01 00:00' end = '2021-01-03 23:00' freq = 'H' index = pd.date_range(start, end, freq=freq) df_ex = df.reindex(index) i = df_ex.index[0].day f = 2.0 df_ex.loc[df_ex.index.day == i+1] = df_ex.loc[df_ex.index.day == i] * f print(df_ex)
Which results in:
load_kWh 2021-01-01 00:00:00 1.0 2021-01-01 01:00:00 1.0 2021-01-01 02:00:00 1.0 2021-01-01 03:00:00 1.0 2021-01-01 04:00:00 1.0 ... ... 2021-01-03 19:00:00 NaN 2021-01-03 20:00:00 NaN 2021-01-03 21:00:00 NaN 2021-01-03 22:00:00 NaN 2021-01-03 23:00:00 NaN
It appears that my attempt to populate the rows after the first day with values did not succeed. The index is a DateTimeIndex.
Any suggestions on how to solve this would be greatly appreciated!
Advertisement
Answer
To create the data you need to iterate one day at a time.
Assuming that the original data has at least a full day of data then you can do:
import pandas as pd import itertools import datetime as dt start = "2021-01-01 00:00" end = "2021-01-01 23:00" freq = "H" df = pd.DataFrame( {"load_kWh": itertools.chain([1.0] * 7, [3.0] * 13, [1.0] * 4)}, index=pd.date_range(start, end, freq=freq), ) def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame: data = data.copy() for _ in range(number_of_days): day = data[-24:] day.index += dt.timedelta(days=1) day *= k data = pd.concat((data, day)) return data print(add_days_to_df(data=df, number_of_days=2, k=2.0))