I then have a function which looks for a specific date (in this case, 2022-01-26):
    def get_days(data, date):
        df = pd.read_csv(data)
        df = df[(df['date'] >= date) & (df['date'] <= date)]
        get_trading_session_times(df)
Which returns:
                v      vw     o     c     h     l    n        date      time
    0    134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
    1     72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
    2     51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
    3     40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
    4     76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
    ..        ...     ...   ...   ...   ...   ...  ...         ...       ...
    868     176.0  3.1300  3.13  3.13  3.13  3.13    2  2022-01-26  23:53:00
    869     550.0  3.1200  3.12  3.12  3.12  3.12    3  2022-01-26  23:56:00
    870     460.0  3.1211  3.12  3.12  3.12  3.12    3  2022-01-26  23:57:00
    871    1175.0  3.1201  3.12  3.12  3.12  3.12    6  2022-01-26  23:58:00
    872     559.0  3.1102  3.11  3.11  3.11  3.11    5  2022-01-26  23:59:00

    [873 rows x 9 columns]
When I then try to look for only times between 09:00 and 09:30 like so:
    def get_trading_session_times(df):
        df = df['time'].between_time('09:00', '09:30')
        print(df)
I get the following error:
Index must be DatetimeIndex when filtering dataframe
Full code:
    import pandas as pd

    data = 'dataBBIG.csv'
    date = '2022-01-26'

    def get_days(data, date):
        df = pd.read_csv(data)
        df = df[(df['date'] >= date) & (df['date'] <= date)]
        get_trading_session_times(df)

    def get_trading_session_times(df):
        df = df['time'].between_time('09:00', '09:30')
        print(df)

    get_days(data, date)
What am I doing wrong?
Answer
between_time is only valid if your index is a DatetimeIndex. As your time strings are well formatted (zero-padded HH:MM:SS), you can use between to compare them instead, because such values sort correctly in lexicographical order.
    >>> # Full HH:MM:SS bounds: as strings, '09:30:00' sorts after '09:30',
    >>> # so an upper bound of '09:30' would drop the 09:30:00 row.
    >>> df[df['time'].between('09:00:00', '09:30:00')]
              v      vw     o     c     h     l    n        date      time
    0  134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
    1   72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
    2   51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
    3   40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
    4   76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
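If you would rather keep between_time, one option is to give the frame a DatetimeIndex first. The following is only a minimal sketch: it assumes the CSV is comma-separated with the column names shown in the question, and it uses an in-memory sample (three rows copied from the question's output) in place of dataBBIG.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for dataBBIG.csv: assumes a comma-separated file
# with the same columns as in the question.
csv_text = """v,vw,o,c,h,l,n,date,time
134730.0,3.6805,3.60,3.61,3.90,3.58,494,2022-01-26,09:00:00
72594.0,3.6324,3.60,3.62,3.70,3.57,376,2022-01-26,09:01:00
176.0,3.1300,3.13,3.13,3.13,3.13,2,2022-01-26,23:53:00
"""
df = pd.read_csv(io.StringIO(csv_text))

# Combine the date and time strings into one timestamp per row
# and use it as the index, which makes it a DatetimeIndex.
df.index = pd.to_datetime(df["date"] + " " + df["time"])

# between_time now works; both endpoints are included by default.
session = df.between_time("09:00", "09:30")
print(session)
```

Note that between_time is called on the DataFrame itself here, not on the time column, since it filters on the index.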
Update
If your time column contains time objects:
    from datetime import time

    df['time'] = pd.to_datetime(df['time']).dt.time
    out = df[df['time'].between(time(9, 0), time(9, 30))]
    print(out)

    # Output
              v      vw     o     c     h     l    n        date      time
    0  134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
    1   72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
    2   51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
    3   40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
    4   76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
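One practical difference between string bounds and time-object bounds shows up at the upper endpoint. Strings are compared character by character, so a short bound such as '09:30' sorts before '09:30:00', whereas time(9, 30) is equal to 09:30:00. The sketch below illustrates this on a small made-up Series (the three sample times are illustrative, not taken from the data file), with an explicit format passed to pd.to_datetime.

```python
from datetime import time
import pandas as pd

# Illustrative sample values around the upper bound.
times = pd.Series(["09:29:00", "09:30:00", "09:31:00"])

# String comparison: "09:30:00" is longer than "09:30" with the same prefix,
# so it sorts after the bound and is excluded.
as_strings = times.between("09:00", "09:30")

# time objects compare by value: 09:30:00 equals time(9, 30) and is included.
time_objects = pd.to_datetime(times, format="%H:%M:%S").dt.time
as_objects = time_objects.between(time(9, 0), time(9, 30))

print(as_strings.tolist())
print(as_objects.tolist())
```

If the string approach is kept, writing the bounds in the same HH:MM:SS form as the column values avoids the mismatch.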