
Index must be DatetimeIndex when filtering dataframe

I then have a function which looks for a specific date (in this case, 2022-01-26):

def get_days(data, date):

    df = pd.read_csv(data)

    df = df[(df['date'] >= date) & (df['date'] <= date)]

    get_trading_session_times(df)

Which returns:

            v      vw     o     c     h     l    n        date      time
0    134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
1     72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
2     51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
3     40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
4     76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
..        ...     ...   ...   ...   ...   ...  ...         ...       ...
868     176.0  3.1300  3.13  3.13  3.13  3.13    2  2022-01-26  23:53:00
869     550.0  3.1200  3.12  3.12  3.12  3.12    3  2022-01-26  23:56:00
870     460.0  3.1211  3.12  3.12  3.12  3.12    3  2022-01-26  23:57:00
871    1175.0  3.1201  3.12  3.12  3.12  3.12    6  2022-01-26  23:58:00
872     559.0  3.1102  3.11  3.11  3.11  3.11    5  2022-01-26  23:59:00

[873 rows x 9 columns]

When I then try to look for only times between 09:00 and 09:30 like so:

def get_trading_session_times(df):
    df = df['time'].between_time('09:00', '09:30')

    print(df)

I get the following error:

Index must be DatetimeIndex when filtering dataframe

Full code:

import pandas as pd

data = 'dataBBIG.csv'
date = '2022-01-26'

def get_days(data, date):

    df = pd.read_csv(data)

    df = df[(df['date'] >= date) & (df['date'] <= date)]

    get_trading_session_times(df)

def get_trading_session_times(df):
    df = df['time'].between_time('09:00', '09:30')

    print(df)


get_days(data, date)

What am I doing wrong?

Answer

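between_time is only valid if your index is a DatetimeIndex. Here df['time'] is a Series of strings with the default integer index, which is why the call fails.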
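
If you do want to use between_time itself, one option is to build a DatetimeIndex from the date and time columns first. The following is a minimal sketch, not the only way to do it: the small frame is made-up data standing in for dataBBIG.csv, and it assumes both columns are strings in the format shown in the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for dataBBIG.csv (values are made up).
df = pd.DataFrame({
    "date": ["2022-01-26"] * 4,
    "time": ["09:00:00", "09:30:00", "09:31:00", "23:59:00"],
    "c": [3.61, 3.40, 3.41, 3.11],
})

# Combine the two string columns into timestamps and use them as the index.
df.index = pd.DatetimeIndex(pd.to_datetime(df["date"] + " " + df["time"]))

# between_time now works; both endpoints are included by default.
session = df.between_time("09:00", "09:30")
print(session)
```

With this approach the 09:30:00 row is kept, because between_time includes both endpoints unless told otherwise.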

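Because your time strings are consistently zero-padded as HH:MM:SS, lexicographic order matches chronological order, so you can compare them directly with between. Give the bounds in the same HH:MM:SS form: a shorter bound such as '09:30' sorts before '09:30:00' and would leave that row out.

>>> df[df['time'].between('09:00:00', '09:30:00')]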
          v      vw     o     c     h     l    n        date      time
0  134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
1   72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
2   51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
3   40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
4   76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
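
A note on bounds when comparing strings: a string always sorts after any of its own prefixes, so the form of the bound decides whether the row at exactly 09:30:00 is included. The sketch below uses a made-up Series of times purely to illustrate this.

```python
import pandas as pd

# Made-up times for illustration only.
times = pd.Series(["09:00:00", "09:29:00", "09:30:00", "09:31:00"])

# Plain Python string comparison: the longer string sorts after its prefix.
print("09:30:00" <= "09:30")  # False

# Bound shorter than the values: the 09:30:00 row is dropped.
short_bounds = times[times.between("09:00", "09:30")]

# Bound in the same HH:MM:SS form: the 09:30:00 row is kept.
full_bounds = times[times.between("09:00:00", "09:30:00")]

print(short_bounds.tolist())
print(full_bounds.tolist())
```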

Update

If you would rather compare datetime.time objects than strings, convert the column first:

from datetime import time

df['time'] = pd.to_datetime(df['time']).dt.time
out = df[df['time'].between(time(9, 0), time(9, 30))]
print(out)

# Output
          v      vw     o     c     h     l    n        date      time
0  134730.0  3.6805  3.60  3.61  3.90  3.58  494  2022-01-26  09:00:00
1   72594.0  3.6324  3.60  3.62  3.70  3.57  376  2022-01-26  09:01:00
2   51828.0  3.6151  3.62  3.63  3.65  3.57  278  2022-01-26  09:02:00
3   40245.0  3.6343  3.63  3.65  3.65  3.62  191  2022-01-26  09:03:00
4   76428.0  3.6094  3.64  3.62  3.66  3.57  298  2022-01-26  09:04:00
User contributions licensed under: CC BY-SA