Pandas datetime filter

I want to get the subset of my dataframe where the date is before 2022-04-22. The original df looks like this:

df:

    date       hour    value
0  2022-04-21  0       10   
1  2022-04-21  1       12   
2  2022-04-21  2       14 
3  2022-04-23  0       10   
4  2022-04-23  1       12   
5  2022-04-23  2       14   

I checked the data types with df.dtypes, and the 'date' column is reported as 'object'.

So I checked an individual cell with df['date'][0], and it is datetime.date(2022, 4, 21).

Also, df['date'][0] < datetime.date(2022, 4, 22) gave me True.

However, when I applied this comparison to the whole dataframe with

df2 = df[df['date'] < datetime.date(2022, 4, 22)]

it raised TypeError: '<' not supported between instances of 'str' and 'datetime.date'

Why was this happening? Thanks in advance!


Answer

You most likely still have string dates in some of your rows. The first element may be a datetime.date, but comparing every value in the column with "<" fails as soon as it reaches a string.
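To confirm that the column really mixes types, one option is to count the Python type of each value and pull out the rows that hold strings. This is a minimal sketch; the sample data and the names counts, is_str and str_rows are made up for illustration.

```python
import datetime
import pandas as pd

# Hypothetical sample: three datetime.date objects and one stray string.
df = pd.DataFrame({
    "date": [
        datetime.date(2022, 4, 21),
        datetime.date(2022, 4, 21),
        datetime.date(2022, 4, 23),
        "2022-04-23",
    ],
    "hour": [0, 1, 0, 1],
    "value": [10, 12, 10, 12],
})

# Count how many values of each Python type the column holds.
counts = df["date"].map(lambda d: type(d).__name__).value_counts()
print(counts)

# Boolean mask selecting the rows whose date is still a string.
is_str = df["date"].map(lambda d: isinstance(d, str))
str_rows = df[is_str]
print(str_rows)
```

If the count for str is above zero, those are the rows that break the column-wide comparison.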

You can either use timegeb's answer from the comments:

df['date'] = pd.to_datetime(df['date'])

or convert the values element-wise:

import datetime
# Parse only the string entries; leave existing date objects untouched.
df['date'] = [datetime.datetime.strptime(d, '%Y-%m-%d') if isinstance(d, str) else d for d in df['date']]

Both methods might fail if any row contains a string that is not a valid date. In that case you can use:

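import numpy as np

def convstr2date(d):
    # Only strings need parsing; other values pass through unchanged.
    if isinstance(d, str):
        try:
            d = datetime.datetime.strptime(d, '%Y-%m-%d')
        except ValueError:
            # Strings that do not match the format become NaT (not-a-time).
            d = np.datetime64('NaT')
    return d

df['date'] = [convstr2date(d) for d in df['date']]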
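As a possible alternative to a hand-written helper, pd.to_datetime accepts errors='coerce', which turns values it cannot parse into NaT. The sketch below uses invented sample data and then filters against a pd.Timestamp, since the column becomes datetime64 after conversion.

```python
import datetime
import pandas as pd

# Hypothetical sample: date objects, one valid date string, one invalid string.
df = pd.DataFrame({
    "date": [
        datetime.date(2022, 4, 21),
        "2022-04-21",
        "not a date",
        datetime.date(2022, 4, 23),
    ],
    "hour": [0, 1, 2, 0],
    "value": [10, 12, 14, 10],
})

# Convert everything to datetime64; unparseable entries become NaT.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Comparisons against NaT evaluate to False, so invalid rows are dropped by the filter.
df2 = df[df["date"] < pd.Timestamp("2022-04-22")]
print(df2)
```

Rows whose date could not be parsed can be inspected separately with df[df["date"].isna()] before deciding whether to discard them.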
User contributions licensed under: CC BY-SA