I want to get a subset of my dataframe where the date is before 2022-04-22. The original df looks like this:
df:
         date  hour  value
0  2022-04-21     0     10
1  2022-04-21     1     12
2  2022-04-21     2     14
3  2022-04-23     0     10
4  2022-04-23     1     12
5  2022-04-23     2     14
I checked the data types with df.dtypes, and it told me the 'date' column is 'object'.
So I checked an individual cell using df['date'][0], and it is datetime.date(2022, 4, 21).
Also, df['date'][0] < datetime.date(2022, 4, 22) gave me True.
However, when I tried to apply this comparison to the whole dataframe with
df2 = df[df['date'] < datetime.date(2022, 4, 22)]
it raised TypeError: '<' not supported between instances of 'str' and 'datetime.date'
Why was this happening? Thanks in advance!
Answer
You most likely still have string dates in some of your rows. The first element may be fine, but comparing every value in the column with "<" fails as soon as it reaches a string.
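To check whether that is what is going on, you can look at which Python types the column actually holds. This is only a sketch: the small dataframe below is made-up data with one string deliberately mixed in, and the names type_counts, is_str and str_rows are just for illustration.

```python
import datetime
import pandas as pd

# Hypothetical data: row 1 holds a string instead of a datetime.date
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-21", datetime.date(2022, 4, 23)],
    "hour": [0, 1, 0],
    "value": [10, 12, 10],
})

# Count how many entries of each Python type the object column contains
type_counts = df["date"].map(type).value_counts()

# Boolean mask marking the rows whose date is stored as a string
is_str = df["date"].map(lambda d: isinstance(d, str))
str_rows = df[is_str]
```

If str_rows is not empty, those are the rows that make the column-wide comparison fail.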
Either use timegeb's answer from the comments:
df['date'] = pd.to_datetime(df['date'])
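As a rough end-to-end sketch of this approach, using the values from the question with one date swapped for a string (that mix is an assumption about your data): after the conversion the column is a datetime64 dtype, so it seems safer to compare against a pd.Timestamp than against a datetime.date, since recent pandas versions may refuse the latter comparison.

```python
import datetime
import pandas as pd

# Data from the question; index 1 is a string here to mimic the mixed column (assumption)
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-21", datetime.date(2022, 4, 21),
             datetime.date(2022, 4, 23), datetime.date(2022, 4, 23), datetime.date(2022, 4, 23)],
    "hour": [0, 1, 2, 0, 1, 2],
    "value": [10, 12, 14, 10, 12, 14],
})

# Convert the whole column to a datetime64 dtype
df["date"] = pd.to_datetime(df["date"])

# Compare against a Timestamp rather than a datetime.date
cutoff = pd.Timestamp("2022-04-22")
df2 = df[df["date"] < cutoff]
```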
or convert them elementwise:
import datetime
# Parse only the string entries; .date() keeps every entry a datetime.date
df['date'] = [datetime.datetime.strptime(d, '%Y-%m-%d').date() if isinstance(d, str) else d for d in df['date']]
Both methods might fail if you have an odd string in any of your rows. In that case you can use:
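import datetime
import numpy as np

def convstr2date(d):
    # Convert string entries; anything that cannot be parsed becomes NaT
    if isinstance(d, str):
        try:
            d = datetime.datetime.strptime(d, '%Y-%m-%d').date()
        except ValueError:
            d = np.datetime64('NaT')
    return d

df['date'] = [convstr2date(d) for d in df['date']]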
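A shorter alternative that should behave similarly is pd.to_datetime with errors='coerce', which turns values it cannot parse into NaT instead of raising. The sketch below uses made-up data containing one odd string; note that the result is a datetime64 column, so you would filter it with a pd.Timestamp rather than a datetime.date.

```python
import datetime
import pandas as pd

# Hypothetical data: one date object, one valid string, one odd string
df = pd.DataFrame({
    "date": [datetime.date(2022, 4, 21), "2022-04-23", "not a date"],
    "value": [10, 12, 14],
})

# Unparseable entries become NaT instead of raising an error
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# NaT never compares as smaller, so those rows drop out of the filter
df2 = df[df["date"] < pd.Timestamp("2022-04-22")]
```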