date_cet col1 --------------------------------------- 2021-10-31 02:00:00+02:00 7.0 2021-10-31 02:00:00+02:00 7.0 2021-10-31 02:00:00+02:00 8.0 2021-10-31 02:00:00+01:00 10.0 2021-10-31 02:00:00+01:00 11.0
I have a data frame that has columns looking similar to this. This data is imported from SQL into a Pandas data frame, and when I print out the dtypes
I can see that the date_cet
column is object
. Since I need it further on, I want to convert it to a datetime object. However, the stuff I’ve tried just doesn’t work, and I think it might have something to do with 1) the timezone difference and 2) the fact that this date is where DST changes (i.e. the +01:00
and +02:00
).
I’ve tried to do stuff like this:
import datetime as dt df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S %z") for x in df["date_cet"]] df['new_date']= pd.to_datetime(df['date_cet'])
and a hand full of other stuff.
The first gives an error of:
ValueError: time data '2021-10-31 02:00:00+02:00' does not match format '%Y-%m-%d %H:%M:%S %z'
And the last:
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
Basically, I have no idea how to fix this. I just need this column to become a datetime[ns, Europe/Copenhagen] type of column, but everything I’ve done so far doesn’t work.
Advertisement
Answer
In the datetime string (‘2021-10-31 02:00:00+02:00’) there is no space between %S
and %z
try to change to this format – "%Y-%m-%d %H:%M:%S%z"
import datetime as dt df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]] df['new_date']= pd.to_datetime(df['date_cet'])
Update:
to fix the error try adding – utc=True
:
import datetime as dt df["new_date"] = [dt.datetime.strptime(str(x), "%Y-%m-%d %H:%M:%S%z") for x in df["date_cet"]] df['new_date']= pd.to_datetime(df['date_cet'], utc=True)
you can do this by one line:
df['new_date']= pd.to_datetime(df['date_cet'], format="%Y-%m-%d %H:%M:%S%z", utc=True)