After loading my csv file into my notebook in VS Code i wanted to change the columns type from object
to datetime
for some columns. So i did the following:
object values of columns
These are example values in the columns.
my_col_1 -> 2022-02-07 20:19:04+01:00 my_col_2 -> 2020-10-27 my_col_3 -> 2020-10-27 16:00:00+01:00 my_col_4 -> 2020-10-27 18:00:00+01:00
check
df_example.dtypes my_col_1 object my_col_2 object my_col_3 object my_col_4 object
convert
format_1 = '%Y-%m-%d %H:%M:%S' format_2 = '%Y-%m-%d' df_example['my_col_1'] = pd.to_datetime(df_example['my_col_1'], format=format_1, exact=True, errors='raise', utc=False) df_example['my_col_2'] = pd.to_datetime(df_example['my_col_2'], format=format_2, exact=True, errors='raise', utc=False) df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1, exact=True, errors='raise', utc=False) df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1, exact=True, errors='raise', utc=False)
check
After converting the columns i wanted to check if everything worked out fine.
df_example.dtypes my_col_1 datetime64[ns, pytz.FixedOffset(60)] my_col_2 datetime64[ns] my_col_3 object my_col_4 object
This is strange. It seems my_col_3
and my_col_4
were not converted. But at the same time no Issue or Error was raised also while processing the code. my_col_3
and my_col_4
have the exact same format as my_col_1
, yet they were not converted.
second approach
lets change the utc=False
to utc=True
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1, exact=True, errors='raise', utc=True) df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1, exact=True, errors='raise', utc=True)
check
lets check again.
df_example.dtypes my_col_1 datetime64[ns, pytz.FixedOffset(60)] my_col_2 datetime64[ns] my_col_3 datetime64[ns, UTC] my_col_4 datetime64[ns, UTC]
now i have
my_col_1
withUTC+01:00
my_col_3
withUTC+00:00
(turns15:00:00
to14:00:00
for example)my_col_4
withUTC+00:00
(the same)
Question
This is strange. I do not want my_col_3
and my_col_4
to convert to the UTC+00:00
Timezone because i live in UTC+01:00
.
Do i have to introduce another process step and re-apply my Timezone once again to these columns? But it seems redundant. Why not simply use the timezone already given in the columns like pandas does in my_col_1
? Can i tell pd.to_datetime()
what my timezone is? like utc=True
but with tz=01:00
?
Advertisement
Answer
So here is my Solution. Converting it with dt.tz_convert('Europe/Berlin')
does the job. So i guess there is no way around to just take the already given +01:00
value. Turn the Column with to_datetime
to UTC +00:00
and afterwards you have to convert it to your Timezone again.
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1, exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin') df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1, exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin')
now my columns have changed into datetime with right Timezone
df_example.dtypes my_col_3 datetime64[ns, Europe/Berlin] my_col_4 datetime64[ns, Europe/Berlin]