After loading my csv file into my notebook in VS Code i wanted to change the columns type from object to datetime for some columns. So i did the following:
object values of columns
These are example values in the columns.
my_col_1 -> 2022-02-07 20:19:04+01:00 my_col_2 -> 2020-10-27 my_col_3 -> 2020-10-27 16:00:00+01:00 my_col_4 -> 2020-10-27 18:00:00+01:00
check
df_example.dtypes my_col_1 object my_col_2 object my_col_3 object my_col_4 object
convert
format_1 = '%Y-%m-%d %H:%M:%S'
format_2 = '%Y-%m-%d'
df_example['my_col_1'] = pd.to_datetime(df_example['my_col_1'], format=format_1,
exact=True, errors='raise', utc=False)
df_example['my_col_2'] = pd.to_datetime(df_example['my_col_2'], format=format_2,
exact=True, errors='raise', utc=False)
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
exact=True, errors='raise', utc=False)
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
exact=True, errors='raise', utc=False)
check
After converting the columns i wanted to check if everything worked out fine.
df_example.dtypes my_col_1 datetime64[ns, pytz.FixedOffset(60)] my_col_2 datetime64[ns] my_col_3 object my_col_4 object
This is strange. It seems my_col_3 and my_col_4 were not converted. But at the same time no Issue or Error was raised also while processing the code. my_col_3 and my_col_4 have the exact same format as my_col_1, yet they were not converted.
second approach
lets change the utc=Falseto utc=True
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
exact=True, errors='raise', utc=True)
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
exact=True, errors='raise', utc=True)
check
lets check again.
df_example.dtypes my_col_1 datetime64[ns, pytz.FixedOffset(60)] my_col_2 datetime64[ns] my_col_3 datetime64[ns, UTC] my_col_4 datetime64[ns, UTC]
now i have
my_col_1withUTC+01:00my_col_3withUTC+00:00(turns15:00:00to14:00:00for example)my_col_4withUTC+00:00(the same)
Question
This is strange. I do not want my_col_3 and my_col_4 to convert to the UTC+00:00 Timezone because i live in UTC+01:00.
Do i have to introduce another process step and re-apply my Timezone once again to these columns? But it seems redundant. Why not simply use the timezone already given in the columns like pandas does in my_col_1? Can i tell pd.to_datetime() what my timezone is? like utc=Truebut with tz=01:00?
Advertisement
Answer
So here is my Solution. Converting it with dt.tz_convert('Europe/Berlin') does the job. So i guess there is no way around to just take the already given +01:00 value. Turn the Column with to_datetime to UTC +00:00and afterwards you have to convert it to your Timezone again.
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin')
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin')
now my columns have changed into datetime with right Timezone
df_example.dtypes my_col_3 datetime64[ns, Europe/Berlin] my_col_4 datetime64[ns, Europe/Berlin]