Skip to content
Advertisement

Why is pd.to_datetime() only changing type if utc is True?

After loading my csv file into my notebook in VS Code i wanted to change the columns type from object to datetime for some columns. So i did the following:

object values of columns

These are example values in the columns.

my_col_1 -> 2022-02-07 20:19:04+01:00
my_col_2 -> 2020-10-27
my_col_3 -> 2020-10-27 16:00:00+01:00
my_col_4 -> 2020-10-27 18:00:00+01:00

check

df_example.dtypes

my_col_1 object
my_col_2 object
my_col_3 object
my_col_4 object

convert

format_1 = '%%Y-%%m-%%d %%H:%%M:%%S' 
format_2 = '%%Y-%%m-%%d' 

df_example['my_col_1'] = pd.to_datetime(df_example['my_col_1'], format=format_1, 
    exact=True, errors='raise', utc=False)

df_example['my_col_2'] = pd.to_datetime(df_example['my_col_2'], format=format_2,
        exact=True, errors='raise', utc=False)
    
df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
        exact=True, errors='raise', utc=False)
    
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
        exact=True, errors='raise', utc=False)

check

After converting the columns i wanted to check if everything worked out fine.

df_example.dtypes

my_col_1 datetime64[ns, pytz.FixedOffset(60)]
my_col_2 datetime64[ns]
my_col_3 object
my_col_4 object

This is strange. It seems my_col_3 and my_col_4 were not converted. But at the same time no Issue or Error was raised also while processing the code. my_col_3 and my_col_4 have the exact same format as my_col_1, yet they were not converted.

second approach

lets change the utc=Falseto utc=True

df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
        exact=True, errors='raise', utc=True)
    
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
        exact=True, errors='raise', utc=True)

check

lets check again.

df_example.dtypes

my_col_1 datetime64[ns, pytz.FixedOffset(60)]
my_col_2 datetime64[ns]
my_col_3 datetime64[ns, UTC]
my_col_4 datetime64[ns, UTC]

now i have

  • my_col_1 with UTC+01:00
  • my_col_3 with UTC+00:00 (turns 15:00:00 to 14:00:00 for example)
  • my_col_4 with UTC+00:00 (the same)

Question

This is strange. I do not want my_col_3 and my_col_4 to convert to the UTC+00:00 Timezone because i live in UTC+01:00.

Do i have to introduce another process step and re-apply my Timezone once again to these columns? But it seems redundant. Why not simply use the timezone already given in the columns like pandas does in my_col_1? Can i tell pd.to_datetime() what my timezone is? like utc=Truebut with tz=01:00?

Advertisement

Answer

So here is my Solution. Converting it with dt.tz_convert('Europe/Berlin') does the job. So i guess there is no way around to just take the already given +01:00 value. Turn the Column with to_datetime to UTC +00:00and afterwards you have to convert it to your Timezone again.

df_example['my_col_3'] = pd.to_datetime(df_example['my_col_3'],format=format_1,
        exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin')
    
df_example['my_col_4'] = pd.to_datetime(df_example['my_col_4'],format=format_1,
        exact=True, errors='raise', utc=True).dt.tz_convert('Europe/Berlin')

now my columns have changed into datetime with right Timezone

df_example.dtypes

my_col_3 datetime64[ns, Europe/Berlin]
my_col_4 datetime64[ns, Europe/Berlin]

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement