How to convert dataframe column into UTC datetime format?




I want to convert the Origin column of the DataFrame data_copy to UTC datetime format:

import pandas as pd

>>> data_copy["Origin"]
 
0       1669-06-04 00:00:00
1       1669-06-22 00:00:00
2       1720-07-15 00:00:00
3       1803-09-01 00:00:00
4       1816-05-26 00:00:00
        ...
6395    2020-03-29 18:27:36
6396    2020-03-29 18:47:53
6397    2020-03-29 20:05:19
6398    2020-03-30 02:19:27
6399    2020-03-30 06:11:36

Some entries also have a time of 00:00:00, and those need to be converted as well. I tried this command:

data_copy["Origin"] = pd.to_datetime(data_copy["Origin"], infer_datetime_format=True)

but I get the following error:

Traceback (most recent call last):

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2054, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)

  File "pandas\_libs\tslibs\conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64

TypeError: Unrecognized value type: <class 'str'>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "<ipython-input-93-aead2d23f264>", line 1, in <module>
    data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],infer_datetime_format=True)

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\tools\datetimes.py", line 466, in _convert_listlike_datetimes
    allow_object=True,

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2059, in objects_to_datetime64ns
    raise e

  File "C:\ProgramData\Anaconda3\envs\roses\lib\site-packages\pandas\core\arrays\datetimes.py", line 2050, in objects_to_datetime64ns
    require_iso8601=require_iso8601,

  File "pandas\_libs\tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 574, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 570, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 546, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslibs\np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1669-06-04 00:00:00

How could I convert the column into UTC datetime format?

Answer

The problem is that some of the datetimes fall outside the range pandas can represent (see the "Timestamp limitations" section of the pandas documentation):

In [92]: pd.Timestamp.min
Out[92]: Timestamp('1677-09-21 00:12:43.145225')

In [93]: pd.Timestamp.max
Out[93]: Timestamp('2262-04-11 23:47:16.854775807')
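These bounds follow from how pandas stores datetimes by default: a datetime64[ns] value is a signed 64-bit integer counting nanoseconds from 1970-01-01, which only reaches about 292 years in either direction. The sketch below reproduces the bounds with the standard library alone; because timedelta only has microsecond precision, the last three nanosecond digits are dropped.

```python
from datetime import datetime, timedelta

# datetime64[ns] is a signed 64-bit integer: nanoseconds since 1970-01-01
epoch = datetime(1970, 1, 1)
max_ns = 2**63 - 1  # largest int64; the smallest int64 is reserved for NaT

# timedelta only has microsecond precision, so drop the last three digits
span = timedelta(microseconds=max_ns // 1000)

lower = epoch - span
upper = epoch + span
print(lower)  # 1677-09-21 00:12:43.145225
print(upper)  # 2262-04-11 23:47:16.854775
```

Any value in Origin earlier than lower (such as 1669-06-04) cannot be stored at nanosecond resolution, which is what the OutOfBoundsDatetime error reports.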

One possible solution is to convert the out-of-range values to NaT with the errors='coerce' parameter:

data_copy["Origin"] = pd.to_datetime(data_copy["Origin"],
                                     infer_datetime_format=True, 
                                     errors='coerce')
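The call above does not yet make the values UTC, which the question asks for. A minimal sketch, assuming the naive timestamps in Origin are already in UTC: passing utc=True makes pd.to_datetime localize naive inputs as UTC and return a timezone-aware column. The data_copy frame here is a small hypothetical stand-in built from values in the question. infer_datetime_format is left out because it was deprecated in pandas 2.0, where a consistent format is inferred by default.

```python
import pandas as pd

# Hypothetical stand-in for data_copy, using a few values from the question
data_copy = pd.DataFrame({"Origin": [
    "1669-06-04 00:00:00",
    "1803-09-01 00:00:00",
    "2020-03-29 18:27:36",
    "2020-03-30 06:11:36",
]})

# utc=True: naive strings are localized as UTC, giving a tz-aware column
# errors="coerce": values that cannot be converted become NaT instead of raising
data_copy["Origin"] = pd.to_datetime(data_copy["Origin"], utc=True, errors="coerce")

print(data_copy["Origin"].dt.tz)  # UTC
```

Any rows that could not be converted end up as NaT, so data_copy[data_copy["Origin"].isna()] shows which values were lost.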

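If the pre-1677 rows must be kept rather than turned into NaT, one workaround (not part of the answer above) is to store plain Python datetime objects, which support years 1 to 9999, in an object column. This sketch assumes every value uses the %Y-%m-%d %H:%M:%S layout shown in the question and that the times are already in UTC; to_utc_datetime is a hypothetical helper.

```python
from datetime import datetime, timezone

import pandas as pd

# Hypothetical sample with one value earlier than pd.Timestamp.min
origin = pd.Series(["1669-06-04 00:00:00", "2020-03-29 18:27:36"])

def to_utc_datetime(text):
    # Assumption: fixed "%Y-%m-%d %H:%M:%S" layout, times already in UTC
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# dtype=object keeps plain Python datetimes, which are not limited to 1677-2262
origin_utc = pd.Series(
    [to_utc_datetime(s) for s in origin], index=origin.index, dtype=object
)
print(origin_utc.iloc[0])  # 1669-06-04 00:00:00+00:00
```

The trade-off is that an object column holds individual Python objects, so it gives up the speed and most of the vectorized datetime operations of a datetime64 column.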

Source: stackoverflow