I received a data export from an unspecified source system, which includes dates in the format of:
- 52164-01-19 00:00:00+00
- 52992-08-12 04:29:36+00
- 52838-10-19 04:08:32.999936+00
- 54022-03-12 17:20:36.999936+00
I was told that the error is caused by a faulty conversion of unix to datetime (seconds and milliseconds).
We came up with a possible approach to refactor the date in python, by separating the year into a “normal” year (2164) and convert the rest into milliseconds.
import time import math d0 = '52164-01-19 00:00:00+00' d0_y = 52164 multiplier = 5 # avg gregorian year in seconds avg_greg = (365.2425 * 24 * 3600) d1_modulus = 52164 % (1000 * multiplier) d1_rest = d0_y - d1_modulus # replace orginal year with modulus d1_time = time.strptime(str(d1_modulus) + '-10-19 04:08:32', '%Y-%m-%d %H:%M:%S') #convert to milliseconds and add d1_rest in "seconds" bigtime = time.mktime(d1_time) + (avg_greg * d1_rest) #in milliseconds biggertime = bigtime / 1000 # in seconds finaltime = time.ctime(biggertime) # finaltime = 'Thu Mar 12 07:34:41 2020' print(finaltime)
This code can break for different dates and result in multiplier/modulus combinations to create out of range values for time.mktime.
Can someone suggest an alternative or better approach to this?
Thanks in advance Gabe
Advertisement
Answer
Python datetime only supports between years 1 and 9999.
So I installed astropy
and it works fine :
import datetime import re import astropy.time as astropy_time # installed with PIP faulty_data = "52164-01-19 00:00:00+00" timeformat = re.compile(r"(?P<year>d{5})-(?P<month>d{2})-(?P<day>d{2}) (?P<hour>d{2}):(?P<minute>d{2}):(?P<second>d{2})(?P<sign_tz>[+- ])(?P<hour_tz>d{2})") match = timeformat.fullmatch(faulty_data) assert match assert len(match.group("year")) == 5 assert match.group("hour_tz") == "00" missing_thousand_years = int(match.group("year")[0]) time = astropy_time.Time({"year": int(match.group("year")), "month": int(match.group("month")), "day": int(match.group("day")), "hour": int(match.group("hour")), "minute": int(match.group("minute")), "second": int(match.group("second")) }, scale="utc") print(time) milliseconds = time.unix print(milliseconds) actual_datetime = datetime.datetime.fromtimestamp(milliseconds / 1000) print(actual_datetime)
(52164, 1, 19, 0, 0, 0.) 1583971200000.0 2020-03-12 01:00:00
So it seems that the original date was 2020-03-12 01:00:00
, which is close to what you got with your method.
NB: it raises two warnings, that you can silence