I received a data export from an unspecified source system, which includes dates in the format of:
- 52164-01-19 00:00:00+00
- 52992-08-12 04:29:36+00
- 52838-10-19 04:08:32.999936+00
- 54022-03-12 17:20:36.999936+00
I was told that the error is caused by a faulty conversion of unix to datetime (seconds and milliseconds).
We came up with a possible approach to refactor the date in python, by separating the year into a “normal” year (2164) and convert the rest into milliseconds.
import time
import math
d0 = '52164-01-19 00:00:00+00'
d0_y = 52164
multiplier = 5
# avg gregorian year in seconds
avg_greg = (365.2425 * 24 * 3600)
d1_modulus = 52164 % (1000 * multiplier)
d1_rest = d0_y - d1_modulus
# replace orginal year with modulus
d1_time = time.strptime(str(d1_modulus) + '-10-19 04:08:32', '%Y-%m-%d %H:%M:%S')
#convert to milliseconds and add d1_rest in "seconds"
bigtime = time.mktime(d1_time) + (avg_greg * d1_rest) #in milliseconds
biggertime = bigtime / 1000 # in seconds
finaltime = time.ctime(biggertime)
# finaltime = 'Thu Mar 12 07:34:41 2020'
print(finaltime)
This code can break for different dates and result in multiplier/modulus combinations to create out of range values for time.mktime.
Can someone suggest an alternative or better approach to this?
Thanks in advance Gabe
Advertisement
Answer
Python datetime only supports between years 1 and 9999.
So I installed astropy
and it works fine :
import datetime
import re
import astropy.time as astropy_time # installed with PIP
faulty_data = "52164-01-19 00:00:00+00"
timeformat = re.compile(r"(?P<year>d{5})-(?P<month>d{2})-(?P<day>d{2}) (?P<hour>d{2}):(?P<minute>d{2}):(?P<second>d{2})(?P<sign_tz>[+- ])(?P<hour_tz>d{2})")
match = timeformat.fullmatch(faulty_data)
assert match
assert len(match.group("year")) == 5
assert match.group("hour_tz") == "00"
missing_thousand_years = int(match.group("year")[0])
time = astropy_time.Time({"year": int(match.group("year")),
"month": int(match.group("month")),
"day": int(match.group("day")),
"hour": int(match.group("hour")),
"minute": int(match.group("minute")),
"second": int(match.group("second"))
},
scale="utc")
print(time)
milliseconds = time.unix
print(milliseconds)
actual_datetime = datetime.datetime.fromtimestamp(milliseconds / 1000)
print(actual_datetime)
(52164, 1, 19, 0, 0, 0.)
1583971200000.0
2020-03-12 01:00:00
So it seems that the original date was 2020-03-12 01:00:00
, which is close to what you got with your method.
NB: it raises two warnings, that you can silence