Skip to content
Advertisement

5-digit-year in date export (52164-01-19 00:00:00+00)

I received a data export from an unspecified source system, which includes dates in the format of:

  1. 52164-01-19 00:00:00+00
  2. 52992-08-12 04:29:36+00
  3. 52838-10-19 04:08:32.999936+00
  4. 54022-03-12 17:20:36.999936+00

I was told that the error is caused by a faulty conversion of unix to datetime (seconds and milliseconds).

We came up with a possible approach to refactor the date in python, by separating the year into a “normal” year (2164) and convert the rest into milliseconds.

import time
import math

d0 = '52164-01-19 00:00:00+00'
d0_y = 52164
multiplier = 5

# avg gregorian year in seconds
avg_greg = (365.2425 * 24 * 3600)

d1_modulus = 52164 % (1000 * multiplier)
d1_rest = d0_y - d1_modulus

# replace orginal year with modulus 
d1_time = time.strptime(str(d1_modulus) + '-10-19 04:08:32', '%Y-%m-%d %H:%M:%S')

#convert to milliseconds and add d1_rest in "seconds"
bigtime = time.mktime(d1_time) + (avg_greg * d1_rest) #in milliseconds
biggertime = bigtime / 1000 # in seconds
finaltime = time.ctime(biggertime)
# finaltime = 'Thu Mar 12 07:34:41 2020'
print(finaltime)

This code can break for different dates and result in multiplier/modulus combinations to create out of range values for time.mktime.

Can someone suggest an alternative or better approach to this?

Thanks in advance Gabe

Advertisement

Answer

Python datetime only supports between years 1 and 9999.
So I installed astropy and it works fine :

import datetime
import re

import astropy.time as astropy_time  # installed with PIP


faulty_data = "52164-01-19 00:00:00+00"
timeformat = re.compile(r"(?P<year>d{5})-(?P<month>d{2})-(?P<day>d{2}) (?P<hour>d{2}):(?P<minute>d{2}):(?P<second>d{2})(?P<sign_tz>[+- ])(?P<hour_tz>d{2})")

match = timeformat.fullmatch(faulty_data)
assert match
assert len(match.group("year")) == 5
assert match.group("hour_tz") == "00"
missing_thousand_years = int(match.group("year")[0])

time = astropy_time.Time({"year": int(match.group("year")),
                          "month": int(match.group("month")),
                          "day": int(match.group("day")),
                          "hour": int(match.group("hour")),
                          "minute": int(match.group("minute")),
                          "second": int(match.group("second"))
                          },
                         scale="utc")
print(time)
milliseconds = time.unix
print(milliseconds)
actual_datetime = datetime.datetime.fromtimestamp(milliseconds / 1000)
print(actual_datetime)
(52164, 1, 19, 0, 0, 0.)
1583971200000.0
2020-03-12 01:00:00

So it seems that the original date was 2020-03-12 01:00:00, which is close to what you got with your method.

NB: it raises two warnings, that you can silence

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement