I am trying to figure out, in Python, what the native Hive timestamp format is that Hive can consume from a Parquet file. Python seems to get me close, and I noticed my code yields a good date on the Python side:
import datetime
from dateutil.parser import parse  # assuming parse comes from dateutil.parser

def dt2epoch(value):
    # parse the incoming string into a datetime and drop sub-second precision
    d = parse(value)
    d = d.replace(microsecond=0)
    # treat the datetime as UTC and convert to epoch seconds
    timestamp = d.replace(tzinfo=datetime.timezone.utc).timestamp()
    # round to the nearest 1000 seconds
    new_timestamp = round(int(timestamp), -3)
    new_date = datetime.datetime.fromtimestamp(new_timestamp)  # sanity check, unused
    return new_timestamp
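For reference, calling it with a sample input (a hypothetical ISO-style string, assuming parse is dateutil.parser.parse) returns an epoch value in seconds, rounded to the nearest 1000:

print(dt2epoch('2022-09-18 19:30:00'))
# -> 1663529000  (epoch *seconds*, not milliseconds)
print(datetime.datetime.fromtimestamp(1663529000, tz=datetime.timezone.utc))
# -> 2022-09-18 19:23:20+00:00  (a sane 2022 date)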
But when I load this into Hive as a table:

CREATE TABLE IF NOT EXISTS hive.DBNAME.TABLE_NAME (
    COL1 VARCHAR,
    COL2 VARCHAR,
    COL3 VARCHAR,
    COL4 BIGINT,
    COL5 VARCHAR,
    COL6 VARCHAR,
    timestamped TIMESTAMP)
WITH (
    external_location = 's3a://MYBUCKET/dir1/dir2/',
    format = 'PARQUET');

the timestamped column comes out like it's the 1970s.
Answer
I think it is dividing your timestamp by 1000, even though it is already in seconds; in other words, it treats the value as milliseconds. If you convert 1663529 seconds to a timestamp, you get a date in the 1970s. I don't use Hive, but maybe you can multiply the input by 1000, or find out whether it accepts any parameter that lets you define whether the input is in seconds or milliseconds.
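A quick sanity check in Python shows the effect. The multiply-by-1000 variant at the end is only a sketch of the suggestion above (assuming the consumer expects milliseconds), not a verified Hive fix:

import datetime

# 1663529000 seconds is a 2022 date...
print(datetime.datetime.fromtimestamp(1663529000, tz=datetime.timezone.utc))
# -> 2022-09-18 19:23:20+00:00

# ...but if something divides it by 1000 and reads 1663529 as seconds, you land in 1970
print(datetime.datetime.fromtimestamp(1663529, tz=datetime.timezone.utc))
# -> 1970-01-20 06:05:29+00:00

def dt2epoch_ms(value):
    # minimal tweak: hand over milliseconds instead of seconds,
    # so a reader that divides by 1000 recovers the right date
    return dt2epoch(value) * 1000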