I am trying to figure out, in Python, which native timestamp format Hive can consume as a string in Parquet.
I think Python is getting me close, and I noticed that my code yields a good date in Python:
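    import datetime
    from dateutil.parser import parse  # assumption: parse is dateutil's parser

    def dt2epoch(value):
        d = parse(value)
        d = d.replace(microsecond=0)
        # treat the parsed value as UTC and convert it to epoch seconds
        timestamp = d.replace(tzinfo=datetime.timezone.utc).timestamp()
        # round to the nearest 1000 seconds
        new_timestamp = round(int(timestamp), -3)
        new_date = datetime.datetime.fromtimestamp(new_timestamp)
        return new_timestamp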
But when I load this in Hive as a table:
    CREATE TABLE IF NOT EXISTS hive.DBNAME.TABLE_NAME (
        COL1 VARCHAR,
        COL2 VARCHAR,
        COL3 VARCHAR,
        COL4 BIGINT,
        COL5 VARCHAR,
        COL6 VARCHAR,
        timestamped TIMESTAMP)
    WITH (
        external_location = 's3a://MYBUCKET/dir1/dir2/',
        format = 'PARQUET');
the timestamps come out like it’s the ’70s.
I think it is dividing your timestamp by 1000, even though the value is already in seconds. If you convert 1663529 (seconds) to a timestamp, you get a result in 1970. I don’t use Hive, but maybe you can multiply the input by 1000, or find out whether it accepts a parameter that lets you define in the code whether the input is in seconds or milliseconds.
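
To make the seconds-versus-milliseconds point concrete, here is a minimal Python sketch. The input value 1663529123 is made up for illustration, and the idea that the reader treats the stored number as milliseconds is only the guess from above, not something confirmed for Hive. Note also that round(x, -3) rounds to the nearest 1000 seconds; it does not convert seconds to milliseconds.

```python
from datetime import datetime, timezone

# Hypothetical epoch value in seconds, chosen only for illustration.
raw_seconds = 1663529123

# round(..., -3) rounds to the nearest 1000 seconds; the unit is still seconds.
rounded_seconds = round(raw_seconds, -3)

# Read as seconds, the value is a date in 2022.
as_seconds = datetime.fromtimestamp(rounded_seconds, tz=timezone.utc)

# If a reader assumes milliseconds, it effectively divides by 1000 first,
# which lands a few weeks after the 1970 epoch.
as_millis = datetime.fromtimestamp(rounded_seconds / 1000, tz=timezone.utc)

# Multiplying by 1000 gives epoch milliseconds, which a millisecond-based
# reader would map back to the intended instant.
epoch_millis = rounded_seconds * 1000
roundtrip = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)

print(rounded_seconds)          # 1663529000
print(as_seconds.isoformat())   # 2022-09-18T19:23:20+00:00
print(as_millis.isoformat())    # 1970-01-20T06:05:29+00:00
print(roundtrip == as_seconds)  # True
```

If the 1970 date you see in Hive matches the "as_millis" line for your own value, that would support the theory that the column is being read in milliseconds.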