How to retrieve the isAdjustedToUTC flag value for a TIMESTAMP column in a Parquet file?




I have a Parquet file with a number of columns of type converted_type (legacy): TIMESTAMP_MICROS. I want to check whether the flag isAdjustedToUTC is true. I can get it this way:

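import re

import pyarrow.parquet as pq

arrow = pq.ParquetFile("/Parquet/File/Path/filename.parquet")
# String form of the logical type of column 79, read from the first row group's statistics
timestamp_string = str(arrow.metadata.row_group(0).column(79).statistics.logical_type)
# Capture whatever sits between "isAdjustedToUTC=" and ", timeUnit"
re.search(r"isAdjustedToUTC=(.*), timeUnit", timestamp_string).group(1)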

This gives me either true or false as a string. Is there another way to retrieve the value of isAdjustedToUTC without using a regex?

Answer

As far as I can tell, it’s not possible. logical_type is of type pyarrow._parquet.ParquetLogicalType, which doesn’t directly expose its underlying members.

The only available attributes are:

dir(logical_type)
>> ['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__pyx_vtable__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'to_json',
 'type']

You could use the to_json method, but it’s as dirty as the option you’ve suggested:

import json
json.loads(logical_type.to_json())['isAdjustedToUTC']
>> True
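
If you need this for more than one column, the to_json approach can be wrapped in a small helper. The following is only a sketch: the function names is_adjusted_to_utc and timestamp_utc_flags are made up for illustration, and it assumes the JSON key is spelled "isAdjustedToUTC" as in the output above. It reads the logical type from the file schema rather than from row-group statistics, so it also works when a column has no statistics. Note that any logical type carrying the flag is reported, not only TIMESTAMP.

```python
import json
from typing import Dict, Optional


def is_adjusted_to_utc(logical_type) -> Optional[bool]:
    """Return the isAdjustedToUTC flag of a Parquet logical type.

    `logical_type` is any object with a to_json() method, such as the
    ParquetLogicalType of a column. Returns None when the JSON has no
    such key (for example, for a string column).
    """
    raw = logical_type.to_json()
    if not raw:
        return None
    return json.loads(raw).get("isAdjustedToUTC")


def timestamp_utc_flags(path: str) -> Dict[str, bool]:
    """Map column path -> isAdjustedToUTC for every column that carries the flag."""
    # Imported here so that is_adjusted_to_utc itself does not require pyarrow.
    import pyarrow.parquet as pq

    schema = pq.ParquetFile(path).schema
    flags = {}
    for i in range(len(schema)):
        column = schema.column(i)
        flag = is_adjusted_to_utc(column.logical_type)
        if flag is not None:
            flags[column.path] = flag
    return flags
```

Calling timestamp_utc_flags("/Parquet/File/Path/filename.parquet") would then return a dict of Python booleans keyed by column path, so no string comparison against "true" or "false" is needed.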


Source: stackoverflow