I have a parquet file with a number of columns of type converted_type (legacy): TIMESTAMP_MICROS. I want to check whether the flag isAdjustedToUTC is true. I can get it this way:
import pyarrow.parquet as pq
import re

arrow = pq.ParquetFile("/Parquet/File/Path/filename.parquet")
timestamp_string = str(arrow.metadata.row_group(0).column(79).statistics.logical_type)
re.search("isAdjustedToUTC=(.*), timeUnit", timestamp_string).group(1)
This gives me either true or false as a string. Is there another way to retrieve the value of isAdjustedToUTC without using a regex?
Answer
As far as I can tell, it’s not possible. logical_type is of type pyarrow._parquet.ParquetLogicalType, which doesn’t directly expose its underlying members.
The only available fields are:
dir(logical_type)
>> ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__pyx_vtable__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'to_json', 'type']
You could use the to_json function, but it’s as dirty as the option you’ve suggested:
import json
json.loads(logical_type.to_json())['isAdjustedToUTC']
>> True
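If you go the to_json route, it may be worth wrapping the lookup so that columns without the flag don't raise a KeyError. The following is only a sketch: the helper name is_adjusted_to_utc is made up for illustration, and the sample strings are an assumed shape of what to_json() returns, not output captured from a real file.

```python
import json
from typing import Optional


def is_adjusted_to_utc(logical_type_json: str) -> Optional[bool]:
    """Return the isAdjustedToUTC flag from a logical type's JSON text.

    Returns None when the text is not a JSON object or has no such key
    (for example, a non-timestamp column).
    """
    try:
        info = json.loads(logical_type_json)
    except json.JSONDecodeError:
        # Not valid JSON, so no flag can be read from it
        return None
    if not isinstance(info, dict):
        return None
    return info.get("isAdjustedToUTC")


# Illustrative strings only (assumed shape of to_json() output)
timestamp_json = '{"Type": "Timestamp", "isAdjustedToUTC": true, "timeUnit": "microseconds"}'
string_json = '{"Type": "String"}'

print(is_adjusted_to_utc(timestamp_json))
print(is_adjusted_to_utc(string_json))
```

With pyarrow you would pass logical_type.to_json() as the argument. The result is a real Python bool rather than the string the regex version gives you.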