I have a parquet file with a number of columns of type converted_type (legacy): TIMESTAMP_MICROS
. I want to check if the flag isAjustedToUTC
is true. I can get it this way:
JavaScript
x
7
1
import pyarrow.parquet as pq
2
import re
3
4
arrow = pq.ParquetFile("/Parquet/File/Path/filename.parquet")
5
timestamp_string = str(arrow.metadata.row_group(0).column(79).statistics.logical_type)
6
re.search("isAdjustedToUTC=(.*), timeUnit",timestamp_string).group(1)
7
This gives me either true
or false
as string. Is there another way to retrieve the value of isAdjustedToUTC
without using a regex?
Advertisement
Answer
As far as I can tell it’s not possible. logical_type
is of type pyarrow._parquet.ParquetLogicalType
which doesn’t expose directly it’s underlying members.
The only available fields are:
JavaScript
1
29
29
1
dir(logical_type)
2
>> ['__class__',
3
'__delattr__',
4
'__dir__',
5
'__doc__',
6
'__eq__',
7
'__format__',
8
'__ge__',
9
'__getattribute__',
10
'__gt__',
11
'__hash__',
12
'__init__',
13
'__init_subclass__',
14
'__le__',
15
'__lt__',
16
'__ne__',
17
'__new__',
18
'__pyx_vtable__',
19
'__reduce__',
20
'__reduce_ex__',
21
'__repr__',
22
'__setattr__',
23
'__setstate__',
24
'__sizeof__',
25
'__str__',
26
'__subclasshook__',
27
'to_json',
28
'type']
29
You could use the to_json
function, but it’s as dirty as the option you’ve suggested:
JavaScript
1
4
1
import json
2
json.loads(logical_type.to_json())['isAdjustedToUTC']
3
>> true
4