Code:

    In [31]: df = pd.DataFrame({"a": [[{"b": 1}], [{"b": np.nan}]]})

    In [32]: df
    Out[32]:
                  a
    0    [{'b': 1}]
    1  [{'b': nan}]

    In [33]: df.dtypes
    Out[33]:
    a    object
    dtype: object

    In [34]: df.to_parquet("a.parquet")

    In [35]: pd.read_parquet("a.parquet")
    Out[35]:
                   a
    0   [{'b': 1.0}]
    1  [{'b': None}]

As you can see, [{'b': 1}] becomes [{'b': 1.0}] after the round trip.
How can I preserve the original dtypes when reading the Parquet file back?
Answer
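You can try pyarrow.parquet.read_table followed by pyarrow.Table.to_pandas with integer_object_nulls=True (see the pyarrow documentation for Table.to_pandas). That option converts integer data containing nulls to Python objects instead of casting it to float.

    import pyarrow.parquet as pq

    pq.read_table("a.parquet").to_pandas(integer_object_nulls=True)
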
|   | a |
|---|---|
| 0 | [{'b': 1}] |
| 1 | [{'b': None}] |
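On the other hand, pandas.read_parquet with use_nullable_dtypes=True does not appear to help here: the nested integer still comes back as a float.

    import pandas as pd

    df = pd.DataFrame({"a": [[{"b": 1}], [{"b": None}]]})

    df.to_parquet("a.parquet")
    # Note: use_nullable_dtypes is deprecated since pandas 2.0 (dtype_backend replaces it).
    pd.read_parquet("a.parquet", use_nullable_dtypes=True)
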
|   | a |
|---|---|
| 0 | [{'b': 1.0}] |
| 1 | [{'b': None}] |