How to keep dtypes when reading a parquet file (read_parquet()) in pandas?

Question

When I write a DataFrame to a parquet file and read it back with read_parquet(), [{'b': 1}] becomes [{'b': 1.0}]. How can I keep the dtypes when reading the parquet file?

Accepted Answer

You can use pyarrow.parquet.read_table and pyarrow.Table.to_pandas with integer_object_nulls=True (see the doc):

    import pyarrow.parquet as pq
    pq.read_table("a.parquet").to_pandas(integer_object_nulls=True)

                   a
    0     [{'b': 1}]
    1  [{'b': None}]

On the other hand, pandas.read_parquet with use_nullable_dtypes does not seem to help here; the nested integer still comes back as a float:

    import pandas as pd
    df = pd.DataFrame({"a": [[{"b": 1}], [{"b": None}]]})
    df.to_parquet("a.parquet")
    pd.read_parquet("a.parquet", use_nullable_dtypes=True)

                   a
    0   [{'b': 1.0}]
    1  [{'b': None}]
