
Tag: parquet

How to keep dtypes when reading a Parquet file (read_parquet()) in pandas?

As you can see here, [{'b': 1}] becomes [{'b': 1.0}]. How can I keep the dtypes when reading the Parquet file? Answer: You can try pyarrow.parquet.read_table together with pyarrow.Table.to_pandas and integer_object_nulls (see the docs), which gives:
   a
0  [{'b': 1}]
1  [{'b': None}]
On the other hand, pandas.read_parquet with use_nullable_dtypes does not appear to work:
   a
0  [{'b': 1.0}]
1

Retrieving data from multiple parquet files into one dataframe (Python)

I want to start by saying this is the first time I am working with Parquet files. I have a list of 2615 Parquet files that I downloaded from an S3 bucket, and I want to read them into one DataFrame. They all follow the same folder structure; here is an example: /Forecasting/as_of_date=2022-02-01/type=full/export_country=Spain/import_country=France/000.parquet. The file name 000.parquet is always

How to read a Parquet file into Pandas DataFrame?

How can I read a modestly sized Parquet dataset into an in-memory Pandas DataFrame without setting up cluster computing infrastructure such as Hadoop or Spark? It is only a moderate amount of data that I would like to read in memory with a simple Python script on a laptop. The data does not reside on HDFS. It is either on the
