This issue occurred when I installed Streamlit. I also tried installing pyarrow separately, but the same error occurred. Both Windows and Python are 64-bit. Can anyone please help me with this issue? Thank you in advance. I also tried installing via pyproject.toml. Answer pyarrow wheels are not available for Python 3.11 on
Tag: pyarrow
How to read a TSV file with vaex and output a pyarrow parquet file?
On these vaex and pyarrow versions: when reading a TSV file and exporting it to Arrow, the Arrow table couldn't be properly loaded by pyarrow.read_table(). For example, given a file s2t.tsv: The file looks like this: And when I tried exporting the TSV to Arrow as such, then reading it back: It throws the following error: Is there some additional
How to use pyarrow parquet with multiprocessing
I want to read multiple HDFS files simultaneously using pyarrow and multiprocessing. The simple Python script works (see below), but if I try to do the same thing with multiprocessing, it hangs indefinitely. My only guess is that the environment is somehow different, but all the environment variables should be the same in the child and parent processes. I've
How to retrieve the isAdjustedToUTC flag value for a TIMESTAMP column in a parquet file?
I have a parquet file with a number of columns of type converted_type (legacy): TIMESTAMP_MICROS. I want to check if the flag isAdjustedToUTC is true. I can get it this way: This gives me either true or false as a string. Is there another way to retrieve the value of isAdjustedToUTC without using a regex? Answer As far as I can
Transforming a pandas df to a parquet-file-bytes-object
I have a pandas DataFrame and want to write it as a parquet file to Azure file storage. So far I have not been able to transform the DataFrame directly into bytes that I can then upload to Azure. My current workaround is to save it as a parquet file on the local drive, then read it as
How to read a list of parquet files from S3 as a pandas dataframe using pyarrow?
I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1), and pandas (0.20.3). First, I can read a single parquet file locally like this: I can also read a directory of parquet files locally like this: Both work like a charm. Now I want to achieve the same remotely with files stored in an S3 bucket. I