
Read shapefile from HDFS with geopandas

I have a shapefile stored on HDFS and I would like to load it into my Jupyter Notebook with geopandas (version 0.8.1).
I tried the standard read_file() method, but it does not recognize the HDFS path. I believe it looks in the local filesystem instead: when I tested it with a local directory, it read the shapefile correctly.

This is the code I used:

[code block not preserved]

and the error I obtained:

[error output not preserved]

So, is it actually possible to read a shapefile stored in HDFS with geopandas? If so, how?


Answer

If someone is still looking for an answer to this question, I managed to find a workaround.

First of all, you need a .zip file which contains all the data related to your shapefile (.shp, .shx, .dbf, …). Then, we use pyarrow to establish a connection to HDFS and fiona to read the zipped shapefile.
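Building that .zip file can be sketched with the standard library alone. This is a minimal illustration, not part of the original answer: the function name zip_shapefile and the extension list are hypothetical, and .prj and .cpg are included only as common optional sidecar files alongside the mandatory .shp, .shx and .dbf.

```python
import zipfile
from pathlib import Path

# Assumed list of sidecar extensions; .shp, .shx and .dbf are mandatory,
# .prj and .cpg are optional extras that are bundled only if present.
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".cpg")


def zip_shapefile(shp_path, zip_path):
    """Bundle a shapefile and its existing sidecar files into one zip.

    Returns the names of the files written to the archive.
    """
    shp_path = Path(shp_path)
    written = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in SHAPEFILE_EXTENSIONS:
            part = shp_path.with_suffix(ext)
            if part.exists():
                # Store each file at the archive root, without directories.
                archive.write(part, arcname=part.name)
                written.append(part.name)
    return written
```

The resulting archive can then be uploaded to HDFS with whatever tool you normally use; keeping every component at the archive root makes the .shp easy to address by name later.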

Package versions I’m using:

  • pyarrow==2.0.0
  • fiona==1.8.18

The code:

[code block not preserved]
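A minimal sketch of the approach described above, not the answer's original code. It assumes the pyarrow.fs.HadoopFileSystem interface (rather than the legacy pyarrow.hdfs.connect()), fiona's ZipMemoryFile, and geopandas' GeoDataFrame.from_features. The function names, the "default" host and the example path are hypothetical, and a working Hadoop client configuration is assumed on the machine running the notebook.

```python
import io
import zipfile


def find_shp_member(zip_bytes):
    """Return the name of the first .shp entry inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".shp"):
                return name
    raise ValueError("no .shp file found in the archive")


def read_zipped_shapefile_from_hdfs(hdfs_path, host="default", port=0):
    """Read a zipped shapefile from HDFS into a GeoDataFrame (sketch)."""
    # Third-party imports are local so find_shp_member works without them.
    import fiona.io
    import geopandas as gpd
    from pyarrow import fs

    # Assumption: host="default" picks up fs.defaultFS from core-site.xml.
    hdfs = fs.HadoopFileSystem(host, port)

    # hdfs_path is a path on the cluster, e.g. "/data/shapes.zip".
    with hdfs.open_input_file(hdfs_path) as f:
        zip_bytes = f.read()

    # fiona reads the archive from memory, so nothing is written to disk.
    with fiona.io.ZipMemoryFile(zip_bytes) as zmem:
        with zmem.open(find_shp_member(zip_bytes)) as collection:
            return gpd.GeoDataFrame.from_features(
                collection, crs=collection.crs_wkt or None
            )
```

Usage would look like gdf = read_zipped_shapefile_from_hdfs("/path/on/hdfs/my_shapefile.zip"), where the path is a placeholder. The whole archive is loaded into memory, which is reasonable for typical shapefiles but may not suit very large ones.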
User contributions licensed under: CC BY-SA