
Tag: hadoop

How to use multistep mrjob with json file

I'm trying to use Hadoop to get some statistics from a JSON file, such as the average number of stars for a category or the language with the most reviews. To do this I am using mrjob. I found this code: It finds the most used word, but I am not sure how to do the same with JSON attributes instead of words.
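As a rough illustration of the idea in this question, here is a minimal sketch in plain Python (no Hadoop or mrjob required) of a mapper and reducer that work on JSON attributes rather than words. It assumes one JSON object per input line and the hypothetical field names `category` and `stars`; the real file may use different keys. The small `run` helper only simulates the shuffle step locally and is not part of mrjob.

```python
import json
from collections import defaultdict


def mapper(line):
    """Parse one JSON record and emit (category, (stars, 1))."""
    record = json.loads(line)
    # "category" and "stars" are assumed field names, not taken from the question.
    yield record["category"], (record["stars"], 1)


def reducer(category, values):
    """Sum stars and counts for one category and emit its average."""
    total = 0
    count = 0
    for stars, n in values:
        total += stars
        count += n
    yield category, total / count


def run(lines):
    """Local stand-in for the shuffle phase: group mapper output by key."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        for out_key, average in reducer(key, values):
            results[out_key] = average
    return results


# Made-up sample records, one JSON object per line.
sample = [
    '{"category": "books", "stars": 4}',
    '{"category": "books", "stars": 2}',
    '{"category": "music", "stars": 5}',
]
averages = run(sample)
print(averages)
```

In mrjob, the same two functions would typically become methods of an `MRJob` subclass, and a second `MRStep` could then pick the maximum (for example, the language with the most reviews) from the first step's output.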

PySpark not able to move file from local to HDFS

I am running Hadoop on my local machine on port 8020. My name nodes exist under the path /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name. I have set up a PySpark project using a Conda env and installed the pyspark and hdfs3 dependencies. The following is my code: I am trying to copy the file from my local file system to HDFS but I am getting the following error: But

Read shapefile from HDFS with geopandas

I have a shapefile on HDFS and I would like to import it into my Jupyter Notebook with geopandas (version 0.8.1). I tried the standard read_file() method, but it does not recognize the HDFS directory; instead, I believe it searches my local file system, because a test with a local directory reads the shapefile correctly. This
