
Tag: hdfs

How to use multistep mrjob with a JSON file

I’m trying to use Hadoop to get some statistics from a JSON file, such as the average number of stars for a category or the language with the most reviews. To do this I am using mrjob. I found this code: It finds the most used word, but I am not sure how to do the same with JSON attributes instead of words.
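The word-count pattern carries over to JSON: parse each input line with `json.loads` in the mapper and emit an attribute instead of a word. Below is a minimal sketch under stated assumptions: the input is JSON Lines (one review object per line), and the field names `language`, `category`, and `stars` are hypothetical. The functions follow mrjob's `(key, value)` generator style, but here they are driven by a small in-memory `run_step` helper instead of Hadoop so the logic can be checked without a cluster.

```python
import json
from collections import defaultdict

# Hypothetical JSON Lines input; the field names are assumptions about the data.
SAMPLE = [
    '{"language": "en", "category": "books", "stars": 5}',
    '{"language": "en", "category": "books", "stars": 3}',
    '{"language": "fr", "category": "music", "stars": 4}',
]


# --- Two-step job: language with the most reviews ---------------------
def mapper_get_language(_, line):
    """Step 1 mapper: parse one JSON record and emit (language, 1)."""
    review = json.loads(line)
    yield review["language"], 1


def reducer_count_reviews(language, counts):
    """Step 1 reducer: total per language, all sent to the single key None."""
    yield None, (sum(counts), language)


def reducer_find_max(_, count_language_pairs):
    """Step 2 reducer: every pair arrives under None, so max() sees them all."""
    yield "most_reviews", max(count_language_pairs)


# --- One-step job: average stars per category -------------------------
def mapper_get_stars(_, line):
    """Parse one JSON record and emit (category, stars)."""
    review = json.loads(line)
    yield review["category"], float(review["stars"])


def reducer_average_stars(category, stars):
    """Average the star ratings seen for one category."""
    total, count = 0.0, 0
    for s in stars:
        total += s
        count += 1
    yield category, total / count


# --- In-memory stand-in for Hadoop's map -> shuffle -> reduce ---------
def run_step(pairs, reducer, mapper=None):
    """Run one step locally: optional map, group values by key, reduce."""
    if mapper is not None:
        pairs = [out for key, value in pairs for out in mapper(key, value)]
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return [out for key, values in groups.items() for out in reducer(key, iter(values))]


def most_reviewed_language(lines):
    step1 = run_step([(None, line) for line in lines],
                     reducer_count_reviews, mapper=mapper_get_language)
    step2 = run_step(step1, reducer_find_max)
    return step2[0][1]  # (review_count, language)


def average_stars(lines):
    return dict(run_step([(None, line) for line in lines],
                         reducer_average_stars, mapper=mapper_get_stars))


if __name__ == "__main__":
    print(most_reviewed_language(SAMPLE))  # (2, 'en')
    print(average_stars(SAMPLE))           # {'books': 4.0, 'music': 4.0}
```

In an actual mrjob job these functions would become methods of an `MRJob` subclass (with an added `self` parameter), and `steps()` would return something like `[MRStep(mapper=self.mapper_get_language, reducer=self.reducer_count_reviews), MRStep(reducer=self.reducer_find_max)]`. Sending every pair to the single key `None` in step 1 is what lets the step-2 reducer compare all counts at once; ties are broken by `max` comparing the language strings. The average-stars statistic needs only one step.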

PySpark not able to move file from local to HDFS

I am running Hadoop on my local machine on port 8020. My NameNode directories are under the path /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name. I have set up a PySpark project using a Conda env and installed the pyspark and hdfs3 dependencies. The following is my code: I am trying to copy the file from my local file system to HDFS, but I am getting the following error: But
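The asker's hdfs3 code and error are not shown, so this is not a fix for them. It is a minimal alternative sketch that shells out to the Hadoop CLI, assuming the `hdfs` command is on `PATH` and the NameNode listens on `localhost:8020` (the port comes from the question; the host is an assumption). The paths used below, such as `/user/me/reviews.json`, are hypothetical.

```python
import posixpath
import subprocess

# Assumption: NameNode on localhost, port 8020 as stated in the question.
NAMENODE = "hdfs://localhost:8020"


def hdfs_uri(path):
    """Turn an HDFS path into a fully qualified hdfs:// URI."""
    return NAMENODE + "/" + path.lstrip("/")


def put_commands(local_path, hdfs_path, overwrite=True):
    """Build `hdfs dfs` commands: create the parent dir, then copy the file."""
    parent = posixpath.dirname(hdfs_path) or "/"
    mkdir = ["hdfs", "dfs", "-mkdir", "-p", hdfs_uri(parent)]
    put = ["hdfs", "dfs", "-put"]
    if overwrite:
        put.append("-f")  # replace the destination if it already exists
    put += [local_path, hdfs_uri(hdfs_path)]
    return [mkdir, put]


def copy_to_hdfs(local_path, hdfs_path):
    """Run the commands; raises CalledProcessError if either one fails."""
    for cmd in put_commands(local_path, hdfs_path):
        subprocess.run(cmd, check=True)
```

For example, `copy_to_hdfs("reviews.json", "/user/me/reviews.json")` would upload the file, after which PySpark could read it through the same URI with `spark.read.json(hdfs_uri("/user/me/reviews.json"))`. Running the CLI directly surfaces Hadoop's own error message, which may help tell a NameNode connectivity or permission problem apart from an issue in the Python client.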
