PySpark not able to move file from local to HDFS

I am running Hadoop on my local machine on port 8020. My name node data lives under /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name. I have set up a PySpark project in a Conda env and installed the pyspark and hdfs3 dependencies.

The following is my code:

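(The snippet itself did not survive in this copy of the post. Based on the question and the answer below, it was roughly the following, using hdfs3's HDFileSystem; the local file name is an assumption.)

    from hdfs3 import HDFileSystem

    # Connect to the name node running locally on port 8020
    hdfs = HDFileSystem(host='localhost', port=8020)

    # Copy a local file into HDFS; the local CSV name is a placeholder
    hdfs.put('data.csv', '/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name/data.csv')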

I am trying to copy the file from my local file system to HDFS, but I get the following error:

[error output not preserved; it reported that the target directory does not exist]

But I can cd into that same directory locally and it exists, so I am not sure why I get this error.

Also, when I try hdfs.mv with the same params, I get the following error:

FileNotFoundError [full traceback not preserved]


Answer

If you want to upload a local CSV using Spark, you'd need to actually use Spark:

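(That snippet is also missing from this copy; a minimal sketch of the Spark route, where the app name, input file, and output path are assumptions.)

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('csv-to-hdfs').getOrCreate()

    # Read the CSV from the local file system (note the file:// scheme)
    df = spark.read.csv('file:///path/to/data.csv', header=True)

    # Write it back out to HDFS via the name node on port 8020
    df.write.mode('overwrite').csv('hdfs://localhost:8020/data/data_csv')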

Otherwise, you cannot "put" into your Homebrew location, since that path doesn't exist on HDFS (at least, not unless you ran hadoop fs -mkdir -p /usr/local/Cellar/... for some reason).
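(If you really did want that layout on HDFS, a sketch of creating the directory first and then putting the file, again with hdfs3; the paths are assumptions.)

    from hdfs3 import HDFileSystem

    hdfs = HDFileSystem(host='localhost', port=8020)

    # Create the target directory on HDFS first (the hdfs3 counterpart
    # of hadoop fs -mkdir -p)
    hdfs.mkdir('/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name')

    # Now the put has somewhere to land
    hdfs.put('data.csv', '/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name/data.csv')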

when I try to do hdfs.mv with same params … FileNotFoundError

Because you need to cd to the directory containing the local CSV first. Otherwise, specify the full path:
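(For example; the absolute local path here is an assumption.)

    from hdfs3 import HDFileSystem

    hdfs = HDFileSystem(host='localhost', port=8020)

    # An absolute local path removes any dependence on the current working directory
    hdfs.put('/Users/me/project/data.csv', '/data/data.csv')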
