I am running Hadoop on my local machine on port 8020. My NameNode data lives under the path /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name. I have set up a PySpark project using a Conda env and installed the pyspark and hdfs3 dependencies.
The following is my code:
from pyspark.sql import SparkSession
from hdfs3 import HDFileSystem

spark = SparkSession.builder.appName('First Project').getOrCreate()
hdfs = HDFileSystem(host="localhost", port=8020)
hdfs.put("test.csv", "/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name/test.csv")
I am trying to copy the file from my local file system to HDFS but I am getting the following error:
OSError: Could not open file: /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name/test.csv, mode: wb
Parent directory doesn't exist: /usr/local/Cellar/hadoop/hdfs/tmp/dfs/name
But I can cd into that same directory locally and it exists. I am not sure why I get this error.
Also, when I try to do hdfs.mv with the same params, I get the following error:
FileNotFoundError: test.csv
Answer
If you want to upload a local CSV using Spark, you'd need to actually use Spark:
spark = SparkSession.builder.getOrCreate()
df = spark.read.csv('file:///path/to/file.csv')
df.write.csv(output_path)
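As a minimal sketch of that (the local path and the HDFS target directory below are placeholders; I'm assuming your NameNode is on localhost:8020 and that the target directory on HDFS is one you own):

from pyspark.sql import SparkSession

# Read the local CSV and write it out to HDFS.
# 'file:///path/to/test.csv' and '/user/yourname/test_csv' are placeholder
# paths; adjust them to your actual local file and an HDFS directory you own.
spark = SparkSession.builder.appName('First Project').getOrCreate()
df = spark.read.csv('file:///path/to/test.csv', header=True)
df.write.csv('hdfs://localhost:8020/user/yourname/test_csv')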
Otherwise, you cannot "put" into your Homebrew location since that path doesn't exist on HDFS (at least, not unless you ran hadoop fs -mkdir -p /usr/local/Cellar/... for some reason).
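If you do want to stick with hdfs3, a sketch of the same idea is to create (or reuse) a directory that actually exists on HDFS and put the file there; /user/yourname below is an assumed placeholder, not a path from your setup:

from hdfs3 import HDFileSystem

# Assumes the NameNode is reachable on localhost:8020.
hdfs = HDFileSystem(host="localhost", port=8020)
hdfs.mkdir("/user/yourname")                     # create the HDFS target directory
hdfs.put("test.csv", "/user/yourname/test.csv")  # copy the local file into HDFS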
when I try to do hdfs.mv with the same params … FileNotFoundError
That's because you need to cd into the directory containing the local CSV first. Otherwise, specify the full path to the file.
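For example, with an absolute local path (both paths below are hypothetical placeholders):

from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host="localhost", port=8020)
# "/full/path/to/test.csv" stands in for wherever test.csv actually lives locally.
hdfs.put("/full/path/to/test.csv", "/user/yourname/test.csv")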