I am using pyspark under Ubuntu with Python 2.7. I installed it using
pip install pyspark --user
and I am trying to follow the instructions to set up a Spark cluster.
I can't find the script start-master.sh, and I assume that has to do with the fact that I installed pyspark rather than regular Spark.
I found here that I can connect a worker node to the master via pyspark, but how do I start the master node with pyspark?
Answer
Well, I made a bit of a mix-up in the OP.
You need to get Spark on the machine that should run as the master. You can download it here.
After extracting it, you will have a spark/sbin folder containing the start-master.sh script. You need to start it with the -h argument.
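For example, assuming Spark was extracted to ~/spark and the master machine's address is 192.168.1.10 (both hypothetical values), the commands would look roughly like this:
cd ~/spark/sbin
# bind the standalone master to this machine's address; 7077 is the default port
./start-master.sh -h 192.168.1.10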
Please note that you need to create a spark-env file as explained here and define the Spark local and master variables; this is important on the master machine.
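A minimal sketch of that file, assuming the same example address as above and that the relevant variables are SPARK_LOCAL_IP and SPARK_MASTER_HOST (it lives in the conf folder and is usually created by copying the provided template):
cp ~/spark/conf/spark-env.sh.template ~/spark/conf/spark-env.sh
# then, inside spark-env.sh:
export SPARK_LOCAL_IP=192.168.1.10
export SPARK_MASTER_HOST=192.168.1.10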
After that, use the start-slave.sh script on each worker node to start it and connect it to the master.
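On each worker, again assuming the example master address above, that would be something like:
# run from the spark/sbin folder on the worker, pointing at the master's URL
./start-slave.sh spark://192.168.1.10:7077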
And you are good to go; you can use a SparkContext inside Python to use the cluster!
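A minimal sketch of connecting from Python, assuming the example master address used above:
from pyspark import SparkContext

# point the context at the standalone master started earlier
sc = SparkContext(master="spark://192.168.1.10:7077", appName="test")

# quick sanity check that work is actually distributed to the cluster
print(sc.parallelize(range(10)).sum())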