
How to start a standalone cluster using pyspark?

I am using pyspark under Ubuntu with Python 2.7. I installed it using

pip install pyspark --user 

And I am trying to follow the instructions to set up a Spark cluster.

I can’t find the script start-master.sh. I assume this has to do with the fact that I installed pyspark and not regular Spark.

I found here that I can connect a worker node to the master via pyspark, but how do I start the master node with pyspark?


Answer

Well, I did a bit of a mix-up in the OP.

You need to get Spark on the machine that should run as the master. You can download it here.
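
For example, once the release archive is on the master machine (the version below is just an illustration; any Spark 2.x release works with Python 2.7):

# extract the downloaded release; the exact file name depends on the version you chose
tar -xzf spark-2.4.8-bin-hadoop2.7.tgz
cd spark-2.4.8-bin-hadoop2.7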

After extracting it, you have a spark/sbin folder that contains the start-master.sh script. You need to start it with the -h argument (the host/IP the master should listen on).
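
For example, if the master machine's address is 192.168.1.10 (a placeholder, use your own):

# run on the master machine; the -h value is the address workers will connect to
./sbin/start-master.sh -h 192.168.1.10

The master's web UI (port 8080 by default) then shows the spark://192.168.1.10:7077 URL that workers and clients connect to.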

Please note that you need to create a spark-env.sh file as explained here and define the Spark local IP and master host variables; this is important on the master machine.
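
A minimal sketch of what that file could look like, assuming the same placeholder address as above (the file is created by copying conf/spark-env.sh.template to conf/spark-env.sh):

# spark/conf/spark-env.sh -- 192.168.1.10 is a placeholder for the master's address
export SPARK_LOCAL_IP=192.168.1.10
export SPARK_MASTER_HOST=192.168.1.10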

After that, on each worker node, use the start-slave.sh script to start a worker and connect it to the master.
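
For example, on each worker machine (again, the address is a placeholder for your master's URL):

# run on each worker machine; 7077 is the standalone master's default port
./sbin/start-slave.sh spark://192.168.1.10:7077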

And you are good to go: you can use a Spark context inside Python to use the cluster!
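
For instance, a minimal pyspark check against the cluster might look like this (the master URL is the same placeholder as above):

# connect to the standalone cluster from Python and run a trivial job
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("spark://192.168.1.10:7077") \
    .appName("cluster-smoke-test") \
    .getOrCreate()

print(spark.sparkContext.parallelize(range(100)).sum())  # should print 4950
spark.stop()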
