I am using pyspark under Ubuntu with Python 2.7. I installed it using
pip install pyspark --user
and I am trying to follow the instructions to set up a Spark cluster.
I can't find the script start-master.sh, and I assume that has to do with the fact that I installed pyspark rather than regular Spark.
I found here that I can connect a worker node to the master via pyspark, but how do I start the master node with pyspark?
Answer
Well, I made a bit of a mix-up in the OP.
You need to get Spark on the machine that should run as the master. You can download it here.
After extracting it, you will have a spark/sbin folder containing the start-master.sh script. You need to start it with the -h argument.
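For example, assuming Spark was extracted to ~/spark and the master machine's address is 192.168.1.10 (both hypothetical values), the commands would look roughly like this:
cd ~/spark/sbin
# bind the standalone master to this machine's address; 7077 is the default port
./start-master.sh -h 192.168.1.10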
Please note that you need to create a spark-env file as explained here and define the Spark local and master variables; this is important on the master machine.
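A minimal sketch of that file, assuming the same example address as above and that the relevant variables are SPARK_LOCAL_IP and SPARK_MASTER_HOST (it lives in the conf folder and is usually created by copying the provided template):
cp ~/spark/conf/spark-env.sh.template ~/spark/conf/spark-env.sh
# then, inside spark-env.sh:
export SPARK_LOCAL_IP=192.168.1.10
export SPARK_MASTER_HOST=192.168.1.10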
After that, use the start-slave.sh script on each worker node to start it and connect it to the master.
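On each worker, again assuming the example master address above, that would be something like:
# run from the spark/sbin folder on the worker, pointing at the master's URL
./start-slave.sh spark://192.168.1.10:7077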
And you are good to go; you can use a SparkContext inside Python to use the cluster!
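A minimal sketch of connecting from Python, assuming the example master address used above:
from pyspark import SparkContext

# point the context at the standalone master started earlier
sc = SparkContext(master="spark://192.168.1.10:7077", appName="test")

# quick sanity check that work is actually distributed to the cluster
print(sc.parallelize(range(10)).sum())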