I installed PySpark recently, and it installed correctly. But when I run the following simple program in Python, I get an error.
>>> from pyspark import SparkContext
>>> sc = SparkContext()
>>> data = range(1, 1000)
>>> rdd = sc.parallelize(data)
>>> rdd.collect()
While running the last line, I get an error whose key lines seem to be:
[Stage 0:> (0 + 0) / 4]18/01/15 14:36:32 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
I have the following variables in my .bashrc:
export SPARK_HOME=/opt/spark
export PYTHONPATH=$SPARK_HOME/python3
I am using Python 3.
Answer
By the way, if you use PyCharm, you can add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to your run/debug configurations, as in the image below.
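If you are not using PyCharm, the same two variables can be set directly in the driver script before the SparkContext is created. A minimal sketch, assuming sys.executable (the interpreter running the driver) is the Python 3 you also want the workers to use; any explicit Python 3 path available on the worker machines would do as well:

import os
import sys

# Point both the workers and the driver at the same interpreter.
# sys.executable is an assumption here: an explicit path such as
# /usr/bin/python3 works too, as long as it exists on the workers.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark import SparkContext

sc = SparkContext()
rdd = sc.parallelize(range(1, 1000))
print(rdd.collect()[:10])  # should print [1, 2, ..., 10] instead of raising

For terminal sessions, adding export PYSPARK_PYTHON=python3 and export PYSPARK_DRIVER_PYTHON=python3 to .bashrc next to the SPARK_HOME export achieves the same thing, which is exactly what the error message asks you to check.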