Code:
from pyspark.sql import SparkSession
from pysparkling import *
hc = H2OContext.getOrCreate()
I am using a Spark 3.2.1 standalone cluster and trying to initialize H2OContext in a Python file. When I run the script with spark-submit, I get the following error:
hc = H2OContext.getOrCreate()
NameError: name 'H2OContext' is not defined
Spark-submit command:
spark-submit --master spark://local:7077 --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2 spark_h20/h2o.py
Answer
The parameter --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2
downloads a JAR artifact from Maven. This artifact can be used only from Scala/Java; it does not provide the Python API. I see there is a mistake in the Sparkling Water documentation.
If you want to use the Python API, you need to:
- Download the Sparkling Water (SW) zip archive from this location
- Unzip the archive and go to the unzipped folder
- Use the command
spark-submit --master spark://local:7077 --py-files py/h2o_pysparkling_3.2-3.36.1.3-1-3.2.zip spark_h20/h2o.py
for submitting the script to the cluster.
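If you launch the job from a Python wrapper rather than typing the command by hand, the same invocation can be assembled as an argument list. This is only a minimal sketch: build_submit_command is a hypothetical helper name, not part of Sparkling Water or Spark, and the master URL, zip path, and script path are simply the values from the command above.

```python
import shlex


def build_submit_command(master, pysparkling_zip, script):
    """Return the spark-submit argument list for a PySparkling script.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    return [
        "spark-submit",
        "--master", master,
        # Ship the PySparkling zip with the job instead of using --packages.
        "--py-files", pysparkling_zip,
        script,
    ]


cmd = build_submit_command(
    master="spark://local:7077",
    pysparkling_zip="py/h2o_pysparkling_3.2-3.36.1.3-1-3.2.zip",
    script="spark_h20/h2o.py",
)

# Print the command as a single shell line for inspection.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) from inside the unzipped folder, so that the relative py/ path resolves.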