
I am getting an error while defining H2OContext in a Python Spark script

Code:

from pyspark.sql import SparkSession
from pysparkling import *

hc = H2OContext.getOrCreate()

I am using a Spark 3.2.1 standalone cluster and trying to initialize H2OContext in a Python file. When I run the script with spark-submit, I get the following error:

    hc = H2OContext.getOrCreate()
NameError: name 'H2OContext' is not defined

Spark-submit command:

spark-submit --master spark://local:7077 --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2 spark_h20/h2o.py


Answer

The parameter --packages ai.h2o:sparkling-water-package_2.12:3.36.1.3-1-3.2 downloads a jar artifact from Maven. This artifact can be used only from Scala/Java. It looks like there is a mistake in the Sparkling Water documentation.

If you want to use the Python API, you need to:

  • Download the Sparkling Water (SW) zip archive from this location
  • Unzip the archive and go to the unzipped folder
  • Submit the script to the cluster with the command spark-submit --master spark://local:7077 --py-files py/h2o_pysparkling_3.2-3.36.1.3-1-3.2.zip spark_h20/h2o.py
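
A NameError raised after `from pysparkling import *` says little about what is actually missing. As a minimal sketch (not part of the original answer), the script could look up the class by name and fail early with a hint about `--py-files`. The helper name `load_h2o_context_class` is hypothetical. The only assumption about PySparkling is that the `pysparkling` module exposes `H2OContext`, as the question's code expects.

```python
import importlib


def load_h2o_context_class():
    """Return pysparkling.H2OContext, or raise ImportError with a hint.

    Hypothetical helper for illustration; it is not part of Sparkling Water.
    """
    hint = "pass the PySparkling zip to spark-submit via --py-files"
    try:
        # Fails if no module named pysparkling is on the Python path.
        module = importlib.import_module("pysparkling")
    except ImportError as exc:
        raise ImportError(f"pysparkling is not importable; {hint}") from exc
    try:
        # Fails if a pysparkling module was found but lacks the Python API class.
        return getattr(module, "H2OContext")
    except AttributeError as exc:
        raise ImportError(f"pysparkling has no H2OContext; {hint}") from exc


# Usage inside the submitted script (requires a running Spark cluster):
# hc = load_h2o_context_class().getOrCreate()
```

With this guard, a submission that lacks the PySparkling zip stops at the lookup with a message pointing at `--py-files`, instead of failing later on the undefined name.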
User contributions licensed under: CC BY-SA