I am getting an error while defining H2OContext in a Python Spark script


from pyspark.sql import SparkSession
from pysparkling import *

hc = H2OContext.getOrCreate()

I am using a Spark 3.2.1 standalone cluster and trying to initialize H2OContext in a Python file. When I run the script with spark-submit, I get the following error:

hc = H2OContext.getOrCreate()
NameError: name 'H2OContext' is not defined

Spark-submit command:

spark-submit --master spark://local:7077 --packages ai.h2o:sparkling-water-package_2.12: spark_h20/



The parameter --packages ai.h2o:sparkling-water-package_2.12: downloads a JAR artifact from Maven. This artifact can be used only from Scala/Java, which is why H2OContext is not defined in your Python script. The Sparkling Water documentation is mistaken on this point.

If you want to use the Python API, you need to:

  • Download the Sparkling Water zip archive from this location.
  • Unzip the archive and go to the unzipped folder.
  • Submit the script to the cluster with the command spark-submit --master spark://local:7077 --py-files py/ spark_h20/
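
If the NameError persists after that, it may help to check on the driver whether the pysparkling module is importable at all and whether it actually exposes H2OContext. The sketch below is only a diagnostic using the Python standard library; it is not part of Sparkling Water, and the function name check_h2o_context and its status strings are made up for illustration.

```python
import importlib
import importlib.util


def check_h2o_context(module_name="pysparkling", attr="H2OContext"):
    """Return a short status string for `attr` inside `module_name`.

    Hypothetical helper: the name and the status strings are
    illustrative and not part of any Sparkling Water API.
    """
    # find_spec returns None when the top-level module is not on sys.path
    if importlib.util.find_spec(module_name) is None:
        return "missing-module"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module was found, but one of its own imports failed
        return "import-failed"
    # A wildcard import can only bind names the module really defines
    if not hasattr(module, attr):
        return "missing-attribute"
    return "ok"


if __name__ == "__main__":
    print(check_h2o_context())
```

Running this with the same spark-submit options as the real script shows whether the files passed via --py-files reached the Python path. Importing the name explicitly (from pysparkling import H2OContext) instead of using the wildcard import also makes a missing name fail at the import line, which is easier to diagnose.
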
User contributions licensed under: CC BY-SA