Spark error in cluster: ModuleNotFoundError: No module named 'cst_utils'

I have a Spark program written in Python. The program is structured like this:

Each of cst_utils.py, bn_utils.py, and ep_utils.py has a function called Spark_Func(sc). In main I create a SparkContext, sc, and pass it to each Spark_Func like this:
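
A minimal sketch of that layout, assuming main.py imports the three modules directly and each Spark_Func(sc) needs only the SparkContext. The helper run_spark_funcs and the app name "my_app" are hypothetical; pyspark and the project modules are imported inside main() so the helper can be exercised without Spark installed.

```python
"""main.py: hypothetical driver that hands one SparkContext to every helper module."""
from types import ModuleType
from typing import Any, Dict, Iterable


def run_spark_funcs(sc: Any, modules: Iterable[ModuleType]) -> Dict[str, Any]:
    """Call Spark_Func(sc) on each module and collect the results by module name."""
    results: Dict[str, Any] = {}
    for mod in modules:
        results[mod.__name__] = mod.Spark_Func(sc)
    return results


def main() -> None:
    # Imported lazily so run_spark_funcs can be tested without Spark installed.
    from pyspark import SparkContext

    import bn_utils
    import cst_utils
    import ep_utils

    # "my_app" is a hypothetical name; the master URL comes from spark-submit --master.
    sc = SparkContext(appName="my_app")
    try:
        run_spark_funcs(sc, [cst_utils, bn_utils, ep_utils])
    finally:
        sc.stop()
```

In the real main.py, main() would be called at the bottom of the script, typically under an if __name__ == "__main__": guard.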

I configured a Spark cluster with one master and two slaves, all running Ubuntu 20.04. I set the master IP in spark-env.sh and set up passwordless SSH so that the master node can access each slave node without authentication. I ran these commands on each node:

MASTER NODE:

SLAVES:

The cluster is running, since I can see the Spark UI by opening this in a browser:

But when I try to run the program with this command:

I receive this error:

The program is located at the same path on every node, and so is Spark.
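
Having the files at the same path on every node is likely not enough on its own. PySpark pickles the functions used in RDD operations, and module-level functions from an importable module such as cst_utils are pickled by reference (module name plus function name), so each executor's Python worker has to import cst_utils itself, from its own sys.path. The sketch below reproduces that failure with only the standard pickle module; the temporary cst_utils.py and its parse function are hypothetical stand-ins, and removing the directory from sys.path only approximates what an executor sees.

```python
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for the project's cst_utils.py, written to a temp folder.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "cst_utils.py"), "w") as fh:
    fh.write("def parse(line):\n    return line.split(',')\n")

# "Driver" side: the project folder is on sys.path, so the import succeeds.
sys.path.insert(0, workdir)
import cst_utils

# Only the reference "cst_utils.parse" is serialized, not the function's code.
payload = pickle.dumps(cst_utils.parse)

# "Executor" side: same bytes, but cst_utils is no longer importable.
sys.path.remove(workdir)
del sys.modules["cst_utils"]

try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)

print(error)  # No module named 'cst_utils'
```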

In fact, when I run the program in local mode, it runs without any issues. To run locally, I use this configuration for the SparkContext:
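
For comparison, a hedged sketch of both setups (the exact local configuration is not shown here, so setMaster("local[*]") is an assumption). In local mode the Python workers run on the same machine as the driver, which is a plausible reason the imports resolve there. For a cluster, SparkContext also accepts a pyFiles list, which ships the listed files to the executors and puts them on their Python path; sc.addPyFile does the same for a single file once the context exists. The helper names, the excluded main.py, and the app name are hypothetical.

```python
import glob
import os
from typing import List


def project_py_files(project_dir: str, exclude: str = "main.py") -> List[str]:
    """Absolute paths of the project's .py files, minus the (hypothetical) entry script."""
    paths = glob.glob(os.path.join(project_dir, "*.py"))
    return sorted(os.path.abspath(p) for p in paths if os.path.basename(p) != exclude)


def make_context(master: str, project_dir: str):
    """Create a SparkContext; on a cluster, also ship the helper modules to executors."""
    from pyspark import SparkConf, SparkContext  # lazy import: needed only at run time

    conf = SparkConf().setAppName("my_app").setMaster(master)  # hypothetical app name
    if master.startswith("local"):
        # Local mode: driver and Python workers share one machine, so imports resolve.
        return SparkContext(conf=conf)
    # Cluster mode: pyFiles copies each file to the executors and adds it to their path.
    return SparkContext(conf=conf, pyFiles=project_py_files(project_dir))
```

For example, make_context("local[*]", ".") for local runs, or make_context("spark://<master-ip>:7077", "/path/to/project") for the standalone cluster, where 7077 is the standalone master's default port.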

Update 1:

I also created a virtual environment and installed all the packages in it so they can be distributed to the nodes. In detail:

  1. To be able to create a virtual environment in Python, run this command:
  2. Create the virtual environment:
  3. Activate the environment:
  4. I use venv-pack to pack all the packages installed in the project:
  5. Pack the packages:

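The environment steps above can also be scripted; a minimal sketch, assuming the standard-library venv module stands in for creating the environment (activation amounts to using the environment's bin/python) and venv-pack's Python API, venv_pack.pack, does the packing. The directory and archive names are hypothetical. venv-pack's documentation notes that the Python interpreter itself is not copied into the archive, so every node needs the same Python installed at the same location; here all nodes run Ubuntu 20.04.

```python
import os
import venv


def create_env(env_dir: str, with_pip: bool = True) -> str:
    """Create a virtual environment and return the path of its interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # POSIX layout (Ubuntu): "activating" the env amounts to using this interpreter.
    return os.path.join(env_dir, "bin", "python")


def pack_env(env_dir: str, output: str = "pyspark_venv.tar.gz") -> None:
    """Archive the environment with venv-pack (third-party: pip install venv-pack)."""
    import venv_pack  # imported lazily so create_env works without venv-pack installed

    # "pyspark_venv.tar.gz" is a hypothetical archive name.
    venv_pack.pack(prefix=env_dir, output=output)
```

The packages themselves would be installed with the returned interpreter (its -m pip) between the two calls.
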
Moreover, as the Spark documentation suggests, I put all the project's .py files in a folder and compressed it into a .zip file.
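
A minimal sketch of that packaging step with the standard zipfile module, assuming the modules sit at the top level of the project folder and the entry script (hypothetically main.py) is left out because spark-submit receives it separately. Where the files land inside the archive matters: if the zip contains the folder itself (project/cst_utils.py), a plain import cst_utils will not find it. The function and archive names are hypothetical.

```python
import os
import zipfile
from typing import List


def zip_modules(project_dir: str, zip_path: str, exclude: str = "main.py") -> List[str]:
    """Zip the project's .py modules at the archive root and return their names."""
    names = sorted(
        f for f in os.listdir(project_dir)
        if f.endswith(".py") and f != exclude  # main.py is a hypothetical entry script
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname=name keeps "cst_utils.py" at the root, not "project/cst_utils.py",
            # so "import cst_utils" works once the zip is on the Python path.
            zf.write(os.path.join(project_dir, name), arcname=name)
    return names
```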

Finally, after starting the cluster, I ran this command:

But it ends up with this error:

Could you please point out what is wrong with running the code on the cluster?

Any help would be really appreciated.

Answer

Problem solved.

First, I installed all the packages on each node with this command:
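
As a hypothetical sketch, the same per-node installation could be driven from the master over the passwordless SSH already set up. The host names, the requirements.txt path, and the python3 -m pip invocation are assumptions; the requirements file is assumed to exist at the same path on every node, and the remote python3 should be the interpreter the Spark workers actually use. The commands are built first so they can be reviewed before running them.

```python
import shlex
import subprocess
from typing import List, Sequence


def pip_install_commands(hosts: Sequence[str], requirements: str) -> List[List[str]]:
    """One ssh command per node that installs the same (hypothetical) requirements file."""
    remote = "python3 -m pip install -r " + shlex.quote(requirements)
    return [["ssh", host, remote] for host in hosts]


def install_everywhere(hosts: Sequence[str], requirements: str) -> None:
    """Run the commands; check=True stops at the first node where pip fails."""
    for cmd in pip_install_commands(hosts, requirements):
        subprocess.run(cmd, check=True)
```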

Then, when running the program, I had to list every .py file used by the program after the --py-files option, like this:
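
A minimal sketch of building such a spark-submit call from Python, assuming a standalone master URL of the form spark://<master-ip>:7077 and the module names from the question. --py-files takes a single comma-separated value (no spaces), and each listed file is placed on the executors' Python path. The function name is hypothetical, and checking the files up front is a design choice so a typo fails on the driver rather than as ModuleNotFoundError on an executor.

```python
import os
from typing import List, Sequence


def spark_submit_argv(master: str, main_script: str, py_files: Sequence[str]) -> List[str]:
    """Build a spark-submit command line that ships every helper module via --py-files."""
    missing = [p for p in py_files if not os.path.isfile(p)]
    if missing:
        # Fail early on the driver instead of with ModuleNotFoundError on an executor.
        raise FileNotFoundError(", ".join(missing))
    return [
        "spark-submit",
        "--master", master,
        "--py-files", ",".join(py_files),  # one comma-separated value, no spaces
        main_script,
    ]
```

The resulting list can be passed directly to subprocess.run(..., check=True).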

After that, I no longer get any errors about importing the files.

User contributions licensed under: CC BY-SA