Skip to content
Advertisement

ModuleNotFoundError in Dataflow job

I am trying to execute a apache beam pipeline as a dataflow job in Google Cloud Platform.

My project structure is as follows:

JavaScript

Here’s my setup.py

JavaScript

Here’s my pipeline code:

JavaScript

Functionality of pipeline is

  1. Query against BigQuery Table.
  2. Count the total records fetched from querying.
  3. Print using the custom Log module present in utils folder.

I am running the job using cloud shell using command - python3 - main.py

Though the Dataflow job starts, the worker nodes throws error after few mins saying “ModuleNotFoundError: No module named ‘utils'”

“utils” folder is available and the same code works fine when executed with “DirectRunner”.

log_util and config_util files are custom util files for logging and config fetching respectively.

Also, I tried running with setup_file options as python3 - main.py --setup_file </path/of/setup.py> which makes the job to just freeze and does not proceed even after 15 mins.

How do I resolve the ModuleNotFoundError with “DataflowRunner”?

Advertisement

Answer

Posting as community wiki. As confirmed by @GopinathS the error and fix are as follows:

The error encountered by the workers is Beam SDK base version 2.32.0 does not match Dataflow Python worker version 2.28.0. Please check Dataflow worker startup logs and make sure that correct version of Beam SDK is installed.

To fix this “apache-beam[gcp]>=2.20.0” is removed from install_requires of setup.py since, the ‘>=’ is assigning the latest available version (2.32.0 as of this writing) while the workers version are only 2.28.0.

Updated setup.py:

JavaScript

Updated beam_options in the pipeline code:

JavaScript

Also make sure that you pass all the pipeline options at once and not partially.

If you pass --setup_file </path/of/setup.py> in the command then make sure to read and append the setup file path into the already defined beam_options variable using argument_parser in your code.

To avoid parsing the argument and appending into beam_options I instead added it directly in beam_options as "setup_file": "./setup.py"

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement