Can we create a Dataproc Workflow Template by passing the path of a Jupyter notebook in step_id?

I have been trying to create a Dataproc Workflow Template that executes Jupyter notebooks stored on my Dataproc cluster, but when I instantiate the template, the jobs fail. If I instead download my notebooks as .py files and add those to a Workflow Template, it works.

I am curious whether there is any way to create a Workflow Template that can take existing Jupyter notebooks directly as its steps.

Answer

Direct execution of Jupyter notebooks via the Jobs and Workflow Template APIs is not yet supported on Dataproc.

You can work around this by writing a PySpark job (or Workflow Template step) that uses nbconvert to execute a notebook, and submitting that instead.
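
A minimal sketch of such a driver script follows. It assumes that nbformat and nbconvert (plus a Jupyter kernel) are installed on the node that runs the job driver, and that the notebook is readable at a local path there; how the file gets there is left open. The file name `run_notebook.py`, its arguments, and the default kernel name `python3` are illustrative assumptions, not part of any Dataproc API.

```python
"""Hypothetical driver script (run_notebook.py) that executes a Jupyter
notebook with nbconvert. Intended to be submitted as a Dataproc PySpark job."""
import argparse
import sys


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Execute a Jupyter notebook with nbconvert.")
    parser.add_argument("input_notebook", help="Local path of the .ipynb file to run")
    parser.add_argument("output_notebook", help="Where to write the executed copy")
    # Assumed kernel name; check `jupyter kernelspec list` on the cluster.
    parser.add_argument("--kernel", default="python3", help="Jupyter kernel name")
    parser.add_argument("--timeout", type=int, default=600, help="Per-cell timeout in seconds")
    return parser.parse_args(argv)


def run_notebook(input_path, output_path, kernel_name="python3", timeout=600):
    # Imported lazily so argument parsing works even where Jupyter is not installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    with open(input_path, encoding="utf-8") as f:
        nb = nbformat.read(f, as_version=4)

    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    try:
        # Runs every cell in order; raises CellExecutionError if a cell fails.
        ep.preprocess(nb, {"metadata": {"path": "."}})
    finally:
        # Save the (possibly partially) executed notebook so outputs and
        # tracebacks can be inspected afterwards.
        with open(output_path, "w", encoding="utf-8") as f:
            nbformat.write(nb, f)


def main(argv):
    args = parse_args(argv)
    run_notebook(args.input_notebook, args.output_notebook, args.kernel, args.timeout)


# The job passes its arguments on the command line; without any, do nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Submitted as a PySpark job (or as the PySpark job of a Workflow Template step), the script's URI would be the main Python file and the notebook paths would go in the job's arguments; the step_id is only a name for the step, not a file path. Because an exception from a failing cell propagates out of the script, the job, and with it the workflow step, should be reported as failed.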

User contributions licensed under: CC BY-SA