I have been trying to create a Dataproc Workflow Template that executes Jupyter notebooks present on my Dataproc cluster, but when I instantiate the template the jobs fail. If I instead download my notebooks as .py files and add those to a Workflow Template, it works.
I am just curious if there is any way to create a Workflow Template that can directly take in existing Jupyter notebooks as its steps.
Answer
Direct execution of Jupyter notebooks via the Jobs and Workflow Template APIs is not yet supported on Dataproc.
You can work around this by writing and submitting a PySpark job (or Workflow Template step) that uses nbconvert
to execute the notebook.
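As a rough sketch of that approach (assuming nbformat and nbconvert are installed on the cluster, and with the notebook path and kernel name as placeholders for your own setup), the driver script below can be submitted as the PySpark step; it executes the notebook's cells in order and fails the job if any cell errors out:

```python
# run_notebook.py - driver submitted as a PySpark job / Workflow Template step.
# Assumes nbformat and nbconvert are available on the cluster nodes and that
# the notebook path is passed as the first job argument.
import sys

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor


def run_notebook(path, timeout=3600, kernel="python3"):
    # Load the notebook from a local path on the cluster.
    with open(path) as f:
        nb = nbformat.read(f, as_version=4)

    # Execute every cell in order; a CellExecutionError propagates up,
    # which makes the Dataproc job (and the workflow step) fail.
    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel)
    ep.preprocess(nb, {"metadata": {"path": "."}})

    # Write the executed notebook back out so the cell outputs are inspectable.
    with open(path, "w") as f:
        nbformat.write(nb, f)


if __name__ == "__main__":
    run_notebook(sys.argv[1])
```

You would then add this script to your template as a regular PySpark step (for example with `gcloud dataproc workflow-templates add-job pyspark`), passing the notebook path as a job argument.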