
Need help running spark-submit in Apache Airflow

I am relatively new to Python and Airflow, and I am having a very difficult time getting spark-submit to run in an Airflow task. My goal is to get the following DAG task to run successfully:

[DAG code not preserved]

I know the problem lies with Airflow and not with the Bash command itself, because when I run spark-submit --class CLASSPATH.CustomCreate ~/IdeaProjects/custom-create-job/build/libs/custom-create.jar in a terminal, it runs successfully.
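One thing worth checking: an Airflow task runs in the worker's environment, which may have a different PATH and home directory than your interactive terminal. If so, spark-submit or the ~ in the jar path can resolve differently inside the task. The sketch below builds the same command with an explicit spark-submit location and an expanded jar path. The helper build_spark_submit_cmd and the /opt/spark install directory are hypothetical and not part of Airflow.

```python
import os
import shlex
import shutil


def build_spark_submit_cmd(main_class, jar_path, spark_home=None):
    """Return the spark-submit invocation as an argument list.

    Hypothetical helper: spark_home is an assumed Spark install directory.
    When it is omitted, spark-submit is looked up on the current PATH.
    """
    if spark_home:
        binary = os.path.join(spark_home, "bin", "spark-submit")
    else:
        # Fall back to the bare name if spark-submit is not on this PATH.
        binary = shutil.which("spark-submit") or "spark-submit"
    # Expand "~" using the home directory of the process running this code.
    jar = os.path.expanduser(jar_path)
    return [binary, "--class", main_class, jar]


cmd = build_spark_submit_cmd(
    "CLASSPATH.CustomCreate",
    "~/IdeaProjects/custom-create-job/build/libs/custom-create.jar",
    spark_home="/opt/spark",  # assumed location; adjust for your install
)
# shlex.join quotes each argument so the string is safe to hand to Bash.
print(shlex.join(cmd))
```

The printed string could be passed as bash_command to BashOperator(...). This makes it explicit which binary and which jar the task uses, instead of relying on the worker's PATH and HOME.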

I have been getting the following error in the Airflow logs:

[Airflow log output not preserved]

I have also tried SparkSubmitOperator(...), but I have had no successful runs with it. I only ever end up with error logs like the following:

[SparkSubmitOperator error log not preserved]

Is there something I have to do using SparkSubmitOperator(...) before I can run the spark-submit ... command in a BashOperator(...) task?

Is there a way to run my spark-submit command directly from the SparkSubmitOperator(...) task?

Do I need to change anything in the spark_default connection in the Admin->Connections page of Airflow?

Does anything need to be set in the Admin->Users page of Airflow? Is there a setting that allows Airflow to run Spark, or to run a jar file created by a specific user? If so, what is it and how do I set it?
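On running the command directly from SparkSubmitOperator(...): as far as I know, its application and java_class parameters correspond to the jar path and the --class flag. Its conn_id parameter selects the Spark connection, which defaults to spark_default, and that connection's host is used as the Spark master. The sketch below only assembles those keyword arguments, so it runs without Airflow installed. The helper name, task_id, and absolute jar path are hypothetical.

```python
import os


def spark_submit_operator_kwargs(main_class, jar_path, conn_id="spark_default"):
    """Map the spark-submit CLI flags onto SparkSubmitOperator keyword arguments.

    Hypothetical helper. An absolute jar path is required so the task does
    not depend on "~" being expanded for the user the worker runs as.
    """
    if not os.path.isabs(jar_path):
        raise ValueError("use an absolute jar path instead of a ~ or relative path")
    return {
        "task_id": "custom_create",  # hypothetical task name
        "application": jar_path,     # the jar that spark-submit runs
        "java_class": main_class,    # equivalent of --class
        "conn_id": conn_id,          # Airflow connection holding the Spark master
    }


kwargs = spark_submit_operator_kwargs(
    "CLASSPATH.CustomCreate",
    "/home/me/IdeaProjects/custom-create-job/build/libs/custom-create.jar",  # hypothetical path
)
print(kwargs)
```

Inside a DAG file, these arguments would be passed as SparkSubmitOperator(**kwargs). In Airflow 2, the operator is imported from airflow.providers.apache.spark.operators.spark_submit, which requires the apache-airflow-providers-apache-spark package. The spark_default connection may also need its host changed to match your master, for example local, yarn, or spark://host:7077.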


Answer

I found a workaround that solved this problem.

Create a new SSH connection (or edit the default one) in the Airflow Admin->Connections page with the following settings:
Conn ID: ssh_connection
Conn Type: SSH
Host: HOST IP ADDRESS
Username: HOST USERNAME
Password: HOST PASSWORD
Port:
Extra: {"key_file": "/PATH TO HOME DIR/airflow/.ssh/id_rsa", "allow_host_key_change": "true", "no_host_key_check": "true"}
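The Extra field must be valid JSON, so it needs straight double quotes rather than typographic ones. As an alternative to the UI, Airflow can also read a connection from an environment variable named AIRFLOW_CONN_<CONN_ID> that holds a URI, with the extras passed as query parameters. The sketch below builds both forms for the connection above. The host, user, password, and key path values are placeholders, and the helper ssh_connection_uri is hypothetical.

```python
import json
from urllib.parse import quote, urlencode


def ssh_connection_uri(host, username, password, extra, port=None):
    """Build an ssh:// connection URI with the extras as query parameters.

    Hypothetical helper. The user and password are percent-encoded so that
    characters like "@" or ":" do not break the URI.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    netloc = f"{user}:{pwd}@{host}" + (f":{port}" if port else "")
    return f"ssh://{netloc}?{urlencode(extra)}"


extra = {
    "key_file": "/home/me/airflow/.ssh/id_rsa",  # placeholder key path
    "allow_host_key_change": "true",
    "no_host_key_check": "true",
}

# Valid JSON for the Extra field in the UI.
print(json.dumps(extra))
# Environment-variable form for the connection ID ssh_connection.
print("AIRFLOW_CONN_SSH_CONNECTION=" + ssh_connection_uri("192.0.2.10", "me", "secret", extra))
```

The port is left off when it is not given, which matches the blank Port field above.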

Then make the corresponding changes to your Python DAG script:

[Revised DAG code not preserved]
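The revised DAG code is not shown. With the ssh_connection above, though, the task would presumably use SSHOperator, whose ssh_conn_id and command parameters take the connection ID and the shell command to run on the remote host. The sketch below builds those arguments. The helper name and task_id are hypothetical. The jar path is deliberately left unquoted so that the remote shell expands ~ as the SSH user, which means the path must not contain whitespace.

```python
import shlex


def ssh_spark_task_kwargs(main_class, remote_jar, conn_id="ssh_connection"):
    """Build SSHOperator keyword arguments that run spark-submit remotely.

    Hypothetical helper. remote_jar is inserted unquoted so "~" is expanded
    by the remote shell, so it must not contain whitespace.
    """
    if any(ch.isspace() for ch in remote_jar):
        raise ValueError("remote_jar is passed unquoted and must not contain whitespace")
    command = f"spark-submit --class {shlex.quote(main_class)} {remote_jar}"
    return {"task_id": "custom_create", "ssh_conn_id": conn_id, "command": command}


kwargs = ssh_spark_task_kwargs(
    "CLASSPATH.CustomCreate",
    "~/IdeaProjects/custom-create-job/build/libs/custom-create.jar",
)
print(kwargs["command"])
```

In Airflow 2, SSHOperator is imported from airflow.providers.ssh.operators.ssh, which requires the apache-airflow-providers-ssh package. The task would then be SSHOperator(**kwargs). A non-interactive SSH session may not load the same shell profile as your terminal, so if spark-submit is not found on the remote side, giving its absolute path in the command is a likely fix.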

I hope this solution helps anyone else who runs into a similar problem.

User contributions licensed under: CC BY-SA