
Airflow run python script connected via gcsfuse using PythonOperator

I want to run a Python script that is stored in this GCP directory:

 /home/airflow/gcsfuse/dags/external/projectXYZ/test.py

I used the BashOperator before to execute the script, which works in principle, but I'm getting errors from some functions in some Python libraries. Therefore I want to test whether the PythonOperator works instead. For the BashOperator I used the following code snippet:

run_python = BashOperator(
        task_id='run_python',
        bash_command='python /home/airflow/gcsfuse/dags/external/projectXYZ/test.py'
    )
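For context, a minimal DAG file around that snippet might look like the sketch below; the dag_id, start_date, schedule and the Airflow 2.x import path are assumptions, not part of the original question.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal DAG wrapper; dag_id, start_date and schedule are placeholders.
with DAG(
    dag_id='run_external_script',
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_python = BashOperator(
        task_id='run_python',
        bash_command='python /home/airflow/gcsfuse/dags/external/projectXYZ/test.py',
    )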

For the PythonOperator I saw some posts importing a function from a Python script. However, I don't know how to get Airflow to recognize such an import. The only way I have to interact between GCP and Airflow is through the gcsfuse/dags/external folder. How can I execute the file from this path instead of calling a function in the PythonOperator?


Answer

So after some research and testing I came to the conclusion that it is not possible to execute a Python file located in a GCP storage bucket with the PythonOperator. If a Python file sits in a GCP storage bucket that is connected to Airflow via gcsfuse, you need to use the BashOperator. If you want to use the PythonOperator, you either have to write your Python code inside your DAG and call a function with the PythonOperator, or you import a function from a Python file that is already stored on the Airflow storage itself and then call that function with the PythonOperator.
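A minimal sketch of both options follows (Airflow 2.x import paths are assumed; the dag_id, function names and the helper module in Option 2 are hypothetical, for illustration only):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Option 1: define the callable directly in the DAG file.
def my_task_function():
    print("logic written inside the DAG file")

# Option 2: import a function from a module that already lives on the
# Airflow storage itself (e.g. alongside your DAG files). Module and
# function names here are hypothetical:
# from my_helpers import main

with DAG(
    dag_id='projectXYZ_python',          # placeholder dag_id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_python = PythonOperator(
        task_id='run_python',
        python_callable=my_task_function,  # or python_callable=main for Option 2
    )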

Feel free to correct me if I am mistaken.

User contributions licensed under: CC BY-SA