I’m trying to make a DAG with params that can be triggered with a dag_id/task_id. The goal of this DAG is to set the state of the last executed task to “success” and to continue the pipeline from that point. Example pipeline: in my DAG I want to be able to set “run_that” to success and automatically run “run_them”.
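A minimal sketch of one way to approach this, assuming the Airflow 2.x stable REST API with the basic-auth backend enabled; the webserver URL, credentials, and the dag_id/task_id/dag_run_id params are illustrative, and older 2.x releases expect execution_date instead of dag_run_id:

```python
# Sketch only: a utility DAG that marks a task instance in another DAG as
# "success" via the stable REST API so the scheduler can pick up the pipeline
# from that point. URL, credentials, and param names are illustrative.
import pendulum
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def _mark_success(**context):
    p = context["params"]
    resp = requests.post(
        f"http://airflow-webserver:8080/api/v1/dags/{p['dag_id']}/updateTaskInstancesState",
        auth=("admin", "admin"),  # placeholder basic-auth credentials
        json={
            "dry_run": False,
            "task_id": p["task_id"],
            "dag_run_id": p["dag_run_id"],  # older 2.x releases use "execution_date"
            "new_state": "success",
            "include_upstream": False,
            "include_downstream": False,
            "include_future": False,
            "include_past": False,
        },
    )
    resp.raise_for_status()


with DAG(
    dag_id="mark_task_success",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval=None,
    params={"dag_id": "", "task_id": "", "dag_run_id": ""},
) as dag:
    PythonOperator(task_id="set_state", python_callable=_mark_success)
```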
Tag: airflow
How to load a BigQuery table from a file in GCS Bucket using Airflow?
I am new to Airflow, and I am wondering how to load a file from a GCS bucket into BigQuery. So far, I have managed to go from BigQuery to a GCS bucket: Can someone help me modify my current code so I can load a file from a GCS bucket into BigQuery? Answer For your requirement,
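A minimal sketch with the Google provider’s transfer operator, assuming a CSV source; the bucket, object path, and destination table are placeholders:

```python
# Sketch only: GCS -> BigQuery load with the Google provider; bucket, object
# path, and destination table are placeholders.
import pendulum
from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_example",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval=None,
) as dag:
    load_file = GCSToBigQueryOperator(
        task_id="gcs_to_bq",
        bucket="my-bucket",
        source_objects=["exports/my_file.csv"],
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
        autodetect=True,
    )
```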
Airflow 2.2.2 remote worker logging getting 403 Forbidden
I have a setup where Airflow is running in Kubernetes (EKS) and a remote worker is running in docker-compose on a VM behind a firewall in a different location. Problem: the Airflow web server in EKS gets a 403 Forbidden error when trying to fetch logs from the remote worker. Build version: Airflow 2.2.2, OS: Ubuntu 20.04 LTS, Kubernetes 1.22
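An assumption worth checking first: log fetching between components is signed with the [webserver] secret_key, so a mismatched key (or clock skew between the hosts) commonly produces this 403. A quick sketch to compare the value on both the EKS webserver and the docker-compose worker:

```python
# Sketch only: print the [webserver] secret_key on each component; the values
# must be identical for the webserver to authenticate against the worker's
# log-serving endpoint.
from airflow.configuration import conf

print(conf.get("webserver", "secret_key"))
```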
Poetry and buildkit mount=type=cache not working when building over airflow image
I have two example Dockerfiles; one works and the other does not. The main difference between the two is the base image. Simple Python base image Dockerfile: Airflow base image Dockerfile: Before building the Dockerfile, run poetry lock in the same folder as the pyproject.toml file! pyproject.toml file: In order to build the images
Airflow – how to send out an email alert when a sensor is set to soft fail
In the dag below, sensor A is set to soft_fail = True, because I’d like to skip B and C if A fails. The problem is that I’d still like to get an email alert when A fails. But when soft_fail is true, A is marked as success when the sensor doesn’t detect anything, and no email alert is sent
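One possible workaround, sketched here and not necessarily the accepted answer’s approach: subclass the sensor so it sends a notification before the soft-fail skip propagates. FileSensor is only a stand-in base class, and the recipient address plus a configured SMTP backend are assumptions:

```python
# Sketch only: a sensor subclass that emails before the soft-fail skip is
# propagated. FileSensor is just a stand-in base class; the recipient address
# and a configured SMTP connection are assumptions.
from airflow.exceptions import AirflowSkipException
from airflow.sensors.filesystem import FileSensor
from airflow.utils.email import send_email


class AlertingFileSensor(FileSensor):
    """FileSensor that sends an email when soft_fail turns a timeout into a skip."""

    def execute(self, context):
        try:
            return super().execute(context)
        except AirflowSkipException:
            send_email(
                to=["alerts@example.com"],  # placeholder recipient
                subject=f"Sensor {self.task_id} soft-failed",
                html_content="The sensor timed out and was skipped.",
            )
            raise  # re-raise so B and C are still skipped
```

The sensor task would then be declared with this subclass (keeping soft_fail=True and the existing downstream wiring) instead of the stock sensor.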
Airflow – how to skip certain tasks
For a pipeline like the one below, the sensor was set to soft_fail=True. I’m trying to figure out how to skip only certain tasks when the sensor fails, for example having only B and D skipped while C and E still execute. Many thanks for your help. Sensor A >> B >> C >> D >> E Answer I think you could
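A sketch of one way to get that behaviour (not necessarily what the answer above goes on to suggest): give C and E trigger_rule=ALL_DONE so a skipped upstream does not cascade to them, and have D skip itself when the sensor was skipped. Task A is shown as a placeholder operator rather than the real sensor, and the callables are illustrative:

```python
# Sketch only: C and E run regardless thanks to trigger_rule=ALL_DONE, B is
# skipped automatically with A, and D skips itself when A was skipped.
import pendulum
from airflow import DAG
from airflow.exceptions import AirflowSkipException
from airflow.operators.python import PythonOperator
from airflow.utils.trigger_rule import TriggerRule


def _d_work(**context):
    # D's direct upstream (C) succeeds even when A soft-failed, so D checks A itself
    ti_a = context["dag_run"].get_task_instance("A")
    if ti_a and ti_a.state == "skipped":
        raise AirflowSkipException("Sensor A soft-failed; skipping D")
    # ... real work for D goes here


with DAG(
    dag_id="skip_only_some_tasks",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval=None,
) as dag:
    a = PythonOperator(task_id="A", python_callable=lambda: None)  # stands in for the sensor
    b = PythonOperator(task_id="B", python_callable=lambda: None)  # default rule: skipped with A
    c = PythonOperator(task_id="C", python_callable=lambda: None,
                       trigger_rule=TriggerRule.ALL_DONE)  # runs even if B was skipped
    d = PythonOperator(task_id="D", python_callable=_d_work)
    e = PythonOperator(task_id="E", python_callable=lambda: None,
                       trigger_rule=TriggerRule.ALL_DONE)  # runs even if D was skipped
    a >> b >> c >> d >> e
```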
Read run_id in an Airflow operator for non-templated fields
I am trying to read the DAG’s run_id in SnowflakeOperator to set a session parameter, query_tag. But it seems like the session parameter is not templated. How can I reference run_id and use it as an input here? Answer You need to make the non-templated field templated. Then you can use it as:
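A sketch of that idea, assuming your Snowflake provider version exposes a session_parameters argument; the field name and connection id are assumptions, so adjust them to whichever attribute you actually need rendered:

```python
# Sketch only: expose the session parameters to Jinja by adding the field to
# template_fields, then reference {{ run_id }} in it. The session_parameters
# argument is assumed to exist in your Snowflake provider version.
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator


class TemplatedSnowflakeOperator(SnowflakeOperator):
    template_fields = (*SnowflakeOperator.template_fields, "session_parameters")


# declared inside your existing `with DAG(...)` block
tag_and_run = TemplatedSnowflakeOperator(
    task_id="tag_and_run",
    snowflake_conn_id="snowflake_default",
    sql="SELECT 1",
    session_parameters={"QUERY_TAG": "{{ run_id }}"},  # rendered at runtime
)
```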
Flask blueprint already registered because setuptools returns duplicate distributions
When trying a new Airflow version, I got this error: With Apache Airflow you can define a plugin using an entry_point. I managed to track it down to a call to importlib_metadata.distributions(), which returns the same object twice. Why is it returned twice? Answer The importlib_metadata.distributions() call uses your PYTHONPATH environment variable, accessible via sys.path in your Python project. When I
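A small diagnostic sketch along those lines: list sys.path and any distribution names that importlib_metadata discovers more than once (an overlapping PYTHONPATH entry typically shows up as the duplicate):

```python
# Sketch only: show sys.path and any distribution names discovered more than
# once; an overlapping PYTHONPATH entry usually explains the duplicate.
import sys
from collections import Counter

import importlib_metadata

names = [dist.metadata["Name"] for dist in importlib_metadata.distributions()]
duplicates = [name for name, count in Counter(names).items() if count > 1]

print("sys.path entries:", sys.path)
print("duplicate distributions:", duplicates)
```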
Airflow is failing my DAG when I use external scripts, giving ModuleNotFoundError: No module named
I am new to Airflow, and I am trying to create a Python pipeline scheduling automation process. My project youtubecollection01 uses custom-created modules, so when I run the DAG it fails with ModuleNotFoundError: No module named ‘Authentication’. This is how my project is structured: This is my dag file: I am importing the main function from __main__.py; however, inside
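A common fix, sketched under the assumption that the package sits next to the DAG file rather than on the scheduler’s import path: make its location importable before the import runs. The paths here are placeholders for the real project layout:

```python
# Sketch only: make the custom package importable from the DAG file. The
# relative layout below is a placeholder for the real project structure.
import os
import sys

# assume the project folder sits next to this DAG file
PROJECT_DIR = os.path.join(os.path.dirname(__file__), "youtubecollection01")
sys.path.insert(0, PROJECT_DIR)

import Authentication  # noqa: E402  # the custom module from the error message
```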
Airflow run python script connected via gcsfuse using PythonOperator
I want to run a Python script that is stored in this GCP directory: I used the BashOperator before to execute the script, which works in theory, but I’m getting errors from some functions in some Python libraries. Therefore I want to test whether the PythonOperator works. For the BashOperator I used the following code snippet: For
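A sketch of the PythonOperator route, assuming the gcsfuse mount is visible to the worker; the mount path and script name are placeholders:

```python
# Sketch only: execute a script from the gcsfuse-mounted path inside a
# PythonOperator; the mount path and script name are placeholders.
import runpy

import pendulum
from airflow import DAG
from airflow.operators.python import PythonOperator


def _run_mounted_script():
    # behaves like `python my_script.py`, but inside the worker process
    runpy.run_path("/home/airflow/gcs/data/my_script.py", run_name="__main__")


with DAG(
    dag_id="run_gcsfuse_script",
    start_date=pendulum.datetime(2022, 1, 1, tz="UTC"),
    schedule_interval=None,
) as dag:
    PythonOperator(task_id="run_script", python_callable=_run_mounted_script)
```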