I was trying to use the ExternalTaskSensor in Airflow 1.10.11 to coordinate some DAGs. I have developed this code to test the functionality:
import time
from datetime import datetime, timedelta
from pprint import pprint

from airflow import DAG
from airflow.operators.dagrun_operator import TriggerDagRunOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor
from airflow.utils.state import State

sensors_dag = DAG(
    "test_launch_sensors",
    schedule_interval=None,
    start_date=datetime(2020, 2, 14, 0, 0, 0),
    dagrun_timeout=timedelta(minutes=150),
    tags=["DEMO"],
)

dummy_dag = DAG(
    "test_dummy_dag",
    schedule_interval=None,
    start_date=datetime(2020, 2, 14, 0, 0, 0),
    dagrun_timeout=timedelta(minutes=150),
    tags=["DEMO"],
)

def print_context(ds, **context):
    pprint(context['conf'])

with dummy_dag:
    starts = DummyOperator(task_id="starts", dag=dummy_dag)
    empty = PythonOperator(
        task_id="empty",
        provide_context=True,
        python_callable=print_context,
        dag=dummy_dag,
    )
    ends = DummyOperator(task_id="ends", dag=dummy_dag)

    starts >> empty >> ends

with sensors_dag:
    trigger = TriggerDagRunOperator(
        task_id=f"trigger_{dummy_dag.dag_id}",
        trigger_dag_id=dummy_dag.dag_id,
        conf={"key": "value"},
        execution_date="{{ execution_date }}",
    )
    sensor = ExternalTaskSensor(
        task_id="wait_for_dag",
        external_dag_id=dummy_dag.dag_id,
        external_task_id="ends",
        failed_states=["failed", "upstream_failed"],
        poke_interval=5,
        timeout=120,
    )
    trigger >> sensor
The idea is that one DAG triggers another with a TriggerDagRunOperator. This sets the execution_date to the same value in both DAGs. This works perfectly when the last task of dummy_dag, ends, finishes with the state success.
However, if I force the intermediate task to fail like so:
def print_context(ds, **context):
    pprint(context['conf'])
    raise Exception('ouch')
The sensor doesn't detect the failed or upstream_failed states, and it keeps running until it times out. I was using the failed_states parameter to indicate which states should be considered failures, but it seems not to be working.
Am I doing something wrong?
Answer
failed_states was added in Airflow 2.0; you'd set it to ["failed"] to configure the sensor to fail its own DAG run if the monitored DAG run failed. If given a task ID, it monitors the task state, otherwise it monitors the DAG run state.
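For illustration, a minimal Airflow 2.x sketch of the sensor from the question using failed_states (the DAG and task IDs are the ones from the example above):

# Airflow 2.x only: failed_states is honoured, so the sensor fails its own
# DAG run as soon as the monitored task reaches one of these states.
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_dag = ExternalTaskSensor(
    task_id="wait_for_dag",
    external_dag_id="test_dummy_dag",
    external_task_id="ends",  # monitor this task's state
    allowed_states=["success"],
    failed_states=["failed", "upstream_failed"],
    poke_interval=5,
    timeout=120,
)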
In Airflow 1.x, unfortunately, the ExternalTaskSensor operator only compares the DAG run or task state against allowed_states; as soon as the monitored DAG run or task reaches one of the allowed states, the sensor stops, and is then always marked as successful. By default, the sensor only looks for the SUCCESS state, so without a timeout it will just keep on poking forever if the monitored DAG run has failed. If you put failed in the allowed_states list, it will still only ever mark itself as successful.
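To make that pitfall concrete, here is a 1.10 sketch: listing failure states alongside success only makes the sensor stop poking sooner, and the sensor task itself still ends up successful either way:

# Airflow 1.10 sketch: this does NOT propagate the failure; the sensor merely
# stops poking once the "ends" task is in any of the listed states, and the
# sensor task is then marked successful regardless.
from airflow.sensors.external_task_sensor import ExternalTaskSensor

wait_for_dag = ExternalTaskSensor(
    task_id="wait_for_dag",
    external_dag_id="test_dummy_dag",
    external_task_id="ends",
    allowed_states=["success", "failed", "upstream_failed"],
    poke_interval=5,
    timeout=120,
)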
While you could use a timeout, like you I needed the sensor to fail its own DAG run if the external DAG run failed, as if the dependencies for the next task had not been met. This requires writing your own sensor, unfortunately.
Here is my implementation; it is a simplified version of the ExternalTaskSensor() class, adapted to my simpler needs (no need to check for a specific task id or for anything other than the same execution date):
from airflow.exceptions import AirflowFailException
from airflow.models import DagRun
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.db import provide_session
from airflow.utils.decorators import apply_defaults
from airflow.utils.state import State


class ExternalDagrunSensor(BaseSensorOperator):
    """
    Waits for a different DAG to complete; if the dagrun has failed, this
    task fails itself as well.

    :param external_dag_id: The dag_id that contains the task you want to wait for
    :type external_dag_id: str
    """

    template_fields = ["external_dag_id"]
    ui_color = "#19647e"

    @apply_defaults
    def __init__(self, external_dag_id, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.external_dag_id = external_dag_id

    @provide_session
    def poke(self, context, session=None):
        dag_id, execution_date = self.external_dag_id, context["execution_date"]
        self.log.info("Poking for %s on %s ... ", dag_id, execution_date)
        state = (
            session.query(DagRun.state)
            .filter(
                DagRun.dag_id == dag_id,
                DagRun.execution_date == execution_date,
                DagRun.state.in_((State.SUCCESS, State.FAILED)),
            )
            .scalar()
        )
        if state == State.FAILED:
            raise AirflowFailException(
                f"The external DAG run {dag_id} {execution_date} has failed"
            )
        return state is not None
The base sensor implementation calls the poke() method repeatedly until it returns True (or the optional timeout is reached), and raising AirflowFailException sets the task state to failed immediately, with no retries. It is then up to the configuration of the downstream tasks whether they will be scheduled to run.
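For completeness, a sketch of how this custom sensor could replace the ExternalTaskSensor in the triggering DAG from the question; it assumes sensors_dag and dummy_dag are defined as above, and that the class was saved somewhere importable (the module name external_dagrun_sensor below is hypothetical):

from airflow.operators.dagrun_operator import TriggerDagRunOperator

# hypothetical local module containing the ExternalDagrunSensor class above
from external_dagrun_sensor import ExternalDagrunSensor

with sensors_dag:
    trigger = TriggerDagRunOperator(
        task_id=f"trigger_{dummy_dag.dag_id}",
        trigger_dag_id=dummy_dag.dag_id,
        conf={"key": "value"},
        execution_date="{{ execution_date }}",
    )
    # waits for the whole dummy_dag run and fails this DAG run if that run failed
    wait = ExternalDagrunSensor(
        task_id="wait_for_dag",
        external_dag_id=dummy_dag.dag_id,
        poke_interval=5,
        timeout=120,
    )
    trigger >> wait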