Skip to content
Advertisement

Use existing celery workers for Airflow’s Celeryexecutor workers

I am trying to introduce dynamic workflows into my landscape that involves multiple steps of different model inference where the output from one model gets fed into another model.Currently we have few Celery workers spread across hosts to manage the inference chain. As the complexity increase, we are attempting to build workflows on the fly. For that purpose, I got a dynamic DAG setup with Celeryexecutor working. Now, is there a way I can retain the current Celery setup and route airflow driven tasks to the same workers? I do understand that the setup in these workers should have access to the DAG folders and environment same as the airflow server. I want to know how the celery worker need to be started in these servers so that airflow can route the same tasks that used to be done by the manual workflow from a python application. If I start the workers using command “airflow celery worker”, I cannot access my application tasks. If I start celery the way it is currently ie “celery -A proj”, airflow has nothing to do with it. Looking for ideas to make it work.

Advertisement

Answer

Thanks @DejanLekic. I got it working (though the DAG task scheduling latency was too much that I dropped the approach). If someone is looking to see how this was accomplished, here are few things I did to get it working.

  1. Change the airflow.cfg to change the executor,queue and result back-end settings (Obvious)
  2. If we have to use Celery worker spawned outside the airflow umbrella, change the celery_app_name setting to celery.execute instead of airflow.executors.celery_execute and change the Executor to “LocalExecutor”. I have not tested this, but it may even be possible to avoid switching to celery executor by registering airflow’s Task in the project’s celery App.
  3. Each task will now call send_task(), the AsynResult object returned is then stored in either Xcom(implicitly or explicitly) or in Redis(implicitly push to the queue) and the child task will then gather the Asyncresult ( it will be an implicit call to get the value from Xcom or Redis) and then call .get() to obtain the result from the previous step.

Note: It is not necessary to split the send_task() and .get() between two tasks of the DAG. By splitting them between parent and child, I was trying to take advantage of the lag between tasks. But in my case, the celery execution of tasks completed faster than airflow’s inherent latency in scheduling dependent tasks.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement