I have a setup where Airflow is running in Kubernetes (EKS) and a remote worker is running in docker-compose in a VM behind a firewall in a different location.
Problem
The Airflow webserver in EKS is getting a 403 Forbidden error when trying to fetch logs from the remote worker.
Build Version
- Airflow – 2.2.2
- OS – Linux – Ubuntu 20.04 LTS
- Kubernetes – 1.22 (EKS)
- Redis (Celery Broker) – Service Port exposed on 6379
- PostgreSQL (Celery Backend) – Service Port exposed on 5432
Airflow ENV config setup
AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
AIRFLOW__CELERY__BROKER_URL: redis://<username>:<password>@redis-master.airflow-dev.svc.cluster.local:6379/0
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
AIRFLOW__CLI__ENDPOINT_URL: http://{hostname}:8080
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__FERNET_KEY: <fernet_key>
AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.getfqdn
AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://<username>:<password>@db-postgresql.airflow-dev.svc.cluster.local/<db>
AIRFLOW__LOGGING__BASE_LOG_FOLDER: /opt/airflow/logs
AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT: '8793'
AIRFLOW__WEBSERVER__BASE_URL: http://{hostname}:8080
AIRFLOW__WEBSERVER__SECRET_KEY: <secret_key>
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_PASSWORD: <password-webserver>
_AIRFLOW_WWW_USER_USERNAME: <username-webserver>
Airflow is using CeleryExecutor
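For context, this is roughly the URL I expect the webserver to build for remote worker logs with the config above (hostname from AIRFLOW__CORE__HOSTNAME_CALLABLE, port from AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT). This is only my own sketch; the real relative path comes from Airflow's log filename template, the one below is just a placeholder.
# Rough sketch (not Airflow source) of the log URL the webserver requests,
# built from the env config above. Run on the worker to see the hostname it
# registers via socket.getfqdn.
import socket

worker_hostname = socket.getfqdn()       # AIRFLOW__CORE__HOSTNAME_CALLABLE: socket.getfqdn
worker_log_port = 8793                   # AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT
log_relative_path = "<dag_id>/<task_id>/<execution_date>/<try_number>.log"  # placeholder

print(f"http://{worker_hostname}:{worker_log_port}/log/{log_relative_path}")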
Setup Test
- Network reachability by ping – OK
- Celery broker reachability from both EKS and the remote worker – OK
- Celery backend reachability from both EKS and the remote worker – OK
- Firewall port exposed for the remote worker's Gunicorn API – OK
- curl -v telnet://:8793 test – OK (Connected)
- Airflow Flower recognizes both the Kubernetes worker and the remote worker – OK
- All the ENV values on the webserver, workers (EKS, remote) and scheduler are identical
- A queue is set up so the DAG runs only on that particular worker
- Time on the docker container, the VM and EKS is set to UTC. There is a slight 5 to 8 second difference between the docker container and the pod in EKS (a small offset-check sketch follows this list)
- Ran a webserver on the remote VM as well, which can pick up and show the logs
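To measure the drift mentioned above, a quick offset check against an NTP server can be run on each host (EKS pod, VM, docker container) and compared. This assumes the third-party ntplib package is installed and pool.ntp.org is reachable from behind the firewall, which may not hold in every setup.
# Quick clock-offset check; run on each host and compare the results.
# Assumes "pip install ntplib" and outbound NTP (UDP 123) access.
import ntplib

offset = ntplib.NTPClient().request("pool.ntp.org", version=3).offset
print(f"local clock offset vs NTP: {offset:+.3f} seconds")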
Description
Airflow is able to execute the DAG on the remote worker, and the logs can be seen on the remote worker. I have tried all combinations of settings but still keep getting 403.
Another test that was done was a plain curl with the webserver auth credentials.
This curl was run both from EKS and from the remote server which hosts docker-compose. The results are the same on all the servers.
curl --user <username-webserver> -vvv http://<remote-worker>:8793/logs/?<rest-of-the-log-url>
Getting 403 Forbidden
I might have misconfigured it, but I doubt that is the case. Any tips on what I am missing here? Many thanks in advance.
Answer
https://github.com/apache/airflow/discussions/26624#discussioncomment-3715688
Following the above discussion with the Airflow community on GitHub, I synced the servers using NTP; EKS and the remote worker had a 135-second time drift.
Later I worked on the auth.
I rebuilt the curl auth from this file (on the 2.2 branch): https://github.com/apache/airflow/blob/main/airflow/utils/log/file_task_handler.py
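What that file shows is that the webserver does not use basic auth against the worker's log server at all; it signs the log's relative path with the shared webserver secret key and a short expiry, and the worker verifies that token. Below is my hedged reconstruction of that request (Airflow 2.2 pins an itsdangerous version that still ships TimedJSONWebSignatureSerializer; the 30-second grace window and the /log/ route are my reading of that branch, so check against your version). With a mismatched secret key, or clock drift larger than the grace window, this is exactly the request that comes back 403.
# Hedged sketch, not the exact Airflow source: reproduce the signed log request
# the webserver makes, to test the worker's log server directly.
import httpx
from itsdangerous import TimedJSONWebSignatureSerializer

SECRET_KEY = "<secret_key>"                  # must match AIRFLOW__WEBSERVER__SECRET_KEY on every component
log_relative_path = "<rest-of-the-log-url>"  # placeholder, same as in the curl above

signer = TimedJSONWebSignatureSerializer(
    secret_key=SECRET_KEY,
    algorithm_name="HS512",
    expires_in=30,  # assumed grace window; drift beyond this makes the worker reject the token
)

response = httpx.get(
    f"http://<remote-worker>:8793/log/{log_relative_path}",
    headers={"Authorization": signer.dumps(log_relative_path)},
    timeout=30.0,
)
print(response.status_code, response.text[:200])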
I later realized that the auth doesn't like special characters in the secret key, and on top of that the NTP time drift of 135 seconds (2 min 15 s) also factored in, causing confusion.
I would recommend that people who face this problem avoid special characters in the secret key. This is just one Airflow user's recommendation; I wouldn't say it is the only solution, but it is what helped me.
The special characters combined with the NTP drift made debugging the issue confusing; resolving NTP should be the first step, then the auth.
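If you need a secret key without special characters, hex output is a simple way to get one (only 0-9a-f). This is just how I would generate it, not an official Airflow requirement.
# Generate a secret key that avoids special characters entirely.
import secrets

print(secrets.token_hex(32))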