I am trying to containerize my Airflow setup. I've been tasked with keeping the environment the same, just moving it into a Docker container. We currently have Airflow and all of our dependencies installed in an Anaconda environment, so I've created a custom Docker image that installs Anaconda and creates my environment. The problem is that our current setup uses systemd services to start Airflow, while Docker needs to run it via the CLI ("airflow webserver/scheduler/worker"), and when I run it that way I get an error. The error appears after I start up the scheduler.
Our DAGs require a custom repo that handles communication with our database servers. Within that repo we use pathlib to get the path of a config file and pass it to configparser.
Basically like this:
import configparser
from pathlib import Path

config = configparser.ConfigParser()
p = Path(__file__)
p = p.parent
config_file_name = 'comms.conf'
config.read(p.joinpath('config', config_file_name))
This is throwing the following error for all my DAGs in Airflow:
Broken DAG: [/opt/airflow/dags/example_folder/example_dag.py] 'PosixPath' object is not iterable
On the command line the error is:
[2021-01-11 19:53:13,868] {dagbag.py:259} ERROR - Failed to import: /opt/airflow/dags/example_folder/example_dag.py
Traceback (most recent call last):
  File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/airflow/models/dagbag.py", line 256, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/anaconda3/envs/airflow/lib/python3.7/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/example_folder/example_dag.py", line 8, in <module>
    dag = Dag()
  File "/opt/airflow/dags/util/dag_base.py", line 27, in __init__
    self.comms = get_comms(Variable.get('environment'))
  File "/opt/airflow/repository/repo_folder/custom_script.py", line 56, in get_comms
    config = get_config('comms.conf')
  File "/opt/airflow/repository/repo_folder/custom_script.py", line 39, in get_config
    config.read(p.joinpath('config', config_file_name))
  File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/backports/configparser/__init__.py", line 702, in read
    for filename in filenames:
TypeError: 'PosixPath' object is not iterable
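For what it's worth, the failure is easy to reproduce without Airflow at all. The last frame of the traceback shows the old configparser backport iterating the filenames argument directly (for filename in filenames:), and a bare PosixPath is not iterable. A minimal sketch (the path is just for illustration):

from pathlib import Path

p = Path('/opt/airflow/repository/repo_folder/config/comms.conf')

# A str is iterable (character by character) and a list is iterable,
# but a bare Path object is not, so the backport's read() blows up:
iter('/some/string/path')  # fine
iter([p])                  # fine
iter(p)                    # TypeError: 'PosixPath' object is not iterable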
I was able to replicate this behavior outside of the Docker container, so I don't think Docker has anything to do with it. It has to be a difference between how Airflow runs as a systemd service and how it runs via the CLI?
Here is my Airflow service file that works:
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/anaconda3/envs/airflow/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target
Here is the Airflow environment file that I'm using within the service file. Note that I needed to export these environment variables locally to get Airflow to run up to this point in the CLI. Also note that the custom repos live in the /opt/airflow directory.
AIRFLOW_CONFIG=/opt/airflow/airflow.cfg
AIRFLOW_HOME=/opt/airflow
PATH=/bin:/opt/anaconda3/envs/airflow/bin:/opt/airflow/etl:/opt/airflow:$PATH
PYTHONPATH=/opt/airflow/etl:/opt/airflow:$PYTHONPATH
My Airflow config is default, other than the following changes:
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@192.168.x.x:5432/airflow
load_examples = False
logging_level = WARN
broker_url = amqp://guest:guest@127.0.0.1:5672/
result_backend = db+postgresql://airflow:airflow@192.168.x.x:5432/airflow
catchup_by_default = False
The configparser backport installed in the environment is:

configparser==3.5.3
My conda environment is using Python 3.7, and the Airflow version is 1.10.14. It's running on a CentOS 7 server. If anyone has any ideas that could help, I would appreciate it!
Edit: If I change the line

config.read(p.joinpath('config', config_file_name))

to point directly to the config file like this:

config.read('/opt/airflow/repository/repo_folder/config/comms.conf')

it works fine. So it has something to do with how configparser handles the pathlib output? But configparser doesn't have a problem with this when Airflow is run via the systemd service?
Edit 2: I can also wrap the pathlib object in str() and it works:

config.read(str(p.joinpath('config', config_file_name)))
I just want to know why this works fine with the systemd service. I'm afraid other things are going to be broken too.
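A quick way to test whether the two launch methods are even importing the same configparser (just a diagnostic sketch; the interpretation in the comments is my assumption) is to print the resolved module path from a shell or DAG in each environment:

import configparser

# The stdlib module lives under .../lib/python3.7/configparser.py and has
# accepted path-like objects since Python 3.6.1; the PyPI backport resolves
# to site-packages and, at 3.5.3, does not. If systemd and the CLI print
# different paths here, PYTHONPATH differences between the two environments
# would explain the inconsistent behavior.
print(configparser.__file__)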
Answer
I was able to fix this issue by uninstalling the old configparser backport and installing a newer version:
configparser==5.0.1
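With the newer backport installed, the original Path-based call works unchanged, since its read() accepts path-like objects the same way the stdlib has since Python 3.6.1:

import configparser
from pathlib import Path

config = configparser.ConfigParser()
p = Path(__file__).parent
config.read(p.joinpath('config', 'comms.conf'))  # now accepted as a path-like object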