
Airflow issue with pathlib / configparser – ‘PosixPath’ object is not iterable

I am trying to containerize my Airflow setup. I've been tasked with keeping the environment the same and just moving it into a Docker container. We currently have Airflow and all our dependencies installed within an Anaconda environment, so I've created a custom Docker image that installs Anaconda and creates my environment. The problem is that our current environment uses systemd services to start Airflow, while Docker needs it to run via the airflow CLI ("airflow webserver/scheduler/worker"), and when I run it that way I get an error. The error appears after I start the scheduler.

Our DAGs require a custom repo that helps communicate to our database servers. Within that repo we are using pathlib to get the path of a config file and pass it to configparser.

Basically like this:

import configparser
from pathlib import Path

config = configparser.ConfigParser()
p = Path(__file__)
p = p.parent
config_file_name = 'comms.conf'
config.read(p.joinpath('config', config_file_name))

This throws the following error for all my DAGs in Airflow:

Broken DAG: [/opt/airflow/dags/example_folder/example_dag.py] 'PosixPath' object is not iterable

On the command line the error is:

[2021-01-11 19:53:13,868] {dagbag.py:259} ERROR - Failed to import: /opt/airflow/dags/example_folder/example_dag.py
Traceback (most recent call last):
  File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/airflow/models/dagbag.py", line 256, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/anaconda3/envs/airflow/lib/python3.7/imp.py", line 172, in load_source
    module = _load(spec)
  File "<frozen importlib._bootstrap>", line 696, in _load
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/airflow/example_folder/example_dag.py", line 8, in <module>
    dag = Dag()
  File "/opt/airflow/dags/util/dag_base.py", line 27, in __init__
    self.comms = get_comms(Variable.get('environment'))
  File "/opt/airflow/repository/repo_folder/custom_script.py", line 56, in get_comms
    config = get_config('comms.conf')
  File "/opt/airflow/repository/repo_folder/custom_script.py", line 39, in get_config
    config.read(p.joinpath('config', config_file_name))
  File "/opt/anaconda3/envs/airflow/lib/python3.7/site-packages/backports/configparser/__init__.py", line 702, in read
    for filename in filenames:
TypeError: 'PosixPath' object is not iterable

I was able to replicate this behavior outside of the Docker container, so I don't think Docker has anything to do with it. It has to be a difference between how Airflow runs as a systemd service and how it runs via the CLI?

Here is my airflow service file that works:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/opt/anaconda3/envs/airflow/bin/airflow webserver
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Here is the airflow environment file that I'm using within the service file. Note that I needed to export these environment variables locally to get Airflow to run up to this point in the CLI. Also note that the custom repos live in the /opt/airflow directory.

AIRFLOW_CONFIG=/opt/airflow/airflow.cfg
AIRFLOW_HOME=/opt/airflow
PATH=/bin:/opt/anaconda3/envs/airflow/bin:/opt/airflow/etl:/opt/airflow:$PATH
PYTHONPATH=/opt/airflow/etl:/opt/airflow:$PYTHONPATH

My Airflow config is default, other than the following changes:

executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@192.168.x.x:5432/airflow
load_examples = False
logging_level = WARN
broker_url = amqp://guest:guest@127.0.0.1:5672/
result_backend = db+postgresql://airflow:airflow@192.168.x.x:5432/airflow
catchup_by_default = False

The configparser backport installed in the environment is:

configparser==3.5.3

My conda environment is using Python 3.7 and the Airflow version is 1.10.14. It's running on a CentOS 7 server. If anyone has any ideas that could help, I would appreciate it!

Edit: If I change the line config.read(p.joinpath('config', config_file_name)) to point directly to the config file, like config.read('/opt/airflow/repository/repo_folder/config/comms.conf'), it works fine. So it has something to do with how configparser handles the pathlib output? But it doesn't have a problem with this when Airflow is run via the systemd service?

Edit2: I can also wrap the pathlib object in str() and it works: config.read(str(p.joinpath('config', config_file_name))). I just want to know why this works fine with the systemd service. I'm afraid other stuff is going to be broken?
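The traceback hints at why str() helps: the error comes from the backports/configparser package rather than the standard library, and older versions of that backport only special-case plain strings in read(), iterating over anything else as a list of filenames. A simplified sketch of that dispatch logic (read_like_old_backport is an illustrative name, not the backport's real code):

```python
from pathlib import Path

def read_like_old_backport(filenames):
    """Sketch of how an old configparser backport's read() dispatches:
    only a plain str is wrapped in a list; everything else is assumed
    to already be an iterable of filenames."""
    if isinstance(filenames, str):
        filenames = [filenames]
    seen = []
    for filename in filenames:  # a PosixPath is not iterable -> TypeError
        seen.append(str(filename))
    return seen

print(read_like_old_backport('comms.conf'))  # a single str is fine

try:
    read_like_old_backport(Path('config') / 'comms.conf')
except TypeError as exc:
    print(exc)  # 'PosixPath' object is not iterable
```

Under this theory, the systemd service worked because its PYTHONPATH resolved to the Python 3.7 stdlib configparser (which accepts path-like objects), while the CLI run picked up the old backport.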


Answer

I was able to fix this issue by uninstalling configparser and installing a newer version:

configparser==5.0.1

User contributions licensed under: CC BY-SA