
Why does my Python file directory structure not match my Dockerfile output?

My Dockerfile (run by a much larger docker-compose):

# set base image (host OS)
FROM python:3.7

ARG scorer

# set the working directory in the container
WORKDIR /code

# Download and install kenlm and stt for generating scorer files
RUN git clone https://github.com/kpu/kenlm.git --depth=1
RUN git clone https://github.com/coqui-ai/stt --depth=1
RUN apt-get update && apt-get install -y build-essential libboost-all-dev cmake libeigen3-dev
RUN mkdir /code/kenlm/build
RUN cd /code/kenlm/build && cmake .. && make -j 4

# copy the dependencies file to the working directory
COPY requirements.txt .

# copy libraries (asr-common)
COPY lib ./lib

# install dependencies
RUN pip install -r requirements.txt

# copy the content of the local src directory to the working directory
COPY src_code_folder ./src_code_folder
RUN mkdir -p /code/models
RUN ls -lF models
# empty directory as expected
COPY models/scorers/$scorer /code/models/${scorer}
RUN ls -lF models
# output (as expected):
# some.scorer*
RUN curl -o /code/models/model.tflite -L https://coqui.gateway.scarf.sh/english/coqui/v1.0.0-large-vocab/model.tflite
RUN ls -lF
# output (as expected):
# src_code_folder/
# kenlm/
# lib/
# models/
# requirements.txt*
# stt/

RUN ls -F models
# output (as expected):
# some.scorer*
# model.tflite*

# command to run on container start
CMD [ "python", "-m", "src_code_folder" ]

and the relevant code from the docker-compose.yml:

  coqui-asr:
    build:
      context: microservices/coqui-asr
      args:
        scorer: some.scorer
    container_name: coqui-asr
    restart: always
    environment:
      - MQTT_ENDPOINT
    depends_on:
      - broker
    volumes:
      - ./microservices/coqui-asr/models:/code/models

The Python code I’m using to check the directory structure:

pbmms = glob.glob(os.path.join(args.models_dir, "*.tflite"))
scorers = glob.glob(os.path.join(args.models_dir, "*.scorer"))
logger.debug(f"Input: {args.models_dir}")

logger.debug(f"tflite file: {pbmms}")
logger.debug(f"scorer file: {scorers}")
logger.debug(f"this directory: {os.path.dirname(os.path.realpath(__file__))}")
logger.debug(f"current working directory: {os.getcwd()}")
for (dirpath, dirnames, filenames) in os.walk(os.getcwd()):
    if 'code/stt' in dirpath or 'code/kenlm' in dirpath:
        # these cloned repos have a LOT of folders we don't need to see
        continue
    logger.debug(f"Path: {dirpath}")
    logger.debug(f"\tDirectory: {dirnames}")
for (dirpath, dirnames, filenames) in os.walk("models/"):
    logger.debug(f"Path: {dirpath}")
    logger.debug(f"\tDirectory: {dirnames}")
    logger.debug(f"\tFile: {filenames}")
assert len(pbmms) == 1  # passes
assert len(scorers) == 1  # fails

and its output:

DEBUG Input: models
DEBUG tflite file: ['models/model.tflite']
DEBUG scorer file: []
DEBUG this directory: /code/src_code_folder
DEBUG current working directory: /code
DEBUG Path: /code
DEBUG      Directory: ['src_code_folder', 'models', 'lib', 'kenlm', 'stt']
...
irrelevant output of all the other folders
...
DEBUG Path: models/
DEBUG      Directory: ['scorers']          <----- WHY IS THIS HERE
DEBUG      File: ['model.tflite']
DEBUG Path: models/scorers                 <----- WHY DOES THIS APPEAR
DEBUG      Directory: []
DEBUG      File: ['some.scorer', 'other.scorer']

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/code/src_code_folder/__main__.py", line 71, in <module>
    main(args)
  File "/code/src_code_folder/__main__.py", line 33, in main
    assert len(scorers) == 1
AssertionError

I don’t understand why the directory structure shown during the Dockerfile build is completely different from the one my Python code sees. The Python code runs inside the Docker container, and only specific files and folders are copied into the image, yet the file system Python reports seems to match my host system’s file structure:
[Image: host system file structure]

Clearly some stuff is getting copied but not at all what I would expect based on my Dockerfile commands and the output from said Dockerfile.


Please let me know if I need to add more information.


Answer

Your Compose file specifies:

volumes:
  - ./microservices/coqui-asr/models:/code/models

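This bind mount tells Compose to mount the host directory ./microservices/coqui-asr/models over /code/models when the container starts. Everything the image has at /code/models, including whatever your Dockerfile put there during the build, is hidden and replaced by the contents of the host directory. The RUN ls commands execute at build time, before any volume is mounted, so they show the image’s contents; your Python code runs in the started container, after the mount, so it sees the host’s models directory with its scorers/ subdirectory.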

Your image already contains the models, though, and the build has already laid them out the way your code expects (the scorer is copied directly into /code/models). You should delete this volumes: block so that the container sees the original contents of the image.
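
If you want to confirm from inside the container that a mount is what hides the image’s files, one option is to inspect /proc/self/mountinfo. The following is only a minimal sketch, assuming a Linux container; the helper names mount_points and mounts_at_or_below are made up for illustration and are not part of your code or of any library.

```python
import os


def mount_points(mountinfo_text):
    """Return the mount point (fifth field) from each line of mountinfo text."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Fields: mount ID, parent ID, major:minor, root, mount point, ...
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def mounts_at_or_below(path, mountinfo_text):
    """Return every mount point that is `path` itself or lies underneath it."""
    path = os.path.normpath(path)
    prefix = path.rstrip("/") + "/"
    return [
        point
        for point in mount_points(mountinfo_text)
        if point == path or point.startswith(prefix)
    ]
```

Inside the running container you could read the file with open("/proc/self/mountinfo").read() and pass the text to mounts_at_or_below("/code/models", ...). A non-empty result means something is mounted over that directory, so what Python sees there is not what the image build produced.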

User contributions licensed under: CC BY-SA