My Dockerfile (run by a much larger docker-compose):
# set base image (host OS)
FROM python:3.7
ARG scorer
# set the working directory in the container
WORKDIR /code
# Download and install kenlm and stt for generating scorer files
RUN git clone https://github.com/kpu/kenlm.git --depth=1
RUN git clone https://github.com/coqui-ai/stt --depth=1
RUN apt-get update && apt-get install -y build-essential libboost-all-dev cmake libeigen3-dev
RUN mkdir /code/kenlm/build
RUN cd /code/kenlm/build && cmake .. && make -j 4
# copy the dependencies file to the working directory
COPY requirements.txt .
# copy libraries (asr-common)
COPY lib ./lib
# install dependencies
RUN pip install -r requirements.txt
# copy the content of the local src directory to the working directory
COPY src_code_folder ./src_code_folder
RUN mkdir -p /code/models
RUN ls -lF models
# empty directory as expected
COPY models/scorers/$scorer /code/models/${scorer}
RUN ls -lF models
# output (as expected):
# some.scorer*
RUN curl -o /code/models/model.tflite -L https://coqui.gateway.scarf.sh/english/coqui/v1.0.0-large-vocab/model.tflite
RUN ls -lF
# output (as expected):
# src_code_folder/
# kenlm/
# lib/
# models/
# requirements.txt*
# stt/
RUN ls -F models
# output (as expected):
# some.scorer*
# model.tflite*
# command to run on container start
CMD [ "python", "-m", "src_code_folder" ]
and the relevant code from the docker-compose.yml:
  coqui-asr:
    build:
      context: microservices/coqui-asr
      args:
        scorer: some.scorer
    container_name: coqui-asr
    restart: always
    environment:
      - MQTT_ENDPOINT
    depends_on:
      - broker
    volumes:
      - ./microservices/coqui-asr/models:/code/models
The Python code I’m using to check the directory structure:
pbmms = glob.glob(os.path.join(args.models_dir, "*.tflite"))
scorers = glob.glob(os.path.join(args.models_dir, "*.scorer"))
logger.debug(f"Input: {args.models_dir}")
logger.debug(f"tflite file: {pbmms}")
logger.debug(f"scorer file: {scorers}")
logger.debug(f"this directory: {os.path.dirname(os.path.realpath(__file__))}")
logger.debug(f"current working directory: {os.getcwd()}")
for (dirpath, dirnames, filenames) in os.walk(os.getcwd()):
    if 'code/stt' in dirpath or 'code/kenlm' in dirpath:
        # these cloned repos have a LOT of folders we don't need to see
        continue
    logger.debug(f"Path: {dirpath}")
    logger.debug(f"\tDirectory: {dirnames}")
for (dirpath, dirnames, filenames) in os.walk("models/"):
    logger.debug(f"Path: {dirpath}")
    logger.debug(f"\tDirectory: {dirnames}")
    logger.debug(f"\tFile: {filenames}")
assert len(pbmms) == 1  # passes
assert len(scorers) == 1  # fails
and its output:
DEBUG Input: models
DEBUG tflite file: ['models/model.tflite']
DEBUG scorer file: []
DEBUG this directory: /code/src_code_folder
DEBUG current working directory: /code
DEBUG Path: /code
DEBUG Directory: ['src_code_folder', 'models', 'lib', 'kenlm', 'stt']
irrelevant output of all the other folders
DEBUG Path: models/
DEBUG Directory: ['scorers'] <----- WHY IS THIS HERE
DEBUG File: ['model.tflite']
DEBUG Path: models/scorers <----- WHY DOES THIS APPEAR
DEBUG Directory: []
DEBUG File: ['some.scorer', 'other.scorer']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/code/src_code_folder/__main__.py", line 71, in <module>
    main(args)
  File "/code/src_code_folder/__main__.py", line 33, in main
    assert len(scorers) == 1
AssertionError
I don’t understand why the directory structure shown during the Docker build is completely different from the one my Python code sees. Even though the Python code runs inside the Docker container and only specific files and folders are copied into the image, the file system Python reports seems to match my host system’s file structure.
Clearly some files are getting copied, but not at all what I would expect based on my Dockerfile commands and their output.
Please let me know if I need to add more information.
Answer
Your Compose file specifies:
    volumes:
      - ./microservices/coqui-asr/models:/code/models
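This means that the /code/models directory in the image, along with everything the Dockerfile put there, is hidden and replaced by the named host directory when the container starts. The RUN ls steps in the Dockerfile execute at build time, before any volume is mounted, which is why they show the image’s contents while your Python code sees the host’s.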
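To see why the assertion fails once the host directory is mounted, here is a minimal sketch that recreates both layouts in a temporary directory and runs the same two non-recursive glob calls as the question. The helper names (find_models, make_host_layout, make_image_layout) are made up for illustration, and the placeholder files are empty; the layouts are taken from the debug output and the Dockerfile's ls output above.

```python
import glob
import os
import tempfile


def find_models(models_dir):
    """Run the same two non-recursive glob calls as the question's code."""
    tflite = glob.glob(os.path.join(models_dir, "*.tflite"))
    scorers = glob.glob(os.path.join(models_dir, "*.scorer"))
    return tflite, scorers


def touch(path):
    """Create an empty placeholder file."""
    open(path, "w").close()


def make_host_layout(root):
    """Host layout from the debug output: scorers sit in a subdirectory."""
    models = os.path.join(root, "models")
    os.makedirs(os.path.join(models, "scorers"))
    touch(os.path.join(models, "model.tflite"))
    touch(os.path.join(models, "scorers", "some.scorer"))
    touch(os.path.join(models, "scorers", "other.scorer"))
    return models


def make_image_layout(root):
    """Image layout from the Dockerfile's `ls` output: a flat directory."""
    models = os.path.join(root, "models")
    os.makedirs(models)
    touch(os.path.join(models, "model.tflite"))
    touch(os.path.join(models, "some.scorer"))
    return models


with tempfile.TemporaryDirectory() as tmp:
    host = make_host_layout(os.path.join(tmp, "host"))
    image = make_image_layout(os.path.join(tmp, "image"))
    # Count [tflite files, scorer files] found in each layout.
    host_result = [len(found) for found in find_models(host)]
    image_result = [len(found) for found in find_models(image)]

print("host layout: ", host_result)   # [1, 0] -> the scorer assertion fails
print("image layout:", image_result)  # [1, 1] -> both assertions pass
```

The "*.scorer" pattern only matches files directly inside the models directory, so the scorers under models/scorers on the host are never found.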
Your image already contains the models, though, and it’s done some additional pre-processing on them. You should delete this volumes: block so that you see the original contents of the image.
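If you want to confirm from inside the container that nothing is mounted over /code/models any more, one option on Linux is to look the path up in the mount table. This is only a sketch: the function names are hypothetical, the sample table is invented for illustration, and it assumes the mount point path contains no spaces (the kernel escapes those in /proc/self/mounts).

```python
import os


def mount_points(mounts_text):
    """Collect the mount points from /proc/self/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each entry reads: device, mount point, filesystem type, options, ...
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def is_mount_point(path, mounts_text):
    """Return True if `path` itself is listed as a mount point."""
    return os.path.normpath(path) in mount_points(mounts_text)


def read_mounts(table="/proc/self/mounts"):
    """Read the real mount table (Linux only); not called in this sketch."""
    with open(table) as f:
        return f.read()


# Invented sample tables, only to show the behaviour.
with_volume_table = (
    "overlay / overlay rw,relatime 0 0\n"
    "/dev/sda1 /code/models ext4 rw,relatime 0 0\n"
)
without_volume_table = "overlay / overlay rw,relatime 0 0\n"

with_volume = is_mount_point("/code/models", with_volume_table)
without_volume = is_mount_point("/code/models", without_volume_table)
print(with_volume, without_volume)  # True False
```

Inside the running container you would call is_mount_point("/code/models", read_mounts()); a True result suggests the host directory is still being mounted over the image's copy.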