I can’t find the proper way to add dependencies to my Azure Container Instance for ML Inference.
I basically started by following this tutorial: Train and deploy an image classification model with an example Jupyter Notebook.
It works fine.
Now I want to deploy my trained TensorFlow model for inference. I tried many ways, but I was never able to add Python dependencies to the Environment.
From the TensorFlow curated environment
Using AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference:
from azureml.core import Workspace

# connect to your workspace
ws = Workspace.from_config()

# names
experiment_name = "my-experiment"
model_name = "my-model"
env_version = "1"
env_name = "my-env-" + env_version
service_name = str.lower(model_name + "-service-" + env_version)

# create environment for the deploy
from azureml.core.environment import Environment, DEFAULT_CPU_IMAGE
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.webservice import AciWebservice

# get a curated environment
env = Environment.get(
    workspace=ws,
    name="AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference",
)
custom_env = env.clone(env_name)
custom_env.inferencing_stack_version = 'latest'

# add packages
conda_dep = CondaDependencies()
python_packages = ['joblib', 'numpy', 'os', 'json', 'tensorflow']
for package in python_packages:
    conda_dep.add_pip_package(package)
    conda_dep.add_conda_package(package)

# Adds dependencies to PythonSection of env
custom_env.python.user_managed_dependencies = True
custom_env.python.conda_dependencies = conda_dep

custom_env.register(workspace=ws)

# create deployment config i.e. compute resources
aciconfig = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
    tags={"experiment": experiment_name, "model": model_name},
)

from azureml.core.model import InferenceConfig
from azureml.core.model import Model

# get the registered model
model = Model(ws, model_name)

# create an inference config i.e. the scoring script and environment
inference_config = InferenceConfig(entry_script="score.py", environment=custom_env)

# deploy the service
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
)
service.wait_for_deployment(show_output=True)
I get the following log:
AzureML image information: tensorflow-2.4-ubuntu18.04-py37-cpu-inference:20220110.v1

PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable:

EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T10:21:09,855130300+00:00 - iot-server/finish 1 0
2022-01-24T10:21:09,856870100+00:00 - Exit code 1 is normal. Not restarting iot-server.

Pip Dependencies
---------------
absl-py==0.15.0
applicationinsights==0.11.10
astunparse==1.6.3
azureml-inference-server-http==0.4.2
cachetools==4.2.4
certifi==2021.10.8
charset-normalizer==2.0.10
click==8.0.3
Flask==1.0.3
flatbuffers==1.12
gast==0.3.3
google-auth==2.3.3
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.32.0
gunicorn==20.1.0
h5py==2.10.0
idna==3.3
importlib-metadata==4.10.0
inference-schema==1.3.0
itsdangerous==2.0.1
Jinja2==3.0.3
Keras-Preprocessing==1.1.2
Markdown==3.3.6
MarkupSafe==2.0.1
numpy==1.19.5
oauthlib==3.1.1
opt-einsum==3.3.0
pandas==1.1.5
protobuf==3.19.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.2
pytz==2021.3
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
six==1.15.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.4.0
tensorflow-estimator==2.4.0
termcolor==1.1.0
typing-extensions==3.7.4.3
urllib3==1.26.8
Werkzeug==2.0.2
wrapt==1.12.1
zipp==3.7.0

Entry script directory: /var/azureml-app/.
Dynamic Python package installation is disabled.
Starting AzureML Inference Server HTTP.
Azure ML Inferencing HTTP server v0.4.2

Server Settings
---------------
Entry Script Name: score.py
Model Directory: /var/azureml-app/azureml-models/my-model/1
Worker Count: 1
Worker Timeout (seconds): 300
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None

Server Routes
---------------
Liveness Probe: GET 127.0.0.1:31311/
Score: POST 127.0.0.1:31311/score

Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311 (69)
Using worker: sync
Booting worker with pid: 100
Exception in worker process
Traceback (most recent call last):
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/opt/miniconda/envs/amlenv/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/entry.py", line 1, in <module>
    import create_app
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/create_app.py", line 4, in <module>
    from routes_common import main
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/routes_common.py", line 32, in <module>
    from aml_blueprint import AMLBlueprint
  File "/opt/miniconda/envs/amlenv/lib/python3.7/site-packages/azureml_inference_server_http/server/aml_blueprint.py", line 28, in <module>
    main_module_spec.loader.exec_module(main)
  File "/var/azureml-app/score.py", line 4, in <module>
    import joblib
ModuleNotFoundError: No module named 'joblib'
Worker exiting (pid: 100)
Shutting down: Master
Reason: Worker failed to boot.
2022-01-24T10:21:13,851467800+00:00 - gunicorn/finish 3 0
2022-01-24T10:21:13,853259700+00:00 - Exit code 3 is not normal. Killing image.
From a Conda specification
Same as before, but with a fresh environment built from a Conda specification, and bumping the env_version number:
# ... env_version="2" # ... custom_env = Environment.from_conda_specification(name=env_name, file_path="my-env.yml") custom_env.docker.base_image = DEFAULT_CPU_IMAGE # ...
with my-env.yml:

name: my-env
dependencies:
  - python
  - pip:
    - azureml-defaults
    - azureml-sdk
    - sklearn
    - numpy
    - matplotlib
    - joblib
    - uuid
    - requests
    - tensorflow
I get this log:
2022-01-24T11:06:54,887886931+00:00 - iot-server/run
2022-01-24T11:06:54,891839877+00:00 - rsyslog/run
2022-01-24T11:06:54,893640998+00:00 - gunicorn/run
2022-01-24T11:06:54,912032812+00:00 - nginx/run
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2022-01-24T11:06:55,398420960+00:00 - iot-server/finish 1 0
2022-01-24T11:06:55,414425146+00:00 - Exit code 1 is normal. Not restarting iot-server.

PATH environment variable: /opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PYTHONPATH environment variable:

Pip Dependencies
---------------
brotlipy==0.7.0
certifi==2020.6.20
cffi @ file:///tmp/build/80754af9/cffi_1605538037615/work
chardet @ file:///tmp/build/80754af9/chardet_1605303159953/work
conda==4.9.2
conda-package-handling @ file:///tmp/build/80754af9/conda-package-handling_1603018138503/work
cryptography @ file:///tmp/build/80754af9/cryptography_1605544449973/work
idna @ file:///tmp/build/80754af9/idna_1593446292537/work
pycosat==0.6.3
pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work
pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1605545627475/work
PySocks @ file:///tmp/build/80754af9/pysocks_1594394576006/work
requests @ file:///tmp/build/80754af9/requests_1592841827918/work
ruamel-yaml==0.15.87
six @ file:///tmp/build/80754af9/six_1605205313296/work
tqdm @ file:///tmp/build/80754af9/tqdm_1605303662894/work
urllib3 @ file:///tmp/build/80754af9/urllib3_1603305693037/work

Starting HTTP server
2022-01-24T11:06:59,701365128+00:00 - gunicorn/finish 127 0
./run: line 127: exec: gunicorn: not found
2022-01-24T11:06:59,706177784+00:00 - Exit code 127 is not normal. Killing image.
I really don’t know what I’m missing, and I’ve been searching for too long already (Azure docs, SO, …).
Thanks for your help!
Edit: Non-exhaustive list of solutions I tried:
- How to create AzureML environement and add required packages
- how to use existing conda environment as a AzureML environment
- …
- https://learn.microsoft.com/en-us/azure/machine-learning/concept-environments#environment-building-caching-and-reuse
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#add-packages-to-an-environment
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-inferencing-gpus
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where?tabs=python#define-a-deployment-configuration
- …
Answer
OK, I got it working: I started over from scratch and it worked.
I have no idea what was wrong in all my previous attempts, which is frustrating.
Multiple problems, and how I (think I) solved them:
- joblib: I actually didn't need it to load my Keras model. But the problem was not with this specific library; rather, I couldn't add any dependencies at all to the inference environment.
- Environment: in the end, I was only able to make things work with a custom environment: Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml"). I was never able to add my libraries (or pin a specific package version) to a "curated environment". I don't know why, though… (see the sketch after this list for how the clone-and-add route is normally supposed to work).
- TensorFlow: the last problem was that I had trained and registered my model in the AzureML Notebook azureml_py38_PT_TF kernel (tensorflow==2.7.0), then tried to load it in the inference Docker image (tensorflow==2.4.0). So I had to pin the TensorFlow version in the inference image (which required the previous point to be solved).
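For reference, here is a minimal sketch of how adding packages to a cloned curated environment is usually done with the SDK v1. One plausible (but unverified) culprit in the failing attempt above is custom_env.python.user_managed_dependencies = True: when that flag is set, AzureML assumes the image already contains every package and skips building the conda environment, so the added dependencies are never installed. The environment name below is hypothetical.

from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

# Curated environments are read-only, so clone one before modifying it
curated = Environment.get(
    workspace=ws,
    name="AzureML-tensorflow-2.4-ubuntu18.04-py37-cpu-inference",
)
custom_env = curated.clone("my-env-curated-clone")  # hypothetical name

# Attach only real pip packages (os and json are stdlib, not pip packages)
conda_dep = CondaDependencies()
for package in ["joblib", "numpy", "tensorflow==2.7.0"]:
    conda_dep.add_pip_package(package)
custom_env.python.conda_dependencies = conda_dep

# Leave user_managed_dependencies at its default (False) so AzureML
# builds the environment and installs the packages listed above
custom_env.register(workspace=ws)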
What finally worked:
- notebook.ipynb
import uuid

from azureml.core import Workspace, Environment, Model
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig

version = "test-" + str(uuid.uuid4())[:8]
model_name = "my-model"  # name the model was registered under

env = Environment.from_conda_specification(name=version, file_path="conda_dependencies.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

ws = Workspace.from_config()
model = Model(ws, model_name)

aci_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1,
)

service = Model.deploy(
    workspace=ws,
    name=version,
    models=[model],
    inference_config=inference_config,
    deployment_config=aci_config,
    overwrite=True,
)
service.wait_for_deployment(show_output=True)
- conda_dependencies.yml
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip:
    - azureml-defaults
    - azureml-sdk
    - numpy
    - tensorflow==2.7.0
- score.py
import os
import json

import numpy as np
import tensorflow as tf


def init():
    global model
    # AZUREML_MODEL_DIR points at the directory the registered model
    # was unpacked into inside the container
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/data/model")
    model = tf.keras.models.load_model(model_path)


def run(raw_data):
    data = np.array(json.loads(raw_data)["data"])
    y_hat = model.predict(data)
    return y_hat.tolist()
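Once wait_for_deployment reports the service as healthy, it can be smoke-tested straight from the notebook. service.run and service.scoring_uri are standard Webservice members; the payload shape below is just a placeholder (784 features, as in the image-classification tutorial) that you would adapt to your own model:

import json

# Placeholder payload: adapt the nested list to your model's input shape
sample = json.dumps({"data": [[0.0] * 784]})

# Call the scoring endpoint through the SDK...
print(service.run(input_data=sample))

# ...or note the REST endpoint for external clients
print(service.scoring_uri)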