I’m trying to run an Azure Batch task on an Ubuntu VM with an image pulled from a private Azure Container Registry. The nodes in the pool fail on creation with the following error, whether I pre-fetch or not:
Code: NodePreparationError Message: An error occurred during node preparation Values: Error - Hit unexpected error installing containers Message - 400, message='Bad Request', url=URL('http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/&mi_res_id=/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami')
Baseline: I have an Azure Subscription with a Resource Group. The Resource Group contains
- a Container Registry,
- a Batch Account, and
- a User Assigned Managed Identity.
The UAMI is assigned in the Identity blade of both the Container Registry and the Batch Account. It has been assigned the AcrPull role by an admin for my subscription.
I can pull the image to my local machine, so I know it exists. I have tried running a simple task on a pre-fetched python3.7-slim image from Docker Hub and succeeded, so the problem is somewhere between Batch and ACR.
Here is a minimal sample demonstrating the problem:
```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
from azure.batch.models import (
    ComputeNodeIdentityReference,
    ContainerConfiguration,
    ContainerRegistry,
    ImageReference,
    JobAddParameter,
    PoolAddParameter,
    PoolInformation,
    VirtualMachineConfiguration,
)

if __name__ == '__main__':
    batch_service_client = BatchServiceClient(
        SharedKeyCredentials('batchtest2021', 'GZTn…………………………………pGJ+gNE…………………………dvw=='),
        batch_url='https://batchtest2021.westeurope.batch.azure.com/',
    )

    pool_id = 'my_test_pool'
    new_pool = PoolAddParameter(
        id=pool_id,
        virtual_machine_configuration=VirtualMachineConfiguration(
            container_configuration=ContainerConfiguration(
                # Image to pre-fetch from the private registry
                container_image_names=[
                    'myprivateacr.azurecr.io/mydockerimage:latest',
                ],
                # Registry access via the user-assigned managed identity
                container_registries=[
                    ContainerRegistry(
                        registry_server='myprivateacr.azurecr.io',
                        identity_reference=ComputeNodeIdentityReference(
                            resource_id='/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami'
                        ),
                    ),
                ],
            ),
            image_reference=ImageReference(
                publisher='microsoft-azure-batch',
                offer='ubuntu-server-container',
                sku='20-04-lts',
                version='latest',
            ),
            node_agent_sku_id='batch.node.ubuntu 20.04',
        ),
        vm_size='STANDARD_A2M_V2',
        target_dedicated_nodes=2,
    )
    batch_service_client.pool.add(new_pool)

    job = JobAddParameter(id='sample_job_id', pool_info=PoolInformation(pool_id=pool_id))
    batch_service_client.job.add(job)
```
The code is based on the Batch Python Quickstart samples and the Batch documentation.
I have tried various steps in the Troubleshoot registry login guide without effect. I have no problems signing in to the ACR through Azure Shell, but that’s with my regular user, not the UAMI of course.
GUIDs have been changed to protect the innocent.
Halp?
Answer
When using a managed identity with pools, you have to add the identity to the pool itself. Setting an identity on the account allows the Batch service itself to use the identity, but not the VMs within your pool. Note that for your use case (Azure Container Registry) you do not actually need to set the identity on the account, just on the pool.
Please see the docs on assigning identities to pools here:
https://learn.microsoft.com/azure/batch/managed-identity-pools
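To make "add the identity to the pool itself" concrete, here is a minimal sketch of what the pool definition might look like when it is created through the Batch management (ARM) API instead of the data-plane `PoolAddParameter` used in the question. The helper name `build_pool_body` is hypothetical, and the field names reflect my understanding of the `Microsoft.Batch/batchAccounts/pools` resource schema, so check them against the linked docs before relying on them. The point is the top-level `identity` block, which is separate from the `identityReference` inside the registry entry.

```python
import json


def build_pool_body(uami_resource_id, registry_server, image_name):
    """Build a pool resource body (hypothetical helper, assumed ARM schema).

    The same UAMI resource ID appears twice: once to attach the identity to
    the pool's VMs, and once to tell the registry entry which identity to use.
    """
    return {
        # Attaches the user-assigned identity to the nodes in the pool.
        "identity": {
            "type": "UserAssigned",
            "userAssignedIdentities": {uami_resource_id: {}},
        },
        "properties": {
            "vmSize": "STANDARD_A2M_V2",
            "deploymentConfiguration": {
                "virtualMachineConfiguration": {
                    "imageReference": {
                        "publisher": "microsoft-azure-batch",
                        "offer": "ubuntu-server-container",
                        "sku": "20-04-lts",
                        "version": "latest",
                    },
                    "nodeAgentSkuId": "batch.node.ubuntu 20.04",
                    "containerConfiguration": {
                        "type": "DockerCompatible",
                        "containerImageNames": [image_name],
                        "containerRegistries": [
                            {
                                "registryServer": registry_server,
                                # Identity the nodes use to pull from ACR.
                                "identityReference": {"resourceId": uami_resource_id},
                            }
                        ],
                    },
                }
            },
            "scaleSettings": {"fixedScale": {"targetDedicatedNodes": 2}},
        },
    }


if __name__ == "__main__":
    body = build_pool_body(
        "/subscriptions/7bd2fd6e-1cb6-4db2-82fe-67c7ea3024cd/resourceGroups/SANDBOX"
        "/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_uami",
        "myprivateacr.azurecr.io",
        "myprivateacr.azurecr.io/mydockerimage:latest",
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds and prints the body; sending it (for example with the `azure-mgmt-batch` package or an ARM template) needs Azure AD credentials for the management plane, not the shared key used in the question.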