I have a Google Cloud Composer environment. In my DAG I want to create a pod in GKE. When I deploy a simple app based on a Docker container that doesn't need any volume configuration or secrets, everything works fine, for example:
kubernetes_max = GKEStartPodOperator(
    # The ID specified for the task.
    task_id="python-simple-app",
    # Name of task you want to run, used to generate Pod ID.
    name="python-demo-app",
    project_id=PROJECT_ID,
    location=CLUSTER_REGION,
    cluster_name=CLUSTER_NAME,
    # Entrypoint of the container, if not specified the Docker container's
    # entrypoint is used. The cmds parameter is templated.
    cmds=["python", "app.py"],
    namespace="production",
    image="gcr.io/path/to/lab-python-job:latest",
)
But when I have an application that needs access to my GKE cluster's volumes, I need to configure volumes in my pod. The issue is that the documentation is not clear about this. The only example I have ever found is this:
volume = k8s.V1Volume(
    name='test-volume',
    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name='test-volume'),
)
While the volumes in my manifest file (which I use to deploy my app locally) look like this:
volumes:
  - name: volume-prod
    secret:
      secretName: volume-prod
      items:
        - key: config
          path: config.json
        - key: another_config
          path: another_config.conf
        - key: random-ca
          path: random-ca.pem
So this is how both volumes look in the console, when I manually deploy the manifest file (which runs successfully) and when I deploy the pod using Cloud Composer (which fails):
The successful run – manifest file:

Name: volume-prod
Type: secret
Source volume identifier: volume-prod

The failed run – Composer GKEStartPodOperator:

Name: volume-prod
Type: emptyDir
Source volume identifier: Node's default medium
How can I configure my pod from Cloud Composer so that it can read my cluster's volumes?
Answer
The KubernetesPodOperator/GKEStartPodOperator is just a wrapper around the Python Kubernetes SDK – I agree that it isn't well documented in the Airflow/Cloud Composer documentation, but the Python SDK for Kubernetes itself is well documented.

Start here with the Kubernetes Python SDK documentation: https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md

You'll notice that the arguments the KubernetesPodOperator/GKEStartPodOperator takes match this spec. If you dig into the source code of the operators you'll see that each operator is nothing more than a builder that creates a kubernetes.client.models.V1Pod object and uses the API to deploy the pod.
The operator takes a volumes parameter, which should be of type List[V1Volume]; the documentation for V1Volume is here: https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Volume.md
So in your case you would need to provide:
from kubernetes.client import models as k8s

kubernetes_max = GKEStartPodOperator(
    # The ID specified for the task.
    task_id="python-simple-app",
    # Name of task you want to run, used to generate Pod ID.
    name="python-demo-app",
    project_id=PROJECT_ID,
    location=CLUSTER_REGION,
    cluster_name=CLUSTER_NAME,
    # Entrypoint of the container, if not specified the Docker container's
    # entrypoint is used. The cmds parameter is templated.
    cmds=["python", "app.py"],
    namespace="production",
    image="gcr.io/path/to/lab-python-job:latest",
    volumes=[
        k8s.V1Volume(
            name="volume-prod",
            secret=k8s.V1SecretVolumeSource(
                secret_name="volume-prod",
                items=[
                    k8s.V1KeyToPath(key="config", path="config.json"),
                    k8s.V1KeyToPath(key="another_config", path="another_config.conf"),
                    k8s.V1KeyToPath(key="random-ca", path="random-ca.pem"),
                ],
            ),
        )
    ],
)
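One thing to watch: declaring the volume in volumes only attaches it to the pod; for the files to be visible inside the container, the volume also has to be mounted. In recent provider versions the operator takes a volume_mounts parameter (List[V1VolumeMount]) for this – a minimal sketch to pass alongside volumes in the same operator call; the mount path /etc/config is an assumption for illustration, not something from your manifest:

volume_mounts=[
    k8s.V1VolumeMount(
        name="volume-prod",        # must match the V1Volume name above
        mount_path="/etc/config",  # example path where the secret's files will appear
        read_only=True,
    )
],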
Alternatively, you can provide your manifest to the pod_template_file argument of GKEStartPodOperator – the file will need to be available to the workers inside Airflow.
There are 3 ways to create pods in Airflow using this operator:

- Use the arguments of the operator to specify what you need and have the operator build the V1Pod for you.
- Provide a manifest by passing in the pod_template_file argument.
- Use the Kubernetes SDK to create a V1Pod object yourself and pass it to the full_pod_spec argument (see the sketch below).
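A minimal sketch of that third option – the V1Pod reuses the names and image from your example, and the container name "main" is an arbitrary choice:

from kubernetes.client import models as k8s

# Build the pod spec yourself with the Kubernetes SDK.
pod = k8s.V1Pod(
    metadata=k8s.V1ObjectMeta(name="python-demo-app", namespace="production"),
    spec=k8s.V1PodSpec(
        containers=[
            k8s.V1Container(
                name="main",  # arbitrary container name
                image="gcr.io/path/to/lab-python-job:latest",
                command=["python", "app.py"],
            )
        ],
    ),
)

# Hand the finished V1Pod to the operator instead of individual arguments.
kubernetes_max = GKEStartPodOperator(
    task_id="python-simple-app",
    name="python-demo-app",
    project_id=PROJECT_ID,
    location=CLUSTER_REGION,
    cluster_name=CLUSTER_NAME,
    full_pod_spec=pod,
)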