
Configure volumes in Airflow GKEStartPodOperator

I have a Google Cloud Composer environment. In my DAG I want to create a pod in GKE. When I deploy a simple app based on a Docker container that doesn’t need any volume configuration or secrets, everything works fine, for example:

JavaScript

But when I have an application that needs to access my GKE cluster volumes, I need to configure volumes in my pod. The issue is that the documentation is not clear about this. The only example I have found is this:

JavaScript

Meanwhile, the volumes in my manifest file (which I use to deploy my app from my local machine) look like this:

JavaScript

Here is how the two volumes look in the console, comparing the manifest file deployed manually (which runs successfully) with the pod deployed through Cloud Composer (which fails):

  • The successful run – Manifest file:

    volume-prod
    Name: volume-prod
    Type: secret
    Source volume identifier: volume-prod

  • The failed run – Composer GKEStartPodOperator:

    volume-prod
    Name: volume-prod
    Type: emptyDir
    Source volume identifier: Node’s default medium

How can I configure my pod from Cloud Composer so that it can read the volume of my cluster?


Answer

The KubernetesPodOperator/GKEStartPodOperator is just a wrapper around the Kubernetes Python SDK. I agree that it isn’t well documented in the Airflow/Cloud Composer documentation, but the Python SDK for Kubernetes itself is well documented.

Start with the Kubernetes Python SDK documentation for V1PodSpec: https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1PodSpec.md

You’ll notice that the arguments the KubernetesPodOperator/GKEStartPodOperator takes match this spec. If you dig into the source code of the operators, you’ll see that the operator is nothing more than a builder that creates a kubernetes.client.models.V1Pod object and uses the API to deploy the pod.

The operator takes a volumes parameter, which should be of type List[V1Volume]. V1Volume is documented in V1Volume.md, in the same docs folder as V1PodSpec.md.

So in your case you would need to provide:

JavaScript

Alternatively, you can pass your manifest to the pod_template_file argument of GKEStartPodOperator. The file must be available to the workers inside Airflow.
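If you don't already have the manifest on the workers, here is a rough sketch of generating one with only the standard library. Everything in it is an assumption for illustration: the pod name, the image, the mount path, the output location, and the choice to write JSON (a JSON document is normally also parseable as YAML, but check that your Airflow version loads it). The container is named base because that is, as far as I know, the name the operator expects by default.

```python
import json
import tempfile
from pathlib import Path


def build_pod_template(secret_name, mount_path, image):
    """Return a Pod manifest (as a dict) that mounts a Secret as a volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "volume-prod-pod"},  # hypothetical pod name
        "spec": {
            "containers": [
                {
                    "name": "base",  # assumed default container name
                    "image": image,
                    "volumeMounts": [
                        {
                            "name": secret_name,
                            "mountPath": mount_path,
                            "readOnly": True,
                        }
                    ],
                }
            ],
            # The "secret" key is what makes this a secret volume
            # rather than an emptyDir.
            "volumes": [
                {"name": secret_name, "secret": {"secretName": secret_name}}
            ],
        },
    }


def write_pod_template(path, manifest):
    """Write the manifest to disk and return the path."""
    path.write_text(json.dumps(manifest, indent=2))
    return path


template = build_pod_template(
    secret_name="volume-prod",
    mount_path="/etc/secrets",                # hypothetical mount path
    image="gcr.io/my-project/my-app:latest",  # hypothetical image
)
template_path = write_pod_template(
    Path(tempfile.gettempdir()) / "pod_template.json", template
)

# The path is then passed to the operator, roughly like:
#   GKEStartPodOperator(..., pod_template_file=str(template_path))
```

In Cloud Composer you would more likely keep the file next to your DAGs so every worker can read it; the temporary directory here only keeps the sketch self-contained.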

There are three ways to create pods in Airflow using this operator:

  1. Use the arguments of the operator to specify what you need and have the operator build the V1Pod for you.
  2. Provide a manifest by passing the pod_template_file argument.
  3. Use the Kubernetes SDK to create a V1Pod object yourself and pass it to the full_pod_spec argument.
User contributions licensed under: CC BY-SA