
How do I correctly add worker nodes to my cluster?

I am trying to create a cluster with the following parameters on Google Cloud:

  1. 1 Master
  2. 7 Worker nodes
  3. Each of them with 1 vCPU
  4. The master node should get full SSD capacity and the worker nodes should get equal shares of standard disk capacity.

This is my code:

#Create the cluster
CLUSTER = '{}-cluster'.format(PROJECT)
!gcloud dataproc clusters create $CLUSTER \
    --image-version 1.5-ubuntu18 --single-node \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-type pd-ssd --master-boot-disk-size 100 \
    --num-workers 7 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-type pd-standard --worker-boot-disk-size 200 \
    --max-idle 3600s

This is my error:

ERROR: (gcloud.dataproc.clusters.create) argument --single-node: At most one of --single-node | --num-secondary-workers --num-workers --secondary-worker-type can be specified.

Updated attempt:

#Create the cluster
CLUSTER = '{}-cluster'.format(PROJECT)
!gcloud dataproc clusters create $CLUSTER \
    --image-version 1.5-ubuntu18 \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-type pd-ssd --master-boot-disk-size 100 \
    --num-secondary-workers = 7 \
    --secondary-worker-type=non-preemptible \
    --secondary-worker-boot-disk-type pd-standard \
    --secondary-worker-boot-disk-size=200 \
    --max-idle 3600s \
    --initialization-actions=gs://goog-dataproc-initialization-actions-$REGION/python/pip-install.sh \
    --metadata=PIP_PACKAGES=tensorflow==2.4.0

I don’t see what I am doing wrong here. Can anyone advise?

Advertisement

Answer

The documentation for gcloud dataproc clusters create should help. It explains that --single-node is mutually exclusive with the worker flags (--num-workers, --num-secondary-workers, --secondary-worker-type): a single-node cluster has no workers at all, so drop --single-node when you want a master plus 7 workers.

The documentation also describes secondary workers. These are by default preemptible VMs that do not store HDFS data, so they are usually not what you want for ordinary worker nodes. Also note that --num-secondary-workers = 7 in your updated attempt puts spaces around the =; gcloud flags must be written either as --num-secondary-workers=7 or as --num-secondary-workers 7, which is likely why that attempt fails as well.
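Putting that together, a corrected version of the original command might look like the sketch below. It assumes the same PROJECT variable, image version, machine types, and disk sizes as in the question (drop the leading ! if you run it in a plain shell rather than a notebook cell):

```shell
# Create the cluster: drop --single-node so --num-workers is allowed,
# and write every flag as --flag value or --flag=value (no spaces around '=').
CLUSTER="${PROJECT}-cluster"
gcloud dataproc clusters create "$CLUSTER" \
    --image-version 1.5-ubuntu18 \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-type pd-ssd --master-boot-disk-size 100 \
    --num-workers 7 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-type pd-standard --worker-boot-disk-size 200 \
    --max-idle 3600s
```

This gives 1 master on a 100 GB SSD boot disk and 7 primary workers, each an n1-standard-1 (1 vCPU) with its own 200 GB standard persistent disk, matching the layout described in the question.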

User contributions licensed under: CC BY-SA