I am trying to create a cluster with the following parameters on Google Cloud:
- 1 Master
- 7 Worker nodes
- Each of them with 1 vCPU
- The master node should get full SSD capacity and the worker nodes should get equal shares of standard disk capacity.
This is my code:
#Create the cluster
CLUSTER = '{}-cluster'.format(PROJECT)
!gcloud dataproc clusters create $CLUSTER \
    --image-version 1.5-ubuntu18 --single-node \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-type pd-ssd --master-boot-disk-size 100 \
    --num-workers 7 \
    --worker-machine-type n1-standard-1 \
    --worker-boot-disk-type pd-standard --worker-boot-disk-size 200 \
    --max-idle 3600s
This is my error:
ERROR: (gcloud.dataproc.clusters.create) argument --single-node: At most one of --single-node | --num-secondary-workers --num-workers --secondary-worker-type can be specified.
Updated attempt:
#Create the cluster
CLUSTER = '{}-cluster'.format(PROJECT)
!gcloud dataproc clusters create $CLUSTER \
    --image-version 1.5-ubuntu18 \
    --master-machine-type n1-standard-1 \
    --master-boot-disk-type pd-ssd --master-boot-disk-size 100 \
    --num-secondary-workers = 7 \
    --secondary-worker-type=non-preemptible \
    --secondary-worker-boot-disk-type pd-standard \
    --secondary-worker-boot-disk-size=200 \
    --max-idle 3600s \
    --initialization-actions=gs://goog-dataproc-initialization-actions-$REGION/python/pip-install.sh \
    --metadata=PIP_PACKAGES=tensorflow==2.4.0
I don’t follow what I am doing wrong here. Can anyone advise?
Answer
The documentation for gcloud dataproc clusters create should help. It explains that:
- --single-node says “Create a single node cluster”, which is not what you want, so you don’t want to include it.
- You want a “Multi-node cluster”, so you want a combination of --num-secondary-workers and --num-workers.
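As a rough sketch of where that leads for the cluster in the question (1 master, 7 primary workers, no --single-node), the command could be assembled as below. The project ID and region are placeholder assumptions, not values from the question. The image version, machine types and disk sizes are copied from the question's first attempt. This only builds and prints the command; it does not check quotas or call gcloud.

```python
import shlex

# Hypothetical placeholders: substitute your own project ID and region.
PROJECT = "my-project"
REGION = "us-central1"
CLUSTER = "{}-cluster".format(PROJECT)


def build_create_command(cluster, region):
    """Return the argv list for a multi-node cluster: 1 master, 7 primary workers."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster,
        "--region", region,
        "--image-version", "1.5-ubuntu18",
        # Master: 1 vCPU machine with an SSD boot disk.
        "--master-machine-type", "n1-standard-1",
        "--master-boot-disk-type", "pd-ssd",
        "--master-boot-disk-size", "100",
        # Primary workers: --num-workers instead of --single-node.
        "--num-workers", "7",
        "--worker-machine-type", "n1-standard-1",
        "--worker-boot-disk-type", "pd-standard",
        "--worker-boot-disk-size", "200",
        "--max-idle", "3600s",
    ]


command = build_create_command(CLUSTER, REGION)
# Print a shell-safe version that can be pasted after "!" in a notebook cell.
print(" ".join(shlex.quote(arg) for arg in command))
```

In a notebook, the printed line can be run with a leading "!", or the list can be passed to subprocess.run(command, check=True). Keeping the flags in a list makes it easy to confirm that --single-node is absent before anything is sent to Google Cloud.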
The documentation describes Secondary workers