import joblib
from sklearn.externals.joblib import parallel_backend
with joblib.parallel_backend('dask'):
    from dask_ml.model_selection import GridSearchCV
    import xgboost
    from xgboost import XGBRegressor
    grid_search = GridSearchCV(estimator=XGBRegressor(), param_grid=param_grid, cv=3, n_jobs=-1)
    grid_search.fit(df2, df3)
I created a Dask cluster across two local machines and connected to it with:
client = dask.distributed.Client('tcp://191.xxx.xx.xxx:8786')
I am trying to find the best parameters using Dask's GridSearchCV, but I get the following error:
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 1202, 2)": ['tcp://127.0.0.1:3738']} state: ['processing'] workers: ['tcp://127.0.0.1:3738']
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:3738'], ('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 1202, 2)
NoneType: None
distributed.client - WARNING - Couldn't gather 1 keys, rescheduling {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 1202, 2)": ('tcp://127.0.0.1:3738',)}
distributed.nanny - WARNING - Restarting worker
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 1, 2)": ['tcp://127.0.0.1:3730']} state: ['processing'] workers: ['tcp://127.0.0.1:3730']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 0, 1)": ['tcp://127.0.0.1:3730'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 5, 1)": ['tcp://127.0.0.1:3729'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 4, 2)": ['tcp://127.0.0.1:3729'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 2, 1)": ['tcp://127.0.0.1:3730']} state: ['processing', 'processing', 'processing', 'processing'] workers: ['tcp://127.0.0.1:3730', 'tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {'cv-n-samples-7cb7087b3aff75a31f487cfe5a9cedb0': ['tcp://127.0.0.1:3729']} state: ['processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 4, 0)": ['tcp://127.0.0.1:3729'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 2, 0)": ['tcp://127.0.0.1:3729'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 0, 0)": ['tcp://127.0.0.1:3729']} state: ['processing', 'processing', 'processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Couldn't gather keys {"('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 0, 2)": ['tcp://127.0.0.1:3729'], "('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 2, 2)": ['tcp://127.0.0.1:3729']} state: ['processing', 'processing'] workers: ['tcp://127.0.0.1:3729']
NoneType: None
distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://127.0.0.1:3730'], ('xgbregressor-fit-score-7cb7087b3aff75a31f487cfe5a9cedb0', 1, 2)
NoneType: None
I hope someone helps in solving this issue. Thanks in advance.
Answer
I ran into the same issue and found that it is likely caused by a firewall.
Suppose we have two machines: 191.168.1.1 for the scheduler and 191.168.1.2 for the worker.
When we start the scheduler, we may see output like the following:
distributed.scheduler - INFO - -----------------------------------------------
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://191.168.1.1:8786
distributed.scheduler - INFO - dashboard at: :8787
So for the scheduler, we should confirm that port 8786 (the scheduler itself) and port 8787 (its dashboard) can be accessed from the other machine.
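One way to confirm that a port can be accessed is a plain TCP connection test run from the other machine. The following is a minimal sketch using only the standard library; the helper name `port_is_open` and the two-second timeout are my own choices for illustration, not part of Dask.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `port_is_open("191.168.1.1", 8786)` on the worker machine should return `True` once the firewall allows the connection. A `False` result only says the connection failed; it does not distinguish a firewall block from a process that is not listening.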
Similarly, we can check the worker's startup output:
distributed.nanny - INFO - Start Nanny at: 'tcp://191.168.1.2:39042'
distributed.diskutils - INFO - Found stale lock file and directory '/root/dask-worker-space/worker-39rf_n28', purging
distributed.worker - INFO - Start worker at: tcp://191.168.1.2:39040
distributed.worker - INFO - Listening to: tcp://191.168.1.2:39040
distributed.worker - INFO - dashboard at: 191.168.1.2:39041
distributed.worker - INFO - Waiting to connect to: tcp://191.168.1.1:8786
distributed.worker - INFO - -------------------------------------------------
Here the nanny port is 39042, the worker port is 39040, and the dashboard port is 39041.
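If there are several workers, reading the ports off each log by hand gets tedious. Below is a rough sketch that pulls the nanny, worker, and dashboard ports out of startup output shaped like the log above. The function name `extract_worker_ports` and the regular expression are assumptions based only on the lines shown in this answer; other Dask versions may format these lines differently.

```python
import re

# Matches lines such as:
#   distributed.worker - INFO - Start worker at: tcp://191.168.1.2:39040
PORT_LINE = re.compile(
    r"distributed\.(?:nanny|worker) - INFO -\s+"
    r"(?P<label>Start Nanny at|Start worker at|dashboard at):\s+"
    r"'?(?:tcp://)?[\d.]+:(?P<port>\d+)"
)

# Map the log label to a short role name
ROLES = {
    "Start Nanny at": "nanny",
    "Start worker at": "worker",
    "dashboard at": "dashboard",
}


def extract_worker_ports(log_text: str) -> dict:
    """Return a {role: port} mapping for the ports announced in the worker log."""
    ports = {}
    for line in log_text.splitlines():
        match = PORT_LINE.search(line)
        if match:
            ports[ROLES[match.group("label")]] = int(match.group("port"))
    return ports
```

Lines that do not announce a listening port, such as "Listening to" or "Waiting to connect to", are ignored.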
Open these ports on both 191.168.1.1 and 191.168.1.2:
firewall-cmd --permanent --add-port=8786/tcp
firewall-cmd --permanent --add-port=8787/tcp
firewall-cmd --permanent --add-port=39040/tcp
firewall-cmd --permanent --add-port=39041/tcp
firewall-cmd --permanent --add-port=39042/tcp
firewall-cmd --reload
After that, the task runs successfully.
Finally, because Dask chooses the worker's ports randomly by default, we can also start the worker with fixed ports so that the firewall rules stay valid across restarts:
dask-worker 191.168.1.1:8786 --worker-port 39040 --dashboard-address 39041 --nanny-port 39042
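If the workers are launched from a Python script rather than typed by hand, the same command can be assembled as an argument list. This is only a sketch: the helper name `dask_worker_command` is hypothetical, and it uses just the flags shown in the command above.

```python
import shlex


def dask_worker_command(scheduler: str, worker_port: int,
                        dashboard_port: int, nanny_port: int) -> list:
    """Build the dask-worker argument list with fixed ports (flags as in the command above)."""
    return [
        "dask-worker", scheduler,
        "--worker-port", str(worker_port),
        "--dashboard-address", str(dashboard_port),
        "--nanny-port", str(nanny_port),
    ]


# Render the list as a shell command string, e.g. for logging before launch
command = shlex.join(dask_worker_command("191.168.1.1:8786", 39040, 39041, 39042))
```

The list form can be passed directly to `subprocess.run(...)` on the worker machine, which avoids shell quoting issues.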
More options are described in the dask-worker command-line documentation.