Why does GridSearchCV spend more than 50% of its time on {method 'acquire' of 'thread.lock' objects}?

Recently I have been tuning some of my machine learning pipelines and decided to take advantage of my multicore processor, so I ran cross-validation with n_jobs=-1. I also profiled it, and to my surprise the top function was:

{method 'acquire' of 'thread.lock' objects}

I wasn't sure whether this was caused by the operations in my Pipeline, so I ran a small experiment:

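import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
pp = Pipeline([('svc', SVC())])
cv = GridSearchCV(pp, {'svc__C': [1, 100, 200]}, n_jobs=-1, cv=2, refit=True)
%prun cv.fit(np.random.rand(10000, 100), np.random.randint(0, 5, 10000))  # IPython profiling magic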

The output is:

2691 function calls (2655 primitive calls) in 74.005 seconds
Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   83   43.819    0.528   43.819    0.528 {method 'acquire' of 'thread.lock' objects}
    1   30.112   30.112   30.112   30.112 {sklearn.svm.libsvm.fit}

What is causing this behavior, and is it possible to speed it up a little?

Answer

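The profiler is only telling you what the main process is doing, while its child processes do all the work. With n_jobs=-1, GridSearchCV hands the individual fits to worker processes, and %prun profiles only the parent, which spends most of its time blocked waiting for their results; that waiting is what shows up as {method 'acquire' of 'thread.lock' objects}. The single call to sklearn.svm.libsvm.fit is most likely the final refit on the full data (refit=True), which runs in the parent process. Setting verbose=2 on GridSearchCV may give more useful output than %prun in this case, since it reports the time taken by each fit.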
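The effect can be reproduced without scikit-learn. The sketch below is a minimal illustration, not what joblib does internally: a child Python process that just sleeps (a hypothetical stand-in for one cross-validation fit) does the "work", while the parent is profiled with cProfile, the profiler that IPython's %prun wraps. Assuming a standard CPython install, the child's sleep never appears in the parent's statistics, and nearly all of the parent's time is charged to the call it blocks on (os.waitpid on POSIX systems here, rather than the lock acquire seen with joblib's workers). The names CHILD_CODE and profile_parent are hypothetical.

```python
import cProfile
import pstats
import subprocess
import sys

# Hypothetical "real work": a child Python process that sleeps, standing in
# for one cross-validation fit running in a worker process.
CHILD_CODE = "import time; time.sleep(0.5)"


def profile_parent(child_code=CHILD_CODE):
    """Profile the parent process while a child process does the work.

    Returns (name of top function by internal time, its internal time in
    seconds, list of all function names the parent profiler recorded).
    """
    profiler = cProfile.Profile()
    profiler.enable()
    # The parent only launches the child and then blocks until it exits.
    subprocess.run([sys.executable, "-c", child_code], check=True)
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, ncalls, tottime, cumtime, callers)
    top_func, top_entry = max(stats.stats.items(), key=lambda kv: kv[1][2])
    names = [func[2] for func in stats.stats]
    return top_func[2], top_entry[2], names


if __name__ == "__main__":
    name, tottime, names = profile_parent()
    print(f"top function in parent: {name} ({tottime:.2f}s internal time)")
    print("child's sleep visible to parent profiler:",
          any("sleep" in n for n in names))
```

Running it as a script prints the parent's top function by internal time; the exact name depends on the platform, but it is always the waiting call, never the child's work.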
