Buildbot: worker is idle

I’ve noticed that builds are distributed sub-optimally between workers: about 80% of the time, builds are assigned to workers that are already busy.

In the image, tmp_worker1 could process triggered_build_1, but it sits idle instead. For some reason, triggered_build_1 is stuck in the "acquiring locks" state and is assigned to the busy example-worker.

enter image description here

I have the following setup:

  • 3 workers
  • 1 main builder
  • 3 triggerable builders (with locks)

The main source code is below:

# triggerable scheduler
c['schedulers'].append(schedulers.Triggerable(name="trigger_from_main",
    builderNames=['triggered_build_0', 'triggered_build_1', 'triggered_build_2']))

# main builder factory
factory_main = util.BuildFactory()

# trigger
factory_main.addStep(steps.Trigger(
    schedulerNames=['trigger_from_main'],
    waitForFinish=True,
    haltOnFailure=True,
    name='trigger'
))

# main builder 
c['builders'].append(
    util.BuilderConfig(name="test_main",
        workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
        factory=factory_main,
    )
)

# lock
worker_lock = [util.WorkerLock("worker_builds", maxCount=1).access('counting')]

# 1st of 3 sub-builder
c['builders'].append(
    util.BuilderConfig(name="triggered_build_0",
        workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
        factory=factory_subbuild,
        locks=worker_lock,
    )
)

# 2nd of 3 sub-builder
...
# 3rd of 3 sub-builder
...


Answer

This behavior is caused by the locks and by the way the master distributes builds.

When a build needs to run, the master performs the following steps (see the Buildbot source code):

  • Select a worker at random
  • Check that the worker and the build match
  • Check that the build can take the lock(s) it needs on that worker
  • Assign the build to the worker

The build takes the lock only when it actually starts on the worker.

So if the master checks the requirements of the next build to dispatch before that lock has been acquired, it can dispatch the new build to the same worker, even though both builds need the same lock.
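
The race can be illustrated with a toy model. This is only a sketch and does not use Buildbot: ToyWorker, dispatch, make_workers and the pick parameter are hypothetical names made up for the illustration, and the quarantine is modelled as permanent rather than timed. It assumes there are no more builds than workers.

```python
import random


class ToyWorker:
    """Hypothetical stand-in for a worker guarded by a maxCount=1 lock."""

    def __init__(self, name):
        self.name = name
        self.lock_held = False    # becomes True only once a build has started
        self.quarantined = False  # True while the master must skip this worker
        self.assigned = []        # builds the master has dispatched here


def dispatch(builds, workers, pick=random.choice, quarantine=False):
    """Assign all builds before any of them has started and taken its lock."""
    for build in builds:
        # The master only sees locks that are already held, so a worker that
        # was assigned a build a moment ago still looks free.
        candidates = [w for w in workers
                      if not w.lock_held and not w.quarantined]
        worker = pick(candidates)
        worker.assigned.append(build)
        if quarantine:
            # Modelled as permanent here; the real quarantine is timed.
            worker.quarantined = True
    return {w.name: list(w.assigned) for w in workers}


def make_workers():
    """Build a fresh set of toy workers named like the ones in the question."""
    return [ToyWorker(n)
            for n in ("example-worker", "tmp_worker0", "tmp_worker1")]


BUILDS = ["triggered_build_0", "triggered_build_1", "triggered_build_2"]

if __name__ == "__main__":
    # Without quarantine, several builds may land on the same worker.
    print(dispatch(BUILDS, make_workers()))
    # With quarantine, each worker receives exactly one build.
    print(dispatch(BUILDS, make_workers(), quarantine=True))
```

Because no lock is held while the builds are being dispatched, nothing in the first call stops two or three builds from landing on one worker. Marking a worker as unavailable right after the assignment is what spreads the builds out in the second call.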

You can fix this by putting the worker into quarantine as soon as a build is assigned to it, which gives the build enough time to take the lock before the master considers that worker again. You can do this with the canStartBuild function, which runs just before the build is assigned to the worker (see the docs).

def canStartBuildLockQuarantine(builder, wfb, request):
    # Put the worker in quarantine for 5 seconds
    wfb.worker.quarantine_timeout = 5
    wfb.worker.putInQuarantine()
    # Restore quarantine_timeout to its initial value so it does not grow between builds
    wfb.worker.resetQuarantine()
    return True

Then pass it to each builder that takes locks:

c['builders'].append(
    util.BuilderConfig(name="triggered_build_0",
        workernames=['example-worker', 'tmp_worker0', 'tmp_worker1'],
        factory=factory_subbuild,
        canStartBuild=canStartBuildLockQuarantine,
        locks=worker_lock,
    )
)
User contributions licensed under: CC BY-SA