Skip to content
Advertisement

my python Ray script runs on a single worker only

I am a new with Ray and after have read he documentation, I came up with a script that mimics what I want to do further with Ray. Here is my script:

import ray
import time

import h5py

@ray.remote
class Analysis:
    def __init__(self):
        self._file = h5py.File('./Data/Trajectories/MDANSE/apoferritin.h5')

    def __getstate__(self):
        print('I dump')
        d = self.__dict__.copy()
        del d['_file']
        return d

    def __setstate__(self,state):
        self.__dict__ = state
        self._file = h5py.File('./Data/Trajectories/MDANSE/apoferritin.h5')

    def run_step(self,index):
        time.sleep(5)        
        print('I run a step',index)

    def combine(self,index):
        print('I combine',index)

ray.init(num_cpus=4)

a = Analysis.remote()
obj_id = ray.put(a)
for i in range(100):
    output = ray.get(a.run_step.remote(i))

My problem is that when I run this script it runs on a single worker as indicated by the Ray output whereas I would expect 4 workers to be fired. Would you know what is wrong with my script ?

Advertisement

Answer

Quoting from ray docs on actor

Methods called on different actors can execute in parallel, and methods called on the same actor are executed serially in the order that they are called.

Another issue with the above code is that ray.get is a blocking call.

I will suggest instantiating multiple actors and running the jobs, like

actors = [Analysis.remote() for i in range(num_cpus)]
outputs = []
for i in range(100):
    outputs.append(actors[i % num_cpus].run_step.remote(i))
output = ray.get(outputs)
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement