Skip to content
Advertisement

Is it possible to share a numpy array that’s not empty between processes?

I thought that SharedMemory would keep values of target arrays, but when I actually tried it, it seems it doesn’t.

from multiprocessing import Process, Semaphore, shared_memory
import numpy as np
import time
 
dtype_eV = np.dtype({ 'names':['idx', 'value', 'size'], 
                    'formats':['int32', 'float64', 'float64'] })
 
 
def worker_writer(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
 
    for i in range(5):
        time.sleep(0.5)
        b['idx'][i] = i
 
def worker_reader(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
 
    for i in range(5):
        time.sleep(1)
        print(b['idx'][i], b['value'][i])
 
 
if __name__ == "__main__":
    a = np.zeros(5, dtype=dtype_eV)
    a['value'] = 100
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)  
    c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    th1 = Process(target=worker_writer, args=(1, 50000000, a, shm.name))
    th2 = Process(target=worker_reader, args=(2, 50000000, a, shm.name))
 
    th1.start()
    th2.start()
    th1.join()
    th2.join()

'''
result:
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
'''

In the code above, the 2 processes can share one array(a) and access to it. But the value that was given before sharing(a[‘value’] = 100) is missing. Is it just natural or is there any way to keep the value even after sharing?

Advertisement

Answer

Here’s an example of how to use shared_memory using numpy. It was pasted together from several of my other answers, but there are a couple pitfalls to keep in mind with shared_memory:

  • When you create a numpy ndarray from a shm object, it doesn’t prevent the shm from being garbage collected. The unfortunate side effect of this is that the next time you try to access the array, you get a segfault. From another question I created a quick ndarray subclass to just attach the shm as an attribute, so a reference sticks around, and it doesn’t get GC’d.
  • Another pitfall is that on Windows, the OS does the tracking of when to delete the memory rather than giving you the access to do so. That means that even if you don’t call unlink, the memory will get deleted if there are no active references to that particular segment of memory (given by the name). The way to solve this is to make sure you keep an shm open on the main process that outlives all child processes. Calling close and unlink at the end keeps that reference to the end, and makes sure on other platforms you don’t leak memory.
import numpy as np
import multiprocessing as mp
from multiprocessing.shared_memory import SharedMemory

class SHMArray(np.ndarray): #copied from https://numpy.org/doc/stable/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array
    '''an ndarray subclass that holds on to a ref of shm so it doesn't get garbage collected too early.'''
    def __new__(cls, input_array, shm=None):
        obj = np.asarray(input_array).view(cls)
        obj.shm = shm
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.shm = getattr(obj, 'shm', None)

def child_func(name, shape, dtype):
    shm = SharedMemory(name=name)
    arr = SHMArray(np.ndarray(shape, buffer=shm.buf, dtype=dtype), shm)
    arr[:] += 5
    shm.close() #be sure to cleanup your shm's locally when they're not needed (referring to arr after this will segfault)

if __name__ == "__main__":
    shape = (10,) # 1d array 10 elements long
    dtype = 'f4' # 32 bit floats
    dummy_array = np.ndarray(shape, dtype=dtype) #dumy array to calculate nbytes
    shm = SharedMemory(create=True, size=dummy_array.nbytes)
    arr = np.ndarray(shape, buffer=shm.buf, dtype=dtype) #create the real arr backed by the shm
    arr[:] = 0
    print(arr) #should print arr full of 0's
    p1 = mp.Process(target=child_func, args=(shm.name, shape, dtype))
    p1.start()
    p1.join()
    print(arr) #should print arr full of 5's
    shm.close() #be sure to cleanup your shm's
    shm.unlink() #call unlink when the actual memory can be deleted
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement