I thought that SharedMemory would keep the values of the target array, but when I actually tried it, it seems it doesn't.
```python
from multiprocessing import Process, Semaphore, shared_memory
import numpy as np
import time

dtype_eV = np.dtype({
    'names': ['idx', 'value', 'size'],
    'formats': ['int32', 'float64', 'float64']
})

def worker_writer(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
    for i in range(5):
        time.sleep(0.5)
        b['idx'][i] = i

def worker_reader(id, number, a, shm):
    exst_shm = shared_memory.SharedMemory(name=shm)
    b = np.ndarray(a.shape, dtype=a.dtype, buffer=exst_shm.buf)
    for i in range(5):
        time.sleep(1)
        print(b['idx'][i], b['value'][i])

if __name__ == "__main__":
    a = np.zeros(5, dtype=dtype_eV)
    a['value'] = 100
    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    th1 = Process(target=worker_writer, args=(1, 50000000, a, shm.name))
    th2 = Process(target=worker_reader, args=(2, 50000000, a, shm.name))
    th1.start()
    th2.start()
    th1.join()
    th2.join()

'''
result:
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
'''
```
In the code above, the two processes can share one array (a) and access it. But the value that was assigned before sharing (a['value'] = 100) is missing. Is that just how it works, or is there a way to keep the value even after sharing?
Answer
Here's an example of how to use `shared_memory` with numpy. It was pasted together from several of my other answers, but there are a couple of pitfalls to keep in mind with `shared_memory`:
- When you create a numpy ndarray from a shm object, it doesn't prevent the shm from being garbage collected. The unfortunate side effect of this is that the next time you try to access the array, you get a segfault. For another question I created a quick ndarray subclass that just attaches the shm as an attribute, so a reference sticks around and it doesn't get GC'd (the pattern that goes wrong is sketched right after this list).
- Another pitfall is that on Windows, the OS tracks when to delete the memory rather than giving you control over it. That means that even if you don't call unlink, the memory will get deleted once there are no active references to that particular segment of memory (identified by its name). The way to solve this is to make sure you keep an shm open in the main process that outlives all child processes. Calling close and unlink at the end keeps that reference around until the end, and makes sure you don't leak memory on other platforms.
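To make the first pitfall concrete, here is a minimal sketch of the pattern it warns about (the function name and block name are made up for illustration): the SharedMemory object is only referenced locally, so nothing keeps it alive after the function returns.

```python
from multiprocessing.shared_memory import SharedMemory
import numpy as np

def attach_view(name, shape, dtype):
    # hypothetical helper: attach to an existing block by name
    shm = SharedMemory(name=name)
    # the array references shm.buf, but nothing holds on to shm itself
    return np.ndarray(shape, dtype=dtype, buffer=shm.buf)

# view = attach_view('existing_block_name', (10,), 'f4')
# Once attach_view returns, the local shm reference is gone; this is the
# situation described above where a later access to view can segfault.
# The SHMArray class below avoids it by keeping shm attached to the array.
```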
```python
import numpy as np
import multiprocessing as mp
from multiprocessing.shared_memory import SharedMemory

class SHMArray(np.ndarray):  # copied from https://numpy.org/doc/stable/user/basics.subclassing.html#slightly-more-realistic-example-attribute-added-to-existing-array
    '''an ndarray subclass that holds on to a ref of shm so it doesn't get garbage collected too early.'''
    def __new__(cls, input_array, shm=None):
        obj = np.asarray(input_array).view(cls)
        obj.shm = shm
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.shm = getattr(obj, 'shm', None)

def child_func(name, shape, dtype):
    shm = SharedMemory(name=name)
    arr = SHMArray(np.ndarray(shape, buffer=shm.buf, dtype=dtype), shm)
    arr[:] += 5
    shm.close()  # be sure to clean up your shm's locally when they're not needed (referring to arr after this will segfault)

if __name__ == "__main__":
    shape = (10,)  # 1d array 10 elements long
    dtype = 'f4'   # 32 bit floats
    dummy_array = np.ndarray(shape, dtype=dtype)  # dummy array to calculate nbytes
    shm = SharedMemory(create=True, size=dummy_array.nbytes)
    arr = np.ndarray(shape, buffer=shm.buf, dtype=dtype)  # create the real arr backed by the shm
    arr[:] = 0
    print(arr)  # should print arr full of 0's

    p1 = mp.Process(target=child_func, args=(shm.name, shape, dtype))
    p1.start()
    p1.join()

    print(arr)  # should print arr full of 5's
    shm.close()   # be sure to clean up your shm's
    shm.unlink()  # call unlink when the actual memory can be deleted
```
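To tie this back to the original question: a freshly created SharedMemory block is zero-filled, and nothing from a is copied into it automatically, which is why the reader printed 0.0 instead of 100. Below is a minimal sketch (reusing the asker's dtype and variable names) of copying the initial data into the shm-backed array before handing the name to child processes.

```python
import numpy as np
from multiprocessing.shared_memory import SharedMemory

dtype_eV = np.dtype({
    'names': ['idx', 'value', 'size'],
    'formats': ['int32', 'float64', 'float64']
})

if __name__ == "__main__":
    a = np.zeros(5, dtype=dtype_eV)
    a['value'] = 100

    shm = SharedMemory(create=True, size=a.nbytes)
    c = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)
    c[:] = a  # copy the existing values into the shared block

    print(c['value'])  # [100. 100. 100. 100. 100.]

    shm.close()
    shm.unlink()
```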