
Fast shaping of multiprocessing return values in Python

I have a function with list-valued return values that I'm running under Python's multiprocessing, and I need to concatenate the results into 1D lists at the end. The following is sample code for demonstration:



The problem is that the list L I'm processing is very large, so the concatenations at the end take a huge amount of time, which considerably reduces the advantage over serial processing.

Is there some clever way to avoid the concatenation, or alternatively a faster method to perform it? I've been fiddling with queues, but that seems very slow.

Note: This seems to be a similar question to Add result from multiprocessing into array.


Answer

If the desired output is an input suitable for creating a scipy.sparse.coo_matrix, I would take a very different approach: don't return anything; instead, create shared objects that each process can modify directly.
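For reference, a coo_matrix is built from exactly three parallel arrays; the values below are made up for illustration:

```python
import numpy as np
from scipy.sparse import coo_matrix

# Three parallel arrays: data values, row indices, column indices.
data = np.array([1.0, 2.0, 3.0])
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])

# coo_matrix((data, (rows, cols))) places data[k] at (rows[k], cols[k]).
m = coo_matrix((data, (rows, cols)), shape=(3, 3))
print(m.toarray())
```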

What you need to create a coo_matrix is an array of the data values, an array of the row indices, and an array of the column indices (unless you already have another sparse or dense matrix). I would create three shared arrays that each process can write results into directly, using the index of each entry from L. This even allows out-of-order execution, so you can use imap_unordered instead for better speed:


By the way: you should always guard your multiprocessing code with if __name__ == "__main__":. It is recommended everywhere and required on Windows.

User contributions licensed under: CC BY-SA