This is probably a trivial question, but how do I parallelize the following loop in Python?
# setup output lists
output1 = list()
output2 = list()
output3 = list()
for j in range(0, 10):
    # calc individual parameter value
    parameter = j * offset
    # call the calculation
    out1, out2, out3 = calc_stuff(parameter = parameter)
    # put results into correct output list
    output1.append(out1)
    output2.append(out2)
    output3.append(out3)
I know how to start single threads in Python but I don’t know how to “collect” the results.
Multiple processes would be fine too – whatever is easiest for this case. I’m currently using Linux, but the code should run on Windows and Mac as well.
What’s the easiest way to parallelize this code?
Answer
Using multiple threads on CPython won’t give you better performance for pure-Python code due to the global interpreter lock (GIL).  I suggest using the multiprocessing module instead:
import multiprocessing
pool = multiprocessing.Pool(4)
# range() needs an integer offset here
out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))
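Note that this won’t work in the interactive interpreter: the worker processes must be able to import calc_stuff, so define it in a module or script file. On Windows and macOS, where worker processes are spawned rather than forked, the pool code also has to sit under an if __name__ == "__main__": guard.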
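For reference, here is a minimal sketch of the same idea as a complete script that should run unchanged on Linux, Windows and macOS. The body of calc_stuff is a made-up placeholder (the question doesn’t show the real one), and build_parameters and run_parallel are hypothetical helper names introduced only for this illustration. The parameters are built with a list comprehension instead of range(), so a non-integer offset would work too.

```python
import multiprocessing


def calc_stuff(parameter):
    # Hypothetical stand-in for the real calculation; returns three values.
    return parameter, parameter ** 2, parameter ** 3


def build_parameters(offset, count=10):
    # Same values as `j * offset` in the original loop; also works for float offsets.
    return [j * offset for j in range(count)]


def run_parallel(offset, count=10, processes=4):
    parameters = build_parameters(offset, count)
    # The context manager shuts the pool down once the results are in.
    with multiprocessing.Pool(processes) as pool:
        results = pool.map(calc_stuff, parameters)  # order matches `parameters`
    # Transpose [(o1, o2, o3), ...] into three sequences.
    out1, out2, out3 = zip(*results)
    return list(out1), list(out2), list(out3)


if __name__ == "__main__":
    # The guard keeps spawned worker processes from re-running this block.
    output1, output2, output3 = run_parallel(offset=2)
    print(output1)
    print(output2)
    print(output3)
```

pool.map returns the results in the same order as the inputs, so output1, output2 and output3 line up with what the sequential loop would have produced.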
To avoid the usual FUD around the GIL: there wouldn’t be any advantage to using threads for this example anyway. You want to use processes here, not threads, because each process has its own interpreter and memory, which avoids a whole bunch of problems with shared state.