Skip to content
Advertisement

Process a lot of data without waiting for a chunk to finish

I am confused with map, imap, apply_async, apply, Process etc from the multiprocessing python package.

What I would like to do:

I have 100 simulation script files that need to be run through a simulation program. I would like python to run as many as it can in parallel, then as soon as one is finished, grab a new script and run that one. I don’t want any waiting.

Here is a demo code:

JavaScript

However, I don’t think that map is the correct way here since I want the PC not to wait for the batch to be finished but immediately proceed to the next simulation file.

The code will run mp.cpu_count()-1 simulations at the same time and then wait for every one of them to be finished, before starting a new batch of size mp.cpu_count()-1 . I don’t want the code to wait, but just to grab a new simulation file as soon as possible.

enter image description here

Do you have any advice on how to code it better?

Some clarifications:

I am reducing the pool to one less than the CPU count because I don’t want to block the PC. I still need to do light work while the code is running.

Advertisement

Answer

It works correctly using map. The trouble is simply that you sleep all thread for 5 seconds, so they all finish at the same time.

Try this code to see the effect correctly:

JavaScript
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement