
How to pass variables from a parent to a subprocess in Python?

I am trying to have a parent Python script send variables to a child script to help me speed up and automate video analysis.

I am now using the subprocess.Popen() call to start up 6 instances of a child script, but cannot find a way to pass variables and modules already imported in the parent to the child. For example, the parent file would have:

import os
import subprocess
import sys

parent_dir = os.path.realpath(sys.argv[0])
subprocess.Popen([sys.executable, 'analysis.py'])

but then the imports and parent_dir have to be repeated in “analysis.py”. Is there a way to pass them to the child?

In short, what I am trying to achieve is: I have a folder with a couple hundred video files. I want the parent Python script to list the video files and start up to 6 parallel instances of an analysis script that each analyse one video file. Once there are no more files to be analysed, the parent script stops.


Answer

The simple answer here is: don’t use subprocess.Popen, use multiprocessing.Process. Or, better yet, multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor.

With subprocess, your program’s Python interpreter doesn’t know anything about the subprocess at all; for all it knows, the child process is running Doom. So there’s no way to directly share information with it.* But with multiprocessing, Python controls launching the subprocess and getting everything set up so that you can share data as conveniently as possible.

Unfortunately “as conveniently as possible” still isn’t 100% as convenient as all being in one process. But what you can do is usually good enough. Read the section on Exchanging objects between processes and the following few sections; hopefully one of those mechanisms will be exactly what you need.
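
For instance, here’s a minimal sketch of handing a value to a child through a multiprocessing.Queue (the filename is made up, purely to show the mechanism):

import multiprocessing

def worker(q):
    # The child reads whatever the parent put on the queue.
    path = q.get()
    print('child got', path)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    q.put('clip_001.mp4')  # made-up filename, just for illustration
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    p.join()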

But, as I implied at the top, in most cases you can make it even simpler, by using a pool. Instead of thinking about “running 6 processes and sharing data with them”, just think about it as “running a bunch of tasks on a pool of 6 processes”. A task is basically just a function—it takes arguments, and returns a value. If the work you want to parallelize fits into that model—and it sounds like your work does—life is as simple as could be. For example:

import multiprocessing
import os
import sys

import analysis

parent_dir = os.path.realpath(sys.argv[0])

# The folder that holds the video files (here taken from the command line).
folderpath = sys.argv[1]

paths = [os.path.join(folderpath, file)
         for file in os.listdir(folderpath)]

with multiprocessing.Pool(processes=6) as pool:
    results = pool.map(analysis.analyze, paths)

If you’re using Python 3.2 or earlier (including 2.7), you can’t use a Pool in a with statement. I believe you want this:**

pool = multiprocessing.Pool(processes=6)
try:
    results = pool.map(analysis.analyze, paths)
finally:
    pool.close()
    pool.join()

This will start up 6 processes,*** then tell the first one to do analysis.analyze(paths[0]), the second to do analysis.analyze(paths[1]), etc. As soon as any of the processes finishes, the pool will give it the next path to work on. When they’re all finished, you get back a list of all the results.****

Of course this means that the top-level code that lived in analysis.py has to be moved into a function def analyze(path): so you can call it. Or, even better, you can move that function into the main script, instead of a separate file, if you really want to save that import line.
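
A rough sketch of what that might look like (the placeholder body just returns the file size so the example runs; your real analysis code goes there instead):

# analysis.py -- hypothetical sketch; replace the body with the real video analysis
import os

def analyze(path):
    # Placeholder "analysis": return the file's size so the sketch actually runs.
    return path, os.path.getsize(path)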


* You can still indirectly share information by, e.g., marshaling it into some interchange format like JSON and passing it via the stdin/stdout pipes, a file, a shared memory segment, a socket, etc., but multiprocessing effectively wraps that up for you to make it a whole lot easier.
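
As a rough illustration of that manual route (assuming a child script that reads a JSON object from stdin; the values here are made up):

import json
import subprocess
import sys

# Parent side: marshal the data to JSON and hand it to the child over its stdin pipe.
payload = json.dumps({'parent_dir': '/some/dir', 'path': 'clip_001.mp4'})
proc = subprocess.Popen([sys.executable, 'analysis.py'],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate(payload.encode())
# The child would json.load(sys.stdin) and print its result to stdout.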

** There are different ways to shut a pool down, and you can also choose whether or not to join it immediately, so you really should read up on the details at some point. But when all you’re doing is calling pool.map, it really doesn’t matter; the pool is guaranteed to shut down and be ready to join nearly instantly by the time the map call returns.

*** I’m not sure why you wanted 6; most machines have 4, 8, or 16 cores, not 6; why not use them all? The best thing to do is usually to just leave out the processes=6 entirely and let multiprocessing ask your OS how many cores to use, which means it’ll still run at full speed on your new machine with twice as many cores that you’ll buy next year.
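
In other words, just write this (same call as above, relying on the earlier imports and paths):

# With no processes argument, Pool uses as many workers as os.cpu_count() reports.
with multiprocessing.Pool() as pool:
    results = pool.map(analysis.analyze, paths)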

**** This is slightly oversimplified; usually the pool will give the first process a batch of files, not one at a time, to save a bit of overhead, and you can manually control the batching if you need to optimize things or sequence them more carefully. But usually you don’t care, and this oversimplification is fine.
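
If you do want to control the batching, pool.map takes a chunksize argument (the value here is arbitrary):

# Hand each worker 4 paths at a time instead of letting the pool pick the chunk size.
results = pool.map(analysis.analyze, paths, chunksize=4)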
