multiprocessing loop over a simple list?

I have a function that calls a custom function that compares rows in a dataframe and calculates some stats. vt.make_breakpts it needs a dataframe (data), a key (unique identifier), and a datefield (date) to do it’s thing. I can run this and wait a very long time and it will go through and entire dataframe and output a dataframe of stats calculated by comparing the in a sequence (in this case date). I have a list of all unique key values are want to pass it to multiprocessing so that each item in the list is used to subset the input df and then pass that work to a processor. So I created a def function that will pass the values to the custom function.

def taska(id, data, key, date):
    cdata = data[data[key]==id]
    return vt.make_breakpts (data=cdata, key=key, date=date)

Then used functools to set the unchanging variables and an empty list to capture the results and use unique() to get a list of unique key values.

partialA = functools.partial(taska, data=pgdf, key=VID, date=PDATE)
resultList = []
vidList = list(pgdf['VESSEL_ID'].unique())

How do I pass the list values to the multicore processor and return the results from each process to the list? I used…

with Pool(14) as pool:
    for results in pool.imap_unordered(partial_task, bwedf.iterrows()):
        ResultsList.append(results[0])

.iterrows() worked because in that example I was using a dataframe, is there a similar approach for a simple list?

Answer

you just pass the list itself.

with Pool(14) as pool:
    for results in pool.imap_unordered(partial_task, vidList):
        ResultsList.append(results[0])

explaination: imap expects an iterable, both lists and df.iterrows are iterables … specifically anything that you can be put in a for loop is an iterable ie:

for i in iterable:

Advertisement

Answer