How can I parallelize a function with multiple arguments?

I have written a function create_time_series(input_df1, info_df1, unit_name, start_date, end_date) that creates a time series from the log files stored in input_df1. The function is slow to execute, so I thought of parallelizing it.

The following code is my attempt at utilizing the multiprocessing library:

if __name__ == '__main__':
    arg = corrected_data, block_info, (unit for unit in block_info.UnitID.unique()), "2015-01-01", "2021-12-31"
    with Pool(processes=16) as pool:
        temp_data = pool.starmap(create_time_series, arg)
        out_data = pd.concat([out_data, temp_data[unit]], axis=1)

In the task manager I can see the processes running, but they seem to be idle. Hence my question: what did I do wrong in attempting to parallelize the task?

Answer

You are not splitting the load: arg is a single set of arguments rather than one set per unit, so the pool has no per-unit tasks to distribute. Check the documentation for starmap: it expects an iterable (e.g. a list) of tuples, each of which contains all the arguments required for one call.
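
As a rough sketch of what that could look like, the snippet below builds one argument tuple per unit and hands the list to starmap. The body of create_time_series here is a hypothetical stand-in that only returns a label, and build_args and run_parallel are helper names made up for this illustration; the placeholder inputs stand in for corrected_data, block_info and block_info.UnitID.unique() from the question.

```python
from multiprocessing import Pool


def create_time_series(input_df1, info_df1, unit_name, start_date, end_date):
    # Hypothetical stand-in for the question's function: it only labels the
    # unit so the example runs without the real log data.
    return f"{unit_name}: {start_date} to {end_date}"


def build_args(input_df1, info_df1, unit_names, start_date, end_date):
    # One tuple per unit; each tuple carries every argument for a single call.
    return [(input_df1, info_df1, unit, start_date, end_date)
            for unit in unit_names]


def run_parallel(args, processes=4):
    # starmap unpacks each tuple into create_time_series(*tuple) and returns
    # the results in the same order as args.
    with Pool(processes=processes) as pool:
        return pool.starmap(create_time_series, args)


if __name__ == "__main__":
    # Placeholder inputs (assumptions); in the question these would be the
    # DataFrames corrected_data and block_info.
    corrected_data, block_info = {"log": []}, {"UnitID": ["A", "B", "C"]}
    args = build_args(corrected_data, block_info, block_info["UnitID"],
                      "2015-01-01", "2021-12-31")
    print(run_parallel(args))
```

With real DataFrames, the returned list could then be combined once after the pool has finished, for example with pd.concat(results, axis=1), instead of indexing temp_data[unit] inside the with block. Keep in mind that every tuple is pickled and sent to a worker, so passing large DataFrames in each tuple adds overhead.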

Answer