I have written a function create_time_series(input_df1, info_df1, unit_name, start_date, end_date), which aims to create a time series based on log files saved in input_df1.
The problem is that my function executes slowly, so I thought of parallelizing it.
The following code is my attempt at utilizing the multiprocessing library:
    if __name__ == '__main__':
        arg = corrected_data,block_info,(unit for unit in block_info.UnitID.unique()),"2015-01-01","2021-12-31"
        with Pool(processes = 16) as pool:
            temp_data = pool.starmap(create_time_series,arg)
        out_data = pd.concat([out_data,temp_data[unit]],axis =1)
In the task manager, I can see the processes running; however, they seem to be idling. Hence my question: what did I do wrong in attempting to parallelize the task?
Answer
You are not splitting your load: you give the process pool a single argument tuple (arg) instead of one tuple per unit. Check the documentation for starmap: it expects an iterable (e.g. a list) of tuples, each of which holds all the arguments required for one call.
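As a minimal sketch of what that could look like: the create_time_series below is only a hypothetical stand-in that echoes its inputs, build_args is a helper name invented for this example, and the unit list and None placeholders replace the real block_info.UnitID.unique(), corrected_data and block_info from the question.

```python
from multiprocessing import Pool


def create_time_series(input_df1, info_df1, unit_name, start_date, end_date):
    # Hypothetical stand-in for the real function; it only echoes its inputs.
    return (unit_name, start_date, end_date)


def build_args(input_df1, info_df1, units, start_date, end_date):
    # Hypothetical helper: one tuple per unit, each holding every argument
    # needed for a single create_time_series call.
    return [(input_df1, info_df1, unit, start_date, end_date) for unit in units]


if __name__ == "__main__":
    # Placeholder inputs; in the question these would be corrected_data,
    # block_info and block_info.UnitID.unique().
    corrected_data, block_info = None, None
    units = ["A", "B", "C"]

    args = build_args(corrected_data, block_info, units, "2015-01-01", "2021-12-31")
    with Pool(processes=2) as pool:
        # starmap unpacks each tuple into one create_time_series(...) call.
        results = pool.starmap(create_time_series, args)
    print(results)
```

starmap returns the results as a list in the same order as the argument tuples, so if the real function returns a Series or DataFrame per unit, the whole list can be combined with a single pd.concat(results, axis=1) after the pool has finished. Note that every tuple carries its own copy of the two data frames to the worker, so with large inputs the transfer cost may eat into the speed-up.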