I’m new to Python and I have a concurrency problem when calling functions from an imported library. My code computes several kinds of variables and, in a final step, saves them to different files; I hit the same problem whether I read or write.
This is an example that works because it runs sequentially:
import xarray as xr

def read_concurrent_files(self):
    files_var_type1 = self.get_files('type1', '20200101', '20200127')
    files_var_type2 = self.get_files('type2', '20200101', '20200127')
    files_var_type3 = self.get_files('type3', '20200101', '20200127')

def get_files(self, varType, dateini, datefin):
    # get_file_list returns a list of file paths for this variable type and date range
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
                                  combine='nested', concat_dim='time',
                                  decode_coords=False, parallel=True)
    return files_raw
But when I change the code to run concurrently, it fails:
import xarray as xr
from multiprocessing.pool import ThreadPool

def read_concurrent_files(self):
    pool = ThreadPool(processes=3)
    async_result1 = pool.apply_async(self.get_files, ('type1', '20200101', '20200127'))
    async_result2 = pool.apply_async(self.get_files, ('type2', '20200101', '20200127'))
    async_result3 = pool.apply_async(self.get_files, ('type3', '20200101', '20200127'))
    files_var_type1 = async_result1.get()
    files_var_type2 = async_result2.get()
    files_var_type3 = async_result3.get()

def get_files(self, varType, dateini, datefin):
    # get_file_list returns a list of file paths for this variable type and date range
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
                                  combine='nested', concat_dim='time',
                                  decode_coords=False, parallel=True)
    return files_raw
The problem is in the xr.open_mfdataset call, which does not seem to be thread-safe (or so I think).
Is there a way to confine the imported library to the method’s scope only?
I come from other languages where this was easy: create the instance inside the method, or use thread-safe objects.
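To make that concrete: Python does allow an import inside a method, but as far as I understand the module is cached in sys.modules, so every call (and every thread) still shares the same module object. A minimal sketch of what I mean:

import sys

def get_files_local_import():
    # The import statement may live inside the function body...
    import xarray as xr
    # ...but the module object is cached process-wide, so all callers
    # (and all threads) still share this same instance. A method-scoped
    # import therefore does not make a non-thread-safe library safe.
    assert sys.modules['xarray'] is xr
    return xr.__version__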
Thanks a lot in advance!!
Answer
As I’m new to Python, I was unaware of the different kinds of worker pools we can create. In my example above I was using a ThreadPool, whose threads are constrained by the GIL (Global Interpreter Lock), so to avoid that we can use a pool of processes instead. Here is an example:
import os
import concurrent.futures
import xarray as xr

def get_xarray(self):
    tasks = []
    cpu_count = os.cpu_count()
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count) as executor:
        for i in range(len(self.files)):
            tasks.append(executor.submit(self.get_xarray_by_file, self.files[i]))

    results = []
    for task in tasks:
        results.append(task.result())

    era_raw = xr.merge(results, compat='override')
    return era_raw.persist().load()

def get_xarray_by_file(self, files):
    # Each worker process opens its own subset of the GRIB files
    era_raw = xr.open_mfdataset(files, engine='cfgrib',
                                combine='nested', concat_dim='time',
                                decode_coords=False, parallel=True)
    return era_raw.persist().load()
In this case we use the ProcessPoolExecutor. From the Python documentation:
The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
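As a minimal illustration of that picklability constraint (my sketch, not part of the original answer): the callable you submit must be defined at module level so it can be pickled and sent to a worker process.

import concurrent.futures

def square(x):
    # A module-level function is picklable, so worker processes can run it.
    return x * x

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(list(executor.map(square, range(4))))  # [0, 1, 4, 9]
    # By contrast, submitting a lambda would fail when its result is
    # retrieved, because lambdas cannot be pickled for transfer to a worker.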
Now we can read GRIB2 files in parallel, or create NetCDF or CSV files from a dataframe, with true parallelism.
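For example, the same pattern can write one output file per worker process. A rough sketch (write_csv and the output paths are hypothetical, not from the original code):

import concurrent.futures
import pandas as pd

def write_csv(path):
    # Hypothetical helper: build a small DataFrame and save it as CSV.
    df = pd.DataFrame({'value': range(10)})
    df.to_csv(path, index=False)
    return path

if __name__ == '__main__':
    paths = [f'out_{i}.csv' for i in range(4)]  # hypothetical output names
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each file is built and written in its own process, side-stepping the GIL.
        for written in executor.map(write_csv, paths):
            print('wrote', written)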