Getting very slow iterations in a loop over a DataArray using Xarray and Dask

I am trying to calculate wind speed from the u and v components for a total of 40 years of data, processed one year at a time, at an hourly timestep and 0.1 x 0.1 degree resolution. The individual u and v NetCDF files for one year are about 5 GB each. I have implemented a basic for loop. For each year, it opens the u and v NetCDF files with Xarray's open_dataset and rechunks them so that they become Dask arrays. It then calculates the wind speed and exports the result as a new NetCDF file. When the loop runs, the first iteration completes almost instantaneously, but the next iteration takes so long that the loop appears to have stalled. I do not understand which part of my code is the bottleneck, or why. Any help would be appreciated. I have also configured the Dask scheduler to request resources adaptively. I am attaching the relevant code snippet for reference:

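The wind-speed step itself is just the magnitude of the (u, v) vector, sqrt(u² + v²). The following is a hypothetical sketch, not the snippet referred to above. It shows that when u and v are Dask-backed, this arithmetic stays lazy, so nothing is read or computed until something like .compute() or .to_netcdf() forces it. Small synthetic arrays stand in for the real yearly files, and the output name wind_speed is an assumption. With real files, passing chunks= to xr.open_dataset gives Dask-backed arrays directly.

```python
import numpy as np
import xarray as xr


def wind_speed(u: xr.DataArray, v: xr.DataArray) -> xr.DataArray:
    """Wind speed as the magnitude of the (u, v) vector: sqrt(u**2 + v**2).

    If u and v are dask-backed, the result is lazy too: no data is read or
    computed until .compute(), .load() or .to_netcdf() is called.
    """
    ws = np.sqrt(u ** 2 + v ** 2)
    ws.name = "wind_speed"  # hypothetical output variable name
    # Arithmetic may drop attrs depending on xarray's keep_attrs option,
    # so set the units explicitly from the input.
    ws.attrs["units"] = u.attrs.get("units", "")
    return ws


# Tiny synthetic stand-ins for one year of hourly u/v fields, chunked along time.
dims = ("time", "lat", "lon")
u = xr.DataArray(np.full((4, 2, 3), 3.0), dims=dims, attrs={"units": "m s-1"}).chunk({"time": 2})
v = xr.DataArray(np.full((4, 2, 3), 4.0), dims=dims, attrs={"units": "m s-1"}).chunk({"time": 2})

ws = wind_speed(u, v)
print(type(ws.data))              # a dask array: nothing has been computed yet
print(float(ws.max().compute()))  # 5.0
```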

Answer

As written, your code is still serial rather than parallel. Specifically, wind_speed.to_netcdf(w_dir) triggers the computation right away, so each year is computed and written in full before the loop moves on to the next one. The code below may need some adjustment, but the main point is to parallelise your operations:

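Here is a minimal sketch of that idea under stated assumptions. A hypothetical make_year helper builds small synthetic Dask-backed datasets in place of opening each year's u and v files, and a NetCDF backend such as netCDF4 or scipy is assumed to be installed. Each year's wind-speed graph is built lazily. Calling to_netcdf(..., compute=False) returns a Dask Delayed object instead of writing immediately. A single dask.compute call then runs all the writes together, so the scheduler can work on chunks from several years at once.

```python
import os
import tempfile

import dask
import numpy as np
import xarray as xr


def make_year(seed: int) -> xr.Dataset:
    """Hypothetical stand-in for lazily opening one year's u and v files.

    Real code would use something like xr.open_dataset(path, chunks={...});
    here a tiny synthetic, dask-backed dataset keeps the sketch self-contained.
    """
    rng = np.random.default_rng(seed)
    shape = (24, 5, 6)  # (time, lat, lon), deliberately tiny
    dims = ("time", "lat", "lon")
    u = xr.DataArray(rng.normal(size=shape), dims=dims)
    v = xr.DataArray(rng.normal(size=shape), dims=dims)
    return xr.Dataset({"u": u, "v": v}).chunk({"time": 6})


def delayed_write(ds: xr.Dataset, path: str):
    """Build the wind-speed graph and return a delayed write (nothing runs yet)."""
    ws = np.sqrt(ds["u"] ** 2 + ds["v"] ** 2).rename("wind_speed")
    # compute=False makes to_netcdf return a dask Delayed instead of computing now
    return ws.to_netcdf(path, compute=False)


out_dir = tempfile.mkdtemp()
years = [2000, 2001, 2002]  # hypothetical subset of the 40 years
writes = [delayed_write(make_year(y), os.path.join(out_dir, f"ws_{y}.nc"))
          for y in years]

# One call schedules every year's graph together instead of one year at a time.
dask.compute(*writes)
print(sorted(os.listdir(out_dir)))  # ['ws_2000.nc', 'ws_2001.nc', 'ws_2002.nc']
```

If a dask.distributed Client is active, as with the adaptive scheduler mentioned in the question, dask.compute uses it by default.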