Running two dask-ml imputers simultaneously instead of sequentially

I can impute the mean and the most frequent value using dask-ml like so, and this works fine:

But what if I have 100 million rows of data? It seems that dask would do two loops when it could have done only one. Is it possible to run both imputers simultaneously and/or in parallel instead of sequentially? What would be sample code to achieve that?

Answer

You can use dask.delayed, as suggested in the docs and the Dask Tutorial, to parallelise the computation if the entities are independent of one another.

Your code would look like:

The c object is a lazy Delayed object. It holds everything needed to compute the final result, including references to all of the required functions, their inputs, and their relationships to one another.
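The original snippet is not preserved here, so the following is only a minimal sketch of the dask.delayed pattern described above, not the answerer's exact code. It assumes a small in-memory pandas DataFrame with hypothetical columns `num` and `cat`, and it uses plain pandas helpers (`impute_mean`, `impute_most_frequent`, `combine`, all made-up names) as stand-ins for the dask-ml imputers. The name `c` is chosen only to match the sentence above.

```python
import dask
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "num": [1.0, 2.0, np.nan, 6.0],
    "cat": ["a", "b", "a", None],
})

@dask.delayed
def impute_mean(column):
    # Stand-in for a mean imputer: fill gaps with the column mean.
    return column.fillna(column.mean())

@dask.delayed
def impute_most_frequent(column):
    # Stand-in for a most-frequent imputer: fill gaps with the mode.
    return column.fillna(column.mode().iloc[0])

@dask.delayed
def combine(num, cat):
    # Join the two independently imputed columns back together.
    return pd.DataFrame({"num": num, "cat": cat})

# Nothing has run yet: c is a lazy Delayed object describing the task graph.
c = combine(impute_mean(df["num"]), impute_most_frequent(df["cat"]))

# The two imputation tasks do not depend on each other, so the scheduler
# is free to run them concurrently when the graph is computed.
result = c.compute()
print(result)
```

Whether this actually saves time on 100 million rows depends on the scheduler and on how the data is loaded. Each task here still makes its own pass over its column, so treat this as an illustration of the pattern rather than a benchmarked solution.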
