Tag: dask

High memory allocation when using dask.bag.map

I am using Dask to extend Dask bag items with information from an external, previously computed object, arg. Dask seems to allocate memory for arg for every partition at once, at the beginning of the computation. Is there a workaround to prevent Dask from duplicating arg multiple times (and allocat…
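One possible workaround, sketched here under assumptions rather than taken from the original thread: wrap the shared object in `dask.delayed` before passing it to `Bag.map`, so the task graph refers to it by a single key. The names `extend`, `lookup`, and `shared`, and the small dictionary standing in for `arg`, are hypothetical. Whether this reduces memory use in practice depends on the scheduler and on how large `arg` is.

```python
import dask
import dask.bag as db

# Hypothetical stand-in for the large, previously computed object.
arg = {i: i * i for i in range(1000)}


def extend(item, lookup):
    # Attach information from the shared lookup to a single bag item.
    return {"value": item, "extra": lookup[item]}


# Wrapping the object in dask.delayed stores it under one key in the
# task graph; each map task then depends on that key.
shared = dask.delayed(arg)

bag = db.from_sequence(range(10), npartitions=4)

# The synchronous scheduler is used only to keep this sketch deterministic.
result = bag.map(extend, shared).compute(scheduler="synchronous")
print(result[:3])
```

With the distributed scheduler, scattering the object to the workers ahead of time is another option worth trying.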

Dealing with huge pandas data frames

I have a huge database (500 GB or so) and was able to put it in pandas. The database contains something like 39705210 observations. As you can imagine, Python has a hard time even opening it. Now I am trying to use Dask to export it to CSV in 20 partitions like this: However, when I am trying to…

Dask Df convert All Dtype using dictionary

Is there an easy, equivalent way to convert all columns in a Dask df (converted from a pandas df) using a dictionary? I have a dictionary as follows: and would like to convert the pandas|Dask df dtypes all at once to the suggested dtypes in the dictionary. Answer: Not sure if I understand the question correctly,…