I have a list of dask delayed objects, Portfolio_perfs:

    type(Portfolio_perfs)
    <class 'list'>
    # print until 3
    Portfolio_perfs[:3]
    [Delayed('getitem-b7fd8629e2a0ecfe4e61ae6f39926140'), Delayed('getitem-af3225459229d541b73dc79319edaec2'), Delayed('getitem-0555389e6dd01031de85e293b8c42b85')]

Each delayed object evaluates to a numpy array of length 2:

    Portfolio_perfs[0].compute()
    array([0.75620425, 0.1835988 ])

I want to build the following dataframe without using dask.compute:

    pd.DataFrame(dask.compute(*Portfolio_perfs))
                0         1
    0    0.756204  0.183599
    1    0.825101  0.195705
    2    0.792804  0.189422
    3    0.786267  0.178194
    4    0.860377  0.220204
    ..
    595  0.636857  0.139955
    596  0.925144  0.218462
    597  0.925077  0.213963
    598  0.922016  0.206081
    599  0.770950  0.170273

    [600 rows x 2 columns]

How can I build this dask dataframe without going through dask.compute?
Thank you
Answer
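Since each delayed object is a numpy array, you are interested in da.from_delayed(). It wraps a single delayed object and needs the shape and dtype up front, because dask cannot inspect the result without computing it. Wrap each element and stack the results into one lazy array:

    import dask.array as da
    dask_array = da.stack([da.from_delayed(p, shape=(2,), dtype=float) for p in Portfolio_perfs])
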
Alternatively, it’s possible to convert each numpy array to a pandas dataframe (inside a delayed function, so that nothing is computed early) and then pass the resulting list of delayed dataframes to:

    dd.from_delayed()

Note that it’s not possible to do it with pd.DataFrame because pandas will not know what to do with the delayed objects, so you will need to use dask.dataframe for this task.