I’ve been trying to convert a list of NumPy rec.arrays into a single dataframe. The current list looks like:
[rec.array([([0.2], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.2], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.2], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.2], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.2], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.1], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.1], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.1], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.1], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.1], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.1], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.1], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.1], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.1], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.1], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.16666667], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.16666667], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.16666667], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.16666667], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.16666667], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.16666667], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.05882353], [ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ]),
            ([0.05882353], [ 1.86755799, -0.97727788,  0.95008842, -0.15135721]),
            ([0.05882353], [-0.10321885,  0.4105985 ,  0.14404357,  1.45427351]),
            ([0.05882353], [ 0.76103773,  0.12167502,  0.44386323,  0.33367433]),
            ([0.05882353], [ 1.49407907, -0.20515826,  0.3130677 , -0.85409574]),
            ([0.05882353], [-2.55298982,  0.6536186 ,  0.8644362 , -0.74216502]),
            ([0.05882353], [ 2.26975462, -1.45436567,  0.04575852, -0.18718385]),
            ([0.05882353], [ 1.53277921,  1.46935877,  0.15494743,  0.37816252]),
            ([0.05882353], [-0.88778575, -1.98079647, -0.34791215,  0.15634897]),
            ([0.05882353], [ 1.23029068,  1.20237985, -0.38732682, -0.30230275]),
            ([0.05882353], [-1.04855297, -1.42001794, -1.70627019,  1.9507754 ]),
            ([0.05882353], [-0.50965218, -0.4380743 , -1.25279536,  0.77749036]),
            ([0.05882353], [-1.61389785, -0.21274028, -0.89546656,  0.3869025 ]),
            ([0.05882353], [-0.51080514, -1.18063218, -0.02818223,  0.42833187]),
            ([0.05882353], [ 0.06651722,  0.3024719 , -0.63432209, -0.36274117]),
            ([0.05882353], [-0.67246045, -0.35955316, -0.81314628, -1.7262826 ]),
            ([0.05882353], [ 0.17742614, -0.40178094, -1.63019835,  0.46278226])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))])]
The result should be a five-column dataframe like the following:
| Weights | v_1 | v_2 | v_3 | v_4 | 
|---|---|---|---|---|
| 0.2 | 1.76405235 | 0.40015721 | 0.97873798 | 2.2408932 | 
| 0.2 | 1.86755799 | -0.97727788 | 0.95008842 | -0.15135721 | 
| … | … | … | … | … |
| 0.05882353 | 0.17742614 | -0.40178094 | -1.63019835 | 0.46278226 | 
and so on..
However, when I call pd.DataFrame(my_list), the resulting dataframe has around 90 columns instead of the 5 shown above. Each column holds one record of the form ([a], [w, x, y, z]). The resulting dataframe should have 5 columns and one row per record, which is 38 rows for the above example.
Answer
I assume your list of recarrays is stored in a variable called data. You can convert each recarray to a dataframe with pd.DataFrame and combine them with pd.concat. Then split the column of 4-element lists into four separate columns, use pandas.Series.explode to turn the single-element weight lists into scalars, and use pandas.DataFrame.pop to drop the original columns of lists.
Reading Data
import pandas as pd

# Build one DataFrame per recarray, then stack them with a fresh 0..n-1 index
df = pd.concat([pd.DataFrame(record.tolist()) for record in data], ignore_index=True)
Pre-processing and Unraveling data
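# Split the 4-element entries in column 1 into four separate columns
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist(), index=df.index)
# Remove column 0 and flatten its single-element entries into scalars
df['weights'] = df.pop(0).explode()
# Drop the original column 1
df.pop(1)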
Output
This gives us the expected output:
| | v_1 | v_2 | v_3 | v_4 | weights |
|---|---|---|---|---|---|
| 0 | 1.764052 | 0.400157 | 0.978738 | 2.240893 | 0.2 |
| 1 | 1.867558 | -0.977278 | 0.950088 | -0.151357 | 0.2 |
| 2 | -0.103219 | 0.410598 | 0.144044 | 1.454274 | 0.2 |
| 3 | 0.761038 | 0.121675 | 0.443863 | 0.333674 | 0.2 |
| 4 | 1.494079 | -0.205158 | 0.313068 | -0.854096 | 0.2 |
| 5 | 1.764052 | 0.400157 | 0.978738 | 2.240893 | 0.1 |
| 6 | 1.867558 | -0.977278 | 0.950088 | -0.151357 | 0.1 |
| 7 | -0.103219 | 0.410598 | 0.144044 | 1.454274 | 0.1 |
| 8 | 0.761038 | 0.121675 | 0.443863 | 0.333674 | 0.1 |
| 9 | 1.494079 | -0.205158 | 0.313068 | -0.854096 | 0.1 |
| 10 | -2.552990 | 0.653619 | 0.864436 | -0.742165 | 0.1 |
| 11 | 2.269755 | -1.454366 | 0.045759 | -0.187184 | 0.1 |
| 12 | 1.532779 | 1.469359 | 0.154947 | 0.378163 | 0.1 |
| 13 | -0.887786 | -1.980796 | -0.347912 | 0.156349 | 0.1 |
| 14 | 1.230291 | 1.202380 | -0.387327 | -0.302303 | 0.1 |
| 15 | 1.764052 | 0.400157 | 0.978738 | 2.240893 | 0.166667 |
| 16 | 1.867558 | -0.977278 | 0.950088 | -0.151357 | 0.166667 |
| 17 | -0.103219 | 0.410598 | 0.144044 | 1.454274 | 0.166667 |
| 18 | 0.761038 | 0.121675 | 0.443863 | 0.333674 | 0.166667 |
| 19 | 1.494079 | -0.205158 | 0.313068 | -0.854096 | 0.166667 |
| 20 | -2.552990 | 0.653619 | 0.864436 | -0.742165 | 0.166667 |
| 21 | 1.764052 | 0.400157 | 0.978738 | 2.240893 | 0.058824 |
| 22 | 1.867558 | -0.977278 | 0.950088 | -0.151357 | 0.058824 |
| 23 | -0.103219 | 0.410598 | 0.144044 | 1.454274 | 0.058824 |
| 24 | 0.761038 | 0.121675 | 0.443863 | 0.333674 | 0.058824 |
| 25 | 1.494079 | -0.205158 | 0.313068 | -0.854096 | 0.058824 |
| 26 | -2.552990 | 0.653619 | 0.864436 | -0.742165 | 0.058824 |
| 27 | 2.269755 | -1.454366 | 0.045759 | -0.187184 | 0.058824 |
| 28 | 1.532779 | 1.469359 | 0.154947 | 0.378163 | 0.058824 |
| 29 | -0.887786 | -1.980796 | -0.347912 | 0.156349 | 0.058824 |
| 30 | 1.230291 | 1.202380 | -0.387327 | -0.302303 | 0.058824 |
| 31 | -1.048553 | -1.420018 | -1.706270 | 1.950775 | 0.058824 |
| 32 | -0.509652 | -0.438074 | -1.252795 | 0.777490 | 0.058824 |
| 33 | -1.613898 | -0.212740 | -0.895467 | 0.386902 | 0.058824 |
| 34 | -0.510805 | -1.180632 | -0.028182 | 0.428332 | 0.058824 |
| 35 | 0.066517 | 0.302472 | -0.634322 | -0.362741 | 0.058824 |
| 36 | -0.672460 | -0.359553 | -0.813146 | -1.726283 | 0.058824 |
| 37 | 0.177426 | -0.401781 | -1.630198 | 0.462782 | 0.058824 |
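One detail worth checking after the explode step: pandas documents that Series.explode returns an object-dtype result, even when every list holds a single float, so the weights column may not be numeric yet. The following is a minimal sketch of that behaviour and of a cast with astype(float). The three-row Series is made up for illustration and only stands in for column 0 of the data above.

```python
import pandas as pd

# Hypothetical stand-in for column 0: one single-element list per row
weights = pd.Series([[0.2], [0.2], [0.1]])

# explode() yields one scalar per row and keeps the original index
exploded = weights.explode()
print(exploded.dtype)   # object

# Cast to float so arithmetic and aggregations treat the column as numeric
numeric = exploded.astype(float)
print(numeric.dtype)    # float64
```

In the answer's code this would amount to appending .astype(float) to the explode() call, if a numeric weights column is needed.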
Alternatively
The same result can be obtained with np.hstack, where data is your list of recarrays.
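import numpy as np
import pandas as pd

# Concatenate all recarrays into one array, then build a two-column DataFrame (one row per record)
df = pd.DataFrame(np.hstack(data).tolist())
df['weights'] = df[0].explode()
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist())
# Drop the two original columns
df = df.drop(columns=[0, 1])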
Output
This gives us the same output:
| | weights | v_1 | v_2 | v_3 | v_4 |
|---|---|---|---|---|---|
| 0 | 0.2 | 1.764052 | 0.400157 | 0.978738 | 2.240893 |
| 1 | 0.2 | 1.867558 | -0.977278 | 0.950088 | -0.151357 |
| 2 | 0.2 | -0.103219 | 0.410598 | 0.144044 | 1.454274 |
| 3 | 0.2 | 0.761038 | 0.121675 | 0.443863 | 0.333674 |
| 4 | 0.2 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
| 5 | 0.1 | 1.764052 | 0.400157 | 0.978738 | 2.240893 |
| 6 | 0.1 | 1.867558 | -0.977278 | 0.950088 | -0.151357 |
| 7 | 0.1 | -0.103219 | 0.410598 | 0.144044 | 1.454274 |
| 8 | 0.1 | 0.761038 | 0.121675 | 0.443863 | 0.333674 |
| 9 | 0.1 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
| 10 | 0.1 | -2.552990 | 0.653619 | 0.864436 | -0.742165 |
| 11 | 0.1 | 2.269755 | -1.454366 | 0.045759 | -0.187184 |
| 12 | 0.1 | 1.532779 | 1.469359 | 0.154947 | 0.378163 |
| 13 | 0.1 | -0.887786 | -1.980796 | -0.347912 | 0.156349 |
| 14 | 0.1 | 1.230291 | 1.202380 | -0.387327 | -0.302303 |
| 15 | 0.166667 | 1.764052 | 0.400157 | 0.978738 | 2.240893 |
| 16 | 0.166667 | 1.867558 | -0.977278 | 0.950088 | -0.151357 |
| 17 | 0.166667 | -0.103219 | 0.410598 | 0.144044 | 1.454274 |
| 18 | 0.166667 | 0.761038 | 0.121675 | 0.443863 | 0.333674 |
| 19 | 0.166667 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
| 20 | 0.166667 | -2.552990 | 0.653619 | 0.864436 | -0.742165 |
| 21 | 0.058824 | 1.764052 | 0.400157 | 0.978738 | 2.240893 |
| 22 | 0.058824 | 1.867558 | -0.977278 | 0.950088 | -0.151357 |
| 23 | 0.058824 | -0.103219 | 0.410598 | 0.144044 | 1.454274 |
| 24 | 0.058824 | 0.761038 | 0.121675 | 0.443863 | 0.333674 |
| 25 | 0.058824 | 1.494079 | -0.205158 | 0.313068 | -0.854096 |
| 26 | 0.058824 | -2.552990 | 0.653619 | 0.864436 | -0.742165 |
| 27 | 0.058824 | 2.269755 | -1.454366 | 0.045759 | -0.187184 |
| 28 | 0.058824 | 1.532779 | 1.469359 | 0.154947 | 0.378163 |
| 29 | 0.058824 | -0.887786 | -1.980796 | -0.347912 | 0.156349 |
| 30 | 0.058824 | 1.230291 | 1.202380 | -0.387327 | -0.302303 |
| 31 | 0.058824 | -1.048553 | -1.420018 | -1.706270 | 1.950775 |
| 32 | 0.058824 | -0.509652 | -0.438074 | -1.252795 | 0.777490 |
| 33 | 0.058824 | -1.613898 | -0.212740 | -0.895467 | 0.386902 |
| 34 | 0.058824 | -0.510805 | -1.180632 | -0.028182 | 0.428332 |
| 35 | 0.058824 | 0.066517 | 0.302472 | -0.634322 | -0.362741 |
| 36 | 0.058824 | -0.672460 | -0.359553 | -0.813146 | -1.726283 |
| 37 | 0.058824 | 0.177426 | -0.401781 | -1.630198 | 0.462782 |
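Because both fields are plain float subarrays, another option may be to skip the round-trip through Python lists and read the fields by name instead. This is a sketch under stated assumptions, not part of the answer above: the helper make_rec and the random values are made up for illustration, and only the dtype is taken from the question.

```python
import numpy as np
import pandas as pd

# dtype copied from the question; everything else here is illustrative
DTYPE = [("weights", "<f8", (1,)), ("integration", "<f8", (4,))]
rng = np.random.default_rng(0)

def make_rec(weight, n_rows):
    """Hypothetical helper: build a recarray with a constant weight."""
    arr = np.zeros(n_rows, dtype=DTYPE)
    arr["weights"] = weight                                # broadcast to shape (n_rows, 1)
    arr["integration"] = rng.standard_normal((n_rows, 4))  # shape (n_rows, 4)
    return arr.view(np.recarray)

# Stand-in for the list of recarrays from the question
data = [make_rec(0.2, 5), make_rec(0.1, 10)]

# Each field is already a 2-D float array, so join them column-wise per recarray
blocks = [np.column_stack([rec["weights"], rec["integration"]]) for rec in data]

# Stack all blocks row-wise and name the five columns
df = pd.DataFrame(np.vstack(blocks), columns=["weights", "v_1", "v_2", "v_3", "v_4"])
print(df.shape)   # (15, 5)
```

With this route all five columns come out as float64, so no explode or cast is needed.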