I’ve been trying to convert a list of numpy rec.arrays into a single dataframe. The list currently looks like:
```
[rec.array([([0.2], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
            ([0.2], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
            ([0.2], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
            ([0.2], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
            ([0.2], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.1], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
            ([0.1], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
            ([0.1], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
            ([0.1], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
            ([0.1], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
            ([0.1], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502]),
            ([0.1], [ 2.26975462, -1.45436567, 0.04575852, -0.18718385]),
            ([0.1], [ 1.53277921, 1.46935877, 0.15494743, 0.37816252]),
            ([0.1], [-0.88778575, -1.98079647, -0.34791215, 0.15634897]),
            ([0.1], [ 1.23029068, 1.20237985, -0.38732682, -0.30230275])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.16666667], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
            ([0.16666667], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
            ([0.16666667], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
            ([0.16666667], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
            ([0.16666667], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
            ([0.16666667], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))]),
 rec.array([([0.05882353], [ 1.76405235, 0.40015721, 0.97873798, 2.2408932 ]),
            ([0.05882353], [ 1.86755799, -0.97727788, 0.95008842, -0.15135721]),
            ([0.05882353], [-0.10321885, 0.4105985 , 0.14404357, 1.45427351]),
            ([0.05882353], [ 0.76103773, 0.12167502, 0.44386323, 0.33367433]),
            ([0.05882353], [ 1.49407907, -0.20515826, 0.3130677 , -0.85409574]),
            ([0.05882353], [-2.55298982, 0.6536186 , 0.8644362 , -0.74216502]),
            ([0.05882353], [ 2.26975462, -1.45436567, 0.04575852, -0.18718385]),
            ([0.05882353], [ 1.53277921, 1.46935877, 0.15494743, 0.37816252]),
            ([0.05882353], [-0.88778575, -1.98079647, -0.34791215, 0.15634897]),
            ([0.05882353], [ 1.23029068, 1.20237985, -0.38732682, -0.30230275]),
            ([0.05882353], [-1.04855297, -1.42001794, -1.70627019, 1.9507754 ]),
            ([0.05882353], [-0.50965218, -0.4380743 , -1.25279536, 0.77749036]),
            ([0.05882353], [-1.61389785, -0.21274028, -0.89546656, 0.3869025 ]),
            ([0.05882353], [-0.51080514, -1.18063218, -0.02818223, 0.42833187]),
            ([0.05882353], [ 0.06651722, 0.3024719 , -0.63432209, -0.36274117]),
            ([0.05882353], [-0.67246045, -0.35955316, -0.81314628, -1.7262826 ]),
            ([0.05882353], [ 0.17742614, -0.40178094, -1.63019835, 0.46278226])],
           dtype=[('weights', '<f8', (1,)), ('integration', '<f8', (4,))])]
```
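To experiment without copying the printed values, a stand-in list with the same dtype can be generated. This is a minimal sketch: `make_recarray` is a hypothetical helper, the integration values are random placeholders, and only the dtype and the array lengths (5, 10, 6, 17) come from the output above. The weights in the example happen to equal 1/n for an array of n records (0.2 = 1/5, 0.1 = 1/10, and so on), so the sketch reuses that pattern.

```python
import numpy as np

# dtype copied from the question: a 1-element "weights" field and a 4-element "integration" field
DTYPE = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]

def make_recarray(weight, n_rows, rng):
    """Hypothetical helper: n_rows records sharing one weight, with random integration values."""
    arr = np.zeros(n_rows, dtype=DTYPE).view(np.recarray)
    arr['weights'] = weight                                # broadcasts into every (1,)-shaped slot
    arr['integration'] = rng.standard_normal((n_rows, 4))  # placeholder values, not the real data
    return arr

rng = np.random.default_rng(0)
# same lengths as the example: 5, 10, 6 and 17 records, weight 1/n each
data = [make_recarray(1 / n, n, rng) for n in (5, 10, 6, 17)]
print(sum(len(r) for r in data))  # 38
```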
The result should be a five-column dataframe like the following:
Weights | v_1 | v_2 | v_3 | v_4 |
---|---|---|---|---|
0.2 | 1.76405235 | 0.40015721 | 0.97873798 | 2.2408932 |
0.2 | 1.86755799 | -0.97727788 | 0.95008842 | -0.15135721 |
… | … | … | … | … |
0.05882353 | 0.17742614 | -0.40178094 | -1.63019835 | 0.46278226 |

and so on.
However, when I call `pd.DataFrame(my_list)`, the resulting dataframe has around 90 columns instead of the 5 above. Each column holds one record of the form `([a], [w, x, y, z])`. The result should have 5 columns and one row per record, i.e. 38 rows for the example above.
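The target shape follows directly from the data: one row per record across all arrays (5 + 10 + 6 + 17 = 38 for the example) and one column for the weight plus one per integration element. A small sketch that computes it, using a hypothetical two-array stand-in with the question's dtype:

```python
import numpy as np

dtype = np.dtype([('weights', '<f8', (1,)), ('integration', '<f8', (4,))])
# hypothetical stand-in: two record arrays of 5 and 10 rows
data = [np.zeros(5, dtype=dtype).view(np.recarray),
        np.zeros(10, dtype=dtype).view(np.recarray)]

n_rows = sum(len(rec) for rec in data)    # one output row per record
n_values = dtype['integration'].shape[0]  # length of each integration subarray
n_cols = 1 + n_values                     # weights + v_1..v_4
print(n_rows, n_cols)  # 15 5
```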
Answer
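I assume your list of rec.arrays is stored in a variable called `data`. You can convert each rec.array to a dataframe with `pd.DataFrame` and stack the results with `pd.concat`. This gives two columns: column `0` holds the one-element weights arrays and column `1` holds the four-element integration arrays. Then `pd.DataFrame(df[1].tolist())` splits column `1` into four columns, `pandas.DataFrame.pop` removes the nested columns, and `pandas.Series.explode` unwraps each one-element weights array into a scalar.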
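The two unraveling steps can be seen in isolation on a toy frame. This sketch uses plain lists where the real data holds NumPy arrays; `tolist()` splitting and `Series.explode` treat both kinds of list-like the same way.

```python
import pandas as pd

# toy frame shaped like the concatenated result: column 0 holds one-element
# weight lists, column 1 holds four-element integration lists
df = pd.DataFrame({0: [[0.2], [0.1]],
                   1: [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]})

# splitting: each four-element list becomes one row of a new four-column frame
parts = pd.DataFrame(df[1].tolist(), index=df.index, columns=['v_1', 'v_2', 'v_3', 'v_4'])

# explode: each one-element list becomes a single scalar (object dtype, so cast to float)
weights = df[0].explode().astype(float)

print(parts)
print(weights.tolist())  # [0.2, 0.1]
```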
Reading Data
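```python
import pandas as pd

# data is the list of rec.arrays from the question
df = pd.DataFrame()
for record in data:
    # record.tolist() yields one (weights, integration) tuple per row
    temp_df = pd.DataFrame(record.tolist())
    # ignore_index=True numbers the rows 0..N-1 instead of restarting at 0 for each array
    df = pd.concat([df, temp_df], ignore_index=True)
```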
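Calling `pd.concat` inside the loop copies the growing dataframe on every iteration. A common alternative is to build all the pieces first and concatenate once. A sketch of that variant, with a hypothetical two-array `data` standing in for the question's list:

```python
import numpy as np
import pandas as pd

dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]
# hypothetical stand-in for the question's list: arrays of 2 and 3 records
data = []
for n, w in ((2, 0.5), (3, 1 / 3)):
    rec = np.zeros(n, dtype=dtype).view(np.recarray)
    rec['weights'] = w
    rec['integration'] = np.arange(4 * n, dtype=float).reshape(n, 4)
    data.append(rec)

# one small frame per rec.array, then a single concatenation
frames = [pd.DataFrame(rec.tolist()) for rec in data]
df = pd.concat(frames, ignore_index=True)
print(df.shape)  # (5, 2)
```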
Pre-processing and Unraveling data
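```python
# split the four-element integration arrays (column 1) into four new columns
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist(), index=df.index)
# pop column 0 and unwrap each one-element weights array into a scalar
df['weights'] = df.pop(0).explode()
# drop the original integration column
df.pop(1)
```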
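After `explode`, the `weights` column has `object` dtype and ends up as the last column. If a float column placed first (as in the question's target table) is preferred, a small follow-up works. The frame below is a hypothetical two-row stand-in laid out like the result above:

```python
import pandas as pd

# hypothetical frame laid out like the result above: v_1..v_4, then an object-dtype weights column
df = pd.DataFrame({'v_1': [1.0, 5.0], 'v_2': [2.0, 6.0], 'v_3': [3.0, 7.0], 'v_4': [4.0, 8.0],
                   'weights': pd.Series([0.2, 0.1], dtype=object)})

df['weights'] = df['weights'].astype(float)       # object -> float64
df = df[['weights', 'v_1', 'v_2', 'v_3', 'v_4']]  # weights first, as in the question
print(df.dtypes)
```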
Output
This gives the expected output:
```
         v_1       v_2       v_3       v_4   weights
0   1.764052  0.400157  0.978738  2.240893       0.2
1   1.867558 -0.977278  0.950088 -0.151357       0.2
2  -0.103219  0.410598  0.144044  1.454274       0.2
3   0.761038  0.121675  0.443863  0.333674       0.2
4   1.494079 -0.205158  0.313068 -0.854096       0.2
5   1.764052  0.400157  0.978738  2.240893       0.1
6   1.867558 -0.977278  0.950088 -0.151357       0.1
7  -0.103219  0.410598  0.144044  1.454274       0.1
8   0.761038  0.121675  0.443863  0.333674       0.1
9   1.494079 -0.205158  0.313068 -0.854096       0.1
10 -2.552990  0.653619  0.864436 -0.742165       0.1
11  2.269755 -1.454366  0.045759 -0.187184       0.1
12  1.532779  1.469359  0.154947  0.378163       0.1
13 -0.887786 -1.980796 -0.347912  0.156349       0.1
14  1.230291  1.202380 -0.387327 -0.302303       0.1
15  1.764052  0.400157  0.978738  2.240893  0.166667
16  1.867558 -0.977278  0.950088 -0.151357  0.166667
17 -0.103219  0.410598  0.144044  1.454274  0.166667
18  0.761038  0.121675  0.443863  0.333674  0.166667
19  1.494079 -0.205158  0.313068 -0.854096  0.166667
20 -2.552990  0.653619  0.864436 -0.742165  0.166667
21  1.764052  0.400157  0.978738  2.240893  0.058824
22  1.867558 -0.977278  0.950088 -0.151357  0.058824
23 -0.103219  0.410598  0.144044  1.454274  0.058824
24  0.761038  0.121675  0.443863  0.333674  0.058824
25  1.494079 -0.205158  0.313068 -0.854096  0.058824
26 -2.552990  0.653619  0.864436 -0.742165  0.058824
27  2.269755 -1.454366  0.045759 -0.187184  0.058824
28  1.532779  1.469359  0.154947  0.378163  0.058824
29 -0.887786 -1.980796 -0.347912  0.156349  0.058824
30  1.230291  1.202380 -0.387327 -0.302303  0.058824
31 -1.048553 -1.420018 -1.706270  1.950775  0.058824
32 -0.509652 -0.438074 -1.252795  0.777490  0.058824
33 -1.613898 -0.212740 -0.895467  0.386902  0.058824
34 -0.510805 -1.180632 -0.028182  0.428332  0.058824
35  0.066517  0.302472 -0.634322 -0.362741  0.058824
36 -0.672460 -0.359553 -0.813146 -1.726283  0.058824
37  0.177426 -0.401781 -1.630198  0.462782  0.058824
```
Alternatively
The same result can be obtained with `np.hstack`, where `data` is your list of rec.arrays:
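```python
import numpy as np
import pandas as pd

# np.hstack joins the rec.arrays end to end into a single record array
df = pd.DataFrame(np.hstack(data).tolist())
# unwrap the one-element weights arrays into scalars
df['weights'] = df[0].explode()
# split the four-element integration arrays into four columns
df[['v_1', 'v_2', 'v_3', 'v_4']] = pd.DataFrame(df[1].tolist())
# drop the original nested columns
df.drop([0, 1], inplace=True, axis=1)
```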
Output
This gives the same data, with the weights column first:
```
     weights       v_1       v_2       v_3       v_4
0        0.2  1.764052  0.400157  0.978738  2.240893
1        0.2  1.867558 -0.977278  0.950088 -0.151357
2        0.2 -0.103219  0.410598  0.144044  1.454274
3        0.2  0.761038  0.121675  0.443863  0.333674
4        0.2  1.494079 -0.205158  0.313068 -0.854096
5        0.1  1.764052  0.400157  0.978738  2.240893
6        0.1  1.867558 -0.977278  0.950088 -0.151357
7        0.1 -0.103219  0.410598  0.144044  1.454274
8        0.1  0.761038  0.121675  0.443863  0.333674
9        0.1  1.494079 -0.205158  0.313068 -0.854096
10       0.1 -2.552990  0.653619  0.864436 -0.742165
11       0.1  2.269755 -1.454366  0.045759 -0.187184
12       0.1  1.532779  1.469359  0.154947  0.378163
13       0.1 -0.887786 -1.980796 -0.347912  0.156349
14       0.1  1.230291  1.202380 -0.387327 -0.302303
15  0.166667  1.764052  0.400157  0.978738  2.240893
16  0.166667  1.867558 -0.977278  0.950088 -0.151357
17  0.166667 -0.103219  0.410598  0.144044  1.454274
18  0.166667  0.761038  0.121675  0.443863  0.333674
19  0.166667  1.494079 -0.205158  0.313068 -0.854096
20  0.166667 -2.552990  0.653619  0.864436 -0.742165
21  0.058824  1.764052  0.400157  0.978738  2.240893
22  0.058824  1.867558 -0.977278  0.950088 -0.151357
23  0.058824 -0.103219  0.410598  0.144044  1.454274
24  0.058824  0.761038  0.121675  0.443863  0.333674
25  0.058824  1.494079 -0.205158  0.313068 -0.854096
26  0.058824 -2.552990  0.653619  0.864436 -0.742165
27  0.058824  2.269755 -1.454366  0.045759 -0.187184
28  0.058824  1.532779  1.469359  0.154947  0.378163
29  0.058824 -0.887786 -1.980796 -0.347912  0.156349
30  0.058824  1.230291  1.202380 -0.387327 -0.302303
31  0.058824 -1.048553 -1.420018 -1.706270  1.950775
32  0.058824 -0.509652 -0.438074 -1.252795  0.777490
33  0.058824 -1.613898 -0.212740 -0.895467  0.386902
34  0.058824 -0.510805 -1.180632 -0.028182  0.428332
35  0.058824  0.066517  0.302472 -0.634322 -0.362741
36  0.058824 -0.672460 -0.359553 -0.813146 -1.726283
37  0.058824  0.177426 -0.401781 -1.630198  0.462782
```
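Because every rec.array shares the same dtype, the fields can also be read directly with NumPy field access instead of going through Python lists and `explode`. This is a different technique from the two above, sketched under the assumption that all arrays have the dtype shown in the question. `recarrays_to_frame` is a hypothetical helper name and the sample data is made up:

```python
import numpy as np
import pandas as pd

def recarrays_to_frame(data):
    """Hypothetical helper: flatten a list of same-dtype record arrays into one DataFrame."""
    stacked = np.concatenate(data)       # one record array holding every row
    values = stacked['integration']      # float array of shape (n_rows, 4)
    columns = [f'v_{i}' for i in range(1, values.shape[1] + 1)]
    df = pd.DataFrame(values, columns=columns)
    df.insert(0, 'weights', stacked['weights'][:, 0])  # (n_rows, 1) -> (n_rows,)
    return df

# made-up sample with the question's dtype: arrays of 2 and 3 records
dtype = [('weights', '<f8', (1,)), ('integration', '<f8', (4,))]
sample = []
for n in (2, 3):
    rec = np.zeros(n, dtype=dtype).view(np.recarray)
    rec['weights'] = 1 / n
    rec['integration'] = np.arange(4 * n, dtype=float).reshape(n, 4)
    sample.append(rec)

df = recarrays_to_frame(sample)
print(df)
```

The weights column comes out as `float64` directly, so no cast is needed afterwards.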