I have a pandas DataFrame with a structure like this:
In [22]: df
Out[22]:
           a             b
0  [1, 2, 3]     [4, 5, 6]
1  [7, 8, 9]  [10, 11, 12]
To build it, you can do something like this:
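import pandas as pd

# Start with placeholder objects so the columns get dtype=object,
# then assign a list to each cell.
df = pd.DataFrame([[object(), object()], [object(), object()]], columns=["a", "b"])
df.iat[0, 0] = [1, 2, 3]
df.iat[0, 1] = [4, 5, 6]
df.iat[1, 0] = [7, 8, 9]
df.iat[1, 1] = [10, 11, 12]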
What would be the simplest way to turn it into a NumPy 3-dimensional array? This would be the expected result:
In [20]: arr
Out[20]:
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [21]: arr.shape
Out[21]: (2, 2, 3)

In [22]: df.iloc[0, 0]
Out[22]: [1, 2, 3]

In [23]: arr[0, 0]
Out[23]: array([1, 2, 3])

In [24]: df.iloc[-1]
Out[24]:
a       [7, 8, 9]
b    [10, 11, 12]
Name: 1, dtype: object

In [25]: arr[-1]
Out[25]:
array([[ 7,  8,  9],
       [10, 11, 12]])
I have tried several things, without success:
In [6]: df.values  # Notice the dtype
Out[6]:
array([[list([1, 2, 3]), list([4, 5, 6])],
       [list([7, 8, 9]), list([10, 11, 12])]], dtype=object)

In [7]: df.values.astype(int)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'list'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 df.values.astype(int)

ValueError: setting an array element with a sequence.

In [14]: df.values.reshape(2, 2, -1)
Out[14]:
array([[[list([1, 2, 3])],
        [list([4, 5, 6])]],

       [[list([7, 8, 9])],
        [list([10, 11, 12])]]], dtype=object)
Answer
One option is to convert df to a nested list, then pass that list to np.array:
out = np.array(df.to_numpy().tolist())
Output:
>>> out
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
>>> out.shape
(2, 2, 3)
>>> out[0,0]
array([1, 2, 3])
>>> out[-1]
array([[ 7,  8,  9],
       [10, 11, 12]])
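For reference, here is a self-contained sketch of the same conversion, wrapped in a small helper. The name `frame_to_3d` and the shape check are illustrative assumptions, not part of pandas or NumPy. The check guards against cells whose lists differ in length: depending on the NumPy version, `np.array` may raise on such input or fall back to an object array.

```python
import numpy as np
import pandas as pd


def frame_to_3d(frame):
    """Hypothetical helper: convert a DataFrame of equal-length lists to a 3-D array."""
    # to_numpy() yields a 2-D object array of lists; tolist() turns it into
    # a plain nested list that np.array can parse into a numeric array.
    result = np.array(frame.to_numpy().tolist())
    if result.dtype == object or result.ndim != 3:
        raise ValueError("cells must all hold lists of the same length")
    return result


# Same data as in the question, built from a dict of columns.
df = pd.DataFrame({"a": [[1, 2, 3], [7, 8, 9]], "b": [[4, 5, 6], [10, 11, 12]]})
arr = frame_to_3d(df)
print(arr.shape)
print(arr[-1])
```

The first axis follows the rows, the second the columns, and the third the position inside each list, so `arr[i, j]` corresponds to `df.iloc[i, j]`.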