Reshaping a 3D array to a 2D array to produce a DataFrame: keep track of indices to produce column names




The following code generates a pandas.DataFrame from a 3D array, keeping the first axis as rows and flattening the other two axes into columns. I create the column names manually (defining cols). Is there a more built-in way to do this, to avoid potential errors (e.g. regarding C order)?

In other words, I am looking for a way to guarantee that the column labels respect the order of the indices after the reshape operation (here it relies on iterating over range(nrow) and range(ncol) in the correct order).

import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
shp = (nt, nrow, ncol)

np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(shp)

# This is the line I think should be improved: is there a NumPy function for this?
cols = [str(i) + '-' + str(j) for i in range(nrow) for j in range(ncol)]

adf = pd.DataFrame(a.reshape(nt, -1), columns=cols)

print(adf)

   0-0  0-1  0-2  1-0  1-1  1-2  2-0  2-1  2-2  3-0  3-1  3-2
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550
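Whichever way the labels are built, one way to guard against a mismatch is to check every labelled column against the slice of the 3D array it should come from. A minimal sketch, assuming labels of the form "i-j" as above; `check_columns` is a hypothetical helper, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd


def check_columns(df: pd.DataFrame, arr: np.ndarray) -> bool:
    """Return True if every "i-j" column of df equals arr[:, i, j]."""
    _, nrow, ncol = arr.shape
    for label in df.columns:
        i, j = (int(part) for part in label.split("-"))
        # A label outside the array bounds can never be correct
        if not (0 <= i < nrow and 0 <= j < ncol):
            return False
        if not np.array_equal(df[label].to_numpy(), arr[:, i, j]):
            return False
    return True


nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)
cols = [f"{i}-{j}" for i in range(nrow) for j in range(ncol)]
adf = pd.DataFrame(a.reshape(nt, -1), columns=cols)
print(check_columns(adf, a))  # True
```

The check compares values rather than trusting the construction, so it does not depend on how the label list was produced.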

EDIT

Here is why I don't like my solution: it is just too easy to write code that technically works but produces a wrong result (by swapping i and j, or nrow and ncol):

wrongcols1 = [str(i) + '-' + str(j) for i in range(ncol) for j in range(nrow)]
adf2 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols1)
print(adf2)
   0-0  0-1  0-2  0-3  1-0  1-1  1-2  1-3  2-0  2-1  2-2  2-3
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550

wrongcols2 = [str(j) + '-' + str(i) for i in range(nrow) for j in range(ncol)]
adf3 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols2)
print(adf3)
   0-0  1-0  2-0  0-1  1-1  2-1  0-2  1-2  2-2  0-3  1-3  2-3
0  684  559  629  192  835  763  707  359    9  723  277  754
1  804  599   70  472  600  396  314  705  486  551   87  174
2  600  849  677  537  845   72  777  916  115  976  755  709
3  847  431  448  850   99  984  177  755  797  659  147  910
4  423  288  961  265  697  639  544  543  714  244  151  675
5  510  459  882  183   28  802  128  128  932   53  901  550
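Both wrong label lists above produce a DataFrame without any error, but a value-level check exposes them. A sketch, with a hypothetical helper `labels_match` redefined here so the block runs on its own: wrongcols1 fails on the out-of-range label "0-3", and wrongcols2 fails because the column labelled "1-0" actually holds a[:, 0, 1].

```python
import numpy as np
import pandas as pd


def labels_match(df, arr):
    """True if every "i-j" column of df holds arr[:, i, j] (hypothetical helper)."""
    _, nrow, ncol = arr.shape
    for label in df.columns:
        i, j = map(int, label.split("-"))
        if not (0 <= i < nrow and 0 <= j < ncol):
            return False  # e.g. "0-3" when ncol == 3
        if not np.array_equal(df[label].to_numpy(), arr[:, i, j]):
            return False
    return True


nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)
flat = a.reshape(nt, -1)

cols = [f"{i}-{j}" for i in range(nrow) for j in range(ncol)]
wrongcols1 = [f"{i}-{j}" for i in range(ncol) for j in range(nrow)]
wrongcols2 = [f"{j}-{i}" for i in range(nrow) for j in range(ncol)]

print(labels_match(pd.DataFrame(flat, columns=cols), a))        # True
print(labels_match(pd.DataFrame(flat, columns=wrongcols1), a))  # False
print(labels_match(pd.DataFrame(flat, columns=wrongcols2), a))  # False
```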

Answer

Try this and see if it fits your use case:

Generate the index pairs for the columns with a combination of np.indices, np.dstack and np.vstack:

columns = np.vstack(np.dstack(np.indices((nrow, ncol))))

array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [2, 0],
       [2, 1],
       [2, 2],
       [3, 0],
       [3, 1],
       [3, 2]])
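To see why this yields the pairs in row-major order: np.indices((nrow, ncol)) returns two stacked grids of shape (2, nrow, ncol), np.dstack moves that stacking axis to the end, giving (nrow, ncol, 2), and np.vstack concatenates the nrow blocks into (nrow*ncol, 2). The sketch below checks the intermediate shapes and compares the result with a plain C-order reshape and transpose, which is the same default order that a.reshape(nt, -1) uses:

```python
import numpy as np

nrow, ncol = 4, 3

grids = np.indices((nrow, ncol))  # grids[0][i, j] == i, grids[1][i, j] == j
print(grids.shape)  # (2, 4, 3)

stacked = np.dstack(grids)        # stacked[i, j] == [i, j]
print(stacked.shape)  # (4, 3, 2)

pairs = np.vstack(stacked)        # rows ordered by i, with j varying fastest
print(pairs.shape)  # (12, 2)

# Same pairs via the default C-order reshape of the grids
alt = grids.reshape(2, -1).T
print(np.array_equal(pairs, alt))  # True
```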

Now convert each pair to a string with a combination of map, join and a list comprehension:

columns = ["-".join(map(str, entry)) for entry in columns]
['0-0',
 '0-1',
 '0-2',
 '1-0',
 '1-1',
 '1-2',
 '2-0',
 '2-1',
 '2-2',
 '3-0',
 '3-1',
 '3-2']
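Putting the two steps together with the question's data, here is a sketch that builds the DataFrame and verifies every column against the 3D array. It also compares the labels with ones built from np.ndindex, which yields index tuples in C order (last index varying fastest); that shorter alternative is my own suggestion, not part of the answer above.

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)

# Labels via np.indices / np.dstack / np.vstack, as in the answer
pairs = np.vstack(np.dstack(np.indices((nrow, ncol))))
columns = ["-".join(map(str, entry)) for entry in pairs]

# Same labels via np.ndindex, which iterates in C order
columns_nd = [f"{i}-{j}" for i, j in np.ndindex(nrow, ncol)]
print(columns == columns_nd)  # True

adf = pd.DataFrame(a.reshape(nt, -1), columns=columns)

# Every labelled column should equal the matching slice of the 3D array
ok = all(np.array_equal(adf[f"{i}-{j}"].to_numpy(), a[:, i, j])
         for i, j in np.ndindex(nrow, ncol))
print(ok)  # True
```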

Let me know how it goes.
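A different way to make the labels follow the data, whatever order the reshape uses, is to push index grids through the very same reshape call as the values and read the labels off the result. A minimal sketch, offered as an alternative rather than as part of the answer above; `flatten` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3
shp = (nt, nrow, ncol)
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(shp)


def flatten(x: np.ndarray) -> np.ndarray:
    """The single reshape applied to both the data and the index grids."""
    return x.reshape(x.shape[0], -1)


# ii[t, i, j] == i and jj[t, i, j] == j, same shape as `a`
_, ii, jj = np.indices(shp)

# Reshape the grids exactly like the data; every row carries the same labels, so take row 0
cols = [f"{i}-{j}" for i, j in zip(flatten(ii)[0], flatten(jj)[0])]
adf = pd.DataFrame(flatten(a), columns=cols)

print(cols[:4])  # ['0-0', '0-1', '0-2', '1-0']
```

Because the data and the grids go through one shared function, changing that reshape (for example to order="F") should change the labels along with the values, so they stay consistent.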



Source: stackoverflow