The following code generates a pandas.DataFrame from a 3D array over the first axis. I manually create the column names (defining cols): is there a more built-in way to do this (to avoid potential errors, e.g. regarding C-order)?
I am looking for a way to guarantee that the order of the indices is respected after the reshape operation (here it relies on the correct order of the iterations over range(nrow) and range(ncol)).
    import numpy as np
    import pandas as pd

    nt = 6 ; nrow = 4 ; ncol = 3 ; shp = (nt, nrow, ncol)
    np.random.seed(0)
    a = np.array(np.random.randint(0, 1000, nt*nrow*ncol)).reshape(shp)

    # This is the line I think should be improved --> any numpy function or so?
    cols = [str(i) + '-' + str(j) for i in range(nrow) for j in range(ncol)]

    adf = pd.DataFrame(a.reshape(nt, -1), columns = cols)
    print(adf)

       0-0  0-1  0-2  1-0  1-1  1-2  2-0  2-1  2-2  3-0  3-1  3-2
    0  684  559  629  192  835  763  707  359    9  723  277  754
    1  804  599   70  472  600  396  314  705  486  551   87  174
    2  600  849  677  537  845   72  777  916  115  976  755  709
    3  847  431  448  850   99  984  177  755  797  659  147  910
    4  423  288  961  265  697  639  544  543  714  244  151  675
    5  510  459  882  183   28  802  128  128  932   53  901  550
EDIT
Illustrating why I don't like my solution: it is just too easy to write code that technically works but produces a wrong result (by inverting i and j, or nrow and ncol):
    wrongcols1 = [str(i) + '-' + str(j) for i in range(ncol) for j in range(nrow)]
    adf2 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols1)
    print(adf2)

       0-0  0-1  0-2  0-3  1-0  1-1  1-2  1-3  2-0  2-1  2-2  2-3
    0  684  559  629  192  835  763  707  359    9  723  277  754
    1  804  599   70  472  600  396  314  705  486  551   87  174
    2  600  849  677  537  845   72  777  916  115  976  755  709
    3  847  431  448  850   99  984  177  755  797  659  147  910
    4  423  288  961  265  697  639  544  543  714  244  151  675
    5  510  459  882  183   28  802  128  128  932   53  901  550

    wrongcols2 = [str(j) + '-' + str(i) for i in range(nrow) for j in range(ncol)]
    adf3 = pd.DataFrame(a.reshape(nt, -1), columns=wrongcols2)
    print(adf3)

       0-0  1-0  2-0  0-1  1-1  2-1  0-2  1-2  2-2  0-3  1-3  2-3
    0  684  559  629  192  835  763  707  359    9  723  277  754
    1  804  599   70  472  600  396  314  705  486  551   87  174
    2  600  849  677  537  845   72  777  916  115  976  755  709
    3  847  431  448  850   99  984  177  755  797  659  147  910
    4  423  288  961  265  697  639  544  543  714  244  151  675
    5  510  459  882  183   28  802  128  128  932   53  901  550
Answer
Try this and see if it fits your use case:
Generate columns via a combination of np.indices, np.dstack and np.vstack:

    columns = np.vstack(np.dstack(np.indices((nrow, ncol))))

    array([[0, 0],
           [0, 1],
           [0, 2],
           [1, 0],
           [1, 1],
           [1, 2],
           [2, 0],
           [2, 1],
           [2, 2],
           [3, 0],
           [3, 1],
           [3, 2]])
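To see why this chain yields the index pairs in row-major (C) order, it can help to look at the intermediate shapes. This is only a small sketch, assuming nrow = 4 and ncol = 3 as in the question; the names grid and pairs are introduced here purely for illustration:

```python
import numpy as np

nrow, ncol = 4, 3  # values taken from the question

# grid[0] holds the row index of every cell, grid[1] the column index
grid = np.indices((nrow, ncol))      # shape (2, nrow, ncol)

# Stack the two index planes along a new last axis: pairs[i, j] == [i, j]
pairs = np.dstack(grid)              # shape (nrow, ncol, 2)

# Stack the nrow blocks on top of each other, keeping row-major order
columns = np.vstack(pairs)           # shape (nrow * ncol, 2)

print(grid.shape, pairs.shape, columns.shape)
```

Because the last axis of the (nrow, ncol) grid varies fastest, the pairs come out in the same order that a.reshape(nt, -1) flattens the last two axes of a C-ordered array.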
Now convert to string via a combination of map, join and list comprehension:
    columns = ["-".join(map(str, entry)) for entry in columns]

    ['0-0', '0-1', '0-2', '1-0', '1-1', '1-2', '2-0', '2-1', '2-2', '3-0', '3-1', '3-2']
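Since the concern in the question is labels that silently end up on the wrong data, one option is to check every label against the original array. The following is a rough sketch that rebuilds the question's setup (same nt, nrow, ncol and seed); the variable name pairs is an assumption made here to keep the integer pairs around for the check:

```python
import numpy as np
import pandas as pd

nt, nrow, ncol = 6, 4, 3  # same sizes as in the question
np.random.seed(0)
a = np.random.randint(0, 1000, nt * nrow * ncol).reshape(nt, nrow, ncol)

# Integer (i, j) pairs in C order, then their "i-j" string labels
pairs = np.vstack(np.dstack(np.indices((nrow, ncol))))
columns = ["-".join(map(str, entry)) for entry in pairs]

adf = pd.DataFrame(a.reshape(nt, -1), columns=columns)

# Each column labelled "i-j" should hold exactly the slice a[:, i, j]
for i, j in pairs:
    assert (adf[f"{i}-{j}"].to_numpy() == a[:, i, j]).all()
```

With the wrongcols1 or wrongcols2 labels from the question, this kind of check would either fail or raise a KeyError instead of passing silently.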
Let me know how it goes.