Skip to content
Advertisement

Convert x same size 2D numpy arrays to a 2+x column data frame

I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two matrices. In total, the resulting dataframe should have m times n rows.

Example: If these are the two matrices and two lists,

a1 = np.array([[1, 2], [3, 4],[5,6]])
a2 = np.array([[10, 20], [30, 40],[50,60]])
l1 = [5,7,99]
l2 = [2,3]

then the resulting dataframe should look like this:

"l1" "l2" "a1" "a2"
5    2    1    10
7    2    3    30
99   2    5    50
5    3    2    20
7    3    4    40
99   3    6    60

The order of the rows does not matter.

Although I only have two matrices in this specific case, I am curious about a solution which is easily applicable to any number of same size matrices.

Advertisement

Answer

Use np.vstack for join arrays created by numpy.tile, numpy.repeat and numpy.ravel and pass to DataFrame cosntructor:

a = np.vstack((np.tile(l1, len(l2)),
               np.repeat(l2, len(l1)),
               np.ravel(a1, 'F'), 
               np.ravel(a2, 'F'))).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2','a1','a2'])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]

arr = [np.ravel(a, 'F') for a in arrays]
a = np.vstack((np.tile(l1, len(l2)), 
               np.repeat(l2, len(l1)),
               arr)).T
print (a)
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]


df = pd.DataFrame(a, columns=['l1','l2'] + [f'a{x+1}' for x in range(len(arrays))])
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

Pandas only solution with concat and DataFrame.unstack:

df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),
                pd.DataFrame(a2, columns=l2, index=l1).unstack()],
               axis=1, keys=['a1','a2'])
        .rename_axis(['l2','l1']).swaplevel(1,0).reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60

For multiple arrays:

arrays =  [a1, a2]
df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays],
               axis=1)
        .rename_axis(['l2','l1'])
        .swaplevel(1,0)
        .rename(columns=lambda x: f'a{x+1}')
        .reset_index())
print (df)
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement