I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two matrices. In total, the resulting dataframe should have m times n rows.
Example: If these are the two matrices and two lists,
```python
import numpy as np

a1 = np.array([[1, 2], [3, 4], [5, 6]])
a2 = np.array([[10, 20], [30, 40], [50, 60]])
l1 = [5, 7, 99]
l2 = [2, 3]
```
then the resulting dataframe should look like this:
```
l1  l2  a1  a2
 5   2   1  10
 7   2   3  30
99   2   5  50
 5   3   2  20
 7   3   4  40
99   3   6  60
```
The order of the rows does not matter.
Although I only have two matrices in this specific case, I am curious about a solution that extends easily to any number of matrices of the same size.
Answer
Use np.vstack to join the arrays created by numpy.tile, numpy.repeat and numpy.ravel, then pass the result to the DataFrame constructor:
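To see why these three functions line up, it helps to look at what each one returns on the question's example data. The sketch below redefines a1, l1 and l2 from the question and checks that every value flattened in column-major order ('F') sits next to the row and column labels of the cell it came from. It is only an illustration of the building blocks; the variable names row_labels, col_labels and values are made up for this example.

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4], [5, 6]])
l1 = [5, 7, 99]  # labels for the m = 3 rows
l2 = [2, 3]      # labels for the n = 2 columns

# np.tile repeats the whole row-label list once per column: 5, 7, 99, 5, 7, 99
row_labels = np.tile(l1, len(l2))
# np.repeat repeats each column label once per row: 2, 2, 2, 3, 3, 3
col_labels = np.repeat(l2, len(l1))
# np.ravel with order 'F' walks the matrix column by column: 1, 3, 5, 2, 4, 6
values = np.ravel(a1, 'F')

for r, c, v in zip(row_labels, col_labels, values):
    # each flattened value matches the matrix cell addressed by its labels
    assert a1[l1.index(r), l2.index(c)] == v

print(row_labels, col_labels, values)
```

Because tile and repeat are both sized from len(l1) and len(l2), the same pattern works for any m and n, as long as every matrix is flattened with the same order 'F'.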
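```python
import numpy as np
import pandas as pd

# Stack the two label columns and the column-major (order 'F') flattened
# matrices, then transpose so that each matrix cell becomes one row.
a = np.vstack((np.tile(l1, len(l2)),
               np.repeat(l2, len(l1)),
               np.ravel(a1, 'F'),
               np.ravel(a2, 'F'))).T
print(a)
```

```
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]
```

```python
df = pd.DataFrame(a, columns=['l1', 'l2', 'a1', 'a2'])
print(df)
```

```
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60
```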
For multiple arrays:
```python
arrays = [a1, a2]
# Flatten every matrix column by column so it lines up with tile/repeat
arr = [np.ravel(a, 'F') for a in arrays]
a = np.vstack((np.tile(l1, len(l2)), np.repeat(l2, len(l1)), *arr)).T
print(a)
```

```
[[ 5  2  1 10]
 [ 7  2  3 30]
 [99  2  5 50]
 [ 5  3  2 20]
 [ 7  3  4 40]
 [99  3  6 60]]
```

```python
df = pd.DataFrame(a, columns=['l1', 'l2'] + [f'a{x+1}' for x in range(len(arrays))])
print(df)
```

```
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60
```
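If you need this for many matrices, the same idea can be wrapped in a small function. The sketch below uses a hypothetical helper, matrices_to_frame, that takes the matrices as a dict so each one keeps a meaningful column name. One design difference from the answer above, and an assumption about what you may want: it builds the DataFrame from a dict of 1-D columns instead of one stacked array, so np.vstack never upcasts all columns to a common dtype (for example, float labels would otherwise turn integer matrix values into floats).

```python
import numpy as np
import pandas as pd


def matrices_to_frame(l1, l2, named_arrays):
    """Hypothetical helper: turn any number of (len(l1) x len(l2)) arrays
    into one long DataFrame with one row per matrix cell."""
    m, n = len(l1), len(l2)
    for name, arr in named_arrays.items():
        if np.shape(arr) != (m, n):
            raise ValueError(f"{name} has shape {np.shape(arr)}, expected {(m, n)}")
    # Label columns: l1 tiled once per column, each l2 label repeated once per row
    data = {"l1": np.tile(l1, n), "l2": np.repeat(l2, m)}
    for name, arr in named_arrays.items():
        # Column-major flattening matches the tile/repeat label order
        data[name] = np.ravel(arr, "F")
    return pd.DataFrame(data)


a1 = np.array([[1, 2], [3, 4], [5, 6]])
a2 = np.array([[10, 20], [30, 40], [50, 60]])
df = matrices_to_frame([5, 7, 99], [2, 3], {"a1": a1, "a2": a2})
print(df)
```

Columns appear in the order of the dict, which Python preserves for insertion order.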
A pandas-only solution with pd.concat and DataFrame.unstack:
```python
df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),
                 pd.DataFrame(a2, columns=l2, index=l1).unstack()],
                axis=1, keys=['a1', 'a2'])
        .rename_axis(['l2', 'l1'])
        .swaplevel(1, 0)
        .reset_index())
print(df)
```

```
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60
```
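The rename_axis(['l2', 'l1']) and swaplevel(1, 0) steps make sense once you look at the intermediate result. Calling unstack on a DataFrame whose index is not a MultiIndex returns a Series whose MultiIndex has the column label (from l2) as the outer level and the row label (from l1) as the inner level. The sketch below, using the question's a1 and the illustrative names wide and s, shows that intermediate Series for one matrix.

```python
import numpy as np
import pandas as pd

a1 = np.array([[1, 2], [3, 4], [5, 6]])
l1 = [5, 7, 99]
l2 = [2, 3]

# Rows labelled by l1, columns labelled by l2
wide = pd.DataFrame(a1, columns=l2, index=l1)
# unstack on a flat index gives a Series indexed by (column label, row label)
s = wide.unstack()
print(s)
```

So the first index level holds l2 values and the second holds l1 values, which is why the levels are named in that order and then swapped before reset_index.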
For multiple arrays:
```python
arrays = [a1, a2]
df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays], axis=1)
        .rename_axis(['l2', 'l1'])
        .swaplevel(1, 0)
        .rename(columns=lambda x: f'a{x+1}')
        .reset_index())
print(df)
```

```
   l1  l2  a1  a2
0   5   2   1  10
1   7   2   3  30
2  99   2   5  50
3   5   3   2  20
4   7   3   4  40
5  99   3   6  60
```
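Since the question says the row order does not matter, the flattening order can also be switched to NumPy's default row-major (C) order. In that case the roles of tile and repeat swap: each l1 label is repeated once per column, and the l2 list is tiled once per row. The sketch below assumes the question's a1, a2, l1 and l2; it produces the same rows as the answers above, only grouped by l1 instead of l2.

```python
import numpy as np
import pandas as pd

a1 = np.array([[1, 2], [3, 4], [5, 6]])
a2 = np.array([[10, 20], [30, 40], [50, 60]])
l1 = [5, 7, 99]
l2 = [2, 3]
arrays = [a1, a2]

# Row-major (C) order: ravel walks each matrix row by row,
# so each row label is repeated and the column-label list is tiled.
df = pd.DataFrame({
    "l1": np.repeat(l1, len(l2)),
    "l2": np.tile(l2, len(l1)),
    **{f"a{i + 1}": a.ravel() for i, a in enumerate(arrays)},
})
print(df)
```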