Convert x same size 2D numpy arrays to a 2+x column data frame

Question

I have two ndarrays of size (m x n), and two lists of length m and n respectively. I want to convert the two matrices to a dataframe with four columns. The first two columns correspond to the m and n dimensions, and contain the values from the lists. The next two columns should contain the values from the two

Accepted Answer

Use np.vstack for join arrays created by numpy.tile, numpy.repeat and numpy.ravel and pass to DataFrame cosntructor:a = np.vstack((np.tile(l1, len(l2)),               np.repeat(l2, len(l1)),               np.ravel(a1, 'F'),                np.ravel(a2, 'F'))).Tprint (a)[[ 5  2  1 10] [ 7  2  3 30] [99  2  5 50] [ 5  3  2 20] [ 7  3  4 40] [99  3  6 60]]df = pd.DataFrame(a, columns=['l1','l2','a1','a2'])print (df)   l1  l2  a1  a20   5   2   1  101   7   2   3  302  99   2   5  503   5   3   2  204   7   3   4  405  99   3   6  60For multiple arrays:arrays =  [a1, a2]arr = [np.ravel(a, 'F') for a in arrays]a = np.vstack((np.tile(l1, len(l2)),                np.repeat(l2, len(l1)),               arr)).Tprint (a)[[ 5  2  1 10] [ 7  2  3 30] [99  2  5 50] [ 5  3  2 20] [ 7  3  4 40] [99  3  6 60]]df = pd.DataFrame(a, columns=['l1','l2'] + [f'a{x+1}' for x in range(len(arrays))])print (df)   l1  l2  a1  a20   5   2   1  101   7   2   3  302  99   2   5  503   5   3   2  204   7   3   4  405  99   3   6  60Pandas only solution with concat and DataFrame.unstack:df = (pd.concat([pd.DataFrame(a1, columns=l2, index=l1).unstack(),                pd.DataFrame(a2, columns=l2, index=l1).unstack()],               axis=1, keys=['a1','a2'])        .rename_axis(['l2','l1']).swaplevel(1,0).reset_index())print (df)   l1  l2  a1  a20   5   2   1  101   7   2   3  302  99   2   5  503   5   3   2  204   7   3   4  405  99   3   6  60For multiple arrays:arrays =  [a1, a2]df = (pd.concat([pd.DataFrame(a, columns=l2, index=l1).unstack() for a in arrays],               axis=1)        .rename_axis(['l2','l1'])        .swaplevel(1,0)        .rename(columns=lambda x: f'a{x+1}')        .reset_index())print (df)   l1  l2  a1  a20   5   2   1  101   7   2   3  302  99   2   5  503   5   3   2  204   7   3   4  405  99   3   6  60

Advertisement

Answer