Skip to content
Advertisement

Combine and sort multiple array columns of values A and B where A is the common index

I have a n-long list of arrays. Each array consists of two columns: A) index values between 1-500 B) measured values

Each A column is slightly different (i.e. missing or having extra values).

I want to create single large array where i) there is single A (index) column consisting of all the index values and ii) all the B (measured values) columns appropriately sorted, so they are in the same row as the original index value. The missing values would be filled with nan or 0s.

Array examples:

#         A      B
arr1 = np.array([[  25,    64],
                 [  45,    26]])

arr2 = np.array([[ 8,     54],
                [  25,     2],
                [  45,    84],
                [  128,   22]])

arr3 = np.array([[ 17,   530],
                 [255,    25]])

Array of my dreams:

#                   A          B  
#                      arr1  arr2  arr3
dreamArr = array([[8,     0,   54,    0],
                  [17,    0,    0,  530],
                  [25,   64,    2,    0],
                  [45,   26,   84,    0],
                  [128,   0,   22,    0],
                  [255,   0,     0,  25]])

I tried creating an np.zeros() array and replaced the individual columns with small arrays and got stuck.

Then I tried getting all the A values upfront by np.vstack(), removed duplicates with np.unique, np.sort()ed them and got stuck again.

All input is much appreciated!

Advertisement

Answer

It’s quite simple using :

import pandas as pd

arrs = [arr1, arr2, arr3]

out = (pd
 .concat([pd.DataFrame(a).set_index(0) for a in arrs], axis=1)
 .fillna(0, downcast='infer')
 .sort_index().reset_index().to_numpy()
)

output:

array([[  8,   0,  54,   0],
       [ 17,   0,   0, 530],
       [ 25,  64,   2,   0],
       [ 45,  26,  84,   0],
       [128,   0,  22,   0],
       [255,   0,   0,  25]])
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement