I have a n-long list of arrays. Each array consists of two columns: A) index values between 1-500 B) measured values
Each A column is slightly different (i.e. missing or having extra values).
I want to create single large array where i) there is single A (index) column consisting of all the index values and ii) all the B (measured values) columns appropriately sorted, so they are in the same row as the original index value. The missing values would be filled with nan or 0s.
Array examples:
# A B arr1 = np.array([[ 25, 64], [ 45, 26]]) arr2 = np.array([[ 8, 54], [ 25, 2], [ 45, 84], [ 128, 22]]) arr3 = np.array([[ 17, 530], [255, 25]])
Array of my dreams:
# A B # arr1 arr2 arr3 dreamArr = array([[8, 0, 54, 0], [17, 0, 0, 530], [25, 64, 2, 0], [45, 26, 84, 0], [128, 0, 22, 0], [255, 0, 0, 25]])
I tried creating an np.zeros()
array and replaced the individual columns with small arrays and got stuck.
Then I tried getting all the A values upfront by np.vstack()
, removed duplicates with np.unique
, np.sort()
ed them and got stuck again.
All input is much appreciated!
Advertisement
Answer
It’s quite simple using pandas:
import pandas as pd arrs = [arr1, arr2, arr3] out = (pd .concat([pd.DataFrame(a).set_index(0) for a in arrs], axis=1) .fillna(0, downcast='infer') .sort_index().reset_index().to_numpy() )
output:
array([[ 8, 0, 54, 0], [ 17, 0, 0, 530], [ 25, 64, 2, 0], [ 45, 26, 84, 0], [128, 0, 22, 0], [255, 0, 0, 25]])