I have a specific matrix with dimensions (nz,ny,nx) and another matrix with dimensions (ny,nx). In this other matrix are specific values and for instance I want to sum all the points in this first 3-dimensional matrix at locations where the second matrix has a specific value.
I am doing the following:
kkw = np.squeeze(np.where(wz==wval))
which has (2,X) elements and when I now try to do:
serout = np.sum(dum[:,kkw],axis=(1,2))
I end with something that is completely wrong or even uses a lot of memory and gets killed in the end.
What is the correct way to broadcast/sum the values in the original matrix using another matrix for mask and a specific mask value to make this sum?
Here is the small example of the problem:
nz,ny,nx = 3,20,10 datain = np.random.random((nz,ny,nx)) mask = np.array(5+5.*np.random.random((ny,nx)),dtype='int'); kkw = np.squeeze(np.where(mask==5)) dataout = np.sum(datain[:,kkw],axis=(1,2))
So, in the end I would expect dataout to have  3 elements, but it has dimensions (3,20). Clearly, I am not doing the broadcasting correctly.
Advertisement
Answer
Your sample arrays:
In [165]: nz,ny,nx = 3,20,10
     ...: datain = np.random.random((nz,ny,nx))
     ...: mask = np.array(5+5.*np.random.random((ny,nx)),dtype='int');
     ...: 
In [166]: datain.shape
Out[166]: (3, 20, 10)
In [167]: mask.shape
Out[167]: (20, 10)
Indices where mask is 5:
In [170]: np.where(mask==5)
Out[170]: 
(array([ 0,  0,  0,  0,  1,  1,  1,  2,  3,  3,  3,  4,  4,  4,  4,  5,  6,
         6,  6,  7,  7,  8,  8,  9,  9,  9,  9, 10, 10, 10, 11, 12, 13, 13,
        13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 19,
        19]),
 array([1, 3, 6, 9, 3, 6, 8, 9, 1, 5, 7, 0, 2, 3, 6, 9, 1, 6, 9, 8, 9, 3,
        5, 1, 3, 4, 8, 0, 4, 9, 0, 2, 0, 3, 5, 6, 8, 2, 5, 9, 0, 1, 2, 5,
        8, 0, 4, 6, 8, 9, 3, 7]))
I haven’t seen squeeze applied to this, but all it’s doing is converting the tuple of arrays to a (2,52) array.  np.argwhere applies np.transpose, producing a (52,2) array.
In [171]: np.squeeze(_)
Out[171]: 
array([[ 0,  0,  0,  0,  1,  1,  1,  2,  3,  3,  3,  4,  4,  4,  4,  5,
         6,  6,  6,  7,  7,  8,  8,  9,  9,  9,  9, 10, 10, 10, 11, 12,
        13, 13, 13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18,
        18, 18, 19, 19],
       [ 1,  3,  6,  9,  3,  6,  8,  9,  1,  5,  7,  0,  2,  3,  6,  9,
         1,  6,  9,  8,  9,  3,  5,  1,  3,  4,  8,  0,  4,  9,  0,  2,
         0,  3,  5,  6,  8,  2,  5,  9,  0,  1,  2,  5,  8,  0,  4,  6,
         8,  9,  3,  7]])
The where tuple is easier to use when indexing.
In [172]: I,J = np.where(mask==5) In [173]: I.shape Out[173]: (52,)
Applied to mask itself we get all the 5 values:
In [174]: mask[I,J]
Out[174]: 
array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 5, 5, 5, 5, 5])
Applied to the 3d array:
In [175]: datain[:,I,J].shape Out[175]: (3, 52) In [176]: datain[:,I,J].sum(axis=1) Out[176]: array([28.01960227, 26.66364387, 26.24709875])
datain[:,kkw] produces a (3,2,52,10) array.  It’s applying the (2,52) array to index just the size 20 dimension.  Converting the where tuple to a 2d array does not help with this indexing.  When indexing, the distinction between tuples, lists, and arrays is very important.