I have a specific matrix with dimensions (nz,ny,nx) and another matrix with dimensions (ny,nx). In this other matrix are specific values and for instance I want to sum all the points in this first 3-dimensional matrix at locations where the second matrix has a specific value.
I am doing the following:
kkw = np.squeeze(np.where(wz==wval))
which has (2,X) elements and when I now try to do:
serout = np.sum(dum[:,kkw],axis=(1,2))
I end with something that is completely wrong or even uses a lot of memory and gets killed in the end.
What is the correct way to broadcast/sum the values in the original matrix using another matrix for mask and a specific mask value to make this sum?
Here is the small example of the problem:
nz,ny,nx = 3,20,10 datain = np.random.random((nz,ny,nx)) mask = np.array(5+5.*np.random.random((ny,nx)),dtype='int'); kkw = np.squeeze(np.where(mask==5)) dataout = np.sum(datain[:,kkw],axis=(1,2))
So, in the end I would expect dataout
to have 3 elements, but it has dimensions (3,20). Clearly, I am not doing the broadcasting correctly.
Advertisement
Answer
Your sample arrays:
In [165]: nz,ny,nx = 3,20,10 ...: datain = np.random.random((nz,ny,nx)) ...: mask = np.array(5+5.*np.random.random((ny,nx)),dtype='int'); ...: In [166]: datain.shape Out[166]: (3, 20, 10) In [167]: mask.shape Out[167]: (20, 10)
Indices where mask
is 5:
In [170]: np.where(mask==5) Out[170]: (array([ 0, 0, 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 12, 13, 13, 13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19]), array([1, 3, 6, 9, 3, 6, 8, 9, 1, 5, 7, 0, 2, 3, 6, 9, 1, 6, 9, 8, 9, 3, 5, 1, 3, 4, 8, 0, 4, 9, 0, 2, 0, 3, 5, 6, 8, 2, 5, 9, 0, 1, 2, 5, 8, 0, 4, 6, 8, 9, 3, 7]))
I haven’t seen squeeze
applied to this, but all it’s doing is converting the tuple of arrays to a (2,52) array. np.argwhere
applies np.transpose
, producing a (52,2) array.
In [171]: np.squeeze(_) Out[171]: array([[ 0, 0, 0, 0, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 10, 11, 12, 13, 13, 13, 13, 13, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19], [ 1, 3, 6, 9, 3, 6, 8, 9, 1, 5, 7, 0, 2, 3, 6, 9, 1, 6, 9, 8, 9, 3, 5, 1, 3, 4, 8, 0, 4, 9, 0, 2, 0, 3, 5, 6, 8, 2, 5, 9, 0, 1, 2, 5, 8, 0, 4, 6, 8, 9, 3, 7]])
The where
tuple is easier to use when indexing.
In [172]: I,J = np.where(mask==5) In [173]: I.shape Out[173]: (52,)
Applied to mask
itself we get all the 5
values:
In [174]: mask[I,J] Out[174]: array([5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5])
Applied to the 3d array:
In [175]: datain[:,I,J].shape Out[175]: (3, 52) In [176]: datain[:,I,J].sum(axis=1) Out[176]: array([28.01960227, 26.66364387, 26.24709875])
datain[:,kkw]
produces a (3,2,52,10) array. It’s applying the (2,52) array to index just the size 20 dimension. Converting the where
tuple to a 2d array does not help with this indexing. When indexing, the distinction between tuples, lists, and arrays is very important.