How to replace values in a NumPy 2D array based on a per-row condition


I have a 2D NumPy array (named lda_fit) of probabilities, and I want to replace each probability with 0 or 1, depending on whether it is the maximum value in its row.

array([[0.06478282, 0.80609092, 0.06511851, 0.06400775],
       [0.50386571, 0.02621445, 0.44400621, 0.02591363],
       [0.259538  , 0.04266385, 0.65470484, 0.04309331],
       ...,
       [0.01415491, 0.01527508, 0.22211579, 0.74845422],
       [0.01419367, 0.01537099, 0.01521318, 0.95522216],
       [0.25      , 0.25      , 0.25      , 0.25      ]])

The first row should then become [0,1,0,0], the second [1,0,0,0], and so on. I tried the following, which works, but only for a fixed threshold (0.5):

np.where(lda_fit < 0.5,0,1)

But the largest value in a row might not be greater than 0.5, so I want a separate threshold for each row. Unfortunately, the following compares against the max value of the whole array:

np.where(lda_fit < np.max(lda_fit),0,1)

Answer

You can take the maximum along axis 1 and keep the dimensions, so that the result broadcasts against the original array:

(lda_fit.max(axis=1, keepdims=True) == lda_fit).astype(int)

Note: if a row contains its maximum more than once, every occurrence is set to 1. For an alternative, see the argmax approach further down.

Output for the example input in the question:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 0 0 1]
 [0 0 0 1]
 [1 1 1 1]]
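
As a self-contained check, the row-wise maximum approach can be sketched as follows. The array holds only the six rows visible in the question (the elided rows are left out), and the names row_max and one_hot_all_max are illustrative, not from the original answer.

```python
import numpy as np

# The six rows visible in the question; the elided rows are omitted.
lda_fit = np.array([
    [0.06478282, 0.80609092, 0.06511851, 0.06400775],
    [0.50386571, 0.02621445, 0.44400621, 0.02591363],
    [0.259538, 0.04266385, 0.65470484, 0.04309331],
    [0.01415491, 0.01527508, 0.22211579, 0.74845422],
    [0.01419367, 0.01537099, 0.01521318, 0.95522216],
    [0.25, 0.25, 0.25, 0.25],
])

# keepdims=True keeps the result as shape (6, 1) instead of (6,),
# so it broadcasts against the (6, 4) array row by row.
row_max = lda_fit.max(axis=1, keepdims=True)

# Boolean mask cast to int: every entry equal to its row maximum becomes 1.
one_hot_all_max = (lda_fit == row_max).astype(int)
print(one_hot_all_max)
```

Without keepdims=True the maxima would have shape (6,), which does not broadcast against a (6, 4) array, so the comparison would raise an error.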

If a row has several maxima and you want only the first one set to 1 and the rest set to 0, you can use argmax:

(lda_fit.argmax(axis=1)[:, None] == np.arange(lda_fit.shape[1])).astype(int)

or equivalently:

lda_fit_max = np.zeros(lda_fit.shape, dtype=int)
lda_fit_max[np.arange(len(lda_fit)), lda_fit.argmax(axis=1)] = 1

Output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 0 0 1]
 [0 0 0 1]
 [1 0 0 0]]
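
The argmax variant can also be wrapped in a small helper. This is a minimal sketch: the function name first_max_one_hot and the small example array are made up for illustration and do not come from the question.

```python
import numpy as np

def first_max_one_hot(probs):
    """Return an int array with a single 1 per row, at the first row maximum."""
    probs = np.asarray(probs)
    out = np.zeros(probs.shape, dtype=int)
    # argmax returns the index of the first occurrence of the maximum,
    # so ties within a row are resolved in favour of the leftmost column.
    out[np.arange(probs.shape[0]), probs.argmax(axis=1)] = 1
    return out

# Illustrative data (not from the question); the last row contains a tie.
example = np.array([
    [0.1, 0.7, 0.2],
    [0.25, 0.25, 0.5],
    [0.4, 0.4, 0.2],
])
result = first_max_one_hot(example)
print(result)
```

Because exactly one entry per row is set, every row of the result sums to 1, which is a convenient sanity check.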


Source: Stack Overflow