I have a 2D NumPy array (named lda_fit) of probabilities, and I want to replace each probability with 0 or 1 based on the maximum value in each row.
array([[0.06478282, 0.80609092, 0.06511851, 0.06400775],
       [0.50386571, 0.02621445, 0.44400621, 0.02591363],
       [0.259538  , 0.04266385, 0.65470484, 0.04309331],
       ...,
       [0.01415491, 0.01527508, 0.22211579, 0.74845422],
       [0.01419367, 0.01537099, 0.01521318, 0.95522216],
       [0.25      , 0.25      , 0.25      , 0.25      ]])
So the first row should become [0,1,0,0], the second [1,0,0,0], and so on. I have tried the following, which works, but only for a fixed threshold (0.5):
np.where(lda_fit < 0.5,0,1)
But since the largest value in a row may not exceed 0.5, I want a separate threshold for each row. Unfortunately, the following compares every element against the max value of the whole array:
np.where(lda_fit < np.max(lda_fit),0,1)
Answer
You can use np.max (here called as the array's max method) with the axis specified:
(lda_fit.max(1,keepdims=True)==lda_fit)+0
Note: if a row has more than one maximum, this returns 1 for all of them. For an alternative, see the argmax method further down.
Output for the example input in the question:
[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 0 0 1]
 [0 0 0 1]
 [1 1 1 1]]
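To make the role of keepdims=True visible, here is a minimal sketch that uses only the six rows shown in the question (the elided rows are left out). The names row_max and mask_all_max are introduced here for illustration and are not part of the original answer.

```python
import numpy as np

# The six rows visible in the question; the elided rows are omitted.
lda_fit = np.array([
    [0.06478282, 0.80609092, 0.06511851, 0.06400775],
    [0.50386571, 0.02621445, 0.44400621, 0.02591363],
    [0.259538, 0.04266385, 0.65470484, 0.04309331],
    [0.01415491, 0.01527508, 0.22211579, 0.74845422],
    [0.01419367, 0.01537099, 0.01521318, 0.95522216],
    [0.25, 0.25, 0.25, 0.25],
])

# Row-wise maximum. keepdims=True keeps the shape (6, 1) instead of (6,),
# so it broadcasts against the (6, 4) array one row at a time.
row_max = lda_fit.max(axis=1, keepdims=True)

# Boolean mask of entries equal to their own row's maximum, cast to 0/1.
mask_all_max = (lda_fit == row_max).astype(int)
print(mask_all_max)
```

Without keepdims=True the result has shape (6,), which cannot be broadcast against a (6, 4) array; in general that comparison either raises an error or, for a square array, silently compares against the wrong axis.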
If a row has multiple maxima and you want only the first one set to 1 and the rest set to 0, you can use argmax:
(lda_fit.argmax(axis=1)[:,None] == range(lda_fit.shape[1]))+0
or equivalently:
lda_fit_max = np.zeros(lda_fit.shape, dtype=int)
lda_fit_max[np.arange(len(lda_fit)), lda_fit.argmax(axis=1)] = 1
Output:
[[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 0 0 1]
 [0 0 0 1]
 [1 0 0 0]]
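As a further variation that is not in the original answer, the same one-hot result can be sketched by indexing an identity matrix with the argmax of each row. The small input below is made up for illustration, with the second row tied on purpose; np.argmax returns the first occurrence when there are ties, which is why only the first column is set in that row.

```python
import numpy as np

# Hypothetical two-row input: one clear maximum, one four-way tie.
lda_fit = np.array([
    [0.06, 0.81, 0.07, 0.06],
    [0.25, 0.25, 0.25, 0.25],
])

# Index of the (first) maximum in each row.
idx = lda_fit.argmax(axis=1)

# Row i of the identity matrix is the one-hot vector for column i,
# so fancy indexing with idx picks one one-hot row per input row.
one_hot = np.eye(lda_fit.shape[1], dtype=int)[idx]
print(one_hot)
```

This builds an n-by-n identity matrix, where n is the number of columns, so it is convenient when there are only a few columns; the in-place assignment shown above avoids that extra allocation.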