A robust way to keep the n-largest elements in rows or colums in the matrix

Question

I would like to make a sparse matrix from the dense one, such that in each row or column only n-largest elements are preserved. I do the following: This approach works as intended but seems to be not the most efficient, and the robust one. What would you recommend to do it an better way? The example of usage: Answer

Accepted Answer

Instead of making another dense matrix, you can use coo_matrix to build up using only the values you need:return spsp.coo_matrix((K[cond], np.where(cond)), shape = K.shape)As for the rest, you can maybe short-circuit the second dimension, but your time savings will be completely dependent on your inputsdef sparsify(K, min_nnz = 5): ''' This function eliminates the elements which are smaller that the maximal element in the matrix, Parameters ---------- K : ndarray K - the input matrix min_nnz: the minimal number of elements in row or column to be preserved ''' cond = K >= -np.partition(-K, min_nnz - 1, axis = 0)[min_nnz - 1, :] mask = cond.sum(1) < min_nnz cond[mask] = np.bitwise_or(cond[mask], K[mask] >= -np.partition(-K[mask], min_nnz - 1, axis = 1)[:, min_nnz - 1][:, None]) return spsp.coo_matrix((K[cond], np.where(cond)), shape = K.shape)Testing:sparsify(A)Out[]: <10x10 sparse matrix of type '' with 58 stored elements in COOrdinate format>sparsify(A).AOut[]: array([[0. , 0. , 0.61362248, 0. , 0.73648987, 0.64561856, 0.40727807, 0.61674005, 0.53533315, 0. ], [0.8888361 , 0.64548039, 0.94659603, 0.78474203, 0. , 0. , 0.78809603, 0.88938798, 0. , 0.37631541], [0.69356682, 0. , 0. , 0. , 0. , 0.7386594 , 0.71687659, 0.67750768, 0.58002451, 0. ], [0.67241433, 0.71923718, 0.95888737, 0. , 0. , 0. , 0.82773085, 0.69788448, 0.63736915, 0.4263064 ], [0. , 0.65831794, 0. , 0. , 0.59850093, 0. , 0. , 0.61913869, 0.65024867, 0.50860294], [0.75522891, 0. , 0.93342402, 0.8284258 , 0.64471939, 0.6990814 , 0. , 0. , 0. , 0.32940821], [0. , 0.88458635, 0.62460096, 0.60412265, 0.66969674, 0. , 0.40318741, 0. , 0. , 0.44116059], [0. , 0. , 0.500971 , 0.92291245, 0. , 0.8862903 , 0. , 0.375885 , 0.49473635, 0. ], [0.86920647, 0.85157893, 0.89883006, 0. , 0.68427193, 0.91195162, 0. , 0. , 0.94762875, 0. ], [0. , 0.6435456 , 0. , 0.70551006, 0. , 0.8075527 , 0. , 0.9421039 , 0.91096934, 0. ]])sparsify(A).A.astype(bool).sum(0)Out[]: array([5, 6, 7, 5, 5, 6, 5, 7, 7, 5])sparsify(A).A.astype(bool).sum(1)Out[]: array([6, 7, 5, 7, 5, 6, 6, 5, 6, 5])

Advertisement

Answer