Initialize high dimensional sparse matrix

Question

I want to initialize a 300,000 x 300,000 sparse matrix using sklearn, but it requires as much memory as if the matrix were not sparse. It gives an error, which is the same error I get if I initialize using numpy. Even when I go to a very low density, it reproduces the error. Is there a more memory-efficient way to create such a …

Accepted Answer

Just generate only what you need.

    from scipy import sparse
    import numpy as np

    n, m = 300000, 300000
    density = 0.00000001
    # Number of nonzero entries to generate (900 for these values).
    size = int(n * m * density)

    # Random coordinates and values for the nonzero entries only.
    rows = np.random.randint(0, n, size=size)
    cols = np.random.randint(0, m, size=size)
    data = np.random.rand(size)

    arr = sparse.csr_matrix((data, (rows, cols)), shape=(n, m))

This lets you build monster sparse arrays provided they’re sparse enough to fit into memory.

    >>> arr
    <300000x300000 sparse matrix of type '<class 'numpy.float64'>'
        with 900 stored elements in Compressed Sparse Row format>

This is probably how the sparse.rand constructor should be working anyway. If any row, col pairs collide it’ll add the data values together, which is probably fine for all applications I can think of.
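To make the collision behaviour concrete, here is a small sketch on a 4 x 4 matrix where two entries deliberately target the same cell. The helper `drop_collisions` is a hypothetical name, not part of scipy; it is one possible way to keep only the first value per cell if summing is not what you want.

```python
import numpy as np
from scipy import sparse

# Two of the three entries target the same cell (0, 1).
rows = np.array([0, 0, 2])
cols = np.array([1, 1, 3])
data = np.array([1.0, 2.0, 5.0])

arr = sparse.csr_matrix((data, (rows, cols)), shape=(4, 4))

# The colliding values are summed into a single stored element.
print(arr.nnz)    # 2
print(arr[0, 1])  # 3.0


def drop_collisions(rows, cols, data, m):
    """Hypothetical helper: keep the first entry for each (row, col) pair.

    m is the number of columns, used to flatten each pair into one integer.
    """
    flat = rows.astype(np.int64) * m + cols
    _, keep = np.unique(flat, return_index=True)
    return rows[keep], cols[keep], data[keep]


r, c, d = drop_collisions(rows, cols, data, m=4)
dedup = sparse.csr_matrix((d, (r, c)), shape=(4, 4))
print(dedup[0, 1])  # 1.0
```

Dropping collisions means the matrix can end up with slightly fewer stored elements than `size`. At the density used above, collisions are rare enough that this hardly matters.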

Answer