Import large .tiff file as sparse matrix

Question

I have a large .tiff file (4.4 GB, 79530 x 54980 values) with 1 band. Since only 16% of the values are valid, I thought it would be better to import the file as a sparse matrix to save RAM. When I first open it as an np.array and then convert it to a sparse matrix using csr_matrix(), my kernel crashes at that step. See code

Accepted Answer

Can you tell at which of these steps the crash occurs?

    import numpy as np
    from osgeo import gdal
    from scipy.sparse import csr_matrix

    ds = gdal.Open("input.tif")   # hypothetical path; the question doesn't show it
    band = ds.GetRasterBand(1)
    temp = band.ReadAsArray()     # already returns a NumPy ndarray
    array = np.array(temp)        # makes a second full copy; if temp is already an array, you don't need this
    csr_matrix(array)

If array is 4.4 GB with shape (79530, 54980):

    In [62]: (79530 * 54980) / 1e9
    Out[62]: 4.3725594    # 4.4 GB makes sense for 1 byte/element

    In [63]: (79530 * 54980) * 0.16        # 16% density
    Out[63]: 699609504.0                # number of nonzero values

Creating a CSR matrix requires calling np.nonzero(array) to get the indices. That produces 2 index arrays of about 0.7 billion elements each; since the indices are 8-byte ints, each takes about 5.6 GB. The COO format actually requires those 2 arrays plus 0.7 GB for the nonzero values, about 12 GB in total. Converted to CSR, the row-index array is reduced to 79530 + 1 row pointers, so the total drops to roughly 6.3 GB (using 8 bytes per index).

So with 8-byte indices at 16% density, the sparse format is, at its best, still larger than the dense version.

The question "Memory error when converting matrix to sparse matrix, specified dtype is invalid" is a recent case of a memory error, which occurred in the nonzero step.
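A small calculator makes the answer's arithmetic easy to reproduce. This is a sketch under stated assumptions: 1-byte pixel values (inferred from the 4.4 GB file size) and index arrays of `index_bytes` bytes each, defaulting to 8 as in the answer. The function name `estimate_bytes` is hypothetical. It ignores NumPy/SciPy object overhead and the temporary copies made during conversion, so it understates peak memory use.

```python
# Rough storage estimate (in bytes) for dense vs. sparse layouts of the raster.
# Assumptions (not stated in the original post): 1-byte values, and every
# index stored with `index_bytes` bytes (8 = int64, as assumed in the answer).

def estimate_bytes(n_rows, n_cols, density, value_bytes=1, index_bytes=8):
    """Return approximate sizes of the dense, COO and CSR layouts."""
    n_total = n_rows * n_cols
    nnz = round(n_total * density)          # number of stored (nonzero) values
    dense = n_total * value_bytes
    # COO: a row index, a column index and a value for every nonzero
    coo = nnz * (2 * index_bytes + value_bytes)
    # CSR: a column index and a value per nonzero, plus n_rows + 1 row pointers
    csr = nnz * (index_bytes + value_bytes) + (n_rows + 1) * index_bytes
    return {"nnz": nnz, "dense": dense, "coo": coo, "csr": csr}


if __name__ == "__main__":
    est = estimate_bytes(79530, 54980, 0.16)
    for name in ("dense", "coo", "csr"):
        print(f"{name:>5}: {est[name] / 1e9:.1f} GB")
```

Run as-is, it prints about 4.4 GB for dense, 11.9 GB for COO and 6.3 GB for CSR. If the index arrays end up as 32-bit integers (SciPy may choose int32 indices when the values fit), passing `index_bytes=4` lowers the CSR estimate to about 3.5 GB. That is a modest saving, and reaching it still means holding the dense array and the `np.nonzero` output in memory during the conversion.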
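To see why the COO to CSR conversion shrinks the row-index array, here is a minimal pure-Python illustration. It is not SciPy's implementation, and the helper names `dense_to_coo` and `coo_rows_to_indptr` are hypothetical. The first function mimics what `np.nonzero` provides: one row index and one column index per nonzero, in row-major order. The second replaces those per-nonzero row indices with `n_rows + 1` cumulative offsets, which is what a CSR matrix keeps in its `indptr` attribute.

```python
# Illustration only: how CSR replaces one row index per nonzero with
# n_rows + 1 row pointers.

def dense_to_coo(matrix):
    """Return (rows, cols, values) for the nonzero entries, in row-major order."""
    rows, cols, values = [], [], []
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                values.append(v)
    return rows, cols, values


def coo_rows_to_indptr(rows, n_rows):
    """Compress row indices into CSR row pointers (length n_rows + 1).

    The matching cols/values must already be grouped by row, which
    dense_to_coo guarantees because it scans in row-major order.
    """
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1          # count the nonzeros in each row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]  # cumulative sum: start offset of each row
    return indptr


if __name__ == "__main__":
    m = [[5, 0, 0, 1],
         [0, 0, 0, 0],
         [0, 2, 3, 0]]
    rows, cols, vals = dense_to_coo(m)
    print(rows)                              # [0, 0, 2, 2]
    print(coo_rows_to_indptr(rows, len(m)))  # [0, 2, 2, 4]
```

Row `i` of the CSR matrix is then `cols[indptr[i]:indptr[i + 1]]` with the matching `vals`. For the raster in the question, this compression only removes one of the three per-nonzero arrays, which is why the CSR total stays so close to the dense size.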


Answer