Import large .tiff file as sparse matrix

I have a large .tiff file (4.4 GB, 79530 x 54980 values) with 1 band. Since only 16% of the values are valid, I thought it would be better to import the file as a sparse matrix to save RAM. However, when I first open it as an np.array and then convert it to a sparse matrix using csr_matrix(), my kernel crashes.

Is there a better way to work with this file? In the end I have to make calculations based on the values in the raster. (Unfortunately, due to confidentiality, I cannot attach the relevant file.)

Answer

Can you tell where the crash occurs: when the file is loaded into an np.array, or in the csr_matrix() call?

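If the array is 4.4 GB with shape (79530, 54980), that is 79530 * 54980, or about 4.37e9 values, so 1 byte per element. At 16% density, about 0.7e9 of those values are nonzero.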

Creating the csr matrix requires calling np.nonzero(array) to get the indices. For the roughly 0.7e9 nonzero values, that produces 2 index arrays of 0.7e9 * 8 bytes, about 5.6 GB each (the indices are 8-byte ints). The coo format requires those 2 arrays plus 0.7 GB for the nonzero values, about 12 GB in total. After conversion to csr, the row attribute is reduced to 79530 elements, leaving about 7 GB (corrected for 8 bytes/element).
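As a rough check of these numbers, here is a minimal sketch in plain Python. The function name sparse_memory_estimate and its parameters are made up for illustration. It assumes 1 byte per value and 8-byte indices as above, and it only counts the arrays each format stores, not any temporary copies made during conversion.

```python
def sparse_memory_estimate(shape, density, value_bytes=1, index_bytes=8):
    """Rough byte counts for dense, COO and CSR storage of a 2-D array."""
    rows, cols = shape
    n_values = rows * cols
    nnz = int(n_values * density)  # number of stored (valid) values

    dense = n_values * value_bytes
    # COO keeps a row index and a column index for every stored value.
    coo = nnz * (2 * index_bytes + value_bytes)
    # CSR keeps one column index per stored value plus rows + 1 row pointers.
    csr = nnz * (index_bytes + value_bytes) + (rows + 1) * index_bytes
    return {"nnz": nnz, "dense": dense, "coo": coo, "csr": csr}


est = sparse_memory_estimate((79530, 54980), 0.16)
for name in ("dense", "coo", "csr"):
    print(f"{name}: {est[name] / 1e9:.1f} GB")
```

For this raster the estimate comes out at roughly 4.4 GB dense, 11.9 GB for coo and 6.3 GB for csr. Passing index_bytes=4 shows how the figures would change under an assumption of 4-byte indices.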

So at 16% density, the sparse format, at its best, is still larger than the dense version.
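The same effect can be seen at small scale. This is only a sketch: the array below is a made-up stand-in for the raster (1 byte per value, about 16% nonzero), and it measures just the np.nonzero() step plus the stored values, not a full csr_matrix() conversion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the raster: 1 byte per value, about 16% valid.
dense = np.zeros((1000, 800), dtype=np.uint8)
mask = rng.random(dense.shape) < 0.16
dense[mask] = 1

# The step the conversion needs: row and column indices of the nonzeros.
rows, cols = np.nonzero(dense)

index_bytes = rows.nbytes + cols.nbytes
value_bytes = dense[rows, cols].nbytes

print("index dtype:", rows.dtype)
print("dense bytes:", dense.nbytes)
print("index + value bytes:", index_bytes + value_bytes)
```

On a 64-bit platform the index dtype is typically int64, so the two index arrays alone outweigh the 1-byte dense array at this density.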

"Memory error when converting matrix to sparse matrix, specified dtype is invalid" is a recent case of a memory error, which occurred in the nonzero step.

User contributions licensed under: CC BY-SA