
How to use np.unique on big arrays?

I work with geospatial images in tif format. Thanks to the rasterio lib I can load these images as numpy arrays of dimension (nb_bands, x, y). Here I manipulate an image that contains patches of unique values that I would like to count. (The patches were generated with the scipy.ndimage.label function.)

My idea was to use the unique method of numpy to retrieve the information from these patches as follows:

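
A minimal sketch of that idea on a toy array. The names `mask` and `labels`, the sample data, and the use of `return_counts=True` are assumptions made for illustration; the real input would be one band read with rasterio.

```python
import numpy as np
from scipy import ndimage

# Toy binary mask standing in for one band of the raster (hypothetical data).
mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 1]], dtype=np.uint8)

# Label connected patches; 0 remains the background value.
labels, nb_patches = ndimage.label(mask)

# Unique label values and the number of pixels carrying each one.
values, counts = np.unique(labels, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
```

This works well on small arrays, but np.unique sorts a flattened copy of its input, so its memory use grows with the full raster size.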

My problem is this: for an image of shape (69940, 70936) (84.7 MB on my disk), np.unique tries to allocate an array of the same dimensions with a 64-bit integer dtype, and I get the following error:

Unable to allocate 37.0 GiB for an array with shape (69940, 70936) and data type uint64
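
The figure in the error message can be checked with simple arithmetic: number of pixels times bytes per element. The sketch below does that for the dtype named in the error and for np.int32; the variable names are illustrative only.

```python
import numpy as np

shape = (69940, 70936)
n_pixels = shape[0] * shape[1]

# Size in GiB of one dense array of this shape for each dtype.
sizes_gib = {}
for dtype in (np.uint64, np.int32):
    n_bytes = n_pixels * np.dtype(dtype).itemsize
    sizes_gib[np.dtype(dtype).name] = n_bytes / 2**30

for name, gib in sizes_gib.items():
    print(f"{name}: {gib:.1f} GiB")
# uint64: 37.0 GiB
# int32: 18.5 GiB
```

Even at 32 bits, a full in-memory copy is far larger than the 84.7 MB file on disk, which suggests the file is stored compressed.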

  • Is it normal that unique converts my array to a 64-bit integer type?
  • Is it possible to force it to use a more economical format? (Even if all my patches were 1 pixel, np.int32 would be sufficient.)
  • Is there another solution using a function I don’t know?

Answer

I dug into the scipy.ndimage library and did find a solution that avoids the memory explosion. Because it works on slices of the initial raster, it is also faster than I expected:

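
One way such a slicing approach could look is sketched below. It assumes `scipy.ndimage.find_objects` is the function in question, which the text above does not confirm. The helper `count_label_pixels` and the toy data are hypothetical.

```python
import numpy as np
from scipy import ndimage

def count_label_pixels(labels):
    """Return {label: pixel count} without sorting the whole raster."""
    counts = {}
    # find_objects returns one bounding-box slice tuple per label 1..max,
    # or None when a label value is absent from the array.
    for index, box in enumerate(ndimage.find_objects(labels)):
        if box is None:
            continue
        label = index + 1
        # Only the window around the patch is compared and counted.
        counts[label] = int(np.count_nonzero(labels[box] == label))
    return counts

# Toy mask standing in for one raster band (hypothetical data).
mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 1]], dtype=np.uint8)
labels, nb_patches = ndimage.label(mask)

patch_sizes = count_label_pixels(labels)
print(patch_sizes)
```

Each comparison only touches the bounding box of a single patch, so the temporary arrays stay small as long as the patches themselves are small. With a very large number of labels, the Python loop may become the bottleneck.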