How to use np.unique on big arrays?

I work with geospatial images in tif format. Thanks to the rasterio library I can handle these images as numpy arrays of dimension (nb_bands, x, y). Here I am working with an image that contains patches of unique values that I would like to count (they were generated with the scipy.ndimage.label function).

My idea was to use the unique method of numpy to retrieve the information from these patches as follows:

# identify the clumps
with as f:
    mask_raster =

class_, indices, count = np.unique(mask_raster, return_index=True, return_counts=True) 
del mask_raster
# identify the value
with as f:
    src_raster =

src_flat = src_raster.flatten()
del src_raster 
values = [src_flat[index] for index in indices]
# class_ holds the patch labels; indices only holds the first flat position of each label
df = pd.DataFrame({'patchId': class_, 'nb_pixel': count, 'value': values})

My problem is this: for an image of shape (69940, 70936) (84.7 MB on my disk), np.unique tries to allocate an array of the same dimensions with a 64-bit integer type, and I get the following error:

Unable to allocate 37.0 GiB for an array with shape (69940, 70936) and data type uint64
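
To make the number in the error message concrete, here is a small back-of-the-envelope sketch. It only multiplies the pixel count by the item size of a few dtypes; the helper name size_gib is made up for illustration, and nothing here says which step of np.unique triggers the allocation.

```python
import numpy as np

shape = (69940, 70936)  # raster shape taken from the error message
n_pixels = shape[0] * shape[1]


def size_gib(dtype):
    """Memory needed for one dense array of `shape` with the given dtype, in GiB."""
    return n_pixels * np.dtype(dtype).itemsize / 2**30


# Compare a few candidate dtypes for the same number of pixels.
for dtype in ("uint8", "int32", "uint64"):
    print(f"{dtype:>6}: {size_gib(dtype):5.1f} GiB")
```

At 8 bytes per pixel this comes to roughly 37 GiB, which matches the error, and a 4-byte type such as int32 would need half of that. The file on disk is much smaller, presumably because it is stored compressed.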

  • Is it normal that unique converts my array to a 64-bit integer type?
  • Is it possible to force it to use a more economical dtype? (even if all my patches were 1 pixel, np.int32 would be sufficient)
  • Is there another solution using a function I don’t know?
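
On the last point, one possible direction (a sketch, not a tested solution for the real raster) is to avoid handling the whole image at once and count labels strip by strip with np.bincount. This assumes the labels are non-negative integers from 0 to a known maximum, as scipy.ndimage.label produces; the function name count_labels_blockwise and the block_rows parameter are hypothetical.

```python
import numpy as np


def count_labels_blockwise(labels, n_labels, block_rows=1024):
    """Count pixels per label, one horizontal strip at a time.

    Assumes `labels` is a 2-D array of integers in [0, n_labels],
    where 0 is the background (as with scipy.ndimage.label output).
    """
    counts = np.zeros(n_labels + 1, dtype=np.int64)
    for start in range(0, labels.shape[0], block_rows):
        block = labels[start:start + block_rows]
        # Only this strip is cast and counted; the accumulator has one slot per label.
        counts += np.bincount(block.ravel().astype(np.intp), minlength=n_labels + 1)
    return counts


# Tiny in-memory example standing in for the real mask raster.
labels = np.array([[0, 1, 1, 0],
                   [0, 1, 0, 2],
                   [3, 0, 0, 2]], dtype=np.int32)
counts = count_labels_blockwise(labels, n_labels=3, block_rows=2)
print(counts)  # counts[k] is the number of pixels carrying label k
```

The temporary memory is then bounded by the strip size plus one counter per label. In principle the strips could also come from windowed reads of the file rather than from an array already in memory, but that part is left out here.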



I dug into the scipy.ndimage library and did find a solution that avoids the memory explosion. Because it slices the initial raster patch by patch, it is faster than I expected:

from scipy import ndimage
import numpy as np
import pandas as pd

# open the files 
with as f_mask, as f_src: 
    mask_raster =
    src_raster =
# use patches as slicing material 
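indices = []
counts = []
values = []
for i, loc in enumerate(ndimage.find_objects(mask_raster), start=1):
    if loc is None:  # label i does not occur in the raster
        continue
    # keep only the pixels of patch i inside its bounding box
    in_patch = mask_raster[loc] == i
    loc_values, loc_counts = np.unique(src_raster[loc][in_patch], return_counts=True)
    # the value of the patch is the value with the highest count
    idx = np.argmax(loc_counts)
    indices.append(i)
    counts.append(int(in_patch.sum()))
    values.append(loc_values[idx])
df = pd.DataFrame({'patchId': indices, 'nb_pixel': counts, 'value': values})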
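
For anyone who wants to try the idea without the original files, here is a minimal sketch on a toy array. The data is invented, and it assumes both rasters are 2-D with the same shape. It shows what ndimage.find_objects returns (one bounding-box slice per label) and why the box is masked again with the label: a bounding box can also contain background or pixels of other patches.

```python
import numpy as np
import pandas as pd
from scipy import ndimage

# Toy stand-ins for the two rasters (hypothetical data, not the real images).
src_raster = np.array([[5, 5, 0, 0],
                       [5, 0, 0, 7],
                       [0, 0, 7, 7]])
mask_raster, n_patches = ndimage.label(src_raster > 0)

rows = []
for label, loc in enumerate(ndimage.find_objects(mask_raster), start=1):
    # loc is a tuple of slices describing the bounding box of this label.
    in_patch = mask_raster[loc] == label
    # Most frequent source value among the pixels of this patch.
    vals, cnts = np.unique(src_raster[loc][in_patch], return_counts=True)
    rows.append({"patchId": label,
                 "nb_pixel": int(in_patch.sum()),
                 "value": int(vals[np.argmax(cnts)])})

df = pd.DataFrame(rows)
print(df)
```

Each np.unique call only ever sees the pixels of one patch, so the temporary arrays stay proportional to the patch size rather than to the full raster.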