How to use np.unique on big arrays?

Question

I work with geospatial images in tif format. Thanks to the rasterio library, I can read these images as NumPy arrays of shape (nb_bands, x, y). Here I am working with an image that contains patches of unique values that I would like to count (they were generated with the scipy.ndimage.label function). My idea w…
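A minimal sketch of the setup described in the question, assuming a small hypothetical in-memory array in place of a band read with rasterio. It labels the patches with scipy.ndimage.label and counts pixels per patch with np.unique(..., return_counts=True), which is fine on small arrays. np.unique sorts a flattened copy of the whole input. The np.bincount line is a swapped-in alternative that is not from the question: it returns the same per-label counts, indexed by label, without that sort.

```python
import numpy as np
from scipy import ndimage

# Hypothetical toy raster standing in for one band read with rasterio;
# every non-zero pixel is foreground for ndimage.label.
binary = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
])

# Label the connected patches (default structure: 4-connectivity in 2D).
labels, nb_patches = ndimage.label(binary)

# Count pixels per label; np.unique sorts a flattened copy of the array.
ids, counts = np.unique(labels, return_counts=True)
# ids[0] is the background (0); the remaining ids are the patch IDs.
patch_sizes = dict(zip(ids[1:].tolist(), counts[1:].tolist()))

# np.bincount gives the same per-label counts (index = label) without sorting.
sizes_bincount = np.bincount(labels.ravel())

print(nb_patches, patch_sizes)
```

On this toy array there are 4 patches of 3, 3, 4 and 1 pixels, plus 9 background pixels.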

Accepted Answer

I dug into the scipy.ndimage library and found a solution that avoids the memory explosion. Since it only slices the initial raster patch by patch, it is faster than I expected:

    import numpy as np
    import pandas as pd
    import rasterio as rio
    from scipy import ndimage

    # open the files (mask and src are the paths to the label raster and the source raster)
    with rio.open(mask) as f_mask, rio.open(src) as f_src:
        mask_raster = f_mask.read(1)
        src_raster = f_src.read(1)

    # use the patches' bounding boxes as slicing material
    patch_ids = []
    counts = []
    values = []
    for i, loc in enumerate(ndimage.find_objects(mask_raster)):
        if loc is None:  # this label does not appear in the raster
            continue
        patch_id = i + 1  # find_objects starts at label 1 (0 is background)
        # keep only the pixels of this patch inside its bounding box
        in_patch = mask_raster[loc] == patch_id
        loc_values, loc_counts = np.unique(src_raster[loc][in_patch], return_counts=True)

        # the value of the patch is the value with the highest count
        idx = np.argmax(loc_counts)
        patch_ids.append(patch_id)
        counts.append(loc_counts[idx])
        values.append(loc_values[idx])

    df = pd.DataFrame({'patchId': patch_ids, 'nb_pixel': counts, 'value': values})
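To try the bounding-box approach without GeoTIFF files, here is a self-contained toy version wrapped in a hypothetical helper, patch_majority (the name, the hand-made arrays and the row-tuple layout are illustrative assumptions). It uses only ndimage.find_objects, np.unique and pandas. It also restricts np.unique to the current patch's pixels with labels[loc] == patch_id, because a bounding box can contain pixels of neighbouring patches.

```python
import numpy as np
import pandas as pd
from scipy import ndimage


def patch_majority(labels, values):
    """Return one row per patch: its ID, its most frequent source value and
    how many of its pixels carry that value.

    labels: integer label image (0 = background), e.g. from ndimage.label.
    values: source raster with the same shape as labels.
    """
    rows = []
    # find_objects returns, for each label 1..labels.max(), a tuple of slices
    # (the patch's bounding box), or None if that label is absent.
    for i, loc in enumerate(ndimage.find_objects(labels)):
        if loc is None:
            continue
        patch_id = i + 1
        # The box can contain pixels of other patches, so keep only this one.
        in_patch = labels[loc] == patch_id
        loc_values, loc_counts = np.unique(values[loc][in_patch], return_counts=True)
        idx = np.argmax(loc_counts)  # majority value of the patch
        rows.append((patch_id, int(loc_counts[idx]), loc_values[idx]))
    return pd.DataFrame(rows, columns=['patchId', 'nb_pixel', 'value'])


# Hand-made toy label image: patch 1 is an L shape whose bounding box
# covers the whole array, including all of patch 2.
labels = np.array([
    [1, 1, 1, 1],
    [1, 2, 2, 2],
    [1, 2, 2, 2],
    [1, 2, 2, 2],
])
values = np.array([
    [3, 3, 3, 3],
    [3, 8, 8, 8],
    [3, 8, 8, 8],
    [5, 8, 8, 8],
])
df = patch_majority(labels, values)
print(df)
```

On this toy data the sketch reports value 3 (6 pixels) for patch 1 and value 8 (9 pixels) for patch 2. Without the in_patch mask, patch 1 would be reported as 8, a value borrowed from patch 2.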


Answer