I have a 2D array, where I label clusters using the ndimage.label()
function like this:
    import numpy as np
    from scipy.ndimage import label

    input_array = np.array([[0, 1, 1, 0],
                            [1, 1, 0, 0],
                            [0, 0, 0, 1],
                            [0, 0, 0, 1]])
    labeled_array, _ = label(input_array)

    # Result:
    # labeled_array == [[0, 1, 1, 0],
    #                   [1, 1, 0, 0],
    #                   [0, 0, 0, 2],
    #                   [0, 0, 0, 2]]
I can get the element counts, the centroids or the bounding box of the labeled clusters. But I would also like to get the coordinates of every element in each cluster. Something like this (the data structure doesn’t have to be like this, any data structure is okay):
    {
        1: [(0, 1), (0, 2), (1, 0), (1, 1)],  # Coordinates of the elements that have the label "1"
        2: [(2, 3), (3, 3)]                   # Coordinates of the elements that have the label "2"
    }
I can loop over the labels and call np.where() for each one of them, but I wonder if there is a way to do this without a loop, so that it would be faster?
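For reference, the loop-based baseline described above might look like the following minimal sketch. It assumes the labels run from 1 to `labeled_array.max()` and starts from the labeled array shown earlier; the name `coords_by_label` is illustrative, not part of the original code.

```python
import numpy as np

# Labeled array from the example above (as produced by scipy.ndimage.label).
labeled_array = np.array([[0, 1, 1, 0],
                          [1, 1, 0, 0],
                          [0, 0, 0, 2],
                          [0, 0, 0, 2]])

# Assumption: labels are the consecutive integers 1..max, with 0 as background.
num_labels = int(labeled_array.max())

coords_by_label = {}
for lab in range(1, num_labels + 1):
    # Each iteration scans the whole array once, so cost grows with the label count.
    rows, cols = np.where(labeled_array == lab)
    coords_by_label[lab] = list(zip(rows.tolist(), cols.tolist()))

print(coords_by_label)
# {1: [(0, 1), (0, 2), (1, 0), (1, 1)], 2: [(2, 3), (3, 3)]}
```

Every pass compares the full array against one label, which is why this gets slow when there are many clusters.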
Answer
You can make a map of the coordinates, sort and split it:
    # Get the indexes (coordinates) of the labeled (non-zero) elements
    ind = np.argwhere(labeled_array)

    # Get the labels corresponding to those indexes above
    labels = labeled_array[tuple(ind.T)]

    # Sort both arrays so that lower label numbers appear before higher label numbers.
    # This is not for cosmetic reasons, but we will use sorted nature of these label
    # indexes when we use the "diff" method in the next step.
    sort = labels.argsort()
    ind = ind[sort]
    labels = labels[sort]

    # Find the split points where a new label number starts in the ordered label numbers
    splits = np.flatnonzero(np.diff(labels)) + 1

    # Create a data structure out of the label numbers and indexes (coordinates).
    # The first argument to the zip is: we take the 0th label number and the label
    # numbers at the split points.
    # The second argument is the indexes (coordinates), split at split points,
    # so the length of both arguments to the zip function is the same.
    result = {k: v for k, v in zip(labels[np.r_[0, splits]], np.split(ind, splits))}
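The steps above can be wrapped into a self-contained function, sketched below. The function name `coords_by_label` is hypothetical, and two details are additions rather than part of the answer: a stable sort (`kind="stable"`), so the coordinates inside each label keep their row-major order, and an early return for an array with no labeled elements, where indexing the 0th label would otherwise fail.

```python
import numpy as np

def coords_by_label(labeled_array):
    """Hypothetical helper: map each non-zero label to an (n, ndim) array of coordinates."""
    ind = np.argwhere(labeled_array)          # coordinates of all non-zero elements
    if ind.size == 0:                         # assumption: no labels means an empty result
        return {}
    labels = labeled_array[tuple(ind.T)]      # label value at each coordinate
    order = labels.argsort(kind="stable")     # stable: keeps row-major order within a label
    ind, labels = ind[order], labels[order]
    splits = np.flatnonzero(np.diff(labels)) + 1   # positions where a new label starts
    keys = labels[np.r_[0, splits]].tolist()       # first label of each group, as Python ints
    return dict(zip(keys, np.split(ind, splits)))

labeled_array = np.array([[0, 1, 1, 0],
                          [1, 1, 0, 0],
                          [0, 0, 0, 2],
                          [0, 0, 0, 2]])

result = coords_by_label(labeled_array)

# Convert to the list-of-tuples layout asked for in the question.
as_tuples = {k: [tuple(c) for c in v.tolist()] for k, v in result.items()}
print(as_tuples)
# {1: [(0, 1), (0, 2), (1, 0), (1, 1)], 2: [(2, 3), (3, 3)]}
```

The values stay as NumPy arrays until the final conversion, so skip that last step if array slices are enough for your use case.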