I’m trying to agglomerate adjacent cells (and their neighbours) that have the same type (an integer from 1 to 10) into new clusters by assigning them a cluster id.
As visualised here for some of the clusters:
Currently, I use an adaptation of Breadth-First Search to go through all neighbours and their neighbours and then assign a cluster id to all found neighbours of the same type. Since my data set is fairly big (just under 3 million grid cells), this operation would take days in its current form.
My function, which I run in a loop, choosing start cells at random until all cells have been assigned a cluster id:
```python
def bfs(df, grid_id):
    visited = []
    queue = []
    visited.append(grid_id)
    queue.append(grid_id)
    base_pred = df.loc[grid_id, "class"]
    while queue:
        _ = queue.pop(0)
        n, e = get_n_e_index(grid_id)
        neighbours = get_same_neighbours(base_pred, n, e, agglo_df)
        for neighbour in neighbours:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
        queue = queue + get_same_neighbours(base_pred, n, e, agglo_df)
    return visited
```
`visited` is used to assign the same cluster id.
`get_same_neighbours` just returns the neighbours of a given cell that have the same type.
Before I ended up validating the approach, I came to the conclusion that it’s ultimately too slow, and was wondering if anybody knows of an algorithm fast enough to solve this. Searching online for a solution didn’t help.
Edit: That it took so long wasn’t an issue of the algorithm, but of me forgetting to reduce the possible start cells to only those that hadn’t been touched before.
Edit2: `get_n_e_index` just converts a string ID to north and east integers.
```python
def get_same_neighbours(base_class, north, east, df):
    neighbours = []
    grid_ids = df["ids"].values
    for i in [-1, 1, 0]:
        for j in [-1, 1, 0]:
            n = int(north + i)
            e = int(east + j)
            grid_id = get_grid_id(n, e)
            invalid = [(-1, 1), (1, 1), (0, 0), (1, -1), (-1, -1)]  # invalid combinations
            if grid_id in grid_ids and (j, i) not in invalid and base_class == df.loc[grid_id, "class"]:
                neighbours.append(grid_id)
    return neighbours
```
`get_grid_id` converts north (n) and east (e) integers to the string id of the grid cell.
Answer
You’re making a couple of mistakes. Note that for ~3 million cells, this should work within about a second.
As pointed out in the comments, `visited` should be a set for O(1) lookup.
Second, `queue = queue + get_same_neighbours(base_pred, n, e, agglo_df)` can get very slow if the queue gets long, as it creates a copy of the entire queue on every iteration. Instead, simply `append` the new items.
Finally, your `queue` is not actually a queue but a list, so popping the first element takes O(n) time. Instead you can use a real queue (`collections.deque` in Python), or pop the last element instead, which turns breadth-first search into depth-first search and should work just as well here.
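Putting those three fixes together, a corrected search might look like the following sketch. The demo grid, the `same_neighbours` helper, and the 4-neighbourhood are stand-ins for the question’s dataframe and `get_same_neighbours`, so adapt them to your own helpers:

```python
from collections import deque

def bfs(start, same_neighbours):
    """Collect all cells reachable from `start` through same-class
    neighbours. `same_neighbours(cell)` plays the role of the
    question's get_same_neighbours."""
    visited = {start}        # a set: O(1) membership tests
    queue = deque([start])   # a deque: O(1) pops from the front
    while queue:
        current = queue.popleft()
        for neighbour in same_neighbours(current):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)  # append only, no list copies
    return visited

# Tiny demo grid: (north, east) -> class
grid = {(0, 0): 1, (0, 1): 1, (1, 0): 2, (1, 1): 1}

def same_neighbours(cell):
    n, e = cell
    return [(n + dn, e + de)
            for dn, de in [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if grid.get((n + dn, e + de)) == grid[cell]]

cluster = bfs((0, 0), same_neighbours)
print(cluster)  # the three class-1 cells form one cluster
```

Note that the popped `current` cell is the one whose neighbours are expanded each iteration; in the original code the start cell’s neighbours were queried on every pass.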
Apart from this, I think the only remaining problems are in the graph/grid representation. Storing it as a dataframe seems to require scanning the entire dataframe to find a cell. If you store it as a multidimensional array (be it nested `list`s or a NumPy `array`) or even as a `dict`, you can find any cell in O(1) time. Therefore I’d suggest first preprocessing the data from the dataframe into another data structure that grants easy access.
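As a minimal sketch of that preprocessing (the column names `ids`, `north`, `east`, and `class` are assumptions based on the question; your real dataframe has millions of rows), one dict comprehension replaces the repeated dataframe scans:

```python
import pandas as pd

# Hypothetical miniature of the question's dataframe.
df = pd.DataFrame({
    "ids":   ["n0e0", "n0e1", "n1e0"],
    "north": [0, 0, 1],
    "east":  [0, 1, 0],
    "class": [3, 3, 7],
})

# One pass over the dataframe; afterwards every cell lookup is O(1).
cell_class = {(n, e): c
              for n, e, c in zip(df["north"], df["east"], df["class"])}

print(cell_class[(0, 1)])  # → 3
```

With this dict in place, `get_same_neighbours` reduces to a handful of hash lookups per cell instead of a `df.loc` call and an `in grid_ids` scan.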