I am trying to implement an image stippling algorithm in Python, and want to vectorize the calculation of the density (average luminance) of labelled image regions (Voronoi cells). Currently I’m able to do so using a loop, but this is too computationally intensive for large numbers of regions. How can I vectorize this operation?
import numpy as np
from skimage import io
from scipy.interpolate import griddata

number_of_points = 1000
img = io.imread('https://www.kindpng.com/picc/m/111-1114964_house-icon-png-old-house-easy-drawing-transparent.png', as_gray=True)
height, width = img.shape

# generate random points
rng = np.random.default_rng()
points = rng.random((number_of_points,2)) * [width, height]

# calculate labelled regions
grid_x, grid_y = np.mgrid[0:width, 0:height]
labels = griddata(points, np.arange(number_of_points), (grid_x, grid_y), method='nearest')

# calculate density per region (mean of grayscale values of pixels in each region)
point_idxs = np.arange(len(points))
density = [np.mean(img[labels.T==i]) for i in point_idxs]  # <-- this is the bottleneck
Answer
The problem is not the loop itself but the fact that this algorithm is not efficient. Vectorizing it directly would use a lot of memory (which is slow) and barely speed up the loop. Indeed, img is fully read len(point_idxs) times, once per region. It can instead be read only once using np.add.at and np.bincount:
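# accumulate the luminance of every pixel into the slot of its label
sumByLabel = np.zeros(np.max(labels)+1)
np.add.at(sumByLabel, labels.T, img)
# count how many pixels carry each label
countByLabel = np.bincount(labels.reshape(-1))
# mean luminance per region
density = sumByLabel / countByLabel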
This takes 32 ms on my machine while the initial code takes 539 ms (17x faster).
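As a possible variation on the same idea, the per-label sum can also be computed with the weights argument of np.bincount instead of np.add.at. The sketch below is not the code timed above: it runs on a small synthetic image with random integer labels standing in for the Voronoi labels, and the helper name region_means and its n_regions parameter are made up for illustration. It assumes the label array has the same orientation as the image (the labels.T of the question) and returns NaN for any label that owns no pixels rather than dividing by zero.

```python
import numpy as np

def region_means(img, labels, n_regions):
    """Mean of img over each integer label (hypothetical helper).

    labels must have the same shape as img and contain values in
    [0, n_regions). Labels that own no pixels get NaN.
    """
    flat_labels = labels.ravel()
    # Sum of pixel values per label, computed in a single pass over the image.
    sums = np.bincount(flat_labels, weights=img.ravel(), minlength=n_regions)
    # Number of pixels per label.
    counts = np.bincount(flat_labels, minlength=n_regions)
    # Divide only where a label has at least one pixel; leave NaN elsewhere.
    means = np.full(n_regions, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

# Synthetic stand-ins for the image and the Voronoi labels (assumptions).
rng = np.random.default_rng(0)
img = rng.random((60, 80))
labels = rng.integers(0, 50, size=img.shape)

fast = region_means(img, labels, 50)

# Reference result using the original per-region loop, skipping empty labels.
slow = np.array([
    img[labels == i].mean() if np.any(labels == i) else np.nan
    for i in range(50)
])
```

With the variables from the question, the call would look like region_means(img, labels.T, number_of_points). Passing the number of regions explicitly keeps the output length fixed even if the highest-numbered points end up with no pixels. Whether this is faster than np.add.at likely depends on the NumPy version and was not measured here.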