Vectorize calculation of density of image regions

I am trying to implement an image stippling algorithm in Python and want to vectorize the calculation of the density (average luminance) of labelled image regions (Voronoi cells). I can currently do this with a loop, but it is too slow for large numbers of regions. How can I vectorize this operation?

import numpy as np
from skimage import io
from scipy.interpolate import griddata

number_of_points = 1000
img = io.imread('https://www.kindpng.com/picc/m/111-1114964_house-icon-png-old-house-easy-drawing-transparent.png', as_gray=True)
height, width = img.shape

# generate random points
rng = np.random.default_rng()
points = rng.random((number_of_points,2)) * [width, height]

# calculate labelled regions
grid_x, grid_y = np.mgrid[0:width, 0:height]
labels = griddata(points, np.arange(number_of_points), (grid_x, grid_y), method='nearest')

# calculate density per region (mean of grayscale values of pixels in each region)
point_idxs = np.arange(len(points))
density = [np.mean(img[labels.T==i]) for i in point_idxs] # <-- this is the bottleneck

Answer

The problem is not the loop itself but the inefficiency of the algorithm. Vectorizing it directly would use a lot of memory (which is slow) and barely speed up the computation. Indeed, img is read in full len(point_idxs) times, once per region. It can instead be read just once using np.add.at and np.bincount:

sumByLabel = np.zeros(np.max(labels)+1)
np.add.at(sumByLabel, labels.T, img)
countByLabel = np.bincount(labels.reshape(-1))
density = sumByLabel / countByLabel

This takes 32 ms on my machine while the initial code takes 539 ms (17x faster).
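
As a possible variation on the same idea, the per-label sums can also be computed with np.bincount by passing the pixel values as weights, so both the sums and the counts come from the same function. The sketch below is only an illustration under stated assumptions: it uses a small random image and a random integer label map (the hypothetical name label_map plays the role of labels.T in the question) rather than the real Voronoi labels, and it has not been timed against the np.add.at version. Note that np.bincount needs non-negative integer labels, so if labels comes back as a floating-point array it would have to be cast to an integer type first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the question's data: a small random grayscale
# image and an integer label map with the same shape as the image.
height, width, number_of_points = 60, 80, 50
img = rng.random((height, width))
label_map = rng.integers(0, number_of_points, size=(height, width))

# Flatten both arrays in the same order so that pixel k pairs with label k.
flat_labels = label_map.ravel()
flat_values = img.ravel()

# Sum of pixel values per label; minlength keeps one slot per point
# even if some labels never occur in the label map.
sum_by_label = np.bincount(flat_labels, weights=flat_values,
                           minlength=number_of_points)
# Number of pixels per label.
count_by_label = np.bincount(flat_labels, minlength=number_of_points)

# Mean per label; regions without any pixels are left as NaN.
density = np.full(number_of_points, np.nan)
nonempty = count_by_label > 0
density[nonempty] = sum_by_label[nonempty] / count_by_label[nonempty]
```

The minlength argument and the NaN handling are design choices for this sketch: with many points on a small image, some cells may contain no pixels at all, and dividing by a zero count would otherwise produce warnings.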

Answer