Q: How to speed this up?
Below is my implementation of Matlab’s im2col ‘sliding’ with the additional feature of returning every n’th column. The function takes an image (or any 2 dim array) and slides from left to right, top to bottom, picking off every overlapping sub-image of a given size, and returning an array whose columns are the sub-images.
import numpy as np
def im2col_sliding(image, block_size, skip=1):
rows, cols = image.shape
horz_blocks = cols - block_size[1] + 1
vert_blocks = rows - block_size[0] + 1
output_vectors = np.zeros((block_size[0] * block_size[1], horz_blocks * vert_blocks))
itr = 0
for v_b in xrange(vert_blocks):
for h_b in xrange(horz_blocks):
output_vectors[:, itr] = image[v_b: v_b + block_size[0], h_b: h_b + block_size[1]].ravel()
itr += 1
return output_vectors[:, ::skip]
example:
a = np.arange(16).reshape(4, 4)
print a
print im2col_sliding(a, (2, 2)) # return every overlapping 2x2 patch
print im2col_sliding(a, (2, 2), 4) # return every 4th vector
returns:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
[[ 0. 1. 2. 4. 5. 6. 8. 9. 10.]
[ 1. 2. 3. 5. 6. 7. 9. 10. 11.]
[ 4. 5. 6. 8. 9. 10. 12. 13. 14.]
[ 5. 6. 7. 9. 10. 11. 13. 14. 15.]]
[[ 0. 5. 10.]
[ 1. 6. 11.]
[ 4. 9. 14.]
[ 5. 10. 15.]]
The performance isn’t great, especially considering whether I call im2col_sliding(big_matrix, (8, 8))
(62001 columns) or im2col_sliding(big_matrix, (8, 8), 10)
(6201 columns; keeping only every 10th vector) it will take the same amount of time [where big_matrix is of size 256 x 256].
I’m looking for any ideas to speed this up.
Advertisement
Answer
Approach #1
We could use some broadcasting
here to get all the indices of all those sliding windows in one go and thus with indexing achieve a vectorized solution
. This is inspired by Efficient Implementation of im2col and col2im
.
Here’s the implementation –
def im2col_sliding_broadcasting(A, BSZ, stepsize=1):
# Parameters
M,N = A.shape
col_extent = N - BSZ[1] + 1
row_extent = M - BSZ[0] + 1
# Get Starting block indices
start_idx = np.arange(BSZ[0])[:,None]*N + np.arange(BSZ[1])
# Get offsetted indices across the height and width of input array
offset_idx = np.arange(row_extent)[:,None]*N + np.arange(col_extent)
# Get all actual indices & index into input array for final output
return np.take (A,start_idx.ravel()[:,None] + offset_idx.ravel()[::stepsize])
Approach #2
Using newly gained knowledge of NumPy array strides
that lets us create such sliding windows, we would have another efficient solution –
def im2col_sliding_strided(A, BSZ, stepsize=1):
# Parameters
m,n = A.shape
s0, s1 = A.strides
nrows = m-BSZ[0]+1
ncols = n-BSZ[1]+1
shp = BSZ[0],BSZ[1],nrows,ncols
strd = s0,s1,s0,s1
out_view = np.lib.stride_tricks.as_strided(A, shape=shp, strides=strd)
return out_view.reshape(BSZ[0]*BSZ[1],-1)[:,::stepsize]
Approach #3
The strided method listed in the previous approach has been incorporated into scikit-image
module for a less messier, like so –
from skimage.util import view_as_windows as viewW
def im2col_sliding_strided_v2(A, BSZ, stepsize=1):
return viewW(A, (BSZ[0],BSZ[1])).reshape(-1,BSZ[0]*BSZ[1]).T[:,::stepsize]
Sample runs –
In [106]: a # Input array
Out[106]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
In [107]: im2col_sliding_broadcasting(a, (2,3))
Out[107]:
array([[ 0, 1, 2, 5, 6, 7, 10, 11, 12],
[ 1, 2, 3, 6, 7, 8, 11, 12, 13],
[ 2, 3, 4, 7, 8, 9, 12, 13, 14],
[ 5, 6, 7, 10, 11, 12, 15, 16, 17],
[ 6, 7, 8, 11, 12, 13, 16, 17, 18],
[ 7, 8, 9, 12, 13, 14, 17, 18, 19]])
In [108]: im2col_sliding_broadcasting(a, (2,3), stepsize=2)
Out[108]:
array([[ 0, 2, 6, 10, 12],
[ 1, 3, 7, 11, 13],
[ 2, 4, 8, 12, 14],
[ 5, 7, 11, 15, 17],
[ 6, 8, 12, 16, 18],
[ 7, 9, 13, 17, 19]])
Runtime test
In [183]: a = np.random.randint(0,255,(1024,1024))
In [184]: %timeit im2col_sliding(img, (8,8), skip=1)
%timeit im2col_sliding_broadcasting(img, (8,8), stepsize=1) :
%timeit im2col_sliding_strided(img, (8,8), stepsize=1) :
%timeit im2col_sliding_strided_v2(img, (8,8), stepsize=1) :
:
1 loops, best of 3: 1.29 s per loop
1 loops, best of 3: 226 ms per loop
10 loops, best of 3: 84.5 ms per loop
10 loops, best of 3: 111 ms per loop
In [185]: %timeit im2col_sliding(img, (8,8), skip=4)
%timeit im2col_sliding_broadcasting(img, (8,8), stepsize=4) :
%timeit im2col_sliding_strided(img, (8,8), stepsize=4) :
%timeit im2col_sliding_strided_v2(img, (8,8), stepsize=4) :
:
1 loops, best of 3: 1.31 s per loop
10 loops, best of 3: 104 ms per loop
10 loops, best of 3: 84.4 ms per loop
10 loops, best of 3: 109 ms per loop
Around 16x
speedup there with the strided method over the original loopy version!