How to efficiently loop over an image pixel by pixel in python OpenCV?

Question

What I want to do is to loop over an image pixel by pixel, using each pixel value to draw a circle in another, corresponding image. My approach is as follows: Looping this way is somewhat slow. I tried adding the @njit decorator of numba, but apparently it has problems with OpenCV. Input images are 32×32 p…

Accepted Answer

When:

- dealing with drawings
- the number of possible options does not exceed a common-sense value (in this case: 256)
- speed is important (I guess that's always the case)
- there's no other restriction preventing this approach

the best way would be to "cache" the drawings: draw them upfront (or on demand, depending on the acceptable overhead) in another array, and when the drawing would normally take place, simply take the appropriate drawing from the cache and place it in the target area (as @ChristophRackwitz stated in one of the comments). That is a very fast NumPy operation compared to drawing.

As a side note, this is a generic method, not necessarily limited to drawings.

But the results you claim you're getting, ~100 ms per 32×32 image (rendered to a 640×640 image of circles), didn't make any sense to me (OpenCV is also fast, and 1024 circles shouldn't be such a big deal), so I created a program to convince myself.

code00.py:

    #!/usr/bin/env python

    import itertools as its
    import sys
    import time

    import cv2
    import numpy as np


    def draw_img_orig(arr_in, arr_out):
        factor = round(arr_out.shape[0] / arr_in.shape[0])
        factor_2 = factor // 2
        it = np.nditer(arr_in, flags=["multi_index"])
        while not it.finished:
            y, x = it.multi_index
            color = int(it[0])  # Python int, so 8 * color can't wrap around in uint8 (NumPy 2 promotion rules)
            it.iternext()
            center = (x * factor + factor_2, y * factor + factor_2)  # corresponding circle center
            cv2.circle(arr_out, center, int(8 * color / 255), 255, -1)


    def draw_img_regular_iter(arr_in, arr_out):
        factor = round(arr_out.shape[0] / arr_in.shape[0])
        factor_2 = factor // 2
        for row_idx, row in enumerate(arr_in):
            for col_idx, col in enumerate(row):
                cv2.circle(arr_out, (col_idx * factor + factor_2, row_idx * factor + factor_2),
                           int(8 * int(col) / 255), 255, -1)


    def draw_img_cache(arr_in, arr_out, cache):
        factor = round(arr_out.shape[0] / arr_in.shape[0])
        it = np.nditer(arr_in, flags=["multi_index"])
        while not it.finished:
            y, x = it.multi_index
            yf = y * factor
            xf = x * factor
            arr_out[yf: yf + factor, xf: xf + factor] = cache[it[0]]  # copy the pre-drawn circle for this value
            it.iternext()


    def generate_input_images(shape, count, dtype=np.uint8):
        return np.random.randint(256, size=(count,) + shape, dtype=dtype)


    def generate_circles(shape, dtype=np.uint8, count=256, rad_func=lambda arg: int(8 * arg / 255), color=255):
        ret = np.zeros((count,) + shape, dtype=dtype)
        cy = shape[0] // 2
        cx = shape[1] // 2
        for idx, arr in enumerate(ret):  # arr is a view into ret, so cv2.circle draws in place
            cv2.circle(arr, (cx, cy), rad_func(idx), color, -1)
        return ret


    def test_draw(imgs_in, img_out, count, draw_func, *draw_func_args):
        print("\nTesting {:s}".format(draw_func.__name__))
        start = time.time()
        for i, e in enumerate(its.cycle(range(imgs_in.shape[0]))):
            draw_func(imgs_in[e], img_out, *draw_func_args)
            if i + 1 >= count:  # stop after exactly count images
                break
        print("Took {:.3f} seconds ({:d} images)".format(time.time() - start, count))


    def test_speed(shape_in, shape_out, dtype=np.uint8):
        imgs_in = generate_input_images(shape_in, 50, dtype=dtype)
        img_out = np.zeros(shape_out, dtype=dtype)
        circles = generate_circles((shape_out[0] // shape_in[0], shape_out[1] // shape_in[1]))
        count = 250
        funcs_data = (
            (draw_img_orig,),
            (draw_img_regular_iter,),
            (draw_img_cache, circles),
        )
        for func_data in funcs_data:
            test_draw(imgs_in, img_out, count, func_data[0], *func_data[1:])


    def test_accuracy(shape_in, shape_out, dtype=np.uint8):
        # Values 0..255 repeated (4 times for a 32x32 input)
        img_in = (np.arange(np.prod(shape_in)) % 256).astype(dtype).reshape(shape_in)
        circles = generate_circles((shape_out[0] // shape_in[0], shape_out[1] // shape_in[1]))
        funcs_data = (
            (draw_img_orig, "orig.png"),
            (draw_img_regular_iter, "regit.png"),
            (draw_img_cache, "cache.png", circles),
        )
        imgs_out = [np.zeros(shape_out, dtype=dtype) for _ in funcs_data]
        for idx, func_data in enumerate(funcs_data):
            func_data[0](img_in, imgs_out[idx], *func_data[2:])
            cv2.imwrite(func_data[1], imgs_out[idx])
        for idx, img in enumerate(imgs_out[1:], start=1):
            if not np.array_equal(img, imgs_out[0]):
                print("Image index different: {:d}".format(idx))


    def main(*argv):
        dt = np.uint8
        shape_in = (32, 32)
        factor_io = 20
        shape_out = tuple(i * factor_io for i in shape_in)
        test_speed(shape_in, shape_out, dtype=dt)
        test_accuracy(shape_in, shape_out, dtype=dt)


    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.\n")
        sys.exit(rc)

Notes:

- Besides your implementation that uses np.nditer (which I placed in a function called draw_img_orig), I created 2 more:
  - one that iterates the input array Pythonically (draw_img_regular_iter)
  - one that uses cached circles, and also iterates via np.nditer (draw_img_cache)
- In terms of tests, there are 2 of them, each one performed on all 3 (above) approaches:
  - Speed: measure the time taken to process a number of images
  - Accuracy: measure the output for a 32×32 input containing the interval [0, 255] (4 times)

Output:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q071818080]> sopr.bat
    ### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

    [prompt]> dir /b
    code00.py

    [prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" code00.py
    Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32


    Testing draw_img_orig
    Took 0.908 seconds (250 images)

    Testing draw_img_regular_iter
    Took 1.061 seconds (250 images)

    Testing draw_img_cache
    Took 0.426 seconds (250 images)

    Done.

    [prompt]>
    [prompt]> dir /b
    cache.png
    code00.py
    orig.png
    regit.png

Above are the speed test results: as seen, your approach took a bit less than a second for 250 images!
So I was right: I don't know where your slowness comes from, but it's not from here (maybe you got the measurements wrong?). The regular method is a bit slower, while the cached one is ~2X faster.

I ran the code on my laptop:

- Win 10 pc064
- CPU: Intel i7 6820HQ @ 2.70GHz (fairly old)
- GPU: not relevant, as I didn't notice any spikes during execution

Regarding the accuracy test, all (3) output arrays are identical (there's no message saying otherwise). Here's one saved image:
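The caching idea described above can be pushed one step further: since the cache is just an array indexed by pixel value, NumPy integer-array indexing can fetch the tiles for all 1024 input pixels at once, and a transpose plus reshape stitches them into the output image with no per-pixel Python loop. The following is a minimal sketch under stated assumptions, not a tested drop-in for the code above: it assumes a uint8 input, square 20×20 tiles and a radius of int(8 * value / 255), and to avoid an OpenCV dependency it renders the cache with a NumPy distance mask instead of cv2.circle, so disk edges may differ slightly from OpenCV's rasterization. The names make_disk_cache, render_from_cache and render_from_cache_loop are hypothetical.

```python
import time

import numpy as np


def make_disk_cache(tile, levels=256, max_radius=8):
    """Pre-render one tile per possible pixel value (the "cache").

    Assumption: a filled disk built from a NumPy distance mask stands in for
    cv2.circle, so edge pixels may differ slightly from OpenCV's output.
    """
    yy, xx = np.mgrid[:tile, :tile]
    c = tile // 2
    dist2 = (yy - c) ** 2 + (xx - c) ** 2
    # Integer floor division, equal to int(8 * v / 255) for levels=256, max_radius=8
    radii = np.arange(levels) * max_radius // (levels - 1)
    # Broadcast (1, tile, tile) against (levels, 1, 1): cache[v] holds a disk of radius radii[v]
    return np.where(dist2[None] <= (radii ** 2)[:, None, None], 255, 0).astype(np.uint8)


def render_from_cache(img, cache):
    """Build the output mosaic with a single fancy-indexing lookup, no Python loop."""
    h, w = img.shape
    t = cache.shape[1]
    tiles = cache[img]  # shape (h, w, t, t): the cached tile for every input pixel
    # (h, w, t, t) -> (h, t, w, t) so tile rows interleave with image rows, then flatten
    return tiles.transpose(0, 2, 1, 3).reshape(h * t, w * t)


def render_from_cache_loop(img, cache):
    """Per-pixel reference: copy cache[value] into its tile, like draw_img_cache."""
    h, w = img.shape
    t = cache.shape[1]
    out = np.zeros((h * t, w * t), dtype=cache.dtype)
    for y in range(h):
        for x in range(w):
            out[y * t:(y + 1) * t, x * t:(x + 1) * t] = cache[img[y, x]]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
    cache = make_disk_cache(tile=20)

    start = time.perf_counter()
    fast = render_from_cache(img, cache)
    print(f"vectorized: {time.perf_counter() - start:.5f} s")

    start = time.perf_counter()
    slow = render_from_cache_loop(img, cache)
    print(f"loop:       {time.perf_counter() - start:.5f} s")

    print(fast.shape, np.array_equal(fast, slow))
```

Because the lookup works on any array shaped (256, tile, tile), passing the answer's generate_circles((20, 20)) as the cache instead of make_disk_cache should give tiles drawn by cv2.circle itself. No timings are claimed here; they depend on the machine, and time.perf_counter is used because it is intended for measuring short intervals.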


Answer