I have, for example, the following image:
Is there any way of identifying the embedded image between the text?
I know you can use OpenCV's matchTemplate, but that only works if you already know what the embedded image looks like. I have spent a lot of time searching for how to identify an image within an image in Python when the embedded image is not known in advance, but I have not been able to find anything. Please let me know what I can do.
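For reference, this is roughly what template matching looks like when the template is already known (the filenames here are just placeholders):

import cv2 as cv

img = cv.imread('page.png', cv.IMREAD_GRAYSCALE)               # full screenshot
templ = cv.imread('known_subimage.png', cv.IMREAD_GRAYSCALE)   # the image you already have

# slide the template over the page and score the match at every position
res = cv.matchTemplate(img, templ, cv.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv.minMaxLoc(res)                      # best score and its top-left corner

In my case there is no templ to pass in, which is the whole problem.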
Answer
Here’s my attempt. This is how it works:
- To detect text, check how different each pixel is from its neighbor. This is done with an absolute difference (a toy illustration of this follows the list).
- The previous step only detects the edges of text, so expand the result with a Gaussian blur.
- Threshold the blurred result to get a text mask, and zero out those pixels.
- Crop away the remaining whitespace with a bounding box around the non-zero pixels.
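As a small illustration of the first step (a toy example only, not part of the pipeline below): a hard text-like edge produces a large neighbor difference, while a smooth photo-like gradient produces only small ones.

import numpy as np

row = np.array([200, 200, 200, 0, 0, 0, 10, 20, 30, 40], dtype='int16')
diff = np.abs(row - np.roll(row, 1))
# diff is [160, 0, 0, 200, 0, 0, 10, 10, 10, 10]
# the 200 marks the sharp text-like edge, the 10s come from the smooth gradient,
# and the leading 160 is just the wrap-around from np.roll
print(diff)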
It uses NumPy, OpenCV, and SciPy.
Full code:
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
import scipy.ndimage

img_orig = cv.imread('image_extract.png')

def find_text(gray, gaussian_size_px=10, text_threshold=10):
    flat = gray.flatten().astype('int16')
    # difference each pixel against the pixel 1 position forward
    differenced_image = np.abs(flat - np.roll(flat, 1)).reshape(gray.shape)
    differenced_image = scipy.ndimage.gaussian_filter(differenced_image, sigma=gaussian_size_px)
    is_text = differenced_image > text_threshold
    return is_text

def remove_text(img, minpool_size=3):
    is_text = find_text(img)
    image_only = np.where(is_text, 0, img)
    # filter out small bright pixels with convolved minimum
    image_only = scipy.ndimage.minimum_filter(image_only, size=minpool_size)
    return image_only

def find_subimage(img_orig):
    gray = cv.cvtColor(img_orig, cv.COLOR_BGR2GRAY)
    # invert colors so white = 0 and black = 255
    gray = np.max(gray) - gray
    image_only = remove_text(gray)
    coords = cv.findNonZero(image_only)
    x, y, w, h = cv.boundingRect(coords)
    cropped = img_orig[y:y+h, x:x+w]
    return cropped

plt.imshow(find_subimage(img_orig))
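If you want to keep the crop or display it in its original colors (OpenCV loads images in BGR order, while matplotlib expects RGB), something along these lines should work; the output filename is just an example:

cropped = find_subimage(img_orig)
cv.imwrite('subimage.png', cropped)                 # OpenCV writes BGR directly
plt.imshow(cv.cvtColor(cropped, cv.COLOR_BGR2RGB))  # convert to RGB for matplotlib
plt.show()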
Output: