Bounding box detection for characters / digits

I have images, which look like the following:

enter image description here

I want to find the bounding boxes for the 8 digits. My first try was to use cv2 with the following code:


Unfortunately that doesn’t work. Does anyone have an idea?

Answer

The problem in your solution is likely the input image, which is very poor in quality. There’s hardly any contrast between the characters and the background. The blob detection algorithm from cvlib is probably failing to distinguish between character blobs and background, producing a useless binary mask. Let’s try to solve this using purely OpenCV.

I propose the following steps:

  1. Apply adaptive threshold to get a reasonably good binary mask.
  2. Clean the binary mask from blob noise using an area filter.
  3. Improve the quality of the binary image using morphology.
  4. Get the outer contours of each character and fit a bounding rectangle to each character blob.
  5. Crop each character using the previously calculated bounding rectangle.

Let’s see the code:


There’s not much to discuss so far: the code just reads the BGR image and converts it to grayscale. Now, let’s apply an adaptive threshold using the Gaussian method. This is the tricky part, as the parameters have to be adjusted manually depending on the quality of the input. For each pixel, the method computes a local threshold from the Gaussian-weighted mean of the surrounding windowSize x windowSize neighborhood, so foreground and background are separated locally instead of with one global value. An additional constant, windowConstant, is subtracted from that local mean to fine-tune the output:


You get this nice binary image:

Now, as you can see, the image has some blob noise. Let’s apply an area filter to get rid of it. The noise blobs are smaller than the target blobs of interest, so we can easily filter them out based on area, like this:


This is the filtered image:

We can improve the quality of this image with some morphology. Some of the characters seem to be broken (check out the first 3: it is split into two separate blobs). We can join them by applying a closing operation:


This is the “closed” image:

Now, you want to get the bounding boxes for each character. Let’s detect the outer contour of each blob and fit a nice rectangle around it:


The last for loop is pretty much optional. It fetches each bounding rectangle from the list and draws it on the input image, so you can see each individual rectangle, like this:

Let’s visualize that on the binary image:

Additionally, if you want to crop each character using the bounding boxes we just got, you do it like this:

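Cropping is plain NumPy slicing, keeping in mind that arrays are indexed as rows (y) first and columns (x) second. A minimal sketch, assuming the boxes are (x, y, w, h) tuples as returned by `cv2.boundingRect`; the name `crop_characters` is illustrative:

```python
import numpy as np


def crop_characters(image, boxes):
    """Return one cropped copy of the image per (x, y, w, h) box."""
    # .copy() detaches each crop from the source array, so later edits
    # to a crop do not modify the original image.
    return [image[y:y + h, x:x + w].copy() for x, y, w, h in boxes]
```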

This is how you can get the individual bounding boxes. Now, maybe you are trying to pass these images to an OCR engine. I tried passing the filtered binary image (after the closing operation) to pyocr (that’s the OCR library I’m using) and I got this output string: 31197402

The code I used to get the OCR of the closed image is this:


Be aware that the OCR expects black characters on a white background, so you must invert the image first.

User contributions licensed under: CC BY-SA