Pytesseract read coloured text

I am trying to read coloured (red and orange) text with Pytesseract. I also tried passing the image without converting it to grayscale, but that didn’t work either.

Images that it CAN read: IMG 1, IMG 2, IMG 3

Images that it CANNOT read: IMG 1, IMG 2

My current code is:

    tesstr = pytesseract.image_to_string(
        cv2.cvtColor(nm.array(cap), cv2.COLOR_BGR2GRAY),
        config="--psm 7")

Answer

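The small function below should work for text of any colour. It converts the image to HSV, keeps only the V (value) channel, and applies an inverted Otsu threshold before passing the result to Tesseract.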

ec9Ut.png: [input image], thresh result: [image]

x18MN.png: [input image], thresh result: [image]

SFr48.png: [input image], thresh result: [image]

import cv2
from pytesseract import image_to_string

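def getText(filename):
    # Load the image; cv2.imread returns None if the file cannot be read.
    img = cv2.imread(filename)
    if img is None:
        raise FileNotFoundError(filename)
    # Convert to HSV and keep only the V (brightness) channel.
    hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    _, _, v = cv2.split(hsv_img)
    # Inverted Otsu threshold: bright pixels become black, dark pixels white.
    thresh = cv2.threshold(v, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    # "--psm 6" treats the image as a single uniform block of text;
    # the "digits" config limits recognition to numeric characters.
    return image_to_string(thresh, config="--psm 6 digits")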

text = getText('ec9Ut.png')
print(text)
text = getText('x18MN.png')
print(text)
text = getText('SFr48.png')
print(text)

Output

46
31
53
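
To illustrate why the V channel may help with red or orange text, here is a minimal pure-Python sketch of the same steps (grayscale versus V, Otsu threshold, inverted binarisation) on a made-up row of pixels. The pixel values and the helper names (gray_value, hsv_value, otsu_threshold, binary_inv) are assumptions for illustration only, not part of OpenCV, and the sketch assumes bright, saturated text on a dark background.

```python
def gray_value(r, g, b):
    # Luminance weights of the standard RGB-to-gray conversion.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def hsv_value(r, g, b):
    # The V channel of HSV is the largest of the three colour components.
    return max(r, g, b)


def otsu_threshold(values):
    # Pick the threshold that maximises the between-class variance
    # of the 8-bit histogram (the idea behind cv2.THRESH_OTSU).
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0
    sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binary_inv(values, thresh):
    # Inverted binary threshold: values above thresh become 0, the rest 255.
    return [0 if v > thresh else 255 for v in values]


# Hypothetical 10-pixel row: dark background with three pure-red text pixels.
background = (30, 30, 30)
red = (255, 0, 0)
row = [background] * 4 + [red] * 3 + [background] * 3

gray_row = [gray_value(*p) for p in row]
v_row = [hsv_value(*p) for p in row]
t = otsu_threshold(v_row)
mask = binary_inv(v_row, t)

print("gray:", gray_row)
print("V:   ", v_row)
print("mask:", mask)
```

In this toy row a pure-red pixel gets a gray level of only 76 but a V of 255, so the text stands out much more strongly against the background in the V channel. The inverted threshold then yields dark text on a white background, which is the kind of input Tesseract generally handles well. Real screenshots may behave differently, so treat this as a sketch of the idea rather than a guarantee.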
User contributions licensed under: CC BY-SA