Tag: python-tesseract

How to improve Hindi text extraction?

I am trying to extract Hindi text from a PDF. I tried every method I could find to extract it directly from the PDF, but none of them worked. There are explanations of why it doesn’t work, but no actual solutions. So I decided to convert the PDF to images and then use pytesseract to extract the text. I have downloaded the Hindi trained
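The approach described in the question (render each PDF page to an image, then OCR it with the Hindi model) could look roughly like the sketch below. It is not the asker's code: it assumes the pdf2image package (which needs Poppler), the pytesseract package, the Tesseract binary, and the Hindi language data (language code "hin") are all installed. The function names join_languages and extract_text_from_pdf are hypothetical, and 300 DPI is only a common starting point for OCR, not a value from the question.

```python
def join_languages(codes):
    """Build Tesseract's multi-language string, e.g. ["hin", "eng"] -> "hin+eng"."""
    return "+".join(codes)


def extract_text_from_pdf(pdf_path, languages=("hin",), dpi=300):
    """Render every PDF page to an image and OCR it (hypothetical helper)."""
    # Imported lazily so join_languages can be used without these packages.
    from pdf2image import convert_from_path  # assumption: Poppler is installed
    import pytesseract  # assumption: Tesseract and hin.traineddata are installed

    lang = join_languages(languages)
    # convert_from_path returns one PIL image per page.
    pages = convert_from_path(pdf_path, dpi=dpi)
    # OCR each page with the requested language(s) and join the results.
    return "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)
```

Calling extract_text_from_pdf("document.pdf", languages=("hin", "eng")) with a hypothetical file name would OCR pages that mix Hindi and English. A higher dpi usually helps with small Devanagari glyphs, at the cost of speed.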

Pytesseract read coloured text

I am trying to read coloured (red and orange) text with Pytesseract. I tried skipping the grayscale conversion, but that didn’t work either. (The question showed images that it can read, images that it cannot read, and the current code; none of these are reproduced here.)

Answer: This little function will work for any colour. (The function, the three threshold result images, and the output are not reproduced here.)
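One way to make coloured text readable is sketched below. It is not the function from the original answer. It assumes the text sits on a white or light background and that Pillow and pytesseract are installed. The idea is to take the darkest of the three channels for each pixel (low for red, orange, or black text, high for a light background) and threshold that into black text on white. The names binarize and read_coloured_text, and the threshold of 128, are hypothetical choices for illustration.

```python
def binarize(value, threshold=128):
    """Map a channel value to 0 (text) or 255 (background)."""
    return 0 if value < threshold else 255


def read_coloured_text(image_path, threshold=128):
    """OCR coloured text on a light background (hypothetical helper)."""
    # Imported lazily so binarize can be used without these packages.
    from PIL import Image, ImageChops
    import pytesseract

    image = Image.open(image_path).convert("RGB")
    r, g, b = image.split()
    # Per-pixel minimum of the three channels: saturated colours such as
    # red (255, 0, 0) or orange (255, 165, 0) have at least one low channel,
    # while a white background stays high in all three.
    darkest = ImageChops.darker(r, ImageChops.darker(g, b))
    # Threshold into pure black text on a pure white background.
    mask = darkest.point(lambda v: binarize(v, threshold))
    return pytesseract.image_to_string(mask)
```

If the text is light on a dark background instead, the same idea applies with the per-pixel maximum (ImageChops.lighter) and an inverted threshold.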
