Unable to extract a word out of an image


I've written a script in Python, using pytesseract, to extract a word from an image. The image contains only the single word TOOLS, and that is what I'm after. Currently the script below gives me the wrong output, WIS. What can I do to get the correct text?

Link to that image

This is my script:

    import requests, io, pytesseract
    from PIL import Image

    response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
    img = Image.open(io.BytesIO(response.content))
    img = img.resize([100,100], Image.ANTIALIAS)
    img = img.convert('L')
    img = img.point(lambda x: 0 if x < 170 else 255)
    imagetext = pytesseract.image_to_string(img)
    print(imagetext)
    # img.show()

This is the status of the modified image when I run the above script:

[Image: the modified image produced by the script above]

The output I'm getting:

WIS 

Expected output:

TOOLS 

 


The key is to match the image transformations to what Tesseract can handle. Your main problem is that the font is an unusual one. All you need is:

    import io

    import pytesseract
    import requests
    from PIL import Image, ImageEnhance, ImageFilter

    response = requests.get('http://facweb.cs.depaul.edu/sgrais/images/Type/Tools.jpg')
    img = Image.open(io.BytesIO(response.content))

    # remove texture
    enhancer = ImageEnhance.Color(img)
    img = enhancer.enhance(0)   # decolorize
    img = img.point(lambda x: 0 if x < 250 else 255) # set threshold
    img = img.resize([300, 100], Image.LANCZOS) # resize to remove noise
    img = img.point(lambda x: 0 if x < 250 else 255) # get rid of remains of noise

    # adjust font weight
    img = img.filter(ImageFilter.MaxFilter(11)) # lighten the font ;)

    imagetext = pytesseract.image_to_string(img)
    print(imagetext)

And voilà,

TOOLS

is recognized.
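To see why the threshold plus MaxFilter steps "lighten" the font, here is a minimal pure-Python sketch of the same idea on a tiny grayscale patch. It is not Pillow's implementation: `threshold`, `max_filter`, and the `patch` values are made up for illustration, and the border handling (windows clipped at the edge) is an assumption of this sketch. A max filter replaces each pixel with the brightest value in its neighbourhood, so white eats into the dark strokes and the glyphs get thinner.

```python
def threshold(grid, cutoff=250):
    """Map each pixel to 0 (black) if it is below cutoff, else 255 (white)."""
    return [[0 if px < cutoff else 255 for px in row] for row in grid]


def max_filter(grid, size=3):
    """Replace each pixel with the maximum value in its size x size window.

    Windows are clipped at the image border (an assumption of this sketch).
    """
    half = size // 2
    height, width = len(grid), len(grid[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                grid[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


# Hypothetical 7x7 patch: a 3-pixel-wide dark vertical stroke on a light,
# slightly noisy background.
patch = [[255, 252, 40, 30, 45, 251, 255] for _ in range(7)]

binary = threshold(patch)            # noise becomes pure white, stroke pure black
thinned = max_filter(binary, size=3) # white grows into the stroke from both sides

for row in thinned:
    print("".join("#" if px == 0 else "." for px in row))
```

With a 3x3 window the 3-pixel stroke shrinks to 1 pixel. The answer uses a window of 11 on the 300x100 image because the strokes of that font are much heavier; the right size for another image is something to find by trial.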
