Tesseract open source OCR Engine

Tesseract is an open source Optical Character Recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly from the command line or, for programmers, through an API to extract typed, handwritten, or printed text from images. It can recognize more than 100 languages out of the box.
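
As a rough illustration of programmatic use, the sketch below runs the `tesseract` command-line tool from Python and captures the recognized text. It assumes the `tesseract` executable is installed and on the PATH. The helper names `build_command` and `extract_text` are hypothetical, chosen for this example, and are not part of Tesseract itself.

```python
import shutil
import subprocess


def build_command(image_path, lang="eng"):
    """Build the argument list for a basic `tesseract` CLI call.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    # Assumption: tesseract is installed separately and visible on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_command(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

With an image on disk, `extract_text("scan.png")` would return whatever text Tesseract recognizes in it; `scan.png` is a placeholder file name.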

Key Features:

  • The library provides optical character recognition (OCR) support for:
    • TIFF, JPEG, GIF, PNG, and BMP image formats
    • Multi-page TIFF images
    • PDF document format
  • Out-of-the-box support for multiple languages
  • Can be trained to recognize new languages, including German, Simplified Chinese, Traditional Chinese, and Hindi
  • Provides scripts to compile the code for a variety of target environments
  • Can run OCR on a variety of source documents, including images, multi-page TIFFs, and PDFs
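
To make the language support above more concrete, here is a minimal sketch of how a language selection might be passed to the `tesseract` command line. The `-l` option accepts language codes, and several codes can be joined with `+`. The mapping and the function name `language_argument` are hypothetical helpers for this example; the sketch also assumes the trained data file for each selected language is installed.

```python
# Language codes used by Tesseract's trained data files
# (for example, deu.traineddata for German).
LANGUAGE_CODES = {
    "English": "eng",
    "German": "deu",
    "Chinese Simplified": "chi_sim",
    "Chinese Traditional": "chi_tra",
    "Hindi": "hin",
}


def language_argument(names):
    """Return the value for tesseract's -l option for the given languages.

    Raises KeyError if a name is missing from LANGUAGE_CODES.
    """
    codes = [LANGUAGE_CODES[name] for name in names]
    # Multiple languages are combined with "+", e.g. "eng+deu".
    return "+".join(codes)


if __name__ == "__main__":
    print(language_argument(["English", "German"]))
```

The resulting string would be used as in `tesseract input.png output -l eng+deu`, where `input.png` and `output` are placeholder names.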

Publisher Name
upnxtblog
