Tesseract is an open source optical character recognition (OCR) engine, available under the Apache 2.0 license. It can be used directly or, for programmers, through an API to extract typed, handwritten, or printed text from images. It supports a wide variety of languages.
Key Features:
- The library provides optical character recognition (OCR) support for:
  - TIFF, JPEG, GIF, PNG, and BMP image formats
  - Multi-page TIFF images
  - PDF document format
- Out-of-box support for multiple languages
- Capability to train for new languages, including German, Chinese Simplified, Chinese Traditional, and Hindi
- Provides scripts to compile the code for a variety of target environments
- Can run OCR on a variety of source documents, including multi-page TIFFs, images, and PDFs
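
To make the "used directly" option concrete, here is a minimal Python sketch that calls the Tesseract command-line tool on a single image and returns the recognized text. It assumes the `tesseract` executable is installed and on the PATH, and that the language data for the requested language is available. The helper names `build_tesseract_command` and `ocr_image`, and the file name `scan.png`, are made up for this illustration and are not part of Tesseract.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", output_base="stdout"):
    """Return the argument list for one Tesseract CLI call (hypothetical helper)."""
    # General form: tesseract <image> <outputbase> -l <lang>
    # Using "stdout" as the output base prints the text instead of writing a file.
    return ["tesseract", image_path, output_base, "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text (hypothetical helper)."""
    # Assumption: the tesseract binary is installed and reachable on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the recognized text from stdout
        text=True,            # decode output as str rather than bytes
        check=True,           # raise CalledProcessError if Tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.png" is a placeholder path; replace it with a real image file.
    print(ocr_image("scan.png", lang="eng"))
```

Passing a different language code, for example `lang="deu"` for German, only works if the matching language data is installed alongside Tesseract.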