⚙️ Engines

Open-source OCR engines

Non-exhaustive list of open-source engines that can be used to build OCR applications.

Tesseract OCR

Documentation: https://tesseract-ocr.github.io/

Tesseract OCR is an open source engine originally developed at Hewlett-Packard between 1985 and 1994. HP released it as open source in 2005, and Google sponsored its development for many years starting in 2006. It is available under the Apache License 2.0. Tesseract is written in C++ and supports over 100 languages. It can be used as a standalone command-line tool or integrated into other applications via an API. Tesseract is widely used in the OCR community and serves as the recognition back end for many third-party tools and language wrappers. [Smith, 2007]
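As a rough illustration of the command-line route, the sketch below wraps the `tesseract` executable with Python's `subprocess` module. It assumes Tesseract is installed and on the `PATH`. The helper names `build_tesseract_command` and `run_tesseract` are hypothetical and not part of Tesseract itself.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=None):
    """Build an argument list for the tesseract command-line tool.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    command = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        # --psm selects the page segmentation mode (6 = a single uniform block of text).
        command += ["--psm", str(psm)]
    return command


def run_tesseract(image_path, lang="eng", psm=None):
    """Run tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout
```

With a placeholder file name, `run_tesseract("scan.png", lang="eng", psm=6)` would return the recognized text as a string. Keeping the command construction separate from the process call makes the arguments easy to inspect without Tesseract installed.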

OCRopus

Documentation: https://ocropus.github.io/

OCRopus is an open source engine whose development was led by Thomas Breuel at the German Research Centre for Artificial Intelligence (DFKI), with sponsorship from Google. It offers a set of tools and algorithms for recognizing and extracting text from scanned documents and images. Its modular architecture includes components for layout analysis, character recognition, and post-processing, and it supports multiple languages and fonts, so developers can assemble customized OCR pipelines. OCRopus is used mainly in research and development projects and has served as the foundation for other OCR tools.

Kraken

Documentation: https://kraken.re/main/index.html

Kraken is an open source OCR engine developed primarily by Benjamin Kiessling. It began as a fork of OCRopus, is written in Python, and supports over 40 languages. It is designed to be modular and extensible, allowing developers to integrate it into their applications. Kraken's features include layout analysis, text recognition, and post-processing, and it provides tools for training custom models and evaluating their performance. Kraken is used mainly in research and development projects.

Cuneiform

Documentation: https://launchpad.net/cuneiform-linux

Cuneiform is an open source OCR engine developed by Cognitive Technologies. It is written in C++ and supports over 30 languages. In addition to text recognition, it performs layout analysis and text format recognition. It is available under a BSD license. Cuneiform is used mainly in research and development projects.

GOCR

Documentation: https://jocr.sourceforge.net/

GOCR is an open source OCR engine started by Joerg Schulenburg and hosted on SourceForge under the project name JOCR. It is written in C and supports over 30 languages. It is available under the GNU General Public License. GOCR is used mainly in research and development projects.

Ocrad

Documentation: https://www.gnu.org/software/ocrad/

Ocrad is an open source OCR engine that is part of the GNU project. Antonio Diaz Diaz has developed it since 2003. It is written in C++. It can be used as a stand-alone command-line application or as a back end to other programs.

EasyOCR

Documentation: https://github.com/JaidedAI/EasyOCR

EasyOCR is an open source engine developed by JaidedAI and designed to simplify text extraction from images. It gives developers an easy-to-use interface for adding OCR to their applications with minimal code. EasyOCR supports multiple languages and can recognize text in various fonts and layouts. It relies on deep learning models, which makes it suitable for tasks such as document digitization, text recognition in images, and data extraction.
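To give a sense of how little code is involved, here is a minimal sketch that assumes the `easyocr` package is installed. `easyocr.Reader` and `readtext` are EasyOCR's own calls. The wrapper names `read_text` and `keep_confident` and the 0.5 threshold are illustrative assumptions.

```python
def keep_confident(results, min_confidence=0.5):
    """Keep the text of detections whose confidence meets the threshold.

    `results` is expected in the shape returned by EasyOCR's readtext():
    a list of (bounding_box, text, confidence) entries.
    """
    return [text for _box, text, confidence in results if confidence >= min_confidence]


def read_text(image_path, languages=("en",), min_confidence=0.5):
    """Run EasyOCR on an image and return the confidently recognized snippets."""
    # Imported lazily so keep_confident() can be used without EasyOCR installed.
    import easyocr

    # Creating a Reader loads the models (and may download them on first use).
    reader = easyocr.Reader(list(languages))
    return keep_confident(reader.readtext(image_path), min_confidence)
```

Filtering on the confidence score is one simple way to discard noisy detections. The right threshold depends on the images being processed.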

PaddleOCR

Documentation: https://github.com/PaddlePaddle/PaddleOCR

PaddleOCR is an open source engine developed by PaddlePaddle (Baidu). It provides a comprehensive suite of tools and pre-trained models for text recognition tasks. PaddleOCR utilizes deep learning algorithms, including convolutional neural networks (CNNs), to achieve accurate and efficient text extraction from images and documents. It supports various OCR functionalities, such as text detection, recognition, and layout analysis. PaddleOCR is designed to handle multiple languages and offers a flexible and customizable platform for developers to integrate OCR capabilities into their applications.

Commercial OCR engines

Non-exhaustive list of commercial engines that can be used to build OCR applications.

ABBYY FineReader

Documentation: https://www.abbyy.com/en-us/finereader/

FineReader is a commercial engine first released by ABBYY in 1993. The software uses OCR algorithms and machine learning techniques to convert scanned documents, images, and PDF files into editable and searchable formats, such as Microsoft Word, Excel, or searchable PDF. It is known for its ability to handle complex layouts, including tables, graphics, and multiple languages, while preserving the original document’s structure and formatting.

IBM Datacap

Documentation: https://www.ibm.com/products/datacap

IBM Datacap is a commercial engine initially developed by Datacap Inc., a company founded in 1989 that specialized in document capture and forms processing. In 2010, IBM acquired Datacap Inc. and incorporated its technology into the IBM Enterprise Content Management (ECM) portfolio, where IBM has continued to develop it. Datacap uses optical character recognition (OCR) and intelligent document recognition (IDR) to recognize and extract key information from documents such as invoices, forms, and contracts. It offers features like data validation, workflow automation, and integration with backend systems, allowing for efficient document routing, processing, and storage.

AWS Textract

Documentation: https://aws.amazon.com/textract/

AWS Textract is a cloud service provided by Amazon Web Services (AWS). It was announced in preview in late 2018 and made generally available in 2019. It is part of AWS’s suite of machine learning services and aims to simplify the extraction of text and data from documents using OCR and machine learning algorithms. Textract allows developers and organizations to automate the extraction of information from a variety of document types, including scanned documents, PDFs, and images.
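The sketch below shows one way to call Textract from Python and pull the recognized lines out of the response. It assumes the `boto3` package is installed and AWS credentials are configured. The helper names `detect_text` and `extract_lines` and the default region are assumptions made for illustration; `detect_document_text` is the boto3 operation for plain text detection.

```python
def detect_text(image_bytes, region_name="us-east-1"):
    """Send raw image bytes to Textract and return the response dictionary."""
    # Imported lazily so extract_lines() can be used without boto3 installed.
    import boto3

    client = boto3.client("textract", region_name=region_name)
    return client.detect_document_text(Document={"Bytes": image_bytes})


def extract_lines(response, min_confidence=0.0):
    """Collect the text of LINE blocks from a Textract response.

    Textract returns a flat list of blocks (PAGE, LINE, WORD, ...);
    each LINE block carries its text and a confidence from 0 to 100.
    """
    lines = []
    for block in response.get("Blocks", []):
        is_line = block.get("BlockType") == "LINE"
        if is_line and block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block["Text"])
    return lines
```

For example, `extract_lines(detect_text(open("invoice.png", "rb").read()), min_confidence=80)` would return only the lines Textract is reasonably sure about, where `invoice.png` is a placeholder file name.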

Google Cloud Vision

Documentation: https://cloud.google.com/vision

Google Cloud Vision was first introduced and made available as a service in 2015. It was developed by Google as part of its Cloud AI offerings, providing advanced image analysis and recognition capabilities through machine learning technologies. Google Cloud Vision enables developers and businesses to leverage image analysis algorithms for tasks such as label detection, face detection, text extraction, landmark recognition, and more. The service has since been continuously improved and expanded with additional features and enhancements including image classification, content moderation, and image-based search.
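For the text extraction feature, a request to the REST endpoint `https://vision.googleapis.com/v1/images:annotate` can be sketched as follows. The helper names `build_text_detection_request` and `full_text` are hypothetical, and authentication and the HTTP call itself are left out, so treat this as an outline of the request and response shapes rather than a complete client.

```python
import base64


def build_text_detection_request(image_bytes):
    """Build the JSON body for an images:annotate call requesting text detection."""
    return {
        "requests": [
            {
                # Image content is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def full_text(response):
    """Return the full detected text from a parsed images:annotate response.

    The first entry of textAnnotations holds the whole text; the remaining
    entries describe individual words.
    """
    annotations = response["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

The body returned by `build_text_detection_request` can be serialized with `json.dumps` and posted with any HTTP client once credentials are in place.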

Microsoft Azure Computer Vision

Documentation: https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

Microsoft Azure Computer Vision is a service offered by Microsoft Azure for image analysis and recognition. It was initially developed and released in 2015. It was created as part of Microsoft’s cloud-based artificial intelligence (AI) services to enable developers and organizations to incorporate advanced image processing capabilities into their applications. Azure Computer Vision provides a range of functionalities, including image classification, object detection, optical character recognition (OCR), image tagging, and facial recognition.

Kofax OmniPage

Documentation: https://www.kofax.com/Products/omnipage

Kofax OmniPage is OCR software sold by Kofax, designed to convert various types of documents, including scanned paper documents, PDF files, and images, into editable and searchable formats. It recognizes and extracts text from documents while maintaining the original layout and formatting. Kofax OmniPage offers features like automatic document classification, intelligent zonal recognition, and image enhancement tools.