Top 8 OCR Libraries in Python to Extract Text from Image

Ayushi Trivedi Last Updated : 11 Dec, 2024
11 min read

Have you ever wondered how your computer can read text from images? It is all thanks to Optical Character Recognition, or OCR. In Python, several libraries help your computer understand text in pictures. From the widely used Tesseract engine to EasyOCR’s deep learning models, these libraries can do some pretty amazing stuff. Let’s take a look at Python’s image-to-text OCR libraries and see how they turn images into readable text!

Learning Objectives:

  • Understand what optical character recognition (OCR) is and its applications
  • Explore the top 8 OCR libraries in Python: EasyOCR, Doctr, Keras-OCR, Tesseract, GOCR, Pytesseract, OpenCV, and Amazon Textract
  • Learn how to install and implement each OCR library in Python

EasyOCR

EasyOCR simplifies text extraction from images in Python with its user-friendly approach and deep learning-powered model. It supports multiple languages, making it versatile for international applications. Whether it’s printed or handwritten, horizontally or vertically aligned, EasyOCR handles various text styles and orientations. Its efficiency makes it a candidate for real-time applications. Because EasyOCR is open source, users can modify it, contribute to it, and tailor it to their own requirements. It also provides a dependable and easy-to-use method for extracting text from photos, optimizing document processing, creating apps, and improving accessibility.

Steps to Install and Implement EasyOCR

Step1: Install Python

First, ensure you have Python installed on your system. You can download it from the official Python website and follow the installation instructions.

Step2: Install EasyOCR

Once Python is installed, open your command line or terminal and run the following command to install EasyOCR using pip.

pip install easyocr

Step3: Install Dependencies

EasyOCR has a few dependencies that need to be installed. Don’t worry; pip will take care of these for you automatically.

Step4: Usage of EasyOCR

Now that EasyOCR is installed, your Python scripts can use it. This is a basic illustration of how to extract text from an image using EasyOCR.

import easyocr

# Create an OCR reader object
reader = easyocr.Reader(['en'])

# Read text from an image
result = reader.readtext('image.jpg')

# Print the extracted text
for detection in result:
    print(detection[1])
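
Each entry returned by readtext() is a (bounding box, text, confidence) tuple, which is why the loop above prints detection[1]. The following is a minimal sketch of using the confidence value to drop uncertain detections. The helper name confident_text and the sample detections are hypothetical and stand in for a real readtext() call:

```python
def confident_text(detections, min_confidence=0.5):
    """Return the text of detections whose confidence meets the threshold.

    `detections` is expected in the shape EasyOCR's readtext() returns:
    a list of (bounding_box, text, confidence) tuples.
    """
    return [text for _box, text, confidence in detections
            if confidence >= min_confidence]


# Hypothetical sample data standing in for reader.readtext('image.jpg')
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Invoice", 0.98),
    ([[10, 50], [90, 50], [90, 80], [10, 80]], "T0tal", 0.31),
    ([[100, 50], [180, 50], [180, 80], [100, 80]], "42.00", 0.87),
]

print(confident_text(sample))
```

The 0.5 threshold is only an illustrative default; a suitable value depends on your images.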

With EasyOCR installed, you can now extract text from images in your Python programs. EasyOCR makes text extraction simple, whether you’re improving accessibility or automating data entry.

Doctr

Doctr (docTR) is a Python library for document text recognition built on deep learning. It locates text regions within documents and applies optical character recognition to extract the text from formats such as images and PDFs. It does not itself perform semantic tasks such as named entity recognition or sentiment analysis; those are typically handled by downstream NLP tools that consume its output. Doctr is designed to be efficient enough for large document volumes in production environments. It encourages community contributions and offers an extensible architecture for custom components.

Steps to Install and Implement Doctr

Step1: Install Doctr

You can install Doctr using pip, Python’s package manager. The package is published on PyPI under the name python-doctr. Open your command line or terminal and run the following command.

pip install python-doctr

Step2: Import the necessary modules

In your Python script or notebook, import the Doctr modules you’ll need for your document processing tasks. For example:

from doctr.models import ocr_predictor

Step3: Load a document

Depending on your use case, load the document you want to process. Doctr supports various document formats, including PDFs and images.

Step4: Perform document understanding tasks

Use Doctr’s functionalities to perform tasks such as text detection and text extraction. For example, you can use the OCR predictor to extract text from an image:

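from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load an image as a document
image_path = 'example_image.jpg'
doc = DocumentFile.from_images(image_path)

# Create an OCR predictor with pretrained weights
predictor = ocr_predictor(pretrained=True)

# Perform OCR on the document
result = predictor(doc)

# Print the extracted text
print(result.render())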
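
The result is hierarchical: pages contain blocks, blocks contain lines, and lines contain words. The following is a minimal sketch of walking that structure once it has been converted to a dictionary with result.export(). It assumes each word entry carries its text under a 'value' key; the helper name words_by_line and the sample dictionary are hypothetical:

```python
def words_by_line(exported):
    """Join the words of every line in an exported docTR-style result."""
    lines = []
    for page in exported.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                lines.append(" ".join(words))
    return lines


# Hypothetical sample standing in for result.export()
sample_export = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {"words": [{"value": "Hello"}, {"value": "world"}]},
                        {"words": [{"value": "Second"}, {"value": "line"}]},
                    ]
                }
            ]
        }
    ]
}

print("\n".join(words_by_line(sample_export)))
```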

Step5: Integrate with your workflow

Once you’re satisfied with your implementation, integrate Doctr into your application to automate document processing tasks and streamline your workflow.

These steps will help you install and use Doctr in your Python environment so that you can quickly and effectively complete document processing tasks.

Keras-OCR

Keras-OCR is a Python library that simplifies OCR tasks using the Keras and TensorFlow frameworks. It offers pre-trained models that work across various text and font styles, and its user-friendly API allows for easy implementation. Keras-OCR offers flexibility in configuration, allowing for customization of parameters like input image size and target language. Its open-source nature fosters a collaborative environment and makes it straightforward to integrate OCR capabilities into Python applications.

Steps to Install and Implement Keras-OCR

To implement Keras-OCR for text recognition in Python, follow these steps:

Step1: Install Keras-OCR

Use pip to install the Keras-OCR library in your Python environment.

pip install keras-ocr

Step2: Import Necessary Modules

In your Python script or notebook, import the required modules from Keras-OCR.

import keras_ocr

Step3: Load Pre-Trained Model

Keras-OCR provides pre-trained models for text detection and recognition. You can load them by creating a pipeline.Pipeline() object.

pipeline = keras_ocr.pipeline.Pipeline()

Step4: Perform Text Recognition

Use the loaded pipeline to perform text recognition on images. The recognize() method expects a list of images loaded as arrays, which keras_ocr.tools.read() can produce from file paths.

# Load the images from their file paths
images = [keras_ocr.tools.read(path) for path in ['image1.jpg', 'image2.jpg']]
predictions = pipeline.recognize(images)

This will return predictions for each image, containing information about the detected text regions and the recognized text.

Step5: Display Results

You can then iterate through the predictions to display the recognized text and visualize the text regions:

for image, prediction in zip(images, predictions):
    keras_ocr.tools.drawAnnotations(image=image, predictions=prediction)
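
Each prediction is a list of (word, box) pairs, where the box holds the four corner points of the detected word, and the words are not guaranteed to arrive in reading order. The following is a minimal sketch of arranging them top-to-bottom and left-to-right. The helper name reading_order, the row_tolerance value, and the sample prediction are hypothetical:

```python
def reading_order(prediction, row_tolerance=10):
    """Arrange (word, box) pairs top-to-bottom, then left-to-right.

    Each box is a sequence of four (x, y) corner points. A word joins the
    current row when its top edge is within `row_tolerance` pixels of the
    top edge of that row's first word.
    """
    def top(item):
        return min(y for _x, y in item[1])

    def left(item):
        return min(x for x, _y in item[1])

    rows = []
    for item in sorted(prediction, key=top):
        if rows and abs(top(item) - top(rows[-1][0])) <= row_tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [" ".join(word for word, _box in sorted(row, key=left))
            for row in rows]


# Hypothetical sample standing in for one entry of pipeline.recognize(images)
sample_prediction = [
    ("world", [[80, 12], [140, 12], [140, 30], [80, 30]]),
    ("hello", [[10, 10], [70, 10], [70, 30], [10, 30]]),
    ("again", [[10, 50], [70, 50], [70, 70], [10, 70]]),
]

print(reading_order(sample_prediction))
```

This simple row grouping assumes roughly horizontal text; skewed or multi-column layouts would need a more careful approach.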

Step6: Integration

Finally, integrate the text recognition functionality into your Python application or workflow as needed.

You can easily implement Keras-OCR for text recognition in your Python projects by following these steps. These enable you to extract text from images with high accuracy and efficiency.

Tesseract

Tesseract is an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google. It is known for its strong accuracy in deciphering text from images. It supports over 100 languages and can handle various image types, including scanned documents and photographs. Users can customize parameters like page segmentation mode and language models to optimize recognition accuracy. Tesseract encourages community contributions and is easily integrated with Python, providing a straightforward interface for developers to incorporate OCR capabilities into their applications.

Steps to Install and Implement Tesseract

To use Tesseract OCR in Python, you need to install the pytesseract library, which wraps the Tesseract engine. Here are the detailed steps:

Step1: Install Tesseract

First, you need to install the Tesseract OCR engine on your system. You can download and install it from the official Tesseract project page.

Step2: Install pytesseract

Next, install the pytesseract library using pip:

pip install pytesseract

Step3: Import pytesseract

Import the pytesseract module in your Python script or notebook:

import pytesseract

Step4: Set Tesseract Path (Optional)

If Tesseract is not on your system PATH, set the pytesseract.pytesseract.tesseract_cmd variable to the location of the Tesseract executable:

pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
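
Before hard-coding a path, it can help to check whether the executable is already discoverable. The following is a minimal sketch using only the standard library. The helper name locate_tesseract is hypothetical, and the final assignment is left as a comment so the snippet runs even without pytesseract installed:

```python
import shutil


def locate_tesseract(candidates=("tesseract",)):
    """Return the first candidate that resolves to an executable, else None."""
    for candidate in candidates:
        resolved = shutil.which(candidate)
        if resolved:
            return resolved
    return None


path = locate_tesseract()
if path is None:
    print("Tesseract not found; install it or set tesseract_cmd manually")
else:
    print(f"Tesseract found at {path}")
    # pytesseract.pytesseract.tesseract_cmd = path
```

You can pass additional candidates, such as an absolute path from your own installation, as extra entries in the tuple.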

Step5: Perform OCR

Use the image_to_string() function to perform OCR on an image. Pass the image file path as an argument:

# Perform OCR on an image
text = pytesseract.image_to_string('image.jpg')

This will extract text from the image and store it in the text variable.

Step6: Display Results

You can then print or manipulate the extracted text as needed:

print(text)

By following these instructions, you can quickly integrate Tesseract OCR into your Python environment to extract text from images. Remember that Tesseract’s accuracy varies with a number of factors, including language, text complexity, and image quality. For particular use cases, adjusting the parameters and preprocessing the images can help increase OCR accuracy.
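
One way to see where accuracy suffers is to look at per-word confidence. pytesseract.image_to_data() with output_type=pytesseract.Output.DICT returns parallel lists that include 'text' and 'conf'. The following is a minimal sketch of filtering on them. The helper name confident_words, the threshold of 60, and the sample dictionary are hypothetical, and the sample stands in for a real image_to_data() call:

```python
def confident_words(data, min_conf=60):
    """Keep non-empty words whose confidence is at least `min_conf`.

    `data` mirrors the dictionary form of image_to_data(): parallel lists
    under 'text' and 'conf'. Confidence may arrive as a string or a number
    depending on the version, so it is converted with float().
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


# Hypothetical sample; entries with confidence -1 are layout elements, not words
sample_data = {
    "text": ["", "Total", "am0unt", "42.00"],
    "conf": ["-1", 96, 41.5, "88"],
}

print(confident_words(sample_data))
```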

GOCR

GOCR is an open-source OCR engine released under the GNU General Public License that lets users extract text from images on a range of platforms. It provides basic text recognition features and runs on numerous systems, but it is mostly focused on English and offers little support for other languages. Because it is no longer actively developed and its language support is limited, it may be less effective for some applications than more modern options.

Steps to Install and Implement GOCR

Implementing GOCR involves installing the program and using its command-line interface (CLI) to perform optical character recognition on images. Here is a general overview of the steps:

Step1: Install GOCR

Depending on your operating system, you may be able to install GOCR using package managers like apt on Ubuntu or Homebrew on macOS. Alternatively, you can download the source code and compile it manually.

Step2: Prepare Images

Prepare the images containing the text you want to recognize. Ensure that the images are clear and of sufficient quality for accurate OCR.

Step3: Run the library from the Command Line

Use the GOCR command-line interface to perform OCR on your images. Here’s a basic command to run GOCR on an image file named “image.jpg”.

gocr image.jpg

It will process the image and output the recognized text to the terminal.

Step4: Process Output

Once GOCR has finished processing the image, you can capture the output text from the terminal and use it in your application as needed.
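
Capturing the terminal output from Python can be sketched with the standard subprocess module. This is a minimal sketch that assumes the gocr executable is on your PATH and accepts the image path as its only argument, as in the command above. The helper name gocr_text is hypothetical:

```python
import shutil
import subprocess


def gocr_text(image_path, executable="gocr"):
    """Run the GOCR command-line tool on an image and return its output.

    Returns None when the executable cannot be found on PATH or when the
    tool exits with a non-zero status.
    """
    if shutil.which(executable) is None:
        return None
    completed = subprocess.run(
        [executable, image_path],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as text
    )
    if completed.returncode != 0:
        return None
    return completed.stdout.strip()


text = gocr_text("image.jpg")
print(text if text is not None else "GOCR did not produce any output")
```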

Keep in mind that this library may have limitations compared to more modern OCR engines in terms of accuracy, language support, and ease of use. If GOCR does not meet your needs, assess your requirements and consider other OCR options.

Pytesseract

Pytesseract is a Python wrapper that integrates the Tesseract OCR engine into Python programs. It offers an efficient way to perform optical character recognition, and its simple interface lets users extract text from images with very little code. Pytesseract supports a variety of languages, including English, French, Spanish, and German, provided the corresponding Tesseract language data is installed, and it works on Windows, macOS, and Linux. It can process text in different fonts, sizes, and styles, and developers can adjust OCR parameters to maximize accuracy. Additionally, Pytesseract works with the Python imaging library Pillow, enabling preprocessing prior to OCR.

Steps to Install and Implement Pytesseract

Implementing Pytesseract involves installing the pytesseract library and using it to perform optical character recognition (OCR) on images. Here’s how to use Pytesseract in Python, step by step:

Step1: Install Tesseract

Before using pytesseract, you need to install the Tesseract OCR engine on your system. You can download and install it from the official Tesseract project page.

Step2: Install pytesseract

Next, install the pytesseract library using pip:

pip install pytesseract

Step3: Import pytesseract

Import the pytesseract module in your Python script or notebook:

import pytesseract

Step4: Perform OCR on an Image

Use the image_to_string() function from pytesseract to perform OCR on an image. Pass the image file path as an argument:

# Perform OCR on an image
text = pytesseract.image_to_string('image.jpg')

This will extract text from the image and store it in the text variable.

Step5: Optional Configuration

You can configure pytesseract to use specific OCR parameters, such as language and page segmentation mode. For example:

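# Point pytesseract at the Tesseract executable if it is not on your PATH
pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
# Set the tessdata directory and page segmentation mode (6 = a single uniform block of text)
custom_config = '--tessdata-dir "/usr/share/tesseract-ocr/4.00/tessdata" --psm 6'
# Set the language explicitly (default is English)
text = pytesseract.image_to_string('image.jpg', lang='eng', config=custom_config)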
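
The config argument is a plain string of Tesseract command-line options, so it can be assembled programmatically. The following is a minimal sketch covering the --tessdata-dir, --oem (OCR engine mode), and --psm (page segmentation mode) options. The helper name build_config is hypothetical, and the pytesseract call is left as a comment so the snippet runs on its own:

```python
def build_config(psm=None, oem=None, tessdata_dir=None):
    """Assemble a Tesseract option string suitable for the `config` argument."""
    options = []
    if tessdata_dir is not None:
        options.append(f'--tessdata-dir "{tessdata_dir}"')
    if oem is not None:
        options.append(f"--oem {oem}")
    if psm is not None:
        options.append(f"--psm {psm}")
    return " ".join(options)


# psm 6 treats the image as a single uniform block of text; oem 3 is the default engine mode
config = build_config(psm=6, oem=3)
print(config)
# text = pytesseract.image_to_string('image.jpg', lang='eng', config=config)
```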

Step6: Display Results

Finally, you can print or manipulate the extracted text as needed:

print(text)

These steps will help you quickly integrate Pytesseract into your Python environment so that you can use OCR to extract text from images. Remember that a number of factors, like language, text complexity, and image quality, can affect how accurate OCR is. For particular use cases, adjusting the parameters and preprocessing the images can help increase OCR accuracy.

OpenCV

OpenCV, originally created by Intel and maintained by a global developer community, is an essential tool for computer vision and machine learning. It provides an extensive range of features and techniques for uses such as image processing, object detection, face recognition, augmented reality, and robotics. OpenCV’s Python interface facilitates quick development and prototyping, and its cross-platform compatibility makes it accessible across multiple systems. As a foundational computer vision library, it integrates well with other Python libraries such as NumPy, SciPy, and TensorFlow, which allows developers to build inventive applications across a range of domains. In OCR workflows, it is typically used to prepare images before they are passed to an OCR engine.

Steps to Install and Implement OpenCV

Implementing OpenCV involves installing the library and using its features to carry out different computer vision tasks. Here is a simple illustration of how to process images using OpenCV in Python:

Step1: Install OpenCV

Use pip to install the OpenCV library in your Python environment.

pip install opencv-python

Step2: Import OpenCV

Import the OpenCV library in your Python script or notebook:

import cv2

Step3: Read an Image

Use the cv2.imread() function to read an image from a file:

# Read an image from file
image = cv2.imread('image.jpg')

Step4: Display the Image

Use the cv2.imshow() function to display the image in a window:

# Display the image in a window
cv2.imshow('Image', image)

Step5: Wait for User Input

Use the cv2.waitKey() function to wait for a key press to close the window:

# Wait for a key press and close the window
cv2.waitKey(0)
cv2.destroyAllWindows()

Step6: Perform Image Processing (Optional)

You can use various OpenCV functions to perform image processing tasks, such as resizing, cropping, filtering, and more:

# Resize the image (the target size is given as width, height; values here are examples)
width, height = 800, 600
resized_image = cv2.resize(image, (width, height))

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply Gaussian blur to the image
blurred_image = cv2.GaussianBlur(image, (5, 5), 0)
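
When resizing for OCR, you usually want to preserve the aspect ratio so the characters are not distorted. Note that cv2.resize() takes the target size as (width, height), while image.shape reports (height, width, channels). The following is a minimal sketch of computing the target size; the helper name scaled_size and the example dimensions are hypothetical, and the OpenCV calls are left as comments so the snippet runs on its own:

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) that preserves the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_width / width
    return target_width, max(1, round(height * scale))


print(scaled_size(1920, 1080, 960))

# With a loaded image:
# height, width = image.shape[:2]
# resized_image = cv2.resize(image, scaled_size(width, height, 960))
```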

Step7: Save the Processed Image (Optional)

Use the cv2.imwrite() function to save the processed image to a file:

# Save the processed image (here, the grayscale version) to file
cv2.imwrite('processed_image.jpg', gray_image)

By following these steps, you can easily implement OpenCV in your Python environment to perform various image processing tasks. OpenCV offers a wide range of functions and capabilities, allowing you to manipulate images, detect objects, track motion, and much more. Experimenting with different functions and parameters will help you explore the full potential of OpenCV for your computer vision applications.

Amazon Textract

Amazon Textract is a machine learning service by Amazon Web Services (AWS) that efficiently extracts text and data from documents. It uses advanced algorithms to identify and analyze structured data, including text, tables, and forms. It is particularly useful for financial reports and invoices. Textract automates key-value pair extraction and form data extraction, streamlining data entry and processing workflows. It also offers advanced document analysis functionalities. Amazon Textract is integrated with other AWS services, ensuring scalability, high performance, and reliability. It also provides a secure environment for document processing across various sectors, including finance, healthcare, legal, and government.

Steps to Install and Implement Amazon Textract

Implementing Amazon Textract involves using the AWS SDK to interact with the Textract API. Here’s a high-level overview of the steps to implement Amazon Textract in Python:

Step1: Set Up AWS Credentials

Ensure you have AWS credentials configured with appropriate permissions to access the Textract service.

Step2: Install the AWS SDK

Install the AWS SDK for Python (Boto3) using pip:

pip install boto3

Step3: Initialize the Textract Client

Create a Textract client object using the Boto3 library and your AWS credentials:

import boto3

# Initialize Textract client
textract_client = boto3.client(
    'textract',
    region_name='your-region',
    aws_access_key_id='your-access-key-id',
    aws_secret_access_key='your-secret-access-key'
)

Step4: Process Documents

Use the analyze_document() method of the Textract client to analyze documents and extract text and data:

# Process document
response = textract_client.analyze_document(
    Document={
        'S3Object': {
            'Bucket': 'your-bucket-name',
            'Name': 'your-document-key'
        }
    },
    FeatureTypes=['TABLES', 'FORMS']
)

This will return a response containing extracted text, tables, and forms from the document.

Step5: Access Extracted Data

Extracted text, tables, and forms can be accessed from the response object and further processed as needed:

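# All detected elements are returned as a flat list of blocks
blocks = response['Blocks']

# Extract lines of text
extracted_text = [block['Text'] for block in blocks if block['BlockType'] == 'LINE']

# Extract tables
extracted_tables = [block for block in blocks if block['BlockType'] == 'TABLE']

# Extract forms (key-value pairs)
extracted_forms = [block for block in blocks if block['BlockType'] == 'KEY_VALUE_SET']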
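
The KEY_VALUE_SET blocks do not hold text directly. A key block links to its value block through a VALUE relationship, and both link to their WORD blocks through CHILD relationships. The following is a minimal sketch of resolving those links into a dictionary. The helper names text_of and key_value_pairs and the sample blocks are hypothetical, and the sketch ignores selection elements such as checkboxes:

```python
def text_of(block, blocks_by_id):
    """Concatenate the WORD children of a block into a single string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks):
    """Map each form key to its value using KEY_VALUE_SET relationships."""
    blocks_by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                value_text = " ".join(
                    text_of(blocks_by_id[value_id], blocks_by_id)
                    for value_id in relationship["Ids"]
                )
        pairs[text_of(block, blocks_by_id)] = value_text
    return pairs


# Hypothetical sample standing in for response['Blocks']
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice"},
    {"Id": "w2", "BlockType": "WORD", "Text": "number:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "INV-001"},
]

print(key_value_pairs(sample_blocks))
```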

Step6: Handle Errors and Exceptions

Implement error handling to gracefully handle exceptions and errors that may occur during document processing:

try:
    response = textract_client.analyze_document(
        Document={
            'S3Object': {
                'Bucket': 'your-bucket-name',
                'Name': 'your-document-key'
            }
        },
        FeatureTypes=['TABLES', 'FORMS']
    )
except Exception as e:
    print(f'Error processing document: {e}')

Step7: Further Processing and Integration

Depending on your application requirements, you may need to further process the extracted text, tables, and forms, and integrate them into your workflow or application.

By following these steps, you can implement Amazon Textract in your Python application to extract text and data from documents stored in Amazon S3. Make sure to refer to the AWS documentation for detailed information on the Textract API and its usage.

Conclusion

Optical character recognition (OCR) lets computers read text from images, enabling a wide range of applications. This article covered eight OCR tools you can use from Python, each with its own strengths: EasyOCR is user-friendly, Tesseract is accurate and supports many languages, and Amazon Textract handles structured documents efficiently. These libraries cater to diverse needs and use cases, automating tasks, streamlining workflows, and extracting valuable insights from unstructured data. With advancements in machine learning and computer vision, the future of OCR holds promising prospects for innovation and enhancement.

Key Takeaways:

  • OCR enables computers to extract text from images and documents, automating data entry and analysis tasks
  • Python offers a variety of OCR libraries with different features, strengths, and use cases
  • Libraries like EasyOCR, Tesseract, and Amazon Textract are powerful tools for accurate text recognition and data extraction

Frequently Asked Questions

Q1. What library is used for OCR?

A. For OCR, libraries like Tesseract, EasyOCR, and PyOCR are commonly used.

Q2. Which is the best OCR library?

A. Tesseract is often considered the best OCR library due to its accuracy and support for multiple languages.

Q3. What does OCR mean in library?

A. In a library context, OCR (Optical Character Recognition) refers to technology that converts different types of documents into editable and searchable data.

Q4. What is the OCR library in Python?

A. In Python, Tesseract (via the pytesseract wrapper) and EasyOCR are popular libraries for OCR tasks.

My name is Ayushi Trivedi. I am a B. Tech graduate. I have 3 years of experience working as an educator and content editor. I have worked with various python libraries, like numpy, pandas, seaborn, matplotlib, scikit, imblearn, linear regression and many more. I am also an author. My first book named #turning25 has been published and is available on amazon and flipkart. Here, I am technical content editor at Analytics Vidhya. I feel proud and happy to be AVian. I have a great team to work with. I love building the bridge between the technology and the learner.
