Understanding Label Detection in Invoices using OpenCV

Hari Bhutanadhu Last Updated : 12 Mar, 2023


Introduction

Document image analysis is the name for the algorithms and methods used to turn the pixels in an image into a description that a computer can understand. Optical Character Recognition, or OCR, uses computer vision to find and read the text in images. Modern OCR models can often do this accurately and within milliseconds for a single image. OCR was one of the first problems that computer vision tried to solve, and it has come a long way since then. With the help of these OCR models, we found a way to detect labels on invoices, such as the vendor’s name, the bill date, the bill number, the bill amount, and the total number of items. To get a high level of accuracy, we used an ensemble technique in which different OCR models handle the detection and the recognition of the labels separately.

Learning Objectives

Below are the major learning objectives of this article:

You will learn how to use OpenCV for label detection on an invoice, such as the invoice number, invoice date, total amount, total number of items, etc.
You will learn how to get the coordinates of text from any invoice image.
You will learn the steps in image preprocessing.
You will learn how to tell which template a new invoice uses, based on the template image dataset.
You will go through code snippets that illustrate the objectives above.

This article was published as a part of the Data Science Blogathon.

Basic Architecture

Let’s say we need to detect labels on invoices from different templates and are given a template labels dataset consisting of the labels’ names for several templates.
The coordinates of the required labels for each template are stored in a table (csv file).
Layout mapping is done to find the template of a new invoice so that its labels can be found using the coordinates that have already been stored.
After the template is found, the coordinates of its labels are retrieved from the table (csv file).
The retrieved coordinates are used to predict the labels of the new invoice.
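
The lookup step above can be sketched as follows. This is a minimal illustration rather than the code used in this work: the article does not show the layout of the csv file, so the column names (template, label, x_min, x_max, y_min, y_max), the template names, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Hypothetical layout of the coordinates table; the real csv file is not shown
# in the article, so these column names and values are assumptions.
SAMPLE_CSV = """template,label,x_min,x_max,y_min,y_max
vendor_a,invoice_number,420,560,40,70
vendor_a,invoice_date,420,560,80,110
vendor_b,invoice_number,60,210,150,180
"""


def load_label_coordinates(csv_file):
    """Group label boxes by template: {template: {label: (x_min, x_max, y_min, y_max)}}."""
    table = {}
    for row in csv.DictReader(csv_file):
        box = tuple(int(row[key]) for key in ("x_min", "x_max", "y_min", "y_max"))
        table.setdefault(row["template"], {})[row["label"]] = box
    return table


coordinates = load_label_coordinates(io.StringIO(SAMPLE_CSV))
# Once the template of a new invoice is known, its label boxes are one lookup away.
print(coordinates["vendor_a"]["invoice_date"])
```

With a real file, io.StringIO(SAMPLE_CSV) would be replaced by an opened csv file.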

Image Preprocessing of Invoices

Since the input is an image of an invoice, preprocessing is a very important step that helps us get better results. For this, we used skew correction, binarisation, noise filtering, and contour detection.

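#binarisation
import cv2
import matplotlib.pyplot as plt
# img must be a single-channel (grayscale) 8-bit image
res = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                            cv2.THRESH_BINARY, 11, 2)
plt.figure(figsize=(100, 60))
plt.imshow(res, 'gray')
plt.show()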

#noise filtering
# expects a colour (3-channel) 8-bit image; keep the returned result
denoised = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)

#skew correction
import numpy as np
from skimage import io
from skimage.transform import rotate
from skimage.color import rgb2gray
from deskew import determine_skew
# _img is the path to the invoice image
image = io.imread(_img)
grayscale = rgb2gray(image)
# estimated skew angle in degrees
angle = determine_skew(grayscale)
# rotate returns floats in [0, 1], so scale back to the 8-bit range
rotated = rotate(image, angle, resize=True) * 255
rotated = rotated.astype(np.uint8)

Contour detection is done because the invoice can sit at a different place in each image, and we need to find it. We find the image’s largest contour and crop the image to fit it. The cv2.findContours() function finds the contours, the cv2.contourArea() method picks the contour with the largest area, and the image is then cropped (warped) to that contour.

# thresh is the binarised image; width and height are the desired output size
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
# Find Biggest Contour
areas = [cv2.contourArea(c) for c in contours]
max_index = np.argmax(areas)
# Find approxPoly Of Biggest Contour
epsilon = 0.1 * cv2.arcLength(contours[max_index], True)
approx = cv2.approxPolyDP(contours[max_index], epsilon, True)
# Crop The Image (assumes approx has exactly four corners, ordered to match points)
points1 = np.float32(approx).reshape(4, 2)
points = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
# Compute the transform first, then warp the image with it
matrix = cv2.getPerspectiveTransform(points1, points)
result = cv2.warpPerspective(img, matrix, (width, height))

Extracting Coordinates of Labels of Different Invoice Templates

Then, using EasyOCR as the detection model and PaddleOCR as the recognition model, the MultiOcr model is built to get the coordinates of the labels for each invoice template.

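import easyocr
from paddleocr import PaddleOCR

reader = easyocr.Reader(['en'])   # detection model
ocr = PaddleOCR(lang='en')        # recognition model

#detection
def detect_text_blocks(img_path):
    # detect() returns (horizontal_list, free_list), each with one entry per image
    detection_result = reader.detect(img_path, width_ths=0.7, mag_ratio=1.5)
    # horizontal boxes of the first image, each as [x_min, x_max, y_min, y_max]
    text_coordinates = detection_result[0][0]
    return text_coordinates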

The MultiOcr model finds the coordinates of the label names in the template labels dataset for each template invoice and stores them in a table (csv file). Because the number of items on an invoice can vary, the starting and ending coordinates of the table of invoice items in the invoice image are also given, so that the number of items on the invoice can be predicted.
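
One way to turn the table's starting and ending coordinates into an item count is sketched below. It is an assumption about how the counting could work, not the method used in this work: it counts the distinct text rows that fall between the two coordinates. The function name, the row_tolerance value, and the sample boxes are hypothetical, and table_top is assumed to lie below the table's header row.

```python
def count_item_rows(boxes, table_top, table_bottom, row_tolerance=10):
    """Count distinct text rows whose vertical centre lies inside the item table.

    boxes: list of [x_min, x_max, y_min, y_max] text boxes.
    row_tolerance: assumed maximum gap (pixels) between centres of boxes on one row.
    """
    centres = sorted(
        (y_min + y_max) / 2
        for _, _, y_min, y_max in boxes
        if table_top <= (y_min + y_max) / 2 <= table_bottom
    )
    rows = 0
    previous = None
    for centre in centres:
        # Start a new row when the gap to the previous box exceeds the tolerance.
        if previous is None or centre - previous > row_tolerance:
            rows += 1
        previous = centre
    return rows


# Made-up boxes: two item rows with two cells each, plus one box below the table.
boxes = [
    [50, 200, 300, 320], [400, 460, 302, 322],
    [50, 180, 340, 360], [400, 460, 341, 361],
    [400, 460, 500, 520],
]
print(count_item_rows(boxes, table_top=290, table_bottom=400))
```

Grouping by vertical centre keeps several cells of the same row from being counted as separate items.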

When the size of the table of items in the invoice image changes, the position of labels such as the “total amount” changes too, because the total amount comes after the table of invoice items in any invoice. To solve this problem, a relative positioning method can be used to locate the total amount: we store the coordinates of the strings around the total amount label in the invoice. This works because the text of those strings stays the same across invoices that come from the same template, even when the invoices themselves differ.
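
The relative positioning idea can be sketched as follows. This is a simplified illustration under stated assumptions: a single anchor string (here "Total") is used, boxes are [x_min, x_max, y_min, y_max], and the helper names and coordinates are hypothetical rather than taken from this work.

```python
def box_offset(anchor_box, target_box):
    """Offset (dx, dy) from the anchor's top-left corner to the target's top-left corner."""
    return target_box[0] - anchor_box[0], target_box[2] - anchor_box[2]


def predict_box(anchor_box, offset, size):
    """Place a box of the given (width, height) at the stored offset from the anchor."""
    x_min = anchor_box[0] + offset[0]
    y_min = anchor_box[2] + offset[1]
    return [x_min, x_min + size[0], y_min, y_min + size[1]]


# Template invoice: the string "Total" and the amount printed next to it.
template_anchor = [300, 360, 600, 620]   # box of the "Total" string
template_amount = [420, 500, 600, 620]   # box of the amount value
offset = box_offset(template_anchor, template_amount)
size = (template_amount[1] - template_amount[0], template_amount[3] - template_amount[2])

# New invoice from the same template with a longer item table:
# the "Total" string was detected 80 pixels further down.
new_anchor = [300, 360, 680, 700]
print(predict_box(new_anchor, offset, size))
```

Only the offset and the box size are stored per template, so the prediction follows the anchor string wherever the item table pushes it.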

Finding the Template of any Given New Invoice

To detect the labels of a new invoice, we need to know the template of the invoice. The purpose of the document similarity method is to predict the invoice template.
As the name suggests, document similarity tells you how similar two documents are. It is measured through the document distance, which can be computed with the cosine similarity method.
With this method, we can obtain the template of the invoice whose labels are to be predicted.

For Example

The document similarity method is used on three invoice images. Image1 and image2 are from the same vendor, and image3 is from a different vendor. The document similarity results are shown below:

Image1 – Image2 : The distance is 1.000072 (radians)

Image1 – Image3 : The distance is 1.408562 (radians)

From these results, we can see that the distance between image1 and image2 is smaller than the distance between image1 and image3, which matches the fact that image1 and image2 are from the same vendor.
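
A minimal sketch of such a distance is given below. The distances above are reported in radians, which suggests the angle between two document vectors; the sketch assumes those vectors are word counts of the OCR text, which the article does not state. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter


def document_distance(text_a, text_b):
    """Angle in radians between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(n * n for n in counts_a.values()))
    norm_b = math.sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return math.pi / 2  # an empty text shares no words with anything
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def closest_template(new_text, template_texts):
    """Return the template name whose text has the smallest distance to new_text."""
    return min(template_texts, key=lambda name: document_distance(new_text, template_texts[name]))


# Made-up OCR text for two templates and one new invoice.
templates = {
    "vendor_a": "invoice number date bill to total amount due",
    "vendor_b": "tax receipt customer order reference grand total",
}
new_invoice = "invoice number 1042 date 12 mar bill to acme total amount due 250"
print(closest_template(new_invoice, templates))
```

A smaller angle means more shared vocabulary, so the template with the smallest distance is taken as the prediction.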

Label Detection of the New Invoice Using Template’s Label Coordinates

Once the template is known, its label coordinates are read from the table (csv file) and used to locate the labels on the new invoice image.

Example: When an image of an invoice is given as input, the pipeline first looks for the invoice’s template. The table (csv file) is then used to get the coordinates of that template’s labels. The labels on the invoice image are identified with these coordinates.
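
The last step can be sketched as matching the stored label boxes against the text boxes recognised on the new invoice. The overlap-based matching, the function names, and the sample values below are assumptions made for illustration; the article does not specify how a stored coordinate is matched to recognised text.

```python
def overlap_area(box_a, box_b):
    """Area of intersection of two [x_min, x_max, y_min, y_max] boxes."""
    width = min(box_a[1], box_b[1]) - max(box_a[0], box_b[0])
    height = min(box_a[3], box_b[3]) - max(box_a[2], box_b[2])
    return max(0, width) * max(0, height)


def read_labels(label_boxes, ocr_results):
    """For each label, return the OCR text whose box overlaps the stored box most."""
    values = {}
    for label, stored_box in label_boxes.items():
        best = max(ocr_results, key=lambda item: overlap_area(stored_box, item[0]), default=None)
        if best is not None and overlap_area(stored_box, best[0]) > 0:
            values[label] = best[1]
        else:
            values[label] = None  # nothing was recognised at the stored position
    return values


# Stored template coordinates (hypothetical values).
label_boxes = {
    "invoice_number": [420, 560, 40, 70],
    "invoice_date": [420, 560, 80, 110],
}
# (box, text) pairs as an OCR pass over the new invoice might produce them.
ocr_results = [
    ([425, 540, 42, 68], "INV-1042"),
    ([422, 530, 83, 108], "12 Mar 2023"),
    ([40, 200, 40, 70], "ACME Traders"),
]
print(read_labels(label_boxes, ocr_results))
```

Using overlap rather than exact equality tolerates the small shifts that remain after skew correction and cropping.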

Methods to Improve Performance

During preprocessing, different thresholding methods, such as Global Thresholding, Adaptive Mean Thresholding, and Adaptive Gaussian Thresholding, can be tried to get a cleaner image of an invoice.
For detection and recognition, the MultiOcr model can use several OCR models, such as PyTesseract, PPOCR, easyOCR, MMOCR, and Keras-OCR. The OCR model that gives the best results is chosen as the final model.
In the MultiOcr model’s detection step, hyperparameter tuning is done on the parameters width_ths, which sets the maximum horizontal distance between two bounding boxes to be merged, and mag_ratio, which scales the image up or down by the given factor.
Several document similarity methods, such as cosine similarity and Euclidean distance, can be compared to improve the results when predicting the template.
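
Choosing the OCR model that gives the best results can be sketched as a small evaluation loop. The exact-match metric, the function names, and the sample values below are assumptions for illustration; the candidate outputs are placeholders, not results from any real OCR model.

```python
def exact_match_accuracy(predictions, ground_truth):
    """Fraction of labels whose predicted text equals the expected text."""
    if not ground_truth:
        return 0.0
    correct = sum(
        1
        for label, expected in ground_truth.items()
        if (predictions.get(label) or "").strip() == expected
    )
    return correct / len(ground_truth)


def pick_best_model(outputs_by_model, ground_truth):
    """Return (name, accuracy) of the candidate with the highest accuracy."""
    scores = {
        name: exact_match_accuracy(predictions, ground_truth)
        for name, predictions in outputs_by_model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hand-labelled values for one validation invoice (made up for the example).
ground_truth = {"invoice_number": "INV-1042", "invoice_date": "12 Mar 2023", "total": "250.00"}
# Placeholder outputs standing in for two candidate OCR models.
outputs_by_model = {
    "candidate_a": {"invoice_number": "INV-1042", "invoice_date": "12 Mar 2023", "total": "250.00"},
    "candidate_b": {"invoice_number": "INV-1O42", "invoice_date": "12 Mar 2023", "total": None},
}
print(pick_best_model(outputs_by_model, ground_truth))
```

The same loop could score different thresholding methods or width_ths and mag_ratio settings by treating each configuration as a candidate.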

Conclusion

In conclusion, we propose an algorithm for label detection on invoices using the MultiOcr model. With it, we are able to detect the positions of the labels for the templates as well as the labels of any new invoice that belongs to one of the given templates. For this, we used easyOCR as the detection model and PaddleOCR as the recognition model. We are happy to say that this algorithm gave us better results.

Key takeaways of this article

Contour detection reaches about 85% accuracy, and the MultiOcr model, which combines EasyOCR and PaddleOCR, achieves approximately 95% accuracy.
The cosine similarity approach determines document similarity with 82.8% precision. False positives may arise if two documents share a large number of terms.
We discussed the image preprocessing steps, label detection on bills using stored coordinates, and invoice template detection.
We went through some basic code and concluded the article with an example.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hari Bhutanadhu

I am Bhutanadhu Hari, a 2023 graduate of the Indian Institute of Technology Jodhpur (IITJ). I am interested in Web Development and Machine Learning, and I am most passionate about exploring Artificial Intelligence.
