Intelligent Document Processing with Azure Form Recognizer

harun raseed Last Updated : 09 Dec, 2024


Introduction

Intelligent document processing (IDP) is a technology that uses artificial intelligence (AI) and machine learning (ML) to automatically extract information from unstructured and semi-structured documents such as invoices, receipts, and forms. IDP combines optical character recognition (OCR) with AI and ML algorithms to extract data and insights from documents, reducing the need for manual data entry and improving accuracy. Azure Form Recognizer is one service that provides these capabilities.

In this article, we will see how to implement IDP using the Azure Form Recognizer service and build an end-to-end pipeline that automates document extraction and data visualization using Azure Functions and Power BI.

Learning Objectives

Create an Azure Form Recognizer service and use its built-in models.
Prepare, label, train, and analyse a custom model based on your own requirements.
Integrate the Form Recognizer output with Azure Functions to automate the process.

This article was published as a part of the Data Science Blogathon.

Getting Started with Azure Form Recognizer

Azure Form Recognizer is a cloud service that uses machine learning models to automate document processing and data extraction. It can analyze structured and unstructured documents, such as invoices, receipts, and forms, and typically returns the extracted data within seconds.

In this section, we will see how to create an Azure Form Recognizer service from the Azure portal:

First, log in to your Azure portal (portal.azure.com).
Once you’re logged in, click on the “Create a resource” button on the left-hand side of the screen.
In the “New” pane, type “Form Recognizer” into the search box and press enter.
Select the “Form Recognizer” service from the results.
In the “Form Recognizer” pane, click on the “Create” button.
In the “Create Form Recognizer” pane, fill out the required fields such as subscription, resource group, name of the service, pricing tier, and location.
Optionally, review the remaining tabs (such as networking and tags), then click the “Review + create” button. The document type (receipts, invoices, business cards, or custom forms) is not fixed at this point; it is chosen later, when you call a prebuilt or custom model.
Review your settings and then click on the “Create” button to create your Form Recognizer service.

With this Form Recognizer resource, we can already process documents, such as receipts, for which prebuilt models are available in the service.

For example, the following prebuilt models are available in the Azure Form Recognizer service:

Invoices
Receipts
Business Cards
Identity Documents
Health Insurance Cards
US Tax Documents (W-2, 1098, 1098-E, 1098-T)
Contracts
Vaccination Cards

Now consider that the input document is a claim form from an insurance company. Since no prebuilt model exists for such claim forms, we will create a custom model for intelligent document processing (IDP). This model will read and extract information from scanned and handwritten claim forms submitted by policyholders.

Creating a Custom Model From Azure Form Recognizer


Creating a custom model involves four major steps:

Prepare
Label
Train
Analyse

For the Prepare step, we need at least five sample files (claim forms). These samples are then labeled and used to train the model, which we can then use to analyze new documents.
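Before uploading, it can help to check that the training folder really contains enough usable samples. The sketch below is a hypothetical helper, not part of any Form Recognizer SDK, and the set of accepted extensions is an assumption based on common input formats (PDF, JPEG, PNG, BMP, TIFF); confirm it against the documentation for your API version.

```python
from pathlib import Path

# Assumed set of accepted input formats; confirm against your API version's docs.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
MIN_TRAINING_FILES = 5  # the minimum sample count mentioned above


def usable_training_files(file_names):
    """Return the file names whose extension is in SUPPORTED_EXTENSIONS."""
    return [name for name in file_names
            if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS]


def check_training_set(file_names):
    """Raise ValueError if fewer than MIN_TRAINING_FILES usable samples exist."""
    usable = usable_training_files(file_names)
    if len(usable) < MIN_TRAINING_FILES:
        raise ValueError(
            f"Need at least {MIN_TRAINING_FILES} sample forms, found {len(usable)}: {usable}")
    return usable
```

For example, a folder with five scanned claim forms and a stray `notes.txt` passes the check, and the text file is simply ignored.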

To create a custom model in Azure Form Recognizer Studio, follow the steps below.

Select Custom model in Azure Form Recognizer Studio.
Create a new project: give it an appropriate name and description, and click Continue.

In the next pop-up, choose the Azure subscription and resource group where you created your Azure Form Recognizer resource, choose the latest API version from the list, and click Continue.

Next, connect to the source dataset container: choose the subscription, resource group, storage account, and container where you will keep your training dataset. Click Continue, then review and create the project.

Prepare

As discussed earlier, we need at least five sample files for labeling and training the model. Upload them to the container, or upload them directly in the Studio UI (drag and drop is supported).
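If you prefer to upload the samples from code rather than through the Studio UI, a minimal sketch could look like the following. It assumes an already-created `ContainerClient` from the `azure-storage-blob` package (for example via `ContainerClient.from_connection_string(conn_str, container_name)`); the helper name `upload_training_files` and the local folder layout are hypothetical. The client is passed in as a parameter so the logic can be exercised without a real storage account.

```python
from pathlib import Path


def upload_training_files(container_client, folder, prefix=""):
    """Upload every file in `folder` to the training container.

    `container_client` is expected to behave like azure.storage.blob.ContainerClient;
    only its upload_blob(name=..., data=..., overwrite=...) method is used.
    Returns the list of blob names that were uploaded, in sorted order.
    """
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        blob_name = f"{prefix}{path.name}"
        with path.open("rb") as data:
            # overwrite=True replaces a sample that was uploaded earlier
            container_client.upload_blob(name=blob_name, data=data, overwrite=True)
        uploaded.append(blob_name)
    return uploaded
```

The optional `prefix` (for example `"claims/"`) places the samples in a virtual folder, which keeps training data separate from other blobs in the same container.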

Label

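Once the sample files are available, we can start labeling them: for each file, select the text that belongs to a field (for example, the policy number) and assign it to the corresponding field name.

After labeling at least five files, click the Train button.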
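Form Recognizer Studio saves the labels as JSON files next to the documents in the storage container. The naming assumed below (`<document name>.labels.json`, plus a shared `fields.json`) is what Studio typically writes, but treat it as an assumption and check your own container. The sketch is a hypothetical pre-training check that counts how many documents actually have a label file.

```python
MIN_LABELED_FILES = 5
LABEL_SUFFIX = ".labels.json"  # assumed naming of the label files Studio writes


def labeled_documents(blob_names):
    """Return the documents in `blob_names` that have a matching label file."""
    names = set(blob_names)
    # Anything ending in .json is metadata (labels, OCR output, fields.json), not a document.
    documents = [n for n in blob_names if not n.endswith(".json")]
    return [doc for doc in documents if doc + LABEL_SUFFIX in names]


def ready_to_train(blob_names):
    """True when at least MIN_LABELED_FILES documents have label files."""
    return len(labeled_documents(blob_names)) >= MIN_LABELED_FILES
```

The list of blob names could come from `ContainerClient.list_blob_names()` in recent versions of `azure-storage-blob`, or from `[b.name for b in container_client.list_blobs()]`.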

Train: Studio asks us to choose between two build modes, Template and Neural.

Template: This mode is for structure-based extraction from forms with a fixed layout. Training takes only about 1-5 minutes, and it supports 164 languages.

Neural: This mode handles structured, semi-structured, and unstructured documents. Training takes around 30 minutes, and at the time of writing it supports only English-language documents.

In our case, the claim forms are largely structured, so we choose Template mode and click Train.
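Training can also be started without the Studio UI through the REST API. The sketch below only assembles the request for the v3 `documentModels:build` operation; the `api-version` value (`2022-08-31`), the container SAS URL, and the helper name are assumptions to adapt to the API version your resource uses. Sending it would be a `requests.post(url, json=body, headers={"Ocp-Apim-Subscription-Key": key})` call, after which the `Operation-Location` response header is polled, much like the Layout call in the Azure Function later in this article.

```python
API_VERSION = "2022-08-31"  # assumed GA version of the v3 REST API


def build_model_request(endpoint, model_id, container_sas_url,
                        build_mode="template", prefix=None):
    """Return (url, json_body) for a v3 'build document model' request.

    build_mode is "template" for fixed-layout forms or "neural" for
    semi-structured / unstructured documents.
    """
    if build_mode not in ("template", "neural"):
        raise ValueError("build_mode must be 'template' or 'neural'")
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels:build"
           f"?api-version={API_VERSION}")
    source = {"containerUrl": container_sas_url}
    if prefix:
        source["prefix"] = prefix  # train only on blobs under this virtual folder
    body = {"modelId": model_id, "buildMode": build_mode, "azureBlobSource": source}
    return url, body
```

Keeping the build mode an explicit parameter mirrors the Template/Neural choice in Studio and makes it easy to retrain the same data in the other mode for comparison.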

Within a few minutes, the model appears in the Models section.

Analyse: Now we can start analyzing new claim forms.
Upload any new claim form we want to process and click the Analyze button.
The document is analyzed with the model we created, and the extracted information is displayed in Studio.

Test and evaluate your custom model: After training is complete, you can test your custom model by submitting test documents to it and reviewing the output.
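When the custom model is called through the API instead of Studio, reviewing the output usually means reading each field's value and confidence. The sketch below assumes the v2.1 custom-model response shape, where `analyzeResult.documentResults[*].fields` maps each labeled field to an object with `text` and `confidence` (or to `null` when the field was not found); the 0.8 review threshold and the field names in the test data are hypothetical.

```python
def extract_fields(result, review_threshold=0.8):
    """Flatten custom-model fields into rows and flag low-confidence values.

    `result` is the parsed JSON of a v2.1 custom model analyze result.
    Returns a list of dicts with keys: field, text, confidence, needs_review.
    """
    rows = []
    for document in result["analyzeResult"].get("documentResults", []):
        for name, field in document["fields"].items():
            if field is None:  # the field was not found in this document
                rows.append({"field": name, "text": None,
                             "confidence": 0.0, "needs_review": True})
                continue
            confidence = field.get("confidence", 0.0)
            rows.append({
                "field": name,
                "text": field.get("text"),
                "confidence": confidence,
                "needs_review": confidence < review_threshold,
            })
    return rows
```

The rows can be passed straight to `pandas.DataFrame(rows)`, so low-confidence values can be routed to a person for review instead of being accepted automatically.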

Problem Statement

Organizations that handle large volumes of documents, such as insurance companies, face a significant challenge in processing the many scanned and handwritten documents and forms they receive from customers. These documents contain critical information, such as personal details, medical history, and damage assessment reports, that must be extracted accurately for efficient claim processing. Because of the sheer volume, manual processing has become time-consuming, error-prone, and resource-intensive, leading to delays in claims processing, higher operational costs, and dissatisfied policyholders.

To address this challenge, these organizations need a solution that automates document processing and data extraction, improves accuracy, and reduces overall processing time.

Proposed Solution For Real-Time Use Case

To overcome this problem, we can use the Azure Form Recognizer service for intelligent document processing, together with standard data engineering practices, to process documents at scale with lower operational cost. We can extract the data from each document, transform it, and visualize it in a dashboard, which helps organizations track their KPIs and make business decisions.

Automation Process

We can create a custom model based on our requirements, but the challenge is consuming its output. We cannot consume the data from the UI directly, and the process must be automated: whenever a new scanned document lands in the storage container, Form Recognizer should process it, the extracted information should be saved as a file, and that file should then be visualized in Power BI.

The architecture that addresses this works as follows.

Once a claim form lands in the input storage container, the blob trigger fires the Azure Function as soon as it detects the new blob. The function's code calls Azure Form Recognizer, extracts the data, processes it with simple pandas code, and saves the result as a CSV file in an output blob container. Power BI then connects to that output container to visualize the data.
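The blob trigger itself is declared in the function's `function.json` file, which the Python v1 programming model (the `def main(myblob: func.InputStream)` style used in this article) relies on. The sketch below writes a minimal version from Python; the container name `input` and the connection setting `AzureWebJobsStorage` are assumptions to match to your own storage setup, while the binding `name` must match the `myblob` parameter of `main`.

```python
import json
from pathlib import Path


def write_function_json(function_dir, input_container="input",
                        connection_setting="AzureWebJobsStorage"):
    """Write a minimal blob-trigger function.json into `function_dir`."""
    config = {
        "scriptFile": "__init__.py",
        "bindings": [
            {
                "name": "myblob",          # must match main(myblob: func.InputStream)
                "type": "blobTrigger",
                "direction": "in",
                # {name} matches every blob added to the container
                "path": f"{input_container}/{{name}}",
                # app setting that holds the storage connection string
                "connection": connection_setting,
            }
        ],
    }
    path = Path(function_dir) / "function.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

In practice this file is usually created by the Functions tooling when you add a blob-triggered function; generating it in code mainly makes the binding settings explicit.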

Below are sample API calls for the Layout, prebuilt, and custom models.

Layout API:

https://{endpoint}/formrecognizer/v2.1/layout/analyze

Receipt API:

https://{endpoint}/formrecognizer/v2.1/prebuilt/receipt/analyze[?includeTextDetails]

Custom Model API:

https://{endpoint}/formrecognizer/v2.1/custom/models/{modelId}/analyze[?includeTextDetails]

With these APIs, we can embed the Form Recognizer service into other services as required. In this scenario, we create an Azure Function App that calls Form Recognizer through the API and processes our document.
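The URL patterns above can be assembled with a small helper such as the one below. It is a hypothetical convenience function, not part of any Azure SDK. It assumes the v2.1 REST paths (the version the Azure Function in this article calls) and strips any trailing slash from the endpoint so the path is not doubled. The prebuilt names accepted here (`receipt`, `invoice`, `businessCard`, `idDocument`) are the v2.1 prebuilt models; the other prebuilt models listed earlier are assumed to require later (v3) API versions.

```python
V21_PREBUILT = {"receipt", "invoice", "businessCard", "idDocument"}


def analyze_url(endpoint, model="layout", model_id=None, include_text_details=False):
    """Build a v2.1 Form Recognizer analyze URL.

    model: "layout", "custom", or one of the prebuilt names in V21_PREBUILT.
    model_id: required when model == "custom".
    """
    base = endpoint.rstrip("/") + "/formrecognizer/v2.1"
    if model == "layout":
        url = f"{base}/layout/analyze"
    elif model == "custom":
        if not model_id:
            raise ValueError("model_id is required for custom models")
        url = f"{base}/custom/models/{model_id}/analyze"
    elif model in V21_PREBUILT:
        url = f"{base}/prebuilt/{model}/analyze"
    else:
        raise ValueError(f"unknown model: {model}")
    # includeTextDetails applies to prebuilt and custom models, not to Layout.
    if include_text_details and model != "layout":
        url += "?includeTextDetails=true"
    return url
```

With a single helper, switching the Azure Function from the Layout API to the custom claim-form model becomes a one-argument change.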

The code snippet below calls the Layout API on a PDF document, extracts the tables from the document's content, converts them to CSV, and pushes the result to a separate container (the output container).

Because the function is bound to a blob trigger, this runs for every file that lands in the input container: the function calls the Layout API, processes the document, and writes the converted CSV output to the output container.

import logging
import os
import time

import azure.functions as func
import pandas as pd
import requests
from azure.storage.blob import BlobServiceClient

# Secrets are read from the Function App's application settings instead of being
# hard-coded. These setting names are placeholders; use the names you configure.
ENDPOINT = os.environ["FORM_RECOGNIZER_ENDPOINT"]  # e.g. https://myformrecognizername.cognitiveservices.azure.com/
APIM_KEY = os.environ["FORM_RECOGNIZER_KEY"]
OUTPUT_CONNECTION_STRING = os.environ["OUTPUT_STORAGE_CONNECTION_STRING"]


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    # Submit the PDF to the Layout API; rstrip("/") avoids a double slash in the URL.
    post_url = ENDPOINT.rstrip("/") + "/formrecognizer/v2.1/layout/analyze"
    headers = {
        "Content-Type": "application/pdf",
        "Ocp-Apim-Subscription-Key": APIM_KEY,
    }
    resp = requests.post(url=post_url, data=myblob.read(), headers=headers)
    if resp.status_code != 202:
        # Raising (instead of quit()) makes the Functions host record a failed run.
        raise RuntimeError(f"POST analyze failed: {resp.text}")
    get_url = resp.headers["operation-location"]

    # The Layout API is asynchronous, so poll the operation URL until it finishes
    # (up to about 60 seconds) instead of sleeping for a fixed time.
    status = "notStarted"
    for _ in range(30):
        time.sleep(2)
        resp = requests.get(url=get_url, headers={"Ocp-Apim-Subscription-Key": APIM_KEY})
        results = resp.json()
        status = results["status"]
        if status in ("succeeded", "failed"):
            break
    if status != "succeeded":
        raise RuntimeError(f"GET Layout results failed (status: {status})")
    logging.info("Layout analysis succeeded")

    # This is the connection to the blob storage, with the Azure Python SDK
    blob_service_client = BlobServiceClient.from_connection_string(OUTPUT_CONNECTION_STRING)
    container_client = blob_service_client.get_container_client("output")

    # The code below converts the JSON tables into tabular data.
    # Please note that you need to adjust it to your form structure;
    # it probably won't work out-of-the-box for your specific form.
    base_name = os.path.splitext(os.path.basename(myblob.name))[0]
    # Only the first page is processed; loop over pageResults for multi-page forms.
    page = results["analyzeResult"]["pageResults"][0]
    for k, table in enumerate(page.get("tables", [])):
        # Start from an empty rows x columns grid and place each cell's text.
        grid = pd.DataFrame("", index=range(table["rows"]), columns=range(table["columns"]))
        for cell in table["cells"]:
            grid.iat[cell["rowIndex"], cell["columnIndex"]] = cell["text"]

        # Here is the upload to the blob storage: one CSV per table.
        # overwrite=True lets the same document be reprocessed without an error.
        csv_text = grid.to_csv(header=False, index=False)
        container_client.upload_blob(name=f"{base_name}_table{k}.csv", data=csv_text, overwrite=True)

Data Visualization

Once the CSV file is created in the output container, we can build a visualization in Power BI Desktop using the Azure Blob Storage connector.

Once connected to the storage account, we can create a simple visualization in Power BI from the CSV file in the output container.

[Figure: Sample visualization built from the processed data]

The report can then be published to the Power BI service with a scheduled dataset refresh, which keeps the reporting close to real time.

Conclusion

The intelligent document processing solution built with Azure Form Recognizer, Azure Functions, and Power BI provides a powerful way for organizations to automate data extraction and analysis from a wide range of forms, documents, and receipts. It increases efficiency and accuracy and reduces costs by cutting manual data entry and errors, while providing timely insights for better decision-making.

The key takeaways from this article are:

Intelligent Document Processing (IDP) is a technology that automates the extraction of data from documents using machine learning algorithms.
Azure Form Recognizer is a cloud-based IDP service offered by Microsoft Azure that can extract structured data from various types of documents, such as invoices, receipts, and forms.
Azure Functions is a serverless computing service offered by Microsoft Azure that enables developers to run code in response to events and triggers without the need to manage infrastructure.
By combining Azure Form Recognizer with Azure Functions, developers can create intelligent document processing workflows that automatically extract data from documents and integrate it into other applications or systems.
Finally, we implemented the end-to-end architecture, including Power BI for visualization.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

harun raseed

An engineer who loves to play with data and has more than half a decade of experience in the data engineering field.
Skilled in Databricks, MSBI, SQL, ETL Tools, Data Analysis, and Azure Cloud Technology.
