Intelligent document processing (IDP) is a technology that uses artificial intelligence (AI) and machine learning (ML) to automatically extract information from unstructured documents such as invoices, receipts, and forms. IDP combines optical character recognition (OCR) technology with AI and ML algorithms to extract data and insights from documents, reducing the need for manual data entry and improving accuracy. Azure Form Recognizer is one such service.
In this article, we will see how to implement IDP using the Azure Form Recognizer service and create an end-to-end pipeline that automates document extraction and data visualization using Azure Functions and Power BI.
Learning Objectives
This article was published as a part of the Data Science Blogathon.
Azure Forms Recognizer is a cutting-edge technology that utilizes machine learning algorithms to automate document processing and data extraction tasks. With its advanced capabilities, it can quickly analyze structured and unstructured documents such as invoices, receipts, and forms and extract valuable data in a matter of seconds.
In this section, we will see how to create an Azure Form Recognizer service from the Azure Portal.
With the Azure Form Recognizer service created above, we can process documents and receipts for which prebuilt models are already available.
For example, the pre-built models below are already available with the Azure Form Recognizer service.
Now consider that the input document is a claim form from an insurance company. Since there is no pre-built model available for processing such claim forms, we will create a custom model for intelligent document processing (IDP). This model will read and extract information from scanned and handwritten claim forms submitted by policyholders.
Creating a custom model involves four major steps.
For the Prepare step, we need a minimum of five sample files (claim forms), which we label and then use to train the model for further analysis.
To create a custom model from the Azure portal, we follow the steps below.
Prepare
As discussed earlier, we need a minimum of five sample files for labeling and training the model. We can upload these five files to the storage container, or upload/drag-and-drop them directly in the UI.
Label
Once the sample files are available, we can start labeling them.
After labeling (a minimum of five files), click the Train button.
Train: We will be asked to choose between two build modes (Template and Neural).
Template: This mode is for structure-based extraction. It takes only 1-5 minutes to train the model and supports 164 languages.
Neural: This mode handles structured, semi-structured, and unstructured extraction. It takes around 30 minutes to train a model and supports only English-language documents.
In our case, the claim forms are structured/semi-structured, so we can choose Template mode and click Train.
Within a few seconds to minutes, the model will appear in the Models section.
Test and evaluate your custom model: After the training is complete, you can test your custom model by submitting test documents to it and reviewing the output.
Organizations that deal with huge volumes of documents face a significant challenge in processing the scanned and handwritten documents and forms received from their customers. These documents and forms contain a vast amount of critical information, such as personal details, medical history, and damage assessment reports, which must be accurately extracted and processed for efficient claim processing. However, manual processing has become increasingly time-consuming, error-prone, and resource-intensive due to the sheer volume of documents. This has led to delays in claims processing, increased operational costs, and dissatisfied policyholders.
To address this challenge, they need a solution that can automate document processing and data extraction, improve accuracy, and reduce the overall processing time.
To overcome the above-mentioned problem, we can use the Azure Form Recognizer service for intelligent document processing, along with other data engineering methodologies, to process documents at scale with lower operational cost. We can also extract the data from the documents, transform it, and visualize it in a dashboard, which helps organizations analyze their KPIs and make business decisions.
We can create a custom model based on our requirements. The challenging part, however, is how to consume the extracted data: we cannot consume it directly from the UI. We also need to automate the process, so that whenever a new scanned document lands in the storage container, the Form Recognizer processes it, the extracted information is saved as a file, and the file is then visualized in Power BI.
The architecture below explains how we address this challenge.
Once the claim forms land in the storage container, the Azure function equipped with a Blob trigger fires (when it identifies new blob activity) and runs the code inside the Azure function: it calls Azure Form Recognizer, extracts the data, processes it with simple Pandas code, and saves it as a csv file in the blob storage container. Once we have the csv file in the output container, we can connect that storage path to Power BI and visualize the data.
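One detail of this flow worth noting: the Form Recognizer analyze call is asynchronous, so the POST returns immediately and the results must be fetched later from the Operation-Location URL. Rather than a fixed sleep, a small polling helper can wait for completion. Below is a minimal sketch; the function and parameter names are our own for illustration, not part of any SDK, and the stub at the end stands in for a real status request:

```python
import time

def poll_until_done(get_status, timeout_sec=120, interval_sec=2):
    """Call get_status() until it reports a terminal state or we time out.

    get_status is any zero-argument callable returning one of the
    Form Recognizer operation states: "notStarted", "running",
    "succeeded", or "failed".
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        time.sleep(interval_sec)
    raise TimeoutError("analysis did not finish within the timeout")

# Stub demo: "succeeds" on the third check without hitting the network.
states = iter(["notStarted", "running", "succeeded"])
result = poll_until_done(lambda: next(states), interval_sec=0)
```

In a real function, get_status would issue a GET against the Operation-Location URL and return the "status" field of the JSON response.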
Below are sample API calls for different pre-built and custom models.
Layout API:
Receipt API:
Custom Model API:
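The three calls above target the same resource endpoint and differ only in the URL path. A minimal sketch of how the v2.1 request URLs are built follows; the endpoint, key, and model ID are placeholder values to substitute with your own:

```python
# Placeholder values; substitute your own resource endpoint, key, and model ID.
endpoint = "https://myformrecognizername.cognitiveservices.azure.com"
apim_key = "<your-subscription-key>"
model_id = "<your-custom-model-id>"

layout_url = f"{endpoint}/formrecognizer/v2.1/layout/analyze"
receipt_url = f"{endpoint}/formrecognizer/v2.1/prebuilt/receipt/analyze"
custom_url = f"{endpoint}/formrecognizer/v2.1/custom/models/{model_id}/analyze"

headers = {
    "Content-Type": "application/pdf",
    "Ocp-Apim-Subscription-Key": apim_key,
}
# Each call is a POST with the document bytes as the body, e.g.:
#   resp = requests.post(layout_url, data=pdf_bytes, headers=headers)
# A 202 response carries an Operation-Location header to poll for results.
```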
With the help of these sample APIs, we can embed our form recognizer service into various other services based on our requirements. In this particular scenario, we are going to create an Azure Function App, call the Form Recognizer using the API, and process our document.
In the below code snippet, we will call the Layout API and process a document that is in pdf format, extract the document’s content, convert it into a csv file, and push it to a separate container (output container).
Whenever a file lands in the input container, the blob trigger will call the Layout API and process the document, and the Azure function will push the converted csv file into the output container.
import json
import logging
import os
import time

import azure.functions as func
import numpy as np
import pandas as pd
import requests
from azure.storage.blob import BlobServiceClient


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    # This is the call to the Form Recognizer endpoint
    endpoint = r"https://myformrecognizername.cognitiveservices.azure.com/"
    apim_key = "***************************"
    post_url = endpoint + "formrecognizer/v2.1/layout/analyze"
    source = myblob.read()
    headers = {
        # Request headers
        'Content-Type': 'application/pdf',
        'Ocp-Apim-Subscription-Key': apim_key,
    }
    file_name = os.path.basename(myblob.name)

    resp = requests.post(url=post_url, data=source, headers=headers)
    if resp.status_code != 202:
        logging.error("POST analyze failed:\n%s", resp.text)
        return
    logging.info("POST analyze succeeded:\n%s", resp.headers)
    get_url = resp.headers["operation-location"]

    # The Layout API is async, therefore the wait before fetching the result
    time.sleep(25)
    resp = requests.get(url=get_url,
                        headers={"Ocp-Apim-Subscription-Key": apim_key})
    resp_json = json.loads(resp.text)
    if resp_json["status"] != "succeeded":
        logging.error("GET layout results failed:\n%s", resp.text)
        return
    results = resp_json

    # This is the connection to the blob storage, with the Azure Python SDK
    blob_service_client = BlobServiceClient.from_connection_string(
        "DefaultEndpointsProtocol=https;AccountName=storageaccountname;"
        "AccountKey={***key***}==;EndpointSuffix=core.windows.net")
    container_client = blob_service_client.get_container_client("output")

    # The code below extracts the json result into tabular data.
    # Please note that you need to adjust it to your form structure;
    # it probably won't work out-of-the-box for your specific form.
    pages = results["analyzeResult"]["pageResults"]

    def make_page(p):
        # Collect every cell on the page, tagging it with its table number
        res = []
        res_table = []
        y = 0
        page = pages[p]
        for tab in page["tables"]:
            for cell in tab["cells"]:
                res.append(cell)
                res_table.append(y)
            y = y + 1
        res_table = pd.DataFrame(res_table)
        res = pd.DataFrame(res)
        res["table_num"] = res_table[0]
        h = res.drop(columns=["boundingBox", "elements"])
        h.loc[:, "rownum"] = range(0, len(h))
        num_table = max(h["table_num"])
        return h, num_table, p

    h, num_table, p = make_page(0)

    # Rebuild each table as a rows x columns grid from the cell indexes
    for k in range(num_table + 1):
        new_table = h[h.table_num == k]
        new_table.loc[:, "rownum"] = range(0, len(new_table))
        row_table = pages[p]["tables"][k]["rows"]
        col_table = pages[p]["tables"][k]["columns"]
        b = pd.DataFrame(np.zeros((row_table, col_table)))
        s = 0
        for i, j in zip(new_table["rowIndex"], new_table["columnIndex"]):
            b.loc[i, j] = new_table.loc[new_table.loc[s, "rownum"], "text"]
            s = s + 1

    # Here is the upload to the blob storage
    tab1_csv = b.to_csv(header=False, index=False, mode='w')
    name1 = os.path.splitext(file_name)[0] + '.csv'
    container_client.upload_blob(name=name1, data=tab1_csv)
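To see what the pandas transformation in the function above is doing, here is a condensed, self-contained sketch that rebuilds one table from a hand-made fragment shaped like the v2.1 Layout response. The payload values (field names aside) are invented purely for illustration:

```python
import numpy as np
import pandas as pd

# A tiny, made-up pageResults payload in the v2.1 Layout response shape.
page = {
    "tables": [{
        "rows": 2, "columns": 2,
        "cells": [
            {"rowIndex": 0, "columnIndex": 0, "text": "Field"},
            {"rowIndex": 0, "columnIndex": 1, "text": "Value"},
            {"rowIndex": 1, "columnIndex": 0, "text": "Claim ID"},
            {"rowIndex": 1, "columnIndex": 1, "text": "CLM-001"},
        ],
    }]
}

def table_to_frame(tab):
    # Start from an empty rows x columns grid, then place each cell's
    # text at its (rowIndex, columnIndex) position.
    grid = pd.DataFrame(np.full((tab["rows"], tab["columns"]), "", dtype=object))
    for cell in tab["cells"]:
        grid.loc[cell["rowIndex"], cell["columnIndex"]] = cell["text"]
    return grid

frame = table_to_frame(page["tables"][0])
csv_text = frame.to_csv(header=False, index=False)
# frame now holds the header row (Field, Value) and the Claim ID row.
```

The same grid-filling idea is what the loop over rowIndex/columnIndex in the function performs before writing the csv to the output container.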
Once the csv file is created in the Output container, we can create a visualization in Power BI Desktop using the Azure Blob Storage Connector.
Once we connect to the storage account, we can create a simple visualization in Power BI from the csv file available in the output container.
The above report can be published further to Power BI services with proper dataset refresh intervals to get real-time reporting.
The intelligent data processing solution using Azure Form Recognizer, Azure Function, and Power BI visualization provides a powerful tool for industries to automate data extraction and analysis from a wide range of forms, documents, and receipts. This solution offers numerous benefits, including increased efficiency, accuracy, and cost savings for businesses by reducing manual data entry and errors and providing timely insights for better decision-making.
The key takeaways from this article are:
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.