We talk about AI almost daily because of its growing role in replacing manual human work. AI-enabled software has grown rapidly in a short time, and enterprises want to integrate reliable and responsible AI into their applications to generate more revenue. The most challenging part of integrating AI into an application is model inference and the computation resources the model requires. Many techniques already exist that optimize a model so that inference performs well with fewer computation resources. To address this problem, Intel introduced the OpenVINO Toolkit, an open-source toolkit for optimizing and deploying AI inference.
In this article, we will:
This article was published as a part of the Data Science Blogathon.
OpenVINO, which stands for Open Visual Inference and Neural Network Optimization, is an open-source toolkit developed by the Intel team to facilitate the optimization of deep learning models. The vision of the OpenVINO toolkit is to boost your AI deep-learning models and deploy the application on-premise, on-device, or in the cloud with more efficiency and effectiveness.
The OpenVINO Toolkit is particularly valuable because it supports many deep learning frameworks, including popular ones like TensorFlow, PyTorch, ONNX, and Caffe. You can train your models using your preferred framework and then use OpenVINO to convert and optimize them for deployment on Intel’s hardware accelerators, like CPUs, GPUs, FPGAs, and VPUs.
For inference, the OpenVINO Toolkit offers various tools for model quantization and compression, which can significantly reduce the size of deep learning models, usually with only a small loss of inference accuracy.
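To make the idea of quantization concrete, here is a minimal sketch in plain Python of symmetric 8-bit quantization applied to a handful of made-up weights. It is not how OpenVINO implements its compression tools; the helper names `quantize_int8` and `dequantize` and the sample values are assumptions for illustration only. It shows why storage shrinks (one byte per weight instead of four for FP32) and why a small rounding error remains.

```python
# Illustrative only: symmetric int8 quantization of a few made-up weights.
# quantize_int8 / dequantize are hypothetical helpers, not OpenVINO APIs.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in q_weights]


weights = [0.82, -0.31, 0.05, -1.27, 0.64]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Largest round-trip error; it cannot exceed half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, f"scale={scale:.4f}", f"max_error={max_error:.6f}")
```

Real toolchains pick scales per layer or per channel and calibrate them on sample data, which is why accuracy usually drops only slightly.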
The AI craze shows no sign of slowing down. With this popularity, more and more applications will be built to run AI on-premise and on-device. Here are a few of the challenging areas where OpenVINO excels, which make it an ideal choice:
OpenVINO provides a model zoo with pre-trained deep-learning models for tasks like Stable Diffusion, Speech, Object detection, and more. These models can serve as a starting point for your projects, saving you time and resources.
OpenVINO supports many deep learning frameworks, including TensorFlow, PyTorch, ONNX, and Caffe. This means you can use your preferred framework to train your models and then convert and optimize them for deployment using the OpenVINO Toolkit.
OpenVINO is optimized for fast inference, making it suitable for real-time applications like computer vision, robotics, and IoT devices. It leverages hardware acceleration on devices such as FPGAs, GPUs, and VPUs to achieve high throughput and low latency.
AI in Edge is the most challenging area to tackle. Building an optimized solution to solve hardware constraints is no longer impossible with the help of OpenVINO. The future of AI in Edge with this Toolkit has the potential to revolutionize various industries and applications.
Let’s find out how OpenVINO works to make it suitable for AI in Edge.
With this approach, OpenVINO can play a vital role in AI in Edge. Let’s get our hands dirty with a code project that implements text detection in an image using the OpenVINO Toolkit.
In this project, we will use Google Colab to run the application and the horizontal-text-detection-0001 model from the OpenVINO Model Zoo. This pre-trained model detects horizontal text in input images and returns a blob of data with shape (100, 5), where each row has the format (x_min, y_min, x_max, y_max, conf).
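As a rough illustration of that output format, the sketch below reads rows of (x_min, y_min, x_max, y_max, conf) and keeps only confident detections. The `Detection` type, the `parse_detections` helper, and the sample rows are made up for this example; the real model output is a NumPy array rather than a list of lists.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One output row: box corners in pixels plus a confidence score."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    conf: float


def parse_detections(rows, threshold=0.3):
    """Keep rows whose confidence (the last value) exceeds the threshold."""
    return [Detection(*row) for row in rows if row[-1] > threshold]


# Made-up rows standing in for the model output; unused slots are all zeros.
raw_rows = [
    [39.0, 82.0, 181.0, 114.0, 0.58],
    [405.0, 94.0, 460.0, 126.0, 0.12],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

for det in parse_detections(raw_rows):
    print(f"box=({det.x_min}, {det.y_min}, {det.x_max}, {det.y_max}) conf={det.conf}")
```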
!pip install openvino
Let’s import the required modules to run this application. The OpenVINO notebooks repository provides a helper module, notebook_utils.py, whose download_file function fetches pre-trained weights from a given URL.
import urllib.request
base = "https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks"
utils_file = "/main/notebooks/utils/notebook_utils.py"
urllib.request.urlretrieve(
url= base + utils_file,
filename='notebook_utils.py'
)
from notebook_utils import download_file
You can verify that notebook_utils.py has been downloaded successfully. Now let’s quickly import the remaining modules.
import openvino
import cv2
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
Set up the paths for the IR files of the horizontal text detection model: the .xml file (network topology) and the .bin file (weights).
base_model_dir = Path("./model").expanduser()
model_name = "horizontal-text-detection-0001"
model_xml_name = f'{model_name}.xml'
model_bin_name = f'{model_name}.bin'
model_xml_path = base_model_dir / model_xml_name
model_bin_path = base_model_dir / model_bin_name
In the following code snippet, we use a few string variables to build the URLs where the pre-trained model files are hosted.
model_zoo = "https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.3/models_bin/1/"
algo = "horizontal-text-detection-0001/FP32/"
xml_url = "horizontal-text-detection-0001.xml"
bin_url = "horizontal-text-detection-0001.bin"
model_xml_url = model_zoo+algo+xml_url
model_bin_url = model_zoo+algo+bin_url
download_file(model_xml_url, model_xml_name, base_model_dir)
download_file(model_bin_url, model_bin_name, base_model_dir)
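Before loading the model, it can help to confirm that both IR files actually arrived. The helper below is a small sketch using only the standard library; `missing_ir_files` is a hypothetical name, and the demonstration runs in a temporary directory rather than the notebook's ./model folder.

```python
from pathlib import Path
import tempfile


def missing_ir_files(model_dir, model_name):
    """Return the IR files (.xml, .bin) that are absent or empty."""
    expected = [Path(model_dir) / f"{model_name}{ext}" for ext in (".xml", ".bin")]
    return [p for p in expected if not p.is_file() or p.stat().st_size == 0]


# Demonstration in a throwaway directory where only the .xml file exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "horizontal-text-detection-0001.xml").write_text("<net/>")
    missing = missing_ir_files(tmp, "horizontal-text-detection-0001")
    print([p.name for p in missing])
```

In the notebook, calling `missing_ir_files(base_model_dir, model_name)` after the two downloads should return an empty list.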
OpenVINO provides a Core class to interact with the OpenVINO toolkit. The Core class provides various methods and functions for working with models and performing inference. Use read_model and pass the model_xml_path. After reading the model, compile the model for a specific target device.
core = openvino.Core()
model = core.read_model(model=model_xml_path)
compiled_model = core.compile_model(model=model, device_name="CPU")
input_layer_ir = compiled_model.input(0)
output_layer_ir = compiled_model.output("boxes")
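The snippet above hard-codes "CPU" as the target device. If you want to prefer an accelerator when one is present, a small selection helper is one option. The sketch below is plain Python and does not call OpenVINO itself; `pick_device` is a hypothetical helper, and in the notebook the list of names would come from the `available_devices` property of the Core object.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device that appears in `available`.

    Device names can carry a suffix such as "GPU.0", so only the part
    before the dot is compared.
    """
    # Walk the list in reverse so the first entry of each family wins.
    families = {name.split(".")[0]: name for name in reversed(available)}
    for family in preferred:
        if family in families:
            return families[family]
    raise RuntimeError(f"None of {preferred} found in {available}")


# Example lists; in the notebook, pass core.available_devices instead.
print(pick_device(["CPU"]))
print(pick_device(["CPU", "GPU.0", "GPU.1"]))
```

The returned string can then be passed as `device_name` when compiling the model.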
In the above code snippet, the compiled model reports its expected input shape as (1, 3, 704, 704): a three-channel 704×704 image in channels-first (NCHW) layout, where 1 is the batch size, 3 is the number of channels, and 704 is both the height and the width. The "boxes" output returns rows of (x_min, y_min, x_max, y_max, conf). Let’s load an input image now.
The model’s input shape is [1,3,704,704], so you must resize the input image to match it. In Google Colab or your code editor, upload your input image; in our case, the image file is named sample_image.jpg.
image = cv2.imread("sample_image.jpg")
# N,C,H,W = batch size, number of channels, height, width.
N, C, H, W = input_layer_ir.shape
# Resize the image to meet network expected input sizes.
resized_image = cv2.resize(image, (W, H))
# Reshape to the network input shape.
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)
print("Model input shape:")
print(input_layer_ir.shape)
print("Image after resize:")
print(input_image.shape)
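To see what the transpose and expand_dims calls do, here is the same pair of operations on a tiny made-up array instead of a real image. The array contents are arbitrary; only the shapes and index order matter.

```python
import numpy as np

# A tiny stand-in "image": height 2, width 3, 3 channels (HWC, as OpenCV returns it).
hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Move channels first (CHW), then add a leading batch axis (NCHW).
nchw = np.expand_dims(hwc.transpose(2, 0, 1), 0)

print(hwc.shape, "->", nchw.shape)

# The value at row 1, column 2, channel 0 is unchanged; only its index order moves.
print(hwc[1, 2, 0] == nchw[0, 0, 1, 2])
```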
Display the input image.
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.axis("off")
Previously, we compiled the model for the target device. Now run the compiled model on the prepared input image.
# Create an inference request.
boxes = compiled_model([input_image])[output_layer_ir]
# Remove zero only boxes.
boxes = boxes[~np.all(boxes == 0, axis=1)]
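The boolean mask in the last line can be hard to read at first. The sketch below applies the same expression to three made-up rows so each step is visible; the values are illustrative and not real model output.

```python
import numpy as np

boxes_demo = np.array([
    [39.0, 82.0, 181.0, 114.0, 0.58],
    [0.0, 0.0, 0.0, 0.0, 0.0],        # padding row: every value is zero
    [0.0, 12.0, 40.0, 30.0, 0.41],    # contains a zero but is not all zeros
])

all_zero = np.all(boxes_demo == 0, axis=1)  # one boolean per row
kept = boxes_demo[~all_zero]                # keep rows where the mask is False

print(all_zero)
print(kept.shape)
```

Only the padding row is dropped; a box that merely touches the left or top edge of the image keeps its place.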
The compiled_model returns boxes with the bounding box coordinates. We use the cv2 module to create a rectangle and putText to add the confidence score above the detected text.
def detect_text(bgr_image, resized_image, boxes, threshold=0.3, conf_labels=True):
    # Fetch the image shapes to calculate a ratio.
    (real_y, real_x), (resized_y, resized_x) = bgr_image.shape[:2], resized_image.shape[:2]
    ratio_x, ratio_y = real_x / resized_x, real_y / resized_y
    # Convert image from BGR to RGB format.
    rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    # Iterate through non-zero boxes.
    for box in boxes:
        # Pick a confidence factor from the last place in an array.
        conf = box[-1]
        if conf > threshold:
            (x_min, y_min, x_max, y_max) = [
                int(max(corner_position * ratio_y, 10)) if idx % 2
                else int(corner_position * ratio_x)
                for idx, corner_position in enumerate(box[:-1])
            ]
            # Draw a box based on the position, parameters in rectangle function are:
            # image, start_point, end_point, color, thickness.
            rgb_image = cv2.rectangle(rgb_image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 10)
            # Add text to the image based on position and confidence.
            if conf_labels:
                rgb_image = cv2.putText(
                    rgb_image,
                    f"{conf:.2f}",
                    (x_min, y_min - 10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    4,
                    (255, 0, 0),
                    8,
                    cv2.LINE_AA,
                )
    return rgb_image
Display the output image
plt.imshow(detect_text(image, resized_image, boxes));
plt.axis("off")
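The least obvious part of detect_text is the list comprehension that maps box corners from the 704×704 model input back onto the original image. The standalone sketch below spells out the same arithmetic without OpenCV; `scale_box` is a hypothetical helper and the image sizes are made up.

```python
def scale_box(box, original_size, resized_size, min_y=10):
    """Map (x_min, y_min, x_max, y_max) from the resized image to the original.

    Sizes are (height, width) tuples, matching image.shape[:2].
    """
    (real_h, real_w), (resized_h, resized_w) = original_size, resized_size
    ratio_x, ratio_y = real_w / resized_w, real_h / resized_h
    x_min, y_min, x_max, y_max = box
    return (
        int(x_min * ratio_x),
        int(max(y_min * ratio_y, min_y)),  # clamp y so the top edge stays visible
        int(x_max * ratio_x),
        int(max(y_max * ratio_y, min_y)),
    )


# A made-up box on the 704x704 model input, mapped to a 1408x2112 (H x W) photo.
print(scale_box((100, 2, 300, 50), (1408, 2112), (704, 704)))
```

Here the x coordinates are multiplied by 3 and the y coordinates by 2, and the y_min value is lifted to 10 pixels, which mirrors the `max(..., 10)` clamp applied to the odd-indexed (y) entries in detect_text.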
To conclude, we successfully built a project that detects text in an image using the OpenVINO Toolkit. The Intel team continuously improves the Toolkit. OpenVINO also supports pre-trained Generative AI models such as Stable Diffusion, ControlNet, Speech-to-text, and more.
A. Intel OpenVINO provides a model zoo with pre-trained deep-learning models for tasks like Stable Diffusion, Speech, and more. OpenVINO runs model zoo pre-trained models on-premise, on-device, and in the cloud more efficiently and effectively.
A. Both OpenVINO and TensorFlow are free and open-source. Developers use TensorFlow, a deep-learning framework, for model development, while OpenVINO, a Toolkit, optimizes deep-learning models and deploys them on Intel hardware accelerators.
A. OpenVINO’s versatility and ability to optimize deep learning models for Intel hardware make it a valuable tool for AI and computer vision applications across various industries such as Military defense, Healthcare, Smart cities, and many more.
A. Yes, Intel’s OpenVINO toolkit is free to use. The Intel team developed this open-source toolkit to facilitate the optimization of deep learning models.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.