Image segmentation models have opened up ways to tackle tasks across many domains, and the open-source space has produced a steady stream of computer vision models and applications. Background removal is one such segmentation task that models have continued to refine over the years.
Bria’s RMBG v2.0 is a state-of-the-art model that performs background removal with great precision and accuracy. It improves on the older RMBG v1.4, and this open-source model offers accuracy, efficiency, and versatility across different benchmarks.
This model has applications in various fields, from gaming to stock image generation. Its capabilities stem largely from its training data and architecture, which allow it to operate in a wide range of contexts.
This model has a simple working principle. It takes images as input (in various formats, such as JPEG and PNG). After processing, the model outputs a segmented image with the background or foreground removed.
RMBG can also provide a mask, which you can use to process the image further or to add a new background.
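Below is a minimal sketch of that input/output contract. The all-opaque mask here is just a stand-in for what the model actually predicts; the real pipeline is walked through later in this article.

from PIL import Image

image = Image.open("photo.jpeg").convert("RGB")  # JPEG, PNG, and similar formats all work
mask = Image.new("L", image.size, 255)           # stand-in for the model's predicted mask
image.putalpha(mask)                             # apply the mask as the alpha channel
image.save("cutout.png")                         # PNG preserves transparency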
This model outperforms its predecessor, RMBG v1.4, in both precision and accuracy. Results from testing a few images highlight how v2.0 produces a cleaner background removal.
Although the earlier version performed well, RMBG v2.0 sets a new standard for understanding complex scenes and edge details while improving background removal in general.
Check out this link to test the earlier version against the latest one.
Developed by Bria AI, RMBG is based on the BiRefNet mechanism, an architecture designed for high-resolution image-background separation.
This approach combines complementary representations from two sources within a high-resolution restoration model, merging overall scene understanding (general localization) with detailed edge information (local) to allow clear and precise boundary detection.
RMBG v2.0 leverages the BiRefNet architecture through a two-stage design: a localization module and a restoration module.
The localization module generates a general semantic map representing the image’s primary areas. This component ensures that the model accurately captures the image’s structure; with it, the model can identify where objects are located in the image while accounting for the background.
The restoration module, on the other hand, reconstructs the boundaries of the objects in the image. It works at high resolution, unlike the first stage, where the semantic map is generated at a lower resolution.
The restoration module draws on two references: the original reference, a pixel map of the original image, provides background context, while the gradient reference supplies fine-edge detail. The gradient reference also improves accuracy by giving context to images with sharp boundaries and complex colors.
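To make the two-stage idea concrete, here is a rough sketch of how such a forward pass could be wired up. Note that localization_net and restoration_net are hypothetical placeholders, not Bria’s actual modules.

import torch
import torch.nn.functional as F

def birefnet_style_forward(image, localization_net, restoration_net):
    # Stage 1: coarse semantic map at reduced resolution (general localization)
    low_res = F.interpolate(image, scale_factor=0.25, mode="bilinear")
    semantic_map = localization_net(low_res)

    # Stage 2: restoration at full resolution, guided by two references
    original_ref = image                                # pixel map of the original image
    dx = image[..., :, 1:] - image[..., :, :-1]         # horizontal intensity differences
    gradient_ref = F.pad(dx, (0, 1))                    # fine-edge cues, padded back to size
    coarse_up = F.interpolate(semantic_map, size=image.shape[-2:], mode="bilinear")
    return restoration_net(torch.cat([original_ref, gradient_ref, coarse_up], dim=1))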
This approach yields excellent results in object separation, especially in high-resolution images. The BiRefNet architecture and the model’s training dataset together deliver strong results across various benchmarks.
You can run inference on this model even in low-resource environments, and with a simple background image you can achieve a complete, accurate separation.
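If no GPU is available, a common fallback (in place of the hard-coded 'cuda' used later in this walkthrough) is to pick the device at runtime:

import torch

# Use the GPU when present; the model also runs on CPU, just more slowly
device = "cuda" if torch.cuda.is_available() else "cpu"
# later: model.to(device), and move input tensors with .to(device) as well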
Let’s dive into how we can run the RMBG v2.0 model:
pip install kornia
Installing Kornia is relevant for this task, as it is a Python library essential for various computer vision models. Kornia is a differentiable computer vision library built on PyTorch that provides functionality for image processing, geometric transformations, filtering, and deep learning applications.
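Kornia is not called directly in the snippets below (the model’s remote code uses it internally), but a small example shows the kind of differentiable, tensor-based operation it provides:

import torch
import kornia

# Kornia works on batched tensors of shape (B, C, H, W) with values in [0, 1]
img = torch.rand(1, 3, 256, 256)

# Differentiable Gaussian blur, e.g. for smoothing masks or inputs
blurred = kornia.filters.gaussian_blur2d(img, kernel_size=(5, 5), sigma=(1.5, 1.5))
print(blurred.shape)  # torch.Size([1, 3, 256, 256])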
from PIL import Image
import matplotlib.pyplot as plt
import torch
from torchvision import transforms
from transformers import AutoModelForImageSegmentation
These libraries are all essential to running this model. ‘PIL’ comes in handy for image-processing tasks like loading and opening images, while ‘matplotlib’ is great for displaying images and drawing graphs.
‘torch’ and the ‘torchvision’ transforms convert images into a format compatible with deep learning models. Finally, we use ‘AutoModelForImageSegmentation’, which lets us load the pre-trained model for image segmentation.
model = AutoModelForImageSegmentation.from_pretrained('briaai/RMBG-2.0', trust_remote_code=True)  # load the pre-trained weights
torch.set_float32_matmul_precision(['high', 'highest'][0])  # faster float32 matmuls at slightly reduced precision
model.to('cuda')  # move the model to the GPU
model.eval()      # inference mode: disables dropout and similar training-only layers
This code loads the pre-trained model for background removal, passing ‘trust_remote_code=True’ to allow the execution of the custom Python code that ships with the model. The next line speeds up float32 matrix multiplications at a small cost in precision.
Finally, we move the model to the available GPU and put it in evaluation mode for inference.
The code below defines the image preprocessing stage: it resizes the image to 1024 x 1024, converts it to a tensor, and normalizes the pixel values with the ImageNet mean and standard deviation.
The ‘transforms.Compose’ function chains these operations so that every input image is processed uniformly and pixel values stay in a consistent range.
image_size = (1024, 1024)
transform_image = transforms.Compose([
    transforms.Resize(image_size),  # the model expects 1024 x 1024 input
    transforms.ToTensor(),          # PIL image -> float tensor in [0, 1]
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])  # ImageNet statistics
])
image = Image.open("/content/Boy using a computer.jpeg")
input_images = transform_image(image).unsqueeze(0).to('cuda')
Here, we load the image and prepare it for the model. First, we open the image using ‘PIL’; the transform then resizes it, converts it to a tensor, and normalizes it. An extra batch dimension is added before the tensor is moved to ‘cuda’, which speeds up inference and ensures compatibility with the model.
The following code removes the background by generating a segmentation mask from the model’s predictions and applying it to the original image.
with torch.no_grad():  # no gradients needed for inference
    preds = model(input_images)[-1].sigmoid().cpu()  # per-pixel foreground probabilities
pred = preds[0].squeeze()                 # drop the batch and channel dimensions
pred_pil = transforms.ToPILImage()(pred)  # tensor -> grayscale PIL mask
mask = pred_pil.resize(image.size)        # resize the mask to the original resolution
image.putalpha(mask)                      # set the mask as the image's alpha channel
This code removes the background by getting a transparency mask from the model. It runs the model without gradient tracking, applies sigmoid() to get pixel probabilities, and moves the result to the CPU. The mask is resized to match the original image and set as its alpha channel, making the background transparent.
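Since the mask now lives in the image’s alpha channel, adding a new background (mentioned earlier) takes only a few lines with PIL. A minimal sketch, assuming the image variable from the code above:

from PIL import Image

# Composite the cutout onto a plain backdrop
background = Image.new("RGB", image.size, (240, 240, 240))  # light-grey canvas
background.paste(image, (0, 0), image.split()[-1])          # alpha channel acts as the mask
background.save("new_background.png")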
The result for the input image is shown below, with the background removed and the primary object (the boy) cleanly separated.
The full code file can be found here.
There are various use cases for this model across different fields. Some of the common applications include:
RMBG is used across various industries, and the model’s capabilities have improved from the earlier v1.4 to the more recent v2.0. Its architecture and its use of BiRefNet play a huge role in its performance and inference time. You can explore this model with various image types to judge the quality of its output.
Q. How does RMBG v2.0 improve on v1.4?
A. RMBG v2.0 improves edge detection, background separation, and accuracy, especially in complex scenes with detailed edges.
Q. Which image formats does the model support?
A. It supports various formats, such as JPEG and PNG, making it adaptable for different use cases.
Q. Do I need powerful hardware to run it?
A. No. This model is optimized for low-resource environments and can run efficiently on standard GPUs.
Q. What architecture is RMBG v2.0 built on?
A. RMBG v2.0 is built on the BiRefNet mechanism, which improves high-resolution image-background separation using localization and restoration modules.
Q. How do I run the model?
A. Install the required dependencies (such as Kornia), load the pre-trained model, preprocess your images, and perform inference using PyTorch.
Q. Where can I find more resources?
A. You can refer to Bria AI’s blog, the Hugging Face model repository, and AIModels.fyi for documentation and implementation guides.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.