This guide explains what Segment Anything Model 2 (SAM 2) is, how it works, and how you can use it to segment objects in images and videos. It offers state-of-the-art performance and flexibility in segmenting objects in images, making it a valuable resource for a variety of computer vision applications. The guide aims to provide a detailed, step-by-step walkthrough for setting up and using SAM 2 to perform image segmentation. By following it, you will be able to generate segmentation masks for images using both box and point prompts.
This article was published as a part of the Data Science Blogathon.
Before you begin, make sure you have a CUDA-enabled GPU for faster processing. Also, verify that you have Python installed on your machine. This guide assumes you have some basic knowledge of Python and image processing concepts.
Segment Anything Model 2 is an advanced tool for image segmentation developed by Facebook AI Research (FAIR). On July 29th, 2024, Meta AI released SAM 2, an advanced image and video segmentation foundation model. SAM 2 enables users to provide points or boxes in an image or video to generate segmentation masks for specific objects.
Let us now look into the applications of SAM 2 below:
Image segmentation is a computer vision technique that involves dividing an image into multiple segments or regions to simplify its analysis. Each segment represents a different object or part of an object within the image, making it easier to identify and analyze specific elements.
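To make the idea concrete, here is a minimal sketch in plain Python. The tiny 4x6 label map and its segment ids are made up for illustration: each pixel stores the id of the segment it belongs to, and a binary mask for one segment is obtained by comparing every pixel against that id.

```python
from collections import Counter

# Hypothetical 4x6 "image" where each pixel holds a segment id
# (0 = background, 1 and 2 = two different objects).
label_map = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]

def segment_areas(labels):
    """Count how many pixels belong to each segment id."""
    return Counter(pixel for row in labels for pixel in row)

def binary_mask(labels, segment_id):
    """Return a True/False mask that selects a single segment."""
    return [[pixel == segment_id for pixel in row] for row in labels]

areas = segment_areas(label_map)
mask_1 = binary_mask(label_map, 1)
print(dict(areas))
```

SAM 2 produces masks of this second kind: one boolean array per object, the same height and width as the input image.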
Types of Image Segmentation
We’ll guide you through the process of setting up the Segment Anything Model 2 (SAM 2) in your environment and utilizing its powerful capabilities for precise image segmentation tasks. From ensuring your GPU is ready to configuring the model and applying it to real images, each step will be covered in detail to help you harness the full potential of SAM 2.
First, let’s ensure that your environment is properly set up, starting with checking for GPU availability and setting the current working directory.
# Check GPU availability and CUDA version
!nvidia-smi
!nvcc --version
# Import necessary modules
import os
# Set the current working directory
HOME = os.getcwd()
print("HOME:", HOME)
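The `!` shell commands above only work inside a notebook. As a rough equivalent for a plain Python script, the sketch below uses only the standard library to check whether the `nvidia-smi` executable is on the PATH. The helper names `has_nvidia_smi` and `gpu_summary` are hypothetical, and finding the tool does not by itself guarantee that PyTorch can use the GPU.

```python
import shutil
import subprocess

def has_nvidia_smi():
    """Return True if the nvidia-smi executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None

def gpu_summary():
    """Return nvidia-smi's output as text, or None if it is unavailable or fails."""
    if not has_nvidia_smi():
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None

summary = gpu_summary()
print(summary if summary is not None else "nvidia-smi not available; a GPU may be missing")
```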
Next, we need to clone the SAM 2 repository from GitHub and install the required dependencies.
# Clone the SAM 2 repository
!git clone https://github.com/facebookresearch/segment-anything-2.git
# Change to the repository directory
%cd segment-anything-2
# Install the SAM 2 package
!pip install -e .
# Install additional packages
!pip install supervision jupyter_bbox_widget
Model checkpoints are essential, as they contain the trained parameters of SAM 2. We will download multiple checkpoints for different model sizes.
# Create a directory for checkpoints
!mkdir -p checkpoints
# Download the model checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P checkpoints
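Each checkpoint has to be paired with a matching model configuration file when the model is built. The sketch below keeps that pairing in one place and reports which checkpoints are present on disk. Only the "large" pairing (`sam2_hiera_large.pt` with `sam2_hiera_l.yaml`) appears in this guide; the other three config names are assumptions that follow the same naming pattern and should be checked against the cloned repository. The names `MODEL_VARIANTS` and `resolve_variant` are hypothetical.

```python
from pathlib import Path

# Variant name -> (checkpoint file, config file).
# Only the "large" pairing is taken from this guide; the other config
# names are assumptions and should be verified in the repository.
MODEL_VARIANTS = {
    "tiny": ("sam2_hiera_tiny.pt", "sam2_hiera_t.yaml"),
    "small": ("sam2_hiera_small.pt", "sam2_hiera_s.yaml"),
    "base_plus": ("sam2_hiera_base_plus.pt", "sam2_hiera_b+.yaml"),
    "large": ("sam2_hiera_large.pt", "sam2_hiera_l.yaml"),
}

def resolve_variant(name, checkpoint_dir="checkpoints"):
    """Return (checkpoint_path, config_name, exists_on_disk) for a variant."""
    checkpoint_file, config_name = MODEL_VARIANTS[name]
    checkpoint_path = Path(checkpoint_dir) / checkpoint_file
    return checkpoint_path, config_name, checkpoint_path.is_file()

for variant in MODEL_VARIANTS:
    path, config, found = resolve_variant(variant)
    status = "found" if found else "missing"
    print(f"{variant:10s} {path} ({status}) -> {config}")
```

Smaller variants generally trade some accuracy for speed and memory, so a lookup like this makes it easy to switch between them without editing paths by hand.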
For demonstration purposes, we’ll use some sample images. You can also use your own images by following similar steps.
# Create a directory for data
!mkdir -p data
# Download sample images
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P data
Now, we will set up the SAM 2 model, load an image, and prepare it for segmentation.
import cv2
import torch
import numpy as np
import supervision as sv
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
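# Enable bfloat16 autocast for the rest of the session (this line requires a CUDA GPU)
torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()
# Turn on TF32 on GPUs with compute capability 8.0 or higher (Ampere and newer)
if torch.cuda.get_device_properties(0).major >= 8:
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
# Use CUDA if available, otherwise fall back to the CPU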
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Define the model checkpoint and configuration
CHECKPOINT = "checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"
# Build the SAM 2 model
sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)
# Create the automatic mask generator
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)
# Load an image for segmentation
IMAGE_PATH = "/content/WhatsApp Image 2024-08-02 at 14.17.11_2b223e01.jpg"
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
# Generate segmentation masks
sam2_result = mask_generator.generate(image_rgb)
We will now visualize the segmentation masks generated by SAM 2.
# Annotate the masks on the image
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections.from_sam(sam_result=sam2_result)
annotated_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the original and segmented images side by side
sv.plot_images_grid(
    images=[image_bgr, annotated_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)
# Extract and plot individual masks, largest first
masks = [
    mask['segmentation']
    for mask in sorted(sam2_result, key=lambda x: x['area'], reverse=True)
]
sv.plot_images_grid(
    images=masks[:16],
    grid_size=(4, 4),
    size=(12, 12)
)
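The code above treats the generator's result as a list of dictionaries with `segmentation` and `area` keys. Building on that, the sketch below shows one way to discard very small masks before plotting. It uses toy data with nested lists standing in for the real NumPy arrays, and the helper names and the `min_area` threshold are hypothetical.

```python
def mask_area(segmentation):
    """Count the True pixels in a 2-D boolean mask given as nested lists."""
    return sum(1 for row in segmentation for pixel in row if pixel)

def filter_by_area(results, min_area):
    """Keep entries whose 'area' is at least min_area, largest first."""
    kept = [entry for entry in results if entry["area"] >= min_area]
    return sorted(kept, key=lambda entry: entry["area"], reverse=True)

# Toy stand-ins for the mask generator output (made-up 2x2 masks)
toy_masks = [
    [[True, True], [True, False]],
    [[False, False], [False, True]],
    [[True, True], [True, True]],
]
toy_result = [{"segmentation": m, "area": mask_area(m)} for m in toy_masks]

large_enough = filter_by_area(toy_result, min_area=2)
print([entry["area"] for entry in large_enough])
```

With real output, the same `filter_by_area` call could be applied to `sam2_result` directly, since it only reads the `area` key.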
Box prompts allow us to specify regions of interest in the image for segmentation.
# Define the SAM 2 Image Predictor
predictor = SAM2ImagePredictor(sam2_model)
# Reload the image
image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
# Encode the image for bounding box input
import base64
def encode_image(filepath):
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    return "data:image/jpg;base64,"+encoded
# Enable custom widget manager in Colab
IS_COLAB = True
if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()
from jupyter_bbox_widget import BBoxWidget
# Create a bounding box widget
widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)
# Display the widget
widget
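The `encode_image` helper above turns the image file into a base64 data URI that the widget can display. As a variation, and only as a sketch, the MIME type can be guessed from the file name with the standard library's `mimetypes` module instead of being hard-coded. The function names `to_data_uri` and `from_data_uri`, the fallback type, and the sample bytes are assumptions of this example.

```python
import base64
import mimetypes

def to_data_uri(image_bytes, filename):
    """Build a base64 data URI, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback chosen for this sketch
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

def from_data_uri(data_uri):
    """Recover the raw bytes from a base64 data URI."""
    _header, encoded = data_uri.split(",", 1)
    return base64.b64decode(encoded)

# Placeholder bytes; a real call would pass the contents of an image file
sample_bytes = b"\xff\xd8\xff\xe0 not a real image"
uri = to_data_uri(sample_bytes, "dog.jpeg")
print(uri[:30])
```

Decoding the URI again with `from_data_uri` returns the original bytes, which is a quick way to confirm the encoding step is lossless.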
After specifying the bounding boxes, we can use them to generate segmentation masks.
# Get the bounding boxes from the widget
boxes = widget.bboxes
# Example value of widget.bboxes:
# [{'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
#  {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''}]
# Convert from (x, y, width, height) to (x_min, y_min, x_max, y_max)
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in boxes
])
# Set the image in the predictor
predictor.set_image(image_rgb)
# Generate masks using the bounding boxes
masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)
# Remove singleton dimensions from the mask array
masks = np.squeeze(masks)
# Annotate and visualize the masks
box_annotator = sv.BoxAnnotator(color=sv.Color.white())
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections(
    xyxy=boxes,
    mask=masks.astype(bool)
)
source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)
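The sample widget output in this step contains boxes with zero width, which describe a line or a point rather than a region. The sketch below repeats the same (x, y, width, height) to (x_min, y_min, x_max, y_max) conversion in plain Python and adds a check that drops such degenerate boxes. The helper names, the `min_size` threshold, and the third sample box are hypothetical additions for illustration.

```python
def xywh_to_xyxy(box):
    """Convert a widget box dict to [x_min, y_min, x_max, y_max]."""
    x, y = box["x"], box["y"]
    return [x, y, x + box["width"], y + box["height"]]

def valid_boxes(widget_boxes, min_size=1):
    """Convert boxes, skipping any that are narrower or shorter than min_size."""
    return [
        xywh_to_xyxy(box)
        for box in widget_boxes
        if box["width"] >= min_size and box["height"] >= min_size
    ]

sample = [
    {'x': 457, 'y': 341, 'width': 0, 'height': 0, 'label': ''},
    {'x': 205, 'y': 79, 'width': 0, 'height': 1, 'label': ''},
    {'x': 10, 'y': 20, 'width': 100, 'height': 50, 'label': ''},  # made-up box
]

converted = valid_boxes(sample)
print(converted)
```

If the resulting list is empty, it is usually a sign that the boxes were clicked rather than dragged in the widget, and they should be redrawn before calling the predictor.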
Point prompts allow us to specify individual points of interest for segmentation.
# Create point prompts based on bounding boxes
input_point = np.array([
    [
        box['x'] + (box['width'] // 2),
        box['y'] + (box['height'] // 2)
    ] for box in widget.bboxes
])
# Label 1 marks each point as foreground
input_label = np.array([1] * len(input_point))
# Generate masks using the point prompts
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True
)
# Remove singleton dimensions from the mask array
masks = np.squeeze(masks)
# Annotate and visualize the masks
point_annotator = sv.PointAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)
detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)
source_image = point_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)
# Plot the annotated images
sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)
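With `multimask_output=True`, the predictor returns several candidate masks together with a `scores` array. One common way to reduce these to a single mask is to keep the highest-scoring candidate. The sketch below illustrates this with plain lists standing in for the arrays, assuming `scores` is one-dimensional and aligned with `masks`. The helper name `best_mask` and all values are made up.

```python
def best_mask(masks, scores):
    """Return (index, mask) for the candidate with the highest score."""
    if not masks or len(masks) != len(scores):
        raise ValueError("masks and scores must be non-empty and the same length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index, masks[best_index]

# Toy candidates: three 2x2 masks with made-up confidence scores
toy_masks = [
    [[True, False], [False, False]],
    [[True, True], [False, False]],
    [[True, True], [True, True]],
]
toy_scores = [0.62, 0.91, 0.78]

index, mask = best_mask(toy_masks, toy_scores)
print(index, mask)
```

Plotting all candidates first, as the code above does, is still useful for seeing how the alternatives differ before settling on one.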
Let us now look at a few key points below:
The Segment Anything Model 2 (SAM 2) stands poised to revolutionize the fields of photo and video editing by introducing significant advancements in precision and computational efficiency. By integrating advanced AI capabilities, SAM 2 will enable more intuitive user interactions and real-time segmentation and editing, allowing seamless alterations with minimal effort. This groundbreaking technology promises to democratize content creation, empowering both professionals and amateurs to manipulate visual content, create stunning effects, and produce high-quality media with ease.
As SAM 2 automates complex segmentation tasks, it will accelerate workflows and make sophisticated editing accessible to a wider audience. This transformation will inspire innovation across various industries, from entertainment and advertising to education. In the realm of visual effects (VFX), SAM 2 will streamline intricate processes, reducing the time and effort needed to create elaborate VFX. This will enable more ambitious projects, elevate the quality of visual storytelling, and open up new creative possibilities in the VFX world.
By following this guide, you have learned how to set up and use the Segment Anything Model 2 (SAM 2) for image segmentation using both box and point prompts. SAM 2 provides powerful and flexible tools for segmenting objects in images, making it a valuable asset for various computer vision tasks. Feel free to experiment with your own images and explore the capabilities of SAM 2 further.
A. SAM 2, or Segment Anything Model 2, is an image and video segmentation model developed by Meta AI that allows users to generate segmentation masks for specific objects by providing box or point prompts.
A. To use SAM 2, you need a CUDA-enabled GPU for faster processing and Python installed on your machine. Basic knowledge of Python and image processing concepts is also helpful.
A. Set up SAM 2 by checking GPU availability, cloning the SAM 2 repository from GitHub, installing required dependencies, and downloading model checkpoints and sample images for testing.
A. SAM 2 supports both box prompts and point prompts. Box prompts involve specifying regions of interest using bounding boxes, while point prompts involve selecting specific points in the image.
A. SAM 2 can transform photo and video editing by automating complex segmentation tasks, enabling real-time editing, and making advanced editing tools available to a broader audience, thereby expanding creative possibilities and improving workflow efficiency.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.