A Step-by-Step Guide to Image Segmentation Techniques (Part 1)

Pulkit Sharma | Last Updated: 02 Dec, 2024
16 min read

What’s the first thing you do when attempting to cross the road? We typically look left and right, take stock of the vehicles on the road, and decide. In milliseconds, our brain can analyze what kind of vehicle (car, bus, truck, auto, etc.) is approaching us. Can machines do that?

The answer was an emphatic ‘no’ until a few years back. However, the rise and advancements in computer vision have changed the game. We can build computer vision models that can detect objects, determine their shape, predict the direction they will go in, and many other things. You might have guessed it—that’s the powerful technology behind self-driving cars!

image segmentation

There are multiple ways of dealing with computer vision challenges. The most popular approach I have encountered is based on identifying the objects present in an image, aka object detection. But what if we want to dive deeper? What if just detecting objects isn’t enough—we want to analyze our image at a much more granular level?

As data scientists, we are always curious to dig deeper into the data. Asking questions like these is why I love working in this field!

In this article, I will introduce you to image segmentation. It is a powerful computer vision technique that builds upon the idea of object detection and takes us to a whole new level of working with image data. This technique opens up so many possibilities – it has blown my mind. By the end, you will have a clear understanding of what image segmentation is, how it fits into image processing, its benefits, and the main types of image segmentation.

What Is Image Segmentation?

Image segmentation is a computer vision method that divides a digital image into distinct pixel groups, known as image segments, to aid in object detection and similar tasks. By breaking down an image’s complex visual information into uniquely shaped segments, this technique facilitates quicker and more sophisticated image processing.

Here’s a breakdown of what image segmentation is and what it does:

  • Goal: Simplify and analyze images by separating them into different segments. This makes it easier for computers to understand the content of the image.
  • Process: Assigns a label to each pixel in the image. Pixels with the same label share certain properties, like color or brightness.
  • Benefits:
    • Enables object detection and recognition in images.
    • Allows for more detailed analysis of specific image regions.
    • Simplifies image processing tasks.

Let’s understand the image segmentation algorithm using a simple example. Consider the below image:

dog image

There’s only one object here – a dog. We can build a straightforward cat-dog classifier model and predict that there’s a dog in the given image. But what if we have a cat and a dog in a single image?

cat and dog

We can train a multi-label classifier, for instance. However, there’s another caveat—we won’t know the location of either animal in the image.

That’s where image localization comes into the picture (no pun intended!). It helps us identify a single object’s location in the given image. We rely on object detection (OD) if we have multiple objects present. We can predict the location and class for each object using OD.

image localization and object detection

Before detecting the objects and even before classifying the image, we need to understand what it consists of. Enter Image Segmentation.

How Does Image Segmentation Work?

We can divide or partition the image into various parts called segments. It’s not a great idea to process the entire image at the same time, as there will be regions in the image that do not contain any information. By dividing the image into segments, we can use the important segments to process the image. That, in a nutshell, is how image segmentation works.

An image is a collection or set of different pixels. We group the pixels that have similar attributes using image segmentation. Take a moment to go through the below visual (it’ll give you a practical idea of segmentation in image processing):

Object detection builds a bounding box corresponding to each class in the image. But it tells us nothing about the object’s shape—we only get the set of bounding box coordinates. We want more information—this is too vague for our purposes.

The image segmentation algorithm creates a pixel-wise mask for each object in the image. This technique gives us a far more granular understanding of the object(s) in the image.

Why do we need to go this deep? Can’t all image processing tasks be solved using simple bounding box coordinates? Let’s take a real-world example to answer this pertinent question.

What Is Image Segmentation Used For?

Cancer has long been a deadly illness. Even in today’s age of technological advancements, cancer can be fatal if we don’t identify it at an early stage. Detecting cancerous cells as quickly as possible can save millions of lives.

The shape of the cancerous cells plays a vital role in determining the severity of the cancer. You might have put the pieces together, but object detection will not be very useful here. We will only generate bounding boxes, which will not help us identify the shape of the cells.

Image Segmentation techniques make a MASSIVE impact here. They help us approach this problem more granularly and get more meaningful results. A win-win for everyone in the healthcare industry.

cancer cell segmentation

Here, we can see the shapes of all the cancerous cells. There are many other applications where image segmentation algorithms are transforming industries. Feel free to share the ones you find interesting in the comments section below this article – let’s see if we can build something together. 🙂

Different Types of Image Segmentation

We can broadly divide image segmentation techniques into two types. Consider the below images:

semantic and instance segmentation

Can you identify the difference between these two? Both images use image segmentation techniques to identify and locate the people present.

  • In image 1, every pixel belongs to a particular class (either background or person). Also, all the pixels belonging to a particular class are represented by the same color (background as black and person as pink). This is an example of semantic segmentation
  • Image 2 also assigns a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation

Let me quickly summarize what we’ve learned. If there are 5 people in an image, semantic segmentation will focus on classifying all the people as a single instance. Instance segmentation, however, will identify each of these people individually.
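To make the distinction concrete, here is a tiny NumPy sketch of what the two kinds of label maps look like for an image containing two people. All label values here are invented purely for illustration:

```python
import numpy as np

# Toy 4x5 instance label map: 0 = background, 1 = person 1, 2 = person 2.
instance = np.array([
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 2, 2],
])

# Semantic view: collapse all instance ids of the same class into one label,
# so both people simply become "person" (1).
semantic = (instance > 0).astype(int)

print(np.unique(semantic))  # [0 1]   -> background vs. person
print(np.unique(instance))  # [0 1 2] -> background, person 1, person 2
```

The semantic map answers "which class does this pixel belong to?", while the instance map additionally answers "which individual object?".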

So far, we have delved into the theoretical concepts of image processing and segmentation. Let’s mix things up a bit – we’ll combine learning concepts with implementing them in Python. I believe that’s the best way to learn and remember any topic.

Region-based Segmentation

One simple way to segment different objects could be to use their pixel values. An important point to note – the pixel values will be different for the objects and the image’s background if there’s a sharp contrast between them.

In this case, we can set a threshold value. The pixel values falling below or above that threshold can be classified accordingly (as objects or backgrounds). This technique is known as Threshold Segmentation.

If we want to divide the image into two regions (object and background), we define a single threshold value. This is known as the global threshold.

If we have multiple objects along with the background, we must define multiple thresholds. These thresholds are collectively known as the local threshold.

Implementation

Let’s implement what we’ve learned in this section. Download this image and run the code below; it will give you a better understanding of how thresholding works. (You can use any image of your choice if you feel like experimenting!)

First, we’ll import the required libraries:

import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
from skimage.color import rgb2gray

Let’s read the downloaded image and plot it:

image = plt.imread('1.jpeg')
image.shape
plt.imshow(image)
threshold segmentation

It is a three-channel image (RGB). We need to convert it into grayscale so that we have only a single channel. Doing this will also help us better understand how the algorithm works:

gray = rgb2gray(image)
plt.imshow(gray, cmap='gray')

grayscale image

Now, we want to apply a certain threshold to this image. This threshold should separate the image into two parts – the foreground and the background. Before we do that, let’s quickly check the shape of this image:

gray.shape

(192, 263)

The height and width of the image are 192 and 263, respectively. We will take the mean of the pixel values and use that as a threshold. If the pixel value exceeds our threshold, we can say it belongs to an object. The pixel value will be treated as the background if it is less than the threshold. Let’s code this:

gray_r = gray.reshape(gray.shape[0]*gray.shape[1])
threshold = gray_r.mean()  # compute the mean once, before we start overwriting pixels
for i in range(gray_r.shape[0]):
    if gray_r[i] > threshold:
        gray_r[i] = 1
    else:
        gray_r[i] = 0
gray = gray_r.reshape(gray.shape[0],gray.shape[1])
plt.imshow(gray, cmap='gray')

Nice! The darker region (black) represents the background, and the brighter (white) region is the foreground. We can define multiple thresholds as well to detect multiple objects:

gray = rgb2gray(image)
gray_r = gray.reshape(gray.shape[0]*gray.shape[1])
threshold = gray_r.mean()  # again, fix the threshold before modifying the array
for i in range(gray_r.shape[0]):
    if gray_r[i] > threshold:
        gray_r[i] = 3
    elif gray_r[i] > 0.5:
        gray_r[i] = 2
    elif gray_r[i] > 0.25:
        gray_r[i] = 1
    else:
        gray_r[i] = 0
gray = gray_r.reshape(gray.shape[0],gray.shape[1])
plt.imshow(gray, cmap='gray')
local threshold

There are four different segments in the above image. You can set different threshold values and check how the segments are made. Some of the advantages of this method are:

  • Calculations are simpler
  • Fast operation speed
  • When the object and background have high contrast, this method performs well

However, this approach has some limitations. When there is no significant grayscale difference or an overlap of the grayscale pixel values, it becomes very difficult to get accurate segments.
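As a side note, the per-pixel loops above can be replaced by vectorized NumPy expressions, which are much faster on large images. This is just a sketch using a random stand-in array; in practice you would use the output of rgb2gray:

```python
import numpy as np

# Stand-in grayscale image with values in [0, 1] (use rgb2gray(image) in practice).
gray = np.random.default_rng(0).random((192, 263))

# Global threshold in one step: no Python loop needed.
binary = (gray > gray.mean()).astype(float)

# Multiple thresholds at once: np.digitize maps each pixel to a bin index,
# giving one segment per threshold interval (0, 1, 2, or 3 here).
segments = np.digitize(gray, bins=[0.25, 0.5, 0.75])

print(binary.shape)  # (192, 263)
```

The result is identical to the loop version, but the work happens in optimized C code inside NumPy.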

Edge Detection Segmentation

What divides two objects in an image? An edge is always between two adjacent regions with different grayscale values (pixel values). The edges can be considered as the discontinuous local features of an image.

We can use this discontinuity to detect edges and hence define a boundary of the object. This helps us detect the shapes of multiple objects in a given image. Now, the question is, how can we detect these edges? This is where we can make use of filters and convolutions. Refer to this article if you need to learn about these concepts.

The below visual will help you understand how a filter convolves over an image:

convolution

Here’s the step-by-step process of how this works:

  • Take the weight matrix
  • Put it on top of the image
  • Perform element-wise multiplication and get the output
  • Move the weight matrix as per the stride chosen
  • Convolve until all the pixels of the input are used
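The steps above can be sketched in a few lines of NumPy. This is a bare-bones, loop-based version (cross-correlation style, with no padding), written purely for intuition rather than speed:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, multiplying element-wise and summing
    at each position - exactly the steps listed above."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1   # output height
    ow = (iw - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

img = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
k = np.array([[1., 0], [0, -1]])  # a toy 2x2 weight matrix
print(convolve2d(img, k))  # every entry is -4 for this image and kernel
```

Libraries such as SciPy implement the same operation far more efficiently, as we will see below with ndimage.convolve.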

The values of the weight matrix define the output of the convolution, and choosing them well helps us extract features from the input. Researchers have found that specific weight matrices let us detect horizontal or vertical edges (or even a combination of the two).

One such weight matrix is the Sobel operator. It is typically used to detect edges. The Sobel operator has two weight matrices—one for detecting horizontal edges and the other for detecting vertical edges. Let me show how these operators look, and we will then implement them in Python.

Sobel filter (horizontal) =

 1  2  1
 0  0  0
-1 -2 -1

Sobel filter (vertical) =

-1  0  1
-2  0  2
-1  0  1

Edge detection works by convolving these filters over the given image. Let’s visualize them on an image:

image = plt.imread('index.png')
plt.imshow(image)
edge detection

Understanding how the edges are detected in this image should be fairly simple. Let’s convert it into grayscale and define the Sobel filters (both horizontal and vertical) that will be convolved over this image:

# converting to grayscale
gray = rgb2gray(image)

# defining the sobel filters
sobel_horizontal = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
print(sobel_horizontal, 'is a kernel for detecting horizontal edges')

sobel_vertical = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
print(sobel_vertical, 'is a kernel for detecting vertical edges')
edge detection

Now, convolve these filters over the image using the convolve function of the ndimage module from scipy.

out_h = ndimage.convolve(gray, sobel_horizontal, mode='reflect')
out_v = ndimage.convolve(gray, sobel_vertical, mode='reflect')
# here mode determines how the input array is extended when the filter overlaps a border.

Let’s plot these results:

plt.imshow(out_h, cmap='gray')
horizontal edge detection
plt.imshow(out_v, cmap='gray')
vertical edge detection

Here, we can identify the horizontal and vertical edges. There is one more type of filter that can detect both horizontal and vertical edges simultaneously. This is called the Laplace operator:

 1  1  1
 1 -8  1
 1  1  1

Let’s define this filter in Python and convolve it on the same image:

kernel_laplace = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])
print(kernel_laplace, 'is a laplacian kernel')
laplacian filter

Next, convolve the filter and print the output:

out_l = ndimage.convolve(gray, kernel_laplace, mode='reflect')
plt.imshow(out_l, cmap='gray')
laplacian edge detection

Here, we can see that our method has detected both horizontal and vertical edges. I encourage you to try it on different images and share your results. Remember, the best way to learn is by practicing!

Clustering-based Image Segmentation

This idea might have come to you while reading about image segmentation techniques. Can’t we use clustering techniques to divide images into segments? We certainly can!

In this section, we’ll get an intuition of clustering (it’s always good to revise certain concepts!) and how to use it to segment images.

Clustering is the task of dividing the population (data points) into a number of groups such that data points in the same group are more similar to each other than to data points in other groups. These groups are known as clusters.

K-means Clustering

One of the most commonly used clustering algorithms is k-means. Here, the k represents the number of clusters (not to be confused with k-nearest neighbor). Let’s understand how k-means works:

  1. First, choose k, the number of clusters
  2. Randomly assign each data point to one of the k clusters
  3. Calculate the centers of these clusters
  4. Calculate the distance of every point from the center of each cluster
  5. Depending on this distance, reassign each point to its nearest cluster
  6. Recalculate the centers of the newly formed clusters
  7. Finally, repeat steps (4), (5), and (6) until either the cluster centers stop changing or we reach the set number of iterations
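The steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not a production implementation (scikit-learn's KMeans, used later in this article, is the practical choice):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means following the steps listed above."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(points))  # steps 1-2: random assignment
    for _ in range(iters):
        # steps 3 and 6: each center is the mean of its member points
        centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c)
            else points[rng.integers(len(points))]  # re-seed an empty cluster
            for c in range(k)
        ])
        # steps 4-5: distance from every point to every center, then reassign
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # step 7: converged
            break
        labels = new_labels
    return labels, centers

# Two well-separated blobs of 2-D points:
pts = np.array([[0.0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centers = kmeans(pts, k=2)
print(labels)
```

Because k-means can get stuck in local optima, real implementations typically run it several times from different random starts and keep the best result.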

The key advantage of using the k-means algorithm is that it is simple and easy to understand. We are assigning the points to the clusters closest to them.

How well does k-means segment objects in an image?

Let’s put our learning to the test and check how well k-means segments the objects in an image. We will be using this image, so download it, read it, and check its dimensions:

pic = plt.imread('1.jpeg')/255  # dividing by 255 to bring the pixel values between 0 and 1
print(pic.shape)
plt.imshow(pic)
clustering

It’s a 3-dimensional image of shape (192, 263, 3). To cluster the image using k-means, we first need to convert it into a 2-dimensional array of shape (length*width, channels). In our example, this is (192*263, 3).

pic_n = pic.reshape(pic.shape[0]*pic.shape[1], pic.shape[2])
pic_n.shape

(50496, 3)

The image has been converted to a 2-dimensional array. Next, we fit the k-means algorithm to this reshaped array to obtain the clusters. The cluster_centers_ attribute of the fitted k-means object holds the cluster centers, and the labels_ attribute gives us the label for each pixel (it tells us which cluster each pixel of the image belongs to).

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, random_state=0).fit(pic_n)
pic2show = kmeans.cluster_centers_[kmeans.labels_]

I have chosen 5 clusters for this article, but you can play around with this number and check the results. Now, let’s reshape the array back into the original 3-dimensional image and plot the results.

cluster_pic = pic2show.reshape(pic.shape[0], pic.shape[1], pic.shape[2])
plt.imshow(cluster_pic)
clustering based segmentation

Amazing, isn’t it? We can segment the image pretty well using just 5 clusters. I’m sure you’ll be able to improve the segmentation by increasing the number of clusters.

K-means works well when we have a small dataset. It can segment the objects in the image and give impressive results. However, the algorithm hits a roadblock when applied to a large dataset (more images).

It looks at all the samples at every iteration, so the time taken is too high, which also makes it expensive to run. And since k-means is a distance-based algorithm, it is not suitable for clustering non-convex shapes.
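One common workaround for larger pixel sets is scikit-learn's MiniBatchKMeans, which updates the cluster centers from small random batches instead of scanning every sample at each iteration. A quick sketch on random stand-in pixels (the array here is synthetic; substitute your own reshaped image):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for a large stack of flattened RGB pixels (values in [0, 1]).
pixels = np.random.default_rng(0).random((100_000, 3))

# Mini-batch updates make each iteration cheap, at a small cost in quality.
mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3,
                      random_state=0).fit(pixels)

# Same recoloring trick as with full k-means:
segmented = mbk.cluster_centers_[mbk.labels_]
print(segmented.shape)  # (100000, 3)
```

The trade-off is slightly noisier clusters in exchange for a large speedup on big datasets.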

Finally, let’s look at a simple, flexible, and general approach for segmentation in image processing.

Mask R-CNN

Data scientists and researchers at Facebook AI Research (FAIR) pioneered a deep learning architecture called Mask R-CNN that can create a pixel-wise mask for each object in an image. This is a cool concept, so follow along closely!

Mask R-CNN is an extension of the popular Faster R-CNN object detection architecture. Mask R-CNN adds a branch to the already existing Faster R-CNN outputs. The Faster R-CNN method generates two things for each object in the image:

  • Its class
  • The bounding box coordinates

Mask R-CNN adds a third branch to this, which also outputs the object mask. Take a look at the below image to get an intuition of how Mask R-CNN works on the inside:

Mask R-CNN
  1. We take an image as input and pass it to the ConvNet, which returns the feature map for that image
  2. A region proposal network (RPN) is applied to these feature maps. This returns the object proposals along with their objectness score
  3. A RoI pooling layer is applied to these proposals to bring down all the proposals to the same size
  4. Finally, the proposals are passed to a fully connected layer to classify and output the bounding boxes for objects. It also returns the mask for each proposal
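To make the three outputs concrete, here is a toy post-processing sketch for a single proposal. All the numbers are invented for illustration: we take the argmax over the class scores and threshold the soft mask at 0.5 to obtain the binary pixel-wise mask:

```python
import numpy as np

# Hypothetical raw outputs for ONE proposal (all values invented):
class_scores = np.array([0.02, 0.95, 0.03])   # probabilities over 3 classes
box = np.array([12.0, 40.0, 98.0, 160.0])     # output 2: [x1, y1, x2, y2]
soft_mask = np.array([[0.10, 0.80, 0.90],     # per-pixel mask probabilities
                      [0.20, 0.70, 0.95],
                      [0.05, 0.30, 0.60]])

label = int(class_scores.argmax())                 # output 1: the class
binary_mask = (soft_mask >= 0.5).astype(np.uint8)  # output 3: pixel-wise mask

print(label)        # 1
print(binary_mask)
```

In a real Mask R-CNN pipeline the soft mask is also resized to the box's dimensions and pasted back into the full image, but the thresholding step is the essence of how the mask branch output becomes a segment.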

When it was introduced, Mask R-CNN was the state of the art for instance segmentation, running at about 5 frames per second.

Image Classification vs. Object Detection vs. Image Segmentation

Image classification, object detection, and image segmentation techniques are all fundamental tasks in computer vision that analyze image content, but they answer different questions about the image:

  • Image Classification: What’s in the image? This is the most basic task. The model assigns a single label to the entire image, like “cat” or “landscape.” It’s like answering a multiple-choice question with only one answer.
  • Object Detection: What objects are in the image, and where are they? This goes beyond classification. The model identifies specific objects (cats, cars, people) and draws bounding boxes around them to indicate their location. It’s like answering a multiple-choice question where you can choose multiple answers and mark their positions on the image.
  • Image Segmentation: What are the exact shapes of the objects in the image? This provides the most detail. The model assigns a label to each pixel in the image, creating a kind of digital mask that outlines the shape of each object. It’s like coloring each object in the image with a different color to show their exact boundaries.

Summary of Image Segmentation Techniques

I have summarized the different image segmentation algorithms in the below table. I suggest keeping it handy the next time you’re working on an image segmentation challenge!

| Algorithm | Description | Advantages | Limitations |
|---|---|---|---|
| Region-Based Segmentation | Separates the objects into different regions based on some threshold value(s) | Simple calculations; fast operation speed; performs well when the object and background have high contrast | When there is no significant grayscale difference, or the grayscale pixel values overlap, it becomes difficult to get accurate segments |
| Edge Detection Segmentation | Uses discontinuities in pixel values to find object boundaries | Works well for images with good contrast between objects | Not suitable when there are too many edges in the image or when there is little contrast between objects |
| Segmentation Based on Clustering | Divides the pixels of the image into homogeneous clusters | Works well on small datasets and generates excellent clusters | Computation time is too large and expensive; k-means is a distance-based algorithm and is not suitable for non-convex clusters |
| Mask R-CNN | Gives three outputs for each object in the image: its class, bounding box coordinates, and object mask | Simple, flexible, and general approach; also a state-of-the-art approach for instance segmentation | High training time |

Image Segmentation in Image Processing

Image segmentation techniques are fundamental in image processing. They divide a digital image into meaningful parts, like partitioning it into different regions containing pixels with similar characteristics. This simplifies the image and allows for a more focused analysis of specific objects or areas of interest.

Here’s a breakdown of what image segmentation is and how it works:

Purpose:

  • Simplify Complex Images: Segmenting an image breaks it down into smaller, more manageable pieces, making it easier to analyze specific regions or objects within the image.
  • Extract Objects of Interest: Image segmentation allows you to isolate specific objects from the background or other foreground elements. This is crucial for tasks like object recognition, counting, and tracking.
  • Prepare Images for Further Processing: Segmentation can be a pre-processing step for various image-processing tasks. By segmenting the image, you can focus on relevant regions and improve the accuracy of subsequent analysis.

How it Works:

  • Grouping Pixels: Image segmentation algorithms group pixels in an image based on shared characteristics. These characteristics can include color, intensity, texture, or spatial location.
  • Segmenting the Image: The segmentation process creates a new image, often called a segmentation mask. This mask assigns a label to each pixel, indicating the segment it belongs to.

Conclusion

This article is just the beginning of our journey to learn about image segmentation. In the next article of this series, we will explore the implementation of Mask R-CNN. So stay tuned!

Image segmentation has been incredibly useful throughout my deep learning career. The level of granularity I get from these techniques is astounding. I am always amazed by how much detail we can extract with a few lines of code.

I hope this article gave you a clear understanding of image segmentation: what it is, its benefits, the main types of image segmentation, and how it is used in image processing and deep learning.

Part 2 of this series is also live now: Computer Vision Tutorial: Implementing Mask R-CNN for Image Segmentation (with Python Code)


Frequently Asked Questions

Q1. What are the different types of image segmentation?

A. This article covers 4 main approaches to image segmentation: region-based segmentation, edge detection segmentation, clustering-based segmentation, and Mask R-CNN.

Q2. What is the best image segmentation method?

A. Clustering-based segmentation techniques such as k-means clustering are the most commonly used method for image segmentation.

Q3. Which method is used for image segmentation?

A. Image segmentation employs various methods, including thresholding, region-based segmentation, edge detection, and clustering algorithms like K-means and Gaussian mixture models. Each method aims to partition an image into distinct regions or objects based on criteria such as color, intensity, texture, or spatial proximity.

Q4. What are the advantages of image segmentation?

A. Image segmentation offers several advantages, including object recognition, image understanding, feature extraction, and image compression. Dividing an image into meaningful segments facilitates more precise analysis and manipulation of specific areas of interest, leading to enhanced accuracy in tasks like object detection, classification, and tracking in computer vision applications.

Q5. Which method is used for Image Segmentation?

A. Two main ways to segment images:
Classic methods: Analyze pixel features (color, location) for grouping. Examples: thresholding, region-based segmentation (clustering), edge detection.
Deep learning: Powerful neural networks learn from data to segment. Uses convolutional neural networks (CNNs) for complex tasks.

My research interests lie in the fields of Machine Learning and Deep Learning. I have an enthusiasm for learning new skills and technologies.
