This project focuses on car scratch detection, in line with the broader development of autonomous quality-inspection systems for different types of products. In a parking lot, for example, such detection assures clients that their car will remain safe and sound; and if something does happen, the detection system helps handle the situation properly.
Further, the techniques learned in this project can be applied to other projects or combined with related problems, such as quality assurance and second-hand car valuation. I have tackled this as a single-class problem that treats dents, damage, and scratches all as "scratch", and, with the help of Flask, built a basic app. I will walk you through all the thoughts, code, algorithms, and knowledge I gained while doing this project, which I implemented via Mask RCNN and Yolov5.
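The Flask app itself is not the focus of this article, but here is a minimal sketch of how such an app could wrap the trained detector; the route, the file handling, and the weights path are illustrative assumptions, not the project's actual code:

from flask import Flask, request, jsonify
import torch

app = Flask(__name__)

# illustrative: load custom-trained Yolov5 weights (placeholder path)
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

@app.route('/detect', methods=['POST'])
def detect():
    # expects an image uploaded under the 'image' form field
    file = request.files['image']
    file.save('upload.jpg')
    results = model('upload.jpg')
    # return the detections (boxes, confidences, classes) as JSON
    return jsonify(results.pandas().xyxy[0].to_dict(orient='records'))

if __name__ == '__main__':
    app.run()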
This is the end result of the model.
Learning Objective

In this article, we will learn how to collect and annotate a custom image dataset, frame scratch, dent, and damage detection as a single-class problem, train Mask RCNN and Yolov5 on it, and evaluate the results.
To collect data, I built a scraper that uses Beautiful Soup to pull images from stock-photo websites such as Adobe Stock and iStock.
import requests
from bs4 import BeautifulSoup

url = 'https://stock.adobe.com/in/search/images?k=car%20scratch'
# make a request to the url
r = requests.get(url)
# create our soup
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title.text)
images = soup.find_all('img')
for image in images:
    name = image['alt']
    link = image['src']
    # save each image, using its alt text as the file name
    with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
        im = requests.get(link)
        f.write(im.content)
But this didn’t work well: most of the images could not be scraped because of the websites’ policies against scraping. So I went ahead and downloaded the images directly from iStock, Shutterstock, and Adobe Stock.
I started with around 80 images, grew the set to 350, and finally reached around 900 images for the final annotations.
Image segmentation is the partitioning of an image into different regions at the pixel level. Mask RCNN is a model for instance segmentation, a sub-type of image segmentation that separates individual object instances within their boundaries. It builds on Faster RCNN: while Faster RCNN produces two outputs for each object, a class label and a bounding-box offset, Mask RCNN adds a third output, the mask of the object.
The architecture of Mask RCNN consists of a backbone network (such as ResNet50 with a Feature Pyramid Network) for feature extraction, a Region Proposal Network that proposes candidate object regions, an RoIAlign layer that extracts a fixed-size feature map for each proposal, and parallel heads for classification, bounding-box regression, and mask prediction.
The advantage of using Mask RCNN to detect scratches on cars is that we can work with polygons and not just bounding boxes, creating a mask over our target, which further enables us to obtain and visualize the result in a more accurate and succinct way.
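Before moving to the training code, here is a minimal inference sketch, separate from this project's pipeline, showing the extra 'masks' output that torchvision's pre-trained Mask R-CNN returns; the image path is a placeholder:

import cv2
import torch
import torchvision

# load a Mask R-CNN pre-trained on COCO and switch to inference mode
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# read a placeholder image and convert it to a CHW float tensor in [0, 1]
image = cv2.cvtColor(cv2.imread('car.jpg'), cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(image / 255.0).permute(2, 0, 1).float()

with torch.no_grad():
    output = model([tensor])[0]

# besides 'boxes', 'labels' and 'scores', Mask R-CNN returns per-instance masks
print(output['boxes'].shape)  # (N, 4) bounding boxes
print(output['masks'].shape)  # (N, 1, H, W), one soft mask per detected instance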
Let’s start implementing Mask RCNN for our problem.
Importing all the libraries required to implement our Mask RCNN algorithm.
# importing libraries
import pandas as pd
import numpy as np
import cv2
import os
import re
from PIL import Image

import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SequentialSampler

from matplotlib import pyplot as plt
The data used here is in .csv format, with the x, y, w, and h coordinates of each bounding box; the annotations were created with Make Sense, a free online data annotation tool.
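For context, here is a minimal sketch of how such a CSV could be loaded into the train_df used below; the file name is a placeholder, and the column names are assumed from the dataset class that follows:

import pandas as pd

# placeholder path; one row per bounding box, keyed by image_id
train_df = pd.read_csv('scratch_annotations.csv')
print(train_df.columns.tolist())  # expected: ['image_id', 'x', 'y', 'w', 'h']
print(train_df.head())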
image_ids = train_df['image_id'].unique()
print(len(image_ids))
valid_ids = image_ids[-10:]
train_ids = image_ids[:-10]

# valid and train df
valid_df = train_df[train_df['image_id'].isin(valid_ids)]
train_df = train_df[train_df['image_id'].isin(train_ids)]
Next, we create our ScratchDataset class, which transforms our data and returns the image, its target dictionary, and the image id.
class ScratchDataset(Dataset):

    def __init__(self, dataframe, image_dir, transforms=None):
        super().__init__()
        self.image_ids = dataframe['image_id'].unique()
        self.df = dataframe
        self.image_dir = image_dir
        self.transforms = transforms

    def __getitem__(self, index: int):
        image_id = self.image_ids[index]
        records = self.df[self.df['image_id'] == image_id]

        image = cv2.imread(f'{self.image_dir}/{image_id}.jpg', cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0

        # convert boxes from (x, y, w, h) to (x_min, y_min, x_max, y_max)
        boxes = records[['x', 'y', 'w', 'h']].values
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2]
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3]

        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        area = torch.as_tensor(area, dtype=torch.float32)

        # there is only one class
        labels = torch.ones((records.shape[0],), dtype=torch.int64)

        # suppose all instances are not crowd
        iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

        target = {}
        target['boxes'] = boxes
        target['labels'] = labels
        target['image_id'] = torch.tensor([index])
        target['area'] = area
        target['iscrowd'] = iscrowd

        if self.transforms:
            sample = {
                'image': image,
                'bboxes': target['boxes'],
                'labels': labels
            }
            sample = self.transforms(**sample)
            image = sample['image']
            target['boxes'] = torch.tensor(sample['bboxes'])

        return image, target, image_id

    def __len__(self) -> int:
        return self.image_ids.shape[0]
Here ‘image_dir’ is the path to the directory where the images are saved.
Here we are using Albumentations for data augmentation.
# Albumentations
def get_train_transform():
    return A.Compose([
        A.Flip(0.5),
        ToTensorV2(p=1.0)
    ], bbox_params={'format': 'pascal_voc', 'label_fields': ['labels']})

def get_valid_transform():
    return A.Compose([
        ToTensorV2(p=1.0)
    ], bbox_params={'format': 'pascal_voc', 'label_fields': ['labels']})
We are going to use a ResNet50 backbone (with a Feature Pyramid Network) pre-trained on COCO. Note that because our annotations are bounding boxes rather than pixel masks, the code below loads the Faster RCNN detection head.
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

num_classes = 2  # 1 class (scratch) + background

# get the number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
Let’s move on to creating an Averager class and the training and validation data loaders, which are going to be the key components while training our model.
class Averager:
    def __init__(self):
        self.current_total = 0.0
        self.iterations = 0.0

    def send(self, value):
        self.current_total += value
        self.iterations += 1

    @property
    def value(self):
        if self.iterations == 0:
            return 0
        else:
            return 1.0 * self.current_total / self.iterations

    def reset(self):
        self.current_total = 0.0
        self.iterations = 0.0
def collate_fn(batch):
    return tuple(zip(*batch))
train_dataset = ScratchDataset(train_df, DIR_TRAIN, get_train_transform())
valid_dataset = ScratchDataset(valid_df, DIR_TRAIN, get_valid_transform())

# split the dataset in train and test set
indices = torch.randperm(len(train_dataset)).tolist()

train_data_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=False,
    num_workers=4,
    collate_fn=collate_fn
)

valid_data_loader = DataLoader(
    valid_dataset,
    batch_size=8,
    shuffle=False,
    num_workers=4,
    collate_fn=collate_fn
)
We activate ‘cuda’ and use the GPU if it is available to us. We train with weight_decay=0.0005, momentum=0.9, and a learning rate of 0.005; a StepLR scheduler is left commented out in case you want the learning rate to decay during training.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

images, targets, image_ids = next(iter(train_data_loader))

model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
lr_scheduler = None

num_epochs = 2
loss_hist = Averager()
itr = 1

for epoch in range(num_epochs):
    loss_hist.reset()

    for images, targets, image_ids in train_data_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        loss_hist.send(loss_value)

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        if itr % 50 == 0:
            print(f'Iteration #{itr} loss: {loss_value}')
        itr += 1

    # update the learning rate
    if lr_scheduler is not None:
        lr_scheduler.step()

    print(f'Epoch #{epoch} loss: {loss_hist.value}')
But I could not take this training to completion: it took more than 10 hours for a meager 80 images. The time cost of custom-training Mask RCNN is huge, and it demands far more computing power than I had available. If you have a good computing machine, you should be able to implement it.
Yolo, released by Ultralytics [GitHub], is primarily used for object detection and has become a benchmark algorithm for detection tasks in visual data. Yolov5 is faster and more efficient than Yolov4, and it generalizes well to new images.
The algorithm works by dividing the input image into a grid, having each grid cell predict bounding boxes, objectness scores, and class probabilities in a single forward pass, and then filtering overlapping predictions with non-maximum suppression.
Yolov5 is faster, smaller, and roughly as accurate as previous versions. Trained on the COCO dataset, it works well with bounding boxes.
Let’s start with the implementation of Yolov5 for our problem case; I used Google Colab to run the code.
Data Annotation

For Yolov5, the images were annotated again (with Make Sense), this time exporting the labels in Yolo’s text format, with one annotation file per image.
Training
First, we load the model:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
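As a quick sanity check, the hub model can be run directly on an image; the file name here is a placeholder:

# run inference on a placeholder image; the model accepts paths, URLs, or arrays
results = model('car_scratch.jpg')
results.print()                  # summary of the detections
results.save()                   # saves an annotated copy under runs/detect/
print(results.pandas().xyxy[0])  # detections as a pandas DataFrame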
We added the YAML file and arranged the data as Yolo requires (images in one folder, annotations as text files in another), then trained our model with a batch size of 16 and an image size of 320×320.
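Each annotation text file contains one line per box in the form class x_center y_center width height, with coordinates normalized to [0, 1]. The dataset YAML tells Yolo where the images live and what the classes are; here is a minimal sketch of how carScr_up.yaml could be generated in Colab, the paths being assumptions about the folder layout. The training command below then points at this file.

# write a minimal dataset config; adjust the paths to your own layout
yaml_text = """
train: ../carScr_up/images/train
val: ../carScr_up/images/val
nc: 1
names: ['scratch']
"""
with open('carScr_up.yaml', 'w') as f:
    f.write(yaml_text)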
!cd yolov5 && python train.py --img 320 --batch 16 --epochs 50 --data carScr_up.yaml --weights last.pt
Though the Yolo documentation recommends running for 300 epochs to get good results, we brought it down to 50 epochs, and after hyperparameter tuning our model started doing pretty well even within 30 epochs.
For hyperparameter tuning, we use the evolve option provided by Yolo, wherein the model is trained for 10 epochs across 300 evolutions of the hyperparameters.
!cd yolov5 && python train.py --img 320 --batch 32 --epochs 10 --data carScr_up.yaml --weights yolov5s.pt --cache --evolve
Results
The results across experiments progressed as follows:
| Exp | Precision | Recall | mAP@0.5 |
| --- | --------- | ------ | ------- |
| 1   | 0.003     | 0.511  | 0.001   |
| 2   | 0.659     | 0.311  | 0.363   |
| 3   | 0.624     | 0.536  | 0.512   |
| 4   | 0.572     | 0.610  | 0.519   |
The image below shows experiment 4; each experiment was trained on a different number of images and annotations. The predictions for cars with scratches are as follows:
The precision and recall in this case are modest because, with Yolo, we are dealing with bounding boxes, and these metrics depend upon the Intersection over Union (IoU) of the actual and predicted boxes.
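To make that dependence concrete, here is a minimal sketch of computing IoU between two boxes in (x_min, y_min, x_max, y_max) format:

def iou(box_a, box_b):
    # boxes are (x_min, y_min, x_max, y_max)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # the intersection is zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# a prediction shifted off the ground truth scores a low IoU
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.14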
Let’s look at the metrics obtained after training our dataset with Yolov5 for 50 epochs.
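Yolov5 logs its training metrics inside the run directory; here is a small sketch of plotting the mAP curve, assuming a Yolov5 version that writes results.csv and the default run path:

import pandas as pd
from matplotlib import pyplot as plt

# adjust the run path to your own experiment
df = pd.read_csv('yolov5/runs/train/exp/results.csv')
df.columns = df.columns.str.strip()  # the column names carry leading spaces

plt.plot(df['epoch'], df['metrics/mAP_0.5'], label='mAP@0.5')
plt.xlabel('epoch')
plt.ylabel('mAP@0.5')
plt.legend()
plt.show()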
We can see that progress stagnates after about 20 epochs; Yolo learns the relationship quickly and generalizes well to our problem statement, even though we had fewer than 1,000 images.
We can see that both Yolov5 and Mask RCNN suit our problem statement, though I could not run the latter to completion. With custom training, Yolov5, metrics aside, predicts extremely well, detecting all the scratches and damage in a sample image. Thus we end with a pretty good model, and along the way we learned how to collect, annotate, and train on custom data, and what it takes to train different models.
PS: The model doesn’t work well on cars with no damage, since we trained it only on data containing cars with scratches and damage. We can definitely generalize it to suit our needs, for example by adding undamaged cars as negative examples.
Alternatively, we can follow the link to the research paper mentioned below, where the images are divided into 3×3 grids and the tiles are used as training data. This increases the ratio of scratch area to image area, helping the model generalize to the dataset and improving our metrics.
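Here is a minimal sketch of such 3×3 tiling with OpenCV, using a placeholder file name; the paper’s exact pipeline may differ, and the bounding-box annotations would also have to be remapped into each tile’s coordinates:

import cv2

# split one image into a 3x3 grid of tiles (placeholder file name)
image = cv2.imread('car.jpg')
h, w = image.shape[:2]
th, tw = h // 3, w // 3

for row in range(3):
    for col in range(3):
        tile = image[row * th:(row + 1) * th, col * tw:(col + 1) * tw]
        cv2.imwrite(f'tile_{row}_{col}.jpg', tile)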