This project focuses on car scratch detection, in line with the broader development of autonomous quality-inspection systems for different types of products. In a parking lot, for example, such detection assures clients that their car will remain safe and sound; and if something does happen, the detection system helps handle the situation properly.
Further, the techniques learned in this project can be applied to other projects or combined with related problems, such as quality assurance and second-hand car valuation. I have tackled this as a single-class problem that treats dents, damage, and scratches all as "scratch", and, with the help of Flask, built a basic app. I will walk you through all the thoughts, code, algorithms, and knowledge I gained while doing this project, which I implemented via Mask RCNN and Yolov5.
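The Flask app itself is not the focus of this article, but here is a minimal sketch of how such an app could wrap the trained detector; the route, the file handling, and the weights path are illustrative assumptions, not the project's actual code:

from flask import Flask, request, jsonify
import torch

app = Flask(__name__)

# illustrative: load custom-trained Yolov5 weights (placeholder path)
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

@app.route('/detect', methods=['POST'])
def detect():
    # expects an image uploaded under the 'image' form field
    file = request.files['image']
    file.save('upload.jpg')
    results = model('upload.jpg')
    # return the detections (boxes, confidences, classes) as JSON
    return jsonify(results.pandas().xyxy[0].to_dict(orient='records'))

if __name__ == '__main__':
    app.run()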
This is the end result of the model.
Learning Objective

In this article, we will learn how to collect and annotate a custom image dataset, frame scratch, dent, and damage detection as a single-class problem, train Mask RCNN and Yolov5 on it, and evaluate the results.
To collect data, I built a scraper that uses Beautiful Soup to pull images from stock-photo websites such as Adobe Stock and iStock.
import requests
from bs4 import BeautifulSoup

url = 'https://stock.adobe.com/in/search/images?k=car%20scratch'
# make a request to the url
r = requests.get(url)
# create our soup
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.title.text)
images = soup.find_all('img')
for image in images:
    name = image['alt']
    link = image['src']
    # save each image, using its alt text as the file name
    with open(name.replace(' ', '-').replace('/', '') + '.jpg', 'wb') as f:
        im = requests.get(link)
        f.write(im.content)
But this didn’t work well: most of the images could not be scraped because of the websites’ policies against scraping. So I went ahead and downloaded the images directly from iStock, Shutterstock, and Adobe Stock.
I started with around 80 images, grew the set to 350, and finally reached around 900 images for the final annotations.
Image segmentation is the partitioning of an image into different regions at the pixel level. Mask RCNN is a model for instance segmentation, a sub-type of image segmentation that separates individual object instances within their boundaries. It builds on Faster RCNN: while Faster RCNN produces two outputs for each object, a class label and a bounding-box offset, Mask RCNN adds a third output, the mask of the object.
The architecture of Mask RCNN consists of a backbone network (such as ResNet50 with a Feature Pyramid Network) for feature extraction, a Region Proposal Network that proposes candidate object regions, an RoIAlign layer that extracts a fixed-size feature map for each proposal, and parallel heads for classification, bounding-box regression, and mask prediction.
The advantage of using Mask RCNN to detect scratches on cars is that we can work with polygons and not just bounding boxes, creating a mask over our target, which further enables us to obtain and visualize the result in a more accurate and succinct way.
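Before moving to the training code, here is a minimal inference sketch, separate from this project's pipeline, showing the extra 'masks' output that torchvision's pre-trained Mask R-CNN returns; the image path is a placeholder:

import cv2
import torch
import torchvision

# load a Mask R-CNN pre-trained on COCO and switch to inference mode
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# read a placeholder image and convert it to a CHW float tensor in [0, 1]
image = cv2.cvtColor(cv2.imread('car.jpg'), cv2.COLOR_BGR2RGB)
tensor = torch.from_numpy(image / 255.0).permute(2, 0, 1).float()

with torch.no_grad():
    output = model([tensor])[0]

# besides 'boxes', 'labels' and 'scores', Mask R-CNN returns per-instance masks
print(output['boxes'].shape)  # (N, 4) bounding boxes
print(output['masks'].shape)  # (N, 1, H, W), one soft mask per detected instance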
Let’s start implementing Mask RCNN for our problem.
Importing all the libraries required to implement our Mask RCNN algorithm.
# importing libraries
import pandas as pd
import numpy as np
import cv2
import os
import re
from PIL import Image

import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.sampler import SequentialSampler

from matplotlib import pyplot as plt
The data used here is in .csv format, with the x, y, w, and h coordinates of each bounding box; the annotations were created with Make Sense, a free online data annotation tool.
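For context, here is a minimal sketch of how such a CSV could be loaded into the train_df used below; the file name is a placeholder, and the column names are assumed from the dataset class that follows:

import pandas as pd

# placeholder path; one row per bounding box, keyed by image_id
train_df = pd.read_csv('scratch_annotations.csv')
print(train_df.columns.tolist())  # expected: ['image_id', 'x', 'y', 'w', 'h']
print(train_df.head())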
image_ids = train_df['image_id'].unique()
print(len(image_ids))
valid_ids = image_ids[-10:]
train_ids = image_ids[:-10]

# valid and train df
valid_df = train_df[train_df['image_id'].isin(valid_ids)]
train_df = train_df[train_df['image_id'].isin(train_ids)]
Next, we create our ScratchDataset class, which transforms our data and returns the image, its target dictionary, and the image id.
class ScratchDataset(Dataset):

    def __init__(self, dataframe, image_dir, transforms=None):
        super().__init__()
        self.image_ids = dataframe['image_id'].unique()
        self.df = dataframe
        self.image_dir = image_dir
        self.transforms = transforms

    def __getitem__(self, index: int):
        image_id = self.image_ids[index]
        records = self.df[self.df['image_id'] == image_id]

        image = cv2.imread(f'{self.image_dir}/{image_id}.jpg', cv2.IMREAD_COLOR)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
        image /= 255.0

        # convert boxes from (x, y, w, h) to (x_min, y_min, x_max, y_max)
        boxes = records[['x', 'y', 'w', 'h']].values
        boxes[:, 2] = boxes[:, 0] + boxes[:, 2]
        boxes[:, 3] = boxes[:, 1] + boxes[:, 3]

        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        area = torch.as_tensor(area, dtype=torch.float32)

        # there is only one class
        labels = torch.ones((records.shape[0],), dtype=torch.int64)

        # suppose all instances are not crowd
        iscrowd = torch.zeros((records.shape[0],), dtype=torch.int64)

        target = {}
        target['boxes'] = boxes
        target['labels'] = labels
        target['image_id'] = torch.tensor([index])
        target['area'] = area
        target['iscrowd'] = iscrowd

        if self.transforms:
            sample = {
                'image': image,
                'bboxes': target['boxes'],
                'labels': labels
            }
            sample = self.transforms(**sample)
            image = sample['image']
            target['boxes'] = torch.tensor(sample['bboxes'])

        return image, target, image_id

    def __len__(self) -> int:
        return self.image_ids.shape[0]
Here ‘image_dir’ is the path to the directory where the images are saved.
Here we are using Albumentations for data augmentation.
# Albumentations
def get_train_transform():
    return A.Compose([
        A.Flip(0.5),
        ToTensorV2(p=1.0)
    ], bbox_params={'format': 'pascal_voc', 'label_fields': ['labels']})

def get_valid_transform():
    return A.Compose([
        ToTensorV2(p=1.0)
    ], bbox_params={'format': 'pascal_voc', 'label_fields': ['labels']})
We are going to use a ResNet50 backbone (with a Feature Pyramid Network) pre-trained on COCO. Note that because our annotations are bounding boxes rather than pixel masks, the code below loads the Faster RCNN detection head.
# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

num_classes = 2  # 1 class (scratch) + background

# get the number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
Let’s move on to creating an Averager class and the training and validation data loaders, which are going to be the key components while training our model.
class Averager:
    def __init__(self):
        self.current_total = 0.0
        self.iterations = 0.0

    def send(self, value):
        self.current_total += value
        self.iterations += 1

    @property
    def value(self):
        if self.iterations == 0:
            return 0
        else:
            return 1.0 * self.current_total / self.iterations

    def reset(self):
        self.current_total = 0.0
        self.iterations = 0.0
def collate_fn(batch):
    return tuple(zip(*batch))
train_dataset = ScratchDataset(train_df, DIR_TRAIN, get_train_transform())
valid_dataset = ScratchDataset(valid_df, DIR_TRAIN, get_valid_transform())

# split the dataset in train and test set
indices = torch.randperm(len(train_dataset)).tolist()

train_data_loader = DataLoader(
    train_dataset,
    batch_size=16,
    shuffle=False,
    num_workers=4,
    collate_fn=collate_fn
)

valid_data_loader = DataLoader(
    valid_dataset,
    batch_size=8,
    shuffle=False,
    num_workers=4,
    collate_fn=collate_fn
)
We activate ‘cuda’ and use the GPU if it is available to us. We train with weight_decay=0.0005, momentum=0.9, and a learning rate of 0.005; a StepLR scheduler is left commented out in case you want the learning rate to decay during training.
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

images, targets, image_ids = next(iter(train_data_loader))

model.to(device)
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
lr_scheduler = None

num_epochs = 2
loss_hist = Averager()
itr = 1

for epoch in range(num_epochs):
    loss_hist.reset()

    for images, targets, image_ids in train_data_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        loss_hist.send(loss_value)

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        if itr % 50 == 0:
            print(f'Iteration #{itr} loss: {loss_value}')
        itr += 1

    # update the learning rate
    if lr_scheduler is not None:
        lr_scheduler.step()

    print(f'Epoch #{epoch} loss: {loss_hist.value}')
But I could not take this training to completion: it took more than 10 hours for a meager 80 images. The time cost of custom-training Mask RCNN is huge, and it demands far more computing power than I had available. If you have a good computing machine, you should be able to implement it.
Yolo, released by Ultralytics [GitHub], is primarily used for object detection and has become a benchmark algorithm for detection tasks in visual data. Yolov5 is faster and more efficient than Yolov4, and it generalizes well to new images.
The algorithm works by dividing the input image into a grid, having each grid cell predict bounding boxes, objectness scores, and class probabilities in a single forward pass, and then filtering overlapping predictions with non-maximum suppression.
Yolov5 is faster, smaller, and roughly as accurate as previous versions. Trained on the COCO dataset, it works well with bounding boxes.
Let’s start with the implementation of Yolov5 for our problem case; I used Google Colab to run the code.
Data Annotation

For Yolov5, the images were annotated again (with Make Sense), this time exporting the labels in Yolo’s text format, with one annotation file per image.
Training
First, we load the model:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
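As a quick sanity check, the hub model can be run directly on an image; the file name here is a placeholder:

# run inference on a placeholder image; the model accepts paths, URLs, or arrays
results = model('car_scratch.jpg')
results.print()                  # summary of the detections
results.save()                   # saves an annotated copy under runs/detect/
print(results.pandas().xyxy[0])  # detections as a pandas DataFrame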
We added the YAML file and arranged the data as Yolo requires (images in one folder, annotations as text files in another), then trained our model with a batch size of 16 and an image size of 320×320.
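Each annotation text file contains one line per box in the form class x_center y_center width height, with coordinates normalized to [0, 1]. The dataset YAML tells Yolo where the images live and what the classes are; here is a minimal sketch of how carScr_up.yaml could be generated in Colab, the paths being assumptions about the folder layout. The training command below then points at this file.

# write a minimal dataset config; adjust the paths to your own layout
yaml_text = """
train: ../carScr_up/images/train
val: ../carScr_up/images/val
nc: 1
names: ['scratch']
"""
with open('carScr_up.yaml', 'w') as f:
    f.write(yaml_text)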
!cd yolov5 && python train.py --img 320 --batch 16 --epochs 50 --data carScr_up.yaml --weights last.pt
Though the Yolo documentation recommends running for 300 epochs to get good results, we brought it down to 50 epochs, and after hyperparameter tuning our model started doing pretty well even within 30 epochs.
For hyperparameter tuning, we use the evolve option provided by Yolo, wherein the model is trained for 10 epochs across 300 evolutions of the hyperparameters.
!cd yolov5 && python train.py --img 320 --batch 32 --epochs 10 --data carScr_up.yaml --weights yolov5s.pt --cache --evolve
Results
The results across experiments progressed as follows:
| Exp | Precision | Recall | mAP@0.5 |
| --- | --------- | ------ | ------- |
| 1   | 0.003     | 0.511  | 0.001   |
| 2   | 0.659     | 0.311  | 0.363   |
| 3   | 0.624     | 0.536  | 0.512   |
| 4   | 0.572     | 0.610  | 0.519   |
The image below shows experiment 4; each experiment was trained on a different number of images and annotations. The predictions for cars with scratches are as follows:
The precision and recall in this case are modest because, with Yolo, we are dealing with bounding boxes, and these metrics depend upon the Intersection over Union (IoU) of the actual and predicted boxes.
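To make that dependence concrete, here is a minimal sketch of computing IoU between two boxes in (x_min, y_min, x_max, y_max) format:

def iou(box_a, box_b):
    # boxes are (x_min, y_min, x_max, y_max)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # the intersection is zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# a prediction shifted off the ground truth scores a low IoU
print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # ~0.14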
Let’s look at the metrics obtained after training our dataset with Yolov5 for 50 epochs.
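Yolov5 logs its training metrics inside the run directory; here is a small sketch of plotting the mAP curve, assuming a Yolov5 version that writes results.csv and the default run path:

import pandas as pd
from matplotlib import pyplot as plt

# adjust the run path to your own experiment
df = pd.read_csv('yolov5/runs/train/exp/results.csv')
df.columns = df.columns.str.strip()  # the column names carry leading spaces

plt.plot(df['epoch'], df['metrics/mAP_0.5'], label='mAP@0.5')
plt.xlabel('epoch')
plt.ylabel('mAP@0.5')
plt.legend()
plt.show()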
We can see that progress stagnates after about 20 epochs; Yolo learns the relationship quickly and generalizes well to our problem statement, even though we had fewer than 1,000 images.
We can see that both Yolov5 and Mask RCNN suit our problem statement, though I could not run the latter to completion. With custom training, Yolov5, metrics aside, predicts extremely well, detecting all the scratches and damage in a sample image. Thus we end with a pretty good model, and along the way we learned how to collect, annotate, and train on custom data, and what it takes to train different models.
PS: The model doesn’t work well on cars with no damage, since we trained it only on data containing cars with scratches and damage. We can definitely generalize it to suit our needs, for example by adding undamaged cars as negative examples.
Alternatively, we can follow the link to the research paper mentioned below, where the images are divided into 3×3 grids and the tiles are used as training data. This increases the ratio of scratch area to image area, helping the model generalize to the dataset and improving our metrics.
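Here is a minimal sketch of such 3×3 tiling with OpenCV, using a placeholder file name; the paper’s exact pipeline may differ, and the bounding-box annotations would also have to be remapped into each tile’s coordinates:

import cv2

# split one image into a 3x3 grid of tiles (placeholder file name)
image = cv2.imread('car.jpg')
h, w = image.shape[:2]
th, tw = h // 3, w // 3

for row in range(3):
    for col in range(3):
        tile = image[row * th:(row + 1) * th, col * tw:(col + 1) * tw]
        cv2.imwrite(f'tile_{row}_{col}.jpg', tile)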