Fast food classification has become an important task in automated food delivery systems. With the growth of fast food chains and the need for accurate, efficient food recognition, machine learning approaches have become increasingly popular. In this blog, we will explore the use of transfer learning for fast food classification using PyTorch. Transfer learning is a technique that leverages pre-trained models to solve new tasks with limited data.
We will discuss how to fine-tune a pre-trained model for fast food classification and the results obtained from this approach.
This article was published as a part of the Data Science Blogathon.
Transfer learning is a technique that utilizes the pre-trained weights of a deep learning model to perform a new task with limited data. In the context of ResNet18 (which I will use in this project), transfer learning involves taking a ResNet18 model pre-trained on a large dataset and fine-tuning its weights for a specific fast food classification task. This approach leverages the knowledge the pre-trained model has already learned to solve the new task with less data and fewer computational resources. The fine-tuning process typically involves retraining the final layers of the ResNet18 model to adapt it to the new task. Below is the ResNet18 model diagram.
Source: ResearchGate
As the diagram shows, the model consists of 17 convolutional layers (mostly with 3×3 filters) and one fully connected layer, with a softmax at the end for multi-class image classification.
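Those final layers are what we retrain. If you want pure feature extraction instead (training only the new classification head while keeping the backbone frozen), a minimal sketch looks like this; note that in this project we will fine-tune all parameters rather than freeze any:
import torch.nn as nn
from torchvision import models

# Load ResNet18 with its pre-trained ImageNet weights
model = models.resnet18(pretrained=True)

# Freeze the backbone so only the new head receives gradient updates
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a fresh 10-class head;
# parameters of newly constructed modules have requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, 10)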
The dataset, Fast Food Classification V2, is hosted on Kaggle. It contains 10 categories of fast food images.
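The ImageFolder loader used later expects one sub-folder per class inside each split. Based on the PATH used in this project, the directory layout should look roughly like this (class folders abbreviated):
Fast Food Classification V2/
├── Train/
│   ├── Baked Potato/
│   ├── Burger/
│   └── ... (8 more class folders)
└── Valid/
    ├── Baked Potato/
    ├── Burger/
    └── ... (8 more class folders)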
Step 1: Import all the necessary libraries
from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
Step 2: Setting the PATH to the dataset and the device
PATH = "../data/Fast Food Classification V2/"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# make sure my GPU is detected.
print(device)
Step 3: Data Augmentation and Normalization
Data augmentation is a crucial technique in deep learning for effectively increasing the size of the training dataset and preventing overfitting. It can improve the performance and robustness of models, especially in scenarios with limited data. The normalization values below are the standard ImageNet channel means and standard deviations, matching the statistics the pre-trained ResNet18 weights were trained with.
data_transforms = {
    'Train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'Valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
Step 4: Loading the dataset and creating DataLoader objects
image_datasets = {
    x: datasets.ImageFolder(os.path.join(PATH, x), data_transforms[x])
    for x in ['Train', 'Valid']
}
dataloaders = {
    x: torch.utils.data.DataLoader(image_datasets[x],
                                   batch_size=32,
                                   shuffle=True)
    for x in ['Train', 'Valid']
}
dataset_sizes = {x: len(image_datasets[x]) for x in ['Train', 'Valid']}
class_names = image_datasets['Train'].classes
print(class_names)
>>>
['Baked Potato',
'Burger',
'Crispy Chicken',
'Donut',
'Fries',
'Hot Dog',
'Pizza',
'Sandwich',
'Taco',
'Taquito']
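Before training, it is also worth checking how balanced the classes are; a heavily skewed distribution might call for class weighting. A quick sketch using the targets list that ImageFolder exposes:
from collections import Counter

# Count training images per class; ImageFolder stores every
# sample's class index in its .targets attribute
train_counts = Counter(image_datasets['Train'].targets)
for idx in sorted(train_counts):
    print(f'{class_names[idx]}: {train_counts[idx]} images')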
Let’s see some training data.
# create a function to show images
def imshow(inp, title=None):
    """Display a tensor image after undoing the normalization."""
    inp = inp.numpy().transpose((1, 2, 0))  # CHW -> HWC
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean  # un-normalize
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)

# Get a batch of training data
inputs, classes = next(iter(dataloaders['Train']))

# Make a grid from the batch and show it with its class labels
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
Step 5: Create a training function
The function takes the following inputs: the model to train, a loss criterion, an optimizer, a learning-rate scheduler, and the number of training epochs (num_epochs).
The function trains the model for num_epochs epochs, alternating between a training and a validation phase. In each epoch, the loss is computed with the criterion; during the training phase, gradients are computed with backward() and the parameters are updated with optimizer.step(). In the validation phase, the model's performance is evaluated without updating the parameters.
After each epoch, the performance metrics (loss and accuracy) are printed. The best model weights (those with the highest validation accuracy) are saved using copy.deepcopy(). At the end of training, the elapsed time and the best validation accuracy are printed, the best weights are loaded back with model.load_state_dict(), and the trained model is returned.
def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['Train', 'Valid']:
            if phase == 'Train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward: track history only in train
                with torch.set_grad_enabled(phase == 'Train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'Train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'Train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'Valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best Valid Acc: {best_acc:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
Step 6: Start training the model with pre-trained ResNet18 weights
model_1 = models.resnet18(pretrained=True)  # newer torchvision: weights=models.ResNet18_Weights.DEFAULT
num_ftrs = model_1.fc.in_features

# Replace the final layer: the size of each output sample is set to the
# number of classes, i.e. nn.Linear(num_ftrs, len(class_names)) = 10 here
model_1.fc = nn.Linear(num_ftrs, len(class_names))
model_1 = model_1.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_sgd = optim.SGD(model_1.parameters(), lr=0.001, momentum=0.9)
optimizer_adam = optim.Adam(model_1.parameters(), lr=0.001)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_adam, step_size=7, gamma=0.1)

# Train with the Adam optimizer (optimizer_sgd above is an alternative)
model_resnetft = train_model(model_1, criterion, optimizer_adam, exp_lr_scheduler,
                             num_epochs=15)
Output >>>
Epoch 0/14
----------
Train Loss: 1.3397 Acc: 0.5660
Valid Loss: 1.0503 Acc: 0.6691
... (output for epochs 1 through 13 omitted) ...
Epoch 14/14
----------
Train Loss: 0.4054 Acc: 0.8709
Valid Loss: 0.4723 Acc: 0.8600
Training complete in 27m 23s
Best Valid Acc: 0.867714
So, you can see that training takes nearly 28 minutes on an NVIDIA Tesla P100 GPU, and the best validation accuracy is 86.77%.
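Since train_model returns the model with the best weights already loaded, you may want to persist them to disk for later reuse. A minimal sketch (the filename is just an example):
# Save the best weights to disk
torch.save(model_resnetft.state_dict(), 'resnet18_fastfood.pth')

# Later, restore them into a freshly constructed model
model_restored = models.resnet18(pretrained=False)
model_restored.fc = nn.Linear(model_restored.fc.in_features, len(class_names))
model_restored.load_state_dict(torch.load('resnet18_fastfood.pth'))
model_restored = model_restored.to(device).eval()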
Step 7: Visualize some results
The code first sets the model to evaluation mode (model.eval()) and initializes a counter, images_so_far, to keep track of the number of images visualized. A figure is created using plt.figure().
The function then iterates over the validation data using enumerate(dataloaders['Valid']). For each batch, the input images and labels are moved to the specified device (using inputs.to(device) and labels.to(device)), and the model's predictions are computed using model(inputs). The predicted class for each image is obtained with _, preds = torch.max(outputs, 1).
For each input image, the code plots the image using imshow(inputs.cpu().data[j]) and sets the title to the predicted class. The counter images_so_far tracks how many images have been visualized, and once it reaches the requested number, the function returns.
Finally, the code restores the model to its original training mode using model.train(mode=was_training).
def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['Valid']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
    model.train(mode=was_training)

# Visualize the model's predictions
visualize_model(model_1)
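Beyond visualizing batches from the validation loader, you can classify a single new image by applying the same validation transforms and taking the argmax of the output. A sketch, with a hypothetical image path:
from PIL import Image

def predict_image(model, image_path):
    # Apply the 'Valid' transforms and add a batch dimension
    img = Image.open(image_path).convert('RGB')
    tensor = data_transforms['Valid'](img).unsqueeze(0).to(device)
    model.eval()
    with torch.no_grad():
        outputs = model(tensor)
        _, pred = torch.max(outputs, 1)
    return class_names[pred.item()]

# Hypothetical example path
print(predict_image(model_1, '../data/sample_burger.jpg'))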
This article has demonstrated the use of transfer learning for fast food classification using the ResNet18 architecture and PyTorch. The implementation showed how to fine-tune the pre-trained model on the food dataset and evaluate its performance on the validation set. The results showed that transfer learning can effectively leverage knowledge learned from a large-scale dataset to improve performance on the food classification task. Overall, transfer learning is a powerful tool for solving computer vision problems. Following are some key learnings from this project:
- Transfer learning lets you reuse the pre-trained weights of a model such as ResNet18 to solve a new task with limited data and compute.
- Data augmentation (random crops and horizontal flips) and ImageNet normalization help prevent overfitting and match the pre-trained weights.
- Fine-tuning involves replacing the final fully connected layer to match the number of classes and retraining the model on the new dataset.
- Tracking validation accuracy each epoch and keeping a deep copy of the best weights ensures you end up with the best-performing model.
I hope this article helps you in your learning quest. If you have any questions, comment below. The entire code is in my Kaggle notebook.
Connect with me on Twitter and LinkedIn.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.