Overview

Image augmentation is a super effective technique when we don’t have enough data to train our model
We can use image augmentation for deep learning in any setting – hackathons, industry projects, and so on
We’ll also build an image classification model using PyTorch to understand how image augmentation fits into the picture
Introduction
The trick to do well in deep learning hackathons (or frankly any data science hackathon) often comes down to feature engineering. How much creativity can you muster when you’re given data that simply isn’t enough to build a winning deep learning model?
I’m talking from my own experience of participating in multiple deep learning hackathons where we were given a dataset of a few hundred images – simply not enough to win or even finish in the top echelons of the leaderboard. So how can we deal with this problem?
The answer? Well, that lies deep in a data scientist’s skillset! This is where our curiosity and creativity come to the fore. That’s the idea behind feature engineering – how well we can come up with new features given existing ones. And the same idea applies when we’re working with image data.
And that’s where image augmentation plays a major role. This concept isn’t limited to hackathons – we use it in the industry and in real-world deep learning projects all the time!
I love how image augmentation helps spruce up my existing dataset with more data without having to put in time-consuming manual effort. And I’m sure you’re going to find this technique very helpful for your own projects.
So in this article, we will understand the concept of image augmentation, why it’s helpful, and what are the different image augmentation techniques. We’ll also implement these image augmentation techniques to build an image classification model using PyTorch.
This is the fifth article in the PyTorch for Beginners series that I have been working on. You can access the previous articles here.

In this article, we will cover:

Basic Guidelines for Selecting the Right Augmentation Technique
Case Study: Solving an Image Classification Problem using Image Augmentation
Why Do We Need Image Augmentation?
Deep learning models usually require a lot of data for training. In general, the more the data, the better the performance of the model. But acquiring massive amounts of data comes with its own challenges. Not everyone has the deep pockets of the big firms.
The problem with a lack of data is that our deep learning model might not learn the underlying pattern from it, and hence might not perform well on unseen data.
So what can we do in that case? Instead of spending days manually collecting data, we can make use of Image Augmentation techniques.
Image Augmentation is the process of generating new images for training our deep learning model. These new images are generated using the existing training images and hence we don’t have to collect them manually.
There are multiple image augmentation techniques and we will discuss some of the common and most widely used ones in the next section.
Different Image Augmentation Techniques
Image Rotation
Image rotation is one of the most commonly used augmentation techniques. It can help our model become robust to changes in the orientation of objects. Even if we rotate the image, the information in the image remains the same. A car is a car even if we see it from a different angle.
Hence, we can use this technique to increase the size of our data by creating rotated images from the original ones. Let’s see how we can rotate images:
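First, the imports. This is a minimal sketch of the setup, assuming the skimage, NumPy, and Matplotlib functions referenced throughout this section:

```python
# Functions for reading, transforming, and displaying images
from skimage.io import imread
from skimage.transform import rotate, AffineTransform, warp
from skimage.util import random_noise
from skimage.filters import gaussian
import numpy as np
import matplotlib.pyplot as plt
```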
I will be using this image to demonstrate different image augmentation techniques. You can try other images as well as per your requirement.
Let’s import the image and visualize it first:
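A short sketch; ‘emergency_vehicle.jpg’ is a placeholder filename, so substitute the path to your own image:

```python
# Load the image from disk and display it
image = imread('emergency_vehicle.jpg')  # placeholder path
plt.imshow(image)
plt.show()
```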
This is the original image. Let’s now see how we can rotate it. I will use the rotate function of the skimage library to rotate the image:
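Here’s a minimal sketch; the 45-degree angle is an illustrative choice, so feel free to try other angles:

```python
# Rotate the image by 45 degrees; 'wrap' fills the corners that fall
# outside the frame with pixels wrapped around from the image itself
rotated = rotate(image, angle=45, mode='wrap')
plt.imshow(rotated)
plt.show()
```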
Nice! Setting mode as ‘wrap’ fills the points outside the boundaries of the input with the remaining pixels of the image.
Shifting Images
There might be scenarios when the objects in the image are not perfectly centered. In these cases, shifting the images can help the model become invariant to the position of objects.
By shifting the images, we can change the position of the object in the image and hence give more variety to the model. This will eventually lead to a more generalized model.
Image shift is a geometric transformation that maps the position of every object in the image to a new location in the final output image.
After the shift operation, an object present at a location (x,y) in the input image is shifted to a new position (X, Y):
X = x + dx
Y = y + dy
Here, dx and dy are the respective shifts along different dimensions. Let’s see how we can apply shift to an image:
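A sketch using skimage’s AffineTransform and warp; the (25, 25) translation is just one possible choice:

```python
# Shift the image by 25 pixels along both axes using an affine transform
transform = AffineTransform(translation=(25, 25))
shifted = warp(image, transform, mode='wrap')
plt.imshow(shifted)
plt.show()
```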
The translation hyperparameter defines the number of pixels by which the image should be shifted. Here, I have shifted the image by (25, 25) pixels. You can play around with the values of this hyperparameter.
Again I have used the mode as ‘wrap’ which fills the points outside the boundaries of the input with the remaining pixels of the image. In the output above, you can see that both the height and width of the image have been shifted by 25 pixels.
Flipping Images
Flipping is an extension of rotation. It allows us to flip the image in the left-right as well as up-down direction. Let’s see how we can implement flipping:
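A minimal sketch with NumPy:

```python
# Flip the image horizontally (left-right)
flipped_lr = np.fliplr(image)
plt.imshow(flipped_lr)
plt.show()
```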
Here, I have used the fliplr function of NumPy to flip the image from left to right. It flips the pixel values of each row and the output confirms the same. Similarly, we can flip the images in an up-down direction:
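A sketch mirroring the previous snippet:

```python
# Flip the image vertically (up-down)
flipped_ud = np.flipud(image)
plt.imshow(flipped_ud)
plt.show()
```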
This is how we can flip the image and make more generalized models that will learn the patterns of the original as well as the flipped images. Adding random noise to the images is also an image augmentation technique. Let’s understand it using an example.
Adding Noise to Images
Image noising is an important augmentation step that allows our model to learn how to separate signal from noise in an image. This also makes the model more robust to changes in the input.
We will use the random_noise function of the skimage library to add some random noise to our original image.
I will take the standard deviation of the noise to be added as 0.155 (you can change this value as well). Just keep in mind that increasing this value will add more noise to the image and vice versa:
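A sketch; note that random_noise takes the variance, so we square the standard deviation:

```python
# Add Gaussian noise with standard deviation 0.155
sigma = 0.155
noisy = random_noise(image, var=sigma ** 2)
plt.imshow(noisy)
plt.show()
```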
We can see that random noise has been added to our original image. Play around with the standard deviation value and see the different results you get.
Blurring Images
All photography lovers will instantly understand this idea.
Images come from different sources. And hence, the quality of the images will not be the same from each source. Some images might be of very high quality while others might be just plain bad.
In such scenarios, we can blur the images ourselves. How will that help? Well, training on blurred images makes our deep learning model more robust to variations in image quality.
Let’s see how we can do that. We will use a Gaussian filter for blurring the image:
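A minimal sketch with skimage’s gaussian filter:

```python
# Blur the image with a Gaussian filter; sigma controls the strength.
# multichannel=True filters each color channel separately
# (newer skimage versions use channel_axis=-1 instead)
blurred = gaussian(image, sigma=1, multichannel=True)
plt.imshow(blurred)
plt.show()
```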
Sigma here is the standard deviation of the Gaussian filter; I have taken it as 1. The higher the sigma value, the stronger the blurring effect. Setting multichannel to True ensures that each channel of the image is filtered separately.
Again, you can try different sigma values to change the magnitude of blurriness.
These are some of the image augmentation techniques which help to make our deep learning model robust and generalizable. This also helps increase the size of the training set.
We’re almost at the implementation part of this tutorial. Before that, let’s look at some of the basic guidelines for deciding the right image augmentation technique.
Basic Guidelines for Selecting the Right Augmentation Technique
There are a few guidelines that I think are important while deciding the augmentation technique based on the problem that you are trying to solve. Here is a brief summary of these guidelines:
The first step in any model building process is to make sure that the size of our input matches what is expected by the model. We also have to make sure that all the images are of a similar size; for this, we can resize them appropriately.
Let’s say you are working on a classification problem and have a relatively small number of data samples. In such scenarios, you can use different augmentation techniques like image rotation, image noising, flipping, shifting, etc. Remember that all these operations are applicable to classification problems, where the location of objects in the image does not matter.
If you are working on an object detection task, where the location of objects is precisely what we want to predict, these techniques might not be appropriate, since they change the position of objects without updating the corresponding annotations.
Normalizing image pixel values is always a good strategy to ensure better and faster convergence of the model. If there are some specific requirements of the model, we must pre-process the images as per the model’s requirement.
Now, without further ado, let’s move on to the model building part. We will apply the augmentation techniques discussed in this article to generate images and then use those images to train the model.
Case Study: Solving an Image Classification Problem and Applying Image Augmentation
We will be working on the emergency vs non-emergency vehicle classification problem. You should be familiar with the problem statement if you’ve gone through my previous PyTorch articles.
The aim of this project is to classify the images of vehicles as emergency or non-emergency. And you guessed it – it’s an image classification problem. You can download the dataset from here.
Loading the dataset
Let’s begin! We’ll start by loading the data into our notebook. Then, we’ll apply image augmentation techniques and finally, build a convolutional neural network (CNN) model.
Let’s import the required libraries:
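A plausible set of imports for this case study (your setup may vary slightly):

```python
# Data handling and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reading and augmenting images
from skimage.io import imread
from skimage.transform import rotate
from skimage.util import random_noise

# Creating the validation set and showing a progress bar
from sklearn.model_selection import train_test_split
from tqdm import tqdm
```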
Now, we will read the CSV file that contains the names of images and their corresponding labels:
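A sketch; the CSV filename is an assumption based on the dataset download, so adjust the path to your local copy:

```python
# Read the CSV that maps image names to labels
data = pd.read_csv('emergency_train.csv')  # placeholder path
print(data.head())
```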
0 here represents that the vehicle is non-emergency and 1 means it’s an emergency vehicle. Let’s now import all the images from our dataset:
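A sketch of the loading loop; the column names 'image_names' and 'emergency_or_not' and the images folder are assumptions about the dataset’s layout:

```python
# Load every image listed in the CSV and scale pixel values to [0, 1]
train_img = []
for img_name in tqdm(data['image_names']):
    img = imread('images/' + img_name)   # placeholder folder
    img = img / 255.0                    # normalize pixel values
    train_img.append(img.astype('float32'))

train_x = np.array(train_img)
train_y = data['emergency_or_not'].values
print(train_x.shape)
```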
We have a total of 1,646 images in the dataset. Let’s split this data into training and validation sets. We will use the validation set to evaluate how well the model performs on unseen data:
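A sketch using scikit-learn’s train_test_split (the random_state is an arbitrary choice for reproducibility):

```python
# Hold out 10% of the images as a validation set
train_x, val_x, train_y, val_y = train_test_split(
    train_x, train_y, test_size=0.1, random_state=13)

print(train_x.shape, val_x.shape)
```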
I have kept the test_size as 0.1 and hence 10% of the data will be randomly selected as the validation set, while the remaining 90% will be used to train the model. That leaves 1,481 images in the training set, which is quite small for training a deep learning model.
So next, we will augment these training images to increase the training set and possibly improve our model’s performance.
Augmenting the Images
We will be using the image augmentation techniques we discussed earlier:
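A sketch of the augmentation loop; the rotation angle and noise level mirror the choices from the demos above:

```python
# For every training image, keep the original and add four augmented
# versions: rotated, flipped left-right, flipped up-down, and noisy
final_train_data = []
final_target_train = []
for i in tqdm(range(train_x.shape[0])):
    final_train_data.append(train_x[i])
    final_train_data.append(rotate(train_x[i], angle=45, mode='wrap').astype('float32'))
    final_train_data.append(np.fliplr(train_x[i]))
    final_train_data.append(np.flipud(train_x[i]))
    final_train_data.append(random_noise(train_x[i], var=0.155 ** 2).astype('float32'))
    # One label per version (original + 4 augmentations)
    for _ in range(5):
        final_target_train.append(train_y[i])
```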
We have generated 4 augmented images for each of the 1,481 images in the training set. Let’s convert the images in the form of an array and verify the size of our dataset:
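A sketch (note that this step is memory-hungry, since everything is stacked into one array):

```python
# 1,481 originals x 5 versions each = 7,405 training images
final_train = np.array(final_train_data)
final_target_train = np.array(final_target_train)
print(final_train.shape, final_target_train.shape)
```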
This confirms that we have augmented the images and increased the size of our training set. Let’s visualize these augmented images:
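A sketch that plots the first image and its four augmented versions side by side:

```python
# The five versions of the first image sit at indices 0-4
fig, ax = plt.subplots(nrows=1, ncols=5, figsize=(20, 20))
for i in range(5):
    ax[i].imshow(final_train[i])
    ax[i].axis('off')
plt.show()
```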
The first image here is the original image from the dataset. The remaining four images are generated using different image augmentation techniques – rotation, left-to-right flip, up-down flip and adding random noise respectively.
Our dataset is now ready. It’s time to define the architecture of our deep learning model and then train it on the augmented training set. Let’s first import all the functions from PyTorch:
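A minimal sketch of the PyTorch imports used below:

```python
# Core PyTorch plus the layers used to define the CNN
import torch
from torch.nn import (Module, Sequential, Conv2d, MaxPool2d, Linear,
                      ReLU, BatchNorm2d, Dropout, CrossEntropyLoss)
from torch.optim import Adam
```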
We’ll have to convert both the training and validation sets into PyTorch format:
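A sketch; it assumes all images share the same height and width (resize them first if they don’t):

```python
# PyTorch expects channels-first tensors of shape (N, 3, H, W)
train_x_torch = torch.from_numpy(final_train).permute(0, 3, 1, 2).float()
train_y_torch = torch.from_numpy(final_target_train).long()

val_x_torch = torch.from_numpy(val_x).permute(0, 3, 1, 2).float()
val_y_torch = torch.from_numpy(val_y).long()
```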
Next, we will define the architecture of the model. This is a bit complex since the architecture has 4 convolutional blocks and then 4 fully connected dense layers:
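Here is a sketch of one plausible architecture along these lines; the channel widths, layer sizes, and the 224x224 input assumption are illustrative choices, not the only workable ones:

```python
class Net(Module):
    def __init__(self):
        super(Net, self).__init__()
        # Four convolutional blocks: Conv -> BatchNorm -> ReLU -> MaxPool
        self.cnn_layers = Sequential(
            Conv2d(3, 32, kernel_size=3, padding=1),
            BatchNorm2d(32), ReLU(), MaxPool2d(kernel_size=2),
            Conv2d(32, 64, kernel_size=3, padding=1),
            BatchNorm2d(64), ReLU(), MaxPool2d(kernel_size=2),
            Conv2d(64, 128, kernel_size=3, padding=1),
            BatchNorm2d(128), ReLU(), MaxPool2d(kernel_size=2),
            Conv2d(128, 128, kernel_size=3, padding=1),
            BatchNorm2d(128), ReLU(), MaxPool2d(kernel_size=2),
        )
        # Four fully connected layers; 128 * 14 * 14 assumes 224x224 inputs
        self.linear_layers = Sequential(
            Linear(128 * 14 * 14, 512), ReLU(), Dropout(),
            Linear(512, 256), ReLU(), Dropout(),
            Linear(256, 64), ReLU(), Dropout(),
            Linear(64, 2),
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)   # flatten before the dense layers
        return self.linear_layers(x)

model = Net()
print(model)
```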
Let’s define the other hyperparameters of the model as well, including the optimizer, learning rate, and the loss function:
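A sketch; the learning rate is a tunable choice rather than a fixed requirement:

```python
# Adam optimizer and cross-entropy loss for the two-class problem
optimizer = Adam(model.parameters(), lr=0.0001)
criterion = CrossEntropyLoss()

# Use a GPU if one is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
```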
Let’s train our deep learning model for 20 epochs:
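A sketch of a batched training loop; the batch size of 128 is an assumption, since the full augmented set is too large to push through the model in one go:

```python
torch.manual_seed(0)
batch_size = 128
n_epochs = 20

for epoch in range(n_epochs):
    model.train()
    # Shuffle the training set each epoch
    permutation = torch.randperm(train_x_torch.size(0))
    epoch_losses = []
    for i in range(0, train_x_torch.size(0), batch_size):
        idx = permutation[i:i + batch_size]
        batch_x = train_x_torch[idx].to(device)
        batch_y = train_y_torch[idx].to(device)

        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
        epoch_losses.append(loss.item())
    print('epoch: {} \t training loss: {:.4f}'.format(epoch + 1, np.mean(epoch_losses)))
```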
This is a summary of the training phase. You’ll notice that the training loss decreases as we increase the epochs. Let’s save the weights of the trained model so we can use them in the future without retraining the model:
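A one-liner (the filename is arbitrary):

```python
# Persist the trained weights to disk
torch.save(model.state_dict(), 'model.pt')
```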
If you do not wish to train the model at your end, you can download the weights of the model which I trained for 20 epochs using this link.
Next, let’s load this model:
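A sketch that restores the weights into a fresh instance of the same architecture:

```python
model = Net()
model.load_state_dict(torch.load('model.pt'))
model = model.to(device)
model.eval()   # switch off dropout and batch-norm updates for inference
```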
Finally, let’s make predictions for the training and validation set and check the respective accuracy:
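A sketch that predicts in batches to stay within memory:

```python
from sklearn.metrics import accuracy_score

# Predict on the augmented training set
preds = []
with torch.no_grad():
    for i in range(0, train_x_torch.size(0), batch_size):
        output = model(train_x_torch[i:i + batch_size].to(device))
        preds.extend(torch.argmax(output, dim=1).cpu().numpy())

print('training accuracy:', accuracy_score(train_y_torch.numpy(), preds))
```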
We got an accuracy of more than 91% on the training set! That’s quite promising. But let’s wait before we celebrate. We need to check the same for the validation set:
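The same sketch, applied to the validation set:

```python
# Predict on the validation set
preds = []
with torch.no_grad():
    for i in range(0, val_x_torch.size(0), batch_size):
        output = model(val_x_torch[i:i + batch_size].to(device))
        preds.extend(torch.argmax(output, dim=1).cpu().numpy())

print('validation accuracy:', accuracy_score(val_y_torch.numpy(), preds))
```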
The validation accuracy is around 78%. That’s quite good!
End Notes
This is how we can use image augmentation techniques when we are given less training data to begin with.
In this article, we covered most of the commonly used image augmentation techniques. We learned how to rotate, shift, and flip images. We also learned how we can add random noise to images or blur them. Then we discussed basic guidelines for selecting the right augmentation technique.
You can try these image augmentation techniques on any image classification problem and then compare the performance with and without augmentation. Feel free to share your results in the comments section below.
And if you’re new to deep learning, computer vision and image data in general, I suggest going through the below course.
Comments

Silvia: Amazing and very interesting article. Thank you.

Pulkit: Thank you Silvia!

Anand Mishra: Interesting and insightful article Pulkit :)

Pulkit: Thank you Anand!!

purna: Nice article. I got an error while doing `final_train = np.array(final_train_data)`, saying "Memory Error". My Python is running on 64-bit with 16 GB of RAM. Do you have any idea or solution for this?

Pulkit: Hi Purna, try switching to Google Colab. The error that you are getting is due to insufficient available memory.