In deep learning, where training data is often scarce, data augmentation has become very important. We use methods like rotating or flipping images to help our models learn better. But as datasets become more complex, these simple transformations are not always enough. That’s where advanced data augmentation steps in, offering new and effective methods that help models learn from complex datasets.
New methods like CutMix, Mixup, and Cutout dynamically create augmented samples, providing a simpler way to handle complex datasets. They make deep learning models even smarter and address the limitations of static data augmentation. This blog explores data augmentation in deep learning: its importance, techniques, and practical implications.
This article was published as a part of the Data Science Blogathon.
In deep learning, where computers learn to do smart things, we often face a challenge: sometimes we do not have many examples for them to learn from. That’s where the idea of data augmentation comes in. Think of it like this: if we want a computer to identify cats, we can show it lots of pictures of cats. But what if we don’t have a mountain of cat photos? Here’s where data augmentation steps up. We take the pictures we have and give them a little twist, for example by flipping them, rotating them, or zooming in a bit. Each variation is a new learning moment for the model, helping it get better at recognizing things even with relatively few examples.
Now let’s talk about something even better: advanced data augmentation. It is like an upgrade for our deep learning models. Instead of applying the same fixed transformations every time, advanced methods such as Cutout, Mixup, and CutMix generate new training samples on the fly, for example by erasing image regions or by combining pairs of images and their labels. In simple words, it is a smarter way to teach our machines, making them better learners.
Audio data augmentation:
1. Noise injection: Add Gaussian or random noise to the audio to improve model performance.
2. Shifting: Shift the audio left (fast forward) or right by a random number of seconds.
3. Changing the speed: Stretch the time series by a fixed rate.
4. Changing the pitch: Randomly change the pitch of the audio.
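The audio techniques above can be sketched with plain NumPy on a 1-D waveform. This is a minimal illustration under stated assumptions: the function names and parameter values are hypothetical, and the speed change uses naive linear-interpolation resampling, which also shifts the pitch. Libraries such as librosa provide `librosa.effects.time_stretch` and `librosa.effects.pitch_shift` to change speed or pitch independently.

```python
import numpy as np

def add_noise(signal, noise_level=0.005, rng=None):
    """Noise injection: add Gaussian noise scaled by noise_level (hypothetical default)."""
    rng = np.random.default_rng() if rng is None else rng
    return signal + noise_level * rng.standard_normal(signal.shape)

def time_shift(signal, shift):
    """Shifting: move the waveform by `shift` samples (negative = left),
    zero-filling the samples that would otherwise wrap around."""
    shifted = np.roll(signal, shift)
    if shift > 0:
        shifted[:shift] = 0.0
    elif shift < 0:
        shifted[shift:] = 0.0
    return shifted

def change_speed(signal, rate):
    """Changing the speed: naive resampling by linear interpolation.
    rate > 1 speeds up (shorter output); this also raises the pitch."""
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

sr = 16000                                  # assumed sample rate
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)          # 1 second, 440 Hz test tone
noisy = add_noise(wave, rng=np.random.default_rng(0))
shifted = time_shift(wave, sr // 10)        # shift right by 0.1 s
left = time_shift(wave, -(sr // 10))        # shift left by 0.1 s
faster = change_speed(wave, 1.25)           # 0.8 s of audio
print(noisy.shape, shifted.shape, faster.shape)
```

In a real pipeline, the shift amount and speed rate would be drawn at random for each sample rather than fixed.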
Text data augmentation:
1. Word or sentence shuffling: Randomly change the position of a word or sentence.
2. Word replacement: Replace words with synonyms.
3. Syntax-tree manipulation: Paraphrase the sentence using the same words.
4. Random word insertion: Insert words at random.
5. Random word deletion: Delete words at random.
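Several of these text operations are easy to sketch on a tokenized sentence. The following is a minimal illustration, not a production recipe: `SYNONYMS` is a hypothetical toy table (a real pipeline might use WordNet or embeddings), and the insertion step inserts a synonym of a word already in the sentence, which is one common way to choose the inserted word.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}

def synonym_replacement(words, rng):
    """Word replacement: swap every word that has an entry in SYNONYMS."""
    return [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]

def random_swap(words, rng):
    """Word shuffling: swap the positions of two random words."""
    words = words.copy()
    if len(words) >= 2:
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_insertion(words, rng):
    """Random insertion: insert a synonym of a known word at a random position."""
    words = words.copy()
    candidates = [w for w in words if w in SYNONYMS]
    if candidates:
        new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
        words.insert(rng.randrange(len(words) + 1), new_word)
    return words

def random_deletion(words, rng, p=0.2):
    """Random deletion: drop each word with probability p, keeping at least one."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(42)
sentence = "the quick dog looks happy today".split()
replaced = synonym_replacement(sentence, rng)
swapped = random_swap(sentence, rng)
inserted = random_insertion(sentence, rng)
deleted = random_deletion(sentence, rng)
print(replaced, swapped, inserted, deleted, sep="\n")
```

Each function returns a new list, so several augmented variants can be generated from the same original sentence.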
Image data augmentation:
1. Geometric transformations: Randomly flip, crop, rotate, stretch, and zoom images. Be careful about applying multiple transformations to the same image, as this can reduce model performance.
2. Color space transformations: Randomly change the RGB color channels, contrast, and brightness.
3. Kernel filters: Randomly change the sharpness or blurring of the image.
4. Random erasing: Delete part of the original image.
5. Mixing images: Blend and mix multiple images.
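The first three image categories can be sketched with NumPy on an H x W x C float image with values in [0, 1]. This is an illustrative assumption-laden sketch (function names, the 90-degree rotation, and the box filter are choices made here for simplicity); libraries such as torchvision or Albumentations offer more complete versions.

```python
import numpy as np

def random_flip(img, rng):
    """Geometric: horizontal flip with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size, rng):
    """Geometric: crop a random size x size window."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def random_rotate90(img, rng):
    """Geometric: rotate by a random multiple of 90 degrees."""
    return np.rot90(img, k=int(rng.integers(0, 4)), axes=(0, 1))

def adjust_brightness_contrast(img, brightness, contrast):
    """Color space: scale contrast around the mean, shift brightness, clip to [0, 1]."""
    mean = img.mean()
    return np.clip((img - mean) * contrast + mean + brightness, 0.0, 1.0)

def box_blur(img, k=3):
    """Kernel filter: k x k mean filter with edge padding, applied per channel."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))               # stand-in for a real image
crop = random_crop(img, 24, rng)
aug = box_blur(adjust_brightness_contrast(random_rotate90(random_flip(crop, rng), rng), 0.1, 1.2))
print(crop.shape, aug.shape)
```

Chaining many transformations like this is exactly where the caution above applies: each step should still produce images that look like plausible members of the dataset.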
Now let’s look at image augmentation methods in more detail.
Classical image augmentation techniques for convolutional neural networks in computer vision include scaling, cropping, flipping, and rotating an image.
Beyond these classical techniques, three of the most effective image augmentation methods are:
1. Cutout
2. Mixup
3. Cutmix
Cutout was introduced in the paper "Improved Regularization of Convolutional Neural Networks with Cutout" by DeVries & Taylor in 2017. The main idea behind Cutout image augmentation is to randomly remove a square region of pixels in an input image during training.
This technique prevents the model from depending too heavily on specific features, forcing it to use information from the entire input. It acts as a regularization method, introducing noise that makes the model more robust to irrelevant patterns. Cutout is simple yet useful, especially when the dataset is prone to overfitting.
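To make the idea concrete, here is a from-scratch NumPy sketch of Cutout on a single H x W x C image. It is a minimal illustration under assumptions (the `cutout` name, a single square, and zero as the fill value are choices made here): the square's center is sampled uniformly over the image, so the square may be partly clipped at the border.

```python
import numpy as np

def cutout(img, size, rng):
    """Return a copy of img (H x W x C) with one size x size square set to 0.

    The center is sampled uniformly over the image, so part of the
    square may fall outside the border and be clipped.
    """
    h, w = img.shape[:2]
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    top, left = cy - size // 2, cx - size // 2
    y1, y2 = max(top, 0), min(top + size, h)
    x1, x2 = max(left, 0), min(left + size, w)
    out = img.copy()                  # leave the input image untouched
    out[y1:y2, x1:x2] = 0
    return out

rng = np.random.default_rng(0)
img = np.ones((32, 32, 3))
augmented = cutout(img, size=16, rng=rng)
zeroed = int((augmented == 0).all(axis=2).sum())  # number of erased pixels
print(zeroed)
```

Because the center can land near a border, the erased area varies between roughly a quarter of the square and the full square.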
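Implementation in Python with Albumentations and PyTorch
```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Cutout via Albumentations' CoarseDropout. These argument names follow the
# Albumentations 1.x API; newer releases replace them with range-based
# arguments (e.g. num_holes_range, hole_height_range, hole_width_range).
transforms_cutout = A.Compose([
    A.Resize(256, 256),
    A.CoarseDropout(max_holes=1,            # Maximum number of regions to zero out (default: 8)
                    max_height=128,         # Maximum height of the hole (default: 8)
                    max_width=128,          # Maximum width of the hole (default: 8)
                    min_holes=None,         # None: same as max_holes
                    min_height=None,        # None: same as max_height
                    min_width=None,         # None: same as max_width
                    fill_value=0,           # Value for dropped pixels
                    mask_fill_value=None,   # Fill value for dropped pixels in the mask
                    always_apply=False,
                    p=0.5),                 # Apply to 50% of images
    ToTensorV2(),                           # Convert to a PyTorch tensor
])
```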
The returned sample batch looks as follows:
Mixup was introduced in the paper "mixup: Beyond Empirical Risk Minimization" by Zhang, Cisse, Dauphin, & Lopez-Paz, also in 2017.
Mixup tackles overfitting differently. It linearly interpolates between pairs of training samples, both in their input features and in their corresponding labels. These interpolated samples reduce the risk of the model memorizing specific examples. Mixup is particularly useful when datasets lack diversity, helping the model generalize better to unseen data.
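As a compact summary in the notation commonly used for Mixup, each new sample is built from two training examples (x_i, y_i) and (x_j, y_j) with one-hot labels:

```latex
\lambda \sim \mathrm{Beta}(\alpha, \alpha), \qquad
\tilde{x} = \lambda\, x_i + (1 - \lambda)\, x_j, \qquad
\tilde{y} = \lambda\, y_i + (1 - \lambda)\, y_j
```

In the mixup() function of this article, α is the `alpha` argument and λ is `lam`. Note that this implementation draws a single λ per batch and takes each sample's partner from a shuffled copy of the same batch.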
Implementation in Python with PyTorch
The mixup() function applies Mixup to a full batch. The pairs are generated by shuffling the batch and selecting one image from the original batch and one from the shuffled batch.
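```python
import numpy as np
import torch

def mixup(data, targets, alpha):
    # Pair each sample with a random partner from the same batch
    indices = torch.randperm(data.size(0), device=data.device)
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]
    # One mixing coefficient per batch, drawn from Beta(alpha, alpha)
    lam = np.random.beta(alpha, alpha)
    new_data = data * lam + shuffled_data * (1 - lam)
    new_targets = [targets, shuffled_targets, lam]
    return new_data, new_targets
```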
In addition to the function that augments the images and labels, we must modify the loss function with a custom mixup_criterion() function. This function returns the losses for the two label sets, weighted by lam.
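```python
import torch.nn as nn

def mixup_criterion(preds, targets):
    # Unpack the original labels, the shuffled labels, and the mixing coefficient
    targets1, targets2, lam = targets[0], targets[1], targets[2]
    criterion = nn.CrossEntropyLoss()
    # Weighted sum of the losses against both label sets
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)
```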
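Why can mixup_criterion() combine two ordinary cross-entropy losses instead of using a mixed one-hot target? Cross-entropy is linear in the target distribution, so the loss against the mixed target λ·y_i + (1 − λ)·y_j equals λ times the loss against y_i plus (1 − λ) times the loss against y_j. The NumPy check below illustrates this with random logits; the helper names `log_softmax` and `soft_cross_entropy` are hypothetical, and mean reduction over the batch is assumed (the default of `nn.CrossEntropyLoss`).

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def soft_cross_entropy(logits, soft_targets):
    """Mean cross-entropy against a (possibly mixed) target distribution."""
    return -(soft_targets * log_softmax(logits)).sum(axis=1).mean()

rng = np.random.default_rng(0)
num_classes, batch = 5, 8
logits = rng.standard_normal((batch, num_classes))
targets1 = rng.integers(0, num_classes, batch)
targets2 = rng.permutation(targets1)        # "shuffled" labels
lam = 0.7

one_hot = np.eye(num_classes)
mixed = lam * one_hot[targets1] + (1 - lam) * one_hot[targets2]

loss_soft = soft_cross_entropy(logits, mixed)
loss_weighted = (lam * soft_cross_entropy(logits, one_hot[targets1])
                 + (1 - lam) * soft_cross_entropy(logits, one_hot[targets2]))
print(loss_soft, loss_weighted)
```

The two printed values agree up to floating-point rounding, which is why the two-loss formulation is equivalent to training on the mixed labels.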
The mixup() and mixup_criterion() functions are not applied in the PyTorch Dataset but in the training code, as shown below.
Since the augmentation is applied to the full batch, we also add a variable p_mixup that controls the proportion of batches that will be augmented. For example, p_mixup = 0.5 applies Mixup augmentation to about 50% of the batches in an epoch.
```python
# Assumes model, criterion, train_dataloader, device, NUM_EPOCHS,
# and p_mixup are defined earlier in the training script.
for epoch in range(NUM_EPOCHS):
    # Train
    model.train()
    # Define any variables for metrics
    for samples, labels in train_dataloader:
        samples, labels = samples.to(device), labels.to(device)
        # Normalize
        samples = samples / 255
        # Apply Mixup augmentation to a fraction p_mixup of batches
        p = np.random.rand()
        if p < p_mixup:
            samples, labels = mixup(samples, labels, 0.8)
        # Zero the parameter gradients
        ...
        with torch.set_grad_enabled(True):
            # Forward: Get model outputs and calculate loss
            output = model(samples)
            # Apply Mixup criterion only to augmented batches
            if p < p_mixup:
                loss = mixup_criterion(output, labels)
            else:
                loss = criterion(output, labels)
```
The returned sample batch looks as follows:
CutMix was introduced in the paper "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features" by Yun, Han, Oh, Chun, Choe & Yoo in 2019.
CutMix is an augmentation technique that cuts a patch from one image and pastes it onto another to create a new training sample, mixing the labels in proportion to the area of each image. This not only introduces variation but also makes the model learn from regions of multiple images at the same time. By mixing different contexts, CutMix provides a more challenging training environment, improving the model’s robustness to variations in real-world data.
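As a compact summary of the usual CutMix formulation, for two images x_A and x_B with one-hot labels y_A and y_B:

```latex
\tilde{x} = \mathbf{M} \odot x_A + (\mathbf{1} - \mathbf{M}) \odot x_B, \qquad
\tilde{y} = \lambda\, y_A + (1 - \lambda)\, y_B, \qquad
\lambda \sim \mathrm{Beta}(\alpha, \alpha)
```

Here M ∈ {0, 1}^(W×H) is 0 inside a randomly placed box and 1 elsewhere, and ⊙ is element-wise multiplication. The box has width W·√(1 − λ) and height H·√(1 − λ), so it covers a fraction 1 − λ of the image; this is the `cut_rat = np.sqrt(1. - lam)` step in rand_bbox(). When the box is clipped at the image border, λ is recomputed from the actual box area, which is the "adjust lambda" step in cutmix().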
Implementation in Python with PyTorch
The implementation for Cutmix is similar to the implementation of Mixup. First, you will also need a custom function cutmix() that applies the image augmentation.
```python
import numpy as np
import torch

def cutmix(data, targets, alpha):
    # Pair each sample with a random partner from the same batch
    indices = torch.randperm(data.size(0), device=data.device)
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]
    # Sample lambda and a box covering roughly (1 - lambda) of the image
    lam = np.random.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(data.size(), lam)
    # Paste the partner's patch into the box (modifies data in place)
    data[:, :, bbx1:bbx2, bby1:bby2] = shuffled_data[:, :, bbx1:bbx2, bby1:bby2]
    # Adjust lambda to exactly match the pixel ratio (the box may be clipped)
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (data.size()[-1] * data.size()[-2]))
    new_targets = [targets, shuffled_targets, lam]
    return data, new_targets

def rand_bbox(size, lam):
    # size is (N, C, dim2, dim3); x indexes dim 2 and y indexes dim 3
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)
    # Sample the box center uniformly
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    # Clip the box to the image borders
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2
```
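To check the lambda adjustment without PyTorch, here is a NumPy port of the cutmix() and rand_bbox() logic. It is an illustrative sketch, not the article's code: `cutmix_np`, the `rng` argument, and returning a copy instead of modifying the batch in place are assumptions made here. Each image is filled with a constant equal to its index, so after mixing, the fraction of pixels in an image that kept its own value should equal the adjusted lam (or 1.0 if the image happened to be paired with itself).

```python
import numpy as np

def rand_bbox_np(size, lam, rng):
    # Same box logic as rand_bbox(), for an N x C x dim2 x dim3 shape
    W, H = size[2], size[3]
    cut_rat = np.sqrt(1.0 - lam)
    cut_w, cut_h = int(W * cut_rat), int(H * cut_rat)
    cx, cy = rng.integers(W), rng.integers(H)
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

def cutmix_np(data, targets, alpha, rng):
    indices = rng.permutation(data.shape[0])
    lam = rng.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox_np(data.shape, lam, rng)
    mixed = data.copy()                     # keep the input batch unchanged
    mixed[:, :, bbx1:bbx2, bby1:bby2] = data[indices, :, bbx1:bbx2, bby1:bby2]
    # Adjusted lambda: fraction of pixels still coming from the original image
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (data.shape[-1] * data.shape[-2]))
    return mixed, (targets, targets[indices], lam)

rng = np.random.default_rng(0)
# Image i is filled with the constant value i, so pasted pixels are easy to spot
batch = np.arange(4, dtype=float).reshape(4, 1, 1, 1) * np.ones((4, 3, 32, 32))
labels = np.arange(4)
mixed, (y_a, y_b, lam) = cutmix_np(batch, labels, alpha=1.0, rng=rng)
kept = [float(np.mean(mixed[i] == batch[i])) for i in range(4)]
print(lam, kept)
```

With alpha = 1.0, λ is drawn uniformly from [0, 1], which makes it easy to see boxes of very different sizes across runs.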
The rest is the same as for Mixup:
1. Define a cutmix_criterion() function to handle the custom loss (see the implementation of mixup_criterion())
2. Define a variable p_cutmix to control the proportion of batches that will be augmented (see p_mixup)
3. Apply cutmix() and cutmix_criterion() according to p_cutmix in the training code.
The returned sample batch looks as follows:
Geometric and other transformations can help you train robust and accurate machine learning models. For example, in pneumonia classification from chest X-rays, you can use random cropping, zooming, stretching, and color space transformations to improve model performance. However, be careful with certain augmentations, as they can hurt performance instead. For example, random rotation and reflection along the x-axis are not recommended for X-ray imaging datasets, since the resulting images no longer resemble real scans.
In autonomous driving scenarios, data augmentation is crucial to train models to identify objects, pedestrians, and road conditions in different environments. This includes simulating changes in weather, lighting, and road types.
Data augmentation techniques such as paraphrasing and word replacement are applied in NLP tasks like text classification and sentiment analysis. This helps improve the model’s ability to understand and generalize across different forms of language expression.
In this article, we covered data augmentation as an important step toward better model performance. We explored augmentation strategies ranging from traditional techniques like geometric and color space transformations to advanced methods like Cutout, CutMix, and Mixup, along with practical implementations of these methods. Finally, we discussed applications of data augmentation.
This article gave a brief overview of data augmentation. Data augmentation points the way toward smarter, more adaptable models in the ever-evolving world of deep learning. Building these techniques into training pipelines paves the way for models that succeed on unseen real-world data, an important step toward robust and intelligent machine learning.
A. Data augmentation is important because it helps overcome limitations in training data, improves model generalization, and reduces overfitting by providing a more diverse set of augmented examples for learning.
A. Foundational techniques include geometric transformations (rotation, scaling) and color space transformations, which lay the groundwork for more advanced methods by introducing variability into the dataset.
A. CutMix combines two images by cutting a patch from one and pasting it onto the other, mixing their labels in proportion to the patch area, which can improve model performance. This differs from traditional methods like flipping or rotating, which transform a single image.
A. Yes, Data Augmentation extends beyond images and finds application in different domains. It is employed in natural language processing for text augmentation, in speech recognition for audio data manipulation, and in other fields.
A. While data augmentation is powerful, challenges include the risk of removing important features, especially with small datasets, and the need for careful thought when applying certain techniques in specific contexts. It is important to tailor the approach to the characteristics of the dataset and the requirements of the task.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.