Imagine finding an old family photo album hidden in a dusty attic. You wipe off the dust and excitedly flip through the pages until you come across a picture taken many years ago. But you aren’t happy, because the picture is faded and blurry, and you have to strain your eyes to make out the faces and details. That is how things used to be. Today, newer techniques such as the Super-Resolution Generative Adversarial Network (SRGAN) can convert low-resolution images into high-resolution images. In this article, we will learn about SRGAN and implement it for QR code enhancement.
This article was published as a part of the Data Science Blogathon.
Crime investigation movies often show detectives checking CCTV footage for evidence. In a typical scene, someone finds a small, obscured image and gets a clear picture out of it by zooming in and enhancing it. Do you think that is possible? Yes, we can do it with the help of super-resolution. Super-resolution techniques can enhance blurry images captured by CCTV cameras and produce more detailed visuals.
Super-resolution is the process of upscaling and enhancing images. It generates a high-resolution version of an image or video from a low-resolution input. The goal is to recover missing details, improve sharpness, and improve overall visual quality. If you simply zoom in on a picture without enhancing it, you get a blurred picture, as shown in the images below. Super-resolution supplies that enhancement. It has applications in many domains, including photography, surveillance systems, medical imaging, and satellite imaging.
Traditional approaches mainly focus on estimating missing pixel values to improve image resolution. They fall into two groups: interpolation-based methods and regularization-based methods.
Early super-resolution work focused on interpolation-based methods, which estimate the missing pixel values in order to upscale the image. They assume that neighboring pixels have similar values and use those neighbors to estimate the missing ones. The most commonly used interpolation methods are nearest-neighbor, bilinear, and bicubic interpolation. These methods are cheap to compute, which makes them suitable for basic upscaling tasks and for situations with limited computational resources. However, the results are often unsatisfactory because the upscaled images tend to look blurry.
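To make the interpolation idea concrete, here is a minimal NumPy sketch of nearest-neighbor and bilinear upscaling for a grayscale image. The function names `upscale_nearest` and `upscale_bilinear` are hypothetical, and the bilinear version assumes a corner-aligned sampling grid; library routines such as `cv2.resize` may align samples differently, so their values need not match this sketch exactly.

```python
import numpy as np

def upscale_nearest(img, factor):
    """Nearest-neighbor upscaling: each pixel is repeated factor x factor times."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def upscale_bilinear(img, factor):
    """Bilinear upscaling of a 2-D grayscale image (corner-aligned grid, an assumption)."""
    h, w = img.shape
    new_h, new_w = h * factor, w * factor
    # Map every output coordinate back to a fractional input coordinate.
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    img = img.astype(float)
    # Blend the four surrounding pixels, first horizontally and then vertically.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# A tiny 2x2 "image" with hard edges, loosely resembling QR code modules.
tiny = np.array([[0, 255],
                 [255, 0]], dtype=np.uint8)
nearest = upscale_nearest(tiny, 2)
bilinear = upscale_bilinear(tiny, 2)
print(nearest)
print(np.round(bilinear, 1))
```

Nearest-neighbor keeps the edges hard but blocky, while bilinear introduces intermediate gray values between black and white pixels, which is the blurring described above.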
On the other hand, regularization-based methods aim to improve super-resolution results by introducing extra constraints or priors into the image reconstruction process. These techniques exploit an image’s statistical features to increase the accuracy of the reconstructed image while preserving fine details. They give more control over the reconstruction process and improve the sharpness and detail of the image. However, they struggle with complex image content and can over-smooth the result in some cases.
Even though these traditional approaches have limitations, they paved the way for more powerful super-resolution methods.
Learning-based approaches have become a powerful and effective solution for super-resolution, making it possible to generate highly detailed high-resolution images. There are two main learning-based approaches: Single Image Super-Resolution (SISR) and Generative Adversarial Networks (GANs).
Single Image Super-Resolution focuses on learning a function that maps low-resolution images directly to high-resolution images, typically using convolutional neural networks (CNNs). Researchers train these networks on large-scale datasets of paired low-resolution and high-resolution images. The networks learn the underlying patterns and relationships between the low-resolution and high-resolution versions of images, which lets them generate high-quality results. The architecture of SISR models consists of an encoder and a decoder.
The encoder captures the features of the low-resolution image, and the decoder upscales and refines those features to produce a high-resolution image. The difference between real and generated images is commonly measured with Mean Squared Error (MSE), which can serve directly as a training loss, and Peak Signal-to-Noise Ratio (PSNR), a quality metric derived from MSE. By minimizing the loss during training, the network learns to produce high-resolution images that closely match the original high-resolution images.
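The relationship between MSE and PSNR can be illustrated with a short NumPy sketch. The helper names `mse` and `psnr` and the toy images are assumptions made for this example; PSNR is computed with the standard definition 10 · log10(MAX² / MSE), where MAX is 255 for 8-bit images.

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    return float(np.mean((reference - estimate) ** 2))

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    err = mse(reference, estimate)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))

# Toy data: a flat 4x4 image and a copy with a single pixel off by 16.
hr = np.full((4, 4), 100, dtype=np.uint8)
sr = hr.copy()
sr[0, 0] = 116

print("MSE :", mse(hr, sr))
print("PSNR:", psnr(hr, sr))
```

A lower MSE always gives a higher PSNR, which is why minimizing MSE during training also pushes PSNR up.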
Generative Adversarial Networks (GANs), on the other hand, introduced an adversarial learning framework that brought further advances in super-resolution. A GAN consists of two parts: a generator network and a discriminator network. The generator takes a low-resolution image as input and attempts to produce a high-resolution output. The discriminator tries to distinguish generated high-resolution images from real high-resolution images. GAN-based super-resolution methods have shown impressive results in generating realistic images. Compared with traditional methods, they are better at capturing complex patterns and creating fine textures. The Super-Resolution Generative Adversarial Network (SRGAN) is a popular GAN-based model for super-resolution tasks.
In today’s world, high-quality images are important in many domains, but it is not always possible to capture high-resolution images. This is where super-resolution becomes relevant: it converts low-resolution content into high-resolution content. Learning-based super-resolution approaches emerged to overcome the limitations of traditional approaches, and GAN-based methods are one of them.
SRGAN combines generative adversarial networks (GANs) with deep convolutional neural networks (CNNs) to produce highly realistic high-resolution images from low-resolution images. Like any GAN, SRGAN consists of two parts, a generator and a discriminator, which learn by working against each other. The aim of the generator is to produce high-resolution images that are indistinguishable from the ground-truth high-resolution images, while the aim of the discriminator is to distinguish generated images from real ones. This is called adversarial training. The generator keeps trying to deceive the discriminator by generating ever more realistic high-resolution images, and in doing so it learns to capture fine details and the overall visual characteristics of the image. The discriminator’s judgment of the generated images acts as feedback: through backpropagation, the generator updates its weights to minimize its loss.
SRGAN is trained with a perceptual loss, which combines two different losses: a content loss and an adversarial loss.
The overall loss for super-resolution (perceptual loss) is a weighted sum of the two: perceptual loss = content loss + 10^-3 × adversarial loss, as defined in the original SRGAN paper.
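As a rough numerical illustration of a perceptual loss built as a weighted sum, the sketch below adds a content loss to an adversarial loss scaled by 1e-3, the same weighting that appears later in this article as `loss_weights=[1e-3, 1]`. The function names, the toy feature maps, and the discriminator score are assumptions made for the example, not part of the original implementation.

```python
import numpy as np

def content_loss(real_features, fake_features):
    """MSE between feature maps of the real and the generated image."""
    return float(np.mean((real_features - fake_features) ** 2))

def adversarial_loss(disc_scores_on_fake, eps=1e-7):
    """Binary cross-entropy of discriminator scores against the 'real' label (1)."""
    p = np.clip(disc_scores_on_fake, eps, 1.0 - eps)
    return float(np.mean(-np.log(p)))

def perceptual_loss(real_features, fake_features, disc_scores_on_fake, adv_weight=1e-3):
    """Weighted sum: content loss + adv_weight * adversarial loss."""
    return (content_loss(real_features, fake_features)
            + adv_weight * adversarial_loss(disc_scores_on_fake))

# Toy values: feature maps that differ by 1 everywhere and an undecided discriminator.
real_features = np.ones((2, 2))
fake_features = np.zeros((2, 2))
disc_scores = np.array([0.5])

total = perceptual_loss(real_features, fake_features, disc_scores)
print("perceptual loss:", total)
```

With this weighting the content term dominates the total, and the adversarial term acts as a small push toward images the discriminator accepts as real.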
The generator starts by taking a low-resolution image as input and passing it through a convolutional layer that uses 64 filters of size 9 by 9, followed by a parametric ReLU (PReLU) activation. The result is then sent through a series of residual blocks, each of which groups the same sequence of operations. Inside a residual block, a convolutional layer with 64 filters of size 3 by 3 is followed by batch normalization and a parametric ReLU. This is followed by another convolutional layer, which in turn is followed by batch normalization.
Finally, an elementwise sum is performed with the input of the residual block. The output of this block is sent to the next block, which repeats the same steps, and this continues until the last residual block. As described in the original SRGAN paper, there are 16 residual blocks in total. The purpose of these residual blocks is to extract features from the input image.
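The elementwise sum at the end of a residual block can be sketched in a few lines of NumPy. Here `residual_block` and the stand-in `transform` functions are hypothetical; in the real network the transform is the convolution, batch normalization, and PReLU stack described above.

```python
import numpy as np

def residual_block(x, transform):
    """Return x + transform(x); transform must preserve the shape of x."""
    fx = transform(x)
    if fx.shape != x.shape:
        raise ValueError("the residual branch must keep the input shape")
    return x + fx

features = np.arange(6, dtype=float).reshape(2, 3)

# If the residual branch outputs zeros, the block passes its input through unchanged.
passthrough = residual_block(features, lambda t: np.zeros_like(t))

# Otherwise the branch only has to learn a correction on top of the input.
corrected = residual_block(features, lambda t: 0.1 * t)
print(passthrough)
print(corrected)
```

Because each block only adds a correction to its input, stacking many of them (16 in SRGAN) stays trainable.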
After the residual blocks come another convolutional layer and a batch normalization layer. Their output is combined with the output of the first parametric ReLU through another elementwise sum, which forms a long skip connection. Next come the upsampling blocks, where pixel shuffling gradually increases the resolution of the image. There are two upsampling blocks. The network ends with a convolutional layer, which produces the super-resolution image as output.
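Pixel shuffling rearranges channels into spatial positions: a tensor of shape (H, W, C·r²) becomes (H·r, W·r, C). The NumPy sketch below shows one common channel ordering; the function name `pixel_shuffle` is an assumption, and deep learning frameworks may order the channels differently.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, C*r*r) array into (H*r, W*r, C)."""
    h, w, c = x.shape
    if c % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_c = c // (r * r)
    x = x.reshape(h, w, r, r, out_c)   # split channels into an r x r grid
    x = x.transpose(0, 2, 1, 3, 4)     # reorder to (h, r, w, r, out_c)
    return x.reshape(h * r, w * r, out_c)

# One pixel with four channels becomes a 2x2 patch with one channel.
feature_map = np.arange(4).reshape(1, 1, 4)
shuffled = pixel_shuffle(feature_map, 2)
print(shuffled[:, :, 0])

# A 32x32 map with 256 channels becomes a 64x64 map with 64 channels.
big = pixel_shuffle(np.zeros((32, 32, 256)), 2)
print(big.shape)
```

No values are interpolated here; the preceding convolution learns what to put in the extra channels, and the shuffle only moves them into place.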
The discriminator network is an image-classification convolutional neural network (CNN). It is responsible for differentiating generated images from real high-resolution images. First, a convolutional layer is applied to the input image, whether it is a real high-resolution image or one produced by the generator. This layer extracts features from the input image, which are then passed through a Leaky ReLU activation. The features then pass through several discriminator blocks, each containing a convolutional layer, batch normalization, and Leaky ReLU. Finally, they pass through a dense layer followed by Leaky ReLU and another dense layer that produces the output: a classification of the input as either an original high-resolution image or a generator-produced one.
In this project, we will implement an SRGAN for QR code enhancement: a low-resolution, blurry image of a QR code is passed as input, and the model outputs a clear, high-resolution picture of the QR code.
You can download the dataset of QR codes here.
Let’s start by importing some required libraries.
import os
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tqdm import tqdm
# Import Keras consistently through tensorflow.keras to avoid mixing two Keras installations.
from tensorflow.keras import layers, Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, PReLU, BatchNormalization, Flatten,
                                     UpSampling2D, LeakyReLU, Dense, Input, add)
Install all the missing packages using pip.
!pip install opencv-python
!pip install tqdm
!pip install scikit-image
Now iterate over the files in a directory, read an image file using OpenCV, and display it using matplotlib. First, assign the path where the images are stored to datadir. We break out of the loop after one iteration, so only one image is displayed.
datadir = r'path-to-dataset'
# iterating over just one element
for img in os.listdir(datadir):
    img_array = cv2.imread(os.path.join(datadir, img), cv2.IMREAD_GRAYSCALE)
    plt.imshow(img_array, cmap='gray')
    plt.show()
    break
Now we have to process all the images in the directory and create the training data. For this, we declare two lists, array and array_small, which store the resized images. The tqdm module displays a progress bar while iterating over the images. In the create_training_data function, we iterate over each image in the directory. We first read each image using imread() and resize it to (128, 128) using resize(), then append the resized image to the array list. We then resize the same image to (32, 32) and append it to the array_small list. Files that cannot be read or resized are silently skipped.
array = []
array_small = []
def create_training_data():
    for img in tqdm(list(os.listdir(datadir))):  # iterate over each image in the directory
        try:
            img_array = cv2.imread(os.path.join(datadir, img), cv2.IMREAD_COLOR)  # read as a BGR array
            new_array = cv2.resize(img_array, (128, 128))  # high-resolution target
            array.append([new_array])
            # low-resolution input, downscaled with area interpolation
            array_small.append([cv2.resize(img_array, (32, 32),
                                           interpolation=cv2.INTER_AREA)])
        except Exception as e:  # skip files that cannot be read or resized
            pass
create_training_data()
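The code above relies on `cv2.resize` with `INTER_AREA` to create the low-resolution inputs. For an integer factor such as 128 to 32, area-based downscaling roughly amounts to averaging each 4x4 block of pixels. The NumPy sketch below illustrates that idea; the function `downscale_block_mean` is hypothetical and is not OpenCV's implementation, so rounding and edge handling may differ.

```python
import numpy as np

def downscale_block_mean(img, factor):
    """Downscale an (H, W, C) image by averaging non-overlapping factor x factor blocks."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the factor")
    blocks = img.reshape(h // factor, factor, w // factor, factor, c).astype(np.float64)
    return blocks.mean(axis=(1, 3))

# Toy high-resolution image: left half black, right half white.
hr = np.zeros((128, 128, 3), dtype=np.uint8)
hr[:, 64:, :] = 255

lr = downscale_block_mean(hr, 4)
print(lr.shape)
```

Each low-resolution pixel summarizes 16 high-resolution pixels, which is the detail the generator later has to reconstruct.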
Let’s check the length of the array. The output shows that we have 10000 images in total.
len(array)
#10000
To check that the image processing and resizing steps were successful, we create two more empty lists, X and Xs, append all the high-resolution images to X and the low-resolution images to Xs, and convert both lists into arrays. Then we plot a figure showing a high-resolution image next to its low-resolution version.
X = []
Xs = []
for features in array:
    X.append(features)
for features in array_small:
    Xs.append(features)
plt.figure(figsize=(16, 8))
X = np.array(X).reshape(-1, 128, 128, 3)
Xs = np.array(Xs).reshape(-1, 32, 32, 3)
plt.subplot(231)
plt.imshow(X[0], cmap = 'gray')
plt.subplot(233)
plt.imshow(Xs[0], cmap = 'gray')
plt.show()
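The reshape calls above are needed because each list element was appended as a one-item list, which adds an extra axis when the list is converted to an array. The small sketch below reproduces this with dummy data; `n_images`, `demo_list`, and `X_demo` are placeholder names for this example.

```python
import numpy as np

n_images = 5
# Each entry mimics array.append([new_array]): a one-item list holding a 128x128x3 image.
demo_list = [[np.zeros((128, 128, 3), dtype=np.uint8)] for _ in range(n_images)]

stacked = np.array(demo_list)
print(stacked.shape)   # the extra axis of length 1 comes from the inner list

X_demo = stacked.reshape(-1, 128, 128, 3)
print(X_demo.shape)    # one row per image, in the layout Keras expects
```

The -1 lets NumPy infer the number of images, so the same line works for any dataset size.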
Let’s augment the entire data we have. We can use ImageDataGenerator() to create augmented images. After creating images, reshape them and save them to a separate directory.
#augmenting the data
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from skimage import io
datagen = ImageDataGenerator(
    rotation_range = 40,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    brightness_range = (0.5, 1.5))
for a in X:
    i = 0
    a = a.reshape((1, ) + a.shape)
    for batch in datagen.flow(a, batch_size=1, save_to_dir= r'C:\Users\Admin\Downloads\QR\augmented',
                              save_prefix='ag', save_format='png'):
        try:
            i += 1
            if i>= 10:
                break
        except Exception:
            print("error")
            pass
We have to create training data for the augmented images in the same way as for the original data. We then create two more lists, X1 and Xs1, to store the augmented data and plot a sample to inspect it. Finally, we concatenate the original arrays with the augmented ones.
array=[]
array_small=[]
datadir = r'C:\Users\Admin\Downloads\QR\augmented'
create_training_data()
X1 = []
Xs1 = []
for features in array:
    X1.append(features)
for features in array_small:
    Xs1.append(features)
X1 = np.array(X1).reshape(-1, 128, 128, 3)
Xs1 = np.array(Xs1).reshape(-1, 32, 32, 3)
plt.figure(figsize=(16, 8))
plt.subplot(231)
plt.imshow(X1[0], cmap = 'gray')
plt.subplot(233)
plt.imshow(Xs1[0], cmap = 'gray')
plt.show()
X=np.concatenate((X,X1), axis = 0)
Xs=np.concatenate((Xs,Xs1), axis=0)
X.shape
It’s time to split the data into training and validation sets. The test_size argument specifies that 33% of the data is allocated to the validation set, while the remaining 67% is allocated to the training set. The random_state argument sets the random seed so that the split is reproducible. Note that the low-resolution images (Xs) are the model inputs and the high-resolution images (X) are the targets.
from sklearn.model_selection import train_test_split
X_train,X_valid,y_train, y_valid = train_test_split(Xs, X, test_size = 0.33, random_state = 12)
X_train.shape
Let’s build the generator. First, define the residual block, which is a fundamental building block in many deep learning architectures. Then define the upscale block, which is responsible for increasing the resolution of the input tensor. Note that this implementation upsamples with an UpSampling2D layer rather than the pixel shuffling described earlier. Finally, define the generator function, which takes three parameters: the input tensor, plus res_range and upscale_range, which control the number of residual blocks and upscale blocks in the network, respectively.
def res_block(input_dim):
    model = Conv2D(64, (3,3), padding = 'same' )(input_dim)
    model = BatchNormalization()(model)
    model = PReLU(shared_axes = [1,2])(model)
    model = Conv2D(64, (3,3), padding = 'same' )(model)
    model = BatchNormalization()(model)
    return add([input_dim, model])
def upscale_block(input_dim):
    model = Conv2D(256,(3,3), strides=1, padding = 'same')(input_dim)
    model = UpSampling2D(size = (2,2))(model)
    model = PReLU(shared_axes=[1, 2])(model)
    return model
def generator(input, res_range = 1,upscale_range=1):
    model = Conv2D(64,(9,9), strides=1, padding = 'same')(input)
    model = PReLU(shared_axes = [1,2])(model)
    model1 = model
    for i in range(res_range):
        model = res_block(model)
    model = Conv2D(64, (3,3), padding = 'same' )(model)
    model = BatchNormalization()(model)
    model = add([model,model1])
    for i in range(upscale_range):
        model = upscale_block(model)
    output = Conv2D(3, (9,9), padding='same')(model)
    return Model(input, output)
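A quick way to sanity-check the generator is to trace the output size by hand. All convolutions above use padding='same' with stride 1, so only the upscale blocks change the spatial size, each by a factor of 2. The helper below is a hypothetical sketch of that arithmetic, not part of the model code.

```python
def generator_output_shape(lr_shape, upscale_range=2, upscale_factor=2, out_channels=3):
    """Trace the spatial size through the upscale blocks of the generator."""
    h, w, _ = lr_shape
    for _ in range(upscale_range):
        # UpSampling2D(size=(2, 2)) doubles height and width.
        h, w = h * upscale_factor, w * upscale_factor
    # The final Conv2D(3, ...) sets the channel count.
    return (h, w, out_channels)

sr_shape = generator_output_shape((32, 32, 3), upscale_range=2)
print(sr_shape)
```

With a 32x32 input and upscale_range=2, the result matches the 128x128 high-resolution targets, which is why the generator is later built with two upscale blocks.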
Now let’s build the second part of the GAN, the discriminator. First, define the discriminator block, a convolutional block consisting of a convolution, batch normalization, and LeakyReLU. Next, define the discriminator network, which takes an input tensor and constructs the discriminator architecture. It applies a 2D convolution with 64 filters and a kernel size of (3, 3) followed by LeakyReLU, and then adds seven discriminator blocks. The output tensor is flattened and passed through a fully connected layer with 1024 units and a LeakyReLU activation with an alpha of 0.2. A final single-unit layer with a sigmoid activation produces the discriminator’s output. The function returns a Keras Model object built from the input and output tensors.
def discrim_block(input_dim, fmaps = 64, strides = 1):
    model = Conv2D(fmaps, (3,3), padding = 'same', strides = strides)(input_dim)
    model = BatchNormalization()(model)
    model = LeakyReLU()(model)
    return model
def discriminator(input):
    model = Conv2D(64,(3,3),padding='same')(input)
    model = LeakyReLU()(model)
    model = discrim_block(model, strides = 2)
    model = discrim_block(model, fmaps = 128)
    model = discrim_block(model, fmaps = 128, strides = 2)
    model = discrim_block(model, fmaps=256)
    model = discrim_block(model, fmaps=256, strides=2)
    model = discrim_block(model, fmaps=512)
    model = discrim_block(model, fmaps=512, strides=2)
    model = Flatten()(model)
    model = Dense(1024)(model)
    model = LeakyReLU(alpha = 0.2)(model)
    out = Dense(1, activation='sigmoid')(model)
    return Model(input, out)
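The strided blocks in the discriminator shrink the feature map step by step. With padding='same', a convolution with stride s turns a side length n into ceil(n / s). The sketch below traces this for a 128x128 input; the helper names are hypothetical, and the block list simply mirrors the calls in the discriminator function above.

```python
import math

def same_conv_size(size, stride):
    """Output side length of a padding='same' convolution."""
    return math.ceil(size / stride)

def discriminator_flatten_size(hr_size=128):
    """Trace feature-map size and channels through the seven discriminator blocks."""
    # (feature maps, stride) for each discrim_block call, in order.
    blocks = [(64, 2), (128, 1), (128, 2), (256, 1), (256, 2), (512, 1), (512, 2)]
    size, channels = hr_size, 64  # after the first stride-1 convolution with 64 filters
    for fmaps, stride in blocks:
        size = same_conv_size(size, stride)
        channels = fmaps
    return size, channels, size * size * channels

final_size, final_channels, flat_units = discriminator_flatten_size(128)
print(final_size, final_channels, flat_units)
```

The flattened vector is large, so the Dense(1024) layer that follows holds most of the discriminator's parameters.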
Our next step is to build a VGG model. The build_vgg function initializes a VGG19 model pre-trained on the ImageNet dataset, without its classification head, and returns a Keras Model that maps the input image to the output of an intermediate layer (vgg.layers[10]). These intermediate features are used to compute the content loss.
Then we create a combined model from the generator, discriminator, and VGG19 layers. The create_comb function takes five inputs: the generator model, the discriminator model, the VGG19 model, the low-resolution input, and the high-resolution input. It passes the low-resolution input through the generator to produce a high-resolution image (gen_img) and uses the VGG19 model (vgg) to extract features from that image. The discriminator is set to be non-trainable because the combined model is meant to train only the generator. The validity of the generated image is computed by passing gen_img through the discriminator (disc_model). The resulting model can then be used to train the generator to produce high-resolution images.
#introducing vgg19 layer
from tensorflow.keras.applications.vgg19 import VGG19
def build_vgg(hr_shape):
    vgg = VGG19(weights="imagenet", include_top=False, input_shape=hr_shape)
    return Model(inputs=vgg.inputs, outputs=vgg.layers[10].output)
# Define combined model
def create_comb(gen_model, disc_model, vgg, lr_ip, hr_ip):
    gen_img = gen_model(lr_ip)
    gen_features = vgg(gen_img)
    disc_model.trainable = False
    validity = disc_model(gen_img)
    return Model(inputs=[lr_ip, hr_ip], outputs=[validity, gen_features])
Now create the final networks. Set up the inputs, build the generator, discriminator, and VGG19 feature extractor, and finally create the combined model (the GAN model). First, set hr_shape to the shape of the high-resolution training images (y_train) and lr_shape to the shape of the low-resolution training images (X_train). Then use the generator and discriminator functions to create the generator and discriminator, respectively, and create the VGG19 feature extractor using the build_vgg function. Finally, use the create_comb function to create the GAN model, which combines the generator, discriminator, and VGG19 layers into a single model for training.
hr_shape = (y_train.shape[1], y_train.shape[2], y_train.shape[3])
lr_shape = (X_train.shape[1], X_train.shape[2], X_train.shape[3])
lr_ip = Input(shape=lr_shape)
hr_ip = Input(shape=hr_shape)
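# Note: the next lines rebind the names generator and discriminator
# from the builder functions to the models they return.
generator = generator(lr_ip, res_range = 16, upscale_range=2)
generator.summary()
discriminator = discriminator(hr_ip)
discriminator.compile(loss="binary_crossentropy", optimizer="adam", metrics=['accuracy'])
discriminator.summary()
vgg = build_vgg((128,128,3))
vgg.summary()  # summary() prints the table itself and returns None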
vgg.trainable = False
gan_model = create_comb(generator, discriminator, vgg, lr_ip, hr_ip)
Compile the SRGAN using the binary cross-entropy and mean squared error loss functions and the Adam optimizer. The first loss applies to the discriminator output (validity) and the second to the VGG feature output (gen_features); loss_weights scales them by 1e-3 and 1, respectively.
gan_model.compile(loss=["binary_crossentropy", "mse"], loss_weights=[1e-3, 1], optimizer="adam")
gan_model.summary()
Divide the training data into batches for training the SRGAN model. Two empty lists, train_lr_batches and train_hr_batches, store the low-resolution and high-resolution image batches, respectively. Inside the loop, the high-resolution image batch (y_train[start_idx:end_idx]) is extracted from y_train and appended to train_hr_batches. Similarly, the low-resolution image batch (X_train[start_idx:end_idx]) is extracted from X_train and appended to train_lr_batches.
batch_size = 1
train_lr_batches = []
train_hr_batches = []
for it in range(int(y_train.shape[0] / batch_size)):
    start_idx = it * batch_size
    end_idx = start_idx + batch_size
    train_hr_batches.append(y_train[start_idx:end_idx])
    train_lr_batches.append(X_train[start_idx:end_idx])
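One detail of the batching loop above is that int(n / batch_size) keeps only full batches, so any leftover samples are dropped when the dataset size is not a multiple of batch_size. The sketch below contrasts that with a variant that keeps the remainder; both helper functions are hypothetical and only compute index ranges.

```python
def make_batches(n_samples, batch_size):
    """(start, end) index pairs built like the loop above: full batches only."""
    return [(i * batch_size, i * batch_size + batch_size)
            for i in range(int(n_samples / batch_size))]

def make_batches_keep_remainder(n_samples, batch_size):
    """Variant that also returns the final, shorter batch."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

full_only = make_batches(10, 4)
with_tail = make_batches_keep_remainder(10, 4)
print(full_only)
print(with_tail)
```

With batch_size = 1, as used in this article, nothing is dropped. If you keep a shorter final batch with a larger batch size, the label arrays used during training would need to match its length.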
Our next step is to train the SRGAN model. We iterate over the chosen number of epochs. In each epoch, we create fake_label, a NumPy array filled with zeros that labels the fake (generated) images, and real_label, a NumPy array filled with ones that labels the real images. Two empty lists, g_losses and d_losses, store the generator and discriminator losses, respectively.
For each batch, the generator produces fake images, and the discriminator is trained on both the fake images and the real images. The VGG network extracts features from the real high-resolution images, which serve as the target for the content loss when the generator is updated through the combined model. After iterating through all the batches, we compute the average generator and discriminator losses. In this way, the SRGAN is trained by updating the discriminator and generator in an adversarial manner while tracking their losses.
epochs = 1
#Enumerate training over epochs
for e in range(epochs):
    fake_label = np.zeros((batch_size, 1))
    real_label = np.ones((batch_size,1))
    g_losses = []
    d_losses = []
    #Enumerate training over batches.
    for b in tqdm(range(len(train_hr_batches))):
        lr_imgs = train_lr_batches[b]
        hr_imgs = train_hr_batches[b]
        fake_imgs = generator.predict_on_batch(lr_imgs)
        discriminator.trainable = True
        d_loss_gen = discriminator.train_on_batch(fake_imgs, fake_label)
        d_loss_real = discriminator.train_on_batch(hr_imgs, real_label)
        discriminator.trainable = False
        d_loss = 0.5 * np.add(d_loss_gen, d_loss_real)
        image_features = vgg.predict(hr_imgs)
        g_loss, _, _ = gan_model.train_on_batch([lr_imgs, hr_imgs], [real_label, image_features])
        d_losses.append(d_loss)
        g_losses.append(g_loss)
    g_losses = np.array(g_losses)
    d_losses = np.array(d_losses)
    g_loss = np.sum(g_losses, axis=0) / len(g_losses)
    d_loss = np.sum(d_losses, axis=0) / len(d_losses)
    print("epoch:", e+1 ,"g_loss:", g_loss, "d_loss:", d_loss)
    if (e+1) % 5 == 0:
        generator.save("gen_e_"+ str(e+1) +".h5")
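Because the discriminator is compiled with metrics=['accuracy'], each train_on_batch call on it is expected to return two numbers, a loss and an accuracy, so the printed d_loss is a pair rather than a single value. The NumPy sketch below replays the averaging arithmetic from the loop with made-up numbers; none of the values come from an actual training run.

```python
import numpy as np

# Made-up [loss, accuracy] pairs for two batches.
batch_results = [
    ([0.9, 0.25], [0.5, 0.75]),   # (result on generated images, result on real images)
    ([0.7, 0.50], [0.3, 1.00]),
]

d_losses = []
for d_loss_gen, d_loss_real in batch_results:
    # Same formula as in the training loop: average the fake and real results.
    d_losses.append(0.5 * np.add(d_loss_gen, d_loss_real))

d_losses = np.array(d_losses)
epoch_d_loss = np.sum(d_losses, axis=0) / len(d_losses)
print(epoch_d_loss)  # first entry: mean loss, second entry: mean accuracy
```

Summing over axis=0 averages each column separately, so the loss and the accuracy are never mixed together.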
Save the trained generator model.
generator.save("generator"+ str(e+1) +".h5")
Our final step is to check our SRGAN. Now let’s use the trained generator model to produce super-resolution images and compare them with the low-resolution and original high-resolution images.
from tensorflow.keras.models import load_model
from numpy.random import randint
[X1, X2] = [X_valid, y_valid]
ix = randint(0, len(X1), 1)
src_image, tar_image = X1[ix], X2[ix]
gen_image = generator.predict(src_image)
plt.figure(figsize=(16, 8))
plt.subplot(231)
plt.title('Low Resolution Image')
plt.imshow(src_image[0,:,:,:], cmap = 'gray')
plt.subplot(232)
plt.title('Super Resolution Image')
plt.imshow(cv2.cvtColor(gen_image[0,:,:,:], cv2.COLOR_BGR2GRAY),cmap = 'gray')
plt.subplot(233)
plt.title('Original High Resolution Image')
plt.imshow(tar_image[0,:,:,:], cmap = 'gray')
plt.show()
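The generator ends in a convolution without an activation and is trained here on unnormalized 0 to 255 pixel values, so its float output can fall slightly outside the valid range. A common precaution before displaying or saving is to round, clip, and cast to 8-bit. The helper `to_displayable` below is a hypothetical sketch of that step, not part of the original code.

```python
import numpy as np

def to_displayable(img):
    """Round, clip to the 8-bit range, and cast to uint8 for display or saving."""
    return np.clip(np.rint(img), 0, 255).astype(np.uint8)

# Toy generator output with values below 0 and above 255.
raw = np.array([[-12.4, 0.0],
                [130.6, 300.2]], dtype=np.float32)
shown = to_displayable(raw)
print(shown)
```

In the comparison code above, this would be applied to gen_image[0] before passing it to plt.imshow.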
We have successfully implemented SRGAN for QR code enhancement. The result shown here was obtained after just one epoch. Even so, the change in resolution is noticeable, and the output comes close to the original high-resolution image. Training for more epochs, say 10 or more, would likely improve it further. SRGANs have been a game-changer in the field of image super-resolution and are among the most powerful models for generating super-resolution images.
A. SRGAN stands for Super-Resolution Generative Adversarial Network. It is a deep learning approach to image super-resolution that converts low-resolution input images into high-resolution images.
A. A Generative Adversarial Network (GAN) has a generator and a discriminator. The two work in an adversarial manner: the generator produces high-resolution images, and the discriminator tries to tell them apart from real images.
A. Perceptual loss is the loss function used in SRGAN. It is a weighted sum of content loss and adversarial loss. Adversarial loss encourages the generator to produce realistic images. Content loss measures the similarity between generated and real high-resolution images.
A. Yes, SRGAN has limitations. Generating high-resolution images can be resource-intensive and time-consuming because of the deep architectures involved. There is also a risk of overfitting. Careful regularization and training strategies can help overcome these challenges.
A. Yes, there are models that have been trained on large datasets and are ready to use for generating high-resolution images. Some examples are Enhanced Super-Resolution Generative Adversarial Network (ESRGAN), SRResNet, and SRGAN-Tensorflow.
If you have any queries, then please connect with me on LinkedIn.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.