Imagine a world where fashion designers never run out of new ideas and every outfit we wear is a work of art. Sounds interesting, right? Well, we can make this happen with the help of Generative Adversarial Networks (GANs). GANs have blurred the line between reality and imagination. They are like a genie in a bottle that grants our creative wishes: with GANs we can even generate a convincing image of a sun rising over the Earth's surface, something that is impossible in real life.
Ian Goodfellow and his colleagues introduced this framework in 2014. They aimed to address the challenge of unsupervised learning, where a model learns from unlabelled data and generates new samples. GANs have revolutionized a number of industries with their capacity to produce fascinating and lifelike content, and the fashion industry is leading the way in embracing this potential. In this article, we will explore the potential of GANs and understand how they work their magic.
Generative Adversarial Networks are a class of machine learning models used for generating new, realistic data. They can produce highly realistic images, videos, and much more. A GAN consists of two neural networks: a generator and a discriminator.
The generator is a neural network (often convolutional, though in this article we use fully connected layers) that produces data samples intended to be indistinguishable from real ones. It learns how to create data from random noise, and it always tries to fool the discriminator.
The discriminator is a neural network (often convolutional) that tries to correctly classify samples as real or fake. It takes both real data and fake data produced by the generator and learns to tell them apart. For each input image, the discriminator outputs a score between 0 and 1, where 0 indicates the image is fake and 1 indicates it is real.
The training process involves two alternating stages: discriminator training and generator training, with both networks optimized together. The generator's goal is to produce data that is indistinguishable from real data, while the discriminator's goal is to correctly separate real from fake. When both networks play their parts well, we can say the model is optimized. Both are trained using backpropagation: whenever an error occurs, it is propagated back and the weights are updated.
The loss function used in GANs has two components, one for each network in the architecture. The generator's loss measures how well it can produce realistic data that the discriminator cannot distinguish from real samples; the generator tries to maximize the discriminator's error. The discriminator's loss measures how well it can classify real and fake samples; it tries to minimize misclassification.
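Formally, this adversarial game is captured by the minimax objective from the original GAN paper, where the discriminator $D$ tries to maximize the value function while the generator $G$ tries to minimize it:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$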
During training, the generator and discriminator are updated alternately, each trying to minimize its own loss: the generator by producing better samples to fool the discriminator, and the discriminator by classifying real and fake samples more accurately. This process continues until the GAN reaches the desired level of convergence.
Due to their ability to generate new, realistic data, GANs have become increasingly important in the field of machine learning and artificial intelligence. They have a wide variety of applications, such as image generation, video generation, and text-to-image synthesis, and they are revolutionizing many industries. Let's see some reasons why GANs are important in this field.
As research on GANs continues, we can expect many more miracles from this technology in the future.
Even though GANs have shown their ability to generate realistic and diverse data, they still have some challenges and limitations that need to be considered. Let's look at a few of them.
Despite these challenges and limitations, GANs have a bright future ahead. Numerous industries, including healthcare, finance, and entertainment, are expected to be transformed by them.
Fashion MNIST is a popular dataset used in machine learning for various purposes. It is a drop-in replacement for the original MNIST dataset, which contains handwritten digits from 0 to 9; Fashion MNIST instead contains images of various fashion items. The dataset has 70,000 images, of which 60,000 are training images and 10,000 are testing images. Each image is greyscale and 28 x 28 pixels. Fashion MNIST has 10 classes of fashion items: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot.
Initially, this dataset was created for developing machine learning classification models, and it is widely used as a benchmark for evaluating algorithms. It is easy to access and can be downloaded from various sources, including the TensorFlow and PyTorch libraries. Compared to the original digits MNIST dataset, it is more challenging: models must distinguish between fashion products that may have similar shapes or patterns, which makes it suitable for testing the robustness of various algorithms.
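As a quick illustration (an optional sanity check using the TensorFlow Keras loader), we can load the dataset and confirm the shapes and label range described above:

import tensorflow as tf

# Fashion MNIST ships with TensorFlow and comes pre-split
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
print(y_train.min(), y_train.max())  # 0 9 -> the 10 fashion classes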
The fashion industry has undergone a tremendous transition because of GANs, which have enabled creativity and change. The way we design, produce, and experience fashion has been revolutionized by them. Let's see some real-world applications of Generative Adversarial Networks (GANs) in the fashion industry.
We will now use a Generative Adversarial Network (GAN) to generate fashion samples from the Fashion MNIST dataset. Start by importing all the necessary libraries.
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Reshape, Dense, Flatten
from tensorflow.keras.layers import BatchNormalization, LeakyReLU
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
Next, we load the dataset. The Fashion MNIST dataset is built into TensorFlow, so we can load it directly through tf.keras. As discussed earlier, it contains greyscale images of 28 x 28 pixels, originally intended for classification tasks, and it comes pre-split into training and testing sets. Since GAN training only requires real samples, we keep just the training images.
The loaded data is then normalized to the range -1 to 1. We normalize to improve the stability and convergence of deep learning models during training, and this particular range matches the tanh output of the generator we build later. Finally, we add an extra dimension to the data array so that the real images match the expected image shape of (28, 28, 1), giving a 4D tensor representing batch size, height, width, and number of channels.
# Load fashion dataset
(X_train, _), (_, _) = tf.keras.datasets.fashion_mnist.load_data()
X_train = X_train / 127.5 - 1.
X_train = np.expand_dims(X_train, axis=3)
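A quick optional check confirms the preprocessing did what we expect:

# Sanity check: values now lie in [-1, 1] and a channel axis was added
print(X_train.shape)                  # (60000, 28, 28, 1)
print(X_train.min(), X_train.max())   # -1.0 1.0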
Next, set the dimensions for the generator and discriminator. Here gen_input_dim is the size of the generator's noise input, and img_shape defines the shape of the images the generator produces: 28 x 28 in greyscale, since we specify a single channel.
gen_input_dim = 100
img_shape = (28, 28, 1)
Now we define the generator model. It takes a single argument, the input dimension, and uses the Keras Sequential API to build the model. It has three fully connected layers with LeakyReLU activations and batch normalization, and the final layer uses a tanh activation to produce the output image. The function returns a Keras Model object that takes a noise vector as input and outputs a generated image.
def build_generator(input_dim):
    model = Sequential()
    # Three fully connected blocks: Dense -> LeakyReLU -> BatchNorm
    model.add(Dense(256, input_dim=input_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(1024))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    # Output layer: tanh keeps pixel values in [-1, 1], then reshape to an image
    model.add(Dense(np.prod(img_shape), activation='tanh'))
    model.add(Reshape(img_shape))
    noise = Input(shape=(input_dim,))
    img = model(noise)
    return Model(noise, img)
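As a quick sanity check (optional, not part of the original walkthrough), we can instantiate the generator and confirm that a random noise vector produces an image of the expected shape:

# Untrained generator: output is noise, but the shape should be (1, 28, 28, 1)
g = build_generator(gen_input_dim)
sample_noise = np.random.normal(0, 1, (1, gen_input_dim))
print(g.predict(sample_noise).shape)  # (1, 28, 28, 1)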
The next step is to build the discriminator. It is similar in structure to the generator, but it has only two hidden fully connected layers and a sigmoid activation on the last layer. It returns a Model object that takes an image as input and outputs the probability that the image is real.
def build_discriminator(img_shape):
    model = Sequential()
    # Flatten the 28 x 28 x 1 image into a vector
    model.add(Flatten(input_shape=img_shape))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    # Sigmoid output: probability that the input image is real
    model.add(Dense(1, activation='sigmoid'))
    img = Input(shape=img_shape)
    validity = model(img)
    return Model(img, validity)
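Similarly, an optional check shows the discriminator mapping a 28 x 28 x 1 image to a single probability:

# Untrained discriminator: the exact value is arbitrary, but it lies in [0, 1]
d = build_discriminator(img_shape)
random_img = np.random.uniform(-1, 1, (1, 28, 28, 1))
print(d.predict(random_img))  # e.g. [[0.49]]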
Now we compile the models. We use binary cross-entropy loss and the Adam optimizer with a learning rate of 0.0002 and beta_1 of 0.5. The discriminator is built and compiled with binary cross-entropy, a loss commonly used for binary classification tasks, along with an accuracy metric to evaluate it.
Similarly, we build the generator, but we do not compile it on its own as we did for the discriminator; it will be trained adversarially against the discriminator through a combined model. Here z is an input layer representing the random noise fed to the generator, which takes z as input and produces img as output. The discriminator's weights are frozen during training of the combined model. The generator's output is fed to the discriminator, which produces validity, a measure of how real the generated image looks. Finally, the combined model is created with z as input and validity as output; this is the model used to train the generator.
optimizer = Adam(0.0002, 0.5)
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=optimizer,
                      metrics=['accuracy'])
generator = build_generator(gen_input_dim)
z = Input(shape=(gen_input_dim,))
img = generator(z)
# Freeze the discriminator's weights inside the combined (generator-training) model
discriminator.trainable = False
validity = discriminator(img)
combined = Model(z, validity)
combined.compile(loss='binary_crossentropy',
                 optimizer=optimizer)
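One subtlety worth noting: Keras records the trainable flag at compile time. Because the discriminator was compiled before we set discriminator.trainable = False, it still learns when we call train_on_batch on it directly, while the combined model, compiled after the freeze, only updates the generator. We can verify this by inspecting the parameter counts:

# The discriminator's parameters appear as non-trainable inside `combined`,
# since `combined` was compiled after the freeze
combined.summary()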
It's time to train our GAN. The loop runs for the specified number of epochs. In each iteration, a batch of random real images is sampled from the training set, and a batch of fake images is generated by passing noise through the generator.
The discriminator is trained on both real and fake images, and its average loss is computed. The generator is then trained on noise through the combined model, and its loss is recorded. Since sample_interval is set to 1000, the losses are printed every 1000 iterations.
# Train GAN
epochs = 5000
batch_size = 32
sample_interval = 1000
d_losses = []
g_losses = []
for epoch in range(epochs):
    # Sample a random batch of real images
    idx = np.random.randint(0, X_train.shape[0], batch_size)
    real_images = X_train[idx]
    # Train discriminator on real (label 1) and fake (label 0) batches
    noise = np.random.normal(0, 1, (batch_size, gen_input_dim))
    fake_images = generator.predict(noise)
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
    d_losses.append(d_loss[0])
    # Train generator: try to make the discriminator output 1 for fakes
    noise = np.random.normal(0, 1, (batch_size, gen_input_dim))
    g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
    g_losses.append(g_loss)
    # Print progress
    if epoch % sample_interval == 0:
        print(f"Epoch {epoch}, Discriminator loss: {d_loss[0]}, Generator loss: {g_loss}")
Now let's look at some generated samples. Using matplotlib, we plot a grid of 5 rows and 10 columns of generated images. These samples resemble the dataset we used for training, and we can obtain better-quality samples by training for more epochs.
# Generate sample images
r, c = 5, 10
noise = np.random.normal(0, 1, (r * c, gen_input_dim))
gen_imgs = generator.predict(noise)
# Rescale images from [-1, 1] to [0, 1]
gen_imgs = 0.5 * gen_imgs + 0.5
# Plot images
fig, axs = plt.subplots(r, c)
cnt = 0
for i in range(r):
    for j in range(c):
        axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
        axs[i, j].axis('off')
        cnt += 1
plt.show()
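Finally, if you want to reuse the trained generator later without retraining, you can save it with the standard Keras API (the filename here is just an example):

# Save the trained generator for later sampling
generator.save('fashion_gan_generator.h5')
# Later: reloaded = tf.keras.models.load_model('fashion_gan_generator.h5')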
Generative Adversarial Networks (GANs) are a popular choice for many applications because of their unique architecture, training process, and ability to generate data. As with any technology, GANs have some challenges and limitations, and researchers are working to minimize them and create better GANs. Overall, we have learned how GANs work and understood their power and potential, and we have built a GAN to generate fashion samples using the Fashion MNIST dataset.
Hope you found this article useful. Connect with me on LinkedIn.
Q1. What are GANs, and what are they used for?
A. GANs, or Generative Adversarial Networks, generate synthetic data that closely resembles real data. They have applications in various fields, including image generation, video synthesis, text generation, and data augmentation.
Q2. What are the different types of GANs?
A. There are several types of GANs, including Conditional GANs (cGANs) that generate outputs based on specific conditions, CycleGANs that learn mappings between two domains, and Progressive GANs that generate images of increasing quality.
Q3. What are the main components of a GAN?
A. GANs have two main components: generator and discriminator networks. The generator generates synthetic data, while the discriminator distinguishes between real and fake data. Both networks are trained simultaneously in a competitive fashion.
Q4. What are the advantages of GANs?
A. The advantage of GANs is their ability to generate realistic and diverse synthetic data. They can capture complex patterns and generate new samples that exhibit similar characteristics to the training data. GANs have broad applications in various creative and data-driven domains.
Q5. What are the challenges of training GANs?
A. GANs can be challenging to train and stabilize. They are sensitive to hyperparameters and may suffer from mode collapse, where the generator fails to explore the entire data distribution. Evaluating GAN performance objectively is also a challenge, making it difficult to assess the quality of generated samples.