In the realm of artificial intelligence and computer vision, CycleGAN stands as a remarkable innovation that has redefined the way we perceive and manipulate images. This cutting-edge technique has revolutionized image-to-image translation, enabling seamless transformations between domains, such as turning horses into zebras or converting summer landscapes into snowy vistas. In this article, we’ll uncover the magic of CycleGAN and explore its diverse applications across various domains.
CycleGAN, short for “Cycle-Consistent Generative Adversarial Network,” is a novel deep-learning architecture that facilitates unsupervised image translation. Traditional GANs pit a generator against a discriminator in a min-max game, but CycleGAN introduces an ingenious twist. Instead of aiming for a one-way translation, CycleGAN focuses on achieving bidirectional mapping between two domains without relying on paired training data. This means that CycleGAN can convert images from domain A to domain B and, crucially, back from domain B to domain A while ensuring that the image remains coherent through the cycle.
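In symbols (borrowing the paper's notation, where G maps domain A to domain B and F maps B back to A), cycle consistency asks that

F(G(a)) \approx a \quad \text{for } a \in A, \qquad G(F(b)) \approx b \quad \text{for } b \in B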
The architecture of CycleGAN is characterized by its two generators, G_A and G_B, responsible for translating images from domain A to domain B and vice versa. These generators are trained alongside two discriminators, D_A and D_B, which evaluate the authenticity of translated images against real ones from their respective domains. The adversarial training forces the generators to produce images indistinguishable from real images in the target domain, while the cycle-consistency loss enforces that the original image can be reconstructed after the bidirectional translation.
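Concretely, the full training objective combines the two adversarial terms with a weighted cycle-consistency term. A sketch in the article's notation, where \lambda is the cycle-consistency weight (set to 10 in the original paper), with the identity term used in practice omitted for brevity:

\mathcal{L}(G_A, G_B, D_A, D_B) = \mathcal{L}_{GAN}(G_A, D_B) + \mathcal{L}_{GAN}(G_B, D_A) + \lambda \, \mathcal{L}_{cyc}(G_A, G_B)

\mathcal{L}_{cyc}(G_A, G_B) = \mathbb{E}_{a \sim A}\left[\lVert G_B(G_A(a)) - a \rVert_1\right] + \mathbb{E}_{b \sim B}\left[\lVert G_A(G_B(b)) - b \rVert_1\right]

The TensorFlow walkthrough below builds these pieces step by step, reusing the generator and discriminator definitions from the pix2pix example models.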
# import libraries
import tensorflow as tf
import tensorflow_datasets as tfdata
from tensorflow_examples.models.pix2pix import pix2pix
import os
import time
import matplotlib.pyplot as plt
from IPython.display import clear_output
# Dataset preparation
dataset, metadata = tfdata.load('cycle_gan/horse2zebra',
                                with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
test_horses, test_zebras = dataset['testA'], dataset['testB']
def random_crop(image):
    # crop back down to the model's 256 x 256 input size
    return tf.image.random_crop(image, size=[256, 256, 3])

def normalize(image):
    # scale pixel values from [0, 255] to [-1, 1]
    image = tf.cast(image, tf.float32)
    return (image / 127.5) - 1

def preprocess(image, label):
    # resize to 286 x 286, then randomly crop and mirror for augmentation
    image = tf.image.resize(image, [286, 286],
                            method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    image = random_crop(image)
    image = tf.image.random_flip_left_right(image)
    return normalize(image)
# Training set and testing set
AUTOTUNE = tf.data.AUTOTUNE
train_horses = train_horses.cache().map(
    preprocess, num_parallel_calls=AUTOTUNE).shuffle(
    1000).batch(1)
train_zebras = train_zebras.cache().map(
    preprocess, num_parallel_calls=AUTOTUNE).shuffle(
    1000).batch(1)
# Pull one sample batch from each domain for visualization
horse = next(iter(train_horses))
zebra = next(iter(train_zebras))
# Import the U-Net generators and PatchGAN discriminators from the pix2pix example
channels = 3
g_generator = pix2pix.unet_generator(channels, norm_type='instancenorm')  # A (horse) -> B (zebra)
f_generator = pix2pix.unet_generator(channels, norm_type='instancenorm')  # B (zebra) -> A (horse)
a_discriminator = pix2pix.discriminator(norm_type='instancenorm', target=False)
b_discriminator = pix2pix.discriminator(norm_type='instancenorm', target=False)
g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
a_disc_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
b_disc_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
# Visualize the (still untrained) translations
to_zebra = g_generator(horse)
to_horse = f_generator(zebra)
plt.figure(figsize=(8, 8))
contrast = 8  # boost contrast so the untrained outputs are visible
plt.subplot(2, 2, 1); plt.title('Horse'); plt.imshow(horse[0] * 0.5 + 0.5)
plt.subplot(2, 2, 2); plt.title('To Zebra'); plt.imshow(to_zebra[0] * 0.5 * contrast + 0.5)
plt.subplot(2, 2, 3); plt.title('Zebra'); plt.imshow(zebra[0] * 0.5 + 0.5)
plt.subplot(2, 2, 4); plt.title('To Horse'); plt.imshow(to_horse[0] * 0.5 * contrast + 0.5)
plt.show()
# Define loss functions
LAMBDA = 10  # weight of the cycle-consistency and identity terms
loss_obj = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real, generated):
    real_loss = loss_obj(tf.ones_like(real), real)
    generated_loss = loss_obj(tf.zeros_like(generated), generated)
    return (real_loss + generated_loss) * 0.5

def generator_loss(generated):
    return loss_obj(tf.ones_like(generated), generated)

def cycle_loss(real_image, cycled_image):
    return LAMBDA * tf.reduce_mean(tf.abs(real_image - cycled_image))

def identity_loss(real_image, same_image):
    return LAMBDA * 0.5 * tf.reduce_mean(tf.abs(real_image - same_image))
# Model training: one optimization step over a pair of unpaired images
@tf.function
def train(a_real, b_real):
    with tf.GradientTape(persistent=True) as tape:
        # Forward and backward cycles plus identity mappings
        b_fake = g_generator(a_real, training=True)
        a_cycled = f_generator(b_fake, training=True)
        a_fake = f_generator(b_real, training=True)
        b_cycled = g_generator(a_fake, training=True)
        a_same = f_generator(a_real, training=True)
        b_same = g_generator(b_real, training=True)
        a_disc_real = a_discriminator(a_real, training=True)
        b_disc_real = b_discriminator(b_real, training=True)
        a_disc_fake = a_discriminator(a_fake, training=True)
        b_disc_fake = b_discriminator(b_fake, training=True)
        # Loss calculation: adversarial + cycle-consistency + identity
        g_loss = generator_loss(b_disc_fake)
        f_loss = generator_loss(a_disc_fake)
        total_cycle = cycle_loss(a_real, a_cycled) + cycle_loss(b_real, b_cycled)
        total_g_loss = g_loss + total_cycle + identity_loss(b_real, b_same)
        total_f_loss = f_loss + total_cycle + identity_loss(a_real, a_same)
        a_disc_loss = discriminator_loss(a_disc_real, a_disc_fake)
        b_disc_loss = discriminator_loss(b_disc_real, b_disc_fake)
    # Compute and apply gradients for each of the four networks
    for model, optimizer, model_loss in [
            (g_generator, g_optimizer, total_g_loss),
            (f_generator, f_optimizer, total_f_loss),
            (a_discriminator, a_disc_optimizer, a_disc_loss),
            (b_discriminator, b_disc_optimizer, b_disc_loss)]:
        grads = tape.gradient(model_loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
# Model run
def generate_images(model, test_input):
    prediction = model(test_input)
    plt.figure(figsize=(12, 6))
    for i, (img, title) in enumerate(zip([test_input, prediction],
                                         ['Input Image', 'Translated Image'])):
        plt.subplot(1, 2, i + 1); plt.title(title); plt.axis('off')
        plt.imshow(img[0] * 0.5 + 0.5)  # rescale from [-1, 1] back to [0, 1]
    plt.show()

for epoch in range(10):
    start = time.time()
    n = 0
    for a_image, b_image in tf.data.Dataset.zip((train_horses, train_zebras)):
        train(a_image, b_image)
        if n % 10 == 0:
            print('.', end='')
        n += 1
    clear_output(wait=True)
    generate_images(g_generator, horse)
    print(f'Epoch {epoch + 1} took {time.time() - start:.1f} sec')
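Once training finishes, the same generate_images helper can be pointed at held-out test images. A minimal sketch, assuming the raw test_horses split loaded earlier and a deterministic resize-and-normalize step in place of the random training augmentation:

# Deterministic preprocessing for evaluation: no random crop or flip
def preprocess_test(image, label):
    image = tf.image.resize(image, [256, 256])
    return (tf.cast(image, tf.float32) / 127.5) - 1

test_horses_prepared = test_horses.map(
    preprocess_test, num_parallel_calls=AUTOTUNE).batch(1)
for test_image in test_horses_prepared.take(5):
    generate_images(g_generator, test_image)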
CycleGAN’s prowess extends far beyond its technical intricacies, finding application in diverse domains where image transformation is pivotal:
CycleGAN’s ability to translate images while preserving content and structure is potent for artistic endeavors. It facilitates the transfer of artistic styles between images, offering new perspectives on classical artworks or breathing new life into modern photography.
In machine learning, CycleGAN aids domain adaptation by translating images from one domain (e.g., real photos) to another (e.g., synthetic images), helping models trained on limited data generalize better to real-world scenarios. It also augments training data by creating variations of images, enriching the diversity of the dataset.
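As a rough sketch of the augmentation idea (a hypothetical helper, assuming the g_generator and train_horses pipeline defined in the code above), synthetic target-domain samples can be produced by simply running real images through a trained generator:

# Hypothetical augmentation helper: translate real domain-A batches with a
# trained CycleGAN generator to obtain extra synthetic domain-B images.
def augment_with_cyclegan(dataset, generator, limit=100):
    synthetic = []
    for batch in dataset.take(limit):  # batches of preprocessed images
        synthetic.append(generator(batch, training=False))
    return tf.concat(synthetic, axis=0)

# e.g. extra zebra-style images generated from real horse photos
synthetic_zebras = augment_with_cyclegan(train_horses, g_generator)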
CycleGAN’s talent for transforming landscapes between seasons aids urban planning and environmental studies. Simulating how areas look during different seasons supports decision-making for landscaping, city planning, and even predicting the effects of climate change.
It can generate augmented medical images for training machine learning models. Generating diverse variations of medical images (e.g., MRI scans) can improve model generalization and performance.
Satellite images captured under different lighting conditions, times of the day, or weather conditions can be challenging to compare. CycleGAN can convert satellite images taken at different times or under varying conditions, aiding in tracking environmental changes and urban development.
Game developers can create immersive experiences by transforming real-world images into the visual style of their virtual environments. This can enhance realism and user engagement in virtual reality and gaming applications.
CycleGAN’s transformative potential in image-to-image translation is undeniable. It bridges domains, morphs seasons, and infuses creativity into visual arts. As research and applications evolve, its impact promises to reach new heights, transcending the boundaries of image manipulation and ushering in a new era of seamless visual transformation. Some key takeaways from this article are:
Pix2Pix and CycleGAN are both effective tools for translating one image into another, but the biggest difference lies in the data they use: Pix2Pix requires well-paired training data, while CycleGAN does not.
CycleGAN uses three losses: a cycle-consistency loss, which compares the original image to its reconstruction after being translated to the other domain and back; an adversarial loss, which pushes the generators toward realistic images; and an identity loss, which helps preserve the image’s color composition.
Generative Adversarial Networks (GANs) are composed of two neural networks: a generator and a discriminator. A CycleGAN combines two such GANs, for a total of two generators and two discriminators.