Exploring Advanced Generative AI | Conditional VAEs

Hari Bhutanadhu Last Updated : 20 May, 2024


Introduction

Welcome to this article, where we’ll explore Generative AI with a focus on Conditional Variational Autoencoders (CVAEs). CVAEs combine the strengths of Variational Autoencoders (VAEs) with the ability to follow specific instructions, giving us fine-grained control over what is generated, for example which kind of image. Throughout this article, we’ll look at how CVAEs work, see how and why they are used in real-world scenarios, and walk through some easy-to-understand code examples that showcase their potential.

Conditional VAEs | Generative AI — Source : IBM

This article was published as a part of the Data Science Blogathon.

Introduction
Understanding Variational Autoencoders (VAEs)
Conditional Variational Autoencoders (CVAEs) Explained
Difference Between VAEs and CVAEs
Implementing CVAEs: Code Examples
Applications of CVAEs
Challenges and Future Directions
Conclusion
Frequently Asked Questions

Understanding Variational Autoencoders (VAEs)

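Before diving into CVAEs, let’s review the fundamentals of VAEs. A VAE is a generative model that combines an encoder network, which maps each input to a distribution over a low-dimensional latent space, with a decoder network, which maps latent vectors back to data. VAEs are used to learn the underlying structure of data and generate new samples.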


Let’s use a simple example involving coffee preferences to explain Variational Autoencoders (VAEs).

Imagine you want to represent everyone’s coffee preferences in your office:

Encoder: Each person summarizes their coffee choice (black, latte, cappuccino) with a few words (e.g., firm, creamy, mild).
Variation: Understands that even within the same choice (e.g., latte), there are variations in milk, sweetness, etc.
Latent Space: Creates a flexible space where coffee preferences can vary.
Decoder: Uses these summaries to make coffee for colleagues, with slight variations, respecting their preferences.
Generative Power: Can create new coffee styles that suit individual tastes but aren’t exact replicas.

VAEs work similarly, learning core features and variations in data to generate new, similar data with slight differences.

Here’s a simple Variational Autoencoder (VAE) implementation using Python and TensorFlow/Keras. This example uses the MNIST dataset for simplicity, but you can adapt it to other data types.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# Load and preprocess the MNIST dataset (pixel values scaled to [0, 1])
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Dimensionality of the latent space
latent_dim = 2

# Reparameterization trick as a layer; it also registers the KL term via add_loss
class Sampling(layers.Layer):
    def call(self, inputs):
        z_mean, z_log_var = inputs
        # KL divergence between N(z_mean, exp(z_log_var)) and N(0, I), summed over latent dims
        kl = -0.5 * tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
        self.add_loss(tf.reduce_mean(kl))
        # z = mean + std * epsilon, with epsilon ~ N(0, I)
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# Encoder
encoder_inputs = keras.Input(shape=(28, 28))
x = layers.Flatten()(encoder_inputs)
x = layers.Dense(256, activation='relu')(x)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)
z = Sampling()([z_mean, z_log_var])

# Decoder
decoder_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(256, activation='relu')(decoder_inputs)
x = layers.Dense(28 * 28, activation='sigmoid')(x)
decoder_outputs = layers.Reshape((28, 28))(x)

# Define the VAE model
encoder = keras.Model(encoder_inputs, [z_mean, z_log_var, z], name='encoder')
decoder = keras.Model(decoder_inputs, decoder_outputs, name='decoder')
vae_outputs = decoder(encoder(encoder_inputs)[2])
vae = keras.Model(encoder_inputs, vae_outputs, name='vae')

# Reconstruction loss: binary cross-entropy summed over the 784 pixels of each image.
# Keras calls a compiled loss with (y_true, y_pred) only, so the KL term is
# contributed by the Sampling layer above rather than passed in here.
def reconstruction_loss(x, x_decoded):
    x = tf.reshape(x, (-1, 28 * 28))
    x_decoded = tf.reshape(x_decoded, (-1, 28 * 28))
    return 28 * 28 * keras.losses.binary_crossentropy(x, x_decoded)

vae.compile(optimizer='adam', loss=reconstruction_loss)
vae.fit(x_train, x_train, epochs=10, batch_size=32, validation_data=(x_test, x_test))
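The two ingredients the VAE code relies on, the reparameterization trick and the closed-form KL term, can be checked in isolation. The following is a minimal NumPy sketch, not part of the original example; the helper names kl_to_standard_normal and reparameterize are made up for illustration. It shows that an encoder output matching the prior (mean 0, log-variance 0) has zero KL cost, that shifting the mean is penalized, and that repeated samples scatter around the encoded mean with the encoded spread.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(z_mean, exp(z_log_var)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)

def reparameterize(z_mean, z_log_var, rng):
    """Draw z = mean + std * epsilon with epsilon ~ N(0, I)."""
    epsilon = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon

rng = np.random.default_rng(0)

# An encoder output that already matches the prior costs nothing ...
prior_like = kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2)))
# ... while moving the mean away from 0 is penalized.
shifted = kl_to_standard_normal(np.array([[1.0, -1.0]]), np.zeros((1, 2)))

# Many samples drawn for one input scatter around z_mean with the encoded spread.
z_mean = np.tile([[2.0, -1.0]], (10000, 1))
z_log_var = np.full((10000, 2), np.log(0.25))  # variance 0.25, i.e. std 0.5
samples = reparameterize(z_mean, z_log_var, rng)

print(prior_like, shifted)
print(samples.mean(axis=0), samples.std(axis=0))
```

Because the randomness enters only through epsilon, gradients can flow through the mean and log-variance during training, which is the reason the trick is used.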

Conditional Variational Autoencoders (CVAEs) Explained

CVAEs extend the capabilities of VAEs by introducing conditional inputs. CVAEs can generate data samples based on specific conditions or information. For example, you can conditionally generate images of cats or dogs by providing the model with the desired class label as input.
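In practice, the condition is often supplied as a one-hot vector that is concatenated with the other inputs of the encoder and the decoder. The following is a minimal NumPy sketch of that step under stated assumptions: the two-class cat/dog setup, the random stand-in features, and the helper name one_hot are all illustrative and not taken from any particular model.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

num_classes = 2                      # 0 = cat, 1 = dog (illustrative)
rng = np.random.default_rng(0)
features = rng.random((3, 4)).astype(np.float32)   # stand-in for encoder features
labels = np.array([0, 1, 1])

condition = one_hot(labels, num_classes)
# Each row now carries its features followed by its class condition.
conditioned = np.concatenate([features, condition], axis=1)
print(conditioned.shape)  # (3, 6): 4 feature columns plus 2 label columns
```

The same concatenation is applied to the latent vector on the decoder side, so the decoder always knows which class it is asked to produce.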

Let us understand this using a real-world example: online shopping with CVAEs.

Imagine you’re shopping online for sneakers:

Basic VAE (no conditions): The website shows you random sneakers.
CVAE (with conditions): You select your preferences – color (red), size (10), and style (running).
Encoder: The website understands your choices and filters sneakers based on these conditions.
Variation: Recognizing that even within your conditions, there are variations (different shades of red, styles of running shoes), it considers those.
Latent Space: It creates a “sneaker customization space” where variations are allowed.
Decoder: Using your personalized conditions, it shows you sneakers that match your preferences closely.

CVAEs, like online shopping websites, use specific conditions (your preferences) to generate customized data (sneaker options) that closely align with your choices.

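Continuing from the Variational Autoencoder (VAE) example, a Conditional Variational Autoencoder (CVAE) is wired the same way, except that both the encoder and the decoder take the class label as a second input. On the MNIST dataset, this lets us generate digits conditionally based on a class label. The snippet below shows only the model wiring; it assumes that label is a one-hot label input and that the encoder and decoder graphs already concatenate it with their other inputs, as in the full implementation later in this article.

# Define the CVAE model
# label is a one-hot class-label input, e.g. keras.Input(shape=(10,))
encoder = keras.Model([encoder_inputs, label], [z_mean, z_log_var, z], name='encoder')
decoder = keras.Model([decoder_inputs, label], decoder_outputs, name='decoder')
# The sampled z (third encoder output) and the label are both fed to the decoder
cvae_outputs = decoder([encoder([encoder_inputs, label])[2], label])
cvae = keras.Model([encoder_inputs, label], cvae_outputs, name='cvae')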

Encoder | Decoder — Source : ResearchGate

Difference Between VAEs and CVAEs

VAE

VAEs are like artists who create art but with a bit of randomness.
They learn to create diverse variations of data without any specific instructions.
Useful for generating new data samples without conditions, like random art.

CVAE

CVAEs are like artists who can follow specific requests.
They generate data based on given conditions or instructions.
Useful for tasks where you want precise control over what’s generated, like turning a horse into a zebra while preserving the main features.

Implementing CVAEs: Code Examples

Let’s explore a simple Python code example using TensorFlow and Keras to implement a CVAE for generating handwritten digits:

# Import necessary libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model

# Define the CVAE model architecture
latent_dim = 2
input_shape = (28, 28, 1)
num_classes = 10

# Encoder network
encoder_inputs = keras.Input(shape=input_shape)
x = layers.Conv2D(32, 3, padding='same', activation='relu')(encoder_inputs)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)

# Conditional input
label = keras.Input(shape=(num_classes,))
x = layers.concatenate([x, label])

# Variational layers
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

# Reparameterization trick: z = mean + std * epsilon, with epsilon ~ N(0, I)
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_var])

# Decoder network (receives the latent vector and the same one-hot label)
decoder_inputs = layers.Input(shape=(latent_dim,))
x = layers.concatenate([decoder_inputs, label])
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(28 * 28 * 1, activation='sigmoid')(x)
x = layers.Reshape((28, 28, 1))(x)

# Create the models
encoder = Model([encoder_inputs, label], [z_mean, z_log_var, z], name='encoder')
decoder = Model([decoder_inputs, label], x, name='decoder')
cvae = Model([encoder_inputs, label], decoder([z, label]), name='cvae')

This code provides a basic structure for a CVAE model. To train it and generate images, you’ll still need a loss (reconstruction plus KL divergence, as in the VAE example), a dataset with one-hot encoded labels, and further tuning.
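Once such a model is trained, generating a chosen digit comes down to pairing random latent vectors with the one-hot label of that digit and passing both to the decoder. The sketch below prepares those inputs with NumPy; the helper name make_generation_inputs and its defaults are assumptions chosen to match latent_dim = 2 and num_classes = 10 from the code above, and the decoder call is shown only as a comment because it requires a trained model.

```python
import numpy as np

def make_generation_inputs(digit, n_samples, latent_dim=2, num_classes=10, seed=None):
    """Build (z, labels) pairs that ask a CVAE decoder for n_samples of one digit."""
    rng = np.random.default_rng(seed)
    # Latent vectors are drawn from the prior, z ~ N(0, I)
    z = rng.standard_normal((n_samples, latent_dim)).astype(np.float32)
    # Every sample gets the same condition: the one-hot label of the requested digit
    labels = np.zeros((n_samples, num_classes), dtype=np.float32)
    labels[:, digit] = 1.0
    return z, labels

z, labels = make_generation_inputs(digit=7, n_samples=5, seed=0)
print(z.shape, labels.shape)

# With a trained decoder built as above, generation would then be:
#   images = decoder.predict([z, labels])
```

Varying z while holding the label fixed is what should produce different handwriting styles of the same digit.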

Applications of CVAEs

CVAEs have applications in diverse domains, including:

Image-to-Image Translation: Conditional generative models can translate images from one domain to another while preserving content. Imagine you have a photo of a horse and want to turn it into a zebra while keeping the main features. Conceptually, a CVAE conditioned on the target domain would be used like this (cvae_generate is a hypothetical helper, not a library function):

# Translate a horse image to a zebra image
translated_image = cvae_generate(horse_image, target="zebra")

Style Transfer: CVAEs can be conditioned on an artistic style to transfer it between images. Suppose you have a picture and want it to look like a famous painting, say, Van Gogh’s “Starry Night.” Conceptually (cvae_apply_style is likewise a hypothetical helper):

# Apply "Starry Night" style to your photo
styled_image = cvae_apply_style(your_photo, style="Starry Night")

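Anomaly Detection: CVAEs are effective at detecting anomalies in data. Suppose you have a dataset of normal heartbeats and want to detect irregular ones. A CVAE trained only on normal heartbeats reconstructs them well, so a heartbeat that is reconstructed poorly can be flagged as an anomaly (cvae_detect_anomaly is a hypothetical helper, not a library function):

# Detect irregular heartbeats
is_anomaly = cvae_detect_anomaly(heartbeat_data)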
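One common way to turn a trained autoencoder into an anomaly detector is to threshold its reconstruction error. The sketch below illustrates the idea with NumPy only. Everything in it is an assumption made for illustration: toy_reconstruct stands in for a trained model that can only reproduce the normal pattern, the heartbeat is a synthetic sine wave, and the 99% quantile threshold is one possible choice rather than a recommended setting.

```python
import numpy as np

def reconstruction_errors(x, reconstruct):
    """Mean squared error between each sample and its reconstruction."""
    return np.mean(np.square(x - reconstruct(x)), axis=1)

def fit_threshold(normal_errors, quantile=0.99):
    """Pick a cut-off so that about 1% of known-normal samples would be flagged."""
    return np.quantile(normal_errors, quantile)

# Stand-in for a trained model: it can only reproduce the "normal" pattern,
# so anything that deviates from that pattern reconstructs poorly.
normal_pattern = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))

def toy_reconstruct(x):
    return np.tile(normal_pattern, (len(x), 1))

rng = np.random.default_rng(0)
normal_beats = normal_pattern + 0.05 * rng.standard_normal((200, 50))
irregular_beat = normal_pattern.copy()
irregular_beat[20:30] += 1.5          # an injected spike

# Calibrate on normal data, then score the suspicious sample.
threshold = fit_threshold(reconstruction_errors(normal_beats, toy_reconstruct))
score = reconstruction_errors(irregular_beat[None, :], toy_reconstruct)[0]
is_anomaly = score > threshold
print(is_anomaly)
```

With a real CVAE, toy_reconstruct would be replaced by a forward pass through the trained model, and the threshold would be calibrated on held-out normal data.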

Drug Discovery: CVAEs help in generating candidate molecular structures for drug discovery. Let’s say you need to find new molecules for a life-saving drug. A CVAE conditioned on a desired property can propose molecular structures (cvae_generate_molecule is a hypothetical helper, not a library function):

# Generate potential drug molecules
drug_molecule = cvae_generate_molecule("anti-cancer")

These applications show how CVAEs can transform images, apply artistic styles, detect anomalies, and aid in crucial tasks like drug discovery, all while keeping the underlying data meaningful and useful.

Challenges and Future Directions

Challenges

Mode Collapse: Think of CVAEs like a painter who sometimes forgets to use all their colors. Mode collapse happens when CVAEs keep using the same colors (representations) for different things. So, they might paint all animals in just one color, losing diversity. In VAE-style models this often shows up as posterior collapse, where the decoder learns to ignore the latent code.
Generating High-Resolution Images: Imagine asking an artist to paint a detailed, large mural on a tiny canvas. It’s challenging. CVAEs face a similar challenge when trying to create highly detailed, big pictures.

Future Goals

Researchers want to make CVAEs better:

Avoid Mode Collapse: They’re working on making sure the artist (CVAE) uses all the colors (representations) they have, creating more diverse and accurate results.
High-Resolution Art: They aim to help the artist (CVAE) paint bigger and more detailed murals (images) by improving the techniques used. This way, we can get impressive, high-quality artworks from CVAEs.

Conclusion

Conditional Variational Autoencoders represent a groundbreaking development in Generative AI. Their ability to generate data based on specific conditions opens up a world of possibilities in various applications. By understanding their underlying principles and implementing them effectively, we can harness the potential of CVAEs for advanced image generation and beyond.

Key Takeaways

Generative AI Advancement: CVAEs enable image generation with conditional inputs.
Simple Coffee Analogy: Think of VAEs like summarizing coffee preferences, allowing variations while preserving the essence.
Basic VAE Code: A beginner-friendly Python code example of a VAE is provided, using the MNIST dataset.
CVAE Implementation: The article includes a code snippet to implement a CVAE for conditional image generation.
Online Shopping Example: An analogy of online sneaker shopping illustrates CVAEs’ ability to customize data based on conditions.

Frequently Asked Questions

Q1. How do Conditional VAEs differ from VAEs?

A. While VAEs generate data with some randomness, CVAEs generate data that satisfies specific conditions or constraints. VAEs are like artists creating random art, whereas CVAEs are like artists who follow specific requests.

Q2. What’s the role of Conditional VAEs in the field of AI and machine learning?

A. Conditional Variational Autoencoders (CVAEs) are very useful in the world of AI. They can create customized data based on specific conditions, opening doors to many applications.

Q3. Are there open-source libraries or pre-trained models for CVAEs?

A. Yes, open-source libraries like TensorFlow and PyTorch provide the tools for building CVAEs. Code examples and some pre-trained models are available in their ecosystems to kickstart your projects.

Q4. Are there pre-trained CVAE models available for specific tasks?

A. Pre-trained CVAE models are less common compared to other architectures like Convolutional Neural Networks (CNNs). However, you can find pre-trained VAEs that you can adapt for your task by fine-tuning the model.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hari Bhutanadhu

I am Bhutanadhu Hari, a 2023 graduate of the Indian Institute of Technology Jodhpur (IITJ). I am interested in Web Development and Machine Learning, and I am most passionate about exploring Artificial Intelligence.
