What are Variational Autoencoders (VAEs)?

ANURAG SINGH CHOUDHARY Last Updated : 29 Jan, 2025


Variational Autoencoders (VAEs) are generative models designed to capture the underlying probability distribution of a given dataset and to generate novel samples from it. They use an encoder-decoder architecture: the encoder transforms input data into a latent representation, and the decoder reconstructs the original data from that representation. A VAE is trained to minimize the difference between the original and reconstructed data while keeping the latent representation close to a chosen prior distribution. This lets it learn the underlying data distribution and generate new samples that follow the same distribution.

One notable advantage of VAEs is their ability to generate new data samples resembling the training data. Because the VAE’s latent space is continuous, the decoder can generate new data points that seamlessly interpolate among the training data points. VAEs find applications in various domains like density estimation and text generation.
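Interpolation in a continuous latent space can be sketched as follows. The sketch assumes two latent codes `z_a` and `z_b` (for example, the encodings of two training images). The helper `interpolate_latents` and the 4-dimensional example codes are hypothetical. In practice, each interpolated vector would be passed through a trained decoder to produce an in-between sample.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps=8):
    """Hypothetical helper: num_steps latent vectors evenly spaced from z_a to z_b."""
    z_a = np.asarray(z_a, dtype=np.float64)
    z_b = np.asarray(z_b, dtype=np.float64)
    # Interpolation weights from 0 (pure z_a) to 1 (pure z_b), one per row
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - t) * z_a + t * z_b

# Hypothetical 4-dimensional latent codes of two training images
z_a = np.array([0.5, -1.0, 0.0, 2.0])
z_b = np.array([-0.5, 1.0, 1.0, 0.0])
path = interpolate_latents(z_a, z_b, num_steps=5)
print(path.shape)  # (5, 4)
```

Each row of `path` is one point on the straight line between the two codes. Decoding the rows in order gives a sequence of outputs that morphs from one sample to the other.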

In this article, you will learn what variational autoencoders are and how they work in deep learning. By the end, you should have a clear understanding of the VAE model and how to implement one.


This article was published as a part of the Data Science Blogathon.

What is a Variational Autoencoder?
The Architecture of Variational Autoencoder
Intuitions About the Regularization
Mathematical Details of VAEs
Neural Networks in the Model
Variational Autoencoder Execution
Visualization of Latent Space
How VAEs could be used in the Future
Conclusion
Frequently Asked Questions

What is a Variational Autoencoder?

Variational Autoencoders (VAEs) are a type of artificial neural network architecture that combines the power of autoencoders with probabilistic methods. They are used for generative modeling, meaning they can generate new data samples similar to the training data.

The Architecture of Variational Autoencoder

A VAE typically has two major components: an encoder network and a decoder network. The encoder network transforms the input data into a low-dimensional latent space, and the resulting representation is often called a “latent code”.

Various neural network topologies, such as fully connected or convolutional neural networks, can be used to implement the encoder network. The choice depends on the characteristics of the data. The encoder network produces the parameters needed to sample the latent code, such as the mean and variance of a Gaussian distribution.

Similarly, researchers can construct the decoder network using various types of neural networks, and its objective is to reconstruct the original data from the provided latent code.

[Figure: Example of VAE architecture (Architecture of Variational Autoencoders)]

A VAE comprises an encoder network that maps input data to a latent code and a decoder network that performs the inverse operation, translating the latent code back into reconstructed data. Through training, the VAE learns an optimized latent representation that captures the fundamental characteristics of the data and enables accurate reconstruction.

Intuitions About the Regularization

In addition to the architectural aspects, researchers apply regularization to the latent code, making it a vital element of VAEs. This regularization prevents overfitting by encouraging a smooth distribution of the latent code rather than simply memorizing the training data.

The regularization not only aids in generating new data samples that interpolate smoothly between training data points but also contributes to the VAE’s ability to generate novel data resembling the training data. Moreover, this regularization prevents the decoder network from perfectly reconstructing the input data, promoting the learning of a more general data representation that enhances the VAE’s capacity for generating diverse data samples.

Mathematically, VAEs express this regularization by adding a Kullback-Leibler (KL) divergence term to the loss function. The encoder network generates the parameters (e.g., mean and log-variance) of a Gaussian distribution from which the latent code is sampled. The loss function then includes the KL divergence between this learned latent distribution and a prior distribution, typically a standard normal distribution N(0, I). The KL term encourages the distributions of the latent variables to stay close to the prior.

Here is the formula for the KL divergence:

$$\mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big) = \mathbb{E}_{q(z\mid x)}\big[\log q(z\mid x) - \log p(z)\big]$$
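A common case is an encoder that outputs a diagonal Gaussian q(z|x) = N(μ, diag(σ²)) with a prior p(z) = N(0, I). For that case, this expectation has the closed form -½ Σ_j (1 + log σ_j² - μ_j² - σ_j²). The NumPy sketch below evaluates the closed form and compares it with a Monte Carlo estimate of the expectation above. The function names and the example values of μ and log σ² are hypothetical, chosen only for illustration.

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) (hypothetical helper)."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def log_normal(z, mu, log_var):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])          # example encoder mean (assumed)
log_var = np.array([0.0, -0.5])     # example encoder log-variance (assumed)

# Monte Carlo estimate of E_q[log q(z|x) - log p(z)] using samples z ~ q(z|x)
z = mu + np.exp(0.5 * log_var) * rng.standard_normal((200_000, 2))
mc = np.mean(log_normal(z, mu, log_var) - log_normal(z, 0.0, 0.0))

print(kl_diag_gaussian(mu, log_var), mc)  # the two values should be close
```

The closed form is what VAE implementations usually put in the loss, because it is exact and avoids extra sampling noise.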

[Figure: Intuitions about regularisation]

In summary, the regularization incorporated in VAEs plays a crucial role in enhancing the model’s capacity to generate fresh data samples while mitigating the risk of overfitting the training data.

Mathematical Details of VAEs

Probabilistic Framework and Assumptions

The probabilistic framework of a VAE can be outlined as follows:

Latent Variables

Latent variables are unobserved random variables. Introducing them lets a deep generative model represent a complex distribution over the observed data through a simpler (often exponential-family) conditional distribution for the observable variable. This setup revolves around a joint probability distribution over two variables, p(x, z). The variable x is observed in the dataset being analyzed, while the latent variable z remains hidden. The joint distribution can be written as p(x, z) = p(x|z)p(z).
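The factorization p(x, z) = p(x|z)p(z) also describes how data are generated: first draw z from the prior, then draw x from p(x|z). The sketch below makes three assumptions: a standard normal prior, a Bernoulli likelihood, and a hypothetical fixed linear "decoder" (random weights `W`, `b`) standing in for a trained neural network.

```python
import numpy as np

rng = np.random.default_rng(42)
latent_dim, data_dim = 2, 6

# Hypothetical decoder parameters; in a real VAE this is a trained neural network
W = rng.standard_normal((latent_dim, data_dim))
b = np.zeros(data_dim)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_joint(n):
    """Ancestral sampling from p(x, z) = p(x|z) p(z)."""
    z = rng.standard_normal((n, latent_dim))              # z ~ p(z) = N(0, I)
    probs = sigmoid(z @ W + b)                            # Bernoulli means of p(x|z)
    x = (rng.random((n, data_dim)) < probs).astype(int)   # x ~ Bernoulli(probs)
    return x, z

x, z = sample_joint(4)
print(z.shape, x.shape)  # (4, 2) (4, 6)
```

Only `x` would appear in a dataset. The VAE's job is to infer plausible values of the hidden `z` that produced it.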

Observed Variables

We have an observed variable x, which is assumed to follow a likelihood distribution p(x|z) (for example, a Bernoulli distribution).

Likelihood Distribution

The likelihood L(x, z) = p(x|z) depends on two variables. If we fix z, p(x|z) is a probability distribution over x. If we instead fix x (the observed data) and view it as a function of z, it should not be regarded as a distribution over z. In most cases it does not have the properties of a distribution, such as summing (or integrating) to 1. Nevertheless, in certain scenarios the likelihood function can formally meet these criteria.
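A toy Bernoulli model makes the distinction concrete. It is assumed here purely for illustration: a scalar latent z in [0, 1] with p(x = 1 | z) = z. For a fixed z, the probabilities of x = 0 and x = 1 sum to 1. For the fixed observation x = 1, however, the function L(z) = z integrates to 1/2 over [0, 1], so it is not a density over z. The helper `bernoulli_pmf` is hypothetical.

```python
import numpy as np

def bernoulli_pmf(x, z):
    """p(x | z) for a Bernoulli variable with success probability z (toy model)."""
    return z if x == 1 else 1.0 - z

# Fixed z: p(x|z) is a distribution over x, so it sums to 1
z_fixed = 0.25
total_over_x = bernoulli_pmf(0, z_fixed) + bernoulli_pmf(1, z_fixed)

# Fixed x = 1: integrate L(z) = p(x=1|z) over z in [0, 1] with the midpoint rule
n = 10_000
z_grid = (np.arange(n) + 0.5) / n
integral_over_z = np.mean([bernoulli_pmf(1, z) for z in z_grid])

print(total_over_x, integral_over_z)  # 1.0 and approximately 0.5
```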

The combined distribution of the latent and observable variables is as follows: p(x,z) = p(x|z)p(z). A joint probability distribution presents the probability distribution for multiple random variables.

A key goal of a VAE is to approximate the true posterior distribution of the latent variables, p(z|x). It does this with an encoder network that outputs a learned approximation, q(z|x).

Posterior Distribution

In Bayesian statistics, the posterior probability is the probability of an event updated in light of newly acquired information. It is obtained from the prior probability with Bayes’ theorem: p(z|x) = p(x|z)p(z) / p(x).
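When the latent variable is small and discrete, Bayes' theorem can be applied directly, because the evidence p(x) is a finite sum. The prior and likelihood values below are a toy example assumed for illustration. In a VAE, z is continuous and high-dimensional, so p(x) = ∫ p(x|z)p(z) dz has no closed form. That intractability is why the encoder learns an approximation q(z|x) instead of computing the posterior exactly.

```python
import numpy as np

# Toy model with three possible latent states (values assumed for illustration)
prior = np.array([0.5, 0.3, 0.2])          # p(z)
likelihood = np.array([0.1, 0.6, 0.3])     # p(x | z) for one observed x

# Bayes' theorem: p(z | x) = p(x | z) p(z) / p(x)
evidence = np.sum(likelihood * prior)      # p(x) = sum over z of p(x | z) p(z)
posterior = likelihood * prior / evidence

print(evidence)   # approximately 0.29
print(posterior)  # approximately [0.172, 0.621, 0.207]
```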

The VAE learns the model parameters by maximizing the Evidence Lower Bound (ELBO):

$$\mathrm{ELBO} = \mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big] - \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)$$

The ELBO consists of two terms. The first is the reconstruction term, which measures how well the VAE recovers the input data. The second, the KL divergence, measures the difference between the approximate posterior distribution q(z|x) and the prior distribution p(z).
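The bound can be checked numerically on a toy discrete model, where log p(x) can be computed exactly. The sketch below uses assumed values and a hypothetical helper `elbo`. It computes ELBO = E_q[log p(x|z)] - KL(q(z|x) || p(z)) for two choices of q. An arbitrary q gives a value below log p(x). The exact posterior makes the bound tight.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # p(z), toy values (assumed)
likelihood = np.array([0.1, 0.6, 0.3])   # p(x | z) for one observed x (assumed)
log_px = np.log(np.sum(likelihood * prior))

def elbo(q):
    """ELBO = E_q[log p(x|z)] - KL(q || p(z)) for a discrete q(z|x) (hypothetical helper)."""
    reconstruction = np.sum(q * np.log(likelihood))
    kl = np.sum(q * (np.log(q) - np.log(prior)))
    return reconstruction - kl

q_guess = np.array([1/3, 1/3, 1/3])             # an arbitrary approximation
q_exact = likelihood * prior / np.exp(log_px)   # the true posterior p(z|x)

print(log_px, elbo(q_guess), elbo(q_exact))
# elbo(q_guess) is below log_px; elbo(q_exact) equals log_px up to rounding
```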

Within this probabilistic framework, a VAE assumes that each data point is generated from a latent variable drawn from a specified prior distribution. The objective is to maximize the likelihood of the input data. Because this likelihood is intractable, the VAE maximizes the ELBO instead, and in doing so learns an approximation of the true posterior distribution.

Variational Inference Formulation

The formulation of Variational Inference in a VAE is as follows:

Approximate posterior distribution: We have an approximation of the posterior distribution q(z|x).
True posterior distribution: We have the true posterior distribution p(z|x).

The aim is to find a distribution q(z|x) that approximates the true posterior p(z|x) as closely as possible, as measured by the KL divergence.

The KL divergence KL(q(z|x) || p(z|x)) compares the two probability distributions q(z|x) and p(z|x) and measures how much they differ.

Because the true posterior is unknown, this KL divergence cannot be computed directly during VAE training. Instead, training maximizes the evidence lower bound (ELBO), which combines the reconstruction term and a KL divergence term. The reconstruction term assesses the model’s ability to reconstruct the input data, while the KL term measures the difference between the approximate posterior q(z|x) and the prior p(z).
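The link between maximizing the ELBO and minimizing KL(q(z|x) || p(z|x)) follows from a standard identity. Start from log p(x) = log p(x, z) - log p(z|x), which holds for every z, and take the expectation under q(z|x):

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q(z\mid x)}\big[\log p(x, z) - \log p(z\mid x)\big] \\
  &= \mathbb{E}_{q(z\mid x)}\big[\log p(x, z) - \log q(z\mid x)\big]
   + \mathbb{E}_{q(z\mid x)}\big[\log q(z\mid x) - \log p(z\mid x)\big] \\
  &= \underbrace{\mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big]
   - \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big)}_{\mathrm{ELBO}}
   + \mathrm{KL}\big(q(z\mid x)\,\|\,p(z\mid x)\big)
\end{aligned}
```

The quantity log p(x) does not depend on q, and a KL divergence is never negative. The ELBO is therefore a lower bound on log p(x), and increasing it with respect to q(z|x) reduces the KL divergence to the true posterior by the same amount.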

Neural Networks in the Model

VAEs are commonly implemented with neural networks for both the encoder and the decoder. During training, the VAE adjusts the parameters of the encoder and decoder networks to minimize two components: the reconstruction error and the KL divergence between the variational distribution q(z|x) and the prior p(z). This optimization is usually performed with stochastic gradient descent or a related optimizer such as Adam.
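Gradient-based training needs gradients to flow through the sampling of z. The standard way to achieve this is the reparameterization trick. Instead of sampling z ~ N(μ, σ²) directly, we sample ε ~ N(0, I) and compute z = μ + σ ⊙ ε. This makes z a deterministic, differentiable function of the encoder outputs μ and log σ². The NumPy sketch below checks that reparameterized samples have the intended mean and standard deviation. The function name `reparameterize` and the example values are hypothetical.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps (hypothetical helper)."""
    eps = rng.standard_normal(mu.shape)     # noise that does not depend on mu or log_var
    sigma = np.exp(0.5 * log_var)           # convert log-variance to standard deviation
    return mu + sigma * eps

rng = np.random.default_rng(0)
# The same assumed encoder output repeated 100,000 times
mu = np.full((100_000, 2), [1.0, -2.0])
log_var = np.full((100_000, 2), [0.0, np.log(0.25)])
z = reparameterize(mu, log_var, rng)

print(z.mean(axis=0), z.std(axis=0))  # close to [1, -2] and [1, 0.5]
```

Because all randomness sits in `eps`, the derivative of z with respect to μ is 1 and with respect to σ is ε. An autodiff framework can therefore backpropagate through the sample.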

Variational Autoencoder Execution

Before getting into the configuration of a Variational Autoencoder (VAE), it is critical first to understand the fundamental concepts. While VAE implementation can be intricate, we can simplify learning by following a logical and coherent structure.

Our approach will involve gradually introducing the fundamental concepts and progressively delving into implementation details. We will adopt a hands-on approach to enhance comprehension and provide illustrative examples throughout the learning journey.

Data Preparation

The code below loads the MNIST dataset, which is widely used for machine learning and computer vision tasks. It contains 70,000 grayscale images of handwritten digits (0-9), split into 60,000 training and 10,000 test images. Each image is 28×28 pixels and comes with a label indicating the digit it shows, which lets us link each image to its category. To prepare the data for training, the code normalizes the pixel values of both splits to the range [0, 1] by dividing them by 255. It then flattens each 28×28 image into a 784-dimensional vector so that it can be fed to the model's dense layers.

import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values from [0, 255] to [0, 1]
x_train = x_train.astype("float32") / 255.
x_test = x_test.astype("float32") / 255.
# Flatten each 28x28 image into a 784-dimensional vector
x_train = x_train.reshape((-1, 28*28))
x_test = x_test.reshape((-1, 28*28))

Model Definition

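In this model, an encoder and a decoder work together. The encoder maps each input image to the latent space through a dense hidden layer with a ReLU activation, followed by a dense output layer. The decoder takes a latent vector as input and reconstructs the original image through two dense layers. Note that this simplified model maps each input to a single latent vector. It has no mean and variance outputs, no sampling step, and no KL term, so it trains as a plain (deterministic) autoencoder rather than a full VAE.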

input_dim = 28*28    # size of a flattened MNIST image
hidden_dim = 512     # units in each hidden layer
latent_dim = 128     # size of the latent vector

Encoder Architecture

# Map a flattened image (input_dim pixels) to a latent vector of size latent_dim
encoder_input = tf.keras.Input(shape=(input_dim,))
encoder_hidden = tf.keras.layers.Dense(hidden_dim, activation='relu')(encoder_input)
latent = tf.keras.layers.Dense(latent_dim)(encoder_hidden)
encoder = tf.keras.Model(encoder_input, latent)

Decoder Architecture

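# Map a latent vector back to a flattened image
decoder_input = tf.keras.Input(shape=(latent_dim,))
decoder_hidden = tf.keras.layers.Dense(hidden_dim, activation='relu')(decoder_input)
# Sigmoid keeps the pixel outputs in [0, 1], as required by the binary cross-entropy loss
decoder_output = tf.keras.layers.Dense(input_dim, activation='sigmoid')(decoder_hidden)
decoder = tf.keras.Model(decoder_input, decoder_output)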

VAE Architecture

# Chain the encoder and decoder into one end-to-end model
inputs = tf.keras.Input(shape=(input_dim,))
latent = encoder(inputs)
outputs = decoder(latent)
vae = tf.keras.Model(inputs, outputs)

Training the Model

To train the model, we use the Adam optimizer and the binary cross-entropy loss. Training runs over shuffled mini-batches. For each batch, the loss is computed and its gradients are backpropagated to update the encoder and decoder weights. This is repeated for a fixed number of epochs.


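loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# Shuffle the training images and group them into mini-batches of 128
train_ds = tf.data.Dataset.from_tensor_slices(x_train).shuffle(10000).batch(128)

num_epochs = 50
for epoch in range(num_epochs):
    for x in train_ds:
        with tf.GradientTape() as tape:
            # Forward pass: encode and decode the batch
            reconstructed = vae(x)
            # Binary cross-entropy between the input pixels and the reconstruction
            loss = loss_fn(x, reconstructed)
        # Backpropagate and update the encoder and decoder weights
        grads = tape.gradient(loss, vae.trainable_variables)
        optimizer.apply_gradients(zip(grads, vae.trainable_variables))
    # Report the loss of the last mini-batch in the epoch
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.numpy():.4f}')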

Output:

Epoch 1: Loss - 0.3559
Epoch 2: Loss - 0.3550
...
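The training loop above optimizes only a reconstruction loss, and the model has no sampling step. It therefore does not yet include the variational parts described earlier. The sketch below shows one way to add them in TensorFlow, keeping the same layer sizes. The class name `VAE`, the helpers `vae_loss` and `train_step`, and the choice of a Bernoulli decoder that outputs logits are illustrative assumptions, not the article's original code. In this sketch, the encoder outputs a mean and a log-variance, and z is drawn with the reparameterization trick. The loss is the negative ELBO: binary cross-entropy summed over pixels plus the closed-form KL divergence to N(0, I).

```python
import tensorflow as tf

class VAE(tf.keras.Model):
    """Illustrative VAE sketch: Gaussian encoder q(z|x), Bernoulli decoder p(x|z)."""

    def __init__(self, input_dim=784, hidden_dim=512, latent_dim=128):
        super().__init__()
        # Encoder: shared hidden layer, then separate heads for mean and log-variance
        self.enc_hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")
        self.z_mean = tf.keras.layers.Dense(latent_dim)
        self.z_log_var = tf.keras.layers.Dense(latent_dim)
        # Decoder: one logit per pixel (Bernoulli likelihood)
        self.dec_hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")
        self.dec_logits = tf.keras.layers.Dense(input_dim)

    def encode(self, x):
        h = self.enc_hidden(x)
        return self.z_mean(h), self.z_log_var(h)

    def decode(self, z):
        return self.dec_logits(self.dec_hidden(z))

    def call(self, x):
        z_mean, z_log_var = self.encode(x)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)
        eps = tf.random.normal(tf.shape(z_mean))
        z = z_mean + tf.exp(0.5 * z_log_var) * eps
        return self.decode(z), z_mean, z_log_var


def vae_loss(x, logits, z_mean, z_log_var):
    """Negative ELBO averaged over the batch (hypothetical helper)."""
    # Reconstruction term: binary cross-entropy summed over the pixels
    recon = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=logits), axis=1)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    return tf.reduce_mean(recon + kl)


model = VAE()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

def train_step(x):
    """One gradient step on a mini-batch of flattened images with values in [0, 1]."""
    x = tf.cast(x, tf.float32)
    with tf.GradientTape() as tape:
        logits, z_mean, z_log_var = model(x)
        loss = vae_loss(x, logits, z_mean, z_log_var)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

To train on the normalized, flattened MNIST arrays, one could loop over `tf.data.Dataset.from_tensor_slices(x_train).shuffle(10000).batch(128)` and call `train_step` on each batch. Afterwards, new images can be generated with `tf.sigmoid(model.decode(tf.random.normal((5, 128))))`. Because the KL term pulls q(z|x) toward N(0, I), sampling latent vectors from a standard normal is a reasonable way to generate with this model.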

Generate Samples

To generate new samples, we draw five random latent vectors from a standard normal distribution (so latent_samples has shape (5, latent_dim)) and pass them through the decoder. The loop then plots the five generated images in a grid with one row and five columns.

import matplotlib.pyplot as plt

# Draw five random latent vectors from a standard normal distribution
latent_samples = tf.random.normal(shape=(5, latent_dim))
# Decode them into five flattened images
generated_samples = decoder(latent_samples)

# Plot the generated samples in a 1 x 5 grid
for i in range(5):
    plt.subplot(1, 5, i+1)
    plt.imshow(generated_samples[i].numpy().reshape(28, 28), cmap='gray')
    plt.axis('off')

plt.show()

Output:

Running this code displays a figure with five generated images arranged in one row and five columns. They are shown in grayscale (the 'gray' colormap) without axes. The better the model has learned the data distribution, the more closely these images should resemble handwritten MNIST digits.

Visualization of Latent Space

To gain insights into the latent space of a VAE, you can follow these steps:

Utilize the VAE to encode the training data points, projecting them into the latent space.
Employ a dimensionality reduction technique like t-SNE to map the high-dimensional latent space onto a 2D space suitable for visualization.
Plot the data points in the 2D space, allowing for a visual exploration of the latent space.

By following this process, you can effectively visualize and comprehend the underlying structure and distribution of the latent space in the VAE.

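import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Encode a subset of training images (t-SNE is slow on all 60,000 points)
num_points = 5000
latent_vectors = encoder(x_train[:num_points]).numpy()

# Project the latent vectors onto 2D with t-SNE
latent_2d = TSNE(n_components=2).fit_transform(latent_vectors)

# Plot the latent space, colouring each point by its digit label
plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=y_train[:num_points], cmap='viridis', s=2)
plt.colorbar()
plt.show()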

Output:

Visualizing the latent space in this way offers insight into how the model has organized the training data. It can reveal underlying patterns and relationships, such as whether images of the same digit lie close together.

How VAEs could be used in the Future

Personalized medicine

VAEs could be used to develop personalized medical treatments for patients based on their individual genetic makeup and medical history. For example, they could be used to design new drugs that are more effective and have fewer side effects.

New materials

VAEs could be used to design new materials with unique properties, such as stronger and lighter materials for aircraft or more efficient solar cells. For example, they could be used to design materials that can withstand extreme temperatures or pressures.

Creative AI

VAEs could be used to create new forms of art and entertainment, such as realistic images, videos, or music tailored to a user’s individual preferences. For example, they could be used to create more immersive and engaging video games or movies.

Scientific research

VAEs could be used to generate new scientific data for research purposes. For example, they could be used to generate new images of galaxies or proteins, or to create new simulations of physical systems.

Conclusion

A variational autoencoder (VAE) is an enhanced form of an autoencoder that incorporates regularization. This regularization mitigates overfitting and gives the latent space the properties needed for effective generation. As a generative model, a VAE shares a similar objective with generative adversarial networks. Like a conventional autoencoder, a VAE comprises an encoder and a decoder. Its training minimizes the reconstruction error between the encoded-decoded data and the original input, together with a KL divergence term that keeps the latent distribution close to the prior.

I hope you enjoyed this article and now have a clearer understanding of variational autoencoders, their role in deep learning, the VAE architecture, and how VAE models work. The key takeaways are summarized below.

Key Takeaways

Variational autoencoders (VAEs) can learn to reconstruct and generate new samples from a provided dataset.
By utilizing a latent space, VAEs can represent data continuously and smoothly, facilitating the generation of variations of the input data with smooth transitions.
The architecture of a VAE consists of an encoder network that maps the input data to the latent space, a decoder network responsible for reconstructing the data from the latent space, and a loss function that combines a reconstruction loss and a regularization term.
VAEs have demonstrated their utility in image generation, anomaly detection, and semi-supervised learning tasks.

Frequently Asked Questions

Q1. What exactly is a variational autoencoder?

A. Variational autoencoders (VAEs) are probabilistic generative models built from two neural networks. The encoder maps input data to a distribution over a latent space, and the decoder maps latent vectors back to the data space.

Q2. What are the advantages of VAEs?

A. One of the main benefits of VAEs is their ability to generate new data samples that closely resemble the training data. They achieve this through a continuous latent space, which enables the decoder to produce new data points that smoothly interpolate between existing training data points.

Q3. What is the most crucial drawback of VAEs?

A. A notable limitation of variational autoencoders is their tendency to produce blurry and unrealistic outputs. This issue arises from their approach to recovering data distributions and calculating loss functions.

Q4. Which is better, GAN or VAE?

A. GANs produce highly realistic images but can be challenging to train and work with. On the other hand, VAEs are generally easier to train but may not always achieve the same level of image quality as GANs.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

ANURAG SINGH CHOUDHARY

Passionate Machine learning professional and data-driven analyst with the ability to apply ML techniques and various algorithms to solve real-world business problems. I have always been fascinated by Mathematics and Numbers. Over the past few months, I have dedicated a considerable amount of time and effort to Machine Learning Studies.
