Variational Autoencoders (VAEs) are generative models explicitly designed to capture the underlying probability distribution of a given dataset and generate novel samples. They use an encoder-decoder architecture. The encoder transforms input data into a latent representation, and the decoder aims to reconstruct the original data from this latent representation. The VAE is trained to minimize the dissimilarity between the original and reconstructed data, enabling it to learn the underlying data distribution and generate new samples that conform to the same distribution.
One notable advantage of VAEs is their ability to generate new data samples resembling the training data. Because the VAE’s latent space is continuous, the decoder can generate new data points that seamlessly interpolate among the training data points. VAEs find applications in various domains like density estimation and text generation.
In this article, you will learn what variational autoencoders are and how they work in deep learning. By the end, you should have a solid understanding of the VAE model and how it works.
This article was published as a part of the Data Science Blogathon.
Variational Autoencoders (VAEs) are a type of artificial neural network architecture that combines the power of autoencoders with probabilistic methods. They are used for generative modeling, meaning they can generate new data samples similar to the training data.
A VAE typically has two major components: an encoder network and a decoder network. The encoder network transforms the input data into a low-dimensional latent space; the resulting representation is often called a “latent code”.
Various neural network topologies, such as fully connected or convolutional neural networks, can be investigated for implementing encoder networks. The architecture chosen is based on the characteristics of the data. The encoder network produces essential parameters, such as the mean and variance of a Gaussian distribution, necessary for sampling and generating the latent code.
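As a rough illustration of how the mean and variance produced by the encoder can be turned into a latent code, the sketch below draws a sample using the commonly used reparameterization form z = mean + std * eps. It uses NumPy only; the helper name `sample_latent` and the example values are assumptions made for illustration and are not part of this article's code.

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Draw z ~ N(mean, exp(log_var)) as mean + std * eps (hypothetical helper)."""
    eps = rng.standard_normal(mean.shape)   # noise drawn from N(0, I)
    std = np.exp(0.5 * log_var)             # convert log-variance to standard deviation
    return mean + std * eps

rng = np.random.default_rng(0)
mean = np.array([[0.5, -1.0]])      # assumed encoder output for one input
log_var = np.array([[0.0, -2.0]])   # assumed encoder output for one input
z = sample_latent(mean, log_var, rng)
print(z.shape)
```

Writing the sample as a deterministic function of the mean, the log-variance, and independent noise is what lets gradients flow back to the encoder parameters during training.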
Similarly, researchers can construct the decoder network using various types of neural networks, and its objective is to reconstruct the original data from the provided latent code.
[Figure: Example of VAE architecture]
A VAE comprises an encoder network that maps input data to a latent code and a decoder network that performs the inverse operation by translating the latent code back into reconstructed data. Through training, the VAE learns an optimized latent representation that captures the fundamental characteristics of the data, enabling precise reconstruction.
In addition to the architectural aspects, researchers apply regularization to the latent code, making it a vital element of VAEs. This regularization prevents overfitting by encouraging a smooth distribution of the latent code rather than simply memorizing the training data.
The regularization not only aids in generating new data samples that interpolate smoothly between training data points but also contributes to the VAE’s ability to generate novel data resembling the training data. Moreover, this regularization prevents the decoder network from perfectly reconstructing the input data, promoting the learning of a more general data representation that enhances the VAE’s capacity for generating diverse data samples.
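To make the idea of smooth interpolation concrete, here is a minimal sketch that builds a straight-line path between two latent codes. The function name `interpolate_latents` and the toy vectors are assumptions for illustration; in practice the two endpoints would be latent codes of real inputs.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return num_steps latent vectors evenly spaced from z_start to z_end."""
    weights = np.linspace(0.0, 1.0, num_steps)[:, None]   # shape (num_steps, 1)
    return (1.0 - weights) * z_start + weights * z_end

z_a = np.zeros(4)   # toy latent code (assumed)
z_b = np.ones(4)    # toy latent code (assumed)
path = interpolate_latents(z_a, z_b, num_steps=5)
print(path.shape)
```

Each row of `path` could then be passed to a trained decoder; with a well-regularized latent space, the decoded outputs should change gradually from one endpoint to the other.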
Mathematically, the regularization in VAEs is expressed by incorporating a Kullback-Leibler (KL) divergence term into the loss function. The encoder network generates the parameters (e.g., mean and log-variance) of a Gaussian distribution from which the latent code is sampled. The loss function includes the KL divergence between this learned distribution over the latent variables and a prior distribution, typically a standard normal distribution. The KL term encourages the latent variables to follow distributions similar to the prior.
Here is the formula for the KL divergence, where the expectation is taken over z drawn from q(z|x):
KL(q(z|x) || p(z)) = E[log q(z|x) − log p(z)]
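When q(z|x) is a diagonal Gaussian and the prior p(z) is a standard normal, this expectation has a well-known closed form. The sketch below computes it with NumPy; the function name `kl_to_standard_normal` is an assumption for illustration, and the diagonal-Gaussian setup is assumed rather than taken from this article's code.

```python
import numpy as np

def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mean**2 - np.exp(log_var), axis=-1)

# q equal to the prior: the divergence vanishes
kl_same = kl_to_standard_normal(np.zeros((1, 3)), np.zeros((1, 3)))
# q shifted away from the prior: the divergence is positive
kl_shifted = kl_to_standard_normal(np.ones((1, 3)), np.zeros((1, 3)))
print(kl_same, kl_shifted)
```

The divergence is zero only when q matches the prior exactly, and it grows as the mean moves away from zero or the variance moves away from one.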
In summary, the regularization incorporated in VAEs plays a crucial role in enhancing the model’s capacity to generate fresh data samples while mitigating the risk of overfitting the training data.
Probabilistic Framework and Assumptions
The probabilistic framework of a VAE can be outlined as follows:
A VAE is a deep generative model built around a latent variable z, a random variable that is never observed directly and follows a latent distribution p(z). The model pairs this prior with a simpler conditional distribution (usually from the exponential family) over the observable variable. This setup revolves around a joint probability distribution over two variables: p(x, z). While the variable x is readily observable in the dataset being analyzed, the variable z remains concealed. The joint distribution factorizes as p(x, z) = p(x|z)p(z).
We have an observed variable x, which is assumed to follow a likelihood distribution p(x|z) (for example, a Bernoulli distribution).
p(x|z) is a function of two variables. If we fix the value of z, it is a proper probability distribution over x and sums (or integrates) to 1. If we instead fix x at its observed value and view p(x|z) as a function of z, we obtain the likelihood function L(z; x), which should not be regarded as a distribution over z. In most cases, it does not adhere to the characteristics of a distribution, such as summing up to 1. Nevertheless, certain scenarios exist where the likelihood function formally meets the distribution criteria and sums to 1.
The combined distribution of the latent and observable variables is as follows: p(x,z) = p(x|z)p(z). A joint probability distribution presents the probability distribution for multiple random variables.
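A small discrete example may help here. The sketch below builds a joint distribution from a prior and a conditional distribution; the toy probability tables are assumptions chosen purely for illustration.

```python
import numpy as np

# Assumed toy model: z takes 2 values, x takes 3 values.
prior_z = np.array([0.3, 0.7])              # p(z)
likelihood = np.array([[0.2, 0.5, 0.3],     # p(x | z=0)
                       [0.6, 0.3, 0.1]])    # p(x | z=1)

joint = likelihood * prior_z[:, None]       # p(x, z) = p(x|z) p(z), rows index z
marginal_x = joint.sum(axis=0)              # p(x) = sum over z of p(x, z)

print(joint.sum())               # total probability of the joint
print(likelihood.sum(axis=1))    # for fixed z, p(x|z) sums to 1 over x
print(likelihood.sum(axis=0))    # for fixed x, the values need not sum to 1 over z
```

The last two lines show the asymmetry in p(x|z): it is a distribution over x for each fixed z, but for a fixed x it is generally not a distribution over z.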
The main purpose of a VAE is to understand the true posterior distribution of the latent variables, denoted as p(z|x). A VAE accomplishes this by employing an encoder network to approximate the genuine posterior distribution with a learned approximation q(z|x).
In Bayesian statistics, a posterior probability refers to the updated probability of an event in light of newly acquired information. The posterior is obtained by updating the prior probability with Bayes’ theorem: p(z|x) = p(x|z)p(z) / p(x).
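For a discrete latent variable, the posterior can be computed exactly. The following sketch applies Bayes' theorem to an assumed toy model (the probability tables and the helper name `posterior_z_given_x` are illustrative only). In a VAE the same quantity is intractable, which is why it is approximated by q(z|x).

```python
import numpy as np

prior_z = np.array([0.3, 0.7])              # p(z), assumed toy values
likelihood = np.array([[0.2, 0.5, 0.3],     # p(x | z=0)
                       [0.6, 0.3, 0.1]])    # p(x | z=1)

def posterior_z_given_x(x_index):
    """p(z | x) = p(x | z) p(z) / p(x) for one observed value of x."""
    unnormalized = likelihood[:, x_index] * prior_z   # p(x | z) p(z)
    return unnormalized / unnormalized.sum()          # divide by p(x)

posterior = posterior_z_given_x(0)
print(posterior)
```

Observing x = 0, which is more likely under z = 1, shifts the belief about z from the prior (0.3, 0.7) further toward z = 1.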
The VAE learns the model parameters by maximizing the Evidence Lower Bound (ELBO):
ELBO = E[log(p(x|z))] – KL(q(z|x)||p(z))
The ELBO consists of two terms. The first is the reconstruction term, which measures how well the VAE recovers the input data. The second, the KL divergence, measures the difference between the approximate posterior distribution q(z|x) and the prior distribution p(z).
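The two terms can be sketched numerically as follows. This is a minimal single-sample estimate that assumes a Bernoulli likelihood for p(x|z), a diagonal Gaussian q(z|x), and a standard normal prior; the function name `elbo_estimate` and the example values are hypothetical.

```python
import numpy as np

def elbo_estimate(x, x_prob, mean, log_var, eps=1e-7):
    """Single-sample ELBO estimate per example.

    x:       inputs in [0, 1], shape (batch, features)
    x_prob:  decoder Bernoulli means for one sampled z, same shape as x
    mean, log_var: parameters of q(z|x), shape (batch, latent_dim)
    """
    x_prob = np.clip(x_prob, eps, 1.0 - eps)   # avoid log(0)
    # Reconstruction term: log p(x|z) under a Bernoulli likelihood
    log_likelihood = np.sum(x * np.log(x_prob) + (1 - x) * np.log(1 - x_prob), axis=-1)
    # Regularization term: KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * np.sum(1.0 + log_var - mean**2 - np.exp(log_var), axis=-1)
    return log_likelihood - kl

x = np.array([[1.0, 0.0]])
x_prob = np.array([[0.9, 0.1]])
elbo = elbo_estimate(x, x_prob, mean=np.zeros((1, 2)), log_var=np.zeros((1, 2)))
print(elbo)
```

Training would maximize this quantity, or equivalently minimize its negative, averaged over the data.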
By employing a probabilistic framework, a VAE assumes that the input data is generated from latent variables that follow a specific probability distribution. The objective is to approximate the true posterior distribution while maximizing a lower bound on the likelihood of the input data.
The formulation of Variational Inference in a VAE is as follows:
The aim is to find a distribution q(z|x) that approximates the true posterior p(z|x) as closely as possible, with closeness measured by the KL divergence.
The KL divergence KL(q(z|x) || p(z|x)) compares the two probability distributions q(z|x) and p(z|x) and measures how much they differ.
During VAE training, we minimize this KL divergence indirectly by maximizing the evidence lower bound (ELBO), which combines the reconstruction term and the KL divergence to the prior. The reconstruction term assesses the model’s ability to reconstruct input data, while the KL term measures the difference between the approximate posterior and the prior.
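The link between the ELBO and the KL divergence to the true posterior can be written out as a short derivation. It uses only the definitions above (the joint p(x, z) = p(x|z)p(z), the approximate posterior q(z|x), and the true posterior p(z|x)):

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q(z|x)}\!\left[\log \frac{p(x, z)}{q(z|x)}\right]
   + \mathbb{E}_{q(z|x)}\!\left[\log \frac{q(z|x)}{p(z|x)}\right] \\
  &= \underbrace{\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
   - \mathrm{KL}\big(q(z|x)\,\|\,p(z)\big)}_{\text{ELBO}}
   + \mathrm{KL}\big(q(z|x)\,\|\,p(z|x)\big)
\end{aligned}
```

Since log p(x) does not depend on q, raising the ELBO necessarily lowers KL(q(z|x) || p(z|x)), and because that KL term is non-negative, the ELBO is always a lower bound on log p(x).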
Neural networks are commonly used to implement VAEs, where both the encoder and decoder components are implemented as neural networks. During the training process, the VAE adjusts the parameters of the encoder and decoder networks to minimize two key components: the reconstruction error and the KL divergence between the variational distribution and the prior distribution. Together these form the negative ELBO, so minimizing them also brings the variational distribution closer to the true posterior. This optimization task is often accomplished using techniques like stochastic gradient descent or other suitable optimization algorithms.
Before getting into the configuration of a Variational Autoencoder (VAE), it is critical first to understand the fundamental concepts. While VAE implementation can be intricate, we can simplify learning by following a logical and coherent structure.
Our approach will involve gradually introducing the fundamental concepts and progressively delving into implementation details. We will adopt a hands-on approach to enhance comprehension and provide illustrative examples throughout the learning journey.
The code below loads the MNIST dataset, a widely used dataset for machine learning and computer vision tasks. It contains 70,000 grayscale images of handwritten digits (0-9), split into 60,000 training images and 10,000 test images, each 28×28 pixels in size, along with labels indicating the digit shown in each image. To prepare the input data for training, the code normalizes the images by dividing all pixel values by 255. It then flattens each 28×28 image into a 784-dimensional vector so that the data has the shape expected by the dense layers of the model.
import tensorflow as tf
import numpy as np
# Load the MNIST images and labels
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Normalize the pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
# Flatten each 28x28 image into a 784-dimensional vector
x_train = x_train.reshape((-1, 28*28))
x_test = x_test.reshape((-1, 28*28))
In the VAE model, we have an encoder and a decoder that work together. The encoder maps the input image to the latent space using two dense layers, the first with a ReLU activation function. The decoder takes the latent vector as input and reconstructs the original image using two dense layers, ending in a sigmoid so that the outputs lie in [0, 1] like the normalized pixels. Note that this simplified model encodes each image to a single latent vector; it does not include the mean and log-variance sampling step or the KL term described earlier.
input_dim = 28*28
hidden_dim = 512
latent_dim = 128
# Encoder: input image -> hidden layer -> latent vector
encoder_input = tf.keras.Input(shape=(input_dim,))
encoder_hidden = tf.keras.layers.Dense(hidden_dim, activation='relu')(encoder_input)
latent = tf.keras.layers.Dense(latent_dim)(encoder_hidden)
encoder = tf.keras.Model(encoder_input, latent)
# Decoder: latent vector -> hidden layer -> reconstructed image in [0, 1]
decoder_input = tf.keras.Input(shape=(latent_dim,))
decoder_hidden = tf.keras.layers.Dense(hidden_dim, activation='relu')(decoder_input)
decoder_output = tf.keras.layers.Dense(input_dim, activation='sigmoid')(decoder_hidden)
decoder = tf.keras.Model(decoder_input, decoder_output)
# Full model: chain the encoder and the decoder
inputs = tf.keras.Input(shape=(input_dim,))
latent = encoder(inputs)
outputs = decoder(latent)
vae = tf.keras.Model(inputs, outputs)
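To train the model, we use the Adam optimizer and the binary cross-entropy loss function. Training is performed in mini-batches: for each batch, the loss is calculated and the gradients are backpropagated to update the weights. This process is repeated for a fixed number of epochs.
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()
num_epochs = 50
batch_size = 128
# Shuffle the training images and group them into mini-batches
dataset = tf.data.Dataset.from_tensor_slices(x_train.astype('float32'))
dataset = dataset.shuffle(len(x_train)).batch(batch_size)
for epoch in range(num_epochs):
    for x in dataset:
        # Record the forward pass so gradients can be computed
        with tf.GradientTape() as tape:
            reconstructed = vae(x)
            loss = loss_fn(x, reconstructed)
        grads = tape.gradient(loss, vae.trainable_variables)
        optimizer.apply_gradients(zip(grads, vae.trainable_variables))
    # Report the loss of the last batch in this epoch
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.numpy():.4f}')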
Output:
Epoch 1: Loss - 0.3559
Epoch 2: Loss - 0.3550
.
.
.
To generate new images, we draw five random latent vectors from a standard normal distribution, giving latent_samples a shape of (5, latent_dim), and pass them through the decoder. The for loop then iterates five times and uses the subplot function to arrange the generated samples in a grid with one row and five columns.
import matplotlib.pyplot as plt
# Generate samples
latent_samples = tf.random.normal(shape=(5, latent_dim))
generated_samples = decoder(latent_samples)
# Plot the generated samples
for i in range(5):
    plt.subplot(1, 5, i+1)
    plt.imshow(generated_samples[i].numpy().reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()
Output:
When you run this code, it generates a figure with five images decoded from random latent vectors, which are intended to resemble the MNIST digits the model was trained on. The images are displayed in a grid with one row and five columns, in grayscale using the ‘gray’ color map, without axes.
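To gain insights into the latent space of a VAE, you can encode the training images into latent vectors, project those vectors to two dimensions with t-SNE, and plot the result as a scatter plot colored by digit label.
By following this process, you can effectively visualize and comprehend the underlying structure and distribution of the latent space in the VAE.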
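import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
# Encode the training images into latent vectors
latent_vectors = encoder(x_train).numpy()
# Project the latent vectors to two dimensions with t-SNE
latent_2d = TSNE(n_components=2).fit_transform(latent_vectors)
# Plot the latent space, colored by digit label
plt.scatter(latent_2d[:, 0], latent_2d[:, 1], c=y_train, cmap='viridis')
plt.colorbar()
plt.show()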
Output:
Visualizing the latent space of a Variational Autoencoder (VAE) gives insight into the structure and organization of the data it was trained on. This visualization technique offers a valuable means of comprehending the underlying patterns and relationships within the data.
VAEs could be used to develop personalized medical treatments for patients based on their individual genetic makeup and medical history. For example, they could be used to design new drugs that are more effective and have fewer side effects.
VAEs could be used to design new materials with unique properties, such as stronger and lighter materials for aircraft or more efficient solar cells. For example, they could be used to design new materials that can withstand extreme temperatures or pressures.
VAEs could be used to create new forms of art and entertainment, such as realistic images, videos, or music tailored to a user’s individual preferences. For example, they could be used to create new video games or movies that are more immersive and engaging.
VAEs could be used to generate new scientific data for research purposes. For example, they could be used to generate new images of galaxies or proteins or to create new simulations of physical systems.
A variational autoencoder (VAE) is an enhanced form of an autoencoder that incorporates regularization techniques to mitigate overfitting and ensure desirable properties in the latent space for effective generative processes. Functioning as a generative system, VAEs share a similar objective with generative adversarial networks. Like a conventional autoencoder, a VAE comprises an encoder and a decoder. Its training aims to minimize the reconstruction error between the encoded-decoded data and the original input.
We hope you enjoyed this article and now have a clear understanding of variational autoencoders, how they are used in deep learning, the VAE architecture, and VAE models. The key takeaways are given below.
A. Variational autoencoders (VAEs) are probabilistic generative models built from two neural networks, an encoder and a decoder. The encoder network maps input data to a distribution over the latent space, and the decoder network reconstructs data from latent codes.
A. One of the main benefits of VAEs is their ability to generate new data samples that closely resemble the training data. They achieve this through a continuous latent space, enabling the decoder to produce new data points that smoothly interpolate between the existing training data points.
A. A notable limitation of variational autoencoders is their tendency to produce blurry and unrealistic outputs. This issue arises from their approach to recovering data distributions and calculating loss functions.
A. GANs produce highly realistic images but can be challenging to train and work with. On the other hand, VAEs are generally easier to train but may not always achieve the same level of image quality as GANs.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.