Unleashing Generative AI with VAEs, GANs, and Transformers

Babina Banjara Last Updated : 11 Aug, 2023

Introduction

Generative AI, an exciting field at the intersection of artificial intelligence and creativity, is revolutionizing various industries by enabling machines to generate new and original content. From generating realistic images and music compositions to creating lifelike text and immersive virtual environments, generative AI is pushing the boundaries of what machines can achieve. In this blog, we will embark on a journey to explore the promising landscape of generative AI with VAEs, GANs and Transformers, delving into its applications, advancements, and the profound impact it holds for the future.

Learning Objectives

Understand the fundamental concepts of generative AI, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Transformers.
Explore the creative potential of generative AI models and their applications.
Gain insights into the implementation of VAEs, GANs, and Transformers.
Explore the future directions and advancements in generative AI.

This article was published as a part of the Data Science Blogathon.

Defining Generative AI

Generative AI, at its core, involves training models to learn from existing data and then generate new content that shares similar characteristics. It breaks away from traditional AI approaches that focus on recognizing patterns and making predictions based on existing information. Instead, generative AI aims to create something entirely new, expanding the realms of creativity and innovation.

The Power of Generative AI

Generative AI has the power to unleash creativity and push the boundaries of what machines can accomplish. By understanding the underlying principles and models used in generative AI, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Transformers, we can grasp the techniques and methods behind this creative technology.

By leveraging algorithms and models, generative AI can produce diverse outputs such as images, music, and text that imitate human-made work and extend artistic expression in new directions.

Generative AI models, such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Transformers, play a key role in unlocking this power. VAEs capture the underlying structure of data and can generate new samples by sampling from a learned latent space. GANs introduce a competitive framework between a generator and discriminator, leading to highly realistic outputs. Transformers excel at capturing long-range dependencies, making them well-suited for generating coherent and contextually relevant content.

Let’s explore this in detail.

Variational Autoencoders (VAEs)

One of the fundamental models used in generative AI is the Variational Autoencoder or VAE. By employing an encoder-decoder architecture, VAEs capture the essence of input data by compressing it into a lower-dimensional latent space. From this latent space, the decoder generates new samples that resemble the original data.

VAEs have found applications in image generation, text synthesis, and more, allowing machines to create novel content that captivates and inspires.

VAE Implementation

In this section, we will implement a Variational Autoencoder (VAE) with TensorFlow and Keras.

Defining Encoder and Decoder Model

The encoder takes the input data, passes it through a dense layer with a ReLU activation function, and outputs the mean and log variance of the latent space distribution.

The decoder network is a feed-forward neural network that takes the latent space representation as input, passes it through a dense layer with a ReLU activation function, and produces the decoder outputs by applying another dense layer with a sigmoid activation function.

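import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Example hyperparameters (illustrative values; adjust them for your dataset)
input_dim = 784         # e.g. a flattened 28x28 image
hidden_dim = 256
latent_dim = 2
output_dim = input_dim  # the decoder reconstructs the input

# Define the encoder network: input -> hidden layer -> (mean, log variance)
encoder_inputs = keras.Input(shape=(input_dim,))
x = layers.Dense(hidden_dim, activation="relu")(encoder_inputs)
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

# Define the decoder network: latent vector -> hidden layer -> reconstruction in [0, 1]
decoder_inputs = keras.Input(shape=(latent_dim,))
x = layers.Dense(hidden_dim, activation="relu")(decoder_inputs)
decoder_outputs = layers.Dense(output_dim, activation="sigmoid")(x)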

Define Sampling Function

The sampling function takes the mean and log variance of a latent space as inputs and generates a random sample by adding noise scaled by the exponential of half the log variance to the mean.

# Define the sampling function for the latent space (reparameterization trick)
def sampling(args):
    z_mean, z_log_var = args
    # Match the shape of z_mean so any batch size works, including a smaller final batch
    epsilon = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_var])
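
To see what the sampling step does numerically, the following minimal sketch applies the same formula, z = mean + exp(0.5 * log_var) * epsilon, to a single latent dimension using only the Python standard library. The function name `sample_latent` and the example values are illustrative assumptions, not part of the model above.

```python
import math
import random
import statistics

def sample_latent(z_mean, z_log_var, rng):
    """Draw one sample z = mean + sigma * epsilon, with epsilon ~ N(0, 1)."""
    sigma = math.exp(0.5 * z_log_var)  # log variance -> standard deviation
    epsilon = rng.gauss(0.0, 1.0)      # noise that does not depend on the parameters
    return z_mean + sigma * epsilon

rng = random.Random(0)                   # fixed seed so the run is repeatable
z_mean, z_log_var = 2.0, math.log(0.25)  # variance 0.25, so std 0.5
samples = [sample_latent(z_mean, z_log_var, rng) for _ in range(20000)]

print(f"sample mean: {statistics.fmean(samples):.3f}")  # close to 2.0
print(f"sample std:  {statistics.pstdev(samples):.3f}")  # close to 0.5
```

Because the randomness lives entirely in epsilon, gradients can flow through the mean and log variance during training, which is why the sample is written in this form.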

Define Loss Function

The VAE loss function has the reconstruction loss, which measures the similarity between the input and output, and the Kullback-Leibler (KL) loss, which regularizes the latent space by penalizing deviations from a prior distribution. These losses are combined and added to the VAE model allowing for end-to-end training that simultaneously optimizes both the reconstruction and regularization objectives.

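# Wrap the decoder layers in a model and connect it to the sampled latent vector
decoder = keras.Model(decoder_inputs, decoder_outputs, name="decoder")
vae_outputs = decoder(z)
vae = keras.Model(inputs=encoder_inputs, outputs=vae_outputs)

# Reconstruction loss: per-sample binary cross-entropy, scaled by the input size
reconstruction_loss = keras.losses.binary_crossentropy(encoder_inputs, vae_outputs)
reconstruction_loss *= input_dim

# KL loss: summed over the latent dimensions of each sample
kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
kl_loss = -0.5 * tf.reduce_sum(kl_loss, axis=-1)

# Average the combined loss over the batch
# (add_loss with symbolic tensors follows the TF 2.x / Keras 2 functional style)
vae_loss = tf.reduce_mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)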
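
For reference, the KL term between a diagonal Gaussian and a standard normal prior has the closed form 0.5 * sum(mean^2 + exp(log_var) - log_var - 1), which is the same expression as in the code above with the sign distributed. The sketch below evaluates it in plain Python for a single example; `kl_to_standard_normal` is a hypothetical helper name used only here.

```python
import math

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(z_mean, z_log_var)
    )

# A posterior that already equals the prior incurs no penalty.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0

# Moving the mean away from zero increases the penalty.
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```

The penalty grows as the encoder's distribution drifts from the prior, which is what keeps the latent space well-behaved enough to sample from.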

Compile and Train the Model

The given code compiles and trains a Variational Autoencoder model using the Adam optimizer, where the model learns to minimize the combined reconstruction and KL loss to generate meaningful representations and reconstructions of the input data.

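# Compile and train the VAE (x_train, epochs, and batch_size must be defined beforehand;
# no targets are passed because the loss was attached with add_loss)
vae.compile(optimizer="adam")
vae.fit(x_train, epochs=epochs, batch_size=batch_size)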

Generative Adversarial Networks (GANs)

Generative Adversarial Networks have gained significant attention in the field of generative AI. Comprising a generator and a discriminator, GANs engage in an adversarial training process. The generator aims to produce realistic samples, while the discriminator distinguishes between real and generated samples. Through this competitive interplay, GANs learn to generate increasingly convincing and lifelike content.

GANs have been employed to generate images and videos, and even to simulate human voices, offering a glimpse into the potential of generative AI.

GAN Implementation

In this section, we will be implementing Generative Adversarial Networks (GANs) from scratch.

Defining Generator and Discriminator Network

This defines a generator network, represented by the ‘generator’ variable, which takes a latent space input and transforms it through dense layers with ReLU activations, followed by a sigmoid output layer, to generate synthetic data samples.

Similarly, it also defines a discriminator network, represented by the ‘discriminator’ variable, which takes data samples (real or generated) as input, passes them through dense layers with ReLU activations, and uses a final sigmoid unit to output the probability that the input is real.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define the generator network
generator = keras.Sequential([
    layers.Dense(256, input_dim=latent_dim, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(output_dim, activation="sigmoid")
])

# Define the discriminator network
discriminator = keras.Sequential([
    layers.Dense(512, input_dim=output_dim, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])

Defining GAN Model

The GAN model is defined by combining the generator and discriminator networks. The discriminator is compiled separately with binary cross-entropy loss and the Adam optimizer. During GAN training, the discriminator is frozen to prevent its weights from being updated. The GAN model is then compiled with binary cross-entropy loss and the Adam optimizer.

# Define the GAN model
gan = keras.Sequential([generator, discriminator])

# Compile the discriminator
discriminator.compile(loss="binary_crossentropy", optimizer="adam")

# Freeze the discriminator during GAN training
discriminator.trainable = False

# Compile the GAN
gan.compile(loss="binary_crossentropy", optimizer="adam")

Training the GAN

In the training loop, the discriminator and generator are trained separately using batches of real and generated data, and the losses are printed for each epoch to monitor the training progress. The GAN model aims to train the generator to produce realistic data samples that can deceive the discriminator.

import numpy as np

# Training loop (each iteration trains on a single batch)
for epoch in range(epochs):
    # Generate random noise
    noise = tf.random.normal(shape=(batch_size, latent_dim))

    # Generate fake samples and draw a random batch of real samples
    generated_data = generator(noise)
    idx = np.random.choice(x_train.shape[0], batch_size, replace=False)
    real_data = tf.cast(x_train[idx], generated_data.dtype)  # match dtypes before concat

    # Concatenate real and fake samples and create labels (1 = real, 0 = fake)
    combined_data = tf.concat([real_data, generated_data], axis=0)
    labels = tf.concat([tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))], axis=0)

    # Train the discriminator
    discriminator_loss = discriminator.train_on_batch(combined_data, labels)

    # Train the generator (via GAN model): fakes are labelled as real
    gan_loss = gan.train_on_batch(noise, tf.ones((batch_size, 1)))

    # Print the losses
    print(f"Epoch: {epoch+1}, Disc Loss: {discriminator_loss}, GAN Loss: {gan_loss}")
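
The labels in the loop above encode the adversarial objective: the discriminator is rewarded for scoring real samples near 1 and generated samples near 0, while the generator is rewarded when its samples are scored near 1. A minimal sketch of the two binary cross-entropy losses in plain Python follows; the helper names and the probabilities are made up for illustration.

```python
import math

def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for a single prediction."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Real samples are labelled 1, generated samples 0."""
    return 0.5 * (bce(1, d_real) + bce(0, d_fake))

def generator_loss(d_fake):
    """The generator wants its samples to be labelled 1."""
    return bce(1, d_fake)

# Hypothetical discriminator outputs early in training: fakes are easy to spot.
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # low discriminator loss
print(generator_loss(d_fake=0.1))                  # high generator loss

# Idealized equilibrium: the discriminator outputs 0.5 for everything.
print(discriminator_loss(d_real=0.5, d_fake=0.5))  # equals ln(2)
```

Watching both numbers together is more informative than either alone: a discriminator loss that collapses toward zero usually means the generator is no longer receiving a useful training signal.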

Transformers and Autoregressive Models

Transformers have revolutionized natural language processing. Their self-attention mechanism excels at capturing long-range dependencies in sequential data, which enables them to generate coherent and contextually relevant text.
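
The core of self-attention is scaled dot-product attention: every position scores every other position, and the scores become weights for averaging the value vectors. The NumPy sketch below shows that computation on random toy data; the learned query, key, and value projections of a real layer are left out, and the function name is an illustrative assumption.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Return attention outputs and weights for 2-D arrays of shape (seq_len, dim)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens with 8-dimensional embeddings (toy values)

# Self-attention: queries, keys, and values all come from the same sequence.
output, weights = scaled_dot_product_attention(x, x, x)
print(output.shape)          # (4, 8)
print(weights.sum(axis=-1))  # each row of weights sums to 1
```

Because each token can attend directly to any other token, the distance between two related words does not limit how strongly they can influence each other.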

Autoregressive models, such as the GPT series, generate outputs sequentially, conditioning each step on previous outputs. These models have proved invaluable in generating captivating stories, engaging dialogues, and even assisting in writing.
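
The sequential generation loop can be sketched with a toy stand-in for the model. In the example below, a hand-written lookup table plays the role of the network and conditions only on the previous token, whereas a Transformer conditions on the whole prefix; the vocabulary and scores are invented for illustration.

```python
# Toy "model": maps the previous token to scores for the next token.
NEXT_TOKEN_SCORES = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.6, "<end>": 0.4},
    "sat": {"<end>": 1.0},
}

def generate(max_tokens=10):
    """Greedy autoregressive decoding: always pick the highest-scoring next token."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        scores = NEXT_TOKEN_SCORES[tokens[-1]]    # condition on what was generated so far
        next_token = max(scores, key=scores.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)                 # the output becomes the next input
    return tokens[1:]

print(" ".join(generate()))  # the cat sat
```

Real systems usually sample from the predicted distribution instead of always taking the top token, which trades some predictability for variety.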

Transformer Implementation

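Keras does not ship a single built-in Transformer layer, so the following code assembles a minimal Transformer block with the functional API: an embedding layer, a causal multi-head self-attention sub-layer, a position-wise feed-forward sub-layer (each followed by a residual connection and layer normalization), and a dense output layer with a softmax activation. A model of this kind can be trained for tasks such as text generation, where it learns to process sequential data and predict the next token. The hyperparameters vocab_size, output_vocab_size, max_seq_length, d_model, num_heads, and dff are assumed to be defined beforehand.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define a minimal single-block Transformer (positional encodings omitted for brevity;
# use_causal_mask requires TensorFlow 2.10 or newer)
inputs = keras.Input(shape=(max_seq_length,), dtype="int32")
x = layers.Embedding(input_dim=vocab_size, output_dim=d_model)(inputs)

# Causal self-attention sub-layer with residual connection and layer normalization
attention = layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model // num_heads)
x = layers.LayerNormalization()(layers.Add()([x, attention(x, x, use_causal_mask=True)]))

# Position-wise feed-forward sub-layer with residual connection and layer normalization
ffn = layers.Dense(d_model)(layers.Dense(dff, activation="relu")(x))
x = layers.LayerNormalization()(layers.Add()([x, ffn]))

# Project to vocabulary probabilities at every position
outputs = layers.Dense(output_vocab_size, activation="softmax")(x)
transformer = keras.Model(inputs, outputs)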
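
Self-attention by itself does not know the order of tokens, so Transformer models add positional information to the token embeddings. One common choice is the sinusoidal encoding from the original Transformer paper; the NumPy sketch below computes it. The function name and the sizes are illustrative assumptions, and learned position embeddings are an equally valid alternative.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Return an array of shape (max_len, d_model) of sine/cosine encodings."""
    positions = np.arange(max_len)[:, np.newaxis]  # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]       # (1, d_model)
    # Each pair of dimensions (2i, 2i+1) shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((max_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16)
print(pe[0, :4])  # position 0: sine terms are 0, cosine terms are 1
```

The resulting matrix is simply added to the embedding output, row by row, before the first attention layer.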

Real-world Application of Generative AI

Generative AI is transforming various industries by enabling personalized experiences and new forms of creative work. Through techniques such as VAEs, GANs, and Transformers, it has made significant strides in personalized recommendations, creative content generation, and data augmentation. This section looks at how these real-world applications are reshaping industries and user experiences.

Personalized Recommendations

Generative AI techniques, such as VAEs, GANs, and Transformers, are revolutionizing recommendation systems by delivering highly tailored and personalized content. By analyzing user data, these models provide customized recommendations for products, services, and content, enhancing user experiences and engagement.

Creative Content Generation

Generative AI empowers artists, designers, and musicians to explore new realms of creativity. Models trained on vast datasets can generate stunning artwork, inspire designs, and even compose original music. This collaboration between human creativity and machine intelligence opens up new possibilities for innovation and expression.

Data Augmentation and Synthesis

Generative models play a crucial role in data augmentation by generating synthetic data samples to augment limited training datasets. This improves the generalization capability of ML models, enhancing their performance and robustness, from computer vision to NLP.

Personalized Advertising and Marketing

Generative AI transforms advertising and marketing by enabling personalized and targeted campaigns. By analyzing user behavior and preferences, AI models generate personalized advertisements and marketing content. It delivers tailored messages and offers to individual customers. This enhances user engagement and improves marketing effectiveness.

Challenges and Ethical Considerations

While generative AI brings forth new possibilities, it is vital to address the challenges and ethical considerations that accompany these powerful technologies. As we delve into the world of recommendations, creative content generation, and data augmentation, we must ensure fairness, authenticity, and responsible use of generative AI.

1. Biases and Fairness

Generative AI models can inherit biases present in training data, necessitating efforts to minimize and mitigate biases through data selection and algorithmic fairness measures.

2. Intellectual Property Rights

Clear guidelines and licensing frameworks are crucial to protect the rights of content creators and ensure respectful collaboration between generative AI and human creators.

3. Misuse of Generated Information

Robust safeguards, verification mechanisms, and education initiatives are needed to combat the potential misuse of generative AI for fake news, misinformation, or deepfakes.

4. Transparency and Explainability

Enhancing transparency and explainability in generative AI models can foster trust and accountability, enabling users and stakeholders to understand the decision-making processes.

By addressing these challenges and ethical considerations, we can harness the power of generative AI responsibly, promoting fairness, inclusivity, and ethical innovation for the benefit of society.

Future of Generative AI

The future of generative AI holds exciting possibilities and advancements. Here are a few key areas that could shape its development:

Enhanced Controllability

Researchers are working on improving the controllability of generative AI models. This includes techniques that allow users to have more fine-grained control over the generated outputs, such as specifying desired attributes, styles, or levels of creativity. Controllability will empower users to shape the generated content according to their specific needs and preferences.
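
One simple control knob already in wide use for text generation is the sampling temperature, which rescales the model's scores before they are turned into probabilities. The article does not name a specific technique, so the sketch below is offered only as an illustration of a "level of creativity" setting; the scores are made up.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; temperature reshapes the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures concentrate probability on the top-scoring option and give more predictable output, while high temperatures flatten the distribution and give more varied output.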

Interpretable and Explainable Outputs

Enhancing the interpretability of generative AI models is an active area of research. The ability to understand and explain why a model generates a particular output is crucial, especially in domains like healthcare and law where accountability and transparency are important. Techniques that provide insights into the decision-making process of generative AI models will enable better trust and adoption.

Few-Shot and Zero-Shot Learning

Currently, generative AI models often require large amounts of high-quality training data to produce desirable outputs. However, researchers are exploring techniques to enable models to learn from limited or even no training examples. Few-shot and zero-shot learning approaches will make generative AI more accessible and applicable to domains where acquiring large datasets is challenging.

Multimodal Generative Models

Multimodal generative models that combine different types of data, such as text, images, and audio, are gaining attention. These models can generate diverse and cohesive outputs across multiple modalities, enabling richer and more immersive content creation. Applications could include generating interactive stories, augmented reality experiences, and personalized multimedia content.

Real-Time and Interactive Generation

The ability to generate content in real-time and interactively opens up exciting opportunities. This includes generating personalized recommendations, virtual avatars, and dynamic content that responds to user input and preferences. Real-time generative AI has applications in gaming, virtual reality, and personalized user experiences.

As generative AI continues to advance, it is important to consider the ethical implications, responsible development, and fair use of these models. By addressing these concerns and fostering collaboration between human creativity and generative AI, we can unlock its full potential to drive innovation and positively impact various industries and domains.

Conclusion

Generative AI has emerged as a powerful tool for creative expression, revolutionizing various industries and pushing the boundaries of what machines can accomplish. With ongoing advancements and research, the future of generative AI holds tremendous promise. As we continue to explore this exciting landscape, it is essential to navigate the ethical considerations and ensure responsible and inclusive development.

Key Takeaways

VAEs offer creative potential by mapping data to a lower-dimensional space and generating diverse content, making them invaluable for applications like artwork and image synthesis.
GANs revolutionize AI-generated content through their competitive framework, producing highly realistic outputs such as deepfake videos and photorealistic artwork.
Transformers excel in generating coherent outputs by capturing long-range dependencies, making them well-suited for tasks like machine translation, text generation, and image synthesis.
The future of generative AI lies in improving controllability, interpretability, and efficiency through research advancements in multi-modal models, transfer learning, and training methods to enhance the quality and diversity of generated outputs.

Embracing generative AI opens up new possibilities for creativity, innovation, and personalized experiences, shaping the future of technology and human interaction.

Frequently Asked Questions

Q1: What is generative AI?

A1: Generative AI refers to the use of algorithms and models to generate new content, such as images, music, and text.

Q2: How do Variational Autoencoders (VAEs) work?

A2: VAEs consist of an encoder and a decoder. The encoder maps input data to a lower-dimensional latent space, capturing the essence of the data. The decoder reconstructs the original data from points in the latent space. It allows for the generation of new samples by sampling from this space.

Q3: What are Generative Adversarial Networks (GANs)?

A3: GANs consist of a generator and a discriminator. The generator generates new samples from random noise, aiming to fool the discriminator. The discriminator acts as a judge, distinguishing between real and fake samples. GANs are known for their ability to produce highly realistic outputs.

Q4: How do Transformers contribute to generative AI?

A4: Transformers excel in generating coherent outputs by capturing long-range dependencies in the data. They weigh the importance of different input elements. This makes them effective for tasks like machine translation, text generation, and image synthesis.

Q5: Can generative AI models be fine-tuned for specific tasks?

A5: Yes. Generative AI models can be fine-tuned and conditioned on specific input parameters or constraints to generate content that adheres to desired characteristics or styles. This allows for greater control over the generated outputs.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Babina Banjara

Technology can impact lives at a level that has never been realized in mankind's history. The idea that something I create can impact someone worldwide now or in the future drives my passion for Technology.

A dedicated ML Engineer and Tech enthusiast, proficient in training ML models. My current interests are advancing machine learning techniques, particularly in natural language processing, LLMs, and multimodal AI.
