Have you ever wondered how AI can create stunning images from scratch? That’s where Stable Diffusion comes in! It’s a fascinating concept in machine learning and generative AI, falling under the umbrella of generative models.
In this article, we’ll dive into the magic behind Stable Diffusion. We’ll explore its theoretical foundations, practical implementation, and some of its exciting applications. So, whether you’re a seasoned AI enthusiast or just curious about how machines can craft art, stick around! This is going to be a fun and enlightening journey.
Diffusion models are a fairly recent idea. In the 2015 paper “Deep Unsupervised Learning using Nonequilibrium Thermodynamics”, the authors described it like this:
The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data.
Here, the diffusion process is split into forward and reverse diffusion processes. The forward diffusion process turns an image into noise, and the reverse diffusion process is supposed to turn that noise into the image again.
In forward diffusion, we start with an image drawn from some structured (non-random) data distribution. We do not know that distribution, but our goal is to destroy its structure by repeatedly adding noise. At the end of the process, the result should be practically indistinguishable from pure noise.
Let’s look at an example. We will take the image below.
Our goal is to destroy the structure in the image above so that it becomes pure noise, like the image below.
Here is the forward process:
The image below shows the result after noise has been added t+1 times.
After iterating through our steps 11 times, we get a completely destroyed image.
Let $x_0$ represent the initial data (e.g., an image). The forward process generates a series of progressively noisier versions of this data, $x_1, x_2, \dots, x_T$, through the following iterative equation:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$
Here, $q$ is our forward process, and $x_t$ is the output of the forward pass at step $t$. $\mathcal{N}$ is a normal distribution, $\sqrt{1-\beta_t}\,x_{t-1}$ is its mean, and $\beta_t I$ defines its variance.
$\beta_t$ refers to the schedule, and its values range from 0 to 1. The value of $\beta_t$ is usually kept small to prevent the variance from exploding. The 2020 paper “Denoising Diffusion Probabilistic Models” uses a linear schedule; hence, the output looks like the below:
The images above show us the forward diffusion process using a linear schedule with 1000 time steps.
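The single-step update described above can be sketched in a few lines. This is a minimal illustration in pure Python, not the article's own implementation: the names `forward_step` and `forward_chain`, the flat list standing in for an image, and the constant toy value of 0.3 for every step (chosen only so that 11 steps are enough to swamp the signal) are all assumptions made for the example.

```python
import math
import random


def forward_step(x_prev, beta_t, rng):
    """Sample x_t from N(sqrt(1 - beta_t) * x_prev, beta_t * I)."""
    mean_scale = math.sqrt(1.0 - beta_t)  # shrinks the previous image
    noise_std = math.sqrt(beta_t)         # standard deviation of the added noise
    return [mean_scale * v + noise_std * rng.gauss(0.0, 1.0) for v in x_prev]


def forward_chain(x0, betas, rng):
    """Apply forward_step once per entry in betas; return x_0, x_1, ..., x_T."""
    trajectory = [list(x0)]
    for beta_t in betas:
        trajectory.append(forward_step(trajectory[-1], beta_t, rng))
    return trajectory


rng = random.Random(0)        # fixed seed so the run is repeatable
x0 = [1.0] * 2000             # hypothetical "image": 2000 pixels, all equal to 1
toy_betas = [0.3] * 11        # assumed toy schedule, NOT the linear schedule from the paper
trajectory = forward_chain(x0, toy_betas, rng)
x_T = trajectory[-1]
print(len(trajectory) - 1, "noising steps applied")
```

In practice the same update is usually written with an array library such as NumPy or PyTorch; plain lists are used here only to keep the sketch free of dependencies.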
In this case, $\beta_t$ ranges from 0.0001 to 0.02, and the mean and variance behave as shown below.
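To see how the mean and variance evolve under this schedule, one can tabulate them directly. The sketch below assumes the 1000-step linear schedule from 0.0001 to 0.02 and introduces the cumulative product $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$, a standard quantity in the diffusion literature; the function names are hypothetical.

```python
import math


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return beta_1..beta_T spaced evenly between beta_start and beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) for s <= t."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


betas = linear_beta_schedule()
alpha_bar = cumulative_alpha_bar(betas)

for t in (1, 250, 500, 750, 1000):
    mean_scale = math.sqrt(alpha_bar[t - 1])  # factor multiplying x_0 in the mean of x_t
    noise_var = 1.0 - alpha_bar[t - 1]        # accumulated noise variance at step t
    print(f"t={t:4d}  mean scale={mean_scale:.4f}  variance={noise_var:.4f}")
```

The mean scale decays toward 0 while the variance climbs toward 1, which is why the final step looks like standard Gaussian noise.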
Later, in 2021, researchers from OpenAI argued that the linear schedule is inefficient. As we have seen above, most of the information in the original image is lost after roughly half of the total steps, so the remaining steps add noise to something that is already close to noise. They designed their own schedule, called the cosine schedule, which destroys information more gradually. The improved schedule allowed them to reduce the number of steps to 50.
Latent samples from linear (top) and cosine (bottom) schedules respectively at linearly spaced values of t from 0 to T
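For reference, the cosine schedule can be sketched as follows. This follows the formulation in the 2021 OpenAI paper as far as we are aware of it: $\bar{\alpha}_t = f(t)/f(0)$ with $f(t) = \cos^2\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right)$, a small offset $s = 0.008$, and each $\beta_t$ clipped at 0.999. The function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math


def cosine_alpha_bar(num_steps=1000, s=0.008):
    """Return alpha_bar_0..alpha_bar_T for the cosine schedule (index = t)."""
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    f0 = f(0)
    return [f(t) / f0 for t in range(num_steps + 1)]


def cosine_beta_schedule(num_steps=1000, s=0.008, max_beta=0.999):
    """Derive beta_1..beta_T from consecutive alpha_bar ratios, clipped at max_beta."""
    a_bar = cosine_alpha_bar(num_steps, s)
    return [min(1.0 - a_bar[t] / a_bar[t - 1], max_beta)
            for t in range(1, num_steps + 1)]


alpha_bar_cos = cosine_alpha_bar()
betas_cos = cosine_beta_schedule()

# Fraction of the original signal variance left halfway through the process.
print(f"alpha_bar at t=500: {alpha_bar_cos[500]:.3f}")
```

Halfway through, the cosine schedule still retains roughly half of the signal variance, whereas the linear schedule with 1000 steps has already dropped below a tenth at the same point.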
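The complete forward process can be described as:
$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$
where $q(x_{1:T} \mid x_0)$ represents the joint distribution of the noisy data over all time steps. Because every step adds Gaussian noise, the steps can be collapsed into a closed form. Writing $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, we get
$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t) I\right)$$
With that equation, we can sample the noisy image at any arbitrary step $t$ directly from $x_0$, without iterating through all the intermediate steps.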
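Jumping straight to an arbitrary step can be sketched as below. The sketch relies on the standard closed form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$. The linear schedule values (0.0001 to 0.02 over 1000 steps), the helper names, and the constant toy image are assumptions for illustration.

```python
import math
import random


def alpha_bar_linear(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_1..alpha_bar_T for a linear beta schedule."""
    out, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        out.append(running)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t directly from x_0 in one shot; t is 1-indexed (1..T)."""
    a_bar = alpha_bar[t - 1]
    signal_scale = math.sqrt(a_bar)       # how much of x_0 survives
    noise_scale = math.sqrt(1.0 - a_bar)  # how much fresh noise is mixed in
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
alpha_bar = alpha_bar_linear()
x0 = [1.0] * 10_000                        # hypothetical flat "image"
x_500 = q_sample(x0, 500, alpha_bar, rng)    # no need to compute x_1..x_499 first
x_1000 = q_sample(x0, 1000, alpha_bar, rng)  # essentially pure noise
```

This shortcut is what makes training practical: a random step $t$ can be drawn for each training example and the corresponding noisy image produced with a single call instead of up to $T$ sequential ones.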
Here are the applications:
When implementing the forward process in practice, several considerations must be addressed:
In Stable Diffusion, the forward process progressively adds noise to the data until it becomes Gaussian noise. Understanding this procedure is essential to using diffusion models effectively. With a carefully chosen noise schedule and the ability to sample any step efficiently in closed form, the forward process lays the foundation for training models that generate data reliably.
Ans. The forward process in stable diffusion refers to the progressive noising of data, typically an image, over a series of steps to create a noisy version of the original input. This process is used in training diffusion models to learn how to reverse the noising process and generate high-quality samples.
Ans. The forward process incrementally adds Gaussian noise to the data at each time step. This creates a sequence of progressively noisier versions of the original data, allowing the model to learn the relationship between clean and noisy data.
Ans. The forward process is crucial because it gives the model the training data needed to learn the reverse process. By seeing how data becomes noisy, the model can learn to reverse the noise addition, which is essential for generating new, high-quality samples from noise.
Ans. Gaussian noise is typically added during the forward process. The noise is added in such a way that it progressively increases with each time step, degrading the original data more and more.
Ans. The number of steps in the forward process can vary but is usually set to a high number, such as 1,000 steps. This allows for a fine-grained progression of noise addition, aiding the model’s learning of the reverse process.