Guide to Image-to-Image Diffusion: A Hugging Face Pipeline

Mobarak Inuwa, Last Updated: 24 Oct, 2024


By applying modern state-of-the-art techniques, Stable Diffusion models make it possible to generate images and audio. Stable Diffusion works by transforming input data under the guidance of a text prompt to generate new, creative output. In this article, we will see how to generate new images from a given input image using a depth-to-image Stable Diffusion model from the diffusers library, running on the PyTorch backend with a Hugging Face pipeline. We use Hugging Face because it provides an easy-to-use Stable Diffusion pipeline for image generation.


Learning Objectives

Understand the concept of Stable Diffusion and its application in generating images and audio using modern state-of-the-art techniques.
Gain knowledge of the key components and techniques involved in Stable Diffusion, such as latent diffusion models, denoising autoencoders, variational autoencoders, U-Net blocks, and text encoders.
Explore common applications of diffusion models, including text-to-image, text-to-video, and text-to-3D generation.
Learn how to set up the environment for Stable Diffusion, including enabling a GPU and installing the necessary libraries and dependencies.
Develop practical skills in applying Stable Diffusion by loading and diffusing images, creating text prompts to guide the output, adjusting diffusion levels, and understanding the limitations and challenges associated with image generation using stable diffusion models.

This article was published as a part of the Data Science Blogathon.

What is Stable Diffusion?
The Concepts of Stable Diffusion
Common Applications of Diffusion
Setting Up Environment
Importing Dependencies
Instantiating the Pre-trained Diffusers
Preparing Image Data
Loading Image
Creating Text Prompts
Creating Negative Prompts
Adjusting Diffusion Level
Limitations of Diffusion Models
Conclusion
Frequently Asked Questions

What is Stable Diffusion?

Stable Diffusion is a latent diffusion model. It learns to generate data by reversing a gradual noising process that is applied in a compressed latent space rather than directly on pixels. It belongs to the family of deep generative neural networks. Its output can be guided (conditioned) by inputs such as an original image or a text prompt, which makes the results controllable; an unguided diffusion model, by contrast, produces less predictable output.

The Concepts of Stable Diffusion

Stable Diffusion uses a latent diffusion model (LDM), a probabilistic model. These models are trained like other deep learning models, but the training objective is to undo the repeated application of Gaussian noise (noise whose probability density function is the normal distribution) to the training images. The model can be viewed as a sequence of denoising autoencoders (DAEs). A DAE changes the reconstruction criterion of a standard autoencoder: the input is first corrupted with noise, and the network is trained to reconstruct the clean version. Applied step by step, these denoisers gradually remove the noise.
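To make the noising idea concrete, here is a minimal NumPy sketch of the forward (noising) process used in DDPM-style training, which latent diffusion models build on. The linear beta schedule, its endpoints (1e-4 to 0.02), the 1,000 steps, and the toy 4×64×64 latent are assumptions for illustration, not the exact Stable Diffusion configuration. The `recover_x0` function shows the algebra a denoiser relies on: if the added noise is known (a trained U-Net estimates it), the clean sample can be recovered.

```python
import numpy as np

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Per-step noise variances beta_t; a linear DDPM-style schedule (assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def add_noise(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)  # Gaussian noise
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def recover_x0(xt: np.ndarray, eps: np.ndarray, t: int, alpha_bar: np.ndarray) -> np.ndarray:
    """Invert the noising step when the noise is known; a trained denoiser predicts eps instead."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # fraction of the original signal variance kept after t steps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 64, 64))  # toy stand-in for a VAE latent
for t in (0, 250, 500, 999):
    xt, eps = add_noise(x0, t, alpha_bar, rng)
    print(f"t={t:4d}  signal weight={np.sqrt(alpha_bar[t]):.4f}")
```

As t grows, the weight on the original signal shrinks toward zero and x_t becomes almost pure noise; generation runs this process in reverse.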


In more detail, Stable Diffusion consists of three essential parts. The first is the variational autoencoder (VAE), which, in simple terms, is an artificial neural network that behaves like a probabilistic graphical model. Next is the U-Net block, a convolutional neural network (CNN) originally developed for image segmentation; in Stable Diffusion it performs the denoising. Lastly, there is the text encoder, which transforms the text prompts into an embedding space. Stable Diffusion v1 uses a pretrained CLIP ViT-L/14 text encoder, while Stable Diffusion 2 models, including the depth model used in this article, use OpenCLIP ViT-H/14.

The VAE encoder compresses the image from pixel space into a lower-dimensional latent space, where the diffusion process runs. Working in this smaller space makes diffusion far cheaper while the VAE preserves the important details. After denoising, the VAE decoder converts the latent back into a pixel image.
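As a quick worked example of how much the VAE compresses an image, the sketch below computes the latent shape, assuming the configuration commonly used by Stable Diffusion's VAE: a spatial downsampling factor of 8 and 4 latent channels. Under those assumptions, a 512×512 RGB image (786,432 values) becomes a 4×64×64 latent (16,384 values), 48 times smaller.

```python
def latent_shape(height: int, width: int, downsample: int = 8, latent_channels: int = 4) -> tuple:
    """Shape (channels, h, w) of the VAE latent for an image of the given size.
    Assumes Stable Diffusion's usual VAE: 8x spatial downsampling, 4 latent channels."""
    if height % downsample or width % downsample:
        raise ValueError(f"height and width must be multiples of {downsample}")
    return (latent_channels, height // downsample, width // downsample)

def compression_ratio(height: int, width: int, image_channels: int = 3) -> float:
    """How many pixel values correspond to each latent value."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)

for size in (512, 768):
    print(size, latent_shape(size, size), compression_ratio(size, size))
```

The ratio stays at 48 for any valid size, since 3 × 8 × 8 / 4 = 48.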

Common Applications of Diffusion

Let us quickly look at three common areas where diffusion models can be applied:

Text-to-Image: This approach uses no input image; instead, a piece of text (the “prompt”) is used to generate matching images.
Text-to-Video: Diffusion models generate videos from text prompts. Current work applies this in media for tasks such as creating online ad videos, explainer videos, short animations, and music videos.
Text-to-3D: This approach converts input text into 3D content.

Diffusion models can generate new, original images, providing content for your projects, materials, and even brand marketing. Instead of hiring a painter or photographer, you can generate your own images; instead of a voice-over artist, you can create your own audio. Now let’s look at image-to-image generation.


Setting Up Environment

This task requires a GPU and a development environment suited to processing images and graphics, so make sure a GPU is available if you want to follow along. Google Colab is a convenient option because it provides a suitable environment with GPU access. Follow the steps below to enable the GPU:

Open the Runtime menu at the top of the notebook.
Click the Change runtime type option.
Select GPU as the hardware accelerator and save.
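Once the runtime is switched, it is worth confirming from Python that PyTorch can actually see the GPU. The sketch below uses a hypothetical helper, `pick_device_and_dtype`, that falls back to the CPU with float32 when no GPU is found, since half-precision inference is poorly supported on CPU. If you run without a GPU, you could pass these values instead of "cuda" and torch.float16 when creating the pipeline, at the cost of much slower generation.

```python
import torch

def pick_device_and_dtype():
    """Hypothetical helper: return ("cuda", float16) when a GPU is visible, else ("cpu", float32)."""
    if torch.cuda.is_available():
        return "cuda", torch.float16
    # float16 kernels are poorly supported on CPU, so fall back to full precision
    return "cpu", torch.float32

device, dtype = pick_device_and_dtype()
print(f"Using device={device}, dtype={dtype}")
```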

You can find all the code on GitHub.

Importing Dependencies

Using the Hugging Face pipeline requires several dependencies. We will start by installing them and importing them into our project environment.

Installing Libraries

Some libraries are not preinstalled in Colab, so we need to install them before importing from them.

#  Installing required libraries
%pip install --quiet --upgrade diffusers transformers scipy ftfy

#  Installing accelerate for faster, lower-memory model loading
%pip install --quiet --upgrade accelerate

Let us explain the installations above. Besides diffusers and transformers, we install SciPy, a scientific computing library, and ftfy, which fixes garbled Unicode text; both are helper dependencies. We also install accelerate, which speeds up model loading and reduces memory use. The two major libraries are explained below.

Diffusers: Diffusers is a Hugging Face library that provides pretrained diffusion models, including Stable Diffusion, together with ready-to-use pipelines for generating images. We use it to access our pipeline and related components.

Transformers: Transformers provides pretrained models and APIs that spare us the cost of training from scratch. The depth-to-image pipeline relies on it for components such as its text encoder.

# Backend
import torch

# Internet access
import requests

# Pillow (PIL), the widely used Python imaging library
from PIL import Image

# Hugging Face depth-to-image pipeline
from diffusers import StableDiffusionDepth2ImgPipeline

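StableDiffusionDepth2ImgPipeline is the pipeline class that keeps our code short. It estimates a depth map from the input image and uses it, together with the text prompt, to condition generation, so the output keeps the structure of the original. All we need to pass is an image and a prompt describing our expectations.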

Instantiating the Pre-trained Diffusers

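Next, we create an instance of the pretrained pipeline we imported above and move it to our GPU, which PyTorch refers to as the "cuda" device. Loading the weights in float16 (half precision) roughly halves the GPU memory they need.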

#  Creating a variable instance of the pipeline
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
)

#  Assigning to GPU
pipe.to("cuda")

Preparing Image Data

Let’s define a helper function that checks whether a string is a valid URL. You can skip this step if you want to use a local image; in Colab, mount your Google Drive to access your own files.

# Accessing images from the web
import urllib.parse as parse
import os
import requests

# Verify that a string is a URL with a scheme, host, and path
def check_url(string):
    try:
        result = parse.urlparse(string)
        return all([result.scheme, result.netloc, result.path])
    except (TypeError, ValueError, AttributeError):
        return False

Next, we define a function that uses check_url to load an image from either a URL or a local path.

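# Load an image from a URL or a local path
def load_image(image_path):
    if check_url(image_path):
        response = requests.get(image_path, stream=True, timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors
        return Image.open(response.raw).convert("RGB")
    elif os.path.exists(image_path):
        return Image.open(image_path).convert("RGB")
    raise FileNotFoundError(f"Not a valid URL or existing file: {image_path}")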

Loading Image

Now we need an input image to transform. You can use your own photo; in this example, we use an online image for convenience. Feel free to use your own URL or local images.

# Loading an image URL (this signed link may expire; swap in your own URL or a local path if loading fails)
img = load_image("https://img.freepik.com/free-photo/stacked-tomatoes_1353-262.jpg?w=740&t=st=1683821147~exp=1683821747~hmac=708f16371d1e158d76c8ea5e8b9790fb68dc75009750b8328e17c21f16d36468")

# Displaying the Image
img
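Photos from the web or a phone can be much larger than the resolution Stable Diffusion was trained at, which costs memory and time. Below is a minimal sketch of a hypothetical `resize_for_diffusion` helper you could apply to `img` before passing it to the pipeline. The 768-pixel cap is an assumption you can tune, and each side is rounded down to a multiple of 8 because the VAE downsamples by a factor of 8.

```python
from PIL import Image

def resize_for_diffusion(img: Image.Image, max_side: int = 768, multiple: int = 8) -> Image.Image:
    """Hypothetical helper: convert to RGB, shrink so the longer side is at most
    max_side (never upscale), then round each side down to a multiple of `multiple`."""
    img = img.convert("RGB")
    scale = min(1.0, max_side / max(img.size))
    width = max(multiple, int(img.width * scale) // multiple * multiple)
    height = max(multiple, int(img.height * scale) // multiple * multiple)
    return img.resize((width, height), Image.LANCZOS)

# Example with a synthetic image; call resize_for_diffusion(img) on the loaded photo instead
demo = resize_for_diffusion(Image.new("RGB", (1480, 986)))
print(demo.size)
```

Rounding down (rather than up) keeps the result within the cap, and the aspect ratio changes only by a few pixels.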

Creating Text Prompts

Now that we have a usable image, let’s transform it with image-to-image diffusion. To guide the result, we pair the image with a prompt: a short text with keywords describing what we expect from the diffusion. Instead of generating a random new image, the prompt steers the model’s output.

Note that we set strength to 0.7, a moderate value on its 0 to 1 scale. Also note that negative_prompt is set to None; we will look at it more closely later.

# Setting Image prompt
prompt = "Some sliced tomatoes mixed"

# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=None, strength=0.7).images[0]

You can repeat this step on new images. The method stays the same:

Load the image to be transformed.

Write a text description of the target image.

Try creating some examples of your own.
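When experimenting with your own examples, it helps to make runs reproducible and to try several prompts on the same image. The sketch below assumes `pipe` is the pipeline created earlier; it passes a seeded `torch.Generator` through the pipeline's `generator` argument, so the same seed, prompt, and setup should give the same result. The helper name `generate_variations` and its `device` parameter are hypothetical; the generator is created on the same device the pipeline runs on.

```python
import torch

def generate_variations(pipe, image, prompts, strength=0.7, seed=0, negative_prompt=None, device="cuda"):
    """Hypothetical helper: run the depth-to-image pipeline once per prompt,
    re-seeding each run so results are repeatable and comparable."""
    results = {}
    for prompt in prompts:
        # A fresh generator per prompt means every prompt starts from the same noise
        generator = torch.Generator(device=device).manual_seed(seed)
        output = pipe(
            prompt=prompt,
            image=image,
            negative_prompt=negative_prompt,
            strength=strength,
            generator=generator,
        )
        results[prompt] = output.images[0]  # .images is a list of generated images
    return results

# Usage with the GPU pipeline and image from the previous steps:
# outputs = generate_variations(pipe, img, ["Some sliced tomatoes mixed", "Tomatoes on a wooden plate"])
```

Re-seeding per prompt isolates the effect of the prompt text: differences between outputs come from the wording, not from different starting noise.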

Creating Negative Prompts

Another option is a negative prompt, which describes what we do not want in the output and steers the result away from it. This makes the pipeline more flexible. We pass it through the negative_prompt argument.

# Loading an image URL
img = load_image("https://img.freepik.com/free-photo/stacked-tomatoes_1353-262.jpg?w=740&t=st=1683821147~exp=1683821747~hmac=708f16371d1e158d76c8ea5e8b9790fb68dc75009750b8328e17c21f16d36468")

# Displaying the Image
img

# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"

# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.7).images[0]
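How does a negative prompt steer the output? To our understanding, Stable Diffusion pipelines in diffusers use classifier-free guidance: at each denoising step the U-Net predicts the noise twice, once conditioned on the prompt and once on the negative prompt (an empty string when none is given), and the two predictions are combined as in the sketch below. `guidance_scale` (7.5 by default in this pipeline, as far as we know) controls how far the result is pushed away from the negative prompt. The NumPy sketch uses random arrays as stand-ins for real U-Net outputs, and `guided_noise` is a hypothetical name.

```python
import numpy as np

def guided_noise(eps_prompt: np.ndarray, eps_negative: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: start from the negative-prompt prediction and move
    guidance_scale times along the direction toward the prompt prediction."""
    return eps_negative + guidance_scale * (eps_prompt - eps_negative)

rng = np.random.default_rng(0)
eps_prompt = rng.standard_normal((4, 64, 64))    # stand-in for U-Net output given the prompt
eps_negative = rng.standard_normal((4, 64, 64))  # stand-in for U-Net output given the negative prompt

for scale in (0.0, 1.0, 7.5):
    combined = guided_noise(eps_prompt, eps_negative, scale)
    # Distance from the negative-prompt prediction grows linearly with the scale
    print(scale, float(np.linalg.norm(combined - eps_negative)))
```

A scale of 1.0 simply uses the prompt prediction; larger scales extrapolate past it, away from whatever the negative prompt describes.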

Adjusting Diffusion Level

To control how much the new image departs from the original, we change the strength value. Below, we observe the effect of different strength levels on the same image.

At strength = 0.1

# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"

# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.1).images[0]

At strength = 0.4

# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"

# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=0.4).images[0]

At strength = 1.0

# Setting Image prompt
prompt = ""
n_prompt = "rot, bad, decayed, wrinkled"

# Assigning to pipeline
pipe(prompt=prompt, image=img, negative_prompt=n_prompt, strength=1.0).images[0]
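The three cells above differ only in the strength value, so a loop is a more compact way to run the comparison. The sketch below is hypothetical (`strength_sweep` and `side_by_side` are not part of diffusers); it reuses the `pipe`, `img`, `prompt`, and `n_prompt` variables from the cells above and pastes the results into a single PIL image for side-by-side viewing.

```python
from PIL import Image

def strength_sweep(pipe, image, prompt, negative_prompt, strengths=(0.1, 0.4, 0.7, 1.0)):
    """Run the pipeline once per strength value and return the images in order."""
    return [
        pipe(prompt=prompt, image=image, negative_prompt=negative_prompt, strength=s).images[0]
        for s in strengths
    ]

def side_by_side(images):
    """Paste images left to right on one white canvas (top-aligned if heights differ)."""
    width = sum(im.width for im in images)
    height = max(im.height for im in images)
    canvas = Image.new("RGB", (width, height), "white")
    x = 0
    for im in images:
        canvas.paste(im, (x, 0))
        x += im.width
    return canvas

# Usage with the objects defined earlier:
# side_by_side(strength_sweep(pipe, img, prompt, n_prompt))
```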

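The strength parameter controls how much diffusion alters the input image: low values such as 0.1 stay close to the original, while values near 1.0 replace most of the original pixels with newly generated content (the estimated depth map still constrains the overall layout). This makes the pipeline flexible and adjustable.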
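A useful way to read strength: as far as we understand diffusers' image-to-image style pipelines, the input is noised part-way and only the last portion of the denoising schedule is run, so the number of denoising steps actually performed is about strength × num_inference_steps (50 by default for this pipeline, to our knowledge). The sketch below reproduces that arithmetic under this assumption; `effective_denoising_steps` is a hypothetical name, and you should check it against the `get_timesteps` method of your installed diffusers version.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an image-to-image run performs.
    Mirrors (as an assumption) diffusers' min(int(steps * strength), steps) logic."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.1, 0.4, 0.7, 1.0):
    print(f"strength={s}: {effective_denoising_steps(50, s)} of 50 steps")
```

With 50 steps, the strengths 0.1, 0.4, 0.7, and 1.0 used above correspond to 5, 20, 35, and 50 denoising steps, which is why low strengths change the image so little.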

Limitations of Diffusion Models

Before we wrap up, it is important to understand the limitations and challenges you may face with these pipelines; like any new technology, they have issues.

Stable Diffusion was originally trained on images at 512×512 resolution. As a result, when we generate images larger than 512×512, quality tends to degrade. Newer versions such as Stable Diffusion 2 address this in part and can natively generate 768×768 images. Still, as long as there is a maximum native resolution, use cases such as printing large banners and flyers remain limited.
The model was trained on the LAION dataset. LAION is a non-profit organization that provides datasets, tools, and models for research purposes. In practice, the model has struggled to render human limbs and faces accurately.
Stable Diffusion image-to-image generation can run on a CPU in a feasible time, though typically minutes rather than seconds per image, so a GPU is recommended but not strictly required. Customizing the pipeline adds complexity and can demand more RAM and processing power, while the standard pipeline is simpler to run.
Lastly, there is the issue of legal rights. Because these models need vast collections of images to learn and perform well, the practice is exposed to legal disputes. For instance, in January 2023, three artists filed a copyright-infringement lawsuit against Stability AI, Midjourney, and DeviantArt. As a result, there can be limits on how freely these images can be created and used.

Conclusion

In conclusion, while diffusion models are cutting-edge, the Hugging Face pipeline makes them easy to integrate into our projects with short, direct code. Prompts let us describe the image we imagine and steer the diffusion toward it, and negative prompts let us steer away from unwanted features. The strength parameter is another critical setting: it controls how much diffusion alters the input. We have seen how to generate new images from existing images.

Key Takeaways

By applying state-of-the-art techniques, Stable Diffusion models generate images and audio.
Typical applications of diffusion models include text-to-image, text-to-video, and text-to-3D.
StableDiffusionDepth2ImgPipeline keeps the code short: we only need to pass an image and a prompt describing our expectations.


The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Frequently Asked Questions

Q1. What can I do with Stable Diffusion?

A. Stable Diffusion allows users to generate high-quality images by iteratively refining them through a diffusion (denoising) process. Each denoising step improves image quality and realism, making it suitable for various creative and artistic applications.

Q2. Can I use Stable Diffusion for free?

A. Yes. Stable Diffusion's model weights are openly released and free to download, subject to the terms of their license. Users can access and use the model at no cost, which facilitates experimentation and development in image generation and enhancement.

Q3. Can you make NSFW with Stable Diffusion?

A. Yes, Stable Diffusion can generate NSFW (Not Safe For Work) content as it allows users to control and manipulate image generation processes. However, ethical considerations and guidelines should be followed when creating such content.

Q4. How to start working with Stable Diffusion?

A. To begin working with Stable Diffusion, install the necessary libraries and dependencies, such as PyTorch and Hugging Face's diffusers library. Next, explore the tutorials and documentation available online to understand its functionality and start experimenting with image generation tasks.

Mobarak Inuwa

I am an AI Engineer with a deep passion for research and solving complex problems. I provide AI solutions leveraging Large Language Models (LLMs), GenAI, Transformer Models, and Stable Diffusion.
