Stability AI has unveiled Stable Diffusion 3.5, featuring multiple variants: Stable Diffusion 3.5 Large, Large Turbo, and Medium. These models are customizable and can run on consumer hardware. Let’s explore these models, learn how to access them, and use them for inference to see what Stable Diffusion brings to the table this time around.
Stable Diffusion 3.5 offers a range of models:

- Stable Diffusion 3.5 Large: leads the lineup in prompt adherence and rivals much larger models in image quality.
- Stable Diffusion 3.5 Large Turbo: delivers fast inference while keeping output quality high.
- Stable Diffusion 3.5 Medium: a high-performing, efficient option among medium-sized models; it requires 9.9 GB of VRAM (excluding text encoders), ensuring broad compatibility with most consumer GPUs.

All of these models can be fine-tuned to fit specific needs and are optimized for consumer hardware, with the Medium and Large Turbo variants in particular offering high-quality output with minimal resource demands; a minimal local-inference sketch follows below.
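To give a concrete sense of what local use looks like, here is a minimal sketch based on Hugging Face’s diffusers library (assuming a recent diffusers release with the SD3 pipeline, a CUDA GPU, and that you have accepted the model license on Hugging Face; the sampler settings are illustrative, not official defaults):

import torch
from diffusers import StableDiffusion3Pipeline

# Load Stable Diffusion 3.5 Medium in bfloat16 to reduce memory use.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offload idle components to help fit the ~9.9 GB VRAM budget

image = pipe(
    prompt="A forest with red trees",
    num_inference_steps=28,  # illustrative setting
    guidance_scale=4.5,      # illustrative setting
).images[0]
image.save("forest.png")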
Go to the Stability AI platform page and get your API key (you’re offered 25 credits after signing up).
Run the following Python code in a Jupyter environment (replacing the placeholder with your API key) to generate an image, and change the prompt if you wish.
import requests

API_KEY = "sk-..."  # replace with your Stability AI API key

response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={
        "authorization": f"Bearer {API_KEY}",
        "accept": "image/*",
    },
    files={"none": ""},  # forces multipart/form-data, which the endpoint expects
    data={
        "prompt": "A middle-aged man wearing formal clothes",
        "output_format": "jpeg",
    },
)

if response.status_code == 200:
    # Success: the response body is the image itself.
    with open("./man.jpeg", "wb") as file:
        file.write(response.content)
else:
    raise Exception(str(response.json()))
I asked the model to generate an image of “A middle-aged man wearing formal clothes”, and it performs well at producing photo-realistic images.
You can also use the model on Hugging Face. Click the link, and you can start running inference directly against the Stable Diffusion 3.5 Medium model.
This is the interface you’ll be greeted with:
I prompted the model to generate an image of “A forest with red trees”, and it did a wonderful job generating this 1024 x 1024 image.
Feel free to play around with the advanced settings to see how the result changes.
Step 1: Visit the Stable Diffusion 3.5 Large model page on Hugging Face.
Note: You can choose a different model; see the available options on Hugging Face.
Step 2: Fill out the necessary details to request access to the model (it’s gated) and wait a short while. Once you’ve been granted access, you’ll be able to use the model.
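If you plan to download the weights rather than only use the hosted endpoint, you’ll also need to authenticate your local environment with your token. One common way, assuming the huggingface_hub package is installed:

from huggingface_hub import login

login(token="hf_...")  # paste your token here, or run `huggingface-cli login` in a terminal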
Step 3: Now run this Python code in a Jupyter environment to send prompts to the model (make sure to replace hf_token in the header with your Hugging Face token).
import io

import requests
from PIL import Image

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large"
headers = {"Authorization": "Bearer hf_token"}  # replace hf_token with your Hugging Face token

def query(payload):
    # Send the prompt to the hosted inference endpoint and return the raw bytes.
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "A ninja sitting on top of a tall building, 8k",
})

# The endpoint returns raw image bytes, which PIL can open directly.
image = Image.open(io.BytesIO(image_bytes))
image
Feel free to change the prompt and generate different sorts of images.
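One practical note: when the hosted model is cold, the Inference API can return a 503 with a JSON body instead of image bytes. Here is a hedged sketch of a more defensive query using the documented x-wait-for-model header (reusing API_URL and headers from above):

def query_with_wait(payload):
    # "x-wait-for-model": "true" asks the endpoint to block until the model
    # has loaded instead of returning a 503 while it warms up.
    response = requests.post(
        API_URL,
        headers={**headers, "x-wait-for-model": "true"},
        json=payload,
    )
    response.raise_for_status()  # surface auth errors, rate limits, etc.
    return response.content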
In conclusion, Stable Diffusion 3.5 offers a robust range of image-generation models with various performance levels tailored for both professional and consumer use. The lineup, which includes the Large, Large Turbo, and Medium models, provides flexibility in quality and speed, making it a great choice for various applications. With simple access options via Stability AI’s platform, Hugging Face, and API integrations, Stable Diffusion 3.5 makes high-quality AI-driven image generation easier.
Q1. How do I authenticate API requests?
Ans. API requests require an API key for authentication, which should be included in the request header to access the various functionalities.
Q2. What are the common errors when using the API?
Ans. Common errors include unauthorized access, invalid parameters, or exceeding usage limits, each with specific response codes for troubleshooting.
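For instance, here is a minimal sketch of branching on common HTTP status codes when calling the Stability AI endpoint (the exact codes and messages come from the provider’s API reference, so treat this as illustrative):

import requests

def check_response(response: requests.Response) -> bytes:
    # Map common HTTP status codes to friendlier errors; consult the
    # provider's API reference for the authoritative list.
    if response.status_code == 200:
        return response.content
    if response.status_code == 401:
        raise RuntimeError("Unauthorized: check that your API key is valid.")
    if response.status_code == 429:
        raise RuntimeError("Too many requests: you may have hit a rate or credit limit.")
    raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")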
Q3. Is Stable Diffusion 3.5 free to use?
Ans. The model is free under the Stability Community License for research, non-commercial use, and organizations with under $1M in revenue. Larger entities need an Enterprise License.
Q4. What architecture does Stable Diffusion 3.5 use?
Ans. It uses a Multimodal Diffusion Transformer (MMDiT-X) with improved training techniques, such as QK-normalization and dual attention, for enhanced image generation across multiple resolutions.
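To make QK-normalization a bit more concrete, here is an illustrative sketch (my own simplification, not the actual MMDiT-X code; real implementations typically apply RMSNorm to the query and key projections). The idea is that queries and keys are normalized before the attention dot product, which keeps attention logits in a stable range during training:

import torch
import torch.nn.functional as F

def qk_normalized_attention(q, k, v, scale: float = 10.0):
    # Illustrative sketch of QK-normalization: L2-normalize queries and keys
    # so their dot products stay bounded; `scale` stands in for the (often
    # learnable) temperature. Actual models may use RMSNorm instead.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v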