At re:Invent 2024, Amazon released Nova, its most advanced family of foundation models, built to enhance AI applications and content creation. In this article, I’ll discuss Nova’s model lineup, highlight its capabilities, and then put it to the test to share my hands-on experience with this innovative technology.
Amazon Nova is the next evolution in foundation models, delivering state-of-the-art intelligence combined with unparalleled price-performance. Exclusively available through Amazon Bedrock, these models empower a wide range of applications.
From processing documents with image and text analysis to scaling marketing content creation or building AI assistants that can interpret and respond to visual data, Amazon Nova provides the intelligence and flexibility to meet your needs. The suite includes two specialized model categories: Understanding and Creative Content Generation, catering to diverse use cases with precision and innovation.
Amazon Nova Micro, Nova Lite, and Nova Pro are advanced understanding models designed to process text, image, and video inputs and deliver text-based outputs. They balance accuracy, speed, and cost to meet diverse operational needs. Let’s look at each of them:
Amazon Nova Micro is a text-only model optimized for ultra-low latency and cost-effective performance. It excels in a wide range of tasks, including language understanding, translation, reasoning, code completion, brainstorming, and mathematical problem-solving. With a generation speed exceeding 200 tokens per second, it is perfect for applications demanding rapid responses.
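To give a feel for how a text-only model like Nova Micro is called, here is a minimal sketch using the Bedrock Converse API from Python. The model ID `amazon.nova-micro-v1:0` and the inference parameters are assumptions modeled on the Nova Pro examples later in this article; verify the exact identifier in the Bedrock console for your region.

```python
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-micro-v1:0"  # assumed Nova Micro model ID

def build_messages(prompt):
    # The Converse API expects a list of turns, each with a role
    # and a list of content blocks (text-only for Nova Micro).
    return [{"role": "user", "content": [{"text": prompt}]}]

def ask_nova_micro(prompt):
    import boto3  # imported here so the message builder works standalone
    client = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    response = client.converse(
        modelId=MODEL_ID,
        messages=build_messages(prompt),
        inferenceConfig={"temperature": 0.2, "maxTokens": 300},
    )
    return response["output"]["message"]["content"][0]["text"]
```

For example, `ask_nova_micro("Translate 'good morning' to French.")` would return the model’s text response.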
Key Features
Amazon Nova Lite is an ultra-fast and cost-effective multimodal model designed to handle text, image, and video inputs. Its impressive accuracy across diverse tasks, combined with exceptional speed, makes it ideal for interactive and high-volume applications where cost-efficiency is a priority.
Key Features
Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Its capabilities, coupled with industry-leading speed and cost efficiency, make it a compelling model for almost any task, including video summarization, Q&A, mathematical reasoning, software development, and AI agents that can execute multi-step workflows. In addition to state-of-the-art accuracy on text and visual intelligence benchmarks, Amazon Nova Pro excels at instruction following and agentic workflows as measured by the Comprehensive RAG Benchmark (CRAG), the Berkeley Function Calling Leaderboard, and Mind2Web.
Key Features
Amazon Nova Premier is the most capable multimodal model, designed for complex reasoning tasks and for use as a teacher when distilling custom models. It is still in training, with availability targeted for early 2025.
The Amazon Nova suite includes two cutting-edge models for creating realistic multimodal content, tailored for a wide range of applications such as advertising, marketing, and entertainment:
A state-of-the-art image generation model designed to produce high-quality visuals with precise control over style and content. Amazon Nova Canvas offers advanced features for creative flexibility and excels in benchmarks like TIFA (Text-to-Image Faithfulness Assessment) and ImageReward.
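To sketch what calling Nova Canvas might look like from Python, the snippet below builds a text-to-image request for the InvokeModel API. The model ID, the request schema, and the seed range are assumptions on my part (modeled on similar Bedrock image APIs); verify them against the current Bedrock documentation before relying on them.

```python
import base64
import json
import random

CANVAS_MODEL_ID = "amazon.nova-canvas-v1:0"  # assumed model ID

def build_canvas_request(prompt, width=1280, height=720, seed=None):
    # Text-to-image request body; schema assumed from similar Bedrock image models.
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
            "seed": seed if seed is not None else random.randint(0, 858993459),
        },
    }

def generate_image(prompt, out_path="canvas_output.png"):
    import boto3  # imported here so the request builder works standalone
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId=CANVAS_MODEL_ID,
        body=json.dumps(build_canvas_request(prompt)),
    )
    payload = json.loads(response["body"].read())
    # Images come back Base64-encoded; decode and save the first one.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["images"][0]))
```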
Key Functionalities
A state-of-the-art video generation model designed to create professional-quality video content. Amazon Nova Reel outperforms existing models in human evaluations of video quality and consistency.
Key Functionalities
Amazon Nova models deliver exceptional performance across core and agentic text benchmarks, excelling in MMLU, ARC-C, and GSM8K. Tested against leading models like GPT-4 and Claude, Nova sets new standards in accuracy, reasoning, and task execution.
Quantitative results on core capability benchmarks, including MMLU, ARC-C, DROP, GPQA, MATH, GSM8K, IFEval, and BigBench-Hard (BBH). Unless stated otherwise, reference values are sourced from the original technical reports and websites for Claude, GPT-4, Llama, and Gemini models. Results labeled with M were independently measured, while Claude’s IFEval scores are marked with an asterisk (∗) due to unspecified scoring methodology.
Results from the Berkeley Function Calling Leaderboard (BFCL) v3 as of the November 17, 2024 update, featuring the latest model versions available at that time. For Llama 3.2 11B and 90B, leaderboard results for Llama 3.1 8B and 70B are used due to the shared text LLM.
In the next section, I’ll demonstrate how to put Amazon Nova to use. If you’re having trouble accessing it, check out my detailed instructions in the article How to Access Nova in AWS?
To demonstrate the document analysis capabilities, I downloaded the Analytics Vidhya article Build Agents the Atomic Way! in PDF format.
First, I navigated to the Model Access section in the Amazon Bedrock console and requested access to the new Amazon Nova models. Next, in the Playground section, I selected the Chat/Text option and chose the Amazon Nova Pro model. I then uploaded the article PDF and asked:
Write a summary of this doc in 100 words. Then, build a decision tree.
Output:
The output follows my instructions, producing a concise summary and a structured decision tree that give me a glimpse of the document before reading it.
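The same document analysis can be done programmatically. Below is a minimal sketch of how a PDF might be passed to Nova Pro through the Converse API’s document content block; the document name and the helper names here are illustrative assumptions.

```python
PRO_MODEL_ID = "amazon.nova-pro-v1:0"

def build_document_messages(doc_bytes, prompt, doc_name="article"):
    # A single Converse turn combining a PDF document block with a text instruction.
    return [{
        "role": "user",
        "content": [
            {"document": {"format": "pdf", "name": doc_name,
                          "source": {"bytes": doc_bytes}}},
            {"text": prompt},
        ],
    }]

def summarize_pdf(pdf_path, prompt="Write a summary of this doc in 100 words."):
    import boto3  # imported here so the message builder works standalone
    with open(pdf_path, "rb") as f:
        doc_bytes = f.read()
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(
        modelId=PRO_MODEL_ID,
        messages=build_document_messages(doc_bytes, prompt),
    )
    return response["output"]["message"]["content"][0]["text"]
```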
To demonstrate video analysis, I uploaded a video as input. Amazon Nova Pro can analyze uploaded videos, so I asked:
What’s happening in the video?
Output:
At the beginning of the video, there are three cats on a ledge. One cat is gray and white, one is brown and white, and one is white. The white cat is on the right side of the ledge. The cats are looking in different directions. There are some plants and trees in the background. As the video progresses, the cats continue to stand on the ledge. The white cat moves to the middle of the ledge. The cats continue to look in different directions. The plants and trees in the background remain the same.
I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.
I can also use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model through the Amazon Bedrock Converse API and analyze the video. Please ensure that your AWS credentials are properly configured on your system and that you have the necessary permissions to execute these operations.
import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "/home/abhishek/Downloads/cats_sample"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Read the video file as raw bytes.
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

# A single user turn combining the video and the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"video": {"format": "mp4", "source": {"bytes": video}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0},
)

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)
Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.
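For a video already stored in S3, the message can reference the object instead of embedding raw bytes. The s3Location source shape below is my best understanding of the Converse API; confirm it in the Bedrock API reference before using it.

```python
def build_s3_video_messages(s3_uri, prompt):
    # Reference the video by S3 URI rather than uploading raw bytes.
    return [{
        "role": "user",
        "content": [
            {"video": {"format": "mp4",
                       "source": {"s3Location": {"uri": s3_uri}}}},
            {"text": prompt},
        ],
    }]
```

These messages can then be passed to `bedrock_runtime.converse()` exactly as in the previous script.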
Output:
Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image. Because generating a video takes a few minutes, the Amazon Bedrock API introduces three new operations for asynchronous workloads: StartAsyncInvoke, GetAsyncInvoke, and ListAsyncInvokes.
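Jobs started this way can also be listed, not just polled one at a time. The sketch below assumes boto3 exposes a `list_async_invokes` method returning an `asyncInvokeSummaries` field; double-check the exact names in the boto3 documentation.

```python
def summarize_jobs(response):
    # Reduce a ListAsyncInvokes response to (ARN, status) pairs.
    return [(job["invocationArn"], job["status"])
            for job in response.get("asyncInvokeSummaries", [])]

def list_in_progress_jobs(region="us-east-1"):
    import boto3  # imported here so summarize_jobs works standalone
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.list_async_invokes(maxResults=10, statusEquals="InProgress")
    return summarize_jobs(response)
```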
Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:
A colorful flower garden with roses, sunflowers, tulips, and lavender swaying in the sunlight. The camera zooms in to capture the intricate details of each bloom.
After the first invocation, the script periodically checks the status until the video has been created. I pass a random seed to get a different result each time the code runs.
import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "A colorful flower garden with roses, sunflowers, tulips, and lavender swaying in the sunlight. The camera zooms in to capture the intricate details of each bloom."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint is inclusive on both ends, so stay below the 32-bit signed maximum.
        "seed": random.randint(0, 2147483646),
    },
}

# Video generation is asynchronous: start the job and write the result to S3.
invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}},
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split("/")[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

# Poll the job status until it is no longer in progress.
while True:
    response = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")
Output:
After a few minutes, the script completes and prints the output Amazon Simple Storage Service (Amazon S3) location. I can download the output video using the AWS Command Line Interface (AWS CLI):
aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4
This is the resulting video. As requested, the camera zooms in on the subject.
To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:
The provided image must have dimensions of 1280×720.
This script uses the reference image and a text prompt with a camera action (drone view then a bee sitting on a flower when zoomed in) to create a video:
import base64
import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

input_image_path = "seascape.png"
video_prompt = "drone view then a bee sitting on a flower when zoomed in"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{"format": "png", "source": {"bytes": input_image_base64}}],
    },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint is inclusive on both ends, so stay below the 32-bit signed maximum.
        "seed": random.randint(0, 2147483646),
    },
}

# Start the asynchronous generation job, writing the result to S3.
invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}},
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split("/")[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

# Poll the job status until it is no longer in progress.
while True:
    response = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")
Output:
Again, I download the output using the AWS CLI:
aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4
This is the resulting video. The camera starts from the reference image and moves forward.
Amazon Nova models are designed with a strong emphasis on customer safety, security, and trust throughout their development, ensuring peace of mind and the flexibility needed to support diverse use cases.
With robust safety features and content moderation capabilities, Amazon Nova provides you with the necessary controls to adopt AI responsibly. Every image and video generated by these models includes digital watermarking for added transparency.
To match the advanced capabilities of Amazon Nova foundation models, comprehensive protections are in place. These safeguards actively address critical issues such as misinformation, child sexual abuse material (CSAM), and risks associated with chemical, biological, radiological, or nuclear (CBRN) threats.
Amazon Nova has proven to be a powerful tool in my hands-on experience. From analyzing documents to creating high-quality videos, the models showcased impressive speed, accuracy, and versatility. The video analysis, in particular, stood out, with detailed and insightful outputs that far exceeded my expectations.
Now, I’d love to hear from you! Have you had a chance to try Amazon Nova? What are your thoughts on its performance, features, or any specific tasks you’ve tested it on? Let me know in the comment section below.