I Used Amazon Nova Today and This Is My Honest Review

Abhishek Kumar · Last Updated: 05 Dec, 2024
9 min read

At re:Invent 2024, Amazon released its most advanced Nova foundation models, built to enhance AI applications and content creation. In this article, I’ll discuss Nova’s architecture and its powerful capabilities, then put the models to the test and share my hands-on experience with this technology.

What are Amazon Nova Foundation Models?

Amazon Nova is the next evolution in foundation models, delivering state-of-the-art intelligence combined with unparalleled price-performance. Exclusively available through Amazon Bedrock, these models empower a wide range of applications.

From processing documents with image and text analysis to scaling marketing content creation or building AI assistants that can interpret and respond to visual data, Amazon Nova provides the intelligence and flexibility to meet your needs. The suite includes two specialized model categories: Understanding and Creative Content Generation, catering to diverse use cases with precision and innovation.

Types of AWS Nova Models

Understanding models: Text and Visual Intelligence

Amazon Nova Micro, Nova Lite, and Nova Pro are advanced understanding models designed to process text, image, and video inputs, delivering text-based outputs. These models offer a versatile range of capabilities, balancing accuracy, speed, and cost to meet diverse operational needs. Key features include:

  • Efficient and cost-effective inference across various intelligence tiers
  • State-of-the-art understanding of text, images, and videos
  • Fine-tuning support for text, image, and video inputs
  • Cutting-edge multimodal retrieval-augmented generation (RAG) and agentic capabilities
  • Seamless integration with proprietary data and applications via Amazon Bedrock
Amazon Nova Foundation Models
Source: AWS

Let’s look at each one of them:

Amazon Nova Micro

Amazon Nova Micro is a text-only model optimized for ultra-low latency and cost-effective performance. It excels in a wide range of tasks, including language understanding, translation, reasoning, code completion, brainstorming, and mathematical problem-solving. With a generation speed exceeding 200 tokens per second, it is perfect for applications demanding rapid responses.

Key Features

  • Maximum Tokens: Supports up to 128k tokens
  • Languages: Compatible with 200+ languages
  • Fine-Tuning: Fully supports fine-tuning with text input
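
Since these models are served through Amazon Bedrock, a Nova Micro call via the Converse API can be sketched as below. The model ID, region, and inference settings are assumptions patterned on the examples later in this article:

```python
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-micro-v1:0"  # assumed Nova Micro model ID


def build_messages(prompt: str) -> list:
    """Build a Converse API message list from a plain-text prompt."""
    return [{"role": "user", "content": [{"text": prompt}]}]


def ask_nova_micro(prompt: str) -> str:
    """Send a text-only prompt to Nova Micro and return the reply text."""
    import boto3  # imported here so the payload helper stays testable without AWS

    client = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    response = client.converse(
        modelId=MODEL_ID,
        messages=build_messages(prompt),
        inferenceConfig={"temperature": 0.2, "maxTokens": 512},
    )
    return response["output"]["message"]["content"][0]["text"]


# ask_nova_micro("Translate 'good morning' into French.")
```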

Amazon Nova Lite

Amazon Nova Lite is an ultra-fast and cost-effective multimodal model designed to handle text, image, and video inputs. Its impressive accuracy across diverse tasks, combined with exceptional speed, makes it ideal for interactive and high-volume applications where cost-efficiency is a priority.

Key Features

  • Maximum Tokens: Supports up to 300k tokens
  • Languages: Compatible with 200+ languages
  • Fine-Tuning: Fully supports fine-tuning with text, image, and video inputs

Amazon Nova Pro

Amazon Nova Pro is a highly capable multimodal model offering the best combination of accuracy, speed, and cost for a wide range of tasks. These capabilities, coupled with its industry-leading speed and cost efficiency, make it a compelling model for almost any task, including video summarization, Q&A, mathematical reasoning, software development, and AI agents that can execute multi-step workflows. In addition to state-of-the-art accuracy on text and visual intelligence benchmarks, Amazon Nova Pro excels at instruction following and agentic workflows as measured by the Comprehensive RAG Benchmark (CRAG), the Berkeley Function Calling Leaderboard, and Mind2Web.

Key Features

  • Max tokens: 300k
  • Languages: 200+ languages
  • Fine-tuning supported: Yes, with text, image, and video input.

Amazon Nova Premier

Amazon Nova Premier is the most capable multimodal model, designed for complex reasoning tasks and for use as a teacher for distilling custom models. It is still in training, with availability targeted for early 2025.

Creative Content Generation: Bringing Concepts to Life

The Amazon Nova suite includes two cutting-edge models for creating realistic multimodal content, tailored for a wide range of applications such as advertising, marketing, and entertainment:

Amazon Nova Canvas

A state-of-the-art image generation model designed to produce high-quality visuals with precise control over style and content. Amazon Nova Canvas offers advanced features for creative flexibility and excels in benchmarks like TIFA (Text-to-Image Faithfulness Assessment) and ImageReward.

Key Functionalities

  • Text-to-Image Generation:
    • Generates images at resolutions ranging from 512 up to 2K (horizontal).
    • Supports flexible aspect ratios (1:4 to 4:1) with a maximum of 4.2 million pixels.
    • Allows customers to provide reference images to guide the model’s style, color palette, or to create variations.
  • Image Editing:
    • Offers precise editing capabilities such as inpainting and outpainting using natural language mask prompts to target specific areas for modification.
    • Includes background removal to seamlessly replace or adjust backgrounds while preserving the subject.
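
A rough sketch of the text-to-image path through Bedrock’s InvokeModel API is below. The model ID and the request/response field names are assumptions patterned on Bedrock image-generation conventions, so verify them against the official docs before use:

```python
import base64
import json


def build_canvas_request(prompt: str, width: int = 1280, height: int = 720) -> dict:
    """Assemble an assumed text-to-image request body for Nova Canvas."""
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
        },
    }


def generate_image(prompt: str, out_path: str = "canvas_output.png") -> None:
    """Invoke Nova Canvas and save the first returned image to disk."""
    import boto3  # imported here so the request builder stays testable without AWS

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="amazon.nova-canvas-v1:0",  # assumed model ID
        body=json.dumps(build_canvas_request(prompt)),
    )
    payload = json.loads(response["body"].read())
    # Generated images are returned as Base64-encoded strings.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["images"][0]))


# generate_image("A watercolor fox in a snowy forest")
```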

Amazon Nova Reel

A state-of-the-art video generation model designed to create professional-quality video content. Amazon Nova Reel outperforms existing models in human evaluations of video quality and consistency.

Key Functionalities

  • Generate Videos from Text Prompts: Creates 6-second videos at 720p resolution and 24 frames per second.
  • Generate Videos from Reference Images and Prompts: Combines static images and textual inputs to produce dynamic, guided motion.
  • Camera Motion Control: Provides over 20 camera motion effects, such as “zoom” and “dolly forward,” guided through text prompts, offering precise control over visual dynamics.

Amazon Nova: Benchmarks and Results

Amazon Nova models deliver exceptional performance across core and agentic text benchmarks, excelling in MMLU, ARC-C, and GSM8K. Tested against leading models like GPT-4 and Claude, Nova sets new standards in accuracy, reasoning, and task execution.

Core Capability Text Benchmarks and Results

Quantitative results on core capability benchmarks, including MMLU, ARC-C, DROP, GPQA, MATH, GSM8K, IFEval, and BigBench-Hard (BBH). Unless stated otherwise, reference values are sourced from the original technical reports and websites for Claude, GPT-4, Llama, and Gemini models. Results labeled with M were independently measured, while Claude’s IFEval scores are marked with an asterisk (∗) due to unspecified scoring methodology.

Agentic Text Benchmarks and Results


Results from the Berkeley Function Calling Leaderboard (BFCL) v3 as of the November 17, 2024 update, featuring the latest model versions available at that time. For Llama 3.2 11B and 90B, leaderboard results for Llama 3.1 8B and 70B are used due to the shared text LLM.

In the next section, I’ll demonstrate how to put AWS Nova to use. If you’re having trouble accessing AWS Nova, check out my detailed instructions in the article – How to Access Nova in AWS?

Using Amazon Nova Pro for Document Analysis

To demonstrate document analysis, I downloaded the Analytics Vidhya article Build Agents the Atomic Way! in PDF format.

First, I navigated to the Model Access section in the Amazon Bedrock console and requested access to the new Amazon Nova models. Next, in the Playground section, I selected the Chat/Text option and chose the Amazon Nova Pro model. I then uploaded the PDF and asked:

Write a summary of this doc in 100 words. Then, build a decision tree.

Output:

The output follows my instructions, producing a structured decision tree that gives me a glimpse of the document before reading it.


Using Amazon Nova Pro for Video Analysis

Nova Pro Interface

Amazon Nova Pro can also analyze uploaded videos. To demonstrate, I uploaded a short video clip in the same playground and asked:

What's happening in the video?

Output:

At the beginning of the video, there are three cats on a ledge. 
One cat is gray and white, one is brown and white, and one is white. 
The white cat is on the right side of the ledge. 
The cats are looking in different directions. 
There are some plants and trees in the background.

As the video progresses, the cats continue to stand on the ledge. 
The white cat moves to the middle of the ledge. 
The cats continue to look in different directions. 
The plants and trees in the background remain the same.

Nova Pro API

I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.

I can also use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model using the Amazon Bedrock Converse API and analyze the video. Please ensure that AWS is properly configured in your system to use the API. Additionally, verify that you have the necessary permissions to execute the operations.

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "/home/abhishek/Downloads/cats_sample"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Read the video file as raw bytes.
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

# A single user turn carrying the video bytes plus the text prompt.
messages = [{"role": "user", "content": [
    {"video": {"format": "mp4", "source": {"bytes": video}}},
    {"text": user_message}
]}]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0}
)

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.

Output:

NOVA API output

Using Amazon Nova Reel for Video Creation

Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image. Because generating a video takes a few minutes, the Amazon Bedrock API introduced three new operations:

  • StartAsyncInvoke: Initiates video creation.
  • GetAsyncInvoke: Tracks the status of creation.
  • ListAsyncInvokes: Lists all ongoing or completed video tasks.
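
The scripts later in this article use the first two operations; ListAsyncInvokes can be sketched roughly as below to check on recent video jobs (the Boto3 method and response key names here are assumptions, so verify them against the SDK docs):

```python
def summarize_invocation(job: dict) -> str:
    """Format one async invocation record as a one-line status string."""
    job_id = job["invocationArn"].split("/")[-1]
    return f"{job_id}: {job['status']}"


def list_video_jobs() -> None:
    """Print a short status line for each recent async invocation."""
    import boto3  # imported here so the formatter stays testable without AWS

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.list_async_invokes(maxResults=10)
    for job in response.get("asyncInvokeSummaries", []):
        print(summarize_invocation(job))


# list_video_jobs()
```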

Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:

A colorful flower garden with roses, sunflowers, 
tulips, and lavender swaying in the sunlight. 
The camera zooms in to capture the 
intricate details of each bloom.

After the first invocation, the script periodically checks the status until the creation of the video has been completed. I pass a random seed to get a different result each time the code runs.

import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "A colorful flower garden with roses, sunflowers, tulips, and lavender swaying in the sunlight. The camera zooms in to capture the intricate details of each bloom."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)
model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

Output:

After a few minutes, the script completes and prints the output location in Amazon Simple Storage Service (Amazon S3). I can download the output video with the AWS Command Line Interface (AWS CLI):

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4
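
The same file can also be fetched with Boto3; the bucket and prefix arguments below are placeholders for the values printed by the generation script:

```python
def output_key(prefix: str) -> str:
    """S3 key of the generated video inside the invocation's prefix."""
    return f"{prefix}/output.mp4"


def download_video(bucket: str, prefix: str, local_path: str = "output-from-text.mp4") -> None:
    """Download the generated video from S3 to a local file."""
    import boto3  # imported here so the key helper stays testable without AWS

    s3 = boto3.client("s3")
    s3.download_file(bucket, output_key(prefix), local_path)


# download_video("<BUCKET>", "<PREFIX>")
```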

This is the resulting video. As requested, the camera zooms in on the subject.

Using Amazon Nova Reel with a Reference Image

To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:

The provided image must be 1280×720 pixels.


This script uses the reference image and a text prompt with a camera action (drone view then a bee sitting on a flower when zoomed in) to create a video:

import base64
import random
import time

import boto3

S3_DESTINATION_BUCKET = "<BUCKET>"
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
input_image_path = "seascape.png"
video_prompt = "drone view then a bee sitting on a flower when zoomed in"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
    input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{ "format": "png", "source": { "bytes": input_image_base64 } }]
        },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        "seed": random.randint(0, 2147483648)
    }
}

invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}}
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split('/')[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"

print(f"\nS3 URI: {s3_location}")

while True:
    response = bedrock_runtime.get_async_invoke(
        invocationArn=invocation_arn
    )
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)
if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")

Output:

Again, I download the output using the AWS CLI:

aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4

This is the resulting video. The camera starts from the reference image and moves forward.

Building AI Responsibly

Amazon Nova models are designed with a strong emphasis on customer safety, security, and trust throughout their development, ensuring peace of mind and the flexibility needed to support diverse use cases.

With robust safety features and content moderation capabilities, Amazon Nova provides you with the necessary controls to adopt AI responsibly. Every image and video generated by these models includes digital watermarking for added transparency.

To match the advanced capabilities of Amazon Nova foundation models, comprehensive protections are in place. These safeguards actively address critical issues such as misinformation, child sexual abuse material (CSAM), and risks associated with chemical, biological, radiological, or nuclear (CBRN) threats.

End Note

Amazon Nova has proven to be a powerful tool in my hands-on experience. From analyzing documents to creating high-quality videos, the models showcased impressive speed, accuracy, and versatility. The video analysis, in particular, stood out, with detailed and insightful outputs that far exceeded my expectations.

Now, I’d love to hear from you! Have you had a chance to try Amazon Nova? What are your thoughts on its performance, features, or any specific tasks you’ve tested it on? Let me know in the comment section below.

Hello, I'm Abhishek, a Data Engineer Trainee at Analytics Vidhya. I'm passionate about data engineering and video games. I have experience in Apache Hadoop, AWS, and SQL, and I keep exploring their intricacies and optimizing data workflows.
