At re:Invent 2024, Amazon released Nova, its most advanced family of foundation models, built to enhance AI applications and content creation. In this article, I’ll discuss Nova’s model lineup, highlight its capabilities, and then put it to the test to share my hands-on experience with this innovative technology.
Amazon Nova is the next evolution in foundation models, delivering state-of-the-art intelligence combined with unparalleled price-performance. Exclusively available through Amazon Bedrock, these models empower a wide range of applications.
From processing documents with image and text analysis to scaling marketing content creation or building AI assistants that can interpret and respond to visual data, Amazon Nova provides the intelligence and flexibility to meet your needs. The suite includes two specialized model categories: Understanding and Creative Content Generation, catering to diverse use cases with precision and innovation.
Amazon Nova Micro, Nova Lite, and Nova Pro are advanced understanding models designed to process text, image, and video inputs and deliver text-based outputs. They balance accuracy, speed, and cost to meet diverse operational needs. Let’s look at each of them:
Amazon Nova Micro is a text-only model optimized for ultra-low latency and cost-effective performance. It excels in a wide range of tasks, including language understanding, translation, reasoning, code completion, brainstorming, and mathematical problem-solving. With a generation speed exceeding 200 tokens per second, it is perfect for applications demanding rapid responses.
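To give a feel for how a text-only model like Nova Micro is called, here is a minimal sketch using the Bedrock Converse API from Python. The model ID `amazon.nova-micro-v1:0` and the inference parameters are assumptions modeled on the Nova Pro examples later in this article; verify the exact identifier in the Bedrock console for your region.

```python
AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-micro-v1:0"  # assumed Nova Micro model ID

def build_messages(prompt):
    # The Converse API expects a list of turns, each with a role
    # and a list of content blocks (text-only for Nova Micro).
    return [{"role": "user", "content": [{"text": prompt}]}]

def ask_nova_micro(prompt):
    import boto3  # imported here so the message builder works standalone
    client = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    response = client.converse(
        modelId=MODEL_ID,
        messages=build_messages(prompt),
        inferenceConfig={"temperature": 0.2, "maxTokens": 300},
    )
    return response["output"]["message"]["content"][0]["text"]
```

For example, `ask_nova_micro("Translate 'good morning' to French.")` would return the model’s text response.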
Key Features
Amazon Nova Lite is an ultra-fast and cost-effective multimodal model designed to handle text, image, and video inputs. Its impressive accuracy across diverse tasks, combined with exceptional speed, makes it ideal for interactive and high-volume applications where cost-efficiency is a priority.
Key Features
Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Its capabilities, coupled with industry-leading speed and cost efficiency, make it a compelling model for almost any task, including video summarization, Q&A, mathematical reasoning, software development, and AI agents that can execute multi-step workflows. In addition to state-of-the-art accuracy on text and visual intelligence benchmarks, Amazon Nova Pro excels at instruction following and agentic workflows as measured by the Comprehensive RAG Benchmark (CRAG), the Berkeley Function Calling Leaderboard, and Mind2Web.
Key Features
Amazon Nova Premier is the most capable multimodal model, designed for complex reasoning tasks and for use as a teacher when distilling custom models. It is still in training, with availability targeted for early 2025.
The Amazon Nova suite includes two cutting-edge models for creating realistic multimodal content, tailored for a wide range of applications such as advertising, marketing, and entertainment:
A state-of-the-art image generation model designed to produce high-quality visuals with precise control over style and content. Amazon Nova Canvas offers advanced features for creative flexibility and excels in benchmarks like TIFA (Text-to-Image Faithfulness Assessment) and ImageReward.
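To sketch what calling Nova Canvas might look like from Python, the snippet below builds a text-to-image request for the InvokeModel API. The model ID, the request schema, and the seed range are assumptions on my part (modeled on similar Bedrock image APIs); verify them against the current Bedrock documentation before relying on them.

```python
import base64
import json
import random

CANVAS_MODEL_ID = "amazon.nova-canvas-v1:0"  # assumed model ID

def build_canvas_request(prompt, width=1280, height=720, seed=None):
    # Text-to-image request body; schema assumed from similar Bedrock image models.
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
            "seed": seed if seed is not None else random.randint(0, 858993459),
        },
    }

def generate_image(prompt, out_path="canvas_output.png"):
    import boto3  # imported here so the request builder works standalone
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId=CANVAS_MODEL_ID,
        body=json.dumps(build_canvas_request(prompt)),
    )
    payload = json.loads(response["body"].read())
    # Images come back Base64-encoded; decode and save the first one.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["images"][0]))
```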
Key Functionalities
A state-of-the-art video generation model designed to create professional-quality video content. Amazon Nova Reel outperforms existing models in human evaluations of video quality and consistency.
Key Functionalities
Amazon Nova models deliver exceptional performance across core and agentic text benchmarks, excelling in MMLU, ARC-C, and GSM8K. Tested against leading models like GPT-4 and Claude, Nova sets new standards in accuracy, reasoning, and task execution.
Quantitative results on core capability benchmarks, including MMLU, ARC-C, DROP, GPQA, MATH, GSM8K, IFEval, and BigBench-Hard (BBH). Unless stated otherwise, reference values are sourced from the original technical reports and websites for Claude, GPT-4, Llama, and Gemini models. Results labeled with M were independently measured, while Claude’s IFEval scores are marked with an asterisk (∗) due to unspecified scoring methodology.
Results from the Berkeley Function Calling Leaderboard (BFCL) v3 as of the November 17, 2024 update, featuring the latest model versions available at that time. For Llama 3.2 11B and 90B, leaderboard results for Llama 3.1 8B and 70B are used due to the shared text LLM.
In the next section, I’ll demonstrate how to put Amazon Nova to use. If you’re having trouble accessing it, check out my detailed instructions in the article How to Access Nova in AWS?
To demonstrate the document analysis capabilities, I downloaded the Analytics Vidhya article Build Agents the Atomic Way! in PDF format.
First, I navigated to the Model Access section in the Amazon Bedrock console and requested access to the new Amazon Nova models. Next, in the Playground section, I selected the Chat/Text option and chose the Amazon Nova Pro model. I then uploaded the article PDF and asked:
Write a summary of this doc in 100 words. Then, build a decision tree.
Output:
The output follows my instructions, producing a concise summary and a structured decision tree that give me a glimpse of the document before reading it.
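The same document analysis can be done programmatically. Below is a minimal sketch of how a PDF might be passed to Nova Pro through the Converse API’s document content block; the document name and the helper names here are illustrative assumptions.

```python
PRO_MODEL_ID = "amazon.nova-pro-v1:0"

def build_document_messages(doc_bytes, prompt, doc_name="article"):
    # A single Converse turn combining a PDF document block with a text instruction.
    return [{
        "role": "user",
        "content": [
            {"document": {"format": "pdf", "name": doc_name,
                          "source": {"bytes": doc_bytes}}},
            {"text": prompt},
        ],
    }]

def summarize_pdf(pdf_path, prompt="Write a summary of this doc in 100 words."):
    import boto3  # imported here so the message builder works standalone
    with open(pdf_path, "rb") as f:
        doc_bytes = f.read()
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(
        modelId=PRO_MODEL_ID,
        messages=build_document_messages(doc_bytes, prompt),
    )
    return response["output"]["message"]["content"][0]["text"]
```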
To demonstrate video analysis, I uploaded a video as input. Amazon Nova Pro can analyze uploaded videos, so I asked:
What’s happening in the video?
Output:
At the beginning of the video, there are three cats on a ledge. One cat is gray and white, one is brown and white, and one is white. The white cat is on the right side of the ledge. The cats are looking in different directions. There are some plants and trees in the background. As the video progresses, the cats continue to stand on the ledge. The white cat moves to the middle of the ledge. The cats continue to look in different directions. The plants and trees in the background remain the same.
I can use a more detailed prompt to extract specific information from the video such as objects or text. Note that Amazon Nova currently does not process audio in a video.
I can also use the AWS SDK for Python (Boto3) to invoke the Amazon Nova Pro model through the Amazon Bedrock Converse API and analyze the video. Please ensure that your AWS credentials are properly configured on your system and that you have the necessary permissions to execute these operations.
import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-pro-v1:0"
VIDEO_FILE = "/home/abhishek/Downloads/cats_sample"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Read the video file as raw bytes.
with open(VIDEO_FILE, "rb") as f:
    video = f.read()

user_message = "Describe this video."

# A single user turn combining the video and the text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"video": {"format": "mp4", "source": {"bytes": video}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
    inferenceConfig={"temperature": 0.0},
)

response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)
Amazon Nova Pro can analyze videos that are uploaded with the API (as in the previous code) or that are stored in an Amazon Simple Storage Service (Amazon S3) bucket.
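For a video already stored in S3, the message can reference the object instead of embedding raw bytes. The s3Location source shape below is my best understanding of the Converse API; confirm it in the Bedrock API reference before using it.

```python
def build_s3_video_messages(s3_uri, prompt):
    # Reference the video by S3 URI rather than uploading raw bytes.
    return [{
        "role": "user",
        "content": [
            {"video": {"format": "mp4",
                       "source": {"s3Location": {"uri": s3_uri}}}},
            {"text": prompt},
        ],
    }]
```

These messages can then be passed to `bedrock_runtime.converse()` exactly as in the previous script.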
Output:
Now, let’s create a video using Amazon Nova Reel, starting from a text-only prompt and then providing a reference image. Because generating a video takes a few minutes, the Amazon Bedrock API introduces three new operations for asynchronous workloads: StartAsyncInvoke, GetAsyncInvoke, and ListAsyncInvokes.
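Jobs started this way can also be listed, not just polled one at a time. The sketch below assumes boto3 exposes a `list_async_invokes` method returning an `asyncInvokeSummaries` field; double-check the exact names in the boto3 documentation.

```python
def summarize_jobs(response):
    # Reduce a ListAsyncInvokes response to (ARN, status) pairs.
    return [(job["invocationArn"], job["status"])
            for job in response.get("asyncInvokeSummaries", [])]

def list_in_progress_jobs(region="us-east-1"):
    import boto3  # imported here so summarize_jobs works standalone
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.list_async_invokes(maxResults=10, statusEquals="InProgress")
    return summarize_jobs(response)
```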
Amazon Nova Reel supports camera control actions such as zooming or moving the camera. This Python script creates a video from this text prompt:
A colorful flower garden with roses, sunflowers, tulips, and lavender swaying in the sunlight. The camera zooms in to capture the intricate details of each bloom.
After the first invocation, the script periodically checks the status until the video has been created. I pass a random seed to get a different result each time the code runs.
import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

video_prompt = "A colorful flower garden with roses, sunflowers, tulips, and lavender swaying in the sunlight. The camera zooms in to capture the intricate details of each bloom."

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {"text": video_prompt},
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint is inclusive on both ends, so stay below the 32-bit signed maximum.
        "seed": random.randint(0, 2147483646),
    },
}

# Video generation is asynchronous: start the job and write the result to S3.
invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}},
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split("/")[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

# Poll the job status until it is no longer in progress.
while True:
    response = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")
Output:
After a few minutes, the script completes and prints the output Amazon Simple Storage Service (Amazon S3) location. I can download the output video using the AWS Command Line Interface (AWS CLI):
aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-text.mp4
This is the resulting video. As requested, the camera zooms in on the subject.
To have better control over the creation of the video, I can provide Amazon Nova Reel a reference image such as the following:
The provided image must have dimensions of 1280×720.
This script uses the reference image and a text prompt with a camera action (drone view then a bee sitting on a flower when zoomed in) to create a video:
import base64
import random
import time

import boto3

AWS_REGION = "us-east-1"
MODEL_ID = "amazon.nova-reel-v1:0"
SLEEP_TIME = 30
S3_DESTINATION_BUCKET = "<BUCKET>"

input_image_path = "seascape.png"
video_prompt = "drone view then a bee sitting on a flower when zoomed in"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=AWS_REGION)

# Load the input image as a Base64 string.
with open(input_image_path, "rb") as f:
    input_image_bytes = f.read()
input_image_base64 = base64.b64encode(input_image_bytes).decode("utf-8")

model_input = {
    "taskType": "TEXT_VIDEO",
    "textToVideoParams": {
        "text": video_prompt,
        "images": [{"format": "png", "source": {"bytes": input_image_base64}}],
    },
    "videoGenerationConfig": {
        "durationSeconds": 6,
        "fps": 24,
        "dimension": "1280x720",
        # randint is inclusive on both ends, so stay below the 32-bit signed maximum.
        "seed": random.randint(0, 2147483646),
    },
}

# Start the asynchronous generation job, writing the result to S3.
invocation = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{S3_DESTINATION_BUCKET}"}},
)

invocation_arn = invocation["invocationArn"]
s3_prefix = invocation_arn.split("/")[-1]
s3_location = f"s3://{S3_DESTINATION_BUCKET}/{s3_prefix}"
print(f"\nS3 URI: {s3_location}")

# Poll the job status until it is no longer in progress.
while True:
    response = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = response["status"]
    print(f"Status: {status}")
    if status != "InProgress":
        break
    time.sleep(SLEEP_TIME)

if status == "Completed":
    print(f"\nVideo is ready at {s3_location}/output.mp4")
else:
    print(f"\nVideo generation status: {status}")
Output:
Again, I download the output using the AWS CLI:
aws s3 cp s3://BUCKET/PREFIX/output.mp4 ./output-from-image.mp4
This is the resulting video. The camera starts from the reference image and moves forward.
Amazon Nova models are designed with a strong emphasis on customer safety, security, and trust throughout their development, ensuring peace of mind and the flexibility needed to support diverse use cases.
With robust safety features and content moderation capabilities, Amazon Nova provides you with the necessary controls to adopt AI responsibly. Every image and video generated by these models includes digital watermarking for added transparency.
To match the advanced capabilities of Amazon Nova foundation models, comprehensive protections are in place. These safeguards actively address critical issues such as misinformation, child sexual abuse material (CSAM), and risks associated with chemical, biological, radiological, or nuclear (CBRN) threats.
Amazon Nova has proven to be a powerful tool in my hands-on experience. From analyzing documents to creating high-quality videos, the models showcased impressive speed, accuracy, and versatility. The video analysis, in particular, stood out, with detailed and insightful outputs that far exceeded my expectations.
Now, I’d love to hear from you! Have you had a chance to try Amazon Nova? What are your thoughts on its performance, features, or any specific tasks you’ve tested it on? Let me know in the comment section below.