IBM’s latest addition to its Granite series, Granite 3.0, marks a significant leap forward in large language models (LLMs). Granite 3.0 provides enterprise-ready, instruction-tuned models that emphasize safety, speed, and cost-efficiency while balancing power and practicality. Built on a foundation of diverse data and refined fine-tuning techniques, the series strengthens IBM’s AI offerings, particularly in domains where precision, security, and adaptability are crucial.
At the forefront of the Granite 3.0 lineup is Granite 3.0 8B Instruct, an instruction-tuned, dense, decoder-only model designed to deliver high performance on enterprise tasks. Trained in two phases on over 12 trillion tokens spanning many natural and programming languages, it is highly versatile. The model suits complex workflows in industries like finance, cybersecurity, and programming, combining general-purpose capabilities with robust task-specific fine-tuning.
IBM offers Granite 3.0 under the open-source Apache 2.0 license, ensuring transparency in usage and data handling. The models integrate seamlessly into existing platforms, including IBM’s own Watsonx, Google Cloud Vertex AI, and NVIDIA NIM, making them accessible across various environments. This commitment to open-source principles is reinforced by detailed disclosures of training datasets and methodologies in the Granite 3.0 technical paper.
Granite 3.0 is optimized for enterprise tasks that require high accuracy and security. IBM has rigorously tested the models on industry-specific tasks and academic benchmarks, where they deliver leading performance in several areas.
IBM’s advanced training methodologies contributed significantly to Granite 3.0’s performance and efficiency. Tools such as the Data Prep Kit and IBM Research’s Power Scheduler played crucial roles in optimizing data processing and model learning; a rough sketch of the scheduling idea follows below.
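To make the scheduler idea concrete, here is a minimal sketch of a power-law learning-rate schedule with linear warmup, which decays the rate as a power of the number of tokens seen and scales with batch size. This is an illustrative approximation only: the function name `power_lr` and every constant in it are assumptions for demonstration, not IBM’s published Power Scheduler hyperparameters or implementation.

```python
# Illustrative power-law LR schedule with linear warmup.
# NOTE: a, b, and warmup_tokens are made-up demonstration values,
# NOT IBM's published Power Scheduler settings.
def power_lr(tokens_seen: int, batch_size: int = 1024,
             a: float = 4.6, b: float = 0.51,
             warmup_tokens: int = 1_000_000) -> float:
    peak = a * batch_size * warmup_tokens ** (-b)
    if tokens_seen < warmup_tokens:
        return peak * tokens_seen / warmup_tokens  # linear warmup to peak
    return a * batch_size * tokens_seen ** (-b)    # power-law decay in tokens
```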
Granite-3.0-2B-Instruct is part of IBM’s Granite 3.0 series, developed with a focus on powerful yet practical enterprise applications. The model strikes a balance between compact size and strong performance across diverse business scenarios. IBM Granite models are optimized for speed, safety, and cost-effectiveness, making them well suited to production-scale AI applications. The screenshot below was taken after running inference with the model.
The Granite 3.0 models excel in multilingual support, natural language processing (NLP) tasks, and enterprise-specific use cases. The 2B-Instruct model specifically supports summarization, classification, entity extraction, question-answering, retrieval-augmented generation (RAG), and function-calling tasks.
IBM’s Granite 3.0 series uses a decoder-only dense transformer architecture, featuring innovations such as Grouped Query Attention (GQA) and Rotary Position Embeddings (RoPE) for handling extensive multilingual data. GQA cuts the memory cost of attention by sharing key/value heads across groups of query heads, while RoPE encodes token positions as rotations of query and key vectors, sketched below.
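To build intuition for RoPE, here is a minimal, textbook-style sketch of applying rotary position embeddings to a sequence of query or key vectors. This is a generic illustration, not Granite 3.0’s actual implementation; the base frequency of 10000 is the common convention and an assumption here.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, dim) by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies (standard RoPE convention, assumed here)
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Relative positions become rotations, so attention scores depend on offsets
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Example: rotate 8 token vectors of width 64
q_rot = apply_rope(torch.randn(8, 64))
```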
The Granite 3.0 models are hosted on Hugging Face, requiring torch, accelerate, and transformers libraries. Run the following commands to set up the environment:
# Install required libraries
!pip install torch torchvision torchaudio
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git # Installed from source, since Granite support was not yet in a pip release at the time of writing
Now, load the Granite-3.0-2B-Instruct model and tokenizer with the transformers library. The model is hosted in IBM’s Hugging Face repository, and the AutoModelForCausalLM class handles language generation tasks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define device as 'cuda' if a GPU is available for faster computation
device = "cuda" if torch.cuda.is_available() else "cpu"
# Model and tokenizer paths
model_path = "ibm-granite/granite-3.0-2b-instruct"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Load the model; set device_map based on your setup
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()
The model takes input in a structured chat format. To ensure the prompt is formatted correctly, create a chat list of dictionaries with roles like “user” or “assistant” to distinguish the turns. The model can follow detailed prompts, making it suitable for tool calling and other advanced applications.
# Define a user query in a structured format
chat = [
{ "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
# Prepare the chat data with the required prompts
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
Tokenize the structured chat data for the model. This tokenization step converts the text input into a format the model understands.
# Tokenize the input chat
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
With the input tokenized, use the model to generate a response based on the instruction.
# Generate output tokens with a maximum of 100 new tokens in the response
output = model.generate(**input_tokens, max_new_tokens=100)
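By default, this call decodes greedily. For more varied responses you can pass the standard sampling arguments that `generate` accepts; the values below are illustrative, not settings recommended by IBM.

```python
# Optional: sample instead of greedy decoding (illustrative values)
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,   # enable sampling
    temperature=0.7,  # soften the token distribution
    top_p=0.9,        # nucleus sampling cutoff
)
```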
Finally, decode the generated tokens back into readable text and print the output to see the model’s response.
# Decode and print the response
response = tokenizer.batch_decode(output, skip_special_tokens=True)
print(response[0])
user: Please list one IBM Research laboratory located in the United States. You should only output its name and location.
assistant: 1. IBM Research - Austin, Texas
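Since the following examples repeat the same template → tokenize → generate → decode steps, it can be convenient to wrap them in a small helper. The `ask_granite` function below is just a convenience wrapper written for this article, not part of the model’s API; the examples that follow keep the explicit steps for clarity, but you could use this wrapper instead.

```python
def ask_granite(prompt: str, max_new_tokens: int = 100) -> str:
    """Run one user turn through the chat template and return the decoded reply."""
    chat = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]

print(ask_granite("Please list one IBM Research laboratory located in the United States."))
```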
Here are a few additional examples to explore Granite-3.0-2B-Instruct’s versatility:
Quickly distill lengthy documents into concise summaries, allowing users to grasp the core message without sifting through extensive content.
chat = [
{ "role": "user", "content": " Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=1000)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities.
assistant: Granite-3.0-2B-Instruct is an AI model by IBM, designed to manage multilingual and domain-specific tasks while adhering to general instructions.
Answer questions directly from data sources, providing users with precise information in response to their specific inquiries.
chat = [
{ "role": "user", "content": "What are the capabilities of Granite-3.0-2B-Instruct?" },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: What are the capabilities of Granite-3.0-2B-Instruct?
assistant: 1. Text Generation: Granite-3.0-2B-Instruct can generate human-like text based on the input it receives.
2. Question Answering: It can provide accurate and relevant answers to a wide range of questions.
3. Translation: It can translate text from one language to another.
4. Summarization: It can summarize long pieces of text into shorter, more digestible versions.
5. Sentiment Analysis: It can analyze text
Automatically generate code snippets and entire scripts, accelerating development and making complex programming tasks more accessible.
chat = [
{ "role": "user", "content": "Write a Python function to compute the factorial of a number." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: Write a Python function to compute the factorial of a number.
assistant: Here is the code to compute the factorial of a number:
```python
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result
```
```python
import unittest

class TestFactorial(unittest.TestCase):
    def test_factorial(self):
        self.assertEqual(factorial(0), 1)
        self.assertEqual(factorial(1), 1)
        self.assertEqual(factorial(5), 120)
        self.assertEqual(factorial(10), 3628800)
        with self.assertRaises(ValueError):
            factorial(-5)

if __name__ == '__main__':
    unittest.main(argv=[''], verbosity=2, exit=False)
```
This code defines a function `factorial` that takes an integer `n` as input and returns the factorial of `n`. The function first checks if `n` is less than 0, and if so, raises a `ValueError` since factorial is not defined for negative numbers. If `n` is 0, the function returns 1 since the factorial of 0 is 1. Otherwise, the function initializes a variable `result` to 1 and then uses a for loop to multiply `result` by each integer from 1 to `n` (inclusive). The function finally returns the value of `result`.
The code also includes a unit test class `TestFactorial` that tests the `factorial` function with various inputs and checks that the output is correct. The test class includes a method `test_factorial` that tests the function with different inputs and checks that the output is correct using the `assertEqual` method. The test class also includes a test case that checks that the function raises a `ValueError` when given a negative input. The unit test is run using the `unittest` module.
Note that the output is in markdown format.
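The model card also lists function calling among supported tasks. Below is a hedged sketch of passing a tool schema through the chat template via the `tools` argument that recent transformers releases accept; the `get_stock_price` function and its schema are invented for illustration, and the exact tool-call format Granite renders and returns should be verified against the model card.

```python
# Hypothetical tool schema, invented for illustration
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock symbol, e.g. IBM"}
            },
            "required": ["ticker"],
        },
    },
}]

chat = [{"role": "user", "content": "What is IBM trading at right now?"}]
# Recent transformers releases accept a `tools` argument here
prompt = tokenizer.apply_chat_template(chat, tools=tools, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```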
Reflecting its commitment to ethical AI, IBM has ensured that Granite 3.0 models are built with governance, privacy, and bias mitigation at the forefront. IBM has taken additional steps to maintain transparency by disclosing all training datasets, aligning with its Responsible Use Guide, which outlines the model’s responsible applications and limitations. IBM also offers uncapped indemnity for third-party IP claims, demonstrating confidence in the legal robustness of its models.
Granite 3.0 continues IBM’s legacy of supporting sustainable AI development: the models were trained on Blue Vela, infrastructure powered by renewable energy, underscoring IBM’s commitment to reducing the environmental impact of the AI industry.
IBM plans to extend the capabilities of Granite 3.0 throughout the year, adding features like expanded context windows up to 128K tokens and enhanced multilingual support. These enhancements will increase the model’s adaptability to more complex queries and improve its versatility in global enterprises. In addition, IBM will be introducing multimodal capabilities, enabling Granite 3.0 to handle image-in, text-out tasks, broadening its application to industries like media and retail.
IBM’s Granite-3.0-2B-Instruct is one of the smallest models in the series in terms of parameter count, yet it offers powerful, enterprise-ready capabilities designed to meet the demands of modern business applications. IBM’s open-source tools, flexible licensing, and innovations in model training help developers and data scientists build solutions at lower cost and with improved reliability. The Granite 3.0 series as a whole represents a step forward in practical, enterprise-level AI: it combines strong performance, robust safety measures, and cost-effective scalability, positioning it as a cornerstone for businesses seeking language models tailored to their needs.
Q1. What makes the IBM Granite-3.0 model suitable for enterprise use?
A. IBM Granite-3.0 is optimized for enterprise use with a balance of powerful performance and practical model size. Its dense, decoder-only architecture, robust multilingual support, and cost-efficient scalability make it ideal for diverse business applications.
Q2. How does the IBM Power Scheduler improve training?
A. The IBM Power Scheduler dynamically adjusts learning rates based on training parameters like token count and batch size, allowing the model to train faster without overfitting, thus reducing costs.
Q3. What tasks can Granite-3.0 handle?
A. Granite-3.0 supports tasks like text summarization, classification, entity extraction, code generation, retrieval-augmented generation (RAG), and customer service automation.
Q4. How does IBM promote the responsible use of Granite-3.0?
A. IBM includes a Responsible Use Guide with the model, focused on governance, risk mitigation, and privacy. IBM also discloses training datasets, ensuring transparency around the data used for model training.
Q5. Can enterprises fine-tune Granite-3.0 for their own needs?
A. Yes, using IBM’s InstructLab and the Data Prep Kit, enterprises can fine-tune the model to meet specific needs. InstructLab facilitates phased fine-tuning with synthetic data, making customization easier and more cost-effective.
Q6. Is Granite-3.0 available beyond Hugging Face?
A. Yes, the model is accessible on the IBM Watsonx platform and through partners like Google Vertex AI, Hugging Face, and NVIDIA, enabling flexible deployment options for businesses.