IBM Granite-3.0 Model: A Guide to Model Setup and Usage

Mobarak Inuwa | Last Updated: 29 Oct, 2024
9 min read

IBM’s latest addition to its Granite series, Granite 3.0, marks a significant leap forward in the field of large language models (LLMs). Granite 3.0 provides enterprise-ready, instruction-tuned models that emphasize safety, speed, and cost-efficiency while balancing power and practicality. Built on a foundation of diverse data and refined fine-tuning techniques, the series strengthens IBM’s AI offerings, particularly in domains where precision, security, and adaptability are crucial.

Learning Objectives

  • Gain an understanding of Granite 3.0’s model architecture and its enterprise applications.
  • Learn how to utilize Granite-3.0-2B-Instruct for tasks like summarization, code generation, and Q&A.
  • Explore IBM’s innovations in training techniques that enhance Granite 3.0’s performance and efficiency.
  • Understand IBM’s commitment to open-source transparency and responsible AI development.
  • Discover the role of Granite 3.0 in advancing secure, cost-effective AI solutions across industries.

This article was published as a part of the Data Science Blogathon.

What are Granite 3.0 Models?

At the forefront of the Granite 3.0 lineup is the Granite 3.0 8B Instruct, an instruction-tuned dense decoder-only model designed to deliver high performance for enterprise tasks. Trained in two phases on over 12 trillion tokens spanning multiple natural and programming languages, it is highly versatile. This model is suitable for complex workflows in industries like finance, cybersecurity, and programming, combining general-purpose capabilities with robust task-specific fine-tuning.

Image source: IBM

IBM offers Granite 3.0 under the open-source Apache 2.0 license, ensuring transparency in usage and data handling. The models integrate seamlessly into existing platforms, including IBM’s own Watsonx, Google Cloud Vertex AI, and NVIDIA NIM, enabling accessibility across various environments. This alignment with open-source principles and transparency further reinforces detailed disclosures of training datasets and methodologies, as outlined in the Granite 3.0 technical paper.

Key Features of Granite 3.0

  • Diverse Model Options for Flexible Use: Granite 3.0 includes models such as Granite-3.0-8B-Instruct, Granite-3.0-8B-Base, Granite-3.0-2B-Instruct, and Granite-3.0-2B-Base, providing a range of options based on scale and performance needs.
  • Enhanced Safety through Guardrail Models: The release also includes Granite-Guardian-3.0 models, which offer additional layers of safety for sensitive applications. These models help filter inputs and outputs to meet stringent enterprise standards in regulated sectors like healthcare and finance.
  • Mixture of Experts (MoE) for Latency Reduction: Granite-3.0-3B-A800M-Instruct and other MoE models reduce latency while maintaining high performance, making them ideal for applications with demanding speed requirements.
  • Improved Inference Speed via Speculative Decoding: Granite-3.0-8B-Instruct-Accelerator uses speculative decoding, in which a small draft model proposes several candidate tokens ahead and the main model verifies them in a single forward pass, so multiple tokens can be emitted per step and response time drops (a simplified sketch of the idea follows this list).
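
To make the draft-and-verify idea concrete, here is a minimal sketch of greedy speculative decoding. It is illustrative only, not IBM’s accelerator implementation; target and draft stand for any large/small pair of causal LMs exposing the usual Hugging Face model(input_ids).logits interface.

import torch

@torch.no_grad()
def speculative_greedy(target, draft, input_ids, k=4, max_new_tokens=64):
    # Illustrative sketch, not IBM's implementation: the small draft model
    # proposes k tokens; the large target model verifies them all in one
    # forward pass and keeps the longest prefix it agrees with.
    ids = input_ids
    produced = 0
    while produced < max_new_tokens:
        # 1) Draft k tokens greedily with the small model.
        draft_ids = ids
        for _ in range(k):
            next_tok = draft(draft_ids).logits[:, -1, :].argmax(-1, keepdim=True)
            draft_ids = torch.cat([draft_ids, next_tok], dim=-1)
        proposed = draft_ids[:, ids.shape[1]:]

        # 2) One target forward pass scores every proposed position at once.
        logits = target(draft_ids).logits
        preds = logits[:, ids.shape[1] - 1:-1, :].argmax(-1)  # target's picks

        # 3) Accept the longest prefix where target and draft agree.
        agree = (preds == proposed).int()[0]
        n_ok = int(agree.cumprod(0).sum())

        # 4) Append accepted tokens plus one token from the target itself.
        if n_ok < k:
            fix = preds[:, n_ok:n_ok + 1]              # target's correction
        else:
            fix = logits[:, -1, :].argmax(-1, keepdim=True)
        ids = torch.cat([ids, proposed[:, :n_ok], fix], dim=-1)
        produced += n_ok + 1
    return ids

Because the target model can accept several drafted tokens per forward pass, wall-clock latency drops while the output matches what greedy decoding with the target model alone would produce.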

Enterprise-Ready Performance and Cost Efficiency

Granite 3.0 optimizes enterprise tasks that require high accuracy and security. Researchers rigorously test the models on industry-specific tasks and academic benchmarks, delivering leading performance in several areas:

  • Enterprise-Specific Benchmarks: On IBM’s proprietary RAGBench, which evaluates retrieval-augmented generation tasks, Granite 3.0 performed at the top of its class. This benchmark specifically measures qualities like faithfulness and correctness in model outputs, crucial for applications where factual accuracy is paramount.
  • Specialization in Key Industries: Granite 3.0 shines in sectors such as cybersecurity, where it has been benchmarked against IBM’s proprietary datasets and publicly available cybersecurity standards. This specialization makes it highly suitable for industries with high-stakes data protection needs.
  • Programming and Tool-Calling Proficiency: Granite 3.0 excels in programming-related tasks, such as code generation and function calling. When tested on multiple tool-calling benchmarks, Granite 3.0 outperformed other models in its weight class, making it a valuable asset for applications involving technical support and software development.

Advancements in Model Training Techniques

IBM’s advanced training methodologies have significantly contributed to Granite 3.0’s high performance and efficiency. The Data Prep Kit and IBM Research’s Power Scheduler played crucial roles in streamlining data processing and optimizing model learning.

  • Data Prep Kit: IBM’s Data Prep Kit allows for scalable and streamlined processing of unstructured data, with features like metadata logging and checkpoint capabilities, enabling enterprises to efficiently manage vast datasets.
  • Power Scheduler for Optimal Learning Rates: IBM’s Power Scheduler dynamically adjusts the model’s learning rate based on batch size and token count, keeping training efficient without risking overfitting. This approach speeds convergence to good model weights, minimizing both time and computational cost (a minimal sketch of such a schedule follows this list).
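
As a rough illustration of the idea, here is a minimal sketch of a power-law learning-rate schedule. The functional form and constants below are placeholders chosen for illustration, not IBM’s published Power Scheduler formula.

# Illustrative power-law learning-rate schedule (constants are placeholders)
def power_lr(tokens_seen: int, batch_size: int,
             a: float = 1e-4, b: float = 0.5, lr_max: float = 0.02) -> float:
    # More tokens seen -> smaller steps (power-law decay);
    # larger batches -> proportionally larger steps; capped at lr_max.
    return min(lr_max, a * batch_size / (tokens_seen ** b))

# Example: step size after 1B training tokens with a 4M-token batch
print(power_lr(tokens_seen=1_000_000_000, batch_size=4_000_000))  # ~0.0127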

Granite-3.0-2B-Instruct: Google Colab Guide

Granite-3.0-2B-Instruct is part of IBM’s Granite 3.0 series, developed with a focus on powerful and practical applications for enterprise use. This model strikes a balance between efficient model size and strong performance across diverse business scenarios. IBM Granite models are optimized for speed, safety, and cost-effectiveness, making them ideal for production-scale AI applications. The screenshot below, taken after running inference with the model, shows the GPU usage.

GPU usage without any quantization

The Granite 3.0 models excel in multilingual support, natural language processing (NLP) tasks, and enterprise-specific use cases. The 2B-Instruct model specifically supports summarization, classification, entity extraction, question-answering, retrieval-augmented generation (RAG), and function-calling tasks.

Model Architecture and Training Innovations

IBM’s Granite 3.0 series uses a dense, decoder-only transformer architecture, featuring innovations such as Grouped-Query Attention (GQA) for faster, memory-efficient inference and Rotary Position Embeddings (RoPE) for robust handling of long sequences across its extensive multilingual training data.

Key architecture components include:

  • SwiGLU (Swish-Gated Linear Unit): A gated feed-forward activation that increases the model’s ability to capture complex patterns in natural language.
  • RMSNorm (Root Mean Square Normalization): Enhances training stability and efficiency (minimal sketches of SwiGLU and RMSNorm follow this list).
  • IBM Power Scheduler: Adjusts learning rates based on a power-law equation to optimize training on large datasets, a significant advancement for cost-effective and scalable training.
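
The following is a minimal PyTorch sketch of the SwiGLU and RMSNorm blocks named above. The layer sizes are illustrative, not Granite’s actual hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Normalizes by the root mean square of the features (no mean-centering),
    # which is cheaper and often more stable than LayerNorm.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    # Swish-gated feed-forward block: silu(x @ W_gate) * (x @ W_up) -> W_down
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 16, 512)               # (batch, seq_len, model_dim)
y = SwiGLU(512, 1376)(RMSNorm(512)(x))    # pre-norm, then feed-forward
print(y.shape)                            # torch.Size([2, 16, 512])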

Step 1: Setup (Install Required Libraries)

The Granite 3.0 models are hosted on Hugging Face and require the torch, accelerate, and transformers libraries. Run the following commands to set up the environment:

# Install required libraries
!pip install torch torchvision torchaudio
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git # Since it is not available via pip yet

Step 2: Model and Tokenizer Initialization

Now, load the Granite-3.0-2B-Instruct model and tokenizer from IBM’s repository on Hugging Face using the transformers library. The AutoModelForCausalLM class is used for causal language generation tasks.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define device as 'cuda' if a GPU is available for faster computation
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model and tokenizer paths
model_path = "ibm-granite/granite-3.0-2b-instruct"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load the model; set device_map based on your setup
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()
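
If GPU memory is limited (the earlier screenshot shows usage without any quantization), the model can optionally be loaded in 4-bit precision instead. This is an optional variant using the standard transformers/bitsandbytes integration; it requires pip install bitsandbytes and trades a small amount of accuracy for a much smaller memory footprint.

# Optional: load the model in 4-bit to reduce GPU memory usage
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
)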

Step 3: Input Format for Instruction-based Queries

The model takes input in a structured chat format. To interact with Granite-3.0-2B-Instruct, define the prompt as a list of message dictionaries, using roles such as “user” and “assistant” to distinguish turns. The model responds well to detailed instructions, making it suitable for tool-calling and other advanced applications.

# Define a user query in a structured format
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]

# Prepare the chat data with the required prompts
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
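
The same structure extends to multi-turn conversations: earlier assistant replies are simply included as messages so the model sees the full exchange as context. The conversation below is an illustrative example, stored in a separate variable so it does not overwrite the prompt used in the next steps.

# Multi-turn example: prior assistant replies become part of the context
followup = [
    { "role": "user", "content": "What is retrieval-augmented generation?" },
    { "role": "assistant", "content": "RAG retrieves relevant documents and passes them to the model as extra context." },
    { "role": "user", "content": "Name one enterprise use case." },
]
followup_prompt = tokenizer.apply_chat_template(followup, tokenize=False, add_generation_prompt=True)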

Step 4: Tokenize the Input

Tokenize the structured chat data for the model. This tokenization step converts the text input into a format the model understands.

# Tokenize the input chat
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

Step 5: Generate a Response

With the input tokenized, use the model to generate a response based on the instruction.

# Generate output tokens with a maximum of 100 new tokens in the response
output = model.generate(**input_tokens, max_new_tokens=100)
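
By default, generate decodes greedily. For more varied phrasing, you can optionally enable sampling; the arguments below are standard transformers generation parameters, and the values shown are just reasonable starting points.

# Optional: sample instead of greedy decoding for more varied responses
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,     # enable sampling
    temperature=0.7,    # lower = more deterministic
    top_p=0.9,          # nucleus sampling cutoff
)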

Step 6: Decode and Print the Output

Finally, decode the generated tokens back into readable text and print the output to see the model’s response.

# Decode and print the response
response = tokenizer.batch_decode(output, skip_special_tokens=True)
print(response[0])
user: Please list one IBM Research laboratory located in the United States. You should only output its name and location.
assistant: 1. IBM Research - Austin, Texas
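
The examples in the next section repeat the same template-tokenize-generate-decode steps. If you prefer, you can wrap them in a small convenience helper; ask_granite below is our own illustrative function, not part of the transformers library.

# Convenience wrapper (illustrative, not part of the library)
def ask_granite(prompt: str, max_new_tokens: int = 200) -> str:
    chat = [{ "role": "user", "content": prompt }]
    text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    tokens = tokenizer(text, return_tensors="pt").to(device)
    out = model.generate(**tokens, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(out, skip_special_tokens=True)[0]

print(ask_granite("Name one IBM Research laboratory in Europe."))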

Real-World Applications of Granite 3.0

Here are a few additional examples to explore Granite-3.0-2B-Instruct’s versatility:

Text Summarization

Quickly distill lengthy documents into concise summaries, allowing users to grasp the core message without sifting through extensive content.

chat = [
    { "role": "user", "content": " Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=1000)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities.
assistant Granite-3.0-2B-Instruct is an AI model by IBM, designed to manage multilingual and domain-specific tasks while adhering to general instructions.

Question Answering

Answer questions directly from data sources, providing users with precise information in response to their specific inquiries.

chat = [
    { "role": "user", "content": "What are the capabilities of Granite-3.0-2B-Instruct?" },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user What are the capabilities of Granite-3.0-2B-Instruct?
assistant 1. Text Generation: Granite-3.0-2B-Instruct can generate human-like text based on the input it receives.
2. Question Answering: It can provide accurate and relevant answers to a wide range of questions.
3. Translation: It can translate text from one language to another.
4. Summarization: It can summarize long pieces of text into shorter, more digestible versions.
5. Sentiment Analysis: It can analyze text

Code Generation

Automatically generate code snippets and entire scripts, accelerating development and making complex programming tasks more accessible.

chat = [
    { "role": "user", "content": "Write a Python function to compute the factorial of a number." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user Write a Python function to compute the factorial of a number.
assistant Here is the code to compute the factorial of a number:

```python
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result
```

```python
import unittest

class TestFactorial(unittest.TestCase):
    def test_factorial(self):
        self.assertEqual(factorial(0), 1)
        self.assertEqual(factorial(1), 1)
        self.assertEqual(factorial(5), 120)
        self.assertEqual(factorial(10), 3628800)
        with self.assertRaises(ValueError):
            factorial(-5)

if __name__ == '__main__':
    unittest.main(argv=[''], verbosity=2, exit=False)
```

This code defines a function `factorial` that takes an integer `n` as input and returns the factorial of `n`. The function first checks if `n` is less than 0, and if so, raises a `ValueError` since factorial is not defined for negative numbers. If `n` is 0, the function returns 1 since the factorial of 0 is 1. Otherwise, the function initializes a variable `result` to 1 and then uses a for loop to multiply `result` by each integer from 1 to `n` (inclusive). The function finally returns the value of `result`.

The code also includes a unit test class `TestFactorial` that tests the `factorial` function with various inputs and checks that the output is correct. The test class includes a method `test_factorial` that tests the function with different inputs and checks that the output is correct using the `assertEqual` method. The test class also includes a test case that checks that the function raises a `ValueError` when given a negative input. The unit test is run using the `unittest` module.

Note that the output is in markdown format.

Responsible AI and Open Source Commitment

Reflecting its commitment to ethical AI, IBM has ensured that Granite 3.0 models are built with governance, privacy, and bias mitigation at the forefront. IBM has taken additional steps to maintain transparency by disclosing all training datasets, aligning with its Responsible Use Guide, which outlines the model’s responsible applications and limitations. IBM also offers uncapped indemnity for third-party IP claims, demonstrating confidence in the legal robustness of its models.

Image source: IBM

Granite 3.0 models continue IBM’s legacy of supporting sustainable AI development: they were trained on Blue Vela, a computing infrastructure powered by renewable energy, underscoring IBM’s commitment to reducing the AI industry’s environmental impact.

Future Developments and Expanding Capabilities

IBM plans to extend the capabilities of Granite 3.0 throughout the year, adding features like expanded context windows up to 128K tokens and enhanced multilingual support. These enhancements will increase the model’s adaptability to more complex queries and improve its versatility in global enterprises. In addition, IBM will be introducing multimodal capabilities, enabling Granite 3.0 to handle image-in, text-out tasks, broadening its application to industries like media and retail.

Conclusion

IBM’s Granite-3.0-2B-Instruct is among the smallest models in the series by parameter count, yet it offers powerful, enterprise-ready capabilities designed to meet the demands of modern business applications. IBM’s open-source tools, flexible licensing, and innovations in model training can help developers and data scientists build solutions with lower costs and improved reliability. The entire IBM Granite 3.0 series represents a step forward in practical, enterprise-level AI: it combines strong performance, robust safety measures, and cost-effective scalability, positioning it as a cornerstone for businesses seeking sophisticated language models tailored to their unique needs.

Key Takeaways

  • Efficiency and Scalability: Granite-3.0-2B-Instruct provides high performance with a cost-effective and scalable model size, ideal for enterprise AI solutions.
  • Transparency and Safety: The model’s open-source design under Apache 2.0 and IBM’s Responsible Use Guide reflect a commitment to safety, transparency, and ethical AI use.
  • Advanced Multilingual Support: With training across 12 languages, Granite-3.0-2B-Instruct offers broad applicability in diverse business environments globally.


Frequently Asked Questions

Q1. What makes IBM Granite-3.0 Model unique compared to other large language models?

A. IBM Granite-3.0 Model is optimized for enterprise use with a balance of powerful performance and practical model size. Its dense, decoder-only architecture, robust multilingual support, and cost-efficient scalability make it ideal for diverse business applications.

Q2. How does the IBM Power Scheduler improve training efficiency?

A. The IBM Power Scheduler dynamically adjusts learning rates based on training parameters like token count and batch size, allowing the model to train faster without overfitting, thus reducing costs.

Q3. What tasks can Granite-3.0 be used for in natural language processing?

A. Granite-3.0 supports tasks like text summarization, classification, entity extraction, code generation, retrieval-augmented generation (RAG), and customer service automation.

Q4. How does Granite-3.0 ensure data safety and ethical use?

A. IBM includes a Responsible Use Guide with the model, focused on governance, risk mitigation, and privacy. IBM also discloses training datasets, ensuring transparency around the data used for model training.

Q5. Can Granite-3.0 be fine-tuned for specific industries?

A. Yes, using IBM’s InstructLab and the Data Prep Kit, enterprises can fine-tune the model to meet specific needs. InstructLab facilitates phased fine-tuning with synthetic data, making customization easier and more cost-effective.

Q6. Is Granite-3.0 available on cloud platforms for easier access?

A. Yes, the model is accessible on the IBM Watsonx platform and through partners like Google Vertex AI, Hugging Face, and NVIDIA, enabling flexible deployment options for businesses.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

I am an AI Engineer with a deep passion for research, and solving complex problems. I provide AI solutions leveraging Large Language Models (LLMs), GenAI, Transformer Models, and Stable Diffusion.
