IBM’s latest addition to its Granite series, Granite 3.0, marks a significant leap forward in large language models (LLMs). Granite 3.0 provides enterprise-ready, instruction-tuned models that emphasize safety, speed, and cost-efficiency while balancing power and practicality. Built on a foundation of diverse data and refined fine-tuning techniques, the series strengthens IBM’s AI offerings, particularly in domains where precision, security, and adaptability are crucial.
At the forefront of the Granite 3.0 lineup is Granite 3.0 8B Instruct, an instruction-tuned, dense, decoder-only model designed to deliver high performance on enterprise tasks. Trained in two phases on over 12 trillion tokens spanning many natural and programming languages, it is highly versatile. The model suits complex workflows in industries like finance, cybersecurity, and programming, combining general-purpose capabilities with robust task-specific fine-tuning.
IBM offers Granite 3.0 under the open-source Apache 2.0 license, ensuring transparency in usage and data handling. The models integrate seamlessly into existing platforms, including IBM’s own Watsonx, Google Cloud Vertex AI, and NVIDIA NIM, making them accessible across various environments. This commitment to open-source principles is reinforced by detailed disclosures of training datasets and methodologies in the Granite 3.0 technical paper.
Granite 3.0 is optimized for enterprise tasks that require high accuracy and security. IBM has rigorously tested the models on industry-specific tasks and academic benchmarks, where they deliver leading performance in several areas.
IBM’s advanced training methodologies contributed significantly to Granite 3.0’s performance and efficiency. Tools such as the Data Prep Kit and IBM Research’s Power Scheduler played crucial roles in optimizing data processing and model learning; a rough sketch of the scheduling idea follows below.
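To make the scheduler idea concrete, here is a minimal sketch of a power-law learning-rate schedule with linear warmup, which decays the rate as a power of the number of tokens seen and scales with batch size. This is an illustrative approximation only: the function name `power_lr` and every constant in it are assumptions for demonstration, not IBM’s published Power Scheduler hyperparameters or implementation.

```python
# Illustrative power-law LR schedule with linear warmup.
# NOTE: a, b, and warmup_tokens are made-up demonstration values,
# NOT IBM's published Power Scheduler settings.
def power_lr(tokens_seen: int, batch_size: int = 1024,
             a: float = 4.6, b: float = 0.51,
             warmup_tokens: int = 1_000_000) -> float:
    peak = a * batch_size * warmup_tokens ** (-b)
    if tokens_seen < warmup_tokens:
        return peak * tokens_seen / warmup_tokens  # linear warmup to peak
    return a * batch_size * tokens_seen ** (-b)    # power-law decay in tokens
```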
Granite-3.0-2B-Instruct is part of IBM’s Granite 3.0 series, developed with a focus on powerful yet practical enterprise applications. The model strikes a balance between compact size and strong performance across diverse business scenarios. IBM Granite models are optimized for speed, safety, and cost-effectiveness, making them well suited to production-scale AI applications. The screenshot below was taken after running inference with the model.
The Granite 3.0 models excel in multilingual support, natural language processing (NLP) tasks, and enterprise-specific use cases. The 2B-Instruct model specifically supports summarization, classification, entity extraction, question-answering, retrieval-augmented generation (RAG), and function-calling tasks.
IBM’s Granite 3.0 series uses a decoder-only dense transformer architecture, featuring innovations such as Grouped Query Attention (GQA) and Rotary Position Embeddings (RoPE) for handling extensive multilingual data. GQA cuts the memory cost of attention by sharing key/value heads across groups of query heads, while RoPE encodes token positions as rotations of query and key vectors, sketched below.
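To build intuition for RoPE, here is a minimal, textbook-style sketch of applying rotary position embeddings to a sequence of query or key vectors. This is a generic illustration, not Granite 3.0’s actual implementation; the base frequency of 10000 is the common convention and an assumption here.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, dim) by position-dependent angles."""
    seq_len, dim = x.shape
    half = dim // 2
    # Per-pair rotation frequencies (standard RoPE convention, assumed here)
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Relative positions become rotations, so attention scores depend on offsets
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Example: rotate 8 token vectors of width 64
q_rot = apply_rope(torch.randn(8, 64))
```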
The Granite 3.0 models are hosted on Hugging Face, requiring torch, accelerate, and transformers libraries. Run the following commands to set up the environment:
# Install required libraries
!pip install torch torchvision torchaudio
!pip install accelerate
!pip install git+https://github.com/huggingface/transformers.git # Installed from source, since Granite support was not yet in a pip release at the time of writing
Now, load the Granite-3.0-2B-Instruct model and tokenizer with the transformers library. The model is hosted in IBM’s Hugging Face repository, and the AutoModelForCausalLM class handles language generation tasks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Define device as 'cuda' if a GPU is available for faster computation
device = "cuda" if torch.cuda.is_available() else "cpu"
# Model and tokenizer paths
model_path = "ibm-granite/granite-3.0-2b-instruct"
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
# Load the model; set device_map based on your setup
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()
The model takes input in a structured chat format. To ensure the prompt is formatted correctly, create a chat list of dictionaries with roles like “user” or “assistant” to distinguish the turns. The model can follow detailed prompts, making it suitable for tool calling and other advanced applications.
# Define a user query in a structured format
chat = [
{ "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
# Prepare the chat data with the required prompts
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
Tokenize the structured chat data for the model. This tokenization step converts the text input into a format the model understands.
# Tokenize the input chat
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
With the input tokenized, use the model to generate a response based on the instruction.
# Generate output tokens with a maximum of 100 new tokens in the response
output = model.generate(**input_tokens, max_new_tokens=100)
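By default, this call decodes greedily. For more varied responses you can pass the standard sampling arguments that `generate` accepts; the values below are illustrative, not settings recommended by IBM.

```python
# Optional: sample instead of greedy decoding (illustrative values)
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,   # enable sampling
    temperature=0.7,  # soften the token distribution
    top_p=0.9,        # nucleus sampling cutoff
)
```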
Finally, decode the generated tokens back into readable text and print the output to see the model’s response.
# Decode and print the response
response = tokenizer.batch_decode(output, skip_special_tokens=True)
print(response[0])
user: Please list one IBM Research laboratory located in the United States. You should only output its name and location.
assistant: 1. IBM Research - Austin, Texas
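Since the following examples repeat the same template → tokenize → generate → decode steps, it can be convenient to wrap them in a small helper. The `ask_granite` function below is just a convenience wrapper written for this article, not part of the model’s API; the examples that follow keep the explicit steps for clarity, but you could use this wrapper instead.

```python
def ask_granite(prompt: str, max_new_tokens: int = 100) -> str:
    """Run one user turn through the chat template and return the decoded reply."""
    chat = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]

print(ask_granite("Please list one IBM Research laboratory located in the United States."))
```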
Here are a few additional examples to explore Granite-3.0-2B-Instruct’s versatility:
Quickly distill lengthy documents into concise summaries, allowing users to grasp the core message without sifting through extensive content.
chat = [
{ "role": "user", "content": " Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=1000)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: Summarize the following paragraph: Granite-3.0-2B-Instruct is developed by IBM for handling multilingual and domain-specific tasks with general instruction following capabilities.
assistant: Granite-3.0-2B-Instruct is an AI model by IBM, designed to manage multilingual and domain-specific tasks while adhering to general instructions.
Answer questions directly from data sources, providing users with precise information in response to their specific inquiries.
chat = [
{ "role": "user", "content": "What are the capabilities of Granite-3.0-2B-Instruct?" },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: What are the capabilities of Granite-3.0-2B-Instruct?
assistant: 1. Text Generation: Granite-3.0-2B-Instruct can generate human-like text based on the input it receives.
2. Question Answering: It can provide accurate and relevant answers to a wide range of questions.
3. Translation: It can translate text from one language to another.
4. Summarization: It can summarize long pieces of text into shorter, more digestible versions.
5. Sentiment Analysis: It can analyze text
Automatically generate code snippets and entire scripts, accelerating development and making complex programming tasks more accessible.
chat = [
{ "role": "user", "content": "Write a Python function to compute the factorial of a number." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
user: Write a Python function to compute the factorial of a number.
assistant: Here is the code to compute the factorial of a number:
```python
def factorial(n: int) -> int:
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    elif n == 0:
        return 1
    else:
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result
```
```python
import unittest

class TestFactorial(unittest.TestCase):
    def test_factorial(self):
        self.assertEqual(factorial(0), 1)
        self.assertEqual(factorial(1), 1)
        self.assertEqual(factorial(5), 120)
        self.assertEqual(factorial(10), 3628800)
        with self.assertRaises(ValueError):
            factorial(-5)

if __name__ == '__main__':
    unittest.main(argv=[''], verbosity=2, exit=False)
```
This code defines a function `factorial` that takes an integer `n` as input and returns the factorial of `n`. The function first checks if `n` is less than 0, and if so, raises a `ValueError` since factorial is not defined for negative numbers. If `n` is 0, the function returns 1 since the factorial of 0 is 1. Otherwise, the function initializes a variable `result` to 1 and then uses a for loop to multiply `result` by each integer from 1 to `n` (inclusive). The function finally returns the value of `result`.
The code also includes a unit test class `TestFactorial` that tests the `factorial` function with various inputs and checks that the output is correct. The test class includes a method `test_factorial` that tests the function with different inputs and checks that the output is correct using the `assertEqual` method. The test class also includes a test case that checks that the function raises a `ValueError` when given a negative input. The unit test is run using the `unittest` module.
Note that the output is in markdown format.
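The model card also lists function calling among supported tasks. Below is a hedged sketch of passing a tool schema through the chat template via the `tools` argument that recent transformers releases accept; the `get_stock_price` function and its schema are invented for illustration, and the exact tool-call format Granite renders and returns should be verified against the model card.

```python
# Hypothetical tool schema, invented for illustration
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock symbol, e.g. IBM"}
            },
            "required": ["ticker"],
        },
    },
}]

chat = [{"role": "user", "content": "What is IBM trading at right now?"}]
# Recent transformers releases accept a `tools` argument here
prompt = tokenizer.apply_chat_template(chat, tools=tools, tokenize=False, add_generation_prompt=True)
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```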
Reflecting its commitment to ethical AI, IBM has ensured that Granite 3.0 models are built with governance, privacy, and bias mitigation at the forefront. IBM has taken additional steps to maintain transparency by disclosing all training datasets, aligning with its Responsible Use Guide, which outlines the model’s responsible applications and limitations. IBM also offers uncapped indemnity for third-party IP claims, demonstrating confidence in the legal robustness of its models.
Granite 3.0 continues IBM’s legacy of supporting sustainable AI development: the models were trained on Blue Vela, infrastructure powered by renewable energy, underscoring IBM’s commitment to reducing the environmental impact of the AI industry.
IBM plans to extend the capabilities of Granite 3.0 throughout the year, adding features like expanded context windows up to 128K tokens and enhanced multilingual support. These enhancements will increase the model’s adaptability to more complex queries and improve its versatility in global enterprises. In addition, IBM will be introducing multimodal capabilities, enabling Granite 3.0 to handle image-in, text-out tasks, broadening its application to industries like media and retail.
IBM’s Granite-3.0-2B-Instruct is one of the smallest models in the series in terms of parameter count, yet it offers powerful, enterprise-ready capabilities designed to meet the demands of modern business applications. IBM’s open-source tools, flexible licensing, and innovations in model training help developers and data scientists build solutions at lower cost and with improved reliability. The Granite 3.0 series as a whole represents a step forward in practical, enterprise-level AI: it combines strong performance, robust safety measures, and cost-effective scalability, positioning it as a cornerstone for businesses seeking language models tailored to their needs.
Q1. What makes the IBM Granite-3.0 model suitable for enterprise use?
A. IBM Granite-3.0 is optimized for enterprise use with a balance of powerful performance and practical model size. Its dense, decoder-only architecture, robust multilingual support, and cost-efficient scalability make it ideal for diverse business applications.
Q2. How does the IBM Power Scheduler improve training?
A. The IBM Power Scheduler dynamically adjusts learning rates based on training parameters like token count and batch size, allowing the model to train faster without overfitting, thus reducing costs.
Q3. What tasks can Granite-3.0 handle?
A. Granite-3.0 supports tasks like text summarization, classification, entity extraction, code generation, retrieval-augmented generation (RAG), and customer service automation.
Q4. How does IBM promote the responsible use of Granite-3.0?
A. IBM includes a Responsible Use Guide with the model, focused on governance, risk mitigation, and privacy. IBM also discloses training datasets, ensuring transparency around the data used for model training.
Q5. Can enterprises fine-tune Granite-3.0 for their own needs?
A. Yes, using IBM’s InstructLab and the Data Prep Kit, enterprises can fine-tune the model to meet specific needs. InstructLab facilitates phased fine-tuning with synthetic data, making customization easier and more cost-effective.
Q6. Is Granite-3.0 available beyond Hugging Face?
A. Yes, the model is accessible on the IBM Watsonx platform and through partners like Google Vertex AI, Hugging Face, and NVIDIA, enabling flexible deployment options for businesses.