How to Use Falcon 3-7B Instruct?

Maigari David | Last Updated: 16 Jan, 2025

TII’s ambition to redefine AI has moved to the next level with the advanced Falcon 3. This latest-generation release sets a performance benchmark that makes a big statement about open-source AI models. 

The Falcon 3 model’s lightweight design redefines how we communicate with technology. Its ability to run smoothly on small devices and great context-handling capabilities make this model’s release a major leap forward in advanced AI models. 

Falcon 3’s expanded training data, at 14 trillion tokens, is a significant improvement, more than double Falcon 2’s 5.5 trillion. Its high performance and efficiency are therefore not in doubt.

Learning Objectives 

  • Understand the key features and improvements of the Falcon 3 model.
  • Learn how Falcon 3’s architecture enhances performance and efficiency.
  • Explore the different model sizes and their use cases.
  • Gain insight into Falcon 3’s capabilities in text generation and task-specific applications.
  • Discover the potential of Falcon 3’s upcoming multimodal functionalities.

This article was published as a part of the Data Science Blogathon.

Family of Falcon 3: Different Model Sizes and Versions

The model comes in different sizes: Falcon 3-1B, -3B, -7B, and -10B. Each size has a base model and an instruct model for conversational applications. We will be running the -7B instruct version, but it is worth knowing the different models in the Falcon 3 family.
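
For quick reference, here is a minimal sketch of how any member of the family can be loaded by name. The repository IDs are assumed to follow TII’s ‘tiiuae/Falcon3-<size>-<variant>’ naming on Hugging Face; verify the exact IDs on the hub before running.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical helper: pick a size ("1B", "3B", "7B", "10B") and a variant
# ("Base" or "Instruct") and build the assumed Hugging Face repository ID.
def load_falcon3(size="7B", variant="Instruct"):
    model_id = f"tiiuae/Falcon3-{size}-{variant}"  # assumed naming scheme
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return model, tokenizer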

TII has worked to make the model broadly compatible: it works with standard APIs and libraries, so users can enjoy easy integrations, and quantized versions are also available. This release additionally includes special editions for English, French, Portuguese, and Spanish.

Note: The models listed above can also handle common languages. 

Also read: Experience Advanced AI Anywhere with Falcon 3’s Lightweight Design

Model Architecture of Falcon 3 

This model is built on a decoder-only architecture that combines Flash Attention 2 with grouped query attention (GQA). GQA shares key and value parameters across attention heads, minimizing memory use to ensure efficient operation during inference.

Another vital part of this model’s architecture is its tokenizer, which supports a vocabulary of 131K tokens, twice that of Falcon 2. This offers superior compression and enhanced performance while retaining the capacity to handle diverse tasks.

Falcon 3 is also capable of long-context training: it was trained natively with a 32K context window, so it can process long and complex inputs.
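
You can check these architectural details yourself by inspecting the model’s configuration. Below is a minimal sketch; the attribute names follow the standard transformers conventions for Llama-style configs, which is an assumption here.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")

# Attribute names assumed from standard transformers conventions:
print(config.vocab_size)               # tokenizer vocabulary (~131K tokens)
print(config.max_position_embeddings)  # native context window (~32K)
print(config.num_attention_heads)      # total query heads
print(config.num_key_value_heads)      # fewer KV heads => grouped query attention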


A key attribute of this model is that it stays functional even in low-resource environments, because TII engineered it for efficiency through quantization. Falcon 3 therefore ships in several quantized versions (int4, int8, and a 1.58-bit variant).
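
If you prefer not to use TII’s pre-quantized checkpoints, you can get a similar memory saving with on-the-fly quantization. Here is a minimal sketch using the bitsandbytes integration in transformers; note this is a generic 4-bit loading recipe, not TII’s own quantization pipeline.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Generic 4-bit quantization config; requires the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)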

(Image: TII Falcon 3 benchmark comparison)

Performance Benchmark 

Compared to other small LLMs, Falcon 3 leads on various benchmarks. It ranks higher than other open-source models on the Hugging Face leaderboards, such as Llama, and in terms of robust functionality it surpasses Qwen’s performance threshold.

The instruct version of Falcon 3 also ranks among the global leaders. Its adaptability to different fine-tuned versions makes it stand out and makes it a leading performer for building conversational and task-specific applications.

Falcon 3’s innovative design is another driver of its outstanding performance. Its scalable, diverse versions mean users of all kinds can deploy it, and its resource-efficient deployment lets it beat various other benchmarks.

Falcon 3: Multimodal Capabilities for 2025

TII plans to expand this model’s capabilities with multimodal functionalities, so we could see more applications involving images, video, and voice processing. Multimodal functionality would mean Falcon 3-based models could use text to generate images and videos, and TII also plans to enable models that support voice processing. All of these capabilities could be valuable for researchers, developers, and businesses.

This could be groundbreaking, considering this model was designed for developers, businesses, and researchers. It could also be a foundation for creating more industry applications that foster creativity and innovation. 

Examples of Multimodal Capabilities

There are lots of capabilities in multimodal applications. A good example of this is visual question answering. This application can help you provide answers to questions using visual content like images and videos. 

Voice processing is another good application of multimodal functionality: models can generate speech from text or transcribe speech into text. Image-to-text and text-to-image are also strong use cases, powering search applications and enabling seamless integration between modalities.

Multimodal models have a wide range of use cases; other applications may include image segmentation and generative AI.

How to Use Falcon 3-7B Instruct?

Running this model is flexible: you can perform text generation, conversation, or chat tasks. We will try one text input to show its ability to handle long-context inputs.

Importing Necessary Libraries

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

Importing ‘torch’ brings in PyTorch, which powers the deep learning computation and lets us run the model on a GPU.
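
Before loading the model, it is worth confirming that a GPU is actually available, since the rest of this walkthrough assumes one. A small sketch:

# Fall back to CPU if no CUDA device is present (generation will be slow).
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")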

Loading Pre-trained Model

‘AutoModelForCausalLM’ provides an interface for loading pre-trained causal language models, i.e., models that generate text sequentially. ‘AutoTokenizer’, on the other hand, loads a tokenizer compatible with the Falcon 3 model.

Initializing the Pre-trained Model

# The 1.58-bit quantized variant of Falcon 3-7B Instruct
model_id = "tiiuae/Falcon3-7B-Instruct-1.58bit"

# Load the weights in bfloat16 and move the model to the GPU
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
).to("cuda")

‘model_id’ identifies the model we want to load, in this case the 1.58-bit quantized Falcon 3-7B Instruct. We then fetch the weights and configuration from Hugging Face, computing in ‘bfloat16’ for efficient GPU performance, and move the model to the GPU for accelerated processing during inference.
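
If the model does not fit comfortably on a single GPU, an alternative (assuming the accelerate package is installed) is to let transformers place the weights automatically:

# Alternative loading: device_map="auto" shards the weights across available
# devices instead of an explicit .to("cuda"); requires accelerate.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(f"Approximate memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")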

Text Processing and Input

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)


# Define input prompt
input_prompt = "Explain the concept of reinforcement learning in simple terms:"


# Tokenize the input prompt
inputs = tokenizer(input_prompt, return_tensors="pt").to("cuda")

After loading the tokenizer associated with the model, you can now input the prompt for text generation. The input prompt is tokenized, converting it into a format compatible with the model. The resulting tokenized input is then moved to the GPU (“cuda”) for efficient processing during text generation.
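
Since this is an instruct model, you can also wrap the prompt in the model’s chat template rather than passing raw text. A minimal sketch using the standard transformers chat-template API:

# Format the prompt as a chat turn; the tokenizer applies the model's
# built-in chat template before tokenizing.
messages = [{"role": "user", "content": input_prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
    return_dict=True,  # return input_ids and attention_mask like tokenizer()
).to("cuda")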

Generating Text

output = model.generate(
   **inputs,
   max_length=200,  # Maximum length of generated text
   num_return_sequences=1,  # Number of sequences to generate
   temperature=0.7,  # Controls randomness; lower values make it more deterministic
   top_p=0.9,  # Nucleus sampling; use only top 90% probability tokens
   top_k=50,  # Consider the top 50 tokens
   do_sample=True,  # Enable sampling for more diverse outputs
)

This code generates text from the tokenized input, with the output sequence capped at a maximum length of 200 tokens. With parameters like ‘temperature’ and ‘top_p’, you can control the diversity and randomness of the output. These settings let you be creative and set the tone for your text output, making this model customizable and balanced.
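
If you need reproducible outputs instead, for example when testing, you can switch sampling off entirely. A quick sketch of a deterministic (greedy) configuration:

# Deterministic alternative: greedy decoding always picks the most probable
# next token, so repeated runs produce identical output.
output = model.generate(
    **inputs,
    max_length=200,
    do_sample=False,  # disables temperature/top_p/top_k sampling
)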

Decoding the Output

# Decode the output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated text
print(generated_text)

In this step, we first decode the output into human-readable text using the ‘decode’ method. Then, we print the decoded text to display the model’s generated response.
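
Note that decoding output[0] returns the prompt followed by the completion. If you want only the newly generated text, you can slice off the prompt tokens first:

# Keep only the tokens generated after the prompt.
prompt_length = inputs["input_ids"].shape[1]
new_tokens = output[0][prompt_length:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))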


Here is the result of running this with Falcon 3. This shows how the model understands and handles context when generating output. 

(Screenshot: Falcon 3’s generated output)

Beyond text generation, this model also offers significant capabilities across science and other industries.

Applications and Limitations of Falcon 3

These are some major attributes of the Falcon 3 model: 

  • Extended context handling of up to 32K tokens shows its ability to tackle diverse, task-specific problems. 
  • Falcon 3 has also shown great promise in solving complex math problems, especially the Falcon 3-10B base model. 
  • Falcon 3-10B and its instruct version both demonstrate high code proficiency and can perform general programming tasks (see the sketch after this list).
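
To illustrate the math and coding strengths above, the same generation pipeline can simply be pointed at a different prompt. A brief sketch reusing the model and tokenizer loaded earlier (the prompt itself is just an illustrative example):

# Reuse the loaded model/tokenizer with a task-specific prompt.
math_prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(math_prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_length=150, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))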

Limitations 

  • Falcon 3 officially supports English, French, Spanish, and Portuguese, which can limit the model’s global accessibility.
  • The model does not yet offer multimodal functionality, which limits researchers or developers exploring that area. However, this part of Falcon 3 is planned for development.

Conclusion

Falcon 3 is a testament to TII’s dedication to advancing open-source AI. It offers cutting-edge performance, versatility, and efficiency. With its extended context handling, robust architecture, and diverse applications, Falcon 3 is poised to transform text generation, programming, and scientific problem-solving. With a promising future built on incoming multimodal functionalities, this model will be a significant one to watch.

Key Takeaways

Here are some highlights from our breakdown of Falcon 3: 

  • Improved reasoning features and expanded training data give this model better context handling than Falcon 2. 
  • This model’s resource-efficient design makes it lightweight, supporting quantization in low-resource environments. Its compatibility with APIs and libraries makes deployment easy and integration seamless.
  • The versatility of Falcon 3 in maths, code, and general context handling is impressive. The planned multimodal functionality is also an exciting prospect for researchers. 


Frequently Asked Questions

Q1. What are the key features of Falcon 3?

A. Key features include its lightweight, optimized architecture, advanced tokenization with an expanded vocabulary, and extended context handling. 

Q2. How does Falcon 3 compare to other open-source LLMs?

A. Falcon 3 outperforms other models like Llama and Qwen on various benchmarks. Its instruct version ranks as the global leader in creating conversational and task-specific applications, showcasing exceptional versatility.

Q3. What are some of the applications of Falcon 3?

A. This model can handle text generation, complex maths problems, and programming tasks. It was designed for developers, researchers, and businesses. 

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hey there! I'm David Maigari, a dynamic professional with a passion for technical writing, web development, and the AI world. I'm also an enthusiast of data science and AI innovations.
