From GPT to Mistral-7B: The Exciting Leap Forward in AI Conversations

Suvojit Last Updated : 03 Nov, 2023

Introduction

The field of artificial intelligence has seen remarkable advancements in recent years, particularly in the area of large language models (LLMs). LLMs can generate human-like text, summarize documents, and write software code. Mistral-7B is a recent large language model that supports English text and code generation, and it can be used for tasks such as text summarization, classification, text completion, and code completion.

What sets Mistral-7B-Instruct apart is its ability to deliver strong performance with relatively few parameters, making it a high-performing and cost-effective option. The model gained popularity after benchmark results showed that it not only outperforms all 7B models on MT-Bench but also competes favorably with 13B chat models. In this blog, we will explore the features and capabilities of Mistral 7B, including its use cases, performance, and a hands-on guide to fine-tuning the model.

Learning Objectives

Understand how large language models and Mistral 7B work
Architecture of Mistral 7B and benchmarks
Use cases of Mistral 7B and how it performs
Deep dive into code for inference and fine-tuning

This article was published as a part of the Data Science Blogathon.

What are Large Language Models?
Mistral 7B Architecture
Mistral 7B in Google Colab
Use Cases
Custom Instructions
Fine-tuning Mistral 7B
Frequently Asked Questions

What are Large Language Models?

Large language models are built on the transformer architecture, which uses attention mechanisms to capture long-range dependencies in data. A model stacks multiple transformer blocks, each containing multi-head self-attention and a feed-forward neural network. These models are pre-trained on text data by learning to predict the next token in a sequence, which captures the patterns of language. The pre-trained weights can then be fine-tuned on specific tasks. We will look specifically at the architecture of the Mistral 7B LLM and what makes it stand out.

Mistral 7B Architecture

The Mistral 7B transformer architecture balances high performance with memory usage, using attention mechanisms and caching strategies to outperform larger models in speed and quality. It uses Sliding Window Attention (SWA) with a window of 4096 tokens: each token attends only to a fixed-size window of preceding tokens rather than to the entire sequence, which keeps the cost of attention manageable on longer sequences.
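The sliding-window rule can be sketched as a boolean attention mask. This is a minimal illustration rather than Mistral's implementation: the function names are made up for this example, and it assumes the window counts the current token plus the tokens immediately before it. A tiny window of 3 is used so the pattern is easy to read; Mistral 7B uses 4096.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and within the last `window` positions (i - j < window).
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def receptive_field(window, num_layers):
    """Upper bound on how far back information can travel through stacked layers."""
    return window * num_layers


mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    # "x" marks an allowed attention edge, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))

# With a 4096-token window and 32 layers, the theoretical span is:
print(receptive_field(window=4096, num_layers=32))
```

Each row has at most three allowed positions, so the cost per token stays constant as the sequence grows, while stacking layers extends the distance over which information can flow.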

Because layers are stacked, a hidden state at a given layer can still draw on tokens beyond its own window: information propagates forward by up to one window per layer, so the reachable distance is determined by the window size and the layer depth. The model also integrates modifications to Flash Attention and xFormers, which the Mistral team reports roughly doubles speed over a vanilla attention baseline. Additionally, a Rolling Buffer Cache keeps the cache at a fixed size for efficient memory usage.
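The rolling buffer idea can be sketched in a few lines. This is an illustrative toy, not the model's actual cache: the class name and the string placeholders standing in for key/value tensors are assumptions for this example. The entry for position i is written to slot i mod W, so older entries are overwritten once the window is full.

```python
class RollingBufferCache:
    """Fixed-size cache that overwrites the oldest entry once the window is full."""

    def __init__(self, window):
        self.window = window
        self.slots = [None] * window
        self.next_position = 0  # absolute position of the next token

    def append(self, item):
        # Position i is stored in slot i mod window.
        self.slots[self.next_position % self.window] = item
        self.next_position += 1

    def contents(self):
        """Return the cached items ordered from oldest to newest."""
        start = max(0, self.next_position - self.window)
        return [self.slots[p % self.window] for p in range(start, self.next_position)]


cache = RollingBufferCache(window=4)
for position in range(6):
    cache.append(f"kv{position}")  # placeholder for a key/value pair

print(cache.slots)       # physical layout after wrap-around
print(cache.contents())  # logical order, oldest first
```

The memory footprint never exceeds the window size, regardless of how many tokens have been processed.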

Figure: Mistral 7B architecture

Mistral 7B in Google Colab

Let’s deep dive into the code and look at running inferences with the Mistral 7B model in Google Colab. We will use the free version with a single T4 GPU and load the model from Hugging Face.

1. Install and import the ctransformers library in Colab.

#install ctransformers with CUDA support
!pip install ctransformers[cuda]

#import
from ctransformers import AutoModelForCausalLM

2. Initialize the model object from Hugging Face and set the necessary parameters. We will use a quantized GGUF build of the model (the Q4_K_M file), since the original weights from Mistral AI may not fit into memory on the free Google Colab tier.

#load the model from Hugging Face and offload up to 50 layers to the GPU
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,
)

3. Define a function that wraps long model output across multiple lines in Google Colab. This step can be skipped or modified if the code is run in a different environment.

#function to print the model output in colab in a readable manner
def colab_print(text, max_width=120):
  words = text.split()
  line = ""
  for word in words:
    #start a new line once the next word would exceed max_width
    if len(line) + len(word) + 1 > max_width:
      print(line)
      line = ""
    line += word + " "
  print(line)

4. Generate text using the model and see the results. Modify the sampling parameters (temperature, top_k, top_p, and repetition_penalty) to change how varied or conservative the generated text is.

#generate text
colab_print(llm('''Give me a well-written paragraph in 5 sentences about a Senior Data 
Scientist (name - Suvojit) who writes blogs on LLMs on Analytics Vidhya. He studied 
Masters in AIML in BITS Pilani and works at AZ Company, with a total of 4 years of 
experience. Start the sentence with - Suvojit is a''', 
max_new_tokens = 2048, temperature = 0.9, top_k = 55, top_p = 0.93, 
repetition_penalty = 1.2))

Model Response: Suvojit is a Senior Data Scientist who has been working for 4 years at AZ company as part of their team focusing on the design, implementation, and improvement of consumer behavior predictive models in their client’s brands and business lines using Limited Memory Machine Learning techniques. He writes about LLMs on Analytics Vidhya which helps him to stay up to date with the latest trends in Data Science. He has a master’s degree in AIML from BITS Pilani, where he studied machine learning algorithms and their applications. Suvojit is passionate about using his knowledge of data science to help businesses make better decisions and improve their performance.
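The generation call above passes temperature, top_k, and top_p. As a rough sketch of how these three settings typically interact, the toy function below applies them to a hand-written set of logits. It is not the ctransformers implementation: the function names and the example logits are made up, the exact order of the filters varies between libraries, and the repetition penalty is left out.

```python
import math
import random


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalised token probabilities left after the three filters."""
    # Temperature rescales logits: values below 1 sharpen, values above 1 flatten.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, weight / total) for tok, weight in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k keeps only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample(distribution, rng=random):
    """Draw one token according to the filtered distribution."""
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}  # illustrative values
narrow = sampling_distribution(toy_logits, temperature=1.0, top_k=2, top_p=1.0)
greedy_like = sampling_distribution(toy_logits, temperature=1.0, top_k=0, top_p=0.5)
print(narrow)
print(greedy_like)
print(sample(narrow))
```

Lowering top_k or top_p shrinks the pool of candidate tokens, which makes output more predictable; raising temperature spreads probability across more candidates.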

Use Cases

Let’s look at some of the use cases of the Mistral 7B in more detail.

Translation

Mistral 7B supports multiple languages for translation. Let’s look at English to French and Spanish translations and check the accuracy.

#translate English to French
colab_print(llm('''Translate this text from English to French:
"Suvojit is a Senior Data Scientist who writes blogs on Analytics Vidhya."''',
                max_new_tokens = 2048, temperature = 0.9, top_k = 55, top_p = 0.93, 
                repetition_penalty = 1.2))

Model Response: The translation would be: “Suvojit est un scientifique des données sénior qui écrit des blogues sur Analytics Vidhya.”

#translate English to Spanish
colab_print(llm('''Translate this text from English to Spanish:
"Suvojit is a Senior Data Scientist who writes blogs on Analytics Vidhya."''',
                max_new_tokens = 2048, temperature = 0.9, top_k = 55, top_p = 0.93, 
                repetition_penalty = 1.2))

Model Response: The translation would be: “Suvojit es un Senior Data Científico que escribe blogs en Analytics Vidhya.”

Summarization

Use Mistral 7B to summarize documents into a shorter version using appropriate summarization instructions.

#Define the prompt with instructions
prompt = '''[INST] You are a helpful code assistant. Your task is to summarize text 
into exactly two short sentences:

Transformers have revolutionized the field of large language models with their ability 
to capture long-range dependencies and intricate patterns in text. Originating 
from the paper "Attention Is All You Need" by Vaswani et al. in 2017, transformers 
introduced the self-attention mechanism, which weighs input tokens differently based 
on their relevance to a given token. This mechanism enables transformers to process
information more flexibly than traditional RNNs or LSTMs. Consequently, models 
like GPT, BERT, and their subsequent iterations have been built on the transformer 
architecture, leading to breakthroughs in numerous natural language processing tasks. 
Their parallel processing capabilities further make them conducive to training on 
vast amounts of data. However, as these models grow in size, concerns about their 
environmental impact and computational demands also emerge. Despite these challenges, 
the transformative power of transformers in language models remains undeniable.

Just summarize the text in exactly 2 short sentences without explanations:
[/INST]'''

#Summarize
colab_print(llm(prompt,
                max_new_tokens = 2048, temperature = 0.9, top_k = 55, top_p = 0.93, 
                repetition_penalty = 1.2))

Model Response: Transformers have revolutionized large language models by capturing long-range dependencies and intricate patterns, introduced a self-attention mechanism that makes them process information flexibly and subsequent implementations include breakthroughs like GPT & BERT. However, as these models grow in size concerns about their environmental impact and computational demands arise, despite these challenges they remain undeniably transformative in language modeling.

Custom Instructions

We can wrap the user input in [INST] and [/INST] tags to give the model explicit instructions and get a particular kind of response. For example, we can generate a JSON object from a text description.

prompt = '''[INST] You are a helpful code assistant. Your task is to generate a valid 
JSON object based on the given information:

My name is Suvojit Hore, working in company AB and my address is AZ Street NY.

Just generate the JSON object without explanations:
[/INST]
'''

colab_print(llm(prompt,
                max_new_tokens = 2048, temperature = 0.9, top_k = 55, top_p = 0.93, 
                repetition_penalty = 1.2))

Model Response: ```json { "name": "Suvojit Hore", "company": "AB", "address": "AZ Street NY" } ```
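Since the response arrives wrapped in a markdown code fence, it usually needs a little cleanup before it can be used as data. The sketch below shows one way to build the instruction prompt and parse the reply; both helper names are hypothetical, and the regular expression simply assumes the reply contains a single JSON object.

```python
import json
import re


def build_inst_prompt(instruction, user_text):
    """Wrap an instruction and the user's text in [INST] ... [/INST] tags."""
    return f"[INST] {instruction}\n\n{user_text}\n[/INST]"


def extract_json(response):
    """Parse the JSON object in a model response, ignoring any surrounding fence."""
    # Greedy match from the first "{" to the last "}" in the response.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    return json.loads(match.group(0))


prompt = build_inst_prompt(
    "You are a helpful code assistant. Your task is to generate a valid JSON object "
    "based on the given information:",
    "My name is Suvojit Hore, working in company AB and my address is AZ Street NY.",
)

# Stand-in for the model output shown above (the fence is built programmatically).
fence = "`" * 3
response = (
    fence
    + 'json { "name": "Suvojit Hore", "company": "AB", "address": "AZ Street NY" } '
    + fence
)
record = extract_json(response)
print(record["name"], "|", record["company"], "|", record["address"])
```

Because sampling is enabled, the model may occasionally return malformed JSON, so production code would want to catch json.JSONDecodeError and retry.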

Fine-tuning Mistral 7B

Let’s look at how we can fine-tune the model using a single GPU on Google Colab. We will use a dataset that converts few-word descriptions about images to detailed and highly descriptive text. These results can be used in Midjourney to generate the specific image. The goal is to train the LLM to act as a prompt engineer for image generation.

Set up the environment and import the necessary libraries in Google Colab:

# Install the necessary libraries
!pip install pandas autotrain-advanced -q
!autotrain setup --update-torch
!pip install -q peft  accelerate bitsandbytes safetensors

#import the necessary libraries
import pandas as pd
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import transformers
from huggingface_hub import notebook_login

Login to Hugging Face from a browser and copy the access token. Use this token to log in to Hugging Face in the notebook.

notebook_login()

Upload the dataset to Colab session storage. We will use the Midjourney dataset.

df = pd.read_csv("prompt_engineering.csv")
df.head(5)

Figure: Prompt engineering dataset
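Before training, the rows need to be in the shape the trainer expects. The sketch below rests on two assumptions that should be checked against your setup: that AutoTrain reads a file named train.csv from the data path with a single text column, and that each example pairs a short description with a detailed prompt. The function names and the sample row are illustrative placeholders, not taken from the dataset.

```python
import csv
import io


def to_training_text(short_description, detailed_prompt):
    """Join one example into a single instruction-formatted training string."""
    return (
        f"[INST] generate a midjourney prompt for {short_description} [/INST] "
        f"{detailed_prompt}"
    )


def write_train_csv(pairs, handle):
    """Write (short_description, detailed_prompt) pairs as a one-column CSV."""
    writer = csv.DictWriter(handle, fieldnames=["text"])
    writer.writeheader()
    for short_description, detailed_prompt in pairs:
        writer.writerow({"text": to_training_text(short_description, detailed_prompt)})


# Placeholder row for illustration only.
example_pairs = [
    ("a red bicycle", "A vintage red bicycle leaning on a brick wall, golden hour light"),
]

# An in-memory buffer keeps the example free of side effects; to create the real
# file, pass open("train.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_train_csv(example_pairs, buffer)
print(buffer.getvalue())
```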

Train the model using AutoTrain with appropriate parameters. Modify the command below to run it with your own Hugging Face repo and user access token.

!autotrain llm --train --project_name mistral-7b-sh-finetuned \
  --model username/Mistral-7B-Instruct-v0.1-sharded --token hf_yiguyfTFtufTFYUTUfuytfuys \
  --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 12 \
  --num_train_epochs 3 --trainer sft --target_modules q_proj,v_proj --push_to_hub \
  --repo_id username/mistral-7b-sh-finetuned
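To get a feel for how little of the model is trained when only q_proj and v_proj are targeted, the arithmetic below counts adapter weights. It assumes the PEFT method is LoRA with a hypothetical rank of 16 (the command above does not state the rank) and uses the published Mistral 7B dimensions: hidden size 4096, 32 layers, and 8 key-value heads of dimension 128.

```python
HIDDEN_SIZE = 4096   # model dimension
NUM_LAYERS = 32      # transformer blocks
NUM_KV_HEADS = 8     # grouped-query attention key/value heads
HEAD_DIM = 128       # dimension per head


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: (rank x in) and (out x rank)."""
    return rank * (in_features + out_features)


def trainable_params(rank):
    """Adapter weights when LoRA is applied to q_proj and v_proj in every layer."""
    q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, rank)
    # v_proj maps the hidden state to the smaller key/value dimension.
    v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, rank)
    return NUM_LAYERS * (q_proj + v_proj)


rank = 16  # hypothetical value chosen for illustration
print(trainable_params(rank))
```

Under these assumptions the adapters hold about 6.8 million weights, a small fraction of the roughly 7 billion frozen base parameters, which is what makes single-GPU fine-tuning feasible.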

Now let’s use the fine-tuned model to run inference and generate some detailed descriptions of the images.

#adapter and model
adapters_name = "suvz47/mistral-7b-sh-finetuned"
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded" 

device = "cuda"

#set the config
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

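#initialize the model (the 4-bit settings are taken from bnb_config, so
#load_in_4bit is not passed again; recent transformers versions reject both together)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    device_map='auto'
)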
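A quick back-of-the-envelope calculation shows why 4-bit loading matters on a single T4. The figures are approximate: the parameter count of about 7.24 billion is rounded, and the estimate covers weights only, ignoring quantization constants, activations, and the attention cache.

```python
NUM_PARAMS = 7.24e9  # approximate parameter count of Mistral 7B


def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)
print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
```

At 16 bits the weights alone nearly fill a 16 GB T4, leaving little room for anything else, while the 4-bit version leaves most of the card free for activations and the LoRA adapter.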

Load the fine-tuned adapter and the tokenizer.

#load the model and tokenizer
model = PeftModel.from_pretrained(model, adapters_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

stop_token_ids = [0]

Generate a detailed and descriptive Midjourney prompt with just a few words.

#prompt
text = "[INST] generate a midjourney prompt in less than 20 words for A computer with an emotional chip [/INST]"

#encode the prompt, generate, and decode
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
#move the inputs to the GPU; the 4-bit model was already placed by device_map='auto'
model_input = encoded.to(device)
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print('\n\n')
print(decoded[0])

Model Response: As the computer with an emotional chip begins to process its emotions, it starts to question its existence and purpose, leading to a journey of self-discovery and self-improvement.

#prompt
text = "[INST] generate a midjourney prompt in less than 20 words for A rainbow chasing its colors [/INST]"

#encode the prompt, generate, and decode
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
#move the inputs to the GPU; the 4-bit model was already placed by device_map='auto'
model_input = encoded.to(device)
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print('\n\n')
print(decoded[0])

Model Response: A rainbow chasing colors finds itself in a desert where the sky is a sea of endless blue, and the colors of the rainbow are scattered in the sand.
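The decoded string returned by batch_decode contains the original prompt followed by the completion, so a small post-processing step is handy when only the generated text is wanted. The helper below is a hypothetical sketch that assumes a single-turn prompt ending in [/INST] and an end-of-sequence marker of </s>; the sample string stands in for real model output.

```python
def extract_completion(decoded, end_tag="[/INST]", eos_token="</s>"):
    """Return only the text generated after the instruction block."""
    _, found, completion = decoded.partition(end_tag)
    if not found:
        # No instruction tag present: treat the whole string as the completion.
        completion = decoded
    return completion.replace(eos_token, "").strip()


# Stand-in for decoded[0]; the completion text here is a placeholder.
sample_output = (
    "[INST] generate a midjourney prompt in less than 20 words for "
    "A rainbow chasing its colors [/INST] A rainbow scattered across desert sand.</s>"
)
print(extract_completion(sample_output))
```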

Conclusion

Mistral 7B has proved to be a significant advancement in the field of Large Language Models. Its efficient architecture, combined with its superior performance, showcases its potential to be a staple for various NLP tasks in the future. This blog provides insights into the model’s architecture, its application, and how one can harness its power for specific tasks like translation, summarization, and fine-tuning for other applications. With the right guidance and experimentation, Mistral 7B could redefine the boundaries of what’s possible with LLMs.

Key Takeaways

Mistral-7B-Instruct excels in performance despite fewer parameters.
It uses Sliding Window Attention for long-sequence optimization.
Modifications to Flash Attention and xFormers roughly double its attention speed.
Rolling Buffer Cache ensures efficient memory management.
Versatile: Handles translation, summarization, structured data generation, text generation and text completion.
Prompt Engineering to add custom instructions can help the model understand the query better and perform several complex language tasks.
Finetune Mistral 7B for any specific language tasks like acting as a prompt engineer.

Frequently Asked Questions

Q1. What is the primary difference between Mistral-7B and other large language models?

A. Mistral-7B is designed for efficiency and performance. While it has fewer parameters than some other models, its architectural advancements, such as the Sliding Window Attention, allow it to deliver outstanding results, even outperforming larger models in specific tasks.

Q2. Is it possible to fine-tune Mistral-7B for custom tasks?

A. Yes, Mistral-7B can be fine-tuned for various tasks. The guide provides an example of fine-tuning the model to convert short text descriptions into detailed prompts for image generation.

Q3. How does the Sliding Window Attention mechanism in Mistral-7B improve its performance?

A. The Sliding Window Attention (SWA) allows the model to handle longer sequences efficiently. With a window size of 4096, SWA optimizes attention operations, enabling Mistral-7B to process lengthy texts without compromising on speed or accuracy.

Q4. Do you need a specific library to run Mistral-7B inferences?

A. Yes, when running Mistral-7B inferences, we recommend using the ctransformers library, especially when working within Google Colab. You can also load the model from Hugging Face for added convenience.

Q5. How can I ensure optimal results when generating outputs with Mistral-7B?

A. It’s crucial to craft detailed instructions in the input prompt. Mistral-7B’s versatility enables it to understand and follow these detailed instructions, ensuring accurate and desired outputs. Proper prompt engineering can significantly enhance the model’s performance.

References

Thumbnail – Generated using Stable Diffusion
Architecture – Paper

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Suvojit

Suvojit is a Senior Data Scientist at DunnHumby. He enjoys exploring new and innovative ideas and techniques in the field of AI and tries to solve real-world machine learning problems by thinking out of the box. He writes about the latest advancements in Computer Vision and Natural Language processing. You can follow him on LinkedIn.


