Mistral Large 2: Powerful Enough to Challenge Llama 3.1 405B?

Ajay 30 Jul, 2024
9 min read

Introduction

Just a few days ago, Meta AI released the new Llama 3.1 family of models. A day after the release, Mistral AI released its largest model so far, called Mistral Large 2. The model is trained on a large corpus of data and is expected to perform on par with current SOTA models like GPT-4o and Claude 3 Opus, sitting just below the open-source Meta Llama 3.1 405B. Like the Meta models, Large 2 is said to excel at multilingual tasks. In this article, we will go through the Mistral Large 2 model and check how well it performs across different tasks.

Learning Objectives

  • Explore Mistral Large 2 and its features.
  • See how well it compares to the current SOTA models.
  • Understand the coding abilities of Large 2 from its generations.
  • Learn to generate structured JSON responses with Large 2.
  • Understand the tool-calling feature of Mistral Large 2.

This article was published as a part of the Data Science Blogathon.

Exploring Mistral Large 2 – Mistral’s Largest Open Model

As the heading suggests, Mistral AI has recently announced the release of its newest and largest model, named Mistral Large 2, just after Meta AI released the Llama 3.1 family of models. Mistral Large 2 is a 123 billion parameter model with 96 attention heads, and, like the Llama 3.1 family of models, it has a context length of 128k tokens.

Similar to the Llama 3.1 family, Mistral Large 2 was trained on diverse data covering many languages, including Hindi, French, Korean, Portuguese, and more, though its multilingual coverage falls just short of the Llama 3.1 405B. The model was also trained on over 80 programming languages, with a focus on Python, C++, JavaScript, C, and Java. The team says that Large 2 is exceptional at following instructions and remembering long conversations.

The major difference between the Llama 3.1 family and the Mistral Large 2 release is their respective licenses. While Llama 3.1 is released for both commercial and research purposes, Mistral Large 2 is released under the Mistral Research License, allowing developers to research it but not use it to build commercial applications. The team assures that developers can work with Mistral Large 2 to create strong Agentic systems, leveraging its exceptional JSON generation and tool-calling skills.

Mistral Large 2 Compared to the Best: A Benchmark Analysis

Mistral Large 2 achieves great results on the HuggingFace Open LLM Benchmarks. On coding tasks, it outperforms the recently released Codestral and Codestral Mamba, and its performance comes close to leading models like GPT-4o, Claude 3 Opus, and Llama 3.1 405B.


On reasoning benchmarks, we can see that Large 2 performs well, falling just short of OpenAI's GPT-4o. Compared to the previously released Mistral Large, Mistral Large 2 beats its predecessor by a huge margin.

On the Multilingual MMLU benchmark, Mistral Large 2 comes very close to Llama 3.1 405B in performance despite being roughly 3 times smaller, and it beats the other reported models across all of the listed languages.

Hands-On with Mistral Large 2: Accessing the Model via API

In this section, we will get an API key from the Mistral website, which will let us access the newly released Mistral Large 2 model. For this, we first need to sign up on the Mistral AI console and verify our mobile number to be able to create an API key. Then we can open the API Keys page to create one.


On the API Keys page, we can create a new API key by clicking the Create new key button. We will create a key and store it safely.

Installing Libraries

Now, we will start by installing the required library.

!pip install -q mistralai

This installs the mistralai library, which is maintained by Mistral AI and lets us access all the models created by the Mistral AI team through the API key we created.

Storing Key in Environment

Next, we will store our key in an environment variable with the below code:

import os
os.environ["MISTRAL_API_KEY"] = "YOUR_API_KEY"

Testing the Model

Now, we will begin the coding part to test the new model.

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# A list of chat messages; here, a single user message
message = [ChatMessage(role="user", content="What is a Large Language Model?")]

# Create the client with the API key we stored earlier
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

# mistral-large-2407 is the model name for Mistral Large 2
response = client.chat(
    model="mistral-large-2407",
    messages=message
)

print(response.choices[0].message.content)
  • We start by importing the MistralClient, which lets us access the model, and the ChatMessage class, with which we create the prompt message.
  • Then we define a list of ChatMessage instances, giving each instance a role, which is user here, and the content, where we ask about LLMs.
  • Then we create an instance of the MistralClient by giving it the API key.
  • Now we call the chat() method of the client object and give it the model name mistral-large-2407, which is the name for Mistral Large 2.
  • We give the list of messages to the messages parameter, and the response variable stores the generated answer.
  • Finally, we print the response. The text is stored in response.choices[0].message.content, which follows the OpenAI style.

Output

Running this produced a well-structured and straight-to-the-point response explaining what a Large Language Model is. We have seen that Mistral Large 2 is claimed to perform well at coding tasks, so let us test the model by asking it a coding-related question.

response = client.chat(
   model="mistral-large-2407",
   messages=[ChatMessage(role="user", content="Create a good looking profile card in css and html")]
)
print(response.choices[0].message.content)

Here, we have asked the model to generate code for a good-looking profile card in CSS and HTML. Mistral Large 2 generated the HTML code, followed by the CSS, and finally explained how it works. It even tells us to replace profile-pic.png with our own photo. Now let us test this in an online web editor.

Rendering the generated code in an online editor produces a good-looking profile card for "John Doe". The styling is impressive, with a rounded photo and a well-chosen color scheme, and the code includes hyperlinks for Twitter, LinkedIn, and GitHub, allowing you to link to the respective URLs. Overall, Mistral Large 2 serves as an excellent coding assistant for developers who are just getting started.

Generating Structured Responses and Tool Calling 

The Mistral AI team has announced that Mistral Large 2 is one of the best choices for creating Agentic workflows, where a task requires multiple Agents and the Agents require multiple tools to solve it. For this, Mistral Large 2 has to be good at two things: generating structured responses in JSON format, and reliably calling different tools.

Testing the Model

Let us test the model by asking it to generate a response in JSON format.

For this, the code will be:

messages = [
    ChatMessage(role="user", content="""Who are the best F1 drivers and which team do they belong to?
    Return the names and the teams in a short JSON object.""")
]


response = client.chat(
   model="mistral-large-2407",
   response_format={"type": "json_object"},
   messages=messages,
)


print(response.choices[0].message.content)

Here, the process for generating a JSON response is very similar to a regular chat completion. We just send a message to the model asking it to generate a JSON response; here, we ask for some of the best F1 drivers along with the teams they drive for. The only difference is that, inside the chat() function, we pass a response_format parameter with a dictionary stating that we need a JSON object.

Running the code

Running the code, we can see that the model has indeed generated a JSON response.

We can validate the JSON response with the below code:

import json

try:
    # Attempt to parse the model's reply; this raises an error if it is not valid JSON
    json.loads(response.choices[0].message.content)
    print("Valid JSON")
except json.JSONDecodeError:
    print("Failed")

Running this prints Valid JSON to the terminal, so Mistral Large 2 is capable of generating valid JSON.
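To actually work with the response, we can parse it into a Python object. Since the exact schema can vary from run to run, this small sketch just inspects what came back rather than assuming particular keys:

import json

# Parse the reply into a Python object (usually a dict for a JSON object)
parsed = json.loads(response.choices[0].message.content)
print(type(parsed).__name__)
print(parsed)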

Testing Function Calling Abilities

Let us test the function-calling abilities of this model as well. For this:

def add(a: int, b: int) -> int:
    """Adds two integers and returns the sum."""
    return a + b

# JSON schema-style description of the add function for the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Adds two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "integer",
                        "description": "An integer number",
                    },
                    "b": {
                        "type": "integer",
                        "description": "An integer number",
                    },
                },
                "required": ["a", "b"],
            },
        },
    }
]

# Map function names (as strings) to the actual Python functions
name_to_function = {
    "add": add
}
  • We start by defining the function. Here, we define a simple add function that takes two integers and adds them.
  • Next, we create a dictionary describing this function. The type key tells the model that this tool is a function; then we give information like the function's name and what it does.
  • Then, we give it the function properties. Properties are the function parameters; for each parameter, we state its type and provide a description.
  • Then we give the required key, whose value is the list of all required parameters. For the add function to work, we require both parameters a and b, hence we give both of them to the required key.
  • We create such a dictionary for each function we define and append it to the tools list.
  • We also create a name_to_function dictionary, which maps our function names as strings to the actual functions.

Testing the Model Again

Now, we will give this function to the model and test it.

from rich import print as rprint

response = client.chat(
    model="mistral-large-2407",
    messages=[ChatMessage(role="user", content="I have 19237 apples and 21374 oranges. How many fruits do I have in total?")],
    tools=tools,
    tool_choice="auto"
)

# rich gives us nicer, colorized printing of the response objects
rprint(response.choices[0].message.tool_calls[0])
rprint("Function Name:", response.choices[0].message.tool_calls[0].function.name)
rprint("Function Args:", response.choices[0].message.tool_calls[0].function.arguments)
  • Here, we give the list of tools to the tools parameter of the chat() function and set tool_choice to auto.
  • The auto option lets the model decide whether it has to use a tool or not.
  • We have given it a query by providing the quantity of two fruits and asking it to sum them.
  • We import rich to get better printing of responses.
  • All the tool calls generated by the model are stored in the tool_calls attribute of the message object. We access the first tool call by indexing with [0].
  • Each tool call has attributes such as the function the call refers to and the function arguments; we print all of these in the above code.

Looking at the output, the model has indeed made a tool call to the add function, providing the arguments a and b along with their values. The function arguments look like a dictionary, but they are returned as a string, so to convert them to a dictionary before calling the function we use the json.loads() method.

So, we access the function from the name_to_function dictionary, call it with the parsed parameters, and print the output it generates, as in the sketch below. From this example, we have taken a look at the tool-calling abilities of Mistral Large 2.
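Here is a minimal sketch of this final dispatch step, assuming the model returned a single tool call to add, as above:

import json

# Take the first (and here, only) tool call from the model's response
tool_call = response.choices[0].message.tool_calls[0]
func_name = tool_call.function.name

# The arguments arrive as a JSON string, so parse them into a dict
func_args = json.loads(tool_call.function.arguments)

# Look up the actual Python function and call it with the parsed arguments
result = name_to_function[func_name](**func_args)
print(result)  # 19237 + 21374 = 40611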

Conclusion

Mistral Large 2, the latest open model from Mistral AI, boasts an impressive 123 billion parameters and demonstrates exceptional abilities in following instructions and remembering long conversations. While it falls short of Llama 3.1 405B in terms of size, it outperforms many models on coding tasks and shows remarkable performance on reasoning and multilingual benchmarks. Its ability to generate structured responses and call tools makes it an excellent choice for creating Agentic workflows.

Key Takeaways

  • Mistral Large 2 is Mistral AI’s largest open model, with 123 billion parameters and 96 attention heads.
  • It was trained on datasets covering many natural languages, including Hindi, French, Korean, and Portuguese, as well as over 80 programming languages.
  • It beats Codestral and Codestral Mamba in terms of coding abilities and is on par with the SOTA models.
  • Despite being roughly 3 times smaller than the Llama 3.1 405B model, Mistral Large 2 comes very close to it in multilingual capabilities.
  • Having been fine-tuned on large datasets of code, Mistral Large 2 can generate working code, as seen in this article.

Frequently Asked Questions

Q1. Can Mistral Large 2 be used for commercial applications?

A. No, Mistral Large 2 is released under the Mistral Research License, which restricts commercial use.

Q2. Can Mistral Large 2 generate structured responses?

A. Yes, Mistral Large 2 can generate structured responses in JSON format, making it suitable for Agentic workflows.

Q3. Does Mistral Large 2 have tool-calling abilities?

A. Yes, Mistral Large 2 can call external tools and functions. It is good at grasping the functions given to it and selecting the right one based on the user’s query.

Q4. How can one interact with the Mistral Large 2 model?

A. Currently, anyone can sign up on the Mistral AI website and create a free trial API key, valid for a few days, with which we can interact with the model through the mistralai library.

Q5. On what other platforms is Mistral Large 2 available?

A. Mistral Large 2 is available on popular cloud platforms, including Vertex AI on GCP, Azure AI Studio on Azure, Amazon Bedrock, and IBM watsonx.ai.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

