Qwen2: Alibaba Cloud’s Open-Source LLM

Ajay Kumar Reddy 12 Jun, 2024

Introduction

Many new companies have been popping up in recent years, releasing new open-source Large Language Models. As time progresses, these models are getting closer and closer to the paid closed-source models. These companies release models in various sizes and keep their licenses permissive so that anyone can use them commercially. One such family of models is Qwen. Its previous models have proven to be among the best open-source models, alongside Mistral and Zephyr, and the team has now announced their successor, Qwen2.


Learning Objectives

  • Learn about Qwen, Alibaba Cloud’s open-source language models.
  • Discover Qwen2’s new features.
  • Review Qwen2’s performance benchmarks.
  • Try Qwen2 with the HuggingFace Transformers library.
  • Recognize Qwen2’s commercial and open-source potential.

This article was published as a part of the Data Science Blogathon.

What is Qwen?

Qwen refers to a family of Large Language Models backed by Alibaba Cloud, a firm based in China. It has made a great contribution to the AI space by releasing many open-source models that are on par with the top models on the HuggingFace leaderboard. Qwen has released its models in different sizes, ranging from a 7 Billion Parameter model to a 72 Billion Parameter model. They have not just released the models but have also finetuned them so well that they sat at the top of the leaderboard when they were released.

But Qwen did not stop there. It has also released chat-finetuned models, LLMs heavily trained on mathematics and code, and even vision-language models. The Qwen team is now moving into the audio space with text-to-speech models. Qwen is trying to create an ecosystem of open-source models readily available for everyone to start building applications with, without restrictions and for commercial purposes.

What is Qwen2?

Qwen received much appreciation from the open-source community when it was released, and many derivatives have been created from it. Recently, the Qwen team announced a series of successor models to the previous generation, called Qwen2, with more models and more finetuned versions than before.

Qwen2 was released in five different sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture-of-Experts model with 14B activated parameters), and 72B. These models have been pretrained on data covering 27 additional languages besides English and Chinese, and they show significant improvements in code and mathematics compared to the earlier generation of models. The great thing here is that even the 0.5B and 1.5B models come with a 32k context length, while the 7B and 72B support a 128k context length.

All these models use Grouped Query Attention (GQA), which greatly speeds up attention and reduces the memory required to store intermediate key/value results during inference.
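To make the idea concrete, below is a minimal, illustrative sketch of grouped-query attention in PyTorch. This is not Qwen2's actual implementation; it only shows how several query heads can share one key/value head, shrinking the key/value cache that must be kept in memory during inference:

import torch

batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2            # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)  # 4x smaller KV cache
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# repeat each KV head so every group of query heads can attend to it
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 8, 16, 64])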

Performance and Benchmarks

Coming to the base model comparisons, the Qwen2 72B Large Language Model outperforms the newly released Llama3 70B model and the mixture-of-experts Mixtral 8x22B model. We can see the benchmark scores in the image below. The Qwen2 model outperforms both Llama3 and Mixtral on many benchmarks, like MMLU, MMLU-Pro, TheoremQA, HumanEval, GSM8K, and many more.

Qwen2: Performance and Benchmarks

Coming to the smaller models, the Qwen2 7B Instruct model also outperforms newly introduced SOTA (State-Of-The-Art) models like the Llama3 8B model and the GLM4 9B model. Despite Qwen2 7B being the smallest of the three, it outperforms both of them, and the results for all the benchmarks can be seen in the image below.

Qwen2 7B Instruct Model

Qwen2 in Action

We will be working with Google Colab to try out the Qwen2 model.

Step 1: Download Libraries

To get started, we need to download a few helper libraries. For this, we run the code below:

!pip install -U -q transformers accelerate
  • transformers: A popular Python package from HuggingFace that lets us download and work with pretrained deep learning models.
  • accelerate: Another package developed by HuggingFace. It helps increase the inference speed of Large Language Models when they run on a GPU.
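Once the installation finishes, a quick optional sanity check confirms both libraries are importable (the printed version numbers will vary with your environment):

import transformers
import accelerate

# print the installed versions to confirm the setup
print(transformers.__version__, accelerate.__version__)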

Step 2: Download the Qwen Model

Now we will write the code to download the Qwen2 model and test it:

from transformers import pipeline
import torch

# use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = pipeline("text-generation",
                model="Qwen/Qwen2-1.5B-Instruct",
                device=device,
                max_new_tokens=512,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                )
  • We start by importing the pipeline function from the transformers library.
  • Then we set the device the model will be mapped to. Here we pick cuda when a GPU is available, falling back to cpu otherwise.
  • model="Qwen/Qwen2-1.5B-Instruct": This specifies the pretrained model to work with.
  • device=device: This tells the pipeline which device to run the model on.
  • max_new_tokens=512: Here, we give the maximum number of new tokens to be generated.
  • do_sample=True: This enables sampling during generation for increased diversity in the output (a deterministic variant is sketched after this list).
  • temperature=0.7: This controls the randomness of the generated text. Higher values lead to more creative and unpredictable outputs.
  • top_p=0.95: This sets the probability mass to be considered for the next token during generation.
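Sampling makes the output vary from run to run. For reproducible outputs, a deterministic variant of the same pipeline (a sketch; pipe_greedy is just an illustrative name) disables sampling, in which case temperature and top_p are not needed:

pipe_greedy = pipeline("text-generation",
                       model="Qwen/Qwen2-1.5B-Instruct",
                       device=device,
                       max_new_tokens=512,
                       do_sample=False,  # greedy decoding: same output every run
                       )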

Step 3: Giving a List of Messages to the Model

Now, let us give the model a list of messages as input and see the output it generates.

messages = [
    {"role": "system",
     "content": "You are a funny assistant. You must respons to user questions in funny way"},
    {"role": "user", "content": "What is life?"},
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
  • Here, the first message is a system message that instructs the assistant to be funny.
  • The second message is a user message that asks “What is life?”.
  • We put both these messages as items in a list.
  • Then we pass this list of messages to the pipeline object, that is, to our model.
  • The model then processes these messages and generates a response; under the hood, the pipeline formats them with the model's chat template (shown right after this list).
  • Finally, we extract the content of the last generated message from the response.
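If you are curious what the pipeline actually feeds the model, you can apply the tokenizer's chat template yourself (an optional peek; Qwen2 uses a ChatML-style format with <|im_start|> and <|im_end|> markers):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# format the messages exactly as the pipeline would, without tokenizing
prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
print(prompt)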

Running the pipeline code above has produced the following output:


We see that the model indeed tried to generate a funny answer.

Step 4: Testing the Model with Mathematics Questions

Now let us test the model with a few mathematics questions. The code for this will be:

messages = [
    {"role": "user", "content": "If a car travels at a constant speed of \
    60 miles per hour, how far will it travel in 45 minutes?"},
    {"role": "assistant", "content": "To find the distance, \
    use the formula: distance = speed × time. Here, speed = 60 miles per \
    hour and time = 45 minutes = 45/60 hours. So, distance = 60 × (45/60) = 45 miles."},
    {"role": "user", "content": "How far will it travel in 2.5 hours? Explain step by step"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
  • Here again, we are creating a list of messages.
  • The first message is a user message that asks how far a car will travel in 45 minutes at a constant speed of 60 miles per hour.
  • The second message is an assistant message that provides the solution to the user’s question using the formula distance = speed × time.
  • The third message is again a user message asking the assistant another question.
  • Then we give this list of messages to the pipeline.
  • The model will then process these messages and generate a response.

The output generated by running the code can be seen below:


We can see that the Qwen2 1.5B model reasoned step by step to answer the user's question. It first defined the formula for calculating distance, then wrote down the information it had about speed and time, and finally put these together to arrive at the answer. Despite being just a 1.5 Billion Parameter model, it works remarkably well.
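As a quick sanity check, the expected answer is simple arithmetic (plain Python, independent of the model):

speed = 60           # miles per hour
time = 2.5           # hours
print(speed * time)  # 150.0 miles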

Testing with More Examples

Let us test the model with a few more examples:

messages = [
    {"role": "user", "content": "A clock shows 12:00 p.m. now. \
    How many degrees will the minute hand move in 15 minutes?"},
    {"role": "assistant", "content": "The minute hand moves 360 degrees \
    in one hour (60 minutes). Therefore, in 15 minutes, it will \
    move (15/60) * 360 degrees = 90 degrees."},
    {"role": "user", "content": "How many degrees does the hour hand \
    move in 15 minutes?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
Output
messages = [
    {"role": "user", "content": "Convert 100 degrees Fahrenheit to Celsius."},
    {"role": "assistant", "content": "To convert Fahrenheit to Celsius,\
     use the formula: C = (F - 32) × 5/9. So, for 100 degrees Fahrenheit, \
     C = (100 - 32) × 5/9 = 37.78 degrees Celsius."},
    {"role": "user", "content": "What is 0 degrees Celsius in Fahrenheit?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
Output
messages = [
    {"role": "user", "content": "What gets wetter as it dries?"},
    {"role": "assistant", "content": "A towel gets wetter as it dries \
    because it absorbs the water from the body, becoming wetter itself."},
    {"role": "user", "content": "What has keys but can't open locks?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
Output

Here we have additionally tested the model with three more examples. The first two are again on mathematics. We see that Qwen2 1.5B was able to understand the questions well and generate pleasing answers.
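For reference, the expected answers to the two follow-up math questions are easy to verify directly (the hour hand moves 30 degrees per hour, and Fahrenheit = Celsius × 9/5 + 32):

hour_hand_degrees = (15 / 60) * 30   # 7.5 degrees in 15 minutes
fahrenheit = 0 * 9 / 5 + 32          # 0 degrees Celsius is 32 degrees Fahrenheit
print(hour_hand_degrees, fahrenheit)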

But in the third example, it failed. The answer to the riddle is a piano: a piano has keys but can't open locks. The model did not give this answer but came up with a different one: a keychain, and it even gave a supporting statement for it. We cannot say it failed outright, because technically a keychain itself cannot open locks, even though the keys on it can.

Overall, we see that despite being just a 1.5 Billion Parameter model, Qwen2 1.5B answered the mathematical questions correctly and provided good reasoning around the answers it generated. This suggests that the bigger models like Qwen2 7B, 57B-A14B, and 72B can perform extremely well across different tasks.
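If you have the GPU memory for them, the larger checkpoints load the same way. One common pattern (a sketch, assuming the Qwen/Qwen2-7B-Instruct checkpoint on the HuggingFace Hub) is to let accelerate place the weights automatically instead of naming a device:

pipe_7b = pipeline("text-generation",
                   model="Qwen/Qwen2-7B-Instruct",
                   device_map="auto",   # let accelerate spread the weights across available devices
                   torch_dtype="auto",  # use the dtype stored in the checkpoint
                   max_new_tokens=512,
                   )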

Conclusion

Qwen2, a new series of open-source models from Alibaba Cloud, represents a great advancement in the field of large language models (LLMs). Building on the success of its predecessor, Qwen2 offers a range of models from 0.5B to 72B parameters, excelling in performance across various benchmarks. The models are designed to be versatile and commercially accessible, supporting multiple languages and featuring improved capabilities in code, mathematics, and more. Qwen2’s impressive performance and open accessibility position it as a formidable competitor to closed-source alternatives, fostering innovation and application development in AI.

Key Takeaways

  • Qwen2 continues the trend of high-quality open-source LLMs, providing robust alternatives to closed-source models.
  • The Qwen2 series includes models from 0.5 billion to 72 billion parameters, catering to diverse computational needs and use cases.
  • Qwen2 models are pretrained on data covering 27 languages beyond English and Chinese, enhancing their applicability in global contexts.
  • Licenses that allow for commercial use promote widespread adoption and innovation of Qwen2 models.
  • Developers and researchers can easily integrate and utilize the models via popular tools like HuggingFace’s transformers library, making them accessible.

Frequently Asked Questions

Q1. What is Qwen?

A. Qwen is a family of large language models created by Alibaba Cloud. They release open-source models in various sizes that are competitive with paid models.

Q2. What is Qwen2?

A. Qwen2 is the latest version of Qwen models with improved performance and more features. It comes in different sizes, ranging from 0.5 billion to 72 billion parameters.

Q3. How do I use Qwen2 for text generation?

A. You can use the pipeline function from the Transformers library to generate text with Qwen2. The code example in the article shows how to do this.

Q4. How does Qwen2 perform?

A. Qwen2 outperforms other leading models in many benchmarks, including language understanding, code generation, and mathematical reasoning.

Q5. Can Qwen2 answer math questions?

A. Yes, Qwen2, especially the larger models, can answer math questions and provide explanations for the answers.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

