In recent years, many new companies have been releasing open-source Large Language Models, and these models are steadily closing the gap with the paid closed-source models. The companies release them in various sizes and under permissive licenses so that anyone can use them commercially. One such family of models is Qwen. Its previous models have proven to be among the best open-source models, alongside Mistral and Zephyr, and the team has now announced a second generation called Qwen2.
In this article, we will also walk through the Qwen LLM and the Qwen2 LLM so that you get a full understanding of both.
Qwen refers to a family of Large Language Models backed by Alibaba Cloud, a firm based in China. It has made a great contribution to the AI space by releasing many open-source models that are on par with the top models on the HuggingFace leaderboard. Qwen has released its models in different sizes, ranging from a 1.8 Billion Parameter model to a 72 Billion Parameter model. It has not just released the models but has finetuned them in a way that placed them at the top of the leaderboard when they were released.
But Qwen did not stop there. It has also released chat-finetuned models and LLMs heavily trained on mathematics and code, and it has even released vision-language models. The Qwen team is also moving into the audio space with text-to-speech models. Qwen is trying to create an ecosystem of open-source models readily available for everyone to start building applications with, without any restrictions and for commercial purposes.
Qwen received much appreciation from the open-source community when it was released, and many derivatives have been created from it. Recently, the Qwen team announced a series of successor models to the previous generation, called Qwen2, with more model sizes and more finetuned versions than before.
Qwen2 was released in 5 different sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture-of-Experts model with 14B active parameters), and 72B. These models have been pretrained on data covering more than 27 languages and show significant improvements in code and mathematics over the earlier generation. A great thing here is that even the 0.5B and 1.5B models come with a 32k context length, while the 7B and 72B models come with a 128k context length.
All these models use Grouped Query Attention, which greatly speeds up attention and reduces the memory required to store the intermediate key/value results during inference.
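Both properties are easy to check from the model configuration itself. Below is a minimal sketch (using the 1.5B Instruct checkpoint that we will load later in this article) that reads the advertised context window and the attention head counts from the Hugging Face config; under Grouped Query Attention, num_key_value_heads is smaller than num_attention_heads because several query heads share one key/value head:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
print(config.max_position_embeddings)  # advertised context length in tokens
print(config.num_attention_heads)      # total number of query heads
print(config.num_key_value_heads)      # shared key/value heads (fewer under GQA)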
Coming to the base model comparisons, the Qwen2 72B Large Language Model outperforms the newly released Llama3 70B model and the Mixture-of-Experts Mixtral 8x22B model. We can see the benchmark scores in the figure below. The Qwen2 model outperforms both Llama3 and Mixtral on many benchmarks, such as MMLU, MMLU-Pro, TheoremQA, HumanEval, GSM8K, and more.
Coming to the smaller models, the Qwen2 7B Instruct model also outperforms newly introduced state-of-the-art (SOTA) models like the Llama3 8B model and the GLM4 9B model. Despite Qwen2 7B being the smallest of the three, it outperforms both of them, and the results for all the benchmarks can be seen in the figure below.
We will be working with Google Colab to try out the Qwen2 model.
To get started, we need to install a few helper libraries. For this, we run the code below:
!pip install -U -q transformers accelerate
Now we will write the code to download the Qwen model and test it. The code for this will be:
from transformers import pipeline

device = "cuda"

# Create a text-generation pipeline with the Qwen2 1.5B Instruct model
pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2-1.5B-Instruct",
    device=device,
    max_new_tokens=512,  # maximum length of the generated response
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # soften the token distribution
    top_p=0.95,          # nucleus sampling cutoff
)
Now, let us give the model a list of messages as input and see the output it generates.
messages = [
    {"role": "system",
     "content": "You are a funny assistant. You must respond to user questions in a funny way"},
    {"role": "user", "content": "What is life?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
Running this code produces the following output:
We see that the model indeed tried to generate a funny answer.
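Incidentally, if you want more control than the pipeline offers, the same chat can also be run with the lower-level Transformers API. The following is a minimal sketch, assuming the same model and sampling settings as above, using the standard AutoModelForCausalLM class and the tokenizer's apply_chat_template method:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

# Render the chat into Qwen2's prompt format and append the generation prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Same sampling settings as the pipeline above
output = model.generate(**inputs, max_new_tokens=512, do_sample=True,
                        temperature=0.7, top_p=0.95)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))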
Now let us test the model with a few mathematics questions. The code for this will be:
messages = [
    {"role": "user", "content": "If a car travels at a constant speed of "
     "60 miles per hour, how far will it travel in 45 minutes?"},
    {"role": "assistant", "content": "To find the distance, "
     "use the formula: distance = speed × time. Here, speed = 60 miles per "
     "hour and time = 45 minutes = 45/60 hours. So, distance = 60 × (45/60) = 45 miles."},
    {"role": "user", "content": "How far will it travel in 2.5 hours? Explain step by step"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
The output generated by running the code can be seen below:
We can see that the Qwen2 1.5B model thinks step by step to answer the user's question. It first defines the formula for calculating distance, then writes down the information it has about speed and time, and finally puts these together into the final answer. Despite being just a 1.5 Billion Parameter model, it performs remarkably well here.
Let us test the model with a few more examples:
messages = [
    {"role": "user", "content": "A clock shows 12:00 p.m. now. "
     "How many degrees will the minute hand move in 15 minutes?"},
    {"role": "assistant", "content": "The minute hand moves 360 degrees "
     "in one hour (60 minutes). Therefore, in 15 minutes, it will "
     "move (15/60) * 360 degrees = 90 degrees."},
    {"role": "user", "content": "How many degrees does the hour hand "
     "move in 15 minutes?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
messages = [
    {"role": "user", "content": "Convert 100 degrees Fahrenheit to Celsius."},
    {"role": "assistant", "content": "To convert Fahrenheit to Celsius, "
     "use the formula: C = (F - 32) × 5/9. So, for 100 degrees Fahrenheit, "
     "C = (100 - 32) × 5/9 = 37.78 degrees Celsius."},
    {"role": "user", "content": "What is 0 degrees Celsius in Fahrenheit?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
messages = [
    {"role": "user", "content": "What gets wetter as it dries?"},
    {"role": "assistant", "content": "A towel gets wetter as it dries "
     "because it absorbs the water from the body, becoming wetter itself."},
    {"role": "user", "content": "What has keys but can't open locks?"},
]
response = pipe(messages)
print(response[0]['generated_text'][-1]['content'])
Here we have tested the model with three additional examples. The first two are again mathematics questions, and we see that Qwen2 1.5B understood them well and generated correct answers.
But in the last example, it failed. The expected answer to the riddle is a piano: a piano has keys but can't open locks. The model instead came up with a different answer, a keychain, and even gave a supporting statement for it. We cannot say it failed entirely, because technically a keychain itself does not open locks, though the keys on it do.
Overall, we see that despite being a 1.5 Billion Parameter model, Qwen2 1.5B answered the mathematical questions correctly and provided good reasoning around the answers it generated. This suggests that the larger models, like Qwen2 7B, 57B-A14B, and 72B, can perform extremely well across different tasks.
Qwen2, a new series of open-source models from Alibaba Cloud, represents a great advancement in the field of large language models (LLMs). Building on the success of its predecessor, Qwen2 offers a range of models from 0.5B to 72B parameters, excelling in performance across various benchmarks. The models are designed to be versatile and commercially accessible, supporting multiple languages and featuring improved capabilities in code, mathematics, and more. Qwen2’s impressive performance and open accessibility position it as a formidable competitor to closed-source alternatives, fostering innovation and application development in AI.
Q. What hardware is needed to run the Qwen-72B model?
A. Running Qwen-72B requires at least 8 NVIDIA A100 GPUs (80GB each), a high-performance multi-core CPU, substantial RAM (hundreds of GB), and high-speed NVMe SSD storage.
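On such a multi-GPU machine, the model weights can be sharded automatically across the available devices. A minimal sketch, assuming the accelerate library installed earlier and the Qwen2-72B-Instruct checkpoint on the Hugging Face Hub:

import torch
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate shard the 72B weights across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)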
Q. Is Qwen open source?
A. Yes, Qwen (Qwen-7B and Qwen-7B-Chat) is open source under the Apache 2.0 license, released by Alibaba.
Q. How can I generate text with Qwen2?
A. You can use the pipeline function from the Transformers library to generate text with Qwen2. The code example in the article shows how to do this.
Q. How does Qwen2 perform compared to other models?
A. Qwen2 outperforms other leading models on many benchmarks, including language understanding, code generation, and mathematical reasoning.
Q. Can Qwen2 answer math questions?
A. Yes, Qwen2, especially the larger models, can answer math questions and provide explanations for the answers.