Transformers and Large Language Models have taken the world by storm since their introduction in the field of Natural Language Processing (NLP). Since their inception, the field has evolved quickly with innovations and research that make these LLMs more efficient, including LoRA (Low-Rank Adaptation), Flash Attention, quantization, and, more recently, approaches for merging notable LLMs. In this guide, we will look at a new approach to merging LLMs, SOLAR 10.7B, introduced by Upstage AI.
Upstage AI introduced a new 10.7 Billion Parameter model, SOLAR 10.7B. This model is the result of merging two 7 Billion Parameter models, specifically two Llama 2 7B models, which were then continually pretrained to create SOLAR 10.7B. The unique aspect of this merge is the application of a new approach called Depth Up-Scaling (DUS), in contrast to the Mixtral approach, which employs a mixture of experts.
The new 10.7B model outperformed Mistral 7B and Qwen 14B. An Instruct version, SOLAR 10.7B Instruct, has also been released, and upon its release it topped the leaderboard, surpassing both Qwen 72B and the Mixtral 8x7B Large Language Model. Despite being a 10.7 Billion Parameter model, SOLAR was able to outperform LLMs many times its size.
Let’s understand how it all began and how SOLAR 10.7B was formed. It all starts with a single Base Model. Upstage chose Llama 2, containing 32 Transformer Layers, as its Base Model because of its wide Open Source contributor community. Then a copy of this Base Model was created.
We then have two Base Models. As for the weights, Upstage took the pretrained weights from Mistral 7B because it was the best-performing model at the time. Now the depthwise scaling begins. Each of the Base Models contains 32 Layers. From these, we remove m Layers: the final m Layers from the original model and the first m Layers from its copy. With m = 8, this leaves 24 Layers in each of them. Then we merge these two models:
The two Base Models are concatenated to form the scaled model, which now contains 48 Layers. The scaled model performs poorly right after the merge, so it undergoes continued pretraining. This Depthwise Scaling followed by continued pretraining together make up Depth Up-Scaling (DUS).
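To make the layer arithmetic concrete, here is a minimal, purely illustrative sketch of the depthwise scaling step. It is not Upstage's actual code; it assumes a Hugging Face Llama-style model whose transformer blocks live in model.model.layers, and the variable names are mine.

import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Illustrative sketch of Depthwise Scaling (NOT Upstage's actual training code).
# Assumes a Llama-style model whose transformer blocks live in model.model.layers.
n, m = 32, 8                                      # 32 layers per base model, drop m = 8

base  = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo, needs access
clone = copy.deepcopy(base)

kept_from_original = base.model.layers[: n - m]   # drop the final m layers  -> 24 layers
kept_from_copy     = clone.model.layers[m:]       # drop the first m layers  -> 24 layers

# Concatenate the two 24-layer stacks into a single 48-layer scaled model
base.model.layers = nn.ModuleList(list(kept_from_original) + list(kept_from_copy))
base.config.num_hidden_layers = len(base.model.layers)

print(base.config.num_hidden_layers)   # 48 layers, before continued pretraining

The print statement confirms the depth: 32 - 8 = 24 layers per copy, and 24 + 24 = 48 layers in the scaled model.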
The scaled model needs to be pretrained because of the decrease in performance caused by merging. The makers state that performance rises quickly with continued pretraining. The pretraining / fine-tuning involved two stages.
The first stage was Instruction Fine-Tuning. In this type of fine-tuning, the model is trained on datasets so that it learns to follow instructions. The process involved popular Open Source datasets such as Alpaca-GPT4 and OpenOrca. The paper notes that only a subset of the data was used to fine-tune the merged model. Along with the Open Source data, Upstage also trained it on some closed-source math data.
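The exact prompt template and data mix Upstage used are not reproduced here, but the "### User: / ### Assistant:" format used for inference later in this article gives the general idea. Below is a rough sketch, with a made-up sample record, of how an Alpaca-style instruction pair could be rendered into that template.

# Illustrative only: a made-up Alpaca-style record rendered into the
# "### User: / ### Assistant:" template used for inference later in this article.
sample = {
    "instruction": "Summarize the following sentence in five words.",
    "input": "SOLAR 10.7B was built by depth up-scaling two Llama 2 style models.",
    "output": "SOLAR merges and rescales Llama.",
}

def to_chat_format(record: dict) -> str:
    # Combine the instruction and optional input into a single user turn
    user_turn = record["instruction"]
    if record.get("input"):
        user_turn += "\n" + record["input"]
    return f"### User:\n{user_turn}\n\n### Assistant:\n{record['output']}"

print(to_chat_format(sample))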
In the second stage, Alignment Tuning is performed. Here, the stage-one fine-tuned model is further fine-tuned to be more aligned with humans or with powerful AIs like GPT-4. This was done through the DPOTrainer (Direct Preference Optimization), an RLHF (Reinforcement Learning from Human Feedback)-like technique.
In Direct Preference Optimization, we have a dataset containing three columns: a prompt, a preferred answer, and a rejected answer. This is used to train the scaled model so that it generates the kind of answers we want it to generate. The same datasets that were used for instruction fine-tuning are used here as well.
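As a minimal sketch of that three-column preference format, here is an invented example built with the datasets library. The rows are made up for illustration and are not from Upstage's actual alignment data; a DPO-style trainer (for example, TRL's DPOTrainer) would consume a dataset shaped like this.

from datasets import Dataset

# Invented example of the three-column preference format used by DPO-style training.
preference_rows = {
    "prompt": [
        "### User:\nExplain Depth Up-Scaling in one sentence.\n\n### Assistant:",
    ],
    "chosen": [
        "Depth Up-Scaling stacks two trimmed copies of a base model and then continues pretraining.",
    ],
    "rejected": [
        "It is a way to make the model smaller by deleting layers.",
    ],
}

dpo_dataset = Dataset.from_dict(preference_rows)
print(dpo_dataset)
# The trainer optimizes the model to prefer the "chosen" answer over the
# "rejected" one for each prompt, without training a separate reward model.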
The Hugging Face Open LLM Leaderboard uses several benchmarks to evaluate the capabilities of Large Language Models (LLMs), each assessing a different aspect of an LLM’s performance: ARC (AI2 Reasoning Challenge), HellaSwag, MMLU (Massive Multitask Language Understanding), TruthfulQA, Winogrande, and GSM8K.
The base SOLAR 10.7B model outperformed models like Mistral 7B Instruct v0.2 and Qwen 14B. The Instruct version of SOLAR 10.7B was even able to beat very large models like Mixtral 8x7B, Qwen 72B, Falcon 180B, and other huge Large Language Models. It was ahead of all of them on the ARC and TruthfulQA benchmarks.
The SOLAR 10.7B model is readily available on the Hugging Face Hub and works with the transformers library. Quantized versions of SOLAR 10.7B are available as well. In this section, we will download a quantized version, give the model different tasks, and look at the output it generates.
To test the quantized version of SOLAR 10.7B, we will work with the llama-cpp-python library, which lets us run quantized Large Language Models. For this demo, we will use the free tier of Google Colab.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install llama-cpp-python
!pip3 install huggingface-hub
To work with the SOLAR 10.7B model, we need to first download the quantized version of it. To download it, we will run the following code:
from huggingface_hub import hf_hub_download
# specifying the model name
model_name = "TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF"
# specifying the type of quantization of the model
model_file = "solar-10.7b-instruct-v1.0.Q2_K.gguf"
# download the model by specifying the model name and quantized model name
model_path = hf_hub_download(model_name, filename=model_file)
Here, we work with the huggingface_hub library to download the quantized model. For this, we import hf_hub_download, which takes in the repository name (model_name) and the quantized model filename (model_file) and returns the local path of the downloaded file.
Now, we can load this model through the llama_cpp_python library. The code for loading the model looks like this:
from llama_cpp import Llama
llm = Llama(
model_path=model_path,
n_ctx=512, # the context length: how many tokens the model can take
n_threads=8, # the number of threads to use
n_gpu_layers=110 # how many layers of the model to offload to the GPU
)
We import the Llama class from llama_cpp, which takes in the model_path, the context length n_ctx, the number of CPU threads n_threads, and n_gpu_layers, the number of layers to offload to the GPU.
Running this code loads the quantized SOLAR 10.7B model onto the GPU and sets the appropriate context length. Now it’s time to run some inference on this model, which we do with the code below.
output = llm(
"### User:\nWho are you?\n\n### Assistant:", # User Prompt
max_tokens=512, # the number of output tokens generated
stop=["</s>"], # the token which tells the LLM to stop
)
print(output['choices'][0]['text']) # llm generated text
To run inference, we pass the following to the LLM: the prompt (in the "### User: ... ### Assistant:" format), max_tokens to cap the length of the output, and the stop token that tells the model when to stop generating.
Running this stores the result in the output variable. The structure of the result is similar to an OpenAI API response, so we can access the generation through the print statement shown above, much as we would access a completion from the OpenAI responses. The output generated can be seen below.
The generated sentence looks good, with no major grammatical mistakes. Let’s test the common-sense side of the model with the following prompts.
output = llm(
"### User:\nHow many eggs can a monkey lay in its lifetime?\n\n### Assistant:",
max_tokens=512,
stop=["</s>"],
)
print(output['choices'][0]['text'])
output = llm(
"### User:\nHow many smartphones can a human eat?\n\n### Assistant:",
max_tokens=512,
stop=["</s>"],
)
print(output['choices'][0]['text'])
Here we see two examples related to common sense, and SOLAR 10.7B handles them surprisingly well. The Large Language Model was able to deliver the right answers along with some useful context. Let’s now test the math and reasoning abilities of the model with the following prompts.
output = llm(
"### User:\nLook at this series: 80, 10, 70, 15, 60, ... \
What number should come next?\n\n### Assistant:",
max_tokens=512,
stop=["</s>"],
)
print(output['choices'][0]['text'])
output = llm(
"### User:\nJohn runs faster than Ken. Magnus runs faster than John. \
Does Ken run faster than Magnus?\n\n### Assistant:",
max_tokens=512,
stop=["</s>"],
)
print(output['choices'][0]['text'])
For the given example prompts, SOLAR 10.7B generated good responses. It answered the mathematical and logical reasoning questions correctly, as well as the common-sense ones. Overall, we can conclude that the SOLAR 10.7B Large Language Model generates good responses.
Mixtral 8x7B MoE was created by Mistral AI using a Mixture of Experts architecture. In brief, in this Mixture of Experts setup, Mistral combines eight 7 Billion Parameter expert models: the feed-forward networks of the transformer layers are replaced by expert layers, which is why Mixtral 8x7B is considered to have 8 experts. Every time the model takes in an input prompt, a gating mechanism selects only 2 of these 8 experts; those 2 experts then process the input and generate the final output tokens. So we can see that there is some complexity involved in this type of merging: we have to replace the feed-forward layers with expert layers and introduce a gating mechanism that selects between them.
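To illustrate the gating idea, here is a toy sketch of top-2 expert routing in PyTorch. It is not Mistral AI's Mixtral implementation: the experts here are plain linear layers purely for illustration, whereas in the real model they are full feed-forward blocks inside every transformer layer.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sketch of top-2 expert routing, only to illustrate the gating idea.
class ToyTop2Router(nn.Module):
    def __init__(self, hidden: int = 64, n_experts: int = 8):
        super().__init__()
        self.gate = nn.Linear(hidden, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x)                       # (tokens, n_experts)
        top_w, top_idx = scores.topk(2, dim=-1)     # pick 2 experts per token
        top_w = F.softmax(top_w, dim=-1)            # normalize the two weights
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 64)            # 4 tokens, hidden size 64
print(ToyTop2Router()(tokens).shape)   # torch.Size([4, 64])

Each token's output is a weighted combination of only two expert outputs, which is what keeps the per-token compute of a Mixture of Experts model much lower than its total parameter count would suggest.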
The SOLAR 10.7B model from Upstage, on the other hand, leverages the Depth Up-Scaling method. In Depth Up-Scaling, we simply remove a number of final layers from one Base Model and the same number of starting layers from its copy, then merge the two by stacking one on top of the other. With just a few epochs of continued pretraining and fine-tuning, the merged model shows a rapid rise in performance. Here we do not replace existing layers with other layers, nor do we need a gating mechanism. Overall, Depth Up-Scaling is a simple and effective way to merge models without those added complexities.
Comparing performance, even though Depth Up-Scaling just combines two 7 Billion Parameter models, SOLAR 10.7B was able to clearly outperform Mixtral 8x7B, a far larger model. This shows the effectiveness of a simple merging method over a complex one like the Mixture of Experts.
In this guide, we took a look at SOLAR 10.7B, the recently released 10.7 Billion Parameter model from Upstage AI. Upstage AI took a new approach to merging and scaling models. The paper uses a method called Depth Up-Scaling to merge two Llama 2 7 Billion Parameter models by removing some of the final transformer layers from one and some of the starting layers from the other. The merged model was then fine-tuned on Open Source datasets and evaluated on the Open LLM Leaderboard, where it achieved the highest H6 score and topped the leaderboard.
A. SOLAR 10.7B is a 10.7 billion parameter model by Upstage AI, utilizing a unique merging technique called Depth Up-Scaling. It distinguishes itself by outperforming larger LLMs and showcasing advancements in merging models.
A. Depthwise Scaling involves two copies of a base model. Before merging, the final layers are removed from one copy and the initial layers from the other; the trimmed copies are then merged by stacking one on top of the other.
A. SOLAR 10.7B undergoes a two-stage pretraining process. Instruction fine-tuning involves training the model on datasets emphasizing instruction-following. Alignment tuning refines the model’s alignment with human preferences using a technique called Direct Preference Optimization (DPO).
A. SOLAR 10.7B excels across various benchmarks, including ARC (AI2 Reasoning Challenge), MMLU (Massive MultiTask Language Understanding), HellaSwag, Winogrande, TruthfulQA, and GSM8K. It achieves high scores, demonstrating its versatility in handling different language tasks.
A. SOLAR 10.7B surpasses models like Mistral 7B and Qwen 14B, showcasing superior performance despite having fewer parameters. The Instruct version even competes with and outperforms very large models, including Mixtral 8x7B and Qwen 72B, on various benchmarks.