EXAONE 3.5 is the latest iteration in a series of large language models developed by LG AI Research, designed to enhance the capabilities and accessibility of artificial intelligence technologies. Released in December 2024, EXAONE 3.5 encompasses three distinct configurations: 2.4 billion, 7.8 billion, and 32 billion parameters. Each model variant is tailored to meet different performance needs, ranging from lightweight applications suitable for mobile devices to high-performance tasks requiring extensive computational resources. With a focus on bilingual proficiency in English and Korean, EXAONE 3.5 aims to set new standards in instruction-following accuracy and long-context understanding, making it an invaluable tool across various sectors.
Reasoning-based large language models, like EXAONE 3.5, process complex tasks that require logical thinking, problem-solving, and understanding of intricate patterns. Built on advanced architectures such as transformer networks, these models excel at handling sequential data and long contexts. They train on vast datasets to recognize relationships between pieces of information, enabling them to generate accurate responses to queries, reason through problems, and follow instructions effectively.
By leveraging fine-tuning techniques like Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO), these LLMs refine their ability to mimic human-like reasoning in diverse applications, from simple tasks to complex decision-making scenarios.
EXAONE 3.5 utilizes a decoder-only transformer architecture, which has become a standard in modern LLM design due to its efficiency in processing sequential data. The architecture is optimized for instruction-following tasks, allowing it to understand and execute user commands effectively. All three model variants (2.4 billion, 7.8 billion, and 32 billion parameters) share this architecture, including a maximum context length of 32,768 tokens.
EXAONE 3.5 introduces groundbreaking advancements to its architecture, enhancing its ability to process extended contexts and deliver accurate, user-aligned outputs. These innovations set new standards for efficiency and performance in large language models.
Direct Preference Optimization (DPO) is a novel algorithm designed to fine-tune large language models by directly aligning them with human preferences, without the complexities of traditional reinforcement learning methods. Unlike Reinforcement Learning from Human Feedback (RLHF), which requires intricate reward modeling and sampling, DPO simplifies the process by employing a straightforward classification loss to optimize model responses based on user preferences. This approach allows for stable and efficient training, making it computationally lightweight and easier to implement.
It is important to note that DPO requires a preference dataset, which consists of triplets of the form (prompt, chosen answer, rejected answer).
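To make the preference-data format and the classification-style loss concrete, here is a minimal sketch of the DPO objective in PyTorch. The example triplet, the beta value, and the function signature are illustrative assumptions for exposition, not EXAONE's actual training code.

import torch.nn.functional as F

# A preference triplet of the form (prompt, chosen answer, rejected answer).
# Illustrative example, not from EXAONE's dataset.
triplet = {
    "prompt": "Explain gravity in one sentence.",
    "chosen": "Gravity is the attractive force between masses.",
    "rejected": "Gravity is when things are heavy.",
}

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratios of the trained policy against a frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Simple binary-classification-style loss: push the chosen answer's
    # log-ratio above the rejected answer's, scaled by beta.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

Each *_logps argument is the summed log-probability of the corresponding answer under the policy or reference model. No reward model or sampling loop is needed, which is what makes DPO lighter than RLHF.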
Decontamination refers to a rigorous process aimed at enhancing the generalization performance of the models by removing contaminated examples from the training dataset. Since the training data often comes from web crawls, some test-set examples might appear in the training corpus, which can lead to biased evaluations. To address this, EXAONE uses a substring-level matching method to identify and eliminate these contaminated samples.
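As a rough illustration of substring-level matching, the sketch below removes any training document that shares a fixed-length substring with a test-set example. The window length and the exact matching policy are assumptions made for illustration; the EXAONE report describes the approach only at a high level.

def decontaminate(train_docs, test_docs, window=50):
    # Collect every `window`-character substring ("shingle") from the test set.
    test_shingles = set()
    for doc in test_docs:
        for i in range(max(len(doc) - window + 1, 1)):
            test_shingles.add(doc[i:i + window])

    # Keep only training documents with no shingle overlap with the test set.
    clean = []
    for doc in train_docs:
        overlaps = any(
            doc[i:i + window] in test_shingles
            for i in range(max(len(doc) - window + 1, 1))
        )
        if not overlaps:
            clean.append(doc)
    return clean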
These architectural enhancements enable EXAONE models to excel in real-world applications while maintaining competitive performance across various benchmarks.
The evaluation benchmarks of the EXAONE 3.5 models were categorized into three groups: real-world use cases, long-context processing, and general domain tasks such as mathematics, coding, and knowledge-based tasks.
As the benchmark results show, all three models excelled in real-world use cases and long-context scenarios, often surpassing baseline models of similar size. For example, the 32B model achieved an average score of 74.3 in real-world use cases, significantly outperforming competitors like Qwen 2.5 32B and Gemma 2 27B.
EXAONE 3.5 also excels in mathematical and coding tasks. Across nine general benchmarks, the 2.4B model achieved the highest average score, surpassing other global models of the same size. Likewise, the 7.8B and 32B models placed among the top performers, securing impressive average scores.
Below we will learn how to set up and query the EXAONE 3.5 model (7.8B variant) on Google Colab using Ollama. This guide walks you through the installation, configuration, and testing process so you can evaluate the model’s capabilities firsthand.
Install necessary libraries and tools, including Langchain and Ollama, to prepare the Colab environment for running the model.
!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2
Set up a threading process to run Ollama on Google Colab and ensure smooth execution.
import threading
import subprocess
import time

def run_ollama_serve():
    # Launch the Ollama server as a background process.
    subprocess.Popen(["ollama", "serve"])

# Start the server in a separate thread so the notebook stays responsive.
thread = threading.Thread(target=run_ollama_serve)
thread.start()

# Give the server a few seconds to come up before issuing requests.
time.sleep(5)
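Optionally, you can confirm the server is up before pulling the model. Ollama listens on localhost port 11434 by default and replies to a plain GET request with a short status string; this probe is a convenience check added here, not part of the original walkthrough.

import requests

try:
    # Ollama's root endpoint replies with a short status string.
    print(requests.get("http://localhost:11434").text)
except requests.exceptions.ConnectionError:
    print("Ollama server not reachable yet; wait a few seconds and retry.")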
Download the EXAONE 3.5 model (7.8B variant) using Ollama to prepare it for querying.
!ollama pull exaone3.5
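If you want to confirm the download completed, Ollama's list command prints the models available locally:

!ollama list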
Define the query using Langchain, invoke the model, and display the response in Markdown format to evaluate the model’s performance.
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown, display

# Wrap the user's question in a simple prompt template.
template = """Question: {question}"""
prompt = ChatPromptTemplate.from_template(template)

# Point the Langchain wrapper at the locally served EXAONE model.
model = OllamaLLM(model="exaone3.5")

# Compose the prompt and model into a runnable chain.
chain = prompt | model

# Prepare input for invocation
input_data = {
    "question": "I have 2 apples, then I buy 2 more. I bake a pie with 2 of the apples. After eating half of the pie how many apples do I have left?"
}

# Invoke the chain with input data and display the response in Markdown format
response = chain.invoke(input_data)
display(Markdown(response))
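For reference, the expected answer is two apples: 2 + 2 = 4 apples, 2 of them go into the pie, and eating half of the pie does not change the number of whole apples remaining.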
Below we will test the model for different prompts:
For finding specific information in very long inputs
“Context: Climate change is causing glaciers to melt at an unprecedented rate,
leading to rising sea levels. In coastal cities like Miami and New Orleans, this
poses a significant threat to infrastructure and ecosystems. Furthermore,
scientists predict that if current trends continue, sea levels could rise by more
than six feet by the end of the century.
Question: Based on the context, what are two potential impacts of rising sea levels
due to climate change?”
Output:
As we can see from the output, the model has correctly identified the needed information from the context.
“Context: The Great Wall of China was built over several dynasties, primarily during
the Ming dynasty (1368–1644). It stretches over 13,000 miles and was constructed to
protect against invasions. Today, it stands as a UNESCO World Heritage site and
attracts millions of tourists each year.
Questions:
a) During which dynasty was most of the Great Wall constructed?
b) How long is the Great Wall of China?
c) What designation does it hold today?”
Output:
Again, the model correctly extracts all three answers from the context.
Let us now look into some real-world use cases below:
“User Query: "I received the wrong item in my order. What should I do?"
Prompt: Given the user's query, provide a clear and actionable response that guides
them through the return process. Include any necessary information about contacting
customer support or initiating a return.”
Output:
As we can see from the output, the model has answered the query well from the perspective of a customer support engineer.
“User Query: "I'm struggling with calculus concepts, especially derivatives. Can you explain it simply?"
Prompt: Explain the concept of derivatives in calculus using simple language and
examples. Include visual aids or analogies if possible to enhance understanding.”
Output:
As we can see from the output, the model has answered the query well from the perspective of an educational counsellor helping the student.
Below we will look into some logical reasoning tasks:
“Oliver picks 44 kiwis on Friday, then 58 on Saturday. On Sunday, he picks double
what he did on Friday, but five of them were smaller than average. How many kiwis
does Oliver have?”
Output:
The model provides an accurate answer (44 + 58 + 2 × 44 = 190 kiwis) and does not get confused by the irrelevant detail that five of Sunday’s kiwis were smaller than average.
“John is allergic to peanuts. He ate a peanut butter sandwich and felt fine. What
can we conclude about John's allergy?”
Output:

As we can see from the output above, the model handles the contradictory information in the input well, giving an accurate response and laying out its arguments correctly.
"한국의 수도는 무엇이며, 그 도시의 주요 특징은 무엇인가요?"
The English translation of the above query is “What is the capital of Korea, and what are the main features of that city?”
Output:
As we can see from the output above, the response is accurate with enough details.
"인도의 총리는 누구입니까? 한국어로 설명하다"
The English translation of the above query is “Who is the Prime Minister of India? Explain in Korean.”
Output:
The output shows that, although the model responds in Korean as instructed, the answer is factually inaccurate: the correct response should have been “Narendra Modi”.
EXAONE 3.5 by LG AI Research represents a significant advancement in large language models, offering three versatile configurations tailored for diverse applications. With its enhanced architecture, including an extended context length and robust instruction-following capabilities, EXAONE 3.5 excels in real-world tasks and multilingual contexts. Its performance benchmarks demonstrate competitive advantages in long-context processing and general domain tasks, making it a valuable tool for researchers and businesses alike, while adhering to ethical standards in AI development.
Q1. What model sizes are available in EXAONE 3.5?
A. EXAONE 3.5 comes in three variants with different parameter counts: 2.4 billion, 7.8 billion, and 32 billion parameters, allowing it to serve different computational needs.

Q2. Which languages does EXAONE 3.5 support?
A. EXAONE 3.5 is bilingual, with proficiency in both English and Korean, making it suitable for global and multilingual applications.

Q3. What is the maximum context length of EXAONE 3.5?
A. EXAONE 3.5 can handle a maximum context length of 32,768 tokens, enabling it to process longer texts without losing coherence.

Q4. How is EXAONE 3.5’s performance evaluated?
A. EXAONE 3.5’s performance is evaluated across real-world use cases, long-context processing, and general domain tasks such as mathematics, coding, and knowledge-based tasks.

Q5. What is decontamination, and why does EXAONE 3.5 use it?
A. EXAONE 3.5 employs a rigorous decontamination process to enhance its generalization performance by removing contaminated examples from the training data. Since the models train on web-crawled data, test-set examples that overlap with the training corpus can skew evaluation metrics and compromise reliability.