Bilingual Powerhouse EXAONE 3.5 Sets New AI Standards

Nibedita Dutta | Last Updated: 16 Jan, 2025 | 9 min read

EXAONE 3.5 is the latest iteration in a series of large language models developed by LG AI Research, designed to enhance the capabilities and accessibility of artificial intelligence technologies. Released in December 2024, EXAONE 3.5 encompasses three distinct configurations: 2.4 billion, 7.8 billion, and 32 billion parameters. Each model variant is tailored to meet different performance needs, ranging from lightweight applications suitable for mobile devices to high-performance tasks requiring extensive computational resources. With a focus on bilingual proficiency in English and Korean, EXAONE 3.5 aims to set new standards in instruction-following accuracy and long-context understanding, making it an invaluable tool across various sectors.

Learning Objectives

  • Understand the architecture and design choices of EXAONE 3.5, including its decoder-only transformer model and extended context length.
  • Explore the bilingual proficiency of EXAONE 3.5 in English and Korean, and its applications in multilingual scenarios.
  • Learn about the two-stage training process and how fine-tuning enhances instruction-following and long-context understanding.
  • Gain insights into advanced methodologies like the decontamination process and Direct Preference Optimization (DPO) for training LLMs.
  • Evaluate EXAONE 3.5’s performance benchmarks across real-world use cases, long-context processing, and general domain tasks.

This article was published as a part of the Data Science Blogathon.

How Do Reasoning-Based LLMs Work?

Reasoning-based large language models, like EXAONE 3.5, handle complex tasks that require logical thinking, problem-solving, and understanding of intricate patterns. Built on advanced architectures such as transformer networks, these models excel at handling sequential data and long contexts. They are trained on vast datasets to recognize relationships between pieces of information, enabling them to generate accurate responses to queries, reason through problems, and follow instructions effectively.

By leveraging fine-tuning techniques like Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO), these LLMs refine their ability to mimic human-like reasoning in diverse applications, from simple tasks to complex decision-making scenarios.

EXAONE 3.5 Model Architecture

EXAONE 3.5 utilizes a decoder-only transformer architecture, which has become a standard in modern LLM design due to its efficiency in processing sequential data. The architecture is optimized for instruction-following tasks, allowing it to understand and execute user commands effectively. The key specifications for all three model variants (2.4 billion, 7.8 billion, and 32 billion parameters) are as follows, with a short configuration-inspection snippet after the list:

  • Maximum Context Length: 32,768 tokens
  • Layers: 32
  • Feedforward Dimension: 14,336
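For readers who want to check these settings themselves, the short sketch below loads the published model configuration from Hugging Face and prints it. The repository name LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct and the need for trust_remote_code=True are assumptions about how the checkpoints are hosted; swap in the variant you actually intend to use.

from transformers import AutoConfig

# Load the EXAONE 3.5 configuration (repo name and trust_remote_code are assumptions).
config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct",
    trust_remote_code=True,
)

# Print the full configuration, then pick out the fields discussed above if they exist.
print(config)
for field in ("max_position_embeddings", "num_layers", "num_hidden_layers", "intermediate_size"):
    if hasattr(config, field):
        print(field, "=", getattr(config, field))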

Architectural Innovations in EXAONE 3.5

EXAONE 3.5 introduces groundbreaking advancements to its architecture, enhancing its ability to process extended contexts and deliver accurate, user-aligned outputs. These innovations set new standards for efficiency and performance in large language models.

  • Extended Context Length: The maximum context length has been significantly increased to accommodate up to 32,768 tokens, enabling effective processing of larger texts without losing coherence.
  • Two-Stage Training Process: EXAONE underwent a two-stage training process consisting of general-domain training followed by fine-tuning for specific tasks related to long-context understanding. In the pre-training phase, the process removes duplicates and personally identifiable information from datasets to improve the models’ performance and reduce infrastructure costs. In the post-training phase, Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO) methods enhance the models’ instruction-following capabilities and enable them to better reflect user preferences.
  • Decontamination Process: The team applied a rigorous decontamination process to ensure unbiased evaluations by removing contaminated examples from the training set. The method was adopted from a global model known for its rigorous evaluation practice: the training data was compared against the evaluation datasets, and the check was repeated 10 times.

What is Direct Preference Optimization (DPO)?

Direct Preference Optimization (DPO) is an algorithm for fine-tuning large language models by aligning them directly with human preferences, without the complexity of traditional reinforcement learning pipelines. Unlike Reinforcement Learning from Human Feedback (RLHF), which requires intricate reward modeling and sampling, DPO uses a straightforward classification loss to optimize model responses based on user preferences. This makes training stable, computationally lightweight, and easier to implement.

It is important to note that DPO requires a preference dataset: a collection of triplets, each consisting of a prompt, a chosen answer, and a rejected answer.
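To make the objective concrete, here is a minimal DPO loss sketch in PyTorch. It is illustrative only, not EXAONE 3.5's actual training code: it shows how the log-probabilities of the chosen and rejected answers under the policy are compared against a frozen reference model through a simple classification-style loss, with beta controlling how strongly preferences are enforced.

import torch
import torch.nn.functional as F

# Minimal DPO loss sketch (illustrative, not the EXAONE training code).
# Each input is the summed log-probability of an answer under the policy
# being trained or under a frozen reference model.
def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit "rewards" are log-probability ratios against the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Binary classification-style loss: prefer the chosen answer over the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy values for a batch of two preference triplets.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.5]))
print(loss)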

What is the Decontamination Process?

Decontamination refers to a rigorous process aimed at enhancing the generalization performance of the models by removing contaminated examples from the training dataset. Since the training data often comes from web crawls, some test-set examples might appear in the training corpus, which can lead to biased evaluations. To address this, EXAONE uses a substring-level matching method to identify and eliminate these contaminated samples.
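The sketch below illustrates the general idea of substring-level decontamination: normalize the text, slide a fixed-length window over every evaluation example, and drop any training document that contains one of those windows. The window length and normalization used here are illustrative assumptions, not the exact parameters of the EXAONE pipeline.

import re

def normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not hide overlaps.
    return re.sub(r"\s+", " ", text.lower()).strip()

def build_eval_substrings(eval_texts, window=50):
    # Collect fixed-length character windows from every evaluation example.
    substrings = set()
    for text in eval_texts:
        t = normalize(text)
        for i in range(max(len(t) - window + 1, 1)):
            substrings.add(t[i:i + window])
    return substrings

def decontaminate(train_texts, eval_texts, window=50):
    eval_subs = build_eval_substrings(eval_texts, window)
    # Keep a training document only if no evaluation window appears inside it.
    return [doc for doc in train_texts
            if not any(sub in normalize(doc) for sub in eval_subs)]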

These architectural enhancements enable EXAONE models to excel in real-world applications while maintaining competitive performance across various benchmarks.

Performance Benchmarks

The evaluation benchmarks of EXAONE 3.5 Models were categorized into three groups:

  • Real-world use cases – evaluated the models’ ability to understand and respond to user queries in practical scenarios
  • Long-context processing – assessed the models’ capability to process and retrieve information from extended textual inputs
  • General domain tasks – tested the models’ proficiency in mathematics, coding, and knowledge-based tasks.

Figures: EXAONE 3.5 benchmark results on real-world use cases and long-context processing.

As seen in the figures above, all three models excelled in real-world use cases and long-context scenarios, often surpassing baseline models of similar size. For example, the 32B model achieved an average score of 74.3 in real-world use cases, significantly outperforming competitors like Qwen 2.5 32B and Gemma 2 27B.


Figure: EXAONE 3.5 results on general domain benchmarks.

EXAONE 3.5 also excels in mathematical and coding tasks. Across nine general-domain benchmarks, the 2.4B model achieved the highest average score, surpassing other global models of the same size. Likewise, the 7.8B and 32B models placed among the top performers, securing impressive average scores.

Running EXAONE 3.5 (7.8 Billion) on Google Colab Using Ollama

Below we will learn how to set up and query the EXAONE 3.5 model (7.8B variant) on Google Colab using Ollama. This guide walks you through the installation, configuration, and testing process so you can evaluate the model's capabilities firsthand.

Step 1: Installation of Libraries

Install necessary libraries and tools, including Langchain and Ollama, to prepare the Colab environment for running the model.

!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2

Step 2: Running Ollama in a Background Thread on Google Colab

Start the Ollama server in a background thread so the notebook can continue executing cells while the server runs.

import threading
import subprocess
import time

def run_ollama_serve():
  # Launch the Ollama server as a background process.
  subprocess.Popen(["ollama", "serve"])

# Run the server in a separate thread so the notebook stays responsive,
# then wait a few seconds for it to finish starting up.
thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)

Step 3: Pulling the Ollama Model

Download the EXAONE 3.5 model (7.8B variant) using Ollama to prepare it for querying.

!ollama pull exaone3.5

Step 4: Querying the Model

Define the query using Langchain, invoke the model, and display the response in Markdown format to evaluate the model’s performance.

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from IPython.display import Markdown

template = """Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="exaone3.5")

chain = prompt | model

# Prepare input for invocation
input_data = {
    "question": 'I have 2 apples, then I buy 2 more. I bake a pie with 2 of the apples. After eating half of the pie how many apples do I have left?'}

# Invoke the chain with input data and display the response in Markdown format
response = chain.invoke(input_data)
display(Markdown(response))
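Since the ollama Python client was also installed in Step 1, the running server can be queried without Langchain as well. The snippet below is a minimal sketch that assumes the server from Step 2 is up and the exaone3.5 model from Step 3 has been pulled.

import ollama

# Query the local Ollama server directly through its Python client.
reply = ollama.chat(
    model="exaone3.5",
    messages=[{"role": "user", "content": "Summarize what EXAONE 3.5 is in two sentences."}],
)
print(reply["message"]["content"])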

Testing the Model For Different Prompts

Below we will test the model for different prompts:

Needle in the Haystack Tasks

These tasks test the model's ability to find specific information in very long inputs.

Context: Climate change is causing glaciers to melt at an unprecedented rate, 
leading to rising sea levels. In coastal cities like Miami and New Orleans, this
poses a significant threat to infrastructure and ecosystems. Furthermore,
scientists predict that if current trends continue, sea levels could rise by more
than six feet by the end of the century.
Question: Based on the context, what are two potential impacts of rising sea levels
due to climate change?
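Because the chain defined in Step 4 expects a single question field, the whole context-plus-question block is passed as one string. A minimal sketch of how this prompt can be sent to the model is shown below; needle_prompt simply wraps the text above.

# Feed the long-context prompt through the chain from Step 4.
needle_prompt = """Context: Climate change is causing glaciers to melt at an unprecedented rate,
leading to rising sea levels. In coastal cities like Miami and New Orleans, this
poses a significant threat to infrastructure and ecosystems. Furthermore,
scientists predict that if current trends continue, sea levels could rise by more
than six feet by the end of the century.
Question: Based on the context, what are two potential impacts of rising sea levels
due to climate change?"""

response = chain.invoke({"question": needle_prompt})
display(Markdown(response))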

Output:


As we can see from the output, the model has correctly identified the needed information from the context.

Ancestral Trace Challenge

Context: The Great Wall of China was built over several dynasties, primarily during
the Ming dynasty (1368–1644). It stretches over 13,000 miles and was constructed to
protect against invasions. Today, it stands as a UNESCO World Heritage site and
attracts millions of tourists each year.
Questions:
a) During which dynasty was most of the Great Wall constructed?
b) How long is the Great Wall of China?
c) What designation does it hold today?

Output:


As we can see from the output, the model has correctly identified the needed information from the context.

Real-world Use Case Scenarios

Let us now look into some real-world use cases below:

Customer Support Scenario

“User Query: "I received the wrong item in my order. What should I do?"
Prompt: Given the user's query, provide a clear and actionable response that guides
them through the return process. Include any necessary information about contacting
customer support or initiating a return.”

Output:


As we can see from the output, the model answers the query well from the perspective of a customer support agent.

Educational Assistance

“User Query: "I'm struggling with calculus concepts, especially derivatives. Can you explain it simply?"
Prompt: Explain the concept of derivatives in calculus using simple language and
examples. Include visual aids or analogies if possible to enhance understanding.”

Output:


As we can see from the output, the model answers well from the perspective of an educational counsellor helping the student with the query.

Logical Reasoning Tasks

Below we will look into some logical reasoning tasks:

Fragile Mathematical Context

“Oliver picks 44 kiwis on Friday, then 58 on Saturday. On Sunday, he picks double
what he did on Friday, but five of them were smaller than average. How many kiwis
does Oliver have?”

Output:


The model provides an accurate response to the fragile mathematical context above and is not confused by the irrelevant detail: 44 kiwis on Friday plus 58 on Saturday plus 88 on Sunday gives 190, and the five smaller-than-average kiwis still count.

Contradictory Information

“John is allergic to peanuts. He ate a peanut butter sandwich and felt fine. What
can we conclude about John's allergy?”

Output:

As we can see from the output above, the model handles the contradictory information in the input accurately, laying out the relevant arguments correctly.

Korean Tasks on General Knowledge

"한국의 수도는 무엇이며, 그 도시의 주요 특징은 무엇인가요?"

The English translation of the above query is: “What is the capital of Korea, and what are the main features of that city?”

Output:


As we can see from the output above, the response is accurate with enough details.

Korean Task on General Knowledge with Desired Output in Korean

"인도의 총리는 누구입니까? 한국어로 설명하다"

The English translation of the above query is: “Who is the Prime Minister of India? Explain in Korean.”

Output:


The output shows that, although the model responds in Korean as instructed, the answer is inaccurate. The correct answer would have been “Narendra Modi”.

Conclusion

EXAONE 3.5 by LG AI Research represents a significant advancement in large language models, offering three versatile configurations tailored for diverse applications. With its enhanced architecture, including an extended context length and robust instruction-following capabilities, EXAONE 3.5 excels in real-world tasks and multilingual contexts. Its performance benchmarks demonstrate competitive advantages in long-context processing and general domain tasks, making it a valuable tool for researchers and businesses alike, while adhering to ethical standards in AI development.

Key Takeaways

  • EXAONE 3.5 offers three variants with different parameter counts (2.4 billion, 7.8 billion, and 32 billion), catering to a range of applications, from mobile-friendly solutions to high-performance tasks requiring more computational power.
  • The model supports a maximum context length of 32,768 tokens, allowing it to effectively process longer texts and maintain coherence for tasks requiring in-depth responses.
  • EXAONE 3.5 excels in both English and Korean, making it suitable for a global audience and enabling multilingual use cases.
  • EXAONE 3.5 undergoes a two-stage training process: first, general-domain training, followed by fine-tuning for long-context understanding, optimizing the model’s real-world applicability.
  • A rigorous decontamination process removes test-set overlaps from the training data, ensuring fair and unbiased model evaluations.

Frequently Asked Questions

Q1. How many parameter configurations does EXAONE 3.5 have?

A. EXAONE 3.5 comes in three variants with different parameter counts: 2.4 billion, 7.8 billion, and 32 billion parameters, allowing it to serve different computational needs.

Q2. What languages does EXAONE 3.5 support?

A. EXAONE 3.5 is bilingual, with proficiency in both English and Korean, making it suitable for global and multilingual applications.

Q3. What is the maximum context length supported by EXAONE 3.5?

A. EXAONE 3.5 can handle a maximum context length of 32,768 tokens, enabling it to process longer texts without losing coherence.

Q4. What performance benchmarks were used to evaluate EXAONE 3.5?

A. EXAONE 3.5 was evaluated on three categories of benchmarks: real-world use cases, long-context processing, and general domain tasks such as mathematics, coding, and knowledge-based tasks.

Q5. What is the decontamination process in EXAONE 3.5?

A. EXAONE 3.5 employs a rigorous decontamination process to enhance its generalization performance by removing contaminated examples from the training data. Since the models are trained on web-crawled data, test-set examples that overlap with the training corpus can skew evaluation metrics and compromise reliability.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Nibedita completed her master’s in Chemical Engineering from IIT Kharagpur in 2014 and is currently working as a Senior Data Scientist. In her current capacity, she works on building intelligent ML-based solutions to improve business processes.
