Building a RAG System for AI Reasoning with DeepSeek R1 Distilled Model

Nibedita Dutta Last Updated : 11 Feb, 2025

DeepSeek R1, released in January 2025 by Chinese AI startup DeepSeek, is making waves in the AI industry as an open-source language model that rivals some of the most advanced models, such as OpenAI's o1. DeepSeek-R1 distinguishes itself through its mixture-of-experts (MoE) architecture, reinforcement learning techniques, and focus on reasoning capabilities, enabling it to perform text-based tasks with efficiency and accuracy. It has 671 billion parameters but activates only 37 billion per request, reducing computational costs. DeepSeek R1 also distills its advanced reasoning capabilities into smaller, more accessible open-source models such as Llama and Qwen, fine-tuning them on reasoning data generated by the main DeepSeek R1 model.

In this tutorial, we will build a Retrieval Augmented Generation (RAG) system using the DeepSeek-R1-Distill-Qwen-1.5B model. This distilled DeepSeek-R1 model was created by fine-tuning a smaller Qwen2.5 base model on data generated with DeepSeek-R1.

Learning Objectives

  • Understand the architecture, key innovations, and reinforcement learning techniques behind the DeepSeek-R1 model.
  • Explore the role of Group Relative Policy Optimization (GRPO) in enhancing DeepSeek-R1’s reasoning capabilities.
  • Analyze DeepSeek-R1’s benchmark performance and its efficiency compared to other leading AI models.
  • Implement a Retrieval Augmented Generation (RAG) system using DeepSeek-R1 distilled models like Llama and Qwen.

This article was published as a part of the Data Science Blogathon.

What is the DeepSeek-R1 Model?

DeepSeek-R1 and DeepSeek-R1-Zero are first-generation reasoning models. DeepSeek-R1-Zero is trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and it develops remarkable reasoning capabilities and interesting reasoning behaviors through RL alone. This approach marks a step toward improving language model reasoning using pure RL. However, DeepSeek-R1-Zero faces challenges such as poor readability and language mixing.

DeepSeek-R1 overcomes the limitations of DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning, providing a strong foundation for reasoning and non-reasoning tasks.

What Makes DeepSeek-R1 Stand Out?

DeepSeek-R1 stands out with its advanced architecture and enhanced efficiency, pushing the boundaries of AI performance. This model introduces key innovations that set it apart from its predecessors and competitors.

Key Innovations in DeepSeek R1 model

The following features set the DeepSeek R1 model apart:

  • Mixture-of-Experts (MoE) Architecture: Unlike standard dense transformer-based models, DeepSeek R1 employs an MoE architecture, activating only 37 billion of its 671 billion parameters per request. This improves efficiency and reduces computational costs (an illustrative routing sketch follows this list).
  • Reinforcement Learning (RL): DeepSeek-R1’s training process uses reinforcement learning to enhance its reasoning capabilities. This approach eliminates the need for a separate value function model, making the fine-tuning process more efficient.
  • Cost-Effectiveness: DeepSeek reportedly trained R1 using far fewer resources (around 2,000 Nvidia GPUs and approximately $5.6 million) than comparable projects by major U.S.-based tech companies. Its API costs are also substantially lower than competitors', making it a cost-effective option for developers.
  • Superior Benchmark Performance: DeepSeek-R1 posts strong scores across accuracy and percentile benchmarks, often matching or exceeding competing models. For example, it achieved 79.8% on AIME 2024, a 96.3 percentile rating on Codeforces, 71.5% on GPQA Diamond, 97.3% on MATH-500, 90.8% on MMLU, and 49.2% on SWE-bench Verified.
  • Scalability: DeepSeek has introduced “distilled” versions of R1, ranging from 1.5 billion to 70 billion parameters, making it accessible for various hardware configurations.
  • Long Context Handling: Supports a context window of 128K tokens, enabling efficient handling of complex tasks that require detailed analysis, and is adept at maintaining logic and context over long interactions.
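To make the MoE bullet above concrete, here is a toy top-k routing layer in PyTorch. It is purely an illustrative sketch (the class name ToyMoELayer and all sizes are invented), not DeepSeek-R1's actual implementation; it only shows how a router can send each token to a small subset of experts so that just a fraction of the total parameters is active for any single request.

# Toy top-k expert routing (illustrative only, not DeepSeek-R1's MoE code)
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)   # router that scores each expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.GELU(),
                           nn.Linear(4 * d_model, d_model)) for _ in range(n_experts)]
        )

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)     # mix only the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topk_idx[:, slot] == e        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)   # torch.Size([10, 64])

In DeepSeek-R1 the same routing idea is applied at a much larger scale, which is how a 671-billion-parameter model ends up activating only about 37 billion parameters per request.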

Reinforcement Learning in DeepSeek R1 Model

DeepSeek-R1’s innovative use of reinforcement learning (RL) signifies a radical shift from traditional AI training methods, which typically depend on massive labeled datasets. Unlike supervised learning, RL allows models to learn through interaction and feedback, significantly reducing reliance on large datasets and mitigating ethical concerns related to data privacy and bias.

  • Pure RL: The DeepSeek R1 family pioneers a training process centered on RL rather than the traditional reliance on supervised fine-tuning; DeepSeek-R1-Zero learns complex reasoning behaviors purely through reinforcement learning, without any supervised fine-tuning.
  • Self-Evolution: The model refines its behavior through trial and error, achieving higher performance with each training iteration.
  • Accuracy Rewards: The model earns rewards by matching its predictions to ground-truth answers, creating a precise feedback loop in tasks with clear right or wrong answers, such as mathematics. The system uses rule-based verification, testing code against specific cases and validating mathematical solutions against established formulas.
  • Format Rewards: The model receives additional rewards for clear, well-structured responses and learns to express its reasoning process using specific tags (a toy sketch of both reward types follows this list).
  • Chain-of-Thought (CoT) Reasoning: The model articulates its thought process step-by-step, allowing it to refine its own reasoning, identify errors, and correct them on the fly, making it more accurate over time. Reinforcement learning and fine-tuning use long Chain of Thought data to encourage the model to deliver longer, more introspective outputs.
  • Efficiency and Innovation: DeepSeek’s approach shifts the focus from merely accumulating more data to enhancing the quality of data through smarter computation.
  • Combination of RL and SFT: DeepSeek-R1 combines a small amount of high-quality “cold-start” data alongside iterative reinforcement learning and supervised fine-tuning to produce more coherent, user-friendly outputs while maintaining state-of-the-art reasoning performance.
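To illustrate the accuracy and format rewards described above, here is a toy pair of rule-based reward functions. The tag convention and reward values below are assumptions chosen for illustration; DeepSeek has not published its exact reward rules as code.

# Toy rule-based rewards (illustrative only)
import re

def accuracy_reward(model_answer: str, ground_truth: str) -> float:
    """Reward 1.0 when the extracted final answer exactly matches the ground truth."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def format_reward(completion: str) -> float:
    """Reward completions that wrap their reasoning in <think>...</think> tags."""
    pattern = r"^<think>.*?</think>.*$"
    return 0.5 if re.match(pattern, completion.strip(), flags=re.DOTALL) else 0.0

completion = "<think>2+2 equals 4 because ...</think> The answer is 4"
print(accuracy_reward("4", "4") + format_reward(completion))   # 1.5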

Group Relative Policy Optimization in DeepSeek-R1

GRPO, or Group Relative Policy Optimization, represents a reinforcement learning approach designed to enhance the reasoning prowess of Large Language Models (LLMs). First presented in the DeepSeekMath publication concerning mathematical reasoning, GRPO innovates upon traditional Proximal Policy Optimization (PPO) by dispensing with a value function model.

How GRPO Works in DeepSeek-R1

GRPO works with both rule-based/binary rewards and general reward models, and can be used to refine models for qualities such as helpfulness. The process unfolds as follows:

  • Sampling: The current policy generates multiple outputs for each given prompt.
  • Reward Scoring: A rule-based or outcome-based reward function assigns a score to each generated output.
  • Advantage Calculation: The average reward of the group serves as a baseline; each output's advantage is computed relative to this baseline and normalized within the group (a small numeric sketch of this step follows the list).
  • Policy Optimization: The policy is updated to maximize the GRPO objective, which incorporates the calculated advantages and a KL divergence term; this contrasts with PPO, which folds the KL term into the reward.
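Here is a small numeric sketch of the advantage step described in the list above. The reward values are made up, and the closing comment only paraphrases the shape of the GRPO objective rather than reproducing the exact formula.

# Group-relative advantage: a toy numeric example (illustrative only)
import torch

# Suppose the current policy samples G = 6 completions for one prompt and a
# rule-based reward function scores them:
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])

# Normalize each reward against the group's own statistics, so no separate
# value (critic) model is needed to estimate a baseline.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
print(advantages)

# Conceptually, the policy update then maximizes something like
#   E[min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)] - beta * KL(policy || reference)
# where 'ratio' compares new and old token probabilities and beta weights the KL term.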

Performance Benchmarks of DeepSeek R1 model

DeepSeek R1 has demonstrated impressive performance on several benchmarks.

  • Benchmark Results: DeepSeek reports that R1 matches or surpasses OpenAI's o1 on benchmarks such as AIME 2024, MATH-500, and SWE-bench Verified.
  • MATH-500: DeepSeek-R1 leads with 97.3%, slightly surpassing OpenAI's o1-1217 at 96.4%.
  • SWE-bench Verified: DeepSeek-R1 achieved a score of 49.2% on this benchmark, which assesses reasoning in software engineering tasks.
  • AIME 2024: DeepSeek-R1 scored 79.8%, performing on par with OpenAI's o1-1217.

What are DeepSeek-R1 Distilled models?

To adapt DeepSeek R1's advanced reasoning abilities for use in more compact language models, the creators compiled a dataset of roughly 800,000 examples generated by DeepSeek R1 itself. These examples were then used to fine-tune existing models such as Qwen and Llama. The results demonstrated that this relatively simple knowledge distillation method effectively transferred R1's sophisticated reasoning capabilities to these other models. Remarkably, this transfer was achieved without any further reinforcement learning, highlighting the quality and instructional power of the original DeepSeek R1's outputs.
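As a rough illustration of this distillation-by-fine-tuning idea, the sketch below runs plain supervised fine-tuning on teacher-generated examples. Everything here is an assumption for illustration: the file distill_data.jsonl and its prompt/response fields are hypothetical, Qwen/Qwen2.5-1.5B is just one possible student model, and the hyperparameters are placeholders; DeepSeek's actual 800K-example dataset and training recipe are not public beyond the paper's description.

# Illustrative SFT-style distillation sketch (not DeepSeek's actual recipe)
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

base_model = "Qwen/Qwen2.5-1.5B"          # hypothetical student model
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical JSONL file of teacher-generated {"prompt": ..., "response": ...} pairs
dataset = load_dataset("json", data_files="distill_data.jsonl", split="train")

def tokenize(example):
    # Concatenate the prompt and the teacher's reasoning trace into one training sequence
    return tokenizer(example["prompt"] + example["response"],
                     truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-student",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()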

Benefits of RAG with DeepSeek R1 Distilled Models

  • Improved Reasoning in Smaller Models: Distillation transfers the reasoning capabilities of the larger DeepSeek R1 model into more compact architectures. This allows smaller models, such as the 8B version, to improve over their corresponding base Llama models on specific reasoning tasks.
  • Enhanced Efficiency: Distilled models significantly improve inference speed and reduce computational costs compared to the original 671B-parameter model. Smaller distilled models process requests much faster and consume fewer resources, making them more cost-effective for production deployments.
  • Cost-Effectiveness: Distilled models provide sufficient capability for many applications at a lower cost, making them a cost-effective solution for developers.
  • Accessibility: Distilled models extend the reach of advanced reasoning by fine-tuning smaller open-source models like Llama and Qwen, bringing powerful reasoning capabilities to hardware and applications that could not host the full model.

Building a RAG System using DeepSeek-R1-Distill-Qwen-1.5B model

We will build a RAG system based on the DeepSeek-R1-Distill-Qwen-1.5B model on Google Colab with a T4 GPU.

Step 1: Install the prerequisite libraries

Install all necessary libraries to set up the RAG system on Google Colab.

!pip install -q torch transformers sentence-transformers faiss-cpu pypdf
!pip install -U langchain-huggingface 
!pip install -q langchain langchain-community 

Step 2: Importing Necessary Libraries

Load essential Python libraries for document processing, embedding storage, retrieval, and model interaction.

# Document loading, chunking, and vector storage
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Embeddings and LLM wrappers for Hugging Face models
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline

# Prompting and chain composition
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Model loading and text generation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Step 3: Loading the PDF 

Use a PDF file as the knowledge source for the RAG system by extracting its text.

We have used this PDF (saved locally as Coffee.pdf) as the knowledge source for the RAG system.

# Load content from local PDFs
loader = PyPDFLoader("./Coffee.pdf")
docs = loader.load()

Step 4: Storing the Embeddings of the Chunked Data in a DB

Split the document into smaller chunks and store their vector embeddings in a FAISS database.

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)

db = FAISS.from_documents(chunked_docs,
                          HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5'))
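Optionally, you can sanity-check the chunking before building the retriever; this quick check is an add-on to the original walkthrough, not a required step.

# Optional: confirm the document was split and indexed as expected
print(f"Number of chunks indexed: {len(chunked_docs)}")
print(chunked_docs[0].page_content[:200])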

Step 5: Defining the Retriever

Create a retriever to fetch relevant document chunks based on similarity search.

retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)
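Before wiring the retriever into the chain, you can optionally spot-check what it returns for a sample query. This snippet is an add-on (not part of the original steps) and assumes a recent LangChain version in which retrievers expose invoke().

# Optional: inspect the top-k chunks returned for a sample query
sample_docs = retriever.invoke("coffee by-products and intestinal pH")
for i, d in enumerate(sample_docs, 1):
    print(f"--- Chunk {i} ---")
    print(d.page_content[:200])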

Step 6: Loading the Model

Load the DeepSeek-R1-Distill-Qwen-1.5B model and its tokenizer for text generation.

model_name ="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
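On a memory-constrained Colab T4, you may prefer to load the model in half precision and place it on the GPU. This variant is optional and not part of the original steps; device_map="auto" additionally requires the accelerate package.

# Optional: half-precision loading on GPU (requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)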

Step 7: Loading the RAG pipeline

Set up the retrieval-augmented generation (RAG) pipeline using the model and a custom prompt template.

# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=500,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
You are an academic researcher who is doing research on Chemical Sciences. Use the following context to answer the question using information provided by the paper:

{context}

Question: {question}
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)

Step 8: Querying the model

Ask a question related to the document and use the RAG pipeline to generate an answer.

question = "Which coffee by-products can lead to reduction of intestinal pH? "

# Invoke the chain to generate the answer
result = rag_chain.invoke(question)

# Display the output
print(result)

Output

Based on the given documents, what conclusion can you draw?

The options are:
A) Melanoidins
B) Chlorogenic acids
C) Osmolytes
D) Carbohydrates

I need to choose the correct option.
Okay, so I'm trying to figure out this chemistry question about coffee by-products 
and how they affect the pH of the intestine. Let me start by understanding the
 question.

The question asks: Which coffee by-products can lead to a reduction in the 
intestinal pH? The options are A) Melanoidins, B) Chlorogenic acids, C) Osmolytes,
 D) Carbohydrates.

Looking at the documents provided, each one seems to discuss different aspects
 related to coffee by-products and their potential roles in the gut microbiota. 
Since all three documents are about coffee by-products, I'll focus on those.

First, let's recall some basic concepts. Intestinal pH refers to the acidity or
 basicity of the soil around the digestive system. A lower pH means more acidic, 
while a higher pH means more alkaline. In the gut microbiota, bacteria often live in
 environments that are either acidic or basic. For example, some bacteria thrive in
 acidic conditions, others in neutral, and some in alkaline.

Now, looking at the documents:
1. The first document talks about the effects of certain coffee products on gut
 microbiota but doesn't directly mention pH changes. It focuses more on the impact
 on the microbiome rather than the chemical properties of the by-products.

2. The second and third documents seem to delve deeper into specific by-products.
 They mention melanoidins and chlorogenic acids. Also, there's a discussion about
 probiotics and gut health.

Let me break down the key points from these documents.

Starting with melanoidins: These are pigments produced by coffee beans. They are
 known to have anti-inflammatory properties. From what I remember, melanoidins can
 act as cofactors in various biochemical processes. One study I've heard about
 suggests that melanoidins might influence the activity of enzymes involved in the
 gut microbiome. Specifically, they could help maintain the balance of certain
 microbial species. If melanoidins are present, maybe they contribute to keeping the
 gut environment more balanced, possibly affecting pH levels.

Chlorogenic acids: These are another type of pigment produced by coffee beans.
 They're similar to melanoidins but have slightly different structures. Chlorogenic
 acids are also known for their antioxidant properties.

As observed from the output above, the answer is enriched with elaborate reasoning since we used the DeepSeek-R1 distilled model (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
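Because the distilled model emits its chain of thought before a closing </think> tag (visible in the second query's output later in this article), you can optionally post-process the result to keep only the final answer. This is a small add-on, not part of the original walkthrough.

# Split the generated text at the closing reasoning tag, if present
reasoning, sep, final_answer = result.partition("</think>")
print(final_answer.strip() if sep else result.strip())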

Output from Original Qwen2.5-1.5B

Let's now see what the output would have been with the original Qwen2.5-1.5B model. We can simply replace the model name with "Qwen/Qwen2.5-1.5B" and re-run the code.

Answer: 
melonoidins

As seen from the output of the original Qwen2.5-1.5B model, it lacks the reasoning and human-like text we got from the DeepSeek-R1-Distill-Qwen-1.5B model. Also, "Chlorogenic acids" is not mentioned in the output from the original model.

Another Query 

question = "What are three main polysaccharides found in non-defective coffee beans?"

# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)

Output

Based on the provided context, select all correct options from A to D.
To solve this, I need to look for the relevant information about polysaccharides in
 non-defective coffee beans.

First, I'll go through each document's page content to find mentions of
 polysaccharides like arabinogalactan, mannan, etc.

Looking at the first document, it lists arabinogalactan, mannan, and cellulose as
 the main polysaccharides. So that's one set.

The second document also mentions arabinogalactan, mannan, and cellulose. It further
 notes that xylan is predominant, but that's more about the byproduct, so maybe not
 directly related to the main ones.

Third document again lists arabinogalactan, mannan, and cellulose. It talks about
 pectins and xylan, which might be byproducts.

So, putting it together, the main polysaccharides are arabinogalactan, mannan, and
 cellulose. Therefore, the correct options should include these three.
</think>

The three main polysaccharides found in non-defective coffee beans are 
arabinogalactan, mannan, and cellulose.

Answer: A, B, C

As observed from the output above, the answer is enriched with detailed reasoning and human-like text even with the small 1.5-billion-parameter DeepSeek-R1 distilled model (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).

Conclusion

DeepSeek-R1 is a major leap in language model reasoning. It combines large-scale reinforcement learning (RL) with a mixture-of-experts architecture and advanced training methods such as Group Relative Policy Optimization (GRPO) to achieve strong benchmark performance. These innovations improve efficiency, scalability, and cost-effectiveness. DeepSeek-R1 also excels at distilling its complex reasoning into smaller models. Pairing RAG with distilled models such as DeepSeek-R1-Distill-Qwen boosts the reasoning quality of smaller architectures while reducing costs and increasing speed, enabling faster, more resource-efficient deployments for developers.

Key Takeaways

  • DeepSeek-R1's training centers on reinforcement learning (RL) to enhance reasoning capabilities, marking a shift from purely supervised fine-tuning and reducing reliance on large labeled datasets.
  • The Mixture-of-Experts (MoE) architecture of DeepSeek-R1 activates only a subset of its massive 671 billion parameters per request, improving efficiency and reducing computational costs.
  • Despite its advanced capabilities, DeepSeek-R1 uses fewer resources than other models and reduces API costs, making it an affordable option for developers.
  • DeepSeek-R1 outperforms competitors across multiple benchmarks, such as MATH-500 and AIME, demonstrating its strong reasoning performance and accuracy.
  • DeepSeek R1's reasoning abilities have been successfully transferred to smaller, compact models through knowledge distillation, enabling high-quality performance across various hardware configurations without additional reinforcement learning.
  • Using RAG with distilled models like DeepSeek R1 enhances the efficiency and reasoning capabilities of smaller architectures, offering significant advantages in cost and speed.

Frequently Asked Questions

Q1. What is the key difference between DeepSeek-R1 and DeepSeek-R1-Zero?

A. DeepSeek-R1 improves upon DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning (RL), which enhances its reasoning capabilities and reduces challenges like poor readability and language mixing that were present in DeepSeek-R1-Zero.

Q2. How does DeepSeek-R1 use reinforcement learning (RL) in its training?

A. DeepSeek-R1 employs pure RL to refine its reasoning abilities. Unlike traditional models that rely on supervised fine-tuning, RL allows the model to learn through interaction, feedback, and self-evolution, improving its performance over time. It also uses rewards for accurate predictions and well-structured responses.

Q3. What are the key benefits of the Mixture-of-Experts (MoE) architecture in DeepSeek-R1?

A. The MoE architecture in DeepSeek-R1 allows it to activate only a subset of its 671 billion parameters (37 billion per request), significantly improving computational efficiency and reducing costs, which makes it a more resource-effective solution than standard transformer-based models.

Q4. How does DeepSeek-R1 perform on standard benchmark tests compared to other models?

A. DeepSeek-R1 consistently outperforms competitors, achieving top scores in benchmarks like MATH-500, AIME 2024, and SWE-bench Verified. It has been shown to surpass OpenAI’s o1 model in tasks like mathematical reasoning and software engineering problem-solving.

Q5. What is knowledge distillation, and how is it used in DeepSeek-R1?

A. Knowledge distillation in DeepSeek-R1 refers to transferring its advanced reasoning abilities to smaller models like Qwen and Llama. By fine-tuning on a dataset of roughly 800,000 examples generated by DeepSeek R1, the distilled models adopt its sophisticated reasoning capabilities without needing additional reinforcement learning.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Nibedita completed her master’s in Chemical Engineering from IIT Kharagpur in 2014 and is currently working as a Senior Data Scientist. In her current capacity, she works on building intelligent ML-based solutions to improve business processes.
