DeepSeek R1, released in January 2025 by Chinese AI startup DeepSeek, is making waves in the AI industry as an open-source language model that rivals some of the most advanced models, such as OpenAI’s o1. DeepSeek-R1 distinguishes itself through its mixture of experts (MoE) architecture, reinforcement learning techniques, and focus on reasoning capabilities, enabling it to perform text-based tasks with efficiency and accuracy. It has 671 billion parameters but activates only 37 billion per request, reducing computational costs. DeepSeek R1 also distills its advanced reasoning capabilities into smaller, more accessible open-source models such as Llama and Qwen, fine-tuning them on reasoning data generated by the main DeepSeek R1 model.
In this tutorial, we will build a Retrieval Augmented Generation (RAG) system using the DeepSeek-R1-Distill-Qwen-1.5B model. This distilled DeepSeek-R1 model was created by fine-tuning a Qwen 1.5B base model on data generated by DeepSeek-R1.
DeepSeek-R1 and DeepSeek-R1-Zero are first-generation reasoning models. DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It demonstrates remarkable reasoning capabilities and develops powerful and interesting reasoning behaviors through RL alone. This approach marks a step toward improving language model reasoning capabilities using pure RL. However, DeepSeek-R1-Zero faces challenges such as poor readability and language mixing.
DeepSeek-R1 overcomes the limitations of DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning, providing a strong foundation for reasoning and non-reasoning tasks.
DeepSeek-R1 stands out with its advanced architecture and enhanced efficiency, pushing the boundaries of AI performance. This model introduces key innovations that set it apart from its predecessors and competitors.
Differentiating features of the DeepSeek R1 model:
DeepSeek-R1’s innovative use of reinforcement learning (RL) signifies a radical shift from traditional AI training methods, which typically depend on massive labeled datasets. Unlike supervised learning, RL allows models to learn through interaction and feedback, significantly reducing reliance on large datasets and mitigating ethical concerns related to data privacy and bias.
GRPO, or Group Relative Policy Optimization, represents a reinforcement learning approach designed to enhance the reasoning prowess of Large Language Models (LLMs). First presented in the DeepSeekMath publication concerning mathematical reasoning, GRPO innovates upon traditional Proximal Policy Optimization (PPO) by dispensing with a value function model.
GRPO’s methodology, applicable with both rule/binary-based rewards and general reward models, refines models with respect to their helpfulness. The process unfolds roughly as follows (a minimal sketch of the advantage step is shown after this list):
1. For each prompt, the current policy samples a group of candidate completions.
2. Each completion is scored with a rule-based reward or a reward model.
3. Each completion’s advantage is computed relative to its group by normalizing its reward with the group’s mean and standard deviation, removing the need for a separate value (critic) model.
4. The policy is updated with a clipped, PPO-style objective regularized by a KL penalty toward a reference model.
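The snippet below is a minimal sketch of that group-relative advantage computation, assuming one scalar reward per sampled completion; it is purely illustrative and not DeepSeek’s actual training code.

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # GRPO replaces PPO's learned value function with a group statistic:
    # each reward is normalized by the mean and standard deviation of the
    # rewards of all completions sampled for the same prompt.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: four completions for one prompt, scored by a binary rule-based reward
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # positive for above-average completions, negative otherwise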
DeepSeek R1 has demonstrated impressive performance on several reasoning benchmarks, including MATH-500, AIME 2024, and SWE-bench Verified.
To adapt DeepSeek R1’s advanced reasoning abilities for use in more compact language models, the creators compiled a dataset of 800,000 examples generated by DeepSeek R1 itself. These examples were then used to fine-tune existing models such as Qwen and Llama. The results demonstrated that this relatively simple knowledge distillation method effectively transferred R1’s sophisticated reasoning capabilities to these other models. Remarkably, this transfer was achieved without any further reinforcement learning, highlighting the quality and instructional power of the original DeepSeek R1’s outputs.
We will be building a RAG system based on the DeepSeek-R1-Distill-Qwen-1.5B model on Google Colab with a T4 GPU.
Install all necessary libraries to set up the RAG system on Google Colab.
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf
!pip install -U langchain-huggingface
!pip install -q langchain langchain-community
Load essential Python libraries for document processing, embedding storage, retrieval, and model interaction.
# Document loading, chunking, embeddings, and vector store
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Model loading and text generation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Prompting and chain composition
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFacePipeline
Use a PDF file as the knowledge source for the RAG system by extracting its text.
We have used this PDF for creating the RAG system.
# Load content from local PDFs
loader = PyPDFLoader("./Coffee.pdf")
docs = loader.load()
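As a quick, optional sanity check, you can confirm how many pages were loaded from the PDF:

# Each element of docs corresponds to one page of the PDF
print(f"Loaded {len(docs)} pages")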
Split the document into smaller chunks and store their vector embeddings in a FAISS database.
# Split pages into overlapping chunks so each fits comfortably in the embedding model
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)

# Embed the chunks with a BGE sentence-embedding model and index them in FAISS
db = FAISS.from_documents(
    chunked_docs,
    HuggingFaceEmbeddings(model_name='BAAI/bge-base-en-v1.5')
)
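Optionally, you can confirm how many vectors landed in the FAISS index:

# ntotal is the number of embedded chunks stored in the FAISS index
print(f"FAISS index contains {db.index.ntotal} vectors")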
Create a retriever to fetch relevant document chunks based on similarity search.
retriever = db.as_retriever(
    search_type="similarity",   # standard similarity search (as opposed to e.g. MMR)
    search_kwargs={'k': 3}      # return the 3 most relevant chunks per query
)
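You can also try the retriever on its own to see which chunks it returns for a query; this is an optional inspection step and not required for the chain:

# The retriever returns a list of Document objects
for doc in retriever.invoke("polysaccharides in coffee beans"):
    print(doc.page_content[:200], "\n---")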
Load the DeepSeek-R1-Distill-Qwen-1.5B model and its tokenizer for text generation.
# Load the distilled model and its tokenizer from the Hugging Face Hub
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
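As an optional smoke test, you can generate a short completion directly with the model before building the pipeline:

# Ask the raw model a short question to confirm it loads and generates
inputs = tokenizer("What is retrieval augmented generation?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))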
Set up the retrieval-augmented generation (RAG) pipeline using the model and a custom prompt template.
# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,           # low temperature for more focused, factual answers
    do_sample=True,
    repetition_penalty=1.1,    # discourage repeated phrases
    return_full_text=False,    # return only the generated answer, not the prompt
    max_new_tokens=500,
)
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
# Prompt template to match desired output format
prompt_template = """
You are an academic researcher who is doing research on Chemical Sciences. Use the following context to answer the question using information provided by the paper:
{context}
Question: {question}
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)
llm_chain = prompt | llm | StrOutputParser()
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)
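One optional refinement: by default the retriever hands the prompt a list of Document objects, which is stringified as-is. If you prefer a cleaner context block, you can join the chunk texts yourself with a small helper (format_docs is our own name, not part of LangChain):

def format_docs(docs):
    # Concatenate the text of the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | llm_chain
)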
Ask a question related to the document and use the RAG pipeline to generate an answer.
question = "Which coffee by-products can lead to reduction of intestinal pH? "
# Invoke the chain to generate answers
result = rag_chain.invoke(question)
# Display the output
print
Based on the given documents, what conclusion can you draw? The options are: A) Melanoidins B) Chlorogenic acids C) Osmolytes D) Carbohydrates I need to choose the correct option. Okay, so I'm trying to figure out this chemistry question about coffee by-products and how they affect the pH of the intestine. Let me start by understanding the question. The question asks: Which coffee by-products can lead to a reduction in the intestinal pH? The options are A) Melanoidins, B) Chlorogenic acids, C) Osmolytes, D) Carbohydrates. Looking at the documents provided, each one seems to discuss different aspects related to coffee by-products and their potential roles in the gut microbiota. Since all three documents are about coffee by-products, I'll focus on those. First, let's recall some basic concepts. Intestinal pH refers to the acidity or basicity of the soil around the digestive system. A lower pH means more acidic, while a higher pH means more alkaline. In the gut microbiota, bacteria often live in environments that are either acidic or basic. For example, some bacteria thrive in acidic conditions, others in neutral, and some in alkaline. Now, looking at the documents: 1. The first document talks about the effects of certain coffee products on gut microbiota but doesn't directly mention pH changes. It focuses more on the impact on the microbiome rather than the chemical properties of the by-products. 2. The second and third documents seem to delve deeper into specific by-products. They mention melanoidins and chlorogenic acids. Also, there's a discussion about probiotics and gut health. Let me break down the key points from these documents. Starting with melanoidins: These are pigments produced by coffee beans. They are known to have anti-inflammatory properties. From what I remember, melanoidins can act as cofactors in various biochemical processes. One study I've heard about suggests that melanoidins might influence the activity of enzymes involved in the gut microbiome. Specifically, they could help maintain the balance of certain microbial species. If melanoidins are present, maybe they contribute to keeping the gut environment more balanced, possibly affecting pH levels. Chlorogenic acids: These are another type of pigment produced by coffee beans. They're similar to melanoidins but have slightly different structures. Chlorogenic acids are also known for their antioxidant properties.
As observed from the output above, the answer is enriched with elaborate reasoning, since we used the DeepSeek-R1 distilled model (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
Let’s now see what the output would have been with the original Qwen 1.5B model. We can simply replace the model name with “Qwen/Qwen2.5-1.5B” and re-run the code.
Answer:
melonoidins
As seen from the output of the original Qwen 1.5B model, it lacks the reasoning and human-like text that we got from the DeepSeek-R1-Distill-Qwen-1.5B model. Also, “Chlorogenic acids” is not mentioned in the output from the original model.
question = "What are three main polysaccharides found in non-defective coffee beans?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)
# Display the output
print(result)
Output
Based on the provided context, select all correct options from A to D. To solve this, I need to look for the relevant information about polysaccharides in non-defective coffee beans. First, I'll go through each document's page content to find mentions of polysaccharides like arabinogalactan, mannan, etc. Looking at the first document, it lists arabinogalactan, mannan, and cellulose as the main polysaccharides. So that's one set. The second document also mentions arabinogalactan, mannan, and cellulose. It further notes that xylan is predominant, but that's more about the byproduct, so maybe not directly related to the main ones. Third document again lists arabinogalactan, mannan, and cellulose. It talks about pectins and xylan, which might be byproducts. So, putting it together, the main polysaccharides are arabinogalactan, mannan, and cellulose. Therefore, the correct options should include these three. </think> The three main polysaccharides found in non-defective coffee beans are arabinogalactan, mannan, and cellulose. Answer: A, B, C
As observed from the output above, the answer is enriched with detailed reasoning and human-like text, even with the small 1.5-billion-parameter DeepSeek-R1 distilled model (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
DeepSeek-R1 is a major leap in language model reasoning. It combines large-scale reinforcement learning (RL) with cold-start data to achieve strong performance on benchmarks. The model features a mixture-of-experts architecture and advanced training methods such as Group Relative Policy Optimization (GRPO). These innovations improve efficiency, scalability, and cost-effectiveness. DeepSeek-R1 also excels at distilling complex reasoning into smaller models, making high-performing AI development more accessible. Using RAG with distilled models like DeepSeek-R1-Distill-Qwen-1.5B boosts efficiency and reasoning in smaller architectures while reducing costs and increasing speed, enabling faster, more resource-efficient deployments for developers.
Q. How does DeepSeek-R1 differ from DeepSeek-R1-Zero?
A. DeepSeek-R1 improves upon DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning (RL), which enhances its reasoning capabilities and reduces challenges like poor readability and language mixing that were present in DeepSeek-R1-Zero.

Q. How does DeepSeek-R1 use reinforcement learning?
A. DeepSeek-R1 employs large-scale RL to refine its reasoning abilities. Unlike traditional models that rely solely on supervised fine-tuning, RL allows the model to learn through interaction, feedback, and self-evolution, improving its performance over time. It also uses rewards for accurate predictions and well-structured responses.

Q. What does the mixture of experts (MoE) architecture contribute?
A. The MoE architecture in DeepSeek-R1 allows it to activate only a subset of its 671 billion parameters (37 billion per request), significantly improving computational efficiency and reducing costs, which makes it a more resource-effective solution than standard transformer-based models.

Q. How does DeepSeek-R1 perform on benchmarks?
A. DeepSeek-R1 consistently outperforms competitors, achieving top scores in benchmarks like MATH-500, AIME 2024, and SWE-bench Verified. It has been shown to surpass OpenAI’s o1 model in tasks like mathematical reasoning and software engineering problem-solving.

Q. What is knowledge distillation in DeepSeek-R1?
A. Knowledge distillation in DeepSeek-R1 refers to transferring its advanced reasoning abilities to smaller models like Qwen and Llama. By using a dataset of 800,000 examples generated by DeepSeek R1, the distilled models successfully adopt its sophisticated reasoning capabilities without needing additional reinforcement learning.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.