DeepSeek V3, developed by the Chinese AI research lab DeepSeek (backed by High-Flyer), has been a standout in the AI landscape since its initial open-source release in December 2024. Known for its efficiency, performance, and accessibility, it continues to evolve rapidly. The latest update, tagged “DeepSeek V3 0324,” was rolled out on March 24, 2025, bringing subtle yet impactful refinements. Let’s look at these updates and try the new DeepSeek V3 model.
Enhanced Model Performance
The latest version, DeepSeek-V3-0324, shows substantial improvements in reasoning and benchmark performance, indicating stronger problem-solving and knowledge retention capabilities compared to the previous V3 model.
Improved Front-End Web Development
Upgraded Chinese Writing Proficiency
Feature Enhancements
Unchanged Core Infrastructure
DeepSeek V3 on Chatbot Arena leaderboard:
DeepSeek’s new V3 scored 55% on a tough test (aider’s polyglot benchmark), a big jump over its previous version. Right now, it’s the second-best AI that doesn’t focus on deep thinking/reasoning, just behind Sonnet 3.7, and it performs close to reasoning models like R1 and o3-mini.
DeepSeek V3-0324 marks the first time an open weights model has been the leading non-reasoning model.
Also Read: DeepSeek V3-0324 vs Claude 3.7: Which is the Better Coder?
I am going to use the updated DeepSeek model both locally and via the API.
Installation Steps
Here’s what you need to run it on your machine (assuming you’re using the llm CLI with the mlx backend):
!pip install llm
!llm install llm-mlx
!llm mlx download-model mlx-community/DeepSeek-V3-0324-4bit
This will:
- Install the llm CLI
- Add the llm-mlx plugin
- Download the 4-bit quantized model (DeepSeek-V3-0324-4bit), which is more memory-efficient
Run a Chat Prompt Locally
Example:
!llm chat -m mlx-community/DeepSeek-V3-0324-4bit 'Generate an SVG of a pelican riding a bicycle'
Output:
If the model runs successfully, it should respond with an SVG snippet of a pelican on a bike – goofy and glorious.
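Model output can vary from run to run, so before saving the reply to a .svg file it’s worth a quick sanity check that the snippet is well-formed XML with an svg root. A minimal sketch (the sample string below is a stand-in for a real model response):

```python
# Sanity-check a model reply that should contain an SVG snippet.
# The sample string is a stand-in for the actual model output.
import xml.etree.ElementTree as ET

def is_valid_svg(snippet: str) -> bool:
    """Return True if the snippet parses as XML and has an <svg> root."""
    try:
        root = ET.fromstring(snippet)
    except ET.ParseError:
        return False
    # Namespaced tags look like '{http://www.w3.org/2000/svg}svg'
    return root.tag.split("}")[-1] == "svg"

sample = ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
          '<circle cx="50" cy="50" r="40"/></svg>')
print(is_valid_svg(sample))             # True
print(is_valid_svg("<svg><unclosed>"))  # False: malformed XML
```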
Install Required Package
!pip3 install openai
Yes, even though you’re using DeepSeek, you interface with it through OpenAI-compatible SDK syntax.
Python Script for API Interaction
Here’s a cleaned-up, annotated version of what’s happening in the script:
from openai import OpenAI
import time

# Timing setup
start_time = time.time()

# Initialize client with your DeepSeek API key and base URL
client = OpenAI(
    api_key="Your_api_key",
    base_url="https://api.deepseek.com"  # This is important
)

# Send a streaming chat request
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How many r's are there in Strawberry"},
    ],
    stream=True
)

# Handle streamed response and collect metrics
prompt_tokens = 0
generated_tokens = 0
full_response = ""

for chunk in response:
    if hasattr(chunk, "usage") and hasattr(chunk.usage, "prompt_tokens"):
        prompt_tokens = chunk.usage.prompt_tokens
    if hasattr(chunk, "choices") and hasattr(chunk.choices[0], "delta") and hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            generated_tokens += 1
            full_response += content
            print(content, end="", flush=True)

# Performance tracking
end_time = time.time()
total_time = end_time - start_time

# Token/sec calculations
prompt_tps = prompt_tokens / total_time if prompt_tokens > 0 else 0
generation_tps = generated_tokens / total_time if generated_tokens > 0 else 0

# Output metrics
print("\n\n--- Performance Metrics ---")
print(f"Prompt: {prompt_tokens} tokens, {prompt_tps:.3f} tokens-per-sec")
print(f"Generation: {generated_tokens} tokens, {generation_tps:.3f} tokens-per-sec")
print(f"Total time: {total_time:.2f} seconds")
print(f"Full response length: {len(full_response)} characters")
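One thing to note about the loop above: it counts one “token” per streamed chunk, which is only an approximation, since a chunk may carry more than one model token. The accumulation logic itself can be exercised offline; here is a small sketch using SimpleNamespace objects as stand-ins for the SDK’s chunk objects:

```python
# Offline sketch of the streaming-accumulation logic, with SimpleNamespace
# objects standing in for the SDK's chunk objects.
from types import SimpleNamespace

def make_chunk(text):
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

# A fake stream; one chunk carries an empty delta, as real streams often do
stream = [make_chunk("There are "), make_chunk(None), make_chunk("3 r's.")]

generated_tokens = 0
full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        generated_tokens += 1  # counts chunks, not true model tokens
        full_response += content

print(generated_tokens)  # 2 (the empty delta is skipped)
print(full_response)     # There are 3 r's.
```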
### Final Answer
After carefully examining each letter in "Strawberry," we find that the letter 'r' appears **3 times**.
**Answer:** There are **3 r's** in the word "Strawberry."
--- Performance Metrics ---
Prompt: 17 tokens, 0.709 tokens-per-sec
Generation: 576 tokens, 24.038 tokens-per-sec
Total time: 23.96 seconds
Full response length: 1923 characters
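As a sanity check on the reported throughput, the generation rate is simply tokens divided by wall-clock time. Using the rounded figures printed above (so agreement is only to about one decimal place):

```python
# Reproducing the throughput arithmetic from the metrics above.
# Both inputs are rounded values from the printout, so the result
# only approximates the 24.038 tokens-per-sec reported.
generated_tokens = 576
total_time = 23.96  # seconds, as rounded in the printout

generation_tps = generated_tokens / total_time
print(f"{generation_tps:.1f} tokens-per-sec")  # 24.0 tokens-per-sec
```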
Find the full code and output here.
Next, let’s use DeepSeek-V3-0324 to automatically generate a digital marketing landing page: modern, sleek, and small in scope, built with a prompt-based code generation approach.
!pip3 install openai
# Please install OpenAI SDK first: `pip3 install openai`
from openai import OpenAI
import time

# Record the start time
start_time = time.time()

client = OpenAI(api_key="Your_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a Website Developer"},
        {"role": "user", "content": "Code a modern small digital marketing Landing page"},
    ],
    stream=True  # This line makes the response a stream of events
)

# Initialize variables to track tokens and content
prompt_tokens = 0
generated_tokens = 0
full_response = ""

# Process the stream
for chunk in response:
    # Track prompt tokens (usually only in first chunk)
    if hasattr(chunk, "usage") and hasattr(chunk.usage, "prompt_tokens"):
        prompt_tokens = chunk.usage.prompt_tokens
    # Track generated content
    if hasattr(chunk, "choices") and hasattr(chunk.choices[0], "delta") and hasattr(chunk.choices[0].delta, "content"):
        content = chunk.choices[0].delta.content
        if content:
            generated_tokens += 1
            full_response += content
            print(content, end="", flush=True)

# Calculate timing metrics
end_time = time.time()
total_time = end_time - start_time

# Calculate tokens per second
if prompt_tokens > 0:
    prompt_tps = prompt_tokens / total_time
else:
    prompt_tps = 0
if generated_tokens > 0:
    generation_tps = generated_tokens / total_time
else:
    generation_tps = 0

# Print performance metrics
print("\n\n--- Performance Metrics ---")
print(f"Prompt: {prompt_tokens} tokens, {prompt_tps:.3f} tokens-per-sec")
print(f"Generation: {generated_tokens} tokens, {generation_tps:.3f} tokens-per-sec")
print(f"Total time: {total_time:.2f} seconds")
print(f"Full response length: {len(full_response)} characters")
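Chat models often wrap generated code in Markdown code fences, so before saving full_response to an .html file you may want to strip them. A small, purely illustrative helper (it assumes the typical single-fenced-block reply shape):

```python
# Hypothetical helper: strip a Markdown code fence from a model reply so
# the payload can be written straight to an .html file.
import re

FENCE = "`" * 3  # triple backtick, built indirectly to keep this snippet clean

def extract_code(reply: str) -> str:
    """Return the body of the first fenced code block, or the reply unchanged."""
    match = re.search(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

reply = f"Here is the page:\n{FENCE}html\n<!DOCTYPE html>\n<html></html>\n{FENCE}"
print(extract_code(reply))  # the bare HTML, without the fence
```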
Output:
The page is for a digital marketing agency called “NexaGrowth”. It uses a modern, clean design with a carefully chosen color palette. The layout is responsive and uses contemporary web design techniques, with the navigation fixed at the top of the page. The hero section is designed to immediately capture attention with a large headline and call-to-action buttons.
You can view the website here.
Find the full code and output here.
Also Read: DeepSeek V3 Frontier LLM, Trained on a $6M Budget, for a quick recap of the V3 baseline before the March 24 update.
The DeepSeek V3 0324 update might seem small, but it brings big improvements. It’s faster now, handling tasks like math and coding quickly. It’s also very consistent, giving good results every time, whether you’re coding or solving problems. Plus, it can write around 700 lines of code without breaking down, which is great for people who build things with code. It still uses the 671B-parameter setup and stays cheap to use. Try the new DeepSeek V3 0324 and tell me what you think in the comments!
Stay tuned to Analytics Vidhya Blog for more such content!