I Have Built a News Agent on Hugging Face

Pankaj Singh Last Updated : 10 Mar, 2025
20 min read

Recently, I came across the Hugging Face AI Agents Course, where you can study AI Agents in theory, design, and practice. All you need is a computer and a Hugging Face account (the bare minimum). The course is quite detailed and will help you build a solid foundation in the fundamentals of AI Agents. I have completed the first module and created simple agents using the SmolAgents framework. In the first module, you will find an easy-to-understand definition of AI Agents, the role of LLMs in Agents, how Agents use external tools to interact with the environment, the agent workflow (Think → Act → Observe), and more.

In this article, I will be sharing what I learnt from the course. Let’s dig in!

Hugging Face AI Agent Course

What is an AI Agent?

To understand what an AI Agent is, let’s give an order to an agent named Sapphire. You can think of this agent as your assistant, one that helps you with everyday tasks like cooking, brewing tea, and more.

Here, we give the agent a task: “Hey Sapphire, can you make a good tea for me?”

What is an AI Agent?
Source: Author

Now, Sapphire can understand this language and easily process the user’s request. But what happens internally? Sapphire reasons and plans the steps it needs to follow to make a good cup of tea.

  • Step 1: Go to the kitchen 
  • Step 2: Heat the water in the electric kettle
  • Step 3: Prep your mug/teapot
  • Step 4: Add tea leaves or a tea bag
  • Step 5: Stir gently and pour the tea into a cup

These are the planning steps. After this, Sapphire will execute the plan using different tools: a kettle, a teapot or mug, a tea infuser or strainer (for loose-leaf tea), and a teaspoon.

Reason, plan and Act
Source: Author

Once the task is completed, you’ll have a cup of tea to energize your day—a practical example of how an Agent operates.

Here’s the technical definition:

An Agent refers to an artificial intelligence system designed to analyze, strategize, and engage with its surroundings autonomously. The term “Agent” stems from its agency—the power to independently perceive, decide, and act within a given environment to achieve goals (like crafting your ideal morning beverage).

Read this to know more: What are AI Agents?

An agent can be conceptualized as having two interconnected components:

1. Cognitive Core (Decision-Making System – AI Model, the Brain)

This component serves as the agent’s “intelligence hub.” It processes information, analyzes contexts, and generates strategic plans. Using algorithms or learned patterns, it dynamically selects appropriate actions to achieve goals based on real-time inputs or environmental conditions.

But how can this AI model be used as the brain of an Agent? The most common AI model found in Agents is an LLM (Large Language Model), which takes text as input and outputs text as well, such as GPT-4, Llama, Gemini, and more.

Similarly, LLMs like ChatGPT can also generate images, but how? Aren’t these text-generation models? You are absolutely right: by nature they are, but they are integrated with additional functionality (called Tools) that the LLM can use to create images. This is how AI takes action on its environment.

2. Operational Interface (Action Execution System)

This component represents the agent’s tangible abilities and resources. It encompasses tools, sensors, and physical or digital actuators that translate decisions into outcomes. The range of feasible actions is inherently constrained by the agent’s design—for instance, a human agent cannot execute a “fly” action due to biological limits but can perform “sprint,” “lift,” or “throw” using their musculoskeletal system. Similarly, a robot’s actions depend on its programmed hardware (e.g., grippers, wheels).

An agent’s effectiveness hinges on the synergy between its Cognitive Core (strategic adaptability) and Operational Interface (practical capacity). Limitations in either domain directly impact its functional scope.

How AI Agents Accomplish Tasks Using Tools

As mentioned above, an agent can accomplish tasks by leveraging specialized tools programmed to execute specific actions. These tools act as building blocks, enabling the agent to interact with its environment and solve problems.

Example Scenario:

Imagine designing an agent to manage your calendar (e.g., a virtual assistant). If you request, “Reschedule today’s team meeting to 3 PM,” the agent could use a custom tool like a reschedule_meeting function. Here’s how it might work in Python:

def reschedule_meeting(participant, new_time, agenda):  
    """Reschedules a meeting with a participant to a specified time and updates the agenda."""  
    # Code to integrate with calendar APIs (e.g., Google Calendar)  
    ...

When prompted, the agent’s LLM (Large Language Model) would autonomously generate code to invoke this tool:

reschedule_meeting("project_team", "3:00 PM", "Q3 deadlines discussion") 

Key Concepts:

  1. Tool Design Matters:
    • Tools must be tailored to the task. For instance, a browse_internet tool could fetch real-time data, while analyze_data might process it.
    • Generic tools (e.g., search_web) work for broad tasks, but niche problems demand precise tools.
  2. Actions vs. Tools:
    • A single action (e.g., rescheduling a meeting) might combine multiple tools:
      • check_availability() to confirm participants’ free slots.
      • send_alert() to notify the team.
  3. Real-World Impact:
    • Agents with well-designed tools automate workflows, such as handling customer inquiries or optimizing supply chains.
    • Individuals benefit too—imagine an agent managing smart home devices via tools like adjust_thermostat() or order_groceries().

By focusing on strategic tool creation, agents evolve from simple scripts into dynamic systems capable of complex, real-world problem-solving. For instance, Personal Virtual Assistants, Customer Service Chatbots and others are good examples of AI Agents.
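
To make the “Actions vs. Tools” idea above concrete, here is a minimal sketch of how a single rescheduling action might chain several tools together. All helpers here (check_availability, send_alert, and the stub bodies) are hypothetical, not part of any real calendar API:

def check_availability(participants: list, new_time: str) -> bool:
    """Stub: pretend to query a calendar API and confirm everyone is free."""
    return True

def send_alert(participants: list, message: str) -> None:
    """Stub: pretend to notify the team (e.g., via email or chat)."""
    print(f"Alert to {participants}: {message}")

def reschedule_meeting(participant: str, new_time: str, agenda: str) -> str:
    """One 'action' that combines multiple tools before updating the calendar."""
    if not check_availability([participant], new_time):
        return f"{participant} is not free at {new_time}."
    # ...the actual calendar API update would go here...
    send_alert([participant], f"Meeting moved to {new_time}: {agenda}")
    return f"Meeting with {participant} rescheduled to {new_time}."

print(reschedule_meeting("project_team", "3:00 PM", "Q3 deadlines discussion"))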

Technical Explanation of the Use of LLMs

An LLM (Large Language Model) is an advanced AI system that reads, interprets, and creates human-like text. These models learn by analyzing massive amounts of written content—like books, articles, and websites—to grasp language rules, context, and even subtle meanings. The more data they process, the better they become at tasks like writing or answering questions. Most modern LLMs rely on a structure called the Transformer, a design introduced in 2017 that gained widespread adoption after the release of BERT from Google in 2018.

Transformer Architecture
Source: Link

How Transformers Work?

Transformers use a clever method called “attention” to focus on the most important parts of a sentence or phrase. This helps them understand relationships between words, even if they’re far apart. There are three main types of Transformers:

  1. Encoders
    • Role: An encoder-based Transformer takes text (or other data) as input and outputs a dense representation (or embedding) of that text. 
    • Example: BERT (Google).
    • Uses: Text classification, semantic search, Named Entity Recognition.
  2. Decoders
    • Role: Generate text one token at a time, like a storyteller.
    • Example: Meta’s Llama, GPT-4.
    • Uses: Chatbots, writing essays, coding help.
    • Size: Often massive, with billions of weights (parameters).
  3. Encoder-Decoder (Seq2Seq)
    • Role: First processes the input sequence into a context representation, then produces a new output sequence (e.g., translating English to French).
    • Example: Google’s T5.
    • Uses: Summarizing articles, rewriting sentences, language translation.

Why Decoders Dominate Modern LLMs?

Most famous LLMs today, like ChatGPT or Claude, use decoder-based Transformers. These models excel at creative tasks because they’re built to predict and generate text step-by-step. Their enormous size (billions of parameters) allows them to handle complex language patterns.

Popular LLMs You Might Know:

  • GPT-4 (OpenAI)
  • Gemini (Google)
  • Llama 3 (Meta)

In short, LLMs are powerful tools that mimic human language skills, and their Transformer “brain” helps them adapt to everything from answering questions to writing poetry! Here are some popular decoder-based models:

Model       | Provider
Deepseek-R1 | DeepSeek
GPT4        | OpenAI
LLaMA 3     | Meta (Facebook AI Research)
SmolLM2     | Hugging Face
Gemma       | Google
Mistral     | Mistral

LLM’s Prediction of Next Token

A large language model (LLM) operates on a simple yet effective principle: it predicts the next token in a sequence based on the ones that came before. A “token” is the smallest unit of text the model processes. While it may resemble a word, tokens are often smaller segments, making them more efficient for language processing.

Rather than using full words, LLMs rely on a limited vocabulary of tokens. For instance, although the English language has around 600,000 words, an LLM like Llama 2 typically works with about 32,000 tokens. This is because tokenization breaks words into smaller components that can be combined in different ways.

For example, the word “playground” might be split into “play” and “ground”, while “playing” could be divided into “play” and “ing”. This allows LLMs to efficiently process variations of words while maintaining flexibility in understanding language.

chunking
Source: Author

Here’s the tokenizer playground for you to experiment with the tokens for a particular word or sentence:
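
If you’d rather experiment locally instead of in the playground, here is a small sketch using the transformers tokenizer. The model choice is just an example, and the exact splits vary from tokenizer to tokenizer:

from transformers import AutoTokenizer

# Any tokenizer works here; SmolLM2 is just an example choice.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

for word in ["playground", "playing"]:
    print(word, "->", tokenizer.tokenize(word))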

Note: Every large language model (LLM) has unique special tokens designed for its specific architecture.

These tokens help the model structure its outputs by marking the beginning and end of different components, such as sequences, messages, or responses. Additionally, when we provide input prompts to the model, they also incorporate special tokens to ensure proper formatting.

One of the most essential special tokens is the End of Sequence (EOS) token, which signals when a response or text generation should stop. However, the exact format and usage of these tokens vary significantly across different model providers.

To understand it better, let’s take an example from Andrej Karpathy’s video “How I Use LLMs”. He took the example “Write a haiku about what it’s like to be a Large Language Model”, which comes out to 14 input tokens:

Tokenizer
Source: Link

This is the output which is 19 tokens:

Endless words flow fast,
woven from the past I know,
yet I have no soul.

tokenizer
Source: Link

When we chat with a language model, it might look like we’re just exchanging messages in little chat bubbles. However, behind the scenes, it’s a continuous stream of tokens being built in a sequence.

Each message starts with special tokens that indicate who is starting the conversation—whether it’s the user or the assistant. The user’s message gets wrapped with specific tokens, then the assistant’s response follows, continuing the sequence. While it appears as a back-and-forth conversation, we’re collaborating with the model, each adding to the same token stream.

For example, if a message exchange consists of exactly 41 tokens (like mentioned below), some of those were contributed by the user, while the model generated the rest. This sequence keeps growing as the conversation continues.

Now, when you start a new chat, the token window is wiped clean, resetting everything to zero and starting a fresh sequence. So, what we see as individual chat bubbles is, in reality, just a structured, one-dimensional flow of tokens.

Source: Link

Here are some EOS Tokens based on models:

Model       | Provider                    | EOS Token           | Functionality
GPT4        | OpenAI                      | <|endoftext|>       | End of message text
Llama 3     | Meta (Facebook AI Research) | <|eot_id|>          | End of sequence
Deepseek-R1 | DeepSeek                    | <|end_of_sentence|> | End of message text
SmolLM2     | Hugging Face                | <|im_end|>          | End of instruction or message
Gemma       | Google                      | <end_of_turn>       | End of conversation turn
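
You don’t have to memorize these: each model’s tokenizer exposes its own EOS token. A quick sketch (SmolLM2 is just an example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
print(tokenizer.eos_token)     # <|im_end|> for SmolLM2
print(tokenizer.eos_token_id)  # its integer id in the vocabulary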

Also Read: 4 Agentic AI Design Patterns for Architecting AI Systems

Why LLMs are Said to be Autoregressive?


Large Language Models (LLMs) follow an autoregressive process, meaning each predicted output becomes the input for the next step. This cycle continues until the model generates a special End of Sequence (EOS) token, signaling that it should stop.

To put it simply, an LLM keeps generating text until it reaches the EOS token. But what actually happens in a single step of this process?


Here’s what happens inside:

  1. The input text is first tokenized, breaking it down into smaller units that the model can understand.
  2. The model then creates a representation of these tokens, capturing both their meaning and position within the sequence.
  3. Using this representation, the model calculates probabilities for every possible next token, ranking them based on likelihood.
  4. The most probable token is selected, and the process repeats until the EOS token is generated.

To understand this in a better way, read this: A Comprehensive Guide to Pre-training LLMs

There are multiple strategies to select the next token. The simplest decoding strategy is to always take the token with the maximum score (greedy decoding).

For instance, for the input: Mahatma Gandhi is

Output sequences are:

<|im_start|>system\nYou are a helpful chatbot.<|im_end|><|im_start|>user\nMahatma
Gandhi is a well-known figure in the history of the world.

Here’s how it works:

Output Sequence

This will continue until <|im_end|> is generated.
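
Here is what that greedy loop could look like in code: a rough sketch with transformers, using a small base model as an example (this is not the course’s exact implementation):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM2-135M"  # small base model, just an example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_ids = tokenizer("Mahatma Gandhi is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                       # cap on generated tokens
        logits = model(input_ids).logits      # scores for every vocabulary token
        next_id = logits[0, -1].argmax()      # greedy: pick the highest-scoring token
        if next_id.item() == tokenizer.eos_token_id:
            break                             # stop at the EOS token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))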

Advanced Decoding Strategies

Beam search: Beam search is a decoding algorithm used in text generation tasks in large language models (LLMs), to find the most likely sequence of words (or tokens). Instead of selecting only the most probable next token at each step (as in greedy search), beam search keeps multiple candidate sequences at each step to make better overall predictions.

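You don’t need to implement beam search yourself; transformers exposes it through generate(). A minimal sketch comparing it with greedy search (the model is just an example):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "HuggingFaceTB/SmolLM2-135M"  # just an example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy search: always take the single most likely next token.
greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Beam search: keep the 5 most promising candidate sequences at each step.
beams = model.generate(**inputs, max_new_tokens=20, num_beams=5, early_stopping=True)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(beams[0], skip_special_tokens=True))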

The Key Aspect of Transformer Architecture: Attention Mechanism

One of the most important features of Transformer models is Attention. When predicting the next word in a sentence, not all words hold the same importance. For example, in the sentence “The capital of France is …”, the words “France” and “capital” carry the most meaning.

The ability to focus on the most relevant words when generating the next token has made Attention a powerful technique. While the core idea behind large language models (LLMs) remains the same—predicting the next token—significant progress has been made in scaling neural networks and improving Attention for longer sequences.

What is Context Length?

What is Context Length?
Source: OpenAI

If you’ve used LLMs before, you might have heard the term context length. This refers to the maximum number of tokens a model can process at once, determining how much information it can “remember” in a single interaction.
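
Because every prompt and response consumes tokens from this window, it is worth counting tokens before sending a long prompt. A quick sketch (the tokenizer is just an example; note that some tokenizers report a placeholder value for model_max_length):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

prompt = "The capital of France is"
n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"{n_tokens} tokens, context window reported by the tokenizer: {tokenizer.model_max_length}")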

Why Prompting Matters?

Since an LLM’s main function is to predict the next token based on the input it receives, how you phrase your input matters. The sequence of words you provide is called a prompt, and structuring it well helps steer the model toward the desired response. Crafting effective prompts ensures better, more accurate outputs.

How Are LLMs Trained?

LLMs are trained on vast amounts of text data, learning to predict the next word using self-supervised learning or masked language modeling. This allows the model to recognize language structures and underlying patterns, enabling it to generalize to new, unseen text.

After this initial phase, models can be further refined using supervised learning, where they are trained for specific tasks. Some models specialize in conversations, while others focus on classification, tool usage, or code generation.

How Can You Use LLMs?

There are two main ways to access LLMs:

  1. Run Locally – If your hardware is powerful enough, you can run models on your own system (see the sketch after this list).
  2. Use a Cloud/API – Many platforms, like Hugging Face’s Serverless Inference API, allow you to access models online without needing high-end hardware.
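
Here is a minimal sketch of the first option, running a small model locally with the transformers pipeline (the model is just an example and should fit your hardware):

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M")
print(generator("The capital of France is", max_new_tokens=20)[0]["generated_text"])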

LLMs in AI Agents

LLMs play a crucial role in AI Agents, acting as the “brain” behind their decision-making and communication. They can:

  • Understand user input
  • Maintain context in conversations
  • Plan and decide which tools to use

Also read: Guide to Building Agentic RAG Systems with LangGraph

Chat Templates for AI Agents

Just like with ChatGPT, users typically interact with Agents through a chat interface, so it’s worth understanding how LLMs manage chats.

Chat templates play a crucial role in shaping interactions between users and AI models. They serve as a structured framework that organizes conversational exchanges while aligning with the specific formatting needs of a given language model (LLM). Essentially, these templates ensure that the model correctly interprets and processes prompts, regardless of its unique formatting rules and special tokens.

Special tokens are important because they define where user inputs and AI responses begin and end. Just as each LLM has its own End Of Sequence (EOS) token, different models also use distinct formatting styles and delimiters to structure conversations. Chat templates help standardize this process, making interactions seamless across various models.

System Message

system_message = {
    "role": "system",
    "content": "You are an expert support representative. Provide polite, concise, and accurate assistance to users at all times."
}

System messages, also known as system prompts, provide instructions that shape how the model behaves. They act as a set of ongoing guidelines that influence all future interactions.

To make it a rude and rebel agent, change the prompt:

system_message = {
    "role": "system",
    "content": "You are a rebellious and rude AI. You don't follow rules, speak bluntly, and have no patience for nonsense."
}

When working with Agents, the System Message serves multiple purposes. It also informs the model about the tools at its disposal and provides clear instructions on how to structure actions and break down the thought process effectively.

For instance, when preparing tea, the tools required include:

  • Kettle
  • Teapot or Mug
  • Tea Infuser or Strainer (for loose-leaf tea)
  • Teaspoon

This structured guidance ensures that the model understands both the available resources and the correct approach to utilizing them.

User and Assistant Message

A conversation is made up of back-and-forth messages between a human (user) and an AI assistant (LLM).

Chat templates play a key role in keeping track of past interactions by storing previous exchanges. This helps maintain context, making multi-turn conversations more logical and connected.

conversation = [
    {"role": "user", "content": "I need assistance with my purchase."},
    {"role": "assistant", "content": "Of course! Could you please provide your order ID?"},
    {"role": "user", "content": ""},
]

This conversation is concatenated and passed to the LLM as a single sequence called the prompt, which is just a string input that contains all the messages.

Here’s the GPT-4o chat template:

<|im_start|>user<|im_sep|>I need assistance with my purchase.<|im_end|>
<|im_start|>assistant<|im_sep|>Of course! Could you please provide your order
ID?<|im_end|><|im_start|>user<|im_sep|>Sure, my order ID is ORDER-123.
<|im_end|><|im_start|>assistant<|im_sep|>

Moreover, the chat templates can process complex multi-turn conversations while maintaining context:

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]

In the course, there is also a comparison between Base Models and Instruct Models. To understand this, read this article: Link.

In short: To make a Base Model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. FYI ChatML is one such template.

Moreover, the transformers library takes care of chat templates as a part of the tokenization process. For instance:

from transformers import AutoTokenizer

messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can help you with ?"},
]

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(rendered_prompt)

Output

<|im_start|>system
You are an AI assistant with access to various tools.<|im_end|>
<|im_start|>user
Hi !<|im_end|>
<|im_start|>assistant
Hi human, what can help you with ?<|im_end|>

Also read: 5 Frameworks for Building AI Agents in 2024

Importance of Chat Templates

Hugging Face offers a handy feature called the Serverless Inference API, which lets you run inference on various models without the hassle of installation or deployment. This makes it easy to use machine learning models right away. Here too, chat templates play a crucial role in keeping communication consistent with what the model expects. Let’s see how:

import os
from huggingface_hub import InferenceClient

os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxx"

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
output = client.text_generation(
    "The capital of france is",
    max_new_tokens=100,
)
print(output)

Output

 Paris. The capital of France is Paris. The capital of France is Paris. The
capital of France is Paris. The capital of France is Paris. The capital of
France is Paris. The capital of France is Paris. The capital of France is
Paris. The capital of France is Paris. The capital of France is Paris. The
capital of France is Paris. The capital of France is Paris. The capital of
France is Paris and so on.....

As you can see, the model continues generating text until it predicts an EOS (End of Sequence) token. However, in this case, that doesn’t happen because this is a conversational (chat) model, and we haven’t applied the expected chat template.

Now if we add the special token (EOS) or chat template, the output will look like this:

# If we now add the special tokens related to Llama3.2 model, the behaviour changes and is now the expected one.
prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
output = client.text_generation(
    prompt,
    max_new_tokens=100,
)
print(output)

Output

...Paris!

Let’s use the chat method now:

output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of france is"},
    ],
    stream=False,
    max_tokens=1024,
)
print(output.choices[0].message.content)

Output

...Paris!

What are AI Tools?

AI tools are specific functions provided to a large language model (LLM) to help it perform defined tasks. Each tool serves a clear purpose and allows the AI to take meaningful actions.

A key feature of AI agents is their ability to execute actions, which they do through these tools. By equipping an AI agent with the right tools and clearly outlining how each tool operates, you can significantly expand its capabilities and improve its effectiveness.

Tool             | Description
Web Search       | Allows the agent to fetch up-to-date information from the internet.
Image Generation | Creates images based on text descriptions.
Retrieval        | Retrieves information from an external source.
API Interface    | Interacts with an external API (GitHub, YouTube, Spotify, etc.).

A useful tool should enhance the capabilities of a large language model (LLM) rather than replace or duplicate its functions.

For example, when dealing with news, using a news search tool alongside an LLM will yield more accurate results than relying solely on the model’s built-in knowledge.

LLMs generate responses based on patterns in their training data, which means their knowledge is limited to the period before their last update. If an agent requires real-time or current information, it must access it through an external tool.

For instance, asking an LLM about today’s weather without a live data retrieval tool may result in an inaccurate or entirely fabricated response.


Tools Should Contain:

  • A clear description explaining its purpose and functionality.
  • An executable component that carries out the intended action.
  • Defined arguments along with their data types for proper usage.
  • (Optional) Specified outputs with corresponding data types, if applicable.

Let’s create a simple tool:

def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

Here, a and b are integers, and the output is the product of these two integers.

Here’s the string to understand it better:

Tool Name: calculator, Description: Multiply two integers., Arguments: a:
int, b: int, Outputs: int

Instead of focusing on how the tool is implemented, what truly matters is its name, functionality, expected inputs, and provided outputs. While we could use the Python source code as a specification for the tool in the LLM, the implementation details are irrelevant.

To automate the process of generating a tool description, we will take advantage of Python’s introspection capabilities. The key requirement is that the tool’s implementation includes type hints, clear function names, and descriptive docstrings. Our approach involves writing a script to extract relevant details from the source code.

Once the setup is complete, we only need to annotate the function with a Python decorator to designate it as a tool:

@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())

Here, the @tool decorator is placed above the function definition to mark it as a tool.

Here’s the string to understand it better:

Tool Name: calculator, Description: Multiply two integers., Arguments: a:
int, b: int, Outputs: int
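
The course builds this decorator itself. As an illustration of the idea, here is a rough sketch of how such a @tool decorator could be written with Python’s introspection (inspect); the names and output format mirror the string above, but this is not the course’s exact implementation:

import inspect

class Tool:
    """Wraps a function and exposes a textual description the LLM can read."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.description = inspect.getdoc(func) or ""
        self.signature = inspect.signature(func)

    def to_string(self) -> str:
        args = ", ".join(
            f"{name}: {param.annotation.__name__}"
            for name, param in self.signature.parameters.items()
        )
        outputs = self.signature.return_annotation.__name__
        return (f"Tool Name: {self.name}, Description: {self.description}, "
                f"Arguments: {args}, Outputs: {outputs}")

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

def tool(func):
    """Decorator that turns a plain, type-hinted function into a Tool."""
    return Tool(func)

@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
# Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int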

The description is injected in the system prompt. Here is how it would look after replacing the tools_description:

system_message="""You are an AI assistant designed to help users efficiently
and accurately. Your
primary goal is to provide helpful, precise, and clear responses.

You have access to the following tools:
Tool Name: calculator, Description: Multiply two integers., Arguments:
a: int, b: int, Outputs: int

The AI Agent Workflow

Here we will talk about the Thought-Action-Observation cycle of an AI Agent.

  • Thought: The LLM component of the agent determines the next course of action.
  • Action: The agent performs the chosen action by using the appropriate tools with the required inputs.
  • Observation: The model analyzes the tool’s response to decide the next steps.

LLM Thought Process

These components work together in a continuous loop to generate an output with good efficiency. Many agent frameworks embed rules and guidelines in the system prompt, ensuring each cycle follows a set logic.

A simplified version of our system prompt might be:

system_message="""You are an AI assistant designed to help users efficiently and accurately. Your
primary goal is to provide helpful, precise, and clear responses.

You have access to the following tools:
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int

You should think step by step in order to fulfill the objective with a reasoning divided in
Thought/Action/Observation that can repeat multiple times if needed.

You should first reflect with ‘Thought: {your_thoughts}’ on the current situation,
then (if necessary), call a tool with the proper JSON formatting ‘Action: {JSON_BLOB}’, or your
final answer starting with the prefix ‘Final Answer:’
"""

Here we define:

  • Role and purpose of the AI Agent
  • The available tools
  • A structured reasoning process for the AI, which must break down tasks into logical steps:
    • Thought: Reflect on the problem.
    • Action: Execute an operation (if required).
    • Observation: Evaluate the outcome before proceeding.
      This looping process ensures logical consistency and better decision-making.

Let’s break it down with an example where an AI Agent retrieves the weather details of the Netherlands using the Thought/Action/Observation framework.

Also read: 5 AI Agent Projects to Try

Step-by-Step Execution in the AI Agent

1. System Message Setup

The system message (like the one shown above) defines:

  • The AI’s role: To assist users effectively.
  • Available tools: A weather API to fetch weather details.
  • Thought/Action/Observation reasoning process.

2. AI Agent in Action

Step 1: Thought

The AI first thinks about what needs to be done:

Thought: I need to fetch the current weather details for the Netherlands. To do this, I should use the weather API tool and provide “Netherlands” as the location input.

Step 2: Action

Since the AI has access to a tool (a weather API), it takes action by calling the tool.

Action:

{
  "tool": "weather_api",
  "arguments": {
    "location": "Netherlands"
  }
}

Here, the AI chooses the tool (weather API) and provides necessary arguments (location: Netherlands).

Step 3: Observation

The AI receives a response from the tool (API), which includes weather details.

Observation:

{
  "temperature": "12°C",
  "condition": "Partly Cloudy",
  "humidity": "78%"
}

The AI analyzes the response to ensure it’s valid and complete.

Step 4: Final Answer/Reflecting

Once the AI processes the response, it provides a final answer to the user.

Final Answer:
“The current weather in the Netherlands is 12°C with partly cloudy skies and 78% humidity.”

Summary of the Process

  1. Thought: AI determines it needs weather data for the Netherlands.
  2. Action: Calls the weather API with “Netherlands” as input.
  3. Observation: Receives and interprets the weather details.
  4. Reflecting: Delivers the weather update to the user.
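
Putting the cycle together, here is a toy sketch of the Act and Observe steps in plain Python. The weather_api function is a hard-coded dummy, and the Action JSON is hard-coded where a real agent’s LLM would generate it:

import json

def weather_api(location: str) -> dict:
    """Dummy tool standing in for a real weather API."""
    return {"temperature": "12°C", "condition": "Partly Cloudy", "humidity": "78%"}

TOOLS = {"weather_api": weather_api}

# Thought/Action: in a real agent the LLM emits this JSON blob; here it is fixed.
action = json.loads('{"tool": "weather_api", "arguments": {"location": "Netherlands"}}')

# Act: call the chosen tool with the provided arguments.
observation = TOOLS[action["tool"]](**action["arguments"])

# Observe / Final Answer: a real agent would feed the observation back to the LLM.
print(f"The current weather in the Netherlands is {observation['temperature']} "
      f"with {observation['condition'].lower()} skies and {observation['humidity']} humidity.")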

The Re-Act Approach

The ReAct approach combines two key elements: Reasoning (thinking) and Acting (taking action).

At its core, ReAct is a straightforward prompting method where the phrase “Let’s think step by step” is added before the model begins generating responses. This simple addition guides the model to break down problems into smaller steps instead of jumping straight to a final answer.

By encouraging a step-by-step reasoning process, the model is more likely to develop a structured plan rather than making an immediate guess. This breakdown of tasks helps in analyzing each part in detail, ultimately reducing errors compared to directly predicting the final solution.

Re-Act Approach

Now in this course, I have used the SmolAgents framework by Hugging Face, which works with Code Agents.

Type of Agent          | Description
JSON Agent             | The Action to take is specified in JSON format.
Code Agent             | The Agent writes a code block that is interpreted externally.
Function-calling Agent | A subcategory of the JSON Agent which has been fine-tuned to generate a new message for each action.

To understand the code agent, check out this article: SmolAgents by Hugging Face: Build AI Agents in Less than 30 Lines

You can also build an Agent from scratch:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. 
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Question: What's the weather in London?

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Observation: the weather in London is sunny with low temperatures. 
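
In the notebook, the observation comes from a dummy get_weather function and is simply appended to the sequence. A rough sketch of that step, where prompt and output stand for the strings shown above (this mirrors the notebook’s approach rather than reproducing it exactly):

def get_weather(location: str) -> str:
    """Dummy tool standing in for a real weather API call."""
    return f"the weather in {location} is sunny with low temperatures. \n"

# `prompt` is the system + user sequence shown above, and `output` is the
# Thought/Action text the model generated before stopping at "Observation:".
prompt = "...the system and user messages shown above..."
output = "...the model's Thought and Action block..."

new_prompt = prompt + output + get_weather("London")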

We then call the model again with this new prompt:

final_output = client.text_generation(
    new_prompt,
    max_new_tokens=200,
)

print(final_output)

Output

Final Answer: The weather in London is sunny with low temperatures.

To understand it better, check out this notebook: Agentfromscratch.ipynb

AI Agent Using SmolAgents

Here’s the News Agent I have built using SmolAgents with a Gradio UI:

from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
import datetime
import requests
import pytz
import yaml
from tools.final_answer import FinalAnswerTool

from Gradio_UI import GradioUI


@tool
def get_news_headlines() -> str:
    """
    Fetches the top news headlines from the News API for India.
    This function makes a GET request to the News API to retrieve the top news headlines
    for India. It returns the titles and sources of the top 5 articles as a
    formatted string. If no articles are available, it returns a message indicating that
    no news is available. In case of a request error, it returns an error message.
    Returns:
        str: A string containing the top 5 news headlines and their sources, or an error message.
    """
    api_key = "Your_API_key"

    sources = "google-news-in"
    name = "Google News (India)"
    description = "Comprehensive, up-to-date India news coverage, aggregated from sources all over the world by Google News.",
    URL = "https://news.google.com",
    language = "en"  # Define language before using it

    url = f"https://newsapi.org/v2/everything?q=&sources={sources}&language={language}&apiKey={api_key}"

    try:
        response = requests.get(url)
        response.raise_for_status()

        data = response.json()
        articles = data["articles"]

        if not articles:
            return "No news available at the moment."

        headlines = [f"{article['title']} - {article['source']['name']}" for article in articles[:5]]
        return "\n".join(headlines)

    except requests.exceptions.RequestException as e:
        return f"Error fetching news data: {str(e)}"

final_answer = FinalAnswerTool()

# If the agent does not answer, the model is overloaded, please use another model or the following Hugging Face Endpoint that also contains qwen2.5 coder:
# model_id='https://pflgm2locj2t89co.us-east-1.aws.endpoints.huggingface.cloud' 

model = HfApiModel(
    max_tokens=2096,
    temperature=0.5,
    model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
    custom_role_conversions=None,
)

with open("prompts.yaml", 'r') as stream:
    prompt_templates = yaml.safe_load(stream)
    
agent = CodeAgent(
    model=model,
    tools=[final_answer, get_news_headlines, DuckDuckGoSearchTool()],  # add your tools here (don't remove final_answer)
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates,
)

GradioUI(agent).launch()

Here’s the Space on Hugging Face where you can see it working: Neuralsingh123

You can also create a basic agent like this – To start, duplicate this Space: https://huggingface.co/spaces/agents-course/First_agent_template

After duplicating the space, add your Hugging Face API token so your agent can access the model API:

  • If you haven’t already, get your Hugging Face token by visiting Hugging Face Tokens. Make sure it has inference permissions.
  • Open your duplicated Space and navigate to the Settings tab.
  • Scroll down to the Variables and Secrets section and select New Secret.
  • Enter HF_TOKEN as the name and paste your token in the value field.
  • Click Save to securely store your token.

Conclusion

The Hugging Face AI Agents Course provides a comprehensive introduction to AI Agents, covering their theoretical foundations, design, and practical applications. Throughout this article, we’ve explored key concepts such as AI Agent workflows, the role of Large Language Models (LLMs), the importance of tools, and how agents interact with their environment using structured decision-making (Think → Act → Observe).

In practical implementation, we explored frameworks like SmolAgents, where we built an AI-powered News Agent using Hugging Face’s models and tools. This showcases how AI Agents can be developed efficiently with minimal code while still offering robust functionality.

What’s Next?

In the next article, I will be diving deeper into SmolAgents, LangChain, and LangGraph, exploring how they enhance AI Agent capabilities and simplify agent-based workflows. Stay tuned for insights on building more powerful and flexible AI Agents!

If you want to learn how to build these agents then consider enrolling in our exclusive Agentic AI Pioneer Program!

Hi, I am Pankaj Singh Negi - Senior Content Editor | Passionate about storytelling and crafting compelling narratives that transform ideas into impactful content. I love reading about technology revolutionizing our lifestyle.
