The use of LLMs has exploded across domains. They are no longer limited to chatbots hosted on the web; they are being integrated into enterprises, government agencies, and beyond. A key innovation in this landscape is building custom tools for AI agents with smolagents, which lets these systems extend their capabilities: agents can leverage tools, take actions in defined environments, and even call other agents.
This workflow enables LLM-powered AI systems to operate with greater autonomy, making them more reliable at completing tasks end to end.
This article is meant for intermediate-level developers and data professionals who are well versed in using basic LLMs. Working knowledge of Python and of calling LLM models is the bare minimum expected of you; beyond that, any background with web APIs and the Hugging Face ecosystem will help you benefit fully from this tutorial.
You are probably familiar with ChatGPT. You can ask it questions, and it answers them. It can also write code for you, tell you a joke, and so on.
Because it can code and answer your questions, you might want to use it to complete tasks for you, too: you demand something from it, and it carries out a full task on your behalf.
If this sounds vague right now, don't worry; let me give you an example. You know LLMs can search the web, and they can reason using information as input. So, you can combine these capabilities and ask an LLM to create a full travel itinerary for you. Right?
Yes. You will ask something like, “Hey AI, I am planning a vacation from 1st April to 7th April. I would like to visit the state of Himachal Pradesh. I really like snow, skiing, rope-ways, and lush green landscape. Can you plan an itinerary for me? Also find the lowest flight costs for me from the Kolkata airport.”
Taking in this information, an agent should be able to find and compare flight costs for those dates (including the return journey), work out which places you should visit given your criteria, and list hotels and costs for each place.
Here, the AI model uses your criteria to interact with the real world, searching for flights, hotels, buses, and so on, and also suggests places for you to visit.
This is what we call the agentic approach in AI. Let's learn more about it.
An agent is based on an LLM, and an LLM can interact with the external world using only text. Text in, text out.
So, when we ask an agent to do something, it takes that input as text, reasons using text/language, and can only output text.
It is in the middle part, or the last part, where the use of tools comes in. The tools return some desired values, and using those values, the agent returns its response in text. It can also do something very different, like making a transaction on the stock market or generating an image.
The workflow of an AI agent should be understood like this:
Understand –> Reason –> Interact
This is one step of an agentic workflow, and when multiple steps are involved, like in most use cases, it should be seen as:
Thought –> Action –> Observation
Given a command, the agent thinks about the task at hand and analyzes what needs to be done (Thought); it then acts towards the completion of the task (Action); finally, it observes whether any further actions need to be performed, or how complete the whole task is (Observation).
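To make this loop concrete, here is a toy sketch in Python. Everything in it (the stub functions, the completion check) is hypothetical and only illustrates the control flow; frameworks like smolagents implement this loop for you.

def llm_think(context: str) -> tuple[str, str]:
    # Stand-in for the LLM deciding the next step (Thought).
    return "I should fetch the current time.", "get_time"

def execute(action: str) -> str:
    # Stand-in for running a tool (Action).
    return "2025-04-01 10:00:00" if action == "get_time" else "unknown action"

def run_agent(task: str, max_steps: int = 6) -> str:
    context = task
    for _ in range(max_steps):
        thought, action = llm_think(context)       # Thought
        observation = execute(action)              # Action
        context += f"\n{thought}\n{observation}"   # Observation, fed back to the model
        if "unknown" not in observation:           # crude "task is done" check
            return observation
    return "No final answer within max_steps."

print(run_agent("What time is it?"))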
In this tutorial, we will code up a chat agent that greets the user according to the user's time zone. When a user says, "I am in Kolkata, greet me!", the agent will think about the request and parse it carefully. Then it will fetch the current time for that timezone; this is the action. It will then observe whether any further task remains, such as the user having requested an image. If not, it will go ahead and greet the user; otherwise, it will take a further action, invoking the image generation model.
So far, we have been talking in conceptual terms and workflows. Now let's dive into the concrete components of an AI agent.
You can say that an AI agent has two parts:
The brain of the agent is a traditional LLM such as Llama 3, Phi-4, or GPT-4. Using this, the agent thinks and reasons.
The tools are externally coded functions that the agent can invoke. A tool can call an API for a stock price or the current temperature of a place; it can even be another agent, or something as simple as a calculator.
Using the `smolagents` framework, you can turn any Python function into a tool and pair it with any AI model that has been tuned for function calling.
In our example, we will have tools to tell the user a fun fact about dogs, fetch the current time in a timezone, and generate an image. The model will be a Qwen LLM; more on the model later.
LLMs are now not merely used as text-completion tools or for answering questions in Q&A formats. They are used as small but crucial cogs in much larger systems, where many elements of those systems are not based on generative AI.
Below is an abstract concept image:
In this abstract system graph, we see that GenAI components often have to take important inputs from non-Generative AI traditional system components.
We need tools to interact with these components, rather than relying only on the answers present in an LLM's knowledge base.
As we have seen, LLMs serve as the "brain" of the agent, so the agent inherits all the faults of LLMs as well. Some of them are: hallucinations (confidently stated but incorrect answers), a fixed knowledge cutoff (no awareness of anything after training), and unreliable arithmetic and data lookups.
The above are only some of the reasons to use deterministic tools.
`smolagents` is a library that serves as a framework for using agents in your LLM application. It is developed by Hugging Face, and it is open source.
There are other frameworks such as LlamaIndex, LangGraph, etc. that you can use for the same purpose. But, for this tutorial, we will focus on smolagents alone.
Some libraries create agents that output JSON, and others create agents that output Python code directly. Research has shown the code-first approach to be much more practical and efficient, and smolagents is a library that creates agents that output Python code directly.
All the code is available in the GitHub repository for the project. I will not go through all of it, but I will highlight the most important pieces of the codebase.
The prompts.yaml file contains many example tasks and response formats that we expect the model to see, and it uses Jinja templating. Its contents get added to the prompt that we ultimately send to the model. We will later see that the prompts are passed to the `CodeAgent` class.
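The exact contents of that file come from the project template, but to give you a feel for it, a prompts.yaml entry has roughly this shape (an illustrative sketch, not the template's actual text; note the Jinja placeholders that get filled in with the tool list):

system_prompt: |-
  You are an expert assistant who solves tasks by writing code.
  You have access to the following tools:
  {%- for tool in tools.values() %}
  - {{ tool.name }}: {{ tool.description }}
  {%- endfor %}
  Always end with a call to the final_answer tool.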
Tool-calling agents can work in two ways: they can either return a JSON blob describing the call, or they can directly write code.
In practice, it turns out that a tool-calling agent that writes code directly works much better. It also saves you the overhead of a system that has to parse the JSON in the middle.
The `smolagents` library falls in the second category of LLM agents, i.e., it writes code directly.
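To make the contrast concrete, here is a rough illustration of the two styles; the tool name `get_time` and the JSON shape are made up for the example:

import json

# Style 1: the model emits a JSON blob; your system must parse it
# and dispatch the call to the right function yourself.
model_output = '{"tool": "get_time", "arguments": {"timezone": "Asia/Kolkata"}}'
call = json.loads(model_output)
print(call["tool"], call["arguments"])  # dispatch logic still needed

# Style 2 (smolagents): the model writes Python such as
#     result = get_time(timezone="Asia/Kolkata")
# which is executed directly, with no parsing layer in between.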
The app.py file
This is the file where we create the agent object, and this is where we define our own tools.
These are the imports:
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
import datetime
import requests
import pytz
import yaml
from tools.final_answer import FinalAnswerTool
We are importing the `CodeAgent` class from the `smolagents` library, along with the `load_tool` helper and the `tool` decorator. We will use these in time.
We want to call an API that serves cool facts about dogs. It is hosted at https://dogapi.dog. You can visit the website and read the docs on using the API. It is completely free.
To make a Python function usable by the AI agent, you have to: decorate it with `@tool`, add type hints for its inputs and output, and write a docstring describing what the tool does and what its arguments are.
@tool
def get_amazing_dog_fact() -> str:
    """A tool that tells you an amazing fact about dogs using a public API.
    Args: None
    """
    # URL for the public API
    url = "https://dogapi.dog/api/v2/facts?limit=1"
    try:
        response = requests.get(url)
        if response.status_code == 200:  # expected, OK status code
            # parse the JSON body and extract the fact text
            cool_dog_fact = response.json()['data'][0]['attributes']['body']
            return cool_dog_fact
        else:
            # in case of an unfavorable status code
            return "A dog fact could not be fetched."
    except requests.exceptions.RequestException:
        # in case the request itself failed (network error, timeout, etc.)
        return "A dog fact could not be fetched."
Note that we are returning a properly parsed string from the tool, not the raw JSON response.
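Because the `@tool` decorator wraps the function into a callable tool object, you can sanity-check it on its own before handing it to the agent (this needs internet access, and the returned fact will vary per call):

# Quick manual test of the tool, outside the agent
print(get_amazing_dog_fact())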
Below is a tool to get the current time in a timezone of your choice:
@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"
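This tool can also be tested directly. Note that `pytz` expects IANA timezone names such as 'Asia/Kolkata' or 'America/New_York'; anything else falls into the error branch:

print(get_current_time_in_timezone("Asia/Kolkata"))
# e.g. "The current local time in Asia/Kolkata is: 2025-04-01 10:00:00"
print(get_current_time_in_timezone("Not/AZone"))
# -> "Error fetching time for timezone 'Not/AZone': ..."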
You can also use tools that are themselves other AI models, like this:
image_generation_tool = load_tool("agents-course/text-to-image", trust_remote_code=True)
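`load_tool` downloads a tool implementation from the Hugging Face Hub, in this case a text-to-image tool from the agents course (hence `trust_remote_code=True`). Once loaded, it can be called like any other tool; treat the call below as an illustration, since the exact argument and return type are defined by the tool itself:

# Illustrative call: the loaded tool turns a text prompt into an image.
image = image_generation_tool("A snowy village in Himachal Pradesh")
image.save("generated.png")  # assuming a PIL-style image object is returned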
Now, these are the tools at the agent's disposal. What about the model? We are going to use the Qwen2.5-Coder-32B-Instruct model. You have to apply for access to be able to use this model, and they are pretty open about granting it.
This is how you create the model object:
model = HfApiModel(
    max_tokens=2096,
    temperature=0.5,
    model_id='Qwen/Qwen2.5-Coder-32B-Instruct',  # this model may sometimes be overloaded
    custom_role_conversions=None,
)
We now have to add the prompts that we talked about earlier:
with open("prompts.yaml", 'r') as stream:
    prompt_templates = yaml.safe_load(stream)
Now, our final task is to create the agent object. Note that we first instantiate the `final_answer` tool from the `FinalAnswerTool` class we imported earlier:

final_answer = FinalAnswerTool()

agent = CodeAgent(
    model=model,
    tools=[final_answer, get_current_time_in_timezone, get_amazing_dog_fact,
           image_generation_tool],  # add your tools here (don't remove final_answer)
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates
)
Note the very important `tools` argument. Here we add all the tools we created or defined to a list. This is how the agent knows which tools are at its disposal.
The other arguments are hyperparameters that we will not discuss or change in this tutorial. You can refer to the documentation for more information.
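With the agent assembled, you can try it out. The project template wires this agent into a Gradio chat interface, but you can also run a single task programmatically; a minimal sketch:

# Run one task end to end and print the agent's final answer
# (the exact output depends on the model and tools).
result = agent.run("I am in Kolkata, greet me!")
print(result)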
For the full code, go ahead and visit the repository; the code above comes from its app.py file.
I have explained all the core concepts and all the necessary code. HuggingFace provided the template of the project here.
You can go ahead right now, and use the chat interface where you can use the tools that I have mentioned.
Here is my HuggingFace space, called greetings_gen. You should clone the project, set a suitable name, and change the visibility to public if you want to make the agent available to friends and the public.
Then make changes to the `app.py` file: add your new tools, remove mine, whatever you wish.
Here are some examples where you can see the inputs and outputs of the agent:
Agents can reliably perform tasks using multiple tools, which gives them more autonomy and enables them to complete more complex tasks with deterministic inputs and outputs, while making things easier for the user.
You learned the basics of agentic AI and of the smolagents library, and you also learned to create tools of your own that an AI agent can use, along with hosting a chat model in HuggingFace Spaces where you can interact with an agent that uses the tools you created!
Feel free to follow me on the Fediverse, X/Twitter, and LinkedIn. And be sure to visit my website.
Q1. What is an AI agent?
A. An AI agent is an LLM-powered system that can interact with custom tools to perform specific tasks beyond text generation.
Q2. Why do AI agents need custom tools?
A. Custom tools help AI agents fetch real-time data, execute commands, and perform actions they can't handle on their own.
Q3. What is the smolagents library?
A. smolagents is a lightweight framework by Hugging Face that helps developers create AI agents capable of using custom tools. It simplifies AI agent creation by providing an easy-to-use framework.
Q4. How do you create custom tools for an AI agent?
A. You can define functions as custom tools and integrate them into your AI agent to extend its capabilities.
Q5. Where can you deploy AI agents?
A. You can deploy AI agents on platforms like Hugging Face Spaces for easy access and interaction.