Before talking about AI agents, it is important to understand the lifespan of a large language model like GPT. Such a model starts its lifespan with pretraining, when it learns from a massive corpus of textual data to establish a basic grasp of language. The next step is supervised fine-tuning, when the model is refined for specific tasks using curated datasets. Reward modeling then optimizes the model’s behavior using preference signals, improving performance in general and decision-making in particular. Lastly, reinforcement learning lets the model learn and change dynamically through interactions, honing its ability to perform various tasks more accurately and adaptably. In this article, we will also learn how you can build AI agents using “Tool Use.”
Each phase of the model’s development—pretraining, supervised fine-tuning, reward modeling, and reinforcement learning—progresses through four critical components: Dataset, Algorithm, Model, and Evaluation.
In the initial pretraining phase, the model ingests vast quantities of raw internet text, totaling trillions of tokens. The data’s quality varies, but at this stage its sheer volume is what matters most. This phase demands significant hardware resources, including large GPU clusters, and months of intensive training. The process begins with weights initialized from scratch, which are updated as learning progresses. The training algorithm is language modeling: predicting the next token, which forms the basis of the model’s capabilities.
Moving to supervised fine-tuning, the focus shifts to task-specific labeled datasets where the model refines its parameters to predict accurate labels for each input. Here, the datasets’ quality is paramount, leading to a reduction in quantity. Algorithms tailor training for tasks such as token prediction, culminating in a Supervised Fine-Tuning (SFT) Model. This phase requires fewer GPUs and less time than pretraining due to enhanced dataset quality.
Reward modeling follows, employing algorithms like binary classification to enhance model performance based on positive reinforcement signals. The resulting Reward Modeling (RM) Model undergoes further enhancement through human feedback or evaluation.
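To make the reward-modeling step concrete, here is a minimal sketch of the pairwise (Bradley-Terry style) loss commonly used when training reward models as binary classifiers over preference pairs; the function name and the scalar inputs are illustrative, not from the article:

```python
import math

def reward_model_pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-modeling loss: -log(sigmoid(r_chosen - r_rejected)).

    Training pushes the reward assigned to the human-preferred response
    above the reward assigned to the rejected one.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When both responses score equally, the loss is log(2); as the preferred response pulls ahead, the loss shrinks toward zero, which is exactly the positive-reinforcement signal described above.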
Reinforcement learning optimizes the model’s responses through iterative interactions with its environment, ensuring adaptability to new information and prompts. However, integrating real-world data to keep the model updated remains a challenge.
Addressing this challenge involves bridging the gap between trained data and real-world information. It necessitates strategies to continuously update and integrate new data into the model’s knowledge base, ensuring it can respond accurately to the latest queries and prompts.
However, a critical question arises: While we’ve trained our LLM on the data provided, how do we equip it to access and respond to real-world information, especially to address the latest queries and prompts?
For instance, the model struggled to provide responses grounded in real-world data when testing ChatGPT 3.5 with specific questions, as shown in the image below:
One approach is to fine-tune the model regularly, perhaps on a daily schedule. However, due to resource limitations, the viability of this technique is doubtful. Regular fine-tuning comes with several difficulties, such as high compute cost, the need for freshly curated training data, and the risk of degrading capabilities the model has already learned.
In light of these difficulties, it is clear that adding new data to the model requires overcoming several barriers and is not a simple operation.
Here, we introduce AI agents: essentially LLMs with built-in access to external tools. These agents can collect and process information, carry out tasks, and keep track of past interactions in their working memory. Although familiar LLM-based systems are capable of running programs and conducting web searches, AI agents go one step further:
If prompted with “What is the current temperature and weather in Delhi, India?” an online LLM-based chat system might initiate a web search to gather relevant information. Early on, developers of LLMs recognized that relying solely on pre-trained transformers to generate output is limiting. By integrating a web search tool, LLMs can perform more comprehensive tasks. In this scenario, the LLM could be fine-tuned or prompted (potentially with few-shot learning) to generate a specific command like {tool: web-search, query: “current temperature and weather in Delhi, India”} to initiate a search engine query.
A subsequent step identifies such commands, triggers the web search function with the appropriate parameters, retrieves the weather information, and integrates it back into the LLM’s input context for further processing.
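That identify-and-dispatch step can be sketched as follows; the JSON command format and the web_search stub are hypothetical stand-ins for whatever search backend the system actually uses:

```python
import json

def web_search(query: str) -> str:
    # Hypothetical stand-in for a real search backend.
    return f"Search results for: {query}"

def dispatch(command: dict) -> str:
    """Route a model-emitted tool command to the matching function."""
    tools = {"web-search": lambda c: web_search(c["query"])}
    handler = tools.get(command.get("tool"))
    if handler is None:
        raise ValueError(f"Unknown tool: {command.get('tool')}")
    return handler(command)

# The LLM emits a JSON command; the runtime parses and executes it.
raw = '{"tool": "web-search", "query": "current temperature and weather in Delhi, India"}'
result = dispatch(json.loads(raw))
```

The real systems described later in this article follow the same pattern, just with the OpenAI function-calling API supplying the structured command instead of raw JSON in the text.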
If you pose a question such as, “If a product-based company sells an item at a 20% loss, what would be the final profit or loss?” an LLM equipped with a code execution tool could handle this by executing a Python command to compute the result accurately. For instance, it might generate a command like {tool: python-interpreter, code: "cost_price * (1 - 0.20)"}, where "cost_price" represents the initial cost of the item. This approach ensures that the LLM leverages computational tools to provide the correct profit or loss calculation rather than attempting to generate the answer directly through its language processing capabilities, which might not yield accurate results. Beyond calculations, external tools let users delegate multi-step actions such as booking a ticket; planning and executing such a sequence is task planning, the core of an agentic workflow.
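The arithmetic the interpreter tool would execute can be sketched like this (the cost price of 100 is an assumed example value, since the question leaves it unspecified):

```python
cost_price = 100.0                         # assumed example value
selling_price = cost_price * (1 - 0.20)    # item sold at a 20% loss
loss = cost_price - selling_price
# Selling at a 20% loss always yields a loss of 20% of the cost price,
# so with a cost price of 100 the selling price is 80 and the loss is 20.
```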
So, AI agents can help ChatGPT overcome its lack of information about the latest real-world data. We can give it access to the internet, where it can run a Google search and retrieve the top matches; in this case, the tool is internet search.
When the AI identifies the necessity for current weather information in responding to a user’s query, it includes a list of available tools in its API request, indicating its access to such functions. Upon recognizing the need to use get_current_weather, it generates a specific function call with a designated location, such as “London,” as the parameter. Subsequently, the system executes this function call, fetching the latest weather details for London. The retrieved weather data is then seamlessly integrated into the AI’s response, enhancing the accuracy and relevance of the information provided to the user.
Now, let’s implement tool use to understand the agentic workflow!

We are going to give an AI agent a tool for fetching current weather information. As we saw in the example above, the model alone cannot answer real-world questions that require the latest data.

Let’s install the dependencies first:
langchain
langchain-community>=0.0.36
langchainhub>=0.1.15
llama_cpp_python # please install the correct build based on your hardware and OS
pandas
loguru
googlesearch-python
transformers
openai
Now, we will import libraries:
from openai import OpenAI
import json
from rich import print
import dotenv
dotenv.load_dotenv()
Keep your OpenAI API key in a .env file, or put the key in a variable:
OPENAI_API_KEY= "your_open_api_key"
client = OpenAI(api_key= OPENAI_API_KEY)
Interact with the GPT model using code rather than the chat interface:
messages = [{"role": "user", "content": "What's the weather like in London?"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
print(response)
This code sets up a simple interaction with an AI model, asking about the weather in London. The API would process this request and return a response, which you would need to parse to get the actual answer.
It’s worth noting that this code doesn’t fetch real-time weather data. Instead, it asks an AI model to generate a response based on its training data, which may not reflect the current weather in London.
In this case, the AI acknowledged it couldn’t provide real-time information and suggested checking a weather website or app for current London weather.
This structure allows easy parsing and extracting relevant information from the API response. The additional metadata (like token usage) can be useful for monitoring and optimizing API usage.
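A minimal way to pull the answer and the token-usage metadata out of such a response is sketched below; a SimpleNamespace stand-in mimics the SDK object's attribute layout so the snippet runs without an API key:

```python
from types import SimpleNamespace

# Stand-in with the same attribute layout as the SDK's ChatCompletion object.
response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="I can't access real-time weather.")
        )
    ],
    usage=SimpleNamespace(prompt_tokens=14, completion_tokens=9, total_tokens=23),
)

answer = response.choices[0].message.content      # the model's reply text
tokens_used = response.usage.total_tokens         # metadata for cost monitoring
```

With a real client, the same two attribute paths apply to the object returned by `client.chat.completions.create(...)`.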
Now, let’s define a function for getting weather information and set up the structure for using it as a tool in an AI conversation:
def get_current_weather(location):
    """Get the current weather in a given city"""
    if "london" in location.lower():
        return json.dumps({"temperature": "20 C"})
    elif "san francisco" in location.lower():
        return json.dumps({"temperature": "15 C"})
    elif "paris" in location.lower():
        return json.dumps({"temperature": "22 C"})
    else:
        return json.dumps({"temperature": "unknown"})
messages = [{"role": "user", "content": "What's the weather like in London?"}]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
This code snippet defines a get_current_weather function that returns stubbed temperatures for a few cities, and a tools list describing the function’s name, purpose, and parameters in the JSON schema format the OpenAI API expects. We can now pass both the messages and the tools to the API:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)
print(response)
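When tools are supplied, the model may reply with a tool_calls entry instead of plain text. Extracting the requested function and its JSON-encoded arguments can be sketched like this, again using a stand-in object that mirrors the SDK's layout:

```python
import json
from types import SimpleNamespace

# Stand-in mirroring the tool-call portion of a ChatCompletion response.
tool_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="get_current_weather",
        arguments='{"location": "London"}',
    ),
)
response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[tool_call]))]
)

call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
```

Note that `function.arguments` is always a JSON string, not a dict, so it must be parsed before invoking the local function.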
Here, we use three external scripts named llms, tools, and tool_executor, which act as helper modules.
from llms import OpenAIChatCompletion
from tools import get_current_weather
from tool_executor import need_tool_use
Before going further with the code flow, let’s understand the scripts.
The llms script manages interactions with OpenAI’s chat completion API, enabling the use of external tools within the chat context:
from typing import List, Optional, Any, Dict
import logging

from agents.specs import ChatCompletion
from agents.tool_executor import ToolRegistry
from langchain_core.tools import StructuredTool
from llama_cpp import ChatCompletionRequestMessage
from openai import OpenAI

logger = logging.getLogger(__name__)


class OpenAIChatCompletion:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
        self.client = OpenAI()
        self.tool_registry = ToolRegistry()

    def bind_tools(self, tools: Optional[List[StructuredTool]] = None):
        for tool in tools:
            self.tool_registry.register_tool(tool)

    def chat_completion(
        self, messages: List[ChatCompletionRequestMessage], **kwargs
    ) -> ChatCompletion:
        tools = self.tool_registry.openai_tools
        output = self.client.chat.completions.create(
            model=self.model, messages=messages, tools=tools
        )
        logger.debug(output)
        return output

    def run_tools(self, chat_completion: ChatCompletion) -> List[Dict[str, Any]]:
        return self.tool_registry.call_tools(chat_completion)
This code defines a class OpenAIChatCompletion that encapsulates the functionality for interacting with OpenAI’s chat completion API and managing tools. Let’s break it down:
Imports
Various typing annotations and necessary modules are imported.
Class Definition
class OpenAIChatCompletion:
This class serves as a wrapper for OpenAI’s chat completion functionality.
Constructor
def __init__(self, model: str = "gpt-4o"):

Initializes the class with a specified model (default is "gpt-4o").
Creates an OpenAI client and a ToolRegistry instance.
bind_tools method
def bind_tools(self, tools: Optional[List[StructuredTool]] = None):
Registers provided tools with the ToolRegistry.
This allows the chat completion to use these tools when needed.
chat_completion method

def chat_completion(self, messages: List[ChatCompletionRequestMessage], **kwargs) -> ChatCompletion:
Sends a request to the OpenAI API for chat completion.
Includes the registered tools in the request.
Returns the API response as a ChatCompletion object.
run_tools method
def run_tools(self, chat_completion: ChatCompletion) -> List[Dict[str, Any]]:
Executes the tools called in the chat completion response.
Returns the results of the tool executions.
The tools script defines individual tools or functions, such as fetching real-time weather data, that the AI can utilize to perform specific tasks:
import json

import requests
from langchain.tools import tool
from loguru import logger


@tool
def get_current_weather(city: str) -> str:
    """Get the current weather for a given city.

    Args:
        city (str): The city to fetch weather for.

    Returns:
        str: current weather condition as a JSON string, or an error
        message if the request fails.
    """
    try:
        data = json.dumps(
            requests.get(f"https://wttr.in/{city}?format=j1")
            .json()
            .get("current_condition")[0]
        )
        return data
    except Exception as e:
        logger.exception(e)
        error_message = f"Error fetching current weather for {city}: {e}"
        return error_message
This code defines a get_current_weather tool, likely used in conjunction with the OpenAIChatCompletion class we discussed earlier: it queries the wttr.in API for live conditions and returns them as a JSON string, falling back to an error message on failure.
The tool_executor script handles the execution and management of tools, ensuring they are called and integrated correctly within the AI’s response workflow:
import json
from typing import Any, List, Union, Dict

from langchain_community.tools import StructuredTool
from langchain_core.utils.function_calling import convert_to_openai_function
from loguru import logger

from agents.specs import ChatCompletion, ToolCall


class ToolRegistry:
    def __init__(self, tool_format="openai"):
        self.tool_format = tool_format
        self._tools: Dict[str, StructuredTool] = {}
        self._formatted_tools: Dict[str, Any] = {}

    def register_tool(self, tool: StructuredTool):
        self._tools[tool.name] = tool
        self._formatted_tools[tool.name] = convert_to_openai_function(tool)

    def get(self, name: str) -> StructuredTool:
        return self._tools.get(name)

    def __getitem__(self, name: str) -> StructuredTool:
        return self._tools[name]

    def pop(self, name: str) -> StructuredTool:
        return self._tools.pop(name)

    @property
    def openai_tools(self) -> List[Dict[str, Any]]:
        # [{"type": "function", "function": registry.openai_tools[0]}],
        result = []
        for oai_tool in self._formatted_tools.values():
            result.append({"type": "function", "function": oai_tool})
        return result if result else None

    def call_tool(self, tool: ToolCall) -> Any:
        """Call a single tool and return the result."""
        function_name = tool.function.name
        function_to_call = self.get(function_name)

        if not function_to_call:
            raise ValueError(f"No function was found for {function_name}")

        function_args = json.loads(tool.function.arguments)
        logger.debug(f"Function {function_name} invoked with {function_args}")
        function_response = function_to_call.invoke(function_args)
        logger.debug(f"Function {function_name} responded with {function_response}")
        return function_response

    def call_tools(self, output: Union[ChatCompletion, Dict]) -> List[Dict[str, str]]:
        """Call all tools from the ChatCompletion output and return the result."""
        if isinstance(output, dict):
            output = ChatCompletion(**output)

        if not need_tool_use(output):
            raise ValueError(f"No tool call was found in ChatCompletion\n{output}")

        messages = []
        # https://platform.openai.com/docs/guides/function-calling
        tool_calls = output.choices[0].message.tool_calls
        for tool in tool_calls:
            function_name = tool.function.name
            function_response = self.call_tool(tool)
            messages.append({
                "tool_call_id": tool.id,
                "role": "tool",
                "name": function_name,
                "content": function_response,
            })
        return messages


def need_tool_use(output: ChatCompletion) -> bool:
    tool_calls = output.choices[0].message.tool_calls
    if tool_calls:
        return True
    return False


def check_function_signature(
    output: ChatCompletion, tool_registry: ToolRegistry = None
):
    tools = output.choices[0].message.tool_calls
    invalid = False
    for tool in tools:
        tool: ToolCall
        if tool.type == "function":
            function_info = tool.function
            if tool_registry:
                if tool_registry.get(function_info.name) is None:
                    logger.error(f"Function {function_info.name} is not available")
                    invalid = True

            arguments = function_info.arguments
            try:
                json.loads(arguments)
            except json.JSONDecodeError as e:
                logger.exception(e)
                invalid = True

    if invalid:
        return False
    return True
This code defines a ToolRegistry class and associated helper functions for managing and executing tools in an AI system. Let’s break it down:
This ToolRegistry class is a central component for managing and executing tools in an AI system. It allows for registering tools, converting them to the OpenAI function-calling format, and executing the tool calls that appear in a model’s output.
The design allows seamless integration with AI models supporting function calling, like those from OpenAI. It provides a structured way to extend an AI system’s capabilities by allowing it to interact with external tools and data sources.
The helper functions need_tool_use and check_function_signature provide additional utility for working with ChatCompletion outputs and validating tool usage.
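The behavior of need_tool_use can be illustrated on a stand-in response object; the real function takes a ChatCompletion, but it only inspects the tool_calls attribute, so a SimpleNamespace with the same shape is enough to see it work:

```python
from types import SimpleNamespace

def need_tool_use(output) -> bool:
    """Return True when the model's reply contains tool calls."""
    tool_calls = output.choices[0].message.tool_calls
    return bool(tool_calls)

# A reply that requests a tool vs. a plain-text reply.
with_tool = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=[object()]))]
)
plain_text = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(tool_calls=None))]
)
```

This is the branch point of the whole agent loop: only when need_tool_use returns True does the system run tools and feed their results back into the conversation.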
This code forms a crucial part of a larger system for building AI agents capable of using external tools and APIs to enhance their capabilities beyond simple text generation.
These were the external scripts and other helper functions required to include external tools/functionality and leverage all AI capabilities.
Now, an instance of OpenAIChatCompletion is created.
The get_current_weather tool is bound to this instance.
A message list is created with a user query about London’s weather.
A chat completion is requested using this setup.
llm = OpenAIChatCompletion()
llm.bind_tools([get_current_weather])
messages = [
{"role": "user", "content": "how is the weather in London today?"}
]
output = llm.chat_completion(messages)
print(output)
This demonstrates how the AI can intelligently decide to use available tools to gather information before providing an answer, making its responses more accurate and up-to-date.
if need_tool_use(output):
    print("Using weather tool")
    tool_results = llm.run_tools(output)
    print(tool_results)
    tool_results[0]["role"] = "assistant"

    updated_messages = messages + tool_results
    updated_messages = updated_messages + [
        {"role": "user", "content": "Think step by step and answer my question based on the above context."}
    ]
    output = llm.chat_completion(updated_messages)
    print(output.choices[0].message.content)
This code checks whether the model requested a tool, runs the requested tool, folds the tool’s result back into the conversation, and then asks the model to reason over the enriched context before producing its final answer.
This implementation represents a significant step toward creating more capable, context-aware AI systems. By bridging the gap between large language models and external tools and data sources, we can create AI assistants that understand and generate human-like text that meaningfully interacts with the real world.
Q1. What is an AI agent with dynamic tool use?
Ans. An AI agent with dynamic tool use is an advanced artificial intelligence system that can autonomously select and utilize various external tools or functions to gather information, perform tasks, and solve problems. Unlike traditional chatbots or AI models that are limited to their pre-trained knowledge, these agents can interact with external data sources and APIs in real time, allowing them to provide up-to-date and contextually relevant responses.
Q2. How do AI agents with dynamic tool use differ from regular AI models?
Ans. Regular AI models typically rely solely on their pre-trained knowledge to generate responses. In contrast, AI agents with dynamic tool use can recognize when they need additional information, select appropriate tools to gather that information (like weather APIs, search engines, or databases), use these tools, and then incorporate the new data into their reasoning process. This allows them to handle a much wider range of tasks and provide more accurate, current information.
Q3. What are some applications of AI agents with tool use?
Ans. The applications of building AI agents are vast and varied. Some examples include:
– Personal assistants that can schedule appointments, check real-time information, and perform complex research tasks.
– Customer service bots that can access user accounts, process orders, and provide product information.
– Financial advisors that can analyze market data, check current stock prices, and provide personalized investment advice.
– Healthcare assistants that can access medical databases, interpret lab results, and provide preliminary diagnoses.
– Project management systems that can coordinate tasks, access multiple data sources, and provide real-time updates.