OpenAI’s latest models, such as o1 and GPT-4o, deliver accurate, context-aware responses across diverse fields. A key factor behind recent advances in Large Language Models (LLMs) is their improved utility and a significant reduction in common issues such as hallucinations. Techniques like retrieval-augmented generation (RAG) improve accuracy and reduce hallucinations by giving models access to external, pre-indexed data. However, when applications need real-time data such as weather forecasts, stock prices (useful for judging bullish or bearish trends), and other dynamic updates, LLM function calling becomes the key capability. Function calling, also known as tool calling, lets LLMs invoke APIs or other systems so that they can perform specific tasks autonomously.
This article explores 6 LLMs that support function-calling capabilities, offering real-time API integration for enhanced accuracy and automation. These models are shaping the next generation of AI agents, enabling them to autonomously handle tasks involving data retrieval, processing, and real-time decision-making.
Function calling is a methodology that enables large language models (LLMs) to interact with external systems, APIs, and tools. By equipping an LLM with a collection of functions or tools and details on how to use them, the model can intelligently choose and execute the appropriate function to perform a specific task.
This capability significantly extends the functionality of LLMs beyond simple text generation, allowing them to engage with the real world. Instead of only producing text-based responses, LLMs with function-calling capabilities can now perform actions, control devices, access databases for information retrieval, and complete a variety of tasks by utilizing external tools and services.
However, not all LLMs are equipped with function-calling abilities. Only models that have been specifically trained or fine-tuned for this purpose can recognize when a prompt requires invoking a function. The Berkeley Function-Calling Leaderboard, for instance, evaluates how well different LLMs handle a variety of programming languages and API scenarios, highlighting the versatility and reliability of these models in executing multiple, complex functions in parallel. This capability is essential for creating AI systems operating across various software environments and managing tasks requiring simultaneous actions.
Typically, applications that use function-calling LLMs follow a two-step process: first, mapping the user prompt to the correct function and input parameters; second, processing the function’s output to generate a final, coherent response.
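The two-step process can be sketched without any real model. In the minimal example below, the functions fake_model_select_tool and fake_model_summarize are hypothetical stand-ins for the two LLM calls, and get_stock_price returns canned data rather than calling a real API; only the control flow is meant to be representative.

```python
import json

def get_stock_price(symbol: str) -> dict:
    """Hypothetical tool: returns a canned quote instead of calling a real API."""
    prices = {"ACME": 123.45}
    return {"symbol": symbol, "price": prices.get(symbol)}

# Registry that maps tool names to Python callables.
TOOLS = {"get_stock_price": get_stock_price}

def fake_model_select_tool(prompt: str) -> str:
    """Stand-in for step 1: a real LLM would emit this JSON itself."""
    return json.dumps({"name": "get_stock_price", "arguments": {"symbol": "ACME"}})

def fake_model_summarize(prompt: str, tool_output: dict) -> str:
    """Stand-in for step 2: a real LLM would phrase the final answer."""
    return f"{tool_output['symbol']} is trading at {tool_output['price']}."

def answer(prompt: str) -> str:
    call = json.loads(fake_model_select_tool(prompt))  # step 1: prompt -> function + arguments
    result = TOOLS[call["name"]](**call["arguments"])  # run the selected function
    return fake_model_summarize(prompt, result)        # step 2: tool output -> final response

print(answer("How is ACME doing today?"))
```

The application, not the model, executes the function; the model only proposes the call and later turns the result into prose.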
Here are 6 LLMs that support function calling:
Link to the doc: GPT-4o Function Calling
Function calling in GPT-4o allows developers to connect large language models to external tools and systems, enhancing their capabilities. By leveraging this feature, AI can interact with APIs, fetch data, execute functions, and perform tasks requiring external resource integration. This capability is particularly useful in building intelligent assistants, automating workflows, or developing dynamic applications that can perform actions based on user input.
Function calling with GPT-4o opens up a wide range of practical applications, including but not limited to:
These improvements make GPT-4o ideal for building autonomous AI agents, from virtual assistants to complex data analysis tools.
Link to the doc: Gemini 1.5 Flash Function Calling
Function calling is a powerful feature of Gemini 1.5 Flash that allows developers to define custom functions and integrate them with Gemini models. Instead of invoking these functions directly, the model generates structured output that specifies the function name and suggested arguments, and the application then runs the function. This approach enables dynamic applications that interact with external APIs, databases, and other services to provide real-time, contextually relevant responses to user queries.
Introduction to Function Calling with Gemini 1.5 Flash:
The function calling feature in Gemini 1.5 Flash lets developers extend the capabilities of Gemini models with custom functionality. By defining custom functions and supplying them to the model, applications can use these functions to perform specific tasks, fetch real-time data, and interact with external systems. This improves the model’s ability to provide comprehensive and accurate responses tailored to user needs.
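Supplying a custom function to a model generally means describing its name, purpose, and parameters in a schema. The sketch below derives such a description from a Python function's signature. The declare helper, the set_light tool, and the exact dictionary layout are assumptions made for illustration; they are not the Gemini SDK's own API or wire format.

```python
import inspect

# Map Python annotations to JSON-Schema type names (illustrative subset).
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def declare(func) -> dict:
    """Build a schema-like declaration from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _TYPE_NAMES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def set_light(brightness: int, color: str = "white") -> dict:
    """Hypothetical tool: set the brightness and color of a smart light."""
    return {"brightness": brightness, "color": color}

declaration = declare(set_light)
print(declaration)
```

Generating the declaration from the signature keeps the schema and the implementation from drifting apart, at the cost of supporting only simple parameter types.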
Function calling with Gemini 1.5 Flash can be applied across various domains to improve application functionality and user experience. Here are some illustrative use cases:
Link to the doc: Anthropic Claude Sonnet 3.5 function calling
Anthropic Claude 3.5 Sonnet supports function calling, enabling integration with external tools to perform specific tasks. This allows Claude to interact dynamically with external systems and return results to the user in real time. By incorporating custom tools, you can expand Claude’s functionality beyond text generation, enabling it to access external APIs, fetch data, and perform actions essential for specific use cases.
In the context of Claude’s function calling, external tools or APIs can be defined and made available for the model to call during a conversation. Claude intelligently determines when a tool is necessary based on the user’s input, formats the request appropriately, and provides the result in a clear response. This mechanism enhances Claude’s versatility, allowing it to go beyond just answering questions or generating text by integrating real-world data or executing code through external APIs.
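The request and result round trip described above can be sketched with plain dictionaries. The field names (input_schema, tool_use, tool_result, tool_use_id) follow Anthropic's Messages API tool-use format as commonly documented, but should be checked against the current docs; the add_numbers tool, the HANDLERS registry, and the simulated assistant turn are hypothetical, and no API call is made.

```python
import json

# Tool definition: a name, a description, and a JSON Schema for the input.
calculator_tool = {
    "name": "add_numbers",  # hypothetical tool name
    "description": "Add two numbers and return the sum.",
    "input_schema": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
        "required": ["a", "b"],
    },
}

def add_numbers(a: float, b: float) -> float:
    return a + b

HANDLERS = {"add_numbers": add_numbers}

def build_tool_results(assistant_content: list) -> dict:
    """Run every tool_use block and package the outputs as one user message."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # skip plain text blocks
        output = HANDLERS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # links the result to the request
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}

# Simulated assistant turn; a real one would come from the API response.
simulated_content = [
    {"type": "text", "text": "Let me add those."},
    {"type": "tool_use", "id": "toolu_demo_1", "name": "add_numbers", "input": {"a": 2, "b": 3}},
]
follow_up = build_tool_results(simulated_content)
print(follow_up)
```

In a real integration, follow_up would be appended to the message history and sent back so the model can write its final answer.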
To integrate function calling with Claude, follow these steps:
Here are the use cases of this function:
With function calling enabled, Claude 3.5 Sonnet is significantly better able to assist users by integrating custom and real-world solutions into everyday interactions.
Claude excels in scenarios where safety and interpretability are paramount, making it a reliable choice for applications that require secure and accurate external system integrations.
Link to the doc: Cohere Command R+ Function Calling
Function calling, often referred to as Single-Step Tool Use, is a key capability of Command R+ that allows the system to interact directly with external tools like APIs, databases, or search engines in a structured and dynamic manner. The model makes intelligent decisions about which tool to use and what parameters to pass, simplifying the interaction with external systems and APIs.
This capability is central to many advanced use cases because it enables the model to perform tasks that require retrieving or manipulating external data, rather than relying solely on its pre-trained knowledge.
Command R+ utilizes function calling by making two key inferences:
Command R+ has been specifically trained to handle this functionality using a specialized prompt template. This ensures that the model can consistently deliver high-quality results when interacting with external tools. Deviating from the recommended template may reduce the performance of the function calling feature.
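Because the model infers both the tool and its parameters, an application may want to check that proposal before executing it. The sketch below is one possible guard, not part of Cohere's API: TOOL_SPECS, search_docs, and validate_call are hypothetical names introduced for illustration.

```python
# Hypothetical tool catalogue: each tool lists its parameters and whether they are required.
TOOL_SPECS = {
    "search_docs": {
        "query": {"type": str, "required": True},
        "limit": {"type": int, "required": False},
    },
}

def validate_call(name: str, params: dict) -> list:
    """Return a list of problems with a proposed tool call (empty means valid)."""
    if name not in TOOL_SPECS:
        return [f"unknown tool: {name}"]
    spec, problems = TOOL_SPECS[name], []
    # First pass: every required parameter must be present.
    for key, rule in spec.items():
        if rule["required"] and key not in params:
            problems.append(f"missing parameter: {key}")
    # Second pass: every supplied parameter must be known and correctly typed.
    for key, value in params.items():
        if key not in spec:
            problems.append(f"unexpected parameter: {key}")
        elif not isinstance(value, spec[key]["type"]):
            problems.append(f"wrong type for {key}")
    return problems

print(validate_call("search_docs", {"query": "refund policy", "limit": 3}))
print(validate_call("search_docs", {"limit": "three"}))
```

Returning the problems as text makes it easy to feed them back to the model and ask for a corrected call.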
Link to the doc: Mistral Large 2 Function Calling
Mistral Large 2, an advanced language model with 123 billion parameters, excels in generating code, solving mathematical problems, and handling multilingual tasks. One of its most powerful features is enhanced function calling, which allows it to execute complex, multi-step processes both in parallel and sequentially. Function calling refers to the model’s ability to dynamically interact with external tools, APIs, or other models to retrieve or process data based on specific user instructions. This capability significantly extends its application across various fields, making it a versatile solution for advanced computational and business applications.
Mistral Large 2 has been trained to handle intricate function calls by leveraging both its reasoning skills and its capability to integrate with external processes. Whether it’s calculating complex equations, generating real-time reports, or interacting with APIs to fetch live data, the model’s robust function calling can coordinate tasks that demand high-level problem-solving. The model excels at determining when to call specific functions and how to sequence them for optimal results, whether through parallelization or sequential steps.
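The difference between parallel and sequential function calls can be illustrated on the application side. In the sketch below, get_temperature and compare are hypothetical tools with canned data; the two lookups are independent and run in parallel, while the comparison depends on both results and therefore runs afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools with canned data so the sketch runs offline.
def get_temperature(city: str) -> float:
    return {"Paris": 18.0, "Rome": 24.0}[city]

def compare(a: float, b: float) -> str:
    return "first is warmer" if a > b else "second is warmer or equal"

# Parallel phase: the two lookups do not depend on each other.
cities = ["Paris", "Rome"]
with ThreadPoolExecutor() as pool:
    temps = list(pool.map(get_temperature, cities))  # map preserves input order

# Sequential phase: this call needs both earlier results.
verdict = compare(*temps)
print(temps, verdict)
```

Threads suit this case because real tool calls are usually I/O-bound network requests.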
LLaMA 3.2, developed by Meta, stands out for its open-source accessibility and its introduction of function calling, making it a powerful tool for developers who require flexibility and customization. This version has not been commercialized as widely as other AI models, but its emphasis on adaptability makes it well suited to teams with strong development resources, especially in research and AI experimentation contexts.
As of now, LLaMA 3.2 benchmarks are still in development and haven’t been fully tested, so we’re awaiting comprehensive comparisons to models like GPT-4o. However, its introduction is an exciting leap in function-based AI interaction and flexibility, bringing new opportunities for experimentation and custom solutions.
To integrate function calling into your application, follow these steps:
The following script manages a conversation with the GPT-4o model, using function calling to obtain weather data when needed.
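import json
import os

import requests
from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()


def get_current_weather(latitude, longitude):
    """Get the current weather in a given latitude and longitude"""
    base = "https://api.openweathermap.org/data/2.5/weather"
    # Read the OpenWeatherMap key from the environment instead of hard-coding it.
    # OPENWEATHER_API_KEY is an assumed variable name; use whatever your setup defines.
    key = os.environ["OPENWEATHER_API_KEY"]
    request_url = f"{base}?lat={latitude}&lon={longitude}&appid={key}&units=metric"
    response = requests.get(request_url, timeout=10)
    # Merge the coordinates with the "main" block of the API response.
    result = {
        "latitude": latitude,
        "longitude": longitude,
        **response.json()["main"]
    }
    return json.dumps(result)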
def run_conversation(content):
    messages = [{"role": "user", "content": content}]
    # Describe the available tool so the model knows its name and parameters.
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given latitude and longitude",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "latitude": {
                            "type": "string",
                            "description": "The latitude of a place",
                        },
                        "longitude": {
                            "type": "string",
                            "description": "The longitude of a place",
                        },
                    },
                    "required": ["latitude", "longitude"],
                },
            },
        }
    ]
    # First request: the model decides whether to call a tool.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls

    if tool_calls:
        # Keep the assistant's tool-call message in the history.
        messages.append(response_message)

        available_functions = {
            "get_current_weather": get_current_weather,
        }
        for tool_call in tool_calls:
            print(f"Function: {tool_call.function.name}")
            print(f"Params:{tool_call.function.arguments}")
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                latitude=function_args.get("latitude"),
                longitude=function_args.get("longitude"),
            )
            print(f"API: {function_response}")
            # Return each tool result to the model, linked by tool_call_id.
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )

        # Second request: the model writes the final answer from the tool output.
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            stream=True
        )
        return second_response
    # If the model answered without calling a tool, nothing is returned (None).
if __name__ == "__main__":
    question = "What's the weather like in Paris and San Francisco?"
    response = run_conversation(question)
    # run_conversation returns None when the model calls no tool, so guard the loop.
    for chunk in response or []:
        print(chunk.choices[0].delta.content or "", end='', flush=True)
The run_conversation function takes a user’s input as its argument and starts a conversation by creating a message representing the user’s role and content. This initiates the chat flow where the user’s message is the first interaction.
A list of tools is defined, and one such tool is a function called get_current_weather. This function is described as retrieving the current weather based on the provided latitude and longitude coordinates. The parameters for this function are clearly specified, including that both latitude and longitude are required inputs.
The function then calls the GPT-4o model to generate a response based on the user’s message. The model has access to the tools (such as get_current_weather), and it automatically decides whether to use any of them. The response from the model may include tool calls, which are captured for further processing.
If the model decides to invoke a tool, the tool calls are processed. The function retrieves the appropriate tool (in this case, the get_current_weather function), extracts the parameters (latitude and longitude), and calls the function to get the weather information. The result from this function is then printed and appended to the conversation as a response from the tool.
After the tool’s output is integrated into the conversation, a second request is sent to the GPT-4o model to generate a new response enriched with the tool’s output. This second response is streamed and returned as the function’s final output.
Output
if __name__ == "__main__":
    question = "What's the weather like in Delhi?"
    response = run_conversation(question)
    # run_conversation returns None when the model calls no tool, so guard the loop.
    for chunk in response or []:
        print(chunk.choices[0].delta.content or "", end='', flush=True)
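The loop above prints each streamed chunk as it arrives. When the full reply is needed as a single string (for logging or further processing, say), the deltas can be collected instead. The sketch below assumes chunks shaped like those in the script, with choices[0].delta.content that may be None; collect_stream and _fake_chunk are hypothetical helpers, and the chunks are simulated so the example runs without an API call.

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        parts.append(chunk.choices[0].delta.content or "")  # content can be None
    return "".join(parts)

def _fake_chunk(text):
    """Build an object shaped like a streamed chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

simulated = [_fake_chunk("It is "), _fake_chunk(None), _fake_chunk("sunny.")]
print(collect_stream(simulated))
```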
This radar chart visualizes the performance of several AI language models based on different functional metrics. The models are:
This radar chart compares the performance of different models on function calling (FC) across several tasks. Here’s a brief breakdown of how they perform:
The function-calling (FC) aspect refers to how well these models can handle structured tasks, execute commands, or interact functionally. GPT-4o, Gemini 1.5, and Claude 3.5 generally lead across most metrics, with GPT-4o often taking the top spot. These models excel in accuracy and structured summaries (both live and non-live). Command R+ performs decently, particularly in summary tasks, but isn’t as dominant in overall accuracy.
Meta-LLaMA and Mistral Large are competent but fall behind in critical areas like hallucinations and multi-turn summaries, making them less reliable for function-calling tasks compared to GPT-4o and Claude.
In terms of human-like performance in function-calling, GPT-4o is clearly in the lead, as it balances well across all metrics, making it a great choice for tasks requiring accuracy and minimal hallucination. However, Claude 3.5 and Meta-LLaMA may have a slight advantage for specific tasks like Live Summaries.
Function calling enhances the capabilities of AI agents by allowing them to integrate specific, real-world functionality that they may not inherently possess. Here’s how the two are linked:
Imagine a customer support AI agent for an e-commerce platform. When a customer asks about their order status, the AI agent could:
In this scenario, the AI agent uses function calling to access external systems to provide a meaningful, goal-driven interaction, which it couldn’t achieve with just basic language processing.
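The order-status scenario can be sketched as a single tool that the agent exposes to the model. Everything in the example is hypothetical: ORDERS stands in for the e-commerce backend, get_order_status is the tool, and the JSON string passed to handle_tool_call simulates what a model might emit for "Where is my order A1001?".

```python
import json

# Hypothetical in-memory order store standing in for an e-commerce backend.
ORDERS = {"A1001": {"status": "shipped", "eta_days": 2}}

def get_order_status(order_id: str) -> dict:
    """Hypothetical tool: look up an order and report its status."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"order_id": order_id, "error": "order not found"}
    return {"order_id": order_id, **order}

def handle_tool_call(call_json: str) -> str:
    """Execute a tool call that the model emitted as JSON and return JSON for the model."""
    call = json.loads(call_json)
    if call["name"] != "get_order_status":
        return json.dumps({"error": f"unsupported tool: {call['name']}"})
    return json.dumps(get_order_status(**call["arguments"]))

# Simulated model output for the customer's question.
reply = handle_tool_call('{"name": "get_order_status", "arguments": {"order_id": "A1001"}}')
print(reply)
```

Returning an error object instead of raising lets the model explain the problem to the customer in its own words.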
In summary, function calling serves as a powerful tool that extends the abilities of AI agents. While the agent provides decision-making and goal-oriented actions, function calling enables the agent to interface with external functions or systems, adding real-world interactivity and specialized task execution. This synergy between AI agents and function calling leads to more robust and capable AI-driven systems.
Function calling in LLMs is essential for applications requiring real-time data access and dynamic interaction with external systems. The top LLMs (OpenAI GPT-4o, Gemini 1.5 Flash, Anthropic Claude Sonnet 3.5, Cohere Command R+, Mistral Large 2, and Meta LLaMA 3.2) each offer distinct advantages depending on the use case. Whether the focus is on enterprise workflows, lightweight mobile applications, or AI safety, these models are paving the way for more accurate, reliable, and interactive AI agents that can automate tasks, reduce hallucinations, and provide meaningful real-time insights.
Ans. Function calling allows large language models (LLMs) to interact with external systems, APIs, or tools to perform real-world tasks beyond text generation.
Ans. Function calling enhances accuracy by enabling LLMs to retrieve real-time data, execute tasks, and make informed decisions through external tools.
Ans. Top LLMs with function calling include OpenAI’s GPT-4o, Gemini 1.5 Flash, Anthropic Claude Sonnet 3.5, Cohere Command R+, Mistral Large 2, and Meta LLaMA 3.2.
Ans. Use cases include real-time data retrieval, automated workflows, scheduling, weather forecasting, and API-based tasks like stock or product updates.
Ans. It allows AI agents to perform tasks that require external data or actions autonomously, enhancing their efficiency and decision-making in dynamic environments.