How OpenAI Swarm Enhances Multi-Agent Collaboration?

Badrinarayan M Last Updated : 06 Nov, 2024
12 min read

OpenAI’s Swarm framework is designed to create a user-friendly and flexible environment for coordinating multiple agents. It is primarily intended for educational and experimental use, and OpenAI advises against using it in production settings, but it is a framework worth exploring. Its core purpose is to demonstrate the concepts of “handoffs” and “routines,” patterns that help agents collaborate efficiently. Swarm is best read as a sample library for exploring these patterns rather than as a finished product. Let’s dive into what routines and handoffs are and how they play a role in orchestrating agent behaviour.


Overview

  1. OpenAI Swarm is a framework designed for coordinating multiple agents through routines and handoffs.
  2. It offers a user-friendly environment ideal for educational and experimental purposes.
  3. Swarm is not intended for production use but serves as a learning tool for multi-agent orchestration.
  4. The framework helps developers understand agent collaboration patterns, enhancing flexibility and task execution.
  5. Swarm emphasizes seamless agent interaction, making complex systems manageable without a steep learning curve.
  6. Through practical examples, Swarm illustrates how routines and handoffs can streamline agent behaviour and coordination.

What is OpenAI Swarm?

OpenAI has bundled these ideas into a sample library called Swarm, designed as a proof of concept. While Swarm is not meant for production use, it serves as a great starting point for experimentation, offering ideas and code you can build upon to create your own systems.

Swarm focuses on making agent coordination and task execution lightweight, easy to control, and simple to test. It does this by relying on two core concepts: Agents and handoffs. An Agent represents a set of instructions and tools, and at any point, it can hand off a conversation to another Agent.

These core abstractions are powerful enough to model complex interactions between tools and networks of agents. This makes building scalable, real-world systems possible without facing a steep learning curve.

Why Use OpenAI Swarm?

OpenAI Swarm explores lightweight, scalable, and inherently customizable patterns. It’s ideal for scenarios involving many independent tasks and instructions, which are hard to capture in a single prompt.

The Assistants API might be a better fit for developers looking for fully hosted solutions with built-in memory management. However, Swarm is a fantastic educational resource for those who want to dive into the mechanics of multi-agent orchestration. Like the Chat Completions API, Swarm runs mostly on the client and doesn’t store state between calls, so the caller keeps the conversation history. That makes it an effective tool for learning and experimenting.
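
To make the “doesn’t store state between calls” point concrete, here is a minimal sketch of what it means for the caller. The function fake_run is a hypothetical stand-in for a stateless call such as client.run (it is not part of Swarm and makes no API request); the only point is that the caller owns the history and resends it on every turn.

```python
def fake_run(messages: list[dict]) -> list[dict]:
    """Hypothetical stand-in for a stateless call; it only sees what it is given."""
    last_user = messages[-1]["content"]
    reply = f"[history length {len(messages)}] You said: {last_user}"
    return [{"role": "assistant", "content": reply}]


history: list[dict] = []  # the caller, not the service, stores the conversation
for text in ["Hello", "What did I just say?"]:
    history.append({"role": "user", "content": text})
    new_messages = fake_run(history)  # the full history is passed in every time
    history.extend(new_messages)

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Because nothing is remembered on the other side, resetting or trimming history is all it takes to restart or shorten a conversation.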

Example using OpenAI Swarm Framework

This code demonstrates how OpenAI’s Swarm framework can make agent collaboration fun, flexible, and dynamic. Let’s dive into what’s happening here!

Setting the Stage

First, we import the essentials:

from swarm import Swarm, Agent
client = Swarm()

This creates the Swarm client, which orchestrates the interactions between our agents. Think of it as the mastermind behind the scenes, ensuring the agents do their thing.

The Agents Take the Stage

Next, we define a simple yet crucial function:

def transfer_to_agent_b():
   return agent_b

This function is the handoff mechanic. It allows Agent A to politely pass the conversation to Agent B when the time is right.

Now, let’s meet the agents:

agent_a = Agent(
   name="Agent A",
   instructions="You are a helpful agent.",
   functions=[transfer_to_agent_b],
)

agent_b = Agent(
   name="Agent B",
   instructions="Only speak in Haikus.",
)

Agent A is your friendly helper—always ready to assist but also smart enough to know when it’s time to bring in a colleague. Agent B is a bit more poetic and mysterious, only communicating in the elegant form of haikus.

Now, we bring it all together:

response = client.run(
   agent=agent_a,
   messages=[{"role": "user", "content": "I want to talk to agent B."}],
)

print(response.messages[-1]["content"])

This starts a conversation with Agent A, but the user requests a chat with Agent B. Thanks to the function transfer_to_agent_b, Agent A recognizes that it’s time to step aside and lets Agent B take over. Agent B, true to form, will respond in haikus, adding a creative twist to the interaction!

Output


Building a Complex Customer Service Multi-Agent System

We will approach this by first understanding how routines and handoffs work in Swarm.

Importing our dependencies

from openai import OpenAI
from pydantic import BaseModel
from typing import Optional
import json

client = OpenAI()

Routines

A “routine” isn’t rigidly defined but instead captures the idea of a sequence of actions. Think of it as a set of natural language instructions (provided via a system prompt) and the tools needed to carry them out.

Figure: OpenAI Swarm Routines (Source: Author)

Let’s break it down with an example: 

Imagine building a customer service agent that helps users solve their problems. The agent follows these steps:

  1. Gather information: First, the agent asks the user about the issue they’re facing.
  2. Ask for more details (if needed): If the agent needs more information to understand the problem, it asks follow-up questions.
  3. Provide a solution: Based on the information, the agent suggests a solution to fix the issue.
  4. Offer a refund (if needed): If the user is still not satisfied, the agent offers a refund.
  5. Process the refund: If the refund is accepted, the agent will find the relevant ID and complete the refund process.

This step-by-step process helps the agent efficiently resolve user issues while ensuring the user is satisfied.

The real power of routines lies in their simplicity and adaptability. Notice how the steps are conditional, much like branches in a state machine. But routines go a step further: because the large language model (LLM) follows them with “soft” adherence, it can steer the conversation naturally instead of getting stuck in a dead end, which makes routines highly effective for small and medium tasks.
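
Since a routine is ultimately just natural language instructions, the five steps above can be sketched as a numbered system prompt. The helper build_routine_prompt and the exact wording of the steps are illustrative assumptions, not part of Swarm; the sketch only shows one way to keep the steps as data and render them into a prompt.

```python
def build_routine_prompt(role: str, steps: list[str]) -> str:
    """Render a role line plus numbered routine steps as one system prompt."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{role}\nFollow this routine with the user:\n{numbered}"


# Hypothetical wording for the customer service steps listed above
steps = [
    "Ask the user about the issue they are facing.",
    "Ask follow-up questions if more detail is needed.",
    "Propose a solution.",
    "ONLY if the user is not satisfied, offer a refund.",
    "If the refund is accepted, look up the item ID and execute the refund.",
]

prompt = build_routine_prompt("You are a customer support agent for ACME Inc.", steps)
print(prompt)
```

Keeping the steps in a list makes it easy to add, remove, or reorder them while experimenting with how closely the model follows the routine.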

Here’s the GitHub Link to Swarm.

Executing Routines

Start with a basic loop: gather user input, append the message to the conversation history, call the model, and then append the model’s response back to the history.

# Placeholder prompt so this first loop runs on its own;
# the full customer service routine is defined in the next listing.
system_message = "You are a helpful assistant."

def run_full_turn(system_message, messages):
   response = client.chat.completions.create(
       model="gpt-4o-mini",
       messages=[{"role": "system", "content": system_message}] + messages,
   )
   message = response.choices[0].message
   messages.append(message)

   if message.content:
       print("Assistant:", message.content)

   return message

messages = []
while True:
   user = input("User: ")
   messages.append({"role": "user", "content": user})

   run_full_turn(system_message, messages)

Since we haven’t integrated function calls yet, we need to add that next. Functions should be formatted as function schemas according to the model’s specifications. To make this easier, we can create a helper function that converts Python functions into the correct schema format.

import inspect

def function_to_schema(func) -> dict:
   type_map = {
       str: "string",
       int: "integer",
       float: "number",
       bool: "boolean",
       list: "array",
       dict: "object",
       type(None): "null",
   }


   try:
       signature = inspect.signature(func)
   except ValueError as e:
       raise ValueError(
           f"Failed to get signature for function {func.__name__}: {str(e)}"
       )

   # Map each parameter's annotation to a JSON Schema type;
   # unannotated or unrecognised annotations fall back to "string".
   parameters = {}
   for param in signature.parameters.values():
       param_type = type_map.get(param.annotation, "string")
       parameters[param.name] = {"type": param_type}

   # Parameters without a default value are required.
   required = [
       param.name
       for param in signature.parameters.values()
       if param.default is inspect.Parameter.empty
   ]

   return {
       "type": "function",
       "function": {
           "name": func.__name__,
           "description": (func.__doc__ or "").strip(),
           "parameters": {
               "type": "object",
               "properties": parameters,
               "required": required,
           },
       },
   }
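
To see what the helper above produces, it helps to run it on a small function. The sketch below restates the helper in condensed form so that it runs on its own, then applies it to add_to_cart, a hypothetical tool invented purely for illustration.

```python
import inspect
import json

TYPE_MAP = {
    str: "string", int: "integer", float: "number", bool: "boolean",
    list: "array", dict: "object", type(None): "null",
}


def function_to_schema(func) -> dict:
    """Condensed restatement of the helper above so this snippet is self-contained."""
    params = inspect.signature(func).parameters
    properties = {
        name: {"type": TYPE_MAP.get(param.annotation, "string")}
        for name, param in params.items()
    }
    required = [
        name for name, param in params.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def add_to_cart(item_id: str, quantity: int = 1):
    """Hypothetical tool: add an item to the shopping cart."""
    return "ok"


schema = function_to_schema(add_to_cart)
print(json.dumps(schema, indent=2))
```

Note that quantity has a default value, so only item_id ends up in the required list, and the docstring becomes the description the model sees.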

Now, when the model triggers a tool, we need to run the appropriate function and return the result. This can be done by mapping tool names to Python functions in a dictionary (the tools_map in the code below) and looking up each requested tool by name.

In practice, we should also let the model react to the result of each tool call. This process can repeat as long as there are more tool calls, since the model’s response may prompt another one. Here’s how the loop looks when we bring everything together:

# Customer Service Routine

system_message = (
   "You are a customer support agent for ACME Inc. "
   "Always answer in a sentence or less. "
   "Follow the following routine with the user:\n"
   "1. First, ask probing questions and understand the user's problem deeper.\n"
   " - unless the user has already provided a reason.\n"
   "2. Propose a fix (make one up).\n"
   "3. ONLY if not satisfied, offer a refund.\n"
   "4. If accepted, search for the ID and then execute refund."
)

def look_up_item(search_query):
   """Use to find item ID.
   Search query can be a description or keywords."""

   # return hard-coded item ID - in reality would be a lookup
   return "item_132612938"

def execute_refund(item_id, reason="not provided"):

   print("Summary:", item_id, reason) # lazy summary
   return "success"

tools = [execute_refund, look_up_item]

def run_full_turn(system_message, tools, messages):

   num_init_messages = len(messages)
   messages = messages.copy()

   while True:

       # turn python functions into tools and save a reverse map
       tool_schemas = [function_to_schema(tool) for tool in tools]
       tools_map = {tool.__name__: tool for tool in tools}

       # === 1. get openai completion ===
       response = client.chat.completions.create(
           model="gpt-4o-mini",
           messages=[{"role": "system", "content": system_message}] + messages,
           tools=tool_schemas or None,
       )
       message = response.choices[0].message
       messages.append(message)

       if message.content:  # print assistant response
           print("Assistant:", message.content)

       if not message.tool_calls:  # if finished handling tool calls, break
           break

       # === 2. handle tool calls ===

       for tool_call in message.tool_calls:
           result = execute_tool_call(tool_call, tools_map)

           result_message = {
               "role": "tool",
               "tool_call_id": tool_call.id,
               "content": result,
           }
           messages.append(result_message)

   # ==== 3. return new messages =====
   return messages[num_init_messages:]

def execute_tool_call(tool_call, tools_map):
   name = tool_call.function.name
   args = json.loads(tool_call.function.arguments)

   print(f"Assistant: {name}({args})")

   # call corresponding function with provided arguments
   return tools_map[name](**args)

messages = []
while True:
   user = input("User: ")
   messages.append({"role": "user", "content": user})


   new_messages = run_full_turn(system_message, tools, messages)
   messages.extend(new_messages)
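
The tool-dispatch step in the loop above can be tried without calling the API. The sketch below builds a stand-in tool call with SimpleNamespace that mimics only the attributes the loop reads (id, function.name, function.arguments). The make_fake_call helper and the unknown-tool guard are assumptions added for illustration; the listing above simply indexes tools_map and would raise KeyError for an unexpected name.

```python
import json
from types import SimpleNamespace


def look_up_item(search_query):
    """Use to find item ID."""
    return "item_132612938"  # hard-coded, as in the listing above


def execute_tool_call(tool_call, tools_map):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    if name not in tools_map:
        # Assumption: report the problem to the model rather than raising KeyError
        return f"Error: unknown tool {name}"
    return tools_map[name](**args)


def make_fake_call(name, args, call_id="call_demo"):
    """Hypothetical helper: mimic the shape of a tool call returned by the API."""
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )


tools_map = {look_up_item.__name__: look_up_item}

call = make_fake_call("look_up_item", {"search_query": "black boot"})
result = execute_tool_call(call, tools_map)
tool_message = {"role": "tool", "tool_call_id": call.id, "content": result}
print(tool_message)

unknown = execute_tool_call(make_fake_call("delete_everything", {}), tools_map)
print(unknown)
```

Returning an error string for an unknown tool gives the model a chance to correct itself on the next iteration, which is one reasonable choice among several.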

Once the basic routine is up and running, we can consider adding more steps and tools. By loading the necessary tools and processes, we can expand routines to handle different kinds of user requests. However, as we try to stretch routines across too many tasks, they may begin to falter.

That’s where the concept of multiple routines comes in handy. We can switch to the appropriate routine with the right tools to handle different user requests. At first, dynamically changing tools and instructions might feel complex. But if we think of routines as individual “agents,” the concept of handoffs makes this easier—one agent can simply pass the conversation to another, keeping the workflow seamless.


Handoffs in the OpenAI Swarm Framework

Similar to being transferred to another representative during a phone call, a “handoff” in the Swarm framework happens when one agent (or routine) passes an ongoing conversation to another. But unlike real-life handoffs, these agents are fully aware of your previous interactions, ensuring a smooth transition!

Figure: Handoffs in the OpenAI Swarm Framework (Source: Author)

To implement handoffs in code, we first need to define a class for an Agent. This will allow agents to manage conversations and transfer them when necessary.

class Agent(BaseModel):
   name: str = "Agent"
   model: str = "gpt-4o-mini"
   instructions: str = "You are a helpful Agent"
   tools: list = []
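
For readers who want to experiment without pydantic, a standard-library dataclass can play the same role. This is a sketch of an equivalent, not the class the article uses. One difference worth noting: to my understanding pydantic copies mutable defaults such as the empty tools list for each instance, whereas a dataclass needs default_factory to get a fresh list per agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Dataclass stand-in for the pydantic Agent model above."""
    name: str = "Agent"
    model: str = "gpt-4o-mini"
    instructions: str = "You are a helpful Agent"
    tools: list[Callable] = field(default_factory=list)  # fresh list per instance


def say_hello():
    """Hypothetical tool used only to show that tool lists are not shared."""
    return "hello"


agent_a = Agent(name="Agent A")
agent_b = Agent(name="Agent B")
agent_a.tools.append(say_hello)

print(agent_a.tools)  # contains say_hello
print(agent_b.tools)  # still empty
```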

Next, we’ll modify the existing routine code to support agents. Instead of passing a system_message and tools directly into the run_full_turn function, we’ll have it accept an Agent object instead.

def run_full_turn(agent, messages):

   num_init_messages = len(messages)
   messages = messages.copy()

   while True:

       # turn python functions into tools and save a reverse map
       tool_schemas = [function_to_schema(tool) for tool in agent.tools]
       tools_map = {tool.__name__: tool for tool in agent.tools}

       # === 1. get openai completion ===
       response = client.chat.completions.create(
           model=agent.model,
           messages=[{"role": "system", "content": agent.instructions}] + messages,
           tools=tool_schemas or None,
       )
       message = response.choices[0].message
       messages.append(message)

       if message.content:  # print assistant response
           print("Assistant:", message.content)

       if not message.tool_calls:  # if finished handling tool calls, break
           break

       # === 2. handle tool calls ===

       for tool_call in message.tool_calls:
           result = execute_tool_call(tool_call, tools_map)

           result_message = {
               "role": "tool",
               "tool_call_id": tool_call.id,
               "content": result,
           }
           messages.append(result_message)

   # ==== 3. return new messages =====
   return messages[num_init_messages:]

def execute_tool_call(tool_call, tools_map):
   name = tool_call.function.name
   args = json.loads(tool_call.function.arguments)

   print(f"Assistant: {name}({args})")

   # call corresponding function with provided arguments
   return tools_map[name](**args)

With this setup, running multiple agents becomes straightforward:

def execute_refund(item_name):
   return "success"


refund_agent = Agent(
   name="Refund Agent",
   instructions="You are a refund agent. Help the user with refunds.",
   tools=[execute_refund],
)


def place_order(item_name):
   return "success"


sales_assistant = Agent(
   name="Sales Assistant",
   instructions="You are a sales assistant. Sell the user a product.",
   tools=[place_order],
)

messages = []
user_query = "Place an order for a black boot."
print("User:", user_query)
messages.append({"role": "user", "content": user_query})


response = run_full_turn(sales_assistant, messages) # sales assistant
messages.extend(response)

user_query = "Actually, I want a refund." # implicitly refers to the last item
print("User:", user_query)
messages.append({"role": "user", "content": user_query})
response = run_full_turn(refund_agent, messages) # refund agent

In this example, handoffs are performed manually, but ideally, we want agents to pass tasks between each other automatically. A simple way to achieve this is through function calling. Each agent can invoke a specific handoff function, like transfer_to_xxx, to smoothly hand over the conversation to the next agent in line.

This method allows agents to handle conversations seamlessly, without manual intervention!

Handoff Functions

Now that an agent can signal its intention to transfer a conversation, we need to implement the actual handoff. There are several ways to do this, but one approach is particularly clean.

So far, we’ve been returning strings from our agent functions, such as execute_refund or place_order. But what if we return an Agent object when it’s time to transfer instead of just returning a string? For example:

refund_agent = Agent(
   name="Refund Agent",
   instructions="You are a refund agent. Help the user with refunds.",
   tools=[execute_refund],
)

def transfer_to_refunds():
   return refund_agent

sales_assistant = Agent(
   name="Sales Assistant",
   instructions="You are a sales assistant. Sell the user a product.",
   tools=[place_order, transfer_to_refunds],  # include the handoff so the model can call it
)

Now, let’s update the run_full_turn function to accommodate this kind of handoff:

class Response(BaseModel):
   # result of one turn: the agent that finished it plus the new messages
   agent: Optional[Agent]
   messages: list

def run_full_turn(agent, messages):

   current_agent = agent
   num_init_messages = len(messages)
   messages = messages.copy()

   while True:

       # turn python functions into tools and save a reverse map
       tool_schemas = [function_to_schema(tool) for tool in current_agent.tools]
       tools = {tool.__name__: tool for tool in current_agent.tools}

       # === 1. get openai completion ===
       response = client.chat.completions.create(
           model=current_agent.model,  # use the active agent's model after a handoff
           messages=[{"role": "system", "content": current_agent.instructions}]
           + messages,
           tools=tool_schemas or None,
       )
       message = response.choices[0].message
       messages.append(message)

       if message.content:  # print agent response
           print(f"{current_agent.name}:", message.content)

       if not message.tool_calls:  # if finished handling tool calls, break
           break

       # === 2. handle tool calls ===

       for tool_call in message.tool_calls:
           result = execute_tool_call(tool_call, tools, current_agent.name)

           if isinstance(result, Agent):  # if agent transfer, update current agent
               current_agent = result
               result = (
                   f"Transferred to {current_agent.name}. Adopt persona immediately."
               )

           result_message = {
               "role": "tool",
               "tool_call_id": tool_call.id,
               "content": result,
           }
           messages.append(result_message)

   # ==== 3. return last agent used and new messages =====
   return Response(agent=current_agent, messages=messages[num_init_messages:])

def execute_tool_call(tool_call, tools, agent_name):
   name = tool_call.function.name
   args = json.loads(tool_call.function.arguments)

   print(f"{agent_name}:", f"{name}({args})")

   return tools[name](**args)  # call corresponding function with provided arguments
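
The handoff logic can also be traced offline. The sketch below swaps the API call for fake_model, a hypothetical scripted stand-in whose behaviour is hard-coded (the sales assistant always asks for a transfer, every other agent answers in text), and uses plain dicts and a dataclass instead of the API objects and pydantic model. It is a simplified illustration of the control flow, not the article’s implementation.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    instructions: str = ""
    tools: list = field(default_factory=list)


refund_agent = Agent(name="Refund Agent")


def transfer_to_refunds():
    """Hand the conversation to the refund agent."""
    return refund_agent


sales_assistant = Agent(name="Sales Assistant", tools=[transfer_to_refunds])


def fake_model(agent, messages):
    """Hypothetical scripted model: no API call, behaviour is hard-coded."""
    if agent.name == "Sales Assistant" and messages[-1]["role"] == "user":
        call = {"id": "call_1", "name": "transfer_to_refunds", "arguments": "{}"}
        return {"role": "assistant", "content": None, "tool_calls": [call]}
    text = f"{agent.name} here, how can I help?"
    return {"role": "assistant", "content": text, "tool_calls": []}


def run_full_turn(agent, messages):
    current_agent = agent
    num_init_messages = len(messages)
    messages = messages.copy()
    while True:
        tools = {tool.__name__: tool for tool in current_agent.tools}
        message = fake_model(current_agent, messages)
        messages.append(message)
        if not message["tool_calls"]:
            break
        for call in message["tool_calls"]:
            result = tools[call["name"]](**json.loads(call["arguments"]))
            if isinstance(result, Agent):  # a returned Agent signals a handoff
                current_agent = result
                result = f"Transferred to {current_agent.name}."
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return current_agent, messages[num_init_messages:]


final_agent, new_messages = run_full_turn(
    sales_assistant, [{"role": "user", "content": "I want a refund."}]
)
print(final_agent.name)
print(new_messages[-1]["content"])
```

The turn starts with the sales assistant, the tool result swaps current_agent, and the next loop iteration already runs with the refund agent’s tools and instructions.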

Let’s take a look at an example where multiple agents are involved, allowing them to transfer tasks between one another:

def escalate_to_human(summary):
   """Only call this if explicitly asked to."""
   print("Escalating to human agent...")
   print("\n=== Escalation Report ===")
   print(f"Summary: {summary}")
   print("=========================\n")
   exit()

def transfer_to_sales_agent():
   """Use for anything sales or buying related."""
   return sales_agent

def transfer_to_issues_and_repairs():
   """Use for issues, repairs, or refunds."""
   return issues_and_repairs_agent

def transfer_back_to_triage():
   """Call this if the user brings up a topic outside of your purview,
   including escalating to human."""
   return triage_agent

triage_agent = Agent(
   name="Triage Agent",
   instructions=(
       "You are a customer service bot for ACME Inc. "
       "Introduce yourself. Always be very brief. "
       "Gather information to direct the customer to the right department. "
       "But make your questions subtle and natural."
   ),
   tools=[transfer_to_sales_agent, transfer_to_issues_and_repairs, escalate_to_human],
)

def execute_order(product, price: int):
   """Price should be in USD."""
   print("\n\n=== Order Summary ===")
   print(f"Product: {product}")
   print(f"Price: ${price}")
   print("=================\n")
   confirm = input("Confirm order? y/n: ").strip().lower()
   if confirm == "y":
       print("Order execution successful!")
       return "Success"
   else:
       print("Order cancelled!")
       return "User cancelled order."

sales_agent = Agent(
   name="Sales Agent",
   instructions=(
       "You are a sales agent for ACME Inc. "
       "Always answer in a sentence or less. "
       "Follow the following routine with the user:\n"
       "1. Ask them about any problems in their life related to catching roadrunners.\n"
       "2. Casually mention one of ACME's crazy made-up products can help.\n"
       " - Don't mention price.\n"
       "3. Once the user is bought in, drop a ridiculous price.\n"
       "4. Only after everything, and if the user says yes, "
       "tell them a crazy caveat and execute their order.\n"
   ),
   tools=[execute_order, transfer_back_to_triage],
)

def look_up_item(search_query):
   """Use to find item ID.
   Search query can be a description or keywords."""
   item_id = "item_132612938"
   print("Found item:", item_id)
   return item_id

def execute_refund(item_id, reason="not provided"):
   print("\n\n=== Refund Summary ===")
   print(f"Item ID: {item_id}")
   print(f"Reason: {reason}")
   print("=================\n")
   print("Refund execution successful!")
   return "success"

issues_and_repairs_agent = Agent(
   name="Issues and Repairs Agent",
   instructions=(
       "You are a customer support agent for ACME Inc. "
       "Always answer in a sentence or less. "
       "Follow the following routine with the user:\n"
       "1. First, ask probing questions and understand the user's problem deeper.\n"
       " - unless the user has already provided a reason.\n"
       "2. Propose a fix (make one up).\n"
       "3. ONLY if not satisfied, offer a refund.\n"
       "4. If accepted, search for the ID and then execute refund."
   ),
   tools=[execute_refund, look_up_item, transfer_back_to_triage],
)

Finally, we can run this in a loop to see everything in action. Because the loop reads from input() and escalate_to_human calls exit(), it is best run from a separate Python file rather than a notebook:

agent = triage_agent
messages = []

while True:
   user = input("User: ")
   messages.append({"role": "user", "content": user})


   response = run_full_turn(agent, messages)
   agent = response.agent
   messages.extend(response.messages)

Using this method, agents can seamlessly hand off tasks to each other, enabling fluid transitions without extra complexity!
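
The same loop can be driven by a fixed list of user messages instead of input(), which is convenient for notebooks and quick tests. In this sketch, run_conversation and stub_turn are hypothetical names: stub_turn stands in for run_full_turn, uses plain strings as agents, and switches to “sales” whenever the user mentions buying, so no API call is made.

```python
from types import SimpleNamespace


def run_conversation(start_agent, user_inputs, run_turn):
    """Replay scripted user messages, carrying the active agent across turns."""
    agent, messages = start_agent, []
    for user in user_inputs:
        messages.append({"role": "user", "content": user})
        response = run_turn(agent, messages)
        agent = response.agent  # keep whichever agent ended the turn
        messages.extend(response.messages)
    return agent, messages


def stub_turn(agent, messages):
    """Hypothetical stand-in for run_full_turn; agents are plain strings here."""
    next_agent = "sales" if "buy" in messages[-1]["content"] else agent
    reply = {"role": "assistant", "content": f"[{next_agent}] noted"}
    return SimpleNamespace(agent=next_agent, messages=[reply])


final_agent, transcript = run_conversation(
    "triage", ["Hi", "I want to buy a rocket"], stub_turn
)
print(final_agent)
for message in transcript:
    print(f"{message['role']}: {message['content']}")
```

Passing the turn function in as an argument means the real run_full_turn can be swapped in later without changing the driver.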


Conclusion

The OpenAI Swarm framework provides an innovative approach to coordinating multiple agents in a dynamic and user-friendly manner. By focusing on the principles of routines and handoffs, Swarm facilitates seamless interactions between agents, allowing them to work collaboratively and adaptively to fulfil user requests.

This framework simplifies the management of agent behaviours and enhances the overall user experience by ensuring smooth transitions and continuity in conversations. With its lightweight and customizable architecture, Swarm serves as an excellent starting point for developers looking to explore multi-agent orchestration in their applications.

While it may not be suitable for production use, Swarm stands out as a valuable educational resource, inspiring developers to build their own systems and understand the intricacies of agent coordination. As you experiment with Swarm, you’ll discover new possibilities for creating engaging and responsive interactions in your projects. Whether for learning or experimentation, Swarm exemplifies how to harness the power of AI-driven agents to tackle complex tasks effectively.

Frequently Asked Questions

Q1. What is the primary purpose of OpenAI’s Swarm framework?

Ans. Swarm is designed to create a user-friendly and flexible environment for coordinating multiple agents. It aims to demonstrate concepts like “handoffs” and “routines,” enabling agents to collaborate effectively in educational and experimental settings.

Q2. Can Swarm be used in production applications?

Ans. OpenAI advises against using Swarm in production environments. While it is an excellent tool for learning and experimentation, it is not optimized for production use and may lack the robustness needed for real-world applications.

Q3. What are “routines” in the context of Swarm?

Ans. Routines refer to sequences of actions or natural language instructions that guide an agent’s behaviour. They allow agents to respond to user requests dynamically, adapting their responses based on the context and previous interactions.

Q4. How do handoffs work between agents in Swarm?

Ans. Handoffs occur when one agent transfers an ongoing conversation to another agent. This process is designed to be seamless, allowing the receiving agent to have access to prior interactions and ensuring a smooth transition for the user.

Q5. Is Swarm suitable for developers new to AI and multi-agent systems?

Ans. Yes! Swarm is an excellent educational resource for developers looking to learn about multi-agent orchestration. Its lightweight architecture and focus on core concepts make it accessible for those starting in AI and agent-based programming, offering a practical way to explore these ideas without a steep learning curve.

Data science Trainee at Analytics Vidhya, specializing in ML, DL and Gen AI. Dedicated to sharing insights through articles on these subjects. Eager to learn and contribute to the field's advancements. Passionate about leveraging data to solve complex problems and drive innovation.
