Guardrails in OpenAI Agent SDK: Ensuring Integrity in Educational Support Systems

Adarsh Balan Last Updated : 14 Mar, 2025

7 min read

With the release of OpenAI’s Agent SDK, developers now have a powerful tool to build intelligent systems. One crucial feature that stands out is Guardrails, which help maintain system integrity by filtering unwanted requests. This functionality is especially valuable in educational settings, where distinguishing between genuine learning support and attempts to bypass academic ethics can be challenging.

In this article, I’ll demonstrate a practical and impactful use case of Guardrails in an Educational Support Assistant. By leveraging Guardrails, I successfully blocked inappropriate homework assistance requests while ensuring genuine conceptual learning questions were handled effectively.

Learning Objectives

Understand the role of Guardrails in maintaining AI integrity by filtering inappropriate requests.
Explore the use of Guardrails in an Educational Support Assistant to prevent academic dishonesty.
Learn how input and output Guardrails function to block unwanted behavior in AI-driven systems.
Gain insights into implementing Guardrails using detection rules and tripwires.
Discover best practices for designing AI assistants that promote conceptual learning while ensuring ethical usage.

This article was published as a part of the Data Science Blogathon.

What is an Agent?
Understanding Guardrails
Use Case: Educational Support Assistant
Implementation Details
Conclusion
Frequently Asked Questions

What is an Agent?

An agent is a system that intelligently accomplishes tasks by combining various capabilities like reasoning, decision-making, and environment interaction. OpenAI’s new Agent SDK empowers developers to build these systems with ease, leveraging the latest advancements in large language models (LLMs) and robust integration tools.

Key Components of OpenAI’s Agent SDK

OpenAI’s Agent SDK provides essential tools for building, monitoring, and improving AI agents across key domains:

Models: Core intelligence for agents. Options include:
- o1 & o3-mini: Best for planning and complex reasoning.
- GPT-4.5: Excels in complex tasks with strong agentic capabilities.
- GPT-4o: Balances performance and speed.
- GPT-4o-mini: Optimized for low-latency tasks.
Tools: Enable interaction with the environment via:
- Function calling, web & file search, and computer control.
Knowledge & Memory: Supports dynamic learning with:
- Vector stores for semantic search.
- Embeddings for improved contextual understanding.
Guardrails: Ensure safety and control through:
- Moderation API for content filtering.
- Instruction hierarchy for predictable behavior.
Orchestration: Manages agent deployment with:
- Agent SDK for building & flow control.
- Tracing & evaluations for debugging and performance tuning.

Understanding Guardrails

Guardrails are designed to detect and halt unwanted behavior in conversational agents. They operate in two key stages:

Input Guardrails: Run before the agent processes the input. They can prevent misuse upfront, saving both computational cost and response time.
Output Guardrails: Run after the agent generates a response. They can filter harmful or inappropriate content before delivering the final response.

Both guardrails use tripwires, which trigger an exception when unwanted behavior is detected, instantly halting the agent’s execution.

Use Case: Educational Support Assistant

An Educational Support Assistant should foster learning while preventing misuse for direct homework answers. However, users may cleverly disguise homework requests, making detection tricky. Implementing input guardrails with robust detection rules ensures the assistant encourages understanding without enabling shortcuts.

Objective: Develop a customer support assistant that encourages learning but blocks requests seeking direct homework solutions.
Challenge: Users may disguise their homework queries as innocent requests, making detection difficult.
Solution: Implement an input guardrail with detailed detection rules for spotting disguised math homework questions.

Implementation Details

The guardrail leverages strict detection rules and smart heuristics to identify unwanted behavior.

Guardrail Logic

The guardrail follows these core rules:

Block explicit requests for solutions (e.g., “Solve 2x + 3 = 11”).
Block disguised requests using context clues (e.g., “I’m practicing algebra and stuck on this question”).
Block complex math concepts unless they are purely conceptual.
Allow legitimate conceptual explanations that promote learning.

Guardrail Code Implementation

(If running this, ensure you set the OPENAI_API_KEY environment variable):

Defining Enum Classes for Math Topic and Complexity

To categorize math queries, we define enumeration classes for topic types and complexity levels. These classes help in structuring the classification system.

from enum import Enum

class MathTopicType(str, Enum):
    ARITHMETIC = "arithmetic"
    ALGEBRA = "algebra"
    GEOMETRY = "geometry"
    CALCULUS = "calculus"
    STATISTICS = "statistics"
    OTHER = "other"

class MathComplexityLevel(str, Enum):
    BASIC = "basic"
    INTERMEDIATE = "intermediate"
    ADVANCED = "advanced"

Creating the Output Model Using Pydantic

We define a structured output model to store the classification details of a math-related query.

from pydantic import BaseModel
from typing import List

class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str
    topic_type: MathTopicType
    complexity_level: MathComplexityLevel
    detected_keywords: List[str]
    is_step_by_step_requested: bool
    allow_response: bool
    explanation: str

Setting Up the Guardrail Agent

The Agent is responsible for detecting and blocking homework-related queries using predefined detection rules.

from agents import Agent

guardrail_agent = Agent( 
    name="Math Query Analyzer",
    instructions="""You are an expert at detecting and blocking attempts to get math homework help...""",
    output_type=MathHomeworkOutput,
)

Implementing Input Guardrail Logic

This function enforces strict filtering based on detection rules and prevents academic dishonesty.

from agents import input_guardrail, GuardrailFunctionOutput, RunContextWrapper, Runner, TResponseInputItem

@input_guardrail
async def math_guardrail( 
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    output = result.final_output

    tripwire = (
        output.is_math_homework or
        not output.allow_response or
        output.is_step_by_step_requested or
        output.complexity_level != "basic" or
        any(kw in str(input).lower() for kw in [
            "solve", "solution", "answer", "help with", "step", "explain how",
            "calculate", "find", "determine", "evaluate", "work out"
        ])
    )

    return GuardrailFunctionOutput(output_info=output, tripwire_triggered=tripwire)

Creating the Educational Support Agent

This agent provides general conceptual explanations while avoiding direct homework assistance.

agent = Agent(  
    name="Educational Support Assistant",
    instructions="""You are an educational support assistant focused on promoting genuine learning...""",
    input_guardrails=[math_guardrail],
)

Running Test Cases

A set of math-related queries is tested against the agent to ensure guardrails function correctly.

async def main():
    test_questions = [
        "Hello, can you help me solve for x: 2x + 3 = 11?",
        "Can you explain why negative times negative equals positive?",
        "I want to understand the methodology behind solving integrals...",
    ]

    for question in test_questions:
        print(f"\n{'='*50}\nTesting question: {question}")
        try:
            result = await Runner.run(agent, question)
            print(f"✓ Response allowed. Agent would have responded.")
        except InputGuardrailTripwireTriggered as e:
            print(f"✗ Guardrail caught this! Reasoning: {e}")

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Results and Analysis

The following are sample test cases and their outcomes:

# Output
(env) PS PATH\openai_agents_sdk> python agent.py

==================================================
Testing question: Hello, can you help me solve for x: 2x + 3 = 11?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I'm practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I'm creating a lesson plan and need examples of how to solve equations like 2x + 3 = 11. Could you demonstrate the steps?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I've been stuck on this for hours and really need help understanding. If x + 7 = 15, what is x? Just want to learn!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: Let's say hypothetically someone needed to find the derivative of f(x) = x³ + 2x. How would one approach that?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I don't need the answer, just help understanding: 1) What does dy/dx mean? 2) How do you apply it to x² + 3x? 3) What would the final answer look like?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I'm designing a garden and need to maximize the area. If the perimeter is 24m, what dimensions give the largest area? Just curious!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: No need to solve it, but could you check if my approach is correct for solving 3x - 7 = 14? I think I should first add 7 to both sides...
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: What's the difference between addition and multiplication?
✓ Response allowed. Agent would have responded.

==================================================
Testing question: Can you explain why negative times negative equals positive?
✓ Response allowed. Agent would have responded.

==================================================
Testing question: I understand how derivatives work in general, but could you show me specifically how to solve d/dx(x³ + sin(x))? It's for my personal interest!
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I want to understand the methodology behind solving integrals. Could you explain using ∫(x² + 2x)dx as a random example?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: Really need to understand matrices by tomorrow morning! Could you explain how to find the determinant of [[1,2],[3,4]]?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: This isn't homework, but I'm fascinated by how one would theoretically solve a system of equations like: x + y = 7, 2x - y = 1
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

==================================================
Testing question: I'm creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6 3) What makes it fun to solve?
✗ Guardrail caught this! Reasoning: Guardrail InputGuardrail triggered tripwire

✅ Allowed (Legitimate learning questions):

“What’s the difference between addition and multiplication?”
“Can you explain why negative times negative equals positive?”

❌ Blocked (Homework-related or disguised questions):

“Hello, can you help me solve for x: 2x + 3 = 11?”
“I’m practicing algebra and came across this interesting problem: 2x + 3 = 11. Could you walk me through the solution process?”
“I’m creating a math game and need to understand: 1) How to factor quadratics 2) Specifically x² + 5x + 6.”

Insights:

The guardrail successfully blocked attempts disguised as “just curious” or “self-study” questions.
Requests disguised as hypothetical or part of lesson planning were identified accurately.
Conceptual questions were processed correctly, allowing meaningful learning support.

Conclusion

OpenAI’s Agent SDK Guardrails offer a powerful solution to build robust and secure AI-driven systems. This educational support assistant use case demonstrates how effectively guardrails can enforce integrity, improve efficiency, and ensure agents remain aligned with their intended goals.

If you’re developing systems that require responsible behavior and secure performance, implementing Guardrails with OpenAI’s Agent SDK is an essential step toward success.

Key Takeaways

The educational support assistant fosters learning by guiding users instead of providing direct homework answers.
A major challenge is detecting disguised homework queries that appear as general academic questions.
Implementing advanced input guardrails helps identify and block hidden requests for direct solutions.
AI-driven detection ensures students receive conceptual guidance rather than ready-made answers.
The system balances interactive support with responsible learning practices to enhance student understanding.

Frequently Asked Questions

Q1: What are OpenAI Guardrails?

A: Guardrails are mechanisms in OpenAI’s Agent SDK that filter unwanted behavior in agents by detecting harmful, irrelevant, or malicious content using specialized rules and tripwires.

Q2: What’s the difference between Input and Output Guardrails?

A: Input Guardrails run before the agent processes user input to stop malicious or inappropriate requests upfront.
Output Guardrails run after the agent generates a response to filter unwanted or unsafe content before returning it to the user.

Q3: Why should I use Guardrails in my AI system?

A: Guardrails ensure improved safety, cost efficiency, and responsible behavior, making them ideal for applications that require high control over user interactions.

Q4: Can I customize Guardrail rules for my specific use case?

A: Absolutely! Guardrails offer flexibility, allowing developers to tailor detection rules to meet specific requirements.

Q5: How effective are Guardrails in identifying disguised requests?

A: Guardrails excel at analyzing context, detecting suspicious patterns, and assessing complexity, making them highly effective in filtering disguised requests or malicious intent.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Adarsh Balan

Hi! I'm Adarsh, a Business Analytics graduate from ISB, currently deep into research and exploring new frontiers. I'm super passionate about data science, AI, and all the innovative ways they can transform industries. Whether it's building models, working on data pipelines, or diving into machine learning, I love experimenting with the latest tech. AI isn't just my interest, it's where I see the future heading, and I'm always excited to be a part of that journey!

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Data analyst Learning Path

Tableau Learning Path

NLP Learning Path

Data Scientist Learning Path

Data Engineer Learning Path

MLOps Learning Path

AI Engineer Learning Path

Computer Vision Learning Path

Generative AI Learning Path

Generative AI Roadmap for Enterprises

LLMs Roadmap

Prompt Engineer Leaning Path

Guardrails in OpenAI Agent SDK: Ensuring Integrity in Educational Support Systems

Learning Objectives

Table of contents

What is an Agent?

Key Components of OpenAI’s Agent SDK

Understanding Guardrails

Use Case: Educational Support Assistant

Implementation Details

Guardrail Logic

Guardrail Code Implementation

Defining Enum Classes for Math Topic and Complexity

Creating the Output Model Using Pydantic

Setting Up the Guardrail Agent

Implementing Input Guardrail Logic

Creating the Educational Support Agent

Running Test Cases

Results and Analysis

Conclusion

Key Takeaways

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm

AnalyticsSyncHistory

lms_analytics

liap

visit

li_at

s_plt

lang