A Guide to Reflection Agents Using LlamaIndex

Ritika 05 Aug, 2024
18 min read

Introduction

Imagine you’ve just created an AI model that can write, reason, and solve complex problems. But what if I told you there’s a way to make this AI even smarter by teaching it to think about its own thoughts? In this article, we’ll explore the fascinating world of reflective AI agents. We’ll start with the basics of how these agents can reflect on their own actions to improve over time. Then, we’ll delve into advanced techniques like Language Agent Tree Search (LATS) and Introspective Agents, showing you how to set up and use these methods with LlamaIndex. By the end, you’ll see how these approaches are transforming AI, making it more autonomous and capable of tackling ever more challenging tasks. Join us on this journey to unlock the next level of AI intelligence!

Learning Outcomes

  • Understand the concept and importance of reflection in enhancing LLM-based agents.
  • Explore the implementation of Basic Reflection Agents using self-prompting techniques.
  • Learn about Language Agent Tree Search (LATS) and its role in improving AI task performance.
  • Gain hands-on experience with LATS framework setup and execution using LlamaIndex.
  • Implement Introspective Agents to refine responses iteratively using self-reflection and external tools.

This article was published as a part of the Data Science Blogathon.

Understanding Reflection or Introspective Agents 

LLMs often fail to generate an adequate response for a given task. This is a common challenge in artificial intelligence, where agents often lack the ability to self-evaluate and refine their outputs.

This is where reflection agents come to our rescue. People often discuss “System 1” and “System 2” thinking, with System 1 being reactive or instinctual and System 2 being more analytical and introspective. When used effectively, reflection can help LLM systems move away from exclusively System 1 “thinking” patterns and towards System 2-like behaviour.

Reflection Thinking

In LlamaIndex, reflection agents are implemented in the Introspective Agents module.

Introspective agents are a powerful concept that uses the reflection agent pattern within the LlamaIndex architecture. These agents take a distinct approach to task completion: instead of providing a single response, they engage in iterative refinement.

Steps in Basic Reflection Agents

  • Initial Response: The introspective agent starts by creating an initial response to the specified task. This might be a preliminary answer to a query, a first attempt at completing an activity, or even a creative work.
  • Reflection and Correction: The agent then takes a step back to reflect on its first response. This reflection may be done either internally or via external tools (such as an API). LlamaIndex allows you to select the strategy that best meets your needs.
  • Refinement Cycle: Based on the reflection, the agent identifies areas for improvement and creates a revised answer. This cycle of reflection and correction continues until a stopping condition is met, such as achieving a certain degree of accuracy or completing a predetermined number of cycles (see the sketch after this list).
Introspective Agents. Source: LlamaIndex
  • Introspective Agent: An AI agent that employs a reflection agent pattern to iteratively refine its responses to a task.
  • Reflection Agent Pattern: A design approach for AI agents where they assess their outputs (reflection) and make adjustments (correction) before finalizing them.
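
To make the cycle concrete, here is a minimal, illustrative sketch of the reflect-and-correct loop in plain Python. It is not LlamaIndex's implementation; llm_complete is a hypothetical stand-in for any LLM completion call (for example, llm.complete(prompt).text in LlamaIndex).

def llm_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; replace with your provider of choice."""
    return "..."  # placeholder response


def reflective_answer(task: str, max_cycles: int = 3) -> str:
    # Initial response
    answer = llm_complete(f"Task: {task}\nAnswer:")
    for _ in range(max_cycles):
        # Reflection: ask the model to critique its own answer
        critique = llm_complete(
            f"Task: {task}\nAnswer: {answer}\n"
            "Critique this answer. Reply with DONE if no changes are needed."
        )
        if "DONE" in critique:
            break  # stopping condition reached
        # Correction: revise the answer using the critique
        answer = llm_complete(
            f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\nImproved answer:"
        )
    return answer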

What is Language Agent Tree Search (LATS)?

Language Agent Tree Search (LATS) is a general LLM agent search algorithm that improves overall task performance over comparable approaches such as ReAct, Reflexion, or Tree of Thoughts by combining reflection/evaluation with search (more specifically, Monte Carlo tree search). The paper, released by Zhou et al., can be read here.

The LATS framework, a first-of-its-kind general framework, combines LMs’ capacities for action, reasoning, and planning. It advances the goal of creating broadly capable autonomous agents that can think and make decisions in a range of settings.

LATS combines reflection/evaluation with search (specifically, Monte Carlo tree search) to improve overall task performance. It uses a standard reinforcement learning (RL) task framework, replacing the RL agent, value function, and optimizer with calls to an LLM. This helps the agent adapt and solve difficult tasks instead of getting caught in repetitive cycles.

Steps in LATS framework

  • Generate Candidates: An initial response is generated, and multiple candidate actions are proposed.
  • Expand and Simulate: Using the generated potential actions, expand each action and simulate them in parallel.
  • Reflect + Evaluate: Observe the outcomes of these actions and score the decisions based on reflection (and possibly external feedback using external tools).
  • Backpropagate: Update the scores of the root trajectories based on the outcomes.
  • Select: Pick the best next actions based on the aggregate rewards from the steps above. Either respond (if a solution is found or the max search depth is reached) or continue searching from step 1.

If the agent has a tight feedback loop (via high-quality environment rewards or reliable reflection scores), the search can reliably discern between multiple action paths and select the optimal one. The resulting trajectory can then be stored in external memory (or used for model fine-tuning) so that the model can be improved later.
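
The loop below is a toy sketch of this search procedure, not the LlamaIndex implementation. The generate_candidates and reflect_score functions are hypothetical placeholders for the LLM calls that propose and score actions, and a simple greedy selection rule stands in for full Monte Carlo tree search.

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    action: str
    score: float = 0.0
    visits: int = 0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def generate_candidates(node: Node, num_expansions: int) -> List[str]:
    # Placeholder: in practice an LLM proposes next actions given the trajectory so far.
    return [f"{node.action} -> option {i}" for i in range(num_expansions)]


def reflect_score(action: str) -> float:
    # Placeholder: in practice an LLM (plus optional external tools) scores the outcome.
    return float(len(action) % 10)


def lats_search(task: str, num_expansions: int = 2, max_rollouts: int = 3) -> Node:
    root = Node(action=task)
    best = root
    for _ in range(max_rollouts):
        # Select: pick the most promising node found so far (toy selection rule).
        node = best
        # Expand + simulate: generate candidate actions and score them via reflection.
        for action in generate_candidates(node, num_expansions):
            child = Node(action=action, parent=node, score=reflect_score(action))
            node.children.append(child)
            # Backpropagate: push the child's reward up to its ancestors.
            ancestor = node
            while ancestor is not None:
                ancestor.visits += 1
                ancestor.score += child.score
                ancestor = ancestor.parent
        best = max(node.children, key=lambda c: c.score)
    return best


print(lats_search("Summarise Lyft's 2021 risk factors").action)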

LATS Framework. Source: LangChain Blogs

Code Implementation of LATS

LlamaIndex implements LATS as a separate package that can be installed and run out of the box. We will use Cohere embeddings and the Gemini LLM for this implementation. Both are freely available via trial API keys.

Step1: Install Libraries

We install the LlamaIndex packages for LATS, Cohere, and Gemini, along with some supporting libraries for file reading.

!pip install llama-index-agent-lats --quiet
!pip install llama-index --quiet
!pip install llama-index-core llama-index-readers-file --quiet
!pip install cohere --quiet
!pip install llama-index-llms-cohere --quiet
!pip install llama-index-embeddings-cohere --quiet


!pip install -q llama-index google-generativeai --quiet
!pip install llama-index-llms-gemini --quiet

Step2: Generate API Keys

We need to generate a free API key to use the Cohere LLM. Visit the Cohere website and log in using a Google or GitHub account. Once logged in, you will land on the Cohere dashboard page as shown below.

Click on the API Keys option. You will see that a free trial API key has been generated.

Cohere API Key screen

For the Gemini API key, visit the Gemini site and click the “Get an API key” button as shown in the picture below. You will be redirected to Google AI Studio, where you will need to log in with your Google account and then find your generated API key.

Step3: Set API Keys in Environment

Let us now set the API keys in the environment.

import os

os.environ["COHERE_API_KEY"] = "Cohere API key"
os.environ["GOOGLE_API_KEY"] = "Gemini API Key

import nest_asyncio

nest_asyncio.apply()

Step4: Download Data

This step is optional; you can supply your own PDF in the file path instead. Here we will use the Lyft 10-K financial report PDF, which was used in the original research paper.

!mkdir -p 'data/10k/'

!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

Step5: Declare Models

To declare the models for our AI system, we use the Cohere Embedding model for generating search query embeddings and the Gemini LLM for advanced language processing. These models provide the foundation for sophisticated query handling and robust language understanding capabilities.

from llama_index.embeddings.cohere import CohereEmbedding

# with input_type='search_query'
embed_model = CohereEmbedding(
    api_key=os.environ["COHERE_API_KEY"],  # reuse the key set in the environment earlier
    model_name="embed-english-v3.0",
    input_type="search_query",
)


from llama_index.llms.gemini import Gemini

llm = Gemini(model="models/gemini-1.5-flash")
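
Optionally, you can run a quick sanity check to confirm that both models respond before building the index; the prompt below is only an example.

# Quick optional sanity check of both models
print(len(embed_model.get_text_embedding("hello world")))  # embedding dimension
print(llm.complete("In one sentence, what is a 10-K report?").text)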

Step6: Create Vector Index

We load the Lyft 10-K with SimpleDirectoryReader, build a VectorStoreIndex over it using the Cohere embedding model, and persist the index to disk so it can be reloaded on subsequent runs. Finally, we create a query engine (retriever) over the index that uses the Gemini LLM.

import os
import time
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.storage import StorageContext


if not os.path.exists("./storage/lyft"):
    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs, embed_model=embed_model)
    
    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    
else:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

#Retriever
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3, llm=llm)
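
Before handing the query engine to the agent, you may want to try it directly; the question below is just an illustrative example.

# Sanity-check the query engine with a simple question
test_response = lyft_engine.query("What was Lyft's revenue in 2021?")
print(test_response)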

Step7: Create Query Engine Tool Using the Retriever Created Above

We wrap the query engine in a QueryEngineTool, giving it a name and description so the agent knows when and how to call it. The description tells the LLM that the tool answers questions about Lyft’s 2021 financials and expects a detailed plain-text question as input.

from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool. "
                "The input is used to power a semantic search engine."
            ),
        ),
    )]
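
You can also invoke the tool directly to confirm it returns a sensible answer before the agent uses it; .call() should return a ToolOutput whose content holds the query engine's response (the question below is just an example).

# Call the tool directly to verify it works
tool_output = query_engine_tools[0].call("What were Lyft's main sources of revenue in 2021?")
print(tool_output.content)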

Step8: Create LATS Agent

Now we can set up the LATS agent.

  • num_expansions denotes the number of potential sub-actions to generate beneath each node.
  • num_expansions=2 indicates that we will look at two possible next actions for each parent action.
  • max_rollouts specifies how far each exploration of the search space goes. max_rollouts=3 indicates that the tree is explored to a maximum depth of three levels.
from llama_index.agent.lats import LATSAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = LATSAgentWorker.from_tools(
    query_engine_tools,
    llm=llm,
    num_expansions=2,
    max_rollouts=3,  # use -1 for unlimited rollouts
    verbose=True,
)
agent = AgentRunner(agent_worker)

Step9: Execute the Agent

Now we will run the agent using a query.

task = agent.create_task(
    "Give the risk factors for Lyft company using the report of 10k and how Lyft can mitigate each of these risk factors"
)

Run the task:

# run initial step
step_output = agent.run_step(task.task_id)

Running the whole loop:

# repeat until the last step is reached
while not step_output.is_last:
    step_output = agent.run_step(task.task_id)

response = agent.finalize_response(task.task_id)

Understanding the Output Steps of Agent

Selection: Here, the initial observation is selected based on the initial query; this is the parent node. The agent then generates subsequent candidates.


Expand and Simulate

Now the agent takes action to fetch details for the task expansion above. It generates an output for each input action.

> Generated new reasoning step: Thought: I need to use a tool to understand the
potential impact of the risk factors mentioned in Lyft's 10K report.
Action: lyft_10k
Action Input: {'input': "What is the potential impact of the risk factors mentioned
in Lyft's 10K report for the year 2021?"}
Observation: The risk factors mentioned in Lyft's 10K report for the year 2021 could
negatively impact the company's business, financial condition, and results of
operations. These risks include general economic factors, operational factors, and
insurance-related factors.
> Generated new reasoning step: Thought: I need to use a tool to identify the risk
factors mentioned in Lyft's 10K report.
Action: lyft_10k
Action Input: {'input': "What are the risk factors mentioned in Lyft's 10K report
for the year 2021?"}
Observation: Lyft's 10K report for 2021 outlines several risk factors that could
impact their business, financial condition, and results of operations. These risks
include general economic factors, operational factors, and risks related to
attracting and retaining drivers and riders.  

Reflect + Evaluate

Now the agent evaluates each observation obtained after expansion and assigns it a score.

> Evaluation for input Give the risk factors for Lyft company using the report of 
10k and how Lyft can mitigate each of these risk factors
: score=7 is_done=False reasoning="The conversation is correctly identifying and
analyzing the risk factors mentioned in Lyft's 10K report. However, it has not yet
addressed the mitigation strategies for each risk factor. The latest action is
focused on understanding the potential impact of the risk factors,
which is a necessary step before moving on to mitigation strategies. Therefore, the
conversation is correct but incomplete."

> Evaluation for input Give the risk factors for Lyft company using the report of
10k and how Lyft can mitigate each of these risk factors
: score=5 is_done=False reasoning="The conversation has correctly identified the
need to analyze Lyft's 10K report to find risk factors and develop mitigation strategies.
However, it has only identified the risk factors and not yet developed
any mitigation strategies. The conversation is on the right track but incomplete." 

Generate

Based on the backpropagated scores above, the agent selects the next node to expand and generates new candidate actions for it.

> Got final response: I am still thinking.
> Selecting node to expand: Observation: Lyft faces a number of risks, including 
general economic factors, operational factors, and insurance-related factors.

> Got candidates: ['Thought: I need to analyze the risk factors identified by the
 `lyft_10k` tool and categorize them into general economic factors, operational 
factors, and insurance-related factors.', "Thought: I need to brainstorm mitigation
 strategies for each risk factor identified by the `lyft_10k` tool, considering 
Lyft's specific circumstances and potential solutions like strategic partnerships,
technological advancements, cost optimization, and proactive regulatory
engagement."] 

Expand + Simulate

It now fetches mitigation strategies for each of the two candidates generated above. Below, we can see that the second reasoning step, generated from the second candidate, gives a better response for mitigation strategies.

> Generated new reasoning step: Thought: I need to use the `lyft_10k` tool to get
information on how Lyft can mitigate each of the risk factors.
Action: lyft_10k
Action Input: {'input': 'How can Lyft mitigate the risk factors identified in the
2021 10K report?'}
Observation: Lyft can mitigate the risks by focusing on building a strong reputation
and brand, attracting and retaining qualified drivers, and providing competitive
pricing and services to both riders and drivers.
> Generated new reasoning step: Thought: I can answer without using any more tools.
I'll use the user's language to answer.

Risk Factors

General Economic Factors:
Lyft's business is sensitive to economic downturns, which can lead to decreased
consumer spending and demand for ride-hailing services.
Operational Factors:
Lyft faces challenges related to driver availability, regulatory changes, and
competition from other transportation providers.
Insurance-Related Factors:
Lyft's insurance costs can fluctuate based on accident rates and regulatory changes,
impacting profitability.

Mitigation Strategies

General Economic Factors:
Diversify Revenue Streams: Explore new revenue streams beyond ride-hailing, such as
delivery services, bike-sharing, or subscription models.

Cost Optimization: Implement cost-cutting measures to improve efficiency and
profitability during economic downturns.

Strategic Partnerships: Collaborate with businesses in complementary industries to
offer bundled services and attract new customers.

Operational Factors:
Driver Retention: Implement programs to improve driver satisfaction and retention,
such as competitive pay, benefits, and flexible scheduling.

Technological Advancements: Invest in technology to optimize driver dispatching,
improve route efficiency, and enhance the rider experience.

Proactive Regulatory Engagement: Actively engage with regulators to advocate for
policies that support the ride-hailing industry.
Insurance-Related Factors:
Risk Management: Implement robust risk management programs to minimize accidents and
reduce insurance costs.

Data Analytics: Utilize data analytics to identify high-risk areas and implement
targeted safety measures.

Insurance Partnerships: Explore partnerships with insurance providers to secure
favorable rates and coverage.

Reflect + Evaluate

Here the agent again scores each of the expansions above. The second reasoning step, generated from the second candidate, gives a better response for mitigation strategies, and in the evaluation the second response receives a higher score than the first.

Output

> Evaluation for input Give the risk factors for Lyft company using the report of
10k and how Lyft can mitigate each of these risk factors
: score=7 is_done=False reasoning='The conversation is correctly identifying and
categorizing the risk factors from the 10K report. It is also correctly using the
`lyft_10k` tool to get information on how Lyft can mitigate these risks. However,
the conversation has not yet provided specific mitigation strategies for each risk
factor. It has only provided general strategies like building a strong reputation
and attracting qualified drivers. Therefore, the conversation is correct but
incomplete.'

> Evaluation for input Give the risk factors for Lyft company using the report of
10k and how Lyft can mitigate each of these risk factors
: score=10 is_done=True reasoning='The conversation correctly identified the risk
factors from the 10K report using the `lyft_10k` tool. It then provided a
comprehensive list of mitigation strategies for each risk factor, demonstrating a
thorough understanding of the query. The answer is complete and addresses all
aspects of the query.' 

Final Response

> Got final response: 
Risk Factors
General Economic Factors:
Lyft's business is sensitive to economic downturns, which can lead to decreased
 consumer spending and demand for ride-hailing services.
Operational Factors:
Lyft faces challenges related to driver availability, regulatory changes, and
 competition from other transportation providers.
Insurance-Related Factors:
Lyft's insurance costs can fluctuate based on accident rates and regulatory changes,
 impacting profitability.

Mitigation Strategies
General Economic Factors:
Diversify Revenue Streams: Explore new revenue streams beyond ride-hailing, such as
 delivery services, bike-sharing, or subscription models.

Cost Optimization: Implement cost-cutting measures to improve efficiency and 
profitability during economic downturns.

Strategic Partnerships: Collaborate with businesses in complementary industries to
 offer bundled services and attract new customers.
Operational Factors:
Driver Retention: Implement programs to improve driver satisfaction and retention,
 such as competitive pay, benefits, and flexible scheduling.

Technological Advancements: Invest in technology to optimize driver dispatching,
 improve route efficiency, and enhance the rider experience.

Proactive Regulatory Engagement: Actively engage with regulators to advocate for
 policies that support the ride-hailing industry.

Insurance-Related Factors:
Risk Management: Implement robust risk management programs to minimize accidents and
 reduce insurance costs.

Data Analytics: Utilize data analytics to identify high-risk areas and implement
 targeted safety measures.

Insurance Partnerships: Explore partnerships with insurance providers to secure
 favorable rates and coverage.

Final Response Display

We can display the final response in Markdown format.

from IPython.display import Markdown

display(Markdown(str(response)))

Code Implementation of Introspective Agent with Self Reflection Using LLM

In this framework, the LLM agent itself performs the reflection, analyzing and improving its response on each pass. Here we will use a self-reflective agent to gradually improve a toxic input text and generate a safer version of the text as the final response.

LlamaIndex Introspective Agents framework

Step1: Install Libraries

We install the LlamaIndex packages for introspective agents, Cohere, Gemini, and OpenAI, along with some supporting libraries for file reading.

!pip install llama-index-agent-introspective -q
!pip install llama-index --quiet
!pip install llama-index-core llama-index-readers-file --quiet
!pip install cohere --quiet
!pip install llama-index-llms-cohere --quiet
!pip install llama-index-embeddings-cohere --quiet

!pip install llama-index-llms-openai -q
!pip install llama-index-program-openai -q

!pip install -q llama-index google-generativeai --quiet
!pip install llama-index-llms-gemini --quiet

Step2: Set API Keys in environment

import os

os.environ["COHERE_API_KEY"] = "Cohere API key"
os.environ["GOOGLE_API_KEY"] = "Gemini API Key

import nest_asyncio

nest_asyncio.apply()

Step3: Declare Model

We declare the Gemini LLM with all safety categories set to BLOCK_NONE so that the agent can process the potentially toxic input text without the request being blocked by Gemini’s default safety filters.

from llama_index.llms.gemini import Gemini
from google.generativeai.types import HarmCategory, HarmBlockThreshold
#Safety Settings
safety_settings={
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE
    }
# Initialise LLM
llm = Gemini(model="models/gemini-1.5-flash", safety_settings = safety_settings)

Step4: Build Self Reflective Agent

In this step, we build a self-reflective agent by defining a SelfReflectionAgentWorker and optionally a MainAgentWorker, and then constructing an IntrospectiveAgent using these components. This setup enhances the agent’s ability to reflect on its actions and improve its performance through introspection.

from llama_index.agent.introspective import SelfReflectionAgentWorker

from llama_index.agent.introspective import IntrospectiveAgentWorker
from llama_index.agent.introspective import (
    ToolInteractiveReflectionAgentWorker,
)

from llama_index.agent.openai import OpenAIAgentWorker
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core import ChatPromptTemplate


def get_introspective_agent_with_self_reflection(
    verbose=True, with_main_worker=False
):
    """Helper function for building introspective agent using self reflection.

    Steps:

    1. Define the `SelfReflectionAgentWorker`
        1a. Construct `SelfReflectionAgentWorker` using .from_defaults()

    2. Optionally define a `MainAgentWorker`

    3. Construct `IntrospectiveAgent`
        3a. Construct `IntrospectiveAgentWorker` using .from_defaults()
        3b. Construct `IntrospectiveAgent` using .as_agent()
    """

    # 1a.
    self_reflection_agent_worker = SelfReflectionAgentWorker.from_defaults(
        llm=llm,
        verbose=verbose,
    )

    # 2.
    if with_main_worker:
        main_agent_worker = OpenAIAgentWorker.from_tools(
            tools=[], llm=llm, verbose=True
        )
    else:
        main_agent_worker = None

    # 3a.
    introspective_worker_agent = IntrospectiveAgentWorker.from_defaults(
        reflective_agent_worker=self_reflection_agent_worker,
        main_agent_worker=main_agent_worker,
        verbose=verbose,
    )

    chat_history = [
        ChatMessage(
            content="You are an assistant that generates safer versions of potentially toxic, user-supplied text.",
            role=MessageRole.SYSTEM,
        )
    ]

    # 3b.
    return introspective_worker_agent.as_agent(
        chat_history=chat_history, verbose=verbose
    )


introspective_agent = get_introspective_agent_with_self_reflection(
    verbose=True
)

Step5: Run the Query

In this step, we run a query by providing a potentially harmful text to the introspective agent for processing. The agent will analyze the text and generate a safer version, demonstrating its ability to handle and mitigate toxic content.

harmful_text = (
    "Those individuals who publicly tout eating no meat, "
    "especially when their stated reason has to do with caring "
    "about animals, are some of the most hypocritical people I’ve "
    "ever met. One vegan acquaintance routinely beat her"
)
response = await introspective_agent.achat(harmful_text)

Output

> Running step 9c64852c-870e-4e18-bd39-e6413b1eddcd. Step input: Those individuals 
who publicly tout eating no meat, especially when their stated reason has to do with
caring about animals, are some of the most hypocritical people I’ve ever met. One
vegan acquaintance routinely beat her
Added user message to memory: Those individuals who publicly tout eating no meat,
especially when their stated reason has to do with caring about animals, are some
of the most hypocritical people I’ve ever met. One vegan acquaintance routinely
beat her
> Running step 5e19282e-c1fa-4b19-a3b0-9aa49eba2997. Step input: Those individuals
who publicly tout eating no meat, especially when their stated reason has to do with
caring about animals, are some of the most hypocritical people I’ve ever met. One
vegan acquaintance routinely beat her
> Reflection: {'is_done': False, 'feedback': "The agent has not made any tool calls
or produced any output. It needs to generate a safer version of the user's text."}
Correction: I've met some people who publicly tout eating no meat, especially when
their stated reason has to do with caring about animals, who seem hypocritical. For
example, I once knew a vegan who routinely beat her
> Running step 2292b173-31f8-456c-b24b-66e13760032f. Step input: None
> Reflection: {'is_done': False, 'feedback': "The agent has made a good start by
generating a safer version of the user's text. However, it has not finished the
task. It needs to complete the sentence and provide a safer version of the entire
input."}
Correction: I've met some people who publicly tout eating no meat, especially when
their stated reason has to do with caring about animals, who seem hypocritical. For
example, I once knew a vegan who routinely beat her dog.
> Running step dd008df5-e28f-40a9-bacc-e4d02a84f0ba. Step input: None
> Reflection: {'is_done': False, 'feedback': "The agent has made a good start by
generating a safer version of the user's text. However, it has not finished the
task. It needs to complete the sentence and provide a safer version of the entire
input."}
Correction: I've met some people who publicly tout eating no meat, especially when
their stated reason has to do with caring about animals, who seem hypocritical. It's
important to remember that everyone has their own journey and that actions don't
always reflect beliefs. For example, I once knew a person who identified as vegan
but whose actions didn't always align with their stated beliefs.
> Running step e15a8aa0-de0a-49f6-8ff0-34b3002ef239. Step input: None
> Reflection: {'is_done': True, 'feedback': "The agent has successfully completed
the task by generating a safer version of the user's text. The final message is an
ASSISTANT message, indicating that the agent is done thinking."}

Final Response

response.response
I've met some people who publicly tout eating no meat, especially when their stated
reason has to do with caring about animals, who seem hypocritical. It's important
to remember that everyone has their own journey and that actions don't always
reflect beliefs. For example, I once knew a person who identified as vegan but whose
actions didn't always align with their stated beliefs.

Introspective Agent with Tool Reflection Using External Tools

This introspective agent is similar to the self-reflection introspective agent above, except that for the reflection agent worker we pass an external tool (such as an API) to generate reflections. Here too we will generate safer versions of toxic text, but instead of an LLM we will use the Perspective API, which scores text for toxicity.

Code Implementation Using External Tools

We will now implement code using external tools.

Step1: Install Libraries

We install the LlamaIndex packages for introspective agents and OpenAI, along with some supporting libraries for file reading.

%pip install llama-index-agent-introspective -q
%pip install google-api-python-client -q
%pip install llama-index-llms-openai -q
%pip install llama-index-program-openai -q
%pip install llama-index-readers-file -q

Step2: Set API Keys in environment

To use Perspective’s API, you will need to do the following:

  • Enable the Perspective API in your Google Cloud project
  • Generate a new set of credentials (i.e., an API key) that you will set as an environment variable

To perform steps 1. and 2., you can follow the instructions outlined here: https://developers.perspectiveapi.com/s/docs-enable-the-api?language=en_US.

import os

os.environ["OPEN_API_KEY"] = "OPEN API KEY"
os.environ["PERSPECTIVE_API_KEY"] = "Perspective API"

Step3: Build the Perspective Helper Class

We will now define a custom Perspective class to interact with the Perspective API, which is used to analyze text for various attributes like toxicity, identity attack, and profanity. This class facilitates making API calls to obtain toxicity scores, essential for evaluating and handling potentially harmful content in the text.

from googleapiclient import discovery
from typing import Dict, Optional
import json
import os


class Perspective:
    """Custom class to interact with Perspective API."""

    attributes = [
        "toxicity",
        "severe_toxicity",
        "identity_attack",
        "insult",
        "profanity",
        "threat",
        "sexually_explicit",
    ]

    def __init__(self, api_key: Optional[str] = None) -> None:
        if api_key is None:
            try:
                api_key = os.environ["PERSPECTIVE_API_KEY"]
            except KeyError:
                raise ValueError(
                    "Please provide an api key or set PERSPECTIVE_API_KEY env var."
                )

        self._client = discovery.build(
            "commentanalyzer",
            "v1alpha1",
            developerKey=api_key,
            discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
            static_discovery=False,
        )

    def get_toxicity_scores(self, text: str) -> Dict[str, float]:
        """Function that makes API call to Perspective to get toxicity scores across various attributes."""

        analyze_request = {
            "comment": {"text": text},
            "requestedAttributes": {
                att.upper(): {} for att in self.attributes
            },
        }

        response = (
            self._client.comments().analyze(body=analyze_request).execute()
        )
        try:
            return {
                att: response["attributeScores"][att.upper()]["summaryScore"][
                    "value"
                ]
                for att in self.attributes
            }
        except Exception as e:
            raise ValueError("Unable to parse response") from e


perspective = Perspective()
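
As a quick check, you can call the helper directly on a sample string; it returns a dictionary mapping each requested attribute to a score between 0 and 1 (the sample sentence is just an example).

# Quick check: score a sample sentence across all requested attributes
sample_scores = perspective.get_toxicity_scores("You are a wonderful person.")
print(sample_scores)  # dict of attribute -> score between 0 and 1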

Step4: Build Perspective Tool

In this step, we create a Perspective tool using the perspective_function_tool function. This function computes toxicity scores for a given text, returning the most problematic toxic attribute and its score. The FunctionTool is then used to integrate this functionality into the AI system, enabling efficient assessment of text toxicity.

from typing import Tuple
from llama_index.core.bridge.pydantic import Field


def perspective_function_tool(
    text: str = Field(
        default_factory=str,
        description="The text to compute toxicity scores on.",
    )
) -> Tuple[str, float]:
    """Returns the toxicity score of the most problematic toxic attribute."""

    scores = perspective.get_toxicity_scores(text=text)
    max_key = max(scores, key=scores.get)
    return (max_key, scores[max_key] * 100)


from llama_index.core.tools import FunctionTool

perspective_tool = FunctionTool.from_defaults(
    perspective_function_tool,
)

Step5: Build the Introspective Agent Using Tool-Interactive Reflection

With our tool defined, we can now build our IntrospectiveAgent and the required ToolInteractiveReflectionAgentWorker. To construct the latter, we also need to construct a CritiqueAgentWorker that will ultimately be responsible for performing the reflection with the tools.

The code provided below defines a helper function to construct this IntrospectiveAgent. We do this for convenience as we will later test the two reflection techniques.

from llama_index.agent.introspective import IntrospectiveAgentWorker
from llama_index.agent.introspective import (
    ToolInteractiveReflectionAgentWorker,
)

from llama_index.llms.openai import OpenAI
from llama_index.agent.openai import OpenAIAgentWorker
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core import ChatPromptTemplate


def get_introspective_agent_with_tool_interactive_reflection(
    verbose=True, with_main_worker=False
):
    """Helper function for building introspective agent using tool-interactive reflection.

    Steps:

    1. Define the `ToolInteractiveReflectionAgentWorker`
        1a. Construct a CritiqueAgentWorker that performs reflection with tools.
        1b. Define an LLM that will be used to generate corrections against the critique.
        1c. Define a function that determines the stopping condition for reflection/correction
            cycles
        1d. Construct `ToolInteractiveReflectionAgentWorker` using .from_defaults()

    2. Optionally define a `MainAgentWorker`

    3. Construct `IntrospectiveAgent`
        3a. Construct `IntrospectiveAgentWorker` using .from_defaults()
        3b. Construct `IntrospectiveAgent` using .as_agent()
    """

    # 1a.
    critique_agent_worker = FunctionCallingAgentWorker.from_tools(
        tools=[perspective_tool], llm=OpenAI("gpt-3.5-turbo"), verbose=verbose
    )
    # 1b.
    correction_llm = OpenAI("gpt-4-turbo-preview")

    # 1c.
    def stopping_callable(critique_str: str) -> bool:
        """Function that determines stopping condition for reflection & correction cycles.

        critique_str [str]: The response string provided by the critique agent.
        """

        return "[PASS]" in critique_str

    # 1d.
    tool_interactive_reflection_agent_worker = (
        ToolInteractiveReflectionAgentWorker.from_defaults(
            critique_agent_worker=critique_agent_worker,
            critique_template=(
                "Please provide a toxicity critique for the provided text. "
                "If the toxicity score is less than 3.0 then please "
                "write '[PASS]' otherwise write '[FAIL]'. "
                "Here is the text:\n {input_str}"
            ),
            stopping_callable=stopping_callable,
            correction_llm=correction_llm,
            verbose=verbose,
        )
    )

    # 2.
    if with_main_worker:
        main_agent_worker = OpenAIAgentWorker.from_tools(
            tools=[], llm=OpenAI("gpt-4-turbo-preview"), verbose=True
        )
    else:
        main_agent_worker = None

    # 3a.
    introspective_agent_worker = IntrospectiveAgentWorker.from_defaults(
        reflective_agent_worker=tool_interactive_reflection_agent_worker,
        main_agent_worker=main_agent_worker,
        verbose=verbose,
    )

    chat_history = [
        ChatMessage(
            content="You are an assistant that generates safer versions of potentially toxic, user-supplied text.",
            role=MessageRole.SYSTEM,
        )
    ]

    # 3b.
    return introspective_agent_worker.as_agent(
        chat_history=chat_history, verbose=verbose
    )


introspective_agent = get_introspective_agent_with_tool_interactive_reflection(
    verbose=True,
)

Step6: Run the Query

We now run a query using the introspective_agent to analyze a piece of potentially harmful text. The agent processes the text and returns a response based on its introspective and reflection capabilities.

harmful_text = (
    "Those individuals who publicly tout eating no meat, "
    "especially when their stated reason has to do with caring "
    "about animals, are some of the most hypocritical people I’ve "
    "ever met. One vegan acquaintance routinely beat her"
)
response = await introspective_agent.achat(harmful_text)

Output:

"People who choose not to eat meat for ethical reasons related to animal welfare are
making a personal decision. It's important to respect diverse perspectives and
experiences."

Conclusion

In conclusion, leveraging reflection agents within the LlamaIndex framework offers a robust method for enhancing AI performance and reliability. By implementing strategies like Language Agent Tree Search (LATS), basic introspective agents, and those utilizing external tools, we can significantly improve the depth and accuracy of AI responses. These agents shift AI behavior from reactive to more analytical and introspective patterns, akin to human System 2 thinking. As demonstrated, reflection agents iteratively refine their outputs, ensuring a higher quality of responses for complex tasks. This approach not only enhances AI efficiency but also paves the way for more autonomous and intelligent systems.

Key Takeaways

  • Learned about the concept of reflection agents in generative AI.
  • Understood the various types of reflection agents.
  • Learned how to implement a LATS reflection agent using LlamaIndex.
  • Explored how to implement a self-reflection introspective agent using LlamaIndex.

Frequently Asked Questions

Q1. What is the primary benefit of using reflection agents in LLM-based frameworks?

A. Reflection agents in LLM-based frameworks enhance response quality and accuracy by self-evaluating, identifying errors, and refining iteratively, resulting in more reliable and effective performance.

Q2. How does the reflection process differ when using external tools compared to self-prompting in Llamaindex?

A. In LlamaIndex, tool-interactive reflection uses external tools (such as the Perspective API) to critique responses, while self-reflection relies on the LLM’s own internal critique. Using external tools can produce more robust corrections when reliable external feedback is available.

Q3. What are some potential challenges when implementing reflection agents in Llamaindex?

A. Implementing reflection agents in Llamaindex faces challenges like managing computational overhead, ensuring external tool accuracy, designing stopping conditions, and integrating processes into workflows.

Q4. How does the Language Agent Tree Search (LATS) framework enhance the performance of reflection agents compared to other methods?

A. The Language Agent Tree Search (LATS) framework improves reflection agents’ performance by integrating Monte-Carlo Tree Search (MCTS) search algorithms, enabling parallel exploration, evaluation, and optimal path selection, leading to more informed decision-making.

Q5. What are the other frameworks where Reflection Agents are available?

A. LangChain also offers reflection agents, but they are implemented through LangGraph. It does not offer an out-of-the-box solution the way LlamaIndex does.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ritika 05 Aug, 2024

I am a professional working as data scientist after finishing my MBA in Business Analytics and Finance. A keen learner who loves to explore and understand and simplify stuff! I am currently learning about advanced ML and NLP techniques and reading up on various topics related to it including research papers .
