Agentic RAG for Analyzing Customer Issues

Nibedita Dutta Last Updated : 06 Nov, 2024


Retrieval-Augmented Generation (RAG) is an AI technique that improves the output of large language models (LLMs) by retrieving relevant documents or information from external sources during text generation. Unlike LLMs that rely solely on their training data, RAG draws on up-to-date external information to deliver more accurate and contextually relevant responses. Naive RAG works well for simple queries, but it struggles with complex questions that require multi-step reasoning or iterative refinement.

Learning Objectives

Understand the key differences between Agentic RAG and Naive RAG.
Recognize the limitations of Naive RAG in handling complex queries.
Explore diverse use cases where Agentic RAG excels in multi-step reasoning tasks.
Learn how to implement Agentic RAG in Python using CrewAI for intelligent data retrieval and summarization.
Discover how Agentic RAG strengthens Naive RAG’s capabilities by adding decision-making agents.

This article was published as a part of the Data Science Blogathon.


Agentic RAG Strengthening Capabilities of Naive RAG

Agentic RAG is a novel hybrid approach that merges the strengths of Retrieval-Augmented Generation and AI Agents. This framework enhances generation and decision-making by integrating dynamic retrieval systems (RAG) with autonomous agents. In Agentic RAG, the retriever and generator are combined and operate within a multi-agent framework where agents can request specific pieces of information and make decisions based on retrieved data.

Agentic RAG vs Naive RAG

While Naive RAG focuses solely on improving generation through information retrieval, Agentic RAG adds a layer of decision-making through autonomous agents.
In Naive RAG, the retriever is passive, retrieving data only when requested. In contrast, Agentic RAG employs agents that actively decide when, how, and what to retrieve.
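The difference in control flow can be illustrated with a toy example. This is a minimal sketch, not CrewAI code: the corpus, `keyword_retrieve`, and `plan_subqueries` are hypothetical stand-ins, and a real agent would rely on an LLM to plan its retrievals rather than on a fixed template.

```python
# Toy contrast between passive (naive) and agent-driven retrieval.
# Everything here is a hypothetical stand-in; no LLM or vector store is involved.

CORPUS = {
    "pepsico_q4": "PepsiCo Q4 2023 strategy focused on snacks pricing",
    "cocacola_q4": "Coca Cola Q4 2023 strategy focused on zero sugar drinks",
    "weather": "Rainfall was above average in the last quarter",
}


def keyword_retrieve(query, k=1):
    """Return the ids of the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(query_words & set(text.lower().split()))
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for overlap, doc_id in scored[:k] if overlap > 0]


def naive_rag(question, k=1):
    """Passive retrieval: a single lookup with the question exactly as asked."""
    return keyword_retrieve(question, k=k)


def plan_subqueries(question, entities):
    """Hypothetical planner: one focused sub-query per entity named in the question."""
    return [
        f"{entity} Q4 2023 strategy"
        for entity in entities
        if entity.lower() in question.lower()
    ]


def agentic_rag(question, entities, k=1):
    """Agent-style retrieval: decide what to look up, then retrieve once per sub-query."""
    retrieved = []
    for subquery in plan_subqueries(question, entities):
        for doc_id in keyword_retrieve(subquery, k=k):
            if doc_id not in retrieved:
                retrieved.append(doc_id)
    return retrieved


question = "Compare strategy of PepsiCo vs Coca Cola for Q4 2023"
naive_docs = naive_rag(question)
agent_docs = agentic_rag(question, entities=["PepsiCo", "Coca Cola"])
```

With `k=1`, the single naive lookup returns only one of the two company documents, while the agent-style loop issues one lookup per company and collects both.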

Top k retrieval in Naive RAG can fail in the following scenarios:

Summarization Questions: “Give me a summary of this document”.
Comparison Questions: “Compare business strategy of PepsiCo vs Coca Cola for the last quarter of 2023”
Multi-part Complex Queries: “Tell me about the top arguments on Retail Inflation presented in the Mint Article and tell me about the top arguments on Retail inflation on Economic Times Article. Make a comparison table based on the collected arguments and then generate the top conclusions based on these facts.”
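The summarization case is easy to quantify. The sketch below assumes a document split into equal chunks and uses a hypothetical `first_words` function in place of an LLM summarizer; it shows how little of the document a single top-k call can reach, and what visiting every chunk looks like instead.

```python
# Toy illustration: top-k retrieval only ever shows the LLM k chunks,
# so a "summarize the whole document" request misses most of the content.

def top_k_coverage(num_chunks, k):
    """Fraction of a document's chunks that a single top-k retrieval can reach."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    return min(k, num_chunks) / num_chunks


def first_words(text, n=3):
    """Hypothetical stand-in for an LLM summarizer: keep the first n words."""
    return " ".join(text.split()[:n])


def summarize_all(chunks, summarize):
    """Summarize every chunk (the 'map' step) and join the partial summaries.

    A real system would pass the joined text back to the LLM for a final merge.
    """
    return " | ".join(summarize(chunk) for chunk in chunks)


coverage = top_k_coverage(num_chunks=40, k=5)
chunks = ["battery drains fast on device", "screen flickers after update today"]
partial_summaries = summarize_all(chunks, first_words)
```

For a 40-chunk document and `k=5`, only one eighth of the chunks can influence the answer, which is why summarization questions need a strategy that touches the whole document.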


Use Cases of Agentic RAG

By incorporating AI agents into RAG, Agentic RAG can be used in a range of intelligent, multi-step reasoning systems. A few key use cases are:

Legal Research: Comparing legal documents and extracting key clauses for quick decision-making.
Market Analysis: Competitive analysis of the top brands in a product segment.
Medical Diagnosis: Comparing patient data with the latest research studies to suggest possible diagnoses.
Financial Analysis: Processing different financial reports and generating key points for better investment insights.
Compliance: Ensuring regulatory compliance by comparing policies with laws.

Building Agentic RAG with Python and CrewAI

Consider a dataset consisting of different tech products and the customer issues raised for these products as shown in the image below. You can download the dataset from here.

We can develop an Agentic RAG system to summarize the top customer complaints for each brand, such as GoPro and Microsoft, across all of its products. The following steps show how.


Step1: Install Necessary Python Libraries

Before starting with Agentic RAG, it’s crucial to install the required Python libraries, including CrewAI and LlamaIndex, to support data retrieval and agent-based tasks.

!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai
!pip install 'crewai[tools]'

Step2: Import the required Python Libraries

This step involves importing essential libraries to set up the agents and tools for implementing Agentic RAG, enabling efficient data processing and retrieval.

import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import LlamaIndexTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

Step3: Read the relevant CSV file of Customer Issues Data

Now we load the dataset containing customer issues to make it accessible for analysis, forming the basis for retrieval and summarization.

reader = SimpleDirectoryReader(input_files=["CustomerSuppTicket_small.csv"])
docs = reader.load_data()
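Before indexing, it can help to confirm what the file actually contains. The helper below is a small sketch using only the standard library; `preview_csv` is a hypothetical name, and no assumption is made about the dataset's column names.

```python
import csv


def preview_csv(path, max_rows=3):
    """Return (header, first rows) so the columns can be checked before indexing."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(max_rows), reader)]
    return header, rows
```

Calling `preview_csv("CustomerSuppTicket_small.csv")` returns the header row and the first three data rows, which is usually enough to spot an encoding problem or an unexpected delimiter.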

Step4: Define the OpenAI API key

This step sets up the OpenAI API key, which is necessary to access OpenAI’s language models for handling data queries.

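from google.colab import userdata

# Read the key from Colab's secret store instead of hard-coding it.
# Assumes a Colab secret named 'OPENAI_API_KEY' has been created.
openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ['OPENAI_API_KEY'] = openai_api_key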
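Outside Colab, a common alternative is to read the key from an environment variable and fail early when it is missing. The following is a minimal sketch of that pattern; `require_api_key` is a hypothetical helper, not part of any library.

```python
import os


def require_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in the environment, or fail early with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or store it as a notebook secret."
        )
    return key
```

Failing at this point gives a clearer error than an authentication failure raised later, in the middle of an agent run.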

Step5: LLM Initialization

Initialize the Large Language Model (LLM), which will process the query results retrieved by the Agentic RAG system, enhancing summarization and insights.

llm = OpenAI(model="gpt-4o")

Step6: Creating a Vector Store Index and Query Engine

This involves creating a vector store index and query engine, making the dataset easily searchable based on similarity, with refined results delivered by the LLM.

# Create a VectorStoreIndex from the list of documents (docs).
index = VectorStoreIndex.from_documents(docs)

# Turn the index into a query engine.
# similarity_top_k=5 limits retrieval to the 5 chunks most similar to the query;
# llm sets the model that synthesizes the final answer from those chunks.
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)
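To make the idea of similarity-based top-k retrieval concrete, here is a toy version that runs without any external service. It is only an illustration: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the sample tickets are invented for the example.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, documents, k=5):
    """Return the k documents most similar to the query, best match first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), i) for i, doc in enumerate(documents)]
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[i] for _, i in scored[:k]]


tickets = [
    "GoPro battery drains quickly",
    "LG screen flickers",
    "GoPro battery will not charge",
]
matches = top_k("GoPro battery", tickets, k=2)
```

Here the two GoPro battery tickets rank above the unrelated LG ticket. A production index does the same ranking with dense embeddings instead of word counts.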

Step7: Creating a Tool Based on the Defined Query Engine

This uses LlamaIndexTool to create a tool based on the query_engine. The tool is named “Customer Support Query Tool” and is described as a way to look up customer ticket data.

query_tool = LlamaIndexTool.from_query_engine(
    query_engine,
    name="Customer Support Query Tool",
    description="Use this tool to lookup the customer ticket data",
)
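Conceptually, a tool is little more than a callable with a name and a description that the agent's LLM can read when deciding whether to use it. The sketch below illustrates that idea with a hypothetical `SimpleTool` class and a fake query engine; it is a simplified picture, not how `LlamaIndexTool` is implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical, simplified picture of a tool: a name, a description, a callable."""

    name: str
    description: str
    func: Callable[[str], str]

    def run(self, query: str) -> str:
        """Forward the agent's query to the wrapped function."""
        return self.func(query)


def fake_query_engine(query: str) -> str:
    """Stand-in for a real query engine; it simply echoes the query."""
    return f"results for: {query}"


tool = SimpleTool(
    name="Customer Support Query Tool",
    description="Use this tool to lookup the customer ticket data",
    func=fake_query_engine,
)
answer = tool.run("GoPro battery issues")
```

Because the description is what guides tool selection, a specific description tends to matter more than the name.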

Step8: Defining the Agents

Agents are defined with specific roles and goals to perform tasks, such as data analysis and content creation, aimed at uncovering insights from customer data.

researcher = Agent(
    role="Customer Ticket Analyst",
    goal="Uncover insights about customer issues trends",
    backstory="""You work at a Product Company.
    Your goal is to understand customer issues patterns for each of the brands - 'GoPro' 'LG' 'Dell' 'Microsoft' 'Autodesk' 'Philips' 'Fitbit' 'Dyson'
    'Nintendo' 'Nest' 'Sony' 'Xbox' 'Canon' 'HP' 'Amazon' 'Lenovo' 'Adobe'
    'Google' 'PlayStation' 'Samsung' 'iPhone'.""",
    verbose=True,
    allow_delegation=False,
    tools=[query_tool],  # only the researcher queries the ticket index
)

# The writer has no tools; it works from the researcher's findings.
writer = Agent(
    role="Product Content Specialist",
    goal="""Craft compelling content on customer issues trends for each of the brands - 'GoPro' 'LG' 'Dell' 'Microsoft' 'Autodesk' 'Philips' 'Fitbit' 'Dyson'
    'Nintendo' 'Nest' 'Sony' 'Xbox' 'Canon' 'HP' 'Amazon' 'Lenovo' 'Adobe'
    'Google' 'PlayStation' 'Samsung' 'iPhone'.""",
    backstory="""You are a renowned Content Specialist, known for your insightful and engaging articles.
    You transform complex customer support data into compelling narratives.""",
    verbose=True,
    allow_delegation=False,
)

The ‘researcher’ agent plays the role of an analyst who reviews and interprets customer support data. Its goal is to “uncover insights about customer issues trends”. The backstory gives the agent context about its purpose: it acts as a support analyst at a product company, tasked with understanding customer issues for various brands (e.g., GoPro, LG, Dell). This background helps the agent focus on each brand individually as it looks for trends. The agent is also given the tool ‘query_tool’, so it can retrieve relevant customer support data and analyze it according to its goal and backstory.

The ‘writer’ agent plays the role of a content creator focused on providing product insights. Its goal is to “craft compelling content” about trends in customer issues for the listed brands. This goal guides the agent toward insights that make good narrative or analytical content. The backstory gives the agent additional context, painting it as a highly skilled content creator capable of turning data into engaging articles.
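The same brand list is repeated in several prompts, which makes the prompts easy to get out of sync. One way to avoid this, sketched below under the assumption that the prompts are assembled in plain Python before being passed to the agents, is to keep the brands in a single list and format them where needed. `BRANDS` and `brand_clause` are hypothetical names.

```python
# Single source of truth for the brands named in the prompts.
BRANDS = [
    "GoPro", "LG", "Dell", "Microsoft", "Autodesk", "Philips", "Fitbit",
    "Dyson", "Nintendo", "Nest", "Sony", "Xbox", "Canon", "HP", "Amazon",
    "Lenovo", "Adobe", "Google", "PlayStation", "Samsung", "iPhone",
]


def brand_clause(brands):
    """Format the brands as a quoted, comma-separated string for use in a prompt."""
    return ", ".join(f"'{brand}'" for brand in brands)


researcher_backstory = (
    "You work at a Product Company. "
    "Your goal is to understand customer issues patterns for each of the brands - "
    f"{brand_clause(BRANDS)}."
)
```

The resulting string can be passed as the `backstory` or `goal` argument, so adding or removing a brand requires a change in only one place.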

Step9: Creating the Tasks for the Defined Agents

Tasks are assigned to agents based on their roles, outlining specific responsibilities like data analysis and crafting narratives on customer issues.

task1 = Task(
    description="""Analyze the top customer issues for each of the brands - 'GoPro' 'LG' 'Dell' 'Microsoft' 'Autodesk' 'Philips' 'Fitbit' 'Dyson'
    'Nintendo' 'Nest' 'Sony' 'Xbox' 'Canon' 'HP' 'Amazon' 'Lenovo' 'Adobe'
    'Google' 'PlayStation' 'Samsung' 'iPhone'.""",
    expected_output="Detailed Customer Issues mentioning NAME of Brand report with trends and insights",
    agent=researcher,
)

task2 = Task(
    description="""Using the insights provided, develop an engaging blog
    post that highlights the top customer issues for each of the brands - 'GoPro' 'LG' 'Dell' 'Microsoft' 'Autodesk' 'Philips' 'Fitbit' 'Dyson'
    'Nintendo' 'Nest' 'Sony' 'Xbox' 'Canon' 'HP' 'Amazon' 'Lenovo' 'Adobe'
    'Google' 'PlayStation' 'Samsung' 'iPhone' and their pain points.

    Your post should be informative yet accessible, catering to a casual audience. Ensure that the post has NAME of the BRAND e.g. GoPro, FitBit etc.
    Make it sound cool, avoid complex words.""",
    expected_output="Full blog post in Bullet Points of customer issues. Ensure that the Blog has NAME of the BRAND e.g. GoPro, FitBit etc.",
    agent=writer,
)

Step10: Instantiating the Crew with a Sequential Process

A crew is formed with agents and tasks, and this step initiates the process, where agents collaboratively retrieve, analyze, and present data insights.

crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    process=Process.sequential,  # run task1, then task2 (CrewAI's default)
    verbose=True,  # print the agents' intermediate steps
)

result = crew.kickoff()

This code creates a Crew instance, which is a group of agents assigned specific tasks, and then initiates the crew’s work with the kickoff() method.

agents: This parameter assigns a list of agents to the crew. Here, we have two agents: researcher and writer. The researcher analyzes the customer issues for each brand, while the writer summarizes them.
tasks: This parameter provides the list of tasks that the crew should complete. The tasks run in the order listed.
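In a sequential process, each task's output becomes context for the task that follows. The toy runner below illustrates that hand-off with plain functions; `run_sequential`, `research`, and `write` are hypothetical stand-ins and do not reflect CrewAI's internals.

```python
def run_sequential(steps, initial_input=""):
    """Run each (name, function) step in order, feeding every output to the next step."""
    context = initial_input
    log = []
    for name, step in steps:
        context = step(context)
        log.append((name, context))
    return context, log


def research(_):
    """Stand-in for the researcher task: returns findings as one string."""
    return "GoPro: battery complaints; LG: screen complaints"


def write(findings):
    """Stand-in for the writer task: turns the findings into bullet points."""
    return "\n".join(f"- {item.strip()}" for item in findings.split(";"))


final_output, log = run_sequential([("researcher", research), ("writer", write)])
```

The writer never touches the raw data; it only sees what the researcher produced, which mirrors the division of labor between the two agents in this article.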

Output

As the output shows, the Agentic RAG system generated a concise, bulleted summary of customer issues across brands such as LG, Dell, and Fitbit. A per-brand summary like this is difficult to produce with a single top-k retrieval; the agents make it practical by deciding what to retrieve and how to organize the findings.
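Since the tasks ask for every brand to be named, a quick check on the generated text can catch brands the agents skipped. The sketch below assumes the final output is available as a plain string (for example via `str(result)`); `missing_brands` is a hypothetical helper, and the sample report is invented for the example.

```python
def missing_brands(report, brands):
    """Return the brands that never appear in the report (case-insensitive).

    This is a plain substring check, so a short name such as 'LG' or 'HP'
    can also match inside a longer word; treat the result as a rough guide.
    """
    text = report.lower()
    return [brand for brand in brands if brand.lower() not in text]


sample_report = "- GoPro: battery drains quickly\n- Fitbit: sync failures"
skipped = missing_brands(sample_report, ["GoPro", "Fitbit", "Dyson"])
```

If the returned list is not empty, the crew can be re-run or the task descriptions tightened.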

Conclusion

Agentic RAG is a major step forward in Retrieval-Augmented Generation. It blends RAG’s retrieval power with autonomous agents’ decision-making ability. This hybrid model goes beyond Naive RAG, tackling complex questions and comparative analysis. Across industries, it provides more insightful, accurate responses. Using Python and CrewAI, developers can now create Agentic RAG systems for smarter, data-driven decisions.

Key Takeaways

Agentic RAG integrates autonomous agents, adding a layer of dynamic decision-making that goes beyond simple retrieval.
Agentic RAG leverages agents to tackle complex queries, including summarization, comparison, and multi-part reasoning. This capability addresses limitations where Naive RAG typically falls short.
Agentic RAG is valuable in fields like legal research, medical diagnosis, financial analysis, and compliance monitoring. It provides nuanced insights and enhanced decision-making support.
Using CrewAI, Agentic RAG can be effectively implemented in Python, demonstrating a structured approach for multi-agent collaboration to tackle intricate customer support analysis tasks.
Agentic RAG’s flexible agent-based architecture makes it well-suited to complex data retrieval and analysis in diverse use cases, from customer service to advanced analytics.

Frequently Asked Questions

Q1. What makes Agentic RAG different from Naive RAG?

A. Agentic RAG incorporates autonomous agents that actively manage data retrieval and decision-making, while Naive RAG simply retrieves information on request without additional reasoning capabilities.

Q2. Why does Naive RAG struggle with complex queries?

A. Naive RAG’s passive retrieval approach is limited to direct responses, which makes it ineffective for summarization, comparison, or multi-part queries that need iterative reasoning or layered information retrieval.

Q3. How is Agentic RAG applied in real-world scenarios?

A. Agentic RAG is valuable for tasks that require multi-step reasoning, such as legal research, market analysis, medical diagnosis, financial insights, and ensuring compliance through policy comparison.

Q4. Can I implement Agentic RAG using Python?

A. Yes, you can implement Agentic RAG in Python, particularly using libraries like CrewAI. This helps set up and manage agents that collaborate to retrieve, analyze, and summarize data.

Q5. What industries can benefit the most from Agentic RAG?

A. Industries with complex data processing needs, such as law, healthcare, finance, and customer support, stand to benefit the most from Agentic RAG’s intelligent data retrieval and decision-making capabilities.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Nibedita Dutta

Nibedita completed her master’s in Chemical Engineering from IIT Kharagpur in 2014 and is currently working as a Senior Data Scientist. In her current capacity, she works on building intelligent ML-based solutions to improve business processes.
