In today’s highly competitive market, businesses strive to understand and resolve consumer complaints effectively. Consumer complaints can shed light on a wide range of issues, from product defects and poor customer service to billing errors and safety concerns. They play a crucial role in the feedback loop between businesses and their customers about products, services, and experiences. Analysing and understanding these complaints can provide valuable insights into product or service improvements, customer satisfaction, and overall business growth. In this article, we will explore how to leverage the Doctran Python library to analyse consumer complaints, extract insights, and make data-driven decisions.
In this article, you will:
This article was published as a part of the Data Science Blogathon.
Doctran is a Python library designed for document transformation and analysis. It provides a set of functions to pre-process text data, extract key information, categorize or classify content, interrogate and summarize information, and translate text into other languages. Doctran uses LLMs (Large Language Models), such as OpenAI’s GPT-based models, together with open-source NLP libraries to dissect textual data.
It supports the following six types of document transformations: extract, redact, summarize, refine, translate, and interrogate.
This integration is also available in the LangChain framework, inside the document_transformers module. LangChain is a framework for building LLM-powered applications.
LangChain provides the flexibility to explore and use a wide range of open-source and closed-source LLMs. It lets you connect to diverse external data sources such as PDFs, text files, Excel spreadsheets, and PPTs. It also lets you experiment with different prompts, engage in prompt engineering, leverage built-in chains and agents, and more.
Within the document_transformers module of LangChain, there are three implementations: DoctranPropertyExtractor, DoctranQATransformer, and DoctranTextTranslator. These are used for the Extract, Interrogate, and Translate document transformations, respectively.
Doctran can be installed with pip:
pip install doctran
Now that we have introduced the doctran library, let’s explore the different types of document transformations it offers, using the consumer complaint below, enclosed in triple backticks (```).
```
November 26, 2021
The Manager
Customer Service Department
Taurus Shop
New Delhi – 110023
Subject: Complaint about defective ‘VIP’ washing machine
Dear Sir,
I had purchased an automatic washing machine on 15 July 2022, model no. G 24 and the invoice no. is 1598.
Last week, the machine stopped working abruptly and has not been working since then despite all our efforts. The machine stops running after the rinsing process is completed, causing a lot of problems. Moreover, the machine since the last day or so has also started making loud noises, creating inconvenience for us.
Please send your technician to repair it and if needed get it replaced within the following week.
Hoping for an early response
Yours truly
```
To perform document transformations with doctran, we first need to convert the raw text into a doctran document. A doctran document is a fundamental data type that is optimized for vector search. It represents a piece of unstructured data and consists of raw content and associated metadata.
Instantiate a Doctran object by passing your OpenAI API key in the openai_api_key parameter. Next, parse the raw content into a doctran document by calling the parse() method on the Doctran object.
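import os
from doctran import Doctran
# Read the API key from the environment instead of hard-coding it
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
sample_complain = """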
November 26, 2021
The Manager
Customer Service Department
Taurus Shop
New Delhi – 110023
Subject: Complaint about defective ‘VIP’ washing machine
Dear Sir,
I had purchased an automatic washing machine on 15 July 2022,
model no. G 24 and the invoice no. is 1598.
Last week, the machine stopped working abruptly and has not been working
since then despite all our efforts.
The machine stops running after the rinsing process is completed,
causing a lot of problems.
Moreover, the machine since the last day or so has also started making loud noises,
creating inconvenience for us.
Please send your technician to repair it and if needed get it replaced within the following week.
Hoping for an early response
Yours truly
"""
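# Create the Doctran client and wrap the raw text in a doctran document
doctran = Doctran(openai_api_key=OPENAI_API_KEY)
document = doctran.parse(content=sample_complain)
# The parsed document keeps the original text in raw_content
print(document.raw_content)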
Output: the complaint text, printed back unchanged.
One of the primary functions of doctran is to extract key properties from a document. Internally, it makes use of OpenAI function calling to extract properties (data points) from a document. It uses the OpenAI GPT-4 model with a token limit of 8,000 tokens.
GPT-4, short for Generative Pre-trained Transformer 4, is a multimodal large language model developed by OpenAI. In comparison to its predecessors, GPT-4 demonstrates an enhanced capability to tackle complex tasks. Additionally, it can use visual inputs (such as images, charts, and memes) alongside text. The model has achieved human-level performance on a variety of professional and academic benchmarks, including the Uniform Bar Exam.
We need to define a schema by instantiating the ExtractProperty class for each property that we want to extract. The schema comprises several key elements: a property name, a description, a data type, a list of selectable values, and a required flag, which is a boolean indicator.
Here, we have specified four properties – Category, Sentiment, Aggressiveness and Language.
from doctran import ExtractProperty

properties = [
    ExtractProperty(
        name="Category",
        description="What type of consumer complaint this is",
        type="string",
        enum=["Product or Service", "Wait Time", "Delivery", "Communication Gap", "Personnel"],
        required=True,
    ),
    ExtractProperty(
        name="Sentiment",
        description="Assess the polarity/sentiment",
        type="string",
        enum=["Positive", "Negative", "Neutral"],
        required=True,
    ),
    ExtractProperty(
        name="Aggressiveness",
        description=(
            "describes how aggressive the complaint is, "
            "the higher the number the more aggressive"
        ),
        type="number",
        enum=[1, 2, 3, 4, 5],
        required=True,
    ),
    ExtractProperty(
        name="Language",
        type="string",
        description="source language",
        enum=["English", "Hindi", "Spanish", "Italian", "German"],
        required=True,
    ),
]
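Because the extraction relies on OpenAI function calling, a property list like this one is ultimately described to the model as a JSON Schema for a function's arguments. The sketch below builds such a schema from plain dictionaries. It illustrates the general idea only and is not Doctran's actual internal code; the helper `to_function_schema` and the function name `extract_information` are hypothetical.

```python
import json

# Plain-dict stand-ins for two of the properties defined in the article.
property_specs = [
    {
        "name": "Sentiment",
        "description": "Assess the polarity/sentiment",
        "type": "string",
        "enum": ["Positive", "Negative", "Neutral"],
        "required": True,
    },
    {
        "name": "Aggressiveness",
        "description": "How aggressive the complaint is (higher is more aggressive)",
        "type": "number",
        "enum": [1, 2, 3, 4, 5],
        "required": True,
    },
]


def to_function_schema(specs, function_name="extract_information"):
    """Build a function-calling style JSON Schema (hypothetical helper)."""
    parameters = {"type": "object", "properties": {}, "required": []}
    for spec in specs:
        entry = {"type": spec["type"], "description": spec["description"]}
        if spec.get("enum"):
            # Restrict the model to the selectable values.
            entry["enum"] = spec["enum"]
        parameters["properties"][spec["name"]] = entry
        if spec.get("required"):
            parameters["required"].append(spec["name"])
    return {"name": function_name, "parameters": parameters}


schema = to_function_schema(property_specs)
print(json.dumps(schema, indent=2))
```

Seen this way, the enum list and the required flag are what constrain the model to a fixed set of labels, which is what makes the results easy to aggregate later.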
To retrieve the properties, we can call the extract() function on the document. This function takes the properties as a parameter.
extracted_doc = await document.extract(properties=properties).execute()
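A top-level `await` like the one above works only where an event loop is already running, for example in a Jupyter notebook. In a plain Python script, one option is to wrap the call in a coroutine and start it with `asyncio.run()`. The sketch below shows that pattern. `FakeDocument` and `FakeBuilder` are hypothetical stand-ins so the example runs without an API key; with Doctran you would pass the real `document` and `properties` instead.

```python
import asyncio


class FakeBuilder:
    """Hypothetical stand-in mimicking the chained .execute() call."""

    def __init__(self, result):
        self._result = result

    async def execute(self):
        await asyncio.sleep(0)  # stands in for waiting on the LLM response
        return self._result


class FakeDocument:
    """Hypothetical stand-in for a doctran document."""

    def extract(self, properties):
        # Echo back the requested property names instead of calling an LLM.
        return FakeBuilder({"requested": [p["name"] for p in properties]})


async def run_extraction(document, properties):
    # Same call shape as in the article:
    # await document.extract(properties=properties).execute()
    return await document.extract(properties=properties).execute()


# asyncio.run() creates the event loop, runs the coroutine, and closes the loop.
result = asyncio.run(run_extraction(FakeDocument(), [{"name": "Category"}]))
print(result)
```

The same wrapper pattern applies to the other transformations in this article, since each of them ends in an awaited `execute()` call.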
The extract operation returns a new document, with the extracted properties available in its extracted_properties attribute.
print(extracted_doc.extracted_properties)
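Output: a dictionary with one entry per property defined in the schema (Category, Sentiment, Aggressiveness, and Language).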
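Since the values come from an LLM, it can be worth checking them against the allowed options before they feed into reports. The sketch below validates a properties dictionary and then tallies categories across several complaints. The sample dictionaries are hand-written for illustration and are not actual Doctran output; `ALLOWED` and `invalid_fields` are hypothetical names.

```python
from collections import Counter

# Allowed values, mirroring the enum lists in the article's schema.
ALLOWED = {
    "Category": ["Product or Service", "Wait Time", "Delivery",
                 "Communication Gap", "Personnel"],
    "Sentiment": ["Positive", "Negative", "Neutral"],
    "Aggressiveness": [1, 2, 3, 4, 5],
    "Language": ["English", "Hindi", "Spanish", "Italian", "German"],
}


def invalid_fields(props):
    """Return names of properties that are missing or outside the enum."""
    return sorted(
        name for name, allowed in ALLOWED.items()
        if props.get(name) not in allowed
    )


# Hand-written examples, NOT real model output.
batch = [
    {"Category": "Product or Service", "Sentiment": "Negative",
     "Aggressiveness": 2, "Language": "English"},
    {"Category": "Delivery", "Sentiment": "Negative",
     "Aggressiveness": 4, "Language": "English"},
    {"Category": "Refund", "Sentiment": "Negative",
     "Aggressiveness": 3, "Language": "English"},  # "Refund" is not in the enum
]

# Keep only complaints whose properties all pass validation, then count.
valid = [p for p in batch if not invalid_fields(p)]
category_counts = Counter(p["Category"] for p in valid)

print(invalid_fields(batch[2]))
print(category_counts.most_common())
```

Counts like these, collected over many complaints, are the kind of aggregate that supports the data-driven decisions this article is aiming for.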
Doctran allows us to convert the content within a document into a Q&A format. User queries are typically phrased as questions. So, to improve search results when using a vector database, it can be helpful to transform the information into questions. Creating indexes from these questions allows for better context retrieval compared to indexing the original text.
To interrogate the document, use the built-in interrogate() function. It returns a new document, and the generated Q&A pairs are available inside its extracted_properties attribute.
interrogated_doc = await document.interrogate().execute()
print(interrogated_doc.extracted_properties['questions_and_answers'])
Output: a list of question-and-answer pairs generated from the complaint.
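To make the indexing idea concrete, the sketch below matches a user query against generated questions rather than against the raw text. It uses a simple bag-of-words cosine similarity from the standard library as a stand-in for a real embedding model and vector database. The Q&A pairs are hand-written for illustration, and the `question` and `answer` keys are an assumption about the shape of `questions_and_answers`.

```python
import math
import re
from collections import Counter

# Hand-written pairs for illustration, NOT real model output.
qa_pairs = [
    {"question": "What product is the complaint about?",
     "answer": "A 'VIP' automatic washing machine, model no. G 24."},
    {"question": "What is the invoice number?",
     "answer": "1598"},
    {"question": "What does the customer want the shop to do?",
     "answer": "Send a technician to repair or replace the machine."},
]


def vectorize(text):
    """Lower-case bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index the generated questions, not the original text.
index = [(vectorize(pair["question"]), pair) for pair in qa_pairs]


def best_answer(query):
    """Return the answer whose question is most similar to the query."""
    query_vec = vectorize(query)
    _, best = max(index, key=lambda item: cosine(query_vec, item[0]))
    return best["answer"]


print(best_answer("what is the invoice number"))
```

Because the query and the indexed entries are both phrased as questions, they share wording and structure, which is the intuition behind indexing Q&A pairs instead of raw paragraphs.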
Using doctran, we can also generate a concise and meaningful summary of the original text. Invoke the summarize() function to summarize the document, and specify token_limit to configure the size of the summary.
summarized_doc = await document.summarize(token_limit=30).execute()
print(summarized_doc.transformed_content)
Output: a short summary of the complaint, sized according to the requested token limit.
Translating documents into other languages can be helpful especially when users are expected to query the knowledge base in different languages, or when state-of-the-art embedding models are not available for a given language.
For our consumer complaints use case, language translation can be useful for global businesses with multilingual customer bases. Using the built-in translate() function, we can translate the information into other languages such as Hindi, Spanish, Italian, and German.
translated_doc = await document.translate(language="hindi").execute()
print(translated_doc.transformed_content)
Output: the complaint text translated into Hindi.
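For a multilingual customer base you may want several target languages at once. Since each transformation ends in an awaited `execute()` call, one possible approach is to start all translations together with `asyncio.gather()`. The sketch below uses hypothetical stand-ins (`FakeDocument`, `FakeTranslation`) that tag the text instead of translating it and return a plain string rather than a document, so it runs without an API key. Whether concurrent calls on a single Doctran document behave well is an assumption the article does not confirm.

```python
import asyncio


class FakeTranslation:
    """Hypothetical stand-in for the object returned by translate()."""

    def __init__(self, content, language):
        self.content = content
        self.language = language

    async def execute(self):
        await asyncio.sleep(0)  # stands in for the network round trip
        # Tag the text with the language instead of really translating it.
        return f"[{self.language}] {self.content}"


class FakeDocument:
    """Hypothetical stand-in for a doctran document."""

    def __init__(self, raw_content):
        self.raw_content = raw_content

    def translate(self, language):
        return FakeTranslation(self.raw_content, language)


async def translate_all(document, languages):
    # Create one coroutine per language and wait for all of them together.
    tasks = [document.translate(language=lang).execute() for lang in languages]
    results = await asyncio.gather(*tasks)
    # gather() preserves input order, so results line up with languages.
    return dict(zip(languages, results))


languages = ["hindi", "spanish", "german"]
translations = asyncio.run(
    translate_all(FakeDocument("The machine stopped working."), languages)
)
print(translations["spanish"])
```

With real API calls, running the requests concurrently mainly saves waiting time; rate limits and cost per call still apply to each translation.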
In the era of data-driven decision-making, consumer complaint analysis is a vital process that can lead to improved products and services and ultimately result in higher customer satisfaction. Using LLMs and advanced NLP tools, we can convert raw textual data into actionable insights that drive business growth and improvement. In this article, we discussed doctran and the different types of document transformations it supports, using a consumer complaint as a running example.
A: The primary purpose of the doctran Python library is to perform document transformation and analysis. It offers a set of functions to pre-process text data, extract valuable information, categorize and classify content, and translate text into different languages. It uses Large Language Models (LLMs) like OpenAI’s GPT-based models to dissect textual data.
A: Doctran can extract key properties from documents by using OpenAI’s GPT-4 model. These properties are defined in a schema and can be retrieved using the extract() function. Some examples are extracting category, sentiment, aggressiveness, language from the raw text.
A: Converting document content into a question-and-answer format using Doctran’s interrogation feature improves information retrieval. It allows for better context retrieval compared to indexing the original text, making it more suitable for search engines. The built-in interrogate() function transforms the document into a Q&A format, enhancing search results.
A: Language translation is crucial in consumer complaint analysis, particularly for businesses with multilingual customer bases. This feature ensures that information is accessible to a global audience. Doctran supports language translation using the built-in translate() function, enabling documents to be translated into various languages such as Hindi, Spanish, Italian, German, and more.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.