Customer support calls hold a wealth of information, but finding the time to manually comb through these recordings for insights isn’t easy. Imagine if you could instantly turn these long recordings into clear summaries, track how the sentiment shifts throughout the call, and even get tailored insights based on how you want to analyze the conversation. Sounds useful, right?
In this article, we’ll walk through SnapSynapse, a practical tool I built to do exactly that! Using pyannote.audio for speaker diarization (identifying who is speaking when), Whisper for transcription, and Gemini 1.5 Pro for AI-driven summaries, I’ll show how you can automate the process of turning support call recordings into actionable insights. Along the way, you’ll see how to clean and refine transcriptions, generate custom summaries based on user input, and track sentiment trends—all with easy-to-follow code snippets. This is a hands-on guide to building a tool that goes beyond transcription to help you understand and improve your customer support experience.
This article was published as a part of the Data Science Blogathon.
SnapSynapse is a handy tool for turning customer support calls into valuable insights. It breaks down conversations by speaker, transcribes everything, and highlights the overall mood and key points, so teams can quickly understand what customers need. Using models like Pyannote for diarization, Whisper for transcription, and Gemini for summaries, SnapSynapse delivers clear summaries and sentiment trends without any hassle. It’s designed to help support teams connect better with customers and improve service, one conversation at a time.
In this section, we’ll explore the key features that make SnapSynapse a powerful tool for customer support analysis. From automatically diarizing and transcribing calls to generating dynamic conversation summaries, these features are built to enhance support team efficiency. With its ability to detect sentiment trends and provide actionable insights, SnapSynapse simplifies the process of understanding customer interactions.
If you want to check out the whole source code, refer to the files in the repo: repo_link
We’ll need the OpenAI API and the Gemini API to run this project. You can get the API keys here – Gemini API, OpenAI API
Project flow:
speaker diarization -> transcription -> time stamps -> cleaning -> summarization -> sentiment analysis
In the first step, we’ll use a single script to take an audio file, separate the speakers (diarization), generate a transcription, and assign timestamps. Here’s how the script works, including a breakdown of the code and key functions:
This Python script performs three main tasks in one go: speaker diarization, transcription with Whisper, and timestamp assignment for each speaker segment.
The core function, transcribe_with_diarization(), combines all the steps:
from openai import OpenAI
from pydub import AudioSegment

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_with_diarization(file_path):
    # perform_diarization() (defined alongside this function) runs
    # pyannote's speaker-diarization pipeline on the audio file
    diarization_result = perform_diarization(file_path)
    audio = AudioSegment.from_file(file_path)
    transcriptions = []

    # Walk through each diarized speaker turn
    for segment, _, speaker in diarization_result.itertracks(yield_label=True):
        start_time_ms = int(segment.start * 1000)
        end_time_ms = int(segment.end * 1000)

        # Slice out this speaker turn and save it as a wav chunk
        chunk = audio[start_time_ms:end_time_ms]
        chunk_filename = f"{speaker}_segment_{int(segment.start)}.wav"
        chunk.export(chunk_filename, format="wav")

        # Transcribe the chunk with Whisper
        with open(chunk_filename, "rb") as audio_file:
            transcription = client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="json"
            )

        transcriptions.append({
            "speaker": speaker,
            "start_time": segment.start,
            "end_time": segment.end,
            "transcription": transcription.text
        })
        print(f"Transcription for {chunk_filename} by {speaker} completed.")

    return transcriptions
A glimpse of the output generated and saved in the diarized_transcription.py file:
Next, we clean the raw transcription by stripping out filler words and extra whitespace:

import re

# Function to clean the transcription text
def clean_transcription(text):
    # List of common filler words
    filler_words = [
        "um", "uh", "like", "you know", "actually", "basically", "I mean",
        "sort of", "kind of", "right", "okay", "so", "well", "just"
    ]
    # Regex pattern to match filler words (case insensitive)
    filler_pattern = re.compile(r'\b(' + '|'.join(filler_words) + r')\b', re.IGNORECASE)
    # Remove filler words
    cleaned_text = filler_pattern.sub('', text)
    # Collapse the extra whitespace left behind
    cleaned_text = re.sub(r'\s+', ' ', cleaned_text).strip()
    return cleaned_text
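For example, on an illustrative snippet of raw transcript text:

# Illustrative input, not from a real call
raw = "um so I was basically trying to reset my password"
print(clean_transcription(raw))
# -> "I was trying to reset my password"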
In the next step, we use the Gemini API to generate structured insights and summaries from the cleaned transcriptions. We use the Gemini 1.5 Pro model to analyze customer support calls and produce actionable summaries.
Here’s a breakdown of the functionality:
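Here is a minimal sketch of this step, assuming the google-generativeai SDK; the prompt text and the generate_summary function name are illustrative, not the repo’s exact ones:

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-pro")

def generate_summary(cleaned_transcript, focus="key points and action items"):
    # Build a prompt around the user's chosen focus area
    prompt = (
        f"Summarize this customer support call, focusing on {focus}:\n\n"
        f"{cleaned_transcript}"
    )
    response = model.generate_content(prompt)
    return response.text

In SnapSynapse, the focus is driven by one of the five prompt options the user picks, such as action items, escalation needs, or technical issues.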
A short glimpse of different prompts used:
A glimpse of the output generated:
In the next step, we perform sentiment analysis on the customer support call transcription to assess the emotional tone throughout the conversation, using the VADER sentiment analysis tool from NLTK to determine sentiment scores for each segment of the conversation.
Here’s a breakdown of the process:
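Below is a minimal sketch of the per-segment scoring that feeds the aggregation code that follows, assuming NLTK’s VADER and that each segment has already been labeled as Customer or Agent; the variable names mirror the snippet below:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

sentiment_results = []
total_compound = 0.0
customer_sentiment = agent_sentiment = 0.0
customer_count = agent_count = 0

# Score each transcribed segment and tally per-speaker totals
for entry in transcriptions:
    scores = sia.polarity_scores(entry["transcription"])
    sentiment_results.append({"speaker": entry["speaker"], "sentiment": scores})
    total_compound += scores["compound"]
    if entry["speaker"] == "Customer":
        customer_sentiment += scores["compound"]
        customer_count += 1
    else:
        agent_sentiment += scores["compound"]
        agent_count += 1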
# Calculate the overall sentiment score
overall_sentiment_score = total_compound / len(sentiment_results)
# Calculate average sentiment for Customer and Agent
average_customer_sentiment = customer_sentiment / customer_count if customer_count else 0
average_agent_sentiment = agent_sentiment / agent_count if agent_count else 0
# Determine the overall sentiment as positive, neutral, or negative
if overall_sentiment_score > 0.05:
overall_sentiment = "Positive"
elif overall_sentiment_score < -0.05:
overall_sentiment = "Negative"
else:
overall_sentiment = "Neutral"
import matplotlib.pyplot as plt

def plot_sentiment_trend(sentiment_results):
    # Extract compound sentiment scores for plotting
    compound_scores = [entry['sentiment']['compound'] for entry in sentiment_results]

    # Create a single line plot showing the sentiment trend
    plt.figure(figsize=(12, 6))
    plt.plot(compound_scores, color='purple', linestyle='-', marker='o', markersize=5, label="Sentiment Trend")
    plt.axhline(0, color='grey', linestyle='--')  # Zero line for neutral sentiment
    plt.title("Sentiment Trend Over the Customer Support Conversation", fontsize=16, fontweight='bold', color="darkblue")
    plt.xlabel("Segment Index")
    plt.ylabel("Compound Sentiment Score")
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.legend()
    plt.show()
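The trend can then be rendered directly from the per-segment scores computed earlier:

# Visualize how sentiment evolves over the call
plot_sentiment_trend(sentiment_results)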
Sentiment Analysis Scores generated:
Sentiment Analysis Plot generated:
You can find the code repository here – repo_link
Now, let’s walk through how to set up and run SnapSynapse on your local machine:
Start by cloning the project repository to your local machine to begin using SnapSynapse. This provides access to the application’s source code and all its essential components.
git clone https://github.com/Keerthanareddy95/SnapSynapse.git
cd SnapSynapse
A virtual environment helps isolate dependencies and ensures your project runs smoothly. This step sets up an independent workspace for SnapSynapse to operate without interference from other packages.
# For Windows:
python -m venv venv
# For macOS and Linux:
python3 -m venv venv
# For Windows:
.\venv\Scripts\activate
# For macOS and Linux:
source venv/bin/activate
With the virtual environment in place, the next step is to install all necessary libraries and tools. These dependencies enable the core functionalities of SnapSynapse, including transcript generation, speaker diarization, timestamp generation, summary generation, sentiment analysis scoring, visualization, and more.
pip install -r requirements.txt
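For reference, the requirements file covers the stack described in this article, roughly the following packages (the repo’s exact entries and pinned versions may differ):

pyannote.audio
openai
google-generativeai
pydub
nltk
matplotlib
python-dotenv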
To enable AI-driven diarization, transcription, and summarization, you’ll need to configure API keys for Google Gemini and OpenAI Whisper.
Create a .env file in the root of the project and add your API keys for Google Gemini and OpenAI:
GOOGLE_API_KEY="your_google_api_key"
OPENAI_API_KEY="your_open_ai_api_key"
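A minimal sketch of how these keys are then read at runtime, assuming python-dotenv is used to load the .env file:

import os
from dotenv import load_dotenv

# Load the variables defined in .env into the process environment
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")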
The generated summary is saved in the summary_output.json file. Let us now look at the tools used in the development of SnapSynapse:
In a nutshell, SnapSynapse revolutionizes customer support analysis by transforming raw call recordings into actionable insights. From speaker diarization and transcription to generating a structured summary and sentiment analysis, SnapSynapse streamlines every step to deliver a comprehensive view of customer interactions. With the power of the Gemini model’s tailored prompts and detailed sentiment tracking, users can easily obtain summaries and trends that highlight key insights and support outcomes.
A big shoutout to Google Gemini, Pyannote Audio, and Whisper for powering this project with their innovative tools!
You can check out the repo here.
A. SnapSynapse can handle audio files of the formats mp3 and wav.
A. SnapSynapse uses Whisper for transcription, followed by a cleanup process that removes filler words, pauses, and irrelevant content.
A. Yes! SnapSynapse offers five distinct prompt options, allowing you to choose a summary format tailored to your needs. These include focus areas like action items, escalation needs, and technical issues.
A. SnapSynapse’s sentiment analysis assesses the emotional tone of the conversation, providing a sentiment score and a trend graph.
A. Customer Call Analysis uses AI-powered tools to transcribe, analyze, and extract valuable insights from customer interactions, helping businesses improve service, identify trends, and enhance customer satisfaction.
A. By customer call analysis, businesses can gain a deeper understanding of customer sentiment, common issues, and agent performance, leading to more informed decisions and improved customer service strategies.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.