Build Your Own Desktop Voice Assistant in Python

Last Updated: 08 Dec, 2020


This article was published as a part of the Data Science Blogathon.

Introduction

How cool would it be to build your own personal assistant like Alexa or Siri? It is not very complicated and can be done in Python. Personal digital assistants have been capturing a lot of attention lately, and chatbots are common on most commercial websites. With growing advancements in artificial intelligence, training machines to tackle day-to-day tasks is becoming the norm.

Voice-based personal assistants have gained a lot of popularity in this era of smart homes and smart devices. These assistants can be easily configured to perform many of your regular tasks through simple voice commands. Google has popularized voice-based search, which is a boon for many people, such as senior citizens, who are not comfortable using a keypad or keyboard.

This article walks you through the steps to quickly develop a voice-based desktop assistant, Minchu (meaning Flash), that you can deploy on your desktop. The prerequisite for developing this application is knowledge of Python.

For building any voice-based assistant you need two main functions: one for listening to your commands and another for responding to them. Along with these two core functions, you need the customized instructions that you will feed your assistant.

The first step is to install and import all the necessary libraries. Use pip install to install the libraries before importing them. Following are some of the key libraries used in this program:

The SpeechRecognition library allows Python to access audio from your system's microphone and transcribe it to text. Microphone input additionally requires the PyAudio package.
Google's text-to-speech package, gTTS, converts text to spoken audio. The response from the look-up function that you write for fetching the answer to a question is converted to an audio phrase by gTTS and saved as an MP3 file. This package interfaces with Google Translate's text-to-speech API.
The playsound package is used to give voice to the answer. It allows Python to play MP3 files.
The webbrowser module, part of Python's standard library, provides a high-level interface for displaying web pages to users. Selenium is another option for displaying web pages. However, to use it you need to install the browser-specific web driver and provide its location.
The wikipedia package is used to fetch a variety of information from the Wikipedia website.
Wolfram|Alpha is a computational knowledge engine, or answer engine, that can answer mathematical questions using Wolfram's knowledge base and AI technology. You need to obtain an API key (App ID) to use the wolframalpha package.
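
Before importing these libraries, it can help to confirm that they are installed. The following is a minimal sketch, not part of the original program: the helper name missing_packages and the REQUIRED mapping are illustrative, and PyAudio is included on the assumption that you will use microphone input.

```python
import importlib.util

# Import name -> pip package name. PyAudio is an assumption added here
# because sr.Microphone depends on it.
REQUIRED = {
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",
    "gtts": "gTTS",
    "playsound": "playsound",
    "wikipedia": "wikipedia",
    "wolframalpha": "wolframalpha",
    "selenium": "selenium",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Running the script prints a ready-to-copy pip command for anything that is missing.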

Implementation of the Personal Assistant

The entire code for this application is written in Python using the libraries described above.

Import required libraries:

import speech_recognition as sr # convert speech to text
import datetime # for fetching date and time
import time # for pausing after opening a web page
import wikipedia
import webbrowser
import requests
import playsound # to play saved mp3 file
from gtts import gTTS # google text to speech
import os # to save/open files
import wolframalpha # to answer computational questions
from selenium import webdriver # to control browser operations

Write a function to capture your requests/questions:

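def talk():
    recognizer = sr.Recognizer()  # avoid shadowing the built-in input()
    data = ""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
        try:
            # send the recorded audio to Google's speech recognition service
            data = recognizer.recognize_google(audio)
            print("Your question is, " + data)
        except sr.UnknownValueError:
            print("Sorry, I did not hear your question. Please repeat.")
        except sr.RequestError:
            print("Sorry, the speech recognition service is unavailable.")
    # an empty string means nothing was recognized
    return data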
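
Because talk() hands back an empty string when nothing is recognized, the caller can retry a few times before giving up. The helper below is a hypothetical sketch (the name ask_until_heard and the attempts parameter are not part of the original program); it accepts any zero-argument function that returns text, so it can be tried without a microphone.

```python
def ask_until_heard(listen, attempts=3):
    """Call listen() until it returns non-empty text or attempts run out."""
    for _ in range(attempts):
        text = listen().strip()
        if text:
            return text.lower()
    return ""  # nothing was recognized in any attempt
```

In the assistant, this could be used as text = ask_until_heard(talk).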

Next, write a function to respond to your questions:

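def respond(output):
    print(output)
    # convert the text to speech and save it as a temporary mp3 file
    response = gTTS(text=output, lang='en')
    file = "response.mp3"
    response.save(file)
    # play the file, waiting until playback finishes, then delete it
    playsound.playsound(file, True)
    os.remove(file)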
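
If playback fails partway, the MP3 file can be left behind in the working directory. One possible variation, sketched here under the assumption that a temporary directory is acceptable on your system, writes the file into a directory that is removed automatically. The name respond_safely and its synthesize and play parameters are hypothetical; they default to gTTS and playsound and exist mainly so the function can be tried without audio hardware.

```python
import os
import tempfile

def respond_safely(output, synthesize=None, play=None):
    """Speak output using a temp file that is cleaned up even on errors."""
    if synthesize is None:
        from gtts import gTTS  # imported lazily so offline tests can skip it

        def synthesize(text, path):
            gTTS(text=text, lang="en").save(path)
    if play is None:
        import playsound
        play = playsound.playsound  # blocks until playback ends by default

    print(output)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "response.mp3")
        synthesize(output, path)
        play(path)
    # the directory and the mp3 inside it no longer exist at this point
    return path
```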

Now write the module to add all the required customized responses to your questions:

if __name__=='__main__':
    respond("Hi, I am Minchu your personal desktop assistant")

    while True:
        respond("How can I help you?")
        text = talk().lower()

        # talk() returns an empty string when nothing was recognized
        if not text:
            continue

        if "stop" in text or "exit" in text or "bye" in text:
            respond("Ok bye and take care")
            break

        if 'wikipedia' in text:
            respond('Searching Wikipedia')
            text = text.replace("wikipedia", "")
            results = wikipedia.summary(text, sentences=3)
            respond("According to Wikipedia")
            respond(results)  # respond() also prints the text

        elif 'time' in text:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")
            respond(f"the time is {strTime}")

        elif 'search' in text:
            text = text.replace("search", "").strip()
            # open a Google results page for the remaining words
            webbrowser.open_new_tab("https://www.google.com/search?q=" + "+".join(text.split()))
            time.sleep(5)

        # each keyword needs its own "in" test; a bare string is always true
        elif "calculate" in text or "what is" in text:
            question = talk()  # listen again for the actual question
            app_id = "Mention your API Key"
            client = wolframalpha.Client(app_id)
            res = client.query(question)
            answer = next(res.results).text
            respond("The answer is " + answer)

        elif 'open google' in text:
            webbrowser.open_new_tab("https://www.google.com")
            respond("Google is open")
            time.sleep(5)

        elif 'youtube' in text:
            # positional driver path works in Selenium 3;
            # Selenium 4 takes the path through a Service object
            driver = webdriver.Chrome(r"Mention your webdriver location")
            driver.implicitly_wait(1)
            driver.maximize_window()
            respond("Opening in youtube")
            # everything said after the word "youtube" becomes the query
            indx = text.split().index('youtube')
            query = text.split()[indx + 1:]
            driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))

        elif "open word" in text:
            respond("Opening Microsoft Word")
            # os.startfile is available on Windows only
            os.startfile('Mention location of Word in your system')

        else:
            respond("Application not available")
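
As the list of commands grows, the chain of keyword checks can be expressed as a lookup table. The sketch below is one possible arrangement, not the author's implementation: the names COMMANDS and match_command are hypothetical, and the order of the entries decides which command wins when several keywords appear. It also sidesteps a common pitfall: a non-empty string literal is always truthy in Python, so a condition of the form "a" or "b" in text matches every input, whereas any(...) tests each keyword separately.

```python
# (keywords, command name) pairs; earlier entries take priority.
COMMANDS = [
    (("stop", "exit", "bye"), "quit"),
    (("wikipedia",), "wikipedia"),
    (("time",), "time"),
    (("open google",), "google"),
    (("youtube",), "youtube"),
    (("open word",), "word"),
    (("search",), "search"),
    (("calculate", "what is"), "calculate"),
]

def match_command(text):
    """Return the name of the first command whose keyword occurs in text."""
    text = text.lower()
    for keywords, name in COMMANDS:
        if any(keyword in text for keyword in keywords):
            return name
    return None  # no keyword matched
```

Matching is by substring, as in the original loop, so a word such as "sometimes" would still trigger the time command; splitting the text into words first would be stricter.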

Once all the modules of your program are ready, execute it. You will be thrilled to hear your own personal assistant converse with you. You can add more customizations based on your requirements and develop a very intuitive voice-based assistant. Once your desktop assistant is ready, it's time to deploy it. You can convert it into an executable file so that it runs on machines that do not have Python installed, provided they use the same operating system.

Generate an executable for your voice assistant

To create an executable from the Python script you can use PyInstaller. First, you have to convert the .ipynb notebook to a .py file. For this, use the ipython and nbconvert packages. Next, use PyInstaller to create a .exe file from your .py file. All the following steps need to be performed in the command prompt from the location where Python is installed:

pip install ipython
pip install nbconvert
pip install pyinstaller

jupyter nbconvert --to script minchu.ipynb #mention .ipynb file name to convert to .py

pyinstaller minchu.py #builds .exe file

The .py file is created in the same folder as the .ipynb file. Once the build is complete, PyInstaller creates two folders, build and dist. Navigate to the dist folder and execute the .exe file (by default it is placed in a subfolder named after the script) to run your personal desktop assistant. The executable bundles the Python interpreter, so it can run on other machines with the same operating system without a separate Python installation.

Conclusion

This is how simple it is to build your own voice assistant. You can add many more features, such as playing your favorite songs, giving weather details, opening your email application, composing emails, restarting your system, etc. You can integrate this application into your phone or tablet as well. Have fun exploring and developing your own Alexa/Siri/Cortana.

The entire code, along with some additional features for this voice assistant, is located in my git repo. You can check out GeeksforGeeks for more variations of Python-based personal assistants.

