Hey Folks!
In this article, we are going to discuss Speech Recognition and its applications by implementing Speech to Text and Text to Speech models with Python. Speech Recognition is also known as speech-to-text conversion or simply Voice Recognition. It is the technique of making computers understand spoken human language. Have you ever wondered how Amazon's Alexa, Apple's Siri, and Google's voice assistant talk to us and understand our language? This is done by Speech Recognition.
Speech Recognition is a very important task in NLP, because it is the main way to make computers understand spoken language. As we know, computers can readily process written text by converting it into numerical features using various feature extraction techniques.
Here the idea is to convert spoken speech into text and then feed it to computers.
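To make the idea of numerical features concrete, here is a minimal sketch of one simple feature extraction technique, a bag-of-words count. The function name `bag_of_words` and the sample sentences are assumptions made for this illustration; real projects usually rely on a library vectorizer instead.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn a list of texts into word-count vectors over a shared vocabulary."""
    # Lowercase and split on whitespace; a deliberately naive tokenizer.
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word, sorted so the column order is stable.
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


vocab, vectors = bag_of_words(["speech to text", "text to speech to text"])
print(vocab)    # ['speech', 'text', 'to']
print(vectors)  # [[1, 1, 1], [1, 2, 2]]
```

Once speech has been transcribed, the resulting text can be fed through the same kind of step.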
Speech Recognition has numerous applications.
Nowadays, interaction with computers and smart devices is moving towards voice. Devices that work on voice commands need to be quick, effective, and smart. Since machines can understand text by applying feature extraction techniques, our goal is to convert any speech into text.
Business Problem
We want to convert speech into text
Solution
Various technologies are available for speech to text, but the SpeechRecognition library, with PyAudio providing microphone access, offers a very easy and efficient implementation.
Implementation Using Python
Installing libraries
!pip install SpeechRecognition
!pip install PyAudio
# If pip install PyAudio throws an error, try:
!conda install pyaudio
PyAudio is used to record and play audio with Python; it gives Python access to the microphone.
SpeechRecognition takes an AudioData instance and converts it into text. The recognize_google method used here works online through the Google Speech Recognition API.
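import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please say something")
    audio = r.listen(source)  # record until the speaker pauses
    print("Time over, thanks")

try:
    print("You said: " + r.recognize_google(audio, language='en-US'))
except sr.UnknownValueError:
    # the service could not make out any words
    print("Could not understand the audio")
except sr.RequestError as e:
    # the service was unreachable or rejected the request
    print("Request to the recognition service failed:", e)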
Output
Please say something
Time over, thanks
You said: This is Speech Recognition done by NLP
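sr.Recognizer() creates a recognizer instance.
recognizer_instance.recognize_google(audio_data, language='en-US') sends the recorded audio to the Google Speech Recognition API and returns the recognized text.
'en-US' is the language code for US English. To recognize speech in another language, change the code; for example, recognize_google(audio, language='hi-IN') recognizes Hindi.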
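As a hedged illustration of the language parameter, the sketch below transcribes a recorded audio file instead of the microphone. The names `LANGUAGE_CODES`, `language_code`, and `transcribe_file`, as well as the file name `speech.wav`, are assumptions made for this example and are not part of the SpeechRecognition library. The library calls it uses are `sr.AudioFile`, `Recognizer.record`, and `recognize_google`.

```python
# Hypothetical lookup table: a few language tags that can be passed to recognize_google.
LANGUAGE_CODES = {
    "english": "en-US",
    "hindi": "hi-IN",
    "french": "fr-FR",
}


def language_code(name):
    """Return the language tag for a language name (illustrative helper)."""
    try:
        return LANGUAGE_CODES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"No language tag configured for {name!r}") from None


def transcribe_file(path, language="en-US"):
    """Transcribe a WAV, AIFF, or FLAC file; return None if no words were understood."""
    # Imported here so the helpers above can be used without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into an AudioData instance
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None


# Example call (assumes a Hindi recording named speech.wav and an internet connection):
# print(transcribe_file("speech.wav", language=language_code("Hindi")))
```

Working from a file makes it easier to repeat an experiment with different language codes, since the same recording can be sent again.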
TTS (Text to Speech) is an interface that allows the computer to read text aloud like a human. It is also called read-aloud technology.
In the real world, the TTS system has numerous applications. It is widely used to build smart devices that can interact with humans.
Problem
We want to create a system that can read a given text in a human’s voice.
Solution
There are multiple ways to perform text to speech, but one of the easiest and most efficient is to use Google's API through the gTTS library.
Installing the gTTS library
!pip install gTTS
Let's load gTTS and work with it
from gtts import gTTS
input_text = "I like NLP and now this is machine voice"
convert = gTTS(text=input_text, lang='en', slow=False)
convert.save('audio.mp3')
If you play audio.mp3, you will hear "I like NLP and now this is machine voice" read in a human-like voice.
gTTS accepts parameters that change the voice and control the speaking speed, such as lang and slow. For more information, refer to the gTTS documentation.
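The following is a small sketch of how those parameters might be varied. The helpers `speech_options` and `save_speech` and the output file names are assumptions made for this illustration, not part of gTTS. The `lang`, `slow`, and `tld` keyword arguments are the gTTS parameters being demonstrated, where `tld` selects the Google domain and with it a regional accent.

```python
def speech_options(lang="en", slow=False, tld="com"):
    """Collect the keyword arguments passed to gTTS (illustrative helper)."""
    return {"lang": lang, "slow": slow, "tld": tld}


def save_speech(text, path, **options):
    """Synthesize text with gTTS and save it as an MP3 file at path."""
    # Imported lazily; requires the gTTS package and an internet connection.
    from gtts import gTTS

    gTTS(text=text, **speech_options(**options)).save(path)
    return path


# Example calls (each one contacts Google's service):
# save_speech("I like NLP", "slow.mp3", slow=True)          # slower reading speed
# save_speech("I like NLP", "british.mp3", tld="co.uk")     # English via google.co.uk
# save_speech("Bonjour le monde", "french.mp3", lang="fr")  # French voice
```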
We have discussed Speech to Text and Text to Speech; now we will talk about language translation using Python. With tools like Happy Scribe, you can translate audio to English or other languages by first converting speech to text and then using a translation API to convert the text. Happy Scribe handles the initial step by providing transcripts and subtitles, which can then be translated into various languages. Integrating this with Python makes it possible to build a workflow for audio-to-audio language translation.
Using these three technologies (speech to text, translation, and text to speech), we can create our own language translator that takes speech and converts it into speech in the desired language.
Language translation is widely used nowadays. A translator can take its input in the form of speech, text, or pictures.
Google's translator is one of the most widely used systems, and it supports almost every major language.
It relies on neural network models with attention layers, which make it robust compared with many other translation models.
Problem
Create a Model that can translate a given text into the desired language
Solution
One of the easiest and most effective ways to implement language translation in your project is the goslate library, which uses Google's Translator API in the backend. goslate provides a Python API to the Google translation service by querying the Google Translate website.
Installing goslate
!pip install goslate
import goslate
text = "Bonjour le monde"
gs = goslate.Goslate()
translatedText = gs.translate(text, 'en')
print(translatedText)
Output
Hello World
goslate.Goslate() creates a translator instance.
goslate can also be used to detect language: Goslate.detect('text') returns the language of the text.
gs.detect('hallo welt')
We can also translate several texts in a single call by passing a list of texts to the .translate() method.
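A minimal sketch of the batch usage described above, under the assumption that the translator behaves like goslate and returns a generator when given a sequence of texts. The wrapper `translate_all` is a name invented for this example.

```python
def translate_all(translator, texts, target_language):
    """Translate several texts in one call and return the results as a list.

    `translator` is any object with a goslate-style translate(texts, target) method.
    """
    # For sequence input the results are produced lazily, so collect them into a list.
    return list(translator.translate(texts, target_language))


# With goslate installed and the service reachable:
# import goslate
# gs = goslate.Goslate()
# print(gs.detect("hallo welt"))
# print(translate_all(gs, ["Bonjour le monde", "hallo welt"], "en"))
```

Collecting the results into a list forces every translation to be requested right away, which makes errors appear at the call site instead of later when the results are read.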
For more detailed information on goslate, refer to its documentation.
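The three pieces covered in this article can be chained into a speech-to-speech translator. The sketch below is one possible arrangement, not a definitive implementation: the function names, the file names, and the decision to pass each step in as a function are all assumptions made for illustration. Note that the recognizer expects tags such as 'fr-FR', while gTTS and goslate use short codes such as 'en'.

```python
def recognize_with_google(audio_path, language):
    """Speech to text: transcribe an audio file with SpeechRecognition."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio, language=language)


def translate_with_goslate(text, target_language):
    """Translation: convert text into the target language with goslate."""
    import goslate

    return goslate.Goslate().translate(text, target_language)


def speak_with_gtts(text, language, out_path):
    """Text to speech: save the spoken text as an MP3 file with gTTS."""
    from gtts import gTTS

    gTTS(text=text, lang=language).save(out_path)


def speech_to_speech(audio_path, out_path, source_language, target_language,
                     recognize=recognize_with_google,
                     translate=translate_with_goslate,
                     speak=speak_with_gtts):
    """Run the three steps in order and return (original text, translated text)."""
    text = recognize(audio_path, source_language)
    translated = translate(text, target_language)
    speak(translated, target_language, out_path)
    return text, translated


# Example call (assumes a French recording named french.wav and an internet connection):
# speech_to_speech("french.wav", "english.mp3", "fr-FR", "en")
```

Passing the steps in as functions keeps the pipeline independent of any one service, so a step can be swapped out or replaced with a stub while testing.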
I believe you are now comfortable with the basics of natural language processing, have already implemented some basic NLP tasks, and are ready to solve real-world business problems using NLP.
In the next article, we will implement industry applications of NLP.
These tasks draw on a series of NLP concepts that we will leverage while building the applications. So stay tuned for my next article, which is going to be an end-to-end guide on industry applications of NLP.
In this article, we discussed speech to text using PyAudio and SpeechRecognition and implemented it in Python. Then we covered text to speech using the gTTS library, which queries Google's text-to-speech API in the backend. Finally, we covered language translation using the goslate library, which is likewise backed by Google's Translator API.
If you have any suggestions or questions for me, feel free to reach out to me on LinkedIn.
The media shown in this article are not owned by Analytics Vidhya and are used at the author's discretion.