Google speech recognition not recognizing certain words/phrases like "um" and "er" | Python

So it seems Google speech recognition is taking out certain parts of my speech, like "um", "er" and "ahh". The problem is that I want these to be recognized, and I cannot figure out how to enable this.

Here is the code:

import re

import speech_recognition

recognizer = speech_recognition.Recognizer()

vocal_imperfections = 0

vi_list = ['hmm', 'umm', 'aha', 'ahh', 'uh', 'um', 'er']

while True:
    try:
        with speech_recognition.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            # show_all=True returns the raw result, which is empty when nothing was recognized
            text = recognizer.recognize_google(audio, language='en-IN', show_all=True)
            #text = recognizer.recognize_ibm(audio)
            if text:
                text = text['alternative'][0]['transcript']
                # Compare whole words, so 'er' does not match inside 'there'
                words = re.findall(r"[a-z']+", text.lower())
                if any(word in vi_list for word in words):
                    vocal_imperfections += 1
                print(text)
                print(vocal_imperfections)

    # Catch the exception class itself; `UnknownValueError()` is an instance,
    # and using it in an except clause raises TypeError
    except speech_recognition.UnknownValueError:
        recognizer = speech_recognition.Recognizer()
        continue

It works as intended, except that Google takes out the vocal imperfections. Does anyone know how to enable this, or know of a free, real-time speech recognition alternative that will recognize vocal imperfections?

Example: If I were to say: “um, I think today is the 30th” Google would return: “I think today is the 30th”
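Counting the words is not the hard part once they appear in the transcript. Below is a minimal sketch, assuming the transcript is a plain string: `count_fillers` is a hypothetical helper (not part of SpeechRecognition) that matches whole words from the question's `vi_list`, so "er" inside "there" is not counted, and it counts every occurrence rather than one per utterance. Run on the example above, it finds the "um" in what was said and nothing in what Google returned, which suggests the word is lost during transcription, not during counting.

```python
import re

# Filler words taken from the question's vi_list.
FILLER_WORDS = {"hmm", "umm", "aha", "ahh", "uh", "um", "er"}


def count_fillers(transcript: str) -> int:
    """Count filler words as whole tokens, so "er" in "there" is not a match."""
    # Lower-case and pull out word tokens; punctuation such as the comma in "um," is dropped.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return sum(1 for token in tokens if token in FILLER_WORDS)


if __name__ == "__main__":
    said = "um, I think today is the 30th"
    google = "I think today is the 30th"
    print(count_fillers(said))    # 1
    print(count_fillers(google))  # 0
```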


Answer

I took a look at the Google Cloud Speech-to-Text API docs and didn't see anything relevant (as of March 2022). I also came across these related resources:

All evidence suggests that it isn't possible (at this time) to get filler words back from the Google Cloud Speech-to-Text service, and that you'll have to look at alternative services. I won't rehash the alternatives listed in those resources; several are provided, and you'll have to pick the one that best suits your particular needs.

Also, you may already know this (so apologies if you do), but these types of words are typically called “filler” and/or “hesitation” words. That might be helpful to you while researching the topic.

The good news is that the SpeechRecognition module (I think that’s what you’re using based on your code) supports several different engines, so hopefully one of those provides filler words.
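One way to check this is to run the same recording through several engines and compare the transcripts. A rough sketch under a few assumptions: `compare_engines` and `main` are hypothetical helpers, `recording.wav` is a placeholder path, and only `recognize_google` and `recognize_sphinx` from SpeechRecognition are used (Sphinx also needs the `pocketsphinx` package). Nothing here guarantees that either engine keeps filler words; the point is to see which one does on your own audio.

```python
import sys
from typing import Any, Callable, Dict, Optional, Tuple, Type


def compare_engines(
    audio: Any,
    engines: Dict[str, Callable[[Any], str]],
    no_result: Tuple[Type[BaseException], ...] = (Exception,),
) -> Dict[str, Optional[str]]:
    """Run one recording through several recognizers and collect each transcript.

    An engine that raises one of the `no_result` exceptions is recorded as
    None instead of stopping the whole comparison.
    """
    transcripts: Dict[str, Optional[str]] = {}
    for name, recognize in engines.items():
        try:
            transcripts[name] = recognize(audio)
        except no_result:
            transcripts[name] = None
    return transcripts


def main(wav_path: str) -> None:
    # Imported here so compare_engines can be used without SpeechRecognition installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)

    engines = {
        "google": lambda a: recognizer.recognize_google(a, language="en-IN"),
        "sphinx": lambda a: recognizer.recognize_sphinx(a),
    }
    # UnknownValueError: nothing recognized; RequestError: network or setup problem.
    results = compare_engines(audio, engines, (sr.UnknownValueError, sr.RequestError))
    for name, text in results.items():
        print(f"{name}: {text!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_engines.py recording.wav
    main(sys.argv[1])
```

Passing the engines in as plain callables keeps the comparison logic independent of any one recognizer, so other SpeechRecognition engines can be added to the dictionary without changing `compare_engines`.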

User contributions licensed under: CC BY-SA