A step-by-step guide to creating a simple real-time speech recogniser

1. Environment

1.1 PyAudio

  • Windows: pip install pyaudio
  • Linux: sudo apt-get install python3-pyaudio (python-pyaudio is the older Python 2 package)
  • macOS: brew install portaudio (install Homebrew first), then pip install pyaudio

1.2 SpeechRecognition package

pip install SpeechRecognition

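To confirm that both packages are visible to the Python interpreter you will run the recogniser with, you can look their modules up before going further. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; the helper name `check_installed` is hypothetical. It only checks that the modules can be located, not that PortAudio or a microphone actually works.

```python
import importlib.util

# Module names as imported in Python; the pip package names differ
# (PyAudio -> pyaudio, SpeechRecognition -> speech_recognition).
REQUIRED_MODULES = ("pyaudio", "speech_recognition")


def check_installed(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

If either module is reported as missing, re-run the matching install command with the same interpreter (for example `python3 -m pip install SpeechRecognition`).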

2. Example code

Here is an example that recognises the words ‘yes’ and ‘no’:

import speech_recognition as sr


# obtain audio from the microphone
r = sr.Recognizer()
t = True    # keep listening until 'yes' or 'no' is recognised

while t:
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source, duration=0.5)    # sample 0.5 s of background noise to set r.energy_threshold
        print('Say something')

        audio = r.record(source, duration=2)    # record for 2 seconds

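    try:
        # show_all=True returns the raw response: empty if nothing was recognised,
        # otherwise a dict whose 'alternative' key lists the candidate transcripts
        output = r.recognize_google(audio, show_all=True)
    except sr.RequestError as e:
        print("Could not reach the Google Speech Recognition service: {}".format(e))
        continue

    if not output:
        print("Please speak louder")    # nothing was recognised, so ask the speaker to speak louder
    else:
        # extract every candidate phrase from the returned dictionary, normalised to lower case
        possible = [alt['transcript'].strip().lower() for alt in output['alternative']]

        if "yes" in possible:
            print("yes")
            t = False
        elif "no" in possible:    # elif, so that hearing "yes" does not also print "Say it again"
            print("no")
            t = False
        else:
            print("Say it again")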
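Because the yes/no decision does not depend on the microphone, it can be pulled out into a plain function and checked without any audio. The sketch below assumes, as the loop does, that the `show_all=True` result is either empty or a dict with an `'alternative'` list of `{'transcript': ...}` entries; the function name `decide` and the sample dictionaries are hypothetical, made up for illustration rather than captured from the service.

```python
def decide(output):
    """Map a recognize_google(..., show_all=True) result to an action.

    Returns "yes" or "no" when that word is among the candidate transcripts,
    "louder" when nothing was recognised, and "again" otherwise.
    """
    if not output:  # empty result: nothing was recognised
        return "louder"
    # Normalise each candidate so " Yes" and "yes" compare equal
    possible = [alt["transcript"].strip().lower() for alt in output.get("alternative", [])]
    if "yes" in possible:  # "yes" wins if both words appear among the candidates
        return "yes"
    elif "no" in possible:
        return "no"
    return "again"


# Hypothetical responses shaped like the dictionary the loop indexes with output['alternative']
sample_yes = {"alternative": [{"transcript": "yes", "confidence": 0.9}, {"transcript": "Yes"}], "final": True}
sample_other = {"alternative": [{"transcript": "maybe"}], "final": True}

print(decide(sample_yes))    # yes
print(decide(sample_other))  # again
print(decide([]))            # louder
```

Inside the `while` loop, the `if`/`else` chain could then be replaced by a call to `decide(output)` followed by a check of the returned string.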