Speech Recognition using Google Speech API

Google has a great Speech Recognition API. It converts speech captured from a microphone into written text (Python strings), in short speech to text. You simply speak into a microphone and the Google API transcribes what you say. The API gives excellent results for English.

Google has also created the JavaScript Web Speech API, so you can recognize speech in JavaScript as well. Here is the link: https://www.google.com/intl/en/chrome/demos/speech.html. To use it on the web you will need Google Chrome version 25 or later.

Installation

Google Speech API v2 is limited to 50 queries per day. Make sure you have a good microphone.
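To stay within the daily limit mentioned above, a script can count its own requests. The following is a minimal sketch, not part of the SpeechRecognition library: the class name DailyQuota and the method try_consume are hypothetical, and the counter only tracks requests made by the running program. It does not ask Google how much quota is left.

```python
import datetime


class DailyQuota:
    """Hypothetical helper: counts requests per calendar day and refuses
    further requests once the limit (50, as stated in the text) is reached."""

    def __init__(self, limit=50):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today=None):
        """Return True and count one request if today's quota is not used up."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started, so the local counter resets.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

Call try_consume() before each recognition request and skip the request when it returns False. The count lives in memory only, so it starts from zero whenever the program restarts.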

Are you looking for text to speech instead?

This is the installation guide for Ubuntu Linux, but it will probably work on other platforms as well. You will need to install a few packages: PyAudio, PortAudio and SpeechRecognition. PyAudio 0.2.9 is required and you may need to compile it manually.

git clone http://people.csail.mit.edu/hubert/git/pyaudio.git

cd pyaudio

sudo python3 setup.py install

sudo apt-get install libportaudio-dev

sudo apt-get install python-dev

sudo apt-get install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev

sudo pip3 install SpeechRecognition
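Once the packages are installed, a quick check from Python can confirm that both modules are importable. This is a small sketch using only the standard library. The helper name missing_modules is made up for illustration; "pyaudio" and "speech_recognition" are the import names of the PyAudio and SpeechRecognition packages.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` for which no importable module is found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two Python modules this guide depends on.
missing = missing_modules(["pyaudio", "speech_recognition"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("PyAudio and SpeechRecognition are importable.")
```

The check only looks the modules up without importing them, so it does not open the microphone or touch the network.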

Program

This program will record audio from your microphone, send it to the speech API and return a Python string.

The audio is recorded using the speech_recognition module, which is imported at the top of the program. The recorded speech is then sent to the Google speech recognition API, which returns the recognized text.

r.recognize_google(audio) returns a string.

#!/usr/bin/env python3
# Requires PyAudio and SpeechRecognition.

import speech_recognition as sr

# Record audio from the default microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Speech recognition using Google Speech Recognition
try:
    # for testing purposes, we're just using the default API key
    # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")`
    # instead of `r.recognize_google(audio)`
    print("You said: " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))
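If you want to reuse the record-and-recognize steps elsewhere, they can be wrapped in a function. The sketch below is one possible way to do that. The function name capture_and_transcribe, the half-second noise calibration, and the 5 and 10 second limits are assumptions chosen for illustration. The recognizer and source are passed in as arguments, so the function itself does not import anything.

```python
def capture_and_transcribe(recognizer, source, language="en-US"):
    """Record one phrase from `source` and return the recognized text."""
    # Calibrate the energy threshold against background noise for half a second.
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    # Wait at most 5 s for speech to start and cap a phrase at 10 s.
    audio = recognizer.listen(source, timeout=5, phrase_time_limit=10)
    # Send the recording to Google and return the transcript as a string.
    return recognizer.recognize_google(audio, language=language)


# Intended use with the real library (needs a microphone and network access):
#
#   import speech_recognition as sr
#   with sr.Microphone() as source:
#       print(capture_and_transcribe(sr.Recognizer(), source))
```

The function does not catch errors itself. sr.UnknownValueError and sr.RequestError still reach the caller, as does sr.WaitTimeoutError when no speech starts within the timeout, so the call can sit inside a try block like the one in the program.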