Build your own Voice Assistant

A step-by-step tutorial on building a voice-based assistant using Python

What is a Voice Assistant?

A voice assistant, or intelligent personal assistant, is a software agent that performs tasks or services for an individual based on verbal commands: it interprets human speech and responds with a synthesized voice. Users can ask their assistant questions, control home automation devices and media playback, and manage other basic tasks such as email, to-do lists, and opening or closing applications, all by voice.

Take Braina (Brain Artificial) as an example. It is an intelligent personal assistant, human language interface, automation and voice recognition software for Windows PCs. Braina is multi-functional AI software that lets you interact with your computer using voice commands in most of the languages of the world. It also lets you accurately convert speech to text in over 100 different languages.

Personal Assistant (Jarvis) in Python

I thought it would be cool to create a personal assistant in Python. If you are into movies, you may have heard of Jarvis, an AI-based character in the Iron Man movies. In this tutorial we will create a small assistant of our own.

The features I want to have are:

Recognize spoken voice (Speech recognition)
Answer in spoken voice (Text to speech)
Answer simple commands

For this tutorial you will need (Ubuntu) Linux, Python and a working microphone.

Having said that, how cool would it be to build a simple voice-based desktop/laptop assistant that can:

1. Open a subreddit in the browser.

2. Open any website in the browser.

3. Send an email to your contacts.

4. Launch any system application.

5. Tell you the current weather and temperature of almost any city.

6. Tell you the current time.

7. Greet you.

8. Play you a song on VLC media player (of course, you need to have VLC media player installed on your laptop/desktop).

9. Change the desktop wallpaper.

10. Tell you the latest news feeds.

11. Tell you about almost anything you ask.

So in this article, we are going to build a voice-based application that can do all of the tasks listed above. But first, check out the video below, which I recorded while interacting with my desktop voice assistant. I call her Sofia.

Video: Interaction_with_Sofia, by Nagesh Chauhan (photos.app.goo.gl)

I hope you guys have liked the above video in which I was interacting with Sofia. Now let’s start building this cool thing…

Recognize spoken voice

Speech recognition can be done using the Python SpeechRecognition module. We use the Google Speech API because of its great quality.

Answer in spoken voice (Text To Speech)

Various APIs and programs are available for text-to-speech applications. Espeak and pyttsx work out of the box but sound very robotic.
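As a rough sketch of the text-to-speech step, the helper below shells out to the espeak command-line tool. The function names build_espeak_command and speak, and the default speed of 150, are illustrative assumptions rather than part of the original assistant; -s is espeak's words-per-minute option.

```python
import subprocess


def build_espeak_command(text, speed=150):
    """Return the argument list for an espeak call (hypothetical helper)."""
    # -s sets the speaking rate in words per minute
    return ['espeak', '-s', str(speed), text]


def speak(text):
    """Print the reply and try to say it aloud.

    Returns True if espeak could be started, False otherwise.
    """
    print(text)
    try:
        subprocess.call(build_espeak_command(text))
        return True
    except OSError:
        # espeak is not installed or not on the PATH
        return False
```

Passing the text as a separate list element, instead of building one shell string, means quotes or other special characters in the reply do not need escaping.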

Dependencies and requirements:

Install all of these Python libraries:

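pip install SpeechRecognition

pip install beautifulsoup4

pip install python-vlc

pip install youtube-dl

pip install pyowm

pip install wikipedia

pip install requests

pip install PyAudio

The last two are needed as well: requests is imported in the code below, and SpeechRecognition relies on PyAudio for microphone input.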
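To confirm that the installs worked before going further, a quick check along these lines may help. It is only a sketch, assuming Python 3.4 or later; the helper name missing_modules is made up for illustration, and the list uses import names, which differ from some of the pip package names (for example, beautifulsoup4 is imported as bs4).

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# import names (not pip package names) used in this tutorial
REQUIRED = ['speech_recognition', 'bs4', 'vlc', 'youtube_dl', 'pyowm', 'wikipedia']

if __name__ == '__main__':
    missing = missing_modules(REQUIRED)
    print('Missing modules: ' + (', '.join(missing) if missing else 'none'))
```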

Let’s start building our desktop voice assistant using Python

Start by importing all the required libraries:

import speech_recognition as sr

import os

import sys

import re

import webbrowser

import smtplib

import requests

import subprocess

from pyowm import OWM

import youtube_dl

import vlc

import urllib

import urllib2

import json

from bs4 import BeautifulSoup as soup

from urllib2 import urlopen

import wikipedia

import random

from time import strftime

For our voice assistant to perform all of the features discussed above, we have to code the logic for each of them in one function.
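One way to picture "all the logic in one function" is a chain of checks on the recognized text. The sketch below is an assumption about how that branching could look, not the assistant's actual code: the function name parse_command, the trigger phrases, and the action labels are all hypothetical.

```python
import re


def parse_command(command):
    """Map a recognized phrase to an (action, argument) pair."""
    command = command.lower().strip()

    # "open website example.com" -> the site name is captured as the argument
    match = re.search(r'open website (.+)', command)
    if match:
        return ('open_website', match.group(1))

    # "launch vlc" -> the application name is captured as the argument
    match = re.search(r'launch (.+)', command)
    if match:
        return ('launch_app', match.group(1))

    # simple keyword checks for commands that take no argument
    if 'time' in command:
        return ('tell_time', None)
    if 'hello' in command:
        return ('greet', None)

    return ('unknown', command)
```

Returning a pair instead of acting immediately keeps the matching logic easy to test without a microphone or a browser.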

So our first step is to create the function that interprets the user's spoken command.

def myCommand():

    """Listen on the microphone and return the recognized command in lowercase."""

    r = sr.Recognizer()

    with sr.Microphone() as source:

        print('Say something...')

        r.pause_threshold = 1  # seconds of silence that mark the end of a phrase

        r.adjust_for_ambient_noise(source, duration=1)  # calibrate against background noise

        audio = r.listen(source)

    try:

        command = r.recognize_google(audio).lower()  # Google Web Speech API

        print('You said: ' + command + '\n')

    except sr.UnknownValueError:

        # unrecognizable speech: loop back and keep listening for commands

        print('....')

        command = myCommand()

    return command
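The recursive call above retries for as long as the speech stays unrecognizable, and each retry adds a stack frame. A loop with an attempt limit is one alternative. The sketch below is an illustration only: listen_with_retries, listen_once, retry_on and max_attempts are hypothetical names, and the listening step is passed in as a function so that the retry logic can be tried without a microphone.

```python
def listen_with_retries(listen_once, retry_on=Exception, max_attempts=3):
    """Call listen_once() until it returns text or max_attempts is reached.

    listen_once: function that returns the recognized text or raises retry_on
    retry_on: exception type that means "speech was not understood"
    Returns the lowercased text, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        try:
            return listen_once().lower()
        except retry_on:
            print('....')
    return None
```

With SpeechRecognition, listen_once would wrap the r.listen and r.recognize_google calls from myCommand, and retry_on would be sr.UnknownValueError.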