Speech Recognition with the Raspberry Pi

UPDATE: Audio quality is greatly improved by using a sampling rate of 48000 Hz (the default rate is 8000 Hz). So, the commands in step 10 and step 16 have been changed to use that sampling rate.

These are the steps:
1. Start out by installing the Debian Wheezy image on an SD card and boot the RPi with it.
2. When raspi-config starts, run through whichever items you need to configure. In the following steps, you should be root.
3. Type the following commands:
   apt-get update
   apt-get upgrade
4. Get rpi-update from https://github.com/Hexxeh/rpi-update and install it. There are good instructions there, but to save time and avoid starting up LXDE just to access the site, you can type:
   wget http://goo.gl/1BOfJ -O /usr/bin/rpi-update && chmod +x /usr/bin/rpi-update
5. Before you run rpi-update, you'll need git, so type:
   apt-get install git-core
Then, run rpi-update by simply typing:
   rpi-update
6. Plug in your USB audio adapter. Mine is the SYBA SD-CM-UAUD USB Stereo Audio Adapter, and I plugged it into a powered hub. ALSA sees this as a C-Media USB Audio Device.
7. When you type "cat /proc/asound/cards", you should see that the adapter is card 1. Typing "cat /proc/asound/modules" should show that the driver for card 0 is snd_bcm2835 and the driver for card 1 is snd_usb_audio. This means that ALSA is currently configured to send audio to the RPi's built-in audio output hardware. The RPi doesn't have built-in audio input hardware, so ALSA won't have a Capture device configured.
8. This step configures ALSA to use the USB audio adapter as the default device for audio input and output. The most elegant way to do this would be to configure ALSA to switch between the adapter and the built-in hardware depending on whether the adapter is plugged in. If you need that feature (I don't), you should check out the instructions at http://superuser.com/questions/172514/alsa-usb-audio-hotplug. Otherwise, in /etc/modprobe.d/alsa-base.conf, you should see the following couple of lines at the end of the file:
    # Keep snd-usb-audio from being loaded as first soundcard
    options snd-usb-audio index=-2
On the options line, I changed index=-2 to index=0.
9. To reload alsa-base.conf, type:
   alsa force-reload
10. Now, /proc/asound/cards will show the adapter as the default (card 0). If you plug a headset into the adapter, you should be able to record audio from the mic (e.g. "arecord -d 5 -r 48000 test.wav" will make a 5-second recording) and play it back through the headset speakers (e.g. "aplay test.wav"). Use alsamixer to adjust the Speaker output level and the Capture input level (if you don't see the Capture control, press F5) to get the best results.
11. Now, on to pocketsphinx. Download the latest versions (0.7, at this writing) of sphinxbase and pocketsphinx from http://cmusphinx.sourceforge.net/wiki/download.
12. Extract the downloaded files (sphinxbase-0.7.tar.gz and pocketsphinx-0.7.tar.gz) into separate directories.
13. To compile these packages, you'll need to install bison and you'll need ALSA development headers. NOTE: It is important that the ALSA headers be installed before you build sphinxbase. Otherwise, sphinxbase will not use ALSA. It also appears that ALSA will not be used if PulseAudio is installed (thanks to NickE for discovering this). So, type:
    apt-get install bison
    apt-get install libasound2-dev
14. Change to the sphinxbase directory and type the following commands:
    ./configure --enable-fixed
    make install
15. Now, change to the pocketsphinx directory and type the following commands:
    ./configure
    make install
16. Test out pocketsphinx by running "src/programs/pocketsphinx_continuous -samprate 48000" in a terminal window. Whenever the program prints "Ready," try saying a word like "two" or "volume" and somewhere at the bottom of all the printout should be the word it thought you said. You'll probably need to adjust the input level with alsamixer or amixer to improve accuracy. If you're really serious about wanting to tweak it, I recommend you read the information at http://cmusphinx.sourceforge.net/wiki/ pertaining to pocketsphinx.
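
If you'd rather not open an editor for step 8, the change can probably be made from the command line. This is only a rough sketch, assuming the options line in your alsa-base.conf reads exactly as shown in step 8. The function name set_usb_audio_index is made up for this example (it is not an ALSA tool), and it keeps a copy of the original file with a .bak suffix.

```shell
#!/bin/sh
# set_usb_audio_index FILE
# Hypothetical helper (not part of ALSA): changes
#   options snd-usb-audio index=-2
# to index=0 in FILE, keeping the original as FILE.bak.
set_usb_audio_index() {
    conf="$1"
    # Keep a backup so the change is easy to undo.
    cp "$conf" "$conf.bak" || return 1
    # Read from the backup and write the edited text over the original.
    sed 's/^options snd-usb-audio index=-2/options snd-usb-audio index=0/' \
        "$conf.bak" > "$conf"
}

# Usage (as root), followed by the reload from step 9:
#   set_usb_audio_index /etc/modprobe.d/alsa-base.conf
#   alsa force-reload
```

If the options line in your file looks different, the sed pattern won't match and the file is left unchanged, so check the result before reloading.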

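The check described in steps 7 and 10 can also be scripted. Again, this is just a sketch: it assumes /proc/asound/modules lists one card number and driver name per line, as described in step 7. The function names usb_audio_card_index and check_usb_default are hypothetical, and the optional file argument is only there so you can try it on a saved copy of the file.

```shell
#!/bin/sh
# usb_audio_card_index [FILE]
# Print the number of the card whose driver is snd_usb_audio.
# FILE defaults to /proc/asound/modules.
usb_audio_card_index() {
    awk '$2 == "snd_usb_audio" { print $1; exit }' "${1:-/proc/asound/modules}"
}

# check_usb_default [FILE]
# Succeed only if the USB adapter is card 0 (the default card).
check_usb_default() {
    idx=$(usb_audio_card_index "$@")
    if [ "$idx" = "0" ]; then
        echo "USB audio adapter is card 0 (default)"
    else
        echo "USB audio adapter is card ${idx:-missing}; recheck step 8" >&2
        return 1
    fi
}

# Usage, after the reload in step 9:
#   check_usb_default
```

Before step 8 this should report card 1, and after steps 8 and 9 it should report card 0.
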
In my current set-up, I'm using a "less than top of the line" USB audio adapter and microphone. I don't know whether they or the RPi are limiting speech recognition performance. My true intent is to use a Bluetooth headset, but so far that has been an even greater challenge on the RPi.