When bringing NextSlide to life, one of the major challenges we faced was getting speech recognition to work. Originally, we planned to use Google's built-in Speech API, but several issues made it a poor fit for our application. The built-in speech-to-text API was unreliable: although the code lets you change the parameters of the recognition activity, such as how long to listen for speech and the timeout duration, those changes never took effect when the app actually ran. There was also no way to disable the sound that plays whenever a search starts or stops. Because we were constantly starting new searches, the app beeped nonstop, and those beeps drowned out the app's own sound effects.
In the end, we switched to CMUSphinx, an open-source speech recognition toolkit that offers far more customization than the default Google API. CMUSphinx actually honors changes to search parameters, and it plays no sound when a recognition session begins or ends, which left us free to play our own sound effects during the presentation. The two features we relied on most were keyword search, which returns a result only when one of a chosen set of keywords is spoken, and the ability to set each keyword's sensitivity according to its number of syllables. The longer a word is, the harder it is for the recognizer to match it correctly, so we gave longer words a lower confidence threshold for being counted as recognized.