Low Cost Voice Synthesis

All Ears for Type 'n Talk by Gordon McComb, published in Creative Computing year ? pages 148-152

  1. A Few Words About Voice Synthesis
    1. There are three major ways to give your computer a voice. One system, pioneered by Texas Instruments, is called Linear Predictive Coding (LPC), where an announcer speaks into a microphone connected to a computer. The computer digitizes and condenses the speech and stores it in memory. That memory can be duplicated and inserted in a finished product. In this way, calculators, toys and other electronic devices can be given voices, but can only speak those words originally recorded. Words can be combined to create complete phrases, such as "six times six equals thirty-six." Each word is recorded separately. A small computer inside the device picks out the proper sequence of words and strings them together.
    2. Another method, used by many computer hobbyists, is similar to LPC but allows the user to speak into a microphone and digitize his own voice. His voice is then "recorded" in memory and recalled from the computer at will. Super Talker from Mountain Computer provides this type of speech capability for the Apple.
    3. Phoneme-based synthesis is perhaps the truest form of voice synthesis. It creates words by imitating the sounds produced by the human vocal tract. In this way, words, phrases and even singing can be produced without the need to prerecord or digitize speech. One such unit was built for the Radio Shack TRS-80. However, it takes several hours to input a page or two of text even for the most experienced operator. The first synthesizers of this kind were built by Bell Laboratories in the 50's. (The first working model is featured on the Philadelphia Computer Music Festival LP record; $6 from Creative Computing.) Later, less elaborate commercial versions of this type of synthesizer were shown widely in the early 70's; however, all suffered from a lack of inflection. This gave them a decided Scandinavian or Eastern European accent and did not contribute to their widespread acceptance. Text-to-speech synthesis, still in its infancy, eliminates the tedious programming of the phoneme-based synthesizer. Text is typed into a computer and is translated by a built-in language interpreter. The translator has been programmed to correct for the majority of pronunciation variances inherent in our language. -DKA
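The word-by-word playback described for the first method (each word recorded separately, then strung together) can be sketched roughly as follows. This is only an illustration: the vocabulary table, the placeholder byte strings standing in for digitized speech, and the function name `speak` are all hypothetical, and treating "thirty-six" as two recorded words is an assumption.

```python
# Hypothetical vocabulary: each word maps to placeholder bytes that stand in
# for a separately recorded, digitized sample (not real audio data).
RECORDED_WORDS = {
    "six": b"\x01\x02",
    "times": b"\x03",
    "equals": b"\x04",
    "thirty": b"\x05",
}


def speak(phrase: str) -> bytes:
    """Look up each word and concatenate its stored sample, in order."""
    # Assumption: hyphenated numbers are stored as separate words.
    words = phrase.lower().replace("-", " ").split()
    missing = [w for w in words if w not in RECORDED_WORDS]
    if missing:
        # Only words that were originally recorded can be spoken.
        raise KeyError(f"not recorded: {missing}")
    return b"".join(RECORDED_WORDS[w] for w in words)


samples = speak("six times six equals thirty-six")
```

A word outside the table raises an error, which mirrors the limitation noted above: such a device can only speak words that were originally recorded.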
  • For years, computer hobbyists have seen voice synthesis as a distant fantasy: OK for comic books and novels, but too complex and expensive for the home computer den. But that was yesterday! Today, direct text-to-speech voice systems enable the home computer user to type in plain English, and the synthesizer automatically converts the written words into intelligible speech. Sure, this technology has been around for a while, but not at $345! Votrax, an old and reliable name in voice synthesis, recently announced a product called Type 'n Talk. Type 'n Talk works with any computer and any language, has an unlimited vocabulary, is RS-232C serial interface compatible, and is extremely easy to use. Let's see what else Type 'n Talk has to say for itself.
  • An Inside Look
      It all starts at your computer or terminal. Information, in the form of ASCII characters, is sent through an RS-232C serial interface and into Type 'n Talk (TNT). This information arrives very quickly, faster than the synthesizer could say the words, so a buffer has been inserted at the input to collect the information and release it slowly as the words are spoken (your printer works in a similar fashion). From the buffer, the data is sent to a text-to-speech translator that decides how the words you typed will be pronounced. From the translator, the information is sent to a voice synthesis chip. This chip creates a series of hissing, pocking, clicking, humming and other strange sounds that combine to form human speech. These sounds are sent through an internal amplifier and then to your speaker.
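The role of the input buffer can be illustrated with a small sketch: text arrives in one fast burst and is then drained a word at a time, at whatever pace the speech stage can manage. The class and method names are hypothetical, the 750-byte capacity is the figure the article quotes for TNT, and silently refusing bytes that do not fit is an assumption of this sketch, not documented behavior of the device.

```python
from collections import deque

BUFFER_CAPACITY = 750  # bytes; the capacity the article quotes for TNT


class InputBuffer:
    """Bounded first-in, first-out buffer between the serial port and the translator."""

    def __init__(self, capacity: int = BUFFER_CAPACITY):
        self.capacity = capacity
        self._queue = deque()

    def write(self, data: bytes) -> int:
        """Store as many bytes as fit and return how many were accepted."""
        room = self.capacity - len(self._queue)
        accepted = data[:room]  # assumption: bytes beyond capacity are refused
        self._queue.extend(accepted)
        return len(accepted)

    def read_word(self) -> str:
        """Remove and return the next word; the trailing space is consumed."""
        chars = []
        while self._queue:
            ch = chr(self._queue.popleft())
            if ch == " ":
                break
            chars.append(ch)
        return "".join(chars)

    def __len__(self) -> int:
        return len(self._queue)


buf = InputBuffer()
stored = buf.write(b"system ready")  # the fast burst from the computer
first = buf.read_word()              # the slow consumer takes one word...
second = buf.read_word()             # ...and then the next
```

Here the sender finishes after a single `write` call, while the consumer pulls out "system" and then "ready" on its own schedule.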
  • Connections
      No special hardware modifications or devices are required to connect TNT to your computer system. However, you do need a standard RS-232C serial interface and cable. Up to eight TNTs can be connected to one computer system, and each can be independently addressed. This is particularly helpful in the classroom. You must make two other connections to complete your TNT setup: one for the power supply cable (included) and the other for the speaker. You can connect TNT to any 8-ohm speaker or wire it into your hi-fi if the on-board 1-watt amp isn't strong enough for your needs (TNT does not have an internal speaker). You must also select the baud rate. A series of small switches on the back of TNT controls the rate from 75 to 9600 baud. The data buffer built into TNT is capable of holding 750 bytes, or about one minute of speech. At 9600 baud, this buffer takes less than one second to fill. So while TNT is speaking, your computer is free to do other tasks.
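The "less than one second" figure can be checked with a little arithmetic. The sketch below assumes 10 bits on the wire per character (one start bit, eight data bits, one stop bit), a common asynchronous serial framing that the article does not confirm; the function name is illustrative.

```python
def fill_time_seconds(buffer_bytes: int, baud: int, bits_per_char: int = 10) -> float:
    """Time needed to transmit buffer_bytes characters at the given baud rate.

    bits_per_char = 10 is an assumption (1 start + 8 data + 1 stop bit).
    """
    return buffer_bytes * bits_per_char / baud


fastest = fill_time_seconds(750, 9600)  # top switch setting
slowest = fill_time_seconds(750, 75)    # bottom switch setting
```

Under that framing assumption, filling the 750-byte buffer takes about 0.78 second at 9600 baud and 100 seconds at 75 baud, so only the faster settings free the computer long before TNT has finished speaking.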
  • System Ready
      When everything is properly connected, Type 'n Talk announces "system ready." Adjust the volume control to a comfortable listening level. The frequency control changes the speed of the voice. On computers that don't have a built-in serial interface, you'll need to instruct the computer on where to send the information so it'll get to TNT. Generally, information transfers to Type 'n Talk can be accomplished with the same commands and software used to send data to a terminal, printer or tape drive. TNT's instruction manual gives a few insights on hooking it into your computer. Many of your programs can be run "as is"; others you may want to modify slightly to make better use of the voice system. Whenever there is a PRINT statement, TNT can be made to speak the text. You can also modify your program so it will speak some of the statements and print out the rest. Any combination is possible. Audible speech is generated by the letters A through Z and numerals 0 through 9 only. Other characters (such as %, @, & and so on) either have no effect or produce periods of silence. Capital letters are treated in two different manners. If only the first letter of a word is capitalized, then TNT will pronounce the word in the usual fashion. But if the first two (or more) letters are caps, TNT will spell out the word, letter by letter. For example: The words T-H-E and T-E-D are pronounced tee-aych-ee and tee-ee-dee. The words t-h-e and T-e-d are p