
t2sp Votrax TNT

Votrax Type 'n Talk

Review: The Votrax Type 'n Talk: Give Your Computer a Voice, by Neil Rosenberg

  • The Type'n'Talk speech synthesizer by Votrax (Troy, MI), priced under $400, is about the size of 1 1/2 pocket calculators. By means of a simple RS-232 interface, one can create fully formed, fully understandable speech by sending unmodified ASCII text to it.
  • Inside a sturdy extruded aluminum case is a single PC board carrying several interface and memory chips, the SC-01 phoneme generator, and a potted area (for software security) containing a Motorola 6800-series microprocessor and a 4K-byte PROM. This PROM holds all the algorithms for converting standard text to phoneme data. They obey many rules of proper speech, yielding comparatively natural sound and correct pronunciation.
  • On the front panel are the unit's two controls: one for the frequency and speed of the speech, the other for volume.
  • Although the unit contains no internal speaker, it has a built-in amplifier that drives an external 8-ohm speaker to more than adequate volume. On the rear panel are a standard DB-25 (female) connector, an input jack for the power cord, and a coaxial phono jack for the audio output. Also on the rear is a rectangular cutout giving access to a DIP switch for baud rate selection, adjustable from 75 to 9600 baud.
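To put the baud-rate range in perspective, this short Python sketch converts a switch setting into characters per second and into the time needed to send a full 64-by-16 screen (1,024 characters). It assumes a 10-bit character frame (start bit, 8 data bits, stop bit, or 7 data bits plus parity); 300 and 1200 are standard rates picked only for illustration, since the review does not list which settings the switch offers between 75 and 9600.

```python
# Assumption: each character occupies 10 bits on the wire
# (1 start + 8 data + 1 stop, or 7 data + parity + 1 stop).
BITS_PER_CHAR = 10


def chars_per_second(baud: float, bits_per_char: int = BITS_PER_CHAR) -> float:
    """Maximum character throughput of an asynchronous serial line."""
    return baud / bits_per_char


def seconds_to_send(n_chars: int, baud: float, bits_per_char: int = BITS_PER_CHAR) -> float:
    """Time to push n_chars down the line with no gaps between characters."""
    return n_chars / chars_per_second(baud, bits_per_char)


if __name__ == "__main__":
    # The two extremes from the review plus two illustrative standard rates.
    for baud in (75, 300, 1200, 9600):
        print(f"{baud:>5} baud: {chars_per_second(baud):6.1f} chars/s, "
              f"1,024 chars in {seconds_to_send(1024, baud):7.2f} s")
```

Even at the top setting the line delivers far more characters per second than any speech rate, which is why the serial speed matters mainly for how quickly the buffer fills.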
  • No problems were experienced in interfacing this product to a normal serial port, apart from the usual pilot error. Within a couple of hours, spent mostly making cables and procuring a speaker, I had the unit repeating back everything I typed at my terminal. No software is provided, but thanks to Microsoft Basic's MID$ function, the accompanying program was easy to write.
80 GO TO 20
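A plausible shape for such an echo program is: read a typed line, step through it one character at a time, end it with a carriage return, and jump back for the next line (as the 80 GO TO 20 line does). The minimal Python sketch below follows that shape; it is not the reviewer's program. write_byte is a hypothetical hook for whatever routine puts a byte on the host's serial port, and iterating over the bytes stands in for a MID$(A$, I, 1) loop, on the assumption that this is how MID$ was used. The trailing carriage return matters because that is what prompts the unit to move buffered text into its holding area for speech.

```python
import io
from typing import Callable, TextIO


def echo_to_speech(terminal: TextIO, write_byte: Callable[[int], None]) -> None:
    """Repeat each line typed at the terminal to the synthesizer.

    write_byte is a hypothetical hook that sends one byte out the serial
    port wired to the Type 'n Talk; the review does not describe that port.
    """
    for line in terminal:                               # next typed line, like looping back to line 20
        text = line.rstrip("\r\n")
        data = text.encode("ascii", errors="replace")   # the unit takes plain ASCII text
        for byte in data:                               # one character at a time, like MID$(A$, I, 1)
            write_byte(byte)
        write_byte(0x0D)                                # carriage return hands the line over for speech


if __name__ == "__main__":
    sent = []
    echo_to_speech(io.StringIO("HELLO THERE\nO K\n"), sent.append)
    print(bytes(sent))
```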
  • The first thing heard after properly connecting the device and powering it up is "System Ready," after which it awaits data to pronounce. The voice is a monotone, but after a little experimentation with the frequency knob one can dial in a pleasant speaking tone. The range of adjustment is large, and the control has a slight peculiarity: turning it clockwise lowers the frequency, the opposite of what one would expect. The tone can be varied from a positively gravelly rumble to a fast, high-pitched, mousy voice; at both extremes the sound goes beyond what is reasonable.
  • The unit has three main operating modes that can be selected under software control. The first (the power-up condition) speaks words, where a word is defined as a continuous string of alphabetic characters preceded and terminated by a space or a period. In this mode the algorithms are put to work, analyzing the context within each word and outputting the proper phoneme sequence. If the unit encounters individual letters surrounded by spaces, it speaks each letter as if reciting the alphabet. The string OK would be pronounced "ahk," and O K as "oh kay." Numbers are always pronounced as individual digits. If the data contains an embedded period just before a number, it is spoken as "point," correctly interpreting it as a decimal point.
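The mode-one rules can be illustrated with a toy classifier that labels text the way the review says the unit treats it: runs of letters as words, lone letters as alphabet names, every digit separately, and a period directly before a digit as "point." This is a hypothetical illustration only. The Votrax letter-to-phoneme rules live in the potted PROM and are not modeled, and the label names (word, letter, digit, point) are invented for this sketch.

```python
import re
from typing import List, Tuple

# Alternatives are tried left to right at each position in the text.
_TOKEN = re.compile(
    r"(?P<alpha>[A-Za-z]+)"   # a run of letters: a word, or a lone letter
    r"|(?P<digit>\d)"         # every digit is spoken on its own
    r"|(?P<point>\.(?=\d))"   # a period just before a digit reads as "point"
    r"|(?P<skip>[\s\S])"      # spaces, other periods, punctuation: delimiters only
)


def classify(text: str) -> List[Tuple[str, str]]:
    """Label text roughly as the review describes mode one treating it."""
    tokens: List[Tuple[str, str]] = []
    for match in _TOKEN.finditer(text):
        kind, tok = match.lastgroup, match.group()
        if kind == "alpha":
            # A single letter between delimiters is recited as a letter name.
            tokens.append(("letter" if len(tok) == 1 else "word", tok))
        elif kind in ("digit", "point"):
            tokens.append((kind, tok))
    return tokens


if __name__ == "__main__":
    print(classify("PI IS 3.14."))
    # [('word', 'PI'), ('word', 'IS'), ('digit', '3'), ('point', '.'), ('digit', '1'), ('digit', '4')]
```

The trailing period in the example is dropped because no digit follows it, matching the review's description of a period as a word terminator.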
  • The second mode speaks every character as an individual alphabet letter, regardless of word grouping. The third mode (for more advanced users) is direct phoneme input. Functionally, it resembles the unit Radio Shack markets for the TRS-80, which is also a Votrax product. Although this would appear to be the more capable mode for accurate reproduction, most of what direct phoneme control offers can be achieved in the first mode by carefully manipulating word spelling. This is an intuitive process that isn't difficult once you get the hang of it. For example, the word computer is not spoken well by the device in mode one, but respelled as cum pewter it comes out quite well. Note that it has been split into two words. This forces the unit to pronounce the "y" sound in pewter, giving "pyuter," whereas joined into one word it would sound like "cumpooter." The best way to develop a rapport with the tool is to sit down and experiment.
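The respelling trick lends itself to a small preprocessing step on the host: keep a table of troublesome words and their respellings, and substitute whole words before the text goes to the unit. In this sketch the RESPELLINGS table and the respell function are hypothetical; only the computer to cum pewter entry comes from the review.

```python
import re
from typing import Dict

# Hypothetical user-maintained table. Only "computer" comes from the review;
# add entries as you find words the unit mispronounces in mode one.
RESPELLINGS: Dict[str, str] = {
    "computer": "cum pewter",
}

_WORD = re.compile(r"[A-Za-z]+")


def respell(text: str, table: Dict[str, str] = RESPELLINGS) -> str:
    """Replace whole words found in the table (case-insensitively) before sending."""
    def swap(match: "re.Match[str]") -> str:
        word = match.group()
        return table.get(word.lower(), word)

    return _WORD.sub(swap, text)


if __name__ == "__main__":
    print(respell("This computer talks."))  # This cum pewter talks.
```

Matching whole words only means "computers" is left alone until it gets its own entry, which keeps each respelling a deliberate, tested choice.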
  • Not all sounds are perfectly formed; some are deficient. For example, the letters G, H, and L, when spoken as parts of words, are too short in duration and lack strength. Also, when the unit speaks individual alphabet letters such as D, P, T, and others that end with the "ee" sound, the pronunciation sounds almost like two syllables, with a slight hump in amplitude at the transition from the consonant to the "ee" follow-through. It has been pointed out that learning to understand this unit, as with many other synthesizers, is much like learning to understand a person with a foreign accent. Because pronunciation is the same each time under similar circumstances, one quickly gets to know the quirks. The Type'n'Talk is even easier than most due to its well-designed algorithms.
  • The various supply voltages are delivered to the Type'n'Talk through a multi-conductor cable from a separate plastic power module. The supply runs fairly warm to the touch, and its primary is always connected to live AC, since the power switch is on the low-voltage DC side. No other cables are supplied with the unit, so be prepared to have on hand an RS-232 cable and a miniature phone plug wired with two conductors to a speaker.
  • Although the manufacturer's data claims a conservative 750-character buffer, 850 to 900 characters can be held at any one time. The unit handles data as follows. When the power is turned on, the buffer is empty. As data is input, it is written into the 2114 RAMs. As soon as a carriage return is received, the unit moves about 125 bytes or all of the buffered characters, whichever is less, into an output holding area for processing and speech. If you send 125 bytes followed by a carriage return, then 750 more bytes, you will have effectively used all of the memory. Unfortunately, this is slightly less than the 1,024 characters needed for a complete screen dump from a 64-by-16 video screen.
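The buffering behavior can be modeled in a few lines. This is a toy model under stated assumptions, not the unit's firmware: the sizes (a 750-character input area plus a 125-character holding area) are inferred from the review's 125 + 750 example, the carriage return itself is not stored, characters move to the holding area only when it is empty, and speech never drains the buffer, so the model shows the worst case. TypeNTalkBufferModel and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, List

# Assumed sizes, inferred from the review's example of 125 bytes, a carriage
# return, then 750 bytes filling the memory (about 875 characters in total).
INPUT_CAPACITY = 750
CHUNK = 125


class TypeNTalkBufferModel:
    """Worst-case toy model of the Type 'n Talk buffering (no speech drain)."""

    def __init__(self, input_capacity: int = INPUT_CAPACITY, chunk: int = CHUNK) -> None:
        self.input_capacity = input_capacity
        self.chunk = chunk
        self.pending: Deque[str] = deque()  # text waiting in the 2114 RAMs
        self.holding: List[str] = []        # text handed over for processing and speech

    def write(self, ch: str) -> bool:
        """Accept one character; return False if it would overrun the buffer."""
        if ch == "\r":
            # Carriage return: move up to `chunk` characters to the holding area.
            # Assumption: this only happens while the holding area is empty.
            if not self.holding:
                n = min(self.chunk, len(self.pending))
                self.holding = [self.pending.popleft() for _ in range(n)]
            return True
        if len(self.pending) >= self.input_capacity:
            return False
        self.pending.append(ch)
        return True

    def held(self) -> int:
        """Total characters currently stored in the unit."""
        return len(self.pending) + len(self.holding)


if __name__ == "__main__":
    model = TypeNTalkBufferModel()
    for ch in "A" * 125 + "\r" + "B" * 750:
        model.write(ch)
    print(model.held(), model.write("C"), 64 * 16 > model.held())  # 875 False True
```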
  • Proper timing will reduce overrun
  • The interface protocol requires the use of CTS (clear-to-send) to guard against buffer overrun. This is somewhat inconvenient compared with the XON/XOFF protocol used by many terminals, since the host port must poll this status line like a parallel input bit. It is not a major problem, because the buffer is quite large, and if you are careful with your software timing, the unit will be able to speak all of the desired text without filling up. Using a lower baud rate also makes overrun less likely, and unless text is sent in batches, there is virtually no difference in output speed. The user's manual is brief and does not include a circuit diagram. Votrax says it will soon publish the non-confidential portion of the schematic, which will indeed be a help. In the interest of brevity, the manual omits some important details, such as the polarity of the CTS line during busy and not-busy conditions. Measurement shows that the line is +12 volts when the buffer is not full and -12 volts when it is.
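A host without hardware flow control has to perform the CTS check in software before every character. The sketch below does that, with write_byte and cts_asserted as hypothetical hooks onto the host's serial port, and a timeout added as a design choice so a disconnected unit cannot hang the program. The measured polarity matches the usual RS-232 convention that a control line is asserted (ON) when it is more positive than +3 V, which is what cts_from_volts encodes.

```python
import time
from typing import Callable

RS232_ON_THRESHOLD = 3.0  # volts; an RS-232 control line above +3 V reads as asserted


def cts_from_volts(volts: float) -> bool:
    """Interpret a measured CTS voltage.

    The review measured +12 V when the buffer is not full (OK to send)
    and -12 V when it is full (stop sending).
    """
    return volts > RS232_ON_THRESHOLD


def send_with_cts(data: bytes,
                  write_byte: Callable[[int], None],
                  cts_asserted: Callable[[], bool],
                  poll_interval: float = 0.01,
                  timeout: float = 30.0) -> None:
    """Send data one byte at a time, waiting for CTS before each byte.

    write_byte and cts_asserted are hypothetical hooks: one writes a byte
    to the serial port, the other reports the current CTS status.
    """
    for byte in data:
        deadline = time.monotonic() + timeout
        while not cts_asserted():  # buffer full: hold off until the unit catches up
            if time.monotonic() > deadline:
                raise TimeoutError("CTS stayed deasserted")
            time.sleep(poll_interval)
        write_byte(byte)
```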
  • By using a unit-assignment code (address), one can daisy-chain several Type'n'Talks and access each device individually. One can also communicate bidirectionally with the product and get from it the phoneme data for the words given to it, which lets the user fine-tune a word with minimal effort. Because it is a monotonic device, the human quality of the speech is somewhat limited. Votrax says it is considering production of a device with software-controlled inflection. On the evidence of this superior product, the consumer speech synthesizer appears to have finally come into its own.
  • Neil Rosenberg holds a master's degree in Product Design/Engineering from Stanford University and a degree in architecture from MIT. He is currently employed as Engineering Manager at Integral Data Systems, a matrix printer manufacturer in Milford, NH.