
Add a Voice to Your Computer for $35





Steve Ciarcia POB 582, Glastonbury CT 06033

Published June 1978 © BYTE Publications Inc.
  • Talk to Me!

  • "Talk to me! Talk to me!"
    "OK! I'll talk to you if you need it that much!" Ken called out as he descended the stairs into my cellar workshop. "You sure you aren't going a little buggy?"
    I looked up from the video monitor and parted the piles of cassette tapes and printouts. Ken was a good neighbor and I knew his comment was only in jest. I hit the carriage return and the speaker said, "Talk to me!"
    Ken smiled when he realized I was just exercising the voice synthesizer option I had previously added to my system.
    "This synthesizer is part of the reason I'm here this evening," he said.
    "What's the problem?" I asked.
    "No problem really. We just got a microcomputer in my company's R and D lab and I've been playing with it lately. It's pretty sophisticated and has plenty of memory space. What would it cost to put that type of synthesized voice on our computer? I can probably raise $50 among the technicians for it. They'd get a kick out of it."
    "Well, depending on the manufacturer and the particular interface, they usually run from $400 to $800 and up." I looked at the startled expression on Ken's face. It was what I normally call "peripheral face," the look you get when you tell someone that it'll cost $1100 for a video terminal to communicate with the computer he just bought for $250.
    "So much for that idea. How's the weather been lately?"
    "Wait!" I interjected. "How much memory do you have on your lab microcomputer?"
    "40 K, I believe. Why?"
    "How much of a vocabulary do you need?"
    "I suspect we'd only need the numbers 0 through 9 and a few letters. We want to monitor data and verbally record channel number and input value. But at that price it's far too expensive to justify."
    "How about digitized speech? You probably have enough memory for that."
    "What's that?"
    "It's a process to record speech digitally. For all practical purposes it's like a tape recorder, but instead of magnetic tape for the storage medium it uses the computer's programmable memory. The tape recorder uses an analog storage method while the computer stores the information digitally."
    "If it's that simple why don't more people use it?"
    "It's mostly because it's not very memory efficient. A voice synthesizer is an analog voltage generator that creates the speech phoneme sounds through a hard wired circuit. In its most advanced form a single 8 bit byte can be used to tell the synthesizer what discrete sound it should make. By sending it a series of byte codes, words can be made from the discrete sounds. That's the way my Votrax synthesizer works." I pulled out a pad to sketch my explanation. "In digitized speech the analog voice input is sampled very quickly with a high speed analog to digital converter, and the samples are stored in memory. To reconvert to analog or "say" the words, the stored digital data is sent to a digital to analog converter at the same rate and in the same order the samples were taken. The concept of digitized speech has been around for a long time, but up until recently the cost of a system dedicated to this was prohibitive. You already have the computer and enough memory for limited applications. All you need is the high speed analog to digital and digital to analog converters and the knowledge to do it."
    "And what is that going to cost me, $500?" Ken was still skeptical.
    I opened a drawer under the bench. It was my "junk box" (in my case one corner of my cellar is a junk room). I rummaged through the prototype boards from previous experiments and pulled out a particular one. "Ah, here we are. You remember a few months ago when I designed that 8 channel digital voltmeter (December 1977 BYTE, page 76, and January 1978 BYTE, page 37)? I needed it to troubleshoot this board. This is all you need for digitized speech." I tossed the board to Ken. "It contains a 100,000 sample per second 8 bit analog to digital converter and an equivalent speed digital to analog converter. And now the beauty part: It cost less than $35 to build."
    "Great! Tell me how to use it. How much memory does it need? What kind of program does it use? Can you tell me how to use it so I can borrow it for work tomorrow?"
    "Well, let's go over the concept in more detail...."

  • Figure 1a: Block diagram of a digital speech recording system. Speech is picked up as sound waves by the microphone and is amplified and processed through a high speed analog to digital converter which samples the analog sound waveform several thousand times a second. These samples are stored in the computer's programmable memory.


  • What is Digitized Speech?
  • Digitized speech is simply a standard data acquisition technique with a new definition. For years people have been using computers to scan analog to digital input converters and store the results in memory. Often, in high speed applications such as wind tunnels and nuclear experiments, the sample rates can exceed thousands of samples a second. In cases where the critical event is of short duration, these thousands of samples are stored directly into memory to increase system throughput capabilities. When the event has passed and sampling has stopped, the computer memory contains a record of that event in discretely timed intervals. The stored data is now available to be reduced, analyzed or listed. It's often listed in "slow motion." This technique employs an analog pen recorder and a digital to analog converter. Each sample is successively processed through a digital to analog converter at a slow rate to the pen recorder. The result is an expanded view of a short event.
    An alternative method for utilizing this stored data is to play it back in real time.
    In this case the computer outputs the stored data to the digital to analog converter at the same rate the data is taken. The output of the converter would then exactly duplicate the values of the event previously recorded (at the times the samples were taken).
    Digitized speech is a specific application of this type of data recording technique. Your voice, when applied to a microphone and amplifier, creates a fluctuating analog voltage that varies at the frequency rate of the sound. If this analog signal is applied to the input of a high speed (greater than 10,000 samples per second) analog to digital converter and stored in memory, the computer won't care whether the source is speech or a nuclear reaction. The analog fluctuations are "digitized" at discrete sampling intervals and stored (figure 1a). If the stored memory table is sent to a digital to analog converter at the same rate it was initially sampled, the speech is reproduced exactly. Of course there are trade-offs and limitations that have to be considered to produce a usable system (figure 1b). We will consider them in detail later.
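The record and playback idea described above can be sketched in a few lines of Python. This is only an illustration, not the article's program: the function names are hypothetical, a 500 Hz tone stands in for speech, and the -2 to +2 V converter range is an assumption borrowed from the calibration example later in the article.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, v_min=-2.0, v_max=2.0):
    """Sample signal(t) at fixed intervals and quantize each sample to 8 bits."""
    n = round(duration_s * sample_rate_hz)
    table = bytearray()
    for i in range(n):
        v = signal(i / sample_rate_hz)
        v = min(max(v, v_min), v_max)              # clip to the converter range
        table.append(round((v - v_min) / (v_max - v_min) * 255))
    return table

def play_back(table, v_min=-2.0, v_max=2.0):
    """Turn the stored bytes back into voltages, in the order they were taken."""
    return [v_min + code / 255 * (v_max - v_min) for code in table]

def tone(t):
    """Stand-in for a voice signal: a 500 Hz sine wave of 1.5 V amplitude."""
    return 1.5 * math.sin(2 * math.pi * 500 * t)

table = digitize(tone, 0.01, 10_000)    # 10 ms at 10,000 samples per second
voltages = play_back(table)
print(len(table), "bytes stored")
```

Each reconstructed voltage differs from the original sample by at most half a quantization step, which is the sense in which 8 bit playback "duplicates" the recorded event.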


  • Figure 1b: Block diagram of a digital speech playback system. Digital sample points stored by the system in figure 1a are converted by a high speed digital to analog converter into an analog speech waveform. A low pass filter is used to smooth the signal, which is then amplified and played back through a speaker.


  • A digitized speech system creates its output waveform by digital to analog conversion rather than by completely analog generation as in the case of a voice synthesizer. The major consideration that limits the usefulness of digital speech is the vast quantity of data which must be stored to reproduce a single spoken word.
  • Choosing the Correct Sampling Rate

  • Figure 2a: A waveform (considerably simplified) which is characteristic of the voice.

  • Figure 2b: Waveform in figure 2a after being processed through a digital to analog converter at a sample rate of 5000 samples per second.

  • Figure 2c: Waveform in figure 2a after being processed through a digital to analog converter at a sample rate of 10,000 samples per second.

  • The 8 channel digital voltmeter mentioned earlier has a maximum sampling rate of 25 conversions a second. A slow speed analog to digital converter of this type is of no value in this application. The normal human voice occupies a bandwidth of 4000 Hz, and taking 25 samples within a period of one second could not effectively record the event. At what sampling rate should audio speech be digitized?
    There is a specific law used to determine this rate, called the Nyquist criterion. It states that, at the very minimum, the sampling rate of the digitizer must be twice the maximum frequency of the input signal. If human voice extends to 4 kHz, the minimum sample rate should be 8 kHz. This presumes that there is an ideal low pass filter on the output of the converter. Ideal filters are something like perpetual motion, impossible to attain. In reality the sampling rate should be three or four times the highest input frequency. This means that to digitize voice fully you need a sample rate of 12 to 16 kHz.
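The arithmetic in this paragraph can be written out as a short Python sketch. The function name and the tuple it returns are illustrative choices, not anything from the article; the factors of two, three and four come straight from the text.

```python
def sample_rates_hz(max_signal_hz, practical_factors=(3, 4)):
    """Return the Nyquist minimum and the practical range of sample rates."""
    nyquist_minimum = 2 * max_signal_hz
    practical = tuple(factor * max_signal_hz for factor in practical_factors)
    return nyquist_minimum, practical

# Voice bandwidth of 4 kHz, as assumed in the text.
print(sample_rates_hz(4000))   # (8000, (12000, 16000))
```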
    It is easier to explain the digitization process visually. Figure 2 illustrates an expanded view of a typical speechlike waveform. Voice waveforms are complex: the majority of the voice sounds exist below 1500 Hz, but intonation and accent occupy the higher frequencies. It is these added harmonics and inflections that make one voice different from another, and capturing and recording them is an important consideration. The waveform in figure 2 has been digitized at two different rates for comparison. Figure 2a is the original waveform which consists of a fundamental frequency of approximately 500 Hz and some added components of higher frequency. If this waveform is "digitized" or sampled at a 5000 samples per second rate and the stored values are sent to a digital to analog converter, the resultant waveform would be that shown in figure 2b. It is easy to see that only a vague representation of the original waveform would be recorded. Even though this output is filtered before being amplified, the higher frequency components of the original input would be lost. Increasing the sampling rate to 10,000 samples per second as in figure 2c gives a better record of the higher frequencies. The addition of a good low pass filter would eliminate the sharp transitions between samples.
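The comparison in figure 2 can also be approximated numerically. The sketch below is a rough stand-in, not a reproduction of the figure: the test waveform (a 500 Hz fundamental plus a weaker 2000 Hz component) and the function names are assumptions, and the "playback" is an unfiltered sample-and-hold staircase with no low pass filter.

```python
import math

def voice_like(t):
    """Simplified stand-in for figure 2a: 500 Hz plus a weaker 2000 Hz component."""
    return math.sin(2 * math.pi * 500 * t) + 0.3 * math.sin(2 * math.pi * 2000 * t)

def hold_rms_error(signal, sample_rate_hz, duration_s=0.01, fine_rate_hz=100_000):
    """RMS difference between the signal and its unfiltered sample-and-hold copy."""
    n = round(duration_s * fine_rate_hz)
    total = 0.0
    for k in range(n):
        t = k / fine_rate_hz
        # Index of the most recent sample taken at or before time t (integer math).
        held_index = k * sample_rate_hz // fine_rate_hz
        held_value = signal(held_index / sample_rate_hz)
        total += (signal(t) - held_value) ** 2
    return math.sqrt(total / n)

err_5k = hold_rms_error(voice_like, 5_000)
err_10k = hold_rms_error(voice_like, 10_000)
print(f"5,000 samples/s: RMS error {err_5k:.3f}")
print(f"10,000 samples/s: RMS error {err_10k:.3f}")
```

The staircase taken at 10,000 samples per second tracks the waveform more closely than the one taken at 5000, which is the point the figure makes visually.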


  • Tradeoffs to be Considered

  • Figure 3a: An 8 bit successive approximation analog to digital converter.

  • The benefits associated with the reduced cost of the voice input and output circuitry are counteracted by the increased memory requirements. Digitized speech uses a lot of memory. In the previous example, if the voice input is sampled at 10,000 samples per second, the table in memory needed to store one second of data would be 10,000 bytes long (presuming an 8 bit analog to digital converter). If increased fidelity is required and the sampling rate is set for 16 kHz, the table would fill up at a rate of 16,000 bytes per second.
    Obviously, systems like my own, which already have considerable amounts of programmable memory, would be easy to use for experimenting with digital speech. I do not recommend buying additional memory just to store a few words, but, if you have it, you'll be surprised at the results.
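The memory trade-off is simple multiplication, sketched here in Python. The function names are hypothetical. The last line treats Ken's "40 K" as 40 x 1024 bytes that are entirely free for the speech table, which is an optimistic assumption since the program itself also needs room.

```python
def table_bytes(sample_rate_hz, seconds, bytes_per_sample=1):
    """Bytes of memory needed to hold `seconds` of digitized speech."""
    return round(sample_rate_hz * seconds * bytes_per_sample)

def seconds_that_fit(memory_bytes, sample_rate_hz, bytes_per_sample=1):
    """Seconds of speech that fit in `memory_bytes` at the given rate."""
    return memory_bytes / (sample_rate_hz * bytes_per_sample)

print(table_bytes(10_000, 1))                 # 10000
print(table_bytes(16_000, 1))                 # 16000
print(seconds_that_fit(40 * 1024, 10_000))    # 4.096
```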



  • Building a Voice Digitizer

  • To experiment fully with digitized speech, it is necessary to have a high speed analog to digital converter to store the analog input and a high speed digital to analog converter to reconstruct the analog output.
    Figure 3a shows the schematic of an 8 bit analog to digital converter capable of sample rates in excess of 200,000 samples per second. With a 900 kHz clock rate it will run at a modest 100,000 samples per second. Figure 3b shows an 8 bit digital to analog converter and low pass filter with similar capabilities. The estimated total cost for parts is $35.
    The analog to digital converter is a general purpose high speed 8 bit converter that can be used for any data acquisition application requiring high speed. The technique used to attain this speed is called successive approximation. The circular logic of successive approximation is best explained in a block diagram (see figure 4).
    Initially, the output of the Successive Approximation Register (SAR) and the mutually connected digital to analog converter is at a zero level. After a start conversion pulse, the register enables the output bits one at a time starting with the most significant bit (MSB). As each bit is enabled, the comparator gives an output signifying whether the amplitude of the input signal is greater than or less than the amplitude of the converter. If the converter output is greater, that particular bit is set equal to 0; if less than, it is set to 1. The register moves successively to the next least significant bit (retaining the setting on the previously tested bit or bits) and performs the same test. After all the bits of the converter have been tested, an end of conversion (EOC) signal is output and then the conversion cycle is complete. The entire conversion period takes only nine clock cycles, and another conversion begins on the next clock pulse when in free run mode. To retain the 8 bit value between conversions, an 8 bit register (IC3) has been added (see "Control the World," September 1977 BYTE, page 30, for a complete description of MC1408 digital to analog converter operation).

  • Table 1: Power wiring table for figures 3a and 3b.

  • Figure 3b: An 8 bit digital to analog converter and low pass filter.

  • Figure 3c: Power supply circuitry for figures 3a and 3b.

  • Figure 4: Block diagram of a typical successive approximation analog to digital converter. The device uses a digital to analog converter to perform its function. The successive approximation register is initially set to 0. After a start conversion pulse, the register enables the output bits one at a time, starting with the most significant bit (MSB). As each bit is enabled, the comparator gives an output signifying whether the amplitude of the input signal is greater than or less than the amplitude of the digital to analog converter. If the converter output is greater, the bit in question is set equal to 0. Otherwise it is set to 1. The process continues for the remaining bits, until the conversion is complete.
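The successive approximation procedure described above can be modeled in software. The following Python sketch is a behavioral model only, not a description of the actual circuit: the function name is hypothetical, the -2 to +2 V input range is an assumption taken from the calibration example in the assembly steps, and the handling of an exact tie between the input and the converter output (the bit is kept) is also an assumption, since the text covers only "greater" and "less than".

```python
def sar_convert(v_in, v_min=-2.0, v_max=2.0, bits=8):
    """Model of a successive approximation conversion, most significant bit first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)                   # enable the bit under test
        dac_out = v_min + trial / (1 << bits) * (v_max - v_min)
        if dac_out <= v_in:                         # converter output not greater:
            code = trial                            # the bit stays set to 1
        # otherwise the converter output is greater and the bit returns to 0
    return code

print(format(sar_convert(0.0), "08b"))    # 10000000
print(format(sar_convert(-2.0), "08b"))   # 00000000
print(format(sar_convert(2.0), "08b"))    # 11111111

# Nine clock cycles per conversion at a 900 kHz clock:
print(900_000 // 9, "conversions per second")   # 100000 conversions per second
```

The loop makes eight comparisons, one per bit, so the conversion time does not depend on the input value. That fixed cost is what makes the technique fast compared with a converter that counts up until it reaches the input.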



  • Assembly and Testing

  • 1. Component types and values are chosen to allow high speed operation. Substitution of slower devices may compromise overall performance.
  • 2. Assemble components on a prototype board as neatly as possible. Keep wires between components short and direct. The MC14559 is a CMOS device and it should be handled carefully. Sockets are suggested for all integrated circuits.
  • 3. Check power supply voltage before inserting integrated circuits. Then insert clock oscillator IC6. The clock frequency should be around 900 kHz.
  • 4. Insert the rest of the integrated circuits and ground the V input connection of IC4. Slowly rotate the zero adjust pot until the parallel output of IC3 reads binary 10000000. This output can be read either through a computer program which scans and displays this value or with LEDs attached to the output pins. In practice, the LEDs are easier in the long run.
  • 5. Remove the short on V input and apply a voltage of +2 V. Adjust the span adjust pot until the displayed output is 11111111. The result of this procedure is an analog to digital converter with an input range of -2 to +2 V represented by binary 00000000 and 11111111 patterns respectively. 0 V is represented by 10000000. Any voltage span between +5 and -5 V can be set on this circuit using this method.
  • 6. The digital to analog converter section should be assembled with the same care. Insert all ICs. With all parallel input pins at a logic zero level, adjust the zero pot until IC9 pin 6 reads 0 V.
  • 7. With all parallel input pins at a logic 1 level, adjust the span pot until the output at IC9 pin 6 equals the +V setting of the analog to digital converter (2 V in the example).
  • 8. The low pass filter in the schematic is optimized for the speech samples in the text, but can be experimentally determined. The optimum cut off frequency of the low pass filter should be the sampling rate frequency (ie: 10 kHz cut off for 10 kHz sample rate).
  • 9. The easiest way to test the entire unit is to attach the analog to digital converter output to the digital to analog converter input. What goes in should come out! Since both units would be running continuously at the 100,000 samples per second rate, this will give the experimenter firsthand knowledge of the ultimate fidelity of the system. Don't expect miracles with an 8 bit unit; 12 bit units would be far superior, but 8 bit precision is more than adequate. A standard cassette recorder in the record mode serves as a handy amplifier. The amplified output is available at the earphone jack on most recorders.



  • Using the Interface with a Computer

  • Not everyone will want to add a voice to their home computer but the concept is nonetheless intriguing. Once you have built the analog to digital converter and digital to analog converter of figure 3 you are ready to digitize the spoken word. Listing 1 is a simple program that reads the analog to digital converter output and puts the values sequentially in a memory table. Hardware for the experiment should be arranged as in figure 1a. When the program is executed it will scan the input port containing the analog to digital converter information and will compare this value to hexadecimal A5 (when speech is started, the audio level will presumably exceed this trigger level). The amplifier should be adjusted to eliminate false triggering because of background noise.
    When the input level is attained, the digitization process begins. The program sets the beginning address of the memory and sequentially reads the input port and stores the value. The rate at which the sampling occurs is determined by the value of a constant, "SAMP." A value of hexadecimal 38 is approximately 3 kHz on my Z-80 system. When the table is filled, the program stops. All programs in this article, while written on a Z-80, use only 8080 instructions.
    Once the table is filled with digital values corresponding to a voice input we are ready for the next phase: voice output. The hardware is configured as in figure 1b, and the output program shown in listing 2 should be used. The same values as those of the input program should be used for START, END and SAMP. When the program is executed, the recorded data gives a speech output.
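The flow of the record and playback programs described above can be sketched in Python. This is not a translation of listings 1 and 2: the function names, the callable "ports", and the decision to store the triggering sample as the first table entry are all assumptions, as is the default trigger value of 0xA5. The timing delay controlled by SAMP is only marked with a comment, since its length depends on the processor.

```python
def record(read_port, table_size, trigger=0xA5):
    """Wait for the input to exceed the trigger level, then fill the table."""
    sample = read_port()
    while sample <= trigger:          # idle until speech starts
        sample = read_port()
    table = bytearray([sample])       # assumption: keep the triggering sample
    while len(table) < table_size:
        table.append(read_port())
        # A real program would burn a SAMP-controlled delay here to set the rate.
    return table

def play(table, write_port):
    """Send the stored samples to the output port in the order recorded."""
    for sample in table:
        write_port(sample)
        # The same SAMP-controlled delay would go here so the rates match.

# Simulated ports: a list of input readings and a list collecting the output.
readings = iter([128, 130, 150, 200, 190, 180, 170])
table = record(lambda: next(readings), table_size=3)
output = []
play(table, output.append)
print(list(table), output)            # [200, 190, 180] [200, 190, 180]
```

Using the same delay in both loops is what keeps the playback pitch and speed equal to the original recording.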
    For most computer experimenters, hearing is believing. To allow people to try out the concept without having to construct the analog to digital converter I have included a predigitized listing of a few words. This 2000 byte listing (listing 3) will say "Talk to me" when read out using the program of listing 2. Since I could not presume that everyone had the patience to hand load a 10,000 byte table with good fidelity, a compromise was in order. The sample rate on this table is only 3 kHz, but the speech will still be understandable. It should also be realized that since this example of digitized speech is actually recorded sounds, the words "Talk to me" will be in my voice. The fact that I have a fairly low voice allows understandable speech even at these very low sample rates.

  • Listing 1: An 8080 assembler program that reads the 8 bit parallel output of the analog to digital converter and stores the samples sequentially in memory. This assembly uses octal notation for machine codes.


  • Listing 2: An 8080 assembler program designed to output digital speech samples to the digital to analog converter at the correct rate. This assembly uses octal notation for machine codes.

  • Listing 3: A listing of the digital samples making up the phrase "Talk to me" spoken by the author. This somewhat bandwidth limited signal allows interested readers to reproduce the message through an 8 bit digital to analog converter without having to build the analog to digital converter.


  • I don't want you to finish this article and think that digitized speech is as limited as I have represented it so far. It is possible to totally simulate the capabilities of an analog speech synthesizer with more involved software. If you realize that the analog synthesizer works by connecting strings of distinctly independent phonemes, it is not hard to consider that the same can be true for the digital method. Each phoneme could be recorded separately and would occupy approximately 2 K bytes. As in the analog situation, a separate control program determines how these individual phonemes are to be connected together. Besides determining the type of phoneme to be used, the processor must also create the waveform. Such a system uses much more memory and takes considerably more processing time than something like the Votrax, but it is equally versatile.
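A minimal Python sketch of the phoneme idea follows. Everything in it is hypothetical: the phoneme names, the placeholder tables filled with the midscale value 0x80, and the function name. It simply joins prerecorded tables end to end, with no smoothing at the joins, which a usable control program would probably need.

```python
PHONEME_BYTES = 2048    # roughly 2 K bytes per recorded phoneme, per the text

def assemble(phoneme_names, phoneme_tables):
    """Join prerecorded phoneme tables into one playback table."""
    speech = bytearray()
    for name in phoneme_names:
        speech += phoneme_tables[name]
    return speech

# Placeholder tables; real ones would hold digitized recordings of each sound.
phoneme_tables = {name: bytes([0x80]) * PHONEME_BYTES for name in ("T", "AW", "K")}
word = assemble(["T", "AW", "K"], phoneme_tables)
print(len(word), "bytes for a three phoneme word")   # 6144 bytes for a three phoneme word
```

The memory cost is paid once per phoneme rather than once per word, so the vocabulary can grow without the table growing with it.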

