EMS manual

EMS: 22 channel Vocoder Manual

  • Published in ? date ? by EMS, written by ?
  • The EMS Vocoder (voice-coder) is a 22-channel vocoder that has been specifically designed to process speech and other sounds in a variety of new and interesting ways.
    In the following chapters, I will explain its many functions and describe how to operate the machine.
    I would now like to discuss the way in which natural speech mechanisms work and how the Vocoder analyses them. The speech production system is a very elaborate machine.
    When, for instance, vowel sounds are being voiced, that is, when the vocal cords are oscillating, it acts like a reed instrument. But when whistle sounds are being generated, it acts like a wind instrument. It can also generate articulated noise sounds. A brief description of the speech production system would therefore be a network composed of a set of variable resonators which can be excited either by the vocal cords, at a variable pitch and amplitude, or by a flow of air (from the lungs) producing noise or whistles. This system is described in Fig. 2.
  • The vocal cords, airflow and the resonators are all controlled by the brain and manipulated in such a way as to produce intelligible speech (except in the case of politicians).
    Consider, for example, the following piece of speech: "EMS Vocoder", Fig. 3.
  • Note that the 'S' sounds (I will call them the unvoiced parts of speech) are generated by noise excitation, and that the rest of the words (these I will call the voiced parts) are generated by the vocal cord excitation. Also note that the pitch of the vocal cord excitation changes during the words.
    Now, this information is only part of the data necessary to reconstruct the speech. If you were to listen to a noise source and an oscillator being controlled in the same way as they are in the vocal tract, you would not hear anything speech-like at all. The mechanism that transforms this sound into speech is the set of resonators, that is, the throat, nose and mouth in conjunction with the tongue. Therefore, to be able to reproduce speech, the Vocoder has to analyse the signal in the following way. It must decide whether the speech is voiced or unvoiced, it must calculate the pitch of the voiced portions, and it must continuously analyse the frequency spectrum of the speech so as to determine the operation of the set of resonators, see Fig. 4.
  • Having done this, we have all the information necessary to reproduce the speech, but the important thing is that in doing so, we can change some of the parameters and so get some interesting effects. For instance, in the forthcoming example we exchange the vocal cords and noise source for the output of an organ and get a talking organ.
    To be able to reproduce the speech, the Vocoder uses an electronic model of the vocal tract, Fig. 5.
  • In this model a noise source and an oscillator (VCO) can be turned on and off and the VCO controlled in pitch. Their outputs are used to excite the synthesis filter bank. This is a model of the set of resonators and it is controlled by the data from the analysing filter bank. Figures 4 and 5 are the analysis and synthesis sections of the Vocoder and they form the main part of the machine.
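    The analysis and synthesis sections described above can be sketched in software. This is only a minimal illustration, assuming a digital band-pass filter per channel (the well-known "Audio EQ Cookbook" biquad form), a rectifier-and-smoother as the level detector, and invented names (bandpass, envelope, vocode) and parameter values (q, cutoff). It is not a description of the EMS circuitry.

```python
import math

def bandpass(signal, fc, fs, q=4.0):
    """Second-order band-pass filter (cookbook biquad, 0 dB peak gain)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this form
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, fs, cutoff=30.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    level, out = 0.0, []
    for x in signal:
        level += k * (abs(x) - level)
        out.append(level)
    return out

def vocode(speech, excitation, centres, fs):
    """For each channel, the speech level controls the filtered excitation."""
    n = min(len(speech), len(excitation))
    output = [0.0] * n
    for fc in centres:
        env = envelope(bandpass(speech[:n], fc, fs), fs)   # analysis side
        exc = bandpass(excitation[:n], fc, fs)             # synthesis side
        for i in range(n):
            output[i] += env[i] * exc[i]
    return output
```

    With silence at the speech input the output is silent whatever the excitation, since every channel level is zero.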

  • The Vocoder is a signal processing device, and therefore it requires a signal input to obtain an output. To illustrate this, I will take the example of making an organ speak, Fig. 6.
  • Two signal sources are required: one, a source of speech which in this case is a pre-recorded tape, and two, an organ which can be used to play chords or tunes. The resultant output of the Vocoder is a speaking organ; that is, it is a sound that contains the characteristics of both of the inputs. It has the melody and a proportion of the harmonic structure of the organ plus the articulation of the speech.
    One of the fundamental properties of a good vocoder is its ability to make its inputs sound like speech. Therefore, it is particularly important when using pre-recorded speech tapes that no other spurious noises are on that tape, such as rustling of paper or background traffic noises, because the Vocoder will try to synthesize speech from them, and the result will be a rather curious output. Also, any noise on the speech tape will upset some of the Vocoder's modules and so a signal-to-noise ratio of 50 to 55 dB is required if good results are to be obtained.


  • The generation of synthetic natural sounding speech is a complex problem which can be solved using the Vocoder. However, the resultant speech always has a slight mechanical 'feel' about it and the quality of the speech varies depending on the characteristics of the original speaker. To produce synthetic speech, we must use the Vocoder to both analyse and then re-synthesize, Fig. 7.
  • In this example, the speech is analysed to break it up into its voiced and unvoiced portions, and the pitch of the voiced portions is determined. This data is then used to turn on and off the VCO and the noise source, and to control the pitch of the VCO. The VCO and noise source are then used to excite the synthesizing filter bank which is itself controlled by the data from the analysing filter bank. This enables the excitation to be articulated, the result being synthetic speech.
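    The switching of the two excitation sources can be sketched as follows. This is a minimal illustration, assuming one voiced/unvoiced decision and one pitch value per sample, a ramp waveform for the VCO and uniform random noise. The function name and arguments are invented for the example.

```python
import random

def excitation(voiced_flags, pitch_hz, fs, seed=0):
    """Ramp oscillator while voiced, noise while unvoiced (one value per sample)."""
    rng = random.Random(seed)
    phase = 0.0
    out = []
    for voiced, f in zip(voiced_flags, pitch_hz):
        if voiced:
            phase = (phase + f / fs) % 1.0      # VCO follows the extracted pitch
            out.append(2.0 * phase - 1.0)       # ramp between -1 and +1
        else:
            out.append(rng.uniform(-1.0, 1.0))  # noise source for 'S' sounds
    return out
```

    The result would then be fed to the synthesizing filter bank as the excitation.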

  • I should now like to briefly describe some of the additional features of the EMS Vocoder.
      The Spectrum Display produces a 22-bar histogram of the energy distribution of the analysed speech. The display medium is an external XY scope.

      The energy levels of the 22 analysing filters are available simultaneously at a large multiway connector at the back of the Vocoder, thus enabling a computer to process this data. Also, the computer can inject control signals so that the 22 synthesizing channels may be manipulated.

      The analysing filter bank is connected to the synthesizing filter bank via a 22 x 22 way patch board system. This enables the routing of the filters to be altered at will. Also, there are 22 synthesis input level controls which can also be used as a 22 channel equaliser.

      The frequency shifter has both UP and DOWN shifted outputs, with a frequency range of 0.05Hz to 1kHz. It can also be used to generate phasing effects.

      All the input and output connections to the Vocoder are made at the back panel, Fig. 8.

  • The Spectrum Display is a real time spectrum display of the input speech signal represented by a 22 bar histogram. Each bar shows the energy level in a bandpass filter of 1/4 octave spacing.
    When speech is being analysed it is possible to observe the motion of the formants. The diagram shows such a situation.
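    The 1/4 octave spacing fixes the ratio between neighbouring channels, so the centre frequencies follow from any one of them. The sketch below is an illustration only: the lowest centre frequency is not given in this manual and is passed in as an assumed parameter, and the function name is invented.

```python
def band_centres(lowest_hz, bands=22, per_octave=4):
    """Centre frequencies spaced 1/per_octave octave apart, starting at lowest_hz."""
    return [lowest_hz * 2.0 ** (k / per_octave) for k in range(bands)]
```

    Every fourth channel is one octave higher, and the 22 centres span 21/4 = 5.25 octaves from the first to the last.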

  • Spectrum Display is an add-on optional function. A Vocoder owner can obtain this facility by simply buying the board and plugging it in. An XY oscilloscope is required to view the display.

  • I will now describe the function of each of the connectors.

  • MAINS:
    Power input 240V - 220V, 50Hz or 110V, 60Hz, selectable.
  • FUSE: Mains fuse 1 amp.
  • X Y DISPLAY: BNC connectors. Spectrum display output to an XY scope.
    Vocoder Input and Output connectors are all 1/4" mono jacks, unbalanced. All outputs are short circuit protected.
  • Vocoder
    This is the output of the machine. It will drive 640 ohms equipment at Line Level. However, there is an output level control on the front panel if smaller levels are required. Usually, this output will be connected to a tape recorder and/or a monitoring system.
  • Equaliser
    This output is the sum of all the signals on the synthesis input level controls. Therefore it is basically a 22 channel equaliser acting upon the signal injected into speech input.
  • Pitch Voltage
    The control voltage generated by the pitch extractor appears here, as well as inside the Vocoder to control the VCOs. Thus it is possible to control external pieces of synthesizer equipment with this voltage.

  • Voiced
    When the voiced/unvoiced detector has decided that the incoming speech signal is voiced, a +2.5V signal is produced. Otherwise, a -2.5V signal is generated.

  • Unvoiced
    Operation is complementary to the above. These signals can be used to turn on and off external pieces of equipment.

  • Up Mix
    This is the UP shifted mixed signal from the frequency shifter.
  • Down Mix
    The DOWN shifted mixed signal is also available and can be used in conjunction with the UP MIX output. For instance, when they are both being used to produce slow phasing, a mobile stereo image can be generated.

  • Speech
    Input impedance: 10k ohms. Signal level required, line level. This is the speech input to the machine; if you want to use a microphone, then the signal level will have to be brought up to line level with an external pre-amplifier. Also, when trying to either set up or demonstrate the machine, it is very useful to use a pre-recorded speech tape, about 10 to 15 minutes in length.
  • Excitation A and B
    Input impedance: 10k ohms. Signal level required, line level.
    Any external excitation, such as organ, music, engine noises, etc., is inserted here.

  • VC Slew
    It is possible to voltage control the slew freeze function. Input voltage range: ca. 1V.
  • External VC
    This connector allows the VCOs and the frequency shifter to be externally controlled. Pitch spread approximately + 0.5V/octave.
  • Envelope Outputs and Control Inputs
    There are 22 inputs and outputs on this multi-way connector, which are intended for computer control. Output voltage range 0 to -4V. Input voltage range 0 to +4V. Input impedance 33k to 68k.
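    The External VC figure of about 0.5V per octave can be turned into a frequency with a one-line formula. This is a sketch, assuming an exactly exponential response and a hypothetical base frequency set on the VCO's slow motion drive. The function name is invented.

```python
def vco_frequency(base_hz, control_volts, volts_per_octave=0.5):
    """Frequency after applying a control voltage at a fixed volts-per-octave scale."""
    return base_hz * 2.0 ** (control_volts / volts_per_octave)
```

    On this assumption, +0.5V doubles the frequency and -0.5V halves it.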

  • Now for the front panel and its many controls. Let us suppose that we wish to produce synthetic speech from a pre-recorded tape. I will list the necessary sequence of events. Firstly, connect the mains to the power connection, and switch on. The orange lights should illuminate. Next, connect the tape recorder output to the speech input of the Vocoder, roll the tape and turn the speech input level control (Fig. 9) clockwise until the speech PPM meter reads peaks of 6 to 7.
  • If the meter reading is low with the control at maximum, then the tape recorder signal will have to be externally increased, otherwise the Vocoder's performance will be degraded. The orange and green lamps will now flash on and off indicating that voiced/unvoiced decisions are being made. If this does not happen, but only one colour remains on when speech is being produced from the tape recorder, then check the SLEW-FREEZE function. This is below the patchboard. When the red lamp is lit, then the voiced/unvoiced mechanism is frozen. The normal position for the slew freeze controls is with the switch off and the knob at 'fast'.

  • Next the filter bank patching and the synthesis input levels, Fig. 10.

  • The Patch Board connects the analysing filter bank to the synthesis filter bank. This is done with the patch pins, and they should be inserted as in Fig. 9. This is the normal position. Also, all the synthesis input levels should be turned fully clockwise. Now you should see the signal level lamps above these pots being lit up by the incoming speech. These lamps indicate the energy levels in the filters and so they are in fact a rather crude real time spectrum display. If the lamps do not light up, then check the squelch switch. This is located in between the PPMs and it should be off (in the up position). Also make sure that the FREQUENCY SHIFTER ON/OFF switch is off.

    Now, the output mixer, Fig. 11.

  • This mixer has three inputs, A, B and C, and one output, D, the level of which is displayed on a PPM meter, vertically above. The three inputs are one, the original speech signal; two, the excitation signal; and three, the Vocoder output (that is, the output from the synthesis filter bank). Note that the speech and the excitation can both be switched off completely.
    Turn off controls B, C, D and turn on control A. Now connect the output to an amplifier and speaker. As control D (the mixed output) is turned on, two things should happen. Firstly, the original speech will be heard and secondly, the speech signal will be seen on the output PPM. Turn off control A (speech).
    We now have nearly all the conditions necessary to produce synthetic speech. The only thing that is missing is the excitation.
    Figure 12 shows the excitation section of the Vocoder, with the exception of the excitation PPM which has been omitted.

  • This section comprises two VCO's, a noise source and the external excitation controls.
    First, I will explain the operation of the VCO's. Note that VCO 1 and 2 are the same and so for the purposes of this demonstration, I will use VCO 1. Set up the knobs and switches for VCO 1 as they are in Fig. 12. The excitation PPM will indicate a level of about 6 or 7, which can be adjusted by altering the level control knob. For best results use an excitation level of about 6 or 7 on the PPM. Now turn up the excitation control on the output mixer. You will hear a continuous tone which you can vary in pitch using the slow motion drive of VCO 1. Turn off the excitation control (this is the moment you have been waiting for) and turn on the Vocoder control to maximum. You are now hearing synthetic monotonic (constant pitch) speech. Try altering the slow motion drive and the pitch of the speech will vary.
    Now I will explain the functions of the switches, numbers 1 to 5, Fig. 12.
    Switch 1 enables the VCO's to be externally voltage controlled in frequency. Switch 2 is the connection between the VCO's and the pitch extractor. It has three positions: CALibrated, whereby a change in pitch in the input speech will produce the same interval change in the VCO; OFF, whereby the pitch extractor has no effect; and VARiable, whereby the pitch extractor controls the VCO with a gain factor between +2.2 and -2.2, which is itself controlled by the pitch knob.
    Switch 3 is used to select real time or sequencer control of the VCO's, Fig. 13.
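    The three positions of switch 2 amount to three different gains applied to the pitch extractor's voltage. The sketch below assumes that CAL corresponds to a gain of exactly 1 and that the pitch knob sets the VAR gain directly. Both are illustrative assumptions, as is the function name.

```python
def vco_pitch_control(pitch_volts, mode, pitch_knob_gain=1.0):
    """Voltage passed from the pitch extractor to the VCO for each switch position."""
    if mode == "CAL":
        return pitch_volts                       # same interval change as the speech
    if mode == "OFF":
        return 0.0                               # pitch extractor has no effect
    if mode == "VAR":
        gain = max(-2.2, min(2.2, pitch_knob_gain))  # knob range from the text
        return gain * pitch_volts
    raise ValueError("mode must be 'CAL', 'OFF' or 'VAR'")
```

    A negative gain in the VAR position inverts the pitch movement, so that the synthetic voice falls where the original rises.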

  • That is, they can be controlled by an EMS keyboard, such as a KS or DK1 or DK2. However, only the KS keyboard has a sequencer output. The real time and sequencer signals have pitch spread controls on the Vocoder, Fig. 13.
    Note that other manufacturers' keyboards can be used, either by putting the control voltage into the external VC input or via the keyboard controls. Switch 3 has a centre off position. Switch 4 selects the output of the VCO to be either a ramp or a squarewave, the latter having only odd harmonics, the former having both odd and even.
    On previous Vocoders, switch 4 was a sync switch which synchronised the VCO to the fundamental of the input speech signal. Switch 5 controls the output level of the VCO. When it is in the ON position, the VCO is on continuously; when the switch is in the V position, the VCO is only on when the incoming speech is 'voiced' speech. The green lamp is lit when this state is detected.
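    The difference in harmonic content between the two waveforms selected by switch 4 can be checked numerically. The sketch below takes one period of an ideal ramp and an ideal squarewave and measures the first few harmonics with a direct Fourier sum. The sample count and names are chosen for the example only.

```python
import cmath
import math

def harmonic_levels(waveform, harmonics=6):
    """Amplitude of harmonics 1..harmonics for one sampled period of a waveform."""
    n = len(waveform)
    levels = []
    for h in range(1, harmonics + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * h * i / n)
                  for i, x in enumerate(waveform))
        levels.append(2.0 * abs(acc) / n)
    return levels

N = 512
ramp = [2.0 * i / N - 1.0 for i in range(N)]               # rising ramp, -1 to +1
square = [1.0 if i < N // 2 else -1.0 for i in range(N)]   # 50% duty squarewave

ramp_levels = harmonic_levels(ramp)       # every harmonic present
square_levels = harmonic_levels(square)   # even harmonics vanish
```

    The ramp therefore excites every channel of the synthesis filter bank that a harmonic falls in, while the squarewave leaves gaps at the even harmonics.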

    Listen to the synthetic speech with switch 5 in the V position. You will note that the 'S' sounds are absent.
    Next, I will describe the noise source. This section is used to generate the 'S' sounds and the whispered speech effects. It has a level control and a colour control (a filter) as well as an output switch. This switch has three states: ON all the time, OFF and UV which is only on when unvoiced signals are detected. This last state is the opposite to the VCO output switch.
    Switch off the VCO and switch on the noise source. Turn both of its controls fully clockwise. The synthetic speech now produced should be a whisper. Now switch the noise to UV. Only the 'S' sounds should be produced.
    To complete the synthetic speech, switch VCO 1 to V. Now we have monotonic speech with 'S' sounds, known as fricatives.
    The last of the excitation sections is the input controls for the external excitations A and B. These are simply level controls plus a switch that allows the signal to pass all the time or only when voiced states exist or only when unvoiced states exist. Green and orange lamps indicate V and UV states.
    Next, the squelch switch in between the PPMs. This switch has the job of cleaning up the signal processing in the filter bank. When there is no speech signal, then any excitation breakthrough becomes noticeable and vice versa.
    The squelch switch brings into operation a circuit that detects the absence of either the speech or the excitation and then squelches the filter bank output. However, this switch is not to be used when slewing or freezing the Vocoder.
    Back to the synthetic speech. This speech is monotonic and therefore requires some movement in pitch to make it sound more natural. This is the job of the pitch extractor, Fig. 14.

  • This device extracts the fundamental signal from the speech and converts it to a control voltage. The switch and the pitch output knob control the voltage that is sent to the back panel, in the same way as they do in the two VCO's. The QUALITY knob controls a filter which precedes the pitch extractor. With this knob set at NORMAL the device makes fewer errors; when it is set at ERRATIC, it makes more. To demonstrate this, let's go back to the synthetic speech.
    Turn the QUALITY knob to normal and set switch 2 (VCO 1) to CAL. The result should be synthetic speech with pitch variance and fricatives. You may have to adjust the slow motion drive of VCO 1 to restore it to a natural pitch.
    Now turn the QUALITY knob to ERRATIC. The synthesised speech will occasionally produce a noticeably wild pitch. Turn the knob back to NORMAL. Next the SET-ZERO knob. The pitch range used by one speaker is normally not very great, but the range of speakers is. That is, a large man may be 3 octaves lower in pitch than a child, but both of them may only have a speaking range of 1/2 an octave. Now, it is sometimes important that the output voltage of the pitch extractor swings equally positive and negative for one particular speaker. For instance, when using the variable pitch spread knob to get a voice to go slowly from monotonic to varying pitch, there must not be a standing DC voltage on the output of the pitch extractor. If there is, then the resultant speech will have a fixed frequency shift proportional to the PITCH knob setting. Therefore, the job of the SET ZERO knob is to bias the pitch extractor's voltage output so that it swings equally positive and negative.
    Pitch Extractor
    The diagram below shows the Pitch Extractor's output for three speakers: A, a child; B, a woman; C, a man.

  • In this example, they are all speaking the same text, and thus their pitch variance is similar. However, they are displaced by a fixed interval, due to the physical differences between them. It is useful to have an output from the Pitch Extractor which swings equally about 0 volts. This is achieved using the SET ZERO knob. By adjusting this control, both A and C can be biased so as to swing equally about 0 volts.
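    The effect of the SET ZERO knob can be expressed as a simple offset. The sketch below assumes that the wanted bias is the one that places the midpoint of a speaker's pitch voltage swing at 0 volts. The function names and the voltages in the usage note are invented for illustration.

```python
def set_zero_bias(pitch_volts):
    """Offset that centres the swing of a pitch voltage contour on 0 V."""
    return -(max(pitch_volts) + min(pitch_volts)) / 2.0

def apply_bias(pitch_volts, bias):
    """Add the SET ZERO offset to every value of the contour."""
    return [v + bias for v in pitch_volts]
```

    For a contour swinging between 0.8V and 1.2V the bias comes out at -1.0V, after which the contour swings between about -0.2V and +0.2V. No standing DC voltage then remains to be multiplied by the PITCH knob setting.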
    Next the SLEW-FREEZE section, Fig. 15.
  • Information inside the filter bank is analysed and the data produced is then used to control the synthesis process. However, before this data reaches the synthesis filters it has to pass through the SLEW-FREEZE section. Thus, it is possible to freeze the data and so hold a particular filter structure, on say a vowel sound. That is, the excitation can still be varied at will, but the filter structure is frozen from that point in time. Also, it is possible to slew the data. That is, to smear it out in time. The SLEW knob performs this function. As it is rotated anticlockwise, the data flow becomes slower and slower until it eventually freezes (the lamp comes on). Note that the freeze switch will always freeze the sound, no matter where the setting of the knob.
    To demonstrate this, set up a synthetic monotonic speech output with only VCO 1, and no noise source. Turn the SLEW pot anti-clockwise and listen to the time smearing effect. Return it to fast and then use the freeze switch to freeze at various points in the speech. Note that the SLEW FREEZE section affects the filter bank, the V/UV detector and the pitch voltage. Try the previous operations on synthetic speech with both pitch and noise. The SLEW FREEZE section is also voltage controllable. For instance, you can use an external squarewave oscillator to freeze the Vocoder. If the square wave has a long freeze period and a short fast slew period, then some interesting effects can be obtained with an oscillator frequency of about 1 to 10Hz.
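    The behaviour of the SLEW-FREEZE section can be imitated with a simple smoothing step applied to each channel's control data. This is a sketch only: it assumes the data arrives as a sequence of frames (one level per channel), models the SLEW knob as a rate between 0 (frozen) and 1 (fast), and uses invented names throughout.

```python
def slew_freeze(frames, rate, freeze_flags=None):
    """Smooth per-channel levels; hold the last values while frozen.

    frames: list of frames, each a list of channel levels.
    rate: 1.0 passes data unchanged, smaller values smear it, 0.0 freezes it.
    freeze_flags: optional per-frame booleans standing in for the freeze switch.
    """
    if not frames:
        return []
    if freeze_flags is None:
        freeze_flags = [False] * len(frames)
    state = list(frames[0])
    out = []
    for frame, frozen in zip(frames, freeze_flags):
        if not frozen:
            state = [s + rate * (x - s) for s, x in zip(state, frame)]
        out.append(list(state))
    return out
```

    Passing a squarewave pattern of True and False values as freeze_flags corresponds to the external oscillator trick described above.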

  • The last section to be covered is the FREQUENCY SHIFTER, Fig. 16.
  • Adjust the output mixer so that we are listening to the original speech only and set up the FREQUENCY SHIFTER so that its knobs and switches are as shown in Fig. 16. As the FREQUENCY SHIFT knob is rotated clockwise, the pitch of the speech will rise. You may have heard this effect before; it is, in fact, single sideband modulation as used in radio communications.
    Now rotate the OUTPUT knob fully anti-clockwise and repeat the process. This time the signal will fall in pitch. Thus we have seen how the SHIFTER can move signals both up and down in pitch. Now, set both the UP MIX and DOWN MIX to 5 and set the FREQUENCY SHIFT knob to a low frequency, about 1 or 2Hz. You should now hear phasing sounds which continually sweep in one direction. Turn the OUTPUT knob back to UP MIX and this direction will reverse. It is possible to shift any one of the signals present at the OUTPUT MIXER. This selection is done with rotary switch 12.
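    It is worth noting how a frequency shift differs from an ordinary change of pitch, such as playing a tape faster. A shift adds the same number of Hz to every component, whereas a transposition multiplies every component by the same ratio. The sketch below shows this on a list of partial frequencies. The function names are invented for the example.

```python
def frequency_shift(partials_hz, shift_hz):
    """Single sideband style shift: add a fixed number of Hz to each partial."""
    return [f + shift_hz for f in partials_hz]

def transpose(partials_hz, ratio):
    """Ordinary pitch change: multiply each partial by the same ratio."""
    return [f * ratio for f in partials_hz]
```

    Partials at 100, 200 and 300Hz shifted up by 50Hz land at 150, 250 and 350Hz, which are no longer whole multiples of the lowest one. Transposed by a ratio of 1.5 they land at 150, 300 and 450Hz and remain harmonic.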
    Now the SQUELCH knob. Sometimes when there is no signal coming in, the frequency shifter will produce a faint audible tone. This is known as carrier breakthrough, but it should be about 60dB down on the signal level.
    However, if this is still unacceptable, the SQUELCH knob can be used to remove the breakthrough. This knob merely defines a signal level below which the entire signal is squelched off.
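    To put the 60dB figure in perspective, and to show the kind of action the SQUELCH knob performs, here is a small sketch. It assumes the level is judged over a short block of samples by its peak value. The real circuit's detector is not described in this manual, and the names are invented.

```python
def db_to_ratio(db):
    """Convert a level in dB to an amplitude (voltage) ratio."""
    return 10.0 ** (db / 20.0)

def squelch(block, threshold):
    """Pass the block unchanged if its peak reaches the threshold, else mute it."""
    level = max((abs(x) for x in block), default=0.0)
    return list(block) if level >= threshold else [0.0] * len(block)
```

    A breakthrough 60dB down has about one thousandth of the signal's amplitude, so a threshold set a little above that level removes it without touching normal speech.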
    Finally the remaining knob and switches. The PITCH knob and switch 9 are exactly the same in operation as those on the VCO's. Switch 10 allows an external voltage to control the shift frequency and switch 11 turns the frequency shifter on and off.
    That is all the controls described, so now (very briefly) I will show how to produce a few effects:
    Double Tracking
    Set up a synthetic speech output with pitch and fricatives, but move all the pins up one hole (a 1/4 octave). In fact, not all the pins will go in, and you will have to lose one. Note that the speech sounds like it is being produced by a smaller speaker.
    Now mix in some of the original speech and notice the double tracking effect. Try this again, but move the pins down.
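    Moving the pins amounts to changing which analysing channel controls which synthesizing channel. The sketch below assumes channels numbered 0 to 21 from low to high and takes "up one hole" to mean that analysing channel n controls synthesizing channel n + 1. The names are invented for the example.

```python
def shifted_patch(offset, channels=22):
    """Analysis-to-synthesis routing with every pin moved by offset holes.

    Pins that would fall off either end of the board are dropped.
    """
    return {a: a + offset for a in range(channels) if 0 <= a + offset < channels}

def route(levels, patch):
    """Apply a patch to one frame of analysis levels."""
    out = [0.0] * len(levels)
    for analysis, synthesis in patch.items():
        out[synthesis] = levels[analysis]
    return out
```

    With an offset of 1 only 21 pins remain, which is the pin you have to lose. The lowest synthesizing channel is left without control data.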
    Time Compression
    Move the pins down by 4 holes (1 octave), play the tape at twice the normal
    speed and adjust the VCO pitch so that the best intelligibility is obtained.
    You are now hearing time compressed speech. Compare it with the original.
    Try different forms of excitation, e.g. monotonic VCO or continuous noise.

    Time Expansion
    Repeat as above but move the pins up by 4 holes and run the tape at half speed. This decreased data rate makes things much easier for the Vocoder.
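    The number of holes follows from the tape speed. Playing a tape at twice the speed raises everything on it by one octave, which is 4 channels at 1/4 octave spacing, and the pins are moved the other way to compensate. A sketch of the arithmetic, with an invented function name and a sign convention in which negative means "down":

```python
import math

def pin_offset_for_tape_speed(speed_factor, channels_per_octave=4):
    """Holes to move the pins (negative = down) to undo a tape speed change."""
    return -round(channels_per_octave * math.log2(speed_factor))
```

    Speed changes that do not correspond to a whole number of 1/4 octave steps are rounded to the nearest hole, so the compensation is then only approximate.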

    Although the Vocoder is a portable device, and although it can be used live, it is still best thought of as being a piece of studio equipment.
    Experience has shown that it is much easier to use the machine when only one of its inputs (speech or excitation) is live. That is, either use pre-recorded speech and then try to match the excitation to it, or vice versa. For instance, when trying to make an organ sing, the procedure would be as follows:-
    1. Make a high quality recording of the singing.
    2. Patch the recorded singing and the output of the organ to the Vocoder.
    3. Give the organist the Vocoder output to listen to. Have several rehearsals before recording. Note that if the organist takes his hands off the keyboard, all the sound will cease.
    When it is required to articulate the sounds of normally inanimate sources, then it is necessary to tailor the speech to the excitation. For example, if the wind has to speak, then the speech must be long and uninterrupted. Or if you want to freeze on a vowel, it would be advisable for the speaker to elongate this vowel so as to enable the freeze to be operated in time. If you want to make a crowd speak, then the speech must be smeared out in time, possibly by adding reverberation to it.
    Another way of using the Vocoder is to feed the output back into the excitation input, having no other sources of excitation. This will make the Vocoder self oscillate, but controlled by the incoming speech. This makes a sound like a talking wind instrument. Of course, the speech input can have other signals applied to it, such as musical instruments, animal noises etc., and these will produce a variety of new sounds.
    The possibilities are limitless and I must leave it up to you to discover them.