Published June 1981, Electronics and Music Maker
by Raj Gunawardana, Texas Instruments Ltd.
Constructional details by Glenn Rogers and Peter
Kershaw
- E and MM brings you the first major Solid State Speech project for
under 100 UK pounds
- Promises to have a dramatic impact on state-of-the-art electronics -
now, and for generations to come
- Complex talking library of over 200 words with further expansion
space
- Easy interfacing to a microcomputer through a few lines of BASIC
- Pitch control has exciting electronic music applications
- For some ten years Texas Instruments have been developing solid
state speech technology with the result that speech can now be produced
which faithfully preserves the character of the spoken voice including
intonation, accent, dialect, and pitch. Linked to a microcomputer, words
can be strung together to make complete phrases and sentences so that
voice communication between 'computer' and human becomes possible.
Wordmaker
- The uses of this project are far reaching and will be of benefit to
almost anyone who uses it. The carefully selected word library has many
applications in the home and industry, for telephone, burglar alarms,
conversations, messages, games, electronic terms, studio control,
speaking clock, temperature indication, calendar, business coding,
factory announcements, and accountancy.
- This month we shall present the complete building project which can
be purchased as a kit and explain how to interface the WORDMAKER board
to a microcomputer. Possible interface circuits are included and BASIC
programs are also given. It has already been fully tested on the Sharp
MZ-80K and Tangerine systems. Further details are provided later for
other popular micros and we shall be following this article with
additional information on the processes of speech synthesis employed,
and readers' ideas for interfacing and use will be welcomed.
- The E and MM WORDMAKER Speech Synthesiser is based on the Texas
Instruments Voice Synthesis Processor (VSP). This card can be interfaced
to any computer system or used as an independent unit. The card
comprises the Texas TMS 5100 Voice Synthesis Processor, a memory bank
containing the vocabulary, and an onboard amplifier.
VSP card
- The synthesis method used is called Linear Predictive Coding (LPC).
This is a technique developed by Texas which minimises the amount of
storage needed for each word. Human speech, like most communication
signals, contains a large proportion of redundant information. LPC
involves looking at the complete word as a binary data string and
removing any redundant data. The coding is then tested to check that the
word is spoken satisfactorily. The TMS 5100 contains a 10-pole digital
filter which synthesises the voice; the filter is controlled by the LPC
data. For each word sample, the length of the data string written to the
TMS 5100 may vary from 4 to 49 bits. The device, therefore, requires
quite a high level of 'intelligence'.
- The Heart of the System: TMS 5100
- The TMS 5100 has five control lines. The command is set up on the
CTL lines and executed by toggling the command clock line, PDC.
Table 1 shows the complete list of commands and Figure 1 gives the pin
configuration of the IC.
Figure 1: TMS 5100 pin details
Table 1: The TMS 5100 VSP Command Summary
- Load Address Command: This command causes the VSP to accept a
subsequent nibble (4 bits) of data set up on the CTL lines as a speech
address segment which is transferred to voice synthesis (V/S) ROM
address registers.
Read and Branch: This instructs the VSP to set up appropriate control
signals to the V/S ROM, causing it to update its address registers with
the contents of the currently addressed pair of bytes.
- Speak: On receiving this command, the VSP takes over the control of
the V/S ROM and generates pulses on its I/O line to fetch bit-serial
data from ROM and commences speech. Pulses on the I/O line occur in
bursts at a frame interval of twenty-five milliseconds. The number of
pulses in any one frame varies from 4 to 49, depending on the data. The
timing of I/O pulses for a maximum length of 49 bits is shown in Figure
2. Details of the data structure will be discussed in a future article.
Figure 2: I/O pulse and data input to TMS 5100 in Talk mode
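The frame timing quoted for the Speak command fixes the rate at which the VSP pulls bits from the V/S ROM. The following is only a rough arithmetic sketch in Python: the 25 ms interval and the 4 to 49 bit range come from the text, while the names are illustrative and not part of the original design.

```python
# Speech data rate implied by the frame timing quoted in the text.
FRAME_INTERVAL_MS = 25     # one burst of I/O pulses every 25 ms (from the text)
MIN_BITS_PER_FRAME = 4     # shortest frame (from the text)
MAX_BITS_PER_FRAME = 49    # longest frame (from the text)

FRAMES_PER_SECOND = 1000 // FRAME_INTERVAL_MS


def bit_rate(bits_per_frame):
    """Bits fetched per second if every frame had this many bits."""
    return bits_per_frame * FRAMES_PER_SECOND


print("frames per second:", FRAMES_PER_SECOND)
print("lowest bit rate  :", bit_rate(MIN_BITS_PER_FRAME), "bits/s")
print("highest bit rate :", bit_rate(MAX_BITS_PER_FRAME), "bits/s")
```

Real speech mixes frame lengths, so the actual average rate lies somewhere between these two bounds.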
- Test Busy: This command permits the controller to access the TALK
STATUS LATCH of the VSP. In operation the command is first set up on the
CTL lines and the PDC line toggled once. A subsequent toggle of the PDC
line enables the Talk Status to be output to the CTL1 line. The Talk
Status will be high during the execution of speech generation and will
be set low on an END OF PHRASE code being encountered. A third toggle of
the PDC line is required to return the VSP to a state of accepting new
commands.
- Read Bit: This command causes the VSP to generate a single pulse on
the I/O line and thus read a single bit of data. Each data read is input
via the ADD8 line to a 4-bit shift register in the VSP. Hence four
consecutive read operations are required to completely update the shift
register contents.
- Output: On receiving this command the VSP is initialised into
outputting its buffer contents to the CTL lines. A second PDC toggle
enables the CTL output buffers and a third is required to return the VSP
to the command mode. The Output command coupled with Read Bit thus
allows the controller to access auxiliary data stored in the V/S ROM.
- Reset: This command is used to establish known initial conditions in
the internal circuitry of the VSP in readiness for a following sequence
of commands. Since the CTL lines convey data as well as commands to
the VSP, when previous conditions are not known, it is possible that a
command can be conveyed as data. Hence, it is necessary to toggle PDC at
least three times whilst maintaining the Reset command on the CTL lines, to
ensure correct synchronisation of subsequent commands. Reset can be used
in the middle of speech to stop VSP execution.
In the circuit design discussed in this article only the Reset, Test
Busy and Speak commands are used.
- Interfacing the VSP: Design Considerations
- The following requirements have been considered in interfacing the
TMS 5100 to a microcomputer to operate as a speech peripheral:
- (1) SPEECH DATA MEMORY
should have means of serial data output and an auto-incrementing address
register for sequential data access.
- (2) SPEECH DATA ADDRESS
should be presettable from the host processor to define the current
enunciation required.
- (3) THE CONTROL INTERFACE
should be consistent with device specifications (of TMS 5100).
- (4) SIGNAL LEVELS to and from the controller should
be TTL compatible.
- As far as speech data memory is concerned, two approaches can be
made in implementation:
- (1) Speech data can be stored externally to the
processor in non-volatile memory for stand-alone operation.
- (2) Speech data can be supplied from within the
processor with synchronisation to suit the TMS 5100 timing (see Figure
2).
Figure 3: VSP interface
- The circuit discussed in this article takes the first approach, to
achieve stand-alone operation. Figure 3 shows how the VSP could be
interfaced to a microcomputer by implementing a direct data path between
the address counter and the controller, instead of via the TMS 5100 CTL
lines. This feature avoids the need to decode various commands (e.g.
Load Address), to maintain a record of command sequences and to build up
the contents of the address registers, one nibble at a time.
- Speech data memory can, in theory, either be non-volatile or Random
Access Memory (RAM). If memory comprises RAM, it would be possible to
'overlay' speech code read out from a slow bulk storage peripheral such
as floppy disc or cassette tape. The circuit discussed in this article,
however, uses a choice of EPROM types for speech data storage.
- Figure 4 shows a practical circuit designed in accordance with the
architecture discussed.
The circuit is designed to be driven from a byte oriented bus and
requires a number of control bits to clock data and to monitor VSP busy
conditions. Once commanded to TALK, the circuit will operate
independently of the processor to generate a single utterance.
Concatenation of such utterances has to be carried out by the host
processor.
Figure 4: Wordmaker complete circuit diagram
- The Control Interface. The control interface
comprises four lines named C0, C1, CCLK and BUSY. C0 and C1 are used to
set up three commands on CTL2, CTL4 and CTL8 lines, as shown in Table 2.
Command   | C0 | C1
Reset     | 1  | 1
Talk      | 0  | 1
Test Busy | 0  | 0
Invalid   | 1  | 0
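Table 2 can be carried into host software as a small lookup. The sketch below is one hypothetical way to do that in Python: the bit values are those of Table 2, but the `Command` enum and the `decode` helper are illustrative names, not part of the original design.

```python
from enum import Enum


class Command(Enum):
    """(C0, C1) levels for each command, copied from Table 2."""
    RESET = (1, 1)
    TALK = (0, 1)
    TEST_BUSY = (0, 0)


def decode(c0, c1):
    """Return the command selected by C0 and C1, or raise for the invalid code."""
    for command in Command:
        if command.value == (c0, c1):
            return command
    raise ValueError(f"C0={c0}, C1={c1} is not a valid command")


for command in Command:
    c0, c1 = command.value
    print(f"{command.name:<9} C0={c0} C1={c1}")
```

Rejecting the fourth combination (C0=1, C1=0) in software is a design choice of this sketch; the table simply marks it as invalid.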
Transistors TR1, TR2 and TR3 are used to convert TTL levels to drive
voltages suited to the TMS 5100. CCLK is used to clock commands set up
on the C0 and C1 lines into the VSP. The VSP clock line, PDC, has to
change synchronously with the VSP ROM clock line. This is achieved by
the use of IC2b as a synchroniser. The CCLK line should therefore be
held high for a minimum duration of 6.25 microseconds to guarantee that
a command is accepted by the VSP.
The BUSY line can be used in one of two ways to monitor the end of an
utterance. During speech and when the CTL1 line is in a disabled state,
the BUSY line will be low, producing a high level only when CTL1 is
enabled and subsequent to encountering an END OF PHRASE code. Hence, the
host processor can be made to monitor the BUSY line until a high level
is detected. Alternatively, more efficient use of the host processor
can be achieved by using the positive-going edge of the BUSY signal to
generate an interrupt.
- Speech Address Buffer/Counter
The address counter comprises four 74LS193 ICs which are 4-bit binary
counters with parallel loading capability (IC4-IC7). The starting
address is loaded from the data input lines D0-D7, in two stages.
Applying a low logic level to LDA1 causes the less significant byte of
the counter (IC6 & IC7) to be loaded with data set up on input lines
D0-D7. Applying a low logic level to LDA0 loads the more significant
byte of the counter.
Byte address incrementing pulses are derived from IC8 which is
programmed as a modulo-8 counter. IC3b is clocked with pulses generated
by the VSP on the I/O line. IC3b is used to invert the I/O line and as a
buffer to provide greater fanout capability. This results in IC8
incrementing its contents on the negative edge of the I/O pulse and
consequently keeping track of the bit count, at a byte level, for
accessing bit-serial speech data. At the commencement of speech, speech
data is output starting with the least significant bit of the first
speech data byte. Hence, IC8 is cleared every time a new address byte is
loaded into the less significant byte of the address counter. The 16-bit
address counter permits a maximum speech memory capacity of 64K bytes.
The total capacity of the memory can be expanded by using extra counter
stages, if required. A 64K byte memory will store approximately 600
spoken words.
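The two-stage load and the capacity figures can be checked with a small software model. This is only a sketch in Python of the behaviour described above, not a description of the hardware: the class and method names are hypothetical, and wrap-around at 16 bits is an assumption of the model.

```python
class AddressCounter:
    """Toy model of the 16-bit speech address counter (IC4-IC7)."""

    def __init__(self):
        self.value = 0

    def load_low(self, byte):
        """Model LDA1 going low: load the less significant byte from D0-D7."""
        self.value = (self.value & 0xFF00) | (byte & 0xFF)

    def load_high(self, byte):
        """Model LDA0 going low: load the more significant byte from D0-D7."""
        self.value = (self.value & 0x00FF) | ((byte & 0xFF) << 8)

    def increment(self):
        """Step to the next byte; wrapping at 16 bits is assumed here."""
        self.value = (self.value + 1) & 0xFFFF


CAPACITY_BYTES = 2 ** 16                    # 64K bytes, as stated in the text
APPROX_WORDS = 600                          # figure quoted in the text
AVERAGE_BYTES_PER_WORD = CAPACITY_BYTES // APPROX_WORDS

counter = AddressCounter()
counter.load_low(0x34)                      # LS byte first
counter.load_high(0x12)                     # then MS byte
print(hex(counter.value), CAPACITY_BYTES, AVERAGE_BYTES_PER_WORD)
```

On these figures an average word occupies a little over 100 bytes of speech data.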
- Speech Data Memory
In the circuit shown speech data can be stored in TMS 2516 (16K-bit),
TMS 2532 (32K-bit) or TMS 2564 (64K-bit) EPROMs, by wiring an appropriate
set of links. Tables 3, 4 and 5 show the links required for each EPROM
type and the resulting memory maps.
Serial data is derived by the use of a 74LS151, an eight-to-one line
multiplexer (IC10). IC10 data input is fed from the data output of the
EPROMs. The select input of IC10 is obtained from IC8, which maintains a
modulo-8 count that is incremented once each time a single data bit is
accessed by the VSP. The output of the multiplexer is conveyed through
IC2a which is used as a single-bit shift register clocked by the I/O
pulse. The purpose of IC2a is to synchronise serial speech data such
that data requested by a particular I/O pulse (see Figure 2) is stored
unchanged despite the bit count and the memory address changing as a
result of address incrementation.
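The combined effect of the address counter, the modulo-8 bit counter and the multiplexer is to turn EPROM bytes into a bit stream, least significant bit first. The Python sketch below is a software model of that behaviour only; the function name and the two-byte sample memory are made up for illustration.

```python
def serial_bits(memory, start_address, bit_count):
    """Return the first bit_count bits the VSP would receive, LSB first.

    memory        : bytes-like object standing in for the speech EPROMs
    start_address : byte address loaded into the address counter
    """
    address = start_address
    bit_index = 0                 # modulo-8 counter, cleared on address load
    bits = []
    for _ in range(bit_count):
        # The multiplexer selects one bit of the currently addressed byte.
        bits.append((memory[address] >> bit_index) & 1)
        bit_index = (bit_index + 1) % 8
        if bit_index == 0:        # eight bits read: step to the next byte
            address += 1
    return bits


sample = bytes([0b00001101, 0b00000001])    # made-up data, not real speech
print(serial_bits(sample, 0, 10))
```

Timing details such as the latching performed by IC2a are deliberately left out of this model.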
Table 3: Speech memory address mapping for TMS 2516
Table 4: Speech memory address mapping for TMS 2532
Table 5: Speech memory mapping for TMS 2564
- Audio signal Conditioning
IC11, a quad operational amplifier, is used to condition the
differential audio output of the VSP (SP1 and SP2) into a form suitable
for driving a general purpose 8-ohm speaker. IC11a converts the
differential push-pull output current into a single-ended voltage output.
This signal is then low-pass filtered by the active filter comprising
IC11b to get rid of any harmonic distortion generated by the 8kHz
sampled output from the D to A converter.
The third stage of the op-amp is used along with transistors TR6-9, to
provide power amplification. The amplifier is capable of producing up to
4.5 Watts of audio power into an 8 Ohm speaker. At this power rating,
it will be necessary to mount TR8 and TR9 on heatsinks to maintain
devices within operating temperature. At reduced power levels, the heat
sinking area etched on the PCB should be adequate for normal operation.
- Power Supply Requirements
Figure 6 shows the distribution of power supplies in the circuit. The
negative 5 volt supply is generated on the PCB, by using REG1 (voltage
regulator) and tapping on to the negative 12 volts supply. Typical power
requirements (for a board fully populated with TMS 2532 EPROMs) are +5V
@ 300mA, +12V @ 50mA and -12V @ 50mA, without any audio output.
Figure 6: Power supply distribution
- Speech Data EPROMs
Table 7 gives the speech starting addresses for data in the PROMs
provided as parts of the kit. EMM1, EMM2, EMM3 and EMM4 should be
plugged into IC sockets IC12, IC13, IC14 and IC15 respectively. The
links should be connected according to Table 4 (i.e. same as for TMS
2532 EPROMs). In the kit SPST DIL switches are provided for this
purpose.
Table 7: E&MM speech data EPROM listing
- Construction and Setting Up
- Figure 5 shows the component overlay for the circuit. The first step
is to fit all the necessary links between the two sides of the PCB
(using Track pins or small lengths of wire). Care should be taken when
soldering on this board as the tracks are fine and often very close
together. The resistors and capacitors can then be fitted, followed by
the diodes, soldering both sides where necessary. Next, make all the IC
sockets using Soldercon connectors and again solder to both sides of the
PCB where necessary. Having completed these stages you can fit the
transistors, the voltage regulator (REG1) and IC11. The power transistors
(TR8, TR9) and the negative 5 volt regulator should be positioned flat
as shown in Figure 5 and bolted on to the PCB to achieve good thermal
dissipation.
Figure 5: Component Overlay
- Before you plug in any more ICs, connect a low impedance speaker and
power up the board (connection details are shown in Table 6). Check the
supply currents and voltages (the current should be approximately 50mA
on the +12V and -12V lines and negligible on the +5V line). Next, check
that the amplifier is operating. If all is well you can proceed and fit
the rest of the components.
The pin numbers given in circuit diagram are correct for the TMS 2564
only. TMS 2532 and TMS 2516 ICs have 24 pins compared with 28 pins for
the 2564. The signal lines match when the lower 24 pins of the 28 pin
configuration are used (i.e. pin numbers 1, 2, 27 & 28 are not
used).
Table 6: Edge connector details
- Note: When using 24-pin packages you must link pin 28 to pin 26 on
ICs 12-19 (see photo).
For correct speed of operation the TMS 5100 internal clock frequency
should be adjusted with RV1 to obtain a square wave of period 6.25us (a
frequency of 160kHz) at ROM CLK (pin 3) of IC1. The correct adjustment
is nominally midway on RV1. If instruments for adjustment are not
available, good results can be obtained by listening to the speech
output and making the adjustment such that the output sounds 'normal'.
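The clock figures quoted in this article can be cross-checked with a few lines of arithmetic. In the Python sketch below, the 6.25us ROM CLK period comes from this paragraph and the 640kHz oscillator frequency from the address-loading description later in the article; the divide-by-four relationship is only what those two numbers imply and is not stated in the text.

```python
from fractions import Fraction


def period_us_to_khz(period_us):
    """Convert a period in microseconds to a frequency in kHz, exactly."""
    return Fraction(1000) / Fraction(period_us)


ROM_CLK_KHZ = period_us_to_khz("6.25")      # ROM CLK period from the text
OSCILLATOR_KHZ = Fraction(640)              # oscillator figure from the text
IMPLIED_DIVIDER = OSCILLATOR_KHZ / ROM_CLK_KHZ

print("ROM CLK frequency:", ROM_CLK_KHZ, "kHz")
print("oscillator / ROM CLK:", IMPLIED_DIVIDER)
```

`Fraction` is used instead of floating point so that the results come out as exact whole numbers.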
Photo: Wire links in place for the EPROMs supplied (photo not available)
- Care should be taken in handling the TMS 5100 which can be damaged
by static discharges.
The kit of parts does not contain an edge connector. The RS467-425
20-way, double edge connector is suitable and instead of Veropins for
soldering the speaker connections, the screw connector socket
(RS423-762) can be used (both available from Radio Spares). A suitable
Power Supply circuit diagram for the WORDMAKER is shown in Figure 11.
Figure 11: Suggested power supply
- Now you know all about the E&MM WORDMAKER but is it any use to
you? The all-important question is 'Will it interface to my
microcomputer?' Well, here is a simple guide to give you some idea. List
1 contains all the popular systems which can be used with available
modules. List 2 contains all the popular microcomputers which will drive
the WORDMAKER if a simple dedicated interface is used such as the one
shown.
LIST 1
Sharp MZ-80K     | with parallel I/O card and expansion unit
Nascom 1 & 2     | as standard
Apple/ITT 2020   | with parallel I/O card
Commodore Pet    | with parallel I/O expansion
Atari 400 & 800  | with parallel I/O expansion
Tangerine Micron | as standard
Acorn            | as standard
Video Genie      | with parallel I/O expansion
LIST 2
Microcomputer    | Addressing mode
Sharp MZ-80K     | I/O mapped
Tandy TRS 80     | I/O mapped
Sinclair ZX80/81 | I/O mapped
Apple/ITT 2020   | memory mapped
Commodore Pet    | memory mapped
Atari 400 & 800  | memory mapped
UK 101           | memory mapped
OHIO Superboard  | memory mapped
- Communication with the VSP card is carried out through two ports;
one to supply the address of the word defining data in the V/S ROM, and
the other to set the various control functions. There are two preset
potentiometers on the card; RV1 controls the speed and pitch of the
voice; RV2 controls the volume of the onboard amplifier. All the
connections on the board are TTL compatible for easy interfacing (see
Figures 9 and 10).
Figure 9: Connection to a standard PIO/PIA port
Figure 10: Purpose-built interface (memory mapped or I/O addressed)
- The VSP card is very simple to use and the flowcharts in Figure 7
show the sequence of operations. Figure 8 shows the sequence of
commands and the relevant timing. On 'power up' the card must be
initialised by setting C0 and C1 to 'RESET' (see Table 2), toggling CCLK
3 times then setting C0 and C1 to 'TEST BUSY' and toggling CCLK a
further 3 times. The card is then ready to talk to you. The flowchart in
Figure 7(a) also shows that a 'dummy test talk' command can be executed
in order to avoid an audible click that may be generated prior to
commencement of speech. To make it speak, the address of the word is
written to the card.
Figure 7: Flowchart
Figure 8: VSP interface control signal timing
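The power-up procedure can be written out as an explicit list of line states. The Python sketch below is only an illustration under stated assumptions: the C0/C1 levels are those of Table 2, while treating one 'toggle' as CCLK going high and then low again, and starting each command with CCLK low, are assumptions of this sketch. The function names are hypothetical.

```python
RESET = (1, 1)          # (C0, C1) from Table 2
TEST_BUSY = (0, 0)      # (C0, C1) from Table 2


def toggle_cclk(c0, c1):
    """One CCLK toggle, modelled as CCLK high then low (an assumption)."""
    return [(c0, c1, 1), (c0, c1, 0)]


def power_up_sequence():
    """Line states (C0, C1, CCLK) for the initialisation described in the text."""
    states = []
    for c0, c1 in (RESET, TEST_BUSY):
        states.append((c0, c1, 0))          # set up the command, CCLK low
        for _ in range(3):                  # three toggles per command
            states.extend(toggle_cclk(c0, c1))
    return states


for state in power_up_sequence():
    print("C0=%d C1=%d CCLK=%d" % state)
```

On real hardware each high state would also have to be held for at least the minimum CCLK time given earlier; this listing ignores timing altogether.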
- The two address bytes are latched into the VSP card by taking LDA1
(for LS byte) and LDA0 (for MS byte) low for at least 6.25us (at an
oscillator frequency of 640kHz). It is important to note that the LS
byte must be loaded first. Having set up the address all we need to do
now is send the 'TALK' command on C0 and C1 and toggle CCLK once. Then:
Hey Presto, it speaks! If any problems are encountered at this point, a
logic probe will be useful for checking that the control and data input
lines are providing the correct 'high'/'low' signals via the board
connector to the EPROMs and associated logic ICs. Resistor values for
R40, R41 and R42 may need changing in order to get the right 'pullup'.
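The order of operations for speaking one word is short enough to capture in a few lines. The Python sketch below uses a hypothetical `RecordingBoard` class that merely logs what a real port driver would do; none of these names come from the original article, and the byte values in the example are arbitrary.

```python
class RecordingBoard:
    """Hypothetical stand-in for the port hardware; it only logs each step."""

    def __init__(self):
        self.log = []

    def strobe_lda1(self, byte):
        # LDA1 taken low with the LS address byte on D0-D7
        self.log.append(("LDA1", byte))

    def strobe_lda0(self, byte):
        # LDA0 taken low with the MS address byte on D0-D7
        self.log.append(("LDA0", byte))

    def command(self, name, cclk_toggles):
        # Command set up on C0/C1, then CCLK toggled the given number of times
        self.log.append((name, cclk_toggles))


def speak_word(board, wl, wh):
    """Load the word start address (LS byte first) and issue TALK."""
    board.strobe_lda1(wl)
    board.strobe_lda0(wh)
    board.command("TALK", 1)


board = RecordingBoard()
speak_word(board, 0x34, 0x12)       # arbitrary example address bytes
print(board.log)
```

Replacing `RecordingBoard` with a class that writes to the actual ports would leave `speak_word` unchanged.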
Wordmaker circuit board
- If you are using the VSP card with a computer system it will probably
become necessary at some stage to be able to test when one word has
finished so you can start another. If you try to start a word while the
VSP is speaking, it will miss the end of the first word and say the next
- or it might just stop altogether. Using the 'TEST BUSY' command it is
possible to monitor the BUSY line. This is done by setting the 'TEST
BUSY' command on C0 and C1 and toggling CCLK twice, then reading the
BUSY line. When BUSY goes high you toggle CCLK once more and then
initiate the next talk cycle. The BUSY line output (connector pin 38)
need not be connected when first testing the board for correct speech
operation (e.g. using Test Program 1).
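The polling procedure can be sketched as follows. This Python example runs against a hypothetical `FakeBoard` that reports BUSY low for a fixed number of reads; the class, its methods and the `max_polls` timeout are all assumptions added for illustration and do not appear in the original article.

```python
class FakeBoard:
    """Hypothetical board model: BUSY stays low for a set number of reads."""

    def __init__(self, reads_before_done):
        self.remaining = reads_before_done
        self.cclk_toggles = 0

    def set_test_busy(self):
        pass                        # would place TEST BUSY on C0 and C1

    def toggle_cclk(self):
        self.cclk_toggles += 1

    def read_busy(self):
        if self.remaining > 0:
            self.remaining -= 1
            return 0                # still speaking
        return 1                    # END OF PHRASE reached


def wait_for_end_of_word(board, max_polls=10000):
    """Follow the TEST BUSY procedure; return how many low reads were seen."""
    board.set_test_busy()
    board.toggle_cclk()
    board.toggle_cclk()             # second toggle enables the status output
    for polls in range(max_polls):
        if board.read_busy():
            board.toggle_cclk()     # third toggle returns to command mode
            return polls
    raise TimeoutError("BUSY never went high")


board = FakeBoard(reads_before_done=3)
print(wait_for_end_of_word(board), board.cclk_toggles)
```

The timeout guards against a disconnected BUSY line; an interrupt on the rising edge of BUSY would avoid polling entirely.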
- These programs are written in BASIC to run on the Sharp MZ-80K. The
port is assumed to be I/O addressed. If you wish to use a memory-mapped
system, replace all output statements with 'pokes' and input statements
with 'peeks'. The programs are written as subroutines to allow them to
be easily incorporated into existing BASIC programs (see Subroutines).
During the 'Initialise' subroutine, you will need to specify the port
address. On the Sharp this is simply two numbers, say 2 and 3.
- Test Program 1: By entering the word start address
in decimal when prompted, the VSP card will say the word. WL=LS byte,
WH=MS byte. This program is a continual loop and to stop it use the
'escape', 'break' or 'control C' command.
Test Program 2: By entering a string of word start
addresses in the DATA line as follows: WL1, WH1, WL2, WH2 ...., the VSP
card can be made to speak the entered sentence or phrase. If you use the
data list (lines 35-62) the WORDMAKER speaks the whole word library
available in correct EPROM order (see Table 7). Note that the
'decimal' Address has the correct numbers for operation instead of a
straightforward Hex conversion.
Test Program 3: This program, based on Test Programs 1
and 2, gives some sample sentences and tones which are recorded on our
demonstration cassette No.2. Pauses of varying lengths are easily made
by inserting a FOR / NEXT loop at line 28 as shown. Some idea of the
musical potential, using varying pitch/clock rates by adjusting RV1
(this can be increased to 100k for greater range), is also given.
Exciting possibilities are evident here.
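The WL, WH pairs used in the DATA line of Test Program 2 are just the two bytes of each 16-bit start address. The Python sketch below shows that split using made-up addresses; the real start addresses are those listed in Table 7 and should be used as printed there. The function names are illustrative.

```python
def split_address(address):
    """Split a 16-bit start address into (WL, WH): LS byte, then MS byte."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    wh, wl = divmod(address, 256)
    return wl, wh


def data_line(addresses):
    """Flatten start addresses into WL1, WH1, WL2, WH2, ... order."""
    values = []
    for address in addresses:
        values.extend(split_address(address))
    return values


# Made-up addresses for illustration only; see Table 7 for the real ones.
example_addresses = [0x0000, 0x0123, 0x1F40]
print(data_line(example_addresses))
```

Each pair in the resulting list corresponds to one word, in the order the sentence is to be spoken.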
Program listings 1
Program listings 2
Parts list
- We hope you will find the simple programs helpful in your
investigation into the world of talking computers and that you won't
spend too many hours talking to your computer as opposed to your family
or friends! E&MM