The Engineering Staff of TEXAS INSTRUMENTS INCORPORATED Semiconductor Group

# MOS INTEGRATED CIRCUITS FOR VOICE SYNTHESIS

**JUNE 1980**

#### 1. FEATURES

- High-quality speech at a low data rate.
- Performance and reliability of electronic systems based on integrated circuits.
- Cost-effective PMOS fabrication process.
- Minimum external components.
- Simple external interface.
- Low power consumption: battery operated.
- Efficient coding, permitting storage of a large number of messages.
- Suitable for synthesis of speech and sound effects.

#### 2. SYSTEM DESCRIPTION

The VOICE SYNTHESIS SYSTEM described herein utilizes a method of speech encoding known as pitch-excited linear predictive coding (LPC). The speech is synthesized by exciting a time-varying digital filter, which models the human vocal tract, with a digital representation of glottal air impulses for voiced sounds or of the rush of air for unvoiced sounds.

A typical three-chip system consisting of the TMS 5100 voice synthesis processor (VS/P), the TMS 6100 voice ROM, and the TMS 1000 family controller is shown in Figure 1. The TMS 5100 is designed to synthesize speech from a variable-data-rate bit stream provided by the 128K read-only memory, the TMS 6100. Up to 16 TMS 6100 ROMs may be used in a single system, giving a potential capacity of 2 million bits, which corresponds to 30 minutes of speech or a vocabulary of over 2500 words.

The multifunction controller, a 4-bit microcomputer, spends very little time on speech synthesis itself because the external interface required by the synthesizer is simple. It can therefore easily fulfill its primary objective: control of peripherals such as keyboards, displays, or sensors.
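The capacity figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 16,384 x 8 organization of the TMS 6100 given in Section 2.2 and the average data rate of 1100 bits/second quoted in Section 2.1, and the function names are hypothetical.

```python
# Cross-check of the storage figures quoted in the system description.
# Assumptions: each TMS 6100 holds 16,384 x 8 bits (Section 2.2) and speech
# is coded at the 1100 bits/second average quoted in Section 2.1.

ROM_BITS = 16_384 * 8          # one TMS 6100 ("128K")
MAX_ROMS = 16                  # maximum number of ROMs in one system
AVG_BITS_PER_SECOND = 1100     # average data rate
VOCABULARY_WORDS = 2500        # vocabulary size quoted in the text


def system_capacity_bits(roms: int = MAX_ROMS) -> int:
    """Total storage, in bits, of a system with the given number of ROMs."""
    return roms * ROM_BITS


def speech_minutes(bits: int, rate: int = AVG_BITS_PER_SECOND) -> float:
    """Minutes of speech that fit in `bits` at an average rate of `rate` bits/s."""
    return bits / rate / 60


def seconds_per_word(bits: int, words: int = VOCABULARY_WORDS,
                     rate: int = AVG_BITS_PER_SECOND) -> float:
    """Average duration available per word if `words` words share `bits` bits."""
    return bits / words / rate


if __name__ == "__main__":
    total = system_capacity_bits()
    print(f"total capacity : {total} bits")
    print(f"speech duration: {speech_minutes(total):.1f} minutes")
    print(f"per-word budget: {seconds_per_word(total):.2f} seconds")
```

Under these assumptions the total is 2,097,152 bits, a little over 30 minutes of speech, with roughly three quarters of a second available per word in a 2500-word vocabulary.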
#### 2.1 VOICE SYNTHESIS PROCESSOR

The TMS 5100 chip is designed for a 40-Hz frame rate (the rate at which new speech data, typically 49 bits, are obtained from the speech ROM) and an 8-kHz sampling rate, which corresponds to a 4-kHz voice band. A block diagram of the synthesizer is given in Figure 2.

The ten-stage digital filter, shown in simplified form in Figure 3, has the excitation signal applied at the input of stage 10 and produces samples representing synthesized speech at the output of stage 1. The digital filter structure is that of a two-multiply lattice filter, performing two's complement arithmetic with 10-bit time-varying reflection coefficients and 14-bit intermediate results. The calculations performed in each stage are represented in Figure 4. A recoding pipeline multiplier performs these overlapping multiplies at a rate of one every 6.25 microseconds.

For voiced sounds, a 6.25-millisecond-long excitation signal is applied to the input at a time interval equal to the pitch period. For unvoiced sounds, the excitation has a constant magnitude and pseudo-random sign.

The 12 synthesis parameters (the reflection coefficients K1-K10, pitch, and energy) are stored in the speech ROM in coded form. Each parameter can assume only a certain number of values from the 2<sup>10</sup> available. As the number of allowed levels is related to the number of code bits required in the speech ROM, a compromise has been made between speech quality and data storage. The distribution of the number of levels (and, consequently, the number of code bits) among the synthesis parameters is given in Table 1.

A full set of parameters for each frame would require a data rate of 1960 bits/second (49 bits per frame at 40 frames per second). In three special cases (a slowly varying shape of the vocal tract, generation of unvoiced sounds, and an inter-word or inter-syllable pause) the data rate can be further reduced. The combined effect of these three special cases reduces the average data rate to only 1100 bits/second.
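The ten-stage lattice filter described in this section can be sketched in software. Figure 4 is not reproduced in this text, so the per-stage equations below are the usual textbook form of a two-multiply all-pole lattice and are an assumption, not a transcription of the chip's logic. The sketch also uses floating point rather than the 10-bit coefficients and 14-bit intermediate results of the TMS 5100, and every name in it is hypothetical.

```python
from typing import List, Sequence

NUM_STAGES = 10  # K1..K10


def lattice_step(excitation: float, k: Sequence[float], b: List[float]) -> float:
    """Advance the filter by one output sample.

    k[i] holds reflection coefficient K(i+1).
    b[i] holds the delayed backward signal feeding stage i+1; it is
    updated in place so the state carries over to the next sample.
    """
    f = excitation                          # excitation enters at stage 10
    for i in range(NUM_STAGES - 1, -1, -1):
        f = f - k[i] * b[i]                 # forward path (first multiply)
        if i + 1 < NUM_STAGES:
            # Backward path (second multiply). b[i] is still the value from
            # the previous sample because it is only rewritten later.
            b[i + 1] = b[i] + k[i] * f
    b[0] = f                                # stage 1 output feeds the delay chain
    return f                                # synthesized sample


def synthesize(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """Run a whole excitation sequence through the filter with fixed coefficients."""
    state = [0.0] * NUM_STAGES
    return [lattice_step(x, k, state) for x in excitation]


if __name__ == "__main__":
    # Assumed example: only K1 is non-zero, driven by a unit impulse.
    coeffs = [0.5] + [0.0] * (NUM_STAGES - 1)
    print(synthesize([1.0, 0.0, 0.0, 0.0], coeffs))
```

With all coefficients set to zero the output equals the excitation, which is a convenient sanity check. Energy scaling of the excitation and the frame-to-frame parameter interpolation are left out of this sketch.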
In most cases it is desirable for speech parameters to vary smoothly from frame to frame, rather than change abruptly once per frame period. To this end, the TMS 5100 contains all the logic necessary to perform an approximately linear interpolation of all parameters at eight equidistant points within each frame.

The TMS 5100 contains an 8-bit digital-to-analog converter with one-half LSB accuracy. It also incorporates a 36-milliwatt push-pull speaker driver.

The TMS 5100 has a six-line control interface partitioned as follows: four bidirectional lines, CTL 1-8, for transfer of commands and ROM addresses to the TMS 5100, or of speech status or ROM data from the TMS 5100; one processor data clock line (PDC) to transfer the data on CTL 1-8; and one chip select (CS) line to enable the aforementioned five lines.

#### 2.2 VOICE SYNTHESIS MEMORY

In order to store a relatively large vocabulary in a single integrated circuit, the speech synthesis system makes use of the TMS 6100, a $16,384 \times 8$ mask-programmable read-only memory. The chip features a multiplexed addressing scheme with an internal 18-bit address counter/register. Fourteen bits of the address go directly to the ROM array, while the remaining four MSBs address four programmable gates to select 1 of 16 chips. There are two control lines, M0 and M1, and four data lines, ADD 1-8. While ADD 1-8 constitute a four-bit-wide input, ADD 8 also acts as the serial-data-output line.

FIGURE 1 - TYPICAL THREE-CHIP SYNTHESIS SYSTEM

FIGURE 2 - SYNTHESIZER BLOCK DIAGRAM

<sup>†</sup>One of several TMS 1000 Family products may be used, and the number of R-output lines depends on the particular product type used.

FIGURE 3 - TEN-STAGE DIGITAL LATTICE FILTER

FIGURE 4 - CALCULATIONS WITHIN A SINGLE STAGE

## **TMS 5100 PIN DESIGNATIONS**

| NO. | SIGNATURE | DESCRIPTION |
|-----|-----------|----------------------------|
| 1 | TST | Test |
| 2 | PDC | Processor-data-clock input |
| 3 | ROM CK | ROM-clock output (160 kHz) |
| 4 | CPU CK | CPU-clock output (320 kHz) |
| 5 | $V_{DD}$ | Drain supply voltage |
| 6 | C.R. OSC | Oscillator input |
| 7 | R.C. OSC | Oscillator input |
| 8 | T11 | Test sync |
| 9 | NC | No internal connection |
| 10 | I/O | Test/digital output |
| 11 | SPK1 | Speaker drive |
| 12 | SPK2 | Speaker drive |
| 13 | PROM OUT | Test |
| 14 | $V_{SS}$ | Substrate supply voltage |
| 15 | M0 | Command bit to TMS 6100 |
| 16 | NC | No internal connection |
| 17 | NC | No internal connection |
| 18 | NC | No internal connection |
| 19 | M1 | Command bit to TMS 6100 |
| 20 | CTL4 | TMS 1XXX control |
| 21 | ADD4 | TMS 6100 address |
| 22 | ADD2 | TMS 6100 address |
| 23 | CTL2 | TMS 1XXX control |
| 24 | ADD1 | TMS 6100 address/data in |
| 25 | CTL1 | TMS 1XXX control |
| 26 | ADD8 | TMS 6100 address |
| 27 | CTL8 | TMS 1XXX control |
| 28 | CS | TMS 6100 chip select |

#### TABLE 1 - LPC-10 SPEECH SYNTHESIS CODING

| PARAMETER NUMBER | PARAMETER | NUMBER OF ALLOWED VALUES | NUMBER OF CODE BITS |
|------------------|------------|--------------------------|---------------------|
| 1 | AMPLITUDE | 15 | 4 |
| 2a | REPEAT BIT | 2 | 1 |
| 2b | PITCH | 32 | 5 |
| 3 | K1 | 32 | 5 |
| 4 | K2 | 32 | 5 |
| 5 | K3 | 16 | 4 |
| 6 | K4 | 16 | 4 |
| 7 | K5 | 16 | 4 |
| 8 | K6 | 16 | 4 |
| 9 | K7 | 16 | 4 |
| 10 | K8 | 8 | 3 |
| 11 | K9 | 8 | 3 |
| 12 | K10 | 8 | 3 |

#### 3. TMS 5100 SPECIFICATIONS

#### 3.1 TMS 5100 ABSOLUTE MAXIMUM RATINGS OVER OPERATING FREE-AIR TEMPERATURE RANGE (UNLESS OTHERWISE NOTED)

#### 3.2 TMS 5100 OPERATING CONDITIONS AND CHARACTERISTICS (FOR COMPLETE CONDITIONS AND CHARACTERISTICS, SEE THE DETAIL SPECIFICATION FOR THIS DEVICE)

| PARAMETER | MIN | NOM | MAX | UNIT |
|----------------------------------------------|-------------------|-----|------|------|
| Supply voltage, V<sub>DD</sub> | -8.3 | -9 | -9.7 | V |
| Supply current, I<sub>DD</sub> | | | 45 | mA |
| High-level input voltage | -0.7<sup>†</sup> | | 0 | V |
| Low-level input voltage | V<sub>DD</sub> | | 4 | V |
| Voltage applied to any input/output terminal | | | -24 | V |
| Oscillator frequency (external RC) | 608 | 640 | 674 | kHz |

<sup>†</sup> The algebraic convention, where the more negative (less positive) limit is designated as minimum, is used in this document for logic voltage levels only.

NOTE 1: All voltage values are with respect to $V_{SS}$.
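The oscillator limits above can be related to the timing figures quoted elsewhere in this document. The sketch below is a consistency check under stated assumptions, not a description of the chip's internal dividers. The divide-by-2 and divide-by-4 ratios are inferred from the nominal 640 kHz, 320 kHz, and 160 kHz figures. One multiply per ROM-clock period is inferred from the 6.25-microsecond multiply time, and 20 multiplies per sample is inferred from two multiplies in each of ten stages. All names are hypothetical.

```python
# Consistency check relating the RC oscillator frequency to the sample rate,
# frame rate, and full-frame data rate quoted in this document.
# Every divider below is an inference from the nominal figures (assumption).

CPU_CLOCK_DIVIDER = 2        # 640 kHz -> 320 kHz (pin 4)
ROM_CLOCK_DIVIDER = 4        # 640 kHz -> 160 kHz (pin 3)
MULTIPLIES_PER_SAMPLE = 20   # assumed: 2 multiplies x 10 lattice stages
SAMPLES_PER_FRAME = 200      # 8 kHz sampling rate / 40 Hz frame rate
BITS_PER_FULL_FRAME = 49     # full set of parameters for one frame

OSCILLATOR_HZ = {"min": 608_000, "nom": 640_000, "max": 674_000}


def derived_rates(osc_hz: float) -> dict:
    """Derive clock, sample, frame, and data rates from the oscillator frequency."""
    rom_clock_hz = osc_hz / ROM_CLOCK_DIVIDER
    cpu_clock_hz = osc_hz / CPU_CLOCK_DIVIDER
    sample_rate_hz = rom_clock_hz / MULTIPLIES_PER_SAMPLE
    frame_rate_hz = sample_rate_hz / SAMPLES_PER_FRAME
    return {
        "rom_clock_hz": rom_clock_hz,
        "cpu_clock_hz": cpu_clock_hz,
        "multiply_period_us": 1e6 / rom_clock_hz,
        "sample_rate_hz": sample_rate_hz,
        "frame_rate_hz": frame_rate_hz,
        "full_frame_bits_per_s": frame_rate_hz * BITS_PER_FULL_FRAME,
    }


if __name__ == "__main__":
    for label, osc in OSCILLATOR_HZ.items():
        rates = derived_rates(osc)
        print(label, rates["sample_rate_hz"], rates["frame_rate_hz"],
              rates["full_frame_bits_per_s"])
```

At the nominal 640 kHz these assumptions reproduce the 8-kHz sampling rate, the 40-Hz frame rate, and the 1960 bits/second full-frame data rate. At the oscillator limits the sampling rate, and with it the pitch of the synthesized speech, would shift by roughly 5 percent in either direction.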