XVI. SPEECH COMMUNICATION

Prof. M. Halle, Prof. K. N. Stevens, Dr. T. T. Sandel, Jane B. Arnold, C. G. Bell, P. T. Brady, O. Fujimura, J. M. Heinz, G. W. Hughes, E. H. Lenneberg, C. I. Malme, F. Poza, G. Rosen

A. DYNAMIC ANALOG SPEECH SYNTHESIZER*

Study of the generation of fricatives by the dynamic analog speech synthesizer has continued, with emphasis on the role of temporal variables in the perception of synthetic fricatives. A fricative-vowel syllable, /ʃa/, was generated with various settings of the following parameters: the time at which the articulation changes from /ʃ/ to /a/, the time at which the noise ceases, and the time at which glottal excitation commences. Pairs of syllables were presented to listeners, who were asked to choose the more "natural" member of each pair. The criterion was stated in written instructions given to the listeners for each test of the series.

Fig. XVI-1. [Figure: timing curves for inflection, buzz, noise, and articulation, plotted against TIME (MSEC) from 0 to 500.] Schematic of timing patterns, showing the duration of events, their starting times, and the rates at which changes occur. In the case of inflection, the piecewise-linear curve represents the frequency of glottal excitation; the curves labeled "buzz" and "noise" represent the amplitudes of the buzz and noise voltages that excite the dynamic analog; the curve labeled "articulation" represents the course of the articulatory configuration of the dynamic analog from the /ʃ/ to the /a/ position.

The initial and final configurations were identical for all stimuli. The choice of the /ʃ/ articulatory configuration and of the point of noise insertion was based on the previous study (1). This particular /ʃ/ configuration received a unanimous vote in a nonforced absolute identification test involving 6 fricatives, 8 subjects, and 3 judgments per stimulus per subject. The chosen item has a tight constriction at the fifth position, and the noise was inserted at the second position (2); these positions are 3.5 cm and 1 cm, respectively, behind the mouth opening.

*This work was supported in part by the Air Force (Air Force Cambridge Research Center, Air Research and Development Command) under Contract AF19(604)-2061.

Figure XVI-1, which is the timing pattern for the stimuli in the previous study, also represents the set of neutral values used in the present study. Times are indicated relative to the start of the counting cycle in the sequential controller. In each of three experiments one of the three variables (buzz, noise, or articulation) was studied by using stimuli whose timing trigger instants fell before or after the neutral value.

Fig. XVI-2. [Figure: four panels of numbered ramps, 1 through 7, plotted against TIME (MSEC).] Summary of timing patterns. Each numbered ramp represents a test item. Whenever a given variable was studied, the other variables were set at the neutral values indicated by the heavy lines. (a) Noise time settings; (b) buzz time settings covering the /ʃ/ range; (c) buzz time settings covering the /ʒ/ range; (d) articulation time settings.

Five subjects participated in the tests, and the method of paired comparisons was used. Each of 7 stimuli, representing 7 timing patterns, was paired, in a randomized order, with each of the other 6 stimuli, and the listener was asked to pick the better stimulus of each pair according to a stated criterion. The score gives the total number of times a given stimulus was judged better than another stimulus. The minimum score that any stimulus may receive is zero, the average is 30, and the maximum is 60. A maximum of 60 for 5 subjects implies 12 comparisons per stimulus per subject, so each of the 6 pairings was evidently presented twice.

A voicing modulator for modulating the noise with buzz was always in the noise-generator circuit. The modulator is important for stimuli in which the noise and buzz excitations overlap in time, but otherwise it is of little consequence.

Fig. XVI-3. [Figure: four panels, (a) through (d), of response curves with numbered points, plotted against TIME (MSEC).] Summary of subject responses: (a), (b), (c), and (d) should be paired with Fig. XVI-2a, b, c, and d. The numbered points on each response curve refer to like-numbered test items in Fig. XVI-2.

The timing pattern for the noise-off time study is shown in Fig. XVI-2a, and the subject responses are summarized in Fig. XVI-3a. Numbers 1 through 7 refer to the seven test items. There is a plateau, approximately 20 msec wide, in the response curve. It seems, therefore, that a timing error that produces a gap between noise and buzz is more serious than one that results in the same amount of excess overlap.

Figures XVI-2b and XVI-2c show the timing patterns for the buzz time studies, and Figs. XVI-3b and XVI-3c give the corresponding subject responses. The stimuli in Fig. XVI-2b cover the range for /ʃ/, and those in Fig. XVI-2c cover the range for /ʒ/. In both cases the subjects were instructed to vote for the more natural stimulus, regardless of whether it was /ʃ/ or /ʒ/. We are attempting to establish the region in which both /ʃ/ and /ʒ/ can be produced. A supplementary test will be needed to determine how the response changes from /ʃ/ to /ʒ/ as the buzz onset time is varied; this can be done with a simple two-item absolute identification test.

Figure XVI-2d shows the timing patterns for the articulation time studies, and Fig. XVI-3d gives the corresponding subject responses. There is a plateau, approximately 10 msec wide, in the response curve.

The results presented here summarize only a first study of temporal variables with the dynamic analog speech synthesizer; the investigation is not exhaustive, since it pertains to one consonant and one set of ramp durations. However, an estimate has been obtained of the accuracy required in specifying the articulatory transition time, the buzz onset time, and the noise cessation time.
These times need not be specified more closely than within one to three pitch periods for male speech.

G. Rosen

References

1. G. Rosen, Dynamic analog speech synthesizer, Quarterly Progress Report No. 52, Research Laboratory of Electronics, M.I.T., Jan. 15, 1959, p. 142.
2. Ibid., see Fig. XVIII-8b, p. 143.
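The paired-comparison scoring used in these tests can be illustrated with a short Python sketch. It assumes, as inferred from the stated maximum score of 60 for five subjects, that every listener heard each pair of the seven items twice; presenting the two hearings in opposite orders is an additional assumption. The function names (`trial_order`, `run_test`, `score_bounds`) and the simulated listener are hypothetical and are not part of the original procedure.

```python
"""Sketch of paired-comparison scoring for 7 stimuli and 5 listeners.
ASSUMPTION: each pair of items is presented twice to every listener,
once in each order (inferred from the stated maximum score of 60)."""
import random
from itertools import permutations

N_STIMULI = 7    # test items numbered 1..7
N_LISTENERS = 5  # subjects


def trial_order(rng):
    """Every ordered pair (first, second) of distinct items, in random order."""
    trials = list(permutations(range(1, N_STIMULI + 1), 2))
    rng.shuffle(trials)
    return trials


def run_test(choose, seed=0):
    """Present the trials to each listener; choose(first, second) returns the
    item judged better. Returns {item: times judged better}, pooled over listeners."""
    rng = random.Random(seed)
    score = {item: 0 for item in range(1, N_STIMULI + 1)}
    for _ in range(N_LISTENERS):
        for first, second in trial_order(rng):
            winner = choose(first, second)
            if winner not in (first, second):
                raise ValueError("listener must pick one member of the pair")
            score[winner] += 1
    return score


def score_bounds():
    """Theoretical minimum, average, and maximum score per item."""
    per_listener = 2 * (N_STIMULI - 1)      # 12 comparisons per item per listener
    maximum = per_listener * N_LISTENERS    # 60
    # Every trial awards exactly one win, so items average half the maximum.
    average = maximum // 2                  # 30
    return 0, average, maximum


if __name__ == "__main__":
    # Hypothetical listener who always prefers the lower-numbered item.
    print(run_test(lambda a, b: min(a, b)))  # {1: 60, 2: 50, 3: 40, 4: 30, 5: 20, 6: 10, 7: 0}
    print(score_bounds())                    # (0, 30, 60)
```

Whatever the listeners choose, exactly one win is awarded per trial, so the seven scores always sum to 210 (42 ordered pairs times 5 listeners) and average 30 per item, which matches the figures quoted for the tests.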