XVI. SPEECH COMMUNICATION

Prof. M. Halle
Prof. K. N. Stevens
Dr. T. T. Sandel
Jane B. Arnold
C. G. Bell
P. T. Brady
O. Fujimura
J. M. Heinz
G. W. Hughes
E. H. Lenneberg
C. I. Malme
F. Poza
G. Rosen

A. DYNAMIC ANALOG SPEECH SYNTHESIZER
Study of the generation of fricatives by the dynamic analog speech synthesizer has continued, with emphasis on the role of temporal variables in the perception of synthetic fricatives. A fricative-vowel syllable, /ʃa/, was generated with various settings of the following parameters: time at which articulation changes from /ʃ/ to /a/, time at which noise ceases, and time at which glottal excitation commences.
Pairs of syllables were presented to listeners, who were asked to choose the more "natural" member of each pair. The criterion was stated in written instructions given to the listeners for each test of the series. The initial and final configurations were identical for all stimuli. The choice of articulatory configuration and point of noise insertion for the /ʃ/ was based on a previous study (1). This particular /ʃ/ configuration received a unanimous vote in a nonforced absolute identification test involving 6 fricatives, 8 subjects, and 3 judgments per stimulus per subject. The chosen item has a tight constriction at the fifth position, and the noise was inserted at the second position (2). These positions are 3.5 cm and 1 cm behind the mouth opening.

[Figure: timing curves for glottal-excitation frequency, buzz, noise, and articulation; horizontal axis TIME (MSEC), 0 to 500.]
Fig. XVI-1. Schematic of timing patterns, showing duration of events, their starting times, and rates at which changes occur. In the case of inflection, the piecewise-linear curve represents the frequency of glottal excitation; the curves labeled "buzz" and "noise" represent amplitudes of buzz and noise voltages that excite the dynamic analog; the curve labeled "articulation" represents the course of the articulatory configuration of the dynamic analog from the /ʃ/ to the /a/ positions.

[Figure: four panels, (a) to (d), of numbered ramps 1 to 7; horizontal axis TIME (MSEC).]
Fig. XVI-2. Summary of timing patterns. Each numbered ramp represents a test item. Whenever a given variable was studied, the other variables were set at neutral values indicated by the heavy lines. (a) Noise time settings; (b) buzz time settings covering the /ʃ/ range; (c) buzz time settings covering the /ʒ/ range; and (d) articulation time settings.

This research was supported in part by the U. S. Air Force (Air Force Cambridge Research Center, Air Research and Development Command) under Contract AF19(604)-2061.
Figure XVI-1, which is the timing pattern for the stimuli in the previous study, also represents the set of neutral values used in the present study. Times are indicated relative to the start of the counting cycle in the sequential controller. In each of three experiments, one of three variables (buzz, noise, or articulation) was studied by using stimuli whose timing trigger instants were set before or after the neutral value.
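Each timing pattern can be pictured as a piecewise-linear control curve whose breakpoints are moved earlier or later to form the test items. The sketch below is a minimal illustration under stated assumptions: the breakpoint times, the 10-msec step between test items, and the names `envelope` and `shift` are hypothetical and are not taken from the synthesizer or from Fig. XVI-2.

```python
from bisect import bisect_right


def envelope(breakpoints, t_ms):
    """Value of a piecewise-linear control curve at time t_ms.

    breakpoints: list of (time_ms, value) pairs sorted by time.
    The curve is held flat before the first and after the last breakpoint.
    """
    times = [t for t, _ in breakpoints]
    if t_ms <= times[0]:
        return breakpoints[0][1]
    if t_ms >= times[-1]:
        return breakpoints[-1][1]
    # Index of the first breakpoint strictly later than t_ms.
    i = bisect_right(times, t_ms)
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)


def shift(breakpoints, delta_ms):
    """Move the whole curve by delta_ms (negative means an earlier trigger)."""
    return [(t + delta_ms, v) for t, v in breakpoints]


# Hypothetical neutral buzz ramp: amplitude rises from 0 to 1 between 150 and 170 msec.
neutral_buzz = [(150.0, 0.0), (170.0, 1.0)]

# Seven hypothetical test items: trigger shifted in 10-msec steps around neutral.
test_items = [shift(neutral_buzz, d) for d in range(-30, 31, 10)]
```

Only the trigger instant moves; the ramp duration stays fixed, which mirrors the report's note that a single set of ramp durations was used.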
Five subjects participated in the tests, and the method of paired comparisons was used. Each of 7 stimuli, representing 7 timing patterns, was paired, in randomized order, with each of the other 6 stimuli, and the listener was asked to pick the better stimulus of each pair according to a stated criterion. The score gives the total number of times a given stimulus was judged better than another stimulus. The minimum score that any stimulus may receive is zero, the average is 30, and the maximum is 60.
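The score bounds follow from counting. With 7 stimuli there are 21 distinct pairs, and each stimulus meets 6 others. The report does not state how many times each pair was judged; a maximum of 60 with 5 listeners is consistent with each pair being judged twice per listener (for example, once in each order), which is an inference rather than a stated fact. The sketch below, using the hypothetical helper names `score_bounds`, `trial_order`, and `tally`, checks that arithmetic and shows one way to build a randomized trial list and count wins.

```python
import random
from collections import Counter
from itertools import combinations


def score_bounds(n_stimuli, n_listeners, repeats_per_pair):
    """(minimum, average, maximum) score for one stimulus.

    Each judgment of a pair awards one point to the stimulus judged better.
    """
    n_pairs = len(list(combinations(range(n_stimuli), 2)))
    total_points = n_pairs * n_listeners * repeats_per_pair
    max_score = (n_stimuli - 1) * n_listeners * repeats_per_pair
    return 0, total_points / n_stimuli, max_score


def trial_order(n_stimuli, seed=None):
    """All pairs of stimuli numbered 1..n, shuffled, each in random left/right order."""
    rng = random.Random(seed)
    pairs = [p if rng.random() < 0.5 else (p[1], p[0])
             for p in combinations(range(1, n_stimuli + 1), 2)]
    rng.shuffle(pairs)
    return pairs


def tally(judgments):
    """Count wins from an iterable of (winner, loser) judgments."""
    return Counter(winner for winner, _ in judgments)


print(score_bounds(7, 5, 1))  # (0, 15.0, 30)
print(score_bounds(7, 5, 2))  # (0, 30.0, 60), matching the stated bounds
```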
[Figure: four panels, (a) to (d), of response curves with numbered points 1 to 7; horizontal axis TIME (MSEC).]
Fig. XVI-3. Summary of subject responses: (a), (b), (c), and (d) should be paired with Fig. XVI-2a, b, c, and d. The numbered points on each response curve refer to like-numbered test items in Fig. XVI-2.

A voicing modulator for modulating the noise with buzz was always in the noise-generator circuit. The modulator is important in connection with stimuli in which noise and buzz excitations overlap in time, but otherwise it is of little consequence.
The timing pattern for the noise-off time study is shown in Fig. XVI-2a, and the subject responses are summarized in Fig. XVI-3a. Numbers 1 through 7 refer to the seven test items. There is a plateau, approximately 20 msec wide, in the response curve. It seems, therefore, that a timing error that produces a gap is more serious than one which results in the same amount of excess overlap.
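The asymmetry can be stated as a check on the junction between the end of the noise and the start of the buzz, with a tighter limit on gaps than on overlaps. In the sketch below the function names and both tolerance values are hypothetical; the report gives no numerical tolerances, only the observation that a gap is the more serious error.

```python
def junction_ms(noise_off_ms, buzz_on_ms):
    """Signed junction between noise and buzz.

    Positive: silent gap of that many msec. Negative: overlap. Zero: abutting.
    """
    return buzz_on_ms - noise_off_ms


def acceptable(noise_off_ms, buzz_on_ms, max_gap_ms, max_overlap_ms):
    """True if the junction is within separate (asymmetric) gap and overlap limits."""
    j = junction_ms(noise_off_ms, buzz_on_ms)
    if j > 0:
        return j <= max_gap_ms
    return -j <= max_overlap_ms


# Hypothetical limits: a gap is tolerated less than an overlap of the same size.
MAX_GAP_MS = 5.0
MAX_OVERLAP_MS = 15.0

# A 10-msec error in noise cessation, in opposite directions, relative to a
# hypothetical neutral setting where noise stops as buzz starts at 170 msec.
print(acceptable(180.0, 170.0, MAX_GAP_MS, MAX_OVERLAP_MS))  # True (10-msec overlap)
print(acceptable(160.0, 170.0, MAX_GAP_MS, MAX_OVERLAP_MS))  # False (10-msec gap)
```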
Figures XVI-2b and 2c show the timing patterns for the buzz time studies, and Figs. XVI-3b and 3c give the corresponding subject responses. The stimuli in Fig. XVI-2b cover the range for /ʃ/, and those in Fig. XVI-2c cover the range for /ʒ/. In both cases the subjects were instructed to vote for the more natural stimulus, regardless of whether it was /ʃ/ or /ʒ/. We are attempting to establish the region in which both /ʃ/ and /ʒ/ can be produced. A supplementary test will be needed to determine how the response changes from /ʃ/ to /ʒ/ as the buzz onset time is varied; this can be done with a simple 2-item absolute identification test.
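One common way to summarize such a 2-item identification test is the buzz onset time at which /ʃ/ and /ʒ/ responses are equally frequent. The sketch below finds that 50-percent point by linear interpolation between neighbouring settings; the function name and the response proportions in the example are invented for illustration and are not data from this study.

```python
def crossover_ms(onsets_ms, prop_zh):
    """Buzz-onset time at which the proportion of /ʒ/ responses crosses 0.5.

    onsets_ms: buzz-onset settings in increasing order.
    prop_zh: proportion of /ʒ/ responses at each setting.
    Returns the first crossing found by linear interpolation, or None.
    """
    points = list(zip(onsets_ms, prop_zh))
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        # A sign change (or touch) of p - 0.5 between neighbours brackets the crossing.
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return t0 + (0.5 - p0) * (t1 - t0) / (p1 - p0)
    return None


# Invented proportions, for illustration only.
print(crossover_ms([100, 120, 140, 160], [1.0, 0.75, 0.25, 0.0]))  # 130.0
```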
Fig. XVI-2d shows the timing patterns for the articulation time studies, and
Fig. XVI-3d gives the corresponding subject responses. There is a plateau, approximately 10 msec wide, in the response curve.
The results presented here summarize only a first study of temporal variables with
the dynamic analog speech synthesizer; the investigation is not exhaustive, since it
pertains to one consonant and one set of ramp durations.
However, an estimate of the degree of accuracy required for specifying the articulatory transition time, buzz onset time, and noise cessation time has been obtained: these three times need be specified no more closely than within one to three pitch periods for male speech.
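To express this tolerance in msec, note that a pitch period is the reciprocal of the fundamental frequency. The report does not state the fundamental frequency used; the sketch assumes 100 to 125 Hz, a typical range for adult male voices, so the numbers are illustrative. Under that assumption, at 100 Hz the 10- and 20-msec plateaus reported above correspond to one and two pitch periods. The helper names are hypothetical.

```python
def periods_to_ms(n_periods, f0_hz):
    """Duration in msec of n_periods glottal pitch periods at fundamental f0_hz."""
    return 1000.0 * n_periods / f0_hz


def ms_to_periods(duration_ms, f0_hz):
    """Number of pitch periods spanned by duration_ms at fundamental f0_hz."""
    return duration_ms * f0_hz / 1000.0


# Assumed male fundamental frequencies (not stated in the report).
for f0 in (100.0, 125.0):
    lo, hi = periods_to_ms(1, f0), periods_to_ms(3, f0)
    print(f"F0 = {f0:.0f} Hz: 1 to 3 periods = {lo:.0f} to {hi:.0f} msec")
# F0 = 100 Hz: 1 to 3 periods = 10 to 30 msec
# F0 = 125 Hz: 1 to 3 periods = 8 to 24 msec
```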
G. Rosen
References
1. G. Rosen, Dynamic analog speech synthesizer, Quarterly Progress Report
No. 52, Research Laboratory of Electronics, M. I. T., Jan. 15, 1959, p. 142.
2. Ibid., see Fig. XVIII-8b, p. 143.