Detection of Stop Consonant Voicing: Toward a Speaker
Independent Model
by
Xiaomin Mou
B.S., Massachusetts Institute of Technology (2000)
Submitted to the Department of
Electrical Engineering and Computer Science
in Partial Fulfillment of the Requirements for the Degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June, 2001
©Xiaomin Mou. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and distribute publicly
paper and electronic copies of this thesis document in whole or in part.
Author ........................
Department of Electrical Engineering and Computer Science
May 23, 2001
Certified by ......
Kenneth N. Stevens
Clarence J. LeBel Professor of Electrical Engineering
Thesis Supervisor
Accepted by ...........
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Detection of Stop Consonant Voicing: Toward a Speaker
Independent Model
by
Xiaomin Mou
Submitted to the Department of Electrical Engineering and Computer Science
on May 23, 2001, in Partial Fulfillment of the Requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
ABSTRACT
In this thesis, a method is described for determining from acoustic analysis whether a stop
consonant is voiced or voiceless. Stop consonant production and conditions for voicing are first
presented. A preliminary set of acoustic cues for determining voicing is formulated next from
knowledge of acoustic theory. The acoustic cues include the fundamental frequency, first
formant frequency, and the relative amplitudes of the first harmonic, first formant prominence
and third formant prominence. The fundamental frequency in the adjacent vowel is used to gauge
the stiffness of the vocal folds. Additional cues are the voice onset time (VOT) from release to
the onset of voicing and the voice offset periodicity (VOP) immediately after the closure. Some
of the measures are used to estimate the spread of the glottis and are measured immediately
before the closure and after the onset of voicing, and others provide evidence for stiffening or
slackening of the vocal folds. VOT and VOP are the most important voicing cues. VOT of
unvoiced stop consonants is on average 45ms longer than that of their voiced counterparts, and
VOP of voiced stop consonants is on average significantly greater than that of their unvoiced
counterparts. The fundamental frequency, the change in first harmonic amplitude and the change
in the difference between the amplitudes of first and second harmonic are cues that can
contribute to voicing identification. The results show that a small set of acoustic cues based on
theory of speech production may be reliable in determining voicing.
Thesis supervisor: Kenneth N. Stevens
Title: Clarence J. LeBel Professor of Electrical Engineering
Acknowledgements
I thank my advisor, Ken Stevens, who fostered my interest in the field of speech
communications, for his unfailing encouragement and illuminating insight which helped me
grow both as a researcher and as a person. His feedback during our meetings raised important
questions and challenged me to be more critical with my research.
I thank members of the speech communications group for the weekly seminars which
often brought in speakers across the disciplines and gave me a chance to appreciate the
complexity of this field. I thank Dan Shub, Laura Dilley, Stefanie Shattuck-Hufnagel, and Ariel
Salomon, for recording the speech data used in this thesis. I thank Arlene for her support.
I am grateful for my best friend Cherry Liu, who has been my cheerleader through
frustrating moments and all the way to the final stretch.
Finally, I thank my parents for loving me and believing in me, always.
Contents
1. Introduction
1.1 Lexical representation in terms of segments and features
1.2 Landmarks and acoustic cues
1.3 Thesis outline
1.3.1 Past work
1.3.2 Present study
2. Production and cues of stop consonant voicing
2.1 Voicing
2.2 Stop consonant voicing
2.3 Acoustic cues for stop consonant voicing
2.3.1 Voicing context
2.3.2 Voicing cues
3. Databases and acoustic measurements
3.1 Description of database
3.2 Description of analysis tool
3.3 Measurements
3.4 Data
3.4.1 Voice onset time (VOT)
3.4.2 Voice offset periodicity (VOP)
3.4.3 Glottalization
3.4.4 H1, H2, F1, A1, and A3
4. Acoustic cue analysis
4.1 Analysis of H1, H2, F1, A1, and A3
4.2 Average data for voicing cues
4.2.1 Average F0 data
4.3 Combining acoustic cues
5. Summary and Discussion
5.1 Voicing in isolated utterances
5.2 Further work
Appendix A
Appendix B
List of Figures
1.2 Vowel and consonant landmarks for "We took a hike."
2.3 Measures of stop consonant voicing
3.3 An example of gathering measurements for a VCV utterance
3.4.1 Average voice onset time (VOT)
3.4.2 Average voice offset periodicity (VOP)
3.4.3a Glottalization of /t/
3.4.3b Average F1 decrease for unvoiced stops
3.4.4a Measurements for determining voicing at the closure and release for voiced and unvoiced consonants in VCV utterances by male speakers
3.4.4b Measurements for determining voicing at the closure and release for voiced and unvoiced consonants in VCV utterances by female speakers
3.4.4c Measurements for determining voicing at the closure and release for voiced and unvoiced consonants in CVC utterances by male speakers
3.4.4d Measurements for determining voicing at the closure and release for voiced and unvoiced consonants in CVC utterances by female speakers
4.2a Average VCV measurements
4.2b Average CVC measurements
4.2.1a Average F0 values at the release and closure landmarks for VCVs and CVCs
4.2.1b F0 values for male and female utterances
4.3a Scatter plots of voicing cues in VCV utterances using MAX values
4.3b Scatter plots of voicing cues in CVC utterances using MAX values
4.3c Scatter plots of the measurement H1-A1 against H1 using MAX values
4.3d Scatter plots of the measurement H1-A3 against H1 using MAX values
List of Tables
1.1 Feature chart for standard segments in English
Chapter 1
Introduction
The study of speech communication involves an understanding of how a speaker organizes a
discrete linguistic representation into an acoustic signal and how a listener decodes that acoustic
signal back to the linguistic representation. Quantitative models of the human speech perception
and production mechanisms can be used to improve the performance of speech processing
applications such as automatic speech recognition. The characteristic information contained in
the acoustic signal derived from phonetics and linguistics theories may provide crucial cues that
are robust even in natural environments where casual speech may be sloppy and background
noise high.
In clearly enunciated speech, words are produced with all of their features well
represented in the sound, whereas in casual speech the acoustic cues for selected features are
often modified in context. Small children manage to develop a system for speech perception that
accounts for the variability of speech, which depends on context, speaker, the mode of speaker,
and the speech environment. They also learn which modifications are acceptable. Current speech
recognition models which rely on training with large amounts of speech data and modeling
variability probabilistically fall short of human speech perception for casual speech. Advances in
the study of acoustic phonetics are leading to an understanding of relations between linguistic,
articulatory, acoustic, and auditory representations of speech. These relations have shown that
the variability in speech is not random. Research has shown that the variability arises from
principles of speech production and perception, and suggests that modeling this variability is
key to improving existing models of natural speech recognition (Stevens, 1995).
1.1 Lexical representation in terms of segments and features
The procedure for extracting from casual speech a description of an utterance in terms of
distinctive features is a component of a lexical access model (Stevens, 1995). The lexical access
model assumes that words are stored in memory as sequences of phonological segments, such as
vowels or consonants, each of which is describable by a set of distinctive features. Three kinds of
evidence for this word representation are provided. First, the words in a language can potentially
be organized into minimal pairs such that one feature of one segment has a different binary
value. Examples of such a pair are "bit" and "pit". Secondly, the constraints on the structure and
formation of the words can be expressed by rules based on features (Chomsky and Halle, 1968).
Thirdly, the anatomical organization of the vocal tract and the respiratory system is responsible
for the quantal nature of speech sounds (Stevens, 1989). Over some regions of an articulatory
parameter the acoustic properties are relatively stable, but movements outside of these regions
show abrupt changes in the acoustics.
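The notion of a minimal pair can be sketched in code. The following is a minimal illustration only, assuming a hypothetical dictionary representation of segments and a heavily reduced feature set; the names `SEGMENTS`, `differing_features`, and `minimal_pair_contrast` are invented for this example, and the feature values are illustrative rather than copied from a feature chart. The voicing contrast is encoded on the vocal-fold features stiff and slack, so /b/ and /p/ differ on both of them in this representation.

```python
# Illustrative feature bundles. The reduced feature set and its values are
# assumptions made for this sketch, not a transcription of any feature chart.
SEGMENTS = {
    "b": {"consonant": True, "continuant": False, "lips": True,
          "stiff": False, "slack": True},
    "p": {"consonant": True, "continuant": False, "lips": True,
          "stiff": True, "slack": False},
    "t": {"consonant": True, "continuant": False, "blade": True,
          "stiff": True, "slack": False},
    "ih": {"vowel": True, "high": True, "back": False},
}


def differing_features(seg_a, seg_b):
    """Return the sorted feature names whose values differ between two segments."""
    a, b = SEGMENTS[seg_a], SEGMENTS[seg_b]
    return sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))


def minimal_pair_contrast(word_a, word_b):
    """Return (position, differing features) if the two words differ in
    exactly one segment; otherwise return None."""
    if len(word_a) != len(word_b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(word_a, word_b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    return i, differing_features(word_a[i], word_b[i])


bit = ["b", "ih", "t"]
pit = ["p", "ih", "t"]
print(minimal_pair_contrast(bit, pit))
```

For "bit" and "pit" the contrast falls on the first segment and involves only the vocal-fold features, which is the voicing distinction studied in this thesis.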
The lexical access model deduces word sequences by matching patterns of distinctive
features against a stored lexicon of words. There are two types of distinctive features: the
articulator-free and articulator-bound features. Table 1.1 (Choi, 1999) is a feature chart for the
standard segments in English.
[Table 1.1 body: a chart of + and - feature values that could not be recovered from the source layout.
Columns, top part (vowels and glides): iy, ih, ey, eh, ae, aa, ao, ow, ah, uw, uh, rr, ex, au, ai, oi, h, w, y, r.
Columns, bottom part (consonants): l, m, n, ng, v, dh, z, zh, f, th, s, sh, b, d, g, p, t, k, dj, ch.
Rows (features): vowel, glide, consonant, sonorant, continuant, strident, stiff, slack, spread, constricted, advanced tongue root, constricted tongue root, nasal, tongue body, blade, lips, high, low, back, anterior, distributed, lateral, rhotic, round.]
Table 1.1: Feature chart for standard segments in English. The vowels and glides are shown in
the top part of the table, and the consonants in the bottom part.
The top half shows features for vowels and glides and the bottom half shows the features
for consonants. The features for the stop consonants are shown bold-faced. The first six rows are
the articulator-free features and the remaining rows are the articulator-bound features. The
articulator-free features describe the kind of constriction formed in the vocal tract without
reference to a particular articulator. For example, a narrow constriction in the oral cavity
produces a consonant. Stop consonants /b/ through /k/ are marked [-continuant] because airflow
in the mouth is completely blocked during production, whereas fricative consonants /v/ through
/sh/ are marked [+continuant] because airflow is not completely blocked. The articulator-bound
features refer to the configuration and position of the particular articulators that are involved.
For example, the features [high], [low], and [back] describe the position of the tongue body, and
are specified for the vowels and most glides, as well as consonants produced with the tongue
body. Similarly, the features [stiff], [slack], [spread], and [constricted] describe the
configuration of the larynx and the state of the vocal folds, and give information about voicing.
The lexical representation in terms of distinctive features is independent of the context in
which a word is spoken or the speaking style.
1.2 Landmarks and acoustic cues
The speech signal can be segmented into vowels and consonants. Vowels are marked by regions
of maximum low-frequency amplitude corresponding to maximum vocal-tract opening in
phonated intervals, and consonants are indicated by regions of discontinuities corresponding to
releases and closures. These regions of change are referred to as landmarks. In the vicinity of the
landmarks, the speech signal can be analyzed for the acoustic cues that correspond to the
articulator-bound features. To better analyze these segments of change, the utterances are usually
Fourier transformed from the time domain into the frequency domain. A commonly used
representation of the utterance in the frequency domain is called the spectrogram. Figure 1.2
shows the spectrogram of the utterance "We took a hike." spoken by a female speaker. Formants are
shown for vowels and consonants, and noise energy is shown for burst regions. In the vicinity of
these landmarks, the signal is examined in more detail for acoustic cues that reveal
information about the place of articulation, nasality, tenseness, and voicing. The acoustic cues
may include formant frequencies, the fundamental frequency, and energy distribution in different
frequency ranges. For example, high vowels such as /i/ have a low first formant frequency while
low vowels such as /a/ have a high first formant frequency. Cues such as low-frequency energy
near the closure of a consonant may indicate the voicing features (represented by [+stiff] vocal
folds and [+slack] vocal folds in Table 1.1).
[Figure 1.2 image: a waveform panel and a spectrogram panel, each with a horizontal axis TIME (ms) from 0 to 1000; segments are labeled glide, vowel, stop closure, and diphthong, and the formants F1 and F2 are marked.]
Figure 1.2: Vowel and consonant landmarks for "We took a hike." The top figure indicates the
consonants and vowels in the waveform and the bottom figure shows the vowel and consonant
landmarks as well as formant frequencies in the corresponding spectrogram.
The utterance "We took a hike" contains a glide /w/, three stop consonants /t/, /k/, and /k/,
four vowels /i/, /uh/, /aa/ and /ay/, as well as a voiceless aspirated consonant /h/. A discontinuity
at 275ms follows the first vowel /i/ and leads to a silence interval. This discontinuity is a
landmark for the closure of the first stop consonant /t/. A second discontinuity at 300ms, from
the silence region into a burst of high-frequency energy, is a landmark for the release of the stop
consonant. Together, these two regions are landmarks for a stop consonant. The discontinuity
between the /k/ and the following vowel just before 460ms is evidence for voice onset, or
phonation following a voiceless aspirated stop consonant.
Landmarks show where consonants and vowels begin and end but do not
give information about the articulators involved in producing the segments. That information is
gathered by examining the signal near the landmarks for acoustic cues such as formant
frequencies. The front vowel /i/ in "We" at 200ms is marked by a low first formant frequency at
250 Hz and a high second formant frequency at 2 kHz. The back vowel /uh/ in "took" is marked
by a high first formant frequency at 600 Hz and a low second formant frequency at 1.6 kHz. The
stop consonant /k/ in "took" at 455ms is marked by the proximity of the second and third
formants at 1850 Hz and 2150 Hz respectively. This proximity reflects the configuration of the
/k/ closure. The velar /k/, compared to the labial stops /b/ or /p/, is produced with a vocal-tract
constriction located much further from the lips. As a result, the lowest natural frequency of the
cavity in front of the constriction likely corresponds to the second formant frequency.
Furthermore, one of the natural frequencies of the back cavity will be very close to it (Stevens,
1998).
1.3 Thesis Outline
1.3.1 Past Work
Elizabeth Choi's doctoral work (Choi, 1999), Detection of Consonant Voicing: A Module
for a Hierarchical Speech Recognition System, presented a component of a systematic
recognition system which focused on the detection of consonant voicing. Each utterance from a
database spoken by two speakers was first transformed into a spectral representation and
examined for landmarks. Further processing around the landmarks for acoustic cues led to a
deconvolution of segments and their features. The features were finally compared against items
in the lexicon for possible matches. Error analysis was performed on the measurements from two
databases: isolated CVC and VCV syllables, and continuous reading of short sentences. The
error rates, scored separately for each landmark, were in the range of 10 to 20 percent, the higher
error being for continuous speech. Combining closure and release landmarks reduced error rates
by approximately 5 percent (Choi, 1999).
1.3.2 Present Study
This study builds upon Choi's doctoral work by focusing solely on the voicing feature in
stop consonants in isolated utterances. Choi's study involved only two speakers, who were
examined separately, so the differences in measurements related to physical dimensions of the
vocal tract were not accounted for. In addition, absolute measurements were obtained at a given
landmark. For example, in a VCV syllable, measurements were obtained 30ms after a closure
and 30ms before a release. This study aims to arrive at a more robust model by limiting the
number of consonants, by including data from more speakers, and by considering relative
measurements of acoustic cues that do not depend on factors such as speech level, pitch, and
gender.
Chapter 2 describes stop consonant production and the measurements that can be used to
determine voicing. Chapter 3 presents preliminary measurements from isolated VCV and CVC
utterances. Chapter 4 gives an analysis of those preliminary measurements and draws some
conclusions about the relative importance of different voicing cues. Chapter 5 gives a summary
and discusses future work.
Chapter 2
Production and cues of stop consonant voicing
2.1 Voicing
Voicing refers to the manifestation of vocal fold vibration during the production of a speech
segment. It is a distinctive feature in stop consonants because two stop consonants can be
identical in all the features except voicing. In the feature representation, voicing is determined by
the state of the vocal folds which can be either stiff or slack. When the vocal folds are stiff,
vibration in an obstruent consonant (such as a stop consonant) is inhibited; a segment in this state
is described as unvoiced. Conversely, when the vocal folds are slack, vibration during an
obstruent consonant is facilitated, and a segment in such a state is said to be voiced. This thesis is
primarily concerned with the voicing feature of the six English stop consonants and this chapter
gives a brief description of the cues which distinguish a voiced stop consonant from an unvoiced
one.
2.2 Stop consonant voicing
Stop consonant production is characterized by a complete closure in the vocal tract, an interval
following the closure during which pressure builds up behind the constriction, and finally a
sudden release of the constriction. The constriction is formed with the lips in the labials /b/ and
/p/, with the tongue tip in the alveolars /d/ and /t/, and with the tongue body raised against the soft
palate in the velars /g/ and /k/. The biggest difference between labials and alveolars is the larger
amplitude of higher frequencies in the spectrum of the release burst for alveolars. The bursts for
velar stop consonants are characterized by peaks in the mid-frequency range of the spectrum, a
consequence of the tongue body position, which produces a front cavity with resonances
corresponding to frequencies near F2 and F3. At the instant of closure, the intraoral pressure is
zero. During the closure interval, as the intraoral pressure increases toward the subglottal
pressure, the pressure drop across the glottis falls and glottal airflow decreases. At the release,
air abruptly flows through the constriction and the intraoral pressure drops, so the pressure
drop across the glottis rises again.
During the closure interval, the pressure built up behind the constriction leads to a smaller
difference in the pressure across the glottis. When this difference is small enough, the vocal folds
cease to vibrate and voicing stops. The vocal folds can be further manipulated to be spread apart
or stiffened to continue this voiceless state. However, if the pressure built up behind the
constriction is not great enough to reduce the pressure difference across the glottis
sufficiently, airflow will continue and the vocal folds will continue to vibrate. The pressure buildup is
limited by expanding the pharyngeal region (Stevens, 1998). The vocal folds remain slack in this
voiced state.
2.3 Acoustic cues for stop consonant voicing
Acoustic cues for voicing depend on the position and context in which the stop consonant
occurs. The cues are also influenced by the speaking rate and speaking style, and the variability
in these types of speech also needs to be characterized.
Absolute measurements of some cues may not provide adequate evidence for voicing, and
relative measurements taken over a period of time near the closure and release landmarks may
reflect differences in voicing more robustly. For example, immediately prior to closure, there is a
change in the low-frequency amplitude of the radiated sound. Following the closure, the amplitude of the
radiated sound is small compared to that in the preceding vowel because sound is no longer
radiated from the mouth opening. The amplitude of the glottal pulses would be expected to
decrease abruptly at the instant of closure and the amplitude in the first formant region would be
expected to decrease immediately after the closure. This amplitude will then increase
immediately following the release. The most abrupt decrease in amplitude of F1 in stop
consonants would be expected for labials. By monitoring cues such as the low frequency
amplitude over time, the voicing feature may be revealed.
2.3.1 Voicing context
The voicing feature can be examined in three contexts. In syllable-initial position, the stop
consonant is not preceded by a sonorant segment and the voicing evidence can only be found
from the region immediately preceding and following the release region. In syllable-final
position, the stop is not followed by a sonorant segment and voicing evidence can be found in the
region leading from the preceding vowel segment into the closure for the consonant. In the
intersonorant position, voicing evidence can be found in both regions of the closure and release.
Several voicing cues exist following the release of a stop consonant. These cues include
the time from the release to the onset of low-frequency amplitude (VOT for voiced consonants
tends to be shorter than for unvoiced consonants), a measure of the breathiness of
voicing after onset of glottal vibration (represented in part by the difference between the
amplitudes of the first two harmonics, H1-H2), the change over a period of time in the first
formant frequency, and the fundamental frequency following the onset of glottal vibration.
Cues which exist in the syllable-final position include lengthening of the preceding
vowel, possible glottalization of the stop consonant release, and duration of the time interval in
which glottal vibration continues. An example of vowel lengthening occurs in the pair of words
"rider" and "writer". The vowel which precedes the final voiced consonant /d/ is longer in
duration than the vowel which precedes the final unvoiced consonant /t/. An example of
glottalization is the /t/ in the word "can't" in which the release of /t/ is not registered as a
landmark in the sound. However, cues for the presence of glottalization are evidence for a
following voiceless stop consonant, since glottal adduction is often used to enhance the
termination of glottal vibration for syllable-final voiceless stops (Stevens, 1998).
For stop consonants in the intervocalic position, there is a combination of cues from the
closure and release.
2.3.2 Voicing cues
Voicing cues refer to acoustic cues which reflect the articulatory movements of voiced stop
consonant production and distinguish a voiced stop from an unvoiced one. Voicing cues should
be subject to minimum variation in spite of differences in the context in which stop consonants
appear. When the vocal tract forms a constriction or closure in the oral region, the cues for
voicing should permit estimates of the vocal fold stiffness and the degree of abduction or
adduction of the glottis.
Figure 2.3 is a schematic articulatory representation that accounts for the two types of
cues that will be used as the preliminary cues in this study. Based on knowledge from acoustic
phonetics, the main indicators of voicing are the stiffness of vocal folds and the spreading of the
glottis. The top panel shows a schematic representation of the change in the area of the glottal
opening in a VCV sequence, where the consonant is in the intervocalic position, for both a
voiced and unvoiced consonant. Consonant closure is marked arbitrarily by -100 and the release
is marked by 100. For the VCV utterances in which closure precedes release, -100 refers to the
point of closure. For the CVC utterances where release precedes closure, -100 refers to the point
of release. The bottom panel shows the change in stiffness of the vocal folds that is postulated to
occur during the closure interval and immediately following the release. The top panel shows
that the glottis is more spread in the unvoiced stop consonant during the closure interval and
immediately prior to the closure as well as after the release. The glottis has to be spread in the
unvoiced stop consonant to keep the vocal folds from vibrating. The configuration of the glottis
changes relatively little for the voiced stop consonant. The bottom panel shows that the vocal
folds are increasingly stiff for the unvoiced consonant and that they are increasingly slack for the
voiced consonant. The stiffness of the vocal folds keeps them from vibrating in producing an
unvoiced consonant and the slackness of the vocal folds allows vibration to continue in
producing a voiced consonant (Stevens, 1999). The acoustic cues for the voicing distinction are
proposed based on this view of these adjustments of vocal-fold configuration and stiffness.
[Figure 2.3 image: two schematic panels for a V-C-V sequence, each with closure marked at -100 and release at 100 and with separate curves for a voiced and an unvoiced stop; voice onset is marked after the release.]
Figure 2.3: Schematic measures of glottal opening and stiffness associated with stop consonant
voicing. The top panel shows estimates of the area of the glottal opening and the bottom panel
shows the change in vocal fold stiffness during the closure interval and immediately after the
release. Closure is marked at -100 and release at 100.
The next chapter discusses the particular cues involved in determining voicing, and
applies them to recorded speech data. The goal is to determine which voicing cues are the most
important in an isolated context of CVC and VCV utterances.
Chapter 3
Databases and Acoustic Measurements
3.1 Description of database
This study focuses on the measurement of the relative values of certain acoustic parameters at
different times within utterances. Data from two female and two male speakers are adequate for
that purpose. The stop consonants are /b/, /p/, /g/, /k/, /d/, and /t/. Vowel-consonant-vowel (VCV) clusters
such as /ahdah/, and consonant-vowel-consonant (CVC) clusters like /dahd/ are first recorded
and then digitized. Results from Choi's study suggested that it may be possible to pool
measurements from utterances where vowels adjacent to the consonants are variable
(Choi, 1999). Therefore, a neutral vowel /ah/, as in "cut", is chosen for this thesis study. CVC
utterances cover the syllable-initial and syllable-final voicing contexts and VCV utterances cover
the intervocalic position. For analysis purposes, VCV utterances are split into the VC
and CV pairs which correspond to the two landmarks of a stop consonant. Similarly, CVC
utterances are split into the CV and VC pairs. The relative importance of the acoustic cues at
either landmark can then be assessed to arrive at a best combination of these cues. Primary
spectral measurements from the center of the landmark are extracted at 10ms intervals. For the
VCV utterances, where the closure landmark precedes the release landmark, the closure
landmark is arbitrarily assigned -100ms and the release landmark is assigned 100ms. Times
relative to these landmarks are selected as measurement points. For the closure landmarks,
measurements are taken from -150 to -70ms; at the release, data are sampled from 100 to 170ms.
For the CVC utterances, the release landmark is assigned -100ms and the closure landmark is
assigned 100ms, and measurements are taken from -100 to -30ms at the release and 50 to 130ms
at the closure.
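The sampling scheme can be sketched as follows. This is a minimal illustration, not part of the analysis software: the function names and the offset constants are hypothetical, and the offsets are inferred from the windows quoted for the VCV utterances (50ms before to 30ms after the closure, and 0 to 70ms after the release). Applying the same offsets to the CVC closure landmark gives a window of 50 to 130ms, which is an assumption made here by analogy with the VCV case.

```python
# Offsets in ms relative to each landmark, inferred from the VCV windows
# quoted in the text. These constants are assumptions of this sketch.
CLOSURE_OFFSETS = (-50, 30)
RELEASE_OFFSETS = (0, 70)


def measurement_times(landmark_ms, start_offset_ms, end_offset_ms, step_ms=10):
    """Return the sample times (ms) from landmark+start to landmark+end, inclusive."""
    first = landmark_ms + start_offset_ms
    last = landmark_ms + end_offset_ms
    return list(range(first, last + 1, step_ms))


def utterance_grid(kind):
    """Return sample times around both landmarks for a 'VCV' or 'CVC' utterance,
    using the convention that the first landmark is at -100 ms and the second at 100 ms."""
    if kind == "VCV":    # closure precedes release
        closure, release = -100, 100
    elif kind == "CVC":  # release precedes closure
        release, closure = -100, 100
    else:
        raise ValueError("kind must be 'VCV' or 'CVC'")
    return {
        "closure": measurement_times(closure, *CLOSURE_OFFSETS),
        "release": measurement_times(release, *RELEASE_OFFSETS),
    }


print(utterance_grid("VCV"))
print(utterance_grid("CVC"))
```

Expressing the windows as offsets from the landmark keeps the VCV and CVC cases symmetric, so the same measurement routine can be pooled across both utterance types.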
3.2 Description of analysis tool
After the utterances are digitized, the waveforms are analyzed by using XKL, an X-windows port
of the interactive speech analysis package originally developed by Dennis Klatt. The XKL
program makes a spectral representation of the waveform. XKL computes a 512-point discrete
Fourier transform on a length of waveform that is first differenced and then multiplied by a Hamming
window. In this thesis, a long window length of 30ms is used to measure H1, H2, and F0 and a
short window of 6.4ms is used to measure the formant frequencies as well as formant
amplitudes. The longer time window, corresponding to a higher frequency resolution, is used to
better capture harmonics. A shorter time window, corresponding to a lower frequency resolution,
is adequate to measure formant frequencies.
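The processing chain described above (first difference, Hamming window, 512-point DFT) can be sketched in a few lines. This is a rough illustration written from the description in the text and is not the XKL source: the function names are hypothetical, and the 10 kHz sampling rate in the usage lines is an assumption, since the sampling rate is not stated here. As a rule of thumb, the frequency resolution of a window of duration T is on the order of 1/T, so a 30ms window resolves roughly 33 Hz while a 6.4ms window resolves roughly 156 Hz, which is why the long window separates individual harmonics and the short window shows only the broader formant envelope.

```python
import cmath
import math


def hamming(n):
    """Return a Hamming window of length n."""
    if n <= 1:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def short_time_spectrum_db(samples, n_fft=512):
    """First-difference the samples, apply a Hamming window, zero-pad to n_fft,
    and return the DFT magnitude in dB for bins 0 .. n_fft/2."""
    diff = [samples[i] - samples[i - 1] for i in range(1, len(samples))]
    if len(diff) > n_fft:
        raise ValueError("window is longer than the DFT length")
    frame = [d * w for d, w in zip(diff, hamming(len(diff)))]
    frame += [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(frame))
        # A small constant avoids log10(0) for empty bins.
        spectrum.append(20 * math.log10(abs(acc) + 1e-12))
    return spectrum


# Usage with an assumed 10 kHz sampling rate: a 30 ms window is 300 samples.
fs = 10000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(300)]
spec = short_time_spectrum_db(tone)
peak_bin = max(range(len(spec)), key=lambda k: spec[k])
print(peak_bin * fs / 512)  # frequency of the strongest bin, close to 1000 Hz
```

The direct DFT sum is used only to keep the sketch free of external dependencies; an FFT routine would normally replace the inner loop.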
The XKL program computes and displays the fundamental frequency if it finds local
spectral maxima at regular intervals. The fundamental frequency is computed by collecting
the frequencies of local maxima in the DFT spectrum. Only peaks below 3 kHz are considered and the
frequency is specified if the program judges the peaks to be equally spaced. If there is little low-frequency energy present in the spectrum, or if the distribution of differences is too spread in
frequency, no fundamental frequency is displayed. See Klatt (1980) for details.
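The peak-spacing idea can be illustrated with a short sketch. This is not XKL's algorithm, whose details are not given here; it is a simplified stand-in that follows the description above. The function name `estimate_f0`, the relative level floor `floor_db`, and the spacing tolerance are all assumptions chosen for illustration.

```python
import statistics


def estimate_f0(spectrum_db, bin_hz, max_hz=3000.0, floor_db=-30.0, tolerance=0.15):
    """Return an F0 estimate in Hz if the spectral peaks below max_hz are
    roughly equally spaced; return None otherwise.

    spectrum_db : magnitude spectrum in dB, one value per DFT bin
    bin_hz      : frequency spacing between adjacent bins
    floor_db    : peaks weaker than this, relative to the maximum, are ignored
    tolerance   : allowed fractional deviation of each gap from the median gap
    """
    top = max(spectrum_db)
    peaks = []
    for k in range(1, len(spectrum_db) - 1):
        freq = k * bin_hz
        if freq >= max_hz:
            break
        is_peak = (spectrum_db[k] > spectrum_db[k - 1]
                   and spectrum_db[k] >= spectrum_db[k + 1])
        if is_peak and spectrum_db[k] - top >= floor_db:
            peaks.append(freq)
    if len(peaks) < 3:
        return None  # too little harmonic structure to judge spacing
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    spacing = statistics.median(gaps)
    if any(abs(g - spacing) > tolerance * spacing for g in gaps):
        return None  # peak spacing is too irregular
    return spacing


# Synthetic spectrum with 10 Hz bins and peaks every 200 Hz.
spectrum = [-60.0] * 301
for k in range(20, 300, 20):
    spectrum[k] = 0.0
print(estimate_f0(spectrum, 10.0))
```

Returning `None` when the spacing test fails mirrors the behavior described above, where no fundamental frequency is displayed for spectra without regular harmonic structure.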
3.3 Measurements
The first step in the detection of consonant voicing is to locate the closure and release landmarks
of the stop consonants. The locations of the landmarks can be aided by the knowledge that a
complete closure somewhere in the vocal tract is required and that pressure builds up as a result
of the constriction, leading to a reduction or extinction of glottal vibration. The pressure buildup
is followed by a sudden release of the constriction which can result in the generation of
turbulence noise. In this thesis, the landmarks are determined by hand by examining the
waveform for abrupt discontinuities.
Around each landmark, the primary acoustic parameters measured with XKL are the
fundamental frequency (F0), the amplitude of the first harmonic (H1), the first formant
frequency (F1), the amplitude of the second harmonic (H2), and the amplitudes of the first and
third formant spectral prominences (A1 and A3). Figure 3.3 shows the different parameters that
are measured in different time regions of a VCV utterance. Immediately before closure, the
measures that reflect the spread of the glottis are H1, H2, F1, A1, and A3 and the measure that
reflects the stiffness of the vocal folds is F0. The same is true after voicing resumes some time
after release. In between, however, the presence or absence of F0 is used to
determine the periodicity of the vocal fold vibration immediately after closure and around the
release. At each 10ms sample point, the signal is taken to be periodic if XKL finds an F0,
and is taken to be non-periodic if it fails to find an F0.
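[Figure 3.3 graphic: time axis of a VCV utterance marking a 50ms region before closure (H1, H2, F1, A1, A3, F0), a 30ms region after closure (periodicity), the VOT after release, and a 40ms region after voicing onset (H1, H2, F1, A1, A3, F0).]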
Figure 3.3: An example of gathering measurements for a VCV utterance. In the 50ms interval
before the closure interval, parameters for spread and stiffness are measured. 30ms after the
closure interval, periodicity is measured. VOT is determined after the release, and parameters for
spread and stiffness are measured in the 40ms after onset of glottal vibration.
H1 is a measure of the strength of the vocal fold vibration. The difference between H1
and H2 is an indirect measurement of the amount of glottal spreading; H1-A1 and H1-A3 also
reflect the state of the glottis. A larger value for any of these differences in a vowel immediately
adjacent to a stop consonant corresponds to a more spread glottis and a broader first formant
bandwidth. A1 may be smaller in a vowel adjacent to an unvoiced stop consonant where the
vocal folds are spread or constricted, leading to acoustic loss. Therefore, H1-A1 would be
expected to be higher for unvoiced stop consonants than for voiced consonants. Formation of the
constriction leads to an abrupt termination of the phonation source, which is reflected in a sharp
decrease of F1. F1 reflects the degree of the glottal constriction. After the sudden closure, F1
would be expected to fall off substantially and rise after the release. The fall in F1 at consonant
closure and the rise in F1 at voicing onset are expected to be greater for voiced consonants.
Another cue for stop consonant voicing is a measure of how long the vocal folds are
spread before they come together at the release of the consonant, called the voice onset time
(VOT). The VOT is a cue found at the release landmark, as voicing begins sometime after the
release of a stop consonant. The VOT is determined in this thesis by taking the difference
between the time of release, which is marked by a burst of energy in the waveform, and the time
from which XKL detects periodicity consistently.
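The VOT measurement just described can be sketched as follows. The sketch assumes an F0 track sampled every 10ms in which 0 means that no F0 was found; the function name `voice_onset_time` and the reading of "consistently" as a run of at least three voiced frames are assumptions made for illustration, not values taken from the thesis.

```python
def voice_onset_time(release_ms, frame_times_ms, f0_hz, min_consecutive=3):
    """Return VOT in ms, or None if voicing never becomes consistent.

    VOT is the time from the release to the first frame that starts a run of
    at least `min_consecutive` frames with a detected F0 (F0 > 0).
    `min_consecutive` is an illustrative assumption.
    """
    run_start = None
    run_len = 0
    for t, f0 in zip(frame_times_ms, f0_hz):
        if t < release_ms:
            continue  # frames before the release are ignored
        if f0 > 0:
            if run_len == 0:
                run_start = t  # candidate onset of consistent periodicity
            run_len += 1
            if run_len >= min_consecutive:
                return run_start - release_ms
        else:
            run_len = 0  # an isolated voiced frame does not count
    return None


# Hypothetical track: release at 100ms, one stray voiced frame at 120ms,
# consistent voicing from 150ms onward.
times = list(range(100, 180, 10))
track = [0, 0, 130, 0, 0, 128, 126, 125]
print(voice_onset_time(100, times, track))  # 50
```

Requiring a run of voiced frames keeps a single spurious F0 detection during the burst from being mistaken for the onset of voicing.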
Another measurement preceding consonant closure is vowel duration. Vowel duration can
be used as a cue for voicing of final-position stop consonants when the vowel is in the utterance-final
position (Crystal and House, 1990). A vowel followed by a voiceless consonant, such as the
/eh/ in "bet", is usually shorter in duration than one followed by a voiced consonant, such as
"bed". There is a natural tendency to make a slightly early glottal opening gesture for a postvocalic voiceless consonant in order to ensure that no low-frequency voicing cue is generated
(Klatt, 1975).
At the release landmark, of interest is how many time frames elapse before the
onset of periodicity. Once periodicity is detected, how do the parameters change with time? At
the closure landmark, a similar cue for stop consonant voicing is a measure of how long the
vocal folds continue to vibrate after the point of closure. This cue at the closure landmark will be
referred to in this thesis as the voice offset periodicity (VOP). This parameter is defined here as
the percent of the 4 time frames after the point of closure for which XKL returns a fundamental
frequency. Together, the VOP after the closure (in percent) and the VOT after the release (in
ms) might give valuable information about the behavior of the vocal folds during the consonant
interval, as reflected by the presence or absence of F0. The duration of the vowel may also be a
cue for voicing of the final stop consonant in CVC utterances. Away from the consonant, before the
closure and after the release landmarks, the behavior of the vocal folds in preparation for the
transition into or out of the consonant might also provide a clue about voicing, as reflected by the
change in the other parameters (H1, H2, F1, A1, and A3).
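The VOP definition above translates directly into code. The sketch below assumes the F0 values for the frames following the closure are already available as a list in which 0 marks a frame with no detected F0; the function name `voice_offset_periodicity` is hypothetical.

```python
def voice_offset_periodicity(f0_after_closure_hz, n_frames=4):
    """Return VOP: the percent of the first `n_frames` frames after closure
    for which an F0 was returned (F0 > 0)."""
    frames = list(f0_after_closure_hz)[:n_frames]
    if len(frames) < n_frames:
        raise ValueError("need at least n_frames F0 values after the closure")
    voiced = sum(1 for f0 in frames if f0 > 0)
    return 100.0 * voiced / n_frames


# Hypothetical tracks: voicing persists for two frames, then stops.
print(voice_offset_periodicity([120, 118, 0, 0]))  # 50.0
print(voice_offset_periodicity([0, 0, 0, 0]))      # 0.0
```

With four frames, VOP can only take the values 0, 25, 50, 75, or 100 percent for a single utterance; intermediate values arise only after averaging over utterances or speakers.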
3.4 Data
This section presents data obtained with XKL on the VCV and CVC utterances in two parts. The
first part is concerned with the behavior of the parameters during the closure interval. The
parameter used is the F0 returned by XKL, and this information is used to calculate the VOT and
VOP. The second part is concerned with the behavior of the parameters H1, H2, F1, A1, and A3
before the closure and after the onset of voicing at the release landmark.
3.4.1 Voice onset time (VOT)
The first plot of figure 3.4.1 shows the average VOT for stop consonants from data provided by
Crystal and House (1988). Notice that the VOT for unvoiced stop consonants is about 45ms
longer than for voiced stop consonants. The second and third plots of figure 3.4.1 show VOT
calculated in this study from the VCV and CVC utterances and reflect the same relationship. The
last plot of figure 3.4.1 shows the vowel duration for CVC utterances. On top of each average
measurement value, the standard error of that measurement across all speakers is shown. The
small and non-overlapping standard errors for unvoiced and voiced stops suggest that it
is possible to distinguish voicing by examining the VOT alone. It is simple to distinguish
between a voiced and an unvoiced stop. It is, however, difficult to distinguish place of
articulation for voiced stop consonants or for unvoiced consonants based on VOT, although there
are some systematic differences. The labials have the shortest VOT and the velars have the
longest VOT. In a cross-language study of voicing, Lisker and Abramson (1964) found that the
velars have consistently higher VOT values than the other stops and suggested that the VOT is,
to a certain extent, sensitive to the place of stop closure. To avoid
overlapping distributions, data from stops of the same manner but different places of
articulation were kept separate in that study.
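The averages and standard errors plotted in figure 3.4.1 can be computed as in the sketch below. It assumes the conventional definition of standard error (sample standard deviation divided by the square root of the number of observations); the thesis does not spell out its formula, the function name is hypothetical, and the VOT values in the usage lines are made-up placeholders, not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(values):
    """Return (mean, standard error) for a list of per-speaker measurements.

    Standard error is taken as the sample standard deviation divided by
    sqrt(n); this is the usual definition and an assumption here.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    return mean(values), stdev(values) / sqrt(n)


# Placeholder VOT values in ms for four speakers (not measured data).
avg, se = mean_and_standard_error([10, 12, 14, 16])
print(avg, round(se, 2))
```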
Vowel duration is taken to be the time from the release of the syllable-initial stop to the
closure of the syllable-final stop. Studies on fricatives by Crystal and House show that the
duration of a vowel up to the time of the frication onset tends to be shorter when the vowel is
followed by a voiceless fricative than by a voiced fricative, when the consonant is in the syllable-final position. This duration difference for vowels is negligible when the fricative is in a
non-syllable-final position. These results also apply to obstruent consonants in general. Previous
studies showed that the average duration of vowels followed by /t/ is 160ms, by /d/ 210ms, by
/p/ 150ms, by /b/ 210ms, by /k/ 150ms, and by /g/ 210ms (Crystal and House, 1990). Panel (d)
of figure 3.4.1 shows vowel duration data from this study. There is a similar relationship between the vowel durations in
voiced and voiceless stop consonant segments. Vowels followed by voiced
stop consonants are on average 70ms longer than vowels followed by unvoiced consonants.
This large difference suggests that vowel duration can be a cue that contributes to voicing
identification in syllable-final stop consonants.
[Figure 3.4.1 graphic: four bar-chart panels. (a) Average stop consonant voice onset time VOT; (b) Average VCV voice onset time VOT; (c) Average CVC voice onset time VOT; (d) Average CVC vowel duration.]
Figure 3.4.1: Average voice onset time (VOT). Panel (a) shows data from a prior study (Crystal
and House, 1990). Panel (b) shows VOT for VCV utterances, panel (c) for CVC utterances, and
panel (d) shows the average vowel duration in CVC utterances. The bottom portion of each
stacked bar shows the actual VOT in ms and the top portion shows the standard error of the data.
3.4.2 Voice offset periodicity (VOP)
Figure 3.4.2 shows the average VOP. The top plot represents the average VOP for VCV
utterances and the bottom plot gives the average VOP for CVC utterances. Average VOP for the
voiced utterances is higher than that for the unvoiced utterances. Average VOP for the voiced
/ahgah/ and /gahg/ is more than 65% higher than the average VOP of their counterparts /ahkah/ and
/kahk/. The same is true for the /d/ and /t/ utterances. Average VOP for the voiced /ahbah/ and /bahb/
is only about 20% higher than that of their unvoiced counterparts. Average VOP for the unvoiced CVC
/taht/ is 0. Again, the small standard error indicates that VOP is helpful in distinguishing voicing
for postvocalic stop consonants.
[Figure 3.4.2 graphic: two bar-chart panels, "Average VCV voice offset periodicity VOP" (top) and "Average CVC voice offset periodicity VOP" (bottom), with one bar per utterance.]
Figure 3.4.2: Average voice offset periodicity (VOP). The top panel shows percent periodicity
for VCV utterances and the bottom shows percent periodicity for CVC utterances. The standard
error of all the measurements across speakers is shown above each bar.
Note that the VOT for unvoiced stop consonants is higher than the VOT for voiced stop
consonants, and that the VOP for unvoiced stop consonants is lower than the VOP for voiced
stop consonants. It takes longer for an unvoiced stop than for a voiced stop to resume voicing at
the release, and it is less likely that there is continued voicing after the closure. Note that while
VOP for /g/ is more than four times that for /k/, the VOP for /b/ is only twice that for /p/. The
smaller difference in VOP for the /p/ and /b/ pairs could arise because the closure of the labial
stops is faster, resulting in a more rapid drop in F1 and hence a more rapid decrease in
periodicity as measured by XKL. On the other hand, there is a large difference between the VOP
of /dahd/ and /taht/. The VOP of /taht/ is zero. This large difference
suggests that the final /t/ is much more frequently glottalized than the final voiceless stops
/k/ and /p/. That is, the glottis closes and voicing is cut off even before closure.
3.4.3 Glottalization
Glottalization is a phonological modification mostly made in continuous speech. Often, stop
consonants in the final position are not fully released. The VOP data of /taht/ may be an
indication that the final /t/ is more frequently glottalized than other stop consonants in the final
position. Figure 3.4.3a shows two spectrograms. The top one is of /taht/ and the bottom one is of
/kahk/. Immediately after 610ms in /taht/, voicing stops altogether and there is no release of the
final /t/. On the other hand, there is a definite release of the final /k/ beginning at 700ms.
[Figure 3.4.3a graphic: two spectrograms over TIME (ms) from 0 to 1000, with the /t/ closure and /t/ release marked in the top panel.]
Figure 3.4.3a: Glottalization of /t/. The top panel shows the spectrogram for /taht/ and the bottom
shows the spectrogram for /kahk/ by the same speaker.
If the final /t/ is glottalized, F1 should drop less for /t/ than for /p/
and /k/. Presumably voicing is cut off so abruptly that F1 does not have a chance to decrease
significantly. Figure 3.4.3b shows that after the point of closure, on average, F1 decreases less
for /t/ than for /p/ and /k/. F1 for /k/ decreases the most, from 680Hz at the point of closure to
270Hz 10ms after. F1 for /t/ decreases the least, from 615Hz to 500Hz. F1 for /p/ falls
in between, from 605Hz to 400Hz.
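The comparison can be made explicit by computing the size of each drop from the average F1 values quoted above. The sketch uses only those six numbers; the variable names are illustrative.

```python
# Average F1 (Hz) at the point of closure and 10ms after, as quoted in the text.
f1_hz = {
    "k": (680, 270),
    "t": (615, 500),
    "p": (605, 400),
}

# Size of the F1 drop for each final unvoiced stop.
drops = {stop: start - end for stop, (start, end) in f1_hz.items()}

# Stops ordered from smallest to largest drop.
ranked = sorted(drops, key=drops.get)

print(drops)   # {'k': 410, 't': 115, 'p': 205}
print(ranked)  # ['t', 'p', 'k']
```

The drop for /t/ (115Hz) is roughly half that for /p/ (205Hz) and about a quarter of that for /k/ (410Hz), which is the pattern expected if the final /t/ is glottalized.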
[Figure 3.4.3b graphic: "Average F1 decrease for unvoiced stops", F1 plotted against Time [ms] from 50 to 110, with curves labeled taht, pahp, and kahk.]
Figure 3.4.3b: The average F1 decrease for unvoiced stops. The stars indicate values for /t/, the
circles for /p/ and the dots for /k/. Averages are over four utterances, one each by four speakers.
Note that the closure is at 100ms and measurements are taken from 50ms to 110ms.
3.4.4 H1, H2, F1, A1, and A3
The previous section discussed the importance of VOT, VOP, and vowel duration as cues for
voicing. One might conclude that VOT, VOP, or vowel duration alone can determine the
presence of voicing. In other contexts such as a noisy environment, however, VOT and
VOP often cannot be detected. In casual speech, vowel duration may vary as a result of changes such
as emphasis or speaker mood. What other cues can be used to determine voicing in these
contexts? The answer might be on the other side of the landmarks. Figure 3.4.4a shows the
behavior of the H1, H2, A1, A3, F0, and F1 measures over time for the isolated VCV utterances
/ahgah/ and /ahkah/ spoken by two male speakers, as and ds. The following section remarks on general
trends in the behavior of these measures, and the next chapter provides an analysis of these
measures, relates the behavior to theory, and draws conclusions about their strength in
determining voicing.
In figure 3.4.4a, measures for speaker as are shown in the left column and for speaker ds
in the right column. Figure 3.4.4b shows the same measures for the female speakers ld and ss.
The fundamental frequency F0 and first formant frequency F1 are displayed alone, while the
amplitudes H2, A1, and A3 are shown relative to H1, the amplitude of the first harmonic.
Data for the male speaker as show that at the closure, H1 (F0 amplitude) begins to drop
from 50dB to 25dB. At the release landmark, voicing begins about 35ms after the release, and
H1 rises from about 30dB to 50dB. Data for the other male speaker ds follow the same trend,
and voicing begins about 40ms after the release.
After the closure, F0 for the voiced /ahgah/ remains constant while F0 of the
unvoiced /ahkah/ becomes undetectable by XKL. XKL returns 0 for F0 when the fundamental
frequency cannot be detected because there is a lack of regular harmonics. In this thesis, a "0" is
assigned to F0 as a null value. The figures show that F0 drops from some value to 0 abruptly at
the closure. At the release, as soon as voicing begins, F0 for the unvoiced is detectable again and
is higher than F0 for the voiced, presumably because the vocal folds were held stiff to keep from
vibrating before voicing. At that point, F0 of the unvoiced is about 30Hz greater than F0 of the
voiced.
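Because 0 is a null marker rather than a frequency, it has to be skipped before F0 values are compared. The sketch below shows one way to compare F0 at the onset of voicing for an unvoiced and a voiced stop; the function names and the example tracks are hypothetical, chosen only to mirror the roughly 30Hz difference described above.

```python
def first_voiced_f0(f0_track_hz):
    """Return the first nonzero F0 in the track, or None if every frame is null (0)."""
    for f0 in f0_track_hz:
        if f0 > 0:
            return f0
    return None


def onset_f0_difference(unvoiced_track_hz, voiced_track_hz):
    """Return F0 at voicing onset for the unvoiced stop minus that for the
    voiced stop, or None if either track never becomes voiced."""
    unvoiced_f0 = first_voiced_f0(unvoiced_track_hz)
    voiced_f0 = first_voiced_f0(voiced_track_hz)
    if unvoiced_f0 is None or voiced_f0 is None:
        return None
    return unvoiced_f0 - voiced_f0


# Hypothetical post-release tracks sampled every 10ms (0 = no F0 detected).
print(onset_f0_difference([0, 0, 0, 150, 140], [0, 120, 118]))  # 30
```

Treating the zeros as ordinary values would drag any average of F0 toward zero and would make the unvoiced stop appear to have the lower F0.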
After the closure, H1-H2 of the unvoiced /ahkah/ is about 15dB greater than that of the
voiced /ahgah/. H1-H2 of the voiced falls while that of the unvoiced rises. At the release, H1-H2
of the unvoiced rises from the onset of voicing and then falls while that of the voiced stays
relatively constant. The overall change in H1-H2 of the unvoiced stop is greater than that of the
voiced.
The profile of H1-A1 over time is the same for the unvoiced and voiced stop at the
closure. At the release, however, H1-A1 of the voiced decreases steadily while that of the
unvoiced rises and falls after voicing begins. Again, the overall change in H1-A1 of the unvoiced
stop is greater than that of the voiced.
H1-A3 of the unvoiced stop falls after the closure while that of the voiced stop continues
to rise. Because A3 is falling more than A1, H1-A1 is almost constant around the closure for
both the voiced and unvoiced stops. At the release, when voicing begins, H1-A3 rises and then
falls after reaching a maximum around 50ms after the release. Again, the overall change in H1-A3 is bigger for the unvoiced than the voiced stop.
At the closure, there is nearly no difference in the behavior of F1 in the voiced and
unvoiced stops. At the release, F1 is higher for the unvoiced than for the voiced, and it falls
about 50ms after the release.
These general observations apply to the female speaker data shown in Figure 3.4.4b.
Figures 3.4.4c and 3.4.4d also show similar trends for the CVC utterances /gahg/ and /kahk/. The
closure and release landmarks are switched from those of the VCV utterances. Data for all
speakers are shown in the appendix.
The next chapter gives an analysis of these cues and assesses their validity and
importance in determining voicing.
[Figure 3.4.4a graphic: two columns of panels, "VCV Male as: ahgah vs. ahkah" and "VCV Male ds: ahgah vs. ahkah", plotted against time in ms with Closure and Release marked and a legend distinguishing voiced from unvoiced.]
Figure 3.4.4a: Measurements for determining voicing at the closure and release for voiced (dot)
and unvoiced (open circle) consonants in VCV utterances with vowel /ah/ for male speakers as
and ds. Closure occurs at -100ms and release occurs at 100ms. Data are taken every 10ms from
50ms before closure to 30 ms after closure, and from the release to 70ms after release.
[Figure 3.4.4b graphic: two columns of panels, "VCV Female ld: ahgah vs. ahkah" and "VCV Female ss: ahgah vs. ahkah", plotted against time in ms with Closure and Release marked and a legend distinguishing voiced from unvoiced.]
Figure 3.4.4b: Measurements for determining consonant voicing at closure and release for voiced
(dot) and unvoiced (open circle) consonants in VCV utterances with vowel /ah/ for female
speakers ld and ss.
[Figure 3.4.4c graphic: two columns of panels, "CVC Male as: gahg vs. kahk" and "CVC Male ds: gahg vs. kahk", plotted against time in ms with Release and Closure marked.]
Figure 3.4.4c: Measurements for determining consonant voicing at release and closure for voiced
(dot) and unvoiced (open circle) consonants in CVC utterances with vowel /ah/ for male speakers
as and ds.
[Figure 3.4.4d graphic: two columns of panels, "CVC Female ld: gahg vs. kahk" and "CVC Female ss: gahg vs. kahk", plotted against time in ms with Release and Closure marked and a legend distinguishing voiced from unvoiced.]
Figure 3.4.4d: Measurements for determining consonant voicing at release and closure for voiced
(dot) and unvoiced (open circle) consonants in CVC utterances with vowel /ah/ for female
speakers ld and ss.
Chapter 4
Acoustic cue analysis
Much of the data analysis is concerned with the relative importance of the cues near the two
landmarks. In the vicinity of each landmark, the importance of the cues is further scrutinized in
segments of time which represent periods before the closure, after the closure and after onset of
voicing at the release.
For each CVC or VCV utterance, the behavior of the parameters H1, H2, F1, A1, and A3
is examined for the 50ms leading up to closure. After the point of closure, F0 (indicating
presence or absence of periodicity) is examined for 30ms. At the release, F0 is examined for 40
ms. After that point, H1, H2, F1, A1, and A3 are examined for 40 more ms. This analysis of
voicing can be thought of as four continuous stages. A first stage concentrates on the time
leading up to closure, a second stage focuses on the time after closure, a third stage looks at the time
leading up to voice onset, and a fourth stage examines the time after voice onset. In effect, this
analysis concentrates on the acoustic manifestation of laryngeal gestures that overlap into regions
adjacent to the landmarks. A typical question that needs to be addressed is: in the transition from
vowel to closure, is there an adjustment of the vocal folds to prepare for a voiced or unvoiced
stop? Is there something done differently by the vocal folds during the vowel interval before the
consonant?
4.1 Analysis of H1, H2, F1, A1, and A3
In addition to the primary cues VOT, VOP, and vowel duration, several important cues exist just
before the closure and after voicing onset following the release. This section examines the
importance of these cues by tracking the behavior of the parameters H1, H2, F1, A1, and A3
over a period of 50ms before closure and 40ms after periodicity begins following the release. Average
data and standard errors will be presented first, followed by a discussion of the relative
importance of each cue.
4.2 Average data for voicing cues
Figure 4.2a shows the average data for 24 VCV utterances by four speakers, two female and two
male, and figure 4.2b shows the average data for the CVC utterances. The first column represents
activity at the closure of a voiced utterance, the second column at the release of the same voiced
utterance, the third column at the closure of the unvoiced counterpart, and the fourth column at the
release of the same unvoiced utterance. The first row plots the behavior of H1, the second
row F1, the third row H1-H2, the fourth row H1-A1, and the last row H1-A3.
[Figure 4.2a graphic: bar-chart panels in four columns (VCV voiced closure, VCV voiced release, VCV unvoiced closure, VCV unvoiced release), each panel showing a Max bar and a Diff bar.]
Figure 4.2a: Average VCV measurements. The first item in each panel (Max) represents the
average maximum difference in the measure between two neighboring time frames. Max is the
maximum rate of change over 10ms. The second item (Diff) shows the difference between the
value of the measure in the last time frame and that in the first time frame. The light part of each
stacked bar shows the actual measure and the dark portion indicates standard error. VCV closure
is defined at -100 and release at 100. Measurements are taken from -150 to -110ms at the closure,
and from 140 to 170ms at the release.
[Figure 4.2b graphic: bar-chart panels in four columns (CVC voiced closure, CVC voiced release, CVC unvoiced closure, CVC unvoiced release), each panel showing a Max bar and a Diff bar.]
Figure 4.2b: Average CVC measurements. The first measure in each plot is the maximum
difference between two neighboring time frames. Max is the maximum rate of change over
10ms. The second measure is the difference between the last and first time frame. The light
portion of each stacked bar shows the actual measure and the dark portion represents the
standard error. CVC release is defined at -100 and closure at 100. Measurements are taken at the
release from -60 to -30ms, and at the closure from 100 to 140ms.
The first item of each bar graph (Max) shows the average maximum difference in the
measurement between any two neighboring 10ms time frames during either the 40ms release or 50ms closure
interval. The second item (Diff) shows the average difference between the first and last time
frames. The bars shown are all stacked, with the bottom part representing the actual measure and
the top portion representing the standard error of that measure across speakers. Together, the
values Max and Diff and their respective standard errors can give insight into the general
effectiveness of a parameter in characterizing a voiced or a voiceless stop.
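For a single utterance, Max and Diff can be computed from a parameter track sampled every 10ms as sketched below. Since the values reported in this chapter are negative for falling parameters, the sketch assumes that Max is the frame-to-frame change of largest magnitude with its sign kept; that reading, the function name `max_and_diff`, and the example tracks are assumptions made for illustration.

```python
def max_and_diff(track):
    """Return (Max, Diff) for one parameter track sampled at 10ms frames.

    Max:  the change between two neighboring frames with the largest
          magnitude, sign preserved (an assumed reading of the definition).
    Diff: the value in the last frame minus the value in the first frame.
    """
    if len(track) < 2:
        raise ValueError("need at least two frames")
    steps = [b - a for a, b in zip(track, track[1:])]
    max_step = max(steps, key=abs)
    diff = track[-1] - track[0]
    return max_step, diff


# Hypothetical F1 track (Hz) falling toward a closure.
print(max_and_diff([700, 690, 650, 500, 470]))  # (-150, -230)
# Hypothetical F1 track (Hz) rising after a release.
print(max_and_diff([300, 320, 360, 370]))       # (40, 70)
```

The per-utterance values would then be averaged across utterances and speakers to give the bars plotted in figures 4.2a and 4.2b.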
Comparison of H1 for the voiced and unvoiced VCV utterances at the closure shows that
their behavior is actually very similar. The Max and Diff values are both slightly negative,
indicating a slightly decreasing H1 at the closure for both the voiced and unvoiced VCV
utterances. However, at the release, the Max and Diff values for the unvoiced VCVs, at 8 dB and
11 dB respectively, are much larger than those for voiced VCVs, which are both about 0 dB.
Furthermore, because the standard errors of Max and Diff for the unvoiced utterances at the
release do not overlap with those for the voiced utterances, there is clearly a separation in the
behavior of H1 at the release between the voiced and unvoiced VCVs. Because H1 is a measure
of the amplitude of the first harmonic, the large rise in H1 at the release of the unvoiced stop
consonant reflects the closing of the glottis as glottal vibration begins. Figure 4.2b shows the
same behavior of H1 in CVC utterances. H1 is therefore a valid cue for determining voicing near the
release landmark for both VCV and CVC utterances.
At the release, the maximum rate of change in F1 is greater for the voiced than for the
unvoiced in both the VCV and CVC utterances. This could arise because at the release, there is
an increase in low frequency amplitude of radiated sound. At the release of voiced VCVs, F1
rises by a Max of 47Hz (standard error 18) and by an overall Diff of 64Hz (standard error 21). At
the release of unvoiced VCVs, F1 drops by a Max of -22Hz (standard error 20) and a Diff of -27Hz
(standard error 23). At the release of voiced CVCs, F1 rises by a Max of 32Hz (standard error
14) and a Diff of 39Hz (standard error 16). At the release of unvoiced CVCs, F1 drops by a Max
of -35Hz (standard error 8) and a Diff of -48Hz (standard error 31). At the closure, there are even
more considerable differences between the voiced and unvoiced. In the VCV utterances, the
decrease in F1 at the closure of the unvoiced utterances is indicated by a Max of -147Hz
(standard error 38) and a Diff of -209Hz (standard error 40). The decrease in F1 at the closure of
the voiced is a Max of -178Hz (standard error 40) and a Diff of -236Hz (standard error 36). At
the closure of the voiced CVCs, F1 decreases by a Max of -178Hz (standard error 40) and a Diff
of -236Hz (standard error 36). At the closure of the unvoiced CVCs, F1 decreases by a Max of -122Hz (standard error 33) and a Diff of -185Hz (standard error 36). The larger decrease in F1 at the
closure of the voiced utterances is consistent with the expectation that immediately after the
closure, there is a decrease in the low frequency amplitude of radiated sound. The parameter F1
is therefore a valid cue in determining voicing at both the release and the closure landmark.
Comparison of the Max and Diff in H1-H2 for the voiced and unvoiced VCVs shows that
it is a robust cue for voicing both at the release and at the closure. At the closure, Diff is -1 dB
(standard error 0.9) for the voiced and 5 dB (standard error 1.5) for the unvoiced. The positive
value of Diff for the unvoiced indicates that H1-H2 becomes larger approaching the closure.
This is consistent with the physiological behavior: as the glottal opening increases, a smoother
glottal waveform results in a lower H2, and the last few pitch periods become breathy. At the
release, Diff is 0.2 dB (standard error 0.2) for the voiced and -4 dB (standard error 1) for the
unvoiced. The negative value of Diff indicates that H1-H2 is decreasing after the release. This is
again consistent with the behavior of the glottis as it becomes less spread to prepare for voicing.
The large differences in Diff at both landmarks and the non-overlapping standard errors
indicate that H1-H2 is a fairly
valid measure of voicing both at the closure and at the release for the VCVs. Figure 4.2b shows
that for the CVCs, however, H1-H2 is not as robust a cue at either the release or the closure.
Although there is a similar pattern of H1-H2 increase at the closure and decrease at the release,
the large standard errors make it difficult to draw any conclusions about its contribution to
voicing identification.
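The non-overlap argument for the VCV utterances can be checked directly from the Diff values and standard errors quoted above. The sketch treats each measurement as the interval mean plus or minus one standard error, which is an informal separation check rather than a statistical test; the function name `error_bars_overlap` is hypothetical.

```python
def error_bars_overlap(mean_a, se_a, mean_b, se_b):
    """Return True if the intervals mean +/- one standard error overlap."""
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)


# H1-H2 Diff in dB for VCV utterances (voiced, then unvoiced), from the text.
closure_overlap = error_bars_overlap(-1.0, 0.9, 5.0, 1.5)
release_overlap = error_bars_overlap(0.2, 0.2, -4.0, 1.0)

print(closure_overlap)  # False
print(release_overlap)  # False
```

At the closure the intervals are roughly -1.9 to -0.1 dB for the voiced and 3.5 to 6.5 dB for the unvoiced, and at the release roughly 0 to 0.4 dB and -5 to -3 dB, so the voiced and unvoiced measurements are separated at both landmarks.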
The large standard errors associated with the parameters H1-A1 and H1-A3 make it
difficult to draw conclusions about their validity as cues for voicing. Figure 4.2a shows that in
the VCV utterances, H1-A1 increases more for the voiced than for the unvoiced at the closure
landmark, and it decreases more for the unvoiced than for the voiced at the release landmark.
Figure 4.2b shows that in the CVC utterances, it also increases more for the voiced than for the
unvoiced at the closure, and it decreases more for the unvoiced than for the voiced at the release.
H1-A3 follows the same trend at the closure for both the VCV and CVC utterances. There is
more of an H1-A3 increase at the release of both the VCV and CVC utterances. Compared with
the other parameters, H1-A1 and H1-A3 do not contribute much to voicing detection.
It should be noted that these are average data over both female and male speakers. Some
parameters, such as H1, might change more for the female than for the male speakers. In "Glottal
characteristics of male speakers: Acoustic correlates and comparison with female data,"
Hanson and Chuang (1999) looked for parameters for differentiating male and female speech.
Acoustic measurements that reflected glottal characteristics in recordings of 21 male
speakers and 22 female speakers showed that while there was significant overlap across gender,
the male data showed lower average values and less interspeaker variation for all measures (first
formant bandwidth, H1-H2, H1-A1, and H1-A3). Males tend to have a more complete glottal
closure, leading to less energy loss at the glottis and less spectral tilt (Hanson and Chuang,
1999). Because this thesis looks for significant parameters for differentiating voicing in stop
consonants, the data collected on 2 female speakers and 2 male speakers were analyzed together.
The next section discusses the validity of the actual value of F0 at the release or closure
landmark as a voicing cue. Average data of F0 are provided first, followed by separate data from
male and female speakers.
4.2.1 Average FO data
In the previous chapter, the presence of FO, as detected by the XKL program, was the basis for
determining VOT and VOP. In addition, the actual value of FO at the closure or the release may
be a valid cue for voicing. FO at the release of an unvoiced stop consonant is expected to be
higher than FO at the release of a voiced stop because the vocal folds were held stiff during the
unvoiced interval. During the transition from the unvoiced stop consonant to the following vowel
segment, FO will fall. Similarly, at the closure, FO is expected to be somewhat higher for the
unvoiced stop consonant than for the voiced, assuming that the stiffness or the slackness of the
vocal folds is implemented before the closure.
It should be noted that FO is determined in this thesis based on the output of the XKL
program. Somewhere at the closure landmark, XKL stops detecting periodicity and returns 0 for
FO. Somewhere at release landmark, it begins to detect periodicity and returns an actual value for
FO.
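The way the landmark F0 values are read from such a track can be illustrated with a minimal sketch. It assumes a hypothetical list of F0 values at fixed frame intervals in which 0 means that no periodicity was detected; the function names and the sample track are illustrative only and are not part of the XKL program.

```python
def f0_at_voice_offset(f0_track):
    """Return the last nonzero F0 before the first unvoiced (0) frame, or None."""
    previous = None
    for value in f0_track:
        if value == 0:
            return previous
        previous = value
    return None  # periodicity was never lost in this track


def f0_at_voice_onset(f0_track):
    """Return the first nonzero F0 that follows an unvoiced (0) frame, or None."""
    seen_unvoiced = False
    for value in f0_track:
        if value == 0:
            seen_unvoiced = True
        elif seen_unvoiced:
            return value
    return None  # periodicity never resumed in this track


# Hypothetical F0 track (Hz) across a VCV utterance: vowel, closure, vowel.
track = [112, 110, 108, 0, 0, 0, 0, 146, 140, 133]
offset_f0 = f0_at_voice_offset(track)  # value near the closure landmark
onset_f0 = f0_at_voice_onset(track)    # value near the release landmark
print(offset_f0, onset_f0)
```

In this made-up track the value at onset is higher than the value at offset, which is the pattern expected for an unvoiced stop.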
Figure 4.2.1a shows the average F0 values at the release and closure landmarks for both
VCV and CVC utterances. Each bar represents the average F0 spoken by 4 speakers on 3
utterances (voiced or unvoiced). Data for VCV utterances are shown in the top plot and data for
CVC utterances are shown in the bottom plot. In the VCV utterances, the average F0 is 10 Hz
higher for the unvoiced than for the voiced at the closure and 20 Hz higher for the unvoiced than
for the voiced at the release. In the CVC utterances, the average F0 is 5 Hz higher for the
unvoiced than for the voiced at the closure and 30 Hz higher for the unvoiced than for the voiced
at the release.
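The CVC averages quoted above can be checked against the F0 values listed in Appendix B. The following sketch is an illustration only: the values are transcribed from the CVC listing in Appendix B, and the pairing of labels with values for speaker as is assumed to follow the order in which they are listed. The variable names are not from the thesis.

```python
# F0 values (Hz) for the CVC utterances, transcribed from Appendix B.
# Each tuple is (voiced closure, unvoiced closure, voiced release, unvoiced release)
# for the bahb, gahg, and dahd utterances, in that order.
CVC_F0 = {
    "as": [(87, 94, 110, 122), (105, 97, 108, 113), (104, 98, 108, 142)],
    "ds": [(100, 101, 127, 115), (95, 100, 121, 160), (94, 95, 119, 137)],
    "ld": [(179, 197, 228, 245), (175, 181, 228, 252), (166, 168, 226, 223)],
    "ss": [(151, 160, 201, 271), (163, 188, 213, 275), (162, 164, 207, 299)],
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


rows = [row for speaker_rows in CVC_F0.values() for row in speaker_rows]

# Average unvoiced-minus-voiced F0 difference at each landmark.
closure_diff = mean(uc - vc for vc, uc, vr, ur in rows)
release_diff = mean(ur - vr for vc, uc, vr, ur in rows)
print(f"closure: {closure_diff:.1f} Hz, release: {release_diff:.1f} Hz")
```

Under these assumptions the differences come out near 5 Hz at the closure and near 30 Hz at the release, in line with the rounded figures in the text.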
[Figure 4.2.1a panels: "Average F0 values in VCVs" (top) and "Average F0 values in CVCs" (bottom); bars for voiced closure, unvoiced closure, voiced release, and unvoiced release; vertical axis ticks from 50 to 200.]
Figure 4.2.1a: Average F0 values at the release and closure landmarks for VCV (top) and CVC
(bottom) utterances. Each bar represents the average F0 spoken by 4 speakers on 3 utterances,
either voiced or unvoiced.
F0 is on average nearly 100 Hz higher for the female speakers. Before drawing any conclusions
about its validity as a speaker-independent acoustic cue for voicing, it is important to examine
whether this average trend in F0 is actually more influenced by one gender than the other. Figure
4.2.1b shows that this trend holds in both genders. The light bars represent male data and the
darker bars to the right represent female data. The measure of F0 at the closure or release
landmark is therefore a valid cue for voicing.
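As a rough check of the per-gender trend for the VCV utterances, the sketch below splits the Appendix B values by gender. It assumes that speakers as and ds are male and ld and ss are female, as labeled in Appendix B, and that the values for speaker as pair with their labels in listing order; all names in the code are illustrative.

```python
# F0 values (Hz) for the VCV utterances, transcribed from Appendix B.
# Each tuple is (voiced closure, unvoiced closure, voiced release, unvoiced release)
# for the ahbah, ahgah, and ahdah utterances, in that order.
VCV_F0 = {
    "as": [(93, 104, 123, 105), (104, 101, 114, 146), (100, 100, 120, 137)],
    "ds": [(108, 128, 119, 157), (106, 112, 120, 140), (110, 107, 112, 146)],
    "ld": [(198, 201, 214, 215), (197, 211, 229, 248), (197, 212, 229, 239)],
    "ss": [(169, 181, 243, 255), (180, 199, 198, 254), (165, 191, 208, 243)],
}
GENDER = {"male": ["as", "ds"], "female": ["ld", "ss"]}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


summary = {}
for gender, speakers in GENDER.items():
    rows = [row for s in speakers for row in VCV_F0[s]]
    summary[gender] = {
        # unvoiced-minus-voiced difference at each landmark
        "closure_diff": mean(uc - vc for vc, uc, vr, ur in rows),
        "release_diff": mean(ur - vr for vc, uc, vr, ur in rows),
        # overall mean F0 across all four conditions
        "mean_f0": mean(v for row in rows for v in row),
    }

gender_gap = summary["female"]["mean_f0"] - summary["male"]["mean_f0"]
for gender, stats in summary.items():
    print(gender, {k: round(v, 1) for k, v in stats.items()})
print("female minus male mean F0:", round(gender_gap, 1))
```

With these values the unvoiced-minus-voiced difference is positive for both genders at both landmarks, and the gap between female and male mean F0 is a little under 100 Hz.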
[Figure 4.2.1b panels: "F0 male and female values in VCVs" (top) and "F0 male and female values in CVCs" (bottom); bar groups for voiced closure, unvoiced closure, voiced release, and unvoiced release; legend: female, male; vertical axis ticks from 50 to 300.]
Figure 4.2.1b: F0 values shown separately for males and females in VCVs (top plot) and CVCs
(bottom plot). The first bar in each group represents the average F0 for female speakers and the
second bar represents the average F0 for male speakers.
4.3 Combining acoustic cues
The previous section summarized the behavior of six measures over time. The measures H1, F0,
F1, and H1-H2 were determined to be robust cues for voicing in both the CVC and VCV
utterances. The measures H1-A1 and H1-A3 were determined to be poor cues for voicing, partly
because of the sample size. This section addresses the following question: given these valid cues,
could a combination of them yield a better estimate of voicing? In certain contexts, the
single cues may be too weak to distinguish voicing, but a combination of cues may be strong
enough to judge voicing.
Figure 4.3a shows some examples of such a combination of voicing cues using the MAX
values determined in section 4.2. The first row plots the MAX H1-H2 against MAX H1, the
second row plots the MAX F1 against MAX H1, and the last row plots the actual values of F0 (as
obtained in section 4.2.1) against MAX H1. H1 is used as a basis for these combinations because
it is already a valid cue for voicing on its own. The voiced data are indicated by filled circles and
the unvoiced data are shown by unfilled circles. The 24 data points in each graph represent 6
stops in the VCV or CVC utterances spoken once by each of the 4 speakers. At the release, the
scatter plot of the MAX of H1-H2 and of H1 shows that there is a clear separation in the data
points between the voiced and unvoiced. The same observation applies to the two scatter plots
below. Except for one or two overlapping data points, the otherwise clean separation indicates
that the combination of cues helps to distinguish voicing. The right side of Figure 4.3a shows
that there is no such clear separation at the closure landmark. In the bottom plot, notice that there
is a separation of data points depending on the gender of the speaker. Data from the female
speakers are about 100 Hz higher than data from the male speakers. Figure 4.3b shows that for
the CVC utterances, there is no separation at the closure and that, while there is some separation
at the release, it is not as clear as in the VCV utterances.
[Figure 4.3a panels: "VCV release" (left column) and "VCV closure" (right column), three rows of scatter plots; horizontal axis: MAX H1 [dB]; legend: open = unvoiced, filled = voiced.]
Figure 4.3a: Scatter plots of the voicing cues in VCV utterances using MAX values. MAX, as
described before, is the maximum rate of change over 10 ms. The left side shows trends at the
release and the right side at the closure. Voiced utterances are indicated by filled circles and
unvoiced utterances by unfilled circles.
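One possible reading of the MAX measure is sketched below: the largest-magnitude change between consecutive frames of a parameter track sampled every 10 ms, with its sign kept. Both the signed interpretation and the sample H1 track are assumptions made for illustration; the exact definition is given earlier in the thesis and may differ.

```python
def max_change(track):
    """Largest-magnitude change between consecutive frames, with its sign kept.

    `track` holds one parameter value per 10-ms frame, so the result is a
    change per 10 ms.
    """
    steps = [later - earlier for earlier, later in zip(track, track[1:])]
    if not steps:
        raise ValueError("need at least two frames to measure a change")
    return max(steps, key=abs)


# Hypothetical H1 track in dB, one value every 10 ms near a landmark.
h1_track_db = [40.0, 40.5, 39.0, 33.0, 30.5]
h1_max = max_change(h1_track_db)
print(h1_max)
```

Keeping the sign lets a falling parameter be told apart from a rising one, which matters when the voiced and unvoiced stops move in opposite directions.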
[Figure 4.3b panels: "CVC release" (left column) and "CVC closure" (right column), three rows of scatter plots; horizontal axis: MAX H1 [dB]; legend: open = unvoiced, filled = voiced.]
Figure 4.3b: Scatter plots of the voicing cues in CVC utterances using MAX values. MAX is the
maximum rate of change over 10 ms. The left side shows trends at the release and the right side
at the closure. Voiced utterances are indicated by filled circles and unvoiced utterances by
unfilled circles.
The other two measures for voicing, H1-A1 and H1-A3, which were determined to be
poor for distinguishing voicing in the previous section, are shown in Figures 4.3c and 4.3d,
combined with the measure H1. The goal is to determine whether these less robust cues are
valuable in combination with another more robust cue or cues. Figure 4.3c shows the
combination of H1-A1 and H1 using their MAX values. There is virtually no activity at the
closure, as both cues have a value close to zero. While it was difficult to determine voicing based
on data from H1-A1 alone, the combined cues H1-A1 and H1 show that there is a clear
separation at the release of the VCV utterances. The separation is not as clear for the CVC
utterances. Figure 4.3d shows that the combination of the H1-A3 and H1 cues yields similar results.
[Figure 4.3c panels: release and closure scatter plots for VCV (first row) and CVC (second row) utterances; horizontal axis: MAX H1 [dB]; legend: open = unvoiced, filled = voiced.]
Figure 4.3c: Scatter plots of the measurement H1-A1 against H1 using MAX values. The left
side shows trends at the release and the right side at the closure. Data from VCV utterances are
shown in the first row and data from CVC utterances are shown in the second row. Voiced
utterances are indicated by filled circles and unvoiced utterances by unfilled circles.
[Figure 4.3d panels: release and closure scatter plots for VCV (first row) and CVC (second row) utterances; horizontal axis: MAX H1 [dB]; legend: open = unvoiced, filled = voiced.]
Figure 4.3d: Scatter plots of the measurement H1-A3 against H1 using MAX values. The left
side shows trends at the release and the right side at the closure. Data from VCV utterances are
shown in the first row and data from CVC utterances are shown in the second row. Voiced
utterances are indicated by filled circles and unvoiced utterances by unfilled circles.
In different speech environments, some cues for voicing will not be found at all while
some cues will be more prominent. The discussion above showed that the voicing cues, even
relatively weak ones, can be combined to take advantage of every cue that might be present.
Chapter 5
Summary and Discussions
This thesis examined the acoustic cues for the voicing contrast that exist in the vicinity of closure
and release landmarks of stop consonants. Preliminary data collected from four speakers on
isolated VCV and CVC utterances were analyzed for trends that prevailed across speakers. The
following sections summarize these results and suggest further study.
5.1 Voicing cues in isolated utterances
The voicing cues discussed in this thesis were determined from examining isolated CVC and
VCV utterances. The cues were chosen initially from knowledge gathered from the study of the
speech production mechanism, which involved matching the physiological activities with the
acoustic manifestations. The parameters chosen were the fundamental frequency, first formant
frequency, first and second harmonic amplitudes, first and third formant amplitudes,
time to onset of periodicity at the release, and time to offset of periodicity at the closure. A
measure of periodicity during the region immediately after the stop consonant closure was used,
and the same measure was used for determining the voice onset time at the release. The other
measures were relevant in inferring the stiffness and spread of the vocal folds at 50 ms before
closure and 40 ms after the onset of voicing. Data showed that the voice onset time (VOT), the
percentage of four 10-ms time frames that were not periodic at the closure (VOP),
and vowel duration were the most important cues in determining voicing, and that the measures
for spread and stiffness of the vocal folds can be secondary cues when the primary cues are
absent from an utterance. Analysis of the secondary cues for the VCV utterances shows that H1,
F0, F1, and H1-H2 can contribute to voicing identification at the release landmark. The same
analysis showed that H1-A1 and H1-A3 are poor measures of voicing, given the size of the data set
and the associated variability. When two voicing cues are combined, such as H1 and H1-H2,
they contribute more to voicing detection for stop consonants.
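The two primary timing cues summarized above can be sketched as simple operations on a frame-by-frame F0 track. This is an illustration under stated assumptions: frames are 10 ms apart, a value of 0 marks a frame with no detected periodicity, and VOP is taken as the percentage of the four frames after closure that are not periodic. The function names and sample tracks are hypothetical.

```python
FRAME_MS = 10  # assumed spacing between F0 frames, in milliseconds


def vop_percent(f0_after_closure, n_frames=4):
    """Percentage of the first n_frames after closure with no periodicity."""
    frames = f0_after_closure[:n_frames]
    if len(frames) < n_frames:
        raise ValueError("not enough frames after the closure landmark")
    unvoiced = sum(1 for f0 in frames if f0 == 0)
    return 100.0 * unvoiced / n_frames


def vot_ms(f0_after_release, frame_ms=FRAME_MS):
    """Time from the release to the first periodic frame, or None if none occurs."""
    for index, f0 in enumerate(f0_after_release):
        if f0 > 0:
            return index * frame_ms
    return None


# Hypothetical tracks (Hz): voicing dies out quickly after the closure and
# resumes 40 ms after the release.
after_closure = [105, 0, 0, 0]
after_release = [0, 0, 0, 0, 150, 148, 145]
print(vop_percent(after_closure), vot_ms(after_release))
```

A long VOT together with a high VOP would point toward an unvoiced stop in this scheme; the thresholds that separate the two classes are not specified here.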
5.2 Further work
The combination of one acoustic cue with H1, as shown in section 4.3, showed that two cues
together can be more robust at distinguishing voicing than either cue alone. More work needs to
concentrate on linear discriminant analysis or other methods to devise ways of best combining
these cues. This combination needs to take into account different speech environments where one
cue might be stronger than another or one cue might be missing altogether.
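To make the suggestion concrete, the following is a minimal sketch of a two-class Fisher linear discriminant applied to a pair of cues. It is not the method used in this thesis. The data points are hypothetical (MAX H1, MAX H1-H2) pairs invented for illustration, and the function and variable names are assumptions.

```python
def dot(u, v):
    """Dot product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def fit_fisher_lda(class_a, class_b):
    """Two-class Fisher discriminant for 2-D points.

    Returns (w, threshold); a point x is assigned to class_a when
    dot(w, x) > threshold.
    """
    def centroid(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    ma, mb = centroid(class_a), centroid(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = inverse(Sw) * (ma - mb), using the closed-form 2x2 inverse.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    threshold = 0.5 * (dot(w, ma) + dot(w, mb))
    return w, threshold


# Hypothetical (MAX H1 [dB], MAX H1-H2 [dB]) pairs at the release.
unvoiced = [(8.0, -3.0), (6.5, -2.0), (9.0, -4.0), (7.0, -2.5)]
voiced = [(-0.5, 0.5), (0.5, -0.3), (-1.0, 0.2), (0.2, 0.0)]

w, threshold = fit_fisher_lda(unvoiced, voiced)


def is_unvoiced(point):
    """Classify a cue pair with the fitted discriminant."""
    return dot(w, point) > threshold


print([is_unvoiced(p) for p in unvoiced + voiced])
```

The projection onto w reduces the two cues to a single score, so a missing or weak cue in a given speech environment would have to be handled separately, for example by fitting a discriminant per environment.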
This thesis work examined all parameters except F0 as they changed with time within an
utterance to draw conclusions based on the trend across all speakers. F0 was analyzed only at the
point of voice offset or the point of voice onset. Utterances were examined for only 70 ms after
the release. Voice onset time was on average 40 ms longer at the release of unvoiced stops than of
voiced stops. The long voice onset time for unvoiced stops frequently extended beyond the 70 ms
time frame used to capture change in the F0 parameter. This change was not monitored in this
thesis as a result of the short time frame allowed after the release landmark. The time frame after
the release should be extended so that the behavior of F0 can be more carefully analyzed.
The voice offset periodicity was defined in this thesis as the percentage of periodicity
detected by the XKL program over four 10 ms time frames. Whether XKL returns a value for F0
depends on factors such as the window size. The same window size would encompass more
pitch periods for female speakers, so it could be easier to detect F0 in female speech. Periodicity might
not be returned if each period is different within a time window as a result of changes in the
frequency, amplitude, and spectrum of each pulse. A more reliable way to measure periodicity
needs to be developed.
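The effect of window size can be illustrated with simple arithmetic: the number of pitch periods inside an analysis window is the window duration multiplied by F0. The sketch below assumes a hypothetical 25 ms window and round F0 values; neither is taken from the XKL settings used in this thesis.

```python
def periods_in_window(f0_hz, window_ms):
    """Number of pitch periods that fit in an analysis window."""
    return f0_hz * window_ms / 1000.0


WINDOW_MS = 25  # hypothetical analysis window length

male_periods = periods_in_window(110, WINDOW_MS)    # lower, male-range F0
female_periods = periods_in_window(210, WINDOW_MS)  # higher, female-range F0
print(male_periods, female_periods)
```

With roughly twice as many periods in the same window, a detector has more repetitions to match in the higher-pitched speech, which is one way to read the expectation stated above.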
The voicing cues discussed in this thesis are found in relatively simple contexts, such as
consonant-vowel (CV) or vowel-consonant (VC) utterances. By limiting context, it was possible
to examine patterns in stop consonant voicing. The cues apply in contexts containing stressed
vowels such as in the word /cut/. When the stop consonant occurs in more complex contexts such
as casually spoken sentences, however, those cues may not be consistently present in the sound.
Depending on factors such as noise and speaker modification, the relative importance of these
cues will vary. For example, in noisy speech, the primary voicing cue in citation speech, VOT,
may not be very apparent. An example of variability is the change from the voiceless and
aspirated /t/ in "top" to the voiced and unaspirated /t/ in "stop". A similar change occurs in /p/
from "pot" to "spot". These modifications could be the result of overlapping articulatory gestures
for voicelessness which obscure voicing or vice versa. Another example of modification is a
blending of the place of articulation of a syllable-final alveolar noncontinuant consonant with
that of a following consonant, as in the sequence "note book" (Stevens, 1996). The /t/ is
glottalized and the /o/ is shortened in this sequence. Other examples of variability include the
effect of flaps as in the word "writer". The /t/ in "writer" has been flapped so that it is difficult to
distinguish from the /d/ in "rider". Effects of vowel duration also need to be examined. For
instance, the second vowel in "paper" is a reduced vowel and this causes the second /p/ to be
produced unaspirated.
Despite these kinds of variability, a native speaker of English can still rely on some set of
clues to help extract what has been heard. A speech recognizer must be able to do the same. A
goal for further research would be to determine which voicing cues that apply in citation speech
would still apply in continuous speech and which new cues, if any, should be considered.
The utterances used in this thesis were recorded by 4 speakers, 2 female and 2 male. To
make sure the observations made in this thesis are true in general, data from more speakers need to
be examined in order to account for particular speaker differences such as breathiness. The
variability of articulation rate should be controlled by training speakers beforehand so that this
variability does not contribute as much to the standard error of the vowel duration. Data from
more speakers are needed to provide a more meaningful conclusion about each measurement.
Bibliography
[1] Blumstein, S. E., K. N. Stevens, L. Glicksman, M. Burton, and K. Kurowski. Acoustic and
perceptual characteristics of voicing in fricatives and fricative clusters. Journal of the Acoustical
Society of America, Vol. 91, 2979-3000, 1992.
[2] Choi, J. Y. Detection of consonant voicing: A module for a hierarchical speech recognition
system. Ph.D. thesis, MIT, 1999.
[3] Chomsky, N. and M. Halle. The sound pattern of English. New York, Harper and Row,
1968.
[4] Crystal, T. H. and A. S. House. Articulation rate and the duration of syllables and stress groups
in connected speech. Journal of the Acoustical Society of America, Vol. 88, 101-112, 1990.
[5] Duda, Richard. Pattern classification and scene analysis. New York, Wiley, 1973.
[6] Hanson, H. M. and E. S. Chuang. Glottal characteristics of male speakers: Acoustic correlates
and comparison with female data. Journal of the Acoustical Society of America, Vol. 106,
1064-1077, 1999.
[7] House, A. S. On vowel duration in English. Journal of the Acoustical Society of America,
Vol. 33, 1174-1178, 1961.
[8] Klatt, Dennis. Linguistic uses of segmental duration in English: Acoustic and perceptual
evidence. Journal of the Acoustical Society of America, Vol. 59, 1208-1221, 1976.
[9] Klatt, Dennis. MIT SpeechVAX user's guide, Internal Memorandum, 1984.
[10] Klatt, Dennis. Software for a cascade/parallel formant synthesizer. Journal of the
Acoustical Society of America, Vol. 67, 971-995, 1980.
[11] Lindblom, B. Spectrographic study of vowel reduction. Journal of the Acoustical Society of
America, Vol. 35, 1773-1781, 1963.
[12] Lisker, L. and A. S. Abramson. A cross-language study of voicing in initial stops: Acoustical
measurements. Word, Vol. 20, 384-422, 1964.
[13] Manuel, S. Y. and K. N. Stevens. Revisiting place of articulation measures for stop
consonants: Implications for models of consonant production. ICPhS, San Francisco, 1999.
[14] Nigrin, A. Neural Networks for Pattern Recognition. MIT Press, 1993.
[15] Ohde, R. N. Fundamental frequency as an acoustic correlate of stop consonant voicing.
Journal of the Acoustical Society of America, Vol. 33, 224-230, 1984.
[16] S. K. Keyser and K. N. Stevens. Feature geometry and the vocal tract. Phonology, Vol. 11,
207-236, 1994.
[17] Stevens, K. N. Acoustic Phonetics. MIT Press, 1998.
[18] Stevens, K. N. On the quantal nature of speech. Journal of Phonetics, Vol. 17, 3-46, 1989.
[19] Stevens, K. N. Applying phonetic knowledge to lexical access. 4th European Conference on
Speech Communication and Technology, Madrid, Spain, Vol. 1, 3-11, 1995.
[20] Stevens, K. N. Understanding variability in speech: A requisite for advances in speech
synthesis and recognition. ASA-ASJ Third Joint Meeting, 1996.
[21] Stevens, K. N. and A. S. House. Perturbation of vowel articulations by consonant context:
An acoustical study. Journal of Speech and Hearing Research, Vol. 6, 111-128, 1963.
Appendix A
[Appendix A figures: plots around the closure and release landmarks, one set of panels per utterance pair and speaker, with time in ms on the horizontal axis; voiced utterances are shown as filled circles and unvoiced utterances as open circles. Panel titles recoverable from the original:
CVC utterances: voiced bahb vs. unvoiced pahp (speakers as, ld, ss); voiced dahd vs. unvoiced taht (speakers as, ds, ld, ss); voiced gahg vs. unvoiced kahk (speakers as, ld).
VCV utterances: voiced ahbah vs. unvoiced ahpah (speakers as, ds, ld, ss); voiced ahdah vs. unvoiced ahtah (speakers as, ds, ld, ss); voiced ahgah vs. unvoiced ahkah (speakers as, ds, ld, ss).]
Appendix B
F0 [Hz] data from speakers as, ds, ld, and ss that were used to plot Figure 4.2.1a. vc: voiced
closure, uc: unvoiced closure, vr: voiced release, ur: unvoiced release.
VCV utterances
Utterance  Speaker   vc   uc   vr   ur
ahbah      as        93  104  123  105
ahgah      as       104  101  114  146
ahdah      as       100  100  120  137
ahbah      ds       108  128  119  157
ahgah      ds       106  112  120  140
ahdah      ds       110  107  112  146
ahbah      ld       198  201  214  215
ahgah      ld       197  211  229  248
ahdah      ld       197  212  229  239
ahbah      ss       169  181  243  255
ahgah      ss       180  199  198  254
ahdah      ss       165  191  208  243
CVC utterances
Utterance  Speaker   vc   uc   vr   ur
bahb       as        87   94  110  122
gahg       as       105   97  108  113
dahd       as       104   98  108  142
bahb       ds       100  101  127  115
gahg       ds        95  100  121  160
dahd       ds        94   95  119  137
bahb       ld       179  197  228  245
gahg       ld       175  181  228  252
dahd       ld       166  168  226  223
bahb       ss       151  160  201  271
gahg       ss       163  188  213  275
dahd       ss       162  164  207  299
F0 [Hz], H1 [dB], F1 [Hz], H1-H2 [dB], H1-A1 [dB], and H1-A3 [dB] data used to plot Figures
4.2a and 4.2b. For each entry, the first number is the Max value and the second is the Diff
value. vr: voiced release, vc: voiced closure, ur: unvoiced release, uc: unvoiced closure.
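The two-number layout described above can be read mechanically. The following is a minimal sketch, not the author's tooling: it assumes each row has the form `measure_condition: max diff` (for example `F0_ur: 146.0000 129.0000`), and `parse_max_diff` is a hypothetical helper name. The sample rows are taken from the Male Speaker AS ahgah/ahkah listing in this appendix; rows that do not have exactly two numbers are skipped rather than guessed.

```python
# Sample rows: Male Speaker AS, ahgah/ahkah, copied from this appendix.
SAMPLE = """\
Male Speaker AS ahgah/ahkah
F0_vr: 114.0000 102.0000
F0_vc: -2.0000 -3.0000
F0_ur: 146.0000 129.0000
F0_uc: -2.0000 -2.0000
H1_A1_vr: -1.6000 -0.4000
H1_A1_ur: 7.0000 -0.2000
"""


def parse_max_diff(text):
    """Return {(measure, condition): (max_value, diff_value)}."""
    rows = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # headings such as the speaker line carry no data
        label, values = line.split(":", 1)
        label = label.strip()
        numbers = values.split()
        if "_" not in label or len(numbers) != 2:
            continue  # malformed row: skip it instead of guessing
        # The condition (vr, vc, ur, uc) is the part after the last underscore.
        measure, condition = label.rsplit("_", 1)
        rows[(measure, condition)] = (float(numbers[0]), float(numbers[1]))
    return rows


if __name__ == "__main__":
    rows = parse_max_diff(SAMPLE)
    f0_ur_max, _ = rows[("F0", "ur")]
    f0_vr_max, _ = rows[("F0", "vr")]
    print("F0 Max, unvoiced minus voiced release:", f0_ur_max - f0_vr_max)
```

Splitting the label at the last underscore keeps compound measure names such as `H1_A1` intact while still separating the condition code.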
VCV utterances
Male Speaker AS ahbah/ahpah
F0_vr: -8.0000 -25.0000
F0_vc: -2.0000 -3.0000
F0_ur: 105.0000 103.0000
F0_uc: -2.0000 -3.0000
H1_vr: -0.4000 -0.6000
H1_vc: -1.6000 -2.0000
H1_ur: 0.9000 1.3000
H1_uc: -1.8000 -1.9000
F1_vr: 22.0000 39.0000
F1_vc: -58.0000 -117.0000
F1_ur: -20.0000 -20.0000
F1_uc: -39.0000 -98.0000
H1H2_vr: -0.3000 -0.4000
H1H2_vc: 1.1000 1.7000
H1H2_ur: -2.6000 -3.7000
H1H2_uc: 1.0000 2.3000
H1_A1_vr: -0.8000 -1.7000
H1_A1_vc: 1.1000 3.3000
H1_A1_ur: -5.2000 -6.8000
H1_A1_uc: 1.1000 2.2000
H1A3_vr: -0.7000 -1.2000
H1A3_vc: 5.5000 9.5000
H1A3_ur: -3.8000 -3.9000
H1A3_uc: -3.4000 -7.5000
Male Speaker DS ahbah/ahpah
F0_vr: -4.0000 -6.0000
F0_vc: -108.0000 -111.0000
F0_ur: 157.0000 157.0000
F0_uc: 1.0000 1.0000
H1_vr: -0.4000 -1.1000
H1_vc: -1.1000 -2.1000
H1_ur: 8.1000 18.5000
H1_uc: -1.9000 -2.0000
F1_vr: 20 20
F1_vc: -58.0000 -137.0000
F1_ur: -58.0000 -117.0000
F1_uc: -39.0000 -79.0000
H1H2_vr: -0.3000 -0.6000
H1H2_vc: -0.7000 -0.8000
H1H2_ur: -1.9000 -1.9000
H1H2_uc: 1.8000 2.5000
H1_A1_vr: -1.0000 -2.3000
H1_A1_vc: 1.3000 -0.2000
H1_A1_ur: -13.2000 -5.3000
H1_A1_uc: 1.7000 2.5000
H1A3_vr: -1.3000 -1.7000
H1A3_vc: 5.2000 7.8000
H1A3_ur: 8.9000 14.3000
H1A3_uc: 3.2000 4.4000
Female Speaker LD ahbah/ahpah
F0_vr: 95.0000 93.0000
F0_vc: 2.0000 -1.0000
F0_ur: 26.0000 30.0000
F0_uc: -5.0000 -10.0000
H1_vr: -0.3000 -0.4000
H1_vc: -0.5000 0.4000
H1_ur: -0.5000 -0.6000
H1_uc: -1.7000 -2.4000
F1_vr: 156.0000 176.0000
F1_vc: -19.0000 -19.0000
F1_ur: -20 -20
F1_uc: -175.0000 -195.0000
H1H2_vr: 1.0000 1.3000
H1H2_vc: 1.0000 -0.1000
H1H2_ur: -10.7000 -2.2000
H1H2_uc: 1.3000 0.9000
H1_A1_vr: 1.2000 0.6000
H1_A1_vc: 2.6000 3.3000
H1_A1_ur: 0.4000 0.5000
H1_A1_uc: 0.9000 1.7000
H1A3_vr: -1.4000 -0.3000
H1A3_vc: 2.8000 7.0000
H1A3_ur: 1.9000 2.2000
H1A3_uc: 2.6000 6.5000
Female Speaker SS ahbah/ahpah
F0_vr: 243.0000 197.0000
F0_vc: 2 6
F0_ur: 255.0000 238.0000
F0_uc: 3.0000 3.0000
H1_vr: -0.2000 -0.3000
H1_vc: -0.9000 -0.4000
H1_ur: 6.0000 1.3000
H1_uc: -1.7000 -0.6000
F1_vr: 137.0000 156.0000
F1_vc: -117.0000 -195.0000
F1_ur: 39.0000 19.0000
F1_uc: -117.0000 -196.0000
H1H2_vr: -0.5000 -0.8000
H1H2_vc: -3.7000 -7.9000
H1H2_ur: -4.2000 -6.1000
H1H2_uc: -2.0000 0.3000
H1_A1_vr: -0.9000 -1.4000
H1_A1_vc: 1.4000 0.3000
H1_A1_ur: -11.4000 -19.8000
H1_A1_uc: 2.2000 4.3000
H1A3_vr: 1.4000 3.5000
H1A3_vc: 3.3000 6.6000
H1A3_ur: -15.6000 -9.2000
H1A3_uc: 3.3000 8.1000
Male Speaker AS ahgah/ahkah
F0_vr: 114.0000 102.0000
F0_vc: -2.0000 -3.0000
F0_ur: 146.0000 129.0000
F0_uc: -2.0000 -2.0000
H1_vr: -0.3000 -0.4000
H1_vc: -1.0000 -1.3000
H1_ur: 8.6000 18.4000
H1_uc: -1.6000 -1.1000
F1_vr: 156.0000 195.0000
F1_vc: -215.0000 -313.0000
F1_ur: -90.0000 -156.0000
F1_uc: -391.0000 -489.0000
H1H2_vr: 0.3000 0.5000
H1H2_vc: -0.9000 -1.4000
H1H2_ur: -2.1000 -3.0000
H1H2_uc: 0.9000 2.6000
H1_A1_vr: -1.6000 -0.4000
H1_A1_vc: -2.1000 -2.1000
H1_A1_ur: 7.0000 -0.2000
H1_A1_uc: 2.9000 6.4000
H1A3_vr: -2.8000 -3.6000
H1A3_vc: 3.7000 5.7000
H1A3_ur: 7.7000 15.3000
H1A3_uc: 4.6000 9.9000
Male Speaker DS ahgah/ahkah
F0_vr: 120.0000 110.0000
F0_vc: -107.0000 -108.0000
F0_ur: 140.0000 124.0000
F0_uc: 3.0000 4.0000
H1_vr: -0.9000 -1.4000
H1_vc: 0.6000 0.4000
H1_ur: 5.5000 3.9000
H1_uc: 2.0000 3.8000
F1_vr: 39.0000 79.0000
F1_vc: -156.0000 -312.0000
F1_ur: -100.0000 18.0000
F1_uc: -137.0000 -274.0000
H1H2_vr: 0.4000 0
H1H2_vc: 0.8000 1.4000
H1H2_ur: -4.2000 -9.6000
H1H2_uc: 2.0000 7.1000
H1_A1_vr: -1.8000 -2.4000
H1_A1_vc: 1.6000 4.1000
H1_A1_ur: -12.5000 -19.4000
H1_A1_uc: 3.3000 11.0000
H1A3_vr: -0.7000 -0.9000
H1A3_vc: 2.3000 3.6000
H1A3_ur: -9.8000 -7.7000
H1A3_uc: 2.2000 7.0000
Female Speaker LD ahgah/ahkah
F0_vr: 229.0000 222.0000
F0_vc: 3.0000 8.0000
F0_ur: 248 248
F0_uc: -6.0000 -6.0000
H1_vr: -0.2000 -0.3000
H1_vc: -0.4000 0.3000
H1_ur: 8.2000 12.0000
H1_uc: -3.4000 -4.2000
F1_vr: -20 -20
F1_vc: -411.0000 -411.0000
F1_ur: 39.0000 78.0000
F1_uc: -430.0000 -449.0000
H1H2_vr: 0.3000 0.5000
H1H2_vc: 3.0000 3.6000
H1H2_ur: -0.7000 -1.7000
H1H2_uc: 3.0000 6.9000
H1_A1_vr: -3.9000 -6.1000
H1_A1_vc: 5.4000 7.5000
H1_A1_ur: -8.9000 -13.7000
H1_A1_uc: 4.0000 3.6000
H1A3_vr: -1.7000 -1.9000
H1A3_vc: 6.7000 10.2000
H1A3_ur: 7.2000 0.9000
H1A3_uc: 6.4000 16.6000
Female Speaker SS ahgah/ahkah
F0_vr: 103.0000 193.0000
F0_vc: -8.0000 -9.0000
F0_ur: 254.0000 254.0000
F0_uc: 1.0000 1.0000
H1_vr: -0.2000 -0.4000
H1_vc: -1.5000 -3.2000
H1_ur: 21.5000 30.6000
H1_uc: -1.9000 0
F1_vr: 0 0
F1_vc: -332.0000 -273.0000
F1_ur: -125.0000 -86.0000
F1_uc: -157.0000 -195.0000
H1H2_vr: 0.5000 1.4000
H1H2_vc: -2.3000 -5.8000
H1H2_ur: 2.2000 0.6000
H1H2_uc: 12.6000 18.8000
H1_A1_vr: -3.2000 -5.5000
H1_A1_vc: 2.8000 1.6000
H1_A1_ur: 13.7000 6.2000
H1_A1_uc: 3.4000 9.2000
H1A3_vr: -2.6000 -5.8000
H1A3_vc: 4.7000 8.2000
H1A3_ur: 21.2000 30.1000
H1A3_uc: 4.8000 9.8000
Male Speaker AS ahdah/ahtah
F0_vr: 183.0000 101.0000
F0_vc: -1.0000 -1.0000
F0_ur: 137.0000 137.0000
F0_uc: -2.0000 0
H1_vr: 0.3000 0.5000
H1_vc: -1.1000 -1.3000
H1_ur: 9.3000 17.7000
H1_uc: -2.3000 -2.5000
F1_vr: 20.0000 58.0000
F1_vc: -58.0000 -156.0000
F1_ur: 137.0000 117.0000
F1_uc: -39.0000 -136.0000
H1H2_vr: -0.5000 -0.5000
H1H2_vc: -0.2000 -0.2000
H1H2_ur: -6.3000 -0.7000
H1H2_uc: 0.4000 1.3000
H1_A1_vr: 1.9000 1.0000
H1_A1_vc: -0.3000 -0.1000
H1_A1_ur: 11.6000 11.4000
H1_A1_uc: 1.6000 3.4000
H1A3_vr: -2.2000 -1.3000
H1A3_vc: 1.5000 3.5000
H1A3_ur: 7.5000 15.7000
H1A3_uc: 3.6000 7.8000
Male Speaker DS ahdah/ahtah
F0_vr: 112.0000 107.0000
F0_vc: 1.0000 0
F0_ur: 146.0000 140.0000
F0_uc: -107.0000 -115.0000
H1_vr: -0.3000 -0.2000
H1_vc: -1.7000 -2.1000
H1_ur: 6.3000 8.7000
H1_uc: 2.3000 1.4000
F1_vr: 39.0000 59.0000
F1_vc: -58.0000 -137.0000
F1_ur: -78.0000 -59.0000
F1_uc: -109.0000 -188.0000
H1H2_vr: 0.4000 0.2000
H1H2_vc: -0.3000 -0.4000
H1H2_ur: -2.4000 -6.4000
H1H2_uc: 8.6000 6.0000
H1_A1_vr: 1.4000 2.4000
H1_A1_vc: -1.4000 -0.7000
H1_A1_ur: -7.0000 -7.5000
H1_A1_uc: 4.6000 13.4000
H1A3_vr: 2.1000 5.4000
H1A3_vc: -3.4000 -9.4000
H1A3_ur: -18.0000 -8.5000
H1A3_uc: 4.0000 8.9000
Female Speaker LD ahdah/ahtah
F0_vr: 229.0000 218.0000
F0_vc: 1.0000 0
F0_ur: 239.0000 239.0000
F0_uc: 2.0000 1.0000
H1_vr: -0.2000 -0.4000
H1_vc: -0.8000 -0.4000
H1_ur: 13.6000 23.8000
H1_uc: -2.1000 -2.2000
F1_vr: 0 0
F1_vc: -352.0000 -372.0000
F1_ur: -59.0000 -136.0000
F1_uc: 0 0
H1H2_vr: 0.7000 1.4000
H1H2_vc: 1.9000 1.0000
H1H2_ur: -1.9000 -3.5000
H1H2_uc: 1.3000 2.2000
H1_A1_vr: -3.6000 -5.3000
H1_A1_vc: 2.5000 6.3000
H1_A1_ur: 11.8000 0.8000
H1_A1_uc: 1.4000 2.9000
H1A3_vr: -3.2000 -4.8000
H1A3_vc: 1.5000 1.2000
H1A3_ur: 15.3000 21.8000
H1A3_uc: 3.3000 10.4000
Female Speaker SS ahdah/ahtah
F0_vr: 208.0000 193.0000
F0_vc: 4.0000 7.0000
F0_ur: -50.0000 -54.0000
F0_uc: 2.0000 3.0000
H1_vr: -0.4000 -0.8000
H1_vc: -2.4000 -2.3000
H1_ur: 4.2000 -1.5000
H1_uc: -2.4000 -3.7000
F1_vr: 0 0
F1_vc: -312.0000 -391.0000
F1_ur: 69.0000 30.0000
F1_uc: -137.0000 -215.0000
H1H2_vr: -0.3000 -0.7000
H1H2_vc: -0.8000 -1.1000
H1H2_ur: -5.8000 -10.2000
H1H2_uc: 3.9000 5.8000
H1_A1_vr: -4.7000 -2.3000
H1_A1_vc: 10.8000 13.7000
H1_A1_ur: -11.1000 -14.6000
H1_A1_uc: 2.6000 4.2000
H1A3_vr: -2.3000 -3.6000
H1A3_vc: 4.5000 8.7000
H1A3_ur: -7.5000 -8.3000
H1A3_uc: 1.9000 2.5000
CVC utterances
bahb/pahp Male Speaker AS
F0_vr: 110.0000 102.0000
F0_vc: -5.0000 -8.0000
F0_ur: 122.0000 122.0000
F0_uc: -56.0000 -102.0000
H1_vr: -0.5000 -0.2000
H1_vc: -2.2000 -6.7000
H1_ur: 10.1000 7.8000
H1_uc: -2.2000 -6.7000
F1_vr: 78.0000 137.0000
F1_vc: -293.0000 -351.0000
F1_ur: -97.0000 -39.0000
F1_uc: -117.0000 -157.0000
H1H2_vr: 0.3000 0.5000
H1H2_vc: 0.3000 0.8000
H1H2_ur: 8.3000 4.1000
H1H2_uc: 6.0000 12.6000
H1_A1_vr: 1.3000 0.7000
H1_A1_vc: -0.8000 -0.7000
H1_A1_ur: 10.7000 7.0000
H1_A1_uc: 6.6000 7.1000
H1A3_vr: -1.2000 -2.0000
H1A3_vc: 5.6000 8.4000
H1A3_ur: 12.3000 10.1000
H1A3_uc: 15.0000 17.7000
bahb/pahp Male Speaker DS
F0_vr: 127.0000 131.0000
F0_vc: -6.0000 -19.0000
F0_ur: 68.0000 115.0000
F0_uc: -101.0000 -101.0000
H1_vr: -1.1000 -1.0000
H1_vc: -2.3000 -6.7000
H1_ur: -5.5000 -6.6000
H1_uc: -2.3000 -6.7000
F1_vr: 20.0000 20.0000
F1_vc: -39.0000 -78.0000
F1_ur: -59.0000 -19.0000
F1_uc: -19.0000 -19.0000
H1H2_vr: -1.1000 -0.5000
H1H2_vc: -1.4000 -4.2000
H1H2_ur: -5.8000 -14.2000
H1H2_uc: 17.7000 19.7000
H1_A1_vr: 0.5000 0.6000
H1_A1_vc: -4.7000 -6.1000
H1_A1_ur: -12.1000 -27.3000
H1_A1_uc: 7.6000 16.4000
H1A3_vr: -1.6000 -2.5000
H1A3_vc: -2.4000 -0.7000
H1A3_ur: -12.6000 -27.5000
H1A3_uc: 7.6000 16.4000
bahb/pahp Female Speaker LD
F0_vr: 228.0000 232.0000
F0_vc: -4.0000 -10.0000
F0_ur: 245.0000 242.0000
F0_uc: 18.0000 13.0000
H1_vr: -0.4000 -0.2000
H1_vc: -1.3000 -2.5000
H1_ur: 1.8000 1.7000
H1_uc: -1.3000 -2.5000
F1_vr: 19.0000 19.0000
F1_vc: -59.0000 -98.0000
F1_ur: 58.0000 39.0000
F1_uc: 19.0000 19.0000
H1H2_vr: 0.8000 1.6000
H1H2_vc: -1.8000 -2.9000
H1H2_ur: -1.0000 -1.9000
H1H2_uc: 3.6000 2.7000
H1_A1_vr: 0.7000 1.7000
H1_A1_vc: 5.7000 4.0000
H1_A1_ur: -5.9000 -12.9000
H1_A1_uc: -2.0000 2.1000
H1A3_vr: -0.8000 -2.3000
H1A3_vc: 4.3000 8.2000
H1A3_ur: -6.3000 -10.6000
H1A3_uc: 8.7000 8.1000
bahb/pahp Female Speaker SS
F0_vr: 201.0000 184.0000
F0_vc: -7.0000 -17.0000
F0_ur: 270.0000 224.0000
F0_uc: -31.0000 -91.0000
H1_vr: -0.8000 -2.2000
H1_vc: 1.1000 0
H1_ur: -3.9000 -4.0000
H1_uc: 1.1000 0
F1_vr: -19.0000 -19.0000
F1_vc: -40.0000 -78.0000
F1_ur: 50.0000 50.0000
F1_uc: -39.0000 -39.0000
H1H2_vr: -1.9000 -3.5000
H1H2_vc: 1.4000 3.3000
H1H2_ur: 3.4000 0.3000
H1H2_uc: -3.2000 -4.6000
H1_A1_vr: -2.7000 -6.1000
H1_A1_vc: 3.0000 8.0000
H1_A1_ur: -6.6000 -11.0000
H1_A1_uc: -3.1000 -6.4000
H1A3_vr: -1.0000 -0.8000
H1A3_vc: 3.2000 7.1000
H1A3_ur: -5.3000 -14.7000
H1A3_uc: -5.5000 -10.3000
gahg/kahk Male Speaker AS
F0_vr: 108.0000 101.0000
F0_vc: -100 -100 40
F0_ur: 113.0000 113.0000
F0_uc: -1.0000 -2.0000
H1_vr: 1.8000 2.4000
H1_vc: -1.5000 -2.4000
H1_ur: 13.1000 14.9000
H1_uc: -1.5000 -2.4000
F1_vr: 98.0000 98.0000
F1_vc: -156.0000 -235.0000
F1_ur: -98.0000 -157.0000
F1_uc: 20.0000 -19.0000
H1H2_vr: -1.5000 -2.0000
H1H2_vc: -0.2000 -0.6000
H1H2_ur: -8.0000 -4.7000
H1H2_uc: 11.3000 4.5000
H1_A1_vr: -2.4000 -2.9000
H1_A1_vc: -1.4000 -1.6000
H1_A1_ur: -12.8000 0.4000
H1_A1_uc: 10.6000 10.3000
H1A3_vr: -1.3000 -1.4000
H1A3_vc: 3.0000 7.6000
H1A3_ur: 13.3000 8.3000
H1A3_uc: 12.3000 7.1000
gahg/kahk Male Speaker DS
F0_vr: 121.0000 104.0000
F0_vc: -95.0000 -7.0000
F0_ur: 100.0000 145.0000
F0_uc: 100.0000 0
H1_vr: 0.6000 0.1000
H1_vc: -1.2000 -3.3000
H1_ur: 8.1000 10.3000
H1_uc: -1.2000 -3.3000
F1_vr: 0 0 0
F1_vc: -59.0000 -215.0000
F1_ur: -164.0000 -254.0000
F1_uc: -39.0000 -59.0000
H1H2_vr: 0.5000 0.8000
H1H2_vc: -1.7000 -1.8000
H1H2_ur: -1.7000 -3.4000
H1H2_uc: 2.1000 1.6000
H1_A1_vr: 1.4000 2.5000
H1_A1_vc: 2.2000 4.3000
H1_A1_ur: -8.1000 -17.6000
H1_A1_uc: -4.0000 -5.8000
H1A3_vr: -1.6000 0.4000
H1A3_vc: 3.8000 7.6000
H1A3_ur: 9.0000 -0.7000
H1A3_uc: -7.1000 -9.8000
gahg/kahk Female Speaker LD
F0_vr: 228.0000 101.0000
F0_vc: 1.0000 1.0000
F0_ur: -91.0000 -91.0000
F0_uc: -43.0000 -66.0000
H1_vr: -0.3000 -0.7000
H1_vc: -1.6000 -1.0000
H1_ur: 11.7000 19.5000
H1_uc: -1.6000 -1.0000
F1_vr: 0 0
F1_vc: -371.0000 -371.0000
F1_ur: 58.0000 39.0000
F1_uc: -39.0000 -97.0000
H1H2_vr: -5.7000 -5.4000
H1H2_vc: -1.6000 -0.4000
H1H2_ur: 3.4000 1.1000
H1H2_uc: -2.3000 -6.5000
H1_A1_vr: -3.7000 -7.1000
H1_A1_vc: 8.2000 18.7000
H1_A1_ur: -5.3000 -8.6000
H1_A1_uc: -2.3000 -3.5000
H1A3_vr: -1.1000 -1.9000
H1A3_vc: 9.5000 11.6000
H1A3_ur: 7.2000 1.8000
H1A3_uc: -4.0000 -9.1000
gahg/kahk Female Speaker SS
F0_vr: 213.0000 193.0000
F0_vc: -4.0000 -3.0000
F0_ur: 275.0000 275.0000
F0_uc: -169.0000 -169.0000
H1_vr: -1.3000 -1.6000
H1_vc: 1.5000 1.8000
H1_ur: 3.7000 5.4000
H1_uc: 1.5000 1.8000
F1_vr: -20.0000 -20.0000
F1_vc: -176.0000 -351.0000
F1_ur: 19.0000 33.0000
F1_uc: -39.0000 -58.0000
H1H2_vr: -2.1000 -3.1000
H1H2_vc: 3.3000 2.1000
H1H2_ur: -1.7000 -4.1000
H1H2_uc: 4.0000 0.7000
H1_A1_vr: -4.9000 -3.4000
H1_A1_vc: 4.2000 10.0000
H1_A1_ur: -4.2000 -6.8000
H1_A1_uc: -4.5000 -0.1000
H1A3_vr: -1.4000 -0.6000
H1A3_vc: 8.6000 17.4000
H1A3_ur: -8.1000 -2.7000
H1A3_uc: -4.1000 -3.1000
dahd/taht Male Speaker AS
F0_vr: 108.0000 98.0000
F0_vc: -3.0000 -3.0000
F0_ur: 142.0000 142.0000
F0_uc: -6.0000 -5.0000
H1_vr: -0.6000 -1.4000
H1_vc: 6.3000 5.4000
H1_ur: 8.0000 9.7000
H1_uc: 6.3000 5.4000
F1_vr: 39.0000 78.0000
F1_vc: -195.0000 -253.0000
F1_ur: 83.0000 156.0000
F1_uc: -39.0000 0
H1H2_vr: 0.6000 0.8000
H1H2_vc: 14.4000 16.2000
H1H2_ur: 1.3000 1.3000
H1H2_uc: 13.6000 14.4000
H1_A1_vr: 1.4000 0.8000
H1_A1_vc: 8.1000 9.6000
H1_A1_ur: 6.2000 8.1000
H1_A1_uc: 11.9000 14.1000
H1A3_vr: 1.4000 1.2000
H1A3_vc: 8.9000 11.5000
H1A3_ur: 8.0000 3.7000
H1A3_uc: 10.0000 17.6000
dahd/taht Male Speaker DS
F0_vr: 119.0000 100.0000
F0_vc: 89.0000 -3.0000
F0_ur: 137.0000 138.0000
F0_uc: 76.0000 0
H1_vr: -1.5000 -3.1000
H1_vc: -4.1000 -7.1000
H1_ur: 6.1000 8.0000
H1_uc: -4.1000 -7.1000
F1_vr: 39.0000 19.0000
F1_vc: -20.0000 -39.0000
F1_ur: 29.0000 39.0000
F1_uc: -39.0000 -78.0000
H1H2_vr: -0.7000 -0.9000
H1H2_vc: -3.4000 -5.7000
H1H2_ur: -2.8000 -5.6000
H1H2_uc: 4.9000 13.5000
H1_A1_vr: 1.1000 1.9000
H1_A1_vc: -2.9000 -5.6000
H1_A1_ur: -11.3000 -18.0000
H1_A1_uc: 6.0000 -0.7000
H1A3_vr: -1.3000 -1.6000
H1A3_vc: -4.0000 -5.6000
H1A3_ur: 4.9000 -0.5000
H1A3_uc: 6.0000 -1.1000
dahd/taht Female Speaker LD
F0_vr: 226.0000 220.0000
F0_vc: -3.0000 -5.0000
F0_ur: 223.0000 223.0000
F0_uc: -178.0000 -188.0000
H1_vr: -0.1000 -0.2000
H1_vc: -1.3000 -4.0000
H1_ur: 6.1000 13.7000
H1_uc: -1.3000 -4.0000
F1_vr: 0 0
F1_vc: -20.0000 -39.0000
F1_ur: -98.0000 -127.0000
F1_uc: -20.0000 -39.0000
H1H2_vr: 1.1000 2.5000
H1H2_vc: 12.8000 12.4000
H1H2_ur: 3.6000 4.2000
H1H2_uc: 6.6000 2.8000
H1_A1_vr: -3.2000 -4.0000
H1_A1_vc: 4.3000 7.2000
H1_A1_ur: -4.5000 1.2000
H1_A1_uc: 4.0000 0.1000
H1A3_vr: 2.4000 1.1000
H1A3_vc: 1.0000 0.7000
H1A3_ur: 6.8000 16.5000
H1A3_uc: 7.6000 4.0000
dahd/taht Female Speaker SS
F0_vr: -4.0000 -6.0000
F0_vc: -8.0000 -26.0000
F0_ur: 278.0000 278.0000
F0_uc: -164.0000 -179.0000
H1_vr: -0.4000 -1.0000
H1_vc: -1.4000 -3.8000
H1_ur: 7.8000 8.4000
H1_uc: -1.4000 -3.8000
F1_vr: 137.0000 137.0000
F1_vc: -39.0000 -117.0000
F1_ur: -210.0000 -341.0000
F1_uc: -39.0000 -39.0000
H1H2_vr: -1.5000 -2.8000
H1H2_vc: 1.3000 0.4000
H1H2_ur: -3.4000 -6.6000
H1H2_uc: -23.9000 -22.9000
H1_A1_vr: 2.5000 3.0000
H1_A1_vc: 3.0000 5.0000
H1_A1_ur: -7.2000 -13.2000
H1_A1_uc: -12.8000 -18.4000
H1A3_vr: 0.5000 -0.3000
H1A3_vc: 3.3000 3.6000
H1A3_ur: -8.6000 0.3000
H1A3_uc: -4.5000 -10.1000