A comparison of the contributions of two vocal characteristics

Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice
Coleman, R. O.

journal: STL-QPSR
volume: 14
number: 2-3
year: 1973
pages: 013-022
http://www.speech.kth.se/qpsr
II. SPEECH PERCEPTION
A. A COMPARISON OF THE CONTRIBUTIONS OF TWO VOCAL CHARACTERISTICS TO THE PERCEPTION OF MALENESS AND FEMALENESS IN THE VOICE*
R. O. Coleman
Abstract
Two experiments were carried out in which a comparison was made between the contributions of F0 on one hand, and vocal tract resonances on the other, to a perception of maleness and femaleness in the adult voice. In the first experiment, in which natural voice was used, the frequency of the F0 was found to be very highly correlated (rs = .94) with the degree of maleness and femaleness in the voice. The vocal tract resonances were less highly correlated and it is apparent that in the presence of the natural laryngeal tone, these perceptions are based on the frequency of the F0.

In the second experiment a tone produced by a laryngeal vibrator was substituted for the normal glottal tone at simulated F0's representing both males (120 Hz) and females (240 Hz). When listeners were asked to identify the sex of the speakers some inconsistency with the findings of the first experiment was seen. The female F0 was a weak indicator of female voice quality when combined with male vocal tract resonance although the male F0 retained the perceptual prominence seen in the first experiment. This finding may be indicative of some basic difference in the normal glottal characteristics of males and females.
Introduction
It has long been recognized that the shorter overall vocal tract length in females results in an upward shift in the frequency of female vowel formants. The degree of this shift has been estimated to be on the order of about 20% (Peterson and Barney, 1952), although individual differences in vocal tract length can be expected to affect the exact amount of shift occurring between any two individuals.
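
The dependence of formant frequency on tract length can be illustrated with the textbook idealisation of the vocal tract as a uniform tube closed at one end, whose resonances are F_n = (2n - 1)c / 4L. The sketch below is only an illustration under that assumption; the tract length of 17.5 cm, the speed of sound, and the function name are hypothetical choices and are not taken from the study. Real male-female differences are not uniform across formants, as noted later with reference to Fant (1966).

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value, not from the study


def tube_formants(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Hypothetical tract length and a tract shorter by a factor of 1.2.
longer = tube_formants(17.5)
shorter = tube_formants(17.5 / 1.2)

# Every resonance of the shorter tube is higher by the same factor (about 1.2).
ratios = [s / l for s, l in zip(shorter, longer)]
print(longer)
print(ratios)
```

In this idealisation a tract shorter by a factor of 1.2 raises every resonance by 20 per cent, which is the order of shift quoted above.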
The perceptual significance of these differences has been studied by Coleman (1971) who reported that in the absence of normal differences in F0, the sex of the speaker can be recognized with little difficulty in most instances. Confusion occurred when the speaker's vocal tract resonances were more like those of the opposite sex than of his own. The degree of maleness or femaleness in the voice was also found to be significantly correlated with the frequency of the resonances of the person's speech. In other words, the higher the vowel formant frequencies, the more female-like the voice was felt to be and vice versa. This finding would suggest that under certain circumstances listeners are quite sensitive to the relatively small differences in vocal tract resonances that occur between speakers. It appears that listeners are also sensitive to resonance differences in certain consonants. Schwartz (1968) and Ingemann (1968) have found, for example, that the sex of a speaker can be accurately identified on the basis of isolated, voiceless fricatives.

* Guest researcher from the University of Oregon Medical School, Portland, Oregon, USA from 10 February to 7 August, 1973.
STL-QPSR 2-3/1973

On the basis of the perceptual significance of the vocal tract resonances, it has been hypothesized that a perception of a speaker's vocal "pitch", and subsequently the maleness and femaleness of his voice, results from the combining of the information conveyed by both the laryngeal fundamental and the resonances of the vocal tract (Coleman, 1971). This hypothesis has implications for the automatic recognition of speech as well as for the understanding of certain types of voice disorders, and two experiments designed to test it were carried out. In both, the purpose was to compare the contributions of the laryngeal fundamental and vocal tract resonances to a perception of maleness and femaleness in the voice.

In the first experiment, utilizing natural voice, correlation coefficients were computed between measures of vocal tract resonance, laryngeal fundamental, and judgments of the degree of maleness or femaleness in the voice. The degree of correlation is considered to indicate the contribution of each of the two vocal characteristics to the listener judgments. In the second experiment, the two vocal characteristics were contrasted so that listeners were presented with speech composed of vocal tract resonances characteristic of one sex in combination with an F0 characteristic of the opposite sex and asked to identify the sex of the speaker. In such a forced-choice situation, listener identifications would presumably be based on the more perceptually prominent of the two vocal characteristics.
Speakers
The persons from whom the speech samples were obtained were twenty normal adult males and a like number of females selected from an American university population. Speakers were essentially unselected except that those whose first language was not English or who had some obvious speech defect or regional accent were not included.
Recording procedure
Tape recordings were made on a high fidelity array that included an Ampex 601 recorder and an Electro-Voice 666 microphone. The following spoken material was recorded by each speaker: two repetitions of a standard six-word sentence, three productions of each of the vowels /i/, /u/, /E/, and /a/, and three versions of a portion of the "Rainbow Passage". These particular vowels were selected since they are representative of the extreme tongue positions during production. This is an important consideration since the difference between the vowel formant frequencies of males and females can be expected to vary from one vowel to the next. This is because, according to Fant (1966), it is the pharyngeal cavity that is shorter in females with the oral cavity being about the same length in both sexes. The change from males to females, therefore, would be more pronounced in those vowels having formants associated with the pharyngeal cavity, for instance F2 of /i/.

The sample sentences and one version of the Rainbow Passage were produced with a normal voice. Two other versions of the Rainbow Passage were produced with a laryngeal vibrator (Western Electric, Model 5 Electrolarynx) substituting for the glottal sound at frequencies of 120 Hz and 240 Hz. These were selected as being representative of the typical male and female F0. The isolated vowels were also produced with the laryngeal vibrator producing a 100 Hz sound. This was done in order to enhance the spectrographic analysis of the vowels since it is sometimes difficult to identify the higher formants of female speakers, particularly those with a high F0. Subjects were instructed to shape their mouths to produce each target vowel and allowed to practice with the laryngeal vibrator until easily recognizable vowels could be produced.
A problem inherent in using a vibrator-produced tone to substitute for the normal laryngeal tone is that if the trachea is coupled to the rest of the vocal tract by a glottal opening, the resonant characteristics of the vocal tract are altered. This has been discussed by Fant et al. (1972) who point out that introducing an exteriorly based sound source near the level of the open glottis results in the introduction of the resonances and antiresonances that originate in the trachea. These are capable of bringing about a fundamental alteration in the transfer characteristics of the vocal tract and in the vowels produced by that vocal tract.
This effect is illustrated in Fig. II-A-1. A spectrogram of the vowel /i/ produced by the author using a laryngeal vibrator with the glottis both open and closed is shown. As a result of the coupling of the trachea to the rest of the vocal tract, F2 of this vowel has been virtually eliminated and F1 shifted upward in frequency. The perceptual change accompanying this is equally marked and with the glottis open the vowel is unrecognizable. It is important, therefore, that the glottis remains closed when producing vowels with a laryngeal vibrator in order to insure a normal formant structure.
In this study this problem was handled retrospectively by reviewing all vowels and eliminating those which did not have a quality easily recognizable as being that of the target vowel. Virtually all the vowels had an acceptable quality and it is likely that persons normally keep their glottis closed when producing vowels in this way. At the same time it is impossible to verify for certain that some degree of coupling between the trachea and the rest of the vocal tract did not exist. Vowel formant values have, therefore, been reported as rankings within the sample population rather than as specific formant frequencies in Hz. The assumption is that the speakers would be relatively consistent in their performance and whatever shift in formant frequency may be occurring will also be consistent for this group of speakers.
Formant measures
Formant frequencies for the four vowels under study were obtained from hand-drawn lines-of-best-fit around intensity-by-time spectrographic sections of each of the isolated vowels. The frequency of the most intense harmonic of each formant was designated as the frequency of that particular formant. Individual values for formants 1, 2, and 3 (F1, F2, and F3) for each vowel were averaged and the resulting figure used to represent the relative vocal tract resonance characteristic of each speaker. It was found that it was not always possible to clearly distinguish between F1 and F2 in the /a/ produced by several speakers and the /a/ was subsequently eliminated from consideration. Rankings of vocal tract resonance characteristics are based, therefore, on measures of /i/, /E/, and /u/ only.
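
A minimal sketch of how such a per-speaker resonance figure and its ranking could be computed is given below. It assumes that the figure is the plain mean of all F1, F2, and F3 values across the retained vowels, which is one reading of the description above; the function names, speaker labels, and formant values are hypothetical.

```python
def vtr_score(formants_by_vowel):
    """Mean of all F1-F3 values (Hz) over a speaker's vowels."""
    values = [f for triple in formants_by_vowel.values() for f in triple]
    return sum(values) / len(values)


def rank_speakers(scores):
    """Rank speakers from lowest (rank 1) to highest score; ties not handled."""
    ordered = sorted(scores, key=scores.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical formant values (F1, F2, F3) in Hz for two speakers.
speakers = {
    "speaker_a": {"i": (270, 2290, 3010), "E": (530, 1840, 2480), "u": (300, 870, 2240)},
    "speaker_b": {"i": (310, 2790, 3310), "E": (610, 2330, 2990), "u": (370, 950, 2670)},
}
scores = {name: vtr_score(vowels) for name, vowels in speakers.items()}
print(rank_speakers(scores))
```

Reporting only the ranks, as the study does, discards the absolute frequencies and so tolerates any shift that affects all speakers alike.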
Fig. II-A-1. Vowel /i/ produced using a laryngeal vibrator with speaker's glottis open and closed. Panels: closed glottis, open glottis, closed glottis.
Fundamental frequency (F0)
Fundamental frequency was measured for each speaker by determining the number of vertical striations appearing on intensity-by-time spectrograms in two standard sentences and dividing by the duration of the voiced portion of the sentences. The resulting F0 values were averaged for each speaker and ranked within the forty speaker group.
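
The measurement described above amounts to counting glottal pulses and dividing by voiced time. A minimal sketch follows; the function names and the striation counts and durations are hypothetical, chosen only to show the arithmetic.

```python
def mean_f0(striation_count, voiced_duration_s):
    """Mean F0 in Hz: number of vertical striations / voiced duration in seconds."""
    if voiced_duration_s <= 0:
        raise ValueError("voiced duration must be positive")
    return striation_count / voiced_duration_s


def speaker_f0(sentences):
    """Average the per-sentence F0 values for one speaker."""
    values = [mean_f0(count, duration) for count, duration in sentences]
    return sum(values) / len(values)


# Hypothetical (striation count, voiced seconds) for one speaker's two sentences.
example = [(180, 1.5), (240, 2.0)]
print(speaker_f0(example))  # 120.0
```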
Listener judgments - Experiment I
A listening tape was prepared which consisted of 5 sec samples taken from each of the forty speakers' normal voice production of the Rainbow Passage. These were presented to a group of 17 young adults who were asked 1) to determine the sex of each speaker, and 2) to estimate, on a scale of 1-7, how much of the quality they associate with that particular sex each voice contained. In order to reduce the influence of possible male/female differences in rate, juncture, and inflection, the five-second samples were presented backwards. The judgments should, therefore, be based solely on vocal tract resonance and fundamental frequency information which would be unaffected by the backward presentation.

The ratings given each subject were averaged and ranked within the forty subject group. The ranking of each subject is considered to represent the relative degree of maleness or femaleness (M-F voice quality) in each speaker's voice. Correlation coefficients (Spearman rs) were then computed between each of the three rank ordered measures. The assumption is that the closest correlation would indicate the vocal characteristic that is making the greatest contribution to the perception of M-F voice quality. If the two characteristics under study were equally prominent this should be reflected by a similarity in the correlation coefficients.
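
For readers unfamiliar with the statistic, the Spearman coefficient is the ordinary product-moment correlation computed on ranks. The sketch below is a self-contained illustration; the function names and the data are hypothetical and are not the study's measurements.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman r_s: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical F0 values (Hz) and mean listener ratings for five speakers.
f0_hz = [105, 118, 130, 210, 235]
rating = [1.2, 2.5, 2.0, 6.1, 5.8]
print(spearman(f0_hz, rating))  # 0.8 for these invented values
```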
Results - Experiment I
The results of the various rank-order correlations are summarized in Table II-A-I. These same results are also shown in Figs. II-A-2, II-A-3, and II-A-4. In these figures the ranking of "1" represents the lowest F0, the lowest VTR figure, and the Most Male sounding voice, respectively.
Fig. II-A-2. Rankings of listener ratings of degree of male-female voice quality compared with rankings of F0. Horizontal axis: rankings of the ratings of male-female voice quality (1-40). Separate symbols mark males and females.

Figure with horizontal axis: rankings of the ratings of male-female voice quality.
TABLE II-A-I. Spearman rank-order correlation coefficients (rs) between degree of Male/Female voice quality, fundamental frequency (F0), and vocal tract resonances (VTR).

Comparison                     Combined Males    Males    Females
                               and Females       only     only
M-F Voice Quality with F0      .94*              .65*     .88*
M-F Voice Quality with VTR     .59*              .00†     .27†
VTR with F0                    .56*              .14†     .17†

* p < .01
† p N.S.
It can readily be seen that the listeners were basing their judgments of the degree of maleness or femaleness in the voice on the frequency of the laryngeal fundamental. The correlation coefficient of .94 represents an almost perfect one-to-one correspondence between the rankings of how male or female sounding a person's voice was judged to be and the frequency of his laryngeal fundamental. A significant correlation of .59 was also observed between VTR and M-F voice quality. This is interpreted as being simply an expression of the relationship between VTR and F0 which showed a similar correlation of .56.

When the correlations are compared separately for the two sexes, however, somewhat different results are seen. A significant, though somewhat reduced, correlation is again seen between the rankings of F0 and M-F voice quality while the rankings of the other two measures show no correlation. That is to say, rankings of F0 were unrelated to those of VTRs, and VTR rankings were unrelated to those of M-F voice quality within males and females as sub-groups. This is a reflection of the dichotomous nature of the two sexes in their rankings of F0 and M-F voice quality. This dichotomy would contribute to the relationship between the rankings of these measures when both sexes are considered as a single group but would not when they are considered as two subgroups. As can be seen in Figs. II-A-2, II-A-3, and II-A-4 there is no overlap between males and females in their rankings of F0 and M-F voice quality.
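
The effect of such a dichotomy on a pooled correlation can be shown with a small constructed example. In the sketch below the eight rank values are invented so that the two measures are uncorrelated within each group while the groups do not overlap; for simplicity the invented VTR ranks are also made non-overlapping, which is an assumption of the example and not a finding of the study.

```python
def rank(values):
    """1-based ranks for values without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_no_ties(x, y):
    """Spearman r_s from rank differences: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented rankings: group 1 occupies ranks 1-4, group 2 ranks 5-8.
male_f0, male_vtr = [1, 2, 3, 4], [2, 4, 1, 3]
female_f0, female_vtr = [5, 6, 7, 8], [6, 8, 5, 7]

within_males = spearman_no_ties(male_f0, male_vtr)
within_females = spearman_no_ties(female_f0, female_vtr)
pooled = spearman_no_ties(male_f0 + female_f0, male_vtr + female_vtr)
print(within_males, within_females, round(pooled, 2))
```

Within each invented group the coefficient is zero, yet the pooled coefficient is about .76, because the separation between the groups alone orders the combined ranks.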
Listener judgments - Experiment II
In this experiment voices which consisted of a male characteristic in combination with a female characteristic were presented to a group of listeners who were asked to determine the sex of each speaker. As a control, the listeners were also asked to make a similar judgment for voices containing vocal characteristics consistent for one sex.

From the forty speakers used in Experiment I, the five females with the highest vocal tract resonances and the five males with the lowest vocal tract resonances were selected for use in this experiment. A listening tape was prepared which consisted of five-second segments of each of the ten speakers articulating the tone produced by a laryngeal vibrator at each of the two pitches: 240 Hz and 120 Hz. The listening tape consisted, therefore, of an equal number of each of the following combinations: low VTR with low F0, low VTR with high F0, high VTR with high F0, and high VTR with low F0. Half of the samples, therefore, contained two vocal cues consistent with one sex and half contained two contrasting vocal cues.
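
The composition of the listening tape can be enumerated directly from this description. The sketch below assumes, following the table caption later in the paper, that speakers 1-5 are the females and 6-10 the males; the variable names are hypothetical.

```python
from itertools import product

# Speakers 1-5 female (high VTR), 6-10 male (low VTR).
speakers = [(number, "female" if number <= 5 else "male") for number in range(1, 11)]

# Simulated F0 supplied by the laryngeal vibrator for each sex.
vibrator_hz = {"male": 120, "female": 240}

items = []
for (number, vtr_sex), (f0_sex, hz) in product(speakers, vibrator_hz.items()):
    items.append(
        {
            "speaker": number,
            "vtr": vtr_sex,
            "f0": f0_sex,
            "hz": hz,
            "consistent": vtr_sex == f0_sex,
        }
    )

consistent = [item for item in items if item["consistent"]]
contrasting = [item for item in items if not item["consistent"]]
print(len(items), len(consistent), len(contrasting))  # 20 10 10
```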
This 20 item series was presented to a listening group composed of 25 young adults. Listeners were asked simply to determine whether each speaker was a male or a female. Since speech produced with a laryngeal vibrator has an unusual quality the group was allowed to listen to the entire series and their judgments were based on a second playing of the tape which followed immediately.

In those speech samples where the voices contained two contrasting vocal characteristics, listener identifications of the speaker sex would be based on the more prominent of the two characteristics. If the cues were perceptually equal, identifications would be distributed randomly with each sex identified about an equal number of times. In those voices where the two vocal characteristics were consistent, speaker sex identifications should be appropriate to that sex.
Results - Experiment II
The results of Experiment II are summarized in Table II-A-II and illustrated in Figs. II-A-5 and II-A-6. When the two vocal characteristics were consistent for one sex, shown in the left side of the table, the sex of the speaker was correctly identified 245 out of 250 times and it is clear that this was an easy identification for the listeners to make.
Fig. II-A-5. Speaker sex identifications based on voices in which two vocal qualities characteristic of the same sex are combined. (F0 = Fundamental Frequency; VTR = Vocal Tract Resonance). Panels: voices combining two FEMALE characteristics (High F0; High VTR) and voices combining two MALE characteristics (Low F0; Low VTR). Each panel shows, by speaker, listener identifications of speakers as MALE and as FEMALE.

Fig. II-A-6. Speaker identifications based on voices in which a vocal quality characteristic of one sex is combined with a vocal quality characteristic of the other sex. (F0 = Fundamental Frequency; VTR = Vocal Tract Resonance). Panels: voices combining a Male F0 with a Female VTR and voices combining a Female F0 with a Male VTR. Each panel shows, by speaker, listener identifications of speakers as MALE and as FEMALE.
TABLE II-A-II. Distribution of listener identifications of speakers as male or female in response to two vocal characteristic combinations: I - Fundamental frequency (F0) and Vocal Tract Resonance (VTR) representative of the same sex; II - A vocal quality characteristic of one sex combined with a vocal quality characteristic of the other sex. Speakers 1-5 are female, 6-10 are male.

           I                                         II
           Vocal               Times each speaker    Vocal               Times each speaker
           characteristic:     perceived as:         characteristic:     perceived as:
Speaker    F0        VTR       Male     Female       F0        VTR       Male     Female
1.         Female    Female       2       23         Male      Female      15       10
2.         Female    Female       0       25         Male      Female      13       12
3.         Female    Female       1       24         Male      Female      20        5
4.         Female    Female       1       24         Male      Female      20        5
5.         Female    Female       1       24         Male      Female      20        5
Total                             5      120                               88       37

6.         Male      Male        25        0         Female    Male        21        4
7.         Male      Male        25        0         Female    Male         8       17
8.         Male      Male        25        0         Female    Male        17        8
9.         Male      Male        25        0         Female    Male        18        7
10.        Male      Male        25        0         Female    Male        20        5
Total                           125        0                               84       41
When the two vocal characteristics were contrasted within one voice, shown on the right side of the table, different results are seen. It should be kept in mind in evaluating the distribution of these judgments that there are no "correct" or "incorrect" identifications since, at this stage, both vocal characteristics are presumed to be of equal perceptual prominence. Rather, a choice as to whether a speaker is male or female indicates simply that the listener's perception was based on one vocal characteristic or the other and not whether his judgment was right or wrong.

It is clear from the distribution of identifications in response to the contrasted cues that both male characteristics are perceptually more prominent than their female counterpart. Speakers were identified as females 78 times and as males 172 times indicating that listeners based their identification on the male voice characteristic at a better than two to one rate. Both male characteristics appear to be about equally dominant over the female vocal characteristic in the same voice. Of the 172 male identifications 88 were based on a low F0 and 84 on a low VTR characteristic. There is no question, however, that the presence of a female vocal characteristic reduces the perceptual prominence of the male characteristic. This is apparent in the reduction of male identifications from 100% when two male characteristics were combined to 69% when a male and a female characteristic were combined.
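
The figures quoted in this section follow from the condition totals alone, as the short check below shows. Only totals stated in the text are used; the variable names are hypothetical.

```python
LISTENERS = 25
VOICES_PER_CONDITION = 10
judgments_per_condition = LISTENERS * VOICES_PER_CONDITION  # 250

# Consistent-cue condition: correct identifications reported in the text.
consistent_correct = 245
pct_consistent = 100 * consistent_correct / judgments_per_condition

# Contrasted-cue condition: identifications by cue combination.
male_ids = {"male F0 + female VTR": 88, "female F0 + male VTR": 84}
female_ids = {"male F0 + female VTR": 37, "female F0 + male VTR": 41}

total_male = sum(male_ids.values())
total_female = sum(female_ids.values())
pct_male = 100 * total_male / (total_male + total_female)
male_to_female = total_male / total_female

print(total_male, total_female, round(pct_male), round(male_to_female, 2))
```

The totals give 172 male and 78 female identifications, 69% male identifications after rounding, and a male-to-female ratio a little above two to one.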
Discussion
The results of these two experiments considered together appear to be somewhat inconsistent. In the experiment using live, unaltered voices the frequency of the speaker's F0 was found to be the primary determiner of how male or female a voice sounded. The presence of vocal tract resonances that were more characteristic of the opposite sex in these voices did not influence the judges' estimates of degree of maleness or femaleness and it is clear that in natural speech, a perception of vocal pitch is a product of the frequency of the F0.
In the second experiment, on the other hand, speakers were more likely to be judged as male even in the presence of a female-like F0 and it is apparent in these cases that the male vocal tract resonance characteristic had considerable perceptual importance. The male F0, however, retained its predominance over the female vocal tract resonance characteristic which would be consistent with the findings of Experiment I.

The somewhat different results of the two experiments may be a result of the use of the laryngeal vibrator as a vocal sound source in Experiment II. It may be easier to produce a more natural sounding male than female fundamental with the particular laryngeal vibrator used in this study. This could account for the perceptual weakness of the female F0 in the second experiment that was not seen in the first. It is also possible that the glottal source in females differs from males in some basic way besides simply that of pitch. This needs to be examined more closely since it may be desirable to produce female sounding speech synthetically that can be easily recognized as such either by humans or by automatic speech recognition devices. If a good approximation of the natural female glottal tone is not provided, speaker sex recognition may be confounded by the frequency of the vocal tract resonances with which it is combined.
Conclusions
In natural speech, the degree of male or female quality in the voice is a function of the frequency of the laryngeal fundamental. Individual vocal tract resonance characteristics, whether male or female, contribute little or nothing to the perception of this vocal quality. When a laryngeal vibrator is substituted for the normal glottal tone, however, the female F0 is perceptually weaker than male vocal tract resonance characteristics, while the male F0 retains its perceptual prominence.
Acknowledgments
This study was supported by a grant from the Selling Research Foundation of the University of Oregon Medical School and by Dept. of H.E.W. Grant 72-260-5089. A portion of this research was carried out while the author was a guest researcher at the Dept. of Speech Communication, Speech Transmission Laboratory of the Royal Institute of Technology (KTH), Stockholm. The valuable suggestions provided by many of the STL staff are gratefully acknowledged. Appreciation is also extended to those persons who served as willing subjects and listeners in the two experiments.
References:
Coleman, R. (1971): "Male and Female Voice Quality and Its Relationship to Vowel Formant Frequencies", J. Sp. Hear. Res. 14, p. 566.
Fant, G. (1966): "A Note on Vocal Tract Size Factors and Non-Uniform F-Pattern Scalings", STL-QPSR 4/1966, p. 22.
Fant, G., Ishizaka, K., Lindqvist, J., and Sundberg, J. (1972): "Subglottal Formants", STL-QPSR 1/1972, p. 1.
Ingemann, F. (1968): "Identification of the Speaker's Sex from Voiceless Fricatives", J. Acoust. Soc. Am. 44, p. 1142.
Peterson, G. and Barney, H. (1952): "Control Methods Used in a Study of the Vowels", J. Acoust. Soc. Am. 24, p. 175.
Schwartz, M. (1968): "Identification of Speaker Sex from Isolated Voiceless Fricatives", J. Acoust. Soc. Am. 43, p. 1178.