Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

A comparison of the contributions of two vocal characteristics to the perception of maleness and femaleness in the voice

Coleman, R. O.

journal: STL-QPSR
volume: 14
number: 2-3
year: 1973
pages: 013-022

http://www.speech.kth.se/qpsr

II. SPEECH PERCEPTION

A. A COMPARISON OF THE CONTRIBUTIONS OF TWO VOCAL CHARACTERISTICS TO THE PERCEPTION OF MALENESS AND FEMALENESS IN THE VOICE*

R. O. Coleman

Abstract

Two experiments were carried out in which a comparison was made between the contributions of F0 on one hand, and vocal tract resonances on the other, to a perception of maleness and femaleness in the adult voice. In the first experiment, in which natural voice was used, the frequency of the F0 was found to be very highly correlated (rs = .94) with the degree of maleness and femaleness in the voice. The vocal tract resonances were less highly correlated and it is apparent that in the presence of the natural laryngeal tone, these perceptions are based on the frequency of the F0. In the second experiment a tone produced by a laryngeal vibrator was substituted for the normal glottal tone at simulated F0's representing both males (120 Hz) and females (240 Hz). When listeners were asked to identify the sex of the speakers some inconsistency with the findings of the first experiment was seen. The female F0 was a weak indicator of female voice quality when combined with male vocal tract resonance although the male F0 retained the perceptual prominence seen in the first experiment.
This finding may be indicative of some basic difference in the normal glottal characteristics of males and females.

* Guest researcher from the University of Oregon Medical School, Portland, Oregon, USA from 10 February to 7 August, 1973.

Introduction

It has long been recognized that the shorter overall vocal tract length in females results in an upward shift in the frequency of female vowel formants. The degree of this shift has been estimated to be on the order of about 20 % (Peterson and Barney, 1952), although individual differences in vocal tract length can be expected to affect the exact amount of shift occurring between any two individuals. The perceptual significance of these differences has been studied by Coleman (1971) who reported that in the absence of normal differences in F0 the sex of the speaker can be recognized with little difficulty in most instances. Confusion occurred when the speaker's vocal tract resonances were more like those of the opposite sex than of his own. The degree of maleness or femaleness in the voice was also found to be significantly correlated with the frequency of the resonances of the person's speech. In other words, the higher the vowel formant frequencies, the more female-like the voice was felt to be and vice versa. This finding would suggest that under certain circumstances listeners are quite sensitive to the relatively small differences in vocal tract resonances that occur between speakers. It appears that listeners are also sensitive to resonance differences in certain consonants.
Schwartz (1968) and Ingemann (1968) have found, for example, that the sex of a speaker can be accurately identified on the basis of isolated, voiceless fricatives.

On the basis of the perceptual significance of the vocal tract resonances, it has been hypothesized that a perception of a speaker's vocal "pitch", and subsequently the maleness and femaleness of his voice, results from the combining of the information conveyed by both the laryngeal fundamental and the resonances of the vocal tract (Coleman, 1971). This hypothesis has implications for the automatic recognition of speech as well as for the understanding of certain types of voice disorders, and two experiments designed to test it were carried out. In both, the purpose was to compare the contributions of the laryngeal fundamental and vocal tract resonances to a perception of maleness and femaleness in the voice. In the first experiment, utilizing natural voice, correlation coefficients were computed between measures of vocal tract resonance, laryngeal fundamental, and judgments of the degree of maleness or femaleness in the voice. The degree of correlation is considered to indicate the contribution of each of the two vocal characteristics to the listener judgments. In the second experiment, the two vocal characteristics were contrasted so that listeners were presented with speech composed of vocal tract resonances characteristic of one sex in combination with an F0 characteristic of the opposite sex and asked to identify the sex of the speaker.
In such a forced choice situation, listener identifications would presumably be based on the more perceptually prominent of the two vocal characteristics.

Speakers

The persons from whom the speech samples were obtained were twenty normal adult males and a like number of females selected from an American university population. Speakers were essentially unselected, except that those whose first language was not English or who had some obvious speech defect or regional accent were not included.

Recording procedure

Tape recordings were made on a high fidelity array that included an Ampex 601 recorder and an Electro-Voice 666 microphone. The following spoken material was recorded by each speaker: two repetitions of a standard six-word sentence, three productions of each of the vowels /i/, /u/, /E/, and /a/, and three versions of a portion of the "Rainbow Passage". These particular vowels were selected since they are representative of the extreme tongue positions during production. This is an important consideration since the difference between the vowel formant frequencies of males and females can be expected to vary from one vowel to the next. This is because, according to Fant (1966), it is the pharyngeal cavity that is shorter in females, with the oral cavity being about the same length in both sexes. The change from males to females, therefore, would be more pronounced in those vowels having formants associated with the pharyngeal cavity, for instance F2 of /i/. The sample sentences and one version of the Rainbow Passage were produced with a normal voice.
Two other versions of the Rainbow Passage were produced with a laryngeal vibrator (Western Electric, Model 5 Electrolarynx) substituting for the glottal sound at frequencies of 120 Hz and 240 Hz. These were selected as being representative of the typical male and female F0. The isolated vowels were also produced with the laryngeal vibrator producing a 100 Hz sound. This was done in order to enhance the spectrographic analysis of the vowels since it is sometimes difficult to identify the higher formants of female speakers, particularly those with a high F0. Subjects were instructed to shape their mouths to produce each target vowel and allowed to practice with the laryngeal vibrator until easily recognizable vowels could be produced.

A problem inherent in using a vibrator produced tone to substitute for the normal laryngeal tone is that if the trachea is coupled to the rest of the vocal tract by a glottal opening, the resonant characteristics of the vocal tract are altered. This has been discussed by Fant et al. (1972) who point out that introducing an exteriorly based sound source near the level of the open glottis results in the introduction of the resonances and antiresonances that originate in the trachea. These are capable of bringing about a fundamental alteration in the transfer characteristics of the vocal tract and in the vowels produced by that vocal tract. This effect is illustrated in Fig. II-A-1. A spectrogram of the vowel /i/ produced by the author using a laryngeal vibrator with the glottis both open and closed is shown.
As a result of the coupling of the trachea to the rest of the vocal tract, F2 of this vowel has been virtually eliminated and F1 shifted upward in frequency. The perceptual change accompanying this is equally marked and with the glottis open the vowel is unrecognizable. It is important, therefore, that the glottis remains closed when producing vowels with a laryngeal vibrator in order to insure a normal formant structure.

In this study this problem was handled retrospectively by reviewing all vowels and eliminating those which did not have a quality easily recognizable as being that of the target vowel. Virtually all the vowels had an acceptable quality and it is likely that persons normally keep their glottis closed when producing vowels in this way. At the same time it is impossible to verify for certain that some degree of coupling between the trachea and the rest of the vocal tract did not exist. Vowel formant values have, therefore, been reported as rankings within the sample population rather than as specific formant frequencies in Hz. The assumption is that the speakers would be relatively consistent in their performance and whatever shift in formant frequency may be occurring will also be consistent for this group of speakers.

Formant measures

Formant frequencies for the four vowels under study were obtained from hand-drawn lines-of-best-fit around intensity-by-time spectrographic sections of each of the isolated vowels. The frequency of the most intense harmonic of each formant was designated as the frequency of that particular formant.
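The peak-picking rule just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study, which relied on hand-drawn envelopes on spectrographic sections. The harmonic levels and the frequency band searched for each formant are hypothetical; only the 100 Hz source spacing comes from the text.

```python
def most_intense_harmonic(harmonics, band):
    """Return the frequency (Hz) of the strongest harmonic inside `band`.

    harmonics: dict mapping harmonic frequency in Hz to relative level in dB
    band: (low_hz, high_hz) search range for one formant (assumed, not from the study)
    """
    low, high = band
    in_band = {f: level for f, level in harmonics.items() if low <= f <= high}
    if not in_band:
        raise ValueError("no harmonics inside the requested band")
    return max(in_band, key=in_band.get)


# Hypothetical section of an /i/-like vowel excited by a 100 Hz source,
# so every harmonic lies at a multiple of 100 Hz.
section = {100: 20, 200: 28, 300: 31, 400: 24, 500: 15,
           2100: 18, 2200: 26, 2300: 29, 2400: 22,
           2900: 17, 3000: 23, 3100: 19}

f1 = most_intense_harmonic(section, (100, 800))
f2 = most_intense_harmonic(section, (1800, 2600))
f3 = most_intense_harmonic(section, (2700, 3500))
```

Because the harmonics of a 100 Hz source are 100 Hz apart, a formant estimate obtained this way is resolved only to the nearest harmonic.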
Individual values for formants 1, 2, and 3 (F1, F2, and F3) for each vowel were averaged and the resulting figure used to represent the relative vocal tract resonance characteristic of each speaker. It was found that it was not always possible to clearly distinguish between F1 and F2 in the /a/ produced by several speakers and the /a/ was subsequently eliminated from consideration. Rankings of vocal tract resonance characteristics are based, therefore, on measures of /i/, /E/, and /u/ only.

Fig. II-A-1. Vowel /i/ produced using a laryngeal vibrator with speaker's glottis open and closed. (Panels: closed glottis, open glottis, closed glottis.)

Fundamental frequency (F0)

Fundamental frequency was measured for each speaker by determining the number of vertical striations appearing on intensity-by-time spectrograms in two standard sentences and dividing by the duration of the voiced portion of the sentences. The resulting F0 values were averaged for each speaker and ranked within the forty speaker group.

Listener judgments - Experiment I

A listening tape was prepared which consisted of 5 sec samples taken from each of the forty speakers' normal voice production of the Rainbow Passage. These were presented to a group of 17 young adults who were asked 1) to determine the sex of each speaker, and 2) to estimate, on a scale of 1-7, how much of the quality they associate with that particular sex each voice contained. In order to reduce the influence of possible male/female differences in rate, juncture, and inflection, the five-second samples were presented backwards.
The judgments should, therefore, be based solely on vocal tract resonance and fundamental frequency information which would be unaffected by the backward presentation. The ratings given each subject were averaged and ranked within the forty subject group. The ranking of each subject is considered to represent the relative degree of maleness or femaleness (M-F voice quality) in each speaker's voice. Correlation coefficients (Spearman rs) were then computed between each of the three rank ordered measures. The assumption is that the closest correlation would indicate the vocal characteristic that is making the greatest contribution to the perception of M-F voice quality. If the two characteristics under study were equally prominent this should be reflected by a similarity in the correlation coefficients.

Results - Experiment I

The results of the various rank-order correlations are summarized in Table II-A-I. These same results are also shown in Figs. II-A-2, II-A-3, and II-A-4. In these figures the ranking of "1" represents the lowest F0, the lowest VTR figure, and the most male sounding voice, respectively.

Fig. II-A-2. Rankings of listener ratings of degree of male-female voice quality compared with rankings of F0. (One axis: rankings of the ratings of male-female voice quality, 1 to 40; males and females plotted with separate symbols.)

TABLE II-A-I. Spearman rank-order correlation coefficients (rs) between degree of male/female voice quality, fundamental frequency (F0), and vocal tract resonances (VTR).

Comparison                     Combined males and females   Males only   Females only
M-F voice quality with F0      .94*                         .65*         .88*
M-F voice quality with VTR     .59*                         N.S.         N.S.
VTR with F0                    .56*                         N.S.         N.S.

* Significant at the .01 level. N.S. = not significant; the nonsignificant coefficients were .14 and .00 for males and .27 and .17 for females.

It can readily be seen that the listeners were basing their judgments of the degree of maleness or femaleness in the voice on the frequency of the laryngeal fundamental. The correlation coefficient of .94 represents an almost perfect one-to-one correspondence between the rankings of how male or female sounding a person's voice was judged to be and the frequency of his laryngeal fundamental. A significant correlation of .59 was also observed between VTR and M-F voice quality. This is interpreted as being simply an expression of the relationship between VTR and F0, which showed a similar correlation of .56.

When the correlations are compared separately for the two sexes, however, somewhat different results are seen. A significant, though somewhat reduced, correlation is again seen between the rankings of F0 and M-F voice quality while the rankings of the other two measures show no correlation.
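The contrast between the pooled and the within-sex coefficients can be illustrated with a small sketch. The Spearman coefficient below is computed in the usual way, as the Pearson correlation of the ranks. The F0 and VTR values are invented for the illustration and are not the study's data; they are arranged so that the two sexes do not overlap on either measure while the two measures are unrelated within each sex.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values in Hz (NOT the study's measurements).
male_f0 = [100, 110, 120, 130, 140]
male_vtr = [1520, 1580, 1540, 1500, 1560]
female_f0 = [200, 210, 220, 230, 240]
female_vtr = [1820, 1880, 1840, 1800, 1860]

rs_males = spearman_rs(male_f0, male_vtr)
rs_females = spearman_rs(female_f0, female_vtr)
rs_pooled = spearman_rs(male_f0 + female_f0, male_vtr + female_vtr)
```

With these invented numbers the within-sex coefficients are zero while the pooled coefficient is about .76, which shows how a separation between two groups can by itself produce a sizeable correlation in the combined sample.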
That is to say, rankings of F0 were unrelated to those of VTRs, and VTR rankings were unrelated to those of M-F voice quality within males and females as sub-groups. This is a reflection of the dichotomous nature of the two sexes in their rankings of F0 and M-F voice quality. This dichotomy would contribute to the relationship between the rankings of these measures when both sexes are considered as a single group but would not when they are considered as two subgroups. As can be seen in Figs. II-A-2, II-A-3, and II-A-4 there is no overlap between males and females in their rankings of F0 and M-F voice quality.

Listener judgments - Experiment II

In this experiment voices which consisted of a male characteristic in combination with a female characteristic were presented to a group of listeners who were asked to determine the sex of each speaker. As a control, the listeners were also asked to make a similar judgment for voices containing vocal characteristics consistent for one sex.

From the forty speakers used in Experiment I, the five females with the highest vocal tract resonances and the five males with the lowest vocal tract resonances were selected for use in this experiment. A listening tape was prepared which consisted of five-second segments of each of the ten speakers articulating the tone produced by a laryngeal vibrator at each of the two pitches: 240 Hz and 120 Hz. The listening tape consisted, therefore, of an equal number of each of the following combinations: low VTR with low F0, low VTR with high F0, high VTR with high F0, and high VTR with low F0.
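The composition of the listening tape can be enumerated in a short sketch. The speaker numbering (1-5 female, 6-10 male) follows Table II-A-II; the dictionary keys and variable names are illustrative only.

```python
from itertools import product

# Speakers 1-5: the females with the highest VTR; 6-10: the males with the lowest VTR.
speakers = [(n, "female") for n in range(1, 6)] + [(n, "male") for n in range(6, 11)]

# Vibrator pitch in Hz mapped to the sex that pitch is typical of.
vibrator_f0 = {120: "male", 240: "female"}

items = []
for (speaker, vtr_sex), (f0_hz, f0_sex) in product(speakers, vibrator_f0.items()):
    items.append({
        "speaker": speaker,
        "vtr": vtr_sex,
        "f0_hz": f0_hz,
        "cues": "consistent" if vtr_sex == f0_sex else "contrasting",
    })

consistent = [item for item in items if item["cues"] == "consistent"]
contrasting = [item for item in items if item["cues"] == "contrasting"]
```

Ten speakers at two pitches give twenty items, five in each of the four VTR and F0 combinations.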
Half of the samples, therefore, contained two vocal cues consistent with one sex and half contained two contrasting vocal cues.

This 20 item series was presented to a listening group composed of 25 young adults. Listeners were asked simply to determine whether each speaker was a male or a female. Since speech produced with a laryngeal vibrator has an unusual quality, the group was allowed to listen to the entire series and their judgments were based on a second playing of the tape which followed immediately.

In those speech samples where the voices contained two contrasting vocal characteristics, listener identifications of the speaker sex would be based on the more prominent of the two characteristics. If the cues were perceptually equal, identifications would be distributed randomly with each sex identified about an equal number of times. In those voices where the two vocal characteristics were consistent, speaker sex identifications should be appropriate to that sex.

Results - Experiment II

The results of Experiment II are summarized in Table II-A-II and illustrated in Figs. II-A-5 and II-A-6. When the two vocal characteristics were consistent for one sex, shown in the left side of the table, the sex of the speaker was correctly identified 245 out of 250 times and it is clear that this was an easy identification for the listeners to make.

Fig. II-A-5. Speaker sex identifications based on voices in which two vocal qualities characteristic of the same sex are combined. (F0 = Fundamental Frequency; VTR = Vocal Tract Resonance). (Panels: voices combining two female characteristics, high F0 and high VTR; voices combining two male characteristics, low F0 and low VTR. Listener identifications of speakers as male and as female are shown for each speaker.)

Fig. II-A-6. Speaker identifications based on voices in which a vocal quality characteristic of one sex is combined with a vocal quality characteristic of the other sex. (F0 = Fundamental Frequency; VTR = Vocal Tract Resonance). (Panels: voices combining a male F0 with a female VTR; voices combining a female F0 with a male VTR. Listener identifications of speakers as male and as female are shown for each speaker.)

TABLE II-A-II. Distribution of listener identifications of speakers as male or female in response to two vocal characteristic combinations: I - Fundamental frequency (F0) and Vocal Tract Resonance (VTR) representative of the same sex; II - A vocal quality characteristic of one sex combined with a vocal quality characteristic of the other sex. Speakers 1-5 are female, 6-10 are male.

          I                                     II
          Vocal characteristic  Perceived as    Vocal characteristic  Perceived as
Speaker   F0      VTR           Male  Female    F0      VTR           Male  Female
1.        Female  Female           2      23    Male    Female          15      10
2.        Female  Female           0      25    Male    Female          13      12
3.        Female  Female           1      24    Male    Female          20       5
4.        Female  Female           1      24    Male    Female          20       5
5.        Female  Female           1      24    Male    Female          20       5
Total                              5     120                            88      37
6.        Male    Male            25       0    Female  Male            21       4
7.        Male    Male            25       0    Female  Male             8      17
8.        Male    Male            25       0    Female  Male            17       8
9.        Male    Male            25       0    Female  Male            18       7
10.       Male    Male            25       0    Female  Male            20       5
Total                            125       0                            84      41

When the two vocal characteristics were contrasted within one voice, shown on the right side of the table, different results are seen. It should be kept in mind in evaluating the distribution of these judgments that there are no "correct" or "incorrect" identifications since, at this stage, both vocal characteristics are presumed to be of equal perceptual prominence. Rather, a choice as to whether a speaker is male or female indicates simply that the listener's perception was based on one vocal characteristic or the other and not whether his judgment was right or wrong.

It is clear from the distribution of identifications in response to the contrasted cues that both male characteristics are perceptually more prominent than their female counterpart. Speakers were identified as females 78 times and as males 172 times, indicating that listeners based their identification on the male voice characteristic at a better than two to one rate. Both male characteristics appear to be about equally dominant over the female vocal characteristic in the same voice. Of the 172 male identifications 88 were based on a low F0 and 84 on a low VTR characteristic. There is no question, however, that the presence of a female vocal characteristic reduces the perceptual prominence of the male characteristic. This is apparent in the reduction of male identifications from 100 % when two male characteristics were combined to 69 % when a male and a female characteristic were combined.
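The proportions quoted in this section follow directly from the totals in Table II-A-II, as the short check below shows. The counts are those given in the text; the variable names are illustrative.

```python
# Totals from the text and Table II-A-II (25 listeners x 10 voices per condition).
consistent_correct, consistent_total = 245, 250

male_by_low_f0 = 88    # male F0 with female VTR, identified as male
male_by_low_vtr = 84   # male VTR with female F0, identified as male
contrast_total = 250

male_ids = male_by_low_f0 + male_by_low_vtr
female_ids = contrast_total - male_ids

pct_consistent = 100 * consistent_correct / consistent_total
pct_male = 100 * male_ids / contrast_total
male_to_female_ratio = male_ids / female_ids
```

Here `male_ids` is 172 and `female_ids` is 78, so male identifications make up 68.8 % of the contrasted judgments (the 69 % quoted above) and outnumber female identifications by a little more than two to one.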
Discussion

The results of these two experiments considered together appear to be somewhat inconsistent. In the experiment using live, unaltered voices the frequency of the speaker's F0 was found to be the primary determiner of how male or female a voice sounded. The presence of vocal tract resonances that were more characteristic of the opposite sex in these voices did not influence the judges' estimates of degree of maleness or femaleness and it is clear that in natural speech, a perception of vocal pitch is a product of the frequency of the F0. In the second experiment, on the other hand, speakers were more likely to be judged as male even in the presence of a female-like F0, and it is apparent in these cases that the male vocal tract resonance characteristic had considerable perceptual importance. The male F0, however, retained its predominance over the female vocal tract resonance characteristic, which would be consistent with the findings of Experiment I.

The somewhat different results of the two experiments may be a result of the use of the laryngeal vibrator as a vocal sound source in Experiment II. It may be easier to produce a more natural sounding male than female fundamental with the particular laryngeal vibrator used in this study. This could account for the perceptual weakness of the female F0 in the second experiment that was not seen in the first. It is also possible that the glottal source in females differs from that in males in some basic way besides simply that of pitch. This needs to be examined more closely since it may be desirable to produce female sounding speech synthetically that can be easily recognized as such either by humans or by automatic speech recognition devices. If a good approximation of the natural female glottal tone is not provided, speaker sex recognition may be confounded by the frequency of the vocal tract resonances with which it is combined.

Conclusions

In natural speech, the degree of male or female quality in the voice is a function of the frequency of the laryngeal fundamental. Individual vocal tract resonance characteristics, whether male or female, contribute little or nothing to the perception of this vocal quality. When a laryngeal vibrator is substituted for the normal glottal tone, however, the female F0 is perceptually weaker than male vocal tract resonance characteristics, while the male F0 retains its perceptual prominence.

Acknowledgments

This study was supported by a grant from the Selling Research Foundation of the University of Oregon Medical School and by Dept. of H.E.W. Grant 72-260-5089. A portion of this research was carried out while the author was a guest researcher at the Dept. of Speech Communication, Speech Transmission Laboratory of the Royal Institute of Technology (KTH), Stockholm. The valuable suggestions provided by many of the STL staff are gratefully acknowledged. Appreciation is also extended to those persons who served as willing subjects and listeners in the two experiments.

References:

Coleman, R. (1971): "Male and Female Voice Quality and Its Relationship to Vowel Formant Frequencies", J. Sp. Hear. Res. 14, p. 566.
Fant, G. (1966): "A Note on Vocal Tract Size Factors and Non-Uniform F-Pattern Scalings", STL-QPSR 4/1966, p. 22.
Fant, G., Ishizaka, K., Lindqvist, J., and Sundberg, J. (1972): "Subglottal Formants", STL-QPSR 1/1972, p. 1.
Ingemann, F. (1968): "Identification of the Speaker's Sex from Voiceless Fricatives", J. Acoust. Soc. Am. 44, p. 1142.
Peterson, G. and Barney, H. (1952): "Control Methods Used in a Study of the Vowels", J. Acoust. Soc. Am. 24, p. 175.
Schwartz, M. (1968): "Identification of Speaker Sex from Isolated Voiceless Fricatives", J. Acoust. Soc. Am. 43, p. 1178.