Analytical expressions for the tonotopic sensory scale

The following article appeared in
Journal of the Acoustical Society of America 88: 97–100
and may be found at
http://scitation.aip.org/content/asa/journal/jasa/88/1/10.1121/1.399849.
Copyright (1990) Acoustical Society of America. This article may be downloaded for
personal use only. Any other use requires prior permission of the author and the Acoustical
Society of America.
Hartmut Traunmüller
Institutionen för lingvistik, Stockholms universitet, S-106 91 Stockholm, Sweden
(Received 16 August 1989; accepted for publication 20 February 1990)
Accuracy and simplicity of analytical expressions for the relations between frequency and critical bandwidth as well as critical-band rate (in Bark) are assessed for the purpose of applications in speech perception research and in speech technology. The equivalent rectangular bandwidth (ERB) is seen as a measure of frequency resolution, while the classical critical-band rate is considered a measure of tonotopic position. For the conversion of frequency to critical-band rate, and vice versa, the inversible formula z = [26.81/(1 + 1960/f)] - 0.53 is proposed. Within the frequency range of the perceptually essential vowel formants (0.2-6.7 kHz), it agrees to within ±0.05 Bark with the Bark scale, originally published in the form of a table.
PACS numbers: 43.71.Cq, 43.72.Ar, 43.66.Fe
INTRODUCTION
Two processes are generally assumed to contribute to auditory frequency resolution. First, the hearing system is capable of performing an "oscillographic" analysis of the set of neural signals originating in the cochlea. This process is limited to frequencies that can be resolved in the pattern of neural responses. While single neurons are not likely to fire more frequently than 500 times per second even at high stimulus intensities, frequencies between 0.5 and 1.5 kHz can still be handled in the temporal domain, albeit less efficiently, on the basis of the signals from a large number of neurons. The capability and limitations of a frequency analysis in the temporal domain are demonstrated vividly by cochlear implant patients whose sole auditory input is an undifferentiated electrical stimulation of the auditory nerve.

The second process covers the whole auditory frequency range. Any sound entering a normal functioning cochlea is subject to a spectral analysis, resulting in a frequency-to-place transformation. The cochlea can be regarded as a bank of filters whose outputs are ordered tonotopically, with the filters closest to the base responding maximally to the highest frequencies. The tonotopic order is known to be maintained in the structure of the neural network at higher levels in the hearing system.

The "notch-noise method" has often been used in investigations of auditory frequency selectivity. It involves the determination of the detection threshold for a sinusoid, centered in a spectral notch of a noise, as a function of the width of the notch. On the basis of results obtained with this method, auditory frequency selectivity can be described in terms of the equivalent rectangular bandwidth (ERB) as a function of center frequency (Moore and Glasberg, 1983). Since the two processes mentioned above both contribute to the detection of the sinusoid, the ERB, or ERB rate, should not be taken as a measure of the tonotopic scale as such.

A quantity related to the ERB, though not identical with it, is the classical critical bandwidth (CB) (Zwicker et al., 1957). Measurement of the CB typically involves loudness summation experiments. Different summation rules have been found to hold for auditory stimuli, depending on whether their frequency components are separated by more or less than the CB. The CB and the ERB have been found to be proportional and equivalent for center frequencies above 500 Hz. For lower frequencies, there is a discrepancy, as shown in Fig. 1. In this range, the ERB decreases with decreasing center frequency, while the CB remains close to constant. The discrepancy can be explained by the reasonable assumption that the analysis within the temporal domain is irrelevant to loudness summation as long as loudness variations are not audible as such, while it contributes substantially to frequency resolution for f < 500 Hz. Consequently, the CB should not be taken as a measure of frequency resolution, but CB rate may be taken as a measure of the tonotopic sensory scale.

FIG. 1. Equivalent rectangular bandwidth, according to the formula $B = 6.23 \times 10^{-6} f^2 + 9.339 \times 10^{-2} f + 28.52$, given by Moore and Glasberg (1983) (curve), and critical bandwidth, according to Zwicker's (1961) table (marks), as a function of frequency.

In the familiar CB-rate scale (see Fig. 2), the CB has been chosen to serve as a natural unit of the tonotopic sensory scale. Standard values for the relation between frequency f and CB rate z have been proposed by Zwicker (1961) in the form of a table. The CB-rate scale has been applied extensively in research on psychoacoustics and speech perception. For most of these applications, it would be more convenient to have the relation between z and f specified in the form of an equation instead of a table. Several equations that approximate the tabulated values have also been published (Tjomov, 1971; Schroeder, 1977; Zwicker and Terhardt, 1980; Traunmüller, 1983). In the following, the error functions of these equations will be compared.

Recent studies of speech sounds suggest that the tonotopic distances (CB-rate differences) between prominent peaks in their spectra are fundamental to the perception of their phonetic quality. More specifically, it has been suggested that the spectral peaks shaped by the formants and the fundamental have the same relative tonotopic locations in linguistically identical vowels uttered by speakers different in age and sex (Traunmüller, 1983, 1988; Syrdal and Gopal, 1986). While differences in speaker size appear to be reflected in a tonotopic translation of the spectral peaks, differences in vocal effort appear to be reflected in a linear tonotopic compression/expansion (Traunmüller, 1988). In order to test these hypotheses, both in theory and by means of speech synthesis, a convenient and accurate method of conversion from frequency to CB rate, and vice versa, is needed.

Our requirements include that the function have a simple inverse and that it be accurate preferably to within ±0.05 Bark in the range of essential vowel formant frequencies of men, women, and children. This rigorous claim for accuracy prevents the introduction of any avoidable error in addition to that inherent in the table (Zwicker, 1961). However, it should be noticed that the absolute width of the critical band, and its definition, is irrelevant to the applications we have in mind, as long as the obtained scales remain proportional.

J. Acoust. Soc. Am. 88 (1), July 1990

I. ANALYTICAL EXPRESSIONS
A. Expressions for critical-band rate
In rough approximation, the relation between f and z is linear for f < 500 Hz (z = f/100) and logarithmic for higher frequencies. Figure 3(a) shows the error functions of two logarithmic approximations to the CB scale. One of these, Eq. (1), has been suggested by Zwicker and Terhardt (1980). It gives values that agree with the tabulated ones to within ±0.25 Bark in the range 0.6 < f < 7.2 kHz. The other approximation, Eq. (2), satisfies our stricter standards of no more than ±0.05-Bark deviation at the cost of a reduction in the range of validity, to 1.0 < f < 3.6 kHz:
$$z = 14.2 \log_{10}(f/1000) + 8.7, \tag{1}$$
$$z = 6.578 \ln(f) - 36.99. \tag{2}$$
In these and in all the following equations, frequency f is to be expressed in Hz and CB rate z in CB units (Bark).
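
As an illustration only, Eqs. (1) and (2) can be written as two small Python functions. This is a minimal sketch and not part of the original paper: the function names are hypothetical, and the logarithm in Eq. (1) is assumed to be base 10, the usual reading of the Zwicker and Terhardt formula.

```python
import math


def bark_log_eq1(f_hz: float) -> float:
    """Eq. (1): base-10 logarithmic approximation (assumed base 10).

    Stated range of validity: 0.6 kHz < f < 7.2 kHz.
    """
    return 14.2 * math.log10(f_hz / 1000.0) + 8.7


def bark_log_eq2(f_hz: float) -> float:
    """Eq. (2): natural-logarithm approximation.

    Stated range of validity: 1.0 kHz < f < 3.6 kHz.
    """
    return 6.578 * math.log(f_hz) - 36.99


if __name__ == "__main__":
    # Compare both approximations inside the narrower range of Eq. (2).
    for f in (1000.0, 2000.0, 3000.0):
        print(f, round(bark_log_eq1(f), 2), round(bark_log_eq2(f), 2))
```

Both functions expect f in Hz and return z in Bark. Neither function checks the range of validity, so values outside the stated ranges should be treated with caution.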
FIG. 2. Critical-band rate z as a function of frequency f. The plus sign (+) represents data from Zwicker (1961). The curve corresponds to Eq. (6).

A mathematical function that is linear at one extreme and logarithmic at the other extreme, the sinus-hyperbolicus function, has been used by Tjomov (1971), Eq. (3), and by Schroeder (1977), Eq. (4), to calculate CB rate. The error functions of both equations are shown in Fig. 3(b).
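$$f = 600 \sinh(z/6.7) + 20, \tag{3}$$
$$z = 6.7 \ln\left\{ \frac{f - 20}{600} + \left[ \left( \frac{f - 20}{600} \right)^2 + 1 \right]^{1/2} \right\} \quad \text{(inverse)},$$
$$f = 650 \sinh(z/7), \tag{4}$$
$$z = 7 \ln\left\{ \frac{f}{650} + \left[ \left( \frac{f}{650} \right)^2 + 1 \right]^{1/2} \right\} \quad \text{(inverse)}.$$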
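
The inverses of Eqs. (3) and (4) are inverse hyperbolic sines, since asinh(x) = ln[x + (x^2 + 1)^(1/2)]. The following Python sketch uses that identity. The function names are hypothetical and chosen only for this illustration.

```python
import math


def tjomov_bark_to_hz(z: float) -> float:
    """Eq. (3): frequency in Hz from CB rate in Bark (Tjomov, 1971)."""
    return 600.0 * math.sinh(z / 6.7) + 20.0


def tjomov_hz_to_bark(f_hz: float) -> float:
    """Inverse of Eq. (3), written with asinh instead of the explicit log form."""
    return 6.7 * math.asinh((f_hz - 20.0) / 600.0)


def schroeder_bark_to_hz(z: float) -> float:
    """Eq. (4): frequency in Hz from CB rate in Bark (Schroeder, 1977)."""
    return 650.0 * math.sinh(z / 7.0)


def schroeder_hz_to_bark(f_hz: float) -> float:
    """Inverse of Eq. (4), written with asinh instead of the explicit log form."""
    return 7.0 * math.asinh(f_hz / 650.0)


if __name__ == "__main__":
    # Round trip: Bark -> Hz -> Bark should return the starting value.
    for z in (2.0, 8.5, 15.0):
        print(z, tjomov_hz_to_bark(tjomov_bark_to_hz(z)),
              schroeder_hz_to_bark(schroeder_bark_to_hz(z)))
```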
As compared with the tabulated values, Tjomov's equation (3) is accurate to within +0.03 to -0.28 Bark for f < 4.5 kHz and Schroeder's equation (4) to within ±0.13 Bark for f < 4.0 kHz. These equations are accurate enough for some applications in which frequency components above 4 kHz may be neglected, as they are in some systems of telephonic communication.

Approximations covering the whole auditory frequency range can be achieved in various ways by appropriate combinations of mathematical functions. For the most part, however, this yields equations that lack a simple inverse. The most accurate of the equations given by Zwicker and Terhardt (1980),
$$z = 13 \arctan(0.00076 f) + 3.5 \arctan\left[ (f/7500)^2 \right], \tag{5}$$
is of this kind. It agrees with the table to within +0.20 to -0.25 Bark over the whole range of auditory perception [see Fig. 3(c)]. The waviness of the error function tells us, however, that there is room for improvement. The equation also clearly falls short of our standards. If, e.g., we want to compare the tonotopic distances between two pairs of spectral peaks, we might obtain an error of up to 0.9 Bark.
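
To make the lack of a simple inverse concrete, the sketch below evaluates Eq. (5) and inverts it numerically. The bisection helper is not from the paper; it is a hypothetical illustration that relies on Eq. (5) increasing monotonically with f, and the 20 kHz upper search limit is an assumption.

```python
import math


def bark_eq5(f_hz: float) -> float:
    """Eq. (5): Zwicker and Terhardt (1980) approximation, f in Hz, z in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def invert_eq5(z: float, f_max_hz: float = 20000.0, iterations: int = 60) -> float:
    """Hypothetical numerical inverse of Eq. (5) by bisection on [0, f_max_hz].

    Works because bark_eq5 is monotonically increasing in f.
    """
    low, high = 0.0, f_max_hz
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if bark_eq5(mid) < z:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    z = bark_eq5(1000.0)
    print(z, invert_eq5(z))
```

An equation with a closed-form inverse, such as Eq. (3) or Eq. (4), needs no such iteration.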
FIG. 3. (a)-(d) Error functions of various approximations of the CB-rate scale. The error is defined as the difference between the calculated value and that in Zwicker's (1961) table. It is plotted in steps of 0.5 Bark for each frequency value in that table. (a) Logarithmic approximations: curve with marks, Eq. (1) [given by Zwicker and Terhardt (1980)]; curve without marks, Eq. (2). (b) Sinus-hyperbolicus approximations: lower curve, Eq. (3) [given by Tjomov (1971)]; upper curve, Eq. (4) [given by Schroeder (1977)]. (c) An overall approximation, Eq. (5), given by Zwicker and Terhardt (1980). (d) A logistic "growth-curve" approximation: lower curve with error scale at the left, Eq. (6) [given by Traunmüller (1983)]; upper curve, shown vertically displaced, with error scale at the right, Eq. (6) with corrections (7) and (8).

An approximation that has a simple inverse and meets our standards is achieved by considering z to be related to log(f) by a logistic function, also known as "growth curve." Such an approximation, Eq. (6), has been proposed by Traunmüller (1983). Its error function is shown in Fig. 3(d):
$$z = \frac{26.81 f}{1960 + f} - 0.53, \tag{6}$$
$$f = \frac{1960 (z + 0.53)}{26.28 - z} \quad \text{(inverse)}.$$
The values obtained with Eq. (6) deviate from the tabulated ones by less than ±0.05 Bark for 0.2 < f < 6.7 kHz.
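
Eq. (6) and its inverse translate directly into code. The following Python sketch is an illustration only: the function names and the example formant frequencies are assumptions, not values from the paper.

```python
def hz_to_bark(f_hz: float) -> float:
    """Eq. (6): CB rate z in Bark from frequency f in Hz."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Inverse of Eq. (6): frequency f in Hz from CB rate z in Bark.

    Only meaningful for z < 26.28, where the denominator stays positive.
    """
    return 1960.0 * (z + 0.53) / (26.28 - z)


if __name__ == "__main__":
    # Tonotopic distance between two hypothetical spectral peaks.
    f1_hz, f2_hz = 500.0, 1500.0
    distance_bark = hz_to_bark(f2_hz) - hz_to_bark(f1_hz)
    print(distance_bark)

    # Round trip: Hz -> Bark -> Hz.
    print(bark_to_hz(hz_to_bark(f1_hz)))
```

The round trip shows the practical benefit of the simple inverse: conversion in either direction is a single arithmetic expression.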
At the low-frequency end of the scale, the deviation from the table (Zwicker, 1961) sums up to -0.53 Bark for f = 0 Hz (-0.26 Bark for f = 20 Hz). At least in part, this deviation is due to biased rounding of the bandwidth values in Zwicker's table. For frequencies below 400 Hz, the standard width of the critical band was set uniformly equal to 100 Hz. This appears to have been done in order to obtain the mnemonically simple relation z = f/100. The original bandwidth data (Zwicker et al., 1957) indicate B ≈ 90 Hz for the lower frequencies in that range. The values listed in the table for f < 100 Hz are particularly questionable because they can hardly be said to be based on any reliable experimental evidence. Equation (6) may represent the tonotopic scale well enough down to the lowest frequencies for which it can be determined experimentally. The deviation at the high-frequency end of the scale remains unaccounted for.
Calculating z with Eq. (6), close agreement with the table can be achieved over the whole auditory frequency range by added corrections, bending the error function straight at both ends of the scale, in the following way:
$$\text{for calculated } z < 2.0 \text{ Bark:} \quad z' = z + 0.15 (2 - z), \tag{7}$$
$$\text{for calculated } z > 20.1 \text{ Bark:} \quad z' = z + 0.22 (z - 20.1). \tag{8}$$
Since this is an easily inverted procedure, the calculation of f for a given z is not a problem. The error function obtained with these corrections is also shown in Fig. 3(d). The values calculated in this way agree with the table for f > 100 Hz to within ±0.05 Bark. Correction (7), however, simulates also the above-mentioned bias at low frequencies.
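
The corrected conversion and its inverse can be sketched as follows. The inverse is obtained here by solving corrections (7) and (8) for z, a derivation made for this illustration rather than quoted from the paper. Both corrections leave z unchanged at their thresholds, so the same limits (2.0 and 20.1 Bark) are assumed to apply to z'. All function names are hypothetical.

```python
def hz_to_bark_corrected(f_hz: float) -> float:
    """Eq. (6) followed by the end corrections (7) and (8)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53  # Eq. (6)
    if z < 2.0:
        z += 0.15 * (2.0 - z)    # correction (7)
    elif z > 20.1:
        z += 0.22 * (z - 20.1)   # correction (8)
    return z


def bark_corrected_to_hz(z_corrected: float) -> float:
    """Undo corrections (7) and (8), then apply the inverse of Eq. (6)."""
    if z_corrected < 2.0:
        # z' = z + 0.15 (2 - z) = 0.85 z + 0.3
        z = (z_corrected - 0.3) / 0.85
    elif z_corrected > 20.1:
        # z' = z + 0.22 (z - 20.1) = 1.22 z - 4.422
        z = (z_corrected + 4.422) / 1.22
    else:
        z = z_corrected
    return 1960.0 * (z + 0.53) / (26.28 - z)  # inverse of Eq. (6)


if __name__ == "__main__":
    # Round trip at a low, a middle, and a high frequency.
    for f in (50.0, 1000.0, 10000.0):
        print(f, bark_corrected_to_hz(hz_to_bark_corrected(f)))
```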
B. Expressions for critical bandwidth
Zwicker and Terhardt (1980) proposed the equation
$$B = 25 + 75 (1 + 1.4 \times 10^{-6} f^2)^{0.69} \tag{9}$$
to calculate critical bandwidth B as a function of center frequency f. While Eq. (9) is very accurate, it cannot easily be integrated to obtain CB rate. The authors' equation for CB rate (5) is not compatible with Eq. (9).

Proceeding from Eq. (6), critical bandwidths B can be calculated as
$$B = \frac{52548}{z^2 - 52.56 z + 690.39} \tag{10}$$
for critical bands centered at z obtained by Eq. (6) without corrections. The values calculated by Eq. (10) agree with Zwicker's table to within ±6% for 0.27 < f < 5.8 kHz. Within that range, the error function is similar to that obtained by Eq. (9). The error functions of both equations are shown in Fig. 4.

FIG. 4. Error functions for critical bandwidth calculated with Eq. (9) (curve with marks) and Eq. (10) (curve without marks), as compared with Zwicker's (1961) table values (see also Fig. 1).

ACKNOWLEDGMENT
The preparation of this paper has been supported by a grant from HSFR, the Swedish Council for Research in the Humanities and Social Sciences.

Moore, B. C. J., and Glasberg, B. R. (1983). "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
Schroeder, M. R. (1977). "Recognition of complex acoustic signals," in Life Sciences Research Report 5 (Dahlem Konferenzen), edited by T. H. Bullock (Abakon Verlag, Berlin), pp. 323-328.
Syrdal, A. K., and Gopal, H. S. (1986). "A perceptual model of vowel recognition based on the auditory representation of American English vowels," J. Acoust. Soc. Am. 79, 1086-1100.
Tjomov, V. L. (1971). "A model to describe the results of psychoacoustical experiments on steady-state stimuli," in Analiz Rechevykh Signalov Chelovekom, edited by G. V. Gershuni (Nauka, Leningrad), pp. 36-49.
Traunmüller, H. (1983). "On vowels: Perception of spectral features, related aspects of production and sociophonetic dimensions," Ph.D. thesis, University of Stockholm.
Traunmüller, H. (1988). "Paralinguistic variation and invariance in the characteristic frequencies of vowels," Phonetica 45, 1-29.
Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.
Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). "Critical bandwidth in loudness summation," J. Acoust. Soc. Am. 29, 548-557.
Zwicker, E., and Terhardt, E. (1980). "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Am. 68, 1523-1524.
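
As a supplementary illustration of the bandwidth expressions in Sec. I B, the sketch below evaluates Eq. (9), Eq. (10), and the ERB formula quoted in the caption of Fig. 1. It is a minimal sketch under stated assumptions: the function names are hypothetical, and the comparison frequencies are arbitrary choices, not data from the paper.

```python
def cb_eq9(f_hz: float) -> float:
    """Eq. (9): critical bandwidth in Hz (Zwicker and Terhardt, 1980)."""
    return 25.0 + 75.0 * (1.0 + 1.4e-6 * f_hz ** 2) ** 0.69


def cb_eq10(z: float) -> float:
    """Eq. (10): critical bandwidth in Hz for a band centered at z (Bark).

    z is expected to come from Eq. (6) without corrections.
    """
    return 52548.0 / (z * z - 52.56 * z + 690.39)


def erb_moore_glasberg(f_hz: float) -> float:
    """ERB in Hz, formula quoted in the caption of Fig. 1 (Moore and Glasberg, 1983)."""
    return 6.23e-6 * f_hz ** 2 + 9.339e-2 * f_hz + 28.52


def hz_to_bark(f_hz: float) -> float:
    """Eq. (6), repeated here so that the example is self-contained."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


if __name__ == "__main__":
    # Arbitrary comparison frequencies, chosen for illustration only.
    for f in (100.0, 500.0, 1000.0, 4000.0):
        print(f, cb_eq9(f), cb_eq10(hz_to_bark(f)), erb_moore_glasberg(f))
```

At low frequencies the ERB values fall well below both CB estimates, which mirrors the discrepancy described in the Introduction.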