Perception of aperiodicities in synthetically generated voices

James Hillenbrand a)
RIT Research Corporation and Department of Computer Science, Rochester Institute of Technology, 75 Highpower Road, Rochester, New York 14623-3435
(Received 2 December 1987; accepted for publication 15 February 1988)
The purpose of this study was to investigate univariate relationships between perceived dysphonia and variation in pitch perturbation, amplitude perturbation, and additive noise. A time-domain, pitch-synchronous synthesis technique was used to generate sustained vowels varying in each of the three acoustic dimensions. A panel of trained listeners provided direct magnitude estimates of roughness in the case of the stimuli varying in pitch and amplitude perturbation, and breathiness in the case of the stimuli varying in additive noise. Very strong relationships were found between perceived roughness and either pitch or amplitude perturbation. However, unlike results reported previously for nonspeech stimuli, the subjective quality associated with pitch perturbation was quite different from that associated with amplitude perturbation. Results also showed that perceived roughness was affected not only by the amount of perturbation, but also by the degree of correlation between adjacent pitch or amplitude values. A strong relationship was found between perceived breathiness and signal-to-noise ratio. Contrary to previous findings, there was no interaction between signal-to-noise ratio and the amount of high-frequency energy in the periodic component of the stimulus: Stimuli with similar signal-to-noise ratios received similar ratings, regardless of differences in the spectral slope of the periodic component.
PACS numbers: 43.70.Dn, 43.71.Gv, 43.72.Ja
a) Present address: Department of Speech Pathology and Audiology, Western Michigan University, Kalamazoo, MI 49008.
J. Acoust. Soc. Am. 83 (6), June 1988, p. 2361. 0001-4966/88/062361-11$00.80. © 1988 Acoustical Society of America.
INTRODUCTION
There is very good agreement among voice clinicians and voice scientists of the need for objective measurements that would serve to quantify both the type and severity of dysphonia that accompanies a wide range of laryngeal disorders (e.g., Aronson, 1980; Davis, 1981; Fritzell et al., 1977; Hammarberg et al., 1980; Hirano, 1981; Jensen, 1965). In an effort to achieve this goal, a number of investigators have made detailed acoustic measurements of a wide variety of acoustic properties associated with laryngeal disorders. Many of the acoustic techniques that have been investigated are based on the finding that dysphonic voices tend to show larger than normal deviations from perfect periodicity. As a number of investigators have pointed out, however, acoustic measurements are useful only if they can be related to specific diagnostic categories, or to meaningful perceptual dimensions. For example, Hammarberg et al. (1980) commented that "... acoustic measurements do not make sense on their own ... [but] must be related to perceptual characteristics in order to be clinically useful" (p. 32).
The task of relating acoustic properties to perceptual dimensions has not proven to be a simple one. The most common approach to this problem has involved the use of correlational techniques to determine relationships between the perceived degree and/or type of dysphonia and variation in specific acoustic dimensions (e.g., Kempster, 1984; Kempster and Kistler, 1984; Kojima et al., 1980; Murry et al., 1977; Prosek et al., 1984; Smith et al., 1978; Yanagihara, 1967; Yumoto et al., 1982). As will be discussed in greater detail below, the interpretability of these studies has been limited by the inability to control variation in specific acoustic dimensions in naturally occurring voices. Of particular importance are the inability to control the range of variation on particular acoustic dimensions and the degree of intercorrelation among individual acoustic properties.
The present study represents an attempt to address this problem by studying the perceptual characteristics of synthetically generated voices. The three parameters chosen for study were pitch perturbation, amplitude perturbation, and additive noise. The long-term goal of this work is to learn how these and other acoustic parameters combine to affect a listener's overall impression of vocal quality. As an initial step toward this goal, the present study was designed to examine the univariate relationships between variation in each of these acoustic dimensions and the perception of dysphonic vocal quality.
A. Pitch perturbation
Pitch perturbation, or "vocal jitter," is defined as cycle-to-cycle variation in voice fundamental frequency (F0). All human voices contain a certain amount of vocal jitter, and synthesis studies have shown that a minimum amount of jitter is required for a voice to sound natural (Gill, 1961; Holmes, 1962; Rozsypal and Millar, 1979; Schroeder, 1961). Jitter values in normal voices are generally less than about 1.0% (Hollien et al., 1973; Horii, 1979, 1982; Jacob, 1968; Kempster, 1984; Simon, 1927). Lieberman (1963) was the first to report that dysphonic voices tend to show unusually large cycle-to-cycle variations in F0. This basic finding has been confirmed in several studies using a wide variety of analysis and computational procedures (Davis, 1976; Deal and Emanuel, 1978; Hecker and Kruel, 1971; Kitajima et al., 1975; Koike, 1973; see Murry and Doherty, 1980, for a negative finding).
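As a rough illustration of the jitter definition above, the following sketch computes a percent-jitter figure from a list of fundamental periods. The normalization (mean absolute difference between adjacent periods, divided by the mean period) is an assumption made for illustration; the function name and the sample period values are hypothetical and are not taken from the studies cited.

```python
def mean_jitter_percent(periods_ms):
    """Mean absolute difference between adjacent periods, as a percent of the mean period.

    Assumption: this normalization is one common convention, chosen here only
    to illustrate cycle-to-cycle variation expressed in percent.
    """
    if len(periods_ms) < 2:
        raise ValueError("need at least two fundamental periods")
    # Absolute cycle-to-cycle differences between adjacent periods.
    diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_ms) / len(periods_ms)
    return 100.0 * mean_abs_diff / mean_period


steady = [8.0] * 5             # constant 8-ms period: no jitter
wobbly = [7.9, 8.1, 7.9, 8.1]  # hypothetical alternating periods
print(mean_jitter_percent(steady))
print(round(mean_jitter_percent(wobbly), 3))
```

A perfectly periodic sequence gives zero, and larger cycle-to-cycle differences raise the value in proportion.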
Jitter does not appear to be a factor controlling perceived roughness in naturally produced normal voices (Heiberger and Horii, 1982), but there is some evidence suggesting that jitter is correlated with perceived roughness in disordered voices. Weak- to moderate-strength correlations between pitch perturbation and perceived roughness were reported in studies of disordered speakers by Deal and Emanuel (1978), Lieberman (1963), and Takahashi and Koike (1975). Smith et al. (1978) reported a relatively weak (r = 0.55) nonsignificant correlation between jitter and perceived roughness in a group of esophageal speakers.
As mentioned previously, two important interpretive limitations of these studies using naturally produced voices concern the inability to control jitter values without affecting values on other dimensions and the inability to control the range of variation on any of the acoustic dimensions. The first of these problems is especially important because several acoustic measurement studies of disordered voices have reported significant intercorrelations among individual acoustic parameters such as pitch perturbation, amplitude perturbation, and additive noise (Davis, 1976; Deal and Emanuel, 1978; Heiberger and Horii, 1982; Horii, 1980; Kempster, 1984; Kempster and Kistler, 1984; Yumoto et al., 1984). Further, a recent methodological study suggests that the acoustic analysis techniques that have been used to measure perturbation are not always able to discriminate among various sources of aperiodicity (Hillenbrand, 1987).
Because of the difficulty in interpreting the results of perception studies using naturally produced voices, several studies have examined the perception of synthetically generated signals. Wendahl (1963, 1966a,b) synthesized sawtooth waves varying in jitter and mean F0. Results of paired comparison listening tests showed that roughness judgments correlated strongly with pitch perturbation. Wendahl's results also showed that, for a given jitter size, a signal with a low F0 tended to sound more rough than a signal with a high F0 (see, also, Coleman, 1969), a finding which has also been reported for naturally produced voices (Deal and Emanuel, 1978; Heiberger and Horii, 1982). Heiberger and Horii (1982) also reported a strong correlation between jitter and perceived roughness using synthesized triangular waves. To date, no study has examined the relationship between jitter and perceived roughness in synthetically generated voice signals.
B. Amplitude perturbation
Amplitude perturbation, or "vocal shimmer," is defined as cycle-to-cycle variation in voice amplitude. Shimmer values in normal voices are generally less than about 0.7 dB (Horii, 1980, 1982; Kempster, 1984; Robbins, 1981). Using a calculation method based on successive differences from a three-point moving average, Kitajima and Gould (1976) reported that amplitude perturbation values from a group of dysphonic subjects were significantly larger than those for a nondisordered control group. Similar findings were reported by Davis (1981) using slightly different calculation methods.
Relatively little is known about the relationship between amplitude perturbation and the perception of dysphonic vocal quality in naturally produced voices. Takahashi and Koike (1975) reported that "breathiness" ratings correlated with amplitude perturbation and "roughness" ratings correlated both with pitch perturbation (r = 0.55) and amplitude perturbation (r = 0.72). Deal and Emanuel (1978) made nonsequential measures of pitch and amplitude perturbation from a group of normal speakers who were asked to simulate rough voice quality and from a group of speakers with various laryngeal pathologies. Results showed significant correlations between listener ratings of roughness and measures of both pitch and amplitude perturbation. On the basis of multiple regression analyses, Deal and Emanuel concluded that, "... cyclic peak amplitude variability may provide a better index of perceived roughness than cyclic period variation ..." (p. 250). A reanalysis of these results by Nichols (1979) reached the same conclusion using partial correlation techniques.
Studies by Wendahl (1966a,b) and Heiberger and Horii (1982) reported strong relationships between amplitude perturbation and perceived roughness in synthetically generated nonspeech signals. Wendahl reported that the roughness percept resulting from the introduction of amplitude perturbation in sawtooth waveforms was very similar to that associated with pitch perturbation:
"... it is interesting to note that the roughness generated by these different procedures [i.e., pitch and amplitude perturbation] results in such similar auditory experiences. Some highly trained listeners were able to distinguish between the two types of stimuli, but the writer, who has had years of listening experience with such stimuli, is able to discriminate between the program types only at the extremes of the continuum ..." (Wendahl, 1966b, p. 106).
Heiberger and Horii studied perceptual trading relations between pitch and amplitude perturbation by synthesizing triangular waves varying in jitter, shimmer, or both jitter and shimmer. The results suggested that the perceptual effects of jitter and shimmer are, in some sense, equivalent. For example, a stimulus with 2.0% jitter was judged to be approximately equivalent in roughness to a 1.0-dB shimmer stimulus. The results also suggested that the effects of jitter and shimmer are additive; for example, a stimulus containing both 2.0% jitter and 1.5-dB shimmer sounded more rough than either a 2.0% jitter stimulus or a 1.5-dB shimmer stimulus.
C. Additive noise
The term additive noise is generally used to refer to the acoustic by-product of turbulence generated at the glottis. A number of studies have reported that noise levels in dysphonic voices tend to be higher than those in normal voices and that noise measurements correlate with subjective ratings of dysphonia (Deal and Emanuel, 1978; Emanuel and Sansone, 1969; Kojima et al., 1980; Lively and Emanuel, 1970; Sansone and Emanuel, 1970; Yanagihara, 1967; Yumoto et al., 1982, 1984). A straightforward interpretation of the perceptual effects of additive noise is complicated by the presence of relatively strong intercorrelations among measures of perturbation and additive noise (Deal and Emanuel, 1978; Kempster, 1984; Kempster and Kistler, 1984; Yumoto et al., 1984) and by the presence of very strong measurement interactions among these variables (Hillenbrand, 1987).
Very little work has been done on the synthesis and perception of stimuli varying in additive noise. In a study that is described very briefly, Yanagihara (1967) mixed filtered and unfiltered naturally produced sustained vowels with various types of bandpass filtered noise. The signal-to-noise ratios were held constant and the primary purpose of the study was to determine the relationship between perceived dysphonia and the spectral properties of the periodic and aperiodic components of the stimuli. Yanagihara reported a strong relationship between the loss of high-frequency harmonics and perceived dysphonia: "Even if the relative intensity of the noise components and the harmonic components remain unchanged, the loss of high-frequency harmonics results in an increase of the degree of perceived dysphonia" (p. 538). Yanagihara's results are consistent with Froekjaer-Jensen and Prytz (1976), who reported an increase in high-frequency energy in long-term average spectrum measurements following treatment for voice disorders (see, also, Hammarberg et al., 1980; Fritzell et al., 1977; Gauffin and Sundberg, 1977).
The present study was designed to extend the work of Wendahl (1963, 1966a,b) and Heiberger and Horii (1982) in studying the relations between perturbation and perceived roughness in synthetically generated signals and to extend the work of Yanagihara (1967) in studying the perceptual effects of additive noise. The experiments on the perception of perturbation were designed primarily to address two limitations of previous research on jitter and shimmer synthesis. First, the present study was designed to examine roughness-perturbation relations in synthetically generated voices, rather than the nonspeech waveforms used in the studies by Wendahl and Heiberger and Horii. Second, unlike the previous perturbation synthesis studies, the present study used synthesis techniques that attempted to model the sequential properties of cycle-to-cycle pitch and amplitude change in naturally produced voices. The purpose of the additive noise experiments was to examine the relationship between noise level and perceived breathiness over a wide range of signal-to-noise ratios and to test for possible interactions between signal-to-noise ratio and energy levels in high-frequency harmonics over a broad range of signal-to-noise ratios.
I. EXPERIMENT 1: PERCEPTION OF PITCH PERTURBATION
A. Methods
1. Synthesis technique
Stimuli for all the experiments described in this article were generated with a pitch-synchronous synthesis program called VSYN (Wilde and Martens, 1985; modeled after Wilde et al., 1986). The program was designed to generate sustained vowels differing in jitter, shimmer, additive noise, and mean F0. The first step in the synthesis process involved using Klatt's (1980) formant synthesizer to generate a single pitch pulse with formant frequency characteristics appropriate for whatever vowel quality is desired. Stimuli for the present study used a formant frequency pattern appropriate for the vowel [a] (F1 = 720, F2 = 1240, F3 = 2400, F4 = 3300, F5 = 3700 Hz). The pitch pulse was generated with a 40-kHz sample frequency, 12 bits of amplitude resolution, and consisted of 512 data points (12.8 ms). As shown in Fig. 1, sustained vowels were synthesized by stringing together the individual damped oscillations produced by the Klatt synthesizer. The F0 was controlled by adjusting the interval between the onset of one damped oscillation and the onset of the next oscillation. For fundamental periods that are less than 12.8 ms, the end of one damped oscillation will overlap with the beginning of the next. This effect was accounted for simply by adding the tail end of one damped oscillation into the beginning of the next. In Fig. 1, the onset-to-onset interval was fixed at 8 ms, producing a vowel with a constant F0 of 125 Hz. Pitch perturbation was controlled by introducing specific amounts of variability in the onset-to-onset intervals. Although not used in experiment 1, VSYN controls amplitude perturbation by scaling each pitch pulse individually to achieve the desired amount of amplitude variability. Additive noise can be controlled by the appropriate scaling and point-for-point addition of a separate noise signal.
Using a method such as this to synthesize stimuli that differ only in pitch perturbation is problematic since, as discussed in detail in a related article (Hillenbrand, 1987), amplitude perturbation is produced as a side effect of pitch perturbation. For the stimuli that were intended to differ only in pitch perturbation, this artifact was removed by a separate program that measured the intensity of individual pitch pulses and scaled all pitch pulses to the same rms value.
2. Random-number generation
A random-number generator of some type is needed to produce the sequence of F0 and/or pitch-pulse amplitude values that control the synthesizer. The random-number
FIG. 1. Pitch-synchronous, time-domain synthesis technique used by VSYN.
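The pitch-synchronous procedure described under "Synthesis technique" can be sketched as a simple overlap-add loop. This is a minimal illustration under stated assumptions, not the VSYN program: `toy_pulse` is a hypothetical single damped sinusoid standing in for the Klatt-generated pitch pulse, and the function names and the decay constant are invented for the example. Only the 40-kHz sample rate, the 512-point pulse length, and the 8-ms interval come from the text.

```python
import math

SAMPLE_RATE = 40_000  # Hz, as stated in the text
PULSE_LEN = 512       # samples per pitch pulse (12.8 ms at 40 kHz)


def toy_pulse(n=PULSE_LEN, freq_hz=720.0, decay_per_s=300.0):
    """Hypothetical stand-in for the Klatt pitch pulse: a single damped sinusoid."""
    return [
        math.exp(-decay_per_s * i / SAMPLE_RATE)
        * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def overlap_add(pulse, onset_intervals, gains=None):
    """Start one copy of `pulse` per interval; overlapping tails are summed.

    onset_intervals: onset-to-onset spacing in samples (varying it gives jitter).
    gains: optional per-pulse scale factors (varying them gives shimmer).
    """
    if not onset_intervals:
        raise ValueError("need at least one onset interval")
    if gains is None:
        gains = [1.0] * len(onset_intervals)
    # Convert the interval sequence into absolute onset positions.
    onsets, t = [], 0
    for interval in onset_intervals:
        onsets.append(t)
        t += interval
    out = [0.0] * (onsets[-1] + len(pulse))
    # Add each scaled pulse into the output; tails overlap the next pulse.
    for onset, gain in zip(onsets, gains):
        for i, sample in enumerate(pulse):
            out[onset + i] += gain * sample
    return out


# 8-ms onset-to-onset interval = 320 samples at 40 kHz, i.e. a constant 125-Hz F0.
vowel = overlap_add(toy_pulse(), [320] * 10)
```

Because each pulse (512 samples) is longer than the 320-sample interval, the tail of every pulse is summed into the start of the next one, which is the overlap handling the text describes. Passing unequal intervals or gains would introduce pitch or amplitude perturbation, respectively.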
themselves to be experienced in the evaluation and treatment of voice disorders. These same subjects participated in experiments 2-4 as well. The order of presentation of the four experiments was counterbalanced across subjects, with one exception: All subjects participated in experiment 3 (comparison of pitch and amplitude perturbation) following their participation in experiment 1 (perception of pitch perturbation) and experiment 2 (perception of amplitude perturbation).
B. Results and discussion
Results are shown in Fig. 3, which plots normalized roughness magnitude as a function of percent jitter, pooled across all ten listeners. Direct magnitude estimates were rescaled separately for each subject so that the numbers ranged from 10-90. The smooth curves are third-order polynomials that were fit to the data. Ignoring for the moment the difference between the correlated and uncorrelated stimuli, it can be seen that there is a very strong relationship between pitch perturbation and perceived roughness. The compression of the function at the high end of the jitter continuum is consistent with Heiberger and Horii (1982), who reported that, "... beyond a certain point, relatively large increases in [jitter] did not result in similarly large increases in roughness level ..." (p. 321). However, in Heiberger and Horii's nonspeech data, the change in slope occurred between jitter values of 5.0% and 10.0%, much larger than the value of approximately 2.0% found in the present study. Although this discrepancy might reflect differences in the perception of triangular waves versus the more harmonically rich voice signals used in the present study, there are two other possibilities. The stimuli used by Heiberger and Horii were higher in mean F0 (165 vs 130 Hz used in the present study) and were presented to subjects over earphones rather than a loudspeaker. Both of these differences would be expected to make the Heiberger and Horii stimuli sound less rough than the stimuli used in the present study (Wendahl, 1963, 1966a,b; Coleman, 1969; Wilde et al., 1986), which might have the effect of moving the entire roughness-perturbation function to the right.
FIG. 3. Perceived roughness as a function of jitter for correlated and uncorrelated period sequences. [Plot: roughness rating versus percent jitter.]
The other obvious feature of the data in Fig. 3 is that the stimuli generated from the correlated period sequences were perceived as more rough than the stimuli generated from the uncorrelated period sequences. The differences in roughness magnitude for a given jitter value averaged 9.3% and were highly significant (t = 26.0, df = 29, p < 0.01). It is important to note, however, that stimuli on the correlated and uncorrelated continua were equated for mean jitter (the average absolute difference in fundamental period between adjacent pitch pulses), but were not necessarily matched in terms of other calculation methods that have been used to represent pitch perturbation. For example, stimuli from the correlated continuum generally had larger values of fundamental period standard deviation (Deal and Emanuel, 1978) and larger mean jitter values when calculations were made from either a three-point moving average (Koike, 1973) or a five-point moving average (Davis, 1981).
FIG. 4. Perceived roughness as a function of jitter for correlated and uncorrelated period sequences with stimuli equated for PPQ. [Plot: roughness rating versus PPQ in microseconds.]
Figure 4 shows the roughness-perturbation function for the correlated and uncorrelated continua, but with the stimuli equated for Davis' (1981) "pitch perturbation quotient" (PPQ), which uses a five-point moving average. It can be seen that the differences in perceived roughness between the correlated and uncorrelated signals are reduced significantly, although not eliminated entirely. For the data in Fig. 4, the average difference in roughness magnitude between the correlated and uncorrelated signals was 4.5%. Very similar results were found when the correlated and uncorrelated signals were equated for the standard deviation of signed jitter
(i.e., the standard deviation of the distribution of cycle-to-cycle differences in fundamental period with the sign retained). Equating the stimuli for either fundamental period standard deviation or perturbation factor increased rather than decreased differences in roughness magnitude between the correlated and uncorrelated continua. When the stimuli were equated for fundamental period standard deviation, the average difference in roughness magnitude between correlated and uncorrelated signals was 20.4%; for the perturbation factor, the difference was 29.4%. Koike's (1973) "relative average perturbation," which uses a three-point moving average, produced results that were very similar to the mean jitter data shown in Fig. 3. It is also interesting to note that the uncorrelated signals had substantially higher values of directional jitter than the correlated signals (72.7% vs 40.6%). The fact that the correlated signals sounded more rough than the uncorrelated signals would seem to indicate that directional jitter does not play a role in roughness perception.
In general, our preliminary conclusion from the comparison between the correlated and uncorrelated signals is that the standard deviation of signed jitter, or Davis' (1981) PPQ, shows a stronger relationship to perceived roughness than other methods of representing pitch perturbation. However, none of the calculation methods that were used eliminated the difference in perceived roughness between the correlated and uncorrelated signals.
II. EXPERIMENT 2: PERCEPTION OF AMPLITUDE PERTURBATION
A. Methods
VSYN was used to synthesize two 22-member shimmer continua using methods that were analogous to those used to create the jitter continuum in experiment 1. As in the pitch perturbation experiment, one continuum was created using the modified 1/f random-number generator and the other was created using a standard white-noise generator. The stimuli along each continuum varied from 0.0-2.6 dB and were spaced at 0.1-dB increments from 0.0-1.0 dB and at 0.2-dB increments from 1.0-2.6 dB. The decision to restrict the continuum to the range below 2.6 dB was somewhat arbitrary and was based on the increasingly unnatural perceptual quality of the synthesized signals as shimmer values approached about 2.0 dB. All stimuli were 1.0 s in duration and were synthesized at 40 kHz, with a constant F0 of 130 Hz. As in the previous experiment, the stimuli were gated on and off with a 20-ms cosine function and all stimuli on the continuum were equated for overall rms intensity.
Subjects consisted of the same ten listeners who participated in experiment 1. As in the previous experiments, subjects were asked to rate the stimuli on the degree of perceived roughness. Each of the 44 stimuli was presented 16 times in pseudorandom order. The first 132 trials were considered to be practice and these data were not included in the analysis. Methods used for stimulus presentation were identical to experiment 1.
B. Results and discussion
The function relating shimmer to perceived roughness is shown in Fig. 5. The smooth curve is a second-order polynomial in the case of the correlated continuum and a fourth-order polynomial in the case of the uncorrelated continuum. As was true for the pitch-perturbation data, the signals that were produced from correlated sequences sounded more rough than signals with the same mean perturbation values that were produced from uncorrelated sequences. For the data in Fig. 5, the average difference in roughness magnitude between correlated and uncorrelated signals with the same mean shimmer value was 27.1%. This value is nearly three times larger than the difference that was observed between correlated and uncorrelated stimuli for the pitch-perturbation continua.
FIG. 5. Perceived roughness as a function of shimmer for correlated and uncorrelated pitch-pulse amplitude sequences. [Plot: roughness rating versus shimmer in dB.]
Unlike the pitch-perturbation data, this discrepancy between the correlated and uncorrelated signals does not seem to be related to the choice of perturbation calculation methods. In general, stimuli that were matched for mean shimmer tended to show very similar ratings when perturbation was measured using other calculation methods, such as pitch-pulse amplitude standard deviation, amplitude-perturbation quotient (the amplitude analog of PPQ), standard deviation of signed shimmer, and Koike's (1973) relative average perturbation.
One important finding of this experiment that cannot be observed in Fig. 5 concerns the subjective quality of the stimuli varying in amplitude perturbation. Recall that previous synthesis research with nonspeech signals suggested that amplitude perturbation produced a sensation of roughness that was virtually indistinguishable from that produced by pitch perturbation (Wendahl, 1966b). Although subjects in the present experiment were asked to rate the stimuli on
roughness, this term is almost certainly not a good description of the perceptual quality of the stimuli varying in amplitude perturbation. Unlike the results for sawtooth waves reported by Wendahl, the perceptual quality of the stimuli varying in amplitude perturbation in the present study was quite different from that produced by pitch perturbation. When asked to provide verbal descriptions of the stimuli, subjects generally commented that the signals toward the high end of the shimmer continuum had an unnatural "popping" quality. For example, one subject commented that the stimuli sounded as though they were being played through a loudspeaker with a loose wire and another compared the signals to speech played over a radio during an electrical storm. By contrast, stimuli toward the high end of the pitch-perturbation continuum are perceived as very rough; however, with the exception of the very high jitter values from the correlated continuum, the stimuli sounded as though they could have been produced by a talker with a severely disordered voice.

III. EXPERIMENT 3: COMPARISON OF PITCH AND AMPLITUDE PERTURBATION

A. Methods

1. Stimuli

The purpose of experiment 3 was to determine whether subjects could, in fact, differentiate between the effects of pitch and amplitude perturbation. The test stimuli consisted of nine signals each from the correlated pitch-perturbation continuum, the uncorrelated pitch-perturbation continuum, the correlated amplitude-perturbation continuum, and the uncorrelated amplitude-perturbation continuum. The nine stimulus values from each continuum were chosen in such a way that the spacing between stimuli was approximately even in perceptual terms, as determined by the roughness magnitude estimates. Each series of nine stimuli began with a stimulus having a perturbation value of zero. These four stimuli should have been identical and were included as a reliability check. The 36 stimuli were equated for overall rms intensity and presented over a loudspeaker using the procedures described previously.

2. Subjects and procedures

The ten subjects who had participated in experiments 1 and 2 served as listeners. Two separate identification tasks were run in counterbalanced order. One task used the correlated signals from each continuum, and the other used the uncorrelated signals. A very brief training session preceded each identification task. The training session consisted of two randomly ordered presentations of each of the 18 stimuli. Subjects were asked to press one of two keys on a terminal keyboard to indicate whether the stimulus was drawn from the jitter continuum or the shimmer continuum. Feedback was provided on each of the 36 trials. The testing sessions were identical except that feedback was not provided, and each stimulus was presented ten times in pseudorandom order.

B. Results and discussion

The results are presented in Fig. 6, which shows percent correct identification for each stimulus. With the obvious exception of stimuli with zero-perturbation values, subjects were generally able to determine whether the stimulus represented perturbations in pitch or amplitude. Identification performance improved at higher perturbation levels and was generally better for the correlated rather than uncorrelated stimuli. These results suggest that, contrary to the findings reported by Wendahl (1966b) for sawtooth waves, the subjective qualities produced by jitter and shimmer in synthetic vowels are quite different, except at very low levels of aperiodicity.

FIG. 6. Percent correct identification of stimuli from the correlated and uncorrelated jitter continua and from the correlated and uncorrelated shimmer continua. The subjects' task was to judge whether the stimulus was drawn from one of the jitter continua or one of the shimmer continua.

IV. EXPERIMENT 4: PERCEPTION OF ADDITIVE NOISE

The purpose of experiment 4 was to study the relationship between additive noise and perceived dysphonia and to determine how this relationship might be affected by the slope of the spectrum in the periodic component of the stimulus. Examination of the role of spectral slope was motivated by Yanagihara's (1967) finding that the loss of energy in high-frequency harmonics results in an increase in perceived dysphonia even when stimuli are matched for signal-to-noise ratio.

A. Methods

1. Stimuli

All stimuli for experiment 4 were synthesized with VSYN, which controls signal-to-noise ratio by the appropriate scaling and point-for-point addition of separate periodic and aperiodic components. A single aperiodic signal was generated with the Klatt (1980) synthesis program by passing the aspiration source through formant resonators that were set appropriately for [a] (F1 = 720, F2 = 1240, F3 = 2400, F4 = 3300, F5 = 3700). Except for differences in amplitude, the noise waveform was identical for all stimuli.
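The signal-to-noise control described for the experiment 4 stimuli (scaling followed by point-for-point addition of separate periodic and aperiodic components) can be sketched as follows. This is only an illustration and not the VSYN code: the rms-based definition of signal-to-noise ratio, the function names, and the stand-in signals (a 130-Hz sinusoid and Gaussian noise in place of the synthesized vowel and the Klatt aspiration noise) are all assumptions.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def mix_at_snr(periodic, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(periodic)/rms(noise)) equals snr_db,
    then add the two components sample by sample.

    Returns (mixed, scaled_noise). The rms-based SNR definition is an
    assumption; the paper does not spell out how SNR was computed.
    """
    if len(periodic) != len(noise):
        raise ValueError("components must have the same length")
    target_noise_rms = rms(periodic) / (10.0 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * v for v in noise]
    mixed = [p + v for p, v in zip(periodic, scaled_noise)]
    return mixed, scaled_noise


def measured_snr_db(periodic, scaled_noise):
    """SNR in dB recomputed from the two components after scaling."""
    return 20.0 * math.log10(rms(periodic) / rms(scaled_noise))


# Hypothetical stand-in signals (0.1 s at a 40-kHz sample frequency).
SAMPLE_RATE = 40000
F0 = 130.0
N_SAMPLES = 4000
periodic = [math.sin(2.0 * math.pi * F0 * i / SAMPLE_RATE) for i in range(N_SAMPLES)]
rng = random.Random(0)  # fixed seed so the same noise waveform is reused
noise = [rng.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]

mixed, scaled_noise = mix_at_snr(periodic, noise, snr_db=-10.0)
```

Because only the gain applied to the noise changes from one stimulus to the next, the same noise waveform can be reused across a continuum, which matches the statement that the noise was identical for all stimuli except for differences in amplitude.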
Several versions of the periodic component were generated using the Klatt synthesizer. Two methods were used to control spectral slope. In method 1, spectral slope was controlled at the glottal level by varying the "bandwidth of glottal resonance" (BGR) parameter in the Klatt synthesizer. This parameter controls the cutoff frequency of a low-pass filter that is used to shape the voice impulse train. The BGR parameter was set at 75, 150, and 300 Hz, producing the glottal source functions shown in Fig. 7. The glottal waveforms were passed through formant resonators appropriate for [a] and then mixed with the scaled noise described above. Three 13-step continua were synthesized that varied in signal-to-noise ratio in 3-dB steps from -10 to 26 dB (39 stimuli). All stimuli were 1.0 s in duration and were synthesized with a constant 130-Hz F0 and a 40-kHz sample frequency.

Method 2, which was more nearly analogous to Yanagihara's (1967) technique, used the same glottal waveform for all stimuli and controlled the spectral slope by adjusting formant amplitudes. With the synthesizer set in parallel mode, the resonator gains associated with F1-F6 were spaced at -3-, -5-, or -10-dB increments. For example, for the -10-dB signal, F1 gain was set to 66 dB, F2 gain was set to 56 dB, F3 gain was set to 46 dB, etc. The spectral characteristics of these stimuli are shown in Fig. 8. The periodic components that were generated with this method were mixed appropriately with the noise signal described above to produce three additional 13-step continua varying in signal-to-noise ratio from -10 to 26 dB.

2. Subjects and procedures

Listeners consisted of nine of the ten speech pathologists who participated in the other experiments. An additional subject meeting the same criteria was recruited to replace one speech pathologist who was not available at the time the experiment was run. Using the magnitude estimation task described above, subjects were asked to rate the stimuli on the degree of perceived breathiness. Subjects were run in two blocks of 624 trials in counterbalanced order. One block consisted of 16 pseudorandomly ordered presentations of the 39 stimuli created using method 1 to control spectral slope; a second block consisted of 624 presentations of the 39 stimuli created using method 2. For both blocks of trials, the first 117 stimulus presentations were considered to be practice trials and were not included in the data analysis.
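The stimulus and trial counts given for experiment 4 can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function names are hypothetical, and the gains listed for F4-F6 simply continue the stated -10-dB spacing beyond the three values the text gives explicitly.

```python
def snr_continuum(start_db=-10, stop_db=26, step_db=3):
    """Signal-to-noise values (dB) for one continuum, end points included."""
    return list(range(start_db, stop_db + 1, step_db))


def parallel_gains(f1_gain_db, increment_db, n_formants=6):
    """Resonator gains for F1..F6 spaced at a fixed dB increment."""
    return [f1_gain_db + k * increment_db for k in range(n_formants)]


steps = snr_continuum()                   # 13 steps from -10 to 26 dB
n_slopes = 3                              # three spectral-slope settings per method
n_stimuli = n_slopes * len(steps)         # 39 stimuli per method
repetitions = 16
trials_per_block = n_stimuli * repetitions        # 624 trials per block
practice_trials = 117
practice_passes = practice_trials // n_stimuli    # whole passes through the 39 stimuli
scored_trials = trials_per_block - practice_trials

# -10-dB series: F1 = 66 dB, F2 = 56 dB, F3 = 46 dB as stated in the text;
# the remaining values follow from the same spacing.
gains_minus_10 = parallel_gains(66, -10)
```

Under these numbers the 117 practice trials correspond to three complete passes through the 39 stimuli, leaving 13 scored presentations of each stimulus per block.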
B. Results and discussion

Functions relating signal-to-noise ratios to normalized breathiness ratings are shown in Fig. 9 for method 1 and Fig. 10 for method 2. The smooth curves are third-order polynomials. Not surprisingly, there is a very strong relationship between signal-to-noise ratio and listeners' perception of breathiness. However, contrary to the results reported by Yanagihara (1967), the amount of high-frequency energy in the periodic component did not appear to play a role in controlling the degree of perceived dysphonia. In general, stimuli with similar signal-to-noise ratios tended to receive very similar breathiness ratings, regardless of differences in the spectral slope of the periodic component.

It is not clear why the present findings do not agree with those of Yanagihara (1967). Comparing the synthesis procedures used in the two studies is difficult since the methods used by Yanagihara are not described in great detail. One possibility that was considered is that the discrepancy is related to the fact that Yanagihara's subjects rated the stimuli on hoarseness, while subjects in the present study were asked to make breathiness judgments. To test this possibility, the experiment was rerun with four additional speech pathologists who were asked to rate the same set of stimuli on hoarseness. The results of that test were virtually identical to the data shown in Figs. 9 and 10.

FIG. 7. Time-domain (top) and frequency-domain (bottom) representations of glottal source functions varying in spectral slope. The function showing the highest rate of change in the time domain and the greatest amount of high-frequency energy was produced with a BGR value of 300 Hz; the function with the most gradual rate of change and the least amount of high-frequency energy was produced with a BGR of 75 Hz. The middle function in both panels was produced with a BGR of 150 Hz.

FIG. 8. Fourier spectra (1024 points) of the periodic components produced by controlling formant amplitudes with the synthesizer in parallel mode.

FIG. 9. Perceived breathiness as a function of signal-to-noise ratio. The parameter is the spectral slope of the periodic component, which was controlled by varying the cutoff frequency of a low-pass filter that shapes the glottal source function. This is the BGR parameter in the Klatt (1980) synthesis program.

FIG. 10. Perceived breathiness as a function of signal-to-noise ratio. The parameter is the spectral slope of the periodic component, which was controlled by varying parallel formant amplitudes. Parallel resonator gains for F1-F6 were spaced at intervals of -3, -5, or -10 dB.

V. GENERAL DISCUSSION

To summarize briefly, four experiments were run that examined univariate relationships between perceived dysphonia and variation in pitch perturbation, amplitude perturbation, and additive noise in synthetically generated sustained vowels. Among the results were: (1) Strong relationships were found between perceived roughness and variation in either pitch or amplitude perturbation; (2) stimuli that were generated from correlated pitch or amplitude sequences sounded more rough than stimuli generated from uncorrelated sequences, especially for stimuli varying in amplitude perturbation; (3) unlike findings reported for sawtooth waves varying in pitch and amplitude perturbation (Wendahl, 1966b), the percept associated with pitch perturbation was noticeably different from that associated with amplitude perturbation; (4) a strong relationship was found between additive noise and ratings of breathiness; and (5) contrary to Yanagihara's (1967) report, ratings of either breathiness or hoarseness were unaffected by the spectral slope of the periodic component.

Results of the listening tests using stimuli varying in pitch and amplitude perturbation are in general agreement with those reported for nonspeech waveforms (Heiberger and Horii, 1982; Wendahl, 1963, 1966a,b). One very important discrepancy concerns Wendahl's (1966b) report that the roughness percepts associated with pitch and amplitude perturbation were very similar to one another. Results from the present study suggest that these two percepts are quite easy to differentiate. Further, informal interviews with the speech pathologists who served as listeners suggested that jitter seems to more closely approximate the kind of roughness that is heard in naturally occurring disordered voices. Most listeners agreed that stimuli toward the high end of the shimmer continua had an unnatural quality. It might be noted that the unusual quality associated with the shimmer stimuli does not seem to be restricted to stimuli created with the time-domain synthesis method used by VSYN. Pilot work using stimuli generated with the Klatt (1980) synthesis program produced stimuli that sounded very similar to those generated by VSYN.

The unnatural quality of the stimuli varying in amplitude perturbation was not expected since previous correlational research with naturally produced voices suggested that amplitude perturbation was more strongly associated with perceived roughness than pitch perturbation (Deal and Emanuel, 1978; Nichols, 1979). It is important to note, however, that the amplitude-perturbation measures reported in the natural speech studies almost certainly reflected several different types of aperiodicity. Because of limitations in measurement techniques, measured values of parameters such as amplitude perturbation can reflect unknown combinations of amplitude perturbation, pitch perturbation, additive noise, and perhaps other sources of aperiodicity (Hillenbrand, 1987). The presence of these measurement artifacts makes it difficult to interpret perceptual results in terms of specific underlying acoustic events.
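The point about measurement artifacts can be illustrated with a toy example in which additive noise alone produces a nonzero shimmer reading. The sketch below is an assumption-laden illustration and not the measurement procedure of any of the cited studies: it uses a sinusoid with a fixed 200-sample period in place of a voice signal, picks the largest sample in each period as the "peak amplitude," and takes shimmer to be the mean absolute dB difference between consecutive peaks.

```python
import math
import random


def periodic_wave(n_periods, period):
    """Perfectly periodic test signal: identical sinusoidal cycles."""
    return [math.sin(2.0 * math.pi * (i % period) / period)
            for i in range(n_periods * period)]


def peak_per_period(signal, period):
    """Largest sample value within each consecutive period-length window."""
    return [max(signal[k:k + period]) for k in range(0, len(signal), period)]


def shimmer_db(peaks):
    """Mean absolute dB difference between consecutive peak amplitudes."""
    diffs = [abs(20.0 * math.log10(b / a)) for a, b in zip(peaks, peaks[1:])]
    return sum(diffs) / len(diffs)


PERIOD = 200       # hypothetical period length in samples
N_PERIODS = 50

clean = periodic_wave(N_PERIODS, PERIOD)
rng = random.Random(1)  # fixed seed for repeatability
noisy = [s + rng.gauss(0.0, 0.05) for s in clean]  # additive noise only

shimmer_clean = shimmer_db(peak_per_period(clean, PERIOD))
shimmer_noisy = shimmer_db(peak_per_period(noisy, PERIOD))
```

In this sketch the clean signal has no cycle-to-cycle amplitude variation, so its measured shimmer is zero, while the noisy version yields a positive value even though no amplitude perturbation was synthesized. A simple peak-based measure of this kind cannot by itself separate the two sources.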
Another possibility that should be considered is that there is some important aspect of amplitude perturbation in naturally produced voices that was not modeled accurately by the synthesis techniques used in the present study. It is also possible that the amplitude-perturbation signals sound unnatural because amplitude variability is occurring against a background of perfect periodicity in other dimensions. More natural-sounding signals might be produced if amplitude perturbation were combined with other types of aperiodicity. Experiments that are just underway in our laboratory are designed to study the perceptual properties of sustained vowels that combine several sources of aperiodicity.

The experiments with stimuli varying in pitch and amplitude perturbation also showed that perceived roughness was affected not only by the amount of perturbation, but also by the degree of correlation among adjacent pitch or amplitude values. In general, stimuli that were generated from correlated sequences tended to sound significantly more rough than stimuli that were generated from uncorrelated sequences. This was especially true for the amplitude-perturbation stimuli, where the difference in perceived roughness between correlated and uncorrelated signals was very large. For the pitch-perturbation signals, there was some evidence that this effect may have been at least partly due to the choice of mean jitter as a way to calculate pitch perturbation. The difference between correlated and uncorrelated signals was reduced significantly when stimuli were equated for either PPQ (Davis, 1981) or the standard deviation of signed jitter. However, changing the calculation method did not entirely eliminate the difference in perceived roughness between correlated and uncorrelated pitch-perturbation signals, and this effect seemed to be largely unrelated to calculation methods for the amplitude-perturbation signals.

One implication of these findings is that acoustic measurement techniques might need to account for the sequential characteristics of pitch or amplitude change in addition to the degree of perturbation. Work along these lines has been reported by Koike (1973), who studied autocorrelation functions of voice amplitude sequences from normal speakers and patients with either laryngeal neoplasms or unilateral vocal cord paralysis. Koike reported that the presence of peaks in the autocorrelation function at lags of 3-12 periods could be useful in differentiating patients with neoplasms from the other two groups. The present findings suggest that the sequential characteristics of pitch and amplitude change might need to be incorporated in the development of a quantitative index of the severity of dysphonia. More research will be needed, however, to determine how the sequential properties of pitch and amplitude change in naturally produced voices can be quantified and how this information can be combined with more standard measures of perturbation.

The primary finding from the listening tests using stimuli varying in additive noise was the failure to observe an effect for the spectral slope of the periodic component. In general, stimuli with similar signal-to-noise ratios tended to receive very similar ratings of either breathiness or hoarseness. It is important to note that these findings do not indicate that spectral slope plays no role in judgments of voice quality. Although it was not tested formally, most listeners indicated that they were aware of the differences in "brightness" among the stimuli. These differences, however, did not appear to influence subjects' judgments along the two quality dimensions that were tested.

As indicated previously, the long-term goal of this work is to learn something about multivariate rather than simply univariate relationships between perceived dysphonia and variation in underlying acoustic parameters. Experiments that are currently underway using more complex synthetic stimuli have been designed to determine how acoustic parameters such as the ones examined in the present study combine to influence listener judgments of the overall severity of dysphonia, as well as the type of dysphonia. Another important issue that will need to be addressed in future research concerns the generalizability of results based on sustained vowels to continuous speech. Sustained vowels have been studied heavily because the measurement problems are more tractable and because the psychophysical characteristics are likely to be much simpler. However, there are a number of phenomena found only in continuous speech (e.g., pitch breaks and aperiodicities at transitions between voiced and unvoiced segments) that are likely to play a strong role in the perception of dysphonia. A major challenge for future research will be to determine how these dynamic characteristics interact with the kinds of aperiodicities that were examined in the present study.

ACKNOWLEDGMENTS

I am very grateful to Dale Metz and Robert Whitehead of the National Technical Institute for the Deaf for their technical advice, comments on previous drafts, and the generous contribution of their time in listening to the test signals and suggesting improvements to the procedures. I would like to thank Raymond Colton for his helpful comments on a previous draft, Thomas Ridley for his help with data analysis, data collection, and software development, and William Martens and Martin Wilde for their help in developing techniques for the generation of correlated random numbers. This research was supported by NIH Grant No. 7-R01-NS23703-01 to RIT Research Corporation.

Aronson, A. (1980). Clinical Voice Disorders: An Interdisciplinary Approach (Thieme-Stratton, New York).
Coleman, R. F. (1969). "Effects of median frequency levels upon the roughness of jittered stimuli," J. Speech Hear. Res. 12, 330-336.
Davis, S. B. (1976). "Computer evaluation of laryngeal pathology based on inverse filtering of speech," SCRL Monogr. 13, Speech Communication Research Laboratory, Santa Barbara, CA.
Davis, S. B. (1981). "Acoustical characteristics of normal and pathological voices," ASHA Rep. 11, 97-115.
Deal, R. E., and Emanuel, F. W. (1978). "Some waveform and spectral features of vowel roughness," J. Speech Hear. Res. 21, 250-264.
Emanuel, F. W., and Sansone, F. (1969). "Some spectral features of 'normal' and 'simulated rough' vowels," Folia Phoniatr. 21, 410-415.
Fritzell, B., Hammarberg, B., and Wedin, L. (1977). "Clinical applications of acoustic voice analysis, Part I: Background and perceptual factors," Speech Trans. Lab. Q. Prog. Stat. Rep. 2-3, 31-38.
Froekjaer-Jensen, B., and Prytz, S. (1976). "Registration of voice quality," Bruel and Kjaer Tech. Rev. 3, 3-17.
Gardner, M. (1978). "White and brown music, fractal curves, and one-over-f fluctuations," Sci. Am. 238, 16-31.
Gauffin, J., and Sundberg, J. (1977). "Clinical applications of acoustic voice analysis, Part II: Acoustic analysis, results, and discussion," Speech Trans. Lab. Q. Prog. Stat. Rep. 2-3, 39-43.
Gill, J. S. (1961). "Automatic extraction of the excitation function of speech with particular reference to the use of correlation methods," in Proceedings of the Third International Congress on Acoustics (Elsevier, Amsterdam), Vol. 1, pp. 217-220.
Hammarberg, B., Fritzell, B., Gauffin, J., Sundberg, J., and Wedin, L. (1980). "Perceptual and acoustic correlates of abnormal voice quality," Acta Otolaryngol. 90, 441-451.
Hecker, M., and Kruel, E. J. (1971). "Descriptions of the speech of patients with cancer of the vocal folds, Part I: Measures of fundamental frequency," J. Acoust. Soc. Am. 49, 1275-1282.
Heiberger, V. L., and Horii, Y. (1982). "Jitter and shimmer in sustained phonation," in Speech and Language: Advances in Basic Research and Practice, Vol. 7, edited by N. J. Lass (Academic, New York), pp. 299-332.
Hillenbrand, J. (1987). "A methodological study of perturbation and additive noise in synthetically generated voice signals," J. Speech Hear. Res. 30, 448-461.
Hirano, M. (1981). Clinical Examination of Voice (Springer, New York).
Hollien, H., Michel, J., and Doherty, E. T. (1973). "A method for analyzing vocal jitter in sustained phonation," J. Phon. 1, 85-91.
Holmes, J. N. (1962). "The effect of simulating natural larynx behavior on the quality of synthetic speech," Speech Communications Seminar, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, Sweden.
Horii, Y. (1979). "Fundamental frequency perturbation observed in sustained phonation," J. Speech Hear. Res. 22, 5-19.
Horii, Y. (1980). "Vocal shimmer in sustained phonation," J. Speech Hear. Res. 23, 202-209.
Horii, Y. (1982). "Jitter and shimmer differences among sustained vowel phonations," J. Speech Hear. Res. 25, 12-14.
Jacob, L. (1968). "A normative study of laryngeal jitter," Master's thesis, University of Kansas, Lawrence, KS, unpublished.
Jensen, J. P. (1965). "Adequacy of terminology for clinical judgment of voice quality deviation," Eye, Ear, Nose Throat Mon. 44, 77-82.
Kempster, G. B. (1984). "A multidimensional analysis of dysphonia in two dysphonic groups," Ph.D. thesis, Northwestern University, Evanston, IL, unpublished.
Kempster, G. B., and Kistler, D. J. (1984). "Perceptual dimensions of dysphonic voices," J. Acoust. Soc. Am. Suppl. 1 75, S8.
Kitajima, K., and Gould, W. J. (1976). "Vocal shimmer in sustained phonation of normal and pathologic voice," Ann. Otol. Rhinol. Laryngol. 85, 377-381.
Kitajima, K., Tanabe, M., and Isshiki, N. (1975). "Pitch perturbation in normal and pathologic voice," Stud. Phonol. 9, 25-32.
Klatt, D. H. (1980). "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am. 67, 971-995.
Koike, Y. (1973). "Application of some acoustic measures for the evaluation of laryngeal dysfunction," Stud. Phonol. 7, 17-23.
Kojima, H., Gould, W. J., Lambiase, A., and Isshiki, N. (1980). "Computer analysis of hoarseness," Acta Otolaryngol. 89, 547-554.
Lieberman, P. (1963). "Some acoustic measures of the fundamental periodicity of normal and pathologic larynges," J. Acoust. Soc. Am. 35, 344-353.
Lively, M. A., and Emanuel, F. W. (1970). "Spectral noise levels and roughness severity ratings for normal and simulated rough vowels produced by adult females," J. Speech Hear. Res. 13, 503-517.
Mandelbrot, B. (1983). The Fractal Geometry of Nature (Freeman, San Francisco).
Murry, T., and Doherty, E. T. (1980). "Selected acoustic characteristics of pathologic and normal speakers," J. Speech Hear. Res. 23, 361-369.
Murry, T., Singh, S., and Sargent, M. (1977). "Multidimensional classification of abnormal voice qualities," J. Acoust. Soc. Am. 61, 1630-1635.
Nichols, A. C. (1979). "Jitter and shimmer related to vocal roughness: A comment on the Deal and Emanuel study," J. Speech Hear. Res. 22, 670-671.
Prosek, R. A., Montgomery, A. A., Walden, B. E., and Hawkins, D. B. (1984). "Some relations between voice-quality judgments and derived acoustic measurements," J. Acoust. Soc. Am. Suppl. 1 75, S8.
Robbins, J. (1981). "A comparative acoustic study of laryngeal speech, esophageal speech, and speech production after tracheo-esophageal puncture," Ph.D. thesis, Northwestern University, Evanston, IL, unpublished.
Rozsypal, A. J., and Millar, B. F. (1979). "Perception of jitter and shimmer in synthetic vowels," J. Phon. 7, 343-355.
Sansone, F., and Emanuel, F. W. (1970). "Spectral noise levels and roughness severity ratings for normal and simulated rough vowels produced by adult males," J. Speech Hear. Res. 13, 489-502.
Schroeder, M. R. (1961). "Recent progress in speech coding at Bell Telephone Laboratories," in Proceedings of the Third International Congress on Acoustics (Elsevier, Amsterdam), Vol. 1, pp. 201-210.
Simon, C. (1927). "The variability of consecutive wavelengths in vocal and instrumental sounds," Psychol. Monogr. 36, 41-83.
Smith, B., Weinberg, B., Feth, L., and Horii, Y. (1978). "Vocal jitter and roughness characteristics of esophageal speech," J. Speech Hear. Res. 21, 240-249.
Takahashi, H., and Koike, Y. (1975). "Some perceptual dimensions and acoustical correlates of pathologic voices," Acta Otolaryngol. Suppl. 338, 1-24.
Wendahl, R. (1963). "Laryngeal analog synthesis of harsh vocal quality," Folia Phoniatr. 15, 241-250.
Wendahl, R. (1966a). "Some parameters of auditory roughness," Folia Phoniatr. 18, 26-32.
Wendahl, R. (1966b). "Laryngeal analog synthesis of jitter and shimmer: auditory parameters of harshness," Folia Phoniatr. 18, 98-108.
Wilde, M. D., and Martens, W. L. (1985). "VSYN: A computer program for synthesizing vocal signals varying in perturbation and signal-to-noise ratio," Northwestern University, Evanston, IL.
Wilde, M. D., Martens, W. L., Hillenbrand, J., and Jones, D. R. (1986). "Externalization mediates changes in the perceived roughness of sound signals with jittered fundamental frequencies," in Proceedings of the 1986 International Computer Music Conference, The Hague.
Yanagihara, N. (1967). "Significance of harmonic change and noise components in hoarseness," J. Speech Hear. Res. 10, 531-541.
Yumoto, E., Gould, W. J., and Baer, T. (1982). "Harmonics-to-noise ratio as an index of the degree of hoarseness," J. Acoust. Soc. Am. 71, 1544-1550.
Yumoto, E., Sasaki, Y., and Okamura, H. (1984). "Harmonics-to-noise ratio and psychophysical measurement of the degree of hoarseness," J. Speech Hear. Res. 27, 2-6.

J. Acoust. Soc. Am., Vol. 83, No. 6, June 1988