A high resolution fundamental frequency determination based
on phase changes of the Fourier transform
Judith C. Brown
Physics Department, Wellesley College, Wellesley, Massachusetts 02181 and Media Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Miller S. Puckette
IRCAM, 31 rue St Merri, Paris 75004, France
(Received 16 June 1992; revised 5 April 1993; accepted 6 April 1993)
The constant Q transform described recently [J. C. Brown and M. S. Puckette, "An efficient algorithm for the calculation of a constant Q transform," J. Acoust. Soc. Am. 92, 2698-2701 (1992)] has been adapted so that it is suitable for tracking the fundamental frequency of extremely rapid musical passages. For this purpose the calculation described previously has been modified so that it is constant frequency resolution rather than constant Q for lower frequency bins. This modified calculation serves as the input for a fundamental frequency tracker similar to that described by Brown [J. C. Brown, "Musical fundamental frequency tracking using a pattern recognition method," J. Acoust. Soc. Am. 92, 1394-1402 (1992)]. Once the fast Fourier transform (FFT) bin corresponding to the fundamental frequency is chosen by the frequency tracker, an approximation is used for the phase change in the FFT for a time advance of one sample to obtain an extremely precise value for this frequency. Graphical examples are given for musical passages by a violin executing vibrato and glissando where the fundamental frequency changes are rapid and continuous.
PACS numbers: 43.75.Yy, 43.60.Gk
INTRODUCTION
Our fundamental frequency tracker is based upon the calculation of a constant Q spectral transform described recently by Brown and Puckette (1992). Here we showed that the direct calculation of the components of a constant Q transform can be accomplished more efficiently by a transformation of the fast Fourier transform (FFT). To summarize that calculation, the direct calculation can be carried out using

$$X^{cq}[k_{cq}] = \sum_{n=0}^{N[k_{cq}]-1} w[n,k_{cq}]\, x[n]\, e^{-j\omega_{k_{cq}} n},$$

where $X^{cq}[k_{cq}]$ is the $k_{cq}$ component of the constant Q transform, $w[n,k_{cq}]$ is a window function of length $N[k_{cq}]$, $x[n]$ is a sampled function of time, and $\omega_{k_{cq}}$ is the frequency of this component.

This can be evaluated using the following form of Parseval's equation

$$\sum_{n=0}^{N-1} x[n]\,\mathcal{K}^*[n,k_{cq}] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,\mathcal{K}^*[k,k_{cq}]$$

if we define

$$w[n,k_{cq}]\, e^{-j\omega_{k_{cq}} n} = \mathcal{K}^*[n,k_{cq}],$$

where $\mathcal{K}[k,k_{cq}]$ is the DFT of $\mathcal{K}[n,k_{cq}]$. Since the functions $\mathcal{K}^*[k,k_{cq}]$ have very few nonzero components, the fast Fourier transform $X[k]$ of the signal $x[n]$ can be transformed into a constant Q transform with very few additional operations.

For application to the performance of modern computer music, which includes extremely rapid passages, a time resolution of 25 ms or less is desirable. We are thus confronted with the usual dilemma in choosing an acceptable trade-off between temporal and frequency resolution.

Our compromise consists of limiting the temporal extent of the window. This means that the low-frequency bins of our transform are constant frequency resolution (equal to the sample rate over the temporal window length) rather than constant Q. For example, we may choose for the center frequencies of the bins of the transform to correspond to the frequencies of notes of the equal tempered musical scale beginning with the first bin at C3 (130.9 Hz). Then a window length of 25 ms means that the resolution is a constant equal to 40 Hz (while the Q is variable) up to a frequency of 717.8 Hz or the 30th bin. The resolution is then variable with Q constant equal roughly 17 up to a frequency of 5274 Hz or the 65th bin. We will call this transform the modified constant Q transform, and we will show that limiting the temporal extent of the window for the low-frequency bins does not lead to decreased performance for the detection of the fundamental frequency. It does, of course, mean greater spillover for the low-frequency bins.

An example of this calculation can be found in Fig. 1 for a clarinet playing a chromatic scale where we have plotted the amplitude of the modified constant Q transform components against bin number in each frame. Time is increasing vertically for these frames. Here the lower Q's (with the lowest value about 5) of the low-frequency bins are manifested by a greater bandwidth. This figure can be compared to Fig. 4 of Brown and Puckette (1992) for a spectrum of the same sound calculated with a "true" constant Q of 17. It is interesting to note that the even harmonics are missing as predicted for a clarinet up to about the 29th bin and at this point the energy in the third harmonic begins to decrease. The clarinet sound was recorded with the microphone in the barrel of the instrument so the spectrum is not as rich as for a normal clarinet sound.

FIG. 1. [Panel title: CLARINET SCALE.] Amplitude of the modified constant Q spectral components plotted against bin number (corresponding to frequency) for a series of time frames. A clarinet with microphone placed in the barrel is playing several octaves of a chromatic scale.

FIG. 2. [Panel title: VIOLIN SCALE; horizontal axis: time (s).] Frequency tracking results for a violin playing two octaves of a diatonic scale. The open square is the result from the modified constant Q fundamental frequency tracker. The solid triangle is the precise frequency returned by the high-resolution phase calculation.

J. Acoust. Soc. Am. 94 (2), Pt. 1, August 1993. 0001-4966/93/94(2)/662/6/$6.00. © 1993 Acoustical Society of America.
I. CALCULATION OF INITIAL FREQUENCY ESTIMATE
When the constant Q transform of a sound consisting of harmonic frequency components is plotted against log frequency, the spacing of these components is invariant (Brown, 1991). The fundamental frequency can then be determined by finding the position in log frequency space of this invariant pattern. This is best accomplished by calculating the cross-correlation function of each frame of the log frequency spectrum with the ideal pattern as discussed by Brown (1992).
The "ideal pattern" used in calculating the cross-correlation function consists of components with the frequency spacing discussed above and with amplitudes decreasing linearly from 1 for the fundamental to 0.6 for the highest harmonic. The purpose of varying the amplitudes is to prevent the choice of the frequency an octave below that of the true fundamental. For this position of the ideal pattern, all even components of the ideal pattern line up with components present in the spectrum. This is because the spacing of components 2f, 4f, 6f, etc. is the same as that of components f, 2f, 3f, etc. If all components are weighted equally, the value of the cross-correlation function will have the same value for even components of the pattern aligned with the components of the signal as for the "true" position where all components of the pattern fall on their counterparts in the test spectrum. However, with smaller weights on higher components, this error is avoided.
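The role of the tapered weights can be illustrated with a small sketch. It is not the authors' implementation: the function names, the quarter-tone grid of 24 bins per octave, and the toy spectrum (five equal harmonics with the fundamental in bin 30) are all assumptions chosen for illustration. With equal weights a ten-component pattern scores the same at the true position and one octave below; with weights falling linearly from 1 to 0.6 the true position wins.

```python
import math

def harmonic_pattern(n_components, bins_per_octave, top_weight=1.0, bottom_weight=0.6):
    """Return (bin offset, weight) pairs for harmonics 1..n_components on a log-frequency axis."""
    pattern = []
    for h in range(1, n_components + 1):
        # Harmonic h lies log2(h) octaves above the fundamental.
        offset = round(bins_per_octave * math.log2(h))
        frac = (h - 1) / (n_components - 1) if n_components > 1 else 0.0
        weight = top_weight + (bottom_weight - top_weight) * frac
        pattern.append((offset, weight))
    return pattern

def cross_correlate(spectrum, pattern):
    """Score every candidate fundamental bin by the weighted sum of the spectrum under the pattern."""
    scores = []
    for start in range(len(spectrum)):
        total = 0.0
        for offset, weight in pattern:
            idx = start + offset
            if idx < len(spectrum):
                total += weight * spectrum[idx]
        scores.append(total)
    return scores

def pick_fundamental_bin(spectrum, pattern):
    """Return the first bin with the highest cross-correlation score."""
    scores = cross_correlate(spectrum, pattern)
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy spectrum: five equal-amplitude harmonics, fundamental in bin 30.
BINS_PER_OCTAVE = 24
TRUE_BIN = 30
spectrum = [0.0] * 140
for h in range(1, 6):
    spectrum[TRUE_BIN + round(BINS_PER_OCTAVE * math.log2(h))] = 1.0

flat = harmonic_pattern(10, BINS_PER_OCTAVE, 1.0, 1.0)   # equal weights
tapered = harmonic_pattern(10, BINS_PER_OCTAVE)          # weights from 1 down to 0.6

flat_scores = cross_correlate(spectrum, flat)
octave_below = TRUE_BIN - BINS_PER_OCTAVE
chosen_bin = pick_fundamental_bin(spectrum, tapered)
```

In this toy case `flat_scores[octave_below]` equals `flat_scores[TRUE_BIN]`, which is the octave ambiguity described above, while `chosen_bin` is the true fundamental bin.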
We have estimated the run time for our algorithm with calculations carried out on a 40-MHz Intel i860 using a hand-coded routine. With a 512-point FFT and quarter tone spacing over three octaves, the FFT takes 343 µs and the transform 166 ± 2 µs (measured on an oscilloscope). The cross-correlation calculation involves well under 250 multiplies even with ten components in the harmonic pattern so the computation time for this operation is negligible. The overall computation time then is on the order of 0.5 ms. This can be compared to a time advance of 25 ms between frames, so the calculation is easily carried out in real time.
Results of our calculation applied to digitized violin and clarinet scales are found in Figs. 2 and 3 where the open squares correspond to the frequencies of the notes of the equal tempered scale chosen by our frequency tracker plotted against frame number. The solid triangles will be explained in the next section. Each frame corresponds to a time advance of 25 ms in the sound. These are examples of instruments with very different spectra. The violin has a complex spectrum with many higher harmonics present. This clarinet sound was recorded with the microphone in the barrel and has a relatively simple spectrum with a smaller number of harmonics. The number of components in the ideal pattern used for the cross correlation varied accordingly, with 10 components for the violin and three components for the clarinet. There were essentially no errors in determining fundamental frequencies of the notes present with our modified constant Q transform.
II. HIGH-RESOLUTION FREQUENCY DETERMINATION
A. Phase background
The importance of the phase in the discrete Fourier transform (DFT) has been discussed by Oppenheim and Lim (1981). They point out that in a variety of disciplines phase only information leads to a more recognizable reconstruction of the original object analyzed than does information based on magnitude only. Friedman (1985) demonstrated that one can obtain narrower formant bands for sonagrams of speech using a histogram of occurrences against frequency with the frequency obtained from the time derivative of the phase of the short-time Fourier transform. Computationally this was obtained by calculating two short-time Fourier transforms (STFT's) with the derivative of the window function used in one of them.
Earlier, in work on the phase vocoder applied to speech signals Flanagan and Golden (1966) used phase differences to obtain greater accuracy of Fourier components. Beauchamp (1966, 1969) and Grey and Moorer (1977) used a similar technique applied to musical signals. See also Moorer (1978) and Dolson (1986).
Charpentier (1986) described a pitch tracker based on frequencies obtained from an approximation for the phase difference of time frames of the STFT separated by one sample. We obtained this same expression independently and will discuss it in the following section.

FIG. 3. [Panel title: CLARINET SCALE; horizontal axis: time (s).] Frequency tracking results for a clarinet playing several octaves of a chromatic scale. The open square is the result from the modified constant Q fundamental frequency tracker. The solid triangle is the precise frequency returned by the high-resolution phase calculation.

B. Phase calculation
The method of frequency determination which we have described in the Introduction and in Brown (1992) works extremely well for instruments playing discrete notes belonging to the equal tempered scale. Here the smallest frequency difference between notes is approximately 6%, and the results are reported as notes of the equal tempered scale. However, a very different situation can arise in passages played by stringed instruments or wind instruments. These instruments are not constrained to play discrete notes as are keyboard instruments. Thus the frequency can vary continuously as in, for example, a glissando or vibrato. Keyboard instruments may also be tuned to temperaments other than equal tempered. For all of these cases the frequency determination must be much more accurate than 6% in order to track the audio waveform.
The frequency of a particular Fourier component as obtained from the bin into which it falls in the magnitude spectrum is only as accurate as the frequency difference between bins, in our case 6% or 3% depending on the calculation. This estimate can be improved by a quadratic fit using the amplitudes of the bin containing the maximum and the two adjacent bins and identifying the position of the maximum of the parabola thus obtained. This is an extremely well-known technique described recently by Smith and Serra (1987). We will discuss the accuracy of this approximation in a later section.
Even more accurate is a method we have developed based on an approximation for the phase change per unit sample for the Fourier component chosen as the correct fundamental frequency by our frequency tracker. It is well known that the frequency as determined by the change in phase is much more accurate than that obtained from the magnitude spectrum. However, the problem with determining the frequency from the phase difference over a reasonable hop size (samples between frames) is one of phase unwrapping. The phase change is only known modulo $2\pi$. This problem does not arise with a hop size of one sample since the highest digital frequency is $\pi$ radians/sample, but this case necessitates the computation of an additional FFT.
In fact this additional computation can be avoided by using an approximation which assumes that $x[n]$ is periodic. The phase change for a hop size of one sample can be obtained from the following identity (Oppenheim and Schafer, 1975; Charpentier, 1986). If $\mathcal{F}\{x[n]\} = X[k]$ is the kth component of the discrete Fourier transform of $x[n]$, then

$$\mathcal{F}\{x[n+m]\} \approx e^{j2\pi km/N} X[k] \qquad (1)$$

is the DFT after m samples.
The above equation applies to an unwindowed DFT. It is possible to use this result to obtain a Hanning-windowed transform since the effect of windowing can be calculated in the frequency domain for this window. We will use the notation $X_H[k,n_0]$ to denote the Hanning-windowed Fourier transform evaluated for a window beginning on sample $n_0$, that is

$$X_H[k,n_0] = \sum_{n=0}^{N-1} x[n+n_0]\, w[n]\, e^{-j2\pi kn/N}$$

where

$$w[n] = \tfrac{1}{2} - \tfrac{1}{2}\cos(2\pi n/N) = \tfrac{1}{2}\left[1 - \tfrac{1}{2}e^{j2\pi n/N} - \tfrac{1}{2}e^{-j2\pi n/N}\right].$$

Substituting this expression for the window in the preceding equation leads to

$$X_H[k,n_0] = \tfrac{1}{2}\left\{X[k] - \tfrac{1}{2}X[k+1] - \tfrac{1}{2}X[k-1]\right\}. \qquad (2)$$
Using Eq. (1) with m = 1 in Eq. (2), the approximation for the Hanning-windowed DFT after one sample is

$$X_H[k,n_0+1] = \tfrac{1}{2}\left\{e^{j2\pi k/N}X[k] - \tfrac{1}{2}e^{j2\pi (k+1)/N}X[k+1] - \tfrac{1}{2}e^{j2\pi (k-1)/N}X[k-1]\right\}.$$

And finally

$$X_H[k,n_0+1] = \tfrac{1}{2}e^{j2\pi k/N}\left\{X[k] - \tfrac{1}{2}e^{j2\pi /N}X[k+1] - \tfrac{1}{2}e^{-j2\pi /N}X[k-1]\right\}. \qquad (3)$$

The digital frequency in radians per sample for the kth bin corresponding to the phase difference for a time advance of one sample is

$$\omega(k,n_0) = \phi(k,n_0+1) - \phi(k,n_0), \qquad (4)$$

where

$$\phi(k,n_0+1) = \arctan\left[\frac{\mathrm{Im}(X_H[k,n_0+1])}{\mathrm{Re}(X_H[k,n_0+1])}\right]$$

and

$$\phi(k,n_0) = \arctan\left[\frac{\mathrm{Im}(X_H[k,n_0])}{\mathrm{Re}(X_H[k,n_0])}\right].$$

This expression for the phase difference holds for any DFT bin with the bin indicated by k. For use with a fundamental frequency tracker, the calculation would only be used on the bin selected as winner by the tracker.

III. RESULTS
To check this method a test file was generated in software consisting of the superposition of sinusoidal components of equal amplitude at frequencies 434.97, 1739.88, and 3479.77 Hz. These frequencies were chosen to fall into bin positions 10.1, 40.4, and 80.8. A 256-point DFT was taken, and the real and imaginary components obtained were substituted into Eq. (4) using the definitions following this equation. The calculation was carried out for each of the 128 positive frequency components of the DFT.
The result is found in Fig. 4 where we have plotted $\omega(k)$ (converted to Hertz for a sample rate of 11025 Hz) against bin number. Note that the calculated frequencies are correct for five or more bins on either side of the bin into which the frequency belongs. This point will be discussed further in the section comparing this method with the phase vocoder.

FIG. 4. [Panel title: TEST FILE WITH THREE SINUSOIDS; horizontal axis: FFT bin number.] High-resolution frequency as calculated in Eq. (4) for each bin plotted against bin number for a test file with three components.

The measured frequencies as determined by Eq. (4) were also printed out and were correct to the two decimal places indicated above. The solid diagonal line represents the center frequency of these DFT bins plotted against bin number. The analogous graph obtained from an exact measurement of $\omega$ based on the calculation of two successive DFT's with a time advance of one sample was identical to this graph and is not included. This means that the assumption of periodicity used in Eq. (1) is extremely good for a hop size of 1 sample.
When used as the back end of a fundamental frequency tracker, the determination of the precise frequency as described adds a negligible amount to the computation time since it is only carried out for one DFT bin. Once the bin number for the initial estimate of the fundamental frequency is selected from the constant Q transform, a calculation is made to determine the corresponding bin number for the FFT. The real and imaginary parts of the FFT for this bin and those on either side of it were previously calculated, and only these three complex numbers are needed for the evaluation of Eqs. (2) and (3). These are then used in Eq. (4) with the definitions following it.
The precise frequencies were calculated for the violin and clarinet scales discussed previously. The results are given by the closed triangles in Figs. 2 and 3. Note that for the violin the precise frequency is correct in several cases where the initial estimate was off by a bin. For the clarinet in Fig. 3, the precise frequency is correct for the first three notes where the initial estimate was incorrect because the constant Q transform did not extend to that low a frequency.
The power of this method is even more apparent when it is applied to the acoustic sounds for which it is intended. In Figs. 5 and 6 are found the frequencies (solid triangles) obtained using Eq. (4) on the output from the frequency tracker described in Sec. I. A violin is executing a glissando in Fig. 5 and vibrato in Fig. 6. For comparison we include the open squares indicating the results for the frequency tracker without the phase correction. For the glissando two points on the precise frequency curve are slightly off, but these errors are small. We are uncertain as to their origin.
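The phase-difference estimate of Eqs. (2)-(4) can be sketched in a few lines for the three-sinusoid test signal described in Sec. III. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, the three DFT bins are computed by direct summation instead of an FFT, and the phase difference of Eq. (4) is taken as the argument of the product of one value with the conjugate of the other, which is assumed here as a convenient way of keeping the result in the principal range.

```python
import cmath
import math

SAMPLE_RATE = 11025.0   # Hz, as in the test described in Sec. III
N = 256                 # DFT length

def dft_bin(x, k):
    """Unwindowed DFT bin k of x by direct summation."""
    n_total = len(x)
    return sum(s * cmath.exp(-2j * math.pi * k * n / n_total) for n, s in enumerate(x))

def hann_bin(x_prev, x_mid, x_next):
    """Eq. (2): Hanning-windowed bin from the unwindowed bins k-1, k, k+1."""
    return 0.5 * (x_mid - 0.5 * x_next - 0.5 * x_prev)

def hann_bin_next_sample(x_prev, x_mid, x_next, k, n_fft):
    """Eq. (3): approximate Hanning-windowed bin after a one-sample advance."""
    rot = cmath.exp(2j * math.pi / n_fft)
    return 0.5 * cmath.exp(2j * math.pi * k / n_fft) * (
        x_mid - 0.5 * rot * x_next - 0.5 * x_prev / rot)

def precise_frequency(x, k, sample_rate):
    """Eq. (4): frequency in Hz from the one-sample phase advance in bin k."""
    n_fft = len(x)
    x_prev, x_mid, x_next = (dft_bin(x, k + d) for d in (-1, 0, 1))
    now = hann_bin(x_prev, x_mid, x_next)
    nxt = hann_bin_next_sample(x_prev, x_mid, x_next, k, n_fft)
    omega = cmath.phase(nxt * now.conjugate())   # radians per sample
    return omega * sample_rate / (2.0 * math.pi)

# Test signal: three equal-amplitude sinusoids at bin positions 10.1, 40.4, 80.8.
freqs = [434.97, 1739.88, 3479.77]
x = [sum(math.cos(2 * math.pi * f * n / SAMPLE_RATE) for f in freqs) for n in range(N)]

# Evaluate only in the bin nearest each component, as a tracker back end would.
estimates = [precise_frequency(x, k, SAMPLE_RATE) for k in (10, 40, 81)]
```

Only three unwindowed bins are needed per estimate, which is why the step adds little computation once the FFT for the frame is already available.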
IV. COMPARISON OF ACCURACY OF PHASE METHOD WITH QUADRATIC FIT
For a sampled function y(x) where values are only known for integral x, the position of the "true" maximum of the function usually occurs at nonintegral values of x. One widely used means of approximating the (nonintegral) x value where the true maximum occurs is by fitting a parabola through the point where the maximum of the sampled function occurs and the points on either side.
If the maximum occurs at x = 0 and $y(0) = y_0$, then the three points $(-1, y_{-1})$, $(0, y_0)$, and $(1, y_{+1})$ labeled with their x and y values are assumed to lie on a parabola. With a little algebra it is easy to show that the maximum of the parabola occurs at

$$x = \frac{y_{+1} - y_{-1}}{2(2y_0 - y_{+1} - y_{-1})}. \qquad (5)$$

The accuracy for the quadratic fit method can be calculated exactly for an input signal consisting of a single harmonic component. For a component falling into bin $k_0$ with a frequency corresponding exactly to $k_0 + r$ where $-0.5 < r < 0.5$ we evaluate

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N},$$

where

$$x[n] = e^{j2\pi (k_0+r)n/N}.$$

For the $k_0$th bin we obtain

$$X[k_0] = e^{j\pi r(N-1)/N}\, \frac{\sin(\pi r)}{\sin(\pi r/N)}.$$

For $X[k_0+1]$, $r \to r-1$ and for $X[k_0-1]$, $r \to r+1$ in the above expression.
We then use Eq. (2) to obtain the Hanning-windowed values for these Fourier coefficients, $X_H[k_0]$, $X_H[k_0+1]$, and $X_H[k_0-1]$. These three Fourier coefficients will be those having the maximum values and their amplitudes can be substituted in Eq. (5) to obtain the fractional value of k corresponding to the maximum of the parabola fitted through these three points.

$$\delta k = \frac{|X_H[k_0+1]| - |X_H[k_0-1]|}{2\left(2|X_H[k_0]| - |X_H[k_0+1]| - |X_H[k_0-1]|\right)}. \qquad (6)$$

Since the correct value is $k_0 + r$, the error in the quadratic fit is then $r - \delta k$.
We have carried out the calculation for values of r from -0.5 to 0.5 and reported them in Table I along with the error in the value given by the quadratic fit which is the difference in the columns labeled r and Q fit. We also verified these values experimentally by generating a sinusoid with the appropriate r, carrying out a Hanning-windowed FFT analysis for 10 successive frames, and then determining the position of the maximum using the quadratic fit formula of Eq. (5). The results were identical to those given in column II of Table I with zero deviation among these results for different frames.

TABLE I. Error in quadratic fit calculation. The first column is the position of the true maximum of the function relative to zero. The second column is the position of the maximum predicted by the quadratic fit in Eq. (6), and the third column is the error in the quadratic fit calculation (difference in columns one and two).

     r       Q fit     Error
  -0.500    -0.500     0.000
  -0.400    -0.357    -0.043
  -0.300    -0.247    -0.053
  -0.200    -0.156    -0.044
  -0.100    -0.076    -0.024
   0.000     0.000     0.000
   0.100     0.076     0.024
   0.200     0.156     0.044
   0.300     0.247     0.053
   0.400     0.357     0.043
   0.500     0.500     0.000

It should be noted that the error is independent of the bin number $k_0$ so the fractional error which would be reported for an FFT would in fact be the error in the third column of Table I divided by $k_0 + r$.
Our calculation based on phase differences from Eq. (4) was also applied to several of these frequencies. The deviation from the correct frequency was less than 0.01% with this method. We thus conclude that, for a signal consisting of a single component, the phase method is more accurate than the quadratic fit.
We then determined the effect of "spillover" from adjacent bins by generating a sound consisting of the sum of components in exact bin positions 3.1, 6.2, and 9.3. The results for 10 frames are found in Table II.

TABLE II. Comparison of average frequencies predicted by phase difference approximation (row two) and quadratic fit (row three) with their standard deviations from 10 measurements. Row one gives the exact frequencies in the test signal.

  Frequency        Freq.     s.d.      Freq.     s.d.      Freq.     s.d.
  True            133.506             267.012             400.518
  Phase approx.   133.505   0.654     267.009   1.277     400.514   1.036
  Quad. fit       132.478   0.4155    265.138   0.771     398.214   0.151

For this case the phase method again gives a more accurate value, but the standard deviation is greater. So the confidence in a single measurement would be lower for the phase method. It should be noted that the actual error is greater than the standard deviation for the quadratic fit.

FIG. 5. [Panel title: VIOLIN GLISSANDO; horizontal axis: time (s).] High-resolution frequency plotted against time for a violin executing a glissando. Open squares represent the results of the fundamental frequency tracker with solid triangles the high-resolution results.

FIG. 6. [Panel title: VIOLIN VIBRATO; horizontal axis: time (s).] High-resolution frequency plotted against time for a violin executing vibrato. Symbols have the same meanings as in Fig. 5.
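The quadratic-fit entries of Table I can be approximately reproduced with a short sketch. The function names, the choice of bin $k_0 = 20$, and the 256-point transform length are assumptions made for illustration; the windowed bins are computed by direct summation for a single complex exponential at bin position $k_0 + r$, and Eq. (5) is then applied to the three magnitudes as in Eq. (6).

```python
import cmath
import math

def hann_magnitude(k, k0, r, n_fft):
    """Magnitude of the Hanning-windowed DFT bin k for a complex exponential at bin k0 + r."""
    acc = 0j
    for n in range(n_fft):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft)
        # x[n] * exp(-j 2 pi k n / N) with x[n] = exp(j 2 pi (k0 + r) n / N)
        acc += w * cmath.exp(2j * math.pi * (k0 + r - k) * n / n_fft)
    return abs(acc)

def quadratic_fit_offset(y_minus, y_zero, y_plus):
    """Eq. (5): position of the maximum of the parabola through three equally spaced points."""
    return (y_plus - y_minus) / (2.0 * (2.0 * y_zero - y_plus - y_minus))

def quadratic_fit_for_offset(r, k0=20, n_fft=256):
    """Eq. (6): fractional bin offset predicted by the quadratic fit for a true offset r."""
    y_minus = hann_magnitude(k0 - 1, k0, r, n_fft)
    y_zero = hann_magnitude(k0, k0, r, n_fft)
    y_plus = hann_magnitude(k0 + 1, k0, r, n_fft)
    return quadratic_fit_offset(y_minus, y_zero, y_plus)

# Tabulate the true offset, the fitted offset, and their difference.
rows = [(r / 10.0, quadratic_fit_for_offset(r / 10.0)) for r in range(-4, 5)]
for r, fit in rows:
    print(f"{r:7.3f} {fit:7.3f} {r - fit:7.3f}")
```

The fit is exact when the three points really lie on a parabola; the error in Table I arises because the main lobe of the Hanning-windowed spectrum is not parabolic.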
V. COMPARISON TO THE PHASE VOCODER
A comparison of this method with that of the phase vocoder is useful. It should be noted that there are two methods of conducting phase vocoder analyses. With the filterbank approach the sinusoidal analysis functions maintain a fixed phase with respect to the signal, and one measures deviations from center frequencies. On the other hand, the phase vocoder analysis is often carried out by taking FFT's, where the analysis functions do not maintain their initial phase with respect to the input signal, but rather restart at zero for each calculation. In this case the absolute phase is measured. To compensate, the phase change corresponding to the bin center frequency is subtracted from the measured phase change, or an equivalent method, such as circularly rotating samples in the analysis window, is applied. There are, nevertheless, errors in bins far removed from the bins in which components of the signal actually fall unless the hop size is 1. These errors increase as the hop size increases so there is a tradeoff between data proliferation and accuracy.
Our method is equivalent to a phase vocoder analysis using the FFT method with a hop size of 1 sample. The advantage is that we use the approximation of Eq. (3) and do not have to perform the second FFT. Thus we measure the absolute frequency for each bin, and get this information from a single FFT. With our method we obtain the exact frequency of the source for five or so bins on either side of the bin with center frequency closest to that of the input sinusoidal component. Our method has clear advantages over the conventional phase vocoder and thus holds promise for musical synthesis as well as analysis.

VI. CONCLUSION
Our method of tracking the fundamental frequency of musical passages in real time is extremely accurate and reports frequencies to the nearest quarter tone. Our high resolution frequency determination can be used as a back end for a fundamental frequency tracker where high precision is desired. Applications range from analysis of sounds with continuous frequency variation to determination of temperament for performance studies in cognitive psychology.

ACKNOWLEDGMENTS
JCB is grateful to IRCAM for their hospitality during a Sabbatical leave when much of this work was carried out and to Wellesley College for its generous Sabbatical leave policy. She would like to thank Dan Ellis of the MIT Media Lab for invaluable and illuminating conversations. Finally she is grateful to Jim Beauchamp of the University of Illinois for many helpful e-mail discussions.
Beauchamp, J. W. (1966). "Transient Analysis of Harmonic Musical Tones by Digital Computer," Audio Eng. Soc. Preprint No. 479.
Beauchamp, J. W. (1969). "A Computer System for Time-Variant Harmonic Analysis and Synthesis of Musical Tones," Music by Computers, edited by von Foerster and Beauchamp (Wiley, New York), pp. 19-61.
Brown, J. C. (1991). "Calculation of a constant Q spectral transform," J. Acoust. Soc. Am. 89, 425-434.
Brown, J. C. (1992). "Musical fundamental frequency tracking using a pattern recognition method," J. Acoust. Soc. Am. 92, 1394-1402.
Brown, J. C., and Puckette, M. S. (1992). "An efficient algorithm for the calculation of a constant Q transform," J. Acoust. Soc. Am. 92, 2698-2701.
Charpentier, F. J. (1986). "Pitch Detection Using the Short-Term Phase Spectrum," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE, New York), pp. 113-116.
Dolson, M. (1986). "The Phase Vocoder: A Tutorial," Comput. Music J. 10, 14-27.
Flanagan, J. L., and Golden, R. M. (1966). "Phase Vocoder," Bell Syst. Tech. J. 45, 1493-1509.
Friedman, D. (1985). "Instantaneous-Frequency Distribution vs Time: An Interpretation of the Phase Structure of Speech," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (IEEE, New York), pp. 1121-1124.
Grey, J. M., and Moorer, J. A. (1977). "Perceptual evaluations of synthesized musical instrument tones," J. Acoust. Soc. Am. 62, 454-462.
Moorer, J. A. (1978). "The Use of the Phase Vocoder in Computer Music Applications," J. Audio Eng. Soc. 26, 42-45.
Oppenheim, A. V., and Lim, J. S. (1981). "The Importance of Phase in Signals," Proc. IEEE 69, 529-541.
Oppenheim, A. V., and Schafer, R. W. (1975). Digital Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).
Smith, J. O., and Serra, X. (1987). "An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation," Proceedings of the 1987 International Computer Music Conference (Int. Computer Music Assoc., San Francisco), pp. 290-297.