Investigation of Randomly Modulated Periodicity in Musical
Instruments
Shlomo Dubnov
Music Department, University of California San Diego, La Jolla, CA 92093-0326. USA,
sdubnov@ucsd.edu
Melvin J. Hinich
Applied Research Laboratories, The University of Texas at Austin, Austin, TX 78713-8029, USA
hinich@mail.la.utexas.edu
Abbreviated Title: Random Modulated Periodicity
Abstract
Acoustical musical instruments that are considered to produce a well-defined pitch emit waveforms that are never exactly periodic. A periodic signal can be perfectly predicted far into the future and is considered deterministic. In nature, and specifically in the sustained portion of musical sounds, there is always some variation in the waveform over time. Thus, signals that are labeled as periodic are not truly deterministic. In this paper we give a formal definition of such a varying periodic signal by means of a modulation coherence function. This measure characterizes the amount of random variation in each Fourier component and allows its statistical properties to be captured. The estimation is done in a period- or pitch-synchronous manner and captures even the smallest deviations away from periodicity, with only mild assumptions on the nature of the random modulating noise. This modulation coherence function is very different from the coherence function between two stationary signals, which measures second-order statistical/spectral similarity between signals. It is also different from non-linear phase coupling measures that were previously applied to musical sounds, which depend on the interaction between several harmonic Fourier components using higher order statistics. The method is applied to digitized recordings of acoustic signals from several musical instruments.
PACS 43: 60.Cg, 75.De, 75.Ef, 75.Fg
I. INTRODUCTION
This paper investigates fluctuations away from perfect periodicity in pitched
acoustic instruments during a sustained portion of their sounds. Acoustical musical
instruments, which are considered to produce a well-defined pitch, emit waveforms that
are never exactly periodic (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992,
Dubnov and Rodet 2003). The paper focuses on one possible way of describing those fluctuations that has not been quantitatively addressed so far, namely, randomly modulated periodicity (Hinich 2000, Hinich and Wild 2001). This type of random modulation is encountered in signals that are labeled as periodic but exhibit some variation in the waveform over time, and thus are not truly deterministic. A randomly modulated periodic signal is created by some mechanism that has a more or less stable inherent periodicity with random deviations around the mean periodic value. For example, voiced speech is randomly modulated, since the oscillating vocal cords vary slowly in amplitude and phase over several pitch periods in a seemingly random fashion. Other examples include sonar pings reflected from a target, rotating machinery (Barker et al., 1994), and so on.
In this work we investigate instrumental sounds that have a well-defined pitch
during a sustained portion of their sound. Although we are dealing with sustained
portions of instrumental sounds, it is important to state that these sounds are not in the
"steady state" as would be produced by an artificial blowing or bowing machine, but are
played by a human player, with all the attendant vibrato, amplitude and pitch variability.
For instance, it should be noted that both the flute and the cello are normally played with significant vibrato at around 6 Hz, while the trumpet is normally played with no vibrato. In the case of the cello, one must also distinguish between natural playing of stopped and open strings. A note played on an open string contains only a small pitch variation, due to possible variations in the force applied to the bow. A flute vibrato generally adds only a small pitch variation, together with a large and uncorrelated variation in the amplitudes of the upper partials but not a large variation in the amplitude of the fundamental. In stopped-string bowing, the sounds have both a significant pitch variation (a few percent) over all partials and large amplitude variations among the partials because of body resonances (Fletcher and Rossing, 1998).
Recently, a method for evaluating the degree of phase-synchronous vs. asynchronous deviations among the harmonics of musical instruments in the sustained portions of their sounds was proposed (Dubnov and Rodet, 2003). It is based on estimating the degree of phase coupling among groups of harmonically related partials, and it is closely related to evaluation of the bicoherence (using higher order spectral (HOS) analysis). The bicoherence method differs from the coherence method of the current paper in several respects. First, the bicoherence function depends on the interaction between the phases of different partials, while the coherence measure is a local property of every partial. Moreover, phase coupling measures deviations between the phases of sinusoidal components, while the modulation coherence captures random modulations that may contain both phase and amplitude deviations.
We use the term "modulation coherence" to denote this new measure of signal deviation from periodicity. It measures the deviations in the frequency domain of the signal spectral components relative to a mean signal that has perfectly coherent, or constant, spectral components with no amplitude or phase deviations between periods. We use the term "coherence" in analogy with its use in physics, as in "coherent light", that is, a signal of zero bandwidth having no deviations from a single frequency (monochromatic).
One of the contributions of this paper is the derivation of a theoretical estimate of the amount of decay in modulation coherence due to vibrato (mathematical details are provided in the Appendix). It might be expected that a signal containing quasi-periodic frequency fluctuations would have little modulation coherence, since it does not have a well-defined period and accordingly no averaging period or mean signal could be determined. Our analysis shows that if the vibrato is considered as a (random) frequency modulation, then for a vibrato depth on the order of a semitone or less (typical of musical instruments), the decay in modulation coherence is actually very small. This finding is interesting when considering the experimental modulation coherence results for instruments with vibrato. For instance, comparing open and stopped notes on a cello (i.e., without and with vibrato), we come to the conclusion that the large reduction in modulation coherence in the latter case cannot be attributed to the frequency modulation aspect of the vibrato.
The experimental analyses in the paper are performed using a set of sounds similar to those used in (Dubnov and Rodet, 2003) (specifically, the cello, flute and trumpet recordings are the same). The experiments include investigation of both stopped- and open-string cello sounds and of normal playing for wind instruments containing various amounts of vibrato, with the flute having a significant vibrato and the trumpet and French horn having none. These samples were taken from the McGill University music sound database (McGill University Master Samples).
II. THE MODEL
A varying periodic signal with a randomly modulated periodicity is defined as
follows:
Definition: A signal \{x(t)\} is called a randomly modulated periodicity with period T if it is of the form

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} [\alpha_k + u_k(t)] \exp(i 2\pi f_k t),   f_k = k/T,     (1.1)

where \alpha_{-k} = \alpha_k^*, u_{-k}(t) = u_k^*(t), and E u_k(t) = 0 for each k, E denoting the expectation operation. The K/2 + 1 processes \{u_k(t)\} are jointly dependent random processes that represent the random modulation. This signal can be written as x(t) = s(t) + u(t), where

s(t) = K^{-1} \sum_{k=-K/2}^{K/2} \alpha_k \exp(i 2\pi f_k t)   and   u(t) = K^{-1} \sum_{k=-K/2}^{K/2} u_k(t) \exp(i 2\pi f_k t).     (1.2)

The periodic component s(t) is the mean of x(t). The zero-mean stochastic term u(t) is a real-valued non-stationary process.
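To make the definition concrete, the following MATLAB sketch synthesizes such a signal: each partial keeps its mean amplitude \alpha_k while a slowly varying, zero-mean modulation u_k(t) is added to it. The particular amplitudes, the 10% modulation level, and the per-frame interpolation are illustrative assumptions, not values from the paper, and a real-valued cosine form is used in place of the conjugate-symmetric complex sum.

```matlab
% Sketch: synthesize a discrete randomly modulated periodicity as in Eq. (1.1).
% All parameter values and the modulation scheme below are illustrative.
T = 100;                                  % period in samples
M = 200;                                  % number of frames
N = M*T;                                  % total length
alpha = [1 0.5 0.25 0.1 0.05 0.02];       % mean amplitudes of partials k = 1..6
t = 0:N-1;
x = zeros(1, N);
for k = 1:numel(alpha)
    % slowly varying zero-mean modulation u_k(t): one random value per frame,
    % linearly interpolated up to the full sampling rate
    uk = interp1(0:M, 0.1*randn(1, M+1), t/T, 'linear');
    x  = x + (alpha(k) + uk) .* cos(2*pi*k*t/T);
end
```

Each partial here keeps a constant frequency k/T; only its complex amplitude wanders from frame to frame, which is exactly the situation the modulation coherence function of the next section is designed to quantify.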
A common approach in processing signals with a periodic structure is to segment the observations into frames of length T, so that there is exactly an integer number of periods in each sampling frame. The term sampling frame, or simply frame, is used in this paper in order to match the terminology used in the speech and audio processing literature. The waveform in frame m is slightly different from that in frame m + 1 due to variation in the stochastic signal. To simplify notation, let us set the time origin at the start of the first frame. Then the start of the m-th frame is \tau_m = (m-1)T, where m = 1, ..., M. The variation of the waveform from frame to frame is determined by a probability mechanism described by the joint distribution of x(\tau_m), ..., x(\tau_m + T - 1).
Now that the concept of a randomly modulated periodicity has been defined, the next step is to develop a measure of the amount of random variation present in each Fourier component of a signal. Such a measure, called a modulation coherence function, is presented in the next section. It is important to note that the definition of the signal in (1.1) implicitly assumes that the signal period spans an integer number of samples T, and accordingly that the frequencies f_k are integer multiples of 1/T. Since, at this point of the discussion, we are free to specify any sampling frequency, one could in principle sample any periodic analog signal so that it is also discrete periodic. The implication of this choice of sampling frequency is that the spectral analysis involved in estimating the modulation coherence function (i.e., the DFT operation performed below) does not need to employ windowing or frequency interpolation techniques in order to obtain additional spectral values "in between" the DFT bins. In practice, the signal sampling frequency is chosen a priori, independently of the signal period, a situation that indeed requires additional methods for improving the spectral analysis. This is addressed in the section on estimating the coherence function immediately following the next section. For clarity of presentation, we shall first define the modulation coherence function assuming that the sampling of the signal and the signal periodicity indeed correspond to each other (i.e., the signal is discrete periodic).
Modulation coherence
The m-th frame of the signal is \{x(\tau_m), ..., x(\tau_m + T - 1)\}. Its discrete Fourier transform (DFT) at frequency f_r = r/T, for each r = 1, ..., T/2, is

X(r) = \sum_{t=0}^{T-1} x((m-1)T + t) \exp(-i 2\pi f_r t)     (2.1)
     = T^{-1} \sum_{t=0}^{T-1} \sum_{k=-K/2}^{K/2} [\alpha_k + u_k(t)] \exp(i 2\pi k t / T) \exp(-i 2\pi r t / T)
     = \alpha_r + U(r),   where   U(r) = \sum_{t=0}^{T-1} u(t) \exp(-i 2\pi f_r t).
Essentially, the above result says that the DFT of a randomly modulated periodic signal can be split into the mean spectral component and the contribution of the modulation component at that frequency. Although this may initially seem trivial, there are two points to consider. One is that this is the first step in preparing the estimator and defining the modulation coherence. The second is more significant: it shows that the modulation of the periodicity, which is considered here an inherent property of the signal and not an added noise, behaves in the frequency domain as an additive spectral component, i.e., surplus energy and possibly a phase shift added to the spectral components of the mean signal. Mathematically, of course, this is a manifestation of the linearity of the DFT, but it is considered here in a stochastic context: the added spectral component is a random spectral deviation, and some statistics need to be extracted from it in order to use it as a signal characteristic.
To simplify the notation, the index m is not used to subscript the complex valued random variables X(r) and U(r). The variability of the complex Fourier amplitude X(r) about its mean \alpha_r is \sigma_u^2(r) = E[U^*(r) U(r)], which is independent of the frame index due to stationarity. If \alpha_r \neq 0 and \sigma_u(r) = 0, then that complex amplitude is a true periodicity. The larger the value of \sigma_u(r), the greater the variability of that component from frame to frame. If \alpha_r = 0 and \sigma_u(r) > 0, then that component does not contribute to the periodicity.

In order to quantify this variability, consider the function \gamma_x(r), called the modulation coherence function, defined for each r = 1, ..., T/2 as

\gamma_x(r) = \frac{|\alpha_r|}{\sqrt{|\alpha_r|^2 + \sigma_u^2(r)}}.     (2.2)

If \sigma_u(r) = 0 and \alpha_r \neq 0, then \gamma_x(r) = 1. This is the case where the f_r frequency component has a constant amplitude and phase. If \alpha_r = 0, then \gamma_x(r) = 0. This is the case where the mean value of the f_r frequency component is zero, which is true for each frequency component of any stationary random process with finite energy.
A high coherence value can be due either to a large amplitude |\alpha_r| relative to the standard deviation \sigma_u(r), or to a small standard deviation relative to the amplitude |\alpha_r|. The signal coherence value at each harmonic is dimensionless and is neither a function of the energy in the band nor of the amplitude of the partial.

One should note that this modulation coherence function is very different from the coherence function between two stationary signals (p. 352, Jenkins and Watts, 1968). The coherence (sometimes called coherency) between x_1(t) and x_2(t) at frequency f_r is the correlation between X_1(r) and X_2(r). The closer the coherence value is to one, the higher the correlation between the real and imaginary parts of the two Fourier components (Carter, Knapp, and Nuttall, 1973). The modulation coherence function, in contrast, is defined for a single signal.¹ It measures the variability of X(r) about its mean \alpha_r. One should keep in mind that the signal in this representation is the mean of the observed signal.
In the signal plus modulation-noise representation of \{x(t)\}, the signal-to-modulation-noise ratio (SMNR) at frequency f_r is \rho(r) = |\alpha_r|^2 / \sigma_u^2(r). Thus \gamma_x^2(r) = \rho(r) / (\rho(r) + 1) is a monotonically increasing function of the SMNR. Inverting this relationship, it follows that

\rho(r) = \frac{\gamma_x^2(r)}{1 - \gamma_x^2(r)}.     (2.3)
A modulation coherence value of 0.44 yields a SMNR of 0.24 which is –6.2 dB.
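As a quick numerical check of (2.3) and of the example above, the conversion from a coherence value to an SMNR in decibels can be written as follows (a trivial sketch, using the value quoted in the text):

```matlab
% Sketch: convert a modulation coherence value into an SMNR using Eq. (2.3).
gamma  = 0.44;                        % coherence value from the example in the text
rho    = gamma^2 / (1 - gamma^2);     % SMNR, approximately 0.24
rho_dB = 10*log10(rho);               % approximately -6.2 dB
```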
The measure is not shift invariant, in the sense that it needs to be "synchronized" to the pitch. As will be discussed in the next section, the frame size is chosen in practice to include multiple periods. The frame size defines the resolution bandwidth: the larger the frames, the better the frequency resolution, but with the tradeoff of less averaging (fewer frames for a given signal duration) and accordingly noisier estimates.
¹ Estimating the correlation of a single signal with itself yields a periodicity estimate, i.e., a time shift at which the signal is similar to itself. This is again different from modulation coherence.
Estimating the Modulation Coherence Function
As mentioned earlier, in practice the signal will most likely not have a correspondence between the sampling frequency and the signal period. This situation violates the model of (1.1) and requires some changes to the modulation coherence function in (2.2). One simple solution is to assume that the sampling frequency is sufficiently high compared to the signal period. Another solution is to use multiple periods in a frame, and possibly zero padding or other spectral interpolation methods, for estimation of the signal spectrum at frequencies that do not correspond precisely to the DFT frequencies.

We address these problems in two stages. First, we present a simple method for finding the fundamental frequency. Then, we use a large frame size (a frame that contains multiple periods instead of a single period) for estimation of the mean signal, and include zero padding for estimation of the spectrum of the remaining difference signal.
Finding the Fundamental Frequency
It is important to know the fundamental frequency of the periodic component in order to obtain the correct frame length for the DFT analysis and the averaging of the signal. If the fundamental is unknown, it must be estimated from the signal. There are many algorithms in the literature that might be used for pitch or fundamental frequency detection. Below we describe the method for determining the fundamental that was used in our program.
To find the fundamental of a sound we subtract the mean (i.e., the DC value) of the signal from each data point x(t_n), where t_n = n\Delta and \Delta is the sampling interval. In our case it is important to find the exact value of the fundamental frequency to a precision that might be higher than the DFT resolution 1/T in equation (2.1). For this purpose we resample the signal to a higher sampling frequency and then compute the discrete Fourier transform X(r) = \sum_{n=0}^{N} x(t_n) \exp(-i 2\pi f_r t_n) using a multiple of the fundamental period instead of a single period, which also stabilizes the averaged frame in terms of the amplitude, phase and frequency fluctuations of the instrument. The coherence function is estimated from the mean and the variance of the DFT as explained below, and the process is iterated by manually adjusting the analysis frame size (and changing the DFT analysis frequency accordingly) so as to maximize the resulting coherence values. The maximally coherent results are reported in the following graphs. It should be noted that additional zero padding is not required, since when a matching signal period and DFT analysis frequency are found, the analysis frequency is exact.
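This frame-length tuning can also be automated. The sketch below scans a range of candidate frame lengths and keeps the one that maximizes the mean coherence of the first few harmonics. It assumes the frame spans a single period (so that DFT bin r is the r-th harmonic), omits the resampling step used for sub-sample precision, and relies on a helper function mod_coherence, a sketch of which is given after the estimator equations in the next subsection; both function names are our own.

```matlab
% Sketch: pick the frame length (in samples) that maximizes the mean
% modulation coherence of the first few harmonics. The search range and
% the number of harmonics scored are assumptions; resampling for
% sub-sample precision is omitted.
function Tbest = find_frame_length(x, Tmin, Tmax, nHarm)
    x = x(:).' - mean(x(:));                 % remove the DC component
    best = -Inf;  Tbest = Tmin;
    for T = Tmin:Tmax                        % candidate frame lengths
        M = floor(numel(x) / T);
        if M < 2, continue; end
        gamma = mod_coherence(x(1:M*T), T);  % coherence at bins r = 1..T/2
        score = mean(gamma(1:min(nHarm, numel(gamma))));
        if score > best, best = score; Tbest = T; end
    end
end
```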
Mean signal, modulation variance and modulation coherence function estimates
Suppose that we have observed M frames, each of length T, of the process \{x(t)\}, as denoted at the beginning of Section II. Recall that \tau_m = (m-1)T for each m = 1, ..., M. The sample mean, for each t = 0, ..., T-1,

\bar{x}(t) = M^{-1} \sum_{m=1}^{M} x(\tau_m + t),     (3.1)

is an unbiased estimator of the "signal" s(t).

Let \bar{X}(r) denote the r-th component of the DFT of (\bar{x}(0), ..., \bar{x}(T-1)). We define

y(\tau_m + t) = x(\tau_m + t) - \bar{x}(t),     (3.2)

and let Y_m(r) denote the r-th DFT component of (y(\tau_m), ..., y(\tau_m + T - 1)). The estimator of the variance \sigma_u^2(r) is defined as

\hat{\sigma}_u^2(r) = M^{-1} \sum_{m=1}^{M} |Y_m(r)|^2.     (3.3)

The statistic \hat{\gamma}_x(r) is defined by

\hat{\gamma}_x(r) = \frac{|\bar{X}(r)|}{\sqrt{|\bar{X}(r)|^2 + \hat{\sigma}_u^2(r)}}.     (3.4)

It can be shown (Hinich 2000) that \hat{\gamma}_x(r) is a consistent estimator of \gamma_x(r) for frequency f_r with an error of O(M^{-1/2}). The expression |\bar{X}(r)|^2 / \hat{\sigma}_u^2(r) can be used as an estimator of the signal-to-noise ratio \rho(r) for frequency f_r.
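A minimal MATLAB sketch of the estimators (3.1)-(3.4) follows, assuming the input contains at least M*T samples and that the frame length T has already been matched to the signal period (pitch-synchronous framing); the function name and interface are our own.

```matlab
% Sketch: modulation coherence estimator, Eqs. (3.1)-(3.4).
% x: signal vector, T: frame length in samples (matched to the period).
function [gamma, Xbar, sig2u] = mod_coherence(x, T)
    M  = floor(numel(x) / T);
    F  = reshape(x(1:M*T), T, M);          % one pitch-synchronous frame per column
    xm = mean(F, 2);                       % sample mean signal, Eq. (3.1)
    Y  = F - xm;                           % residual frames y, Eq. (3.2)
    Xbar  = fft(xm);                       % DFT of the mean signal
    sig2u = mean(abs(fft(Y)).^2, 2);       % variance estimate, Eq. (3.3)
    gamma = abs(Xbar) ./ sqrt(abs(Xbar).^2 + sig2u);   % Eq. (3.4)
    r = 2:floor(T/2) + 1;                  % keep the bins r = 1, ..., T/2
    gamma = gamma(r);  Xbar = Xbar(r);  sig2u = sig2u(r);
end
```

With a frame spanning a single period, gamma(r) estimates \gamma_x(r) at the r-th harmonic; with a frame spanning p periods, the harmonics fall at bins r = p, 2p, .... The unnormalized MATLAB fft is used throughout, which is immaterial here since a common scale factor cancels in the coherence ratio.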
Example: Coherent versus modulation only signal components
In order to better explain the difference between modulation coherence estimation and other, more standard spectral estimation methods, we consider a signal comprising a single sinusoid at a frequency f_0 and a band-limited noise-only component at the first harmonic frequency 2 f_0. The signal can be written as

x(t) = \alpha_1 \sin(2\pi f_0 t) + u_2(t) \sin(2\pi \cdot 2 f_0 t).     (4.1)

Note that this signal has energy at two frequencies: the component at frequency f_0 has u_1(t) = 0 for all t, which results in a modulation coherence of one, and the second component at frequency 2 f_0 has \alpha_2 = 0, resulting in a modulation coherence of zero. It should be noted that the bandwidth of the noise component is not specified in the definition of modulation coherence, since both the definition and the analysis are asymptotic. From the point of view of spectral analysis, the second component on the right hand side of equation (4.1) is a heterodyning of the signal u_2(t), which centers the energy of the noise on the frequency 2 f_0, with a bandwidth equal to that of u_2(t).
The following figures present one such example with a frame size of T = 100 samples, a fundamental period of 20 samples (f_0 = 1/20, or five periods per frame), and a low-pass u_2(t) with cutoff equal to the frame rate (it was generated by band-limited interpolation of a random sequence with a factor of 1:100, i.e., up-sampling of a random signal generated at the frame rate into a signal u_2(t) at the original sampling rate). A total of 200 frames were generated. An excerpt from the signal is shown in the top panel of Figure 1. It can be seen that the signal has strong amplitude variations due to the strongly modulated second harmonic.

The mean signal was estimated by averaging the frames. It should be noted that this averaging occurs in a "pitch synchronous" manner. As can be seen from the second panel from the top of Figure 1, the resulting signal corresponds to the periodic component only.
The DFT analysis was performed by multiplying the signal frames by cosine and sine matrices, each generated with an exact period T, resulting in matrices of dimension 50 x 100 (50 frequency points and 100 time samples). The mean values of the sine and cosine components were used as an estimate of the mean signal spectrum. The variances of these components were used for estimation of the variance \sigma_u^2(r). Both were used for estimation of the coherence. The bottom panel of Figure 1 shows the coherence values for the 50 DFT values.
This should be contrasted with spectral estimation using standard methods, such as the periodogram or correlogram. The power spectral density estimate obtained using Welch's method appears above the modulation coherence graph (third panel from the top of Figure 1). One can see that there is no distinction between the sinusoidal and the band-limited noise components, since both contribute approximately the same energy at their respective frequencies.
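This example can be reproduced along the following lines, using the mod_coherence helper sketched earlier; the pchip interpolation used to build the frame-rate noise is an assumption standing in for the band-limited interpolation described in the text.

```matlab
% Sketch: coherent partial at f0 plus a modulation-only component at 2*f0,
% in the spirit of Eq. (4.1) and of the example in the text.
T = 100;  M = 200;  N = M*T;
t  = 0:N-1;
f0 = 1/20;                                       % five periods per frame
u2 = interp1(0:M, randn(1, M+1), t/T, 'pchip');  % frame-rate noise, up-sampled
x  = sin(2*pi*f0*t) + u2 .* sin(2*pi*2*f0*t);
gamma = mod_coherence(x, T);
% gamma(5)  (the bin of f0  = 5/T)   comes out close to one;
% gamma(10) (the bin of 2*f0 = 10/T) comes out close to zero.
```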
PLACE FIGURE 1 ABOUT HERE
III. INFLUENCE OF FREQUENCY MODULATION ON MODULATION
COHERENCE FUNCTION
The coherence analysis of the previous sections is written out as an amplitude modulation component added to a coherent (i.e., zero bandwidth) sinusoid. Mathematically speaking, in the way the modulation coherence model is written out, one cannot distinguish between amplitude and phase modulation. In fact, the variations of the conjugate positive/negative components u_{-k}(t) = u_k^*(t) include variations in both amplitude and phase. Since phase modulations, or more precisely their derivatives, are in fact frequency modulations, the model does incorporate effects related to random frequency deviations.
In this section we explore the effect of random frequency deviations on the modulation coherence function. In order to maintain clarity in the presentation, the mathematical details of the analysis are deferred to the Appendix. It should be noted that under natural playing conditions the signal contains quasi-periodic frequency fluctuations that could generally be termed vibrato. Such a signal seems to violate the modulation coherence model, since it does not fulfill the condition of having an exact period of the waveform in every frame. In such a case we consider longer frames that correspond to a multiple of the average signal period. This allows viewing vibrato over a long enough signal frame as a particular case of the periodic modulation model, as described below. We consider a mathematical model of random frequency fluctuations,

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} \alpha_k \exp\big(i 2\pi f_k (1 + I v_k(t))\, t\big),     (5.1)
where v_k(t) is a slowly varying random process limited to the range [-1, 1] and I is a multiplicative factor representing the modulation depth, which we call the "modulation index". Note that in this model the frequency modulation is proportional to the frequency of every partial. The DFT analysis can be evaluated by considering the mismatch between the analysis frame size and the instantaneous random deviation of the signal period. The frequency modulations cause a mismatch between the period of x(t) and the size of the DFT analysis frame, which is equal to the mean period (or an integer multiple of it). This results in a distortion of the spectrum of the mean component by a discrete Dirichlet kernel, or discrete sinc function, which can be considered a windowing effect caused by using a rectangular window whose size does not match the period of the signal. The analytical development is simplified by allowing only small frequency deviations. The details of the mathematical development of the signal mean spectrum and the spectrum autocorrelations are given in the Appendix.
Using these approximations, we have calculated the dependence of the mean coherence on the frequency modulation. Figure 2 shows the decay of the modulation coherence as a function of the parameter \beta = rI, which we call the "vibrato depth", where r is the partial index and I is the modulation depth, assuming a zero-mean uniform distribution v_r(t) = v(t) ~ Uniform[-1, 1].
PLACE FIGURE 2 ABOUT HERE
The graph can be used to evaluate the decay in modulation coherence as a function of the modulation index I for a given partial index r, or the decay as a function of the partial index for a constant modulation index. For instance, for a modulation index of 1%, the first 20 partials of a perfectly coherent signal can be read off the graph up to \beta = 0.2, and they lose less than 3% of their coherence (the coherence of the 20th partial would be around 0.97). Partial number 50, which undergoes 50% modulation (vibrato depth 0.5), drops in coherence to 0.87, and so on.
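The decay curve can also be checked numerically. The sketch below runs a small Monte Carlo experiment in the spirit of the simulation described in the Appendix: for a given vibrato depth \beta = rI, each frame of a single partial receives a constant frequency offset drawn uniformly from [-\beta, \beta] DFT bins, and the coherence of that partial is estimated from the frame DFTs. The frame length, bin position and number of trials are assumptions.

```matlab
% Sketch: Monte Carlo estimate of the coherence decay of a single partial
% as a function of vibrato depth beta = r*I. Each frame gets a constant,
% uniformly distributed frequency offset of at most beta DFT bins.
T = 200;  M = 2000;  t = (0:T-1)';
betas = 0:0.05:1;
gam   = zeros(size(betas));
for i = 1:numel(betas)
    X = zeros(1, M);
    for m = 1:M
        dr = betas(i) * (2*rand - 1);        % frequency offset, in bins
        s  = cos(2*pi*(5 + dr)*t/T);         % one partial near bin r = 5
        S  = fft(s);
        X(m) = S(6);                         % DFT value at bin r = 5
    end
    Xbar = mean(X);
    s2u  = mean(abs(X - Xbar).^2);
    gam(i) = abs(Xbar) / sqrt(abs(Xbar)^2 + s2u);   % estimated coherence
end
plot(betas, gam), xlabel('vibrato depth \beta = rI'), ylabel('coherence')
```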
Discussion
The estimator \hat{\gamma}_x(r) is a point estimate of the modulation coherence \gamma_x(r) that measures the relative stability of the f_r frequency component of a quasi-periodic signal. The term u(t) is not an additive stationary noise, but rather a residual process capturing the random modulation in x(t) that deviates from the mean periodic component s(t). It should be noted that s(t) and u(t) are different from the deterministic and residual stochastic signals that are estimated in sinusoidal-plus-noise models of speech or musical signals.
In the sinusoidal model (Rodet, 1997) the signal is represented as the sum of a small number of sinusoids (partials) with time-varying amplitudes and frequencies. The signal parameters are estimated in the following manner: first, a search for peaks is performed on the amplitudes of the short-time Fourier transform. Then, the peak amplitudes and the instantaneous frequencies at the peak locations are noted. Finally, a "nearest-neighbor" matching is performed across frames in order to relate the partials over time.
In (Serra and Smith, 1989) the authors incorporated a non-sinusoidal "residual" into the model as part of the additive synthesis and represented it as an additive random signal. Their model is a "signal plus noise" model, where the sinusoidal part is estimated instantaneously from the signal DFT, without matching the signal period to the size of the DFT and with no statistical averaging.
The proposed analysis method differs from the sinusoidal-plus-noise model in the following important respects. The mean signal is a consistent estimate of the mean periodic signal, estimated on a period-to-period basis (i.e., using pitch-synchronous frames). This is different from periodogram or spectral peak-picking methods, both in terms of precision and in terms of the underlying statistical model. It can be shown that the estimator of the variance \hat{\sigma}_u^2(r) equals T S_u(f_r) + O(1/T), where S_u(f_r) is the spectrum of u(t) at the frequency f_r = r/T. It should be noted that both \bar{X}(r) and \hat{\sigma}_u^2(r) can have non-zero values at the same r, a situation that does not occur in sinusoidal-plus-residual models, where every frequency bin is either a deterministic sinusoid or a residual signal. For instance, the coherence is defined even at frequencies where the periodic component is extremely weak. To sum up, the underlying statistical assumptions give the modulation coherence a precise statistical meaning. Moreover, the estimators allow detection of a mean periodic signal under very severe SMNR conditions, or detection of coherence for very weak periodic components.
IV. ANALYSIS OF MUSICAL INSTRUMENT SIGNALS
In this section we apply the coherence measure to the analysis of instrumental sounds that have a well-defined pitch during a sustained portion of their sound. It is important to state that these sounds are not produced by an artificial blowing or bowing machine, but are played by a human player, with all the attendant vibrato, amplitude and pitch variation. For instance, both the flute and the cello (stopped-string note) are played with vibrato, while the trumpet is played with no vibrato. Analysis of the signals with vibrato shows a vibrato depth on the order of a few percent of the fundamental frequency.
We investigated the coherence of several musical instruments: the flute, cello, trumpet and French horn. For the analysis we used samples from the McGill University Master Samples. The signals are sampled at 44100 Hz, 16 bit. In most cases we used sounds of the same pitch, playing C4. The only exception was the analysis of an open-string cello sound, which was played at the note A3.
As mentioned above, the sounds were produced by natural playing, i.e., including vibrato and amplitude modulations, as well as in the presence of recording-room conditions (these are not anechoic recordings). The results displayed in the figures below correspond to signal segments of 40,000 samples for the different instruments (i.e., a duration of 0.9 sec). Four signal periods were used in every frame. The exact periods for each signal were manually tuned around the candidate pitch frequency so as to maximize the signal coherence. The signals were up-sampled by a factor of 4 to allow for quarter-sample precision in finding the period.
Analysis Results
The signal coherence function provides additional information about the signal
beyond the power spectrum. There are evident differences in coherence between different
instruments.
PLACE FIGURE 3 ABOUT HERE
PLACE FIGURE 4 ABOUT HERE
PLACE FIGURE 5 ABOUT HERE
PLACE FIGURE 6 ABOUT HERE
One particularly interesting case is the cello. It produces its sound through a periodic bow excitation that passes through a complicated resonant body. There is a big difference in coherence between open and stopped sounds. According to the theoretical analysis of the influence of frequency modulation on signal coherence in the previous section, if the only effect of vibrato had been frequency modulation, the resulting reduction in coherence should have been only a few percent. In reality, the effect of vibrato for the cello is a significant loss of coherence, a fact that suggests that vibrato causes other changes in the signal beyond simple frequency modulation.
PLACE FIGURE 7 ABOUT HERE
Figure 7 shows the analysis of an open-string cello sound, which is indeed very coherent.
Discussion
In this paper we have presented a method for the analysis of deviations that occur in periodic sounds. Two types of deviations were considered: 1) randomly modulated periodicity, consisting of amplitude or phase deviations in every signal period that occur "around" a constant mean signal, and 2) random frequency modulations that change the signal period and might occur due to vibrato. It has been shown theoretically that the effect of frequency modulation alone is negligible for the lower partials under normal vibrato playing conditions (a vibrato depth of a quarter tone is only 3% of the fundamental frequency). Let us summarize our findings:
1. The stopped cello sound played with vibrato has high coherence only at the first two partials; the rest of the harmonics have low coherence.
2. The flute has coherence above 0.5 for the first three harmonics. Higher harmonics have significantly lower coherence.
3. The French horn has high coherence up to the first 12 or 13 partials. There is a smaller drop in coherence earlier (at partial 7).
4. The trumpet exhibits a consistently decaying pattern of coherence, decreasing gradually from almost perfect coherence to as low as 0.1 at the 18th harmonic.
5. The cello without vibrato (open string) has high coherence up to partial number 21, and the coherence remains relatively high for the higher partials as well.
Considering the differences between the open-string and vibrato cello sounds, it is evident that the drop in coherence caused by vibrato is much larger than the effect of frequency modulation alone should have produced according to the theory. This finding supports the assumption that a complex interaction exists between the bowed-string excitation and the body (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992, Weinreich 1997).
Reproducible results
The MATLAB programs used to generate the examples are available at
http://music.ucsd.edu/~sdubnov/SigCoh
A general program for computing signal modulation coherence is available from the
second author.
APPENDIX
In order to analyze the effect of frequency fluctuations on the modulation coherence we consider the process

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} \alpha_k \exp\big(i 2\pi f_k (1 + I v_k(t))\, t\big),

where v_k(t) is a slowly varying random process limited to the range [-1, 1] and I is a modulation index. Note that in our model the frequency deviation is proportional to the frequency of every partial. We use the subscript k in the definition of v_k(t) to allow for separate deviations in every partial, although in most practical situations it could be the same or a highly correlated process.
Considering the m-th frame of the signal, \{x(\tau_m), ..., x(\tau_m + T - 1)\}, we write the DFT of component r as

X(r) = \sum_{t=0}^{T-1} x((m-1)T + t) \exp(-i 2\pi f_r t)
     = T^{-1} \sum_{t=0}^{T-1} \sum_{k=-K/2}^{K/2} \alpha_k \exp\big(i 2\pi k t / T\, (1 + I v_k(t))\big) \exp(-i 2\pi r t / T)
     = \sum_{k=-K/2}^{K/2} \alpha_k D_{k,I}(r),

where D_{k,I}(r) = T^{-1} \sum_{t=0}^{T-1} \exp\big(i 2\pi t / T\, (k - r + k I v_k(t))\big).
Let us assume that the random deviation v_k(t) is constant during the frame \{x(\tau_m), ..., x(\tau_m + T - 1)\}. Defining \delta_{k,I}(r) = k(1 + I v_k) - r, we rewrite the factor as

D_{k,I}(r) = T^{-1} \sum_{t=0}^{T-1} \exp\big(i 2\pi t\, \delta_{k,I}(r) / T\big) = T^{-1} e^{\,i \pi \delta_{k,I}(r)(1 - 1/T)}\, \frac{\sin\big(\pi \delta_{k,I}(r)\big)}{\sin\big(\pi \delta_{k,I}(r) / T\big)}.

In order to evaluate the mean and variance of the r-th DFT component, we assume that the deviation is small relative to the spacing between the harmonics, and we limit ourselves to deviations that occur within the main lobe of the resulting function (this function is close to the so-called Dirichlet kernel, or discrete sinc, with an additional multiplicative complex exponential).
Under these assumptions,

E(X(r)) \approx \alpha_r E(D_{r,I}(r)),

E(X(r) X^*(r)) = \sum_k \sum_{k'} \alpha_k \alpha_{k'} E\big(D_{k,I}(r) D^*_{k',I}(r)\big) \approx \frac{1}{T^2} \sum_k \alpha_k^2\, E\!\left[\frac{\sin^2(\pi\delta_{k,I}(r))}{\sin^2(\pi\delta_{k,I}(r)/T)}\right] \approx \frac{1}{T^2}\, \alpha_r^2\, E\!\left[\frac{\sin^2(\pi\delta_{r,I}(r))}{\sin^2(\pi\delta_{r,I}(r)/T)}\right].

Using the estimate

\hat{\gamma}_x(r) = \frac{|E(X(r))|}{\sqrt{E(X(r) X^*(r))}},

we can obtain an analytical approximation to the modulation coherence function for different values of the modulation parameters. One may note that for k = r we have \delta_{r,I}(r) = r I v_r(t) and D_{r,I}(r) = D_{1,rI}(1). In the case of a uniformly distributed v_k(t), the averaging of D_{r,I}(r) amounts to integrating the function D_{1,rI}(1) over \delta_{r,I}(r) values in the range [-rI, rI], with \hat{\gamma}_x(r) independent of the spectral amplitudes \alpha_r.
Figure 2 presents a graph of the theoretical evaluation of \hat{\gamma}_x(r) as a function of \beta = rI (which we call the "vibrato depth"), obtained by integrating D_{r,I}(r) over different ranges [-rI, rI]. This function is compared to modulation coherence values obtained from a simulation of the modulation coherence function using a randomly modulated partial (calculated over 2000 instances of a sinusoidal component with randomly chosen frequency modulations at different vibrato depths). Both the theoretical derivation and the simulations assume a uniform distribution of the frequency modulation index values within the vibrato depth range. The resulting expression can be considered either as a function of the partial index r for a constant modulation index I, or as a function of I for a given r.
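A numerical sketch of this theoretical evaluation is given below: it averages the kernel D over a uniform grid of frequency offsets in [-\beta, \beta] under the main-lobe (k = r) approximation above. The frame length and the grid over offsets are assumptions.

```matlab
% Sketch: theoretical coherence decay versus vibrato depth beta = r*I,
% obtained by averaging the kernel D over a uniform offset in [-beta, beta]
% (main-lobe, k = r approximation).
T = 200;  t = (0:T-1)';
D = @(d) mean(exp(1i*2*pi*t*d/T));            % D for a frequency offset of d bins
betas = 0:0.05:1;
gam   = zeros(size(betas));
for i = 1:numel(betas)
    d   = linspace(-betas(i), betas(i), 2001);          % uniform grid of offsets
    Dd  = arrayfun(D, d);
    gam(i) = abs(mean(Dd)) / sqrt(mean(abs(Dd).^2));    % |E(D)| / sqrt(E|D|^2)
end
plot(betas, gam), xlabel('vibrato depth \beta = rI'), ylabel('theoretical coherence')
```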
REFERENCES
Barker, R. W., Klutke, G.-A., Hinich, M. J., Ramirez, C. N., and Thornhill, R. J. (1994), "Development and application of a statistically based feature extraction algorithm for monitoring tool wear in circuit board assembly", Circuits, Systems, and Signal Processing, Vol. 13, No. 4, 411-434.
Beauchamp, W. (1974), "Time-variant spectra of violin tones", J. Acoust. Soc. Am., Vol. 56, No. 3, September.
Carter, G. C., Knapp, C. H., and Nuttall, A. H. (1973), "Estimation of the magnitude-squared coherence function via overlapped Fast Fourier Transform processing", IEEE Trans. Audio Electroacoust., Vol. AU-21, 337-344.
Dubnov, S., and Rodet, X. (2003), "Investigation of Phase Coupling Phenomena in Sustained Portion of Musical Instruments Sound", J. Acoust. Soc. Am., Vol. 113, No. 1, January, 348-359.
Fletcher, N. H., and Rossing, T. D. (1998), The Physics of Musical Instruments, 2nd ed. (Springer Verlag, New York).
Gardner, W. A., and Franks, L. E. (1975), "Characterization of cyclostationary random signal processes", IEEE Trans. Information Theory, Vol. IT-21, 4-14.
Hinich, M. J. (2000), "A Statistical Theory of Signal Coherence", IEEE Journal of Oceanic Engineering, Vol. 25, No. 2, 256-261.
Hinich, M. J., and Wild, P. (2001), "Testing time series stationarity against an alternative whose mean is periodic", Macroeconomic Dynamics, Vol. 5, 380-412.
Jenkins, G. M., and Watts, D. G. (1968), Spectral Analysis and its Applications (Holden Day, San Francisco).
McIntyre, M. E., Schumacher, R. T., and Woodhouse, J. (1981), "Aperiodicity in bowed-string motion", Acustica, Vol. 49, 13-32.
McGill University Master Samples, Faculty of Music, McGill University, 555 Sherbrooke St. West, Montreal, Quebec.
Rodet, X. (1997), "Musical Sound Signals Analysis/Synthesis: Sinusoidal+Residual and Elementary Waveform Models", in Proc. of the IEEE Time-Frequency and Time-Scale Workshop, Coventry, U.K.
Schumacher, R. T. (1992), "Analysis of aperiodicities in nearly periodic waveforms", J. Acoust. Soc. Am., Vol. 91, 438-451.
Serra, X., and Smith, J. O. (1989), "Spectral Modeling Synthesis: A sound analysis/synthesis system based on deterministic plus stochastic decomposition", Computer Music Journal, Vol. 14, No. 4, 12-24.
Weinreich, G. (1997), "Directional Tone Color", J. Acoust. Soc. Am., Vol. 101, 2389-2406.
FIGURE CAPTIONS
Figure 1: Signal waveform, the mean signal, power spectrum and modulation coherence, from top to bottom. The signal comprises one coherent component and one modulation-only component. See text for more detail.
Figure 2: Modulation coherence as a function of vibrato depth. The graph shows the
reduction of coherence as a function of partial number for constant modulation index, or
as a function of modulation index for a given partial. See text for more detail.
Figure 3: Cello C4 (with natural vibrato). Top figure shows several signal waveforms
from different segments of the sustained portion of the signal. Second from top figure
shows the mean signal waveform. The third figure from the top shows the power
spectrum (solid) and the mean spectrum (dashed). Bottom figure shows the modulation
coherence function at the harmonics.
Figure 4: Flute C4. Top figure shows the power spectrum (solid) and the mean spectrum
(dashed). Bottom figure shows the modulation coherence function at the harmonics. A
drop in coherence level below 0.5 is observed at partials higher than 3.
Figure 5: French Horn C4. Top figure shows the power spectrum (solid) and the mean
spectrum (dashed). Gradual decay in coherence is observed for lower partials, with a
larger drop occurring at partial 13 and above.
Figure 6: Trumpet C4. Top figure shows the power spectrum (solid) and the mean
spectrum (dashed).
Figure 7: Cello A3 Open String (with no vibrato). Top figure shows several signal
waveforms from different segments of the sustained portion of the signal. The mean
signal is not shown but it is obviously very similar to the individual waveforms. Second
figure from top shows the power spectrum (solid) and the mean spectrum (dashed).
Bottom figure shows the modulation coherence function at the harmonics.