Investigation of Randomly Modulated Periodicity in Musical Instruments

Shlomo Dubnov, Music Department, University of California San Diego, La Jolla, CA 92093-0326, USA, sdubnov@ucsd.edu
Melvin J. Hinich, Applied Research Laboratories, The University of Texas at Austin, Austin, TX 78713-8029, USA, hinich@mail.la.utexas.edu

Abbreviated Title: Randomly Modulated Periodicity

Abstract

Acoustical musical instruments that are considered to produce a well-defined pitch emit waveforms that are never exactly periodic. A periodic signal can be perfectly predicted far into the future and is therefore deterministic. In nature, and specifically in the sustained portion of musical sounds, there is always some variation in the waveform over time; signals that are labeled as periodic are thus not truly deterministic. In this paper we give a formal definition of such a varying periodic signal by means of a modulation coherence function. This measure characterizes the amount of random variation in each Fourier component and captures its statistical properties. The estimation is done in a period- or pitch-synchronous manner and captures even the smallest deviations away from periodicity, with only mild assumptions on the nature of the random modulating noise. The modulation coherence function is very different from the coherence function between two stationary signals, which measures second-order statistical/spectral similarity between signals. It is also different from the non-linear phase coupling measures that were previously applied to musical sounds, which depend on the interaction between several harmonic Fourier components using higher-order statistics. The method is applied to digitized recordings of acoustic signals from several musical instruments.

PACS 43: 60.Cg, 75.De, 75.Ef, 75.Fg

I. INTRODUCTION

This paper investigates fluctuations away from perfect periodicity in pitched acoustic instruments during the sustained portion of their sounds. Acoustical musical instruments that are considered to produce a well-defined pitch emit waveforms that are never exactly periodic (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992, Dubnov and Rodet 2003). The paper focuses on one way of describing those fluctuations that has not been quantitatively addressed so far, namely randomly modulated periodicity (Hinich 2000, Hinich and Wild 2001). This type of random modulation is encountered in signals that are labeled as periodic but exhibit some variation in the waveform over time and are therefore not truly deterministic. A randomly modulated periodic signal is created by a mechanism that has a more or less stable inherent periodicity, with random deviations around the mean periodic value. For example, voiced speech is randomly modulated since the oscillating vocal cords vary slowly in amplitude and phase over several pitch periods in a seemingly random fashion. Other examples include sonar reflections from a pinged target, rotating machinery (Barker et al., 1994), and so on. In this work we investigate instrumental sounds that have a well-defined pitch during a sustained portion of their sound. Although we are dealing with sustained portions of instrumental sounds, it is important to state that these sounds are not in the "steady state" that would be produced by an artificial blowing or bowing machine, but are played by a human player, with all the attendant vibrato, amplitude and pitch variability.
For instance, it should be noted that both the flute and the cello are normally played with significant vibrato at around 6 Hz, while the trumpet is normally played with no vibrato. In the case of the cello, one must also distinguish between the natural playing of stopped and open strings. A note played on an open string contains only a small pitch variation, due to possible variations in the force applied to the bow. Flute vibrato generally adds only a small pitch variation; it produces a large and largely uncorrelated variation in the amplitudes of the upper partials, but not a large variation in the amplitude of the fundamental. In stopped-string bowing, the sounds have both a significant pitch variation (a few percent) over all partials and large amplitude variations among the partials because of body resonances (Fletcher and Rossing, 1998).

Recently, a method for evaluating the degree of phase-synchronous vs. asynchronous deviations among the harmonics of musical instruments in the sustained portions of their sounds was proposed (Dubnov and Rodet, 2003). It is based on estimating the degree of phase coupling among groups of harmonically related partials and is closely related to the evaluation of bi-coherence (using Higher Order Spectral (HOS) analysis). The bi-coherence method differs from the coherence method of the current paper in several aspects. First, the bi-coherence function depends on the interaction between the phases of different partials, while the coherence measure is a local property of every partial. Moreover, phase coupling measures deviations between the phases of sinusoidal components, while coherence captures random modulations that may contain both phase and amplitude deviations.

We use the term "modulation coherence" to denote this new measure of signal deviation from periodicity; it measures the deviations in the frequency domain of the signal spectral components relative to a mean signal that has perfectly coherent, or constant, spectral components with no amplitude or phase deviations between periods. We use the term "coherence" in analogy with its use in physics, as in "coherent light": a signal of zero bandwidth, with no deviations from a single (monochromatic) frequency.

One of the contributions of this paper is the derivation of a theoretical estimate of the amount of decay in modulation coherence due to vibrato (the mathematical details are provided in the Appendix). It might be expected that a signal containing quasi-periodic frequency fluctuations would have little modulation coherence, since it does not have a well-defined period and accordingly no averaging period or mean signal could be determined. Our analysis shows that if vibrato is treated as a (random) frequency modulation, then for a vibrato depth on the order of a semitone (or less, as is typical of musical instruments) the decay in modulation coherence is actually very small. This finding is interesting when considering the experimental modulation coherence results for instruments with vibrato. For instance, comparing open and stopped notes on a cello (i.e. without and with vibrato), we come to the conclusion that the large reduction in modulation coherence in the latter case cannot be attributed to the frequency modulation aspect of the vibrato.

The experimental analyses in the paper are performed using a set of sounds similar to the ones used in (Dubnov and Rodet, 2003) (specifically, the cello, flute and trumpet sounds are the same recordings).
The experiments include both stopped and open string cello sounds, and normal playing for wind instruments with various amounts of vibrato: the flute has a significant vibrato, while the trumpet and French horn have none. These samples were taken from the McGill University Master Samples collection (McGill University Master Samples).

II. THE MODEL

A varying periodic signal with a randomly modulated periodicity is defined as follows.

Definition: A signal {x(t)} is called a randomly modulated periodicity with period T if it is of the form

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} [\mu_k + u_k(t)] \exp(i 2\pi f_k t), for f_k = k/T,     (1.1)

where \mu_{-k} = \mu_k^*, u_{-k}(t) = u_k^*(t), and E u_k(t) = 0 for each k, E being the expectation operator. The K/2+1 processes u_k(t) are jointly dependent random processes that represent the random modulation. This signal can be written as x(t) = s(t) + u(t), where

s(t) = K^{-1} \sum_{k=-K/2}^{K/2} \mu_k \exp(i 2\pi f_k t) and u(t) = K^{-1} \sum_{k=-K/2}^{K/2} u_k(t) \exp(i 2\pi f_k t).     (1.2)

The periodic component s(t) is the mean of x(t). The zero-mean stochastic term u(t) is a real-valued non-stationary process.

A common approach in processing signals with a periodic structure is to segment the observations into frames of length T, so that there is exactly an integer number of periods in each sampling frame. The term sampling frame, or simply frame, is used in this paper in order to match the terminology of the speech and audio processing literature. The waveform in frame m is slightly different from that in frame m+1 due to variation in the stochastic signal. To further simplify notation, let us set the time origin at the start of the first frame. Then the start of the m-th frame is \tau_m = (m-1)T, where m = 1,…,M. The variation of the waveform from frame to frame is determined by a probability mechanism described by the joint distribution of x(\tau_m), ..., x(\tau_m + T - 1).

Now that the concept of a randomly modulated periodicity has been defined, the next step is to develop a measure of the amount of random variation present in each Fourier component of a signal. Such a measure, called a modulation coherence function, is presented in the next section. It is important to note that the definition (1.1) implicitly assumes that the signal period corresponds to an integer number T of samples, and accordingly that the frequencies f_k are integer multiples of 1/T. Since, at this point of the discussion, we are free to specify any sampling frequency, one could in principle sample any periodic analog signal so that it is also discrete periodic. The implication of this choice of sampling frequency is that the spectral analysis involved in estimating the modulation coherence function (i.e. the DFT operation to be performed below) does not need to employ windowing or frequency interpolation techniques in order to obtain additional spectral values "in between" the DFT bins. In practice, the signal sampling frequency is chosen a priori, independently of the signal period, a situation that indeed requires additional methods for improving the spectral analysis. This is done in the section on estimating the coherence function, immediately following the next section. For the sake of clarity of the presentation, we shall first define the modulation coherence function assuming that the sampling of the signal and the signal periodicity indeed correspond to each other (i.e. the signal is discrete periodic).
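To make definition (1.1) concrete, the following MATLAB sketch (ours, not taken from the authors' programs; all names and parameter values are illustrative assumptions) synthesizes a randomly modulated periodic signal by perturbing the complex amplitudes of a few harmonics of a period-T waveform with slowly varying zero-mean noise, and recovers the periodic component s(t) as the sample-mean frame.

% Sketch (assumed parameters): randomly modulated periodicity, Eqs. (1.1)-(1.2)
T  = 100;                                  % period in samples
M  = 200;                                  % number of frames
K  = 8;                                    % number of harmonics kept
t  = 0:M*T-1;                              % time index
mu = [1 0.7 0.4 0.2 0.1 0.05 0.05 0.02];   % mean amplitudes mu_k (real, for simplicity)
x  = zeros(1, M*T);
for k = 1:K
    % slowly varying zero-mean modulation u_k(t): one random value per frame,
    % interpolated to the sampling rate so it changes little within a period
    uk = interp1(0:M, 0.2*randn(1, M+1), t/T, 'pchip');
    % single-sided form; taking the real part stands in for the conjugate
    % negative-frequency terms of Eq. (1.1)
    x  = x + real((mu(k) + uk) .* exp(1i*2*pi*k*t/T));
end
s = mean(reshape(x, T, M), 2);             % sample-mean frame: estimate of s(t)

Averaging the frames in this way is exactly the pitch-synchronous mean used by the estimators of the following sections; the residual between x and the tiled mean frame plays the role of u(t).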
Modulation coherence

The m-th frame of the signal is {x(\tau_m), ..., x(\tau_m + T - 1)}. Its discrete Fourier transform (DFT) at frequency f_r = r/T, for each r = 1,…,T/2, is

X(r) = \sum_{t=0}^{T-1} x((m-1)T + t) \exp(-i 2\pi f_r t)     (2.1)
     = \sum_{t=0}^{T-1} K^{-1} \sum_{k=-K/2}^{K/2} [\mu_k + u_k(t)] \exp(i 2\pi k t / T) \exp(-i 2\pi r t / T)
     = \mu_r + U(r),

where U(r) = \sum_{t=0}^{T-1} u(t) \exp(-i 2\pi f_r t).

Essentially, this result says that the DFT of a randomly modulated periodic signal can be split into the mean spectral component and the contribution of the modulation component at that frequency. Although this may seem trivial, there are two points to consider. The first is that this is the first step in preparing the estimator and defining the modulation coherence. The second is more significant: it shows that the periodic modulation, which is considered here an inherent property of the signal and not an added noise, behaves in the frequency domain as an additive spectral component, i.e. surplus energy and possibly a phase shift added to the spectral components of the mean signal. Mathematically this is of course a manifestation of the linearity of the DFT, but it is considered here in a stochastic context: the added spectral component is a random spectral deviation, and statistics need to be extracted from it in order to use it as a signal characteristic.

To simplify the notation, the index m is not used to subscript the complex-valued random variables X(r) and U(r). The variability of the complex Fourier amplitude X(r) about its mean \mu_r is measured by the variance \sigma_u^2(r) = E[U(r)U^*(r)], which does not depend on the frame index due to the stationarity of the modulation from frame to frame. If \mu_r \neq 0 and \sigma_u(r) = 0, then that complex amplitude is a true periodicity. The larger the value of \sigma_u(r), the greater the variability of that component from frame to frame. If \mu_r = 0 and \sigma_u(r) \neq 0, then that component does not contribute to the periodicity.

In order to quantify the variability, consider the function \gamma_x(r), called the modulation coherence function, defined as follows for each r = 1,...,T/2:

\gamma_x(r) = \sqrt{ |\mu_r|^2 / (|\mu_r|^2 + \sigma_u^2(r)) }.     (2.2)

If \sigma_u(r) = 0 and \mu_r \neq 0, then \gamma_x(r) = 1. This is the case where the f_r frequency component has a constant amplitude and phase. If \mu_r = 0, then \gamma_x(r) = 0. This is the case where the mean value of the f_r frequency component is zero, which is true for each frequency component of any stationary random process with finite energy. A high coherence value can be due either to a large amplitude |\mu_r| relative to the standard deviation \sigma_u(r), or to a small standard deviation relative to the amplitude |\mu_r|. The signal coherence value at each harmonic is dimensionless and is neither a function of the energy in the band nor of the amplitude of the partial.

One should note that this modulation coherence function is very different from the coherence function between two stationary signals (Jenkins and Watts, 1968, p. 352). The coherence (sometimes called coherency) between x_1(t) and x_2(t) at frequency f_r is the correlation between X_1(r) and X_2(r). The closer the coherence value is to one, the higher the correlation between the real and imaginary parts of both Fourier components (Carter, Knapp, and Nuttall, 1973). The modulation coherence function, in contrast, is defined for a single signal [1]. It measures the variability of X(r) about its mean \mu_r. One should keep in mind that the signal in this representation is the mean of the observed signal.

In the signal plus modulation-noise representation of {x(t)}, the signal-to-modulation-noise ratio (SMNR) is \Gamma(r) = |\mu_r|^2 / \sigma_u^2(r) for frequency f_r. Thus \gamma_x^2(r) = \Gamma(r) / (\Gamma(r) + 1) is a monotonically increasing function of the SMNR. Inverting this relationship, it follows that

\Gamma(r) = \gamma_x^2(r) / (1 - \gamma_x^2(r)).     (2.3)

A modulation coherence value of 0.44 yields an SMNR of 0.24, which is -6.2 dB.
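The conversion in (2.3) is easy to verify numerically; the following MATLAB lines (a minimal check, with our variable names) reproduce the quoted example.

gamma   = 0.44;                         % measured modulation coherence
Gamma   = gamma^2 / (1 - gamma^2);      % Eq. (2.3): SMNR, approximately 0.24
GammadB = 10*log10(Gamma);              % approximately -6.2 dB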
The measure is not shift invariant, in the sense that it needs to be "synchronized" to the pitch. As discussed in the next section, the size of the frame is chosen in practice to include multiple periods. The size of the frame defines the resolution bandwidth: the larger the frame, the better the frequency resolution, but at the cost of less averaging (a smaller number of frames for a given signal duration) and accordingly noisier estimates.

[1] Estimating the correlation of a single signal with itself yields a periodicity estimate, i.e. a time shift for which the signal resembles itself. This is again different from modulation coherence.

Estimating the Modulation Coherence Function

As mentioned earlier, in practice the signal will most likely not have a correspondence between the sampling frequency and the signal period. This situation violates the model of (1.1) and requires some changes to the modulation coherence function of (2.2). One simple solution is to assume that the sampling frequency is sufficiently high compared to the signal period. Another solution is to use multiple periods in a frame, and possibly to use zero padding or other spectral interpolation methods for estimating the signal spectrum at frequencies that do not correspond precisely to the DFT frequencies. We address these problems in two stages. First, we present a simple method for finding the fundamental frequency. Then, we use a large frame size (a frame that contains multiple periods instead of a single period) for estimation of the mean signal, and include zero padding for estimation of the spectrum of the remaining difference signal.

Finding the Fundamental Frequency

It is important to know the fundamental frequency of the periodic component in order to obtain the correct frame length for the DFT analysis and for the averaging of the signal. If the fundamental is unknown, it must be estimated from the signal. There are many algorithms in the literature that might be used for pitch or fundamental frequency detection; below we describe the method used in our program.

To find the fundamental of a sound, we subtract the mean (i.e. the DC value) of the signal from each data point x(t_n), where t_n = n\Delta and \Delta is the sampling interval. In our case it is important to find the value of the fundamental frequency to a precision that may be higher than the DFT resolution 1/T of equation (2.1). For this purpose we resample the signal to a higher sampling frequency and then compute the discrete Fourier transform X(r) = \sum_{n=0}^{N-1} x(t_n) \exp(-i 2\pi f_r t_n) using a multiple of the fundamental period instead of a single period, a choice that also stabilizes the average frame with respect to the amplitude, phase and frequency fluctuations of the instrument. The coherence function is estimated from the mean and the variance of the DFT, as explained below, and the process is iterated by manually adjusting the analysis frame size (and changing the DFT analysis frequency accordingly) so as to maximize the resulting coherence values. The maximally coherent results are reported in the following graphs. It should be noted that additional zero padding is not required, since once a matching signal period and DFT analysis frequency are found, the analysis frequency is exact.
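The frame-length search described above can be automated as in the following MATLAB sketch (ours, not the authors' program; it assumes the recorded signal has been loaded into a vector x, the candidate range and the use of one period per frame are illustrative assumptions, and resample requires the Signal Processing Toolbox). It scans candidate frame lengths around an initial pitch estimate and keeps the one that maximizes the average modulation coherence of the first few harmonics, computed with the estimator described in the next subsection.

% Sketch (assumed parameters): pitch-synchronous frame-length search
x  = x - mean(x);                       % remove DC
R  = 4;                                 % up-sampling factor (quarter-sample precision)
xr = resample(x, R, 1);                 % band-limited interpolation (Signal Proc. Toolbox)
Tcand = 380:420;                        % candidate frame lengths after up-sampling (hypothetical)
score = zeros(size(Tcand));
for n = 1:numel(Tcand)
    T  = Tcand(n);
    M  = floor(numel(xr)/T);
    F  = reshape(xr(1:M*T), T, M);      % one frame per column (here one period per frame)
    Xb = fft(mean(F, 2));               % DFT of the sample-mean frame
    Ym = fft(F - mean(F, 2));           % per-frame deviations (implicit expansion, R2016b+)
    s2 = mean(abs(Ym).^2, 2);           % modulation variance estimate
    g  = sqrt(abs(Xb).^2 ./ (abs(Xb).^2 + s2));
    score(n) = mean(g(2:6));            % mean coherence of harmonics 1..5
end
[~, best] = max(score);
T = Tcand(best);                        % chosen pitch-synchronous frame length

If P periods are used per frame, as in the experiments reported below, the harmonics fall at DFT bins P, 2P, ..., and the average in score should be taken over those bins instead.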
Mean signal, modulation variance and modulation coherence function estimates

Suppose that we have observed M frames, each of length T, of the process {x(t)}, as denoted at the beginning of Section II. Recall that \tau_m = (m-1)T for each m = 1,…,M. The sample mean, for each t = 0,…,T-1,

\bar{x}(t) = M^{-1} \sum_{m=1}^{M} x(\tau_m + t),     (3.1)

is an unbiased estimator of the "signal" s(t).

Let \bar{X}(r) denote the r-th component of the DFT of (\bar{x}(0), ..., \bar{x}(T-1)). We define

y(\tau_m + t) = x(\tau_m + t) - \bar{x}(t),     (3.2)

and let Y_m(r) denote the r-th DFT component of (y(\tau_m), ..., y(\tau_m + T - 1)). The estimator of the variance \sigma_u^2(r) is defined as

\hat{\sigma}_u^2(r) = M^{-1} \sum_{m=1}^{M} |Y_m(r)|^2.     (3.3)

The statistic \hat{\gamma}_x(r) is defined by

\hat{\gamma}_x(r) = \sqrt{ |\bar{X}(r)|^2 / (|\bar{X}(r)|^2 + \hat{\sigma}_u^2(r)) }.     (3.4)

It can be shown (Hinich, 2000) that \hat{\gamma}_x(r) is a consistent estimator of \gamma_x(r) for frequency f_r, with an error of O(M^{-1/2}). The ratio |\bar{X}(r)|^2 / \hat{\sigma}_u^2(r) can be used as an estimator of the signal-to-noise ratio \Gamma(r) for frequency f_r.

Example: coherent versus modulation-only signal components

In order to better explain the difference between modulation coherence estimation and other, more standard spectral estimation methods, we consider a signal comprising a single sinusoid at a frequency f_0 and a band-limited noise-only component at the first harmonic frequency 2 f_0. The signal can be written as

x(t) = \mu_1 \sin(2\pi f_0 t) + u_2(t) \sin(2\pi \cdot 2 f_0 \cdot t).     (4.1)

Note that this signal has energy at two frequencies: the component at frequency f_0 has u_1(t) = 0 for all times, which results in a modulation coherence of one, while the component at frequency 2 f_0 has \mu_2 = 0, resulting in a modulation coherence of zero. It should be noted that the bandwidth of the noise component is not specified in the definition of modulation coherence, since both the definition and the analysis are asymptotic. From the point of view of spectral analysis, the second term on the right-hand side of equation (4.1) is a heterodyning of the signal u_2(t), which centers the energy of the noise on frequency 2 f_0, with a bandwidth equal to that of u_2(t).

The following figures present one such example with a frame size of T = 100 samples, a fundamental period of 20 samples (f_0 = 1/20, or five periods per frame), and a low-pass u_2(t) with cutoff equal to the frame rate (it was generated by band-limited interpolation, with a factor of 1:100, of a random sequence generated at the frame rate, i.e. by up-sampling a frame-rate random signal to the original sampling rate). A total of 200 frames were generated. An excerpt from the signal is shown at the top of Figure 1. It can be seen that the signal has strong amplitude variations due to the strongly modulated second harmonic. The mean signal was estimated by averaging the frames; note that this averaging is performed in a "pitch-synchronous" manner. As can be seen in the second panel from the top of Figure 1, the resulting signal corresponds to the periodic component only. The DFT analysis was carried out by multiplying the signal frames by cosine and sine matrices, each generated with an exact period T, resulting in matrices of dimensions 50 x 100 (50 frequency points by 100 time samples). The mean values of the sine and cosine components were used as an estimate of the mean signal spectrum, and their variances were used for estimation of the variance \sigma_u^2(r); both were used for the estimation of the coherence. The bottom panel of Figure 1 shows the coherence values for the 50 DFT bins.
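A construction of this kind is easy to reproduce. The MATLAB sketch below (ours; the interpolation used for u_2(t) is an assumption that only approximates the band-limited interpolation described above) generates the signal of Eq. (4.1) and applies the estimators (3.1)-(3.4) directly via the FFT.

% Sketch: coherent fundamental plus modulation-only second harmonic, Eq. (4.1)
T = 100;  M = 200;  P = 20;               % frame length, number of frames, period
t = 0:M*T-1;
u2 = interp1(0:M, randn(1, M+1), t/T, 'pchip');   % slowly varying noise (~frame rate)
x  = sin(2*pi*t/P) + u2.*sin(2*pi*2*t/P);
F    = reshape(x, T, M);                  % pitch-synchronous frames (5 periods each)
Xbar = fft(mean(F, 2));                   % DFT of the mean frame, Eq. (3.1)
Ym   = fft(F - mean(F, 2));               % deviations, Eq. (3.2) (implicit expansion, R2016b+)
s2   = mean(abs(Ym).^2, 2);               % variance estimate, Eq. (3.3)
g    = sqrt(abs(Xbar).^2 ./ (abs(Xbar).^2 + s2));   % coherence estimate, Eq. (3.4)
% g(6), the bin of f_0 = 5/T, comes out close to 1; g(11), the bin of 2 f_0,
% comes out close to 0, even though both bins carry comparable energy.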
This should be contrasted with spectral estimation using standard methods, such as the periodogram or the correlogram. The power spectral density estimate obtained with the Welch method appears above the modulation coherence graph (third panel from the top of Figure 1). One can see that there is no distinction between the sinusoidal and the band-limited noise components, since both contribute approximately the same energy at their respective frequencies.

PLACE FIGURE 1 ABOUT HERE

III. INFLUENCE OF FREQUENCY MODULATION ON THE MODULATION COHERENCE FUNCTION

The coherence analysis of the previous sections is written out as an amplitude modulation component added to a coherent (i.e. zero-bandwidth) sinusoid. Mathematically speaking, in the way the modulation coherence model is written, one cannot distinguish between amplitude and phase modulation. In fact, the variations of the conjugate positive/negative components u_{-k}(t) = u_k^*(t) include variations both in amplitude and in phase. Since phase modulations, or more precisely their derivatives, are in fact frequency modulations, the model does incorporate effects related to random frequency deviations. In this section we explore the effect of random frequency deviations on the modulation coherence function. In order to maintain clarity of the presentation, the mathematical details of the analysis are deferred to the Appendix.

It should be noted that under natural playing conditions the signal contains quasi-periodic frequency fluctuations that could generally be termed vibrato. Such a signal seems to violate the modulation coherence model, since it does not fulfill the condition of having an exact period of the waveform in every frame. In such a case we consider longer frames that correspond to a multiple of the average signal period. This allows viewing vibrato, over a long enough signal frame, as a particular case of the periodic modulation model, as described below. We consider a mathematical model of random frequency fluctuations,

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} \mu_k \exp(i 2\pi f_k (1 + I v_k(t)) t),     (5.1)

where v_k(t) is a slowly varying random process limited to the range [-1, 1] and I is a multiplicative factor representing the modulation depth, which we call the "modulation index". Note that in this model the frequency modulation is proportional to the frequency of every partial. The DFT analysis can be evaluated by considering the mismatch between the analysis frame size and the instantaneous random deviation of the signal period. The frequency modulations cause a mismatch between the period of x(t) and the size of the DFT analysis frame, which is equal to the mean period (or an integer multiple of it). This results in a distortion of the spectrum of the mean component by a discrete Dirichlet kernel, or discrete sinc function, which can be considered a windowing effect caused by using a rectangular window whose size does not match the period of the signal. The analytical development is simplified by assuming small frequency deviations. The details of the mathematical development of the signal mean spectrum and spectrum autocorrelations are given in the Appendix. Using these approximations, we have calculated the dependence of the mean coherence on the frequency modulation. Figure 2 shows the decay of the modulation coherence as a function of the parameter rI, which we call the "vibrato depth", where r is the partial index and I the modulation index, assuming a zero-mean uniform distribution v_r(t) = v(t) ~ Uniform[-1, 1].
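A curve of this type can be computed numerically by averaging the Dirichlet-kernel factor of the Appendix over a uniformly distributed frequency deviation. The following MATLAB sketch (ours, under the Appendix assumptions that the deviation is constant within a frame and stays within the main lobe; it is not claimed to reproduce Figure 2 exactly) evaluates the coherence of a single partial as a function of the vibrato depth rI.

% Sketch: theoretical coherence decay versus vibrato depth d = r*I
T = 1000;                                       % frame length in samples (assumed)
D = @(del) exp(1i*pi*del*(1-1/T)) .* sin(pi*del) ./ (T*sin(pi*del/T));  % Dirichlet factor
depth = linspace(0.01, 1, 100);                 % vibrato depth values d = r*I
v     = linspace(-1, 1, 2000);                  % uniform grid for the average over v
                                                % (even count avoids the point delta = 0)
coh   = zeros(size(depth));
for n = 1:numel(depth)
    Dd     = D(depth(n)*v);                     % D_{r,I}(r) evaluated at delta = d*v
    coh(n) = abs(mean(Dd)) / sqrt(mean(abs(Dd).^2));   % |E X| / sqrt(E|X|^2)
end
plot(depth, coh), xlabel('vibrato depth rI'), ylabel('modulation coherence')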
PLACE FIGURE 2 ABOUT HERE

The graph can be used to evaluate the decay in modulation coherence as a function of the modulation index I for a given partial index r, or the decay as a function of the partial index for a constant modulation index. For instance, for a modulation index of 1%, the first 20 partials of a perfectly coherent signal correspond to vibrato depths of up to 0.2 on the graph, and they lose less than 3% of their coherence (the coherence of the 20th partial would be around 0.97). Partial number 50, which undergoes a 50% modulation (vibrato depth 0.5), drops in coherence to 0.87, and so on.

Discussion

The estimator \hat{\gamma}_x(r) is a point estimate of the modulation coherence \gamma_x(r) that measures the relative stability of the f_r frequency component of a quasi-periodic signal. The term u(t) is not an additive stationary noise, but rather a residual process capturing the random modulation in x(t) that deviates from the mean periodic component s(t). It should be noted that s(t) and u(t) are different from the deterministic and residual stochastic signals that are estimated in sinusoidal-plus-noise models of speech or musical signals. In the sinusoidal model (Rodet, 1997) the signal is represented as the sum of a small number of sinusoids (partials) with time-varying amplitudes and frequencies. The signal parameters are estimated in the following manner: first, a search for peaks is performed on the amplitudes of the short-time Fourier transform; then the peak amplitudes and the instantaneous frequencies at the peak locations are recorded; finally, a "nearest-neighbor" matching is performed across frames in order to relate the partials in time. In (Serra and Smith, 1989) the authors incorporated a non-sinusoidal "residual" into the model as part of the additive synthesis and represented it as an additive random signal. Theirs is a "signal plus noise" model, where the sinusoidal part is estimated instantaneously from the signal DFT, without matching the signal period to the size of the DFT and with no statistical averaging.

The proposed analysis method differs from the sinusoidal-plus-noise model in the following important aspects. The mean signal is a consistent estimate of the mean periodic signal, estimated on a period-to-period basis (i.e. using pitch-synchronous frames). This is different from the periodogram or spectral peak-picking methods, both in terms of precision and in terms of the underlying statistical model. It can be shown that the estimator of the variance \hat{\sigma}_u^2(r) equals T S_u(f_r) + O(1/T), where S_u(f_r) is the spectrum of u(t) at frequency f_r = r/T. It should be noted that both |\bar{X}(r)| and \hat{\sigma}_u^2(r) can have non-zero values at the same r, a situation that does not occur in sinusoidal-plus-residual models, where every frequency bin belongs either to the deterministic sinusoidal part or to the residual signal. For instance, coherence is defined also at frequencies where the periodic component is extremely weak. To sum up, the underlying statistical assumptions give the modulation coherence a precise statistical meaning. Moreover, the estimators allow detection of a mean periodic signal under very severe SMNR conditions, or detection of coherence for very weak periodic components.

IV. ANALYSIS OF MUSIC INSTRUMENT SIGNALS

In this section we apply the coherence measure to the analysis of instrumental sounds that have a well-defined pitch during a sustained portion of their sound.
It is important to state that these sounds are not produced by an artificial blowing or bowing machine, but are played by a human player, with all the attendant vibrato, amplitude and pitch variation. For instance, both the flute and the cello (stopped-string note) are played with vibrato, while the trumpet is played with no vibrato. Analysis of the signals with vibrato shows a vibrato depth on the order of a few percent of the fundamental frequency.

We investigated the coherence of several musical instruments: the flute, cello, trumpet and French horn. For the purpose of the analysis we used samples from the McGill University Master Samples. The signals are sampled at 44100 Hz, 16 bit. In most cases we used sounds of the same pitch, playing C4; the only exception was the analysis of an open-string cello sound, which was played at the note A3. As mentioned above, the sounds were produced by natural playing, i.e. including vibrato and amplitude modulations, as well as in the presence of recording-room conditions (these are not anechoic-room recordings). The results displayed in the figures below correspond to signal segments of 40,000 samples of the different instruments (i.e. a duration of about 0.9 sec). Four signal periods were used in every frame. The exact periods for each signal were manually tuned around the candidate pitch frequency so as to maximize the signal coherence. The signals were up-sampled by a factor of 4 to allow a quarter-sample precision in finding the period.

Analysis Results

The signal coherence function provides additional information about the signal beyond the power spectrum. There are evident differences in coherence between the different instruments.

PLACE FIGURE 3 ABOUT HERE
PLACE FIGURE 4 ABOUT HERE
PLACE FIGURE 5 ABOUT HERE
PLACE FIGURE 6 ABOUT HERE

One particularly interesting case is the cello. It produces its sound through a periodic bow excitation that passes through a complicated resonant body. There is a big difference in coherence between the open and stopped sounds. According to the theoretical analysis of the influence of frequency modulation on signal coherence in the previous section, if the only effect of vibrato had been frequency modulation, the resulting reduction in coherence should have been only a few percent. In reality, the vibrato of the cello results in a significant loss of coherence, a fact that suggests that vibrato causes other changes in the signal beyond simple frequency modulation.

PLACE FIGURE 7 ABOUT HERE

Figure 7 shows the analysis of an open-string cello sound, which is indeed very coherent.

Discussion

In this paper we have presented a method for the analysis of deviations that occur in periodic sounds. Two types of deviations were considered: 1) randomly modulated periodicity, consisting of amplitude or phase deviations at every signal period that occur "around" a constant mean signal, and 2) random frequency modulations that change the signal period and may occur due to vibrato. It has been shown theoretically that the effect of frequency modulation alone is negligible for the lower partials under normal vibrato playing conditions (a vibrato depth of a quarter tone is only about 3% of the fundamental frequency). Let us summarize our findings:

1. The stopped cello sound played with vibrato has high coherence only at the first two partials; the rest of the harmonics have low coherence.
2. The flute has coherence above 0.5 for the first three harmonics. Higher harmonics have significantly lower coherence.
3. The French horn has high coherence up to the first 12 or 13 partials, with a smaller drop in coherence occurring earlier (at partial 7).
4. The trumpet exhibits a consistently decaying pattern of coherence, decreasing gradually from almost perfect coherence to as low as 0.1 at the 18th harmonic.
5. The cello without vibrato (open string) has high coherence up to partial number 21, and the coherence remains relatively high for the higher partials as well.

Considering the differences between the open-string and the vibrato cello sounds, it is evident that the drop in coherence caused by vibrato is much larger than the effect of frequency modulation alone, as predicted by the theory. This finding supports the assumption that a complex interaction exists between the bowed-string excitation and the body (Beauchamp 1974, McIntyre et al. 1981, Schumacher 1992, Weinreich 1997).

Reproducible results

The MATLAB programs used to generate the examples are available at http://music.ucsd.edu/~sdubnov/SigCoh. A general program for computing signal modulation coherence is available from the second author.

APPENDIX

In order to analyze the effect of frequency fluctuations on the modulation coherence we consider a process

x(t) = K^{-1} \sum_{k=-K/2}^{K/2} \mu_k \exp(i 2\pi f_k (1 + I v_k(t)) t),

where v_k(t) is a slowly varying random process limited to the range [-1, 1] and I is a modulation index. Note that in our model the frequency deviation is proportional to the frequency of every partial. We use the subscript k in the definition of v_k(t) to allow for separate deviations in every partial, although in most practical situations it could be the same, or a highly correlated, process.

Considering the m-th frame of the signal {x(\tau_m), ..., x(\tau_m + T - 1)}, we write the DFT of component r as

X(r) = \sum_{t=0}^{T-1} x((m-1)T + t) \exp(-i 2\pi f_r t)
     = \sum_{t=0}^{T-1} K^{-1} \sum_{k=-K/2}^{K/2} \mu_k \exp(i 2\pi (k t / T)(1 + I v_k(t))) \exp(-i 2\pi r t / T)
     = \sum_{k=-K/2}^{K/2} \mu_k D_{k,I}(r),

where D_{k,I}(r) = T^{-1} \sum_{t=0}^{T-1} \exp(i 2\pi (t/T)(k - r + k I v_k(t))).

Let us assume that the random deviation v_k(t) is constant during the frame {x(\tau_m), ..., x(\tau_m + T - 1)}. Defining \delta_{k,I}(r) = k(1 + I v_k) - r, we rewrite the factor as

D_{k,I}(r) = T^{-1} \sum_{t=0}^{T-1} \exp(i 2\pi t \delta_{k,I}(r) / T) = T^{-1} e^{i\pi \delta_{k,I}(r)(1 - 1/T)} \frac{\sin(\pi \delta_{k,I}(r))}{\sin(\pi \delta_{k,I}(r)/T)}.

In order to evaluate the mean and variance of the r-th DFT component, we assume that the deviation is small relative to the spacing between the harmonics, and we limit ourselves to deviations that occur within the main lobe of the resulting function (this function is close to the so-called Dirichlet kernel, or discrete sinc, with an additional multiplicative complex exponential):

E(X(r)) \approx E(D_{r,I}(r)) \mu_r,

E(X(r) X^*(r)) = \sum_k \sum_{k'} \mu_k \mu_{k'}^* E(D_{k,I}(r) D_{k',I}^*(r)) \approx \sum_k |\mu_k|^2 E\left[ \frac{1}{T^2} \frac{\sin^2(\pi \delta_{k,I}(r))}{\sin^2(\pi \delta_{k,I}(r)/T)} \right] \approx |\mu_r|^2 E\left[ \frac{1}{T^2} \frac{\sin^2(\pi \delta_{r,I}(r))}{\sin^2(\pi \delta_{r,I}(r)/T)} \right].

Using the estimate \hat{\gamma}_x(r) = |E(X(r))| / \sqrt{E(X(r) X^*(r))}, we can obtain an analytical approximation to the modulation coherence function for different values of the modulation parameters. One may note that for r = k we have \delta_{r,I}(r) = r I v_r(t) and D_{r,I}(r) = D_{1,rI}(1). In the case of uniformly distributed v_k(t), averaging D_{r,I}(r) amounts to integrating the function D_{1,rI}(1) over \delta_{r,I}(r) values in the range [-rI, rI], with \hat{\gamma}_x(r) independent of the spectral amplitudes \mu_r. Figure 2 presents a graph of the theoretical evaluation of \hat{\gamma}_x(r) as a function of rI (which we call the "vibrato depth"), obtained by integrating D_{r,I}(r) over different ranges [-rI, rI]. This function is compared to modulation coherence values obtained from a simulation of the modulation coherence function using a randomly modulated partial (calculated over 2000 instances of a sinusoidal component with randomly chosen frequency modulations at different vibrato depths). Both the theoretical derivation and the simulations assume a uniform distribution of the frequency modulation values within the vibrato depth range. The resulting expression can be considered either as a function of the partial index r for a constant modulation index I, or as a function of I for a given r.
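A simulation of this kind can be set up as in the following MATLAB sketch (ours, loosely following the description above; the frame length, partial index and vibrato depth are illustrative assumptions, and the deviation is drawn once per frame as in the Appendix approximation).

% Sketch: Monte Carlo coherence of one partial under random frequency modulation
T = 100;  r = 5;                          % frame length and nominal partial bin
M = 2000;                                 % number of simulated frames (instances)
d = 0.5;                                  % vibrato depth d = r*I
t = (0:T-1)';
X = zeros(1, M);
for m = 1:M
    v    = 2*rand - 1;                            % uniform deviation in [-1, 1]
    xm   = cos(2*pi*(r/T)*(1 + (d/r)*v)*t);       % one frequency-modulated frame
    Xm   = fft(xm);
    X(m) = Xm(r+1);                               % DFT bin of the nominal partial
end
g = abs(mean(X)) / sqrt(mean(abs(X).^2));          % simulated modulation coherence

Sweeping d over a range of vibrato depths and plotting g against d gives a simulated counterpart to the theoretical curve of Figure 2.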
REFERENCES

Barker, R. W., Klutke, G.-A., Hinich, M. J., Ramirez, C. N., and Thornhill, R. J. (1994), "Development and application of a statistically based feature extraction algorithm for monitoring tool wear in circuit board assembly," Circuits, Systems, and Signal Processing 13(4), 411-434.

Beauchamp, J. W. (1974), "Time-variant spectra of violin tones," J. Acoust. Soc. Am. 56(3).

Carter, G. C., Knapp, C. H., and Nuttall, A. H. (1973), "Estimation of the magnitude-squared coherence function via overlapped Fast Fourier Transform processing," IEEE Trans. Audio Electroacoustics AU-21, 337-344.

Dubnov, S., and Rodet, X. (2003), "Investigation of phase coupling phenomena in sustained portion of musical instruments sound," J. Acoust. Soc. Am. 113(1), 348-359.

Fletcher, N. H., and Rossing, T. D. (1998), The Physics of Musical Instruments, 2nd ed. (Springer-Verlag, New York).

Gardner, W. A., and Franks, L. E. (1975), "Characterization of cyclostationary random signal processes," IEEE Trans. Information Theory IT-21, 4-14.

Hinich, M. J. (2000), "A statistical theory of signal coherence," IEEE Journal of Oceanic Engineering 25(2), 256-261.

Hinich, M. J., and Wild, P. (2001), "Testing time series stationarity against an alternative whose mean is periodic," Macroeconomic Dynamics 5, 380-412.

Jenkins, G. M., and Watts, D. G. (1968), Spectral Analysis and its Applications (Holden Day, San Francisco).

McGill University Master Samples, Faculty of Music, McGill University, 555 Sherbrooke St. West, Montreal, Quebec.

McIntyre, M. E., Schumacher, R. T., and Woodhouse, J. (1981), "Aperiodicity in bowed-string motion," Acustica 49, 13-32.

Rodet, X. (1997), "Musical sound signals analysis/synthesis: Sinusoidal+residual and elementary waveform models," in Proc. of the IEEE Time-Frequency and Time-Scale Workshop, Coventry, U.K.

Schumacher, R. T. (1992), "Analysis of aperiodicities in nearly periodic waveforms," J. Acoust. Soc. Am. 91, 438-451.

Serra, X., and Smith, J. O. (1989), "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Computer Music Journal 14(4), 12-24.

Weinreich, G. (1997), "Directional tone color," J. Acoust. Soc. Am. 101, 2389-2406.

FIGURE CAPTIONS

Figure 1: Signal waveform, mean signal, power spectrum and modulation coherence, from top to bottom. The signal comprises one coherent component and one modulation-only component. See text for more detail.

Figure 2: Modulation coherence as a function of vibrato depth. The graph shows the reduction of coherence as a function of partial number for a constant modulation index, or as a function of modulation index for a given partial. See text for more detail.

Figure 3: Cello C4 (with natural vibrato). The top panel shows several signal waveforms from different segments of the sustained portion of the signal. The second panel from the top shows the mean signal waveform.
The third panel from the top shows the power spectrum (solid) and the mean spectrum (dashed). The bottom panel shows the modulation coherence function at the harmonics.

Figure 4: Flute C4. The top panel shows the power spectrum (solid) and the mean spectrum (dashed). The bottom panel shows the modulation coherence function at the harmonics. A drop in coherence level below 0.5 is observed at partials higher than 3.

Figure 5: French Horn C4. The top panel shows the power spectrum (solid) and the mean spectrum (dashed). Gradual decay in coherence is observed for the lower partials, with a larger drop occurring at partial 13 and above.

Figure 6: Trumpet C4. The top panel shows the power spectrum (solid) and the mean spectrum (dashed).

Figure 7: Cello A3, open string (no vibrato). The top panel shows several signal waveforms from different segments of the sustained portion of the signal. The mean signal is not shown, but it is clearly very similar to the individual waveforms. The second panel from the top shows the power spectrum (solid) and the mean spectrum (dashed). The bottom panel shows the modulation coherence function at the harmonics.