Some Statistical Models of Periodic Phenomenon

William A. Sethares
Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706-1691 USA. 608-262-5669, sethares@ece.wisc.edu

March 19, 2006

Models of statistical periodicity do not presume that the signal itself is periodic; rather, they assume that there is a periodicity in the underlying statistical distributions. For example, a record of temperature versus time might exhibit both daily and yearly cycles. A record of power consumption in a city might have daily fluctuations in both mean and variance: midnight usage might be very predictable (small variance), while mid-afternoon usage might be highly variable depending on the amount of air conditioning needed. Statistical models of periodicity need not be much more complex than the classic ball-and-urn models of elementary probability.

This note presents four models that generate repetitive behavior. The lattice model presumes a fixed grid on which random variations are imposed; the successive model builds the rungs of its ladder from realizations of positive random variables; the additive model assumes an underlying periodicity in the mean of the stochastic process; and the AM model presumes a periodic change in the variances. For a particular application, the best choice is the model whose outputs most closely resemble the character of the data.

There are two kinds of models. In the lattice and successive models, the interval of repetition (i.e., the period) is random. In the additive and AM models, the period is fixed and the values of the process are random. The models are compared and contrasted in Sect. 5.

1 Lattice Model

Perhaps the simplest statistical model with repetitive behavior assumes a fixed grid of times $\tau + kT$ with period $T$ and phase (or starting point) $\tau$. A collection of random variables $\delta_k$, $k = 1, 2, 3, \ldots$, is assumed to have a known distribution.
The output of the model is the process $x(t)$ defined by the grid and the $\delta_k$ as

$$x(t) = \begin{cases} 1 & \text{if } t = \tau + kT + \delta_k \text{ for some } k, \\ 0 & \text{otherwise.} \end{cases}$$

The process is zero almost everywhere except in the neighborhood of the grid points $\tau + kT$, where it takes on the value one at $\tau + kT + \delta_k$. Thus the process $\delta_k$ defines deviations from the regular lattice. If the $\delta_k$ have zero mean, then the expected time of the $n$th event is $\tau + nT$. If the $\delta_k$ were degenerate (equal to zero for all $k$), then $x(t)$ would be exactly periodic. The upper half of Fig. 1 diagrams the parameters in the lattice model and shows how the realizations cluster around the underlying grid.

This model is generative in the sense that if the values of all the parameters are known, it is easy to determine how likely it is that a candidate data set arose from (was generated by) the model. For example, suppose that $\delta_k$ is distributed normally with the same mean $\mu$ and variance $\sigma^2$ for all $k$. The model is then fully specified by the parameter vector $(T, \tau, \mu, \sigma^2)$. For a given set of event times $t_1, t_2, \ldots, t_n$, and assuming the $\delta_k$ are independent, the likelihood that the data arose from a model of this form is

$$p(t_1, \ldots, t_n \mid T, \tau, \mu, \sigma^2) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(t_k - \tau - kT - \mu)^2}{2\sigma^2} \right).$$

[Figure 1 panels: "Lattice Model" (upper) and "Successive Model" (lower).]

Figure 1: The lattice model is built from a discrete stochastic process $\delta_k$ that defines the deviation from an underlying periodic grid. The successive model is built from a collection of positive random variables $s_i$ that directly define the time between events. While similar over short times, the long-term behavior of the two models is quite different.

2 Successive Model

The successive model is built from a collection of positive random variables $s_i$, $i = 1, 2, 3, \ldots$, which are assumed to have a known distribution. Let $t_n = \sum_{i=1}^{n} s_i$ be the cumulative sum of the $s_i$ (with $t_0 = 0$) and let $\tau$ be the starting point. Then the output is

$$x(t) = \begin{cases} 1 & \text{if } t = \tau + t_n \text{ for some } n, \\ 0 & \text{otherwise.} \end{cases}$$

The process is zero almost everywhere except when $t = \tau + t_n$, where it is one. If the $s_i$ had mean $T$ and variance zero, then the output would be strictly periodic with period $T$.
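The lattice and successive constructions can be compared by simulation. The Python sketch below is a minimal illustration under stated assumptions, not the author's code: the function names (`lattice_events`, `successive_events`, `lattice_log_likelihood`, `mean_sq_deviation`) are hypothetical, the deviations $\delta_k$ are assumed i.i.d. normal, and each $s_i$ is assumed to be the absolute value of a normal variable with mean $T$ and variance $\sigma^2$. It also evaluates the logarithm of the Gaussian likelihood for the lattice model, assuming the $k$th observed time belongs to the $k$th grid point.

```python
import math
import random


def lattice_events(n, T, tau, mu, sigma, rng):
    # Lattice model: event k sits at tau + k*T + delta_k, k = 1..n,
    # with delta_k drawn i.i.d. from N(mu, sigma^2) (assumed distribution).
    return [tau + k * T + rng.gauss(mu, sigma) for k in range(1, n + 1)]


def successive_events(n, T, tau, sigma, rng):
    # Successive model: event n sits at tau + t_n, where t_n = s_1 + ... + s_n
    # and each s_i = |N(T, sigma^2)| is a positive inter-event time (assumed).
    # The initial event at tau itself (t_0 = 0) is omitted from the list.
    times, t = [], 0.0
    for _ in range(n):
        t += abs(rng.gauss(T, sigma))
        times.append(tau + t)
    return times


def lattice_log_likelihood(times, T, tau, mu, sigma):
    # Log of the product of normal densities of the deviations t_k - tau - k*T,
    # assuming independent delta_k ~ N(mu, sigma^2) and that the k-th observed
    # time belongs to the k-th grid point.
    ll = 0.0
    for k, t in enumerate(times, start=1):
        d = t - tau - k * T - mu
        ll -= 0.5 * math.log(2 * math.pi * sigma**2) + d**2 / (2 * sigma**2)
    return ll


def mean_sq_deviation(times, T, tau):
    # Average squared distance of event k from its grid point tau + k*T.
    total = sum((t - tau - k * T) ** 2 for k, t in enumerate(times, start=1))
    return total / len(times)


rng = random.Random(0)
n, T, tau, sigma = 1000, 1.0, 0.0, 0.1
lat = lattice_events(n, T, tau, 0.0, sigma, rng)
suc = successive_events(n, T, tau, sigma, rng)

# Lattice deviations stay near sigma^2; successive deviations accumulate.
print("lattice    mean squared deviation:", mean_sq_deviation(lat, T, tau))
print("successive mean squared deviation:", mean_sq_deviation(suc, T, tau))

# The true period should score a higher log-likelihood than a slightly wrong one.
print(lattice_log_likelihood(lat, 1.00, tau, 0.0, sigma))
print(lattice_log_likelihood(lat, 1.01, tau, 0.0, sigma))
```

With the same $T$ and $\sigma$, the lattice events stay within a few $\sigma$ of their grid points, while the successive events drift away because each $t_n$ carries the accumulated variation of every earlier $s_i$.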
A diagram of the construction of the successive model is given in the lower half of Fig. 1. If the distribution of the $s_i$ is known (for instance, each $s_i$ might be the absolute value of a normal random variable with mean $T$ and variance $\sigma^2$), the model is fully specified by the parameter vector $(T, \tau, \sigma^2)$.

Despite the similarities in the short term, the long-term behavior of the successive model is very different from that of the lattice model. The $n$th event in the lattice model stays close to $\tau + nT$, since it deviates by the single random variable $\delta_n$, whereas the $n$th event in the successive model, at $\tau + t_n$, depends on the values of all previous $s_i$. If the $s_i$ are independent with a common variance, the variance of $t_n$ grows linearly with $n$, so the events wander ever farther from any fixed grid. Effectively, the lattice approach models deviations from a single periodicity, whereas the successive approach begins each new repetition where the previous one ended.

3 Additive Model

The additive model presumes that there is an underlying $T$-periodic sequence $a_0, a_1, \ldots, a_{T-1}$ and a random process $\delta_j$ that defines the deviations of the output from the periodic sequence. The values of the output, given by

$$x_j = a_{(j \bmod T)} + \delta_j,$$

fluctuate around the periodic sequence. This is illustrated in the upper half of Fig. 2 for the case $T = 3$ using the three-periodic sequence $a_0, a_1, a_2$. If $\delta_j$ is degenerate (zero for all $j$), the output simply repeats the periodic sequence $a_j$. This model represents a periodic signal corrupted by additive noise (the $\delta_j$) and is amenable to a large variety of analytical techniques. The parameters needed to fully specify the model are the distribution of the $\delta_j$, the periodic sequence $a_j$, and a starting time $\tau$.

4 Amplitude Modulation (AM) Model

Like the additive model, the AM model assumes an underlying $T$-periodic sequence, here a sequence of variances $\sigma_0^2, \sigma_1^2, \ldots, \sigma_{T-1}^2$. Rather than defining mean values as above, the periodic sequence defines the variance of the zero-mean process $s_j$; that is, the variance of $s_j$ is $\sigma^2_{(j \bmod T)}$. The output of the model is a realization of $s_j$. The normal case $s_j \sim N(0, \sigma^2_{(j \bmod T)})$ is illustrated in the lower half of Fig. 2 for the three-periodic case with variances $\sigma_0^2$, $\sigma_1^2$, and $\sigma_2^2$. The model is fully specified by the distribution of the $s_j$, the variances $\sigma_0^2, \ldots, \sigma_{T-1}^2$, and the starting time $\tau$.

[Figure 2 panels: "Additive Model" (upper) and "Amplitude Modulation Model" (lower).]

Figure 2: Two models representing cyclostationary stochastic processes. The additive model is most simply viewed as a periodic signal $a_j$ corrupted by additive noise $\delta_j$. Outputs of the AM model are generated from a stochastic process $s_j$ defined with a periodic pattern of variances $\sigma_j^2$.

5 Discussion of the Models

Stochastic variations in repetition are important in many fields. For example, in communications the message is effectively random while the modulation, synchronization, and frame structure impose periodic fluctuations. In mechanics, rotating elements provide periodicity while cavitation, turbulence, and varying loads impose randomness. Rhythmic physiological processes such as the heartbeat and brainwaves are clearly repetitive but are neither completely periodic nor fully predictable.

These models exhibit two kinds of repetitive behavior. In the lattice and successive models, the timing of the repetitions depends on a random process. In the additive and AM models, the process is locked to a fixed grid on which the statistics are defined by an underlying periodic sequence. Such processes have been studied extensively in the mathematical literature.

A discrete-time stochastic process $x_k$ is called (wide-sense) stationary if both the expectation $E[x_k]$ and the autocorrelation $E[x_k x_{k+\ell}]$ are independent of $k$.¹ Stationarity captures the idea that while a process may be random from moment to moment, it has an underlying unity in that the distribution of the process remains fixed through time. Only slightly more complex is the idea that the mean and autocorrelation may be periodic in time. A process is called cyclostationary if both the expectation $E[x_k]$ and the autocorrelation $E[x_k x_{k+\ell}]$ are periodic functions of $k$ [1].
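A quick numerical illustration of these definitions: the sketch below (Python, standard library only; all function names are hypothetical) simulates the additive and AM models for $T = 3$ and estimates the mean and variance separately at each phase $j \bmod T$. The additive output should show a periodic mean with constant variance, and the AM output a zero mean with periodic variance. Normal $\delta_j$ and $s_j$, independence across $j$, and the particular values of $a_j$ and $\sigma_j^2$ are assumptions chosen for illustration. Phase-wise means and variances are only part of what cyclostationarity requires; the definition also involves the autocorrelation at every lag.

```python
import random
import statistics


def additive_model(a, n, noise_sigma, rng):
    # Additive model: x_j = a[j mod T] + delta_j, with delta_j assumed
    # i.i.d. N(0, noise_sigma^2).
    T = len(a)
    return [a[j % T] + rng.gauss(0.0, noise_sigma) for j in range(n)]


def am_model(variances, n, rng):
    # AM model: zero-mean s_j with variance variances[j mod T];
    # normality and independence across j are assumptions.
    T = len(variances)
    return [rng.gauss(0.0, variances[j % T] ** 0.5) for j in range(n)]


def phase_stats(x, T):
    # Sample mean and (population) variance of the samples at each phase p = j mod T.
    out = []
    for p in range(T):
        phase = x[p::T]
        out.append((statistics.mean(phase), statistics.pvariance(phase)))
    return out


rng = random.Random(1)
T, n = 3, 30000
a = [1.0, -2.0, 0.5]          # hypothetical periodic means a_0, a_1, a_2
variances = [0.1, 1.0, 4.0]   # hypothetical sigma_0^2, sigma_1^2, sigma_2^2
x_add = additive_model(a, n, 0.2, rng)
x_am = am_model(variances, n, rng)

for name, x in (("additive", x_add), ("AM", x_am)):
    for p, (m, v) in enumerate(phase_stats(x, T)):
        print(f"{name:8s} phase {p}: mean {m:+.3f}, variance {v:.3f}")
```

Grouping samples by phase in this way presumes that the period $T$ is known in advance.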
Both the additive and AM models are cyclostationary. More generally, suppose that $p_k$ is a periodic sequence and that $w_k$ is a stationary process. Then the sum $p_k + w_k$ is cyclostationary, which generalizes the additive model. Similarly, the product $p_k w_k$ is cyclostationary, which generalizes the AM model (for example, take $p_k = \sigma_{(k \bmod T)}$ and let $w_k$ have zero mean and unit variance).

In modeling a particular phenomenon or data set, certain models may be more appropriate than others. For example, the lattice model specifies the times at which events occur and hence may be best applied to event-driven signals; a drum sequence recorded in MIDI might be an ideal fit. The successive model might be useful for predicting a heartbeat (or another physiological signal) because each beat begins when the previous one ends; there is not necessarily an underlying rigid lattice to which the beats must conform. The additive and AM models are more appropriate for representing audio signals because they deal explicitly with amplitudes and not solely with timing.

Of course, many other such models are possible. Correlations may occur within the various stochastic processes. Distributions may change over time, effectively combining various models such as the sum (additive) and product (AM) models. Or the timing of events within one of the cyclostationary models may be modified by a random variable (combining, for example, the timing structure of the successive model with the amplitude variations of the additive model). In applications where the parameters are not known beforehand, there is an essential tradeoff between the accuracy of the model and the number of parameters required. Choosing an appropriate model (one that is complex enough to capture the essence of the phenomenon of interest, yet simple enough to remain tractable) is probably the most difficult step in any application.

¹ Analogous definitions apply to continuous-time stochastic processes.

References

[1] W. A. Gardner and L. E. Franks, “Characterization of cyclostationary random signal processes,” IEEE Trans. Inform. Theory, vol. IT-21, pp. 4–14, 1975.