A. Heading

Name of Lecturer : Roland A. Madden
Title            : Statistics - Power Spectrum Analysis
Date of Lecture  : July 24th, 2000
Noted By         : Sungsu Park and Zolina Olga

B. Introductory Paragraph

Some basic statistical techniques related to spectral and cross-spectral analysis are treated in this lecture. Most meteorological data are in the form of time series. The average and variance (covariance or correlation for two variables) are the major statistics calculated in the time domain. Time-domain data can be transformed to the frequency domain using Fourier transform methods, which give additional useful information. The basic transform methods and two averaging processes in the frequency domain, frequency averaging and segment averaging, will be introduced. For two variables, cross-spectrum analysis, which allows one to determine the relationship between two time series as a function of frequency, will be treated. In the final two sections, a more realistic null hypothesis for statistical tests of meteorological data (the red noise spectrum) and the variance of a time-averaged time series will be introduced.

C. Lecture Section

1. One Variable - Time Domain

For a given time series y(t), t = 1, 2, ..., N, the average and variance are defined as follows:

Average :
$$\bar{y} = \frac{1}{N}\sum_{t=1}^{N} y_t$$

Variance :
$$s^2 = \frac{1}{N-1}\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2$$

If $\sigma^2$ and $s^2$ are the true and sample variance, respectively, then $s^2 (N-1)/\sigma^2$ has a chi-squared distribution with N-1 degrees of freedom, i.e.,

$$\mathrm{Prob}\left[\, s^2 \ge \frac{\sigma^2\,\chi^2_{N-1;\alpha}}{N-1} \right] = \alpha$$

where $\chi^2_{N-1;\alpha}$ is the value that a chi-squared variable with N-1 degrees of freedom exceeds with probability $\alpha$.

[ Example.1 ] Fig.1 shows the distribution of the variances of 100 sample groups, each consisting of 64 points drawn from a population of random numbers (zero mean, variance 1). The solid line is the theoretical chi-squared distribution with 63 d.o.f. (degrees of freedom). The distribution of the sample variances fits the chi-squared distribution well.

Fig.1.

2.
One Variable - Frequency Domain

A sample time series y(t), t = 1, 2, ..., N, can be expressed as a Fourier series:

$$y_t = a_0 + \sum_{n=1}^{N/2}\left[ a_n \cos\frac{2\pi n t}{N} + b_n \sin\frac{2\pi n t}{N} \right]$$

Here, the average and total variance of the given time series are

$$\bar{y} = a_0 \qquad \text{and} \qquad s^2 = \frac{1}{2}\sum_{n=1}^{N/2}\left(a_n^2 + b_n^2\right),$$

respectively. The total variance is the sum of the variances explained by the individual spectral modes, each of which follows from

$$E\left[a^2\cos^2\theta\right] = \frac{a^2}{2\pi}\int_0^{2\pi}\cos^2\theta\, d\theta = \frac{a^2}{2}, \qquad E\left[b^2\sin^2\theta\right] = \frac{b^2}{2\pi}\int_0^{2\pi}\sin^2\theta\, d\theta = \frac{b^2}{2}$$

So the variance explained by each spectral mode is given by

$$P_n = \frac{a_n^2 + b_n^2}{2}$$

A plot of these spectral variance estimates against frequency is called a periodogram.

[ Example. 2 - Periodogram ] Fig.2 shows the periodogram of one 64-member time series. The solid line is the white noise spectrum and the dotted line is the upper 95% confidence limit of the white noise spectrum, deduced from the chi-squared distribution with d.o.f. = 2. The periodogram shows peaks near frequencies 0.11 and 0.39 that differ significantly from white noise at the 95% confidence level. It is important to note that we expect 5% (or whatever 'significance' level is chosen) of the estimates to differ 'significantly' from our null hypothesis even if the hypothesis is true. This matters when doing spectrum analysis on data with no idea of what we are looking for: there is then a good chance that one of the 'significant peaks' is one we should expect by chance, and not something worth noting or investigating further.

Fig.2.

[ Example. 3 - Periodogram ] Fig.3 shows 100 periodogram estimates of 64-point white noise samples. The 95% confidence limits of the sample variance distribution are calculated by the following formulas and are denoted by dotted lines in the figure.
$$\mathrm{Prob}\left[\, s^2 \ge \frac{\sigma^2\,\chi^2_{2;0.975}}{2} \right] = 0.975; \qquad \chi^2_{2;0.975} = 0.0506$$

$$\mathrm{Prob}\left[\, s^2 \ge \frac{\sigma^2\,\chi^2_{2;0.025}}{2} \right] = 0.025; \qquad \chi^2_{2;0.025} = 7.38$$

Here $s^2$ is a periodogram estimate and $\sigma^2$ its true (white noise) value, so 95% of the estimates are expected to fall between $0.0253\,\sigma^2$ and $3.69\,\sigma^2$.

Fig.3.

Even though the samples are white noise, some spectral peaks are significantly different from the white noise spectrum. The significant peaks also differ from sample to sample, i.e., they are not reproducible. These unstable spectral distributions are due to the few degrees of freedom inherent in each spectral mode (d.o.f. = 2 for each mode). To increase the reliability of each spectral estimate, we should increase the number of degrees of freedom. A spectrum with higher d.o.f. is calculated from the periodogram. To increase the d.o.f. and compute a smooth and stable spectrum, the following two methods can be used, separately or in combination. For details, the reader may refer to Bendat and Piersol (1971) and Blackman and Tukey (1958).

1. Frequency Averaging

Adjacent periodogram estimates are linearly averaged with a given weighting function (equal weighting in this example). The d.o.f. of each spectral estimate increases, but the spectral resolution decreases (the bandwidth increases). For odd M, the frequency-averaged spectrum is calculated as follows, where N is the total length of the given time series, M is the frequency averaging interval and n is the spectral mode index.

$$\tilde{S}_n = \sum_{m=-\lfloor M/2 \rfloor}^{\lfloor M/2 \rfloor} w_m P_{n+m}, \qquad w_m = \frac{1}{M}$$

$$\mathrm{dof} = 2M, \qquad \text{bandwidth } bw = \frac{M}{N}$$

[ Example.4 - Frequency Averaging ] Fig.4 shows the frequency-averaged spectrum of 3136 random-number sample points. 49 adjacent periodogram estimates are averaged (M = 49). The solid line is the true white noise spectrum and the dashed lines are the upper and lower 95% confidence limits of the white noise spectrum. The bandwidth is 49/3136 [fq] and is shown in the upper-right part of the figure.
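The frequency averaging of Example 4 can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function names `periodogram` and `frequency_average`, the random seed, and the choice to drop the (M-1)/2 estimates at each end of the spectrum are assumptions made for this sketch, not part of the lecture.

```python
import numpy as np

def periodogram(y):
    """Variance explained by each harmonic n = 1 .. N/2-1: P_n = (a_n^2 + b_n^2) / 2."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Y = np.fft.rfft(y - y.mean())
    # For 0 < n < N/2, |Y_n| = N * sqrt(a_n^2 + b_n^2) / 2, hence P_n = 2 |Y_n|^2 / N^2.
    P = 2.0 * np.abs(Y) ** 2 / N**2
    return P[1 : N // 2]          # drop the mean (n = 0) and the Nyquist harmonic

def frequency_average(P, M):
    """Equal-weight average of M adjacent estimates (M odd), w_m = 1/M."""
    if M % 2 == 0:
        raise ValueError("M must be odd")
    w = np.full(M, 1.0 / M)
    # mode="valid" keeps only bands with a full set of M neighbours,
    # so (M - 1) / 2 estimates are lost at each end of the spectrum.
    return np.convolve(P, w, mode="valid")

rng = np.random.default_rng(0)    # arbitrary seed, chosen for reproducibility
N, M = 3136, 49
y = rng.standard_normal(N)        # white noise, zero mean, variance 1

P = periodogram(y)                # 2 d.o.f. per estimate
S = frequency_average(P, M)       # 2 * M d.o.f. per estimate
dof = 2 * M
bandwidth = M / N
white_level = 2.0 * y.var() / N   # expected white noise level per harmonic
```

Comparing the scatter of `P / white_level` with that of `S / white_level` shows the effect described in the text: the raw estimates vary widely around 1, while the averaged ones stay close to it at the cost of a bandwidth of M/N.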
As expected, most of the spectral peaks are not significantly different from the white noise spectrum.

Fig.4.

2. Segment Averaging

The whole time series is broken into L segments and a periodogram is calculated for each segment. The periodograms of all segments are then averaged for each spectral band with equal weighting. This increases the d.o.f. of each spectral band but decreases the spectral resolution.

$$P_n(l) = \frac{a_n^2(l) + b_n^2(l)}{2}, \qquad \tilde{S}_n = \frac{1}{L}\sum_{l=1}^{L} P_n(l)$$

$$\mathrm{dof} = 2L, \qquad \text{bandwidth } bw = \frac{L}{N}$$

[ Example. 5 - Segment Averaging ] Fig.5 shows an example of a segment-averaged spectrum. The frequency-averaged spectrum is denoted by (*). As can be seen from the figure, most of the spectral peaks are not significantly different from those of white noise. Note that the frequency-averaged spectrum is not exactly the same as the segment-averaged spectrum.

Fig.5.

3. Two Variables - Time Domain

For two given time series x(t) and y(t), t = 1, 2, ..., N, the covariance and correlation are defined as follows:

CV (Covariance) :
$$CV = \frac{1}{N}\sum_{t=1}^{N}\left(x_t - \bar{x}\right)\left(y_t - \bar{y}\right)$$

r (Correlation) :
$$r = \frac{CV}{S_x S_y}$$

where $S_x$ and $S_y$ are the standard deviations of x(t) and y(t).

4. Two Variables - Frequency Domain

Cross-spectrum analysis allows one to determine the relationship between two time series as a function of frequency. The cospectrum is the in-phase covariance and the quadrature spectrum is the out-of-phase covariance explained by the corresponding spectral mode.

$$x_t = A_0 + \sum_{n=1}^{N/2}\left[ A_n \cos\frac{2\pi n t}{N} + B_n \sin\frac{2\pi n t}{N} \right]$$

$$y_t = a_0 + \sum_{n=1}^{N/2}\left[ a_n \cos\frac{2\pi n t}{N} + b_n \sin\frac{2\pi n t}{N} \right]$$

Co(n) (Cospectrum) :
$$Co(n) = \frac{a_n A_n + b_n B_n}{2}$$

Qu(n) (Quadrature) :
$$Qu(n) = \frac{b_n A_n - a_n B_n}{2}$$

CV (Total Covariance) :
$$CV = \sum_{n=1}^{N/2} Co(n)$$

The overall sign of Qu(n) is a matter of convention; with the sign used here the phase is positive when x leads y.

Using the frequency-averaged (or segment-averaged) cross spectrum, the Coherence Squared (Coh) and Phase can be calculated.
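The cospectrum and quadrature spectrum of this section can be checked numerically. The sketch below is a minimal illustration, assuming NumPy; the function names, the seed, and the wrap-around shift used to build x(t) = y(t+1) are assumptions made for the example. The quadrature sign is taken as (b_n A_n - a_n B_n)/2, which is also an assumed convention, chosen so that x leading y gives a positive phase.

```python
import numpy as np

def fourier_coefficients(y):
    """Cosine and sine coefficients (a_n, b_n) for harmonics n = 1 .. N/2-1."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Y = np.fft.rfft(y)[1 : N // 2]
    # With y_t = a_0 + sum_n [a_n cos(2 pi n t / N) + b_n sin(2 pi n t / N)],
    # the discrete Fourier transform is Y_n = N * (a_n - 1j * b_n) / 2.
    return 2.0 * Y.real / N, -2.0 * Y.imag / N

def cross_periodogram(x, y):
    """Unaveraged cospectrum and quadrature spectrum of x (A_n, B_n) and y (a_n, b_n)."""
    A, B = fourier_coefficients(x)
    a, b = fourier_coefficients(y)
    co = (a * A + b * B) / 2.0
    qu = (b * A - a * B) / 2.0    # assumed sign convention: positive when x leads y
    return co, qu

rng = np.random.default_rng(1)    # arbitrary seed
N = 2048
y = rng.standard_normal(N)
x = np.roll(y, -1)                # x(t) = y(t+1), with wrap-around at the last point

co, qu = cross_periodogram(x, y)
cov = np.mean((x - x.mean()) * (y - y.mean()))   # time-domain covariance (1/N form)
# co.sum() reproduces cov except for the Nyquist harmonic omitted above.
phase = np.arctan2(qu, co)        # equals 2 * pi * n / N for this shifted pair
```

For single harmonics, `(co**2 + qu**2)` equals the product of the two periodograms exactly, so a coherence squared formed from unaveraged estimates is always 1. This is why the coherence is only meaningful after frequency or segment averaging.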
For odd M, the frequency-averaged cross spectrum is calculated as follows.

$$\tilde{C}o(n) = \sum_{m=-\lfloor M/2 \rfloor}^{\lfloor M/2 \rfloor} w_m\, Co(n+m), \qquad \tilde{Q}u(n) = \sum_{m=-\lfloor M/2 \rfloor}^{\lfloor M/2 \rfloor} w_m\, Qu(n+m), \qquad w_m = \frac{1}{M}$$

$$\mathrm{dof} = 2M, \qquad \text{bandwidth } bw = \frac{M}{N}$$

The Coherence Squared and Phase are given by the following.

Coh(n) (Coherence Squared) :
$$Coh(n) = \frac{\tilde{C}o(n)^2 + \tilde{Q}u(n)^2}{\tilde{S}_x(n)\,\tilde{S}_y(n)}$$

Ph(n) (Phase) :
$$Ph(n) = \arctan\frac{\tilde{Q}u(n)}{\tilde{C}o(n)}$$

[ Example.6 - Cross Spectrum ] Consider two random time series with zero mean and variance 1. The length of the time series is N = 2048 and 49 adjacent spectral estimates are averaged. In this case, the d.o.f. of each spectral estimate is 98 and the bandwidth is 49/2048 [fq]. Assume that x(t) = y(t+1). What might we expect? Say

$$y_t \sim \cos\frac{2\pi n t}{N}, \qquad \text{then} \qquad x_t \sim \cos\frac{2\pi n (t+1)}{N}, \qquad \text{phase}(n) = \frac{2\pi n}{N}$$

Fig.6 shows the results of the cross-spectrum analysis. The spectra of x(t) and y(t) are nearly the same and the coherence is nearly 1 for all spectral bands. The phase increases linearly with frequency, as expected. Methods for testing the statistical significance of Coh and Phase can be found in Julian (1975) and Jenkins and Watts (1969).

Fig.6.

5. More realistic null hypothesis for the meteorological data

Meteorological data usually have some persistence (memory). So when we look for signals that differ from the usual persistence, it is better to use a persistence model (red noise) as the null hypothesis than random numbers (white noise). Red noise is defined by the first-order autoregressive process. The true variance and the spectrum of red noise are given by the following formulas; the spectrum is obtained from the finite Fourier transform of the lag correlation function.
First Order Autoregressive Process :
$$y_t = \rho\, y_{t-1} + \epsilon_t$$

Here,
$\rho$ : lag-one autocorrelation
$\epsilon_t$ : random number, i.e., $\bar{\epsilon} = 0$, $\sigma_\epsilon^2 = 1$

Variance :
$$\sigma_y^2 = \frac{\sigma_\epsilon^2}{1 - \rho^2}$$

Spectrum :
$$S(f) = \frac{1 - \rho^2}{1 + \rho^2 - 2\rho\cos 2\pi f}$$

where f : frequency

6. Variance of a time-average

For a given time series y(t), t = 1, 2, ..., with zero mean and variance $\sigma^2$, what is the variance ($\sigma_M^2$) of averages of M values?

$$\sigma_M^2 = E\left[\left(\frac{y_t + \cdots + y_{t+M-1}}{M}\right)^2\right] = \frac{\sigma^2}{M}\left[\, 1 + 2\left(1 - \frac{1}{M}\right)\rho_1 + 2\left(1 - \frac{2}{M}\right)\rho_2 + \cdots + \frac{2}{M}\,\rho_{M-1} \right] = \frac{\sigma^2}{M}\, T_o$$

where $\rho_k$ is the lag-k autocorrelation and

$T_o$ : characteristic time between independent estimates, "variance inflation factor"
$M / T_o$ : effective sample size

Fig.7 shows the relationship between the length of the time average (T) and the variance inflation factor ($T_o$) associated with the lag-one autocorrelation in a first-order autoregressive process. When T is small and the lag-one autocorrelation is large, $T_o$ increases rapidly with T. But as T becomes large, $T_o$ shows little variation. For details, the reader may refer to Madden (1979).

Fig.7.

D. References

1. Significance Levels for Coherence Squared
   Julian, P.R., 1975: J. Atmos. Sci., V32, 836-837, and references therein.
2. Confidence Limits for Phase Angles
   Jenkins, G.M., and D.G. Watts, 1969: Spectral Analysis and Its Applications. Holden-Day, San Francisco, 525pp.
3. Frequency and Segment Averaging
   Bendat, J.S., and A.G. Piersol, 1971: Random Data: Analysis and Measurement Procedures. Wiley-Interscience, New York, 407pp.
4. Degrees of Freedom for Spectral Estimates
   Blackman, R.B., and J.W. Tukey, 1958: The Measurement of Power Spectra. Dover, New York, 190pp.
5. Variance of Time Average
   Madden, 1979: J. Appl. Meteor., V18, p.703
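As a numerical companion to Sections 5 and 6, the red noise spectrum and the variance inflation factor can be evaluated directly. This is a minimal sketch, assuming NumPy and assuming the first-order autoregressive model, for which the lag-k autocorrelation is rho**k; the function names and the values rho = 0.8 and M = 30 are arbitrary choices for illustration.

```python
import numpy as np

def red_noise_spectrum(f, rho):
    """S(f) = (1 - rho^2) / (1 + rho^2 - 2 rho cos(2 pi f)), f in cycles per time step."""
    return (1.0 - rho**2) / (1.0 + rho**2 - 2.0 * rho * np.cos(2.0 * np.pi * f))

def variance_inflation_factor(rho, M):
    """T_o = 1 + 2 * sum_{k=1}^{M-1} (1 - k/M) * rho_k, with rho_k = rho**k (AR(1) assumption)."""
    k = np.arange(1, M)
    return 1.0 + 2.0 * np.sum((1.0 - k / M) * rho**k)

rho, M = 0.8, 30                          # illustrative values only
sigma2 = 1.0 / (1.0 - rho**2)             # variance of y when the noise variance is 1

# Direct check: variance of the mean of M consecutive values from their covariance matrix.
lags = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
C = sigma2 * rho**lags                    # C[i, j] = sigma^2 * rho^|i - j|
var_mean_direct = C.sum() / M**2

T_o = variance_inflation_factor(rho, M)
var_mean_formula = sigma2 * T_o / M       # sigma_M^2 = sigma^2 * T_o / M
effective_sample_size = M / T_o

# The spectrum is normalised so that its average over 0 <= f <= 0.5 is 1.
f = (np.arange(1000) + 0.5) / 2000.0      # midpoints of 1000 equal bins on [0, 0.5]
mean_spectrum = red_noise_spectrum(f, rho).mean()
```

For this model, `variance_inflation_factor(rho, M)` approaches (1 + rho) / (1 - rho) as M grows, which matches the statement that the factor changes little once the averaging length is large.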