A. Heading
Name of Lecturer : Roland A. Madden
Title : Statistics - Power Spectrum Analysis
Date of Lecture : July 24th, 2000
Noted By : Sungsu Park and Zolina Olga
B. Introductory Paragraph
Some basic statistical techniques related to spectral and cross-spectral analysis will be treated in this lecture. Most meteorological data are in the form of time series. The average and variance (covariance or correlation for two variables) are the major statistics calculated in the time domain. Time domain data can be transformed to the frequency domain using Fourier transform methods, which give additional useful information. The basic transform methods and two averaging processes in the frequency domain, frequency averaging and segment averaging, will be introduced. For two variables, cross spectrum analysis, which allows one to determine the relationship between two time series as a function of frequency, will be treated. In the final two sections, a more realistic null hypothesis for statistical tests of meteorological data (the red noise spectrum) and the variance of a time-averaged time series will be introduced.
C. Lecture Section
1. One Variable - Time Domain
For a given time series y(t) (t = 1, 2, ..., N), the average and variance are defined as follows:
Average :
$$\bar{y} = \frac{1}{N}\sum_{t=1}^{N} y(t)$$
Variance :
$$s^2 = \frac{1}{N-1}\sum_{t=1}^{N}\left(y(t) - \bar{y}\right)^2$$
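If $\sigma^2$ and $s^2$ are the true and sample variance, respectively, then $(N-1)s^2/\sigma^2$ has a chi squared distribution with N-1 degrees of freedom, i.e.,
$$\mathrm{Prob}\left\{ s^2 \ge \frac{\sigma^2\,\chi^2_{N-1;\alpha}}{N-1} \right\} = \alpha$$
where $\chi^2_{\nu;\alpha}$ is the value exceeded with probability $\alpha$ by a chi squared variable with $\nu$ degrees of freedom.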
[ Example.1 ]
Fig. 1 shows the distribution of the variance of 100 sample groups, each of which consists of 64 samples from a population of random numbers (zero mean and unit variance). The solid line is the theoretical chi squared distribution with 63 d.o.f. (degrees of freedom). The distribution of the sample variance fits the chi squared distribution well.
Fig.1.
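The chi squared result above can be checked numerically by repeating Example 1. The sketch below assumes NumPy and Gaussian random numbers (the chi squared result is exact only for normally distributed data); the function name `scaled_sample_variances` and the seed are illustrative choices, not part of the lecture. Instead of plotting a histogram as in Fig. 1, it compares the first two moments of $(N-1)s^2/\sigma^2$ with those of a chi squared variable.

```python
import numpy as np


def scaled_sample_variances(n_groups=100, n_points=64, sigma2=1.0, seed=0):
    """Return (N-1) s^2 / sigma^2 for n_groups independent Gaussian samples of
    n_points values each; this should follow chi^2 with N-1 d.o.f."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(n_groups, n_points))
    s2 = samples.var(axis=1, ddof=1)   # sample variance with the 1/(N-1) factor
    return (n_points - 1) * s2 / sigma2


q = scaled_sample_variances()          # 100 groups of 64 points, as in Example 1
dof = 64 - 1
# A chi^2 variable with nu d.o.f. has mean nu and variance 2 nu.
print(f"mean {q.mean():.1f} (theory {dof}), variance {q.var():.1f} (theory {2 * dof})")
```

With only 100 groups the moments scatter noticeably around 63 and 126; increasing `n_groups` tightens the agreement.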
2. One Variable - Frequency Domain
A sample time series y(t), t = 1, 2, ..., N, can be expressed as a Fourier series:
$$y(t) = a_0 + \sum_{n=1}^{N/2}\left[a_n\cos\left(\frac{2\pi n t}{N}\right) + b_n\sin\left(\frac{2\pi n t}{N}\right)\right]$$
Here, the average and total variance of the given time series are
$$\bar{y} = a_0 \qquad\text{and}\qquad s^2 = \frac{1}{2}\sum_{n=1}^{N/2}\left(a_n^2 + b_n^2\right),$$
respectively.
The total variance is the sum of the variances explained by the individual spectral modes, each of which is calculated as follows:
$$E\left[(a\cos\theta)^2\right] = \frac{a^2}{2\pi}\int_0^{2\pi}\cos^2\theta\,d\theta = \frac{a^2}{2}, \qquad E\left[(b\sin\theta)^2\right] = \frac{b^2}{2\pi}\int_0^{2\pi}\sin^2\theta\,d\theta = \frac{b^2}{2}$$
So the variance explained by each spectral mode is given by
$$P(n) = \frac{a_n^2 + b_n^2}{2}$$
A plot of these spectral variance estimates against frequency is called a periodogram.
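A minimal sketch of the periodogram, assuming NumPy and computing the Fourier coefficients with an FFT rather than by summing sines and cosines directly; the helper name `periodogram` is hypothetical. It also checks the total-variance identity, which holds exactly for the 1/N form of the variance once the Nyquist mode is treated separately (at n = N/2 the sine term vanishes, so that mode carries only $a_{N/2}^2$).

```python
import numpy as np


def periodogram(y):
    """Return frequencies n/N and P(n) = (a_n^2 + b_n^2)/2 for n = 1..N//2."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Y = np.fft.rfft(y - y.mean())   # Y[n] = sum_t y(t) exp(-2 pi i n t / N)
    # a_n = (2/N) Re Y[n], b_n = -(2/N) Im Y[n], so (a_n^2 + b_n^2)/2 = 2|Y[n]|^2 / N^2
    P = 2.0 * np.abs(Y[1:]) ** 2 / N**2
    if N % 2 == 0:
        # Nyquist mode: b_{N/2} = 0 and a_{N/2} = Y[N/2]/N, so its variance
        # a_{N/2}^2 is half of what the general formula gives.
        P[-1] /= 2.0
    freqs = np.arange(1, P.size + 1) / N
    return freqs, P


rng = np.random.default_rng(1)
y = rng.normal(size=64)
f, P = periodogram(y)
print(P.sum(), y.var())   # sum of P(n) equals the 1/N variance
```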
[ Example. 2 - Periodogram]
Fig. 2 shows the periodogram of one 64-member time series. The solid line is the periodogram of white noise and the dotted line is the upper limit of the 95% confidence level of the white noise spectrum, deduced from the chi squared distribution with d.o.f. = 2. The periodogram estimates around frequencies 0.11 and 0.39 show peaks significantly different from white noise at the 95% confidence level. It is important to note that we expect 5% (or whatever 'significance' level is chosen) of the estimates to differ 'significantly' from our null hypothesis even if the hypothesis is true. This matters when doing spectrum analysis on data with no prior idea of what we are looking for: there is then a good chance that one of the 'significant' peaks is one that we should expect by chance, and not something worth noting or investigating further.
Fig.2.
[ Example. 3 - Periodogram ]
Fig. 3 shows 100 periodogram estimates of 64-point white noise samples. The limits of the 95% confidence interval of the sample variance distribution (here $s^2$ is a periodogram estimate and $\sigma^2$ its true white noise value) are calculated with the following formulas and are shown by dotted lines in the figure.
$$\mathrm{Prob}\left\{ s^2 \ge \frac{\sigma^2\,\chi^2_{2;0.975}}{2} \right\} = 0.975, \qquad \chi^2_{2;0.975} = 0.0506$$
$$\mathrm{Prob}\left\{ s^2 \ge \frac{\sigma^2\,\chi^2_{2;0.025}}{2} \right\} = 0.025, \qquad \chi^2_{2;0.025} = 7.38$$
Fig.3.
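The two chi squared values of Example 3 have a closed form: a chi squared variable with 2 d.o.f. exceeds x with probability exp(-x/2), so $\chi^2_{2;\alpha} = -2\ln\alpha$. The sketch below, assuming NumPy and Gaussian white noise with unit variance, reproduces 0.0506 and 7.38 and counts how often periodogram estimates of pure white noise fall outside the 95% band; in line with the remark in Example 2, roughly 5% do even though the null hypothesis is true. The function name and the seed are illustrative.

```python
import numpy as np


def chi2_2dof_upper(alpha):
    """Value exceeded with probability alpha by a chi^2 variable with 2 d.o.f.
    Its tail probability is exp(-x/2), so the value is -2 ln(alpha)."""
    return -2.0 * np.log(alpha)


lo = chi2_2dof_upper(0.975)   # about 0.0506
hi = chi2_2dof_upper(0.025)   # about 7.38

# Monte Carlo check: 100 white-noise series of 64 points with unit variance.
# Each estimate P(n) is (1/N) times a chi^2 variable with 2 d.o.f., so about
# 5% of the estimates fall outside the band although H0 is true.
rng = np.random.default_rng(2)
N, n_series = 64, 100
y = rng.normal(size=(n_series, N))
Y = np.fft.rfft(y - y.mean(axis=1, keepdims=True), axis=1)
P = 2.0 * np.abs(Y[:, 1:N // 2]) ** 2 / N**2   # modes 1..N/2-1 (Nyquist dropped)
scaled = P * N                                   # ~ chi^2 with 2 d.o.f.
outside = np.mean((scaled < lo) | (scaled > hi))
print(f"limits {lo:.4f} and {hi:.2f}; fraction outside: {outside:.3f}")
```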
Even though the samples are drawn from white noise, some spectral peaks are significantly different from the white noise spectrum. Also, the significant peaks differ from sample to sample, i.e., they are not reproducible. These unstable spectral distributions are due to the few degrees of freedom inherent in each spectral estimate (d.o.f. = 2 for each mode). A spectrum with so few degrees of freedom is unlikely to be reproducible. To increase the reliability of each spectral estimate, we should increase the number of degrees of freedom. A spectrum with higher d.o.f. is calculated from the periodogram. To increase the d.o.f. and compute a smooth and stable spectrum, the following two methods can be used, separately or in combination. For details, the reader may refer to Bendat and Piersol (1971) and Blackman and Tukey (1958).
1. Frequency Averaging
Adjacent periodogram estimates are linearly averaged with a given weighting function (equal weighting in this example). In this case the d.o.f. of each spectral estimate increases, but the spectral resolution decreases (the bandwidth increases).
For odd M, the frequency averaged spectrum is calculated as follows. Here, N is the total length of the given time series, M is the frequency averaging interval and n is the spectral mode index.
M2
S ~n  =

wm  Pn + m
m = –M  2
1w m = ---M
dof = 2  M
M
bandwidth  bw  = ----N
[ Example.4 - Frequency Averaging ]
Fig. 4 shows the frequency averaged spectrum of 3136 random number sample points, with 49 adjacent spectral estimates averaged (M = 49). The solid line is the true white noise spectrum and the dashed lines are the upper and lower limits of the 95% confidence level of the white noise spectrum. The bandwidth is 49/3136 [fq] and is shown in the upper-right part of the figure. As expected, most of the spectral peaks are not significantly different from the white noise spectrum.
Fig.4.
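A sketch of frequency averaging with equal weights, set up like Example 4 (N = 3136, M = 49) under the assumption of Gaussian white noise with unit variance. It assumes NumPy and uses a running mean, so $\tilde{S}(n)$ is returned for every centre where the full window fits; neighboring values then share most of their periodogram estimates and are not independent. The helper names are hypothetical.

```python
import numpy as np


def periodogram(y):
    """Frequencies n/N and P(n) = (a_n^2 + b_n^2)/2 for n = 1..N//2."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Y = np.fft.rfft(y - y.mean())
    P = 2.0 * np.abs(Y[1:]) ** 2 / N**2
    if N % 2 == 0:
        P[-1] /= 2.0                  # Nyquist mode carries only a_{N/2}^2
    return np.arange(1, P.size + 1) / N, P


def frequency_average(P, M):
    """Equal-weight average S~(n) = sum_{|m| <= (M-1)/2} P(n+m) / M, returned
    only for centres n where the whole window fits inside the spectrum."""
    if M % 2 == 0:
        raise ValueError("M must be odd")
    weights = np.full(M, 1.0 / M)     # w_m = 1/M
    return np.convolve(P, weights, mode="valid")


rng = np.random.default_rng(3)
N, M = 3136, 49                       # as in Example 4
y = rng.normal(size=N)                # white noise with sigma^2 = 1
f, P = periodogram(y)
S = frequency_average(P, M)
f_centre = f[M // 2 : M // 2 + S.size]
dof, bw = 2 * M, M / N                # 98 d.o.f., bandwidth 49/3136
# For white noise E[P(n)] = 2 sigma^2 / N, so this ratio should be close to 1.
print(dof, bw, S.mean() * N / 2)
```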
2. Segment Averaging
The whole time series is broken into L segments and a periodogram is calculated for each segment. The periodograms of all segments are then linearly averaged for each spectral band with equal weighting. This increases the d.o.f. of each spectral band but decreases the spectral resolution.
a n2 l + b n2 l
P  n l  = -----------------------2
1
S~ n  = --- 
L
[ Example. 5 - Segment Averaging ]
L
 P  n l 
l=1
dof = 2  L
L
bandwidt h  bw  = ---N
Fig. 5 shows an example of a segment averaged spectrum; the frequency averaged spectrum is denoted by (*). As can be seen from the figure, most of the spectral peaks are not significantly different from those of white noise. Note that the frequency averaged spectrum is not exactly the same as the segment averaged spectrum.
Fig.5.
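Segment averaging can be sketched the same way. The example below assumes NumPy and Gaussian white noise, and splits the length of Example 4 into L = 49 segments of 64 points so that the d.o.f. (98) match the frequency averaged case; Example 5 does not state its parameters, so these values are illustrative, as are the helper names. Each segment's mean is removed before its periodogram is taken.

```python
import numpy as np


def periodogram(y):
    """Frequencies n/N and P(n) = (a_n^2 + b_n^2)/2 for n = 1..N//2."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Y = np.fft.rfft(y - y.mean())
    P = 2.0 * np.abs(Y[1:]) ** 2 / N**2
    if N % 2 == 0:
        P[-1] /= 2.0                       # Nyquist mode carries only a_{N/2}^2
    return np.arange(1, P.size + 1) / N, P


def segment_average(y, L):
    """Split y into L equal segments (any remainder is dropped), compute the
    periodogram of each, and average them with equal weight 1/L."""
    y = np.asarray(y, dtype=float)
    seg_len = y.size // L
    segments = y[: L * seg_len].reshape(L, seg_len)
    results = [periodogram(s) for s in segments]
    freqs = results[0][0]                  # same grid n/seg_len for every segment
    S = np.mean([P for _, P in results], axis=0)
    return freqs, S


rng = np.random.default_rng(4)
N, L = 3136, 49
y = rng.normal(size=N)                     # white noise with sigma^2 = 1
f, S = segment_average(y, L)
dof, bw = 2 * L, L / N                     # 98 d.o.f., bandwidth 49/3136
# For white noise E[P(n)] = 2 sigma^2 / seg_len, so this should be close to 1.
print(f.size, dof, bw, S.mean() * (N // L) / 2)
```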
3. Two Variables - Time Domain
For given time series x(t) and y(t), t = 1, 2, ..., N, the covariance and correlation are defined as follows:
$$CV\ (\text{Covariance}) = \frac{1}{N}\sum_{t=1}^{N}\left(x(t) - \bar{x}\right)\left(y(t) - \bar{y}\right)$$
$$r\ (\text{Correlation}) = \frac{CV}{S_x\,S_y}$$
where $S_x$ and $S_y$ are the standard deviations of x(t) and y(t).
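A direct transcription of the two definitions, assuming NumPy; the function names are illustrative. Because $S_x$ and $S_y$ are computed with the same 1/N normalization as CV, the factors cancel and r agrees with NumPy's `corrcoef`.

```python
import numpy as np


def covariance(x, y):
    """CV = (1/N) sum (x - mean x)(y - mean y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))


def correlation(x, y):
    """r = CV / (S_x S_y); np.std uses the 1/N normalization by default."""
    return covariance(x, y) / (np.std(x) * np.std(y))


rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = 0.6 * x + 0.8 * rng.normal(size=500)   # correlated with x by construction
print(covariance(x, y), correlation(x, y), np.corrcoef(x, y)[0, 1])
```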
4. Two Variables - Frequency Domain
Cross spectrum analysis allows one to determine the relationship between two time series as a function of frequency. The cospectrum is the in-phase covariance and the quadrature spectrum is the out-of-phase covariance explained by the corresponding spectral mode.
N2
x t = Ao +
2nt
2nt
- + B n  sin  ------------
 A n  cos  ---------- N 
N 
n=1
N2
y  t  = ao +
2nt
2nt
- + b n  sin  ------------
 an  cos  ---------- N 
N 
n=1
$$Co(n)\ (\text{Cospectrum}) = \frac{a_n A_n + b_n B_n}{2}$$
$$Qu(n)\ (\text{Quadrature}) = \frac{b_n A_n - a_n B_n}{2}$$
$$CV\ (\text{Total Covariance}) = \sum_{n=1}^{N/2} Co(n)$$
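The cospectrum and quadrature spectrum can be computed from FFTs of the two series. In the sketch below (assuming NumPy; `co_quad` is a hypothetical helper), the FFT conventions give $Co(n) + i\,Qu(n) = 2X_n\overline{Y_n}/N^2$, where $X_n$ and $Y_n$ are the discrete Fourier transforms of x and y. The usage lines check that the cospectrum sums to the 1/N covariance, which requires, as for the periodogram, counting the Nyquist mode only once.

```python
import numpy as np


def co_quad(x, y):
    """Cospectrum Co(n) = (a_n A_n + b_n B_n)/2 and quadrature spectrum
    Qu(n) = (b_n A_n - a_n B_n)/2 for n = 1..N//2, where (A_n, B_n) are the
    Fourier coefficients of x and (a_n, b_n) those of y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = x.size
    X = np.fft.rfft(x - x.mean())
    Y = np.fft.rfft(y - y.mean())
    # With A_n = (2/N) Re X, B_n = -(2/N) Im X (and a_n, b_n likewise from Y):
    # Co + i Qu = 2 X conj(Y) / N^2
    C = 2.0 * X[1:] * np.conj(Y[1:]) / N**2
    if N % 2 == 0:
        C[-1] /= 2.0            # Nyquist mode: sine terms vanish, count it once
    return C.real, C.imag


rng = np.random.default_rng(5)
x = rng.normal(size=256)
y = 0.5 * x + rng.normal(size=256)
Co, Qu = co_quad(x, y)
print(Co.sum(), np.mean((x - x.mean()) * (y - y.mean())))   # the two agree
```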
Using the frequency-averaged (or segment-averaged) cross spectrum, the coherence squared (Coh) and phase can be calculated. For odd M, the frequency averaged cross spectrum is calculated as follows:
$$\tilde{C}o(n) = \sum_{m=-(M-1)/2}^{(M-1)/2} w_m\,Co(n+m), \qquad \tilde{Q}u(n) = \sum_{m=-(M-1)/2}^{(M-1)/2} w_m\,Qu(n+m), \qquad w_m = \frac{1}{M}$$
$$\mathrm{dof} = 2M, \qquad \text{bandwidth } (bw) = \frac{M}{N}$$
The coherence squared and phase are given by
$$Coh(n)\ (\text{Coherence Squared}) = \frac{\tilde{C}o(n)^2 + \tilde{Q}u(n)^2}{\tilde{S}_x(n)\,\tilde{S}_y(n)}, \qquad Ph(n)\ (\text{Phase}) = \tan^{-1}\frac{\tilde{Q}u(n)}{\tilde{C}o(n)}$$
where $\tilde{S}_x$ and $\tilde{S}_y$ are the frequency averaged spectra of x(t) and y(t).
[ Example.6 - Cross Spectrum ]
Consider two random time series with zero mean and variance 1. The length of the time series is N = 2048 and 49 adjacent spectral estimates are averaged, so the d.o.f. of each spectral estimate is 98 and the bandwidth is 49/2048 [fq]. Assume that x(t) = y(t+1). What might we expect? Say
$$y(t) \propto \cos\frac{2\pi n t}{N}, \quad\text{then}\quad x(t) \propto \cos\frac{2\pi n (t+1)}{N}, \quad\text{so}\quad \mathrm{phase}(n) = \frac{2\pi n}{N}$$
Fig. 6 shows the results of the cross spectrum analysis. The spectra of x(t) and y(t) are nearly the same and the coherence is nearly 1 in all spectral bands. The phase increases linearly with frequency, as expected. Methods for testing the statistical significance of Coh and Phase can be found in Julian (1975) and Jenkins and Watts (1969).
Fig.6.
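Example 6 can be reproduced with a short sketch, assuming NumPy and Gaussian random numbers. To make x(t) = y(t+1) hold at every point, it shifts y circularly with `np.roll`, an assumption not stated in the example; the Nyquist mode is dropped for simplicity, and `np.arctan2` is used as the four-quadrant form of $\tan^{-1}(\tilde{Q}u/\tilde{C}o)$. The helper names are hypothetical. The averaged coherence comes out close to 1 and the phase close to $2\pi n/N$.

```python
import numpy as np


def cross_spectra(x, y):
    """S_x(n), S_y(n), Co(n), Qu(n) for n = 1..K with K = (N-1)//2
    (the Nyquist mode, if present, is dropped to keep the formulas uniform)."""
    N = len(x)
    X = np.fft.rfft(x - np.mean(x))
    Y = np.fft.rfft(y - np.mean(y))
    K = (N - 1) // 2
    X, Y = X[1:K + 1], Y[1:K + 1]
    Sx = 2.0 * np.abs(X) ** 2 / N**2
    Sy = 2.0 * np.abs(Y) ** 2 / N**2
    C = 2.0 * X * np.conj(Y) / N**2          # Co + i Qu
    return Sx, Sy, C.real, C.imag


def smooth(v, M):
    """Equal-weight running mean over M adjacent estimates (w_m = 1/M)."""
    return np.convolve(v, np.full(M, 1.0 / M), mode="valid")


def coherence_phase(x, y, M):
    Sx, Sy, Co, Qu = (smooth(v, M) for v in cross_spectra(x, y))
    coh = (Co**2 + Qu**2) / (Sx * Sy)
    phase = np.arctan2(Qu, Co)
    return coh, phase


N, M = 2048, 49                              # as in Example 6
rng = np.random.default_rng(5)
y = rng.normal(size=N)
x = np.roll(y, -1)                           # x(t) = y(t+1), wrapping at the end
coh, phase = coherence_phase(x, y, M)
n = np.arange(1, (N - 1) // 2 + 1)[M // 2 : M // 2 + coh.size]   # centre modes
print(coh.min(), np.max(np.abs(phase - 2 * np.pi * n / N)))
```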
5. A more realistic null hypothesis for meteorological data
Meteorological data usually have some persistence (memory). So when we try to find variation signals that differ from this usual persistence, it is better to use a persistence model (red noise) as the null hypothesis rather than random numbers (white noise). Red noise is defined by the first order autoregressive process. The true variance and the spectrum of red noise are given by the following formulas; the spectrum is calculated as the Fourier transform of the lag correlation function.
First Order Autoregressive Process :
$$y(t) = \rho\,y(t-1) + \varepsilon(t)$$
Here $\rho$ ($|\rho| < 1$) is the lag-one autocorrelation and $\varepsilon(t)$ is a random number, i.e., $\bar{\varepsilon} = 0$, $\sigma_\varepsilon^2 = 1$.
$$\text{Variance}(y) = \frac{\sigma_\varepsilon^2}{1-\rho^2}$$
Spectrum :
$$S(f) = \frac{1-\rho^2}{1+\rho^2-2\rho\cos 2\pi f}$$
where f is frequency.
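The red noise formulas can be checked by simulation. The sketch below assumes NumPy, Gaussian $\varepsilon(t)$ with unit variance and $\rho = 0.5$ (an illustrative value); it compares the sample variance with $\sigma_\varepsilon^2/(1-\rho^2)$, and a segment averaged periodogram, scaled so that white noise would give 1, with S(f). The function names are hypothetical.

```python
import numpy as np


def red_noise_spectrum(f, rho):
    """Normalized AR(1) spectrum S(f) = (1 - rho^2) / (1 + rho^2 - 2 rho cos 2 pi f);
    it equals 1 at every f for white noise (rho = 0)."""
    return (1.0 - rho**2) / (1.0 + rho**2 - 2.0 * rho * np.cos(2.0 * np.pi * f))


def simulate_ar1(n, rho, rng):
    """y(t) = rho y(t-1) + eps(t), eps ~ N(0, 1), started from the stationary state."""
    eps = rng.normal(size=n)
    y = np.empty(n)
    y[0] = eps[0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + eps[t]
    return y


rho = 0.5
rng = np.random.default_rng(6)
y = simulate_ar1(2**16, rho, rng)
var_theory = 1.0 / (1.0 - rho**2)           # sigma_eps^2 / (1 - rho^2)

# Segment-averaged periodogram: 64 segments of 1024 points, Nyquist dropped.
L, n_seg = 64, 2**16 // 64
segs = y.reshape(L, n_seg)
Y = np.fft.rfft(segs - segs.mean(axis=1, keepdims=True), axis=1)[:, 1:n_seg // 2]
P = np.mean(2.0 * np.abs(Y) ** 2 / n_seg**2, axis=0)
f = np.arange(1, n_seg // 2) / n_seg
# E[P(n)] is about (2 / n_seg) * Variance(y) * S(f), so this ratio should be near 1.
ratio = P * (n_seg / 2) / var_theory / red_noise_spectrum(f, rho)
print(y.var(), var_theory, ratio.mean())
```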
6. Variance of a time-average
For a given time series y(t), t = 1, 2, ..., with zero mean and variance $\sigma^2$, what is the variance ($\sigma_M^2$) of averages of M values?
$$\sigma_M^2 = E\left[\left(\frac{y(t) + \cdots + y(t+M-1)}{M}\right)^2\right] = \frac{\sigma^2}{M}\left[1 + 2\left(1-\frac{1}{M}\right)\rho(1) + 2\left(1-\frac{2}{M}\right)\rho(2) + \cdots + \frac{2}{M}\,\rho(M-1)\right] = \frac{\sigma^2}{M}\,T_o$$
where $\rho(\tau)$ is the lag-$\tau$ autocorrelation of y(t), and
$T_o$ : characteristic time between independent estimates, "variance inflation factor"
$M/T_o$ : effective sample size
Fig. 7 shows the relationship between the length of the time average (T) and the variance inflation factor ($T_o$) for different lag-one autocorrelations $\rho$ in a first-order autoregressive process. When T is small and $\rho$ is large, $T_o$ increases rapidly with T. But as T becomes large, $T_o$ changes little.
For the details, the reader may refer to Madden (1979).
Fig.7.
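The variance inflation factor can be computed directly from the lag correlations. The sketch below assumes NumPy and a first-order autoregressive process with $\rho = 0.5$, for which $\rho(\tau) = \rho^\tau$; it evaluates $T_o$ for M = 10 and compares $\sigma^2 T_o/M$ with the variance of non-overlapping 10-point means of a simulated series. The function names and parameter values are illustrative. For large M, $T_o$ for this process approaches (1+ρ)/(1-ρ), which is why $T_o$ levels off as the averaging length grows.

```python
import numpy as np


def inflation_factor(rho_lag, M):
    """T_o = 1 + 2 sum_{tau=1}^{M-1} (1 - tau/M) rho(tau), where rho_lag(tau)
    returns the lag-tau autocorrelation."""
    tau = np.arange(1, M)
    return 1.0 + 2.0 * np.sum((1.0 - tau / M) * rho_lag(tau))


rho = 0.5


def ar1_corr(tau):
    """Lag correlation rho^tau of a first-order autoregressive process."""
    return rho ** tau


M = 10
To = inflation_factor(ar1_corr, M)
print(To, M / To)                            # inflation factor, effective sample size

# Monte Carlo check: variance of non-overlapping M-point means of AR(1) data.
rng = np.random.default_rng(7)
n = 200_000
eps = rng.normal(size=n)
y = np.empty(n)
y[0] = eps[0] / np.sqrt(1.0 - rho**2)
for t in range(1, n):
    y[t] = rho * y[t - 1] + eps[t]
means = y.reshape(-1, M).mean(axis=1)
sigma2 = 1.0 / (1.0 - rho**2)                # variance of y for unit-variance eps
print(means.var(), sigma2 * To / M)
```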
D. References
1. Significance Levels for Coherence Squared
Julian, P. R., 1975: J. Atmos. Sci., 32, 836-837, and references therein.
2. Confidence Limits for Phase Angles
Jenkins, G. M., and D. G. Watts, 1969: Spectral Analysis and Its Applications. Holden-Day, San Francisco, 525 pp.
3. Frequency and Segment Averaging
Bendat, J. S., and A. G. Piersol, 1971: Random Data: Analysis and Measurement Procedures. Wiley-Interscience, New York, 407 pp.
4. Degrees of Freedom for Spectral Estimates
Blackman, R. B., and J. W. Tukey, 1958: The Measurement of Power Spectra. Dover, New York, 190 pp.
5. Variance of Time Average
Madden, 1979: J. Appl. Meteor., 18, 703.