PAGE 1
FIRST AN ADVERTISEMENT
Neil Grossbard (x6356), proficient in numerical analysis and computer programming.
Thanks to the efforts of Jack Jasperse, Bamandas Basu, Paul Rothwel and many other scientists, my work has involved solving differential equations, eigenvalue and eigenvector problems, and integrals (sometimes multi-dimensional); PIC codes; wavelets; and symbolic manipulation in Fortran that generates Fortran code to quickly solve linear equations and take derivatives of the solution.
I am an (expert?) at estimating PSDs (power spectral densities), and I am available to help with most numerical or simulation problems.
SEE "Power spectral artifacts in published balloon data and implications regarding saturated gravity wave theories"
J.G.R., vol. 105, no. D4, pp. 4667-4683, Feb. 27, 2000
BY Dewan and Grossbard
PAGE 2
MONTE CARLO METHODS, JACKKNIFE AND
BOOTSTRAP TO FIND ERROR ESTIMATES,
IN PARTICULAR FOR VARIOUS METHODS
OF ESTIMATING PSDs
(POWER SPECTRAL DENSITIES)
BY NEIL GROSSBARD
PAGE 3
I AM NEIL GROSSBARD.
IN THE PREVIOUS CENTURY I RECEIVED THE
FOLLOWING SCIENCE DEGREES:
BACHELOR'S IN PHYSICS (M.I.T.)
AND 3 MASTER'S (NORTHEASTERN) IN
PHYSICS, MATHEMATICS AND ELECTRICAL
ENGINEERING.
RECENTLY MY HOBBIES ARE STUDYING
NUMERICAL METHODS, ESPECIALLY SIGNAL
PROCESSING, AND MACRO-ECONOMICS.
DOES ANYONE WISH TO ARGUE, TALK OR E-MAIL
ABOUT ANY OF THESE TOPICS?
I THINK THE MOST USEFUL PART OF THIS
PRESENTATION IS THE EXPLANATION
OF MONTE CARLO, JACKKNIFE
AND BOOTSTRAP METHODS FOR ERROR
ANALYSIS.
PAGE 4
Consider any data used to estimate some parameter(s). How can you use Monte Carlo to assign errors to the results?
THE PLAN
First, a simple Monte Carlo problem with no correlation.
Second, how to estimate the PSD (the Fourier transform of the autocorrelation).
Then, how to do Monte Carlo with correlated data.
Finally, I will suggest using autoregressive methods (Burg) to estimate the PSD.
PAGE 5
What Is the Monte Carlo Method?
From "A Practical Guide to Monte Carlo Simulation" by Jon Wittwer (found using Google):
"A Monte Carlo method is a technique that involves using random numbers and probability to solve problems"
I expect you would usually use Monte Carlo when you calculate values from experimental data with a possibly complicated procedure and wish to assign errors to the final results.
In Monte Carlo it is hoped that you have an estimate of the errors in the experimental data. You add pseudo-random errors of that size to the actual data, process the modified data to get a new result, repeat this procedure many times, and keep track of each result. From the collection of results you can then estimate the error statistics (histograms, standard deviations, confidence regions, etc.).
I will now show a simple example.
The problem is to find the straight line that best fits a set of data points whose individual accuracies (standard deviations) are known.
This is a weighted least squares problem.
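A minimal sketch of the Monte Carlo procedure above for the weighted straight-line fit, in Python with NumPy. The data points, their standard deviations and the function names are hypothetical, and the errors are assumed Gaussian with the stated standard deviations. For a linear fit the analytic covariance $(A^T W A)^{-1}$ is available, so it plays the role of the "ACTUAL" values that the Monte Carlo variances are checked against.

```python
import numpy as np


def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = c + s*x.

    Returns the parameters (c, s) and their analytic covariance matrix.
    """
    w = 1.0 / sigma**2                                  # weights = 1 / variance
    design = np.column_stack([np.ones_like(x), x])      # columns: constant, slope
    normal = design.T @ (w[:, None] * design)           # A^T W A
    cov = np.linalg.inv(normal)                         # (A^T W A)^-1
    params = cov @ (design.T @ (w * y))                 # (A^T W A)^-1 A^T W y
    return params, cov


def monte_carlo_line(x, y, sigma, n_samples=10000, seed=0):
    """Add pseudo-random Gaussian errors to y, refit, and collect statistics."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_samples, 2))
    for i in range(n_samples):
        y_noisy = y + rng.normal(0.0, sigma)            # one error per data point
        fits[i], _ = weighted_line_fit(x, y_noisy, sigma)
    return fits.mean(axis=0), fits.var(axis=0, ddof=1)


# Hypothetical data: five points with known (assumed) standard deviations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = np.array([0.05, 0.05, 0.1, 0.1, 0.2])
y = np.array([1.83, 2.86, 3.80, 4.85, 5.79])

params, cov = weighted_line_fit(x, y, sigma)
mc_mean, mc_var = monte_carlo_line(x, y, sigma)
print("fit:", params, "analytic variances:", np.diag(cov))
print("Monte Carlo mean:", mc_mean, "Monte Carlo variances:", mc_var)
```

For a nonlinear or more complicated procedure only `weighted_line_fit` would change; the Monte Carlo loop stays the same.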
PAGE 6
[Figure: straight-line fit, NUMBER OF MONTE CARLO SAMPLES = 10000. Panels: STRAIGHT LINE with a 1 STANDARD DEVIATION band, and ESTIMATED VALUES versus STANDARD DEVIATIONS for the constant and for the slope.]
CONSTANT = 1.8336 (ACTUAL 1.8333), VARIANCE = 0.0017 (ACTUAL 0.0017)
SLOPE = 0.9994 (ACTUAL 1.0000), VARIANCE = 0.0020 (ACTUAL 0.0020)
PAGE 7
[Figure: straight-line fit, NUMBER OF MONTE CARLO SAMPLES = 10000. Panels: STRAIGHT LINE with a 1 STANDARD DEVIATION band, and ESTIMATED VALUES versus STANDARD DEVIATIONS for the constant and for the slope.]
AVERAGE CONSTANT = 1.8335 (ACTUAL 1.8333), AVERAGE VARIANCE = 0.0017 (ACTUAL 0.0017)
AVERAGE SLOPE = 0.9998 (ACTUAL 1.0000), AVERAGE VARIANCE = 0.0021 (ACTUAL 0.0020)
PAGE 8
The Jackknife, the Bootstrap and Other Resampling Plans
BRADLEY EFRON, Department of Statistics, Stanford University
CBMS-NSF REGIONAL CONFERENCE SERIES IN APPLIED MATHEMATICS
PAGE 9
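The following is a fancy way of writing the jackknife estimate of the standard deviation (obvious?):
$$\hat\sigma_{\mathrm{JACK}} = \left[\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^2\right]^{1/2}, \qquad \hat\theta_{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\theta_{(i)},$$
where $\hat\theta_{(i)}$ is the estimate computed with the $i$-th of the $n$ data points deleted.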
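A short sketch of the jackknife formula in Python with NumPy; the function name and the sample data are hypothetical. `estimator` is any function that maps a data array to a number. For the sample mean the jackknife reproduces the classical standard error $s/\sqrt{n}$ exactly, which makes a convenient check.

```python
import numpy as np


def jackknife_sd(data, estimator):
    """Jackknife standard deviation of estimator(data).

    theta_(i) is the estimate with the i-th observation deleted and
    theta_(.) is the average of the n leave-one-out estimates.
    """
    data = np.asarray(data)
    n = data.shape[0]
    theta_i = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    theta_dot = theta_i.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_dot) ** 2, axis=0))


rng = np.random.default_rng(1)
sample = rng.normal(10.0, 2.0, size=50)                    # hypothetical data
print(jackknife_sd(sample, np.mean))                       # jackknife standard error
print(np.std(sample, ddof=1) / np.sqrt(sample.size))       # classical formula, same value
print(jackknife_sd(sample, lambda d: np.std(d, ddof=1)))   # no textbook formula needed
```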
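The following is a fancy way of writing the bootstrap estimate of the standard deviation (obvious?):
$$\widehat{\mathrm{SD}}_B = \left[\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*}_{b} - \hat\theta^{*}_{\cdot}\right)^2\right]^{1/2}, \qquad \hat\theta^{*}_{\cdot} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{b},$$
where $\hat\theta^{*}_{b}$ is the estimate computed from the $b$-th of $B$ bootstrap samples (the data resampled with replacement).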
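A matching sketch of the bootstrap formula in Python with NumPy; the function name, the number of bootstrap samples and the data are hypothetical choices. Each bootstrap sample draws $n$ observations with replacement, and `ddof=1` supplies the $1/(B-1)$ factor.

```python
import numpy as np


def bootstrap_sd(data, estimator, n_boot=2000, seed=0):
    """Bootstrap standard deviation of estimator(data).

    Returns [1/(B-1) * sum_b (theta*_b - theta*_.)^2]^(1/2) with B = n_boot.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.array([estimator(data[rng.integers(0, n, size=n)])   # resample with replacement
                     for _ in range(n_boot)])
    return reps.std(axis=0, ddof=1)                                # ddof=1 gives 1/(B-1)


rng = np.random.default_rng(2)
sample = rng.normal(10.0, 2.0, size=200)                 # hypothetical data
print(bootstrap_sd(sample, np.mean))
print(np.std(sample, ddof=1) / np.sqrt(sample.size))     # classical value, should be close
```

Unlike the jackknife, the bootstrap result varies slightly with the random seed; increasing `n_boot` reduces that variation.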
DID NOT KNOW WHERE TO PUT THIS:
Also note that you can compute the discrete Fourier transform for any number of values (not just 2^N) using the chirp z-transform algorithm, found in Digital Signal Processing by Oppenheim and Schafer, p. 321.
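A sketch of the chirp z-transform idea restricted to the unit circle (Bluestein's formulation), in Python with NumPy. It rewrites the N-point DFT as a convolution using $nk = [n^2 + k^2 - (k-n)^2]/2$ and evaluates that convolution with power-of-two FFTs. The function name is hypothetical; NumPy's own `np.fft.fft` already accepts any length and is used here only as a check.

```python
import numpy as np


def chirp_z_dft(x):
    """DFT of arbitrary length N via Bluestein's chirp-z method."""
    x = np.asarray(x, dtype=complex)
    n = x.size
    k = np.arange(n)
    # chirp w_k = exp(-i*pi*k^2/N); reducing k^2 mod 2N keeps the phase small
    chirp = np.exp(-1j * np.pi * ((k * k) % (2 * n)) / n)
    m = 1 << (2 * n - 2).bit_length()                   # power of two >= 2N - 1
    a = np.zeros(m, dtype=complex)
    a[:n] = x * chirp                                   # pre-multiply by the chirp
    b = np.zeros(m, dtype=complex)
    b[:n] = np.conj(chirp)                              # b[j] = exp(+i*pi*j^2/N), j >= 0
    b[m - n + 1:] = np.conj(chirp[1:][::-1])            # b[-j] = b[j], wrapped around
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))   # circular = linear since m >= 2N-1
    return chirp * conv[:n]                             # post-multiply by the chirp


x = np.cos(0.3 * np.arange(1000))                       # 1000 is not a power of 2
print(np.allclose(chirp_z_dft(x), np.fft.fft(x)))
```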
PAGE 10
[Figure: straight-line fit, NUMBER OF SAMPLES FOR DRAW = 20000, EACH DRAW = 100, NUMBER OF MONTE CARLO SAMPLES = 10000. Panels: STRAIGHT LINE with a 1 STANDARD DEVIATION band, and ESTIMATED VALUES versus STANDARD DEVIATIONS for the constant and for the slope.]
AVERAGE CONSTANT = 1.8341 (ACTUAL 1.8333), AVERAGE VARIANCE = 0.0017 (ACTUAL 0.0017)
AVERAGE SLOPE = 1.0014 (ACTUAL 1.0000), AVERAGE VARIANCE = 0.0021 (ACTUAL 0.0020)
PAGE 11
RESULTS OF LINE FIT
                                 CONSTANT      SLOPE
SIMULATED                        1.8333E+00    1.0000E+00
USING MONTE CARLO                1.8336E+00    9.9940E-01
USING SEPARATE MONTE CARLO       1.8335E+00    9.9982E-01
USING BOOTSTRAP MONTE CARLO      1.8341E+00    1.0014E+00
To understand classical PSD estimation we need the convolution theorems.
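CONTINUOUS CONVOLUTION INTEGRAL
$$g(\omega) = \int_{-\infty}^{\infty} h(\omega - \tau)\, f(\tau)\, d\tau$$
DISCRETE CONVOLUTION SUMMATION
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k] = x[n] * h[n]$$
CONVOLUTION IN THE FREQUENCY DOMAIN IS MULTIPLICATION IN THE TIME DOMAIN.
Consider some $h[n]$ functions (tapers): multiplying the data by a taper convolves the spectrum of the data with the spectrum of the taper.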
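A small numerical check of the statement above, in Python with NumPy. It uses the discrete, circular form of the theorem (the DFT rather than the continuous transform): the DFT of the tapered data equals $(1/N)$ times the circular convolution of the two DFTs. The function name and the choice of a Hann taper are illustrative assumptions.

```python
import numpy as np


def tapered_dft_two_ways(x, taper):
    """Return DFT(x * taper) computed directly and via frequency-domain convolution."""
    n = len(x)
    direct = np.fft.fft(x * taper)                      # multiply in time, then transform
    X, H = np.fft.fft(x), np.fft.fft(taper)
    # circular convolution of the spectra: (1/N) * sum_m X[m] * H[(k - m) mod N]
    conv = np.array([np.sum(X * H[(k - np.arange(n)) % n]) for k in range(n)]) / n
    return direct, conv


rng = np.random.default_rng(0)
x = rng.normal(size=64)                                 # hypothetical data
direct, conv = tapered_dft_two_ways(x, np.hanning(64))
print(np.allclose(direct, conv))
```

This is why the shape of a taper's FFT (its main lobe width and sidelobe level) controls the leakage in a tapered PSD estimate.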
PAGE 12
[Figure: example TAPERs and their EXPANDED SCALE FFTs.]
PAGE 13
[Figure FROM MULTITAPER SUBROUTINE, TIME-BANDWIDTH PRODUCT = 2: two TAPERs and their EXPANDED SCALE FFTs. First taper: THETA = 5.72E-05, LAMBDA = 1.00E+00. Second taper: THETA = 2.44E-03, LAMBDA = 9.98E-01.]
PAGE 14
Now, a fundamental problem when we use the periodogram to estimate the PSD.
From Digital Signal Processing by Rabiner and Gold:
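For white Gaussian data, the variance of the periodogram $I_N(\omega)$ is
$$\mathrm{Var}\left[I_N(\omega)\right] = \sigma^4\left[1 + \left(\frac{\sin(\omega N)}{N\sin\omega}\right)^2\right],$$
where $\sigma$ is the standard deviation of the data, N is the number of points and $\omega$ is the frequency.
The bracket never drops below 1, so the variance does not shrink as N grows: taking more points does not make the raw periodogram more accurate.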
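A quick Monte Carlo check of this formula in Python with NumPy; the function name, the number of trials and $\sigma = 2$ are hypothetical choices. At the interior Fourier frequencies $\omega_k = 2\pi k/N$ the sine term vanishes, so the predicted variance is exactly $\sigma^4$ for every N.

```python
import numpy as np


def periodogram_stats(n, sigma=1.0, trials=2000, seed=0):
    """Mean and variance of the raw periodogram of white Gaussian noise.

    Uses I_N(omega_k) = |DFT|^2 / N at the interior Fourier frequencies
    omega_k = 2*pi*k/N with 0 < k < N/2, where sin(omega*N) = 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=(trials, n))
    per = np.abs(np.fft.rfft(x, axis=1)) ** 2 / n
    interior = per[:, 1:n // 2]                       # drop omega = 0 and the Nyquist bin
    return interior.mean(), interior.var(axis=0).mean()


for n in (64, 256, 1024):
    mean, var = periodogram_stats(n, sigma=2.0)
    print(n, mean, var)   # should stay near sigma^2 = 4 and sigma^4 = 16 for every n
```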
All classical methods use some averaging technique
(explicitly or implicitly) to cut this variation.
All the methods I will mention assume the data is colored Gaussian.
This is generally a good assumption due to the central limit theorem.
When it is not true, the simplest technique is to approximate the distribution as a sum of Gaussians.
I had heard of this before, but I was reminded of it by
Richard Hegbloom (thank you).
PAGE 15
Some methods of classical PSD estimation.
All of these estimates attempt to cut the variance of the PSD estimate at the expense of coarser frequency resolution (a wider resolution bandwidth).
Original Blackman-Tukey
Calculate the biased estimate of the autocorrelation function,
$$\hat R(\tau) = \frac{1}{n}\sum_{i=1}^{n-|\tau|} x_i\, x_{i+|\tau|},$$
where n is the number of data points. Use all possible i values; $\tau$ varies between $-N$ and $+N$ (a number $< n$).
Apply a Bartlett (triangular) window. The triangle goes to 0 long before the end of the possible autocorrelation estimates; Tukey suggests something like 1/10 of the data length.
Apply a cosine transform to the windowed auto-correlation.
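A sketch of the Blackman-Tukey steps in Python with NumPy. The function name, the frequency grid and the sampling interval of 1 (frequencies in cycles per sample) are assumptions for illustration. Because the biased autocorrelation and the Bartlett lag window both have nonnegative transforms, the resulting estimate is never negative.

```python
import numpy as np


def blackman_tukey_psd(x, max_lag, nfreq=256):
    """Blackman-Tukey PSD: biased autocorrelation, Bartlett lag window, cosine transform.

    Frequencies are in cycles per sample (0 to 0.5); the sampling interval is taken as 1.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    lags = np.arange(max_lag + 1)
    # biased estimate: divide by n for every lag, not by n - tau
    r = np.array([np.dot(x[:n - t], x[t:]) / n for t in lags])
    # Bartlett (triangular) window that reaches 0 just past max_lag
    window = 1.0 - lags / (max_lag + 1.0)
    rw = r * window
    freqs = np.linspace(0.0, 0.5, nfreq)
    # cosine transform; r(-tau) = r(tau) doubles the tau > 0 terms
    cosines = np.cos(2.0 * np.pi * np.outer(lags[1:], freqs))
    psd = rw[0] + 2.0 * rw[1:] @ cosines
    return freqs, psd


rng = np.random.default_rng(0)
noise = rng.normal(size=10000)                         # white, unit variance
freqs, psd = blackman_tukey_psd(noise, max_lag=1000)   # about 1/10 of the data length
print(psd.mean())                                      # should be near 1
```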
Welch’s Method
Divide the data into segments, which usually overlap.
For each segment apply your favorite window.
Find the magnitude squared of the discrete Fourier Transform for each windowed segment.
Average the separate PSD estimates.
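A sketch of Welch's method in Python with NumPy. The segment length, 50% overlap and Hann window are assumed defaults (any favorite window works), and the function name is hypothetical. Dividing by the window energy makes white noise of variance $s^2$ come out at a flat level of about $s^2$ (two-sided, frequencies in cycles per sample).

```python
import numpy as np


def welch_psd(x, seg_len=256, overlap=0.5):
    """Welch PSD: overlapping Hann-windowed segments, averaged |DFT|^2."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    step = max(1, int(seg_len * (1.0 - overlap)))
    window = np.hanning(seg_len)                        # the "favorite window" here
    scale = np.sum(window ** 2)                         # compensates for the window energy
    starts = range(0, x.size - seg_len + 1, step)
    spectra = [np.abs(np.fft.rfft(window * x[s:s + seg_len])) ** 2 / scale
               for s in starts]
    return np.fft.rfftfreq(seg_len), np.mean(spectra, axis=0)


rng = np.random.default_rng(0)
freqs, psd = welch_psd(rng.normal(size=20000))          # hypothetical white noise
print(psd[1:-1].mean())                                 # should be near 1
```

Shorter segments mean more averaging (lower variance) but coarser frequency resolution, which is the trade-off described at the start of this page.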
Welch's method is currently the favored one, even though the next method blows it out of the water.
PAGE 16
The Multitaper method of PSD estimation.
SEE THE FOLLOWING:
Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques
By Donald B. Percival and Andrew T. Walden
The general idea of multitaper is to get PSD estimates from a set of windows (tapers), each separately applied to the initial data, and then average them.
In particular, Thomson developed a set of multiple orthogonal tapers:
discrete prolate spheroidal sequence (DPSS) data tapers.
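A sketch of the multitaper idea in Python with NumPy, not the multitaper subroutine used for the figures. The DPSS tapers are computed from the standard symmetric tridiagonal matrix that shares its eigenvectors with the concentration problem, with half-bandwidth $W = NW/N$. The estimate is a simple equal-weight average (Thomson's adaptive weighting is omitted), and the function names, $NW = 2$ and $K = 3$ are assumptions.

```python
import numpy as np


def dpss_tapers(n, nw, k):
    """First k discrete prolate spheroidal sequences of length n (rows, unit energy).

    Eigenvectors of the tridiagonal matrix with the largest eigenvalues are the
    best-concentrated tapers for the half-bandwidth W = nw / n.
    """
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2.0 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                       # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T


def multitaper_psd(x, nw=2.0, k=3):
    """Equal-weight average of the k single-taper PSD estimates."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    tapers = dpss_tapers(x.size, nw, k)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return np.fft.rfftfreq(x.size), spectra.mean(axis=0)


rng = np.random.default_rng(0)
freqs, psd = multitaper_psd(rng.normal(size=1024))      # hypothetical white noise
print(psd[1:-1].mean())                                 # should be near 1
```

A common choice is $K = 2NW - 1$ tapers, since beyond that the concentration (LAMBDA) of the higher-order tapers drops quickly.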
A nice feature of this method and Burg’s method is that in principle they can be used when data values are missing.
Interpolating the missing data values for Burg’s method is usually not necessary.
[Figure: THE PSD TO BE ESTIMATED, versus WAVENUMBER (RAD/METER). File RUNSIM, run May 14 12, KZ* = 2.00E-02 RAD/SEC. NUMBER OF POINTS USED IN SIM = 4096, NUMBER OF POINTS USED IN ANALYSIS = 256, R.N. SEED = -12345, AVERAGE OF 1000 SCENES; slope from a line fit between frequencies 4.00E-03 and 4.00E-02.]
[Figure: THE MONTE CARLO RESULT, NO SPECIAL TAPER, NO PREWHITENING; PSD versus WAVENUMBER (RAD/METER). File RUNSIM, 4096 points simulated, 256 analyzed, seed -12345, average of 1000 scenes, slope fitted between frequencies 4.00E-03 and 4.00E-02.]
AVERAGE SLOPE = -2.76E+00; ESTIMATED STANDARD DEVIATION: AVERAGE = 1.05E-02, EACH = 3.31E-01
[Figure: THE MONTE CARLO RESULT AFTER USING MULTITAPER WINDOW (MULTITAPER TAPER, NO PREWHITENING); PSD versus WAVENUMBER (RAD/METER). File RUNSIM1, 4096 points simulated, 256 analyzed, seed -12345, average of 1000 scenes, slope fitted between frequencies 4.00E-03 and 4.00E-02.]
AVERAGE SLOPE = -3.01E+00; ESTIMATED STANDARD DEVIATION: AVERAGE = 1.04E-02, EACH = 3.30E-01
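A sketch of the kind of Monte Carlo check behind these figures, in Python with NumPy. Assumptions that are not from the talk: the true PSD is a pure power law with slope -3, each scene is made by spectral synthesis (random Gaussian Fourier coefficients), frequencies are in cycles per sample with a hypothetical fitting band of 0.01 to 0.1, and a Hann taper stands in for the multitaper window. The "AVERAGE" standard deviation is the per-scene ("EACH") value divided by the square root of the number of scenes, which is how 1.05E-02 relates to 3.31E-01 for 1000 scenes.

```python
import numpy as np


def power_law_series(n, slope, rng):
    """Gaussian series whose PSD goes as f**slope (spectral synthesis)."""
    f = np.fft.rfftfreq(n)                              # cycles per sample
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (slope / 2.0)                    # amplitude ~ sqrt(PSD); zero mean
    coef = amp * (rng.normal(size=f.size) + 1j * rng.normal(size=f.size))
    return np.fft.irfft(coef, n)


def fitted_slope(x, fmin, fmax, window=None):
    """Least-squares slope of log10(periodogram) versus log10(f) within [fmin, fmax]."""
    x = x - x.mean()
    if window is not None:
        x = x * window
    per = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(x.size)
    band = (f >= fmin) & (f <= fmax)
    return np.polyfit(np.log10(f[band]), np.log10(per[band]), 1)[0]


def monte_carlo_slope(true_slope=-3.0, n_sim=4096, n_use=256, scenes=1000,
                      fmin=0.01, fmax=0.1, window=None, seed=12345):
    """Mean slope, per-scene standard deviation, and standard deviation of the mean."""
    rng = np.random.default_rng(seed)
    slopes = np.array([
        fitted_slope(power_law_series(n_sim, true_slope, rng)[:n_use], fmin, fmax, window)
        for _ in range(scenes)])
    each = slopes.std(ddof=1)
    return slopes.mean(), each, each / np.sqrt(scenes)


print("no taper:  ", monte_carlo_slope())
print("Hann taper:", monte_carlo_slope(window=np.hanning(256)))
```

With a steep spectrum, leakage through the sidelobes of the untapered (rectangular) window flattens the fitted slope; the Monte Carlo run exposes that bias even though each individual estimate looks reasonable.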
PAGE 20
NOW CONSIDER AUTOREGRESSIVE METHODS FOR PSD ESTIMATION.
Maximum Entropy Spectral Analysis and Autoregressive Decomposition
Tad J. Ulrych and Thomas N. Bishop
Rev. Geophysics and Space Phys., vol. 13, pp. 183-200, Feb. 1975
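Linear filter (predictor) model:
$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_p x_{t-p} + a_t,$$
where $a_t$ is white and Gaussian.
Or, ignoring $a_t$ and letting $Z^{-1}$ represent a delay of one time interval, this can be written as:
$$Z^0 = \alpha_1 Z^{-1} + \alpha_2 Z^{-2} + \cdots + \alpha_p Z^{-p}.$$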
This can be seen as a polynomial in Z, and the roots of this polynomial correspond to the resonance frequencies of this filter.
Note: a sinusoid would give two roots that are complex conjugates of each other with magnitude 1.
If the magnitude is slightly different from 1, the PSD estimate has a Lorentzian-shaped peak.
Burg’s algorithm finds the α’s as the least-squares solution that minimizes the average of the forward and backward prediction errors.
Further, Burg assumes you can use the Levinson-Durbin procedure for solving a Toeplitz system.
Burg’s algorithm is easily modified to handle most missing value situations.
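A compact sketch of Burg's recursion in Python with NumPy; the function names are hypothetical. It follows the sign convention of the predictor model above ($x_t \approx \sum_i \alpha_i x_{t-i}$), estimates each reflection coefficient from the summed forward and backward error energies, and updates the coefficients with the Levinson step. Missing-value handling is not included, and the PSD assumes a sampling interval of 1 (frequency in cycles per sample).

```python
import numpy as np


def burg(x, order):
    """Burg estimate of the predictor coefficients alpha_1..alpha_M and error power."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    f = x[1:].copy()                # forward prediction errors
    b = x[:-1].copy()               # backward prediction errors, delayed one step
    a = np.array([1.0])             # prediction-error filter 1 + a_1 z^-1 + ...
    err = np.dot(x, x) / x.size
    for _ in range(order):
        # reflection coefficient minimizing forward + backward error energy
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        a = np.append(a, 0.0)
        a = a + k * a[::-1]         # Levinson update of the filter
        err *= 1.0 - k * k
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return -a[1:], err              # alpha_i = -a_i


def ar_psd(alpha, err, nfft=1024):
    """S(f) = err / |1 - sum_i alpha_i exp(-j 2 pi f i)|^2, f in cycles per sample."""
    filt = np.concatenate([[1.0], -np.asarray(alpha)])
    return np.fft.rfftfreq(nfft), err / np.abs(np.fft.rfft(filt, nfft)) ** 2


# Hypothetical test series: AR(2) with alpha = (0.75, -0.5) and unit-variance noise.
rng = np.random.default_rng(3)
noise = rng.normal(size=20000)
series = np.zeros(noise.size)
for t in range(2, noise.size):
    series[t] = 0.75 * series[t - 1] - 0.5 * series[t - 2] + noise[t]
alpha, err = burg(series, 2)
freqs, psd = ar_psd(alpha, err)
print(alpha, err)
```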
PAGE 21
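The general form of the resulting equations is
$$\begin{bmatrix} \rho_1(0) & \rho_1(1) & \cdots & \rho_1(M-1) \\ \rho_2(1) & \rho_2(0) & \cdots & \rho_2(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ \rho_M(M-1) & \rho_M(M-2) & \cdots & \rho_M(0) \end{bmatrix} \begin{bmatrix} \alpha_{M1} \\ \alpha_{M2} \\ \vdots \\ \alpha_{MM} \end{bmatrix} = \begin{bmatrix} \rho(1) \\ \rho(2) \\ \vdots \\ \rho(M) \end{bmatrix}$$
Here $\rho_s(r)$ is the s-th estimate of the r-th lag of the autocorrelation function.
If $\rho_s$ is not a function of s, then the matrix is Toeplitz.
$\alpha_{Mr}$ is the r-th linear prediction coefficient of the M coefficients estimated.
$\rho(r)$ is another estimate of the r-th lag of the autocorrelation function.
Then the PSD (power spectral density) is equal to:
$$S(f) = \frac{\sigma^2}{\left|1 - \sum_{i=1}^{M} \alpha_{Mi}\, e^{-j2\pi f i \Delta t}\right|^2},$$
where $\sigma^2$ is the prediction-error variance and $\Delta t$ is the sampling interval.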
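For the Toeplitz case (one autocorrelation estimate $\rho(r)$ used in every row), the Levinson-Durbin recursion solves the system above in $O(M^2)$ operations instead of $O(M^3)$. A sketch in Python with NumPy; the function name and the damped-cosine autocorrelation used for the check are hypothetical.

```python
import numpy as np


def levinson_durbin(rho, order):
    """Solve sum_j rho(|i - j|) alpha_j = rho(i), i = 1..order.

    rho[0..order] are autocorrelation lags. Returns alpha_1..alpha_order and
    the prediction-error variance sigma^2.
    """
    rho = np.asarray(rho, dtype=float)
    alpha = np.zeros(0)
    err = rho[0]
    for m in range(1, order + 1):
        # reflection coefficient of stage m
        k = (rho[m] - np.dot(alpha, rho[m - 1:0:-1])) / err
        alpha = np.concatenate([alpha - k * alpha[::-1], [k]])
        err *= 1.0 - k * k
    return alpha, err


lags = np.arange(9)
rho = 0.8 ** lags * np.cos(0.5 * lags)                     # hypothetical autocorrelation
alpha, err = levinson_durbin(rho, 8)
toeplitz = rho[np.abs(np.subtract.outer(lags[:8], lags[:8]))]
print(np.allclose(alpha, np.linalg.solve(toeplitz, rho[1:9])))   # expected: True
```

Burg's method uses the same order-by-order update, but estimates each reflection coefficient directly from the data rather than from a precomputed autocorrelation.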
PAGE 22
$$S(f) = \frac{\sigma^2}{\left|1 - \sum_{i=1}^{M} \alpha_i\, e^{-j2\pi f i \Delta t}\right|^2}$$
Remember (here p = M):
$$Z^0 = \alpha_1 Z^{-1} + \alpha_2 Z^{-2} + \cdots + \alpha_p Z^{-p}$$
This can be seen as a polynomial in Z, and the roots of this polynomial correspond to the resonance frequencies of this filter.
The denominator, P(f), is simply the filter $1 - \sum_i \alpha_i Z^{-i}$ evaluated at $Z = e^{j2\pi f \Delta t}$, magnitude squared.
In general, Burg's method (and other autoregressive methods) used in estimating
PSDs are better at measuring power-law slopes and the positions of peaks in data than classical methods are.
WHY NOT BURG?
What is the value of M (the order)?
Too small is not accurate; too large can add fictitious peaks.
Sharp peaks are not necessarily Lorentzian.
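One common way to choose M is Akaike's final prediction error (FPE), which the next page plots against the number of coefficients. A sketch in Python with NumPy, assuming the usual form $\mathrm{FPE}(M) = \sigma_M^2 (N + M + 1)/(N - M - 1)$ for mean-removed data (the talk does not spell out the formula it used); the function names and the AR(2) test series are hypothetical.

```python
import numpy as np


def burg_error_powers(x, max_order):
    """Prediction-error power of Burg's method for orders 0..max_order."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    f, b = x[1:].copy(), x[:-1].copy()                  # forward / backward errors
    err = np.dot(x, x) / x.size
    powers = [err]
    for _ in range(max_order):
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        err *= 1.0 - k * k
        powers.append(err)
        f, b = (f + k * b)[1:], (b + k * f)[:-1]
    return np.array(powers)


def fpe_order(x, max_order):
    """Order minimizing FPE(M) = err_M * (N + M + 1) / (N - M - 1)."""
    n = len(x)
    orders = np.arange(max_order + 1)
    fpe = burg_error_powers(x, max_order) * (n + orders + 1) / (n - orders - 1)
    return int(np.argmin(fpe)), fpe


# Hypothetical AR(2) series; FPE should not pick an order below 2.
rng = np.random.default_rng(4)
noise = rng.normal(size=5000)
series = np.zeros(noise.size)
for t in range(2, noise.size):
    series[t] = 0.75 * series[t - 1] - 0.5 * series[t - 2] + noise[t]
order, fpe = fpe_order(series, 20)
print(order)
```

FPE tends to overestimate the order somewhat, which is one reason the Monte Carlo check of the final PSD is still worth doing.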
PAGE 23
[Figure: three panels plotted against NUMBER OF COEFFICIENTS (20 to 200); one panel's vertical axis runs from -2.50 to -3.50, and another is labeled FPE.]
[Figure: PSD estimate USING BURG 25 COEFFICIENTS, versus WAVENUMBER (RAD/METER). File RUNSIM2, 4096 points simulated, 256 analyzed, seed -12345, average of 1000 scenes, slope fitted between frequencies 4.00E-03 and 4.00E-02.]
AVERAGE SLOPE = -2.9997E+00; ESTIMATED STANDARD DEVIATION: AVERAGE = 6.21E-03, EACH = 1.96E-01
PAGE 25
                          SLOPE      STAN DEV
EXPECTED                  -3.00
FROM MONTE CARLO:
  RAW PERIODOGRAM         -2.76      .331
  1 TAPER MULTITAPER      -3.01      .330
  25 COEF BURG            -2.9997    .196
[Figure, file PSDSCINTFITL page 2, May21 12: DATA ON FILE scint_sample.txt, Taiwan 2006/04/07, 125.00 Hz, chan 1 2 3, 15:07:43. Time series versus SECONDS (0 to 66).]
[Figure, file PSDSCINTFITL page 3, same data: PSD versus FREQUENCY (HERTZ), NUMBER OF COEFFICIENTS = 6900, IPS = 6900.]
[Figure, file PSDSCINTFITL page 5, data scint_sample.txt: time series versus SECONDS; ORIGINAL RMS = 1.9343E-04, ORIGINAL # LOW FREQ SINUSOIDS = 600.]
[Figure, file PSDSCINTFITL page 6, data scint_sample.txt: time series versus SECONDS; ORIGINAL RMS = 1.9343E-04, ORIGINAL # LOW FREQ SINUSOIDS = 600.]
[Figure, file PSDSCINTFITL page 7, data scint_sample.txt: PSD versus FREQUENCY (HERTZ), NUMBER OF COEFFICIENTS = 6900, IPS = 6900.]
[Figure, file PSDSCINTFITL page 8, data scint_sample.txt: time series versus SECONDS; ORIGINAL RMS = 1.9217E-04, ORIGINAL # LOW FREQ SINUSOIDS = 600, FRACTIONAL STAN DEV OF LOWEST FREQ = 4.0807E-01.]
[Figure: RMS error versus SECONDS; AVERAGE RMS ERROR = 1.5135E-04.]
PAGE 31
Hopefully you now see how powerful Monte Carlo can be in checking the accuracy of any estimation procedure.
In particular, it overcomes some weaknesses found when using autoregressive methods to estimate PSDs.