PAGE 1 FIRST AN ADVERTISEMENT

advertisement

PAGE 1

FIRST AN ADVERTISEMENT

Neil Grossbard(x6356),proficient in numerical analysis and computer programming.

Thanks to the efforts of Jack Jasperse , Bamandas Basu ,

Paul Rothwel and many other scientists, my work has involved finding solutions to differential equations eigenvalue, eigenvector problems, integrals sometimes multi-dimensional, Pic codes,wavelets, and performed symbolic manipulation using Fortran to create Fortran code to quickly solve linear equations and take derivatives of the solution.

I am an (expert?) at estimating PSD’s.

(POWER SPECTRAL DENSITIES) and I am available to help in most numerical or simulation problems.

SEE "Power spectral artifacts in published balloon data and implications regarding saturated gravity wave theories"

J.G.R. VOL 105, NO. D4,PAGES 4667-4683 Feb 27,2000

BY Dewan and Grossbard

PAGE 2

MONTE CARLO METHODS,JACKKNIFE AND

BOOTSTRAP TO FIND ERROR ESTIMATES.

IN PARTICULAR FOR VARIOUS METHODS

OF ESTIMATING PSD’S.

(POWER SPECTRAL DENSITIES)

BY NEIL GROSSBARD

PAGE 3

I AM NEIL GROSSBARD.

IN THE PREVIOUS CENTURY I RECEIVED THE

FOLLOWING SCIENCE DEGREES:

BACHELORS IN PHYSICS (M.I.T.)

AND 3 MASTERS (NORTHEASTERN)

PHYSICS,MATHEMATICS AND ELECTRICAL

ENGINERING.

RECENTLY MY HOBBY IS STUDYING

NUMERICAL METHODS, ESPECIALLY SIGNAL

PROCESSING AND MACRO-ECONOMICS.

DOES ANYONE WISH TO ARGUE,TALK,E-MAIL

ABOUT ANY OF THESE TOPICS?

I THINK THE MOST USFUL PART OF THIS

PRESENTATION IS AN EXPLAINATION

OF MONTE CARLO,JACKKNIFE

AND BOOTSTRAP METHODS FOR ERROR

ANALYSIS.

PAGE 4

Consider any data which is used to estimate some parameter(s) how can you use Monte Carlo to assign errors to the results?

THE PLAN

First consider a simple Monte Carlo problem with no correlation.

Second how to estimate PSD (autocorrelation).

Then how to do Monte Carlo with correlated data.

Finally I will suggest the use of autoreggessive methods to estimate the PSD (BURG).

PAGE 5

What is the Monte-Carlo Method

Using Google from "A Practical Guide to Monte Carlo Simulation" by Jon Wittwer

"A Monte Carlo method is a technique that involves using random numbers and probability to solve problems"

I expect that usually you would use Monte Carlo when you use a possibly complicated procedure to calculate values from experimental data and wish to assign errors to the final results.

In Monte Carlo it is hoped that you have an estimate for the errors in the experimental data. You then can add an error to the actual data using psudo-random values and process the modified data to get a new result then repeat this proceedure many times and keep track of each result. You then can calculate estimates of the error statistics. (Histograms,standard deviation, confidence regions etc...)

I will now show a simple example.

The problem is to find the formula for a straight line which is estimated from a set of data points where the relative accuracy of each data points is known.

This is a weighted least squares problem.

PAGE 6

NUMBER OF MONTE CARLO SAMPLES=10000

CONSTANT= 1.8336

ACTUAL 1.8333

VARIANCE= 0.0017

ACTUAL 0.0017

0 1 2

STRAIGHT LINE

LINE 1 STAN DEV

3 4

-3

SLOPE=

ACTUAL

-2 -1 0 1

STANDARD DEVIATIONS

2

0.9994

1.0000

VARIANCE= 0.0020

ACTUAL 0.0020

3

0 1 2 3

ESTIMATED VALUES

4

-3 -2 -1 0 1

STANDARD DEVIATIONS

2 3

PAGE 7

NUMBER OF MONTE CARLO SAMPLES=10000

AVERAGE CONSTANT= 1.8335

ACTUAL 1.8333

AVERAGE VARIANCE= 0.0017

ACTUAL 0.0017

0 1 2

STRAIGHT LINE

LINE 1 STAN DEV

3 4

-3 -2 -1 0 1

STANDARD DEVIATIONS

2

AVERAGE SLOPE= 0.9998

ACTUAL 1.0000

3

AVERAGE VARIANCE=

ACTUAL

0.0021

0.0020

0 1 2 3

ESTIMATED VALUES

4

-3 -2 -1 0 1

STANDARD DEVIATIONS

2 3

PAGE 8

The Jackknife, the Bootstrap and

Other Resampling Plans

BRADLEY EFRON

Department of Statistics

Stanford University

CBMS-NSF

REGIONAL CONFERENCE SERIES

IN APPLIED MATHEMATICS

PAGE 9

The following is a fancy way of showing how to estimate the standard deviation using Jackknife (obvious?).

σ∧

JACK

=

( n-1 n n

∑ i=1

-

(i)

-x

(.)

) 2

) 1/2

The following is a fancy way of showing how to estimate the standard deviation using Bootstrap (obvious?).

SD=

{

1

_

B-1

B

∑ b=1

∗ b -

.

] 2 1/2

DID NOT KNOW WHERE TO PUT THIS

Also note you can get the Discrete Fourier Transform for any number of values (NO 2^N) using the Chirp z-

Transform Algorithm. Found in Digital Signal Processing by Openheim and Schafer p321.

PAGE 10

NUMBER OF SAMPLES FOR DRAW=20000 EACH DRAW=100

NUMBER OF MONTE CARLO SAMPLES=10000

AVERAGE CONSTANT= 1.8341

ACTUAL 1.8333

AVERAGE VARIANCE= 0.0017

ACTUAL 0.0017

0 1 2

STRAIGHT LINE

LINE 1 STAN DEV

3 4

-3 -2 -1 0 1

STANDARD DEVIATIONS

2

AVERAGE SLOPE= 1.0014

ACTUAL 1.0000

3

AVERAGE VARIANCE=

ACTUAL

0.0021

0.0020

0 1 2 3

ESTIMATED VALUES

4

-3 -2 -1 0 1

STANDARD DEVIATIONS

2 3

PAGE 11

RESULTS OF LINE FIT

SIMULATED

USING MONTE CARLO

USING SEPARATE MONTE CARLO

USING BOOT STRAP MONTE CARLO

CONSTANT SLOPE

1.8333E+00

To understand classical PSD estimation we need the convolution theorums.

1.0000E+00

1.8336E+00 9.9940E-01

1.8335E+00 9.9982E-01

1.8341E+00 1.0014E+00

CONTINUOUS CONVOLUTION INTEGRAL g(w)=

-

τ τ τ

h(w- )f( )d

τ τ τ

DISCRETE CONVOLUTION SUMMATION

]

-

] [ ] [ ] ∗ [ ]

CONVOLUTION IN FREQUENCY DOMAIN IS MULTIPLICATION IN TIME DOMAIN.

Consider some h n

] functions (tapers).

PAGE 12

TAPER

EXPANDED SCALE FFT

TAPER

EXPANDED SCALE FFT

EXPANDED SCALE FFT

PAGE 13

TIME-BANDWIDTH PRODUCT=2

THETA= 5.72E-05

LAMBDA= 1.00E+00

TAPER

THETA= 2.44E-03

LAMBDA= 9.98E-01

EXPANDED SCALE FFT

TAPER EXPANDED SCALE FFT

FROM MULTITAPER SUBROUTINE

PAGE 14

Now a fundimental problem when we use the Periodogram to estimate the PSD.

From Digital Signal Proceesing by Rabiner and Gold

The variance of the Periodogram= 4

{

1+

( is the standard deviation of the data.

sin

[ ω

N sin

N

]

) 2

}

Here N is the number of points and

ω is the frequency.

α

ω

All classical methods use some averaging technique

(explicitly or implicitly) to cut this variation.

All the methods I will mention assume the data is collored Gaussian.

This is generally a good assumption due to the central limit theorum.

When this is not true the simplest technique is to approximate the distribution as a sum of gaussians.

I had heard of this before but I was reminded by

Richard Hegbloom (thank you).

PAGE 15

Some methods of classical PSD estimation.

All of the estimates attempt to cut the variance of the PSD estimates at the expense of increasing the resolution frequency.

Original Blackman-Tukey

Calculate the biased estimate of the auto-correlation function.

n n is the number of data points.

τ

Use all possible i values varies between -AND+ N (a number <n).

Apply a Bartlett (Triangular) Window. The Triangle goes to 0 long before the end of the possible auto-correlation estimates.

Tukey suggests something like 1/10 of the data length.

Apply a cosine transform to the windowed auto-correlation.

Welch’s Method

Divide the data into usually overlapping segments.

For each segment apply your favorite window.

Find the magnitude squared of the discrete Fourier Transform for each windowed segment.

Average the separate PSD estimates.

Welch is recently the favored method, even though the next method blows it out of the water.

PAGE 16

The Multitaper method of PSD estimation.

SEE THE FOLLOWING

SPECTRAL ANALYSIS FOR PHYSICAL APPLICATIONS

Multitaper and Conventional Univariate Techniques

By Donald B. Percival and Andrew T. Walden

The general idea for Multitaper is to get PSD estimates from a set of windows each separately applied to the initial data.

In particular Thomson developed a set of multiple orthogonal tapers.

Discrete prolate spheroidal sequence data tapers.

A nice feature of this method and Burg’s method is that in principle they can be used when data values are missing.

Interpolating the missing data values for Burg’s method is usually not necessary.

THIS IS FILE RUNSIM RUN ON DATE May 14 12 KZ*= 2.00E-02 RAD/SEC ˝

NUMBER OF POINTS USED IN SIM=4096 NUMBER OF POINTS USED IN ANALYSIS=256 R.N. SEED=-12345 AVERAGE OF 1000 SCENES

SLOPE= FIT LINE BETWEEN FREQUENCIES 4.00E-03 AND 4.00E-02

106

105

104

103

102

101

100

10-1

10

-4

10

-3

10

-2

WAVENUMBER(RAD/METER)

10

-1

THE PSD TO BE ESTIMATED

105

104

103

106

THIS IS FILE RUNSIM RUN ON DATE May 14 12 KZ*= 2.00E-02 RAD/SEC ˝

NUMBER OF POINTS USED IN SIM=4096 NUMBER OF POINTS USED IN ANALYSIS=256 R.N. SEED=-12345 AVERAGE OF 1000 SCENES

SLOPE= FIT LINE BETWEEN FREQUENCIES 4.00E-03 AND 4.00E-02

AVERAGE SLOPE= -2.76E+00 ESTIMATED STANDARD DEVIATION

AVERAGE= 1.05E-02 EACH= 3.31E-01

THE MONTE CARLO RESULT

NO SPECIAL TAPER

NO PREWHITTENING

102

101

100

10-1

10

-4

10

-3

10

-2

WAVENUMBER(RAD/METER)

10

-1

106

THIS IS FILE RUNSIM1 RUN ON DATE May 14 12 KZ*= 2.00E-02 RAD/SEC ˝

NUMBER OF POINTS USED IN SIM=4096 NUMBER OF POINTS USED IN ANALYSIS=256 R.N. SEED=-12345 AVERAGE OF 1000 SCENES

SLOPE= FIT LINE BETWEEN FREQUENCIES 4.00E-03 AND 4.00E-02

AVERAGE SLOPE= -3.01E+00 ESTIMATED STANDARD DEVIATION=

AVERAGE= 1.04E-02 EACH= 3.30E-01

AFTER USING MULTITAPER WINDOW

105

104

103

102

101

100

10-1

10-2

10-3

10-4

10

-4

10

-3

10

-2

10

WAVENUMBER(RAD/METER)

-1

THE MONTE CARLO RESULT

MULTITAPER TAPER

NO PREWHITTENING

PAGE 20

NOW CONSIDER AUTOREGRESSIVE METHODS FOR PSD ESTIMATION.

Maximum Entropy Spectral Analysis and Autoregressive Decomposition

Tad J. Ulrych and Thomas N. Bishop

Rev. Geophysics and Space Phys., vol. 13,pp. 183-200 Feb, 1975

Linear Filter (Predictor) model

α α

Gaussian.

Or ignoring at and letting Z

-1 represent one time interval.

This can be written as:

Z

0 α α 2 α p

This can be seen as a polynomial in Z and the roots of this polynomial are residence frequencies of this filter.

Note: A sinusoid would have two roots that are complex conjugates of each other with magnitude 1.

Assuming the magnitude is slightly different from 1 this leads to a PSD estimate which is Lorenzian.

Burg’s algorithm finds

α

’s as the least square solution for the average in the forward and the backwards direction.

Further Burg assumes you can use Levinson’s and Durbin’s procedure for solving a Toeplitz matrix.

Burg’s algorithm is easily modified to handle most missing value situations.

PAGE 21

S(f)=

[

The general form of the resulting equations are.

ρ

ρ

ρ

1

(0)

2

(1)

.

.

.

M

(M-1)

ρ

ρ

ρ

1

(1)

2

(0)

.

.

.

...

...

M

(M-2) ...

ρ

ρ

ρ

1

2

M

(M-1)

(M-2)

.

.

.

(0)

] ] ]

α

.

MM

=

ρ

ρ

ρ

(1)

(2)

.

.

.

(M)

Here

ρ s(r) is the s th estimate of the rth lag of the autocorrelaion function.

If

ρ is not a function of s then the matrix is toeplitz.

α

ρ

Mr is the r th linear predictor of the M coefficients estimated.

(r) is another estimte of the rth lag of the autocorrelaion function.

Then the PSD (Power Spectral Density) is equal to:

1 -

2

σ

α

M

i=1

α π

2

PAGE 22

S(f)=

2

σ

1 -

M

i=1

α π

2

Remember

Z

0

= 1Z+ 2Z

2 α p

This can be seen as a polynomial in Z and the roots of this polynomial are residence frequencies of this filter.

The denominator or P(f) is simply the filter evaluated at frequency f and squared.

In general Burg (and other autoregressive methods) used in estimating

PSD’s are better at measuring power law slopes and positions of peaks in data than are classical methods.

WHY NOT BURG

What is the value of M (order)?

Too small not accurate,too large can add ficticious peaks.

Sharp peaks are not necessarily Lorenzian.

PAGE 23

-2.50

-3.00

-3.50

0

10-1

10-2

10-3

0

10-1

10-2

10-3

0

20 40 60 80 100 120 140 160 180 200

NUMBER OF COEFFICIENTS

FPE

20 40 60 80 100 120 140 160 180 200

NUMBER OF COEFFICIENTS

20 40 60 80 100 120 140 160 180 200

NUMBER OF COEFFICIENTS

105

104

103

106

THIS IS FILE RUNSIM2 RUN ON DATE May 14 12 KZ*= 2.00E-02 RAD/SEC ˝

NUMBER OF POINTS USED IN SIM=4096 NUMBER OF POINTS USED IN ANALYSIS=256 R.N. SEED=-12345 AVERAGE OF 1000 SCENES

SLOPE= FIT LINE BETWEEN FREQUENCIES 4.00E-03 AND 4.00E-02

AVERAGE SLOPE= -2.9997E+00 ESTIMATED STANDARD DEVIATION=

AVERAGE= 6.21E-03 EACH= 1.96E-01

USING BURG 25 COEFFICIENTS

102

101

100

10-1

10

-4

10

-3

10

-2

WAVENUMBER(RAD/METER)

10

-1

PAGE 25

EXPECTED

FROM MONTE CARLO

RAW PERIODOGRAM -2.76

1 TAPER MULTITAPER -3.01

25 COEF BURG

SLOPE

-3.00

STAN DEV

.331

.330

-2.9997

.196

THIS IS FILE PSDSCINTFITL PAGE NUMBER 2 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

0.006

0.004

0.002

0.000

-0.002

-0.004

-0.006

0 6 12 18 24 30

SECONDS

36 42 48 54 60 66

THIS IS FILE PSDSCINTFITL PAGE NUMBER 3 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

NUMBER OF COEFFICIENTS=6900 IPS=6900

10-4

10-5

10-6

10-7

10-8

10-9

10-10

10-11

10-12

10-13

10-14

10-4 10-3 10-2 10-1

FREQUENCY(HERTZ)

100 101 102

0.002

THIS IS FILE PSDSCINTFITL PAGE NUMBER 5 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

ORIGINAL RMS= 1.9343E-04

ORIGINAL # LOW FREQ SINUSOIDS=600

0.001

0.000

-0.001

-0.002

0 6 12 18 24 30

SECONDS

36 42 48 54 60 66

THIS IS FILE PSDSCINTFITL PAGE NUMBER 6 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

ORIGINAL RMS= 1.9343E-04

ORIGINAL # LOW FREQ SINUSOIDS=600

3.00E-03

2.50E-03

2.00E-03

1.50E-03

1.00E-03

5.00E-04

0.00E+00

-5.00E-04

-1.00E-03

-1.50E-03

-2.00E-03

0 6 12 18 24 30

SECONDS

36 42 48 54 60 66

THIS IS FILE PSDSCINTFITL PAGE NUMBER 7 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

NUMBER OF COEFFICIENTS=6900 IPS=6900

10-5

10-6

10-7

10-8

10-9

10-10

10-11

10-12

10-13

10-14

10-15

10-16

10-4 10-3 10-2 10-1

FREQUENCY(HERTZ)

100 101 102

0.003

THIS IS FILE PSDSCINTFITL PAGE NUMBER 8 May21 12 DATA ON FILE scint_sample.txt Taiwan 2006/04/07 125.00 Hz chan 1 2 3 15:07:43

ORIGINAL RMS= 1.9217E-04

ORIGINAL # LOW FREQ SINUSOIDS=600

FRACTIONAL STAN DEV OF LOWEST FREQ 4.0807E-01

0.002

0.001

0.000

-0.001

-0.002

-0.003

0 6 12 18 24 30

SECONDS

36 42 48 54 60 66

3.00E-04

2.00E-04

1.00E-04

0.00E+00

0

AVERAGE RMS ERROR= 1.5135E-04

6 12 18 24 30

SECONDS

36 42 48 54 60 66

PAGE 31

Hopefully you now see how powerful Monte Carlo can be in checking the accuracy in any estimation procedure.

In particular it overcomes some weaknesses found when using autoreggresive methods in estimating PSD’s.

PAGE 32

FIRST AN ADVERTISEMENT

Neil Grossbard(x6356),proficient in numerical analysis and computer programming.

Thanks to the efforts of Jack Jasperse , Bamandas Basu ,

Paul Rothwel and many other scientists, my work has involved finding solutions to differential equations eigenvalue, eigenvector problems, integrals sometimes multi-dimensional, Pic codes,wavelets, and performed symbolic manipulation using Fortran to create Fortran code to quickly solve linear equations and take derivatives of the solution.

I am an (expert?) at estimating PSD’s.

(POWER SPECTRAL DENSITIES) and I am available to help in most numerical or simulation problems.

SEE "Power spectral artifacts in published balloon data and implications regarding saturated gravity wave theories"

J.G.R. VOL 105, NO. D4,PAGES 4667-4683 Feb 27,2000

BY Dewan and Grossbard

Download