Environmental Data Analysis with MatLab
2nd Edition
Lecture 8:
Solving Generalized Least Squares Problems
SYLLABUS
Lecture 01: Using MatLab
Lecture 02: Looking At Data
Lecture 03: Probability and Measurement Error
Lecture 04: Multivariate Distributions
Lecture 05: Linear Models
Lecture 06: The Principle of Least Squares
Lecture 07: Prior Information
Lecture 08: Solving Generalized Least Squares Problems
Lecture 09: Fourier Series
Lecture 10: Complex Fourier Series
Lecture 11: Lessons Learned from the Fourier Transform
Lecture 12: Power Spectra
Lecture 13: Filter Theory
Lecture 14: Applications of Filters
Lecture 15: Factor Analysis
Lecture 16: Orthogonal Functions
Lecture 17: Covariance and Autocorrelation
Lecture 18: Cross-correlation
Lecture 19: Smoothing, Correlation and Spectra
Lecture 20: Coherence; Tapering and Spectral Analysis
Lecture 21: Interpolation
Lecture 22: Linear Approximations and Non Linear Least Squares
Lecture 23: Adaptable Approximations with Neural Networks
Lecture 24: Hypothesis Testing
Lecture 25: Hypothesis Testing continued; F-Tests
Lecture 26: Confidence Limits of Spectra, Bootstraps
Goals of the lecture
use prior information to solve exemplary problems
review of last lecture
failure-proof least squares:
add information to the problem that guarantees that matrices like [GᵀG] are never singular
such information is called prior information
examples of prior information
soil density will be around 1500 kg/m³, give or take 500 or so
chemical components sum to 100%
pollutant transport is subject to the diffusion equation
water in rivers always flows downhill
linear prior information, Hm = h, with covariance Ch
simplest example
model parameters near known values
m1 = 10 ± 5
m2 = 20 ± 5
m1 and m2
uncorrelated
Hm = h with
H=I
h = [10, 20]ᵀ

Ch = [ 5²   0 ]
     [ 0   5² ]
another example, relevant to chemical constituents that sum to 100%:
H = [1, 1, …, 1]  (a single row)
h = 100%
use a Normal p.d.f. to represent the prior information
this Normal p.d.f. defines an "error in prior information", with the individual errors weighted by their certainty
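In symbols, this is the usual Gaussian form implied by Hm ≈ h with covariance Ch (the normalization constant is omitted here):

$$ p(\mathbf{h}) \propto \exp\left[-\tfrac{1}{2}\,(\mathbf{H}\mathbf{m}-\mathbf{h})^{T}\,\mathbf{C}_h^{-1}\,(\mathbf{H}\mathbf{m}-\mathbf{h})\right] $$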
now suppose that we observe some data: d = dobs, with covariance Cd
represent the observations with a Normal p.d.f., p(d), whose mean is the data predicted by the model, Gm
this Normal p.d.f. defines an "error in data": the prediction error weighted by its certainty
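Likewise, in the usual Gaussian form (again omitting the normalization):

$$ p(\mathbf{d}) \propto \exp\left[-\tfrac{1}{2}\,(\mathbf{d}-\mathbf{G}\mathbf{m})^{T}\,\mathbf{C}_d^{-1}\,(\mathbf{d}-\mathbf{G}\mathbf{m})\right] $$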
Generalized Principle of Least Squares
the best mest is the one that
minimizes the total error with respect to m
justified by Bayes' Theorem in the last lecture
generalized least squares
solution
pattern same as ordinary least squares …
… but with more complicated matrices
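Written out, the total error and its minimizer take the standard generalized least squares form:

$$ E(\mathbf{m}) = (\mathbf{d}-\mathbf{G}\mathbf{m})^{T}\mathbf{C}_d^{-1}(\mathbf{d}-\mathbf{G}\mathbf{m}) + (\mathbf{H}\mathbf{m}-\mathbf{h})^{T}\mathbf{C}_h^{-1}(\mathbf{H}\mathbf{m}-\mathbf{h}) $$

$$ \mathbf{m}^{est} = \left[\mathbf{G}^{T}\mathbf{C}_d^{-1}\mathbf{G} + \mathbf{H}^{T}\mathbf{C}_h^{-1}\mathbf{H}\right]^{-1}\left[\mathbf{G}^{T}\mathbf{C}_d^{-1}\mathbf{d} + \mathbf{H}^{T}\mathbf{C}_h^{-1}\mathbf{h}\right] $$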
(new material)
How to use the Generalized Least Squares Equations
Generalized least squares is
equivalent to solving
Fm=f
by ordinary least squares
[ Cd⁻½ G ]        [ Cd⁻½ d ]
[ Ch⁻½ H ]  m  =  [ Ch⁻½ h ]
uncorrelated, uniform variance case
Cd = σd² I
Ch = σh² I

[ σd⁻¹ G ]        [ σd⁻¹ d ]
[ σh⁻¹ H ]  m  =  [ σh⁻¹ h ]
top part: the data equation weighted by its certainty
σd⁻¹ { Gm = d }
(the certainty of measurement multiplies the data equation)

bottom part: the prior information equation weighted by its certainty
σh⁻¹ { Hm = h }
(the certainty of the prior information multiplies the prior information equation)
example
no prior information but
data equation weighted by its certainty
[ σd1⁻¹G11   σd1⁻¹G12   …   σd1⁻¹G1M ]        [ σd1⁻¹d1 ]
[ σd2⁻¹G21   σd2⁻¹G22   …   σd2⁻¹G2M ]   m =  [ σd2⁻¹d2 ]
[    …           …      …      …     ]        [    …    ]
[ σdN⁻¹GN1   σdN⁻¹GN2   …   σdN⁻¹GNM ]        [ σdN⁻¹dN ]

called "weighted least squares"
[Figure: straight-line fit with no prior information, but the data equation weighted by its certainty. The fitted line passes close to the data with low variance and is pulled much less by the data with high variance. Axes: x from 0 to 100, d from -500 to 500.]
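A minimal MatLab sketch of this kind of weighted least squares straight-line fit (the synthetic data, variances and variable names below are invented for illustration, not the script from the text):

% weighted least squares fit of a straight line d = m1 + m2*x
N = 100;
x = [1:N]';
sigmad = ones(N,1);                    % first half of the data: low variance
sigmad(N/2+1:N) = 10;                  % second half: high variance
dtrue = -400 + 8*x;
dobs = dtrue + sigmad.*random('Normal',0,1,N,1);

G = [ones(N,1), x];                    % data kernel for a straight line
F = diag(1./sigmad)*G;                 % each data equation weighted by its certainty
f = dobs./sigmad;
mest = (F'*F)\(F'*f);                  % ordinary least squares on Fm=f
dpre = G*mest;                         % predicted straight line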
another example
prior information that the model parameters are small
m≈0
H=I
h=0
assume uncorrelated with uniform variances
Cd = σd² I
Ch = σh² I

Fm = f with

[ σd⁻¹ G ]        [ σd⁻¹ d ]
[ σh⁻¹ I ]  m  =  [    0   ]

solving Fm = f by ordinary least squares, mest = [FᵀF]⁻¹Fᵀf, gives

mest = [GᵀG + ε²I]⁻¹Gᵀd   with   ε = σd/σh

called "damped least squares"

ε = 0: minimize the prediction error
ε → ∞: minimize the size of the model parameters
0 < ε < ∞: minimize a combination of the two
advantages:
really easy to code
mest = (G'*G + (e^2)*eye(M)) \ (G'*d);
always works
disadvantages:
often need to determine ε empirically
prior information that the model parameters are small is not always sensible
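A short self-contained sketch of damped least squares on a made-up problem (G, the noise level and the damping value e are invented for illustration; in practice e is often tuned empirically, as noted above):

% damped least squares on a small synthetic problem
N = 50; M = 20;
G = random('Normal',0,1,N,M);          % made-up data kernel
mtrue = sin(2*pi*[1:M]'/M);            % made-up true model parameters
sigmad = 0.1;
dobs = G*mtrue + random('Normal',0,sigmad,N,1);

e = 0.5;                               % damping parameter, chosen by trial and error
mest = (G'*G + (e^2)*eye(M)) \ (G'*dobs);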
smoothness as prior information
model parameters represent the
values of a function
m(x)
at equally spaced increments
along the x-axis
the function is approximated by its values at a sequence of x's spaced Δx apart:
m(x) → m = [m1, m2, m3, …, mM]ᵀ
rough function has large second
derivative
a smooth function
is one that is not rough
a smooth function has a small
second derivative
approximate expression for the second derivative

i-th row of H: 2nd derivative at xi
(Δx)⁻² [ 0, 0, 0, …, 0, 1, -2, 1, 0, …, 0, 0, 0 ]
(the -2 lies in column i)
what to do about m1 and mM?
not enough points for 2nd derivative
two possibilities
no prior information for m1 and mM
or
prior information about flatness
(first derivative)
first row of H: 1st derivative at x1
(Δx)⁻¹ [ -1, 1, 0, …, 0 ]

"smooth interior" / "flat ends" version of Hm = h, with h = 0
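A sketch of how this H (with h = 0) might be built in MatLab, assuming M model parameters spaced Dx apart (the variable names are illustrative):

% prior information of smoothness: "flat ends", "smooth interior"
M = 100;
Dx = 1.0;
H = zeros(M,M);
h = zeros(M,1);
H(1,1:2) = [-1, 1]/Dx;                     % 1st derivative (flatness) at the left end
for i = 2:M-1
    H(i,i-1:i+1) = [1, -2, 1]/(Dx^2);      % 2nd derivative (smoothness) in the interior
end
H(M,M-1:M) = [-1, 1]/Dx;                   % 1st derivative (flatness) at the right end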
example problem: fill in the missing model parameters so that the resulting curve is smooth
(the data are direct observations of some of the model parameters, mi = di)
the model parameters, m
an ordered list of all model parameters
m = [m1, m2, m3, m4, m5, m6, m7]ᵀ
the data, d
just the model parameters that were
measured
d = [d3, d5, d6]ᵀ = [m3, m5, m6]ᵀ
data equation
Gm=d
G = [ 0  0  1  0  0  0  0
      0  0  0  0  1  0  0
      0  0  0  0  0  1  0 ]

so that Gm = [m3, m5, m6]ᵀ = [d3, d5, d6]ᵀ
data kernel “associates” a
measured model parameter with an
unknown model parameter
data are just model
parameters that have
been observed
The prior information equation, Hm=h
“smooth interior” / “flat ends”
h=0
put them together into the
Generalized Least Squares equation
F = [ σd⁻¹ G ]        f = [ σd⁻¹ d ]
    [ σh⁻¹ H ]            [    0   ]

choose σd/σh to be << 1, so that the data take precedence over the prior information
the solution using MatLab
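A minimal sketch of this step, assuming G, H, dobs, sigmad and sigmah have already been constructed as above (the original script is not reproduced here):

% generalized least squares: stack the weighted data and prior information equations
F = [ G/sigmad ; H/sigmah ];
f = [ dobs/sigmad ; zeros(size(H,1),1) ];   % h = 0 for the smoothness information
mest = (F'*F)\(F'*f);                       % solve Fm=f by ordinary least squares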
[Figure: graph of the solution. The estimated curve passes close to the observed data d and is smooth in between. Axes: x from 0 to 100, m and d from -2 to 2.]
Two MatLab issues
Issue 1: matrices like G and F can be quite big, but contain mostly zeros.
Solution 1: Use "sparse matrices", which don't store the zeros.
Issue 2: matrices like GᵀG and FᵀF are not as sparse as G and F.
Solution 2: Solve the equation by a method, such as "biconjugate gradients", that doesn't require the calculation of GᵀG and FᵀF.
Using “sparse matrices” which don’t store the zeros:
N=200000;
M=100000;
F=spalloc(N,M,3*M);      % "sparse allocate"

spalloc() creates a 200000 × 100000 matrix that can hold up to 300000 non-zero elements.
(note that an ordinary matrix of this size would have 20,000,000,000 elements)
Once allocated, sparse matrices are used just like
ordinary matrices …
… they just consume less memory.
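For instance, elements of F are set and used with ordinary matrix syntax (the indices and values here are arbitrary, purely to illustrate):

F(1,1) = 1.0;            % set a couple of non-zero elements
F(2,5) = -2.0;
v = ones(M,1);
t = F*v;                 % sparse matrices multiply just like ordinary ones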
Issue 2: use a biconjugate gradient solver to avoid calculating GᵀG and FᵀF.

Suppose that we want to solve FᵀF m = Fᵀf.
The standard way would be:

mest = (F'*F)\(F'*f);

but that requires that we compute F'*F.

A "biconjugate gradient" solver requires only that we be able to multiply a vector, v, by FᵀF, where the solver supplies the vector, v.
So we have to calculate y = FᵀFv; the trick is to calculate t = Fv first, and then calculate y = Fᵀt.
this is done in a MatLab function, afun():

function y = afun(v,transp_flag)
% transp_flag is required by bicg() but is never used here
global F;
t = F*v;       % first multiply by F
y = F'*t;      % then multiply by F', so y = (F'*F)*v
return
the bicg() solver is passed a "handle" to this function

so, the new way of solving the generalized least squares problem is:

clear F;
global F;       % put at the top of the MatLab script
…
mest = bicg(@afun, F'*f, 1e-10, 3*L);

here bicg() stands for "biconjugate gradients"; its arguments are @afun, the "handle" to the multiply function; F'*f, the right-hand side of FᵀFm = Fᵀf; 1e-10, the tolerance; and 3*L, the maximum number of iterations (Niter).
The solution is found by iterative improvement of an initial guess. The iterations stop when the error falls beneath the specified tolerance (good) or, regardless, when the maximum number of iterations is reached (bad).
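Putting the pieces together, a small self-contained version of the fill-in problem solved with a sparse F and bicg() might look like the following (the sizes, data positions and sigmas are invented; afun() is the function defined above):

% fill in a smooth curve through a few observed points, using sparse F and bicg()
clear F;
global F;

M = 101;                            % number of model parameters
Dx = 1.0;
xd = [11; 41; 61; 86];              % indices of the observed model parameters
dobs = [1.0; -0.5; 0.8; 0.2];       % observed values
N = length(dobs);
sigmad = 0.1;                       % sigmad/sigmah << 1: data take precedence
sigmah = 10.0;

L = N + M;                          % total number of rows of F
F = spalloc(L, M, 3*L);
f = zeros(L,1);

for i = 1:N                         % data equations, weighted by 1/sigmad
    F(i,xd(i)) = 1/sigmad;
    f(i) = dobs(i)/sigmad;
end

F(N+1,1:2) = [-1, 1]/(Dx*sigmah);   % flat left end
for i = 2:M-1                       % smooth interior
    F(N+i,i-1:i+1) = [1, -2, 1]/((Dx^2)*sigmah);
end
F(N+M,M-1:M) = [-1, 1]/(Dx*sigmah); % flat right end

mest = bicg(@afun, F'*f, 1e-10, 3*L);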
example of a large problem
fill in the missing model parameters that represent a 2D function m(x,y) so that
the function passes through the measured data points, m(xi,yi) = di
and
the function satisfies the diffusion equation ∂²m/∂x² + ∂²m/∂y² = 0
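Presumably each interior grid point contributes one row of prior information that discretizes this equation with the usual five-point stencil (Δx and Δy are the grid spacings):

$$ \frac{m_{i-1,j} - 2m_{i,j} + m_{i+1,j}}{(\Delta x)^2} + \frac{m_{i,j-1} - 2m_{i,j} + m_{i,j+1}}{(\Delta y)^2} \approx 0 $$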
[Figure: A) the observed data, diobs = m(xi, yi), plotted in the (x, y) plane; B) the predicted function m(x, y).]
(see text for details on how it's done)