Lecture 4: The L2 Norm and Simple Least Squares
Syllabus
Lecture 01: Describing Inverse Problems
Lecture 02: Probability and Measurement Error, Part 1
Lecture 03: Probability and Measurement Error, Part 2
Lecture 04: The L2 Norm and Simple Least Squares
Lecture 05: A Priori Information and Weighted Least Squares
Lecture 06: Resolution and Generalized Inverses
Lecture 07: Backus-Gilbert Inverse and the Trade-off of Resolution and Variance
Lecture 08: The Principle of Maximum Likelihood
Lecture 09: Inexact Theories
Lecture 10: Nonuniqueness and Localized Averages
Lecture 11: Vector Spaces and Singular Value Decomposition
Lecture 12: Equality and Inequality Constraints
Lecture 13: L1, L∞ Norm Problems and Linear Programming
Lecture 14: Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15: Nonlinear Problems: Newton's Method
Lecture 16: Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17: Factor Analysis
Lecture 18: Varimax Factors, Empirical Orthogonal Functions
Lecture 19: Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20: Linear Operators and Their Adjoints
Lecture 21: Fréchet Derivatives
Lecture 22: Exemplary Inverse Problems, incl. Filter Design
Lecture 23: Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24: Exemplary Inverse Problems, incl. Vibrational Problems
Purpose of the Lecture
Introduce the concept of prediction error and the norms that quantify it
Develop the Least Squares Solution
Develop the Minimum Length Solution
Determine the covariance of these solutions
Part 1
prediction error and norms
The Linear Inverse Problem
Gm = d
where m is the vector of model parameters, G the data kernel, and d the data
an estimate of the model parameters can be used to predict the data: G m^est = d^pre
but the prediction may not match the observed data (e.g. due to observational error): d^pre ≠ d^obs
this mismatch leads us to define the prediction error: e = d^obs − d^pre
e = 0 when the model parameters exactly predict the data
example of prediction error for a line fit to data
[Figure: a straight line fit to data; the observed data d_i^obs, the predicted data d_i^pre, and the individual errors e_i, plotted against z.]
“norm”: a rule for quantifying the overall size of the error vector e
lots of possible ways to do it
L_n family of norms:
||e||_1 = Σ_i |e_i|
||e||_2 = [Σ_i |e_i|^2]^(1/2)   (the Euclidean length)
||e||_n = [Σ_i |e_i|^n]^(1/n)
higher norms give increasing weight to the largest elements of e
[Figure: an error vector e plotted against z, together with its element-wise values |e|, |e|^2, and |e|^10; raising to a higher power emphasizes the largest elements.]
limiting case: the L_∞ norm, ||e||_∞ = max_i |e_i|, which selects the largest element of e
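To make the comparison concrete, here is a minimal MATLAB sketch (the error vector is illustrative) that evaluates several norms of the same vector; note how the higher norms are dominated by the largest element:

% illustrative error vector with one large element
e = [0.1; -0.2; 0.1; 1.0; -0.1];

L1   = sum(abs(e));              % L1 norm: sum of absolute values
L2   = sqrt(sum(abs(e).^2));     % L2 (Euclidean) norm
L10  = sum(abs(e).^10)^(1/10);   % a high norm, dominated by the largest element
Linf = max(abs(e));              % limiting case: the L-infinity norm

fprintf('L1=%.3f  L2=%.3f  L10=%.3f  Linf=%.3f\n', L1, L2, L10, Linf);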
guiding principle for solving an inverse problem:
find the m^est that minimizes E = ||e||
with e = d^obs − d^pre and d^pre = G m^est
but which norm to use?
it makes a difference!
[Figure: straight lines fit, under the L1, L2, and L∞ norms, to data containing an outlier.]
Answer is related to the distribution of
the error. Are outliers common or rare?
[Figure: two probability distributions p(d): (A) a long-tailed distribution and (B) a short-tailed distribution.]
long tails: outliers common, outliers unimportant; use a low norm, which gives low weight to outliers
short tails: outliers uncommon, outliers important; use a high norm, which gives high weight to outliers
as we will show later in the class, use the L2 norm when the data have Gaussian-distributed errors
Part 2
Least Squares Solution to Gm=d
the L2 norm of the error is its Euclidean length, ||e||_2 = [e^T e]^(1/2), so E = e^T e is the square of the Euclidean length
minimize E: the Principle of Least Squares
Least Squares Solution to Gm=d
minimize E with respect to m_q: ∂E/∂m_q = 0
multiplying out, E = Σ_j Σ_k m_j m_k [Σ_i G_ij G_ik] − 2 Σ_j m_j [Σ_i G_ij d_i] + Σ_i d_i d_i
first term: ∂/∂m_q {Σ_j Σ_k m_j m_k Σ_i G_ij G_ik} = 2 Σ_k m_k Σ_i G_iq G_ik
using ∂m_j/∂m_q = δ_jq, since m_j and m_q are independent variables; δ_jq is the Kronecker delta, the elements of the identity matrix, [I]_ij = δ_ij, so that a = Ib = b, i.e. a_i = Σ_j δ_ij b_j = b_i
second term: ∂/∂m_q {−2 Σ_j m_j Σ_i G_ij d_i} = −2 Σ_i G_iq d_i
third term: ∂/∂m_q {Σ_i d_i d_i} = 0
putting it all together: ∂E/∂m_q = 2 Σ_k [G^T G]_qk m_k − 2 [G^T d]_q = 0
or [G^T G] m − G^T d = 0
Least Squares Solution (memorize): m^est = [G^T G]^(-1) G^T d, presuming [G^T G] has an inverse
example: the straight-line problem, Gm = d with d_i = m_1 + m_2 z_i, so each row of G is [1 z_i]
in practice, there is no need to multiply the matrices out analytically; just use MATLAB:
mest = (G'*G)\(G'*d);
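For a complete, runnable version, here is a minimal MATLAB sketch (the data z and dobs are synthetic and illustrative) that builds G for the straight line and applies the same backslash formula:

% synthetic, illustrative data for a straight line d = m1 + m2*z
z    = (0:10)';
dobs = 2.0 + 0.5*z + 0.3*randn(size(z));   % "observed" data with noise

% data kernel: each row of G is [1 z_i]
G = [ones(size(z)), z];

% least squares solution mest = [G'G]^(-1) G'd
mest = (G'*G)\(G'*dobs);

dpre = G*mest;          % predicted data
e    = dobs - dpre;     % prediction error
E    = e'*e;            % total L2 error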
another example: fitting a plane surface, Gm = d with d_i = m_1 + m_2 x_i + m_3 y_i, so each row of G is [1 x_i y_i]
[Figure: a plane surface fit to data; z axis in km.]
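A similar minimal MATLAB sketch for the plane fit, again with synthetic, illustrative coordinates and data, and assuming the plane parameterization d = m1 + m2*x + m3*y given above:

% synthetic, illustrative data on a plane d = m1 + m2*x + m3*y
N    = 50;
x    = 10*rand(N,1);
y    = 10*rand(N,1);
dobs = 1.0 + 0.2*x - 0.4*y + 0.1*randn(N,1);

% data kernel: each row of G is [1 x_i y_i]
G = [ones(N,1), x, y];

mest = (G'*G)\(G'*dobs);   % least squares solution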
Part 3
Minimum Length Solution
but Least Squares will fail when [G^T G] has no inverse
example: fitting a straight line to a single data point
[Figure: a single observation (z_1, d_1) with several candidate lines, all fitting it exactly.]
here G = [1 z_1], so G^T G = [1 z_1; z_1 z_1^2] has zero determinant, hence no inverse
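A tiny MATLAB check of this failure (the single observation value is illustrative):

% one observation (z1, d1) of the straight line d = m1 + m2*z
z1 = 3.0;              % illustrative value
G  = [1, z1];          % single-row data kernel
A  = G'*G;             % 2x2 matrix [1 z1; z1 z1^2]
det(A)                 % = 0: [G'G] is singular, so least squares fails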
Least Squares will fail
when more than one solution
minimizes the error
the inverse problem is
“underdetermined”
simple example of an underdetermined problem
[Figure: a source S, a receiver R, and two boxes labeled 1 and 2.]
What to do? use another guiding principle: “a priori” information about the solution
in this case, choose a solution that is small: minimize its length ||m||_2
simplest case: “purely underdetermined”, more than one solution has zero error
minimize L = ||m||_2^2 = m^T m with the constraint that e = 0
Method of Lagrange Multipliers
minimize L with constraints C_1 = 0, C_2 = 0, …
equivalent to minimizing Φ = L + λ_1 C_1 + λ_2 C_2 + … with no constraints
the λs are called “Lagrange Multipliers”
[Figure: level curves of L(x,y) and the constraint curve e(x,y) = 0 in the (x,y) plane; the constrained minimum is at (x_0, y_0).]
setting ∂Φ/∂m = 0 gives 2m = G^T λ, together with the constraint Gm = d
substituting m = ½ G^T λ into Gm = d gives ½ G G^T λ = d
so λ = 2 [G G^T]^(-1) d
and m = G^T [G G^T]^(-1) d
Minimum Length Solution (memorize): m^est = G^T [G G^T]^(-1) d, presuming [G G^T] has an inverse
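A minimal MATLAB sketch, using an illustrative purely underdetermined problem (a single measurement of the sum of two model parameters):

% purely underdetermined, illustrative problem: one measurement of m1 + m2
G = [1, 1];
d = 6;

% minimum length solution mest = G'[GG']^(-1) d
mest = G'*((G*G')\d);   % = [3; 3], the shortest m satisfying Gm = d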
Part 4
Covariance
Least Squares Solution: m^est = [G^T G]^(-1) G^T d
Minimum Length Solution: m^est = G^T [G G^T]^(-1) d
both have the linear form m = Md
but if m = Md, then [cov m] = M [cov d] M^T
when the data are uncorrelated with uniform variance σ_d^2, [cov d] = σ_d^2 I
so
Least Squares Solution (memorize): [cov m] = [G^T G]^(-1) G^T σ_d^2 G [G^T G]^(-1) = σ_d^2 [G^T G]^(-1)
Minimum Length Solution (memorize): [cov m] = G^T [G G^T]^(-1) σ_d^2 [G G^T]^(-1) G = σ_d^2 G^T [G G^T]^(-2) G
where to obtain the value of σ_d^2
a priori value: based on knowledge of the accuracy of the measurement technique (e.g. my ruler has 1 mm divisions, so σ_d ≈ ½ mm)
a posteriori value: based on the prediction error
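A MATLAB sketch of both choices for the straight-line example (data synthetic and illustrative). The a posteriori estimate used here, σ_d^2 ≈ E/(N − M) with N data and M model parameters, is one common convention:

% straight-line fit to synthetic, illustrative data
z    = (0:10)';
dobs = 2.0 + 0.5*z + 0.3*randn(size(z));
G    = [ones(size(z)), z];
mest = (G'*G)\(G'*dobs);

% a priori variance, from knowledge of the measurement technique
sd2_prior = 0.5^2;

% a posteriori variance, from the prediction error
e        = dobs - G*mest;
[N, M]   = size(G);
sd2_post = (e'*e)/(N - M);

% covariance of the least squares solution: [cov m] = sd2 [G'G]^(-1)
covm = sd2_post * inv(G'*G);
sigm = sqrt(diag(covm));     % standard deviations of m1 and m2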
the variance is critically dependent on experiment design (the structure of G)
which is the better way to weigh a set of boxes?
[Figure: two alternative schemes for weighing a set of numbered boxes.]
[Figure: (A) estimated model parameters m_i^est versus index i and (B) their standard deviations σ_mi, for the two weighing schemes.]
Relationship between
[cov m] and Error Surface
[Figure: example datasets d(z) and the corresponding error surfaces E(m_1, m_2).]
Taylor series expansion of the error about its minimum:
E(m) ≈ E(m^est) + ½ Δm^T B Δm, where Δm = m − m^est (the linear term vanishes at the minimum)
B is the curvature matrix, with elements B_ij = ∂²E/∂m_i∂m_j
for a linear problem the curvature is related to G^T G:
E = (Gm − d)^T (Gm − d) = m^T [G^T G] m − d^T G m − m^T G^T d + d^T d
so ∂²E/∂m_i∂m_j = 2 [G^T G]_ij
and since [cov m] = σ_d^2 [G^T G]^(-1), we have [cov m] = 2 σ_d^2 B^(-1)
the sharper the minimum, the higher the curvature, and the smaller the covariance
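To illustrate, a MATLAB sketch (synthetic straight-line data, illustrative grid) that evaluates the error surface E(m1, m2) on a grid and checks that its curvature, estimated by finite differences, matches 2 G^T G:

% synthetic, illustrative straight-line data
z    = (0:10)';
dobs = 2.0 + 0.5*z + 0.3*randn(size(z));
G    = [ones(size(z)), z];

% evaluate the error surface E(m1,m2) = (Gm-d)'(Gm-d) on a grid
m1 = linspace(0, 4, 81);
m2 = linspace(0, 1, 81);
E  = zeros(length(m1), length(m2));
for i = 1:length(m1)
    for j = 1:length(m2)
        e      = dobs - G*[m1(i); m2(j)];
        E(i,j) = e'*e;
    end
end

% curvature of E by second central differences at an interior grid point
h1 = m1(2) - m1(1);
h2 = m2(2) - m2(1);
d2E_dm1dm1 = (E(42,41) - 2*E(41,41) + E(40,41)) / h1^2;
d2E_dm2dm2 = (E(41,42) - 2*E(41,41) + E(41,40)) / h2^2;

% for a linear problem these equal the diagonal elements of 2*G'*G
B = 2*(G'*G);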