Lecture 9
Inexact Theories
Syllabus
Lecture 01: Describing Inverse Problems
Lecture 02: Probability and Measurement Error, Part 1
Lecture 03: Probability and Measurement Error, Part 2
Lecture 04: The L2 Norm and Simple Least Squares
Lecture 05: A Priori Information and Weighted Least Squared
Lecture 06: Resolution and Generalized Inverses
Lecture 07: Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08: The Principle of Maximum Likelihood
Lecture 09: Inexact Theories
Lecture 10: Nonuniqueness and Localized Averages
Lecture 11: Vector Spaces and Singular Value Decomposition
Lecture 12: Equality and Inequality Constraints
Lecture 13: L1, L∞ Norm Problems and Linear Programming
Lecture 14: Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15: Nonlinear Problems: Newton’s Method
Lecture 16: Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17: Factor Analysis
Lecture 18: Varimax Factors, Empirical Orthogonal Functions
Lecture 19: Backus-Gilbert Theory for Continuous Problems; Radon’s Problem
Lecture 20: Linear Operators and Their Adjoints
Lecture 21: Fréchet Derivatives
Lecture 22: Exemplary Inverse Problems, incl. Filter Design
Lecture 23: Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24: Exemplary Inverse Problems, incl. Vibrational Problems
Purpose of the Lecture
Discuss how an inexact theory can be represented
Solve the inexact, linear Gaussian inverse problem
Use maximization of relative entropy as a guiding principle
for solving inverse problems
Introduce the F-test as a way to determine whether one solution is “better” than another
Part 1
How Inexact Theories can be
Represented
How do we generalize the case of
an exact theory
to one that is inexact?
exact theory case
[Figure: the curve d = g(m) in the (model m, datum d) plane, with the observed datum dobs, the predicted datum dpre, the a priori model map, and the estimate mest marked.]
to make theory inexact ...
must make the theory probabilistic or fuzzy
[Figure: the same (model m, datum d) plane as before, with the curve d = g(m) now shown as a fuzzy probabilistic band.]
[Figure: the a priori p.d.f., the theory p.d.f., and their combination, each shown in the (model m, datum d) plane.]
how do you combine two probability density functions ?
so that the information in them is combined ...
desirable properties
order shouldn’t matter
combining something with the null distribution should leave it unchanged
combination should be invariant under change of variables
Answer
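One standard construction satisfying all three properties (a hedged sketch; pN denotes the null p.d.f. and c a normalization constant):

p_C(\mathbf{m}) = c \, \frac{p_A(\mathbf{m})\, p_B(\mathbf{m})}{p_N(\mathbf{m})}

With pN ∝ constant this reduces to a simple product, which is the form used for pT below.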
[Figure: panels showing the a priori p.d.f. pA, the theory p.d.f. pg, and the total (combined) p.d.f. pT in the (model m, datum d) plane; the maximum of pT locates mest and dpre.]
“solution to inverse problem”
the maximum likelihood point of pT(m,d) (with pN ∝ constant) simultaneously gives mest and dpre
pT(m,d) is the probability that the estimated model parameters are near m and the predicted data are near d
its projection p(m) = ∫ pT(m,d) dd is the probability that the estimated model parameters are near m, irrespective of the value of the predicted data
conceptual problem
pT(m,d) and its projection p(m) do not necessarily have maximum likelihood points at the same value of m
[Figure: an example in which the peak of the joint p.d.f. in the (model m, datum d) plane and the peak of its projection p(m) onto the m-axis occur at different values of m, marked mest and m′.]
illustrates the problem in defining a definitive solution to an inverse problem
fortunately
if all distributions are Gaussian
the two points are the same
Part 2
Solution of the inexact linear
Gaussian inverse problem
Gaussian a priori information
the a priori values of the model parameters, together with their uncertainty

Gaussian observations
the observed data, together with their measurement error

Gaussian theory
a linear theory, together with the uncertainty in the theory
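As a hedged sketch (the slide equations themselves are not reproduced here), the three ingredients have the standard Gaussian forms, written with <m> and [cov m]A for the prior, dobs and [cov d]A for the observations, and [cov g] for the theory uncertainty:

p_A(\mathbf{m}) \propto \exp\left[-\tfrac{1}{2}\,(\mathbf{m}-\langle\mathbf{m}\rangle)^{T}\,[\mathrm{cov}\,\mathbf{m}]_A^{-1}\,(\mathbf{m}-\langle\mathbf{m}\rangle)\right]
p_A(\mathbf{d}) \propto \exp\left[-\tfrac{1}{2}\,(\mathbf{d}-\mathbf{d}^{obs})^{T}\,[\mathrm{cov}\,\mathbf{d}]_A^{-1}\,(\mathbf{d}-\mathbf{d}^{obs})\right]
p_g(\mathbf{m},\mathbf{d}) \propto \exp\left[-\tfrac{1}{2}\,(\mathbf{d}-G\mathbf{m})^{T}\,[\mathrm{cov}\,\mathbf{g}]^{-1}\,(\mathbf{d}-G\mathbf{m})\right]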
mathematical statement of problem
find (m,d) that maximizes
pT(m,d) = pA(m) pA(d) pg(m,d)
and, along the way, work out the form of pT(m,d)
notational simplification
group m and d into a single vector x = [dT, mT]T
group [cov m]A and [cov d]A into a single block-diagonal matrix
write d - Gm = 0 as Fx = 0 with F = [I, -G]
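In block form (a sketch; the grouped covariance is written here as [cov x]A, a name introduced for convenience):

\mathbf{x} = \begin{bmatrix}\mathbf{d}\\ \mathbf{m}\end{bmatrix},\qquad F = \begin{bmatrix} I & -G \end{bmatrix},\qquad [\mathrm{cov}\,\mathbf{x}]_A = \begin{bmatrix}[\mathrm{cov}\,\mathbf{d}]_A & 0\\ 0 & [\mathrm{cov}\,\mathbf{m}]_A\end{bmatrix}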
after much algebra, we find that pT(x) is a Gaussian distribution with mean x*, which is the solution to the inverse problem, and with a corresponding variance
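As a hedged reconstruction, the standard result for the product of the Gaussian prior on x with the Gaussian theory factor exp[-½ (Fx)T [cov g]-1 (Fx)] is

\mathbf{x}^{*} = \langle\mathbf{x}\rangle - [\mathrm{cov}\,\mathbf{x}]_A F^{T}\left(F\,[\mathrm{cov}\,\mathbf{x}]_A F^{T} + [\mathrm{cov}\,\mathbf{g}]\right)^{-1} F\,\langle\mathbf{x}\rangle
[\mathrm{cov}\,\mathbf{x}^{*}] = [\mathrm{cov}\,\mathbf{x}]_A - [\mathrm{cov}\,\mathbf{x}]_A F^{T}\left(F\,[\mathrm{cov}\,\mathbf{x}]_A F^{T} + [\mathrm{cov}\,\mathbf{g}]\right)^{-1} F\,[\mathrm{cov}\,\mathbf{x}]_A

with <x> = [dobs; <m>].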
after pulling mest out of x*
the result is reminiscent of GT(GGT)-1, the minimum length solution
the error in the theory adds to the error in the data
and the solution depends on the values of the prior information only to the extent that the model resolution matrix is different from an identity matrix
and after algebraic manipulation, the same estimate can also be written in a form reminiscent of (GTG)-1GT, the least squares solution
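Again as a hedged reconstruction under the assumptions above (C is shorthand introduced here for the combined data-plus-theory covariance):

\mathbf{m}^{est} = \langle\mathbf{m}\rangle + [\mathrm{cov}\,\mathbf{m}]_A G^{T}\left(G\,[\mathrm{cov}\,\mathbf{m}]_A G^{T} + C\right)^{-1}\left(\mathbf{d}^{obs} - G\langle\mathbf{m}\rangle\right)
\phantom{\mathbf{m}^{est}} = \langle\mathbf{m}\rangle + \left(G^{T} C^{-1} G + [\mathrm{cov}\,\mathbf{m}]_A^{-1}\right)^{-1} G^{T} C^{-1}\left(\mathbf{d}^{obs} - G\langle\mathbf{m}\rangle\right),\qquad C = [\mathrm{cov}\,\mathbf{d}]_A + [\mathrm{cov}\,\mathbf{g}]

The first form is the minimum-length-like one, the second the least-squares-like one; the sum C = [cov d]A + [cov g] is where the error in the theory adds to the error in the data.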
interesting aside
weighted least squares solution
is equal to the
weighted minimum length solution
what did we learn?
for linear Gaussian inverse problem
inexactness of theory
just adds to
inexactness of data
Part 3
Use maximization of relative
entropy as a guiding principle for
solving inverse problems
from last lecture
assessing the information content
in pA(m)
Do we know a little about m
or
a lot about m ?
Information Gain, S
-S called Relative Entropy
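A hedged sketch of the definition from the last lecture, measuring pA against the null p.d.f. pN:

S = \int p_A(\mathbf{m})\,\ln\frac{p_A(\mathbf{m})}{p_N(\mathbf{m})}\,d\mathbf{m}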
[Figure: (A) an a priori p.d.f. pA(m) compared with the null p.d.f. pN(m); (B) the information gain S(σA) as a function of the width σA of pA(m).]
Principle of Maximum Relative Entropy
or, if you prefer,
Principle of Minimum Information Gain
find the solution p.d.f. pT(m) that has the largest relative entropy as compared to the a priori p.d.f. pA(m)
or, if you prefer, find the solution p.d.f. pT(m) that adds the smallest possible amount of new information as compared to the a priori p.d.f. pA(m)
subject to two constraints: pT(m) must be a properly normalized p.d.f., and the data must be satisfied in the mean (that is, the expected value of the error is zero)
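Written out as a sketch, assuming the constraints take their usual integral form:

\text{maximize}\quad -S = -\int p_T(\mathbf{m})\,\ln\frac{p_T(\mathbf{m})}{p_A(\mathbf{m})}\,d\mathbf{m}
\text{subject to}\quad \int p_T(\mathbf{m})\,d\mathbf{m} = 1 \quad\text{and}\quad \int p_T(\mathbf{m})\,\left(\mathbf{d}^{obs} - G\mathbf{m}\right)\,d\mathbf{m} = \mathbf{0}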
After minimization using the method of Lagrange multipliers, pT(m) is Gaussian with a maximum likelihood point mest that satisfies just the weighted minimum length solution
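A hedged sketch of the form this takes, with [cov m]A supplying the weighting as in the earlier lectures:

\mathbf{m}^{est} = \langle\mathbf{m}\rangle + [\mathrm{cov}\,\mathbf{m}]_A G^{T}\left(G\,[\mathrm{cov}\,\mathbf{m}]_A G^{T}\right)^{-1}\left(\mathbf{d}^{obs} - G\langle\mathbf{m}\rangle\right)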
What did we learn?
Only that the
Principle of Maximum Entropy
is yet another way of deriving
the inverse problem solutions
we are already familiar with
Part 4
the F-test as a way to determine whether one solution is “better” than another
Common Scenario
two different theories
theory A: solution mestA, MA model parameters, prediction error EA
theory B: solution mestB, MB model parameters, prediction error EB
Suppose EB < EA
Is B really better than A ?
What if B has many more model parameters than A, MB >> MA
Is B fitting better any surprise?
Need to test against the Null Hypothesis:
the difference in error is due to random variation
suppose the error e has a Gaussian p.d.f., uncorrelated, with uniform variance σd
estimate the variance from the prediction error
we want to know the probability density function of the ratio of the two estimated variances
actually, we’ll use the quantity F, which is the same, as long as the two theories that we’re testing are applied to the same data
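A hedged sketch of the statistic, assuming the usual definitions with ν = N − M degrees of freedom for each fit:

\hat{\sigma}_{d}^{2} = \frac{E}{\nu},\qquad \nu = N - M,\qquad F_{\nu_A,\nu_B} = \frac{E_A/\nu_A}{E_B/\nu_B} = \frac{\hat{\sigma}_{dA}^{2}}{\hat{\sigma}_{dB}^{2}}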
p.d.f. of F is known
[Figure: the p.d.f. p(FN,M) plotted for M = 2, 5, 25, and 50, each for values of N ranging from 2 to 50.]
as is its mean and variance
example
same dataset fit with
a straight line
and
a cubic polynomial
[Figure: (A) Linear fit, N-M=9, E=0.030; (B) Cubic fit, N-M=7, E=0.006. The same data d(z) are shown fit by a straight line and by a cubic polynomial.]
F7,9est = 4.1
probability that F > Fest (cubic fit seems better than linear fit) by random chance alone
or
F < 1/Fest (linear fit seems better than cubic fit) by random chance alone
in MatLab
P = 1 - (fcdf(Fobs,vA,vB)-fcdf(1/Fobs,vA,vB));
answer: 6%
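A minimal MATLAB sketch of the whole calculation for this example, using the values quoted on the slides (EA = 0.030 with vA = 9 for the linear fit, EB = 0.006 with vB = 7 for the cubic fit); the precise probability depends on rounding, but it should come out near the 6% quoted above:

% prediction errors and degrees of freedom (nu = N - M) from the two fits
EA = 0.030;  vA = 9;   % linear fit
EB = 0.006;  vB = 7;   % cubic fit

% F statistic: ratio of the two estimated variances (assumed definition)
Fobs = (EA/vA) / (EB/vB);

% probability that an F this extreme (either way) arises by random chance alone
% (fcdf is in the Statistics Toolbox)
P = 1 - (fcdf(Fobs,vA,vB) - fcdf(1/Fobs,vA,vB));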
The Null Hypothesis, that the difference is due to random variation, cannot be rejected at 95% confidence