Maximum a posteriori estimates in Bayesian inversion
Tapio Helin
Department of Mathematics and Statistics
University of Helsinki
Warwick, June 2, 2015
Overview
1. Shortly on Bayesian inversion
2. Towards a general MAP estimate
3. Example: Besov priors
4. Bayes cost of MAP
Part 1: Shortly on Bayesian inversion
The Bayes formula in finite dimensions
Consider a linear statistical inverse problem
M = AU + E
where M, U and E stand for the random variables describing
measurement, unknown and noise, respectively.
Prior density $\pi_{pr}(u)$ expresses all prior information independent of the measurement.
Likelihood density $\pi(m \mid u)$ is the likelihood of a measurement outcome $m$ given $U = u$.
Bayes formula:
$$\pi_{post}(u) = \pi(u \mid m) = \frac{\pi_{like}(m \mid u)\,\pi_{pr}(u)}{\pi(m)}.$$
Typical point estimators
Classical inversion methods produce single estimates of the unknown. In the statistical approach one can calculate point estimates as well as confidence or interval estimates.
Maximum a posteriori estimate (MAP):
$$u_{MAP} = \arg\max_{u \in \mathbb{R}^d} \pi(u \mid m)$$
Conditional mean estimate (CM):
$$u_{CM} = E(u \mid M = m) = \int_{\mathbb{R}^d} u'\,\pi(u' \mid m)\,du'$$
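A minimal numerical illustration of the two estimators (my addition, not from the slides): for a one-dimensional problem the posterior density can be evaluated on a grid, the MAP read off as the argmax, and the CM as a quadrature of the density. The Laplace prior and the noise level are arbitrary placeholder choices.

```python
import numpy as np

a, m, sigma = 2.0, 1.3, 0.5            # forward map u -> a*u, datum, noise std
u = np.linspace(-3.0, 3.0, 2001)
du = u[1] - u[0]

# unnormalized posterior: Laplace prior times Gaussian likelihood
post = np.exp(-np.abs(u)) * np.exp(-0.5 * ((m - a * u) / sigma) ** 2)
post /= post.sum() * du                 # normalize: integral over the grid is 1

u_map = u[np.argmax(post)]              # maximizer of pi(u | m)
u_cm = (u * post).sum() * du            # E(u | M = m) by Riemann sum
print(f"u_MAP = {u_map:.3f}, u_CM = {u_cm:.3f}")
```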
The Gaussian case
Example. Let $M = AU + E$ with $E \sim N(0, C_e)$ and $U \sim N(0, C_u)$. In this case, the posterior density is
$$\pi(u \mid m) \propto \pi_{pr}(u)\,\pi(m \mid u) \propto \exp\Big(-\frac{1}{2}\big(|C_u^{-1/2} u|^2 + |C_e^{-1/2}(m - Au)|^2\big)\Big).$$
The MAP estimate for this posterior distribution satisfies
$$\arg\max_{u \in \mathbb{R}^d} \pi(u \mid m) \iff \arg\min_{u \in \mathbb{R}^d} \big(|u|^2_{C_u^{-1}} + |m - Au|^2_{C_e^{-1}}\big).$$
In fact, for this example the MAP and the CM estimates coincide.
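A sketch of how this Gaussian example looks numerically (my addition; the sizes and covariances below are illustrative): the MAP estimate solves the Tikhonov normal equations obtained by differentiating the quadratic functional above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 50                                   # unknowns, measurements
A = rng.standard_normal((n, d))
Cu = np.eye(d)                                  # prior covariance
Ce = 0.01 * np.eye(n)                           # noise covariance

u_true = rng.standard_normal(d)
m = A @ u_true + rng.multivariate_normal(np.zeros(n), Ce)

# minimizer of |u|^2_{Cu^{-1}} + |m - Au|^2_{Ce^{-1}} solves
# (A^T Ce^{-1} A + Cu^{-1}) u = A^T Ce^{-1} m
Ce_inv, Cu_inv = np.linalg.inv(Ce), np.linalg.inv(Cu)
u_map = np.linalg.solve(A.T @ Ce_inv @ A + Cu_inv, A.T @ Ce_inv @ m)
print("relative error:", np.linalg.norm(u_map - u_true) / np.linalg.norm(u_true))
```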
Total Variation prior (Lassas–Siltanen 2004)
The Total Variation prior in Bayesian inversion is formally defined as
$$\pi_{prior}(u) \propto \exp\Big(-\alpha_n \int |\nabla u|\,dt\Big).$$
It turns out that the TV prior is asymptotically unstable. [Figure omitted; see M. Lassas and S. Siltanen, Inverse Problems 20(5), 2004.]
Discretization invariance (Siltanen et al. 2004, 2009)
$$M = AU + E \qquad \text{(theoretical model)}$$
$$\downarrow$$
$$M_k = A_k U + E_k \qquad \text{(measurement model)}$$
$$\downarrow$$
$$M_{kn} = A_k U_n + E_k \qquad \text{(computational model)}$$
Challenges with infinite dimensions
(1) No uniform translation-invariant measure (Lebesgue measure) is available ⇒ working with the Bayes formula is more cumbersome.
(2) Point estimators are problematic (the CM is well-defined but difficult to analyse; what is the MAP?).
(3) Very few results exist on non-Gaussian models (Besov or hierarchical priors).
The Bayes formula in infinite-dimensional space
We consider the following measurement setting:
(1) a linear inverse problem $M = AU + E$, where $A : X \to \mathbb{R}^d$ is bounded,
(2) the prior distribution $\lambda$ is a probability distribution on $(X, B(X))$ and the noise satisfies $E \sim N(0, I)$.
Then a conditional distribution of $U$ given $M$ exists and
$$\mu_{post}(U \mid m) = \frac{1}{Z} \int_U \exp\Big(-\frac{1}{2}|Au - m|^2\Big)\,\lambda(du), \qquad U \in B(X),$$
for almost every $m \in \mathbb{R}^d$.
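A small illustration of this formula (my addition): since the posterior is the prior reweighted by the Gaussian likelihood factor, posterior expectations can be approximated by self-normalized importance weighting of prior samples. The Laplace prior and the dimensions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = rng.standard_normal((d, d))
u_true = rng.standard_normal(d)
m = A @ u_true + rng.standard_normal(d)    # unit Gaussian noise, as assumed

samples = rng.laplace(size=(100_000, d))   # draws from a non-Gaussian prior
w = np.exp(-0.5 * np.sum((samples @ A.T - m) ** 2, axis=1))
w /= w.sum()                               # self-normalization absorbs 1/Z

post_mean = w @ samples                    # approximates E(U | M = m)
print("posterior mean estimate:", post_mean)
```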
Some of the existing infinite-dimensional literature
- Behavior of Gaussian distributions is well understood (Mandelbaum (1984), Somersalo et al. (1989), Stuart and others)
- Posterior consistency, i.e., behavior as the noise converges to a delta distribution (Pikkarainen et al. (2008), van der Vaart, Stuart, Agapiou, Kekkonen and others)
- Non-Gaussian phenomena (Siltanen et al. (2004, 2009), Dashti et al. (2011), Burger–Lucka (2014))
- Discretization invariance (Siltanen et al. (2004, 2009), Cotter et al. (2010), Lasanen (2012))
- How to define a MAP estimate (Hegland (2007), Dashti et al. (2013), H–Burger (2015))
Part 2: Towards a general MAP estimate
Differentiability of measures
The following concept originates in papers by Sergei Fomin from the 1960s.
Definition
A measure $\mu$ on $X$ is called Fomin differentiable along the vector $h$ if, for every set $A \in B(X)$, there exists a finite limit
$$d_h\mu(A) = \lim_{t \to 0} \frac{\mu(A + th) - \mu(A)}{t}.$$
The set function $d_h\mu$ is a countably additive signed measure on $B(X)$ and has bounded variation due to the Nikodym theorem.
We denote the domain of differentiability by
$$D(\mu) = \{h \in X \mid \mu \text{ is Fomin differentiable along } h\}.$$
Differentiability of measures
By considering the function $f(t) = \mu(A + th)$ and its derivative at zero, we see that $d_h\mu$ is absolutely continuous with respect to $\mu$.
Definition
The Radon–Nikodym density of the measure $d_h\mu$ with respect to $\mu$ is denoted by $\beta_h^\mu$ and is called the logarithmic derivative of $\mu$ along $h$.
Consequently, for all $A \in B(X)$ the logarithmic derivative $\beta_h^\mu$ satisfies
$$d_h\mu(A) = \int_A \beta_h^\mu(u)\,\mu(du)$$
and, in particular, we have $d_h\mu(X) = 0$ for any $h \in D(\mu)$ by definition. Moreover, $\beta_{sh}^\mu = s \cdot \beta_h^\mu$ for any $s \in \mathbb{R}$.
Finite dimensional example
Suppose the posterior is of the form
$$\pi_{post}(u \mid m) \propto \exp\Big(-\frac{1}{2}|Au - m|^2 - J(u)\Big)$$
with differentiable $J$. Then the logarithmic derivative satisfies
$$\beta_h^\mu(u) = -\langle A^*(Au - m) + J'(u),\, h\rangle.$$
Formally, we want to study the zero points of $\beta_h^\mu$ in the infinite-dimensional case (see Hegland 2007).
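A sketch of this finite-dimensional picture (my addition): $\beta_h^\mu(u) = 0$ for all $h$ exactly when the gradient $A^*(Au - m) + J'(u)$ vanishes, so the MAP can be found by descent on $\frac{1}{2}|Au - m|^2 + J(u)$. The smooth convex penalty $J(u) = \sum_i \sqrt{1 + u_i^2}$ is a stand-in choice, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
m = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(30)

def grad(u):
    # A^T (Au - m) + J'(u); zero exactly when beta_h(u) = 0 for all h
    return A.T @ (A @ u - m) + u / np.sqrt(1.0 + u ** 2)

u = np.zeros(10)
step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1.0)   # Lipschitz-safe step size
for _ in range(5000):
    u -= step * grad(u)
print("gradient norm at the computed MAP:", np.linalg.norm(grad(u)))
```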
Gaussian example
Suppose
- $X$ is a separable Hilbert space,
- $T$ is a non-negative self-adjoint Hilbert–Schmidt operator on $X$, and
- $\gamma$ is a Gaussian measure on $(X, B(X))$ with mean $u_0$ and covariance $T^2$.
Then the Cameron–Martin space of $\gamma$ is defined by
$$H(\gamma) := T(X), \qquad \langle h_1, h_2\rangle_{H(\gamma)} = \langle T^{-1}h_1, T^{-1}h_2\rangle_X,$$
and the logarithmic derivative of $\gamma$ satisfies
$$\beta_h^\gamma(u) = -\langle h, u - u_0\rangle_{H(\gamma)}$$
for any $h \in D(\gamma) = H(\gamma)$.
Pitfall: the formula for $\beta_h^\gamma$ should be understood as a measurable extension.
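A one-dimensional sanity check of this formula (my addition, finite-dimensional analogue): for $\gamma = N(u_0, \tau^2)$ the Cameron–Martin inner product gives $\beta_h^\gamma(u) = -h(u - u_0)/\tau^2$, and the Fomin difference quotient over an interval should match the integral of $\beta_h^\gamma$ against $\gamma$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

u0, tau, h = 0.5, 2.0, 1.0
a, b = -1.0, 1.0                                   # the test set A = [a, b]
gamma = lambda lo, hi: norm.cdf(hi, u0, tau) - norm.cdf(lo, u0, tau)

t = 1e-6
fomin = (gamma(a + t * h, b + t * h) - gamma(a, b)) / t   # d_h gamma(A), approx.

beta = lambda u: -h * (u - u0) / tau ** 2                 # logarithmic derivative
exact, _ = quad(lambda u: beta(u) * norm.pdf(u, u0, tau), a, b)
print(fomin, exact)                                       # the two should agree
```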
MAP estimate by Dashti-Law-Stuart-Voss (2013)
Definition
Let $M_\epsilon = \sup_{u \in X} \mu(B_\epsilon(u))$. Any point $\hat{u} \in X$ satisfying
$$\lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(\hat{u}))}{M_\epsilon} = 1$$
is a MAP estimate for the measure $\mu$.
Dashti and others showed that for certain non-linear $F$, the MAP estimate for Gaussian noise $\rho$ and prior $\lambda$ satisfies
$$\hat{u} = \arg\min_{u \in X} \Big(\|F(u) - m\|^2_{CM(\rho)} + \|u\|^2_{CM(\lambda)}\Big).$$
How to generalize this to non-Gaussian priors?
Generalized Onsager–Machlup functional
Theorem (Bogachev 2010)
Suppose $\mu$ is a Radon measure on a locally convex space $X$ and is Fomin differentiable along a vector $h \in X$. Moreover, if $\exp(\epsilon|\beta_h^\mu(\cdot)|) \in L^1(\mu)$ for some $\epsilon > 0$, then
$$\frac{d\mu_h}{d\mu}(u) = \exp\Big(\int_0^1 \beta_h^\mu(u - sh)\,ds\Big) \quad \text{in } L^1(\mu).$$
Simple lemma about continuity
Lemma
Assume that $\mu_h \ll \mu$ and denote $r_h = \frac{d\mu_h}{d\mu} \in L^1(\mu)$. Suppose $r_h$ has a continuous representative $\tilde{r}_h \in C(X)$, i.e., $r_h - \tilde{r}_h = 0$ in $L^1(\mu)$. Then it holds that
$$\lim_{\epsilon \to 0} \frac{\mu_h(B_\epsilon(u))}{\mu(B_\epsilon(u))} = \tilde{r}_h(u)$$
for any $u \in X$.
Weak MAP estimate
Assumptions:
(A1) there exists a separable Banach space $E \subset D(\mu)$ such that $E$ is dense in $X$ and $\beta_h^\mu \in C(X)$ for any $h \in E$, and
(A2) for any $h \in E$ there exists $\epsilon > 0$ such that $\exp(\epsilon|\beta_h^\mu(\cdot)|) \in L^1(\mu)$.
Definition (H–Burger)
We call a point $\hat{u} \in X$, $\hat{u} \in \operatorname{supp}(\mu)$, a weak MAP (wMAP) estimate if
$$\frac{d\mu_h}{d\mu}(\hat{u}) = \lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(\hat{u} - h))}{\mu(B_\epsilon(\hat{u}))} \le 1$$
for all $h \in E$.
Every MAP is a wMAP
Lemma
Every MAP estimate û is a weak MAP estimate.
Proof.
The claim is trivial since
$$\frac{d\mu_h}{d\mu}(\hat{u}) \le \lim_{\epsilon \to 0} \frac{M_\epsilon}{\mu(B_\epsilon(\hat{u}))} = 1$$
for any $h \in E$.
Connection to a variational problem
A probability measure $\lambda$ on $B(X)$ is called convex if, for all sets $A, B \in B(X)$ and all $t \in [0, 1]$, one has
$$\lambda(tA + (1 - t)B) \ge \lambda(A)^t \lambda(B)^{1-t}.$$
Theorem (H–Burger)
(1) If $\hat{u} \in X$ is a weak MAP estimate of $\mu$, then $\beta_h^\mu(\hat{u}) = 0$ for all $h \in E$.
(2) Suppose that $\mu$ is convex and there exists $\tilde{u} \in X$ such that $\beta_h^\mu(\tilde{u}) = 0$ for all $h \in E$. Then $\tilde{u}$ is a weak MAP estimate.
Theorem
If $\hat{u} \in X$ is a weak MAP estimate of $\mu$, then $\beta_h^\mu(\hat{u}) = 0$ for all $h \in E$.
Proof.
It follows from $\frac{d\mu_h}{d\mu}(\hat{u}) \le 1$ and the generalized Onsager–Machlup formula that
$$\int_0^t \beta_h^\mu(\hat{u} - sh)\,ds = \int_0^1 \beta_{th}^\mu(\hat{u} - s' \cdot th)\,ds' \le 0$$
for all $h \in E$ and $t \in \mathbb{R}$. By continuity we then have $\beta_h^\mu(\hat{u}) \le 0$. Now since $h, -h \in E \subset D(\mu)$, by similar reasoning $\beta_{-h}^\mu(\hat{u}) \le 0$, and we must have
$$0 \le -\beta_{-h}^\mu(\hat{u}) = \beta_h^\mu(\hat{u}) \le 0,$$
and the claim follows.
So what about our posterior measure?
Recall that
$$\mu_{post}(U \mid m) = \frac{1}{Z} \int_U \exp\Big(-\frac{1}{2}|Au - m|^2\Big)\,\lambda(du).$$
$\lambda$ is convex $\Rightarrow$ $\mu_{post}$ is convex.
Also, $D(\lambda) \subset D(\mu_{post})$ and, writing $f(u) = Z^{-1}\exp(-\frac{1}{2}|Au - m|^2)$ for the density of $\mu_{post}$ with respect to $\lambda$,
$$d_h\mu_{post} = f \cdot d_h\lambda + \partial_h f \cdot \lambda = \big(\beta_h^\lambda(\cdot) - \langle A\,\cdot - m,\, Ah\rangle_{\mathbb{R}^d}\big)\, f\,\lambda = \beta_h^{\mu_{post}}\,\mu_{post}.$$
Theorem
If $\lambda$ satisfies (A1) and (A2), then so does $\mu_{post}$.
Proof.
(A1) is clear. For (A2) we have
$$\big\|\exp(\epsilon|\beta_h^{\mu_{post}}(\cdot)|)\big\|_{L^1(\mu)} \le C \int_X \exp\big(\epsilon(C_1|Au - m|_{\mathbb{R}^d} + |\beta_h^\lambda(u)|)\big)\, \exp\Big(-\frac{1}{2}|Au - m|^2\Big)\,\lambda(du)$$
$$\le \tilde{C} \int_X \exp\big(-(|Au - m|_{\mathbb{R}^d} - C_2)^2\big)\, \exp\big(\epsilon|\beta_h^\lambda(u)|\big)\,\lambda(du) \le \tilde{C}\, \big\|\exp(\epsilon|\beta_h^\lambda(\cdot)|)\big\|_{L^1(\lambda)}$$
for suitable $\epsilon > 0$ and constants $C, \tilde{C}, C_1, C_2 > 0$.
Weak MAP in Bayesian inversion
Theorem (H–Burger)
Suppose $\lambda$ is convex and satisfies (A1) and (A2). Moreover, assume that there is a convex functional $J : X \to \mathbb{R} \cup \{\infty\}$, which is Fréchet differentiable in its domain $D(J)$, and that $J'(u)$ has a bounded extension $J'(u) : E \to \mathbb{R}$ such that
$$\beta_h^\lambda(u) = J'(u)h$$
for any $h \in E$ and any $u \in X$. Then $\hat{u}$ is a wMAP estimate if and only if $\hat{u} \in \arg\min_{u \in X} F(u)$, where
$$F(u) = \frac{1}{2}|Au - m|_2^2 + J(u). \tag{1}$$
Part 3: Example: Besov priors
Shortly about Besov spaces
Suppose $\{\psi_\ell\}_{\ell=1}^\infty$ form an orthonormal wavelet basis for $L^2(\mathbb{T}^d)$. We define $B^s_{pq}(\mathbb{T}^d)$ as follows: the series
$$f(x) = \sum_{\ell=1}^\infty c_\ell \psi_\ell(x) \tag{2}$$
belongs to $B^s_{pq}(\mathbb{T}^d)$ if and only if
$$\Bigg(2^{js}\, 2^{j(\frac{1}{2} - \frac{1}{p})} \Big(\sum_{\ell=2^j}^{2^{j+1}-1} |c_\ell|^p\Big)^{1/p}\Bigg)_{j \ge 0} \in \ell^q(\mathbb{N}). \tag{3}$$
We write $B^s_p = B^s_{pp}$.
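As a concrete reading of condition (3) (my addition): the following helper evaluates the truncated sequence-space norm for a finite coefficient vector, with the dyadic blocks indexed exactly as above.

```python
import numpy as np

def besov_norm(c, s, p, q):
    """Truncated B^s_pq norm of a coefficient vector c = (c_1, c_2, ...)."""
    c = np.asarray(c, dtype=float)
    levels = []
    j = 0
    while 2 ** j <= len(c):
        # block l = 2^j, ..., 2^{j+1}-1 in the 1-based indexing of the slide
        block = c[2 ** j - 1 : min(2 ** (j + 1) - 1, len(c))]
        lp = np.sum(np.abs(block) ** p) ** (1.0 / p)
        levels.append(2 ** (j * s) * 2 ** (j * (0.5 - 1.0 / p)) * lp)
        j += 1
    return np.sum(np.asarray(levels) ** q) ** (1.0 / q)

# coefficients decaying like 1/l give a finite norm for small enough s
print(besov_norm(1.0 / np.arange(1, 1025), s=0.25, p=2, q=2))
```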
Besov priors
Definition
Let $1 \le p < \infty$ and let $(X_\ell)_{\ell=1}^\infty$ be independent identically distributed real-valued random variables with the probability density function
$$\pi_X(x) = \sigma_p \exp(-|x|^p) \quad \text{with } \sigma_p = \Big(\int_{\mathbb{R}} \exp(-|x|^p)\,dx\Big)^{-1}. \tag{4}$$
Let $U$ be the random function
$$U(x) = \sum_{\ell=1}^\infty \ell^{-\frac{s}{d} - \frac{1}{2} + \frac{1}{p}}\, X_\ell\, \psi_\ell(x), \qquad x \in \mathbb{T}^d.$$
Then we say that $U$ is distributed according to a Besov prior in $B^s_p$.
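A hedged sketch of drawing from this prior (my addition): the $p$-exponential variables can be sampled via $|X_\ell|^p \sim \mathrm{Gamma}(1/p, 1)$ with a random sign, and a sine basis on $[0,1]$ stands in for the wavelet basis $\psi_\ell$, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
s, p, d, L = 1.0, 1.5, 1, 2048            # smoothness, integrability, dim, cutoff

g = rng.gamma(1.0 / p, 1.0, size=L)       # |X_l|^p ~ Gamma(1/p, 1)
X = rng.choice([-1.0, 1.0], size=L) * g ** (1.0 / p)

l = np.arange(1, L + 1)
coeff = l ** (-s / d - 0.5 + 1.0 / p) * X  # the scaling from the definition

x = np.linspace(0, 1, 512, endpoint=False)
psi = np.sqrt(2.0) * np.sin(np.pi * np.outer(l, x))  # stand-in basis functions
U = coeff @ psi                            # one draw from the (mock) prior
print(U[:5])
```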
Besov priors
Theorem
Assume that $\lambda$ is a Besov prior in $B^s_p$. It holds that
(1) $D(\lambda) = B_2^{s + (\frac{1}{2} - \frac{1}{p})d}(\mathbb{T}^d)$ for $p > 1$,
(2) $\exp(\epsilon|\beta_h^\lambda|) \in L^1(\lambda)$ for any $h \in B_p^{ps - (p-1)t}(\mathbb{T}^d)$, and
(3) $\tilde{\beta}_h^\lambda \in C(B_p^t(\mathbb{T}^d))$ for any $h \in E = B_p^{ps - (p-1)t}(\mathbb{T}^d)$ and $1 < p \le 2$.
Moreover, the weak MAP estimate of the inverse problem is obtained by minimizing the functional
$$F_{Besov}(u) = \frac{1}{2}|Au - m|^2 + \|u\|^p_{B^s_p}.$$
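A sketch of computing the wMAP in coefficient space (my addition): for $1 < p \le 2$ the penalty is differentiable, so plain gradient descent on the discretized functional applies. The matrix $A$, the data $m$, and the diagonal weights $w_\ell$ standing in for the $B^s_p$-norm weights are all illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(4)
n, L, p = 40, 100, 1.5
A = rng.standard_normal((n, L))
c_true = rng.standard_normal(L) / np.arange(1, L + 1)     # decaying coefficients
m = A @ c_true + 0.05 * rng.standard_normal(n)
w = np.arange(1, L + 1) ** 0.5                            # stand-in Besov weights

def F(c):
    return 0.5 * np.sum((A @ c - m) ** 2) + np.sum(w * np.abs(c) ** p)

def grad_F(c):
    # derivative of w_l |c_l|^p is w_l * p * |c_l|^{p-1} * sign(c_l)
    return A.T @ (A @ c - m) + w * p * np.abs(c) ** (p - 1) * np.sign(c)

c = np.zeros(L)
step = 0.5 / np.linalg.norm(A, 2) ** 2                    # conservative step size
for _ in range(5000):
    c -= step * grad_F(c)
print("F at zero:", F(np.zeros(L)), " F at wMAP estimate:", F(c))
```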
Part 4: Bayes cost of MAP
Bayes Cost Formalism
Consider a general estimator $\hat{u} : m \mapsto \hat{u}(m)$ for the inverse problem. The Bayes cost is defined by the expected cost, i.e., the average performance
$$BC_\Psi(\hat{u}) := E[\Psi(u, \hat{u}(m))] = \int \int \Psi(u, \hat{u}(m))\,\pi_{post}(u \mid m)\,du\;\pi(m)\,dm.$$
The Bayes estimator $\hat{u}_\Psi$ is the estimator which minimizes $BC_\Psi(\hat{u})$.
The CM estimate minimizes the mean squared error
$$\Psi_{MSE}(u, \hat{u}) = |u - \hat{u}|_2^2.$$
The MAP is the asymptotic Bayes estimator of
$$\Psi_\delta(u, \hat{u}) = \begin{cases} 0, & \text{if } |u - \hat{u}|_\infty \le \delta, \\ 1, & \text{otherwise.} \end{cases}$$
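A toy Monte Carlo check of the formalism (my addition): drawing $(u, m)$ jointly and averaging a cost over the draws approximates $BC_\Psi$; under the squared-error cost the CM estimator should outperform the MAP. The one-dimensional denoising model with a Laplace prior is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(5)
N, sigma = 5000, 0.5
u = rng.laplace(size=N)                   # draws from the prior
m = u + sigma * rng.standard_normal(N)    # forward map A = I, Gaussian noise

grid = np.linspace(-12, 12, 1201)
logpost = -np.abs(grid)[None, :] - 0.5 * ((m[:, None] - grid[None, :]) / sigma) ** 2
post = np.exp(logpost)                    # unnormalized posterior densities

u_cm = (post @ grid) / post.sum(axis=1)   # conditional means, per sample
u_map = grid[np.argmax(post, axis=1)]     # posterior maximizers, per sample

print("MSE Bayes cost, CM :", np.mean((u - u_cm) ** 2))
print("MSE Bayes cost, MAP:", np.mean((u - u_map) ** 2))  # typically larger
```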
MAP as a proper Bayes estimator
Definition
For a convex functional $J : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$, the Bregman distance $D_J^q(u, v)$ between $u, v \in \mathbb{R}^n$ for a subgradient $q \in \partial J(v)$ is defined as
$$D_J^q(u, v) = J(u) - J(v) - \langle q, u - v\rangle, \qquad q \in \partial J(v).$$
Suppose $\pi_{pr}(u) \propto \exp(-J(u))$, where $J : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a Lipschitz-continuous convex functional and $u \mapsto |Au|_2^2 + J(u)$ has at least linear growth at infinity. Define
$$\Psi_{Brg}(u, \hat{u}) = \frac{1}{2}|A(\hat{u} - u)|_2^2 + D_J(\hat{u}, u).$$
Theorem (Burger–Lucka 2014)
The MAP estimate is a Bayes estimator for $\Psi_{Brg}$.
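A direct transcription of the definition above (my addition), with $J$ chosen as the $\ell^1$-norm, as in sparsity priors, and $q = \operatorname{sign}(v)$ a subgradient:

```python
import numpy as np

def bregman_l1(u, v):
    """D_J^q(u, v) for J = l1 norm with the subgradient q = sign(v)."""
    q = np.sign(v)   # valid subgradient where v has no zero entries
    return np.sum(np.abs(u)) - np.sum(np.abs(v)) - q @ (u - v)

u = np.array([-1.0, -2.0, 0.5])
v = np.array([0.5, -1.0, 1.0])
print(bregman_l1(u, v))   # nonnegative, by convexity of J
```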
Bregman distance and Bayes cost
Define the homogeneous Bregman distance
$$\tilde{D}_J(u, \cdot) = J(u) + \beta_u^\lambda(\cdot) \quad \text{in } L^1(\lambda)$$
for any $u \in D(\lambda) \cap D(J)$.
Theorem
Assume that $\mu_{post}$ and $\lambda$ are as in the previous theorem. Then $\hat{u}$ is a weak MAP estimate if and only if it minimizes the functional
$$G(u) = \int_X \Big(\frac{1}{2}|Au - Av|_2^2 + \tilde{D}_J(u, v)\Big)\,\mu(dv).$$
Moreover, we have
$$\int_X \tilde{D}_J(u_{MAP}, v)\,\mu(dv) \le \int_X \tilde{D}_J(u_{CM}, v)\,\mu(dv).$$
Conclusions
- Infinite-dimensional Bayesian inverse problems contain many big open questions.
- Studying the differentiability of the posterior opens new avenues of research.
- MAP estimates can be rigorously defined for a certain class of non-Gaussian priors.
For more details:
Helin, T. and Burger, M.: Maximum a posteriori probability estimates in
infinite-dimensional Bayesian inverse problems, arXiv:1412.5816.