Maximum a posteriori estimates in Bayesian inversion

Tapio Helin
Department of Mathematics and Statistics, University of Helsinki
Warwick, June 2, 2015

Outline
1. Shortly on Bayesian inversion
2. Towards a general MAP estimate
3. Example: Besov priors
4. Bayes cost of MAP


1. Shortly on Bayesian inversion

The Bayes formula in finite dimensions

Consider a linear statistical inverse problem
\[ M = AU + E, \]
where $M$, $U$ and $E$ stand for the random variables describing the measurement, the unknown and the noise, respectively.

The prior density $\pi_{\mathrm{pr}}(u)$ expresses all prior information independent of the measurement. The likelihood density $\pi(m \mid u)$ is the likelihood of a measurement outcome $m$ given $U = u$.

Bayes formula:
\[ \pi_{\mathrm{post}}(u) = \pi(u \mid m) = \frac{\pi_{\mathrm{like}}(m \mid u)\, \pi_{\mathrm{pr}}(u)}{\pi(m)}. \]

Typical point estimators

Classical inversion methods produce single estimates of the unknown. In the statistical approach one can calculate point estimates as well as confidence or interval estimates.

Maximum a posteriori estimate (MAP):
\[ u_{\mathrm{MAP}} = \operatorname*{arg\,max}_{u \in \mathbb{R}^d} \pi(u \mid m). \]

Conditional mean estimate (CM):
\[ u_{\mathrm{CM}} = \mathbb{E}(u \mid M = m) = \int_{\mathbb{R}^d} u' \, \pi(u' \mid m)\, du'. \]

The Gaussian case

Example. Let $M = AU + E$ with $E \sim \mathcal{N}(0, C_e)$ and $U \sim \mathcal{N}(0, C_u)$. In this case the posterior density is
\[ \pi(u \mid m) \propto \pi_{\mathrm{pr}}(u)\, \pi(m \mid u) \propto \exp\Big( -\frac{1}{2}\big( |C_u^{-1/2} u|^2 + |C_e^{-1/2}(m - Au)|^2 \big) \Big). \]
The MAP estimate for this posterior distribution satisfies
\[ \operatorname*{arg\,max}_{u \in \mathbb{R}^d} \pi(u \mid m) \;\Leftrightarrow\; \operatorname*{arg\,min}_{u \in \mathbb{R}^d} \Big( |u|^2_{C_u^{-1}} + |m - Au|^2_{C_e^{-1}} \Big). \]
In fact, for this example the MAP and the CM estimates coincide.
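A minimal numerical sketch of the Gaussian case (not from the talk; the matrix $A$, the precisions and the dimensions are arbitrary stand-ins): the MAP estimate, computed by numerically minimizing the quadratic functional above, matches the closed-form posterior mean, so MAP $=$ CM.

```python
import numpy as np
from scipy.optimize import minimize

# Gaussian case: M = AU + E with E ~ N(0, Ce), U ~ N(0, Cu).
rng = np.random.default_rng(0)
d, n = 5, 8                                  # dim(unknown), dim(data)
A = rng.standard_normal((n, d))
Cu_inv = np.eye(d)                           # prior precision Cu^{-1}
Ce_inv = 10.0 * np.eye(n)                    # noise precision Ce^{-1}
m = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# CM estimate: the Gaussian posterior mean has the closed form
# (A^T Ce^{-1} A + Cu^{-1})^{-1} A^T Ce^{-1} m.
u_cm = np.linalg.solve(A.T @ Ce_inv @ A + Cu_inv, A.T @ Ce_inv @ m)

# MAP estimate: minimize |u|^2_{Cu^{-1}} + |m - Au|^2_{Ce^{-1}}.
f = lambda u: u @ Cu_inv @ u + (m - A @ u) @ Ce_inv @ (m - A @ u)
grad = lambda u: 2 * Cu_inv @ u - 2 * A.T @ Ce_inv @ (m - A @ u)
u_map = minimize(f, np.zeros(d), jac=grad).x

print(np.allclose(u_map, u_cm, atol=1e-5))   # True: MAP and CM coincide
```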
Total Variation prior (Lassas–Siltanen 2004)

The Total Variation prior in Bayesian inversion is formally defined as
\[ \pi_{\mathrm{prior}}(u) \propto \exp\Big( -\alpha_n \int |\nabla u| \, dt \Big). \]
It turns out that the TV prior is asymptotically unstable.

[Figure: reconstruction example from M. Lassas and S. Siltanen, Inverse Problems 20(5), 2004.]

Discretization invariance (Siltanen et al. 2004, 2009)

\[ M = AU + E \quad \text{(theoretical model)} \]
\[ \downarrow \]
\[ M_k = A_k U + E_k \quad \text{(measurement model)} \]
\[ \downarrow \]
\[ M_{kn} = A_k U_n + E_k \quad \text{(computational model)} \]

Challenges with infinite dimensions

(1) No uniform translation-invariant measure (Lebesgue measure) is available $\Rightarrow$ working with the Bayes formula is more cumbersome.
(2) Point estimators are problematic (the CM estimate is well-defined but difficult to analyse; what is the MAP?).
(3) There are very few results on non-Gaussian models (Besov or hierarchical priors).

The Bayes formula in infinite-dimensional space

We consider the following measurement setting:
(1) a linear inverse problem $M = AU + E$, where $A : X \to \mathbb{R}^d$ is bounded,
(2) the prior distribution $\lambda$ is a probability distribution on $(X, \mathcal{B}(X))$ and the noise satisfies $E \sim \mathcal{N}(0, I)$.
Then a conditional distribution of $U$ given $M$ exists and
\[ \mu_{\mathrm{post}}(U \mid m) = \frac{1}{Z} \int_U \exp\Big( -\frac{1}{2} |Au - m|^2 \Big) \lambda(du), \qquad U \in \mathcal{B}(X), \]
for almost every $m \in \mathbb{R}^d$.

Some of the existing infinite-dimensional literature

- Behavior of Gaussian distributions is well understood (Mandelbaum (1984), Somersalo et al. (1989), Stuart and others).
- Posterior consistency, i.e. the noise converges to a delta distribution (Pikkarainen et al. (2008), van der Vaart, Stuart, Agapiou, Kekkonen and others).
- Non-Gaussian phenomena (Siltanen et al. (2004, 2009), Dashti et al. (2011), Burger–Lucka (2014)).
- Discretization invariance (Siltanen et al. (2004, 2009), Cotter et al. (2010), Lasanen (2012)).
- How to define a MAP estimate (Hegland (2007), Dashti et al. (2013), H–Burger (2015)).


2. Towards a general MAP estimate

Differentiability of measures

The following concept originates in papers by Sergei Fomin from the 1960s.

Definition. A measure $\mu$ on $X$ is called Fomin differentiable along the vector $h$ if, for every set $A \in \mathcal{B}(X)$, there exists a finite limit
\[ d_h \mu(A) = \lim_{t \to 0} \frac{\mu(A + th) - \mu(A)}{t}. \]

The set function $d_h \mu$ is a countably additive signed measure on $\mathcal{B}(X)$ and has bounded variation due to the Nikodym theorem. We denote the domain of differentiability by
\[ D(\mu) = \{ h \in X \mid \mu \text{ is Fomin differentiable along } h \}. \]

By considering the function $f(t) = \mu(A + th)$ and its derivative at zero, we see that $d_h \mu$ is absolutely continuous with respect to $\mu$.

Definition. The Radon–Nikodym density of the measure $d_h \mu$ with respect to $\mu$ is denoted by $\beta_h^\mu$ and is called the logarithmic derivative of $\mu$ along $h$.

Consequently, for all $A \in \mathcal{B}(X)$ the logarithmic derivative $\beta_h^\mu$ satisfies
\[ d_h \mu(A) = \int_A \beta_h^\mu(u) \, \mu(du) \]
and, in particular, we have $d_h \mu(X) = 0$ for any $h \in D(\mu)$ by definition. Moreover, $\beta_{sh}^\mu = s \cdot \beta_h^\mu$ for any $s \in \mathbb{R}$.

Finite-dimensional example

Suppose the posterior is of the form
\[ \pi_{\mathrm{post}}(u \mid m) \propto \exp\Big( -\frac{1}{2} |Au - m|^2 - J(u) \Big) \]
with differentiable $J$. Then the logarithmic derivative satisfies
\[ \beta_h^\mu(u) = -\langle A^*(Au - m) + J'(u), h \rangle. \]
Formally, we want to study the zero points of $\beta_h^\mu$ in the infinite-dimensional case (see Hegland 2007).

Gaussian example

Suppose $X$ is a separable Hilbert space, $T$ is a non-negative self-adjoint Hilbert–Schmidt operator on $X$ and $\gamma$ is a Gaussian measure on $(X, \mathcal{B}(X))$ with mean $u_0$ and covariance $T^2$. Then the Cameron–Martin space of $\gamma$ is defined by
\[ H(\gamma) := T(X), \qquad \langle h_1, h_2 \rangle_{H(\gamma)} = \langle T^{-1} h_1, T^{-1} h_2 \rangle_X, \]
and the logarithmic derivative of $\gamma$ satisfies
\[ \beta_h^\gamma(u) = -\langle h, u - u_0 \rangle_{H(\gamma)} \]
for any $h \in D(\gamma) = H(\gamma)$.

Pitfall: the formula for $\beta_h^\gamma$ should be understood as a measurable extension.

MAP estimate by Dashti–Law–Stuart–Voss (2013)

Definition. Let $M^\epsilon = \sup_{u \in X} \mu(B_\epsilon(u))$. Any point $\hat u \in X$ satisfying
\[ \lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(\hat u))}{M^\epsilon} = 1 \]
is a MAP estimate for the measure $\mu$.

Dashti and others showed that for certain non-linear $F$, the MAP estimate for Gaussian noise $\rho$ and prior $\lambda$ satisfies
\[ \hat u = \operatorname*{arg\,min}_{u \in X} \|F(u) - m\|^2_{CM(\rho)} + \|u\|^2_{CM(\lambda)}. \]
How to generalize this to non-Gaussian priors?

Generalized Onsager–Machlup functional

Theorem (Bogachev 2010). Suppose $\mu$ is a Radon measure on a locally convex space $X$ and is Fomin differentiable along a vector $h \in X$. Moreover, if $\exp(\epsilon |\beta_h^\mu(\cdot)|) \in L^1(\mu)$ for some $\epsilon > 0$, then
\[ \frac{d\mu_h}{d\mu}(u) = \exp\Big( \int_0^1 \beta_h^\mu(u - sh) \, ds \Big) \quad \text{in } L^1(\mu). \]
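The lemma on the next slide identifies the density $r_h = d\mu_h / d\mu$ with a limit of small-ball probability ratios. A quick one-dimensional sanity check of that identification (not from the talk), assuming $\mu = \mathcal{N}(0,1)$ so that ball probabilities reduce to differences of the Gaussian CDF:

```python
import numpy as np
from scipy.stats import norm

# For mu = N(0,1), mu(B_eps(u - h)) / mu(B_eps(u)) should converge, as
# eps -> 0, to the density ratio phi(u - h)/phi(u), i.e. the continuous
# representative of dmu_h/dmu evaluated at u.
u, h = 0.7, 0.3
ball = lambda c, eps: norm.cdf(c + eps) - norm.cdf(c - eps)
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, ball(u - h, eps) / ball(u, eps))
print("limit:", norm.pdf(u - h) / norm.pdf(u))
```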
Simple lemma about continuity

Lemma. Assume that $\mu_h \ll \mu$ and denote $r_h = \frac{d\mu_h}{d\mu} \in L^1(\mu)$. Suppose $r_h$ has a continuous representative $\tilde r_h \in C(X)$, i.e., $r_h - \tilde r_h = 0$ in $L^1(\mu)$. Then it holds that
\[ \lim_{\epsilon \to 0} \frac{\mu_h(B_\epsilon(u))}{\mu(B_\epsilon(u))} = \tilde r_h(u) \]
for any $u \in X$.

Weak MAP estimate

Assumptions:
(A1) there exists a separable Banach space $E \subset D(\mu)$ such that $E$ is dense in $X$ and $\beta_h^\mu \in C(X)$ for any $h \in E$, and
(A2) for any $h \in E$ there exists $\epsilon > 0$ such that $\exp(\epsilon |\beta_h^\mu(\cdot)|) \in L^1(\mu)$.

Definition (H–Burger). We call a point $\hat u \in X$, $\hat u \in \operatorname{supp}(\mu)$, a weak MAP (wMAP) estimate if
\[ \frac{d\mu_h}{d\mu}(\hat u) = \lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(\hat u - h))}{\mu(B_\epsilon(\hat u))} \le 1 \]
for all $h \in E$.

Every MAP is a wMAP

Lemma. Every MAP estimate $\hat u$ is a weak MAP estimate.

Proof. The claim is trivial since
\[ \frac{d\mu_h}{d\mu}(\hat u) \le \lim_{\epsilon \to 0} \frac{M^\epsilon}{\mu(B_\epsilon(\hat u))} = 1 \]
for any $h \in E$.

Connection to a variational problem

A probability measure $\lambda$ on $\mathcal{B}(X)$ is called convex if, for all sets $A, B \in \mathcal{B}(X)$ and all $t \in [0, 1]$, one has
\[ \lambda(tA + (1-t)B) \ge \lambda(A)^t \lambda(B)^{1-t}. \]

Theorem (H–Burger).
(1) If $\hat u \in X$ is a weak MAP estimate of $\mu$, then $\beta_h^\mu(\hat u) = 0$ for all $h \in E$.
(2) Suppose that $\mu$ is convex and there exists $\tilde u \in X$ such that $\beta_h^\mu(\tilde u) = 0$ for all $h \in E$. Then $\tilde u$ is a weak MAP estimate.

Proof of (1). It follows from $\frac{d\mu_h}{d\mu}(\hat u) \le 1$ and the generalized Onsager–Machlup formula that
\[ \int_0^t \beta_h^\mu(\hat u - sh) \, ds = \int_0^1 \beta_{th}^\mu(\hat u - s' \cdot th) \, ds' \le 0 \]
for all $h \in E$ and $t \in \mathbb{R}$. By continuity we then have $\beta_h^\mu(\hat u) \le 0$. Now since $h, -h \in E \subset D(\mu)$ and by similar reasoning $\beta_{-h}^\mu(\hat u) \le 0$, we must have
\[ 0 \le -\beta_{-h}^\mu(\hat u) = \beta_h^\mu(\hat u) \le 0 \]
and the claim follows.

So what about our posterior measure?

Recall that
\[ \mu_{\mathrm{post}}(U \mid m) = \frac{1}{Z} \int_U \exp\Big( -\frac{1}{2} |Au - m|^2 \Big) \lambda(du). \]
If $\lambda$ is convex, then $\mu_{\mathrm{post}}$ is convex. Also, $D(\lambda) \subset D(\mu_{\mathrm{post}})$ and, writing $f(u) = Z^{-1} \exp(-\frac{1}{2}|Au - m|^2)$ for the density of $\mu_{\mathrm{post}}$ with respect to $\lambda$,
\[ d_h \mu_{\mathrm{post}} = f \cdot d_h \lambda + \partial_h f \cdot \lambda = \big( \beta_h^\lambda(\cdot) - \langle A \cdot {} - m, Ah \rangle_{\mathbb{R}^d} \big) f \lambda = \beta_h^{\mu_{\mathrm{post}}} \mu_{\mathrm{post}}. \]

Theorem. If $\lambda$ satisfies (A1) and (A2), then so does $\mu_{\mathrm{post}}$.

Proof. (A1) is clear. For (A2) we have
\[ \big\| \exp(\epsilon |\beta_h^{\mu_{\mathrm{post}}}(\cdot)|) \big\|_{L^1(\mu)} \le C \int_X \exp\big( \epsilon (C_1 |Au - m|_{\mathbb{R}^d} + |\beta_h^\lambda(u)|) \big) \exp\Big( -\frac{1}{2} |Au - m|^2 \Big) \lambda(du) \]
\[ \le \widetilde C \int_X \exp\big( -(|Au - m|_{\mathbb{R}^d} - C_2)^2 \big) \exp(\epsilon |\beta_h^\lambda(u)|) \, \lambda(du) \le \widetilde C \, \big\| \exp(\epsilon |\beta_h^\lambda(\cdot)|) \big\|_{L^1(\lambda)} \]
for suitable $\epsilon > 0$ and constants $C, \widetilde C, C_1, C_2 > 0$.

Weak MAP in Bayesian inversion

Theorem (H–Burger). Suppose $\lambda$ is convex and satisfies (A1) and (A2). Moreover, assume that there is a convex functional $J : X \to \mathbb{R} \cup \{\infty\}$, which is Fréchet differentiable in its domain $D(J)$, and $J'(u)$ has a bounded extension $J'(u) : E \to \mathbb{R}$ such that
\[ \beta_h^\lambda(u) = J'(u) h \]
for any $h \in E$ and any $u \in X$. Then $\hat u$ is a wMAP estimate if and only if $\hat u \in \operatorname*{arg\,min}_{u \in X} F(u)$, where
\[ F(u) = \frac{1}{2} |Au - m|_2^2 + J(u). \tag{1} \]


3. Example: Besov priors

Shortly about Besov spaces

Suppose $\{\psi_\ell\}_{\ell=1}^\infty$ forms an orthonormal wavelet basis for $L^2(\mathbb{T}^d)$.
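A hedged numerical sketch (not from the talk) of computing the wMAP through the variational characterization (1): after truncating the wavelet expansion, $F_{\mathrm{Besov}}$ becomes a finite-dimensional problem $\frac{1}{2}|Au - m|_2^2 + \sum_\ell w_\ell |u_\ell|^p$ in the coefficients, with weights $w_\ell \sim \ell^{sp + p/2 - 1}$ read off from (3) using $2^j \sim \ell$. The forward matrix and data below are synthetic stand-ins, and $p = 1.5$ keeps the penalty differentiable with $1 < p \le 2$, as the theorems above require.

```python
import numpy as np
from scipy.optimize import minimize

# Minimize F(u) = 0.5*|Au - m|_2^2 + sum_l w_l*|u_l|^p over truncated
# wavelet coefficients u, with Besov-type weights w_l ~ l^(s*p + p/2 - 1)
# (cf. (3) with 2^j ~ l). For p = 1.5 the penalty is convex and C^1.
rng = np.random.default_rng(1)
n, d, s, p = 20, 64, 1.0, 1.5
A = rng.standard_normal((n, d))              # synthetic forward map
m = rng.standard_normal(n)                   # synthetic data
w = np.arange(1, d + 1) ** (s * p + p / 2 - 1)

def F(u):
    r = A @ u - m
    return 0.5 * r @ r + np.sum(w * np.abs(u) ** p)

def gradF(u):
    # d/du_l of w_l*|u_l|^p is w_l*p*sign(u_l)*|u_l|^(p-1), continuous at 0
    return A.T @ (A @ u - m) + w * p * np.sign(u) * np.abs(u) ** (p - 1)

u_wmap = minimize(F, np.zeros(d), jac=gradF, method="L-BFGS-B").x
```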
We s (Td ) as follows: the series define Bpq f (x) = ∞ X c` ψ` (x) (2) `=1 s (Td ) if and only if belongs to Bpq j( 12 − p1 ) 2js 2 2j+1 X−1 1/p |c` |p ∈ `q (N). (3) `=2j s . We write Bps = Bpp T. Helin (UH) MAP estimates in Bayesian inversion June 2015 28 / 35 Besov priors Definition Let 1 ≤ p < ∞ and let (X` )∞ `=1 be independent identically distributed real-valued random variables with the probability density function Z p πX (x) = σp exp(−|x| ) p −1 exp(−|x| )dx with σp = . (4) R Let U be the random function U (x) = ∞ X ` − ds − 21 + p1 X` ψ` (x), x ∈ Td . `=1 Then we say that U is distributed according to a Besov prior in Bps . T. Helin (UH) MAP estimates in Bayesian inversion June 2015 29 / 35 Besov priors Theorem Assume that λ is a Besov prior in Bps . It holds that s+( 12 − p1 )d (1) D(λ) = B2 (Td ) for p > 1, ps−(p−1)t (2) exp(|βhλ |) ∈ L1 (λ) for any h ∈ Bp (Td ) and ps−(p−1)t (3) β̃hλ ∈ C(Bpt (Td )) for any h ∈ E = Bp (Td ) and 1 < p ≤ 2. Moreover, the weak MAP estimate of the inverse problem is obtained by minimizing functional 1 FBesov (u) = |Au − m|2 + kukpBps . 2 T. Helin (UH) MAP estimates in Bayesian inversion June 2015 30 / 35 Overview 1 Shortly on Bayesian inversion 2 Towards a general MAP estimate 3 Example: Besov priors 4 Bayes cost of MAP T. Helin (UH) MAP estimates in Bayesian inversion June 2015 31 / 35 Bayes Cost Formalism Consider a general estimator û : m 7→ û(m) for the inverse problem. The Bayes cost is defined by the expected cost, i.e., the average performance BCΨ (û) := E [Ψ(u, û(m))] Z Z = Ψ(u, û(m))πpost (u | m)du π(m)dm. The Bayes estimator ûΨ is the estimator which minimizes BCΨ (û). The CM estimate minimizes mean squared error ΨM SE (u, û) = |u − û|22 . The MAP is asymptotic Bayes estimator of ( 0, if |u − û|∞ ≤ δ, Ψδ (u, û) = 1 otherwise. T. Helin (UH) MAP estimates in Bayesian inversion June 2015 32 / 35 MAP as a proper Bayes estimator Definition For a convex functional J : Rn → R ∪ {∞}, the Bregman distance DJq (u, v) between u, v ∈ Rn for a subgradient q ∈ ∂J(v) is defined as DJq (u, v) = J(u) − J(v) − hq, u − vi, q ∈ ∂J(v). Suppose πpr (u) ∝ exp(−J(u)), where J : Rn → R ∪ {∞} is a Lipschitz-continuous convex functional and u 7→ |Au|22 + J(u) has at least linear growth at infinity. Define 1 ΨBrg (u, û) = |A(û − u)|22 + DJ (û, u). 2 Theorem (Burger–Lucka 2014) The MAP estimate is a Bayes estimator for ΨBrg . T. Helin (UH) MAP estimates in Bayesian inversion June 2015 33 / 35 Bregman distance and Bayes cost Define homogeneous Bregman distance e J (u, ·) = J(u) + βuλ (·) D in L1 (λ) for any u ∈ D(λ) ∩ D(J). Theorem Assume that µpost and λ are as in previous theorem. Then û is a weak MAP estimate if and only if it minimizes the functional Z 1 2 e |Au − Av|2 + DJ (u, v) µ(dv). G(u) = 2 X Moreover, we have Z Z e J (uM AP , v)µ(dv) ≤ e J (uCM , v)µ(dv). D D X T. Helin (UH) X MAP estimates in Bayesian inversion June 2015 34 / 35 Conclusions Infinite-dimensional Bayesian inverse problems contain many big open questions Studying differentiability of the posterior opens new avenues of research MAP estimates can be rigorously defined for certain class of non-Gaussian priors For more details: Helin, T. and Burger, M.: Maximum a posteriori probability estimates in infinite-dimensional Bayesian inverse problems, arXiv:1412.5816. T. Helin (UH) MAP estimates in Bayesian inversion June 2015 35 / 35