Bayesian Inference for Inverse Problems Using Surrogate Models
Dongbin Xiu, Department of Mathematics, Purdue University
Supported by AFOSR, DOE, NNSA, NSF

Overview
• Uncertainty quantification (UQ) and stochastic modeling
• Forward problem: uncertainty propagation
  - Brief introduction to generalized polynomial chaos (gPC)
• Inverse problem: Bayesian inference
  - Using generalized polynomial chaos (gPC)
• Back to the forward problem: issues and challenges
• Key issues:
  - Efficiency
  - Curse of dimensionality

Illustrative Example: Burgers' Equation
• Burgers' equation:
  $$ \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2}, \qquad u(-1) = 1, \quad u(1) = -1 $$
• Steady-state solution:
  $$ u(x) = -A \tanh\!\left[\frac{A}{2\nu}\,(x - z)\right], \qquad \text{where } u(z) = 0, \quad A = -\left.\frac{\partial u}{\partial x}\right|_{x=z} $$
[Figure: perturbed (u(-1) = 1.01) vs. unperturbed (u(-1) = 1) steady-state solutions and the mesh distribution, for ν = 0.05; a 1% boundary perturbation visibly shifts the transition layer location z.]

Effects of Uncertainty – "Supersensitivity"
• Burgers' equation:
  $$ \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2}, \qquad x \in [-1, 1] $$
• Boundary conditions:
  $$ u(-1) = 1 + \delta, \qquad u(1) = -1, \qquad \delta \sim U(0, 0.1) $$
[Figure: mean, standard deviation, and upper/lower bounds of the stochastic solution.]
(Xiu & Karniadakis, Int. J. Numer. Meth. Eng., vol. 61, 2004)

(Re-)Formulation of PDE: Input Parameterization
  $$ \frac{\partial u}{\partial t}(t, x) = \mathcal{L}(u) \;+\; \text{boundary/initial conditions} $$
• Goal: characterize the random inputs by a set of random variables with
  - a finite number of variables,
  - mutual independence.
• If the inputs are parameters:
  - Identify the (smallest) independent set
  - Prescribe probability distributions
• Else, if the inputs are fields/processes:
  - Approximate the field by a function of a finite number of RVs, e.g.,
    $$ a(x, \omega) \approx \mu_a(x) + \sum_{i=1}^{d} a_i(x)\, Z_i(\omega) $$
  - Well studied for Gaussian processes; under-developed for non-Gaussian processes
  - Examples: Karhunen-Loève expansion, spectral decomposition, etc.

The Reformulation
• Stochastic PDE:
  $$ \frac{\partial u}{\partial t}(t, x, Z) = \mathcal{L}(u) \;+\; \text{boundary/initial conditions} $$
• Solution:
  $$ u(t, x, Z) : [0, T] \times D \times \mathbb{R}^{n_Z} \to \mathbb{R} $$
• Uncertain inputs are characterized by $n_Z$ random variables $Z$
• The probability distribution of $Z$ is prescribed (a non-trivial task):
  $$ F_Z(s) = \Pr(Z \le s), \qquad s \in \mathbb{R}^{n_Z} $$

Generalized Polynomial Chaos (gPC)
• Focus on the dependence on $Z$: $\; u(t, x, Z) : \mathbb{R}^{n_Z} \to \mathbb{R}$
• $N$th-order gPC expansion:
  $$ u_N(Z) \in \mathbb{P}_N = \{\text{$n_Z$-variate polynomials of degree up to } N\}, \qquad \dim \mathbb{P}_N = \binom{n_Z + N}{N} $$
• Orthogonal basis, indexed by the multi-index $\mathbf{i} = (i_1, \ldots, i_{n_Z})$, $|\mathbf{i}| = i_1 + \cdots + i_{n_Z}$:
  $$ \mathbb{E}\big[\Phi_{\mathbf{i}}(Z)\,\Phi_{\mathbf{j}}(Z)\big] = \int \Phi_{\mathbf{i}}(z)\,\Phi_{\mathbf{j}}(z)\,\rho(z)\,dz = \delta_{\mathbf{i}\mathbf{j}} $$
• The expansion:
  $$ u_N(t, x, Z) = \sum_{k=0}^{N} \hat{u}_k(t, x)\,\Phi_k(Z) $$

gPC: Basis
• Expectation: $\mathbb{E}[g(Z)] = \int g(z)\,\rho(z)\,dz$
• Orthogonality: $\int \Phi_i(z)\Phi_j(z)\,\rho(z)\,dz = \mathbb{E}[\Phi_i(Z)\Phi_j(Z)] = \delta_{ij}$
• The weight $\rho$ matches the input distribution (basis suitably normalized):
  - Gaussian ↔ Hermite polynomials: $\int_{-\infty}^{\infty} \Phi_i(z)\Phi_j(z)\,e^{-z^2/2}\,dz = \delta_{ij}$
  - Gamma ↔ Laguerre polynomials: $\int_{0}^{\infty} \Phi_i(z)\Phi_j(z)\,e^{-z}\,dz = \delta_{ij}$
  - Beta (incl. uniform) ↔ Jacobi/Legendre polynomials: $\int_{-1}^{1} \Phi_i(z)\Phi_j(z)\,dz = \delta_{ij}$
(Xiu & Karniadakis, SISC, 2002)

gPC Basis: the Choices
• Orthogonality: $\int \Phi_i(z)\Phi_j(z)\,\rho(z)\,dz = \mathbb{E}[\Phi_i(Z)\Phi_j(Z)] = \delta_{ij}$
• Example: Hermite polynomials for $Z \sim N(0,1)$:
  $$ \Phi_0 = 1, \quad \Phi_1 = Z, \quad \Phi_2 = Z^2 - 1, \quad \Phi_3 = Z^3 - 3Z, \; \ldots $$
• Approximation of an arbitrary random variable: requires $L^2$ integrability
• Example: a uniform random variable expanded in Hermite polynomials
  - Convergence holds
  - But the expansion is non-optimal: a first-order Legendre expansion is exact

Stochastic Galerkin
• Galerkin method: seek
  $$ u_N(t, x, Z) = \sum_{k=0}^{N} \hat{u}_k(t, x)\,\Phi_k(Z) $$
  such that
  $$ \mathbb{E}\!\left[\frac{\partial u_N}{\partial t}(t, x, Z)\,\Phi_m(Z)\right] = \mathbb{E}\big[\mathcal{L}(u_N)\,\Phi_m(Z)\big], \qquad \forall\, m \le N $$
• The result:
  - The residual is orthogonal to the gPC space
  - A set of deterministic equations for the coefficients
  - The equations are usually coupled – requires a new solver
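To make the coupling concrete, here is a minimal sketch (a hypothetical scalar example of mine, not taken from the slides) of a stochastic Galerkin solve for $du/dt = -(k_0 + k_1 Z)\,u$ with $Z \sim N(0,1)$ and $u(0) = 1$. The Hermite recurrence $z\,\mathrm{He}_k = \mathrm{He}_{k+1} + k\,\mathrm{He}_{k-1}$ makes the projected system a tridiagonal set of coupled deterministic ODEs for the coefficients $\hat{u}_k$.

```python
import math
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical test problem: du/dt = -(k0 + k1*Z) u,  Z ~ N(0,1),  u(0) = 1.
# Expand u(t,Z) = sum_k uhat_k(t) He_k(Z). Projecting onto He_m and using
# z*He_k = He_{k+1} + k*He_{k-1} gives the coupled (tridiagonal) system
#   d uhat_m/dt = -k0*uhat_m - k1*(uhat_{m-1} + (m+1)*uhat_{m+1}).
k0, k1, N = 1.0, 0.3, 8

A = np.zeros((N + 1, N + 1))
for m in range(N + 1):
    A[m, m] = -k0
    if m >= 1:
        A[m, m - 1] = -k1
    if m < N:
        A[m, m + 1] = -k1 * (m + 1)

uhat0 = np.zeros(N + 1)
uhat0[0] = 1.0                      # deterministic initial condition u(0) = 1
sol = solve_ivp(lambda t, u: A @ u, (0.0, 1.0), uhat0, rtol=1e-10, atol=1e-12)
uhat = sol.y[:, -1]

mean = uhat[0]                      # E[u] is the zeroth coefficient
var = sum(math.factorial(k) * uhat[k]**2 for k in range(1, N + 1))  # E[He_k^2] = k!
print(mean, np.exp(-k0 + 0.5 * k1**2))   # exact mean at t = 1: exp(-k0 + k1^2/2)
```

Note the coupling: even a scalar random ODE becomes a system, which is why the Galerkin approach generally requires a new solver.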
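For the basis itself, here is a one-dimensional sketch (again illustrative, not from the slides) that computes the Hermite gPC coefficients of $u(Z) = e^Z$, $Z \sim N(0,1)$, by Gauss-Hermite quadrature; this is precisely the discrete projection used by stochastic collocation below. The name `gpc_project` is mine.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def gpc_project(u, N, n_quad=64):
    """Hermite gPC coefficients c_k = E[u(Z) He_k(Z)] / k!  for Z ~ N(0,1)."""
    z, w = hermegauss(n_quad)          # nodes/weights for the weight exp(-z^2/2)
    w = w / np.sqrt(2.0 * np.pi)       # rescale so the weights integrate the N(0,1) density
    uz = u(z)
    c = np.zeros(N + 1)
    for k in range(N + 1):
        ek = np.zeros(N + 1); ek[k] = 1.0
        c[k] = np.dot(w, uz * hermeval(z, ek)) / math.factorial(k)  # E[He_k^2] = k!
    return c

c = gpc_project(np.exp, N=8)           # u(Z) = e^Z; exact coefficients are sqrt(e)/k!
zt, wt = hermegauss(100)
wt = wt / np.sqrt(2.0 * np.pi)
u8 = hermeval(zt, c)                   # evaluate the degree-8 expansion
print(c[:4])                           # ~ [1.6487, 1.6487, 0.8244, 0.2748]
print(np.sqrt(np.dot(wt, (np.exp(zt) - u8)**2)))   # L2_rho error: spectrally small
```

Since the exact coefficients are known here, the printed values give a direct check of the spectral decay in $N$.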
Stochastic Collocation
• Collocation: satisfy the governing equations at selected nodes
  - Allows one to use existing deterministic codes repetitively
  - Let $Z^1, Z^2, \ldots, Z^Q \in \mathbb{R}^{n_Z}$ be a set of nodes/samples, then solve
    $$ \frac{\partial u}{\partial t}(t, x, Z^j) = \mathcal{L}(u), \qquad j = 1, \ldots, Q. $$
• Sampling (solution statistics only):
  - Random (Monte Carlo)
  - Deterministic (lattice rules, tensor grids, cubature)
• Stochastic collocation: construct polynomial approximations
  - Node selection is critical to efficiency and accuracy
  - More than sampling

Stochastic Collocation: Interpolation
• Definition: given a set of nodes and the solution ensemble, find $u_N$ in a proper polynomial space such that $u_N \approx u$ in a proper sense.
• Interpolation approach (Lagrange):
  $$ u_Q(Z) = \sum_{j=1}^{Q} u(Z^j)\,L_j(Z), \qquad L_i(Z^j) = \delta_{ij}, \quad 1 \le i, j \le Q $$
• Optimal nodal distributions in high dimensions:
  - Tensor grids: inefficient
  - Sparse grids: more efficient
(Xiu & Hesthaven, SIAM J. Sci. Comput., 2005)

Stochastic Collocation: Discrete Projection
• Orthogonal projection:
  $$ u_N(t, x, Z) = P_N u = \sum_{k=0}^{N} \hat{u}_k(t, x)\,\Phi_k(Z), \qquad \hat{u}_k = \mathbb{E}[u(Z)\Phi_k(Z)] = \int u(z)\,\Phi_k(z)\,\rho(z)\,dz $$
• Discrete projection (quadrature with nodes $Z^j$ and weights $\alpha_j$):
  $$ w_N(t, x, Z) = \sum_{k=0}^{N} \hat{w}_k(t, x)\,\Phi_k(Z), \qquad \hat{w}_k = \sum_{j=1}^{Q} u(t, x, Z^j)\,\Phi_k(Z^j)\,\alpha_j \approx \int u(z)\,\Phi_k(z)\,\rho(z)\,dz, $$
  with $\hat{w}_k(t, x) \to \hat{u}_k(t, x)$ as $Q \to \infty$
• Aliasing error: $\varepsilon_Q = \| u_N - w_N \|_{L^2_\rho}$

Inverse Parameter Estimation
• Solving a stochastic system requires the probability distribution of the parameters $Z$
• Prior distribution:
  - Information on the prior distribution is critical
  - It requires direct measurements of the parameters
  - No, or not enough, direct measurements? (Use experience/intuition …)
  - How can one take advantage of measurements of other variables?

Bayesian Inference for Parameter Estimation
• Setup:
  - Prior distribution of the parameters $Z$: $\pi(z)$
  - Forward problem simulation: $y = G(Z)$
  - Measurement/data: $d$
• Error model: $d = G(Z) + e$, with measurement error $e$
• Goal: to estimate the distribution of $Z$ --- the posterior distribution
• Posterior distribution: $\pi(Z \mid d) \propto \pi(d \mid Z)\,\pi(Z)$
• Likelihood function: $\pi(d \mid Z) = \prod_{i=1}^{n_d} \pi_{e_i}\!\big(d_i - G_i(Z)\big)$
• Notes:
  - The posterior is difficult to manipulate
  - Classical sampling approaches (MCMC, etc.) can be time consuming

Surrogate-based Bayesian Estimation
• Surrogate: $y_N = G_N(Z)$
• Approximate Bayes rule:
  $$ \pi_N(Z \mid d) \propto \pi_N(d \mid Z)\,\pi(Z), \qquad \text{where } \pi_N(d \mid Z) = \prod_{i=1}^{n_d} \pi_{e_i}\!\big(d_i - G_{N,i}(Z)\big) $$
• Properties:
  - Allows direct sampling with arbitrarily large sample sizes
  - No additional simulations --- forward problem solver only

Convergence Analysis
• Kullback-Leibler divergence: $D(\pi_1 \,\|\, \pi_2) = \int \pi_1(z)\,\log\frac{\pi_1(z)}{\pi_2(z)}\,dz$
• Basic assumption: the observation error is i.i.d. Gaussian
• Theorem. If the gPC expansion $G_N$ converges to $G$ in $L^2_{\pi_Z}$, then the posterior density $\pi_N^d$ converges to $\pi^d$ in the sense that
  $$ D\!\left(\pi_N^d \,\|\, \pi^d\right) \to 0, \qquad N \to \infty. $$
  Moreover, if
  $$ \big\| G_i(Z) - G_{N,i}(Z) \big\|_{L^2_{\pi_Z}} \le C N^{-\alpha}, \qquad 1 \le i \le n_d, \quad \alpha > 0, \quad C \text{ independent of } N, $$
  then for sufficiently large $N$,
  $$ D\!\left(\pi_N^d \,\|\, \pi^d\right) \lesssim N^{-\alpha}. $$
• Notes:
  - The fast (exponential) convergence rate is retained
  - So is slow convergence
(Marzouk & Xiu, Comm. Comput. Phys., vol. 6, 2009)
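Putting the pieces together, the following sketch (a contrived scalar example, not one of the applications below) builds a Legendre gPC surrogate $G_N$ by discrete projection and then evaluates the surrogate posterior $\pi_N(z \mid d)$ on a dense grid, for a uniform prior and Gaussian noise. The model $G(z) = \tanh(3z)$ and all names are illustrative; the point is that after the $Q$ forward solves used to build $G_N$, the Bayesian step needs no further simulations.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legval

def G(z):                        # hypothetical forward model on [-1, 1]
    return np.tanh(3.0 * z)      # stands in for an expensive simulation

# Surrogate by discrete projection: Z ~ U(-1,1), rho = 1/2, Legendre basis P_k,
# c_k = (2k+1) * sum_j G(z_j) P_k(z_j) alpha_j   (since E[P_k^2] = 1/(2k+1)).
N, Q = 12, 20
zq, wq = leggauss(Q)             # Gauss-Legendre nodes/weights on [-1, 1]
wq = wq / 2.0                    # rescale for the U(-1,1) density
coef = np.array([(2*k + 1) * np.dot(wq, G(zq) * legval(zq, np.eye(N + 1)[k]))
                 for k in range(N + 1)])
G_N = lambda z: legval(z, coef)  # the surrogate: cheap to evaluate anywhere

# Surrogate posterior: uniform prior, one noisy observation, Gaussian noise.
sigma = 0.05
d = G(0.3) + sigma * np.random.default_rng(0).normal()
zg = np.linspace(-1.0, 1.0, 2001)
post = np.exp(-0.5 * ((d - G_N(zg)) / sigma)**2)     # likelihood * (constant prior)
post /= np.trapz(post, zg)                           # normalize on the grid
print("posterior mean:", np.trapz(zg * post, zg))    # should sit near z = 0.3
```

With a smooth $G$, a modest $N$ already drives the surrogate error well below the noise level $\sigma$, which is the regime in which the theorem above guarantees an accurate posterior.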
Parameter Estimation: Supersensitivity Example
• Burgers' equation:
  $$ \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} = \nu\,\frac{\partial^2 u}{\partial x^2}, \qquad x \in [-1, 1] $$
• Boundary conditions:
  $$ u(-1) = 1 + \delta(Z), \qquad u(1) = -1, \qquad 0 < \delta \ll 1 $$
• Data: a noisy observation of the transition layer location (contrived numerically)
• Prior distribution is uniform: $Z \sim U(0, 0.1)$
• Measurement noise: $e \sim N(0, 0.05^2)$

Parameter Estimation: Step Function
• Assume the forward model is a step function
• Then the posterior distribution is discontinuous
• Gibbs oscillations appear
• Slow convergence with global gPC basis functions
[Figures: the exact forward solution $G(z)$ vs. its gPC approximation $G_N(z)$ (left); the exact posterior vs. the gPC posterior (right).]

Biological Application
• Example: estimate kinetic parameters in a genetic toggle switch
  - Differential-algebraic equation model from [Gardner et al., Nature, 2000]
  - Real experimental data: steady-state expression levels of one gene (v)
[Figure: normalized GFP expression vs. log10(IPTG).]
• 6 uncertain parameters
• Assumed to be independent and uniform in the forward problem
• Good agreement with the measurements --- but let's now use the data again

Using the Data Again – Parameter Estimation
• 1-parameter and 2-parameter marginal posterior densities
  (dim = 6, N = 4; 6-level sparse grid forward problem solver)

An (Almost) Industrial Application: Free-Cantilever Damping
• Gas damping of a freely vibrating cantilever
• MEMOSA for damping simulation
• 3-parameter marginal posterior densities
• Level-2 sparse grid in thickness, gas density, and frequency

Summary
Uncertainty analysis: to provide improved prediction
• Input characterization
• Uncertainty propagation
• Post-processing
Generalized polynomial chaos (gPC)
• Multivariate approximation theory
• A highly efficient uncertainty propagation strategy
• Makes many UQ tasks "post-processing"
Data, any data, can help
• UQ simulation needs to be dynamically data driven.

Support:
• AFOSR: FA9550-08-1-0353
• DOE: DE-FC52-08NA28617, DE-SC0005713
• NSF: DMS-0645035, IIS-0914447, IIS-1028291