Large Scale Sequential Experimental Design for Signal Acquisition Optimization Matthias Seeger mseeger@gmail.com 08/05/2015 Seeger Large Scale Bayesian Inference 08/05/2015 1 / 30 Outline 1 Motivation 2 Bayesian Experimental Design 3 Variational Bayesian Inference 4 Experimental Results 5 Outlook Seeger Large Scale Bayesian Inference 08/05/2015 2 / 30 Motivation Magnetic Resonance Imaging ⊕ ⊕ Extremely versatile Noninvasive, no ionizing radiation Very expensive Long scan times: Major limiting factor Seeger Large Scale Bayesian Inference 08/05/2015 3 / 30 Motivation Image Reconstruction y Measurement Xu Design Ideal Image u Data y Reconstruction Seeger Large Scale Bayesian Inference 08/05/2015 4 / 30 Motivation Sampling Optimization y Xu Reconstruction Seeger Large Scale Bayesian Inference 08/05/2015 5 / 30 Bayesian Experimental Design Posterior Distribution Likelihood P(y |u): Data fit Prior P(u): Signal properties P(y |u) Posterior distribution P(u|y ): Consistent information summary Seeger Large Scale Bayesian Inference 08/05/2015 6 / 30 Bayesian Experimental Design Posterior Distribution Likelihood P(y |u): Data fit Prior P(u): Signal properties P(y |u) × P(u) Posterior distribution P(u|y ): Consistent information summary Seeger Large Scale Bayesian Inference 08/05/2015 6 / 30 Bayesian Experimental Design Posterior Distribution Likelihood P(y |u): Data fit Prior P(u): Signal properties Posterior distribution P(u|y ): Consistent information summary Seeger Large Scale Bayesian Inference P(u|y ) = P(y |u) × P(u) P(y ) 08/05/2015 6 / 30 Bayesian Experimental Design Bayesian Experimental Design Posterior: Uncertainty in reconstruction Experimental design: Find poorly determined directions Sequential search with interjacent partial measurements Seeger Large Scale Bayesian Inference 08/05/2015 7 / 30 Bayesian Experimental Design Bayesian Experimental Design Posterior: Uncertainty in reconstruction Experimental design: Find poorly determined directions Sequential search with interjacent partial measurements Seeger Large Scale Bayesian Inference 08/05/2015 7 / 30 Bayesian Experimental Design Bayesian Experimental Design Bayesian Sequential Design How much do I know? P(u|y ), H = H[P(u|y )] How much would (x ∗ , y∗ ) help? I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )] How much will x ∗ help? I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )] Seeger Large Scale Bayesian Inference 08/05/2015 7 / 30 Bayesian Experimental Design Bayesian Experimental Design Bayesian Sequential Design How much do I know? P(u|y ), H = H[P(u|y )] How much would (x ∗ , y∗ ) help? I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )] How much will x ∗ help? I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )] Seeger Large Scale Bayesian Inference 08/05/2015 7 / 30 Bayesian Experimental Design Bayesian Experimental Design Bayesian Sequential Design How much do I know? P(u|y ), H = H[P(u|y )] How much would (x ∗ , y∗ ) help? I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )] How much will x ∗ help? I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )] Seeger Large Scale Bayesian Inference 08/05/2015 7 / 30 Bayesian Experimental Design Maximizing Information Gain Score design extension X ∗ by information gain: I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )] {z } | {z } | Before After Standard approach in literature: Assume: I(X ∗ ) tractable to compute Assume: I(X ∗ ) cheap to compute (many X ∗ ) Concentrate on combinatorial optimization aspects Seeger Large Scale Bayesian Inference 08/05/2015 8 / 30 Bayesian Experimental Design Maximizing Information Gain Score design extension X ∗ by information gain: I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )] {z } | {z } | Before After Standard approach in literature: Assume: I(X ∗ ) tractable to compute Assume: I(X ∗ ) cheap to compute (many X ∗ ) Concentrate on combinatorial optimization aspects Seeger Large Scale Bayesian Inference 08/05/2015 8 / 30 Bayesian Experimental Design Maximizing Information Gain Score design extension X ∗ by information gain: I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )] {z } | {z } | Before After Standard approach in literature: Assume: I(X ∗ ) tractable to compute Assume: I(X ∗ ) cheap to compute (many X ∗ ) Concentrate on combinatorial optimization aspects So is it . . . ? Seeger Large Scale Bayesian Inference 08/05/2015 8 / 30 Bayesian Experimental Design Challenges I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )] Assume: I(X ∗ ) tractable to compute. Only if P(u|y ) Gaussian . . . 0.08 0.06 0.04 0.02 0 2 0 −2 Seeger −2 −1 Large Scale Bayesian Inference 0 1 08/05/2015 9 / 30 Bayesian Experimental Design Image Statistics Whatever images are . . . they are not Gaussian! Seeger Large Scale Bayesian Inference 08/05/2015 10 / 30 Bayesian Experimental Design Challenges I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )] Assume: I(X ∗ ) tractable to compute? No: Needs approximate inference Assume: I(X ∗ ) cheap to compute (many X ∗ ). Seeger Large Scale Bayesian Inference 08/05/2015 11 / 30 Bayesian Experimental Design Size Does Matter 1 Global covariances Scores I(X ∗ ) need full CovP [u|y ] 2 Massive scale R131072 (just one slice). 3 Many times Posterior after each design extension Seeger Large Scale Bayesian Inference 08/05/2015 12 / 30 Bayesian Experimental Design Challenges I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )] Assume: I(X ∗ ) tractable to compute? No: Needs approximate inference Assume: I(X ∗ ) cheap to compute (many X ∗ )? No: Needs new algorithms and high performance computing Seeger Large Scale Bayesian Inference 08/05/2015 13 / 30 Variational Bayesian Inference Sparsity Priors Seeger courtesy Florian Steinke Large Scale Bayesian Inference 08/05/2015 14 / 30 Variational Bayesian Inference Sparse Linear Model Seeger Large Scale Bayesian Inference 08/05/2015 15 / 30 Variational Bayesian Inference Variational Bayesian Inference Approximate inference for non-Gaussian models Computations driven by Gaussian inference Seeger Large Scale Bayesian Inference 08/05/2015 16 / 30 Variational Bayesian Inference Variational Bayesian Inference P(u|y ) = P(y |u) × P(u) P(y ) Variational Inference Approximation Write intractable integration as optimization Relax to tractable optimization problem Seeger Large Scale Bayesian Inference 08/05/2015 17 / 30 Variational Bayesian Inference Variational Bayesian Inference P(u|y ) = P(y |u) × P(u) P(y ) Variational Relaxation: Bound the master function Z 1 − log P(y ) = − log P(u, y ) du ≤ min min φ(u ∗ , γ) 2 γ u∗ Approximate posterior P(u|y ) by Gaussian Integration ⇒ Convex optimization Seeger Large Scale Bayesian Inference 08/05/2015 17 / 30 Variational Bayesian Inference No Inference Without . . . Seeger Large Scale Bayesian Inference 08/05/2015 18 / 30 Variational Bayesian Inference Why Inference Algorithms Are Slow Seeger Large Scale Bayesian Inference 08/05/2015 19 / 30 Variational Bayesian Inference Decoupling by Concavity Z − log P(u, y ) du ≤ min min φ(u, γ) γ u Dependencies in posterior P(u|y ) ⇒ Difficult couplings in criterion φ Critically coupled part is concave Upper bound by tangent plane Seeger Large Scale Bayesian Inference 08/05/2015 20 / 30 Variational Bayesian Inference Decoupling by Concavity Z − log P(u, y ) du ≤ min min φ(u, γ) γ u Dependencies in posterior P(u|y ) ⇒ Difficult couplings in criterion φ Critically coupled part is concave Upper bound by tangent plane Seeger Large Scale Bayesian Inference 08/05/2015 20 / 30 Variational Bayesian Inference Decoupling by Concavity Z − log P(u, y ) du ≤ min min φ(u, γ) = min min min φz (u, γ) γ u z γ u {z } | Decoupled problem Dependencies in posterior P(u|y ) ⇒ Difficult couplings in criterion φ Critically coupled part is concave Upper bound by tangent plane Seeger Large Scale Bayesian Inference 08/05/2015 20 / 30 Variational Bayesian Inference Double Loop Algorithm Double loop algorithm Inner loop optimization: Standard MAP Estimation Outer loop update: Gaussian Variances Seeger Large Scale Bayesian Inference 08/05/2015 21 / 30 Variational Bayesian Inference Double Loop Algorithm Double loop algorithm Inner loop optimization: Standard MAP Estimation Outer loop update: Gaussian Variances Seeger Large Scale Bayesian Inference 08/05/2015 21 / 30 Variational Bayesian Inference Closing the Loop I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]] Gaussian prior P(u)? Things are simple Posterior P(u|y ) Gaussian as well Information gain tractable I(X ∗ ) = 21 log I + X ∗ A−1 X T∗ , A = CovP [u|y ]−1 ⇒ Posterior covariance Seeger Large Scale Bayesian Inference 08/05/2015 22 / 30 Variational Bayesian Inference Closing the Loop I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]] Gaussian prior P(u)? Things are simple Posterior P(u|y ) Gaussian as well Information gain tractable I(X ∗ ) = 12 log I + X ∗ A−1 X T∗ , A = CovQ [u|y ]−1 ⇒ Posterior covariance Image priors P(u) are not Gaussian . . . Variational approximation: Gaussian posterior Q(u|y ) ≈ P(u|y ) Seeger Large Scale Bayesian Inference 08/05/2015 22 / 30 Variational Bayesian Inference Variational Bayesian Experimental Design I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]] Given initial X , y . repeat Variational inference: Update γ s.t. Q(u|y ) ≈ P(u|y ). for X ∗ ∈ Xcand do Compute approximate information gain: I(X ∗ ) = 12 log |I + X ∗ A−1 X T∗ |, A = CovQ [u|y ]−1 . end for Pick score maximizer X ∗ . Acquire data y ∗ . Append X ∗ to X , y ∗ to y . until X has desired size Seeger Large Scale Bayesian Inference 08/05/2015 23 / 30 Variational Bayesian Inference Gaussian Covariances Elephant in the room . . . Covariance matrix CovQ [u|y ] = A−1 A = σ −2 X H X + B T Γ−1 B 1 Outer loop updates: z ← diag(BA−1 B T ) 2 Experimental design scores: I(X ∗ ) = Seeger 1 2 log |I + σ −2 X ∗ A−1 X H ∗| Large Scale Bayesian Inference 08/05/2015 24 / 30 Variational Bayesian Inference Gaussian Covariances CovQ [u|y ]−1 = A = σ −2 X H X + B T Γ−1 B Some fields care about them Electronic structure calculations Uncertainty quantifications for PDEs Gaussian MRFs for remote sensing Simple trick: Perturb&MAP Papandreou, Yuille, NIPS 2010 w l ∼ N(0, A) [simple], q l = A−1 w l ∼ N(0, A−1 ), X CovQ [u|y ] ≈ q l (q l )H l One linear system per sample (parallelizable) Noisy, but unbiased Seeger Large Scale Bayesian Inference 08/05/2015 25 / 30 Variational Bayesian Inference Gaussian Covariances CovQ [u|y ]−1 = A = σ −2 X H X + B T Γ−1 B Some fields care about them Electronic structure calculations Uncertainty quantifications for PDEs Gaussian MRFs for remote sensing Simple trick: Perturb&MAP Papandreou, Yuille, NIPS 2010 w l ∼ N(0, A) [simple], q l = A−1 w l ∼ N(0, A−1 ), X CovQ [u|y ] ≈ q l (q l )H l One linear system per sample (parallelizable) Noisy, but unbiased Seeger Large Scale Bayesian Inference 08/05/2015 25 / 30 Experimental Results Optimizing Cartesian MRI Bayes Optim. Seeger et.al., MRM 63(1), 2010 Seeger VD Random Lustig, Donoho, Pauli, MRM 58(6), 2007 Large Scale Bayesian Inference Low Pass Common MRI practice 08/05/2015 26 / 30 Experimental Results Experimental Results: Test Set Errors L2 reconstruction error sagittal short TE sagittal long TE Low Pass VD Random Bayes Optim 7 6 5 4 3 2 1 L2 reconstruction error axial short TE axial long TE 7 6 5 4 3 2 1 80 Seeger 100 120 140 Number phase encodes 160 80 Large Scale Bayesian Inference 100 120 140 Number phase encodes 08/05/2015 160 27 / 30 Outlook Large Scale Bayesian Inference Computer Vision Hierarchically structured image priors Learning Image Models (fields of experts, . . . ) Bayesian dictionary learning Intelligent user interfaces (Bayesian active learning) Ko, Seeger, ICML 2012 Advanced variational inference Speeding up expectation propagation Seeger, Nickisch, AISTATS 2011 Generic framework You can do penalized estimation efficiently? You can do variational Bayesian inference! Seeger Large Scale Bayesian Inference 08/05/2015 28 / 30 Outlook Large Scale Bayesian Inference Computer Vision Hierarchically structured image priors Learning Image Models (fields of experts, . . . ) Bayesian dictionary learning Intelligent user interfaces (Bayesian active learning) Ko, Seeger, ICML 2012 Advanced variational inference Speeding up expectation propagation Seeger, Nickisch, AISTATS 2011 Generic framework You can do penalized estimation efficiently? You can do variational Bayesian inference! Seeger Large Scale Bayesian Inference 08/05/2015 28 / 30 Outlook Large Scale Bayesian Inference Computer Vision Hierarchically structured image priors Learning Image Models (fields of experts, . . . ) Bayesian dictionary learning Intelligent user interfaces (Bayesian active learning) Ko, Seeger, ICML 2012 Advanced variational inference Speeding up expectation propagation Seeger, Nickisch, AISTATS 2011 Generic framework You can do penalized estimation efficiently? You can do variational Bayesian inference! Seeger Large Scale Bayesian Inference 08/05/2015 28 / 30 Outlook People& Code glm-ie: Toolbox by Hannes Nickisch mloss.org/software/view/269/ Generalized sparse linear models MAP reconstruction and variational Bayesian inference (double loop algorithm for super-Gaussian bounding) Matlab 7.x, GNU Octave 3.2.x Hannes Nickisch (now Philips Research, Hamburg) Rolf Pohmann, Bernhard Schölkopf (MPI Tübingen) Young Jun Ko (EPFL) Emtiyaz Khan (EPFL) Seeger Large Scale Bayesian Inference 08/05/2015 29 / 30 Outlook References Seeger, Nickisch Large Scale Bayesian Inference and Experimental Design for Sparse Linear Models SIAM Journal on Imaging Science 4(1), 2011 Seeger, Nickisch, Pohmann, Schölkopf Optimization of k-Space Trajectories for Compressed Sensing by Bayesian Experimental Design Magnetic Resonance in Medicine 63(1), 2010 Seeger, Wipf Variational Bayesian Inference Techniques IEEE Signal Processing Magazine 27(6), 2010 Seeger, Nickisch Compressed Sensing and Bayesian Experimental Design International Conference on Machine Learning (ICML), 2008 Seeger Large Scale Bayesian Inference 08/05/2015 30 / 30