Large Scale Sequential Experimental Design for Signal Acquisition Optimization Matthias Seeger 08/05/2015

advertisement
Large Scale Sequential Experimental Design
for Signal Acquisition Optimization
Matthias Seeger
mseeger@gmail.com
08/05/2015
Seeger
Large Scale Bayesian Inference
08/05/2015
1 / 30
Outline
1
Motivation
2
Bayesian Experimental Design
3
Variational Bayesian Inference
4
Experimental Results
5
Outlook
Seeger
Large Scale Bayesian Inference
08/05/2015
2 / 30
Motivation
Magnetic Resonance Imaging
⊕
⊕
Extremely versatile
Noninvasive, no ionizing radiation
Very expensive
Long scan times: Major limiting factor
Seeger
Large Scale Bayesian Inference
08/05/2015
3 / 30
Motivation
Image Reconstruction
y
Measurement
Xu
Design
Ideal Image u
Data y
Reconstruction
Seeger
Large Scale Bayesian Inference
08/05/2015
4 / 30
Motivation
Sampling Optimization
y
Xu
Reconstruction
Seeger
Large Scale Bayesian Inference
08/05/2015
5 / 30
Bayesian Experimental Design
Posterior Distribution
Likelihood P(y |u): Data fit
Prior P(u): Signal properties
P(y |u)
Posterior distribution P(u|y ):
Consistent information summary
Seeger
Large Scale Bayesian Inference
08/05/2015
6 / 30
Bayesian Experimental Design
Posterior Distribution
Likelihood P(y |u): Data fit
Prior P(u): Signal properties
P(y |u) × P(u)
Posterior distribution P(u|y ):
Consistent information summary
Seeger
Large Scale Bayesian Inference
08/05/2015
6 / 30
Bayesian Experimental Design
Posterior Distribution
Likelihood P(y |u): Data fit
Prior P(u): Signal properties
Posterior distribution P(u|y ):
Consistent information summary
Seeger
Large Scale Bayesian Inference
P(u|y ) =
P(y |u) × P(u)
P(y )
08/05/2015
6 / 30
Bayesian Experimental Design
Bayesian Experimental Design
Posterior: Uncertainty in
reconstruction
Experimental design:
Find poorly determined
directions
Sequential search with
interjacent partial
measurements
Seeger
Large Scale Bayesian Inference
08/05/2015
7 / 30
Bayesian Experimental Design
Bayesian Experimental Design
Posterior: Uncertainty in
reconstruction
Experimental design:
Find poorly determined
directions
Sequential search with
interjacent partial
measurements
Seeger
Large Scale Bayesian Inference
08/05/2015
7 / 30
Bayesian Experimental Design
Bayesian Experimental Design
Bayesian Sequential Design
How much do I know?
P(u|y ), H = H[P(u|y )]
How much would (x ∗ , y∗ ) help?
I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )]
How much will x ∗ help?
I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )]
Seeger
Large Scale Bayesian Inference
08/05/2015
7 / 30
Bayesian Experimental Design
Bayesian Experimental Design
Bayesian Sequential Design
How much do I know?
P(u|y ), H = H[P(u|y )]
How much would (x ∗ , y∗ ) help?
I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )]
How much will x ∗ help?
I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )]
Seeger
Large Scale Bayesian Inference
08/05/2015
7 / 30
Bayesian Experimental Design
Bayesian Experimental Design
Bayesian Sequential Design
How much do I know?
P(u|y ), H = H[P(u|y )]
How much would (x ∗ , y∗ ) help?
I(x ∗ , y∗ ) = H − H[P(u|y , y∗ )]
How much will x ∗ help?
I(x ∗ ) = EP(y∗ |y ) [I(x ∗ , y∗ )]
Seeger
Large Scale Bayesian Inference
08/05/2015
7 / 30
Bayesian Experimental Design
Maximizing Information Gain
Score design extension X ∗ by information gain:
I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )]
{z
}
| {z } |
Before
After
Standard approach in literature:
Assume: I(X ∗ ) tractable to compute
Assume: I(X ∗ ) cheap to compute (many X ∗ )
Concentrate on combinatorial optimization aspects
Seeger
Large Scale Bayesian Inference
08/05/2015
8 / 30
Bayesian Experimental Design
Maximizing Information Gain
Score design extension X ∗ by information gain:
I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )]
{z
}
| {z } |
Before
After
Standard approach in literature:
Assume: I(X ∗ ) tractable to compute
Assume: I(X ∗ ) cheap to compute (many X ∗ )
Concentrate on combinatorial optimization aspects
Seeger
Large Scale Bayesian Inference
08/05/2015
8 / 30
Bayesian Experimental Design
Maximizing Information Gain
Score design extension X ∗ by information gain:
I(X ∗ ) = I(y ∗ , u|y ) = H[P(u|y )] − H[P(u|y ∗ , y )]
{z
}
| {z } |
Before
After
Standard approach in literature:
Assume: I(X ∗ ) tractable to compute
Assume: I(X ∗ ) cheap to compute (many X ∗ )
Concentrate on combinatorial optimization aspects
So is it . . . ?
Seeger
Large Scale Bayesian Inference
08/05/2015
8 / 30
Bayesian Experimental Design
Challenges
I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )]
Assume: I(X ∗ ) tractable to compute.
Only if P(u|y ) Gaussian . . .
0.08
0.06
0.04
0.02
0
2
0
−2
Seeger
−2
−1
Large Scale Bayesian Inference
0
1
08/05/2015
9 / 30
Bayesian Experimental Design
Image Statistics
Whatever images are . . .
they are not Gaussian!
Seeger
Large Scale Bayesian Inference
08/05/2015
10 / 30
Bayesian Experimental Design
Challenges
I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )]
Assume: I(X ∗ ) tractable to compute?
No: Needs approximate inference
Assume: I(X ∗ ) cheap to compute (many X ∗ ).
Seeger
Large Scale Bayesian Inference
08/05/2015
11 / 30
Bayesian Experimental Design
Size Does Matter
1
Global covariances
Scores I(X ∗ ) need full CovP [u|y ]
2
Massive scale
R131072 (just one slice).
3
Many times
Posterior after each design extension
Seeger
Large Scale Bayesian Inference
08/05/2015
12 / 30
Bayesian Experimental Design
Challenges
I(X ∗ ) = H[P(u|y )] − H[P(u|y ∗ , y )]
Assume: I(X ∗ ) tractable to compute?
No: Needs approximate inference
Assume: I(X ∗ ) cheap to compute (many X ∗ )?
No: Needs new algorithms and high performance computing
Seeger
Large Scale Bayesian Inference
08/05/2015
13 / 30
Variational Bayesian Inference
Sparsity Priors
Seeger
courtesy Florian Steinke
Large Scale Bayesian Inference
08/05/2015
14 / 30
Variational Bayesian Inference
Sparse Linear Model
Seeger
Large Scale Bayesian Inference
08/05/2015
15 / 30
Variational Bayesian Inference
Variational Bayesian Inference
Approximate inference for non-Gaussian models
Computations driven by Gaussian inference
Seeger
Large Scale Bayesian Inference
08/05/2015
16 / 30
Variational Bayesian Inference
Variational Bayesian Inference
P(u|y ) =
P(y |u) × P(u)
P(y )
Variational Inference Approximation
Write intractable integration as optimization
Relax to tractable optimization problem
Seeger
Large Scale Bayesian Inference
08/05/2015
17 / 30
Variational Bayesian Inference
Variational Bayesian Inference
P(u|y ) =
P(y |u) × P(u)
P(y )
Variational Relaxation: Bound the master function
Z
1
− log P(y ) = − log P(u, y ) du ≤ min min φ(u ∗ , γ)
2 γ u∗
Approximate posterior P(u|y ) by Gaussian
Integration ⇒ Convex optimization
Seeger
Large Scale Bayesian Inference
08/05/2015
17 / 30
Variational Bayesian Inference
No Inference Without . . .
Seeger
Large Scale Bayesian Inference
08/05/2015
18 / 30
Variational Bayesian Inference
Why Inference Algorithms Are Slow
Seeger
Large Scale Bayesian Inference
08/05/2015
19 / 30
Variational Bayesian Inference
Decoupling by Concavity
Z
− log
P(u, y ) du ≤ min min φ(u, γ)
γ
u
Dependencies in posterior P(u|y )
⇒ Difficult couplings in criterion φ
Critically coupled part is concave
Upper bound by tangent plane
Seeger
Large Scale Bayesian Inference
08/05/2015
20 / 30
Variational Bayesian Inference
Decoupling by Concavity
Z
− log
P(u, y ) du ≤ min min φ(u, γ)
γ
u
Dependencies in posterior P(u|y )
⇒ Difficult couplings in criterion φ
Critically coupled part is concave
Upper bound by tangent plane
Seeger
Large Scale Bayesian Inference
08/05/2015
20 / 30
Variational Bayesian Inference
Decoupling by Concavity
Z
− log
P(u, y ) du ≤ min min φ(u, γ) = min min min φz (u, γ)
γ
u
z
γ
u
{z
}
|
Decoupled problem
Dependencies in posterior P(u|y )
⇒ Difficult couplings in criterion φ
Critically coupled part is concave
Upper bound by tangent plane
Seeger
Large Scale Bayesian Inference
08/05/2015
20 / 30
Variational Bayesian Inference
Double Loop Algorithm
Double loop algorithm
Inner loop optimization:
Standard MAP Estimation
Outer loop update:
Gaussian Variances
Seeger
Large Scale Bayesian Inference
08/05/2015
21 / 30
Variational Bayesian Inference
Double Loop Algorithm
Double loop algorithm
Inner loop optimization:
Standard MAP Estimation
Outer loop update:
Gaussian Variances
Seeger
Large Scale Bayesian Inference
08/05/2015
21 / 30
Variational Bayesian Inference
Closing the Loop
I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]]
Gaussian prior P(u)? Things are simple
Posterior P(u|y ) Gaussian as well
Information gain tractable
I(X ∗ ) = 21 log I + X ∗ A−1 X T∗ ,
A = CovP [u|y ]−1
⇒ Posterior covariance
Seeger
Large Scale Bayesian Inference
08/05/2015
22 / 30
Variational Bayesian Inference
Closing the Loop
I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]]
Gaussian prior P(u)? Things are simple
Posterior P(u|y ) Gaussian as well
Information gain tractable
I(X ∗ ) = 12 log I + X ∗ A−1 X T∗ ,
A = CovQ [u|y ]−1
⇒ Posterior covariance
Image priors P(u) are not Gaussian . . .
Variational approximation:
Gaussian posterior Q(u|y ) ≈ P(u|y )
Seeger
Large Scale Bayesian Inference
08/05/2015
22 / 30
Variational Bayesian Inference
Variational Bayesian Experimental Design
I(X ∗ ) = EP(y ∗ |y ) [H[P(u|y )] − H[P(u|y , y ∗ )]]
Given initial X , y .
repeat
Variational inference: Update γ s.t. Q(u|y ) ≈ P(u|y ).
for X ∗ ∈ Xcand do
Compute approximate information gain:
I(X ∗ ) = 12 log |I + X ∗ A−1 X T∗ |, A = CovQ [u|y ]−1 .
end for
Pick score maximizer X ∗ . Acquire data y ∗ .
Append X ∗ to X , y ∗ to y .
until X has desired size
Seeger
Large Scale Bayesian Inference
08/05/2015
23 / 30
Variational Bayesian Inference
Gaussian Covariances
Elephant in the room . . .
Covariance matrix
CovQ [u|y ] = A−1
A = σ −2 X H X + B T Γ−1 B
1
Outer loop updates: z ← diag(BA−1 B T )
2
Experimental design scores: I(X ∗ ) =
Seeger
1
2
log |I + σ −2 X ∗ A−1 X H
∗|
Large Scale Bayesian Inference
08/05/2015
24 / 30
Variational Bayesian Inference
Gaussian Covariances
CovQ [u|y ]−1 = A = σ −2 X H X + B T Γ−1 B
Some fields care about them
Electronic structure calculations
Uncertainty quantifications for PDEs
Gaussian MRFs for remote sensing
Simple trick: Perturb&MAP
Papandreou, Yuille, NIPS 2010
w l ∼ N(0, A) [simple], q l = A−1 w l ∼ N(0, A−1 ),
X
CovQ [u|y ] ≈
q l (q l )H
l
One linear system per sample (parallelizable)
Noisy, but unbiased
Seeger
Large Scale Bayesian Inference
08/05/2015
25 / 30
Variational Bayesian Inference
Gaussian Covariances
CovQ [u|y ]−1 = A = σ −2 X H X + B T Γ−1 B
Some fields care about them
Electronic structure calculations
Uncertainty quantifications for PDEs
Gaussian MRFs for remote sensing
Simple trick: Perturb&MAP
Papandreou, Yuille, NIPS 2010
w l ∼ N(0, A) [simple], q l = A−1 w l ∼ N(0, A−1 ),
X
CovQ [u|y ] ≈
q l (q l )H
l
One linear system per sample (parallelizable)
Noisy, but unbiased
Seeger
Large Scale Bayesian Inference
08/05/2015
25 / 30
Experimental Results
Optimizing Cartesian MRI
Bayes Optim.
Seeger et.al., MRM 63(1), 2010
Seeger
VD Random
Lustig, Donoho, Pauli, MRM 58(6), 2007
Large Scale Bayesian Inference
Low Pass
Common MRI practice
08/05/2015
26 / 30
Experimental Results
Experimental Results: Test Set Errors
L2 reconstruction error
sagittal short TE
sagittal long TE
Low Pass
VD Random
Bayes Optim
7
6
5
4
3
2
1
L2 reconstruction error
axial short TE
axial long TE
7
6
5
4
3
2
1
80
Seeger
100
120
140
Number phase encodes
160
80
Large Scale Bayesian Inference
100
120
140
Number phase encodes
08/05/2015
160
27 / 30
Outlook
Large Scale Bayesian Inference
Computer Vision
Hierarchically structured image priors
Learning Image Models (fields of experts, . . . )
Bayesian dictionary learning
Intelligent user interfaces (Bayesian active learning)
Ko, Seeger, ICML 2012
Advanced variational inference
Speeding up expectation propagation
Seeger, Nickisch, AISTATS 2011
Generic framework
You can do penalized estimation efficiently?
You can do variational Bayesian inference!
Seeger
Large Scale Bayesian Inference
08/05/2015
28 / 30
Outlook
Large Scale Bayesian Inference
Computer Vision
Hierarchically structured image priors
Learning Image Models (fields of experts, . . . )
Bayesian dictionary learning
Intelligent user interfaces (Bayesian active learning)
Ko, Seeger, ICML 2012
Advanced variational inference
Speeding up expectation propagation
Seeger, Nickisch, AISTATS 2011
Generic framework
You can do penalized estimation efficiently?
You can do variational Bayesian inference!
Seeger
Large Scale Bayesian Inference
08/05/2015
28 / 30
Outlook
Large Scale Bayesian Inference
Computer Vision
Hierarchically structured image priors
Learning Image Models (fields of experts, . . . )
Bayesian dictionary learning
Intelligent user interfaces (Bayesian active learning)
Ko, Seeger, ICML 2012
Advanced variational inference
Speeding up expectation propagation
Seeger, Nickisch, AISTATS 2011
Generic framework
You can do penalized estimation efficiently?
You can do variational Bayesian inference!
Seeger
Large Scale Bayesian Inference
08/05/2015
28 / 30
Outlook
People& Code
glm-ie: Toolbox by Hannes Nickisch
mloss.org/software/view/269/
Generalized sparse linear models
MAP reconstruction and variational Bayesian
inference (double loop algorithm for
super-Gaussian bounding)
Matlab 7.x, GNU Octave 3.2.x
Hannes Nickisch (now Philips Research, Hamburg)
Rolf Pohmann, Bernhard Schölkopf (MPI Tübingen)
Young Jun Ko (EPFL)
Emtiyaz Khan (EPFL)
Seeger
Large Scale Bayesian Inference
08/05/2015
29 / 30
Outlook
References
Seeger, Nickisch
Large Scale Bayesian Inference and Experimental Design for Sparse Linear Models
SIAM Journal on Imaging Science 4(1), 2011
Seeger, Nickisch, Pohmann, Schölkopf
Optimization of k-Space Trajectories for Compressed Sensing by Bayesian Experimental Design
Magnetic Resonance in Medicine 63(1), 2010
Seeger, Wipf
Variational Bayesian Inference Techniques
IEEE Signal Processing Magazine 27(6), 2010
Seeger, Nickisch
Compressed Sensing and Bayesian Experimental Design
International Conference on Machine Learning (ICML), 2008
Seeger
Large Scale Bayesian Inference
08/05/2015
30 / 30
Download