Linear and non-linear Bayesian models of 3-D Confocal Microscope Movies of Biofilms

advertisement
Linear and non-linear Bayesian
models of 3-D Confocal
Microscope Movies of Biofilms
Al Parker
MUQ, Missoula, MT
June 26, 2015
Talk Outline:
1. Describe the problem of estimating biofilm characteristics from confocal scanning
laser microscope (CSLM) movies using a linear statistical model of biofilm
heights.
2. Apply polynomial accelerated iterative samplers to
solve this large (> 108-dim) Bayesian linear inverse
problem.
3. Overcome some limitations of the linear approach
by constructing a non-linear model.
Center for Biofilm Engineering
Betsey
Pitts
www.biofilm.montana.edu
Phil
Stewart
In vitro methods to grow relevant biofilms
Rotating Disk
Reactor
Moderate shear
- CSTR ASTM
Method E2196
Drip Flow
Biofilm Reactor
Low shear
- Plug flow ASTM
Method E2647
CDC Biofilm
Reactor
MBEC Assay
High shear
- CSTR ASTM
Method E2562
Gentle shear
- Batch ASTM
Method E2799
“Traditional way” to quantify microbial abundances
http://www.hypertextbookshop.com/biofilmbook/v004/r003/
Colony forming units (CFU)
Microbiologists in my lab usually want to kill biofilms:
3
Michaelis-Menten,
Monod,
lethality model:
š‘‰
š‘
āˆ†abundance = logā”(CFUā”ratio) = š¾š‘šš‘Žš‘„
+š‘
š‘š
2
1
Lauchnor et al., JAM, 2015
Inference via the posterior
Check model fit via residuals
Using confocal images to quantify microbial abundances
CDC Biofilm Reactor
After growing a Staph. aureus
biofilm that has been
genetically modified to express
a green fluorescent protein …
… apply real-time,
microscopy-based analysis of
biofilm accumulation.
High shear
- CSTR ASTM
Method E2562
The confocal microscope
Cartoon from A. Constans, The Confocal Microscope, The
Scientist, November 22, 2004, http://www.thescientist.com/?articles.view/articleNo/16043/title/TheConfocal-Microscope/
The data:
The biofilm movies, y є R4.5 x 10^8, are stored as uint8 (i.e., 0255), with dimension calculated by:
(
planar slice
) x (depth)
x (time)
= (620μm x 620μm) x (119μm)
x (4 f/m over hours)
=(
512 pixels
)2 x (17 z-slices) x (100’s)
= 5122 x 17 x 100
= 4.5 x 108
Linear model assumptions:
1. The height of the biofilm is modeled as a surface over a 2-D domain at each
time point (Sheppard & Shotton 1997), i.e.,
z є R512^2 = R2.6 x 10^5
Independent
cryosectioning of biofilm
suggests that there is
biofilm all the way through
Linear model assumptions:
1. The height of the biofilm is modeled as a surface over a 2-D domain at each
time point (Sheppard & Shotton 1997), i.e.,
z є R512^2 = R2.6 x 10^5
Linear model assumptions:
2. The biofilm’s height changes smoothly: each point on the biofilm surface is
assumed to be dependent on its neighbors (1st order), and conditionally independent
the rest of the surface
Figure 14 from D. Higdon, A primer on space-time
modelling from a Bayesian perspective, in Statistical
Methods for Spatio-Temporal Systems, B.
Finkenstadt, L. Held, and V. Isham, eds., Chapman
& Hall/CRC, Boca Raton, FL, 2007, pp. 217–279.
3. Assume a locally linear structure:
z(i) | (neighbors) ~ N(mean of the neighbors, 1/ [λz(# of neighbors)])
The linear statistical model:
The assumptions 1-3 are encapsulated by the prior in the following
locally linear Gauss-Markov Random Field:
y = Az + ε, , A = I, ε ~ N(0,λyI);
likelihood:
priors:
y|z
z
λy
λz
~ N(z, λyI);
~ N(0, λzW-1);
~ Gamma(ay, by)
~ Gamma(az, bz)
W is a Laplacian (a sparse singular precision matrix)
with bandwidth b = 512:
There are 3 parameters to estimate at each time t:
* true image:
* SD of the data:
* SD of the prior:
z є R512^2 = R2.6 x 10^5
1/sqrt(λy)
1/sqrt(λz)
W=
Estimation procedure:
The posterior is straightforward to calculate:
Page 32 from D. Higdon, A primer on space-time
modelling from a Bayesian perspective, in Statistical
Methods for Spatio-Temporal Systems, B.
Finkenstadt, L. Held, and V. Isham, eds., Chapman
& Hall/CRC, Boca Raton, FL, 2007, pp. 217–279.
To get a sample from the posterior, after sampling λy and λz, it is expensive to sample this
R2.6 x 10^5 Gaussian (this is an expensive step for any linear Bayesian model y = Az + ε) :
Apply iterative samplers from a MASSIVE Gaussian N(µ, Λ-1)
Solving Λx=b:
Gauss-Seidel
Chebyshev-GS
CG
Sampling y ~ N(µ, Λ-1):
Gibbs
Chebyshev-Gibbs
CG-Lanczos sampler
The sampler error decreases according to a polynomial,
(E(yk) - µ)= Pk(I-G) (E(y0) – µ)
(Λ -1 - Var(yk)) = Pk(I-G) (Λ-1 - Var(y0)) Pk(I-G)T
Gibbs
Pk(I-G) =
for any Krylov vector v
CG-Lanczos
Chebyshev-Gibbs
Gk,
with error reduction
factor
p(G)2
(Λ-1 - Var(yk))v = 0
Pk(I-G)
Var(yk) is the kth
order
CG polynomial
kth order Chebyshev
polyomial,
optimal variance asymptotic
average reduction factor is
ļƒ¦ 1 ļ€­ cond ( I ļ€­ G ) ļƒ¶
2
ļƒ·
ļ³ ļ€½ ļƒ§ļƒ§
ļƒ·
ļƒØ 1 ļ€« cond ( I ļ€­ G ) ļƒø
2
converges in a finite
number of steps* in a
Krylov space
depending on eig(I-G)
In theory and on a computer (finite precision),
a Chebyshev accelerated sampler is faster than a Gibbs sampler
but slower than a Cholesky sampler
Example: In 10x10 domain, N(µ,
Covariance
matrix
convergence,
||A-1 – Var(yk)||2 /||A-1 ||2
Benchmark for cost in
finite precision is the cost
of a Cholesky
factorization
Benchmark for
convergence in
finite precision is
105 Cholesky samples
-1
) in R100
… but in R2.6 x 10^5,
iterative sampling is much faster than Cholesky sampling
Each Gaussian sample of a single movie frame costs:
1. 6.9x1010 flops for a Cholesky factorization
[O(b2n) = O(5124)]
2. 9.7x107 flops for an iterative Chebyshev sampler
[O(#iters x 2n) = O(185x2x5122)]
3. 2.2x107 flops for an iterative CG sampler
[O(#iters x n) = O(85x5122)]
These iterative samplers of
N(0, Λ-1) are of the form:
1. Split Λ = M - N for M invertible.
2. Sample ck ~ N(0, (2-vk)/vk ( (2 – uk)/ uk M + N)
3. xk+1 = (1- vk) xk-1 + vk xk + vk uk M-1 (b- Λ xk)
4. yk+1 = (1- vk) yk-1 + vk yk + vk uk M-1 (ck - Λyk)
5. Check for convergence:
Quit if ||b - Λ xk+1 || is small.
Otherwise, update linear solver parameters vk and uk, go to step 2.
Examples of matrix splittings …
yk+1 = (1- vk) yk-1 + vk yk + vk uk M-1 (ck - Λ yk)
ck ~ N(0, (2-vk)/vk ( (2 – uk)/ uk MT + N)
Gibbs
MGS = D + L,
vk = uk = 1
Chebyshev-Gibbs
M = MGS D MTGS,
vk and uk are functions
of the 2 extreme
eigenvalues of
I-G=M-1Λ
CG-Lanczos
M = I,
vk , uk are
functions of
the residuals
b - Λ xk
My attempt at the historical development of
iterative Gaussian samplers:
Type
Stationary
(vk = uk = 1)
Sampler
Matrix Splittings
Gibbs (Gauss-Seidel)
Literature
Adler 1981,
Goodman & Sokal 1989,
Amit & Grenander 1991
BF (SOR)
Barone & Frigessi 1990
REGS (SSOR)
Roberts & Sahu 1997
Chen & Oliver 1996
Bardsley, Solonen, Haario & Laine 2014
RML or linear RTO
Generalized
Non-stationary
Multi-Grid
Lanczos Krylov
subspace
Krylov sampling
CD Sampler
with conjugate
directions
CG Sampler
Krylov sampling
with Lanczos
Lanczos sampler
vectors
Chebyshev
Fox & P 2014
Goodman & Sokal 1989
Liu & Sabatti 2000
Schneider & Wilsky 2003
Fox 2007
Ceriotti, Bussi & Parrinello 2007
P & Fox 2012
Simpson, Turner, & Pettitt 2008
Aune, Eidsvik, & Pokern 2014
Chow & Saad 2014
Fox & P 2014
Posterior based
on linear model:
Mean(z)
SD(z)
med(λy-1/2) med(λz-1/2)
Cholesky sampler
0.3717
0.1
Chebyshev sampler
0.3715
0.1
0.7130
10
1. extreme e-vals of M-1Λ
2. CG sample
CG sampler
Bayesian UQ with iterative samplers using a linear model of heights
Salt water added
here
25-27%
reduction
based on 99% credible interval
Bayesian UQ with iterative samplers using a linear model of heights
These blips in
volume are due to
peristaltic pump
rate (2mL/m)
en.wikipedia.org/wiki/File:Peristal
tic_pump.gif
Model assessment and tweaking
Joint work with Colin Fox
“Assess and Tweak” approach
resembles Doug Nichka’s (director of
Computational & Information System Lab's (CISL)
Institute for Mathematics Applied to Geosciences (IMAGe)
at the National Center for Atmospheric Research (NCAR))
SIAMUQ2014 workshop
Hierarchical models of tropical ocean winds given high-resolution satellite
scatterometer observations and low-resolution assimilated model output.
(Wikle, Milliff, Nychka, Berliner, JASA, 2001)
Assessing the Linear Model’s Image Reconstruction:
Linear model yields samples z є R512^2 over 2-D domain that do not:
ļ‚§ capture holes or over-hanging biofilm features
ļ‚§ allow an accurate reconstruction of the CSLM images
Instead, consider a non-linear model 3D binary τ є {0,1}17x512^2 over 3-D
domain (Lewandowski & Beyenal 2014; Higdon 2002)
ļ‚§ Let τ(š‘–, š‘—, š‘˜) =
0ā”ifā”noā”biofilm
1ā”ifā”biofilm
ļ‚§ l (τā”) = min(255, š¹(l0τ š‘Ÿ š‘‘ ))ā”= blurred attenuated light profile
š¹ā”implementsā”smoothingā”inā”theā”zā”direction,
l0ā”ā”isā”luminescenseā”ofā”topā”ofā”theā”biofilm,
ā”ā”ā”ā”ā”ā”ā”ā”š‘Ÿā” < 1ā”modelsā”attentationā”ofā”lightā”atā”biofilmā”depthā”š‘‘
Example: A 2D vertical slice from CSLM
The CSLM user typically thresholds images
before quantitation,
i.e., only bright pixels are included
Example: A 2D vertical slice from CSLM
One might ‘fill in’ below the bright pixels to
estimate where the biofilm is ….
Nonlinear Bayesian model with 1 Gaussian layer:
ļ‚§ Let τ(š‘–, š‘—, š‘˜) =
0ā”ifā”noā”biofilm
1ā”ifā”biofilm
ļ‚§ l (τā”) = min(255, š¹(l0τ š‘Ÿ š‘‘ ))ā”= blurred attenuated light profile
š¹ā”implementsā”smoothingā”inā”theā”š‘§ā”direction,
l0ā”ā”isā”luminescenseā”ofā”topā”ofā”theā”biofilm,
ā”ā”ā”ā”ā”ā”ā”ā”š‘Ÿā” < 1ā”modelsā”attentationā”ofā”lightā”atā”biofilmā”depthā”š‘‘
ļ‚§ y = l (τā”) + ε, ε ~ N(0,λyI)
likelihood:
y|τ ~ N(l (τā”) , λyI)
prior:
τ ~ Ising(J)
hyperparameters: ā”ā”ā”ā”ā”ā”ā”ā”l0ā”, r, λy, J
ļ‚§ Apply MCMC to estimate p(τ|y)
Example: A 2D vertical slice from CSLM
An MCMC sample from the posterior using 1 Gaussian layer ...
y
τ|y
Nonlinear Bayesian model with 2 Gaussian layers:
ļ‚§ Let τ(š‘–, š‘—, š‘˜) =
0ā”ifā”noā”biofilm
1ā”ifā”biofilm
ļ‚§ l (š‘„|τ) = min(255, š¹(š‘„ š‘Ÿ š‘‘ ))ā”= blurred attenuated light profile
š¹ā”implementsā”smoothingā”inā”theā”zā”direction,
š‘„ā”ā”isā”luminescenseā”throughoutā”theā”biofilm,
ā”ā”ā”ā”ā”ā”ā”ā”š‘Ÿā” < 1ā”modelsā”attentationā”ofā”lightā”atā”biofilmā”depthā”š‘‘
ļ‚§ y = l (š‘„|τ) + ε, ε ~ N(0,λyI);
likelihood:
y|τ, š‘„ ~ N(l (š‘„|τ) , λyI)
x|τ ~ N(l0τ, λxI)
priors:
l0
~ Unif(0, L0)
ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”ā”τ
~ Ising(J)
hyperparameters: ā”ā”ā”ā”ā”ā”ā”ā”r, λy, λx, L0, J
Example: A 2D vertical slice from CSLM
An MCMC sample from the posterior using 2 Gaussian layers ...
y
τ
x|τ
Example: A 2D vertical slice from CSLM
An MCMC sample from the posterior using 2 Gaussian layers ...
y
Example: A 2D vertical slice from CSLM
Assess residuals to further fine-tune the model ...
Next steps:
ļ‚§ MCMC implementation:
ļ‚§ Devise efficient schemes for sampling
hyperparameters, esp. variance components
ļ‚§ Can be slow due to single or paired pixel moves in proposal
ļ‚§ Model correlation across time
ļ‚§ Alternative to 3D binary model is to model each of N cells by points or
ellipsoids, z є R3N or R10N (Al-Awadhi, Hurn & Jennison 2011)
ļ‚§ Construct posterior for θ|… and model y = l (θ) + ε, where the vector of
unknowns θ are directly related to biofilm and confocal microscope
parameters like limiting nutrients, wavelength of laser light
ļ‚§ Devise sampling schemes over time (trade-off with spatial resolution)
ļ‚§ Compare with other estimation methods
(Lewandowski & Beyenal 2014; Jonkman & Stelzer 2002; COMSTAT (Heydorn et al. 2000); Errington & White 1999)
References:
ļ‚§ Iterative Sampling
Fox & P. Convergence in variance of Chebyshev accelerated Gibbs samplers. SIAM Journal on Scientific Computing, 36(1):A124-A147, 2014.
Bardsley, Solonen, Haario, & Laine. Randomize-then-Optimize: a method for sampling from posterior distributions in nonlinear
inverse problems," SIAM Journal on Scientific Computing 36(4): A1359-C399, 2014.
P & Fox. Sampling Gaussian Distributions in Krylov Spaces with Conjugate Directions. SIAM Journal on Scientific Computing, 34(3), 2012.
ļ‚§
Confocal image analysis
Lewandowski & Beyenal. Fundamentals of Biofilm Research, 2nd ed. 2014.
Jonkman & Stelzer. Resolution and Contrast in Confocal and Two-Photon Microscopy. Confocal and Two-Photon Microscopy, Diaspro ed., 2002.
Heydorn, Nielsen, Hentzer, Sternberg, Givskov, Ersboll & Molin. Quantification of biofilm structures by the novel computer program COMSTAT,
Microbiology 146, 2000. (MATLAB package)
Rodenacker, Bruhl, Hausner, Kuhn, Liebscher, Wagner, Winkler & Wuertz. Quantification of biofilms in multi-spectral digital volumes from confocal
laser-scanning microscopes. Image Anal. Stereo 19:151-156, 2000. (math morphology, watershed algorithm)
Errington & White. Measuring Dynamic Volume In Situ by Confocal Microscopy. Confocal Microscopy, Paddock ed., 1999
Lewandowski, Webb, Hamilton & Harkin. Quantifying biofilm structure. Water Sci Tech., 1999.
Sheppard & Shotton, Microscopy Handbooks 38: Confocal Laser Scanning Microscopy, 1997. (surface representation)
Van Der Voort & Strasters, Restoration of confocal images for quantitative image analysis. J. Microscopy 178: 1995. (PSF
individual cells)
estimation, volumes of
Bloem, Veninga & Shephard. Fully Automated Determination of Soil Bacterium Numbers, cell volumes, and frequencies of dividing cells by confocal
laser scanning microscopy and image analysis. AEM. 1995. (math morphology)
Stewart, Peyton, Drury & Murga. Quantitative Observations of heterogeneities in Pseudomonas aeruginosa biofilms. AEM 59: 1993.
ļ‚§
Bayesian image analysis
Al-Awadhi, Hurn & Jennison. Three-dimensional Bayesian analysis and confocal microscopy. Journal of Applied Statistics, 38(1), 2011 (applied to
cartilage cells).
Higdon. A primer on space-time modelling from a Bayesian perspective, in Statistical Methods for Spatio-Temporal Systems, B. Finkenstadt, L. Held, &
V. Isham, eds., 2007.
Geman & Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Advances in Applied Statistics, 1993.
Download