Approximate Bayesian Inference for Machine Learning [1,2]
Louis Ellam (l.ellam@warwick.ac.uk)
Warwick Centre of Predictive Modelling (WCPM), The University of Warwick
13 October 2015
http://www2.warwick.ac.uk/wcpm/

[1] Based on Louis Ellam, "Approximate Inference for Machine Learning", MSc Dissertation, Centre for Scientific Computing, University of Warwick.
[2] Warwick Centre of Predictive Modelling (WCPM) Seminar Series, University of Warwick.

Outline
1 Motivation: Machine Learning; Classical Approaches to Regression; Statistical Approaches to Regression
2 Monte Carlo Inference: Importance Sampling; Markov Chain Monte Carlo
3 Approximate Inference: Laplace Approximation; Variational Inference; Expectation Propagation
4 Summary of Methods

Machine Learning
Machine learning is the study of algorithms that can learn from data.
Regression: given $\mathcal{D} = \{x_1, \dots, x_N, t_1, \dots, t_N\}$, find a model that allows us to accurately estimate the response of any input.
Clustering: given data $\mathcal{D} = \{x_1, \dots, x_N\}$ and some fixed number of clusters $K$, assign each data point to a cluster so that points in each cluster are close together.
The Clutter Problem: given data $\mathcal{D} = \{x_1, \dots, x_N\}$ comprising a noisy signal immersed in clutter, recover the original signal $\theta$, i.e. determine $\mathbb{E}[\theta]$ and $p(\mathcal{D})$. Observations are drawn from
$p(x \mid \theta) = (1 - w)\,\mathcal{N}(x \mid \theta, I) + w\,\mathcal{N}(x \mid 0, aI)$.

Classical Approach to Regression
Linear Model: use a linear model $f$ defined over $M + 1$ linearly independent basis functions $\phi_0(x), \dots, \phi_M(x)$ with weights $w_0, \dots, w_M$,
$f(x, w) = \sum_{m=0}^{M} w_m \phi_m(x)$.
Tikhonov Regularisation:
$E(w) = \frac{1}{2}\lVert \Phi w - t \rVert^2 + \frac{\lambda}{2}\lVert w \rVert^2$,
$w^* = \operatorname{argmin}_{w} E(w) = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T t$.
C. Bishop (2006), D. Calvetti et al. (2007)
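A minimal numerical sketch of the regularised least-squares solution above. The polynomial basis $\phi_m(x) = x^m$, the synthetic sinusoidal data and the value of $\lambda$ are illustrative choices, not settings taken from the presentation.

```python
import numpy as np

# Synthetic 1D regression data (illustrative choice)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)
t = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

M = 8        # highest polynomial degree, so M + 1 basis functions
lam = 1e-3   # Tikhonov regularisation parameter lambda

# Design matrix Phi with entries Phi[n, m] = phi_m(x_n) = x_n ** m
Phi = np.vander(x, M + 1, increasing=True)

# w* = (Phi^T Phi + lambda I)^{-1} Phi^T t
w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)

# Predict at new inputs
x_new = np.linspace(0.0, 1.0, 200)
f_new = np.vander(x_new, M + 1, increasing=True) @ w_star
print("fitted weights:", np.round(w_star, 3))
```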
Statistical Approach to Regression
Bayes' Theorem:
$p(\theta \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}$.
Likelihood and Prior: we expect an observation to satisfy $t = f(x, w) + \eta$, where $\eta$ denotes mean-centred Gaussian noise with precision parameter $\beta$. Thus the likelihood function is defined as
$p(t \mid x, w, \beta) = \prod_{n=1}^{N} \mathcal{N}\big(t_n \mid w^T \phi(x_n), \beta^{-1}\big)$,
and the prior is
$p(w \mid \alpha) = \mathcal{N}(w \mid 0, \alpha^{-1} I)$.
Posterior:
$\ln p(w \mid x, t, \alpha, \beta) = -\frac{\beta}{2}\sum_{n=1}^{N}\big(t_n - w^T \phi(x_n)\big)^2 - \frac{\alpha}{2}\, w^T w + \text{const}$.
R. Kass (1995)

Bayesian Approach to Regression
Predictive Distribution:
$p(t \mid x, t, \alpha, \beta) = \int p(t \mid w)\, p(w \mid x, t, \alpha, \beta)\, dw = \mathcal{N}\big(t \mid m_N^T \phi(x), \sigma_N^2(x)\big)$,
where
$\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T S_N \phi(x)$, $\quad S_N^{-1} = \alpha I + \beta \Phi^T \Phi$, $\quad m_N = \beta S_N \Phi^T t$.
'More Bayesian' if $\alpha$ is treated as a random variable, but then analytically intractable.
Classical integration methods are not suitable for high-dimensional probability density functions.
C. Bishop (2006)

Monte Carlo Integration
Monte Carlo Estimate:
$\mathbb{E}_f[q(\theta)] = \int q(\theta)\, f(\theta)\, d\theta \approx \frac{1}{N}\sum_{n=1}^{N} q(X_n)$, where $X_n \overset{\text{iid}}{\sim} f(\theta)$.
Rewriting the expectation with respect to a proposal $g$,
$\mathbb{E}_f[q(\theta)] = \int q(\theta)\, \frac{f(\theta)}{g(\theta)}\, g(\theta)\, d\theta = \int q(\theta)\, w(\theta)\, g(\theta)\, d\theta$, with $w(\theta) := f(\theta)/g(\theta)$.
Importance Sampling Estimate:
$\mathbb{E}_f[q(\theta)] \approx \frac{1}{N}\sum_{n=1}^{N} w(Z_n)\, q(Z_n)$, where $Z_n \overset{\text{iid}}{\sim} g(\theta)$.
Self-Normalised Importance Sampling Estimate:
$\mathbb{E}_f[q(\theta)] \approx \dfrac{1}{\sum_{m=1}^{N} \tilde{w}(Z_m)} \sum_{n=1}^{N} \tilde{w}(Z_n)\, q(Z_n)$, where $Z_n \overset{\text{iid}}{\sim} g(\theta)$.

Monte Carlo Integration for the Clutter Problem
Define importance weights
$w(\theta) = \dfrac{\tilde{f}(\theta)}{\tilde{g}(\theta)} = \dfrac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\theta)} = p(\mathcal{D} \mid \theta)$.
Expectation (SNIS) and Evidence ('Vanilla' MC):
$\mathbb{E}_{p(\theta \mid \mathcal{D})}[\theta] \approx \dfrac{1}{\sum_{m=1}^{N} p(\mathcal{D} \mid X_m)} \sum_{n=1}^{N} X_n\, p(\mathcal{D} \mid X_n)$, where $X_n \overset{\text{iid}}{\sim} p(\theta)$,
$p(\mathcal{D}) = \mathbb{E}_{p(\theta)}[p(\mathcal{D} \mid \theta)] \approx \frac{1}{N}\sum_{n=1}^{N} p(\mathcal{D} \mid X_n)$, where $X_n \overset{\text{iid}}{\sim} p(\theta)$.
T. Minka (2001a), J. Liu (2008)
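A minimal sketch of the two estimators above for a one-dimensional clutter problem. The parameter values ($w = 0.5$, $a = 10$), the $\mathcal{N}(0, b)$ prior with $b = 100$ and the way the data are generated are illustrative assumptions, not values taken from the presentation.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Illustrative clutter settings: mixing weight w, clutter variance a, prior variance b
w, a, b = 0.5, 10.0, 100.0
theta_true = 2.0

# Generate a 1D data set: a noisy signal immersed in clutter
N = 20
is_clutter = rng.random(N) < w
x_data = np.where(is_clutter,
                  rng.normal(0.0, np.sqrt(a), N),
                  rng.normal(theta_true, 1.0, N))

def log_likelihood(theta):
    """log p(D | theta) for the 1D clutter model, vectorised over an array of theta values."""
    comp = (1 - w) * norm.pdf(x_data[:, None], loc=theta, scale=1.0) \
         + w * norm.pdf(x_data[:, None], loc=0.0, scale=np.sqrt(a))
    return np.log(comp).sum(axis=0)

# Draw samples from the prior p(theta) = N(0, b)
S = 100_000
theta_s = rng.normal(0.0, np.sqrt(b), S)
log_w = log_likelihood(theta_s)
wts = np.exp(log_w - log_w.max())   # unnormalised importance weights, stabilised

# 'Vanilla' Monte Carlo evidence and self-normalised IS posterior mean
evidence = np.exp(log_w.max()) * wts.mean()        # p(D) ~ (1/S) sum_n p(D | X_n)
post_mean = np.sum(wts * theta_s) / np.sum(wts)    # E[theta | D] via SNIS
print(f"p(D) ~= {evidence:.3e},  E[theta | D] ~= {post_mean:.3f}")
```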
Markov Chain Monte Carlo
A discrete Markov chain is defined as a series of random variables $Z^{(1)}, \dots, Z^{(M)}$ such that
$p\big(Z^{(M)} \mid Z^{(1)}, \dots, Z^{(M-1)}\big) = p\big(Z^{(M)} \mid Z^{(M-1)}\big)$.
Draw a new sample $Z$ from the proposal distribution $g(\cdot \mid Z^{(M-1)})$ and accept it with probability
$A\big(Z \mid Z^{(M-1)}\big) = \min\left\{1,\; \dfrac{f(Z)\, g(Z^{(M-1)} \mid Z)}{f(Z^{(M-1)})\, g(Z \mid Z^{(M-1)})}\right\}$.

Algorithm 1 Metropolis-Hastings
  Initialise $Z^{(0)} = \{Z_1^{(0)}, \dots, Z_K^{(0)}\}$
  for $M = 1, \dots, T$ do
    Draw $Z \sim g(\cdot \mid Z^{(M-1)})$
    Compute $A(Z \mid Z^{(M-1)}) = \min\left\{1,\; \frac{f(Z)\, g(Z^{(M-1)} \mid Z)}{f(Z^{(M-1)})\, g(Z \mid Z^{(M-1)})}\right\}$
    With probability $A(Z \mid Z^{(M-1)})$ set $Z^{(M)} = Z$, otherwise set $Z^{(M)} = Z^{(M-1)}$
  end for

Algorithm 2 Gibbs Sampling
  Initialise $Z_1^{(0)}, \dots, Z_K^{(0)}$
  for $M = 1, \dots, T$ do
    draw $Z_1^{(M)} \sim Z_1 \mid Z_2^{(M-1)}, \dots, Z_K^{(M-1)}$
    draw $Z_2^{(M)} \sim Z_2 \mid Z_1^{(M)}, Z_3^{(M-1)}, \dots, Z_K^{(M-1)}$
    ...
    draw $Z_K^{(M)} \sim Z_K \mid Z_1^{(M)}, \dots, Z_{K-1}^{(M)}$
  end for
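A minimal random-walk Metropolis-Hastings sketch following Algorithm 1. The one-dimensional two-component mixture target and the Gaussian proposal scale are illustrative choices; with a symmetric proposal the $g$ terms cancel in the acceptance ratio.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(z):
    """Unnormalised target density (illustrative two-component Gaussian mixture)."""
    return 0.3 * np.exp(-0.5 * (z + 2.0) ** 2) + 0.7 * np.exp(-0.5 * (z - 2.0) ** 2)

def metropolis_hastings(f, z0=0.0, T=50_000, step=1.0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal g(.|z)."""
    samples = np.empty(T)
    z = z0
    for m in range(T):
        z_prop = z + step * rng.standard_normal()   # draw Z ~ g(.|Z^(M-1))
        accept = min(1.0, f(z_prop) / f(z))         # A(Z|Z^(M-1)); symmetric g cancels
        if rng.random() < accept:
            z = z_prop
        samples[m] = z
    return samples

chain = metropolis_hastings(f)
burn_in = 5_000
print("mean estimate from the chain:", chain[burn_in:].mean())
```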
Gibbs Sampling for Clustering
Likelihood for GMM:
$p(\mathcal{D} \mid \mu, \Sigma, \pi) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$.
Introduce Latent Variables:
$p(z_{nk} = 1 \mid \pi_k) = \pi_k$, $\quad p(z \mid \pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}$.
Joint Distribution for GMM (Augmented):
$p(\mathcal{D}, z, \pi, \mu, \Sigma) = p(\mathcal{D} \mid z, \pi, \mu, \Sigma)\, p(z \mid \pi)\, p(\pi)\, p(\mu)\, p(\Sigma)$,
$p(\mathcal{D} \mid \mu, \Sigma, z) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_{nk}}$,
$p(\mu)\, p(\Sigma) = \prod_{k=1}^{K} \mathcal{N}(\mu_k \mid m_0, V_0)\, \mathcal{IW}(\Sigma_k \mid W_0, \nu_0)$,
$p(\pi) = \mathrm{Dir}(\pi \mid \alpha_0)$.

Gibbs Sampling for Clustering
Full Conditionals:
$p(z_{nk} = 1 \mid x_n, \mu, \Sigma, \pi) = \dfrac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$,
$p(\pi \mid z) = \mathrm{Dir}(\pi \mid \alpha)$,
$p(\mu_k \mid \Sigma_k, z, x) = \mathcal{N}(\mu_k \mid m_k, V_k)$,
$p(\Sigma_k \mid \mu_k, z, x) = \mathcal{IW}(\Sigma_k \mid W_k, \nu_k)$.
K. Murphy (2012)

Laplace Approximation
Seek to approximate a probability density function $p(z)$ of the form $p(z) = \frac{1}{Z} f(z)$.
Expand about a mode $z_0$, so that $\nabla f(z)\big|_{z=z_0} = 0$:
$\ln f(z) \approx \ln f(z_0) + (z - z_0)^T \nabla \ln f(z)\big|_{z=z_0} + \frac{1}{2}(z - z_0)^T \nabla\nabla \ln f(z)\big|_{z=z_0} (z - z_0)$.
Laplace Approximation:
$q(z) = \mathcal{N}(z \mid z_0, A^{-1})$, where $A = -\nabla\nabla \ln f(z)\big|_{z=z_0}$.
Example: $p(z) \propto \exp(-z^2/2)\, \sigma(20z + 4)$, where $\sigma(t) = (1 + e^{-t})^{-1}$.
R. Kass et al. (1995), C. Bishop (2006)

EM Algorithm (GMM)
Want to find a mode of the GMM log-likelihood
$\ln p(\mathcal{D} \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \Big\}$.
MLE of $\mu_k$:
$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$ and $\gamma(z_{nk}) = \dfrac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$.
MLE of $\Sigma_k$:
$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^T$.
A. Dempster et al. (1977), C. Bishop (2006)

EM Algorithm (GMM)
The maximisation for $\pi$ is constrained to $\sum_{k=1}^{K} \pi_k = 1$, so find the MLE of
$\ln p(\mathcal{D} \mid \pi, \mu, \Sigma) + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big)$,
which yields $\pi_k = \dfrac{N_k}{N}$.

Algorithm 3 EM GMM
  Initialise $\mu_k, \Sigma_k, \pi_k$ for $k = 1, \dots, K$
  repeat
    E Step: evaluate the responsibilities
      $\gamma(z_{nk}) = \dfrac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$
    M Step: define $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$, then update the parameters
      $\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$,
      $\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^T$,
      $\pi_k = N_k / N$
  until convergence

EM Algorithm (GMM)
Link to code / .mp4
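A minimal sketch of Algorithm 3 for two-dimensional data. The synthetic three-cluster data, the fixed number of iterations and the small jitter added to each covariance for numerical stability are illustrative choices, not part of the algorithm as stated.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)

# Synthetic 2D data from three well-separated clusters (illustrative)
X = np.vstack([rng.normal(c, 0.6, size=(100, 2)) for c in ([0, 0], [4, 0], [2, 4])])
N, D = X.shape
K = 3

# Initialise mu_k, Sigma_k, pi_k
mu = X[rng.choice(N, K, replace=False)]
Sigma = np.stack([np.eye(D)] * K)
pi = np.full(K, 1.0 / K)

for _ in range(100):   # fixed number of iterations for simplicity
    # E step: responsibilities gamma(z_nk)
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                            for k in range(K)])          # shape (N, K)
    gamma = dens / dens.sum(axis=1, keepdims=True)

    # M step: N_k, mu_k, Sigma_k, pi_k
    Nk = gamma.sum(axis=0)
    mu = (gamma.T @ X) / Nk[:, None]
    for k in range(K):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    pi = Nk / N

print("mixing coefficients:", np.round(pi, 3))
print("means:\n", np.round(mu, 2))
```

In practice the "until convergence" test would monitor the log-likelihood between iterations rather than run a fixed number of sweeps.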
Variational Inference
KL Divergence: choose a proxy distribution $q(\theta)$ and optimise it so that $q$ is as close a fit as possible to $p$. One such approach is to minimise
$\mathrm{KL}(q \,\|\, p) := -\int q(\theta) \ln \dfrac{p(\theta \mid \mathcal{D})}{q(\theta)}\, d\theta$.
Example: $p(z) \propto \exp(-z^2/2)\, \sigma(20z + 4)$, where $\sigma(t) = (1 + e^{-t})^{-1}$.
H. Attias (2000), Z. Ghahramani et al. (1999), C. Bishop (2006)
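A minimal sketch that fits a Gaussian $q(z) = \mathcal{N}(z \mid m, s^2)$ to the example density above by numerically minimising $\mathrm{KL}(q \,\|\, p)$ up to the unknown normalising constant (equivalently, maximising the lower bound $L(q)$ derived on the next slides). The use of Gauss-Hermite quadrature and a generic optimiser is an implementation choice, not something prescribed in the presentation.

```python
import numpy as np
from scipy.optimize import minimize

def log_p_tilde(z):
    """Unnormalised log target: ln[exp(-z^2/2) * sigmoid(20 z + 4)]."""
    return -0.5 * z ** 2 - np.logaddexp(0.0, -(20.0 * z + 4.0))

# Gauss-Hermite nodes/weights for expectations under a Gaussian q
nodes, weights = np.polynomial.hermite.hermgauss(60)

def neg_lower_bound(params):
    """-L(q) = -E_q[ln p~(z)] - H[q]; equals KL(q||p) up to the constant ln Z."""
    m, log_s = params
    s = np.exp(log_s)
    z = m + np.sqrt(2.0) * s * nodes
    e_log_p = np.sum(weights * log_p_tilde(z)) / np.sqrt(np.pi)   # E_q[ln p~]
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * s ** 2)           # H[q]
    return -(e_log_p + entropy)

res = minimize(neg_lower_bound, x0=np.array([0.0, 0.0]))
m_opt, s_opt = res.x[0], np.exp(res.x[1])
print(f"variational Gaussian: mean {m_opt:.3f}, std {s_opt:.3f}")
print(f"maximised lower bound L(q) ~= {-res.fun:.3f} on the log normalising constant")
```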
Variational Inference
$\ln p(\mathcal{D}) = \ln \dfrac{p(\mathcal{D}, \theta)}{p(\theta \mid \mathcal{D})} = \int q(\theta) \ln \dfrac{p(\mathcal{D}, \theta)}{p(\theta \mid \mathcal{D})}\, d\theta$
$\quad = \int q(\theta) \ln \dfrac{p(\mathcal{D}, \theta)}{q(\theta)}\, d\theta - \int q(\theta) \ln \dfrac{p(\theta \mid \mathcal{D})}{q(\theta)}\, d\theta = L(q) + \mathrm{KL}(q \,\|\, p)$.
Lower Bound: minimising the KL divergence is equivalent to maximising the lower bound
$L(q) = \int q(\theta) \ln \dfrac{p(\mathcal{D}, \theta)}{q(\theta)}\, d\theta$.
Lower-Bound Evidence Approximation: it may be shown that $\mathrm{KL}(q \,\|\, p) \ge 0$, hence $\ln p(\mathcal{D}) \ge L(q)$, and $p(\mathcal{D})$ may be approximated using
$p(\mathcal{D}) \to \exp L(q)$ as $\mathrm{KL}(q \,\|\, p) \to 0$.

Mean Field Approximation
Assume $q(\theta)$ can be expressed in the factorised form
$q(\theta) = \prod_{m=1}^{M} q_m(\theta_m)$.
Mean Field Lower Bound:
$L(q) = \int \prod_{m=1}^{M} q(\theta_m) \ln \dfrac{p(\mathcal{D}, \theta)}{\prod_{m=1}^{M} q(\theta_m)}\, d\theta = \int \prod_{m=1}^{M} q(\theta_m) \Big\{ \ln p(\mathcal{D}, \theta) - \sum_{m=1}^{M} \ln q(\theta_m) \Big\}\, d\theta$
$\quad = \int q(\theta_j) \Big\{ \int \ln p(\mathcal{D}, \theta) \prod_{m \ne j} q(\theta_m)\, d\theta_m \Big\}\, d\theta_j - \int q(\theta_j) \ln q(\theta_j)\, d\theta_j + \text{const}$
$\quad = \int q(\theta_j)\, \mathbb{E}_{m \ne j}[\ln p(\mathcal{D}, \theta)]\, d\theta_j - \int q(\theta_j) \ln q(\theta_j)\, d\theta_j + \text{const}$
$\quad = -\mathrm{KL}\big(q(\theta_j) \,\big\|\, \exp \mathbb{E}_{m \ne j}[\ln p(\mathcal{D}, \theta)]\big) + \text{const}$.
Mean Field Minimised KL Divergence:
$\ln q_j^*(\theta_j) = \mathbb{E}_{m \ne j}[\ln p(\mathcal{D}, \theta)] + \text{const}$.

Variational Inference GMM
Target Posterior:
$p(\mathcal{D}, z, \pi, \mu, \Lambda) = p(\mathcal{D} \mid z, \pi, \mu, \Lambda)\, p(z \mid \pi)\, p(\pi)\, p(\mu \mid \Lambda)\, p(\Lambda)$.
Conjugate Prior:
$p(\mu \mid \Lambda)\, p(\Lambda) = \prod_{k=1}^{K} p(\mu_k \mid \Lambda_k)\, p(\Lambda_k) = \prod_{k=1}^{K} \mathcal{N}\big(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\big)\, \mathcal{W}(\Lambda_k \mid W_0, \nu_0)$.
Joint Distribution:
$p(\mathcal{D}, z, \mu, \Lambda, \pi) \propto \Big( \prod_{n=1}^{N} \prod_{k=1}^{K} \big[ \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Lambda_k^{-1}) \big]^{z_{nk}} \Big) \cdot \mathrm{Dir}(\pi \mid \alpha_0) \cdot \prod_{k=1}^{K} \mathcal{N}\big(\mu_k \mid m_0, (\beta_0 \Lambda_k)^{-1}\big)\, \mathcal{W}(\Lambda_k \mid W_0, \nu_0)$.

Variational Inference GMM
Mean Field Approximation:
$q(z, \pi, \mu, \Lambda) = q(z)\, q(\pi, \mu, \Lambda)$.
Mean Field Minimised KL Divergence:
$\ln q_j^*(\theta_j) = \mathbb{E}_{m \ne j}[\ln p(\mathcal{D}, \theta)] + \text{const}$.
Expression for Responsibilities:
$\ln q^*(z) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \ln \rho_{nk} + \text{const}$,
where
$\ln \rho_{nk} = \mathbb{E}[\ln \pi_k] + \frac{1}{2}\, \mathbb{E}[\ln |\Lambda_k|] - \frac{D}{2} \ln 2\pi - \frac{1}{2}\, \mathbb{E}_{\mu_k, \Lambda_k}\big[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)\big]$,
which can be used to give an expression for the responsibilities
$r_{nk} = \mathbb{E}[z_{nk}] = \dfrac{\rho_{nk}}{\sum_{j=1}^{K} \rho_{nj}}$.

Variational Inference GMM
Define
$N_k = \sum_{n=1}^{N} r_{nk}$, $\quad \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\, x_n$, $\quad S_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\, (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T$.
Expression for Model Parameters:
$\ln q^*(\pi, \mu, \Lambda) = \mathbb{E}_z[\ln p(\mathcal{D}, z, \pi, \mu, \Lambda)] + \text{const}$.
By inspection,
$q(\pi, \mu, \Lambda) = q(\pi)\, q(\mu, \Lambda) = q(\pi) \prod_{k=1}^{K} q(\mu_k, \Lambda_k)$,
$q^*(\pi \mid \alpha) = \mathrm{Dir}(\pi \mid \alpha)$, $\quad q^*(\mu_k \mid \Lambda_k) = \mathcal{N}\big(\mu_k \mid m_k, (\beta_k \Lambda_k)^{-1}\big)$, $\quad q^*(\Lambda_k) = \mathcal{W}(\Lambda_k \mid W_k, \nu_k)$.

Variational Inference GMM
Expressions for Model Parameters:
$\alpha_k = \alpha_0 + N_k$, $\quad \beta_k = \beta_0 + N_k$, $\quad \nu_k = \nu_0 + N_k$,
$m_k = \frac{1}{\beta_k}(\beta_0 m_0 + N_k \bar{x}_k)$,
$W_k^{-1} = W_0^{-1} + N_k S_k + \dfrac{\beta_0 N_k}{\beta_0 + N_k} (\bar{x}_k - m_0)(\bar{x}_k - m_0)^T$.
Expressions for Responsibilities:
$\mathbb{E}_{\mu_k, \Lambda_k}\big[(x_n - \mu_k)^T \Lambda_k (x_n - \mu_k)\big] = \nu_k (x_n - m_k)^T W_k (x_n - m_k) + D \beta_k^{-1}$,
$\ln \tilde{\pi}_k := \mathbb{E}[\ln \pi_k] = \psi(\alpha_k) - \psi\Big(\sum_{k=1}^{K} \alpha_k\Big)$,
$\ln \tilde{\Lambda}_k := \mathbb{E}[\ln |\Lambda_k|] = \sum_{i=1}^{D} \psi\Big(\dfrac{\nu_k + 1 - i}{2}\Big) + D \ln 2 + \ln |W_k|$.
Mixing Coefficients:
$\mathbb{E}[\pi_k] = \dfrac{\alpha_0 + N_k}{K \alpha_0 + N}$.

Variational Inference GMM
Algorithm 4 Variational Bayes GMM
  Initialise $\pi, \mu, \Lambda, m, W, \beta_0, \nu_0, \alpha_0$
  repeat
    E Step: compute the responsibilities as follows
      $r_{nk} \propto \tilde{\pi}_k\, \tilde{\Lambda}_k^{1/2} \exp\Big\{ -\dfrac{D}{2\beta_k} - \dfrac{\nu_k}{2}(x_n - m_k)^T W_k (x_n - m_k) \Big\}$,
      $\ln \tilde{\pi}_k = \psi(\alpha_k) - \psi\big(\sum_{k=1}^{K} \alpha_k\big)$,
      $\ln \tilde{\Lambda}_k = \sum_{i=1}^{D} \psi\big(\frac{\nu_k + 1 - i}{2}\big) + D \ln 2 + \ln |W_k|$
    M Step: update the model parameters as follows
      $N_k = \sum_{n=1}^{N} r_{nk}$, $\quad \bar{x}_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\, x_n$, $\quad S_k = \frac{1}{N_k} \sum_{n=1}^{N} r_{nk}\, (x_n - \bar{x}_k)(x_n - \bar{x}_k)^T$,
      $\alpha_k = \alpha_0 + N_k$, $\quad \beta_k = \beta_0 + N_k$, $\quad \nu_k = \nu_0 + N_k$,
      $m_k = \beta_k^{-1}(\beta_0 m_0 + N_k \bar{x}_k)$,
      $W_k^{-1} = W_0^{-1} + N_k S_k + \frac{\beta_0 N_k}{\beta_0 + N_k}(\bar{x}_k - m_0)(\bar{x}_k - m_0)^T$
  until convergence

Variational Inference GMM
Link to code / .mp4

Variational Inference GMM
Lower Bound:
$L(q) = \sum_{z} \iiint q(z, \pi, \mu, \Lambda) \ln \Big\{ \dfrac{p(\mathcal{D}, z, \pi, \mu, \Lambda)}{q(z, \pi, \mu, \Lambda)} \Big\}\, d\pi\, d\mu\, d\Lambda$
$\quad = \mathbb{E}[\ln p(\mathcal{D}, z, \pi, \mu, \Lambda)] - \mathbb{E}[\ln q(z, \pi, \mu, \Lambda)]$
$\quad = \mathbb{E}[\ln p(\mathcal{D} \mid z, \mu, \Lambda)] + \mathbb{E}[\ln p(z \mid \pi)] + \mathbb{E}[\ln p(\pi)] + \mathbb{E}[\ln p(\mu, \Lambda)] - \mathbb{E}[\ln q(z)] - \mathbb{E}[\ln q(\pi)] - \mathbb{E}[\ln q(\mu, \Lambda)]$.

Variational Inference GMM
Model Selection with the Lower Bound:
$p(K \mid \mathcal{D}) = \dfrac{K!\, p(\mathcal{D} \mid K)}{\sum_{K'} K'!\, p(\mathcal{D} \mid K')}$, where $p(\mathcal{D} \mid K) \approx \exp L(K)$.
Alternative View:
$L_M(K) \approx L(K) + \ln K!$.
A. Corduneanu (2001), C. Bishop (2006), K. Murphy (2012)

Expectation Propagation
Minimise the Reverse KL Divergence:
$\mathrm{KL}(p \,\|\, q) = -\int p(\theta \mid \mathcal{D}) \ln \dfrac{q(\theta)}{p(\theta \mid \mathcal{D})}\, d\theta$.
Example: $p(z) \propto \exp(-z^2/2)\, \sigma(20z + 4)$, where $\sigma(t) = (1 + e^{-t})^{-1}$.
T. Minka (2001a), T. Minka (2001b)

Expectation Propagation
Exponential Family:
$q(\theta) = h(\theta)\, g(\eta) \exp\{\eta^T u(\theta)\}$.
Reverse KL Divergence for the Exponential Family:
$\mathrm{KL}(p \,\|\, q) = \int p(\theta) \ln \dfrac{p(\theta)}{q(\theta)}\, d\theta = \int p(\theta) \big\{ \ln p(\theta) - \ln(h(\theta) g(\eta)) - \eta^T u(\theta) \big\}\, d\theta = -\ln g(\eta) - \eta^T \mathbb{E}_{p(\theta)}[u(\theta)] + \text{const}$.
Minimised Reverse KL Divergence for the Exponential Family:
$-\nabla \ln g(\eta) = \mathbb{E}_{p(\theta)}[u(\theta)]$, i.e. when $\mathbb{E}_{q(\theta)}[u(\theta)] = \mathbb{E}_{p(\theta)}[u(\theta)]$.
This is difficult, as we are taking an expectation with respect to the unknown posterior.

Expectation Propagation
Assumed Form of the Joint Distribution:
$p(\theta, \mathcal{D}) = \prod_{n} f_n(\theta)$.
Approximating Distribution: an approximating posterior belonging to the exponential family is sought of the following form, where $Z$ is a normalisation constant,
$q(\theta) = \dfrac{1}{Z} \prod_{n} \tilde{f}_n(\theta)$.
The idea is to refine each factor $\tilde{f}_n$ by making the approximating distribution
$q^{\text{new}}(\theta) \propto \tilde{f}_n(\theta) \prod_{j \ne n} \tilde{f}_j(\theta)$
a closer approximation to
$f_n(\theta) \prod_{j \ne n} \tilde{f}_j(\theta)$.

Expectation Propagation
To proceed, factor $n$ is first removed from the approximating distribution by
$q^{\backslash n}(\theta) = \dfrac{q(\theta)}{\tilde{f}_n(\theta)}$.
A new distribution is defined by
$p^n(\theta) = \dfrac{1}{Z_n} f_n(\theta)\, q^{\backslash n}(\theta)$, with normalisation constant $Z_n = \int f_n(\theta)\, q^{\backslash n}(\theta)\, d\theta$.
The factor $\tilde{f}_n$ is updated such that the sufficient statistics of $q$ are matched to those of $p^n$.

Algorithm 5 Expectation Propagation
  Initialise the approximating factors $\tilde{f}_n(\theta)$
  Initialise the approximating function $q(\theta) \propto \prod_n \tilde{f}_n(\theta)$
  repeat
    Choose a factor $\tilde{f}_n(\theta)$ for refinement
    Set $q^{\backslash n}(\theta) = q(\theta) / \tilde{f}_n(\theta)$
    Set the sufficient statistics of $q^{\text{new}}(\theta)$ to those of $q^{\backslash n}(\theta)\, f_n(\theta)$, including $Z_n = \int q^{\backslash n}(\theta)\, f_n(\theta)\, d\theta$
    Set $\tilde{f}_n(\theta) = Z_n\, q^{\text{new}}(\theta) / q^{\backslash n}(\theta)$
  until convergence

Expectation Propagation for the Clutter Problem
Factorised Posterior Distribution:
$f_n(\theta) = (1 - w)\, \mathcal{N}(x_n \mid \theta, I) + w\, \mathcal{N}(x_n \mid 0, aI)$ for $n = 1, \dots, N$, $\quad f_0(\theta) = \mathcal{N}(\theta \mid 0, bI)$.
Approximating Distribution: taken to be an isotropic Gaussian of the form
$q(\theta) = \mathcal{N}(\theta \mid m, vI)$,
and $q$ is treated as a product of $N + 1$ approximating factors
$q(\theta) \propto \prod_{n=0}^{N} \tilde{f}_n(\theta)$.
The factors are defined as scaled Gaussian distributions, where factor $\tilde{f}_n$ has scaling constant $s_n$, mean $m_n$ and variance $v_n I$:
$\tilde{f}_n(\theta) = s_n\, \mathcal{N}(\theta \mid m_n, v_n I)$.
The approximating distribution is therefore Gaussian, $q(\theta) = \mathcal{N}(\theta \mid m, vI)$, and $p(\mathcal{D}) \approx \int \prod_n \tilde{f}_n(\theta)\, d\theta$ is obtained using results for products of Gaussians.

Expectation Propagation for the Clutter Problem
Algorithm 6 Expectation Propagation for the Clutter Problem
  Initialise the prior factor $\tilde{f}_0(\theta) = \mathcal{N}(\theta \mid 0, bI)$
  Initialise the remaining factors $\tilde{f}_n = 1$, $v_n = \infty$, $m_n = 0$
  Initialise the approximating function $m = 0$, $v = b$
  repeat
    for $n = 1, \dots, N$ do
      Remove the current factor from $q(\theta)$:
        $(v^{\backslash n})^{-1} = v^{-1} - v_n^{-1}$,
        $m^{\backslash n} = m + v_n^{-1} v^{\backslash n} (m - m_n)$.
      Evaluate the new posterior $q(\theta)$ by matching sufficient statistics:
        $Z_n = (1 - w)\, \mathcal{N}\big(x_n \mid m^{\backslash n}, (v^{\backslash n} + 1) I\big) + w\, \mathcal{N}(x_n \mid 0, aI)$,
        $\rho_n = 1 - \dfrac{w}{Z_n}\, \mathcal{N}(x_n \mid 0, aI)$,
        $m = m^{\backslash n} + \rho_n \dfrac{v^{\backslash n}}{v^{\backslash n} + 1} (x_n - m^{\backslash n})$,
        $v = v^{\backslash n} - \rho_n \dfrac{(v^{\backslash n})^2}{v^{\backslash n} + 1} + \rho_n (1 - \rho_n) \dfrac{(v^{\backslash n})^2 \lVert x_n - m^{\backslash n} \rVert^2}{D (v^{\backslash n} + 1)^2}$.
      Evaluate and store the new factor $\tilde{f}_n(\theta)$:
        $v_n^{-1} = v^{-1} - (v^{\backslash n})^{-1}$,
        $m_n = m^{\backslash n} + (v_n + v^{\backslash n})(v^{\backslash n})^{-1} (m - m^{\backslash n})$,
        $s_n = \dfrac{Z_n}{\mathcal{N}\big(m_n \mid m^{\backslash n}, (v_n + v^{\backslash n}) I\big)}$.
    end for
  until convergence
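A minimal one-dimensional sketch of Algorithm 6, reusing the illustrative clutter settings from the earlier Monte Carlo sketch. The evidence computation via the scale factors $s_n$ is omitted, and the guard against a non-positive cavity variance is a pragmatic addition rather than part of the algorithm as stated.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Same illustrative 1D clutter settings as in the Monte Carlo sketch
w, a, b = 0.5, 10.0, 100.0
theta_true = 2.0
N = 20
is_clutter = rng.random(N) < w
x = np.where(is_clutter,
             rng.normal(0.0, np.sqrt(a), N),
             rng.normal(theta_true, 1.0, N))

# Prior factor fixes the initial approximation q(theta) = N(theta | 0, b);
# the likelihood factors start as f~_n = 1 (v_n = infinity, m_n = 0)
m, v = 0.0, b
m_site = np.zeros(N)
v_site = np.full(N, np.inf)

for sweep in range(50):
    for n in range(N):
        # Remove factor n from q (cavity distribution)
        inv_v_cav = 1.0 / v - 1.0 / v_site[n]
        if inv_v_cav <= 0.0:
            continue   # pragmatic guard against a non-positive cavity variance
        v_cav = 1.0 / inv_v_cav
        m_cav = m + (v_cav / v_site[n]) * (m - m_site[n])

        # Match the moments of q against f_n(theta) q^{\n}(theta); here D = 1
        Zn = (1 - w) * norm.pdf(x[n], m_cav, np.sqrt(v_cav + 1.0)) \
           + w * norm.pdf(x[n], 0.0, np.sqrt(a))
        rho = 1.0 - (w / Zn) * norm.pdf(x[n], 0.0, np.sqrt(a))
        m_new = m_cav + rho * v_cav / (v_cav + 1.0) * (x[n] - m_cav)
        v_new = v_cav - rho * v_cav**2 / (v_cav + 1.0) \
              + rho * (1.0 - rho) * v_cav**2 * (x[n] - m_cav)**2 / (v_cav + 1.0)**2

        # Store the refined factor and accept the updated approximation
        v_site[n] = 1.0 / (1.0 / v_new - inv_v_cav)
        m_site[n] = m_cav + (v_site[n] + v_cav) / v_cav * (m_new - m_cav)
        m, v = m_new, v_new

print(f"EP approximation of the posterior: N(theta | {m:.3f}, {v:.3f})")
```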
Expectation Propagation for the Clutter Problem
[Figure: results for the clutter problem]

Summary of Methods for the Clutter Problem
[Figure: comparison of the methods on the clutter problem]
T. Minka (2001a), T. Minka (2001b)

Further Reading and Acknowledgments
MSc Dissertation: L. Ellam, Approximate Bayesian Inference for Machine Learning. The MSc dissertation may be downloaded from here.
Software: the Python code implementing the various algorithms reported in this presentation may be obtained by clicking on the respective figures. All software and datasets may be obtained in zip form from here.
Acknowledgments: Professor N. Zabaras (Thesis Advisor) and Dr P. Brommer (Second Marker); EPSRC Strategic Package Project EP/L027682/1 for research at WCPM.