
Approximate Inference in GMs

Sampling

Le Song

Machine Learning II: Advanced Topics

CSE 8803ML, Spring 2012

What we have learned so far

Variable elimination and junction tree

Exact inference

Exponential in tree-width

Mean field, loopy belief propagation

Approximate inference for marginals/conditionals

Fast, but can get inaccurate estimates

Sample-based inference

Approximate inference for marginals/conditionals

With “enough” samples, will converge to the right answer


Variational Inference

What is the approximating structure? (replace P by a simpler, tractable Q)

How to measure the goodness of the approximation of Q(X_1, …, X_n) to the original P(X_1, …, X_n)?

Reverse KL-divergence KL(Q‖P): mean field

How to compute the new parameters?

Optimization: Q* = argmin_Q KL(Q‖P)

Deriving mean field fixed point condition I

Approximate

P(X) ∝ Π_j Ψ(D_j)

with a fully factorized

Q(X) = Π_i Q_i(X_i), s.t. ∀X_i: Q_i(X_i) > 0, ∫ Q_i(X_i) dX_i = 1

KL(Q‖P) = −∫ Q(X) ln P(X) dX + ∫ Q(X) ln Q(X) dX + const.

= −Σ_j ∫ (Π_{i∈D_j} Q_i(X_i)) ln Ψ(D_j) dX_{D_j} + Σ_i ∫ Q_i(X_i) ln Q_i(X_i) dX_i

E.g., for the factor Ψ(X_1, X_5):

∫ Q(X) ln Ψ(X_1, X_5) dX = ∫ Q_1(X_1) Q_5(X_5) ln Ψ(X_1, X_5) dX_1 dX_5

[Figure: MRF over X_1, …, X_7; the factors touching X_1 are Ψ(X_1, X_2), Ψ(X_1, X_3), Ψ(X_1, X_4), Ψ(X_1, X_5)]

Deriving mean field fixed point condition II

Let L be the Lagrangian function enforcing the normalization constraints (no multiplier is needed for Q_i(X_i) > 0):

L = −Σ_j ∫ (Π_{i∈D_j} Q_i(X_i)) ln Ψ(D_j) dX_{D_j} + Σ_i ∫ Q_i(X_i) ln Q_i(X_i) dX_i + Σ_i λ_i (1 − ∫ Q_i(X_i) dX_i)

For KL divergence, the nonnegativity constraint comes for free

If Q is an optimum for the original constrained problem, then there exist λ_i such that (Q, λ_i) is a stationary point of the Lagrangian (derivative equal to 0).

Deriving mean field fixed point condition III

L = −Σ_j ∫ (Π_{i∈D_j} Q_i(X_i)) ln Ψ(D_j) dX_{D_j} + Σ_i ∫ Q_i(X_i) ln Q_i(X_i) dX_i + Σ_i λ_i (1 − ∫ Q_i(X_i) dX_i)

Setting the derivative to zero:

∂L/∂Q_i(X_i) = −Σ_{j: i∈D_j} ∫ (Π_{k∈D_j\i} Q_k(X_k)) ln Ψ(D_j) dX_{D_j\i} + ln Q_i(X_i) + 1 − λ_i = 0

Solving for Q_i(X_i):

Q_i(X_i) = (1/Z_i) exp( Σ_{j: i∈D_j} ∫ (Π_{k∈D_j\i} Q_k(X_k)) ln Ψ(D_j) dX_{D_j\i} )

= (1/Z_i) exp( Σ_{D_j: X_i∈D_j} E_Q[ln Ψ(D_j)] )

where Z_i absorbs λ_i and normalizes Q_i, and the expectation is over all variables in D_j except X_i.

Mean Field Algorithm

Initialize Q(X_1, …, X_n) = Π_i Q_i(X_i) (e.g., randomly or smartly)

Set all variables to unprocessed

Repeat:

Pick an unprocessed variable X_i

Update Q_i:

Q_i(X_i) = (1/Z_i) exp( Σ_{D_j: X_i∈D_j} E_Q[ln Ψ(D_j)] )

Set variable X_i as processed

If Q_i changed, set neighbors of X_i to unprocessed

Guaranteed to converge

Understanding mean field algorithm

[Figure: mean field updates on an MRF over X_1, …, X_7; each factor Ψ(X_i, X_j) contributes the term E_{Q(X_j)}[ln Ψ(X_i, X_j)] to the update of Q_i]

Q_1(X_1) = (1/Z_1) exp( E_{Q(X_2)}[ln Ψ(X_1, X_2)] + E_{Q(X_3)}[ln Ψ(X_1, X_3)] + E_{Q(X_4)}[ln Ψ(X_1, X_4)] + E_{Q(X_5)}[ln Ψ(X_1, X_5)] )

Q_2(X_2) = (1/Z_2) exp( E_{Q(X_1)}[ln Ψ(X_1, X_2)] + E_{Q(X_6)}[ln Ψ(X_2, X_6)] + E_{Q(X_7)}[ln Ψ(X_2, X_7)] )

Example of Mean Field

Mean field equation:

Q_i(X_i) = (1/Z_i) exp( Σ_{D_j: X_i∈D_j} E_Q[ln Ψ(D_j)] )

Pairwise Markov random field:

P(X_1, …, X_n) ∝ Π_{ij} exp(θ_ij X_i X_j) Π_i exp(θ_i X_i)

Ψ(X_i, X_j) = exp(θ_ij X_i X_j), so ln Ψ(X_i, X_j) = θ_ij X_i X_j

Ψ(X_i) = exp(θ_i X_i), so ln Ψ(X_i) = θ_i X_i

The mean field equation has a simple form:

Q_i(X_i) = (1/Z_i) exp( Σ_{j∈N_i} θ_ij X_i E_{Q_j}[X_j] + θ_i X_i ) = (1/Z_i) exp( Σ_{j∈N_i} θ_ij X_i ⟨X_j⟩_{Q_j} + θ_i X_i )

[Figure: X_i surrounded by the means of its neighbors, ⟨X_j⟩_{Q_j}, ⟨X_k⟩_{Q_k}, ⟨X_l⟩_{Q_l}, ⟨X_s⟩_{Q_s}]
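For binary X_i ∈ {−1, +1}, the simple form above reduces to the closed-form coordinate update ⟨X_i⟩ = tanh(Σ_{j∈N_i} θ_ij ⟨X_j⟩ + θ_i). A minimal Python sketch of this update loop; the two-node parameters are made up for illustration:

```python
import numpy as np

def mean_field_ising(theta_pair, theta_node, iters=50):
    """Coordinate-update mean field for a pairwise MRF with X_i in {-1, +1}.

    theta_pair: symmetric (n, n) matrix of edge parameters theta_ij (zero diagonal)
    theta_node: length-n vector of node parameters theta_i
    Returns mu, where mu[i] = <X_i> under the factorized Q.
    """
    n = len(theta_node)
    mu = np.zeros(n)                       # start from <X_i> = 0
    for _ in range(iters):
        for i in range(n):
            # Q_i(X_i) ∝ exp(sum_j theta_ij X_i <X_j> + theta_i X_i)
            field = theta_pair[i] @ mu + theta_node[i]
            mu[i] = np.tanh(field)         # closed form for binary X_i
    return mu

# Toy model: two positively coupled nodes, external field on node 0
theta_pair = np.array([[0.0, 0.5],
                       [0.5, 0.0]])
theta_node = np.array([1.0, 0.0])
mu = mean_field_ising(theta_pair, theta_node)
```

The coupling pulls ⟨X_1⟩ toward the sign of ⟨X_0⟩, exactly as the fixed-point equations predict.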

Example of generalized mean field

[Figure: the grid MRF partitioned into clusters C_1, C_2, C_3, C_4]

Q(C_1) ∝ exp( Σ_{i∈C_1} θ_i X_i + Σ_{(ij)∈E, i∈C_1, j∈C_1} θ_ij X_i X_j + Σ_{i∈C_1, j∈MB(C_1)} θ_ij X_i ⟨X_j⟩_{Q_{C_k}} )

Node potential within C_1; edge potential within C_1; edge potential across C_1 and C_k, with the mean taken over variables in the Markov blanket MB(C_1)

Why Sampling

Previous inference tasks focus on obtaining the entire posterior distribution P(X_i | e)

Often we want to take expectations:

Mean: μ_{X_i|e} = E[X_i | e] = ∫ X_i P(X_i | e) dX_i

Variance: σ²_{X_i|e} = E[(X_i − μ_{X_i|e})² | e] = ∫ (X_i − μ_{X_i|e})² P(X_i | e) dX_i

More generally, E[f] = ∫ f(X) P(X | e) dX can be difficult to compute analytically

Key idea: approximate the expectation by a sample average

E[f] ≈ (1/N) Σ_{i=1}^N f(x^i), where x^1, …, x^N ∼ P(X | e) independently and identically

Sampling

Samples: points from the domain of a distribution P(X)

The higher the P(x), the more likely we see x in the sample

[Figure: density P(X) with samples x_1, …, x_6 concentrated where P is large]

Mean: μ = (1/N) Σ_i x^i

Variance: σ² = (1/N) Σ_i (x^i − μ)²

Sampling Example

For n independent Gaussian variables with 0 mean and unit variance, use x ∼ randn(n, 1) in Matlab

For a multivariate Gaussian N(μ, Σ):

Sample y ∼ randn(n, 1)

Transform the sample: x = μ + Σ^{1/2} y

If X takes a finite number of values, e.g., {1, 2, 3} with P(X = 1) = 0.7, P(X = 2) = 0.2, P(X = 3) = 0.1:

Draw a sample from the uniform distribution U[0, 1], u ∼ rand

If u falls into interval [0, 0.7), output sample x = 1

If u falls into interval [0.7, 0.9), output sample x = 2

If u falls into interval [0.9, 1], output sample x = 3

[Figure: the unit interval [0, 1] split into segments of length 0.7, 0.2, 0.1 for values 1, 2, 3]
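The same two recipes can be sketched in Python/NumPy rather than Matlab: a Cholesky factor serves as one valid Σ^{1/2}, and np.searchsorted implements the interval lookup. The discrete distribution is the slide's; μ and Σ are made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Multivariate Gaussian N(mu, Sigma): transform standard normal samples
mu = np.array([1.0, -2.0])                 # toy mean
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])             # toy covariance
L = np.linalg.cholesky(Sigma)              # one valid choice of Sigma^{1/2}
y = rng.standard_normal((10000, 2))        # y ~ N(0, I)
x = mu + y @ L.T                           # x ~ N(mu, Sigma)

# Discrete distribution via u ~ U[0, 1] and interval lookup
values = np.array([1, 2, 3])
probs = np.array([0.7, 0.2, 0.1])
cum = np.cumsum(probs)
cum[-1] = 1.0                              # guard against float round-off
u = rng.random(10000)
samples = values[np.searchsorted(cum, u, side="right")]
```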

Generate Samples from Bayesian Networks

A BN describes a generative process for observations

First, sort the nodes in topological order

Then, generate samples in this order according to the CPTs

[Figure: BN with nodes 1: Allergy, 2: Flu, 3: Sinus, 4: Nose, 5: Headache; Sinus has parents Allergy and Flu; Nose and Headache each have parent Sinus]

Generate a set of samples for (A, F, S, N, H):

Sample a^i ∼ P(A)

Sample f^i ∼ P(F)

Sample s^i ∼ P(S | a^i, f^i)

Sample n^i ∼ P(N | s^i)

Sample h^i ∼ P(H | s^i)
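The five sampling steps can be sketched directly in Python. The topological order and the dependence structure follow the slide; the CPT numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def bern(p):
    """One draw of a binary variable with P(1) = p."""
    return int(rng.random() < p)

def sample_one():
    """Ancestral sampling in topological order A, F, S, N, H."""
    a = bern(0.3)                                # P(A = 1), hypothetical
    f = bern(0.1)                                # P(F = 1), hypothetical
    s = bern([[0.05, 0.8], [0.7, 0.95]][a][f])   # P(S = 1 | a, f), hypothetical CPT
    n = bern([0.2, 0.9][s])                      # P(N = 1 | s), hypothetical CPT
    h = bern([0.1, 0.7][s])                      # P(H = 1 | s), hypothetical CPT
    return (a, f, s, n, h)

samples = [sample_one() for _ in range(5000)]
```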

Challenge in sampling

How to draw sample from a given distribution?

Not all distributions can be trivially sampled

How to make better use of samples (not all samples are equally useful)?

How do we know we’ve sampled enough?

Sampling Methods

Direct Sampling

Simple

Works only for easy distributions

Rejection Sampling

Create samples like direct sampling

Only count samples consistent with given evidence

Importance Sampling

Create samples like direct sampling

Assign weights to samples

Gibbs Sampling

Often used for high-dimensional problems

Use a variable and its Markov blanket for sampling

Rejection sampling I

Want to sample from the distribution P(X) = (1/Z) f(X)

P(X) is difficult to sample, but f(X) is easy to evaluate

Key idea: sample from a simpler distribution Q(X), and then select samples

Rejection sampling (choose M s.t. f(X) ≤ M Q(X)):

i = 1

Repeat until i = N:

x ∼ Q(X), u ∼ U[0, 1]

If u ≤ f(x) / (M Q(x)), accept x and set i = i + 1; otherwise, reject x

A sample is rejected with probability 1 − f(x) / (M Q(x))
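A runnable sketch of the accept/reject loop. The unnormalized target f(x) = x(1 − x)^4 (a Beta(2, 5) shape whose constant Z the sampler never uses) and the uniform proposal are hypothetical choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Unnormalized target: Beta(2, 5) density up to its constant Z."""
    return x * (1 - x) ** 4

M = 0.082                        # envelope constant: f(x) <= M * Q(x), Q = U[0, 1]

accepted = []
while len(accepted) < 5000:
    x = rng.random()             # x ~ Q
    u = rng.random()             # u ~ U[0, 1]
    if u <= f(x) / M:            # accept with probability f(x) / (M Q(x))
        accepted.append(x)
accepted = np.array(accepted)
```

The acceptance rate is Z/M, so a loose envelope constant M wastes most of the draws, which is exactly the drawback discussed on the next slides.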

Rejection sampling II

Sample x ∼ Q(X) and reject with probability 1 − f(x) / (M Q(x))

[Figure: f(X) under the envelope M Q(X); the region between the two curves is the rejection region; x_1 ∼ Q(X), u_1 ∼ U[0, 1]]

The likelihood of seeing a particular x:

P(x) = [f(x) / (M Q(x))] Q(x) / ∫ [f(X) / (M Q(X))] Q(X) dX = (1/Z) f(x)

Drawback of rejection sampling

Can have a very small acceptance rate

[Figure: f(X) far below the envelope M Q(X), leaving a big rejection region]

Adaptive rejection sampling: use an envelope function for Q

Importance Sampling I

Suppose sampling from P(X) is hard

We can sample from a simpler proposal distribution Q(X)

If Q dominates P (i.e., Q(X) > 0 whenever P(X) > 0), we can sample from Q and reweight:

E_P[f(X)] = ∫ f(X) P(X) dX = ∫ f(X) (P(X) / Q(X)) Q(X) dX

Sample x^i ∼ Q(X)

Reweight sample x^i with w^i = P(x^i) / Q(x^i), and approximate the expectation with (1/N) Σ_i f(x^i) w^i
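A minimal sketch of the reweighting recipe; the target P = N(1, 1) and the broader proposal Q = N(0, 2) are made-up choices that satisfy the dominance condition:

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mean, std):
    return np.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))

# Target P = N(1, 1); proposal Q = N(0, 2) is positive everywhere, so Q dominates P
x = rng.normal(0.0, 2.0, size=50000)                      # x^i ~ Q
w = normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)     # w^i = P(x^i) / Q(x^i)

est_mean = np.mean(x * w)       # E_P[X] ≈ (1/N) sum_i f(x^i) w^i, true value 1
```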

Importance Sampling II

Instead of rejecting samples, reweight them

[Figure: target P(X) and proposal Q(X); samples x_1, x_2 ∼ Q(X) receive weights w^1 = P(x_1)/Q(x_1) and w^2 = P(x_2)/Q(x_2)]

The efficiency of importance sampling depends on how close the proposal Q is to the target P

Unnormalized importance sampling

Suppose we can only evaluate P(X) = (1/Z) f(X) up to the normalization Z (e.g., a grid model or MRF, where Z is difficult to compute)

We can get around the normalization Z by normalizing the importance weights r(X) = f(X) / Q(X):

E_Q[r(X)] = ∫ (f(X) / Q(X)) Q(X) dX = ∫ f(X) dX = Z

E_P[g(X)] = ∫ g(X) P(X) dX = (1/Z) ∫ g(X) (f(X) / Q(X)) Q(X) dX = ∫ g(X) r(X) Q(X) dX / ∫ r(X) Q(X) dX ≈ Σ_i g(x^i) r(x^i) / Σ_i r(x^i)

Define w^i = r(x^i) / Σ_i r(x^i) as the new importance weight
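The self-normalized estimator can be sketched as follows. Only f(x) = exp(−x²/2 + x) is evaluated; P ∝ f is a N(1, 1) shape with Z = e^{1/2}√(2π), which the estimator never uses. The proposal Q = N(0, 2) is again a toy choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Unnormalized target: proportional to a N(1, 1) density, Z unknown."""
    return np.exp(-x ** 2 / 2 + x)

def q_pdf(x):
    """Proposal density Q = N(0, 2)."""
    return np.exp(-x ** 2 / 8) / (2 * np.sqrt(2 * np.pi))

x = rng.normal(0.0, 2.0, size=50000)    # x^i ~ Q
r = f(x) / q_pdf(x)                     # unnormalized weights r(x^i)
w = r / r.sum()                         # normalized weights w^i

est_mean = np.sum(x * w)                # E_P[X] ≈ sum_i x^i w^i, true value 1
z_hat = r.mean()                        # E_Q[r] = Z, here Z = e^{1/2} sqrt(2 pi)
```

As the derivation above shows, the mean of the raw weights also recovers the partition function Z as a by-product.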

Difference between normalized and unnormalized

Unnormalized importance sampling is unbiased

Normalized importance sampling is biased, e.g., for N = 1:

E_Q[f(x^1) w^1 / w^1] = E_Q[f(x^1)] ≠ E_P[f(x^1)]

However, the normalized importance sampler usually has lower variance

But most importantly, the normalized importance sampler works for unnormalized distributions

With the normalized sampler, one can also resample based on w^i (a multinomial distribution with probability w^i for x^i)

Gibbs Sampling

Both rejection sampling and importance sampling do not scale well to high dimensions

Markov Chain Monte Carlo (MCMC) is an alternative

Key idea : Construct a Markov chain whose stationary distribution is the target distribution 𝑃 𝑋

Sampling process: random walk in the Markov chain

Gibbs sampling is a very special and simple MCMC method.


Markov Chain Monte Carlo

Want to sample from P(X); start with a random initial vector X

X^t: X at time step t

X^t transitions to X^{t+1} with probability Q(X^{t+1} | X^t, …, X^1) = T(X^{t+1} | X^t)

The stationary distribution of T(X^{t+1} | X^t) is our P(X)

Run for an initial M samples (burn-in time) until the chain converges/mixes/reaches the stationary distribution

Then collect N (correlated) samples as x^i

Key issues: designing the transition kernel, and diagnosing convergence

Gibbs Sampling

A very special transition kernel, which works nicely with the Markov blanket in GMs.

The procedure:

We have a set of variables X = {X_1, …, X_K} in a GM

At each step, one variable X_i is selected (at random or in some fixed sequence); denote the remaining variables as X_{−i} and their current values as x^t_{−i}

Compute the conditional distribution P(X_i | x^t_{−i})

Sample a value x^t_i from this distribution

This sample x^t_i replaces the previous sampled value of X_i in X

Gibbs Sampling in formula

Gibbs sampling:

X = x^0

For t = 1 to N:

x^t_1 ∼ P(X_1 | x^{t−1}_2, …, x^{t−1}_K)

x^t_2 ∼ P(X_2 | x^t_1, x^{t−1}_3, …, x^{t−1}_K)

…

x^t_K ∼ P(X_K | x^t_1, …, x^t_{K−1})

Variants: randomly pick the variable to sample; sample block by block

For graphical models: only need to condition on the variables in the Markov blanket

[Figure: X_1 with its Markov blanket X_2, X_3, X_4, X_5]

Gibbs Sampling: Examples

Pairwise Markov random field:

P(X_1, …, X_K) ∝ exp( Σ_{ij} θ_ij X_i X_j + Σ_i θ_i X_i )

In the Gibbs sampling step, we need the conditional P(X_1 | X_2, …, X_K):

P(X_1 | X_2, …, X_K) ∝ exp( θ_12 X_1 X_2 + θ_13 X_1 X_3 + θ_14 X_1 X_4 + θ_15 X_1 X_5 + θ_1 X_1 )

[Figure: MRF over X_1, …, X_7; the neighbors of X_1 are X_2, X_3, X_4, X_5]
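A sketch of a Gibbs sweep for this model with binary X_i ∈ {−1, +1}. For such variables the conditional above gives P(X_i = +1 | rest) = 1/(1 + exp(−2(Σ_j θ_ij x_j + θ_i))); the two-node parameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_ising(theta_pair, theta_node, sweeps=4000, burn_in=1000):
    """Gibbs sampling for a pairwise MRF with X_i in {-1, +1}.

    theta_pair: symmetric (n, n) matrix of theta_ij (zero diagonal)
    theta_node: length-n vector of theta_i
    Returns post-burn-in samples, one row per sweep.
    """
    n = len(theta_node)
    x = rng.choice([-1, 1], size=n)          # random initial state
    out = []
    for t in range(sweeps):
        for i in range(n):
            # Condition only on the Markov blanket: neighbors with theta_ij != 0
            field = theta_pair[i] @ x + theta_node[i]
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
        if t >= burn_in:                     # discard burn-in sweeps
            out.append(x.copy())
    return np.array(out)

theta_pair = np.array([[0.0, 0.5],
                       [0.5, 0.0]])          # toy coupled pair
theta_node = np.array([1.0, 0.0])
S = gibbs_ising(theta_pair, theta_node)
```

For this two-node toy model the exact marginal means can be computed by enumeration, which makes it a convenient correctness check for the sampler.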

Diagnose convergence

Good chain

[Figure: trace plot of sampled value vs. iteration number, fluctuating rapidly around a stable level]

Diagnose convergence

Bad chain

[Figure: trace plot of sampled value vs. iteration number, drifting slowly without settling]
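The visual check in these trace plots can be backed by a number. One standard diagnostic (not covered on the slides) is the Gelman–Rubin statistic, which compares between-chain and within-chain variance across several independent chains; values near 1 suggest the chains have mixed. A sketch with synthetic "good" and "bad" chains:

```python
import numpy as np

rng = np.random.default_rng(0)

def r_hat(chains):
    """Gelman-Rubin potential scale reduction factor.

    chains: (m, n) array holding m independent chains of n draws each.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # average within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)

# "Good" chains all explore the same N(0, 1) target
good = rng.standard_normal((4, 1000))
# "Bad" chains are stuck around different values, as in a non-mixing sampler
bad = good + 3.0 * np.arange(4)[:, None]
```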

Sampling Methods

Direct Sampling

Simple

Works only for easy distributions

Rejection Sampling

Create samples like direct sampling

Only count samples consistent with given evidence

Importance Sampling

Create samples like direct sampling

Assign weights to samples

Gibbs Sampling

Often used for high-dimensional problems

Use a variable and its Markov blanket for sampling
