INFERENCE IN BAYESIAN NETWORKS
AGENDA
- Reading off independence assumptions
- Efficient inference in Bayesian networks
  - Top-down inference
  - Variable elimination
  - Monte-Carlo methods
SOME APPLICATIONS OF BN
- Medical diagnosis
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding
MORE COMPLICATED SINGLY-CONNECTED BELIEF NET
[Figures: a car-diagnosis network over Battery, Radio, Gas, SparkPlugs, Starts, Moves; an image-understanding network with Region = {Sky, Tree, Grass, Rock} over regions R1-R4 linked by "Above" relations; a BN used to evaluate insurance risks]
BN FROM LAST LECTURE
[Figure: directed acyclic graph Burglary → Alarm ← Earthquake, Alarm → JohnCalls, Alarm → MaryCalls; arcs run from causes to effects]
- Intuitive meaning of an arc from x to y: "x has direct influence on y"
ARCS DO NOT NECESSARILY ENCODE CAUSALITY!
[Figure: two three-node networks over A, B, C with arcs oriented in opposite directions]
Two BNs that can encode the same joint probability distribution.
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: chain A → B → C]
- Given B, does the value of A affect the probability of C? (Is P(C|B,A) ≠ P(C|B)?)
- No! C's parent (B) is given, and so C is independent of its non-descendants (A): P(C|B,A) = P(C|B).
- Independence is symmetric: C ⊥ A | B => A ⊥ C | B
WHAT DOES THE BN ENCODE?
[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]
- Burglary ⊥ Earthquake
- JohnCalls ⊥ MaryCalls | Alarm
- JohnCalls ⊥ Burglary | Alarm
- JohnCalls ⊥ Earthquake | Alarm
- MaryCalls ⊥ Burglary | Alarm
- MaryCalls ⊥ Earthquake | Alarm
A node is independent of its non-descendants, given its parents.
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: alarm network as above]
How about Burglary ⊥ Earthquake | Alarm?
- No! Why?
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: alarm network as above]
How about Burglary ⊥ Earthquake | Alarm?
- No! Why?
- P(B,E|A) = P(A|B,E)P(B)P(E)/P(A) ≈ 0.00075
- P(B|A)P(E|A) ≈ 0.086, so the two are not equal
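To make this concrete, here is a minimal Python sketch (my variable names; CPT values taken from the alarm network on these slides) that reproduces both numbers:

```python
# Check numerically that Burglary and Earthquake are dependent given Alarm.
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

def pb(b): return P_B if b else 1 - P_B
def pe(e): return P_E if e else 1 - P_E

pa = sum(P_A[(b, e)] * pb(b) * pe(e) for b in (0, 1) for e in (0, 1))  # P(A=1)

p_be_a = P_A[(1, 1)] * P_B * P_E / pa                          # P(B,E | A)
p_b_a = sum(P_A[(1, e)] * P_B * pe(e) for e in (0, 1)) / pa    # P(B | A)
p_e_a = sum(P_A[(b, 1)] * pb(b) * P_E for b in (0, 1)) / pa    # P(E | A)

print(round(p_be_a, 6))          # ≈ 0.00075
print(round(p_b_a * p_e_a, 3))   # ≈ 0.086 → not equal, so dependent given A
```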
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: alarm network as above]
How about Burglary ⊥ Earthquake | JohnCalls?
- No! Why?
- Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent.
INDEPENDENCE RELATIONSHIPS
- Rough intuition (this holds for tree-like graphs, i.e., polytrees):
  - Evidence on the (directed) road between two variables makes them independent
  - Evidence on an "A" node makes descendants independent
  - Evidence on a "V" node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent)
- Formal property in the general case: d-separation => independence (see R&N)
BENEFITS OF SPARSE MODELS
- Modeling
  - Fewer relationships need to be encoded (either through understanding or statistics)
  - Large networks can be built up from smaller ones
- Intuition
  - Dependencies/independencies between variables can be inferred from the network structure
- Tractable inference
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm).

[Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls]

P(B) = 0.001    P(E) = 0.002

B E | P(A|B,E)      A | P(J|A)      A | P(M|A)
T T | 0.95          T | 0.90        T | 0.70
T F | 0.94          F | 0.05        F | 0.01
F T | 0.29
F F | 0.001
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm).
1. P(Alarm) = Σb,e P(A,b,e)
2. P(Alarm) = Σb,e P(A|b,e) P(b) P(e)
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm).
1. P(Alarm) = Σb,e P(A,b,e)
2. P(Alarm) = Σb,e P(A|b,e) P(b) P(e)
3. P(Alarm) = P(A|B,E)P(B)P(E) +
              P(A|B,¬E)P(B)P(¬E) +
              P(A|¬B,E)P(¬B)P(E) +
              P(A|¬B,¬E)P(¬B)P(¬E)
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm).
1. P(A) = Σb,e P(A,b,e)
2. P(A) = Σb,e P(A|b,e) P(b) P(e)
3. P(A) = P(A|B,E)P(B)P(E) + P(A|B,¬E)P(B)P(¬E) + P(A|¬B,E)P(¬B)P(E) + P(A|¬B,¬E)P(¬B)P(¬E)
4. P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998
        = 0.00252
[Alarm network and CPTs as above]
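The same computation as a minimal Python sketch, assuming only the CPT values shown above:

```python
# P(Alarm) by summing out Burglary and Earthquake (steps 2-4 above).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

p_alarm = sum(P_A[(b, e)]
              * (P_B if b else 1 - P_B)
              * (P_E if e else 1 - P_E)
              for b in (0, 1) for e in (0, 1))
print(round(p_alarm, 5))   # 0.00252
```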
TOP-DOWN INFERENCE
Now, suppose we want to compute P(MaryCalls).
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE
Now, suppose we want to compute P(MaryCalls).
1. P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE
Now, suppose we want to compute P(MaryCalls).
1. P(M) = P(M|A)P(A) + P(M|¬A)P(¬A)
2. P(M) = 0.70*0.00252 + 0.01*(1 - 0.00252)
        = 0.0117
[Alarm network and CPTs as above]
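A quick standalone check of this arithmetic (0.00252 is the P(Alarm) computed on the previous slides):

```python
# P(MaryCalls) by conditioning on Alarm.
p_alarm = 0.00252                                # P(Alarm), computed above
p_mary = 0.70 * p_alarm + 0.01 * (1 - p_alarm)   # P(M|A)P(A) + P(M|¬A)P(¬A)
print(round(p_mary, 4))                          # 0.0117
```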
TOP-DOWN INFERENCE WITH EVIDENCE
Suppose we want to compute P(Alarm|Earthquake).
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE WITH EVIDENCE
Suppose we want to compute P(A|e).
1. P(A|e) = Σb P(A,b|e)
2. P(A|e) = Σb P(A|b,e) P(b)
[Alarm network and CPTs as above]
TOP-DOWN INFERENCE WITH EVIDENCE
Suppose we want to compute P(A|e).
1. P(A|e) = Σb P(A,b|e)
2. P(A|e) = Σb P(A|b,e) P(b)
3. P(A|e) = 0.95*0.001 + 0.29*0.999
          = 0.29066
[Alarm network and CPTs as above]
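And a one-line check of step 3 (standalone, with CPT values from the slide):

```python
# P(Alarm | Earthquake=true): only Burglary is summed out, since E is given.
p_a_given_e = 0.95 * 0.001 + 0.29 * 0.999
print(round(p_a_given_e, 5))   # 0.29066
```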
TOP-DOWN INFERENCE
- Only works if the graph of ancestors of the query variable is a polytree
- Evidence given on ancestor(s) of the query variable
- Efficient:
  - O(d·2^k) time, where d is the number of ancestors of the variable and k is a bound on the number of parents
  - Evidence on an ancestor cuts off the influence of the portion of the graph above the evidence node
QUERYING THE BN
- The BN gives P(T|C)
- What about P(C|T)?
[Figure: Cavity → Toothache; P(C) = 0.1; C | P(T|C): T 0.4, F 0.01111]
BAYES' RULE
- P(A,B) = P(A|B) P(B)
         = P(B|A) P(A)
- So… P(A|B) = P(B|A) P(A) / P(B)
APPLYING BAYES' RULE
- Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
- What's P(B)?
APPLYING BAYES' RULE
- Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
- What's P(B)?
  - P(B) = Σa P(B, A=a)            [marginalization]
  - P(B, A=a) = P(B|A=a) P(A=a)    [conditional probability]
  - So, P(B) = Σa P(B|A=a) P(A=a)
APPLYING BAYES' RULE
- Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
- What's P(A|B)?
APPLYING BAYES' RULE
- Let A be a cause, B be an effect, and let's say we know P(B|A) and P(A) (conditional probability tables)
- What's P(A|B)?
  - P(A|B) = P(B|A) P(A) / P(B)    [Bayes' rule]
  - P(B) = Σa P(B|A=a) P(A=a)      [last slide]
  - So, P(A|B) = P(B|A) P(A) / [Σa P(B|A=a) P(A=a)]
HOW DO WE READ THIS?
- P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
- [An equation that holds for all values A can take on, and all values B can take on]
- P(A=a|B=b) = ?
HOW DO WE READ THIS?
- P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
- [An equation that holds for all values A can take on, and all values B can take on]
- P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)]
- Are these the same a?
HOW DO WE READ THIS?
- P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
- [An equation that holds for all values A can take on, and all values B can take on]
- P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa P(B=b|A=a) P(A=a)]
- Are these the same a? NO!
HOW DO WE READ THIS?
- P(A|B) = P(B|A)P(A) / [Σa P(B|A=a) P(A=a)]
- [An equation that holds for all values A can take on, and all values B can take on]
- P(A=a|B=b) = P(B=b|A=a)P(A=a) / [Σa' P(B=b|A=a') P(A=a')]
- Be careful about indices!
QUERYING THE BN
[Figure: Cavity → Toothache; P(C) = 0.1; C | P(T|C): T 0.4, F 0.01111]
- The BN gives P(T|C). What about P(C|T)?
- P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)   [Bayes' rule]
- Denominator computed by summing out the numerator over Cavity and ¬Cavity
- Querying a BN is just applying Bayes' rule on a larger scale…
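For concreteness, a minimal sketch of this query on the Cavity network (CPT values from the slide; the variable names are mine):

```python
# P(Cavity | Toothache) by Bayes' rule.
p_c = 0.1                              # P(Cavity)
p_t_c, p_t_nc = 0.4, 0.01111           # P(T | Cavity), P(T | ¬Cavity)

p_t = p_t_c * p_c + p_t_nc * (1 - p_c)   # denominator: sum out Cavity
print(round(p_t, 3))                     # 0.05
print(round(p_t_c * p_c / p_t, 3))       # 0.8 = P(Cavity | Toothache)
```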
PERFORMING INFERENCE
- Variables X
- Evidence set E=e, query variable Q
- Want to compute the posterior probability distribution over Q, given E=e
- Let the non-evidence variables be Y (= X \ E)
- Straightforward method:
  1. Compute the joint P(Y, E=e)
  2. Marginalize to get P(Q, E=e)
  3. Divide by P(E=e) to get P(Q|E=e)
INFERENCE IN THE ALARM EXAMPLE
P(J|M) = ??
[Alarm network and CPTs as above; evidence E=e: MaryCalls, query Q: JohnCalls]
INFERENCE IN THE ALARM EXAMPLE
P(J|MaryCalls) = ??
1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
   [2^4 entries → a full joint distribution table; uses P(x1,x2,…,xn) = Π i=1,…,n P(xi | parents(Xi))]
[Alarm network and CPTs as above]
INFERENCE IN THE ALARM EXAMPLE
P(J|MaryCalls) = ??
1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J,MaryCalls) = Σa,b,e P(J, A=a, B=b, E=e, MaryCalls)
   [2 entries: one for JohnCalls, the other for ¬JohnCalls]
[Alarm network and CPTs as above]
INFERENCE IN THE ALARM EXAMPLE
P(J|MaryCalls) = ??
1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
2. P(J,MaryCalls) = Σa,b,e P(J, A=a, B=b, E=e, MaryCalls)
3. P(J|MaryCalls) = P(J,MaryCalls) / P(MaryCalls)
                  = P(J,MaryCalls) / (Σj P(j,MaryCalls))
[Alarm network and CPTs as above]
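Here is the whole three-step (full-joint) method as a Python sketch, assuming the CPT values from the slides; the helper names are mine, not part of the lecture:

```python
# Straightforward full-joint computation of P(JohnCalls=1 | MaryCalls=1).
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                         # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}                                         # P(M=1 | A)

def joint(b, e, a, j, m):
    """Step 1: P(b,e,a,j,m) = P(j|a) P(m|a) P(a|b,e) P(b) P(e)."""
    p = P_B if b else 1 - P_B
    p *= P_E if e else 1 - P_E
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Step 2: marginalize out A, B, E, keeping MaryCalls fixed to 1.
p_jm = {j: sum(joint(b, e, a, j, 1) for b, e, a in product((0, 1), repeat=3))
        for j in (0, 1)}

# Step 3: normalize by P(MaryCalls=1).
print(p_jm[1] / (p_jm[0] + p_jm[1]))   # P(J=1 | MaryCalls=1) ≈ 0.178
```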
HOW EXPENSIVE?
P(X) = P(x1,x2,…,xn) = Π i=1,…,n P(xi | parents(Xi))
Straightforward method:
1. Use the above to compute P(Y, E=e)
2. P(Q, E=e) = Σy1 … Σyk P(Y, E=e)
3. P(E=e) = Σq P(Q, E=e)   [normalization factor – no big deal once we have P(Q, E=e)]
- Step 1: O(2^(n−|E|)) entries!
- Can we do better?
VARIABLE ELIMINATION
- Consider the linear network X1 → X2 → X3
- P(X) = P(X1) P(X2|X1) P(X3|X2)
- P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
VARIABLE ELIMINATION
- Consider the linear network X1 → X2 → X3
- P(X) = P(X1) P(X2|X1) P(X3|X2)
- P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
        = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1)   [rearrange the equation…]
VARIABLE ELIMINATION
- Consider the linear network X1 → X2 → X3
- P(X) = P(X1) P(X2|X1) P(X3|X2)
- P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
        = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1)
        = Σx2 P(X3|x2) P(x2)
  [P(x2) is computed for each value of X2; cache it for both values of X3!]
VARIABLE ELIMINATION
- Consider the linear network X1 → X2 → X3
- P(X) = P(X1) P(X2|X1) P(X3|X2)
- P(X3) = Σx1 Σx2 P(x1) P(x2|x1) P(X3|x2)
        = Σx2 P(X3|x2) Σx1 P(x1) P(x2|x1)
        = Σx2 P(X3|x2) P(x2)
- How many * and + saved?
  *: 2*4*2 = 16 vs. 4+4 = 8
  +: 2*3 = 6 vs. 2+2 = 4
- Can lead to huge gains in larger networks
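A minimal sketch of this elimination on the chain. The slide gives no CPT numbers, so the values below are made up purely for illustration:

```python
# Variable elimination on the chain X1 → X2 → X3 (hypothetical CPT values).
P_X1 = {0: 0.6, 1: 0.4}                              # P(X1)
P_X2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # P_X2[x1][x2] = P(X2=x2 | X1=x1)
P_X3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}    # P_X3[x2][x3] = P(X3=x3 | X2=x2)

# Eliminate X1: cache the factor P(x2) = Σ_x1 P(x1) P(x2|x1), once per value of X2.
f_X2 = {x2: sum(P_X1[x1] * P_X2[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)}

# Eliminate X2: P(X3=x3) = Σ_x2 P(x3|x2) P(x2), reusing the cached factor.
p_X3 = {x3: sum(P_X3[x2][x3] * f_X2[x2] for x2 in (0, 1)) for x3 in (0, 1)}
print(p_X3)   # ≈ {0: 0.7, 1: 0.3} with these made-up numbers; sums to 1
```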
VE IN ALARM EXAMPLE
- P(E|j,m) = P(E,j,m) / P(j,m)
- P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
VE IN ALARM EXAMPLE
- P(E|j,m) = P(E,j,m) / P(j,m)
- P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
           = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)
VE IN ALARM EXAMPLE
- P(E|j,m) = P(E,j,m) / P(j,m)
- P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
           = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)
           = P(E) Σb P(b) P(j,m|E,b)   [computed for all values of E, b]
VE IN ALARM EXAMPLE
- P(E|j,m) = P(E,j,m) / P(j,m)
- P(E,j,m) = Σa Σb P(E) P(b) P(a|E,b) P(j|a) P(m|a)
           = P(E) Σb P(b) Σa P(a|E,b) P(j|a) P(m|a)
           = P(E) Σb P(b) P(j,m|E,b)
           = P(E) P(j,m|E)   [computed for all values of E]
WHAT ORDER TO PERFORM VE?
- For tree-like BNs (polytrees), order the variables so that parents come before children
  - Each intermediate probability table then has at most 2^(# of parents of a node) entries
  - If the number of parents of a node is bounded, VE runs in linear time!
- Other networks: intermediate factors may become large
NON-POLYTREE NETWORKS
[Figure: diamond network A → B, A → C, B → D, C → D]
- P(D) = Σa Σb Σc P(a) P(b|a) P(c|a) P(D|b,c)
       = Σb Σc P(D|b,c) Σa P(a) P(b|a) P(c|a)
- No more simplifications…
APPROXIMATE INFERENCE TECHNIQUES
- Based on the idea of Monte Carlo simulation
- Basic idea: to estimate the probability that a coin flips heads, I can flip it a huge number of times and count the fraction of heads observed
- Conditional simulation: to estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
  1. Pick a coin C out of B (occurs with probability P(C))
  2. Flip C and observe whether it comes up heads (occurs with probability P(H|C))
  3. Put C back and repeat from step 1 many times
  4. Return the fraction of heads observed (an estimate of P(H))
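As a toy illustration of conditional simulation, here is the bucket example in Python; the bucket contents are hypothetical:

```python
# Toy conditional simulation: estimate P(H) for a coin drawn from a bucket.
import random

bucket = [0.5, 0.6, 0.9]     # P(heads) of each coin -- made-up values
n = 100_000
heads = 0
for _ in range(n):
    coin = random.choice(bucket)       # step 1: pick C with probability P(C)
    heads += random.random() < coin    # step 2: flip, heads with prob P(H|C)
print(heads / n)                       # step 4: fraction of heads, ≈ 2/3 here
```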
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
- Sample from the joint distribution
Sample: B=0, E=0, A=0, J=1, M=0
[Alarm network and CPTs as above]
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
- As more samples are generated, the distribution of the samples approaches the joint distribution!
Samples:
  B=0, E=0, A=0, J=1, M=0
  B=0, E=0, A=0, J=0, M=0
  B=0, E=0, A=0, J=0, M=0
  B=1, E=0, A=1, J=1, M=0
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
- Inference: given evidence E=e (e.g., J=1)
- Remove the samples that conflict with the evidence
Samples:
  B=0, E=0, A=0, J=1, M=0   [kept]
  B=0, E=0, A=0, J=0, M=0   [removed]
  B=0, E=0, A=0, J=0, M=0   [removed]
  B=1, E=0, A=1, J=1, M=0   [kept]
- The distribution of the remaining samples approximates the conditional distribution!
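A sketch of this rejection-sampling scheme on the alarm network (CPT values from the slides; the choice of query, P(Burglary | JohnCalls=1), is mine):

```python
# Rejection sampling on the alarm network.
import random

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}                                         # P(J=1 | A)
P_M = {1: 0.70, 0: 0.01}                                         # P(M=1 | A)

def sample():
    """Draw one sample of (B,E,A,J,M) from the joint, parents before children."""
    b = random.random() < P_B
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

kept = hits = 0
for _ in range(1_000_000):
    b, e, a, j, m = sample()
    if not j:              # reject samples that conflict with the evidence J=1
        continue
    kept += 1
    hits += b
print(hits / kept)         # estimate of P(B=1 | J=1); exact value ≈ 0.016
```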
HOW MANY SAMPLES?
- The error of the estimate, for n samples, is O(1/√n) on average
- Variance-reduction techniques exist
RARE EVENT PROBLEM
- What if some events are really rare (e.g., burglary ∧ earthquake)?
- The number of samples must be huge to get a reasonable estimate
- Solution: likelihood weighting
  - Enforce that each sample agrees with the evidence
  - While generating a sample, keep track of the ratio
    (how likely the sampled value is to occur in the real world) /
    (how likely you were to generate the sampled value)
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
w = 1
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample so far: B=0, E=1; w = 0.008
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample so far: B=0, E=1, A=1; w = 0.0023
[A=1 is enforced, and the weight is updated to reflect the likelihood that this occurs]
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample: B=0, E=1, A=1, M=1, J=1; w = 0.0016
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
New sample: B=0, E=0; w = 3.988
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample so far: B=0, E=0, A=1; w = 0.004
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample: B=0, E=0, A=1, M=1, J=1; w = 0.0028
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample so far: B=1, E=0, A=1; w = 0.00375
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample: B=1, E=0, A=1, M=1, J=1; w = 0.0026
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Sample: B=1, E=1, A=1, M=1, J=1; w ≈ 5e-6 (≈ 0)
[Alarm network and CPTs as above]
LIKELIHOOD WEIGHTING
- Suppose the evidence is Alarm ∧ MaryCalls
- Sample B, E with P = 0.5
Samples:
  B=0, E=1, A=1, M=1, J=1: w = 0.0016
  B=0, E=0, A=1, M=1, J=1: w = 0.0028
  B=1, E=0, A=1, M=1, J=1: w = 0.0026
  B=1, E=1, A=1, M=1, J=1: w ≈ 0
- N = 4 samples give P(B|A,M) ≈ 0.371
- Exact inference gives P(B|A,M) ≈ 0.373
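The weighting scheme from these slides as a Python sketch (B and E drawn with probability 0.5, A and M forced to 1; the helper names are mine):

```python
# Likelihood weighting for P(Burglary | Alarm=1, MaryCalls=1).
import random

P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1 | B, E)

def weighted_sample():
    w = 1.0
    b = random.random() < 0.5
    w *= (P_B if b else 1 - P_B) / 0.5   # real-world likelihood / sampling likelihood
    e = random.random() < 0.5
    w *= (P_E if e else 1 - P_E) / 0.5
    w *= P_A[(b, e)]                     # Alarm forced to 1: weight by P(A=1|b,e)
    w *= 0.70                            # MaryCalls forced to 1: weight by P(M=1|A=1)
    return b, w

num = den = 0.0
for _ in range(100_000):
    b, w = weighted_sample()
    num += w * b
    den += w
print(num / den)   # estimate of P(B=1 | A=1, M=1); exact value ≈ 0.373
```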
RECAP
- Efficient inference in BNs
- Variable elimination
- Approximate methods: Monte-Carlo sampling
NEXT LECTURE
- Statistical learning: from data to distributions
- R&N 20.1-2