
Bayesian Networks
Tamara Berg
CS 560 Artificial Intelligence
Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Andrew Moore, Percy Liang, Luke Zettlemoyer, Rob Pless, Kilian Weinberger, Deva Ramanan
Announcements
• HW2 due tomorrow, 11:59pm
• Mid-term was handed back last week (see
me if you haven’t collected yours)
• No class Monday (University Day)
Review: Probability
• Random variables, events
• Axioms of probability
• Atomic events
• Joint and marginal probability distributions
• Conditional probability distributions
• Product rule
• Independence and conditional independence
• Inference
Bayesian decision making
• Suppose the agent has to make decisions about
the value of an unobserved query variable X
based on the values of an observed evidence
variable E
• Inference problem: given some evidence E = e,
what is P(X | e)?
• Learning problem: estimate the parameters of
the probabilistic model P(X | E) given training
samples {(x1,e1), …, (xn,en)}
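The inference problem above — computing P(X | e) from a model — can be sketched numerically with Bayes' rule, P(X | e) ∝ P(e | X) P(X). The two classes and all probabilities below are illustrative assumptions, not values from the slides:

```python
# Toy inference P(X | e) via Bayes' rule: P(X | e) ∝ P(e | X) P(X).
# The classes and numbers below are made-up illustrations.
prior = {"spam": 0.3, "ham": 0.7}        # P(X)
likelihood = {"spam": 0.8, "ham": 0.1}   # P(e | X) for the observed evidence e

unnorm = {x: likelihood[x] * prior[x] for x in prior}
z = sum(unnorm.values())                 # normalizing constant P(e)
posterior = {x: p / z for x, p in unnorm.items()}
print(posterior)
```

The normalization step divides out P(e), so the posterior sums to 1 over the classes.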
Bayesian networks (BNs)
• A type of graphical model
• A BN states conditional independence relationships between random variables
• Compact specification of full joint distributions
Random Variables
Random variables X = {X1, X2, X3, ..., Xn}
Let xi be a realization of Xi
A random variable is some aspect of the world about which
we (may) have uncertainty.
Random variables can be:
• Binary (e.g. {true, false}, {spam, ham}),
• Discrete, taking values from a finite set
  (e.g. {Spring, Summer, Fall, Winter}),
• Or continuous (e.g. [0, 1]).
Joint Probability Distribution
Random variables X = {X1, X2, X3, ..., Xn}
Let xi be a realization of Xi
The joint probability distribution specifies a
probability for every atomic event:
P(X1 = x1, X2 = x2, X3 = x3, ..., Xn = xn)
Also written p(x1, x2, x3, ..., xn)
Queries
The joint probability distribution specifies a
probability for every atomic event:
P(X1 = x1, X2 = x2, X3 = x3, ..., Xn = xn)
Also written p(x1, x2, x3, ..., xn)
Given a joint distribution, we can reason about unobserved
variables given observations (evidence):
P(Xq | xe1, ..., xek)
Xq: stuff you care about; xe1, ..., xek: stuff you already know
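Inference by enumeration over a full joint table works exactly as described: fix the evidence, sum out the hidden variables, and normalize. A minimal sketch over three binary variables — the joint values are made-up assumptions that sum to 1:

```python
import itertools

# Variables: Q (query), E (evidence), H (hidden, summed out).
# The joint probabilities below are illustrative assumptions.
joint = {}
vals = [0.06, 0.14, 0.04, 0.16, 0.09, 0.21, 0.06, 0.24]  # sums to 1
for p, (q, e, h) in zip(vals, itertools.product([0, 1], repeat=3)):
    joint[(q, e, h)] = p

def query(e):
    """P(Q | E = e): sum out H, then normalize over Q."""
    unnorm = [sum(joint[(q, e, h)] for h in (0, 1)) for q in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

print(query(1))
```

This is exponential in the number of hidden variables, which is exactly the cost that the graphical-model representations below help avoid.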
Representation
The joint probability distribution specifies a
probability for every atomic event:
P(X1 = x1, X2 = x2, X3 = x3, ..., Xn = xn)
Also written p(x1, x2, x3, ..., xn)
One way to represent the joint probability distribution for discrete Xi
is as an n-dimensional table, each cell containing the probability
for one setting of X. This table has r^n entries if each Xi ranges
over r values.
Graphical Models!
Representation
Joint Probability Distribution:
P(X1 = x1, X2 = x2, X3 = x3, ..., Xn = xn)
Also written p(x1, x2, x3, ..., xn)
Graphical models represent joint probability
distributions more economically, using a set of
“local” relationships among variables.
Graphical Models
Graphical models offer several useful properties:
1. They provide a simple way to visualize the
structure of a probabilistic model and can be
used to design and motivate new models.
2. Insights into the properties of the model, including
conditional independence properties, can be
obtained by inspection of the graph.
3. Complex computations, required to perform
inference and learning in sophisticated models,
can be expressed in terms of graphical
manipulations.
from Chris Bishop
Main kinds of models
• Undirected (also called Markov Random Fields):
  links express constraints between variables.
  A — B
• Directed (also called Bayesian Networks): have
  a notion of causality; one can regard an arc
  from A to B as indicating that A "causes" B.
  A → B
Syntax
• Directed Acyclic Graph (DAG)
• Nodes: random variables
  – Can be assigned/observed (shaded)
    or unassigned/unobserved (unshaded)
• Arcs: interactions
  – An arrow from one variable to another indicates
    direct influence
  – Encode independence/conditional independence
• Example network: Weather; Cavity → Toothache, Cavity → Catch
  – Weather is independent of the other variables
  – Toothache and Catch are conditionally independent
    given Cavity
Example: N independent coin flips
• Complete independence: no interactions
  X1   X2   …   Xn   (no edges)
Example: Naïve Bayes document model
Random variables:
• X: document class
• W1, …, Wn: words in the document
Graph: X → W1, X → W2, …, X → Wn
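In this model the joint factors as p(x, w1, ..., wn) = p(x) ∏ᵢ p(wi | x). A minimal sketch of scoring a document under this factorization — the two classes and all probabilities are made-up assumptions, not values from the slides:

```python
# Naïve Bayes joint: p(x, w1..wn) = p(x) * prod_i p(wi | x).
# Classes and numbers below are illustrative assumptions.
prior = {"sports": 0.5, "politics": 0.5}
word_probs = {
    "sports":   {"ball": 0.3,  "game": 0.3, "vote": 0.05},
    "politics": {"ball": 0.05, "game": 0.1, "vote": 0.4},
}

def joint(x, words):
    """p(x, w1..wn) under the naive Bayes factorization."""
    p = prior[x]
    for w in words:
        p *= word_probs[x][w]
    return p

doc = ["ball", "game"]
scores = {x: joint(x, doc) for x in prior}
z = sum(scores.values())
posterior = {x: s / z for x, s in scores.items()}
print(posterior)
```

Classifying the document means picking the class with the highest posterior; for this toy document "sports" wins easily.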
Example: Burglar Alarm
I have a burglar alarm that is sometimes set off by minor
earthquakes. My two neighbors, John and Mary, promise
to call me at work if they hear the alarm.
• Example inference task: suppose Mary calls and John doesn't call.
  What is the probability of a burglary?
• What are the random variables?
  – Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• What are the direct influence relationships?
  – A burglar can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call
Example: Burglar Alarm
What does this mean? What are the model parameters?
Bayes Nets
Directed Graph, G = (X, E)
• Nodes: X = {X1, X2, X3, ..., Xn}
• Edges: E = {(Xi, Xj) : i ≠ j}
Each node is associated with a random variable.
Example
Nodes: X1, X2, X3, X4, X5, X6 (graph as in the figure)
Joint Distribution:
p(x1, x2, x3, x4, x5, x6) =
p(x1) p(x2 | x1) p(x3 | x1, x2) p(x4 | x1, x2, x3) p(x5 | x1, x2, x3, x4) p(x6 | x1, x2, x3, x4, x5)
By the chain rule (using the usual arithmetic ordering)
Directed Graphical Models
Directed Graph, G = (X, E)
• Nodes: X = {X1, X2, X3, ..., Xn}
• Edges: E = {(Xi, Xj) : i ≠ j}
Each node is associated with a random variable.
Definition of joint probability in a graphical model:
p(x1, ..., xn) = ∏ᵢ₌₁ⁿ p(xi | xπi), where πi are the parents of xi
Example
Graph: X1 → X2, X1 → X3, X2 → X4, X3 → X5, X2 → X6, X5 → X6
Joint Probability:
p(x1, x2, x3, x4, x5, x6) =
p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5)
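This factorization can be evaluated directly as a product of CPT lookups. A sketch for the six-node example, with all variables binary; the CPT numbers are made-up assumptions (only the factorization structure comes from the slide):

```python
import itertools

# p(x1..x6) = p(x1) p(x2|x1) p(x3|x1) p(x4|x2) p(x5|x3) p(x6|x2,x5)
# All CPT entries below are illustrative assumptions.
p1 = {0: 0.6, 1: 0.4}                                    # p(x1)
p2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}          # p(x2 | x1)
p3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.1, 1: 0.9}}          # p(x3 | x1)
p4 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}          # p(x4 | x2)
p5 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}          # p(x5 | x3)
p6 = {(a, b): {0: 0.5, 1: 0.5}                           # p(x6 | x2, x5)
      for a in (0, 1) for b in (0, 1)}

def joint(x1, x2, x3, x4, x5, x6):
    return (p1[x1] * p2[x1][x2] * p3[x1][x3] *
            p4[x2][x4] * p5[x3][x5] * p6[(x2, x5)][x6])

# Sanity check: a product of normalized CPTs sums to 1 over all 2^6 events.
total = sum(joint(*xs) for xs in itertools.product((0, 1), repeat=6))
print(total)
```

Because each local conditional is normalized, the product is automatically a valid joint distribution — no global normalization is needed.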
Conditional Independence
Independence:
p(xA, xB) = p(xA) p(xB)
Conditional Independence:
p(xA, xC | xB) = p(xA | xB) p(xC | xB)
Or, equivalently:
p(xA | xB, xC) = p(xA | xB)
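The definition can be checked numerically. A sketch on a chain A → B → C (assumed CPT numbers), where A ⟂ C | B holds by construction, so p(a, c | b) = p(a | b) p(c | b) for every assignment:

```python
# Chain A -> B -> C with illustrative CPTs; verify p(a,c|b) = p(a|b) p(c|b).
pA = {0: 0.3, 1: 0.7}
pB_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
pC_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}

def p(a, b, c):
    """Joint from the chain factorization p(a) p(b|a) p(c|b)."""
    return pA[a] * pB_A[a][b] * pC_B[b][c]

def p_b(b):
    return sum(p(x, b, y) for x in (0, 1) for y in (0, 1))

def cond_joint(a, c, b):        # p(a, c | b)
    return p(a, b, c) / p_b(b)

def cond_a(a, b):               # p(a | b)
    return sum(p(a, b, y) for y in (0, 1)) / p_b(b)

def cond_c(c, b):               # p(c | b)
    return sum(p(x, b, c) for x in (0, 1)) / p_b(b)

ok = all(abs(cond_joint(a, c, b) - cond_a(a, b) * cond_c(c, b)) < 1e-12
         for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(ok)
```

Note that the same check with the marginals p(a) and p(c) would fail: A and C are dependent until B is observed.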
Conditional Independence
By the chain rule (using the usual arithmetic ordering):
p(x1, x2, x3, x4, x5, x6) =
p(x1) p(x2 | x1) p(x3 | x1, x2) p(x4 | x1, x2, x3) p(x5 | x1, x2, x3, x4) p(x6 | x1, x2, x3, x4, x5)
Joint distribution from the example graph:
p(x1) p(x2 | x1) p(x3 | x1) p(x4 | x2) p(x5 | x3) p(x6 | x2, x5)
Missing variables in the local conditional probability functions
correspond to missing edges in the underlying graph.
Removing an edge into node i eliminates an argument from the
conditional probability factor p(xi | x1, x2, ..., xi-1).
Semantics
• A BN represents a full joint distribution in a compact way.
• We only need to specify a conditional probability
  distribution for each node given its parents:
  P(X | Parents(X))
Example: a node X with parents Z1, Z2, …, Zn requires the
table P(X | Z1, …, Zn).
Example
[Figure: the six-node network X1–X6 from the previous slides, with a
conditional probability table attached to each node; the table entries
are not recoverable from this extraction.]
Example: Alarm Network
Graph: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

B    P(B)           E    P(E)
+b   0.001          +e   0.002
¬b   0.999          ¬e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   ¬a   0.05
+b   ¬e   +a   0.94
+b   ¬e   ¬a   0.06
¬b   +e   +a   0.29
¬b   +e   ¬a   0.71
¬b   ¬e   +a   0.001
¬b   ¬e   ¬a   0.999

A    J    P(J|A)         A    M    P(M|A)
+a   +j   0.9            +a   +m   0.7
+a   ¬j   0.1            +a   ¬m   0.3
¬a   +j   0.05           ¬a   +m   0.01
¬a   ¬j   0.95           ¬a   ¬m   0.99
Size of a Bayes' Net
• How big is a joint distribution over N Boolean variables?
  2^N
• How big is an N-node net if nodes have up to k parents?
  O(N · 2^(k+1))
• Both give you the power to calculate the full joint P(X1, ..., XN)
• BNs: Huge space savings!
• Easier to elicit local CPTs
• Also faster to answer queries
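The size comparison above is easy to make concrete. A sketch for N = 30 Boolean variables with at most k = 3 parents per node:

```python
# Full joint over N Boolean variables: 2^N entries.
# Bayes net with at most k parents per node: O(N * 2^(k+1)) entries
# (each node's CPT has 2^k parent settings x 2 values).
def joint_size(n):
    return 2 ** n

def bn_size(n, k):
    return n * 2 ** (k + 1)

n, k = 30, 3
print(joint_size(n))   # 1073741824 (over a billion entries)
print(bn_size(n, k))   # 480
```

Over a billion entries for the full table versus a few hundred for the network: that is the "huge space savings" the slide refers to.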
The joint probability distribution
• For example, P(j, m, a, ¬b, ¬e)
  = P(¬b) P(¬e) P(a | ¬b, ¬e) P(j | a) P(m | a)
  = 0.999 × 0.998 × 0.001 × 0.9 × 0.7 ≈ 6.28 × 10⁻⁴
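Plugging the alarm network's CPT values into this factorization can be sketched as follows (a minimal evaluation, storing only P(+a | B, E), P(+j | A), P(+m | A) and complementing for the negative cases):

```python
# Alarm network CPTs from the slides.
P_b = {True: 0.001, False: 0.999}
P_e = {True: 0.002, False: 0.998}
P_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(+a | B, E)
P_j = {True: 0.9, False: 0.05}                       # P(+j | A)
P_m = {True: 0.7, False: 0.01}                       # P(+m | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pj = P_j[a] if j else 1 - P_j[a]
    pm = P_m[a] if m else 1 - P_m[a]
    return P_b[b] * P_e[e] * pa * pj * pm

p = joint(b=False, e=False, a=True, j=True, m=True)
print(p)  # 0.999 * 0.998 * 0.001 * 0.9 * 0.7 ≈ 0.000628
```

Summing this function over all 2⁵ assignments gives 1, and conditional queries like P(b | j, m) can be answered by enumerating and normalizing.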
Independence in a BN
• Important question about a BN:
  – Are two nodes independent given some evidence?
  – If yes, can prove using algebra (tedious in general)
  – If no, can prove with a counterexample
• Example: X → Y → Z
  – Question: are X and Z necessarily independent?
  – Answer: no. Example: low pressure causes rain, which
    causes traffic.
  – X can influence Z, Z can influence X (via Y)
  – Addendum: they could be independent: how?
Independence
Key properties:
a) Each node is conditionally independent of its non-descendants given its
   parents
b) A node is conditionally independent of all other nodes in the graph given its
   Markov blanket (its parents, children, and children's other parents)
Moral Graphs
Equivalent undirected form of a directed acyclic graph
Causal Chains
• This configuration is a "causal chain": X → Y → Z
  X: Project due    Y: No office hours    Z: Students panic
  – Is Z independent of X given Y? Yes!
  – Evidence along the chain "blocks" the influence
Common Cause
• Another basic configuration: two effects of the same cause: X ← Y → Z
  Y: Homework due    X: Full attendance    Z: Students sleepy
  – Are X and Z independent? No
  – Are X and Z independent given Y? Yes!
  – Observing the cause blocks influence between effects.
Common Effect
• Last configuration: two causes of one effect (v-structures): X → Y ← Z
  X: Raining    Z: Ballgame    Y: Traffic
  – Are X and Z independent?
    • Yes: the ballgame and the rain cause traffic,
      but they are not correlated
    • Still need to prove they must be (try it!)
  – Are X and Z independent given Y?
    • No: seeing traffic puts the rain and the
      ballgame in competition as explanations
  – This is backwards from the other cases
    • Observing an effect activates influence
      between possible causes.
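This "explaining away" effect can be demonstrated by enumeration. A sketch of the rain/ballgame/traffic v-structure with assumed CPT numbers: X and Z are independent a priori, but once traffic is observed, also learning there is a ballgame makes rain less likely.

```python
import itertools

# V-structure X -> Y <- Z: rain -> traffic <- ballgame.
# CPT numbers are illustrative assumptions.
pX = {True: 0.3, False: 0.7}                         # P(rain)
pZ = {True: 0.2, False: 0.8}                         # P(ballgame)
pY = {(True, True): 0.95, (True, False): 0.8,
      (False, True): 0.6, (False, False): 0.1}       # P(traffic | rain, game)

def joint(x, y, z):
    py = pY[(x, z)] if y else 1 - pY[(x, z)]
    return pX[x] * pZ[z] * py

def p_rain(y=None, z=None):
    """P(X = True | evidence), marginalizing whatever is unobserved."""
    num = den = 0.0
    for x, yy, zz in itertools.product((True, False), repeat=3):
        if (y is not None and yy != y) or (z is not None and zz != z):
            continue
        p = joint(x, yy, zz)
        den += p
        if x:
            num += p
    return num / den

print(p_rain())                 # prior
print(p_rain(z=True))           # same: the ballgame alone tells us nothing
print(p_rain(y=True))           # traffic observed: rain more likely
print(p_rain(y=True, z=True))   # ballgame "explains away" the traffic: lower
```

The first two probabilities match (marginal independence), while the last is lower than the third: conditioning on the effect has made the two causes dependent.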
The General Case
• Any complex example can be analyzed using these
  three canonical cases: Causal Chain, Common Cause,
  Common Effect
• General question: in a given BN, are two variables
  independent (given evidence)?
• Solution: analyze the graph
Bayes Ball
• Shade all observed nodes. Place balls at
  the starting node, let them bounce around
  according to some rules, and ask if any of
  the balls reach the goal node.
• We need to know what happens when a
  ball arrives at a node on its way to the goal
  node.
Example
[Figures: two Bayes-ball worked examples on small networks (nodes R, B, T, T′,
and L, R, D, B, T, T′); each queried independence is answered "Yes". The graph
details are not recoverable from this extraction.]
Constructing Bayesian networks
1. Choose an ordering of variables X1, …, Xn
2. For i = 1 to n:
   – add Xi to the network
   – select parents from X1, …, Xi-1 such that
     P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi-1)
Example
Suppose we choose the ordering M, J, A, B, E:
• P(J | M) = P(J)?  No
• P(A | J, M) = P(A)?  No
• P(A | J, M) = P(A | J)?  No
• P(A | J, M) = P(A | M)?  No
• P(B | A, J, M) = P(B)?  No
• P(B | A, J, M) = P(B | A)?  Yes
• P(E | B, A, J, M) = P(E)?  No
• P(E | B, A, J, M) = P(E | A, B)?  Yes
Example contd.
• Deciding conditional independence is hard in noncausal
  directions
• The causal direction seems much more natural, but is not
  mandatory
• Network is less compact
Summary
• Bayesian networks provide a natural
  representation for (causally induced)
  conditional independence
• Topology + conditional probability tables
• Generally easy for domain experts to
  construct