Factor Graphs

advertisement
Christopher M. Bishop,
Pattern Recognition and Machine Learning
1
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

2
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

3
Introduction
A graph consists of nodes (vertices) that are
connected by edges (links, arcs)
 They provide a simple and clear way to
visualize the probabilistic model
 Complex computations can be expressed in
terms of graphical manipulations

4
Probabilistic Graphical Models
There are two models: directed and
undirected graphical models
 Each node represents a random variable
and the edges represent probabilistic
relationships between these variables

Directed
Undirected
5
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

6
Directed Graphical Models

An example: p ( a , b , c )  p ( c | a , b ) p ( a , b )
 p (c | a , b ) p (b | a ) p ( a )
a
b
c

Definition: for a graph with K nodes, the joint
distribution is given by
K
p ( x1 ,
, xK ) 

p ( x k | pa k )
k 1
where pa k denotes the set of parents of x k
7
An Example
x1
x2
x4
x6
p ( x1 ,
x3
x5
x7
, x 7 )  p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 ) 
p ( x 5 | x1 , x 3 ) p ( x 6 | x 4 ) p ( x 7 | x 4 , x 5 )
8
Conditional Independence (1)
 a
is conditionally independent of b given c :
p (a | b, c)  p (a | c)
 p ( a , b | c )  p ( a | b , c ) p (b | c )  p ( a | c ) p (b | c )

A shorthand notation:

There are three types of conditional
independencies for the directed graphs
9
Conditional Independence (2)
tail-to-tail
a
b
p ( a , b , c )  p ( a | c ) p (b | c ) p ( c )
p (a, b) 

c
blocked
c
c
p ( a | c ) p (b | c ) p ( c )
a
b
p (a, b | c) 
p (a, b, c)
p (c )
 p ( a | c ) p (b | c )
10
Conditional Independence (2)
a
c
b
head-to-tail
b
a
c
head-to-head
b
a
c
dependence

Definition: d-separation is the notion of
being separated on a directed graph
11
D-separation: an example
f
a
e
b
c
12
Application: an Example

Hidden Markov model:
13
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

14
Undirected Graphical Models
C
B
A

Nodes of set A and B are separated by the
third set C  A and B are conditionally
independent,
 p ( a1 , b1 | c1 , c 2 )  p ( a1 | c1 , c 2 ) p ( b1 | c1 , c 2 )
15
Conditional Independence
C1
H1
H2
C2

The computers can infect each other via the
hubs and the hubs can infect each other via
the computers
16
Cliques
Definition: a subset of the nodes in a clique
is fully connected
 Maximal cliques


x1
x2
x3
x4
We can define the factors in decomposition
of the joint distribution as functions of the
variable in the clique
17
Undirected Factorization

Consider factorizations of the form:
p(x ) 
1


Z
C
(x C )
C
where  C (x C ) is a non-negative potential
function of a maximal clique
 An example:
x
x
p ( x1 ,
1
2
x3
x4
, x 4 )   ( x1 , x 2 , x 3 ) ( x 2 , x 3 , x 4 )
18
An Example

Markov random field:
19
Directed versus Undirected (1)
x1
x3
x1
x2
moralization
x3
x2
moral graph
x4
p ( x1 ,
x4
, x 4 )  p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 )
  ( x1 , x 2 , x 3 , x 4 )

We have to discard some conditional
independence properties to complete this
transfer
20
Directed versus Undirected (2)
D
U
P

P: the set of all distributions over a given set
of variables
21
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

22
Factor Graphs (1)
A factor graph is a more general graph
 It allows us to be more explicit about the
details of the factorization
 An example:

x2
x1
x3
Variable node
Factor node
fa
fb
fc
fd
p ( x1 , x 2 , x 3 )  f a ( x1 , x 2 ) f b ( x1 , x 2 ) f c ( x 2 , x 3 ) f d ( x 3 )
23
Factor Graphs (2)

Definition: given a factor graph, the joint
probability distribution is given by
p(x ) 

fs (x s )
s
where the x s denotes a subset of the
variables that connect to the factor f s
 Each factor f s is a function of a
corresponding set of variables x s
24
Factor Graphs (3)
 ( x1 , x 2 , x 3 )

f a ( x1f, (xx2 1, ,xx32), fxb3()x1 , x 2 )
25
Factor Graphs (4)
f a ( x1 )  pf ((xx11),, xf2b,(xx32))  pp((xx1 2) ),p (fxc 2( )x1p,(xx23, |xx3 1), x 2 p) ( x 3 | x1 , x 2 )

Directed and undirected graphs are special
cases of factor graphs
26
Sum-Product Algorithm (1)

Goal:
 Obtain a efficient, exact inference algorithm for
finding marginals
 Allow computations to be shared efficiently

By definition, the marginal is
p(x) 

p (x )
x\x
27
Sum-Product Algorithm (2)
p(x ) 

Fs ( x , X s )
s  ne ( x )
where
n e ( x ) : the factor nodes are neighbors of x
X s : all variables in the subtree
Fs ( x , X s ) : the product of all the factors in the
group associated with factor f s
28
Sum-Product Algorithm (3)
p( x) 

p(x ) 
x\x
 f

x\x


  Fs ( x , X s )  
 s ne ( x )



   Fs ( x , X s )  
s  ne ( x )  X s


s  ne ( x )
f
s
x
( x)
can be view as messages from the
factor node fs to the variable node x
 Fs ( x , X s ) which is a factor sub-graph can
itself be factorized
s
x
( x)
Fs ( x , X s )  f s ( x , x1 ,
, x M ) G1 ( x1 , X s1 )
G M ( x M , X sM )
29
Sum-Product Algorithm (4)
G1(x1,Xs1)
x1
x
x2
fs
xM
Fs ( x , X s )  f s ( x , x1 ,
, x M ) G1 ( x1 , X s1 )
G M ( x M , X sM )
30
Sum-Product Algorithm (5)
f
s
x
( x) 



x1
xM


x1
xM
f s ( x , x1 ,


, x M )    G m ( x m , X sm ) 
m  ne ( f s ) \ x  X sm

f s ( x , x1 ,
, xM )

m  ne ( f s ) \ x
x
m
 fs
( xm )
31
Sum-Product Algorithm (6)
G m ( x m , X sm ) 

Fl ( x m , X m l )
l n e ( x m ) \ f s
x
m
 fs
( xm ) 

l  ne ( x m ) \ f s


  Fl ( x m , X m l )  
 X ml


l  ne ( x m ) \ f s
f
l
 xm
( xm )
32
Sum-Product Algorithm (7)

Messages:
 Variable node  factor node: take the product of
the in coming messages along all of the other link
 Factor node  variable node: take the product of
the in coming messages along all of the other link
and multiply by the factor
33
Sum-Product Algorithm (8)

The sum-product algorithm can be viewed
purely in terms of messages sent out by
factor nodes to other factor nodes
34
Sum-Product Algorithm – an Example
root
x1
x2
x3
fb
fa
fc
x4
35
Max-Sum Algorithm (1)

Find a setting of the variables that has the
largest probability
x
m ax
 arg m ax p ( x )
x

Find the value of that probability
p(x
m ax
)  m ax p ( x )
x
m ax p ( x )  m ax
x
x1
m ax p ( x )
xM
36
Max-Sum Algorithm (2)

Compare this with the marginal:
p(x
m ax
)  m ax p ( x )
x
 m ax m ax p ( x )
x

x\x
p(x) 

p (x )
x\x
That is similar to the sum-product algorithm
except that the summations are replaced by
maximization
37
Max-Sum Algorithm (3)

The max-product algorithm:
f
s
x
( x) 


x1
xM
 f  x ( x )  m ax
x1
f s ( x , x1 ,
x
mne ( f s ) \ x
m ax f ( x , x1 ,
xM
 x f ( x ) 

, xM )
, xM )

mne ( f ) \ x

l n e ( x ) \ f
f
l
x
m
 fs
x
m
( xm )
 f
( xm )
( x)
38
Max-Sum Algorithm (4)
It is convenient to work with the logarithm of
the joint distribution
 The max-sum algorithm:


 f  x ( x )  m ax  ln f ( x , x1 ,
x1 , , x M

 x f ( x) 

, xM ) 
l  ne ( x ) \ f

m  ne ( f ) \ x
f
l
x
x
m
 f

( xm ) 

( x)
39
Max-Sum Algorithm (5)

We can find the maximum by propagating
messages from leaves to a root node
p

m ax


 m ax    f s  x ( x ) 
x
 s ne ( x )

Now we want to find the configuration of the
variables for which the joint distribution
attains this maximum value
40
Max-Sum Algorithm (6)

An example:
x2
x1
x3
f2,3
f1,2
x
m ax
N
xN-1
xN
fN-1,N
 arg m ax   f N 1 , N  x N ( x N ) 


x
N
 ( x n )  arg m ax  ln f n 1, n ( x n 1, x n )   x
x n 1


n 1  f n 1 , n
( xn ) 

m ax
N
Once we know x , we can propagate a
message back down the chain using
x n 1   ( x n
m ax
m ax
)
41
Max-Sum Algorithm (7)

It is known as back-tracking

This can be extended to a general treestructure factor graph
42
Examples

A Markov chain:

A hidden Markov model:
43
Outline
Introduction
 Directed Graphs
 Undirected Graphs
 Factor Graphs
 Summary

44
Summary
The author introduces three types of
probabilistic graphs
 Graphical models are composed of
probability theory and graphical theory
 The concept is to factorize a complicated
system into some simple components

45
Download