Factor Graphs

Christopher M. Bishop, Pattern Recognition and Machine Learning 1 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  2 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  3 Introduction A graph consists of nodes (vertices) that are connected by edges (links, arcs)  They provide a simple and clear way to visualize the probabilistic model  Complex computations can be expressed in terms of graphical manipulations  4 Probabilistic Graphical Models There are two models: directed and undirected graphical models  Each node represents a random variable and the edges represent probabilistic relationships between these variables  Directed Undirected 5 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  6 Directed Graphical Models  An example: p ( a , b , c )  p ( c | a , b ) p ( a , b )  p (c | a , b ) p (b | a ) p ( a ) a b c  Definition: for a graph with K nodes, the joint distribution is given by K p ( x1 , , xK )   p ( x k | pa k ) k 1 where pa k denotes the set of parents of x k 7 An Example x1 x2 x4 x6 p ( x1 , x3 x5 x7 , x 7 )  p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 )  p ( x 5 | x1 , x 3 ) p ( x 6 | x 4 ) p ( x 7 | x 4 , x 5 ) 8 Conditional Independence (1)  a is conditionally independent of b given c : p (a | b, c)  p (a | c)  p ( a , b | c )  p ( a | b , c ) p (b | c )  p ( a | c ) p (b | c )  A shorthand notation:  There are three types of conditional independencies for the directed graphs 9 Conditional Independence (2) tail-to-tail a b p ( a , b , c )  p ( a | c ) p (b | c ) p ( c ) p (a, b)   c blocked c c p ( a | c ) p (b | c ) p ( c ) a b p (a, b | c)  p (a, b, c) p (c )  p ( a | c ) p (b | c ) 10 Conditional Independence (2) a c b head-to-tail b a c head-to-head b a c dependence  Definition: d-separation is the notion of being separated on a directed graph 11 D-separation: an example f a e b c 12 Application: an Example  Hidden Markov model: 13 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  14 Undirected Graphical Models C B A  Nodes of set A and B are separated by the third set C  A and B are conditionally independent,  p ( a1 , b1 | c1 , c 2 )  p ( a1 | c1 , c 2 ) p ( b1 | c1 , c 2 ) 15 Conditional Independence C1 H1 H2 C2  The computers can infect each other via the hubs and the hubs can infect each other via the computers 16 Cliques Definition: a subset of the nodes in a clique is fully connected  Maximal cliques   x1 x2 x3 x4 We can define the factors in decomposition of the joint distribution as functions of the variable in the clique 17 Undirected Factorization  Consider factorizations of the form: p(x )  1   Z C (x C ) C where  C (x C ) is a non-negative potential function of a maximal clique  An example: x x p ( x1 , 1 2 x3 x4 , x 4 )   ( x1 , x 2 , x 3 ) ( x 2 , x 3 , x 4 ) 18 An Example  Markov random field: 19 Directed versus Undirected (1) x1 x3 x1 x2 moralization x3 x2 moral graph x4 p ( x1 , x4 , x 4 )  p ( x1 ) p ( x 2 ) p ( x 3 ) p ( x 4 | x1 , x 2 , x 3 )   ( x1 , x 2 , x 3 , x 4 )  We have to discard some conditional independence properties to complete this transfer 20 Directed versus Undirected (2) D U P  P: the set of all distributions over a given set of variables 21 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  22 Factor Graphs (1) A factor graph is a more general graph  It allows us to be more explicit about the details of the factorization  An example:  x2 x1 x3 Variable node Factor node fa fb fc fd p ( x1 , x 2 , x 3 )  f a ( x1 , x 2 ) f b ( x1 , x 2 ) f c ( x 2 , x 3 ) f d ( x 3 ) 23 Factor Graphs (2)  Definition: given a factor graph, the joint probability distribution is given by p(x )   fs (x s ) s where the x s denotes a subset of the variables that connect to the factor f s  Each factor f s is a function of a corresponding set of variables x s 24 Factor Graphs (3)  ( x1 , x 2 , x 3 )  f a ( x1f, (xx2 1, ,xx32), fxb3()x1 , x 2 ) 25 Factor Graphs (4) f a ( x1 )  pf ((xx11),, xf2b,(xx32))  pp((xx1 2) ),p (fxc 2( )x1p,(xx23, |xx3 1), x 2 p) ( x 3 | x1 , x 2 )  Directed and undirected graphs are special cases of factor graphs 26 Sum-Product Algorithm (1)  Goal:  Obtain a efficient, exact inference algorithm for finding marginals  Allow computations to be shared efficiently  By definition, the marginal is p(x)   p (x ) x\x 27 Sum-Product Algorithm (2) p(x )   Fs ( x , X s ) s  ne ( x ) where n e ( x ) : the factor nodes are neighbors of x X s : all variables in the subtree Fs ( x , X s ) : the product of all the factors in the group associated with factor f s 28 Sum-Product Algorithm (3) p( x)   p(x )  x\x  f  x\x     Fs ( x , X s )    s ne ( x )       Fs ( x , X s )   s  ne ( x )  X s   s  ne ( x ) f s x ( x) can be view as messages from the factor node fs to the variable node x  Fs ( x , X s ) which is a factor sub-graph can itself be factorized s x ( x) Fs ( x , X s )  f s ( x , x1 , , x M ) G1 ( x1 , X s1 ) G M ( x M , X sM ) 29 Sum-Product Algorithm (4) G1(x1,Xs1) x1 x x2 fs xM Fs ( x , X s )  f s ( x , x1 , , x M ) G1 ( x1 , X s1 ) G M ( x M , X sM ) 30 Sum-Product Algorithm (5) f s x ( x)     x1 xM   x1 xM f s ( x , x1 ,   , x M )    G m ( x m , X sm )  m  ne ( f s ) \ x  X sm  f s ( x , x1 , , xM )  m  ne ( f s ) \ x x m  fs ( xm ) 31 Sum-Product Algorithm (6) G m ( x m , X sm )   Fl ( x m , X m l ) l n e ( x m ) \ f s x m  fs ( xm )   l  ne ( x m ) \ f s     Fl ( x m , X m l )    X ml   l  ne ( x m ) \ f s f l  xm ( xm ) 32 Sum-Product Algorithm (7)  Messages:  Variable node  factor node: take the product of the in coming messages along all of the other link  Factor node  variable node: take the product of the in coming messages along all of the other link and multiply by the factor 33 Sum-Product Algorithm (8)  The sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes 34 Sum-Product Algorithm – an Example root x1 x2 x3 fb fa fc x4 35 Max-Sum Algorithm (1)  Find a setting of the variables that has the largest probability x m ax  arg m ax p ( x ) x  Find the value of that probability p(x m ax )  m ax p ( x ) x m ax p ( x )  m ax x x1 m ax p ( x ) xM 36 Max-Sum Algorithm (2)  Compare this with the marginal: p(x m ax )  m ax p ( x ) x  m ax m ax p ( x ) x  x\x p(x)   p (x ) x\x That is similar to the sum-product algorithm except that the summations are replaced by maximization 37 Max-Sum Algorithm (3)  The max-product algorithm: f s x ( x)    x1 xM  f  x ( x )  m ax x1 f s ( x , x1 , x mne ( f s ) \ x m ax f ( x , x1 , xM  x f ( x )   , xM ) , xM )  mne ( f ) \ x  l n e ( x ) \ f f l x m  fs x m ( xm )  f ( xm ) ( x) 38 Max-Sum Algorithm (4) It is convenient to work with the logarithm of the joint distribution  The max-sum algorithm:    f  x ( x )  m ax  ln f ( x , x1 , x1 , , x M   x f ( x)   , xM )  l  ne ( x ) \ f  m  ne ( f ) \ x f l x x m  f  ( xm )   ( x) 39 Max-Sum Algorithm (5)  We can find the maximum by propagating messages from leaves to a root node p  m ax    m ax    f s  x ( x )  x  s ne ( x )  Now we want to find the configuration of the variables for which the joint distribution attains this maximum value 40 Max-Sum Algorithm (6)  An example: x2 x1 x3 f2,3 f1,2 x m ax N xN-1 xN fN-1,N  arg m ax   f N 1 , N  x N ( x N )    x N  ( x n )  arg m ax  ln f n 1, n ( x n 1, x n )   x x n 1   n 1  f n 1 , n ( xn )   m ax N Once we know x , we can propagate a message back down the chain using x n 1   ( x n m ax m ax ) 41 Max-Sum Algorithm (7)  It is known as back-tracking  This can be extended to a general treestructure factor graph 42 Examples  A Markov chain:  A hidden Markov model: 43 Outline Introduction  Directed Graphs  Undirected Graphs  Factor Graphs  Summary  44 Summary The author introduces three types of probabilistic graphs  Graphical models are composed of probability theory and graphical theory  The concept is to factorize a complicated system into some simple components  45

Factor Graphs

Related documents

Products

Support

Factor Graphs

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib