Graphical models
Svend Kreiner
Dept. of Biostatistics, University of Copenhagen

Graphical models for symmetrical relationships

Definition 1. A graphical model is defined by a set of assumptions of conditional independence between pairs of variables given the remaining variables of the model. The properties of a graphical model are encapsulated in a mathematical graph, a set of nodes and edges between nodes, as shown in Figure 1.

Figure 1: Independence graph defined by the following four assumptions:
A ╨ C | B, D, E, F
A ╨ F | B, C, D, E
B ╨ D | A, C, E, F
D ╨ F | A, B, C, E
Each assumption removes one edge from the complete graph on A, ..., F; all other pairs of variables are joined by an edge.

Independence graphs are second order mathematical models.
A model is an intentionally simplified representation of some kind of system:
- Certain properties of the model correspond to properties of the system.
- Certain properties of the system are not represented in the model.
- Certain properties of the model do not represent anything in the system.
A statistical model is a mathematical (probabilistic) model of reality. An independence graph is a mathematical model of a probabilistic model:
- Certain graph-theoretical properties of independence graphs correspond to properties of the statistical model.
- Certain properties of statistical models have no counterpart in independence graphs.
- Certain properties of independence graphs cannot be interpreted as properties of statistical models.

Graphical models for discrete data are loglinear.
The generators of the loglinear model are defined by the cliques of the independence graph. For Figure 1 the cliques (generators) are ABE, ADE, CDE and BCEF:
$$\ln P(A,B,C,D,E,F) = \lambda_{ABE} + \lambda_{ADE} + \lambda_{CDE} + \lambda_{BCEF}$$

Models collapse onto graphical marginal models.
Collapsing over D gives the marginal model P(A,B,C,E,F).
The conditional distribution of a subset of the variables given the remaining variables is also a graphical model, for example P(A,B,C,E,F | D).

The separation theorem: separation implies conditional independence.
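The graph-theoretic statements on these slides can be checked mechanically. Below is a minimal Python sketch, assuming that the edges of the Figure 1 graph are exactly the pairs of variables that occur together in one of the cliques ABE, ADE, CDE and BCEF; the function names are hypothetical helpers, not part of any library. It lists the missing edges (the pairwise independence assumptions), enumerates the maximal cliques with the Bron-Kerbosch algorithm, and tests whether a set of variables separates two nodes by searching for a path that avoids the set.

```python
from itertools import combinations

# Assumption: the edges of the Figure 1 graph are exactly the pairs that lie
# inside one of the cliques ABE, ADE, CDE and BCEF.
CLIQUES = ["ABE", "ADE", "CDE", "BCEF"]
NODES = sorted(set("".join(CLIQUES)))
EDGES = {frozenset(pair) for clique in CLIQUES for pair in combinations(clique, 2)}


def neighbours(v):
    """Variables joined to v by an edge."""
    return {w for w in NODES if frozenset((v, w)) in EDGES}


def missing_edges():
    """Non-adjacent pairs; each is a pairwise conditional independence given the rest."""
    return ["".join(p) for p in combinations(NODES, 2) if frozenset(p) not in EDGES]


def maximal_cliques():
    """Bron-Kerbosch enumeration (without pivoting) of the maximal complete subsets."""
    found = []

    def extend(r, p, x):
        if not p and not x:
            found.append("".join(sorted(r)))
            return
        for v in sorted(p):
            extend(r | {v}, p & neighbours(v), x & neighbours(v))
            p = p - {v}
            x = x | {v}

    extend(set(), set(NODES), set())
    return sorted(found)


def separated(a, b, s):
    """True if every path from a to b passes through s (depth-first search that never enters s)."""
    s = set(s)
    seen, stack = {a}, [a]
    while stack:
        v = stack.pop()
        for w in neighbours(v) - s:
            if w == b:
                return False
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


if __name__ == "__main__":
    print(missing_edges())    # ['AC', 'AF', 'BD', 'DF']
    print(maximal_cliques())  # ['ABE', 'ADE', 'BCEF', 'CDE']
    print(separated("D", "F", "ACE"), separated("D", "F", "BCE"), separated("D", "F", "CE"))
    # True True False
```

Separation by {A, C, E} and by {B, C, E} is confirmed, while {C, E} alone does not separate D from F because the path D-A-B-F avoids it.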
Paths without detours from D to F: D-E-F, D-C-F and D-A-B-F. D and F are separated by {A, C, E}, so D ╨ F | A, C, E, and also by {B, C, E}, so D ╨ F | B, C, E.

Separation implies parametric collapsibility in loglinear models.
All indirect paths from A to D go through {B, E} and through {C, E}. The parameters relating A and D are therefore the same in P(A,B,C,D,E,F) and in the marginal models P(A,D,B,E) and P(A,D,C,E).
Marginal models sometimes have a simpler parametric structure than implied by the marginal graphical model:
$$\ln P(A,B,C,D,E,F) = \lambda_{ABE} + \lambda_{ADE} + \lambda_{CDE} + \lambda_{BCEF}$$
$$\ln P(A,B,D,E) = \lambda_{ABE} + \lambda_{ADE} + \lambda^{*}_{BDE}$$
where the starred term collects what remains of the CDE and BCEF terms after summing over C and F. The marginal graph on A, B, D, E is complete (B and D are joined through the collapsed variables), yet the marginal model has no four-factor ABDE interaction.

Decomposition by separation of complete subsets leads to decompositions of statistical models, which for certain types of models imply collapsibility in terms of likelihood inference.
The complete subset BCE separates A and D from F, so (A,D) ╨ F | (B,C,E) and
$$P(A,B,C,D,E,F) = P(A,D \mid B,C,E)\, P(F \mid B,C,E)\, P(B,C,E) = \frac{P(A,B,C,D,E)\, P(B,C,E,F)}{P(B,C,E)}$$

The topography of marginal models
Consider the marginal model P(A,B,C,E).
- exterior: all variables and edges not in the marginal model. The exterior of P(A,B,C,E) consists of the variables D and F and the edges AD, ED, CD, BF, EF and CF.
- boundary: the variables of the marginal model that are adjacent to a connected component of the exterior. P(A,B,C,E) has two boundaries, AEC (adjacent to D) and BEC (adjacent to F). Boundaries are always complete and fixed in a marginal model.
- border: the set of all boundaries.
- interior: the variables and edges of a marginal model that are not included in any boundary. The interior of P(A,B,C,E) is the AB edge.
- problem core: the smallest irreducible component onto which the model may be collapsed for the analysis of a given problem. P(A,B,C,E) is not the core of the problem of estimating the AB association.

Graphical regression models
Dependent variables: Y = (Y1,...,Yr). Independent variables: X = (X1,...,Xs).
Definition 2. A graphical regression model is a graphical model of P(Y|X).
Its conditional independence assumptions are restricted to the following two types:
Yi ╨ Yj | Y1,...,Yi-1, Yi+1,...,Yj-1, Yj+1,...,Yr, X1,...,Xs
Yi ╨ Xj | Y1,...,Yi-1, Yi+1,...,Yr, X1,...,Xj-1, Xj+1,...,Xs
No assumptions are made about the associations among the independent variables.
(Figure: Independence graph for a graphical regression model of P(a,c,h | b,d,e,f,g).)

Chain graph models
Recursive models: the variables V = (V1,...,Vk) are partitioned into disjoint subsets (U1,...,Ur) such that
$$V = \bigcup_{i=1}^{r} U_i, \qquad P(V) = \prod_{i=1}^{r-1} P(U_i \mid U_{i+1},\dots,U_r)\; P(U_r)$$
Definition 3. A chain graph model is a block recursive model where P(Ur) is a graphical model and each of the components P(Ui | Ui+1,...,Ur) is a graphical regression model.
(Figure: Directed independence graph of a chain graph model.)
(Figure: Regression graphs implied by a chain graph model.)
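The block recursive factorization can be illustrated numerically. This is a minimal sketch, assuming a hypothetical chain graph model with binary variables in three blocks, U1 = (a), U2 = (b), U3 = (c, d); every parameter value is invented for illustration. P(U3) is a saturated loglinear (graphical) model for (c, d), P(U2 | U3) is a saturated logistic regression of b on c and d, and P(U1 | U2, U3) is a graphical regression in which the missing a-d edge is encoded by leaving d out of the linear predictor.

```python
import math
from itertools import product

# Hypothetical chain graph model with binary variables:
#   U1 = (a,), U2 = (b,), U3 = (c, d). All parameter values are made up.


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_u3(c, d):
    """P(U3): saturated loglinear model for (c, d), i.e. a graphical model with the edge cd."""
    weights = {(x, y): math.exp(0.5 * x - 0.3 * y + 1.2 * x * y)
               for x in (0, 1) for y in (0, 1)}
    return weights[(c, d)] / sum(weights.values())


def p_u2(b, c, d):
    """P(U2 | U3): saturated logistic regression of b on c and d (no missing edges)."""
    p1 = logistic(-0.4 + 1.0 * c + 0.7 * d - 0.5 * c * d)
    return p1 if b == 1 else 1.0 - p1


def p_u1(a, b, c, d):
    """P(U1 | U2, U3): the edge a-d is missing, so d is deliberately unused (a ╨ d | b, c)."""
    p1 = logistic(0.2 + 0.9 * b - 1.1 * c + 0.6 * b * c)
    return p1 if a == 1 else 1.0 - p1


def joint():
    """Block recursive factorization P(V) = P(U1 | U2, U3) P(U2 | U3) P(U3)."""
    return {
        (a, b, c, d): p_u1(a, b, c, d) * p_u2(b, c, d) * p_u3(c, d)
        for a, b, c, d in product((0, 1), repeat=4)
    }


if __name__ == "__main__":
    P = joint()
    print(round(sum(P.values()), 12))  # the factors multiply to a proper distribution
    for b, c in product((0, 1), repeat=2):
        # P(a = 1 | b, c, d) recovered from the joint is the same for d = 0 and d = 1
        cond = [P[(1, b, c, d)] / (P[(0, b, c, d)] + P[(1, b, c, d)]) for d in (0, 1)]
        print(b, c, [round(x, 6) for x in cond])
```

Summing the joint over the first block returns P(U2 | U3) P(U3), because P(U1 | U2, U3) sums to one over U1, so the marginal model for the later blocks is obtained by dropping the first factor.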