Introduction to multivariate statistics

Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010

Lecture 4

1

Aim of this lecture

In this final lecture, I want to obtain a result completely analogous to the Proposition in the previous lecture, but for discrete rather than Gaussian distributions. This is known as the Hammersley-Clifford theorem, which appeared in 1971 in a still unpublished technical report. The 1979 Sankhya paper (v. 41, pp. 184-197), which was handed out yesterday, contains a full proof of this result and a discussion of aspects of some proofs then available. My plan today is to talk you through the proof, without writing everything out a second time. I'll cut and paste some details here, and discuss the paper as I go.

The first thing to note is that our graph setting is identical to that in Lecture 3: our notation $X = (X_\gamma : \gamma \in C)$ is the same, the only difference being that the random variables $X_\gamma$ are all discrete, not Gaussian.

The second thing to note is that parts (ii) and (iii) of the Proposition we seek here are identical to those in the Proposition of the last lecture; what is new is the discrete analogue of the condition on $K^{-1}$. To get there requires some notation. We seek

2

Characterization of discrete Markov distributions over a finite graph

Proposition. Let C be a simple undirected graph with vertex set C and edge set E(C), indexing discrete random variables $X = (X_\gamma)$ with strictly positive joint distribution p. Then the following are equivalent.

(i) Constraint on log p: This is what I need to tell you, see p.14.

(ii) Local Markov property: For every $\gamma \in C$, $X_\gamma$ and $X_{\{\gamma\}'}$ are conditionally independent given $X_{\mathrm{bd}\{\gamma\}}$.

(iii) Global Markov property: For every pair of disjoint subsets a and b of C and third subset d separating a from b in C, $X_a$ and $X_b$ are conditionally independent given $X_d$.

I hope you now see where we are heading.
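The graph-theoretic half of the global Markov property, "d separates a from b in C", can be checked mechanically. The following is only a minimal sketch, assuming the graph is given as a list of undirected edges; the function name `separates` and the path graph 1 - 2 - 3 are hypothetical choices made for illustration, not anything taken from the paper.

```python
from collections import deque


def separates(edges, a, b, d):
    """Return True if every path from a to b in the graph meets d."""
    a, b, d = set(a), set(b), set(d)
    # Build adjacency sets for the undirected graph.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Breadth-first search from a that never enters a vertex of d.
    seen = set(a - d)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in b:
            return False  # reached b without passing through d
        for v in adj.get(u, ()):
            if v not in d and v not in seen:
                seen.add(v)
                queue.append(v)
    return True


# Hypothetical example: the path graph 1 - 2 - 3.
edges = [(1, 2), (2, 3)]
print(separates(edges, {1}, {3}, {2}))    # vertex 2 blocks the only path
print(separates(edges, {1}, {3}, set()))  # nothing blocks the path 1 - 2 - 3
```

This only tests separation in the graph; whether $X_a$ and $X_b$ really are conditionally independent given $X_d$ is a statement about p, which is what the Proposition connects to the graph.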
Note that the condition on $K^{-1}$ is in a sense a constraint on $\log \phi$, where $\phi$ is the normal density. In the discrete case, we need to go beyond pairs.

3

The condition on log p

Before I can even write down the condition on log p, I need to spell out some notation. The random variable $X_\gamma$ is of course discrete, and so takes its values in a finite set which will be denoted by $I_\gamma$. (Extending to countably infinite sets is not really a problem, see the paper.)

4

5

6

Illustration of the previous Lemma

Let's suppose that C = {1,2,3} and that the sets $I_1$, $I_2$ and $I_3$ are arbitrary finite sets with elements i, j and k respectively. Suppose that $P(X_1 = i, X_2 = j, X_3 = k) = p_{ijk}$. If we have

$$P(X_1 = i, X_3 = k \mid X_2 = j) = P(X_1 = i \mid X_2 = j)\, P(X_3 = k \mid X_2 = j),$$

then we readily calculate that

$$(*) \qquad p_{ijk} = \frac{p_{ij+}\, p_{+jk}}{p_{+j+}},$$

where "+" denotes summing out the missing subscript. Check! Thus $\log p_{ijk} = u_{ij} + v_{jk}$ for suitable functions u and v (for instance $u_{ij} = \log p_{ij+}$ and $v_{jk} = \log(p_{+jk}/p_{+j+})$).

Conversely, if $p_{ijk} = a_{ij} b_{jk}$, then it is a straightforward calculation to check that $p_{ij+} = a_{ij} b_{j+}$, $p_{+jk} = a_{+j} b_{jk}$ and $p_{+j+} = a_{+j} b_{j+}$, from which (*) follows by multiplication and division.

7

8

Illustration of the previous lemma

Suppose that we carry on the notation for the example on p.7 above, and write $w_{ijk} = \log p_{ijk}$. The following additive expansion of $w_{ijk}$ into a grand mean, 3 main effects, 3 two-factor interactions, and a 3-factor interaction, is an instance of that described in the text on the previous page. It should be familiar to anyone who has ever encountered linear models and anova. Here "." means the average has been taken over the missing subscript.

$$
\begin{aligned}
w_{ijk} ={}& w_{\cdot\cdot\cdot} \\
&+ (w_{i\cdot\cdot} - w_{\cdot\cdot\cdot}) + (w_{\cdot j\cdot} - w_{\cdot\cdot\cdot}) + (w_{\cdot\cdot k} - w_{\cdot\cdot\cdot}) \\
&+ (w_{ij\cdot} - w_{i\cdot\cdot} - w_{\cdot j\cdot} + w_{\cdot\cdot\cdot}) \\
&+ (w_{\cdot jk} - w_{\cdot j\cdot} - w_{\cdot\cdot k} + w_{\cdot\cdot\cdot}) \\
&+ (w_{i\cdot k} - w_{i\cdot\cdot} - w_{\cdot\cdot k} + w_{\cdot\cdot\cdot}) \\
&+ (w_{ijk} - w_{ij\cdot} - w_{\cdot jk} - w_{i\cdot k} + w_{i\cdot\cdot} + w_{\cdot j\cdot} + w_{\cdot\cdot k} - w_{\cdot\cdot\cdot}).
\end{aligned}
$$

These 8 terms are orthogonal in a suitable inner product, and the expansion is unique.
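The expansion above is easy to check numerically. What follows is a minimal sketch rather than anything from the paper: it builds a table of the form $p_{ijk} = a_{ij} b_{jk}$ from arbitrary positive numbers, takes $w = \log p$, and computes each of the 8 terms as an alternating sum of marginal averages. The helper names (`subsets`, `marginal_mean`, `interaction`), the table sizes and the random entries are all assumptions made for illustration, and coordinates 0, 1, 2 stand for the factors 1, 2, 3.

```python
import itertools
import math
import random


def subsets(term):
    """All subsets of the tuple `term`, as tuples."""
    return [s for r in range(len(term) + 1)
            for s in itertools.combinations(term, r)]


def marginal_mean(w, shape, s, x):
    """Average w over the coordinates not in s, holding those in s at x."""
    ranges = [[x[t]] if t in s else range(shape[t])
              for t in range(len(shape))]
    cells = list(itertools.product(*ranges))
    return sum(w[c] for c in cells) / len(cells)


def interaction(w, shape, term, x):
    """The term of the expansion indexed by `term`, evaluated at cell x."""
    return sum((-1) ** (len(term) - len(s)) * marginal_mean(w, shape, s, x)
               for s in subsets(term))


random.seed(0)
shape = (2, 3, 2)  # sizes of I_1, I_2, I_3, chosen arbitrarily
a_tab = {(i, j): random.uniform(0.5, 2.0)
         for i in range(shape[0]) for j in range(shape[1])}
b_tab = {(j, k): random.uniform(0.5, 2.0)
         for j in range(shape[1]) for k in range(shape[2])}

cells = list(itertools.product(*(range(n) for n in shape)))
total = sum(a_tab[i, j] * b_tab[j, k] for i, j, k in cells)
p = {(i, j, k): a_tab[i, j] * b_tab[j, k] / total for i, j, k in cells}
w = {c: math.log(p[c]) for c in cells}

all_terms = subsets((0, 1, 2))  # the 8 index sets, from () to (0, 1, 2)

# The 8 terms should add back up to w at every cell.
max_recon_error = max(
    abs(sum(interaction(w, shape, t, x) for t in all_terms) - w[x])
    for x in cells)

# For p_ijk = a_ij * b_jk, the {1,3} and {1,2,3} terms should vanish.
max_13 = max(abs(interaction(w, shape, (0, 2), x)) for x in cells)
max_123 = max(abs(interaction(w, shape, (0, 1, 2), x)) for x in cells)

print(max_recon_error, max_13, max_123)
```

All three printed numbers should be zero up to floating-point rounding: the first because the 8 terms sum to $w_{ijk}$, the other two because $w_{ijk} = u_{ij} + v_{jk}$ leaves nothing for the terms that involve factors 1 and 3 together.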
9

10

Examples of graphs and their sets of cliques

11

12

Illustration of the previous lemma

Continue our running example, with C = {1,2,3}, and suppose now that a graph over C has precisely two edges: E(C) = {{1,2}, {2,3}}. Then the cliques of this graph are again {1,2} and {2,3}, and a function $w_{ijk}$ would belong to the linear space of Lemma 3.1 if and only if it could be written $w_{ijk} = u_{ij} + v_{jk}$ for suitable functions u and v. Going back to the representation on p.9 above, this means omitting every term in that expansion except those indexed by subsets of the cliques {1,2} and {2,3}; that is, dropping the {1,3} two-factor interaction and the 3-factor interaction. We see that there is redundancy in this representation.

13

14

15

16

17

Examples of graphs and their conditional independencies.

18

Thanks for listening!

19

20