Introduction to multivariate statistics
Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010
Lecture 3

From yesterday's second lecture: Conditional independence with Gaussians

I turn now to the material in the paper “Gaussian Markov distributions over finite graphs”. The setting there is a random vector $X = (X_\gamma : \gamma \in C)$ indexed by a finite set $C$, which will later be given a graph structure. The covariance matrix of $X$ is denoted by $K$, and for subsets $a$, $b$ of $C$ I use the notation $X_a$, $X_b$, $K_{a,b}$, $K_a = K_{a,a}$ for the restrictions of $X$ and $K$ to these subsets. Also, $ab$ and $a \setminus b$ denote intersection and difference, respectively.

Proposition 1. For subsets $a$ and $b$ of $C$ with $a \cup b = C$, the following are equivalent.
(i) $K_{a,b} = K_{a,ab} K_{ab}^{-1} K_{ab,b}$;
(i') $K_{a \setminus b,\, b \setminus a} = K_{a \setminus b,\, ab} K_{ab}^{-1} K_{ab,\, b \setminus a}$;
(ii) $(K^{-1})_{a \setminus b,\, b \setminus a} = 0$;
(iii) $X_a$ and $X_b$ are c.i. given $X_{ab}$.

Corollary. $X_\alpha$ and $X_\beta$ are c.i. given $X_{\{\alpha,\beta\}'}$ iff $K^{-1}(\alpha,\beta) = 0$.

(Here c.i. abbreviates conditionally independent, and $'$ denotes complement.)

Gaussian Markov distributions over finite undirected graphs

All the graphs I will discuss are undirected. In fact, it was seeing the developments in the books by C Bishop and MI Jordan, who both begin with directed graphs, that led me to present this material. In a sense it is easier to begin with directed graphs, for the theory is just a specialization of the familiar factorization:

$P(X_1, X_2, X_3, \dots, X_n) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, X_2, X_3, \dots, X_{n-1})$.

However, once directed graphs are introduced, it is hard to go back and do justice to the undirected case. In my view, neither of the above authors does so.

Terminology and notation concerning undirected graphs

A graph $\mathcal{C}$ has vertices $V(\mathcal{C})$ and edges $E(\mathcal{C})$, but I will keep things simple by supposing that $V(\mathcal{C}) = C$, and not using the notation $V(\mathcal{C})$ any more. The notions of adjacency (there exists an edge), neighbours, (maximal) clique, chain or path, and cycle should be familiar.
As notation, write $\mathrm{bd}\{\gamma\}$ for the set of neighbours of $\gamma \in C$, the boundary (which would be denoted by a del if .ppt had one), and $\mathrm{cl}\{\gamma\} = \{\gamma\} \cup \mathrm{bd}\{\gamma\}$ for the closure of $\{\gamma\}$ (which would be a bar over $\gamma$ if I could do one in .ppt). Finally, we say that two sets $a$ and $b$ of vertices are separated by a third set $d$ if every path connecting an element $\alpha \in a$ to an element $\beta \in b$ must intersect (“cross”) $d$.

Characterization of Gaussian Markov distributions over a finite graph

Proposition. Let $\mathcal{C}$ be a simple undirected graph with vertex set $C$ and edge set $E(\mathcal{C})$, whose vertices index Gaussian random variables $X = (X_\gamma : \gamma \in C)$ with covariance matrix $K$. Then the following are equivalent.
(i) Constraint on $K$: $K^{-1}(\alpha,\beta) = 0$ if $\{\alpha,\beta\}$ is not an edge and $\alpha \neq \beta$.
(ii) Local Markov property: for every $\gamma \in C$, $X_\gamma$ and $X_{\{\gamma\}'}$ are conditionally independent given $X_{\mathrm{bd}\{\gamma\}}$.
(iii) Global Markov property: for every pair of disjoint subsets $a$ and $b$ of $C$ and third subset $d$ separating $a$ from $b$ in the graph, $X_a$ and $X_b$ are conditionally independent given $X_d$.

Proof of the Proposition

To see that (i) and (ii) are equivalent, we note that (i) is equivalent to $(K^{-1})_{\{\gamma\},\, \mathrm{cl}\{\gamma\}'} = 0$ for every $\gamma \in C$. Putting $a = \mathrm{cl}\{\gamma\}$ and $b = \{\gamma\}'$ in our result from yesterday (Proposition 1 above), so that $a \setminus b = \{\gamma\}$ and $b \setminus a = \mathrm{cl}\{\gamma\}'$, proves the result, since the intersection of $\mathrm{cl}\{\gamma\}$ and $\{\gamma\}'$ is exactly $\mathrm{bd}\{\gamma\}$. Draw a diagram! (I note here that the notation in the paper is not wholly consistent.)

The equivalence of (i) and (iii) for the case $a \cup b \cup d = C$ follows in a similar way. To see this, put $a_1 = a \cup d$ and $b_1 = b \cup d$ in yesterday's result. Then the intersection of $a_1$ and $b_1$ is $d$.

It remains to prove that every disjoint pair $a$ and $b$ separated by $d$ in the graph can be included in subsets $a^* \supseteq a$ and $b^* \supseteq b$ also separated by $d$, where $a^* \cup b^* \cup d = C$. We turn now to this purely topological result. When it is proved, our Proposition is proved.

Proof of the topological result

I first saw the argument below in the paper “Markov measures and Markov extensions” by N N Vorob'ev, Theory of Probability and its Applications, 1963.
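Suppose that $a^* \supseteq a$ and $b^* \supseteq b$ are separated by $d$, and are maximal w.r.t. this property. We prove that $a^* \cup b^* \cup d = C$ by contradiction. Assume that $a^* \cup b^* \cup d \neq C$. Then there is an element $\delta \in C \setminus (a^* \cup b^* \cup d)$, and we consider $a^{**} = a^* \cup \{\delta\}$ and $b^{**} = b^* \cup \{\delta\}$. From the assumed maximality of $a^*$ and $b^*$, neither the pair $a^{**}$, $b^*$ nor the pair $a^*$, $b^{**}$ can be separated by $d$ in the graph. Since $a^*$ and $b^*$ themselves are separated by $d$, any path witnessing this must have $\delta$ as an endpoint. Thus there exists a path $p_1$, say, connecting some $\alpha \in a^*$ to $\delta$ without intersecting $d$, and also a path $p_2$, say, connecting some $\beta \in b^*$ to $\delta$ without intersecting $d$. But then, since $\delta \notin d$, the concatenation of the paths $p_1$ and $p_2$ connects $\alpha$ to $\beta$ without intersecting $d$, which contradicts our assumption on $a^*$ and $b^*$. This contradiction proves the assertion.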
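For readers who like to check such statements numerically, here is a minimal Python sketch (using NumPy) of the separation result and of the global Markov property on a small graph. Everything specific in it is my own assumption for illustration and not taken from the paper: the helper names (`build_adjacency`, `reachable`, `separated`, `extend`, `covariance_from_graph`, `cond_cross_cov`), the five-vertex example graph, and the numerical values in the inverse covariance. Note also that `extend` builds $a^*$ constructively, as the set of vertices reachable from $a$ without crossing $d$, rather than following the maximality argument, and that it assumes $a$ and $b$ are disjoint from $d$.

```python
import numpy as np
from collections import deque


def build_adjacency(n, edges):
    """Adjacency sets for an undirected graph on vertices 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def reachable(adj, start, d):
    """Vertices joined to some vertex of `start` by a path avoiding d."""
    d = set(d)
    seen = set(start) - d
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen - d:
            seen.add(v)
            queue.append(v)
    return seen


def separated(adj, a, b, d):
    """True if every path from a vertex of a to a vertex of b meets d."""
    return not (reachable(adj, a, d) & set(b))


def extend(adj, a, b, d):
    """Enlarge a, b to a*, b*, still separated by d, with a* | b* | d = C.

    Assumption: a and b are disjoint from d and separated by d.
    """
    a_star = reachable(adj, a, d)
    b_star = set(adj) - a_star - set(d)
    return a_star, b_star


def covariance_from_graph(n, edges, diag=2.0, off=-0.5):
    """Covariance K whose inverse vanishes off the edges (condition (i)).

    The values are arbitrary; positive definiteness relies on strict
    diagonal dominance, i.e. diag > |off| * (maximum degree).
    """
    J = diag * np.eye(n)
    for u, v in edges:
        J[u, v] = J[v, u] = off
    return np.linalg.inv(J)


def cond_cross_cov(K, a, b, d):
    """Cross-covariance of X_a and X_b given X_d: K_ab - K_ad K_d^{-1} K_db."""
    a, b, d = sorted(a), sorted(b), sorted(d)
    K_ad = K[np.ix_(a, d)]
    K_dd = K[np.ix_(d, d)]
    K_db = K[np.ix_(d, b)]
    return K[np.ix_(a, b)] - K_ad @ np.linalg.solve(K_dd, K_db)


# Example graph (hypothetical): a path 0-1-2-3 with an extra leaf 4 on vertex 2.
edges = [(0, 1), (1, 2), (2, 3), (2, 4)]
adj = build_adjacency(5, edges)
K = covariance_from_graph(5, edges)

a_star, b_star = extend(adj, {0}, {3}, {2})
print("a* =", a_star, " b* =", b_star)
print("largest |conditional cross-covariance|:",
      np.abs(cond_cross_cov(K, a_star, b_star, {2})).max())
```

For jointly Gaussian variables a vanishing conditional cross-covariance is the same as conditional independence, so the last printed number should be zero up to rounding error whenever $d$ separates the two sets. Replacing $d$ by a set that does not separate them, for instance `{4}` with `a = {0}` and `b = {3}`, should give a visibly nonzero value.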