Introduction to multivariate statistics
Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010
Lecture 3
1 From yesterday's second lecture: Conditional
independence with Gaussians
I turn now to the material in the paper "Gaussian Markov
distributions over finite graphs". The setting there is a random
vector $X = (X_\gamma : \gamma \in C)$ indexed by a finite set $C$, which will later be
given a graph structure. The covariance matrix of $X$ is denoted
by $K$, and for subsets $a$, $b$ of $C$, I use the notation $X_a$, $X_b$, $K_{a,b}$,
$K_a = K_{a,a}$ for the restrictions of $X$ and $K$ to these subsets. Also,
$ab$ and $a \setminus b$ denote intersection and difference, respectively.
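Proposition 1. For subsets $a$ and $b$ of $C$ with $a \cup b = C$, the
following are equivalent.
(i) $K_{a,b} = K_{a,ab} K_{ab}^{-1} K_{ab,b}$;
(i') $K_{a \setminus b,\, b \setminus a} = K_{a \setminus b,\, ab} K_{ab}^{-1} K_{ab,\, b \setminus a}$;
(ii) $(K^{-1})_{a \setminus b,\, b \setminus a} = 0$;
(iii) $X_a$ and $X_b$ are c.i. given $X_{ab}$.
Corollary. $X_\alpha$ and $X_\beta$ are c.i. given $X_{\{\alpha,\beta\}'}$ iff $K^{-1}(\alpha,\beta) = 0$.
(Here c.i. abbreviates conditionally independent, and $'$ denotes complement.)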
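The equivalence of (i) and (ii) can be checked on a small example. The sketch below is only an illustration, not part of the lecture: the 4 x 4 matrices, the index sets and all function names are assumptions chosen for the example, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction


def inverse(m):
    """Invert a square matrix exactly by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sub(m, rows, cols):
    """Restriction of the matrix m to the given row and column indices."""
    return [[m[i][j] for j in cols] for i in rows]


def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def condition_i_holds(precision, a, b):
    """Check K_{a,b} = K_{a,ab} K_{ab}^{-1} K_{ab,b}, where K is the inverse of precision."""
    K = inverse(precision)
    ab = [i for i in a if i in b]
    rhs = matmul(matmul(sub(K, a, ab), inverse(sub(K, ab, ab))), sub(K, ab, b))
    return sub(K, a, b) == rhs


# Hypothetical example on C = {0, 1, 2, 3} with a = {0, 1, 2}, b = {1, 2, 3},
# so that a\b = {0} and b\a = {3}. Both matrices are positive definite.
a, b = [0, 1, 2], [1, 2, 3]
zero_block = [[3, 1, 1, 0], [1, 4, 1, 1], [1, 1, 4, 1], [0, 1, 1, 3]]     # (K^-1)(0,3) = 0
nonzero_block = [[3, 1, 1, 1], [1, 4, 1, 1], [1, 1, 4, 1], [1, 1, 1, 3]]  # (K^-1)(0,3) = 1
```

With these inputs, `condition_i_holds(zero_block, a, b)` is true and `condition_i_holds(nonzero_block, a, b)` is false, matching the equivalence of (i) and (ii).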
2 Gaussian Markov distributions
over finite undirected graphs
All the graphs I will discuss are undirected. In fact, it was
seeing the developments in the books by C Bishop
and MI Jordan, who both begin with directed graphs,
that led me to present this material. In a sense it is
easier to begin with directed graphs, for the theory is just
a specialization of the familiar factorization:
$P(X_1, X_2, X_3, \ldots, X_n) = P(X_1) P(X_2 \mid X_1) P(X_3 \mid X_1, X_2) \cdots P(X_n \mid X_1, X_2, X_3, \ldots, X_{n-1})$.
However, once directed graphs are introduced, it is hard
to go back and do justice to the undirected case. In my
view, neither of the above authors does so.
3 Terminology and notation concerning
undirected graphs
A graph $C$ has vertices $V(C)$ and edges $E(C)$, but I will keep
things simple by supposing that $V(C) = C$, so that $C$ stands for both the
graph and its vertex set, and not using $V(C)$ any more. The notions of
adjacency (there exists an edge), neighbours, (maximal) clique, chain or path,
and cycle should be familiar.
As notation, write $\mathrm{bd}\{\gamma\}$ for the set of neighbours of $\gamma \in C$, the
boundary (which would be denoted by a del if .ppt had one), and
$\mathrm{cl}\{\gamma\} = \{\gamma\} \cup \mathrm{bd}\{\gamma\}$ for the closure of $\{\gamma\}$ (which would be a bar
over $\gamma$ if I could do one in .ppt).
Finally, we say that two sets $a$ and $b$ of vertices are separated
by a third set $d$ if every path connecting an element $\alpha \in a$ to
an element $\beta \in b$ must intersect ("cross") $d$.
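The definition of separation can be tested mechanically with a breadth-first search that is never allowed to enter d. The following is a minimal sketch under assumptions of my own: the graph is stored as a dictionary of neighbour sets, and the function name and the 4-cycle example are hypothetical.

```python
from collections import deque


def separates(adj, a, b, d):
    """Return True if every path from a vertex of a to a vertex of b meets d."""
    b, d = set(b), set(d)
    # A path starting inside d already intersects d, so only start outside d.
    start = set(a) - d
    seen = set(start)
    queue = deque(start)
    while queue:
        v = queue.popleft()
        if v in b:
            return False  # reached b along a path that avoids d
        for w in adj[v]:
            if w not in d and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


# Hypothetical example: the 4-cycle 1 - 2 - 3 - 4 - 1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
```

Here `separates(cycle, {1}, {3}, {2, 4})` is true, while `separates(cycle, {1}, {3}, {2})` is false because the path 1, 4, 3 avoids vertex 2.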
4 Characterization of Gaussian Markov
distributions over a finite graph
Proposition. Let $C$ be a simple undirected graph with vertex
set $C$ and edge set $E(C)$ indexing Gaussian random
variables $X = (X_\gamma : \gamma \in C)$ with covariance matrix $K$.
Then the following are equivalent.
(i) Constraint on $K$: $K^{-1}(\alpha,\beta) = 0$ if $\{\alpha,\beta\}$ is not an edge and
$\alpha \neq \beta$.
(ii) Local Markov property: For every $\gamma \in C$, $X_\gamma$ and $X_{\{\gamma\}'}$
are conditionally independent given $X_{\mathrm{bd}\{\gamma\}}$.
(iii) Global Markov property: For every pair of disjoint
subsets $a$ and $b$ of $C$ and third subset $d$ separating $a$ from $b$
in $C$, $X_a$ and $X_b$ are conditionally independent given $X_d$.
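Constraint (i) is a statement about the zero pattern of the inverse covariance matrix, so it can be checked by inspection. The sketch below assumes the inverse covariance (precision) matrix is given directly as a list of rows; the function names and the 4-cycle example are illustrative assumptions, not notation from the paper.

```python
def edges_from_precision(precision):
    """Pairs {i, j}, i != j, whose entry in the precision matrix is nonzero."""
    n = len(precision)
    return {frozenset((i, j))
            for i in range(n) for j in range(i + 1, n)
            if precision[i][j] != 0}


def satisfies_constraint(precision, edges):
    """Constraint (i): every nonzero off-diagonal entry corresponds to an edge."""
    return edges_from_precision(precision) <= {frozenset(e) for e in edges}


def boundary(precision, gamma):
    """Neighbours of gamma in the graph read off from the nonzero entries."""
    return {j for j, x in enumerate(precision[gamma]) if j != gamma and x != 0}


# Hypothetical example: vertices 0..3 on the cycle 0 - 1 - 2 - 3 - 0.
# The matrix is symmetric and strictly diagonally dominant, hence positive definite.
cycle_precision = [[3, 1, 0, 1],
                   [1, 3, 1, 0],
                   [0, 1, 3, 1],
                   [1, 0, 1, 3]]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

In this example `boundary(cycle_precision, 0)` is `{1, 3}`, so the local Markov property says that the variable at vertex 0 is conditionally independent of the one at vertex 2 given those at vertices 1 and 3.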
5 Proof of the Proposition
To see that (i) and (ii) are equivalent, we note that (i) is equivalent to
$(K^{-1})_{\{\gamma\},\, \mathrm{cl}\{\gamma\}'} = 0$ for every $\gamma \in C$. Putting $a = \mathrm{cl}\{\gamma\}$ and $b = \{\gamma\}'$ in our result from
yesterday, Proposition 1 above, proves the result, since $a \setminus b = \{\gamma\}$, $b \setminus a = \mathrm{cl}\{\gamma\}'$, and the intersection
of $\mathrm{cl}\{\gamma\}$ and $\{\gamma\}'$ is exactly $\mathrm{bd}\{\gamma\}$. Draw a diagram! (I note here that
the notation in the paper is not wholly consistent.)
The equivalence of (i) and (iii) for the case $a \cup b \cup d = C$ follows in a
similar way. To see this, put $a_1 = a \cup d$ and $b_1 = b \cup d$ in yesterday's
result. Then $a_1 \cup b_1 = C$ and the intersection of $a_1$ and $b_1$ is $d$.
It remains to prove that every disjoint pair $a$ and $b$ separated by $d$ in
the graph $C$ can be included in subsets $a^* \supseteq a$ and $b^* \supseteq b$ also
separated by $d$, where $a^* \cup b^* \cup d = C$. We turn now to this purely
topological result. When it is proved, our Proposition is proved.
6 Proof of the topological result
I first saw the argument below in the paper "Markov measures and
Markov extensions" by N N Vorob'ev, Theory of Probability and its
Applications, 1963.
Suppose that $a^* \supseteq a$ and $b^* \supseteq b$ are separated by $d$, and are maximal
w.r.t. this property. We prove that $a^* \cup b^* \cup d = C$ by contradiction.
Assume that $a^* \cup b^* \cup d \neq C$. Then there is an element $\delta \in C \setminus (a^* \cup b^* \cup d)$,
and we consider $a^{**} = a^* \cup \{\delta\}$ and $b^{**} = b^* \cup \{\delta\}$.
From the assumed maximality of $a^*$ and $b^*$, neither the pair $a^{**}$, $b^*$ nor
the pair $a^*$, $b^{**}$ can be separated by $d$ in $C$. Since $a^*$ and $b^*$ themselves
are separated by $d$, any path witnessing this must have $\delta$ as an endpoint.
Thus there exists a path $p_1$ say, connecting
some $\alpha \in a^*$ to $\delta$ without intersecting $d$, and also a path $p_2$ say,
connecting some $\beta \in b^*$ to $\delta$ without intersecting $d$. But then the
concatenation of the paths $p_1$ and $p_2$ connects $\alpha$ to $\beta$ without
intersecting $d$, which contradicts our assumption on $a^*$ and $b^*$.
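The existence of such a pair can also be seen constructively, which is a different route from the maximality argument above: take a* to be everything reachable from a without entering d, and b* to be whatever is left outside d. The sketch below assumes that a, b and d are pairwise disjoint and that the graph is a dictionary of neighbour sets; the function name and the example graph are hypothetical.

```python
def maximal_extension(adj, a, b, d):
    """Return (a_star, b_star) separated by d with a_star | b_star | d == all vertices.

    Assumes a, b and d are pairwise disjoint and that d separates a from b.
    """
    a, b, d = set(a), set(b), set(d)
    # a_star: every vertex reachable from a along paths that avoid d.
    a_star = set(a)
    stack = list(a)
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in d and w not in a_star:
                a_star.add(w)
                stack.append(w)
    if a_star & b:
        raise ValueError("d does not separate a from b")
    # b_star: all remaining vertices outside d; it contains b by the check above.
    b_star = set(adj) - a_star - d
    return a_star, b_star


# Hypothetical example: the path 1 - 2 - 3 - 4 - 5 with an extra vertex 6 attached to 3.
tree = {1: {2}, 2: {1, 3}, 3: {2, 4, 6}, 4: {3, 5}, 5: {4}, 6: {3}}
a_star, b_star = maximal_extension(tree, {1}, {5}, {3})
```

For this graph the result is `a_star == {1, 2}` and `b_star == {4, 5, 6}`. No path avoiding d can join the two sets, since its endpoint in b* would then have been reachable from a and so placed in a*.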
