Introduction to multivariate statistics
Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010
Lecture 4

Aim of this lecture
In this final lecture, I want to obtain a result completely analogous to the Proposition in the previous lecture, but for discrete rather than Gaussian distributions. This is known as the Hammersley-Clifford theorem, which appeared in 1971 in a still unpublished technical report. The 1979 Sankhya paper (vol. 41, pp. 184-197) which was handed out yesterday contains a full proof of this result, and a discussion of aspects of some proofs then available. My plan today is to talk you through the proof, without writing everything out a second time. I'll cut and paste some details here, and discuss the paper as I go.

The first thing to note is that our graph setting is identical to that in Lecture 3: our notation X = (Xγ : γ ∈ C) is the same, the only difference being that the random variables Xγ are all discrete, not Gaussian.
The second thing to note is that parts (ii) and (iii) of the Proposition we seek here are identical to those in the Proposition of the last lecture; what is new is the discrete analogue of the condition on K⁻¹. To get there requires some notation. We seek the following.

Characterization of discrete Markov distributions over a finite graph
Proposition. Let C be a simple undirected graph with vertex set C and edge set E(C), the vertices indexing discrete random variables X = (Xγ : γ ∈ C) with strictly positive joint distribution p. Then the following are equivalent.

(i) Constraint on log p: this is what I still need to tell you; see the section on the condition on log p below.

(ii) Local Markov property: for every γ ∈ C, Xγ and X{γ}' are conditionally independent given Xbd{γ}.

(iii) Global Markov property: for every pair of disjoint subsets a and b of C and third subset d separating a from b in C, Xa and Xb are conditionally independent given Xd.
I hope you now see where we are heading. Note that the condition on K⁻¹ is in a sense a constraint on log φ, where φ is the normal density. In the discrete case, we need to go beyond pairs.

The condition on log p
Before I can even write down the condition on log p, I need to spell out some notation. The random variable Xγ is of course discrete, and so takes its values in a finite set which will be denoted by Iγ. (Extending to countably infinite sets is not really a problem, see the paper.)
Illustration of the previous lemma
Let's suppose that C = {1,2,3} and that the sets I1, I2 and I3 are arbitrary finite sets with elements i, j and k respectively. Suppose that P(X1=i, X2=j, X3=k) = pijk. If we have

P(X1=i, X3=k | X2=j) = P(X1=i | X2=j) P(X3=k | X2=j),

then we readily calculate that

(*)   pijk = pij+ p+jk / p+j+ ,

where "+" denotes summing out the missing subscript. Check! Thus log pijk = uij + vjk for suitable functions u and v.
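As for the "Check!": by definition P(X1=i, X3=k | X2=j) = pijk / p+j+, P(X1=i | X2=j) = pij+ / p+j+ and P(X3=k | X2=j) = p+jk / p+j+, so the displayed conditional independence reads pijk / p+j+ = (pij+ / p+j+)(p+jk / p+j+); multiplying through by p+j+ gives (*), and taking logarithms of (*) gives the additive form just stated, with uij = log pij+ and vjk = log(p+jk / p+j+).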
Conversely, if pijk = aij bjk, then it is a straightforward calculation to check that pij+ = aij bj+, p+jk = a+j bjk and p+j+ = a+j bj+, from which (*) follows by multiplication and division.
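If you want to see (*) in action numerically, here is a minimal sketch in Python with NumPy (my choice for illustration; the tables a and b are made up, and none of this is from the paper). It builds a strictly positive joint distribution of the product form pijk ∝ aij bjk and checks that (*) holds to machine precision.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up strictly positive tables a_ij and b_jk; any positive choice will do.
    a = rng.uniform(0.5, 2.0, size=(3, 4))   # indexed by (i, j)
    b = rng.uniform(0.5, 2.0, size=(4, 2))   # indexed by (j, k)

    # p_ijk proportional to a_ij * b_jk, normalised to a probability table.
    p = np.einsum('ij,jk->ijk', a, b)
    p /= p.sum()

    # Marginals: "+" means summing out the missing subscript.
    p_ij = p.sum(axis=2)          # p_ij+
    p_jk = p.sum(axis=0)          # p_+jk
    p_j  = p.sum(axis=(0, 2))     # p_+j+

    # (*) : p_ijk = p_ij+ p_+jk / p_+j+
    rhs = np.einsum('ij,jk->ijk', p_ij, p_jk) / p_j[None, :, None]
    print(np.allclose(p, rhs))    # True: X1 and X3 are conditionally independent given X2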
Illustration of the previous lemma

Suppose that we carry on the notation of the example above, and write wijk = log pijk. The following additive expansion of wijk into a grand mean, 3 main effects, 3 two-factor interactions, and a 3-factor interaction, is an instance of the one described on the previous page. It should be familiar to anyone who has ever encountered linear models and ANOVA. Here "." means the average has been taken over the missing subscript.

wijk = w... + (wi.. - w...) + (w.j. - w...) + (w..k - w...)
     + (wij. - wi.. - w.j. + w...) + (w.jk - w.j. - w..k + w...) + (wi.k - wi.. - w..k + w...)
     + (wijk - wij. - wi.k - w.jk + wi.. + w.j. + w..k - w...).

These 8 terms are orthogonal in a suitable inner product, and the expansion is unique.
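To make the expansion concrete, here is a short sketch (again Python/NumPy, again only illustrative, with a made-up table wijk). It computes the eight terms of the expansion above and checks that they add back to wijk and are pairwise orthogonal in one suitable inner product, namely the one that sums products over all cells of the table.

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.normal(size=(3, 4, 2))          # an arbitrary table w_ijk, e.g. log p_ijk

    def avg(x, axes):
        """Average over the given axes, keeping dimensions so the terms broadcast."""
        return x.mean(axis=axes, keepdims=True)

    m   = avg(w, (0, 1, 2))                                   # grand mean w_...
    A   = avg(w, (1, 2)) - m                                  # main effect of index 1
    B   = avg(w, (0, 2)) - m                                  # main effect of index 2
    Ck  = avg(w, (0, 1)) - m                                  # main effect of index 3
    AB  = avg(w, (2,)) - avg(w, (1, 2)) - avg(w, (0, 2)) + m  # {1,2} interaction
    BC  = avg(w, (0,)) - avg(w, (0, 2)) - avg(w, (0, 1)) + m  # {2,3} interaction
    AC  = avg(w, (1,)) - avg(w, (1, 2)) - avg(w, (0, 1)) + m  # {1,3} interaction
    ABC = w - m - A - B - Ck - AB - BC - AC                   # 3-factor interaction

    terms = [np.broadcast_to(t, w.shape) for t in (m, A, B, Ck, AB, BC, AC, ABC)]
    print(np.allclose(sum(terms), w))                         # the 8 terms add back to w
    print(all(abs((terms[r] * terms[s]).sum()) < 1e-9         # and are pairwise orthogonal
              for r in range(8) for s in range(r + 1, 8)))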
Examples of graphs and their sets of cliques
Illustration of the previous lemma
Continue our running example, with C = {1,2,3}, and suppose now that a graph over C has precisely two edges: E(C) = {{1,2}, {2,3}}. Then the cliques of this graph are again {1,2} and {2,3}, and a function wijk would belong to the linear space of Lemma 3.1 if and only if it could be written wijk = uij + vjk for suitable functions u and v. Going back to the expansion above, this means omitting all the terms in that expansion except those involving only indices within the cliques {1,2} and {2,3}, i.e. dropping the {1,3} and three-factor interaction terms. We see that there is redundancy in this representation.
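Pulling the last two illustrations together, here is one more illustrative sketch (Python/NumPy, with made-up functions u and v). For the graph with cliques {1,2} and {2,3}, it builds wijk = uij + vjk, forms p ∝ exp(w), and checks both that the omitted {1,3} interaction of w vanishes identically and that p satisfies (*), i.e. that X1 and X3 are conditionally independent given X2.

    import numpy as np

    rng = np.random.default_rng(2)

    # Made-up interaction functions for the cliques {1,2} and {2,3}.
    u = rng.normal(size=(3, 4))              # u_ij
    v = rng.normal(size=(4, 2))              # v_jk

    # w_ijk = u_ij + v_jk lies in the linear space of Lemma 3.1 for this graph.
    w = u[:, :, None] + v[None, :, :]
    p = np.exp(w)
    p /= p.sum()                             # a strictly positive joint distribution

    # The omitted {1,3} interaction of w is identically zero.
    w13 = w.mean(axis=1) - w.mean(axis=(1, 2))[:, None] - w.mean(axis=(0, 1))[None, :] + w.mean()
    print(np.allclose(w13, 0.0))

    # And p is Markov with respect to the graph: (*) holds, so X1 and X3 are
    # conditionally independent given X2.
    p_ij, p_jk, p_j = p.sum(axis=2), p.sum(axis=0), p.sum(axis=(0, 2))
    print(np.allclose(p, np.einsum('ij,jk->ijk', p_ij, p_jk) / p_j[None, :, None]))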
Examples of graphs and their conditional independencies

Thanks for listening!