
Introduction to multivariate statistics
Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010
Lecture 2
1 The multivariate normal: Conditional densities
In the notation of p.8 of Lecture 1, put a_k = -q_{kp}/q_{pp}, so that y_p = q_{pp}(x_p - a_1x_1 - … - a_{p-1}x_{p-1}). Recall that Y_p was found to be independent of X_1, …, X_{p-1}.
In other words, the a_k are numbers such that T = X_p - a_1X_1 - … - a_{p-1}X_{p-1} is independent of (X_1, …, X_{p-1}), and this property uniquely characterizes the coefficients a_k.
To obtain the conditional density of X_p given X_1 = x_1, …, X_{p-1} = x_{p-1}, we must divide the density of (X_1, …, X_p) by the marginal density of (X_1, …, X_{p-1}). From the above, we get an exponential with exponent -½ y_p^2 / q_{pp}.
2 Multivariate normal: Conditional densities, II
Thus the conditional density of X_p given X_1 = x_1, …, X_{p-1} = x_{p-1} is normal with expectation a_1x_1 + … + a_{p-1}x_{p-1} and variance 1/q_{pp}, i.e.
(*)   E(X_p | X_1, …, X_{p-1}) = a_1X_1 + … + a_{p-1}X_{p-1}.
Theorem. If (X_1, …, X_p) has a normal density, the conditional density of X_p given X_1, …, X_{p-1} is again normal. Further, the conditional expectation (*) is the unique linear function of X_1, …, X_{p-1} making T independent of (X_1, …, X_{p-1}). The conditional variance equals var(T) = 1/q_{pp}.
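To make this concrete, here is a small numerical sketch in Python/NumPy (the 3×3 covariance matrix is an arbitrary illustrative choice, not from the lecture). It checks that the coefficients a_k = -q_{kp}/q_{pp} read off the precision matrix Q = Σ^{-1} agree with the usual regression coefficients of X_p on X_1, …, X_{p-1}, and that 1/q_{pp} equals the residual (Schur-complement) variance.

```python
import numpy as np

# Arbitrary positive-definite covariance matrix for (X_1, X_2, X_3), p = 3.
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])

Q = np.linalg.inv(Sigma)          # precision matrix
p = Sigma.shape[0]

# Coefficients a_k = -q_kp / q_pp from the precision matrix.
a_from_Q = -Q[:p-1, p-1] / Q[p-1, p-1]

# The same coefficients via the covariance: Sigma_{p,rest} Sigma_{rest,rest}^{-1}.
a_from_Sigma = Sigma[p-1, :p-1] @ np.linalg.inv(Sigma[:p-1, :p-1])

# Conditional variance: 1/q_pp versus the Schur complement.
var_from_Q = 1.0 / Q[p-1, p-1]
var_from_Sigma = Sigma[p-1, p-1] - a_from_Sigma @ Sigma[:p-1, p-1]

print(np.allclose(a_from_Q, a_from_Sigma))      # True
print(np.allclose(var_from_Q, var_from_Sigma))  # True
```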
I now turn to an alternative development, leading to a more general conclusion. But first,
3 Two simple facts about normal distributions
Introduce the notation X ~ N(μ, Σ) to abbreviate the statement
X is normal with center μ and covariance matrix Σ. Also note
that I am no longer making my vectors and matrices bold.
Fact 1: If X ~ N(μ, Σ) and Y = AX + b, then Y ~ N(Aμ + b, AΣA^T).
Fact 2: If X ~ N(μ, Σ), then AX and BX are independent iff AΣB^T = 0.
The proofs of these facts are straightforward consequences of earlier results, e.g. we use the readily established formulae var(AX) = AΣA^T, and cov(AX, BX) = AΣB^T.
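Both facts are easy to check by simulation. Here is a sketch in NumPy (the particular μ, Σ, A, B and b below are arbitrary choices of mine): draw samples of X, then compare the empirical mean and covariance of Y = AX + b, and the empirical cross-covariance of AX and BX, with the formulae above.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
B = np.array([[3.0, 1.0, 0.0]])
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of X
Y = X @ A.T + b                                        # samplewise Y = AX + b

# Fact 1: Y should have mean A mu + b and covariance A Sigma A^T.
print(np.abs(Y.mean(axis=0) - (A @ mu + b)).max())                 # ~0
print(np.abs(np.cov(Y, rowvar=False) - A @ Sigma @ A.T).max())     # ~0

# Ingredient of Fact 2: cov(AX, BX) = A Sigma B^T (zero iff AX and BX independent).
Z = X @ B.T
cross = np.cov(np.hstack([Y, Z]), rowvar=False)[:2, 2:]
print(np.abs(cross - A @ Sigma @ B.T).max())       # ~0, up to Monte Carlo error
```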
4 Conditional densities: a second approach
Here is a more multivariate version of our recent result. It is proved directly in Bishop's book, but the following derivation is simpler, given the independence result we use. Write our p-vector as X = (X_1^T, X_2^T)^T, where X_1 is an r-vector and X_2 is an s-vector, s = p - r. Partition the covariance matrix var(X) = Σ of X into diagonal blocks Σ_{11} = var(X_1), Σ_{22} = var(X_2), and off-diagonal blocks Σ_{12} = Σ_{21}^T = cov(X_1, X_2).
Theorem. If X ~ N(μ, Σ), then X_1 and X_{2.1} = X_2 - Σ_{21}Σ_{11}^{-1}X_1 have the following distributions, and are independent: X_1 ~ N(μ_1, Σ_{11}), X_{2.1} ~ N(μ_{2.1}, Σ_{22.1}), where μ_{2.1} = μ_2 - Σ_{21}Σ_{11}^{-1}μ_1, and Σ_{22.1} = Σ_{22} - Σ_{21}Σ_{11}^{-1}Σ_{12}.
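Before the proof, a quick simulation sketch of the theorem (NumPy; the 4-dimensional example and the split r = s = 2 are arbitrary choices): form X_{2.1} from samples of X and check that it is empirically uncorrelated with X_1 and has approximately the stated mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
r, s = 2, 2

mu = np.array([1.0, 0.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.5, 0.2],
                  [0.6, 1.5, 0.4, 0.3],
                  [0.5, 0.4, 1.8, 0.5],
                  [0.2, 0.3, 0.5, 1.2]])

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
S22_1 = S22 - S21 @ np.linalg.inv(S11) @ S12          # Sigma_{22.1}
mu2_1 = mu[r:] - S21 @ np.linalg.inv(S11) @ mu[:r]    # mu_{2.1}

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :r], X[:, r:]
X2_1 = X2 - X1 @ (S21 @ np.linalg.inv(S11)).T         # samplewise X_{2.1}

full = np.cov(np.hstack([X1, X2_1]), rowvar=False)
print(np.abs(full[:r, r:]).max())                 # cross-cov with X1: ~0
print(np.abs(full[r:, r:] - S22_1).max())         # cov(X_{2.1}) vs Sigma_{22.1}: ~0
print(np.abs(X2_1.mean(axis=0) - mu2_1).max())    # mean(X_{2.1}) vs mu_{2.1}: ~0
```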
5 Proof of the preceding theorem
The main work lies in establishing that X_1 and X_{2.1} are uncorrelated, and so independent. This and the formulae given follow from the two facts stated on p.4 above.
It follows from what we have just proved that the conditional distribution of X_{2.1} given X_1 is the same as its marginal distribution. But X_2 is just X_{2.1} plus Σ_{21}Σ_{11}^{-1}X_1, which is constant given X_1. Hence
Theorem. With the same notation and assumptions, X_2 | X_1 ~ N(μ_2 + Σ_{21}Σ_{11}^{-1}(X_1 - μ_1), Σ_{22.1}). I leave you to fill in the details.
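A brute-force check of this theorem: keep only those samples whose X_1 lands in a narrow window around a chosen value x_1, and compare the empirical mean and covariance of X_2 among them with the formula. The sketch below does this (NumPy; the 3-dimensional example with r = 1, the conditioning value and the window width are arbitrary choices, and the agreement is only approximate because of the finite window and sample size).

```python
import numpy as np

rng = np.random.default_rng(2)
r = 1                                   # condition on the first coordinate only

mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 2.0, 0.5],
                  [0.3, 0.5, 1.5]])

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
S22_1 = S22 - S21 @ np.linalg.inv(S11) @ S12

x1 = np.array([1.2])                    # conditioning value, chosen arbitrarily
cond_mean = mu[r:] + S21 @ np.linalg.inv(S11) @ (x1 - mu[:r])

# Keep samples whose X_1 falls within +/- 0.05 of x1, and look at their X_2 values.
X = rng.multivariate_normal(mu, Sigma, size=500_000)
keep = np.abs(X[:, 0] - x1[0]) < 0.05
X2_given = X[keep, r:]

print(keep.sum())                                       # number of samples kept
print(X2_given.mean(axis=0), cond_mean)                 # roughly equal
print(np.cov(X2_given, rowvar=False), S22_1, sep="\n")  # roughly equal
```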
6 Details of the proof
In the lecture, I screwed up the proof. Here's the correct version, using the fact that matrix coefficients in the second argument of cov(.,.) must be transposed (last line, p.4 above).
cov(X_1, X_{2.1}) = cov(X_1, X_2 - Σ_{21}Σ_{11}^{-1}X_1) = cov(X_1, X_2) - cov(X_1, X_1)Σ_{11}^{-1}Σ_{21}^T
= Σ_{12} - Σ_{11}Σ_{11}^{-1}Σ_{21}^T = Σ_{12} - Σ_{12} = 0,
using the symmetry of Σ_{11} and Σ_{21}^T = Σ_{12}.
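If you want a numerical confirmation of this bookkeeping, it takes a couple of lines with a random positive-definite Σ (a trivial sketch; the block sizes r = 2, s = 3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
r, s = 2, 3

# Random symmetric positive-definite covariance, partitioned into blocks.
G = rng.standard_normal((r + s, r + s))
Sigma = G @ G.T + (r + s) * np.eye(r + s)
S11, S12, S21 = Sigma[:r, :r], Sigma[:r, r:], Sigma[r:, :r]

# cov(X_1, X_{2.1}) = Sigma_12 - Sigma_11 Sigma_11^{-1} Sigma_21^T, which vanishes.
print(np.allclose(S12 - S11 @ np.linalg.inv(S11) @ S21.T, 0))   # True
```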
7 The partitioned matrix inverse formula
Bishop's approach makes essential use of the following formula, which we need later anyway:

\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}
= \begin{pmatrix} M & -M B D^{-1} \\ -D^{-1} C M & D^{-1} + D^{-1} C M B D^{-1} \end{pmatrix},
\quad \text{where } M = (A - B D^{-1} C)^{-1}.
This formula, which simplifies for symmetric matrices (i.e., when B^T = C), permits us to interpret Σ_{22.1} as the inverse of Q_{22}, where Q = Σ^{-1} is partitioned in the same way as Σ. Check.
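Here is one way to carry out that check numerically (NumPy; the random positive-definite Σ and the block sizes r = 2, s = 3 are arbitrary choices): verify the partitioned inverse formula block by block, and then that Σ_{22.1} coincides with the inverse of the (2,2) block of Q = Σ^{-1}.

```python
import numpy as np

rng = np.random.default_rng(4)
r, s = 2, 3
n = r + s

# Random symmetric positive-definite Sigma, partitioned as in the text.
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
S22inv = np.linalg.inv(S22)

# Partitioned inverse formula with A = S11, B = S12, C = S21, D = S22.
M = np.linalg.inv(S11 - S12 @ S22inv @ S21)
Q_blocks = np.vstack([
    np.hstack([M, -M @ S12 @ S22inv]),
    np.hstack([-S22inv @ S21 @ M, S22inv + S22inv @ S21 @ M @ S12 @ S22inv]),
])
print(np.allclose(Q_blocks, np.linalg.inv(Sigma)))   # True: the block formula gives Sigma^{-1}

# The "Check": Sigma_{22.1} is the inverse of Q_{22}.
Q = np.linalg.inv(Sigma)
S22_1 = S22 - S21 @ np.linalg.inv(S11) @ S12
print(np.allclose(S22_1, np.linalg.inv(Q[r:, r:])))  # True
```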
8 Conditional independence with normals
I turn now to the material in the paper Gaussian Markov distributions over finite graphs. The setting there is a random vector X = (X_γ : γ ∈ C) indexed by a finite set C which will later be given a graph structure. The covariance matrix of X is denoted by K, and for subsets a, b of C, I use the notation X_a, X_b, K_{a,b}, K_a = K_{a,a} for the restrictions of X and K to these subsets. Also, ab and a\b denote intersection and difference, respectively.
Proposition 1. For subsets a and b of C with a ∪ b = C, the following are equivalent:
(i) K_{a,b} = K_{a,ab} K_{ab}^{-1} K_{ab,b};   (i') K_{a\b,b\a} = K_{a\b,ab} K_{ab}^{-1} K_{ab,b\a};
(ii) (K^{-1})_{a\b,b\a} = 0;   (iii) X_a and X_b are c.i. given X_{ab}.
Corollary. X_α and X_β are c.i. given X_{{α,β}'} iff (K^{-1})_{α,β} = 0.
(Here c.i. abbreviates conditionally independent, and ' denotes complement.)
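A small numerical sketch of the proposition (NumPy; the 4-variable chain example and the particular subsets a and b are my own illustrative choices, not from the paper): start from a precision matrix K^{-1} with zeros in the (a\b, b\a) positions, i.e. condition (ii), and confirm that condition (i) then holds for the covariance K.

```python
import numpy as np

# Index set C = {0, 1, 2, 3}; take a = {0, 1, 2}, b = {1, 2, 3},
# so ab = {1, 2}, a\b = {0}, b\a = {3}.
a, b, ab = [0, 1, 2], [1, 2, 3], [1, 2]

# A precision matrix K^{-1} with zeros in positions (a\b, b\a) = (0, 3):
# here the tridiagonal precision of a 4-node chain 0-1-2-3, so (ii) holds.
Kinv = np.array([[ 2.0, -0.8,  0.0,  0.0],
                 [-0.8,  2.0, -0.8,  0.0],
                 [ 0.0, -0.8,  2.0, -0.8],
                 [ 0.0,  0.0, -0.8,  2.0]])
K = np.linalg.inv(Kinv)

def block(M, rows, cols):
    """Restrict a matrix to the given row and column index sets."""
    return M[np.ix_(rows, cols)]

# Condition (i): K_{a,b} = K_{a,ab} K_{ab}^{-1} K_{ab,b}.
lhs = block(K, a, b)
rhs = block(K, a, ab) @ np.linalg.inv(block(K, ab, ab)) @ block(K, ab, b)
print(np.allclose(lhs, rhs))    # True; by the Corollary, X_0 and X_3 are c.i. given X_{1,2}
```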
9 Illustration
In the lecture I drew several figures illustrating the previous
result. They are best summarized as follows:
[Figure: the index set C drawn as overlapping regions a and b, partitioned into a\b, ab, and b\a; X_{a\b} is independent of X_{b\a} given X_{ab}.]