Introduction to multivariate statistics
Terry Speed, SICSA Summer School
Statistical Inference in Computational Biology, Edinburgh, June 14-15, 2010
Lecture 1

Use of probability distributions
I will be presenting a view of multivariate statistics based on
probability distributions. Although I will only be discussing theory,
and not inference, the use of probability distributions underlies
analysis methods which regard any data as having been generated
by a (multivariate) statistical model. In the CS literature, this is
referred to as the use of generative models. Bayesian methods, and
likelihood methods more generally, use probability distributions,
while many other methods, e.g. principal components analysis
(PCA), neural networks (NN), and most clustering methods do not,
at least in their standard forms. However, a good proportion of
those that do not, can, in fact, be re-phrased in terms of probability
models, including PCA, NN and clustering.
Some people argue that the probability model paradigm is a big
deal. I tend to agree, but don't make too much of this claim.
Nonsense can be written by those who make use of the alleged
power of probability models, and excellent work can be done by
those who use non-probabilistic methods.

Two pure classes of multivariate models
It is my observation that there are basically two pure classes of multivariate
probability models: the discrete, and the normal (Gaussian). These can be and are extended and combined in many creative ways, by
making use of other probability distributions such as the geometric,
exponential, binomial, multinomial, Dirichlet, Poisson, and negative
binomial, and by using general probability ideas such as independence,
conditioning and mixing. While some other distributions have apparent multivariate analogues, they
usually have heavy constraints, and are little more than univariate
distributions extended and combined together. There are exceptions to this
statement, notably bivariate and hyperbolic distributions.
By contrast, the discrete and normal are truly multivariate, and have much
greater flexibility, though of course multivariate normals are constrained by
their form. For this reason, I plan to focus on discrete and normal multivariate models
today and tomorrow.

Comment
Someone asked whether I regarded the Dirichlet as a truly
multivariate distribution. I answered no. Dirichlets can be
simulated from i.i.d. gamma random variables, by
conditioning on the sum. In this sense, they are variants
on i.i.d. random variables; see the remark on the
previous page. There is a link between gamma and normal random
variables, but it is not a primary relationship (speaking loosely).
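As a concrete illustration of this construction (a minimal sketch added here, not from the original notes; the parameter values are arbitrary):

```python
# Minimal sketch: simulate a Dirichlet vector by normalizing independent
# gamma random variables, which is equivalent to conditioning on their sum.
# The shape parameters `alpha` are chosen purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([2.0, 3.0, 5.0])   # illustrative Dirichlet parameters

gammas = rng.gamma(shape=alpha, scale=1.0, size=(100_000, len(alpha)))
dirichlet_samples = gammas / gammas.sum(axis=1, keepdims=True)

# Sanity check: the mean of Dirichlet(alpha) is alpha / sum(alpha).
print(dirichlet_samples.mean(axis=0))   # ~ [0.2, 0.3, 0.5]
print(alpha / alpha.sum())
```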

The multivariate normal: Approach 1
I'll begin with multivariate normal (Gaussian) distributions in p
dimensions. There are two main approaches to defining them and
deriving the basic facts.

Approach 1: Start with the univariate normal, and say a random
(column) p-vector X has a p-variate normal distribution if, for every
p-vector of constants a, the linear combination $a^T X$ has a univariate
normal distribution.
This approach yields a few important results easily, e.g. the one
asserting that an arbitrary linear combination of normals is again
normal. (I'm dropping "multivariate" from now on.) But some basic
results require the use of characteristic functions, which are not, in
my view, elementary. Accordingly, I won't pursue this approach.
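To spell out how the definition yields the result mentioned above (an added sketch, not in the original slides): if X is p-variate normal in the sense of Approach 1 and B is any m x p matrix of constants, then for every m-vector b,

$$b^T(BX) = (B^T b)^T X,$$

which is a linear combination of the coordinates of X and hence univariate normal; since b was arbitrary, BX is m-variate normal by the same definition.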
Those of you who are interested can see it further developed in the
excellent book Multivariate analysis by K Mardia, J Kent and J
Bibby, Academic Press.

Comment about multivariate normality
It is far from true that a multivariate distribution is normal if
all the marginals are univariate normal. The requirement of
the previous page says much more: all linear combinations
must be univariate normal. Not only is this hard to check in
practice, it is hard to satisfy. Having said this, I observed that it is unreasonable to expect
a set of data to fit the multivariate normal in every respect.
Useful results can be obtained from approximations in this context, as in others.
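A standard counterexample, added here for concreteness (it is not in the original slides): let X be standard normal and let S take the values ±1 with probability ½ each, independently of X, and put Y = SX. Then

$$X \sim N(0,1), \qquad Y = SX \sim N(0,1),$$

so both marginals are univariate normal, yet $X + Y = (1+S)X$ equals 0 with probability ½, so the linear combination $X + Y$ is not normal and $(X, Y)$ is not bivariate normal.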

The multivariate normal: Approach 2
A probability density in p dimensions is called normal and centered
at the origin if it has the form

$$\varphi(x) = \gamma^{-1}\exp\{-\tfrac{1}{2} q(x)\}$$

where $\gamma$ is a normalizing constant, and

$$q(x) = \sum_j \sum_k q_{jk} x_j x_k = x^T Q x$$

is a quadratic form. (Note: Many people are more explicit here.)
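For reference (this anticipates the relation between $Q$, $\Sigma$ and $\gamma$ derived on the slides below), the fully explicit form many authors write is

$$\varphi(x) = (2\pi)^{-p/2}|\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}x^T\Sigma^{-1}x\}, \qquad\text{i.e. } Q = \Sigma^{-1},\ \ \gamma = (2\pi)^{p/2}|\Sigma|^{1/2}.$$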
A normal density centered at $\mu = (\mu_1, \ldots, \mu_p)$ is given by $\varphi(x - \mu)$.
Elementary arguments show that no diagonal element of Q can
vanish. Define $\Sigma = \mathrm{var}(X) = E\{(X - EX)(X - EX)^T\}$. Mostly, we'll center at 0.
The details relating Q and Σ which follow can be found in
Introduction to probability theory and its applications, volume 2,
by W Feller, Wiley, chapter III, section 6. See also Pattern
recognition and machine learning by CM Bishop, Springer, section 2.3
for a parallel, but slightly different development.

The multivariate normal: Marginal densities
Introduce the transformation

$$y_1 = x_1,\ \ldots,\ y_{p-1} = x_{p-1}, \qquad y_p = q_{1p}x_1 + \cdots + q_{pp}x_p.$$

It can be seen that $q(x) - y_p^2/q_{pp}$ is a quadratic form in $x_1, \ldots, x_{p-1}$
not involving $x_p$. Thus

$$q(x) = y_p^2/q_{pp} + q^*(y)$$

where $q^*(y)$ is a quadratic form in $y_1, \ldots, y_{p-1}$. This shows that the
vector $y = Ax$ has a normal density that factors into two normal
densities, for $Y_p$ and for $(Y_1, \ldots, Y_{p-1})$.

Theorem. All marginal densities of a normal density are again normal.
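A quick check of the claim above, added here for completeness (assuming, as usual, that $Q$ is written in symmetric form, $q_{jk} = q_{kj}$): the terms of $q(x)$ involving $x_p$ are $q_{pp}x_p^2 + 2x_p\sum_{j<p} q_{pj}x_j$, while

$$\frac{y_p^2}{q_{pp}} = q_{pp}x_p^2 + 2x_p\sum_{j<p} q_{pj}x_j + \frac{1}{q_{pp}}\Big(\sum_{j<p} q_{pj}x_j\Big)^{2},$$

so the difference $q(x) - y_p^2/q_{pp}$ contains no terms in $x_p$.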
A simple inductive argument (see Feller for full details) based on
the transformation above shows that there is a matrix C with
positive determinant, such that Z = CX has components which
are mutually independent normal random variables.

The multivariate normal: a basic fact
Theorem. The matrices $Q$ and $\Sigma$ are mutually inverse, and $\gamma^2 = (2\pi)^p\,|\Sigma|$.

Proof. Put $D = E(ZZ^T) = C\Sigma C^T$. This is a matrix with diagonal
elements $E(Z_j^2) = \sigma_j^2$, and zeros off the diagonal. The
density of Z is the product of normal densities $\sigma_j^{-1}\,n(x/\sigma_j)$
(where $n$ is the standard univariate normal density), and hence is
induced by the matrix $D^{-1}$ with diagonal elements $\sigma_j^{-2}$.
Now the density of Z is obtained from that of X by the substitution
$x = C^{-1}z$ and multiplication by the determinant $|C^{-1}|$. Thus

$$z^T D^{-1} z = x^T Q x \quad\text{and}\quad (2\pi)^p|D| = \gamma^2|C|^2.$$

It follows that $Q = C^T D^{-1} C$, and this implies $Q = \Sigma^{-1}$.
It also follows that $|D| = |\Sigma|\,|C|^2$, and hence $\gamma^2 = (2\pi)^p|\Sigma|$.
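Not part of the original notes, but a quick numerical sanity check of these relations, with an arbitrarily chosen positive definite $Q$, might look like this in Python:

```python
# Minimal sketch: verify numerically that gamma^{-1} exp(-1/2 x^T Q x),
# with Sigma = Q^{-1} and gamma^2 = (2*pi)^p |Sigma|, matches the usual
# multivariate normal density. Q below is an arbitrary example.
import numpy as np
from scipy.stats import multivariate_normal

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric positive definite, for illustration
Sigma = np.linalg.inv(Q)
p = Q.shape[0]
gamma = np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

x = np.array([0.3, -1.2])           # an arbitrary test point
phi = np.exp(-0.5 * x @ Q @ x) / gamma
print(phi)
print(multivariate_normal(mean=np.zeros(p), cov=Sigma).pdf(x))  # should agree
```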
The multivariate normal: Covariance matrices
With this we see that factorization of $\Sigma$ corresponds to
factorization of Q, and hence the very important

Corollary. If $(X_1^T, X_2^T)^T$ is normally distributed, then $X_1$ and
$X_2$ are independent iff $\mathrm{cov}(X_1, X_2) = 0$, that is, iff $X_1$ and
$X_2$ are uncorrelated.

A second important fact is the following
Theorem. A matrix Σ is the covariance matrix of a normal
density iff Σ is positive definite.
Equivalently, a matrix Q induces a normal density by the
formula given earlier iff it is positive definite.
The proof is a simple induction; see Feller cited above.
(Of course there are less elementary proofs of this fact.)
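As an illustration of the "if" direction, added here (not from the slides): any positive definite $\Sigma$ arises as the covariance matrix of a normal vector, since one can take $X = LZ$ with $L$ the Cholesky factor of $\Sigma$ and $Z$ a vector of i.i.d. standard normals. A minimal numerical sketch:

```python
# Minimal sketch: given a positive definite Sigma (an arbitrary example),
# construct X = L Z with L the Cholesky factor of Sigma and Z i.i.d. N(0,1),
# so that var(X) = L L^T = Sigma; then compare the empirical covariance.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.0],
                  [0.3, 0.0, 1.0]])   # positive definite, chosen for illustration

L = np.linalg.cholesky(Sigma)
Z = rng.standard_normal(size=(3, 200_000))
X = L @ Z                              # each column of X is one normal p-vector

print(np.cov(X))                       # should be close to Sigma
```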
Elaboration on the Corollary from the previous page
Suppose that $\varphi_1(x_1) = \gamma_1^{-1}\exp\{-\tfrac{1}{2} q_1(x_1)\}$ and
$\varphi_2(x_2) = \gamma_2^{-1}\exp\{-\tfrac{1}{2} q_2(x_2)\}$ are two independent normal
densities with inverse covariance matrices $Q_1$ and $Q_2$ and
covariance matrices $\Sigma_1$ and $\Sigma_2$ respectively. Then the
product density $\varphi(x_1, x_2) = (\gamma_1\gamma_2)^{-1}\exp\{-\tfrac{1}{2}[q_1(x_1) + q_2(x_2)]\}$
has inverse covariance matrix Q and covariance matrix $\Sigma$, where

$$Q = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}
\quad\text{and}\quad
\Sigma = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{pmatrix}.$$
It is clear that if $\Sigma$ can be partitioned as on the right, i.e. the
two components are uncorrelated, then so can Q, and so
independence can be inferred.
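A small numerical illustration of this correspondence, added here (the blocks are arbitrary examples): the inverse of a block-diagonal covariance matrix is block-diagonal, with blocks the inverses of the original blocks.

```python
# Minimal sketch: check numerically that block-diagonal Sigma gives
# block-diagonal Q = Sigma^{-1}, with blocks inv(Sigma1) and inv(Sigma2).
import numpy as np
from scipy.linalg import block_diag

Sigma1 = np.array([[1.0, 0.4],
                   [0.4, 2.0]])
Sigma2 = np.array([[3.0]])

Sigma = block_diag(Sigma1, Sigma2)
Q = np.linalg.inv(Sigma)

print(Q)
print(block_diag(np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)))  # same matrix
```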