Multivariate Normal Distribution – I • We will almost always assume that the joint distribution of the p × 1 vectors of measurements on each sample unit is the p-dimensional multivariate normal distribution. • The MVN assumption is often appropriate: – Variables can sometimes be assumed to be multivariate normal (perhaps after transformation) – Central limit theorem tells us that distribution of many multivariate sample statistics is approximately normal, regardless of the form of the population distribution. • As a bonus, the MVN assumption leads to tractable results (but mathematical convenience should not be the reason for choosing the MVN as the probability model). 107 Multivariate Normal Distribution • The MVN is a generalization of the univariate normal distribution for the case p ≥ 2. • Recall that if X is normal with mean µ and variance σ 2, and its density function is given by 1 1 2 ], f (x) = exp[− (x − µ) 2σ 2 (2πσ 2)1/2 −∞ < x < ∞. • We can write the kernel of the density as: 1 1 2 exp[− 2 (x − µ) ] = exp[− (x − µ)0(σ 2)−1(x − µ)]. 2σ 2 108 Multivariate Normal Distribution Now consider a p × 1 random vector X = [X1, X2, . . . , Xp]0. The kernel shown above generalizes to 1 exp[− (x − µ)0Σ−1(x − µ)], 2 where x = x1 x2 ... xp and µ = µ1 µ2 ... µp and Σ = σ11 σ12 · · · σ1p σ21 σ22 · · · σ2p ... ... ... ... σp1 σp2 · · · σpp is the p × p population covariance matrix. This is the kernel of the MVN distribution. 109 Multivariate Normal Distribution • The quadratic form (x − µ)0Σ−1(x − µ) in the kernel is a statistical distance measure, of the type we described earlier. For any value of x, the quadratic form gives the squared statistical distance of x from µ accounting for the fact that the variances of the p variables may be different and that the variables may be correlated. • This quadratic form is often referred to as Mahalanobis distance. • The density function of the MVN distribution is: f (x) = 1 1 0 Σ−1 (x − µ)], exp[− ( x − µ ) 2 (2π)p/2|Σ|1/2 where the normalizing constant (2π)p/2|Σ|1/2 makes the volume under the MVN density equal to 1. 110 Multivariate Normal Distribution • We use the notation X ∼ Np(µ, Σ) or X ∼ MVN(µ, Σ) to indicate that the p-dimensional random vector X has a multivariate normal distribution with mean vector µ and covariance matrix Σ. 111 Example: Bivariate Normal Density σ11 = σ22, and ρ12 = 0 112 Example: Bivariate Normal Density σ11 = σ22, and ρ12 > 0 113 Density Contours • The x values that yield a constant height for the density form ellipsoids centered at µ. • The MVN density is constant on surfaces or contours where (x − µ)0Σ−1(x − µ) = c2. • Definition of a constant probability density contour is all x’s that satisfy the expression above. • The axes of the ellipses are in the directions of the eigenvectors of Σqand the length of the j − th longest axis is proportional to ( λj ), where λj is the eigenvalue associated with the j − th eigenvector of Σ. 114 Density Contours • Recall that if (λj , ej ) is an eigenvalue-eigenvector pair for Σ and Σ is positive definite, then (λ−1 j , ej ) is an eigenvalueeigenvector pair of Σ−1. q • The jth axis is ±c λj ej , for j = 1, ..., p. • We show later that if c2 = χ 2 p (α), 2 where χ2 p (α) is the upper (100α)th percentile of a χ distribution with p degrees of freedom, then the probability is 1−α that the value of a random vector will be inside the ellipsoid defined by (x − µ)0Σ−1(x − µ) ≤ χ2 p (α) 115 Bivariate Normal Density Contours • Eigenvalues and eigenvectors of Σ are obtained from √ √ |Σ − λI| = 0. Using σ12 = σ21 = ρ σ11 σ22, we have " # " # λ 0 σ11 σ12 − 0 = σ12 σ22 0 λ √ √ σ − λ ρ σ 11 σ22 = √ 11 √ ρ σ11 σ22 σ22 − λ = (σ11 − λ)(σ22 − λ) − ρ2σ11σ22 = λ2 − (σ11 + σ22)λ + σ11σ22(1 − ρ2). 116 Bivariate Normal Density Contours • Solutions to this quadratic equation are: 1 σ11 + σ22 + λ1 = 2 q 1 σ11 + σ22 − λ2 = 2 q (σ11 + σ22)2 − 4σ11σ22(1 − ρ2) (σ11 + σ22)2 − 4σ11σ22(1 − ρ2) Solving quadratic equations: ax2 + bx + c = 0 The solutions are x= −b + q b2 − 4ac 2a and x= −b − q b2 − 4ac 2a 117 Bivariate Normal Density Contours • An eigenvector associated with λ1 satisfies " σ11 σ12 σ12 σ11 #" e11 e12 # " = λ1 e11 e12 # , where eij denotes the jth element in the ith eigenvector. We must solve √ √ (σ11 − λ1)e11 + ρ σ11 σ22e12 = 0 √ √ (σ22 − λ1)e12 + ρ σ11 σ22e11 = 0 118 Bivariate Normal Density Contours Solutions: " e1 = e11 e12 # " = d db # where d is any scalar and b= σ22 − σ11 + q (σ11 + σ22)2 − 4σ11σ22(1 − ρ2) √ √ 2ρ σ11 σ22 119 Bivariate Normal Density Contours Choose d to satisfy 1 = e01e1 = d2(1 + b2). Then 1 d=q 1 + b2 −1 d=q 1 + b2 or and 1 √ 1+b2 e1 = b √ 1+b2 −1 √ 1+b2 −b √ 1+b2 or 120 Bivariate Normal Density Contours The eigenvector corresponding to λ2 is 1 √ 1+c2 e2 = √ c 1+c2 −1 √ 1+c2 √ −c 1+c2 or where c= σ22 − σ11 − q (σ11 + σ22)2 − 4σ11σ22(1 − ρ2) √ √ 2ρ σ11 σ22 These eigenvectors have the following properties: • ||ei|| = q e0iei = 1 • e0iej = 0 for i 6= j 121 Bivariate Normal Contours (σ11 = σ22) • When σ11 = σ22, the formulas simplify to λ1 = σ11(1+ρ) = σ11+σ12 1 √ 2 e1 = , 1 √ 2 and λ2 = σ11(1−ρ) = σ11−σ12 1 √ 2 e2 = . 1 −√ 2 • If σ12 > 0, the major axis of the ellipse will be in the direction of the 45o line. The actual value of σ12 does not matter. If σ12 < 0, then the major axis will be perpendicular to the 45o line. 122 Bivariate Normal Contours (σ11 = σ22) 123 More Bivariate Normal Examples • 50% and 90% contours of two bivariate normal densities. Density is the highest when x = µ. 124 Central (1 − α) × 100% Region of a Bivariate Normal Distribution • The ratio of the lengths of the major and minor axes is √ Length of major axis λ =√ 1 Length of minor axis λ2 • If 1 − α is the probability that a randomly selected member of the population is observed inside the ellipse, then the halflength of the axes are given by q q 2 χ2(α) λi • This is the smallest region that has probability 1 − α of containing a randomly selected member of the population 125 Central (1 − α) × 100% Region of a Bivariate Normal Distribution • The area of the ellipse containing the central (1 − α) × 100% of a bivariate nornal population is area = πχ2 2 (α) q q 1/2 λ1 λ2 = πχ2 2 (α)|Σ| • Note that det(Σ) = det σ11 σ12 σ21 σ22 = det σ11 ρσ1σ2 ρσ1σ2 σ22 = σ11σ22(1−ρ2) 126 Central (1 − α) × 100% Region of a Bivariate Normal Distribution • For fixed variances, σ11 and σ22, area of the ellipse is largest when ρ = 0 • The area becomes smaller as ρ approaches 1 or as ρ approches −1. • For σ11 = σ22 and ρ = 0 the contours of constant density are concentric circles and λ1 = λ2 • For σ11 > σ22 the axes of the ellipse are parallel to the coordinate axes, with the major axis parallel to the horizontal axis. • For σ22 > σ11 the axes of the ellipse are parallel to the coordinate axes, with the major axis parallel to the vertical axis. 127 Central (1 − α) × 100% Region of a Multivariate Normal Distribution • For a p-dimensional normal distribution, the smallest region such that there is probability 1 − α that a randomly selected observation will fall in the region is – a p-dimensional ellipsoid – with hypervolume ip/2 2π p/2 h 2 χp (α) |Σ|1/2 pΓ(p/2) where Γ(·) is the gamma function 128 Gamma Function p p Γ = −1 2 2 when p is an even integer, and p − 2 · · · (2)(1) 2 p (p − 2)(p − 4) · · · (3)(1) √ Γ( ) = π (p−1)/2 2 2 when p is an odd integer 129 Overall Measures of Variability • Generalized variance: |Σ| = λ1λ2 · · · λp • Generalized standard deviation |Σ|1/2 = q λ1λ2 · · · λp • Total variance trace(Σ) = σ11 + σ22 + · · · + σpp = λ1 + λ2 + · · · + λp 130 Sample Estimates: Air Samples " Xi = " X1 = 7 12 # " X2 = Xi1 ← CO concentration Xi2 ← N2O concentration 4 9 # " X3 = 4 5 # " X4 = # 5 8 # " X5 = 4 8 131 # Sample Estimates: Air Samples • Sample mean vector: " X̄ = 4.8 8.4 # • Sample covariance matrix: " S= 1.7 2.6 2.6 6.3 # • Sample correlation: r12=0.7945 • Generalized variance: |S| = 3.95 • Total variance; trace(S) = 1.7 + 6.3 = 8.0 132 Example 3.8 " S= 5 4 4 5 # r12 = 0.8 |S| = 9 tr(S) = 10 " S= 5 −4 −4 5 # " S= 3 0 0 3 r12 = −0.8 r12 = 0 |S| = 9 |S| = 9 tr(S) = 10 # tr(S) = 6 133