Statistics 6545
Multivariate Statistical Methods
2 Matrix Algebra and Random Vectors

2.1 Introduction
The study of multivariate methods, and statistical methods in
general, is greatly facilitated by the use of matrix algebra
2.2 Some Basics of Matrix Algebra and Random Vectors
Data in a multivariate analysis can be represented as a matrix


X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p}\\
\vdots & \vdots &        & \vdots &        & \vdots\\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp}\\
\vdots & \vdots &        & \vdots &        & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
Many calculations in multivariate analysis are best performed with
matrices and vectors
2.2.1 Vectors
An array x of n real numbers x1, . . . , xn is called a vector and is
written as


x = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}
\qquad\text{or}\qquad
x' = [x_1, x_2, \cdots, x_n]
A vector can be represented geometrically as a directed line in n
dimensions
A vector can be expanded or contracted by multiplying it by a
constant c


cx = \begin{bmatrix}cx_1\\ cx_2\\ \vdots\\ cx_n\end{bmatrix}
Example: Let x = [0, 1]′ then 4x = [4 (0) , 4 (1)]′ = [0, 4]′
Two vectors may be added




x + y = \begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} + \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix}
= \begin{bmatrix}x_1 + y_1\\ x_2 + y_2\\ \vdots\\ x_n + y_n\end{bmatrix}
Example: If x = [0, 1]′ and y = [1, 1]′ then x + y = [1, 2]′
A vector has both direction and length
The length of a vector is defined as
L_x = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}
For n = 2 the length of x can be viewed as the hypotenuse of a
right triangle
Note that

L_{cx} = \sqrt{c^2x_1^2 + c^2x_2^2 + \cdots + c^2x_n^2} = \sqrt{c^2}\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = |c|\,L_x

Example: If x = [0, 1]′ then L_x = \sqrt{0^2 + 1^2} = 1
Another important concept is that of the angle between two vectors x and y
Suppose that n = 2 and that the angles associated with x = [x_1, x_2]′ and y = [y_1, y_2]′ are θ1 and θ2 respectively, with θ1 < θ2, so that the angle between x and y is θ2 − θ1
We have that

\cos(\theta_1) = \frac{x_1}{L_x},\quad \sin(\theta_1) = \frac{x_2}{L_x},\qquad
\cos(\theta_2) = \frac{y_1}{L_y},\quad \sin(\theta_2) = \frac{y_2}{L_y}

so

\cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1)
= \frac{y_1}{L_y}\frac{x_1}{L_x} + \frac{y_2}{L_y}\frac{x_2}{L_x}
= \frac{y_1x_1 + y_2x_2}{L_xL_y}
Here

x'y = y_1x_1 + y_2x_2

is the inner product of x and y

Note that

L_x = \sqrt{x'x}

so we have

\cos(\theta) = \frac{y_1x_1 + y_2x_2}{L_xL_y} = \frac{x'y}{\sqrt{x'x}\sqrt{y'y}}

Example: For x = [0, 1]′ and y = [1, 1]′

\cos(\theta) = \frac{(1)(0) + (1)(1)}{1\cdot\sqrt{2}} = \frac{1}{\sqrt{2}}
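These length, inner-product, and angle calculations are easy to check numerically. The following is a minimal NumPy sketch (illustrative code, not part of the notes) reproducing the example x = [0, 1]′, y = [1, 1]′:

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 1.0])

Lx = np.sqrt(x @ x)           # length of x, sqrt(x'x)
Ly = np.sqrt(y @ y)           # length of y
inner = x @ y                 # inner product x'y

cos_theta = inner / (Lx * Ly)
print(Lx, Ly, inner)          # 1.0 1.414... 1.0
print(cos_theta)              # 0.7071... = 1/sqrt(2)
print(np.degrees(np.arccos(cos_theta)))  # 45 degrees
```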
Vectors x and y are perpendicular (orthogonal) when x′y = 0

Example: If x = [0, 1]′ and y = [1, 0]′ then x′y = 0
All of this holds for vectors of length n
A set of vectors x1, x2, . . . , xk is linearly dependent if there exist
constants c1, c2, . . . , ck, at least one of which is nonzero, such that
c1x1 + · · · + ck xk = 0
and 0 represents the vector of all zeroes
When this is the case then at least one of the vectors can be written
as a linear combination of the others
If all the ci must be zero for
c1x1 + · · · + ck xk = 0
then the set of vectors is linearly independent
Example: Are the vectors

x_1 = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\quad
x_2 = \begin{bmatrix}1\\ 1\\ 0\end{bmatrix},\quad
x_3 = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}

linearly dependent?
The space of all real m-tuples with scalar multiplication and vector
addition is called a vector space

Note that this is simply m-dimensional Euclidean space R^m
Rm is simply the linear span of its basis vectors (the set of all
linear combinations of the basis vectors)
Example:

R^3 = \left\{x : x = c_1\begin{bmatrix}1\\ 0\\ 0\end{bmatrix} + c_2\begin{bmatrix}0\\ 1\\ 0\end{bmatrix} + c_3\begin{bmatrix}0\\ 0\\ 1\end{bmatrix}\right\}
Any set of m linearly independent vectors is called a basis for the
vector space of all m-tuples of real numbers
Example: A basis for R^3 is

\left\{\begin{bmatrix}1\\ 0\\ 0\end{bmatrix}, \begin{bmatrix}0\\ 1\\ 0\end{bmatrix}, \begin{bmatrix}0\\ 0\\ 1\end{bmatrix}\right\}
Every vector in Rm can be represented as a linear combination of
a fixed basis
The orthogonal projection ("shadow") of a vector x on y is

\text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\,y = \frac{x'y}{L_y}\,\frac{y}{L_y}

where y/L_y is a unit vector that describes the direction of the projection and

\frac{x'y}{L_y} = L_x\,\frac{x'y}{L_yL_x} = L_x\cos(\theta)
The length of the projection is
|Lx cos (θ)| = Lx |cos (θ)|
Example: For x = [0, 1]′ and y = [1, 1]′

\text{Projection of } x \text{ on } y = \frac{1}{2}\,y = [1/2, 1/2]'
Note that we can also define a projection matrix
\text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\,y = y\,\frac{y'x}{y'y} = y(y'y)^{-1}y'x

So then

y(y'y)^{-1}y' = \frac{1}{2}\begin{bmatrix}1\\ 1\end{bmatrix}[1, 1] = \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}

and

\text{Projection of } x \text{ on } y = \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}\begin{bmatrix}0\\ 1\end{bmatrix} = \begin{bmatrix}\tfrac12\\ \tfrac12\end{bmatrix}
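A short NumPy sketch (illustrative, not from the slides) of the same projection, both as (x′y/y′y)y and through the projection matrix y(y′y)⁻¹y′:

```python
import numpy as np

x = np.array([0.0, 1.0])
y = np.array([1.0, 1.0])

# projection as a scalar multiple of y: (x'y / y'y) y
proj = (x @ y) / (y @ y) * y

# the same projection via the projection matrix y (y'y)^{-1} y'
P = np.outer(y, y) / (y @ y)

print(proj)   # [0.5 0.5]
print(P)      # [[0.5 0.5] [0.5 0.5]]
print(P @ x)  # [0.5 0.5], identical to proj
```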
If y1, y2, . . . , yr are mutually orthogonal then the projection of
vector x on the linear span of y1, y2, . . . , yr is given as
\frac{x'y_1}{y_1'y_1}\,y_1 + \frac{x'y_2}{y_2'y_2}\,y_2 + \cdots + \frac{x'y_r}{y_r'y_r}\,y_r
Gram-Schmidt Process: Given linearly independent vectors
x1, x2, . . . , xk
there exist mutually orthogonal vectors
u1, u2, . . . , uk
with the same linear span.
u_1 = x_1

u_2 = x_2 - \frac{x_2'u_1}{u_1'u_1}\,u_1

\vdots

u_k = x_k - \frac{x_k'u_1}{u_1'u_1}\,u_1 - \cdots - \frac{x_k'u_{k-1}}{u_{k-1}'u_{k-1}}\,u_{k-1}
Example: For

x_1 = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix},\quad
x_2 = \begin{bmatrix}1\\ 1\\ 0\end{bmatrix},\quad
x_3 = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}

we have

u_1 = x_1 = \begin{bmatrix}1\\ 1\\ 1\end{bmatrix}
u_2 = x_2 - \frac{x_2'u_1}{u_1'u_1}\,u_1
= \begin{bmatrix}1\\ 1\\ 0\end{bmatrix} - \frac{2}{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}
= \begin{bmatrix}\tfrac13\\ \tfrac13\\ -\tfrac23\end{bmatrix}
u_3 = x_3 - \frac{x_3'u_1}{u_1'u_1}\,u_1 - \frac{x_3'u_2}{u_2'u_2}\,u_2
= \begin{bmatrix}1\\ 0\\ 0\end{bmatrix} - \frac{1}{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix} - \frac{1/3}{6/9}\begin{bmatrix}\tfrac13\\ \tfrac13\\ -\tfrac23\end{bmatrix}
= \begin{bmatrix}\tfrac12\\ -\tfrac12\\ 0\end{bmatrix}
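The Gram-Schmidt recursion translates directly into code. Below is a minimal NumPy sketch (the function name and structure are illustrative assumptions, not from the notes) applied to x1, x2, x3 from the example:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors:
    u_k = x_k - sum_{j<k} (x_k'u_j / u_j'u_j) u_j."""
    us = []
    for x in vectors:
        u = x.astype(float)
        for uj in us:
            u = u - (x @ uj) / (uj @ uj) * uj
        us.append(u)
    return us

x1 = np.array([1, 1, 1])
x2 = np.array([1, 1, 0])
x3 = np.array([1, 0, 0])

u1, u2, u3 = gram_schmidt([x1, x2, x3])
print(u1)                          # [1. 1. 1.]
print(u2)                          # [ 0.333  0.333 -0.667]
print(u3)                          # [ 0.5 -0.5  0. ]
print(u1 @ u2, u1 @ u3, u2 @ u3)   # all zero (up to round-off)
```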
2.2.2 Matrices
A matrix is a rectangular array of numbers with n rows and p columns, for instance,

\underset{(n\times p)}{A} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{bmatrix}
The transpose of a matrix A changes its columns into rows and is denoted by A′
The product of a constant c and matrix A is

c\underset{(n\times p)}{A} = \begin{bmatrix}
ca_{11} & ca_{12} & \cdots & ca_{1p}\\
ca_{21} & ca_{22} & \cdots & ca_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
ca_{n1} & ca_{n2} & \cdots & ca_{np}
\end{bmatrix}
If matrices A and B have the same dimensions then
A+B
is defined and has (i, j)th entry
aij + bij
When A is (n × k) and B is (k × p) then the product
AB
is defined with (i, j)th entry that is the inner product of the ith
row of A and jth column of B;
a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj} = \sum_{l=1}^{k} a_{il}b_{lj}
A square matrix (n = p) is symmetric if

A = A'
A square matrix A has inverse B if
AB = I = BA
where

I = \begin{bmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & \ddots & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
is the identity matrix
Example: Let


A = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0\end{bmatrix},\qquad
B = A^{-1} = \begin{bmatrix}0 & 0 & 1\\ 0 & 1 & -1\\ 1 & -1 & 0\end{bmatrix}

AB = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0\end{bmatrix}\begin{bmatrix}0 & 0 & 1\\ 0 & 1 & -1\\ 1 & -1 & 0\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}

BA = \begin{bmatrix}0 & 0 & 1\\ 0 & 1 & -1\\ 1 & -1 & 0\end{bmatrix}\begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0\end{bmatrix} = \begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}
The determinant of the square k × k matrix A is the scalar given as

|A| = a_{11}\quad\text{if } k = 1

|A| = \begin{vmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}\quad\text{if } k = 2

and more generally

|A| = \sum_{j=1}^{k} a_{ij}\,|A_{ij}|\,(-1)^{i+j}

where A_{ij} is the (k − 1) × (k − 1) matrix obtained by deleting the ith row and the jth column of A
Example:

\begin{vmatrix}1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0\end{vmatrix}
= 1\begin{vmatrix}1 & 0\\ 0 & 0\end{vmatrix} - 1\begin{vmatrix}1 & 0\\ 1 & 0\end{vmatrix} + 1\begin{vmatrix}1 & 1\\ 1 & 0\end{vmatrix}
= 0 - 0 - 1 = -1
The inverse of any 2 × 2 invertible matrix

A = \begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix}

is

A^{-1} = \frac{1}{|A|}\begin{bmatrix}a_{22} & -a_{12}\\ -a_{21} & a_{11}\end{bmatrix}
The inverse of any 3 × 3 invertible matrix

A = \begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}
is

A^{-1} = \frac{1}{|A|}\begin{bmatrix}
+\begin{vmatrix}a_{22} & a_{23}\\ a_{32} & a_{33}\end{vmatrix} &
-\begin{vmatrix}a_{12} & a_{13}\\ a_{32} & a_{33}\end{vmatrix} &
+\begin{vmatrix}a_{12} & a_{13}\\ a_{22} & a_{23}\end{vmatrix}\\[4pt]
-\begin{vmatrix}a_{21} & a_{23}\\ a_{31} & a_{33}\end{vmatrix} &
+\begin{vmatrix}a_{11} & a_{13}\\ a_{31} & a_{33}\end{vmatrix} &
-\begin{vmatrix}a_{11} & a_{13}\\ a_{21} & a_{23}\end{vmatrix}\\[4pt]
+\begin{vmatrix}a_{21} & a_{22}\\ a_{31} & a_{32}\end{vmatrix} &
-\begin{vmatrix}a_{11} & a_{12}\\ a_{31} & a_{32}\end{vmatrix} &
+\begin{vmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{vmatrix}
\end{bmatrix}
In general the (j, i)th entry of A^{-1} is

\frac{(-1)^{i+j}|A_{ij}|}{|A|}
Example: For

A = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 0\\ 1 & 0 & 0\end{bmatrix}

A^{-1} = \frac{1}{-1}\begin{bmatrix}
+\begin{vmatrix}1 & 0\\ 0 & 0\end{vmatrix} &
-\begin{vmatrix}1 & 1\\ 0 & 0\end{vmatrix} &
+\begin{vmatrix}1 & 1\\ 1 & 0\end{vmatrix}\\[4pt]
-\begin{vmatrix}1 & 0\\ 1 & 0\end{vmatrix} &
+\begin{vmatrix}1 & 1\\ 1 & 0\end{vmatrix} &
-\begin{vmatrix}1 & 1\\ 1 & 0\end{vmatrix}\\[4pt]
+\begin{vmatrix}1 & 1\\ 1 & 0\end{vmatrix} &
-\begin{vmatrix}1 & 1\\ 1 & 1\end{vmatrix} &
+\begin{vmatrix}1 & 1\\ 1 & 1\end{vmatrix}
\end{bmatrix}
= \begin{bmatrix}0 & 0 & 1\\ 0 & 1 & -1\\ 1 & -1 & 0\end{bmatrix}
The inverse matrix B = A^{-1} exists if the k columns a_1, . . . , a_k of A = [a_1 \cdots a_k] are linearly independent
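As a numerical check of the 3 × 3 example above, here is a minimal NumPy sketch (illustrative only, not part of the notes):

```python
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 1., 0.],
              [1., 0., 0.]])

print(np.linalg.det(A))      # -1.0
print(np.linalg.inv(A))      # [[0 0 1] [0 1 -1] [1 -1 0]]
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))  # True
```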
It is very easy to find the inverse of a diagonal matrix, for instance

−1 

0
0
a11 0 0 0
1/a11 0
 0 a22 0 0 

0 

 =  0 1/a22 0

 0 0 a33 0 
 0
0 1/a33 0 
0 0 0 a44
0
0
0 1/a44
Orthogonal matrices are a class of matrices for which it is also easy
to find an inverse;
QQ' = Q'Q = I

so that

Q^{-1} = Q'
Note that

Q'Q = I

implies that for Q = [q_1 \cdots q_k]

q_i'q_j = \begin{cases}0 & \text{if } i \neq j\\ 1 & \text{if } i = j\end{cases}
The columns of Q are mutually orthogonal and have unit length
Since QQ' = I, the rows of Q have the same property
Example: Consider the counterclockwise rotation matrix seen in Chapter 1

A = \begin{bmatrix}\cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta)\end{bmatrix}

A^{-1} = A' = \begin{bmatrix}\cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\end{bmatrix}
This is the clockwise rotation matrix
AA^{-1} = \begin{bmatrix}\cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta)\end{bmatrix}\begin{bmatrix}\cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta)\end{bmatrix}
= \begin{bmatrix}\cos^2\theta + \sin^2\theta & 0\\ 0 & \cos^2\theta + \sin^2\theta\end{bmatrix}
= \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} = A^{-1}A
For a square matrix A of dimension k × k the following are equivalent
1. Ax = 0 implies that x = 0 (meaning that A is nonsingular)
2. |A| ≠ 0
3. There exists a matrix A^{-1} such that AA^{-1} = A^{-1}A = I
Let A and B be k × k matrices and assume that their inverses
exist, then
1. (A^{-1})' = (A')^{-1}
2. (AB)^{-1} = B^{-1}A^{-1}
3. |A| = |A'|
4. If each element of a row (column) of A is zero then |A| = 0
5. If any two rows (columns) of A are identical then |A| = 0
6. If A is nonsingular then |A| = 1/|A^{-1}| so that |A||A^{-1}| = 1
7. |AB| = |A||B|
8. |cA| = c^k|A| for scalar c
Let A = {a_{ij}} be a k × k matrix, then the trace of A is

\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}
Let A and B be k × k matrices and c be a scalar
1. tr(cA) = c tr(A)
2. tr(A ± B) = tr(A) ± tr(B)
3. tr(AB) = tr(BA)
4. tr(B^{-1}AB) = tr(A)
5. tr(AA') = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}^2
A square matrix A is said to have an eigenvalue λ, with corresponding eigenvector x ≠ 0, if

Ax = λx
Usually the eigenvector x is normalized so that x′x = 1, in which case we denote x as e
The k eigenvalues of a k × k matrix A satisfy the polynomial
equation
|A − λI| = 0
and as such are sometimes referred to as the characteristic roots
of A
Example:

A = \begin{bmatrix}3 & -2\\ 4 & -1\end{bmatrix}

|A - \lambda I| = \begin{vmatrix}3-\lambda & -2\\ 4 & -1-\lambda\end{vmatrix} = \lambda^2 - 2\lambda + 5 = 0

has solutions

\lambda = 1 \pm 2i

The associated eigenvectors are

\begin{bmatrix}\tfrac12 \pm \tfrac12 i\\ 1\end{bmatrix}
Check:

\begin{bmatrix}3 & -2\\ 4 & -1\end{bmatrix}\begin{bmatrix}\tfrac12 - \tfrac12 i\\ 1\end{bmatrix}
= \begin{bmatrix}-\tfrac12 - \tfrac32 i\\ 1 - 2i\end{bmatrix}
= (1 - 2i)\begin{bmatrix}\tfrac12 - \tfrac12 i\\ 1\end{bmatrix}

\begin{bmatrix}3 & -2\\ 4 & -1\end{bmatrix}\begin{bmatrix}\tfrac12 + \tfrac12 i\\ 1\end{bmatrix}
= \begin{bmatrix}-\tfrac12 + \tfrac32 i\\ 1 + 2i\end{bmatrix}
= (1 + 2i)\begin{bmatrix}\tfrac12 + \tfrac12 i\\ 1\end{bmatrix}
Result: Let A be a k × k square symmetric matrix. Then A has
k pairs of real eigenvalues and eigenvectors;
λ1, e1, λ2, e2 · · · λk , ek
where the eigenvectors have unit length, are mutually orthogonal
and unique unless two or more of the eigenvalues are equal
Example: Changing the previous matrix to make it symmetric...
A = \begin{bmatrix}3 & 4\\ 4 & -1\end{bmatrix}

The eigenvalues are

\lambda = 1 \pm 2\sqrt{5}

The associated eigenvectors are

\begin{bmatrix}\tfrac12 \pm \tfrac12\sqrt{5}\\ 1\end{bmatrix}
Note that for non-symmetric matrices the eigenvectors need not be orthogonal, but eigenvectors associated with distinct eigenvalues will always be linearly independent
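Both eigenvalue examples can be reproduced with NumPy. The sketch below (illustrative, not from the notes) uses np.linalg.eig for the non-symmetric matrix and np.linalg.eigh, which returns orthonormal eigenvectors, for the symmetric one:

```python
import numpy as np

# non-symmetric: complex eigenvalues 1 +/- 2i
A = np.array([[3., -2.],
              [4., -1.]])
vals, vecs = np.linalg.eig(A)
print(vals)                # [1.+2.j 1.-2.j]

# symmetric: real eigenvalues 1 +/- 2*sqrt(5), orthonormal eigenvectors
S = np.array([[3., 4.],
              [4., -1.]])
lam, E = np.linalg.eigh(S)
print(lam)                 # [-3.472  5.472], i.e. 1 - 2*sqrt(5) and 1 + 2*sqrt(5)
print(np.allclose(E.T @ E, np.eye(2)))  # True: columns are orthonormal
```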
Singular Value Decomposition (SVD): Let A be an m × k matrix of real numbers, then there exist an m × m orthogonal matrix U and a k × k orthogonal matrix V such that

A = U\Lambda V'

where the m × k matrix Λ has (i, i)th entry λ_i ≥ 0 for i = 1, 2, . . . , min(m, k) and the other entries are zero. The λ_i are called the singular values of A.
When A is of rank r (number of linearly independent columns in
A is r) then one may write A as
A = \sum_{i=1}^{r} \lambda_i u_i v_i' = U_r \Lambda_r V_r'
where
Ur = [u1, u2, . . . , ur ]
Vr = [v1, v2, . . . , vr ]
both have orthogonal columns and Λr is the diagonal matrix with
diagonal entries λi
It can be shown that (homework)

AA'u_i = \lambda_i^2 u_i,\qquad A'Av_i = \lambda_i^2 v_i
where

\lambda_1^2, \lambda_2^2, \ldots, \lambda_r^2 > 0

and

\lambda_{r+1}^2 = \lambda_{r+2}^2 = \cdots = \lambda_m^2 = 0

It also follows that

v_i = \lambda_i^{-1} A'u_i,\qquad u_i = \lambda_i^{-1} A v_i
Example:

A = \begin{bmatrix}1 & 1 & 1 & 1\\ 1 & 1 & 0 & 2\\ 1 & 0 & 0 & 3\end{bmatrix}



A = \begin{bmatrix}0.381 & 0.812 & 0.440\\ 0.570 & 0.167 & -0.803\\ 0.727 & -0.558 & 0.399\end{bmatrix}
\begin{bmatrix}4.194 & 0 & 0 & 0\\ 0 & 1.441 & 0 & 0\\ 0 & 0 & 0.573 & 0\end{bmatrix}
\begin{bmatrix}0.400 & 0.227 & 9.102\times 10^{-2} & 0.883\\ 0.292 & 0.680 & 0.563 & -0.365\\ 6.308\times 10^{-2} & -0.634 & 0.768 & 5.533\times 10^{-2}\\ 0.866 & -0.288 & -0.288 & -0.288\end{bmatrix}
Let A be an m × k matrix of real numbers with m ≥ k, with SVD

A = U\Lambda V'

and let s < k = rank(A). Then

B = \sum_{i=1}^{s} \lambda_i u_i v_i'

is the rank-s least squares approximation in the sense that it minimizes

\mathrm{tr}\left[(A - B)(A - B)'\right] = \sum_{i=1}^{m}\sum_{j=1}^{k} (a_{ij} - b_{ij})^2
over all m × k matrices B having rank no greater than s
Example: For

A = \begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\end{bmatrix}
The SVD is

\begin{bmatrix}0.32311 & 0.85378 & 0.40825\\ 0.54751 & 0.18322 & -0.81650\\ 0.77190 & -0.48734 & 0.40825\end{bmatrix}
\begin{bmatrix}4.0791 & 0\\ 0 & 0.60049\\ 0 & 0\end{bmatrix}
\begin{bmatrix}0.40266 & 0.91535\\ 0.91535 & -0.40266\end{bmatrix}
The rank-2 approximation is


\begin{bmatrix}0.32311 & 0.85378\\ 0.54751 & 0.18322\\ 0.77190 & -0.48734\end{bmatrix}
\begin{bmatrix}4.0791 & 0\\ 0 & 0.60049\end{bmatrix}
\begin{bmatrix}0.40266 & 0.91535\\ 0.91535 & -0.40266\end{bmatrix}
= \begin{bmatrix}0.99999 & 0.99999\\ 0.99999 & 2.0000\\ 0.99997 & 3.0000\end{bmatrix}
\simeq \begin{bmatrix}1 & 1\\ 1 & 2\\ 1 & 3\end{bmatrix}
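A minimal NumPy sketch (illustrative, not from the slides) that reproduces this SVD and the low-rank approximation; since rank(A) = 2, the rank-2 reconstruction recovers A up to round-off:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD
print(s)                                           # [4.079 0.600]

def rank_approx(U, s, Vt, r):
    """Keep the first r singular values/vectors: sum_i lambda_i u_i v_i'."""
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

print(rank_approx(U, s, Vt, 2))   # recovers A
print(rank_approx(U, s, Vt, 1))   # best rank-1 least squares approximation of A
```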
Homework 2.1
2.3 Positive Definite Matrices
The spectral decomposition of a k × k symmetric matrix A is given by

A = \lambda_1 e_1e_1' + \lambda_2 e_2e_2' + \cdots + \lambda_k e_ke_k'
Example:

A = \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}
|A - \lambda I| = \begin{vmatrix}\tfrac12 - \lambda & \tfrac12\\ \tfrac12 & \tfrac12 - \lambda\end{vmatrix} = \lambda^2 - \lambda = 0

The eigenvalues are λ = 0, 1
Let’s find the eigenvectors by hand
First the one associated with λ = 0
\begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}\begin{bmatrix}e_{11}\\ e_{12}\end{bmatrix} = 0\begin{bmatrix}e_{11}\\ e_{12}\end{bmatrix}

We see that e_{11} = -e_{12} so we have

e_1 = \begin{bmatrix}-\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}
Now the eigenvector associated with λ = 1
\begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}\begin{bmatrix}e_{21}\\ e_{22}\end{bmatrix} = 1\begin{bmatrix}e_{21}\\ e_{22}\end{bmatrix}

e_{21} + e_{22} = 2e_{21}
e_{21} + e_{22} = 2e_{22}
e_{22} = e_{21}

e_2 = \begin{bmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}
Check:

\lambda_1 e_1e_1' + \lambda_2 e_2e_2'
= 0\begin{bmatrix}-\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}\begin{bmatrix}-\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}'
+ 1\begin{bmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}\begin{bmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}'
= \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}
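The same check in NumPy (an illustrative sketch, not from the notes): np.linalg.eigh gives the eigenvalues and orthonormal eigenvectors, and summing λᵢeᵢeᵢ′ rebuilds A:

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])

lam, E = np.linalg.eigh(A)    # columns of E are orthonormal eigenvectors
print(lam)                    # [0. 1.]

# spectral decomposition: A = sum_i lam_i * e_i e_i'
A_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(len(lam)))
print(A_rebuilt)                   # [[0.5 0.5] [0.5 0.5]]
print(np.allclose(A, A_rebuilt))   # True
```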
The spectral decomposition is a very useful tool for studying quadratic
forms
A quadratic form associated with a symmetric matrix A is

x'Ax = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij} x_i x_j
When

x'Ax \ge 0

for all x then A is said to be nonnegative definite

If

x'Ax > 0

for all x ≠ 0 then A is said to be positive definite
The spectral decomposition can be used to show that a k × k symmetric matrix A is positive definite if and only if every eigenvalue of A is positive (homework)
A k × k symmetric matrix A is nonnegative definite if and only if every eigenvalue of A is greater than or equal to zero
Note that

x'Ax = x'\left(\lambda_1 e_1e_1' + \lambda_2 e_2e_2' + \cdots + \lambda_k e_ke_k'\right)x
= \lambda_1 x'e_1e_1'x + \lambda_2 x'e_2e_2'x + \cdots + \lambda_k x'e_ke_k'x
= \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_k y_k^2

where the scalar y_i is given as

y_i = e_i'x

for i = 1, 2, . . . , k
Note that

y = \begin{bmatrix}y_1\\ \vdots\\ y_k\end{bmatrix} = Ex = \begin{bmatrix}e_1'\\ \vdots\\ e_k'\end{bmatrix}x

so that if

x = E'y \neq 0

then

y = Ex \neq 0
Recall from Chapter 1 the formula for the statistical distance from P = (x_1, x_2, \ldots, x_p) to O = (0, 0, \ldots, 0):

d(O, P) = \sqrt{a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p}

Now

d^2(O, P) = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{pp}x_p^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + \cdots + 2a_{p-1,p}x_{p-1}x_p
It turns out that
aij = aji
for all i and j so that

d^2(O, P) = [x_1, x_2, \ldots, x_p]\begin{bmatrix}a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ a_{p1} & a_{p2} & \cdots & a_{pp}\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ \vdots\\ x_p\end{bmatrix}
We know that since d^2(O, P) is a squared distance, for x ≠ 0

d^2(O, P) = x'Ax > 0

Note that a positive quadratic form can be interpreted as a squared distance
Note that for p = 2 the points of constant distance c from the origin satisfy

x'Ax = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c^2

and by the spectral decomposition

x'Ax = \lambda_1 y_1^2 + \lambda_2 y_2^2
so that

\lambda_1 y_1^2 + \lambda_2 y_2^2 = \lambda_1 (x'e_1)^2 + \lambda_2 (x'e_2)^2 = c^2

Since λ_1, λ_2 > 0,

\lambda_1 y_1^2 + \lambda_2 y_2^2 = c^2

is an ellipse in

y_1 = x'e_1,\qquad y_2 = x'e_2
Note that at x = c\lambda_1^{-1/2} e_1

x'Ax = \lambda_1\left(c\lambda_1^{-1/2} e_1'e_1\right)^2 = c^2

and at x = c\lambda_2^{-1/2} e_2

x'Ax = \lambda_2\left(c\lambda_2^{-1/2} e_2'e_2\right)^2 = c^2

2.4 A Square-Root Matrix
We know that the spectral decomposition of a k × k symmetric
positive definite matrix A is given by
A = \lambda_1 e_1e_1' + \lambda_2 e_2e_2' + \cdots + \lambda_k e_ke_k'
which may be rewritten as

\underset{(k\times k)}{A} = \underset{(k\times k)}{P}\ \underset{(k\times k)}{\Lambda}\ \underset{(k\times k)}{P'}

where

P = [e_1, e_2, \ldots, e_k],\qquad P'P = PP' = I

\underset{(k\times k)}{\Lambda} = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_k\end{bmatrix}

and λ_i > 0 for all i
Therefore

A^{-1} = P\Lambda^{-1}P' = \frac{1}{\lambda_1}e_1e_1' + \frac{1}{\lambda_2}e_2e_2' + \cdots + \frac{1}{\lambda_k}e_ke_k'
since

\left(P\Lambda^{-1}P'\right)\left(P\Lambda P'\right) = \left(P\Lambda P'\right)\left(P\Lambda^{-1}P'\right) = PP' = I
The square-root of a positive definite matrix A is given as

A^{1/2} = \sqrt{\lambda_1}\,e_1e_1' + \sqrt{\lambda_2}\,e_2e_2' + \cdots + \sqrt{\lambda_k}\,e_ke_k' = P\Lambda^{1/2}P'
Note that A1/2 can similarly be defined for non-negative definite
matrices
This matrix has the following properties:
1. \left(A^{1/2}\right)' = A^{1/2}
2. A^{1/2}A^{1/2} = A
3. \left(A^{1/2}\right)^{-1} = \frac{1}{\sqrt{\lambda_1}}e_1e_1' + \frac{1}{\sqrt{\lambda_2}}e_2e_2' + \cdots + \frac{1}{\sqrt{\lambda_k}}e_ke_k' = P\Lambda^{-1/2}P'
4. A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I and A^{-1/2}A^{-1/2} = A^{-1}, where A^{-1/2} = \left(A^{1/2}\right)^{-1}
Example: For

A = \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix}

we have λ = 0, 1
e_1 = \begin{bmatrix}-\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix},\qquad
e_2 = \begin{bmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}
Now

A^{1/2} = \sqrt{\lambda_1}\,e_1e_1' + \sqrt{\lambda_2}\,e_2e_2'
= \sqrt{0}\begin{bmatrix}-\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}\begin{bmatrix}-\tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\end{bmatrix}
+ \sqrt{1}\begin{bmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\end{bmatrix}\begin{bmatrix}\tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\end{bmatrix}
= \begin{bmatrix}\tfrac12 & \tfrac12\\ \tfrac12 & \tfrac12\end{bmatrix} = A
Why?
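The square-root matrix is just as easy to form numerically. A minimal NumPy sketch (illustrative; it assumes A is nonnegative definite, as in the example):

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])

lam, P = np.linalg.eigh(A)                  # A = P diag(lam) P'
lam = np.clip(lam, 0.0, None)               # guard against tiny negative round-off
A_half = P @ np.diag(np.sqrt(lam)) @ P.T    # A^{1/2} = P Lambda^{1/2} P'

print(A_half)                               # equals A here, since lambda = 0, 1
print(np.allclose(A_half @ A_half, A))      # True: A^{1/2} A^{1/2} = A
```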
2.5 Random Vectors and Matrices
A random matrix (vector) is a matrix (vector) whose elements
consist of random variables
The expectation of a random matrix (vector) is performed in a componentwise fashion, i.e. if

X = \begin{bmatrix}X_{11} & X_{12} & \cdots & X_{1p}\\ X_{21} & X_{22} & \cdots & X_{2p}\\ \vdots & \vdots &  & \vdots\\ X_{n1} & X_{n2} & \cdots & X_{np}\end{bmatrix}

then

E[X] = \begin{bmatrix}E[X_{11}] & E[X_{12}] & \cdots & E[X_{1p}]\\ E[X_{21}] & E[X_{22}] & \cdots & E[X_{2p}]\\ \vdots & \vdots &  & \vdots\\ E[X_{n1}] & E[X_{n2}] & \cdots & E[X_{np}]\end{bmatrix}

where

E[X_{ij}] = \begin{cases}\displaystyle\int_{-\infty}^{\infty} x_{ij}\,f(x_{ij})\,dx_{ij} & \text{if } X_{ij} \text{ is continuous}\\[6pt] \displaystyle\sum_{x_{ij}} x_{ij}\,f(x_{ij}) & \text{if } X_{ij} \text{ is discrete}\end{cases}
Expectation is linear
E [Xij + Yij ] = E [Xij ] + E [Yij ]
E [cXij ] = cE [Xij ]
Then it can be shown for random matrices X and Y (homework)
and constant matrices A and B that
E [X + Y ] = E [X] + E [Y ]
E [AXB] = AE [X] B
Example: The joint and marginal distributions of X1 and X2 are
given below
x1\x2      0     1     p1(x1)
0         .2    .4     .6
1         .1    .3     .4
p2(x2)    .3    .7      1
E(X_1) = \sum_{x_1} x_1 p_1(x_1) = (0)(.6) + 1(.4) = .4

E(X_2) = \sum_{x_2} x_2 p_2(x_2) = (0)(.3) + 1(.7) = .7
E(X) = E\begin{bmatrix}X_1\\ X_2\end{bmatrix} = \begin{bmatrix}E(X_1)\\ E(X_2)\end{bmatrix} = \begin{bmatrix}.4\\ .7\end{bmatrix}
Homework 2.2
2.6 Mean Vectors and Covariance Matrices
Suppose that X = [X_1, \ldots, X_p]' is a random vector

The marginal means µ_i and variances σ_i^2 are defined as

\mu_i = E[X_i],\qquad \sigma_i^2 = E\left[(X_i - \mu_i)^2\right]
The behavior of (Xi, Xj ) is described by their joint probability
distribution
A measure of linear association between them is their covariance
\sigma_{ij} = E\left[(X_i - \mu_i)(X_j - \mu_j)\right]
where

\sigma_{ij} = \begin{cases}
\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x_i - \mu_i)(x_j - \mu_j)\,f_{ij}(x_i, x_j)\,dx_i\,dx_j & \text{if } (X_i, X_j) \text{ are jointly continuous}\\[8pt]
\displaystyle\sum_{x_i}\sum_{x_j}(x_i - \mu_i)(x_j - \mu_j)\,f_{ij}(x_i, x_j) & \text{if } (X_i, X_j) \text{ are jointly discrete}
\end{cases}
Note that
σii = σi2
σij is also denoted as
Cov (Xi, Xj )
Continuous random variables Xi and Xj are (statistically) independent if
fij (xi, xj ) = fi (xi) fj (xj )
Continuous random variables X1, . . . , Xp are mutually (statistically) independent if
f12···p (x1, x2, . . . , xp) = f1 (x1) f2 (x2) · · · fp (xp)
Note that
Cov (Xi, Xj ) = 0
if Xi and Xj are independent
The means and covariances of X = [X_1, X_2, \ldots, X_p]' can be represented in matrix-vector form:

E(X) = \begin{bmatrix}E(X_1)\\ E(X_2)\\ \vdots\\ E(X_p)\end{bmatrix} = \mu


\mathrm{Cov}(X) = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}\end{bmatrix} = \Sigma
Example: The joint and marginal distributions of X1 and X2 are

x1\x2      0     1     p1(x1)
0         .2    .4     .6
1         .1    .3     .4
p2(x2)    .3    .7      1
Let’s find Σ
\sigma_{11} = E\left[(X_1 - \mu_1)^2\right] = \sum_{x_1}(x_1 - \mu_1)^2 p_1(x_1) = (0 - .4)^2(.6) + (1 - .4)^2(.4) = 0.24
\sigma_{22} = E\left[(X_2 - \mu_2)^2\right] = \sum_{x_2}(x_2 - \mu_2)^2 p_2(x_2) = (0 - .7)^2(.3) + (1 - .7)^2(.7) = 0.21
\sigma_{12} = E\left[(X_1 - \mu_1)(X_2 - \mu_2)\right] = \sum_{x_1, x_2}(x_1 - \mu_1)(x_2 - \mu_2)\,p(x_1, x_2)
= (0 - .4)(0 - .7)(.2) + (0 - .4)(1 - .7)(.4) + (1 - .4)(0 - .7)(.1) + (1 - .4)(1 - .7)(.3)
= 0.02
\Sigma = \begin{bmatrix}0.24 & 0.02\\ 0.02 & 0.21\end{bmatrix}
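The mean vector and covariance matrix from this discrete example can be reproduced by summing over the joint pmf; the following is a minimal NumPy sketch (illustrative, not from the notes):

```python
import numpy as np

# joint pmf from the table: rows index x1 = 0, 1; columns index x2 = 0, 1
p = np.array([[0.2, 0.4],
              [0.1, 0.3]])
x1 = np.array([0.0, 1.0])
x2 = np.array([0.0, 1.0])

mu1 = sum(p[i, j] * x1[i] for i in range(2) for j in range(2))
mu2 = sum(p[i, j] * x2[j] for i in range(2) for j in range(2))

s11 = sum(p[i, j] * (x1[i] - mu1) ** 2 for i in range(2) for j in range(2))
s22 = sum(p[i, j] * (x2[j] - mu2) ** 2 for i in range(2) for j in range(2))
s12 = sum(p[i, j] * (x1[i] - mu1) * (x2[j] - mu2)
          for i in range(2) for j in range(2))

print(mu1, mu2)                              # 0.4 0.7
print(np.array([[s11, s12], [s12, s22]]))    # [[0.24 0.02] [0.02 0.21]]
```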
Note that

\Sigma = E\left[(X - \mu)(X - \mu)'\right]
= E\left[\begin{bmatrix}X_1 - \mu_1\\ \vdots\\ X_p - \mu_p\end{bmatrix}[X_1 - \mu_1, \ldots, X_p - \mu_p]\right]

This simplifies to

E\begin{bmatrix}(X_1 - \mu_1)^2 & \cdots & (X_1 - \mu_1)(X_p - \mu_p)\\ \vdots & \ddots & \vdots\\ (X_p - \mu_p)(X_1 - \mu_1) & \cdots & (X_p - \mu_p)^2\end{bmatrix}

and

\begin{bmatrix}E(X_1 - \mu_1)^2 & \cdots & E(X_1 - \mu_1)(X_p - \mu_p)\\ \vdots & \ddots & \vdots\\ E(X_p - \mu_p)(X_1 - \mu_1) & \cdots & E(X_p - \mu_p)^2\end{bmatrix}
and then

\begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}\end{bmatrix}
The population correlation coefficient ρ_{ij} is

\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{jj}}}

It is a measure of linear association between X_i and X_j
Note that

\rho_{ii} = \frac{\sigma_{ii}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{ii}}} = 1
The population correlation matrix is given as

\rho = \begin{bmatrix}1 & \rho_{12} & \cdots & \rho_{1p}\\ \rho_{21} & 1 & \cdots & \rho_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \rho_{p1} & \rho_{p2} & \cdots & 1\end{bmatrix}
Example: The joint and marginal distributions of X1 and X2 are

x1\x2      0     1     p1(x1)
0         .2    .4     .6
1         .1    .3     .4
p2(x2)    .3    .7      1

with

\Sigma = \begin{bmatrix}0.24 & 0.02\\ 0.02 & 0.21\end{bmatrix}
Let’s find ρ
\rho_{12} = \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} = \frac{0.02}{\sqrt{0.24}\sqrt{0.21}} = 0.089

\rho = \begin{bmatrix}1 & 0.089\\ 0.089 & 1\end{bmatrix}

The standard deviation matrix is given as

V^{1/2} = \begin{bmatrix}\sqrt{\sigma_{11}} & 0 & \cdots & 0\\ 0 & \sqrt{\sigma_{22}} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}}\end{bmatrix}

It turns out that

V^{1/2}\rho V^{1/2} = \Sigma
and

\rho = \left(V^{1/2}\right)^{-1}\Sigma\left(V^{1/2}\right)^{-1}
Example: Consider covariance and correlation matrices

\Sigma = \begin{bmatrix}0.24 & 0.02\\ 0.02 & 0.21\end{bmatrix},\qquad
\rho = \begin{bmatrix}1 & 0.089\\ 0.089 & 1\end{bmatrix}

Now

V^{1/2} = \begin{bmatrix}\sqrt{\sigma_{11}} & 0\\ 0 & \sqrt{\sigma_{22}}\end{bmatrix} = \begin{bmatrix}\sqrt{0.24} & 0\\ 0 & \sqrt{0.21}\end{bmatrix}
and

\left(V^{1/2}\right)^{-1}\Sigma\left(V^{1/2}\right)^{-1}
= \begin{bmatrix}\sqrt{0.24} & 0\\ 0 & \sqrt{0.21}\end{bmatrix}^{-1}
\begin{bmatrix}0.24 & 0.02\\ 0.02 & 0.21\end{bmatrix}
\begin{bmatrix}\sqrt{0.24} & 0\\ 0 & \sqrt{0.21}\end{bmatrix}^{-1}
= \begin{bmatrix}1 & 0.089\\ 0.089 & 1\end{bmatrix} = \rho
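Numerically the two relations are immediate to verify; here is a minimal NumPy sketch (illustrative, not from the notes) recovering ρ from Σ and Σ from ρ:

```python
import numpy as np

Sigma = np.array([[0.24, 0.02],
                  [0.02, 0.21]])

V_half = np.diag(np.sqrt(np.diag(Sigma)))   # standard deviation matrix V^{1/2}
V_half_inv = np.linalg.inv(V_half)

rho = V_half_inv @ Sigma @ V_half_inv       # rho = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1}
print(rho)                                  # [[1.    0.089] [0.089 1.   ]]

print(np.allclose(V_half @ rho @ V_half, Sigma))  # True: V^{1/2} rho V^{1/2} = Sigma
```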
2.6.1 Partitioning the Covariance Matrix
We can partition the p characteristics in X into two groups


X = \begin{bmatrix}X_1\\ \vdots\\ X_q\\ X_{q+1}\\ \vdots\\ X_p\end{bmatrix} = \begin{bmatrix}X^{(1)}\\ X^{(2)}\end{bmatrix}


\mu = E(X) = \begin{bmatrix}\mu_1\\ \vdots\\ \mu_q\\ \mu_{q+1}\\ \vdots\\ \mu_p\end{bmatrix} = \begin{bmatrix}\mu^{(1)}\\ \mu^{(2)}\end{bmatrix}
We determine the matrix of covariances between X^{(1)} and X^{(2)} as

\Sigma_{12} = E\left[\left(X^{(1)} - \mu^{(1)}\right)\left(X^{(2)} - \mu^{(2)}\right)'\right]
= E\left[\begin{bmatrix}X_1 - \mu_1\\ \vdots\\ X_q - \mu_q\end{bmatrix}[X_{q+1} - \mu_{q+1}, \ldots, X_p - \mu_p]\right]
This simplifies to

\begin{bmatrix}E(X_1 - \mu_1)(X_{q+1} - \mu_{q+1}) & \cdots & E(X_1 - \mu_1)(X_p - \mu_p)\\ \vdots & \ddots & \vdots\\ E(X_q - \mu_q)(X_{q+1} - \mu_{q+1}) & \cdots & E(X_q - \mu_q)(X_p - \mu_p)\end{bmatrix}

and

\begin{bmatrix}\sigma_{1,q+1} & \sigma_{1,q+2} & \cdots & \sigma_{1p}\\ \sigma_{2,q+1} & \sigma_{2,q+2} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{q,q+1} & \sigma_{q,q+2} & \cdots & \sigma_{qp}\end{bmatrix} = \Sigma_{12}
Note that

(X - \mu)(X - \mu)' = \begin{bmatrix}X^{(1)} - \mu^{(1)}\\ X^{(2)} - \mu^{(2)}\end{bmatrix}\left[\left(X^{(1)} - \mu^{(1)}\right)'\ \ \left(X^{(2)} - \mu^{(2)}\right)'\right]
= \begin{bmatrix}\left(X^{(1)} - \mu^{(1)}\right)\left(X^{(1)} - \mu^{(1)}\right)' & \left(X^{(1)} - \mu^{(1)}\right)\left(X^{(2)} - \mu^{(2)}\right)'\\ \left(X^{(2)} - \mu^{(2)}\right)\left(X^{(1)} - \mu^{(1)}\right)' & \left(X^{(2)} - \mu^{(2)}\right)\left(X^{(2)} - \mu^{(2)}\right)'\end{bmatrix}
Therefore

\Sigma = E\left[(X - \mu)(X - \mu)'\right]
= E\begin{bmatrix}\left(X^{(1)} - \mu^{(1)}\right)\left(X^{(1)} - \mu^{(1)}\right)' & \left(X^{(1)} - \mu^{(1)}\right)\left(X^{(2)} - \mu^{(2)}\right)'\\ \left(X^{(2)} - \mu^{(2)}\right)\left(X^{(1)} - \mu^{(1)}\right)' & \left(X^{(2)} - \mu^{(2)}\right)\left(X^{(2)} - \mu^{(2)}\right)'\end{bmatrix}
= \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}

where Σ_{11} is q × q, Σ_{12} is q × (p − q), Σ_{21} is (p − q) × q, and Σ_{22} is (p − q) × (p − q)

Note that

\Sigma_{11} = \begin{bmatrix}\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1q}\\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2q}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{q1} & \sigma_{q2} & \cdots & \sigma_{qq}\end{bmatrix}


\Sigma_{22} = \begin{bmatrix}\sigma_{q+1,q+1} & \sigma_{q+1,q+2} & \cdots & \sigma_{q+1,p}\\ \sigma_{q+2,q+1} & \sigma_{q+2,q+2} & \cdots & \sigma_{q+2,p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p,q+1} & \sigma_{p,q+2} & \cdots & \sigma_{pp}\end{bmatrix}

\Sigma_{12} = \begin{bmatrix}\sigma_{1,q+1} & \sigma_{1,q+2} & \cdots & \sigma_{1p}\\ \sigma_{2,q+1} & \sigma_{2,q+2} & \cdots & \sigma_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{q,q+1} & \sigma_{q,q+2} & \cdots & \sigma_{qp}\end{bmatrix}

\Sigma_{21} = \begin{bmatrix}\sigma_{q+1,1} & \sigma_{q+1,2} & \cdots & \sigma_{q+1,q}\\ \sigma_{q+2,1} & \sigma_{q+2,2} & \cdots & \sigma_{q+2,q}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pq}\end{bmatrix}
\Sigma_{12}' = \Sigma_{21}

Note that we sometimes use the notation

\mathrm{Cov}\left(X^{(1)}, X^{(2)}\right) = \Sigma_{12}
2.6.2 The Mean Vector and Covariance Matrix for Linear Combinations of Random Variables
Note that for scalar random variables X_1 and X_2, and constants a, b, and c we have that

E(cX_1) = cE(X_1)

\mathrm{Var}(cX_1) = E\left[(cX_1 - c\mu_1)^2\right] = c^2E\left[(X_1 - \mu_1)^2\right] = c^2\mathrm{Var}(X_1) = c^2\sigma_{11}
\mathrm{Cov}(aX_1, bX_2) = E\left[(aX_1 - a\mu_1)(bX_2 - b\mu_2)\right] = abE\left[(X_1 - \mu_1)(X_2 - \mu_2)\right] = ab\,\mathrm{Cov}(X_1, X_2) = ab\,\sigma_{12}

E[aX_1 + bX_2] = aE[X_1] + bE[X_2] = a\mu_1 + b\mu_2

\mathrm{Var}[aX_1 + bX_2] = E\left[(aX_1 + bX_2) - (a\mu_1 + b\mu_2)\right]^2 = E\left[(aX_1 - a\mu_1) + (bX_2 - b\mu_2)\right]^2
= E\left[a^2(X_1 - \mu_1)^2 + b^2(X_2 - \mu_2)^2 + 2ab(X_1 - \mu_1)(X_2 - \mu_2)\right]
= a^2\mathrm{Var}(X_1) + b^2\mathrm{Var}(X_2) + 2ab\,\mathrm{Cov}(X_1, X_2) = a^2\sigma_{11} + b^2\sigma_{22} + 2ab\,\sigma_{12}
Note that

aX_1 + bX_2 = [a\ \ b]\begin{bmatrix}X_1\\ X_2\end{bmatrix} = c'X

and

E[aX_1 + bX_2] = [a\ \ b]\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} = c'\mu

\mathrm{Var}[aX_1 + bX_2] = [a\ \ b]\begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix} = c'\Sigma c
In general, for linear combination
c′X = c1X1 + · · · + cpXp
we have
E [c′X] = c′µ
V ar [c′X] = c′Σc
where
E [X] = µ
Cov [X] = Σ
More generally for a constant matrix C
E [CX] = Cµ
Cov [CX] = CΣC ′
Example: Let

X = \begin{bmatrix}X_1\\ X_2\end{bmatrix}
be a random vector with mean vector

\mu_X = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix}

and covariance matrix

\Sigma_X = \begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix}

Let's find the mean vector and covariance matrix of

Z_1 = X_1 - X_2,\qquad Z_2 = X_1 + X_2

We have

Z = \begin{bmatrix}Z_1\\ Z_2\end{bmatrix} = \begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}X_1\\ X_2\end{bmatrix} = CX

\mu_Z = C\mu_X = \begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} = \begin{bmatrix}\mu_1 - \mu_2\\ \mu_1 + \mu_2\end{bmatrix}
\Sigma_Z = C\Sigma_X C' = \begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix}\begin{bmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{21} & \sigma_{22}\end{bmatrix}\begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix}
= \begin{bmatrix}\sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22}\\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22}\end{bmatrix}
If σ11 = σ22 then X1 − X2 and X1 + X2 are uncorrelated
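For a concrete check of µZ = CµX and ΣZ = CΣXC′, here is a short NumPy sketch; the particular numbers for µX and ΣX are illustrative assumptions, not from the notes:

```python
import numpy as np

C = np.array([[1., -1.],
              [1.,  1.]])

mu_X = np.array([2.0, 5.0])            # assumed mean vector for X
Sigma_X = np.array([[0.24, 0.02],      # assumed covariance matrix for X
                    [0.02, 0.21]])

mu_Z = C @ mu_X                        # [mu1 - mu2, mu1 + mu2]
Sigma_Z = C @ Sigma_X @ C.T            # covariance matrix of Z = CX

print(mu_Z)      # [-3.  7.]
print(Sigma_Z)   # [[0.41 0.03] [0.03 0.49]]
```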
2.6.3 Partitioning the Sample Mean Vector and Covariance Matrix
Our data in matrix form is

X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p}\\
\vdots & \vdots &        & \vdots &        & \vdots\\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp}\\
\vdots & \vdots &        & \vdots &        & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}

\bar{x} = \begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \vdots\\ \bar{x}_p\end{bmatrix}
\qquad\text{where}\qquad
\bar{x}_k = \frac{1}{n}\sum_{j=1}^{n} x_{jk}

S_n = \begin{bmatrix}s_{11} & s_{12} & \cdots & s_{1p}\\ s_{21} & s_{22} & \cdots & s_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ s_{p1} & s_{p2} & \cdots & s_{pp}\end{bmatrix}
\qquad\text{where}\qquad
s_{ik} = \frac{1}{n}\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
We can of course also partition the sample mean vector


\bar{x} = \begin{bmatrix}\bar{x}_1\\ \vdots\\ \bar{x}_q\\ \bar{x}_{q+1}\\ \vdots\\ \bar{x}_p\end{bmatrix} = \begin{bmatrix}\bar{x}^{(1)}\\ \bar{x}^{(2)}\end{bmatrix}

and the sample covariance matrix

S_n = \begin{bmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}

where

S_{11} = \begin{bmatrix}s_{11} & s_{12} & \cdots & s_{1q}\\ s_{21} & s_{22} & \cdots & s_{2q}\\ \vdots & \vdots & \ddots & \vdots\\ s_{q1} & s_{q2} & \cdots & s_{qq}\end{bmatrix}

S_{22} = \begin{bmatrix}s_{q+1,q+1} & s_{q+1,q+2} & \cdots & s_{q+1,p}\\ s_{q+2,q+1} & s_{q+2,q+2} & \cdots & s_{q+2,p}\\ \vdots & \vdots & \ddots & \vdots\\ s_{p,q+1} & s_{p,q+2} & \cdots & s_{pp}\end{bmatrix}

S_{12} = \begin{bmatrix}s_{1,q+1} & s_{1,q+2} & \cdots & s_{1p}\\ s_{2,q+1} & s_{2,q+2} & \cdots & s_{2p}\\ \vdots & \vdots & \ddots & \vdots\\ s_{q,q+1} & s_{q,q+2} & \cdots & s_{qp}\end{bmatrix}

S_{21} = S_{12}'

2.7 Matrix Inequalities and Maximization
Cauchy-Schwarz Inequality: Let b and d be any two p × 1 vectors, then

(b'd)^2 \le (b'b)(d'd)

with equality holding if and only if b = cd for some scalar constant c
Review (completing the square): The idea is to write

a_1x^2 + a_2x + a_3

in the form

a_1(x - h)^2 + k

Equating coefficients yields

h = \frac{-a_2}{2a_1},\qquad k = a_3 - \frac{a_2^2}{4a_1}
Proof:

Trivially true if b = 0 or d = 0

Assume that b ≠ 0 and d ≠ 0 and consider

b - xd

where x is an arbitrary scalar constant
Now
0 < (b − xd)′ (b − xd) = b′b − 2xb′d + x2d′d
We can complete the square on the right to obtain

a_1 = d'd,\qquad a_2 = -2b'd,\qquad a_3 = b'b

h = \frac{-a_2}{2a_1} = \frac{2b'd}{2d'd} = \frac{b'd}{d'd}

k = a_3 - \frac{a_2^2}{4a_1} = b'b - \frac{(-2b'd)^2}{4d'd} = b'b - \frac{(b'd)^2}{d'd}

so that

0 < b'b - \frac{(b'd)^2}{d'd} + (d'd)\left(x - \frac{b'd}{d'd}\right)^2

If we let

x = \frac{b'd}{d'd}
we obtain

0 < b'b - \frac{(b'd)^2}{d'd}

and

0 < (b'b)(d'd) - (b'd)^2

If b = cd then

(b - cd)'(b - cd) = 0

and we can retrace the previous steps to show that

0 = (b'b)(d'd) - (b'd)^2
Extended Cauchy-Schwarz Inequality: Let b and d be any two p × 1 vectors and B be a p × p positive definite matrix, then

(b'd)^2 \le (b'Bb)\left(d'B^{-1}d\right)
with equality holding if and only if b = cB −1d for some scalar
constant c
Proof:

Trivially true if b = 0 or d = 0

Assume that b ≠ 0 and d ≠ 0
B^{1/2} may be written as

B^{1/2} = \sqrt{\lambda_1}\,e_1e_1' + \sqrt{\lambda_2}\,e_2e_2' + \cdots + \sqrt{\lambda_p}\,e_pe_p'

and B^{-1/2} is

B^{-1/2} = \frac{1}{\sqrt{\lambda_1}}e_1e_1' + \frac{1}{\sqrt{\lambda_2}}e_2e_2' + \cdots + \frac{1}{\sqrt{\lambda_p}}e_pe_p'

Now

b'd = b'B^{1/2}B^{-1/2}d = \left(B^{1/2}b\right)'\left(B^{-1/2}d\right)
Finally we apply the Cauchy-Schwarz Inequality to B 1/2b and
B −1/2d
Maximization Lemma: Let B be a p × p positive definite matrix and d a given p × 1 vector, then

\max_{x \neq 0}\frac{(x'd)^2}{x'Bx} = d'B^{-1}d

where the maximum is attained for

x = cB^{-1}d

for any scalar c ≠ 0
Proof:
By extended Cauchy-Schwarz
(x'd)^2 \le (x'Bx)\left(d'B^{-1}d\right)
Dividing by x'Bx > 0 yields

\frac{(x'd)^2}{x'Bx} \le d'B^{-1}d

so that the maximum must occur for

x = cB^{-1}d

where c ≠ 0
Maximization of Quadratic Forms for Points on the Unit Sphere:
Let B be a p × p positive definite matrix with eigenvalues
λ1 ≥ λ2 ≥ · · · ≥ λp > 0
and associated normalized eigenvectors e1, e2, . . . , ep. Then
\max_{x \neq 0}\frac{x'Bx}{x'x} = \lambda_1
attained when x = e_1;

\min_{x \neq 0}\frac{x'Bx}{x'x} = \lambda_p

attained when x = e_p. Also

\max_{x \perp e_1, e_2, \ldots, e_k}\frac{x'Bx}{x'x} = \lambda_{k+1}

attained when x = e_{k+1} for k = 1, 2, . . . , p − 1
Proof:
Consider the spectral decomposition of B;
B = P ΛP ′
and
B 1/2 = P Λ1/2P ′
Now let y = P ′x
\frac{x'Bx}{x'x} = \frac{x'B^{1/2}B^{1/2}x}{x'PP'x}
= \frac{x'B^{1/2}B^{1/2}x}{y'y}
= \frac{x'P\Lambda^{1/2}P'P\Lambda^{1/2}P'x}{y'y}
= \frac{y'\Lambda y}{y'y}
= \frac{\sum_{i=1}^{p}\lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2}

and

\frac{\sum_{i=1}^{p}\lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} \le \lambda_1\,\frac{\sum_{i=1}^{p} y_i^2}{\sum_{i=1}^{p} y_i^2} = \lambda_1
When x = e_1 we have

y = P'e_1 = \begin{bmatrix}1\\ 0\\ \vdots\\ 0\end{bmatrix}

and

\frac{x'Bx}{x'x} = \frac{\sum_{i=1}^{p}\lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} = \lambda_1
Now note that

\frac{\sum_{i=1}^{p}\lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} \ge \lambda_p\,\frac{\sum_{i=1}^{p} y_i^2}{\sum_{i=1}^{p} y_i^2} = \lambda_p

When x = e_p we have

y = P'e_p = \begin{bmatrix}0\\ 0\\ \vdots\\ 1\end{bmatrix}
and

\frac{x'Bx}{x'x} = \frac{\sum_{i=1}^{p}\lambda_i y_i^2}{\sum_{i=1}^{p} y_i^2} = \lambda_p

Now

y = P'x

so

x = Py = y_1e_1 + y_2e_2 + \cdots + y_pe_p

Since x ⊥ e_1, e_2, . . . , e_k we have that for 1 ≤ i ≤ k

0 = e_i'x = y_1e_i'e_1 + y_2e_i'e_2 + \cdots + y_pe_i'e_p = y_i

and

\frac{x'Bx}{x'x} = \frac{\sum_{i=k+1}^{p}\lambda_i y_i^2}{\sum_{i=k+1}^{p} y_i^2} \le \lambda_{k+1}
The maximum is attained for yk+1 = 1, yk+2 = · · · = yp = 0
Example: Let

B = \begin{bmatrix}1 & 0\\ 0 & 2\end{bmatrix}

Here λ1 = 2, λ2 = 1 and

e_1 = \begin{bmatrix}0\\ 1\end{bmatrix},\qquad e_2 = \begin{bmatrix}1\\ 0\end{bmatrix}

For

x'Bx = [x_1\ \ x_2]\begin{bmatrix}1 & 0\\ 0 & 2\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = x_1^2 + 2x_2^2

we have that

\max_{x \neq 0}\frac{x_1^2 + 2x_2^2}{x_1^2 + x_2^2} = 2

is attained for x = e_1 = [0, 1]' and

\min_{x \neq 0}\frac{x_1^2 + 2x_2^2}{x_1^2 + x_2^2} = 1

is attained at x = e_2 = [1, 0]'
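The maximization result is easy to check numerically; the sketch below (illustrative, not from the notes) evaluates the quotient x′Bx/x′x at random points and at the eigenvectors:

```python
import numpy as np

B = np.array([[1., 0.],
              [0., 2.]])

def rayleigh(B, x):
    return (x @ B @ x) / (x @ x)

lam, E = np.linalg.eigh(B)   # eigenvalues [1, 2]; columns of E are e2, e1

rng = np.random.default_rng(0)
q = [rayleigh(B, rng.standard_normal(2)) for _ in range(10000)]

print(min(q), max(q))          # stays within [1, 2]
print(rayleigh(B, E[:, 1]))    # 2.0, attained at x = e1 = [0, 1]'
print(rayleigh(B, E[:, 0]))    # 1.0, attained at x = e2 = [1, 0]'
```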
Note that

\frac{x'Bx}{x'x} = z'Bz

where

z = x/\sqrt{x'x}
Vector z lies on the surface of the p-dimensional unit sphere centered at 0 since
z ′z = 1
Therefore the results of this section actually have to do with the
max/minimization of x′Bx on the unit sphere
Homework 2.3