Basic Concepts in Matrix Algebra
• A column array of p elements is called a vector of dimension p and is written as

  x_{p\times 1} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}

• The transpose of the column vector x_{p\times 1} is the row vector

  x' = [x_1 \;\, x_2 \;\, \ldots \;\, x_p]
• A vector can be represented in p-space as a directed line with components along the p axes.
38
Basic Matrix Concepts (cont’d)
• Two vectors can be added if they have the same dimension. Addition is carried out elementwise:

  x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_p + y_p \end{bmatrix}
• A vector can be contracted or expanded if multiplied by a constant c. Multiplication is also elementwise:

  cx = c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_p \end{bmatrix}
39
Examples

  x = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} \quad \text{and} \quad x' = [2 \;\; 1 \;\; {-4}]

  6x = 6 \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} = \begin{bmatrix} 6 \times 2 \\ 6 \times 1 \\ 6 \times (-4) \end{bmatrix} = \begin{bmatrix} 12 \\ 6 \\ -24 \end{bmatrix}

  x + y = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} + \begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 + 5 \\ 1 - 2 \\ -4 + 0 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ -4 \end{bmatrix}
40
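As a quick check of the elementwise rules in the example above, here is a minimal NumPy sketch (x and y are the vectors from the example):

```python
import numpy as np

x = np.array([2, 1, -4])
y = np.array([5, -2, 0])

print(6 * x)   # elementwise scaling:  [ 12   6 -24]
print(x + y)   # elementwise addition: [ 7 -1 -4]
```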
Basic Matrix Concepts (cont’d)
• Multiplication by c > 0 does not change the direction of x. Direction is
reversed if c < 0.
41
Basic Matrix Concepts (cont’d)
• The length of a vector x is the Euclidean distance from the origin:

  L_x = \sqrt{\sum_{j=1}^{p} x_j^2}

• Multiplication of a vector x by a constant c changes the length:

  L_{cx} = \sqrt{\sum_{j=1}^{p} c^2 x_j^2} = |c| \sqrt{\sum_{j=1}^{p} x_j^2} = |c| L_x

• If c = L_x^{-1}, then cx is a vector of unit length.
42
Examples

The length of x = \begin{bmatrix} 2 \\ 1 \\ -4 \\ -2 \end{bmatrix} is

  L_x = \sqrt{(2)^2 + (1)^2 + (-4)^2 + (-2)^2} = \sqrt{25} = 5

Then

  z = \frac{1}{5} \begin{bmatrix} 2 \\ 1 \\ -4 \\ -2 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.2 \\ -0.8 \\ -0.4 \end{bmatrix}

is a vector of unit length.
43
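The same computation with NumPy, as a minimal sketch of the normalization step:

```python
import numpy as np

x = np.array([2, 1, -4, -2])
L_x = np.linalg.norm(x)        # Euclidean length: 5.0
z = x / L_x                    # unit-length vector [ 0.4  0.2 -0.8 -0.4]
print(L_x, np.linalg.norm(z))  # 5.0 1.0
```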
Angle Between Vectors
• Consider two vectors x and y in two dimensions. If θ1 is the angle
between x and the horizontal axis and θ2 > θ1 is the angle between
y and the horizontal axis, then
  \cos(\theta_1) = \frac{x_1}{L_x}, \quad \cos(\theta_2) = \frac{y_1}{L_y}, \quad \sin(\theta_1) = \frac{x_2}{L_x}, \quad \sin(\theta_2) = \frac{y_2}{L_y}.

If θ is the angle between x and y, then

  \cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1).

Then

  \cos(\theta) = \frac{x_1 y_1 + x_2 y_2}{L_x L_y}.
44
Angle Between Vectors (cont’d)
45
Inner Product
• The inner product between two vectors x and y is

  x'y = \sum_{j=1}^{p} x_j y_j.

• Then L_x = \sqrt{x'x}, L_y = \sqrt{y'y}, and

  \cos(\theta) = \frac{x'y}{\sqrt{x'x}\,\sqrt{y'y}}.

• Since cos(θ) = 0 exactly when x'y = 0, and cos(θ) = 0 corresponds to θ = 90° or θ = 270°, the vectors are perpendicular (orthogonal) when x'y = 0.
46
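A minimal NumPy sketch of the inner product and the angle formula (the vectors here are arbitrary illustrative values):

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0])
y = np.array([5.0, -2.0, 0.0])

inner = x @ y                                       # x'y
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))            # angle between x and y
print(inner, cos_theta, theta)
```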
Linear Dependence
• Two vectors, x and y, are linearly dependent if there exist two constants c_1 and c_2, not both zero, such that

  c_1 x + c_2 y = 0

• If two vectors are linearly dependent, then one can be written as a linear combination of the other. From above, if c_1 ≠ 0,

  x = -(c_2/c_1)\, y

• k vectors x_1, x_2, \ldots, x_k are linearly dependent if there exist constants c_1, c_2, \ldots, c_k, not all zero, such that

  \sum_{j=1}^{k} c_j x_j = 0
47
• Vectors of the same dimension that are not linearly dependent are said to be linearly independent.
Linear Independence: Example
Let

  x_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}

Then c_1 x_1 + c_2 x_2 + c_3 x_3 = 0 if

  c_1 + c_2 + c_3 = 0
  2c_1 + 0 - 2c_3 = 0
  c_1 - c_2 + c_3 = 0

The unique solution is c_1 = c_2 = c_3 = 0, so the vectors are linearly independent.
48
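A quick way to confirm this conclusion numerically is to check the rank of the matrix whose columns are x_1, x_2, x_3; a minimal NumPy sketch:

```python
import numpy as np

# columns are x1, x2, x3 from the example
X = np.column_stack([[1, 2, 1], [1, 0, -1], [1, -2, 1]])
print(np.linalg.matrix_rank(X))  # 3 -> the three columns are linearly independent
```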
Projections
• The projection of x on y is defined by

  \text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\, y = \frac{x'y}{L_y} \frac{1}{L_y}\, y.

• The length of the projection is

  \text{Length of projection} = \frac{|x'y|}{L_y} = L_x \frac{|x'y|}{L_x L_y} = L_x |\cos(\theta)|,

where θ is the angle between x and y.
49
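A minimal NumPy sketch of the projection formula (x and y are arbitrary illustrative vectors):

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0])
y = np.array([1.0, 0.0, 1.0])

proj = (x @ y) / (y @ y) * y               # projection of x on y
length = abs(x @ y) / np.linalg.norm(y)    # length of the projection
print(proj, length)
```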
Matrices
A matrix A is an array of elements a_{ij} with n rows and p columns:

  A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix}

The transpose A' has p rows and n columns. The j-th row of A' is the j-th column of A:

  A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix}
50
Matrix Algebra
• Multiplication of A by a constant c is carried out element by element:

  cA = \begin{bmatrix} ca_{11} & ca_{12} & \cdots & ca_{1p} \\ ca_{21} & ca_{22} & \cdots & ca_{2p} \\ \vdots & \vdots & & \vdots \\ ca_{n1} & ca_{n2} & \cdots & ca_{np} \end{bmatrix}
51
Matrix Addition
Two matrices A_{n\times p} = \{a_{ij}\} and B_{n\times p} = \{b_{ij}\} of the same dimensions can be added element by element. The resulting matrix is C_{n\times p} = \{c_{ij}\} = \{a_{ij} + b_{ij}\}:

  C = A + B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{bmatrix}

    = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1p} + b_{1p} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2p} + b_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} + b_{n1} & a_{n2} + b_{n2} & \cdots & a_{np} + b_{np} \end{bmatrix}
52
Examples

  \begin{bmatrix} 2 & 1 & -4 \\ 5 & 7 & 0 \end{bmatrix}' = \begin{bmatrix} 2 & 5 \\ 1 & 7 \\ -4 & 0 \end{bmatrix}

  6 \times \begin{bmatrix} 2 & 1 & -4 \\ 5 & 7 & 0 \end{bmatrix} = \begin{bmatrix} 12 & 6 & -24 \\ 30 & 42 & 0 \end{bmatrix}

  \begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix} + \begin{bmatrix} 2 & 1 \\ 5 & 7 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 5 & 10 \end{bmatrix}
53
Matrix Multiplication
• Multiplication of two matrices A_{n\times p} and B_{m\times q} can be carried out only if the matrices are compatible for multiplication:
  – A_{n\times p} \times B_{m\times q}: compatible if p = m.
  – B_{m\times q} \times A_{n\times p}: compatible if q = n.
The element in the i-th row and the j-th column of A \times B is the inner product of the i-th row of A with the j-th column of B.
54
Multiplication Examples

  \begin{bmatrix} 2 & 0 & 1 \\ 5 & 1 & 3 \end{bmatrix} \times \begin{bmatrix} 1 & 4 \\ -1 & 3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 10 \\ 4 & 29 \end{bmatrix}

  \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} \times \begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 11 \\ 2 & 29 \end{bmatrix}

  \begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} \times \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} = \begin{bmatrix} 22 & 13 \\ 13 & 8 \end{bmatrix}
55
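These products can be checked with NumPy; the last two also show that matrix multiplication is not commutative. A minimal sketch:

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
B = np.array([[1, 4], [-1, 3]])

print(A @ B)   # [[ 1 11]
               #  [ 2 29]]
print(B @ A)   # [[22 13]
               #  [13  8]]   -> A @ B != B @ A
```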
Identity Matrix
• An identity matrix, denoted by I, is a square matrix with 1's along the main diagonal and 0's everywhere else. For example,

  I_{2\times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_{3\times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
• If A is a square matrix, then AI = IA = A.
• I_{n\times n} A_{n\times p} = A_{n\times p}, but A_{n\times p} I_{n\times n} is not defined for p ≠ n.
56
Symmetric Matrices
• A square matrix is symmetric if A = A'.
• If a square matrix A has elements \{a_{ij}\}, then A is symmetric if a_{ij} = a_{ji}.
• Examples:

  \begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} \qquad \begin{bmatrix} 5 & 1 & -3 \\ 1 & 12 & -5 \\ -3 & -5 & 9 \end{bmatrix}
57
Inverse Matrix
• Consider two square matrices A_{k\times k} and B_{k\times k}. If

  AB = BA = I,

then B is the inverse of A, denoted A^{-1}.
• The inverse of A exists only if the columns of A are linearly independent.
• If A is diagonal, A = diag\{a_{jj}\}, then A^{-1} = diag\{1/a_{jj}\}.
58
Inverse Matrix
• For a 2 × 2 matrix A, the inverse is

  A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{\det(A)} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix},

where det(A) = (a_{11} \times a_{22}) − (a_{12} \times a_{21}) denotes the determinant of A.
59
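A minimal NumPy sketch comparing the closed-form 2 × 2 inverse above with the library routine (A is an arbitrary invertible example):

```python
import numpy as np

A = np.array([[4.0, 2.0], [1.0, 3.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv_formula = np.array([[ A[1, 1], -A[0, 1]],
                          [-A[1, 0],  A[0, 0]]]) / det
print(np.allclose(A_inv_formula, np.linalg.inv(A)))  # True
```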
Orthogonal Matrices
• A square matrix Q is orthogonal if

  QQ' = Q'Q = I,

i.e., Q' = Q^{-1}.
• If Q is orthogonal, its rows and columns have unit length (q_j'q_j = 1) and are mutually perpendicular (q_j'q_k = 0 for any j ≠ k).
60
Eigenvalues and Eigenvectors
• A square matrix A has an eigenvalue λ with corresponding eigenvector z ≠ 0 if

  Az = λz

• The eigenvalues of A are the solutions to |A − λI| = 0.
• A normalized eigenvector (of unit length) is denoted by e.
• A k × k matrix A has k pairs of eigenvalues and eigenvectors

  λ_1, e_1 \quad λ_2, e_2 \quad \ldots \quad λ_k, e_k

where e_i'e_i = 1, e_i'e_j = 0 for i ≠ j, and the eigenvectors are unique up to a change in sign unless two or more eigenvalues are equal.
61
Spectral Decomposition
• Eigenvalues and eigenvectors will play an important role in this course.
For example, principal components are based on the eigenvalues and
eigenvectors of sample covariance matrices.
• The spectral decomposition of a k × k symmetric matrix A is

  A = λ_1 e_1 e_1' + λ_2 e_2 e_2' + \cdots + λ_k e_k e_k'
    = [e_1 \; e_2 \; \cdots \; e_k] \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_k \end{bmatrix} [e_1 \; e_2 \; \cdots \; e_k]'
    = P \Lambda P'
62
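A minimal NumPy sketch of the spectral decomposition for a symmetric matrix, using np.linalg.eigh (which returns eigenvalues and orthonormal eigenvectors); the matrix is the symmetric example from an earlier slide:

```python
import numpy as np

A = np.array([[ 5.0,  1.0, -3.0],
              [ 1.0, 12.0, -5.0],
              [-3.0, -5.0,  9.0]])
lam, P = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors
A_rebuilt = P @ np.diag(lam) @ P.T  # P Lambda P'
print(np.allclose(A, A_rebuilt))    # True
```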
Determinant and Trace
• The trace of a k × k matrix A is the sum of its diagonal elements, i.e., trace(A) = \sum_{i=1}^{k} a_{ii}.
• The trace of a square, symmetric matrix A is the sum of its eigenvalues, i.e., trace(A) = \sum_{i=1}^{k} a_{ii} = \sum_{i=1}^{k} λ_i.
• The determinant of a square, symmetric matrix A is the product of its eigenvalues, i.e., |A| = \prod_{i=1}^{k} λ_i.
63
Rank of a Matrix
• The rank of a square matrix A is
– The number of linearly independent rows
– The number of linearly independent columns
– The number of non-zero eigenvalues
• The inverse of a k × k matrix A exists if and only if rank(A) = k, i.e., there are no zero eigenvalues.
64
Positive Definite Matrix
• For a k × k symmetric matrix A and a vector x = [x_1, x_2, ..., x_k]', the quantity x'Ax is called a quadratic form.
• Note that x'Ax = \sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij} x_i x_j.
• If x'Ax ≥ 0 for every vector x, both A and the quadratic form are said to be non-negative definite.
• If x'Ax > 0 for every vector x ≠ 0, both A and the quadratic form are said to be positive definite.
65
Example 2.11
• Show that the matrix of the quadratic form 3x_1^2 + 2x_2^2 − 2\sqrt{2}\, x_1 x_2 is positive definite.
• For

  A = \begin{bmatrix} 3 & -\sqrt{2} \\ -\sqrt{2} & 2 \end{bmatrix},

the eigenvalues are λ_1 = 4, λ_2 = 1. Then A = 4 e_1 e_1' + e_2 e_2'. Write

  x'Ax = 4 x'e_1 e_1'x + x'e_2 e_2'x = 4y_1^2 + y_2^2 ≥ 0,

which is zero only for y_1 = y_2 = 0.
66
Example 2.11 (cont’d)
• y_1 and y_2 cannot both be zero (when x ≠ 0) because

  \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} e_1' \\ e_2' \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P'_{2\times 2}\, x_{2\times 1}

with P' orthonormal, so that (P')^{-1} = P. Then x = Py, and since x ≠ 0 it follows that y ≠ 0.
• Using the spectral decomposition, we can show that:
  – A is positive definite if all of its eigenvalues are positive.
  – A is non-negative definite if all of its eigenvalues are ≥ 0.
67
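In practice the eigenvalue criterion is the easiest check; a minimal NumPy sketch for the matrix of Example 2.11:

```python
import numpy as np

A = np.array([[ 3.0,           -np.sqrt(2.0)],
              [-np.sqrt(2.0),   2.0         ]])
eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print(eigenvalues)                    # approximately [1. 4.]
print(np.all(eigenvalues > 0))        # True -> positive definite
```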
Distance and Quadratic Forms
• For x = [x_1, x_2, ..., x_p]' and a p × p positive definite matrix A,

  d^2 = x'Ax > 0

when x ≠ 0. Thus, a positive definite quadratic form can be interpreted as a squared distance of x from the origin, and vice versa.
• The squared distance from x to a fixed point µ is given by the quadratic form

  (x − µ)'A(x − µ).
68
Distance and Quadratic Forms (cont’d)
• We can interpret distance in terms of the eigenvalues and eigenvectors of A as well. Any point x at constant distance c from the origin satisfies

  x'Ax = x'\Big(\sum_{j=1}^{p} λ_j e_j e_j'\Big) x = \sum_{j=1}^{p} λ_j (x'e_j)^2 = c^2,

the expression for an ellipsoid in p dimensions.
• Note that the point x = c λ_1^{-1/2} e_1 is at distance c (in the direction of e_1) from the origin because it satisfies x'Ax = c^2. The same is true for the points x = c λ_j^{-1/2} e_j, j = 1, ..., p. Thus, all points at distance c lie on an ellipsoid whose axes are in the directions of the eigenvectors and have lengths proportional to λ_j^{-1/2}.
69
Distance and Quadratic Forms (cont’d)
70
Square-Root Matrices
• The spectral decomposition of a positive definite matrix A yields

  A = \sum_{j=1}^{p} λ_j e_j e_j' = P \Lambda P',

with \Lambda_{p\times p} = diag\{λ_j\}, all λ_j > 0, and P_{p\times p} = [e_1 \; e_2 \; \ldots \; e_p] an orthonormal matrix of eigenvectors. Then

  A^{-1} = P \Lambda^{-1} P' = \sum_{j=1}^{p} \frac{1}{λ_j}\, e_j e_j'.

• With \Lambda^{1/2} = diag\{λ_j^{1/2}\}, a square-root matrix is

  A^{1/2} = P \Lambda^{1/2} P' = \sum_{j=1}^{p} \sqrt{λ_j}\, e_j e_j'.
71
Square-Root Matrices
The square root of a positive definite matrix A has the following properties:
1. Symmetry: (A^{1/2})' = A^{1/2}
2. A^{1/2} A^{1/2} = A
3. A^{-1/2} = \sum_{j=1}^{p} λ_j^{-1/2} e_j e_j' = P \Lambda^{-1/2} P'
4. A^{1/2} A^{-1/2} = A^{-1/2} A^{1/2} = I
5. A^{-1/2} A^{-1/2} = A^{-1}
Note that there are other ways of defining the square root of a positive definite matrix: in the Cholesky decomposition A = LL', with L a lower triangular matrix, L is also called a square root of A.
72
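A minimal NumPy sketch of the symmetric square root built from the spectral decomposition, checked against the defining property A^{1/2} A^{1/2} = A (the matrix is the one from Example 2.11):

```python
import numpy as np

A = np.array([[ 3.0,           -np.sqrt(2.0)],
              [-np.sqrt(2.0),   2.0         ]])
lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T   # symmetric square root P Lambda^{1/2} P'
print(np.allclose(A_half @ A_half, A))     # True
```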
Random Vectors and Matrices
• A random matrix (vector) is a matrix (vector) whose elements are random variables.
• If X_{n\times p} is a random matrix, the expected value of X is the n × p matrix

  E(X) = \begin{bmatrix} E(X_{11}) & E(X_{12}) & \cdots & E(X_{1p}) \\ E(X_{21}) & E(X_{22}) & \cdots & E(X_{2p}) \\ \vdots & \vdots & & \vdots \\ E(X_{n1}) & E(X_{n2}) & \cdots & E(X_{np}) \end{bmatrix},

where

  E(X_{ij}) = \int_{-\infty}^{\infty} x_{ij}\, f_{ij}(x_{ij})\, dx_{ij}

with f_{ij}(x_{ij}) the density function of the continuous random variable X_{ij}. If X_{ij} is a discrete random variable, we compute its expectation as a sum rather than an integral.
73
Linear Combinations
• The usual rules for expectations apply. If X and Y are two random matrices and A and B are two constant matrices of the appropriate dimensions, then

  E(X + Y) = E(X) + E(Y)
  E(AX) = A\,E(X)
  E(AXB) = A\,E(X)\,B
  E(AX + BY) = A\,E(X) + B\,E(Y)

• Further, if c is a scalar-valued constant, then E(cX) = c\,E(X).
74
Mean Vectors and Covariance Matrices
• Suppose that X is a p × 1 (continuous) random vector drawn from some p-dimensional distribution.
• Each element of X, say X_j, has its own marginal distribution with marginal mean µ_j and variance σ_{jj} defined in the usual way:

  µ_j = \int_{-\infty}^{\infty} x_j\, f_j(x_j)\, dx_j

  σ_{jj} = \int_{-\infty}^{\infty} (x_j − µ_j)^2 f_j(x_j)\, dx_j
75
Mean Vectors and Covariance Matrices (cont’d)
• To examine the association between a pair of random variables we need to consider their joint distribution.
• A measure of the linear association between pairs of variables is given by the covariance

  σ_{jk} = E\big[(X_j − µ_j)(X_k − µ_k)\big] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_j − µ_j)(x_k − µ_k)\, f_{jk}(x_j, x_k)\, dx_j\, dx_k.
76
Mean Vectors and Covariance Matrices (cont’d)
• If the joint density function f_{jk}(x_j, x_k) can be written as the product of the two marginal densities, i.e.,

  f_{jk}(x_j, x_k) = f_j(x_j)\, f_k(x_k),

then X_j and X_k are independent.
• More generally, the p-dimensional random vector X has mutually independent elements if the p-dimensional joint density function can be written as the product of the p univariate marginal densities.
• If two random variables X_j and X_k are independent, then their covariance is equal to 0. [The converse is not always true.]
77
Mean Vectors and Covariance Matrices (cont’d)
• We use µ to denote the p × 1 vector of marginal population means and Σ to denote the p × p population variance-covariance matrix:

  \Sigma = E\big[(X − µ)(X − µ)'\big].

• If we carry out the multiplication (an outer product), then Σ is equal to

  E\begin{bmatrix} (X_1 − µ_1)^2 & (X_1 − µ_1)(X_2 − µ_2) & \cdots & (X_1 − µ_1)(X_p − µ_p) \\ (X_2 − µ_2)(X_1 − µ_1) & (X_2 − µ_2)^2 & \cdots & (X_2 − µ_2)(X_p − µ_p) \\ \vdots & \vdots & & \vdots \\ (X_p − µ_p)(X_1 − µ_1) & (X_p − µ_p)(X_2 − µ_2) & \cdots & (X_p − µ_p)^2 \end{bmatrix}
78
Mean Vectors and Covariance Matrices (cont’d)
• By taking expectations element by element we find that

  \Sigma = \begin{bmatrix} σ_{11} & σ_{12} & \cdots & σ_{1p} \\ σ_{21} & σ_{22} & \cdots & σ_{2p} \\ \vdots & \vdots & & \vdots \\ σ_{p1} & σ_{p2} & \cdots & σ_{pp} \end{bmatrix}.

• Since σ_{jk} = σ_{kj} for all j ≠ k, Σ is symmetric.
• Σ is also non-negative definite.
79
Correlation Matrix
• The population correlation matrix is the p × p matrix with off-diagonal elements equal to ρ_{jk} and diagonal elements equal to 1:

  \begin{bmatrix} 1 & ρ_{12} & \cdots & ρ_{1p} \\ ρ_{21} & 1 & \cdots & ρ_{2p} \\ \vdots & \vdots & & \vdots \\ ρ_{p1} & ρ_{p2} & \cdots & 1 \end{bmatrix}

• Since ρ_{jk} = ρ_{kj}, the correlation matrix is symmetric.
• The correlation matrix is also non-negative definite.
80
Correlation Matrix (cont’d)
• The p × p population standard deviation matrix V^{1/2} is a diagonal matrix with \sqrt{σ_{jj}} along the diagonal and zeros in all off-diagonal positions. Then

  \Sigma = V^{1/2}\, ρ\, V^{1/2},

where ρ denotes the population correlation matrix, and

  ρ = (V^{1/2})^{-1}\, \Sigma\, (V^{1/2})^{-1}.

• Given Σ, we can easily obtain the correlation matrix.
81
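A minimal NumPy sketch of this conversion from a covariance matrix to the correlation matrix via the standard deviation matrix (Sigma here is just an illustrative covariance matrix):

```python
import numpy as np

Sigma = np.array([[4.0,  1.0,  2.0],
                  [1.0,  9.0, -3.0],
                  [2.0, -3.0, 25.0]])
V_half = np.diag(np.sqrt(np.diag(Sigma)))   # standard deviation matrix V^{1/2}
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv       # correlation matrix
print(np.round(rho, 3))
```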
Partitioning Random Vectors
• If we partition the p × 1 random vector X into two components X_1, X_2 of dimensions q × 1 and (p − q) × 1, respectively, then the mean vector and the variance-covariance matrix must be partitioned accordingly.
• Partitioned mean vector:

  E(X) = E\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} µ_1 \\ µ_2 \end{bmatrix}

• Partitioned variance-covariance matrix:

  \Sigma = \begin{bmatrix} Var(X_1) & Cov(X_1, X_2) \\ Cov(X_2, X_1) & Var(X_2) \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix},

where Σ_{11} is q × q, Σ_{12} is q × (p − q), and Σ_{22} is (p − q) × (p − q).
82
Partitioning Covariance Matrices (cont’d)
• Σ_{11} and Σ_{22} are the variance-covariance matrices of the sub-vectors X_1 and X_2, respectively. The off-diagonal elements of those two matrices reflect linear associations among elements within each sub-vector.
• There are no variances in Σ_{12}, only covariances. These covariances reflect linear associations between elements in the two different sub-vectors.
83
Linear Combinations of Random Variables
• Let X be a p × 1 vector with mean µ and variance-covariance matrix Σ, and let c be a p × 1 vector of constants. Then the linear combination c'X has mean and variance

  E(c'X) = c'µ \quad \text{and} \quad Var(c'X) = c'\Sigma c.

• In general, the mean and variance of a q × 1 vector of linear combinations Z = C_{q\times p} X_{p\times 1} are

  µ_Z = C µ_X \quad \text{and} \quad \Sigma_Z = C \Sigma_X C'.
84
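A minimal NumPy sketch of these rules, using an illustrative µ, Σ, and c, and checking the formulas against a Monte Carlo simulation:

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[4.0,  1.0,  2.0],
                  [1.0,  9.0, -3.0],
                  [2.0, -3.0, 25.0]])
c = np.array([0.5, -1.0, 2.0])

mean_cX = c @ mu        # E(c'X) = c'mu
var_cX = c @ Sigma @ c  # Var(c'X) = c' Sigma c

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
print(mean_cX, (X @ c).mean())   # close agreement
print(var_cX, (X @ c).var())
```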
Cauchy-Schwarz Inequality
• We will need some of the results below to derive maximization results later in the course.

Cauchy-Schwarz inequality: Let b and d be any two p × 1 vectors. Then

  (b'd)^2 \le (b'b)(d'd)

with equality only if b = cd for some scalar constant c.

Proof: The equality is obvious for b = 0 or d = 0. For other cases, consider b − cd for any constant c ≠ 0. Then if b − cd ≠ 0, we have

  0 < (b − cd)'(b − cd) = b'b − 2c(b'd) + c^2 d'd,

since b − cd must have positive length.
85
Cauchy-Schwarz Inequality
We can add and subtract (b'd)^2/(d'd) to obtain

  0 < b'b − 2c(b'd) + c^2 d'd − \frac{(b'd)^2}{d'd} + \frac{(b'd)^2}{d'd} = b'b − \frac{(b'd)^2}{d'd} + (d'd)\left(c − \frac{b'd}{d'd}\right)^2.

Since c can be anything, we can choose c = b'd/d'd. Then

  0 < b'b − \frac{(b'd)^2}{d'd} \quad \Rightarrow \quad (b'd)^2 < (b'b)(d'd)

for b ≠ cd (otherwise, we have equality).
86
Extended Cauchy-Schwarz Inequality
If b and d are any two p × 1 vectors and B is a p × p positive definite matrix, then

  (b'd)^2 \le (b'Bb)(d'B^{-1}d)

with equality if and only if b = cB^{-1}d or d = cBb for some constant c.

Proof: Consider B^{1/2} = \sum_{i=1}^{p} \sqrt{λ_i}\, e_i e_i' and B^{-1/2} = \sum_{i=1}^{p} \frac{1}{\sqrt{λ_i}}\, e_i e_i'. Then we can write

  b'd = b'Id = b'B^{1/2}B^{-1/2}d = (B^{1/2}b)'(B^{-1/2}d) = b^{*\prime} d^*.

To complete the proof, simply apply the Cauchy-Schwarz inequality to the vectors b^* and d^*.
87
Optimization
Let B be positive definite and let d be any p × 1 vector. Then

  \max_{x \ne 0} \frac{(x'd)^2}{x'Bx} = d'B^{-1}d

is attained when x = cB^{-1}d for any constant c ≠ 0.

Proof: By the extended Cauchy-Schwarz inequality, (x'd)^2 \le (x'Bx)(d'B^{-1}d). Since x ≠ 0 and B is positive definite, x'Bx > 0, and we can divide both sides by x'Bx to get the upper bound

  \frac{(x'd)^2}{x'Bx} \le d'B^{-1}d.

Differentiating the left-hand side with respect to x shows that the maximum is attained at x = cB^{-1}d.
88
Maximization of a Quadratic Form
on a Unit Sphere
• Let B be positive definite with eigenvalues λ_1 ≥ λ_2 ≥ \cdots ≥ λ_p ≥ 0 and associated (normalized) eigenvectors e_1, e_2, \ldots, e_p. Then

  \max_{x \ne 0} \frac{x'Bx}{x'x} = λ_1, \quad \text{attained when } x = e_1,

  \min_{x \ne 0} \frac{x'Bx}{x'x} = λ_p, \quad \text{attained when } x = e_p.

• Furthermore, for k = 1, 2, \ldots, p − 1,

  \max_{x \perp e_1, e_2, \ldots, e_k} \frac{x'Bx}{x'x} = λ_{k+1}, \quad \text{attained when } x = e_{k+1}.

See the proof at the end of Chapter 2 in the textbook (pages 80-81).
89
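A minimal NumPy sketch checking the Rayleigh-quotient bounds numerically for an illustrative positive definite B:

```python
import numpy as np

B = np.array([[4.0,  1.0,  2.0],
              [1.0,  9.0, -3.0],
              [2.0, -3.0, 25.0]])
lam, E = np.linalg.eigh(B)   # ascending eigenvalues, orthonormal eigenvectors

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 10_000))
# Rayleigh quotient x'Bx / x'x for each random column of x
rayleigh = np.einsum('ij,ij->j', x, B @ x) / np.einsum('ij,ij->j', x, x)
print(lam[-1], rayleigh.max())   # max Rayleigh quotient approaches lambda_1
print(lam[0], rayleigh.min())    # min Rayleigh quotient approaches lambda_p
```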