Notes

advertisement
MA.1
Introduction to matrix algebra
The basics
What is a matrix? From the regression analysis book of
Kutner, Nachtsheim, and Neter (2004, p. 176):
“A matrix is a rectangular array of elements arranged in
rows and columns”
Example:
1 2 3
4 5 6


The dimension or size of a matrix: # rows  # columns = rc
Example: 23
Rather than using numbers, we can represent parts of a
matrix (and the matrix itself) by symbols.
Example:
MA.2
 a11 a12
A
a21 a22
a13 
a23 
where aij is the row i and column j element of A
a11 is often called the “(1,1) element” of A, a12 is called
the “(1,2) element” of A,…
a11 = 1 from the previous example
Notice that the matrix A was bolded. When bolding is not
possible (writing on a piece of paper or chalkboard), the
letter is underlined - A
Example: A matrix of size rc
 a11 a12
a
a22
 21

A
 ai1 ai2


 ar1 ar 2
a1j
a2 j
aij
arj
a1c 
a2c 



aic 


arc 
Example: Square matrix is rc where r = c
MA.3
Example: HS and College GPA
From the introduction to R lecture, we can start putting
parts of the observed data into a matrix form. For
example,
1
1

1
X

1

1
3.04 
2.35 

2.70 


2.28 

1.88 
The above 202 matrix contains the HS GPAs in the
second column. It will be clear shortly why we have a
leading column of 1’s.
Vector – a r1 (column vector) or 1c (row vector) matrix –
special case of a matrix
Example: Symbolic representation of a 31 column vector
MA.4
 a1 
A  a 2 
 
a3 
Example: HS and College GPA
3.10 
2.30 


3.00 
Y



2.20 


1.60


The above 201 vector contains the College GPAs.
Transpose – Interchange the rows and columns of a matrix
or vector
Example:
 a11 a21 
 a11 a12 a13 
a


A

a
A
and
22 

 12
a
a
a
22
23 
 21
a13 a23 
A is 23 and A is 32
MA.5
The  symbol indicates a transpose, and it is said as the
word “prime”. Thus, the transpose of A is “A prime”.
Example: HS and College GPA
Y  3.10 2.30 3.00
2.20 1.60
MA.6
Matrix addition and subtraction
Add or subtract the corresponding elements of matrices
with the same dimension.
Example:
1 2 3
Suppose A  
and B 

4 5 6
 1 10 1
 5 5 8 .


0 12 2 
Then A  B  
and A  B 

9 10 14 
 2 8 4 
 1 0 2 .


Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3,
4, 5, 6), nrow = 2, ncol = 3, byrow =
TRUE)
> A
[,1] [,2] [,3]
[1,]
1
2
3
[2,]
4
5
6
> class(A)
[1] "matrix"
> B<-matrix(data = c(-1, 10, -1, 5, 5, 8), nrow = 2, ncol =
3, byrow = TRUE)
> A+B
[,1] [,2] [,3]
[1,]
0
12
2
MA.7
[2,]
9
10
14
> A-B
[,1] [,2] [,3]
[1,]
2
-8
4
[2,]
-1
0
-2
Notes:
1. Be careful with the byrow option. By default, this is
set to FALSE. Thus, the numbers would be entered
into the matrix by columns. For example,
> matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,]
1
3
5
[2,]
2
4
6
2. The class of these objects is “matrix”.
3. A vector can be represented as a “matrix” class type or
a type of its own.
> y<-matrix(data = c(1,2,3), nrow = 3, ncol = 1, byrow
= TRUE)
> y
[,1]
[1,]
1
[2,]
2
[3,]
3
> class(y)
[1] "matrix"
> x<-c(1,2,3)
> x
[1] 1 2 3
> class(x)
[1] "numeric"
MA.8
> is.vector(x)
[1] TRUE
This can present some confusion when vectors are
multiplied with other vectors or matrices because no
specific row or column dimensions are given. More on
this shortly.
4. A transpose of a matrix can be done using the t()
function. For example,
> t(A)
[,1] [,2]
[1,]
1
4
[2,]
2
5
[3,]
3
6
MA.9
Matrix multiplication
Scalar - 11 matrix
Example: Matrix multiplied by a scalar
 ca11 ca12
cA  
ca21 ca22
ca13 
where c is a scalar

ca23 
 1 2 3
2 4 6 
Let A  
and c=2. Then 2A  
.


4 5 6
8 10 12 
Multiplying two matrices
Suppose you want to multiply the matrices A and B; i.e.,
AB or AB. In order to do this, you need the number of
columns of A to be the same as the number of rows as
B. For example, suppose A is 23 and B is 310. You
can multiply these matrices. However if B is 410
instead, these matrices could NOT be multiplied.
The resulting dimension of C=AB:
1. The number of rows of A is the number of rows of C.
2. The number of columns of B is the number of rows of
C.
MA.10
3. In other words, C  A B where the dimension of the
wy
w  z z y
matrices are shown below them.
How to multiply two matrices – an example
3 0 
 1 2 3
 1 2  . Notice that A
and
B

Suppose A  



4
5
6


0 1
is 23 and B is 32 so C=AB can be done.
3 0 
 1 2 3 

C  AB  
1
2

 4 5 6  0 1


 1 3  2  1  3  0 1 0  2  2  3  1 


 4  3  5  1  6  0 4  0  5  2  6  1
5 7


17
16


The “cross product” of the rows of A and the columns of
B are taken to form C
In the above example, D=BAAB where BA is:
MA.11
3 0 
1 2


BA  1 2 

 4 5
0 1
3  1  0  4
  1 1  2  4

 0  1  1 4
3
6 
3  2  0  5 3  3  0  6
1 2  2  5 1 3  2  6 

0  2  1 5 0  3  1 6 
3 6 9 
 9 12 15 


 4 5 6 
In general for a 23 matrix times a 32 matrix:
b11 b12 
 a11 a12 a13  

C  AB  
b
b
22 
  21
a
a
a
22
23 
 21
b31 b32 
 a11b11  a12b21  a13b31 a11b12  a12b22  a13b32 


a
b

a
b

a
b
a
b

a
b

a
b
22 21
23 31
21 12
22 22
23 32 
 21 11
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data
3,
> B<-matrix(data
2,
= c(1, 2, 3, 4, 5, 6), nrow = 2, ncol =
byrow = TRUE)
= c(3, 0, 1, 2, 0, 1), nrow = 3, ncol =
byrow = TRUE)
MA.12
> C<-A%*%B
> D<-B%*%A
> C
[,1] [,2]
[1,]
5
7
[2,]
17
16
> D
[,1] [,2] [,3]
[1,]
3
6
9
[2,]
9
12
15
[3,]
4
5
6
> #What is A*B?
> A*B
Error in A * B : non-conformable arrays
Notes:
1. %*% is used for multiplying matrices and/or vectors
2. * means to perform elementwise multiplications. Here
is an example where this can be done:
> E<-A
> A*E
[,1] [,2] [,3]
[1,]
1
4
9
[2,]
16
25
36
The (i,j) elements of each matrix are multiplied
together.
3. In R, multiplying vectors with other vectors or matrices
can be confusing since no row or column dimensions
MA.13
are given for a vector object. For example, suppose x
 1
=  2  , a 31 vector
3 
> x<-c(1,2,3)
> x%*%x
[,1]
[1,]
14
> A%*%x
[,1]
[1,]
14
[2,]
32
How does R know that we want xx (11) instead of
xx (33) when we have not told R that x is 31?
Similarly, how does R know that Ax is 21? From the
R help for %*% in the Base package:
Multiplies two matrices, if they are conformable. If
one argument is a vector, it will be promoted to
either a row or column matrix to make the two
arguments conformable. If both are vectors it will
return the inner product.
An inner product produces a scalar value. If you
wanted xx (33), one can use the outer product %o%
> x%o%x #outer product
[,1] [,2] [,3]
[1,]
1
2
3
[2,]
2
4
6
MA.14
[3,]
3
6
9
We will only need to use %*% in this class.
Example: HS and College GPA (HS_college_GPA_MA.R)
1
1

1
X

1

1
3.04 
3.10 
2.30 
2.35 



3.00 
2.70 
and
Y





2.20 
2.28 



1.88 
1.60


Find XX, XY, and YY
> #Read in the data
> gpa<-read.table(file = "C:\\chris\\ unl\\Dropbox\\NEW\\
STAT873_rev\\matrix_algebra\\gpa.txt", header=TRUE, sep
= "")
> head(gpa)
HS.GPA College.GPA
1
3.04
3.1
2
2.35
2.3
3
2.70
3.0
4
2.05
1.9
5
2.83
2.5
6
4.32
3.7
> X<-cbind(1, gpa$HS.GPA)
> Y<-gpa$College.GPA
> X
MA.15
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]
[7,]
[8,]
[9,]
[10,]
[11,]
[12,]
[13,]
[14,]
[15,]
[16,]
[17,]
[18,]
[19,]
[20,]
[,1]
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
[,2]
3.04
2.35
2.70
2.05
2.83
4.32
3.39
2.32
2.69
0.83
2.39
3.65
1.85
3.83
1.22
1.48
2.28
4.00
2.28
1.88
> Y
[1] 3.1 2.3 3.0 1.9 2.5 3.7 3.4 2.6 2.8 1.6 2.0 2.9 2.3
3.2 1.8 1.4 2.0 3.8 2.2
[20] 1.6
> t(X)%*%X
[,1]
[,2]
[1,] 20.00 51.3800
[2,] 51.38 148.4634
> t(X)%*%Y
[,1]
[1,] 50.100
[2,] 140.229
> t(Y)%*%Y
[,1]
[1,] 135.15
MA.16
Notes:
1. The cbind() function combines items by
“c”olumns. Since 1 is only one element, it will
replicate (called “recycling” in R) itself for all
elements that you are combining so that one full
matrix is formed. There is also a rbind() function
that combines by rows. Thus, rbind(a,b) forms a
matrix with a above b.
 y1 
y  n
2
2. Y Y   y1,y2 ,...,yn      yi2
  i1
 
 yn 
 y1   n
n


xi1yi
yi 




xn1  y2
 i1
  i1 
 x11 x21



n
n
3. X Y  





x
x
x
12
22
n2


xi2 yi    xi2 yi 
   i
1
  i1

 yn 
because x11=…=xn1=1
MA.17
4. XX
 x11

 x12
 1

 x12
x21
x22
1
x22
 x11 x12 
xn1   x21 x22 




xn2  


x
x
n1
n2


1 x12  
n
1  1 x22  



 n x
xn2  
i2

  i

1
1 xn2 
 xi2 
i 1

n
2
 xi2 
i 1

n
5. Here’s another way to get the X matrix
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data =
gpa)
> model.matrix(object = mod.fit)
(Intercept) HS.GPA
1
1
3.04
2
1
2.35
3
1
2.70
4
1
2.05
5
1
2.83
6
1
4.32
7
1
3.39
8
1
2.32
9
1
2.69
10
1
0.83
11
1
2.39
12
1
3.65
13
1
1.85
14
1
3.83
15
1
1.22
16
1
1.48
17
1
2.28
MA.18
18
1
19
1
20
1
attr(,"assign")
[1] 0 1
4.00
2.28
1.88
MA.19
Special types of matrices
Symmetric matrix: If A=A, then A is symmetric.
 1 2
Example: A  
, A 

2 3 
 1 2
2 3 


Diagonal matrix: A square matrix whose “off-diagonal”
elements are 0.
a11 0
Example: A   0 a22
 0
0
0 
0 

a33 
Identity matrix: A diagonal matrix with 1’s on the diagonal.
1 0 0
Example: I  0 1 0 
0 0 1
Note that “I” (the letter I, not the number one) usually
denotes the identity matrix.
Vector and matrix of 1’s
MA.20
A column vector of 1’s:
1
1
A matrix of 1’s: J  
r r


1
Notes:
1. j  j  r
r 1 r 1
2. j j   J
r 1r 1
r r
3. J J  r J
r r r r
r r
0 
0 
Vector of 0’s: 0   
r 1
 
 
0 
1
1
1 j  
r 1
 
r 1
 
1
1
1
1
1



1
1
MA.21
Linear dependence and rank of matrix
1 2 6 
Let A  3 4 12  . Think of each column of A as a
5 6 18 
vector; i.e., A = [A1, A2, A3]. Note that 3A2=A3. This
means the columns of A are “linearly dependent.”
Formally, a set of column vectors are linearly dependent
if there exists constants 1, 2,…, c (not all zero) such
that 1A1+2A2+…+cAc=0. A set of column vectors are
linearly independent if 1A1+2A2+…+cAc=0 only for
1=2=…=c=0 .
The rank of a matrix is the maximum number of linearly
independent columns in the matrix.
rank(A)=2
If a matrix does not have a rank equal to the number of
its columns, this can cause problems in some of our
calculations this semester. More on this later…
MA.22
Inverse of a matrix
Note that the inverse of a scalar, say b, is b-1. For
example, the inverse of b=3 is 3-1=1/3. Also, bb-1=1. In
matrix algebra, the inverse of a matrix is another matrix.
For example, the inverse of A is A-1, and AA-1=A-1A=I.
Note that A must be a square matrix.
1 
 1 2  1  2
Example: A  
,A 


3 4 
1.5 0.5 
Check:
AA
1
 1 ( 2)  2  1.5 1 1  2 * ( 0.5)   1 0 




3  ( 2)  4  1.5 3  1  4 * ( 0.5) 0 1
Finding the inverse
For a general way, see a matrix algebra book (e.g.,
Kolman, 1988). For a 22 matrix, there is a simple
 a11 a12 
formula. Let A  
. Then

a21 a22 
 a22 a12 
1
A1 
.


a11a22  a12 a21  a21 a11 
Verify AA-1=I on your own.
Example: Using R (basic_matrix_algebra.R)
MA.23
> A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2,
byrow = TRUE)
> solve(A)
[,1] [,2]
[1,] -2.0 1.0
[2,] 1.5 -0.5
> A%*%solve(A)
[,1]
[,2]
[1,]
1 1.110223e-16
[2,]
0 1.000000e+00
> solve(A)%*%A
[,1] [,2]
[1,] 1.000000e+00
0
[2,] 1.110223e-16
1
> round(solve(A)%*%A, 2)
[,1] [,2]
[1,]
1
0
[2,]
0
1
The solve() function inverts a matrix in R. The
solve(A,b) function can also be used to “solve” for x
in Ax = b since A-1Ax = A-1b  Ix = A-1b  x = A-1b
Example: HS and College GPA (HS_college_GPA_MA.R)
MA.24
1 3.04 
3.10 
1 2.35 
2.30 




1 2.70 
3.00 
Remember that X  
 and Y  





1 2.28 
2.20 




1
1.88
1.60




Find (XX)-1 and (XX)-1XY
> solve(t(X)%*%X)
[,1]
[,2]
[1,] 0.4507584 -0.15599781
[2,] -0.1559978 0.06072316
> solve(t(X)%*%X) %*% t(X)%*%Y
[,1]
[1,] 0.7075776
[2,] 0.6996584
From previous output:
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa)
> mod.fit$coefficients
(Intercept)
HS.GPA
0.7075776
0.6996584
ˆ 0 
Thus, (XX) XY =   , where ̂0 and ̂1 are the
 ˆ1 
estimated intercept and slope coefficients in a simple
linear regression model.
-1
MA.25
Trace and Determinant
Suppose we have a pp matrix A:
 a11 a12
a
a22
21

Α


ap1 ap2
a1p 
a2p 



app 
Notice this is a square matrix!
Trace – The trace of a square matrix A is defined as
p
tr( Α )   aii  a11  a22  ...  app ; i.e., the sum of a square
i1
matrix’s diagonal elements.
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow
= TRUE)
> A
[,1] [,2]
[1,]
1
2
[2,]
3
4
> diag(A)
[1] 1 4
> sum(diag(A))
[1] 5
MA.26
Determinant – The determinant of a square matrix A is
p
Α   a1j Α1j where A1j = (-1)1+j|A1j| and A1j is obtained from A
j1
by deleting its first row and its jth column.
The determinant for a 22 matrix is defined as
 a11 a12 
 a11a22  a12a21
a

 21 a22 
and the determinant for a 33 matrix can be defined as
 a11 a12
a
a
 21 22
a31 a32
a13 
a23   a11

a33 
a22
a
 32
a23 
 a12

a33 
a21 a23 
 a13
a

 31 a33 
a21 a22 
a

 31 a32 
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow
= TRUE)
> det(A)
[1] -2
MA.27
Eigenvalues and eigenvectors
Suppose we have a pp matrix A:
 a11 a12
a
a22
21

Α


ap1 ap2
a1p 
a2p 



app 
Notice this is a square matrix!
Eigenvalues – The roots of the polynomial equation defined
by Α  I  0 where I is an identity matrix.
If p=2, then the eigenvalues are the roots of
a12 
 a11 a12   0  a11  

 
a



a
0

a
a


21
22
21
22

 
 

 (a11   )(a22   )  a21a12
 2  (a11  a22 )  (a21a12  a11a22 )  0
Using the quadratic formula, we obtain
a11  a22  (a11  a22 )2  4(a21a12  a11a22 )

2
MA.28
In general, the eigenvalues are the p roots of
c1p + c2p-1 + c3p-2 + … + cp + cp+1 = 0 where cj for i =
1, …, p+1 denote constants.
When A is a symmetric matrix, the eigenvalues are real
numbers and can be ordered from largest to smallest as
1  2  …  p where 1 is the largest.
Example: Using R (basic_matrix_algebra.R)
> A<-matrix(data = c(1, 0.5, 0.5, 1.25), nrow = 2, ncol = 2,
byrow = TRUE)
> A
[,1] [,2]
[1,] 1.0 0.50
[2,] 0.5 1.25
> eigen(A)
$values
[1] 1.6403882 0.6096118
$vectors
[,1]
[,2]
[1,] 0.6154122 -0.7882054
[2,] 0.7882054 0.6154122
> save.eig<-eigen(A)
> save.eig$values
[1] 1.6403882 0.6096118
> save.eig$vectors
[,1]
[,2]
[1,] 0.6154122 -0.7882054
[2,] 0.7882054 0.6154122
2.25  2.252  4(.25  1.25)

 1.6404 and 0.6096
2
MA.29
Eigenvectors – Each eigenvalue of A has a corresponding
nonzero vector b called an eigenvector that satisfies Ab =
b.
Notes:
 Eigenvectors for a particular eigenvalue are not
unique.
 When two eigenvalues are not equal, their
corresponding eigenvectors are orthogonal (i.e.,
bibj  0 ).
p
 tr( Α )   i
i1
p
 Α   i  12
i1
p
Example: Using R (basic_matrix_algebra.R)
The eigenvectors of A satisfy
0.5   b1 
 1
 b1 
0.5 1.25  b    b 

 2
 2
for b = [b1, b2], 1=1.6404, and 2=0.6096.
Two possible vectors are:
MA.30
0.6154 
 0.7882 
0.7882  for 1=1.6404 and  0.6154  for 2=0.6096




(see the previous example’s output).
Note that
0.5  0.6154 
 1
0.6154 

1.6404
0.5 1.25  0.7882 
0.7882 





and
0.5   0.7882 
 1
 0.7882 
0.5 1.25   0.6154   0.6096  0.6154 





Below is the verification from R:
> A%*%save.eig$vectors[,1]
[,1]
[1,] 1.009515
[2,] 1.292963
> save.eig$values[1]*save.eig$vectors[,1]
[1] 1.009515 1.292963
> A%*%save.eig$vectors[,2]
[,1]
[1,] -0.4804993
[2,] 0.3751625
> save.eig$values[2]*save.eig$vectors[,2]
[1] -0.4804993 0.3751625
MA.31
Vector length – Suppose b = [b1, …, bp] is a vector. The
length of a vector is
p
2
 bi .
i1
R chooses eigenvectors so that they have a length of 1.
Example: Using R (basic_matrix_algebra.R)
> sqrt(sum(save.eig$vectors[,1]^2))
[1] 1
> sqrt(sum(save.eig$vectors[,2]^2))
[1] 1
Below is a plot of the vectors:
> #Square plot (default is "m" which means maximal region)
> par(pty = "s")
> #Set up some dummy values for plot
> b1<-c(-1,1)
> b2<-c(-1,1)
> plot(x = b1, y = b2, type = "n", main =
expression(paste("Eigenvectors of ", A)),
xlab = expression(b[1]), ylab = expression(b[2]) ,
panel.first=grid(col="gray", lty="dotted"))
> #Run demo(plotmath) for help on mathematical notation
> #draw line on plot - h specifies a horizontal line
> abline(h = 0, lty = "solid", lwd = 2)
> #v specifies a vertical line
> abline(v = 0, lty = "solid", lwd = 2)
> arrows(x0 = 0, y0 = 0, x1 = save.eig$vectors[1,1], y1 =
save.eig$vectors[2,1], col = "red", lty = "solid")
> arrows(x0 = 0, y0 = 0, x1 = save.eig$vectors[1,2], y1 =
MA.32
save.eig$vectors[2,2], col = "red", lty = "solid")
0.0
-1.0
-0.5
b2
0.5
1.0
Eigenvectors of A
-1.0
-0.5
0.0
0.5
1.0
b1
Notes:
 The vectors are orthogonal.
 Other eigenvectors do exist! For example, you can
multiply one of the vectors by any scalar value. If the
scalar was -10, this is what we obtain:
> new.vec<--10*save.eig$vectors[,1]
MA.33
> A%*%new.vec
[,1]
[1,] -10.09515
[2,] -12.92963
> save.eig$values[1]*new.vec
[1] -10.09515 -12.92963
> sqrt(sum(new.vec^2))
[1] 10
MA.34
Positive Definite and Positive Semidefinite Matrices
Definition 4.5: If a symmetric matrix has all of its eigenvalues
positive, the matrix is called a positive definite matrix.
Definition 4.6: If a symmetric matrix has all nonnegative
eigenvalues and if at least one of the eigenvalues is actually
0, then the matrix is called a positive semidefinite matrix.
Definition 4.7: If a matrix is either positive definite or positive
semidefinite, the matrix is defined to be a nonnegative
matrix.
If you are familiar with covariance and correlation
matrices, they are always nonnegative!
Example: Using R (basic_matrix_algebra.R)
All eigenvalues are positive so A is a positive definite
matrix.
Download