Appendix A
A Review of Elementary Matrix Algebra
Dr. B.W. Kennedy originally prepared this review for use alongside his course in Linear Models
in Animal Breeding. His permission to use these notes is gratefully acknowledged. Not all the
operations outlined here are necessary for this course, but most would be necessary for some
applications in animal breeding.
A much more complete treatment of matrix algebra can be found in "Matrix Algebra Useful for
Statistics" by S.R. Searle.
A.1 Definitions
A matrix is an ordered array of numbers. For example, an experimenter might have observations
on a total of 35 animals assigned to three treatments over two trials as follows:
                 Treatment
              1       2       3
  Trial  1    6       3       8
         2    4       9       5
The array of numbers of observations can be written as a matrix as
        [ 6  4 ]
    M = [ 3  9 ]
        [ 8  5 ]
with rows representing treatments (1,2,3) and columns representing trials (1,2).
The numbers of observations then represent the elements of matrix M. The order of a matrix is
the number of rows and columns it consists of. M has order 3 x 2.
A vector is a matrix consisting of a single row or column. For example, observations on 3
animals of 3, 4 and 1, respectively, can be represented as column or row vectors as follows:
A column vector:

        [ 3 ]
    x = [ 4 ]
        [ 1 ]

A row vector:

    x' = [3  4  1]
A scalar is a single number such as 1, 6 or -9.
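
These definitions are easy to experiment with numerically. The following is a minimal sketch in
Python with NumPy (the choice of NumPy is one of convenience; any matrix language would serve),
defining the matrix M and the vector x from above:

    import numpy as np

    # the 3 x 2 matrix of numbers of observations:
    # rows = treatments (1, 2, 3), columns = trials (1, 2)
    M = np.array([[6, 4],
                  [3, 9],
                  [8, 5]])

    x = np.array([[3], [4], [1]])   # a 3 x 1 column vector
    x_t = x.T                       # its transpose, a 1 x 3 row vector

    print(M.shape)   # (3, 2): order 3 x 2
    print(x_t)       # [[3 4 1]]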
A.2 Matrix Operations
A.2.1 Addition
If matrices are of the same order, they are conformable for addition. The sum of two
conformable matrices is the matrix of element-by-element sums of the two matrices. For
example, suppose A represents observations on the first replicate of a 2 x 2 factorial experiment,
B represents observations on a second replicate and we want the sum of each treatment over
replicates. This is given by matrix S = A + B.
    A = [ 2  5 ]        B = [ -4   6 ]
        [ 1  9 ]            [  5  -2 ]

    S = A + B = [ 2-4   5+6 ] = [ -2  11 ]
                [ 1+5   9-2 ]   [  6   7 ]
A.2.2 Subtraction
The difference between two conformable matrices is the matrix of differences element by
element of the two matrices. For example, suppose now we want the difference between
replicate 1 and replicate 2 for each treatment combination, i.e. D = A - B,
    D = A - B = [ 2+4   5-6 ] = [  6  -1 ]
                [ 1-5   9+2 ]   [ -4  11 ]
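
As a quick numerical check of the addition and subtraction examples, a NumPy sketch with the
same A and B:

    import numpy as np

    A = np.array([[2, 5],
                  [1, 9]])
    B = np.array([[-4, 6],
                  [5, -2]])

    S = A + B   # element-by-element sums:        [[-2 11], [ 6  7]]
    D = A - B   # element-by-element differences: [[ 6 -1], [-4 11]]
    print(S)
    print(D)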
A.2.3 Multiplication
Scalar Multiplication
A matrix multiplied by a scalar is the matrix with every element multiplied by the scalar. For
example, suppose A represents a collection of measurements taken on one scale which we would
like to convert to an alternative scale, and the conversion factor is 3.
For a scalar λ = 3,

    λA = 3 [ 2  5 ] = [ 6  15 ]
           [ 1  9 ]   [ 3  27 ]
Vector Multiplication
The product of a row vector with a column vector is a scalar obtained from the sum of the
products of corresponding elements of the vectors. For example, suppose v represents the
number of observations taken on each of 3 animals and that y represents the mean of these
observations on each of the 3 animals and we want the totals for each animal.
                          [ 1 ]
    v' = [3  4  1],   y = [ 5 ]
                          [ 2 ]

    t = v'y = 3(1) + 4(5) + 1(2) = 25.
Matrix Multiplication
Vector multiplication can be extended to the multiplication of a vector with a matrix, which is
simply a collection of vectors. The product of a vector and a matrix is a vector and is obtained as
follows:
e.g.

                           [ 6  4 ]
    v' = [3  4  1],    M = [ 3  9 ]
                           [ 8  5 ]

    v'M = [ 3(6) + 4(3) + 1(8)    3(4) + 4(9) + 1(5) ]

        = [ 38    53 ]
That is, the product with each column (or row) of the matrix is formed as a vector
multiplication.
This can be extended further to the multiplication of matrices. The product of two conformable
matrices is illustrated by the following example:
    A x B = [ 2  5 ] [  4  -6 ]
            [ 1  9 ] [ -5   2 ]

          = [ 2(4) + 5(-5)    2(-6) + 5(2) ]
            [ 1(4) + 9(-5)    1(-6) + 9(2) ]

          = [ -17   -2 ]
            [ -41   12 ]
For matrix multiplication to be conformable, the number of columns of the first matrix must
equal the number of rows of the second matrix.
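
The three kinds of multiplication can be verified with a short NumPy sketch using the matrices
defined above:

    import numpy as np

    A = np.array([[2, 5], [1, 9]])
    B = np.array([[4, -6], [-5, 2]])
    v = np.array([3, 4, 1])
    M = np.array([[6, 4], [3, 9], [8, 5]])

    print(3 * A)    # scalar multiplication: every element times 3
    print(v @ M)    # vector-matrix product: [38 53]
    print(A @ B)    # matrix product: [[-17  -2], [-41  12]]
    # A @ M would raise an error: A has 2 columns but M has 3 rows,
    # so the product is not conformable.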
A.2.4 Transpose
The transpose of a matrix is obtained by replacing rows with corresponding columns and vice
versa, e.g.

         [ 6  4 ]'
    M' = [ 3  9 ]   = [ 6  3  8 ]
         [ 8  5 ]     [ 4  9  5 ]
The transpose of the product of two matrices is the product of the transposes of the matrices
taken in reverse order, e.g.
(AB)’ = B’A’
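
The reverse-order rule is easily checked numerically, e.g. with the A and B of the
multiplication example:

    import numpy as np

    A = np.array([[2, 5], [1, 9]])
    B = np.array([[4, -6], [-5, 2]])

    # (AB)' equals B'A' (and not, in general, A'B')
    print(np.array_equal((A @ B).T, B.T @ A.T))   # True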
A.2.5 Determinants
The determinant of a matrix is a scalar and exists only for square matrices. Knowledge of the
determinant of a matrix is useful for obtaining the inverse of the matrix, which in matrix algebra
is analogous to the reciprocal of scalar algebra. If A is a square matrix, its determinant can be
symbolized as |A|. Procedures for evaluating the determinant of various order matrices follow.
The determinant of a scalar (1 x 1 matrix) is the scalar itself, e.g. for A = 6, |A| = 6. The
determinant of a 2 x 2 matrix is the difference between the product of the diagonal elements and
the product of the off-diagonal elements, e.g. for
    A = [ 5  2 ]
        [ 6  3 ]

    |A| = 5(3) - 6(2) = 3.
The determinant of a 3 x 3 matrix can be obtained from the expansion of three 2 x 2 matrices
obtained from it. Each of the second order determinants is preceded by a coefficient of +1 or -1,
e.g. for
        [ 5  2  4 ]
    A = [ 6  3  1 ]
        [ 8  7  9 ]
Based on elements of the first row,
    |A| = 5(+1) | 3  1 | + 2(-1) | 6  1 | + 4(+1) | 6  3 |
                | 7  9 |         | 8  9 |         | 8  7 |

        = 5(27 - 7) - 2(54 - 8) + 4(42 - 24)

        = 5(20) - 2(46) + 4(18)

        = 100 - 92 + 72 = 80
The determinant was derived by taking in turn each element of the first row, crossing out the row
and column corresponding to the element, obtaining the determinant of the resulting 2 x 2
matrix, multiplying this determinant by +1 or -1 and the element concerned, and summing the
resulting products for each of the three first-row elements. The (+1) or (-1) coefficients for
the ijth element were obtained according to (-1)^(i+j). For example, the coefficient for the 12
element is (-1)^(1+2) = (-1)^3 = -1, and the coefficient for the 13 element is
(-1)^(1+3) = (-1)^4 = 1. The determinants of each of the 2 x 2 sub-matrices are called minors.
For example, the minor of first-row element 2 is
    | 6  1 | = 54 - 8 = 46.
    | 8  9 |

When multiplied by its coefficient of (-1), the product is called the co-factor of element 12.
The co-factors of elements 11, 12 and 13, for example, are 20, -46 and 18.
Expansion by the elements of the second row yields the same determinant, e.g.
    |A| = 6(-1) | 2  4 | + 3(+1) | 5  4 | + 1(-1) | 5  2 |
                | 7  9 |         | 8  9 |         | 8  7 |

        = -6(18 - 28) + 3(45 - 32) - 1(35 - 16)

        = 60 + 39 - 19 = 80
Similarly, expansion by elements of the third row again yields the same determinant, etc.
    |A| = 8(+1) | 2  4 | + 7(-1) | 5  4 | + 9(+1) | 5  2 |
                | 3  1 |         | 6  1 |         | 6  3 |

        = 8(2 - 12) - 7(5 - 24) + 9(15 - 12)

        = -80 + 133 + 27 = 80
In general, multiplying the elements of any row by their co-factors yields the determinant. Also,
multiplying the elements of a row by the co-factors of the elements of another row yields zero,
e.g. the elements of the first row by the co-factors of the second row gives

    5(-1) | 2  4 | + 2(+1) | 5  4 | + 4(-1) | 5  2 |
          | 7  9 |         | 8  9 |         | 8  7 |

    = -5(18 - 28) + 2(45 - 32) - 4(35 - 16)

    = 50 + 26 - 76 = 0
Expansion for larger order matrices follows according to

    |A| = Σ_{j=1}^{n} a_ij (-1)^(i+j) |M_ij|

for any i, where n is the order of the matrix, i = 1, ..., n and j = 1, ..., n, a_ij is the ijth
element, and |M_ij| is the minor of the ijth element.
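
This expansion translates directly into a short recursive routine. The sketch below expands
along the first row exactly as in the worked example (the function name det_cofactor is
introduced here purely for illustration):

    import numpy as np

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row."""
        n = A.shape[0]
        if n == 1:
            return A[0, 0]    # the determinant of a scalar is the scalar itself
        total = 0.0
        for j in range(n):
            # cross out row 1 and column j+1 to obtain the minor M_1j
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
            # a_1j * (-1)^(1+j) * |M_1j|  (j is 0-indexed here)
            total += A[0, j] * (-1) ** j * det_cofactor(minor)
        return total

    A = np.array([[5, 2, 4], [6, 3, 1], [8, 7, 9]])
    print(det_cofactor(A))    # 80.0
    print(np.linalg.det(A))   # 80.0 (up to rounding error)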
A.2.6 Inverse
As suggested earlier, the inverse of a matrix is analogous to the reciprocal in scalar algebra
and performs an operation equivalent to division. The inverse of matrix A is symbolized as A⁻¹.
The multiplication of a matrix by its inverse gives an identity matrix (I), which is composed of
all diagonal elements of one and all off-diagonal elements of zero, i.e. A x A⁻¹ = I. For the
inverse of a matrix to exist, the matrix must be square and have a non-zero determinant.
The inverse of a matrix can be obtained from the co-factors of the elements and the determinant.
The following example illustrates the derivation of the inverse.
        [ 5  2  4 ]
    A = [ 6  3  1 ]
        [ 8  7  9 ]

i)   Calculate the co-factors of each element of the matrix, e.g. the co-factors of the
     elements of the first row are

     (+1) | 3  1 |,   (-1) | 6  1 |,   and (+1) | 6  3 |   =  20, -46 and 18.
          | 7  9 |         | 8  9 |             | 8  7 |
Similarly, the co-factors of the elements of the second row are 10, 13 and -19, and the
co-factors of the elements of the third row are -10, 19 and 3.
ii)  Replace the elements of the matrix by their co-factors, e.g.

         [ 5  2  4 ]              [  20  -46   18 ]
     A = [ 6  3  1 ]  yields  C = [  10   13  -19 ]
         [ 8  7  9 ]              [ -10   19    3 ]

iii) Transpose the matrix of co-factors, e.g.

          [  20  -46   18 ]'   [  20   10  -10 ]
     C' = [  10   13  -19 ]  = [ -46   13   19 ]
          [ -10   19    3 ]    [  18  -19    3 ]
iv)  Multiply the transposed matrix of co-factors by the reciprocal of the determinant to
     yield the inverse, e.g.

     |A| = 80,   1/|A| = 1/80

             1   [  20   10  -10 ]
     A⁻¹ =  ---- [ -46   13   19 ]
             80  [  18  -19    3 ]

v)   As a check, the inverse multiplied by the original matrix should yield an identity
     matrix, i.e. A⁻¹A = I, e.g.

       1   [  20   10  -10 ] [ 5  2  4 ]   [ 1  0  0 ]
      ---- [ -46   13   19 ] [ 6  3  1 ] = [ 0  1  0 ]
       80  [  18  -19    3 ] [ 8  7  9 ]   [ 0  0  1 ]

The inverse of a 2 x 2 matrix is:

    [ a  b ]⁻¹       1      [  d  -b ]
    [ c  d ]    = -------   [ -c   a ]
                  ad - bc
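
The same inverse can be confirmed numerically. np.linalg.inv uses a different algorithm
internally, but returns the same matrix:

    import numpy as np

    A = np.array([[5, 2, 4],
                  [6, 3, 1],
                  [8, 7, 9]])

    A_inv = np.linalg.inv(A)
    print(np.round(A_inv * 80, 6))   # the co-factor result: [[20 10 -10], [-46 13 19], [18 -19 3]]
    print(np.round(A_inv @ A, 6))    # the identity matrix: the inverse times A gives I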
A.2.7 Linear Independence and Rank
As indicated, if the determinant of a matrix is zero, a unique inverse of the matrix does not
exist. The determinant of a matrix is zero if any of its rows or columns are linear combinations
of other rows or columns. In other words, a determinant is zero if the rows or columns do not
form a set of linearly independent vectors. For example, in the following matrix

    [ 5  2  3 ]
    [ 2  2  0 ]
    [ 3  0  3 ]
rows 2 and 3 sum to row 1 and the determinant of the matrix is zero.
The rank of a matrix is the number of linearly independent rows or columns. For example, the
rank of the above matrix is 2. If the rank of matrix A is less than its order n, then the
determinant is zero and the inverse of A does not exist, i.e. if r(A) < n then A⁻¹ does not
exist.
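
A numerical illustration with the singular matrix above:

    import numpy as np

    A = np.array([[5, 2, 3],
                  [2, 2, 0],
                  [3, 0, 3]])

    print(np.linalg.det(A))           # 0 (to rounding error): rows 2 and 3 sum to row 1
    print(np.linalg.matrix_rank(A))   # 2: only two linearly independent rows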
A.2.8 Generalized Inverse
Although a unique inverse does not exist for a matrix of less than full rank, generalized
inverses do exist. If A⁻ is a generalized inverse of A, it satisfies AA⁻A = A. Generalized or
g-inverses are not unique and there are many A⁻ which satisfy AA⁻A = A. There are also many ways
to obtain a g-inverse, but one of the simplest is to follow these steps:
a)  Obtain a full rank subset of A and call it M.
b)  Invert M to yield M⁻¹.
c)  Replace each element in A with the corresponding element of M⁻¹.
d)  Replace all other elements of A with zeros.
e)  The result is A⁻, a generalized inverse of A.
Example

        [ 6  3  2  1 ]
    A = [ 3  3  0  0 ]
        [ 2  0  2  0 ]
        [ 1  0  0  1 ]

a)  M, a full rank subset, is

        [ 3  0  0 ]
    M = [ 0  2  0 ]
        [ 0  0  1 ]

b)
          [ 1/3   0    0 ]
    M⁻¹ = [  0   1/2   0 ]
          [  0    0    1 ]

c, d)  Replacing elements of A with corresponding elements of M⁻¹ and all other elements with
       zeros gives

e)
         [ 0   0    0    0 ]
    A⁻ = [ 0  1/3   0    0 ]
         [ 0   0   1/2   0 ]
         [ 0   0    0    1 ]
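
The same steps in a NumPy sketch (the slicing below assumes, as in the example, that the full
rank subset occupies the last three rows and columns):

    import numpy as np

    A = np.array([[6.0, 3.0, 2.0, 1.0],
                  [3.0, 3.0, 0.0, 0.0],
                  [2.0, 0.0, 2.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0]])

    # a), b): the full rank subset M is the lower-right 3 x 3 block; invert it
    M_inv = np.linalg.inv(A[1:, 1:])

    # c), d), e): put the elements of M-inverse back in place, zeros elsewhere
    A_g = np.zeros_like(A)
    A_g[1:, 1:] = M_inv

    print(np.allclose(A @ A_g @ A, A))   # True: A A- A = A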
A.2.9 Special Matrices
In many applications of statistics we deal with matrices that are the product of a matrix and its
transpose, e.g.
A = X’X
Such matrices are always symmetric, that is every off-diagonal element above the diagonal
equals its counterpart below the diagonal. For such matrices
    X(X'X)⁻X'X = X

and X(X'X)⁻X' is invariant to the choice of (X'X)⁻; that is, although there are many possible
g-inverses of X'X, any g-inverse pre-multiplied by X and post-multiplied by X'X yields the same
matrix X.
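
Both properties can be demonstrated with a small sketch. The design matrix X below is an
illustrative choice (a general mean column plus two group columns, so that X'X is not of full
rank):

    import numpy as np

    # two groups with two records each: X'X is 3 x 3 with rank 2 < 3
    X = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 0.0, 1.0]])
    XtX = X.T @ X

    # g-inverse 1: the full-rank-subset construction of A.2.8
    G1 = np.zeros((3, 3))
    G1[1:, 1:] = np.linalg.inv(XtX[1:, 1:])

    # g-inverse 2: the Moore-Penrose inverse, another valid g-inverse
    G2 = np.linalg.pinv(XtX)

    print(np.allclose(X @ G1 @ XtX, X))              # True
    print(np.allclose(X @ G1 @ X.T, X @ G2 @ X.T))   # True: invariant to the choice of g-inverse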
A.2.10 Trace
The trace of a matrix is the sum of the diagonal elements. For the matrix A of order n with
elements (aij), the trace is defined as
    tr(A) = Σ_{i=1}^{n} a_ii
As an example, the trace of

    [ 3  1  4 ]
    [ 1  6  2 ]
    [ 4  2  5 ]

is

    3 + 6 + 5 = 14
For products of matrices, tr(AB) = tr(BA) if the products are conformable. This can be extended
to the product of three or more matrices, e.g.

    tr(ABC) = tr(BCA) = tr(CAB)
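
Checked numerically with two arbitrary matrices:

    import numpy as np

    A = np.array([[3, 1, 4],
                  [1, 6, 2],
                  [4, 2, 5]])
    B = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

    print(np.trace(A))                        # 14
    print(np.trace(A @ B), np.trace(B @ A))   # equal: tr(AB) = tr(BA)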
A.3 Quadratic Forms
All sums of squares can be expressed as quadratic forms, that is, in the form y'Ay. If

    y ~ (m, V),

then

    E(y'Ay) = m'Am + tr(AV).
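
A simulation sketch of this result (the values of m, V and A below are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(1)

    m = np.array([1.0, 2.0])                  # mean vector
    V = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                # (co)variance matrix
    A = np.array([[1.0, 0.0],
                  [0.0, 3.0]])                # matrix of the quadratic form

    y = rng.multivariate_normal(m, V, size=200_000)
    q = np.einsum('ij,jk,ik->i', y, A, y)     # y'Ay for each sampled y

    print(q.mean())                           # simulated E(y'Ay), close to 18
    print(m @ A @ m + np.trace(A @ V))        # m'Am + tr(AV) = 13 + 5 = 18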
Exercises
1. For

        [  6  3 ]        [ 3   8 ]
    A = [  0  5 ]    B = [ 2  -4 ]
        [ -5  1 ]        [ 5  -1 ]

find the sum A + B and the difference A - B.
2. For A and B above and v’ = [1 3 -1], find v’A and v’B.
3. For

    B' = [ 3   2   5 ]
         [ 8  -4  -1 ]

and A as above, find B'A and AB'.
4. For A and B above, find AB.
5. Obtain determinants of the following matrices:

    [ 3   8 ]
    [ 2  -4 ]

    [  6  3 ]
    [  1  5 ]
    [ -5  2 ]

    [ 1  1  3 ]
    [ 2  3  0 ]
    [ 4  5  6 ]

    [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8  9 ]

    [ 1  6  4  3 ]
    [ 2  8  5  4 ]
    [ 3  8  7  5 ]
    [ 4  9  7  7 ]
6. Show that the solution to Ax = y is x = A⁻¹y.
7. Derive the inverses of the following matrices:

    [ 4  2 ]
    [ 6  1 ]

    [  2  1   3 ]
    [ -5  1   0 ]
    [  1  4  -2 ]

    [ 3  0  0  0 ]
    [ 0  4  0  0 ]
    [ 0  0  2  0 ]
    [ 0  0  0  5 ]

8. For

        [ 1  1  3 ]            [ 1  2  3 ]
    A = [ 2  3  0 ]    and B = [ 4  5  6 ],
        [ 4  5  6 ]            [ 7  8  9 ]

show that tr(AB) = tr(BA).
Appendix B
A Few Useful Standard Forms of
(Co) Variances and Derivatives
This is a list of some of the more useful derivations of (co)variances of simple functions that are
used in the course or might be used in particular problems. Some standard derivatives also follow.
B.1 (Co) Variances
NOTATION: V = variance, W = covariance.
V(ax) = a²Vx, where a is a constant

V(x + y) = Vx + Vy + 2Wxy

V(x - y) = Vx + Vy - 2Wxy

V(x1 + x2 + ... + xn) = Vx1 + Vx2 + ... + Vxn + 2Wx1x2 + 2Wx1x3 + ... + 2Wxn-1xn

W(ax, by) = abWxy, where a and b are constants

V(x/y) = (1/y²)Vx + (x²/y⁴)Vy - 2(x/y³)Wxy      (1st order approximation)

V(xy) = y²Vx + x²Vy + 2xyWxy                    (1st order approximation)

W(x, xy) = yVx + xWxy
If T = f(x) and S = f(y), then

    VT = (∂T/∂x)² Vx                            (1st order approximation)

and if T = f(x1, x2, ..., xn) and S = f(y1, y2, ..., ym), then

    VT = Σ_{i=1}^{n} Σ_{j=1}^{n} (∂T/∂xi)(∂T/∂xj) Wxixj

and

    W(T,S) = Σ_{i=1}^{n} Σ_{j=1}^{m} (∂T/∂xi)(∂S/∂yj) Wxiyj

Note that, as always, the derivation of a variance is just a special case of derivation of a
covariance. Also, both can be put in matrix notation as

    W(T,S) = t'Ws

where t is a vector of length n with elements (∂T/∂xi), s is a vector of length m with elements
(∂S/∂yj), and W is an n x m matrix of covariances among the x and y.
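
As an illustration, the formula for V(x/y) in B.1 is just t'Wt with t = (1/y, -x/y²) evaluated
at the means. A simulation sketch with illustrative numbers:

    import numpy as np

    rng = np.random.default_rng(2)

    mx, my = 10.0, 5.0
    W = np.array([[0.4, 0.1],     # [ Vx   Wxy ]
                  [0.1, 0.2]])    # [ Wxy  Vy  ]

    # gradient of T = x/y at the means: (dT/dx, dT/dy) = (1/y, -x/y^2)
    t = np.array([1 / my, -mx / my**2])

    print(t @ W @ t)    # 1st order approximation of V(x/y): 0.032

    # simulation check (agreement is approximate, as the formula is 1st order)
    xy = rng.multivariate_normal([mx, my], W, size=500_000)
    print(np.var(xy[:, 0] / xy[:, 1]))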
B.2 Differentiation
Recap on Basic Differentiation
A small change in a variable, x, is usually denoted as Δx or as δx. Where two variables, say x
and u, are functionally related, we have

    u = f(x)

where f(x) denotes a function of x. A small change in x, Δx (or δx), causes a small change in u,
Δu (or δu). In the limit, as Δx and hence Δu become very small, the ratio of the changes in the
two variables tends to a limit,

    lim    Δu/Δx  =  ∂u/∂x,
    Δx→0

which is the rate of change of u with respect to x, known as the derivative of u with respect
to x.
Some Common Differentials
(All logs to base e)

    u = ax^n:                                ∂u/∂x = anx^(n-1)

    u = log x:                               ∂u/∂x = 1/x

    u = log v, where v = f(x):               ∂u/∂x = (1/v)(∂v/∂x)

    u = ae^v, where v = f(x):                ∂u/∂x = ae^v (∂v/∂x)

    u = w/y, where w = f1(x), y = f2(x):     ∂u/∂x = (y ∂w/∂x - w ∂y/∂x) / y²
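
The quotient rule in the last line can be verified symbolically, assuming the SymPy package is
available:

    import sympy as sp

    x = sp.symbols('x')
    w = sp.Function('w')(x)
    y = sp.Function('y')(x)

    # d(w/y)/dx should equal (y dw/dx - w dy/dx) / y^2
    lhs = sp.diff(w / y, x)
    rhs = (y * sp.diff(w, x) - w * sp.diff(y, x)) / y**2
    print(sp.simplify(lhs - rhs))   # 0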
Appendix C
Generation of Correlated
Random Variates
A General Method
We wish to generate a vector, p, of random variables which have a multivariate normal
distribution such that the variance/covariance matrix among the variables is

    V(p) = V                                          (C.1)

Define a matrix L such that

    LL' = V                                           (C.2)

then create a vector, w, of random normal deviates with mean zero and variance 1, so that

    V(w) = I                                          (C.3)

Then p is obtained as

    p = Lw                                            (C.4)

which has the variance

    V(p) = V(Lw) = LV(w)L' = LIL' = LL'

and from (C.2)

    V(p) = V

as it should.
Using Cholesky Decomposition
There are several possible solutions for L, one of which is the Cholesky decomposition of V,
which is of lower triangular form, such that, for 3 variables say,

        [ l11   0    0  ]
    L = [ l21  l22   0  ]                             (C.5)
        [ l31  l32  l33 ]

and p then takes the form

             [ l11w1                 ]
    p = Lw = [ l21w1 + l22w2         ]                (C.6)
             [ l31w1 + l32w2 + l33w3 ]
Thus the elements of p can be found sequentially from w1 to w3 without the need to store w or to
carry out multiplications involving zero. This, along with the fact that L is lower triangular
and can therefore be stored efficiently, can yield considerable reductions in storage and
computing time for large problems.
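
In code, the whole method takes only a few lines (a NumPy sketch; the matrix V below is an
illustrative example):

    import numpy as np

    rng = np.random.default_rng(3)

    # target (co)variance matrix V among 3 variables
    V = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

    L = np.linalg.cholesky(V)                 # lower triangular, LL' = V
    w = rng.standard_normal((3, 100_000))     # independent N(0,1) deviates
    p = L @ w                                 # correlated variates with V(p) = V

    print(np.round(np.cov(p), 2))             # close to V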
An Intuitive Example
Since the workings of the Cholesky decomposition may not be obvious at first sight, consider a
simple example of two random variables with variances σ1² and σ2² and covariance σ12. We could
simulate p1 and p2 by first generating p1 and then recalling from regression theory that p2 can
be written as

    p2 = bp1 + ep2

where b is the regression of p2 on p1 when both are expressed as deviations from their
population means, and ep2 is that part of p2 which is not dependent on p1, which has mean zero
and variance σ2²(1 - r12²) (i.e. the residual error variance of p2 after allowance for
correlation with p1).
If p1 is generated from a random normal deviate, w1, with variance 1, then

    p1 = σ1w1

Hence,

    bp1 = (σ12/σ1²) σ1w1 = (σ12/σ1) w1 = σ2r12w1      (C.7)

Similarly,

    ep2 = σep2 w2 = σ2 √(1 - r12²) w2                 (C.8)

Thus we have,

        [ p1 ]   [ σ1w1                        ]
    p = [    ] = [                             ]      (C.9)
        [ p2 ]   [ σ2r12w1 + σ2 √(1 - r12²) w2 ]
Now consider the Cholesky decomposition,

    LL' = [ l11   0  ] [ l11  l21 ] = [ l11²     l11l21      ]
          [ l21  l22 ] [  0   l22 ]   [ l11l21   l21² + l22² ]

but

    LL' = V = [ σ1²   σ12 ]
              [ σ12   σ2² ]

hence

    l11 = σ1

    l21 = σ12/l11 = σ12/σ1 = σ2r12

    l22 = √(σ2² - l21²) = σ2 √(1 - r12²)
Substituting these into (C.6), we get

             [ p1 ]   [ σ1w1                        ]
    p = Lw = [    ] = [                             ]      (C.10)
             [ p2 ]   [ σ2r12w1 + σ2 √(1 - r12²) w2 ]
which is exactly the same solution as (C.9).
The Cholesky decomposition can thus be seen to break down the variance of each variable in turn
into its variance conditional on previous traits (i.e. the variance of that trait which is in
some way associated with the variance of previous traits) and the unconditional variance of that
trait (i.e. the variance of that trait which is completely independent of all previous traits).