Appendix A  A Review of Elementary Matrix Algebra

Dr. B.W. Kennedy originally prepared this review for use alongside his course in Linear Models in Animal Breeding. His permission to use these notes is gratefully acknowledged. Not all the operations outlined here are necessary for this course, but most would be necessary for some applications in animal breeding. A much more complete treatment of matrix algebra can be found in "Matrix Algebra Useful for Statistics" by S.R. Searle.

A.1 Definitions

A matrix is an ordered array of numbers. For example, an experimenter might have observations on a total of 35 animals assigned to three treatments over two trials as follows:

                 Treatment
              1      2      3
  Trial 1     6      3      8
        2     4      9      5

The array of numbers of observations can be written as a matrix as

\[ M = \begin{bmatrix} 6 & 4 \\ 3 & 9 \\ 8 & 5 \end{bmatrix} \]

with rows representing treatments (1, 2, 3) and columns representing trials (1, 2). The numbers of observations are then the elements of the matrix M. The order of a matrix is the number of rows and columns it consists of; M has order 3 x 2.

A vector is a matrix consisting of a single row or column. For example, observations on 3 animals of 3, 4 and 1, respectively, can be represented as a column or a row vector as follows:

A column vector:
\[ x = \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \]

A row vector:
\[ x' = [3 \quad 4 \quad 1] \]

A scalar is a single number such as 1, 6 or -9.

A.2 Matrix Operations

A.2.1 Addition

If matrices are of the same order, they are conformable for addition. The sum of two conformable matrices is the matrix of element-by-element sums of the two matrices. For example, suppose A represents observations on the first replicate of a 2 x 2 factorial experiment, B represents observations on a second replicate, and we want the sum of each treatment combination over replicates. This is given by the matrix S = A + B:

\[ A = \begin{bmatrix} 2 & 5 \\ 1 & 9 \end{bmatrix}, \quad B = \begin{bmatrix} -4 & 6 \\ 5 & -2 \end{bmatrix}, \]

\[ S = A + B = \begin{bmatrix} 2-4 & 5+6 \\ 1+5 & 9-2 \end{bmatrix} = \begin{bmatrix} -2 & 11 \\ 6 & 7 \end{bmatrix}. \]

A.2.2 Subtraction

The difference between two conformable matrices is the matrix of element-by-element differences of the two matrices. For example, suppose now we want the difference between replicate 1 and replicate 2 for each treatment combination, i.e. D = A - B:

\[ D = A - B = \begin{bmatrix} 2+4 & 5-6 \\ 1-5 & 9+2 \end{bmatrix} = \begin{bmatrix} 6 & -1 \\ -4 & 11 \end{bmatrix}. \]

A.2.3 Multiplication

Scalar Multiplication

A matrix multiplied by a scalar is the matrix with every element multiplied by the scalar. For example, suppose A represents a collection of measurements taken on one scale which we would like to convert to an alternative scale, and the conversion factor is 3. For the scalar \( \lambda = 3 \),

\[ \lambda A = 3 \begin{bmatrix} 2 & 5 \\ 1 & 9 \end{bmatrix} = \begin{bmatrix} 6 & 15 \\ 3 & 27 \end{bmatrix}. \]

Vector Multiplication

The product of a row vector with a column vector is a scalar obtained as the sum of the products of corresponding elements of the two vectors. For example, suppose v represents the number of observations taken on each of 3 animals, y represents the mean of these observations for each of the 3 animals, and we want the total of all the observations:

\[ v' = [3 \quad 4 \quad 1], \quad y = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}, \]

\[ t = v'y = 3(1) + 4(5) + 1(2) = 25. \]
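These elementwise operations map directly onto standard numerical software. The following is a minimal sketch in Python with NumPy (an illustration added to these notes, not part of the original) reproducing the examples above; the names A, B, v and y follow the text.

```python
import numpy as np

A = np.array([[2, 5], [1, 9]])
B = np.array([[-4, 6], [5, -2]])

S = A + B        # element-by-element sum: [[-2, 11], [6, 7]]
D = A - B        # element-by-element difference: [[6, -1], [-4, 11]]
lam_A = 3 * A    # scalar multiplication: [[6, 15], [3, 27]]

v = np.array([3, 4, 1])   # number of observations per animal
y = np.array([1, 5, 2])   # mean observation per animal
t = v @ y                 # row vector times column vector: 3*1 + 4*5 + 1*2 = 25

print(S, D, lam_A, t, sep="\n")
```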
Matrix Multiplication

Vector multiplication can be extended to the multiplication of a vector with a matrix, which is simply a collection of vectors. The product of a vector and a matrix is itself a vector, obtained by treating each column of the matrix as a vector multiplication, e.g.

\[ v' = [3 \quad 4 \quad 1], \quad M = \begin{bmatrix} 6 & 4 \\ 3 & 9 \\ 8 & 5 \end{bmatrix}, \]

\[ v'M = [3(6)+4(3)+1(8) \quad\ 3(4)+4(9)+1(5)] = [38 \quad 53]. \]

This can be extended further to the multiplication of matrices. The product of two conformable matrices is illustrated by the following example:

\[ AB = \begin{bmatrix} 2 & 5 \\ 1 & 9 \end{bmatrix} \begin{bmatrix} 4 & -6 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 2(4)+5(-5) & 2(-6)+5(2) \\ 1(4)+9(-5) & 1(-6)+9(2) \end{bmatrix} = \begin{bmatrix} -17 & -2 \\ -41 & 12 \end{bmatrix}. \]

For matrix multiplication to be conformable, the number of columns of the first matrix must equal the number of rows of the second matrix.

A.2.4 Transpose

The transpose of a matrix is obtained by replacing rows with corresponding columns and vice versa, e.g.

\[ M = \begin{bmatrix} 6 & 4 \\ 3 & 9 \\ 8 & 5 \end{bmatrix}, \quad M' = \begin{bmatrix} 6 & 3 & 8 \\ 4 & 9 & 5 \end{bmatrix}. \]

The transpose of the product of two matrices is the product of the transposes of the matrices taken in reverse order, i.e. (AB)' = B'A'.

A.2.5 Determinants

The determinant of a matrix is a scalar and exists only for square matrices. Knowledge of the determinant of a matrix is useful for obtaining the inverse of the matrix, which in matrix algebra is analogous to the reciprocal of scalar algebra. If A is a square matrix, its determinant is symbolized as |A|. Procedures for evaluating the determinants of matrices of various orders follow.

The determinant of a scalar (1 x 1 matrix) is the scalar itself, e.g. for A = 6, |A| = 6.

The determinant of a 2 x 2 matrix is the difference between the product of the diagonal elements and the product of the off-diagonal elements, e.g. for

\[ A = \begin{bmatrix} 5 & 2 \\ 6 & 3 \end{bmatrix}, \quad |A| = 5(3) - 6(2) = 3. \]

The determinant of a 3 x 3 matrix can be obtained from the expansion of three 2 x 2 determinants obtained from it. Each of the second-order determinants is preceded by a coefficient of +1 or -1, e.g. for

\[ A = \begin{bmatrix} 5 & 2 & 4 \\ 6 & 3 & 1 \\ 8 & 7 & 9 \end{bmatrix}, \]

based on the elements of the first row,

\[ |A| = 5(+1)\begin{vmatrix} 3 & 1 \\ 7 & 9 \end{vmatrix} + 2(-1)\begin{vmatrix} 6 & 1 \\ 8 & 9 \end{vmatrix} + 4(+1)\begin{vmatrix} 6 & 3 \\ 8 & 7 \end{vmatrix} \]
\[ = 5(27 - 7) - 2(54 - 8) + 4(42 - 24) = 5(20) - 2(46) + 4(18) = 100 - 92 + 72 = 80. \]

The determinant was derived by taking in turn each element of the first row, crossing out the row and column containing that element, obtaining the determinant of the resulting 2 x 2 matrix, multiplying this determinant by +1 or -1 and by the element concerned, and summing the resulting products over the three first-row elements. The (+1) or (-1) coefficient for the ij-th element is obtained as \( (-1)^{i+j} \). For example, the coefficient for element 12 is \( (-1)^{1+2} = (-1)^3 = -1 \), and the coefficient for element 13 is \( (-1)^{1+3} = (-1)^4 = +1 \).

The determinants of the 2 x 2 sub-matrices are called minors. For example, the minor of first-row element 2 is

\[ \begin{vmatrix} 6 & 1 \\ 8 & 9 \end{vmatrix} = 46. \]

When a minor is multiplied by its \( (-1)^{i+j} \) coefficient, the product is called the co-factor of that element. For example, the co-factors of elements 11, 12 and 13 are 20, -46 and 18.

Expansion by the elements of the second row yields the same determinant, e.g.

\[ |A| = 6(-1)\begin{vmatrix} 2 & 4 \\ 7 & 9 \end{vmatrix} + 3(+1)\begin{vmatrix} 5 & 4 \\ 8 & 9 \end{vmatrix} + 1(-1)\begin{vmatrix} 5 & 2 \\ 8 & 7 \end{vmatrix} \]
\[ = -6(18 - 28) + 3(45 - 32) - (35 - 16) = 60 + 39 - 19 = 80. \]

Similarly, expansion by elements of the third row again yields the same determinant:

\[ |A| = 8(+1)\begin{vmatrix} 2 & 4 \\ 3 & 1 \end{vmatrix} + 7(-1)\begin{vmatrix} 5 & 4 \\ 6 & 1 \end{vmatrix} + 9(+1)\begin{vmatrix} 5 & 2 \\ 6 & 3 \end{vmatrix} \]
\[ = 8(2 - 12) - 7(5 - 24) + 9(15 - 12) = -80 + 133 + 27 = 80. \]

In general, multiplying the elements of any row by their co-factors and summing yields the determinant.

Also, multiplying the elements of a row by the co-factors of the elements of another row yields zero, e.g. the elements of the first row by the co-factors of the second row give

\[ 5(-1)\begin{vmatrix} 2 & 4 \\ 7 & 9 \end{vmatrix} + 2(+1)\begin{vmatrix} 5 & 4 \\ 8 & 9 \end{vmatrix} + 4(-1)\begin{vmatrix} 5 & 2 \\ 8 & 7 \end{vmatrix} \]
\[ = -5(18 - 28) + 2(45 - 32) - 4(35 - 16) = 50 + 26 - 76 = 0. \]

Expansion for matrices of larger order follows according to

\[ |A| = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} |M_{ij}| \quad \text{for any } i, \]

where n is the order of the matrix, i = 1, ..., n and j = 1, ..., n, \( a_{ij} \) is the ij-th element, and \( |M_{ij}| \) is the minor of the ij-th element.
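The expansion above translates directly into a recursive routine. Below is a short NumPy sketch (an illustration added to these notes, not from the original) of determinant evaluation by co-factor expansion along the first row; it reproduces |A| = 80 for the 3 x 3 example.

```python
import numpy as np

def det_by_cofactors(A):
    """Determinant by co-factor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]   # the determinant of a scalar is the scalar itself
    total = 0.0
    for j in range(n):
        # Minor: delete row 0 and column j, then take its determinant.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += A[0, j] * (-1) ** j * det_by_cofactors(minor)
    return total

A = np.array([[5, 2, 4],
              [6, 3, 1],
              [8, 7, 9]])
print(det_by_cofactors(A))   # 80.0
print(np.linalg.det(A))      # ~80.0, as an independent check
```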
A.2.6 Inverse

As suggested earlier, the inverse of a matrix is analogous to the reciprocal in scalar algebra and performs the equivalent operation to division. The inverse of matrix A is symbolized as \( A^{-1} \). The multiplication of a matrix by its inverse gives an identity matrix (I), which is composed of all diagonal elements of one and all off-diagonal elements of zero, i.e. \( AA^{-1} = I \). For the inverse of a matrix to exist, the matrix must be square and have a non-zero determinant.

The inverse of a matrix can be obtained from the co-factors of the elements and the determinant. The following example illustrates the derivation of the inverse of

\[ A = \begin{bmatrix} 5 & 2 & 4 \\ 6 & 3 & 1 \\ 8 & 7 & 9 \end{bmatrix}. \]

i) Calculate the co-factors of each element of the matrix. The co-factors of the elements of the first row are

\[ (+1)\begin{vmatrix} 3 & 1 \\ 7 & 9 \end{vmatrix}, \quad (-1)\begin{vmatrix} 6 & 1 \\ 8 & 9 \end{vmatrix} \quad \text{and} \quad (+1)\begin{vmatrix} 6 & 3 \\ 8 & 7 \end{vmatrix} = 20, -46 \text{ and } 18. \]

Similarly, the co-factors of the elements of the second row are 10, 13 and -19, and the co-factors of the elements of the third row are -10, 19 and 3.

ii) Replace the elements of the matrix by their co-factors, e.g.

\[ A = \begin{bmatrix} 5 & 2 & 4 \\ 6 & 3 & 1 \\ 8 & 7 & 9 \end{bmatrix} \quad \text{yields} \quad C = \begin{bmatrix} 20 & -46 & 18 \\ 10 & 13 & -19 \\ -10 & 19 & 3 \end{bmatrix}. \]

iii) Transpose the matrix of co-factors:

\[ C' = \begin{bmatrix} 20 & -46 & 18 \\ 10 & 13 & -19 \\ -10 & 19 & 3 \end{bmatrix}' = \begin{bmatrix} 20 & 10 & -10 \\ -46 & 13 & 19 \\ 18 & -19 & 3 \end{bmatrix}. \]

iv) Multiply the transposed matrix of co-factors by the reciprocal of the determinant to yield the inverse. Here |A| = 80, so 1/|A| = 1/80 and

\[ A^{-1} = \frac{1}{80} \begin{bmatrix} 20 & 10 & -10 \\ -46 & 13 & 19 \\ 18 & -19 & 3 \end{bmatrix}. \]

v) As a check, the inverse multiplied by the original matrix should yield an identity matrix, i.e. \( A^{-1}A = I \):

\[ \frac{1}{80} \begin{bmatrix} 20 & 10 & -10 \\ -46 & 13 & 19 \\ 18 & -19 & 3 \end{bmatrix} \begin{bmatrix} 5 & 2 & 4 \\ 6 & 3 & 1 \\ 8 & 7 & 9 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

The inverse of a 2 x 2 matrix is

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]

A.2.7 Linear Independence and Rank

As indicated, if the determinant of a matrix is zero, a unique inverse of the matrix does not exist. The determinant of a matrix is zero if any of its rows or columns are linear combinations of other rows or columns. In other words, a determinant is zero if the rows or columns do not form a set of linearly independent vectors. For example, in the following matrix

\[ \begin{bmatrix} 5 & 2 & 3 \\ 2 & 2 & 0 \\ 3 & 0 & 3 \end{bmatrix} \]

rows 2 and 3 sum to row 1, and the determinant of the matrix is zero.

The rank of a matrix is the number of linearly independent rows or columns. For example, the rank of the above matrix is 2. If the rank of matrix A is less than its order n, then the determinant is zero and the inverse of A does not exist, i.e. if r(A) < n then \( A^{-1} \) does not exist.
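Steps i) to iv) can be checked numerically. Below is a Python/NumPy sketch (added for illustration; the function name adjugate_inverse is ours) of the co-factor, or adjugate, method of inversion, together with the rank of the singular example above:

```python
import numpy as np

def adjugate_inverse(A):
    """Inverse via co-factors: A^{-1} = (1/|A|) C', following steps i)-iv)."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)  # co-factor of element ij
    return C.T / np.linalg.det(A)  # transpose of co-factors over the determinant

A = np.array([[5, 2, 4],
              [6, 3, 1],
              [8, 7, 9]])
Ainv = adjugate_inverse(A)
print(np.allclose(Ainv @ A, np.eye(3)))   # True: A^{-1} A = I

# The singular example: rows 2 and 3 sum to row 1, so rank 2 and |A| = 0.
S = np.array([[5, 2, 3],
              [2, 2, 0],
              [3, 0, 3]])
print(np.linalg.matrix_rank(S), round(np.linalg.det(S), 10))   # 2  0.0
```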
A.2.8 Generalized Inverse

Although a unique inverse does not exist for a matrix of less than full rank, generalized inverses do exist. If \( A^- \) is a generalized inverse of A, it satisfies \( AA^-A = A \). Generalized or g-inverses are not unique, and there are many \( A^- \) which satisfy \( AA^-A = A \).

There are also many ways to obtain a g-inverse, but one of the simplest is to follow these steps:

a) Obtain a full rank subset of A and call it M.
b) Invert M to yield \( M^{-1} \).
c) Replace each element of A corresponding to an element of M with the corresponding element of \( M^{-1} \).
d) Replace all other elements of A with zeros.
e) The result is \( A^- \), a generalized inverse of A.

Example:

\[ A = \begin{bmatrix} 6 & 3 & 2 & 1 \\ 3 & 3 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \]

a) A full rank subset is

\[ M = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

b) \[ M^{-1} = \begin{bmatrix} 1/3 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

c), d) Replacing the elements of A with the corresponding elements of \( M^{-1} \) and all other elements with zeros gives

e) \[ A^- = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1/3 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
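As a check, this example satisfies the defining property \( AA^-A = A \). A brief NumPy verification (our illustration, not part of the original notes):

```python
import numpy as np

A = np.array([[6, 3, 2, 1],
              [3, 3, 0, 0],
              [2, 0, 2, 0],
              [1, 0, 0, 1]], dtype=float)

# Full rank subset M = rows/columns 2-4 of A; invert it and embed in zeros.
Ag = np.zeros((4, 4))
Ag[1:, 1:] = np.linalg.inv(A[1:, 1:])   # diag(1/3, 1/2, 1)

print(np.allclose(A @ Ag @ A, A))   # True: A A^- A = A
print(np.linalg.matrix_rank(A))     # 3 < 4, so no unique inverse exists
```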
A.2.9 Special Matrices

In many applications of statistics we deal with matrices that are the product of a matrix and its transpose, e.g. A = X'X. Such matrices are always symmetric; that is, every off-diagonal element above the diagonal equals its counterpart below the diagonal. For such matrices,

\[ X(X'X)^- X'X = X \]

and \( X(X'X)^-X' \) is invariant to the choice of \( (X'X)^- \); that is, although there are many possible g-inverses of X'X, any g-inverse pre-multiplied by X and post-multiplied by X'X yields the same matrix X.

A.2.10 Trace

The trace of a matrix is the sum of the diagonal elements. For the matrix A of order n with elements \( a_{ij} \), the trace is defined as

\[ \mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}. \]

As an example, the trace of

\[ \begin{bmatrix} 3 & 1 & 4 \\ 1 & 6 & 2 \\ 4 & 2 & 5 \end{bmatrix} \]

is 3 + 6 + 5 = 14.

For products of matrices, tr(AB) = tr(BA) if the products are conformable. This can be extended to the product of three or more matrices under cyclic permutation, e.g. tr(ABC) = tr(BCA) = tr(CAB).

A.3 Quadratic Forms

All sums of squares can be expressed as quadratic forms, that is, as y'Ay. If y ~ (m, V), then

\[ E(y'Ay) = m'Am + \mathrm{tr}(AV). \]

Exercises

1. For

\[ A = \begin{bmatrix} 6 & 3 \\ 0 & 5 \\ -5 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 8 \\ 2 & -4 \\ 5 & -1 \end{bmatrix}, \]

find the sum A + B. Find the difference A - B.

2. For A and B above and v' = [1  3  -1], find v'A and v'B.

3. For

\[ B' = \begin{bmatrix} 3 & 2 & 5 \\ 8 & -4 & -1 \end{bmatrix} \]

and A as above, find B'A. Find AB'.

4. For A and B above, find AB.

5. Obtain the determinants of the following matrices:

\[ \begin{bmatrix} 3 & 8 \\ 2 & -4 \end{bmatrix}, \quad \begin{bmatrix} 6 & 3 \\ 1 & 5 \\ -5 & 2 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 3 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad \begin{bmatrix} 1 & 6 & 4 & 3 \\ 2 & 8 & 5 & 4 \\ 3 & 8 & 7 & 5 \\ 4 & 9 & 7 & 7 \end{bmatrix}. \]

6. Show that the solution to Ax = y is \( x = A^{-1}y \).

7. Derive the inverses of the following matrices:

\[ \begin{bmatrix} 4 & 2 \\ 6 & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 3 \\ -5 & 1 & 0 \\ 1 & 4 & -2 \end{bmatrix}, \quad \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}. \]

8. For

\[ A = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \]

show that tr(AB) = tr(BA).

Appendix B  A Few Useful Standard Forms of (Co)Variances and Derivatives

This is a list of some of the more useful derivations of (co)variances of simple functions that are used in the course or might be used in particular problems. Some standard derivatives also follow.

B.1 (Co)Variances

NOTATION: V = variance, W = covariance.

\[ V_{ax} = a^2 V_x, \text{ where } a \text{ is a constant} \]

\[ V_{(x+y)} = V_x + V_y + 2W_{xy} \]

\[ V_{(x-y)} = V_x + V_y - 2W_{xy} \]

\[ V_{(x_1 + x_2 + \cdots + x_n)} = V_{x_1} + V_{x_2} + \cdots + V_{x_n} + 2W_{x_1 x_2} + \cdots + 2W_{x_{n-1} x_n} \]

(with a 2W term for every pair of variables)

\[ W_{ax,by} = ab\,W_{xy}, \text{ where } a \text{ and } b \text{ are constants} \]

\[ V_{x/y} = \frac{1}{y^2} V_x + \left(\frac{x}{y^2}\right)^2 V_y - 2\frac{x}{y^3} W_{xy} = \frac{y^2 V_x + x^2 V_y - 2xy\,W_{xy}}{y^4} \quad \text{(1st order approximation)} \]

\[ W_{(x,xy)} = y V_x + x W_{xy} \quad \text{(1st order approximation)} \]

If T = f(x) and S = f(y), then

\[ V_T = \left(\frac{\partial T}{\partial x}\right)^2 V_x \quad \text{(1st order approximation)} \]

and if \( T = f(x_1, x_2, \ldots, x_n) \) and \( S = f(y_1, y_2, \ldots, y_m) \), then

\[ V_T = \sum_{i=1}^{n} \sum_{j=1}^{n} \left(\frac{\partial T}{\partial x_i}\right) \left(\frac{\partial T}{\partial x_j}\right) W_{x_i x_j} \]

and

\[ W_{(T,S)} = \sum_{i=1}^{n} \sum_{j=1}^{m} \left(\frac{\partial T}{\partial x_i}\right) \left(\frac{\partial S}{\partial y_j}\right) W_{x_i y_j}. \]

Note that, as always, the derivation of a variance is just a special case of the derivation of a covariance. Also, both can be put in matrix notation as

\[ W_{(T,S)} = t'Ws \]

where t is a vector of length n with elements \( \partial T / \partial x_i \), s is a vector of length m with elements \( \partial S / \partial y_i \), and W is the n x m matrix of covariances among the x and y.
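The first-order approximations above are easy to check by simulation. The following sketch (our addition, not part of the original notes; all numerical values are arbitrary) compares the standard form for \( V_{x/y} \) with the empirical variance of a simulated ratio:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate correlated (x, y) with known means, variances and covariance.
mx, my = 10.0, 5.0
Vx, Vy, Wxy = 4.0, 1.0, 1.2
cov = np.array([[Vx, Wxy], [Wxy, Vy]])
x, y = rng.multivariate_normal([mx, my], cov, size=200_000).T

# First-order (delta method) approximation for V(x/y), evaluated at the means.
V_approx = (my**2 * Vx + mx**2 * Vy - 2 * mx * my * Wxy) / my**4

print(V_approx)        # approximation from the standard form (0.128 here)
print(np.var(x / y))   # empirical variance; close, but not identical, since
                       # the approximation ignores higher-order terms
```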
B.2 Differentiation

Recap on Basic Differentiation

A small change in a variable x is usually denoted as \( \Delta x \) or as dx. Where two variables, say x and u, are functionally related, we have u = f(x), where f(x) denotes a function of x. A small change in x, \( \Delta x \) (or dx), causes a small change in u, \( \Delta u \) (or du). In the limit, as \( \Delta x \) and hence \( \Delta u \) become very small, the ratio of the changes in the two variables tends to a limit,

\[ \frac{du}{dx} = \lim_{\Delta x \to 0} \frac{\Delta u}{\Delta x}, \]

which is the rate of change of u with respect to x, known as the differential of u with respect to x.

Some Common Differentials (all logs to base e)

\[ u = ax^n: \quad \frac{du}{dx} = anx^{n-1} \]

\[ u = \log x: \quad \frac{du}{dx} = \frac{1}{x} \]

\[ u = \log v, \text{ where } v = f(x): \quad \frac{du}{dx} = \frac{1}{v}\frac{dv}{dx} \]

\[ u = ae^v, \text{ where } v = f(x): \quad \frac{du}{dx} = ae^v \frac{dv}{dx} \]

\[ u = \frac{w}{y}, \text{ where } w = f_1(x), \ y = f_2(x): \quad \frac{du}{dx} = \frac{y\frac{dw}{dx} - w\frac{dy}{dx}}{y^2} \]

Appendix C  Generation of Correlated Random Variates

A General Method

We wish to generate a vector, p, of random variables which have a multivariate normal distribution such that the variance/covariance matrix among the variables is

\[ V(p) = V. \tag{C.1} \]

Define a matrix L such that

\[ LL' = V, \tag{C.2} \]

then create a vector, w, of random normal deviates with mean zero and variance 1, so that

\[ V(w) = I. \tag{C.3} \]

Then p is obtained as

\[ p = Lw, \tag{C.4} \]

which has the variance

\[ V(p) = V(Lw) = LV(w)L' = LIL' = LL', \]

and from (C.2), V(p) = V, as it should.

Using Cholesky Decomposition

There are several possible solutions for L, one of which is the Cholesky decomposition of V, which is of lower triangular form, such that, for 3 variables say,

\[ L = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \tag{C.5} \]

and p then takes the form

\[ p = Lw = \begin{bmatrix} l_{11}w_1 \\ l_{21}w_1 + l_{22}w_2 \\ l_{31}w_1 + l_{32}w_2 + l_{33}w_3 \end{bmatrix}. \tag{C.6} \]

Thus the elements of p can be found sequentially from \( w_1 \) to \( w_3 \) without the need to store w or to use multiplications involving zero. This, along with the fact that L is lower triangular and can therefore be stored efficiently, can give considerable reductions in storage and computing time for large problems.

An Intuitive Example

Since the workings of the Cholesky decomposition may not be obvious at first sight, consider a simple example of two random variables with variances \( \sigma_1^2 \) and \( \sigma_2^2 \) and covariance \( \sigma_{12} \). We could simulate \( p_1 \) and \( p_2 \) by first generating \( p_1 \) and then recalling from regression theory that \( p_2 \) can be written as

\[ p_2 = bp_1 + e_{p_2}, \]

where b is the regression of \( p_2 \) on \( p_1 \) when both are expressed as deviations from their population means, and \( e_{p_2} \) is that part of \( p_2 \) which is not dependent on \( p_1 \), which has mean zero and variance \( \sigma_2^2(1 - r_{12}^2) \) (i.e. the residual error variance of \( p_2 \) after allowance for the correlation with \( p_1 \)).

If \( p_1 \) is generated from a random normal deviate, \( w_1 \), with variance 1, then

\[ p_1 = \sigma_1 w_1. \]

Hence,

\[ bp_1 = \frac{\sigma_{12}}{\sigma_1^2} \sigma_1 w_1 = \frac{\sigma_{12}}{\sigma_1} w_1 = \sigma_2 r_{12} w_1. \tag{C.7} \]

Similarly,

\[ e_{p_2} = \sigma_{e_{p_2}} w_2 = \sigma_2 \sqrt{1 - r_{12}^2}\, w_2. \tag{C.8} \]

Thus we have

\[ p = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 w_1 \\ \sigma_2 r_{12} w_1 + \sigma_2 \sqrt{1 - r_{12}^2}\, w_2 \end{bmatrix}. \tag{C.9} \]

Now consider the Cholesky decomposition:

\[ LL' = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix} \begin{bmatrix} l_{11} & l_{21} \\ 0 & l_{22} \end{bmatrix} = \begin{bmatrix} l_{11}^2 & l_{11}l_{21} \\ l_{11}l_{21} & l_{21}^2 + l_{22}^2 \end{bmatrix}, \]

but

\[ LL' = V = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}, \]

hence

\[ l_{11} = \sigma_1, \]

\[ l_{21} = \frac{\sigma_{12}}{l_{11}} = \frac{\sigma_{12}}{\sigma_1} = \sigma_2 r_{12}, \]

\[ l_{22} = \sqrt{\sigma_2^2 - l_{21}^2} = \sigma_2 \sqrt{1 - r_{12}^2}. \]

Substituting these into (C.6), we get

\[ p = Lw = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 w_1 \\ \sigma_2 r_{12} w_1 + \sigma_2 \sqrt{1 - r_{12}^2}\, w_2 \end{bmatrix}, \tag{C.10} \]

which is exactly the same solution as (C.9). The Cholesky decomposition can thus be seen to break down the variance of each variable, in turn, into the part that is conditional on previous traits (i.e. the variance of that trait which is in some way associated with the variance of previous traits) and the part that is unconditional (i.e. the variance of that trait which is completely independent of all previous traits).
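In practice the general method of (C.1)-(C.4) is a few lines of code. A minimal NumPy sketch (our illustration; the variance/covariance matrix chosen here is arbitrary) of generating correlated variates via the Cholesky factor:

```python
import numpy as np

rng = np.random.default_rng(7)

# Target variance/covariance matrix V among three variables.
V = np.array([[4.0, 1.0, 0.5],
              [1.0, 2.0, 0.3],
              [0.5, 0.3, 1.0]])

L = np.linalg.cholesky(V)                # lower triangular, L L' = V   (C.2)
w = rng.standard_normal((3, 100_000))    # standard normal deviates, V(w) = I  (C.3)
p = L @ w                                # correlated variates, V(p) = V  (C.4)

print(np.cov(p))   # approximately reproduces V
```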