Short Guides to Microeconometrics
Fall 2016
Kurt Schmidheiny/Klaus Neusser
Universität Basel

Elements of Matrix Algebra

Contents

1 Definitions
2 Matrix Operations
3 Rank of a Matrix
4 Special Functions of Square Matrices
5 Systems of Equations
6 Eigenvalue, -vector and Spectral Decomposition
7 Quadratic Forms
8 Partitioned Matrices
9 Derivatives with Matrix Algebra
10 Kronecker Product
References
Formula Sources and Proofs

Version: 21-9-2016, 21:49

Foreword

These lecture notes summarize the main results of matrix algebra as they are used in econometrics and economics. For a deeper discussion of the material, the interested reader should consult the references listed at the end.

1 Definitions

A matrix is a rectangular array of numbers. Here we consider only real numbers. If the matrix has $n$ rows and $m$ columns, we say that the matrix is of dimension $(n \times m)$. We denote matrices by capital bold letters:

\[
\mathbf{A} = (\mathbf{A})_{ij} = (a_{ij}) =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
\]

The numbers $a_{ij}$ are called the elements of the matrix.

An $(n \times 1)$ matrix is a column vector with $n$ elements. Similarly, a $(1 \times m)$ matrix is a row vector with $m$ elements. We denote vectors by lowercase bold letters:

\[
\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
\qquad
\mathbf{b} = \begin{pmatrix} b_1 & b_2 & \cdots & b_m \end{pmatrix}
\]

A $(1 \times 1)$ matrix is a scalar, which is denoted by an italic letter.

The null matrix ($\mathbf{O}$) is a matrix whose elements are all equal to zero, i.e. $a_{ij} = 0$ for all $i = 1, \ldots, n$ and $j = 1, \ldots, m$.

A square matrix is a matrix with the same number of columns and rows, i.e. $n = m$.

A symmetric matrix is a square matrix such that $a_{ij} = a_{ji}$ for all $i, j = 1, \ldots, n$.

A diagonal matrix is a square matrix such that the off-diagonal elements are all equal to zero, i.e. $a_{ij} = 0$ for $i \neq j$.

The identity matrix is a diagonal matrix with all diagonal elements equal to one. The identity matrix is denoted by $\mathbf{I}$ or $\mathbf{I}_n$.
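The definitions above can be checked mechanically. The following is a minimal sketch, assuming a matrix is stored as a Python list of rows; the helper names (`dimension`, `is_square`, `is_symmetric`, `is_diagonal`, `identity`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # assumption: a matrix is stored as a list of rows


def dimension(a: Matrix) -> Tuple[int, int]:
    """Return (n, m): the number of rows and the number of columns."""
    return len(a), len(a[0])


def is_square(a: Matrix) -> bool:
    n, m = dimension(a)
    return n == m


def is_symmetric(a: Matrix) -> bool:
    """Square and a_ij == a_ji for all i, j."""
    n, m = dimension(a)
    return n == m and all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


def is_diagonal(a: Matrix) -> bool:
    """Square and a_ij == 0 whenever i != j."""
    n, m = dimension(a)
    return n == m and all(
        a[i][j] == 0 for i in range(n) for j in range(n) if i != j
    )


def identity(n: int) -> Matrix:
    """Diagonal matrix with all diagonal elements equal to one."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 2, 3], [4, 5, 6]]  # a (2 x 3) matrix
S = [[2, 7], [7, 3]]        # a symmetric (2 x 2) matrix
```

Here `dimension(A)` is `(2, 3)`, `S` is symmetric but not diagonal, and `identity(3)` is both diagonal and symmetric.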
A square matrix is said to be upper triangular whenever $a_{ij} = 0$ for $i > j$ and lower triangular whenever $a_{ij} = 0$ for $i < j$.

Two vectors $\mathbf{a}$ and $\mathbf{b}$ are said to be linearly dependent if there exist scalars $\alpha$ and $\beta$, not both equal to zero, such that $\alpha \mathbf{a} + \beta \mathbf{b} = \mathbf{0}$. Otherwise they are said to be linearly independent.

2 Matrix Operations

2.1 Equality

Two matrices or two vectors are equal if they have the same dimension and if their respective elements are all equal:

\[
\mathbf{A} = \mathbf{B} \iff a_{ij} = b_{ij} \text{ for all } i \text{ and } j
\]

2.2 Transpose

Definition 1. The matrix $\mathbf{B}$ is called the transpose of matrix $\mathbf{A}$ if and only if

\[
b_{ij} = a_{ji} \text{ for all } i \text{ and } j.
\]

The matrix $\mathbf{B}$ is denoted by $\mathbf{A}'$ or $\mathbf{A}^T$.

Taking the transpose of a matrix is equivalent to interchanging rows and columns. If $\mathbf{A}$ has dimension $(n \times m)$ then $\mathbf{A}'$ has dimension $(m \times n)$. The transpose of a column vector is a row vector and vice versa.

Note:

• $(\mathbf{A}')' = \mathbf{A}$ for any matrix $\mathbf{A}$ (2.1)
• $\mathbf{A}' = \mathbf{A}$ for a symmetric matrix $\mathbf{A}$ (2.2)

2.3 Addition and Subtraction

The addition and subtraction of matrices is only defined for matrices with the same dimension.

Definition 2. The sum of two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same dimension is given by the sum of their elements, i.e.

\[
\mathbf{C} = \mathbf{A} + \mathbf{B} \iff c_{ij} = a_{ij} + b_{ij} \text{ for all } i \text{ and } j
\]

We have the following calculation rules if matrix dimensions agree:

• $\mathbf{A} + \mathbf{O} = \mathbf{A}$ (2.3)
• $\mathbf{A} - \mathbf{B} = \mathbf{A} + (-\mathbf{B})$ (2.4)
• $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$ (2.5)
• $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$ (2.6)
• $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$ (2.7)

2.4 Product

Definition 3. The inner product (dot product, scalar product) of two vectors $\mathbf{a}$ and $\mathbf{b}$ of the same dimension $(n \times 1)$ is a scalar (real number) defined as:

\[
\mathbf{a}'\mathbf{b} = \mathbf{b}'\mathbf{a} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i .
\]

The product of a scalar $c$ and a matrix $\mathbf{A}$ is a matrix $\mathbf{B} = c\mathbf{A}$ with $b_{ij} = c\,a_{ij}$. Note that $c\mathbf{A} = \mathbf{A}c$ when $c$ is a scalar.

Definition 4.
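The product of two matrices $\mathbf{A}$ and $\mathbf{B}$ with dimensions $(n \times k)$ and $(k \times m)$, respectively, is given by the matrix $\mathbf{C}$ with dimension $(n \times m)$ such that

\[
\mathbf{C} = \mathbf{A}\mathbf{B} \iff c_{ij} = \sum_{s=1}^{k} a_{is} b_{sj} \text{ for all } i \text{ and } j
\]

Remark 1. The matrix product is only defined if the number of columns of the first matrix is equal to the number of rows of the second matrix. Thus, although $\mathbf{A}\mathbf{B}$ may be defined, $\mathbf{B}\mathbf{A}$ is only defined if $n = m$. For square matrices of the same dimension, both $\mathbf{A}\mathbf{B}$ and $\mathbf{B}\mathbf{A}$ are defined.

Remark 2. The product of two matrices is in general not commutative, i.e. in general $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$.

Remark 3. The product $\mathbf{A}\mathbf{B}$ may also be defined as

\[
c_{ij} = (\mathbf{C})_{ij} = \mathbf{a}_{i\bullet}'\,\mathbf{b}_{\bullet j}
\]

where $\mathbf{a}_{i\bullet}'$ denotes the $i$-th row of $\mathbf{A}$ and $\mathbf{b}_{\bullet j}$ the $j$-th column of $\mathbf{B}$.

We have the following calculation rules if matrix dimensions agree:

• $\mathbf{A}\mathbf{I} = \mathbf{A}$, $\mathbf{I}\mathbf{A} = \mathbf{A}$ (2.8)
• $\mathbf{A}\mathbf{O} = \mathbf{O}$, $\mathbf{O}\mathbf{A} = \mathbf{O}$ (2.9)
• $(\mathbf{A}\mathbf{B})\mathbf{C} = \mathbf{A}(\mathbf{B}\mathbf{C}) = \mathbf{A}\mathbf{B}\mathbf{C}$ (2.10)
• $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C}$ (2.11)
• $(\mathbf{B} + \mathbf{C})\mathbf{A} = \mathbf{B}\mathbf{A} + \mathbf{C}\mathbf{A}$ (2.12)
• $c(\mathbf{A} + \mathbf{B}) = c\mathbf{A} + c\mathbf{B}$ (2.13)
• $(\mathbf{A}\mathbf{B})' = \mathbf{B}'\mathbf{A}'$ (2.14)
• $(\mathbf{A}\mathbf{B}\mathbf{C})' = \mathbf{C}'\mathbf{B}'\mathbf{A}'$ (2.15)

3 Rank of a Matrix

A set of vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ is linearly independent if $\sum_{i=1}^{n} c_i \mathbf{x}_i = \mathbf{0}$ implies $c_i = 0$ for all $i = 1, \ldots, n$.

The column rank of a matrix is the maximal number of linearly independent columns. The row rank of a matrix is the maximal number of linearly independent rows. A matrix is said to have full column (row) rank if the column rank (row rank) equals the number of columns (rows).

The column rank of an $(n \times m)$ matrix $\mathbf{A}$ is equal to its row rank. We can therefore just speak of the rank of a matrix, denoted by $\mathrm{rank}(\mathbf{A})$.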
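As an illustration of the equality of row rank and column rank, the sketch below computes the rank by Gaussian elimination. Gaussian elimination is not discussed in these notes; it is used here only as one standard way to count linearly independent rows. Exact rational arithmetic (`fractions.Fraction`) is an assumption made to avoid rounding issues, and the function names are hypothetical.

```python
from fractions import Fraction


def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]


def rank(a):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in a]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # look for a row at or below position r with a nonzero entry in column c
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # eliminate column c from all rows below the pivot row
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]  # second row is twice the first row
B = [[1, 0], [0, 1], [1, 1]]           # (3 x 2) matrix with full column rank
```

For this `A`, both `rank(A)` and `rank(transpose(A))` equal 2, and `rank(B)` equals the number of columns of `B`.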
For an $(n \times k)$ matrix $\mathbf{A}$, a $(k \times m)$ matrix $\mathbf{B}$ and an $(n \times n)$ square matrix $\mathbf{C}$, we have

• $\mathrm{rank}(\mathbf{A}) \leq \min(n, k)$ (3.1)
• $\mathrm{rank}(\mathbf{A}') = \mathrm{rank}(\mathbf{A})$ (3.2)
• $\mathrm{rank}(\mathbf{A}'\mathbf{A}) = \mathrm{rank}(\mathbf{A}\mathbf{A}') = \mathrm{rank}(\mathbf{A})$ (3.3)
• $\mathrm{rank}(\mathbf{A}\mathbf{B}) \leq \min(\mathrm{rank}(\mathbf{A}), \mathrm{rank}(\mathbf{B}))$ (3.4)
• $\mathrm{rank}(\mathbf{A}\mathbf{B}) = \mathrm{rank}(\mathbf{B})$ if $\mathbf{A}$ has full column rank (3.5)
• $\mathrm{rank}(\mathbf{A}\mathbf{B}) = \mathrm{rank}(\mathbf{A})$ if $\mathbf{B}$ has full row rank (3.6)
• $\mathrm{rank}(\mathbf{A}'\mathbf{C}\mathbf{A}) = \mathrm{rank}(\mathbf{C}\mathbf{A})$ if $\mathbf{C}$ is nonnegative definite (3.7)
• $\mathrm{rank}(\mathbf{A}'\mathbf{C}\mathbf{A}) = \mathrm{rank}(\mathbf{A})$ if $\mathbf{C}$ is positive definite (3.8)

4 Special Functions of Square Matrices

In this section only square $(n \times n)$ matrices are considered.

4.1 Trace of a Matrix

Definition 5. The trace of a matrix $\mathbf{A}$, denoted by $\mathrm{tr}(\mathbf{A})$, is the sum of its diagonal elements:

\[
\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}
\]

The following calculation rules hold if matrix dimensions agree:

• $\mathrm{tr}(c\mathbf{A}) = c\,\mathrm{tr}(\mathbf{A})$ (4.1)
• $\mathrm{tr}(\mathbf{A}') = \mathrm{tr}(\mathbf{A})$ (4.2)
• $\mathrm{tr}(\mathbf{A} + \mathbf{B}) = \mathrm{tr}(\mathbf{A}) + \mathrm{tr}(\mathbf{B})$ (4.3)
• $\mathrm{tr}(\mathbf{A}\mathbf{B}) = \mathrm{tr}(\mathbf{B}\mathbf{A})$ (4.4)
• $\mathrm{tr}(\mathbf{A}\mathbf{B}\mathbf{C}) = \mathrm{tr}(\mathbf{B}\mathbf{C}\mathbf{A}) = \mathrm{tr}(\mathbf{C}\mathbf{A}\mathbf{B})$ (4.5)

4.2 Determinant

The determinant of $\mathbf{A}$, denoted by $|\mathbf{A}|$, can be computed according to the following formula:

\[
|\mathbf{A}| = \sum_{i=1}^{n} a_{ij} (-1)^{i+j} |\mathbf{A}_{ij}| \quad \text{for some arbitrary } j
\]

The determinant, computed as above, is said to be developed according to the $j$-th column. The term $(-1)^{i+j} |\mathbf{A}_{ij}|$ is called the cofactor of the element $a_{ij}$. Here $\mathbf{A}_{ij}$ is the matrix of dimension $((n-1) \times (n-1))$ which is obtained from $\mathbf{A}$ by deleting the $i$-th row and the $j$-th column.

If at least two columns (rows) are linearly dependent, the determinant is equal to zero and the inverse of $\mathbf{A}$ does not exist. The matrix is called singular in this case. If the matrix is nonsingular then all columns (rows) are linearly independent.

If a column or a row has just zeros as its elements, the determinant is equal to zero. If two columns (rows) are interchanged, the determinant changes its sign.
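The column expansion above translates directly into a recursive procedure. The following is a minimal sketch with hypothetical function names; it uses 0-based indices (the sign $(-1)^{i+j}$ is unaffected by the shift) and takes the determinant of a $(1 \times 1)$ matrix to be its single element. The cost grows factorially with $n$, so this is meant for illustration only.

```python
def minor(a, i, j):
    """Submatrix A_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a, j=0):
    """Determinant by cofactor expansion along column j (0-based)."""
    n = len(a)
    if n == 1:
        return a[0][0]  # base case: determinant of a (1 x 1) matrix
    # sum over rows i of a_ij times its cofactor; sub-determinants use column 0
    return sum(a[i][j] * (-1) ** (i + j) * det(minor(a, i, j)) for i in range(n))


A = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]

# the same matrix with the first two columns interchanged
A_swapped = [[0, 2, 1],
             [3, 1, 2],
             [1, 1, 4]]

# a matrix whose second column is twice the first
D = [[1, 2],
     [2, 4]]
```

Expanding `A` along any of its three columns gives the same value, `det(A_swapped)` equals `-det(A)`, and `det(D)` is zero because the columns of `D` are linearly dependent.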
For $n = 2$, the determinant is given by a simple formula:

\[
|\mathbf{A}| = a_{11} a_{22} - a_{12} a_{21}
\]

Calculation rules for the determinant are:

• $|\mathbf{A}| = |\mathbf{A}'|$ (4.6)
• $|\mathbf{A}\mathbf{B}| = |\mathbf{A}| \cdot |\mathbf{B}|$ (4.7)
• $|c\mathbf{A}| = c^n |\mathbf{A}|$ (4.8)

4.3 Inverse of a Matrix

If $\mathbf{A}$ is a square matrix, there may exist a matrix $\mathbf{B}$ with the property $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = \mathbf{I}$. If such a matrix exists, it is called the inverse of $\mathbf{A}$ and is denoted by $\mathbf{A}^{-1}$, hence $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.

The inverse of a matrix can be computed as follows:

\[
\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}
\begin{pmatrix}
(-1)^{1+1} |\mathbf{A}_{11}| & (-1)^{2+1} |\mathbf{A}_{21}| & \cdots & (-1)^{n+1} |\mathbf{A}_{n1}| \\
(-1)^{1+2} |\mathbf{A}_{12}| & (-1)^{2+2} |\mathbf{A}_{22}| & \cdots & (-1)^{n+2} |\mathbf{A}_{n2}| \\
\vdots & \vdots & \ddots & \vdots \\
(-1)^{1+n} |\mathbf{A}_{1n}| & (-1)^{2+n} |\mathbf{A}_{2n}| & \cdots & (-1)^{n+n} |\mathbf{A}_{nn}|
\end{pmatrix}
\]

where $\mathbf{A}_{ij}$ is the matrix of dimension $((n-1) \times (n-1))$ obtained from $\mathbf{A}$ by deleting the $i$-th row and the $j$-th column. The term $(-1)^{i+j} |\mathbf{A}_{ij}|$ is called the cofactor of $a_{ij}$. Note that the cofactor of $a_{ij}$ appears in row $j$ and column $i$ of the matrix above.

For $n = 2$, the inverse is given by

\[
\mathbf{A}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}}
\begin{pmatrix}
a_{22} & -a_{12} \\
-a_{21} & a_{11}
\end{pmatrix}.
\]

We have the following calculation rules if both $\mathbf{A}^{-1}$ and $\mathbf{B}^{-1}$ exist and matrix dimensions agree:

• $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$ (4.9)
• $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$ (order reversed) (4.10)
• $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$ (4.11)
• $|\mathbf{A}^{-1}| = |\mathbf{A}|^{-1}$ (4.12)

4.4 Nonsingular Square Matrices

The following statements about a square $(n \times n)$ matrix $\mathbf{A}$ are equivalent:

• $\mathbf{A}$ is nonsingular (4.13)
• $|\mathbf{A}| \neq 0$ (4.14)
• $\mathbf{A}^{-1}$ exists (4.15)
• $\mathrm{rank}(\mathbf{A}) = n$ (full rank) (4.16)
• $\lambda_i \neq 0$ for all $i = 1, \ldots, n$, where $\lambda_i$ are the eigenvalues of $\mathbf{A}$ (see Section 6) (4.17)

5 Systems of Equations

Consider the following system of $m$ equations in $n$ unknowns $x_1, \ldots, x_n$:

\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m
\end{aligned}
\]

If we collect the unknowns into a vector $\mathbf{x} = (x_1, \ldots, x_n)'$, the constants $b_1, \ldots, b_m$ into a vector $\mathbf{b}$, and the coefficients $(a_{ij})$ into a matrix $\mathbf{A}$, we can rewrite the equation system compactly in matrix form as follows:

\[
\underbrace{\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}}_{\mathbf{A}}
\underbrace{\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}}_{\mathbf{b}}
\]

\[
\mathbf{A}\mathbf{x} = \mathbf{b}
\]

This equation system has a unique solution if $m = n$, i.e. if $\mathbf{A}$ is a square matrix, and $\mathbf{A}$ is nonsingular, i.e. $\mathbf{A}^{-1}$ exists. The solution is then given by

\[
\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}
\]

Remark 4. To achieve numerical accuracy it is preferable not to compute the inverse explicitly. There are efficient numerical algorithms which can solve the equation system without computing the inverse.

6 Eigenvalue, -vector and Spectral Decomposition

6.1 Eigenvalue and Eigenvector

A scalar $\lambda$ is said to be an eigenvalue of the square matrix $\mathbf{A}$ if there exists a vector $\mathbf{x} \neq \mathbf{0}$ such that

\[
\mathbf{A}\mathbf{x} = \lambda \mathbf{x}
\]

The vector $\mathbf{x}$ is called an eigenvector corresponding to $\lambda$. If $\mathbf{x}$ is an eigenvector then $\alpha \mathbf{x}$, $\alpha \neq 0$, is also an eigenvector. Eigenvectors are therefore not unique. It is sometimes useful to normalize the length of the eigenvectors to one, i.e. to choose the eigenvector such that $\mathbf{x}'\mathbf{x} = 1$.

6.2 Characteristic Equation

In order to find the eigenvalues and eigenvectors of a square matrix, one has to solve the equation system

\[
\mathbf{A}\mathbf{x} = \lambda \mathbf{x} = \lambda \mathbf{I}\mathbf{x} \iff (\mathbf{A} - \lambda \mathbf{I})\mathbf{x} = \mathbf{0}.
\]

This equation system has a nontrivial solution, $\mathbf{x} \neq \mathbf{0}$, if and only if the matrix $(\mathbf{A} - \lambda \mathbf{I})$ is singular, or equivalently if and only if the determinant of $(\mathbf{A} - \lambda \mathbf{I})$ is equal to zero. This leads to an equation in the unknown parameter $\lambda$:

\[
|\mathbf{A} - \lambda \mathbf{I}| = 0.
\]

This equation is called the characteristic equation of the matrix $\mathbf{A}$ and corresponds to a polynomial equation of degree $n$. The $n$ solutions of this equation (roots) are the eigenvalues of the matrix. The solutions may be complex numbers. Some solutions may appear several times. Eigenvectors corresponding to some eigenvalue $\lambda$ can be obtained from the equation $(\mathbf{A} - \lambda \mathbf{I})\mathbf{x} = \mathbf{0}$.
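For a $(2 \times 2)$ matrix the characteristic equation can be written out and solved by hand: $|\mathbf{A} - \lambda \mathbf{I}| = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$. The sketch below, with hypothetical function names, solves this quadratic and then reads off one eigenvector from $(\mathbf{A} - \lambda \mathbf{I})\mathbf{x} = \mathbf{0}$. It is restricted to the $(2 \times 2)$ case and compares with zero exactly, which is an assumption that suits small integer examples but not general floating-point input.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of lambda^2 - (a11 + a22) lambda + (a11 a22 - a12 a21) = 0."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)  # complex square root: roots may be complex
    return (tr + disc) / 2, (tr - disc) / 2


def eigenvector_2x2(a, lam):
    """One nonzero solution x of (A - lam I) x = 0, given an eigenvalue lam."""
    # x = (a12, lam - a11) is annihilated by the first row (a11 - lam, a12)
    x = (a[0][1], lam - a[0][0])
    if x == (0, 0):
        # first row of A - lam I is zero: build x from the second row instead
        x = (lam - a[1][1], a[1][0])
    if x == (0, 0):
        # A - lam I is the null matrix: every nonzero vector is an eigenvector
        x = (1, 0)
    return x


def matvec(a, x):
    """Matrix-vector product A x."""
    return tuple(sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a)))


A = [[2, 1], [1, 2]]   # symmetric: real eigenvalues 3 and 1
R = [[0, -1], [1, 0]]  # rotation by 90 degrees: complex eigenvalues +i and -i
```

For `A`, the eigenvalue 3 comes with the eigenvector `(1, 1)`, and `matvec(A, x)` reproduces `3 * x` componentwise. The matrix `R` illustrates that the roots may be complex even when all elements are real.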
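We have the following relations for an $(n \times n)$ matrix $\mathbf{A}$:

• $\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} \lambda_i$ (6.1)
• $|\mathbf{A}| = \prod_{i=1}^{n} \lambda_i$ (6.2)

6.3 Eigenvalues and Eigenvectors of Symmetric Matrices

If $\mathbf{A}$ is a symmetric $(n \times n)$ matrix, all eigenvalues are real and there exist $n$ linearly independent eigenvectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$ with the properties $\mathbf{x}_i'\mathbf{x}_j = 0$ for $i \neq j$ and $\mathbf{x}_i'\mathbf{x}_i = 1$, i.e. the eigenvectors are orthogonal to each other and of length one. If we collect the eigenvectors into an $(n \times n)$ matrix $\mathbf{T} = (\mathbf{x}_1, \ldots, \mathbf{x}_n)$, we have $\mathbf{T}'\mathbf{T} = \mathbf{T}\mathbf{T}' = \mathbf{I}$ and hence $\mathbf{T}^{-1} = \mathbf{T}'$. If we collect all the eigenvalues into a diagonal matrix $\boldsymbol{\Lambda}$,

\[
\boldsymbol{\Lambda} =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},
\]

then $\mathbf{A}\mathbf{T} = \mathbf{T}\boldsymbol{\Lambda}$, and we can diagonalize the matrix $\mathbf{A}$ as follows:

\[
\mathbf{T}'\mathbf{A}\mathbf{T} = \mathbf{T}'\mathbf{T}\boldsymbol{\Lambda} = \mathbf{I}\boldsymbol{\Lambda} = \boldsymbol{\Lambda}. \tag{6.3}
\]

This implies that we can decompose $\mathbf{A}$ into the sum of $n$ matrices:

\[
\mathbf{A} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}' = \sum_{i=1}^{n} \lambda_i \mathbf{x}_i \mathbf{x}_i'
\]

where the matrices $\mathbf{x}_i \mathbf{x}_i'$ all have rank one. The above decomposition is called the spectral decomposition or eigenvalue decomposition of $\mathbf{A}$.

The inverse of a nonsingular symmetric matrix $\mathbf{A}$ can be calculated as

\[
\mathbf{A}^{-1} = \mathbf{T}\boldsymbol{\Lambda}^{-1}\mathbf{T}' = \sum_{i=1}^{n} \frac{1}{\lambda_i} \mathbf{x}_i \mathbf{x}_i'.
\]

Remark 5. Besides symmetric matrices, many other matrices, but not all matrices, are also diagonalizable.

7 Quadratic Forms

For a vector $\mathbf{x} \in \mathbb{R}^n$ and a square matrix $\mathbf{A}$ of dimension $(n \times n)$ the scalar function

\[
f(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{j=1}^{n} \sum_{i=1}^{n} x_i x_j a_{ij}
\]

is called a quadratic form.

The quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$, and therefore the matrix $\mathbf{A}$, is called positive (negative) definite if and only if

\[
\mathbf{x}'\mathbf{A}\mathbf{x} > 0 \; (< 0) \quad \text{for all } \mathbf{x} \neq \mathbf{0}.
\]

The property that $\mathbf{A}$ is positive definite implies that

• $\lambda_i > 0$ for all $i = 1, \ldots, n$ (7.1)
• $|\mathbf{A}| > 0$ (7.2)
• $\mathbf{A}^{-1}$ exists and is positive definite (7.3)
• $\mathrm{tr}(\mathbf{A}) > 0$ (7.4)

For a symmetric matrix $\mathbf{A}$, the first property is an alternative definition of positive definiteness.

The quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$, and therefore the matrix $\mathbf{A}$, is called nonnegative definite or positive semi-definite if and only if

\[
\mathbf{x}'\mathbf{A}\mathbf{x} \geq 0 \quad \text{for all } \mathbf{x}.
\]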
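The definitions of definiteness can be illustrated numerically. The sketch below evaluates the quadratic form directly from its double sum and, for symmetric $(2 \times 2)$ matrices only, classifies the matrix through its eigenvalues, using the closed form $\lambda = \tfrac{1}{2}(a_{11} + a_{22}) \pm \sqrt{\tfrac{1}{4}(a_{11} - a_{22})^2 + a_{12}^2}$. The function names are hypothetical, and the comparisons with zero ignore floating-point rounding, which is an assumption that holds for the small examples shown.

```python
import math


def quadratic_form(x, a):
    """Evaluate x'Ax as the double sum of x_i * x_j * a_ij."""
    n = len(x)
    return sum(x[i] * x[j] * a[i][j] for i in range(n) for j in range(n))


def symmetric_eigenvalues_2x2(a):
    """Both (real) eigenvalues of a symmetric (2 x 2) matrix, larger one first."""
    mean = (a[0][0] + a[1][1]) / 2
    radius = math.hypot((a[0][0] - a[1][1]) / 2, a[0][1])
    return mean + radius, mean - radius


def classify(a):
    """Classify a symmetric (2 x 2) matrix by the sign of its smallest eigenvalue."""
    smallest = min(symmetric_eigenvalues_2x2(a))
    if smallest > 0:
        return "positive definite"
    if smallest >= 0:
        return "nonnegative definite"
    return "not nonnegative definite"


A = [[2, 1], [1, 2]]  # eigenvalues 3 and 1
B = [[1, 1], [1, 1]]  # eigenvalues 2 and 0
C = [[1, 2], [2, 1]]  # eigenvalues 3 and -1

x = (1, -1)
```

With `x = (1, -1)`, the quadratic form is positive for `A`, exactly zero for `B`, and negative for `C`, in line with the three classifications returned by `classify`.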
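For nonnegative definite matrices we have:

• $\lambda_i \geq 0$ for all $i = 1, \ldots, n$ (7.5)
• $|\mathbf{A}| \geq 0$ (7.6)
• $\mathrm{tr}(\mathbf{A}) \geq 0$ (7.7)

For a symmetric matrix $\mathbf{A}$, the first property is an alternative definition of nonnegative definiteness.

For an $(n \times m)$ matrix $\mathbf{B}$,

• $\mathbf{B}'\mathbf{B}$ is nonnegative definite (7.8)
• $\mathbf{B}'\mathbf{B}$ is positive definite if $\mathbf{B}$ has full column rank (7.9)
• $\mathbf{B}\mathbf{B}'$ is nonnegative definite (7.10)

If the $(n \times m)$ matrix $\mathbf{B}$ has rank $m$ (full column rank) and the $(n \times n)$ matrix $\mathbf{A}$ is positive definite, then

• $\mathbf{B}'\mathbf{A}\mathbf{B}$ is positive definite (7.11)

The inverse of a positive definite $(n \times n)$ matrix $\mathbf{A}$ can be decomposed into

\[
\mathbf{A}^{-1} = \mathbf{C}'\mathbf{C} \quad \text{where} \quad \mathbf{C}\mathbf{A}\mathbf{C}' = \mathbf{I}.
\]

8 Partitioned Matrices

Consider a matrix $\mathbf{P}$ of dimension $((p + q) \times (r + s))$ which is partitioned into the $(p \times r)$ matrix $\mathbf{P}_{11}$, the $(p \times s)$ matrix $\mathbf{P}_{12}$, the $(q \times r)$ matrix $\mathbf{P}_{21}$ and the $(q \times s)$ matrix $\mathbf{P}_{22}$:

\[
\mathbf{P} =
\begin{pmatrix}
\mathbf{P}_{11} & \mathbf{P}_{12} \\
\mathbf{P}_{21} & \mathbf{P}_{22}
\end{pmatrix}
\]

Assuming that dimensions in the involved multiplications agree, two partitioned matrices are multiplied as

\[
\begin{pmatrix}
\mathbf{P}_{11} & \mathbf{P}_{12} \\
\mathbf{P}_{21} & \mathbf{P}_{22}
\end{pmatrix}
\begin{pmatrix}
\mathbf{Q}_{11} & \mathbf{Q}_{12} \\
\mathbf{Q}_{21} & \mathbf{Q}_{22}
\end{pmatrix}
=
\begin{pmatrix}
\mathbf{P}_{11}\mathbf{Q}_{11} + \mathbf{P}_{12}\mathbf{Q}_{21} & \mathbf{P}_{11}\mathbf{Q}_{12} + \mathbf{P}_{12}\mathbf{Q}_{22} \\
\mathbf{P}_{21}\mathbf{Q}_{11} + \mathbf{P}_{22}\mathbf{Q}_{21} & \mathbf{P}_{21}\mathbf{Q}_{12} + \mathbf{P}_{22}\mathbf{Q}_{22}
\end{pmatrix}
\]

Assuming that $\mathbf{P}$ is square and that $\mathbf{P}_{11}^{-1}$ exists, the determinant of a partitioned matrix is

\[
\begin{vmatrix}
\mathbf{P}_{11} & \mathbf{P}_{12} \\
\mathbf{P}_{21} & \mathbf{P}_{22}
\end{vmatrix}
= |\mathbf{P}_{11}| \cdot |\mathbf{P}_{22} - \mathbf{P}_{21}\mathbf{P}_{11}^{-1}\mathbf{P}_{12}| \tag{8.1}
\]

and the inverse is

\[
\begin{pmatrix}
\mathbf{P}_{11} & \mathbf{P}_{12} \\
\mathbf{P}_{21} & \mathbf{P}_{22}
\end{pmatrix}^{-1}
=
\begin{pmatrix}
\mathbf{P}_{11}^{-1} + \mathbf{P}_{11}^{-1}\mathbf{P}_{12}\mathbf{F}^{-1}\mathbf{P}_{21}\mathbf{P}_{11}^{-1} & -\mathbf{P}_{11}^{-1}\mathbf{P}_{12}\mathbf{F}^{-1} \\
-\mathbf{F}^{-1}\mathbf{P}_{21}\mathbf{P}_{11}^{-1} & \mathbf{F}^{-1}
\end{pmatrix} \tag{8.2}
\]

where $\mathbf{F} = \mathbf{P}_{22} - \mathbf{P}_{21}\mathbf{P}_{11}^{-1}\mathbf{P}_{12}$ is assumed nonsingular.

The determinant of a block diagonal matrix is

\[
\begin{vmatrix}
\mathbf{P}_{11} & \mathbf{O} \\
\mathbf{O} & \mathbf{P}_{22}
\end{vmatrix}
= |\mathbf{P}_{11}| \cdot |\mathbf{P}_{22}|
\]

and its inverse is, assuming that $\mathbf{P}_{11}^{-1}$ and $\mathbf{P}_{22}^{-1}$ exist,

\[
\begin{pmatrix}
\mathbf{P}_{11} & \mathbf{O} \\
\mathbf{O} & \mathbf{P}_{22}
\end{pmatrix}^{-1}
=
\begin{pmatrix}
\mathbf{P}_{11}^{-1} & \mathbf{O} \\
\mathbf{O} & \mathbf{P}_{22}^{-1}
\end{pmatrix}.
\]

9 Derivatives with Matrix Algebra

A linear function $f$ from the $n$-dimensional vector space of real numbers, $\mathbb{R}^n$, to the real numbers, $\mathbb{R}$, $f: \mathbb{R}^n \longrightarrow \mathbb{R}$, is determined by the coefficient vector $\mathbf{a} = (a_1, \ldots, a_n)'$:

\[
y = f(\mathbf{x}) = \mathbf{a}'\mathbf{x} = \sum_{i=1}^{n} a_i x_i = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n
\]

where $\mathbf{x}$ is a column vector of dimension $n$ and $y$ a scalar.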
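The derivative of $y = f(\mathbf{x})$ with respect to the column vector $\mathbf{x}$ is defined as follows:

\[
\frac{\partial y}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}'\mathbf{x}}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}'\mathbf{a}}{\partial \mathbf{x}}
= \begin{pmatrix} \partial y / \partial x_1 \\ \partial y / \partial x_2 \\ \vdots \\ \partial y / \partial x_n \end{pmatrix}
= \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \mathbf{a}
\]

and with respect to the row vector $\mathbf{x}'$ as follows:

\[
\frac{\partial y}{\partial \mathbf{x}'} = \frac{\partial \mathbf{a}'\mathbf{x}}{\partial \mathbf{x}'} = \frac{\partial \mathbf{x}'\mathbf{a}}{\partial \mathbf{x}'}
= \begin{pmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{pmatrix}
= \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
= \mathbf{a}'
\]

The simultaneous equation system $\mathbf{y} = \mathbf{A}\mathbf{x}$ can be viewed as $m$ linear functions $y_i = \mathbf{a}_i'\mathbf{x}$, where $\mathbf{a}_i'$ denotes the $i$-th row of the $(m \times n)$ dimensional matrix $\mathbf{A}$. Thus the derivative of $y_i$ with respect to $\mathbf{x}$ is given by

\[
\frac{\partial y_i}{\partial \mathbf{x}} = \frac{\partial \mathbf{a}_i'\mathbf{x}}{\partial \mathbf{x}} = \mathbf{a}_i
\]

Consequently the derivative of $\mathbf{y} = \mathbf{A}\mathbf{x}$ with respect to the row vector $\mathbf{x}'$ can be defined as

\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}'} = \frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}'}
= \begin{pmatrix} \partial y_1 / \partial \mathbf{x}' \\ \partial y_2 / \partial \mathbf{x}' \\ \vdots \\ \partial y_m / \partial \mathbf{x}' \end{pmatrix}
= \begin{pmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_m' \end{pmatrix}
= \mathbf{A}.
\]

The derivative of $\mathbf{y} = \mathbf{A}\mathbf{x}$ with respect to the column vector $\mathbf{x}$ is therefore

\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}'.
\]

For a square matrix $\mathbf{A}$ of dimension $(n \times n)$ and the quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x} = \sum_{j=1}^{n} \sum_{i=1}^{n} x_i x_j a_{ij}$, the derivative with respect to the column vector $\mathbf{x}$ is defined as

\[
\frac{\partial \mathbf{x}'\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}')\mathbf{x}.
\]

If $\mathbf{A}$ is a symmetric matrix this reduces to

\[
\frac{\partial \mathbf{x}'\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}.
\]

The derivative of the quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$ with respect to the matrix elements $a_{ij}$ is given by

\[
\frac{\partial \mathbf{x}'\mathbf{A}\mathbf{x}}{\partial a_{ij}} = x_i x_j.
\]

Therefore the derivative with respect to the matrix $\mathbf{A}$ is given by

\[
\frac{\partial \mathbf{x}'\mathbf{A}\mathbf{x}}{\partial \mathbf{A}} = \mathbf{x}\mathbf{x}'.
\]

10 Kronecker Product

The Kronecker product of an $(m \times n)$ matrix $\mathbf{A}$ with a $(p \times q)$ matrix $\mathbf{B}$ is an $(mp \times nq)$ matrix $\mathbf{A} \otimes \mathbf{B}$ defined as follows:

\[
\mathbf{A} \otimes \mathbf{B} =
\begin{pmatrix}
a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1n}\mathbf{B} \\
a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2n}\mathbf{B} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1}\mathbf{B} & a_{m2}\mathbf{B} & \cdots & a_{mn}\mathbf{B}
\end{pmatrix}
\]

The following calculation rules hold if matrix dimensions agree (and, for (10.4), if the inverses exist):

• $(\mathbf{A} \otimes \mathbf{B}) + (\mathbf{C} \otimes \mathbf{B}) = (\mathbf{A} + \mathbf{C}) \otimes \mathbf{B}$ (10.1)
• $(\mathbf{A} \otimes \mathbf{B}) + (\mathbf{A} \otimes \mathbf{C}) = \mathbf{A} \otimes (\mathbf{B} + \mathbf{C})$ (10.2)
• $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D})$ (10.3)
• $(\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}$ (10.4)
• $\mathrm{tr}(\mathbf{A} \otimes \mathbf{B}) = \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$ (10.5)

References

[1] Abadir, K.M. and J.R. Magnus, Matrix Algebra, Cambridge: Cambridge University Press, 2005.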
[2] Amemiya, T., Introduction to Statistics and Econometrics, Cambridge, Massachusetts: Harvard University Press, 1994.
[3] Dhrymes, P.J., Introductory Econometrics, New York: Springer-Verlag, 1978.
[4] Greene, W.H., Econometrics, New York: Macmillan, 1997.
[5] Meyer, C.D., Matrix Analysis and Applied Linear Algebra, Philadelphia: SIAM, 2000.
[6] Strang, G., Linear Algebra and its Applications, 3rd Edition, San Diego: Harcourt Brace Jovanovich, 1986.
[7] Magnus, J.R., and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Chichester: John Wiley, 1988.

Formula Sources and Proofs

(2.8) Abadir and Magnus (2005), p. 28, ex. 2.18 (b)
(2.10) Abadir and Magnus (2005), p. 25, ex. 2.14 (a)
(2.11) Abadir and Magnus (2005), p. 25, ex. 2.14 (b)
(2.14) Abadir and Magnus (2005), p. 26, ex. 2.15 (a)
(2.15) Abadir and Magnus (2005), p. 26, ex. 2.15 (b)
(3.1) Abadir and Magnus (2005), p. 78-79, ex. 4.7 (a)
(3.2) Abadir and Magnus (2005), p. 77-78, ex. 4.5
(3.3) Abadir and Magnus (2005), p. 81, ex. 4.13 (d)
(3.4) Abadir and Magnus (2005), p. 81, ex. 4.15 (b)
(3.5) Abadir and Magnus (2005), p. 85, ex. 4.25 (c)
(3.6) Abadir and Magnus (2005), p. 85, ex. 4.25 (d)
(3.7) Abadir and Magnus (2005), p. 221, ex. 8.27 (a)
(3.8) Abadir and Magnus (2005), p. 221, ex. 8.26 (a)
(4.1) Abadir and Magnus (2005), p. 30, ex. 2.24 (b)
(4.2) Abadir and Magnus (2005), p. 30, ex. 2.24 (c)
(4.3) Abadir and Magnus (2005), p. 30, ex. 2.24 (a)
(4.4) Abadir and Magnus (2005), p. 30, ex. 2.26 (a)
(4.5) Abadir and Magnus (2005), p. 31, ex. 2.26 (c)
(4.6) Abadir and Magnus (2005), p. 88, ex. 4.30
(4.7) Abadir and Magnus (2005), p. 94, ex. 4.42
(4.8) Abadir and Magnus (2005), p. 90, ex. 4.35 (a)
(4.9) Abadir and Magnus (2005), p. 84, ex. 4.22 (b)
(4.10) Abadir and Magnus (2005), p. 84, ex. 4.22 (d)
(4.11) Abadir and Magnus (2005), p. 84, ex. 4.22 (c)
(4.12) Abadir and Magnus (2005), p. 95, ex. 4.44 (a)
(4.13) Abadir and Magnus (2005), p. 83-84, ex. 4.21
(4.14) Abadir and Magnus (2005), p. 94, ex. 4.43
(4.15) Abadir and Magnus (2005), p. 83-84, ex. 4.21
(4.16) Abadir and Magnus (2005), p. 83-84, ex. 4.21
(4.17) Abadir and Magnus (2005), p. 164, ex. 7.16
(6.1) Abadir and Magnus (2005), p. 168, ex. 7.27
(6.2) Abadir and Magnus (2005), p. 167, ex. 7.26
(6.3) Abadir and Magnus (2005), p. 177, ex. 7.46
(7.1) Abadir and Magnus (2005), p. 215, ex. 8.11 (a)
(7.2) Abadir and Magnus (2005), p. 215, ex. 8.12 (a)
(7.3) Abadir and Magnus (2005), p. 216, ex. 8.14 (c)
(7.4) Abadir and Magnus (2005), p. 215, ex. 8.12 (b)
(7.5) Abadir and Magnus (2005), p. 215, ex. 8.11 (b)
(7.6) Abadir and Magnus (2005), p. 216, ex. 8.13 (a)
(7.7) Abadir and Magnus (2005), p. 216, ex. 8.13 (b)
(7.8) Abadir and Magnus (2005), p. 214, ex. 8.9 (a)
(7.10) Abadir and Magnus (2005), p. 214, ex. 8.9 (a)
(7.11) Abadir and Magnus (2005), p. 221, ex. 8.26 (b)
(8.1) Abadir and Magnus (2005), p. 114, ex. 5.30 (a)
(8.2) Abadir and Magnus (2005), p. 106, ex. 5.16 (a)
(10.1) Abadir and Magnus (2005), p. 275, ex. 10.3 (a)
(10.2) Abadir and Magnus (2005), p. 275, ex. 10.3 (b)
(10.3) Abadir and Magnus (2005), p. 275, ex. 10.3 (d)
(10.4) Abadir and Magnus (2005), p. 278, ex. 10.8
(10.5) Abadir and Magnus (2005), p. 277, ex. 10.7 (b)