Basic Concepts in Matrix Algebra

• A column array of $p$ elements is called a vector of dimension $p$ and is written as
$x_{p\times 1} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}$
• The transpose of the column vector $x_{p\times 1}$ is the row vector $x' = [x_1 \; x_2 \; \ldots \; x_p]$.
• A vector can be represented in $p$-space as a directed line with components along the $p$ axes.

Basic Matrix Concepts (cont'd)

• Two vectors can be added if they have the same dimension. Addition is carried out elementwise:
$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_p + y_p \end{bmatrix}$
• A vector can be contracted or expanded if multiplied by a constant $c$. Multiplication is also elementwise:
$cx = c\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_p \end{bmatrix}$

Examples

• $x = [2,\ 1,\ -4]'$, so $x' = [2 \;\; 1 \;\; -4]$.
• $6x = [6\times 2,\ 6\times 1,\ 6\times(-4)]' = [12,\ 6,\ -24]'$.
• $x + y = [2,\ 1,\ -4]' + [5,\ -2,\ 0]' = [2+5,\ 1-2,\ -4+0]' = [7,\ -1,\ -4]'$.

Basic Matrix Concepts (cont'd)

• Multiplication by $c > 0$ does not change the direction of $x$. The direction is reversed if $c < 0$.

Basic Matrix Concepts (cont'd)

• The length of a vector $x$ is its Euclidean distance from the origin:
$L_x = \sqrt{\sum_{j=1}^{p} x_j^2}$
• Multiplication of a vector $x$ by a constant $c$ changes its length:
$L_{cx} = \sqrt{\sum_{j=1}^{p} c^2 x_j^2} = |c|\sqrt{\sum_{j=1}^{p} x_j^2} = |c|\,L_x$
• If $c = L_x^{-1}$, then $cx$ is a vector of unit length.

Examples

• The length of $x = [2,\ 1,\ -4,\ -2]'$ is
$L_x = \sqrt{(2)^2 + (1)^2 + (-4)^2 + (-2)^2} = \sqrt{25} = 5.$
• Then
$z = \tfrac{1}{5}\,[2,\ 1,\ -4,\ -2]' = [0.4,\ 0.2,\ -0.8,\ -0.4]'$
is a vector of unit length.

Angle Between Vectors

• Consider two vectors $x$ and $y$ in two dimensions. If $\theta_1$ is the angle between $x$ and the horizontal axis and $\theta_2 > \theta_1$ is the angle between $y$ and the horizontal axis, then
$\cos(\theta_1) = \frac{x_1}{L_x}, \quad \cos(\theta_2) = \frac{y_1}{L_y}, \quad \sin(\theta_1) = \frac{x_2}{L_x}, \quad \sin(\theta_2) = \frac{y_2}{L_y}.$
• If $\theta$ is the angle between $x$ and $y$, then
$\cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1),$
so that
$\cos(\theta) = \frac{x_1 y_1 + x_2 y_2}{L_x L_y}.$

Angle Between Vectors (cont'd)

[Figure: the angle $\theta$ between two vectors $x$ and $y$.]

Inner Product

• The inner product of two vectors $x$ and $y$ is
$x'y = \sum_{j=1}^{p} x_j y_j.$
• Then $L_x = \sqrt{x'x}$, $L_y = \sqrt{y'y}$, and
$\cos(\theta) = \frac{x'y}{\sqrt{x'x}\,\sqrt{y'y}}.$
• Since $\cos(\theta) = 0$ when $x'y = 0$, and $\cos(\theta) = 0$ only for $\theta = 90^\circ$ or $\theta = 270^\circ$, the vectors are perpendicular (orthogonal) when $x'y = 0$.

Linear Dependence

• Two vectors $x$ and $y$ are linearly dependent if there exist two constants $c_1$ and $c_2$, not both zero, such that
$c_1 x + c_2 y = 0.$
• If two vectors are linearly dependent, then one can be written as a linear combination of the other. From the equation above, with $c_1 \neq 0$: $x = -(c_2/c_1)\,y$.
• $k$ vectors $x_1, x_2, \ldots, x_k$ are linearly dependent if there exist constants $c_1, c_2, \ldots, c_k$, not all zero, such that
$\sum_{j=1}^{k} c_j x_j = 0.$
• Vectors of the same dimension that are not linearly dependent are said to be linearly independent.

Linear Independence: Example

Let
$x_1 = [1,\ 2,\ 1]', \quad x_2 = [1,\ 0,\ -1]', \quad x_3 = [1,\ -2,\ 1]'.$
Then $c_1 x_1 + c_2 x_2 + c_3 x_3 = 0$ if
$c_1 + c_2 + c_3 = 0$
$2c_1 + 0 - 2c_3 = 0$
$c_1 - c_2 + c_3 = 0$
The unique solution is $c_1 = c_2 = c_3 = 0$, so the vectors are linearly independent.

Projections

• The projection of $x$ on $y$ is defined by
$\text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\,y = \frac{x'y}{L_y}\,\frac{1}{L_y}\,y.$
• The length of the projection is
$\text{Length of projection} = \frac{|x'y|}{L_y} = L_x\,\frac{|x'y|}{L_x L_y} = L_x\,|\cos(\theta)|,$
where $\theta$ is the angle between $x$ and $y$.

Matrices

A matrix $A$ is an array of elements $a_{ij}$ with $n$ rows and $p$ columns:
$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix}$
The transpose $A'$ has $p$ rows and $n$ columns; the $j$-th row of $A'$ is the $j$-th column of $A$:
$A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix}$

Matrix Algebra

• Multiplication of $A$ by a constant $c$ is carried out element by element:
$cA = \begin{bmatrix} ca_{11} & ca_{12} & \cdots & ca_{1p} \\ ca_{21} & ca_{22} & \cdots & ca_{2p} \\ \vdots & \vdots & & \vdots \\ ca_{n1} & ca_{n2} & \cdots & ca_{np} \end{bmatrix}$
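The vector operations introduced above are easy to verify numerically before moving on to matrices. The following NumPy sketch is not part of the original slides; it uses the example vector $x = [2, 1, -4, -2]'$ from the slides, an arbitrary illustrative vector `y`, and the linear-independence example.

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0, -2.0])
Lx = np.sqrt(x @ x)          # length sqrt(x'x) = 5
z = x / Lx                   # unit-length vector [0.4, 0.2, -0.8, -0.4]

# Angle between two vectors: cos(theta) = x'y / (Lx * Ly)
y = np.array([1.0, 3.0, -2.0, 0.0])    # arbitrary illustrative vector, not from the slides
cos_theta = (x @ y) / (np.sqrt(x @ x) * np.sqrt(y @ y))
theta_deg = np.degrees(np.arccos(cos_theta))

# Projection of x on y: (x'y / y'y) y, with length |x'y| / Ly = Lx |cos(theta)|
proj = (x @ y) / (y @ y) * y
proj_length = np.abs(x @ y) / np.sqrt(y @ y)

print(Lx, z, theta_deg, proj, proj_length)

# Linear-independence example: x1, x2, x3 are independent iff the matrix with
# these columns has full rank (equivalently, only c = 0 solves Xc = 0).
X = np.column_stack([[1, 2, 1], [1, 0, -1], [1, -2, 1]])
print(np.linalg.matrix_rank(X))   # 3 -> linearly independent
```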
Matrix Addition

Two matrices $A_{n\times p} = \{a_{ij}\}$ and $B_{n\times p} = \{b_{ij}\}$ of the same dimensions can be added element by element. The resulting matrix is $C_{n\times p} = \{c_{ij}\} = \{a_{ij} + b_{ij}\}$:
$C = A + B = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{np} \end{bmatrix} + \begin{bmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & \cdots & a_{1p}+b_{1p} \\ \vdots & & \vdots \\ a_{n1}+b_{n1} & \cdots & a_{np}+b_{np} \end{bmatrix}$

Examples

$\begin{bmatrix} 2 & 1 & -4 \\ 5 & 7 & 0 \end{bmatrix}' = \begin{bmatrix} 2 & 5 \\ 1 & 7 \\ -4 & 0 \end{bmatrix}$
$6 \times \begin{bmatrix} 2 & 1 & -4 \\ 5 & 7 & 0 \end{bmatrix} = \begin{bmatrix} 12 & 6 & -24 \\ 30 & 42 & 0 \end{bmatrix}$
$\begin{bmatrix} 2 & 1 \\ 5 & 7 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 5 & 10 \end{bmatrix}$

Matrix Multiplication

• Multiplication of two matrices $A_{n\times p}$ and $B_{m\times q}$ can be carried out only if the matrices are compatible for multiplication:
– $A_{n\times p} \times B_{m\times q}$: compatible if $p = m$.
– $B_{m\times q} \times A_{n\times p}$: compatible if $q = n$.
• The element in the $i$-th row and the $j$-th column of $A \times B$ is the inner product of the $i$-th row of $A$ with the $j$-th column of $B$.

Multiplication Examples

$\begin{bmatrix} 2 & 0 & 1 \\ 5 & 1 & 3 \end{bmatrix} \times \begin{bmatrix} 1 & 4 \\ -1 & 3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 10 \\ 4 & 29 \end{bmatrix}$
$\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} \times \begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 11 \\ 2 & 29 \end{bmatrix}$
$\begin{bmatrix} 1 & 4 \\ -1 & 3 \end{bmatrix} \times \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} = \begin{bmatrix} 22 & 13 \\ 13 & 8 \end{bmatrix}$
Note that the last two products show that, in general, $AB \neq BA$.

Identity Matrix

• An identity matrix, denoted by $I$, is a square matrix with 1's along the main diagonal and 0's everywhere else. For example,
$I_{2\times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_{3\times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
• If $A$ is a square matrix, then $AI = IA = A$.
• $I_{n\times n} A_{n\times p} = A_{n\times p}$, but $A_{n\times p} I_{n\times n}$ is not defined for $p \neq n$.

Symmetric Matrices

• A square matrix $A$ is symmetric if $A = A'$.
• If a square matrix $A$ has elements $\{a_{ij}\}$, then $A$ is symmetric if $a_{ij} = a_{ji}$.
• Examples:
$\begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} \qquad \begin{bmatrix} 5 & 1 & -3 \\ 1 & 12 & -5 \\ -3 & -5 & 9 \end{bmatrix}$

Inverse Matrix

• Consider two square matrices $A_{k\times k}$ and $B_{k\times k}$. If
$AB = BA = I,$
then $B$ is the inverse of $A$, denoted $A^{-1}$.
• The inverse of $A$ exists only if the columns of $A$ are linearly independent.
• If $A = \text{diag}\{a_{jj}\}$ is diagonal, then $A^{-1} = \text{diag}\{1/a_{jj}\}$.

Inverse Matrix (cont'd)

• For a $2 \times 2$ matrix $A$, the inverse is
$A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{\det(A)} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix},$
where $\det(A) = (a_{11} \times a_{22}) - (a_{12} \times a_{21})$ denotes the determinant of $A$.

Orthogonal Matrices

• A square matrix $Q$ is orthogonal if
$QQ' = Q'Q = I, \quad \text{or equivalently} \quad Q' = Q^{-1}.$
• If $Q$ is orthogonal, its rows and columns have unit length ($q_j'q_j = 1$) and are mutually perpendicular ($q_j'q_k = 0$ for any $j \neq k$).

Eigenvalues and Eigenvectors

• A square matrix $A$ has an eigenvalue $\lambda$ with corresponding eigenvector $z \neq 0$ if
$Az = \lambda z.$
• The eigenvalues of $A$ are the solutions to $|A - \lambda I| = 0$.
• A normalized eigenvector (of unit length) is denoted by $e$.
• A $k \times k$ symmetric matrix $A$ has $k$ pairs of eigenvalues and eigenvectors
$\lambda_1, e_1 \quad \lambda_2, e_2 \quad \ldots \quad \lambda_k, e_k,$
where $e_i'e_i = 1$, $e_i'e_j = 0$ for $i \neq j$, and the eigenvectors are unique up to a change in sign unless two or more eigenvalues are equal.

Spectral Decomposition

• Eigenvalues and eigenvectors will play an important role in this course. For example, principal components are based on the eigenvalues and eigenvectors of sample covariance matrices.
• The spectral decomposition of a $k \times k$ symmetric matrix $A$ is
$A = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_k e_k e_k' = [e_1\ e_2\ \cdots\ e_k] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix} [e_1\ e_2\ \cdots\ e_k]' = P\Lambda P'$
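As a quick numerical check of the ideas above (a sketch, not part of the original slides), the block below multiplies the $2 \times 2$ example matrices in both orders, verifies the $2 \times 2$ inverse formula against `numpy.linalg.inv`, and reconstructs the symmetric $3 \times 3$ example from its spectral decomposition $P\Lambda P'$.

```python
import numpy as np

# Matrix multiplication examples from the slides: AB and BA generally differ
A2 = np.array([[2.0, 1.0], [5.0, 3.0]])
B2 = np.array([[1.0, 4.0], [-1.0, 3.0]])
print(A2 @ B2)   # [[ 1, 11], [ 2, 29]]
print(B2 @ A2)   # [[22, 13], [13,  8]]

# 2x2 inverse via the determinant formula, checked against np.linalg.inv
det = A2[0, 0] * A2[1, 1] - A2[0, 1] * A2[1, 0]
A2_inv = np.array([[A2[1, 1], -A2[0, 1]], [-A2[1, 0], A2[0, 0]]]) / det
print(np.allclose(A2_inv, np.linalg.inv(A2)))   # True

# Spectral decomposition of the symmetric 3x3 example: A = P Lam P'
S = np.array([[5.0, 1.0, -3.0],
              [1.0, 12.0, -5.0],
              [-3.0, -5.0, 9.0]])
lam, P = np.linalg.eigh(S)                      # ascending eigenvalues, orthonormal P
print(np.allclose(P @ np.diag(lam) @ P.T, S))   # True: S = P Lam P'
print(np.allclose(P.T @ P, np.eye(3)))          # eigenvectors are orthonormal
```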
Determinant and Trace

• The trace of a $k \times k$ matrix $A$ is the sum of its diagonal elements, i.e., $\text{trace}(A) = \sum_{i=1}^{k} a_{ii}$.
• The trace of a square, symmetric matrix $A$ is the sum of its eigenvalues, i.e., $\text{trace}(A) = \sum_{i=1}^{k} a_{ii} = \sum_{i=1}^{k} \lambda_i$.
• The determinant of a square, symmetric matrix $A$ is the product of its eigenvalues, i.e., $|A| = \prod_{i=1}^{k} \lambda_i$.

Rank of a Matrix

• The rank of a square matrix $A$ is
– the number of linearly independent rows,
– the number of linearly independent columns,
– the number of non-zero eigenvalues.
• The inverse of a $k \times k$ matrix $A$ exists if and only if $\text{rank}(A) = k$, i.e., there are no zero eigenvalues.

Positive Definite Matrix

• For a $k \times k$ symmetric matrix $A$ and a vector $x = [x_1, x_2, \ldots, x_k]'$, the quantity $x'Ax$ is called a quadratic form.
• Note that $x'Ax = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij} x_i x_j$.
• If $x'Ax \geq 0$ for every vector $x$, both $A$ and the quadratic form are said to be non-negative definite.
• If $x'Ax > 0$ for every vector $x \neq 0$, both $A$ and the quadratic form are said to be positive definite.

Example 2.11

• Show that the matrix of the quadratic form $3x_1^2 + 2x_2^2 - 2\sqrt{2}\,x_1 x_2$ is positive definite.
• For
$A = \begin{bmatrix} 3 & -\sqrt{2} \\ -\sqrt{2} & 2 \end{bmatrix},$
the eigenvalues are $\lambda_1 = 4$, $\lambda_2 = 1$. Then $A = 4e_1e_1' + e_2e_2'$. Write
$x'Ax = 4x'e_1e_1'x + x'e_2e_2'x = 4y_1^2 + y_2^2 \geq 0,$
which is zero only for $y_1 = y_2 = 0$.

Example 2.11 (cont'd)

• $y_1$ and $y_2$ cannot both be zero (for $x \neq 0$) because
$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} e_1' \\ e_2' \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P_{2\times 2}'\, x_{2\times 1}$
with $P'$ orthonormal, so that $(P')^{-1} = P$. Then $x = Py$, and since $x \neq 0$ it follows that $y \neq 0$.
• Using the spectral decomposition, we can show that:
– $A$ is positive definite if all of its eigenvalues are positive.
– $A$ is non-negative definite if all of its eigenvalues are $\geq 0$.

Distance and Quadratic Forms

• For $x = [x_1, x_2, \ldots, x_p]'$ and a $p \times p$ positive definite matrix $A$, $d^2 = x'Ax > 0$ when $x \neq 0$. Thus, a positive definite quadratic form can be interpreted as a squared distance of $x$ from the origin, and vice versa.
• The squared distance from $x$ to a fixed point $\mu$ is given by the quadratic form $(x - \mu)'A(x - \mu)$.

Distance and Quadratic Forms (cont'd)

• We can interpret distance in terms of the eigenvalues and eigenvectors of $A$ as well. Any point $x$ at constant distance $c$ from the origin satisfies
$x'Ax = x'\Big(\sum_{j=1}^{p} \lambda_j e_j e_j'\Big)x = \sum_{j=1}^{p} \lambda_j (x'e_j)^2 = c^2,$
the expression for an ellipsoid in $p$ dimensions.
• Note that the point $x = c\lambda_1^{-1/2} e_1$ is at distance $c$ (in the direction of $e_1$) from the origin because it satisfies $x'Ax = c^2$. The same is true for the points $x = c\lambda_j^{-1/2} e_j$, $j = 1, \ldots, p$. Thus, all points at distance $c$ lie on an ellipsoid with axes in the directions of the eigenvectors and with lengths proportional to $\lambda_j^{-1/2}$.

Distance and Quadratic Forms (cont'd)

[Figure: points at constant distance $c$ from the origin lie on an ellipse with axes along the eigenvector directions.]

Square-Root Matrices

• The spectral decomposition of a positive definite matrix $A$ yields
$A = \sum_{j=1}^{p} \lambda_j e_j e_j' = P\Lambda P',$
with $\Lambda_{p\times p} = \text{diag}\{\lambda_j\}$, all $\lambda_j > 0$, and $P_{p\times p} = [e_1\ e_2\ \ldots\ e_p]$ an orthonormal matrix of eigenvectors. Then
$A^{-1} = P\Lambda^{-1}P' = \sum_{j=1}^{p} \frac{1}{\lambda_j}\, e_j e_j'.$
• With $\Lambda^{1/2} = \text{diag}\{\lambda_j^{1/2}\}$, a square-root matrix is
$A^{1/2} = P\Lambda^{1/2}P' = \sum_{j=1}^{p} \sqrt{\lambda_j}\, e_j e_j'.$

Square-Root Matrices (cont'd)

The square root of a positive definite matrix $A$ has the following properties:
1. Symmetry: $(A^{1/2})' = A^{1/2}$
2. $A^{1/2}A^{1/2} = A$
3. $A^{-1/2} = \sum_{j=1}^{p} \lambda_j^{-1/2}\, e_j e_j' = P\Lambda^{-1/2}P'$
4. $A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I$
5. $A^{-1/2}A^{-1/2} = A^{-1}$
Note that there are other ways of defining the square root of a positive definite matrix: in the Cholesky decomposition $A = LL'$, with $L$ a lower triangular matrix, $L$ is also called a square root of $A$.
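The square-root construction can be illustrated with the Example 2.11 matrix. The NumPy sketch below is not from the slides; it builds $A^{1/2}$ and $A^{-1/2}$ from the spectral decomposition, checks the properties listed above, and shows the Cholesky factor as an alternative square root.

```python
import numpy as np

# Matrix of the quadratic form in Example 2.11
A = np.array([[3.0, -np.sqrt(2.0)],
              [-np.sqrt(2.0), 2.0]])

lam, P = np.linalg.eigh(A)
print(lam)                       # [1., 4.] -> all positive, so A is positive definite

# Square-root matrix A^{1/2} = P Lam^{1/2} P' and its inverse A^{-1/2}
A_half = P @ np.diag(np.sqrt(lam)) @ P.T
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T

print(np.allclose(A_half, A_half.T))                           # A^{1/2} is symmetric
print(np.allclose(A_half @ A_half, A))                         # A^{1/2} A^{1/2} = A
print(np.allclose(A_half @ A_neg_half, np.eye(2)))             # A^{1/2} A^{-1/2} = I
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A)))  # A^{-1/2} A^{-1/2} = A^{-1}

# A Cholesky factor L (A = L L') is another valid "square root",
# but it is lower triangular rather than symmetric.
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))
```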
Random Vectors and Matrices

• A random matrix (vector) is a matrix (vector) whose elements are random variables.
• If $X_{n\times p}$ is a random matrix, the expected value of $X$ is the $n \times p$ matrix
$E(X) = \begin{bmatrix} E(X_{11}) & E(X_{12}) & \cdots & E(X_{1p}) \\ E(X_{21}) & E(X_{22}) & \cdots & E(X_{2p}) \\ \vdots & \vdots & & \vdots \\ E(X_{n1}) & E(X_{n2}) & \cdots & E(X_{np}) \end{bmatrix},$
where
$E(X_{ij}) = \int_{-\infty}^{\infty} x_{ij}\, f_{ij}(x_{ij})\, dx_{ij},$
with $f_{ij}(x_{ij})$ the density function of the continuous random variable $X_{ij}$. If $X_{ij}$ is a discrete random variable, we compute its expectation as a sum rather than an integral.

Linear Combinations

• The usual rules for expectations apply. If $X$ and $Y$ are two random matrices and $A$ and $B$ are two constant matrices of the appropriate dimensions, then
$E(X + Y) = E(X) + E(Y)$
$E(AX) = AE(X)$
$E(AXB) = AE(X)B$
$E(AX + BY) = AE(X) + BE(Y)$
• Further, if $c$ is a scalar-valued constant, then $E(cX) = cE(X)$.

Mean Vectors and Covariance Matrices

• Suppose that $X$ is a $p \times 1$ (continuous) random vector drawn from some $p$-dimensional distribution.
• Each element of $X$, say $X_j$, has its own marginal distribution with marginal mean $\mu_j$ and variance $\sigma_{jj}$ defined in the usual way:
$\mu_j = \int_{-\infty}^{\infty} x_j\, f_j(x_j)\, dx_j$
$\sigma_{jj} = \int_{-\infty}^{\infty} (x_j - \mu_j)^2 f_j(x_j)\, dx_j$

Mean Vectors and Covariance Matrices (cont'd)

• To examine the association between a pair of random variables we need to consider their joint distribution.
• A measure of the linear association between pairs of variables is given by the covariance
$\sigma_{jk} = E\big[(X_j - \mu_j)(X_k - \mu_k)\big] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_j - \mu_j)(x_k - \mu_k)\, f_{jk}(x_j, x_k)\, dx_j\, dx_k.$

Mean Vectors and Covariance Matrices (cont'd)

• If the joint density function $f_{jk}(x_j, x_k)$ can be written as the product of the two marginal densities, i.e., $f_{jk}(x_j, x_k) = f_j(x_j) f_k(x_k)$, then $X_j$ and $X_k$ are independent.
• More generally, the $p$-dimensional random vector $X$ has mutually independent elements if the $p$-dimensional joint density function can be written as the product of the $p$ univariate marginal densities.
• If two random variables $X_j$ and $X_k$ are independent, then their covariance is equal to 0. [The converse is not always true.]

Mean Vectors and Covariance Matrices (cont'd)

• We use $\mu$ to denote the $p \times 1$ vector of marginal population means and $\Sigma$ to denote the $p \times p$ population variance-covariance matrix:
$\Sigma = E\big[(X - \mu)(X - \mu)'\big].$
• If we carry out the multiplication (an outer product), then $\Sigma$ is equal to
$E\begin{bmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \cdots & (X_1 - \mu_1)(X_p - \mu_p) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 & \cdots & (X_2 - \mu_2)(X_p - \mu_p) \\ \vdots & \vdots & & \vdots \\ (X_p - \mu_p)(X_1 - \mu_1) & (X_p - \mu_p)(X_2 - \mu_2) & \cdots & (X_p - \mu_p)^2 \end{bmatrix}.$

Mean Vectors and Covariance Matrices (cont'd)

• By taking expectations element-wise we find that
$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}.$
• Since $\sigma_{jk} = \sigma_{kj}$ for all $j \neq k$, we note that $\Sigma$ is symmetric.
• $\Sigma$ is also non-negative definite.

Correlation Matrix

• The population correlation matrix, denoted $\rho$, is the $p \times p$ matrix with off-diagonal elements equal to $\rho_{jk}$ and diagonal elements equal to 1:
$\rho = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}.$
• Since $\rho_{jk} = \rho_{kj}$, the correlation matrix is symmetric.
• The correlation matrix is also non-negative definite.

Correlation Matrix (cont'd)

• The $p \times p$ population standard deviation matrix $V^{1/2}$ is a diagonal matrix with $\sqrt{\sigma_{jj}}$ along the diagonal and zeros in all off-diagonal positions. Then
$\Sigma = V^{1/2}\rho\,V^{1/2},$
and the population correlation matrix is
$\rho = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}.$
• Given $\Sigma$, we can easily obtain the correlation matrix.
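To make the $\Sigma$–$\rho$ relationship concrete, the sketch below (not from the slides; the covariance matrix `Sigma` is a hypothetical example chosen for illustration) forms $V^{1/2}$ from the diagonal of $\Sigma$ and recovers the correlation matrix as $(V^{1/2})^{-1}\Sigma(V^{1/2})^{-1}$.

```python
import numpy as np

# A hypothetical population covariance matrix Sigma (illustration only)
Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])

# Standard deviation matrix V^{1/2}: diagonal with sqrt(sigma_jj)
V_half = np.diag(np.sqrt(np.diag(Sigma)))
V_half_inv = np.linalg.inv(V_half)

# Correlation matrix rho = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1}
rho = V_half_inv @ Sigma @ V_half_inv
print(rho)                                         # ones on the diagonal, rho_jk off it
print(np.allclose(V_half @ rho @ V_half, Sigma))   # Sigma = V^{1/2} rho V^{1/2}

# Sigma (and hence rho) is non-negative definite: no negative eigenvalues
print(np.linalg.eigvalsh(Sigma).min() >= 0)
```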
Partitioning Random Vectors

• If we partition the random $p \times 1$ vector $X$ into two components $X_1$, $X_2$ of dimensions $q \times 1$ and $(p - q) \times 1$, respectively, then the mean vector and the variance-covariance matrix must be partitioned accordingly.
• Partitioned mean vector:
$E(X) = E\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$
• Partitioned variance-covariance matrix:
$\Sigma = \begin{bmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix},$
where $\Sigma_{11}$ is $q \times q$, $\Sigma_{12}$ is $q \times (p - q)$, and $\Sigma_{22}$ is $(p - q) \times (p - q)$.

Partitioning Covariance Matrices (cont'd)

• $\Sigma_{11}$ and $\Sigma_{22}$ are the variance-covariance matrices of the sub-vectors $X_1$ and $X_2$, respectively. The off-diagonal elements in those two matrices reflect linear associations among elements within each sub-vector.
• There are no variances in $\Sigma_{12}$, only covariances. These covariances reflect linear associations between elements in the two different sub-vectors.

Linear Combinations of Random Variables

• Let $X$ be a $p \times 1$ vector with mean $\mu$ and variance-covariance matrix $\Sigma$, and let $c$ be a $p \times 1$ vector of constants. Then the linear combination $c'X$ has mean and variance
$E(c'X) = c'\mu \quad \text{and} \quad \text{Var}(c'X) = c'\Sigma c.$
• In general, the mean and variance of a $q \times 1$ vector of linear combinations $Z = C_{q\times p} X_{p\times 1}$ are
$\mu_Z = C\mu_X \quad \text{and} \quad \Sigma_Z = C\Sigma_X C'.$

Cauchy-Schwarz Inequality

• We will need some of the results below to derive some maximization results later in the course.
• Cauchy-Schwarz inequality: Let $b$ and $d$ be any two $p \times 1$ vectors. Then
$(b'd)^2 \leq (b'b)(d'd),$
with equality only if $b = cd$ for some scalar constant $c$.
• Proof: The equality is obvious for $b = 0$ or $d = 0$. For other cases, consider $b - cd$ for any constant $c \neq 0$. If $b - cd \neq 0$, we have
$0 < (b - cd)'(b - cd) = b'b - 2c(b'd) + c^2 d'd,$
since $b - cd$ must have positive length.

Cauchy-Schwarz Inequality (cont'd)

We can add and subtract $(b'd)^2/(d'd)$ to obtain
$0 < b'b - 2c(b'd) + c^2 d'd - \frac{(b'd)^2}{d'd} + \frac{(b'd)^2}{d'd} = b'b - \frac{(b'd)^2}{d'd} + (d'd)\left(c - \frac{b'd}{d'd}\right)^2.$
Since $c$ can be anything, we can choose $c = b'd/(d'd)$. Then
$0 < b'b - \frac{(b'd)^2}{d'd} \;\Rightarrow\; (b'd)^2 < (b'b)(d'd) \quad \text{for } b \neq cd$
(otherwise, we have equality).

Extended Cauchy-Schwarz Inequality

• If $b$ and $d$ are any two $p \times 1$ vectors and $B$ is a $p \times p$ positive definite matrix, then
$(b'd)^2 \leq (b'Bb)(d'B^{-1}d),$
with equality if and only if $b = cB^{-1}d$ or $d = cBb$ for some constant $c$.
• Proof: Consider $B^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, e_i e_i'$ and $B^{-1/2} = \sum_{i=1}^{p} \frac{1}{\sqrt{\lambda_i}}\, e_i e_i'$. Then we can write
$b'd = b'Id = b'B^{1/2}B^{-1/2}d = (B^{1/2}b)'(B^{-1/2}d) = b^{*\prime}d^*.$
To complete the proof, simply apply the Cauchy-Schwarz inequality to the vectors $b^*$ and $d^*$.

Optimization

• Let $B$ be positive definite and let $d$ be any $p \times 1$ vector. Then
$\max_{x \neq 0} \frac{(x'd)^2}{x'Bx} = d'B^{-1}d$
is attained when $x = cB^{-1}d$ for any constant $c \neq 0$.
• Proof: By the extended Cauchy-Schwarz inequality, $(x'd)^2 \leq (x'Bx)(d'B^{-1}d)$. Since $x \neq 0$ and $B$ is positive definite, $x'Bx > 0$, and we can divide both sides by $x'Bx$ to get the upper bound
$\frac{(x'd)^2}{x'Bx} \leq d'B^{-1}d.$
Differentiating the left-hand side with respect to $x$ shows that the maximum is attained at $x = cB^{-1}d$.
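The optimization result can be checked by simulation. In the sketch below (not from the slides; `B` and `d` are arbitrary illustrative choices), random vectors never exceed the bound $d'B^{-1}d$, while $x = cB^{-1}d$ attains it.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary positive definite B and an arbitrary vector d (illustration only)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4.0 * np.eye(4)      # positive definite by construction
d = rng.standard_normal(4)

bound = d @ np.linalg.inv(B) @ d   # claimed maximum: d'B^{-1}d

def ratio(x):
    return (x @ d) ** 2 / (x @ B @ x)

# Random vectors never exceed the bound ...
print(all(ratio(rng.standard_normal(4)) <= bound + 1e-12 for _ in range(1000)))

# ... and x = c B^{-1} d attains it (for any c != 0)
x_star = 3.0 * np.linalg.solve(B, d)
print(np.isclose(ratio(x_star), bound))
```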
Maximization of a Quadratic Form on a Unit Sphere

• Let $B$ be positive definite with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$ and associated (normalized) eigenvectors $e_1, e_2, \ldots, e_p$. Then
$\max_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_1, \quad \text{attained when } x = e_1,$
$\min_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_p, \quad \text{attained when } x = e_p.$
• Furthermore, for $k = 1, 2, \ldots, p - 1$,
$\max_{x \perp e_1, e_2, \ldots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1}, \quad \text{attained when } x = e_{k+1}.$
• See the proof at the end of Chapter 2 in the textbook (pages 80–81).
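The same kind of numerical check applies to the maximization of $x'Bx/x'x$ on the unit sphere. In the sketch below (not from the slides; `B` is an arbitrary positive definite example), the Rayleigh quotient equals $\lambda_1$ at $e_1$, $\lambda_p$ at $e_p$, and stays at or below $\lambda_2$ over vectors orthogonal to $e_1$.

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary positive definite B (illustration only)
M = rng.standard_normal((5, 5))
B = M @ M.T + np.eye(5)

lam, E = np.linalg.eigh(B)        # ascending: lam[-1] = lambda_1, lam[0] = lambda_p

def rayleigh(x):
    return (x @ B @ x) / (x @ x)

# The quotient is maximized/minimized at the extreme eigenvectors
print(np.isclose(rayleigh(E[:, -1]), lam[-1]))   # max = lambda_1, attained at e_1
print(np.isclose(rayleigh(E[:, 0]), lam[0]))     # min = lambda_p, attained at e_p

# Restricted to x perpendicular to e_1, the maximum drops to lambda_2:
# project random vectors onto the orthogonal complement of e_1 and compare
e1 = E[:, -1]
best = -np.inf
for _ in range(5000):
    x = rng.standard_normal(5)
    x = x - (x @ e1) * e1          # enforce x perpendicular to e_1
    best = max(best, rayleigh(x))
print(best <= lam[-2] + 1e-9, np.isclose(rayleigh(E[:, -2]), lam[-2]))
```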