L3: Review of linear algebra

• Vector and matrix notation
• Vectors
• Matrices
• Vector spaces
• Linear transformations
• Eigenvalues and eigenvectors
• MATLAB® primer

Vector and matrix notation

– A d-dimensional (column) vector $x$ and its transpose are written as:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \quad \text{and} \quad x^T = \begin{bmatrix} x_1 & x_2 & \dots & x_d \end{bmatrix}$$

– An $n \times d$ (rectangular) matrix $A$ and its transpose are written as:
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1d} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2d} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nd} \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a_{11} & a_{21} & \dots & a_{n1} \\ a_{12} & a_{22} & \dots & a_{n2} \\ a_{13} & a_{23} & \dots & a_{n3} \\ \vdots & & \ddots & \vdots \\ a_{1d} & a_{2d} & \dots & a_{nd} \end{bmatrix}$$

– The product of an $m \times d$ matrix $A$ and a $d \times n$ matrix $B$ is the $m \times n$ matrix $C = AB$ whose elements are
$$c_{ij} = \sum_{k=1}^{d} a_{ik} b_{kj}$$

Vectors

– The inner product (a.k.a. dot product or scalar product) of two vectors is defined by
$$\langle x, y \rangle = x^T y = y^T x = \sum_{k=1}^{d} x_k y_k$$

– The magnitude of a vector is
$$\|x\| = \sqrt{x^T x} = \left( \sum_{k=1}^{d} x_k x_k \right)^{1/2}$$

– The orthogonal projection of vector $y$ onto vector $x$ is $(y^T u_x)\, u_x$
  • where the vector $u_x = x / \|x\|$ has unit magnitude and the same direction as $x$
  [Figure: the projection of y onto x has length $y^T u_x$ along the unit vector $u_x$]

– The angle between vectors $x$ and $y$ satisfies
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\| \, \|y\|}$$

– Two vectors $x$ and $y$ are said to be
  • orthogonal if $x^T y = 0$
  • orthonormal if $x^T y = 0$ and $\|x\| = \|y\| = 1$

– A set of vectors $x_1, x_2, \dots, x_n$ is said to be linearly dependent if there exists a set of coefficients $a_1, a_2, \dots, a_n$ (at least one different from zero) such that
$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = 0$$

– Conversely, a set of vectors $x_1, x_2, \dots, x_n$ is said to be linearly independent if
$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = 0 \;\Rightarrow\; a_k = 0 \;\; \forall k$$

Matrices

– The determinant of a square matrix $A_{d \times d}$, expanded along any fixed row $i$, is
$$|A| = \sum_{k=1}^{d} (-1)^{k+i} \, a_{ik} \, |A_{ik}|$$
  • where $A_{ik}$ is the minor formed by removing the ith row and the kth column of $A$
  • NOTE: the determinant of a square matrix and that of its transpose are the same: $|A| = |A^T|$

– The trace of a square matrix $A_{d \times d}$ is the sum of its diagonal elements
$$tr(A) = \sum_{k=1}^{d} a_{kk}$$

– The rank of a matrix is the number of linearly independent rows (or columns)

– A square matrix is said to be non-singular if and only if its rank equals the number of rows (or columns)
  • A non-singular matrix has a non-zero determinant

– A square matrix is said to be orthonormal if $AA^T = A^T A = I$

– For a square matrix $A$
  • if $x^T A x > 0 \;\; \forall x \neq 0$, then $A$ is said to be positive-definite (e.g., a covariance matrix)
  • if $x^T A x \geq 0 \;\; \forall x \neq 0$, then $A$ is said to be positive-semi-definite

– The inverse of a square matrix $A$ is denoted by $A^{-1}$ and is such that $AA^{-1} = A^{-1}A = I$
  • The inverse $A^{-1}$ of a matrix $A$ exists if and only if $A$ is non-singular

– The pseudo-inverse matrix $A^\dagger$ is typically used whenever $A^{-1}$ does not exist (because $A$ is not square or $A$ is singular):
$$A^\dagger = (A^T A)^{-1} A^T, \quad \text{with } A^\dagger A = I \;\; (\text{assuming } A^T A \text{ is non-singular})$$
  • Note that $A A^\dagger \neq I$ in general

Vector spaces

– The n-dimensional space in which all n-dimensional vectors reside is called a vector space

– A set of vectors $\{u_1, u_2, \dots, u_n\}$ is said to form a basis for a vector space if any arbitrary vector $x$ can be represented by a linear combination of the $u_i$:
$$x = a_1 u_1 + a_2 u_2 + \dots + a_n u_n$$
  • The coefficients $\{a_1, a_2, \dots, a_n\}$ are called the components of vector $x$ with respect to the basis $\{u_i\}$
  • In order to form a basis, it is necessary and sufficient that the $\{u_i\}$ vectors be linearly independent
  [Figure: the vector x expressed by its components $a_1, a_2, a_3$ along the basis vectors $u_1, u_2, u_3$]

– A basis $\{u_i\}$ is said to be orthonormal if
$$u_i^T u_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
  • As an example, the Cartesian coordinate base is an orthonormal base

– A basis $\{u_i\}$ is said to be orthogonal if
$$u_i^T u_j \begin{cases} \neq 0 & i = j \\ = 0 & i \neq j \end{cases}$$
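Since the outline includes a MATLAB® primer, a minimal MATLAB sketch of the operations above may help; the particular vectors and matrices below are made-up illustration values, not part of the lecture, and only standard built-in functions (norm, det, trace, rank) are used.

    % Basic vector operations
    x = [1; 2; 3];                 % d-dimensional column vectors
    y = [4; 5; 6];
    ip    = x' * y;                % inner product <x,y> = x'y
    magx  = sqrt(x' * x);          % magnitude |x| (same as norm(x))
    ux    = x / norm(x);           % unit vector with the same direction as x
    projy = (y' * ux) * ux;        % orthogonal projection of y onto x
    theta = acos((x' * y) / (norm(x) * norm(y)));   % angle between x and y

    % Basic matrix operations
    A = [1 0; 0 1; 1 1];           % n x d matrix (n = 3, d = 2), full column rank
    S = A' * A;                    % square 2 x 2 product, s_ij = sum_k a_ki * a_kj
    detS  = det(S);                % determinant
    trS   = trace(S);              % trace (sum of diagonal elements)
    rankA = rank(A);               % number of linearly independent columns (2 here)

    % Pseudo-inverse: A' * A is non-singular for this A, so the formula applies
    Adag = (A' * A) \ A';          % A† = (A'A)^(-1) A'
    Adag * A                       % equals the 2 x 2 identity
    A * Adag                       % is NOT the identity in general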
– Given n linearly independent vectors $\{x_1, x_2, \dots, x_n\}$, we can construct an orthonormal basis $\{\phi_1, \phi_2, \dots, \phi_n\}$ for the vector space spanned by $\{x_i\}$ with the Gram-Schmidt orthonormalization procedure (to be discussed in the RBF lecture)

– The distance between two points in a vector space is defined as the magnitude of the vector difference between the points
$$d_E(x, y) = \|x - y\| = \left( \sum_{k=1}^{d} (x_k - y_k)^2 \right)^{1/2}$$
  • This is also called the Euclidean distance

The Gram-Schmidt orthogonalization process

Let V be a vector space with an inner product. Suppose $x_1, x_2, \dots, x_n$ is a basis for V. Let
$$v_1 = x_1,$$
$$v_2 = x_2 - \frac{\langle x_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1,$$
$$v_3 = x_3 - \frac{\langle x_3, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \frac{\langle x_3, v_2 \rangle}{\langle v_2, v_2 \rangle} v_2,$$
$$\vdots$$
$$v_n = x_n - \frac{\langle x_n, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \dots - \frac{\langle x_n, v_{n-1} \rangle}{\langle v_{n-1}, v_{n-1} \rangle} v_{n-1}.$$
Then $v_1, v_2, \dots, v_n$ is an orthogonal basis for V.

Orthogonalization / Normalization

An alternative form of the Gram-Schmidt process combines orthogonalization with normalization. Suppose $x_1, x_2, \dots, x_n$ is a basis for an inner product space V. Let
$$v_1 = x_1, \qquad w_1 = \frac{v_1}{\|v_1\|},$$
$$v_2 = x_2 - \langle x_2, w_1 \rangle w_1, \qquad w_2 = \frac{v_2}{\|v_2\|},$$
$$v_3 = x_3 - \langle x_3, w_1 \rangle w_1 - \langle x_3, w_2 \rangle w_2, \qquad w_3 = \frac{v_3}{\|v_3\|},$$
$$\vdots$$
$$v_n = x_n - \langle x_n, w_1 \rangle w_1 - \dots - \langle x_n, w_{n-1} \rangle w_{n-1}, \qquad w_n = \frac{v_n}{\|v_n\|}.$$
Then $w_1, w_2, \dots, w_n$ is an orthonormal basis for V.

Example

Let $V = \mathbb{R}^3$ with the Euclidean inner product. We will apply the Gram-Schmidt algorithm to orthogonalize the basis $\{(1, -1, 1), (1, 0, 1), (1, 1, 2)\}$.

Step 1: $v_1 = (1, -1, 1)$.

Step 2:
$$v_2 = (1, 0, 1) - \frac{\langle (1, 0, 1), (1, -1, 1) \rangle}{\|(1, -1, 1)\|^2} (1, -1, 1) = (1, 0, 1) - \frac{2}{3}(1, -1, 1) = \left( \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3} \right).$$

Step 3:
$$v_3 = (1, 1, 2) - \frac{\langle (1, 1, 2), (1, -1, 1) \rangle}{\|(1, -1, 1)\|^2} (1, -1, 1) - \frac{\langle (1, 1, 2), (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}) \rangle}{\|(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3})\|^2} \left( \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3} \right)$$
$$= (1, 1, 2) - \frac{2}{3}(1, -1, 1) - \frac{5/3}{2/3} \left( \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3} \right) = \left( -\tfrac{1}{2}, 0, \tfrac{1}{2} \right).$$

You can verify that $\{(1, -1, 1), (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}), (-\tfrac{1}{2}, 0, \tfrac{1}{2})\}$ form an orthogonal basis for $\mathbb{R}^3$. Normalizing the vectors in the orthogonal basis, we obtain the orthonormal basis
$$\left\{ \left( \tfrac{\sqrt{3}}{3}, -\tfrac{\sqrt{3}}{3}, \tfrac{\sqrt{3}}{3} \right), \left( \tfrac{\sqrt{6}}{6}, \tfrac{\sqrt{6}}{3}, \tfrac{\sqrt{6}}{6} \right), \left( -\tfrac{\sqrt{2}}{2}, 0, \tfrac{\sqrt{2}}{2} \right) \right\}.$$
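A short MATLAB sketch of the combined orthogonalization / normalization form of the process, applied to the example basis above; the loop itself is not from the lecture, it simply reproduces the computation.

    X = [ 1  1  1;
         -1  0  1;
          1  1  2];                % columns are x1 = (1,-1,1), x2 = (1,0,1), x3 = (1,1,2)
    [d, n] = size(X);
    W = zeros(d, n);               % columns will hold the orthonormal basis w1, ..., wn
    for k = 1:n
        v = X(:, k);
        for j = 1:k-1
            v = v - (X(:, k)' * W(:, j)) * W(:, j);   % subtract the projection onto w_j
        end
        W(:, k) = v / norm(v);                        % normalize v_k to obtain w_k
    end
    % W' * W is (numerically) the identity, and the columns of W match the
    % orthonormal basis of the example: (1,-1,1)/sqrt(3), (1,2,1)/sqrt(6), (-1,0,1)/sqrt(2)

    dE = norm(X(:, 1) - X(:, 2));  % Euclidean distance between x1 and x2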
Linear transformations

– A linear transformation is a mapping from a vector space $X^N$ onto a vector space $Y^M$, and is represented by a matrix
  • Given a vector $x \in X^N$, the corresponding vector $y$ in $Y^M$ is computed as
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1N} \\ a_{21} & a_{22} & \dots & a_{2N} \\ \vdots & & \ddots & \vdots \\ a_{M1} & a_{M2} & \dots & a_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
  • Notice that the dimensionality of the two spaces does not need to be the same
  • For pattern recognition we typically have $M < N$ (project onto a lower-dimensional space)

– A linear transformation represented by a square matrix $A$ is said to be orthonormal when $AA^T = A^T A = I$
  • This implies that $A^T = A^{-1}$
  • An orthonormal transform has the property of preserving the magnitude of the vectors:
$$\|y\| = \sqrt{y^T y} = \sqrt{(Ax)^T (Ax)} = \sqrt{x^T A^T A x} = \sqrt{x^T x} = \|x\|$$
  • An orthonormal matrix can be thought of as a rotation of the reference frame
  • The row vectors of an orthonormal transform are a set of orthonormal basis vectors:
$$Y_{M \times 1} = \begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow a_2 \rightarrow \\ \vdots \\ \leftarrow a_N \rightarrow \end{bmatrix} X_{N \times 1} \quad \text{with} \quad a_i^T a_j = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases}$$

Eigenvectors and eigenvalues

– Given a matrix $A_{N \times N}$, we say that $v$ is an eigenvector* if there exists a scalar $\lambda$ (the eigenvalue) such that
$$Av = \lambda v$$

– Computing the eigenvalues:
$$Av = \lambda v \;\Rightarrow\; (A - \lambda I)\, v = 0 \;\Rightarrow\; \begin{cases} v = 0 & \text{trivial solution} \\ |A - \lambda I| = 0 & \text{non-trivial solution} \end{cases}$$
  • The non-trivial condition $|A - \lambda I| = 0$ yields the characteristic equation
$$\lambda^N + a_1 \lambda^{N-1} + a_2 \lambda^{N-2} + \dots + a_{N-1} \lambda + a_N = 0$$

*The "eigen-" in "eigenvector" translates as "characteristic"

– The matrix formed by the column eigenvectors is called the modal matrix $M$
  • Matrix $\Lambda$ is the canonical form of $A$: a diagonal matrix with the eigenvalues on the main diagonal
$$M = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ v_1 & v_2 & \dots & v_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} \qquad \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{bmatrix}$$

– Properties
  • If $A$ is non-singular, all eigenvalues are non-zero
  • If $A$ is real and symmetric, all eigenvalues are real
    – The eigenvectors associated with distinct eigenvalues are orthogonal
  • If $A$ is positive definite, all eigenvalues are positive

Interpretation of eigenvectors and eigenvalues

– If we view matrix $A$ as a linear transformation, an eigenvector represents an invariant direction in vector space
  • When transformed by $A$, any point lying on the direction defined by $v$ will remain on that direction, and its magnitude will be multiplied by $\lambda$
  [Figure: a point P on the direction of v is mapped by y = Ax to a point P' on the same direction, with its distance from the origin scaled from d to d' = λd]
  • For example, the transform that rotates 3-D vectors about the z axis has $v = [0 \;\; 0 \;\; 1]^T$ as its only real eigenvector, with eigenvalue $\lambda = 1$:
$$A = \begin{bmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad v = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$$

– Given the covariance matrix $\Sigma$ of a Gaussian distribution
  • The eigenvectors of $\Sigma$ are the principal directions of the distribution
  • The eigenvalues are the variances of the corresponding principal directions

– The linear transformation defined by the eigenvectors of $\Sigma$ leads to vectors that are uncorrelated regardless of the form of the distribution
  • If the distribution happens to be Gaussian, then the transformed vectors will be statistically independent
$$\Sigma M = M \Lambda \quad \text{with} \quad M = \begin{bmatrix} \uparrow & & \uparrow \\ v_1 & \dots & v_N \\ \downarrow & & \downarrow \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_N \end{bmatrix}$$
$$f_x(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$
$$y = M^T x \quad \Rightarrow \quad f_y(y) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - \mu_{y_i})^2}{2\lambda_i} \right)$$
  [Figure: a correlated Gaussian in the (x1, x2) coordinates with principal directions v1, v2, and the same distribution in the rotated (y1, y2) coordinates, where the contours are axis-aligned]
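To close the review, here is a hedged MATLAB sketch of these eigenvalue ideas: it checks the z-axis rotation example and decorrelates samples from a 2-D Gaussian using the eigenvectors of its covariance matrix. The rotation angle and the Gaussian parameters (mu, Sigma) are made-up illustration values, not from the lecture.

    % Rotation about the z axis: the only real eigenvector is [0 0 1]' with lambda = 1
    b = pi / 6;
    A = [cos(b) -sin(b) 0;
         sin(b)  cos(b) 0;
           0       0    1];
    [V, D] = eig(A);               % columns of V are eigenvectors, diag(D) the eigenvalues
                                   % (the other two eigenvalues are complex: exp(+-1i*b))

    % Decorrelating samples with the eigenvectors of the covariance matrix
    mu    = [1; 2];
    Sigma = [2 1;
             1 1];                 % covariance of the original (correlated) variables
    [M, L] = eig(Sigma);           % modal matrix M and canonical form L (Lambda)
    X = chol(Sigma, 'lower') * randn(2, 5000) + repmat(mu, 1, 5000);  % correlated Gaussian samples
    Y = M' * (X - repmat(mu, 1, 5000));   % y = M'(x - mu): rotate onto the principal directions
    cov(Y')                        % approximately diagonal, with the eigenvalues (variances) on the diagonal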