Mathematics for ML: Linear Algebra
Suk-Ju Kang, Department of Electronic Engineering, Sogang University

2 Linear Algebra
• Linear algebra is the study of vectors and of certain algebraic rules to manipulate them.
• Vectors are special objects that can be added together and multiplied by scalars to produce another object of the same kind.
• Examples
 ▪ Geometric vectors: directed segments, which have direction and magnitude (at least in two dimensions).
 ▪ Polynomials: polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts.
 ▪ Audio signals: the signals are represented as a series of numbers.
 ▪ Elements of ℝⁿ: ℝⁿ is more abstract than polynomials (and is what we focus on in this book).
[Figure: a mind map of the concepts introduced in this chapter, along with where they are used in other parts of the book.]

2.1 Systems of Linear Equations
• For a real-valued system of linear equations, we obtain either no, exactly one, or infinitely many solutions.
• In a system of linear equations with two variables x₁, x₂, each linear equation defines a line in the x₁x₂-plane.
 ▪ Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines.
 ▪ This intersection can be a line, a point, or empty.
• For a systematic approach to solving systems of linear equations, we introduce a useful compact notation.

2.2 Matrices
• Definition of Matrix: with m, n ∈ ℕ, a real-valued (m, n) matrix A is an m·n-tuple of elements aᵢⱼ, i = 1, …, m, j = 1, …, n, ordered according to a rectangular scheme consisting of m rows and n columns.

2.2.1 Matrix Addition and Multiplication
• The sum of two matrices A ∈ ℝ^(m×n), B ∈ ℝ^(m×n) is defined as the element-wise sum.
• For matrices A ∈ ℝ^(m×n), B ∈ ℝ^(n×k), the elements cᵢⱼ of the product C = AB ∈ ℝ^(m×k) are computed as cᵢⱼ = Σₗ₌₁ⁿ aᵢₗ bₗⱼ.
• Matrices can only be multiplied if their "neighboring" dimensions match.
• The product BA is not defined if m ≠ k, since the neighboring dimensions do not match.
• Matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., cᵢⱼ ≠ aᵢⱼ bᵢⱼ.
 ▪ This kind of element-wise multiplication often appears in programming languages when we multiply (multi-dimensional) arrays with each other, and is called the Hadamard product.
• Definition of Identity Matrix: in ℝ^(n×n), we define the identity matrix Iₙ as the n×n matrix containing 1 on the diagonal and 0 everywhere else.
• Properties of matrices:
 ▪ Associativity: (AB)C = A(BC)
 ▪ Distributivity: (A + B)C = AC + BC and A(C + D) = AC + AD
 ▪ Multiplication with the identity matrix: Iₘ A = A Iₙ = A

2.2.2 Inverse and Transpose
• Definition of Inverse: consider a square matrix A ∈ ℝ^(n×n). Let the matrix B ∈ ℝ^(n×n) have the property that AB = Iₙ = BA. Then B is called the inverse of A and denoted by A⁻¹.
• Unfortunately, not every matrix A possesses an inverse A⁻¹.
• If this inverse does exist, A is called regular / invertible / nonsingular; otherwise it is singular / noninvertible.
• When the inverse matrix exists, it is unique.
• If the determinant of a matrix is not equal to zero, the matrix is nonsingular.
• a₁₁a₂₂ − a₁₂a₂₁ is the determinant of a 2×2 matrix.
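As a quick illustration of the 2×2 case, here is a minimal sketch (plain Python; NumPy is used only to verify the result, and the function name is my own) that computes the determinant and, when it is nonzero, the closed-form inverse:

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix via the closed-form determinant formula.

    Returns None when det(A) == 0, i.e., when A is singular.
    """
    (a, b), (c, d) = A
    det = a * d - b * c          # a11*a22 - a12*a21
    if det == 0:
        return None              # singular: no inverse exists
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

A = [[1.0, 2.0], [3.0, 4.0]]
A_inv = inverse_2x2(A)
print(A_inv)                                                   # [[-2.0, 1.0], [1.5, -0.5]]
print(np.allclose(np.array(A) @ np.array(A_inv), np.eye(2)))   # True: A @ A^{-1} = I
```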
• We can generally use the determinant to check whether a matrix is invertible.
• Definition of Transpose: for A ∈ ℝ^(m×n), the matrix B ∈ ℝ^(n×m) with bᵢⱼ = aⱼᵢ is called the transpose of A. We write B = Aᵀ.
• Aᵀ can be obtained by writing the columns of A as the rows of Aᵀ.
• Properties of inverses and transposes:
 ▪ AA⁻¹ = I = A⁻¹A
 ▪ (AB)⁻¹ = B⁻¹A⁻¹
 ▪ (Aᵀ)ᵀ = A
 ▪ (AB)ᵀ = BᵀAᵀ
 ▪ (A + B)ᵀ = Aᵀ + Bᵀ
• Definition of Symmetric Matrix: a matrix A ∈ ℝ^(n×n) is symmetric if A = Aᵀ.
• Note that only (n, n)-matrices can be symmetric.
 ▪ Generally, we also call (n, n)-matrices square matrices because they possess the same number of rows and columns.
• If A is invertible, then so is Aᵀ, and (A⁻¹)ᵀ = (Aᵀ)⁻¹ =: A⁻ᵀ.
• The sum of symmetric matrices A, B ∈ ℝ^(n×n) is always symmetric.
 ▪ However, their product is generally not symmetric.
[Figure: normalizing flow architecture.]

2.2.3 Multiplication by a Scalar
• Associativity: (λψ)C = λ(ψC) and λ(BC) = (λB)C = B(λC) = (BC)λ
• Distributivity: (λ + ψ)C = λC + ψC and λ(B + C) = λB + λC

2.2.4 Compact Representations of Systems of Linear Equations
• Using the rules for matrix multiplication, a system of linear equations can be written compactly as Ax = b, where A collects the coefficients, x the unknowns, and b the right-hand sides.

2.3 Solving Systems of Linear Equations
2.3.1 Particular and General Solution
• The system in (2.38) has two equations and four unknowns.
 ▪ Therefore, in general, we would expect infinitely many solutions.
• A solution to the problem in (2.38) can be found immediately by taking 42 times the first column and 8 times the second column.
• Therefore, a solution is [42, 8, 0, 0]ᵀ. This solution is called a particular solution or special solution.
• Adding 0 to our special solution does not change it. To construct further solutions, we therefore look for non-trivial ways of generating 0: we express the third column using the first two columns.
• Putting everything together, we obtain all solutions of the equation system: the general solution.
• The general approach consists of three steps:
 1. Find a particular solution of Ax = b.
 2. Find all solutions of the homogeneous system Ax = 0.
 3. Combine the solutions from steps 1 and 2 to the general solution.
• Neither the general nor the particular solution is unique.
• In the convenient form of the equations used previously, the particular and general solution could be read off directly, but general equation systems are not of this simple form.
• Fortunately, there exists a constructive algorithmic way of transforming any system of linear equations into this particularly simple form: Gaussian elimination.

2.3.2 Elementary Transformations
• Key to solving a system of linear equations are elementary transformations that keep the solution set the same but transform the equation system into a simpler form:
 1. Exchange of two equations (rows)
 2. Multiplication of an equation (row) by a constant λ ∈ ℝ \ {0}
 3. Addition of one equation (row), or a multiple of it, to another
• First of all, we no longer mention the variables x explicitly and build the augmented matrix [A | b].
• Working through the example introduces the notions of row-echelon form, pivot, and free variable.
• Reverting this compact notation back into the explicit notation with the variables, we can read off the constraints directly.
• Only for a = −1 can this system be solved (solvability condition). A particular solution is obtained by setting the free variables to 0.
• The general solution, which captures the set of all possible solutions, then carries one free parameter per free variable (here, the number of free variables is 2).
• Definition of Row-Echelon Form:
 ▪ All rows that contain only zeros are at the bottom of the matrix; correspondingly, all rows that contain at least one nonzero element are on top of rows that contain only zeros.
 ▪ Looking at nonzero rows only, the first nonzero number from the left (also called the pivot or the leading coefficient) is always strictly to the right of the pivot of the row above it.
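To make the elimination procedure concrete, here is a minimal sketch of forward elimination with row exchanges that brings an augmented matrix [A | b] into row-echelon form (the function name and pivot tolerance are my own choices, not from the slides):

```python
def row_echelon(M, tol=1e-12):
    """Bring matrix M (a list of rows, e.g. an augmented [A|b]) into
    row-echelon form using elementary row transformations only."""
    M = [row[:] for row in M]             # work on a copy
    rows, cols = len(M), len(M[0])
    r = 0                                 # index of the next pivot row
    for c in range(cols):
        # transformation 1: exchange rows so the largest entry becomes the pivot
        p = max(range(r, rows), key=lambda i: abs(M[i][c]), default=None)
        if p is None or abs(M[p][c]) < tol:
            continue                      # no pivot in this column -> free variable
        M[r], M[p] = M[p], M[r]
        # transformation 3: add multiples of the pivot row to zero out entries below
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [M[i][j] - f * M[r][j] for j in range(cols)]
        r += 1
    return M

# Augmented matrix [A | b] for a small 2x2 system
aug = [[1.0, 2.0, 1.0],
       [2.0, 3.0, 3.0]]
print(row_echelon(aug))   # [[2.0, 3.0, 3.0], [0.0, 0.5, -0.5]]
```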
• Basic and Free Variables: the variables corresponding to the pivots in the row-echelon form are called basic variables, and the other variables are free variables.
 ▪ In the example, x₁, x₃, x₄ are basic variables (their columns are the pivot columns), whereas x₂, x₅ are free variables.
• Reduced Row-Echelon Form: an equation system is in reduced row-echelon form if
 ▪ it is in row-echelon form,
 ▪ every pivot is 1, and
 ▪ the pivot is the only nonzero entry in its column.
• The reduced row-echelon form allows us to determine the general solution of a system of linear equations in a straightforward way.
• Gaussian Elimination: Gaussian elimination is an algorithm that performs elementary transformations to bring a system of linear equations into reduced row-echelon form.

2.3.3 The Minus-1 Trick
• We introduce a practical trick for reading out the solutions x of a homogeneous system of linear equations Ax = 0, where A is in reduced row-echelon form.
• We extend this matrix to an n×n matrix Ã by adding n − k rows of the form [0 ⋯ 0 −1 0 ⋯ 0], so that the diagonal of the augmented matrix Ã contains either 1 or −1.
• The columns of Ã that contain −1 as a pivot are solutions of the homogeneous equation system Ax = 0.
• Example: starting from the row-echelon form (REF), we augment the matrix to a 5×5 matrix by adding rows of the form [0 ⋯ 0 −1 0 ⋯ 0] at the places where the pivots on the diagonal are missing. From this form, we can immediately read out the solutions of Ax = 0.
• Calculating the Inverse: we use the augmented matrix notation [A | Iₙ] for a compact representation of this set of systems of linear equations.
 ▪ This means that if we bring the augmented equation system into reduced row-echelon form, we can read out the inverse on the right-hand side of the equation system: Gaussian elimination transforms [A | Iₙ] into [Iₙ | A⁻¹], and the desired inverse is given as its right-hand side.

2.4 Vector Spaces
2.4.1 Groups
• Definition of Group: consider a set 𝒢 and an operation ⊗ : 𝒢 × 𝒢 → 𝒢 defined on 𝒢. Then G ≔ (𝒢, ⊗) is called a group if the following hold:
 1. Closure of 𝒢 under ⊗: ∀x, y ∈ 𝒢 : x ⊗ y ∈ 𝒢
 2. Associativity: ∀x, y, z ∈ 𝒢 : (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)
 3. Neutral element: ∃e ∈ 𝒢 ∀x ∈ 𝒢 : x ⊗ e = x and e ⊗ x = x
 4. Inverse element: ∀x ∈ 𝒢 ∃y ∈ 𝒢 : x ⊗ y = e and y ⊗ x = e
• The inverse element is defined with respect to the operation ⊗ and does not necessarily mean 1/x.
• If additionally ∀x, y ∈ 𝒢 : x ⊗ y = y ⊗ x, then G = (𝒢, ⊗) is an Abelian group (commutative).
• Definition of General Linear Group: the set of regular (invertible) matrices A ∈ ℝ^(n×n) is a group with respect to matrix multiplication and is called the general linear group GL(n, ℝ). However, since matrix multiplication is not commutative, the group is not Abelian.

2.4.2 Vector Spaces
• Definition of Vector Space: a real-valued vector space V = (𝒱, +, ·) is a set 𝒱 with two operations
 + : 𝒱 × 𝒱 → 𝒱
 · : ℝ × 𝒱 → 𝒱
 such that (𝒱, +) is an Abelian group and distributivity and associativity hold for ·.
• The elements x ∈ V are called vectors.
• The neutral element of (𝒱, +) is the zero vector 0 = [0, …, 0]ᵀ.
• The inner operation + is called vector addition, the elements λ ∈ ℝ are called scalars, and the outer operation · is multiplication by scalars.

2.4.3 Vector Subspaces
• Intuitively, vector subspaces are sets contained in the original vector space with the property that when we perform vector space operations on elements within this subspace, we never leave it.
• Definition of Vector Subspace: let V = (𝒱, +, ·) be a vector space and 𝒰 ⊆ 𝒱, 𝒰 ≠ ∅. Then U = (𝒰, +, ·) is called a vector subspace of V (or linear subspace) if U is a vector space with the vector space operations + and · restricted to 𝒰 × 𝒰 and ℝ × 𝒰. We write U ⊆ V to denote a subspace U of V.
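One way to see the closure property in action: the solution set of a homogeneous system Ax = 0 is a subspace, so sums and scalar multiples of solutions are again solutions. A small numeric sketch (the matrix and names are my own illustration):

```python
import numpy as np

# Two solutions of A x = 0 for a sample A; its null space is a subspace of R^3
A = np.array([[1.0, 2.0, -1.0]])      # one equation, three unknowns
u = np.array([ 1.0, 0.0, 1.0])        # A @ u = 0
v = np.array([-2.0, 1.0, 0.0])        # A @ v = 0

# Closure under addition and scalar multiplication: still solutions
print(np.allclose(A @ (u + v), 0))    # True
print(np.allclose(A @ (3.5 * u), 0))  # True

# Contrast: the solution set of A x = b with b != 0 is NOT a subspace
b = np.array([1.0])
w = np.array([1.0, 0.0, 0.0])         # A @ w = b
print(np.allclose(A @ (w + w), b))    # False: closure under addition fails
```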
• Examples:
 ▪ Not all subsets of ℝ² are subspaces.
 ▪ In A and C, the closure property is violated.
 ▪ B does not contain 0.
 ▪ Only D is a subspace.

2.5 Linear Independence
• Definition of Linear Combination: consider a vector space V and a finite number of vectors x₁, …, xₖ ∈ V. Then every v ∈ V of the form v = λ₁x₁ + ⋯ + λₖxₖ with λ₁, …, λₖ ∈ ℝ is a linear combination of the vectors x₁, …, xₖ.
• Definition of Linear (In)dependence: consider a vector space V with k ∈ ℕ and x₁, …, xₖ ∈ V. If there is a non-trivial linear combination such that 0 = Σᵢ₌₁ᵏ λᵢxᵢ with at least one λᵢ ≠ 0, the vectors x₁, …, xₖ are linearly dependent. If only the trivial solution exists, i.e., λ₁ = ⋯ = λₖ = 0, the vectors x₁, …, xₖ are linearly independent.
• Linear independence is one of the most important concepts in linear algebra.
 ▪ Intuitively, a set of linearly independent vectors consists of vectors that have no redundancy.
• A practical way of checking whether vectors x₁, …, xₖ ∈ V are linearly independent is to use Gaussian elimination: write all vectors as columns of a matrix A and perform Gaussian elimination until the matrix is in row-echelon form; the vectors are linearly independent if and only if every column is a pivot column.
• Geographic example (with crude approximations to cardinal directions) of linearly dependent vectors in a two-dimensional space (plane):
 ▪ The third vector, "751 km West" (black), is a linear combination of the other two vectors, and it makes the set of vectors linearly dependent.
 ▪ Equivalently, the given "751 km West" and "374 km Southwest" vectors can be linearly combined to obtain "506 km Northwest".

2.6 Basis and Rank
2.6.1 Generating Set and Basis
• Definition of Generating Set and Span: consider a vector space V = (𝒱, +, ·) and a set of vectors 𝒜 = {x₁, …, xₖ} ⊆ 𝒱. If every vector v ∈ 𝒱 can be expressed as a linear combination of x₁, …, xₖ, then 𝒜 is called a generating set of V. The set of all linear combinations of vectors in 𝒜 is called the span of 𝒜. If 𝒜 spans the vector space V, we write V = span[𝒜] or V = span[x₁, …, xₖ].
• Generating sets are sets of vectors that span vector (sub)spaces.
• Definition of Basis: consider a vector space V = (𝒱, +, ·) and 𝒜 ⊆ 𝒱. A generating set 𝒜 of V is called minimal if there exists no smaller set 𝒜̃ ⊊ 𝒜 ⊆ 𝒱 that spans V. Every linearly independent generating set of V is minimal and is called a basis of V.
• Every vector space V possesses a basis B. There can be many bases of a vector space V, i.e., there is no unique basis.
 ▪ However, all bases possess the same number of elements, the basis vectors.
• The dimension of a vector space corresponds to the number of its basis vectors.

2.6.2 Rank
• Definition of Rank: the number of linearly independent columns of a matrix A ∈ ℝ^(m×n) equals the number of linearly independent rows and is called the rank of A, denoted by rk(A).
• Examples: the rank can be determined via Gaussian elimination; the number of pivots in the row-echelon form equals rk(A).

2.7 Linear Mappings
• Consider two real vector spaces V, W. A mapping Φ : V → W preserves the structure of the vector space if Φ(x + y) = Φ(x) + Φ(y) and Φ(λx) = λΦ(x) for all x, y ∈ V and λ ∈ ℝ.
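Every matrix gives rise to such a structure-preserving map x ↦ Ax. The sketch below (matrix, seed, and names are my own illustration) checks the two defining properties numerically and contrasts them with a map that is not linear:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))          # a linear mapping Phi: R^2 -> R^3, Phi(x) = A x

def phi(x):
    return A @ x

x, y = rng.normal(size=2), rng.normal(size=2)
lam = 2.7

# Additivity: Phi(x + y) = Phi(x) + Phi(y)
print(np.allclose(phi(x + y), phi(x) + phi(y)))   # True
# Homogeneity: Phi(lam * x) = lam * Phi(x)
print(np.allclose(phi(lam * x), lam * phi(x)))    # True

# A non-example: elementwise x |-> x + 1 is affine, not linear
g = lambda x: x + 1.0
print(np.allclose(g(x + y), g(x) + g(y)))         # False: additivity fails
```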
• Definition of Linear Mapping: for vector spaces V, W, a mapping Φ : V → W is called a linear mapping (or vector space homomorphism / linear transformation) if ∀x, y ∈ V, ∀λ, ψ ∈ ℝ : Φ(λx + ψy) = λΦ(x) + ψΦ(y).
• Definition of Injective, Surjective, Bijective: consider a mapping Φ : 𝒱 → 𝒲, where 𝒱, 𝒲 can be arbitrary sets. Then Φ is called
 ▪ injective if ∀x, y ∈ 𝒱 : Φ(x) = Φ(y) ⟹ x = y;
 ▪ surjective if Φ(𝒱) = 𝒲;
 ▪ bijective if it is both injective and surjective.

2.7.1 Matrix Representation of Linear Mappings
• Any n-dimensional vector space is isomorphic to ℝⁿ (there is a linear, bijective mapping between the two).
• In the following, the order of the basis vectors will be important:
 ▪ B = (b₁, …, bₙ) denotes an ordered basis of V (V: n-dimensional vector space);
 ▪ B = {b₁, …, bₙ} denotes an (unordered) basis.
• Definition of Coordinates: consider a vector space V and an ordered basis B = (b₁, …, bₙ) of V. For any x ∈ V, we obtain a unique representation (linear combination) x = α₁b₁ + ⋯ + αₙbₙ of x with respect to B. Then α₁, …, αₙ are the coordinates of x with respect to B, and the vector α = [α₁, …, αₙ]ᵀ ∈ ℝⁿ is the coordinate vector / coordinate representation of x with respect to the ordered basis B.
• Definition of Transformation Matrix: consider vector spaces V, W with corresponding (ordered) bases B = (b₁, …, bₙ) and C = (c₁, …, cₘ). Moreover, we consider a linear mapping Φ : V → W.
 ▪ For j ∈ {1, …, n}, Φ(bⱼ) = α₁ⱼc₁ + ⋯ + αₘⱼcₘ is the unique representation of Φ(bⱼ) with respect to C.
 ▪ Then we call the m×n matrix A_Φ, whose elements are given by A_Φ(i, j) = αᵢⱼ, the transformation matrix of Φ (with respect to the ordered bases B of V and C of W).
[Figure: three examples of linear transformations of the vectors shown as dots in (a); (b) rotation by 45°; (c) stretching of the horizontal coordinates by 2; (d) combination of reflection, rotation, and stretching.]

2.7.2 Basis Change
• Basis Change: if we exchange the bases of V and W for new bases, the transformation matrix changes to Ã_Φ = T⁻¹ A_Φ S, where S maps coordinates with respect to the new basis of V onto coordinates with respect to the old one, and T does the same for W.
• Definition of Equivalence: two matrices A, Ã ∈ ℝ^(m×n) are equivalent if there exist regular matrices S ∈ ℝ^(n×n) and T ∈ ℝ^(m×m) such that Ã = T⁻¹AS.
• Definition of Similarity: two matrices A, Ã ∈ ℝ^(n×n) are similar if there exists a regular matrix S ∈ ℝ^(n×n) with Ã = S⁻¹AS.
• Similar matrices are always equivalent. However, equivalent matrices are not necessarily similar.
• Example 2.24 (Basis Change) works through this construction in detail.

2.7.3 Image and Kernel
• The image and kernel of a linear mapping are vector subspaces with certain important properties.
• Definition of Image and Kernel: for Φ : V → W, we define the kernel / null space ker(Φ) ≔ {v ∈ V : Φ(v) = 0_W} and the image / range Im(Φ) ≔ Φ(V) = {w ∈ W | ∃v ∈ V : Φ(v) = w}. We also call V and W the domain and codomain of Φ, respectively.
• Intuitively, the kernel is the set of vectors v ∈ V that Φ maps onto the neutral element 0_W ∈ W.
• The image is the set of vectors w ∈ W that can be "reached" by Φ from some vector in V.

2.8 Affine Spaces
2.8.1 Affine Subspaces
• We now look at spaces that are offset from the origin and at mappings between such affine spaces, which resemble linear mappings.
 ▪ In the machine learning literature, the distinction between linear and affine is sometimes not clear, so we can find references to affine spaces/mappings as linear spaces/mappings.
• Definition of Affine Subspace: let V be a vector space, x₀ ∈ V, and U ⊆ V a subspace. Then the subset L = x₀ + U ≔ {x₀ + u : u ∈ U} is called an affine subspace or linear manifold of V. U is called the direction or direction space, and x₀ is called the support point.
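A concrete picture: a line in ℝ² through x₀ with direction u is an affine subspace; unless it passes through the origin, it is not a vector subspace. A small sketch (the specific line and names are my own illustration):

```python
import numpy as np

x0 = np.array([1.0, 2.0])       # support point
u  = np.array([1.0, 1.0])       # direction: U = span{u}

def on_line(p, tol=1e-9):
    """Check whether p lies on the affine subspace L = x0 + span{u}."""
    d = p - x0
    # d must be a multiple of u: the 2x2 determinant of [d u] must vanish
    return abs(d[0] * u[1] - d[1] * u[0]) < tol

print(on_line(x0 + 3.0 * u))     # True:  x0 + 3u lies in L
print(on_line(np.zeros(2)))      # False: 0 is not in L, so L is not a vector subspace
```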
2.8.2 Affine Mappings
• Linear and affine mappings are closely related; therefore, many properties that we already know from linear mappings also hold for affine mappings.
• Definition of Affine Mapping: for two vector spaces V, W, a linear mapping Φ : V → W, and a ∈ W, the mapping φ : V → W, x ↦ a + Φ(x), is an affine mapping from V to W. The vector a is called the translation vector of φ.
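As a closing sketch (matrix, vectors, and names are my own illustration), an affine mapping in code is just a linear part plus a translation; it preserves affine combinations but, unlike a linear map, does not fix the origin:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # linear part Phi: rotation by 90 degrees
a = np.array([2.0, 1.0])           # translation vector

def affine(x):
    """phi(x) = a + Phi(x), an affine mapping R^2 -> R^2."""
    return a + A @ x

print(affine(np.zeros(2)))         # [2. 1.] -> origin is not fixed, so phi is not linear
x, y, lam = np.array([1.0, 0.0]), np.array([0.0, 1.0]), 0.3

# Affine maps preserve affine combinations (weights summing to 1) ...
print(np.allclose(affine(lam * x + (1 - lam) * y),
                  lam * affine(x) + (1 - lam) * affine(y)))    # True
# ... but not general linear combinations:
print(np.allclose(affine(x + y), affine(x) + affine(y)))       # False
```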