Mathematics for ML
Linear Algebra
Suk-Ju Kang
Department of Electronic Engineering
Sogang University
2 Linear Algebra
• Linear algebra is the study of vectors and certain algebra rules to
manipulate vectors.
• Vectors are special objects that can be added together and multiplied by
scalars to produce another object of the same kind.
• Examples
  ▪ Geometric vectors: Geometric vectors are directed segments, which have direction and magnitude (at least in two dimensions).
  ▪ Polynomials: Polynomials are very different from geometric vectors. While geometric vectors are concrete "drawings", polynomials are abstract concepts.
  ▪ Audio signals: The signals are represented as a series of numbers.
  ▪ Elements of $\mathbb{R}^n$: $\mathbb{R}^n$ is more abstract than polynomials (and is what we focus on in this book).
2
2 Linear Algebra
A mind map of the concepts introduced in this chapter,
along with where they are used in other parts of the book
3
2.1 Systems of Linear Equations
• For a real-valued system of linear equations, we obtain either no, exactly
one, or infinitely many solutions.
• In a system of linear equations with two variables $x_1, x_2$, each linear equation defines a line in the $x_1 x_2$-plane.
  ▪ Since a solution to a system of linear equations must satisfy all equations simultaneously, the solution set is the intersection of these lines.
  ▪ This intersection set can be a line, a point, or empty.
4
2.1 Systems of Linear Equations
• For a systematic approach to solving systems of linear equations, we will
introduce a useful compact notation.
5
2.2 Matrices
• Definition of Matrix
With π‘š, 𝑛 ∈ 𝑁 a real-valued (π‘š, 𝑛) matrix 𝐴 is an π‘š βˆ™ 𝑛 -tuple of elements
π‘Žπ‘–π‘— , 𝑖 = 1,…,π‘š, 𝑗 = 1,…,𝑛, which is ordered according to a rectangular
scheme consisting of π‘š rows and 𝑛 columns:
row
column
6
2.2.1 Matrix Addition and Multiplication
• The sum of two matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{m \times n}$ is defined as the element-wise sum.
• For matrices $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times k}$, the elements $c_{ij}$ of the product $C = AB \in \mathbb{R}^{m \times k}$ are computed as
$$c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}, \qquad i = 1, \dots, m, \;\; j = 1, \dots, k.$$
7
2.2.1 Matrix Addition and Multiplication
• Matrices can only be multiplied if their "neighboring" dimensions match.
• For $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$, the product $BA$ is not defined if $k \neq m$ since the neighboring dimensions do not match.
• Matrix multiplication is not defined as an element-wise operation on matrix elements, i.e., $c_{ij} \neq a_{ij} b_{ij}$ in general.
• This kind of element-wise multiplication often appears in programming languages when we multiply (multi-dimensional) arrays with each other, and is called a Hadamard product (see the sketch below).
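As a quick illustration (not from the slides), the following NumPy sketch contrasts the matrix product with the element-wise Hadamard product; the matrices are made up for illustration.

```python
# Sketch: matrix product (A @ B) vs. element-wise Hadamard product (A * D).
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # A in R^{2x3}
B = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])            # B in R^{3x2}

C = A @ B                           # matrix product: neighboring dimensions (3 and 3) match, C in R^{2x2}

D = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # same shape as A
hadamard = A * D                    # element-wise (Hadamard) product: requires equal shapes

print(C)
print(hadamard)
```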
8
2.2.1 Matrix Addition and Multiplication
• Definition of Identity Matrix
In $\mathbb{R}^{n \times n}$, we define the identity matrix $I_n$ as the $n \times n$-matrix containing 1 on the diagonal and 0 everywhere else.
9
2.2.1 Matrix Addition and Multiplication
• Properties of matrices
  ▪ Associativity: $(AB)C = A(BC)$
  ▪ Distributivity: $(A + B)C = AC + BC$ and $A(C + D) = AC + AD$
  ▪ Multiplication with the identity matrix: $I_m A = A I_n = A$ for $A \in \mathbb{R}^{m \times n}$
10
2.2.2 Inverse and Transpose
• Definition of Inverse
Consider a square matrix $A \in \mathbb{R}^{n \times n}$. Let matrix $B \in \mathbb{R}^{n \times n}$ have the property that $AB = I_n = BA$.
→ $B$ is called the inverse of $A$ and is denoted by $A^{-1}$.
• Unfortunately, not every matrix $A$ possesses an inverse $A^{-1}$.
• If this inverse does exist, $A$ is called regular / invertible / nonsingular; otherwise it is called singular / noninvertible.
• When the inverse matrix exists, it is unique.
• If the determinant of a matrix is not equal to zero, then the matrix is nonsingular.
11
2.2.2 Inverse and Transpose
• π‘Ž11 π‘Ž22 − π‘Ž12 π‘Ž21 is the determinant of a 2ο‚΄2-matrix.
• We can generally use the determinant to check whether a matrix is
invertible.
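A minimal sketch (not part of the slides) that checks invertibility via the determinant and verifies $A A^{-1} = I_2$ with NumPy; the matrix values are made up.

```python
# Sketch: determinant-based invertibility check and inverse of a 2x2 matrix.
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # a11*a22 - a12*a21
print(det, np.linalg.det(A))                   # both -2.0 (up to rounding)

if not np.isclose(det, 0.0):                   # nonzero determinant => nonsingular
    A_inv = np.linalg.inv(A)
    print(np.allclose(A @ A_inv, np.eye(2)))   # True: A A^{-1} = I_2
```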
12
2.2.2 Inverse and Transpose
• Definition of Transpose
For $A \in \mathbb{R}^{m \times n}$, the matrix $B \in \mathbb{R}^{n \times m}$ with $b_{ij} = a_{ji}$ is called the transpose of $A$. We write $B = A^T$.
• $A^T$ can be obtained by writing the columns of $A$ as the rows of $A^T$.
• Properties of inverses and transposes:
$$A A^{-1} = I = A^{-1} A, \quad (AB)^{-1} = B^{-1} A^{-1}, \quad (A^T)^T = A, \quad (AB)^T = B^T A^T, \quad (A + B)^T = A^T + B^T$$
13
2.2.2 Inverse and Transpose
• Definition of Symmetric Matrix
A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$.
• Note that only $(n, n)$-matrices can be symmetric.
  ▪ Generally, we also call $(n, n)$-matrices square matrices because they possess the same number of rows and columns.
• If $A$ is invertible, then so is $A^T$, and $(A^{-1})^T = (A^T)^{-1} =: A^{-T}$.
• The sum of symmetric matrices $A, B \in \mathbb{R}^{n \times n}$ is always symmetric.
  ▪ However, their product is generally not symmetric (see the sketch below).
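A small sketch (not from the slides, matrices made up) illustrating that the sum of symmetric matrices is symmetric while their product generally is not.

```python
# Sketch: symmetry is preserved under addition but not under multiplication.
import numpy as np

A = np.array([[1., 2.],
              [2., 3.]])
B = np.array([[0., 1.],
              [1., 5.]])

print(np.allclose(A, A.T), np.allclose(B, B.T))   # True True: both symmetric
print(np.allclose(A + B, (A + B).T))              # True: the sum stays symmetric
print(np.allclose(A @ B, (A @ B).T))              # False: the product need not be symmetric
```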
14
2.2.2 Inverse and Transpose
< Normalizing flow structure >
15
2.2.3 Multiplication by a Scalar
• Associativity: $(\lambda \psi) C = \lambda (\psi C)$ and $\lambda (BC) = (\lambda B) C = B (\lambda C) = (BC) \lambda$ for $\lambda, \psi \in \mathbb{R}$
• Distributivity: $(\lambda + \psi) C = \lambda C + \psi C$ and $\lambda (B + C) = \lambda B + \lambda C$
16
2.2.4 Compact Representations of Systems of Linear
Equations
• A system of linear equations can be written compactly using the rules for matrix multiplication: collecting the coefficients into a matrix $A$, the unknowns into a vector $x$, and the right-hand sides into a vector $b$ gives the form $Ax = b$ (see the sketch below).
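A hedged sketch (not from the slides): writing a small made-up system in the compact form $Ax = b$ and solving it numerically with NumPy.

```python
# Sketch: the compact form Ax = b and a numerical solution.
import numpy as np

# Example system (made up for illustration):
#   2*x1 + 1*x2 = 5
#   1*x1 + 3*x2 = 10
A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([5., 10.])

x = np.linalg.solve(A, b)       # works because A is square and invertible
print(x)                        # [1. 3.]
```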
17
2.3 Solving Systems of Linear Equations
2.3.1 Particular and General Solution
• The system of linear equations (2.38),
$$\begin{bmatrix} 1 & 0 & 8 & -4 \\ 0 & 1 & 2 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 42 \\ 8 \end{bmatrix},$$
has two equations and four unknowns.
  ▪ Therefore, in general, we would expect infinitely many solutions.
• A solution to (2.38) can be found immediately by taking 42 times the first column and 8 times the second column, so that
$$b = \begin{bmatrix} 42 \\ 8 \end{bmatrix} = 42 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 8 \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
• Therefore, a solution is $[42, 8, 0, 0]^T$.
This solution is called a particular solution or special solution.
18
2.3.1 Particular and General Solution
• Adding solutions of the homogeneous system $Ax = 0$ to our particular solution does not change the right-hand side, so we look for non-trivial ways to generate $0$ from the columns of the matrix.
• To do so, we express the third column using the first two columns (and likewise the fourth column).
19
2.3.1 Particular and General Solution
• Putting everything together, we obtain all solutions of the equation, the general solution:
$$\left\{ x \in \mathbb{R}^4 : x = \begin{bmatrix} 42 \\ 8 \\ 0 \\ 0 \end{bmatrix} + \lambda_1 \begin{bmatrix} 8 \\ 2 \\ -1 \\ 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} -4 \\ 12 \\ 0 \\ -1 \end{bmatrix}, \;\; \lambda_1, \lambda_2 \in \mathbb{R} \right\}$$
20
2.3.1 Particular and General Solution
• The general approach consists of three steps:
1. Find a particular solution to $Ax = b$.
2. Find all solutions to $Ax = 0$.
3. Combine the solutions from steps 1 and 2 to the general solution.
• Neither the general nor the particular solution is unique.
• In the convenient form of the equations used previously, the particular and general solutions could be found easily, but general equation systems are not of this simple form.
• Fortunately, there exists a constructive algorithmic way of transforming any system of linear equations into this particularly simple form:
→ Gaussian elimination
21
2.3.2 Elementary Transformations
• The key to solving a system of linear equations are elementary transformations that keep the solution set the same, but that transform the equation system into a simpler form.
• First of all, we no longer mention the variables $x$ explicitly and build the augmented matrix $[A \mid b]$.
• The elementary transformations are:
1. Exchange of two equations (rows)
2. Multiplication of an equation (row) with a constant $\lambda \in \mathbb{R} \setminus \{0\}$
3. Addition of two equations (rows)
22
2.3.2 Elementary Transformations
(Example of an augmented matrix in row-echelon form; the leading coefficients are the pivots, and the variables without pivots are free variables.)
23
2.3.2 Elementary Transformations
• Reverting this compact notation back into the explicit notation with the variables, we can read off the system directly.
• Only for $a = -1$ can this system be solved (solvability condition). A particular solution is obtained by setting the free variables to 0.
• The general solution, which captures the set of all possible solutions, additionally includes one term with a free parameter for each of the two free variables.
24
2.3.2 Elementary Transformations
• Definition of Row-Echelon Form
• All rows that contain only zeros are at the bottom of the matrix.
  ▪ Correspondingly, all rows that contain at least one nonzero element are on top of rows that contain only zeros.
• Looking at nonzero rows only, the first nonzero number from the left (also called the pivot or the leading coefficient) is always strictly to the right of the pivot of the row above it.
25
2.3.2 Elementary Transformations
• Basic and Free Variables
The variables corresponding to the pivots in the row-echelon form are
called basic variables and the other variables are free variables.
• In the example, $x_1, x_3, x_4$ are basic variables (corresponding to the pivot columns), whereas $x_2, x_5$ are free variables.
26
2.3.2 Elementary Transformations
• Reduced Row Echelon Form
- It is in row-echelon form.
- Every pivot is 1.
- The pivot is the only nonzero entry in its column.
• The reduced row-echelon form allows us to determine the general
solution of a system of linear equations in a straightforward way.
• Gaussian Elimination
Gaussian elimination is an algorithm that performs elementary
transformations to bring a system of linear equations into reduced row-echelon form.
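A hedged sketch (not from the slides) using SymPy's rref() as a stand-in for Gaussian elimination on a made-up augmented matrix $[A \mid b]$.

```python
# Sketch: reduced row-echelon form of an augmented matrix with SymPy.
from sympy import Matrix

# Made-up augmented matrix [A | b] for illustration.
Ab = Matrix([[1, 2, -1, 3],
             [2, 4, 0, 10],
             [1, 2, 1, 7]])

rref_matrix, pivot_columns = Ab.rref()
print(rref_matrix)     # the reduced row-echelon form
print(pivot_columns)   # indices of the pivot columns (basic variables)
```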
27
2.3.3 The Minus-1 Trick
• We introduce a practical trick for reading out the solutions $x$ of a homogeneous system of linear equations $Ax = 0$.
• Assume $A \in \mathbb{R}^{k \times n}$ is in reduced row-echelon form without any rows that contain only zeros. We extend this matrix to an $n \times n$-matrix $\tilde{A}$ by adding $n - k$ rows of the form
$$[\,0 \;\cdots\; 0 \;\; {-1} \;\; 0 \;\cdots\; 0\,]$$
so that the diagonal of the augmented matrix $\tilde{A}$ contains either 1 or $-1$.
• The columns of $\tilde{A}$ that contain the $-1$ as pivots are solutions of the homogeneous equation system $Ax = 0$.
28
2.3.3 The Minus-1 Trick
• Row-echelon form (REF)
• We extend the matrix to a $5 \times 5$-matrix $\tilde{A}$ by adding rows of the form $[\,0 \;\cdots\; 0 \;\; {-1} \;\; 0 \;\cdots\; 0\,]$ at the places where the pivots on the diagonal are missing.
• From this form, we can immediately read out the solutions of $Ax = 0$.
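As a cross-check (not from the slides), SymPy's nullspace() returns a basis of the solution set of $Ax = 0$, which agrees with what the Minus-1 trick reads off up to scaling; the matrix below is a made-up reduced row-echelon form.

```python
# Sketch: reading off a basis of {x : Ax = 0} with SymPy.
from sympy import Matrix

# Made-up matrix in reduced row-echelon form (pivots in columns 0, 2, 3).
A = Matrix([[1, 3, 0, 0, 3],
            [0, 0, 1, 0, 9],
            [0, 0, 0, 1, -4]])

for v in A.nullspace():      # basis vectors of the homogeneous solution space
    print(v.T)               # printed as row vectors for readability
```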
29
2.3.3 The Minus-1 Trick
• Calculating the Inverse
To compute the inverse $A^{-1}$ of $A \in \mathbb{R}^{n \times n}$, we need to find the matrix $X$ that satisfies $AX = I_n$; then $X = A^{-1}$. This is a set of systems of linear equations, one for each column of $X$.
We use the augmented matrix notation for a compact representation of this set of systems of linear equations:
$$[\,A \mid I_n\,] \;\rightarrow\; [\,I_n \mid A^{-1}\,]$$
• This means that if we bring the augmented equation system into reduced row-echelon form, we can read out the inverse on the right-hand side of the equation system.
• Example: write down the augmented matrix, use Gaussian elimination to bring it into reduced row-echelon form, and read off the desired inverse as its right-hand side.
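A hedged sketch (not from the slides): computing an inverse by bringing the augmented matrix $[A \mid I]$ into reduced row-echelon form with SymPy; the matrix is made up.

```python
# Sketch: inverse via reduced row-echelon form of [A | I].
from sympy import Matrix, eye

A = Matrix([[1, 2],
            [3, 5]])                        # made-up invertible matrix

augmented = A.row_join(eye(2))              # [A | I_2]
rref_matrix, _ = augmented.rref()           # -> [I_2 | A^{-1}]

A_inv = rref_matrix[:, 2:]                  # right-hand block is the inverse
print(A_inv)
print(A * A_inv == eye(2))                  # True
```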
30
2.4 Vector Spaces
2.4.1 Groups
• Definition of Group
Consider a set $G$ and an operation $\otimes : G \times G \to G$ defined on $G$.
Then $G := (G, \otimes)$ is called a group if the following hold:
1. Closure of $G$ under $\otimes$: $\forall x, y \in G : x \otimes y \in G$
2. Associativity: $\forall x, y, z \in G : (x \otimes y) \otimes z = x \otimes (y \otimes z)$
3. Neutral element: $\exists e \in G \;\; \forall x \in G : x \otimes e = x$ and $e \otimes x = x$
4. Inverse element: $\forall x \in G \;\; \exists y \in G : x \otimes y = e$ and $y \otimes x = e$
• The inverse element is defined with respect to the operation $\otimes$ and does not necessarily mean $\frac{1}{x}$.
• If additionally $\forall x, y \in G : x \otimes y = y \otimes x$, then $G = (G, \otimes)$ is an Abelian group (commutative).
31
2.4.1 Groups
• Definition of General Linear Group
The set of regular (invertible) matrices $A \in \mathbb{R}^{n \times n}$ is a group with respect to matrix multiplication and is called the general linear group $GL(n, \mathbb{R})$.
However, since matrix multiplication is not commutative, the group is not
Abelian.
32
2.4.2 Vector Spaces
• Definition of Vector Space
A real-valued vector space $V = (V, +, \cdot)$ is a set $V$ with two operations
$$+ : V \times V \to V, \qquad \cdot : \mathbb{R} \times V \to V.$$
• The elements $x \in V$ are called vectors.
• The neutral element of $(V, +)$ is the zero vector $\mathbf{0} = [0, \dots, 0]^T$.
• The inner operation $+$ is called vector addition, the elements $\lambda \in \mathbb{R}$ are called scalars, and the outer operation $\cdot$ is multiplication by scalars.
33
2.4.3 Vector Subspaces
• Definition of Vector Subspace
Intuitively, vector subspaces are sets contained in the original vector space with the property that when we perform vector space operations on elements within this subspace, we will never leave it.
Let $V = (V, +, \cdot)$ be a vector space and $U \subseteq V$, $U \neq \emptyset$.
Then $U = (U, +, \cdot)$ is called a vector subspace of $V$ (or linear subspace) if $U$ is a vector space with the vector space operations $+$ and $\cdot$ restricted to $U \times U$ and $\mathbb{R} \times U$.
We write $U \subseteq V$ to denote a subspace $U$ of $V$.
34
2.4.3 Vector Subspaces
• Examples (subsets A, B, C, D of $\mathbb{R}^2$ shown in the figure):
  ▪ Not all subsets of $\mathbb{R}^2$ are subspaces.
  ▪ In A and C, the closure property is violated.
  ▪ B does not contain 0.
  ▪ Only D is a subspace.
35
2.5 Linear Independence
• Definition of Linear Combination
Consider a vector space $V$ and a finite number of vectors $x_1, \dots, x_k \in V$. Then, every $v \in V$ of the form
$$v = \lambda_1 x_1 + \cdots + \lambda_k x_k = \sum_{i=1}^{k} \lambda_i x_i \in V$$
with $\lambda_1, \dots, \lambda_k \in \mathbb{R}$ is a linear combination of the vectors $x_1, \dots, x_k$.
• Definition of Linear (In)dependence
Let us consider a vector space $V$ with $k \in \mathbb{N}$ and $x_1, \dots, x_k \in V$.
If there is a non-trivial linear combination such that $0 = \sum_{i=1}^{k} \lambda_i x_i$ with at least one $\lambda_i \neq 0$, the vectors $x_1, \dots, x_k$ are linearly dependent. If only the trivial solution exists, i.e., $\lambda_1 = \cdots = \lambda_k = 0$, the vectors $x_1, \dots, x_k$ are linearly independent.
• Linear independence is one of the most important concepts in linear algebra.
  ▪ Intuitively, a set of linearly independent vectors consists of vectors that have no redundancy.
• A practical way of checking whether vectors $x_1, \dots, x_k \in V$ are linearly independent is to use Gaussian elimination: write all vectors as columns of a matrix $A$ and perform Gaussian elimination until the matrix is in row-echelon form; the vectors are linearly independent if and only if every column is a pivot column (see the sketch below).
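The check can be sketched with NumPy (not from the slides): the vectors are independent exactly when the rank of the column matrix equals the number of vectors; the vectors are made up.

```python
# Sketch: linear-independence check via the rank of the column matrix.
import numpy as np

x1 = np.array([1., 2., 3.])
x2 = np.array([0., 1., 1.])
x3 = np.array([1., 3., 4.])        # x3 = x1 + x2, so the set is dependent

A = np.column_stack([x1, x2, x3])
rank = np.linalg.matrix_rank(A)
print(rank == A.shape[1])          # False: fewer pivots than columns => dependent
```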
36
2.5 Linear Independence
• Geographic example (with crude approximations to cardinal directions) of linearly dependent vectors in a two-dimensional space (plane):
  ▪ The third "751 km West" vector (black) is a linear combination of the other two vectors, and it makes the set of vectors linearly dependent.
  ▪ Equivalently, the given "751 km West" and "374 km Southwest" vectors can be linearly combined to obtain "506 km Northwest".
37
2.6 Basis and Rank
2.6.1 Generating Set and Basis
• Definition of Generating Set and Span
Consider a vector space $V = (V, +, \cdot)$ and a set of vectors $A = \{x_1, \dots, x_k\} \subseteq V$.
If every vector $v \in V$ can be expressed as a linear combination of $x_1, \dots, x_k$, $A$ is called a generating set of $V$.
The set of all linear combinations of vectors in $A$ is called the span of $A$.
If $A$ spans the vector space $V$, we write $V = \mathrm{span}[A]$ or $V = \mathrm{span}[x_1, \dots, x_k]$.
• Generating sets are sets of vectors that span vector (sub)spaces.
38
2.6.1 Generating Set and Basis
• Definition of Basis
Consider a vector space $V = (V, +, \cdot)$ and $A \subseteq V$.
A generating set $A$ of $V$ is called minimal if there exists no smaller set $\tilde{A} \subsetneq A \subseteq V$ that spans $V$.
Every linearly independent generating set of $V$ is minimal and is called a basis of $V$.
39
2.6.1 Generating Set and Basis
• Every vector space $V$ possesses a basis $B$. There can be many bases of a vector space $V$, i.e., there is no unique basis.
  ▪ However, all bases possess the same number of elements, the basis vectors.
• The dimension of a vector space corresponds to the number of its basis vectors.
40
2.6.2 Rank
• Definition of Rank
The number of linearly independent columns of a matrix $A \in \mathbb{R}^{m \times n}$ equals the number of linearly independent rows and is called the rank of $A$, denoted by $\mathrm{rk}(A)$.
• Example 1
• Example 2
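A small numerical sketch (not the slides' examples; the matrix is made up): computing the rank with NumPy and confirming that the rank of $A^T$ equals the rank of $A$.

```python
# Sketch: rank of a matrix; column rank equals row rank.
import numpy as np

A = np.array([[1., 2., 1.],
              [-2., -3., 1.],
              [3., 5., 0.]])          # third row = first row minus second row

print(np.linalg.matrix_rank(A))       # 2
print(np.linalg.matrix_rank(A.T))     # 2: rk(A^T) = rk(A)
```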
41
2.7 Linear Mappings
• Consider two real vector spaces $V, W$.
A mapping $\Phi : V \to W$ preserves the structure of the vector space if
$$\Phi(x + y) = \Phi(x) + \Phi(y), \qquad \Phi(\lambda x) = \lambda \Phi(x)$$
for all $x, y \in V$ and $\lambda \in \mathbb{R}$.
• Definition of Linear Mapping
For vector spaces $V, W$, a mapping $\Phi : V \to W$ is called a linear mapping (or vector space homomorphism / linear transformation) if
$$\forall x, y \in V \;\; \forall \lambda, \psi \in \mathbb{R} : \;\; \Phi(\lambda x + \psi y) = \lambda \Phi(x) + \psi \Phi(y).$$
42
2.7 Linear Mappings
• Definition of Injective, Surjective, Bijective
Injective function (one-to-one)
Surjective function (onto)
Bijective function (one-to-one and onto)
43
2.7 Linear Mappings
• Definition of Injective, Surjective, Bijective
Consider a mapping $\Phi : V \to W$, where $V, W$ can be arbitrary sets. Then $\Phi$ is called:
  ▪ injective if $\forall x, y \in V : \Phi(x) = \Phi(y) \implies x = y$,
  ▪ surjective if $\Phi(V) = W$,
  ▪ bijective if it is injective and surjective.
44
2.7 Linear Mappings
• Definition of Injective, Surjective, Bijective
45
2.7.1 Matrix Representation of Linear Mappings
• Any $n$-dimensional vector space is isomorphic to $\mathbb{R}^n$ (there exists a linear, bijective mapping between the two).
• In the following, the order of the basis vectors will be important.
→ $B = (b_1, \dots, b_n)$ is an ordered basis of $V$ ($V$: $n$-dimensional vector space)
→ $\{b_1, \dots, b_n\}$ is an (unordered) basis
• Definition of Coordinates
Consider a vector space $V$ and an ordered basis $B = (b_1, \dots, b_n)$ of $V$.
For any $x \in V$, we obtain a unique representation (linear combination)
$$x = \alpha_1 b_1 + \cdots + \alpha_n b_n$$
of $x$ with respect to $B$.
Then, $\alpha_1, \dots, \alpha_n$ are the coordinates of $x$ with respect to $B$, and the vector
$$\alpha = [\alpha_1, \dots, \alpha_n]^T \in \mathbb{R}^n$$
is the coordinate vector / coordinate representation of $x$ with respect to the ordered basis $B$.
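A hedged sketch (not from the slides): computing the coordinate vector of $x$ with respect to a made-up ordered basis $B = (b_1, b_2)$ of $\mathbb{R}^2$ by solving $B \alpha = x$ with the basis vectors stacked as columns.

```python
# Sketch: coordinates of x with respect to an ordered basis B.
import numpy as np

b1 = np.array([1., 1.])
b2 = np.array([1., -1.])           # made-up basis of R^2
B = np.column_stack([b1, b2])      # basis vectors as columns

x = np.array([3., 1.])
alpha = np.linalg.solve(B, x)      # coordinates of x w.r.t. B
print(alpha)                        # [2. 1.]  since x = 2*b1 + 1*b2
```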
46
2.7.1 Matrix Representation of Linear Mappings
47
2.7.1 Matrix Representation of Linear Mappings
• Definition of Transformation Matrix
Consider vector spaces $V, W$ with corresponding (ordered) bases $B = (b_1, \dots, b_n)$ and $C = (c_1, \dots, c_m)$.
Moreover, we consider a linear mapping $\Phi : V \to W$.
• For $j \in \{1, \dots, n\}$,
$$\Phi(b_j) = \alpha_{1j} c_1 + \cdots + \alpha_{mj} c_m = \sum_{i=1}^{m} \alpha_{ij} c_i$$
is the unique representation of $\Phi(b_j)$ with respect to $C$.
• Then, we call the $m \times n$-matrix $A_\Phi$, whose elements are given by $A_\Phi(i, j) = \alpha_{ij}$, the transformation matrix of $\Phi$ (with respect to the ordered bases $B$ of $V$ and $C$ of $W$).
48
2.7.1 Matrix Representation of Linear Mappings
• Three examples of linear transformations of the vectors shown as dots in (a):
(b) rotation by 45°; (c) stretching of the horizontal coordinates by 2; (d) combination of reflection, rotation, and stretching.
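A hedged sketch (not from the slides) of the transformation matrices behind two of these examples, applied to made-up 2D points stored column-wise.

```python
# Sketch: rotation by 45 degrees and horizontal stretching as matrix products.
import numpy as np

theta = np.pi / 4                                     # 45 degrees
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])  # rotation matrix
stretch = np.array([[2., 0.],
                    [0., 1.]])                        # stretch horizontal coordinates by 2

points = np.array([[0., 1., 1., 0.],                  # x-coordinates (unit square corners)
                   [0., 0., 1., 1.]])                 # y-coordinates

print(rotate @ points)
print(stretch @ points)
```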
49
2.7.2 Basis Change
• Definition of Basis Change
For a linear mapping $\Phi : V \to W$ with transformation matrix $A_\Phi$ with respect to ordered bases $B$ of $V$ and $C$ of $W$, the transformation matrix $\tilde{A}_\Phi$ with respect to new bases $\tilde{B}$ of $V$ and $\tilde{C}$ of $W$ is
$$\tilde{A}_\Phi = T^{-1} A_\Phi S,$$
where $S$ maps coordinates with respect to $\tilde{B}$ onto coordinates with respect to $B$, and $T$ maps coordinates with respect to $\tilde{C}$ onto coordinates with respect to $C$.
50
2.7.2 Basis Change
• Definition of Equivalence
Two matrices $A, \tilde{A} \in \mathbb{R}^{m \times n}$ are equivalent if there exist regular matrices $S \in \mathbb{R}^{n \times n}$ and $T \in \mathbb{R}^{m \times m}$ such that $\tilde{A} = T^{-1} A S$.
• Definition of Similarity
Two matrices $A, \tilde{A} \in \mathbb{R}^{n \times n}$ are similar if there exists a regular matrix $S \in \mathbb{R}^{n \times n}$ with $\tilde{A} = S^{-1} A S$.
• Similar matrices are always equivalent.
However, equivalent matrices are not necessarily similar.
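A hedged sketch (not from the slides): a numerical basis change with a made-up regular matrix $S$; the result $\tilde{A} = S^{-1} A S$ is similar to $A$.

```python
# Sketch: constructing a matrix similar to A via a basis-change matrix S.
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])
S = np.array([[1., 1.],
              [0., 1.]])                              # regular (invertible) basis-change matrix

A_tilde = np.linalg.inv(S) @ A @ S                    # similar to A
print(A_tilde)
print(np.allclose(S @ A_tilde @ np.linalg.inv(S), A)) # True: we recover A
```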
51
2.7.2 Basis Change
• Example 2.24 (Basis Change)
52
2.7.3 Image and Kernel
• The image and kernel of a linear mapping are vector subspaces with
certain important properties.
• Definition of Image and Kernel
For $\Phi : V \to W$, we define the kernel / null space
$$\ker(\Phi) := \Phi^{-1}(\mathbf{0}_W) = \{ v \in V : \Phi(v) = \mathbf{0}_W \}$$
and the image / range
$$\mathrm{Im}(\Phi) := \Phi(V) = \{ w \in W \mid \exists v \in V : \Phi(v) = w \}.$$
We also call $V$ and $W$ the domain and codomain of $\Phi$, respectively.
• Intuitively, the kernel is the set of vectors $v \in V$ that $\Phi$ maps onto the neutral element $\mathbf{0}_W \in W$.
• The image is the set of vectors $w \in W$ that can be "reached" by $\Phi$ from any vector in $V$.
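A hedged sketch (not from the slides): for the linear map $\Phi(x) = Ax$ with a made-up matrix $A$, SymPy gives a basis of the kernel and of the image, and the rank gives the dimension of the image.

```python
# Sketch: kernel (null space) and image (column space) of Phi(x) = Ax.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6]])            # made-up matrix; second row = 2 * first row

print(A.nullspace())               # basis vectors of ker(Phi) = {x : Ax = 0}
print(A.columnspace())             # basis of the image Im(Phi) = span of the columns
print(A.rank())                    # dim(Im(Phi)) = 1
```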
53
2.7.3 Image and Kernel
54
2.8 Affine Spaces
2.8.1 Affine Subspaces
• We discuss properties of mappings between affine spaces, which resemble linear mappings.
  ▪ In the machine learning literature, the distinction between linear and affine is sometimes not clear, so that we can find references to affine spaces/mappings as linear spaces/mappings.
• Definition of Affine Subspace
Let $V$ be a vector space, $x_0 \in V$ and $U \subseteq V$ a subspace.
Then, the subset
$$L = x_0 + U := \{ x_0 + u : u \in U \} \subseteq V$$
is called an affine subspace or linear manifold of $V$.
$U$ is called the direction or direction space, and $x_0$ is called the support point.
55
2.8 Affine Spaces
2.8.1 Affine Subspaces
56
2.8.2 Affine Mappings
• Linear and affine mappings are closely related. → Therefore, many properties that we already know from linear mappings also hold for affine mappings.
• Definition of Affine Mapping
For two vector spaces $V, W$, a linear mapping $\Phi : V \to W$, and $a \in W$, the mapping
$$\phi : V \to W, \qquad x \mapsto a + \Phi(x)$$
is an affine mapping from $V$ to $W$.
The vector $a$ is called the translation vector of $\phi$.
57
2.8.2 Affine Mappings
• Example
58
2.8.2 Affine Mappings
59