
ML Lecture 3: Review of Linear Algebra

L3: Review of linear algebra
• Vector and matrix notation
• Vectors
• Matrices
• Vector spaces
• Linear transformations
• Eigenvalues and eigenvectors
• MATLAB® primer
Vector and matrix notation
– A d-dimensional (column) vector 𝑥 and its transpose are written as:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix} \quad \text{and} \quad x^T = \begin{bmatrix} x_1 & x_2 & \cdots & x_d \end{bmatrix}$$
– An 𝑛 × 𝑑 (rectangular) matrix and its transpose are written as
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1d} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2d} \\ \vdots & & & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nd} \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ a_{13} & a_{23} & \cdots & a_{n3} \\ \vdots & & \ddots & \vdots \\ a_{1d} & a_{2d} & \cdots & a_{nd} \end{bmatrix}$$
– The product of two matrices is
$$AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1d} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2d} \\ \vdots & & & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{md} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ b_{31} & b_{32} & \cdots & b_{3n} \\ \vdots & & & \vdots \\ b_{d1} & b_{d2} & \cdots & b_{dn} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\ c_{21} & c_{22} & c_{23} & \cdots & c_{2n} \\ c_{31} & c_{32} & c_{33} & \cdots & c_{3n} \\ \vdots & & & \ddots & \vdots \\ c_{m1} & c_{m2} & c_{m3} & \cdots & c_{mn} \end{bmatrix}$$
where $c_{ij} = \sum_{k=1}^{d} a_{ik} b_{kj}$
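These definitions map directly onto MATLAB's built-in operators. The following is a minimal sketch (the matrices and their sizes are made up for illustration) that checks the entry-wise formula $c_{ij} = \sum_k a_{ik} b_{kj}$ against the built-in product:

```matlab
% Vector/matrix notation and the matrix product in MATLAB
x = [1; 2; 3];            % a d-dimensional column vector (d = 3 here)
xT = x';                  % its transpose, a row vector

A = [1 2 3; 4 5 6];       % an n-by-d matrix (n = 2, d = 3)
B = [1 0; 0 1; 2 2];      % a d-by-n matrix (d = 3, n = 2)
C = A * B;                % built-in matrix product

% Recompute c_ij = sum_k a_ik * b_kj element by element
C_manual = zeros(size(A, 1), size(B, 2));
for i = 1:size(A, 1)
    for j = 1:size(B, 2)
        C_manual(i, j) = sum(A(i, :) .* B(:, j)');
    end
end
disp(max(abs(C(:) - C_manual(:))))   % should be 0 (up to round-off)
```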
Vectors
– The inner product (a.k.a. dot product or scalar product) of two vectors is
defined by
$$\langle x, y \rangle = x^T y = y^T x = \sum_{k=1}^{d} x_k y_k$$
– The magnitude of a vector is
$$\|x\| = \sqrt{x^T x} = \left( \sum_{k=1}^{d} x_k x_k \right)^{1/2}$$
– The orthogonal projection of vector $y$ onto vector $x$ is $(y^T u_x)\,u_x$
• where vector $u_x$ has unit magnitude and the same direction as $x$
[Figure: projection of $y$ onto $x$; the length of the projection along $u_x$ is $y^T u_x$]
– The angle between vectors $x$ and $y$ is
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}$$
– Two vectors $x$ and $y$ are said to be
• orthogonal if $x^T y = 0$
• orthonormal if $x^T y = 0$ and $\|x\| = \|y\| = 1$
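As a quick illustration, the sketch below (with two arbitrary example vectors) computes the inner product, magnitude, projection, and angle using MATLAB built-ins:

```matlab
% Inner product, magnitude, projection, and angle between two vectors
x = [3; 0; 4];
y = [1; 2; 2];

ip    = x' * y;                 % inner product <x, y> (same as dot(x, y))
magx  = norm(x);                % magnitude |x| = sqrt(x' * x)
ux    = x / norm(x);            % unit vector in the direction of x
projy = (y' * ux) * ux;         % orthogonal projection of y onto x
theta = acos((x' * y) / (norm(x) * norm(y)));   % angle in radians

isOrthogonal = abs(x' * y) < 1e-12;   % true if x and y are orthogonal
```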
– A set of vectors $x_1, x_2, \ldots, x_n$ is said to be linearly dependent if there exists a set of coefficients $a_1, a_2, \ldots, a_n$ (at least one different from zero) such that
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0$$
– Alternatively, a set of vectors $x_1, x_2, \ldots, x_n$ is said to be linearly independent if
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0 \;\Rightarrow\; a_k = 0 \;\; \forall k$$
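In MATLAB, a quick numerical test is to stack the vectors as columns and compare the rank of the resulting matrix (rank is defined formally in the next section) to the number of vectors; a minimal sketch with made-up vectors:

```matlab
% Numerical test of linear (in)dependence via the matrix rank
X = [1 1 1;          % each column is one vector x_i
     -1 0 1;
     1 1 2];
isIndependent = (rank(X) == size(X, 2));   % true if the columns are linearly independent
```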
Matrices
– The determinant of a square matrix $A_{d \times d}$ is
$$|A| = \sum_{k=1}^{d} a_{ik} A_{ik} (-1)^{k+i}$$
• where 𝐴𝑖𝑘 is the minor formed by removing the ith row and the kth column of 𝐴
• NOTE: the determinant of a square matrix and its transpose is the same: |𝐴| = |𝐴𝑇 |
– The trace of a square matrix $A_{d \times d}$ is the sum of its diagonal elements
$$\mathrm{tr}(A) = \sum_{k=1}^{d} a_{kk}$$
– The rank of a matrix is the number of linearly independent rows (or columns)
– A square matrix is said to be non-singular if and only if its rank equals the
number of rows (or columns)
• A non-singular matrix has a non-zero determinant
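The sketch below (with an arbitrary example matrix) shows the corresponding MATLAB built-ins and the relation between rank, singularity, and the determinant:

```matlab
% Determinant, trace, rank, and (non-)singularity of a square matrix
A = [2 1 0;
     1 3 1;
     0 1 2];

detA = det(A);                  % determinant; det(A) equals det(A') as well
trA  = trace(A);                % sum of the diagonal elements
rnkA = rank(A);                 % number of linearly independent rows/columns
isNonSingular = (rnkA == size(A, 1));   % equivalently, abs(detA) > 0
```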
– A square matrix is said to be orthonormal if 𝐴𝐴𝑇 = 𝐴𝑇 𝐴 = 𝐼
– For a square matrix $A$
• if $x^T A x > 0 \;\; \forall x \neq 0$, then $A$ is said to be positive-definite (e.g., the covariance matrix)
• if $x^T A x \geq 0 \;\; \forall x \neq 0$, then $A$ is said to be positive-semi-definite
– The inverse of a square matrix 𝐴 is denoted by 𝐴−1 and is such that
$$AA^{-1} = A^{-1}A = I$$
• The inverse 𝐴−1 of a matrix 𝐴 exists if and only if 𝐴 is non-singular
– The pseudo-inverse matrix 𝐴† is typically used whenever 𝐴−1 does not exist
(because 𝐴 is not square or 𝐴 is singular)
$$A^{\dagger} = (A^T A)^{-1} A^T \quad \text{with} \quad A^{\dagger} A = I \quad \text{(assuming } A^T A \text{ is non-singular)}$$
• Note that A𝐴† ≠ 𝐼 in general
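A short sketch of the inverse and pseudo-inverse (the example matrices are arbitrary); note that MATLAB's pinv computes the pseudo-inverse via the SVD and matches the formula above when $A^T A$ is non-singular:

```matlab
% Inverse of a non-singular square matrix
A = [2 1; 1 3];
Ainv = inv(A);                       % A * Ainv == Ainv * A == eye(2)

% Pseudo-inverse of a rectangular (tall) matrix
B = [1 0; 0 1; 1 1];                 % 3-by-2, so B has no ordinary inverse
Bdag = (B' * B) \ B';                % (B'B)^(-1) B', assuming B'B is non-singular
disp(Bdag * B)                       % left inverse: equals eye(2)
% B * Bdag is in general NOT the identity, as noted above
```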
Vector spaces
– The n-dimensional space in which all the n-dimensional vectors reside is called
a vector space
– A set of vectors $\{u_1, u_2, \ldots, u_n\}$ is said to form a basis for a vector space if any arbitrary vector $x$ can be represented by a linear combination of the $u_i$:
$$x = a_1 u_1 + a_2 u_2 + \cdots + a_n u_n$$
• The coefficients $\{a_1, a_2, \ldots, a_n\}$ are called the components of vector $x$ with respect to the basis $\{u_i\}$
• In order to form a basis, it is necessary and sufficient that the $\{u_i\}$ vectors be linearly independent
[Figure: a vector decomposed into components $a_1, a_2, a_3$ along basis vectors $u_1, u_2, u_3$]
– A basis $\{u_i\}$ is said to be orthonormal if $u_i^T u_j = 1$ for $i = j$ and $u_i^T u_j = 0$ for $i \neq j$
– A basis $\{u_i\}$ is said to be orthogonal if $u_i^T u_j \neq 0$ for $i = j$ and $u_i^T u_j = 0$ for $i \neq j$
• As an example, the Cartesian coordinate basis is an orthonormal basis
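Finding the components of a vector with respect to a given basis amounts to solving a linear system; a minimal MATLAB sketch with an arbitrary (non-orthogonal) basis of $\mathbb{R}^3$:

```matlab
% Components of a vector x with respect to a basis {u1, u2, u3}
U = [1 1 1;           % basis vectors stored as the columns of U
     -1 0 1;
     1 1 2];
x = [2; 3; 5];
a = U \ x;            % solves U*a = x, i.e. x = a(1)*u1 + a(2)*u2 + a(3)*u3
disp(norm(U*a - x))   % should be ~0
```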
– Given $n$ linearly independent vectors $\{x_1, x_2, \ldots, x_n\}$, we can construct an orthonormal basis $\{\phi_1, \phi_2, \ldots, \phi_n\}$ for the vector space spanned by $\{x_i\}$ with the Gram-Schmidt orthonormalization procedure (to be discussed in the RBF lecture)
– The distance between two points in a vector space is defined as the
magnitude of the vector difference between the points
$$d_E(x, y) = \|x - y\| = \left( \sum_{k=1}^{d} (x_k - y_k)^2 \right)^{1/2}$$
• This is also called the Euclidean distance
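In MATLAB this is simply the norm of the difference vector (example points are made up):

```matlab
% Euclidean distance between two points in a vector space
x = [1; 2; 3];
y = [4; 6; 3];
dE = norm(x - y);     % sqrt(sum((x - y).^2)), equal to 5 here
```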
The Gram-Schmidt orthogonalization process
Let $V$ be a vector space with an inner product. Suppose $x_1, x_2, \ldots, x_n$ is a basis for $V$. Let
$$v_1 = x_1,$$
$$v_2 = x_2 - \frac{\langle x_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1,$$
$$v_3 = x_3 - \frac{\langle x_3, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \frac{\langle x_3, v_2 \rangle}{\langle v_2, v_2 \rangle} v_2,$$
$$\vdots$$
$$v_n = x_n - \frac{\langle x_n, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \cdots - \frac{\langle x_n, v_{n-1} \rangle}{\langle v_{n-1}, v_{n-1} \rangle} v_{n-1}.$$
Then $v_1, v_2, \ldots, v_n$ is an orthogonal basis for $V$.
Orthogonalization / Normalization
An alternative form of the Gram-Schmidt process combines
orthogonalization with normalization.
Suppose $x_1, x_2, \ldots, x_n$ is a basis for an inner product space $V$. Let
$$v_1 = x_1, \qquad w_1 = \frac{v_1}{\|v_1\|},$$
$$v_2 = x_2 - \langle x_2, w_1 \rangle w_1, \qquad w_2 = \frac{v_2}{\|v_2\|},$$
$$v_3 = x_3 - \langle x_3, w_1 \rangle w_1 - \langle x_3, w_2 \rangle w_2, \qquad w_3 = \frac{v_3}{\|v_3\|},$$
$$\vdots$$
$$v_n = x_n - \langle x_n, w_1 \rangle w_1 - \cdots - \langle x_n, w_{n-1} \rangle w_{n-1}, \qquad w_n = \frac{v_n}{\|v_n\|}.$$
Then $w_1, w_2, \ldots, w_n$ is an orthonormal basis for $V$.
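The procedure translates directly into a few lines of MATLAB. The sketch below (the function name gram_schmidt is my own choice) implements the combined orthogonalization/normalization form for the columns of a matrix, using the standard Euclidean inner product:

```matlab
function W = gram_schmidt(X)
% GRAM_SCHMIDT  Orthonormalize the columns of X (assumed linearly independent)
% using the classical Gram-Schmidt process with the Euclidean inner product.
    [d, n] = size(X);
    W = zeros(d, n);
    for k = 1:n
        v = X(:, k);
        for j = 1:k-1
            v = v - (X(:, k)' * W(:, j)) * W(:, j);   % subtract projections onto w_1..w_{k-1}
        end
        W(:, k) = v / norm(v);                        % normalize to unit length
    end
end
```

Called on the basis of the worked example that follows, this sketch reproduces the orthonormal basis derived there.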
Example
Let $V = \mathbb{R}^3$ with the Euclidean inner product. We will apply the Gram-Schmidt algorithm to orthogonalize the basis $\{(1,-1,1),\, (1,0,1),\, (1,1,2)\}$.

Step 1: $v_1 = (1,-1,1)$.

Step 2:
$$v_2 = (1,0,1) - \frac{\langle (1,0,1), (1,-1,1) \rangle}{\|(1,-1,1)\|^2}\,(1,-1,1) = (1,0,1) - \tfrac{2}{3}(1,-1,1) = \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right).$$

Step 3:
$$v_3 = (1,1,2) - \frac{\langle (1,1,2), v_1 \rangle}{\langle v_1, v_1 \rangle}\,v_1 - \frac{\langle (1,1,2), v_2 \rangle}{\langle v_2, v_2 \rangle}\,v_2 = (1,1,2) - \tfrac{2}{3}(1,-1,1) - \tfrac{5}{2}\left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}\right) = \left(-\tfrac{1}{2}, 0, \tfrac{1}{2}\right).$$

You can verify that $\{(1,-1,1),\, (\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{3}),\, (-\tfrac{1}{2}, 0, \tfrac{1}{2})\}$ forms an orthogonal basis for $\mathbb{R}^3$. Normalizing the vectors in the orthogonal basis, we obtain the orthonormal basis
$$\left\{ \left(\tfrac{\sqrt{3}}{3}, -\tfrac{\sqrt{3}}{3}, \tfrac{\sqrt{3}}{3}\right),\; \left(\tfrac{\sqrt{6}}{6}, \tfrac{\sqrt{6}}{3}, \tfrac{\sqrt{6}}{6}\right),\; \left(-\tfrac{\sqrt{2}}{2}, 0, \tfrac{\sqrt{2}}{2}\right) \right\}.$$
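The same numbers can be checked with the gram_schmidt sketch given above (the function name is my own; the expected values come from the worked example):

```matlab
% Verify the worked example numerically
X = [1 1 1;
     -1 0 1;
     1 1 2];               % the basis vectors (1,-1,1), (1,0,1), (1,1,2) as columns
W = gram_schmidt(X);
disp(W)                    % columns ~ (1,-1,1)/sqrt(3), (1,2,1)/sqrt(6), (-1,0,1)/sqrt(2)
disp(W' * W)               % should be (numerically) the 3-by-3 identity
```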
Linear transformations
– A linear transformation is a mapping from a vector space $X^N$ onto a vector space $Y^M$, and is represented by a matrix
• Given a vector $x \in X^N$, the corresponding vector $y$ in $Y^M$ is computed as
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & & \ddots & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
• Notice that the dimensionality of the two spaces does not need to be the same
• For pattern recognition we typically have 𝑀 < 𝑁 (project onto a lower-dim space)
– A linear transformation represented by a square matrix A is said to be
orthonormal when 𝐴𝐴𝑇 = 𝐴𝑇 𝐴 = 𝐼
• This implies that 𝐴𝑇 = 𝐴−1
• An orthonormal xform has the property of preserving the magnitude of the vectors
$$\|y\| = \sqrt{y^T y} = \sqrt{(Ax)^T (Ax)} = \sqrt{x^T A^T A x} = \sqrt{x^T x} = \|x\|$$
• An orthonormal matrix can be thought of as a rotation of the reference frame
• The row vectors of an orthonormal xform are a set of orthonormal basis vectors
$$Y_{M \times 1} = \begin{bmatrix} \leftarrow a_1 \rightarrow \\ \leftarrow a_2 \rightarrow \\ \vdots \\ \leftarrow a_N \rightarrow \end{bmatrix} X_{N \times 1} \quad \text{with} \quad a_i^T a_j = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases}$$
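A 2-D rotation matrix is the standard example of an orthonormal transformation; the sketch below (the angle is arbitrary) checks that it preserves vector magnitudes and that its transpose is its inverse:

```matlab
% An orthonormal transformation: rotation of the reference frame by an angle beta
beta = pi / 6;
A = [cos(beta) -sin(beta);
     sin(beta)  cos(beta)];

disp(A' * A)                 % equals eye(2), so A' is the inverse of A
x = [3; 4];
y = A * x;
fprintf('|x| = %.4f, |y| = %.4f\n', norm(x), norm(y))   % magnitudes are equal
```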
Eigenvectors and eigenvalues
– Given a matrix $A_{N \times N}$, we say that $v$ is an eigenvector* if there exists a scalar $\lambda$ (the eigenvalue) such that
$$Av = \lambda v$$
– Computing the eigenvalues:
$$Av = \lambda v \;\Rightarrow\; (A - \lambda I)v = 0 \;\Rightarrow\; \begin{cases} v = 0 & \text{(trivial solution)} \\ |A - \lambda I| = 0 & \text{(non-trivial solution)} \end{cases}$$
$$|A - \lambda I| = 0 \;\Rightarrow\; \lambda^N + a_1 \lambda^{N-1} + a_2 \lambda^{N-2} + \cdots + a_{N-1}\lambda + a_0 = 0 \quad \text{(the characteristic equation)}$$
*The "eigen-" in "eigenvector"
translates as "characteristic"
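MATLAB's eig returns both pieces at once; a minimal sketch on an arbitrary symmetric example matrix:

```matlab
% Eigenvalues and eigenvectors with eig
A = [2 1; 1 2];
[V, D] = eig(A);            % columns of V are eigenvectors, diag(D) the eigenvalues
lambda = diag(D);           % here 1 and 3
disp(A * V - V * D)         % A*v_i = lambda_i*v_i, so this is ~0
disp(roots([1 -trace(A) det(A)]))   % same eigenvalues from the characteristic polynomial
```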
– The matrix formed by the column eigenvectors is called the modal matrix M
• Matrix Λ is the canonical form of A: a diagonal matrix with eigenvalues on the main
diagonal
$$M = \begin{bmatrix} \uparrow & \uparrow & & \uparrow \\ v_1 & v_2 & \cdots & v_N \\ \downarrow & \downarrow & & \downarrow \end{bmatrix} \qquad \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{bmatrix}$$
– Properties
• If A is non-singular, all eigenvalues are non-zero
• If A is real and symmetric, all eigenvalues are real
– In that case, the eigenvectors associated with distinct eigenvalues are orthogonal
• If A is positive definite, all eigenvalues are positive
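The relation $AM = M\Lambda$ (and, when the eigenvectors are independent, $A = M\Lambda M^{-1}$) can be checked directly; a minimal sketch reusing eig on an arbitrary symmetric example:

```matlab
% Modal matrix M and canonical (diagonal) form Lambda
A = [4 1 0; 1 3 1; 0 1 2];      % real, symmetric example
[M, Lambda] = eig(A);           % M: eigenvectors as columns, Lambda: diagonal
disp(norm(A * M - M * Lambda))  % ~0: definition of the eigen-decomposition
disp(norm(A - M * Lambda * M')) % ~0: for symmetric A, M is orthonormal, so inv(M) == M'
```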
Interpretation of eigenvectors and eigenvalues
– If we view matrix 𝐴 as a linear transformation, an eigenvector represents an
invariant direction in vector space
• When transformed by 𝐴, any point lying on the direction defined by 𝑣 will remain
on that direction, and its magnitude will be multiplied by 𝜆
[Figure: a point $P$ lying on the direction of eigenvector $v$ is mapped by $y = Ax$ to a point $P'$ on the same direction; its distance from the origin scales as $d' = \lambda d$]
• For example, the transform that rotates 3-d vectors about the $z$ axis has vector $[0 \; 0 \; 1]^T$ as its only real eigenvector, with $\lambda = 1$ as its eigenvalue
$$A = \begin{bmatrix} \cos\beta & -\sin\beta & 0 \\ \sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad v = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$$
[Figure: rotation about the $z$ axis]
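This claim is easy to confirm numerically: for a rotation about the $z$ axis, eig returns one real eigenvector, $[0\;0\;1]^T$, with eigenvalue 1 (the other two eigenvalues are complex for a generic angle):

```matlab
% Eigenstructure of a rotation about the z axis
beta = pi / 5;
A = [cos(beta) -sin(beta) 0;
     sin(beta)  cos(beta) 0;
     0          0         1];
[V, D] = eig(A);
disp(diag(D))        % one eigenvalue equals 1; the other two are complex conjugates
disp(A * [0; 0; 1])  % [0; 0; 1] is unchanged: an invariant direction with lambda = 1
```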
– Given the covariance matrix Σ of a Gaussian distribution
• The eigenvectors of Σ are the principal directions of the distribution
• The eigenvalues are the variances of the corresponding principal directions
– The linear transformation defined by the eigenvectors of Σ leads to vectors
that are uncorrelated regardless of the form of the distribution
• If the distribution happens to be Gaussian, then the transformed vectors will be
statistically independent
$$\Sigma M = M\Lambda \quad \text{with} \quad M = \begin{bmatrix} \uparrow & & \uparrow \\ v_1 & \cdots & v_N \\ \downarrow & & \downarrow \end{bmatrix}, \quad \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_N \end{bmatrix}$$

$$f_x(x) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$

$$y = M^T x \;\;\Rightarrow\;\; f_y(y) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left( -\frac{(y_i - \mu_{y_i})^2}{2\lambda_i} \right)$$

[Figure: the transformation $y = M^T x$ rotates the data from the original axes $(x_1, x_2)$, whose principal directions are the eigenvectors $v_1$ and $v_2$, onto axes $(y_1, y_2)$ in which the components are uncorrelated]
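A minimal sketch of this decorrelation (the sample size and covariance values are made up): draw correlated Gaussian samples, estimate $\Sigma$, and rotate the data with the eigenvector matrix:

```matlab
% Decorrelating Gaussian data with the eigenvectors of its covariance matrix
Sigma = [3 1.5; 1.5 1];              % example covariance (positive-definite)
X = randn(5000, 2) * chol(Sigma);    % zero-mean samples with covariance ~Sigma

[M, Lambda] = eig(cov(X));           % eigenvectors = principal directions
Y = X * M;                           % y = M' * x applied to each sample (stored as a row)
disp(cov(Y))                         % ~diagonal; the diagonal entries ~ the eigenvalues
```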