SM261 Class synopsis
Prof. W. D. Joyner (wdj@usna.edu). Last updated 2014-11-25.
Vectors:
Let V = Rn denote the set of n-tuples of elements taken from the real
numbers R. The elements of V are called vectors. A vector is often thought
of, intuitively and geometrically, as a quantity with a length and a direction,
drawn as an arrow.
Define vector addition + and scalar multiplication · as follows: for
v = (v1 , ..., vn ), w = (w1 , w2 , ..., wn ) ∈ V and c ∈ R, define vector addition
and scalar multiplication “component-wise”
v + w = (v1 + w1 , v2 + w2 , ..., vn + wn ),
c · v = (cv1 , cv2 , ..., cvn ).
The dot product or inner product, denoted by v · w or by ⟨v, w⟩, is
the real number given by
v · w = v1 w1 + v2 w2 + . . . + vn wn .
We say vectors v, w ∈ Rn are orthogonal if ⟨v, w⟩ = 0, written v ⊥ w.
The length of a vector can be defined by

||v|| = √(v · v).
A unit vector is a vector of length 1.
If two vectors v, w are (non-zero) scalar multiples of each other, say

v = cw,  c ∈ R,  c ≠ 0,

then v and w are parallel.
The angle θ between two vectors v and w satisfies

cos(θ) = (v · w) / (||v|| · ||w||).
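As a concrete illustration of these formulas (the vectors and the use of Python's numpy here are an added example, not part of the original synopsis), the dot product, length, and angle can be computed as follows:

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])   # illustrative vectors in R^3
w = np.array([2.0, 0.0, -1.0])

dot = np.dot(v, w)                  # v . w = 1*2 + 2*0 + 2*(-1) = 0
length_v = np.sqrt(np.dot(v, v))    # ||v|| = sqrt(v . v) = 3
theta = np.arccos(dot / (np.linalg.norm(v) * np.linalg.norm(w)))

print(dot)                # 0.0  -> v and w are orthogonal
print(length_v)           # 3.0
print(np.degrees(theta))  # 90.0
```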
A linear combination of vectors v1 , v2 , ..., vk of V is an expression of the
form
c1 v1 + c2 v2 + ... + ck vk ,
for some scalar coefficients ci ∈ R (i = 1, 2, ..., k).
Matrices:
An m×n matrix is a rectangular array or table of elements of R arranged
with m rows and n columns. It is usually written:

A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}.
The (i, j)th entry of A is aij . The ith row of A is
r_i = \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix} \qquad (1 ≤ i ≤ m).
The jth column of A is

c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \qquad (1 ≤ j ≤ n).
If A is as above, then the (m − 1) × (n − 1) matrix obtained by removing the
i-th row and the j-th column is denoted A(i, j):

A(i, j) = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1,j-1} & a_{1,j+1} & \dots & a_{1n} \\
\vdots & & & & & & \vdots \\
a_{i-1,1} & a_{i-1,2} & \dots & a_{i-1,j-1} & a_{i-1,j+1} & \dots & a_{i-1,n} \\
a_{i+1,1} & a_{i+1,2} & \dots & a_{i+1,j-1} & a_{i+1,j+1} & \dots & a_{i+1,n} \\
\vdots & & & & & & \vdots \\
a_{m1} & a_{m2} & \dots & a_{m,j-1} & a_{m,j+1} & \dots & a_{mn}
\end{pmatrix}.
The determinant of A(i, j) is called the (i, j)th minor of A:
Minor(A)i,j = det(A(i, j)).
The signed (i, j)th minor of A is called the (i, j)th cofactor of A:
Cof(A)i,j = (−1)^(i+j) det(A(i, j)).
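As a small illustrative sketch (the matrix and the numpy-based helpers below are an added example, not from the notes), a minor is the determinant of the submatrix with one row and one column deleted, and the cofactor attaches the sign (−1)^(i+j):

```python
import numpy as np

def minor(A, i, j):
    # determinant of A with row i and column j removed (0-indexed here)
    Aij = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return np.linalg.det(Aij)

def cofactor(A, i, j):
    return (-1) ** (i + j) * minor(A, i, j)

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
print(minor(A, 0, 0))     # det([[5, 6], [8, 10]]) = 2
print(cofactor(A, 0, 1))  # -det([[4, 6], [7, 10]]) = 2
```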
Solving linear equations:
Let A be an m × n matrix. The solutions to the matrix equation Av = b
are obtained by the following method.
• Compute the row-reduced echelon form of the augmented matrix, rref(A, b).
Suppose it has the form (A′, b′), for some m × n matrix A′ in row-reduced echelon form. Let i1, . . . , ik denote the indices of the non-pivot columns.
• Using the non-pivot variables, parameterize the solution to A′v = b′:
v = b0 + b1 vi1 + . . . + bk vik ∈ b0 + Span({b1, . . . , bk}),
where v = (v1, . . . , vn) and b0, b1, . . . , bk are fixed vectors read off from the row reduction.
• A vector v solves Av = b if and only if v ∈ b0 + Span({b1 , . . . , bk }).
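The following sketch carries out this recipe with sympy (an assumed tool, not part of the notes; the system is illustrative). Matrix.rref() returns the row-reduced form together with the pivot column indices:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 4, 0]])
b = Matrix([3, 2])

aug = A.row_join(b)       # augmented matrix (A, b)
R, pivots = aug.rref()    # row-reduced echelon form and pivot columns
print(R)                  # Matrix([[1, 2, 0, 1], [0, 0, 1, 2]])
print(pivots)             # (0, 2): the middle column (index 1) is non-pivot

# Read off the parameterization: v1 = 1 - 2*v2, v3 = 2, v2 free, so
# v = (1, 0, 2) + v2*(-2, 1, 0), i.e. b0 = (1, 0, 2) and b1 = (-2, 1, 0).
```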
Types of matrices:
A matrix having as many rows as it has columns (m = n) is called a
square matrix. The entries aii of an m × n matrix A = (aij ) are called the
diagonal entries, the entries aij with i > j are called the lower diagonal
entries, and the entries aij with i < j are called the upper diagonal
entries. An m × n matrix A = (aij ) all of whose lower diagonal entries
are zero is called an upper triangular matrix. A similar definition holds
for lower triangular matrices. The square n × n matrix with 1’s on the
diagonal and 0’s elsewhere,

\begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \dots & 0 & 1 \end{pmatrix},
is called the n × n identity matrix and denoted I or In . This is both upper
triangular and lower triangular. In general, any square matrix which is both
upper triangular and lower triangular is called a diagonal matrix.
A square n × n matrix with exactly one 1 in each row and each column,
and 0’s elsewhere, is called an n × n permutation matrix. The identity In
is a permutation matrix.
Elementary row operations:
The elementary row operations on a matrix are
• multiplying one row by a non-zero scalar (Ri ↦ cRi),
• interchanging two rows (Ri ↔ Rj ),
• adding a scalar times one row to another row, replacing that row by
the sum (Ri + cRj ↦ Ri).
These operations are to be applied to an m × n matrix A, such as the one
above.
A matrix is in row-reduced echelon form if the following conditions
are true.
• All of the zero rows (rows with no nonzero entries), if any, belong at
the bottom of the matrix. That is, nonzero rows (rows with at least
one nonzero entry) are above any rows of all 0’s.
• The leading coefficient (the first nonzero number reading left-to-right,
also called the pivot) of a nonzero row is always strictly to the right
of the pivot of the row above it.
• All entries in a column above and below a pivot are 0.
• The leading coefficient (pivot) of any nonzero row is equal to 1.
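The sketch below shows how the three elementary row operations produce the row-reduced echelon form; it is a naive floating-point illustration (the function name and tolerance are my own assumptions), and in practice a library routine such as sympy's Matrix.rref() is preferable:

```python
import numpy as np

def rref(A, tol=1e-12):
    # Row-reduce a copy of A (m x n) using the three elementary row operations.
    R = A.astype(float).copy()
    m, n = R.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row >= m:
            break
        # find a row at or below pivot_row with a nonzero entry in this column
        nonzero = np.where(np.abs(R[pivot_row:, col]) > tol)[0]
        if nonzero.size == 0:
            continue                              # no pivot in this column
        r = pivot_row + nonzero[0]
        R[[pivot_row, r]] = R[[r, pivot_row]]     # Ri <-> Rj (swap)
        R[pivot_row] /= R[pivot_row, col]         # scale so the pivot equals 1
        for i in range(m):                        # clear the column above and below
            if i != pivot_row:
                R[i] -= R[i, col] * R[pivot_row]  # Ri - c*R_pivot -> Ri
        pivot_row += 1
    return R

A = np.array([[1, 2, 1], [2, 4, 0], [3, 6, 1]])
print(rref(A))
# [[1. 2. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```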
Matrix inverse:
An n × n matrix A has an inverse matrix if there is a matrix
B such that AB = BA = In. The inverse is denoted A−1 (when it exists)
and we have
In = A−1 A = AA−1 .
The inverse of a matrix A only exists when A is square.
When does A−1 exist?
Theorem 1. Let A be an n × n matrix. The following are equivalent.
• A−1 exists.
• det(A) ≠ 0.
• The system Av = 0 has the unique solution v = 0.
• The system Av = b has a unique solution (namely, v = A−1 b).
• The rank of A is n.
• rref(A) = In .
• rref(A, I) = (I, A−1 ).
• rref(A, b) = (I, A−1 b).
Adjugate formula for A−1 :
The adjugate of A is the transpose of the matrix of cofactors of A:
Adj(A) = (Cof(A)i,j)T.
The following formula for A−1 is valid:
A−1 = (1 / det(A)) Adj(A).
In the 2 × 2 case, this formula becomes:
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
Of course, this only makes sense if ad − bc ≠ 0, i.e., if det(A) ≠ 0.
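As an illustration (the matrix and the helper below are an added numpy sketch, not part of the notes), the adjugate can be built from the cofactors and then divided by the determinant:

```python
import numpy as np

def adjugate(A):
    # transpose of the matrix of cofactors (illustrative, not optimized)
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            Aij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(Aij)
    return C.T

A = np.array([[1.0, 2.0], [3.0, 4.0]])          # det(A) = -2, so A is invertible
A_inv = adjugate(A) / np.linalg.det(A)
print(A_inv)                                    # [[-2.   1. ], [ 1.5 -0.5]]
print(np.allclose(A @ A_inv, np.eye(2)))        # True
```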
Determinants:
Suppose A = (aij) is an n × n matrix and fix any i, j ∈ {1, 2, ..., n}. Its
determinant det(A) can be computed by a cofactor expansion across any row,

det(A) = \sum_{j=1}^{n} a_{i,j} Cof(A)_{i,j},

or down any column,

det(A) = \sum_{i=1}^{n} a_{i,j} Cof(A)_{i,j}.
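The following is a minimal recursive sketch of cofactor expansion across the first row (an added example using numpy; it is far slower than np.linalg.det and is meant only to illustrate the formula):

```python
import numpy as np

def det_by_cofactors(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        Aij = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # A(1, j+1)
        total += (-1) ** j * A[0, j] * det_by_cofactors(Aij)  # a_{1j} Cof(A)_{1,j}
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(det_by_cofactors(A))   # 8.0
print(np.linalg.det(A))      # 8.0 (up to rounding)
```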
Theorem 2. Let A, B be n × n matrices. The following statements are true.
• det(A−1) = det(A)−1.
• det(A^k) = det(A)^k.
• det(At) = det(A) (where At denotes the transpose of A).
• det(A) det(B) = det(AB).
• If we write A as a column of row vectors,

A = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix},
then the elementary row operation Ri + cRj → Ri does not change the
value of the determinant:

\det\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix} = \det\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_i + c r_j \\ \vdots \\ r_n \end{pmatrix}.
• If we write A as a column of row vectors, as above, then the elementary row operation cRi → Ri changes the value of the determinant by a factor of c:

c \cdot \det\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_i \\ \vdots \\ r_n \end{pmatrix} = \det\begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ c r_i \\ \vdots \\ r_n \end{pmatrix}.
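These properties are easy to check numerically; the sketch below (an added example with randomly chosen matrices, not part of the notes) verifies each statement of Theorem 2 for one instance:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                       # True

C = A.copy()
C[0] += 3 * C[1]        # row operation R1 + 3*R2 -> R1
print(np.isclose(np.linalg.det(C), np.linalg.det(A)))                         # True

D = A.copy()
D[2] *= 5               # row operation 5*R3 -> R3
print(np.isclose(np.linalg.det(D), 5 * np.linalg.det(A)))                     # True
```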
Main properties of row-reduction
(a) The rank of the matrix A is the number of non-zero rows in rref(A),
denoted rank(A).
(b) The column span or column space of the matrix A is the span of
the column vectors of A. It’s denoted Col(A) and its dimension is
dim Col(A) = rank(A).
A basis of Col(A) is obtained by writing down the column vectors of
A corresponding to the pivot columns of rref(A).
(c) The row span or row space of the matrix A is the span of the row
vectors of A. It’s denoted Row(A) and its dimension is
dim Row(A) = rank(At ).
Note Row(A) = Col(At ).
(d) The kernel or null space of the matrix A is the subspace of vectors
which get sent to 0 by A:
Ker(A) = Null(A) = {v ∈ Rn | Av = 0}.
A basis of ker(A) is obtained by the following method.
– Compute the row-reduced echelon form of A, rref(A). Let i1, . . . , ik denote the indices of the non-pivot columns.
– Using the non-pivot variables, parameterize the solution to rref(A)v = 0:
v = b1 vi1 + . . . + bk vik,
where v = (v1, . . . , vn).
– A basis of Ker(A) is {b1 , . . . , bk }.
Sometimes the dimension of the kernel of A is called the nullity of A;
nullity(A) = dim Ker(A).
• Algorithm to compute A−1 :
If A is an n × n matrix, a fast way to compute A−1 is obtained by the
following method.
– Compute rref(A, In).
– A is invertible if and only if this has the form (In, B) for some
matrix B. Set A−1 = B.
As a consequence of (a), (b) and (d) above we have the rank plus nullity theorem (which Gilbert Strang calls the “fundamental theorem
of linear algebra”):
rank(A) + nullity(A) = n,
for every m × n matrix A.
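A brief sympy sketch of items (a)–(d) and of the inverse algorithm above (the matrices are illustrative and the use of sympy is an assumption, not part of the notes):

```python
from sympy import Matrix, eye

A = Matrix([[1, 2, 0, 1],
            [2, 4, 1, 3],
            [3, 6, 1, 4]])

R, pivots = A.rref()
print(A.rank())                        # 2 = number of nonzero rows of rref(A)
print(pivots)                          # (0, 2): columns 1 and 3 of A give a basis of Col(A)
print(A.nullspace())                   # 2 basis vectors of Ker(A), so nullity(A) = 2
print(A.rank() + len(A.nullspace()))   # 4 = n  (rank plus nullity)

# Inverse via rref(A, I), for an (illustrative) invertible matrix B:
B = Matrix([[2, 1], [5, 3]])
R2, _ = B.row_join(eye(2)).rref()
B_inv = R2[:, 2:]                      # right-hand block of rref(B, I)
print(B_inv)                           # Matrix([[3, -1], [-5, 2]])
print(B * B_inv == eye(2))             # True
```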
Let A be a given n × n matrix,

A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix},
b be a given column vector,

b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
and x be a column vector of variables,

x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
The system Ax = b has a unique solution if and only if det(A) ≠ 0. Cramer’s
rule gives us a formula for this solution without having to compute A−1.
Cramer’s Rule: Write A as a vector of column vectors:
A = (c1 , c2 , . . . , cn ),
where

c_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{pmatrix}, \qquad (1 ≤ j ≤ n).
Then the ith coordinate of the solution x to Ax = b is
x_i = \frac{\det(c_1, \dots, c_{i-1}, b, c_{i+1}, \dots, c_n)}{\det(c_1, \dots, c_{i-1}, c_i, c_{i+1}, \dots, c_n)}, \qquad (1 ≤ i ≤ n).
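Here is a short sketch of Cramer's rule in numpy (the function and the 2 × 2 system are my own illustrations; for large systems np.linalg.solve is the practical choice):

```python
import numpy as np

def cramer_solve(A, b):
    # Solve Ax = b by replacing each column of A by b in turn.
    n = A.shape[0]
    detA = np.linalg.det(A)
    x = np.zeros(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                         # (c1, ..., b, ..., cn)
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b))     # [0.8 1.4]
print(np.linalg.solve(A, b))  # same answer
```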
A linear combination of vectors v1 , . . . , vk of V is an expression of the
form
c1 v 1 + . . . + ck v k ,
for some constants c1 , . . . , ck . The collection of all possible linear combinations of v1 , . . . , vk is called the span and denoted
Span(v1 , . . . , vk ).
If the only solution to
c1 v1 + . . . + ck vk = 0,
is c1 = . . . = ck = 0 then we call the vectors v1 , . . . , vk linearly independent.
We call vectors v1 , . . . , vk of V a basis of V if (a) the v1 , . . . , vk are
linearly independent, (b) V = Span(v1 , . . . , vk ). In this case we call k the
dimension of V , written
dim(V ) = k.
If B = {v1 , . . . , vk } is a set of linearly independent vectors and if
c1 v1 + . . . + ck vk = v,
then we write
[v]B = (c1 , . . . , ck ),
and call c1 , . . . , ck the coordinates of v with respect to B.
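For example (an added numpy sketch; the basis and vector are illustrative), the coordinates [v]B are found by solving the linear system whose coefficient matrix has the basis vectors as columns:

```python
import numpy as np

b1 = np.array([1.0, 1.0])          # an illustrative basis B = {b1, b2} of R^2
b2 = np.array([1.0, -1.0])
v = np.array([3.0, 1.0])

S = np.column_stack([b1, b2])      # basis vectors as columns
coords = np.linalg.solve(S, v)     # solve S c = v for c = [v]_B
print(coords)                      # [2. 1.], i.e. v = 2*b1 + 1*b2
```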
Matrix representation of a linear transformation with respect
to a basis:
• Let T : Rn → Rm denote a linear transformation. Assume Rm and Rn
each are given the standard basis.
The matrix representation of T in the standard basis is the m × n
matrix given by:
[T ] = (T (e1 ), . . . , T (en )),
where each T (e1 ), . . . , T (en ) is represented as a column vector in Rm .
• Assume T : V → V is a linear transformation from a vector space V
to itself. Let B = {b1 , . . . , bn } denote a basis for V .
The matrix representation of T in the basis B is the n × n matrix given by:
[T ]B = ([T (b1 )]B , . . . , [T (bn )]B ),
where each [T (b1 )]B , . . . , [T (bn )]B is represented as a column vector in
Rn .
Another formula for this matrix representation is
[T ]B = S −1 [T ]S,
where the vectors b1 , . . . , bn are used as the column vectors of the n×n
matrix S. (Sometimes S is called the change-of-basis matrix.)
• Assume T : V → V′ is a linear transformation between vector spaces
V and V′. Let B = {b1, . . . , bn} denote a basis for V and let B′ =
{b′1, . . . , b′m} denote a basis for V′.
The matrix representation of T in the bases B, B′ is the m × n matrix given
by:
[T]B = ([T(b1)]B′ , . . . , [T(bn)]B′),
where each [T(b1)]B′ , . . . , [T(bn)]B′ is represented as a column vector
in Rm.
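The change-of-basis formula [T]B = S−1[T]S is easy to carry out numerically; in the sketch below (an added example, with [T] and the basis chosen for illustration) the new basis happens to consist of eigenvectors, so [T]B comes out diagonal:

```python
import numpy as np

T_std = np.array([[2.0, 1.0],      # [T] in the standard basis (illustrative)
                  [0.0, 3.0]])

S = np.array([[1.0, 1.0],          # columns are the basis vectors b1 = (1,0), b2 = (1,1)
              [0.0, 1.0]])

T_B = np.linalg.inv(S) @ T_std @ S  # [T]_B = S^{-1} [T] S
print(T_B)
# [[2. 0.]
#  [0. 3.]]
```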
Orthogonal matrices
An orthogonal basis for the vector space Rn is a basis of mutually
orthogonal vectors. An orthonormal basis for the vector space Rn is a
basis of mutually orthogonal vectors each of which has unit length. An
orthogonal matrix is a square matrix whose rows (or columns) form an
orthonormal basis. Equivalently, an n × n matrix A is orthogonal if and only if
A · At = In.
Gram-Schmidt process
Input: k linearly independent vectors v1 , . . . , vk in Rn , for k ≤ n.
Output: k orthonormal vectors u1 , . . . , uk in Rn , with Span{v1 , . . . , vk } =
Span{u1 , . . . , uk }.
• Let w1 = v1 .
• Let w2 = v2 − a1 w1, where a1 is determined by the condition ⟨w1, w2⟩ = 0.
• Let w3 = v3 − a1 w1 − a2 w2, where a1, a2 are determined by the conditions ⟨w1, w3⟩ = 0, ⟨w2, w3⟩ = 0.
...
• Let wk = vk − a1 w1 − a2 w2 − . . . − ak−1 wk−1, where a1, a2, . . . , ak−1
are determined by the conditions ⟨w1, wk⟩ = 0, ⟨w2, wk⟩ = 0, . . . ,
⟨wk−1, wk⟩ = 0.
• Finally, normalize the wj’s to have unit length: u1 = w1/||w1||, u2 =
w2/||w2||, . . . , uk = wk/||wk||.
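A compact numpy sketch of the process (an added example, not part of the notes; it uses the equivalent formulation of subtracting, from each vk, its components along the previously computed unit vectors):

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize a list of linearly independent vectors.
    us = []
    for v in vectors:
        w = v.astype(float)
        for u in us:
            w = w - np.dot(v, u) * u       # remove the component of v along u
        us.append(w / np.linalg.norm(w))   # normalize to unit length
    return us

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
u1, u2 = gram_schmidt([v1, v2])
print(np.dot(u1, u2))                          # 0.0 (up to rounding)
print(np.linalg.norm(u1), np.linalg.norm(u2))  # 1.0 1.0
```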
Orthogonal projections
Let V be a subspace of Rn . The orthogonal complement of V is the
subspace
V ⊥ = {w ∈ Rn | ⟨v, w⟩ = 0, for all v ∈ V }.
Facts:
• Every w ∈ Rn has a unique representation w = v + v⊥ , for v ∈ V and
v⊥ ∈ V ⊥ .
• If V = Col(A), for some matrix A, then Col(A)⊥ = ker(At ).
• If V = Row(A), for some matrix A, then Row(A)⊥ = ker(A).
• If V has orthonormal basis u1 , . . . , uk in Rn , then the orthogonal
projection prV : Rn → V is given by
prV (v) = ⟨v, u1⟩u1 + ⟨v, u2⟩u2 + . . . + ⟨v, uk⟩uk.
The matrix representation of prV is the matrix Q · Qt , where Q =
(u1 , . . . , uk ) (written as a matrix of column vectors).
• If V = Col(A), where A has linearly independent columns, then the
matrix representation of prV is the matrix A(At A)−1 At :
[prV ] = A(At A)−1 At .
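The projection-matrix formula can be checked numerically; the following is an added numpy sketch (the matrix A and the vector v are illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0],          # V = Col(A), a plane in R^3
              [1.0, 1.0],
              [0.0, 1.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T   # [pr_V] = A (A^t A)^{-1} A^t
v = np.array([1.0, 2.0, 3.0])

print(P @ v)                             # orthogonal projection of v onto V
print(np.allclose(P @ P, P))             # True: projections satisfy P^2 = P
print(np.allclose(A.T @ (v - P @ v), 0)) # True: the residual is orthogonal to Col(A)
```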
Application to linear regression:
To find the line y = mx + b of best fit for the data (x1 , y1 ), . . . , (xn , yn ),
let

A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
and compute
(At A)−1 At y,
which is the vector of least-squares coefficients (m, b).
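A minimal least-squares sketch (the data points are made up for illustration, and np.linalg.lstsq would give the same answer):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])      # roughly y = 2x + 1 with noise

A = np.column_stack([x, np.ones_like(x)])    # rows (x_i, 1)
m, b = np.linalg.inv(A.T @ A) @ A.T @ y      # (A^t A)^{-1} A^t y
print(m, b)                                  # approximately 2.0 and 1.0
```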
Eigenvalues and Eigenvectors
In general, you have the eigenvector equation

Av = λv,    v ≠ 0.

This means that λ is an eigenvalue of A with eigenvector v.
• Find the eigenvalues. These are the roots of the characteristic polynomial
pA (x) = det(A − xIn ).
In the 2 × 2 case, this becomes

p(x) = \det\begin{pmatrix} a - x & b \\ c & d - x \end{pmatrix} = x^2 - (a + d)x + (ad - bc).
• The eigenspace associated to λ is
Eλ = ker(A − λIn ).
The non-zero elements in this vector subspace are the eigenvectors associated to λ.
In the 2 × 2 case, if b ≠ 0, then you can use the formula

v = \begin{pmatrix} b \\ \lambda - a \end{pmatrix},

to get an eigenvector associated to λ.
• The algebraic multiplicity of an eigenvalue λ of A is its multiplicity
as a root of the characteristic polynomial. For example, if

p_A(x) = \prod_{j=1}^{k} (x - \lambda_j)^{m_j},

where λ1, . . . , λk are distinct, then the algebraic multiplicity of λj is mj.
• The geometric multiplicity of an eigenvalue λ of A is the dimension
of the subspace ker(A − λI).
• In general, the geometric multiplicity of an eigenvalue λ is always less
than or equal to the algebraic multiplicity of λ.
• The matrix A is diagonalizable if and only if there is an invertible
matrix S such that
S −1 AS = D,
where D is a diagonal matrix having the eigenvalues of A on the diagonal (counted according to multiplicity).
• If A is symmetric (i.e., A = At) then A is diagonalizable.
• If A has distinct eigenvalues then A is diagonalizable.
• If, for each eigenvalue λ of A, the geometric multiplicity of λ is always
equal to the algebraic multiplicity of λ, then A is diagonalizable. The
converse also holds.
Moreover, the following holds. If, for each eigenvalue λ of A, the geometric multiplicity of λ is always equal to the algebraic multiplicity of
λ, then there are n linearly independent eigenvectors v1 , . . . , vn of A.
Using these vectors vj as column vectors of an n × n matrix S, then
S −1 AS = D,
where D is a diagonal matrix having the eigenvalues of A on the diagonal (counted according to multiplicity). The matrix representation of
A with respect to the basis B = {v1 , . . . , vn } is D:
[A]B = D.
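As a closing illustration, here is an added numpy sketch of diagonalization (the matrix is illustrative and has distinct eigenvalues, so it is diagonalizable by the criterion above):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

evals, evecs = np.linalg.eig(A)   # columns of evecs are eigenvectors of A
S = evecs                         # change-of-basis matrix built from eigenvectors
D = np.linalg.inv(S) @ A @ S      # S^{-1} A S = D
print(evals)                      # [2. 3.] (order may vary)
print(np.round(D, 10))            # diag(2, 3): A is diagonalizable
```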