A Brief Outline of Math 355
Lecture 1. The geometry of linear equations; elimination with matrices.
• A system of m linear equations with n unknowns can be thought
of geometrically as m hyperplanes intersecting in Rn . Some
basic questions in this course are:
i. Are there any points in Rn where all the hyperplanes intersect?
ii. How many such points are there?
Geometrically, we saw that in R2 , two lines can intersect either
in a single point, everywhere (i.e. they are the same line), or
nowhere (i.e. they are parallel).
• We also emphasized the column picture, where we look for a
solution as a linear combination of the columns of our matrix.
The columns are viewed as vectors in Rm .
• In order to (attempt to) solve a system of linear equations,
we convert the equations to a matrix, then use row operations
to reduce the matrix A to an upper triangular matrix U , from which it is easy to back-substitute and find any solutions.
Allowable row operations are:
i. Add a multiple of one row to another
ii. Exchange rows
iii. Multiply a row by a nonzero number
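Here is a minimal computational sketch of the elimination-and-back-substitution recipe above, in Python with NumPy. The 3 × 3 system is a made-up example, chosen so that no row exchanges are needed:

    import numpy as np

    # A made-up 3x3 system Ax = b (assumed example; no row exchanges needed).
    A = np.array([[ 2.,  1., 1.],
                  [ 4., -6., 0.],
                  [-2.,  7., 2.]])
    b = np.array([5., -2., 9.])

    U, c, n = A.copy(), b.copy(), len(b)

    # Forward elimination: reduce A to an upper triangular U.
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i, j] / U[j, j]        # the multiplier for row i
            U[i, j:] -= m * U[j, j:]     # subtract m * (row j) from row i
            c[i] -= m * c[j]

    # Back substitution: solve Ux = c from the bottom row up.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]

    print(x)                             # should agree with np.linalg.solve(A, b)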
Lecture 2. Multiplication and inverse matrices;
• Matrix multiplication (i.e. AB = C) can be thought of in four
ways:
1. One entry at a time: The entry ci,j is the inner product
of the ith row of A with the jth column of B.
2. A row at a time: The ith row of C is a linear combination of the rows of B, with the coefficients of the linear
combination being the ith row of A.
3. A column at a time: The ith column of C is a linear
combination of the columns of A, with the coefficients of
the linear combination being the ith column of B.
4. A whole matrix at a time: Multiply a column of A with a
row of B to get an m×n matrix. Add up all such matrices
to get AB.
• Given a square, invertible matrix, take the augmented matrix
[ A | I ], and row reduce A to reduced row echelon form (which
will be the identity matrix, I), to get [ I | A−1 ].
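A small sketch of the [ A | I ] → [ I | A−1 ] recipe above, using SymPy's rref; the particular 3 × 3 matrix is just an assumed example:

    from sympy import Matrix, eye

    A = Matrix([[1, 2, 0],
                [0, 1, 1],
                [1, 0, 1]])        # assumed invertible example

    aug = A.row_join(eye(3))       # the augmented matrix [ A | I ]
    R, pivots = aug.rref()         # Gauss-Jordan elimination to reduced row echelon form
    A_inv = R[:, 3:]               # the right half is A^{-1}

    print(A_inv == A.inv())        # True: agrees with SymPy's built-in inverse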
Lecture 3. Factorization into A = LU ; transposes, permutations, spaces.
• Using the "a row at a time" idea of multiplication, we could
translate row operations into matrix algebra. For example, if
we were row reducing a 3 × 3 matrix and wanted to subtract 2
times row 1 from row 2, the picture might look something like:

\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix}
= E_{2,1} A =
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
We prefer to reduce a matrix to the form A = LU , since the
inverses of the elimination matrices are particularly easy to find
(the inverse of a matrix that subtracts 2 times row 1 from row 2 is one that
adds 2 times row 1 to row 2), and the product L = E_{2,1}^{-1} E_{3,1}^{-1} E_{3,2}^{-1} simply
collects the multipliers from elimination.
• If A = (ai,j ), then the (i, j)th entry of AT , the transpose of A,
is (aj,i ).
• The matrix that permutes the rows of another matrix can be
found by performing the permutation on the identity matrix.
There are n! n × n permutation matrices, the inverse of a
permutation matrix is its transpose (i.e. P P T = P P −1 = I),
and the product of two permutation matrices (or the transpose of one) is
another permutation matrix.
• A vector space is a collection V of objects (which are called
vectors), which can be added or multiplied by a (real) number
(and the result will still be in V ). A subspace of a vector space
V is a subset of V which is still a vector space. For example,
the column space of an m × n matrix is a subspace of Rm .
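A sketch of the A = LU idea above using SciPy; note that scipy.linalg.lu allows row exchanges, so in general it returns A = P LU. The matrix is an assumed example chosen so that no exchanges should be needed:

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[4., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 2.]])      # assumed example

    P, L, U = lu(A)                   # factors A as P @ L @ U
    print(L)                          # unit lower triangular; the multipliers sit below its diagonal
    print(U)                          # upper triangular, the result of elimination
    print(np.allclose(A, P @ L @ U))  # True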
Lecture 4. Rn ; column space and null space; solving Ax = 0: pivot variables,
special solutions.
• Our prime example of a vector space is Rn , that is, we take our
vectors to be n-tuples of real numbers, and perform addition
and scalar multiplication componentwise.
• The column space of a matrix A is the set of all linear combinations of the columns of A. Equivalently, it is the set
of all b so that Ax = b has a solution. We denote the column
space by C(A).
• The null space of a matrix A is the set of all x so that Ax = 0.
We denote the null space by N (A).
• To find the null space of a matrix A:
i. Use Gauss-Jordan elimination to convert A to reduced row
echelon form, R. You will have r pivot variables and n − r
free variables.
ii. Set the first free variable equal to 1 and the rest equal
to 0, then solve for the pivot variables. This is the first
special solution.
iii. Repeat the previous step with each of the other free variables to find n − r linearly independent special solutions.
iv. These special solutions form a basis for N (A) (so any
linear combination of the special solutions is in the null
space).
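A sketch of the special-solution recipe above in SymPy, on an assumed 3 × 4 example of rank 2 (so there are n − r = 2 special solutions):

    from sympy import Matrix

    A = Matrix([[1, 2, 2,  2],
                [2, 4, 6,  8],
                [3, 6, 8, 10]])   # assumed example with rank 2

    R, pivot_cols = A.rref()      # reduced row echelon form and the pivot columns
    print(R)
    print(pivot_cols)             # (0, 2): the 0-based indices of the pivot columns
    print(A.nullspace())          # the special solutions; each one satisfies A*x = 0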
Lecture 5. Solving Ax = b: row reduced form R; independence, span, basis
and dimension.
• Algorithm for complete solution to Ax = b:
i. Use row operations to change A to R.
ii. Set free variables to zero, solve for pivot variables to find
xparticular .
iii. Find the nullspace of A, N (A).
iv. Complete solution to Ax = b is x = xp + xn , where xn is
any vector in N (A).
So a complete solution to such a problem would consist of
finding some xp (remember, if N (A) ≠ {0} there are infinitely many
particular solutions to choose from), and N (A) (plus writing x = xp + xn ).
• Given an m × n matrix A with rank r, there are three special
cases:
i. r = n < m: Then N (A) = {0}, and Ax = b has either 0
or 1 solution.
ii. r = m < n: Then dim(C(A)) = r = m, so C(A) = Rm ,
and so Ax = b always has a solution. Also, dim(N (A)) =
n − r > 0, so there are in fact ∞ many solutions.
iii. r = m = n: Then N (A) = {0} and Ax = b always
has a solution. Thus there is always a unique solution to
Ax = b.
• A set of vectors v1 , v2 , . . . , vn is linearly independent if
a1 v1 + a2 v2 + · · · + an vn = 0
forces a1 = a2 = · · · = an = 0. The algorithm for
checking whether vectors are independent is to create a matrix
A with the vectors as columns. If N (A) = {0}, then the vectors
are independent. Otherwise, they are dependent.
• The span of a set of vectors is all the linear combinations of
those vectors. We say that v1 , v2 , . . . , vn span a vector space
V if V = span{v1 , v2 , . . . , vn }. For example, the span of the
columns of a matrix is the column space.
• A set of vectors is a basis for a vector space V if
i. The vectors are linearly independent, and
ii. The vectors span V .
• The dimension of a vector space V is the number of vectors in
any basis for V (recall, we showed that any basis for V has the
same number of vectors). We also showed that if dim(V ) = n,
then any n linearly independent vectors in V will be a basis
for V .
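A sketch of the x = xp + xn recipe above, in SymPy, reusing the same assumed rank-2 example; b is chosen to lie in the column space so the system is solvable:

    from sympy import Matrix, symbols, linsolve

    A = Matrix([[1, 2, 2,  2],
                [2, 4, 6,  8],
                [3, 6, 8, 10]])
    b = Matrix([1, 5, 6])              # assumed right-hand side in C(A)

    R, pivots = A.row_join(b).rref()   # with the free variables set to 0, the pivot
    print(R)                           # variables of x_particular appear in the last column

    print(A.nullspace())               # basis of N(A): the x_n part

    x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
    print(linsolve((A, b), [x1, x2, x3, x4]))   # the complete solution, with free parameters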
Lecture 6. The four fundamental subspaces; matrix spaces, polynomial spaces.
• We took an m × n matrix A, and looked at the column space (C(A)),
the null space (N (A)), the row space (C(AT )), and the left null
space (N (AT )).
The natural questions to ask when looking at subspaces are:
1. What is a basis?
2. What is the dimension?
We answer these questions here:
Suppose we have a matrix A. Then if we take the augmented
matrix [ A | I ] and use row operations to reduce A to reduced
row echelon form, R, then we call the matrix on the right E,
for elimination matrix (note the matrix E’s relationship to
the elimination matrices from chapter 1). That is, we use row
reduction to go from:
[ A | I ] → [ R | E ]
Now we can easily read off the rank, r, of the matrix A, by
counting the pivot variables in R, as well as calculate:
Dimension of C(A): Is just the rank, r.
Basis for C(A): Is the r columns of A that correspond to the
pivot columns of R.
Dimension of C(AT ): Is also the rank, r.
Basis for C(AT ): These are the first r rows of R (since row
operations do not change the row space, and the first r
rows are the pivot rows).
Dimension of N (A): This is the number of free variables,
which is n − r.
Basis for N (A): We find n − r solutions to the system of
equations Rx = 0, by setting one free variable equal to 1
at a time, while leaving the rest equal to zero, and solving.
Dimension of N (AT ): Since this is just the null space of AT ,
which has r pivots and m−r free variables, this must have
dimension m − r.
Basis for N (AT ): Take the bottom m − r rows of E.
• The space Mm×n of m × n matrices can also be considered
a vector space, even though matrices are not traditionally
thought of as “vectors”. We also inspected the subspaces of
upper triangular, symmetric and diagonal matrices. Be able
to find bases for these spaces. As an example, a basis for M3×3
is
\left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\dots,
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right\}.
• The space Pn of polynomials of degree ≤ n is also a vector
space of dimension n + 1, with basis {1, x, x2 , . . . , xn }.
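A sketch of the four-subspace bookkeeping above, in SymPy, on an assumed m = 3, n = 4 example of rank r = 2; the four dimensions should come out to r, r, n − r, m − r:

    from sympy import Matrix

    A = Matrix([[1, 2, 2,  2],
                [2, 4, 6,  8],
                [3, 6, 8, 10]])    # assumed example

    col   = A.columnspace()        # basis of C(A):   r vectors in R^m
    row   = A.rowspace()           # basis of C(A^T): r vectors (the pivot rows)
    null  = A.nullspace()          # basis of N(A):   n - r special solutions
    lnull = A.T.nullspace()        # basis of N(A^T): m - r vectors

    print(len(col), len(row), len(null), len(lnull))   # 2 2 2 1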
Lecture 7. Graphs, networks, incidence matrices.
• A graph consists of nodes and edges. If these were more serious
notes, there’d be an example drawn.
• An incidence matrix for an oriented graph with m edges and
n nodes will be an m × n matrix, with the entries
a_{i,j} = \begin{cases} -1 & \text{if edge } i \text{ leaves node } j \\ 1 & \text{if edge } i \text{ enters node } j \\ 0 & \text{otherwise.} \end{cases}
• Each of the four fundamental subspaces has a physical interpretation, starting with interpreting the vector x as the potential at each node:
The column space: The vector e = Ax represents the possible potential differences.
The null space: This is the stationary solution, when there is no potential
difference.
The left null space: The set of y so that AT y = 0 are those currents which
satisfy Kirchhoff's circuit law, which says that the net flow
of current at any node must be 0.
The row space: The corresponding pivot rows give a spanning tree
in the graph (i.e. a subgraph that has no loops, but contains every node).
• An incidence matrix has another interesting interpretation:
the dimension of N (AT ) is the number of loops in the graph,
while the rank is the number of nodes, minus 1. Hence,
dim(N (AT )) = m − r,  i.e.  # loops = # edges − (# nodes − 1).
This is Euler's formula.
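A small numerical check of the loop count above, on an assumed oriented graph with 4 nodes and 5 edges (a square plus one diagonal), using the leaves/enters sign convention from the definition of the incidence matrix:

    import numpy as np

    # Row i has -1 where edge i leaves a node and +1 where it enters one (assumed graph).
    A = np.array([[-1,  1,  0,  0],   # edge 1: node 1 -> node 2
                  [ 0, -1,  1,  0],   # edge 2: node 2 -> node 3
                  [ 0,  0, -1,  1],   # edge 3: node 3 -> node 4
                  [ 1,  0,  0, -1],   # edge 4: node 4 -> node 1
                  [-1,  0,  1,  0]])  # edge 5: node 1 -> node 3

    m, n = A.shape
    r = np.linalg.matrix_rank(A)
    print(r)          # 3 = (# of nodes) - 1
    print(m - r)      # 2 = the number of independent loops = # edges - (# nodes - 1)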
Lecture 8. Orthogonal vectors and subspaces; projections onto subspaces.
• Two vectors x and y are orthogonal if xT y = 0.
• Two subspaces S and T are orthogonal if sT t = 0 for every
vector s ∈ S and t ∈ T .
• Two subspaces S and T of Rn are orthogonal complements if
i. S and T are orthogonal.
ii. dimS + dimT = n.
• The row space and null space of an m×n matrix are orthogonal
complements in Rn .
• The column space and the left null space of an m × n matrix
are orthogonal complements in Rm .
• To project a vector b onto the subspace generated by a, we
use the projection matrix P , given by
P = \frac{a a^T}{a^T a}.
Then the projection of b is just P b.
• We define a projection matrix to be any matrix P so that
i. P T = P , and
ii. P 2 = P
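A sketch of the rank-one projection formula above; the vectors a and b are assumed examples:

    import numpy as np

    a = np.array([1., 2., 2.])      # assumed direction vector
    b = np.array([1., 1., 1.])      # assumed vector to project

    P = np.outer(a, a) / (a @ a)    # P = a a^T / (a^T a)
    p = P @ b                       # the projection of b onto the line through a

    print(p)
    print(np.allclose(P.T, P), np.allclose(P @ P, P))   # P^T = P and P^2 = P: True True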
Lecture 9. Projection matrices and least squares; orthogonal matrices and
Gram-Schmidt
• We use projections to solve least squares problems. That is, in
the event that there is no x so that Ax = b, we find an x̂ so
that ||Ax̂ − b||2 is as small as possible.
• We solve least squares using the projection matrix
P = A(AT A)−1 AT ,
in the sense that P b = Ax̂.
• In practice, to solve the least squares problem, you solve the
equation
AT Ax̂ = AT b.
This has a unique solution exactly when AT A is invertible, which
happens precisely when A has independent columns.
• A set of vectors q1 , q2 , . . . , qn is orthonormal if
q_i^T q_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
• Any matrix (rectangular or square) Q with orthonormal columns
has the property
QT Q = I.
If Q is also square, then QT = Q−1 .
• If we have a least squares problem with an orthogonal matrix (i.e. one with orthonormal columns), then the projection
equation simplifies to
x̂ = QT b,
so in particular, the ith coordinate of x̂ is x̂i = qiT b.
• The Gram-Schmidt process takes a set of vectors a, b, c, . . . z
(ok, I don’t mean precisely 26 vectors, but I don’t want to
involve subscripts either so bear with me), and converts them
into orthogonal vectors A, B, C, . . . , Z, and then into orthonormal vectors q1 , q2 , q3 , . . . , qn , so that all of the different sets
of vectors have the same span. Here is the algorithm:
1. We define the orthogonal vectors recursively:
A = a,
B = b − \frac{A^T b}{A^T A} A,
C = c − \frac{A^T c}{A^T A} A − \frac{B^T c}{B^T B} B,
\vdots
Z = z − \frac{A^T z}{A^T A} A − \frac{B^T z}{B^T B} B − \cdots − \frac{Y^T z}{Y^T Y} Y.
2. We normalize the vectors:
q1 = A/||A||, q2 = B/||B||, q3 = C/||C||, . . . , qn = Z/||Z||.
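A sketch of the Gram-Schmidt recursion above on three assumed vectors; numpy.linalg.qr would produce the same q's up to sign:

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonalize independent vectors, then normalize them."""
        orthogonal = []
        for v in vectors:
            w = np.array(v, dtype=float)
            for u in orthogonal:                 # subtract the component along each earlier vector
                w -= (u @ v) / (u @ u) * u
            orthogonal.append(w)
        return [w / np.linalg.norm(w) for w in orthogonal]

    a = np.array([1., 1., 0.])                   # assumed example vectors
    b = np.array([1., 0., 1.])
    c = np.array([0., 1., 1.])

    Q = np.column_stack(gram_schmidt([a, b, c]))
    print(np.allclose(Q.T @ Q, np.eye(3)))       # orthonormal columns: True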
Lecture 10. Properties of determinants; determinant formulas and cofactors.
We deduced that three properties of the determinant completely determine the determinant. We used these three properties to prove
that seven more properties hold, and then used this to deduce formulas for the determinant.
• The determinant is a function that eats square (real valued)
matrices and gives a (real) number. The three defining properties of the determinant are:
1. detI = 1.
2. Exchanging two rows of a matrix changes the sign of the
determinant.
3. The determinant is linear in each row. This means:
a. Multiplying a row by a number multiplies the determinant by the same number. For example, if
A = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix}
\quad\text{and}\quad
A' = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow t\,r_i \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix},
then detA' = t · detA.
b. Adding a vector to a row of a matrix is "additive"
(this doesn't seem like the right word to use, but I
don't think a correct and simple word exists...). For
example, if
A_r = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_i \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix},
\quad
A_s = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow s_i \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix},
\quad\text{and}\quad
A = \begin{bmatrix} \leftarrow r_1 \rightarrow \\ \leftarrow r_2 \rightarrow \\ \vdots \\ \leftarrow r_i + s_i \rightarrow \\ \vdots \\ \leftarrow r_n \rightarrow \end{bmatrix},
then detA = detAr + detAs .
• We then used the previous three properties to deduce seven
more properties that the determinant must satisfy:
4. If two rows of A are equal, then detA = 0.
5. Subtracting k × (row i) from row j does not change the
determinant.
6. If A has a row of zeros, then detA = 0.
7. The determinant of an upper triangular matrix is the
product of the pivots:
\det U = \begin{vmatrix} d_1 & u_{1,2} & \cdots & u_{1,n} \\ 0 & d_2 & \cdots & u_{2,n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{vmatrix} = d_1 d_2 \cdots d_n .
8. detA = 0 if and only if A is singular.
9. det(AB) = (detA)(detB).
10. detA = detAT .
• We then used the above 10 properties to determine 3 formulas
for the determinant of a matrix:
Long formula with n! terms: By expanding a matrix using property 3b, and eliminating those with rows of 0 using property 6, we got
detA = \sum_{\substack{n!\ \text{permutations} \\ \text{of } 1,\dots,n}} \pm\, a_{1,\alpha}\, a_{2,\beta}\, a_{3,\gamma} \cdots a_{n,\omega},
where {α, β, γ, . . . , ω} is some permutation of {1, 2, 3, . . . , n},
and the sign is determined by whether this is an odd or
even permutation.
Cofactor expansion: The cofactor of ai,j , denoted ci,j , is
c_{i,j} = (−1)^{i+j} \det\big(\text{the } (n − 1) \times (n − 1) \text{ matrix with row } i \text{ and column } j \text{ removed}\big).
Then we concluded that
detA = a1,1 c1,1 + a1,2 c1,2 + · · · + a1,n c1,n ,
and referred to this as the cofactor expansion along row
1. A similar formula holds expanding along any row or
column.
Row reduction: Using properties 5 and 7, we concluded that if we
row reduce a matrix to A = LU , then detA = detU =
the product of the pivots. This is the most computationally efficient method in general, though cofactors are also
very useful when computing by hand.
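A sketch comparing the determinant routes above on an assumed 3 × 3 matrix: the recursive cofactor expansion along row 1, the product of the pivots from LU (with a sign from any row exchanges), and NumPy's built-in det:

    import numpy as np
    from scipy.linalg import lu

    def det_cofactor(A):
        """Cofactor expansion along row 1 (fine for small matrices; n! work in general)."""
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # drop row 1 and column j+1
            total += (-1) ** j * A[0, j] * det_cofactor(minor)
        return total

    A = np.array([[ 2.,  1., 1.],
                  [ 4., -6., 0.],
                  [-2.,  7., 2.]])       # assumed example

    P, L, U = lu(A)                      # A = P L U; det(P) = +1 or -1 from the row exchanges
    det_lu = np.linalg.det(P) * np.prod(np.diag(U))

    print(det_cofactor(A), det_lu, np.linalg.det(A))   # all three should agree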
Lecture 11. Applications of the determinant: Cramer’s rule, inverse matrices,
and volume; eigenvalues and eigenvectors.
• We defined C to be the cofactor matrix of A: that is, ci,j is
the cofactor associated with ai,j . This allowed us to write
A^{-1} = \frac{1}{\det A}\, C^T
(note this equation only holds if A is invertible).
• Cramer’s Rule gives us an explicit way of solving for each
coordinate of Ax = b. In particular,
x_1 = \frac{\det B_1}{\det A}, \quad x_2 = \frac{\det B_2}{\det A}, \quad \dots, \quad x_n = \frac{\det B_n}{\det A},
where Bi is the matrix A with the ith column replaced by the
vector b.
• We also saw that the volume of an n-dimensional parallelepiped
with edges a1 , a2 , . . . , an is the absolute value of the determinant of the matrix A with columns a1 , a2 , . . . , an .
• An eigenvalue of a matrix A is a number λ so that there exists
a nonzero vector x (called an eigenvector ) with
Ax = λx.
• To find the eigenvalues of A, we solve the characteristic equation
det[A − λI] = 0,
which will be an nth degree polynomial (and so will have n
not-necessarily-distinct, not-necessarily real roots).
• To find the eigenvectors, we take the eigenvalues λ1 , . . . , λn ,
and let xi be a vector in the nullspace of A−λi I. (this is a little
imprecise, since if two eigenvalues are the same, the nullspace
of A − λI may contain more than one linearly independent
vector)
• If we have n independent eigenvectors, and put them as columns
of the matrix S, then
S −1 AS = Λ and A = SΛS −1 ,
where Λ is a diagonal matrix with the eigenvalues along the
diagonal:
Λ = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.
You can remember this equation since Axi = xi λi corresponds
to multiplying A on the right by the column xi , giving AS =
SΛ.
• Note that if a matrix can be diagonalized, then
Ak = SΛk S −1 ,
where Λk is easily computed as
Λ^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}.
• A matrix is diagonalizable if and only if it has n independent
eigenvectors. If all of the eigenvalues are distinct, then the
matrix is guaranteed to be diagonalizable. However, if a matrix has
repeated eigenvalues, then it may or may not be diagonalizable.
• Solved the equation:
uk+1 = Auk ,
given the initial vector u0 , by noting that uk = Ak u0 .
• To actually compute uk :
i. Find eigenvalues λ1 , . . . , λn and eigenvectors x1 , . . . , xn of
A,
ii. Write u0 = c1 x1 + c2 x2 + · · · + cn xn = Sc, where S is the
eigenvector matrix, and c = [c1 , . . . , cn ]T is the solution
vector to Sc = u0 .
iii. Then
uk = SΛk c.
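A minimal NumPy sketch of the recipe above on an assumed 2 × 2 example: numpy.linalg.eig returns the eigenvalues and the eigenvector matrix S, and solving Sc = u0 gives the coefficients c:

    import numpy as np

    A = np.array([[0.9, 0.2],
                  [0.1, 0.8]])             # assumed example
    u0 = np.array([1.0, 0.0])
    k = 10

    lam, S = np.linalg.eig(A)              # eigenvalues and eigenvectors (the columns of S)
    c = np.linalg.solve(S, u0)             # S c = u0
    uk = S @ (lam ** k * c)                # u_k = S Lambda^k c

    print(uk)
    print(np.linalg.matrix_power(A, k) @ u0)   # direct check: A^k u0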
Lecture 12. Diagonalization and powers of A; differential equations and eAt .
• Solved linear equations of the form
du1/dt = a1,1 u1 + a1,2 u2 + · · · + a1,n un
du2/dt = a2,1 u1 + a2,2 u2 + · · · + a2,n un
..
dun/dt = an,1 u1 + an,2 u2 + · · · + an,n un ,
which we wrote in the decidedly more compact form
du/dt = Au.
We typically are also given an initial condition u(0).
• To solve:
i. Find eigenvalues λ1 , . . . , λn and eigenvectors x1 , . . . , xn of
A,
ii. Solution is
u(t) = c1 eλ1 t x1 + c2 eλ2 t x2 + · · · + cn eλn t xn ,
where c = [c1 , . . . , cn ]T is found by noting that u(0) = Sc.
• This can also be written
u(t) = SeΛt S −1 u(0).
• We noted that the exponential of a matrix is defined by
e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots = S e^{Λt} S^{-1} ,
with the second equality holding only if A is diagonalizable.
• We also saw that
e^{Λt} = \begin{bmatrix} e^{\lambda_1 t} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2 t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n t} \end{bmatrix}.
• You can change a single 2nd order equation into a system of
1st order equations by rewriting
y'' + by' + ky = 0
as
u = \begin{bmatrix} y' \\ y \end{bmatrix}, \quad\text{so}\quad u' = \begin{bmatrix} y'' \\ y' \end{bmatrix} = \begin{bmatrix} -b & -k \\ 1 & 0 \end{bmatrix} \begin{bmatrix} y' \\ y \end{bmatrix}.
This can also be used to reduce nth order differential equations
to a system of n first order equations.
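A sketch of the eigenvector solution of du/dt = Au described above, checked against SciPy's matrix exponential; the 2 × 2 system and the initial condition are assumed examples:

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[-2.,  1.],
                  [ 1., -2.]])           # assumed example
    u0 = np.array([1., 0.])
    t = 0.5

    lam, S = np.linalg.eig(A)            # A = S Lambda S^{-1}
    c = np.linalg.solve(S, u0)           # u(0) = S c
    u_t = S @ (np.exp(lam * t) * c)      # u(t) = S e^{Lambda t} c

    print(u_t)
    print(expm(A * t) @ u0)              # the same answer via e^{At}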
Lecture 13. Markov matrices, Fourier series.
• A Markov matrix is one where
i. All entries ≥ 0.
ii. The entries in each column add to 1.
• If A is a Markov matrix, then λ = 1 is an eigenvalue, and
|λi | ≤ 1 for all other eigenvalues. Hence the steady state
will be some multiple of the eigenvector x1 corresponding to
λ1 = 1.
• Given an orthonormal basis q1 , q2 , . . . , qn , we can write any
v as
v = x1 q1 + x2 q2 + · · · + xn qn .
Since the qi 's are orthonormal, multiplying the equation on
the left by qiT leaves us with
qiT v = xi .
• The Fourier series for a function f (x) is the expansion
f (x) = a0 + a1 cos x + b1 sin x + a2 cos 2x + b2 sin 2x + · · · .
• We define the inner product for these functions as
f^T g = \int_0^{2\pi} f(x)\, g(x)\, dx,
and observe that 1, cos x, sin x, cos 2x, sin 2x, . . . is an orthogonal basis (though each sine and cosine has norm √π and the
constant function has norm √2π, so it is easy to make them orthonormal). Hence to find b2 (for example), we use the
above and observe
b_2 = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin 2x\, dx.
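A sketch of the steady-state claim above for an assumed 2 × 2 Markov matrix: the eigenvector for λ = 1, compared with repeated multiplication by A:

    import numpy as np

    A = np.array([[0.9, 0.2],
                  [0.1, 0.8]])           # assumed Markov matrix: columns sum to 1

    lam, S = np.linalg.eig(A)
    i = np.argmin(np.abs(lam - 1.0))     # locate the eigenvalue lambda = 1
    steady = S[:, i] / S[:, i].sum()     # scale the eigenvector so its entries sum to 1

    u = np.array([1.0, 0.0])
    for _ in range(100):                 # u_k = A^k u_0 approaches the steady state
        u = A @ u

    print(steady, u)                     # both close to [2/3, 1/3]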
Lecture 14. Symmetric Matrices
• If you have a symmetric matrix (that is, A = AT , or, in the case of a
complex matrix, A = ĀT ), then
1. The eigenvalues of A are real, and
2. The eigenvectors of A can be chosen to be orthogonal.
• Then a symmetric matrix A can be factored as
A = QΛQT
(compare to the usual case A = SΛS −1 ). This is called the
spectral theorem.
• Multiplying out the factorization above, we get
A = λ1 q1 q1T + λ2 q2 q2T + · · · + λn qn qnT ,
where each qi qiT = qi qiT /(qiT qi ) is an orthogonal projection matrix
(since each qi is a unit vector, qiT qi = 1). So every symmetric matrix
is a linear combination of orthogonal projection matrices.
• For a symmetric matrix, the number of positive pivots is the
same as the number of positive eigenvalues.
• A positive definite matrix is a symmetric matrix where all
eigenvalues are positive (which is the same as all the pivots
being positive).
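A sketch of A = QΛQT for an assumed symmetric matrix, using numpy.linalg.eigh (the eigensolver intended for symmetric/Hermitian matrices):

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 2., 1.],
                  [0., 1., 2.]])                   # assumed symmetric example

    lam, Q = np.linalg.eigh(A)                     # real eigenvalues, orthonormal eigenvectors
    print(np.allclose(Q.T @ Q, np.eye(3)))         # Q is orthogonal: True
    print(np.allclose(A, Q @ np.diag(lam) @ Q.T))  # A = Q Lambda Q^T: True

    # The rank-one pieces lambda_i q_i q_i^T add back up to A.
    pieces = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(3))
    print(np.allclose(A, pieces))                  # True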
Lecture 15. Complex matrices and the Fast Fourier Transform.
• A complex number z can be written in three ways:
i. z = x + iy. It can be viewed on the complex plane as the
point (x, y), making the obvious identification with R2 .
ii. z = r(cos θ + i sin θ). In this case, r is called the modulus
(a fancy word for "length") of z, and θ is called the "argument". It can be viewed on the complex plane as the
endpoint of the vector leaving the origin with length
r and angle θ.
iii. z = reiθ . See above for the terminology. This form has
the same geometric interpretation as ii, but is more widely
used. For example, 2i = 2e^{iπ/2} .
• The complex conjugate of a complex number z is found by
switching the sign on the imaginary part of z, or graphically
by reflecting z over the real axis, and is denoted by z̄:
If z = x + iy = reiθ , then z̄ = x − iy = re−iθ .
A number z is real if and only if z = z̄.
• The length of a complex number z is
(z \bar z)^{1/2} = \sqrt{(x + iy)(x − iy)} = \sqrt{x^2 + y^2} = r.
• Given a complex vector z ∈ Cn ,
z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix},
we noticed that the squared length is given by z̄T z, and so defined
the Hermitian as the transpose of the conjugate:
For vectors, zH := z̄T . For complex matrices, AH := ĀT .
• We use the Hermitian to translate words we used for real matrices and vectors into words for complex matrices and vectors:
                                  Def. for R-valued                     Def. for C-valued
    Length of x                   xT x                                  xH x
    Inner product                 xT y                                  xH y
    A symmetric                   A = AT                                A = AH
    q1 , . . . , qn orthonormal   qiT qj = 0 if i ≠ j, 1 otherwise      qiH qj = 0 if i ≠ j, 1 otherwise
Notice that the only difference is that the transpose is always
exchanged for a Hermitian, and that when dealing with real
vectors/matrices, each definition is the same.
• The nth Fourier matrix, Fn , is defined as
F_n = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{bmatrix},
where ω is the nth root of unity, that is, ω is a solution of
x^n − 1 = 0. More specifically, ω = e^{2πi/n} .
• The columns of Fn are orthogonal (so FnH Fn = nI, and Fn /√n has orthonormal
columns), and multiplying a vector by Fn can be done very quickly (the Fast Fourier Transform).
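A quick check of the orthogonality statement above for an assumed small n; note that numpy.fft uses the opposite sign convention for ω, hence the conjugate in the last line:

    import numpy as np

    n = 4
    omega = np.exp(2j * np.pi / n)
    F = omega ** np.outer(np.arange(n), np.arange(n))   # F[j, k] = omega^(j k)

    print(np.allclose(F.conj().T @ F, n * np.eye(n)))   # columns orthogonal: F^H F = n I

    x = np.array([1., 2., 3., 4.])                      # assumed vector
    print(np.allclose(F.conj() @ x, np.fft.fft(x)))     # NumPy's FFT uses omega-bar = e^{-2 pi i/n}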
Lecture 16. Positive definite matrices and minima; Similar matrices and Jordan
form.
• We looked at four equivalent definitions for an n × n matrix
A being positive definite:
i. λ1 > 0, λ2 > 0, . . . , λn > 0.
ii. Each of the n leading subdeterminants are strictly positive.
The mth leading subdeterminant is the determinant of the
m × m matrix in the top left corner of A.
iii. Each of the pivots of A are strictly positive. (Careful: this
does not mean that the diagonal of A is positive. It means
that if A = LU , then the elements on the diagonal of U
are positive!)
iv. xT Ax > 0 for all x ≠ 0.
• We define positive semidefinite by replacing all the incidences
of the words “strictly positive” above by “positive or zero”.
The terms negative definite and negative semidefinite are defined the same, just replacing “positive” by “negative” in the
definition.
• The function produced by xT Ax is called a quadratic form.
When A is 2 × 2, this corresponds to a conic section.
• If A is positive definite, then the graph of xT Ax is a paraboloid. More
generally, let f : Rn → R (think: f (x1 , x2 , . . . , xn ) = y). Then
if ∇f (a) = 0 (where ∇f = (∂f /∂x1 , ∂f /∂x2 , . . . , ∂f /∂xn )), we have:
f (a) is a minimum if
f''(a) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}
is positive definite (where each second derivative is evaluated
at a). Compare this to calculus, where a is a minimum if
f'(a) = 0 and f''(a) > 0.
• Positive definite matrices act like positive numbers: If A, B are
positive definite matrices, then so are A−1 and A + B. Also,
AT A is positive definite for any m × n matrix A with rank n
(since xT AT Ax = (Ax)T (Ax) = ||Ax||2 > 0 whenever x ≠ 0, because rank n forces Ax ≠ 0).
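A sketch of the equivalent positive-definiteness tests above on an assumed symmetric matrix (as a practical extra test, np.linalg.cholesky succeeds exactly when the matrix is positive definite and raises LinAlgError otherwise):

    import numpy as np

    A = np.array([[ 2., -1.,  0.],
                  [-1.,  2., -1.],
                  [ 0., -1.,  2.]])               # assumed symmetric example

    print(np.all(np.linalg.eigvalsh(A) > 0))      # test i: all eigenvalues positive

    minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
    print(all(d > 0 for d in minors))             # test ii: leading subdeterminants (2, 3, 4) all positive

    x = np.array([1., 2., 3.])                    # test iv for one assumed nonzero x
    print(x @ A @ x > 0)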
• Two n × n matrices are similar if there is an invertible matrix
M so that B = M −1 AM .
• An example to remember is that every diagonalizable matrix
is similar to a diagonal matrix (A = SΛS −1 ).
• Similar matrices have the same eigenvalues, and represent the
same linear transformations with different coordinates.
• We found a 'good' representative for each family of matrices
(that is to say, a family of matrices is the set of all matrices
you can get by conjugating a given matrix by invertible matrices,
i.e. the set of matrices similar to each other), which we called
the Jordan canonical form.
• A Jordan block is the matrix
J_\lambda = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{bmatrix}.
• Every matrix A is similar to a Jordan canonical matrix, which
looks like
J = \begin{bmatrix} J_{\lambda_1} & 0 & \cdots & 0 \\ 0 & J_{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_{\lambda_n} \end{bmatrix}.
Lecture 17. Singular Value Decomposition; Linear transformations and their
matrices, coordinates.
• The singular value decomposition works for all matrices, and
decomposes the m × n matrix A into
A = U ΣV T ,
where U is an m × m orthogonal matrix, Σ is an m × n “diagonal” matrix with all entries ≥ 0, and V is an n × n orthogonal
matrix.
• The columns of V are the eigenvectors of AT A (which, recall,
is positive semidefinite), and the diagonal entries of Σ are the
square roots of the associated eigenvalues.
• The columns of U are the eigenvectors of AAT , and again the
diagonal entries of Σ are the square roots of the (nonzero) eigenvalues.
• We can also look at the columns v1 , . . . , vn of V and the
columns u1 , . . . , um of U in the following way:
Let r be the rank of A.
– v1 , . . . , vr are an orthonormal basis for C(AT ) (the row space),
– u1 , . . . , ur are an orthonormal basis for C(A),
– vr+1 , . . . , vn are an orthonormal basis for N (A),
– ur+1 , . . . , um are an orthonormal basis for N (AT ).
• A linear transformation is a function T : Rn → Rm so that
i. T (u + v) = T (u) + T (v), and
ii. T (cv) = cT (v).
• Given coordinates, that is, a basis for Rn and Rm , every linear
transformation T is uniquely associated with a matrix A.
• Translating a linear transformation into a matrix:
1. You will be given a linear transformation T : Rn → Rm ,
as well as a basis v1 , . . . , vn of Rn and a basis w1 , . . . , wm
of Rm (in practice, you may decide the bases).
2. Evaluate T on the basis elements of Rn , and write the results in the
coordinates of Rm :
T (v1 ) = a1,1 w1 + a2,1 w2 + · · · + am,1 wm
T (v2 ) = a1,2 w1 + a2,2 w2 + · · · + am,2 wm
..
T (vn ) = a1,n w1 + a2,n w2 + · · · + am,n wm .
3. Now A = (ai,j ) will be the matrix representation of the
linear transformation in the given basis.
• Two matrices represent the same linear transformation in different coordinates precisely when they are similar, which is
one good reason to use eigenvectors as coordinates (so that
the linear transformation matrix is diagonal).
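A sketch of the SVD facts above using numpy.linalg.svd (which returns V T , not V ); the 3 × 2 matrix is an assumed example:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.],
                  [5., 6.]])                       # assumed 3x2 example, rank 2

    U, s, Vt = np.linalg.svd(A)                    # A = U Sigma V^T; s holds the singular values
    Sigma = np.zeros(A.shape)
    Sigma[:2, :2] = np.diag(s)

    print(np.allclose(A, U @ Sigma @ Vt))          # True
    print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))   # sigma_i^2 = eigenvalues of A^T A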
Lecture 18. Change of basis.
• A natural question is: if a linear transformation T : Rn →
Rn has matrix A with respect to the basis v1 , . . . , vn , and
matrix B with respect to the basis u1 , . . . , un , then what is
the relationship between A and B?
• Denote Rn by V or U , depending on the basis used. Then we
want to find a change-of-basis matrix M so that the square

            A
        V -----> V
        |        |
      M |        | M
        |        |
        U -----> U
            B

commutes. That is to say, B = M −1 AM .
• But M may be interpreted as a linear transformation (a change of
coordinates): write each vi in the other basis, vi = a1,i u1 + a2,i u2 + · · · + an,i un ,
and use these coefficients ai,j to find the matrix M .
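A small sketch of B = M −1 AM with assumed matrices (A playing the role of the matrix in one basis, M an assumed invertible change-of-basis matrix); similar matrices share eigenvalues:

    import numpy as np

    A = np.array([[2., 1.],
                  [0., 3.]])                 # assumed matrix of T in one basis
    M = np.array([[1., 1.],
                  [1., 2.]])                 # assumed invertible change-of-basis matrix

    B = np.linalg.inv(M) @ A @ M             # the matrix of the same transformation in the other basis

    print(np.sort(np.linalg.eigvals(A)))     # [2., 3.]
    print(np.sort(np.linalg.eigvals(B)))     # the same eigenvalues (up to round-off)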