MATH 2270 Project Description and Ideas
November 23, 2014
• Goals of the project:
(a) Apply the linear algebra that you’ve learned to an interesting problem.
(b) Practice reading and thinking about new material on your own.
(c) Show your understanding of linear algebra and work out relevant examples.
(d) Discuss your work with your peers.
(e) Produce a short, understandable paper presenting your findings.
• Topic: You can choose any topic you wish, but you need to make sure that there is
enough linear algebra involved. Solving linear equations will probably not be sufficient. There are detailed examples below and plenty of good ideas in Chapter 8 of
Strang and at http://www.math.harvard.edu/archive/21b_fall_03/handouts/use.pdf and http://aix1.uottawa.ca/~jkhoury/app.htm. If you are inventing your
own topic, be sure to check in with me to make sure you are on the right track.
• Graded components (20 points total, 20% of final grade):
1. Topic proposal (1 point).
(a) Due December 1 or 2, in class. I will meet with each of you on those days to
briefly discuss your topic choice, and you will turn in your proposal. While
I wander around the room, you should form pairs and help each other get
started on your projects.
(b) The proposal should include your chosen topic, the resources you plan to use,
and a short outline of what you plan to do.
2. Small group discussion record (1 point)
(a) On December 8 and 9, you will form small groups of 3-4 people. Each of you
will explain what you are working on. Ask questions and make suggestions!
If your group finishes, form a new group and repeat!
(b) As you go, maintain a sheet of paper with the following information: the
name of each person who explains his or her project to you, the topic he or
she is working on, and a sentence or two summarizing the project. Turn these
sheets in at the end of the day!
(c) You are only expected to attend class on one of those days.
3. Paper (18 points).
(a) Due December 12.
(b) Roughly 5 pages, typed. No other specific format requirements.
(c) To type math symbols and matrices, the equation-insertion tools in a good word processor should be sufficient. You can also use a professional mathematics typesetting language like LaTeX.
(d) Include a bibliography listing your sources, which may include the textbook.
• Class attendance: I expect you to attend class on December 1 or 2, so we can have a
brief chat about your proposal, and on December 8 or 9, so that you can discuss your
project with your peers. I highly recommend that you attend class on November 24
and 26, when I will present the six project ideas included later in this document. We
have a test on November 25. You are under no obligation to attend class on December
3, 5, 10, or 12. I may begin reviewing for the final exam on December 10 or 12.
• Getting help: I am here to help you and I know exactly what my expectations are!
Consult with me frequently while you’re working on your project. I prefer if you come
with questions during the usual class hours (usual room, usual time) or during the
problem sessions (usual rooms, usual times). The regular class hours may also be a
good time for discussions with other students pursuing similar projects. I can also
answer short questions via e-mail. If you desire in-depth help from me, then come to
the problem sessions or make an appointment to meet me at some other time.
• Collaboration: I encourage you to get help from any resource whatsoever (me, other
students in the class, the textbook, the internet, your math-wiz neighbor, Gandalf, ...).
However, getting help is very different from having someone else do your work for you
or copying someone else’s work. Getting help is about enhancing your understanding
and your ability to write a good paper, not about getting credit for work you didn’t
do. If you copy someone else’s work, I will know about it, and you will get a very poor
score on this project.
• Advice: You will likely encounter new definitions and theorems. A great way to begin
understanding new mathematical definitions is to invent simple examples that satisfy
the definition and simple examples that don’t satisfy the definition. A great way to
understand new theorems is to think about what the theorem is saying in the context
of some simple examples. Simple examples should be part of your write-up! Now you
see why I’ve been asking you for inventions all year long.
• Be brave! It is okay if writing mathematics makes you uncomfortable. Explain things
to the best of your ability. Write up your findings and simple examples as you go, and
get feedback on your work in progress. I’m here to help you every day of the week.
• Below you will find outlines of possible projects. Don’t judge them by their length! The
longer examples tend to be mostly self-contained, while some of the shorter examples
require you to do outside reading (I’ve included sources). The sections marked with
an asterisk (*) are intended for particularly ambitious students.
1 “The Extraordinary SVD”
In this project, you will learn how the singular value decomposition (SVD) can be used to extract the most important information in a matrix, producing good approximations that require far less data than the original matrix. The first section below covers the essential facts you need to know about the SVD, while the second and third sections are two different applications (you should pick one). The title of this project comes from a paper, http://arxiv.org/pdf/1103.2338v5.pdf, which you should peruse.
1. Background on the SVD
(a) Let $A$ be an $m \times n$ matrix of rank $r$. Recall that the SVD produces a decomposition $A = U\Sigma V^T$, where the columns $\vec u_1, \dots, \vec u_m$ of $U$ are an orthonormal basis of $\mathbb{R}^m$, the columns $\vec v_1, \dots, \vec v_n$ of $V$ are an orthonormal basis of $\mathbb{R}^n$, $\Sigma$ is $m \times n$ with the singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ on its diagonal and zeros everywhere else, and $A\vec v_i = \sigma_i \vec u_i$ for all $1 \le i \le r$.
(b) Show that the SVD can also be written as a sum of $r$ rank 1 matrices,
$$A = \sigma_1 \vec u_1 \vec v_1^T + \cdots + \sigma_r \vec u_r \vec v_r^T \quad \text{(columns times rows)}.$$
There is a
Theorem. For every $1 \le k \le r$, the matrix obtained by summing the first $k$ terms
$$A_k = \sigma_1 \vec u_1 \vec v_1^T + \cdots + \sigma_k \vec u_k \vec v_k^T$$
in the SVD is the best approximation to $A$ by a rank $k$ matrix.
When the singular values decrease quickly, the approximations using the largest singular values are very good approximations of $A$ by matrices of low rank. When $A$ is a large matrix, these low rank approximations have much less data than the full $A$ and hence are much more practical in computations than $A$. Many applications make use of this.
(c) As an example, consider the SVD
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} = 2 \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix} + \sqrt{3} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$
You can see that in $A$, the row $\begin{bmatrix} 1 & 0 & 1 \end{bmatrix}$ occurs twice, and that this double row is the first term in the SVD. This means that this double row is the best approximation to $A$ by a rank 1 matrix, namely this row $\begin{bmatrix} 1 & 0 & 1 \end{bmatrix}$ is particularly important in explaining what is going on in the matrix. Moreover, the vector $\vec u_1$ is a weighted vector telling you which rows are most important for this rank 1 approximation, and the vector $\vec v_1$ is a unit approximation of those most important rows.
(d) Building on the intuition from the example, if $A$ is a matrix of data, then the $\vec v_i^T$ are trends in the rows of the matrix of data, and the $\vec u_i$ show in which rows those trends are strongest. The importance of each trend is measured by the singular value $\sigma_i$, so the $\vec v_1$ trend is most important, followed by $\vec v_2$, and so on.
(e) For large matrices, computing the SVD by hand is impractical. Fortunately, there
is plenty of software that can help us. Matlab, Mathematica, and most other
mathematics or statistics software packages can compute SVD for you. You can
also find free online applications that compute SVD, such as Wolfram Alpha and
http://www.bluebit.gr/matrix-calculator/.
(f) Compute the SVD for some matrices $A$ with a repeated row, or with one row much larger in length than all the others, and observe how the best rank 1 approximation of $A$ (the first term of the SVD) emphasizes the repeated row or the particularly large row. What do $\vec u_1$ and $\vec v_1$ look like, and why? Can you explain why the SVD is behaving like this? (A short numerical sketch of this experiment follows this item.)
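If you would like to experiment on a computer, here is a minimal sketch of the experiment in (f) in Python with numpy (a free alternative to the software mentioned in (e); any SVD-capable package would serve equally well). The matrix is a made-up example with a repeated, relatively large row.

```python
import numpy as np

# A made-up 4x3 matrix: the row (3, 0, 3) appears twice and is much larger
# than the remaining rows, so it should dominate the rank 1 approximation.
A = np.array([[3.0, 0.0, 3.0],
              [3.0, 0.0, 3.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.2]])

U, s, Vt = np.linalg.svd(A)           # A = U @ diag(s) @ Vt
A1 = s[0] * np.outer(U[:, 0], Vt[0])  # best rank 1 approximation: sigma_1 u_1 v_1^T

print("singular values:", np.round(s, 4))
print("u_1 (which rows matter most):", np.round(U[:, 0], 4))
print("v_1 (the dominant row trend):", np.round(Vt[0], 4))
print("best rank 1 approximation:\n", np.round(A1, 2))
```

You should see that $\vec u_1$ is concentrated on the first two rows and that $\vec v_1$ is close to a unit vector in the direction $(1, 0, 1)$.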
2. First Application: Congressmen Voting
(a) Read Sections 1, 2, and 3 of the paper http://arxiv.org/pdf/1103.2338v5.pdf. Feel free to find supplementary resources online.
(b) What is the voting matrix A? Invent some examples of voting matrices.
(c) The paper says that using the SVD, $A_2$ is a good approximation of $A$. The vector $\vec u_1$ correlates well with how “partisan” every Congressman is, while $\vec u_2$ correlates with how “bipartisan” every Congressman is, namely “how often a Congressman votes with the majority”. Explain how $\vec u_1, \vec u_2$ let us graph Congressmen in the plane using these two coordinates.
(d) Explain how $A_2$ can be used to determine (with some uncertainty) whether each bill passed. How accurate are the results in the paper for the 107th Congress?
(e) Invent a (small) Congress of your own and a list of bills. Invent various voting matrices $A$, compute the SVD for each one, graph the Congressmen in the plane using partisan-bipartisan ($\vec u_1$-$\vec u_2$) coordinates, and determine how accurately you can reconstruct which bills were passed. Try to invent $A$ so that $A_2$ is a very good approximation of $A$. Why is $A_2$ so good for this $A$? Also invent $A$ so that $A_2$ does a poor job, and explain why $A_2$ is no good for your example. (Hint: the number of strong political parties – groups of Congressmen that vote similarly – should play an important role in your answer.) A small made-up setup is sketched after this item.
(f) If you want to go further with this topic, you can find plenty of additional material
online. Googling “voting SVD” is a way to get started.
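Here is one way such an experiment might look in Python with numpy; the six-member Congress and its two perfectly disciplined parties are entirely made up, so treat this as a starting template rather than part of the assignment.

```python
import numpy as np

# A made-up Congress: rows are 6 members, columns are 5 bills,
# +1 = yea and -1 = nay. Members 0-2 and 3-5 form two opposing parties.
A = np.array([[ 1,  1, -1,  1, -1],
              [ 1,  1, -1,  1, -1],
              [ 1,  1, -1, -1, -1],
              [-1, -1,  1, -1,  1],
              [-1, -1,  1, -1,  1],
              [-1, -1,  1,  1,  1]], dtype=float)

U, s, Vt = np.linalg.svd(A)
# Rank 2 approximation A_2 = sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T.
A2 = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])

print("singular values:", np.round(s, 3))   # a fast drop-off means A_2 is good
for i in range(6):
    # Each member can be plotted in the plane at these two coordinates.
    print(f"member {i}: (u_1, u_2) = ({U[i, 0]:+.3f}, {U[i, 1]:+.3f})")
print("A_2:\n", np.round(A2, 2))
```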
3. Second Application: Image Compression
(a) For this application, you will need to find some grayscale image files and have access to an application that can convert the pixel values into a matrix and compute
SVD for that large matrix. Matlab is a good choice.
(b) Read about image compression at http://www.columbia.edu/itc/applied/e3101/
SVD_applications.pdf.
(c) Convert each of your grayscale images into a matrix $A$ whose entries are the pixel intensity values. Compute the SVD of $A$, graph the singular values, compute approximations $A_k$ for a variety of values $k$ ($k$ is the number of “modes”, and each $A_k$ is a compressed version of the image $A$), and convert $A_k$ into an actual grayscale image.
(d) How many modes do you need for the compressed image to look good? How does this depend on the individual images and on the sizes of the singular values? How much more efficient are your compressed images in terms of storage space? (A sketch of the whole pipeline follows this item.)
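As a rough illustration, here is the pipeline in Python, using numpy for the SVD and the Pillow library for image input and output; the filename is a placeholder for one of your own grayscale images, and Matlab would do the same job.

```python
import numpy as np
from PIL import Image

# "photo.png" is a placeholder -- substitute one of your own images.
A = np.asarray(Image.open("photo.png").convert("L"), dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

m, n = A.shape
for k in (5, 20, 50):
    # Keep the first k modes: A_k = sum of sigma_i u_i v_i^T for i = 1..k.
    Ak = (U[:, :k] * s[:k]) @ Vt[:k]
    Image.fromarray(np.clip(Ak, 0, 255).astype(np.uint8)).save(f"modes_{k}.png")
    # Storing k modes takes k*(m + n + 1) numbers instead of m*n for A itself.
    print(f"k = {k}: {k * (m + n + 1)} numbers instead of {m * n}")
```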
2 Markov matrices and the Perron-Frobenius Theorem
In this project, you will study when the random walk problem associated to a Markov matrix
stabilizes after a large number of iterations. The solution is but one of the many applications
of the Perron-Frobenius Theorem (which is explained in detail at http://www2.math.umd.
edu/~mboyle/courses/475sp05/spec.pdf).
1. A primitive matrix is a square matrix $A$ with all entries $\ge 0$ such that some power of $A$ has all entries $> 0$. Primitive matrices satisfy the
Theorem (Perron-Frobenius). Suppose $A$ is a primitive matrix. Let $\lambda$ be the maximum of the absolute values of the eigenvalues of $A$. Then $\lambda$ is an eigenvalue of multiplicity 1, no other eigenvalue has absolute value $\lambda$, and there are eigenvectors in the $\lambda$-eigenspace that have all entries $> 0$.
Work out some examples of the theorem (choosing $A$ to have all entries $> 0$ guarantees that $A$ is primitive) and some non-examples (like $A = I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$).
2. Given an $n \times n$ Markov matrix $M$ (all entries are $\ge 0$ and every column sums to 1), construct a graph (with vertices and edges) whose incidence matrix is $M$. This graph will have directed edges weighted by the entries of $M$. If $\vec p_0$ is an initial vertex probability vector (whose entries are the probabilities of starting at each vertex), then the entries of $M\vec p_0$ are the vertex probabilities after one iteration, and $M^k \vec p_0$ are the vertex probabilities after $k$ iterations. (Go over the example we did in class during our discussion of Section 6.2; a small numerical sketch follows this item.)
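If you want to see the iteration numerically, here is a small sketch in Python with numpy; the $3 \times 3$ Markov matrix is an invented example (check that each column sums to 1).

```python
import numpy as np

# An invented Markov matrix: entry (i, j) is the probability of stepping
# from vertex j to vertex i, so every column sums to 1.
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])

p = np.array([1.0, 0.0, 0.0])   # start at vertex 0 with probability 1
for k in range(1, 21):
    p = M @ p                    # vertex probabilities after k iterations
    if k in (1, 2, 5, 20):
        print(f"after {k} steps:", np.round(p, 4))
```

Notice how quickly the probabilities stop changing; the next items explain why.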
3. If $M$ is a Markov matrix, when is $M$ primitive? The non-example $A = I$ for the Perron-Frobenius theorem above is a Markov matrix that is not primitive. Think about how being primitive relates to the graph $G$ whose incidence matrix is $M$. Try to prove the
Theorem. Let $M$ be an $n \times n$ Markov matrix and $G$ the directed graph that has vertices $v_1, \dots, v_n$ and an edge connecting two vertices $v_i$ and $v_j$ exactly when $m_{ij} > 0$. If $M$ is primitive, then $G$ is strongly connected.
4. A primitive $n \times n$ Markov matrix $M$ always has $\lambda = 1$ as its largest eigenvalue (as in the Perron-Frobenius theorem). It is easy to see that 1 is an eigenvalue of any Markov matrix $M$ (use the fact that $(1, \dots, 1)$ is an eigenvector of $M^T$ and the fact that $M$ and $M^T$ have the same eigenvalues), but not as obvious that 1 is the largest eigenvalue. Here is an outline of the proof that 1 is the largest eigenvalue:
(a) Let $S$ denote the subset of $\mathbb{R}^n$ consisting of all points $(x_1, \dots, x_n)$ such that $x_i \ge 0$ for all $i$ and $x_1 + \cdots + x_n = 1$. (Sketch what $S$ looks like for $n = 1, 2, 3$.) Use the fact that $M$ is Markov to show that for all $\vec v$ in $S$, $M\vec v$ is still in $S$.
(b) Since $M$ is primitive, apply the Perron-Frobenius Theorem to find a largest eigenvalue $\lambda$, which has an eigenvector $\vec r$ all of whose components are $> 0$ (hence, by scaling, you can choose $\vec r$ to lie in $S$). But then by (a), $M\vec r = \lambda\vec r$ must be in $S$, so $M\vec r = \vec r$ and $\lambda = 1$.
5. You are now ready to fully understand random walks on the graph whose incidence
matrix is M when M is primitive and diagonalizable. (Even if M does not have these
properties, you can change the entries of M very slightly to get both properties to
hold. How does this work?)
(a) Diagonalize: $M = S\Lambda S^{-1}$, with the eigenvalues in $\Lambda$ in decreasing order (the biggest eigenvalue is the most important!). This makes computing powers of $M$ easy.
(b) Compute $M^\infty = \lim_{k \to \infty} M^k$ using the fact (which you proved above!) that $M$ has an eigenvalue 1 and all other eigenvalues have absolute value $< 1$.
(c) Deduce that every initial vertex probability vector $\vec p_0$ gives rise to the same vertex probabilities after a long time, namely that $M^\infty \vec p_0$ is independent of $\vec p_0$. Interpret this as a statement about random walks on graphs. (A numerical check follows this item.)
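Here is a numerical check of (c) in Python with numpy, using the same invented Markov matrix as in the sketch for item 2: the steady state computed from the eigenvalue 1 matches $M^k \vec p_0$ for large $k$, for two different starting vectors.

```python
import numpy as np

M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])   # the invented primitive Markov matrix again

# The steady state is the eigenvector for the eigenvalue 1, rescaled so its
# entries sum to 1 (so that it lies in the set S of probability vectors).
vals, vecs = np.linalg.eig(M)
i = int(np.argmin(np.abs(vals - 1)))
steady = np.real(vecs[:, i] / vecs[:, i].sum())
print("steady state:", np.round(steady, 4))

# M^k p_0 approaches the same vector for ANY initial probability vector p_0.
for p0 in (np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])):
    print("M^50 p_0 =", np.round(np.linalg.matrix_power(M, 50) @ p0, 4))
```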
6. * Perron-Frobenius has many other applications. Do some Googling! Actually, Google’s
search algorithm uses Perron-Frobenius (for starters, see http://en.wikipedia.org/
wiki/PageRank), though it is quite complicated. See what you can discover!
3 Symmetries of n-gons and platonic solids
In this project, you will study linear transformations of R2 and R3 that preserve a regular
n-gon or a platonic solid, which are called symmetries. (I studied these symmetries for my
undergraduate thesis, which you can find on my webpage.)
1. Consider the square centered at the origin of $\mathbb{R}^2$ with vertices at $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} 0 \\ -1 \end{bmatrix}$.
(a) A symmetry of the square is a linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ such that applying $T$ to the square results in the same square (some of the points on the square may move to other points on the square). A symmetry of the square must take vertices of the square to vertices of the square, and edges to edges.
(b) Find the eight symmetries (rotations and reflections) that preserve the square. Write these symmetries as matrices with respect to the standard basis of $\mathbb{R}^2$. (A short sketch after item (d) generates all eight matrices and their eigenvalues.)
(c) What type of matrices are these? (Symmetries must preserve lengths and angles,
since otherwise the square would get distorted.)
(d) What are the eigenvalues and eigenvectors of each of these symmetries? Interpret
your answer geometrically.
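If you would like to check your answers for (b) and (d) numerically, here is a sketch in Python with numpy; it builds the four rotations and four reflections directly from the standard angle formulas (the choice of numpy is mine, not part of the assignment).

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation of the plane by the angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def reflection(theta):
    """Reflection of the plane across the line at angle theta/2."""
    return np.array([[np.cos(theta),  np.sin(theta)],
                     [np.sin(theta), -np.cos(theta)]])

# The eight symmetries of the square: rotations by 0, 90, 180, 270 degrees
# and reflections across the two coordinate axes and the two diagonals.
symmetries = [rotation(k * np.pi / 2) for k in range(4)] + \
             [reflection(k * np.pi / 2) for k in range(4)]

for S in symmetries:
    vals = np.linalg.eigvals(S)
    print(np.round(S, 3), " eigenvalues:", np.round(vals, 3), "\n")
```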
2. Do a similar analysis for any regular n-gon centered at the origin in R2 .
3. Consider the cube centered at the origin of $\mathbb{R}^3$ with vertices $\begin{bmatrix} \pm 1 \\ \pm 1 \\ \pm 1 \end{bmatrix}$.
(a) Make an argument for why this cube has 48 symmetries (rotations and reflections).
(b) Write down some of these symmetries as matrices and discuss their eigenvalues
and eigenvectors.
4. * Do a similar analysis for the other platonic solids: tetrahedron, octahedron, dodecahedron, icosahedron.
5. * Investigate the symmetries of other shapes in R2 or in R3 . Try to find a shape in R2
with infinitely many symmetries!
6. * What happens when you multiply the matrices corresponding to two symmetries
together? Argue that the set of all symmetries for a geometric object forms a group of
matrices (see Strang Exercise 36 on page 119 and Exercise 32 on page 354). The study
of groups is an important branch of algebra.
4 Spectral graph theory
In this project, you will study how the eigenvalues of the adjacency matrix and the Laplacian matrix of a graph (the kind with vertices and edges) encode information about the
graph. One resource is the first 10 pages of https://orion.math.iastate.edu/butler/
PDF/dissertation.pdf, which also covers plenty of other cool material.
1. What is a graph? We usually denote a graph, which consists of a vertex set V (G)
and an edge set E(G) together with incidence information, by G. Explain the basic
definitions and notation. You will need to become familiar with walks and spanning
trees.
2. What is the adjacency matrix $A$ of the graph $G$? Explain why the entries of $A^k$ measure the number of walks of length $k$ between two vertices. Explain how the largest eigenvalue of $A$ gives an asymptotic measure of the number of walks of length $k$ for large $k$. Explain how all of this works for some simple examples (try the graphs with two vertices whose adjacency matrices are $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$). A short computational sketch follows this item.
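Here is a minimal sketch in Python with numpy that computes powers of the two adjacency matrices above (in the first graph, the 1s on the diagonal mean each vertex carries a loop).

```python
import numpy as np

# The two-vertex examples from the text.
for A in (np.array([[1, 1], [1, 1]]), np.array([[0, 1], [1, 0]])):
    print("A =\n", A)
    for k in (2, 5, 10):
        # Entry (i, j) of A^k counts the walks of length k from i to j.
        print(f"A^{k} =\n", np.linalg.matrix_power(A, k))
    # The largest eigenvalue governs how fast the entries of A^k grow.
    print("largest eigenvalue:", round(float(np.max(np.linalg.eigvalsh(A))), 4), "\n")
```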
3. What is the combinatorial Laplacian $L$ of $G$? Explain how the eigenvalues of $L$ can be used to compute the number of spanning trees of $G$ (the Matrix Tree Theorem). Show some simple examples! (One example is sketched after this item.)
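As one simple example, here is a sketch in Python with numpy that applies the Matrix Tree Theorem to the complete graph $K_4$, which should have $4^{4-2} = 16$ spanning trees by Cayley's formula.

```python
import numpy as np

# Combinatorial Laplacian L = D - A for the complete graph K_4.
A = np.ones((4, 4)) - np.eye(4)
L = np.diag(A.sum(axis=1)) - A

# Matrix Tree Theorem: the number of spanning trees equals the product of
# the nonzero Laplacian eigenvalues divided by the number of vertices.
eigs = np.linalg.eigvalsh(L)
nonzero = eigs[eigs > 1e-9]
print("Laplacian eigenvalues:", np.round(eigs, 4))   # 0, 4, 4, 4
print("number of spanning trees:", round(float(np.prod(nonzero)) / 4))
```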
4. * Let M denote the adjacency matrix with each column normalized by the vertex
degrees. M is a Markov matrix. How can you use powers of M to study random
walks on G? The eigenvectors of M can be used to diagonalize M , which will help in
computing powers of M . What do the eigenvalues of M tell you about random walks
on G that have a large number of steps? See the project “Markov Matrices and the
Perron-Frobenius Theorem” for details.
5 Coordinate transformations for 3D graphics
In this project, you will study how coordinate transformations in R3 can be expressed as
matrices. The project is based on Section 8.7 of Strang. For Quaternions, one reference is
Mathematics for 3D Game Programming and Computer Graphics, Third Edition, by Eric
Lengyel.
1. Scaling: what is the matrix $S$ that scales the $x, y, z$ components of a vector $\vec v$ in $\mathbb{R}^3$ by the scaling factors $c_1, c_2, c_3$?
2. Rotation: what is the matrix $R_{\vec a, \theta}$ that rotates vectors in $\mathbb{R}^3$ by an angle $\theta$ about an axis $\mathrm{span}(\vec a)$, where $\vec a$ is a unit vector?
(a) As a warm-up, do the cases when $\vec a$ is a standard basis vector.
(b) Find $R_{\vec a, \theta}$ for a general $\vec a$. This is not easy! (Derive the formula (1) given on page 461 of Strang.) A numerical sketch of one standard form of this matrix follows this item.
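Once you have derived the formula, you can sanity-check it numerically. The sketch below, in Python with numpy, implements one standard form of the axis-angle rotation matrix (often called the Rodrigues formula); whether this matches equation (1) in Strang exactly is for you to verify.

```python
import numpy as np

def rotation_matrix(a, theta):
    """Rotation by theta about the axis span(a), where a is a unit vector,
    via R = cos(theta) I + sin(theta) K + (1 - cos(theta)) a a^T,
    where K is the cross-product ("skew") matrix of a."""
    a = np.asarray(a, dtype=float)
    K = np.array([[  0.0, -a[2],  a[1]],
                  [ a[2],   0.0, -a[0]],
                  [-a[1],  a[0],   0.0]])
    return np.cos(theta) * np.eye(3) + np.sin(theta) * K \
           + (1 - np.cos(theta)) * np.outer(a, a)

# Rotating 90 degrees about the z-axis should reproduce the warm-up case (a).
R = rotation_matrix([0, 0, 1], np.pi / 2)
print(np.round(R, 6))
print("orthogonal?", np.allclose(R.T @ R, np.eye(3)),
      " det =", round(float(np.linalg.det(R)), 6))
```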
3. Projection onto a plane through the origin: what is the matrix $P_{\vec n}$ that projects vectors onto the plane through the origin with normal vector $\vec n$?
4. Including translations: think of $\mathbb{R}^3$ inside $\mathbb{R}^4$.
(a) Let $\vec v_0$ be a fixed vector in $\mathbb{R}^3$. Why is the translation function $T(\vec v) = \vec v + \vec v_0$ not linear? This means that $T$ cannot be represented by a $3 \times 3$ matrix!
(b) We use a trick: put $\mathbb{R}^3$ inside $\mathbb{R}^4$ as the set of all vectors (“points”) whose fourth component is 1. Explain the importance of “homogeneous” coordinates.
(c) In our new setting of $\mathbb{R}^3$ inside $\mathbb{R}^4$, how can we think of scaling, rotation, and translation as $4 \times 4$ matrices? (A sketch after item (d) shows scaling and translation in this form.)
(d) Also, what is the $4 \times 4$ matrix $P_{\vec n, \vec v_0}$ that projects onto the “flat” with normal vector $\vec n$ that has been shifted $\vec v_0$ away from the origin?
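Here is a small sketch in Python with numpy of the idea in (c): scaling and translation become $4 \times 4$ matrices acting on homogeneous coordinates, so they can be composed by matrix multiplication. The specific numbers are arbitrary.

```python
import numpy as np

def translation(v0):
    """4x4 matrix translating by v0 in homogeneous coordinates."""
    T = np.eye(4)
    T[:3, 3] = v0          # the last column carries the translation vector
    return T

def scaling(c1, c2, c3):
    """4x4 matrix scaling the x, y, z components."""
    return np.diag([c1, c2, c3, 1.0])

p = np.array([1.0, 2.0, 3.0, 1.0])     # the point (1, 2, 3) with a 1 appended
q = translation([5, 0, 0]) @ scaling(2, 2, 2) @ p
print(q)   # scale to (2, 4, 6), then translate to (7, 4, 6)
```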
5. * Quaternions.
(a) The set of quaternions H is a four-dimensional real vector space with a multiplication operation. Do some research to figure out the details of how H is defined.
(b) Quaternions can be used to represent rotations more efficiently: they require less
storage space, and multiplying quaternions requires fewer computations than composing rotations. Do some research to explain how this works.
6 The derivative is a linear transformation
In this project, you will study how the derivative of a function of several variables, which
you saw in Calculus III, can be interpreted as a linear transformation (matrix). At each
point, the derivative gives the best affine approximation to the function. This is the way
mathematicians view the derivative!
1. An affine transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is a combination of a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ and a translation: $T(\vec v) = L(\vec v) + \vec v_0$, where $\vec v_0$ is in $\mathbb{R}^m$. Like linear transformations, affine transformations take lines to lines and parallelograms to parallelograms, and they have the advantage of being able to move the origin to any other point $\vec v_0$. Every affine transformation that fixes the origin is a linear transformation.
2. Let $F : \mathbb{R} \to \mathbb{R}^2$ be a (differentiable) parametrized curve in $\mathbb{R}^2$, which is a vector of functions $F(x) = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$. The derivative of $F$ is a column vector of functions $F'(x) = \begin{bmatrix} f'(x) \\ g'(x) \end{bmatrix}$, which at a point $x_0$ gives a column vector of scalars $F'(x_0) = \begin{bmatrix} f'(x_0) \\ g'(x_0) \end{bmatrix}$. This derivative gives the best affine approximation to $F$ for $x$ close to $x_0$ by
$$\overline{F}(x) = F'(x_0)(x - x_0) + F(x_0) = \begin{bmatrix} f'(x_0)(x - x_0) + f(x_0) \\ g'(x_0)(x - x_0) + g(x_0) \end{bmatrix},$$
which is just a parametrization of the tangent line to the curve $F$ at the point $F(x_0)$. Check that $\overline{F} : \mathbb{R} \to \mathbb{R}^2$ is affine. Give geometric reasoning for why $\overline{F}$ is a better approximation to $F$ near $x_0$ than any other affine transformation. Work all of this out for several choices of $F$: for instance $F(x) = \begin{bmatrix} x \\ 2x - 1 \end{bmatrix}$ (a line); $F(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}$ (a parabola); $F(x) = \begin{bmatrix} \cos x \\ \sin x \end{bmatrix}$ (unit circle). A similar analysis can be done for a parametrized curve in any $\mathbb{R}^n$. (A numerical check of the circle example follows this item.)
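Here is a numerical check of the unit-circle example in Python with numpy: the gap between $F$ and $\overline{F}$ shrinks like $h^2$, which is what “best affine approximation” means concretely (the choice of $x_0 = \pi/4$ is arbitrary).

```python
import numpy as np

def F(x):                      # the unit circle F(x) = (cos x, sin x)
    return np.array([np.cos(x), np.sin(x)])

def Fbar(x, x0):               # best affine approximation at x0
    Fprime = np.array([-np.sin(x0), np.cos(x0)])   # F'(x0)
    return Fprime * (x - x0) + F(x0)

x0 = np.pi / 4
for h in (0.1, 0.01, 0.001):
    err = np.linalg.norm(F(x0 + h) - Fbar(x0 + h, x0))
    print(f"h = {h}: error = {err:.2e}, error / h = {err / h:.2e}")
# error / h -> 0, so Fbar matches F to first order at x0.
```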
3. Let $F : \mathbb{R}^3 \to \mathbb{R}$ be a (differentiable) function of three variables $x, y, z$. The derivative of $F$ is the gradient
$$F'(x, y, z) = \nabla F(x, y, z) = \begin{bmatrix} \frac{\partial F}{\partial x}(x, y, z) & \frac{\partial F}{\partial y}(x, y, z) & \frac{\partial F}{\partial z}(x, y, z) \end{bmatrix}.$$
At every point $(x_0, y_0, z_0)$, $F'(x_0, y_0, z_0)$ is a $1 \times 3$ row vector of scalars, which gives the best affine approximation to $F$ near $(x_0, y_0, z_0)$ by
$$\overline{F}(x, y, z) = F'(x_0, y_0, z_0) \begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} + F(x_0, y_0, z_0).$$
The product of the row vector $F'(x_0, y_0, z_0)$ by the column vector $(x - x_0, y - y_0, z - z_0)$ is the directional derivative of $F$ at $(x_0, y_0, z_0)$ in the direction $(x - x_0, y - y_0, z - z_0)$, which you learned about in Calculus III! Think about why $\overline{F}$, which is built from the directional derivative, gives the best affine approximation to $F$ near $(x_0, y_0, z_0)$. Compute some examples (possibly inspired by your textbook or notes from Calculus III).
4. Let $F : \mathbb{R}^2 \to \mathbb{R}^3$ be a (differentiable) parametrized surface in $\mathbb{R}^3$, which is a vector of functions
$$F(x, y) = \begin{bmatrix} f(x, y) \\ g(x, y) \\ h(x, y) \end{bmatrix}.$$
The derivative of $F$ is the $3 \times 2$ matrix of partial derivatives
$$F'(x, y) = \begin{bmatrix} \frac{\partial f}{\partial x}(x, y) & \frac{\partial f}{\partial y}(x, y) \\ \frac{\partial g}{\partial x}(x, y) & \frac{\partial g}{\partial y}(x, y) \\ \frac{\partial h}{\partial x}(x, y) & \frac{\partial h}{\partial y}(x, y) \end{bmatrix}.$$
At a point $(x_0, y_0)$, we get an actual $3 \times 2$ matrix of scalars $F'(x_0, y_0)$. This matrix gives the best affine approximation to $F$ near $(x_0, y_0)$ by
$$\overline{F}(x, y) = F'(x_0, y_0) \begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix} + F(x_0, y_0),$$
which is a parametrization of the tangent plane to the surface $F$ at the point $F(x_0, y_0)$. Why is $\overline{F}$ affine? Why is $\overline{F}$ the best affine approximation to $F$ near $(x_0, y_0)$? Work out some explicit examples: for instance $F(x, y) = (x, y, x + y)$ (a plane); $F(x, y) = (x, y, x^2 + y^2)$ (a paraboloid); $F(x, y) = (x, y, xy)$ (a saddle); $F(x, y) = (\sin x \cos y, \sin x \sin y, \cos x)$ (unit sphere; look at the point $(x_0, y_0) = (\frac{\pi}{2}, 0)$). (A numerical check of the saddle example follows this item.)
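Here is the analogous check for the saddle example, in Python with numpy: the $3 \times 2$ derivative matrix produces the tangent plane, and the error again vanishes to first order (the base point $(x_0, y_0) = (1, 2)$ is arbitrary).

```python
import numpy as np

def F(x, y):                       # the saddle F(x, y) = (x, y, xy)
    return np.array([x, y, x * y])

def Fprime(x, y):                  # the 3x2 matrix of partial derivatives
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [  y,   x]])

def Fbar(x, y, x0, y0):            # best affine approximation near (x0, y0)
    return Fprime(x0, y0) @ np.array([x - x0, y - y0]) + F(x0, y0)

x0, y0 = 1.0, 2.0
for h in (0.1, 0.01, 0.001):
    err = np.linalg.norm(F(x0 + h, y0 + h) - Fbar(x0 + h, y0 + h, x0, y0))
    print(f"h = {h}: error = {err:.2e}")   # shrinks like h^2
```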
5. Now generalize this discussion to any differentiable function $F : \mathbb{R}^n \to \mathbb{R}^m$. What is the derivative of $F$? Write down the best affine approximation $\overline{F}$ to $F$ near a point $\vec x_0$ of $\mathbb{R}^n$. This general case unifies many seemingly different concepts like the derivative of a function of one variable (Calculus I), tangent lines and planes (as in the above two cases), and the gradient.
6. * Why does this definition of the derivative agree with the usual definition of the derivative in the case when $f : \mathbb{R} \to \mathbb{R}$?
7. * Suppose $F : \mathbb{R}^n \to \mathbb{R}^m$ and $G : \mathbb{R}^m \to \mathbb{R}^\ell$ are differentiable functions. They can be composed to get a differentiable function $G \circ F : \mathbb{R}^n \to \mathbb{R}^\ell$. How does this work if $F$ and $G$ are linear or affine? In general, $F$ and $G$ will not be affine, but we can study their best affine approximations. How is the best affine approximation to $G \circ F$ related to the best affine approximations to $F$ and $G$? The relationship is called the “chain rule”, which generalizes the “chain rule” you have seen in your calculus courses.
8. * In order to give a rigorous proof that $\overline{F}(\vec x) = F'(\vec x_0)(\vec x - \vec x_0) + F(\vec x_0)$ is the best affine approximation to $F : \mathbb{R}^n \to \mathbb{R}^m$ near $\vec x_0$, we can use an algebraic definition of the derivative. Recall the definition of the derivative in one variable:
$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$
Another way to write this is that $f'(x_0)$, if it exists, is the real number such that
$$\lim_{x \to x_0} \frac{f(x) - \left( f'(x_0)(x - x_0) + f(x_0) \right)}{x - x_0} = 0.$$
This formulation says that $\overline{f}(x) = f'(x_0)(x - x_0) + f(x_0)$ is the best affine approximation to $f(x)$, because the difference of $f(x)$ and $\overline{f}(x)$ vanishes to first order as $x$ gets close to $x_0$. Generalize this limit definition of the derivative to any $F : \mathbb{R}^n \to \mathbb{R}^m$. Your formula will look similar, but it will require some care to state correctly because it will involve vectors rather than just scalars (for instance, you cannot divide by a vector). Show that the derivative defined in terms of partial derivatives satisfies the limit definition.