Math 201: Linear Algebra
David Perkinson
Fall 2017
Contents
Week 1, Monday: Solving systems of linear equations.
Week 1, Wednesday: Reduced row echelon form.
Week 1, Friday: Vector spaces.
Week 2, Wednesday: Subspaces and spanning sets.
Week 2, Friday: Linear independence.
Week 3, Monday: Linear independence.
Week 3, Wednesday: Bases.
Week 3, Friday: Dimension.
Week 4, Monday: Row rank = column rank; dimension of solution space.
Week 4, Wednesday: Linear transformations.
Week 5, Monday: Isomorphisms.
Week 5, Wednesday: Lines, planes, and hyperplanes: equations and parametrizations.
Week 5, Friday: Matrices.
Week 6, Monday: The ring of square matrices. Matrix inversion.
Week 6, Wednesday: Matrices and linear transformations.
Week 6, Friday: Matrices and linear transformations.
Week 7, Monday: Matrices and linear transformations: examples. Change of basis.
Week 7, Wednesday: Determinants.
Week 7, Friday: Determinants.
Week 8, Monday: det(A) = det(At).
Week 8, Wednesday: Permutation expansion of the determinant.
Week 8, Friday: Permutation, Laplace expansions. Existence and uniqueness of the determinant.
Week 9, Monday: Geometric interpretation of the determinant.
Week 9, Wednesday: Eigenvalues.
Week 9, Friday: Eigenvalues, continued.
Week 10, Monday: Diagonalization.
Week 10, Wednesday: Algebraic and geometric multiplicity. Jordan form.
Week 10, Friday: Counting walks in graphs.
Week 11, Monday: Differential equations.
Week 11, Wednesday: Inner products.
Week 11, Friday: Length, distance, components, projections, angles.
Week 12, Monday: Gram-Schmidt.
Week 12, Wednesday: Orthogonal complements and orthogonal projections.
Week 13, Monday: Least squares.
Week 13, Wednesday: Markov chains I.
Week 13, Friday: Markov chains II. Pagerank I.
Week 14, Monday: Pagerank II.
Week 14, Wednesday: Pagerank III.

Homework

Week 1, Friday
Week 2, Tuesday
Week 2, Friday
Week 3, Tuesday
Week 3, Friday
Week 4, Tuesday
Week 5, Tuesday
Week 5, Friday
Week 6, Tuesday
Week 6, Friday
Week 7, Tuesday
Week 7, Friday
Week 8, Tuesday
Week 8, Friday
Week 9, Tuesday
Week 10, Tuesday
Week 10, Friday
Week 11, Tuesday
Week 11, Friday
Week 12, Tuesday
Week 13, Tuesday
Week 13, Friday
Week 14, Tuesday

Handouts

Mathematical writing.
Definition of the determinant.
Midterm II review.
Practice problems for final exam.
Week 3, Monday: Linear independence.
Let V be a vector space over a field F . Recall that vectors u1 , . . . , un ∈ V are linearly independent if they have no nontrivial linear relation, i.e., if

    a1 u1 + · · · + an un = 0

with ai ∈ F implies that ai = 0 for all i. A subset S ⊆ V is linearly independent if each of its finite subsets is linearly independent, i.e., if there are no nontrivial linear relations among its elements.
The most important result from last time was the following:
Theorem 1. Let S ⊆ V be linearly independent, and let v ∈ Span(S). Then v can be expressed uniquely as a linear combination of elements of S. In other words, if v = a1 u1 + · · · + ak uk and v = b1 w1 + · · · + bℓ wℓ for some nonzero ai , bi ∈ F and some ui , wi ∈ S, then up to re-indexing, we have k = ℓ, ui = wi , and ai = bi for all i.
Today, we’ll continue exploring some basic properties of vector spaces having to do
with linear dependence.
Proposition 1. Let S ⊆ V . Then S is linearly dependent if and only if there exists v ∈ S such that v is a linear combination of elements of S \ {v}.
Proof. (=⇒) Suppose S is linearly dependent. Then there exist u1 , . . . , un ∈ S and a1 , . . . , an ∈ F \ {0} such that

    a1 u1 + · · · + an un = 0.

In particular, since a1 ≠ 0, we can divide by it and solve for u1 :

    u1 = −(a2 /a1 )u2 − · · · − (an /a1 )un .

Thus, u1 is in the span of S \ {u1 }.
(⇐=) Now suppose that v is a linear combination of elements of S \ {v}. In other
words, we can write
v = a1 u1 + · · · + an un
for some ui ∈ S \ {v} and some ai ∈ F . But then,
    a1 u1 + · · · + an un + (−1) · v = 0

is a nontrivial linear relation among elements of S. So S is linearly dependent.
Proposition 2. A subset of a linearly independent set is linearly independent.
Proof. Let S ⊆ V be linearly independent, and let S′ ⊆ S. Suppose a1 u1 + · · · + an un = 0 for some ui ∈ S′ and some ai ∈ F . Then since S′ ⊆ S, it follows that ui ∈ S, too, for all i. Since S is linearly independent, it follows that ai = 0 for all i.
Very soon, we will endeavor to create a maximal set of linearly independent vectors in V by the following process: first let v1 be any nonzero vector in V . If v1 generates V , i.e., if V = Span({v1 }), then stop. Otherwise, let v2 ∈ V \ Span({v1 }). If {v1 , v2 } generates V , stop. Otherwise, choose v3 ∈ V \ Span({v1 , v2 }). Repeat, etc. We would like to say that the vi form a linearly independent set. That motivates the following:
Proposition 3. If S ⊂ V is linearly independent and v ∈ V \ S, then S ∪ {v} is
linearly dependent if and only if v ∈ Span(S).
Proof. (=⇒) Suppose that S ∪ {v} is linearly dependent. Then we may write

    av + a1 u1 + · · · + an un = 0                          (⋆)

for some a, ai ∈ F , not all zero, and some ui ∈ S. We can always assume that v appears in this expression by taking a = 0, if necessary. But, in fact, a ≠ 0: otherwise (⋆) would be a linear relation among elements of S, and since S is linearly independent, this would mean that all the ai = 0, in addition to a = 0. However, we know that at least one of these scalars is nonzero.

Thus, it must be that a ≠ 0. We can thus solve for v in (⋆):

    v = −(a1 /a)u1 − · · · − (an /a)un ∈ Span(S).
(⇐=) Suppose that v ∈ Span(S). Then

    v = a1 u1 + · · · + an un

for some ai ∈ F and ui ∈ S. Since v ∉ S, it follows that

    a1 u1 + · · · + an un + (−1) · v = 0

is a nontrivial relation among elements of S ∪ {v}. So S ∪ {v} is linearly dependent.
Example 1. Let V = (Z/5Z)2 = {(x, y) : x, y ∈ Z/5Z}. The span of (1, 3) ∈ V is
0 · (1, 3) = (0, 0)
1 · (1, 3) = (1, 3)
2 · (1, 3) = (2, 6) = (2, 1)
3 · (1, 3) = (3, 9) = (3, 4)
4 · (1, 3) = (4, 12) = (4, 2).
Here is a picture of Span({(1, 3)}) ⊂ V :

[Figure: the line spanned by (1, 3) in (Z/5Z)2 , i.e., the five points (0, 0), (1, 3), (2, 1), (3, 4), (4, 2) in the 5 × 5 grid.]
Example 2. Let V = (Z/3Z)3 , a vector space over Z/3Z.
How many elements are in V ? A point in V has the form (x1 , x2 , x3 ), and there are 3
choices for each xi . Hence, the number of elements in V is |V | = 3³ = 27.
As an exercise, check that the following is a subspace of V :
W = {(x1 , x2 , x3 ) ∈ V : x1 + x2 + x3 = 0} .
How many elements are in W ? We have,
W = {(−x2 − x3 , x2 , x3 ) : x2 , x3 ∈ Z/3Z} .
As we let x2 and x3 vary, we get 9 elements:
{(0, 0, 0), (2, 1, 0), (1, 2, 0), (2, 0, 1), (1, 1, 1), (0, 2, 1), (1, 0, 2), (0, 1, 2), (2, 2, 2)}.
Let’s try to find a linearly independent generating set. Start with v1 := (2, 1, 0). The
span of {v1 } has three elements:
0 · (2, 1, 0) = (0, 0, 0)
1 · (2, 1, 0) = (2, 1, 0)
2 · (2, 1, 0) = (1, 2, 0).
Next, note that v2 = (1, 1, 1) is not in Span(v1 ). By Proposition 3, we see that S :=
{v1 , v2 } is linearly independent. We claim Span(S) = W . First, since v1 , v2 ∈ W ,
we see Span(S) ⊆ W . Next, by Theorem 1, every element of Span(S) has a unique
expression of the form
a1 v 1 + a2 v 2
where a1 , a2 ∈ Z/3Z. Hence, |Span(S)| = 3² = 9. Since Span(S) ⊆ W and |Span(S)| = |W | = 9, it follows that Span(S) = W .
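As a quick sanity check, the counting in this example is easy to verify by brute force. Here is a short Python sketch (illustrative only, not part of the original notes) that enumerates W and the span of S = {(2, 1, 0), (1, 1, 1)} over Z/3Z and confirms that both have 9 elements and agree.

    from itertools import product

    p = 3  # we work over the field Z/3Z

    # W = {(x1, x2, x3) : x1 + x2 + x3 = 0 mod 3}
    W = {v for v in product(range(p), repeat=3) if sum(v) % p == 0}

    # Span(S) = {a1*v1 + a2*v2 : a1, a2 in Z/3Z} for v1 = (2,1,0), v2 = (1,1,1)
    v1, v2 = (2, 1, 0), (1, 1, 1)
    span = {tuple((a1 * x + a2 * y) % p for x, y in zip(v1, v2))
            for a1, a2 in product(range(p), repeat=2)}

    print(len(W), len(span), span == W)  # expected: 9 9 True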
Week 3, Wednesday: Bases.
Definition. A subset B ⊆ V is a basis if it is linearly independent and spans V . An ordered basis is a basis whose elements have been listed as a sequence: B = ⟨b1 , b2 , . . . ⟩.

⋆ Warning ⋆: Our book defines a basis to be what we are calling an ordered basis. That's not standard, and there are problems with that idea when talking about infinite-dimensional vector spaces, which we will not go into here. We will, however, use the book's notation of "⟨" and "⟩" to denote an ordered basis. Thus, for us, the word basis will connote "unordered basis", and we will try to be careful to say "ordered basis" when relevant. (No guarantees, though.)
Proposition 1. If B is a basis for V , then every element of V can be expressed
uniquely as a linear combination of elements of B.
Proof. Since B is linearly independent, we’ve already seen that every element
in Span(B) can be written uniquely as a linear combination of elements of B. Since B
is a basis, Span(B) = V .
Definition. Let B = ⟨v1 , . . . , vn ⟩ be an ordered basis for V . Given v ∈ V , there are unique a1 , . . . , an ∈ F such that

    v = a1 v1 + · · · + an vn .

The coordinates of v with respect to the basis B are the components of the vector (a1 , . . . , an ) ∈ F n .
Examples.
1. As a trivial case, consider the standard ordered basis for F 3 : B = ⟨e1 , e2 , e3 ⟩ where e1 = (1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1). It's easy to check that B is linearly independent. Given any vector (x, y, z) ∈ F 3 , we can write

    (x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1) = xe1 + ye2 + ze3 .

Hence, the coordinates of (x, y, z) with respect to the standard ordered basis are, in fact, just (x, y, z).
2. The ordering of the basis vectors affects the coordinates. For instance, if we take B′ = ⟨e′1 , e′2 , e′3 ⟩ where e′1 = (0, 0, 1), e′2 = (0, 1, 0), and e′3 = (1, 0, 0)—a permutation of e1 , e2 , e3 —then

    (x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1) = ze′1 + ye′2 + xe′3 .

Hence, the coordinates of (x, y, z) ∈ F 3 with respect to B′ are given by (z, y, x).
3. Consider the ordered basis B″ = ⟨(1, 0, 0), (1, 1, 0), (1, 1, 1)⟩ of F 3 (we leave the verification that B″ is a basis as an exercise). Given (x, y, z) ∈ F 3 , we can write

    (x, y, z) = (x − y)(1, 0, 0) + (y − z)(1, 1, 0) + z(1, 1, 1).

So the coordinates of (x, y, z) ∈ F 3 with respect to B″ are given by the vector (x − y, y − z, z). For instance, the point (1, 0, 3) ∈ F 3 , i.e., the point whose coordinates with respect to the standard basis are (1, 0, 3), has coordinates (1, −3, 3) with respect to B″.
4. Now let V be the vector space of 2 × 2 matrices over a field F . One possible basis is B = ⟨M1 , M2 , M3 , M4 ⟩ where

    M1 = [ 1 0 ]   M2 = [ 0 1 ]   M3 = [ 0 0 ]   M4 = [ 0 0 ]
         [ 0 0 ],       [ 0 0 ],       [ 1 0 ],       [ 0 1 ].

Then, for example, we can write

    [ a b ]
    [ c d ] = aM1 + bM2 + cM3 + dM4 .

So with respect to the basis B, this matrix is represented by the vector (a, b, c, d) ∈ F 4 .
5. Writing a vector in terms of its coordinates with respect to a given ordered basis
often comes down to solving a system of linear equations. For a concrete example,
let's write (7, −6) ∈ R2 in terms of B = ⟨(5, 3), (1, 4)⟩. We need to find a, b ∈ R
such that
(7, −6) = a(5, 3) + b(1, 4).
Thus, we must solve the system of equations
5a + b = 7
3a + 4b = −6.
Applying the algorithm yields a = 2 and b = −3. So the coordinates of (7, −6)
with respect to B are given by (2, −3). The picture below gives the geometry.
The basis vectors are in blue, and the red vectors indicate how (7, −6) is a linear
combination of the basis vectors.
[Figure: the basis vectors (5, 3) and (1, 4) in blue, and the red vectors 2 · (5, 3) and −3 · (1, 4) showing (7, −6) as their sum.]
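Finding coordinates with respect to an ordered basis is just solving a linear system, so it is easy to automate. A small numpy sketch (for illustration; the numbers are from the example above) recovers the coordinates (2, −3) of (7, −6) with respect to B = ⟨(5, 3), (1, 4)⟩.

    import numpy as np

    # Columns of M are the basis vectors (5, 3) and (1, 4).
    M = np.array([[5.0, 1.0],
                  [3.0, 4.0]])
    v = np.array([7.0, -6.0])

    # Solve M @ coords = v, i.e., a*(5, 3) + b*(1, 4) = (7, -6).
    coords = np.linalg.solve(M, v)
    print(coords)  # expected: [ 2. -3.]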
Preservation of linear structure. Given an ordered basis B = ⟨v1 , . . . , vn ⟩ for V , we have just given a method for representing each vector in V by an n-tuple in F n :

    v ←→ (a1 , . . . , an )

where v = a1 v1 + · · · + an vn . We thus get a mapping of sets V → F n , and by Proposition 1,
this mapping is a bijection. However, this mapping is more than a mapping of sets—it
also “preserves linear structure”. We will talk a lot more about this later, but for
now, we can illustrate the idea of preserving linear structure using example 4, above.
Suppose we have two matrices

    A = [ a b ]   and   A′ = [ a′ b′ ]
        [ c d ]              [ c′ d′ ].

Then the coordinates of A and A′ with respect to the ordered basis B = ⟨M1 , M2 , M3 , M4 ⟩ in the example are (a, b, c, d) and (a′, b′, c′, d′), respectively, i.e.,

    A = aM1 + bM2 + cM3 + dM4
    A′ = a′M1 + b′M2 + c′M3 + d′M4 .

Add these equations to see that the coordinates for A + A′ are given by (a + a′, b + b′, c + c′, d + d′):

    A + A′ = (a + a′)M1 + (b + b′)M2 + (c + c′)M3 + (d + d′)M4 .
Similarly, if λ ∈ F , the coordinates for λA are given by λ(a, b, c, d) = (λa, λb, λc, λd):
λA = λaM1 + λbM2 + λcM3 + λdM4 .
Schematically:

    A + A′  ←→  (a, b, c, d) + (a′, b′, c′, d′)
    λA      ←→  λ(a, b, c, d).
In the left column, we are performing vector addition and scalar multiplication in the
vector space of 2 × 2 matrices. In the right, we are performing these operations in F 4 ,
a different vector space. Nevertheless, the vector space operations align: it doesn't matter if we first take coordinates for A and A′ and then add the coordinates, or if we first add A and A′ and then take coordinates; we end up with the same vector in F 4 . The same holds for scaling. This is what we mean by preserving linear structure. Again: to get the coordinates for A + A′, we can add the coordinates of A to those of A′. To get the coordinates for λA, we can scale the coordinates of A by λ.
Next week, when we talk about linear transformations, we will have the appropriate language in which to more precisely describe what is meant by preserving linear
structure. The main point for now is to see that, in some sense, the vector space V
of 2 × 2 matrices is “essentially the same” as the vector space of 4-tuples, F 4 —not
just as sets, but as vector spaces.
Week 3, Friday: Dimension.
Definition. A vector space is finite-dimensional if it has a basis with a finite number
of elements.
For example, F n has a basis with n elements. Examples of infinite-dimensional vector spaces include polynomials F [x] in one variable with coefficients in a field F , the
real numbers as a vector space over the rational numbers, and the space of functions f : R → R.
Our goal today is to show that if V is a finite-dimensional vector space, then every
basis for V has the same number of elements. Thus, the following definition makes
sense:
Definition. If V is a finite-dimensional vector space, then the dimension of V ,
denoted dim V or dimF V , if we want to make the scalar field explicit, is the number
of elements in any of its bases.
Exchange Lemma. Suppose B = {v1 , . . . , vn } is a basis for a vector space V over a field F . Further, suppose that

    w = a1 v1 + · · · + an vn ∈ V                           (⋆)

with ai ∈ F , and such that aℓ ≠ 0. Let B′ be the set of vectors obtained from B by exchanging w for vℓ . Then B′ is also a basis for V .
Proof. We first show that B′ is linearly independent. For ease of notation, we may assume that ℓ = 1, i.e., that a1 ≠ 0. Suppose we have a linear relation among the elements of B′:

    bw + b2 v2 + · · · + bn vn = 0.

Substituting for w:

    0 = b(a1 v1 + · · · + an vn ) + b2 v2 + · · · + bn vn = ba1 v1 + (ba2 + b2 )v2 + · · · + (ban + bn )vn .

Since the vi are linearly independent,

    ba1 = ba2 + b2 = · · · = ban + bn = 0.
Since a1 ≠ 0, it follows that b = 0 and then that b2 = · · · = bn = 0, as well. Therefore, B′ is linearly independent.
We now show that B′ spans V . First, solve for v1 in (⋆):

    v1 = (1/a1 )w − (a2 /a1 )v2 − · · · − (an /a1 )vn .

To see that B′ spans, take v ∈ V . Since B is a basis, v can be written as a linear combination of B = {v1 , . . . , vn }, but then substituting the above expression for v1 will express v as a linear combination of B′ = {w, v2 , . . . , vn }, as required:

    v = c1 v1 + · · · + cn vn
      = c1 ((1/a1 )w − (a2 /a1 )v2 − · · · − (an /a1 )vn ) + c2 v2 + · · · + cn vn
      = (c1 /a1 )w + (−c1 a2 /a1 + c2 )v2 + · · · + (−c1 an /a1 + cn )vn .
Theorem. In a finite-dimensional vector space, every basis has the same number of
elements.
Proof. Let V be a finite-dimensional vector space. Among all the bases for V ,
let B = {u1 , . . . , un } be one of minimal size. Let C = {w1 , w2 , . . . } be any other basis. We would like to show that C has the same number of elements as B. By choice of B, C has at least as many elements.
The idea is to start with B, then use the exchange lemma to swap in n elements from C, one at a time, maintaining a basis at each step. To that end, let B0 = B and consider w1 ∈ C. By the exchange lemma, we get a new basis B1 by swapping w1 with some uℓ ∈ B0 . For ease of notation, let's suppose that ℓ = 1. Therefore, B1 = {w1 , u2 , . . . , un }.
Next, consider w2 ∈ C. Since B1 is a basis, we know w2 ∈ Span(B1 ), hence, we can write

    w2 = a1 w1 + a2 u2 + · · · + an un

for some ai ∈ F . Since w1 and w2 are linearly independent, at least one of a2 , . . . , an is nonzero. Without loss of generality, suppose a2 ≠ 0. Then by the exchange lemma, B2 := {w1 , w2 , u3 , . . . , un } is a basis. Continuing in this way, we eventually
reach the basis Bn = {w1 , . . . , wn }, which is a subset of C. In fact, we must have Bn = C. Otherwise, there is a wn+1 ∈ C. Since Bn is a basis, wn+1 ∈ Span(Bn ); in other words, wn+1 = d1 w1 + · · · + dn wn for some di ∈ F . But that can't happen since C is a basis: its elements are linearly independent. So, in fact, C also has n elements.
Corollary. Let V be a finite-dimensional vector space, and let S be a linearly
independent subset of V . Then S can be completed to form a basis of V ; in other
words, there is a basis for V that contains the elements of S.
Proof. We have seen that if V ≠ Span(S), then for any choice of v ∈ V \ Span(S),
the set S ∪ {v} is linearly independent. Using that result, we repeatedly add vectors
to S until we get a set that spans V . That must happen after a finite number of steps
by the Theorem.
Corollary. Let V be a finite-dimensional vector space, and suppose V = Span(S).
Then there is a subset T ⊆ S forming a basis for V .
Proof. Similar to that of the previous corollary. See our text.
Corollary. A collection of n vectors in an n-dimensional vector space is linearly
independent if and only if it spans V .
Proof. (=⇒) Suppose S is a set of n linearly independent vectors in an n-dimensional
vector space V . By the first corollary, above, S can be completed to a basis for V .
But any basis for V contains n elements. This means that S must already be a basis.
(⇐=) Now suppose that S spans V . By the second corollary, we can shrink S to a
basis for V . However, again, a basis for V must contain n elements. So S is already
a basis.
Thus, it follows that for a finite-dimensional vector space:
• A basis is a minimal (with respect to size or inclusion) spanning set.
• A basis is a maximal (with respect to size or inclusion) linearly independent set.
Examples.
1. Rn has basis {e1 , . . . , en } where ei is the i-th standard basis vector. Hence, dim Rn =
n.
2. The vectors (1, 0, 0), (1, 2, 0), (1, 2, 3) ∈ R3 are linearly independent since (1, 2, 0) ∉ Span({(1, 0, 0)}) and (1, 2, 3) ∉ Span({(1, 0, 0), (1, 2, 0)}). Since dim R3 = 3, these vectors form a basis.
3. Let R[x]≤2 be the vector space of polynomials of degree at most 2 in one variable with coefficients in R. This space has dimension 3 since it has a basis {1, x, x2 }. Since 1, 1 + 2x, 1 + 2x + 3x2 are linearly independent (see the previous example), they also form a basis: so every polynomial of degree at most 2 over R can be written as a linear combination of these polynomials.
Extra time activity for Friday afternoon. Let F = Z/3Z, and consider the
following twelve points in F 4 :
(0, 1, 0, 0) (0, 0, 1, 2) (1, 1, 1, 1)
(2, 2, 0, 1) (0, 2, 2, 2) (0, 2, 0, 1)
(1, 0, 0, 2) (0, 2, 1, 0) (2, 2, 2, 1)
(0, 1, 0, 0) (0, 0, 0, 1) (0, 0, 0, 1)
Goal: find subsets of size three of this array that sum to (0, 0, 0, 0).
All solutions:
• (0, 1, 0, 0), (0, 0, 1, 2), (0, 2, 2, 1)
• (0, 1, 0, 0), (2, 2, 0, 1), (1, 0, 0, 2)
• (1, 1, 1, 1), (2, 2, 2, 1), (0, 0, 0, 1)
• (0, 2, 2, 2), (0, 2, 0, 1), (0, 2, 1, 0)
Observations:
• Three vectors sum to zero if and only if in each component, the entries are either all
the same or all different. For example, in the solution (2, 0, 0, 1), (2, 0, 1, 2), (2, 0, 2, 0),
the entries in the first component are all 2, the entries in the second component
are all 0, the entries in the third and fourth components are 0, 1, 2—all different.
• If u, v, w is a solution, so that u + v + w = 0, consider the value of

    u + t(v − u)

as t varies among the elements of F . When t = 0, we get u. When t = 1, we get v, and when t = 2, we get

    u + 2(v − u) = −u + 2v = −u − v = w,

recalling that 2 = −1 in F = Z/3Z. We may think of t(v − u) as determining a line through the origin as t varies. So then u + t(v − u) is that line translated by the vector u.
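A brute-force search is a nice companion to this activity. The sketch below (illustrative only; it assumes the twelve points are typed in exactly as printed above, duplicates and all) checks every 3-element subset for a zero sum modulo 3. Because of the apparent typos in the printed array, its output may not match the four solutions listed above exactly.

    from itertools import combinations

    points = [
        (0, 1, 0, 0), (0, 0, 1, 2), (1, 1, 1, 1),
        (2, 2, 0, 1), (0, 2, 2, 2), (0, 2, 0, 1),
        (1, 0, 0, 2), (0, 2, 1, 0), (2, 2, 2, 1),
        (0, 1, 0, 0), (0, 0, 0, 1), (0, 0, 0, 1),
    ]

    def is_zero_sum(u, v, w):
        # Three vectors sum to zero in (Z/3Z)^4 iff every component sums to 0 mod 3.
        return all((a + b + c) % 3 == 0 for a, b, c in zip(u, v, w))

    for i, j, k in combinations(range(len(points)), 3):
        if is_zero_sum(points[i], points[j], points[k]):
            print(points[i], points[j], points[k])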
Week 4, Monday: Row rank = column rank; dimension of
solution space.
Leftovers. The following are corollaries, left over from last time, of the fact that all
bases of a finite-dimensional vector space have the same number of elements.
Corollary 1. Let V be a vector space of finite dimension n. Then no linearly
independent subset of V can have more than n elements.
Proof. This result is really a corollary of the proof of the theorem we proved last
time. Let C be any linearly independent subset, and let B be a basis. We saw that,
one step at a time, the elements of C could be swapped in for the elements of B,
without changing the fact that we had a basis. If C had more than n elements, then
after swapping in n elements, we’d have a new basis Bn = {c1 , . . . , cn } consisting
solely of elements of C. Any left-over element of C would be a linear combination of
{c1 , . . . , cn }, which is not possible since C is linearly independent.
Corollary 2. Let V be a finite-dimensional vector space, and let S be a linearly
independent subset of V . Then S can be completed to form a basis of V ; in other
words, there is a basis for V that contains the elements of S.
Proof. We have seen that if V ≠ Span(S), then for any choice of v ∈ V \ Span(S),
the set S ∪ {v} is linearly independent. Using that result, we repeatedly add vectors
to S until we get a set that spans V . That must happen after a finite number of steps
by Corollary 1.
Corollary 3. Let V be a finite-dimensional vector space, and suppose V = Span(S).
Then there is a subset T ⊆ S forming a basis for V .
Proof. If S = ∅ or S = {0}, then V = {0} and ∅ ⊆ S is a basis. Otherwise, let B = ∅, and start adding linearly independent elements to B from S: at each step, pick v ∈ S \ Span(B). This process must stop due to Corollary 1.
Corollary. A collection of n vectors in an n-dimensional vector space is linearly
independent if and only if it spans V .
Proof. (=⇒) Suppose S is a set of n linearly independent vectors in an n-dimensional vector space V . By Corollary 2, above, S can be completed to a basis for V . But any basis for V contains n elements. This means that S must already be a basis.
(⇐=) Now suppose that S spans V . By Corollary 3, we can shrink S to a basis for V . However, again, a basis for V must contain n elements. So S is already a basis.
Thus, it follows that for a finite-dimensional vector space:
• A basis is a minimal (with respect to size or inclusion) spanning set.
• A basis is a maximal (with respect to size or inclusion) linearly independent set.
Row rank and column rank.
Definition. Let A be an m × n matrix over F . The row space of A is the subspace
of F n spanned by its rows, and the column space of A is the subspace of F m spanned
by its columns. The row rank of A is the dimension of its row space, and the column
rank of A is the dimension of its column space.
Since row operations are reversible, any matrix obtained from a matrix A by performing row operations has the same row space. In particular, the row space of A
is the same as the row space of its reduced echelon form. From the structure of the
reduced echelon form, it’s clear that its nonzero rows form a basis for its row space.
This gives an algorithm for computing the row space of a matrix.
Algorithm for computing a basis for the row space and the row rank.
Given an m × n matrix A, compute its reduced echelon form E. Then the nonzero rows of E are a basis for the row space of A. The number of nonzero rows in E is the row rank of A.
Example. Let

    A = [ 1 2 0 4 ]
        [ 3 3 1 0 ]
        [ 7 8 2 4 ].

To compute a basis for the row space of A, compute its reduced echelon form:

    A = [ 1 2 0 4 ]          [ 1 0  2/3 −4 ]
        [ 3 3 1 0 ]  −→  E = [ 0 1 −1/3  4 ]
        [ 7 8 2 4 ]          [ 0 0   0   0 ].

So a basis for the row space of A is:

    (1, 0, 2/3, −4), (0, 1, −1/3, 4).
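This is exactly what a computer algebra system's reduced-row-echelon-form routine computes. A short sympy sketch (illustrative only, not part of the original notes) for the matrix A above:

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 4],
                [3, 3, 1, 0],
                [7, 8, 2, 4]])

    E, pivots = A.rref()    # reduced row echelon form and the pivot column indices
    print(E)                # rows (1, 0, 2/3, -4), (0, 1, -1/3, 4), (0, 0, 0, 0)
    print(A.rank())         # row rank: 2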
Lemma. Row operations do not change the column rank.
Proof. Suppose A is an m × n matrix, and we have a relation among its columns:

         [ a11 ]            [ a1n ]
         [ a21 ]            [ a2n ]
    c1   [  ⋮  ]  + · · · + cn   [  ⋮  ]  =  0.
         [ am1 ]            [ amn ]

Adding up the left-hand side, we see the relation is equivalent to a solution (c1 , . . . , cn ) to the linear system

    c1 a11 + · · · + cn a1n = 0
              ⋮
    c1 am1 + · · · + cn amn = 0.

Row operations do not change the solutions to this system of equations, hence they do not change relations among the columns.
As part of the above proof, we see that relations among the columns of a matrix
correspond to relations among the columns of the reduced echelon form of the matrix.
It's clear from the structure of the reduced echelon form that the basic columns—the ones with the pivot terms, i.e., those corresponding to the non-free variables—form a basis for its column space. That means that these same columns of the original matrix form a basis for the column space of the original matrix.
Algorithm for computing a basis for the column space and the column rank. Given a matrix A, compute its reduced echelon form E. Say that columns j1 , . . . , jk are the basic columns of E (those corresponding to the non-free variables—the ones that have a single nonzero entry, and that entry is equal to 1). Then columns j1 , . . . , jk of A are a basis for the column space of A. The column rank of A is k, the number of basic columns of its reduced echelon form.
WARNING: Be sure to take columns j1 , . . . , jk of A, not of E. (Compare with the
algorithm for computing a basis for the row space, where life is much easier.)
Example. In the previous example, we computed the reduced echelon form of a matrix:

    A = [ 1 2 0 4 ]          [ 1 0  2/3 −4 ]
        [ 3 3 1 0 ]  −→  E = [ 0 1 −1/3  4 ]
        [ 7 8 2 4 ]          [ 0 0   0   0 ].

The first two columns of E are its basic columns. Therefore, the first two columns of A form a basis for its column space:

    [ 1 ]  [ 2 ]
    [ 3 ], [ 3 ].
    [ 7 ]  [ 8 ]
NOTE: The first two columns of E in this case are the first two standard basis vectors,
which clearly don’t give the same span as the above two vectors.
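The column-space algorithm is just as easy to automate, as long as we remember to take the pivot columns of A itself rather than of E. A sympy sketch (for illustration):

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 4],
                [3, 3, 1, 0],
                [7, 8, 2, 4]])

    _, pivots = A.rref()               # pivots == (0, 1): the first two columns are basic
    basis = [A.col(j) for j in pivots] # columns of A, not of its echelon form
    print(pivots)                      # (0, 1)
    print(basis)                       # [Matrix([[1], [3], [7]]), Matrix([[2], [3], [8]])]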
We get the following, rather surprising result:
Theorem. The row rank of a matrix A is equal to its column rank.
Proof. Let E be the reduced echelon form of A. Then the number of its nonzero
rows is equal to the number of its basic columns.
Definition. The rank of a matrix A, denoted rank(A), is the dimension of its row space (equivalently, of its column space).
Suppose we have a homogeneous system of linear equations

    a11 x1 + · · · + a1n xn = 0
              ⋮
    am1 x1 + · · · + amn xn = 0.

Let A = (aij ) be the matrix of coefficients. To solve the system, we compute the reduced echelon form of the matrix A. The number of free parameters for the solution space is then the number of non-basic columns, i.e., n − rank(A). There is a unique solution, namely 0, exactly when the reduced echelon form has 1s along its diagonal and 0s elsewhere, i.e., exactly when there are no non-basic columns. Hence, there is only the trivial solution if and only if rank(A) = n.
For a non-homogeneous system

    a11 x1 + · · · + a1n xn = b1
              ⋮
    am1 x1 + · · · + amn xn = bm ,

we would compute the reduced echelon form of the augmented matrix [A|b] where b is the column with entries b1 , . . . , bm . If the system is consistent, we have seen that the set of solutions consists of any particular solution plus any vector in the span of n − rank(A) vectors that are solutions to the corresponding homogeneous system. So if the system is consistent, there is a unique solution if and only if rank(A) = n.
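Here is a small sympy check of the counting statement above, using the matrix from the earlier example as hypothetical data: the number of free variables of the homogeneous system equals n − rank(A), which is also the dimension of the solution space.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 4],
                [3, 3, 1, 0],
                [7, 8, 2, 4]])

    n = A.cols                    # number of unknowns
    print(n - A.rank())           # number of free variables: 4 - 2 = 2
    print(len(A.nullspace()))     # dimension of the solution space: also 2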
Week 4, Wednesday: Linear transformations.
We’ll start off today by finishing the stuff we didn’t get to last time:
1. Relations among columns in a matrix are not changed by row operations.
2. Row rank = column rank
3. The number of free variables for an m × n linear system is n − rank(A) where A
is the m × n matrix of coefficients for the system.
Linear transformations. We have now defined the objects of study—vector spaces.
Next, we need to consider the appropriate mappings between those objects—those
that preserve the linear structure.
Definition. Let V and W be vector spaces over a field F . A linear transformation
from V to W is a function
f: V →W
satisfying, for all v, v 0 ∈ V and λ ∈ F ,
f (v + v 0 ) = f (v) + f (v 0 ) and f (λv) = λf (v).
Remarks. Using the notation from the definition:
• If f (v + v′) = f (v) + f (v′), we say f preserves addition. Note that the addition on the left side is in V and the addition on the right side is in W . Thus, if V ≠ W , they are two different operations (with the same name). Similarly, if f (λv) = λf (v), we say f preserves scalar multiplication.
• One may combine the two conditions, above, for linearity into one: for f to be linear, we require

    f (v + λv′) = f (v) + λf (v′)

for all v, v′ ∈ V and λ ∈ F .
• Synonyms for “linear transformation” are: “linear mapping” and “linear homomorphism”, often with the word “linear” dropped when clear from context (and it will
be since this is a course in linear algebra!).
• Our book restricts “linear transformation” to mean a linear transformation of the
form f : V → V , where the domain and codomain are equal. That is non-standard,
and we won’t use that terminology. Linear mappings from a vector space to itself
are called linear endomorphisms or linear self-mappings.
Template for a proof that a mapping is linear. Consider the function
    f : R3 → R2
    (x, y, z) ↦ (2x + 3y, x + y − 3z).

Claim: f is linear.

Proof. Let (x, y, z), (x′, y′, z′) ∈ R3 and λ ∈ R.

    f ((x, y, z) + (x′, y′, z′)) = f (x + x′, y + y′, z + z′)
        = (2(x + x′) + 3(y + y′), (x + x′) + (y + y′) − 3(z + z′))
        = ((2x + 3y) + (2x′ + 3y′), (x + y − 3z) + (x′ + y′ − 3z′))
        = (2x + 3y, x + y − 3z) + (2x′ + 3y′, x′ + y′ − 3z′)
        = f (x, y, z) + f (x′, y′, z′).

Thus, f preserves addition. Next,

    f (λ(x, y, z)) = f (λx, λy, λz)
        = (2(λx) + 3(λy), λx + λy − 3(λz))
        = (λ(2x + 3y), λ(x + y − 3z))
        = λ(2x + 3y, x + y − 3z)
        = λf (x, y, z).

Thus, f preserves scalar multiplication.
Note: People sometimes confuse proofs that subsets are subspaces with proofs that
mappings are linear. To prove that W ⊆ V is a subspace, we show that W is
closed under addition and scalar multiplication by taking u, v ∈ W and λ ∈ F and
showing u+λv ∈ W . To prove f : V → W is linear, we show that f preserves addition
and scalar multiplication. Be careful not to confuse the words “closed under” with
“preserves”.
Exercise. Show that f : R → R defined by f (x) = x2 is not linear.
Proof. We have f (1 + 1) = f (2) = 4 ≠ f (1) + f (1) = 1 + 1 = 2.
The following proposition is often useful for showing a function is not linear.
Proposition 1. If f : V → W is linear, then f (0V ) = 0W .
Proof. Since f is linear,

    f (0V ) = f (0 · 0V ) = 0 · f (0V ) = 0W .
Thus, for instance,

    f : R2 → R
    (x, y) ↦ x + 2y + 5

is not linear since f (0, 0) = 5 ≠ 0.
Proposition 2. (A linear mapping is determined by its action on a basis.) Let V
and W be vector spaces over F , and let B be a basis for V . For each b ∈ B, let wb ∈ W .
Then there exists a unique linear function f : V → W such that f (b) = wb .
Proof. We define f as follows: Given v ∈ V , since B is a basis, we can write v = a1 b1 + · · · + ak bk for some ai ∈ F , bi ∈ B, and k ∈ Z≥0 . Define

    f (v) := a1 wb1 + · · · + ak wbk .

Since B is a basis, the expression of v as a linear combination of elements in B is unique. Hence, f is well-defined. Further, linearity of f forces us to define f (v) as we have.
Terminology. We say the function f as in Proposition 2 has been defined on B and then extended linearly to all of V .
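In coordinates, "define on a basis and extend linearly" is concrete: the images of the basis vectors become the columns of a matrix. A small numpy sketch (the images wb below are made up just for illustration):

    import numpy as np

    # Say V = R^3 with the standard basis e1, e2, e3, and we declare
    # f(e1) = (1, 1), f(e2) = (0, 2), f(e3) = (-1, 0) in W = R^2.
    images = [np.array([1.0, 1.0]), np.array([0.0, 2.0]), np.array([-1.0, 0.0])]
    M = np.column_stack(images)   # columns are the images of the basis vectors

    def f(v):
        # Extend linearly: v = a1*e1 + a2*e2 + a3*e3 maps to a1*f(e1) + a2*f(e2) + a3*f(e3).
        return M @ v

    print(f(np.array([2.0, -1.0, 3.0])))   # 2*f(e1) - f(e2) + 3*f(e3) = [-1.  0.]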
Definition. Let V and W be vector spaces over F . The collection of all linear
functions from V to W is denoted Hom(V, W ) or L(V, W ). It is a vector space over F
under addition and scalar multiplication of functions: for linear f, g : V → W ,
f + λg : V → W
v ↦ f (v) + λg(v).
Week 5, Monday: Isomorphisms.
Let f : V → W be a linear mapping between vector spaces V and W over a field F .
Recall the definitions from last time:
Definition. The null space or kernel of f is
N (f ) = ker(f ) = {v ∈ V : f (v) = 0W } = f −1 (0W ).
The range or image of f is
R(f ) = im(f ) = {f (v) ∈ W : v ∈ V } = f (V ).
We proved that ker(f ) is a subspace of V and im(f ) is a subspace of W . Also,
if dim V < ∞, we have the important rank-nullity theorem:
dim ker(f ) + dim im(f ) = dim V.
Defining nullity(f ) = dim ker(f ) and rank(f ) = dim im(f ), we can write the rank-nullity theorem like this:
nullity(f ) + rank(f ) = dim V.
(Don’t confuse this concept with the mullity of f , defined as follows: mullity(f ) =
p(f ) + b(f ) where p(f ) is the amount of party of f in the back and b(f ) is the amount
of business of f in the front.)
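Here is a quick numerical illustration of rank-nullity (the matrix below is hypothetical, chosen only for the check): for the linear map x ↦ Ax from F 3 to F 3 , the dimensions of the kernel and the image add up to 3.

    from sympy import Matrix

    A = Matrix([[1, 0, 1],
                [0, 1, 1],
                [1, 1, 2]])           # third row = first row + second row

    rank = A.rank()                   # dimension of the image
    nullity = len(A.nullspace())      # dimension of the kernel
    print(rank, nullity, rank + nullity == A.cols)   # 2 1 True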
Proposition 1. The linear mapping f : V → W is injective (i.e., one-to-one) if and
only if ker(f ) = {0V }.
Proof. First suppose that f is injective, and let v ∈ ker(f ). Since f is linear,
f (0V ) = 0W . Since f (v) = f (0V ) = 0W and f is injective, v = 0V . Therefore,
ker(f ) = {0V }. Next suppose that ker(f ) = {0V }, and let u, v ∈ V with f (u) = f (v).
It follows that f (u − v) = f (u) − f (v) = 0W . Hence, u − v ∈ ker(f ) = {0V }.
So u − v = 0V , which means u = v. Therefore, f is injective.
Proposition 2. Let S ⊆ V .
1. If S is linearly dependent, then f (S) := {f (s) : s ∈ S} ⊆ W is linearly dependent.
2. If f is injective and S is linearly independent, then f (S) ⊆ W is linearly independent.
Proof. Suppose that a1 s1 + · · · + ak sk = 0V for some ai ∈ F and si ∈ S. Since f is linear, we have

    0W = f (0V ) = f (a1 s1 + · · · + ak sk ) = a1 f (s1 ) + · · · + ak f (sk ).

Thus, f preserves linear dependencies, as claimed in part 1.

Suppose now that f is injective and S is linearly independent. If a1 f (s1 ) + · · · + ak f (sk ) = 0W for some ai ∈ F and si ∈ S, then since f is linear,

    0W = a1 f (s1 ) + · · · + ak f (sk ) = f (a1 s1 + · · · + ak sk ).

Therefore, a1 s1 + · · · + ak sk is in the kernel of f . Since f is injective, ker(f ) = {0V }. It follows that a1 s1 + · · · + ak sk = 0V . Then, since S is linearly independent, it follows that ai = 0 for all i. This shows that f (S) is linearly independent.
Definition. The linear function f : V → W is an isomorphism if there exists a linear
function g : W → V such that g ◦ f = idV and f ◦ g = idW . The function g is called
the inverse of f .
Remark. Suppose that f : V → W is an isomorphism. Then, just as proved in
math 112 for mappings of sets, it follows that f is bijective, i.e. both injective and
surjective. (Prove this!) For mappings of sets, being bijective is equivalent to having
an inverse. It turns out the same is true for vector spaces: if f : V → W is a bijective
linear function, then it has an inverse mapping g : W → V as a mapping of sets, and
this mapping g is automatically linear. (Check this for yourself.)
Example. The space of 2 × 2 matrices over F is isomorphic to F 4 . One isomorphism is given by

    [ a b ]
    [ c d ]  ↦  (a, b, c, d).
Proposition 3. A linear mapping f : V → W is an isomorphism if and only if ker(f )
is trivial and im(f ) = W .
Proof. The condition that ker(f ) is trivial and im(f ) = W is equivalent to the
bijectivity of f .
Theorem 4. Let dim V = n < ∞. Then V is isomorphic to F n .
Proof. Choose a basis b1 , . . . , bn for V , and let e1 , . . . , en be the standard basis vectors for F n . Define f : V → F n by f (bi ) = ei for i = 1, . . . , n and extending linearly. Recall what this means: given v ∈ V , there are unique ai such that v = a1 b1 + · · · + an bn . Then by definition,

    f (v) = a1 f (b1 ) + · · · + an f (bn ) = a1 e1 + · · · + an en = (a1 , . . . , an ) ∈ F n .
Earlier, we called (a1 , . . . , an ) the coordinates of v with respect to the ordered basis ⟨b1 , . . . , bn ⟩.
Corollary 5. Let V and W be finite-dimensional vector spaces. Then V and W are isomorphic if and only if they have the same dimension.
Proof. First, suppose that f : V → W is an isomorphism, and let b1 , . . . , bn be a
basis for V . By Proposition 2, f (b1 ), . . . , f (bn ) are linearly independent, and since f
is surjective, they span W . Thus, the number of elements in a basis for V is the same
as the number of elements in a basis for W .
Conversely, suppose that dim V = dim W = n. By Theorem 4, we have isomorphisms fV : V → F n and fW : W → F n . Let fW−1 : F n → W be the inverse of fW . It follows that the composition

    fW−1 ∘ fV : V → F n → W

is an isomorphism.
Note: The above reasoning says that for each n = 0, 1, 2, . . . , there is essentially
exactly one vector space over F of dimension n. If dim V = n < ∞, a choice of an
isomorphism V → F n is equivalent to choosing a basis for V .
Week 5, Wednesday: Lines, planes, and hyperplanes: equations and parametrizations.
The word “affine” denotes “linear plus a translation”. We make this idea precise in
the following definition.
Definition. Let V and W be vector spaces over F . An affine subspace of V is a subset of the form

    A = p + U := {p + u : u ∈ U }

where p ∈ V and U is a linear subspace of V . The dimension of A is the dimension of its linear part: dim A := dim U . If dim A = k, we call A a k-plane in V . A 1-plane is a line, and a 2-plane is simply called a plane. A (dim V − 1)-plane is a hyperplane.

A function f : V → W is affine if there exists a linear function ℓ : V → W and a point q ∈ W such that

    f (v) = q + ℓ(v)

for all v ∈ V .
It is not hard to show that the image of an affine function is an affine subspace. We say that an affine subspace A is parametrized by an affine mapping ℓ : V → W if the image of ℓ is A. Often, people will in addition require that ℓ is injective before they will call it a parametrization.
Example 1. A line in R2 is an affine subspace. For example, consider the line L pictured below passing through the points (0, 2) and (3, 0):

[Figure: a line L in R2 , passing through (0, 2) and (3, 0).]
Exercise.
1. Write this line as the solution set to a system of linear equations.
2. Write this line as p + Span S for some p ∈ R2 and some linearly independent
set S ⊂ R2 . Here we will explicitly see that L is an affine subset of R2 .
3. Use part 2 to construct a parametrization of L.
solution:
1. The equation will have the form ax + by = c for some a, b, c. Plug in the points (0, 2) and (3, 0) to get a system of equations our a, b, c must satisfy: 2b = c and 3a = c. To get a particular equation, take c = 1. We then have an equation for the line:

    (1/3)x + (1/2)y = 1.

(We could clear denominators to get another possible equation: 2x + 3y = 6, or we could write the equation in the form y = 2 − (2/3)x, where the slope is −2/3 and the y-intercept is 2.)
2. Since (0, 2) and (3, 0) are on the line, the vector (3, 0) − (0, 2) = (3, −2) is a vector that is parallel to the line. Define U := Span {(3, −2)}. Next, pick any point on the line, for example, (0, 2). We can thus express L as an affine space as follows:

    L = p + U = (0, 2) + Span {(3, −2)} .
3. The previous part of this problem shows how to parametrize the line:

    ℓ : R → R2
    t ↦ (0, 2) + t((3, 0) − (0, 2)) = (0, 2) + t(3, −2) = (3t, 2 − 2t).

We've written ℓ in three reasonable ways. Note that ℓ(0) = (0, 2) and ℓ(1) = (3, 0).
Example. Let L be the line in R3 passing through the points (1, 2, 3) and (−2, 4, 0).
Exercise.
1. Write this line as the solution set to a system of linear equations.
2. Write this line as p + Span S for some p ∈ R3 and some linearly independent
set S ⊂ R3 . Here we will explicitly see that L is an affine subset of R3 .
3. Use part 2 to construct a parametrization of L.
solution:
1. The solution set of a single non-trivial linear equation in R3 will be a plane. A line will be the intersection of two planes, which means that it will satisfy two equations. As the geometry makes clear, there are infinitely many ways to choose these two planes. At any rate, a linear equation will have the form ax + by + cz = d, and forcing the plane defined by this equation to contain the points (1, 2, 3) and (−2, 4, 0) amounts to a system of equations:

    a + 2b + 3c = d
    −2a + 4b = d.

We'll write this as an equivalent homogeneous system:

    a + 2b + 3c − d = 0
    −2a + 4b − d = 0.
Apply the algorithm for solving systems of linear equations. (We will drop the last column of the augmented matrix since it would be a column of zeros.)

    [  1 2 3 −1 ]       [ 1 0 3/2 −1/4 ]
    [ −2 4 0 −1 ]  −→  [ 0 1 3/4 −3/8 ].

Therefore,

    a = −(3/2)c + (1/4)d
    b = −(3/4)c + (3/8)d.

So we have two free variables: c and d. A particular solution is clearly (0, 0, 0, 0), which indicates that the set of solutions is in fact a linear subspace. To find a basis for the space, first set (c, d) = (1, 0), then set (c, d) = (0, 1) to get

    (−3/2, −3/4, 1, 0),  (1/4, 3/8, 0, 1).

We can scale these vectors to get a basis without fractions:

    (−6, −3, 4, 0),  (2, 3, 0, 8).

These give the equations of two planes whose intersection is our line:

    −6x − 3y + 4z = 0
    2x + 3y = 8.
2. Since (1, 2, 3) and (−2, 4, 0) are points on the line, L, we have
L = (−2, 4, 0) + Span {(1, 2, 3) − (−2, 4, 0)}
= (−2, 4, 0) + Span {(3, −2, 3)} .
3. We get the parametrization:
    ℓ : R → R3
    t ↦ (−2, 4, 0) + t((1, 2, 3) − (−2, 4, 0))
      = (−2, 4, 0) + t(3, −2, 3)
      = (−2 + 3t, 4 − 2t, 3t).
Example. Let P be the plane in R3 passing through the points (0, 2, −1), (4, 2, 1),
and (1, 0, 1).
Exercise.
1. Write this plane as the solution set to a system of linear equations.
2. Write this plane as p + Span S for some p ∈ R3 and some linearly independent
set S ⊂ R3 . Here we will explicitly see that P is an affine subset of R3 .
3. Use part 2 to construct a parametrization of P .
solution:
1. A plane in R3 is the solution set to a single linear equation,
ax + by + cz = d.
We need that equation to be satisfied by the points (0, 2, −1), (4, 2, 1), and (1, 0, 1),
which leads to a system of equations:
2b − c = d
4a + 2b + c = d
a + c = d.
As a homogeneous system:
2b − c − d = 0
4a + 2b + c − d = 0
a + c − d = 0.
Apply the algorithm:

    [ 0 2 −1 −1 ]       [ 1 0 0    1 ]
    [ 4 2  1 −1 ]  −→  [ 0 1 0 −3/2 ]
    [ 1 0  1 −1 ]       [ 0 0 1   −2 ].

Therefore,

    a = −d
    b = (3/2)d
    c = 2d.

Setting d = 2 gives the solution (−2, 3, 4, 2), i.e., the equation

    −2x + 3y + 4z = 2.
(And it’s easy to check that our three points satisfy this equation.)
2. Since our plane is the set of solutions to the equation −2x + 3y + 4z = 2, we
need a particular solution to this equation and a basis for the solution space to the
homogeneous equation −2x + 3y + 4z = 0. By inspection, one particular solution
is p = (−1, 0, 0). To find a basis for the solution space to
−2x + 3y + 4z = 0,
we just need to pick two linearly independent solutions. For example, by inspection, (2, 0, 1) and (3, 2, 0). Therefore, the plane can be described as
P = (−1, 0, 0) + Span {(2, 0, 1), (3, 2, 0)} .
3. From this last description of P , we get the parametrization
    ℓ : R2 → R3
    (x, y) ↦ (−1, 0, 0) + x(2, 0, 1) + y(3, 2, 0) = (−1 + 2x + 3y, 2y, x).
Parametrizations of lines and planes, in general. Given two points p, q ∈ Rn , for any n, you can parametrize the line through these points with the function

    ℓ(t) = q + t(p − q).

With this parametrization, note that ℓ(0) = q and ℓ(1) = p.

Similarly, given three points p, q, r ∈ R3 , not all in a line, you can parametrize the plane containing these points as

    ℓ(s, t) = r + s(p − r) + t(q − r).
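These formulas are easy to check numerically. The sketch below (points taken from the plane example above, with made-up helper names) parametrizes the plane through p, q, r and also recovers an equation for it via a cross product, which is special to R3.

    import numpy as np

    p, q, r = np.array([0., 2., -1.]), np.array([4., 2., 1.]), np.array([1., 0., 1.])

    def ell(s, t):
        # Parametrization of the plane through p, q, r (not all on a line).
        return r + s * (p - r) + t * (q - r)

    # In R^3, a normal vector to the plane is the cross product of two direction vectors.
    normal = np.cross(p - r, q - r)
    d = normal @ r
    print(normal, d)                                # a scalar multiple of (-2, 3, 4) and of 2
    print(np.isclose(normal @ ell(0.3, -1.7), d))   # every point of the plane satisfies it: True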
Supplemental reading on hyperplanes
Let V be an n-dimensional vector space over F . Above, we defined a hyperplane H in V to be an (n − 1)-plane. (We could also say that H is an affine subspace of codimension 1.) Picking a basis for V , we can identify V with the vector space F n by taking coordinates.
Proposition. An affine subspace H ⊂ F n is a hyperplane if and only if it is the set of solutions to a single non-trivial equation

    a1 x1 + · · · + an xn = d,

i.e., if and only if

    H = {(x1 , . . . , xn ) ∈ F n : a1 x1 + · · · + an xn = d}

for some ai , d ∈ F with the ai not all zero.
Proof. (⇒) Suppose that H is a hyperplane in F n . Then

    H = p + U

for some p ∈ F n and some (n − 1)-dimensional linear subspace U . Pick a basis u1 , . . . , un−1 for U where ui = (ui,1 , . . . , ui,n ) ∈ F n for all i. Consider the system of equations:

    u1,1 x1 + · · · + u1,n xn = 0
    u2,1 x1 + · · · + u2,n xn = 0
              ⋮                                        (12.1)
    un−1,1 x1 + · · · + un−1,n xn = 0.

To solve this system, we would row reduce the matrix M with i-th row equal to ui . Since the ui are linearly independent, M has rank n − 1. Since M has n columns, this means the reduced echelon form of M would have n − 1 pivot columns and one column corresponding to a free variable. This means the solution space, which must be linear, is spanned by a single vector, say, (a1 , . . . , an ) ≠ 0. It follows that the equation

    a1 x1 + · · · + an xn = 0

is satisfied by each ui (since ui · (a1 , . . . , an ) = 0 for each i). Thus, the solution space of this equation contains U and so has dimension at least n − 1. It can't have dimension n, since this would mean that every vector in F n would satisfy the equation. That can't be since not all of the ai are 0.
Now let h = (h1 , . . . , hn ) ∈ H, and define

    d := a1 h1 + · · · + an hn .

We claim that H is the solution set to

    a1 x1 + · · · + an xn = d.                             (12.2)

To see this, first suppose that q ∈ H. We want to show that q satisfies equation (12.2). Now, q ∈ H means that q = p + u for some u ∈ U . For the h ∈ H we picked to define d, we can write h = p + u′ for some u′ ∈ U . It follows that q − h = u − u′ ∈ U . Hence, q − h satisfies the homogeneous equation a1 x1 + · · · + an xn = 0. From that, it follows that q satisfies equation (12.2). Conversely, suppose that q satisfies equation (12.2). We would like to show that q ∈ H. Since q and h both satisfy equation (12.2), it follows that q − h satisfies the homogeneous equation. Since U is an (n − 1)-dimensional subspace of the solution space of the homogeneous equation, which also has dimension n − 1, that solution space equals U . That means that q − h = u for some u ∈ U . Since h ∈ H, we have h = p + u′ for some u′ ∈ U . Therefore, q = h + u = p + (u + u′). So q ∈ H = p + U .
(⇐) Now suppose H ⊂ F n is the solution set to the equation

    a1 x1 + · · · + an xn = d                              (12.3)

for some ai ∈ F , not all 0. We would like to show that H is an affine subspace of dimension n − 1 by writing H = p + U where p ∈ F n and U ⊂ F n is a subspace of dimension n − 1. We know that some ai is nonzero. For ease of notation, let's suppose that a1 ≠ 0. Then the following set of vectors is a basis for the solution space of the homogeneous equation a1 x1 + · · · + an xn = 0:

    v1 = (−an , 0, 0, . . . , 0, 0, a1 )
    v2 = (−an−1 , 0, 0, . . . , 0, a1 , 0)
    v3 = (−an−2 , 0, 0, . . . , a1 , 0, 0)
              ⋮
    vn−1 = (−a2 , a1 , 0, . . . , 0, 0, 0).

These vectors are obviously solutions, and they are independent. There are n − 1 of them. There can't be solutions not in the span of these, since then everything would be a solution, but, for instance, (1, 0, . . . , 0) is not. The following is a particular solution:

    p = (d/a1 , 0, . . . , 0) ∈ F n .

It follows that

    H = p + Span {v1 , . . . , vn−1 } .

The following function parametrizes H:

    ℓ : F n−1 → F n
    (λ1 , . . . , λn−1 ) ↦ p + λ1 v1 + · · · + λn−1 vn−1 .
Example. Give a parametrization of the hyperplane in R3 defined by
2x + 3y + 5z = 6.
(Aside: A plane is, by definition, a 2-dimensional affine subspace. So in the case
of R3 , a plane is the same thing as a hyperplane.)
First find a basis for the solution space of the corresponding homogeneous equation 2x + 3y + 5z = 0 using the strategy outlined in the proof of the Proposition:

    (−5, 0, 2), (−3, 2, 0).

Then find a particular solution. One possibility is (3, 0, 0). (There are lots of other possibilities, of course.) Then we can parametrize the hyperplane by

    ℓ : R2 → R3
    (x, y) ↦ (3, 0, 0) + x(−5, 0, 2) + y(−3, 2, 0) = (3 − 5x − 3y, 2y, 2x).
Example. Give a parametrization of the hyperplane in R4 defined by
x + 4y + 2z + 8w = 2.
First find a basis for the solution space of the corresponding homogeneous equation x + 4y + 2z + 8w = 0 using the strategy given in the proof of the Proposition:

    (−8, 0, 0, 1), (−2, 0, 1, 0), (−4, 1, 0, 0).

Then find a particular solution. One possibility is (2, 0, 0, 0). (There are lots of other possibilities, of course.) Then we can parametrize the hyperplane by

    ℓ : R3 → R4
    (a, b, c) ↦ (2, 0, 0, 0) + a(−8, 0, 0, 1) + b(−2, 0, 1, 0) + c(−4, 1, 0, 0).
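The recipe used in these examples, a particular solution plus a basis of the homogeneous solution space, can be mirrored with a null-space computation. A sympy sketch for the hyperplane x + 4y + 2z + 8w = 2 (illustrative only; the basis it produces may differ from the one chosen by hand above):

    from sympy import Matrix, Rational

    a = Matrix([[1, 4, 2, 8]])             # coefficient row of x + 4y + 2z + 8w = 2
    d = 2

    basis = a.nullspace()                  # three independent solutions of a . x = 0
    particular = Matrix([Rational(d, a[0, 0]), 0, 0, 0])   # (d/a1, 0, 0, 0)

    print([v.T for v in basis])
    print(particular.T)                    # Matrix([[2, 0, 0, 0]])
    # Any point p + t1*v1 + t2*v2 + t3*v3 satisfies the equation:
    p = particular + 3 * basis[0] - basis[1] + 5 * basis[2]
    print((a * p)[0, 0] == d)              # True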
Week 5, Friday: Matrices.
Matrices
An m×n matrix is a rectangular box with m rows and n columns. If A is a matrix, the
entry in its i-th row and j-th column is denoted Aij . Sometimes we write A = (aij ),
in which case, Aij = aij . We also sometimes write Ai,j , as in A12,3 , if necessary.
Example. A 2 × 3 matrix:

    A = [ 1 2  6 ]
        [ 7 0 −1 ].

We have A23 = −1, for instance.
Linear structure. Let Mm×n (F ) denote the collection of m × n matrices with
entries in a field F . For A, B ∈ Mm×n and λ ∈ F , define A + B by
(A + B)ij := Aij + Bij ,
and define λA by
(λA)ij = λAij .
Example.

    [ 1 3 ]       [ −1 2 ]   [ −1  7 ]
    [ 2 0 ] + 2 [  2 8 ] = [  6 16 ]
    [ 1 5 ]       [ −1 9 ]   [ −1 23 ].

It is an easy exercise to show that this makes Mm×n into a vector space over F .
The additive identity is, of course, the zero matrix, i.e., the matrix whose entries are
all 0 ∈ F . To prove another vector space axiom, let A, B ∈ Mm×n , and let s ∈ F . The following verifies the vector space axiom s(A + B) = sA + sB:
(s(A + B))ij = s(A + B)ij
= s(Aij + Bij )
= sAij + sBij
= (sA)ij + (sB)ij .
Some remarks about this proof: (i) We prove two matrices are equal by showing that each of their entries is equal. (ii) We are using the definitions of addition and scalar multiplication, up until the point where we claim that s(Aij + Bij ) = sAij + sBij . That claim follows because s, Aij , and Bij are all elements of F , which is a field.
Definition. Let A be an m × p matrix, and let B be a p × n matrix. Then their product is the m × n matrix AB defined by

    (AB)ij = Ai1 B1j + Ai2 B2j + · · · + Aip Bpj .
Example.

    [ 1 0 2 ] [ 1  0 ]   [ 11  6 ]
    [ 3 1 4 ] [ 3 −1 ] = [ 26 11 ].
              [ 5  3 ]

Here is how I am thinking while calculating (AB)1,2 : to get the 1–2 entry of the product, take the "dot product" of the first row of A and the second column of B:

    (1, 0, 2) · (0, −1, 3) = 1 · 0 + 0 · (−1) + 2 · 3 = 6.
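The definition translates directly into code. A minimal Python sketch (for illustration; in practice one would use a library routine such as numpy's @ operator):

    def mat_mult(A, B):
        """Multiply an m x p matrix A by a p x n matrix B using (AB)ij = sum_k Aik * Bkj."""
        m, p, n = len(A), len(B), len(B[0])
        assert all(len(row) == p for row in A), "inner dimensions must agree"
        return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
                for i in range(m)]

    A = [[1, 0, 2],
         [3, 1, 4]]
    B = [[1, 0],
         [3, -1],
         [5, 3]]
    print(mat_mult(A, B))   # [[11, 6], [26, 11]]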
Proposition. Let A be an m × n matrix, B an n × r matrix, both over a field F ,
and λ ∈ F .
1. λ(AB) = (λA)B = A(λB).
2. A(BC) = (AB)C for all r × s matrices C.
3. A(B + C) = AB + AC for all n × r matrices C.
4. (C + D)A = CA + DA for all r × m matrices C and D.
Proof. We'll just prove part 2, associativity of multiplication. So let C be an r × s matrix. We have

    (A(BC))ij = Σ_{k=1}^{n} Aik (BC)kj
              = Σ_{k=1}^{n} Aik ( Σ_{ℓ=1}^{r} Bkℓ Cℓj )
              = Σ_{k=1}^{n} Σ_{ℓ=1}^{r} Aik (Bkℓ Cℓj )
              = Σ_{ℓ=1}^{r} Σ_{k=1}^{n} Aik (Bkℓ Cℓj )
              = Σ_{ℓ=1}^{r} Σ_{k=1}^{n} (Aik Bkℓ )Cℓj
              = Σ_{ℓ=1}^{r} ( Σ_{k=1}^{n} Aik Bkℓ ) Cℓj
              = Σ_{ℓ=1}^{r} (AB)iℓ Cℓj
              = ((AB)C)ij .
Warning. Matrix multiplication is not commutative, in general. First of all, if the
dimensions aren’t right, multiplication for both AB and BA might not make sense.
For instance, if

    A = [ 1  0 ]   and   B = [ 1 0 2 ]
        [ 3 −1 ]             [ 3 1 4 ],

then AB is defined, but not BA.
However, even if AB and BA are both defined, it is usually not the case that AB =
BA. Try just about any example with 2 × 2 matrices to see this.
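For instance, here is a two-line numpy check with a generic pair of 2 × 2 matrices (any typical example works):

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])
    print(A @ B)   # [[2 1], [4 3]]
    print(B @ A)   # [[3 4], [1 2]]  -- different, so AB != BA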
Week 6, Monday: The ring of square matrices. Matrix inversion.
Last time, we defined matrix multiplication: if A is an m × p matrix and B is a p × n
matrix, then AB is the m × n matrix with i, j-entry
    (AB)ij := Ai1 B1j + Ai2 B2j + · · · + Aip Bpj .
If m = n, then BA would also be defined, but it is usually the case that AB ≠ BA. Another peculiar thing is that for matrices, there are "zero divisors", i.e., matrices A, B such that AB = 0, but neither A nor B is a zero matrix. For example,

    [ 0 0 ] [ 0 1 ]   [ 0 0 ]
    [ 0 1 ] [ 0 0 ] = [ 0 0 ].
Diagonal matrices. The matrix A is a diagonal matrix if its only nonzero entries appear along the diagonal: Aij = 0 if i ≠ j. This terminology makes sense regardless of the dimensions of A, but is usually used in the case of square matrices, i.e., for the case where A is an n × n matrix. In that case, we write

    A = diag(a1 , . . . , an )

where Aii = ai for i = 1, . . . , n (and Aij = 0 otherwise). For instance,

    diag(1, 4, 0, 6) = [ 1 0 0 0 ]
                       [ 0 4 0 0 ]
                       [ 0 0 0 0 ]
                       [ 0 0 0 6 ].
Identity matrices. The n × n identity matrix is the n × n matrix
In = diag(1, . . . , 1).
It has the following property: AI = A and IB = B whenever these products make sense. For instance,

    [ 1 2 3 ] [ 1 0 0 ]   [ 1 2 3 ]
    [ 4 5 6 ] [ 0 1 0 ] = [ 4 5 6 ]
              [ 0 0 1 ]

and

    [ 1 0 0 ] [ 1 2 ]   [ 1 2 ]
    [ 0 1 0 ] [ 3 4 ] = [ 3 4 ]
    [ 0 0 1 ] [ 5 6 ]   [ 5 6 ].
Inverses. Let A be an m × n matrix, and let B be an n × m matrix. If AB = Im , we say A is a left-inverse for B and B is a right-inverse for A. For example,

    A = [ 1 1 1 ]   and   B = [ 1 −1 ]
        [ 0 1 1 ]             [ 0  0 ]
                              [ 0  1 ].

Then

    AB = [ 1 1 1 ] [ 1 −1 ]   [ 1 0 ]
         [ 0 1 1 ] [ 0  0 ] = [ 0 1 ].
                   [ 0  1 ]
Hence, A is a left-inverse for B and B is a right-inverse for A. On the other hand,

    BA = [ 1 −1 ] [ 1 1 1 ]   [ 1 0 0 ]
         [ 0  0 ] [ 0 1 1 ] = [ 0 0 0 ]
         [ 0  1 ]             [ 0 1 1 ].

So B is not a left-inverse of A and A is not a right-inverse of B. (In fact, A does not have a left-inverse and B does not have a right-inverse. This has to do with their ranks not being high enough, and you should be able to understand this later.)
We will mainly be interested in inverses for square matrices. Suppose that A is an n×n
matrix. Suppose B is a right-inverse. So B is an n × n matrix such that AB = In .
Since matrix multiplication is not commutative, the value of BA is not immediately
clear. However, in fact, we have the following important result:
Theorem. Let A and B be n × n matrices. The following are equivalent:
1. AB = In .
2. BA = In .
If AB = In , we say A and B are invertible and write A−1 = B and B −1 = A. The
following are equivalent:
77
1. A is invertible.
2. rank(A) = n.
3. The reduced echelon form of A is In .
The proof of this theorem will follow from an elegant algorithm for computing the
inverse of a matrix, which we present below. The equivalence of the last two items on the list is something we already know.
Calculating the inverse. Our problem now is to determine whether an inverse for a
matrix exists, and if so, to calculate that inverse. The methods we present here would
also be applicable to calculating right- and left-inverses of non-square matrices—it
boils down to solving systems of linear equations, after all—but we will concentrate
on the case of square matrices.
Example. Let

    A = [ 0  3 −1 ]
        [ 1  0  1 ]
        [ 1 −1  0 ].
A right-inverse for A would satisfy the following:

    [ 0  3 −1 ] [ a b c ]   [ 1 0 0 ]
    [ 1  0  1 ] [ d e f ] = [ 0 1 0 ]
    [ 1 −1  0 ] [ g h i ]   [ 0 0 1 ].
So we need to find the entries a, . . . , i. We can break this into three problems:

    [ 0  3 −1 ] [ a ]   [ 1 ]
    [ 1  0  1 ] [ d ] = [ 0 ]
    [ 1 −1  0 ] [ g ]   [ 0 ]

    [ 0  3 −1 ] [ b ]   [ 0 ]
    [ 1  0  1 ] [ e ] = [ 1 ]
    [ 1 −1  0 ] [ h ]   [ 0 ]

    [ 0  3 −1 ] [ c ]   [ 0 ]
    [ 1  0  1 ] [ f ] = [ 0 ].
    [ 1 −1  0 ] [ i ]   [ 1 ]
Equivalently, we need to solve three systems of linear equations:
    0 · x + 3y − z = 1
    x + 0 · y + z = 0
    x − y + 0 · z = 0

    0 · x + 3y − z = 0
    x + 0 · y + z = 1
    x − y + 0 · z = 0

    0 · x + 3y − z = 0
    x + 0 · y + z = 0
    x − y + 0 · z = 1
Their augmented matrices look like:

    [ 0  3 −1 | 1 ]     [ 0  3 −1 | 0 ]     [ 0  3 −1 | 0 ]
    [ 1  0  1 | 0 ],    [ 1  0  1 | 1 ],    [ 1  0  1 | 0 ].
    [ 1 −1  0 | 0 ]     [ 1 −1  0 | 0 ]     [ 1 −1  0 | 1 ]
The row operations needed to determine the solvability of these systems are the same in all three cases. So we can combine all three of these systems at once in one "super"-augmented matrix calculation:

    [ 0  3 −1 | 1 0 0 ]
    [ 1  0  1 | 0 1 0 ]
    [ 1 −1  0 | 0 0 1 ]

    r1 ↔ r2:
    [ 1  0  1 | 0 1 0 ]
    [ 0  3 −1 | 1 0 0 ]
    [ 1 −1  0 | 0 0 1 ]

    r3 → r3 − r1:
    [ 1  0  1 | 0  1 0 ]
    [ 0  3 −1 | 1  0 0 ]
    [ 0 −1 −1 | 0 −1 1 ]

    r3 → −r3, then r2 ↔ r3:
    [ 1 0  1 | 0 1  0 ]
    [ 0 1  1 | 0 1 −1 ]
    [ 0 3 −1 | 1 0  0 ]

    r3 → r3 − 3r2:
    [ 1 0  1 | 0  1  0 ]
    [ 0 1  1 | 0  1 −1 ]
    [ 0 0 −4 | 1 −3  3 ]

    r3 → −r3/4:
    [ 1 0 1 |    0    1    0 ]
    [ 0 1 1 |    0    1   −1 ]
    [ 0 0 1 | −1/4  3/4 −3/4 ]

    r1 → r1 − r3, r2 → r2 − r3:
    [ 1 0 0 |  1/4  1/4  3/4 ]
    [ 0 1 0 |  1/4  1/4 −1/4 ]
    [ 0 0 1 | −1/4  3/4 −3/4 ].
Going back to the original systems of equations, we see that we need

    [ a ]   [  1/4 ]     [ b ]   [ 1/4 ]     [ c ]   [  3/4 ]
    [ d ] = [  1/4 ],    [ e ] = [ 1/4 ],    [ f ] = [ −1/4 ].
    [ g ]   [ −1/4 ]     [ h ]   [ 3/4 ]     [ i ]   [ −3/4 ]
In other words, the following matrix is a right-inverse for A:

    [  1/4 1/4  3/4 ]
    [  1/4 1/4 −1/4 ].
    [ −1/4 3/4 −3/4 ]
The argument we’ve just given for a particular matrix easily generalizes to give the
following algorithm.
Algorithm for computing the inverse of a matrix. Let A be an n × n matrix. Perform row operations on the "super"-augmented matrix [A | In ] to compute the reduced echelon form of A:

    (A | In ) −→ (Ã | B).

Then,

• If Ã ≠ In , then Ã has a row of zeros, rank(A) < n, and A has no inverse (left or right).

• Otherwise, Ã = In , rank(A) = n, and B = A−1 , the inverse of A.
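The algorithm is short to implement with floating-point row operations. The sketch below is an illustrative implementation (not the official one from the course): it reduces [A | In] with partial pivoting and reads off the inverse, using the matrix A from the example.

    import numpy as np

    def inverse_by_row_reduction(A, tol=1e-12):
        """Gauss-Jordan elimination on the augmented matrix [A | I]; returns A^(-1) or None."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])                      # the "super"-augmented matrix
        for col in range(n):
            pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
            if abs(M[pivot, col]) < tol:
                return None                                # rank(A) < n: no inverse
            M[[col, pivot]] = M[[pivot, col]]              # swap rows
            M[col] /= M[col, col]                          # scale pivot row so the pivot is 1
            for row in range(n):
                if row != col:
                    M[row] -= M[row, col] * M[col]         # clear the rest of the column
        return M[:, n:]

    A = [[0, 3, -1],
         [1, 0, 1],
         [1, -1, 0]]
    print(inverse_by_row_reduction(A))
    # [[ 0.25  0.25  0.75]
    #  [ 0.25  0.25 -0.25]
    #  [-0.25  0.75 -0.75]]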
todo: The argument we have given so far shows that if A is an n × n matrix of
rank n, then A has a right inverse B so that AB = In . Next time, we will show that,
in fact, it automatically follows that BA = In , too. This is surprising since matrix
multiplication is not generally commutative.
Week 6, Wednesday: Matrices and linear transformations.
Finishing up the argument from last time. Last time, we considered the matrix
$$A = \begin{pmatrix} 0 & 3 & -1 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}.$$
In order to find a right-inverse for A, we tried to find
$$B = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$$
such that
$$AB = \begin{pmatrix} 0 & 3 & -1 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and found that this amounted to a giant row-reduction problem:
$$\left(\begin{array}{rrr|rrr} 0 & 3 & -1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & -1 & 0 & 0 & 0 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1/4 & 1/4 & 3/4 \\ 0 & 1 & 0 & 1/4 & 1/4 & -1/4 \\ 0 & 0 & 1 & -1/4 & 3/4 & -3/4 \end{array}\right),$$
yielding
$$B = \begin{pmatrix} 1/4 & 1/4 & 3/4 \\ 1/4 & 1/4 & -1/4 \\ -1/4 & 3/4 & -3/4 \end{pmatrix}.$$
Imagine the similar problem of computing a left-inverse for A:
$$\begin{pmatrix} a' & b' & c' \\ d' & e' & f' \\ g' & h' & i' \end{pmatrix}
\begin{pmatrix} 0 & 3 & -1 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Converting to linear equations, as above, this amounts to row-reducing the transpose
of A:
$$\left(\begin{array}{rrr|rrr} 0 & 1 & 1 & 1 & 0 & 0 \\ 3 & 0 & -1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 & 0 & 1 \end{array}\right) \longrightarrow \cdots$$
Instead of doing that, consider the following: from our previous calculation, we saw
that the reduced echelon form of A is In . This means that the rank of A is 3. But
since the row rank and column rank of a matrix are equal, this means that column
rank of A is also 3. Equivalently, the row rank of the transpose of A is three. By
this same reasoning, the rank of a matrix and its transpose are always equal. In
our situation, this means that after row-reducing the above system, we will end up
with a left-inverse C for A, i.e., the system is not inconsistent. In fact, we can show
that C = B without actually doing any of the row-reduction! Here's how: we have
AB = I3 and CA = I3. But then,
$$AB = I_3 \;\Rightarrow\; C(AB) = CI_3 \;\Rightarrow\; (CA)B = C \;\Rightarrow\; I_3B = C \;\Rightarrow\; C = B.$$
Therefore, any right-inverse of an n × n invertible matrix is automatically a left-inverse.
Every matrix determines a linear function. Each m × n matrix A with entries
in a field F determines a linear function:
fA : F^n → F^m
x ↦ Ax.
Here, we are thinking of x as being a column matrix, i.e., an n × 1 matrix—a single
column. The linearity of fA follows from properties of matrix multiplication which
we’ve already observed:
fA (u + λv) = A(u + λv) = Au + λAv.
Example. If
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix},$$
then fA is the mapping F^3 → F^2 given by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \mapsto
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} x + 2y + 3z \\ 4x + 5y + 6z \end{pmatrix}.$$
In other words,
fA(x, y, z) = (x + 2y + 3z, 4x + 5y + 6z).
Notice how you can read off this formula for fA: the coefficients of the linear functions
in each coordinate come from the rows of A. We display this relation in the next
example.
Example. Here are some matrices and their corresponding linear functions:
$$A = \begin{pmatrix} 2 & 5 \\ 6 & -3 \\ 4 & 3 \end{pmatrix}, \qquad f_A(x, y) = (2x + 5y,\ 6x - 3y,\ 4x + 3y)$$
$$B = \begin{pmatrix} 3 & 6 & 1 \\ 4 & 0 & 2 \\ 5 & 1 & 5 \end{pmatrix}, \qquad f_B(x, y, z) = (3x + 6y + z,\ 4x + 2z,\ 5x + y + 5z)$$
$$C = \begin{pmatrix} 3 & 5 & 2 & 2 \end{pmatrix}, \qquad f_C(x, y, z, w) = 3x + 5y + 2z + 2w$$
$$D = \begin{pmatrix} 3 \\ 5 \\ 2 \\ 2 \end{pmatrix}, \qquad f_D(x) = (3x, 5x, 2x, 2x) = x(3, 5, 2, 2)$$
$$E = \begin{pmatrix} 3 & 5 & 2 & 2 \end{pmatrix}, \qquad f_E(x) = 3x_1 + 5x_2 + 2x_3 + 2x_4 \ \text{ where } x = (x_1, x_2, x_3, x_4).$$
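As a quick illustration (not part of the notes), the following NumPy snippet checks the first of these correspondences numerically; the matrix A and the test point are just sample values.

import numpy as np

A = np.array([[2, 5],
              [6, -3],
              [4, 3]])

def f_A(x):
    return A @ x          # matrix-vector product

x, y = 1, 2
print(f_A(np.array([x, y])))                      # [12  0 10]
print((2*x + 5*y, 6*x - 3*y, 4*x + 3*y))          # (12, 0, 10), the same values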
Matrices representing arbitrary linear functions. Suppose f : V → W is a
linear function between finite-dimensional vector spaces V and W. In order to represent f with a matrix, we need to first identify V and W with vector spaces of
tuples, i.e., with vector spaces of the form F^k. To that end, let α = ⟨v1, . . . , vn⟩ be
an ordered basis for V, and let β = ⟨w1, . . . , wm⟩ be an ordered basis for W. Taking
coordinates with respect to these bases identifies V with F n and W with F m . For
instance, if v ∈ V , we write v = c1 v1 + · · · + cn vn for some uniquely determined
c1 , . . . , cn in F . We then identify v with its coordinates (c1 , . . . , cn ) ∈ F n . In this
way, taking coordinates defines an isomorphism of vector spaces
φα : V → F n
v 7→ (c1 , . . . , cn ).
Similarly, we get an isomorphism φβ : W → F m .
84
To compute the m × n matrix Aβα representing f with respect to these bases, we
compute the columns of Aβα, one at a time. To compute the j-th column, find the
coordinates of f(vj) with respect to the basis β. In other words, find ai such that
f(vj) = a1w1 + · · · + amwm.
Then the j-th column of Aβα is
$$\begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix}.$$
The idea is this: by taking coordinates, we can identify V with F^n and W with F^m.
Under this identification, the mapping f becomes the linear mapping between F^n
and F^m determined by the matrix Aβα:
$$\vec{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \longmapsto A^\beta_\alpha \vec{x} \in F^m.$$
In fancier language, we get a commutative diagram
$$\begin{array}{ccc}
V & \xrightarrow{\ f\ } & W \\
\phi_\alpha \downarrow \wr & & \wr \downarrow \phi_\beta \\
F^n & \xrightarrow{\ A^\beta_\alpha\ } & F^m
\end{array}$$
where φα and φβ are the isomorphisms taking vectors to their coordinates. For instance, if v ∈ V and v = c1v1 + · · · + cnvn, then φα(v) = (c1, . . . , cn). By saying the diagram
"commutes" we mean that Aβα(φα(v)) = φβ(f(v)) for all v ∈ V.
Example. Let R[x]≤d be the vector space of polynomials in x of degree at most d
with coefficients in R. Consider the following linear function:
f : R[x]≤2 → R[x]≤3
p 7→ xp + p0
Find the matrix representing f with respect to the ordered bases α = h1, x, x2 i
for R[x]≤2 and β = h1, x, x2 , x3 i. Compute:
f (1) = x = 0 · 1 + 1 · x + 0 · x2 + 0 · x3
f (x) = x2 + 1 = 1 · 1 + 0 · x + 1 · x2 + 0 · x3
f (x2 ) = x3 + 2x = 0 · 1 + 2 · x + 0 · x2 + 1 · x3 .
The matrix is then
$$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
with columns labeled by f(1), f(x), f(x^2) and rows labeled by 1, x, x^2, x^3, to give an
idea of where the entries come from.
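Here is a small Python sketch (my own illustration, not from the notes) that recomputes the matrix above by applying f to the coordinate vectors of the basis ⟨1, x, x^2⟩; representing a polynomial by its coefficient list is an assumption of this sketch.

import numpy as np

def f(p):
    # p is the coordinate vector [a0, a1, a2] of a0 + a1*x + a2*x^2 wrt <1, x, x^2>
    xp = np.concatenate(([0.0], p))               # multiply by x: shift coefficients up one degree
    dp = np.array([p[1], 2*p[2], 0.0, 0.0])       # derivative p' = a1 + 2*a2*x, padded to degree 3
    return xp + dp                                # coordinates of x*p + p' wrt <1, x, x^2, x^3>

basis = np.eye(3)                                 # coordinate vectors of 1, x, x^2
M = np.column_stack([f(v) for v in basis])
print(M)
# [[0. 1. 0.]
#  [1. 0. 2.]
#  [0. 1. 0.]
#  [0. 0. 1.]]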
86
Week 6, Friday: Matrices and linear transformations.
Last time, we saw that there was a one-to-one correspondence between m×n matrices
with coefficients in a field F and linear functions F n → F m . The correspondence goes
like this: an m × n matrix A corresponds to the linear function
fA : F n → F m
x 7→ Ax
where we identify points in F^n with column vectors:
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \longleftrightarrow (x_1, \dots, x_n) \in F^n.$$
Conversely, given a linear function f : F n → F m , let [f ] be the m × n matrix whose
j-th column is f (ej ) where ej is the j-th standard basis vector.
Example. If
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},$$
then the corresponding linear function is
$$f_A : F^2 \to F^3, \qquad
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} x + 2y \\ 3x + 4y \\ 5x + 6y \end{pmatrix}.$$
In other words,
fA(x, y) = (x + 2y, 3x + 4y, 5x + 6y).
Notice how the coefficients of the linear function in each coordinate of fA(x, y) come
from the corresponding row of A.
87
To recover A from the function fA , we compute the images of the standard basis
vectors:
fA (e1 ) = fA (1, 0) = (1, 3, 5)
fA (e2 ) = fA (0, 1) = (2, 4, 6).
These two vectors are the columns of A.
Let Mm×n denote the vector space of m × n matrices, and let Hom(F n , F m ) denote the vector space of linear functions with domain F n and codomain F m . The
correspondence we have just described gives an isomorphism of vector spaces:
Mm×n → Hom(F^n, F^m)
A ↦ fA.
It is easy to check that A 7→ fA is bijective (we’ve described the inverse) and that it
is linear:
fA+λB = fA + λfB .
The point: We have just described very precisely the sense in which matrices are
“the same” as linear functions between vector spaces of tuples.
The image of a linear function. The image of a linear function f : F n → F m is
im(f ) := {f (x) ∈ F m : x ∈ F n },
a subspace of F m . Recall that earlier we noticed that a linear function is determined
by the images of the standard basis vectors. Precisely: given any vectors w1 , . . . , wn
in F m , there is exactly one linear function f : F n → F m such that f (ei ) = wi for
i = 1, . . . , n. The image of f is the span of the images of its basis vectors:
im(f ) = Span {f (e1 ), . . . , f (en )} .
That’s because every vector in the image has the form f (x) for some x = (x1 , . . . , xn ) ∈
F n . We can write x = x1 e1 + · · · + xn en , and then
f (x) = f (x1 e1 + · · · + xn en ) = x1 f (e1 ) + · · · + xn f (en ) ∈ Span{f (e1 ), . . . , f (en )}.
Now suppose your linear function f corresponds to the matrix A. Then the columns of A are exactly the images of the standard basis vectors. Hence, we have the
important observation:
The image of the linear function corresponding to A is the column span
of A. Therefore, the rank of a linear function is the rank of its corresponding matrix.
Example. Consider the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ -3 & 2 \end{pmatrix}.$$
The image of its corresponding function fA : F^2 → F^3 is the column span of A:
im(fA) = Span{(1, 0, −3), (1, 1, 2)}.
To see this directly, notice that
$$f_A(x, y) = (x + y,\ y,\ -3x + 2y)
= x\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + y\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.$$
Matrix multiplication corresponds to function composition. We'll start with
an example. Consider the matrices
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} e & f & g \\ h & i & j \end{pmatrix}$$
where a, . . . , j ∈ F. Their corresponding linear functions are
fA : F^2 → F^2, (x, y) ↦ (ax + by, cx + dy)
and
fB : F^3 → F^2, (x, y, z) ↦ (ex + fy + gz, hx + iy + jz).
We can compose these functions:
fA ◦ fB : F^3 → F^2.
In detail,
(fA ◦ fB )(x, y, z) = fA (fB (x, y, z))
= fA (ex + f y + gz, hx + iy + jz)
= (a(ex + f y + gz) + b(hx + iy + jz), c(ex + f y + gz) + d(hx + iy + jz))
= ((ae + bh)x + (af + bi)y + (ag + bj)z, (ce + dh)x + (cf + di)y + (cg + dj)z).
So the matrix corresponding to fA ◦ fB is
$$\begin{pmatrix} ae + bh & af + bi & ag + bj \\ ce + dh & cf + di & cg + dj \end{pmatrix}
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} e & f & g \\ h & i & j \end{pmatrix} = AB.$$
Therefore,
fA ◦ fB = fAB .
This explains the formula for matrix multiplication. Matrix multiplication is defined
exactly so that under the correspondence with linear mappings, it corresponds with
composition of functions. We have just given an example displaying this fact, and
by working in coordinates, it might appear surprising. However, working from the
definitions the result is easy to verify:
Proposition. Let A be an m × p matrix, and let B be a p × n matrix. Let fA : F^p → F^m
and fB : F^n → F^p be the corresponding linear functions. Then the matrix corresponding to the composition fA ◦ fB is AB.
Proof. For all x ∈ F^n, identifying points with column vectors, as usual,
(fA ◦ fB)(x) = fA(fB(x)) = A(Bx) = (AB)x.
The key point is associativity of matrix multiplication. Thus, fA ◦ fB = fAB .
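A quick numerical sanity check (not from the notes): for randomly chosen matrices of compatible sizes, composing fA and fB gives the same vector as applying the single matrix AB.

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 2))   # A is 2x2, so f_A : F^2 -> F^2
B = rng.integers(-5, 5, size=(2, 3))   # B is 2x3, so f_B : F^3 -> F^2
x = rng.integers(-5, 5, size=3)

f_A = lambda v: A @ v
f_B = lambda v: B @ v

print(f_A(f_B(x)))        # (f_A o f_B)(x)
print((A @ B) @ x)        # f_{AB}(x) -- the same vector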
Matrices for linear functions between arbitrary vector spaces. Suppose that
f : V → W
is a linear function between vector spaces over F, where dim V = n and dim W = m. We
want to identify f with a matrix, but there are choices involved (which turns out to
be a good thing). We first need to choose ordered bases
α = ⟨v1, . . . , vn⟩ and β = ⟨w1, . . . , wm⟩
for V and W, respectively. Next take coordinates with respect to those bases. Formally, we have isomorphisms
φα : V → F^n and φβ : W → F^m.
These mappings are determined by the following:
φα(vi) = ei ∈ F^n for i = 1, . . . , n
and
φβ(wi) = ei ∈ F^m for i = 1, . . . , m.
Recall that an isomorphism just means that the vector spaces are exactly the same
except for their names. Therefore, the linear mapping f will translate to a linear
mapping F^n → F^m, and thus to a matrix. We'll call that matrix [f]βα. As described
last time, we get a commutative diagram
$$\begin{array}{ccc}
V & \xrightarrow{\ f\ } & W \\
\phi_\alpha \downarrow \wr & & \wr \downarrow \phi_\beta \\
F^n & \xrightarrow{\ [f]^\beta_\alpha\ } & F^m
\end{array}$$
By definition,
$$[f]^\beta_\alpha := \phi_\beta \circ f \circ \phi_\alpha^{-1}.$$
How do we compute this matrix? Since [f]βα is a matrix, its j-th column is given
by [f]βα(ej). By definition,
$$[f]^\beta_\alpha(e_j) = (\phi_\beta \circ f \circ \phi_\alpha^{-1})(e_j) = \phi_\beta\big(f\big(\phi_\alpha^{-1}(e_j)\big)\big).$$
We have φα(vj) = ej. Hence, φα^{-1}(ej) = vj ∈ V. So,
$$[f]^\beta_\alpha(e_j) = \phi_\beta\big(f\big(\phi_\alpha^{-1}(e_j)\big)\big) = \phi_\beta(f(v_j)).$$
So here is the algorithm for computing [f ]βα : to find its j-th column, compute the
coordinates of f (vj ) with respect to β.
Example. Consider the following graph G = (V, E) with vertices V = {1̄, 2̄, 3̄, 4̄}
and edges E = {12, 13, 23, 24, 34}:
[Figure: the graph G on vertices 1̄, 2̄, 3̄, 4̄ with edges 12, 13, 23, 24, 34.]
Define the edge space, QE, for G with rational coefficients to be the five-dimensional
vector space with basis
12, 13, 23, 24, 34.
So a typical element of QE might be
$$2 \cdot 12 + \tfrac{3}{5} \cdot 13 - \tfrac{2}{9} \cdot 23 + 6 \cdot 24 - \tfrac{7}{3} \cdot 34.$$
Similarly, define the vertex space, QV, for G to be the four-dimensional vector space
with basis
1̄, 2̄, 3̄, 4̄.
So a typical element of QV might be
$$4 \cdot \bar{1} - \tfrac{3}{4} \cdot \bar{2} + 5 \cdot \bar{4}.$$
Next define a linear function called the boundary map:
∂ : QE → QV
12 ↦ 2̄ − 1̄
13 ↦ 3̄ − 1̄
23 ↦ 3̄ − 2̄
24 ↦ 4̄ − 2̄
34 ↦ 4̄ − 3̄.
We have defined ∂ by indicating the images of the basis vectors for QE. We define it
on all of QE by extending linearly. For example,
∂(2 · 12 + 4 · 23) = 2 ∂(12) + 4 ∂(23)
= 2(2̄ − 1̄) + 4(3̄ − 2̄)
= −2 · 1̄ − 2 · 2̄ + 4 · 3̄.
What is the matrix representing ∂ with respect to the given (ordered) bases? We
have already computed the images of each of the basis vectors, and it is then easy to
take their coordinates. For instance,
∂(12) = 2̄ − 1̄ = −1 · 1̄ + 1 · 2̄ + 0 · 3̄ + 0 · 4̄.
So the first column of the matrix is
$$\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$
The full matrix is
$$A = \begin{pmatrix} -1 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix},$$
with rows labeled by 1̄, 2̄, 3̄, 4̄ and columns labeled by 12, 13, 23, 24, 34, for convenience.
Challenge: describe the kernel of ∂. (Start by finding one element in it.) Hint: the
kernel is called the cycle space of G.
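For readers who want to experiment, here is a SymPy sketch (my own, not part of the notes) that builds the boundary matrix and computes its kernel exactly; the variable names are assumptions of the sketch.

import sympy as sp

vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]     # 12, 13, 23, 24, 34

# Column for edge uv is the coordinate vector of v-bar minus u-bar.
D = sp.zeros(len(vertices), len(edges))
for j, (u, v) in enumerate(edges):
    D[vertices.index(u), j] = -1
    D[vertices.index(v), j] = 1

print(D)                                 # the matrix A displayed above
print(D * sp.Matrix([1, -1, 1, 0, 0]))   # the combination 12 - 13 + 23 maps to 0
for vec in D.nullspace():                # a basis for ker(boundary map), the cycle space
    print(vec.T)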
93
Week 7, Monday: Matrices and linear transformations: examples. Change of basis.
Examples of matrices corresponding to linear functions.
Let f : V → W be a linear function between vector spaces over F, where dim V = n
and dim W = m. To identify f with a matrix, choose ordered bases
α = ⟨v1, . . . , vn⟩ and β = ⟨w1, . . . , wm⟩
for V and W, respectively. Take coordinates with respect to those bases:
φα : V → F^n and φβ : W → F^m,
determined by the following:
φα(vi) = ei ∈ F^n for i = 1, . . . , n
and
φβ(wi) = ei ∈ F^m for i = 1, . . . , m.
We then get a commutative diagram:
$$\begin{array}{ccc}
V & \xrightarrow{\ f\ } & W \\
\phi_\alpha \downarrow \wr & & \wr \downarrow \phi_\beta \\
F^n & \xrightarrow{\ [f]^\beta_\alpha\ } & F^m
\end{array}$$
By definition,
$$[f]^\beta_\alpha := \phi_\beta \circ f \circ \phi_\alpha^{-1}.$$
By saying the diagram is commutative, we mean that
$$[f]^\beta_\alpha \circ \phi_\alpha = \phi_\beta \circ f.$$
In other words, the two ways of getting from the top left corner to the bottom right
in the diagram give the same result.
To compute the matrix [f]βα, note that its j-th column is given by [f]βα(ej). By
definition,
$$[f]^\beta_\alpha(e_j) = (\phi_\beta \circ f \circ \phi_\alpha^{-1})(e_j) = \phi_\beta\big(f\big(\phi_\alpha^{-1}(e_j)\big)\big).$$
We have φα(vj) = ej. Hence, φα^{-1}(ej) = vj ∈ V. So,
$$[f]^\beta_\alpha(e_j) = \phi_\beta\big(f\big(\phi_\alpha^{-1}(e_j)\big)\big) = \phi_\beta(f(v_j)).$$
95
Algorithm for computing [f]βα:
The j-th column of [f]βα consists of the coordinates of f(vj) with respect to β.
$$\begin{array}{ccccl}
V & \xrightarrow{\ f\ } & W & \qquad & v_j \mapsto f(v_j) \\
\phi_\alpha \downarrow \wr & & \wr \downarrow \phi_\beta & & \text{take coords.\ wrt.\ } \beta \\
F^n & \xrightarrow{\ [f]^\beta_\alpha\ } & F^m & & e_j \mapsto j\text{-th col.\ of } [f]^\beta_\alpha
\end{array}$$
Example 1. Here is a trivial example. Let A be an m × n matrix, and let fA : F n →
F m be the corresponding linear function. Let α and β be the standard bases for F n
and F m , respectively. In that case, φα = idF n and φβ = idF m and
[f ]βα = A.
To see that, we compute f (ej ) for each j. We get an m-tuple, which is already the
coordinates of f (ej ) with respect to the standard basis, β. So the columns of [f ]βα are
the f (ej ), and the result follows.
Example 2. Another special case. Let β be an ordered basis for F^n, and let st denote
the standard ordered basis for F^n. Let idF^n : F^n → F^n be the identity mapping, for
which idF^n(v) = v for all v ∈ F^n. The coordinate mapping φst is equal to idF^n, and
we have the commutative diagram
$$\begin{array}{ccc}
F^n & \xrightarrow{\ \mathrm{id}_{F^n}\ } & F^n \\
\mathrm{id}_{F^n} \downarrow \wr & & \wr \downarrow \phi_\beta \\
F^n & \xrightarrow{\ [\mathrm{id}_{F^n}]^\beta_{\mathrm{st}}\ } & F^n
\end{array}$$
so [idF^n]βst is just the matrix representing the coordinate mapping φβ.
Example 3. Consider the following graph G = (V, E) with vertices V = {1̄, 2̄, 3̄, 4̄}
and edges E = {12, 13, 23, 24, 34}:
[Figure: the graph G on vertices 1̄, 2̄, 3̄, 4̄ with edges 12, 13, 23, 24, 34.]
Define the edge space, QE, for G with rational coefficients to be the five-dimensional
vector space with basis
12, 13, 23, 24, 34.
So a typical element of QE might be
$$2 \cdot 12 + \tfrac{3}{5} \cdot 13 - \tfrac{2}{9} \cdot 23 + 6 \cdot 24 - \tfrac{7}{3} \cdot 34.$$
Similarly, define the vertex space, QV, for G to be the four-dimensional vector space
with basis
1̄, 2̄, 3̄, 4̄.
So a typical element of QV might be
$$4 \cdot \bar{1} - \tfrac{3}{4} \cdot \bar{2} + 5 \cdot \bar{4}.$$
Next define a linear function called the boundary map:
∂ : QE → QV
12 ↦ 2̄ − 1̄
13 ↦ 3̄ − 1̄
23 ↦ 3̄ − 2̄
24 ↦ 4̄ − 2̄
34 ↦ 4̄ − 3̄.
We have defined ∂ by indicating the images of the basis vectors for QE. We define it
on all of QE by extending linearly. For example,
∂(2 · 12 + 4 · 23) = 2 ∂(12) + 4 ∂(23)
= 2(2̄ − 1̄) + 4(3̄ − 2̄)
= −2 · 1̄ − 2 · 2̄ + 4 · 3̄.
What is the matrix representing ∂ with respect to the given (ordered) bases? We
have already computed the images of each of the basis vectors, and it is then easy to
take their coordinates. For instance,
∂(12) = 2̄ − 1̄ = −1 · 1̄ + 1 · 2̄ + 0 · 3̄ + 0 · 4̄.
So the first column of the matrix is
$$\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$
The full matrix is
$$A = \begin{pmatrix} -1 & -1 & 0 & 0 & 0 \\ 1 & 0 & -1 & -1 & 0 \\ 0 & 1 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix},$$
with rows labeled by 1̄, 2̄, 3̄, 4̄ and columns labeled by 12, 13, 23, 24, 34, for convenience.
Challenge: describe the kernel of ∂. (Start by finding one element in it.) Hint: the
kernel is called the cycle space of G.
98
Week 7, Wednesday: Determinants.
Change of coordinates example. Consider the 2 × 3 matrix with coefficients in Q
(the rational numbers):
$$A = \begin{pmatrix} 1 & 3 & 2 \\ 0 & 2 & 1 \end{pmatrix}.$$
The corresponding linear function is
f : Q^3 → Q^2
(x, y, z) ↦ (x + 3y + 2z, 2y + z).
Let's choose the following bases for the domain and codomain:
Q^3 : α = ⟨(1, 0, 0), (1, 1, 0), (1, 1, 1)⟩
Q^2 : β = ⟨(0, 1), (1, 1)⟩.
Compute the matrix [f]βα representing f with respect to these bases. (Note: the
matrix A represents f with respect to the standard bases for Q^3 and Q^2.)
solution: Compute the β-coordinates for the images of the α basis vectors:
f(1, 0, 0) = (1, 0) = −1 · (0, 1) + 1 · (1, 1)
f(1, 1, 0) = (4, 2) = −2 · (0, 1) + 4 · (1, 1)
f(1, 1, 1) = (6, 3) = −3 · (0, 1) + 6 · (1, 1).
Therefore,
$$[f]^\beta_\alpha = \begin{pmatrix} -1 & -2 & -3 \\ 1 & 4 & 6 \end{pmatrix}.$$
For practice, let's compute the coordinate maps, φα and φβ.
Recall that earlier, we talked about a general coordinate mapping
$$\phi_\alpha : V \to F^n, \qquad v = \sum_{i=1}^n c_i v_i \mapsto (c_1, \dots, c_n).$$
In our case, V = Q^3 and F^n = Q^3, i.e.,
φα : Q^3 → Q^3.
So, in fact, φα is given by a 3 × 3 matrix. Which matrix? We need to compute
each φα(ej), and we do that by expressing each ej as a linear combination of the
elements of α:
e1 = (1, 0, 0) = 1 · (1, 0, 0) + 0 · (1, 1, 0) + 0 · (1, 1, 1)
e2 = (0, 1, 0) = −1 · (1, 0, 0) + 1 · (1, 1, 0) + 0 · (1, 1, 1)
e3 = (0, 0, 1) = 0 · (1, 0, 0) − 1 · (1, 1, 0) + 1 · (1, 1, 1).
Hence, φα is the linear function corresponding to the matrix
$$M_\alpha := \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}.$$
Next, we compute the matrix corresponding to φβ : Q^2 → Q^2 by computing the
coordinates of each standard basis vector with respect to β:
e1 = (1, 0) = −1 · (0, 1) + 1 · (1, 1)
e2 = (0, 1) = 1 · (0, 1) + 0 · (1, 1).
Therefore, the matrix representing φβ is
$$M_\beta := \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}.$$
We end up with a commutative diagram
$$\begin{array}{ccc}
Q^3 & \xrightarrow{\ A\ } & Q^2 \\
M_\alpha \downarrow \wr & & \wr \downarrow M_\beta \\
Q^3 & \xrightarrow{\ [f]^\beta_\alpha\ } & Q^2.
\end{array}$$
We check commutativity. Going down along the left side:
$$[f]^\beta_\alpha M_\alpha = \begin{pmatrix} -1 & -2 & -3 \\ 1 & 4 & 6 \end{pmatrix}
\begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} -1 & -1 & -1 \\ 1 & 3 & 2 \end{pmatrix}.$$
Going over and down the right side of the diagram, we get
$$M_\beta A = \begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 3 & 2 \\ 0 & 2 & 1 \end{pmatrix}
= \begin{pmatrix} -1 & -1 & -1 \\ 1 & 3 & 2 \end{pmatrix}.$$
So [f]βα Mα = Mβ A, or
$$[f]^\beta_\alpha = M_\beta A M_\alpha^{-1}.$$
Change of basis. Let A be an m × n matrix with corresponding linear function:
$$F^n \xrightarrow{\ A\ } F^m.$$
(Note that we are equating A with fA in our notation, as we often will from now on.)
Let α and β be ordered bases for F^n and F^m, respectively. Let Mα and Mβ be the
matrices corresponding to the respective coordinate mappings. Generalizing what we
saw in the last example, the matrix B := [fA]βα representing the linear function fA
with respect to these bases is
$$B = M_\beta A M_\alpha^{-1}.$$
How do we compute Mα? The j-th column of Mα consists of the coordinates of ej with
respect to α.
If α = ⟨v1, . . . , vn⟩, then the coordinates of vj with respect to α are given by ej. Thus,
Mα vj = ej.
Applying the inverse of Mα to both sides of this equation, we get
vj = Mα^{-1} Mα vj = Mα^{-1} ej.
For instance, in the previous example, we computed
$$M_\alpha = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}.$$
Computing the inverse, we find
$$M_\alpha^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},$$
whose columns are the vectors in α.
Main point: The j-th column of Mα^{-1} is vj. This suggests a way of constructing Mα:
first construct Mα^{-1} as the matrix whose columns are the basis vectors. Then take
the inverse to get Mα.
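A numerical check (not from the notes) of the change-of-basis formula on the example above, using SymPy for exact arithmetic; the variable names are my own.

import sympy as sp

A = sp.Matrix([[1, 3, 2], [0, 2, 1]])

# M_alpha^(-1) has the alpha basis vectors as its columns; likewise for beta.
Malpha_inv = sp.Matrix([[1, 1, 1], [0, 1, 1], [0, 0, 1]])   # columns (1,0,0), (1,1,0), (1,1,1)
Mbeta_inv  = sp.Matrix([[0, 1], [1, 1]])                    # columns (0,1), (1,1)

B = Mbeta_inv.inv() * A * Malpha_inv
print(B)          # Matrix([[-1, -2, -3], [1, 4, 6]]), the matrix [f]^beta_alpha computed above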
101
Determinants
Definition. The determinant is a multilinear, alternating function of the rows of a
square matrix, normalized so that its value on the identity matrix is 1.
To explain this terminology, start with the fact that the determinant is a function
det : Mn×n → F.
Given a square matrix A ∈ Mn×n with rows r1 , . . . , rn ∈ F n , we write det(A) =
det(r1 , . . . , rn ), i.e., we consider the determinant as a function of the rows of A. The
determinant function has the following properties:
1. Multilinear. The determinant is a linear function with respect to each row.
Thus, if r1, . . . , rn are the row vectors (elements of F^n), r′i is another row vector,
and λ ∈ F, we have
det(r1, . . . , ri−1, λri + r′i, ri+1, . . . , rn) = λ det(r1, . . . , ri−1, ri, ri+1, . . . , rn)
+ det(r1, . . . , ri−1, r′i, ri+1, . . . , rn).
2. Alternating. The determinant is zero if two of its arguments are equal:
det(r1 , . . . , rn ) = 0
if ri = rj for some i ≠ j.
3. Normalized. det(In ) = det(e1 , . . . , en ) = 1.
As we study the determinant, the following theorem will become evident:
Theorem. For each n ≥ 0, there exists a unique determinant function.
102
Week 7, Friday: Determinants.
Last time, we defined the determinant function:
det : Mn×n → F
as a multilinear, alternating function of the rows of an n × n matrix, normalized so
that det(In ) = 1.
We will later see that there exists a unique determinant function. For now, though we
will begin to explore some of its properties. In particular, the following proposition
shows that we can compute the determinant through row reduction.
Proposition 1. (Behavior of the determinant with respect to row operations.)
Let A, B ∈ Mn×n .
1. If B is obtained from A by swapping two rows, then det(B) = − det(A).
2. If B is obtained from A by scaling a row by a scalar λ, then det(B) = λ det(A).
3. If B is obtained from A by adding a scalar multiple of one row to another row,
then det(B) = det(A).
Proof. For part (1), let r1 , . . . , rn ∈ F n be the rows of A. For ease of notation, we
will assume that B is obtained from A by swapping the first two rows. The argument
we present clearly generalizes to the case of swapping arbitrary rows. Replace the
first two rows of A with r1 + r2 to obtain a matrix whose determinant is 0 by the
alternating property:
0 = det(r1 + r2 , r1 + r2 , r3 , . . . , rn ).
Expand by multilinearity to get:
0 = det(r1 + r2 , r1 + r2 , r3 , . . . , rn )
= det(r1 , r1 + r2 , r3 , . . . , rn ) + det(r2 , r1 + r2 , r3 , . . . , rn )
= det(r1 , r1 , r3 , . . . , rn ) + det(r1 , r2 , r3 , . . . , rn )
+ det(r2 , r1 , r3 , . . . , rn ) + det(r2 , r2 , r3 , . . . , rn )
= 0 + det(A) + det(B) + 0.
It follows that det(B) = − det(A).
Part (2) follows immediately from the fact that the determinant is linear with respect
to each row:
det(r1 , . . . , ri−1 , λri , ri+1 , . . . , rn ) = λ det(r1 , . . . , ri−1 , ri , ri+1 , . . . , rn ).
For Part (3), we use multilinearity and the alternating property. For ease of notation,
we’ll consider the case where B is obtained from A by adding a multiple of row 1 to
row 2:
det(B) = det(r1 , λr1 + r2 , r3 , . . . , rn )
= λ det(r1 , r1 , r3 , . . . , rn ) + det(r1 , r2 , r3 , . . . , rn )
= 0 + det(r1 , r2 , r3 , . . . , rn )
= det(A).
Example 1. Here we compute the determinant of a 2 × 2 matrix using the fact that
the determinant is a multilinear alternating mapping with value 1 on the identity
matrix.
$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det((a, b), (c, d)) = \det(a e_1 + b e_2,\ c e_1 + d e_2)$$
$$= a \det(e_1, c e_1 + d e_2) + b \det(e_2, c e_1 + d e_2)$$
$$= ac \det(e_1, e_1) + ad \det(e_1, e_2) + bc \det(e_2, e_1) + bd \det(e_2, e_2)$$
$$= 0 + ad \det(e_1, e_2) + bc \det(e_2, e_1) + 0 = ad \det(e_1, e_2) - bc \det(e_1, e_2)$$
$$= ad \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - bc \det\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= ad \cdot 1 - bc \cdot 1 = ad - bc.$$
Example 2. Here is an example of using row reduction to compute the determinant
of a matrix. Let
$$A = \begin{pmatrix} 1 & 2 & -2 \\ 9 & 4 & 0 \\ 2 & 2 & 4 \end{pmatrix}.$$
Using the proposition, we see that
$$\det(A) = \det\begin{pmatrix} 1 & 2 & -2 \\ 9 & 4 & 0 \\ 2 & 2 & 4 \end{pmatrix}
= \det\begin{pmatrix} 1 & 2 & -2 \\ 0 & -14 & 18 \\ 0 & -2 & 8 \end{pmatrix}
= -\det\begin{pmatrix} 1 & 2 & -2 \\ 0 & -2 & 8 \\ 0 & -14 & 18 \end{pmatrix}$$
$$= 2\det\begin{pmatrix} 1 & 2 & -2 \\ 0 & 1 & -4 \\ 0 & -14 & 18 \end{pmatrix}
= 2\det\begin{pmatrix} 1 & 2 & -2 \\ 0 & 1 & -4 \\ 0 & 0 & -38 \end{pmatrix}
= 2(-38)\det\begin{pmatrix} 1 & 2 & -2 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}$$
$$= 2(-38)\det\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= 2(-38) = -76.$$
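Here is a short Python sketch (my own, not from the notes) of this procedure: row-reduce, keeping track of swaps and of the scalars factored out of rows; exact fractions are used so the answer matches the hand computation.

from fractions import Fraction

def det(A):
    M = [[Fraction(x) for x in row] for row in A]
    n, sign, scale = len(M), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)                    # a zero column: rank < n, determinant 0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]   # swap: determinant changes sign
            sign = -sign
        scale *= M[col][col]                      # factor the pivot out of its row
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(col + 1, n):               # skewing: determinant unchanged
            M[r] = [x - M[r][col] * y for x, y in zip(M[r], M[col])]
    return sign * scale                           # det of the remaining unitriangular matrix is 1

print(det([[1, 2, -2], [9, 4, 0], [2, 2, 4]]))    # -76, as in Example 2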
105
Example 3.
$$\det\begin{pmatrix} 4 & 2 & -3 & 8 \\ 0 & 5 & 1 & 3 \\ 0 & 0 & 2 & 6 \\ 0 & 0 & 0 & 3 \end{pmatrix}
= (4 \cdot 5 \cdot 2 \cdot 3)\det\begin{pmatrix} 1 & 1/2 & -3/4 & 2 \\ 0 & 1 & 1/5 & 3/5 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$= (4 \cdot 5 \cdot 2 \cdot 3)\det\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
= (4 \cdot 5 \cdot 2 \cdot 3) \cdot 1 = 120.$$
A matrix like that in the previous example, which has only zero entries below the
diagonal, is called upper-triangular. So A ∈ Mn×n is upper-triangular if Aij = 0
whenever i > j.
Proposition 2. The determinant of an upper-triangular matrix is the product of its
diagonal elements.
Proof. Let A be upper-triangular, and let E be its reduced echelon form. From
Proposition 1, we know that det(A) = k det(E) for some non-zero constant k. Imagine
row-reducing an upper-triangular matrix, and you will see that E has a row of zeros
if and only if A has some diagonal entry equal to zero. If E has a row of zeros,
then det(E) = 0. To see this, suppose the rows of E are r1 , . . . , rn with rn = ~0. By
multilinearity, we have:
det(E) = det(r1 , . . . , rn−1 , ~0)
= det(r1 , . . . , rn−1 , 0 · ~0)
= 0 · det(r1 , . . . , rn−1 , ~0)
= 0.
So if A has a diagonal entry equal to 0, then det(E) = 0, which implies det(A) =
k det(E) = 0. So the result holds in this case.
Next, suppose that A has no diagonal entries equal to 0. Compute det(A) using
multilinearity:
$$\det(A) = \det\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & \dots & a_{1n} \\
0 & a_{22} & a_{23} & a_{24} & \dots & a_{2n} \\
0 & 0 & a_{33} & a_{34} & \dots & a_{3n} \\
0 & 0 & 0 & a_{44} & \dots & a_{4n} \\
 & & & & \ddots & \vdots \\
 & & & & & a_{nn}
\end{pmatrix}$$
$$= a_{11} \cdots a_{nn} \det\begin{pmatrix}
1 & a_{12}/a_{11} & a_{13}/a_{11} & a_{14}/a_{11} & \dots & a_{1n}/a_{11} \\
0 & 1 & a_{23}/a_{22} & a_{24}/a_{22} & \dots & a_{2n}/a_{22} \\
0 & 0 & 1 & a_{34}/a_{33} & \dots & a_{3n}/a_{33} \\
0 & 0 & 0 & 1 & \dots & a_{4n}/a_{44} \\
 & & & & \ddots & \vdots \\
 & & & & & 1
\end{pmatrix}$$
$$= a_{11} \cdots a_{nn} \det(I_n) = a_{11} \cdots a_{nn}.$$
Above, it is clear that once we get to the case of all 1s on the diagonal, we can
row-reduce the matrix to the identity by adding multiples of rows to other rows—
operations that do not change the determinant.
Proposition 3. Let A ∈ Mn×n . The following are equivalent:
1. det(A) ≠ 0,
2. rank(A) = n,
3. A is invertible, i.e., A has an inverse.
Proof. Given our algorithm for computing the inverse of a matrix, the equivalence
of parts 2 and 3 is evident. To show that parts 1 and 2 are equivalent, recall that by
Proposition 1, we have det(A) = k det(E) where E is the reduced echelon form of A
and k is a non-zero scalar. Thus, det(A) = 0 if and only if det(E) = 0. The rank of A
is n if and only if E = In , in which case det(A) = k 6= 0. The rank of A is strictly
less than n if and only if E has a row of zeros, which for a matrix in reduced echelon
form is equivalent to det(E) = 0.
107
To come:
1. Define the transpose, At of A by Atij := Aji . Then det At = det A, and thus, the
determinant is also the unique multilinear, alternating, normalized function on the
columns of a matrix.
2. The determinant is multiplicative: det(AB) = det(A) det(B).
3. The determinant may be calculated by “expanding” along any row or column.
4. We have the following formula for the determinant:
$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, A_{1\sigma(1)} \cdots A_{n\sigma(n)},$$
where Sn is the collection of all permutations of (1, . . . , n) and sgn(σ) is the sign
of the permutation σ (i.e., 1 if the permutation is formed by an even number of
flips and −1 if it is formed by an odd number of flips).
5. Over the real numbers, the determinant gives the signed volume of the parallelepiped spanned by the rows (or by the columns) of the matrix.
108
Week 8, Monday: det(A) = det(At ).
Elementary matrices. An n × n matrix is called an elementary matrix if it is
obtained from the identity matrix, In, through a single elementary row operation
(scaling a row by a nonzero scalar, swapping rows, or adding a multiple of one row to another).
Here is why elementary matrices are interesting: Let E be an n×n elementary matrix
corresponding to some row operation and let A be any n × k matrix. Then EA is the
matrix obtained from A by performing that row operation. Thus, you can perform
row operations through multiplication by elementary matrices.
Example. Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 0 & -1 & 2 \\ 1 & 5 & 6 & 7 \end{pmatrix}.$$
To find the elementary matrix that will subtract 3 times the first row of A from the
second row, we do that same operation to the identity matrix:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\xrightarrow{r_2 \to r_2 - 3r_1}
E = \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Multiplying by E on the left performs the same elementary row operation on A:
$$EA = \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 0 & -1 & 2 \\ 1 & 5 & 6 & 7 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -6 & -10 & -10 \\ 1 & 5 & 6 & 7 \end{pmatrix}.$$
Performing a sequence of row operations on a matrix is thus equivalent to multiplying
on the left by a sequence of elementary matrices. In particular, if Ã is the reduced
echelon form of A, then there are elementary matrices E1, . . . , Eℓ such that
$$\tilde{A} = E_\ell \cdots E_2 E_1 A.$$
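A small NumPy check (not from the notes) of this fact for the example above.

import numpy as np

A = np.array([[1, 2, 3, 4],
              [3, 0, -1, 2],
              [1, 5, 6, 7]])

E = np.eye(3)
E[1, 0] = -3          # apply r2 -> r2 - 3*r1 to the identity matrix

print(E @ A)          # same result as subtracting 3 times row 1 of A from row 2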
109
Determinant of the transpose. If A is an m × n matrix, recall that its transpose
is the matrix At defined by
(At )ij := Aji .
Thus, the rows of At are the columns of A.
Our goal now is to prove the amazing fact that
det(A) = det(At ).
Example. We have seen that
$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc.$$
Note that we also have
$$\det\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{t}\right)
= \det\begin{pmatrix} a & c \\ b & d \end{pmatrix} = ad - bc.$$
Recall that we can compute the determinant of A by performing row operations and
keeping track of swaps and scalings of rows. Once we have shown that det(A) =
det(At), it follows that, in order to compute the determinant of A, we may also use
column operations (again keeping track of swaps and scalings). That's because row
operations applied to At are the same as column operations applied to A.
To prove this fact about determinants of transposes, we need the following three results:
Theorem. Let A and B be n × n matrices. Then
det(AB) = det(A) det(B).
Proof. Upcoming Homework.
Proposition. Let A and B be n × n matrices. Then
1. (AB)t = B t At .
2. If A is invertible, then (At )−1 = (A−1 )t .
Proof. Upcoming homework.
Lemma. Let E be an elementary matrix. Then det(E) = det(Et) ≠ 0.
Proof. There are three cases to consider:
1. Suppose E is formed from In by swapping rows i and j. In this case, E t is also
formed from In by swapping rows i and j. So in this case, det(E) = −1 = det(E t ).
2. Suppose E is formed from In by scaling row i by λ 6= 0. In this case, E t is also
formed from In by scaling row i by λ. So in this case, det(E) = λ = det(E t ).
3. Suppose E is formed from In by adding λri to rj for rows i ≠ j. Then Et is
formed from In by adding λr′j to r′i, where r′k denotes the k-th row of Et. So in
this case, det(Et) = det(E) = det(In) = 1.
We can now prove our main result:
Theorem. Let A be an n × n matrix. Then det(A) = det(At ).
Proof. Let Ã be the reduced echelon form for A, and choose elementary matrices Ei
such that
Ã = Eℓ · · · E2E1A.     (20.1)
Taking determinants and using the fact that determinants preserve products:
det(Ã) = det(Eℓ) · · · det(E2) det(E1) det(A).
Since the determinant of an elementary matrix is nonzero, we get
det(A) = det(Eℓ)−1 · · · det(E2)−1 det(E1)−1 det(Ã).     (20.2)
Take transposes in equation (20.1):
Ãt = At E1t E2t · · · Eℓt.
Take determinants, solve for det(At), and use the fact that det(E) = det(Et) if E is
an elementary matrix:
det(At) = det(E1t)−1 det(E2t)−1 · · · det(Eℓt)−1 det(Ãt)
= det(E1)−1 det(E2)−1 · · · det(Eℓ)−1 det(Ãt)
= det(Eℓ)−1 · · · det(E2)−1 det(E1)−1 det(Ãt).     (20.3)
To finish the proof, we consider two cases. The first case is where rank(A) = n. This
is the case if and only if Ã = In, in which case Ãt = In as well, and hence det(Ã) =
det(Ãt) = det(In) = 1. From equations (20.2) and (20.3), it follows that
det(A) = det(Eℓ)−1 · · · det(E2)−1 det(E1)−1 = det(At).
The second case is where rank(A) < n. Since row and column rank are the same,
we have rank(At ) = rank(A) < n, as well. Recall that an n × n matrix has nonzero
determinant if and only if its rank is n (since its reduced echelon form will have a row
of 0s). So in this case, we deduce det(A) = det(At ) = 0 without needing to consider
elementary matrices.
The following incredible result immediately follows:
Corollary. The determinant is a multilinear, alternating, normalized function of the
columns of a square matrix.
112
Week 8, Wednesday: Permutation expansion of the determinant.
Permutation expansion of the determinant.
Definition. A permutation of a set X is a bijective mapping of X to itself. The
collection of all permutations of X is called the symmetric group on X. For each
nonnegative integer n, let [n] := {1, . . . , n}. The symmetric group on [n] is called the
symmetric group of degree n and denoted by Sn .
Example. Here are six elements of S3 (drawn in the original notes as two-row diagrams
sending 1, 2, 3 to their images): the identity, the three transpositions that swap two
of 1, 2, 3, and the two 3-cycles.
Note. Define the factorial of a natural number as follows: 0! = 1, and for each
integer n > 0, recursively define n! = n(n − 1)!. Thus, 1! = 1, 2! = 2 · 1, 3! = 3 · 2 · 1 =
6, 4! = 4 · 3 · 2 · 1 = 24, etc. Then the number of elements of Sn is n!, since we can
uniquely determine every permutation σ by first choosing one of n values for σ(1),
then any of the remaining n − 1 values for σ(2), then one of the remaining n − 2
values for σ(3), etc.
Definition. Let σ ∈ Sn. The permutation matrix corresponding to σ is the n × n
matrix Pσ whose i-th row is eσ(i). Another way of saying this is that Pσ is obtained by permuting the columns of the identity matrix, In, according to σ: put ej
in column σ(j).
113
Example. Let σ ∈ S3 be defined by σ(1) = 2, σ(2) = 3, and σ(3) = 1. Then
$$P_\sigma = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
Exercise. Let σ, τ ∈ Sn and let A be an n × n matrix.
1. If the rows of A are (r1, . . . , rn), then the i-th row of PσA is rσ(i). In other words,
multiplying on the left by Pσ permutes the rows of A in the same way that
the rows of In are permuted to form Pσ. We leave it as an exercise to the reader
to investigate the effect of multiplying A on the right by Pσ.
2. Pσeσ(i) = ei, and PσPτ = Pτ◦σ. (Note that the order of σ and τ has switched.)
Rook placements. Permutation matrices are exactly those that have a single 1 in
each row and in each column. Thus, if the 1s in a permutation matrix were replaced
by rooks in the game of chess, then no rook would be attacking another. We
sometimes call a permutation matrix a rook placement.
Definition. The sign of σ ∈ Sn is
sign(σ) = det(Pσ ) = ±1.
A permutation is even if its sign is 1 and odd if its sign is −1.
Every permutation matrix Pσ may be obtained from In through a sequence of transpositions of columns, i.e., a sequence in which each step consists of swapping two
columns. For example, the permutation matrix of a 3-cycle in S3 can be obtained from
two swaps.
Thus, every permutation matrix can be obtained as the product of permutation matrices corresponding to transpositions. Swapping two columns in a matrix changes the
sign of the determinant. Therefore, even though a permutation σ may be realized in
different ways as sequences of transpositions, the parity (evenness or oddness) of the
number of transpositions required is well-defined: the number is even if det(Pσ ) = 1
and odd if det(Pσ ) = −1.
Theorem. Let A be an n × n matrix. Then
$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}.$$
Example. Consider the case n = 3. Then
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Each term A1σ(1) A2σ(2) · · · Anσ(n) in the formula in the theorem should be thought of
as the product of the entries corresponding to a rook placement. The permutations,
rook placements, and corresponding summands appear in Figure 21.1.
Proof of permutation formula for the determinant. We want to compute
$$\det(A_{11}e_1 + A_{12}e_2 + \cdots + A_{1n}e_n,\ \ \dots,\ \ A_{n1}e_1 + A_{n2}e_2 + \cdots + A_{nn}e_n).$$
Each of the n arguments in the above expression is a sum of n terms of the
form A_{ij}e_j. Using the multilinear properties of the determinant, when we expand
the above expression, we get n^n terms, each of the form
$$A_{1j_1} A_{2j_2} \cdots A_{nj_n} \det(e_{j_1}, e_{j_2}, \dots, e_{j_n}).$$
If any pair of the e_{j_k} is the same, this term will evaluate to 0. Thus, for the nonzero
terms, e_{j_1}, e_{j_2}, . . . , e_{j_n} must be some permutation of e_1, . . . , e_n. We then have
$$\det(e_{j_1}, e_{j_2}, \dots, e_{j_n}) = \pm 1,$$
depending on the sign of the permutation σ defined by
σ(1) = j_1, σ(2) = j_2, . . . , σ(n) = j_n,
and we can write
$$A_{1j_1} A_{2j_2} \cdots A_{nj_n} \det(e_{j_1}, e_{j_2}, \dots, e_{j_n}) = \operatorname{sign}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}.$$
115
[Figure 21.1: Computing the determinant of the 3 × 3 matrix A = (aij) via rook
placements. Each of the six permutations in S3 is shown together with its rook
placement in A and the corresponding signed summand. The determinant is the
sum of the six terms:
a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.]
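Here is a direct Python implementation of the permutation expansion (my own sketch, not from the notes). It computes sign(σ) by counting inversions, which gives the same parity as counting transpositions.

from itertools import permutations
from math import prod

def sign(sigma):
    inversions = sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma))
                       if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(A):
    n = len(A)
    return sum(sign(sigma) * prod(A[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

print(det([[1, 2, 3], [2, 0, 1], [1, 1, 1]]))   # 3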
116
Week 8, Friday: Permutation, Laplace expansions. Existence
and uniqueness of the determinant.
Permutation expansion revisited. If A is an n × n matrix, we saw last time that
$$\det A = \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\, A_{1\sigma(1)} A_{2\sigma(2)} \cdots A_{n\sigma(n)}.$$
Let’s look at the proof again in the case n = 3. The i-th row vector of A is
ri = ai1 e1 + ai2 e2 + ai3 e3 .
To compute the determinant of A we start by expanding using multilinearity:
det(A) = det(r1 , r2 , r3 )
= det(a11 e1 + a12 e2 + a13 e3 , a21 e1 + a22 e2 + a23 e3 , a31 e1 + a32 e2 + a33 e3 )
= det(a11 e1 , a21 e1 + a22 e2 + a23 e3 , a31 e1 + a32 e2 + a33 e3 )
+ det(a12 e2 , a21 e1 + a22 e2 + a23 e3 , a31 e1 + a32 e2 + a33 e3 )
+ det(a13 e3 , a21 e1 + a22 e2 + a23 e3 , a31 e1 + a32 e2 + a33 e3 )
= det(a11 e1 , a21 e1 , a31 e1 + a32 e2 + a33 e3 )
+ det(a11 e1 , a22 e2 , a31 e1 + a32 e2 + a33 e3 )
+ det(a11 e1 , a23 e3 , a31 e1 + a32 e2 + a33 e3 )
+ det(a12 e2 , a21 e1 , a31 e1 + a32 e2 + a33 e3 )
+ det(a12 e2 , a22 e2 , a31 e1 + a32 e2 + a33 e3 )
+ det(a12 e2 , a23 e3 , a31 e1 + a32 e2 + a33 e3 )
+ det(a13 e3 , a21 e1 , a31 e1 + a32 e2 + a33 e3 )
+ det(a13 e3 , a22 e2 , a31 e1 + a32 e2 + a33 e3 )
+ det(a13 e3 , a23 e3 , a31 e1 + a32 e2 + a33 e3 )
117
There is one more step to go in the complete expansion, at which point, we’ll have 27
terms. For completeness, I’ll list these all on the next page.
118
= det(a11 e1 , a21 e1 , a31 e1 )
+ det(a11 e1 , a21 e1 , a32 e2 )
+ det(a11 e1 , a21 e1 , a33 e3 )
+ det(a11 e1 , a22 e2 , a31 e1 )
+ det(a11 e1 , a22 e2 , a32 e2 )
+ det(a11 e1 , a22 e2 , a33 e3 )
+ det(a11 e1 , a23 e3 , a31 e1 )
+ det(a11 e1 , a23 e3 , a32 e2 )
+ det(a11 e1 , a23 e3 , a33 e3 )
+ det(a12 e2 , a21 e1 , a31 e1 )
+ det(a12 e2 , a21 e1 , a32 e2 )
+ det(a12 e2 , a21 e1 , a33 e3 )
+ det(a12 e2 , a22 e2 , a31 e1 )
+ det(a12 e2 , a22 e2 , a32 e2 )
+ det(a12 e2 , a22 e2 , a33 e3 )
+ det(a12 e2 , a23 e3 , a31 e1 )
+ det(a12 e2 , a23 e3 , a32 e2 )
+ det(a12 e2 , a23 e3 , a33 e3 )
+ det(a13 e3 , a21 e1 , a31 e1 )
+ det(a13 e3 , a21 e1 , a32 e2 )
+ det(a13 e3 , a21 e1 , a33 e3 )
+ det(a13 e3 , a22 e2 , a31 e1 )
+ det(a13 e3 , a22 e2 , a32 e2 )
+ det(a13 e3 , a22 e2 , a33 e3 )
+ det(a13 e3 , a23 e3 , a31 e1 )
+ det(a13 e3 , a23 e3 , a32 e2 )
+ det(a13 e3 , a23 e3 , a33 e3 )
119
Use linearity to pull out the constants:
= a11 a21 a31 det(e1 , e1 , e1 )
+ a11 a21 a32 det(e1 , e1 , e2 )
+ a11 a21 a33 det(e1 , e1 , e3 )
+ a11 a22 a31 det(e1 , e2 , e1 )
+ a11 a22 a32 det(e1 , e2 , e2 )
+ a11 a22 a33 det(e1 , e2 , e3 )
+ a11 a23 a31 det(e1 , e3 , e1 )
+ a11 a23 a32 det(e1 , e3 , e2 )
+ a11 a23 a33 det(e1 , e3 , e3 )
+ a12 a21 a31 det(e2 , e1 , e1 )
+ a12 a21 a32 det(e2 , e1 , e2 )
+ a12 a21 a33 det(e2 , e1 , e3 )
+ a12 a22 a31 det(e2 , e2 , e1 )
+ a12 a22 a32 det(e2 , e2 , e2 )
+ a12 a22 a33 det(e2 , e2 , e3 )
+ a12 a23 a31 det(e2 , e3 , e1 )
+ a12 a23 a32 det(e2 , e3 , e2 )
+ a12 a23 a33 det(e2 , e3 , e3 )
+ a13 a21 a31 det(e3 , e1 , e1 )
+ a13 a21 a32 det(e3 , e1 , e2 )
+ a13 a21 a33 det(e3 , e1 , e3 )
+ a13 a22 a31 det(e3 , e2 , e1 )
+ a13 a22 a32 det(e3 , e2 , e2 )
+ a13 a22 a33 det(e3 , e2 , e3 )
+ a13 a23 a31 det(e3 , e3 , e1 )
+ a13 a23 a32 det(e3 , e3 , e2 )
+ a13 a23 a33 det(e3 , e3 , e3 )
120
Now we use the alternating property of the determinant. If any row is repeated, the
determinant is 0. Getting rid of those terms leaves:
det(A) = a11 a22 a33 det(e1 , e2 , e3 )
+ a11 a23 a32 det(e1 , e3 , e2 )
+ a12 a21 a33 det(e2 , e1 , e3 )
+ a12 a23 a31 det(e2 , e3 , e1 )
+ a13 a21 a32 det(e3 , e1 , e2 )
+ a13 a22 a31 det(e3 , e2 , e1 ).
Next notice that each determinant appearing above is the determinant of a permutation matrix. For instance, the term
a12 a23 a31 det(e2 , e3 , e1 )
contains det(e2 , e3 , e1 ), which is the determinant of the permutation matrix for the
permutation σ(1) = 2, σ(2) = 3, and σ(3) = 1. We have
a12 a23 a31 det(e2 , e3 , e1 ) = a1σ(1) a2σ(2) a3σ(3) det(Pσ )
= a1σ(1) a2σ(2) a3σ(3) sign(σ).
In this way, the six terms in the sum can be expressed as follows:
X
det(A) =
det(Pσ )a1σ(1) a2σ(2) a3σ(3)
σ∈S3
=
X
sign(Pσ )a1σ(1) a2σ(2) a3σ(3) .
σ∈S3
Laplace expansion of the determinant. Let A be an n × n matrix. For each i, j ∈
{1, 2, . . . , n}, define Aij to be the matrix formed by removing the i-th row and j-th
column from A. Fix k ∈ {1, 2, . . . , n}. Then
$$\det(A) = \sum_{j=1}^{n} (-1)^{k+j}\, a_{kj}\, \det(A_{kj}),$$
where akj denotes the (k, j) entry of A. This expresses det(A) in terms of an alternating
sum of determinants of (n − 1) × (n − 1) matrices. We call this expanding det(A)
along the k-th row. Applying the formula recursively leads to a complete evaluation
of det(A). Since det(A) = det(At), you can also calculate the determinant by recursively
expanding along columns, too.
121
Example. Let
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Let's calculate the determinant by expanding along the first row:
$$\det(A) = 1 \cdot \det\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}
- 2 \cdot \det\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
+ 3 \cdot \det\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}
= (-1) - 2(1) + 3(2) = 3.$$
To check, let's expand along the second row instead (note the signs):
$$\det(A) = -2 \cdot \det\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix}
+ 0 \cdot \det\begin{pmatrix} 1 & 3 \\ 1 & 1 \end{pmatrix}
- 1 \cdot \det\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}
= -2(-1) + 0(-2) - 1(-1) = 3.$$
Finally, let's expand along the third column:
$$\det(A) = 3 \cdot \det\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix}
- 1 \cdot \det\begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}
+ 1 \cdot \det\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}
= 3(2) - 1(-1) + 1(-4) = 3.$$
Note: if your matrix has a particular row or column with a lot of 0s in it, you might
want to expand along that row or column, since a lot of the terms will be 0. For
example, to compute
$$\det\begin{pmatrix} 1 & 3 & 0 \\ 3 & 2 & 3 \\ 1 & 4 & 0 \end{pmatrix},$$
expand along the third column:
$$0\,(\text{blah}) - 3 \det\begin{pmatrix} 1 & 3 \\ 1 & 4 \end{pmatrix} + 0\,(\text{blah}) = -3(1) = -3.$$
The "blah"s are there instead of explicit determinants since they are being multiplied
by 0. Their exact values don't matter, so we don't need to waste time calculating
them. We will not prove the formula for the Laplace expansion. Its proof is very similar to
that of the permutation expansion.
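A recursive Python sketch (not from the notes) of the Laplace expansion along the first row; it skips zero entries, as suggested in the note above.

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                                     # zero entry: its term contributes nothing
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det(minor)        # (-1)**j matches (-1)**(1 + (j+1))
    return total

print(det([[1, 2, 3], [2, 0, 1], [1, 1, 1]]))   # 3
print(det([[1, 3, 0], [3, 2, 3], [1, 4, 0]]))   # -3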
122
Multiplicative property of the determinant. Let A and B be n × n matrices.
We would like to prove det(AB) = det(A) det(B). First suppose that rank(B) < n,
i.e., det(B) = 0. In that case, we've seen in
homework that rank(AB) < n, and hence det(AB) = 0 = det(A) det(B). So now
assume det(B) ≠ 0, and define the function
$$d(A) = \frac{\det(AB)}{\det(B)}.$$
We have seen earlier that det(A) can be computed by applying row operations to A:
1. Swapping: if A′ is formed by swapping rows of A, then det(A) = − det(A′).
2. Scaling: if A′ is formed by scaling a row of A by λ ∈ F, then λ det(A) = det(A′).
So if λ ≠ 0, then det(A) = (1/λ) det(A′).
3. Skewing: if A′ is formed by adding a multiple of one row of A to another row,
then det(A) = det(A′).
If rank(A) < n, the determinant is 0. Otherwise, the reduced echelon form of A is In ,
and det(In ) = 1. So we can compute det(A) by reducing A to reduced echelon form,
keeping track of swaps and scalings.
How does d(A) interact with row operations? Let E be the elementary matrix corresponding to some row operation. Then EA is the matrix that is formed by applying
the row operation to A. We have
$$d(EA) = \frac{\det((EA)B)}{\det(B)}.$$
By associativity of matrix multiplication, we have
(EA)B = E(AB).
This means that (EA)B is the matrix obtained from AB by performing the row
operation encoded in E to AB. If the row operation is a swap, then
$$d(EA) = \frac{\det(E(AB))}{\det(B)} = -\frac{\det(AB)}{\det(B)} = -d(A);$$
if it's a scaling by λ, then
$$d(EA) = \frac{\det(E(AB))}{\det(B)} = \lambda\,\frac{\det(AB)}{\det(B)} = \lambda\, d(A);$$
and if it's a skewing, then
$$d(EA) = \frac{\det(E(AB))}{\det(B)} = \frac{\det(AB)}{\det(B)} = d(A).$$
Since d(A) and det(A) behave the same with respect to row operations, and since
$$d(I_n) = \frac{\det(I_n B)}{\det(B)} = \frac{\det(B)}{\det(B)} = 1,$$
we have d(A) = det(A). In other words,
$$d(A) = \frac{\det(AB)}{\det(B)} = \det(A),$$
which implies det(AB) = det(A) det(B).
Existence and uniqueness of the determinant. We have given three different
methods for calculating the determinant. Surprisingly, however, if you carefully examine what we have done so far, you will see that we haven’t actually proved that the
determinant exists. Let’s consider the different methods we have used to calculate
the determinant.
1. Our first method of calculating the determinant (and the fastest) came from showing the determinant, if it exists, must behave a certain way with respect to row
operations. We saw that the determinant can be calculated by keeping track of
swaps and scalings while reducing a matrix to reduced echelon form. However,
even though the reduced echelon form of a matrix is unique, the path used to get
to the reduced echelon form—the order and types of row operations used to get
there—is not unique. So it is conceivable that two different paths would lead to
two different values for the determinant, and hence the determinant would not be
well-defined.
2. Our second method of calculating the determinant was the permutation expansion.
That method requires calculating the sign of a permutation, which is defined as
the determinant of the corresponding permutation matrix. It follows that if we
could show that the determinant for permutation matrices is well-defined, then we
have proved existence of determinants for all matrices. This is a viable strategy.
A rewording of the problem is this: show that the parity of the number of transpositions needed to form a given permutation is well-defined. Each permutation
can be created by repeatedly swapping pairs of elements, and in fact there are
infinitely many ways of doing that. The claim is that the evenness or oddness of the
number of swaps used is the same for all of them. This is a combinatorial problem
that can be solved and which then implies the existence of the determinant.
124
3. Our third method of calculating the determinant is the Laplace expansion. It can
be used to give a rather straightforward proof of the existence of the determinant,
which we now outline. Start by defining a function d : Mn×n → F recursively
that mimics the Laplace expansion along first rows:
$$d(A) := \sum_{j=1}^{n} (-1)^{1+j}\, a_{1j}\, d(A_{1j}),$$
where a1j is the (1, j) entry of A and A1j is A with its first row and j-th column
removed; if A = [a] is a 1 × 1 matrix, then define d(A) = a. Then d is well-defined—there
are no choices along the way, so the result is unique. The grungy but straightforward part is to now show that d is a multilinear, alternating, normalized function
of the rows of A. Let's assume we've done that. We then know there is at least
one determinant function. However, we have already seen that any multilinear,
alternating, normalized function of the rows of a matrix has the value assigned to
it by any of the three methods above. So the determinant is unique.
A consequence of this proof is that the parity of a permutation is well-defined. Another consequence could be stated involving the number of swaps and the product
of the scalings along the way to the reduced echelon form.
125
Week 9, Monday: Geometric interpretation of the determinant.
Geometric interpretation of the determinant
In this worksheet we will explore the geometric meaning of the determinant when
working over R.
Let v~1 = (x1, y1) and v~2 = (x2, y2) be linearly independent vectors in R^2, and consider
the parallelogram determined by them, as in the picture below.
[Figure: the parallelogram spanned by v~1 and v~2.]
Let A(v~1, v~2) be the area of this parallelogram. Note that if v~1 and v~2 are linearly
dependent, they determine what we might call a degenerate parallelogram whose area
can be considered to be 0. Thus, we have defined a function
A : R^2 × R^2 → R.
1. Let k ∈ R. What is A(k v~1 , v~2 )? (Be careful with the case k < 0.)
2. Let k ∈ R. What is A(v~1 , v~2 + k v~1 )? (You can do this by drawing the correct
picture.)
3. What is A((1, 0), (0, 1))?
4. You may have noted by now that the function A almost behaves like a determinant
function (following Definition 2.1 in Chapter Four, Section I of the book). What
is the issue?
126
5. We can define the signed area of the parallelogram determined by (v~1, v~2) as follows. For this definition, the order of v~1 and v~2 matters. If v~1 and v~2 are linearly
independent, they are both nonzero and non-parallel. Let θ be the angle from v~1
to v~2, measured counterclockwise (in radians, and between 0 and 2π). The signed
area SA is defined as
$$SA(\vec{v}_1, \vec{v}_2) = \begin{cases} A(\vec{v}_1, \vec{v}_2) & \text{if } \theta < \pi \\ -A(\vec{v}_1, \vec{v}_2) & \text{if } \theta > \pi. \end{cases}$$
Prove that SA(v~1, v~2) = det(v~1, v~2), where the latter means take the determinant
of the 2 × 2 matrix with rows as given.
Note that this formula also works if we take the matrix with columns given by v~1
and v~2.
6. Consider the linear transformation f : R^2 → R^2 given by multiplication by the
matrix
$$\begin{pmatrix} 1 & 1 \\ -2 & 0 \end{pmatrix}.$$
What happens to the square with vertices (0, 0), (1, 0), (0, 1), (1, 1) under this
transformation? How does the area change?
Let (v~1, . . . , v~n) be an n-tuple of vectors in R^n. The parallelepiped formed by (v~1, . . . , v~n)
is the set
$$\{\, t_1\vec{v}_1 + \cdots + t_n\vec{v}_n \mid t_1, \dots, t_n \in [0, 1] \,\}.$$
When n = 2, this gives precisely the parallelogram we have been considering. In R3 ,
we get a solid prism as long as the vectors are linearly independent.
We define the volume of a parallelepiped determined by (v~1 , . . . , v~n ) as the absolute
value of the determinant of the n×n matrix whose columns are given by those vectors.
127
7. Using properties of the determinant and your intuition about how a volume should
behave, argue why this definition makes sense.
8. Let f : R^n → R^n be the linear transformation given by multiplication by an n × n
matrix A. How does the volume of a box change under this transformation?
128
Week 9, Wednesday: Eigenvalues.
Eigenstuff
Definition. Let f : V → V be a linear transformation of a vector space V over F .
A nonzero vector v ∈ V is an eigenvector for f with eigenvalue λ ∈ F if
f (v) = λv.
Here is why we like eigenvectors: Choose v as part of an ordered basis α for V , and
consider the matrix [f ]αα representing f with respect to the basis α for the domain
and codomain (which are both V). If v is the j-th element in α, then the j-th column
of [f]αα is λej, a scalar multiple of the j-th standard basis vector.
Example. Let
$$A = \begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix}.$$
It turns out that (2, 3) and (1, 2) are eigenvectors for fA with eigenvalues 2 and 3,
respectively:
$$\begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 4 \\ 6 \end{pmatrix} = 2\begin{pmatrix} 2 \\ 3 \end{pmatrix},
\qquad
\begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 3 \\ 6 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
The linear transformation corresponding to A over R is
f : R^2 → R^2
(x, y) ↦ (−x + 2y, −6x + 6y).
Find the matrix representing fA with respect to the ordered basis
α = ⟨(2, 3), (1, 2)⟩.
129
To do this we write the image of each vector in α as a linear combination of the
vectors in α and pull off the coefficients to create columns:
fA(2, 3) = 2(2, 3) = 2 · (2, 3) + 0 · (1, 2)
fA(1, 2) = 3(1, 2) = 0 · (2, 3) + 3 · (1, 2).
Hence,
$$[f_A]^\alpha_\alpha = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = \operatorname{diag}(2, 3).$$
And that's the whole point: a basis of eigenvectors gives a matrix representative that
is diagonal, which is the simplest type of matrix to think about. Let's think
abstractly about what just happened. The matrix A is the matrix representing fA
with respect to the standard basis, and the matrix
$$D = \operatorname{diag}(2, 3) = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}$$
represents fA with respect to the basis α. Let φα be the mapping that takes coordinates with respect to α. We get the commutative diagram:
$$\begin{array}{ccc}
\mathbb{R}^2 & \xrightarrow{\ A\ } & \mathbb{R}^2 \\
\phi_\alpha \downarrow \wr & & \wr \downarrow \phi_\alpha \\
\mathbb{R}^2 & \xrightarrow{\ D\ } & \mathbb{R}^2
\end{array}$$
Along the top, A sends the eigenvector (2, 3) to 2(2, 3) and (1, 2) to 3(1, 2); along the
bottom, D sends (1, 0) to 2(1, 0) and (0, 1) to 3(0, 1).
The matrix for φα would be a bit of a chore to write down. Its j-th column would be
the image of ej. So we would have to write each ej as a linear combination of the
basis vectors in α. However, the inverse of φα is easy to write down. Take a look at
the commutative diagram. We can see that, by construction of φα, we have
φα^{-1}(1, 0) = (2, 3) and φα^{-1}(0, 1) = (1, 2).
So the matrix for φα^{-1} is
$$P = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}.$$
So the matrix for φα is P^{-1}, and another way to write the commutative diagram is
$$\begin{array}{ccc}
\mathbb{R}^2 & \xrightarrow{\ A\ } & \mathbb{R}^2 \\
P^{-1} \downarrow \wr & & \wr \downarrow P^{-1} \\
\mathbb{R}^2 & \xrightarrow{\ D\ } & \mathbb{R}^2.
\end{array}$$
From contemplating this diagram, we see that
$$D = P^{-1} A P.$$
Important summary: having found eigenvectors (2, 3) and (1, 2), we place those
eigenvectors as columns in a matrix P, and then P^{-1}AP is a diagonal matrix with
the corresponding eigenvalues on the diagonal.
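A quick NumPy check (not from the notes) of this summary for the example above.

import numpy as np

A = np.array([[-1, 2], [-6, 6]])
P = np.array([[2, 1], [3, 2]])        # columns are the eigenvectors (2, 3) and (1, 2)

print(np.linalg.inv(P) @ A @ P)       # [[2. 0.]
                                      #  [0. 3.]]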
131
Week 9, Friday: Eigenvalues, continued.
Recall from last time: an eigenvector for a linear transformation f : V → V is a
nonzero vector v ∈ V such that
f (v) = λv
for some λ ∈ F . In that case, λ is called an eigenvalue for f .
Definition. A linear transformation f : V → V is diagonalizable if there exists an
ordered basis α of V such that [f ]αα = diag(λ1 , . . . , λn ).
Note. If α = hv1 , . . . , vn i, in the above definition, then we have
f (vi ) = 0 · v1 + · · · + 0 · vi−1 + λi · vi + 0 · vi+1 + · · · + 0 · vn
= λi vi ,
and thus, vi is an eigenvector with eigenvalue λi for each i. It follows that
Proposition. A linear transformation f : V → V is diagonalizable if and only if it
has a basis of eigenvectors.
Example. Not all linear transformations of a vector space to itself are diagonalizable.
For instance, consider the linear transformation f : R^2 → R^2 that is rotation of the
plane by 90◦, having matrix
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
[Figure: rotation by 90◦, sending (1, 0) to (0, 1).]
There is no nonzero v ∈ R^2 such that Av = λv for some real λ. (The matrix is
diagonalizable over C, though. Can you find two eigenvectors? Don't get your hopes
up—there are matrices that are not diagonalizable over C.)
132
Suppose f : F n → F n is a linear transformation, and let A be the matrix corresponding to f (this is the matrix for f with respect to the standard basis for F n ). Suppose
we can find a basis α = hv1 , . . . , vn i of eigenvectors for f with corresponding, not necessarily distinct, eigenvalues λ1 , . . . , λn . Let P be the matrix with columns v1 , . . . , vn .
Then, as we saw last time,
P −1 AP = diag(λ1 , . . . , λn ).
Definition. Two n × n matrices A and B over F are similar or conjugate if there exists an
invertible matrix P such that P^{-1}AP = B.
Exercise. The reader should verify that similarity is an equivalence relation.
Finding eigenvectors and eigenvalues. Let A ∈ Mn×n (F ) with corresponding
linear function
fA : F n → F n
v 7→ Av.
The following argument is of central importance in the story of eigenstuff: We are
looking for nonzero v ∈ F n and any λ ∈ F such that Av = λv. We have
$$Av = \lambda v \iff (A - \lambda I_n)v = 0 \iff v \in \ker(A - \lambda I_n).$$
This says that:
λ ∈ F is an eigenvalue for A if and only if ker(A − λIn) ≠ {0}.
So we would like to determine those λ for which the kernel of A − λIn is nontrivial,
for which the following is key:
$$\ker(A - \lambda I_n) \neq \{0\} \iff \operatorname{rank}(A - \lambda I_n) < n \iff \det(A - \lambda I_n) = 0.$$
Definition. The characteristic polynomial of A is
$$p_A(x) = \det(A - xI_n).$$
We have just seen that λ ∈ F is an eigenvalue for A if and only if it is a zero of the
characteristic polynomial for A, i.e., p(λ) = 0.
Example. Let
$$A = \begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix}.$$
The characteristic polynomial of A is
$$p_A(x) = \det\left(\begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix} - x\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right)
= \det\begin{pmatrix} -1 - x & 2 \\ -6 & 6 - x \end{pmatrix}$$
$$= (-1 - x)(6 - x) - 2(-6) = x^2 - 5x + 6 = (x - 2)(x - 3).$$
Thus, pA(x) = 0 if and only if x = 2 or x = 3. So the eigenvalues of A are 2 and 3.
Recall that our goal is to diagonalize A by finding a basis of eigenvectors. That’s
not always possible, but we can try. The first step is to compute the zeros of the
characteristic polynomial, pA (x). This tells us the eigenvalues for A. We now need to
find the eigenvectors to go along with these eigenvalues. Recall that nonzero v ∈ F n
is an eigenvector for A with eigenvalue λ if and only if v ∈ ker(A − λIn).
Definition. Let λ be an eigenvalue of the n × n matrix A over F. Then the eigenspace
for λ is
Eλ = E(A)λ = {v ∈ F^n : Av = λv} = ker(A − λIn).
The eigenspace, being the kernel of a matrix, is a linear subspace of F n .
The second step in trying to diagonalize A is to compute a basis for each eigenspace Eλ .
134
Example. We have seen that the eigenvalues for
$$A = \begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix}$$
are 2 and 3. Let's compute the corresponding eigenspaces.
λ = 2:
$$A - 2I_2 = \begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
= \begin{pmatrix} -3 & 2 \\ -6 & 4 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & -2/3 \\ 0 & 0 \end{pmatrix}.$$
Hence,
$$\ker(A - 2I_2) = \left\{ \left(\tfrac{2}{3}y,\ y\right) : y \in \mathbb{R} \right\}.$$
For a basis we could take (2/3, 1), or easier, (2, 3).
λ = 3:
$$A - 3I_2 = \begin{pmatrix} -1 & 2 \\ -6 & 6 \end{pmatrix} - \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}
= \begin{pmatrix} -4 & 2 \\ -6 & 3 \end{pmatrix}
\longrightarrow \begin{pmatrix} 1 & -1/2 \\ 0 & 0 \end{pmatrix}.$$
Hence,
$$\ker(A - 3I_2) = \left\{ \left(\tfrac{1}{2}y,\ y\right) : y \in \mathbb{R} \right\}.$$
For a basis we could take (1/2, 1), or easier, (1, 2).
Thus, we have found two eigenvectors (2, 3) and (1, 2). It turns out that eigenvectors
for distinct eigenvalues are always linearly independent (we’ll see this later). Hence,
we have found a basis of eigenvectors. Thus, A is diagonalizable, as illustrated at the
beginning of this lecture.
135
Week 10, Monday: Diagonalization.
Before getting started, we make an observation which we should have probably already
mentioned:
Proposition. Let A, B be n × n matrices representing a linear function f : V → V
with respect to different bases. Then their characteristic polynomials are the same:
pA (x) = pB (x).
Proof. We have A = Q−1 BQ for some n × n matrix Q. Then
pA (x) = det(A − xIn )
= det(Q−1 BQ − xIn )
= det(Q−1 BQ − xQ−1 In Q)
= det(Q−1 BQ − Q−1 (xIn )Q)
x is a scalar
−1
= det(Q (B − xIn )Q)
= det(Q−1 ) det(B − xIn ) det(Q)
= det(B − xIn ).
For the last step, recall that det(Q−1 = det(Q)−1 follows from multiplicativity of the
determinant:
1 = det(In ) = det(Q−1 Q) = det(Q−1 ) det(Q).
Thus, it makes sense to talk about the characteristic polynomial of a linear
transformation: it the characteristic polynomial of any matrix representative of
the transformation.
Perhaps a better way to see this is to recall that the determinant of a linear transformation f : V → V is well-defined. That is, if A and B are matrices representing f
with respect to different bases, then det(A) = det(B). (Reason: there will exist a
matrix Q such that Q−1 AQ = B.) So we could define the characteristic polynomial
for f to be
pf (x) := det(f − x idV ),
the determinant of the linear transformation f − x idV .
136
Algorithm for diagonalizing A ∈ Mn×n (F ).
1. Find the eigenvalues of A as the zeros of its characteristic polynomial, pA (x) =
det(A − xIn ).
2. For each eigenvalue λ, compute a basis for the eigenspace Eλ = ker A − λIn .
3. The matrix A is diagonalizable if and only of the total number of eigenvectors in
the bases found in the previous step is n. If so, these vectors combined form a basis
for F n . Create a matrix P whose columns are these vectors. Then P −1 AP = D,
where D is a diagonal matrix with the eigenvalues along the diagonal corresponding
to a commutative diagram of linear functions:
Fn
Fn
∼
P −1
A
∼
Fn
D
P −1
F n.
The matrix P −1 corresponds to taking coordinates with respect to the basis of
eigenvalues.
An n × n matrix A is diagonalizable if and only if it has n linearly independent
eigenvectors. In step 3 of the diagonalization algorithm we combine bases for different
eigenspaces. It turns out that the resulting collection of vectors is automatically
linearly independent:
Proposition. Let V be any vector space, and let f : V → V be a linear transformation. Let v1 , . . . , vk ∈ V be eigenvectors for f with corresponding eigenvalues λi :
f (vi ) = λi vi
for i = 1, . . . , k. Suppose λ1 , . . . , λk are distinct. Then v1 , . . . , vk are linearly independent.
Proof. We will prove this by induction on k. The case k = 1 is OK since, by definition, an eigenvector is a nonzero vector. Suppose v1 , . . . , vk−1 are linearly independent
for some k > 1 and that
a1 v1 + · · · + ak vk = 0
137
for some ai ∈ F . Apply the linear transformation f − λk idV to this equality to get
(f − λk idV )(a1 v1 + · · · + ak vk ) = (f − λk idV )(0) = 0
⇒
⇒
⇒
f (a1 v1 + · · · + ak vk ) − λk idV (a1 v1 + · · · + ak vk ) = 0
(a1 λ1 v1 + · · · + ak λk vk ) − (a1 λk v1 + · · · + ak λk vk ) = 0
a1 (λ1 − λk )v1 + · · · + ak−1 (λk−1 − λk )vk = 0.
Since v1 , . . . , vk−1 are linearly independent, all the coefficients are zero:
a1 (λ1 − λk ) = · · · = ak−1 (λk−1 − λk ) = 0.
Since the λi are distinct, this implies a1 = · · · = ak−1 = 0. Therefore, the original
equation, a1 v1 + · · · + ak vk = 0 becomes ak vk = 0. Since vk is an eigenvector, it is
nonzero. Hence, ak , as well.
138
Week 10, Wednesday: Algebraic and geometric multiplicity.
Jordan form.
Last time we proved the following:
Proposition. Let V be any vector space, and let f : V → V be a linear transformation. Let v1 , . . . , vk ∈ V be eigenvectors for f with corresponding eigenvalues λi :
f (vi ) = λi vi
for i = 1, . . . , k. Suppose λ1 , . . . , λk are distinct. Then v1 , . . . , vk are linearly independent.
It immediately follows that
Corollary. Suppose dim V = n and f : V → V is a linear transformation. Then if f
has n distinct eigenvalues, it is diagonalizable.
Proof. From the proposition, V would have n linearly independent eigenvectors.
Let α be an ordered basis consisting of those eigenvectors. Then [f ]αα is diagonal. Warning. The converse to the corollary is not true. For instance, consider the
identity function on F n . Its matrix is In , which is already diagonal, and 1 is its only
eigenvalue:
pIn (x) = det(In − xIn ) = det ((1 − x)In ) = (1 − x)n det(In ) = (1 − x)n .
When does a transformation fail to be diagonalizable? We now introduce a
sequence of ideas that will allow us to answer this question.
Example. Earlier, we considered the linear transformation R2 → R2 given by the
matrix
0 −1
A=
.
1 0
139
Geometrically, it rotates the plane counterclockwise by 90◦ and hence has no eigenvectors. Its characteristic polynomial is
−x −1
pA (x) = det
= x2 + 1.
1 −x
The equation x2 + 1 = 0 has no solutions over R, and hence, the transformation has
no eigenvalues.
Now consider the linear transformation f : C2 → C2 given by the same matrix A.
Over C we can solve x2 + 1 = 0 to find two eigenvalues, ±i. Each of these will
have at least one eigenvector, and eigenvectors for distinct eigenvalues are linearly
independent. Since C2 has dimension 2, that means we will get a basis of eigenvectors.
Let’s compute a basis for the eigenspace for i:
−i −1
1 −i
1 −i
r1 ↔r2
r2 ↔r2 +ir1
A − iI2 =
−−−→
−−−−−−→
.
1 −i
−i −1
0 0
So the kernel of A − iI2 is {(iy, y) : y ∈ C}, which has basis {(i, 1)}. Similarly, the
eigenspace for −i has basis {(−i, 1)}. Check:
i
0 −1
i
−1
i
A
=
=
=i
1
1 0
1
i
1
A
−i
1
=
0 −1
1 0
−i
1
=
−1
−i
= −i
−i
1
.
Letting
i −i
1 1
P =
,
we get
P
−1
AP =
i 0
0 −i
.
This example illustrated one obstacle to diagonalization: the characteristic polynomial may not have zero in the field F .
Definition. A polynomial p ∈ F [x] splits over F if there exist c, λ1 , . . . , λn ∈ F such
that
p(x) = c(x1 − λ1 ) · · · (x − λn ).
(Equivalently, p(x) had n zeros, λ1 , . . . , λn , in F .) These λi need not be distinct.
The number of times a particular value λ occurs among the λi is called its algebraic
multiplicity.
140
Example. The polynomial 5(x−1)2 (x−3) has 1 as a zero with algebraic multiplicity 2
and 3 as a zero with algebraic multiplicity 1.
A useful fact from algebra:
Theorem. (Fundamental theorem of algebra.) Every p ∈ C[x] splits over C.
Our next step is to explore how the algebraic multiplicities of the zeros of the characteristic polynomial relate to diagonalizability.
Proposition. Let V be a vector space over F with dim V = n, and let f : V → V
be a linear transformation. Then if f is diagonalizable, its characteristic polynomial
splits over F .
Proof. Let D = diag(λ1 , . . . , λn ) be a diagonal matrix representing f . Then the
characteristic polynomial for f is
det(D − xIn ) = (λ1 − x) · · · (λn − x) = (−1)n (x1 − λ1 ) · · · (x − λn ).
Now for the next twist. The converse of this proposition is not true:
Example. Let
A=
1 1
0 1
.
Its characteristic polynomial is
pA (x) = det(A − xI2 )
1−x
1
= det
0
1−x
= (x − 1)2 .
Thus, the characteristic polynomial splits over any field F . There is one eigenvalue, 1,
which occurs with algebraic multiplicity 2. Let’s proceed with the algorithm for
diagonalization by computing a basis the eigenspace for 1, i.e., for ker(A − I2 ):
1 1
1 0
0 1
−
=
0 1
0 1
0 0
Therefore, ker(A − I2 ) = {(x, 0) : x ∈ F }. A basis is {(1, 0)}. Thus, there is no basis
for F 2 consisting of eigenvectors: our theory says any eigenvector would have to have
eigenvalue 1, and the space of eigenvectors with eigenvalue 1 is only one-dimensional!
141
Definition. Let dim V < ∞. The geometric multiplicity of an eigenvalue λ for a
linear transformation f : V → V is the dimension of the eigenspace for λ:
dim ker(f − λ idV ).
So if A is a matrix representing f , then the geometric multiplicity of λ ∈ F is
dim ker(A − λ In ).
Proposition. Let dim V < ∞, and let λ be an eigenvalue of a linear transformation
f : V → V . Then the geometric multiplicity of λ is at most the algebraic multiplicity
of λ.
Proof. Let v1 , . . . , vk be a basis for ker(f − λ idV ), and extend it to a basis v1 , . . . , vn
for all of V . Then with respect to this basis, the matrix representing f has the form
A :=
λIk B
0 C
,
where B and C are (n − k) × (n − k) matrices. So the characteristic polynomial for f
is
(λ − x)Ik
B
pf (x) = det
0
C − xIn−k
= det((λ − x)Ik ) det(C − xIn−k )
= (λ − x)k det(C − xIn−k )
= (λ − x)k g(x),
for some polynomial g. This shows that the algebraic multiplicity of λ is at least k,
the geometric multiplicity of λ.
Jordan form. What can we say when a linear transformation is not diagonalizable?
Can we still choose a basis to make the matrix for the transformation simple in some
sense? We give one answer here. We need a couple definitions, first. A Jordan
block of size k for λ ∈ F is the k× matrix with λs on the diagonal and 1s on the
142
“superdiagonal”:


0 0
0 0 

0 0 


..

.

λ 1 
0 λ
···
···
···
...
λ 1 0 0
0 λ 1 0
0 0 λ 1
..
.




Jk (λ) = 


 0 0 0 0 ···
0 0 0 0 ···
For example,

3
 0
J4 (3) = 
 0
0
1
3
0
0
0
1
3
0

0
0 
.
1 
3
Note the following special case:
J1 (5) = [5].
A matrix is in Jordan form if it is in block diagonal form with Jordan blocks for
various λ along the diagonal:


Jk1 (λ1 )
0
···
0


0
Jk2 (λ2 ) · · ·
0




...


0
0
0
0
0
· · · Jkm (λm )
For example, here is a matrix in Jordan form:












2
0
0
0
0
0
0
0
1
2
0
0
0
0
0
0
0
0
2
0
0
0
0
0
0
0
1
2
0
0
0
0
0
0
0
0
5
0
0
0
0
0
0
0
0
4
0
0
0
0
0
0
0
1
4
0
0
0
0
0
0
0
1
4






.





It has two 2 × 2 Jordan blocks for 2, a 1 × 1 Jordan block for 5, and a 3 × 3 Jordan
block for 4:


J2 (2)
0
0
0
 0
J2 (2)
0
0 

.
 0
0
J1 (5)
0 
0
0
0
J3 (4)
143
Theorem. Let dim V < ∞. Suppose f : V → V is a linear transformation over F
and that the characteristic polynomial for f splits, i.e., the field F contains all of
the zeros of the characteristic polynomial. Then there exists an ordered basis for V
such that the matrix representing f with respect to that basis is in Jordan form. The
Jordan form is unique up to a permutation of the Jordan blocks.
So a matrix is diagonalizable if and only if its characteristic polynomial splits and all
of its Jordan blocks have size 1. We also know that a matrix such as


5 1 0
 0 5 1 
0 0 2
which is already in Jordan form but not diagonal is not diagonalizable.
144
Week 10, Friday: Counting walks in graphs.
Walks on graphs
We have devoted a lot of energy to the problem of diagonalizing a matrix. As mentioned in class, one motivation for diagonalization is that it makes taking powers of
a matrix easier. Explicitly, suppose that A ∈ Mn×n (F ) is diagonalizable. So there
exists a matrix P such that
P −1 AP = D = diag(λ1 , . . . , λn ).
It is easy to take powers of a diagonal matrix: D` = diag(λ`1 , . . . , λ`n ). Here is the
important trick:
D` = (P −1 AP )`
= (P −1 AP )(P −1 AP )(P −1 AP ) · · · (P −1 AP )(P −1 AP )
= P −1 A(P P −1 )A(P P −1 )A(P P −1 ) · · · (P P −1 )AP
= P −1 A` P.
Therefore,
A` = P D` P −1 .
In general, there will be many fewer arithmetic operations required on the right-hand
side of this equation than on the left-hand side.
In the next couple of lectures, we will consider some applications of this idea.
Walks in graph. A graph (or network) consists of vertices connected by edges. Here
is an example with 4 vertices connected by 5 edges:
145
v3
v4
v2
v1
The diamond graph.
A walk of length ` in a graph is a sequence of vertices u0 u1 . . . u` where ui−1 is
connected to ui for i = 1, . . . , `. In our example, the following are walks from v1
to v4 :
v1 v4 and v1 v2 v3 v4 .
The first has length 1 and the second has length 3. We are interested in counting the
number of walks between vertices.
Definition. Let G be a graph with vertices v1 , . . . , vn . The adjacency matrix of G is
the n × n matrix A = A(G) defined by
(
1 if there is an edge connecting vi and vj
Aij =
0 otherwise.
For example, the adjacency matrix of the diamond graph is
v3
v1
v1 0
v2 1
A= v
3 0
v4 1

v4
v2
v1
v2
1
0
1
1
v3
0
1
0
1
v4

1
1 
.
1 
0
Theorem. Let A be the adjacency matrix for a graph G with vertices v1 , . . . , vn , and
let ` ∈ Z ≥ 0. Then then number of walks of length ` from vi to vj is (A` )ij .
Proof. Homework.
Example. Consider the diamond graph and its adjacency matrix A, displayed above.
Then
146

A0 = I4 ,
0
 1
A=
 0
1
1
0
1
1
0
1
0
1

1
1 
,
1 
0

2

1
A2 = 
 2
1
1
3
1
2
2
1
2
1

1
2 
,
1 
3

2

5
A3 = 
 2
5
5
4
5
5
2
5
2
5

5
5 
.
5 
4
The highlighted entries in the matrix say there is 1 path of length 2 from v2 to v3
and there are 4 paths of length 3 from v2 to itself. Can you find them? (The answer
appears at the end of this lecture.)
So to count the number of walks, we need to compute powers of the adjacency matrix.
Here is some good news:
Theorem. If A is an n × n symmetric matrix (A = At ) over the real numbers, then
it is diagonalizable over R.
Proof. We may prove this later in the semester. (To look it up online, search for
the “spectral theorem”, which is usual stated for the more general class of Hermitian
matrices. Over the real numbers, the Hermitian matrices are exactly the symmetric
matrices.)
This means that we can find a matrix P such that P −1 AP = D, where D is the
diagonal matrix of the eigenvalues. Then A` = P D` P −1 . So we can find a nice
closed form for the number of walks of length ` between any two vertices as a linear
expression in the `-th powers of the eigenvalues of A. If the eigenvalues are λ1 , . . . , λn ,
for each pair of vertices vi and vj there exist real numbers ck , independent of `, such
that the number of closed walks of length ` from vi to vj is
c1 λ`1 + · · · + cn λ`n .
A special case is particularly nice, and that’s the case of closed walks.
Definition. A walk is closed if it begins and ends at the same vertex.
Definition. Let A be any n×n matrix. Then the trace of A is the sum of its diagonal
entries:
n
X
tr(A) =
Aii .
i=1
Proposition. Let A be the adjacency matrix of a graph G. Then the number of
closed walks in G of length ` is tr(A` ).
147
Proof. For each i = 1, . . . , n, the number of closed walks from vi to vi is (A` )ii .
Summing over i gives the total number of closed walks.
Proposition. Let A be any n × n matrix. Then the trace of A is the sum of its
eigenvalues, each counted according to its (algebraic) multiplicity.
Proof. First, a clarification. Let pA (x) be the characteristic polynomial for A. It
could be that pA does not split over F . However, it turns out that there is a field K
containing F over which it does. The sum of these eigenvalues will be in F and does
not depend on the choice of K. In the case in which we are primarily interested—
the case of the adjacency matrix—the spectral theorem says that the characteristic
polynomial already splits over R.
A general matrix A will not be diagonalizable. However, perhaps working in a larger
field, we can change basis to put A in its Jordan form. So there exists a matrix P
such that P −1 AP = J where J is the Jordan form of A. The eigenvalues of A
appear along the diagonal, each appearing a number of times equal to its algebraic
multiplicity. Next, recall from homework that tr(U V ) = tr(V U ) for any pair of n × n
matrices U and V . It thus follows that
tr(A) = tr(P JP −1 ) = tr(JP −1 P ) = tr(J)
and the result follows.
Corollary. Let A be the adjacency matrix of a graph G with n vertices, and
let λ1 , . . . , λn ∈ R be its list of (not necessarily
Pn ` distinct) eigenvalues. Then the
number of close walks in G of length ` is i=1 λi .
Proof. The number of closed walks of length ` is tr(A` ), which is the sum of the eigenvalues of A` . By homework (an easy induction argument), if λ is an eigenvalue of A,
then λ` is an eigenvalue of A` . It follows that the eigenvalues for A` are λ`1 , . . . , λ`n .
Example. Let A be the adjacency matrix of the diamond graph G. The characteristic
polynomial of A is
det(A − xI4 ) = x4 − 5x2 − 4x = x(x + 1)(x2 − x − 4).
Using the quadratic equation, we find the eigenvalues for A:
√
√
1 + 17 1 − 17
,
.
0, −1,
2
2
Therefore, the number of walks in G of length ` is
√ !`
√ !`
1
+
17
1
−
17
w(`) = (−1)` +
+
.
2
2
148
`
1 2 3 4
5
6
w(`) 0 10 12 50 100 298
Answer to example on page 2. v2 v4 v3 has length 2 and the following have length 3:
v2 v3 v4 v2 , v2 v4 v3 v2 , v2 v1 v4 v2 , and v2 v4 v1 v2 .
149
Week 11, Monday: Differential equations.
Linear systems of differential equations1
Let
x1 (t) = population of frogs in a pond
x2 (t) = population of flies in a pond,
and suppose the rate of change of these populations satisfies the following system of
differential equations:
x01 (t) = ax1 (t) + bx2 (t)
x02 (t) = cx1 (t) + dx2 (t).
So we are assuming that the rate of growth of these populations depends linearly on
the sizes of the populations. Letting
0
x1 (t)
x1 (t)
0
,
x(t) :=
and x (t) :=
x02 (t)
x2 (t)
we can rewrite the system in matrix form:
a b
0
x (t) =
x(t).
c d
Our problem is to find x(t) solving the system. For example, if b = c = 0, then the
system becomes
x01 (t) = ax( t) and x02 (t) = dx2 (t)
which has the solution:
x1 (t) = k1 eat
and x2 (t) = k2 edt .
The constants k1 and k2 are the populations at time t = 0:
x1 (0) = k1 ea·0 = k1
1
and x2 (0) = k2 .
These lecture notes are based on Angélica Osorno’s notes.
150
When b = c = 0, we say the system is decoupled since x1 and x2 do not depend on
each other.
To generalize, suppose x(t) = (x1 (t), . . . , xn (t)) and satisfies
x0 = Ax
for some n × n matrix A over the real numbers. If A is diagonalizable over R, then
we can change variables to decouple and solve the system, as follows. Suppose
P −1 AP = D = diag(λ1 , . . . , λn ).
Then, x0 = Ax becomes x0 = P DP −1 x, or equivalently,
P −1 x0 = DP −1 x.
Define y(t) = P −1 x(t). Then since taking derivatives is linear, we have y 0 (t)) =
P −1 x0 (t). So substituting, we get the new system:
y 0 = Dy,
which means
y10 = λ1 y1
..
..
.
.
0
yn = λ1 yn .
This new system is decoupled! The solution is yi (t) = ki eλi t where ki = yi (0) for
i = 1, . . . , n. The original system is then solved since x = P y. So each xi is a linear
combination of k1 eλ1 t , . . . , kn eλn t with coefficients determined by P .
Example. Consider the coupled system
x01 = x2
x02 = x1 ,
or, in matrix form:
x0 = Ax
where
A=
0 1
1 0
151
.
Applying our algorithm to diagonalize A, we find
P −1 AP = D = diag(1, −1)
where
1 1
P =
.
1 −1
Using the notation from the discussion above, we have
y1 = k1 et
and y2 = k2 e−t .
Our solution is x = P y:
k1 et + k2 e−t
x1
1 1
y1
1 1
k1 et
.
=
=
=
k1 et − k2 e−t
k2 e−t
x2
1 −1
y2
1 −1
Therefore,
x1 (t) = k1 et + k2 e−t
x2 (t) = k1 et − k2 e−t .
If we are interested in the solution that starts at the point (1, 0), then we need to
solve
1 = x1 (0) = k1 e0 + k2 e−0 = k1 + k2
0 = x2 (0) = k1 e0 − k2 e−0 = k1 − k2 .
So k1 = k2 = 1/2, and our solution is
1 t
e + e−t
2
1 t
x2 (t) =
e − e−t .
2
A plot of that solution (x1 (t), x2 (t)) in the plane appears in blue in the picture below. The arrows indicate the following: at each point (x1 , x2 ) ∈ R2 , we attach the
vector (x01 , x02 ) = (x2 , x1 ).
x1 (t) =
152
The solution in blue has velocity vector (1, 0) at time t = 0. To repeat: geometrically,
the solution we are looking for is a parametrized curve in the plane:
x : R → R2
t 7→ x(t) = (x1 (t), x2 (t)).
The differential equation is specifying the tangent (velocity) vectors x0 (t) at each
time t. It determines a “flow” as illustrated in the picture. Specifying an initial
condition is like dropping a speck into the flow. We then get a unique solution, which
is the trajectory of that speck over time (shown in blue, above).
Note: the arrows determine new “axes” pointed in the directions of the eigenvectors, (1, 1) and (1, −1).
Another solution. Recall that
∞
X
1 k
t
e :=
k!
k=0
t
converges for all t ∈ R (or t ∈ C). Given any n × n matrix A over the real or complex
numbers, define
eAt =
∞
X
1 k k
1
1
1
A t = In + At + A2 t2 + A3 t2 + A4 t4 + · · · .
k!
2
6
24
k=0
In each entry we get a power series in t which turns out to converge for all t.
Proposition. Let A be an n × n matrix over the real or complex numbers. Then
the solution to x0 = Ax with initial condition x(0) = p is
x = eAt p.
The key to proving this proposition is
(eAt )0 = AeAt ,
and hence,
(eAt p)0 = A(eAt p).
Computing eAt . If A is diagonalizable, then we can write
P −1 AP = D = diag(λ1 , . . . , λn )
153
where the λi are the eigenvalues of A. As we have seen earlier, it follows that
Ak = (P DP −1 )k = P Dk P −1 = P diag(λk1 , . . . , λkn )P −1 .
Therefore,
eAt
∞
∞
X
1 k k X 1
=
A t =
(P Dk P −1 )tk = P
k!
k!
k=0
k=0
∞
X
1 k k
D t
k!
k=0
!
P −1 = P eDt P −1 .
Since D is diagonal, an easy calculation shows that
eDt = diag(eλ1 t , . . . , eλn t ).
So
eAt = P diag(eλ1 t , . . . , eλn t )P −1 .
Example. In our earlier example, we had
0 1
1 1
A=
P =
,
1 0
1 −1
and D =
1 0
0 −1
.
Therefore,
eAt = P eDt P −1
=
1
=
2
et 0
0 e−t
et + e−t et − e−t
et − e−t et + e−t
1 1
1 −1
1
2
1
2
1
2
!
− 12
.
So, for example, the solution with initial condition x(0) = (1, 0) is
1 et + e−t et − e−t
1 et + e−t
x1 (t)
1
=
=
,
x2 (t)
0
2 et − e−t et + e−t
2 et − e−t
as we discovered earlier.
154
Week 11, Wednesday: Inner products.
Inner product spaces
We would now like to add structure to a vector space that will allow us to define
length and angles.
Definition. Let V be a vector space over a field F where F is either R or C. An
inner product on V is a function
h , i: V × V → F
(x, y) 7→ hx, yi
satisfying for all x, y, z ∈ V and c ∈ F :
1. linearity: hx + y, zi = hx, zi + hy, zi and hcx, yi = chx, yi.
2. conjugate symmetry: hx, yi = hy, xi.
3. positive-definiteness: hx, xi ∈ R≥0 , and hx, xi = 0 iff x = 0.
Note. If F = R, then an inner product is known as a non-degenerate symmetric
form. If F = C, an inner product is known as a non-degenerate Hermitian form.
Examples.
• The ordinary dot product on Rn : Here, V = Rn and
h(x1 , . . . , xn ), (y1 , . . . , yn )i = x · y :=
n
X
xi yi = x1 y1 + · · · + xn yn .
i=1
For example, in R3 , we would have
h(1, 2, 3), (2, 3, 4)i = 2 + 6 + 12 = 20 and h(1, 2, 3), (−2, 1, 0)i = −2 + 2 + 0 = 0.
155
• The ordinary inner product on Cn : Here, V = Cn and
h(x1 , . . . , xn ), (y1 , . . . , yn )i = x · ȳ :=
n
X
xi ȳi = x1 y¯1 + · · · + xn y¯n .
i=1
For example, in C2 , we would have
h(1 + i, 1 − i), (1 + 2i, 4)i = (1 + i)(1 + 2i) + (1 − i) 4
= (1 + i)(1 − 2i) + (1 − i) 4
= (3 − i) + (4 − 4i) = 7 − 5i.
• Let V = CR ([0, 1]) = {f : [0, 1] → R : f is continuous}, the vector space of R-valued
continuous functions on the interval [0, 1], and
Z 1
hf, gi =
f (t)g(t) dt.
0
To check positive-definiteness, note that if f 6= 0, then f 2 (t) > 0 for t in some open
interval in [0, 1]. Hence,
Z
1
f 2 (t) dt > 0.
hf, f i =
0
• V = R2 , and
h(x1 , x2 ), (y1 , y2 )i = 3x1 y1 + 2x1 y2 + 2x2 y1 + 4x2 y2 .
For positive-definiteness, we have
h(x1 , x2 ), (x1 , x2 )i = 3x21 + 4x1 x2 + 4x22 .
Complete the square:
3x21
+ 4x1 x2 +
4x22
4
4 2
2
= 3 x1 + x 1 x2 + x2
3
3
!
2
4 2 4 2
2
=3
x1 + x2 − x2 + x2
3
9
3
!
2
2
8
=3
x1 + x2 + x22
3
9
≥ 0,
with equality if and only if x1 = x2 = 0.
156
• V = Mm×n (F ). For A ∈ Mm×n (F ), define the conjugate transpose of A by
A∗ = At ,
where the overline means taking the conjugate of each entry of A. If A has only
real entries, the A∗ = At . Next, define the inner product,
n
X
hA, Bi = tr(B A) =
(B ∗ A)ii .
∗
i=1
(Note: The special case m = 1 gives the usual inner product on Rn or Cn .) Proof
of positive-definiteness: Exercise.
Proposition. Let (V, h , , )i be an inner product space over F = R or C. Then for
all x, y, z ∈ V and c ∈ F ,
1. hx, y + zi = hx, yi + hx, zi.
2. hx, cyi = chx, yi.
3. h0, yi = 0.
4. if hx, yi = hx, zi for all x ∈ V , then y = z.
Proof. For part 1, notice that the definition of an inner product only guarantees
sums on the left distribute. However, using properties of conjugation,
hx, y + zi = hy + z, xi
= hy, xi + hz, xi
= hx, yi + hx, zi.
Parts 2 and 3 are left as exercises. For part 4, hx, yi = hx, zi for all x implies hx, y − zi =
0 for all x. In particular, let x = y − z to get hy − z, y − zi = 0. By positivedefiniteness, we get y − z = 0.
157
Week 11, Friday: Length, distance, components, projections,
angles.
Inner product spaces
Definition. Let (V, h , , )i be an inner product space over F = R or C. The norm
or length of x ∈ V is
p
kxk = hx, xi ∈ R.
Two vectors x, y ∈ V are orthogonal or perpendicular if hx, yi = 0. A unit vector
is a vector of norm 1: so x ∈ V is a unit vector if kxk = 1, which is equivalent
to hx, xi = 1.
Examples of norms.
• V = Rn , hx, yi = x · y, the usual dot product. Then
q
kxk = x21 + · · · + x2n .
• V = Cn , hx, yi = x · y, the usual dot product on Cn . Then
√
kzk = z1 z1 + · · · + zn zn
p
= |z1 |2 + · · · + |zn |2 .
If zj ∈ C is written as zj = xj + iyj with xj , yj ∈ R, then |zj |2 = x2j + yj2 . So then
q
kzk = x21 + y12 + · · · + x2n + yn2 .
So if we identify Cn with R2n via the isomorphism
(x1 + iy1 , . . . , xn + iyn ) → (x1 , y1 , . . . , xn , yn ),
then the isomorphism preserves norms.
Prop. (Pythagorean theorem.) Let (V, h , , )i be an inner product space over F =
R or C, and let x, y ∈ V be perpendicular. Then
kxk2 + kyk2 = kx + yk2 .
158
x+y
y
+
kx
~0
yk
kyk
x
kxk
Proof. Since x and y are perpendicular, we have hx, yi = 0. It follows that hy, xi =
hx, yi = 0, too. Therefore,
kx + yk2 = hx + y, x + yi
= hx, xi + hx, yi + hy, xi + hy, yi
= hx, xi + hy, yi
= kxk2 + kyk2 .
Suppose we are given two vectors x, y in an inner product space. A useful geometric
operation is to break x into two parts, one of which lies along the vector y. Given
any number c, the vector cy lies along y and we can evidently write x as the sum of
two vectors: x = (x − cy) + cy). In addition, though, we would like to require, by
adjusting c, that the vector x − cy is perpendicular to y. The picture in R2 would
be:
x
~0
cy
y
We can calculate the required scalar c:
hx − cy, yi = 0 ⇐⇒ hx, yi − chy, yi = 0 ⇐⇒ c =
hx, yi
hx, yi
⇐⇒ c =
,
hy, yi
kyk2
which makes sense as long as y 6= 0, naturally.
Definition. Let (V, h , i) be an inner product space over F = R or C, and let x, y ∈
V with y 6= 0. The component of x along y is the scalar
c=
hx, yi
hx, yi
=
.
hy, yi
kyk2
159
The (orthogonal) projection of x along y is the vector cy.
Example. Let x = (3, 2) and y = (5, 0) in R2 with the usual inner product. Then
the component of x along y is
(3, 2) · (5, 0)
15
3
hx, yi
=
=
= .
hy, yi
(5, 0), (5, 0)
25
5
So the projection of x along y is
3
cy = (5, 0) = (3, 0),
5
as expected.
Proposition. Let (V, h , i) be an inner product space over F = R or C. Let x, y ∈
V and c ∈ F . Then
1. kcxk = |c|kxk.
2. kxk = 0 if and only if x = 0.
3. Cauchy-Schwarz inequality: |hx, yi| ≤ kxkkyk.
4. Triangle inequality: kx + yk ≤ kxk+kyk.
Proof. Parts 1 and 2 are left as exercises. Part 3 is tricky. If y = 0, we’re
done. So assume y 6= 0, and let c = hx, yi/hy, yi be the component of x along y.
By construction, x − cy is perpendicular to y and hence to cy. Therefore, by the
Pythagorean theorem,
kx − cyk2 + kcyk2 = k(x − cy) + cyk2 = kxk2 .
Since kx − cyk2 ≥ 0, if we drop that term in the above equation, we get
kcyk2 ≤ kxk2 .
Take square roots to get
kxk ≥ kcyk = |c|kyk =
|hx, yi|
hx, yi
kyk
=
.
kyk2
kyk
Multiply through by kyk to get Cauchy-Schwarz.
160
The triangle inequality is an easy consequence of Cauchy-Schwarz:
kx + yk2 = hx + y, x + yi
= hx, xi + hx, yi + hy, xi + hy, yi
= kxk2 + hx, yi + hy, xi + kyk2
= kxk2 + hx, yi + hx, yi + kyk2
= kxk2 + 2 Re(hx, yi) + kyk2
≤ kxk2 + 2 |hx, yi| + kyk2
≤ kxk2 + 2kxkkyk + kyk2
≤ (kxk + kyk)2 .
(We’ve used the fact that if z = a + ib ∈ C with a, b ∈ R, then
z + z = (a + ib) + (a − ib) = 2a = 2Re(z),
twice the real part of z.) Take square roots to get the triangle inequality.
Distance. Let (V, h , i) be an inner product space over R or C. The distance
between x, y ∈ V is defined to be
d(x, y) :=kx − yk.
The following properties then easily follow from what we have already done:
Proposition. For all x, y, z ∈ V ,
1. Symmetry: d(x, y) = d(y, x).
2. Positive-definiteness: d(x, y) ≥ 0, and d(x, y) = 0 iff x = y.
3. Triangle inequality: d(x, y) ≤ d(x, z) + d(z, y).
Angles. Now let (V, h , i) be an inner product space over F = R. (So we will
not consider the case F = C in our discussion of angles.) We would like to define
the notion of an angle between x, y ∈ V . Our picture for the component provides
motivation:
x
c=
hx,yi
kyk2
θ
~0
cy
161
y
The dashed vertical line and the vector y are perpendicular (by definition of c).
Definition. Let (V, h , i) be an inner product space over F = R, and let x, y be
nonzero elements of V . The angle θ between x and y is
hx, yi
−1
θ = cos
,
kxkkyk
and thus,
hx, yi = kxkkyk cos(θ).
Remarks.
• Cauchy-Schwarz says |hx, yi ≤ kxkkyk. Therefore,
−1 ≤
hx, yi
≤ 1.
kxkkyk
So the inverse cosine in the definition of the angle always makes sense.
• In the definition of the angle, it might make more sense conceptually to write
y
x
,
cos(θ) =
.
kxk kyk
In other words, the cosine of the angle between x and y is the inner product of
their directions where the direction of a vector w is taken to be the scalar multiple
of w with unit lenght, w/kwk.
162
Week 12, Monday: Gram-Schmidt.
Gram-Schmidt
Let (V, h , i) be an inner product space over F = R or C.
Definition. Let S ⊆ V . Then S is an orthogonal subset of V if hu, vi = 0 for
all u, v ∈ S with u 6= v. If S is an orthogonal subset of V and kuk = 1 for all u ∈ S,
then S is an orthonormal subset of V .
Examples.
• The standard basis e1 , . . . , en for F n is orthonormal with respect to the standard
inner product on F n .
n
o
• √12 (1, 1), √12 (1, −1) is orthonormal with respect to the standard inner product
on R2 .
Proposition. Let S = {v1 , . . . , vk } be an orthogonal set of nonzero vectors in V ,
and let y ∈ Span S. Then
k
k
X
X
hy, vj i
hy, vj i
vj =
vj .
y=
hvj , vj i
kvj k2
j=1
j=1
Proof. Say y =
Pk
i=1
ai vi . Then for j = 1, . . . , k,
hy, vj i = h
Pk
i=1
ai vi , vj i =
Pk
i=1
ai hvi , vj i = aj hvj , vj i,
since hvi , vj i = 0 for i 6= j. Hence,
aj =
hy, vj i
.
hvj , vj i
163
Corollary 1. If S = {v1 , . . . , vk } is orthonormal and y ∈ Span S, then
k
X
y=
hy, vj ivi .
j=1
Corollary 2. Is S = {v1 , . . . , vk } is an orthogonal set of nonzero vectors in V then S
is linearly independent.
P
Proof. If ki=1 ai vi = 0, then for each j = 1, . . . , k,
P
0 = h0, vj i = h ki=1 ai vi , vj i = aj hvj , vj i.
Since vj 6= 0 and h , i is positive-definite, we have hvj , vj i =
6 0. Hence, aj = 0 for
j = 1, . . . , k.
Example. Consider R2 with the standard inner product, and let
1
1
u = √ (1, 1) and v = √ (1, −1).
2
2
Then β = {u, v} gives an orthonormal ordered basis for R2 . What are the coordinates
of y = (4, 7) with respect to that basis?
u
v
Answer:
y = hy, uiu + hy, viv
1
1
= (4, 7) · √ (1, 1) u + (4, 7) √ (1, −1) v
2
2
11
3
= √ u − √ v.
2
2
Check:
11
√
2
1
3
1
11
3
√ (1, 1) − √
√ (1, −1) =
(1, 1) − (1, −1) = (4, 7).
2
2
2
2
2
164
Gram-Schmidt. Given vectors w1 , w2 ∈ V , we’d like to compute orthogonal vectors v1 , v2 such that
Span {w1 , w2 } = Span {v1 , v2 } .
To do that, let v1 = w1 , then “straighten out” w2 to create v2 :
w2
v2 = w2 − cv1
~0
cv1
w1 = v1
The number c is the component of w2 along v1 . Recall, c is determined by requiring v2
and v1 to be orthogonal:
0 = hv2 , v1 i = hw2 − cv1 , v1 i = hw2 , v2 i − chv1 , v1 i.
Therefore,
c=
hw2 , v1 i
hw2 , v1 i
=
.
hv1 , v1 i
kv1 k2
(We’ve assumed v1 6= 0.)
The following algorithm generalizes this idea:
Algorithm. (Gram-Schmidt orthogonalization)
input: S = {w1 , . . . , wn }, a linearly independent subset of V .
Let
v1 = w1 .
Then for k = 2, 3, . . . , n, define vk by starting with wk , then subtracting off the
components of wk along the previously found vi :
vk = wk −
k−1
X
hwk , vi i
i=1
kvi k2
vi .
output: S 0 = {v1 , . . . , vn } an orthogonal set with Span S 0 = Span S.
or
00
output: S =
v1
vn
,...,
kv1 k
kvn k
an orthonormal set with Span S 0 = Span S.
165
Proof of validity of the algorithm. We prove this by induction on n. The case n =
1 is clear. Suppose the algorithm works for some n ≥ 1. and let S = {w1 , . . . , wn+1 }
be a linearly independent set. By induction, running the algorithm on the first n
vectors in S produces orthogonal v1 , . . . , vn with
Span {v1 , . . . , vn } = Span {w1 , . . . , wn } .
Running the algorithm further produces
vn+1 = wn+1 −
n
X
hwn+1 , vi i
i=1
kvi k2
vi .
It cannot be that vn+1 = 0, since otherwise, the above equation we would say
wn+1 ∈ Span {v1 , . . . , vn } = Span {w1 , . . . , wn } ,
contradicting the assumption of the linear independence of the wi . So vn+1 6= 0.
We now check that vn+1 is orthogonal to the previous vi . For j = 1, . . . , n, we have
hvn+1 , vj i =
wn+1 −
n
X
hwn+1 , vi i
kvi k2
i=1
= hwn+1 , vj i −
vi , vj
n
X
hwn+1 , vi i
i=1
= hwn+1 , vj i −
kvi k2
hvi , vj i
hwn+1 , vj i
hvj , vj i
kvj k2
= hwn+1 , vj i − hwn+1 , vj i
= 0.
We have shown {v1 , . . . , vn+1 } is an orthogonal set of vectors, and we would now like
to show that its span is the span of {w1 , . . . , wn+1 }. First, since {v1 , . . . , vn+1 } is
orthogonal, it’s linearly independent. Next, we have
Span {v1 , . . . , vn+1 } ⊆ Span {v1 , . . . , vn , wn+1 } ⊆ Span {w1 , . . . , wn , wn+1 } .
Since Span {v1 , . . . , vn+1 } is an (n+1)-dimensional subspace of the (n+1)-dimensional
space Span {w1 , . . . , wn , wn+1 }, these spaces must be equal.
166
Corollary. Every nonzero finite-dimensional inner product space has an orthonormal
basis.
Example. Let V = R≤1 [x], the space of polynomials of degree at most 1 with real
coefficients and with inner product
1
Z
hf, gi =
f (t)g(t) dt.
0
Apply Gram-Schmidt to the basis {1, x} to get an orthonormal basis. Note that 1
and x are not orthogonal:
1
Z
h1, xi =
t dt =
0
1
6= 0.
2
Gram-Schmidt: Start with v1 = 1, then let
v2 = x −
hx, v1 i
v1
kv1 k2
=x−
hx, 1i
·1
k1k2
R1
t dt
= x − R0 1
·1
dt
0
1
=x− .
2
Check orthogonality:
Z
h1, x − 1/2i =
1
(t − 1/2) dt = 0.
0
167
Now scale v1 = 1 and v2 = x − 1/2 to create an orthonormal basis:
s
Z 1
dt = 1
kv1 k =
0
p
kv2 k = hx − 1/2, x − 1/2i
s
Z 1
=
(t − 1/2)2 dt
0
s
Z
1
(t2 − t + 1/4) dt
=
0
p
= 1/12.
So an orthonormal basis for V is
1
1, √ (x − 1/2) .
12
168
Week 12, Wednesday: Orthogonal complements and orthogonal projections.
Orthogonal complements and projections
Definition. The direct sum of vector spaces U and W over a field F is the set
U ⊕ W = {(u, w) : u ∈ U and w ∈ W }
with scalar multiplication and vector addition defined by
λ(u, w) = (λu, λw) and (u, w) + (u0 , w0 ) = (u + u0 , w + w0 ),
for all u, u0 ∈ U , w, w0 ∈ W , and λ ∈ F .
Proposition. Let U and W be subspaces of a vector space V over F such that: (i)
the union of U and W spans V , and (ii) U ∩ W = {0}. Then there is an isomorphim
U ⊕W →V
(u, w) 7→ u + w.
Thus, every element of V has a unique expression of the form u + w with u ∈ U
and w ∈ W .
Proof. Easy exercise.
Remark. In the case of the Proposition, we says that V is the internal direct sum
of U and W and abuse notation by simply writing V = U ⊕ W . The direct sum as
we first defined it is sometimes called the external direct sum of U and W .
For the rest of this lecture, let (V, h , i) be an inner product space over F = R or C.
Definition. Let S ⊆ V be nonempty. The orthogonal complement of S is
S ⊥ = {x ∈ V : hx, yi = 0 for all y ∈ S} .
169
Exercise. Show that S ⊥ is a subspace of V .
Proposition. Suppose dim V = n and S = {v1 , . . . , vk } is an orthonormal subset
of V .
1. S can be extended to an orthonormal basis {v1 , . . . , vk , vk+1 , . . . , vn } for V .
2. If W = Span S, then S 0 = {vk+1 , . . . , vn } is an orthonormal basis for W ⊥ .
3. If W ⊆ V is any subspace, then
dim W + dim W ⊥ = dim V = n.
4. If W ⊆ V is any subspace, then (W ⊥ )⊥ = W .
Proof. (1) To prove part 1, extend S to a basis {v1 , . . . , vk , wk+1 , . . . , wn } for V , then
apply Gram-Schmidt.
(2) The set S 0 = {vk+1 , . . . , vn } is linearly independent since it’s a subset of a basis.
Since {v1 , . . . , vn } is orthonormal, and W = Span {v1 , . . . , vk }, we have S 0 ⊆ W ⊥ .
Therefore, Span S 0 ⊆ W ⊥ . For the opposite inclusion, take x ∈ W ⊥ . Then since
{v1 , . . . , vn } is orthonormal, we have
n
n
X
X
hx, vi i vi ∈ Span S 0 .
x=
hx, vi i vi =
i=1
i=k+1
(3) If W ⊆ V is any subspace, choose an orthonormal basis {v1 , . . . , vk } for W . Then
apply parts 1 and 2.
(4) It’s clear that W ⊆ (W ⊥ )⊥ since
(W ⊥ )⊥ = x ∈ V : hx, yi for all y ∈ W ⊥ .
Then, by part 3,
dim(W ⊥ )⊥ = n − dim W ⊥ = dim W.
Hence, W = (W ⊥ )⊥ .
Proposition. Let W be a finite-dimensional subspace of V . Then
V = W ⊕ W ⊥.
In other words, for each y ∈ V , there exists unique u ∈ W and z ∈ W ⊥ such that
y = u + z.
170
We define u to be the orthogonal projection of y onto W .
If u1 , . . . , uk is an orthonormal basis for W , then
k
X
u=
hy, ui i ui .
i=1
Proof. ByPGram-Schmidt, there exists and orthonormal basis u1 , . . . , uk for W .
Define u = ki=1 hy, ui i ui and z = y−u. Then u ∈ W and y = u+z. Further, z ∈ W ⊥
since for each j = 1, . . . , k, we have
hz, uj i = hy − u, uj i
P
= hy, uj i − h ki=1 hy, ui i ui , uj i
P
= hy, uj i − ki=1 hy, ui i hui , uj i
= hy, uj i − hy, uj i huj , uj i
= hy, uj i − hy, uj i
= 0.
For uniqueness, suppose there exist u0 ∈ W and z 0 ∈ W ⊥ such that
y = u + z = u0 + z 0 .
Then u − u0 = z − z 0 ∈ W ∩ W ⊥ = {0}. Thus, u = u0 and z = z 0 . (The
P reason W ∩
W ⊥ = {0} is as follows: if x ∈ W , then we saw last time that x = i=1 hx, ui i ui . If
it is also the case that x ∈ W ⊥ , then hx, ui i = 0 for i = 1, . . . , k since each ui is in W .
Hence, x = 0.)
Corollary. The orthogonal projection u of y onto W is the closest vector in W to y:
ky − uk ≤ ky − wk
for all w ∈ W with equality if and only if w = u.
Proof. Write y = u + z with u ∈ W and z ∈ W ⊥ , and let w ∈ W . Then u − w ∈ W
and y − u ∈ W ⊥ . So u − w and z = y − u are perpendicular. By the Pythagorean
theorem,
ky − wk2 = k(u + z) − wk2
= k(u − w) + zk2
= k(u − w)k2 + kzk2
≥ kzk2
= ky − uk2 .
171
Equality occurs above if and only if ku − wk2 = 0, i.e., if and only if u = w.
Example. Let V = R3 with the standard inner product, and let’s consider orthogonal
projection onto the xy-plane. An orthonormal basis for the xy-plane is {e1 , e2 }. The
projection of a point u = (x, y, z) ∈ R3 is given by
u = ((x, y, z) · e1 )e1 + ((x, y, z) · e2 )e2 = x e1 + y e2 = (x, y, 0).
The distance of (x, y, z) to the xy-plane is
k(x, y, z) − uk = k(0, 0, z)k = |z|.
172
Week 13, Monday: Least squares.
Let (V, h , i) be an inner product space over F = R or C. Recall from last time that
if W ⊆ V is a finite-dimensional subspace of V , then
V = W ⊕ W ⊥,
which means that every y ∈ V has a unique expression of the form y = u + z
with u ∈ W and z ∈ W ⊥ . If {u1 , . . . , uk } is an orthonormal basis for W , then
k
X
u=
hy, ui i ui .
i=1
2
Example. In R , find the closest line to the three points (0, 6), (1, 0), and (2, 0).
(0, 6)
(1, 0) (2, 0)
A general equation for a line is y = ax + b. To pass through the three points, we
would need
a·0+b=6
a·1+b=0
a·2+b=0
In matrix form:


 
0 1 6
 1 1  a =  0 .
b
2 1
0
A
x
173
y
So we are trying to solve Ax = y for x. Since no such solution exists (there is no line
through the three given points), we instead look for an x = (m, b) minimizing the
error:
e := ky − Axk.
Define
W = Ax : x ∈ R2 = im(A).
According to what we have discussed above, to minimize the error, e, we need to
compute the projection of y = (6, 0, 0) onto W . So let’s start by computing an
orthonormal basis for W . The columns of A are a basis for im(A). Apply GramSchmidt. We start with v1 = (0, 1, 2) as the first vector in an orthogonal basis, we
then “straighten up” the second column, (1, 1, 1), by substracting of its component
along (0, 1, 2):
v2 = (1, 1, 1) −
(1, 1, 1) · (0, 1, 2)
(0, 1, 2)
(0, 1, 2) · (0, 1, 2)
3
(0, 1, 2)
5
2 1
= 1, , −
.
5 5
= (1, 1, 1) −
So far, we’ve got an orthogonal basis for the image of A:
v1 = (0, 1, 2) and v2 =
2 1
1, , −
5 5
Divide by the lengths to get an orthonormal basis:
v1
1
= √ (0, 1, 2)
kv1 k
5
r v2
2 1
5
u2 =
1, , −
.
=
kv2 k
6
5 5
u1 =
174
.
Now compute the projection of y = (6, 0, 0) to W = im(A):
u = hy, u1 i u1 + hy, u2 i u2
1
= (6, 0, 0) · √ (0, 1, 2) u1 +
5
r
=6
r
=6
r !
5
2 1
1, , −
u2
(6, 0, 0) ·
6
5 5
5
u2
6
5
6
r !
5
2 1
1, , −
6
5 5
= (5, 2, −1).
Since (5, 2, −1) is in the image of A we can solve the system of equations




0 1 5
 1 1  a =  2 .
b
2 1
−1
We get a = −3 and b = 5. So the line of best fit is
y = −3x + 5.
(0, 6)
(1, 0) (2, 0)
Adjoints. We present, without proof, a useful tool for computing projections of the
type that arose in the previous example.
If f : V → V is a linear function, then there exists a unique linear function f ∗ : V → V
satisfying
hf (x), yi = hx, f ∗ (y)i.
175
This mapping f ∗ is called the adjoint of f . If f is represented by a matrix A with
respect to some ordered basis, then f ∗ is represented by the conjugate transpose A∗
with respect to that basis, and we have
hAx, yi = hA, A∗ yi.
To illustrate a use of the adjoint, let A be an m × n matrix over F . Given y ∈ F m ,
we would like to compute x ∈ F n minimizing ky − Axk.
Lemma. rank(A∗ A) = rank(A).
Proof. Note that A is m × n and A∗ is n × m. So A∗ A is n × n. By rank-nullity, we
have
rank(A) = n − dim(ker(A))
rank(A∗ A) = n − dim(ker(A∗ A)).
So it suffices to show
ker(A) = ker(A∗ A).
The following calculation shows ker(A) ⊆ ker(A∗ A):
x ∈ ker(A)
⇒
⇒
⇒
Ax = 0
A∗ Ax = 0
x ∈ ker(A∗ A).
For the opposite inclusion, we first use the property of the adjoint:
x ∈ ker(A∗ A)
⇒
⇒
A∗ Ax = 0
hAx, Axi = hx, A∗ Axi = hx, 0i = 0.
Then by positive-definiteness, it follows that Ax = 0. Hence x ∈ ker(A).
Corollary. If A ∈ Mm×n (F ) has rank n, then A∗ A is invertible.
Proposition. Given A ∈ Mm×n (F ) and y ∈ F m , there exists x0 ∈ F n such that
ky − Ax0 k ≤ ky − Axk
for all x ∈ F n . For this x0 , we have A∗ Ax0 = A∗ y. If rank(A) = n, then
x0 = (A∗ A)−1 A∗ y.
176
Proof. We are looking for x0 ∈ F n such that Ax0 is closest to y. Note that {Ax : x ∈ F n } =
im(A). So we are looking for the projection of y to the linear subspace im(A) of F m .
That proves existence. Now we want to find x0 ∈ F n such that
y = Ax0 + z
with z = y − Ax0 ∈ (im(A))⊥ . Calculate:
y − Ax0 ∈ (im(A))⊥
⇔
⇔
⇔
⇔
hAx, y − Ax0 i = 0
for all x ∈ F n
hx, A∗ (y − Ax0 )i = 0
for all x ∈ F n
A∗ (y − Ax0 ) = 0
A∗ Ax0 = y.
Let’s apply this theory to our original problem:


 
0 1
6



1 1 , y=
0  , rank(A) = 2,
A=
2 1
0
∗
A A=
0 1 2
1 1 1


0 1
 1 1 = 5 3 ,
3 3
2 1
x0 = (A∗ A)−1 A∗ y =
1
6
3 −3
−3 5
∗
−1
(A A)
0 1 2
1 1 1
1
=
6
3 −3
−3 5


6
−3
 0 =
.
5
0
Therefore, the line of best fit is
y = −3t + 5,
as we discovered earlier.
Remark. Note that the computation we just made is a lot easier: it avoids computing
an orthonormal basis for im(A).
Least squares. Minimizing ky − Axk is called the method of least squares. To
explain that terminology, imagine that at time ti , we measure a quantity yi ∈ F
for i = 1, . . . , n. We would like to find the “best” line y = ax + b. Proceeding as
above, we are looking for m, b ∈ F such that yi = ati + b for each i, or in matrix form,




t1 1 y1
 .. ..  a


=  ...  .
 . . 
b
tn 1
ym
177
Let A be the n × 2 matrix in the above equation, and x = (m, b). Since the the
points yi will not, in general, sit on a line, we look to minimize the error ky − Axk,
or equivalently ky − Axk2 . But
2
ky − Axk =
n
X
(yi − (ati + b))2 .
i=1
The terms yi − (mti + b) are the vertical distances between the points predicted by
the line at time ti and the actual observation yi :
yi
ati + b
178
Week 13, Wednesday: Markov chains I.
Markov Chains I
Let M = (V, E) be a directed graph, consisting of a set of vertices V and (directed)
edges E. The elements of E are ordered pairs (x, y) where x, y ∈ V :
x
y
(You can think of an undirected graph as a directed graph which has for each
edge (x, y) from x to y a corresponding edge (y, x) from y to x). If e = (x, y) ∈ E,
we call x the tail of e and y the head and write e− = x and e+ = y.
A Markov chain models random walks on a directed graph. Each edge (x, y) of the
graph is assigned a number which represents the probability of transitioning from x
to y in the walk. For example, suppose people move between the city and the suburbs
each year according to probabilities listed below:
0.1
0.9
city
suburbs
0.98
0.02
So the probability of moving from the city to the suburbs each year is 0.1 and the
probability of remaining in the city is 0.9, for example. Note that the probabilities
on the outgoing edges at each vertex add up to 1. We formalize these ideas with the
definition below.
Definition. A finite Markov chain consists of the following data:
1. A finite set of states V .
2. A transition matrix
P : V × V → [0, 1]
179
with the property that for each state x,
X
P (x, y) = 1.
y∈V
3. A sequence of random variables (X0 , X1 , X2 , . . . ) satisfying the law of the chain:
the probability that Xn+1 is y given that Xn = x is P (x, y).
We identify the Markov chain with its associated weighted graph M = (V, E, P )
where (x, y) ∈ E if P (x, y) > 0 and the weight of an edge (x, y) is P (x, y).
Example. In the above example, we have V = {c, s} where c = city and s = suburb,
and
c
P = s
c
s
0.9 0.1
.
0.02 0.98
Notice the that sum of the entries in each row is 1.
Question. Continuing with our example, suppose that we start with 70% of the
population in the city and 30% in the suburbs. We write this as a vector
π0 = (0.7, 0.3).
Technically, π0 is a probability distribution on the set of states V : it’s entries are
between 0 and 1 and they sum to 1. We can also think of π0 as a function π0 : V →
[0, 1] where π0 (c) = 0.7 and π0 (s) = 0.3. What is the distribution of our population
after a year. We start with 70% in the city. Of these, 90% remain (and 20% move
to the suburbs) while 2% of the people in the suburbs transition to the city. So the
proportion of the population in the city after a year is
0.7 · 0.9 + 0.3 · 0.02 = 0.636.
So only 63.6% remain in the city after one year. Similarly, the proportion in the
suburbs is
0.7 · 0.1 + 0.3 · 0.98 = 0.364.
Notice that these proportions could be calculated using matrix multiplication:
0.9 0.1
π0 P = 0.7 0.3
= 0.636 0.364 .
0.02 0.98
180
Exercise. Let M be a Markov chain with transition matrix P , and let π0 : V → [0, 1]
be an initial probability on its states. Then after one transition of the chain, the new
distribution is
π1 = π0 P.
Therefore, after k transitions, the distribution is
π k = π0 P k .
Example. Back to the city/suburb example: to determine the long-term distribution
of our population, we need to calculate πk = π0 P k as k → ∞. So let’s diagonalize P .
Eigenvalues and bases of corresponding eigenspaces are listed below:
λ1 = 1,
λ2 =
v1 = (1, 1)
22
,
25
v2 = (5, −1).
Letting
Q=
1 5
1 −1
1
0
!
0
22
25
we have
Q−1 P Q =
,
=: D.
It follows that
1
P k = QDk Q−1 = Q
0
0
22 k
!
Q−1 .
25
Therefore,
k
k
lim P = lim QD Q
k→∞
k→∞
−1
k
−1
= Q( lim D )Q
k→∞
=Q
1 0
0 0
!
−1
Q
1
=
6
Therefore, given any initial π0 = (c0 , s0 ) (with c0 + s0 = 1), we have
1 1 5
k
π 0 P → c0 s 0
6 1 5
181
1 5
1 5
.
=
=
5(c0 +s0 )
6
c0 +s0
6
5
6
1
6
.
So stating with any initial distribution, the system evolves towards one in which a
sixth of the population lives in the city and five-sixths live in the suburbs. We call
this the stationary distribution of the Markov chain.
Definition. Let M be a Markov chain with transition matrix P . A probability
distribution π on the states is a stationary distribution if
πP = π.
Questions. How quickly does the system evolve towards the stationary distribution?
What determined that speed of evolution in our population example?
Example. Consider the following Markov chain:
v1
v5
1
0.5
1
v3
v6
1
0.7
v2
0.5
1
v4
0.3
Question: Starting with any initial distribution π0 , what is the long-term behavior
of the system, i.e., how does πk = π0 P k evolve? Our transition matrix is


0 0 1
0
0 0
 0 0 0.7 0.3 0 0 


 0 0 0 0.5 0.5 0 
.
P =
 0 0 0

0
1
0


 0 0 0
0
0 1 
0 0 0
1
0 0
The characteristic polynomial is x6 − x3 = x3 (x3 − 1). The eigenvalue 0 has algebraic
multiplicity 3, but it turns out that its geometric multiplicity—the dimension of its
182
eigenspace—is only 2 (so P is not diagonalizable). The remaining eigenvalues are 1, ω,
and ω 2 where ω = e2πi/3 . These are the three cube roots of 1 in C. The eigenvector
for 1 is (1, 1, 1, 1, 1, 1).
Even though P is not diagonalizable, it has a Jordan form: there is a matrix Q such
that Q−1 P Q is the block-diagonal matrix


0
0 1
0 0



D=



1
ω
ω2







The eigenvalue 0 has one Jordan block of size 1 and one Jordan block of size 2.
Therefore, since
k 0 0
0 1
=
0 0
0 0
for k ≥ 2. It follows that for k ≥ 2,


0



k
D =



0
0
1
ω
ω2



.



The matrix P k = QDk Q−1 thus has period three for k ≥ 2. It turns out
1
2
13
20
1
2

0 0 0





P2 = 




0 0 0
1
2
7
20
0 0 0
0
0 0 0
0
0 0 0
1

0 

1 
2 
,
0 1 


0 0 
0 0 0
0
1 0
0

183

0 0 0 0





P3 = 




0 0 0 0
1
2
7
20
1
2
0
0 0 0 1
0
0 0 0 0
1





,
0 


0 
0 0 0 0
0
1
0 0 0
1
2
13
20
1
2

and

0 0 0





4
P =




0 0 0
1
2
13
20
1
2
0
0
1
2
7
20

0 0 0
0 1
0 0 0
0 0



0 

.
0 


1 
0 0 0
1 0
0
0 0 0
1
2
Therefore, starting with an initial distribution π0 , we will get a periodic sequence of
distributions π0 P 2 , π0 P 3 , π0 P 3 with period 3. The sequence depends on the initial
distribution and will never converge.
184
Week 13, Friday: Markov chains II. Pagerank I.
Let M be a Markov chain with transition matrix P . Recall that a probability distribution π in the states of M is a stationary distribution if
π = πP.
This means that the distribution does not evolve over time. We will introduce two
properties: (i) irreducibility, which guarantees the existence and uniqueness of a stationary distribution, and (ii) aperiodicity which will then guarantee the system evolves
to the stationary distribution from any initial distribution.
Existence and uniqueness of stationary distribution. We will first look at an
example for which there are multiple stationary states. The source of the problem is
the presence of states x and y the don’t “communicate”, i.e., there are not directed
paths from x to y and from y to x. Consider the Markov chain pictured below:
1
3
1
1
3
v1
1
3
v2
v3
1
As the Markov chain runs, we eventually end up stuck either in state v1 or in v2 with
equal probability. The transition matrix is


1 0 0


P =  13 31 13  .
0 0 1
It turns out that


lim P k = 
k→∞
1 0 0

1
2

.
0
1
2
0 0 1
185
The distribution π = (a, b, c) is a stationary distribution if and only if


1 0 0


a b c  21 0 21  = a + 12 b 0 12 b + c = a b c .
0 0 1
The general solution to this equation is
π = (a, 0, c).
Therefore, there is a 2-parameter family of stationary distributions.
Definition. A Markov chain is irreducible if for every pair of states x, y, there
exists a directed path in the chain from x to y, equivalently, there exists a k such
that P k (x, y) > 0.
Theorem. (Existence and uniqueness of stationary distribution) If M is an irreducible Markov chain then it has a unique stationary distribution.
Example. We can make the previous Markov chain irreducible by adding some
paths:
1
3
1
3
1
2
v1
1
3
v2
1
2
v3
1
2
1
2
The new transition matrix is


P =
1
2
1
3
0
1
2
1
3
1
2
0

1
3
1
2

.
To find the stationary distribution, we need to solve the equation π = πP for π. Since
the chain is irreducible, we know this equation has a unique solution. We have
πP = π
⇔
π(P − I3 ) = 0.
So π is in the left kernel of P − I3 . From the theorem we just stated, we know there
is a unique element in the left kernel whose entries are in the interval [0, 1] and whose
186
entries sum to 1. To compute the left kernel of a matrix A, apply our usual algorithm
for computing the (right) kernel of the transpose, At . That follows since
πA = 0
(πA)t = 0t
⇔
At π t = 0.
⇔
In our case,


(P − I3 )t = 


=
− 21
1
2
0
t
1
3
− 23
1
3


0
1
2
− 12
− 12
1
3
0

1
2
− 23
1
2


0
1
3
− 12

1 0 −1
→  0 1 − 23  .
0 0
0

The kernel is {(c, 3c/2, c) : c ∈ R}, which has a basis {(1, 3/2, 1)}. Scale this vector
by 1 + 3/2 + 1 = 7/2 to get a vector whose entries sum to 1:
π=
2 3 2
, ,
7 7 7
.
This is the stationary distribution.
Convergence. It turns out that in the previous example, starting with any distribution π0 , we get that limk→∞ πk = limk→∞ π0 P k converges to the stationary distribution. (To verify this, you could, for example, diagonalize P to easily deal with P k .
We did an example like this in the last lecture.) Consider the following example,
however,
v3
1
v1
1
1
187
v2
We have


0 1 0
P =  0 0 1 .
1 0 0
Since the chain is irreducible, we know there is a unique solution to πP = π for
which π is a probability distribution (non-negative with coordinates summing to 1).
It’s easy to check by computing the left kernel of P − I3 , as above, that the unique
stationary distribution is π = (1/3, 1/3, 1/3). If π0 = (a, b, c) is an initial distribution,
then
π1 = π0 P = (c, a, b),
π2 = π0 P 2 = (b, c, a),
and π3 = π0 P 3 = (a, b, c).
It follows that the only initial distribution π0 that converges to π is π, itself. The
problem, of course, is the cycle. (The above calculation also shows that (1/3, 1/3, 1/3)
is the unique stationary distribution without calculating the left kernel of P − I3 .)
Definition. Let M be a Markov chain with states V and transition matrix P . For
each x ∈ V , define
T (x) = k ≥ 1 : P k (x, x) > 0 ,
the set of those k for which there is a directed path of length k ≥ 1 in the graph
for the chain. The period of the state is the greatest common divisor of these cycle
lengths, gcd {T (x)}, provided T (x) is non-empty. Otherwise, T (x) is not defined.
The chain M is aperiodic if each of its states has period 1.
Example. In the example given just above, each state has period 3. One can always
modify a Markov process to assure aperiodicity by adding a loop at each state:
v3
0.1
0.9
0.1
v1
0.9
0.9
v2
0.1




0 1 0
1 0 0
P = 0.9  0 0 1  + 0.1  0 1 0  .
1 0 0
0 0 1
Any initial distribution will converge to the stationary distribution (1/3, 1/3, 1/3).
Example. The following chain is aperiodic:
188
0.5
0.5
1
1
1
1
Proposition. If M is irreducible, then
1. The periods of all of its states are equal.
2. If M is aperiodic, then there exists r > 0 such that P r (x, y) > 0 for all states x
and y.
Theorem. If M is irreducible, then it has a unique stationary distribution π. If, in
addition, M is aperiodic, then given any initial distribution π0 ,
lim πk = lim π0 P k = π.
k→∞
k→∞
Pagerank
Pagerank. How can we rank the importance of webpages? Think of a webpage as
being more important of a lot of important pages link to it. That sounds ill-defined,
but we’ll see it’s not.
Let I(x) be the importance of webpage x. Then we want
I(x) =
X I(y)
y→x
dy
,
(36.1)
where y → x means there is a link from page y to page x and dy is the outdegree of y:
the number of links leading out of page y. Thus, if dy = 5, then page y contributes a
fifth of its importance to each of the webpages to which it is linked. Define an n × n
matrix H where n is the number of webpages in the network by
(
1
if y → x,
Hyx = dy
0 otherwise.
Then, letting I be the row vector with x-th entry I(x), equation (36.1) becomes
I = IH.
189
Therefore, to find the importance of a webpage, we need to find a left eigenvector
for H with eigenvalue 1, or equivalently, a right eigenvector for H t . Once we find
one, we can scale it so that its entries sum to 1. If we can find an eigenvector with
nonnegative entries, then we could interpret the scaled vector I as a vector of relative
importances.
Example. See the first example in the following AMS article.
190
Week 14, Monday: Pagerank II.
Recall from last time that we defined the importance of a webpage x to be
X I(y)
I(x) =
,
dy
y→x
(37.1)
where y → x means there is a link from page y to page x and dy is the outdegree
of y: the number of links leading out of page y. We then defined an n × n matrix H
where n is the number of webpages in the network by
(
1
if y → x,
dy
Hyx =
0 otherwise.
Then, letting I be the row vector with x-th entry I(x), equation (37.1) becomes
I = IH.
Note: We will always scale I to make it a probability distribution (with components
summing to 1). In that way, we think of I as the vector of relative importances.
Example. Consider the following network of four webpages:
v4
v1
v2
v3
We have

0 1 0 0
 1
 3 0 31
H=
 0 1 0

2
0
1
2
191
1
2
1
3
1
2
0



.


To find the importance vector, we look for a left-eigenvector for H with eigenvalue 1. Equivalently, we look for a right-eigenvector for the transpose H^t with eigenvalue 1.

    H^t − I_4 = \begin{pmatrix} −1 & 1/3 & 0 & 0 \\ 1 & −1 & 1/2 & 1/2 \\ 0 & 1/3 & −1 & 1/2 \\ 0 & 1/3 & 1/2 & −1 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 0 & −1/2 \\ 0 & 1 & 0 & −3/2 \\ 0 & 0 & 1 & −1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

The kernel is

    \left\{ \left( \tfrac{1}{2}c, \tfrac{3}{2}c, c, c \right) : c ∈ R \right\},
which has basis, for instance, {(1, 3, 2, 2)}. Scale this vector to get the vector of
relative importances:

    I = \left( \tfrac{1}{8}, \tfrac{3}{8}, \tfrac{2}{8}, \tfrac{2}{8} \right).
In words: page v2 is 3 times as important as page v1, and pages v3 and v4 are each twice as important as v1.
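These importances are also easy to check numerically. The sketch below (Python/NumPy, an illustration only) verifies I = IH for this H and recomputes I as an eigenvector of H^t with eigenvalue 1:

import numpy as np

H = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   1/3, 1/3],
              [0,   1/2, 0,   1/2],
              [0,   1/2, 1/2, 0  ]])

I = np.array([1, 3, 2, 2]) / 8
print(np.allclose(I @ H, I))            # True: I = IH

# Recompute I: a right eigenvector of H^t with eigenvalue 1, scaled to sum to 1.
w, V = np.linalg.eig(H.T)
v = np.real(V[:, np.argmin(np.abs(w - 1))])
print(v / v.sum())                      # [0.125, 0.375, 0.25, 0.25]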
Markov process interpretation. Continuing with the previous example, if we label each link leading out of every page x with 1/d_x, where d_x is the outdegree of x, we get the graph for a Markov chain:
[Figure: the four-page network above with each edge out of page x labeled 1/d_x; the labels that occur are 1, 1/3, and 1/2.]
The vector of relative importances is then the stationary distribution for this associated Markov chain. In terms of webpages, the Markov process is the following: at
each webpage, choose a link uniformly at random, and follow the link to the next
page.
The problem of dangling nodes. Pages with no outgoing links present a problem. Consider the following example:
[Figure: three webpages v1, v2, v3; the only page with outgoing links is v2, whose three links (to v1, to v3, and to itself) are each labeled 1/3.]
Its matrix is

    H = \begin{pmatrix} 0 & 0 & 0 \\ 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 \end{pmatrix}.
It turns out that the eigenvalues of H are 0 (with multiplicity 2) and 1/3. So there
is no solution to I = IH. To see how the system would evolve over time, consider
powers of H:




    H^k = \begin{pmatrix} 0 & 0 & 0 \\ 1/3^k & 1/3^k & 1/3^k \\ 0 & 0 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
With each step, the importance of page v2 is dissipated.
Google’s way of fixing this problem is the following: while randomly browsing the
web as described above, if you reach a page with no outgoing links, then choose a
page at random (which includes the possibility of remaining on the current page).
The effect on the page matrix is as follows:
  1 1 1 

0 0 0
3
3
3
 1 1 1  

H −→ S =  3 3 3  +  0 0 0  .
0 0 0
1
3
1
3
1
3
For a general network of n webpages, define the n × n matrix A indexed by the
webpages such that row x of A is a row of zeroes unless x is a dangling node (no
outgoing edges). In the latter case, row x of A is the row with each entry equal
to 1/n. We denote this modified matrix by S:
S = H + A.
This modified matrix S is then guaranteed to be the transition matrix for a Markov
chain.
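In code, the fix amounts to replacing each zero row of H by the uniform row. A minimal Python/NumPy sketch (the function name is our own):

import numpy as np

def with_dangling_fix(H):
    # S = H + A, where row x of A is (1/n, ..., 1/n) exactly when row x of H is
    # all zeroes (x is a dangling node), and is zero otherwise.
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    A = np.zeros_like(H)
    A[H.sum(axis=1) == 0] = 1.0 / n
    return H + A

H = np.array([[0, 0, 0],            # the three-page example above
              [1/3, 1/3, 1/3],
              [0, 0, 0]])
S = with_dangling_fix(H)
print(S.sum(axis=1))                # every row of S now sums to 1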
Existence, uniqueness, and convergence. At this point, given a network of
webpages, we have an associated Markov chain with transition matrix S = H + A.
Starting at any webpage, the user follows a random link. If there are no links, the
user selects a webpage at random. To find the vector of relative importances, we need
to solve the equation I = IS where I is a probability distribution. From our earlier
work, we know that if the Markov chain is irreducible, then I exists and is unique.
Further, if the chain is aperiodic, the chain will evolve from any starting state to the
stationary distribution, I. In order to ensure irreducibility (i.e., being able to travel
between every pair of webpages by following links) and aperiodicity, let J be the n×n
matrix whose entries are all 1. Pick a real number α ∈ (0, 1), and define the Google
matrix:

    G = αS + \frac{1 − α}{n} J.

Then G defines an irreducible aperiodic Markov chain. From the point of view of web browsing, a user first flips a coin. With probability α, the user picks a link on the current page at random (or picks a page at random if there are no links on the current page). With probability 1 − α, the user picks a page at random.
The constant α affects the rate of convergence I_0 G^k → I. If α is close to 1, then the
structure of the webpage links is highly weighted, which is desirable. However, one
may show that α is the size of the second largest eigenvalue of G. As we will discuss
later, this means that for α close to 1, convergence to I is slow. As a compromise,
Brin and Page chose α = 0.85.
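Putting the pieces together, here is a minimal Python/NumPy sketch of the whole computation. It assumes S has already been built as above; the function names and stopping tolerance are our own choices, and a serious implementation would exploit the sparsity of H rather than forming G explicitly.

import numpy as np

def google_matrix(S, alpha=0.85):
    # G = alpha*S + (1 - alpha)/n * J, with J the all-ones matrix.
    n = S.shape[0]
    return alpha * S + (1 - alpha) / n * np.ones((n, n))

def pagerank(S, alpha=0.85, tol=1e-12, max_iter=1000):
    # Power iteration I_{k+1} = I_k G, starting from the uniform distribution.
    G = google_matrix(np.asarray(S, dtype=float), alpha)
    n = G.shape[0]
    I = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        I_next = I @ G
        if np.abs(I_next - I).sum() < tol:
            break
        I = I_next
    return I_next

# Example: the four-page network of this lecture has no dangling nodes, so S = H
# there, and pagerank(H, alpha=0.85) returns roughly (0.14, 0.37, 0.25, 0.25),
# close to the link-only importance vector (1/8, 3/8, 2/8, 2/8) computed above.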
Week 14, Wednesday: Pagerank III.
Theory underlying Pagerank. Let K be an n × n real matrix, and assume that K
is nonnegative, which means that no entry of K is negative. Define an associated
directed graph GK with n vertices and a directed edge from the i-th to the j-th
vertex if Kij is nonzero. From GK we get an associated Markov chain called the
random walk on GK . We define the chain by labeling each edge out of each vertex i
by 1/di , where di is the outdegree of i—the number of directed edges leading out of i.
So to transition out of vertex i, pick an edge leading out of i uniformly at random,
and follow that edge to its head. That gives the next state. We then say that K is
irreducible if this Markov chain is irreducible. This just means that given any pair of
vertices i, j, there is a directed path in GK from i to j.
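A sketch of this construction in Python/NumPy (the function name is our own); it assumes every vertex of G_K has at least one outgoing edge:

import numpy as np

def random_walk_matrix(K):
    # Transition matrix of the random walk on G_K: from vertex i, follow one of
    # the edges leaving i, chosen uniformly at random.  Row i of the result has
    # the value 1/d_i in each column j with K_ij nonzero, where d_i is the
    # outdegree of i.
    adj = (np.asarray(K) != 0).astype(float)
    return adj / adj.sum(axis=1, keepdims=True)

# For the 4 x 4 matrix K in the example below, the resulting chain is irreducible.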
The Perron-Frobenius Theorem. Let K be an n × n nonnegative irreducible
matrix. Then
1. There exists a simple (i.e., algebraic multiplicity 1) real eigenvalue λ > 0 called
the Perron-Frobenius eigenvalue such that vK = λv where v > 0 (i.e., all of the
entries of v are positive). Further, λ is dominant, meaning that λ ≥ |µ| for all
eigenvalues µ of K. (There is also a positive vector w such that Kw = λw.)
2. If there are k eigenvalues µ such that |µ| = λ, then they are λ, ωλ, . . . , ω^{k−1}λ, where ω = e^{2πi/k}.
3. If there exists r such that K r has only positive entries, then λ > |µ| for all other
eigenvalues µ.
Example. Here is a fairly random nonnegative 4 × 4 matrix:

    K = \begin{pmatrix} 2 & 1 & 7 & 0 \\ 1 & 2 & 8 & 0 \\ 1 & 2 & 9 & 1 \\ 2 & 10 & 0 & 4 \end{pmatrix}.
The matrix K is irreducible. For instance K12 , K23 , K34 , K41 are all nonzero, which
implies that the associated graph GK contains the directed cycle,
v1 → v2 → v3 → v4 → v1 .
So the vertices all communicate with each other. Sage gives the following approximate
values for the eigenvalues of K:
12.54,
1.69 − 2.37i,
1.69 + 2.37i,
1.07.
The absolute values of these are, in order:
12.54 > 2.91 = 2.91 > 1.07.
Note the Perron-Frobenius eigenvalue is strictly dominant. This is consistent with the fact that K² has only positive entries (in accordance with the Perron-Frobenius theorem). Here are approximate eigenvectors for each eigenvalue:

    eigenvalue       eigenvector
    12.54            (1, 2.14, 6.8, 0.80)
    1.69 − 2.37i     (1, 4.33 − 3.8i, −3.93 + 5.47i, −0.36 − 2.00i)
    1.69 + 2.37i     (1, 4.33 + 3.8i, −3.93 − 5.47i, −0.36 + 2.00i)
    1.07             (1, −0.95, 0.08, −0.03)
In accordance with Perron-Frobenius, the eigenvector displayed for the dominant
eigenvalue is positive.
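These numbers are easy to reproduce outside of Sage. For instance, the following Python/NumPy sketch recomputes the eigenvalues of K, checks that the dominant eigenvector can be scaled to be positive, and confirms that K² has only positive entries:

import numpy as np

K = np.array([[2, 1, 7, 0],
              [1, 2, 8, 0],
              [1, 2, 9, 1],
              [2, 10, 0, 4]], dtype=float)

w, V = np.linalg.eig(K)
order = np.argsort(-np.abs(w))
print(np.round(w[order], 2))          # dominant eigenvalue first (about 12.54)
print(np.round(np.abs(w[order]), 2))

v = np.real(V[:, order[0]])           # eigenvector for the dominant eigenvalue
print(np.round(v / v[0], 2))          # compare with (1, 2.14, 6.8, 0.80) above

print((np.linalg.matrix_power(K, 2) > 0).all())   # K^2 has only positive entries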
Transition matrices. The case of interest for us is the transition matrix P for an
irreducible Markov chain.
Proposition. The Perron-Frobenius eigenvalue of the transition matrix P of an
irreducible Markov chain is 1.
Proof. Let λ be the Perron-Frobenius eigenvalue, and let w = (w_1, . . . , w_n) > 0 satisfy Pw = λw. Let a = (a_1, . . . , a_n) be an arbitrary row of P. Then we know 0 ≤ a_i ≤ 1 for all i and \sum_{i=1}^{n} a_i = 1. Let w_max be a largest component of w. Then
0 ≤ a · w = a1 w1 + · · · + an wn ≤ (a1 + · · · + an )wmax = wmax .
Therefore, each entry of P w is a nonnegative real number that is at most wmax .
It follows that λ ≤ 1. Further, the vector (1, . . . , 1) is a right eigenvector with
eigenvalue 1. Therefore, λ = 1. (Note that the eigenvalues of any square matrix A are the same as those of the transpose A^t. That's because det(A − xI_n) = det(A^t − xI_n).
The left and right eigenvectors for any particular eigenvalue, however, will usually
differ.)
Since P is irreducible, there is a path in the Markov chain between any pair of states, i.e., if x, y are states, then there exists r ≥ 1 such that P^r(x, y) > 0. If, in addition, the chain is aperiodic, there will exist a single r ≥ 1 such that P^r(x, y) > 0 for every pair of states. Therefore, according to Perron-Frobenius, for an irreducible aperiodic Markov chain, the Perron-Frobenius eigenvalue λ strictly dominates every other eigenvalue: λ > |µ| for all other eigenvalues µ.
Convergence. Let P be the transition matrix of an irreducible aperiodic Markov
chain (for instance, P could be the Google matrix). To get an idea of why we get convergence to the stationary distribution, imagine that we have a basis of eigenvectors v_1, . . . , v_n where v_i P = λ_i v_i and
1 = λ1 > |λ2 | ≥ · · · ≥ |λn |.
(In general, P won’t be diagonalizable—so there won’t be a basis of eigenvectors. In
that case, we’d need to consider a basis with respect to which P is in Jordan form.
The argument we are about to give would be much the same in that case.) Let π0 be
any initial probability distribution, and write
π0 = a1 v1 + · · · + an vn .
Applying P over and over, we get
    π_1 = π_0 P = a_1 v_1 + a_2 λ_2 v_2 + · · · + a_n λ_n v_n
    π_2 = π_1 P = a_1 v_1 + a_2 λ_2^2 v_2 + · · · + a_n λ_n^2 v_n
    π_3 = π_2 P = a_1 v_1 + a_2 λ_2^3 v_2 + · · · + a_n λ_n^3 v_n
        ⋮
    π_k = π_{k−1} P = a_1 v_1 + a_2 λ_2^k v_2 + · · · + a_n λ_n^k v_n.
Since 1 > |λ_i| for i > 1, it follows that λ_i^k → 0 for i > 1, and hence, π_k → a_1 v_1.
In other words, we get convergence to an eigenvector with eigenvalue 1. Further,
the speed of convergence is determined by the modulus of the second-largest eigenvalue, λ2 .
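The following Python/NumPy sketch illustrates the rate of convergence, using the four-page transition matrix H from the previous lecture as P (any irreducible aperiodic chain would do):

import numpy as np

P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   1/3, 1/3],
              [0,   1/2, 0,   1/2],
              [0,   1/2, 1/2, 0  ]])
pi = np.array([1, 3, 2, 2]) / 8            # stationary distribution of P

lam2 = np.sort(np.abs(np.linalg.eigvals(P)))[-2]   # second-largest modulus

x = np.array([1.0, 0.0, 0.0, 0.0])         # an arbitrary starting distribution
errs = []
for k in range(30):
    x = x @ P
    errs.append(np.abs(x - pi).sum())

# The error shrinks roughly by a factor of |lambda_2| per step, so these
# two numbers should be close.
print(round(lam2, 4), round(errs[-1] / errs[-2], 4))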
Last example. Consider the following collection of webpages and their links:
[Figure: a directed graph showing the links among eleven webpages, numbered 1 through 11; page 3 has no outgoing links.]
The matrix S = H + A is the 11 × 11 matrix determined by the figure: row x of S records the probability 1/d_x in the column of each page that x links to. The third row, with 1/11 in each entry, comes from the dangling node, 3.
Let α = 0.85 and consider the Google matrix

    G = αS + \frac{1 − α}{11} J,

where J is the all-ones matrix. Starting with initial distribution I = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
the following table illustrates convergence to the stationary distribution:

    k     IG^k
    1     (0.0136, 0.0136, 0.0136, 0.0136, 0.0136, 0.0136, 0.438, 0.0136, 0.0136, 0.438, 0.0136)
    10    (0.0169, 0.0562, 0.0403, 0.056, 0.0169, 0.0169, 0.090, 0.0169, 0.0169, 0.270, 0.403)
    15    (0.0167, 0.050, 0.0401, 0.05, 0.0167, 0.0167, 0.090, 0.0167, 0.0167, 0.391, 0.281)
    20    (0.0167, 0.0549, 0.0401, 0.0549, 0.0167, 0.0167, 0.0899, 0.0167, 0.0167, 0.340, 0.337)
    100   (0.0167, 0.0549, 0.0401, 0.0549, 0.0167, 0.0167, 0.0899, 0.0167, 0.0167, 0.357, 0.320)
Thus, page 10 is the most important, followed by page 11. Page 7 is next followed
by 2 and 4. The rest of the pages all have the same least ranking.
Homework assignments
Week 1, Friday
As on all homework due in Math 201 this semester, be sure to show your work for
full credit.
1. Calculations. For each of the following systems of linear equations
• Find the associated augmented matrix M .
• Compute the reduced row echelon form E for M . Show your work as in
class, specifying your row operations.
• From E determine whether there are solutions to the system. If there is a
unique solution, state it. If there are infinitely many solutions, express the
set of solutions in two ways: (i) parametrically, as in examples 2.4 and 2.5 in
Chapter One, Section I.2, and (ii) in vector form as in Chapter One, Section I.3.
(a)
x − 2y + z = 1
−4x + 2y − z = 0
3x + 3y − z = 1.
(b)
x + y + 3z = 3
−x + y + z = −1
2x + 3y + 8z = 4.
(c)
x − 2y + 2z = 5
x − y = −1
−x + y + z = 5.
(d)
2x − 2y − 3z = −2
3x − 3y − 2z + 5w = 7
x − y − 2z − w = −3.
(e)
x + y + 3z = 4
x + 2y + 4z = 5.
2. Some questions about conics.
(a) Let y = px2 + qx + r be the equation of a general parabola. By solving
a system of equations, find the constants p, q, and r so that the resulting
parabola passes through the points (−2, 15), (1, 3), and (2, 11).
(b) A (real) plane conic is a set of points of the form
C = {(x, y) ∈ R2 : ax2 + bxy + cy 2 + dx + ey + f = 0}
for some constants a, b, c, d, e, f ∈ R, not all zero. For example, the unit circle
centered at the origin is the conic specified by taking a = c = 1, b = d = e = 0,
and f = −1 to get the defining equation, x² + y² − 1 = 0. Note that the defining
equation of a conic is only determined up to a scalar multiple: for instance,
2x2 + 2y 2 − 2 = 0, the conic with a = c = 2, b = d = e = 0, and f = −2, also
determines the unit circle centered at the origin.
The parabola specified by y = px2 + qx + r is a plane conic with parameters
a = p, b = c = 0, d = q, e = −1, and f = r. Above, we saw an example
where three points determined a parabola. How many points in the plane do
you think must be specified to determine a conic, in general? Why? (Note:
You probably don’t have the tools yet to rigorously answer this question.)
Week 2, Tuesday
As on all homework due in Math 201 this semester, be sure to show your work for
full credit.
1. Prove that the following sets and operations do not form vector spaces. As
usual, to disprove something, you need to provide a concrete counterexample,
ideally as simple as possible.
(a) V = R², with

    \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ y_1 y_2 \end{pmatrix}  and  r · \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} r x_1 \\ y_1 \end{pmatrix}.

(b) V = R², with

    \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 + x_2 \\ y_1 + y_2 \end{pmatrix}  and  r · \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} r x_1 \\ 0 \end{pmatrix}.
(c) V = {(x, y) ∈ R2 : x + 2y = 3} with the usual addition and scalar multiplication for vectors in R2 .
2. Let V be a vector space over a field F , and let r, s ∈ F and ~v ∈ V . In preparation
for this problem, review the statement and proof of Lemma 1.16, p. 86, from your
reading. (Note that the proof given there could be improved by adding implication
arrows between the displayed equations and providing reasons for the implications.)
You may use this lemma in your solution.
(a) Prove that r · ~v = ~0 if and only if r = 0 or ~v = ~0. (Note: this is an “if and
only if” proof.)
(b) Prove that if ~v 6= ~0, then r · ~v = s · ~v if and only if r = s.
(c) If F = R, prove that any nontrivial vector space is infinite.
Week 2, Friday
1. In each of the following:
• Determine whether the given vector v is in the span of the set S by creating a
relevant system of linear equations in the usual form
    a_{11} x_1 + · · · + a_{1n} x_n = b_1
        ⋮
    a_{m1} x_1 + · · · + a_{mn} x_n = b_m
and then row reducing the corresponding augmented matrix for the system.
• If v is in the span of S, then explicitly write v as a linear combination of the
vectors in S.
Assume we are working over the field Q of rational numbers.
(a) v = (0, −1, −6), S = {(1, 0, −1), (2, 1, 3), (4, 2, 5)}.
(b) v = (1, 2, 4), S = {(1, 4, 7), (2, 5, 8), (3, 6, 9)}.
(c) v = x3 − 13x2 + 7x + 27, S = {x3 + 3x2 − 2, x3 + x2 + 4x + 1, 2x2 + x + 4}.
(d) v = \begin{pmatrix} 9 & 12 \\ 10 & 9 \end{pmatrix}, S = \left\{ \begin{pmatrix} 1 & 1 \\ 1 & −2 \end{pmatrix}, \begin{pmatrix} −1 & 2 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 2 & 4 \\ 3 & 5 \end{pmatrix} \right\}.
2. Here are two templates for showing a subset W of a vector space V over a field F
is a subspace:
Proof 1. First note that 0 ∈ W since ______. Hence, W ≠ ∅. Next, suppose that u, v ∈ W. Then ______. Hence, u + v ∈ W. Now suppose λ ∈ F and w ∈ W. Then ______. Therefore, λw ∈ W.

Proof 2. First note that 0 ∈ W since ______. Hence, W ≠ ∅. Next, suppose that λ ∈ F and u, v ∈ W. Then ______. Hence, λu + v ∈ W.
Use one of these two templates for each of the following exercises.
(a) Show that W = {(x, y, z) ∈ R3 : 3x + 2y − z = 0} is a subspace of R3 .
(b) Show that the set W = {f : R → R : f (t) = f (−t)} is a subspace of the vector
space of real-valued functions of one variable. (Hint: you will need to carefully
use the definitions given in Example 1.12, p. 84, of the text.)
Week 3, Tuesday
1. Determine whether the following sets are linearly dependent or linearly independent.
(a) {x3 + 2x2 , −x2 + 3x + 1, x3 − x2 + 2x − 1} in P3 (R).
(b) {(1, −1, 2), (1, −2, 1), (1, 1, 4)} in R3 .
(c) {(1, 1, 0), (1, 0, 1), (0, 1, 1)} in R3 .
(d) {(1, 1, 0), (1, 0, 1), (0, 1, 1)} in (Z/2Z)³, where Z/2Z is the field with two elements.
2. Let V be a vector space over R. Let u and v be distinct vectors in V . Prove that
{u, v} is linearly independent if and only if {u + v, u − v} is linearly independent.
3. (a) For any field F , we have defined the vector space F n of n-tuples with components in F . List all elements of (i) F 2 and (ii) F 3 in the case that F = Z/2Z.
(b) Let S = {u1 , ..., un } be a set of linearly independent vectors in a vector space
over Z/2Z. How many elements are in Span(S)? Justify your solution.
Week 3, Friday
1. Find the coordinates of each given vector v with respect to the ordered list of
linearly independent vectors B = hβ1 , . . . , βn i. (See Definition 1.13, p. 117.) Show
your work.
(a) v = (11, −6), B = h(1, 2), (−2, 3)i.
(b) v = (11, −6), B = h(1, 0), (0, 1)i.
(c) v = x2 + 7x − 5, B = h1, (x − 1), (x − 1)2 i.
(d) v = x2 + 7x − 5, B = h1, x, x2 , x3 i. (Note: x3 ∈ B).
(e) v = \begin{pmatrix} 3 & 7 \\ 8 & 11 \end{pmatrix},  B = \left\langle \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix}, \begin{pmatrix} −1 & 1 \\ −1 & 3 \end{pmatrix} \right\rangle.
2. Answer the following with “True” or “False”. No justifications or counterexamples
are required.
(a) If S is a linearly dependent set, then every vector in S is a linear combination
of the other vectors in S.
(b) Any set containing the 0 vector is linearly dependent.
(c) Every subset of a linearly dependent set is linearly dependent.
(d) A vector space cannot have more than one basis.
(e) The span of the empty set is the empty set.
3. Let X = {1, 2, 3}, and consider the vector space of functions

    R^X := \{f : X → R\}.

Recall that for f, g ∈ R^X and λ ∈ R, the vector space operations are defined as follows:

    (f + g)(x) = f(x) + g(x)  and  (λf)(x) = λ(f(x)).

Also recall that in order to prove that f = g, one would show that f(i) = g(i) for i = 1, 2, 3—that's how one shows they are the same function.
The zero function is z ∈ R^X, defined by z(1) = z(2) = z(3) = 0. It's the additive identity for the vector space. Also define the three characteristic functions χ_1, χ_2, χ_3 by

    χ_i(j) := \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i ≠ j \end{cases}
for i = 1, 2, 3. Thus, for instance, χ2 (1) = χ2 (3) = 0, and χ2 (2) = 1. Define B =
{χ1 , χ2 , χ3 }. Show that B is a basis for RX by completing the following steps.
(a) (Warm up) Let f ∈ RX be defined by f (1) = 5, f (2) = π, and f (3) = −7.
Write f as a linear combination of elements of B.
(b) Let g be an arbitrary element of RX . Show how to write g as a linear combination of elements of B. (Thus, B spans RX .)
(c) Show that B is a linearly independent set by proving that if
aχ1 + bχ2 + cχ3 = z
for some a, b, c ∈ R, then a = b = c = 0.
Week 4, Tuesday
1. Let A be an m × n matrix with i, j-th entry A_{ij}. The transpose of A, denoted A^T,
is the n × m matrix with i, j-th entry Aji : the i-th row of AT is the i-th column
of A. Thus, for example,

    \begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix}^T = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix}.
A matrix A is symmetric if AT = A. A matrix A is skew-symmetric if AT = −A.
A 3 × 3 skew-symmetric matrix has the form:


0
a b
 −a
0 c .
−b −c 0
Let W be the set of 3 × 3 skew-symmetric matrices over a field F .
(a) Prove that W is a subspace of the vector space of all 3 × 3 matrices over F .
(b) Give a basis for W .
(c) What is dim(W )?
2. Consider the following 12 randomly chosen vectors in (Z/3Z)4 :
(0, 1, 2, 0) (2, 1, 1, 2) (1, 2, 2, 1) (0, 2, 0, 2)
(1, 1, 0, 0) (0, 0, 1, 1) (2, 0, 0, 2) (2, 1, 0, 2)
(1, 2, 0, 1) (0, 0, 1, 2) (1, 2, 2, 2) (1, 2, 2, 2).
Find three subsets {v1 , v2 , v3 } of the above vectors with the property that v1 + v2 +
v3 = (0, 0, 0, 0).
3. Define the following matrix over the real numbers:

    M = \begin{pmatrix} −14 & 56 & 40 & 92 \\ 8 & −32 & −23 & −53 \\ 6 & −24 & −17 & −39 \\ −1 & 4 & 3 & 7 \end{pmatrix}.

(a) What is the reduced echelon form for M ? (You do not need to show your
work for this.)
(b) Compute (i) a basis for the row space of M and (ii) a basis for the column
space of M using the algorithm presented in class on Monday of Week 4.
(There is a unique solution.)
Week 5, Tuesday
1. For the following functions f :
i prove that f is a linear transformation,
ii find bases for N (f ) and R(f ), and
iii compute the nullity and the rank.
(a) f : R3 → R2 defined by f (x, y, z) = (x − y, 2z).
(b) f : R2 → R3 defined by f (x, y) = (x + y, 0, 2x − y).
(c) f : P_2(R) → P_3(R) defined by f(p(x)) = x · p(x) + p′(x).
(Recall that P_n(F) denotes the vector space of polynomials with coefficients in F of degree less than or equal to n. Here, p′(x) denotes the standard derivative from MATH 111.)
2. (a) Prove that there exists a linear transformation f : R2 → R3 such that f (1, 1) =
(1, 0, 2) and f (2, 3) = (1, −1, 4). What is f (8, 11)?
(b) Is there a linear transformation f : R3 → R2 such that f (1, 2, 1) = (2, 3),
f (3, 1, 4) = (6, 2) and f (7, −1, 10) = (10, 1)? Explain your reasoning.
3. Let V and W be vector spaces over F , and let f : V → W be a linear transformation.
(a) Prove that f is one-to-one if and only if f carries linearly independent subsets
of V to linearly independent subsets of W .
(b) Suppose that f is one-to-one and that S is a subset of V . Prove that S is
linearly independent if and only if f (S) is linearly independent.
Week 5, Friday
Let V be a vector space over F . An affine subspace of V is a subset of the form
A = p + U := {p + u : u ∈ U }
(39.1)
where p ∈ V and U is a linear subspace of V . The dimension of A is the dimension
of its linear part: dim A := dim U. If dim A = k, we call A a k-plane in V. A 1-plane
is a line, and a 2-plane is simply called a plane. A (dim V − 1)-plane is a hyperplane.
To give a parametrization of an affine subspace A = p + U of V , first choose a
basis {u1 , . . . , uk } of U , and then define
    ℓ : F^k → V
    (t_1, . . . , t_k) ↦ p + t_1 u_1 + · · · + t_k u_k.
Then ℓ is an injection whose image is A.
1. The expression of an affine subspace A as p + U is not unique. Show that
p+U =q+U
if and only if
p − q ∈ U.
(Note: Make sure to use the definition of p + U displayed in (39.1), above. Also,
recall from math 112 that in order to show two sets X and Y are equal, take x ∈ X
and show it must be in Y , then take y ∈ Y and show it’s in X.)
2. Let L be the line in R3 passing through the points (1, 1, 1) and (4, 7, 2).
(a) Find a system of two linear equations whose solution set is L. Show your
work.
(b) Express L as an affine subspace of R3 by finding p ∈ R3 and a linearly
independent set S ⊂ R3 such that L = p + Span(S).
(c) Give a parametrization of L.
3. Let H be the plane containing the points (1, 0, 0), (4, 1, −2), and (6, 1, 1).
(a) Find a linear equation whose solution set is H. Show your work.
(b) Express H as an affine subspace of R3 by finding p ∈ R3 and a linearly
independent set S ⊂ R3 such that H = p + Span(S).
(c) Give a parametrization of H.
Week 6, Tuesday
1. Let

    A = \begin{pmatrix} 1 & 3 \\ 2 & −1 \end{pmatrix},  B = \begin{pmatrix} 1 & 0 & −3 \\ 4 & 1 & 2 \end{pmatrix},  C = \begin{pmatrix} 1 & 1 & 4 \\ −1 & −2 & 0 \end{pmatrix},  D = \begin{pmatrix} 2 \\ −2 \\ 3 \end{pmatrix}.
Compute, if possible, the following. If it is not possible, explain why.
(a) AB,
(b) A(2B + 3C),
(c) A + C,
(d) (AB)D,
(e) A(BD),
(f) AD.
2. Let A and B be m × n matrices over F , and C an n × p matrix over F . Prove
that (A + B)C = AC + BC. (This is called the right distributivity property.)
3. (a) Prove that matrix multiplication of 2 × 2 matrices does not satisfy the commutative law, AB = BA.
(b) Find a nonzero 2 × 2 matrix A over R satisfying that for all 2 × 2 matrices B,
AB = BA.
Week 6, Friday
1. For each of the following matrices, use the algorithm from class to determine
whether they have inverses, and if so, find the inverse. Show your work (i.e., the
row reduction). (Pointer: as with many linear algebra problems, it’s easy to make
arithmetic mistakes, but it’s also easy to check your answer!)




    (a) \begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}        (b) \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & −1 \\ 1 & 5 & 4 \end{pmatrix}.
2. Answer the following questions True or False. You do not need to show your work.
(a) The rank of a matrix is the number of its nonzero columns.
(b) A matrix has rank zero if and only if it’s the zero matrix.
(c) The rank of an m × n matrix is at most min {m, n}, the minimum of m and n.
(d) If an n × n matrix has rank n, then it has an inverse.
(e) The dimension of the row space of a matrix is always equal to the dimension
of the column space of the matrix.
3. Let R[x]≤n be the vector space of polynomials in x of degree at most n with
coefficients in R. Define
    f : R[x]_{≤2} → R[x]_{≤3}
    p(x) ↦ \int_0^x p(t)\, dt.
Find the matrix representing f with respect to the bases {1, x, x2 } for R[x]≤2 and
{1, x, x2 , x3 } for R[x]≤3 .
4. Consider the linear mapping
f : R2 → R2
(x, y) 7→ (x + 2y, 4x + 3y).
Choosing the standard bases for both domain and codomain, the matrix representing f is
    A = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}.
What is the matrix representing f with respect to the basis {(1, 2), (1, −1)} for
both the domain and the codomain?
Week 7, Tuesday
1. Let Pn (R) be the vector space of polynomials in x of degree at most n with
coefficients in R. Let f : P2 (R) → P2 (R) and g : P2 (R) → R3 be the linear transformations respectively defined as
f(p(x)) = (3 + x)p′(x) + 2p(x)  and  g(a + bx + cx²) = (a + b, c, a − b).
Let B = h1, x, x2 i and D be the standard ordered basis for R3 .
(a) Compute the matrix representing f with respect to the basis B for both the
domain and codomain.
(b) Compute the matrix representing g with respect to the bases B and D.
(c) Compute the matrix representing g ◦ f with respect to the bases B and D.
Then use Theorem 2.7 (Chapter Three, Section IV) to verify your result.
2. Let V be a vector space over a field F . Recall that the identity function idV : V →
V is given by idV (v) = v for all v ∈ V . This function is linear (if you are not
convinced, prove it, but you do not have to turn in that proof).
(a) Let V be a vector space of dimension n and let B be an ordered basis for V .
Show that the matrix representing idV with respect to the basis B for both
the domain and the codomain is I (the n × n identity matrix).
(b) Let V and W be vector spaces of dimension n and let f : V → W be an
isomorphism with inverse f −1 : W → V . Let B and D be ordered bases for
V and W , respectively. If A is the matrix representing f with respect to the
bases B and D, what is the matrix for f −1 with respect to the bases D and
B?
(c) Consider the linear transformation f : R2 → R2 given by f (x, y) = (3x +
y, −x + 4y). Using part (b), prove that f is an isomorphism and find its
inverse.
Week 7, Friday
1. The trace of an n × n matrix A is the sum of its diagonal elements:
    tr(A) = \sum_{i=1}^{n} A_{ii}.
(a) If A and B are n × n matrices, prove that tr(AB) = tr(BA). (Use the
definition of matrix multiplication and summation notation in your proof.)
(b) If P is an invertible n × n matrix, prove that tr(P AP −1 ) = tr(A).
(c) Consider the following ordered basis for M_{2×2}:

    α = \left\langle \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\rangle.
It's easy to see that the trace is a linear function in general, but you do not need to prove that here.
Instead, compute the matrix representing the trace mapping tr : M2×2 → F with
respect to α for the domain and with respect to the basis {1} for the codomain.
2. Let V be a vector space. For each integer r > 0, we now give a provisional definition of a new vector space called ⋀^r V. A spanning set for ⋀^r V consists of expressions of the form v_1 ∧ · · · ∧ v_r where v_1, . . . , v_r ∈ V. For example, if u, v, w ∈ V, the following would be typical elements of ⋀² V:
ω = 4u ∧ v − 5u ∧ w + 7v ∧ w
µ = 2 u ∧ v + 9 u ∧ w + 6 v ∧ w.
Addition is done by combining like terms, and scaling is done by scaling each term.
For instance, continuing the example above, we get
ω + µ = 6 u ∧ v + 4 u ∧ w + 13 v ∧ w
5 ω = 20 u ∧ v − 25 u ∧ w + 35 v ∧ w.
We now add a couple of rules. First, these “wedge products” of vectors are linear
in each component. For vi ∈ V and a ∈ F ,
    v_1 ∧ · · · ∧ v_{i−1} ∧ (a v_i + v_i′) ∧ v_{i+1} ∧ · · · ∧ v_r = a\, v_1 ∧ · · · ∧ v_{i−1} ∧ v_i ∧ v_{i+1} ∧ · · · ∧ v_r + v_1 ∧ · · · ∧ v_{i−1} ∧ v_i′ ∧ v_{i+1} ∧ · · · ∧ v_r.
Second, we declare that v_1 ∧ · · · ∧ v_r = 0 if v_i = v_j for some i ≠ j. To illustrate these rules in action, suppose u, v, w ∈ V. Then in ⋀³ V, we have the following:
u ∧ (2u + 3v + 5w) ∧ w = u ∧ (2u) ∧ w + u ∧ (3v) ∧ w + u ∧ (5w) ∧ w
= 2u ∧ u ∧ w + 3u ∧ v ∧ w + 5u ∧ w ∧ w
= 0 + 3u ∧ v ∧ w + 0
= 3 u ∧ v ∧ w.
Another example, this time in ⋀² V:
(u + 2v) ∧ (u + 3v) = u ∧ (u + 3v) + (2v) ∧ (u + 3v)
= u ∧ u + u ∧ (3v) + (2v) ∧ u + (2v) ∧ (3v)
= 0 + 3u ∧ v + 2v ∧ u + 6v ∧ v
= 3u ∧ v + 2v ∧ u + 0
= 3 u ∧ v + 2 v ∧ u.
It turns out there is a little more we can do to simplify this last example. By the
second rule, we have (u + v) ∧ (u + v) = 0, since in this expression we have two
copies of the same vector. But linearly expanding this expression, we get
0 = (u + v) ∧ (u + v)
= u ∧ (u + v) + v ∧ (u + v)
=u∧u+u∧v+v∧u+v∧v
=0+u∧v+v∧u+0
= u ∧ v + v ∧ u.
Thus, u ∧ v + v ∧ u = 0. This means that
u ∧ v = −v ∧ u.
In fact, in a wedge product of vectors, swapping any two locations negates the
expression. (The proof is similar to the one we just gave in the case of r = 2.) For
instance,
u ∧ v ∧ w = −u ∧ w ∧ v = w ∧ u ∧ v = −w ∧ v ∧ u.
Continuing our example from above, we get
(u + 2v) ∧ (u + 3v) = . . . (see earlier calculation)
= 3u ∧ v + 2v ∧ u
= 3u ∧ v − 2u ∧ v
= u ∧ v.
Now for some problems:
(a) Let V = R2 , and take two vectors u = (a, b) and v = (c, d) in R2 . Let e1 =
(1, 0) and e2 = (0, 1). Writing u and v as linear combinations of e1 and e2 ,
find the number k in terms of a, b, c, d such that
    u ∧ v = k\, e_1 ∧ e_2,
in ⋀² V. What is the relation between k and det \begin{pmatrix} a & b \\ c & d \end{pmatrix}?
(b) Now let V = R3 , and take vectors u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) in R3 .
Writing these vectors as linear combinations of the standard basis vectors e1 =
(1, 0, 0), e2 = (0, 1, 0), and e3 = (0, 0, 1), find numbers p, q, r in terms of the ui
and the vi such that
u ∧ v = p e2 ∧ e3 − q e1 ∧ e3 + r e1 ∧ e2 .
(Watch out for the minus sign in front of q.) Physics students may note a
relation with the cross product of two vectors in R3 .
Week 8, Tuesday
1. Compute the determinant of the following matrices by using row operations.

    (a) \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}

    (b) \begin{pmatrix} 1 & −2 & 1 & 2 \\ 2 & −4 & 1 & 0 \\ 0 & 0 & −1 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}

    (c) \begin{pmatrix} 4 & −1 & −1 & −1 \\ −1 & 4 & −1 & −1 \\ −1 & −1 & 4 & −1 \\ −1 & −1 & −1 & 4 \end{pmatrix}

    (d) (BONUS) Generalize part (c) for the n × n matrix \begin{pmatrix} n & −1 & \cdots & −1 \\ −1 & n & \cdots & −1 \\ \vdots & \vdots & \ddots & \vdots \\ −1 & −1 & \cdots & n \end{pmatrix}, with n in the main diagonal and −1 everywhere else.
In the next two exercises we will prove that the determinant is multiplicative, that
is, that for n × n matrices A and B,
det(AB) = det(A) det(B).
2. Let B be a fixed n × n matrix over F such that det(B) 6= 0. Consider the function
d : Mn×n (F ) −→ F
defined by d(A) = det(AB)/ det(B). You will prove that d(A) = det(A). For a
matrix A, we write (r1 , . . . , rn ) for the rows of A, with each ri ∈ F n .
(a) Prove that d satisfies that d(r1 , . . . , rj + k · ri , . . . , rn ) = d(r1 , . . . , rj , . . . , rn )
for all i ≠ j and any k ∈ F.
(Some suggested notation to help in your proof: let c1 , . . . , cn be the columns
of B. Then (AB)s,t = rs · ct , i.e., the s, t-entry of AB is the dot product of
the s-th row of A with the t-th column of B. Recall that the dot product is
defined by (x_1, . . . , x_n) · (y_1, . . . , y_n) = x_1 y_1 + · · · + x_n y_n. Letting A′ be the matrix with rows (r_1, . . . , r_j + k · r_i, . . . , r_n), you will need to compare the rows of AB and A′B.)
(b) Prove that d satisfies that d(r1 , . . . , k · ri , . . . , rn ) = kd(r1 , . . . , ri , . . . , rn ) for
all i = 1, . . . , n and k ∈ F .
(c) Prove that d(In ) = 1.
(d) Deduce that for all A, we have that d(A) = det(A), and that det(AB) =
det(A) det(B).
3. We still need to prove that det(AB) = det(A) det(B) when det(B) = 0.
(a) Let f : V → W and g : W → U be linear transformations of finite dimensional
vector spaces over F . Show that
N (f ) ⊆ N (g ◦ f )
and
R(g ◦ f ) ⊆ R(g),
where N and R denote the nullspace (or kernel) and the range, respectively.
(b) Use part (a) to prove that rank(g ◦ f ) ≤ rank(f ) and rank(g ◦ f ) ≤ rank(g).
(Hint: For one of them you might need to use the rank-nullity theorem.)
(c) Let A be an m × n matrix over F , and B an n × p matrix over F . Prove that
rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).
(d) Conclude that if both A and B are n × n matrices such that either det(A) = 0
or det(B) = 0, then det(AB) = 0.
Week 8, Friday
1. Let A and B be n × n matrices. Prove the following facts about their transposes:
(a) (AB)t = B t At .
(b) If A is invertible, then (At )−1 = (A−1 )t . (Hint: use part (a).)
2. Let

    A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 2 \end{pmatrix}.

Find elementary matrices E_1, . . . , E_ℓ such that E_ℓ · · · E_2 E_1 A is the reduced echelon
form of A. (Check your work.)
3. Read the attached exposition on Cramer’s rule before attempting the next problems.
(a) Consider the 3 × 3 system of equations over the real numbers:

    \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & 2 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 4 \\ 0 \\ 2 \end{pmatrix}.
Use Cramer’s rule to compute x2 . (You may assume the system is consistent.)
(b) Consider the following matrix over the complex numbers:

    A = \begin{pmatrix} 1 + i & 0 & 0 \\ 0 & 1 & 0 \\ i & 0 & 1 − i \end{pmatrix}.
Compute each entry of adj(A) by hand, and then use the formula coming
from Cramer’s rule to compute A−1 .
Cramer’s rule
Let A be an invertible n × n matrix, and let b ∈ F n . Consider the n × n system of
linear equations Ax = b where x is the column vector with entries (x1 , . . . , xn ). For
each j = 1, . . . , n, let M_j be the n × n matrix formed by replacing the j-th column of A by b. Cramer's rule says that the solution to the system is given by

    x_j = \frac{\det(M_j)}{\det(A)},

for j = 1, . . . , n.
Example. Consider the system of equations

    ax + by = s
    cx + dy = t.

In matrix form, we write the system as

    \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} s \\ t \end{pmatrix}.

Assume the determinant of \begin{pmatrix} a & b \\ c & d \end{pmatrix} is nonzero, and apply Cramer's rule:
    \det(M_1) = \det \begin{pmatrix} s & b \\ t & d \end{pmatrix} = sd − bt,  \qquad  \det(M_2) = \det \begin{pmatrix} a & s \\ c & t \end{pmatrix} = at − sc.

By Cramer's rule, the solution to the system is

    x = \frac{\det(M_1)}{\det(A)} = \frac{sd − bt}{ad − bc},  \qquad  y = \frac{\det(M_2)}{\det(A)} = \frac{at − sc}{ad − bc}.
Let's check this solution:

    \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} s \\ t \end{pmatrix}  \Longrightarrow  \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{−1} \begin{pmatrix} s \\ t \end{pmatrix}.

It is easy to check that

    \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{−1} = \frac{1}{ad − bc} \begin{pmatrix} d & −b \\ −c & a \end{pmatrix}.

Hence, the solution is

    \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{−1} \begin{pmatrix} s \\ t \end{pmatrix} = \frac{1}{ad − bc} \begin{pmatrix} d & −b \\ −c & a \end{pmatrix} \begin{pmatrix} s \\ t \end{pmatrix} = \frac{1}{ad − bc} \begin{pmatrix} ds − bt \\ −cs + ta \end{pmatrix}.
This agrees with the solution we calculated using Cramer’s rule.
Cramer’s rule and inverses. Suppose that A is an invertible n × n matrix. To find
the inverse of A, we need to find a matrix X such that AX = In . Finding the j-th
column of X is the same as solving the system Ax = ej , where ej is the j-th standard
basis vector. We can then solve for x using Cramer’s rule n times—once for each
column. We now describe the resulting formula for the inverse of A. First, some
notation: for each i, j ∈ {1, . . . , n} let Aji be the (n − 1) × (n − 1) matrix formed
by removing the j-th row and i-th column of A. Next, define the adjugate of A,
denoted adj(A) by
(adj(A))ij = (−1)i+j det(Aji ).
The scalar (−1)i+j det(Aji ) is called the ji-th cofactor of A. Note that we are defining
the ij-th entry of the adjugate using the ji-th cofactor—the indices reverse order.
Cramer’s rule applied to the problem of finding the inverse then gives the following
important formula:
    A^{−1} = \frac{1}{\det(A)}\, \mathrm{adj}(A).
Example. As a simple example of Cramer's formula for the inverse, let

    A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.

In this case, each A_{ji} is a 1 × 1 matrix. We get

    (adj(A))_{11} = (−1)^{1+1} det(A_{11}) = det([d]) = d
    (adj(A))_{12} = (−1)^{1+2} det(A_{21}) = −det([b]) = −b
    (adj(A))_{21} = (−1)^{2+1} det(A_{12}) = −det([c]) = −c
    (adj(A))_{22} = (−1)^{2+2} det(A_{22}) = det([a]) = a.

Thus, Cramer's formula gives the formula for the inverse of A we used earlier:

    A^{−1} = \frac{1}{\det(A)}\, \mathrm{adj}(A) = \frac{1}{ad − bc} \begin{pmatrix} d & −b \\ −c & a \end{pmatrix}.
Cramer’s rule: continuity of solutions and of the inverse. In general, Cramer’s
rule is not a time-efficient or numerically stable way to compute the solution to a
system of equations. However, it is theoretically useful as we see from the following
immediate corollaries of the rule:
Theorem. Let F be R or C, and let GL_n(F) denote the set of invertible n × n matrices over F.
1. The function

    GL_n(F) → GL_n(F),  A ↦ A^{−1}

is a continuous function. In other words, the inverse of A is a continuous function of the entries of A.
2. The solution x to the system Ax = b is a continuous function of the entries of A and b.
Proof. For part (1), it suffices to show that the entries of A−1 are rational functions
(i.e., quotients of polynomials) in the entries of A (with denominators that do not
vanish for invertible A). But this follows immediately from Cramer's rule:

    A^{−1} = \frac{1}{\det(A)}\, \mathrm{adj}(A).
The function A 7→ det(A) ∈ F is a polynomial in the entries of A (consider the permutation or Laplace expansion of the determinant), hence continuous. Hence, restricted
to invertible matrices, the function A 7→ 1/ det(A), which gives the denominators of
the entries of A−1 , is continuous. Similarly, the entries of adj(A) are polynomials in
the entries of A. The result follows.
Part (2) follows since Ax = b implies x = A−1 b. We’ve just seen that the entries
of A−1 are quotients of polynomials in the entries of A, hence the components of x
are quotients of polynomials in the entries of A and b.
Week 9, Tuesday
1. Compute the determinants of the following matrices by using the permutation
expansion.




    (a) \begin{pmatrix} 0 & 1 & 2 \\ −1 & 0 & −3 \\ 2 & 3 & 0 \end{pmatrix}        (b) \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 0 & 0 \\ 4 & 0 & 0 & 0 \end{pmatrix}
2. Compute the determinants of the same matrices as in Problem 1 by using the
Laplace expansion along any row or column.
3. Let f(x_1, . . . , x_n) be a polynomial in n variables with coefficients in a field F. An arbitrary term of this polynomial is of the form a x_1^{d_1} x_2^{d_2} · · · x_n^{d_n}, where a ∈ F and d_i is a nonnegative integer for all i. The total degree of this term is d_1 + · · · + d_n.
Here is a result from polynomial algebra. If f satisfies the conditions:
(i) f(x_1, . . . , x_n) = 0 whenever x_i = x_j for i ≠ j;
(ii) the total degree of every term is n(n − 1)/2,
then f(x_1, . . . , x_n) = k(x_2 − x_1)(x_3 − x_1) · · · (x_n − x_{n−1}), where the product contains all the terms of the form x_j − x_i with 1 ≤ i < j ≤ n.
Note that the coefficient k is equal to the coefficient of x_2 x_3^2 · · · x_{n−1}^{n−2} x_n^{n−1} in f.
Now consider the Vandermonde matrix

    V(x_1, . . . , x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n−1} & x_2^{n−1} & \cdots & x_n^{n−1} \end{pmatrix}.







Let f (x1 , . . . , xn ) = det(V (x1 , . . . , xn )).
(a) Using properties of determinants, prove that f satisfies property (i).
(b) Using the permutation expansion of the determinant, prove that f satisfies
property (ii).
(c) Find the coefficient k.
(d) (Optional bonus problem.) A general polynomial of degree d in one variable
over the real numbers has the form
p(x) = a0 + a1 x + a2 x2 + · · · + ad xd ,
where the ai are real numbers. Pick n distinct real numbers x1 , . . . , xn , and
pick arbitrary (not necessarily distinct) real numbers b1 , . . . , bn . Prove that
there is a unique polynomial p(x) of degree n − 1 over the real numbers such
that p(xi ) = bi for i = 1, . . . , n.
(e) (Optional bonus problem.) Use the Vandermonde determinant to prove that
the collection of functions {eαx : α ∈ R} is linearly independent. (Recall that
the set of functions from R to R is a vector space. The solution to this problem
would show that this space is infinite dimensional.)
Week 10, Tuesday
1. For each of the following matrices A ∈ Mn×n (F )
(i) Determine all eigenvalues of A.
(ii) For each eigenvalue λ of A, find the set of eigenvectors corresponding to λ.
(iii) If possible, find a basis for F n consisting of eigenvectors of A.
(iv) If successful in finding such a basis, determine an invertible matrix P and a
diagonal matrix D such that A = P DP −1 .
    (a) A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} for F = R.

    (b) A = \begin{pmatrix} 0 & −2 & −3 \\ −1 & 1 & −1 \\ 2 & 2 & 5 \end{pmatrix} for F = R.

    (c) A = \begin{pmatrix} 7 & −5 \\ 10 & −7 \end{pmatrix} for F = R.

    (d) A = \begin{pmatrix} 7 & −5 \\ 10 & −7 \end{pmatrix} for F = C.

    (e) A = \begin{pmatrix} 2 & 0 & −1 \\ 4 & 1 & −4 \\ 2 & 0 & −1 \end{pmatrix} for F = R.
2. Let f : V → V be a linear transformation. For a positive integer m, we define
f m inductively as f ◦ f m−1 . Prove that if λ is an eigenvalue for f , then λm is an
eigenvalue for f m .
3. Let T : Mn×n (R) → Mn×n (R) defined as T (A) = At (taking the transpose).
(a) Show that the only eigenvalues of T are 1 and -1. (Hint: Problem 2 might
help.)
(b) For n = 2, describe the eigenvectors corresponding to each eigenvalue.
(c) Find an ordered basis B for M2×2 (R) such that the matrix that represents T
with respect to B is diagonal.
(d) Repeat part (b) for an arbitrary n > 2.
(e) Repeat part (c) for Mn×n (R).
Week 10, Friday
1. Here is a matrix and its reduced echelon form:

    A = \begin{pmatrix} 1 & −1 & 1 & 4 & −6 & 4 & 4 \\ 0 & 1 & −3 & −1 & 2 & 0 & −3 \\ 0 & 2 & −6 & −1 & 3 & 2 & −5 \\ −1 & −1 & 5 & −2 & 2 & −4 & 2 \\ 0 & 1 & −3 & 3 & −2 & 8 & 1 \end{pmatrix},
    Ã = \begin{pmatrix} 1 & 0 & −2 & 0 & −1 & −2 & −2 \\ 0 & 1 & −3 & 0 & 1 & 2 & −2 \\ 0 & 0 & 0 & 1 & −1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
Let fA : R7 → R5 be the linear function determined by A.
(a) Find a basis for the kernel (nullspace) of f_A. (Note that you can easily check whether your basis vectors are in the kernel.)
(b) Find a basis for the range of fA .
(c) What does the rank-nullity theorem say in this particular case of fA ?
2. Let V denote the vector space over R consisting of all polynomials with real coefficients having degree at most 3. That is,
V = {a0 + a1 x + a2 x2 + a3 x3 : a0 , a1 , a2 , a3 ∈ R},
with the usual addition and scalar multiplication of polynomials. Define the following linear operator on V (in which the prime denotes differentiation),
    L : V → V
    f ↦ x f′ + f′.
(a) Write the matrix of L with respect to the basis {1, x, x2 , x3 } of V .
(b) What are the eigenvalues of L?
(c) Does V have a basis of eigenvectors of L? If so, give such a basis (written
as polynomials not tuples of real numbers), and if not, explain why not.
3. Consider the matrix


    B = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \end{pmatrix}.
(a) What are the algebraic and geometric multiplicities of each of the eigenvalues
of B?
(b) Explain whether B is diagonalizable in terms of the geometric multiplicities
of its eigenvalues.
Week 11, Tuesday
1. Consider the cycle graph C4 :
[Figure: the cycle graph C4 on the vertices v1, v2, v3, v4.]
(a) Find the adjacency matrix A = A(G).
(b) Compute A4 and use it to determine the number of walks from v1 to v3 of
length 4. List all of these walks (these will be ordered lists of 5 vertices).
(c) What is the total number of closed walks of length 4?
(d) Compute and factor the characteristic polynomial for A.
(e) What are the algebraic multiplicities of each of the eigenvalues?
(f) Diagonalize A using our algorithm: compute bases for the eigenspaces of each
of the eigenvalues you just found, and use them to construct a matrix P such
that P −1 AP is a diagonal matrix with the eigenvalues along the diagonal.
(g) Use part (f) to find a closed expression for A^ℓ for each ℓ ≥ 1.
(h) Take the trace of A^ℓ to get a formula for the number of closed walks of length ℓ for each ℓ ≥ 1. (You can check your result against the formula given in class.)
2. In this exercise we will prove the theorem from class:
“Let A be the adjacency matrix for a graph G with vertices v_1, . . . , v_n, and let ℓ ∈ Z_{≥0}. Then the number of walks of length ℓ from v_i to v_j is (A^ℓ)_{ij}.”
(a) Let p(i, j, ℓ) denote the number of walks of length ℓ in G from v_i to v_j. Prove that for all i, j = 1, . . . , n and ℓ ≥ 1,

    p(i, j, ℓ) = \sum_{k=1}^{n} p(i, k, ℓ − 1)\, p(k, j, 1).
(Hint: Part of the trick is to parse this formula appropriately.)
(b) Prove the theorem by induction on `, using the result from part (a).
Week 11, Friday
Note: you will want to do problem 1 first.
1. Find the solution to the system of differential equations
    x_1′ = x_1 + x_2
    x_2′ = x_1

with initial condition (x_1(0), x_2(0)) = (1, 0). Your solution should include the diagonalization of a matrix. You may find the following notation useful:

    φ = \frac{1 + \sqrt{5}}{2}  and  φ̄ = \frac{1 − \sqrt{5}}{2},

with useful relations φφ̄ = −1, φ² = φ + 1, and φ + φ̄ = 1. (Warning: You will
want to make sure you get the diagonalization perfect. This will take some time.
Using the above notation as much as possible will help.)
2. Find a closed form for the number of walks of length ℓ from v1 to v1 (closed walks at v1) in the graph
[Figure: a graph on the two vertices v1 and v2.]
3. Consider a sequence of numbers pn defined recursively by fixing constants a and b,
next assigning initial values for p0 and p1 , and then for n ≥ 1 letting
pn+1 = apn + bpn−1 .
For instance, letting a = 2, b = −1, p0 = 0, and p1 = 1, we get
p0 = 0
p1 = 1
pn+1 = 2pn − pn−1
for n ≥ 1,
which defines the sequence
0, 1, 2, 3, 4, 5, 6, . . .
Given any sequence of this form, we get the following matrix equation:
    \begin{pmatrix} p_{n+1} \\ p_n \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} p_n \\ p_{n−1} \end{pmatrix}.

So we have

    \begin{pmatrix} p_2 \\ p_1 \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} p_1 \\ p_0 \end{pmatrix},        (39.2)
which implies

    \begin{pmatrix} p_3 \\ p_2 \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} p_2 \\ p_1 \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix} \begin{pmatrix} p_1 \\ p_0 \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix}^2 \begin{pmatrix} p_1 \\ p_0 \end{pmatrix},

and so on. In general, we have

    \begin{pmatrix} p_{n+1} \\ p_n \end{pmatrix} = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} p_1 \\ p_0 \end{pmatrix}.        (39.3)

Let

    A = \begin{pmatrix} a & b \\ 1 & 0 \end{pmatrix},
and suppose A is diagonalizable. Take P so that
P −1 AP = D = diag(λ1 , λ2 ).
We have seen that it follows that An = P Dn P −1 , so that equation (39.3) becomes
    \begin{pmatrix} p_{n+1} \\ p_n \end{pmatrix} = P D^n P^{−1} \begin{pmatrix} p_1 \\ p_0 \end{pmatrix} = P \begin{pmatrix} λ_1^n & 0 \\ 0 & λ_2^n \end{pmatrix} P^{−1} \begin{pmatrix} p_1 \\ p_0 \end{pmatrix}.
Thus, we get a closed form expression for p_n in terms of powers of the eigenvalues of A (just take the second component of the product on the right-hand side of the above equation). A short numerical check of this recipe appears at the end of this assignment.
Let a = b = 1, p0 = 0, and p1 = 1.
(a) Write out the first several values for the sequence (pn ).
(b) Write the corresponding matrix equation, as above.
(c) Diagonalize the matrix A and compute the corresponding equation for pn in
terms of powers of the eigenvalues of A.
Week 12, Tuesday
1. Let

    A = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}.
(a) Compute, by hand, the powers Ak for all k. (After computing the 4-th power,
you will see that it’s actually possible to answer this question succinctly.)
(b) Use your answer to the previous problem to compute e^{At}, directly from the definition: e^{At} = \sum_{k=0}^{∞} \frac{1}{k!} A^k t^k. (You should see some extremely well-known power series arising as the components of this 2 × 2 matrix.)
(c) Diagonalize A: find a matrix P such that P −1 AP = D where D is a diagonal
matrix (with the eigenvalues of A along the diagonal). You will need to work
over C.
(d) Use your solution to the previous problem and the method given in class to
compute eAt (and check agreement with your previous calculation).
(e) Using the exponential matrix you found in part (b), solve the system of differential equations
    x_1′ = −x_2
    x_2′ = x_1,
subject to the initial conditions that x1 (0) = 1 and x2 (0) = 0. Do the same
for the initial conditions x1 (0) = 0 and x2 (0) = 1.
2. A rhombus is a parallelogram all of whose sides have equal length. Using the
standard inner product in R2 , prove that a parallelogram is a rhombus if and only
if its diagonals meet perpendicularly.
(Hint: take two arbitrary vectors x, y ∈ R2 and consider the parallelogram determined by x and y.)
3. Consider the inner product space (R², ⟨ , ⟩) presented as an example in class, where the inner product is defined as

    ⟨(x_1, x_2), (y_1, y_2)⟩ = 3x_1 y_1 + 2x_1 y_2 + 2x_2 y_1 + 4x_2 y_2.
(a) Compute the lengths of the vectors (1, 0) and (0, 1).
(b) Compute the cosine of the angle between (1, 0) and (0, 1). Are these vectors perpendicular?
4. Recall the inner product defined on Mm×n (F ), where F = R or C: for A, B ∈
Mm×n (F ), we define
    ⟨A, B⟩ = tr(B*A),
where B* = \overline{B}^{\,t} is the conjugate transpose. In this problem we will verify that this
function does indeed satisfy two of the axioms of an inner product.
(a) Prove that this inner product does indeed satisfy conjugate symmetry: for all
A, B ∈ Mm×n (F ),
    ⟨A, B⟩ = \overline{⟨B, A⟩}.
(b) Prove that this inner product is indeed positive-definite: for all A ∈ Mm×n (F )
with A ≠ 0,
    ⟨A, A⟩ > 0.
Week 13, Tuesday
1. Let V be an n-dimensional vector space over F = R or C, and let ⟨ , ⟩ be an inner product on V. Let B = {v_1, . . . , v_n} be an ordered basis for V (not necessarily orthonormal). Let A be the n × n matrix given by

    A_{ij} = ⟨v_i, v_j⟩.
Recall that for x ∈ V , [x]B denotes the coordinate vector for x with respect to the
basis B, and as usual, we will think of this vector in F n as an n × 1 matrix.
(a) Prove that for all x, y ∈ V,

    ⟨x, y⟩ = ([x]_B)^t A \overline{[y]_B}.

(Recall that for a matrix C, we define C̄ by (C̄)_{ij} = \overline{C_{ij}}, and then we define the conjugate transpose by C* = C̄^t. Hint: compute both sides using sum notation. On the right-hand side, you will be computing the 1,1-entry of a 1 × 1 matrix.)
(b) Prove that the matrix A satisfies A = A∗ .
(c) If the basis B is orthonormal, what is the matrix A?
(d) (Bonus) Let D be another ordered basis for V , and let C be the associated
n × n matrix. How are A and C related?
2. Let V be the vector space of all continuous functions [0, 1] → R with inner product ⟨f, g⟩ = \int_0^1 f(t)g(t)\, dt. Let W be the subspace spanned by {t, √t}. (Warning:
to get this problem right, you will need to be very careful with your calculations
and double-check your solutions.)
(a) Apply Gram-Schmidt to {t, √t} to compute an orthonormal basis {u_1, u_2} for W.
(b) Find the closest function in W to f(t) = t². Express your solution in two forms: (i) as a linear combination of u_1 and u_2, and (ii) as a linear combination of t and √t.
Week 13, Friday
Show your work in the following problems, but feel free to use mathematical software
to do the matrix algebra (computing products and inverses).
1. Hooke’s law for a spring says there is a linear relationship between the length of
a spring and the force applied to the spring: F = mx + b. Use the following data
to estimate the spring constant m (length in inches and force in pounds):
    length x    force F
    3.5         1.0
    4.0         2.2
    4.5         2.8
    5.0         4.3
2. The method of least squares can be used to fit higher degree polynomials to data
(not just lines). Let p1 = (0, 6), p2 = (1, 0), and p3 = (2, 0).
(a) A parabola is the graph of a function of the form f (t) = ct2 + dt + e
where c, d, e ∈ R. Find the three linear conditions on the coefficients of f
that must be satisfied if its graph is to pass through p1 , p2 , and p3 , expressing
these conditions in the form Ax = y where A ∈ M3×3 (R).
(b) Solve the system to compute the coefficients c, d, e.
(c) Let p4 = (3, 3), and now find the four linear conditions on the coefficients of f
if its graph is to pass through p1 , p2 , p3 , and p4 . Express the conditions in
the form Ax = y, where now A ∈ M4×3 (R).
(d) The earlier version of the matrix A was invertible, and hence, there was a
unique parabola passing through the first three points, but that’s no longer
the case. Use the technique presented in class involving adjoints to compute
the coefficients c, d, and e that minimize the error ky − Axk.
Week 14, Tuesday
1. For each of the transition matrices: (i) draw the corresponding graph with edges
labeled by the probabilities and (ii) state whether the resulting Markov chain is
irreducible and whether it is aperiodic.


    (a) \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}    (b) \begin{pmatrix} 1 & 0 \\ 0.3 & 0.7 \end{pmatrix}    (c) \begin{pmatrix} 0.2 & 0 & 0.8 \\ 0 & 1 & 0 \\ 0 & 0.4 & 0.6 \end{pmatrix}

    (d) \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}    (e) \begin{pmatrix} 0.4 & 0.6 & 0 \\ 0 & 0.2 & 0.8 \\ 0.7 & 0.1 & 0.2 \end{pmatrix}.
2. If it is raining and I have an umbrella, I use it while walking between my home
and my office. However I only have four umbrellas to distribute between the two
locations. So there’s a problem: For instance, suppose I am at my office, it’s
raining, and I have only one umbrella there. I’ll take the umbrella with me on
my walk home, and thus all four umbrellas are at home. On my next walk to my
office, suppose it is not raining. I leave all four umbrellas at home and go to work.
When it’s time to go home, if it’s raining, I’ll get wet.
Model this behavior with a Markov chain. The states are 0, 1, 2, 3, 4, where state i means I have i umbrellas in my current location. Suppose the probability
of rain at any time is 0.1. For example, if I am at state 3, then there are 3
umbrellas at my current location, and there is 1 umbrella at my destination. With
probability 0.1, I’ll take an umbrella with me, and when I get to my destination
there will be 2 umbrellas there. Hence, with probability 0.1, I move from state 3
to state 2.
(a) Draw the Markov chain. (Laying out the states in the order 0, 4, 1, 3, 2 may
help.)
(b) Find the transition matrix P for the chain. (We have already determined that
P (3, 2) = 0.1).
(c) Is this chain irreducible?
(d) Is it aperiodic? (Note the loop at state 2.).
(e) Find a stationary distribution π. This means you need to find a probability
distribution π such that π = πP . Taking transposes, you’ll see that the
vector π is a (right) eigenvector for P t with eigenvalue 1 scaled so that the
sum of its entries is 1.
(f) What is the probability of getting wet when I leave my location? This is the
probability of being in state 0 and having it rain.
handouts
Mathematical writing.
Here are some first pointers for mathematical writing:
• A proof consists solely of complete sentences. A sentence starts with a capital letter
and ends with a period. Avoid starting a sentence with a mathematical symbol.
• When giving a counterexample (disproving) a statement, be as concrete and specific
as possible. Try to find the simplest counterexample. In this way, the form your
argument should take when disproving a statement is the opposite of that used in
providing a proof. A proof requires a general argument, covering all possible cases.
• The symbol “⇒” means “implies”, as in x = 3 ⇒ 2x = 6. It does not mean
“equals” or “my next thought is”, etc. As you are reading what you have written,
make sure that substituting the word “implies” for “⇒” makes sense.
• The symbol “⇔” means “if and only if” (which is sometimes abbreviated “iff”). If
P and Q are statements that are either true or false, then P ⇔ Q means P ⇒ Q
and Q ⇒ P , i.e., the truth of P implies the truth of Q, and conversely, the truth
of Q implies that of P .
• The symbol “∀” means “for all” and “∃” means “there exists”. I use “s. t.” for
“such that”. I do not use “∋” for “such that”, since it conflicts with the following usage: {1, 2, 3} ∋ 2. Also, do not use “∨” for “or”, “∧” for “and”, or “∼” for “not”.
It is just as easy to write the words, and it is a lot easier to read the words.
• If you have given a proof by contradiction or by proving the contrapositive, consider
whether a straightforward proof is at least as clear as the one you have given.
• When writing down a calculation, avoid crossing out terms (for example, when
terms cancel in fractions or when they add up to zero). This type of bookkeeping
is easy for the writer, who is crossing out sequentially, but is usually confusing for
the reader.
• If you say some statement follows “by definition”, make sure it follows directly
from a definition in the statement. Rule of thumb: if you use the phrase “by
definition” in a proof, make sure to be specific, e.g. “by definition of the derivative
. . . ”. In some sense, every true statement follows “from the definitions”. Your
proof should guide the reader by showing the relevance of various definitions and
already-established results to the statement you are trying to prove.
Definition of the determinant.
Let Mn×n (F ) denote the vector space of n × n matrices with entries in the field F .
If A ∈ Mn×n (F ), we write A = (r1 , . . . , rn ) where ri is the i-th row of A.
Definition. The determinant is a function
det : Mn×n (F ) → F
that is multilinear, alternating, and takes the value 1 at the identity matrix. In detail,
the determinant satisfies:
1. Multilinearity: The determinant is a linear function of each row. For all i =
1, . . . , n and for all λ ∈ F ,
    det(r_1, . . . , r_i + λr_i′, . . . , r_n) = det(r_1, . . . , r_i, . . . , r_n) + λ det(r_1, . . . , r_i′, . . . , r_n).
(In these matrices, it is understood that for j 6= i the j-th row is rj . We are only
changing the i-th row.)
2. Alternating: If ri = rj , then det(r1 , . . . , rn ) = 0.
3. Normalization: det(In ) = 1.
Theorem. For each n ≥ 0, a determinant function exists and is unique.
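As an illustration (not a proof of existence or uniqueness), the following Python/NumPy sketch spot-checks the three defining properties on a random 4 × 4 matrix, with numpy.linalg.det playing the role of the determinant:

import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
r_new = rng.standard_normal(n)       # a replacement row r_i'
lam = 2.5
i = 1

# 1. Multilinearity in row i:
#    det(..., r_i + lam*r_i', ...) = det(..., r_i, ...) + lam * det(..., r_i', ...).
B, C = A.copy(), A.copy()
B[i] = A[i] + lam * r_new
C[i] = r_new
print(np.isclose(np.linalg.det(B), np.linalg.det(A) + lam * np.linalg.det(C)))

# 2. Alternating: a repeated row forces determinant 0.
D = A.copy()
D[2] = D[0]
print(np.isclose(np.linalg.det(D), 0.0))

# 3. Normalization: det(I_n) = 1.
print(np.isclose(np.linalg.det(np.eye(n)), 1.0))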
Midterm II review.
Some things you should know for the second midterm:
1. This course is cumulative, so you should review the topics covered in the first
midterm.
2. Know the definition of a linear transformation and how to prove a function between
vector spaces is a linear transformation.
3. What is an isomorphism of vector spaces?
4. If f : V → W is a linear transformation, what is the range of f ? What is the null
space of f ? What is the kernel of f ?
5. If f : F n → F m is a linear transformation corresponding to a matrix A, be able to
compute bases for the range and for the null-space (kernel) of f from the reduced
echelon form of A.
6. State the rank-nullity theorem.
7. Lines, planes, hyperplanes: equations and parametrizations: know how to parametrize
a line, plane, or hyperplane.
8. Given an n × n matrix of full rank, use our row reduction algorithm to find its
inverse.
9. Given a linear function f : F n → F m , find the matrix corresponding to f , and
conversely, given a matrix, find the corresponding function.
10. Important: Given a linear transformation f : V → W and bases α for V and β
for W , find the matrix [f ]βα representing f with respect to those bases.
11. Determinants:
(a) Know the definition of the determinant.
(b) Know how the determinant behaves with respect to row operations.
(c) What is an elementary matrix?
(d) Know that det(A) = det(At ) and det(AB) = det(A) det(B).
(e) What is a permutation matrix?
(f) Know how to calculate the determinant through: (i) row operations, (ii) the
permutation expansion, and (iii) the Laplace expansion.
Practice problems for final exam.
(Some of these problems come from a CalPoly linear algebra exam that I found online.)
For the following problems, unless told otherwise, assume that we are working over
the field F = R.
1. Solve the following system of equations and write your solution in the form p +
Span(S) where p is a particular solution and where S is a set of vector spanning
the kernel of the matrix for the corresponding homogeneous system.
2x − 2y − 3z = −2
3x − 3y − 2z + 5w = 7
x − y − 2z − w = −3.
2. Let

    A = \begin{pmatrix} 2 & 0 & 10 \\ 0 & 7 + x & −3 \\ 0 & 4 & x \end{pmatrix}.
Find all values of x such that A is invertible. Make sure you completely justify
your answer.
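A possible check of your answer, assuming SymPy: compute det(A) as a polynomial in x and find where it vanishes.

# A is invertible exactly when det(A) is nonzero; treat det(A) as a polynomial in x.
from sympy import symbols, Matrix, factor, solve

x = symbols('x')
A = Matrix([[2, 0,     10],
            [0, 7 + x, -3],
            [0, 4,      x]])

d = A.det()
print(factor(d))     # factored determinant
print(solve(d, x))   # values of x where A fails to be invertible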
3. Consider the matrix A

A = [ 1  5   4   3   2 ]
    [ 1  6   6   6   6 ]
    [ 1  7   8  10  12 ]
    [ 1  6   6   7   8 ]

and its reduced echelon form

Ã = [ 1  0  −6  0   6 ]
    [ 0  1   2  0  −2 ]
    [ 0  0   0  1   2 ]
    [ 0  0   0  0   0 ].
(a) Find a basis for the kernel (nullspace) of A.
(b) Find a basis for the column space of A.
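A possible check for parts (a) and (b), assuming SymPy:

# Bases for the nullspace and the column space of A, assuming SymPy.
from sympy import Matrix

A = Matrix([[1, 5, 4,  3,  2],
            [1, 6, 6,  6,  6],
            [1, 7, 8, 10, 12],
            [1, 6, 6,  7,  8]])

print(A.rref()[0])      # should match the reduced echelon form given above
print(A.nullspace())    # basis for the kernel, read off the free columns
print(A.columnspace())  # basis for the column space (pivot columns of A)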
4. Suppose A is a 5 × 8 matrix such that
  
{ (7, 0, 4, 1, 8), (−3, 1, 2, −6, 2), (1, −2, 0, 3, 1) }
is a basis for the column space of A. Find p and q so that the following statement
is true: The nullspace of A is a p-dimensional subspace of Rq .
5. Consider the matrix
A=
7
8
−4 −5
.
(a) Find all eigenvalues for A.
(b) Find a basis for each eigenspace for A.
(c) Find a matrix P and a diagonal matrix D such that P −1 AP = D.
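A possible check, assuming SymPy: eigenvects returns each eigenvalue with a basis of its eigenspace, and diagonalize returns a suitable P and D.

# Eigenvalues, eigenspaces, and a diagonalization of A, assuming SymPy.
from sympy import Matrix

A = Matrix([[ 7,  8],
            [-4, -5]])

print(A.eigenvects())    # list of (eigenvalue, multiplicity, basis of eigenspace)
P, D = A.diagonalize()   # P is invertible and P^(-1) * A * P = D
print(P, D)
print(P.inv() * A * P)   # equals D, as a check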
6. Suppose that λ is an eigenvalue of an invertible n × n matrix A with corresponding
eigenvector v. Determine whether v is an eigenvector of A + cIn where c is an
arbitrary scalar. If so, what is the corresponding eigenvalue?
7. Let A be an n × n matrix with eigenvalues λ1 , . . . , λn , where each eigenvalue is
repeated according to its algebraic multiplicity. Let p(x) be the characteristic
polynomial of A.
(a) Show that the constant term p(0) of the characteristic polynomial is det(A).
(b) Show that det(A) = λ1 · · · λn .
(c) Show that the coefficient of xn−1 in p(x) is (−1)n−1 tr(A).
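These parts ask for general proofs, but it can help to test the claims on a concrete example first. The sketch below assumes SymPy, uses a made-up upper-triangular 3 × 3 matrix with eigenvalues 2, 3, 5, and takes p(x) = det(A − xI), the convention under which part (a) holds as stated.

# Test the claims of problem 7 on a hypothetical 3x3 example, assuming SymPy.
from sympy import symbols, Matrix, eye, expand

x = symbols('x')
A = Matrix([[2, 1, 7],
            [0, 3, 4],
            [0, 0, 5]])   # made-up upper-triangular example; eigenvalues 2, 3, 5

p = expand((A - x * eye(3)).det())   # characteristic polynomial det(A - x*I)

print(p.subs(x, 0) == A.det())       # part (a): constant term equals det(A)

product = 1
for lam, mult in A.eigenvals().items():
    product *= lam**mult
print(product == A.det())            # part (b): det(A) is the product of eigenvalues

print(p.coeff(x, 2) == A.trace())    # part (c): coefficient of x^2 is (-1)^2 tr(A)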
 
8. Let A be a 3 × 3 matrix over R. Suppose that (1, 2, 3) is a basis for the nullspace
of the matrix A + 5I3 and that (2, 0, 4) is a basis for the nullspace of the matrix A − 2I3 .
Find A2 (3, −2, 5).
9. True or False?:
(a) If A and B are n×n matrices and P is an invertible matrix such that P −1 AP =
B, then det(A) = det(B).
(b) If the characteristic polynomial of an n × n matrix A is p(x) = (1 − x)n + 2,
then A is invertible.
(c) If A is an n × n matrix and A2 is invertible, then A3 is invertible.
(d) If A is a 3 × 3 matrix such that det(A) = 7, then det(2At A−1 ) = 2.
(e) If v is an eigenvector of an n × n matrix A with corresponding eigenvalue λ1 ,
and if w is an eigenvector of A with corresponding eigenvalue λ2 , then v + w
is an eigenvector of A with corresponding eigenvalue λ1 + λ2 .
10. Let S = {(1, 1, 1), (1, 0, 2)} ⊂ R3 , and take the standard inner product on R3 .
(a) Use Gram-Schmidt to create an orthonormal set of vectors {u1 , u2 } with the
same span as S.
(b) Find a, b, c such that (a, b, c) is the vector in Span(S) closest to the vector y =
(1, 2, 3).
(c) How close is the vector you just found to y?
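A numerical check, assuming NumPy:

# Gram-Schmidt on S = {(1,1,1), (1,0,2)} and the projection of y = (1,2,3)
# onto Span(S), assuming NumPy.
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, 0.0, 2.0])
y  = np.array([1.0, 2.0, 3.0])

u1 = v1 / np.linalg.norm(v1)
w2 = v2 - np.dot(v2, u1) * u1          # subtract the component of v2 along u1
u2 = w2 / np.linalg.norm(w2)

proj = np.dot(y, u1) * u1 + np.dot(y, u2) * u2   # closest vector in Span(S)
print(u1, u2)
print(proj)
print(np.linalg.norm(y - proj))        # distance from y to Span(S)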
11. Let V be a vector space over R or C. What is an inner product on V ?
12. Using adjoints, compute the best-fit (least squares) line through the points (1, 1),
(2, 2), and (4, 1). Express your answer in the form y = mx + b for some m and b.
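A numerical check, assuming NumPy: the first computation uses NumPy's least-squares routine, and the second solves the normal equations (At A)v = At y directly.

# Least-squares line y = m*x + b through (1,1), (2,2), (4,1), assuming NumPy.
import numpy as np

xs = np.array([1.0, 2.0, 4.0])
ys = np.array([1.0, 2.0, 1.0])

# Design matrix with columns [x, 1], so the unknowns are (m, b).
A = np.column_stack([xs, np.ones_like(xs)])

sol, residuals, rank, sv = np.linalg.lstsq(A, ys, rcond=None)
m, b = sol
print(m, b)

# Normal-equations ("adjoint") form: solve (A^T A) v = A^T y.
v = np.linalg.solve(A.T @ A, A.T @ ys)
print(v)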
13. Let u = (1, 0, 2) and v = (3, 1, 1) be vectors in R3 .
(a) What is the angle between u and v?
(b) What is the projection of u onto v?
(c) What is the distance between u and v?
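A numerical check, assuming NumPy:

# Angle, projection, and distance for u = (1,0,2) and v = (3,1,1), assuming NumPy.
import numpy as np

u = np.array([1.0, 0.0, 2.0])
v = np.array([3.0, 1.0, 1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.arccos(cos_theta))                  # angle in radians

print((np.dot(u, v) / np.dot(v, v)) * v)     # projection of u onto v

print(np.linalg.norm(u - v))                 # distance between u and v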
14. Pagerank problem (for Dave’s class):
(a) Let I(P ) be the “importance” of webpage P according to the PageRank
algorithm, and let dP denote the outdegree of P (the number of links leading
out of P ). Write Q → P if page Q links to page P . What is the definition of
I(P ) in terms of the importance of each webpage, and how does computing
the vector of importances, I, lead to the problem of finding an eigenvector of
a matrix, H? (Define H in the process.)
(b) What is the matrix H for the following network of four webpages?
[Figure: a directed graph on four webpages, labeled 1, 2, 3, and 4, showing the links among them.]
(c) Modify your matrix H to get a matrix S that deals with the problem of
dangling nodes.
(d) In general, the Google matrix goes further by averaging in the all-1s matrix 1:
G = αS + ((1 − α)/n) 1.
This matrix G is the transition matrix for a Markov chain.
i. Describe the Markov chain corresponding to G in terms of the behavior
of a person searching the network of webpages.
ii. Averaging in the all-1s matrix guarantees the Markov chain has which
two important properties? What does this have to do with the computation of the vector of relative importances?
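The sketch below, assuming NumPy, runs through the computation described in parts (a)–(d) on a hypothetical four-page network; the link structure is made up for illustration and need not match the network pictured in part (b). Here H is column-stochastic: entry (i, j) is 1/dj if page j links to page i, and 0 otherwise.

# PageRank on a hypothetical 4-page network (links chosen for illustration only),
# assuming NumPy.
import numpy as np

n = 4
links = {1: [2, 3], 2: [3], 3: [1], 4: []}   # made-up links; page 4 is dangling

H = np.zeros((n, n))
for q, targets in links.items():
    for p in targets:
        H[p - 1, q - 1] = 1.0 / len(targets)   # 1/d_q in row p, column q

# S: replace each dangling (all-zero) column of H with the uniform column 1/n.
S = H.copy()
for j in range(n):
    if S[:, j].sum() == 0:
        S[:, j] = 1.0 / n

# Google matrix G = alpha*S + ((1 - alpha)/n) * all-ones matrix.
alpha = 0.85
G = alpha * S + (1 - alpha) / n * np.ones((n, n))

# Power iteration: the importance vector is the stationary vector of G.
I = np.full(n, 1.0 / n)
for _ in range(100):
    I = G @ I
print(I / I.sum())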