MATH 146 Linear Algebra 1, Lecture Notes
by Stephen New
Chapter 1. Concrete Vector Spaces and Affine Spaces
Rings and Fields
1.1 Definition: For a set S we write S × S = {(a, b)|a ∈ S, b ∈ S}. A binary operation
on S is a map ∗ : S × S → S, where for a, b ∈ S we usually write ∗(a, b) as a ∗ b.
1.2 Definition: A ring (with identity) is a set R together with two binary operations +
and · (called addition and multiplication), where for a, b ∈ R we usually write a · b as ab,
and two distinct elements 0 and 1, such that
(1) + is associative: (a + b) + c = a + (b + c) for all a, b, c ∈ R,
(2) + is commutative: a + b = b + a for all a, b ∈ R,
(3) 0 is an additive identity: 0 + a = a for all a ∈ R,
(4) every element has an additive inverse: for every a ∈ R there exists b ∈ R with a+b = 0,
(5) · is associative: (ab)c = a(bc) for all a, b, c ∈ R,
(6) 1 is a multiplicative identity: 1 · a = a for all a ∈ R, and
(7) · is distributive over +: a(b + c) = ab + ac for all a, b, c ∈ R,
A ring R is called commutative when
(8) · is commutative: ab = ba for all a, b ∈ R.
For a ∈ R, we say that a is invertible (or that a has an inverse) when there exists an
element b ∈ R such that ab = 1 = ba. A field is a commutative ring F such that
(9) every non-zero element has a multiplicative inverse: for every a ∈ F with a ≠ 0 there
exists b ∈ F such that ab = 1.
An element in a field F is called a number or a scalar.
1.3 Example: The set of integers Z is a commutative ring, but it is not a field because
it does not satisfy Property (9). The set of positive integers Z+ = {1, 2, 3, · · ·} is not a
ring because 0 ∉ Z+ and Z+ does not satisfy Properties (3) and (4). The set of natural
numbers N = {0, 1, 2, · · ·} is not a ring because it does not satisfy Property (4). The set
of rational numbers Q, the set of real numbers R and the set of complex numbers
C are all fields. For 2 ≤ n ∈ Z, the set Zn = {0, 1, · · · , n − 1} of integers modulo n is a
commutative ring, and Zn is a field if and only if n is prime (in Z1 = {0} we have 0 = 1,
so Z1 is not a ring).
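The following short Python sketch (not part of the original notes) illustrates the arithmetic of Zn described above: it searches for a multiplicative inverse of each non-zero element of Zn by brute force, so Property (9) can be checked directly for small n. The function names are chosen for illustration only.

    # Brute-force check of Property (9) for the ring Zn = {0, 1, ..., n-1}
    # with addition and multiplication taken modulo n.

    def inverse_mod(a, n):
        """Return b with a*b = 1 (mod n), or None if no inverse exists."""
        for b in range(1, n):
            if (a * b) % n == 1:
                return b
        return None

    def is_field(n):
        """Zn is a field exactly when every non-zero element has an inverse."""
        return all(inverse_mod(a, n) is not None for a in range(1, n))

    if __name__ == "__main__":
        for n in range(2, 13):
            print(n, "field" if is_field(n) else "not a field")
        # The output lists 2, 3, 5, 7, 11 as fields, matching the fact that
        # Zn is a field if and only if n is prime.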
1.4 Remark: In a field, we can perform all of the usual arithmetical operations. The
next few theorems illustrate this.
1.5 Theorem: (Uniqueness of Inverse) Let R be a field. Let a ∈ R. Then
(1) the additive inverse of a is unique: if a + b = 0 = a + c then b = c,
(2) if a has an inverse then it is unique: if ab = 1 = ac then b = c.
Proof: To prove (1), suppose that a + b = 0 = a + c. Then
b = 0 + b = (a + c) + b = b + (a + c) = (b + a) + c = (a + b) + c = 0 + c = c .
To prove (2), suppose that a ≠ 0 and that ab = 1 = ac. Then
b = 1 · b = (ac)b = b(ac) = (ba)c = (ab)c = 1 · c = c .
1.6 Definition: Let R be a ring and let a, b ∈ R. We write the (unique) additive inverse
of a as −a, and we write b − a = b + (−a). If a has a multiplicative inverse, we write the
(unique) multiplicative inverse of a as a−1 . When R is commutative, we also write a−1 as
1/a, and we write b/a = b · (1/a).
1.7 Theorem: (Cancellation) Let R be a ring. Then for all a, b, c ∈ R, we have
(1) if a + b = a + c then b = c,
(2) if a + b = a then b = 0, and
(3) if a + b = 0 then b = −a.
Let F be a field. Then for all a, b, c ∈ F we have
(4) if ab = ac then either a = 0 or b = c,
(5) if ab = a then either a = 0 or b = 1,
(6) if ab = 1 then b = a−1 , and
(7) if ab = 0 then either a = 0 or b = 0.
Proof: To prove (1), suppose that a + b = a + c. Then we have
b = 0 + b = −a + a + b = −a + a + c = 0 + c = c .
Part (2) follows from part (1) since if a + b = a then a + b = a + 0, and part (3) follows
from part (1) since if a + b = 0 then a + b = a + (−a). To prove part (4), suppose that
ab = ac and a ≠ 0. Then we have
b = 1 · b = a−1 ab = a−1 ac = 1 · c = c .
Note that parts (5), (6) and (7) all follow from part (4).
1.8 Remark: In the above proof, we used associativity and commutativity implicitly. If
we wished to be explicit then the proof of part (1) would be as follows. Suppose that
a + b = a + c. Then we have
b = 0+b = (a−a)+b = (−a+a)+b = −a+(a+b) = −a+(a+c) = (−a+a)+c = 0+c = c.
In the future, we shall often use associativity and commutativity implicitly in our calculations.
1.9 Theorem: (Multiplication by 0 and −1) Let R be a ring and let a ∈ R. Then
(1) 0 · a = 0, and
(2) (−1)a = −a.
Proof: We have
0a = (0 + 0)a = 0a + 0a.
Subtracting 0a from both sides (using part 2 of the Cancellation Theorem) gives 0 = 0a.
Also, we have
a + (−1)a = (1)a + (−1)a = (1 + (−1))a = 0a = 0,
and subtracting a from both sides (part 3 of the Cancellation Theorem) gives (−1)a = −a.
The Standard Vector Space
1.10 Definition: Let S be a set. An n-tuple on S is a function a : {1, 2, · · · , n} → S.
Given an n-tuple a on S, for k ∈ {1, 2, · · · , n} we write ak = a(k). The set {1, 2, · · · , n}
is called the index set, an element k ∈ {1, 2, · · · , n} is called an index, and the element
ak ∈ S is called the k th entry of a. We sometimes write a = (a1 , a2 , · · · , an ) but we more
often write a = (a1 , a2 , · · · , an )T , thinking of a as a column vector whose entries
a1 , a2 , · · · , an are listed from top to bottom. The set of all n-tuples on S is denoted by S n ,
so we have
S n = { a = (a1 , a2 , · · · , an )T | each ai ∈ S }.
1.11 Definition: For a ring R, we define the zero element 0 ∈ Rn to be
0 = (0, 0, 0, · · · , 0)T
or equivalently we define 0 ∈ Rn to be the element with entries 0i = 0 for all i. We define
the standard basis elements e1 , e2 , · · · , en ∈ Rn to be
e1 = (1, 0, 0, 0, · · · , 0)T ,
e2 = (0, 1, 0, 0, · · · , 0)T ,
e3 = (0, 0, 1, 0, · · · , 0)T ,
...
en = (0, 0, 0, · · · , 0, 1)T .
Equivalently, for each index k we define ek ∈ Rn to be given by (ek )i = δki , where δki = 1
if k = i and δki = 0 if k ≠ i.
1.12 Definition: Given t ∈ R, x = (x1 , x2 , · · · , xn )T ∈ Rn and y = (y1 , y2 , · · · , yn )T ∈ Rn ,
where R is a ring, we define the product tx and the sum x + y by
tx = (t x1 , t x2 , · · · , t xn )T and x + y = (x1 + y1 , x2 + y2 , · · · , xn + yn )T .
Equivalently, we can define tx to be the element with entries (tx)i = t xi for all i, and we
can define x + y to be the element with entries (x + y)i = xi + yi for all i.
1.13 Note: For x1 , x2 , · · · , xn ∈ R, notice that
(x1 , x2 , · · · , xn )T = Σ_{i=1}^{n} xi ei = x1 e1 + x2 e2 + · · · + xn en .
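As a small illustration (not part of the original notes), the entrywise formulas (tx)i = t xi and (x + y)i = xi + yi from Definition 1.12, and the expansion in Note 1.13, can be written in Python as follows; tuples over Q are represented as lists of Fraction objects.

    from fractions import Fraction as Q

    def scale(t, x):
        # (tx)_i = t * x_i
        return [t * xi for xi in x]

    def add(x, y):
        # (x + y)_i = x_i + y_i
        return [xi + yi for xi, yi in zip(x, y)]

    def standard_basis(n, k):
        # e_k has a 1 in position k (1-indexed) and 0 elsewhere
        return [Q(1) if i == k else Q(0) for i in range(1, n + 1)]

    x = [Q(3), Q(-1), Q(2)]
    # Note 1.13: x = x1*e1 + x2*e2 + x3*e3
    expansion = [Q(0)] * 3
    for k in range(1, 4):
        expansion = add(expansion, scale(x[k - 1], standard_basis(3, k)))
    assert expansion == x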
1.14 Theorem: (Basic Properties of Rn ) Let R be a ring. Then
(1) + is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ Rn ,
(2) + is commutative: x + y = y + x for all x, y ∈ Rn ,
(3) 0 ∈ Rn is an additive identity 0 + x = x for all x ∈ Rn ,
(4) every x ∈ Rn has an additive inverse: for all x ∈ Rn there exists y ∈ Rn with x + y = 0,
(5) · is associative: (st)x = s(tx) for all s, t ∈ R and all x ∈ Rn ,
(7) · distributes over addition in R: (s + t)x = sx + tx, for all s, t ∈ R and x ∈ Rn ,
(8) · distributes over addition in Rn : t(x + y) = tx + ty for all t ∈ R and x, y ∈ Rn , and
(9) 1 ∈ R acts as a multiplicative identity: 1x = x for all x ∈ Rn .
Proof: To prove part (4), let x ∈ Rn and choose y = (−1)x. Then for all indices i we have
yi = ((−1)x)i = −xi and so (x + y)i = xi − xi = 0. Since (x + y)i = 0 for all i, we have
x + y = 0, as required. To prove part (8), let t ∈ R and let x, y ∈ Rn . Then for all i we have
(t(x + y))i = t(x + y)i = t(xi + yi ) = t xi + t yi = (tx + ty)i . Since (t(x + y))i = (tx + ty)i
for all i, we have t(x + y) = tx + ty. The other parts can be proven similarly.
1.15 Definition: When F is a field, F n is called the standard n-dimensional vector
space over F , and an element of F n is called a point or a vector.
1.16 Example: Let n = 2 or 3 and let u, v ∈ Rn . If u ≠ 0 then the set {tu | t ∈ R} is the
line in Rn through the points 0 and u, and the set {tu | 0 ≤ t ≤ 1} is the line segment in Rn
between 0 and u. If u ≠ 0 and v does not lie on the line through 0 and u, then the points 0,
u, v and u + v are the vertices of a parallelogram P in Rn , the set {su + tv | s ∈ R, t ∈ R}
is the plane which contains P (in the case that n = 2, this plane is the entire set R2 ), and
the set {su + tv | 0 ≤ s ≤ 1, 0 ≤ t ≤ 1} is the set of points inside (and on the edges of) P . As
an exercise, describe the sets {su + tv | s + t = 1} and {su + tv | s ≥ 0, t ≥ 0, s + t ≤ 1}.
Vector Spaces and Affine Spaces in F n
1.17 Definition: Let F be a field. Given a point p ∈ F n and a non-zero vector u ∈ F n ,
we define the line in F n through p in the direction of u to be the set
L = { p + tu | t ∈ F }.
Given a point p ∈ F n and two vectors u, v ∈ F n with u ≠ 0 and v ≠ tu for any t ∈ F , we
define the plane in F n through p in the direction of u and v to be the set
P = { p + su + tv | s, t ∈ F }.
1.18 Remark: We wish to generalize the above definitions by defining higher dimensional
versions of lines and planes.
1.19 Note: For a finite set S, the cardinality of S, denoted by |S|, is the number of
elements in S. When we write S = {a1 , a2 , · · · , am }, we shall always tacitly assume that
the elements ai ∈ S are all distinct so that |S| = m unless we explicitly indicate otherwise.
1.20 Definition: Let R be a ring, let A = {u1 , u2 , · · · , um } ⊆ Rn . A linear combination
on A (over R) is an element x ∈ Rn of the form
x = Σ_{i=1}^{m} ti ui = t1 u1 + t2 u2 + · · · + tm um with each ti ∈ R.
The span of A (over R) (also called the submodule of Rn spanned by A over R), is the
set of all linear combinations on A. We denote the span of A by Span A (or by Span_R A)
so we have
Span A = Span_R A = { Σ_{i=1}^{m} ti ui | each ti ∈ R }.
For convenience, we also define Span ∅ = {0}, where ∅ is the empty set. Given an element
p ∈ Rn , we write
p + Span A = { p + u | u ∈ Span A } = { p + Σ_{i=1}^{m} ti ui | each ti ∈ R }.
1.21 Definition: Let F be a field. For a finite set A ⊆ F n the set U = Span A is called
the vector space in F n (or the subspace of F n ) spanned by A (over F ). A vector space
in F n (or a subspace of F n ) is a subset U ⊆ F n of the form U = Span A for some finite
subset A ⊆ F n . Given a point p ∈ F n and a finite set A ⊆ F n , the set P = p + Span A is
called the affine space in F n (or the affine subspace of F n ) through p in the direction
of the vectors in A. An affine space in F n (or an affine subspace of F n ) is a subset
P ⊆ F n of the form P = p + U for some point p ∈ F n and some vector space U in F n . An
element in a subspace of F n can be called a point or a vector. An element in an affine
subspace of F n is usually called a point.
1.22 Theorem: (Closure under Addition and Multiplication) Let R be a ring, let A be a
finite subset of Rn , and let U = Span A. Then
(1) U is closed under addition: for all x, y ∈ U we have x + y ∈ U , and
(2) U is closed under multiplication: for all t ∈ R and all x ∈ U we have tx ∈ U .
Proof: Let A = {u1 , u2 , · · · , um }, let t ∈ R and let x, y ∈ U = Span A, say x = Σ_{i=1}^{m} si ui
and y = Σ_{i=1}^{m} ti ui . Then x + y = Σ_{i=1}^{m} (si + ti )ui ∈ U and tx = Σ_{i=1}^{m} (tsi )ui ∈ U .
1.23 Theorem: Let R be a ring, let p, q ∈ Rn , let A and B be finite subsets of Rn , and
let U = Span A and V = Span B. Then
(1) p + U ⊆ q + V if and only if U ⊆ V and p − q ∈ V , and
(2) p + U = q + V if and only if U = V and p − q ∈ U .
Proof: Suppose that p + U ⊆ q + V . Since p = p + 0 ∈ p + U , we also have p ∈ q + V , say
p = q + v where v ∈ V . Then p − q = v ∈ V . Let u ∈ U . Then we have p + u ∈ p + U and so
p+u ∈ q+V , say p+u = q+w where w ∈ V . Then u = w−(p−q) = w−v = w+(−1)v ∈ V
by closure. Conversely, suppose that U ⊆ V and p−q ∈ V , say p−q = v ∈ V . Let a ∈ p+U ,
say a = p + u where u ∈ U . Then we have a = p + u = (q + v) + u = q + (u + v) ∈ q + V
by closure, since u, v ∈ V . This proves Part (1), from which Part (2) immediately follows.
1.24 Theorem: Let A = {u1 , u2 , · · · , ul } ⊆ Rn and let B = {v1 , v2 , · · · , vm } ⊆ Rn , where
R is a ring. Then
(1) Span A ⊆ Span B if and only if each uj ∈ Span B, and
(2) Span A = Span B if and only if each uj ∈ Span B and each vj ∈ Span A.
Proof: Note that each uj ∈ Span A because we can write uj as a linear combination on A,
indeed we have
uj = 0u1 + 0u2 + · · · + 0uj−1 + 1uj + 0uj+1 + · · · + 0ul = Σ_{i=1}^{l} ti ui with ti = δij .
It follows that if Span A ⊆ Span B then we have each uj ∈ Span B. Suppose, conversely,
that each uj ∈ Span B, say uj = Σ_{i=1}^{m} sji vi . Let x ∈ Span A, say x = Σ_{j=1}^{l} tj uj . Then
x = Σ_{j=1}^{l} tj uj = Σ_{j=1}^{l} tj Σ_{i=1}^{m} sji vi = Σ_{i=1}^{m} ( Σ_{j=1}^{l} tj sji ) vi ∈ Span B.
This proves Part (1), and Part (2) follows immediately from Part (1).
Linear Independence, Bases and Dimension
1.25 Definition: Let R be a ring. For A = {u1 , u2 , · · · , um } ⊆ Rn , we say that A is
linearly independent (over R) when for all t1 , t2 , · · · , tm ∈ R, if Σ_{i=1}^{m} ti ui = 0 then each
ti = 0, and otherwise we say that A is linearly dependent. For convenience, we also say
that the empty set ∅ is linearly independent. For a finite set A ⊆ F n , when A is linearly
independent and U = Span A, we say that A is a basis for U .
1.26 Example: Let F be a field. The empty set ∅ is linearly independent and Span ∅ = {0}
and so ∅ is a basis for the vector space {0} in F n . If 0 ≠ u ∈ F n then {u} is linearly
independent and so {u} is a basis for Span {u}. As an exercise, verify that for u, v ∈ F n ,
the set {u, v} is linearly independent if and only if u ≠ 0 and for all t ∈ F we have v ≠ tu.
1.27 Example: Verify that the set {e1 , e2 , · · · , en } is a basis for F n . We call it the
standard basis for F n .
1.28 Theorem: Let F be a field and let A = {u1 , u2 , · · · , um } ⊆ F n . Then
(1) for 1 ≤ k ≤ m, we have uk ∈ Span (A \ {uk }) if and only if Span (A \ {uk }) = Span A,
(2) A is linearly dependent if and only if uk ∈ Span (A \ {uk }) for some index k.
Proof: Note that if Span (A \ {uk }) = Span A then uk ∈ Span A = Span (A \ {uk }).
Suppose, conversely, that uk ∈ Span (A \ {uk }), say uk = Σ_{i≠k} si ui where each si ∈ F .
Since A \ {uk } ⊆ A it is clear that Span (A \ {uk }) ⊆ Span A. Let x ∈ Span A, say
x = Σ_{i=1}^{m} ti ui . Then we have
x = tk uk + Σ_{i≠k} ti ui = tk Σ_{i≠k} si ui + Σ_{i≠k} ti ui ∈ Span (A \ {uk }).
This proves Part (1).
Note that since A \ {uk } ⊆ A, we have Span (A \ {uk }) ⊆ Span A. Suppose that A is
linearly dependent. Choose coefficients si ∈ F , not all equal to zero, so that Σ_{i=1}^{m} si ui = 0,
and choose an index k so that sk ≠ 0. Since 0 = sk uk + Σ_{i≠k} si ui we have
uk = − Σ_{i≠k} (si /sk ) ui ∈ Span (A \ {uk }).
Conversely, if uk ∈ Span (A \ {uk }) for some index k, say uk = Σ_{i≠k} si ui , then taking
sk = −1 we obtain Σ_{i=1}^{m} si ui = 0 with sk ≠ 0, so A is linearly dependent. This proves
Part (2), and combining Part (2) with Part (1) shows that if A is linearly dependent then
Span (A \ {uk }) = Span A for some index k.
1.29 Theorem: Let F be a field, let A = {u1 , u2 , · · · , um } ⊆ F n , and let U = Span A.
Then A contains a basis for U .
Proof: If A is linearly independent, then A is a basis for U . Suppose that A is linearly dependent. Then for some index k we have uk ∈ Span (A \ {uk }). Reordering
the vectors if necessary, let us assume that um ∈ Span (A \ {um }). Then we have
Span {u1 , u2 , · · · , um−1 } = Span {u1 , u2 , · · · , um } = U . If {u1 , u2 , · · · , um−1 } is linearly
independent then it is a basis for U . Otherwise, as above, we can reorder u1 , u2 , · · · , um−1
if necessary so that Span {u1 , u2 , · · · , um−2 } = Span {u1 , u2 , · · · , um−1 } = U . Repeating
this procedure we will eventually obtain a linearly independent subset {u1 , u2 , · · · , uk } ⊆ A
with Span {u1 , u2 , · · · , uk } = U (if the procedure continues until no vectors are left then
we have k = 0 and {u1 , · · · , uk } = ∅, which is linearly independent).
1.30 Corollary: For a field F , every subspace of F n has a basis.
1.31 Theorem: Let F be a field, let A = {u1 , u2 , · · · , um } ⊆ F n , let a1 , a2 , · · · , am ∈ F
with ak ≠ 0, where k is a fixed index with 1 ≤ k ≤ m, and let B = {v1 , v2 , · · · , vm } where
vi = ui for i ≠ k and vk = Σ_{i=1}^{m} ai ui . Then
(1) Span A = Span B and
(2) A is linearly independent if and only if B is linearly independent.
Proof: For x = Σ_{i=1}^{m} ti vi ∈ Span B, we have x = tk vk + Σ_{i≠k} ti vi = tk Σ_{i=1}^{m} ai ui + Σ_{i≠k} ti ui
and so x ∈ Span A. This shows that Span B ⊆ Span A.
Suppose A is linearly independent. Suppose Σ_{i=1}^{m} ti vi = 0 where each ti ∈ F . Then
0 = tk vk + Σ_{i≠k} ti vi = tk Σ_{i=1}^{m} ai ui + Σ_{i≠k} ti ui = tk ak uk + Σ_{i≠k} (tk ai + ti )ui .
Since A is linearly independent, all of the coefficients in the above linear combination on
A must be equal to zero, so we have tk ak = 0 and tk ai + ti = 0 for i ≠ k. Since tk ak = 0
and ak ≠ 0 we have tk = 0 and hence 0 = tk ai + ti = ti for all i ≠ k. This shows that B is
linearly independent.
Finally note that since vk = Σ_{i=1}^{m} ai ui = ak uk + Σ_{i≠k} ai vi with ak ≠ 0, it follows that
uk = Σ_{i=1}^{m} bi vi where bk = 1/ak ≠ 0 and bi = −ai /ak for i ≠ k. Hence the same arguments
used in the previous two paragraphs, with the rôles of A and B interchanged, show that
Span A ⊆ Span B and that if B is linearly independent then so is A.
1.32 Theorem: Let F be a field, let U be a subspace of F n and let A = {u1 , u2 , · · · , um }
be a basis for U . Let B = {v1 , v2 , · · · , vl } ⊆ U . Suppose that B is linearly independent.
Then l ≤ m, if l = m then B is a basis for U , and if l < m then there exist m − l vectors
in A which, after possibly reordering the vectors ui we can take to be ul+1 , ul+2 , · · · , um ,
such that the set {v1 , v2 , · · · , vl , ul+1 , ul+2 , · · · , um } is a basis for U .
Proof: When l = 0 so that B = ∅, we have m − l = m and we use all m of the vectors in A
to obtain the basis {u1 , u2 , · · · , um }. Let l ≥ 1 and suppose, inductively, that for every set
B0 = {v1 , v2 , · · · , vl−1 } ⊆ U , if B0 is linearly independent then we have l − 1 ≤ m and we
can reorder the vectors ui so that {v1 , v2 , · · · , vl−1 , ul , ul+1 , · · · , um } is a basis for U . Let
B = {v1 , v2 , · · · , vl } ⊆ U . Suppose B is linearly independent. Let B0 = {v1 , v2 , · · · , vl−1 }.
Note that B0 is linearly independent but B0 does not span U because vl ∈ U but vl ∉
Span B0 . By the induction hypothesis, we have l − 1 < m and we can reorder the vectors
ui so that {v1 , v2 , · · · , vl−1 , ul , ul+1 , · · · , um } is a basis for U . Since vl ∈ U we can write
vl in the form vl = Σ_{i=1}^{l−1} ti vi + Σ_{j=l}^{m} sj uj . Note that the coefficients sj cannot all be equal to
zero since vl ∉ Span B0 . After reordering the vectors uj we can suppose that sl ≠ 0. By
Theorem 1.31, the set {v1 , v2 , · · · , vl−1 , vl , ul+1 , · · · , um } is a basis for U (in the case that
l = m this basis is the set B).
1.33 Corollary: For a vector space U in F n , any two bases for U have the same number
of elements.
1.34 Definition: For a vector space U in F n , we define the dimension of U , denoted by
dim U , to be the number of elements in any basis for U . For an affine space P = p + U in
F n , we define the dimension of P to be dim P = dim U .
Chapter 2. Solving Systems of Linear Equations
Systems of Linear Equations
2.1 Definition: Let R be a ring and let n ∈ Z+ . A linear equation in Rn (or a linear
equation in the variables x1 , x2 , · · · , xn over R) is an equation of the form
a1 x1 + a2 x2 + · · · + an xn = b
with a = (a1 , a2 , · · · , an )T ∈ Rn and b ∈ R. The numbers ai are called the coefficients
of the equation. A solution to the above equation is a point x = (x1 , x2 , · · · , xn )T ∈ Rn
for which the equation holds. The solution set of the equation is the set of all solutions.
Note that x = 0 is a solution if and only if b = 0, and in this case we say that the equation
is homogeneous.
2.2 Example: Let F be a field, let a = (a1 , a2 , · · · , an )T ∈ F n , let b ∈ F , and let S be the
solution set of the linear equation a1 x1 + a2 x2 + · · · + an xn = b. Show that either S = ∅
(the empty set) or S is an affine space in F n .
Solution: If a = 0 and b ≠ 0 then the equation has no solution so we have S = ∅. If a = 0
and b = 0 then every x ∈ F n is a solution, so S = F n . If a ≠ 0 then we can choose an
index k such that ak ≠ 0 and then we have
Σ_{i=1}^{n} ai xi = b ⇐⇒ xk = b/ak − Σ_{i≠k} (ai /ak ) xi
⇐⇒ x = (x1 , x2 , · · · , xk−1 , b/ak − Σ_{i≠k} (ai /ak ) xi , xk+1 , · · · , xn )T
⇐⇒ x = p + Σ_{i≠k} xi ui
where p = (b/ak ) ek and ui = ei − (ai /ak ) ek .
2.3 Definition: Let R be a ring and let n, m ∈ Z+ . A system of m linear equations
in Rn (or in n variables over R) is a set of m equations of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
..
.
am1 x1 + am2 x2 + · · · + amn xn = bm
where each aij ∈ R and each bi ∈ R. The numbers aij are called the coefficients of the
system. A solution to the above system is a point x = (x1 , x2 , · · · , xn )T ∈ Rn for which
all of the m equations hold. The solution set of the system is the set of all solutions.
Note that x = 0 is a solution if and only if b = 0, and in this case we call the system of
linear equations homogeneous.
2.4 Remark: We shall describe an algorithmic method for solving a given system of linear
equations over a field F . We shall see that if the solution set is not empty then it is
an affine space in F n . The algorithm involves performing the following operations on the
equations in the system. These operations do not change the solution set.
(1) Ei ↔ Ej : interchange the ith and j th equations,
(2) Ei ↦ t Ei : multiply (both sides of) the ith equation by t where 0 ≠ t ∈ F , and
(3) Ei ↦ Ei + t Ej : add t times (each side of) the j th equation Ej to (the same side of) the
ith equation Ei , where t ∈ F .
For now, we illustrate the algorithm in a particular example.
2.5 Example: Consider the system of linear equations over the field Q.
2x1 − x2 + 3x3 = 7
x1 + 0x2 + 2x3 = 5
3x1 − 4x2 + 2x3 = 3
Show that the solution set is an affine space in Q3 .
Solution: Performing operations of the above three types, we start from
2x1 − x2 + 3x3 = 7
x1 + 0x2 + 2x3 = 5
3x1 − 4x2 + 2x3 = 3
Applying E1 ↔ E2 gives
x1 + 0x2 + 2x3 = 5
2x1 − x2 + 3x3 = 7
3x1 − 4x2 + 2x3 = 3
then applying E2 ↦ E2 − 2E1 and E3 ↦ E3 − 3E1 gives
x1 + 0x2 + 2x3 = 5
0x1 − x2 − x3 = −3
0x1 − 4x2 − 4x3 = −12
then applying E2 ↦ −E2 and E3 ↦ E3 + 4E2 gives
x1 + 0x2 + 2x3 = 5
0x1 + x2 + x3 = 3
0x1 + 0x2 + 0x3 = 0
Thus the original system of 3 equations has the same solution set as the system
x1 + 2x3 = 5
x2 + x3 = 3.
If we let x3 = t, then x = (x1 , x2 , x3 )T is a solution when x1 = 5 − 2t, x2 = 3 − t and
x3 = t, that is when x = (5, 3, 0)T +t(−2, −1, 1)T . Thus the solution set is the line through
p = (5, 3, 0)T in the direction of u = (−2, −1, 1)T .
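The three operations of Remark 2.4 are easy to carry out mechanically. The sketch below (not part of the original notes) encodes each equation of Example 2.5 as a list of coefficients [a1, a2, a3, b] and applies the same sequence of operations used above; exact arithmetic over Q is simulated with Fraction, and the helper names are chosen only for illustration.

    from fractions import Fraction as Q

    # each equation a1*x1 + a2*x2 + a3*x3 = b is stored as [a1, a2, a3, b]
    E = [[Q(2), Q(-1), Q(3), Q(7)],
         [Q(1), Q(0), Q(2), Q(5)],
         [Q(3), Q(-4), Q(2), Q(3)]]

    def swap(E, i, j):            # Ei <-> Ej
        E[i], E[j] = E[j], E[i]

    def scale(E, i, t):           # Ei -> t*Ei  (t nonzero)
        E[i] = [t * c for c in E[i]]

    def add_multiple(E, i, j, t): # Ei -> Ei + t*Ej
        E[i] = [ci + t * cj for ci, cj in zip(E[i], E[j])]

    swap(E, 0, 1)                 # E1 <-> E2
    add_multiple(E, 1, 0, Q(-2))  # E2 -> E2 - 2E1
    add_multiple(E, 2, 0, Q(-3))  # E3 -> E3 - 3E1
    scale(E, 1, Q(-1))            # E2 -> -E2
    add_multiple(E, 2, 1, Q(4))   # E3 -> E3 + 4E2
    for row in E:
        print(row)   # final rows encode x1 + 2x3 = 5, x2 + x3 = 3, 0 = 0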
Matrix Notation
2.6 Remark: We wish to introduce some notation which will simplify our discussion of
systems of linear equations. In fact the objects that we introduce will turn out to be of
interest in their own right.
2.7 Definition: Let F be a field. An m × n matrix over F is an array of the form
A = [ a11  a12  · · ·  a1n ]
    [ a21  a22  · · ·  a2n ]
    [  ..    ..          .. ]
    [ am1  am2  · · ·  amn ]
where each aij ∈ F . The number aij is called the (i, j) entry of the matrix A, and we
write
Aij = aij .
The set of all m × n matrices over F is denoted by Mm×n (F ). The set of n × n matrices
is also denoted by Mn (F ). Note that
F n = Mn×1 (F ) , Mn (F ) = Mn×n (F ) , and F = M1 (F ) .
The j th column of the above matrix A ∈ Mm×n (F ) is the vector (a1j , a2j , · · · , amj )T ∈ F m .
The j th row of A is the vector (aj1 , aj2 , · · · , ajn )T ∈ F n . Given vectors u1 , u2 , · · · , un ∈ F m ,
the matrix with columns u1 , u2 , · · · , un is the matrix
A = ( u1 , u2 , · · · , un ) ∈ Mm×n (F )
and given vectors v1 , v2 , · · · , vm ∈ F n , the matrix with rows v1 , v2 , · · · , vm is the matrix
A = [ v1 T ]
    [ v2 T ]
    [  ..  ]
    [ vm T ]  ∈ Mm×n (F ).
Given a matrix A ∈ Mm×n (F ), the transpose of A is the matrix AT ∈ Mn×m (F ) with
entries (AT )ij = Aji = aji , that is
AT = [ a11  a21  · · ·  am1 ]
     [ a12  a22  · · ·  am2 ]
     [  ..    ..          .. ]
     [ a1n  a2n  · · ·  amn ]
The m × n zero matrix is the matrix 0 ∈ Mm×n (F ) whose entries are all equal to 0.
The n × n identity matrix is the matrix I ∈ Mn (F ) with columns e1 , e2 , · · · , en . Given a
matrix a = (a1 , a2 , · · · , an ) ∈ M1×n (F ) and a vector x = (x1 , x2 , · · · , xn )T ∈ F n , we define
the product ax ∈ F to be
ax = (a1 , a2 , · · · , an )(x1 , x2 , · · · , xn )T = Σ_{i=1}^{n} ai xi = a1 x1 + a2 x2 + · · · + an xn .
More generally, given a matrix A ∈ Mm×n (F ) with entries Aij = aij , and a vector x ∈ F n
with entries xi ∈ F , we define the product Ax ∈ F m to be
Ax = ( a11 x1 + a12 x2 + · · · + a1n xn , a21 x1 + a22 x2 + · · · + a2n xn , · · · , am1 x1 + am2 x2 + · · · + amn xn )T .
Equivalently, we define Ax to be the vector in F m with entries
(Ax)j = Σ_{i=1}^{n} aji xi .
Using this notation, notice that for a = (a1 , a2 , · · · , an ) ∈ M1×n (F ) and b ∈ F , the single
linear equation a1 x1 + a2 x2 + · · · + an xn = b can be written simply as ax = b, and for
A ∈ Mm×n (F ) with entries Aij = aij and b = (b1 , b2 , · · · , bm )T ∈ F m , the system of
equations
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm
can be written simply as
Ax = b.
2.8 Note: For vectors v1 , v2 , · · · , vm ∈ F n and for x ∈ F n , if A is the matrix with rows
v1 , v2 , · · · , vm then the product Ax is defined so that we have
Ax = ( v1 T x , v2 T x , · · · , vm T x )T .
For vectors u1 , u2 , · · · , un ∈ F m and x ∈ F n , if A is the matrix with columns u1 , u2 , · · · , un ,
notice that we have
Ax = ( u1 , u2 , · · · , un )(x1 , x2 , · · · , xn )T = x1 u1 + x2 u2 + · · · + xn un ∈ Span {u1 , u2 , · · · , un }.
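The entry formula (Ax)j = Σ aji xi translates directly into code. The sketch below (not part of the original notes) computes Ax both from the entry formula and as the linear combination x1 u1 + · · · + xn un of the columns, and checks that the two agree; it assumes A is stored as a list of rows of Fractions.

    from fractions import Fraction as Q

    def mat_vec(A, x):
        # (Ax)_j = sum over i of A[j][i] * x[i]
        return [sum(Aj[i] * x[i] for i in range(len(x))) for Aj in A]

    def column_combination(A, x):
        # Ax = x1*u1 + ... + xn*un where u1, ..., un are the columns of A
        m, n = len(A), len(A[0])
        result = [Q(0)] * m
        for i in range(n):
            column = [A[j][i] for j in range(m)]
            result = [rj + x[i] * cj for rj, cj in zip(result, column)]
        return result

    A = [[Q(1), Q(2), Q(0)],
         [Q(3), Q(-1), Q(4)]]
    x = [Q(2), Q(1), Q(-1)]
    assert mat_vec(A, x) == column_combination(A, x) == [Q(4), Q(1)]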
2.9 Note: For 0 ∈ Mm×n (F ), I ∈ Mn (F ) and x ∈ F n we have 0x = 0 and Ix = x.
2.10 Theorem: (Linearity) For A ∈ Mm×n (F ), we have
(1) A(tx) = t Ax for all t ∈ F and x ∈ F n , and
(2) A(x + y) = Ax + Ay for all x, y ∈ F n .
Proof: For t ∈ F and for x, y ∈ F n we have
(A(tx))j = Σ_{i=1}^{n} Aji (tx)i = t Σ_{i=1}^{n} Aji xi = t (Ax)j , and
(A(x + y))j = Σ_{i=1}^{n} Aji (x + y)i = Σ_{i=1}^{n} Aji (xi + yi ) = Σ_{i=1}^{n} Aji xi + Σ_{i=1}^{n} Aji yi = (Ax + Ay)j .
Row Equivalence and Reduced Row Echelon Form
2.11 Definition: Given a matrix A ∈ Mm×n (F ) with entries Aij = aij and a vector
b ∈ F m , we obtain the system of linear equations
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm
which we write simply as Ax = b. The matrix A is called the coefficient matrix of the
system, and the matrix (A | b) ∈ Mm×(n+1) (F ) is called the augmented matrix of the
system. The solution set of the equation Ax = b is the set { x ∈ F n | Ax = b }.
2.12 Definition: We noted earlier that we can perform three kinds of operations on
the equations in the system without changing the solution set. These correspond to the
following three kinds of operations that we can perform on the rows of the augmented
matrix without changing the solution set.
(1) Ri ↔ Rj : interchange rows i and j,
(2) Ri ↦ t Ri : multiply the ith row by t, where 0 ≠ t ∈ F , and
(3) Ri ↦ Ri + t Rj : add t times the j th row to the ith row, where t ∈ F .
These three kinds of operations will be called elementary row operations. Given matrices A, B ∈ Mm×n (F ), we say that A and B are row equivalent, and we write A ∼ B,
when B can be obtained by applying a finite sequence of elementary row operations to A.
2.13 Remark: In the next section, we describe an algorithm for finding the solution set
to a given matrix equation Ax = b. We shall use elementary row operations to construct
a sequence of augmented matrices
(A|b) = (A0 |b0 ) ∼ (A1 |b1 ) ∼ (A2 |b2 ) ∼ · · · ∼ (Al |bl ) = (R|c)
such that each equation Ak x = bk has the same solution set as the original equation
Ax = b, and such that the final matrix Al = R is in a particularly nice form so that we can
easily determine the solution set. We shall find that when the solution set is non-empty,
we can write the solutions in the form
x = p + t1 u1 + t2 u2 + · · · + tr ur .
2.14 Definition: A matrix A ∈ Mm×n (F ) is said to be in reduced row echelon form
when A = 0 or there exist column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ jr ≤ n, where 1 ≤ r ≤ n,
such that for each row index k with 1 ≤ k ≤ m we have
(1) Ak,jk = 1,
(2) for each j < jk we have Akj = 0,
(3) for each i < k we have Ai,jk = 0, and
(4) for all i > r and all j we have Aij = 0.
The entries Ak,jk = 1 are called the pivots, the positions (k, jk ) where they occur are
called the pivot positions, the columns j1 , j2 , · · · , jr of A are called the pivot columns,
and the number of pivots r is called the rank of the reduced row echelon matrix A.
2.15 Example: Consider the augmented matrix
        [ 1 −2  1  0 −3  0  2 | 4 ]
(A|b) = [ 0  0  0  1  1  0 −1 | 1 ]
        [ 0  0  0  0  0  1 −2 | 3 ]
        [ 0  0  0  0  0  0  0 | 0 ]
Note that the matrix A is in reduced row echelon form with the pivots in positions
(1, 1), (2, 4) and (3, 6). The corresponding system of equations is
x1 − 2x2 + x3 − 3x5 + 2x7 = 4
x4 + x5 − x7 = 1
x6 − 2x7 = 3
The solutions x can be obtained by letting x2 = t1 , x3 = t2 , x5 = t3 and x7 = t4 , with
t1 , t2 , t3 , t4 ∈ F arbitrary, and then solving for x1 , x4 and x6 to get
x1 = 4 + 2t1 − t2 + 3t3 − 2t4
x4 = 1 − t3 + t4
x6 = 3 + 2t4
Thus the solution set is the set of points of the form
x = p + t1 u1 + t2 u2 + t3 u3 + t4 u4
where
p = (4, 0, 0, 1, 0, 3, 0)T , u1 = (2, 1, 0, 0, 0, 0, 0)T , u2 = (−1, 0, 1, 0, 0, 0, 0)T ,
u3 = (3, 0, 0, −1, 1, 0, 0)T and u4 = (−2, 0, 0, 1, 0, 2, 1)T .
We can also write this as
x = p + Bt
where B is the matrix with columns u1 , u2 , u3 , u4 .
2.16 Note: In general, suppose that A ∈ Mm×n (F ) is in reduced row echelon form with
pivot column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ jr ≤ n. Let 1 ≤ l1 ≤ l2 ≤ · · · ≤ ln−r ≤ n be the
non-pivot column indices. Write
        [ R | c ]
(A|b) = [ 0 | d ]
where R ∈ Mr×n (F ) is the matrix whose rows are the non-zero rows of A and where
c ∈ F r and d ∈ F m−r . Then the equation Ax = b has a solution if and only if d = 0, and
in this case, as in the above example, the solution is given by x = p + Bt where p ∈ F n
and B ∈ Mn×(n−r) (F ) are given by pJ = c , pL = 0 , BJ = −RL and BL = I, where
pJ = (pj1 , pj2 , · · · , pjr )T , pL = (pl1 , pl2 , · · · , pln−r )T , BJ is the matrix whose rows are the
rows j1 , j2 , · · · , jr of B, BL is the matrix whose rows are the rows l1 , l2 , · · · , ln−r of B, and
RL is the matrix whose columns are the columns l1 , l2 , · · · , ln−r of R.
Gauss-Jordan Elimination
2.17 Theorem: (Gauss-Jordan Elimination) Let A ∈ Mm×n (F ) and let b ∈ F m , and
consider the equation Ax = b. If A = 0 and b = 0 then every x ∈ F n is a solution so the
solution set is F n . If A = 0 and b ≠ 0 then there is no solution, so the solution set is ∅. If
A ≠ 0 then we can perform a series of elementary row operations to obtain a sequence of
augmented matrices
(A|b) = (A0 |b0 ) ∼ (A1 |b1 ) ∼ (A2 |b2 ) ∼ · · · ∼ (Al |bl ) = [ R | c ]
                                                         [ 0 | d ]
where Al is in reduced row echelon form and R is the matrix whose rows are the non-zero
rows of Al . If d ≠ 0 then the solution set is empty and if d = 0 then the solution set is the
affine space x = p + Bt, as described in Note 2.16 above.
Proof: Suppose that A ≠ 0. We describe an algorithm to obtain the required sequence of
augmented matrices.
Step 1: choose the smallest index j = j1 such that the j th column of A is not zero, then
choose the smallest index i = i1 such that aij ≠ 0. Perform the row operations Ri ↦ (1/aij ) Ri
then R1 ↔ Ri to obtain a new augmented matrix (A′ |b′ ) whose first j − 1 columns are all
zero, with A′1j = 1. For each i > 1 perform the row operation Ri ↦ Ri − A′ij R1 to obtain
a new augmented matrix (A1 |b1 ) whose first j − 1 columns are zero and whose j th column
is e1 .
Step s + 1: suppose that we have performed the first s steps in the algorithm and have
obtained an augmented matrix (As |bs ) which is row-equivalent to (A|b) and which has the
property that there exist column indices 1 ≤ j1 ≤ j2 ≤ · · · ≤ js ≤ n such that for each row
index k with 1 ≤ k ≤ s we have
(1) (As )k,jk = 1,
(2) for each j < jk we have (As )kj = 0,
(3) for each i < k we have (As )i,jk = 0, and
(4) for all i > s and all j ≤ js we have (As )ij = 0.
If s = m or if js = n then we are done because the matrix As is already in reduced row
echelon form. Suppose that s < m and js < n. If for all i > s and all j > js we have
(As )ij = 0 then we are done because the matrix As is already in reduced row echelon form.
Suppose that (As )ij ≠ 0 for some i > s and some j > js . Let j = js+1 be the smallest index
such that (As )ij ≠ 0 for some i > s, then apply elementary row operations, involving only
rows i with i > s, to the augmented matrix (As |bs ) to obtain a matrix (A′s |b′s ) with
(A′s )s+1,j = 1; this can be done for example by choosing an index i > s such that (As )ij ≠ 0
and then performing the row operation Ri ↦ (1/(As )ij ) Ri then (if i ≠ s + 1) the operation
Rs+1 ↔ Ri . Then for each i ≠ s + 1 perform the row operation Ri ↦ Ri − (A′s )ij Rs+1 on
the augmented matrix (A′s |b′s ) to obtain the row-equivalent augmented matrix (As+1 |bs+1 ).
Verify that the new matrix satisfies the above 4 properties with s replaced by s + 1.
2.18 Definition: The algorithm for solving the system Ax = b described in the proof of
the above theorem is called Gauss-Jordan elimination.
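A minimal Python sketch of the algorithm described in the proof above (not part of the original notes; for simplicity it works over Q using exact Fraction arithmetic rather than a general field). It reduces an augmented matrix to reduced row echelon form using only the three elementary row operations.

    from fractions import Fraction as Q

    def rref(M):
        """Row reduce M (a list of rows of Fractions) to reduced row echelon
        form, using only the three elementary row operations."""
        A = [row[:] for row in M]          # work on a copy
        m, n = len(A), len(A[0])
        pivot_row = 0
        for j in range(n):                  # scan the columns left to right
            # find a row at or below pivot_row with a non-zero entry in column j
            i = next((r for r in range(pivot_row, m) if A[r][j] != 0), None)
            if i is None:
                continue
            A[i] = [c / A[i][j] for c in A[i]]       # R_i -> (1/a_ij) R_i
            A[pivot_row], A[i] = A[i], A[pivot_row]  # R_pivot <-> R_i
            for r in range(m):                       # clear the rest of column j
                if r != pivot_row and A[r][j] != 0:
                    t = A[r][j]
                    A[r] = [cr - t * cp for cr, cp in zip(A[r], A[pivot_row])]
            pivot_row += 1
            if pivot_row == m:
                break
        return A

    # the augmented matrix of Example 2.5
    M = [[Q(2), Q(-1), Q(3), Q(7)],
         [Q(1), Q(0), Q(2), Q(5)],
         [Q(3), Q(-4), Q(2), Q(3)]]
    for row in rref(M):
        print(row)   # rows of [ 1 0 2 | 5 ], [ 0 1 1 | 3 ], [ 0 0 0 | 0 ]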
2.19 Example: Solve the system 3x + 2y + z = 4, 2x + y − z = 3, x + 2y + 4z = 1 in Q3 .
Solution: We form the augmented matrix for the system then perform row operations to
reduce to reduced row echelon form. We have
        [ 3  2  1 | 4 ]
(A|b) = [ 2  1 −1 | 3 ]
        [ 1  2  4 | 1 ]
Applying R1 ↦ R1 − R2 gives
[ 1  1  2 | 1 ]
[ 2  1 −1 | 3 ]
[ 1  2  4 | 1 ]
then applying R2 ↦ R2 − 2R1 and R3 ↦ R3 − R1 , followed by R2 ↦ −R2 , gives
[ 1  1  2 |  1 ]
[ 0  1  5 | −1 ]
[ 0  1  2 |  0 ]
then applying R1 ↦ R1 − R2 and R3 ↦ R3 − R2 gives
[ 1  0 −3 |  2 ]
[ 0  1  5 | −1 ]
[ 0  0 −3 |  1 ]
then applying R3 ↦ −(1/3)R3 , then R1 ↦ R1 + 3R3 and R2 ↦ R2 − 5R3 , gives
[ 1  0  0 |  1   ]
[ 0  1  0 |  2/3 ]
[ 0  0  1 | −1/3 ]
From the final reduced matrix we see that the solution is given by (x, y, z) = (1, 2/3, −1/3).
2.20 Remark: Note that in the above solution, at the first step we used the row operation
R1 ↦ R1 − R2 to obtain a pivot in position (1, 1), but we could have achieved this in many
different ways. For example, we could have used the row operation R1 ↔ R3 or we could
have used R1 ↦ (1/3)R1 .
2.21 Example: Solve the system 2x + y + 3z = 1, 3x + y + 5z = 2, x − y + 3z = 0 in Q3 .
Solution: Using Gauss-Jordan elimination, we have
        [ 2  1  3 | 1 ]   [ 1  2  0 | 1 ]   [ 1  2  0 |  1 ]
(A|b) = [ 3  1  5 | 2 ] ∼ [ 3  1  5 | 2 ] ∼ [ 0 −5  5 | −1 ]
        [ 1 −1  3 | 0 ]   [ 1 −1  3 | 0 ]   [ 0 −3  3 | −1 ]

          [ 1  2  0 | 1 ]   [ 1  0  2 | −1 ]
        ∼ [ 0  1 −1 | 1 ] ∼ [ 0  1 −1 |  1 ]
          [ 0  3 −3 | 1 ]   [ 0  0  0 | −2 ]

From the reduced matrix, we see that there is no solution (the last row gives the equation 0 = −2).
2.22 Example: Solve the system x1 + 2x2 + x3 + 3x4 = 2, 2x1 + 3x2 + x3 + 4x4 + x5 = 5,
x1 + 3x2 + 2x3 + 4x4 + 2x5 = 3 in Q5 .
Solution: Using Gauss-Jordan elimination gives
        [ 1  2  1  3  0 | 2 ]   [ 1  2  1  3  0 |  2 ]
(A|b) = [ 2  3  1  4  1 | 5 ] ∼ [ 0  1  1  2 −1 | −1 ]
        [ 1  3  2  4  2 | 3 ]   [ 0  1  1  1  2 |  1 ]

          [ 1  0 −1 −1  2 |  4 ]   [ 1  0 −1  0 −1 |  2 ]
        ∼ [ 0  1  1  2 −1 | −1 ] ∼ [ 0  1  1  0  5 |  3 ]
          [ 0  0  0  1 −3 | −2 ]   [ 0  0  0  1 −3 | −2 ]

From the reduced matrix we see that the solution set is the plane given by x = p + su + tv
where p = (2, 3, 0, −2, 0)T , u = (1, −1, 1, 0, 0)T and v = (1, −5, 0, 3, 1)T .
Chapter 3. Matrices and Concrete Linear Maps
The Row Space, Column Space and Null Space of a Matrix
3.1 Definition: Let R be a ring. For a matrix A ∈ Mm×n (R), the row span of A,
denoted by Row(A), is the span of the rows of A, the column span of A, denoted by
Col(A), is the span of the columns of A, and the null set of A is the set
Null(A) = { x ∈ Rn | Ax = 0 }.
When F is a field, Row(A) and Col(A) are also called the row space and column space
of A, and we define the rank of A and the nullity of A to be the dimensions
rank(A) = dim Col A and nullity(A) = dim Null A .
3.2 Note: For A = (u1 , u2 , · · · , un ) ∈ Mm×n (R) and t ∈ Rn we have At = Σ_{i=1}^{n} ti ui , so
Col(A) = { At | t ∈ Rn }.
3.3 Theorem: Let F be a field, let A ∈ Mm×n (F ) and let b ∈ F m . If x = p is a solution
to the equation Ax = b then
{ x ∈ F n | Ax = b } = p + Null(A).
Proof: If Ap = b then for x ∈ F n we have
Ax = b ⇐⇒ Ax = Ap ⇐⇒ A(x − p) = 0 ⇐⇒ (x − p) ∈ NullA ⇐⇒ x ∈ p + Null(A).
3.4 Note: For A = {u1 , u2 , · · · , un } ⊆ F m and A = (u1 , u2 , · · · , un ) ∈ Mm×n (F ), where
A ∼ R with R in reduced row echelon form, we have
A is linearly independent
⇐⇒ for all t1 , t2 , · · · , tn ∈ F , if Σ_{i=1}^{n} ti ui = 0 then each ti = 0
⇐⇒ for all t ∈ F n , if At = 0 then t = 0
⇐⇒ Null(A) = {0} ⇐⇒ Null(R) = {0}
⇐⇒ R has a pivot in every column ⇐⇒ R consists of the identity matrix I on top of a
zero matrix, and
A spans F m ⇐⇒ Col(A) = F m
⇐⇒ for every x ∈ F m there exists t ∈ F n with At = x
⇐⇒ for every y ∈ F m there exists t ∈ F n with Rt = y
⇐⇒ R has a pivot in every row.
3.5 Theorem: Let F be a field, let A = (u1 , u2 , · · · , un ) ∈ Mm×n (F ), and suppose A ∼ R
where R is in reduced row echelon form with pivots in columns 1 ≤ j1 < j2 < · · · < jr ≤ n.
Then
(1) the non-zero rows of R form a basis for Row(A),
(2) the set {uj1 , uj2 , · · · , ujr } is a basis for Col(A), and
(3) when we solve Ax = b using Gauss-Jordan elimination and write the solution as
x = p + Bt as in Note 2.16, the columns of B form a basis for Null(A).
Proof: First we prove Part (1). By Theorem 1.31, when we perform an elementary row
operation on a matrix, the span of the rows is unchanged, and so we have Row(A) =
Row(R). The nonzero rows of R span Row(R), so it suffices to show that the nonzero
rows of R are linearly independent. Let 1 ≤ j1 < j2 < · · · < jr be the indices of the
pivot columns in R. Let v1 , v2 , · · · , vr be the nonzero rows of R. Because R is in reduced
row echelon form, for 1 ≤ i ≤ r and 1 ≤ k ≤ r we have (vi )jk = δi,k . It follows that
{v1 , v2 , · · · , vr } is linearly independent because if Σ_{i=1}^{r} ti vi = 0 with each ti ∈ F then for
all k with 1 ≤ k ≤ r we have
0 = ( Σ_{i=1}^{r} ti vi )jk = Σ_{i=1}^{r} ti (vi )jk = Σ_{i=1}^{r} ti δi,k = tk .
To prove Part (2), let 1 ≤ l1 < l2 < · · · < ln−r ≤ n be the indices of the non-pivot
columns. Let v1 , v2 , · · · , vn ∈ F m be the columns of R and note that we have vji = ei
for 1 ≤ i ≤ r. When we use row operations to reduce A to R, the same row operations
reduce AJ = (uj1 , · · · , ujr ) to RJ = (vj1 , · · · , vjr ) = (e1 , · · · , er ), which is the identity
matrix I on top of a zero matrix. This shows that {uj1 , · · · , ujr } is linearly independent.
When we use row operations to reduce A to R, the same row operations will reduce (A|uk )
to (R|vk ), and so the equation Ax = uk has the same solutions as the equation Rx = vk .
Since only the first r rows of R are nonzero, each column vk can be written as
vk = Σ_{i=1}^{r} (vk )i ei = Σ_{i=1}^{r} (vk )i vji = Rt
where t ∈ F n is given by tJ = ((vk )1 , (vk )2 , · · · , (vk )r )T and tL = 0. Since Ax = uk and
Rx = vk have the same solutions, we also have uk = At = Σ_{i=1}^{r} (vk )i uji ∈ Span {uj1 , uj2 , · · · , ujr }.
This shows that Col(A) = Span {u1 , u2 , · · · , un } = Span {uj1 , · · · , ujr }.
To prove Part (3), note that the solution set of the equation Ax = b is the set
{ x ∈ F n | Ax = b } = p + Col(B) = p + Null(A)
so we must have Col(B) = Null(A). Since (as in Note 2.16) we have BL = I, it follows that
the columns of B are linearly independent, using the same argument that we used in Part (1)
to show that the nonzero rows of R are linearly independent. This proves Part (3).
3.6 Corollary: Let F be a field, let A ∈ Mm×n (F ), suppose that A is row equivalent to
a reduced row echelon matrix which has r pivots. Then
rank(A) = dim(RowA) = dim(ColA) = r , and
nullity(A) = dim(NullA) = n − r.
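A small Python sketch (not part of the original notes) that counts pivots by forward elimination over Q; by the corollary this gives rank(A), and then nullity(A) = n − rank(A). Fraction is used for exact arithmetic, and the example matrix is the coefficient matrix of Example 2.22.

    from fractions import Fraction as Q

    def rank(M):
        """Count the pivots found while row reducing a copy of M over Q."""
        A = [row[:] for row in M]
        m, n = len(A), len(A[0])
        r = 0                                   # number of pivots found so far
        for j in range(n):
            i = next((k for k in range(r, m) if A[k][j] != 0), None)
            if i is None:
                continue
            A[r], A[i] = A[i], A[r]
            for k in range(r + 1, m):           # clear the entries below the pivot
                t = A[k][j] / A[r][j]
                A[k] = [ck - t * cr for ck, cr in zip(A[k], A[r])]
            r += 1
        return r

    A = [[Q(1), Q(2), Q(1), Q(3), Q(0)],
         [Q(2), Q(3), Q(1), Q(4), Q(1)],
         [Q(1), Q(3), Q(2), Q(4), Q(2)]]
    print(rank(A), len(A[0]) - rank(A))   # rank 3 and nullity 2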
3.7 Corollary: Let F be a field, let A ∈ Mm×n (F ) and suppose that A is row equivalent
to a row reduced echelon matrix R.
(1) The rows of A are linearly independent ⇐⇒ the columns of A span F m ⇐⇒
rank(A) = m ⇐⇒ R has a pivot in every row.
(2) The rows of A span F n ⇐⇒ the columns of A are linearly independent ⇐⇒
rank(A) = n ⇐⇒ R has a pivot in every column ⇐⇒ R consists of the identity matrix I
on top of a zero matrix.
(3) The rows of A form a basis for F n ⇐⇒ the columns of A form a basis for F m ⇐⇒
rank(A) = m = n ⇐⇒ R = I.
Matrices and Linear Maps
3.8 Definition: Let R be a ring. A map L : Rn → Rm is called linear when
(1) L(x + y) = L(x) + L(y) for all x, y ∈ Rn , and
(2) L(tx) = t L(x) for all x ∈ Rn and all t ∈ R.
3.9 Note: Given a matrix A ∈ Mm×n (R), the map L : Rn → Rm given by L(x) = Ax is
linear.
3.10 Theorem: Let L : Rn → Rm be linear. There exists a unique matrix A ∈ Mm×n (R)
such that L(x) = Ax for all x ∈ Rn , namely the matrix A = ( L(e1 ), L(e2 ), · · · , L(en ) ).
Proof: Let L : Rn → Rm and let A = (u1 , u2 , · · · , un ) ∈ Mm×n (R). If L(x) = Ax for
all x ∈ Rn then for each index k we have uk = Aek = L(ek ). Conversely, suppose that
uk = L(ek ) for every index k. Then for all x ∈ Rn we have
L(x) = L( Σ_{i=1}^{n} xi ei ) = Σ_{i=1}^{n} xi L(ei ) = Σ_{i=1}^{n} xi ui = Ax .
3.11 Notation: Often, we shall not make a notational distinction between the matrix
A ∈ Mm×n (R) and its corresponding linear map A : Rn → Rm given by A(x) = Ax.
When we do wish to make a distinction, we shall use the following notation. Given a
matrix A ∈ Mm×n (R) we let LA : Rn → Rm be the linear map given by
LA (x) = Ax for all x ∈ Rn
and given a linear map L : Rn → Rm we let [L] be the corresponding matrix given by
[L] = ( L(e1 ), L(e2 ), · · · , L(en ) ) ∈ Mm×n (R).
3.12 Definition: For a linear map L : Rn → Rm , the kernel (or the null set) of L is the
set
Ker(L) = Null(L) = L−1 (0) = {x ∈ Rn |L(x) = 0}
and the image (or the range) of L is the set
Image(L) = Range(L) = L(Rn ) = {L(x)|x ∈ Rn }.
We also use the same terminology for a matrix A ∈ Mm×n (R) when we think of the
matrix as a linear map, so when A = [L] we have Ker(L) = Null(L) = Ker(A) = Null(A)
and Image(L) = Range(L) = Image(A) = Range(A) = Col(A). When F is a field and
L : F n → F m is linear, we define the rank and the nullity of L to be the dimensions
rank(L) = dim Range(L) and nullity(L) = dim Null(L) .
3.13 Theorem: Let R be a ring and let L : Rn → Rm be a linear map. Then
(1) L is surjective if and only if Range(L) = Rm , and
(2) L is injective if and only if Null(L) = {0}.
Proof: Part (1) is obvious, so we only prove Part (2). Note that since L is linear we have
L(0) = L(0 · 0) = 0 L(0) = 0 and so 0 ∈ Null(L). Suppose that L is injective. Then for
x ∈ Rn we have x ∈ Null(L) =⇒ L(x) = 0 =⇒ L(x) = L(0) =⇒ x = 0 so Null(L) = {0}.
Conversely, suppose that Null(L) = {0}. Then for x, y ∈ Rn we have
L(x) = L(y) =⇒ L(x − y) = 0 =⇒ (x − y) ∈ Null(L) = {0} =⇒ x − y = 0 =⇒ x = y
and so L is injective.
3.14 Example: The identity map on Rn is the map I : Rn → Rn given by I(x) = x for
all x ∈ Rn , and it corresponds to the identity matrix I ∈ Mn (R) with entries Ii,j = δi,j .
The zero map O : Rn → Rm given by O(x) = 0 for all x ∈ Rn corresponds to the zero
matrix O ∈ Mm×n (R) with entries Oi,j = 0 for all i, j.
3.15 Note: Given linear maps L, M : Rn → Rm and K : Rm → Rl and given t ∈ R, the
maps (L + M ) : Rn → Rm , tL : Rn → Rm and KL : Rn → Rl given by (L + M )(x) = L(x) + M (x),
(tL)(x) = t L(x) and (KL)(x) = K(L(x)) are all linear. For example, to see that KL is linear,
note that for x, y ∈ Rn and t ∈ R we have
(KL)(x + y) = K(L(x + y)) = K(L(x) + L(y)) = K(L(x)) + K(L(y)) = (KL)(x) + (KL)(y) , and
(KL)(tx) = K(L(tx)) = K(t L(x)) = t K(L(x)) = t (KL)(x).
3.16 Definition: Given A, B ∈ Mm×n (R) we define A + B ∈ Mm×n (R) to be the matrix
such that (A + B)(x) = Ax + Bx for all x ∈ Rn . Given A ∈ Mm×n (R) and t ∈ R, we
define tA ∈ Mm×n (R) to be the matrix such that (tA)(x) = t Ax for all x ∈ Rn . Given
A ∈ Ml×m (R) and B ∈ Mm×n (R) we define AB ∈ Ml×n (R) to be the matrix such that
(AB)x = A(Bx) for all x ∈ Rn .
3.17 Note: From the above definitions, it follows immediately that for all matrices A, B, C
of appropriate sizes and for all s, t ∈ R, we have
(1) (A + B) + C = A + (B + C),
(2) A + B = B + A,
(3) O + A = A = A + O,
(4) A + (−A) = O,
(5) (AB)C = A(BC),
(6) IA = A = AI,
(7) OA = O and AO = O,
(8) (A + B)C = AC + BC and A(B + C) = AB + AC,
(9) s(tA) = (st)A,
(10) if R is commutative then A(tB) = t(AB),
(11) (s + t)A = sA + tA and t(A + B) = tA + tB, and
(12) 0A = O, 1A = A and (−1)A = −A.
In particular, the set Mn (R) is a ring under addition and multiplication of matrices.
3.18 Theorem: For A, B ∈ Mm×n (R) and t ∈ R, the matrices A + B and tA are given
by (A + B)i,j = Ai,j + Bi,j and (tA)i,j = t Ai,j . For A = (u1 , u2 , · · · , ul )T ∈ Ml×m (R)
and B = (v1 , v2 , · · · , vn ) ∈ Mm×n (R), the matrix AB is given by
(AB)j,k = uj T vk = Σ_{i=1}^{m} Aj,i Bi,k .
Proof: For A, B ∈ Mm×n (R), the k th column of (A + B) is equal to (A + B)ek = Aek + Bek
which is the sum of the k th columns of A and B. It follows that (A + B)j,k = Aj,k + Bj,k
for all j, k. Similarly for t ∈ R, the k th column of tA is equal to (tA)ek = t Aek which is t
times the k th column of A.
Now let A = (u1 , · · · , ul )T ∈ Ml×m (R) and B = (v1 , · · · , vn ) ∈ Mm×n (R). The k th
column of AB is equal to (AB)ek = A(Bek ) = Avk , so the (j, k) entry of AB is equal to
(AB)j,k = (Avk )j = (Aj,1 , Aj,2 , · · · , Aj,m ) vk = Σ_{i=1}^{m} Aj,i Bi,k = uj T vk .
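The entry formula (AB)j,k = Σ Aj,i Bi,k gives an immediate (if naive) implementation of the matrix product; the sketch below is illustrative only and not part of the original notes.

    def mat_mul(A, B):
        """(AB)[j][k] = sum over i of A[j][i] * B[i][k]."""
        l, m, n = len(A), len(B), len(B[0])
        assert all(len(row) == m for row in A), "sizes must be compatible"
        return [[sum(A[j][i] * B[i][k] for i in range(m)) for k in range(n)]
                for j in range(l)]

    A = [[1, 2], [0, 1], [3, -1]]      # 3 x 2
    B = [[1, 0, 2], [4, 1, -1]]        # 2 x 3
    print(mat_mul(A, B))               # [[9, 2, 0], [4, 1, -1], [-1, -1, 7]]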
The Transpose and the Inverse
3.19 Definition: For a linear map L : Rn → Rm the transpose of L is the map
LT : Rm → Rn such that [LT ] = [L]T .
3.20 Note: When R is a ring, for A ∈ Mm×n (R) we have Row(A) = Col(AT ) and
Col(A) = Row(AT ). When F is a field, for A ∈ Mm×n (F ) we have rank(A) = rank(AT )
and for a linear map L : F n → F m we have rank(L) = rank(LT ).
3.21 Definition: For linear maps L : Rn → Rm and M : Rm → Rn , when LM = I
where I : Rm → Rm we say that L is a left inverse of M and that M is a right inverse
of L, and when LM = I and M L = I we say that L and M are (two-sided) inverses of
each other. When L : Rn → Rm has a (two-sided) inverse M : Rm → Rn we say that L is
invertible. We use the same terminology for matrices A ∈ Mm×n (R) and B ∈ Mn×m (R).
3.22 Theorem: Let R be a ring, let A ∈ Mm×n (R) and let B, C ∈ Mn×m (R). If B is a left
inverse of A and C is a right inverse of A then B = C. A similar result holds for linear
maps L : Rn → Rm and K, M : Rm → Rn .
Proof: Suppose that BA = I and that AC = I. Then
B = BI = B(AC) = (BA)C = IC = C .
3.23 Theorem: Let R be a commutative ring.
(1) For A, B ∈ Mm×n (R) and t ∈ R we have
(AT )T = A , (A + B)T = AT + B T and (tA)T = t AT .
A similar result holds for linear maps L, M : Rn → Rm .
(2) If A ∈ Ml×m (R) and B ∈ Mm×n (R) then
(AB)T = B TAT .
A similar result holds for linear maps L : Rl → Rm and M : Rm → Rn .
(3) For invertible matrices A, B ∈ Mn (R) and for an invertible element t ∈ R we have
(A−1 )−1 = A , (tA)−1 = t−1 A−1 and (AB)−1 = B −1 A−1 .
A similar result holds for invertible linear maps L, M : Rn → Rn .
Proof: We leave the proof of Part (1) as an exercise. To prove Part (2), suppose that R is
commutative and let A ∈ Ml×m (R) and B ∈ Mm×n (R). Then for all indices j, k we have
((AB)T )j,k = (AB)k,j = Σ_{i=1}^{m} Ak,i Bi,j = Σ_{i=1}^{m} Bi,j Ak,i = Σ_{i=1}^{m} (B T )j,i (AT )i,k = (B TAT )j,k .
To prove Part (3), let A, B ∈ Mn (R) be invertible matrices and let t ∈ R be an
invertible element. Because AA−1 = I and A−1A = I, it follows that (A−1 )−1 = A.
Because (tA)(t−1 A−1 ) = t t−1 AA−1 = 1 · I = I and similarly (t−1 A−1 )(tA) = I, it follows
that (tA)−1 = t−1 A−1 . Because (AB)(B −1A−1 ) = A(BB −1 )A−1 = AA−1 = I and similarly
(B −1A−1 )(AB) = I, it follows that (AB)−1 = B −1A−1 .
3.24 Theorem: Let F be a field and let A ∈ Mm×n (F ). Then
(1) A is surjective ⇐⇒ A has a right inverse matrix,
(2) A is injective ⇐⇒ A has a left inverse matrix,
(3) if A is bijective then n = m and A has a (two-sided) inverse matrix, and
(4) when n = m, A is bijective ⇐⇒ A is surjective ⇐⇒ A is injective.
A similar result holds for a linear map L : F n → F m .
Proof: We prove Part (1). Suppose first that A has a right inverse matrix, say AB = I
with B ∈ Mn×m (F ). Then given y ∈ F m we can take x = By ∈ F n to get
Ax = A(By) = (AB)y = Iy = y.
Thus A is surjective. Conversely, suppose that A is surjective. For each index k ∈
{1, 2, · · · , m}, choose uk ∈ F n so that Auk = ek , and then let B = (u1 , u2 , · · · , um ) ∈
Mn×m (F ). Then we have
AB = A(u1 , u2 , · · · , um ) = Au1 , Au2 , · · · , Aum = (e1 , e2 , · · · , em ) = I .
To prove Part (2), suppose first that A has a left inverse matrix, say BA = I with
B ∈ Mn×m (F ). Then for x ∈ F n we have
Ax = 0 =⇒ B(Ax) = 0 =⇒ (BA)x = 0 =⇒ Ix = 0 =⇒ x = 0
and so Null(A) = {0}. Thus A is injective. Conversely, suppose that A is injective. Then
Null(A) = {0}, so the columns of A are linearly independent, hence the rows of A span F n ,
equivalently the columns of AT span F n , hence Range(AT ) = F n and so AT is surjective.
Since AT is surjective, we can choose C ∈ Mm×n (F ) so that AT C = I. Let B = C T so
that ATB T = I. Transpose both sides to get BA = I T = I. Thus the matrix B is a left
inverse of A.
Parts (3) and (4) follow easily from Parts (1) and (2) together with previous results
(namely Note 3.4, Corollary 3.7 and Theorems 3.13 and 3.22).
3.25 Note: To obtain a right inverse of a given matrix A ∈ Mm×n (F ) using the method
described in the proof of Part (1) of the above theorem, we can find vectors u1 , u2 , · · · , um ∈
F n such that Auk = ek for each index k by reducing each of the augmented matrices (A|ek ).
Since the same row operations which are used to reduce (A|e1 ) to the form (R|u1 ) (with
R in reduced echelon form) will also reduce each of the augmented matrices (A|ek ) to the
form (R|uk ), we can solve all of the equations Auk = ek simultaneously by reducing the
matrix (A|I) = (A|e1 , e2 , · · · , em ) to the form (R|u1 , u2 , · · · , um ).


1 3 2
3.26 Example: Let A =  2 4 1  ∈ M3 (Q). Find A−1 .
1 1 0
Solution: We have

1

(A|I) = 2
1

1

∼ 0
0
 
0
1 3


0 ∼ 0 −2
0 −2
1
 
1
1 0 0
3 2
1
3


1 −2 0 ∼ 0
1 2
−2 −2 −1 0 1
0
3
4
1
2
1
0
1
0
0
0
1
0

2
1 0 0
−3 −2 1 0 
−2 −1 0 1
1
5 
0 0
2 −1
2
1 0 − 12 1 − 23 
0 1
1 −1 1
and so A−1 is equal to the matrix whichh appears on the right of the final matrix above.
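The computation in Example 3.26 can be automated: augment A with the identity matrix and row reduce, exactly as in Note 3.25. The Python sketch below (not part of the original notes) does this over Q with Fraction arithmetic and raises an error if A is not invertible.

    from fractions import Fraction as Q

    def inverse(A):
        """Return A^(-1) by row reducing the augmented matrix (A | I) to (I | A^(-1))."""
        n = len(A)
        # build the augmented matrix (A | I)
        M = [[Q(x) for x in A[i]] + [Q(1) if j == i else Q(0) for j in range(n)]
             for i in range(n)]
        for j in range(n):
            i = next((r for r in range(j, n) if M[r][j] != 0), None)
            if i is None:
                raise ValueError("matrix is not invertible")
            M[j], M[i] = M[i], M[j]                       # R_j <-> R_i
            M[j] = [c / M[j][j] for c in M[j]]            # make the pivot equal to 1
            for r in range(n):                            # clear the rest of column j
                if r != j and M[r][j] != 0:
                    t = M[r][j]
                    M[r] = [cr - t * cj for cr, cj in zip(M[r], M[j])]
        return [row[n:] for row in M]

    A = [[1, 3, 2], [2, 4, 1], [1, 1, 0]]
    for row in inverse(A):
        print([str(c) for c in row])
    # prints ['1/2', '-1', '5/2'], ['-1/2', '1', '-3/2'], ['1', '-1', '1']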
Chapter 4. Determinants
Permutations
4.1 Definition: A group is a set G together with an element e ∈ G, called the identity
element, and a binary operation ∗ : G × G → G, where for a, b ∈ G we write ∗(a, b) as a ∗ b
or often simply as ab, such that
(1) ∗ is associative: (ab)c = a(bc) for all a, b, c ∈ G,
(2) e is an identity: ae = a = ea for all a ∈ G, and
(3) every a ∈ G has an inverse: for every a ∈ G there exists b ∈ G with ab = e = ba.
A group G is called abelian when
(4) ∗ is commutative: ab = ba for all a, b ∈ G.
4.2 Note: Let G be a group. Note that the identity element e ∈ G is the unique element
that satisfies Axiom (2) in the above definition because if u ∈ G has the property that
ua = a = au for all a ∈ G, then in the case that a = e we obtain u = ue = e. Also note that
given a ∈ G the element b which satisfies Axiom (3) above is unique because if ab = e and
ca = e then we have b = eb = (ca)b = c(ab) = ce = c.
4.3 Definition: Let G be a group. Given a ∈ G, the unique element b ∈ G such that
ab = e = ba is called the inverse of a and is denoted by a−1 (unless the operation in G
is addition denoted by +, in which case the inverse of a is also called the negative of a
and is denoted by −a). We write a0 = e and for k ∈ Z+ we write ak = aa · · · a (where the
product involves k copies of a) and a−k = (ak )−1 .
4.4 Note: In a group G, we have the cancellation property: for all a, b, c ∈ G, if ab = ac
(or if ca = ba) then b = c. Indeed, if ab = ac then
b = eb = (a−1 a)b = a−1 (ab) = a−1 (ac) = (a−1 a)c = ec = c.
4.5 Example: If R is a ring under addition and multiplication then R is also an abelian
group under addition. The identity element is 0 and the inverse of a ∈ R is −a. For
example Zn , Z, Q, R and C are all abelian groups under addition.
4.6 Example: If R is a ring under addition and multiplication then the set
R∗ = { a ∈ R | a is invertible }
is a group under multiplication. The identity element is 1 and the inverse of a ∈ R is a−1 .
For example Z∗ = {1, −1}, Q∗ = Q \ {0}, R∗ = R \ {0} and C∗ = C \ {0} are all abelian
groups under multiplication. For n ∈ Z+ , the group of units modulo n is the group
Un = Zn ∗ = { a ∈ Zn | gcd(a, n) = 1 }.
The group of units Un is an abelian group under multiplication modulo n. When R is a
ring (usually commutative), the general linear group GLn (R) is the group
GLn (R) = { A ∈ Mn (R) | A is invertible }.
When n ≥ 2, the general linear group GLn (R) is a non-abelian group under matrix multiplication.
4.7 Definition: Let X be a set. The group of permutations of X is the group
Perm (X) = { f : X → X | f is bijective }
under composition. The identity element is the identity map I : X → X given by I(x) = x
for all x ∈ X. For n ∈ Z+ , the nth symmetric group is the group
Sn = Perm {1, 2, · · · , n} .
4.8 Definition: When a1 , a2 , · · · , al are distinct elements in {1, 2, · · · , n} we write
α = (a1 , a2 , · · · , al )
for the permutation α ∈ Sn given by
α(a1 ) = a2 , α(a2 ) = a3 , · · · , α(al−1 ) = al , α(al ) = a1 , and
α(k) = k for all k ∉ {a1 , a2 , · · · , al } .
Such a permutation is called a cycle of length l or an l-cycle.
4.9 Note: We make several remarks.
(1) We have e = (1) = (2) = · · · = (n).
(2) We have (a1 , a2 , · · · , al ) = (a2 , a3 , · · · , al , a1 ) = (a3 , a4 , · · · , al , a1 , a2 ) = · · ·.
(3) An l-cycle with l ≥ 2 can be expressed uniquely in the form α = (a1 , a2 , · · · , al ) with
a1 = min{a1 , a2 , · · · , al }.
(4) If α = (a1 , a2 , · · · , al ) then α−1 = (al , al−1 , · · · , a2 , a1 ) = (a1 , al , · · · , a3 , a2 ).
(5) If n ≥ 3 then we have (12)(23) = (123) and (23)(12) = (132) so Sn is not abelian.
4.10 Definition: In Sn , given cycles αi with αi = (ai,1 , ai,2 , · · · , ai,li ), we say that the
cycles αi are disjoint when all the elements ai,j ∈ {1, 2, · · · , n} are distinct.
4.11 Theorem: (Cycle Notation) Every α ∈ Sn can be written as a product of disjoint
cycles. Indeed every α 6= e can be written uniquely in the form
α = (a1,1 , · · · , a1,l1 )(a2,1 , · · · , a2,l2 ) · · · (am,1 , · · · , am,lm )
with m ≥ 1, each li ≥ 2, the elements ai,j all distinct, each ai,1 = min{ai,1 , ai,2 , · · · , ai,li }
and a1,1 < a2,1 < · · · < am,1 .
Proof: Let e 6= α ∈ Sn where n ≥ 2. To write α in the given form, we must take
a1,1 to be the smallest element k ∈ {1, 2, · · · , n} with α(k) ≠ k. Then we must have
a1,2 = α(a1,1 ), a1,3 = α(a1,2 ) = α2 (a1,1 ), and so on. Eventually we must reach l1 such that
a1,1 = αl1 (a1,1 ), indeed since {1, 2, · · · , n} is finite, eventually we find αi (a1,1 ) = αj (a1,1 )
for some 1 ≤ i < j and then a1,1 = α−i αi (a1,1 ) = α−i αj (a1,1 ) = αj−i (a1,1 ). For the
smallest such l1 the elements a1,1 , · · · , a1,l1 will be distinct since if we had a1,i = a1,j for
some 1 ≤ i < j ≤ l1 then, as above, we would have αj−i (a1,1 ) = a1,1 with 1 ≤ j − i < l1 .
This gives us the first cycle α1 = (a1,1 , a1,2 , · · · , a1,l1 ).
If we have α = α1 we are done. Otherwise there must be some k ∈ {1, 2, · · · , n} with
k ∉ {a1,1 , a1,2 , · · · , a1,l1 } such that α(k) ≠ k, and we must choose a2,1 to be the smallest
such k. As above we obtain the second cycle α2 = (a2,1 , a2,2 , · · · , a2,l2 ). Note that α2 must
be disjoint from α1 because if we had αi (a2,1 ) = αj (a1,1 ) for some i, j then we would have
a2,1 = α−i αi (a2,1 ) = α−i αj (a1,1 ) = αj−i (a1,1 ) ∈ {a1,1 , · · · , a1,l1 }.
At this stage, if α = α1 α2 we are done, and otherwise we continue the procedure.
4.12 Definition: When a permutation e 6= α ∈ Sn is written in the unique form of the
above theorem, we say that α is written in cycle notation. We usually write e as e = (1).
4.13 Theorem: (Even and Odd Permutations) In Sn , with n ≥ 2,
(1) every α ∈ Sn is a product of 2-cycles,
(2) if e = (a1 , b1 )(a2 , b2 ) · · · (al , bl ) then l is even, that is, l = 0 mod 2, and
(3) if α = (a1 , b1 )(a2 , b2 ) · · · (al , bl ) = (c1 , d1 )(c2 , d2 ) · · · (cm , dm ) then l = m mod 2.
Proof: To prove part (1), note that given α ∈ Sn we can write α as a product of cycles,
and we have
(a1 , a2 , · · · , al ) = (a1 , al )(a1 , al−1 ) · · · (a1 , a2 ) .
We shall prove part (2) by induction. First note that we cannot write e as a single
2-cycle, but we can write e as a product of two 2-cycles, for example e = (1, 2)(1, 2). Fix
l ≥ 3 and suppose, inductively, that for all k < l, if we can write e as a product of k
2-cycles then k must be even. Suppose that e can be written as a product of l 2-cycles, say
e = (a1 , b1 )(a2 , b2 ) · · · (al , bl ). Let a = a1 . Of all the ways we can write e as a product of l
2-cycles, in the form e = (x1 , y1 )(x2 , y2 ) · · · (xl , yl ), with xi = a for some i, choose one way,
say e = (r1 , s1 )(r2 , s2 ) · · · (rl , sl ) with rm = a and ri , si ≠ a for all i < m, with m being as
large as possible. Note that m ≠ l since for α = (r1 , s1 ) · · · (rl , sl ) with rl = a and ri , si ≠ a
for i < l we have α(sl ) = a ≠ sl and so α ≠ e. Consider the product (rm , sm )(rm+1 , sm+1 ).
This product must be (after possibly interchanging rm+1 and sm+1 ) of one of the forms
(a, b)(a, b) , (a, b)(a, c) , (a, b)(b, c) , (a, b)(c, d)
where a, b, c, d are distinct. Note that
(a, b)(a, c) = (a, c, b) = (b, c)(a, b),
(a, b)(b, c) = (a, b, c) = (b, c)(a, c), and
(a, b)(c, d) = (c, d)(a, b) ,
and so in each of these three cases we could rewrite e as a product of l 2-cycles with the
first occurrence of a being farther to the right, contradicting the fact that we chose m to
be as large as possible. Thus the product (rm , sm )(rm+1 , sm+1 ) is of the form (a, b)(a, b).
By cancelling these two terms, we can write e as a product of (l − 2) 2-cycles. By the
induction hypothesis, (l − 2) is even, and so l is even.
Finally, to prove part (3), suppose that α = (a1 , b1 ) · · · (al , bl ) = (c1 , d1 ) · · · (cm , dm ).
Then we have
e = α α−1 = (a1 , b1 ) · · · (al , bl )(cm , dm ) · · · (c1 , d1 ).
By part (2), l + m is even, and so l ≡ m (mod 2).
4.14 Definition: For n ≥ 2, a permutation α ∈ Sn is called even if it can be written as
a product of an even number of 2-cycles. Otherwise α can be written as a product of an
odd number of 2-cycles, and then it is called odd. We define the sign (or the parity) of α ∈ Sn to be
(−1)^α = 1 if α is even, and (−1)^α = −1 if α is odd.
4.15 Note: Note that (−1)^e = 1 and that for α, β ∈ Sn we have (−1)^{αβ} = (−1)^α (−1)^β
and (−1)^{α^{−1}} = (−1)^α. Also note that when α is an l-cycle we have (−1)^α = (−1)^{l−1}
because (a1, a2, · · · , al) = (a1, a2)(a2, a3) · · · (al−1, al).
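Building on the cycle_decomposition sketch above (and therefore sharing its illustrative conventions), the sign can be computed directly from the cycle type, since each l-cycle contributes a factor (−1)^{l−1}.

    def sign(perm):
        # (-1)^alpha = product over the cycles of (-1)^(length - 1)
        s = 1
        for cycle in cycle_decomposition(perm):
            if (len(cycle) - 1) % 2 == 1:
                s = -s
        return s

    alpha = {1: 3, 2: 1, 3: 2, 4: 5, 5: 4}   # (1,3,2)(4,5)
    print(sign(alpha))                        # (+1) * (-1) = -1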
Multilinear Maps
4.16 Notation: Let R be a commutative ring. For positive integers n1, n2, · · · , nk, let
∏_{i=1}^{k} Rni = { (u1, u2, · · · , uk) | each ui ∈ Rni }.
Note that
Mn×k(R) = ∏_{i=1}^{k} Rn = { (u1, u2, · · · , uk) | each ui ∈ Rn }.
4.17 Definition: For a map L : ∏_{i=1}^{k} Rni → Rm, we say that L is k-linear when for each
index j ∈ {1, 2, · · · , k} and for all ui, v, w ∈ Rnj and all t ∈ R we have
L(u1, · · · , uj−1, v + w, uj+1, · · · , uk) = L(u1, · · · , uj−1, v, uj+1, · · · , uk)
+ L(u1, · · · , uj−1, w, uj+1, · · · , uk), and
L(u1, · · · , uj−1, tv, uj+1, · · · , uk) = t L(u1, · · · , uj−1, v, uj+1, · · · , uk).
For a k-linear map L : Mn×k(R) = ∏_{i=1}^{k} Rn → Rm we say that L is symmetric when for
each index j ∈ {1, 2, · · · , k − 1} and for all ui, v, w ∈ Rn we have
L(u1, · · · , uj−1, v, w, uj+2, · · · , uk) = L(u1, · · · , uj−1, w, v, uj+2, · · · , uk),
or equivalently when for every permutation σ ∈ Sk and all ui ∈ Rn we have
L(u1, u2, · · · , uk) = L(uσ(1), uσ(2), · · · , uσ(k)),
and we say that L is skew-symmetric when for each index j ∈ {1, 2, · · · , k − 1} and for
all ui, v, w ∈ Rn we have
L(u1, · · · , uj−1, v, w, uj+2, · · · , uk) = − L(u1, · · · , uj−1, w, v, uj+2, · · · , uk),
or equivalently when for every permutation σ ∈ Sk and all ui ∈ Rn we have
L(u1, u2, · · · , uk) = (−1)^σ L(uσ(1), uσ(2), · · · , uσ(k)),
and we say that L is alternating when for each index j ∈ {1, 2, · · · , k − 1} and for all
ui, v ∈ Rn we have
L(u1, · · · , uj−1, v, v, uj+2, · · · , uk) = 0.
4.18 Example: As an exercise, show that for every matrix A ∈ Mm×n (R), the map
L : Rn × Rm → R given by L(x, y) = y TA x is 2-linear and, conversely, that given any
2-linear map L : Rn × Rm → R there exists a unique matrix A ∈ Mm×n (R) such that
L(x, y) = y TA x for all x ∈ Rn and y ∈ Rm .
4.19 Theorem: Let R be a commutative ring. Let L : Mn×k(R) = ∏_{i=1}^{k} Rn → Rm be k-linear.
Then
(1) if L is alternating then L is skew-symmetric,
(2) if L is alternating then for all indices i, j ∈ {1, 2, · · · , k} with i < j and for all ui, v ∈ Rn
we have L(u1, · · · , ui−1, v, ui+1, · · · , uj−1, v, uj+1, · · · , uk) = 0, and
(3) if 2 ∈ R∗ and L is skew-symmetric then L is alternating.
Proof: To prove Part (1), we suppose that L is alternating. Then for j ∈ {1, 2, · · · , k − 1}
and ui, v, w ∈ Rn we have
0 = L(u1, · · · , uj−1, v + w, v + w, uj+2, · · · , uk)
= L(u1, · · · , v, v, · · · , uk) + L(u1, · · · , v, w, · · · , uk) + L(u1, · · · , w, v, · · · , uk) + L(u1, · · · , w, w, · · · , uk)
= L(u1, · · · , v, w, · · · , uk) + L(u1, · · · , w, v, · · · , uk)
and so L(u1, · · · , v, w, · · · , uk) = − L(u1, · · · , w, v, · · · , uk), hence L is skew-symmetric.
To prove Part (2) we again suppose that L is alternating. Then, as shown immediately
above, L is also skew-symmetric, and so for indices i, j ∈ {1, 2, · · · , k} with i < j and for
ui, v ∈ Rn, in the case that j > i + 1, writing w = ui+1, we have
L(u1, · · · , ui−1, v, w, ui+2, · · · , uj−1, v, uj+1, · · · , uk)
= − L(u1, · · · , ui−1, v, v, ui+2, · · · , uj−1, w, uj+1, · · · , uk) = 0.
Finally, to prove Part (3), suppose that 2 ∈ R∗ and that L is skew-symmetric. Then
for an index j ∈ {1, 2, · · · , k − 1} and for ui, v ∈ Rn we have
L(u1, · · · , uj−1, v, v, uj+2, · · · , uk) = − L(u1, · · · , uj−1, v, v, uj+2, · · · , uk)
and so 2 L(u1, · · · , uj−1, v, v, uj+2, · · · , uk) = 0. Since 2 ∈ R∗ we can multiply both sides
by 2^{−1} to get L(u1, · · · , uj−1, v, v, uj+2, · · · , uk) = 0.
4.20 Theorem: Let R be a commutative ring. Given c ∈ R there exists a unique
alternating n-linear map L : Mn(R) = ∏_{i=1}^{n} Rn → R such that L(I) = L(e1, e2, · · · , en) = c.
This unique map L is given by
L(A) = c · Σ_{σ∈Sn} (−1)^σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n , that is
L(u1, u2, · · · , un) = c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) (u2)σ(2) · · · (un)σ(n) .
Proof: First we prove uniqueness. Suppose that L : Mn(R) = ∏_{i=1}^{n} Rn → R is alternating
and n-linear with L(I) = c. Then for all ui ∈ Rn we have
L(u1, u2, · · · , un) = L( Σ_{i1=1}^{n} (u1)i1 ei1 , Σ_{i2=1}^{n} (u2)i2 ei2 , · · · , Σ_{in=1}^{n} (un)in ein )
= Σ_{i1,i2,···,in=1}^{n} (u1)i1 (u2)i2 · · · (un)in L(ei1 , ei2 , · · · , ein).
Note that because L is alternating, whenever we have eij = eik for some j ≠ k, we
obtain L(ei1 , ei2 , · · · , ein) = 0, and so the only nonzero terms in the above sum occur when
i1 , i2 , · · · , in are distinct, so there is a permutation σ ∈ Sn with ij = σ(j) for all j. Thus
L(u1, u2, · · · , un) = Σ_{σ∈Sn} (u1)σ(1) (u2)σ(2) · · · (un)σ(n) L(eσ(1), eσ(2), · · · , eσ(n))
= Σ_{σ∈Sn} (u1)σ(1) (u2)σ(2) · · · (un)σ(n) (−1)^σ L(e1, e2, · · · , en)
= c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) (u2)σ(2) · · · (un)σ(n) .
This proves that there is a unique such map L and that it is given by the required formula.
To prove existence, it suffices to show that the map L : Mn(R) = ∏_{i=1}^{n} Rn → R given
by the formula
L(u1, u2, · · · , un) = c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) (u2)σ(2) · · · (un)σ(n)
is n-linear and alternating with L(I) = c. Note that this map L is n-linear because
L(u1, · · · , v + w, · · · , un) = c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) · · · (v + w)σ(j) · · · (un)σ(n)
= c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) · · · vσ(j) · · · (un)σ(n) + c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) · · · wσ(j) · · · (un)σ(n)
= L(u1, · · · , v, · · · , un) + L(u1, · · · , w, · · · , un)
and similarly L(u1, · · · , tv, · · · , un) = t L(u1, · · · , v, · · · , un).
Note that L is alternating because, given indices i, j ∈ {1, 2, · · · , n} with i < j, when
ui = uj = v we have
L(u1, · · · , v, · · · , v, · · · , un) = c · Σ_{σ∈Sn} (−1)^σ (u1)σ(1) · · · vσ(i) · · · vσ(j) · · · (un)σ(n)
= c · Σ_{σ∈Sn, σ(i)<σ(j)} (−1)^σ (u1)σ(1) · · · vσ(i) · · · vσ(j) · · · (un)σ(n)
+ c · Σ_{τ∈Sn, τ(i)>τ(j)} (−1)^τ (u1)τ(1) · · · vτ(i) · · · vτ(j) · · · (un)τ(n) .
This is equal to 0 because the term in the first sum labeled by σ ∈ Sn with σ(i) < σ(j) can
be paired with the term in the second sum labeled by τ = σ(i, j) (where σ(i, j) denotes
the composite of σ with the 2-cycle (i, j)), and then the sum of the two terms in each pair
is equal to 0 because (−1)^τ = −(−1)^σ.
Finally note that
L(e1, e2, · · · , en) = c · Σ_{σ∈Sn} (−1)^σ (e1)σ(1) (e2)σ(2) · · · (en)σ(n)
= c · Σ_{σ∈Sn} (−1)^σ δ1,σ(1) δ2,σ(2) · · · δn,σ(n) = c
because the only nonzero term in the sum occurs when σ = e.
The Determinant
4.21 Definition: Let R be a commutative ring. The unique alternating n-linear map
det : Mn (R) → R with det(I) = 1 is called the determinant map. For A ∈ Mn (R), the
determinant of A, denoted by |A| or by det(A), is given by
|A| = det(A) = Σ_{σ∈Sn} (−1)^σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n .
4.22 Example: As an exercise, find an explicit formula for the determinant of a 2 × 2
matrix and for the determinant of a 3 × 3 matrix.
4.23 Note: Given c ∈ R, according to the above theorem, the unique alternating n-linear
map L : Mn (R) → R with L(I) = c is given by L(A) = c |A|.
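For small n the defining sum over Sn can be evaluated directly. The following Python sketch (illustrative only, and hopelessly slow for large n since it visits all n! permutations) computes det(A) straight from the formula, obtaining the sign of each permutation by counting inversions.

    from itertools import permutations

    def det_by_formula(A):
        # A is an n x n matrix given as a list of rows; computes
        # the sum over sigma of (-1)^sigma * A[sigma(1),1] * ... * A[sigma(n),n]
        n = len(A)
        total = 0
        for sigma in permutations(range(n)):
            # sign of sigma via its inversion count
            inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                             if sigma[i] > sigma[j])
            term = (-1) ** inversions
            for col in range(n):
                term *= A[sigma[col]][col]
            total += term
        return total

    print(det_by_formula([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2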
4.24 Theorem: Let R be a commutative ring and let A, B ∈ Mn (R). Then
(1) |AT | = |A|, and
(2) |AB| = |A| |B|.
Proof: To prove Part (1) note that
|A| = Σ_{σ∈Sn} (−1)^σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n
= Σ_{σ∈Sn} (−1)^σ A1,σ−1(1) A2,σ−1(2) · · · An,σ−1(n)
= Σ_{τ∈Sn} (−1)^τ A1,τ(1) A2,τ(2) · · · An,τ(n)
= Σ_{τ∈Sn} (−1)^τ (AT)τ(1),1 (AT)τ(2),2 · · · (AT)τ(n),n = |AT|.
To prove Part (2), fix a matrix A ∈ Mn(R) and define L : Mn(R) → R by L(B) = |AB|.
Writing B = (u1, · · · , un) with each ui ∈ Rn, so that AB = (Au1, · · · , Aun), note that L is
n-linear because
L(u1, · · · , v + w, · · · , un) = |A(u1, · · · , v + w, · · · , un)|
= |(Au1, · · · , A(v + w), · · · , Aun)|
= |(Au1, · · · , Av + Aw, · · · , Aun)|
= |(Au1, · · · , Av, · · · , Aun)| + |(Au1, · · · , Aw, · · · , Aun)|
= |A(u1, · · · , v, · · · , un)| + |A(u1, · · · , w, · · · , un)|
= L(u1, · · · , v, · · · , un) + L(u1, · · · , w, · · · , un),
and similarly L(u1, · · · , tv, · · · , un) = t L(u1, · · · , v, · · · , un). Note that L is alternating
because
L(u1, · · · , v, v, · · · , un) = |A(u1, · · · , v, v, · · · , un)| = |(Au1, · · · , Av, Av, · · · , Aun)| = 0.
Note that L(I) = |AI| = |A|. Thus, by Theorem 4.20 (see Note 4.23) it follows that
L(B) = L(I) |B| = |A| |B|.
4.25 Definition: Let R be a commutative ring and let A ∈ Mn (R). We say that A is
upper triangular when Aj,k = 0 for all j > k, and we say that A is lower-triangular
when Aj,k = 0 for all j < k.
4.26 Theorem: Let R be a commutative ring and let A, B ∈ Mn (R).
(1) If B is obtained from A by performing an elementary column operation then |B| is
obtained from |A| as follows.
(a) if we use Ck ↔ Cl with k 6= l then |B| = −|A|,
(b) if we use Ck 7→ t Ck with t ∈ R then |B| = t |A|, and
(c) if we use Ck 7→ Ck + tCl with t ∈ R and k 6= l then |B| = |A|.
The same rules apply when B is obtained from A using an elementary row operation.
(2) If A is either upper-triangular or lower-triangular then |A| = ∏_{i=1}^{n} Ai,i .
Proof: If B is obtained from A using the column operation Ck ↔ Cl with k 6= l then
|B| = −|A| because the determinant map is skew-symmetric. If B is obtained from A
using Ck 7→ t Ck with t ∈ R then |B| = t|A| because the determinant map is linear.
Suppose B is obtained from A using Ck 7→ Ck + t Cl where t ∈ R and k ≠ l. Write
A = (u1, u2, · · · , un) with each ui ∈ Rn. Then since the determinant map is n-linear and
alternating we have
|B| = |(u1, · · · , uk + t ul, · · · , ul, · · · , un)|
= |(u1, · · · , uk, · · · , ul, · · · , un)| + t |(u1, · · · , ul, · · · , ul, · · · , un)|
= |A| + t · 0 = |A|.
This proves Part (1) in the case of column operations. The same rules apply when using
row operations because |AT | = |A|.
To prove Part (2), suppose that A is upper-triangular (the case that A is lower-triangular
is similar). We claim that for every σ ∈ Sn with σ ≠ e we have σ(i) > i, hence Aσ(i),i = 0,
for some i ∈ {1, 2, · · · , n}. Suppose, for a contradiction, that σ ≠ e and σ(i) ≤ i for all
indices i. Let k be the largest index for which σ(k) < k. Then we have σ(i) = i > k for
all i > k, and σ(k) < k, and σ(i) ≤ i < k for all i < k. This implies that there is no index i
for which σ(i) = k, but this is not possible since σ is surjective. This proves the claim.
Thus |A| = Σ_{σ∈Sn} (−1)^σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n = ∏_{i=1}^{n} Ai,i because the only nonzero
term in the above sum occurs when σ = e.
4.27 Example: The above theorem gives us a method that we can use to calculate
determinants. For example, using only row operations of the form Rk → Rk + t Rl we have
| 1 3 2 4 |   | 1  3  2   4 |   | 1  3  2   4 |
| 2 4 1 2 | = | 0 −2 −3  −6 | = | 0  2 −1   5 |
| 3 5 4 1 |   | 0 −4 −2 −11 |   | 0 −4 −2 −11 |
| 1 1 5 3 |   | 0 −2  3  −1 |   | 0 −2  3  −1 |

  | 1 3  2  4 |   | 1 3  2  4 |   | 1 3  2  4 |
= | 0 2 −1  5 | = | 0 2 −1  5 | = | 0 2 −1  5 | = −28.
  | 0 0 −4 −1 |   | 0 0  2 11 |   | 0 0  2 11 |
  | 0 0  2  4 |   | 0 0  2  4 |   | 0 0  0 −7 |
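The calculation above is easy to automate. The following Python sketch is our own illustration: it works over Q using exact Fraction arithmetic, and it also allows row swaps, which flip the sign as in Theorem 4.26. It reduces a matrix to upper-triangular form and reads off the determinant; applied to the 4 × 4 matrix of this example it returns −28.

    from fractions import Fraction

    def det_by_elimination(A):
        # Row-reduce to upper-triangular form, tracking the effect of the
        # row operations on the determinant (Theorem 4.26).
        M = [[Fraction(x) for x in row] for row in A]
        n = len(M)
        det = Fraction(1)
        for col in range(n):
            # find a pivot row; a row swap multiplies the determinant by -1
            pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
            if pivot is None:
                return Fraction(0)
            if pivot != col:
                M[col], M[pivot] = M[pivot], M[col]
                det = -det
            # operations R_k -> R_k + t R_col leave the determinant unchanged
            for r in range(col + 1, n):
                t = M[r][col] / M[col][col]
                M[r] = [a - t * b for a, b in zip(M[r], M[col])]
            det *= M[col][col]
        return det

    A = [[1, 3, 2, 4], [2, 4, 1, 2], [3, 5, 4, 1], [1, 1, 5, 3]]
    print(det_by_elimination(A))              # -28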
4.28 Definition: Let R be a commutative ring and let A ∈ Mn (R) with n ≥ 2. We write
A(j,k) to denote the (n − 1) × (n − 1) matrix which is obtained by removing the j th row
and the k th column of A. The cofactor matrix of A is the matrix Cof(A) ∈ Mn (R) with
entries
Cof(A)k,l = (−1)^{k+l} |A(l,k)| .
4.29 Theorem: Let R be a commutative ring and let A ∈ Mn(R) with n ≥ 2.
(1) For each k ∈ {1, 2, · · · , n} we have
|A| = Σ_{j=1}^{n} (−1)^{j+k} Aj,k |A(j,k)| = Σ_{j=1}^{n} (−1)^{k+j} Ak,j |A(k,j)| .
(2) We have
A · Cof(A) = |A| · I = Cof(A) · A.
(3) A is invertible in Mn(R) if and only if |A| is invertible in R, and in this case we have
|A−1| = |A|^{−1} and
A−1 = (1/|A|) Cof(A).
(4) If A is invertible then the unique solution to the equation Ax = b is the element x ∈ Rn
with entries
xk = |Bk| / |A|
where Bk is the matrix obtained by replacing the k th column of A by b.
Proof: We have
|A| = Σ_{σ∈Sn} (−1)^σ Aσ(1),1 Aσ(2),2 · · · Aσ(n),n
= Σ_{j=1}^{n} Σ_{σ∈Sn, σ(k)=j} (−1)^σ Aσ(1),1 · · · Aσ(k−1),k−1 Aj,k Aσ(k+1),k+1 · · · Aσ(n),n
= Σ_{j=1}^{n} Aj,k Σ_{σ∈Sn, σ(k)=j} (−1)^σ (A(j,k))τ(1),1 · · · (A(j,k))τ(n−1),n−1
where τ = τ(σ) ∈ Sn−1 is the permutation defined as follows:
if i < k then τ(i) = σ(i) when σ(i) < j, and τ(i) = σ(i) − 1 when σ(i) > j,
and if i > k then τ(i − 1) = σ(i) when σ(i) < j, and τ(i − 1) = σ(i) − 1 when σ(i) > j,
or equivalently, τ is the composite
τ = (n, n − 1, · · · , j + 1, j) σ (k, k + 1, · · · , n − 1, n).
Note that (−1)^τ = (−1)^{n−j} (−1)^σ (−1)^{n−k} and so we have (−1)^σ = (−1)^{j+k} (−1)^τ. Thus
|A| = Σ_{j=1}^{n} Aj,k Σ_{τ∈Sn−1} (−1)^{j+k} (−1)^τ (A(j,k))τ(1),1 · · · (A(j,k))τ(n−1),n−1
= Σ_{j=1}^{n} (−1)^{j+k} Aj,k |A(j,k)| .
The proof that |A| = Σ_{j=1}^{n} (−1)^{k+j} Ak,j |A(k,j)| is similar (or it follows from the formula
|AT| = |A|). This completes the proof of Part (1).
To prove Part (2) we note that
(Cof(A) · A)k,l = Σ_{j=1}^{n} Cof(A)k,j Aj,l = Σ_{j=1}^{n} (−1)^{k+j} Aj,l |A(j,k)| .
By Part (1), the sum on the right is equal to the determinant of the matrix B(k,l) ∈ Mn(R)
which is obtained from A by replacing the k th column of A by a copy of its l th column.
Since B(k,l) is equal to A when k = l, and B(k,l) has two equal columns when k ≠ l, we
have
(Cof(A) · A)k,l = |B(k,l)| = |A| when k = l and 0 when k ≠ l, that is (Cof(A) · A)k,l = |A| · δk,l .
This proves that Cof(A) · A = |A| · I. A similar proof shows that A · Cof(A) = |A| · I.
If A is invertible in Mn(R), then we have |A| |A−1| = |A · A−1| = |I| = 1 and similarly
|A−1| |A| = 1, and so |A| and |A−1| are invertible in R with |A−1| = |A|^{−1}. Conversely, the
formulas in Part (2) show that if |A| is invertible in R then A is invertible in Mn(R) with
A−1 = (1/|A|) Cof(A). This proves Part (3).
Part (4) now follows from Parts (1) and (3). Indeed if A is invertible then the solution
to Ax = b is given by x = A−1 b and so
xk = (A−1 b)k = ((1/|A|) Cof(A) b)k = (1/|A|) Σ_{j=1}^{n} Cof(A)k,j bj
= (1/|A|) Σ_{j=1}^{n} (−1)^{k+j} bj |A(j,k)| = (1/|A|) |Bk|
where Bk is the matrix obtained by replacing the k th column of A by b.
4.30 Definition: For a matrix A ∈ Mn (R), the first of the two sums in Part (1) of the
above theorem is called the cofactor expansion of |A| along the k th column of A, and
the second sum is called the cofactor expansion of |A| along the k th row of A.
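As a rough illustration of Parts (1) and (4), here is a small self-contained Python sketch (the helper names det_cofactor and cramer_solve are ours, purely for illustration). It evaluates a determinant by cofactor expansion along the first column and then solves Ax = b using the formula xk = |Bk|/|A|.

    def det_cofactor(A):
        # cofactor expansion of |A| along the first column (Theorem 4.29 (1))
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[1:] for i, row in enumerate(A) if i != j]   # A(j,1)
            total += (-1) ** j * A[j][0] * det_cofactor(minor)
        return total

    def cramer_solve(A, b):
        # x_k = |B_k| / |A| where B_k is A with its k-th column replaced by b
        n = len(A)
        d = det_cofactor(A)
        x = []
        for k in range(n):
            Bk = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(A)]
            x.append(det_cofactor(Bk) / d)
        return x

    A = [[2, 1], [5, 3]]
    print(det_cofactor(A))            # 2*3 - 1*5 = 1
    print(cramer_solve(A, [1, 2]))    # [1.0, -1.0], and indeed A(1,-1) = (1,2)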
4.31 Example: Row operations of the form Rk 7→ Rk + t Rl can be combined with cofactor
expansions along various columns: first use row operations to produce a column with only
one nonzero entry, then expand along that column to reduce the size of the determinant
by one, and repeat. In this way a 5 × 5 determinant is reduced to a 4 × 4 determinant and
eventually to a 2 × 2 determinant; the 5 × 5 determinant computed in this example works
out to 16.
4.32 Example: From the formula in Part (2), if ad − bc ≠ 0 then we have
| a  b |−1                    |  d  −b |
| c  d |    = (1/(ad − bc)) · | −c   a | .
As an exercise, find a similar formula for the inverse of a 3 × 3 matrix.
4.33 Corollary: Let R be a commutative ring and let A ∈ Mn (R). If A is invertible in
Mn (R) then |A| is invertible in R and we have |A−1 | = |A|−1 .
Chapter 5. The Dot and Cross Products in Rn
5.1 Definition: Let F be a field. For vectors x, y ∈ F n we define the dot product of x
and y to be
x · y = yT x = Σ_{i=1}^{n} xi yi ∈ F .
5.2 Theorem: (Properties of the Dot Product) For all x, y, z ∈ Rn and all t ∈ R we have
(1) (Bilinearity) (x + y) · z = x · z + y · z , (tx) · y = t(x · y) ,
x · (y + z) = x · y + x · z , x · (ty) = t(x · y),
(2) (Symmetry) x · y = y · x, and
(3) (Positive Definiteness) x · x ≥ 0 with x · x = 0 if and only if x = 0.
Proof: The proof is left as an exercise.
5.3 Definition: For a vector x ∈ Rn, we define the length (or norm) of x to be
|x| = √(x · x) = ( Σ_{i=1}^{n} xi^2 )^{1/2} .
We say that x is a unit vector when |x| = 1.
5.4 Theorem: (Properties of Length) Let x, y ∈ Rn and let t ∈ R. Then
(1) (Positive Definiteness) |x| ≥ 0 with |x| = 0 if and only if x = 0,
(2) (Scaling) |tx| = |t| |x|,
(3) |x ± y|^2 = |x|^2 ± 2(x · y) + |y|^2,
(4) (The Polarization Identities) x · y = (1/2)(|x + y|^2 − |x|^2 − |y|^2) = (1/4)(|x + y|^2 − |x − y|^2),
(5) (The Cauchy-Schwarz Inequality) |x · y| ≤ |x| |y| with |x · y| = |x| |y| if and only if the
set {x, y} is linearly dependent, and
(6) (The Triangle Inequality) |x + y| ≤ |x| + |y|.
Proof: We leave the proofs of Parts (1), (2) and (3) as an exercise, and we note that
(4) follows immediately from (3). To prove part (5), suppose first that {x, y} is linearly
dependent. Then one of x and y is a multiple of the other, say y = tx with t ∈ R. Then
|x · y| = |x · (tx)| = |t(x · x)| = |t| |x|^2 = |x| |tx| = |x| |y|.
Suppose next that {x, y} is linearly independent. Then for all t ∈ R we have x + ty ≠ 0
and so
0 ≠ |x + ty|^2 = (x + ty) · (x + ty) = |x|^2 + 2t(x · y) + t^2 |y|^2 .
Since the quadratic on the right is non-zero for all t ∈ R, it follows that the discriminant
of the quadratic must be negative, that is
4(x · y)^2 − 4|x|^2 |y|^2 < 0.
Thus (x · y)^2 < |x|^2 |y|^2 and hence |x · y| < |x| |y|. This proves part (5).
Using part (5) note that
|x + y|^2 = |x|^2 + 2(x · y) + |y|^2 ≤ |x|^2 + 2|x · y| + |y|^2 ≤ |x|^2 + 2|x| |y| + |y|^2 = (|x| + |y|)^2
and so |x + y| ≤ |x| + |y|, which proves part (6).
5.5 Definition: For points a, b ∈ Rn , we define the distance between a and b to be
dist(a, b) = |b − a|.
5.6 Theorem: (Properties of Distance) Let a, b, c ∈ Rn . Then
(1) (Positive Definiteness) dist(a, b) ≥ 0 with dist(a, b) = 0 if and only if a = b,
(2) (Symmetry) dist(a, b) = dist(b, a), and
(3) (The Triangle Inequality) dist(a, c) ≤ dist(a, b) + dist(b, c).
Proof: The proof is left as an exercise.
5.7 Definition: For nonzero vectors 0 ≠ x, y ∈ Rn, we define the angle between x and
y to be
θ(x, y) = cos^{−1}( (x · y) / (|x| |y|) ) ∈ [0, π].
Note that θ(x, y) = π/2 if and only if x · y = 0. For vectors x, y ∈ Rn, we say that x and y
are orthogonal when x · y = 0.
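Numerically these quantities are straightforward to compute; a minimal Python sketch (the function names are ours):

    import math

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def length(x):
        return math.sqrt(dot(x, x))

    def angle(x, y):
        # theta(x, y) = arccos( (x . y) / (|x| |y|) ), in [0, pi];
        # clamp the cosine to guard against rounding slightly outside [-1, 1]
        c = dot(x, y) / (length(x) * length(y))
        return math.acos(max(-1.0, min(1.0, c)))

    x, y = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
    print(dot(x, y), length(y), angle(x, y))   # 1.0, sqrt(2) = 1.414..., pi/4 = 0.785...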
5.8 Theorem: (Properties of Angle) Let 0 ≠ x, y ∈ Rn. Then
(1) θ(x, y) ∈ [0, π] with θ(x, y) = 0 if and only if y = tx for some t > 0, and θ(x, y) = π
if and only if y = tx for some t < 0,
(2) (Symmetry) θ(x, y) = θ(y, x),
(3) (Scaling) θ(tx, y) = θ(x, ty) = θ(x, y) if 0 < t ∈ R, and θ(tx, y) = θ(x, ty) = π − θ(x, y)
if 0 > t ∈ R,
(4) (The Law of Cosines) |y − x|^2 = |x|^2 + |y|^2 − 2|x| |y| cos θ(x, y),
(5) (Pythagoras' Theorem) θ(x, y) = π/2 if and only if |y − x|^2 = |x|^2 + |y|^2, and
(6) (Trigonometric Ratios) if (y − x) · x = 0 then cos θ(x, y) = |x| / |y| and sin θ(x, y) = |y − x| / |y|.
Proof: The Law of Cosines follows from the identity |y − x|^2 = |y|^2 − 2(y · x) + |x|^2 and
the definition of θ(x, y). Pythagoras' Theorem is a special case of the Law of Cosines. We
prove Part (6). Let 0 ≠ x, y ∈ Rn and write θ = θ(x, y). Suppose that (y − x) · x = 0.
Then we have y · x − x · x = 0 so that x · y = |x|^2, and so we have
cos θ = (x · y) / (|x| |y|) = |x|^2 / (|x| |y|) = |x| / |y| .
Also, by Pythagoras' Theorem we have |x|^2 + |y − x|^2 = |y|^2 so that |y|^2 − |x|^2 = |y − x|^2,
and so
sin^2 θ = 1 − cos^2 θ = 1 − |x|^2 / |y|^2 = (|y|^2 − |x|^2) / |y|^2 = |y − x|^2 / |y|^2 .
Since θ ∈ [0, π] we have sin θ ≥ 0, and so taking the square root on both sides gives
sin θ = |y − x| / |y| .
5.9 Definition: For points a, b, c ∈ Rn with a ≠ b and b ≠ c we define
∠abc = θ(b − a, c − b).
Orthogonal Complement and Orthogonal Projection in Rn
5.10 Definition: Let F be a field and let U , V and W be subspaces of F n . Recall that
U + V = { u + v | u ∈ U, v ∈ V }
is a subspace of F n . We say that W is the internal direct sum of U with V , and we write
W = U ⊕ V , when W = U + V and U ∩ V = {0}. As an exercise, show that W = U ⊕ V
if and only if for every x ∈ W there exist unique vectors u ∈ U and v ∈ V with x = u + v.
5.11 Definition: Let U ⊆ Rn be a subspace. We define the orthogonal complement
of U in Rn to be
U⊥ = { x ∈ Rn | x · u = 0 for all u ∈ U } .
5.12 Theorem: (Properties of the Orthogonal Complement) Let U ⊆ Rn be a subspace,
let S ⊆ U and let A ∈ Mk×n(R). Then
(1) if U = Span(S) then U⊥ = { x ∈ Rn | x · u = 0 for all u ∈ S },
(2) (RowA)⊥ = NullA,
(3) U⊥ is a vector space,
(4) dim(U) + dim(U⊥) = n,
(5) U ⊕ U⊥ = Rn,
(6) (U⊥)⊥ = U, and
(7) (NullA)⊥ = RowA.
Proof: To prove part (1), let T = { x ∈ Rn | x · u = 0 for all u ∈ S }. Note that U⊥ ⊆ T.
Let x ∈ T. Let u ∈ U = Span(S), say u = Σ_{i=1}^{n} ti ui with each ti ∈ R and each ui ∈ S.
Then x · u = x · Σ_{i=1}^{n} ti ui = Σ_{i=1}^{n} ti (x · ui) = 0. Thus x ∈ U⊥ and so we have T ⊆ U⊥.
To prove part (2), let v1, v2, · · · , vk be the rows of A. Note that Ax is the column vector
with entries (Ax)i = x · vi, so we have x ∈ NullA ⇐⇒ x · vi = 0 for all i ⇐⇒
x ∈ Span{v1, v2, · · · , vk}⊥ = (RowA)⊥ by part (1).
Part (3) follows from Part (2) since we can choose the matrix A so that U = Row(A)
and then we have U⊥ = Null(A), which is a vector space in Rn.
Part (4) also follows from part (2) since if we choose A so that RowA = U then we
have dim(U) + dim(U⊥) = dim RowA + dim(RowA)⊥ = dim RowA + dim NullA = n.
To prove part (5), in light of part (4), it suffices to show that U ∩ U⊥ = {0}. Let
x ∈ U ∩ U⊥. Since x ∈ U⊥ we have x · u = 0 for all u ∈ U. In particular, since x ∈ U we
have x · x = 0, and hence x = 0. Thus U ∩ U⊥ = {0} and so U ⊕ U⊥ = Rn.
To prove part (6), let x ∈ U. By the definition of U⊥ we have x · v = 0 for all v ∈ U⊥.
By the definition of (U⊥)⊥ we see that x ∈ (U⊥)⊥. Thus U ⊆ (U⊥)⊥. By part (4) we
know that dim U + dim U⊥ = n and also that dim U⊥ + dim(U⊥)⊥ = n. It follows that
dim U = n − dim U⊥ = dim(U⊥)⊥. Since U ⊆ (U⊥)⊥ and dim U = dim(U⊥)⊥ we have
U = (U⊥)⊥, as required.
By parts (2) and (6) we have (NullA)⊥ = ((RowA)⊥)⊥ = RowA, proving part (7).
5.13 Definition: For a subspace U ⊆ Rn and a vector x ∈ Rn , we define the orthogonal
projection of x onto U , denoted by ProjU (x), as follows. Since Rn = U ⊕ U ⊥ , we can
choose unique vectors u, v ∈ Rn with u ∈ U , v ∈ U ⊥ and x = u + v. We then define
ProjU (x) = u.
Note that since U = (U ⊥ )⊥ , for u and v as above we have ProjU ⊥ (x) = v. When y ∈ Rn
and U = Span {y}, we also write Projy (x) = ProjU (x) and Projy⊥ (x) = ProjU ⊥ (x).
5.14 Theorem: Let U ⊆ Rn be a subspace and let x ∈ Rn . Then ProjU (x) is the unique
point in U which is nearest to x.
Proof: Let u, v ∈ Rn with u ∈ U, v ∈ U⊥ and u + v = x, so that ProjU(x) = u. Let
w ∈ U with w ≠ u. Since v ∈ U⊥ and u, w ∈ U we have v · u = v · w = 0 and so
v · (w − u) = v · w − v · u = 0. Thus we have
|x − w|^2 = |u + v − w|^2 = |v − (w − u)|^2 = (v − (w − u)) · (v − (w − u))
= |v|^2 − 2 v · (w − u) + |w − u|^2 = |v|^2 + |w − u|^2 = |x − u|^2 + |w − u|^2 .
Since w ≠ u we have |w − u| > 0 and so |x − w|^2 > |x − u|^2. Thus |x − w| > |x − u|, that
is dist(x, w) > dist(x, u), so u is the vector in U nearest to x, as required.
5.15 Theorem: For any matrix A ∈ Mn×l (R) we have Null(ATA) = Null(A) and
Col(AT A) = Col(AT ) so that nullity(AT A) = nullity(A) and rank(AT A) = rank(A).
Proof: If x ∈ Null(A) then Ax = 0 so ATAx = 0 hence x ∈ Null(ATA). This shows
that Null(A) ⊆ Null(ATA). If x ∈ Null(ATA) then we have ATAx = 0 which implies that
|Ax|2 = (Ax)T (Ax) = xTATAx = 0 and so Ax = 0. This shows that Null(AT A) ⊆ Null(A).
Thus we have Null(ATA) = Null(A). It then follows that
Col(AT) = Row(A) = Null(A)⊥ = Null(ATA)⊥ = Row(ATA) = Col((ATA)T) = Col(ATA).
5.16 Theorem: Let A ∈ Mn×l (R), let U = Col(A) and let x ∈ Rn . Then
(1) the matrix equation ATA t = AT x has a solution t ∈ Rl , and for any solution t we have
ProjU (x) = At,
(2) if rank(A) = l then AT A is invertible and
ProjU (x) = A(ATA)−1 AT x.
Proof: Note that U ⊥ = (ColA)⊥ = Row(AT )⊥ = Null(AT ). Let u, v ∈ Rn with u ∈ U ,
v ∈ U ⊥ and u + v = x so that ProjU (x) = u. Since u ∈ U = ColA we can choose t ∈ Rl so
that u = At. Then we have x = u + v = At + v. Multiply by AT to get AT x = ATA t + AT v.
Since v ∈ U⊥ = Null(AT) we have AT v = 0, so ATA t = AT x. Thus the matrix equation
ATA t = AT x does have a solution t ∈ Rl.
Now let t ∈ Rl be any solution to ATA t = AT x. Let u = At and v = x − u. Note that
x = u + v, u = At ∈ Col(A) = U , and AT v = AT (x − u) = AT (x − At) = AT x − ATA t = 0
so that v ∈ Null(AT ) = U ⊥ . Thus ProjU (x) = u = At, proving part (1).
Now suppose that rank(A) = l. Since ATA ∈ Ml×l (R) with rank(ATA) = rank(A) = l,
the matrix ATA is invertible. Since ATA is invertible, the unique solution t ∈ Rl to the
matrix equation ATA t = AT x is the vector t = (ATA)−1 AT x, and so from Part (1) we
have ProjU (x) = At = A(ATA)−1 AT x, proving Part (2).
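Part (1) translates directly into a computation: solve the normal equations ATA t = AT x and return At. A hedged NumPy sketch follows (the use of numpy, and of lstsq, which returns a solution even when ATA is not invertible, is our own choice of tools, not anything from the notes).

    import numpy as np

    def proj_onto_col(A, x):
        # Solve the normal equations A^T A t = A^T x (Theorem 5.16 (1));
        # lstsq returns a solution t even when A^T A is singular.
        AtA = A.T @ A
        Atx = A.T @ x
        t, *_ = np.linalg.lstsq(AtA, Atx, rcond=None)
        return A @ t

    A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # Col(A) = xy-plane in R^3
    x = np.array([2.0, 3.0, 4.0])
    print(proj_onto_col(A, x))                            # [2. 3. 0.]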
The Volume of a Parallelotope
5.17 Definition: Given vectors u1, u2, · · · , uk ∈ Rn, we define the parallelotope on
u1, · · · , uk to be the set
P(u1, · · · , uk) = { Σ_{i=1}^{k} ti ui | 0 ≤ ti ≤ 1 for all i } .
We define the volume of this parallelotope, denoted by V(u1, · · · , uk), recursively by
V(u1) = |u1| and
V(u1, · · · , uk) = V(u1, · · · , uk−1) |ProjU⊥(uk)|
where U = Span{u1, · · · , uk−1}.
5.18 Theorem: Let u1, · · · , uk ∈ Rn and let A = (u1, · · · , uk) ∈ Mn×k(R). Then
V(u1, · · · , uk) = √( det(ATA) ).
Proof: We prove the theorem by induction on k. Note that when k = 1, u1 ∈ Rn and
A = u1 ∈ Mn×1(R), we have V(u1) = |u1| = √(u1 · u1) = √(u1T u1) = √(ATA), as required.
Let k ≥ 2 and suppose, inductively, that when A = (u1, · · · , uk−1) ∈ Mn×(k−1)(R) we have
det(ATA) ≥ 0 and V(u1, · · · , uk−1) = √(det(ATA)). Let B = (u1, · · · , uk) = (A, uk). Let
U = Span{u1, · · · , uk−1} = Col(A). Let v = ProjU(uk) and w = ProjU⊥(uk). Note
that v ∈ U = Col(A) and w ∈ U⊥ = Null(AT). Then we have uk = v + w so that
B = (A, v + w). Since v ∈ Col(A), the matrix B can be obtained from the matrix (A, w)
by performing elementary column operations of the type Ck 7→ Ck + t Ci. Let E be the
product of the elementary matrices corresponding to these column operations, and note
that B = (A, v + w) = (A, w)E. Since the column operations Ck 7→ Ck + t Ci do not alter
the determinant, E is a product of elementary matrices of determinant 1, so we have
det(E) = 1. Since det(E) = 1 and w ∈ Null(AT) we have
det(BTB) = det( ET (A, w)T (A, w) E ) = det( (A, w)T (A, w) )
and, writing (A, w)T (A, w) in block form,
(A, w)T (A, w) = [ ATA  ATw ; wTA  wTw ] = [ ATA  0 ; 0  |w|^2 ] ,
so that det(BTB) = det(ATA) |w|^2.
By the induction hypothesis, we can take the square root on both sides to get
√(det(BTB)) = √(det(ATA)) |w| = V(u1, · · · , uk−1) |w| = V(u1, · · · , uk).
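The formula gives a one-line recipe for volumes. A small NumPy sketch (our own illustration, with the ui taken as the columns of a matrix):

    import numpy as np

    def parallelotope_volume(vectors):
        # V(u1, ..., uk) = sqrt(det(A^T A)) where A has the ui as its columns
        A = np.column_stack(vectors)
        return float(np.sqrt(np.linalg.det(A.T @ A)))

    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([1.0, 2.0, 0.0])
    print(parallelotope_volume([u1, u2]))    # 2.0, the area of the parallelogram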
The Cross Product in Rn
5.19 Definition: Let F be a field. For n ≥ 2 we define the cross product
X : ∏_{k=1}^{n−1} F n → F n
as follows. Given vectors u1, u2, · · · , un−1 ∈ F n, we define X(u1, u2, · · · , un−1) ∈ F n to be
the vector with entries
X(u1, u2, · · · , un−1)j = (−1)^{n+j} |A(j)|
where A(j) ∈ Mn−1(F) is the matrix obtained from A = (u1, u2, · · · , un−1) ∈ Mn×(n−1)(F)
by removing the j th row. Given a vector u ∈ F 2 we write X(u) as u×, and given two
vectors u, v ∈ F 3 we write X(u, v) as u × v.
5.20 Example: Given u ∈ F 2 we have
u× = (u1, u2)× = (−u2, u1).
Given u, v ∈ F 3, the entries of u × v are the signed 2 × 2 minors of the matrix (u, v), so
u × v = (u1, u2, u3) × (v1, v2, v3) = ( u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ).
5.21 Note: Because the determinant is n-linear, alternating and skew-symmetric, it
follows that the cross product is (n − 1)-linear, alternating and skew-symmetric. Thus for
ui , v, w ∈ F n and t ∈ F we have
(1) X(u1 , · · · , v + w, · · · , un−1 ) = X(u1 , · · · , v, · · · , un−1 ) + X(u1 , · · · , w, · · · , un−1 ),
(2) X(u1 , · · · , t uk , · · · , un−1 ) = t X(u1 , · · · , uk , · · · , un−1 ),
(3) X(u1 , · · · , uk , · · · , ul , · · · , un−1 ) = −X(u1 , · · · , ul , · · · , uk , · · · , un−1 ).
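The general cross product is easy to compute from Definition 5.19; a short NumPy sketch (our own helper, illustrative only):

    import numpy as np

    def cross_general(vectors):
        # X(u1, ..., u_{n-1})_j = (-1)^(n+j) |A^(j)|, where A has the ui as its
        # columns and A^(j) is A with its j-th row removed (1-indexed j).
        A = np.column_stack(vectors)
        n = A.shape[0]
        result = np.empty(n)
        for j in range(1, n + 1):
            Aj = np.delete(A, j - 1, axis=0)
            result[j - 1] = (-1) ** (n + j) * np.linalg.det(Aj)
        return result

    u = np.array([1.0, 0.0, 0.0])
    v = np.array([0.0, 1.0, 0.0])
    print(cross_general([u, v]))      # [0. 0. 1.], the usual u x v in R^3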
5.22 Definition: Recall that for u1 , · · · , un ∈ Rn , the set {u1 , · · · , un } is a basis for Rn
if and only if det(u1 , · · · , un ) 6= 0. For an ordered basis A = (u1 , · · · , un ), we say that
A is positively oriented when det(u1 , · · · , un ) > 0 and we say that A is negatively
oriented when det(u1 , · · · , un ) < 0.
5.23 Theorem: Let u1, · · · , un−1, v1, · · · , vn−1, w ∈ Rn. Then
(1) X(u1, · · · , un−1) · w = det(u1, · · · , un−1, w),
(2) X(u1, · · · , un−1) = 0 if and only if {u1, · · · , un−1} is linearly dependent,
(3) when w = X(u1, · · · , un−1) ≠ 0 we have det(u1, · · · , un−1, w) > 0, so that the n-tuple
(u1, · · · , un−1, w) is a positively oriented basis for Rn,
(4) X(u1, · · · , un−1) · X(v1, · · · , vn−1) = det(ATB) where A = (u1, · · · , un−1) ∈ Mn×(n−1)(R)
and B = (v1, · · · , vn−1) ∈ Mn×(n−1)(R), and
(5) |X(u1, · · · , un−1)| is equal to the volume of the parallelotope on u1, · · · , un−1.
Proof: I may include a proof later.
Chapter 6. Vector Spaces and Modules
6.1 Definition: Let R be a commutative ring. A module over R (or an R-module)
is a set U together with an element 0 ∈ U and two operations + : U × U → U and
∗ : R × U → U , where we write +(x, y) as x + y and ∗(t, x) as t ∗ x, t · x or as tx, such that
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
+ is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ U ,
+ is commutative: x + y = y + x for all x, y ∈ U ,
0 is an additive identity: x + 0 = x for all x ∈ U ,
every element has an additive inverse: for all x ∈ U there exists y ∈ U with x + y = 0,
∗ is associative: (st)x = s(tx) for all s, t ∈ R and all x ∈ U ,
1 is a multiplicative identity: 1 · x = x for all x ∈ U ,
∗ is distributive over + in R: (s + t)x = sx + tx for all s, t ∈ R and all x ∈ U , and
∗ is distributive over + in U : t(x + y) = tx + ty for all t ∈ R and all x, y ∈ U .
When F is a field, a module over F is also called a vector space over F .
6.2 Note: In an R-module U, the zero element is unique, the additive inverse of a given
element x ∈ U is unique and we denote it by −x, and we have additive cancellation.
6.3 Definition: Let R be a commutative ring and let W be an R-module. A submodule
of W over R is a subset U ⊆ W which is also an R-module using the (restrictions of) the
same operations used in W . Note that for a subset U ⊆ W , the operations on W restrict
to well-defined operations on U if and only if
(1) U is closed under +: for all x, y ∈ U we have x + y ∈ U , and
(2) U is closed under ∗: for all t ∈ R and all x ∈ U we have tx ∈ U .
When the operations do restrict as above, U is a submodule of W if and only if
(3) U contains the zero element: 0 ∈ U , and
(4) U is closed under negation: for all x ∈ U we have −x ∈ U .
When F is a field and W is a vector space over F , a submodule of W is also called a
subspace of W over F .
6.4 Example: Let R be a ring. Then Rn, Rω and R∞ are all R-modules, where
Rn = { f : {1, 2, · · · , n} → R } = { (a1, a2, · · · , an) | each ak ∈ R },
Rω = { f : Z+ → R } = { (a1, a2, a3, · · ·) | each ak ∈ R }, and
R∞ = { f : Z+ → R | f(k) = 0 for all but finitely many k ∈ Z+ }
= { (a1, a2, a3, · · ·) | each ak ∈ R with ak = 0 for all but finitely many k ∈ Z+ }.
6.5 Example: When R is a ring, Mm×n(R) is an R-module.
6.6 Example: For two sets A and B, we denote the set of all functions f : A → B by B A
or by Func(A, B). When A is a set and R is a ring, we define operations on Func(A, R)
by (tf)(x) = t f(x), (f + g)(x) = f(x) + g(x) and (fg)(x) = f(x)g(x) for all x ∈ A. The
set Func(A, R) is a ring under addition and multiplication and also an R-module under
addition and multiplication by t ∈ R.
6.7 Example: Let R be a ring. Recall that a polynomial, with coefficients in R, is an
expression of the form f(x) = Σ_{k=0}^{n} ak x^k = a0 + a1 x + a2 x^2 + · · · + an x^n where n ∈ N and each
ak ∈ R. We denote the set of all such polynomials by R[x] or by P(R). We consider two
polynomials to be equal when their coefficients are all equal, so for f(x) = Σ_{k=0}^{n} ak x^k and
g(x) = Σ_{k=0}^{n} bk x^k we have f = g when ak = bk for all k. In particular, we have f = 0 when
ak = 0 for all indices k. When 0 ≠ f ∈ R[x], the degree of f, denoted by deg(f), is the
largest n ∈ N for which an ≠ 0. The degree of the zero polynomial is −∞. For n ∈ N, we
denote the set of all polynomials of degree at most n by Pn(R). A (formal) power series,
with coefficients in R, is an expression of the form f(x) = Σ_{k=0}^{∞} ck x^k = c0 + c1 x + c2 x^2 + · · ·.
We denote the set of all such power series by R[[x]]. Thus we have
Pn(R) = { f(x) = Σ_{k=0}^{n} ck x^k | each ck ∈ R },
R[x] = P(R) = ∪_{n∈N} Pn(R) = { f(x) = Σ_{k=0}^{n} ck x^k | n ∈ N and each ck ∈ R }, and
R[[x]] = { f(x) = Σ_{k=0}^{∞} ck x^k | each ck ∈ R }.
We define operations on these sets as follows. Given f(x) = Σ_{k≥0} ak x^k and g(x) = Σ_{k≥0} bk x^k
(where the sums may be finite or infinite) we define tf, f + g and fg by
(tf)(x) = Σ_{k≥0} t ak x^k , (f + g)(x) = Σ_{k≥0} (ak + bk) x^k , and
(fg)(x) = Σ_{i,j≥0} (ai bj) x^{i+j} = Σ_{k≥0} ck x^k with ck = Σ_{i=0}^{k} ai bk−i .
The sets R[x] = P(R) and R[[x]] are rings under addition and multiplication, and Pn(R),
R[x] and R[[x]] are all R-modules under addition and multiplication by elements t ∈ R.
6.8 Definition: Let R be a commutative ring. An algebra over R (or an R-algebra)
is a set U with an element 0 ∈ U together with three operations + : U × U → U ,
∗ : U × U → U and ∗ : R × U → U such that U is an abelian group under +, such that the
two multiplication operations satisfy (xy)z = x(yz), (x + y)z = xz + yz, x(y + z) = xy + xz
and t(xy) = (tx)y for all t ∈ R and all x, y, z ∈ U .
6.9 Example: When R is a commutative ring and A is a set, Rn , R∞ , Rω , Mn (R),
Func(A, R), Pn (R), R[x], R[[x]] and RA are all R-algebras using their usual operations.
6.10 Theorem: Let R be a ring, and let W be an R-module. Let A be a set, and for
each α ∈ A let Uα be a submodule of W over R. Then ∩_{α∈A} Uα is a submodule of W.
Proof: I may include a proof later.
6.11 Definition: Let R be a ring, let W be an R-module, and let S ⊆ W. A linear
combination of the set S (or of the elements in S) (over R) is an element w ∈ W of the
form w = Σ_{i=1}^{n} ti ui for some n ∈ N, ti ∈ R and ui ∈ S (we allow the case n = 0 to include
the empty sum, which we take to be equal to 0). The span of S (over R), denoted by
Span(S) or by SpanR(S), is the set of all such linear combinations, that is
Span(S) = { Σ_{i=1}^{n} ti ui | n ∈ N, ti ∈ R, ui ∈ S } .
When U = Span(S), we also say that S spans U or that U is spanned by S.
The submodule of W generated by S, denoted by ⟨S⟩, is the smallest submodule of
W containing S, that is
⟨S⟩ = ∩ { U ⊆ W | U is a submodule of W with S ⊆ U } .
6.12 Theorem: Let R be a ring, let W be an R-module, and let S ⊆ W . Then
⟨S⟩ = Span(S).
Proof: I may include a proof later.
6.13 Definition: Let R be a ring, let U be an R-module, and let S ⊆ U. We say that S
is linearly independent (over R) when for all n ∈ Z+, for all ti ∈ R, and for all distinct
elements ui ∈ S, if Σ_{i=1}^{n} ti ui = 0 then ti = 0 for all indices i. Otherwise we say that S is
linearly dependent. We say that S is a basis for U when S spans U and S is linearly
independent.
6.14 Note: A set S is a basis for an R-module U if and only if every element x ∈ U can
be written uniquely (up to the order of the terms in the sum) in the form x = Σ_{i=1}^{n} ti ui
where n ∈ N, 0 ≠ ti ∈ R and u1, u2, · · · , un are distinct elements in S.
6.15 Example: For ek ∈ Rn given by (ek )i = δk,i , the set S = {e1 , e2 , · · · , en } is a basis
for Rn , and we call it the standard basis for Rn . For ek ∈ R∞ given by (ek )i = δk,i , the
set S = {e1 , e2 , e3 , · · ·} is a basis for R∞ , which we call the standard basis for R∞ . It is
not immediately obvious whether the R-module Rω has a basis.
6.16 Example: For each k ∈ {1, 2, · · · , m} and l ∈ {1, 2, · · · , n}, let Ek,l ∈ Mm×n(R) denote
the matrix with (Ek,l)i,j = δk,i δl,j (so the (k, l) entry is equal to 1 and all other entries are
equal to 0). Then the set S = { Ek,l | k ∈ {1, 2, · · · , m}, l ∈ {1, 2, · · · , n} } is a basis for
Mm×n(R), which we call the standard basis for Mm×n(R).
6.17 Example: For a ring R, the set S = Sn = {1, x, x^2, · · · , x^n} is the standard basis
for Pn(R), and the set S = {1, x, x^2, x^3, · · ·} is the standard basis for P(R) = R[x]. It
is not immediately obvious whether the R-module R[[x]] has a basis.
Ordered Bases and the Coordinate Map
6.18 Definition: Let R be a ring and let W be an R-module. Let A = (u1, u2, · · · , un) be
an ordered n-tuple of elements in W. A linear combination of A (over R) is an element
x ∈ W of the form x = Σ_{i=1}^{n} ti ui with each ti ∈ R. The span of A (over R) is the set
Span(A) = { Σ_{i=1}^{n} ti ui | t ∈ Rn } .
When U = Span(A) we say that A spans U or that U is spanned by A. We say that A
is linearly independent (over R) when for all t ∈ Rn, if Σ_{i=1}^{n} ti ui = 0 then t = 0. We say
that A is an ordered basis for U when A is linearly independent and spans U.
6.19 Note: Let R be a ring, let W be an R-module, and let u1, u2, · · · , un ∈ W. Then the
span of the n-tuple (u1, u2, · · · , un) is equal to the span of the set {u1, u2, · · · , un}, and
the n-tuple (u1, u2, · · · , un) is linearly independent if and only if the elements u1, u2, · · · , un
are distinct and the set {u1, u2, · · · , un} is linearly independent. For a submodule U ⊆ W,
the n-tuple (u1, · · · , un) is a basis for U if and only if the elements ui are distinct and the
set {u1, · · · , un} is a basis for U.
6.20 Definition: Let R be a ring, let U be an R-module, and let A = (u1, u2, · · · , un)
be an ordered basis for U. Given an element x ∈ U, since A spans U we can write x as
a linear combination x = Σ_{i=1}^{n} ti ui with t ∈ Rn, and since A is linearly independent the
element t ∈ Rn is unique. We denote the unique element t ∈ Rn such that x = Σ_{i=1}^{n} ti ui by
[x]A. Thus for x ∈ U and t ∈ Rn we have
t = [x]A ⇐⇒ x = Σ_{i=1}^{n} ti ui .
The elements t1, t2, · · · , tn ∈ R are called the coordinates of x with respect to the ordered
basis A. In the case that R is a field, the vector t = [x]A is called the coordinate vector
of x with respect to A. The bijective map φA : U → Rn given by
φA(x) = [x]A
is called the coordinate map from U to Rn induced by the ordered basis A.
6.21 Theorem: Let R be a ring, let U be a free R-module, and let A = (u1, · · · , un) be a
finite ordered basis for U. Then the coordinate map φA : U → Rn is bijective and linear,
that is,
φA(x + y) = φA(x) + φA(y) and φA(tx) = t φA(x)
for all x, y ∈ U and all t ∈ R.
Proof: I may include a proof later.
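When U is a subspace of some Rm and the basis vectors are given numerically, the coordinate map can be computed by solving the linear system t1 u1 + · · · + tn un = x. A hedged NumPy sketch (our own helper, assuming the ui are linearly independent):

    import numpy as np

    def coordinates(basis_vectors, x):
        # [x]_A : solve  t1*u1 + ... + tn*un = x  for t
        A = np.column_stack(basis_vectors)
        t, *_ = np.linalg.lstsq(A, x, rcond=None)
        return t

    u1 = np.array([1.0, 1.0])
    u2 = np.array([1.0, -1.0])
    print(coordinates([u1, u2], np.array([3.0, 1.0])))   # [2. 1.], since 2 u1 + 1 u2 = (3, 1)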
The Dimension of Finite Dimensional Vector Spaces
6.22 Note: Let R be a ring and let U be a free R-module. If A and B are bases for U
with A ⊆ B, then we must have A = B, because if we had A ⊊ B then we could choose
v ∈ B \ A and write v as a linear combination v = Σ_{i=1}^{n} ti ui with each ui ∈ A, but then we
would have 0 = 1 · v − Σ_{i=1}^{n} ti ui, which is a linear combination of elements in B with
coefficients not all equal to zero.
6.23 Theorem: Let R be a ring and let U be a free R-module. Let A and B be two bases
for U over R. Then A and B are either both finite or both infinite.
Proof: If U = {0} then A = B = ∅. Suppose that U ≠ {0}. Suppose that one of the
two bases is finite, say A = {u1, u2, · · · , un}. For each index i, write ui = Σ_{j=1}^{mi} si,j vi,j with
si,j ∈ R and vi,j ∈ B. Let C = { vi,j | 1 ≤ i ≤ n, 1 ≤ j ≤ mi }. Note that C is finite, and C is
linearly independent because C ⊆ B and B is linearly independent, and C spans U because
given x ∈ U we can write x = Σ_{i=1}^{n} ti ui and then we have x = Σ_{i=1}^{n} Σ_{j=1}^{mi} (ti si,j) vi,j ∈ Span(C).
Since B and C are both bases for U and C ⊆ B we have C = B, so B is finite.
6.24 Theorem: Let F be a field and let U be a vector space over F. Let A and B be
finite bases for U over F. Then |A| = |B|.
Proof: If U = {0} then A = B = ∅. Suppose that U ≠ {0}. Let n = |A| and say
A = {u1, u2, · · · , un}. Replace the set A by the ordered n-tuple A = (u1, u2, · · · , un). Let
φ = φA : U → F n and consider the set φ(B) = { φ(v) | v ∈ B } ⊆ F n. Note that φ(B)
spans F n because given t ∈ F n we can let x = Σ_{i=1}^{n} ti ui so that [x]A = t, then we can
write x = Σ_{i=1}^{m} si vi with si ∈ F and vi ∈ B, and then we have t = φ(x) = φ(Σ_{i=1}^{m} si vi) =
Σ_{i=1}^{m} si φ(vi) ∈ Span(φ(B)). Also, we claim that φ(B) is linearly independent in F n. Suppose
that Σ_{i=1}^{m} si yi = 0 where the yi are distinct elements in φ(B) and each si ∈ F. Choose elements
xi ∈ B so that φ(xi) = yi, and note that the elements xi will be distinct because φ is
bijective. Then we have 0 = Σ_{i=1}^{m} si yi = Σ_{i=1}^{m} si φ(xi) = φ(Σ_{i=1}^{m} si xi). Since φ is injective it
follows that Σ_{i=1}^{m} si xi = 0. Since the elements xi are distinct elements in B and B is linearly
independent, it follows that every si = 0. Thus φ(B) is linearly independent, as claimed.
Since φ(B) spans F n it follows that |φ(B)| ≥ n, and since φ(B) is linearly independent it
follows that |φ(B)| ≤ n, and so we have |φ(B)| = n = |A|. Since φ is bijective we have
|B| = |φ(B)| = |A|.
6.25 Definition: Let U be a vector space over a field F . When U has a finite basis,
we say that U is finite dimensional (over F ) and we define the dimension of U to be
dim(U ) = |A| where A is any basis for U .
The Existence of a Basis for a Vector Space
6.26 Definition: Let S be a nonempty set of sets. A chain in S is a nonempty subset
T ⊆ S with the property that for all A, B ∈ T , either A ⊆ B or B ⊆ A. For a subset
T ⊆ S, an upper bound for T in S is an element B ∈ S such that A ⊆ B for all A ∈ T .
A maximal element in S is an element B ∈ S such that there is no A ∈ S with B ⊆ A.
6.27 Theorem: (Zorn’s Lemma) Let S be a nonempty set. Suppose that every chain in
S has an upper bound in S. Then S has a maximal element.
Proof: We take Zorn’s Lemma to be an axiom, which means that we accept it as true
without proof.
6.28 Note: Let F be a field and let U be a vector space over F . Let A be a linearly
independent subset of U and let v ∈ U. Then A ∪ {v} is linearly independent if and only
if v ∉ Span(A). We leave the proof of this result as an exercise. Note that the analogous
result does not hold when U is a module over a ring R.
6.29 Theorem: Every vector space has a basis. Indeed, every linearly independent set
in a vector space is contained in a basis for the vector space.
Proof: Let F be a field and let U be a vector space over F. Let A be a subset of U which is
linearly independent over F. Let S be the collection of all linearly independent sets B ⊆ U
with A ⊆ B. Note that S ≠ ∅ because A ∈ S. Let T be a chain in S. We claim that
∪T ∈ S. Since T ≠ ∅ we can choose an element B0 ∈ T and then we have A ⊆ B0 ⊆ ∪T.
Since for every B ∈ T we have B ⊆ U it follows that ∪T ⊆ U. It remains to show that
∪T is linearly independent. Suppose that Σ_{i=1}^{n} ti ui = 0 where the ui are distinct elements
in ∪T and ti ∈ F. For each index i, since ui ∈ ∪T we can choose Bi ∈ T with ui ∈ Bi.
Since T is a chain, for all indices i and j, either Bi ⊆ Bj or Bj ⊆ Bi. It follows that we can
choose an index k so that Bi ⊆ Bk for all indices i. Then we have ui ∈ Bk for all i. Since
the ui are distinct elements in Bk with Σ_{i=1}^{n} ti ui = 0 and since Bk is linearly independent,
it follows that ti = 0 for every i. This shows that ∪T is linearly independent, and so
∪T ∈ S, as claimed. Since ∪T ∈ S it follows that T has an upper bound in S, since for
every B ∈ T we have B ⊆ ∪T. By Zorn's Lemma, it follows that S has a maximal element.
Let B be a maximal element in S. We claim that B is a basis for U. Since B ∈ S we know
that A ⊆ B ⊆ U and that B is linearly independent. Note also that B spans U because if
we had Span(B) ≠ U then we could choose w ∈ U with w ∉ Span(B), and then B ∪ {w}
would be linearly independent by the above Note, but then B would not be maximal in S.
6.30 Example: When F is a field and A is any set, the vector spaces F ω , F [[x]] and
Func(A, F ) all have bases. It is not easy to construct an explicit basis for any of these
vector spaces.
6.31 Example: There exists a basis for R as a vector space over Q, but it is not easy to
construct an explicit basis.
Some Cardinal Arithmetic
6.32 Definition: Let S be a set of nonempty sets. A choice function on S is a function
f : S → ∪S such that f(A) ∈ A for every A ∈ S.
6.33 Theorem: (The Axiom of Choice) Every set of nonempty sets has a choice function.
Proof: A proof can be found in Ehsaan Hossain’s Tutorial Lecture Notes.
6.34 Corollary: Let A be a set. For each α ∈ A, let Xα be a nonempty set. Then there
exists a function f : A → ∪_{α∈A} Xα with f(α) ∈ Xα for all α ∈ A.
Proof: Let S = { Xα | α ∈ A }. Note that ∪S = ∪_{α∈A} Xα. Let g : S → ∪S be a choice
function for S, so we have g(Xα) ∈ Xα for all α ∈ A. Define the map f : A → ∪_{α∈A} Xα by
f(α) = g(Xα) to obtain f(α) ∈ Xα for all α ∈ A, as required.
6.35 Theorem: Let A and B be nonempty sets and let f : A → B. Then
(1) f is injective if and only if f has a left inverse, and
(2) f is surjective if and only if f has a right inverse.
Proof: The proofs can be found in last term’s MATH 147 Lecture Notes. We remark that
the proof of Part (1) does not require the Axiom of Choice but the proof of Part (2) does.
6.36 Definition: For sets A and B, we say that A and B are equipotent (or that A
and B have the same cardinality), and we write |A| = |B| when there exists a bijection
f : A → B. We say that the cardinality of A is less than or equal to the cardinality of
B, and we write |A| ≤ |B|, when there exists an injective map f : A → B. Note that by
the above theorem, we have |A| ≤ |B| if and only if there exists a surjective map g : B → A.
6.37 Note: It follows immediately from the above definitions that for all sets A, B and
C we have
(1) |A| = |A|,
(2) if |A| = |B| then |B| = |A|, and
(3) if |A| = |B| and |B| = |C| then |A| = |C|, and also
(4) |A| ≤ |A|, and
(5) if |A| ≤ |B| and |B| ≤ |C| then |A| ≤ |C|.
Properties 1, 2 and 3 imply that equipotence is an equivalence relation on the class of all
sets. Properties 4 and 5 are two of the 4 properties which appear in the definition a total
order. The other 2 properties also hold, but they require proof. The third property is
known as the Cantor-Schroeder-Bernstein Theorem, and we state it below. After that, we
state and prove the fourth property.
6.38 Theorem: (The Cantor-Schroeder-Bernstein Theorem) Let A and B be sets. If
|A| ≤ |B| and |B| ≤ |A| then |A| = |B|.
Proof: A proof can be found in last term’s MATH 147 Lecture Notes (we remark that it
does not require the Axiom of Choice).
6.39 Theorem: Let A and B be sets. Then either |A| ≤ |B| or |B| ≤ |A|.
Proof: If A = ∅ we have |A| ≤ |B|. If B = ∅ we have |B| ≤ |A|. Suppose that A ≠ ∅
and B ≠ ∅. Let S be the set of all (graphs of) injective functions f : X → B with X ⊆ A.
Note that S ≠ ∅ since we can choose a ∈ A and b ∈ B and define f : {a} → B by f(a) = b.
Let T be a chain in S. Note that ∪T ∈ S, as in the proof of the Axiom of Choice found in
Ehsaan Hossain's notes, and so T has an upper bound in S. By Zorn's Lemma, it follows
that S has a maximal element. Let (the graph of) g : X → B be a maximal element
in S. Note that either X = A or g(X) = B, since if we had X ⊊ A and g(X) ≠ B then
we could choose an element a ∈ A \ X and an element b ∈ B \ g(X) and then extend g
to the injective map h : X ∪ {a} → B defined by h(x) = g(x) for x ∈ X and h(a) = b,
contradicting the maximality of g. In the case that X = A, since the map g : A → B is
injective we have |A| ≤ |B|. In the case that g(X) = B, the map g : X → B is surjective
so we have |B| ≤ |X| ≤ |A|.
6.40 Note: Recall that for sets X and Y , the set of all functions f : Y → X is denoted
by X Y . As an exercise, verify that given sets A1 , A2 , B1 and B2 with |A1 | = |A2 | and
|B1 | = |B2 |, we have
(1) |A1 × B1| = |A2 × B2|,
(2) |A1^{B1}| = |A2^{B2}|, and
(3) if A1 ∩ B1 = ∅ and A2 ∩ B2 = ∅ then |A1 ∪ B1| = |A2 ∪ B2|.
6.41 Definition: (Cardinal Arithmetic) For sets A, B and X, we write |X| = |A| |B| when
|X| = |A × B|, |X| = |A||B| when |X| = |AB |, and |X| = |A| + |B| when |X| = |A0 ∪ B 0 | for
disjoint sets A0 and B 0 with |A0 | = |A| and |B 0 | = |B| (for example the sets A0 = A × {1}
and B 0 = B × {2}).
6.42 Note: Let B be an infinite set and let n ∈ Z+ , and let Sn = {1, 2, · · · , n}. As an
exercise, show that there are injective maps f : B → B × Sn and g : B × Sn → B × B so
that |B| ≤ |B × Sn | ≤ |B × B|, then use the Cantor-Schroeder-Bernstein Theorem to show
that if |B| = |B × B| then we have |B × Sn | = |B|.
6.43 Theorem: Let A be an infinite set. Then |A × A| = |A|.
Proof: Let S be the set of all (graphs of) bijective functions f : X × X → X where X
is an infinite subset of A. Note that S 6= ∅ because, as proven in last term’s MATH 147
Lecture notes, we can choose a countable subset X ⊆ A and a bijection f : X × X → X.
Let T be a chain in S. Note that ∪T ∈ S as in the proof of the Axiom of Choice found
in Ehsaan Hossain’s notes, and so T has an upper bound in S. By Zorn’s Lemma, S has
a maximal element. Let (the graph of) g : B × B → B be a maximal element in S. Note
that since g : B × B → B is bijective we have |B × B| = |B|. By the previous theorem,
either |B| ≤ |A \ B| or |A \ B| ≤ |B|. We claim that |A \ B| ≤ |B|.
Suppose, for a contradiction, that |B| ≤ |A \ B|. Choose C ⊆ A \ B with |C| = |B|.
Note that the set (B ∪ C) × (B ∪ C) is the disjoint union
(B ∪ C) × (B ∪ C) = (B × B) ∪ (B × C) ∪ (C × B) ∪ (C × C)
and we have
|(B × C) ∪ (C × B) ∪ (C × C)| = |B × C| + |C × B| + |C × C|
= |B × B| + |B × B| + |B × B| = |B| + |B| + |B| = |B × {1, 2, 3}| = |B| = |C|
and so the maximal bijective map g : B × B → B can be extended to a bijective map
h : (B ∪ C) × (B ∪ C) → (B ∪ C) contradicting the maximality of g. Thus the case in
which |B| ≤ |A \ B| cannot arise, and so we must have |A \ B| ≤ |B|, as claimed.
Since |A \ B| ≤ |B| we have
|A| = |(A \ B) ∪ B| = |A \ B| + |B| ≤ |B| + |B| = |B × {1, 2}| = |B|
and hence
|A × A| = |B × B| = |B| = |A|.
6.44 Corollary: Let A and B be sets.
(1) If A is nonempty and B is infinite and |A| ≤ |B|, then |A| |B| = |B|.
(2) If B is infinite and |A| ≤ |B| then |A| + |B| = |B|.
Proof: The proof is left as an exercise.
6.45 Corollary: Let A be an infinite set. For each u ∈ A let Bu be a finite set. Then
| ∪_{u∈A} Bu | ≤ |A|.
Proof: For each u ∈ A, choose a surjective map fu : A → Bu. Define g : A × A → ∪_{u∈A} Bu
by g(u, v) = fu(v). Note that g is surjective, and so we have | ∪_{u∈A} Bu | ≤ |A × A| = |A|.
The Dimension of an Infinite Dimensional Vector Space
6.46 Theorem: Let R be a ring and let U be a free R-module. Then any two infinite
bases for U have the same cardinality.
Proof: Let A and B be infinite bases for U. For each u ∈ A, let c = c(u) : B → R be
the (unique) function, with c(u)v = 0 for all but finitely many elements v ∈ B, such that
u = Σ_{v∈B} c(u)v · v, and then let Bu be the set of all elements v ∈ B for which c(u)v ≠ 0.
Note that each set Bu is a nonempty finite subset of B. Let C = ∪_{u∈A} Bu. Note that C
spans U because given any x ∈ U we can write x = Σ_{i=1}^{n} ti ui with each ui ∈ A, and then
for each index i we can write ui = Σ_{j=1}^{mi} si,j vi,j with each vi,j ∈ Bui, and then we have
x = Σ_{i,j} (ti si,j) vi,j ∈ Span( ∪_{i=1}^{n} Bui ) ⊆ Span(C). Since B is linearly independent and C ⊆ B,
it follows that C is linearly independent. Since C is linearly independent and spans U, it is
a basis for U. Since C and B are bases for U with C ⊆ B it follows that C = B, because if we
had C ⊊ B then we could choose v ∈ B \ C, then write v as a linear combination v = Σ_{i=1}^{n} ti vi
with each vi ∈ C, but then we would have 0 = 1 · v − Σ_{i=1}^{n} ti vi, which is a linear combination
of elements in B with not all coefficients equal to 0. By the above theorem, we have
|B| = |C| = | ∪_{u∈A} Bu | ≤ |A|.
By interchanging the rôles of A and B in the above proof, we see that |A| ≤ |B|. Thus we
have |A| = |B| by the Cantor-Schroeder-Bernstein Theorem.
6.47 Definition: Let R be a ring and let U be a free R-module with an infinite basis. We
define the rank of U to be rank(U) = |A| where A is any basis for U. When F is a field
and U is a vector space over F which has an infinite basis, the rank of U is also called the
dimension of U.
Chapter 7. Module Homomorphisms and Linear Maps
7.1 Definition: Let R be a ring and let U and V be R-modules. An (R-module) homomorphism from U to V is a map L : U → V such that
L(x + y) = L(x) + L(y) and L(tx) = t L(x)
for all x, y ∈ U and all t ∈ R. A bijective homomorphism from U to V is called an
isomorphism from U to V , a homomorphism from U to U is called an endomorphism
of U , and an isomorphism from U to U is called an automorphism of U . We say that U
is isomorphic to V, and we write U ≅ V, when there exists an isomorphism L : U → V.
We use the following notation
Hom(U, V) = HomR(U, V) = { L : U → V | L is a homomorphism },
Iso(U, V) = IsoR(U, V) = { L : U → V | L is an isomorphism },
End(U) = EndR(U) = { L : U → U | L is an endomorphism },
Aut(U) = AutR(U) = { L : U → U | L is an automorphism }.
For L, M ∈ Hom(U, V ) and t ∈ R we define L + M and tM by
(L + M )(x) = L(x) + M (x) and (tL)(x) = t L(x).
Using these operations, if R is commutative then the set Hom(U, V ) is an R-module. For
L ∈ Hom(U, V), the image (or range) of L and the kernel (or null set) of L are the sets
Image(L) = Range(L) = L(U) = { L(x) | x ∈ U } and
Ker(L) = Null(L) = L−1(0) = { x ∈ U | L(x) = 0 }.
When F is a field and U and V are vector spaces over F , an F -module homomorphism
from U to V is also called a linear map from U to V .
7.2 Note: For an R-module homomorphism L : U → V and for x ∈ U we have L(0) = 0
and L(−x) = −L(x), and for ti ∈ R and xi ∈ U we have L( Σ_{i=1}^{n} ti xi ) = Σ_{i=1}^{n} ti L(xi).
7.3 Definition: When G and H are groups, a map L : G → H is called a group
homomorphism when L(xy) = L(x)L(y) for all x, y ∈ G. A group isomorphism is
a bijective group homomorphism. When R and S are rings, a map L : R → S is called
a ring homomorphism when L(x + y) = L(x) + L(y) and L(xy) = L(x)L(y) for all
x, y ∈ R. A ring isomorphism is a bijective ring homomorphism. When R is a ring
and U and V are R-algebras, a map L : U → V is called an R-algebra homomorphism
when L(x + y) = L(x) + L(y), L(xy) = L(x)L(y) and L(tx) = t L(x) for all x, y ∈ U and
all t ∈ R. An R-algebra isomorphism is a bijective R-algebra homomorphism.
7.4 Theorem: Let R be a ring and let U , V and W be R-modules.
(1) If L : U → V and M : V → W are homomorphisms then so is the composite M L : U → W.
(2) If L : U → V is an isomorphism, then so is the inverse L−1 : V → U .
Proof: Suppose that L : U → V and M : V → W are R-module homomorphisms. Then
for all x, y ∈ U and all t ∈ R we have
M (L(x + y)) = M (L(x) + L(y)) = M (L(x)) + M (L(y)) and
M (L(tx)) = M (t L(x)) = t M (L(x)).
Suppose that L : U → V is an isomorphism. Then given u, v ∈ V and t ∈ R, if we let
x = L−1 (u) and y = L−1 (v) then we have
L−1 (u + v) = L−1 L(x) + L(y) = L−1 L(x + y) = x + y = L−1 (u) + L−1 (v) and
L−1 (tu) = L−1 t L(x) = L−1 L(tx) = tx = t L−1 (u).
7.5 Corollary: Let R be a ring. Then isomorphism is an equivalence relation on the class
of all R-modules. This means that for all R-modules U, V and W we have
(1) U ≅ U,
(2) if U ≅ V then V ≅ U, and
(3) if U ≅ V and V ≅ W then U ≅ W.
7.6 Corollary: When R is a commutative ring and U is an R-module, End(U ) is a ring
under addition and composition, hence also an R-algebra, and Aut(U ) is a group under
composition.
7.7 Theorem: Let L : U → V be an R-module homomorphism.
(1) If U0 is a submodule of U then L(U0) is a submodule of V. In particular, the image of
L is a submodule of V.
(2) If V0 is a submodule of V then L−1(V0) is a submodule of U. In particular, the kernel
of L is a submodule of U.
Proof: To prove Part (1), let U0 be a submodule of U . Let u, v ∈ L(U0 ) and let t ∈ R.
Choose x, y ∈ U0 with L(x) = u and L(y) = v. Since x + y ∈ U0 and L(x + y) =
L(x) + L(y) = u + v, it follows that u + v ∈ L(U0 ). Since tx ∈ U0 and L(tx) = t L(x) = tu,
it follows that tu ∈ L(U0 ). Thus L(U0 ) is closed under the module operations and so it is
a submodule of V .
To prove Part (2), let V0 be a submodule of V . Let x, y ∈ L−1 (V0 ) and let t ∈ R. Let
u = L(x) ∈ V0 and v = L(y) ∈ V0 . Since L(x + y) = L(x) + L(y) = u + v ∈ V0 it follows
that x + y ∈ L−1(V0). Since L(tx) = t L(x) = tu ∈ V0 it follows that tx ∈ L−1(V0). Thus
L−1(V0) is closed under the module operations and so it is a submodule of U.
7.8 Theorem: Let L : U → V be an R-module homomorphism. Then
(1) L is surjective if and only if Range(L) = V , and
(2) L is injective if and only if Ker(L) = {0}.
Proof: Part (1) is simply a restatement of the definition of surjectivity and does not require
proof. To prove Part (2), we begin by remarking that since L(0) = 0 we have {0} ⊆ Ker(L).
Suppose L is injective. Then x ∈ Ker(L) =⇒ L(x) = 0 =⇒ L(x) = L(0) =⇒ x = 0 and
so Ker(L) = {0}. Suppose, conversely, that Ker(L) = {0}. Then L(x) = L(y) =⇒
L(x) − L(y) = 0 =⇒ L(x − y) = 0 =⇒ x − y ∈ Ker(L) = {0} =⇒ x − y = 0 =⇒ x = y and
so L is injective.
7.9 Example: The maps
L : Pn(R) → Rn+1 given by L(∑_{i=0}^{n} ai x^i) = (a0, a1, · · · , an),
L : R[x] → R∞ given by L(∑_{i=0}^{n} ai x^i) = (a0, a1, · · · , an, 0, 0, · · ·), and
L : R[[x]] → Rω given by L(∑_{i=0}^{∞} ai x^i) = (a0, a1, a2, · · ·)
are all R-algebra isomorphisms, so we have Pn(R) ≅ Rn+1, R[x] ≅ R∞ and R[[x]] ≅ Rω.
7.10 Example: The map L : Mm×n(R) → Rm·n given by
L [ a1,1   a1,2   · · ·   a1,n ]
  [  ...    ...           ...  ]  = (a1,1, a1,2, · · · , a1,n, a2,1, · · · , a2,n, · · · , am,1, · · · , am,n)
  [ am,1   am,2   · · ·   am,n ]
is an R-module isomorphism, so we have Mm×n(R) ≅ Rm·n.
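For illustration (a minimal sketch assuming numpy), over the field R this isomorphism is just row-by-row flattening of a matrix:

import numpy as np

# an element of M_{2x3}(R)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# L(A): list the entries row by row, giving a vector in R^6
v = A.reshape(-1)
print(v)                              # [1. 2. 3. 4. 5. 6.]

# the inverse map recovers the matrix, so L is bijective
assert (v.reshape(2, 3) == A).all()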
7.11 Example: Let A and B be sets with |A| = |B|, and let g : A → B be a bijection.
Then the map L : RA → RB given by L(f )(b) = f (g −1 (b)), that is by L(f ) = f g −1 , is an
R-module isomorphism, and so we have RA ≅ RB. In particular, if |A| = n then we have
RA ≅ R{1,2,···,n} = Rn, and if |A| = ℵ0 then we have RA ≅ R{1,2,3,···} = Rω.
7.12 Example: Let A = (u1 , u2 , · · · , un ) be a finite ordered basis for a free R-module U .
Then the map φA : U → Rn given by φA (x) = [x]A is an R-module isomorphism, so we
have U ≅ Rn.
If B = (v1, v2, · · · , vn) is a finite ordered basis for another free R-module V , then the
map φB−1 φA : U → V is an R-module isomorphism, so we have U ≅ V .
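For illustration (a minimal sketch assuming numpy, with a made-up basis of R3): computing the coordinate vector φA(x) = [x]A amounts to solving a linear system whose coefficient matrix has the basis vectors as its columns.

import numpy as np

# the columns of M form an ordered basis A = (u1, u2, u3) of R^3
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

x = np.array([2.0, 3.0, 1.0])
coords = np.linalg.solve(M, x)       # [x]_A, the A-coordinates of x
assert np.allclose(M @ coords, x)    # x = coords[0]*u1 + coords[1]*u2 + coords[2]*u3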
7.13 Example: Let R be a commutative ring. Let φ : Hom(Rn, Rm) → Mm×n(R) be the
map given by φ(L) = [L] = (L(e1), L(e2), · · · , L(en)) ∈ Mm×n(R). Recall that the inverse
of φ is the map ψ : Mm×n(R) → Hom(Rn, Rm) given by ψ(A) = LA where LA(x) = Ax.
Note that ψ preserves the R-module operations because
ψ(A + B) = LA+B = LA + LB = ψ(A) + ψ(B) and
ψ(tA) = LtA = t LA = t ψ(A).
Thus φ and ψ are R-module isomorphisms and we have Hom(Rn, Rm) ≅ Mm×n(R). In
the case that m = n, we also have
ψ(AB) = LAB = LA LB = ψ(A)ψ(B)
and so the maps φ and ψ are in fact R-algebra isomorphisms, so we have End(Rn) ≅ Mn(R)
as R-algebras. By restricting φ and ψ to the invertible elements, we also obtain a group
isomorphism Aut(Rn) ≅ GLn(R).
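A minimal sketch of φ and ψ in code (numpy assumed; the particular map L below is made up for illustration):

import numpy as np

# a linear map L : R^3 -> R^2, given as a function
def L(x):
    x1, x2, x3 = x
    return np.array([x1 + 2*x2, x3 - x1])

# phi(L) = [L]: its columns are L(e1), L(e2), L(e3)
e = np.eye(3)
A = np.column_stack([L(e[:, k]) for k in range(3)])

# psi(A) = L_A with L_A(x) = Ax, and L_A agrees with L
x = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ x, L(x))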
7.14 Theorem: Let R be a ring and let U and V be free R-modules. Then U ≅ V if and
only if there exists a basis A for U and a basis B for V with |A| = |B|.
Proof: Suppose that U ≅ V . Let A be a basis for U and let L : U → V be an isomorphism.
Let B = L(A) = { L(u) | u ∈ A }. Since L is bijective we have |A| = |B|. Note that B spans
V because given y ∈ V we can choose x ∈ U with L(x) = y, then write x = ∑_{i=1}^{n} ti ui with
ti ∈ R and ui ∈ A, and then we have
y = L(x) = L(∑_{i=1}^{n} ti ui) = ∑_{i=1}^{n} ti L(ui) ∈ Span(B).
It remains to show that B is linearly independent. Suppose that ∑_{i=1}^{n} ti vi = 0 where ti ∈ R
and the vi are distinct elements in B. For each index i, choose ui ∈ A with L(ui) = vi,
and note that the elements ui are distinct because L is bijective. We have
0 = ∑_{i=1}^{n} ti vi = ∑_{i=1}^{n} ti L(ui) = L(∑_{i=1}^{n} ti ui).
Because L is injective, it follows that ∑_{i=1}^{n} ti ui = 0 and then, because A is linearly independent, it follows that each ti = 0. Thus B is linearly independent, as required.
Conversely, suppose that A is a basis for U and B is a basis for V with |A| = |B|.
Let g : A → B be a bijection. Define a map L : U → V as follows. Given x ∈ U ,
write x = ∑_{i=1}^{n} ti ui where ti ∈ R and the ui are distinct elements in A, and then define
L(x) = ∑_{i=1}^{n} ti g(ui). Note that L is an R-module homomorphism because for r ∈ R and for
x = ∑_{i=1}^{n} si ui and y = ∑_{i=1}^{n} ti ui (where we are using the same elements ui in both sums with
some of the coefficients equal to zero), we have
L(rx) = L(∑_{i=1}^{n} (r si) ui) = ∑_{i=1}^{n} r si g(ui) = r ∑_{i=1}^{n} si g(ui) = r L(x), and
L(x + y) = L(∑_{i=1}^{n} (si + ti) ui) = ∑_{i=1}^{n} (si + ti) g(ui) = ∑_{i=1}^{n} si g(ui) + ∑_{i=1}^{n} ti g(ui) = L(x) + L(y).
Also note that L is bijective with inverse M : V → U given by M(∑_{i=1}^{n} ti vi) = ∑_{i=1}^{n} ti g−1(vi),
where ti ∈ R and the vi are distinct elements in B.
7.15 Corollary: Let F be a field and let U and V be vector spaces over F . Then
U ≅ V ⇐⇒ dim(U) = dim(V ).
7.16 Remark: When U and V are free modules over a commutative ring R, we have
U ≅ V ⇐⇒ rank(U) = rank(V ),
but we have not built up enough machinery to prove this result.
7.17 Theorem: Let R be a ring, let U be a free R-module, and let V be any R-module.
Let A be a basis for U and, for each u ∈ A, let vu ∈ V . Then there exists a unique R-module
homomorphism L : U → V with L(u) = vu for all u ∈ A.
Proof: Note that if L : U → V is an R-module homomorphism with L(u) = vu for all
u ∈ A, then for ti ∈ R and ui ∈ A we have
L(∑_{i=1}^{n} ti ui) = ∑_{i=1}^{n} ti L(ui) = ∑_{i=1}^{n} ti vui.
This shows that the map L is unique and must be given by the above formula.
To prove existence, we define L : U → V by L(∑_{i=1}^{n} ti ui) = ∑_{i=1}^{n} ti vui where ti ∈ R and
ui ∈ A, and we note that L is an R-module homomorphism because for x = ∑_{i=1}^{n} si ui and
y = ∑_{i=1}^{n} ti ui (using the same elements ui in both sums) and r ∈ R we have
L(rx) = L(∑_{i=1}^{n} r si ui) = ∑_{i=1}^{n} r si vui = r ∑_{i=1}^{n} si vui = r L(x), and
L(x + y) = L(∑_{i=1}^{n} (si + ti) ui) = ∑_{i=1}^{n} (si + ti) vui = ∑_{i=1}^{n} si vui + ∑_{i=1}^{n} ti vui = L(x) + L(y).
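A concrete sketch of this construction (numpy assumed, with made-up data): take U = R2 with basis vectors the columns of an invertible matrix M, prescribe targets in V = R3 as the columns of a matrix T, and send x to the combination of the targets whose coefficients are the A-coordinates of x.

import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 2.0]])         # columns: the basis vectors u1, u2 of U = R^2
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])         # columns: the prescribed targets v_{u1}, v_{u2} in V = R^3

def L(x):
    t = np.linalg.solve(M, x)      # write x = t1*u1 + t2*u2
    return T @ t                   # L(x) = t1*v_{u1} + t2*v_{u2}

# L is the unique homomorphism with L(u1) = v_{u1} and L(u2) = v_{u2}
assert np.allclose(L(M[:, 0]), T[:, 0])
assert np.allclose(L(M[:, 1]), T[:, 1])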
7.18 Corollary: Let R be a commutative ring, let U be a free R-module with basis A
and let V be an R-module. Then the map φ : Hom(U, V ) → V A , given by φ(L)(u) = L(u)
for all u ∈ A, is an R-module isomorphism, and so we have Hom(U, V ) ≅ V A.
Proof: The above theorem states that the map φ is bijective, and we note that φ is an
R-module homomorphism because for L, M ∈ Hom(U, V ) and t ∈ R we have
φ(L + M )(u) = (L + M )(u) = L(u) + M (u) = φ(L)(u) + φ(M )(u) , and
φ(tL)(u) = (tL)(u) = t L(u) = tφ(L)(u)
for all u ∈ A hence φ(L + M ) = φ(L) + φ(M ) and φ(tL) = tφ(L).
7.19 Example: For a module U over a commutative ring R, the dual module of U is
the R-module
U ∗ = Hom(U, R).
By the above corollary, when U is a free module with basis A, we have U∗ ≅ RA, and if
|A| = n then we have U∗ ≅ Rn ≅ U . When U is a vector space over a field F , U∗ is called
the dual vector space of U .
7.20 Definition: Let R be a commutative ring and let U and V be free R-modules with
finite bases. Let A and B be finite ordered bases for U and V respectively, with |A| = n
and |B| = m. For L ∈ Hom(U, V ), we define the matrix of L with respect to A and B to
be the matrix
[L]^A_B = φB L φA−1 ∈ Mm×n(R).
When L ∈ End(U ) we write [L]A for the matrix [L]^A_A ∈ Mn(R).
7.21 Theorem: Let R be a commutative ring. Let A and B be finite ordered bases for
free R-modules U and V , respectively, with |A| = n and |B| = m. Let L ∈ Hom(U, V ).
Then
(1) [L]^A_B is the matrix such that [L]^A_B [u]A = [L(u)]B for all u ∈ U , and
(2) if A = (u1, u2, · · · , un) and B = (v1, v2, · · · , vm) then
[L]^A_B = ([L(u1)]B, [L(u2)]B, · · · , [L(un)]B) ∈ Mm×n(R).
Proof: Part (1) holds because for u ∈ U and x = φA(u) = [u]A we have
[L]^A_B [u]A = φB L φA−1(φA(u)) = φB(L(u)) = [L(u)]B.
Part (2) follows from Part (1) because for each index k, the kth column of [L]^A_B is
[L]^A_B ek = [L]^A_B [uk]A = [L(uk)]B.
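A numerical sketch of Part (2) (numpy assumed, with made-up bases of R2): the columns of [L]^A_B are the B-coordinate vectors of the images of the A-basis vectors.

import numpy as np

S = np.array([[2.0, 1.0],          # L : R^2 -> R^2 in standard coordinates
              [0.0, 3.0]])
A_basis = np.array([[1.0, 1.0],    # columns: the ordered basis A = (u1, u2)
                    [0.0, 1.0]])
B_basis = np.array([[1.0, 0.0],    # columns: the ordered basis B = (v1, v2)
                    [1.0, 1.0]])

# column k of [L]^A_B is [L(u_k)]_B
L_AB = np.column_stack([np.linalg.solve(B_basis, S @ A_basis[:, k]) for k in range(2)])

# check Part (1): [L]^A_B [u]_A = [L(u)]_B for a sample vector u
u = np.array([3.0, 5.0])
assert np.allclose(L_AB @ np.linalg.solve(A_basis, u), np.linalg.solve(B_basis, S @ u))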
7.22 Theorem: Let R be a commutative ring. Let A, B and C be finite ordered bases for
free R-modules U , V and W , respectively.
(1) For L, M ∈ Hom(U, V ) and t ∈ R we have [L + M]^A_B = [L]^A_B + [M]^A_B and [t L]^A_B = t [L]^A_B.
(2) For L ∈ Hom(U, V ) and M ∈ Hom(V, W ) we have [M L]^A_C = [M]^B_C [L]^A_B.
Proof: We prove Part (2), leaving the (similar) proof of Part (1) as an exercise. Let
L ∈ Hom(U, V ) and let M ∈ Hom(V, W ). Say |A| = n, |B| = m and |C| = l. Let x ∈ Rn.
Choose u ∈ U with [u]A = φA(u) = x. Then
[M L]^A_C x = [M L]^A_C [u]A = [M(L(u))]C = [M]^B_C [L(u)]B = [M]^B_C [L]^A_B [u]A = [M]^B_C [L]^A_B x.
Since [M L]^A_C x = [M]^B_C [L]^A_B x for all x ∈ Rn it follows that [M L]^A_C = [M]^B_C [L]^A_B.
7.23 Corollary: Let R be a commutative ring. Let A and B be finite ordered bases
for free R-modules U and V , with |A| = n and |B| = m. Then the map φ^A_B : Hom(U, V ) → Mm×n(R)
given by φ^A_B(L) = [L]^A_B is an R-module isomorphism, and the map φA : End(U ) → Mn(R) given
by φA(L) = [L]A is an R-algebra isomorphism which restricts to a group isomorphism
φA : Aut(U ) → GLn(R).
7.24 Corollary: (Change of Basis) Let R be a commutative ring. Let U and V be free
R-modules. Let A and C be two ordered bases for U with |A| = |C| = n and let B and D
be two ordered bases for V with |B| = |D| = m. For L ∈ Hom(U, V ) and u ∈ U we have
[u]C = [IU]^A_C [u]A and [L]^C_D = [IV]^B_D [L]^A_B [IU]^C_A
where IU and IV are the identity maps on U and V .
Proof: By Part (1) of Theorem 7.21 we have [u]C = [IU(u)]C = [IU]^A_C [u]A and by Part (2)
of Theorem 7.22,
[L]^C_D = [IV L IU]^C_D = [IV]^B_D [L]^A_B [IU]^C_A.
7.25 Definition: Let A = (u1 , u2 , · · · , un ) and B = (v1 , v2 , · · · , vn ) be two finite ordered
bases for a module U over a commutative ring R. The matrix
[I]^A_B = ([u1]B, [u2]B, · · · , [un]B) ∈ Mn(R)
is called the change of basis matrix from A to B. Note that [I]^A_B is invertible with
([I]^A_B)−1 = [I]^B_A.
7.26 Note: Let A and B be two finite ordered bases, with |A| = |B|, for a free module U
over a commutative ring R. For L ∈ End(U ), the Change of Basis Theorem gives
[L]B = [L]^B_B = [I]^A_B [L]^A_A [I]^B_A.
If we let A = [L]A and B = [L]B and P = [I]^A_B then the formula becomes
B = P AP−1.
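A small numerical sketch of this formula (numpy assumed, with made-up matrices):

import numpy as np

A_mat = np.array([[2.0, 1.0],      # [L]_A, the matrix of L in the basis A
                  [1.0, 2.0]])
P = np.array([[1.0, 1.0],          # P = [I]^A_B, the change of basis matrix from A to B
              [0.0, 1.0]])

B_mat = P @ A_mat @ np.linalg.inv(P)                       # [L]_B = P [L]_A P^{-1}
assert np.allclose(np.linalg.inv(P) @ B_mat @ P, A_mat)    # changing back recovers [L]_A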
7.27 Note: Given a finite ordered basis B = {v1 , v2 , · · · , vn } for a free R-module U over
a commutative ring R, and given an invertible matrix P ∈ GLn (R), if we choose uk ∈ U
with [uk]B = P ek, then A = {u1, u2, · · · , un} is an ordered basis for U such that [I]^A_B = P .
Thus every invertible matrix P ∈ GLn (R) is equal to a change of basis matrix.
7.28 Definition: Let R be a commutative ring. For A, B ∈ Mn (R), we say that A and
B are similar, and we write A ∼ B, when B = P AP −1 for some P ∈ GLn (R).
7.29 Note: Let R be a commutative ring, and let A, B ∈ Mn (R) with A ∼ B. Choose
P ∈ GLn (R) so that B = P AP −1 . Then we have
det(B) = det(P AP−1) = det(P ) det(A) det(P )−1 = det(A).
Thus similar matrices have the same determinant.
7.30 Definition: Let F be a field and let U be a finite dimensional vector space over F .
For L ∈ End(U ), we define the determinant of L to be
det(L) = det([L]A), where A is any ordered basis for U (by Notes 7.26 and 7.29, this does
not depend on the choice of A).
Chapter 8. Eigenvalues, Eigenvectors and Diagonalization
8.1 Definition: For a square matrix D ∈ Mn (R) with entries in a ring R, we say that
D is a diagonal matrix when Dk,l = 0 whenever k ≠ l. For λ1 , λ2 , · · · , λn ∈ R, we write
D = diag(λ1 , λ2 , · · · , λn ) for the diagonal matrix D with Dk,k = λk for all indices k.
8.2 Definition: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
We say that L is diagonalizable when there exists an ordered basis A for U such
that [L]A is diagonal.
8.3 Note: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
When A = {u1 , u2 , · · · , un } is an ordered basis for U and λ1 , λ2 , · · · , λn ∈ F , we have
[L]A = diag(λ1, λ2, · · · , λn) ⇐⇒ [L(uk)]A = λk ek for all k ⇐⇒ L(uk) = λk uk for all k.
Thus L is diagonalizable if and only if there exists an ordered basis A = {u1 , u2 , · · · , un }
for U and there exist λ1 , λ2 , · · · , λn ∈ F such that L(uk ) = λk uk for all k.
8.4 Definition: Let L ∈ End(U ) where U is a vector space over a field F . For λ ∈ F , we
say that λ is an eigenvalue (or a characteristic value) of L when there exists a nonzero
vector 0 ≠ u ∈ U such that L(u) = λu. Such a vector 0 ≠ u ∈ U is called an eigenvector
(or characteristic vector) of L for λ. The spectrum of L is the set
Spec(L) = { λ ∈ F | λ is an eigenvalue of L }.
For λ ∈ F , the eigenspace of L for λ is the subspace
Eλ = { u ∈ U | L(u) = λu } = { u ∈ U | (L − λI)u = 0 } = Ker(L − λI) ⊆ U.
Note that Eλ consists of the eigenvectors for λ together with the zero vector.
8.5 Note: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
For λ ∈ F ,
λ is an eigenvalue of L ⇐⇒ there exists 0 ≠ u ∈ Ker(L − λI)
⇐⇒ (L − λI) is not invertible
⇐⇒ det(L − λI) = 0
⇐⇒ λ is a root of f(x) = det(L − xI).
Note that when A is any ordered basis for U , we have
f(x) = det(L − xI) = det([L − xI]A) = det([L]A − xI) ∈ Pn(F ).
8.6 Definition: Let L ∈ End(U ) where U is an n-dimensional vector space over a field F .
The characteristic polynomial of L is the polynomial
fL (x) = det(L − xI) ∈ Pn (F ).
Note that Spec(L) is the set of roots of fL (x).
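For example (a minimal sketch using sympy for exact arithmetic, with a made-up matrix), the characteristic polynomial and the spectrum can be computed directly from the definition:

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1, 2],
               [2, 1]])

f = (A - x * sp.eye(2)).det()      # f_A(x) = det(A - xI)
print(sp.factor(f))                # (x - 3)*(x + 1)
print(sp.solve(f, x))              # the roots -1 and 3, so Spec(A) = {-1, 3}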
8.7 Note: Let L ∈ End(U ) where U is an n-dimensional vector space over a field F . Recall
that L is diagonalizable if and only if there exists an ordered basis A = {u1 , u2 , · · · , un } for
U such that each uk is an eigenvector for some eigenvalue λk . The eigenvalues of L are the
roots of fL(x), so there are at most n possible distinct eigenvalues. For each eigenvalue λ,
the largest number of linearly independent eigenvectors for λ is equal to the dimension
of Eλ. We can try to diagonalize L by finding all the eigenvalues λ of L, then finding
a basis for each eigenspace Eλ, then selecting an ordered basis A from the union of the
bases of the eigenspaces. In particular, note that if ∑_{λ∈Spec(L)} dim(Eλ) < n then L cannot
be diagonalizable.
8.8 Definition: Let F be a field and let A ∈ Mn (F ). By identifying A with the linear
map L = LA ∈ End(F n ) given by L(x) = Ax, all of the above definitions and remarks
may be applied to the matrix A. The matrix A is diagonalizable when there exists
an invertible matrix P and a diagonal matrix D such that A = P DP −1 (or equivalently
P −1AP = D). An eigenvalue for A is an element λ ∈ F for which there exists 0 ≠ x ∈ F n
such that Ax = λx, and then such a vector x is called an eigenvector of A for λ. The
set of eigenvalues of A, denoted by Spec(A), is called the spectrum of A. For λ ∈ F , the
eigenspace for λ is the vector space Eλ = Null(A−λI). The characteristic polynomial
of A is the polynomial fA(x) = det(A − xI) ∈ Pn(F ).
8.9 Example: Let
A = [ 3  −1 ]
    [ 4  −1 ]
in M2(F ) where F = R or C. The characteristic polynomial of A is
fA(x) = det(A − xI) = (3 − x)(−1 − x) + 4 = (x − 3)(x + 1) + 4 = x^2 − 2x + 1 = (x − 1)^2
so the only eigenvalue of A is λ = 1. When λ = 1 we have
A − λI = A − I = [ 2  −1 ]  ∼  [ 2  −1 ]
                 [ 4  −2 ]     [ 0   0 ]
so the eigenspace E1 = Null(A − I) has basis {u} where u = (1, 2)^T. Since
∑_{λ∈Spec(A)} dim(Eλ) = dim(E1) = 1 < 2,
we see that A is not diagonalizable.
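A quick numerical check of this example (a minimal sketch assuming numpy): both eigenvalues of A equal 1, but A − I has rank 1, so the eigenspace is only one-dimensional.

import numpy as np

A = np.array([[3.0, -1.0],
              [4.0, -1.0]])

print(np.linalg.eigvals(A))                        # both eigenvalues equal 1 (up to rounding)
nullity = 2 - np.linalg.matrix_rank(A - np.eye(2))
print(nullity)                                     # dim(E_1) = 1 < 2, so A is not diagonalizable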
8.10 Example: Let
A = [ 1  −2 ]
    [ 2   1 ]
in M2(F ) where F = R or C. The characteristic polynomial of A is
fA(x) = det(A − xI) = (x − 1)^2 + 4 = x^2 − 2x + 5.
For x ∈ C, we have fA(x) = 0 ⇐⇒ x = (2 ± √(4 − 20))/2 = 1 ± 2i. When F = R, A has no
eigenvalues (in R) and so A is not diagonalizable. When F = C, the eigenvalues of A
are λ1 = 1 + 2i and λ2 = 1 − 2i. As an exercise, show that when λ = λ1 the eigenspace
Eλ1 has basis {u1} where u1 = (i, 1)^T, and when λ = λ2 the eigenspace Eλ2 has basis
{u2} where u2 = (−i, 1)^T, then verify that the matrix P = (u1, u2) ∈ M2(C) is invertible
and that P−1AP = diag(λ1, λ2), thus showing that A is diagonalizable.
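Over C the diagonalization can be checked numerically (a minimal sketch assuming numpy):

import numpy as np

A = np.array([[1.0, -2.0],
              [2.0,  1.0]])

evals, P = np.linalg.eig(A)             # complex eigenvalues 1+2i, 1-2i; eigenvectors as columns of P
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag(evals))   # P^{-1} A P is the diagonal matrix of eigenvalues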
8.11 Theorem: Let L ∈ End(U ) where U is a vector space over a field F . Let
λ1, λ2, · · · , λℓ ∈ F be distinct eigenvalues of L. For each index k, let 0 ≠ uk ∈ U be
an eigenvector of L for λk. Then {u1, u2, · · · , uℓ} is linearly independent.
Proof: Since u1 ≠ 0 the set {u1} is linearly independent. Suppose, inductively, that the
set {u1, u2, · · · , uℓ−1} is linearly independent. Suppose that ∑_{i=1}^{ℓ} ti ui = 0 with ti ∈ F . Note
that
0 = (L − λℓ I)(∑_{i=1}^{ℓ} ti ui) = ∑_{i=1}^{ℓ} ti (L(ui) − λℓ ui) = ∑_{i=1}^{ℓ} ti (λi − λℓ) ui = ∑_{i=1}^{ℓ−1} ti (λi − λℓ) ui
and so ti (λi − λℓ) = 0 for 1 ≤ i < ℓ, since {u1, u2, · · · , uℓ−1} is linearly independent; because
the λi are distinct we have λi − λℓ ≠ 0, and so ti = 0 for 1 ≤ i < ℓ. Since ti = 0 for
1 ≤ i ≤ ℓ − 1 and ∑_{i=1}^{ℓ} ti ui = 0, we have tℓ uℓ = 0 and hence tℓ = 0 because uℓ ≠ 0. Thus
{u1, u2, · · · , uℓ} is linearly independent.
8.12 Corollary: Let L ∈ End(U ) where U is a vector space over a field F . Let
λ1, λ2, · · · , λℓ be distinct eigenvalues of L. For each index k, let Ak be a linearly independent
set of eigenvectors for λk. Then A1 ∪ A2 ∪ · · · ∪ Aℓ is linearly independent.
Proof: Suppose that ∑_{k=1}^{ℓ} ∑_{i=1}^{mk} tk,i uk,i = 0 where each tk,i ∈ F and, for each k, the vectors uk,i
are distinct vectors in Ak. Then we have ∑_{k=1}^{ℓ} uk = 0 where uk = ∑_{i=1}^{mk} tk,i uk,i ∈ Eλk. From
the above theorem, it follows that uk = 0 for all k, because if we had uk ≠ 0 for some values
of k, say the values k1, k2, · · · , kr, then {uk1, uk2, · · · , ukr} would be linearly independent
but uk1 + uk2 + · · · + ukr = 0, which is impossible. Since for each index k we have
0 = uk = ∑_{i=1}^{mk} tk,i uk,i, it follows that each tk,i = 0 because Ak is linearly independent.
8.13 Corollary: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
Then L is diagonalizable if and only if ∑_{λ∈Spec(L)} dim(Eλ) = dim U.
In this case, if Spec(L) = {λ1, λ2, · · · , λℓ} and, for each k, Ak = {uk,1, uk,2, · · · , uk,mk} is
an ordered basis for Eλk, then
A = A1 ∪ A2 ∪ · · · ∪ Aℓ = (u1,1, u1,2, · · · , u1,m1, u2,1, u2,2, · · · , u2,m2, · · · , uℓ,1, uℓ,2, · · · , uℓ,mℓ)
is an ordered basis for U such that [L]A is diagonal.
8.14 Definition: Let F be a field. For f ∈ F [x] and a ∈ F , the multiplicity of a as a
root of f , denoted by mult(a, f(x)), is the largest m ∈ N such that (x − a)^m is a factor
of f(x). Note that a is a root of f if and only if mult(a, f ) > 0. For a non-constant
polynomial f ∈ F [x], we say that f splits (over F ) when f factors into a product of linear
factors in F [x], that is, when f is of the form f(x) = c (x − a1)(x − a2) · · · (x − an) for
some c ∈ F and ai ∈ F .
8.15 Theorem: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
Let λ ∈ Spec(L) and let mλ = mult(λ, fL(x)). Then
1 ≤ dim(Eλ) ≤ mλ.
Proof: Since λ is an eigenvalue of L we have Eλ ≠ {0} so dim(Eλ) ≥ 1. Let m = dim(Eλ)
and let A = (u1, u2, · · · , um) be an ordered basis for Eλ. Extend A to an ordered basis
B = (u1, · · · , um, · · · , un) for U . Since L(ui) = λui for 1 ≤ i ≤ m, the matrix [L]B is of the
form
[L]B = [ λI  A ]
       [ 0   B ]   ∈ Mn(F )
where I ∈ Mm(F ). The characteristic polynomial of L is
fL(x) = det([L]B − xI) = det [ (λ − x)I     A    ]  = (λ − x)^m fB(x).
                             [    0      B − xI  ]
Thus (x − λ)^m is a factor of fL(x) and so mλ = mult(λ, fL(x)) ≥ m.
8.16 Corollary: Let L ∈ End(U ) where U is a finite dimensional vector space over a field F .
Then L is diagonalizable if and only if fL(x) splits and dim(Eλ) = mult(λ, fL(x)) for every
λ ∈ Spec(L).
Proof: Suppose that L is diagonalizable. Choose an ordered basis A so that [L]A is
diagonal, say [L]A = D = diag(λ1, · · · , λn). Note that fL(x) = fD(x) = ∏_{k=1}^{n} (λk − x),
and so fL(x) splits. For each λ ∈ Spec(L), let mλ = mult(λ, fL(x)). Then, by the above
theorem together with Corollary 8.13, we have
n = dim(U ) = ∑_{λ∈Spec(L)} dim(Eλ) ≤ ∑_{λ∈Spec(L)} mλ = deg(fL) = n
which implies that dim(Eλ) = mλ for all λ. Conversely, if fL splits and dim(Eλ) = mλ for
all λ then
∑_{λ∈Spec(L)} dim(Eλ) = ∑_{λ∈Spec(L)} mλ = deg(fL) = n = dim(U )
and so L is diagonalizable.
8.17 Corollary: Let A ∈ Mn (F ) where F is a field. Then A is diagonalizable if and only
if fA (x) splits and dim(Eλ ) = mult(λ, fA (x)) for all λ ∈ Spec(A).
8.18 Note: To summarize the above results, given a matrix A ∈ Mn (F ), where F is a
field, we can determine whether A is diagonalizable as follows. We find the characteristic
polynomial fA (x) = det(A − xI). We factor fA (x) to find the eigenvalues of A and the
multiplicity of each eigenvalue. If fA (x) does not split then A is not diagonalizable. If fA (x)
does split, then for each eigenvalue λ with multiplicity mλ ≥ 2, we calculate dim(Eλ ). If
we find one eigenvalue λ for which dim(Eλ ) < mλ then A is not diagonalizable. Otherwise
A is diagonalizable. In particular we remark that if fA (x) splits and has n distinct roots
(so the eigenvalues all have multiplicity 1) then A is diagonalizable.
In the case that A is diagonalizable and fA(x) = (−1)^n ∏_{k=1}^{ℓ} (x − λk)^{mk}, if we find an
ordered basis Ak = {uk,1, uk,2, · · · , uk,mk} for each eigenspace, then we have P−1AP = D
with
P = (u1,1, u1,2, · · · , u1,m1, u2,1, u2,2, · · · , u2,m2, · · · , uℓ,1, uℓ,2, · · · , uℓ,mℓ) and
D = diag(λ1, λ1, · · · , λ1, λ2, λ2, · · · , λ2, · · · , λℓ, λℓ, · · · , λℓ)
where each λk is repeated mk times.
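This procedure can be mirrored with sympy, which works in exact arithmetic (a minimal sketch with a made-up matrix):

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 1],
               [1, 2, 1],
               [1, 1, 2]])

f = (A - x * sp.eye(3)).det()
print(sp.factor(f))               # -(x - 1)**2 * (x - 4): f splits, eigenvalues 1 (multiplicity 2) and 4
print(A.is_diagonalizable())      # True: dim(E_1) = 2 matches the multiplicity of 1
P, D = A.diagonalize()            # columns of P are eigenvectors, D is diagonal
assert P * D * P.inv() == A       # so P^{-1} A P = D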


8.19 Example: Let
A = [  3   1   1 ]
    [  2   4   2 ]
    [ −1  −1   1 ]
in M3(Q). Determine whether A is diagonalizable and, if so, find an invertible matrix P
and a diagonal matrix D such that P−1AP = D.
Solution: The characteristic polynomial of A is
fA(x) = det(A − xI) = det [ 3−x     1      1  ]
                          [  2     4−x     2  ]
                          [ −1     −1     1−x ]
= −(x − 3)(x − 4)(x − 1) − 2 − 2 − 2(x − 3) + 2(x − 1) − (x − 4)
= −(x^3 − 8x^2 + 19x − 12) − x + 4 = −(x^3 − 8x^2 + 20x − 16)
= −(x − 2)(x^2 − 6x + 8) = −(x − 2)^2 (x − 4)
so the eigenvalues are λ1 = 2 and λ2 = 4 of multiplicities m1 = 2 and m2 = 1. When
λ = λ1 = 2 we have
A − λI = A − 2I = [  1   1   1 ]     [ 1  1  1 ]
                  [  2   2   2 ]  ∼  [ 0  0  0 ]
                  [ −1  −1  −1 ]     [ 0  0  0 ]
so the eigenspace E2 has basis {u1, u2} with u1 = (−1, 0, 1)^T and u2 = (−1, 1, 0)^T. When
λ = λ2 = 4 we have
A − λI = A − 4I = [ −1   1   1 ]     [ 1  −1  −1 ]     [ 1  0  1 ]
                  [  2   0   2 ]  ∼  [ 0   2   4 ]  ∼  [ 0  1  2 ]
                  [ −1  −1  −3 ]     [ 0   2   4 ]     [ 0  0  0 ]
so the eigenspace E4 has basis {u3} where u3 = (−1, −2, 1)^T. Thus we have P−1AP = D
where
P = (u1, u2, u3) = [ −1  −1  −1 ]     and     D = diag(λ1, λ1, λ2) = [ 2  0  0 ]
                   [  0   1  −2 ]                                    [ 0  2  0 ]
                   [  1   0   1 ]                                    [ 0  0  4 ]
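A quick numerical check of this answer (a minimal sketch assuming numpy):

import numpy as np

A = np.array([[ 3.0,  1.0, 1.0],
              [ 2.0,  4.0, 2.0],
              [-1.0, -1.0, 1.0]])
P = np.array([[-1.0, -1.0, -1.0],
              [ 0.0,  1.0, -2.0],
              [ 1.0,  0.0,  1.0]])
D = np.diag([2.0, 2.0, 4.0])

assert np.allclose(np.linalg.solve(P, A @ P), D)   # P^{-1} A P = D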
8.20 Definition: For a square matrix T ∈ Mn (R) with entries in a ring R, we say that
T is upper triangular when Tk,l = 0 whenever k > l.
8.21 Definition: Let L ∈ End(U ) where U is a finite dimensional vector space over a
field F . We say that L is (upper) triangularizable when there exists an ordered basis A
for U such that [L]A is upper triangular.
8.22 Definition: For a square matrix A ∈ Mn (F ), where F is a field, we say that A is
(upper) triangularizable when there exists an invertible matrix P ∈ GLn (F ) such that
P −1AP is upper triangular.
8.23 Theorem: (Schur’s Theorem) Let F be a field.
(1) Let L ∈ End(U ) where U is a finite dimensional vector space over F . Then L is
triangularizable if and only if fL (x) splits, and
(2) Let A ∈ Mn (F ). Then A is triangularizable if and only if fA (x) splits.
Proof: We shall prove Part (2), and we leave it as an exercise to show that Part (1) holds
if and only if Part (2) holds. Suppose first that A is triangularizable. Choose an invertible
matrix P and an upper triangular matrix T with P −1AP = T . Then
fA(x) = fT(x) = ∏_{k=1}^{n} (Tk,k − x)
and so fA(x) splits.
Conversely, suppose that fA(x) splits. Choose a root λ1 of fA(x) and note that λ1 is
an eigenvalue of A. Choose an eigenvector u1 for λ1, so we have Au1 = λ1 u1. Since u1 ≠ 0
the set {u1} is linearly independent. Extend the set {u1} to a basis {u1, u2, · · · , un}
for F n. Let Q = (u1, u2, · · · , un) ∈ Mn(F ), and note that Q is invertible because its
columns form a basis for F n. Since Q−1Q = I, the first column of Q−1Q is equal to e1,
so we have
Q−1AQ = Q−1A(u1, u2, · · · , un) = Q−1(Au1, Au2, · · · , Aun)
       = Q−1(λ1 u1, A(u2, · · · , un)) = (λ1 Q−1u1, Q−1A(u2, · · · , un))
       = (λ1 e1, Q−1A(u2, · · · , un)) = [ λ1  w^T ]
                                         [ 0    B  ]
with w ∈ F n−1 and B ∈ Mn−1(F ). Note that fA(x) = (λ1 − x)fB(x) and so fB(x)
splits. We suppose, inductively, that B is triangularizable. Choose R ∈ GLn−1(F ) so that
R−1BR = S with S upper triangular. Let
P = Q [ 1  0 ]
      [ 0  R ]   ∈ Mn(F ).
Then P is invertible with
P−1 = [ 1   0  ] Q−1
      [ 0  R−1 ]
and
P−1AP = [ 1   0  ] Q−1AQ [ 1  0 ]
        [ 0  R−1 ]       [ 0  R ]

       = [ 1   0  ] [ λ1  w^T ] [ 1  0 ]
         [ 0  R−1 ] [ 0    B  ] [ 0  R ]

       = [ 1   0  ] [ λ1  w^T R ]
         [ 0  R−1 ] [ 0    BR   ]

       = [ λ1    w^T R ]
         [ 0    R−1BR  ]

       = [ λ1  w^T R ]
         [ 0     S   ]

which is upper triangular.
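Numerically, a triangularizing change of basis can be computed with scipy's Schur decomposition (a minimal sketch assuming scipy; we reuse the matrix of Example 8.9, whose characteristic polynomial (x − 1)^2 splits over R even though the matrix is not diagonalizable):

import numpy as np
from scipy.linalg import schur

A = np.array([[3.0, -1.0],
              [4.0, -1.0]])

# A = Z T Z^{-1} with T upper triangular; here Z is orthogonal, so Z^{-1} = Z^T
T, Z = schur(A, output='real')
assert np.allclose(Z @ T @ Z.T, A)
assert np.allclose(np.tril(T, -1), 0)   # T is upper triangular (the eigenvalues here are real)
print(np.diag(T))                       # the eigenvalues of A on the diagonal (both 1, up to rounding)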