0.1 Rational Canonical Forms

Draft. Do not cite or quote.
We have already seen that it is useful and simpler to study linear
systems using matrices. But matrices are themselves cumbersome, as
they are stuffed with many entries, and it turns out that it’s best to
focus on the fact that we can view an m × n matrix A as a function
$T : \mathbb{R}^n \to \mathbb{R}^m$; namely, $T(x) = Ax$ for every column vector $x \in \mathbb{R}^n$.
The function T, called a linear transformation, has two important
properties: $T(x + y) = T(x) + T(y)$ for all x, y; and, for every number
α and vector x, we have $T(\alpha x) = \alpha T(x)$. As we shall see in the
course, the matrix A is a way of studying T by choosing a coordinate
system in $\mathbb{R}^n$; choosing another coordinate system gives another
matrix describing T. Just as in calculus when studying conic sections,
choosing a good coordinate system can greatly simplify things: we
can put a conic section defined by $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$
into normal form, and we can easily read off properties of the conic
(is it a parabola, hyperbola, or ellipse; what are its foci, etc.) from
the normal form. The important object of study for us is a linear
transformation T , and any choice of coordinate system produces a
matrix describing it. Is there a “best” matrix describing T analogous
to a normal form for a conic? The answer is yes, and normal forms
here are called canonical forms. Now coordinates arise from bases of
a vector space, and the following discussion is often called change of
basis.
We begin by stating some facts which will eventually be treated in
the course proper. If you have any questions, please feel free to ask.
Unless we say otherwise, our vector spaces involve real numbers
as scalars (although, in several instances, we may want complex
numbers), linear transformations T are functions on a vector space
V (that is, T maps V to itself), and matrices are square.
If T : V → V is a linear transformation on a vector space V
and $X = x_1, \dots, x_n$ is a basis of V, then T determines the matrix
$A = {}_X[T]_X$ whose ith column consists of the coordinate list of $T(x_i)$
with respect to X. If Y is another basis of V, then the matrix
$B = {}_Y[T]_Y$ may be different from A, but Theorem 4.3.1 in the book
says that A and B are similar; that is, there exists a nonsingular
matrix P with $B = PAP^{-1}$.
Theorem 4.3.1. Let T : V → V be a linear transformation on a
vector space V. If X and Y are bases of V, then there is a nonsingular
matrix P (called a transition matrix), namely, $P = {}_Y[1_V]_X$, so that
\[ {}_Y[T]_Y = P \, {}_X[T]_X \, P^{-1}. \]
Conversely, if $B = PAP^{-1}$, where B, A, and P are n × n matrices
and P is nonsingular, then there is a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$
and bases X and Y of $\mathbb{R}^n$ such that $B = {}_Y[T]_Y$ and $A = {}_X[T]_X$.
Abstract Algebra Arising from Fermat’s Last Theorem
We now consider how to determine whether two given matrices
are similar.
If A is an r × r matrix and B is an s × s matrix, then their direct
sum A ⊕ B is the (r + s) × (r + s) matrix
\[ A \oplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}. \]
If $g(x) = x + c_0$, then its companion matrix C(g) is the 1 × 1
matrix $[-c_0]$; if s ≥ 2 and $g(x) = x^s + c_{s-1}x^{s-1} + \cdots + c_1 x + c_0$, then
its companion matrix C(g) is the s × s matrix
\[
C(g) = \begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & 0 & \cdots & 0 & -c_2 \\
0 & 0 & 1 & \cdots & 0 & -c_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -c_{s-1}
\end{bmatrix}.
\]
Obviously, we can recapture the polynomial g(x) from the last
column of the companion matrix C(g). We call a polynomial f (x)
monic if the highest power of x in f has coefficient 1.
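The definition of C(g) can be made concrete with a minimal sketch in Python. The function names `companion` and `recover_coeffs`, and the convention of passing the non-leading coefficients as the list [c_0, ..., c_{s-1}], are assumptions made for this illustration only; they are not notation from these notes.

```python
from typing import List

def companion(coeffs: List[int]) -> List[List[int]]:
    """Companion matrix C(g) of the monic g(x) = x^s + c_{s-1} x^{s-1} + ... + c_0.

    coeffs holds the non-leading coefficients [c_0, c_1, ..., c_{s-1}].
    """
    s = len(coeffs)
    C = [[0] * s for _ in range(s)]
    for i in range(1, s):
        C[i][i - 1] = 1           # 1's on the subdiagonal
    for i in range(s):
        C[i][s - 1] = -coeffs[i]  # last column is -c_0, ..., -c_{s-1}
    return C

def recover_coeffs(C: List[List[int]]) -> List[int]:
    """Read [c_0, ..., c_{s-1}] back off the last column of C(g)."""
    s = len(C)
    return [-C[i][s - 1] for i in range(s)]

# g(x) = x^3 + 2x^2 - 5x + 7, so [c_0, c_1, c_2] = [7, -5, 2]
C = companion([7, -5, 2])
for row in C:
    print(row)
```

The 1 × 1 case needs no special treatment here: `companion([c0])` has no subdiagonal, so it returns `[[-c0]]`.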
Theorem 0.1
Every n×n matrix A is similar to a direct sum of companion matrices
C(g1 ) ⊕ · · · ⊕ C(gt )
in which the gi (x) are monic polynomials and
g1 (x) | g2 (x) | · · · | gt (x).
A rational canonical form is a matrix R that is a direct sum of
companion matrices,
R = C(g1 ) ⊕ · · · ⊕ C(gt ),
where the gi (x) are monic polynomials with g1 (x) | g2 (x) | · · · | gt (x).
If a matrix A is similar to a rational canonical form C(g1 ) ⊕
· · · ⊕ C(gt ), where g1 (x) | g2 (x) | · · · | gt (x), then we say that the
invariant factors of A are g1 (x), g2 (x), . . . , gt (x).
Theorem 0.1 says that every n × n matrix is similar to a rational
canonical form, and so it has invariant factors. Can a matrix A have
more than one list of invariant factors?
Theorem 0.2
1. Two n × n matrices A and B are similar if and only if they
have the same invariant factors.
2. An n × n matrix A is similar to exactly one rational canonical
form.
Corollary 0.3
Let A and B be n × n matrices with real entries. If A and B are
similar over C, then they are similar over R (i.e., if there is a
nonsingular matrix P having complex entries with $B = PAP^{-1}$,
then there is a nonsingular matrix Q having real entries with
$B = QAQ^{-1}$).
If T : V → V is a linear transformation, then an invariant subspace is a subspace W of V with T(W) ⊆ W.
Does a linear transformation T on a finite-dimensional vector
space V leave any one-dimensional subspaces of V invariant; that
is, is there a nonzero vector v ∈ V with T(v) = αv for some α? If
$T : \mathbb{R}^2 \to \mathbb{R}^2$ is rotation by 90°, then its matrix with respect to the
standard basis is $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Now
\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}. \]
If $v = \begin{bmatrix} x \\ y \end{bmatrix}$ is a nonzero vector and T(v) = αv for some $\alpha \in \mathbb{R}$, then
αx = −y and αy = x; it follows that $(\alpha^2 + 1)x = 0$ and $(\alpha^2 + 1)y = 0$.
Since v ≠ 0, $\alpha^2 + 1 = 0$ and $\alpha \notin \mathbb{R}$. Thus, T has no one-dimensional
invariant subspaces. Note that $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ is the companion matrix of
$x^2 + 1$.
Let V be a vector space and let T : V → V be a linear transformation.
If T v = αv, where α ∈ C and v ∈ V is nonzero, then α is called an
eigenvalue of T and v is called an eigenvector of T for α.
Let A be an n × n matrix. If Av = αv, where α ∈ k and $v \in k^n$
is a nonzero column, then α is called an eigenvalue of A and v is
called an eigenvector of A for α.
Rotation by 90° has no (real) eigenvalues. At the other extreme,
can a linear transformation have infinitely many eigenvalues?
Theorem 4.2.1. If $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear transformation, then
there exists a unique n × n matrix A such that T(v) = Av for all
$v \in \mathbb{R}^n$.
To say that Av = αv for v nonzero is to say that v is a nontrivial
solution of the homogeneous system (A−αI)v = 0; that is, A−αI is a
singular matrix. But a matrix is singular if and only if its determinant
is 0.
The characteristic polynomial of an n × n matrix A is
\[ p_A(x) = \det(xI - A) \in \mathbb{R}[x]. \]
Thus, the eigenvalues of an n × n matrix A are the roots of $p_A(x)$,
a polynomial of degree n, and so A has at most n real eigenvalues.
Note that some eigenvalues of A may be complex numbers.
The trace of an n × n matrix $A = [a_{ij}]$ is
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}. \]
Proposition 0.4
If $A = [a_{ij}]$ is an n × n matrix having eigenvalues (with multiplicity)
$\alpha_1, \dots, \alpha_n$, then
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} \alpha_i \quad\text{and}\quad \det(A) = \prod_{i=1}^{n} \alpha_i. \]
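Proof. For any polynomial f(x) with real coefficients, if
\[ f(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 = (x - \alpha_1) \cdots (x - \alpha_n), \]
then $c_{n-1} = -\sum_i \alpha_i$ and $c_0 = (-1)^n \prod_i \alpha_i$. In particular, $p_A(x) = \prod_{i=1}^{n} (x - \alpha_i)$,
so that $c_{n-1} = -\sum_i \alpha_i$; since the coefficient of $x^{n-1}$ in $\det(xI - A)$ is
$-(a_{11} + \cdots + a_{nn}) = -\operatorname{tr}(A)$, it follows that $\operatorname{tr}(A) = \sum_i \alpha_i$. Now the constant term of any
polynomial f(x) is just f(0); setting x = 0 in $p_A(x) = \det(xI - A)$
gives $p_A(0) = \det(-A) = (-1)^n \det(A)$. Hence, $\det(A) = \prod_i \alpha_i$.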
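Proposition 0.4 can be checked on a small example. The sketch below computes the coefficients of the characteristic polynomial with the Faddeev-LeVerrier recurrence, a standard method that is swapped in here for convenience and is not the approach taken in these notes. The helper names (`mat_mul`, `trace`, `char_poly`) and the sample matrix are illustrative assumptions. The matrix is triangular, so its eigenvalues are its diagonal entries and both identities can be compared directly.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of p_A(x) = det(xI - A).

    Faddeev-LeVerrier recurrence (an assumption of this sketch):
    M_1 = I, c_{n-k} = -tr(A M_k)/k, M_{k+1} = A M_k + c_{n-k} I.
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        c = -trace(AM) / k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

# Lower triangular, so the eigenvalues are the diagonal entries 2, 3, 5.
A = [[2, 0, 0],
     [1, 3, 0],
     [4, 1, 5]]
eigenvalues = [2, 3, 5]
coeffs = char_poly(A)  # coefficients of (x - 2)(x - 3)(x - 5)

trace_matches = -coeffs[1] == sum(eigenvalues) == trace(A)
det_matches = (-1) ** 3 * coeffs[-1] == 2 * 3 * 5
print(coeffs, trace_matches, det_matches)
```

Exact rational arithmetic (`Fraction`) is used so that the division by k in the recurrence introduces no rounding.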
Here are some elementary facts about eigenvalues.
Corollary 0.5
Let A be an n × n matrix.
1. A is singular if and only if 0 is an eigenvalue of A.
2. If α is an eigenvalue of A, then $\alpha^m$ is an eigenvalue of $A^m$ for every m ≥ 1.
3. If A is nonsingular and α is an eigenvalue of A, then α ≠ 0
and $\alpha^{-1}$ is an eigenvalue of $A^{-1}$.
Proof.
1. If A is singular, then the homogeneous system Ax = 0 has a
nontrivial solution; that is, there is a nonzero v with Av = 0.
But this just says that Av = 0v (here, 0 is a scalar), and so 0
is an eigenvalue.
Conversely, if 0 is an eigenvalue, then 0 = det(0I − A) =
± det(A), so that det(A) = 0 and A is singular.
2. There is a nonzero vector v with Av = αv. It follows by
induction on m ≥ 1 that $A^m v = \alpha^m v$.
3. If v is an eigenvector of A for α, then
\[ v = A^{-1}Av = A^{-1}\alpha v = \alpha A^{-1}v. \]
Therefore, α ≠ 0 (because eigenvectors are nonzero) and
$\alpha^{-1}v = A^{-1}v$.
Let us return to rational canonical forms.
Lemma 0.6
If C(g) is the companion matrix of g(x) ∈ k[x], then
\[ \det(xI - C(g)) = g(x). \]
Proof. If $g(x) = x + c_0$, then C(g) is the 1 × 1 matrix $[-c_0]$, and
$\det(xI - C(g)) = x + c_0 = g(x)$. If deg(g) = s ≥ 2, then
\[
xI - C(g) = \begin{bmatrix}
x & 0 & 0 & \cdots & 0 & c_0 \\
-1 & x & 0 & \cdots & 0 & c_1 \\
0 & -1 & x & \cdots & 0 & c_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -1 & x + c_{s-1}
\end{bmatrix},
\]
and Laplace expansion across the first row gives
\[ \det(xI - C(g)) = x \det(L) + (-1)^{1+s} c_0 \det(M), \]
where L is the matrix obtained by erasing the top row and first
column, and M is the matrix obtained by erasing the top row and
last column. Now M is a triangular (s − 1) × (s − 1) matrix having
−1’s on the diagonal, while $L = xI - C\big((g(x) - c_0)/x\big)$. By induction,
$\det(L) = (g(x) - c_0)/x$, while $\det(M) = (-1)^{s-1}$. Therefore,
\[ \det(xI - C(g)) = x[(g(x) - c_0)/x] + (-1)^{(1+s)+(s-1)} c_0 = g(x). \]
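Lemma 0.6 can be spot-checked numerically. The sketch below evaluates det(xI − C(g)) by Laplace expansion across the first row, as in the proof, at several integer values of x and compares with g(x). Two monic polynomials of degree s that agree at more than s points are equal, so agreement at seven points settles the degree-4 example. The function names and the sample polynomial are assumptions made for illustration.

```python
def companion(coeffs):
    """C(g) for monic g with non-leading coefficients [c_0, ..., c_{s-1}]."""
    s = len(coeffs)
    C = [[0] * s for _ in range(s)]
    for i in range(1, s):
        C[i][i - 1] = 1
    for i in range(s):
        C[i][s - 1] = -coeffs[i]
    return C

def det(M):
    """Determinant by Laplace expansion across the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        if entry != 0:
            # Minor: erase the top row and column j.
            minor = [row[:j] + row[j + 1:] for row in M[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def char_value(C, x):
    """Value of det(xI - C) at the integer x."""
    n = len(C)
    return det([[(x if i == j else 0) - C[i][j] for j in range(n)]
                for i in range(n)])

def g_value(coeffs, x):
    """Value of g(x) = x^s + c_{s-1} x^{s-1} + ... + c_0."""
    s = len(coeffs)
    return x ** s + sum(c * x ** i for i, c in enumerate(coeffs))

# g(x) = x^4 + 3x^3 + 2x^2 - 5x + 7
coeffs = [7, -5, 2, 3]
C = companion(coeffs)
agree = all(char_value(C, x) == g_value(coeffs, x) for x in range(-3, 4))
print(agree)
```

Laplace expansion takes factorial time in general; it is used here only because it mirrors the proof and the matrices are tiny.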
If $R = C(g_1) \oplus \cdots \oplus C(g_t)$ is a rational canonical form, then
\[ xI - R = [xI - C(g_1)] \oplus \cdots \oplus [xI - C(g_t)]. \]
Given square matrices $B_1, \dots, B_t$, we have $\det(B_1 \oplus \cdots \oplus B_t) = \prod_{i=1}^{t} \det(B_i)$. With Lemma 0.6, this gives
\[ p_R(x) = \prod_{i=1}^{t} p_{C(g_i)}(x) = \prod_{i=1}^{t} g_i(x). \]
Thus, the characteristic polynomial is the product of the invariant
factors.
We now show that similar matrices have the same characteristic polynomial. If
$B = PAP^{-1}$, then since xI commutes with every matrix, we have $P(xI)P^{-1} = (xI)PP^{-1} = xI$. Therefore,
\begin{align*}
p_B(x) = \det(xI - B) &= \det(P(xI)P^{-1} - PAP^{-1}) \\
&= \det(P[xI - A]P^{-1}) = \det(P)\det(xI - A)\det(P^{-1}) \\
&= \det(xI - A) = p_A(x).
\end{align*}
It follows that if A is similar to $C(g_1) \oplus \cdots \oplus C(g_t)$, then
\[ p_A(x) = \prod_{i=1}^{t} g_i(x). \]
Therefore, similar matrices have the same eigenvalues with multiplicities. Hence,
Proposition 0.4 says that similar matrices have the same trace and the same
determinant.
Theorem 0.7 (Cayley–Hamilton)
If A is an n × n matrix with characteristic polynomial
$p_A(x) = x^n + b_{n-1}x^{n-1} + \cdots + b_1 x + b_0$, then $p_A(A) = 0$; that is,
\[ A^n + b_{n-1}A^{n-1} + \cdots + b_1 A + b_0 I = 0. \]
Proof. Birkhoff–Mac Lane, A Survey of Modern Algebra, p. 341.
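The statement of the Cayley–Hamilton Theorem can be tested on small matrices. In the sketch below, a monic polynomial is evaluated at a matrix by Horner's rule. Two cases are tried: a 2 × 2 matrix, whose characteristic polynomial is x^2 − tr(A)x + det(A), and a companion matrix C(g), whose characteristic polynomial is g by Lemma 0.6. The helper names and the sample matrices are assumptions for illustration.

```python
def mat_mul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """Evaluate x^d + c_{d-1} x^{d-1} + ... + c_0 at the matrix A.

    coeffs = [c_0, ..., c_{d-1}]; Horner's rule, starting from the
    leading coefficient 1 (the identity matrix).
    """
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for c in reversed(coeffs):
        R = mat_mul(R, A)
        for i in range(n):
            R[i][i] += c  # add c * I
    return R

# 2 x 2 case: p_A(x) = x^2 - tr(A) x + det(A).
A = [[2, 1],
     [1, 3]]
tr = A[0][0] + A[1][1]
dt = A[0][0] * A[1][1] - A[0][1] * A[1][0]
zero_2x2 = poly_at_matrix([dt, -tr], A)

# Companion matrix of g(x) = x^3 + 2x^2 - 5x + 7; by Lemma 0.6, p_C = g.
C = [[0, 0, -7],
     [1, 0, 5],
     [0, 1, -2]]
zero_companion = poly_at_matrix([7, -5, 2], C)

print(zero_2x2, zero_companion)
```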
The minimal polynomial mA (x) of an n×n matrix A is the monic
polynomial f (x) of least degree with the property that f (A) = 0.
Proposition 0.8
The minimal polynomial $m_A(x)$ is a divisor of the characteristic
polynomial $p_A(x)$, and every eigenvalue of A is a root of $m_A(x)$.
Proof. By the Cayley–Hamilton Theorem, $p_A(A) = 0$. Dividing with
remainder, $p_A(x) = q(x)m_A(x) + r(x)$, where r(x) = 0 or deg(r) < deg($m_A$);
since $r(A) = p_A(A) - q(A)m_A(A) = 0$, the minimality of $m_A(x)$ forces
r(x) = 0, and so $m_A(x)$ divides $p_A(x)$.
Now $g_t(x)$ is the minimal polynomial of A, where $g_t(x)$ is the
invariant factor of A of highest degree. It follows from the fact that
\[ p_A(x) = g_1(x) \cdots g_t(x), \]
where $g_1(x) \mid g_2(x) \mid \cdots \mid g_t(x)$, that $m_A(x) = g_t(x)$ is a polynomial
having every eigenvalue of A as a root [of course, the multiplicity of
a root of $m_A(x)$ may be less than its multiplicity as a root of the
characteristic polynomial $p_A(x)$].
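Corollary 0.9
If all the eigenvalues of an n × n matrix A are distinct, then $m_A(x) = p_A(x)$;
that is, the minimal polynomial coincides with the characteristic polynomial.
Proof. This is true because every root of $p_A(x)$ is a root of $m_A(x)$: the monic
polynomial $m_A(x)$ divides $p_A(x)$ and has all n distinct roots of $p_A(x)$, so the
two polynomials have the same degree and are equal.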
Exercises
1. a. How many 10 × 10 matrices A over $\mathbb{R}$ are there, up to
similarity, with $A^2 = I$?
b. How many 10 × 10 matrices A over $\mathbb{F}_p$ are there, up to
similarity, with $A^2 = I$? Hint. The answer depends on
the parity of p.
2. Find the rational canonical forms of
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \quad\text{and}\quad
C = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}.
\]
3. If A is similar to A′ and B is similar to B′, prove that A ⊕ B
is similar to A′ ⊕ B′.
Jordan Canonical Forms
Even if we know the rational canonical form of a matrix A, we may
not know the rational canonical form of a power of A. For example,
given a matrix A, is there some power of it with $A^m = I$? We now give a
second canonical form whose powers are easily calculated.
Let k be a field and let α be a real or complex number. A 1 × 1
Jordan block is a matrix J(α, 1) = [α] and, if s ≥ 2, an s × s
Jordan block is a matrix J(α, s) of the form
\[
J(\alpha, s) = \begin{bmatrix}
\alpha & 0 & 0 & \cdots & 0 & 0 \\
1 & \alpha & 0 & \cdots & 0 & 0 \\
0 & 1 & \alpha & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \alpha & 0 \\
0 & 0 & 0 & \cdots & 1 & \alpha
\end{bmatrix}.
\]
Here is a more compact description of a Jordan block when s ≥ 2.
Let L denote the s × s matrix having all entries 0 except for 1’s on
the subdiagonal just below the main diagonal. With this notation, a
Jordan block J(α, s) can be written as
J(α, s) = αI + L.
Let us regard L as a linear transformation on $k^s$. If $e_1, \dots, e_s$ is the
standard basis, then $Le_i = e_{i+1}$ if i < s while $Le_s = 0$. It follows
easily that the matrix $L^2$ is all 0’s except for 1’s on the second
subdiagonal below the main diagonal; $L^3$ is all 0’s except for 1’s
on the third subdiagonal; $L^{s-1}$ has 1 in the (s, 1) position, with 0’s
everywhere else, and $L^s = 0$. Thus, L is nilpotent.
Lemma 0.10
If J = J(α, s) = αI + L is an s × s Jordan block, then for all m ≥ 1,
\[ J^m = \alpha^m I + \sum_{i=1}^{s-1} \binom{m}{i} \alpha^{m-i} L^i, \]
where $\binom{m}{i} = 0$ if i > m.
Proof. Since L and αI commute (the scalar matrix αI commutes
with every matrix), the ring generated by αI and L is commutative,
and the Binomial Theorem applies. Finally, note that all terms
involving $L^i$ for i ≥ s are 0 because $L^s = 0$.
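Different powers of L are “disjoint”; that is, if m ≠ n and the i, j entry of $L^n$ is
nonzero, then the i, j entry of $L^m$ is zero. For example,
\[
\begin{bmatrix} \alpha & 0 \\ 1 & \alpha \end{bmatrix}^m
= \begin{bmatrix} \alpha^m & 0 \\ m\alpha^{m-1} & \alpha^m \end{bmatrix}
\]
and
\[
\begin{bmatrix} \alpha & 0 & 0 \\ 1 & \alpha & 0 \\ 0 & 1 & \alpha \end{bmatrix}^m
= \begin{bmatrix} \alpha^m & 0 & 0 \\ m\alpha^{m-1} & \alpha^m & 0 \\ \binom{m}{2}\alpha^{m-2} & m\alpha^{m-1} & \alpha^m \end{bmatrix}.
\]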
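The formula of Lemma 0.10 can be compared with direct matrix multiplication. In the sketch below, the entry of $J^m$ on the d-th subdiagonal is taken to be the binomial coefficient C(m, d) times α^(m−d), which is what the lemma says once the powers of L are written out. The function names and sample values are assumptions for illustration.

```python
from math import comb

def jordan_block(alpha, s):
    """J(alpha, s): alpha on the diagonal, 1's on the subdiagonal."""
    return [[alpha if i == j else (1 if i == j + 1 else 0) for j in range(s)]
            for i in range(s)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, m):
    """A^m by repeated multiplication (m >= 0)."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        R = mat_mul(R, A)
    return R

def jordan_power_formula(alpha, s, m):
    """J(alpha, s)^m as given by Lemma 0.10."""
    def entry(i, j):
        d = i - j  # d-th subdiagonal; d = 0 is the main diagonal
        return comb(m, d) * alpha ** (m - d) if 0 <= d <= m else 0
    return [[entry(i, j) for j in range(s)] for i in range(s)]

J = jordan_block(2, 3)
direct = mat_pow(J, 5)
by_lemma = jordan_power_formula(2, 3, 5)
print(direct)
print(direct == by_lemma)
```

For α = 2, s = 3, m = 5 the bottom-left entry is C(5, 2) · 2^3 = 80, matching the 3 × 3 display above.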
Lemma 0.11
If $g(x) = (x - \alpha)^s$, then the companion matrix C(g) is similar to the
s × s Jordan block J(α, s).
It follows that Jordan blocks also correspond to polynomials
(just as companion matrices do); in particular, the characteristic
polynomial of J(α, s) is the same as that of $C((x - \alpha)^s)$:
\[ p_{J(\alpha,s)}(x) = (x - \alpha)^s. \]
Theorem 0.12
Let A be an n × n matrix with real entries. If all the eigenvalues of
A are real, then A is similar to a direct sum of Jordan blocks.
A Jordan canonical form is a direct sum of Jordan blocks.
If a matrix A is similar to the Jordan canonical form
\[ J(\alpha_1, s_1) \oplus \cdots \oplus J(\alpha_r, s_r), \]
then we say that A has elementary divisors
\[ (x - \alpha_1)^{s_1}, \dots, (x - \alpha_r)^{s_r}. \]
Theorem 0.12 says that every square matrix A having entries in
a field containing all the eigenvalues of A is similar to a Jordan
canonical form. Can a matrix be similar to several Jordan canonical
forms? The answer is yes, but not really.
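Let $I_r$ be the r × r identity matrix, and let $I_s$ be the s × s identity matrix. Then
interchanging blocks in a direct sum yields a similar matrix: if A is s × s and B is r × r, then
\[
\begin{bmatrix} 0 & I_r \\ I_s & 0 \end{bmatrix}
\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}
\begin{bmatrix} 0 & I_s \\ I_r & 0 \end{bmatrix}
= \begin{bmatrix} B & 0 \\ 0 & A \end{bmatrix},
\]
and the two outer matrices are inverses of one another.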
Since every permutation is a product of transpositions, it follows that permuting
the blocks of a matrix of the form A1 ⊕ A2 ⊕ · · · ⊕ At yields a matrix similar to
the original one.
Theorem 0.13
1. If A and B are n × n matrices over $\mathbb{C}$, then A and B are similar
if and only if they have the same elementary divisors.
2. If a matrix A is similar to two Jordan canonical forms, say, H
and H′, then H and H′ have the same Jordan blocks (i.e., H′
arises from H by permuting its Jordan blocks).
The hypothesis that all the eigenvalues of A and B lie in $\mathbb{C}$ is not
a serious problem. Recall that Corollary 0.3 says that if real matrices A and B
are similar over $\mathbb{C}$, then they are similar over $\mathbb{R}$.
Here are some applications of canonical forms.
Proposition 0.14
If A is an n × n matrix, then A is similar to its transpose $A^{\top}$.
Proof. First, Corollary 0.3 allows us to assume that k contains all
the eigenvalues of A. Now if $B = PAP^{-1}$, then $B^{\top} = (P^{\top})^{-1} A^{\top} P^{\top}$;
that is, if B is similar to A, then $B^{\top}$ is similar to $A^{\top}$. Thus, it suffices
to prove that H is similar to $H^{\top}$ for a Jordan canonical form H; by
Exercise 3, it is enough to show that a Jordan block J = J(α, s) is
similar to $J^{\top}$.
We illustrate the idea for J(α, 3). Let Q be the matrix having
1’s on the “wrong” diagonal and 0’s everywhere else; notice that
$Q = Q^{-1}$:
\[
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \alpha & 0 & 0 \\ 1 & \alpha & 0 \\ 0 & 1 & \alpha \end{bmatrix}
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} \alpha & 1 & 0 \\ 0 & \alpha & 1 \\ 0 & 0 & \alpha \end{bmatrix}.
\]
Let $v_1, \dots, v_s$ be a basis of a vector space W, define Q : W → W by
$Q : v_i \mapsto v_{s-i+1}$, and define J : W → W by $J : v_i \mapsto \alpha v_i + v_{i+1}$ for
i < s and $J : v_s \mapsto \alpha v_s$. The reader can now prove that $Q = Q^{-1}$
and $QJ(\alpha, s)Q^{-1} = J(\alpha, s)^{\top}$.
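The last claim of the proof is easy to check by machine for a particular block size. The sketch below builds the reversal matrix Q (1's on the “wrong” diagonal) and confirms that Q is its own inverse and that conjugating a Jordan block by Q gives its transpose. The function names and the sample values s = 4, α = 3 are assumptions for illustration; this is a check of one instance, not a proof.

```python
def jordan_block(alpha, s):
    """J(alpha, s): alpha on the diagonal, 1's on the subdiagonal."""
    return [[alpha if i == j else (1 if i == j + 1 else 0) for j in range(s)]
            for i in range(s)]

def reversal(s):
    """Q with 1's on the anti-diagonal, so that Q sends v_i to v_{s-i+1}."""
    return [[int(i + j == s - 1) for j in range(s)] for i in range(s)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 4
J = jordan_block(3, s)
Q = reversal(s)
identity = [[int(i == j) for j in range(s)] for i in range(s)]

q_is_own_inverse = mat_mul(Q, Q) == identity
conjugate = mat_mul(mat_mul(Q, J), Q)  # Q J Q^{-1}, using Q^{-1} = Q
print(q_is_own_inverse, conjugate == transpose(J))
```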
Exponentiating a matrix is used to find solutions to systems of
linear differential equations. An n × n complex matrix A consists of
$n^2$ entries, and so A may be regarded as a point in $\mathbb{C}^{n^2}$. This allows
us to define convergence of a sequence of n × n complex matrices:
$A_1, A_2, \dots, A_k, \dots$ converges to a matrix M if, for each i, j, the
sequence of i, j entries converges. As in Calculus, convergence of a
series means convergence of the sequence of its partial sums.
If A is an n × n complex matrix, then
\[ e^A = \sum_{k=0}^{\infty} \frac{1}{k!} A^k = I + A + \tfrac{1}{2}A^2 + \tfrac{1}{6}A^3 + \cdots. \]
This series converges for every matrix A (see Exercise 7), and the
function $A \mapsto e^A$ is continuous; that is, if $\lim_{k \to \infty} A_k = M$, then
\[ \lim_{k \to \infty} e^{A_k} = e^M. \]
We are now going to show that the Jordan canonical form of A
can be used to compute $e^A$.
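Before turning to the Jordan form, the defining series can be summed directly. The sketch below computes a partial sum of the series for a real matrix in floating point; the name `mat_exp` and the cutoff of 30 terms are assumptions chosen for small examples, and no claim is made about numerical robustness for large matrices. Two cases with known answers are used: a diagonal matrix, and the 90° rotation matrix R from earlier, for which R^2 = −I gives e^R = (cos 1)I + (sin 1)R.

```python
from math import cos, sin, e

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Turn A^(k-1)/(k-1)! into A^k/k!.
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total

D = [[1.0, 0.0],
     [0.0, 2.0]]
R = [[0.0, -1.0],
     [1.0, 0.0]]

exp_D = mat_exp(D)  # close to diag(e, e^2)
exp_R = mat_exp(R)  # close to rotation by 1 radian
expected_D = [[e, 0.0], [0.0, e ** 2]]
expected_R = [[cos(1), -sin(1)], [sin(1), cos(1)]]
print(exp_D)
print(exp_R)
```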
Proposition 0.15
Let $A = [a_{ij}]$ be an n × n complex matrix.
1. If P is nonsingular, then $Pe^AP^{-1} = e^{PAP^{-1}}$.
2. If AB = BA, then $e^A e^B = e^{A+B}$.
3. For every matrix A, the matrix $e^A$ is nonsingular; indeed,
$(e^A)^{-1} = e^{-A}$.
4. If L is the n × n matrix having 1’s just below the main diagonal
and 0’s elsewhere, then $e^L$ is a lower triangular matrix with 1’s
on the diagonal.
5. If D is a diagonal matrix, say, $D = \operatorname{diag}(\alpha_1, \alpha_2, \dots, \alpha_n)$, then
$e^D = \operatorname{diag}(e^{\alpha_1}, e^{\alpha_2}, \dots, e^{\alpha_n})$.
6. If $\alpha_1, \dots, \alpha_n$ are the eigenvalues of A (with multiplicities), then
$e^{\alpha_1}, \dots, e^{\alpha_n}$ are the eigenvalues of $e^A$ (with multiplicities).
7. We can compute $e^A$.
8. If tr(A) = 0, then $\det(e^A) = 1$.
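Proof.
1. We use the continuity of matrix exponentiation:
\begin{align*}
Pe^AP^{-1} &= P\Big(\lim_{N \to \infty} \sum_{k=0}^{N} \frac{1}{k!} A^k\Big)P^{-1} \\
&= \lim_{N \to \infty} \sum_{k=0}^{N} \frac{1}{k!} PA^kP^{-1} \\
&= \lim_{N \to \infty} \sum_{k=0}^{N} \frac{1}{k!} (PAP^{-1})^k = e^{PAP^{-1}}.
\end{align*}
2. The kth term of the power series for $e^{A+B}$ is
\[ \frac{1}{k!}(A + B)^k, \]
while the kth term of $e^A e^B$ is
\[ \sum_{i+j=k} \frac{1}{i!}\,\frac{1}{j!} A^i B^j = \sum_{i=0}^{k} \frac{1}{i!(k-i)!} A^i B^{k-i} = \frac{1}{k!} \sum_{i=0}^{k} \binom{k}{i} A^i B^{k-i}. \]
Since A and B commute, the Binomial Theorem shows that
both kth terms are equal. (See Exercise 9 for an example
where this is false if A and B do not commute.)
3. This follows immediately from part 2, for −A and A commute
and $e^0 = I$.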
4. The equation
\[ e^L = I + L + \tfrac{1}{2}L^2 + \cdots + \frac{1}{(n-1)!}L^{n-1} \]
holds because $L^n = 0$, and the result follows from the description of the
powers $L^i$ preceding Lemma 0.10. For example, when n = 5,
\[
e^L = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
\tfrac{1}{2} & 1 & 1 & 0 & 0 \\
\tfrac{1}{6} & \tfrac{1}{2} & 1 & 1 & 0 \\
\tfrac{1}{24} & \tfrac{1}{6} & \tfrac{1}{2} & 1 & 1
\end{bmatrix}.
\]
5. This is clear from the definition:
\[ e^D = I + D + \tfrac{1}{2}D^2 + \tfrac{1}{6}D^3 + \cdots, \]
for $D^k = \operatorname{diag}(\alpha_1^k, \alpha_2^k, \dots, \alpha_n^k)$.
6. Since $\mathbb{C}$ is algebraically closed, A is similar to its Jordan canonical form J: there is a nonsingular matrix P with $PAP^{-1} = J$.
Now A and J have the same characteristic polynomial and,
hence, the same eigenvalues with multiplicities. But J is a lower
triangular matrix with the eigenvalues $\alpha_1, \dots, \alpha_n$ of A on the
diagonal, and so the definition of matrix exponentiation gives
$e^J$ lower triangular with $e^{\alpha_1}, \dots, e^{\alpha_n}$ on the diagonal. Since
$e^A = e^{P^{-1}JP} = P^{-1}e^JP$, it follows that the eigenvalues of $e^A$
are as claimed.
7. By the Jordan Decomposition (Exercise 2), there is a nonsingular matrix P with $PAP^{-1} = \Delta + L$, where ∆ is a diagonal
matrix, $L^n = 0$, and ∆L = L∆. Hence,
\[ Pe^AP^{-1} = e^{PAP^{-1}} = e^{\Delta + L} = e^{\Delta}e^{L}. \]
But $e^{\Delta}$ is computed in part 5, and $e^L$ is a finite sum because
$L^n = 0$, as in part 4. Hence, $e^A = P^{-1}e^{\Delta}e^{L}P$ is computable.
8. By Proposition 0.4, tr(A) is the sum of the eigenvalues of A, while
det(A) is the product of the eigenvalues. Since the eigenvalues
of $e^A$ are $e^{\alpha_1}, \dots, e^{\alpha_n}$, we have
\[ \det(e^A) = \prod_i e^{\alpha_i} = e^{\sum_i \alpha_i} = e^{\operatorname{tr}(A)}. \]
Hence, tr(A) = 0 implies $\det(e^A) = 1$.
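Parts 4, 7, and 8 can be illustrated together. The sketch below computes $e^L$ exactly for the subdiagonal matrix L, uses $e^{\alpha I + L} = e^{\alpha}e^{L}$ to exponentiate a single Jordan block, compares the result with a partial sum of the defining series, and checks that a trace-zero matrix has an exponential of determinant 1. The function names, the 40-term cutoff, and the sample matrices are assumptions for illustration; a general matrix would also need the change of basis P, which is not computed here.

```python
from fractions import Fraction
from math import exp, factorial

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=40):
    """Floating-point partial sum of I + A + A^2/2! + ... ."""
    n = len(A)
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total

def exp_shift(n):
    """Exact e^L for the n x n subdiagonal matrix L (part 4).

    L^i has 1's on the i-th subdiagonal, so entry (r, c) is 1/(r - c)!.
    """
    return [[Fraction(1, factorial(r - c)) if r >= c else Fraction(0)
             for c in range(n)] for r in range(n)]

def exp_jordan_block(alpha, s):
    """e^(alpha I + L) = e^alpha * e^L, since alpha I and L commute."""
    return [[exp(alpha) * float(x) for x in row] for row in exp_shift(s)]

# Part 7 for a single Jordan block J(0.5, 3).
J = [[0.5, 0.0, 0.0],
     [1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5]]
closed_form = exp_jordan_block(0.5, 3)
series = exp_series(J)
max_gap = max(abs(closed_form[i][j] - series[i][j])
              for i in range(3) for j in range(3))

# Part 8: tr(A) = 0, so det(e^A) should be 1 up to rounding.
A = [[1.0, 2.0],
     [3.0, -1.0]]
E = exp_series(A)
det_E = E[0][0] * E[1][1] - E[0][1] * E[1][0]
print(exp_shift(5)[4], max_gap, det_E)
```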
Exercises
1. Find all n × n matrices A over a field k for which A and $A^2$
are similar.
2. (Jordan Decomposition) Prove that every n × n complex
matrix A can be written as
\[ A = D + N, \]
where D is diagonalizable (i.e., D is similar to a diagonal
matrix), N is nilpotent (i.e., $N^m = 0$ for some m ≥ 1), and
DN = ND.
3. Give an example of an n × n complex matrix that is not
diagonalizable. [It is known that every hermitian matrix A
is diagonalizable (A is hermitian if $A = A^*$, where the i, j
entry of $A^*$ is $\overline{a_{ji}}$).] Hint. A rotation (not the identity) about
the origin on $\mathbb{R}^2$ sends no line through the origin into itself.
4. a. Prove that all the eigenvalues of a nilpotent matrix are 0.
b. Use the Jordan form to prove the converse: if all the
eigenvalues of a matrix A are 0, then A is nilpotent. (This
result also follows from the Cayley–Hamilton Theorem.)
5. How many similarity classes of 6 × 6 nilpotent matrices are
there over a field k?
6. If A and B are similar and A is nonsingular, prove that B is
nonsingular and that $A^{-1}$ is similar to $B^{-1}$.
7. Let $A = [a_{ij}]$ be an n × n complex matrix.
a. If $M = \max_{ij} |a_{ij}|$, prove that no entry of $A^s$ has absolute
value greater than $(nM)^s$.
b. Prove that the series defining $e^A$ converges.
c. Prove that $A \mapsto e^A$ is a continuous function.
8. a. Prove that every nilpotent matrix N is similar to a
strictly lower triangular matrix (i.e., all entries on and
above the diagonal are 0).
b. If N is a nilpotent matrix, prove that I + N is nonsingular.
9. Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Prove that $e^A e^B \neq e^B e^A$ and
$e^A e^B \neq e^{A+B}$.