Some Algebraic Structures
Notes about groups, rings, and modules
Cambridge Notes
Contents

Part I: Groups I
1 Introduction
2 Groups and homomorphisms
  2.1 Groups
  2.2 Homomorphisms
  2.3 Cyclic groups
  2.4 Dihedral groups
  2.5 Direct products of groups
3 Symmetric group I
  3.1 Symmetric groups
  3.2 Sign of permutations
4 Lagrange’s Theorem
  4.1 Small groups
  4.2 Left and right cosets
5 Quotient groups
  5.1 Normal subgroups
  5.2 Quotient groups
  5.3 The Isomorphism Theorem
6 Group actions
  6.1 Group acting on sets
  6.2 Orbits and Stabilizers
  6.3 Important actions
  6.4 Applications
7 Examples of Groups
  7.1 Conjugacy classes in Sn
  7.2 Conjugacy classes in An
  7.3 Quaternions
  7.4 Matrix groups
    7.4.1 General and special linear groups
    7.4.2 Actions of GLn(C)
    7.4.3 Orthogonal groups
    7.4.4 Rotations and reflections in R2 and R3
    7.4.5 Unitary groups
  7.5 More on regular polyhedra
    7.5.1 Symmetries of the cube
    7.5.2 Symmetries of the tetrahedron
  7.6 Möbius group
    7.6.1 Möbius maps
    7.6.2 Fixed points of Möbius maps
    7.6.3 Permutation properties of Möbius maps
    7.6.4 Cross-ratios
  7.7 Projective line (non-examinable)

Part II: Groups II, Rings and Modules
8 Introduction
9 Groups
  9.1 Basic concepts
  9.2 Normal subgroups, quotients, homomorphisms, isomorphisms
  9.3 Actions of permutations
  9.4 Conjugacy, centralizers and normalizers
  9.5 Finite p-groups
  9.6 Finite abelian groups
  9.7 Sylow theorems
10 Rings
  10.1 Definitions and examples
  10.2 Homomorphisms, ideals, quotients and isomorphisms
  10.3 Integral domains, field of fractions, maximal and prime ideals
  10.4 Factorization in integral domains
  10.5 Factorization in polynomial rings
  10.6 Gaussian integers
  10.7 Algebraic integers
  10.8 Noetherian rings
11 Modules
  11.1 Definitions and examples
  11.2 Direct sums and free modules
  11.3 Matrices over Euclidean domains
  11.4 Modules over F[X] and normal forms for matrices
  11.5 Conjugacy of matrices*
Part I: Groups I
Examples of groups
Axioms for groups. Examples from geometry: symmetry groups of regular polygons, cube, tetrahedron. Permutations on a set; the symmetric group. Subgroups and homomorphisms. Symmetry
groups as subgroups of general permutation groups. The Möbius group; cross-ratios, preservation
of circles, the point at infinity. Conjugation. Fixed points of Möbius maps and iteration.
[4]
Lagrange’s theorem
Cosets. Lagrange’s theorem. Groups of small order (up to order 8). Quaternions. Fermat-Euler
theorem from the group-theoretic point of view.
[5]
Group actions
Group actions; orbits and stabilizers. Orbit-stabilizer theorem. Cayley’s theorem (every group is
isomorphic to a subgroup of a permutation group). Conjugacy classes. Cauchy’s theorem.
[4]
Quotient groups
Normal subgroups, quotient groups and the isomorphism theorem.
[4]
Matrix groups
The general and special linear groups; relation with the Möbius group. The orthogonal and special
orthogonal groups. Proof (in R3 ) that every element of the orthogonal group is the product of
reflections and every rotation in R3 has an axis. Basis change as an example of conjugation. [3]
Permutations
Permutations, cycles and transpositions. The sign of a permutation. Conjugacy in Sn and in An .
Simple groups; simplicity of A5 .
[4]
1. Introduction
Group theory is an example of algebra. In pure mathematics, algebra (usually) does not
refer to the boring mindless manipulation of symbols. Instead, in algebra, we have some
set of objects with some operations on them. For example, we can take the integers with
addition as the operation. However, in algebra, we allow any set and any operations, not
just numbers.
Of course, such a definition is too broad to be helpful. We categorize algebraic structures into different types. In this course, we will study a particular kind of structure: groups. In the IB Groups, Rings and Modules course, we will study rings and modules as well.
These different kinds of structures are defined by certain axioms. The group axioms
will say that the operation must follow certain rules, and any set and operation that satisfies
these rules will be considered to form a group. We will then have a different set of axioms
for rings, modules etc.
As mentioned above, the most familiar kinds of algebraic structures are number systems such as integers and rational numbers. The focus of group theory, however, is not on
things that resemble “numbers”. Instead, it is the study of symmetries.
First of all, what is a symmetry? We are all familiar with, say, the symmetries of
an (equilateral) triangle (we will always assume the triangle is equilateral). We rotate
a triangle by 120◦ , and we get the original triangle. We say that rotating by 120◦ is a
symmetry of a triangle. In general, a symmetry is something we do to an object that
leaves the object intact.
Of course, we don’t require that the symmetry leaves everything intact. Otherwise, we
would only be allowed to do nothing. Instead, we require certain important things to be
intact. For example, when considering the symmetries of a triangle, we only care about
how the resultant object looks, but don’t care about where the individual vertices went.
In the case of the triangle, we have six symmetries: three rotations (rotation by 0°, 120° and 240°), and three reflections, one in each of the three axes of symmetry.
These six together form the underlying set of the group of symmetries. A more sophisticated example is the symmetries of R3 . We define these as operations on R3 that leave
distances between points unchanged. These include translations, rotations, reflections,
and combinations of these.
So what is the operation? This operation combines two symmetries to give a new
symmetry. The natural thing to do is to do the symmetry one after another. For example,
if we combine the two 120◦ rotations, we get a 240◦ rotation.
Now we are studying algebra, not geometry. So to define the group, we abstract away
the triangle. Instead, we define the group to be six objects, say {e, r, r2 , s, rs, r2 s}, with
rules defining how we combine two elements to get a third. Officially, we do not mention
the triangle at all when defining the group.
We can now come up with the group axioms. What rules should the set of symmetries
obey? First of all, we must have a “do nothing” symmetry. We call this the identity
element. When we compose the identity with another symmetry, the other symmetry is
unchanged.
Secondly, given a symmetry, we can do the reverse symmetry. So for any element,
there is an inverse element that, when combined with the original, gives the identity.
Finally, given three symmetries, we can combine them, one after another. If we denote
the operation of the group as ∗, then if we have three symmetries, x, y, z, we should be able
to form x ∗ y ∗ z. If we want to define it in terms of the binary operation ∗, we can define
it as (x ∗ y) ∗ z, where we first combine the first two symmetries, then combine the result
with the third. Alternatively, we can also define it as x ∗ (y ∗ z). Intuitively, these two
should give the same result, since both are applying x after y after z. Hence we have the
third rule x ∗ (y ∗ z) = (x ∗ y) ∗ z.
Now a group is any set with an operation that satisfies the three rules above. In group
theory, the objective is to study the properties of groups just assuming these three axioms.
It turns out that there is a lot we can talk about.
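The three axioms above can be checked concretely on the triangle example by modelling each symmetry as a permutation of the three vertices. The sketch below is illustrative only; the encoding of the symmetries and the name `compose` are my own, not part of the notes:

```python
from itertools import product

# A symmetry is a tuple p with p[i] = image of vertex i.
e = (0, 1, 2)   # the "do nothing" symmetry
r = (1, 2, 0)   # rotation by 120 degrees

def compose(p, q):
    """p * q: apply q first, then p (composition of functions)."""
    return tuple(p[q[i]] for i in range(3))

r2 = compose(r, r)   # rotation by 240 degrees
s = (0, 2, 1)        # a reflection, fixing vertex 0
G = [e, r, r2, s, compose(r, s), compose(r2, s)]

# Closure: combining two symmetries gives a symmetry in G.
assert all(compose(a, b) in G for a, b in product(G, G))
# Identity axiom: composing with e changes nothing.
assert all(compose(e, a) == a == compose(a, e) for a in G)
# Inverse axiom: every symmetry has a reverse in G.
assert all(any(compose(a, b) == e for b in G) for a in G)
# Associativity: inherited from composition of functions.
assert all(compose(a, compose(b, c)) == compose(compose(a, b), c)
           for a, b, c in product(G, G, G))
```

Combining the two nontrivial rotations indeed gives the identity, matching the 120° + 240° example above.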
2. Groups and homomorphisms
2.1 Groups
Definition 2.1.1 (Binary operation). A (binary) operation is a way of combining two
elements to get a new element. Formally, it is a map ∗ : A × A → A.
Definition 2.1.2 (Group). A group is a set G with a binary operation ∗ satisfying the
following axioms:
1. There is some e ∈ G such that for all a, we have
a ∗ e = e ∗ a = a.
(identity)
2. For all a ∈ G, there is some a−1 ∈ G such that
a ∗ a−1 = a−1 ∗ a = e.
(inverse)
3. For all a, b, c ∈ G, we have
(a ∗ b) ∗ c = a ∗ (b ∗ c).
(associativity)
Definition 2.1.3 (Order of group). The order of the group, denoted by |G|, is the number
of elements in G. A group is a finite group if the order is finite.
Note that technically, the inverse axiom makes no sense, since we have not specified
what e is. Even if we take it to be the e given by the identity axiom, the identity axiom
only states there is some e that satisfies that property, but there could be many! We don’t
know which one a ∗ a−1 is supposed to be equal to! So we should technically take that to
mean there is some a−1 such that a ∗ a−1 and a−1 ∗ a satisfy the identity axiom. Of course,
we will soon show that identities are indeed unique, and we will happily talk about “the”
identity.
Some people put a zeroth axiom called “closure”:
1. For all a, b ∈ G, we have a ∗ b ∈ G.
(closure)
Technically speaking, this axiom also makes no sense — when we say ∗ is a binary operation, by definition, a ∗ b must be a member of G. However, in practice, we often have to
check that this axiom actually holds. For example, if we let G be the set of all matrices of the form
$$\begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}$$
under matrix multiplication, we will have to check that the product of two such matrices is indeed a matrix of this form. Officially, we are checking that the binary operation is a well-defined operation on G.
It is important to know that it is generally not true that a ∗ b = b ∗ a. There is no a
priori reason why this should be true. For example, if we are considering the symmetries
of a triangle, rotating and then reflecting is different from reflecting and then rotating.
However, for some groups, this happens to be true. We call such groups abelian
groups.
Definition 2.1.4 (Abelian group). A group is abelian if it satisfies
4. (∀a, b ∈ G) a ∗ b = b ∗ a.
(commutativity)
If it is clear from context, we are lazy and leave out the operation ∗, and write a ∗ b as ab. We also write a^2 = aa, a^n = aa · · · a (n copies), a^0 = e, a^{−n} = (a^{−1})^n, etc.
Example 2.1.1. The following are abelian groups:
(i) Z with +
(ii) Q with +
(iii) Zn (integers mod n) with +n
(iv) Q∗ with ×
(v) {−1, 1} with ×
The following are non-abelian groups:
(vi) Symmetries of an equilateral triangle (or any n-gon) with composition. (D2n )
(vii) 2 × 2 invertible matrices with matrix multiplication (GL2 (R))
(viii) Symmetry groups of 3D objects
Recall that the first group axiom requires that there exists an identity element, which
we shall call e. Then the second requires that for each a, there is an inverse a−1 such that
a−1 a = e. This only makes sense if there is only one identity e, or else which identity
should a−1 a be equal to?
We shall now show that there can only be one identity. It turns out that the inverses
are also unique. So we will talk about the identity and the inverse.
Proposition 2.1.1. Let (G, ∗) be a group. Then
(i) The identity is unique.
(ii) Inverses are unique.
Proof.
(i) Suppose e and e′ are both identities. Then ee′ = e′, treating e as an identity, and ee′ = e, treating e′ as an identity. Thus e = e′.
(ii) Suppose a−1 and b both satisfy the inverse axiom for some a ∈ G. Then b = be =
b(aa−1 ) = (ba)a−1 = ea−1 = a−1 . Thus b = a−1 .
■
Proposition 2.1.2. Let (G, ∗) be a group and a, b ∈ G. Then
(i) (a−1 )−1 = a
(ii) (ab)−1 = b−1 a−1
Proof.
(i) Given a−1 , both a and (a−1 )−1 satisfy
xa−1 = a−1 x = e.
By uniqueness of inverses, (a−1 )−1 = a.
(ii) We have
(ab)(b−1 a−1 ) = a(bb−1 )a−1
= aea−1
= aa−1
=e
Similarly, (b−1 a−1 )ab = e. So b−1 a−1 is an inverse of ab. By the uniqueness of
inverses, (ab)−1 = b−1 a−1 .
■
Sometimes if we have a group G, we might want to discard some of the elements. For
example if G is the group of all symmetries of a triangle, we might one day decide that
we hate reflections because they reverse orientation. So we only pick the rotations in G
and form a new, smaller group. We call this a subgroup of G.
Definition 2.1.5 (Subgroup). H is a subgroup of G, written H ≤ G, if H ⊆ G and H with the restricted operation ∗ from G is also a group.
Example 2.1.2.
– (Z, +) ≤ (Q, +) ≤ (R, +) ≤ (C, +)
– ({e}, ∗) ≤ (G, ∗) (trivial subgroup)
– G≤G
– ({±1}, ×) ≤ (Q∗ , ×)
According to the definition, to prove that H is a subgroup of G, we need to make
sure H satisfies all group axioms. However, this is often tedious. Instead, there are some
simplified criteria to decide whether H is a subgroup.
Lemma 2.1.1 (Subgroup criteria I). Let (G, ∗) be a group and H ⊆ G. H ≤ G iff
(i) e ∈ H
(ii) (∀a, b ∈ H) ab ∈ H
(iii) (∀a ∈ H) a−1 ∈ H
Proof. The group axioms are satisfied as follows:
1. Closure: (ii)
2. Identity: (i). Note that H and G must have the same identity. Suppose e_H and e_G are the identities of H and G respectively. Then e_H e_H = e_H. Now e_H has an inverse in G, so e_H e_H e_H^{−1} = e_H e_H^{−1}, i.e. e_H e_G = e_G. Thus e_H = e_G.
3. Inverse: (iii)
4. Associativity: inherited from G.
■
Humans are lazy, and the test above is still too complicated. We thus come up with an
even simpler test:
Lemma 2.1.2 (Subgroup criteria II). A subset H ⊆ G is a subgroup of G iff:
(I) H is non-empty
(II) (∀a, b ∈ H) ab−1 ∈ H
Proof. (I) and (II) follow trivially from (i), (ii) and (iii).
To prove that (I) and (II) imply (i), (ii) and (iii):
(i) H must contain at least one element a. Then aa^{−1} = e ∈ H.
(iii) ea^{−1} = a^{−1} ∈ H.
(ii) a(b^{−1})^{−1} = ab ∈ H.
■
Proposition 2.1.3. The subgroups of (Z, +) are exactly nZ, for n ∈ N (nZ is the integer
multiples of n).
Proof. Firstly, it is trivial to show that for any n ∈ N, nZ is a subgroup. We now show that any subgroup must be of the form nZ.
Let H ≤ Z. We know 0 ∈ H. If there are no other elements in H, then H = 0Z. Otherwise, pick the smallest positive integer n in H. We claim that H = nZ.
Suppose for contradiction that there is some a ∈ H with n ∤ a. Write a = pn + q, where 0 < q < n. Since a − pn ∈ H, we have q ∈ H. Yet q < n and n is the smallest positive member of H. Contradiction. So every a ∈ H is divisible by n. Also, by closure, all multiples of n must be in H. So H = nZ.
■
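The proposition has a finite analogue that is easy to check by brute force: in (Z12, +12), the subgroup generated by any element is exactly the set of multiples of a divisor of 12. The sketch below, with helper names of my own choosing, uses subgroup criteria II as the test:

```python
from math import gcd

n = 12  # work in (Z_12, +_12)

def is_subgroup(H):
    """Subgroup criteria II: non-empty and closed under a * b^{-1},
    which in additive notation is (a - b) mod n."""
    return len(H) > 0 and all((a - b) % n in H for a in H for b in H)

def generated(a):
    """The cyclic subgroup <a> of Z_n: all multiples of a mod n."""
    return {(k * a) % n for k in range(n)}

# Every set of multiples of a divisor of 12 passes the subgroup test...
divisors = [d for d in range(1, n + 1) if n % d == 0]
assert all(is_subgroup(generated(d)) for d in divisors)

# ...and an arbitrary element generates the same subgroup as gcd(a, n),
# mirroring the proof: the smallest positive member determines the subgroup.
assert all(generated(a) == generated(gcd(a, n)) for a in range(1, n))
```

For instance 8 generates {0, 8, 4}, the same subgroup as its gcd with 12, namely 4.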
2.2 Homomorphisms
It is often helpful to study functions between different groups. First, we need to define
what a function is. These definitions should be familiar from IA Numbers and Sets.
Definition 2.2.1 (Function). Given two sets X, Y , a function f : X → Y sends each x ∈ X
to a particular f (x) ∈ Y . X is called the domain and Y is the co-domain.
Example 2.2.1.
– Identity function: for any set X, 1X : X → X with 1X (x) = x is a function. This is
also written as idX .
– Inclusion map: ι : Z → Q: ι (n) = n. Note that this differs from the identity function
as the domain and codomain are different in the inclusion map.
– f1 : Z → Z: f1 (x) = x + 1.
– f2 : Z → Z: f2 (x) = 2x.
– f3 : Z → Z: f3 (x) = x2 .
– For g : {0, 1, 2, 3, 4} → {0, 1, 2, 3, 4}, we have:
◦ g1 (x) = x + 1 if x < 4; g1 (4) = 4.
◦ g2 (x) = x + 1 if x < 4; g2 (4) = 0.
Definition 2.2.2 (Composition of functions). The composition of two functions is a function you get by applying one after another. In particular, if f : X → Y and g : Y → Z, then g ◦ f : X → Z with g ◦ f (x) = g( f (x)).
Example 2.2.2. f2 ◦ f1 (x) = 2x + 2. f1 ◦ f2 (x) = 2x + 1. Note that function composition
is not commutative.
Definition 2.2.3 (Injective functions). A function f is injective if it hits everything at most
once, i.e.
(∀x, y ∈ X) f (x) = f (y) ⇒ x = y.
Definition 2.2.4 (Surjective functions). A function is surjective if it hits everything at
least once, i.e.
(∀y ∈ Y )(∃x ∈ X) f (x) = y.
Definition 2.2.5 (Bijective functions). A function is bijective if it is both injective and
surjective. i.e. it hits everything exactly once. Note that a function has an inverse iff it is
bijective.
Example 2.2.3. ι and f2 are injective but not surjective. f3 and g1 are neither. 1X , f1 and
g2 are bijective.
Lemma 2.2.1. The composition of two bijective functions is bijective.
When considering sets, functions are allowed to do all sorts of crazy things, and can
send any element to any element without any restrictions. However, we are currently
studying groups, and groups have additional structure on top of the set of elements. Hence
we are not interested in arbitrary functions. Instead, we are interested in functions that
“respect” the group structure. We call these homomorphisms.
Definition 2.2.6 (Group homomorphism). Let (G, ∗) and (H, ×) be groups. A function
f : G → H is a group homomorphism iff
(∀g1 , g2 ∈ G) f (g1 ) × f (g2 ) = f (g1 ∗ g2 ).
Definition 2.2.7 (Group isomorphism). Isomorphisms are bijective homomorphisms. Two groups are isomorphic if there exists an isomorphism between them. We write G ≅ H.
We will consider two isomorphic groups to be “the same”. For example, when we say
that there is only one group of order 2, it means that any two groups of order 2 must be
isomorphic.
Example 2.2.4.
– f : G → H defined by f (g) = e, where e is the identity of H, is a homomorphism.
– 1G : G → G and f2 : Z → 2Z are isomorphisms. ι : Z → Q and f2 : Z → Z are
homomorphisms.
– exp : (R, +) → (R+ , ×) with exp(x) = ex is an isomorphism.
– Take (Z4 , +) and H = ({e^{ikπ/2} : k = 0, 1, 2, 3}, ×). Then f : Z4 → H defined by f (a) = e^{iπa/2} is an isomorphism.
– f : GL2 (R) → R∗ with f (A) = det(A) is a homomorphism, where GL2 (R) is the
set of 2 × 2 invertible matrices.
Proposition 2.2.1. Suppose that f : G → H is a homomorphism. Then
(i) Homomorphisms send the identity to the identity, i.e.
f (eG ) = eH
(ii) Homomorphisms send inverses to inverses, i.e.
f (a−1 ) = f (a)−1
(iii) The composite of 2 group homomorphisms is a group homomorphism.
(iv) The inverse of an isomorphism is an isomorphism.
Proof.
(i) We have f (e_G) = f (e_G e_G) = f (e_G) f (e_G). Multiplying both sides on the left by f (e_G)^{−1} gives e_H = f (e_G).
(ii)
eH = f (eG )
= f (aa−1 )
= f (a) f (a−1 )
Since inverses are unique, f (a−1 ) = f (a)−1 .
(iii) Let f : G1 → G2 and g : G2 → G3 . Then g( f (ab)) = g( f (a) f (b)) = g( f (a))g( f (b)).
(iv) Let f : G → H be an isomorphism. Then
f^{−1}(ab) = f^{−1}( f ( f^{−1}(a)) f ( f^{−1}(b)) )
= f^{−1}( f ( f^{−1}(a) f^{−1}(b) ) )
= f^{−1}(a) f^{−1}(b).
So f^{−1} is a homomorphism. Since it is bijective, f^{−1} is an isomorphism.
■
Definition 2.2.8 (Image of homomorphism). If f : G → H is a homomorphism, then the
image of f is
im f = f (G) = { f (g) : g ∈ G}.
Definition 2.2.9 (Kernel of homomorphism). The kernel of f , written ker f , is
ker f = f^{−1}({e_H}) = {g ∈ G : f (g) = e_H}.
Proposition 2.2.2. Both the image and the kernel are subgroups of the respective groups,
i.e. im f ≤ H and ker f ≤ G.
Proof. Since e_H ∈ im f and e_G ∈ ker f , both im f and ker f are non-empty. Moreover, suppose b_1, b_2 ∈ im f . Then there exist a_1, a_2 ∈ G such that f (a_i) = b_i. Then b_1 b_2^{−1} = f (a_1) f (a_2)^{−1} = f (a_1 a_2^{−1}) ∈ im f .
Now consider b_1, b_2 ∈ ker f . We have f (b_1 b_2^{−1}) = f (b_1) f (b_2)^{−1} = e e^{−1} = e. So b_1 b_2^{−1} ∈ ker f .
■
Proposition 2.2.3. Given any homomorphism f : G → H and any a ∈ G, for all k ∈ ker f ,
aka−1 ∈ ker f .
This proposition seems rather pointless. However, it is not. All subgroups that satisfy
this property are known as normal subgroups, and normal subgroups have very important
properties. We will postpone the discussion of normal subgroups to later lectures.
Proof. f (aka−1 ) = f (a) f (k) f (a)−1 = f (a)e f (a)−1 = e. So aka−1 ∈ ker f .
■
Example 2.2.5. Images and kernels for previously defined functions:
(i) For the function that sends everything to e, im f = {e} and ker f = G.
(ii) For the identity function, im 1G = G and ker 1G = {e}.
(iii) For the inclusion map ι : Z → Q, we have im ι = Z and ker ι = {0}.
(iv) For f2 : Z → Z and f2 (x) = 2x, we have im f2 = 2Z and ker f2 = {0}.
(v) For det : GL2 (R) → R∗ , we have im det = R∗ and ker det = {A : det A = 1} = SL2 (R).
Proposition 2.2.4. For all homomorphisms f : G → H, f is
(i) surjective iff im f = H
(ii) injective iff ker f = {e}
Proof.
(i) By definition.
(ii) We know that f (e) = e. So if f is injective, then by definition ker f = {e}. If
ker f = {e}, then given a, b such that f (a) = f (b), f (ab−1 ) = f (a) f (b)−1 = e.
Thus ab−1 ∈ ker f = {e}. Then ab−1 = e and a = b.
■
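These definitions are easy to probe with a small concrete homomorphism. The map f below is a toy example of my own (multiplication by 4 on Z12), not one from the notes; it has a non-trivial kernel and so, by the proposition above, fails to be injective:

```python
n = 12

def f(x):
    # a homomorphism Z_12 -> Z_12: f(x) = 4x mod 12
    # (it is a homomorphism since 4(x + y) = 4x + 4y mod 12)
    return (4 * x) % n

image = {f(x) for x in range(n)}
kernel = {x for x in range(n) if f(x) == 0}

assert image == {0, 4, 8}
assert kernel == {0, 3, 6, 9}

# f is injective iff ker f = {0}; here the kernel is non-trivial,
# and indeed f identifies elements differing by a kernel element:
assert f(1) == f(4)
```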
So far, the definitions of images and kernels seem to be just convenient terminology to
refer to things. However, we will later prove an important theorem, the first isomorphism
theorem, that relates these two objects and provides deep insights (hopefully).
Before we get to that, we will first study some interesting classes of groups and develop some necessary theory.
2.3 Cyclic groups
The simplest class of groups is the cyclic groups. A cyclic group is a group of the form {e, a, a^2, a^3, · · · , a^{n−1}}, where a^n = e. For example, if we consider the group of all rotations of a triangle, and write r = rotation by 120°, the elements will be {e, r, r^2} with r^3 = e.
Officially, we define a cyclic group as follows:
Definition 2.3.1 (Cyclic group Cn ). A group G is cyclic if
(∃a)(∀b)(∃n ∈ Z) b = an ,
i.e. every element is some power of a. Such an a is called a generator of G.
We write Cn for the cyclic group of order n.
Example 2.3.1.
(i) Z is cyclic with generator 1 or −1. It is the infinite cyclic group.
(ii) ({+1, −1}, ×) is cyclic with generator −1.
(iii) (Zn , +) is cyclic with all numbers coprime with n as generators.
Notation 2.1. Given a group G and a ∈ G, we write ⟨a⟩ for the cyclic group generated by a, i.e. the subgroup of all powers of a. It is the smallest subgroup containing a.
Definition 2.3.2 (Order of element). The order of an element a is the smallest positive integer n such that a^n = e. If no such n exists, a has infinite order. Write ord(a) for the order of a.
We have given two different meanings to the word “order”. One is the order of a
group and the other is the order of an element. Since mathematicians are usually (but not
always) sensible, the name wouldn’t be used twice if they weren’t related. In fact, we
have
Lemma 2.3.1. For a ∈ G, ord(a) = |⟨a⟩|.
Proof. If ord(a) = ∞, then a^n ≠ a^m for all n ≠ m (otherwise a^{m−n} = e). Thus |⟨a⟩| = ∞ = ord(a).
Otherwise, suppose ord(a) = k, so a^k = e.
We now claim that ⟨a⟩ = {e, a, a^2, · · · , a^{k−1}}. Note that ⟨a⟩ contains no higher powers of a, since a^k = e and higher powers loop back to existing elements. There are also no repeated elements in the list, since a^m = a^n ⇒ a^{m−n} = e. So done. ■
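The lemma is easy to verify computationally in a finite example. The sketch below works in (Z12, +12), with helper names of my own:

```python
n = 12

def ord_additive(a):
    """Order of a in (Z_n, +): smallest k >= 1 with k*a = 0 mod n."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k

def generated(a):
    """The cyclic subgroup <a> of Z_n."""
    return {(k * a) % n for k in range(n)}

# Lemma 2.3.1: ord(a) = |<a>| for every element.
assert all(ord_additive(a) == len(generated(a)) for a in range(1, n))
# e.g. 8 has order 3 in Z_12, and <8> = {0, 8, 4}.
assert ord_additive(8) == 3 and generated(8) == {0, 4, 8}
```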
It is trivial to show that
Proposition 2.3.1. Cyclic groups are abelian.
Definition 2.3.3 (Exponent of group). The exponent of a group G is the smallest positive integer n such that a^n = e for all a ∈ G.
2.4 Dihedral groups
Definition 2.4.1 (Dihedral groups D2n ). Dihedral groups are the symmetries of a regular
n-gon. It contains n rotations (including the identity symmetry, i.e. rotation by 0◦ ) and n
reflections.
We write the group as D2n . Note that the subscript refers to the order of the group, not
the number of sides of the polygon.
The dihedral group is not hard to define. However, we need to come up with a presentation of D2n that is easy to work with.
We first look at the rotations. The set of all rotations is generated by r = rotation by 360°/n. This r has order n.
How about the reflections? We know that each reflection has order 2. Let s be our
favorite reflection. Then using some geometric arguments, we can show that any reflection
can be written as a product of rm and s for some m. We also have srs = r−1 .
Hence we can define D2n as follows: D2n is a group generated by r and s, and every
element can be written as a product of r’s and s’s. Whenever we see rn and s2 , we replace
it by e. When we see srs, we replace it by r−1 .
It then follows that every element can be written in the form r^m s^k, with 0 ≤ m < n and k ∈ {0, 1}.
Formally, we can write D2n as follows:
D2n = ⟨r, s | r^n = s^2 = e, srs^{−1} = r^{−1}⟩
= {e, r, r^2, · · · , r^{n−1}, s, rs, r^2 s, · · · , r^{n−1} s}
This is a notation we will commonly use to represent groups. For example, a cyclic group
of order n can be written as
Cn = ⟨a | a^n = e⟩.
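The presentation can be made concrete by realizing each symmetry of the n-gon as a permutation of its vertices 0, . . . , n−1. This is only a sketch; the particular encodings of r and s, and the helper names, are my own choices:

```python
def compose(p, q):
    """p followed-after q on {0, ..., n-1}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

n = 5  # symmetries of a regular pentagon, i.e. D_10
e = tuple(range(n))
r = tuple((i + 1) % n for i in range(n))   # rotation by 360/n degrees
s = tuple((-i) % n for i in range(n))      # a reflection

def power(p, k):
    out = e
    for _ in range(k):
        out = compose(out, p)
    return out

# The defining relations: r^n = s^2 = e and s r s^{-1} = r^{-1}.
assert power(r, n) == e
assert compose(s, s) == e
assert compose(compose(s, r), s) == power(r, n - 1)  # srs = r^{-1} since s = s^{-1}

# Every element is r^m or r^m s: these 2n products are pairwise distinct.
elements = {power(r, m) for m in range(n)} | {compose(power(r, m), s) for m in range(n)}
assert len(elements) == 2 * n
```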
2.5 Direct products of groups
Recall that if we have two sets X, Y , then we can obtain the product X × Y = {(x, y) : x ∈ X, y ∈ Y }. We can do the same if X and Y are groups.
Definition 2.5.1 (Direct product of groups). Given two groups (G, ◦) and (H, •), we can
define a set G × H = {(g, h) : g ∈ G, h ∈ H} and an operation (a1 , a2 ) ∗ (b1 , b2 ) = (a1 ◦
b1 , a2 • b2 ). This forms a group.
Why would we want to take the product of two groups? Suppose we have two independent triangles. Then the symmetries of this system include, say rotating the first triangle,
rotating the second, or rotating both. The symmetry group of this combined system would
then be D6 × D6 .
Example 2.5.1.
C2 ×C2 = {(0, 0), (0, 1), (1, 0), (1, 1)}
= {e, x, y, xy} with everything order 2
= ⟨x, y | x^2 = y^2 = e, xy = yx⟩
Proposition 2.5.1. Cn × Cm ≅ Cnm iff hcf(m, n) = 1.
Proof. Suppose that hcf(m, n) = 1. Let Cn = ⟨a⟩ and Cm = ⟨b⟩. Let k be the order of (a, b). Then (a, b)^k = (a^k, b^k) = e. This is possible only if n | k and m | k, i.e. k is a common multiple of n and m. Since the order is the least such k, we get k = lcm(n, m) = nm/hcf(n, m) = nm.
Now consider ⟨(a, b)⟩ ≤ Cn × Cm. Since (a, b) has order nm, ⟨(a, b)⟩ has nm elements. Since Cn × Cm also has nm elements, ⟨(a, b)⟩ must be the whole of Cn × Cm. And we know that ⟨(a, b)⟩ ≅ Cnm. So Cn × Cm ≅ Cnm.
On the other hand, suppose hcf(m, n) ≠ 1. Then k = lcm(m, n) < mn. Then for any (a, b) ∈ Cn × Cm, we have (a, b)^k = (a^k, b^k) = e. So the order of any (a, b) is at most k < mn. So there is no element of order nm, and Cn × Cm is not a cyclic group of order nm.
■
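The criterion can be confirmed by brute force for small n and m: compute the largest element order in Cn × Cm (written additively as Zn × Zm) and compare it with nm. The helper `max_order` is my own:

```python
from math import gcd, lcm

def max_order(n, m):
    """Largest order of an element (a, b) in Z_n x Z_m."""
    best = 1
    for a in range(n):
        for b in range(m):
            # order of (a, b) is the lcm of the component orders;
            # the order of a in Z_n is n / gcd(a, n) (gcd(0, n) = n gives order 1)
            order_a = n // gcd(a, n)
            order_b = m // gcd(b, m)
            best = max(best, lcm(order_a, order_b))
    return best

assert max_order(3, 4) == 12   # hcf = 1: C_3 x C_4 is cyclic of order 12
assert max_order(2, 4) == 4    # hcf = 2: no element of order 8
# cyclic of order nm exactly when hcf(n, m) = 1
assert all((max_order(n, m) == n * m) == (gcd(n, m) == 1)
           for n in range(1, 7) for m in range(1, 7))
```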
Given a complicated group G, it is sometimes helpful to write it as a product H × K,
which could make things a bit simpler. We can do so by the following theorem:
Proposition 2.5.2 (Direct product theorem). Let H1 , H2 ≤ G. Suppose the following are
true:
(i) H1 ∩ H2 = {e}.
(ii) (∀ai ∈ Hi ) a1 a2 = a2 a1 .
(iii) (∀a ∈ G)(∃ai ∈ Hi ) a = a1 a2 . We also write this as G = H1 H2 .
Then G ≅ H1 × H2.
Proof. Define f : H1 × H2 → G by f (a1 , a2 ) = a1 a2 . Then it is a homomorphism, since
f ((a1 , a2 ) ∗ (b1 , b2 )) = f (a1 b1 , a2 b2 )
= a1 b1 a2 b2
= a1 a2 b1 b2 (by (ii), since b1 ∈ H1 and a2 ∈ H2 commute)
= f (a1 , a2 ) f (b1 , b2 ).
Surjectivity follows from (iii). We’ll show injectivity by showing that the kernel is {e}. If f (a1 , a2 ) = e, then a1 a2 = e, so a1 = a2^{−1}. Since a1 ∈ H1 and a2^{−1} ∈ H2 , we have a1 = a2^{−1} ∈ H1 ∩ H2 = {e}. Thus a1 = a2 = e and ker f = {e}.
■
3. Symmetric group I
We will devote two full chapters to the study of symmetric groups, because they are really important. Recall that we defined a symmetry to be an operation that leaves some important property of the object intact. We can treat each such operation as a bijection. For example, a symmetry of R2 is a bijection f : R2 → R2 that preserves distances. Note that we must require it to be a bijection, instead of a mere function, since we require each symmetry to have an inverse.
We can consider the case where we don’t care about anything at all. So a “symmetry” would be any arbitrary bijection X → X, and the set of all bijections will form a
group, known as the symmetric group. Of course, we will no longer think of these as
“symmetries” anymore, but just bijections.
In some sense, the symmetric group is the most general case of a symmetry group. In
fact, we will later (in Chapter 6) show that every group can be written as a subgroup of
some symmetric group.
3.1 Symmetric groups
Definition 3.1.1 (Permutation). A permutation of X is a bijection from a set X to X itself.
The set of all permutations on X is Sym X.
When composing permutations, we treat them as functions. So if σ and ρ are permutations, σ ◦ ρ is given by first applying ρ , then applying σ .
Theorem 3.1.1. Sym X with composition forms a group.
Proof. The groups axioms are satisfied as follows:
1. If σ : X → X and τ : X → X, then σ ◦ τ : X → X. If they are both bijections, then
the composite is also bijective. So if σ , τ ∈ Sym X, then σ ◦ τ ∈ Sym X.
2. The identity 1X : X → X is clearly a permutation, and gives the identity of the group.
3. Every bijective function has a bijective inverse. So if σ ∈ Sym X, then σ −1 ∈
Sym X.
4. Composition of functions is associative.
■
Definition 3.1.2 (Symmetric group Sn ). If X is finite (we usually take X = {1, 2, · · · , n}), we write Sym X = Sn . This is the symmetric group of degree n.
It is important to note that the degree of the symmetric group is different from the
order of the symmetric group. For example, S3 has degree 3 but order 6. In general, the
order of Sn is n!.
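The claim |Sn| = n! is easy to confirm by listing the bijections of {1, . . . , n} directly; a quick sketch:

```python
from itertools import permutations
from math import factorial

# Sym X for finite X: every bijection of {1, ..., n} is a permutation tuple.
for n in range(1, 6):
    Sn = list(permutations(range(1, n + 1)))
    assert len(Sn) == factorial(n)   # degree n, order n!

# e.g. S_3 has degree 3 but order 6
assert len(list(permutations([1, 2, 3]))) == 6
```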
There are two ways to write out an element of the symmetric group. The first is the
two row notation.
Notation 3.1 (Two row notation). We write 1, 2, 3, · · · , n on the top line and their images below, e.g.
$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix} \in S_3 \quad\text{and}\quad \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 1 & 3 & 4 & 5 \end{pmatrix} \in S_5.$$
In general, if σ : X → X, we write
$$\begin{pmatrix} 1 & 2 & 3 & \cdots & n \\ \sigma(1) & \sigma(2) & \sigma(3) & \cdots & \sigma(n) \end{pmatrix}.$$
Example 3.1.1. For small n, we have
(i) When n = 1, $S_n = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\} = \{e\} \cong C_1$.
(ii) When n = 2, $S_n = \left\{ \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \right\} \cong C_2$.
(iii) When n = 3,
$$S_n = \left\{ \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix} \right\} \cong D_6.$$
Note that S3 is not abelian. Thus Sn is not abelian for n ≥ 3, since we can always view S3 as a subgroup of Sn by fixing 4, 5, 6, · · · , n.
In general, we can view D2n as a subgroup of Sn because each symmetry is a permutation of the corners.
While the two row notation is fully general and can represent any (finite) permutation,
it is clumsy to write and wastes a lot of space. It is also very annoying to type using LaTeX.
Hence, most of the time, we actually use the cycle notation.
Notation 3.2 (Cycle notation). If a map sends 1 7→ 2, 2 7→ 3, 3 7→ 1, then we write it as a
cycle (1 2 3). Alternatively, we can write (2 3 1) or (3 1 2), but by convention, we usually
write the smallest number first. We leave out numbers that don’t move. So we write (1 2)
instead of (1 2)(3).
For more complicated maps, we can write them as products of cycles. For example, in
S4 , we can have things like (1 2)(3 4).
The order of each cycle is the length of the cycle, and the inverse is the cycle written
the other way round, e.g. (1 2 3)−1 = (3 2 1) = (1 3 2).
Example 3.1.2.
(i) Suppose we want to simplify (1 2 3)(1 2). Recall that composition is from right to
left. So 1 gets mapped to 3 ((1 2) maps 1 to 2, and (1 2 3) further maps it to 3).
Then 3 gets mapped to 1. 2 is mapped to 2 itself. So (1 2 3)(1 2) = (1 3)(2)
(ii) (1 2 3 4)(1 4) = (1)(2 3 4) = (2 3 4).
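These simplifications can be checked mechanically. Below is a small Python sketch (our own helpers, not part of the notes): a permutation is represented as a dict i ↦ σ(i), and composition is right-to-left as above.

```python
def cycle(n, *elems):
    """The permutation of {1, ..., n} given by a single cycle, as a dict i -> sigma(i)."""
    sigma = {i: i for i in range(1, n + 1)}
    for a, b in zip(elems, elems[1:] + (elems[0],)):
        sigma[a] = b
    return sigma

def compose(f, g):
    """f ∘ g: apply g first, then f."""
    return {i: f[g[i]] for i in g}

# (1 2 3)(1 2) = (1 3), as in part (i): 1 -> 3, 2 -> 2, 3 -> 1
print(compose(cycle(3, 1, 2, 3), cycle(3, 1, 2)))  # {1: 3, 2: 2, 3: 1}
```

The same helpers confirm part (ii): composing (1 2 3 4) with (1 4) gives the permutation (2 3 4).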
Definition 3.1.3 (k-cycles and transpositions). We call (a1 a2 a3 · · · ak ) a k-cycle. 2-cycles
are called transpositions. Two cycles are disjoint if no number appears in both cycles.
Example 3.1.3. (1 2) and (3 4) are disjoint but (1 2 3) and (1 2) are not.
Lemma 3.1.1. Disjoint cycles commute.
Proof. Let σ, τ ∈ Sn be disjoint cycles, and consider any a ∈ {1, 2, · · · , n}. We show that
σ(τ(a)) = τ(σ(a)). If a appears in neither σ nor τ, then σ(τ(a)) = τ(σ(a)) = a. Otherwise,
wlog assume that a appears in τ but not in σ. Then τ(a) also appears in τ, and hence not
in σ (the cycles are disjoint). Thus σ(a) = a and σ(τ(a)) = τ(a). Therefore σ(τ(a)) =
τ(a) = τ(σ(a)). So τ and σ commute. ■
In general, non-disjoint cycles may not commute. For example, (1 3)(2 3) = (1 3 2)
while (2 3)(1 3) = (1 2 3).
Theorem 3.1.2. Any permutation in Sn can be written (essentially) uniquely as a product of disjoint cycles. (Essentially unique means unique up to re-ordering of cycles and
rotation within cycles, e.g. (1 2) and (2 1))
Proof. Let σ ∈ Sn. Start with (1 σ(1) σ²(1) σ³(1) · · · ). As the set {1, 2, 3, · · · , n} is finite,
for some k we must have σ^k(1) already in the list. If σ^k(1) = σ^l(1) with l < k, then
σ^{k−l}(1) = 1. So all the σ^i(1) are distinct until we get back to 1. Thus we have the first cycle
(1 σ(1) σ²(1) · · · σ^{k−1}(1)).
Now choose the smallest number that is not yet in a cycle, say j. Repeat to obtain a
cycle (j σ(j) σ²(j) · · · σ^{l−1}(j)). Since σ is a bijection, nothing in this cycle can appear in
any previous cycle.
Repeat until all {1, 2, 3 · · · n} are exhausted. This is essentially unique because every
number j completely determines the whole cycle it belongs to, and whichever number we
start with, we’ll end up with the same cycle.
■
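The proof is effectively an algorithm: repeatedly start a cycle at the smallest number not yet used and follow σ until it returns. A minimal Python sketch of this (our own naming, with a permutation as a dict i ↦ σ(i)):

```python
def disjoint_cycles(sigma):
    """Decompose a permutation (dict on {1, ..., n}) into disjoint cycles,
    starting each cycle at the smallest unused number, as in the proof."""
    seen, cycles = set(), []
    for j in sorted(sigma):
        if j in seen:
            continue
        cyc, k = [j], sigma[j]
        while k != j:
            cyc.append(k)
            k = sigma[k]
        seen.update(cyc)
        cycles.append(tuple(cyc))
    return cycles

# sigma = (1 3)(2 5 4) in S5
sigma = {1: 3, 3: 1, 2: 5, 5: 4, 4: 2}
print(disjoint_cycles(sigma))  # [(1, 3), (2, 5, 4)]
```

Singleton cycles are kept here as length-1 tuples, matching the convention used in the sign proof below where #(σ) counts them too.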
Definition 3.1.4 (Cycle type). Write a permutation σ ∈ Sn in disjoint cycle notation. The
cycle type is the list of cycle lengths. This is unique up to re-ordering. We often (but not
always) leave out singleton cycles.
Example 3.1.4. (1 2) has cycle type 2 (transposition). (1 2)(3 4) has cycle type 2, 2
(double transposition). (1 2 3)(4 5) has cycle type 3, 2.
Lemma 3.1.2. For σ ∈ Sn , the order of σ is the least common multiple of cycle lengths
in the disjoint cycle notation. In particular, a k-cycle has order k.
Proof. As disjoint cycles commute, we can group together each cycle when we take powers, i.e. if σ = τ1 τ2 · · · τl with the τi all disjoint cycles, then σ^m = τ1^m τ2^m · · · τl^m.
Now if cycle τi has length ki, then τi^{ki} = e, and τi^m = e iff ki | m. To get an m such that
σ^m = e, we need all the ki to divide m, i.e. m is a common multiple of the ki. Since the order is
the least possible m such that σ^m = e, the order is the least common multiple of the ki.
■
Example 3.1.5. Any transpositions and double transpositions have order 2.
(1 2 3)(4 5) has order 6.
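The lemma translates directly into code. A small Python sketch (our own helper, representing a permutation as a dict i ↦ σ(i); `math.lcm` needs Python 3.9+):

```python
from math import lcm

def order(sigma):
    """Order of a permutation: the lcm of the lengths of its disjoint cycles."""
    seen, lengths = set(), []
    for j in sigma:
        if j in seen:
            continue
        k, length = sigma[j], 1
        while k != j:
            seen.add(k)
            k, length = sigma[k], length + 1
        seen.add(j)
        lengths.append(length)
    return lcm(*lengths)

# (1 2 3)(4 5): cycle type 3, 2, so order lcm(3, 2) = 6
sigma = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
print(order(sigma))  # 6
```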
3.2 Sign of permutations
To classify different permutations, we can group different permutations according to their
cycle type. While this is a very useful thing to do, it is a rather fine division. In this
section, we will assign a “sign” to each permutation, and each permutation can either be
odd or even. This high-level classification allows us to separate permutations into two
sets, which is also a useful notion.
To define the sign, we first need to write permutations as products of transpositions.
Proposition 3.2.1. Every permutation is a product of transpositions.
This is not a deep or mysterious fact. All it says is that you can rearrange things
however you want just by swapping two objects at a time.
Proof. As each permutation is a product of disjoint cycles, it suffices to prove that each
cycle is a product of transpositions. Consider a cycle (a1 a2 a3 · · · ak ). This is in fact
equal to (a1 a2 )(a2 a3 ) · · · (ak−1 ak ). Thus a k-cycle can be written as a product of k − 1
transpositions.
■
Note that the product is not unique. For example,
(1 2 3 4 5) = (1 2)(2 3)(3 4)(4 5) = (1 2)(2 3)(1 2)(3 4)(1 2)(4 5).
However, the number of terms in the product, mod 2, is always the same.
Theorem 3.2.1. Writing σ ∈ Sn as a product of transpositions in different ways, σ is
either always composed of an even number of transpositions, or always an odd number of
transpositions.
The proof is rather magical.
Proof. Write #(σ) for the number of cycles in the disjoint cycle notation of σ, including singleton
cycles. So #(e) = n and #((1 2)) = n − 1. When we multiply σ by a transposition τ = (c d)
(wlog assume c < d):
– If c, d are in the same σ-cycle, say
(c a2 · · · ak−1 d ak+1 · · · ak+l)(c d) = (c ak+1 ak+2 · · · ak+l)(d a2 a3 · · · ak−1),
then #(στ) = #(σ) + 1.
– If c, d are in different σ-cycles, say
(d a2 a3 · · · ak−1)(c ak+1 ak+2 · · · ak+l)(c d)
= (c a2 · · · ak−1 d ak+1 · · · ak+l)(c d)(c d)
= (c a2 · · · ak−1 d ak+1 · · · ak+l),
then #(στ) = #(σ) − 1.
Therefore for any transposition τ, #(στ) ≡ #(σ) + 1 (mod 2).
Now suppose σ = τ1 · · · τl = τ1′ · · · τk′. Since disjoint cycle notation is unique, #(σ) is
uniquely determined by σ.
Now we can construct σ by starting with e and multiplying the transpositions one
by one. Each time we add a transposition, we increase #(σ ) by 1 (mod 2). So #(σ ) ≡
#(e) + l (mod 2). Similarly, #(σ ) ≡ #(e) + k (mod 2). So l ≡ k (mod 2).
■
Definition 3.2.1 (Sign of permutation). Writing σ ∈ Sn as a product of transpositions,
σ = τ1 · · · τl, we define sgn(σ) = (−1)^l. If sgn(σ) = 1, we call σ an even permutation; if
sgn(σ) = −1, we call σ an odd permutation.
While l itself is not well-defined, it is either always odd or always even, and (−1)l is
well-defined.
Theorem 3.2.2. For n ≥ 2, sgn : Sn → {±1} is a surjective group homomorphism.
Proof. Suppose σ1 = τ1 · · · τl1 and σ2 = τ1′ · · · τl2′. Then
sgn(σ1 σ2 ) = (−1)l1 +l2 = (−1)l1 (−1)l2 = sgn(σ1 ) sgn(σ2 ).
So it is a homomorphism. It is surjective since sgn(e) = 1 and sgn((1 2)) = −1.
■
This was rather trivial to prove. The hard bit is showing that sgn is well-defined.
If a question asks you to show that sgn is a well-defined group homomorphism, you have
to show that it is well-defined.
Lemma 3.2.1. σ is an even permutation iff the number of cycles of even length is even.
Proof. A k-cycle can be written as k − 1 transpositions. Thus a cycle of even length is an
odd permutation, and vice versa.
Since sgn is a group homomorphism, writing σ in disjoint cycle notation σ = σ1 · · · σl,
we get sgn(σ) = sgn(σ1) · · · sgn(σl). Suppose there are m even-length cycles and n odd-length cycles; then sgn(σ) = (−1)^m 1^n. This is equal to 1 iff (−1)^m = 1, i.e. m is even. ■
Rather confusingly, odd length cycles are even, and even length cycles are odd.
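The invariant from the sign proof also gives a practical way to compute sgn without finding a transposition decomposition: since #(e) = n and each transposition flips the parity of #(σ), a product of l transpositions has #(σ) ≡ n + l (mod 2), so sgn(σ) = (−1)^{n−#(σ)}. A Python sketch using this (our own naming), with a brute-force check of the homomorphism property over all of S3:

```python
from itertools import permutations

def sgn(sigma):
    """sgn(sigma) = (-1)^(n - #cycles), counting singleton cycles too."""
    seen, cycles = set(), 0
    for j in sigma:
        if j not in seen:
            cycles += 1
            k = j
            while k not in seen:
                seen.add(k)
                k = sigma[k]
    return (-1) ** (len(sigma) - cycles)

def compose(f, g):
    """f ∘ g: apply g first, then f."""
    return {i: f[g[i]] for i in g}

S3 = [dict(zip((1, 2, 3), p)) for p in permutations((1, 2, 3))]
assert all(sgn(compose(f, g)) == sgn(f) * sgn(g) for f in S3 for g in S3)
print(sorted(sgn(f) for f in S3))  # [-1, -1, -1, 1, 1, 1]
```

As expected, S3 splits into three even and three odd permutations.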
Definition 3.2.2 (Alternating group An ). The alternating group An is the kernel of sgn, i.e.
the even permutations. Since An is a kernel of a group homomorphism, An ≤ Sn .
Among the many uses of the sgn homomorphism, it is used in the definition of the
determinant of a matrix: if A is an n × n matrix, then
$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.$
Proposition 3.2.2. Any subgroup of Sn contains either no odd permutations or exactly
half.
Proof. Suppose the subgroup H ≤ Sn contains at least one odd permutation τ. Then
σ ↦ στ is a bijection from the even permutations of H to the odd ones (σ ↦ στ⁻¹ is a
well-defined inverse). So there are as many odd permutations as even permutations. ■
■
After we prove the isomorphism theorem later, we can provide an even shorter proof
of this.
4. Lagrange’s Theorem
One can model a Rubik’s cube with a group, with each possible move corresponding to a
group element. Of course, Rubik’s cubes of different sizes correspond to different groups.
Suppose I have a 4 × 4 × 4 Rubik’s cube, but I want to practice solving a 2 × 2 × 2
Rubik’s cube. It is easy. I just have to make sure every time I make a move, I move two
layers together. Then I can pretend I am solving a 2 × 2 × 2 cube. This corresponds to
picking a particular subgroup of the 4 × 4 × 4 group.
Now what if I have a 3 × 3 × 3 cube? I can still practice solving a 2 × 2 × 2 one. This
time, I just look at the corners and pretend that the edges and centers do not exist. Then
I am satisfied when the corners are in the right positions, while the centers and edges can
be completely scrambled. In this case, we are not taking a subgroup. Instead, we are
identifying certain moves together. In particular, we are treating two moves as the same
as long as their difference is confined to the centers and edges.
Let G be the 3 × 3 × 3 cube group, and H be the subgroup of G that only permutes the
edges and centers. Then for any a, b ∈ G, we think a and b are “the same” if a−1 b ∈ H.
Then the set of things equivalent to a is aH = {ah : h ∈ H}. We call this a coset, and the
set of cosets form a group.
An immediate question one can ask is: why not Ha = {ha : h ∈ H}? In this particular
case, the two happen to be the same for all possible a. However, for a general subgroup
H, they need not be. We can still define the coset aH = {ah : h ∈ H}, but these are less
interesting. For example, the set of all {aH} will no longer form a group. We will look
into these more in-depth in the next chapter. In this chapter, we will first look at results for
general cosets. In particular, we will, step by step, prove the things we casually claimed
above.
Definition 4.0.1 (Cosets). Let H ≤ G and a ∈ G. Then the set aH = {ah : h ∈ H} is a left
coset of H and Ha = {ha : h ∈ H} is a right coset of H.
Example 4.0.1.
(i) Take 2Z ≤ Z. Then
6+2Z = {all even numbers} = 0+2Z,
1+2Z = {all odd numbers} = 17+2Z.
(ii) Take G = S3 , let H = h(1 2)i = {e, (1 2)}. The left cosets are
eH = (1 2)H = {e, (1 2)}
(1 3)H = (1 2 3)H = {(1 3), (1 2 3)}
(2 3)H = (1 3 2)H = {(2 3), (1 3 2)}
(iii) Take G = D6 (which is isomorphic to S3), and recall D6 = ⟨r, s | r³ = e = s², rs =
sr⁻¹⟩. Take H = ⟨s⟩ = {e, s}. We have the left coset rH = {r, rs = sr⁻¹} and the right
coset Hr = {r, sr}. Thus rH ≠ Hr.
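These cosets can be computed by brute force. A Python sketch (our own naming, with permutations as dicts), working in S3 with H = ⟨(1 2)⟩ as in (ii); the computation with a = (1 3) mirrors the left/right discrepancy of (iii) under the isomorphism D6 ≅ S3:

```python
from itertools import permutations

def compose(f, g):
    """f ∘ g: apply g first, then f."""
    return {i: f[g[i]] for i in g}

key = lambda p: tuple(sorted(p.items()))  # hashable form of a permutation
S3 = [dict(zip((1, 2, 3), p)) for p in permutations((1, 2, 3))]

e = {1: 1, 2: 2, 3: 3}
H = [e, {1: 2, 2: 1, 3: 3}]               # H = <(1 2)>

left = {frozenset(key(compose(a, h)) for h in H) for a in S3}
print(len(left))                          # 3 distinct left cosets, each of size |H| = 2

a = {1: 3, 2: 2, 3: 1}                    # a = (1 3)
aH = {key(compose(a, h)) for h in H}
Ha = {key(compose(h, a)) for h in H}
print(aH == Ha)                           # False: left and right cosets of a differ
```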
Proposition 4.0.1. aH = bH ⇔ b−1 a ∈ H.
Proof. (⇒) Since a ∈ aH, a ∈ bH. Then a = bh for some h ∈ H. So b−1 a = h ∈ H.
(⇐). Let b−1 a = h0 . Then a = bh0 . Then ∀ah ∈ aH, we have ah = b(h0 h) ∈ bH. So
aH ⊆ bH. Similarly, bH ⊆ aH. So aH = bH.
■
Definition 4.0.2 (Partition). Let X be a set, and X1, · · · , Xn be subsets of X. The Xi are
called a partition of X if ⋃ Xi = X and Xi ∩ Xj = ∅ for i ≠ j, i.e. every element of X is in
exactly one of the Xi.
Lemma 4.0.1. The left cosets of a subgroup H ≤ G partition G, and every coset has the
same size.
Proof. For each a ∈ G, a ∈ aH. Thus the union of all cosets is all of G. Now we have
to show that for all a, b ∈ G, the cosets aH and bH are either the same or disjoint.
Suppose that aH and bH are not disjoint, and let ah1 = bh2 ∈ aH ∩ bH. Then b⁻¹a =
h2 h1⁻¹ ∈ H, so aH = bH.
To show that each coset has the same size, note that f : H → aH with f(h) = ah
is invertible with inverse f⁻¹(x) = a⁻¹x. Thus there is a bijection between H and aH, and
they have the same size.
■
Definition 4.0.3 (Index of a subgroup). The index of H in G, written |G : H|, is the number
of left cosets of H in G.
Theorem 4.0.1 (Lagrange’s theorem). If G is a finite group and H is a subgroup of G,
then |H| divides |G|. In particular,
|H||G : H| = |G|.
Note that the converse is not true. If k divides |G|, there is not necessarily a subgroup
of order k, e.g. |A4 | = 12 but there is no subgroup of order 6. However, we will later see
that this is true if k is a prime (cf. Cauchy’s theorem).
Proof. Suppose that there are |G : H| left cosets in total. Since the left cosets partition G,
and each coset has size |H|, we have
|H||G : H| = |G|.
■
Again, the hard part of this proof is to prove that the left cosets partition G and have
the same size. If you are asked to prove Lagrange’s theorem in exams, that is what you
actually have to prove.
Corollary 4.0.1. The order of an element divides the order of the group, i.e. for any finite
group G and a ∈ G, ord(a) divides |G|.
Proof. Consider the subgroup generated by a, which has order ord(a). Then by Lagrange’s theorem, ord(a) divides |G|.
■
Corollary 4.0.2. The exponent of a group divides the order of the group, i.e. for any finite
group G and a ∈ G, a^{|G|} = e.
Proof. We know that |G| = k ord(a) for some k ∈ N. Then a^{|G|} = (a^{ord(a)})^k = e^k = e.
■
Corollary 4.0.3. Groups of prime order are cyclic and are generated by every non-identity
element.
Proof. Say |G| = p. If a ∈ G is not the identity, the subgroup generated by a must have
order p since it has to divide p. Thus the subgroup generated by a has the same size as G
and they must be equal. Then G must be cyclic since it is equal to the subgroup generated
by a.
■
A useful way to think about cosets is to view them as equivalence classes. To do so,
we need to first define what an equivalence class is.
Definition 4.0.4 (Equivalence relation). An equivalence relation ∼ is a relation that is
reflexive, symmetric and transitive. i.e.
(i) (∀x) x ∼ x (reflexivity)
(ii) (∀x, y) x ∼ y ⇒ y ∼ x (symmetry)
(iii) (∀x, y, z) [(x ∼ y) ∧ (y ∼ z) ⇒ x ∼ z] (transitivity)
Example 4.0.2. The following relations are equivalence relations:
(i) Consider Z. The relation ≡n defined as a ≡n b ⇔ n | (a − b).
(ii) Consider the set (formally: class) of all finite groups. Then “is isomorphic to” is an
equivalence relation.
Definition 4.0.5 (Equivalence class). Given an equivalence relation ∼ on A, the equivalence class of a is
[a]∼ = [a] = {b ∈ A : a ∼ b}
Proposition 4.0.2. The equivalence classes form a partition of A.
Proof. By reflexivity, we have a ∈ [a]. Thus the equivalence classes cover the whole set.
We must now show that for all a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅.
Suppose [a] ∩ [b] ≠ ∅. Then ∃c ∈ [a] ∩ [b]. So a ∼ c, b ∼ c. By symmetry, c ∼ b. By
transitivity, we have a ∼ b. Now for all b′ ∈ [b], we have b ∼ b′. Thus by transitivity, we
have a ∼ b′. Thus [b] ⊆ [a]. Similarly, [a] ⊆ [b], so [a] = [b].
■
Lemma 4.0.2. Given a group G and a subgroup H, define the equivalence relation on G
with a ∼ b iff b−1 a ∈ H. The equivalence classes are the left cosets of H.
Proof. First show that it is an equivalence relation.
(i) Reflexivity: Since aa−1 = e ∈ H, a ∼ a.
(ii) Symmetry: a ∼ b ⇒ b−1 a ∈ H ⇒ (b−1 a)−1 = a−1 b ∈ H ⇒ b ∼ a.
(iii) Transitivity: If a ∼ b and b ∼ c, we have b−1 a, c−1 b ∈ H. So c−1 bb−1 a = c−1 a ∈ H.
So a ∼ c.
To show that the equivalence classes are the cosets, we have a ∼ b ⇔ b−1 a ∈ H ⇔ aH =
bH.
■
Example 4.0.3. Consider (Z, +), and for fixed n, take the subgroup H = nZ. The cosets
are 0 + H, 1 + H, · · · , (n − 1) + H. We can write these as [0], [1], [2], · · · , [n − 1]. To perform
arithmetic “mod n”, define [a] + [b] = [a + b] and [a][b] = [ab]. We need to check that this
is well-defined, i.e. it doesn’t depend on the choice of the representative of [a].
If [a1] = [a2] and [b1] = [b2], then a1 = a2 + kn and b1 = b2 + ln for some k, l ∈ Z. Then
a1 + b1 = a2 + b2 + n(k + l) and a1b1 = a2b2 + n(kb2 + la2 + kln). So [a1 + b1] = [a2 + b2] and
[a1b1] = [a2b2].
We have seen that (Zn , +n ) is a group. What happens with multiplication? We can
only take elements which have inverses (these are called units, cf. IB Groups, Rings and
Modules). Call the set of them Un = {[a] : (a, n) = 1}. We’ll see these are the units.
Definition 4.0.6 (Euler totient function). ϕ(n) = |Un|.
Example 4.0.4. If p is a prime, ϕ(p) = p − 1. Also ϕ(4) = 2.
Proposition 4.0.3. Un is a group under multiplication mod n.
Proof. The operation is well-defined as shown above. To check the axioms:
1. Closure: if a, b are coprime to n, then a · b is also coprime to n. So [a], [b] ∈ Un ⇒
[a] · [b] = [a · b] ∈ Un
2. Identity: [1]
3. Inverses: let [a] ∈ Un. Consider the map Un → Un with [c] ↦ [ac]. This is injective: if
[ac1] = [ac2], then n divides a(c1 − c2); since a is coprime to n, n divides c1 − c2,
so [c1] = [c2]. Since Un is finite, any injection Un → Un is also a surjection. So
there exists a [c] such that [ac] = [a][c] = [1], i.e. [c] = [a]⁻¹.
4. Associativity (and also commutativity): inherited from Z.
■
Theorem 4.0.2 (Fermat–Euler theorem). Let n ∈ N and a ∈ Z be coprime to n. Then
a^{ϕ(n)} ≡ 1 (mod n).
In particular (Fermat’s little theorem), if n = p is a prime, then for any a not a multiple
of p,
a^{p−1} ≡ 1 (mod p).
Proof. As a is coprime to n, [a] ∈ Un. Then [a]^{|Un|} = [1], i.e. a^{ϕ(n)} ≡ 1 (mod n).
■
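Un, ϕ and the Fermat–Euler theorem are easy to verify numerically. A Python sketch (our own naming; `pow(a, k, n)` is Python's built-in modular exponentiation, and n = 12 is an arbitrary choice):

```python
from math import gcd

def U(n):
    """The units mod n (for n >= 2): residues in {1, ..., n-1} coprime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def phi(n):
    """Euler totient function: phi(n) = |U(n)|."""
    return len(U(n))

print(phi(4), phi(7))  # 2 6
# Fermat-Euler: a^phi(n) ≡ 1 (mod n) for every unit a
assert all(pow(a, phi(12), 12) == 1 for a in U(12))
```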
4.1 Small groups
We will study the structures of certain small groups.
Example 4.1.1 (Using Lagrange theorem to find subgroups). To find subgroups of D10 ,
we know that the subgroups must have size 1, 2, 5 or 10:
1: {e}
2: The groups generated by the 5 reflections of order 2
5: The group must be cyclic since it has prime order 5. It is then generated by an
element of order 5, i.e. r, r2 , r3 and r4 . They generate the same group hri.
10: D10
As for D8 , subgroups must have order 1, 2, 4 or 8.
1: {e}
2: 5 elements of order 2, namely 4 reflections and r2 .
4: First consider the cyclic subgroup of order 4, which is ⟨r⟩ ≅ C4. There are also two
other subgroups of order 4, both non-cyclic (isomorphic to C2 × C2).
8: D8
Proposition 4.1.1. Any group of order 4 is either isomorphic to C4 or C2 ×C2 .
Proof. Let |G| = 4. By Lagrange’s theorem, the possible element orders are 1 (e only), 2 and
4. If there is an element a ∈ G of order 4, then G = ⟨a⟩ ≅ C4.
Otherwise, all non-identity elements have order 2. Then G must be abelian (for any
a, b, (ab)² = e ⇒ ab = (ab)⁻¹ = b⁻¹a⁻¹ = ba). Pick two elements of order 2, say b, c ∈ G;
then ⟨b⟩ = {e, b} and ⟨c⟩ = {e, c}, so ⟨b⟩ ∩ ⟨c⟩ = {e}. As G is abelian, ⟨b⟩ and ⟨c⟩
commute. We know that bc = cb has order 2 as well, and is the only remaining element of
G. So G ≅ ⟨b⟩ × ⟨c⟩ ≅ C2 × C2 by the direct product theorem. ■
Proposition 4.1.2. A group of order 6 is either cyclic or dihedral (i.e. is isomorphic to C6
or D6 ). (See proof in next section)
4.2 Left and right cosets
As |aH| = |H| and similarly |H| = |Ha|, left and right cosets have the same size. Are they
necessarily the same? We’ve previously shown that they might not be the same. In some
other cases, they are.
Example 4.2.1.
(i) Take G = (Z, +) and H = 2Z. We have 0 + 2Z = 2Z + 0 = {even numbers} and
1 + 2Z = 2Z + 1 = {odd numbers}. Since G is abelian, aH = Ha for all a ∈ G and
all H ≤ G.
(ii) Let G = D6 = ⟨r, s | r³ = e = s², rs = sr⁻¹⟩ and take U = ⟨r⟩. Since the cosets partition
G, one coset must be U itself and the other must be sU = {s, sr = r²s, sr² = rs} = Us. So
for all a ∈ G, aU = Ua.
(iii) Let G = D6 and take H = ⟨s⟩ = {e, s}. The left cosets are H, rH = {r, rs = sr⁻¹} and
r²H = {r², r²s = sr}, while the right cosets are H, Hr = {r, sr} and Hr² = {r², sr²}. So the
left and right cosets do not coincide.
This distinction will become useful in the next chapter.
5. Quotient groups
In the previous section, when attempting to pretend that a 3 × 3 × 3 Rubik’s cube is a
2 × 2 × 2 one, we came up with the cosets aH, and claimed that these form a group. We
also said that this is not the case for arbitrary subgroup H, but only for subgroups that
satisfy aH = Ha. Before we prove these, we first study these subgroups a bit.
5.1 Normal subgroups
Definition 5.1.1 (Normal subgroup). A subgroup K of G is a normal subgroup if
(∀a ∈ G)(∀k ∈ K) aka−1 ∈ K.
We write K ◁ G. This is equivalent to:
(i) (∀a ∈ G) aK = Ka, i.e. left coset = right coset
(ii) (∀a ∈ G) aKa−1 = K (cf. conjugacy classes)
From the example last time, H = hsi ≤ D6 is not a normal subgroup, but K = hri ◁ D6 .
We know that every group G has at least two normal subgroups {e} and G.
Lemma 5.1.1.
(i) Every subgroup of index 2 is normal.
(ii) Any subgroup of an abelian group is normal.
Proof.
(i) If K ≤ G has index 2, then there are only two possible cosets K and G \ K. As
eK = Ke and cosets partition G, the other left coset and right coset must be G \ K.
So all left cosets and right cosets are the same.
(ii) For all a ∈ G and k ∈ K, we have aka−1 = aa−1 k = k ∈ K.
■
Proposition 5.1.1. Every kernel is a normal subgroup.
Proof. Given homomorphism f : G → H and some a ∈ G, for all k ∈ ker f , we have
f (aka−1 ) = f (a) f (k) f (a)−1 = f (a)e f (a)−1 = e. Therefore aka−1 ∈ ker f by definition
of the kernel.
■
In fact, we will see in the next section that all normal subgroups are kernels of some
homomorphism.
Example 5.1.1. Consider G = D8. The subgroup K = ⟨r²⟩ is normal. Check: any element
of G is either srℓ or rℓ for some ℓ. Clearly k = e satisfies aka⁻¹ ∈ K for every a. Now check
k = r²: for a = srℓ, we have srℓ r² (srℓ)⁻¹ = srℓ r² r⁻ℓ s⁻¹ = sr²s = ssr⁻² = r⁻² = r², and for
a = rℓ, rℓ r² r⁻ℓ = r².
Proposition 5.1.2. A group of order 6 is either cyclic or dihedral (i.e. isomorphic to C6 or D6).
Proof. Let |G| = 6. By Lagrange’s theorem, the possible element orders are 1, 2, 3 and 6. If
there is an a ∈ G of order 6, then G = ⟨a⟩ ≅ C6. Otherwise, apart from the identity we can
only have elements of orders 2 and 3. If G only had elements of order 2, its order would
be a power of 2 by Sheet 1 Q. 8, which is not the case. So there must be an element r of
order 3. Then ⟨r⟩ ◁ G as it has index 2. Now G must also have an element s of order 2 by
Sheet 1 Q. 9.
Since ⟨r⟩ is normal, we know that srs⁻¹ ∈ ⟨r⟩. If srs⁻¹ = e, then r = e, which is not
true. If srs⁻¹ = r, then sr = rs and sr has order 6 (the lcm of the orders of s and r), which
was ruled out above. Otherwise srs⁻¹ = r² = r⁻¹, and then G is dihedral by definition of
the dihedral group.
■
5.2 Quotient groups
Proposition 5.2.1. Let K ◁ G. Then the set of (left) cosets of K in G is a group under the
operation aK ∗ bK = (ab)K.
Proof. First we show that the operation is well-defined. If aK = a′K and bK = b′K, we want
to show that aK ∗ bK = a′K ∗ b′K. We know that a′ = ak1 and b′ = bk2 for some k1, k2 ∈ K.
Then a′b′ = ak1bk2. Since K is normal, b⁻¹k1b ∈ K; let b⁻¹k1b = k3. Then k1b = bk3, so
a′b′ = abk3k2 ∈ (ab)K. So picking different representatives of the cosets gives the same
product.
1. Closure: If aK, bK are cosets, then (ab)K is also a coset
2. Identity: The identity is eK = K (clear from definition)
3. Inverse: The inverse of aK is a−1 K (clear from definition)
4. Associativity: Follows from the associativity of G.
■
Definition 5.2.1 (Quotient group). Given a group G and a normal subgroup K, the quotient group or factor group of G by K, written as G/K, is the set of (left) cosets of K in G
under the operation aK ∗ bK = (ab)K.
Note that the set of left cosets also exists for non-normal subgroups (abnormal subgroups?), but the group operation above is not well defined.
Example 5.2.1.
(i) Take G = Z and K = nZ (which must be normal since G is abelian); the cosets are k + nZ
for 0 ≤ k < n. The quotient group is Zn, so we can write Z/(nZ) = Zn. In fact
these are the only quotient groups of Z, since the nZ are its only subgroups.
Note that if G is abelian, G/K is also abelian.
(ii) Take K = ⟨r⟩ ◁ D6. We have two cosets, K and sK. So D6/K has order 2 and is
isomorphic to C2.
(iii) Take K = ⟨r²⟩ ◁ D8. We know that G/K should have 8/2 = 4 elements. We have
G/K = {K, rK = r³K, sK = sr²K, srK = sr³K}. All elements except K have order 2,
so G/K ≅ C2 × C2.
Note that quotient groups are not subgroups of G; they contain different kinds of
elements. For example, the groups Z/nZ ≅ Cn are finite, but all non-trivial subgroups of Z
are infinite.
Example 5.2.2 (Non-example). Consider D6 with H = ⟨s⟩. H is not a normal subgroup.
We have rH ∗ r²H = r³H = H, but rH = rsH and r²H = srH (by considering the individual elements), while rsH ∗ srH = (rs · sr)H = r²H ≠ H. So the operation is not well-defined.
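The contrast between Proposition 5.2.1 and this non-example can be tested by brute force: for every pair of representatives, compute (ab)H and check it depends only on the cosets aH and bH. A Python sketch (our own naming), identifying D6 with S3, with the normal K = ⟨(1 2 3)⟩ and the non-normal H = ⟨(1 2)⟩:

```python
from itertools import permutations

def compose(f, g):
    """f ∘ g: apply g first, then f."""
    return {i: f[g[i]] for i in g}

key = lambda p: tuple(sorted(p.items()))
S3 = [dict(zip((1, 2, 3), p)) for p in permutations((1, 2, 3))]

def coset(a, H):
    return frozenset(key(compose(a, h)) for h in H)

def product_well_defined(G, H):
    """Does aH * bH := (ab)H depend only on the cosets, not the representatives?"""
    seen = {}
    for a in G:
        for b in G:
            pair = (coset(a, H), coset(b, H))
            prod = coset(compose(a, b), H)
            if seen.setdefault(pair, prod) != prod:
                return False
    return True

e = {1: 1, 2: 2, 3: 3}
K = [e, {1: 2, 2: 3, 3: 1}, {1: 3, 2: 1, 3: 2}]   # K = <(1 2 3)>, normal (index 2)
H = [e, {1: 2, 2: 1, 3: 3}]                        # H = <(1 2)>, not normal
print(product_well_defined(S3, K), product_well_defined(S3, H))  # True False
```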
Lemma 5.2.1. Given K ◁ G, the quotient map q : G → G/K with g 7→ gK is a surjective
group homomorphism.
Proof. q(ab) = (ab)K = aKbK = q(a)q(b). So q is a group homomorphism. Also for all
aK ∈ G/K, q(a) = aK. So it is surjective.
■
Note that the kernel of the quotient map is K itself. So any normal subgroup is a
kernel of some homomorphism.
Proposition 5.2.2. The quotient of a cyclic group is cyclic.
Proof. Let G = Cn and H ≤ Cn. We know that H is also cyclic. Say Cn = ⟨c⟩ and
H = ⟨c^k⟩ ≅ Cℓ, where kℓ = n. Then Cn/H = {H, cH, c²H, · · · , c^{k−1}H} = ⟨cH⟩ ≅ Ck. ■
5.3 The Isomorphism Theorem
Now we come to the Really Important Theorem™.
Theorem 5.3.1 (The Isomorphism Theorem). Let f : G → H be a group homomorphism
with kernel K. Then K ◁ G and G/K ≅ im f.
Proof. We have proved that K ◁ G before. We define a map θ : G/K → im f by θ(aK) = f(a).
First check that this is well-defined: if a1K = a2K, then a2⁻¹a1 ∈ K. So
f(a2)⁻¹ f(a1) = f(a2⁻¹a1) = e.
So f(a1) = f(a2) and θ(a1K) = θ(a2K).
Now we check that it is a group homomorphism:
θ(aKbK) = θ(abK) = f(ab) = f(a) f(b) = θ(aK)θ(bK).
To show that it is injective, suppose θ(aK) = θ(bK). Then f(a) = f(b).
Hence f(b)⁻¹ f(a) = e, so b⁻¹a ∈ K and aK = bK.
θ is surjective since im θ = im f by construction. So θ gives an isomorphism G/K ≅
im f ≤ H. ■
38
Chapter 5. Quotient groups
If f is injective, then the kernel is {e}, so G/K ≅ G and G is isomorphic to a subgroup
of H; we can think of f as an inclusion map. If f is surjective, then im f = H, and in this
case G/K ≅ H.
Example 5.3.1.
(i) Take f : GLn(R) → R∗ with A ↦ det A. Then ker f = SLn(R), and im f = R∗, since for
any λ ∈ R∗ the diagonal matrix diag(λ, 1, · · · , 1) has determinant λ. So we know that
GLn(R)/SLn(R) ≅ R∗.
(ii) Define θ : (R, +) → (C∗, ×) with r ↦ exp(2πir). This is a group homomorphism
since θ(r + s) = exp(2πi(r + s)) = exp(2πir) exp(2πis) = θ(r)θ(s). The kernel is Z ◁ R.
Clearly the image is the unit circle (S¹, ×). So R/Z ≅ (S¹, ×).
(iii) Take G = (Z∗p, ×) for a prime p ≠ 2, and let f : G → G with a ↦ a². This is a homomorphism since (ab)² = a²b² (Z∗p is abelian). The kernel is {±1} = {1, p − 1}. We
know that im f ≅ G/ker f, which has order (p − 1)/2. The elements of im f are known as
the quadratic residues mod p.
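Part (iii) is easy to check numerically; a short Python sketch (p = 11 is our arbitrary choice of odd prime):

```python
p = 11                                        # an arbitrary odd prime
squares = {a * a % p for a in range(1, p)}    # im f for f(a) = a^2 on Z_p*
print(sorted(squares), len(squares))          # [1, 3, 4, 5, 9] 5
assert len(squares) == (p - 1) // 2           # |im f| = |G/ker f| = (p - 1)/2
```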
Lemma 5.3.1. Any cyclic group is isomorphic to either Z or Z/(nZ) for some n ∈ N.
Proof. Let G = ⟨c⟩. Define f : Z → G with m ↦ c^m. This is a group homomorphism
since c^{m1+m2} = c^{m1} c^{m2}. f is surjective since by definition G consists of all the powers
c^m. We know that ker f ◁ Z. There are three possibilities. Either
(i) ker f = {0}, so f is an isomorphism and G ≅ Z; or
(ii) ker f = Z, then G ≅ Z/Z = {e} = C1; or
(iii) ker f = nZ for some n ≥ 2 (since these are the only remaining subgroups of Z), then G ≅ Z/(nZ). ■
Definition 5.3.1 (Simple group). A group is simple if it has no non-trivial proper normal
subgroup, i.e. only {e} and G are normal subgroups.
Example 5.3.2. Cp for prime p is simple, since it has no non-trivial proper subgroups at all,
let alone normal ones. A5 is also simple, which we will prove after Chapter 7.
The finite simple groups are the building blocks of all finite groups. All finite simple
groups have been classified (see The Atlas of Finite Groups). If we have K ◁ G with K ≠ G
and K ≠ {e}, then we can “quotient out” G into G/K. If G/K is not simple, repeat. In this
way we can write G as an “inverse quotient” of simple groups.
6. Group actions
Recall that we came up with groups to model symmetries and permutations. Intuitively,
elements of groups are supposed to “do things”. However, as we developed group theory,
we abstracted these away and just looked at how elements combine to form new elements.
Group actions recapture this idea and make each group element correspond to some function.
6.1 Group acting on sets
Definition 6.1.1 (Group action). Let X be a set and G be a group. An action of G on X is
a homomorphism φ : G → Sym X.
This means that the homomorphism φ turns each element g ∈ G into a permutation of
X, in a way that respects the group structure.
Instead of writing φ (g)(x), we usually directly write g(x) or gx.
Alternatively, we can define the group action as follows:
Proposition 6.1.1. Let X be a set and G be a group. Then φ : G → Sym X is a homomorphism (i.e. an action) iff θ : G × X → X defined by θ(g, x) = φ(g)(x) satisfies
1. (∀g ∈ G)(∀x ∈ X) θ(g, x) ∈ X.
2. (∀x ∈ X) θ(e, x) = x.
3. (∀g, h ∈ G)(∀x ∈ X) θ(g, θ(h, x)) = θ(gh, x).
This criterion is almost the definition of a homomorphism. However, here we do not
explicitly require θ(g, ·) to be a bijection, but instead require θ(e, ·) to be the identity function.
This automatically ensures that θ (g, ·) is a bijection, since when composed with θ (g−1 , ·),
it gives θ (e, ·), which is the identity. So θ (g, ·) has an inverse. This is usually an easier
thing to show.
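The three conditions are straightforward to verify for a concrete action. A Python sketch (entirely our own example, not from the notes): the cyclic group Z/4Z acting on the corners {1, 2, 3, 4} of a square by rotation:

```python
# C4 = Z/4Z acting on the square's corners {1, 2, 3, 4}:
# theta(g, x) rotates corner x by g steps
def theta(g, x):
    return (x - 1 + g) % 4 + 1

G, X = [0, 1, 2, 3], [1, 2, 3, 4]   # the group operation on G is addition mod 4

assert all(theta(g, x) in X for g in G for x in X)       # condition 1
assert all(theta(0, x) == x for x in X)                  # condition 2: theta(e, x) = x
assert all(theta(g, theta(h, x)) == theta((g + h) % 4, x)
           for g in G for h in G for x in X)             # condition 3
print("action axioms verified")
```

Since condition 2 holds and each θ(g, ·) composed with θ(−g mod 4, ·) gives θ(0, ·), each θ(g, ·) is automatically a bijection, exactly as argued above.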
Example 6.1.1.
(i) Trivial action: for any group G acting on any set X, we can have φ (g) = 1X for all
g, i.e. G does nothing.
(ii) Sn acts on {1, · · · n} by permutation.
(iii) D2n acts on the vertices of a regular n-gon (or the set {1, · · · , n}).
(iv) The rotations of a cube act on the faces/vertices/diagonals/axes of the cube.
Note that different groups can act on the same sets, and the same group can act on
different sets.
Definition 6.1.2 (Kernel of action). The kernel of an action G on X is the kernel of φ , i.e.
all g such that φ (g) = 1X .
Note that by the isomorphism theorem, ker φ ◁ G and G/ker φ is isomorphic to a subgroup of Sym X.
Example 6.1.2.
(i) D2n acting on {1, 2 · · · n} gives φ : D2n → Sn with kernel {e}.
(ii) Let G be the rotations of a cube and let it act on the three axes x, y, z through the
faces. We have φ : G → S3 . Then any rotation by 180◦ doesn’t change the axes,
i.e. act as the identity. So the kernel of the action has at least 4 elements: e and the
three 180◦ rotations. In fact, we’ll see later that these 4 are exactly the kernel.
Definition 6.1.3 (Faithful action). An action is faithful if the kernel is just {e}.
6.2 Orbits and Stabilizers
Definition 6.2.1 (Orbit of action). Given an action G on X, the orbit of an element x ∈ X
is
orb(x) = G(x) = {y ∈ X : (∃g ∈ G) g(x) = y}.
Intuitively, it is the elements that x can possibly get mapped to.
Definition 6.2.2 (Stabilizer of action). The stabilizer of x is
stab(x) = Gx = {g ∈ G : g(x) = x} ⊆ G.
Intuitively, it is the elements in G that do not change x.
Lemma 6.2.1. stab(x) is a subgroup of G.
Proof. We know that e(x) = x by definition, so stab(x) is non-empty. Suppose g, h ∈
stab(x); then gh⁻¹(x) = g(h⁻¹(x)) = g(x) = x. So gh⁻¹ ∈ stab(x), and stab(x) is a subgroup. ■
Example 6.2.1.
(i) Consider D8 acting on the corners of the square, X = {1, 2, 3, 4}. Then orb(1) = X,
since 1 can be sent anywhere by rotations, and stab(1) = {e, the reflection in the
diagonal through 1}.
(ii) Consider the rotations of a cube acting on the three axes x, y, z. Then orb(x) is
everything, and stab(x) contains e, 180◦ rotations and rotations about the x axis.
Definition 6.2.3 (Transitive action). An action G on X is transitive if (∀x) orb(x) = X, i.e.
you can reach any element from any element.
Lemma 6.2.2. The orbits of an action partition X.
Proof. Firstly, (∀x)(x ∈ orb(x)) as e(x) = x. So every x is in some orbit.
Then suppose z ∈ orb(x) and z ∈ orb(y); we have to show that orb(x) = orb(y). We
know that z = g1 (x) and z = g2 (y) for some g1 , g2 . Then g1 (x) = g2 (y) and y = g2−1 g1 (x).
For any w = g3 (y) ∈ orb(y), we have w = g3 g2−1 g1 (x). So w ∈ orb(x). Thus orb(y) ⊆
orb(x) and similarly orb(x) ⊆ orb(y). Therefore orb(x) = orb(y).
■
Suppose a group G acts on X. We fix an x ∈ X. Then by definition of the orbit, given
any g ∈ G, we have g(x) ∈ orb(x). So each g ∈ G gives us a member of orb(x). Conversely,
every object in orb(x) arises this way, by definition of orb(x). However, different elements
of G can give us the same member of orb(x). In particular, if g ∈ stab(x), then hg and h give us the same
object in orb(x), since hg(x) = h(g(x)) = h(x). So we have a correspondence between
members of orb(x) and elements of G, “up to stab(x)”.
Theorem 6.2.1 (Orbit-stabilizer theorem). Let the group G act on X. Then there is a
bijection between orb(x) and cosets of stab(x) in G. In particular, if G is finite, then
| orb(x)|| stab(x)| = |G|.
Proof. We biject the cosets of stab(x) with elements in the orbit of x. Recall that G :
stab(x) is the set of cosets of stab(x). We can define
θ : (G : stab(x)) → orb(x), g stab(x) ↦ g(x).
This is well-defined — if g stab(x) = h stab(x), then h = gk for some k ∈ stab(x). So
h(x) = g(k(x)) = g(x).
This map is surjective since for any y ∈ orb(x), there is some g ∈ G such that g(x) = y,
by definition. Then θ (g stab(x)) = y. It is injective since if g(x) = h(x), then h−1 g(x) = x.
So h−1 g ∈ stab(x). So g stab(x) = h stab(x).
Hence the number of cosets is | orb(x)|. Then the result follows from Lagrange’s
theorem.
■
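The orbit-stabilizer theorem is easy to check by brute force on a small example. The following sketch (not part of the original notes; the permutation encoding is an illustrative choice) verifies | orb(1)|| stab(1)| = |D8 | for D8 acting on the corners of a square:

```python
def compose(p, q):
    # (p . q)(i) = p(q(i)); permutations are tuples on {0, 1, 2, 3}
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    # close a set of permutations under composition
    group = set(gens)
    frontier = set(gens)
    while frontier:
        new = {compose(p, q) for p in frontier for q in group} | \
              {compose(p, q) for p in group for q in frontier}
        frontier = new - group
        group |= frontier
    return group

r = (1, 2, 3, 0)        # rotation of the square, vertices 0, 1, 2, 3
s = (0, 3, 2, 1)        # reflection in the diagonal through vertex 0
D8 = generate([r, s])

orb0  = {g[0] for g in D8}               # orbit of vertex 0
stab0 = {g for g in D8 if g[0] == 0}     # stabilizer of vertex 0

print(len(D8), len(orb0), len(stab0))    # -> 8 4 2, and 8 = 4 * 2
```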
An important application of the orbit-stabilizer theorem is determining group sizes.
To find the order of the symmetry group of, say, a pyramid, we find something for it to act
on, pick a favorite element, and find the orbit and stabilizer sizes.
Example 6.2.2.
(i) Suppose we want to know how big D2n is. D2n acts on the vertices {1, 2, 3, · · · , n}
transitively. So | orb(1)| = n. Also, stab(1) = {e, reflection in the line through 1}.
So |D2n | = | orb(1)|| stab(1)| = 2n.
Note that if the action is transitive, then there is only one orbit, namely the whole of X, and all stabilizers have the same size.
(ii) Let ⟨(1 2)⟩ act on {1, 2, 3}. Then orb(1) = {1, 2} and stab(1) = {e}, while orb(3) = {3}
and stab(3) = ⟨(1 2)⟩.
(iii) Consider S4 acting on {1, 2, 3, 4}. We know that orb(1) = X and |S4 | = 24. So
| stab(1)| = 24/4 = 6. That makes it easier to find stab(1). Clearly S{2,3,4} ≅ S3 fixes 1.
So S{2,3,4} ≤ stab(1). However, |S3 | = 6 = | stab(1)|, so this is all of the stabilizer.
6.3 Important actions
Given any group G, there are a few important actions we can define. In particular, we
will define the conjugation action, which is a very important concept on its own. In fact,
the whole of the next chapter will be devoted to studying conjugation in the symmetric
groups.
First, we will study some less important examples of actions.
Lemma 6.3.1 (Left regular action). Any group G acts on itself by left multiplication. This
action is faithful and transitive.
Proof. We have
1. (∀g ∈ G)(∀x ∈ G) g(x) = g · x ∈ G by definition of a group.
2. (∀x ∈ G) e · x = x by definition of a group.
3. g(hx) = (gh)x by associativity.
So it is an action.
To show that it is faithful, we want to show that [(∀x ∈ G) gx = x] ⇒ g = e. This
follows directly from the uniqueness of the identity.
To show that it is transitive: for any x, y ∈ G, we have (yx−1 )(x) = y. So any x can be sent to any y.
■
Theorem 6.3.1 (Cayley’s theorem). Every group is isomorphic to some subgroup of some
symmetric group.
Proof. Take the left regular action of G on itself. This gives a group homomorphism
φ : G → Sym G with ker φ = {e}, as the action is faithful. By the isomorphism theorem,
G ≅ im φ ≤ Sym G.
■
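Cayley’s theorem can be illustrated computationally. The sketch below (an illustration, not from the notes) builds the left regular action of C4 = {0, 1, 2, 3} under addition mod 4 and checks that it gives an injective homomorphism into Sym(G):

```python
# Left regular action of Z4 = {0, 1, 2, 3} under addition mod 4:
# each g gives the permutation x -> g + x of the underlying set.
G = [0, 1, 2, 3]
perm_of = {g: tuple((g + x) % 4 for x in G) for g in G}

# The map g -> perm_of[g] is a homomorphism into Sym(G) ...
hom_ok = all(
    tuple(perm_of[g][perm_of[h][x]] for x in G) == perm_of[(g + h) % 4]
    for g in G for h in G
)
# ... and it is injective (the action is faithful), so Z4 embeds in S4.
injective = len(set(perm_of.values())) == len(G)
print(hom_ok, injective)  # -> True True
```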
Lemma 6.3.2 (Left coset action). Let H ≤ G. Then G acts on the left cosets of H by left
multiplication transitively.
Proof. First show that it is an action:
1. g(aH) = (ga)H is a coset of H.
2. e(aH) = (ea)H = aH.
3. g1 (g2 (aH)) = g1 ((g2 a)H) = (g1 g2 a)H = (g1 g2 )(aH).
To show that it is transitive, given aH, bH, we know that (ba−1 )(aH) = bH. So any
aH can be mapped to bH.
■
In the boring case where H = {e}, this is just the left regular action, since G/{e} ≅ G.
Definition 6.3.1 (Conjugation of element). The conjugation of a ∈ G by b ∈ G is given
by bab−1 ∈ G. Given any a, c, if there exists some b such that c = bab−1 , then we say a
and c are conjugate.
What is conjugation? This bab−1 form looks familiar from Vectors and Matrices. It is
the formula used for changing basis. If b is the change-of-basis matrix and a is a matrix,
then the matrix in the new basis is given by bab−1 . In this case, bab−1 is the same matrix
viewed from a different basis.
In general, two conjugate elements are “the same” in some sense. For example, we
will later show that in Sn , two elements are conjugate if and only if they have the same
cycle type. Conjugate elements in general have many properties in common, such as their
order.
Lemma 6.3.3 (Conjugation action). Any group G acts on itself by conjugation (i.e. g(x) =
gxg−1 ).
Proof. To show that this is an action, we have
1. g(x) = gxg−1 ∈ G for all g, x ∈ G.
2. e(x) = exe−1 = x
3. g(h(x)) = g(hxh−1 ) = ghxh−1 g−1 = (gh)x(gh)−1 = (gh)(x)
■
Definition 6.3.2 (Conjugacy classes and centralizers). The conjugacy classes are the orbits of the conjugation action:
ccl(a) = {b ∈ G : (∃g ∈ G) gag−1 = b}.
The centralizers are the stabilizers of this action, i.e. elements that commute with a.
CG (a) = {g ∈ G : gag−1 = a} = {g ∈ G : ga = ag}.
The centralizer is defined as the elements that commute with a particular element a.
For the whole group G, we can define the center.
Definition 6.3.3 (Center of group). The center of G is the elements that commute with
all other elements.
Z(G) = {g ∈ G : (∀a) gag−1 = a} = {g ∈ G : (∀a) ga = ag}.
It is sometimes written as C(G) instead of Z(G).
In many ways, conjugation is related to normal subgroups.
Lemma 6.3.4. Let K ◁ G. Then G acts by conjugation on K.
Proof. We only have to prove closure as the other properties follow from the conjugation
action. However, by definition of a normal subgroup, for every g ∈ G, k ∈ K, we have
gkg−1 ∈ K. So it is closed.
■
Proposition 6.3.1. Normal subgroups are exactly those subgroups which are unions of
conjugacy classes.
Proof. Let K ◁ G. If k ∈ K, then by definition for every g ∈ G, we get gkg−1 ∈ K. So
ccl(k) ⊆ K. So K is the union of the conjugacy classes of all its elements.
Conversely, if K is a union of conjugacy classes and a subgroup of G, then for all
k ∈ K, g ∈ G, we have gkg−1 ∈ K. So K is normal.
■
Lemma 6.3.5. Let X be the set of subgroups of G. Then G acts by conjugation on X.
Proof. To show that it is an action, we have
1. If H ≤ G, then we have to show that gHg−1 is also a subgroup. We know that e ∈ H
and thus geg−1 = e ∈ gHg−1 , so gHg−1 is non-empty. For any two elements gag−1
and gbg−1 ∈ gHg−1 , (gag−1 )(gbg−1 )−1 = g(ab−1 )g−1 ∈ gHg−1 . So gHg−1 is a
subgroup.
2. eHe−1 = H.
3. g1 (g2 Hg2−1 )g1−1 = (g1 g2 )H(g1 g2 )−1 .
■
Under this action, normal subgroups have singleton orbits.
Definition 6.3.4 (Normalizer of subgroup). The normalizer of a subgroup is the stabilizer
of the (group) conjugation action.
NG (H) = {g ∈ G : gHg−1 = H}.
We clearly have H ⊆ NG (H). It is easy to show that NG (H) is the largest subgroup of
G in which H is a normal subgroup, hence the name.
There is a connection between actions in general and conjugation of subgroups.
Lemma 6.3.6. Stabilizers of the elements in the same orbit are conjugate, i.e. let G act on
X and let g ∈ G, x ∈ X. Then stab(g(x)) = g stab(x)g−1 .
6.4 Applications
Example 6.4.1. Let G+ be the rotations of a cube acting on the vertices. Let X be the
set of vertices. Then |X| = 8. Since the action is transitive, the orbit of any element is the
whole of X. The stabilizer of vertex 1 is the set of rotations about the diagonal through 1
and the opposite vertex, of which there are 3. So |G+ | = | orb(1)|| stab(1)| = 8 · 3 = 24.
Example 6.4.2. Let G be a finite simple group of order greater than 2, and let H ≤ G have
index n ≠ 1. Then |G| ≤ n!/2.
Proof. Consider the left coset action of G on the cosets of H. We get a group homomorphism φ :
G → Sn , since there are n cosets of H. Since H ≠ G, φ is non-trivial and ker φ ≠ G. Now
ker φ ◁ G. Since G is simple, ker φ = {e}. So G ≅ im φ ≤ Sn by the isomorphism theorem.
So |G| ≤ |Sn | = n!.
We can further refine this by considering sgn ◦ φ : G → {±1}. The kernel of this
composite is normal in G, so K = ker(sgn ◦ φ ) = {e} or G. Since G/K ≅ im(sgn ◦ φ ), we
know that |G|/|K| = 1 or 2, since im(sgn ◦ φ ) has at most two elements. Hence for |G| > 2,
we cannot have K = {e}, or else |G|/|K| = |G| > 2. So we must have K = G, so sgn(φ (g)) = 1
for all g and im φ ≤ An . So |G| ≤ n!/2.
■
We have seen on Sheet 1 that if |G| is even, then G has an element of order 2. In fact,
Theorem 6.4.1 (Cauchy’s Theorem). Let G be a finite group and p a prime dividing |G|.
Then G has an element of order p (in fact, there must be at least p − 1 elements of order
p).
It is important to remember that this only holds for prime p. For example, A4 doesn’t
have an element of order 6 even though 6 | 12 = |A4 |. The converse, however, holds for
any number trivially by Lagrange’s theorem.
Proof. Let G and p be fixed. Consider G p = G × G × · · · × G, the set of p-tuples of elements of G.
Let X ⊆ G p be X = {(a1 , a2 , · · · , a p ) ∈ G p : a1 a2 · · · a p = e}.
In particular, if b has order p, then (b, b, · · · , b) ∈ X. In fact, if (b, b, · · · , b) ∈ X and
b ≠ e, then b has order p, since p is prime.
Now let H = ⟨h : h p = e⟩ ≅ C p be a cyclic group of order p with generator h (this h
is not related to G in any way). Let H act on X by “rotation”:
h(a1 , a2 , · · · , a p ) = (a2 , a3 , · · · , a p , a1 )
This is an action:
1. If a1 · · · a p = e, then a1−1 = a2 · · · a p . So a2 · · · a p a1 = a1−1 a1 = e, which implies
(a2 , a3 , · · · , a p , a1 ) ∈ X.
2. e acts as the identity by construction.
3. The “associativity” condition also holds by construction.
As orbits partition X, the sum of all orbit sizes must be |X|. We know that |X| = |G| p−1 ,
since we can freely choose the first p − 1 entries and the last one must be the inverse of
their product.
Since p divides |G|, p also divides |X|. We have
| orb(a1 , · · · , a p )|| stabH (a1 , · · · , a p )| = |H| = p.
So all orbits have size 1 or p, and the orbit sizes sum to |X|, which is a multiple of p. We know that there
is one orbit of size 1, namely (e, e, · · · , e). So there must be at least p − 1 other orbits of
size 1 for the sum to be divisible by p.
An orbit of size 1 must look like (a, a, · · · , a) for some a ∈ G, and such an a ≠ e has
order p.
■
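The counting argument above can be replayed by machine for a small case. The sketch below (illustrative, not from the notes) takes G = S3 and p = 3, builds X, and counts the fixed points of the rotation action:

```python
from itertools import permutations, product

# G = S3 with permutations as tuples; (p . q)(i) = p(q(i))
G = list(permutations(range(3)))
e = (0, 1, 2)
comp = lambda p, q: tuple(p[i] for i in q)

p = 3  # the prime
X = [t for t in product(G, repeat=p)
     if comp(comp(t[0], t[1]), t[2]) == e]      # a1 a2 a3 = e

# |X| = |G|^(p-1): the first p-1 entries are free, the last is forced
size_ok = len(X) == len(G) ** (p - 1)

# fixed points of the rotation action are the constant tuples in X;
# their number is divisible by p, and the non-identity ones have order p
fixed = [t for t in X if t[0] == t[1] == t[2]]
print(size_ok, len(fixed))  # -> True 3
```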
7. Examples of Groups
In this chapter, we will look at conjugacy classes of Sn and An . It turns out this is easy
for Sn , since two elements are conjugate if and only if they have the same cycle type.
However, it is slightly more complicated in An . This is because while (1 2 3) and (1 3 2)
are conjugate in S3 , the element needed to perform the conjugation might be odd
and hence not in An .
7.1 Conjugacy classes in Sn
Recall σ , τ ∈ Sn are conjugate if ∃ρ ∈ Sn such that ρσ ρ −1 = τ .
We first investigate the special case when σ is a k-cycle.
Proposition 7.1.1. If (a1 a2 · · · ak ) is a k-cycle and ρ ∈ Sn , then ρ (a1 · · · ak )ρ −1 is the
k-cycle (ρ (a1 ) ρ (a2 ) · · · ρ (ak )).
Proof. Consider any ρ (ai ) acted on by ρ (a1 · · · ak )ρ −1 . The three permutations send it
to ρ (ai ) ↦ ai ↦ ai+1 ↦ ρ (ai+1 ), and similarly for the other ai s (indices taken mod k). Since ρ is bijective, any b can
be written as ρ (a) for some a. So the result is the k-cycle (ρ (a1 ) ρ (a2 ) · · · ρ (ak )).
■
Corollary 7.1.1. Two elements in Sn are conjugate iff they have the same cycle type.
Proof. Suppose σ = σ1 σ2 · · · σℓ , where σi are disjoint cycles. Then
ρσ ρ −1 = ρσ1 ρ −1 ρσ2 ρ −1 · · · ρσℓ ρ −1 .
Since the conjugation of a cycle conserves its length, ρσ ρ −1 has the same cycle type.
Conversely, if σ , τ have the same cycle type, say
σ = (a1 a2 · · · ak )(ak+1 · · · ak+ℓ ), τ = (b1 b2 · · · bk )(bk+1 · · · bk+ℓ ),
then letting ρ (ai ) = bi gives ρσ ρ −1 = τ .
■
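The relabelling formula of Proposition 7.1.1 is easy to verify exhaustively for a small case. The sketch below (illustrative, not from the notes) conjugates the 3-cycle (0 1 2) in S4, 0-indexed, by every ρ:

```python
from itertools import permutations

comp = lambda p, q: tuple(p[i] for i in q)   # (p . q)(i) = p(q(i))

def inv(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

# sigma = the 3-cycle (0 1 2) in S4, written as a function
sigma = (1, 2, 0, 3)
ok = True
for rho in permutations(range(4)):
    conj = comp(comp(rho, sigma), inv(rho))   # rho sigma rho^{-1}
    # the conjugate should be the cycle (rho(0) rho(1) rho(2))
    ok &= conj[rho[0]] == rho[1] and conj[rho[1]] == rho[2] and conj[rho[2]] == rho[0]
print(ok)  # -> True
```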
Example 7.1.1. Conjugacy classes of S4 :

Cycle type     Example element   Size of ccl   Size of centralizer   Sign
(1, 1, 1, 1)   e                 1             24                    +1
(2, 1, 1)      (1 2)             6             4                     −1
(2, 2)         (1 2)(3 4)        3             8                     +1
(3, 1)         (1 2 3)           8             3                     +1
(4)            (1 2 3 4)         6             4                     −1
We know that a normal subgroup is a union of conjugacy classes. We can now find
all normal subgroups by finding the possible unions of conjugacy classes whose total cardinality
divides 24. Note that a normal subgroup must contain e.
(i) Order 1: {e}
(ii) Order 2: None
(iii) Order 3: None
(iv) Order 4: {e, (1 2)(3 4), (1 3)(2 4), (1 4)(2 3)} ≅ C2 ×C2 = V4 is a possible candidate.
We can check the group axioms and find that it really is a subgroup.
(v) Order 6: None
(vi) Order 8: None
(vii) Order 12: A4 (We know it is a normal subgroup since it is the kernel of the signature
and/or it has index 2)
(viii) Order 24: S4
We can also obtain the quotients of S4 : S4 /{e} ≅ S4 , S4 /V4 ≅ S3 ≅ D6 , S4 /A4 ≅ C2 ,
and S4 /S4 = {e}.
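This enumeration can be confirmed by brute force. The sketch below (illustrative, not from the notes) computes the conjugacy class sizes of S4 and searches unions of classes for subgroups:

```python
from itertools import permutations, combinations

comp = lambda p, q: tuple(p[i] for i in q)   # (p . q)(i) = p(q(i))

def inv(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

S4 = list(permutations(range(4)))
e = (0, 1, 2, 3)

# conjugacy classes of S4, computed directly
classes = {frozenset(comp(comp(r, s), inv(r)) for r in S4) for s in S4}
sizes = sorted(len(c) for c in classes)

# normal subgroups = unions of classes containing e that are closed
# under composition (a finite closed subset containing e is a subgroup)
normal_orders = set()
for k in range(1, len(classes) + 1):
    for pick in combinations(classes, k):
        u = set().union(*pick)
        if e in u and 24 % len(u) == 0 and all(comp(a, b) in u for a in u for b in u):
            normal_orders.add(len(u))

print(sizes, sorted(normal_orders))  # -> [1, 3, 6, 6, 8] [1, 4, 12, 24]
```

The four orders found correspond exactly to {e}, V4 , A4 and S4 .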
7.2 Conjugacy classes in An
We have seen that |Sn | = 2|An | and that conjugacy classes in Sn are “nice”. How about in
An ?
The first thought is that we write it down:
cclSn (σ ) = {τ ∈ Sn : (∃ρ ∈ Sn ) τ = ρσ ρ −1 }
cclAn (σ ) = {τ ∈ An : (∃ρ ∈ An ) τ = ρσ ρ −1 }
Obviously cclAn (σ ) ⊆ cclSn (σ ), but the converse need not hold, since the element ρ
needed to conjugate σ to τ may be odd and hence not in An .
Example 7.2.1. Consider (1 2 3) and (1 3 2). They are conjugate in S3 by (2 3), but (2 3) ∉
A3 . (This does not automatically entail that they are not conjugate in A3 , because there
might be another even permutation that conjugates (1 2 3) to (1 3 2). In A5 , (2 3)(4 5)
works, but no even permutation works in A3 .)
We can use the orbit-stabilizer theorem:
|Sn | = | cclSn (σ )||CSn (σ )|
|An | = | cclAn (σ )||CAn (σ )|
We know that |An | is half of |Sn | and cclAn (σ ) is contained in cclSn (σ ). So we have two options:
either cclAn (σ ) = cclSn (σ ) and |CAn (σ )| = |CSn (σ )|/2; or | cclAn (σ )| = | cclSn (σ )|/2 and
CAn (σ ) = CSn (σ ).
Definition 7.2.1 (Splitting of conjugacy classes). When | cclAn (σ )| = | cclSn (σ )|/2, we say
that the conjugacy class of σ splits in An .
So the conjugacy classes are either retained or split.
Proposition 7.2.1. For σ ∈ An , the conjugacy class of σ splits in An if and only if no odd
permutation commutes with σ .
Proof. The conjugacy class of σ splits if and only if the centralizer does not.
So instead we check whether the centralizer splits. Clearly CAn (σ ) = CSn (σ ) ∩ An , so
splitting of the centralizer occurs if and only if an odd permutation commutes with σ .
■
Example 7.2.2. Conjugacy classes in A4 :

Cycle type     Example       | cclS4 |   Odd element in CS4 ?   | cclA4 |
(1, 1, 1, 1)   e             1           Yes, (1 2)              1
(2, 2)         (1 2)(3 4)    3           Yes, (1 2)              3
(3, 1)         (1 2 3)       8           No                      4, 4
In the (3, 1) case, by the orbit-stabilizer theorem, |CS4 ((1 2 3))| = 3, which is odd, so the
centralizer cannot split; hence the conjugacy class splits.
Example 7.2.3. Conjugacy classes in A5 :

Cycle type        Example        | cclS5 |   Odd element in CS5 ?   | cclA5 |
(1, 1, 1, 1, 1)   e              1           Yes, (1 2)              1
(2, 2, 1)         (1 2)(3 4)     15          Yes, (1 2)              15
(3, 1, 1)         (1 2 3)        20          Yes, (4 5)              20
(5)               (1 2 3 4 5)    24          No                      12, 12
Since the centralizer of (1 2 3 4 5) has size 5, which is odd, it cannot split, so the conjugacy
class of (1 2 3 4 5) must split.
Lemma 7.2.1. σ = (1 2 3 4 5) ∈ S5 has CS5 (σ ) = ⟨σ ⟩.
Proof. | cclS5 (σ )| = 24 and |S5 | = 120. So |CS5 (σ )| = 5. Clearly ⟨σ ⟩ ⊆ CS5 (σ ). Since
they both have size 5, we know that CS5 (σ ) = ⟨σ ⟩.
■
Theorem 7.2.1. A5 is simple.
Proof. We know that normal subgroups must be unions of conjugacy classes, must
contain e, and their order must divide 60. The possible proper orders are 1, 2, 3, 4, 5, 6, 10, 12,
15, 20, 30. However, no union of the conjugacy classes of sizes 1, 15, 20, 12, 12 that includes
the class {e} adds up to any of these orders apart from 1 (or to 60, giving the whole group). So we only have trivial normal subgroups.
■
In fact, all An for n ≥ 5 are simple, but the proof is horrible (cf. IB Groups, Rings and
Modules).
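For A5 itself, the class sizes and the divisibility argument can be checked directly; the following sketch (illustrative, not from the notes) does both:

```python
from itertools import permutations, combinations

comp = lambda p, q: tuple(p[i] for i in q)   # (p . q)(i) = p(q(i))

def inv(p):
    out = [0] * len(p)
    for i, v in enumerate(p):
        out[v] = i
    return tuple(out)

def sign(p):
    # parity via inversion count
    n = len(p)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])
    return 1 if inversions % 2 == 0 else -1

A5 = [p for p in permutations(range(5)) if sign(p) == 1]
classes = {frozenset(comp(comp(r, s), inv(r)) for r in A5) for s in A5}
sizes = sorted(len(c) for c in classes)

# a union of classes containing {e} can only be a normal subgroup if its
# size divides 60; no sum 1 + (subset of {12, 12, 15, 20}) strictly
# between 1 and 60 does
sums = {1 + sum(pick) for k in range(5) for pick in combinations([12, 12, 15, 20], k)}
proper = sorted(s for s in sums if 1 < s < 60 and 60 % s == 0)
print(sizes, proper)  # -> [1, 12, 12, 15, 20] []
```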
7.3 Quaternions
In the remainder of the course, we will look at different important groups. Here, we will
have a brief look at the quaternions.
Definition 7.3.1 (Quaternions). The quaternions are the set of matrices

±[1 0; 0 1], ±[i 0; 0 −i], ±[0 1; −1 0], ±[0 i; i 0]

(writing a 2 × 2 matrix as [row; row]), which is a subgroup of GL2 (C).
Notation 7.1. We can also write the quaternions as
Q8 = ⟨a, b : a4 = e, b2 = a2 , bab−1 = a−1 ⟩.
Even better, we can write
Q8 = {1, −1, i, −i, j, − j, k, −k}
with
(i) (−1)2 = 1
(ii) i2 = j2 = k2 = −1
(iii) (−1)i = −i etc.
(iv) i j = k, jk = i, ki = j
(v) ji = −k, k j = −i, ik = − j
We have

1 = [1 0; 0 1], i = [i 0; 0 −i], j = [0 1; −1 0], k = [0 i; i 0],

and −1, −i, − j, −k are the negatives of these.
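These matrix identifications can be checked by direct multiplication; the sketch below (illustrative, not from the notes) encodes the four matrices above as tuples and verifies the Q8 relations:

```python
def mul(A, B):
    # 2x2 matrix multiplication on tuples of tuples
    return tuple(tuple(sum(A[r][t] * B[t][c] for t in range(2)) for c in range(2))
                 for r in range(2))

I2 = ((1, 0), (0, 1))
i  = ((1j, 0), (0, -1j))
j  = ((0, 1), (-1, 0))
k  = ((0, 1j), (1j, 0))
neg = lambda A: tuple(tuple(-x for x in row) for row in A)

assert mul(i, i) == mul(j, j) == mul(k, k) == neg(I2)   # i^2 = j^2 = k^2 = -1
assert mul(i, j) == k and mul(j, k) == i and mul(k, i) == j
assert mul(j, i) == neg(k)                              # ji = -k
print("Q8 relations hold")
```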
Lemma 7.3.1. If G has order 8, then either G is abelian (i.e. isomorphic to C8 , C4 ×C2 or C2 ×C2 ×C2 ),
or G is non-abelian and isomorphic to D8 or Q8 (dihedral or quaternion).
Proof. Consider the different possible cases:
– If G contains an element of order 8, then G ≅ C8 .
– If all non-identity elements have order 2, then G is abelian (Sheet 1, Q8). Let a ≠
b ∈ G \ {e}. By the direct product theorem, ⟨a, b⟩ = ⟨a⟩ × ⟨b⟩. Then take c ∉ ⟨a, b⟩.
By the direct product theorem, we obtain ⟨a, b, c⟩ = ⟨a⟩ × ⟨b⟩ × ⟨c⟩ ≅ C2 ×C2 ×C2 .
Since ⟨a, b, c⟩ ⊆ G and |⟨a, b, c⟩| = |G|, we get G = ⟨a, b, c⟩ ≅ C2 ×C2 ×C2 .
– Otherwise, G has no element of order 8 but has an element a ∈ G of order 4. Let H = ⟨a⟩. Since
H has index 2, it is normal in G. So G/H ≅ C2 , since |G/H| = 2. This means that
for any b ∉ H, bH generates G/H. Then (bH)2 = b2 H = H. So b2 ∈ H. Since
b2 ∈ ⟨a⟩ and ⟨a⟩ is a cyclic group, b2 commutes with a.
If b2 = a or a3 , then b has order 8. Contradiction. So b2 = e or a2 .
We also know that H is normal, so bab−1 ∈ H. Let bab−1 = aℓ . Since a and b2
commute, we know that a = b2 ab−2 = b(bab−1 )b−1 = baℓ b−1 = (bab−1 )ℓ = aℓ² .
So ℓ2 ≡ 1 (mod 4), i.e. ℓ ≡ ±1 (mod 4).
◦ When ℓ ≡ 1 (mod 4), bab−1 = a, i.e. ba = ab. So G is abelian.
* If b2 = e, then G = ⟨a, b⟩ ≅ ⟨a⟩ × ⟨b⟩ ≅ C4 ×C2 .
* If b2 = a2 , then (ba−1 )2 = e. So G = ⟨a, ba−1 ⟩ ≅ C4 ×C2 .
◦ When ℓ ≡ −1 (mod 4), bab−1 = a−1 .
* If b2 = e, then G = ⟨a, b : a4 = e = b2 , bab−1 = a−1 ⟩. So G ≅ D8 by
definition.
* If b2 = a2 , then G ≅ Q8 .
■
7.4 Matrix groups
7.4.1 General and special linear groups
Consider Mn×n (F), the set of n × n matrices over the field F = R or C (or F p ). We
know that matrix multiplication is associative (since matrices represent functions) but, in
general, not commutative. To make this a group, we want the identity matrix I to be the
identity. To ensure everything has an inverse, we can only include invertible matrices.
(We do not necessarily need to take I as the identity of the group. We can, for example,
take e = [0 0; 0 1] and obtain a group in which every matrix is of the form [0 0; 0 a] for some
non-zero a. This forms a group, albeit a boring one: it is simply ≅ R∗ .)
Definition 7.4.1 (General linear group GLn (F)).
GLn (F) = {A ∈ Mn×n (F) : A is invertible}
is the general linear group.
Alternatively, we can define GLn (F) as matrices with non-zero determinants.
Proposition 7.4.1. GLn (F) is a group.
Proof. The identity is I, which is in GLn (F) by definition (it is its own inverse). The product of two invertible matrices is invertible, so the set is closed under multiplication. Inverses exist by definition, and matrix multiplication is associative.
■
Proposition 7.4.2. det : GLn (F) → F \ {0} is a surjective group homomorphism.
Proof. Since det AB = det A det B, it is a group homomorphism; if A is invertible, it has
non-zero determinant, so indeed det A ∈ F \ {0}.
To show it is surjective: for any x ∈ F \ {0}, take the identity matrix with the (1, 1) entry
replaced by x; its determinant is x. So it is surjective.
■
Definition 7.4.2 (Special linear group SLn (F)). The special linear group SLn (F) is the
kernel of the determinant, i.e.
SLn (F) = {A ∈ GLn (F) : det A = 1}.
So SLn (F) ◁ GLn (F), as it is a kernel. Note that Q8 ≤ SL2 (C).
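The inclusion Q8 ≤ SL2 (C) amounts to each of the eight quaternion matrices having determinant 1, which is quick to check (illustrative sketch, not from the notes):

```python
# the four quaternion units from Section 7.3, plus their negatives
units = [
    ((1, 0), (0, 1)), ((1j, 0), (0, -1j)),
    ((0, 1), (-1, 0)), ((0, 1j), (1j, 0)),
]
units += [tuple(tuple(-x for x in row) for row in A) for A in units]

det = lambda A: A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(all(det(A) == 1 for A in units))  # -> True, so Q8 lies in SL2(C)
```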
7.4.2 Actions of GLn (C)
Proposition 7.4.3. GLn (C) acts faithfully on Cn by left multiplication to the vector, with
two orbits (0 and everything else).
Proof. First show that it is a group action:
1. If A ∈ GLn (C) and v ∈ Cn , then Av ∈ Cn . So it is closed.
2. Iv = v for all v ∈ Cn .
3. A(Bv) = (AB)v.
Now we prove that it is faithful: a linear map is determined by what it does on a basis.
Take the standard basis e1 = (1, 0, · · · , 0), · · · , en = (0, · · · , 0, 1). Any matrix which maps each
ek to itself must be I (since the columns of a matrix are the images of the basis vectors).
To show that there are two orbits, note that A0 = 0 for all A. Also, as A is invertible,
Av = 0 ⇔ v = 0. So {0} forms a singleton orbit. Then given any two vectors v ≠ w ∈
Cn \ {0}, there is a matrix A ∈ GLn (C) such that Av = w (cf. Vectors and Matrices). ■
Similarly, GLn (R) acts on Rn .
Proposition 7.4.4. GLn (C) acts on Mn×n (C) by conjugation, P(A) = PAP−1 . (The proof is trivial.)
This action can be thought of as a “change of basis” action: two matrices are conjugate if they represent the same map but with respect to different bases, with P the change-of-basis matrix.
From Vectors and Matrices, we know that there are three different types of orbits for
GL2 (C): A is conjugate to a matrix of exactly one of these forms:
(i) [λ 0; 0 µ ] with λ ≠ µ , i.e. two distinct eigenvalues;
(ii) [λ 0; 0 λ ], i.e. a repeated eigenvalue with a 2-dimensional eigenspace;
(iii) [λ 1; 0 λ ], i.e. a repeated eigenvalue with a 1-dimensional eigenspace.
Note that we said there are three types of orbits, not three orbits. There are infinitely many
orbits, e.g. one for each matrix λ I.
7.4.3 Orthogonal groups
Recall that AT is defined by (AT )i j = A ji , i.e. we reflect the matrix in the diagonal. Transposes have
the following properties:
(i) (AB)T = BT AT
(ii) (A−1 )T = (AT )−1
(iii) AT A = I ⇔ AAT = I ⇔ A−1 = AT ; in this case A is orthogonal
(iv) det AT = det A
We now work over R, because orthogonality in this sense doesn’t make sense for complex matrices
(the complex analogue is the unitary group below).
Note that a matrix is orthogonal if and only if its columns (or rows) form an orthonormal basis
of Rn : AT A = I ⇔ ∑k aki ak j = δi j ⇔ ai · a j = δi j , where ai is the ith column of A.
The importance of orthogonal matrices is that they are the isometries of Rn .
Lemma 7.4.1 (Orthogonal matrices are isometries). For any orthogonal A and x, y ∈ Rn ,
we have
(i) (Ax) · (Ay) = x · y
(ii) |Ax| = |x|
Proof. Treat the dot product as a matrix multiplication. So
(Ax)T (Ay) = xT AT Ay = xT Iy = xT y
Then we have |Ax|2 = (Ax) · (Ax) = x · x = |x|2 . Since both are positive, we know that
|Ax| = |x|.
■
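Both parts of the lemma can be checked numerically on a concrete orthogonal matrix (an illustrative sketch, not from the notes; the matrix chosen is a rotation composed with a reflection):

```python
import math, random

# an explicit orthogonal matrix: a rotation in the xy-plane composed with the
# reflection z -> -z (its columns are orthonormal, so A^T A = I)
t = 0.7
A = [[math.cos(t), -math.sin(t), 0],
     [math.sin(t),  math.cos(t), 0],
     [0, 0, -1]]

dot = lambda u, v: sum(p * q for p, q in zip(u, v))
apply_m = lambda M, v: [dot(row, v) for row in M]
norm = lambda v: dot(v, v) ** 0.5

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(3)]
y = [random.uniform(-1, 1) for _ in range(3)]

print(abs(dot(apply_m(A, x), apply_m(A, y)) - dot(x, y)) < 1e-9)  # (Ax).(Ay) = x.y
print(abs(norm(apply_m(A, x)) - norm(x)) < 1e-9)                  # |Ax| = |x|
```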
It is important to note that orthogonal matrices are isometries, but not all isometries are
orthogonal. For example, translations are isometries but are not represented by orthogonal
matrices, since they are not linear maps and cannot be represented by matrices at all!
However, it is true that all linear isometries can be represented by orthogonal matrices.
Definition 7.4.3 (Orthogonal group O(n)). The orthogonal group is
O(n) = On = On (R) = {A ∈ GLn (R) : AT A = I},
i.e. the group of orthogonal matrices.
We will later show that this is the set of matrices that preserve distances in Rn .
Lemma 7.4.2. The orthogonal group is a group.
Proof. We have to check that it is a subgroup of GLn (R): It is non-empty, since I ∈ O(n).
If A, B ∈ O(n), then (AB−1 )(AB−1 )T = AB−1 (B−1 )T AT = AB−1 BA−1 = I, so AB−1 ∈ O(n)
and this is indeed a subgroup.
■
Proposition 7.4.5. det : O(n) → {±1} is a surjective group homomorphism.
Proof. For A ∈ O(n), we know that AT A = I. So det AT A = (det A)2 = 1. So det A = ±1.
Since det(AB) = det A det B, it is a homomorphism. We have det I = 1 and
det diag(−1, 1, · · · , 1) = −1, so it is surjective.
■
Definition 7.4.4 (Special orthogonal group SO(n)). The special orthogonal group is the
kernel of det : O(n) → {±1}.
SO(n) = SOn = SOn (R) = {A ∈ O(n) : det A = 1}.
By the isomorphism theorem, O(n)/SO(n) ≅ C2 .
What’s wrong with matrices with determinant −1? Why do we want to single these
out? An important example of an orthogonal matrix with determinant −1 is a reflection.
These transformations reverse orientation, which is often unwanted.


Lemma 7.4.3. O(n) = SO(n) ∪ diag(−1, 1, · · · , 1) SO(n).
Proof. Cosets partition the group.
■
7.4.4 Rotations and reflections in R2 and R3
Lemma 7.4.4. SO(2) consists of all rotations of R2 around 0.
Proof. Let A ∈ SO(2), so AT A = I and det A = 1. Suppose A = [a b; c d]. Then A−1 =
[d −b; −c a] (using det A = 1). So AT = A−1 implies ad − bc = 1, c = −b and d = a. Combining these
equations, we obtain a2 + c2 = 1. Set a = cos θ = d and c = sin θ = −b; these satisfy
all three equations. So

A = [cos θ −sin θ ; sin θ cos θ ].

Note that A maps (1, 0) to (cos θ , sin θ ) and (0, 1) to (− sin θ , cos θ ), so A represents a
rotation by θ counterclockwise.
■
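A small numerical check (illustrative, not from the notes) confirms that these matrices have determinant 1 and that composing rotations adds angles:

```python
import math

def rot(t):
    # the SO(2) matrix of Lemma 7.4.4
    return ((math.cos(t), -math.sin(t)), (math.sin(t), math.cos(t)))

def mul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
                 for r in range(2))

close = lambda A, B: all(abs(A[r][c] - B[r][c]) < 1e-12
                         for r in range(2) for c in range(2))

a, b = 0.4, 1.1
A = rot(a)
print(close(mul(rot(a), rot(b)), rot(a + b)))              # angles add
print(abs(A[0][0] * A[1][1] - A[0][1] * A[1][0] - 1) < 1e-12)  # det = 1
```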
Corollary 7.4.1. Any matrix in O(2) is either a rotation around 0 or a reflection in a line
through 0.
Proof. If A ∈ SO(2), we’ve shown that it is a rotation. Otherwise, since O(2) = SO(2) ∪
[1 0; 0 −1] SO(2), we have

A = [1 0; 0 −1][cos θ −sin θ ; sin θ cos θ ] = [cos θ −sin θ ; −sin θ −cos θ ].

This has eigenvalues 1, −1. So it is a reflection in the line of the eigenspace E1 . The line
goes through 0 since the eigenspace is a subspace, which must contain 0.
■
Lemma 7.4.5. Every matrix in SO(3) is a rotation around some axis.
Proof. Let A ∈ SO(3). We know that det A = 1 and A is an isometry. The eigenvalues
λ must have |λ | = 1, and they multiply to det A = 1. Since we are in R, complex
eigenvalues come in complex conjugate pairs. If there are complex eigenvalues λ and λ̄ ,
then λ λ̄ = |λ |2 = 1, so the third eigenvalue must be real and has to be +1.
If all eigenvalues are real, then they are each 1 or −1 and must multiply to
1. The possibilities are 1, 1, 1 and −1, −1, 1, both of which contain an eigenvalue 1.
So pick an eigenvector for the eigenvalue 1 as the third basis vector. Then in some
orthonormal basis,

A = [a b 0; c d 0; 0 0 1],

since the third column is the image of the third basis vector, and by orthogonality the
third row is (0, 0, 1). Now let A′ = [a b; c d] ∈ GL2 (R), with det A′ = 1. A′ is still
orthogonal, so A′ ∈ SO(2). Therefore A′ is a rotation and

A = [cos θ −sin θ 0; sin θ cos θ 0; 0 0 1]

in some basis, and this is exactly a rotation about an axis.
■
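The key step, that every A ∈ SO(3) has eigenvalue 1 (a fixed axis), can be checked numerically: det(A − I) should vanish. A sketch (illustrative, not from the notes; the sample matrix is an arbitrary product of coordinate rotations):

```python
import math

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def Rz(t):
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]

def Rx(t):
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# an arbitrary-looking element of SO(3)
A = mul(Rz(0.9), Rx(2.1))
AmI = [[A[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]

print(abs(det3(A) - 1) < 1e-9)   # det A = 1
print(abs(det3(AmI)) < 1e-9)     # 1 is an eigenvalue, so a fixed axis exists
```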
Lemma 7.4.6. Every matrix in O(3) is the product of at most three reflections in planes
through 0.
Note that a rotation is a product of two reflections. This lemma effectively states that
every matrix in O(3) is a reflection, a rotation or a product of a reflection and a rotation.


Proof. Recall O(3) = SO(3) ∪ diag(1, 1, −1) SO(3). If A ∈ SO(3), we know that
A = [cos θ −sin θ 0; sin θ cos θ 0; 0 0 1] in some basis, which is a composite of two
reflections:

A = [cos θ sin θ 0; sin θ −cos θ 0; 0 0 1] [1 0 0; 0 −1 0; 0 0 1].

If instead A ∈ diag(1, 1, −1) SO(3), then A is automatically a product of three reflections.
■
In the last line we’ve shown that everything in O(3) \ SO(3) can be written as a product
of three reflections, but some of them need only one reflection. However, some
matrices do genuinely need 3 reflections, e.g. −I = diag(−1, −1, −1).
7.4.5 Unitary groups
The concept of orthogonal matrices only makes sense for real matrices. For complex matrices, we
instead need unitary matrices. To define them, we replace the transpose with the Hermitian conjugate,
defined by A† = (A∗ )T , i.e. (A† )i j = A∗ji , where the asterisk denotes the complex conjugate. We still have
(i) (AB)† = B† A†
(ii) (A−1 )† = (A† )−1
(iii) A† A = I ⇔ AA† = I ⇔ A† = A−1 . We say A is a unitary matrix
(iv) det A† = (det A)∗
Definition 7.4.5 (Unitary group U(n)). The unitary group is
U(n) = Un = {A ∈ GLn (C) : A† A = I}.
Lemma 7.4.7. det : U(n) → S1 , where S1 is the unit circle in the complex plane, is a
surjective group homomorphism.
Proof. We know that 1 = det I = det(A† A) = | det A|2 . So | det A| = 1, i.e. det A ∈ S1 .
Since det AB = det A det B, it is a group homomorphism.
Now given λ ∈ S1 , we have diag(λ , 1, · · · , 1) ∈ U(n). So it is surjective.
■
Definition 7.4.6 (Special unitary group SU(n)). The special unitary group SU(n) = SUn
is the kernel of det : U(n) → S1 .
Similarly, unitary matrices preserve the complex dot product: (Ax) · (Ay) = x · y.
7.5 More on regular polyhedra
In this section, we will look at the symmetry groups of the cube and the tetrahedron.
7.5.1 Symmetries of the cube
Rotations
Recall that there are |G+ | = 24 rotations of the cube, by the orbit-stabilizer theorem.
Proposition 7.5.1. G+ ≅ S4 , where G+ is the group of all rotations of the cube.
Proof. Consider G+ acting on the 4 diagonals of the cube. This gives a group homomorphism φ : G+ → S4 . We have (1 2 3 4) ∈ im φ by rotation about the axis through the
top and bottom faces. We also have (1 2) ∈ im φ by rotation about the axis through the mid-points of the edges joining diagonals 1 and 2. Since (1 2) and (1 2 3 4) generate S4 (Sheet 2 Q. 5d),
im φ = S4 , i.e. φ is surjective. Since |S4 | = |G+ |, φ must be an isomorphism.
■
All symmetries
Consider the reflection τ in the mid-point of the cube, sending every point to its opposite.
We can view this as −I in R3 , so it commutes with all other symmetries of the cube.
Proposition 7.5.2. G ≅ S4 ×C2 , where G is the group of all symmetries of the cube.
Proof. Let τ be the “reflection in the mid-point” shown above, which commutes with everything.
(Actually it is enough to check that it commutes with rotations only.)
We have to show that G = G+ ⟨τ ⟩. This can be deduced using sizes: since G+ and
⟨τ ⟩ intersect at e only, (i) and (ii) of the Direct Product Theorem give an injective group
homomorphism G+ × ⟨τ ⟩ → G. Since both sides have the same size, the homomorphism
must be surjective as well. So G ≅ G+ × ⟨τ ⟩ ≅ S4 ×C2 .
■
In fact, we have also proved that the group of symmetries of an octahedron is S4 ×C2 ,
since the octahedron is the dual of the cube (if you join the centers of the faces of the
cube, you get an octahedron).
7.5.2 Symmetries of the tetrahedron
Rotations
Let 1, 2, 3, 4 be the vertices (in any order). G+ is just the rotations; let it act on the
vertices. Then orb(1) = {1, 2, 3, 4} and stab(1) = {rotations about the axis through 1 and the
center of the opposite face} = {e, rotation by 2π /3, rotation by 4π /3}.
So |G+ | = 4 · 3 = 12 by the orbit-stabilizer theorem.
The action gives a group homomorphism φ : G+ → S4 . Clearly ker φ = {e}. So
G+ ≤ S4 and G+ has size 12. We “guess” it is A4 (actually it must be A4 , since that is the
only subgroup of S4 of order 12, but it’s nice to see why that’s the case).
If we rotate about the axis through vertex 1, we get (2 3 4) and (2 4 3). Similarly, rotating through
axes through the other vertices gives all 3-cycles.
If we rotate through an axis that passes through the mid-points of two opposite edges, e.g. the 1-2
edge and the 3-4 edge, then we get (1 2)(3 4), and similarly we obtain all double transpositions. So G+ ≅ A4 . This shows that there is no rotation that fixes two vertices and swaps
the other two.
All symmetries
Now consider the plane that goes through 1, 2 and the mid-point of 3 and 4. Reflection
through this plane swaps 3 and 4, but doesn’t change 1, 2.
So now stab(1) = ⟨(2 3 4), (3 4)⟩ ≅ D6 (alternatively, if we want to fix 1, we can move 2, 3, 4 around freely, giving the symmetries of the triangular base).
So |G| = 4 · 6 = 24 and G ≅ S4 (which makes sense, since we can permute the vertices in any way and still have a tetrahedron, so the symmetry group consists of all permutations of the vertices).
7.6 Möbius group
7.6.1 Möbius maps
We want to study maps f : C → C of the form f (z) = (az + b)/(cz + d) with a, b, c, d ∈ C and ad − bc ≠ 0.
We impose ad − bc ≠ 0, or else the map would be constant: for any z, w ∈ C,
f (z) − f (w) = [(az + b)(cw + d) − (aw + b)(cz + d)] / [(cw + d)(cz + d)] = (ad − bc)(z − w) / [(cw + d)(cz + d)].
If ad − bc = 0, then f is constant and boring (more importantly, it would not be invertible).
If c ≠ 0, then f (−d/c) involves division by 0. So we add ∞ to C to form the extended complex plane (Riemann sphere) C ∪ {∞} = C∞ (cf. Vectors and Matrices), and define f (−d/c) = ∞. We call C∞ the one-point compactification of C (because it adds one point to C to make it compact, cf. Metric and Topological Spaces).
Definition 7.6.1 (Möbius map). A Möbius map is a map C∞ → C∞ of the form
f (z) = (az + b)/(cz + d),
where a, b, c, d ∈ C and ad − bc ≠ 0, with f (−d/c) = ∞ and f (∞) = a/c when c ≠ 0 (if c = 0, then f (∞) = ∞).
Lemma 7.6.1. Möbius maps are bijections C∞ → C∞ .
Proof. The inverse of f (z) = (az + b)/(cz + d) is g(z) = (dz − b)/(−cz + a), which we can check by composing both ways. ■
Proposition 7.6.1. The Möbius maps form a group M under function composition (the Möbius group).
Proof. The group axioms are checked as follows:
1. If f1 (z) = (a1 z + b1 )/(c1 z + d1 ) and f2 (z) = (a2 z + b2 )/(c2 z + d2 ), then
f2 ◦ f1 (z) = [a2 (a1 z + b1 )/(c1 z + d1 ) + b2 ] / [c2 (a1 z + b1 )/(c1 z + d1 ) + d2 ] = [(a2 a1 + b2 c1 )z + (a2 b1 + b2 d1 )] / [(c2 a1 + d2 c1 )z + (c2 b1 + d2 d1 )].
Now we have to check that ad − bc ≠ 0 for the composite: we have (a2 a1 + b2 c1 )(c2 b1 + d2 d1 ) − (a2 b1 + b2 d1 )(c2 a1 + d2 c1 ) = (a1 d1 − b1 c1 )(a2 d2 − b2 c2 ) ≠ 0.
(This works for z ≠ ∞, −d1 /c1 . We have to check the special cases manually, which is simply yet more tedious algebra.)
2. The identity is id(z) = (1z + 0)/(0z + 1), which satisfies ad − bc ≠ 0.
3. We have shown above that f −1 (z) = (dz − b)/(−cz + a) with da − bc ≠ 0, so inverses are also Möbius maps.
4. Composition of functions is always associative. ■
M is not abelian: e.g. f1 (z) = 2z and f2 (z) = z + 1 do not commute, since f1 ◦ f2 (z) = 2z + 2 while f2 ◦ f1 (z) = 2z + 1.
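The composition formula above (and the failure of commutativity) can be checked numerically. This is a small illustrative sketch, not part of the notes proper; the helper names `apply` and `compose` are ours. Storing a map as its coefficient tuple makes composition exactly matrix multiplication, anticipating Proposition 7.6.2:

```python
# Compose Möbius maps by multiplying their coefficient matrices.
# A map f(z) = (az+b)/(cz+d) is stored as the tuple (a, b, c, d).

def apply(m, z):
    """Apply the Möbius map with coefficients m = (a, b, c, d) to z."""
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

def compose(m2, m1):
    """Coefficients of f2 ∘ f1, i.e. the matrix product A2 A1."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a2 * a1 + b2 * c1, a2 * b1 + b2 * d1,
            c2 * a1 + d2 * c1, c2 * b1 + d2 * d1)

f1 = (2, 0, 0, 1)   # z ↦ 2z
f2 = (1, 1, 0, 1)   # z ↦ z + 1

z = 3 + 4j
# f1 ∘ f2 is 2z + 2, while f2 ∘ f1 is 2z + 1 — so M is not abelian.
assert apply(compose(f1, f2), z) == 2 * z + 2
assert apply(compose(f2, f1), z) == 2 * z + 1
```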
Note that the point at “infinity” is not special: ∞ is no different from any other point of the Riemann sphere. However, from the way we write down a Möbius map, we have to check ∞ separately. In this particular case, we can get quite far with conventions such as 1/∞ = 0, 1/0 = ∞ and (a · ∞ + b)/(c · ∞ + d) = a/c.
Clearly (az + b)/(cz + d) = (λ az + λ b)/(λ cz + λ d) for any λ ≠ 0. So we do not have a unique representation of a map in terms of a, b, c, d. But a, b, c, d do uniquely determine a Möbius map.
Proposition 7.6.2. The map θ : GL2 (C) → M sending the matrix ( a b ; c d ) (rows separated by a semicolon) to the Möbius map z ↦ (az + b)/(cz + d) is a surjective group homomorphism.
Proof. Firstly, since the determinant ad − bc of any matrix in GL2 (C) is non-zero, every matrix maps to a genuine Möbius map; this also shows that θ is surjective.
We have previously calculated that
θ (A2 ) ◦ θ (A1 ) = [(a2 a1 + b2 c1 )z + (a2 b1 + b2 d1 )] / [(c2 a1 + d2 c1 )z + (c2 b1 + d2 d1 )] = θ (A2 A1 ).
So θ is a homomorphism. ■
The kernel of θ is
ker(θ ) = {A ∈ GL2 (C) : (∀z) z = (az + b)/(cz + d)}.
We can try different values of z: z = ∞ gives c = 0; z = 0 gives b = 0; z = 1 gives d = a. So
ker θ = Z = {λ I : λ ∈ C, λ ≠ 0},
where I is the identity matrix; Z is the centre of GL2 (C). By the isomorphism theorem, we have
M ≅ GL2 (C)/Z.
Definition 7.6.2 (Projective general linear group PGL2 (C)). (Non-examinable) The projective general linear group is
PGL2 (C) = GL2 (C)/Z.
Since fA = fB iff B = λ A for some λ ≠ 0 (where A, B are the matrices corresponding to the maps fA , fB ), if we restrict θ to SL2 (C), the restriction θ |SL2 (C) : SL2 (C) → M is still surjective. Its kernel is now just {±I}. So
M ≅ SL2 (C)/{±I} = PSL2 (C).
Clearly PSL2 (C) ≅ PGL2 (C), since both are isomorphic to the Möbius group.
Proposition 7.6.3. Every Möbius map is a composite of maps of the following forms:
(i) Dilation/rotation: f (z) = az, a ≠ 0
(ii) Translation: f (z) = z + b
(iii) Inversion: f (z) = 1/z
Proof. Let g(z) = (az + b)/(cz + d) ∈ M.
If c = 0, i.e. g(∞) = ∞, then g(z) = (a/d)z + b/d, i.e.
z ↦ (a/d)z ↦ (a/d)z + b/d.
If c ≠ 0, let g(∞) = z0 and let h(z) = 1/(z − z0 ). Then hg(∞) = ∞, so hg is of the above form. We have h−1 (w) = 1/w + z0 , which is of type (iii) followed by type (ii). So g = h−1 (hg) is a composition of maps of the three forms listed above.
Alternatively, with sufficient magic, we have
z ↦ z + d/c ↦ c²(z + d/c) ↦ −(ad − bc)/(c²(z + d/c)) ↦ a/c − (ad − bc)/(c²(z + d/c)) = (az + b)/(cz + d),
where each step is a translation, dilation/rotation or inversion (the third step is an inversion followed by a dilation/rotation). ■
Note that the non-calculation method above gives another (different) composition with the same end result. So the way we compose a Möbius map from the “elementary” maps is not unique.
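The “magic” chain for c ≠ 0 can be checked numerically. The following sketch (our own helper names, arbitrary test coefficients) builds the map from the five elementary steps and compares it with the direct formula:

```python
# Check numerically that a Möbius map with c ≠ 0 factors through the chain
# z ↦ z + d/c ↦ c²(z + d/c) ↦ 1/(...) ↦ −(ad − bc)·(...) ↦ ... + a/c.
# Purely an illustrative sketch; the coefficients are arbitrary test values.

def mobius(a, b, c, d):
    return lambda z: (a * z + b) / (c * z + d)

def as_elementary_chain(a, b, c, d):
    steps = [
        lambda z: z + d / c,             # translation (ii)
        lambda z: (c * c) * z,           # dilation/rotation (i)
        lambda z: 1 / z,                 # inversion (iii)
        lambda z: -(a * d - b * c) * z,  # dilation/rotation (i)
        lambda z: z + a / c,             # translation (ii)
    ]
    def f(z):
        for step in steps:
            z = step(z)
        return z
    return f

a, b, c, d = 2, 1 + 1j, 3, 5
f, g = mobius(a, b, c, d), as_elementary_chain(a, b, c, d)
for z in [0, 1j, 2 - 3j]:
    assert abs(f(z) - g(z)) < 1e-12
```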
7.6.2 Fixed points of Möbius maps
Definition 7.6.3 (Fixed point). A fixed point of f is a z such that f (z) = z.
We know that any Möbius map with c = 0 fixes ∞. We also know that z ↦ z + b for b ≠ 0 fixes ∞ only, whereas z ↦ az for a ≠ 0, 1 fixes exactly 0 and ∞. It turns out that a Möbius map cannot have more than two fixed points, unless it is the identity.
Proposition 7.6.4. Any Möbius map with at least 3 fixed points must be the identity.
Proof. Consider f (z) = (az + b)/(cz + d). Its fixed points are the z satisfying (az + b)/(cz + d) = z, i.e. cz² + (d − a)z − b = 0. A quadratic has at most two roots, unless c = b = 0 and d = a, in which case the equation just says 0 = 0. But if c = b = 0 and d = a, then f is the identity. ■
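The quadratic cz² + (d − a)z − b = 0 gives the fixed points explicitly when c ≠ 0 (so both roots are finite). A quick numerical sketch, with our own helper name:

```python
import cmath

# Fixed points of f(z) = (az+b)/(cz+d) solve cz² + (d−a)z − b = 0.
# Sketch only; c ≠ 0 is assumed, so that both roots are finite.

def fixed_points(a, b, c, d):
    disc = cmath.sqrt((d - a) ** 2 + 4 * b * c)
    return [(-(d - a) + disc) / (2 * c), (-(d - a) - disc) / (2 * c)]

a, b, c, d = 1, 2, 1, 1   # f(z) = (z + 2)/(z + 1), fixed points ±√2
f = lambda z: (a * z + b) / (c * z + d)
for z in fixed_points(a, b, c, d):
    assert abs(f(z) - z) < 1e-12
```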
Proposition 7.6.5. Every Möbius map is conjugate to f (z) = ν z for some ν ≠ 0, or to f (z) = z + 1.
Proof. We use the surjective group homomorphism θ : GL2 (C) → M, under which conjugate matrices map to conjugate Möbius maps. Every matrix in GL2 (C) is conjugate to one of the following types:
( λ 0 ; 0 µ ) ↦ g(z) = (λ z + 0)/(0z + µ) = (λ /µ)z,
( λ 0 ; 0 λ ) ↦ g(z) = (λ z + 0)/(0z + λ ) = 1z,
( λ 1 ; 0 λ ) ↦ g(z) = (λ z + 1)/(0z + λ ) = z + 1/λ .
But the last one is not of the form z + 1. We know that g(z) = z + 1/λ can also be represented by the matrix ( 1 1/λ ; 0 1 ), which is conjugate to ( 1 1 ; 0 1 ) (since that is its Jordan normal form). So z + 1/λ is also conjugate to z + 1. ■
Now we see easily that, for ν ≠ 0, 1, the map ν z has exactly 0 and ∞ as fixed points, while z + 1 fixes only ∞. Does this transfer to their conjugates?
Proposition 7.6.6. Every non-identity Möbius map has exactly 1 or 2 fixed points.
Proof. Given f ∈ M with f ≠ id, there exists h ∈ M such that h f h−1 (z) = ν z (with ν ≠ 0, 1) or h f h−1 (z) = z + 1. Now f (w) = w ⇔ h f (w) = h(w) ⇔ h f h−1 (h(w)) = h(w). So h(w) is a fixed point of h f h−1 , and since h is a bijection, f and h f h−1 have the same number of fixed points.
So f has exactly 2 fixed points if f is conjugate to ν z, and exactly 1 fixed point if f is conjugate to z + 1. ■
Intuitively, conjugation preserves the number of fixed points because applying h f h−1 first moves the Riemann sphere around by h−1 , then applies f (which fixes its fixed points), then restores the Riemann sphere by h. So the fixed points of f have simply been moved around by h.
7.6 Möbius group
61
7.6.3 Permutation properties of Möbius maps
We have seen that a Möbius map with three fixed points is the identity. As a corollary, we obtain the following.
Proposition 7.6.7. Let f , g ∈ M. If there exist distinct z1 , z2 , z3 ∈ C∞ such that f (zi ) = g(zi ), then f = g, i.e. every Möbius map is uniquely determined by its values at three points.
Proof. As Möbius maps are invertible, f (zi ) = g(zi ) can be rewritten as g−1 f (zi ) = zi . So g−1 f has three fixed points, and hence must be the identity. So f = g. ■
Definition 7.6.4 (Three-transitive action). An action of G on X is three-transitive if the induced action on {(x1 , x2 , x3 ) ∈ X 3 : xi pairwise distinct}, given by g(x1 , x2 , x3 ) = (g(x1 ), g(x2 ), g(x3 )), is transitive.
This means that for any two triples x1 , x2 , x3 and y1 , y2 , y3 of distinct elements of X, there exists g ∈ G such that g(xi ) = yi .
If this g is always unique, then the action is called sharply three-transitive.
This is a rather peculiar definition. The reason we raise it here is that the Möbius group satisfies this property.
Proposition 7.6.8. The Möbius group M acts sharply three-transitively on C∞ .
Proof. We want to show that we can send any three distinct points to any other three distinct points. It is easier to first show that we can send any three distinct points to ∞, 0, 1.
Suppose we want to send z1 ↦ ∞, z2 ↦ 0, z3 ↦ 1. Then the following works:
f (z) = [(z − z2 )(z3 − z1 )] / [(z − z1 )(z3 − z2 )].
If any zi is ∞, we simply remove the two factors involving zi ; e.g. if z1 = ∞, we take f (z) = (z − z2 )/(z3 − z2 ).
Now given w1 , w2 , w3 also distinct in C∞ , take g ∈ M sending w1 ↦ ∞, w2 ↦ 0, w3 ↦ 1; then g−1 f sends zi ↦ wi .
The uniqueness of the map follows from the fact that a Möbius map is uniquely determined by its values at three points. ■
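The explicit map in the proof can be written as a coefficient tuple and checked numerically. An illustrative sketch with our own function names; all three points are assumed finite and distinct:

```python
# Construct the Möbius map sending z1 ↦ ∞, z2 ↦ 0, z3 ↦ 1 as a
# coefficient tuple (a, b, c, d), so (az+b)/(cz+d) equals
# (z − z2)(z3 − z1) / ((z − z1)(z3 − z2)).

def to_inf_zero_one(z1, z2, z3):
    a, b = (z3 - z1), -z2 * (z3 - z1)
    c, d = (z3 - z2), -z1 * (z3 - z2)
    return (a, b, c, d)

def apply(m, z):
    a, b, c, d = m
    return (a * z + b) / (c * z + d)

z1, z2, z3 = 2j, -1, 5
m = to_inf_zero_one(z1, z2, z3)
assert abs(apply(m, z2)) < 1e-12         # z2 ↦ 0
assert abs(apply(m, z3) - 1) < 1e-12     # z3 ↦ 1
assert abs(apply(m, z1 + 1e-9)) > 1e6    # near z1 the map blows up (z1 ↦ ∞)
```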
Three points not only determine a Möbius map uniquely; they also uniquely determine a line or circle. Note that on the Riemann sphere, we can think of a line as a circle through infinity, and it would be technically correct to refer to both as “circles”. However, we would rather be clearer and say “line/circle”.
We will see how Möbius maps relate to lines and circles. We will first recap some
knowledge about lines and circles in the complex plane.
Lemma 7.6.2. The general equation of a circle or straight line in C is
Azz̄ + B̄z + Bz̄ + C = 0,
where A, C ∈ R and |B|² > AC.
A = 0 gives a straight line. If A ≠ 0 and B = 0, we have a circle centered at the origin. If C = 0, the circle/line passes through 0.
62
Chapter 7. Examples of Groups
Proof. This comes from noting that |z − a| = r with r > 0 is a circle, and |z − a| = |z − b| with a ≠ b is a line. The detailed proof can be found in Vectors and Matrices. ■
Proposition 7.6.9. Möbius maps send circles/straight lines to circles/straight lines. Note that a Möbius map can send a circle to a straight line and vice versa.
Alternatively: Möbius maps send circles on the Riemann sphere to circles on the Riemann sphere.
Proof. We can calculate directly: from w = (az + b)/(cz + d) we get z = (dw − b)/(−cw + a), and substituting this z into the circle equation gives A′ww̄ + B̄′w + B′w̄ + C′ = 0 with A′, C′ ∈ R.
Alternatively, we know that each Möbius map is a composition of translations, dilations/rotations and the inversion, so it suffices to check these three types. Clearly dilations/rotations and translations map circles/lines to circles/lines. For the inversion, if w = z−1 , then
Azz̄ + B̄z + Bz̄ + C = 0 ⇔ Cww̄ + Bw + B̄w̄ + A = 0. ■
Example 7.6.1. Consider f (z) = (z − i)/(z + i). Where does the real line go? The real line is simply the circle through 0, 1, ∞. f maps this circle to the circle containing f (∞) = 1, f (0) = −1 and f (1) = (1 − i)/(1 + i) = −i, which is the unit circle.
Where does the upper half-plane go? We know that the Möbius map is smooth, so the upper half-plane maps either to the inside or to the outside of the circle. We try the point i, which maps to f (i) = 0. So the upper half-plane is mapped to the inside of the unit circle.
7.6.4 Cross-ratios
Finally, we’ll look at an important concept known as cross-ratios. Roughly speaking, this is a quantity that is preserved by Möbius transforms.
Definition 7.6.5 (Cross-ratio). Given four distinct points z1 , z2 , z3 , z4 ∈ C∞ , their cross-ratio is [z1 , z2 , z3 , z4 ] = g(z4 ), where g is the unique Möbius map that sends z1 ↦ ∞, z2 ↦ 0, z3 ↦ 1. In particular, [∞, 0, 1, λ ] = λ for any λ ≠ ∞, 0, 1. Explicitly,
[z1 , z2 , z3 , z4 ] = (z4 − z2 )/(z4 − z1 ) · (z3 − z1 )/(z3 − z2 )
(with special cases as above when some zi = ∞).
We know that this exists and is uniquely defined since M acts sharply three-transitively on C∞ .
Note that different authors use different permutations of 1, 2, 3, 4 in the definition, but they all lead to the same theory as long as one is consistent.
Lemma 7.6.3. For z1 , z2 , z3 , z4 ∈ C∞ all distinct, we have
[z1 , z2 , z3 , z4 ] = [z2 , z1 , z4 , z3 ] = [z3 , z4 , z1 , z2 ] = [z4 , z3 , z2 , z1 ],
i.e. performing a double transposition on the entries leaves the cross-ratio unchanged.
Proof. By inspection of the formula. ■
Proposition 7.6.10. If f ∈ M, then [z1 , z2 , z3 , z4 ] = [ f (z1 ), f (z2 ), f (z3 ), f (z4 )].
Proof. Use our original definition of the cross-ratio (instead of the formula). Let g be the unique Möbius map such that [z1 , z2 , z3 , z4 ] = g(z4 ) = λ , i.e.
g : z1 ↦ ∞, z2 ↦ 0, z3 ↦ 1, z4 ↦ λ .
Then g f −1 sends
f (z1 ) ↦ z1 ↦ ∞, f (z2 ) ↦ z2 ↦ 0, f (z3 ) ↦ z3 ↦ 1, f (z4 ) ↦ z4 ↦ λ .
So [ f (z1 ), f (z2 ), f (z3 ), f (z4 )] = g f −1 ( f (z4 )) = g(z4 ) = λ . ■
In fact, we can see from this proof that, given z1 , z2 , z3 , z4 all distinct and w1 , w2 , w3 , w4 all distinct in C∞ , there exists f ∈ M with f (zi ) = wi iff [z1 , z2 , z3 , z4 ] = [w1 , w2 , w3 , w4 ].
Corollary 7.6.1. z1 , z2 , z3 , z4 lie on some circle/straight line iff [z1 , z2 , z3 , z4 ] ∈ R.
Proof. Let C be the circle/line through z1 , z2 , z3 , and let g be the unique Möbius map with g(z1 ) = ∞, g(z2 ) = 0, g(z3 ) = 1. Then g(z4 ) = [z1 , z2 , z3 , z4 ] by definition.
Since Möbius maps preserve circles/lines, z4 ∈ C ⇔ g(z4 ) lies on the line through ∞, 0, 1, i.e. g(z4 ) ∈ R. ■
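Both the invariance of the cross-ratio (Proposition 7.6.10) and the concyclicity criterion (Corollary 7.6.1) are easy to check numerically. A sketch with our own function name and arbitrary test points:

```python
# Cross-ratio [z1, z2, z3, z4], plus two sanity checks: invariance under
# a Möbius map, and "real iff concyclic/collinear".  Illustrative only.

def cross_ratio(z1, z2, z3, z4):
    return (z4 - z2) / (z4 - z1) * (z3 - z1) / (z3 - z2)

f = lambda z: (2 * z + 1j) / (z + 3)   # arbitrary Möbius map, ad − bc = 6 − i ≠ 0

zs = [1, 1j, -1, -1j]                  # four points on the unit circle
cr = cross_ratio(*zs)
assert abs(cr.imag) < 1e-12                          # concyclic ⇒ real
assert abs(cross_ratio(*map(f, zs)) - cr) < 1e-12    # invariance under f

zs2 = [0, 1, 2, 1 + 1j]                # fourth point off the line through 0, 1, 2
assert abs(cross_ratio(*zs2).imag) > 1e-6            # not concyclic ⇒ non-real
```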
7.7 Projective line (non-examinable)
We have seen in matrix groups that GL2 (C) acts on C2 , the column vectors. Instead, we can also have GL2 (C) act on the set of 1-dimensional subspaces (i.e. lines through 0) of C2 .
For any non-zero v ∈ C2 , write the line generated by v as ⟨v⟩. Then clearly ⟨v⟩ = {λ v : λ ∈ C}. Now for any A ∈ GL2 (C), define the action by A⟨v⟩ = ⟨Av⟩. We check that this is well-defined: for any ⟨v⟩ = ⟨w⟩, we want ⟨Av⟩ = ⟨Aw⟩. This is true because ⟨v⟩ = ⟨w⟩ if and only if w = λ v for some λ ∈ C \ {0}, and then ⟨Aw⟩ = ⟨A(λ v)⟩ = ⟨λ (Av)⟩ = ⟨Av⟩.
What is the kernel of this action? By definition, the kernel has to fix all lines. In particular, it has to fix the lines generated by (1, 0)ᵀ, (0, 1)ᵀ and (1, 1)ᵀ. Since we want A⟨(1, 0)ᵀ⟩ = ⟨(1, 0)ᵀ⟩, we must have A(1, 0)ᵀ = (λ , 0)ᵀ for some λ . Similarly, A(0, 1)ᵀ = (0, µ)ᵀ. So we can write A = ( λ 0 ; 0 µ ). However, we also need A⟨(1, 1)ᵀ⟩ = ⟨(1, 1)ᵀ⟩. Since A is linear, A(1, 1)ᵀ = A(1, 0)ᵀ + A(0, 1)ᵀ = (λ , µ)ᵀ. For this to be parallel to (1, 1)ᵀ, we must have λ = µ . So A = λ I for some λ ≠ 0. Clearly any matrix of this form fixes every line. So the kernel is Z = {λ I : λ ∈ C \ {0}}.
Note that every line is uniquely determined by its slope. For any v = (v1 , v2 ) and w = (w1 , w2 ), we have ⟨v⟩ = ⟨w⟩ iff v1 /v2 = w1 /w2 (with the convention that the ratio is ∞ when the second coordinate is 0). So we have a one-to-one correspondence between the lines and C∞ , sending ⟨(z1 , z2 )ᵀ⟩ ↔ z1 /z2 .
Finally, for each A ∈ GL2 (C) and any line ⟨(z, 1)ᵀ⟩, we have
( a b ; c d )(z, 1)ᵀ = (az + b, cz + d)ᵀ ↔ (az + b)/(cz + d).
So GL2 (C) acting on the lines is just “the same” as the Möbius group acting on points.
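The correspondence between the matrix action on lines and the Möbius action on slopes can be verified directly. A tiny sketch with our own helper name and an arbitrary invertible matrix:

```python
# GL2(C) acting on the line ⟨(z, 1)ᵀ⟩ reproduces the Möbius map:
# A(z, 1)ᵀ = (az + b, cz + d)ᵀ, whose slope is (az + b)/(cz + d).

def act_on_line(A, z):
    (a, b), (c, d) = A
    v = (a * z + b, c * z + d)      # A applied to the vector (z, 1)
    return v[0] / v[1]              # slope of the image line

A = ((2, 1j), (1, 3))               # arbitrary invertible test matrix
z = 1 - 1j
assert abs(act_on_line(A, z) - (2 * z + 1j) / (z + 3)) < 1e-12
```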
II
Groups II, Rings and Modules
8 Introduction . . . . . . . . . . . . . . . . . . . . . . 69
9 Groups . . . . . . . . . . . . . . . . . . . . . . . . 71
9.1 Basic concepts
9.2 Normal subgroups, quotients, homomorphisms, isomorphisms
9.3 Actions of permutations
9.4 Conjugacy, centralizers and normalizers
9.5 Finite p-groups
9.6 Finite abelian groups
9.7 Sylow theorems
10 Rings . . . . . . . . . . . . . . . . . . . . . . . . 95
10.1 Definitions and examples
10.2 Homomorphisms, ideals, quotients and isomorphisms
10.3 Integral domains, field of fractions, maximal and prime ideals
10.4 Factorization in integral domains
10.5 Factorization in polynomial rings
10.6 Gaussian integers
10.7 Algebraic integers
10.8 Noetherian rings
11 Modules . . . . . . . . . . . . . . . . . . . . . . . . 135
11.1 Definitions and examples
11.2 Direct sums and free modules
11.3 Matrices over Euclidean domains
11.4 Modules over F[X] and normal forms for matrices
11.5 Conjugacy of matrices*
Groups
Basic concepts of group theory recalled from Part IA Groups. Normal subgroups, quotient groups
and isomorphism theorems. Permutation groups. Groups acting on sets, permutation representations. Conjugacy classes, centralizers and normalizers. The centre of a group. Elementary
properties of finite p-groups. Examples of finite linear groups and groups arising from geometry.
Simplicity of An .
Sylow subgroups and Sylow theorems. Applications, groups of small order.
[8]
Rings
Definition and examples of rings (commutative, with 1). Ideals, homomorphisms, quotient rings,
isomorphism theorems. Prime and maximal ideals. Fields. The characteristic of a field. Field of
fractions of an integral domain.
Factorization in rings; units, primes and irreducibles. Unique factorization in principal ideal domains, and in polynomial rings. Gauss’ Lemma and Eisenstein’s irreducibility criterion.
Rings Z[α ] of algebraic integers as subsets of C and quotients of Z[x]. Examples of Euclidean
domains and uniqueness and non-uniqueness of factorization. Factorization in the ring of Gaussian
integers; representation of integers as sums of two squares.
Ideals in polynomial rings. Hilbert basis theorem.
[10]
Modules
Definitions, examples of vector spaces, abelian groups and vector spaces with an endomorphism.
Sub-modules, homomorphisms, quotient modules and direct sums. Equivalence of matrices, canonical form. Structure of finitely generated modules over Euclidean domains, applications to abelian
groups and Jordan normal form.
[6]
8. Introduction
The course is naturally divided into three sections — Groups, Rings, and Modules.
In IA Groups, we learnt about some basic properties of groups, and studied several
interesting groups in depth. In the first part of this course, we will further develop some
general theory of groups. In particular, we will prove two more isomorphism theorems of
groups. While we will not find these theorems particularly useful in this course, we will
be able to formulate analogous theorems for other algebraic structures such as rings and
modules, as we will later find in the course.
In the next part of the course, we will study rings. These are things that behave
somewhat like Z, where we can add, subtract, multiply but not (necessarily) divide. While
Z has many nice properties, these are not necessarily available in arbitrary rings. Hence
we will classify rings into different types, depending on how many properties of Z they
inherit. We can then try to reconstruct certain IA Numbers and Sets results in these rings,
such as unique factorization of numbers into primes and Bézout’s theorem.
Finally, we move on to modules. The definition of a module is very similar to that of
a vector space, except that instead of allowing scalar multiplication by elements of a field,
we have scalar multiplication by elements of a ring. It turns out modules are completely
unlike vector spaces, and can have much more complicated structures. Perhaps because
of this richness, many things turn out to be modules. Using module theory, we will be
able to prove certain important theorems such as the classification of finite abelian groups
and the Jordan normal form theorem.
9. Groups
9.1 Basic concepts
We will begin by quickly recapping some definitions and results from IA Groups.
Definition 9.1.1 (Group). A group is a triple (G, · , e), where G is a set, · : G × G → G is a function and e ∈ G is an element such that
(i) For all a, b, c ∈ G, we have (a · b) · c = a · (b · c). (associativity)
(ii) For all a ∈ G, we have a · e = e · a = a. (identity)
(iii) For all a ∈ G, there exists a−1 ∈ G such that a · a−1 = a−1 · a = e. (inverse)
Some people add a stupid axiom that says g · h ∈ G for all g, h ∈ G, but this is already
implied by saying · is a function to G. You can write that down as well, and no one will
say you are stupid. But they might secretly think so.
Lemma 9.1.1. The inverse of an element is unique.
Proof. Let a−1 , b both be inverses of a. Then
b = b · e = b · (a · a−1 ) = (b · a) · a−1 = e · a−1 = a−1 . ■
Definition 9.1.2 (Subgroup). If (G, · , e) is a group and H ⊆ G is a subset, it is a subgroup
if
(i) e ∈ H,
(ii) a, b ∈ H implies a · b ∈ H,
(iii) · : H × H → H makes (H, · , e) a group.
We write H ≤ G if H is a subgroup of G.
Note that the last condition in some sense encompasses the first two, but we need the
first two conditions to hold before the last statement makes sense at all.
Lemma 9.1.2. H ⊆ G is a subgroup if H is non-empty and for any h1 , h2 ∈ H, we have h1 h2 −1 ∈ H.
Definition 9.1.3 (Abelian group). A group G is abelian if a · b = b · a for all a, b ∈ G.
Example 9.1.1. We have the following familiar examples of groups
(i) (Z, +, 0), (Q, +, 0), (R, +, 0), (C, +, 0).
(ii) We also have groups of symmetries:
(a) The symmetric group Sn is the collection of all permutations of {1, 2, · · · , n}.
(b) The dihedral group D2n is the symmetries of a regular n-gon.
(c) The group GLn (R) is the group of invertible n × n real matrices, which also is
the group of invertible R-linear maps from the vector space Rn to itself.
(iii) The alternating group An ≤ Sn .
(iv) The cyclic group Cn ≤ D2n .
(v) The special linear group SLn (R) ≤ GLn (R), the subgroup of matrices of determinant 1.
(vi) The Klein-four group C2 ×C2 .
(vii) The quaternions Q8 = {±1, ±i, ± j, ±k} with i j = k, ji = −k, i2 = j2 = k2 = −1,
(−1)2 = 1.
With groups and subgroups, we can talk about cosets.
Definition 9.1.4 (Coset). If H ≤ G, g ∈ G, the left coset gH is the set
gH = {x ∈ G : x = g · h for some h ∈ H}.
For example, since H is a subgroup, we know e ∈ H. So for any g ∈ G, we must have
g ∈ gH.
The collection of H-cosets in G forms a partition of G, and furthermore, all H-cosets
gH are in bijection with H itself, via h 7→ gh. An immediate consequence is
Theorem 9.1.1 (Lagrange’s theorem). Let G be a finite group, and H ≤ G. Then
|G| = |H||G : H|,
where |G : H| is the number of H-cosets in G.
We can do exactly the same thing with right cosets and get the same conclusion.
We have implicitly used the following notation:
Definition 9.1.5 (Order of group). The order of a group is the number of elements in G,
written |G|.
Instead of order of the group, we can ask what the order of an element is.
Definition 9.1.6 (Order of element). The order of an element g ∈ G is the smallest positive
n such that gn = e. If there is no such n, we say g has infinite order.
We write ord(g) = n.
A basic lemma is as follows:
Lemma 9.1.3. If G is a finite group and g ∈ G has order n, then n | |G|.
Proof. Consider the subset
H = {e, g, g2 , · · · , gn−1 }.
This is a subgroup of G: it is non-empty, and gr g−s = gr−s is in the list (we might have to add n to the exponent to make it positive, but this is fine since gn = e). Moreover, there are no repeats in the list: if gi = gj with wlog i ≥ j, then gi−j = e with 0 ≤ i − j < n. By definition of n, we must have i − j = 0, i.e. i = j.
Hence Lagrange’s theorem tells us n = |H| divides |G|. ■
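The lemma can be checked by brute force on a small group, say S4. A sketch with our own permutation representation (tuples p with p[i] the image of i):

```python
from itertools import permutations

# Check that ord(g) divides |G| for every g in G = S4 (|G| = 24).

def compose(p, q):                      # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(len(q)))

def order(p):
    e = tuple(range(len(p)))
    n, q = 1, p
    while q != e:
        q, n = compose(p, q), n + 1
    return n

G = list(permutations(range(4)))        # all of S4
assert len(G) == 24
assert all(24 % order(p) == 0 for p in G)
```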
9.2 Normal subgroups, quotients, homomorphisms, isomorphisms
We all (hopefully) recall the definition of a normal subgroup. However, instead of just stating the definition and proving things about it, we can try to motivate the definition, and see how one could naturally come up with it.
Let H ≤ G be a subgroup. The objective is to try to make the collection of cosets
G/H = {gH : g ∈ G}
into a group.
Before we do that, we quickly come up with a criterion for when two cosets gH and g′H are equal. Notice that if gH = g′H, then g ∈ g′H. So g = g′ · h for some h. In other words, (g′)−1 · g = h ∈ H. So if two elements represent the same coset, their “difference” is in H. The argument is also reversible. Hence two elements g, g′ represent the same H-coset if and only if (g′)−1 g ∈ H.
Suppose we try to make the set G/H = {gH : g ∈ G} into a group by the obvious formula
(g1 H) · (g2 H) = g1 g2 H.
However, this doesn’t necessarily make sense: if we take a different representative for the same coset, we want to make sure it gives the same answer.
If g2 H = g2′ H, then we know g2′ = g2 · h for some h ∈ H. So
(g1 H) · (g2′ H) = g1 g2′ H = g1 g2 hH = g1 g2 H = (g1 H) · (g2 H).
So all is good.
What if we change g1 ? If g1 H = g1′ H, then g1′ = g1 · h for some h ∈ H. So
(g1′ H) · (g2 H) = g1′ g2 H = g1 hg2 H.
Now we are stuck. We would really want the equality
g1 hg2 H = g1 g2 H
to hold. This requires
(g1 g2 )−1 g1 hg2 ∈ H.
This is equivalent to
g2 −1 hg2 ∈ H.
So for G/H to actually be a group under this operation, we must have g−1 hg ∈ H for every h ∈ H and g ∈ G.
This is not necessarily true for an arbitrary H. The nice subgroups that satisfy this property are known as normal subgroups.
Definition 9.2.1 (Normal subgroup). A subgroup H ≤ G is normal if for any h ∈ H and
g ∈ G, we have g−1 hg ∈ H. We write H ◁ G.
This allows us to make the following definition:
Definition 9.2.2 (Quotient group). If H ◁ G is a normal subgroup, then the set G/H of
left H-cosets forms a group with multiplication
(g1 H) · (g2 H) = g1 g2 H.
with identity eH = H. This is known as the quotient group.
This is indeed a group. Normality was defined such that this is well-defined. Multiplication is associative since multiplication in G is associative. The inverse of gH is g−1 H,
and eH is easily seen to be the identity.
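Both normality and the well-definedness of coset multiplication can be verified by brute force for a small example, say A3 ◁ S3. A sketch using our own tuple representation of permutations:

```python
from itertools import permutations

# Verify that A3 ◁ S3, and that (g1 H)(g2 H) = g1 g2 H is well-defined,
# i.e. the product coset does not depend on the chosen representatives.

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

S3 = list(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}          # e and the two 3-cycles

# Normality: g⁻¹ h g ∈ A3 for all h ∈ A3, g ∈ S3.
assert all(compose(inverse(g), compose(h, g)) in A3 for h in A3 for g in S3)

coset = lambda g: frozenset(compose(g, h) for h in A3)

# Well-definedness of coset multiplication:
for g1 in S3:
    for g2 in S3:
        for g1p in coset(g1):
            for g2p in coset(g2):
                assert coset(compose(g1, g2)) == coset(compose(g1p, g2p))
```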
So far, we’ve just been looking at groups themselves. We would also like to know
how groups interact with each other. In other words, we want to study functions between
groups. However, we don’t allow arbitrary functions, since groups have some structure,
and we would like the functions to respect the group structures. These nice functions are
known as homomorphisms.
Definition 9.2.3 (Homomorphism). If (G, · , eG ) and (H, ∗, eH ) are groups, a function ϕ : G → H is a homomorphism if ϕ (eG ) = eH , and for all g, g′ ∈ G, we have
ϕ (g · g′ ) = ϕ (g) ∗ ϕ (g′ ).
If we think carefully, ϕ (eG ) = eH can be derived from the second condition, but it doesn’t hurt to include it as well.
Lemma 9.2.1. If ϕ : G → H is a homomorphism, then
ϕ (g−1 ) = ϕ (g)−1 .
Proof. We compute ϕ (g · g−1 ) in two ways. On the one hand, we have
ϕ (g · g−1 ) = ϕ (e) = e.
On the other hand, we have
ϕ (g · g−1 ) = ϕ (g) ∗ ϕ (g−1 ).
By the uniqueness of inverse, we must have
ϕ (g−1 ) = ϕ (g)−1 .
■
9.2 Normal subgroups, quotients, homomorphisms, isomorphisms
75
Given any homomorphism, we can build two subgroups out of it:
Definition 9.2.4 (Kernel). The kernel of a homomorphism ϕ : G → H is
ker(ϕ ) = {g ∈ G : ϕ (g) = e}.
Definition 9.2.5 (Image). The image of a homomorphism ϕ : G → H is
im(ϕ ) = {h ∈ H : h = ϕ (g) for some g ∈ G}.
Lemma 9.2.2. For a homomorphism ϕ : G → H, the kernel ker(ϕ ) is a normal subgroup,
and the image im(ϕ ) is a subgroup of H.
Proof. There is only one possible way we can prove this.
To see ker(ϕ ) is a subgroup, let g, h ∈ ker ϕ . Then
ϕ (g · h−1 ) = ϕ (g) ∗ ϕ (h)−1 = e ∗ e−1 = e.
So gh−1 ∈ ker ϕ . Also, ϕ (e) = e. So ker(ϕ ) is non-empty. So it is a subgroup.
To show it is normal, let g ∈ ker(ϕ ). Let x ∈ G. We want to show x−1 gx ∈ ker(ϕ ). We
have
ϕ (x−1 gx) = ϕ (x−1 ) ∗ ϕ (g) ∗ ϕ (x) = ϕ (x−1 ) ∗ ϕ (x) = ϕ (x−1 x) = ϕ (e) = e.
So x−1 gx ∈ ker(ϕ ). So ker(ϕ ) is normal.
Also, if ϕ (g), ϕ (h) ∈ im(ϕ ), then
ϕ (g) ∗ ϕ (h)−1 = ϕ (gh−1 ) ∈ im(ϕ ).
Also, e ∈ im(ϕ ). So im(ϕ ) is non-empty. So im(ϕ ) is a subgroup.
■
Definition 9.2.6 (Isomorphism). An isomorphism is a homomorphism that is also a bijection.
Definition 9.2.7 (Isomorphic groups). Two groups G and H are isomorphic if there is an isomorphism between them. We write G ≅ H.
Usually, we identify two isomorphic groups as being “the same”, and do not distinguish isomorphic groups.
It is an exercise to show the following:
Lemma 9.2.3. If ϕ is an isomorphism, then the inverse ϕ −1 is also an isomorphism.
When studying groups, it is often helpful to break the group apart into smaller groups,
which are hopefully easier to study. We will have three isomorphism theorems to do so.
These isomorphism theorems tell us what happens when we take quotients of different
things. Then if a miracle happens, we can patch what we know about the quotients together to get information about the big group. Even if miracles do not happen, these are
useful tools to have.
The first isomorphism theorem relates the kernel to the image.
Theorem 9.2.1 (First isomorphism theorem). Let ϕ : G → H be a homomorphism. Then ker(ϕ ) ◁ G and
G/ker(ϕ ) ≅ im(ϕ ).
Proof. We have already proved that ker(ϕ ) is a normal subgroup. We now construct a homomorphism f : G/ker(ϕ ) → im(ϕ ) and prove it is an isomorphism. Define
f : G/ker(ϕ ) → im(ϕ ), g ker(ϕ ) ↦ ϕ (g).
We first tackle the obvious problem that this might not be well-defined, since we are picking a representative for the coset. If g ker(ϕ ) = g′ ker(ϕ ), then we know g−1 · g′ ∈ ker(ϕ ). So ϕ (g−1 · g′ ) = e, i.e.
e = ϕ (g−1 · g′ ) = ϕ (g)−1 ∗ ϕ (g′ ).
Multiplying the whole thing by ϕ (g) gives ϕ (g) = ϕ (g′ ). Hence this function is well-defined.
Next we show it is a homomorphism:
f (g ker(ϕ ) · g′ ker(ϕ )) = f (gg′ ker(ϕ )) = ϕ (gg′ ) = ϕ (g) ∗ ϕ (g′ ) = f (g ker(ϕ )) ∗ f (g′ ker(ϕ )).
Finally, we show it is a bijection. To show surjectivity, let h ∈ im(ϕ ). Then h = ϕ (g) for some g, so h = f (g ker(ϕ )) is in the image of f .
To show injectivity, suppose f (g ker(ϕ )) = f (g′ ker(ϕ )). Then ϕ (g) = ϕ (g′ ), so ϕ (g−1 · g′ ) = e. Hence g−1 · g′ ∈ ker(ϕ ), and hence g ker(ϕ ) = g′ ker(ϕ ). So done. ■
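The size bookkeeping of the theorem, |G/ker ϕ | = |im ϕ |, is easy to verify in a small concrete case. A sketch for the hypothetical example ϕ : Z/12 → Z/4, n ↦ n mod 4 (well-defined since 4 | 12):

```python
# Numerical sanity check of the first isomorphism theorem for
# ϕ : Z/12 → Z/4, n ↦ n mod 4.

G = range(12)
phi = lambda n: n % 4

# ϕ is a homomorphism: ϕ(a + b) = ϕ(a) + ϕ(b) in Z/4.
assert all(phi((a + b) % 12) == (phi(a) + phi(b)) % 4 for a in G for b in G)

kernel = [n for n in G if phi(n) == 0]
image = {phi(n) for n in G}

assert kernel == [0, 4, 8]
# |G / ker ϕ| = |G| / |ker ϕ| = |im ϕ|, as the theorem predicts.
assert len(G) // len(kernel) == len(image) == 4
```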
Before we move on to further isomorphism theorems, we see how we can use these to
identify two groups which are not obviously the same.
Example 9.2.1. Consider the map ϕ : C → C \ {0} given by z ↦ ez . Since ez+w = ez ew , this is a homomorphism if we think of it as ϕ : (C, +) → (C \ {0}, ×).
What is the image of this homomorphism? The existence of log shows that ϕ is surjective. So im ϕ = C \ {0}. What about the kernel? It is given by
ker(ϕ ) = {z ∈ C : ez = 1} = 2π iZ,
i.e. the set of all integer multiples of 2π i. The conclusion is that
(C/2π iZ, +) ≅ (C \ {0}, ×).
The second isomorphism theorem is a slightly more complicated theorem.
Theorem 9.2.2 (Second isomorphism theorem). Let H ≤ G and K ◁ G. Then HK = {h · k : h ∈ H, k ∈ K} is a subgroup of G, and H ∩ K ◁ H. Moreover,
HK/K ≅ H/(H ∩ K).
Proof. Let hk, h′k′ ∈ HK. Then
h′k′(hk)−1 = h′k′k−1 h−1 = (h′h−1 )(h k′k−1 h−1 ).
The first factor is in H, while the second is k′k−1 ∈ K conjugated by h, which also has to be in K by normality. So this is something in H times something in K, and hence in HK. HK also contains e, and is hence a subgroup.
To show H ∩ K ◁ H, consider x ∈ H ∩ K and h ∈ H, and look at h−1 xh. Since x ∈ K, the normality of K implies h−1 xh ∈ K. Also, since x, h ∈ H, closure implies h−1 xh ∈ H. So h−1 xh ∈ H ∩ K. So H ∩ K ◁ H.
Now we can prove the isomorphism by applying the first isomorphism theorem. Define
ϕ : H → G/K, h ↦ hK.
This is easily seen to be a homomorphism. Its image is the set of all K-cosets represented by something in H, i.e.
im(ϕ ) = HK/K.
Its kernel is
ker(ϕ ) = {h ∈ H : hK = eK} = {h ∈ H : h ∈ K} = H ∩ K.
So the first isomorphism theorem says H/(H ∩ K) ≅ HK/K. ■
Notice we did more work than we really had to. We could have started by writing
down ϕ and checked it is a homomorphism. Then since H ∩ K is its kernel, it has to be a
normal subgroup.
Before we move on to the third isomorphism theorem, we notice that if K ◁ G, then
there is a bijection between subgroups of G/K and subgroups of G containing K, given
by
{subgroups of G/K} ←→ {subgroups of G which contain K}
G
X ≤ −→ {g ∈ G : gK ∈ X}
K
G
L
≤ ←− K ◁ L ≤ G.
K K
78
Chapter 9. Groups
This specializes to the bijection of normal subgroups:
{normal subgroups of G/K} ←→ {normal subgroups of G which contain K}
using the same bijection.
It is an elementary exercise to show that these are inverses of each other. This correspondence will be useful later.
Theorem 9.2.3 (Third isomorphism theorem). Let K ≤ L ≤ G be normal subgroups of G.
Then
(G/K)/(L/K) ≅ G/L.
Proof. Define the homomorphism
ϕ : G/K → G/L
gK ↦ gL.
As always, we have to check this is well-defined. If gK = g′K, then g−1 g′ ∈ K ⊆ L. So
gL = g′L. This is also a homomorphism since
ϕ(gK · g′K) = ϕ(gg′K) = gg′L = (gL) · (g′L) = ϕ(gK) · ϕ(g′K).
This is clearly surjective, since any coset gL is the image of gK under ϕ. So the image is G/L.
The kernel is then
ker(ϕ) = {gK : gL = L} = {gK : g ∈ L} = L/K.
So the conclusion follows by the first isomorphism theorem.
■
The general idea of these theorems is to take a group, find a normal subgroup, and
then quotient it out. Then hopefully the normal subgroup and the quotient group will be
simpler. However, this doesn’t always work.
Definition 9.2.8 (Simple group). A (non-trivial) group G is simple if it has no normal
subgroups except {e} and G.
In general, simple groups are complicated. However, if we only look at abelian groups,
then life is simpler. Note that by commutativity, the normality condition is always trivially
satisfied. So any subgroup is normal. Hence an abelian group can be simple only if it has
no non-trivial subgroups at all.
Lemma 9.2.4. An abelian group is simple if and only if it is isomorphic to the cyclic
group C p for some prime number p.
Proof. By Lagrange’s theorem, any subgroup of Cp has order dividing |Cp | = p. Since
p is prime, its only divisors are 1 and p, so any subgroup must have order 1 or p, i.e. it
is either {e} or Cp itself. Hence in particular any normal subgroup must be {e} or Cp . So
it is simple.
Now suppose G is abelian and simple. Let e ≠ g ∈ G be a non-trivial element, and
consider H = {· · · , g−2 , g−1 , e, g, g2 , · · · }. Since G is abelian, conjugation does nothing,
and every subgroup is normal. So H is a normal subgroup. As G is simple, H = {e} or
H = G. Since it contains g ≠ e, it is non-trivial. So we must have H = G. So G is cyclic.
If G is infinite cyclic, then it is isomorphic to Z. But Z is not simple, since 2Z ◁ Z.
So G is a finite cyclic group, i.e. G ≅ Cm for some finite m.
If n | m, then g^{m/n} generates a subgroup of G of order n. So this is a normal subgroup.
Therefore n must be m or 1. Hence G cannot be simple unless m has no divisors except 1
and m, i.e. m is a prime.
■
One reason why simple groups are important is the following:
Theorem 9.2.4. Let G be any finite group. Then there are subgroups
G = H1 ▷ H2 ▷ H3 ▷ H4 ▷ · · · ▷ Hn = {e}.
such that Hi /Hi+1 is simple.
Note that here we only claim that Hi+1 is normal in Hi . This does not say that, say, H3
is a normal subgroup of H1 .
Proof. If G is simple, let H2 = {e}. Then we are done.
If G is not simple, let H2 be a maximal proper normal subgroup of G. We now claim
that G/H2 is simple.
If G/H2 is not simple, it contains a proper non-trivial normal subgroup L ◁ G/H2 such
that L ≠ {e}, G/H2 . However, there is a correspondence between normal subgroups of
G/H2 and normal subgroups of G containing H2 . So L must be K/H2 for some K ◁ G
such that K ≥ H2 . Moreover, since L is non-trivial and not G/H2 , we know K is not G or
H2 . So K is a larger normal subgroup. Contradiction.
So we have found an H2 ◁ G such that G/H2 is simple. Iterating this process on H2
gives the desired result. Note that this process eventually stops, as Hi+1 < Hi , and hence
|Hi+1 | < |Hi |, and all these numbers are finite.
■
9.3 Actions of permutations
When we first motivated groups, we wanted to use them to represent some collection of
“symmetries”. Roughly, a symmetry of a set X is a permutation of X, i.e. a bijection
X → X that leaves some nice properties unchanged. For example, a symmetry of a square
is a permutation of the vertices that leaves the overall shape of the square unchanged.
Instead of just picking some nice permutations, we can consider the group of all permutations. We call this the symmetric group.
Definition 9.3.1 (Symmetric group). The symmetric group Sn is the group of all permutations of {1, · · · , n}, i.e. the set of all bijections of this set with itself.
A convenient way of writing permutations is to use the disjoint cycle notation, such
as writing (1 2 3)(4 5)(6) for the permutation that maps
1 ↦ 2,   2 ↦ 3,   3 ↦ 1,   4 ↦ 5,   5 ↦ 4,   6 ↦ 6.
Unfortunately, the convention for writing permutations is weird. Since permutations are
bijections, and hence functions, they are multiplied the wrong way, i.e. f ◦ g means first
apply g, then apply f . In particular, (1 2 3)(3 4) requires first applying the second permutation, then the first, and is in fact (1 2 3 4).
We know that any permutation is a product of transpositions. Hence we make the
following definition.
Definition 9.3.2 (Even and odd permutation). A permutation σ ∈ Sn is even if it can be
written as a product of evenly many transpositions; odd otherwise.
In IA Groups, we spent a lot of time proving this is well-defined, and we are not
doing that again (note that this definition by itself is well-defined: if a permutation can
be written both as a product of an even number of transpositions and of an odd number
of transpositions, the definition says it is even. However, this is not what we really want,
since we cannot immediately conclude that, say, (1 2) is odd).
This allows us to define the homomorphism
sgn : Sn → ({±1}, ×),   σ ↦ +1 if σ is even, and σ ↦ −1 if σ is odd.
Definition 9.3.3 (Alternating group). The alternating group An ≤ Sn is the subgroup of
even permutations, i.e. An is the kernel of sgn.
This immediately tells us An ◁ Sn , and we can immediately work out its index, since
Sn /An ≅ im(sgn) = {±1},
unless n = 1. So An has index 2.
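The parity of a permutation can also be computed by counting inversions, which agrees with the transposition-count definition. A small sketch confirming that exactly half of S4 is even (the function name `sign` is our own):

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation given as a tuple (p[0], ..., p[n-1]) of 1..n,
    computed by counting inversions: sgn = (-1)^(number of inversions)."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

# sgn is surjective onto {+-1} for n >= 2, so its kernel A_n has index 2:
perms = list(permutations(range(1, 5)))
assert sum(1 for p in perms if sign(p) == 1) == len(perms) // 2  # |A_4| = 12
```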
More generally, for a set X, we can define its symmetric group as follows:
Definition 9.3.4 (Symmetric group of X). Let X be a set. We write Sym(X) for the group
of all permutations of X.
However, we don’t always want the whole symmetric group. Sometimes, we just
want some subgroups of symmetric groups, as in our initial motivation. So we make the
following definition.
Definition 9.3.5 (Permutation group). A group G is called a permutation group if it is a
subgroup of Sym(X) for some X, i.e. it is given by some, but not necessarily all, permutations of some set.
We say G is a permutation group of order n if in addition |X| = n.
This is not a very interesting definition by itself, since, as we will soon see, every group
is (isomorphic to) a permutation group. However, in some cases, thinking of a group as a
permutation group of some object gives us better intuition on what the group is about.
Example 9.3.1. Sn and An are obviously permutation groups. Also, the dihedral group
D2n is a permutation group of order n, viewing its elements as permutations of the vertices
of a regular n-gon.
We would next want to recover the idea of a group being a “permutation”. If G ≤
Sym(X), then each g ∈ G should be able to give us a permutation of X, in a way that is
consistent with the group structure. We say the group G acts on X. In general, we make
the following definition:
Definition 9.3.6 (Group action). An action of a group (G, · ) on a set X is a function
∗ : G×X → X
such that
(i) g1 ∗ (g2 ∗ x) = (g1 · g2 ) ∗ x for all g1 , g2 ∈ G and x ∈ X.
(ii) e ∗ x = x for all x ∈ X.
There is another way of defining group actions, which is arguably a better way of
thinking about group actions.
Lemma 9.3.1. An action of G on X is equivalent to a homomorphism ϕ : G → Sym(X).
Note that the statement by itself is useless, since it doesn’t tell us how to translate
between the homomorphism and a group action. The important part is the proof.
Proof. Let ∗ : G × X → X be an action. Define ϕ : G → Sym(X) by sending g to the
function ϕ (g) = (g ∗ · : X → X). This is indeed a permutation — g−1 ∗ · is an inverse
since
ϕ (g−1 )(ϕ (g)(x)) = g−1 ∗ (g ∗ x) = (g−1 · g) ∗ x = e ∗ x = x,
and a similar argument shows ϕ (g) ◦ ϕ (g−1 ) = idX . So ϕ is at least a well-defined function.
To show it is a homomorphism, just note that
ϕ (g1 )(ϕ (g2 )(x)) = g1 ∗ (g2 ∗ x) = (g1 · g2 ) ∗ x = ϕ (g1 · g2 )(x).
Since this is true for all x ∈ X, we know ϕ (g1 ) ◦ ϕ (g2 ) = ϕ (g1 · g2 ). Also, ϕ (e)(x) =
e ∗ x = x. So ϕ (e) is indeed the identity. Hence ϕ is a homomorphism.
We now do the same thing backwards. Given a homomorphism ϕ : G → Sym(X),
define a function by g ∗ x = ϕ (g)(x). We now check it is indeed a group action. Using the
definition of a homomorphism, we know
(i) g1 ∗ (g2 ∗ x) = ϕ (g1 )(ϕ (g2 )(x)) = (ϕ (g1 ) ◦ ϕ (g2 ))(x) = ϕ (g1 · g2 )(x) = (g1 · g2 ) ∗ x.
(ii) e ∗ x = ϕ (e)(x) = idX (x) = x.
So this homomorphism gives a group action. These two operations are clearly inverses to
each other. So group actions of G on X are the same as homomorphisms G → Sym(X).
■
Definition 9.3.7 (Permutation representation). A permutation representation of a group
G is a homomorphism G → Sym(X).
We have thus shown that a permutation representation is the same as a group action.
The good thing about thinking of group actions as homomorphisms is that we can use
all we know about homomorphisms on them.
Notation 9.1. For an action of G on X given by ϕ : G → Sym(X), we write G^X = im(ϕ)
and G_X = ker(ϕ).
The first isomorphism theorem immediately gives
Proposition 9.3.1. G_X ◁ G and G/G_X ≅ G^X.
In particular, if G_X = {e} is trivial, then G ≅ G^X ≤ Sym(X).
Example 9.3.2. Let G be the group of symmetries of a cube. Let X be the set of diagonals
of the cube.
Then G acts on X, and so we get ϕ : G → Sym(X). What is its kernel? To preserve
the diagonals, a symmetry either does nothing to a diagonal, or flips its two endpoints. So
G_X = ker(ϕ) = {id, the symmetry sending each vertex to its opposite} ≅ C2 .
How about the image? We have G^X = im(ϕ) ≤ Sym(X) ≅ S4 . It is an exercise to
show that im(ϕ) = Sym(X), i.e. that ϕ is surjective. We are not proving this because this
is an exercise in geometry, not group theory. Then the first isomorphism theorem tells us
G^X ≅ G/G_X .
So
|G| = |G^X ||G_X | = 4! · 2 = 48.
This is an example of how we can use group actions to count elements in a group.
Example 9.3.3 (Cayley’s theorem). For any group G, we have an action of G on G itself
via
g ∗ g1 = gg1 .
It is trivial to check this is indeed an action. This gives a group homomorphism ϕ : G →
Sym(G). What is its kernel? If g ∈ ker(ϕ ), then it acts trivially on every element. In
particular, it acts trivially on the identity. So g ∗ e = e, which means g = e. So ker(ϕ ) =
{e}. By the first isomorphism theorem, we get
G ≅ G/{e} ≅ im ϕ ≤ Sym(G).
So we know every group is (isomorphic to) a subgroup of a symmetric group.
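Cayley’s theorem is concrete enough to verify directly for a small group. A sketch for the cyclic group Z/6, written additively (all names here are our own, not from the text): each g gives the left-translation permutation h ↦ g + h, and g ↦ (that permutation) is an injective homomorphism into Sym(G).

```python
n = 6
G = range(n)
# rho[g] is the left translation by g, recorded as a tuple with rho[g][h] = g + h (mod n).
rho = {g: tuple((g + h) % n for h in G) for g in G}

# Injectivity (trivial kernel): distinct g give distinct permutations.
assert len(set(rho.values())) == n

# Homomorphism: translating by g2 and then by g1 is translating by g1 + g2.
for g1 in G:
    for g2 in G:
        composed = tuple(rho[g1][rho[g2][h]] for h in G)
        assert composed == rho[(g1 + g2) % n]
```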
Example 9.3.4. Let H be a subgroup of G, and X = G/H be the set of left cosets of H.
We let G act on X via
g ∗ g1 H = gg1 H.
It is easy to check this is well-defined and is indeed a group action. So we get ϕ : G →
Sym(X).
Now consider G_X = ker(ϕ). If g ∈ G_X , then for every g1 ∈ G, we have g ∗ g1 H = g1 H.
This means g1−1 gg1 ∈ H. In other words, we have
g ∈ g1 Hg1−1 .
This has to happen for all g1 ∈ G. So
G_X ⊆ ⋂_{g1 ∈ G} g1 Hg1−1 .
This argument is completely reversible: if g ∈ ⋂_{g1 ∈ G} g1 Hg1−1 , then for each g1 ∈ G, we
know
g1−1 gg1 ∈ H,
and hence
g ∗ g1 H = gg1 H = g1 H.
So g ∈ G_X . Hence we indeed have equality:
ker(ϕ) = G_X = ⋂_{g1 ∈ G} g1 Hg1−1 .
Since this is a kernel, this is a normal subgroup of G, and is contained in H. Starting with
an arbitrary subgroup H, this allows us to generate a normal subgroup, and this is indeed
the biggest normal subgroup of G that is contained in H, if we stare at it long enough.
We can use this to prove the following theorem.
Theorem 9.3.1. Let G be a finite group, and H ≤ G a subgroup of index n. Then there is
a normal subgroup K ◁ G with K ≤ H such that G/K is isomorphic to a subgroup of Sn .
Hence |G/K| | n! and |G/K| ≥ n.
Proof. We apply the previous example, giving ϕ : G → Sym(G/H), and let K be the kernel
of this homomorphism. We have already shown that K ≤ H. Then the first isomorphism
theorem gives
G/K ≅ im ϕ ≤ Sym(G/H) ≅ Sn .
Then by Lagrange’s theorem, we know |G/K| | |Sn | = n!, and we also have |G/K| ≥
|G/H| = n.
■
Corollary 9.3.1. Let G be a non-abelian simple group. Let H ≤ G be a proper subgroup
of index n. Then G is isomorphic to a subgroup of An . Moreover, we must have n ≥ 5, i.e.
G cannot have a subgroup of index less than 5.
Proof. The action of G on X = G/H gives a homomorphism ϕ : G → Sym(X). Then
ker(ϕ ) ◁ G. Since G is simple, ker(ϕ ) is either G or {e}. We first show that it cannot be
G. If ker(ϕ ) = G, then every element of G acts trivially on X = G/H. But if g ∈ G \ H,
which exists since the index of H is not 1, then g ∗ H = gH ≠ H. So g does not act trivially.
So the kernel cannot be the whole of G. Hence ker(ϕ ) = {e}.
Thus by the first isomorphism theorem, we get
G ≅ im(ϕ) ≤ Sym(X) ≅ Sn .
We now need to show that G is in fact a subgroup of An .
We know An ◁ Sn . So im(ϕ) ∩ An ◁ im(ϕ) ≅ G. As G is simple, im(ϕ) ∩ An is either
{e} or im(ϕ). We want to show that the second thing happens, i.e. the intersection is
not the trivial group. We use the second isomorphism theorem. If im(ϕ) ∩ An = {e}, then
im(ϕ) ≅ im(ϕ)/(im(ϕ) ∩ An ) ≅ im(ϕ)An /An ≤ Sn /An ≅ C2 .
So G ≅ im(ϕ) is a subgroup of C2 , i.e. either {e} or C2 itself. Neither of these is non-abelian.
So this cannot be the case. So we must have im(ϕ) ∩ An = im(ϕ), i.e. im(ϕ) ≤ An .
The last part follows from the fact that S1 , S2 , S3 , S4 have no non-abelian simple subgroups, which you can check by going to a quiet room and listing out all their subgroups.
■
Let’s recall some old definitions from IA Groups.
Definition 9.3.8 (Orbit). If G acts on a set X, the orbit of x ∈ X is
G · x = {g ∗ x ∈ X : g ∈ G}.
Definition 9.3.9 (Stabilizer). If G acts on a set X, the stabilizer of x ∈ X is
Gx = {g ∈ G : g ∗ x = x}.
The main theorem about these concepts is the orbit-stabilizer theorem.
Theorem 9.3.2 (Orbit-stabilizer theorem). Let G act on X. Then for any x ∈ X, there is a
bijection between G · x and G/Gx , given by g ∗ x ↔ gGx .
In particular, if G is finite, it follows that
|G| = |Gx ||G · x|.
It takes some work to show this is well-defined and a bijection, but you’ve done it in
IA Groups. In IA Groups, you probably learnt the second statement instead, but this result
is more generally true for infinite groups.
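The orbit-stabilizer theorem can be checked on a small example, say the dihedral group of the square acting on its four vertices. A sketch (the generators and helper names are our own choices, not from the text):

```python
# The dihedral group of the square as permutations of the vertices {0,1,2,3},
# recorded as tuples p with p[i] the image of vertex i.
r = (1, 2, 3, 0)          # rotation: i -> i+1 (mod 4)
s = (0, 3, 2, 1)          # reflection fixing vertices 0 and 2

def mult(p, q):
    """Compose permutations: apply q first, then p."""
    return tuple(p[q[i]] for i in range(4))

# Generate the whole group by closing {r, s} under multiplication.
G = {(0, 1, 2, 3)}
frontier = [r, s]
while frontier:
    g = frontier.pop()
    if g not in G:
        G.add(g)
        frontier.extend(mult(g, h) for h in (r, s))
assert len(G) == 8

x = 0
orbit = {g[x] for g in G}              # all 4 vertices
stab = {g for g in G if g[x] == x}     # identity and the reflection s
assert len(G) == len(stab) * len(orbit)  # 8 = 2 * 4
```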
9.4 Conjugacy, centralizers and normalizers
We have seen that every group acts on itself by multiplying on the left. A group G can
also act on itself in a different way, by conjugation:
g ∗ g1 = gg1 g−1 .
Let ϕ : G → Sym(G) be the associated permutation representation. We know, by definition, that ϕ (g) is a bijection from G to G as sets. However, here G is not an arbitrary
set, but is a group. A natural question to ask is whether ϕ (g) is a homomorphism or not.
Indeed, we have
ϕ (g)(g1 · g2 ) = gg1 g2 g−1 = (gg1 g−1 )(gg2 g−1 ) = ϕ (g)(g1 )ϕ (g)(g2 ).
So ϕ (g) is a homomorphism from G to G. Since ϕ (g) is bijective (as in any group action),
it is in fact an isomorphism.
Thus, for any group G, there are many isomorphisms from G to itself, one for every
g ∈ G, and these can all be obtained from the conjugation action of G on itself.
We can, of course, take the collection of all isomorphisms of G, and form a new group
out of it.
Definition 9.4.1 (Automorphism group). The automorphism group of G is
Aut(G) = { f : G → G : f is a group isomorphism}.
This is a group under composition, with the identity map as the identity.
This is a subgroup of Sym(G), and the homomorphism ϕ : G → Sym(G) by conjugation lands in Aut(G).
This is pretty fun — we can use this to cook up some more groups, by taking a group
and looking at its automorphism group.
We can also take a group, take its automorphism group, and then take its automorphism group again, and do it again, and see if this process stabilizes, or becomes periodic,
or something. This is left as an exercise for the reader.
Definition 9.4.2 (Conjugacy class). The conjugacy class of g ∈ G is
cclG (g) = {hgh−1 : h ∈ G},
i.e. the orbit of g ∈ G under the conjugation action.
Definition 9.4.3 (Centralizer). The centralizer of g ∈ G is
CG (g) = {h ∈ G : hgh−1 = g},
i.e. the stabilizer of g under the conjugation action. This is alternatively the set of all
h ∈ G that commute with g.
Definition 9.4.4 (Center). The center of a group G is
Z(G) = {h ∈ G : hgh−1 = g for all g ∈ G} = ⋂_{g∈G} CG (g) = ker(ϕ).
These are the elements of the group that commute with everything else.
By the orbit-stabilizer theorem, for each x ∈ G, we obtain a bijection
ccl(x) ↔ G/CG (x).
Proposition 9.4.1. Let G be a finite group. Then
| ccl(x)| = |G : CG (x)| = |G|/|CG (x)|.
In particular, the size of each conjugacy class divides the order of the group.
Another useful notion is the normalizer.
Definition 9.4.5 (Normalizer). Let H ≤ G. The normalizer of H in G is
NG (H) = {g ∈ G : g−1 Hg = H}.
Note that we certainly have H ≤ NG (H). Even better, H ◁ NG (H), essentially by
definition. This is in fact the biggest subgroup of G in which H is normal.
We are now going to look at conjugacy classes of Sn . Now we recall from IA Groups
that permutations in Sn are conjugate if and only if they have the same cycle type when
written as a product of disjoint cycles. We can think of the cycle types as partitions of n.
For example, the partition 2, 2, 1 of 5 corresponds to the conjugacy class of (1 2)(3 4)(5).
So the conjugacy classes of Sn correspond exactly to the partitions of n.
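We can confirm this correspondence for S4 by computing cycle types exhaustively: the cycle types occurring in S4 are precisely the five partitions of 4. A sketch (the helper `cycle_type` is our own):

```python
from itertools import permutations

def cycle_type(p):
    """Sorted cycle lengths of a permutation of {0,...,n-1} given as a tuple."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

types = {cycle_type(p) for p in permutations(range(4))}
# The five partitions of 4: 4, 3+1, 2+2, 2+1+1, 1+1+1+1.
assert types == {(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)}
```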
We will use this fact in the proof of the following theorem:
Theorem 9.4.1. The alternating groups An are simple for n ≥ 5 (also for n = 1, 2, 3).
The cases in brackets follow from a direct check, since A1 ≅ A2 ≅ {e} and A3 ≅ C3 ,
all of which are simple. We can also check manually that A4 has a non-trivial normal
subgroup, and hence is not simple.
Recall we also proved that A5 is simple in IA Groups by brute force — we listed all its
conjugacy classes, and see they cannot be put together to make a normal subgroup. This
obviously cannot be easily generalized to higher values of n. Hence we need to prove this
with a different approach.
Proof. We start with the following claim:
Claim 9.4.1. An is generated by 3-cycles.
As any element of An is a product of evenly-many transpositions, it suffices to show
that every product of two transpositions is also a product of 3-cycles.
There are three possible cases: let a, b, c, d be distinct. Then
(i) (a b)(a b) = e.
(ii) (a b)(b c) = (a b c).
(iii) (a b)(c d) = (a c b)(a c d).
So we have shown that every possible product of two transpositions is a product of threecycles.
Claim 9.4.2. Let H ◁ An . If H contains a 3-cycle, then H = An .
We show that if H contains a 3-cycle, then every 3-cycle is in H. Then we are done
since An is generated by 3-cycles. For concreteness, suppose we know (a b c) ∈ H, and
we want to show (1 2 3) ∈ H.
Since they have the same cycle type, there is σ ∈ Sn such that (a b c) = σ (1 2 3)σ −1 .
If σ is even, i.e. σ ∈ An , then (1 2 3) ∈ σ −1 H σ = H by the normality of H, and we are
done.
If σ is odd, replace it by σ̄ = σ · (4 5). Here is where we use the fact that n ≥ 5 (we
will use it again later). Then we have
σ̄ (1 2 3)σ̄ −1 = σ (4 5)(1 2 3)(4 5)σ −1 = σ (1 2 3)σ −1 = (a b c),
using the fact that (1 2 3) and (4 5) commute. Now σ̄ is even. So (1 2 3) ∈ H as above.
What we’ve got so far is that if H ◁ An contains any 3-cycle, then it is An . Finally, we
have to show that every normal subgroup must contain at least one 3-cycle.
Claim 9.4.3. Let H ◁ An be non-trivial. Then H contains a 3-cycle.
We separate this into many cases
(i) Suppose H contains an element which can be written in disjoint cycle notation
σ = (1 2 3 · · · r)τ ,
for r ≥ 4. We now let δ = (1 2 3) ∈ An . Then by normality of H, we know δ −1 σ δ ∈
H. Then σ −1 δ −1 σ δ ∈ H. Also, we notice that τ does not contain 1, 2, 3. So it
commutes with δ , and also trivially with (1 2 3 · · · r). We can expand this mess to
obtain
σ −1 δ −1 σ δ = (r · · · 2 1)(1 3 2)(1 2 3 · · · r)(1 2 3) = (2 3 r),
which is a 3-cycle. So done.
The same argument goes through if σ = (a1 a2 · · · ar )τ for any distinct a1 , · · · , ar .
(ii) Suppose H contains an element consisting of at least two 3-cycles in disjoint cycle
notation, say
σ = (1 2 3)(4 5 6)τ
We now let δ = (1 2 4), and again calculate
σ −1 δ −1 σ δ = (1 3 2)(4 6 5)(1 4 2)(1 2 3)(4 5 6)(1 2 4) = (1 2 4 3 6).
This is a 5-cycle, which is necessarily in H. By the previous case, we get a 3-cycle
in H too, and hence H = An .
(iii) Suppose H contains σ = (1 2 3)τ , with τ a product of disjoint 2-cycles (if τ contained
anything longer, we would be in one of the previous two cases). Then σ 2 =
(1 2 3)2 τ 2 = (1 3 2) is a 3-cycle, since the disjoint 2-cycles in τ square to the identity.
(iv) Suppose H contains σ = (1 2)(3 4)τ , where τ is a product of 2-cycles. We first let
δ = (1 2 3) and calculate
u = σ −1 δ −1 σ δ = (1 2)(3 4)(1 3 2)(1 2)(3 4)(1 2 3) = (1 4)(2 3),
which is again in H. We landed in the same case, but instead of two transpositions
times a mess, we just have two transpositions, which is nicer. Now let
v = (1 5 2)u(1 2 5) = (1 3)(4 5) ∈ H.
Note that we used n ≥ 5 again. We have yet again landed in the same case. Notice,
however, that these are not the same transpositions. We multiply
uv = (1 4)(2 3)(1 3)(4 5) = (1 2 3 4 5) ∈ H.
This is then covered by the first case, and we are done.
So done. Phew.
■
9.5 Finite p-groups
Note that when studying the orders of groups and subgroups, we always talk about divisibility, since that is what Lagrange’s theorem tells us about. We never talk about things
like the sum of the orders of two subgroups. When it comes to divisibility, the simplest
case would be when the order is a prime, and we have done that already. The next best
thing we can hope for is that the order is a power of a prime.
Definition 9.5.1 (p-group). A finite group G is a p-group if |G| = pn for some prime
number p and n ≥ 1.
Theorem 9.5.1. If G is a finite p-group, then Z(G) = {x ∈ G : xg = gx for all g ∈ G} is
non-trivial.
This immediately tells us that for n ≥ 2, a p-group is never simple.
Proof. Let G act on itself by conjugation. The orbits of this action (i.e. the conjugacy
classes) have order dividing |G| = pn . So it is either a singleton, or its size is divisible by
p.
Since the conjugacy classes partition G, we know the total size of the conjugacy
classes is |G|. In particular,
|G| = (number of conjugacy classes of size 1) + ∑ (sizes of all other conjugacy classes).
We know the second term is divisible by p. Also |G| = pn is divisible by p. Hence the
number of conjugacy classes of size 1 is divisible by p. We know {e} is a conjugacy class
of size 1. So there must be at least p conjugacy classes of size 1. Since the smallest prime
number is 2, there is a conjugacy class {x} ≠ {e}.
But if {x} is a conjugacy class on its own, then by definition g−1 xg = x for all g ∈ G,
i.e. xg = gx for all g ∈ G. So x ∈ Z(G). So Z(G) is non-trivial.
■
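We can see the theorem in action for the dihedral group of order 8 = 2³, whose center turns out to have exactly two elements. A sketch (the generators and names are our own choices, not from the text):

```python
def mult(p, q):
    """Compose permutations of {0,...,n-1}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

r = (1, 2, 3, 0)   # rotation of the square's vertices
s = (0, 3, 2, 1)   # a reflection

# Close {r, s} under multiplication to build the dihedral group of order 8.
G = {(0, 1, 2, 3)}
frontier = [r, s]
while frontier:
    g = frontier.pop()
    if g not in G:
        G.add(g)
        frontier.extend(mult(g, h) for h in (r, s))

# The center: elements commuting with everything. For this 2-group it is
# non-trivial, as the theorem predicts: Z = {e, r^2}.
center = {g for g in G if all(mult(g, h) == mult(h, g) for h in G)}
assert len(G) == 8 and len(center) == 2
```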
The theorem allows us to prove interesting things about p-groups by induction — we
can quotient G by Z(G), and get a smaller p-group. One way to do this is via the below
lemma.
Lemma 9.5.1. For any group G, if G/Z(G) is cyclic, then G is abelian.
In other words, if G/Z(G) is cyclic, then it is in fact trivial, since the center of an
abelian group is the abelian group itself.
Proof. Let gZ(G) be a generator of the cyclic group G/Z(G). Hence every coset of
Z(G) is of the form gr Z(G). So every element x ∈ G must be of the form gr z for z ∈ Z(G)
and r ∈ Z. To show G is abelian, let x̄ = gr̄ z̄ be another element, with z̄ ∈ Z(G), r̄ ∈ Z.
Note that z and z̄ are in the center, and hence commute with every element. So we have
xx̄ = gr zgr̄ z̄ = gr gr̄ zz̄ = gr̄ gr z̄z = gr̄ z̄gr z = x̄x.
So they commute. So G is abelian.
■
This is a general lemma for groups, but is particularly useful when applied to p groups.
Corollary 9.5.1. If p is prime and |G| = p2 , then G is abelian.
Proof. Since Z(G) ≤ G, its order must be 1, p or p2 . Since it is not trivial, it can only be
p or p2 . If it has order p2 , then it is the whole group, and the group is abelian. If it has
order p, then G/Z(G) has order p2 /p = p, and hence is cyclic. But then the lemma says G
is abelian, so Z(G) = G has order p2 , a contradiction. So G is abelian.
■
Theorem 9.5.2. Let G be a group of order pa , where p is a prime number. Then it has a
subgroup of order pb for any 0 ≤ b ≤ a.
This means there is a subgroup of every conceivable order. This is not true for general
groups. For example, A5 has no subgroup of order 30, since a subgroup of index 2 would
be normal, contradicting the simplicity of A5 .
Proof. We induct on a. If a = 1, then {e}, G give subgroups of order p0 and p1 . So done.
Now suppose a > 1, and we want to construct a subgroup of order pb . If b = 0, then
this is trivial, namely {e} ≤ G has order 1.
Otherwise, we know Z(G) is non-trivial. So let x ≠ e ∈ Z(G). Since ord(x) | |G|, its
order is a power of p. If it in fact has order pc , then x^{p^{c−1}} has order p. So we can suppose,
by renaming, that x has order p. We have thus generated a subgroup ⟨x⟩ of order exactly
p. Moreover, since x is in the center, ⟨x⟩ commutes with everything in G. So ⟨x⟩ is in fact
a normal subgroup of G. This is the point of choosing it in the center. Therefore G/⟨x⟩
has order pa−1 .
Since this is a strictly smaller group, we can by induction suppose G/⟨x⟩ has a subgroup of any order. In particular, it has a subgroup L of order pb−1 . By the subgroup
correspondence, there is some K ≤ G such that L = K/⟨x⟩ and ⟨x⟩ ◁ K. But then K has
order pb−1 · p = pb . So done.
■
9.6 Finite abelian groups
We now move on to a small section, which is small because we will come back to it later,
and actually prove what we claim.
It turns out finite abelian groups are very easy to classify. We can just write down a
list of all finite abelian groups. We write down the classification theorem, and then prove
it in the last part of the course, where we hit this with a huge sledgehammer.
Theorem 9.6.1 (Classification of finite abelian groups). Let G be a finite abelian group.
Then there exist some d1 , · · · , dr such that
G ≅ Cd1 × Cd2 × · · · × Cdr .
Moreover, we can pick di such that di+1 | di for each i, and this expression is unique.
It turns out the best way to prove this is not to think of it as a group, but as a Z-module,
which is something we will come to later.
Example 9.6.1. The abelian groups of order 8 are C8 , C4 ×C2 , C2 ×C2 ×C2 .
Sometimes this is not the most useful form of decomposition. To get a nicer decomposition, we use the following lemma:
Lemma 9.6.1. If n and m are coprime, then Cmn ≅ Cm × Cn .
This is a grown-up version of the Chinese remainder theorem; indeed, it is what the Chinese remainder theorem really says.
Proof. It suffices to find an element of order nm in Cm ×Cn . Then since Cn ×Cm has order
nm, it must be cyclic, and hence isomorphic to Cnm .
Let g ∈ Cm have order m; h ∈ Cn have order n, and consider (g, h) ∈ Cm ×Cn . Suppose
the order of (g, h) is k. Then (g, h)k = (e, e). Hence (gk , hk ) = (e, e). So the order of g and
h divide k, i.e. m | k and n | k. As m and n are coprime, this means that mn | k.
As k = ord((g, h)) and (g, h) ∈ Cm × Cn is a group of order mn, we must have k | nm.
So k = nm.
■
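The order computation in the proof is easy to check numerically: in Z/m × Z/n (a concrete model of Cm × Cn ) the element (1, 1) has order lcm(m, n), which equals mn exactly when gcd(m, n) = 1. A sketch (the helper `order` is our own):

```python
from math import gcd

def order(m, n):
    """Order of (1, 1) in Z/m x Z/n: the least k with k = 0 mod m and mod n."""
    k = 1
    while (k % m, k % n) != (0, 0):
        k += 1
    return k

assert order(3, 4) == 12   # gcd = 1: C_3 x C_4 is cyclic of order 12
assert order(2, 4) == 4    # gcd = 2: C_2 x C_4 is not cyclic
assert all(order(m, n) == m * n
           for m in range(1, 8) for n in range(1, 8) if gcd(m, n) == 1)
```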
Corollary 9.6.1. For any finite abelian group G, we have
G ≅ Cd1 × Cd2 × · · · × Cdr ,
where each di is some prime power.
Proof. From the classification theorem, iteratively apply the previous lemma to break
each component up into products of prime powers.
■
As promised, this is short.
9.7 Sylow theorems
We finally get to the big theorem of this part of the course.
Theorem 9.7.1 (Sylow theorems). Let G be a finite group of order pa · m, with p a prime
and p ∤ m. Then
(i) The set of Sylow p-subgroups of G, given by
Syl p (G) = {P ≤ G : |P| = pa },
is non-empty. In other words, G has a subgroup of order pa .
(ii) All elements of Syl p (G) are conjugate in G.
(iii) The number of Sylow p-subgroups n_p = |Syl_p(G)| satisfies n_p ≡ 1 (mod p) and
n_p | |G| (in fact n_p | m, since p is not a factor of n_p ).
These are sometimes known as Sylow’s first/second/third theorem respectively.
We will not prove this just yet. We first look at how we can apply this theorem. We
can use it without knowing how to prove it.
Lemma 9.7.1. If n p = 1, then the Sylow p-subgroup is normal in G.
Proof. Let P be the unique Sylow p-subgroup, and let g ∈ G, and consider g−1 Pg. Since
this is isomorphic to P, we must have |g−1 Pg| = pa , i.e. it is also a Sylow p-subgroup.
Since there is only one, we must have P = g−1 Pg. So P is normal.
■
Corollary 9.7.1. Let G be a non-abelian simple group. Then |G| divides n_p !/2 for every
prime p such that p | |G|.
Proof. The group G acts on Syl_p(G) by conjugation. So it gives a permutation representation ϕ : G → Sym(Syl_p(G)) ≅ S_{n_p}. We know ker ϕ ◁ G. But G is simple. So ker(ϕ) = {e}
or G. We want to show it is not the whole of G.
If we had G = ker(ϕ ), then g−1 Pg = P for all g ∈ G. Hence P is a normal subgroup.
As G is simple, either P = {e}, or P = G. We know P cannot be trivial since p | |G|. But
if G = P, then G is a p-group, has a non-trivial center, and hence G is not non-abelian
simple. So we must have ker(ϕ ) = {e}.
Then by the first isomorphism theorem, we know G ≅ im ϕ ≤ S_{n_p}. We have proved
the theorem without the divide-by-two part. To prove the whole result, we need to show
that in fact im(ϕ) ≤ A_{n_p}. Consider the composition of homomorphisms
sgn ∘ ϕ : G → S_{n_p} → {±1}.
If this is surjective, then ker(sgn ∘ ϕ) ◁ G has index 2 (since the index is the size of the
image), and is not the whole of G. This means G is not simple (the case G ≅ C2 is ruled
out since C2 is abelian).
So the kernel must be the whole of G, and sgn ∘ ϕ is the trivial map. In other words,
sgn(ϕ(g)) = +1 for all g. So ϕ(g) ∈ A_{n_p}. So in fact we have
G ≅ im(ϕ) ≤ A_{n_p}.
So we get |G| | n_p !/2.
■
Example 9.7.1. Suppose |G| = 1000. Then G is not simple. To show this, we
factorize 1000: |G| = 2^3 · 5^3 . We pick our favorite prime to be p = 5. We know
n_5 ≡ 1 (mod 5), and n_5 | 2^3 = 8. The only number that satisfies both is n_5 = 1. So the
Sylow 5-subgroup is normal, and hence G is not simple.
Example 9.7.2. Let |G| = 132 = 22 · 3 · 11. We want to show this is not simple. So for a
contradiction suppose it is.
We start by looking at p = 11. We know n11 ≡ 1 (mod 11). Also n11 | 12. As G is
simple, we must have n11 = 12.
Now look at p = 3. We have n_3 ≡ 1 (mod 3) and n_3 | 44. The possible values of n_3
are 1, 4 and 22; since G is simple, n_3 ≠ 1, leaving 4 and 22.
If n_3 = 4, then the corollary says |G| | 4!/2 = 12, which is of course nonsense. So
n_3 = 22.
At this point, we count how many elements of each order there are. This is particularly
useful if p | |G| but p2 ∤ |G|, i.e. the Sylow p-subgroups have order p and hence are cyclic.
As distinct Sylow 11-subgroups intersect only in {e}, we know there are 12 · (11 −
1) = 120 elements of order 11. We do the same thing with the Sylow 3-subgroups: we
get 22 · (3 − 1) = 44 elements of order 3. But 120 + 44 = 164 is more elements than the
group has. This is a contradiction. So G cannot be simple after all.
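The arithmetic in this example can be automated. A sketch that enumerates the candidate values of n_p allowed by Sylow’s third theorem and redoes the element count (the helper `sylow_counts` is our own):

```python
def sylow_counts(order, p):
    """Candidate values of n_p: divisors of |G| that are congruent to 1 mod p."""
    return [d for d in range(1, order + 1) if order % d == 0 and d % p == 1]

# |G| = 132 = 2^2 * 3 * 11
assert sylow_counts(132, 11) == [1, 12]
assert sylow_counts(132, 3) == [1, 4, 22]

# If G were simple, n_11 = 12 and (ruling out n_3 = 4 via the corollary) n_3 = 22.
elements_of_order_11 = 12 * (11 - 1)   # Sylow 11-subgroups meet only in {e}
elements_of_order_3 = 22 * (3 - 1)
assert elements_of_order_11 + elements_of_order_3 > 132   # 120 + 44 = 164 > 132
```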
We now get to prove our big theorem. This involves some non-trivial amount of
trickery.
Proof of Sylow’s theorem. Let G be a finite group with |G| = pa m, and p ∤ m.
(i) We need to show that Syl_p(G) ≠ ∅, i.e. we need to find some subgroup of order p^a.
As always, we find something clever for G to act on. We let
Ω = {X ⊆ G : |X| = p^a}.
We let G act on Ω by
g ∗ {g1 , g2 , · · · , g_{p^a} } = {gg1 , gg2 , · · · , gg_{p^a} }.
Let Σ ⊆ Ω be an orbit.
We first note that if {g_1, · · · , g_{p^a}} ∈ Σ, then by the definition of an orbit, for every
g ∈ G,
gg_1^{−1} ∗ {g_1, · · · , g_{p^a}} = {g, gg_1^{−1}g_2, · · · , gg_1^{−1}g_{p^a}} ∈ Σ.
The important thing is that this set contains g. So for each g, Σ contains a set X
which contains g. Since each set X has size p^a, we must have
|Σ| ≥ |G|/p^a = m.
Suppose |Σ| = m. Then the orbit-stabilizer theorem says the stabilizer H of any
{g_1, · · · , g_{p^a}} ∈ Σ has index m, hence |H| = p^a, and thus H ∈ Syl_p(G).
So we need to show that not every orbit Σ can have size > m. Again, by the orbit-stabilizer theorem, the size of any orbit divides the order of the group, |G| = p^a m. So if
|Σ| > m, then p | |Σ|. Suppose we can show that p ∤ |Ω|. Then not every orbit Σ can
have size > m, since Ω is the disjoint union of all the orbits, and thus we are done.
So we have to show p ∤ |Ω|. This is just some basic counting. We have
|Ω| = \binom{p^a m}{p^a} = \prod_{j=0}^{p^a - 1} (p^a m − j)/(p^a − j).
Now note that the largest power of p dividing pa m − j is the largest power of p
dividing j. Similarly, the largest power of p dividing pa − j is also the largest
power of p dividing j. So we have the same power of p on top and bottom for each
item in the product, and they cancel. So the result is not divisible by p.
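The divisibility claim p ∤ |Ω| can be sanity-checked numerically for small cases; a quick Python sketch using the standard library's binomial function:

```python
from math import comb

# Check that p does not divide binomial(p^a * m, p^a) whenever p does not
# divide m, for a few small values — the fact used in the proof above.
for p in (2, 3, 5):
    for a in (1, 2):
        for m in (1, 2, 3, 4, 5):
            if m % p == 0:
                continue  # the hypothesis p ∤ m is needed
            assert comb(p**a * m, p**a) % p != 0
```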
This proof is not straightforward. We first needed the clever idea of letting G act
on Ω. But then if we are given this set, the obvious thing to do would be to find
something in Ω that is also a group. This is not what we do. Instead, we find an
orbit whose stabilizer is a Sylow p-subgroup.
(ii) We instead prove something stronger: if Q ≤ G is a p-subgroup (i.e. |Q| = p^b, with
b not necessarily equal to a), and P ≤ G is a Sylow p-subgroup, then there is a g ∈ G such
that g^{−1}Qg ≤ P. Applying this to the case where Q is another Sylow p-subgroup
says there is a g such that g^{−1}Qg ≤ P, but since g^{−1}Qg has the same size as P, they
must be equal.
We let Q act on the set of cosets of G/P via
q ∗ gP = qgP.
We know the orbits of this action have size dividing |Q|, so each orbit has size either 1 or divisible by
p. But they can't all have size divisible by p, since |G/P| is coprime to p. So at least one
of them has size 1, say {gP}. In other words, for every q ∈ Q, we have qgP = gP.
This means g^{−1}qg ∈ P. This holds for every element q ∈ Q. So we have found a g
such that g^{−1}Qg ≤ P.
(iii) Finally, we need to show that n_p ≡ 1 (mod p) and n_p | |G|, where n_p = |Syl_p(G)|.
The second part is easier — by Sylow’s second theorem, the action of G on Syl p (G)
by conjugation has one orbit. By the orbit-stabilizer theorem, the size of the orbit,
which is | Syl p (G)| = n p , divides |G|. This proves the second part.
For the first part, let P ∈ Syl_p(G). Consider the action by conjugation of P on
Syl_p(G). Again by the orbit-stabilizer theorem, the orbits each have size 1 or size
divisible by p. But we know there is one orbit of size 1, namely {P} itself. To show
n_p = |Syl_p(G)| ≡ 1 (mod p), it is enough to show there are no other orbits of size 1.
Suppose {Q} is an orbit of size 1. This means for every p ∈ P, we get
p^{−1}Qp = Q.
In other words, P ≤ N_G(Q). Now N_G(Q) is itself a group, and we can look at its
Sylow p-subgroups. We know Q ≤ N_G(Q) ≤ G. So p^a | |N_G(Q)| | p^a m. So p^a is the
biggest power of p that divides |N_G(Q)|. So Q is a Sylow p-subgroup of N_G(Q).
Now we know P ≤ N_G(Q) is also a Sylow p-subgroup of N_G(Q). By Sylow's
second theorem, they must be conjugate in N_G(Q). But conjugating Q by anything
in N_G(Q) sends Q to itself, by definition of N_G(Q). So we must have
P = Q. So the only orbit of size 1 is {P} itself. So done.
■
This is all the theory of groups we've got. In the remaining time, we will look at
some interesting examples of groups.
Example 9.7.3. Let G = GL_n(Z/p), i.e. the set of invertible n × n matrices with entries
in Z/p, the integers modulo p. Here p is obviously a prime. When we do rings later, we
will study this properly.
First of all, we would like to know the size of this group. A matrix A ∈ GL_n(Z/p) is
the same as a list of n linearly independent vectors in the vector space (Z/p)^n. We can just
work out how many there are. This is not too difficult, when you know how.
We can pick the first vector, which can be anything except zero. So there are p^n − 1
ways of choosing the first vector. Next, we need to pick the second vector. This can be
anything that is not in the span of the first vector, and this rules out p possibilities. So
there are p^n − p ways of choosing the second vector. Continuing this chain of thought, we
have
|GL_n(Z/p)| = (p^n − 1)(p^n − p)(p^n − p^2) · · · (p^n − p^{n−1}).
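This formula is easy to test against a brute-force count for small n and p; a Python sketch (gl_order and gl2_brute are illustrative helpers, not from the notes):

```python
from itertools import product

def gl_order(n, p):
    """Order of GL_n(Z/p) by the column-counting formula above."""
    result = 1
    for k in range(n):
        result *= p**n - p**k
    return result

def gl2_brute(p):
    """Count invertible 2x2 matrices over Z/p directly via the determinant."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

assert gl2_brute(3) == gl_order(2, 3) == 48
assert gl2_brute(5) == gl_order(2, 5) == 480
```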
What is a Sylow p-subgroup of GL_n(Z/p)? We first work out what its order is.
We can factorize that as
|GL_n(Z/p)| = (1 · p · p^2 · · · p^{n−1}) · ((p^n − 1)(p^{n−1} − 1) · · · (p − 1)).
So the largest power of p that divides |GL_n(Z/p)| is p^{n(n−1)/2}. Let's find a subgroup of
size p^{n(n−1)/2}. We consider the subgroup U ≤ GL_n(Z/p) of upper unitriangular matrices,
i.e. matrices of the form

( 1 ∗ ∗ · · · ∗ )
( 0 1 ∗ · · · ∗ )
( 0 0 1 · · · ∗ )
(       ⋱      )
( 0 0 0 · · · 1 )

Then we know |U| = p^{n(n−1)/2}, as each ∗ can be chosen to be anything in Z/p, and there
are n(n−1)/2 ∗s.
Is the Sylow p-subgroup unique? No. We can take the lower triangular matrices and
get another Sylow p-subgroup.
Example 9.7.4. Let's be less ambitious and consider GL_2(Z/p). So
|G| = p(p^2 − 1)(p − 1) = p(p − 1)^2(p + 1).
Let ℓ be another prime number such that ℓ | p − 1. Suppose the largest power of ℓ that
divides |G| is ℓ^2. Can we (explicitly) find a Sylow ℓ-subgroup?
First, we want to find an element of order ℓ. How is p − 1 related to p (apart from the
obvious way)? We know that
(Z/p)^× = {x ∈ Z/p : (∃y) xy ≡ 1 (mod p)} ≅ C_{p−1}.
So as ℓ | p − 1, there is a subgroup C_ℓ ≤ C_{p−1} ≅ (Z/p)^×. Then we immediately know
where to find a subgroup of order ℓ^2: we have
C_ℓ × C_ℓ ≤ (Z/p)^× × (Z/p)^× ≤ GL_2(Z/p),
where the final inclusion is the diagonal matrices, identifying
(a, b) ↔ ( a 0 ; 0 b ).
So this is the Sylow ℓ-subgroup.
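One can sketch this construction concretely in Python for, say, p = 7 and ℓ = 3 (the helper mult_order is illustrative):

```python
# Sketch for p = 7, l = 3: l | p - 1 = 6, and l^2 = 9 is the largest power of
# 3 dividing |GL_2(Z/7)| = 7 * 6^2 * 8 = 2016.
p, l = 7, 3

def mult_order(x, p):
    """Multiplicative order of x in (Z/p)^x, for x not divisible by p."""
    k, y = 1, x % p
    while y != 1:
        y = y * x % p
        k += 1
    return k

g = next(x for x in range(2, p) if mult_order(x, p) == p - 1)  # a generator
w = pow(g, (p - 1) // l, p)        # an element of order l in (Z/p)^x
assert mult_order(w, p) == l

# Diagonal matrices diag(w^i, w^j) realize C_l x C_l inside GL_2(Z/p).
subgroup = {(pow(w, i, p), pow(w, j, p)) for i in range(l) for j in range(l)}
assert len(subgroup) == l * l == 9
```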
10. Rings
10.1 Definitions and examples
We now move on to something completely different — rings. In a ring, we are allowed to
add, subtract, multiply but not divide. Our canonical example of a ring would be Z, the
integers, as studied in IA Numbers and Sets.
In this course, we are only going to consider rings in which multiplication is commutative, since these rings behave like “number systems”, where we can study number theory.
However, some of these rings do not behave like Z. Thus one major goal of this part is to
understand the different properties of Z, whether they are present in arbitrary rings, and
how different properties relate to one another.
Definition 10.1.1 (Ring). A ring is a quintuple (R, +, · , 0R , 1R ) where 0R , 1R ∈ R, and
+, · : R × R → R are binary operations such that
(i) (R, +, 0R ) is an abelian group.
(ii) The operation · : R × R → R satisfies associativity, i.e.
a · (b · c) = (a · b) · c,
and identity:
1R · r = r · 1R = r.
(iii) Multiplication distributes over addition, i.e.
r1 · (r2 + r3 ) = (r1 · r2 ) + (r1 · r3 )
(r1 + r2 ) · r3 = (r1 · r3 ) + (r2 · r3 ).
Notation 10.1. If R is a ring and r ∈ R, we write −r for the inverse to r in (R, +, 0R ). This
satisfies r + (−r) = 0R . We write r − s to mean r + (−s) etc.
Some people don’t insist on the existence of the multiplicative identity, but we will
for the purposes of this course.
Since we can add and multiply two elements, by induction, we can add and multiply
any finite number of elements. However, the notions of infinite sum and product are
undefined. It doesn’t make sense to ask if an infinite sum converges.
Definition 10.1.2 (Commutative ring). We say a ring R is commutative if a · b = b · a for
all a, b ∈ R.
From now onwards, all rings in this course are going to be commutative.
Just as we have groups and subgroups, we also have subrings.
Definition 10.1.3 (Subring). Let (R, +, · , 0R , 1R ) be a ring, and S ⊆ R be a subset. We say
S is a subring of R if 0R , 1R ∈ S, and the operations +, · make S into a ring in its own right.
In this case we write S ≤ R.
Example 10.1.1. The familiar number systems are all rings: we have Z ≤ Q ≤ R ≤ C,
under the usual 0, 1, +, · .
Example 10.1.2. The set Z[i] = {a + ib : a, b ∈ Z} ≤ C is the Gaussian integers, which
is a ring.
We also have the ring Q[√2] = {a + b√2 ∈ R : a, b ∈ Q} ≤ R.
We will use the square brackets notation quite frequently. It should be clear what it
should mean, and we will define it properly later.
In general, elements in a ring do not have inverses. This is not a bad thing. This is what
makes rings interesting. For example, the division algorithm would be rather contentless
if everything in Z had an inverse. Fortunately, Z only has two invertible elements — 1
and −1. We call these units.
Definition 10.1.4 (Unit). An element u ∈ R is a unit if there is another element v ∈ R such
that u · v = 1R .
It is important that this depends on R, not just on u. For example, 2 ∈ Z is not a unit,
but 2 ∈ Q is a unit (since 1/2 is an inverse).
A special case is when (almost) everything is a unit.
Definition 10.1.5 (Field). A field is a non-zero ring where every u ≠ 0_R is a unit.
We will later show that 0R cannot be a unit except in a very degenerate case.
Example 10.1.3. Z is not a field, but Q, R, C are all fields.
Similarly, Z[i] is not a field, while Q[√2] is.
Example 10.1.4. Let R be a ring. Then 0R + 0R = 0R , since this is true in the group
(R, +, 0R ). Then for any r ∈ R, we get
r · (0R + 0R ) = r · 0R .
We now use the fact that multiplication distributes over addition. So
r · 0R + r · 0R = r · 0R .
Adding (−r · 0R ) to both sides give
r · 0R = 0R .
This is true for any element r ∈ R. From this, it follows that if R ≠ {0}, then 1_R ≠ 0_R —
if they were equal, then take r ≠ 0_R. So
r = r · 1_R = r · 0_R = 0_R,
which is a contradiction.
Note that {0} does form a ring (with the only possible operations and identities),
the zero ring, albeit a boring one. However, it is often a counterexample to many things.
Definition 10.1.6 (Product of rings). Let R, S be rings. Then the product R × S is a ring
via
(r, s) + (r′, s′) = (r + r′, s + s′),
(r, s) · (r′, s′) = (r · r′, s · s′).
The zero is (0_R, 0_S) and the one is (1_R, 1_S).
We can (but won’t) check that these indeed are rings.
Definition 10.1.7 (Polynomial). Let R be a ring. Then a polynomial with coefficients in
R is an expression
f = a_0 + a_1 X + a_2 X^2 + · · · + a_n X^n,
with a_i ∈ R. The symbols X^i are formal symbols.
We identify f and f + 0_R · X^{n+1} as the same things.
Definition 10.1.8 (Degree of polynomial). The degree of a polynomial f is the largest m
such that a_m ≠ 0.
Definition 10.1.9 (Monic polynomial). Let f have degree m. If a_m = 1, then f is called
monic.
Definition 10.1.10 (Polynomial ring). We write R[X] for the set of all polynomials with
coefficients in R. The operations are performed in the obvious way, i.e. if f = a_0 + a_1 X +
· · · + a_n X^n and g = b_0 + b_1 X + · · · + b_k X^k are polynomials, then
f + g = \sum_{i=0}^{max{n,k}} (a_i + b_i) X^i,
and
f · g = \sum_{i=0}^{n+k} ( \sum_{j=0}^{i} a_j b_{i−j} ) X^i.
We identify R with the constant polynomials, i.e. polynomials \sum a_i X^i with a_i = 0 for i > 0.
In particular, 0_R ∈ R and 1_R ∈ R are the zero and one of R[X].
This is in fact a ring.
Note that a polynomial is just a sequence of numbers, interpreted as the coefficients
of some formal symbols. While it does indeed induce a function in the obvious way, we
shall not identify the polynomial with the function given by it, since different polynomials
can give rise to the same function.
For example, in (Z/2Z)[X], f = X^2 + X is not the zero polynomial, since its coefficients
are not zero. However, f(0) = 0 and f(1) = 0. As a function, this is identically zero. So
f ≠ 0 as a polynomial but f = 0 as a function.
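A quick Python sketch of this distinction (evaluate is an illustrative helper):

```python
# f = X^2 + X over Z/2Z: nonzero as a polynomial, zero as a function.
coeffs = (0, 1, 1)  # coefficients a_0, a_1, a_2 of f = X^2 + X

def evaluate(coeffs, x, p):
    """Evaluate the polynomial with the given coefficients at x, mod p."""
    return sum(c * x**i for i, c in enumerate(coeffs)) % p

assert any(c % 2 != 0 for c in coeffs)                   # f != 0 as a polynomial
assert all(evaluate(coeffs, x, 2) == 0 for x in (0, 1))  # f = 0 as a function
```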
Definition 10.1.11 (Power series). We write R[[X]] for the ring of power series on R, i.e.
f = a_0 + a_1 X + a_2 X^2 + · · · ,
where each ai ∈ R. This has addition and multiplication the same as for polynomials, but
without upper limits.
A power series is very much not a function. We don’t talk about whether the sum
converges or not, because it is not a sum.
Example 10.1.5. Is 1 − X ∈ R[X] a unit? For every g = a_0 + · · · + a_n X^n (with a_n ≠ 0), we
get
(1 − X)g = stuff + · · · − a_n X^{n+1},
which is not 1. So g cannot be the inverse of (1 − X). So (1 − X) is not a unit.
However, 1 − X ∈ R[[X]] is a unit, since
(1 − X)(1 + X + X^2 + X^3 + · · ·) = 1.
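Truncating the power series at degree N makes the cancellation visible; a Python sketch:

```python
# Multiply 1 - X by the truncated geometric series 1 + X + ... + X^N;
# everything cancels except 1 and -X^(N+1), matching the identity above.
N = 8
a = [1, -1]              # coefficients of 1 - X, lowest degree first
b = [1] * (N + 1)        # coefficients of 1 + X + ... + X^N
prod = [0] * (len(a) + len(b) - 1)
for i, ai in enumerate(a):
    for j, bj in enumerate(b):
        prod[i + j] += ai * bj
assert prod[0] == 1
assert all(c == 0 for c in prod[1:N + 1])
assert prod[N + 1] == -1   # the only leftover term, -X^(N+1)
```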
Definition 10.1.12 (Laurent polynomials). The Laurent polynomials on R is the set R[X, X −1 ],
i.e. each element is of the form
f = \sum_{i ∈ Z} a_i X^i,
where a_i ∈ R and only finitely many a_i are non-zero. The operations are the obvious ones.
We can also think of Laurent series, but we have to be careful. We allow infinitely
many positive coefficients, but only finitely many negative ones. Or else, in the formula
for multiplication, we will have an infinite sum, which is undefined.
Example 10.1.6. Let X be a set, and R be a ring. Then the set of all functions on X, i.e.
functions f : X → R, is a ring with ring operations given by
( f + g)(x) = f (x) + g(x),
( f · g)(x) = f (x) · g(x).
Here zero is the constant function 0 and one is the constant function 1.
Usually, we don’t want to consider all functions X → R. Instead, we look at some
subrings of this. For example, we can consider the ring of all continuous functions R → R.
This contains, for example, the polynomial functions, which is just R[X] (since in R,
polynomials are functions).
10.2 Homomorphisms, ideals, quotients and isomorphisms
Just like groups, we will come up with analogues of homomorphisms, normal subgroups
(which are now known as ideals), and quotients.
Definition 10.2.1 (Homomorphism of rings). Let R, S be rings. A function ϕ : R → S is a
ring homomorphism if it preserves everything we can think of, i.e.
(i) ϕ (r1 + r2 ) = ϕ (r1 ) + ϕ (r2 ),
(ii) ϕ (0R ) = 0S ,
(iii) ϕ (r1 · r2 ) = ϕ (r1 ) · ϕ (r2 ),
(iv) ϕ (1R ) = 1S .
Definition 10.2.2 (Isomorphism of rings). If a homomorphism ϕ : R → S is a bijection,
we call it an isomorphism.
Definition 10.2.3 (Kernel). The kernel of a homomorphism ϕ : R → S is
ker(ϕ ) = {r ∈ R : ϕ (r) = 0S }.
Definition 10.2.4 (Image). The image of ϕ : R → S is
im(ϕ ) = {s ∈ S : s = ϕ (r) for some r ∈ R}.
Lemma 10.2.1. A homomorphism ϕ : R → S is injective if and only if ker ϕ = {0R }.
Proof. A ring homomorphism is in particular a group homomorphism ϕ : (R, +, 0_R) →
(S, +, 0_S) of abelian groups. So this follows from the case of groups.
■
In the group scenario, we had groups, subgroups and normal subgroups, which are
special subgroups. Here, we have a special kind of subsets of a ring that act like normal
subgroups, known as ideals.
Definition 10.2.5 (Ideal). A subset I ⊆ R is an ideal, written I ◁ R, if
(i) It is an additive subgroup of (R, +, 0R ), i.e. it is closed under addition and additive
inverses.
(additive closure)
(ii) If a ∈ I and b ∈ R, then a · b ∈ I.
(strong closure)
We say I is a proper ideal if I 6= R.
Note that the multiplicative closure is stronger than what we require for subrings —
for subrings, it has to be closed under multiplication by its own elements; for ideals, it has
to be closed under multiplication by everything in the world. This is similar to how normal
subgroups not only have to be closed under internal multiplication, but also conjugation
by external elements.
Lemma 10.2.2. If ϕ : R → S is a homomorphism, then ker(ϕ ) ◁ R.
Proof. Since ϕ : (R, +, 0_R) → (S, +, 0_S) is a group homomorphism, the kernel is a subgroup of (R, +, 0_R).
For the second part, let a ∈ ker(ϕ ), b ∈ R. We need to show that their product is in the
kernel. We have
ϕ (a · b) = ϕ (a) · ϕ (b) = 0 · ϕ (b) = 0.
So a · b ∈ ker(ϕ ).
■
Example 10.2.1. Suppose I ◁ R is an ideal, and 1R ∈ I. Then for any r ∈ R, the axioms
entail 1R · r ∈ I. But 1R · r = r. So if 1R ∈ I, then I = R.
In other words, a proper ideal never contains 1. In particular, a proper ideal
is never a subring, since a subring must contain 1.
We are starting to diverge from groups. In groups, a normal subgroup is a subgroup,
but here an ideal is not a subring.
Example 10.2.2. We can generalize the above a bit. Suppose I ◁ R and u ∈ I is a unit, i.e.
there is some v ∈ R such that u · v = 1R . Then by strong closure, 1R = u · v ∈ I. So I = R.
Hence proper ideals are not allowed to contain any unit at all, not just 1R .
Example 10.2.3. Consider the ring Z of integers. Then every ideal of Z is of the form
nZ = {· · · , −2n, −n, 0, n, 2n, · · · } ⊆ Z.
It is easy to see this is indeed an ideal.
To show these are all the ideals, let I ◁ Z. If I = {0}, then I = 0Z. Otherwise, let
n ∈ N be the smallest positive element of I. We want to show in fact I = nZ. Certainly
nZ ⊆ I by strong closure.
Now let m ∈ I. By the Euclidean algorithm, we can write
m = q·n+r
with 0 ≤ r < n. Now n, m ∈ I. So by strong closure, m, q · n ∈ I. So r = m − q · n ∈ I. As
n is the smallest positive element of I, and r < n, we must have r = 0. So m = q · n ∈ nZ.
So I ⊆ nZ. So I = nZ.
The key to proving this was that we can perform the Euclidean algorithm on Z. Thus,
for any ring R in which we can “do Euclidean algorithm”, every ideal is of the form
aR = {a · r : r ∈ R} for some a ∈ R. We will make this notion precise later.
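For a concrete instance, here is a Python sketch checking that the ideal (12, 18) ◁ Z is 6Z, with 6 = gcd(12, 18) arising as the smallest positive element, exactly as in the proof above:

```python
from math import gcd

# The ideal (12, 18) = {12r + 18s : r, s in Z}, sampled over a finite window.
# The argument above says it equals nZ, where n is the smallest positive
# element — and that n is gcd(12, 18) = 6.
elements = {12 * r + 18 * s for r in range(-20, 21) for s in range(-20, 21)}
n = min(x for x in elements if x > 0)
assert n == gcd(12, 18) == 6
assert all(x % n == 0 for x in elements)   # every element lies in 6Z
```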
Definition 10.2.6 (Generator of ideal). For an element a ∈ R, we write
(a) = aR = {a · r : r ∈ R} ◁ R.
This is the ideal generated by a.
In general, for a_1, a_2, · · · , a_k ∈ R, we write
(a_1, a_2, · · · , a_k) = {a_1 r_1 + · · · + a_k r_k : r_1, · · · , r_k ∈ R}.
This is the ideal generated by a_1, · · · , a_k.
We can also have ideals generated by infinitely many objects, but we have to be careful,
since we cannot have infinite sums.
Definition 10.2.7 (Generator of ideal). For A ⊆ R a subset, the ideal generated by A is
(A) = { \sum_{a ∈ A} r_a · a : r_a ∈ R, only finitely many r_a non-zero }.
These ideals are rather nice ideals, since they are easy to describe, and often have
some nice properties.
Definition 10.2.8 (Principal ideal). An ideal I is a principal ideal if I = (a) for some
a ∈ R.
So what we have just shown for Z is that all ideals are principal. Not all rings are like
this. These are special types of rings, which we will study more in depth later.
Example 10.2.4. Consider the following subset:
{ f ∈ R[X] : the constant coefficient of f is 0}.
This is an ideal, as we can check manually (alternatively, it is the kernel of the “evaluate
at 0” homomorphism). It turns out this is a principal ideal. In fact, it is (X).
We have said ideals are like normal subgroups. The key idea is that we can divide by
ideals.
Definition 10.2.9 (Quotient ring). Let I ◁ R. The quotient ring R/I consists of the (additive) cosets r + I, with zero and one given by 0_R + I and 1_R + I, and operations
(r1 + I) + (r2 + I) = (r1 + r2 ) + I
(r1 + I) · (r2 + I) = r1 r2 + I.
Proposition 10.2.1. The quotient ring is a ring, and the function
R → R/I
r ↦ r + I
is a ring homomorphism.
This is true, because we defined ideals to be those things that can be quotiented by. So
we just have to check we made the right definition.
Just as we could have come up with the definition of a normal subgroup by requiring
operations on the cosets to be well-defined, we could have come up with the definition of
an ideal by requiring the multiplication of cosets to be well-defined, and we would end up
with the strong closure property.
Proof. We know the group (R/I, +, 0R/I ) is well-defined, since I is a (normal) subgroup
of R. So we only have to check multiplication is well-defined.
Suppose r_1 + I = r_1′ + I and r_2 + I = r_2′ + I. Then r_1′ − r_1 = a_1 ∈ I and r_2′ − r_2 = a_2 ∈ I.
So
r_1′ r_2′ = (r_1 + a_1)(r_2 + a_2) = r_1 r_2 + r_1 a_2 + r_2 a_1 + a_1 a_2.
By the strong closure property, the last three terms are in I. So r_1′ r_2′ + I = r_1 r_2 + I.
It is easy to check that 0R + I and 1R + I are indeed the zero and one, and the function
given is clearly a homomorphism.
■
Example 10.2.5. We have the ideals nZ ◁ Z. So we have the quotient rings Z/nZ. The
elements are of the form m + nZ, so they are just
0 + nZ, 1 + nZ, 2 + nZ, · · · , (n − 1) + nZ.
Addition and multiplication are just what we are used to — addition and multiplication
modulo n.
Note that it is easier to come up with ideals than normal subgroups — we can just
pick up random elements, and then take the ideal generated by them.
Example 10.2.6. Consider (X) ◁ C[X]. What is C[X]/(X)? Elements are represented by
a_0 + a_1 X + a_2 X^2 + · · · + a_n X^n + (X).
But everything but the first term is in (X). So every such thing is equivalent to a_0 + (X). It
is not hard to convince yourself that this representation is unique. So in fact C[X]/(X) ≅ C,
with the bijection a_0 + (X) ↔ a_0.
If we want to prove things like this, we have to convince ourselves this representation
is unique. We can do that by hand here, but in general, we want to be able to do this
properly.
Proposition 10.2.2 (Euclidean algorithm for polynomials). Let F be a field and f, g ∈
F[X] with g ≠ 0. Then there are some q, r ∈ F[X] such that
f = gq + r,
with deg r < deg g.
This is like the usual Euclidean algorithm, except that instead of the absolute value,
we use the degree to measure how “big” the polynomial is.
Proof. Let deg(f) = n. So
f = \sum_{i=0}^{n} a_i X^i,
and a_n ≠ 0. Similarly, if deg g = m, then
g = \sum_{i=0}^{m} b_i X^i,
with b_m ≠ 0. If n < m, we let q = 0 and r = f, and done.
Otherwise, suppose n ≥ m, and proceed by induction on n.
We let
f_1 = f − a_n b_m^{−1} X^{n−m} g.
This is possible since b_m ≠ 0, and F is a field. Then by construction, the coefficients of
X^n cancel out. So deg(f_1) < n.
If n = m, then deg(f_1) < n = m. So we can write
f = (a_n b_m^{−1} X^{n−m}) g + f_1,
and deg(f_1) < m = deg g. So done. Otherwise, if n > m, then as deg(f_1) < n, by induction,
we can find r_1, q_1 such that
f_1 = g q_1 + r_1,
and deg(r_1) < deg g = m. Then
f = a_n b_m^{−1} X^{n−m} g + q_1 g + r_1 = (a_n b_m^{−1} X^{n−m} + q_1) g + r_1.
So done. ■
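The division step in the proof translates directly into an algorithm. A Python sketch over F = Q, using exact rational arithmetic (poly_divmod is an illustrative helper; coefficients are listed lowest degree first, and g must be non-zero):

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g in Q[X]; coefficients listed lowest degree first."""
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:        # strip leading zeros of g (g must be nonzero)
        g.pop()
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while True:
        while r and r[-1] == 0:    # strip leading zeros of the remainder
            r.pop()
        if len(r) < len(g):        # deg r < deg g: done
            break
        coef = r[-1] / g[-1]       # a_n * b_m^(-1), possible as Q is a field
        shift = len(r) - len(g)    # multiply g by X^(n-m)
        q[shift] += coef
        for i, c in enumerate(g):  # subtract (a_n b_m^(-1) X^(n-m)) * g
            r[i + shift] -= coef * c
    return q, r

# X^3 + 1 = (X^2 - X + 1)(X + 1) + 0
q, r = poly_divmod([1, 0, 0, 1], [1, 1])
assert q == [1, -1, 1] and all(c == 0 for c in r)
```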
Now that we have a Euclidean algorithm for polynomials, we should be able to show
that every ideal of F[X] is generated by one polynomial. We will not prove it specifically here, but later show that in general, in every ring where the Euclidean algorithm is
possible, all ideals are principal.
We now look at some applications of the Euclidean algorithm.
Example 10.2.7. Consider R[X], and consider the principal ideal (X^2 + 1) ◁ R[X]. We
let R = R[X]/(X^2 + 1).
Elements of R are cosets
f + (X^2 + 1), where f = a_0 + a_1 X + a_2 X^2 + · · · + a_n X^n.
By the Euclidean algorithm, we have
f = q(X^2 + 1) + r,
with deg(r) < 2, i.e. r = b_0 + b_1 X. Thus f + (X^2 + 1) = r + (X^2 + 1). So every element
of R[X]/(X^2 + 1) is representable as a + bX for some a, b ∈ R.
Is this representation unique? If a + bX + (X^2 + 1) = a′ + b′X + (X^2 + 1), then the
difference (a − a′) + (b − b′)X ∈ (X^2 + 1). So it is (X^2 + 1)q for some q. This is possible
only if q = 0, since for non-zero q, we know (X^2 + 1)q has degree at least 2. So we must
have (a − a′) + (b − b′)X = 0. So a + bX = a′ + b′X. So the representation is unique.
What we've got is that every element in R is of the form a + bX, and X^2 + 1 = 0, i.e.
X^2 = −1. This sounds like the complex numbers, just that we are calling it X instead of i.
To show this formally, we define the function
ϕ : R[X]/(X^2 + 1) → C
a + bX + (X^2 + 1) ↦ a + bi.
This is well-defined and a bijection. It is also clearly additive. So to prove this is an
isomorphism, we have to show it is multiplicative. We check this manually. We have
ϕ((a + bX + (X^2 + 1))(c + dX + (X^2 + 1)))
= ϕ(ac + (ad + bc)X + bdX^2 + (X^2 + 1))
= ϕ((ac − bd) + (ad + bc)X + (X^2 + 1))
= (ac − bd) + (ad + bc)i
= (a + bi)(c + di)
= ϕ(a + bX + (X^2 + 1)) ϕ(c + dX + (X^2 + 1)).
So this is indeed an isomorphism.
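The multiplication rule can be sketched in Python, representing a + bX as the pair (a, b) and comparing with Python's built-in complex numbers (mul is an illustrative helper):

```python
# Arithmetic in R[X]/(X^2 + 1), with a + bX stored as the pair (a, b);
# multiplication uses X^2 = -1, exactly as in the computation above.
def mul(x, y):
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

z1, z2 = (1, 2), (3, -1)             # 1 + 2X and 3 - X
w = complex(*z1) * complex(*z2)      # the corresponding product in C
assert mul(z1, z2) == (w.real, w.imag)
```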
This is pretty tedious. Fortunately, we have some helpful results we can use, namely
the isomorphism theorems. These are exactly analogous to those for groups.
Theorem 10.2.1 (First isomorphism theorem). Let ϕ : R → S be a ring homomorphism.
Then ker(ϕ) ◁ R, and
R/ker(ϕ) ≅ im(ϕ) ≤ S.
Proof. We have already seen ker(ϕ) ◁ R. Now define
Φ : R/ker(ϕ) → im(ϕ)
r + ker(ϕ) ↦ ϕ(r).
This is well-defined, since if r + ker(ϕ) = r′ + ker(ϕ), then r − r′ ∈ ker(ϕ). So ϕ(r − r′) = 0.
So ϕ(r) = ϕ(r′).
We don’t have to check this is bijective and additive, since that comes for free from the
(proof of the) isomorphism theorem of groups. So we just have to check it is multiplicative.
To show Φ is multiplicative, we have
Φ((r + ker(ϕ ))(t + ker(ϕ ))) = Φ(rt + ker(ϕ ))
= ϕ (rt)
= ϕ (r)ϕ (t)
= Φ(r + ker(ϕ ))Φ(t + ker(ϕ )).
■
This is more-or-less the same proof as the one for groups, just that we had a few more
things to check.
Since there is the first isomorphism theorem, we, obviously, have more coming.
Theorem 10.2.2 (Second isomorphism theorem). Let R ≤ S and J ◁ S. Then J ∩ R ◁ R,
and
(R + J)/J = {r + J : r ∈ R} ≤ S/J
is a subring, and
R/(R ∩ J) ≅ (R + J)/J.
Proof. Define the function
ϕ : R → S/J
r ↦ r + J.
Since this is the quotient map, it is a ring homomorphism. The kernel is
ker(ϕ ) = {r ∈ R : r + J = 0, i.e. r ∈ J} = R ∩ J.
Then the image is
im(ϕ) = {r + J : r ∈ R} = (R + J)/J.
Then by the first isomorphism theorem, we know R ∩ J ◁ R, and (R + J)/J ≤ S/J, and
R/(R ∩ J) ≅ (R + J)/J.
■
Before we get to the third isomorphism theorem, recall we had the subgroup correspondence for groups. Analogously, for I ◁ R, we have
{subrings of R/I} ←→ {subrings of R which contain I}
L ≤ R/I −→ {x ∈ R : x + I ∈ L}
S/I ≤ R/I ←− I ◁ S ≤ R.
This is exactly the same formula as for groups.
For groups, we had a correspondence for normal subgroups. Here, we have a correspondence between ideals:
{ideals of R/I} ←→ {ideals of R which contain I}.
It is important to note here that quotienting in groups and rings have different purposes.
In groups, we take quotients so that we have simpler groups to work with. In rings, we
often take quotients to get more interesting rings. For example, R[X] is quite boring,
but R[X]/(X^2 + 1) ≅ C is more interesting. Thus this ideal correspondence allows us to
occasionally get interesting ideals from boring ones.
Theorem 10.2.3 (Third isomorphism theorem). Let I ◁ R and J ◁ R, with I ⊆ J. Then
J/I ◁ R/I and
(R/I)/(J/I) ≅ R/J.
Proof. We define the map
ϕ : R/I → R/J
r + I ↦ r + J.
This is well-defined and surjective by the groups case. Also it is a ring homomorphism
since multiplication in R/I and R/J are “the same”. The kernel is
ker(ϕ) = {r + I : r + J = 0, i.e. r ∈ J} = J/I.
So the result follows from the first isomorphism theorem.
■
Note that for any ring R, there is a unique ring homomorphism Z → R, given by
ι : Z → R
n ↦ 1_R + 1_R + · · · + 1_R (n times) if n ≥ 0,
n ↦ −(1_R + 1_R + · · · + 1_R) (−n times) if n ≤ 0.
Any homomorphism Z → R must be given by this formula, since it must send the unit
to the unit, and we can show this is indeed a homomorphism by distributivity. So the
ring homomorphism is unique. In fancy language, we say Z is the initial object in (the
category of) rings.
We then know ker(ι ) ◁ Z. Thus ker(ι ) = nZ for some n.
Definition 10.2.10 (Characteristic of ring). Let R be a ring, and ι : Z → R be the unique
such map. The characteristic of R is the unique non-negative n such that ker(ι ) = nZ.
Example 10.2.8. The rings Z, Q, R, C all have characteristic 0. The ring Z/nZ has characteristic n. In particular, all natural numbers can be characteristics.
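For Z/nZ, the characteristic can be found by repeatedly adding 1 until reaching 0; a Python sketch (characteristic is an illustrative helper):

```python
def characteristic(n):
    """Characteristic of Z/nZ: the least k > 0 with k * 1 = 0, which is n."""
    one = 1 % n          # the multiplicative identity of Z/nZ
    total, k = one, 1
    while total != 0:
        total = (total + one) % n
        k += 1
    return k

assert characteristic(6) == 6
assert characteristic(1) == 1   # the zero ring has characteristic 1
```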
The notion of the characteristic will not be too useful in this course. However, fields of
non-zero characteristic often provide interesting examples and counterexamples to some
later theory.
10.3 Integral domains, field of fractions, maximal and prime ideals
Many rings behave nothing like Z. For example, in Z, we know that if a, b ≠ 0,
then ab ≠ 0. However, in, say, Z/6Z, we have 2, 3 ≠ 0, but 2 · 3 = 0. Also, Z has some nice
properties such as every ideal is principal, and every integer has an (essentially) unique
factorization. We will now classify rings according to which properties they have.
We start with the most fundamental property that the product of two non-zero elements
are non-zero. We will almost exclusively work with rings that satisfy this property.
Definition 10.3.1 (Integral domain). A non-zero ring R is an integral domain if for all
a, b ∈ R, if a · b = 0R , then a = 0R or b = 0R .
An element that violates this property is known as a zero divisor.
Definition 10.3.2 (Zero divisor). An element x ∈ R is a zero divisor if x ≠ 0 and there is
a y ≠ 0 such that x · y = 0 ∈ R.
In other words, a ring is an integral domain if it has no zero divisors.
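A Python sketch listing the zero divisors of Z/6Z, the example mentioned above:

```python
# Zero divisors of Z/6Z: nonzero x such that x*y = 0 for some nonzero y.
n = 6
zero_divisors = {x for x in range(1, n)
                 if any(x * y % n == 0 for y in range(1, n))}
assert zero_divisors == {2, 3, 4}
assert 2 * 3 % n == 0   # the instance 2 * 3 = 0 from the text
```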
Example 10.3.1. All fields are integral domains: if a · b = 0 and b ≠ 0, then a =
a · (b · b^{−1}) = (a · b) · b^{−1} = 0. Similarly, if a ≠ 0, then b = 0.
Example 10.3.2. A subring of an integral domain is an integral domain, since a zero
divisor in the small ring would also be a zero divisor in the big ring.
Example 10.3.3. Immediately, we know Z, Q, R, C are integral domains, since C is a
field, and the others are subrings of it. Also, Z[i] ≤ C is also an integral domain.
These are the nice rings we like in number theory, since there we can sensibly talk
about things like factorization.
It turns out there are no interesting finite integral domains.
Lemma 10.3.1. Let R be a finite ring which is an integral domain. Then R is a field.
Proof. Let a ∈ R be non-zero, and consider the map (a homomorphism of additive groups)
a · − : R → R
b ↦ a · b.
We want to show this is injective. For this, it suffices to show the kernel is trivial. If
r ∈ ker(a · −), then a · r = 0. So r = 0 since R is an integral domain. So the kernel is
trivial.
Since R is finite, a · − must also be surjective. In particular, there is an element b ∈ R
such that a · b = 1R . So a has an inverse. Since a was arbitrary, R is a field.
■
So far, we know fields are integral domains, and subrings of integral domains are
integral domains. We have another good source of integral domain as follows:
Lemma 10.3.2. Let R be an integral domain. Then R[X] is also an integral domain.
Proof. We need to show that the product of two non-zero elements is non-zero. Let
f, g ∈ R[X] be non-zero, say
f = a_0 + a_1 X + · · · + a_n X^n ∈ R[X],
g = b_0 + b_1 X + · · · + b_m X^m ∈ R[X],
with a_n, b_m ≠ 0. Then the coefficient of X^{n+m} in f g is a_n b_m. This is non-zero since R is
an integral domain. So f g is non-zero. So R[X] is an integral domain.
■
So, for instance, Z[X] is an integral domain.
We can also iterate this.
Notation 10.2. Write R[X,Y ] for (R[X])[Y ], the polynomial ring of R in two variables. In
general, write R[X1 , · · · , Xn ] = (· · · ((R[X1 ])[X2 ]) · · · )[Xn ].
Then if R is an integral domain, so is R[X1 , · · · , Xn ].
We now mimic the familiar construction of Q from Z. For any integral domain R,
we want to construct a field F that consists of “fractions” of elements in R. Recall that a
subring of any field is an integral domain. This says the converse — every integral domain
is the subring of some field.
Definition 10.3.3 (Field of fractions). Let R be an integral domain. A field of fractions F
of R is a field with the following properties
(i) R ≤ F
(ii) Every element of F may be written as a · b^{−1} for a, b ∈ R, where b^{−1} means the
multiplicative inverse to b ≠ 0 in F.
For example, Q is the field of fractions of Z.
Theorem 10.3.1. Every integral domain has a field of fractions.
Proof. The construction is exactly how we construct the rationals from the integers — as
equivalence classes of pairs of integers. We let
S = {(a, b) ∈ R × R : b ≠ 0}.
We think of (a, b) ∈ S as a/b. We define the equivalence relation ∼ on S by
(a, b) ∼ (c, d) ⇔ ad = bc.
We need to show this is indeed an equivalence relation. Symmetry and reflexivity are
obvious. To show transitivity, suppose
(a, b) ∼ (c, d),
(c, d) ∼ (e, f ),
i.e.
ad = bc,
c f = de.
We multiply the first equation by f and the second by b, to obtain
ad f = bc f ,
bc f = bed.
Rearranging, we get
d(a f − be) = 0.
Since d is in the denominator, d ≠ 0. Since R is an integral domain, we must have a f − be = 0, i.e. a f = be. So (a, b) ∼ (e, f ). This is where being an integral domain
is important.
Now let
F = S/∼
be the set of equivalence classes. We now want to check this is indeed the field of fractions.
We first want to show it is a field. We write a/b = [(a, b)] ∈ F, and define the operations by

a/b + c/d = (ad + bc)/(bd),
(a/b) · (c/d) = (ac)/(bd).

These are well-defined, and make (F, +, ·, 0/1, 1/1) into a ring. There are many things to check, but those are straightforward, and we will not waste time doing that here.
Finally, we need to show every non-zero element has an inverse. Let a/b ≠ 0F , i.e. a/b ≠ 0/1, or a · 1 ≠ b · 0 ∈ R, i.e. a ≠ 0. Then b/a ∈ F is defined, and

(a/b) · (b/a) = (ba)/(ba) = 1F .

So a/b has a multiplicative inverse. So F is a field.
We now need to construct a subring of F that is isomorphic to R. To do so, we need to define an injective homomorphism ϕ : R → F. This is given by

ϕ : R → F
r ↦ r/1.

This is a ring homomorphism, as one can check easily. The kernel is the set of all r ∈ R such that r/1 = 0, i.e. r = 0. So the kernel is trivial, and ϕ is injective. Then by the first isomorphism theorem, R ≅ im(ϕ ) ⊆ F.
Finally, we need to show everything is a quotient of two things in R. We have

a/b = (a/1) · (1/b) = (a/1) · (b/1)−1 ,

as required.
■
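The pair construction can be carried out quite literally for R = Z. A small sketch (the helpers `eq`, `add`, `mul` are our own names, not standard library functions):

```python
# The field-of-fractions construction for R = Z as literal pairs: (a, b)
# stands for a/b with b != 0, and (a, b) ~ (c, d) iff a*d == b*c.

def eq(x, y):
    (a, b), (c, d) = x, y
    return a * d == b * c

def add(x, y):
    (a, b), (c, d) = x, y
    return (a * d + b * c, b * d)

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c, b * d)

half, third = (1, 2), (1, 3)
print(add(half, third))               # (5, 6)
print(eq(mul(half, (2, 1)), (1, 1)))  # True: (2, 1) is the inverse of (1, 2)
```

Note that `add` and `mul` return a representative pair, not a reduced fraction; equality of fractions is exactly the relation `eq`.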
This gives us a very useful tool. Since this gives us a field from an integral domain,
this allows us to use field techniques to study integral domains. Moreover, we can use this
to construct new interesting fields from integral domains.
Example 10.3.4. Consider the integral domain C[X]. Its field of fractions is the field of all rational functions p(X)/q(X), where p, q ∈ C[X] and q ≠ 0.
To some people, it is a shame to think of rings as having elements. Instead, we should
think of a ring as a god-like object, and the only things we should ever mention are
its ideals. We should also not think of the ideals as containing elements, but just some
abstract objects, and all we know is how ideals relate to one another, e.g. if one contains
the other.
Under this philosophy, we can think of a field as follows:
Lemma 10.3.3. A (non-zero) ring R is a field if and only if its only ideals are {0} and R.
Note that we don’t need elements to define the ideals {0} and R. {0} can be defined
as the ideal that all other ideals contain, and R is the ideal that contains all other ideals.
Alternatively, we can reword this as “R is a field if and only if it has only two ideals” to
avoid mentioning explicit ideals.
Proof. (⇒) Let R be a field and I ◁ R. If there is some x ∈ I with x ≠ 0, then as x is a unit, I = R.
(⇐) Suppose 0 ≠ x ∈ R. Then (x) is an ideal of R. It is not {0} since it contains x. So
(x) = R. In other words 1R ∈ (x). But (x) is defined to be {x · y : y ∈ R}. So there is some
u ∈ R such that x · u = 1R . So x is a unit. Since x was arbitrary, R is a field.
■
This is another reason why fields are special. They have the simplest possible ideal
structure.
This motivates the following definition:
Definition 10.3.4 (Maximal ideal). An ideal I of a ring R is maximal if I ≠ R and for any
ideal J with I ≤ J ≤ R, either J = I or J = R.
The relation with what we’ve done above is quite simple. There is an easy way to
recognize if an ideal is maximal.
Lemma 10.3.4. An ideal I ◁ R is maximal if and only if R/I is a field.
Proof. R/I is a field if and only if {0} and R/I are the only ideals of R/I. By the ideal
correspondence, this is equivalent to saying that I and R are the only ideals of R which contain I, i.e. I is maximal. So done.
■
This is a nice result. This makes a correspondence between properties of ideals I and
properties of the quotient R/I. Here is another one:
Definition 10.3.5 (Prime ideal). An ideal I of a ring R is prime if I ≠ R and whenever
a, b ∈ R are such that a · b ∈ I, then a ∈ I or b ∈ I.
This is like the opposite of the property of being an ideal: being an ideal means that if we have something in the ideal and something outside, the product is always in the ideal. Primality goes the other way round: if the product of two arbitrary things is in the ideal, then one of them must have been from the ideal.
Example 10.3.5. A non-zero ideal nZ ◁ Z is prime if and only if n is a prime.
To show this, first suppose n = p is a prime, and a · b ∈ pZ. So p | a · b. So p | a or
p | b, i.e. a ∈ pZ or b ∈ pZ.
For the other direction, suppose n = pq is a composite number (p, q ≠ 1). Then n ∈ nZ but p ∉ nZ and q ∉ nZ, since 0 < p, q < n. So nZ is not prime.
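For nZ ◁ Z, the defining property of a prime ideal can be spot-checked by brute force over a small range (this is an illustration, not a proof):

```python
# nZ is prime exactly when n | a*b forces n | a or n | b. Search for
# counterexamples among small a, b.

def ideal_is_prime(n, bound=50):
    return not any((a * b) % n == 0 and a % n and b % n
                   for a in range(1, bound) for b in range(1, bound))

print([n for n in range(2, 20) if ideal_is_prime(n)])
# [2, 3, 5, 7, 11, 13, 17, 19] -- exactly the primes
```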
So instead of talking about prime numbers, we can talk about prime ideals instead,
because ideals are better than elements.
We prove a result similar to the above:
Lemma 10.3.5. An ideal I ◁ R is prime if and only if R/I is an integral domain.
Proof. Let I be prime. Let a + I, b + I ∈ R/I, and suppose (a + I)(b + I) = 0R/I . By
definition, (a + I)(b + I) = ab + I. So we must have ab ∈ I. As I is prime, either a ∈ I or
b ∈ I. So a + I = 0R/I or b + I = 0R/I . So R/I is an integral domain.
Conversely, suppose R/I is an integral domain. Let a, b ∈ R be such that ab ∈ I. Then
(a + I)(b + I) = ab + I = 0R/I ∈ R/I. Since R/I is an integral domain, either a + I = 0R/I
or b + I = 0R/I , i.e. a ∈ I or b ∈ I. So I is a prime ideal.
■
Prime ideals and maximal ideals are the main types of ideals we care about. Note that
every field is an integral domain. So we immediately have the following result:
Proposition 10.3.1. Every maximal ideal is a prime ideal.
Proof. I ◁ R is maximal implies R/I is a field implies R/I is an integral domain implies I
is prime.
■
The converse is not true. For example, {0} ⊆ Z is prime but not maximal. Less stupidly, (X) ◁ Z[X,Y ] is prime but not maximal (since Z[X,Y ]/(X) ≅ Z[Y ]). We can provide a more explicit proof of this, which is essentially the same.
Alternative proof. Let I be a maximal ideal, and suppose a, b 6∈ I but ab ∈ I. Then by
maximality, I + (a) = I + (b) = R = (1). So we can find some p, q ∈ R and n, m ∈ I such
that n + ap = m + bq = 1. Then
1 = (n + ap)(m + bq) = nm + apm + bqn + abpq ∈ I,
since n, m, ab ∈ I. This is a contradiction.
■
Lemma 10.3.6. Let R be an integral domain. Then its characteristic is either 0 or a prime
number.
Proof. Consider the unique ring homomorphism ϕ : Z → R, and let ker(ϕ ) = nZ. Then n is the characteristic of R by definition.
By the first isomorphism theorem, Z/nZ ≅ im(ϕ ) ≤ R. So Z/nZ is an integral domain. So nZ ◁ Z is a prime ideal. So n = 0 or a prime number.
■
10.4 Factorization in integral domains
We now move on to tackle the problem of factorization in rings. For sanity, we suppose
throughout the section that R is an integral domain. We start by making loads of definitions.
Definition 10.4.1 (Unit). An element a ∈ R is a unit if there is a b ∈ R such that ab = 1R .
Equivalently, if the ideal (a) = R.
Definition 10.4.2 (Division). For elements a, b ∈ R, we say a divides b, written a | b, if
there is a c ∈ R such that b = ac. Equivalently, if (b) ⊆ (a).
Definition 10.4.3 (Associates). We say a, b ∈ R are associates if a = bc for some unit c.
Equivalently, if (a) = (b). Equivalently, if a | b and b | a.
In the integers, this can only happen if a and b differ by a sign, but in more interesting
rings, more interesting things can happen.
When considering division in rings, we often consider two associates to be “the same”.
For example, in Z, we can factorize 6 as
6 = 2 · 3 = (−2) · (−3),
but this does not violate unique factorization, since 2 and −2 are associates (and so are 3
and −3), and we consider these two factorizations to be “the same”.
Definition 10.4.4 (Irreducible). We say a ∈ R is irreducible if a ≠ 0, a is not a unit, and
if a = xy, then x or y is a unit.
For integers, being irreducible is the same as being a prime number. However, “prime”
means something different in general rings.
Definition 10.4.5 (Prime). We say a ∈ R is prime if a is non-zero, not a unit, and whenever
a | xy, either a | x or a | y.
It is important to note all these properties depend on the ring, not just the element
itself.
Example 10.4.1. 2 ∈ Z is a prime, but 2 ∈ Q is not (since it is a unit).
Similarly, the polynomial 2X ∈ Q[X] is irreducible (since 2 is a unit there), but 2X ∈ Z[X] is not irreducible, since 2X = 2 · X and neither 2 nor X is a unit in Z[X].
We have two things called prime, so they had better be related.
Lemma 10.4.1. A principal ideal (r) is a prime ideal in R if and only if r = 0 or r is prime.
Proof. (⇒) Let (r) be a prime ideal. If r = 0, then done. Otherwise, as prime ideals are
proper, i.e. not the whole ring, r is not a unit. Now suppose r | a · b. Then a · b ∈ (r). But
(r) is prime. So a ∈ (r) or b ∈ (r). So r | a or r | b. So r is prime.
(⇐) If r = 0, then (0) = {0} ◁ R, which is prime since R is an integral domain. Otherwise, let r ≠ 0 be prime. Suppose a · b ∈ (r). This means r | a · b. So r | a or r | b. So a ∈ (r) or b ∈ (r). So (r) is prime.
■
Note that in Z, prime numbers exactly match the irreducibles, but prime numbers are
also prime (surprise!). In general, it is not true that irreducibles are the same as primes.
However, one direction is always true.
Lemma 10.4.2. Let r ∈ R be prime. Then it is irreducible.
Proof. Let r ∈ R be prime, and suppose r = ab. Since r | r = ab, and r is prime, we must
have r | a or r | b. wlog, r | a. So a = rc for some c ∈ R. So r = ab = rcb. Since we are in
an integral domain, we must have 1 = cb. So b is a unit.
■
We now do a long interesting example.
Example 10.4.2. Let

R = Z[√−5] = {a + b√−5 : a, b ∈ Z} ≤ C.
By definition, it is a subring of a field. So it is an integral domain. What are the units of
the ring? There is a nice trick we can use, when things are lying inside C. Consider the
function
N : R → Z≥0
given by

N(a + b√−5) = a2 + 5b2 .

It is convenient to think of this as z ↦ zz̄ = |z|2 . This satisfies N(z · w) = N(z)N(w). This is a desirable thing to have for a ring, since it immediately implies all units have norm 1: if r · s = 1, then 1 = N(1) = N(rs) = N(r)N(s). So N(r) = N(s) = 1.
So to find the units, we need to solve a2 + 5b2 = 1 for integers a, b. The only solutions are (a, b) = (±1, 0). So only ±1 ∈ R can be units, and these obviously are units. So these are all the units.
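A quick computational check of these claims (the `norm` helper is our own; the search over a finite box is a spot check, though the algebra above shows it holds everywhere):

```python
# The norm N(a + b*sqrt(-5)) = a^2 + 5b^2 on Z[sqrt(-5)]: search for
# elements of norm 1 (units) and of norm 2 or 3 (there are none).

def norm(a, b):
    return a * a + 5 * b * b

rng = range(-10, 11)
print([(a, b) for a in rng for b in rng if norm(a, b) == 1])   # [(-1, 0), (1, 0)]
print(any(norm(a, b) in (2, 3) for a in rng for b in rng))     # False
```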
Next, we claim 2 ∈ R is irreducible. We again use the norm. Suppose 2 = ab. Then 4 = N(2) = N(a)N(b). Now note that nothing has norm 2: a2 + 5b2 can never be 2 for integers a, b ∈ Z. So we must have, wlog, N(a) = 4, N(b) = 1. So b must be a unit. Similarly, we see that 3, 1 + √−5 and 1 − √−5 are irreducible (since there is also no element of norm 3).
We have four irreducible elements in this ring. Are they prime? No! Note that
(1 + √−5)(1 − √−5) = 6 = 2 · 3.

We now claim 2 does not divide 1 + √−5 or 1 − √−5. So 2 is not prime. To show this, suppose 2 | 1 + √−5. Then N(2) | N(1 + √−5). But N(2) = 4 and N(1 + √−5) = 6, and 4 ∤ 6. Similarly, N(1 − √−5) = 6 as well. So 2 ∤ 1 ± √−5.
There are several life lessons here. First is that primes and irreducibles are not the
same thing in general. We’ve always thought they were the same because we’ve been
living in the fantasy land of the integers. But we need to grow up.
The second one is that factorization into irreducibles is not necessarily unique, since 2 · 3 = (1 + √−5)(1 − √−5) are two distinct factorizations of 6 into irreducibles.
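We can verify the two factorizations of 6 with exact pair arithmetic, writing a + b√−5 as the pair (a, b) (the `mul` helper is our own):

```python
# Exact arithmetic in Z[sqrt(-5)]: (a + b s)(c + d s) with s^2 = -5.

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

print(mul((2, 0), (3, 0)))    # (6, 0): 2 * 3 = 6
print(mul((1, 1), (1, -1)))   # (6, 0): (1 + s)(1 - s) = 1 - s^2 = 6
```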
However, there is one situation when unique factorizations holds. This is when we
have a Euclidean algorithm available.
Definition 10.4.6 (Euclidean domain). An integral domain R is a Euclidean domain (ED)
if there is a Euclidean function ϕ : R \ {0} → Z≥0 such that
(i) ϕ (a · b) ≥ ϕ (b) for all a, b ≠ 0;
(ii) If a, b ∈ R, with b ≠ 0, then there are q, r ∈ R such that
a = b · q + r,
and either r = 0 or ϕ (r) < ϕ (b).
What are examples? Every time in this course where we said “Euclidean algorithm”,
we have an example.
Example 10.4.3. Z is a Euclidean domain with ϕ (n) = |n|.
Example 10.4.4. For any field F, F[X] is a Euclidean domain with
ϕ ( f ) = deg( f ).
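The division step for F[X] can be sketched with exact rational arithmetic; `poly_divmod` below is our own helper, using coefficient lists written low degree first:

```python
# Division with remainder in Q[X]; the Euclidean function is the degree.
from fractions import Fraction

def poly_divmod(a, b):
    """Return (q, r) with a = b*q + r and deg r < deg b (r == [] means 0)."""
    a = [Fraction(x) for x in a]
    b = [Fraction(x) for x in b]
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        coef = a[-1] / b[-1]          # kill the leading term of a
        q[shift] = coef
        a = [a[i] - coef * b[i - shift] if i >= shift else a[i]
             for i in range(len(a))]
        while a and a[-1] == 0:
            a.pop()
    return q, a

q, r = poly_divmod([1, 0, 1], [1, 1])   # (X^2 + 1) divided by (X + 1)
print(q, r)                              # q = X - 1, r = 2
```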
Example 10.4.5. The Gaussian integers R = Z[i] ≤ C is a Euclidean domain with ϕ (z) =
N(z) = |z|2 . We now check this:
(i) We have ϕ (zw) = ϕ (z)ϕ (w) ≥ ϕ (z), since ϕ (w) is a positive integer.
(ii) Given a, b ∈ Z[i] with b ≠ 0, we consider the complex number a/b ∈ C. If we picture the points of Z[i] as a lattice in the complex plane, then every complex number is at distance less than 1 from its nearest lattice point. So there is some q ∈ Z[i] such that |a/b − q| < 1. So we can write

a/b = q + c

with |c| < 1. Then we have

a = b · q + b · c,

and we take r = b · c. We know r = a − bq ∈ Z[i], and ϕ (r) = N(bc) = N(b)N(c) < N(b) = ϕ (b). So done.
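The geometric argument translates into a short division algorithm for Z[i]: round a/b to the nearest lattice point. A sketch, with pairs (x, y) standing for x + yi (`gauss_divmod` is our own name):

```python
# Division with remainder in the Gaussian integers Z[i].

def gauss_divmod(a, b):
    """a, b are pairs (x, y) representing x + y*i, with b != 0."""
    (x, y), (u, v) = a, b
    n = u * u + v * v                       # N(b)
    # a/b = a * conj(b) / N(b); round each coordinate to the nearest integer
    re, im = x * u + y * v, y * u - x * v
    q = (round(re / n), round(im / n))
    r = (x - (u * q[0] - v * q[1]), y - (u * q[1] + v * q[0]))
    return q, r

q, r = gauss_divmod((7, 2), (3, 1))        # divide 7 + 2i by 3 + i
print(q, r)                                # (2, 0) (1, 0)
print(r[0]**2 + r[1]**2 < 3**2 + 1**2)     # True: N(r) < N(b)
```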
This is not just true for the Gaussian integers. All we really needed was that R ≤ C, and for any x ∈ C, there is some point in R that is not more than 1 away from x. If we draw some more pictures, we will see this is not true for Z[√−5].
Before we move on to prove unique factorization, we first derive something we’ve previously mentioned. Recall we showed that every ideal in Z is principal, and we proved this
by the Euclidean algorithm. So we might expect this to be true in an arbitrary Euclidean
domain.
Definition 10.4.7 (Principal ideal domain). A ring R is a principal ideal domain (PID) if
it is an integral domain, and every ideal is a principal ideal, i.e. for all I ◁ R, there is some
a such that I = (a).
Example 10.4.6. Z is a principal ideal domain.
Proposition 10.4.1. Let R be a Euclidean domain. Then R is a principal ideal domain.
We have already proved this, just that we did it for a particular Euclidean domain Z.
Nonetheless, we shall do it again.
Proof. Let R have a Euclidean function ϕ : R \ {0} → Z≥0 . We let I ◁ R be a non-zero
ideal, and let b ∈ I \ {0} be an element with ϕ (b) minimal. Then for any a ∈ I, we write
a = bq + r,
with r = 0 or ϕ (r) < ϕ (b). However, any such r must be in I since r = a − bq ∈ I. So
we cannot have ϕ (r) < ϕ (b). So we must have r = 0. So a = bq. So a ∈ (b). Since this
is true for all a ∈ I, we must have I ⊆ (b). On the other hand, since b ∈ I, we must have
(b) ⊆ I. So we must have I = (b).
■
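For R = Z this proof is just the classical Euclidean algorithm, which finds the generator of the ideal (a, b):

```python
# In a Euclidean domain every ideal is principal. For Z, the generator of
# (a, b) is gcd(a, b), found by repeated division with remainder.

def euclid(a, b):
    while b != 0:
        a, b = b, a % b
    return abs(a)

print(euclid(12, 18))   # 6, so (12, 18) = (6) as ideals of Z
```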
This is exactly, word by word, the same proof as we gave for the integers, except we
replaced the absolute value with ϕ .
Example 10.4.7. Z is a Euclidean domain, and hence a principal ideal domain. Also, for
any field F, F[X] is a Euclidean domain, hence a principal ideal domain.
Also, Z[i] is a Euclidean domain, and hence a principal ideal domain.
What is a non-example of principal ideal domains? In Z[X], the ideal (2, X) ◁ Z[X]
is not a principal ideal. Suppose it were. Then (2, X) = ( f ). Since 2 ∈ (2, X) = ( f ), we
know 2 ∈ ( f ), i.e. 2 = f · g for some g. So f has degree zero, and hence is constant. So f = ±1 or ±2.
If f = ±1, since ±1 are units, then ( f ) = Z[X]. But (2, X) ≠ Z[X], since, say, 1 ∉ (2, X). If f = ±2, then since X ∈ (2, X) = ( f ), we must have ±2 | X, but this is clearly false. So (2, X) cannot be a principal ideal.
Example 10.4.8. Let A ∈ Mn×n (F) be an n × n matrix over a field F. We consider the
following set
I = { f ∈ F[X] : f (A) = 0}.
This is an ideal: if f , g ∈ I, then ( f + g)(A) = f (A) + g(A) = 0. Similarly, if f ∈ I and h ∈ F[X], then (h f )(A) = h(A) f (A) = 0.
But we know F[X] is a principal ideal domain. So there must be some m ∈ F[X] such that I = (m).
Suppose f ∈ F[X] such that f (A) = 0, i.e. f ∈ I. Then m | f . So m is a polynomial
that divides all polynomials that kill A, i.e. m is the minimal polynomial of A.
We have just proved that all matrices have minimal polynomials, and that the minimal
polynomial divides all other polynomials that kill A. Also, the minimal polynomial is
unique up to multiplication of units.
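A concrete instance of this ideal, sketched for a 2 × 2 integer matrix A with A2 = 0 (so the minimal polynomial is X 2 ); the helper names are our own:

```python
# Evaluate polynomials at a 2x2 matrix and check which ones kill it.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def poly_at(coeffs, A):
    """Evaluate f(A) for f = coeffs[0] + coeffs[1] X + ... (low degree first)."""
    result = [[0, 0], [0, 0]]
    power = [[1, 0], [0, 1]]            # A^0 = identity
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(2)]
                  for i in range(2)]
        power = mat_mul(power, A)
    return result

A = [[0, 1], [0, 0]]                    # A^2 = 0
print(poly_at([0, 0, 1], A))            # X^2 kills A: [[0, 0], [0, 0]]
print(poly_at([0, 1], A))               # X does not:  [[0, 1], [0, 0]]
```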
Let’s get further into number theory-like things. For a general ring, we cannot factorize things into irreducibles uniquely. However, in some rings, this is possible.
Definition 10.4.8 (Unique factorization domain). An integral domain R is a unique factorization domain (UFD) if
(i) Every non-zero non-unit may be written as a product of irreducibles;
(ii) If p1 p2 · · · pn = q1 · · · qm with pi , q j irreducibles, then n = m, and they can be reordered such that pi is an associate of qi .
This is a really nice property, and here we can do things we are familiar with in number
theory. So how do we know if something is a unique factorization domain?
Our goal is to show that all principal ideal domains are unique factorization domains.
To do so, we are going to prove several lemmas that give us some really nice properties
of principal ideal domains.
Recall we saw that every prime is an irreducible, but in Z[√−5], there are some irreducibles that are not prime. However, this cannot happen in principal ideal domains.
Lemma 10.4.3. Let R be a principal ideal domain. If p ∈ R is irreducible, then it is prime.
Note that this is also true for general unique factorization domains, which we can
prove directly by unique factorization.
Proof. Let p ∈ R be irreducible, and suppose p | a · b. Also, suppose p ∤ a. We need to
show p | b.
Consider the ideal (p, a) ◁ R. Since R is a principal ideal domain, there is some d ∈ R
such that (p, a) = (d). So d | p and d | a.
Since d | p, there is some q1 such that p = q1 d. As p is irreducible, either q1 or d is a
unit.
If q1 is a unit, then d = q1−1 p, and this divides a. So a = q1−1 px for some x, i.e. p | a. This is a contradiction, since p ∤ a.
Therefore d is a unit. So (p, a) = (d) = R. In particular, 1R ∈ (p, a). So suppose
1R = rp + sa, for some r, s ∈ R. We now take the whole thing and multiply by b. Then we
get
b = rpb + sab.
We observe that p divides rpb, and p divides sab (since p | ab). So p divides b. So done.
■
This is similar to the argument for integers. For integers, we would say if p ∤ a, then p
and a are coprime. Therefore there are some r, s such that 1 = rp + sa. Then we continue
the proof as above. Hence what we did in the middle is to do something similar to showing
p and a are “coprime”.
Another nice property of principal ideal domains is the following:
Lemma 10.4.4. Let R be a principal ideal domain. Let I1 ⊆ I2 ⊆ I3 ⊆ · · · be a chain of
ideals. Then there is some N ∈ N such that In = IN for all n ≥ N.
So in a principal ideal domain, we cannot have an infinite chain of bigger and bigger
ideals.
Definition 10.4.9 (Ascending chain condition). A ring satisfies the ascending chain condition (ACC) if there is no infinite strictly increasing chain of ideals.
Definition 10.4.10 (Noetherian ring). A ring that satisfies the ascending chain condition
is known as a Noetherian ring.
So we are proving that every principal ideal domain is Noetherian.
Proof. The obvious thing to do when we have an infinite chain of ideals is to take the
union of them. We let

I = ⋃_{n≥1} In ,

which is again an ideal. Since R is a principal ideal domain, I = (a) for some a ∈ R. We know a ∈ I = ⋃_{n≥1} In . So a ∈ IN for some N. Then we have

(a) ⊆ IN ⊆ I = (a).

So we must have IN = I. So In = IN = I for all n ≥ N.
■
Notice it is not important that I is generated by one element. If, for some reason, we know I is generated by finitely many elements, then the same argument works. So if every ideal is finitely generated, then the ring must be Noetherian. It turns out this is an if-and-only-if: if a ring is Noetherian, then every ideal is finitely generated. We will prove this later on in the course.
Finally, we have done the setup, and we can prove the proposition promised.
Proposition 10.4.2. Let R be a principal ideal domain. Then R is a unique factorization
domain.
Proof. We first need to show any (non-unit) r ∈ R is a product of irreducibles.
Suppose r ∈ R cannot be factored as a product of irreducibles. Then it is certainly not
irreducible. So we can write r = r1 s1 , with r1 , s1 both non-units. Since r cannot be factored as a product of irreducibles, wlog r1 cannot be factored as a product of irreducibles
(if both can, then r would be a product of irreducibles). So we can write r1 = r2 s2 , with
r2 , s2 not units. Again, wlog r2 cannot be factored as a product of irreducibles. We continue this way.
By assumption, the process does not end, and then we have the following chain of
ideals:
(r) ⊆ (r1 ) ⊆ (r2 ) ⊆ · · · ⊆ (rn ) ⊆ · · ·
But then we have an ascending chain of ideals. By the ascending chain condition, these
are all eventually equal, i.e. there is some n such that (rn ) = (rn+1 ) = (rn+2 ) = · · · . In
particular, since (rn ) = (rn+1 ), and rn = rn+1 sn+1 , then sn+1 is a unit. But this is a contradiction, since sn+1 is not a unit. So r must be a product of irreducibles.
To show uniqueness, we let p1 p2 · · · pn = q1 q2 · · · qm , with pi , qi irreducible. So in
particular p1 | q1 · · · qm . Since p1 is irreducible, it is prime. So p1 divides some qi . We
reorder and suppose p1 | q1 . So q1 = p1 · a for some a. But since q1 is irreducible, a must
be a unit. So p1 , q1 are associates. Since R is a principal ideal domain, hence integral
domain, we can cancel p1 to obtain
p2 p3 · · · pn = (aq2 )q3 · · · qm .
We now rename aq2 as q2 , so that we in fact have
p2 p3 · · · pn = q2 q3 · · · qm .
We can then continue to show that pi and qi are associates for all i. This also shows that n = m: otherwise, if n > m, say, after cancelling all the qi we would be left with pm+1 · · · pn equal to a unit, contradicting irreducibility of the pi .
■
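For R = Z the descent in the existence half is just trial division, which must terminate because the factors keep shrinking:

```python
# Split off irreducible (prime) factors of an integer until none remain.

def factor(n):
    """Return the prime factorization of n > 1, in increasing order."""
    out, d = [], 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out

print(factor(360))   # [2, 2, 2, 3, 3, 5]
```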
We can now use this to define other familiar notions from number theory.
Definition 10.4.11 (Greatest common divisor). d is a greatest common divisor (gcd) of elements a1 , a2 , · · · , an ∈ R if d | ai for all i, and if any other d′ satisfies d′ | ai for all i, then d′ | d.
Note that the gcd of a set of elements, if it exists, is not unique. It is only well-defined up to a unit.
This is a definition that says what it means to be a greatest common divisor. However,
it does not always have to exist.
Lemma 10.4.5. Let R be a unique factorization domain. Then greatest common divisors exist, and are unique up to associates.
Proof. We construct the greatest common divisor using the good old way of prime factorization.
We let p1 , p2 , · · · , pm be a list of the irreducible factors of the ai , such that no two of these are associates of each other. We now write

ai = ui ∏_{j=1}^{m} pj^{nij},

where nij ∈ N and the ui are units. We let

mj = min_i {nij},

and choose

d = ∏_{j=1}^{m} pj^{mj}.

As, by definition, mj ≤ nij for all i, we know d | ai for all i.
Finally, if d′ | ai for all i, then we can write

d′ = v ∏_{j=1}^{m} pj^{tj},

with v a unit. Then we must have tj ≤ nij for all i, j. So we must have tj ≤ mj for all j. So d′ | d.
Uniqueness is immediate since any two greatest common divisors have to divide each
other.
■
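The construction in the proof, carried out for R = Z: factor each element and take the minimum exponent of each prime (the helper names are our own):

```python
# gcd of several integers via prime factorizations and minimum exponents.

from collections import Counter

def factor_counts(n):
    """Prime factorization of n > 1 as a Counter {prime: exponent}."""
    c, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            c[d] += 1
            n //= d
        d += 1
    if n > 1:
        c[n] += 1
    return c

def gcd_of(*nums):
    counts = [factor_counts(n) for n in nums]
    primes = set().union(*counts)
    result = 1
    for p in primes:
        result *= p ** min(c[p] for c in counts)   # Counter gives 0 if absent
    return result

print(gcd_of(12, 18, 30))   # 6 = 2^1 * 3^1
```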
10.5 Factorization in polynomial rings
Since polynomial rings are a bit more special than general integral domains, we can say a
bit more about them.
Recall that for F a field, we know F[X] is a Euclidean domain, hence a principal ideal
domain, hence a unique factorization domain. Therefore we know
(i) If I ◁ F[X], then I = ( f ) for some f ∈ F[X].
(ii) If f ∈ F[X], then f is irreducible if and only if f is prime.
(iii) Let f be irreducible, and suppose ( f ) ⊆ J ⊆ F[X]. Then J = (g) for some g. Since
( f ) ⊆ (g), we must have f = gh for some h. But f is irreducible. So either g or h
is a unit. If g is a unit, then (g) = F[X]. If h is a unit, then ( f ) = (g). So ( f ) is
a maximal ideal. Note that this argument is valid for any PID, not just polynomial
rings.
(iv) Let ( f ) be a prime ideal. Then f is prime. So f is irreducible. So ( f ) is maximal.
But we also know in complete generality that maximal ideals are prime. So in
F[X], prime ideals are the same as maximal ideals. Again, this is true for all PIDs
in general.
(v) Thus f is irreducible if and only if F[X]/( f ) is a field.
To use the last item, we can first show that F[X]/( f ) is a field, and then use this to deduce
that f is irreducible. But we can also do something more interesting — find an irreducible
f , and then generate an interesting field F[X]/( f ).
So we want to understand reducibility, i.e. we want to know whether we can factorize
a polynomial f . Firstly, we want to get rid of the trivial case where we just factor out a
scalar, e.g. 2X 2 + 2 = 2(X 2 + 1) ∈ Z[X] is a boring factorization.
Definition 10.5.1 (Content). Let R be a UFD and f = a0 + a1 X + · · · + an X n ∈ R[X]. The
content c( f ) of f is
c( f ) = gcd(a0 , a1 , · · · , an ) ∈ R.
Again, since the gcd is only defined up to a unit, so is the content.
Definition 10.5.2 (Primitive polynomial). A polynomial is primitive if c( f ) is a unit, i.e.
the ai are coprime.
Note that this is the best we can do. We cannot ask for c( f ) to be exactly 1, since the
gcd is only well-defined up to a unit.
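For R = Z the content is computable directly as a gcd of coefficients:

```python
# Content of a polynomial in Z[X]: the gcd of its coefficients. The
# polynomial is primitive when the content is a unit (here, 1).

from math import gcd
from functools import reduce

def content(coeffs):
    return reduce(gcd, coeffs)

print(content([2, 4, 6]))   # 2: 2 + 4X + 6X^2 = 2(1 + 2X + 3X^2)
print(content([1, 0, 1]))   # 1: X^2 + 1 is primitive
```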
We now want to prove the following important lemma:
Lemma 10.5.1 (Gauss’ lemma). Let R be a UFD, and f ∈ R[X] be a primitive polynomial.
Then f is reducible in R[X] if and only if f is reducible in F[X], where F is the field of
fractions of R.
We can’t do this right away. We first need some preparation. Before that, we do some
examples.
Example 10.5.1. Consider X 3 + X + 1 ∈ Z[X]. This has content 1 so is primitive. We
show it is not reducible in Z[X], and hence not reducible in Q[X].
Suppose f is reducible in Q[X]. Then by Gauss’ lemma, this is reducible in Z[X]. So
we can write
X 3 + X + 1 = gh,
for some polynomials g, h ∈ Z[X], with g, h not units. But if g and h are not units, then
they cannot be constant, since the coefficients of X 3 + X + 1 are all 1 or 0. So they have
degree at least 1. Since the degrees add up to 3, we wlog suppose g has degree 1 and h
has degree 2. So suppose
g = b0 + b1 X,
h = c0 + c1 X + c2 X 2 .
Multiplying out and equating coefficients, we get
b0 c0 = 1
c2 b1 = 1
So b0 and b1 must be ±1. So g is either 1 + X, 1 − X, −1 + X or −1 − X, and hence has
±1 as a root. But this is a contradiction, since ±1 is not a root of X 3 + X + 1. So f is not
reducible in Q[X]. In particular, f has no root in Q.
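The contradiction step of the example can be spot-checked mechanically:

```python
# A linear factor of X^3 + X + 1 in Z[X] would give a root among +-1
# (the only integer divisors of the constant term). Neither is a root.

f = lambda x: x**3 + x + 1
roots = [x for x in (1, -1) if f(x) == 0]
print(roots)   # []
```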
We see the advantage of using Gauss’ lemma — if we worked in Q instead, we could
have gotten to the step b0 c0 = 1, and then we can do nothing, since b0 and c0 can be many
things if we live in Q.
Now we start working towards proving this.
Lemma 10.5.2. Let R be a UFD. If f , g ∈ R[X] are primitive, then so is f g.
Proof. We let
f = a0 + a1 X + · · · + an X n ,
g = b0 + b1 X + · · · + bm X m ,
where an , bm ≠ 0, and f , g are primitive. We want to show that the content of f g is a unit.
Now suppose f g is not primitive. Then c( f g) is not a unit. Since R is a UFD, we can
find an irreducible p which divides c( f g).
By assumption, c( f ) and c(g) are units. So p ∤ c( f ) and p ∤ c(g). So suppose p | a0 ,
p | a1 , . . . , p | ak−1 but p ∤ ak . Note it is possible that k = 0. Similarly, suppose p | b0 , p |
b1 , · · · , p | bℓ−1 , p ∤ bℓ .
We look at the coefficient of X k+ℓ in f g. It is given by

∑_{i+j=k+ℓ} ai bj = ak+ℓ b0 + · · · + ak+1 bℓ−1 + ak bℓ + ak−1 bℓ+1 + · · · + a0 bℓ+k .

By assumption, this is divisible by p. So

p | ∑_{i+j=k+ℓ} ai bj .
However, the sum ak+ℓ b0 + · · · + ak+1 bℓ−1 is divisible by p, as p | bj for j < ℓ. Similarly, ak−1 bℓ+1 + · · · + a0 bℓ+k is divisible by p, as p | ai for i < k. So we must have p | ak bℓ . As p is irreducible, and hence prime, we must have p | ak or p | bℓ . This is a contradiction. So c( f g) must be a unit.
■
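We can spot-check the lemma for R = Z over a small family of primitive polynomials (evidence, not the proof):

```python
# Products of primitive polynomials with small positive coefficients are
# again primitive.

from math import gcd
from functools import reduce
from itertools import product

def content(coeffs):
    return reduce(gcd, coeffs)

def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

prims = [c for c in product(range(1, 5), repeat=3) if content(c) == 1]
print(all(content(poly_mul(f, g)) == 1 for f in prims for g in prims))   # True
```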
Corollary 10.5.1. Let R be a UFD. Then for f , g ∈ R[X], we have that c( f g) is an associate
of c( f )c(g).
Again, we cannot say they are equal, since content is only well-defined up to a unit.
Proof. We can write f = c( f ) f1 and g = c(g)g1 , with f1 and g1 primitive. Then
f g = c( f )c(g) f1 g1 .
Since f1 g1 is primitive, c( f )c(g) is a gcd of the coefficients of f g, and so is c( f g), by definition. So they are associates.
■
Finally, we can prove Gauss’ lemma.
Lemma 10.5.3 (Gauss’ lemma). Let R be a UFD, and f ∈ R[X] be a primitive polynomial.
Then f is reducible in R[X] if and only if f is reducible in F[X], where F is the field of
fractions of R.
Proof. We will show that a primitive f ∈ R[X] is reducible in R[X] if and only if f is
reducible in F[X].
One direction is almost immediately obvious. Let f = gh be a product in R[X] with g, h not units. As f is primitive, so are g and h, and a primitive constant is a unit. So both have degree > 0. So g, h are not units in F[X]. So f is reducible in F[X].
The other direction is less obvious. We let f = gh in F[X], with g, h not units. So g
and h have degree > 0, since F is a field. So we can clear denominators by finding a, b ∈ R
such that (ag), (bh) ∈ R[X] (e.g. let a be the product of denominators of coefficients of g).
Then we get
ab f = (ag)(bh),
and this is a factorization in R[X]. Here we have to be careful — (ag) is one thing that
lives in R[X], and is not necessarily a product in R[X], since g might not be in R[X]. So
we should just treat it as a single symbol.
We now write
(ag) = c(ag)g1 ,
(bh) = c(bh)h1 ,
where g1 , h1 are primitive. So we have
ab = c(ab f ) = c((ag)(bh)) = u · c(ag)c(bh),
where u ∈ R is a unit, by the previous corollary. But also we have
ab f = c(ag)c(bh)g1 h1 = u−1 abg1 h1 .
So cancelling ab gives
f = u−1 g1 h1 ∈ R[X].
So f is reducible in R[X].
■
If this looks fancy and magical, you can try to do this explicitly in the case where
R = Z and F = Q. Then you will probably get enlightened.
We will do another proof performed in a similar manner.
Proposition 10.5.1. Let R be a UFD, and F be its field of fractions. Let g ∈ R[X] be
primitive. We let
J = (g) ◁ R[X],
I = (g) ◁ F[X].
Then
J = I ∩ R[X].
In other words, if f ∈ R[X] and we can write it as f = gh, with h ∈ F[X], then in fact
h ∈ R[X].
Proof. The strategy is the same — we clear denominators in the equation f = gh, and
then use contents to get that down in R[X].
We certainly have J ⊆ I ∩ R[X]. Now let f ∈ I ∩ R[X]. So we can write
f = gh,
with h ∈ F[X]. So we can choose b ∈ R such that bh ∈ R[X]. Then we know
b f = g(bh) ∈ R[X].
We let
(bh) = c(bh)h1 ,
for h1 ∈ R[X] primitive. Thus
b f = c(bh)gh1 .
Since g is primitive, so is gh1 . So c(bh) = uc(b f ) for u a unit. But b f is really a product
in R[X]. So we have
c(b f ) = c(b)c( f ) = bc( f ).
So we have
b f = ubc( f )gh1 .
Cancelling b gives
f = g(uc( f )h1 ).
So g | f in R[X]. So f ∈ J.
■
From this we can get ourselves a large class of UFDs.
Theorem 10.5.1. If R is a UFD, then R[X] is a UFD.
In particular, if R is a UFD, then R[X1 , · · · , Xn ] is also a UFD.
Proof. We know R[X] has a notion of degree. So we will combine this with the fact that
R is a UFD.
Let f ∈ R[X]. We can write f = c( f ) f1 , with f1 primitive. Firstly, as R is a UFD, we
may factor
c( f ) = p1 p2 · · · pn ,
for pi ∈ R irreducible (and also irreducible in R[X]). Now we want to deal with f1 .
If f1 is not irreducible, then we can write
f1 = f2 f3 ,
with f2 , f3 both not units. Since f1 is primitive, f2 , f3 also cannot be constants. So we must
have deg f2 , deg f3 > 0. Also, since deg f2 +deg f3 = deg f1 , we must have deg f2 , deg f3 <
deg f1 . If f2 , f3 are irreducible, then done. Otherwise, keep on going. We will eventually
stop since the degrees have to keep on decreasing. So we can write it as
f1 = q1 · · · qm ,
with qi irreducible. So we can write
f = p1 p2 · · · pn q1 q2 · · · qm ,
a product of irreducibles.
For uniqueness, we first deal with the p’s. We note that
c( f ) = p1 p2 · · · pn
is a unique factorization of the content, up to reordering and associates, as R is a UFD. So
cancelling the content, we only have to show that primitives can be factored uniquely.
Suppose we have two factorizations
f1 = q1 q2 · · · qm = r1 r2 · · · rℓ .
Note that each qi and each ri is a factor of the primitive polynomial f1 , so are also primitive. Now we do (maybe) the unexpected thing. We let F be the field of fractions of R,
and consider qi , ri ∈ F[X]. Since F is a field, F[X] is a Euclidean domain, hence principal
ideal domain, hence unique factorization domain.
By Gauss’ lemma, since the qi and ri are irreducible in R[X], they are also irreducible
in F[X]. As F[X] is a UFD, we find that ℓ = m, and after reordering, ri and qi are associates,
say
ri = ui qi ,
with ui ∈ F[X] a unit. What we want to say is that ri is a unit times qi in R[X]. Firstly,
note that ui ∈ F as it is a unit. Clearing denominators, we can write
ai ri = bi qi ∈ R[X].
Taking contents, since ri , qi are primitives, we know ai and bi are associates, say
bi = vi ai ,
with vi ∈ R a unit. Cancelling ai on both sides, we know ri = vi qi as required.
■
The key idea is to use Gauss’ lemma to say the reducibility in R[X] is the same as
reducibility in F[X], as long as we are primitive. The first part about contents is just to
turn everything into primitives.
Note that the last part of the proof is just our previous proposition. We could have
applied it, but we decide to spell it out in full for clarity.
Example 10.5.2. We know Z[X] is a UFD, and if R is a UFD, then R[X1 , · · · , Xn ] is also a
UFD.
This is a useful thing to know. In particular, it gives us examples of UFDs that are not
PIDs. However, in such rings, we would also like to have an easy way to determine whether
something is irreducible. Fortunately, we have the following criterion:
Proposition 10.5.2 (Eisenstein’s criterion). Let R be a UFD, and let
f = a_0 + a_1 X + · · · + a_n X^n ∈ R[X]
be primitive with a_n ≠ 0. Let p ∈ R be irreducible (hence prime) such that
(i) p ∤ a_n;
(ii) p | a_i for all 0 ≤ i < n;
(iii) p^2 ∤ a_0.
Then f is irreducible in R[X], and hence in F[X] (where F is the field of fractions of R).
It is important that we work in R[X] all the time, until the end where we apply Gauss’
lemma. Otherwise, we cannot possibly apply Eisenstein’s criterion since there are no
primes in F.
Proof. Suppose we have a factorization f = gh with
g = r_0 + r_1 X + · · · + r_k X^k,
h = s_0 + s_1 X + · · · + s_ℓ X^ℓ,
with r_k, s_ℓ ≠ 0.
We know r_k s_ℓ = a_n. Since p ∤ a_n, we get p ∤ r_k and p ∤ s_ℓ. We can also look at the bottom
coefficients. We know r_0 s_0 = a_0. We know p | a_0 and p^2 ∤ a_0. So p divides exactly one of
r_0 and s_0. Without loss of generality, say p | r_0 and p ∤ s_0.
Now let j be minimal such that p ∤ r_j; that is,
p | r_0, p | r_1, · · · , p | r_{j−1}, p ∤ r_j.
We now look at a_j. This is, by definition,
a_j = r_0 s_j + r_1 s_{j−1} + · · · + r_{j−1} s_1 + r_j s_0.
We know r_0, · · · , r_{j−1} are all divisible by p. So
p | r_0 s_j + r_1 s_{j−1} + · · · + r_{j−1} s_1.
Also, since p ∤ r_j and p ∤ s_0, we know p ∤ r_j s_0, using the fact that p is prime. So p ∤ a_j.
Since p | a_i for all i < n, we must have j = n.
We also know that j ≤ k ≤ n. So we must have j = k = n. So deg g = n. Hence
ℓ = n − k = 0. So h is a constant. But we also know f is primitive. So h must be a unit.
So this is not a proper factorization.
■
Example 10.5.3. Consider the polynomial X^n − p ∈ Z[X] for p a prime. Apply Eisenstein's criterion with the prime p, and observe that all the conditions hold. The polynomial is certainly primitive, since it is monic. So X^n − p is irreducible in Z[X], hence in Q[X]. In particular, X^n − p
has no rational roots, i.e. the n-th root of p is irrational (for n > 1).
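The three conditions of Eisenstein's criterion are entirely mechanical to check. A small Python sketch (the function name and coefficient-list representation are my own choices), applied to X^3 − 2 at p = 2:

```python
def eisenstein_applies(coeffs, p):
    """Check Eisenstein's criterion at the prime p for the polynomial
    f = coeffs[0] + coeffs[1] X + ... (f assumed primitive, p assumed prime)."""
    a0, an = coeffs[0], coeffs[-1]
    return (an % p != 0                               # p does not divide a_n
            and all(a % p == 0 for a in coeffs[:-1])  # p divides a_0, ..., a_{n-1}
            and a0 % (p * p) != 0)                    # p^2 does not divide a_0

# X^3 - 2 at p = 2: irreducible over Z (and Q), so the cube root of 2 is irrational.
print(eisenstein_applies([-2, 0, 0, 1], 2))  # True
```

Of course, the criterion failing at every prime does not mean the polynomial is reducible; it is a sufficient condition only.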
Example 10.5.4. Consider the polynomial
f = X^{p−1} + X^{p−2} + · · · + X^2 + X + 1 ∈ Z[X],
where p is a prime number. If we look at this, we notice Eisenstein's criterion does not
apply. What should we do? We observe that
f = (X^p − 1)/(X − 1).
So it might be a good idea to let Y = X − 1. Then we get a new polynomial
f̂(Y) = ((Y + 1)^p − 1)/Y = Y^{p−1} + \binom{p}{1} Y^{p−2} + \binom{p}{2} Y^{p−3} + · · · + \binom{p}{p−1}.
When we look at it hard enough, we notice Eisenstein's criterion can be applied — we
know p | \binom{p}{i} for 1 ≤ i ≤ p − 1, but p^2 ∤ \binom{p}{p−1} = p. So f̂ is irreducible in Z[Y].
Now if we had a factorization
f (X) = g(X)h(X) ∈ Z[X],
then we get
fˆ(Y ) = g(Y + 1)h(Y + 1)
in Z[Y]. Since f̂ is irreducible, one of the factors g(Y + 1), h(Y + 1) must be a unit, hence
so is one of g, h. So f is irreducible.
Hence none of the roots of f are rational (but we already know that — they are not
even real!).
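The substitution trick above can also be checked numerically: the coefficients of the shifted polynomial f̂ are binomial coefficients, and Eisenstein's conditions at p are visible directly. A hedged sketch (helper name my own):

```python
from math import comb

def cyclotomic_shift_coeffs(p):
    """Coefficients of f̂(Y) = ((Y + 1)^p - 1)/Y = sum_{i=1}^{p} C(p, i) Y^{i-1},
    listed from lowest degree up."""
    return [comb(p, i) for i in range(1, p + 1)]

coeffs = cyclotomic_shift_coeffs(7)
# Leading coefficient 1 is not divisible by 7; every other coefficient is
# divisible by 7; the constant term 7 is not divisible by 49: Eisenstein applies.
print(coeffs)  # [7, 21, 35, 35, 21, 7, 1]
```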
10.6 Gaussian integers
We’ve mentioned the Gaussian integers already.
Definition 10.6.1 (Gaussian integers). The Gaussian integers is the subring
Z[i] = {a + bi : a, b ∈ Z} ≤ C.
We have already shown that the norm N(a + ib) = a^2 + b^2 is a Euclidean function
for Z[i]. So Z[i] is a Euclidean domain, hence principal ideal domain, hence a unique
factorization domain.
Since the units must have norm 1, they are precisely ±1, ±i. What does factorization
in Z[i] look like? What are the primes? We know we are going to get new primes, i.e.
primes that are not integers, while we will lose some other primes. For example, we have
2 = (1 + i)(1 − i).
So 2 is not irreducible, hence not prime. However, 3 is a prime. We have N(3) = 9. So
if 3 = uv, with u, v not units, then 9 = N(u)N(v), and neither N(u) nor N(v) is 1. So
N(u) = N(v) = 3. However, 3 = a^2 + b^2 has no solutions with a, b ∈ Z. So there is nothing
of norm 3. So 3 is irreducible, hence a prime.
Also, 5 is not prime, since
5 = (1 + 2i)(1 − 2i).
How can we understand which primes stay as primes in the Gaussian integers?
Proposition 10.6.1. A prime number p ∈ Z is prime in Z[i] if and only if p ≠ a^2 + b^2 for
a, b ∈ Z \ {0}.
The proof is exactly what we have done so far.
Proof. If p = a^2 + b^2, then p = (a + ib)(a − ib). So p is not irreducible.
Now suppose p = uv, with u, v not units. Taking norms, we get p^2 = N(u)N(v). So if
u and v are not units, then N(u) = N(v) = p. Writing u = a + ib, this says a^2 + b^2 = p.
■
So what we have to do is to understand when a prime p can be written as a sum of two
squares. We will need the following helpful lemma:
Lemma 10.6.1. Let p be a prime number. Let F_p = Z/pZ be the field with p elements.
Let F_p^× = F_p \ {0} be the group of invertible elements under multiplication. Then F_p^× ≅ C_{p−1}.
Proof. Certainly F_p^× has order p − 1, and is abelian. We know from the classification of
finite abelian groups that if F_p^× is not cyclic, then it must contain a subgroup C_m × C_m for
m > 1 (we can write it as C_d × C_{d′} × · · · with d′ | d; so C_d has a subgroup isomorphic
to C_{d′}).
We consider the polynomial X^m − 1 ∈ F_p[X], and F_p[X] is a UFD. At best, X^m − 1 factors into
m linear factors. So it has at most m distinct roots. But if C_m × C_m ≤ F_p^×, then we
can find m^2 elements of order dividing m. So there are m^2 elements of F_p which are roots
of X^m − 1. This is a contradiction. So F_p^× is cyclic. ■
This is a funny proof, since we have not found any element that has order p − 1.
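Indeed the proof exhibits no generator, but for small p we can simply search for one. A quick Python sketch (the helper name is my own, not from the notes):

```python
def is_generator(g, p):
    """Check whether g generates the multiplicative group (Z/pZ)^x, p prime."""
    seen, x = set(), 1
    for _ in range(p - 1):
        x = x * g % p
        seen.add(x)
    return len(seen) == p - 1  # g has order p - 1 iff its powers hit everything

# The lemma guarantees a generator exists; brute force finds them for p = 7.
print([g for g in range(1, 7) if is_generator(g, 7)])  # [3, 5]
```

No formula is known that produces a generator directly; in practice one searches, exactly as here.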
Proposition 10.6.2. The primes in Z[i] are, up to associates,
(i) Prime numbers p ∈ Z ≤ Z[i] such that p ≡ 3 (mod 4).
(ii) Gaussian integers z ∈ Z[i] with N(z) = zz̄ = p for some prime p such that p = 2 or
p ≡ 1 (mod 4).
Proof. We first show these are primes. If p ≡ 3 (mod 4), then p ≠ a^2 + b^2, since a square
number mod 4 is always 0 or 1. So these are primes in Z[i].
On the other hand, if N(z) = p and z = uv, then N(u)N(v) = p. So N(u) is 1 or N(v)
is 1. So u or v is a unit. Note that we did not use the condition that p ≢ 3 (mod 4). This
is not needed, since N(z) is always a sum of squares, and hence N(z) cannot be a prime
that is 3 mod 4.
Now let z ∈ Z[i] be irreducible, hence prime. Then z̄ is also irreducible. So N(z) = zz̄ is
a factorization of N(z) into irreducibles. Let p ∈ Z be an ordinary prime number dividing
N(z), which exists since N(z) ≠ 1.
Now if p ≡ 3 (mod 4), then p itself is prime in Z[i] by the first part of the proof. So
p | N(z) = zz̄. So p | z or p | z̄. Note that if p | z̄, then p | z by taking complex conjugates.
So we get p | z. Since both p and z are irreducible, they must be equal up to associates.
Otherwise, we get p = 2 or p ≡ 1 (mod 4). If p ≡ 1 (mod 4), then p − 1 = 4k for
some k ∈ Z. As F_p^× ≅ C_{p−1} = C_{4k}, there is a unique element of order 2 (this is true for
any cyclic group of order 4k — think of Z/4kZ). This must be [−1] ∈ F_p. Now let a ∈ F_p^×
be an element of order 4. Then a^2 has order 2. So [a^2] = [−1].
This is a complicated way of saying we can find an a such that p | a^2 + 1. Thus p | (a +
i)(a − i). In the case where p = 2, we know by checking directly that 2 = (1 + i)(1 − i).
In either case, we deduce that p (or 2) is not prime (hence not irreducible), since it clearly
does not divide a ± i (or 1 ± i). So we can write p = z1 z2 , for z1 , z2 ∈ Z[i] not units. Now
we get
p^2 = N(p) = N(z1)N(z2).
As the zi are not units, we know N(z1 ) = N(z2 ) = p. By definition, this means p = z1 z̄1 =
z2 z̄2 . But also p = z1 z2 . So we must have z̄1 = z2 .
Finally, we have p = z1 z̄1 | N(z) = zz̄. All these z, zi are irreducible. So z must be an
associate of z1 (or maybe z̄1 ). So in particular N(z) = p.
■
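This proof is effectively an algorithm: for p ≡ 1 (mod 4), find a with a^2 ≡ −1 (mod p), then extract a factor of norm p as gcd(p, a + i) in Z[i]. Here is a Python sketch under those assumptions (all names are mine; Gaussian integers are represented as pairs of integers):

```python
def gaussian_gcd(a, b):
    """Euclidean algorithm in Z[i]; a and b are (re, im) pairs."""
    while b != (0, 0):
        (ar, ai), (br, bi) = a, b
        n = br * br + bi * bi                     # N(b)
        # a / b = a * conj(b) / N(b); round to the nearest Gaussian integer q
        qr = round((ar * br + ai * bi) / n)
        qi = round((ai * br - ar * bi) / n)
        # remainder r = a - q*b has norm strictly smaller than N(b)
        a, b = b, (ar - (qr * br - qi * bi), ai - (qr * bi + qi * br))
    return a

def split_prime(p):
    """For p ≡ 1 (mod 4), return a Gaussian integer of norm p (up to units)."""
    for x in range(2, p):
        a = pow(x, (p - 1) // 4, p)
        if (a * a) % p == p - 1:                  # a^2 ≡ -1 (mod p)
            return gaussian_gcd((p, 0), (a, 1))   # gcd(p, a + i)

z = split_prime(13)
print(z, z[0] ** 2 + z[1] ** 2)  # a factor of norm 13, e.g. ±(3 + 2i) up to units
```

The candidate a = x^{(p−1)/4} works for roughly half of all x, because a generator g gives the element g^{(p−1)/4} of order 4, whose square is the unique element [−1] of order 2 — exactly the argument in the proof.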
Corollary 10.6.1. An integer n ∈ Z_{≥0} may be written as x^2 + y^2 (as the sum of two
squares) if and only if, when we write n = p_1^{n_1} p_2^{n_2} · · · p_k^{n_k} as a product of powers of distinct primes,
p_i ≡ 3 (mod 4) implies n_i is even.
We have proved this in the case when n is a prime.
Proof. If n = x2 + y2 , then we have
n = (x + iy)(x − iy) = N(x + iy).
Let z = x + iy. So we can write z = α1 · · · αq as a product of irreducibles in Z[i]. By the
proposition, each αi is either αi = p (a genuine prime number with p ≡ 3 (mod 4)), or
N(α_i) = p is a prime number which is either 2 or ≡ 1 (mod 4). We now take the norm to
obtain
n = x^2 + y^2 = N(z) = N(α_1)N(α_2) · · · N(α_q).
Now each N(α_i) is either p^2 with p ≡ 3 (mod 4), or is just p for p = 2 or p ≡ 1 (mod 4).
So if p^m is the largest power of p dividing n, we find that m must be even whenever p ≡ 3 (mod 4).
Conversely, let n = p_1^{n_1} p_2^{n_2} · · · p_k^{n_k} be a product of powers of distinct primes. Now for each p_i,
either p_i ≡ 3 (mod 4) and n_i is even, in which case
p_i^{n_i} = (p_i^2)^{n_i/2} = N(p_i^{n_i/2});
or p_i = 2 or p_i ≡ 1 (mod 4), in which case the above proof shows that p_i = N(α_i) for
some α_i. So p_i^{n_i} = N(α_i^{n_i}).
Since the norm is multiplicative, we can write n as the norm of some z ∈ Z[i]. So
n = N(z) = N(x + iy) = x^2 + y^2,
as required.
■
Example 10.6.1. We know 65 = 5 × 13. Since 5, 13 ≡ 1 (mod 4), it is a sum of squares.
Moreover, the proof tells us how to find 65 as the sum of squares. We have to factor 5 and
13 in Z[i]. We have
5 = (2 + i)(2 − i)
13 = (2 + 3i)(2 − 3i).
So we know
65 = N(2 + i)N(2 + 3i) = N((2 + i)(2 + 3i)) = N(1 + 8i) = 1^2 + 8^2.
But there is a choice here. We had to pick which factor is α and which is ᾱ . So we can
also write
65 = N((2 + i)(2 − 3i)) = N(7 − 4i) = 7^2 + 4^2.
So not only are we able to write 65 as a sum of two squares, but this also gives us distinct
ways of writing it as a sum of squares.
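We can confirm these representations by brute force. A minimal Python sketch (the helper is my own) enumerating all ways to write n as a sum of two squares:

```python
from math import isqrt

def two_square_reps(n):
    """All pairs (x, y) with 0 <= x <= y and x^2 + y^2 = n."""
    reps = []
    x = 0
    while 2 * x * x <= n:          # x <= y forces x^2 <= n/2
        y = isqrt(n - x * x)       # exact integer square root
        if x * x + y * y == n:
            reps.append((x, y))
        x += 1
    return reps

print(two_square_reps(65))  # [(1, 8), (4, 7)]
```

For 3, a prime that is 3 mod 4, the function returns the empty list, as the corollary predicts.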
10.7 Algebraic integers
We generalize the idea of Gaussian integers to algebraic integers.
Definition 10.7.1 (Algebraic integer). An α ∈ C is called an algebraic integer if it is a
root of a monic polynomial in Z[X], i.e. there is a monic f ∈ Z[X] such that f (α ) = 0.
We can immediately check that this is a sensible definition — not all complex numbers
are algebraic integers, since there are only countably many polynomials with integer coefficients, hence only countably many algebraic integers, but there are uncountably many
complex numbers.
Notation 10.3. For α an algebraic integer, we write Z[α ] ≤ C for the smallest subring
containing α .
This can also be defined for arbitrary complex numbers, but it is less interesting.
We can also construct Z[α] by taking it as the image of the map ϕ : Z[X] → C given
by g ↦ g(α). So we can also write
Z[α] ≅ Z[X]/I,   where I = ker ϕ.
Note that I ≠ 0, since, say, f ∈ I, by the definition of an algebraic integer.
Proposition 10.7.1. Let α ∈ C be an algebraic integer. Then the ideal
I = ker(ϕ : Z[X] → C, f 7→ f (α ))
is principal, and equal to ( fα ) for some irreducible monic fα .
This is a non-trivial theorem, since Z[X] is not a principal ideal domain. So there is
no immediate guarantee that I is generated by one polynomial.
Definition 10.7.2 (Minimal polynomial). Let α ∈ C be an algebraic integer. The
minimal polynomial of α is the irreducible monic polynomial fα such that I = ker(ϕ) =
( fα ).
Proof. By definition, there is a monic f ∈ Z[X] such that f (α ) = 0. So f ∈ I. So I ≠ 0.
Now let fα ∈ I be such a polynomial of minimal degree. We may suppose that fα is
primitive. We want to show that I = ( fα ), and that fα is irreducible.
Let h ∈ I. We pretend we are living in Q[X]. Then we have the Euclidean algorithm.
So we can write
h = fα q + r,
with r = 0 or deg r < deg fα . This was done over Q[X], not Z[X]. We now clear denominators. We multiply by some a ∈ Z to get
ah = fα (aq) + (ar),
where now (aq), (ar) ∈ Z[X]. We now evaluate these polynomials at α . Then we have
ah(α ) = fα (α )aq(α ) + ar(α ).
We know fα (α ) = h(α ) = 0, since fα and h are both in I. So ar(α ) = 0. So (ar) ∈ I.
As fα ∈ I has minimal degree, we cannot have deg(r) = deg(ar) < deg( fα ). So we must
have r = 0.
Hence we know
ah = fα · (aq)
is a factorization in Z[X]. This is almost right, but we want to factor h, not ah. Again,
taking contents of everything, we get
ac(h) = c(ah) = c( fα (aq)) = c(aq),
as fα is primitive. In particular, a | c(aq). This, by definition of content, means (aq) can
be written as aq̄, where q̄ ∈ Z[X]. Cancelling, we get q = q̄ ∈ Z[X]. So we know
h = fα q ∈ ( fα ).
So we know I = ( fα ).
To show fα is irreducible, note that
Z[X]/( fα ) = Z[X]/ ker ϕ ≅ im(ϕ ) = Z[α ] ≤ C.
Since C is an integral domain, so is im(ϕ ). So we know Z[X]/( fα ) is an integral domain.
So ( fα ) is prime. So fα is prime, hence irreducible.
If this final line looks magical, we can unravel this proof as follows: suppose fα =
pq for some non-units p, q. Then since fα (α ) = 0, we know p(α )q(α ) = 0. Since
p(α ), q(α ) ∈ C, which is an integral domain, we must have, say, p(α ) = 0. But then
deg p < deg fα , so p ∉ I = ( fα ). Contradiction. ■
Example 10.7.1.
(i) We know α = i is an algebraic integer with fα = X^2 + 1.
(ii) Also, α = √2 is an algebraic integer with fα = X^2 − 2.
(iii) More interestingly, α = (1 + √−3)/2 is an algebraic integer with fα = X^2 − X + 1.
(iv) The polynomial X^5 − X + d ∈ Z[X] with d ∈ Z_{≥1} has precisely one real root α,
which is an algebraic integer. It is a theorem, which will be proved in IID Galois
Theory, that this α cannot be constructed from integers via +, −, ×, ÷ and n-th roots. It is also
a theorem, found in IID Galois Theory, that degree 5 is the smallest
degree for which this can happen (the proof involves writing down formulas analogous to the quadratic formula for degree 3 and 4 polynomials).
Lemma 10.7.1. Let α ∈ Q be an algebraic integer. Then α ∈ Z.
Proof. Let fα ∈ Z[X] be the minimal polynomial, which is irreducible. In Q[X], the
polynomial X − α must divide fα . However, by Gauss' lemma, we know fα is irreducible
in Q[X]. So we must have fα = X − α ∈ Z[X]. So α is an integer. ■
It turns out the collection of all algebraic integers forms a subring of C. This is not at
all obvious — given monic f , g ∈ Z[X] such that f (α ) = g(β ) = 0, there is no easy way
to find a monic h such that h(α + β ) = 0. We will prove this much later on in the
course.
10.8 Noetherian rings
We now revisit the idea of Noetherian rings, something we have briefly mentioned when
proving that PIDs are UFDs.
Definition 10.8.1 (Noetherian ring). A ring is Noetherian if for any chain of ideals
I1 ⊆ I2 ⊆ I3 ⊆ · · · ,
there is some N such that IN = IN+1 = IN+2 = · · · .
This condition is known as the ascending chain condition.
Example 10.8.1. Every finite ring is Noetherian. This is since there are only finitely many
possible ideals.
Example 10.8.2. Every field is Noetherian. This is since there are only two possible
ideals.
Example 10.8.3. Every principal ideal domain (e.g. Z) is Noetherian. This is easy to
check directly, but the next proposition will make this utterly trivial.
Most rings we love and know are indeed Noetherian. However, we can explicitly
construct some non-Noetherian rings.
Example 10.8.4. The ring Z[X1 , X2 , X3 , · · · ] is not Noetherian. This has the chain of
strictly increasing ideals
(X1) ⊊ (X1, X2) ⊊ (X1, X2, X3) ⊊ · · · .
We have the following proposition that makes Noetherian rings much more concrete,
and makes it obvious why PIDs are Noetherian.
Definition 10.8.2 (Finitely generated ideal). An ideal I is finitely generated if it can be
written as I = (r1 , · · · , rn ) for some r1 , · · · , rn ∈ R.
Proposition 10.8.1. A ring is Noetherian if and only if every ideal is finitely generated.
Every PID trivially satisfies this condition. So we know every PID is Noetherian.
Proof. We start with the easier direction — from concrete to abstract.
Suppose every ideal of R is finitely generated. Given the chain I1 ⊆ I2 ⊆ · · · , consider
the ideal
I = I1 ∪ I2 ∪ I3 ∪ · · · .
This is an ideal, since the Ii form a nested chain (you will check this manually in example sheet 2).
We know I is finitely generated, say I = (r1 , · · · , rn ), with ri ∈ Iki . Let
K = max{k1, · · · , kn}.
Then r1 , · · · , rn ∈ IK . So IK = I. So IK = IK+1 = IK+2 = · · · .
To prove the other direction, suppose there is an ideal I ◁ R that is not finitely generated. We pick r1 ∈ I. Since I is not finitely generated, we know (r1) ≠ I. So we can find
some r2 ∈ I \ (r1 ).
Again (r1, r2) ≠ I. So we can find r3 ∈ I \ (r1, r2). Continuing in this way, we find
an infinite strictly ascending chain
(r1) ⊊ (r1, r2) ⊊ (r1, r2, r3) ⊊ · · · .
So R is not Noetherian.
■
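In Z, the proposition is quite concrete: an ideal generated by finitely many integers is generated by their gcd, and an ascending chain of such ideals stabilizes exactly when the gcd stops shrinking. A toy Python sketch (the numbers and helper name are my own):

```python
from math import gcd

def ideal_gen(elements):
    """In Z, the ideal (a_1, ..., a_m) equals (g) where g is the gcd."""
    g = 0
    for a in elements:
        g = gcd(g, a)  # gcd(0, a) = |a|, so the empty ideal gives (0)
    return g

# The chain (360) ⊆ (360, 100) ⊆ (360, 100, 48) ⊆ ... stabilizes once
# the gcd reaches 1, i.e. once the ideal is all of Z.
chain = [ideal_gen([360, 100, 48, 18, 7][:k]) for k in range(1, 6)]
print(chain)  # [360, 20, 4, 2, 1]
```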
When we have developed some properties or notions, a natural thing to ask is whether
it passes on to subrings and quotients.
If R is Noetherian, does every subring of R have to be Noetherian? The answer is no.
For example, since Z[X1 , X2 , · · · ] is an integral domain, we can take its field of fractions,
which is a field, hence Noetherian, but Z[X1 , X2 , · · · ] is a subring of its field of fractions.
How about quotients?
Proposition 10.8.2. Let R be a Noetherian ring and I be an ideal of R. Then R/I is
Noetherian.
Proof. Whenever we see quotients, we should think of them as the image of a homomorphism. Consider the quotient map
π : R → R/I
x 7→ x + I.
We can prove this result using either finite generation or the ascending chain condition. We go for the
former. Let J ◁ R/I be an ideal. We want to show that J is finitely generated. Consider
the inverse image π −1 (J). This is an ideal of R, and is hence finitely generated, since R
is Noetherian. So π −1 (J) = (r1 , · · · , rn ) for some r1 , · · · , rn ∈ R. Then J is generated by
π (r1 ), · · · , π (rn ). So done.
■
This gives us many examples of Noetherian rings. But there is one important case
we have not tackled yet — polynomial rings. We know Z[X] is not a PID, since (2, X) is
not principal. However, this is finitely generated. So we are not dead. We might try to
construct some non-finitely generated ideal, but we are bound to fail. This is since Z[X]
is a Noetherian ring. This is a special case of the following powerful theorem:
Theorem 10.8.1 (Hilbert basis theorem). Let R be a Noetherian ring. Then so is R[X].
Since Z is Noetherian, we know Z[X] also is. Hence so is Z[X,Y ] etc.
The Hilbert basis theorem was, surprisingly, proven by Hilbert himself. Before that,
there were many mathematicians studying something known as invariant theory. The idea
is that we have some interesting objects, and we want to look at their symmetries. Often,
there are infinitely many possible such symmetries, and one interesting question to ask is
whether there is a finite set of symmetries that generate all possible symmetries.
This sounds like an interesting problem, so people devoted much time, writing down
funny proofs, showing that the symmetries are finitely generated. However, the collection
of such symmetries are often just ideals of some funny ring. So Hilbert came along
and proved the Hilbert basis theorem, and showed once and for all that those rings are
Noetherian, and hence the symmetries are finitely generated.
Proof. The proof is not too hard, but we will need to use both the ascending chain condition and the fact that all ideals are finitely-generated.
Let I ◁ R[X] be an ideal. We want to show it is finitely generated. Since we know R is
Noetherian, we want to generate some ideals of R from I.
How can we do this? We can do the silly thing of taking all constants of I, i.e. I ∩ R.
But we can do better. We can consider all linear polynomials, and take their leading
coefficients. Thinking for a while, this is indeed an ideal.
In general, for n = 0, 1, 2, · · · , we let
I_n = {r ∈ R : there is some f ∈ I such that f = rX^n + · · · } ∪ {0}.
Then it is easy to see, using the strong closure property, that each ideal In is an ideal of R.
Moreover, they form a chain, since if f ∈ I, then X f ∈ I, by strong closure. So In ⊆ In+1
for all n.
By the ascending chain condition of R, we know there is some N such that IN = IN+1 =
· · · . Now for each 0 ≤ n ≤ N, since R is Noetherian, we can write
I_n = (r_1^{(n)}, r_2^{(n)}, · · · , r_{k(n)}^{(n)}).
Now for each r_i^{(n)}, we choose some f_i^{(n)} ∈ I with f_i^{(n)} = r_i^{(n)} X^n + · · · .
We now claim the polynomials f_i^{(n)}, for 0 ≤ n ≤ N and 1 ≤ i ≤ k(n), generate I.
Suppose not. We pick g ∈ I of minimal degree not generated by the f_i^{(n)}.
There are two possible cases. If deg g = n ≤ N, suppose
g = rX^n + · · · .
We know r ∈ I_n. So we can write
r = ∑_i λ_i r_i^{(n)}
for some λ_i ∈ R, since that's what generating an ideal means. Then we know
∑_i λ_i f_i^{(n)} = rX^n + · · · ∈ I.
But if g is not in the span of the f_i^{(j)}, then neither is g − ∑_i λ_i f_i^{(n)}. But this has a lower
degree than g. This is a contradiction.
Now suppose deg g = n > N. This might look scary, but it is not, since I_n = I_N. So we
run the same proof. We write
g = rX^n + · · · .
But we know r ∈ I_n = I_N. So we know
r = ∑_i λ_i r_i^{(N)}.
Then we know
X^{n−N} ∑_i λ_i f_i^{(N)} = rX^n + · · · ∈ I.
Hence g − X^{n−N} ∑_i λ_i f_i^{(N)} has smaller degree than g, but is not in the span of the f_i^{(j)}.
This is again a contradiction. ■
As an aside, let E ⊆ F[X1 , X2 , · · · , Xn ] be any set of polynomials. We view this as a set
of equations f = 0 for each f ∈ E . The claim is that to solve the potentially infinite set of
equations E , we actually only have to solve finitely many equations.
Consider the ideal (E ) ◁ F[X1 , · · · , Xn ]. By the Hilbert basis theorem, there is a finite
list f1 , · · · , fk such that
( f1 , · · · , fk ) = (E ).
We want to show that we only have to solve fi (x) = 0 for these fi . Given (α1 , · · · , αn ) ∈ F n ,
consider the homomorphism
ϕα : F[X1 , · · · , Xn ] → F
Xi 7→ αi .
Then we know (α1, · · · , αn) ∈ F^n is a solution to the equations E if and only if (E) ⊆
ker(ϕα). By our choice of the fi, this is true if and only if ( f1, · · · , fk) ⊆ ker(ϕα). By inspection, this is true if and only if (α1, · · · , αn) is a solution to all of f1, · · · , fk. So solving E
is the same as solving f1, · · · , fk. This is useful in, say, algebraic geometry.
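For a toy instance of this aside, here is an infinite family with an obvious finite generating set (the example is mine, not from the notes):

```python
# E = { X^k (X^2 - 1) : k >= 0 } in Q[X]. Since X^2 - 1 itself lies in E,
# the ideal (E) equals (X^2 - 1), so solving all of E reduces to solving
# the single equation X^2 - 1 = 0.
def family_member(k):
    """The polynomial X^k (X^2 - 1), as a Python function."""
    return lambda x: x ** k * (x ** 2 - 1)

solutions = [x for x in range(-5, 6) if x ** 2 - 1 == 0]  # solve the generator
# Every solution of the generator solves all (sampled) members of E:
print(solutions, all(family_member(k)(x) == 0 for k in range(20) for x in solutions))
```

For families without such an obvious generator, computer algebra systems compute a finite Gröbner basis of (E), which is the practical face of the Hilbert basis theorem.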
11. Modules
Finally, we are going to look at modules. Recall that to define a vector space, we first
pick some base field F. We then defined a vector space to be an abelian group V with an
action of F on V (i.e. scalar multiplication) that is compatible with the multiplicative and
additive structure of F.
In the definition, we did not at all mention division in F. So in fact we can make
the same definition, but allow F to be a ring instead of a field. We call these modules.
Unfortunately, most results we prove about vector spaces do use the fact that F is a field.
So many linear algebra results do not apply to modules, and modules have much richer
structures.
11.1 Definitions and examples
Definition 11.1.1 (Module). Let R be a commutative ring. We say a quadruple (M, +, 0M , · )
is an R-module if
(i) (M, +, 0M ) is an abelian group
(ii) The operation · : R × M → M satisfies
(a) (r1 + r2 ) · m = (r1 · m) + (r2 · m);
(b) r · (m1 + m2 ) = (r · m1 ) + (r · m2 );
(c) r1 · (r2 · m) = (r1 · r2 ) · m; and
(d) 1R · m = m.
Note that there are two different additions going on — addition in the ring and addition in the module, and similarly two notions of multiplication. However, it is easy to
distinguish them since they operate on different things. If needed, we can make them
explicit by writing, say, +R and +M .
We can imagine modules as rings acting on abelian groups, just as groups can act on
sets. Hence we might say “R acts on M” to mean M is an R-module.
Example 11.1.1. Let F be a field. An F-module is precisely the same as a vector space
over F (the axioms are the same).
Example 11.1.2. For any ring R, we have the R-module Rn = R × R × · · · × R via
r · (r1 , · · · , rn ) = (rr1 , · · · , rrn ),
using the ring multiplication. This is the same as the definition of the vector space Fn for
fields F.
Example 11.1.3. Let I ◁ R be an ideal. Then it is an R-module via
r ·I a = r ·R a,
r1 +I r2 = r1 +R r2 .
Also, R/I is an R-module via
r ·R/I (a + I) = (r ·R a) + I.
Example 11.1.4. A Z-module is precisely the same as an abelian group. For A an abelian
group, we have
Z × A → A
(n, a) ↦ a + · · · + a (n times),
where for n < 0 we adopt the notation
a + · · · + a (n times) = (−a) + · · · + (−a) (−n times),
and adding something to itself 0 times is just 0.
This definition is essentially forced upon us, since by the axioms of a module, we
must have (1, a) 7→ a. Then we must send, say, (2, a) = (1 + 1, a) 7→ a + a.
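This forced Z-action is easy to phrase as code. A sketch for an abelian group handed to us via its operations (the interface is my own invention):

```python
def z_action(n, a, add, neg, zero):
    """The unique Z-module structure on an abelian group: n . a is a added
    to itself n times, with negation handling n < 0 and 0 . a = 0."""
    if n < 0:
        return z_action(-n, neg(a), add, neg, zero)
    result = zero
    for _ in range(n):
        result = add(result, a)
    return result

# On the abelian group Z/10: 7 . 4 = 28 mod 10 = 8, and (-3) . 4 = -12 mod 10 = 8.
add10, neg10 = lambda x, y: (x + y) % 10, lambda x: (-x) % 10
print(z_action(7, 4, add10, neg10, 0), z_action(-3, 4, add10, neg10, 0))  # 8 8
```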
Example 11.1.5. Let F be a field and V a vector space over F, and α : V → V be a linear
map. Then V is an F[X]-module via
F[X] ×V → V
( f , v) 7→ f (α )(v).
This is a module.
Note that we cannot just say that V is an F[X]-module. We have to specify the α as
well. Picking a different α will give a different F[X]-module structure.
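To make the dependence on α concrete, here is a small sketch (the representation is mine) of the action ( f , v) ↦ f (α )(v) on V = R^2, with α given as a function:

```python
def poly_apply(coeffs, alpha, v):
    """The F[X]-module action (f, v) -> f(alpha)(v). coeffs lists f from
    lowest degree; alpha is a linear map on 2-component vectors (lists)."""
    result = [0, 0]
    power = list(v)  # holds alpha^k(v), starting at k = 0
    for c in coeffs:
        result = [result[i] + c * power[i] for i in range(2)]
        power = alpha(power)
    return result

# alpha(x, y) = (y, 0) is nilpotent with alpha^2 = 0, so X^2 acts as zero:
shift = lambda w: [w[1], 0]
print(poly_apply([0, 0, 1], shift, [3, 4]))  # [0, 0]
```

Swapping in a different α changes how every polynomial acts, which is exactly why the module structure depends on the choice of α.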
Example 11.1.6. Let ϕ : R → S be a homomorphism of rings. Then any S-module M may
be considered as an R-module via
R×M → M
(r, m) 7→ ϕ (r) ·M m.
Definition 11.1.2 (Submodule). Let M be an R-module. A subset N ⊆ M is an R-submodule
if it is a subgroup of (M, +, 0M ), and if n ∈ N and r ∈ R, then rn ∈ N. We write N ≤ M.
Example 11.1.7. We know R itself is an R-module. Then a subset of R is a submodule if
and only if it is an ideal.
Example 11.1.8. A subset of an F-module V , where F is a field, is an F-submodule if and
only if it is a vector subspace of V .
Definition 11.1.3 (Quotient module). Let N ≤ M be an R-submodule. The quotient module M/N is the set of N-cosets in (M, +, 0M ), with the R-action given by
r · (m + N) = (r · m) + N.
It is easy to check this is well-defined and is indeed a module.
Note that modules are different from rings and groups. In groups, we had subgroups,
and we have some really nice ones called normal subgroups. We are only allowed to
quotient by normal subgroups. In rings, we have subrings and ideals, which are unrelated
objects, and we only quotient by ideals. In modules, we only have submodules, and we
can quotient by arbitrary submodules.
Definition 11.1.4 (R-module homomorphism and isomorphism). A function f : M → N
between R-modules is an R-module homomorphism if it is a homomorphism of abelian
groups, and satisfies
f (r · m) = r · f (m)
for all r ∈ R and m ∈ M.
An isomorphism is a bijective homomorphism, and two R-modules are isomorphic if
there is an isomorphism between them.
Note that on the left, the multiplication is the action in M, while on the right, it is the
action in N.
Example 11.1.9. If F is a field and V,W are F-modules (i.e. vector spaces over F), then
an F-module homomorphism is precisely an F-linear map.
Theorem 11.1.1 (First isomorphism theorem). Let f : M → N be an R-module homomorphism. Then
ker f = {m ∈ M : f (m) = 0} ≤ M
is an R-submodule of M. Similarly,
im f = { f (m) : m ∈ M} ≤ N
is an R-submodule of N. Then
M/ ker f ≅ im f .
We will not prove this again. The proof is exactly the same.
Theorem 11.1.2 (Second isomorphism theorem). Let A, B ≤ M. Then
A + B = {m ∈ M : m = a + b for some a ∈ A, b ∈ B} ≤ M,
and
A ∩ B ≤ M.
We then have
(A + B)/A ≅ B/(A ∩ B).
Theorem 11.1.3 (Third isomorphism theorem). Let N ≤ L ≤ M. Then we have
M/L ≅ (M/N)/(L/N).
Also, we have a correspondence
{submodules of M/N} ←→ {submodules of M which contain N}
It is an exercise to see what these mean in the cases where R is a field, and modules are
vector spaces.
We now have something new. We have a new concept that was not present in rings
and groups.
Definition 11.1.5 (Annihilator). Let M be an R-module, and m ∈ M. The annihilator of
m is
Ann(m) = {r ∈ R : r · m = 0}.
For any set S ⊆ M, we define
Ann(S) = {r ∈ R : r · m = 0 for all m ∈ S} = ⋂_{m∈S} Ann(m).
In particular, for the module M itself, we have
Ann(M) = {r ∈ R : r · m = 0 for all m ∈ M} = ⋂_{m∈M} Ann(m).
Note that the annihilator is a subset of R. Moreover it is an ideal — if r · m = 0 and
s · m = 0, then (r + s) · m = r · m + s · m = 0. So r + s ∈ Ann(m). Moreover, if r · m = 0,
then also (sr) · m = s · (r · m) = 0. So sr ∈ Ann(m).
What is this good for? We first note that any m ∈ M generates a submodule Rm as
follows:
Definition 11.1.6 (Submodule generated by element). Let M be an R-module, and m ∈ M.
The submodule generated by m is
Rm = {r · m ∈ M : r ∈ R}.
We consider the R-module homomorphism
ϕ :R→M
r 7→ rm.
This is clearly a homomorphism. Then we have
Rm = im(ϕ ),
Ann(m) = ker(ϕ ).
The conclusion is that
Rm ≅ R/ Ann(m).
As we mentioned, rings acting on modules is like groups acting on sets. We can think of
this as the analogue of the orbit-stabilizer theorem.
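In the Z-module Z/n this is completely explicit: Ann(m) is the ideal generated by n/gcd(n, m), and Z·m ≅ Z/Ann(m). A sketch (helper name mine):

```python
from math import gcd

def annihilator_gen(m, n):
    """Generator of Ann(m) ⊴ Z for m in the Z-module Z/n:
    r . m ≡ 0 (mod n) iff (n / gcd(n, m)) divides r."""
    return n // gcd(n, m)

# In Z/12: Ann(4) = (3), and Z·4 = {0, 4, 8} ≅ Z/3 = Z/Ann(4).
print(annihilator_gen(4, 12))  # 3
```

The submodule Z·m has exactly n/annihilator_gen(m, n) elements, mirroring the orbit-stabilizer count.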
In general, we can generate a submodule with many elements.
Definition 11.1.7 (Finitely generated module). An R-module M is finitely generated if
there is a finite list of elements m1 , · · · , mk such that
M = Rm1 + Rm2 + · · · + Rmk = {r1 m1 + r2 m2 + · · · + rk mk : ri ∈ R}.
This is in some sense analogous to the idea of a vector space being finite-dimensional.
However, it behaves rather differently.
While this definition is rather concrete, it is often not the most helpful characterization
of finitely-generated modules. Instead, we use the following lemma:
Lemma 11.1.1. An R-module M is finitely-generated if and only if there is a surjective
R-module homomorphism f : Rk ↠ M for some finite k.
Proof. If
M = Rm1 + Rm2 + · · · + Rmk ,
we define f : Rk → M by
(r1 , · · · , rk ) 7→ r1 m1 + · · · + rk mk .
It is clear that this is an R-module homomorphism. This is by definition surjective. So
done.
Conversely, given a surjection f : Rk ↠ M, we let
mi = f (0, 0, · · · , 0, 1, 0, · · · , 0),
where the 1 appears in the ith position. We now claim that
M = Rm1 + Rm2 + · · · + Rmk .
So let m ∈ M. As f is surjective, we know
m = f (r1 , r2 , · · · , rk )
for some ri . We then have
f (r1 , r2 , · · · , rk )
= f ((r1 , 0, · · · , 0) + (0, r2 , 0, · · · , 0) + · · · + (0, 0, · · · , 0, rk ))
= f (r1 , 0, · · · , 0) + f (0, r2 , 0, · · · , 0) + · · · + f (0, 0, · · · , 0, rk )
= r1 f (1, 0, · · · , 0) + r2 f (0, 1, 0, · · · , 0) + · · · + rk f (0, 0, · · · , 0, 1)
= r1 m1 + r2 m2 + · · · + rk mk .
So the mi generate M. ■
This view is a convenient way of thinking about finitely-generated modules. For example, we can immediately prove the following corollary:
Corollary 11.1.1. Let N ≤ M and M be finitely-generated. Then M/N is also finitely
generated.
Proof. Since M is finitely generated, we have some surjection f : Rk ↠ M. Moreover, we
have the surjective quotient map q : M ↠ M/N. Then we get the following composition
q ∘ f : Rk → M → M/N,
which is a surjection, since it is a composition of surjections. So M/N is finitely generated.
■
It is very tempting to believe that if a module is finitely generated, then its submodules
are also finitely generated. It would be very wrong to think so.
Example 11.1.10. A submodule of a finitely-generated module need not be finitely generated.
We let R = C[X1 , X2 , · · · ]. We consider the R-module M = R, which is finitely generated (by 1). A submodule of the ring is the same as an ideal. Moreover, an ideal is
finitely generated as an ideal if and only if it is finitely generated as a module. We pick
the submodule
I = (X1 , X2 , · · · ),
which we have already shown to be not finitely-generated. So done.
Example 11.1.11. For a complex number α , the ring Z[α ] (i.e. the smallest subring of C
containing α ) is a finitely-generated as a Z-module if and only if α is an algebraic integer.
Proof is left as an exercise for the reader on the last example sheet. This allows us
to prove that algebraic integers are closed under addition and multiplication, since it is
easier to argue about whether Z[α ] is finitely generated.
11.2 Direct sums and free modules
We’ve been secretly using the direct sum in many examples, but we shall define it properly
now.
Definition 11.2.1 (Direct sum of modules). Let M1 , M2 , · · · , Mk be R-modules. The direct
sum is the R-module
M1 ⊕ M2 ⊕ · · · ⊕ Mk ,
which is the set M1 × M2 × · · · × Mk , with addition given by
(m1 , · · · , mk ) + (m1′ , · · · , mk′ ) = (m1 + m1′ , · · · , mk + mk′ ),
and the R-action given by
r · (m1 , · · · , mk ) = (rm1 , · · · , rmk ).
We’ve been using one example of the direct sum already, namely
R^n = R ⊕ R ⊕ · · · ⊕ R (n times).
Recall we said modules are like vector spaces. So we can try to define things like basis
and linear independence. However, we will fail massively, since we really can’t prove
much about them. Still, we can define them.
Definition 11.2.2 (Linear independence). Let m1 , · · · , mk ∈ M. Then {m1 , · · · , mk } is linearly independent if
r1 m1 + r2 m2 + · · · + rk mk = 0
implies r1 = r2 = · · · = rk = 0.
Lots of modules will not have a basis in the sense we are used to. The next best thing
would be the following:
Definition 11.2.3 (Freely generate). A subset S ⊆ M generates M freely if
(i) S generates M
(ii) Any set function ψ : S → N to an R-module N extends to an R-module map θ : M →
N.
Note that if θ1 , θ2 are two such extensions, we can consider θ1 − θ2 : M → N. Then
θ1 − θ2 sends everything in S to 0. So S ⊆ ker(θ1 − θ2 ) ≤ M. So the submodule generated
by S lies in ker(θ1 − θ2 ) too. But this is by definition M. So M ≤ ker(θ1 − θ2 ) ≤ M, i.e.
equality holds. So θ1 − θ2 = 0. So θ1 = θ2 . So any such extension is unique.
Thus, what this definition tells us is that giving a map from M to N is exactly the same
thing as giving a function from S to N.
Definition 11.2.4 (Free module and basis). An R-module is free if it is freely generated
by some subset S ⊆ M, and S is called a basis.
We will soon prove that if R is a field, then every module is free. However, if R is not
a field, then there are non-free modules.
Example 11.2.1. The Z-module Z/2Z is not freely generated. Suppose Z/2Z were freely generated by some S ⊆ Z/2Z. Then S can only possibly be {1}, since {0} generates nothing. Freeness would then give a homomorphism θ : Z/2Z → Z sending 1 to 1. But then θ sends 0 = 1 + 1 to 1 + 1 = 2, whereas homomorphisms must send 0 to 0. So Z/2Z is not freely generated.
We now want to formulate free modules in a way more similar to what we do in linear
algebra.
Proposition 11.2.1. For a subset S = {m1 , · · · , mk } ⊆ M, the following are equivalent:
(i) S generates M freely.
(ii) S generates M and the set S is independent.
(iii) Every element of M is uniquely expressible as
r1 m1 + r2 m2 + · · · + rk mk
for some ri ∈ R.
Proof. The fact that (ii) and (iii) are equivalent is something we would expect from what
we know from linear algebra, and in fact the proof is the same. So we only show that (i)
and (ii) are equivalent.
Let S generate M freely. If S is not independent, then we can write
r1 m1 + · · · + rk mk = 0,
with ri ∈ R and, say, r1 non-zero. We define the set function ψ : S → R by sending
m1 ↦ 1R and mi ↦ 0 for all i ≠ 1. As S generates M freely, this extends to an R-module
homomorphism θ : M → R.
By definition of a homomorphism, we can compute
0 = θ (0)
= θ (r1 m1 + r2 m2 + · · · + rk mk )
= r1 θ (m1 ) + r2 θ (m2 ) + · · · + rk θ (mk )
= r1 .
This is a contradiction. So S must be independent.
To prove the other direction, suppose every element can be uniquely written as r1 m1 +
· · · + rk mk . Given any set function ψ : S → N, we define θ : M → N by
θ (r1 m1 + · · · + rk mk ) = r1 ψ (m1 ) + · · · + rk ψ (mk ).
This is well-defined by uniqueness, and is clearly a homomorphism. So it follows that S
generates M freely.
■
Example 11.2.2. The set {2, 3} ⊆ Z generates Z. However, it does not generate Z freely,
since
3 · 2 + (−2) · 3 = 0.
Recall from linear algebra that if a set S spans a vector space V , and it is not independent,
then we can just pick some useless vectors and throw them away in order to get a basis.
However, this is no longer the case in modules. Neither 2 nor 3 generate Z.
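To see concretely that {2, 3} generates Z, the extended Euclidean algorithm produces the coefficients writing 1 as a Z-linear combination of 2 and 3. A minimal sketch in Python (the function name `egcd` is ours):

```python
def egcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = a*x + b*y."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = egcd(b, a % b)
    # g = b*x + (a % b)*y, and a % b = a - (a // b)*b
    return (g, y, x - (a // b) * y)

g, x, y = egcd(2, 3)  # 1 = 2*(-1) + 3*1, for instance
```

The same computation is what witnesses generation in any Euclidean domain: every element of (a, b) is a combination of gcd(a, b).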
Definition 11.2.5 (Relations). If M is a finitely-generated R-module, we have shown that
there is a surjective R-module homomorphism ϕ : Rk → M. We call ker(ϕ ) the relation
module for those generators.
Definition 11.2.6 (Finitely presented module). A finitely-generated module is finitely presented if we have a surjective homomorphism ϕ : Rk → M and ker ϕ is finitely generated.
Being finitely presented means I can tell you everything about the module with a
finite amount of paper. More precisely, if {m1 , · · · , mk } generate M and {n1 , n2 , · · · , nℓ }
generate ker(ϕ ), then each
ni = (ri1 , · · · , rik )
corresponds to the relation
ri1 m1 + ri2 m2 + · · · + rik mk = 0
in M. So M is the module generated by writing down R-linear combinations of m1 , · · · , mk ,
and saying two elements are the same if they are related to one another by these relations.
Since there are only finitely many generators and finitely many such relations, we can
specify the module with a finite amount of information.
A natural question we might ask is: if n ≠ m, are R^n and R^m the same? In vector spaces, they obviously must be different, since basis and dimension are well-defined concepts.
Proposition 11.2.2 (Invariance of dimension/rank). Let R be a non-zero ring. If R^n ≅ R^m as R-modules, then n = m.
We know this is true if R is a field. We now want to reduce the general case to the case of a field.
If R is an integral domain, then we can produce a field by taking the field of fractions,
and this might be a good starting point. However, we want to do this for general rings. So
we need some more magic.
We will need the following construction:
Let I ◁ R be an ideal, and let M be an R-module. We define
IM = {a1 m1 + · · · + aℓ mℓ : ai ∈ I, mi ∈ M} ≤ M,
the submodule of all finite sums of such products.
So we can take the quotient module M/IM, which is an R-module again.
Now if b ∈ I, then its action on M/IM is
b(m + IM) = bm + IM = IM.
So everything in I kills everything in M/IM. So we can consider M/IM as an R/I module
by
(r + I) · (m + IM) = r · m + IM.
So we have proved that
Proposition 11.2.3. If I ◁ R is an ideal and M is an R-module, then M/IM is an R/I
module in a natural way.
We next need to use the following general fact:
Proposition 11.2.4. Every non-zero ring has a maximal ideal.
This is a rather strong statement, since it talks about “all rings”, and we can have weird
rings. We need to use a more subtle argument, namely via Zorn’s lemma. You probably
haven’t seen it before, in which case you might want to skip the proof and just take the
lecturer’s word on it.
Proof. We observe that an ideal I ◁ R is proper if and only if 1R 6∈ I. So every increasing union of proper ideals is proper. Then by Zorn’s lemma, there is a maximal ideal
(Zorn’s lemma says if an arbitrary union of increasing things is still a thing, then there is
a maximal such thing, roughly).
■
With these two notions, we get
Proposition 11.2.5 (Invariance of dimension/rank). Let R be a non-zero ring. If R^n ≅ R^m as R-modules, then n = m.
Proof. Let I be a maximal ideal of R. Suppose we have R^n ≅ R^m . Then we must have
R^n /IR^n ≅ R^m /IR^m
as R/I-modules. But staring at it long enough, we figure that
R^n /IR^n ≅ (R/I)^n ,
and similarly for m. Since R/I is a field, the result follows by linear algebra.
■
The point of this proposition is not the result itself (which is not too interesting), but
the general constructions used behind the proof.
11.3 Matrices over Euclidean domains
This is the part of the course where we deliver all our promises about proving the classification of finite abelian groups and Jordan normal forms.
Until further notice, we will assume R is a Euclidean domain, and we write ϕ : R \
{0} → Z≥0 for its Euclidean function. We know that in such a Euclidean domain, the
greatest common divisor gcd(a, b) exists for all a, b ∈ R.
We will consider some matrices with entries in R.
Definition 11.3.1 (Elementary row operations). Elementary row operations on an m × n
matrix A with entries in R are operations of the form
(i) Add c ∈ R times the ith row to the jth row. This may be done by multiplying by the
following matrix on the left:
\[
\begin{pmatrix}
1 &        &   &        &   \\
  & \ddots &   &        &   \\
  &        & 1 &        &   \\
  &        & c & \ddots &   \\
  &        &   &        & 1
\end{pmatrix},
\]
where c appears in the ith column of the jth row.
(ii) Swap the ith and jth rows. This can be done by left-multiplication of the matrix
\[
\begin{pmatrix}
1 &        &   &        &   &        &   \\
  & \ddots &   &        &   &        &   \\
  &        & 0 &        & 1 &        &   \\
  &        &   & \ddots &   &        &   \\
  &        & 1 &        & 0 &        &   \\
  &        &   &        &   & \ddots &   \\
  &        &   &        &   &        & 1
\end{pmatrix}.
\]
Again, the rows and columns we have messed with are the ith and jth rows and
columns.
(iii) We multiply the ith row by a unit c ∈ R. We do this via the following matrix:
\[
\begin{pmatrix}
1 &        &   &        &   \\
  & \ddots &   &        &   \\
  &        & c &        &   \\
  &        &   & \ddots &   \\
  &        &   &        & 1
\end{pmatrix}
\]
Notice that if R is a field, then we can multiply any row by any non-zero number,
since they are all units.
We also have elementary column operations defined in a similar fashion, corresponding
to right multiplication of the matrices. Notice all these matrices are invertible.
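For R = Z, the invertibility of these elementary matrices can be checked concretely: each has determinant a unit (±1 for the first two kinds, the chosen unit c for the third). A sketch in Python with helper names of our own; the `det` helper is a naive Laplace expansion, fine for small matrices:

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add_op(n, i, j, c):
    # Adds c times row i to row j when multiplied on the left.
    E = identity(n)
    E[j][i] = c
    return E

def swap_op(n, i, j):
    # Swaps rows i and j when multiplied on the left.
    E = identity(n)
    E[i][i] = E[j][j] = 0
    E[i][j] = E[j][i] = 1
    return E

def scale_op(n, i, c):
    # Multiplies row i by c; c must be a unit (over Z: c = 1 or -1).
    E = identity(n)
    E[i][i] = c
    return E

def det(M):
    # Laplace expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([r[:k] + r[k + 1:] for r in M[1:]])
               for k in range(len(M)))
```

A matrix over a commutative ring is invertible exactly when its determinant is a unit, which these all satisfy.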
Definition 11.3.2 (Equivalent matrices). Two matrices are equivalent if we can get from
one to the other via a sequence of such elementary row and column operations.
Note that if A and B are equivalent, then we can write
B = QAT^{-1}
for some invertible matrices Q and T.
The aim of the game is to find, for each matrix, a matrix equivalent to it that is as
simple as possible. Recall from IB Linear Algebra that if R is a field, then we can put any
matrix into the form
\[
\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}
\]
via elementary row and column operations. This is no longer true when working with
rings. For example, over Z, we cannot put the matrix
\[
\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}
\]
into that form, since no operation can turn the 2 into a 1. What we get is the following
result:
Theorem 11.3.1 (Smith normal form). An m × n matrix over a Euclidean domain R is
equivalent to a diagonal matrix
\[
\begin{pmatrix}
d_1 &        &     &   &        &   \\
    & \ddots &     &   &        &   \\
    &        & d_r &   &        &   \\
    &        &     & 0 &        &   \\
    &        &     &   & \ddots &   \\
    &        &     &   &        & 0
\end{pmatrix},
\]
with the di all non-zero and
d1 | d2 | d3 | · · · | dr .
Note that the divisibility criterion is similar to the classification of finitely-generated
abelian groups. In fact, we will derive that as a consequence of the Smith normal form.
Definition 11.3.3 (Invariant factors). The dk obtained in the Smith normal form are called
the invariant factors of A.
We first exhibit the algorithm for producing the Smith normal form with a worked example over Z.
Example 11.3.1. We start with the matrix
\[
\begin{pmatrix} 3 & 7 & 4 \\ 1 & -1 & 2 \\ 3 & 5 & 1 \end{pmatrix}.
\]
We want to move the 1 to the top-left corner. So we swap the first and second rows to obtain
\[
\begin{pmatrix} 1 & -1 & 2 \\ 3 & 7 & 4 \\ 3 & 5 & 1 \end{pmatrix}.
\]
We then try to eliminate the other entries in the first row by column operations. We add multiples of the first column to the second and third to obtain
\[
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 10 & -2 \\ 3 & 8 & -5 \end{pmatrix}.
\]
We similarly clear the first column to get
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 10 & -2 \\ 0 & 8 & -5 \end{pmatrix}.
\]
We are left with a 2 × 2 matrix to fiddle with.
We swap the second and third columns so that 2 is in the (2, 2) entry, and secretly change sign to get
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 10 \\ 0 & 5 & 8 \end{pmatrix}.
\]
We notice that (2, 5) = 1. So we can use linear combinations to introduce a 1 at the bottom:
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 10 \\ 0 & 1 & -12 \end{pmatrix}.
\]
Swapping rows, we get
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -12 \\ 0 & 2 & 10 \end{pmatrix}.
\]
We then clear the remaining rows and columns to get
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 34 \end{pmatrix}.
\]
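The moves in this example can be replayed mechanically. A sketch in Python that applies the same sequence of row and column operations (helper names are ours) and ends at diag(1, 1, 34):

```python
def row_add(M, src, dst, c):
    # row dst += c * row src
    M[dst] = [x + c * y for x, y in zip(M[dst], M[src])]

def col_add(M, src, dst, c):
    # column dst += c * column src
    for row in M:
        row[dst] += c * row[src]

def row_swap(M, i, j):
    M[i], M[j] = M[j], M[i]

def col_swap(M, i, j):
    for row in M:
        row[i], row[j] = row[j], row[i]

A = [[3, 7, 4], [1, -1, 2], [3, 5, 1]]
row_swap(A, 0, 1)        # move the 1 to the top-left corner
col_add(A, 0, 1, 1)      # clear the first row
col_add(A, 0, 2, -2)
row_add(A, 0, 1, -3)     # clear the first column
row_add(A, 0, 2, -3)
col_swap(A, 1, 2)        # bring the small entry to position (2, 2)
for row in A:            # "secretly change sign": multiply a column by -1
    row[1] = -row[1]
row_add(A, 1, 2, -2)     # use (2, 5) = 1 to introduce a 1
row_swap(A, 1, 2)
row_add(A, 1, 2, -2)     # clear the remaining row and column
col_add(A, 1, 2, 12)
```

After these operations `A` is the diagonal matrix with entries 1, 1, 34.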
Proof. Throughout the process, we will keep calling our matrix A, even though it keeps changing at each step, so that we don’t have to invent hundreds of names for these matrices.
If A = 0, then we are done. So suppose A ≠ 0. Then some entry is non-zero, say Ai j ≠ 0. Swapping the ith and first rows, then the jth and first columns, we arrange that A11 ≠ 0. We now try to reduce A11 as much as possible. We have the following two possible moves:
(i) If there is an A1 j not divisible by A11 , then we can use the Euclidean algorithm to
write
A1 j = qA11 + r.
By assumption, r ≠ 0. So ϕ (r) < ϕ (A11 ) (where ϕ is the Euclidean function).
So we subtract q copies of the first column from the jth column. Then in position
(1, j), we now have r. We swap the first and jth column such that r is in position
(1, 1), and we have strictly reduced the value of ϕ at the first entry.
(ii) If there is an Ai1 not divisible by A11 , we do the same thing, and this again reduces
ϕ (A11 ).
We keep performing these until no move is possible. Since the value of ϕ (A11 ) strictly
decreases every move, we stop after finitely many applications. Then we know that we
must have A11 dividing all A1 j and Ai1 . Now we can just subtract appropriate multiples of
the first column from the others so that A1 j = 0 for j ≠ 1. We do the same thing with rows so
that the first row is cleared. Then we have a matrix of the form
\[
A = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C & \\ 0 & & & \end{pmatrix}.
\]
We would like to say “do the same thing with C”, but then this would get us a regular
diagonal matrix, not necessarily in Smith normal form. So we need some preparation.
(iii) Suppose there is an entry of C not divisible by d, say Ai j with i, j > 1:
\[
A = \begin{pmatrix}
d & 0 & \cdots & 0 & \cdots & 0 \\
0 & & & & & \\
\vdots & & & & & \\
0 & & & A_{ij} & & \\
\vdots & & & & & \\
0 & & & & &
\end{pmatrix}.
\]
We suppose
Ai j = qd + r,
with r ≠ 0 and ϕ (r) < ϕ (d). We add column 1 to column j, and subtract q times row 1 from row i. Now we get r in the (i, j)th entry, and we want to send it back to the (1, 1) position. We swap row i with row 1, and column j with column 1, so that r is in the (1, 1)th entry, and ϕ (r) < ϕ (d).
Now we have messed up the first row and column. So we go back and do (i) and
(ii) again until the first row and columns are cleared. Then we get
\[
A = \begin{pmatrix} d' & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C & \\ 0 & & & \end{pmatrix},
\]
where ϕ (d′ ) ≤ ϕ (r) < ϕ (d).
As this strictly decreases the value of ϕ (A11 ), we can only repeat this finitely many times.
When we stop, we will end up with a matrix
\[
A = \begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & C & \\ 0 & & & \end{pmatrix},
\]
and d divides every entry of C. Now we apply the entire process to C. When we do this
process, notice all allowed operations don’t change the fact that d divides every entry of
C.
So applying this recursively, we obtain a diagonal matrix with the claimed divisibility
property.
■
Note that if we didn’t have to care about the divisibility property, we can just do (i)
and (ii), and we can get a diagonal matrix. The magic to get to the Smith normal form is
(iii).
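The proof is an algorithm, and over Z it can be implemented directly: repeatedly shrink the pivot with division-with-remainder moves (i) and (ii), clear the first row and column, fix up divisibility as in (iii), and recurse on the inner block. A sketch (the function name is ours; abs plays the role of the Euclidean function on Z):

```python
def smith_normal_form(A):
    """Invariant factors of an integer matrix, following moves (i)-(iii)."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    d = []
    for t in range(min(m, n)):
        while True:
            # Move a non-zero entry of smallest absolute value to (t, t).
            entries = [(abs(A[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if A[i][j]]
            if not entries:
                break
            _, i, j = min(entries)
            A[t], A[i] = A[i], A[t]
            for row in A:
                row[t], row[j] = row[j], row[t]
            if A[t][t] < 0:                    # normalize by the unit -1
                A[t] = [-x for x in A[t]]
            p = A[t][t]
            done = True
            for i in range(t + 1, m):          # move (ii): reduce the column
                if A[i][t]:
                    q = A[i][t] // p
                    A[i] = [x - q * y for x, y in zip(A[i], A[t])]
                    done = done and A[i][t] == 0
            for j in range(t + 1, n):          # move (i): reduce the row
                if A[t][j]:
                    q = A[t][j] // p
                    for row in A:
                        row[j] -= q * row[t]
                    done = done and A[t][j] == 0
            if done:
                # Move (iii): the pivot must divide every remaining entry.
                bad = next((i for i in range(t + 1, m)
                            for j in range(t + 1, n) if A[i][j] % p), None)
                if bad is None:
                    break
                A[t] = [x + y for x, y in zip(A[t], A[bad])]
        if A[t][t]:
            d.append(A[t][t])
    return d
```

Each pass either terminates or strictly decreases the pivot's absolute value, so the loop stops, exactly as in the proof.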
Recall that the di are called the invariant factors. So it would be nice if we can prove
that the di are indeed invariant. It is not clear from the algorithm that we will always
end up with the same di . Indeed, we can multiply a whole row by −1 and get different
invariant factors. However, it turns out that these are unique up to multiplication by units.
To study the uniqueness of the invariant factors of a matrix A, we relate them to other
invariants, which involves minors.
Definition 11.3.4 (Minor). A k × k minor of a matrix A is the determinant of a k × k
sub-matrix of A (i.e. a matrix formed by removing all but k rows and all but k columns).
Any given matrix has many minors, since we get to decide which rows and columns
we can throw away. The idea is to consider the ideal generated by all the minors of the matrix.
Definition 11.3.5 (Fitting ideal). For a matrix A, the kth Fitting ideal Fitk (A) ◁ R is the
ideal generated by the set of all k × k minors of A.
A key property is that equivalent matrices have the same Fitting ideal, even if they
might have very different minors.
Lemma 11.3.1. Let A and B be equivalent matrices. Then
Fitk (A) = Fitk (B)
for all k.
Proof. It suffices to show that changing A by a row or column operation does not change
the Fitting ideal. Since taking the transpose does not change the determinant, i.e. Fitk (A) = Fitk (A^T ), it suffices to consider the row operations.
The most difficult one is taking linear combinations. Let B be the result of adding c times the ith row to the jth row of A, and fix a k × k submatrix C of A, so that det C is a minor of A. Let C′ be the corresponding submatrix of B. We want to show that det C′ ∈ Fitk (A).
If the jth row of A is not among the rows of C, then C′ = C and the minor is unchanged. If both the ith and jth rows are among the rows of C, then C′ is obtained from C by a row operation, which does not affect the determinant. These are the boring cases.
Suppose the jth row is in C and the ith row is not. Let f1 , · · · , fk be the entries of the ith row of A in the columns of C. Then the jth row of C′ is
(C j1 + c f1 , C j2 + c f2 , · · · , C jk + c fk ),
while the other rows are those of C. Expanding the determinant along this row, we get
det C′ = det C + c det D,
where D is the matrix obtained by replacing the jth row of C with ( f1 , · · · , fk ).
The point is that det C is certainly a minor of A, and det D is also a minor of A, just another one. Since ideals are closed under addition and multiplication, we know
det(C′ ) ∈ Fitk (A).
The other operations are much simpler: they follow from the standard effect of swapping rows or multiplying a row by a unit on determinants. So after any row operation, the resultant submatrix C′ satisfies
det(C′ ) ∈ Fitk (A).
Since this is true for all minors, we must have
Fitk (B) ⊆ Fitk (A).
But row operations are invertible. So we must have
Fitk (A) ⊆ Fitk (B)
as well. So they must be equal. So done.
■
We now notice that if we have a matrix in Smith normal form, say
\[
B = \begin{pmatrix}
d_1 &     &        &     &   &        \\
    & d_2 &        &     &   &        \\
    &     & \ddots &     &   &        \\
    &     &        & d_r &   &        \\
    &     &        &     & 0 &        \\
    &     &        &     &   & \ddots
\end{pmatrix},
\]
then we can immediately read off
Fitk (B) = (d1 d2 · · · dk ).
This is clear once we notice that the only possible contributing minors are from the diagonal submatrices, and the minor from the top left square submatrix divides all other
diagonal ones. So we have
Corollary 11.3.1. If A has Smith normal form
\[
B = \begin{pmatrix}
d_1 &     &        &     &   &        \\
    & d_2 &        &     &   &        \\
    &     & \ddots &     &   &        \\
    &     &        & d_r &   &        \\
    &     &        &     & 0 &        \\
    &     &        &     &   & \ddots
\end{pmatrix},
\]
then
Fitk (A) = (d1 d2 · · · dk ).
So dk is unique up to associates.
This is since we can find dk by dividing the generator of Fitk (A) by the generator of
Fitk−1 (A).
Example 11.3.2. Consider the matrix over Z:
\[
A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
\]
This is diagonal, but not in Smith normal form. We can potentially apply the algorithm,
but that would be messy. We notice that
Fit1 (A) = (2, 3) = (1).
So we know d1 = ±1. We can then look at the second Fitting ideal
Fit2 (A) = (6).
So d1 d2 = ±6. So we must have d2 = ±6. So the Smith normal form is
\[
\begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix}.
\]
That was much easier.
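Over Z the kth Fitting ideal is generated by the gcd of all k × k minors, so the invariant factors can be computed without any row reduction. A sketch in Python (helper names are ours):

```python
from itertools import combinations
from math import gcd

def det(M):
    # Laplace expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([r[:k] + r[k + 1:] for r in M[1:]])
               for k in range(len(M)))

def fitting_gcd(A, k):
    """Non-negative generator of Fit_k(A) over Z: the gcd of all k x k minors."""
    g = 0
    for rows in combinations(range(len(A)), k):
        for cols in combinations(range(len(A[0])), k):
            g = gcd(g, det([[A[i][j] for j in cols] for i in rows]))
    return g

A = [[2, 0], [0, 3]]
d1 = fitting_gcd(A, 1)        # Fit1(A) = (2, 3) = (1), so d1 = 1
d2 = fitting_gcd(A, 2) // d1  # Fit2(A) = (6), so d2 = 6
```

This reproduces the Smith normal form diag(1, 6) computed above, since dk is the quotient of the generators of consecutive Fitting ideals.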
We are now going to use Smith normal forms to do things. We will need some preparation, in the form of the following lemma:
Lemma 11.3.2. Let R be a principal ideal domain. Then any submodule of Rm is generated by at most m elements.
This is obvious for vector spaces, but is slightly more difficult here.
Proof. Let N ≤ Rm be a submodule. Consider the ideal
I = {r ∈ R : (r, r2 , · · · , rm ) ∈ N for some r2 , · · · , rm ∈ R}.
It is clear this is an ideal. Since R is a principal ideal domain, we must have I = (a) for
some a ∈ R. We now choose an
n = (a, a2 , · · · , am ) ∈ N.
Then for any vector (r1 , r2 , · · · , rm ) ∈ N, we know that r1 ∈ I. So a | r1 . So we can write
r1 = ra.
Then we can form
(r1 , r2 , · · · , rm ) − r(a, a2 , · · · , am ) = (0, r2 − ra2 , · · · , rm − ram ) ∈ N.
This lies in N′ = N ∩ ({0} × Rm−1 ) ≤ Rm−1 . Thus everything in N can be written as a multiple of n plus something in N′ . But by induction, since N′ ≤ Rm−1 , we know N′ is generated by at most m − 1 elements. So there are n2 , · · · , nm ∈ N′ generating N′ . So
n, n2 , · · · , nm generate N.
■
If we have a submodule of Rm , then it has at most m generators. However, these might
generate the submodule in a terrible way. The next theorem tells us there is a nice way of
finding generators.
Theorem 11.3.2. Let R be a Euclidean domain, and let N ≤ Rm be a submodule. Then
there exists a basis v1 , · · · , vm of Rm such that N is generated by d1 v1 , d2 v2 , · · · , dr vr for
some 0 ≤ r ≤ m and some di ∈ R such that
d1 | d2 | · · · | dr .
This is not hard, given what we’ve developed so far.
Proof. By the previous lemma, N is generated by some elements x1 , · · · , xn with n ≤ m.
Each xi is an element of Rm . So we can think of it as a column vector of length m, and we
can form a matrix
\[
A = \begin{pmatrix} \uparrow & \uparrow & & \uparrow \\ x_1 & x_2 & \cdots & x_n \\ \downarrow & \downarrow & & \downarrow \end{pmatrix}.
\]
We’ve got an m × n matrix. So we can put it in Smith normal form! Since there are fewer
columns than there are rows, this is of the form
\[
\begin{pmatrix}
d_1 &        &     &   &        \\
    & \ddots &     &   &        \\
    &        & d_r &   &        \\
    &        &     & 0 &        \\
    &        &     &   & \ddots \\
0   &        & \cdots &  & 0   \\
\vdots &     &        &  & \vdots \\
0   &        & \cdots &  & 0
\end{pmatrix}.
\]
Recall that we got to the Smith normal form by row and column operations. Performing
row operations is just changing the basis of Rm , while each column operation changes the
generators of N.
So what this tells us is that there is a new basis v1 , · · · , vm of Rm such that N is generated by d1 v1 , · · · , dr vr . By definition of Smith normal form, the divisibility condition
holds.
■
Corollary 11.3.2. Let R be a Euclidean domain. A submodule of Rm is free of rank at
most m. In other words, the submodule of a free module is free, and of a smaller (or equal)
rank.
Proof. Let N ≤ Rm be a submodule. By the above, there is a basis v1 , · · · , vm of Rm such that N is generated by d1 v1 , · · · , dr vr for some r ≤ m. So it is certainly generated by at most m elements. So we only have to show that d1 v1 , · · · , dr vr are independent. But if they were linearly dependent, then so would be v1 , · · · , vm . But v1 , · · · , vm are a basis, hence independent. So d1 v1 , · · · , dr vr generate N freely. So
N ≅ R^r .
■
Note that this is not true for all rings. For example, (2, X) ◁ Z[X] is a submodule of Z[X], but it is not free, and in particular not isomorphic to Z[X].
Theorem 11.3.3 (Classification of finitely-generated modules over a Euclidean domain).
Let R be a Euclidean domain, and M be a finitely generated R-module. Then
M ≅ R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dr ) ⊕ R ⊕ R ⊕ · · · ⊕ R
for some di ≠ 0, and
d1 | d2 | · · · | dr .
This is either a good or bad thing. If you are pessimistic, this says the world of finitely
generated modules is boring, since there are only these modules we already know about.
If you are optimistic, this tells you all finitely-generated modules are of this simple form,
so we can prove things about them assuming they look like this.
Proof. Since M is finitely-generated, there is a surjection ϕ : R^m → M. So by the first isomorphism theorem, we have
M ≅ R^m / ker ϕ .
Since ker ϕ is a submodule of Rm , by the previous theorem, there is a basis v1 , · · · , vm of
Rm such that ker ϕ is generated by d1 v1 , · · · , dr vr for 0 ≤ r ≤ m and d1 | d2 | · · · | dr . So we
know
M ≅ R^m /((d1 , 0, · · · , 0), (0, d2 , 0, · · · , 0), · · · , (0, · · · , 0, dr , 0, · · · , 0)).
This is just
R/(d1 ) ⊕ R/(d2 ) ⊕ · · · ⊕ R/(dr ) ⊕ R ⊕ · · · ⊕ R,
with m − r copies of R.
■
This is particularly useful in the case where R = Z, where R-modules are abelian
groups.
Example 11.3.3. Let A be the abelian group generated by a, b, c with relations
2a + 3b + c = 0,
a + 2b = 0,
5a + 6b + 7c = 0.
In other words, we have
A = Z^3 /((2, 3, 1), (1, 2, 0), (5, 6, 7)).
We would like to get a better description of A. It is not even obvious whether this module is the zero module or not.
To work out a good description, we consider the matrix
\[
X = \begin{pmatrix} 2 & 1 & 5 \\ 3 & 2 & 6 \\ 1 & 0 & 7 \end{pmatrix}.
\]
To figure out the Smith normal form, we find the Fitting ideals. We have
Fit1 (X) = (1, · · · ) = (1).
So d1 = 1.
We have to work out the second Fitting ideal. In principle, we have to check all the minors, but we immediately notice that
\[
\det \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} = 1.
\]
So Fit2 (X) = (1), and d2 = 1. Finally, we find
\[
\operatorname{Fit}_3 (X) = \left( \det \begin{pmatrix} 2 & 1 & 5 \\ 3 & 2 & 6 \\ 1 & 0 & 7 \end{pmatrix} \right) = (3).
\]
So d3 = 3. So we know
A ≅ Z/(1) ⊕ Z/(1) ⊕ Z/(3) ≅ Z/(3) ≅ C3 .
If you don’t feel like computing determinants, doing row and column reduction is often
as quick and straightforward.
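The minor computations in this example can also be automated: the gcd of the k × k minors generates Fitk (X), and dividing consecutive generators gives the invariant factors. A sketch (helper names are ours):

```python
from itertools import combinations
from math import gcd

def det(M):
    # Laplace expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** k * M[0][k] * det([r[:k] + r[k + 1:] for r in M[1:]])
               for k in range(len(M)))

def fitting_gcd(A, k):
    """Non-negative generator of Fit_k(A) over Z: the gcd of all k x k minors."""
    g = 0
    for rows in combinations(range(len(A)), k):
        for cols in combinations(range(len(A[0])), k):
            g = gcd(g, det([[A[i][j] for j in cols] for i in rows]))
    return g

X = [[2, 1, 5], [3, 2, 6], [1, 0, 7]]
f1, f2, f3 = (fitting_gcd(X, k) for k in (1, 2, 3))
d1, d2, d3 = f1, f2 // f1, f3 // f2   # the invariant factors 1, 1, 3
```

This confirms A ≅ C3 without doing any reduction by hand.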
We re-state the previous theorem in the specific case where R is Z, since this is particularly useful.
Corollary 11.3.3 (Classification of finitely-generated abelian groups). Any finitely-generated
abelian group is isomorphic to
Cd1 × · · · × Cdr × C∞ × · · · × C∞ ,
where C∞ ≅ Z is the infinite cyclic group, with
Proof. Let R = Z, and apply the classification of finitely generated R-modules.
■
Note that if the group is finite, then there cannot be any C∞ factors. So it is just a
product of finite cyclic groups.
Corollary 11.3.4. If A is a finite abelian group, then
A ≅ Cd1 × · · · × Cdr ,
with
d1 | d2 | · · · | dr .
This is the result we stated at the beginning of the course.
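This corollary turns the classification of abelian groups of a given finite order into a combinatorial enumeration: list the chains d1 | d2 | · · · | dr with product n. A sketch (the function name is ours):

```python
def chains(n, lo=2):
    """All divisor chains d1 | d2 | ... | dr with d1 * ... * dr = n, di >= lo."""
    result = [[n]] if n >= lo else []
    for d in range(lo, n):
        if n % d == 0:
            # d must divide every later factor of the chain
            result += [[d] + rest for rest in chains(n // d, d)
                       if all(x % d == 0 for x in rest)]
    return result
```

For example, chains(12) gives the chains (12) and (2, 6), i.e. the groups C12 and C2 × C6; there are exactly three abelian groups of order 8.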
Recall that we also wanted to decompose a finite abelian group into a product of cyclic groups C pk with p prime, and we said this was just the Chinese remainder theorem. This is true in general, but we first need to prove the Chinese remainder theorem for Euclidean domains.
Lemma 11.3.3 (Chinese remainder theorem). Let R be a Euclidean domain, and a, b ∈ R
be such that gcd(a, b) = 1. Then
R/(ab) ≅ R/(a) × R/(b)
as R-modules.
The proof is just that of the Chinese remainder theorem written in ring language.
Proof. Consider the R-module homomorphism
ϕ : R/(a) × R/(b) → R/(ab)
given by
(r1 + (a), r2 + (b)) ↦ br1 + ar2 + (ab).
To show this is well-defined, suppose
(r1 + (a), r2 + (b)) = (r1′ + (a), r2′ + (b)).
Then
r1 = r1′ + xa, r2 = r2′ + yb
for some x, y ∈ R.
So
br1 + ar2 + (ab) = br1′ + xab + ar2′ + yab + (ab) = br1′ + ar2′ + (ab).
So this is indeed well-defined. It is clear that this is a module map, by inspection.
We now have to show it is surjective and injective. So far, we have not used the hypothesis that gcd(a, b) = 1. As we know gcd(a, b) = 1, by the Euclidean algorithm we can write
1 = ax + by
for some x, y ∈ R. So we have
ϕ (y + (a), x + (b)) = by + ax + (ab) = 1 + (ab).
So 1 ∈ im ϕ . Since this is an R-module map, we get
ϕ (r(y + (a), x + (b))) = r · (1 + (ab)) = r + (ab).
The key fact is that R/(ab) as an R-module is generated by 1. Thus we know ϕ is surjective.
Finally, we have to show it is injective, i.e. that the kernel is trivial. Suppose
ϕ (r1 + (a), r2 + (b)) = 0 + (ab).
Then
br1 + ar2 ∈ (ab).
So we can write
br1 + ar2 = abx
for some x ∈ R. Since a | ar2 and a | abx, we know a | br1 . Since a and b are coprime, unique factorization implies a | r1 . Similarly, we know b | r2 . So
(r1 + (a), r2 + (b)) = (0 + (a), 0 + (b)).
So the kernel is trivial.
■
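The map in this proof can be checked exhaustively for small coprime a and b. A sketch verifying that (r1 + (a), r2 + (b)) ↦ br1 + ar2 (mod ab) is a bijection for a = 4, b = 9:

```python
from math import gcd

a, b = 4, 9
assert gcd(a, b) == 1

# The homomorphism from the proof, evaluated on representatives
# of Z/(a) x Z/(b).
phi = {(r1, r2): (b * r1 + a * r2) % (a * b)
       for r1 in range(a) for r2 in range(b)}

# Bijectivity: the a*b input pairs hit all a*b residues mod ab.
is_bijective = len(set(phi.values())) == a * b
```

Since the domain and codomain both have a·b elements, surjectivity and injectivity coincide here, mirroring the two halves of the proof.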
Theorem 11.3.4 (Prime decomposition theorem). Let R be a Euclidean domain, and M
be a finitely-generated R-module. Then
M ≅ N1 ⊕ N2 ⊕ · · · ⊕ Nt ,
where each Ni is either R or is R/(p^n ) for some prime p ∈ R and some n ≥ 1.
Proof. We already know
M ≅ R/(d1 ) ⊕ · · · ⊕ R/(dr ) ⊕ R ⊕ · · · ⊕ R.
So it suffices to show that each R/(di ) can be written in that form. We let
di = p1^n1 p2^n2 · · · pk^nk
with the p j distinct primes, so that the p j^n j are pairwise coprime. So by the lemma, iterated, we have
R/(di ) ≅ R/(p1^n1 ) ⊕ · · · ⊕ R/(pk^nk ).
■
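For R = Z this says each Z/(d) splits into cyclic factors of prime power order, e.g. Z/(360) ≅ Z/(8) ⊕ Z/(9) ⊕ Z/(5). A sketch extracting the coprime prime powers (the function name is ours):

```python
def prime_power_factors(d):
    """The pairwise coprime prime powers p**n whose product is d."""
    factors = []
    p = 2
    while p * p <= d:
        if d % p == 0:
            q = 1
            while d % p == 0:   # pull out the full power of p
                d //= p
                q *= p
            factors.append(q)
        p += 1
    if d > 1:                   # whatever remains is a prime
        factors.append(d)
    return factors
```

Iterating the Chinese remainder lemma over these factors is exactly the proof above.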
11.4 Modules over F[X] and normal forms for matrices
That was one promise delivered. We next want to consider the Jordan normal form. This
is less straightforward, since considering V directly as an F module would not be too
helpful (since that would just be pure linear algebra). Instead, we use the following trick:
For a field F, the polynomial ring F[X] is a Euclidean domain, so the results of the last
few sections apply. If V is a vector space over F, and α : V → V is a linear map, then we can make V into an F[X]-module via
F[X] × V → V, ( f , v) ↦ ( f (α ))(v).
We write Vα for this F[X]-module.
Lemma 11.4.1. If V is a finite-dimensional vector space, then Vα is a finitely-generated
F[X]-module.
Proof. If v1 , · · · , vn generate V as an F-module, i.e. they span V as a vector space over F,
then they also generate Vα as an F[X]-module, since F ≤ F[X].
■
Example 11.4.1. Suppose Vα ≅ F[X]/(X^r ) as F[X]-modules. Then in particular they are
isomorphic as F-modules (since being a map of F-modules has fewer requirements than
being a map of F[X]-modules).
Under this bijection, the elements 1, X, X^2 , · · · , X^{r−1} ∈ F[X]/(X^r ) form a vector space basis for Vα . Viewing F[X]/(X^r ) as an F-vector space, the action of X has the matrix
\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
\]
We also know that in Vα , the action of X is by definition the linear map α . So under this
basis, α also has matrix
\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
\]
Example 11.4.2. Suppose
Vα ≅ F[X]/((X − λ )^r )
for some λ ∈ F. Consider the new linear map
β = α − λ · id : V → V.
Then Vβ ≅ F[Y ]/(Y^r ) for Y = X − λ . So there is a basis for V so that β looks like
\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}.
\]
So we know α has matrix
\[
\begin{pmatrix}
\lambda & 0 & \cdots & 0 & 0 \\
1 & \lambda & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \lambda
\end{pmatrix}.
\]
So it is a Jordan block, except transposed from the usual convention: the 1s are below the diagonal instead of above it.
Example 11.4.3. Suppose Vα ≅ F[X]/( f ) for the monic polynomial
f = a0 + a1 X + · · · + ar−1 X^{r−1} + X^r .
This has a basis 1, X, X^2 , · · · , X^{r−1} as well, in which α is
\[
c( f ) = \begin{pmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \cdots & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -a_{r-1}
\end{pmatrix}.
\]
We call this the companion matrix for the monic polynomial f .
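As a sanity check, c( f ) satisfies f (c( f )) = 0, as it must: X acts as α on F[X]/( f ), and f ≡ 0 there. A sketch over the integers (helper names are ours):

```python
def companion(coeffs):
    """c(f) for monic f = a0 + a1*X + ... + a_{r-1}*X^{r-1} + X^r,
    with 1s below the diagonal and -a_i in the last column, as above."""
    r = len(coeffs)
    return [[1 if i == j + 1 else 0 for j in range(r - 1)] + [-coeffs[i]]
            for i in range(r)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_value(coeffs, M):
    """f(M), where f is monic with lower-order coefficients coeffs."""
    n = len(M)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # M^0
    total = [[0] * n for _ in range(n)]
    for a in coeffs + [1]:          # coefficients a0, ..., a_{r-1}, then 1
        for i in range(n):
            for j in range(n):
                total[i][j] += a * power[i][j]
        power = mat_mul(power, M)
    return total

C = companion([2, -3])              # f = X^2 - 3X + 2
```

Here f (C) = 2I − 3C + C² comes out to the zero matrix, as expected.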
These are the different shapes that can possibly occur. Since we have already classified all finitely generated F[X]-modules, this allows us to put matrices in a rather nice form.
Theorem 11.4.1 (Rational canonical form). Let α : V → V be a linear endomorphism of
a finite-dimensional vector space over F, and Vα be the associated F[X]-module. Then
Vα ≅ F[X]/( f1 ) ⊕ F[X]/( f2 ) ⊕ · · · ⊕ F[X]/( fs ),
with f1 | f2 | · · · | fs . Thus there is a basis for V in which the matrix for α is the block
diagonal matrix
\[
\begin{pmatrix}
c( f_1 ) & 0 & \cdots & 0 \\
0 & c( f_2 ) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & c( f_s )
\end{pmatrix}.
\]
This is the sort of theorem whose statement is longer than the proof.
Proof. We already know that Vα is a finitely-generated F[X]-module. By the structure theorem for F[X]-modules, we know
Vα ≅ F[X]/( f1 ) ⊕ F[X]/( f2 ) ⊕ · · · ⊕ F[X]/( fs ) ⊕ F[X] ⊕ · · · ⊕ F[X].
We know there are no copies of F[X], since Vα = V is finite-dimensional over F, but F[X]
is not. The divisibility criterion also follows from the structure theorem. Then the form
of the matrix is immediate.
■
This is really a canonical form. The Jordan normal form is not canonical, since we
can move the blocks around. The structure theorem determines the factors fi up to units,
and once we require them to be monic, there is no choice left.
In terms of matrices, this says that if α is represented by a matrix A ∈ Mn,n (F) in some
basis, then A is conjugate to a matrix of the form above.
From the rational canonical form, we can immediately read off the minimal polynomial as fs. This is since, if we view Vα via the decomposition above, we find that fs(α) kills everything in F[X]/(fs). It also kills the other factors, since fi | fs for all i. So fs(α) = 0. We also know no smaller polynomial kills V, since it does not kill F[X]/(fs).
Similarly, we find that the characteristic polynomial of α is f1 f2 · · · fs .
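As a quick Python sanity check (the choice f₁ = X − 1, f₂ = X² − 3X + 2 = (X − 1)(X − 2) is arbitrary, not from the notes), we can build the block diagonal matrix diag(c(f₁), c(f₂)) and verify that f₂ kills the whole matrix while f₁ does not, matching the claim that fs is the minimal polynomial:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# diag(c(f1), c(f2)) with f1 = X - 1 and f2 = X^2 - 3X + 2, so f1 | f2
M = [[1, 0, 0],
     [0, 0, -2],
     [0, 1, 3]]

M2 = mat_mul(M, M)
# f2(M) = M^2 - 3M + 2I should vanish (f2 is the minimal polynomial)
f2_of_M = [[M2[i][j] - 3 * M[i][j] + 2 * I3[i][j] for j in range(3)]
           for i in range(3)]
# f1(M) = M - I should NOT vanish (f1 only kills the first block)
f1_of_M = [[M[i][j] - I3[i][j] for j in range(3)] for i in range(3)]

zero = [[0] * 3 for _ in range(3)]
assert f2_of_M == zero
assert f1_of_M != zero
```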
Recall we had a different way of decomposing a module over a Euclidean domain,
namely the prime decomposition, and this gives us the Jordan normal form.
Before we can use that, we need to know what the primes are. This is why we need to
work over C.
Lemma 11.4.2. The prime elements of C[X] are the X − λ for λ ∈ C (up to multiplication
by units).
Proof. Let f ∈ C[X]. If f is constant, then it is either a unit or 0. Otherwise, by the
fundamental theorem of algebra, it has a root λ . So it is divisible by X − λ . So if f is
irreducible, it must have degree 1. And clearly everything of degree 1 is prime.
■
Applying the prime decomposition theorem to C[X]-modules gives us the Jordan normal form.
Theorem 11.4.2 (Jordan normal form). Let α : V → V be an endomorphism of a finite-dimensional vector space V over C, and Vα be the associated C[X]-module. Then
\[
V_\alpha \cong \frac{C[X]}{((X-\lambda_1)^{a_1})} \oplus \frac{C[X]}{((X-\lambda_2)^{a_2})} \oplus \cdots \oplus \frac{C[X]}{((X-\lambda_t)^{a_t})},
\]
where λi ∈ C do not have to be distinct. So there is a basis of V in which α has matrix
\[
\begin{pmatrix}
J_{a_1}(\lambda_1) & & & 0 \\
& J_{a_2}(\lambda_2) & & \\
& & \ddots & \\
0 & & & J_{a_t}(\lambda_t)
\end{pmatrix},
\]
where
\[
J_m(\lambda) =
\begin{pmatrix}
\lambda & 0 & \cdots & 0 & 0 \\
1 & \lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & \lambda
\end{pmatrix}
\]
is an m × m matrix.
Proof. Apply the prime decomposition theorem to Vα . Then all primes are of the form
X − λ . We then use our second example at the beginning of the chapter to get the form of
the matrix.
■
The blocks Jm (λ ) are called the Jordan λ -blocks. It turns out that the Jordan blocks
are unique up to reordering, but it does not immediately follow from what we have so far,
and we will not prove it. It is done in the IB Linear Algebra course.
We can also read off the minimal polynomial and characteristic polynomial of α . The
minimal polynomial is
\[
\prod_{\lambda} (X - \lambda)^{a_\lambda},
\]
where aλ is the size of the largest λ-block. The characteristic polynomial of α is
\[
\prod_{\lambda} (X - \lambda)^{b_\lambda},
\]
where bλ is the sum of the sizes of the λ-blocks. Alternatively, it is
\[
\prod_{i=1}^{t} (X - \lambda_i)^{a_i}.
\]
From the Jordan normal form, we can also read off another invariant, namely the dimension of the λ-eigenspace of α, which is the number of λ-blocks.
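These three invariants can be checked mechanically. Here is a Python sketch (the block data, eigenvalue 2 with blocks of sizes 3 and 1 and eigenvalue 5 with one block of size 2, is an arbitrary example, not from the notes): we assemble the Jordan matrix and confirm that dim ker(M − λI), computed as n minus the rank over Q, equals the number of λ-blocks.

```python
from fractions import Fraction

def jordan_block(lam, m):
    """m x m Jordan block, with 1s below the diagonal as in these notes."""
    J = [[0] * m for _ in range(m)]
    for i in range(m):
        J[i][i] = lam
    for i in range(1, m):
        J[i][i - 1] = 1
    return J

def block_diag(blocks):
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        for i in range(len(b)):
            for j in range(len(b)):
                M[off + i][off + j] = b[i][j]
        off += len(b)
    return M

def rank(M):
    """Rank over Q by exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    n, m = len(A), len(A[0])
    r = 0
    for col in range(m):
        piv = next((i for i in range(r, n) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(n):
            if i != r and A[i][col] != 0:
                f = A[i][col] / A[r][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

sizes = {2: [3, 1], 5: [2]}  # eigenvalue -> Jordan block sizes (made up)
M = block_diag([jordan_block(lam, m) for lam in sizes for m in sizes[lam]])
n = len(M)

for lam, blks in sizes.items():
    shifted = [[M[i][j] - lam * (i == j) for j in range(n)] for i in range(n)]
    # dim ker(M - lam*I) = n - rank = number of lam-blocks
    assert n - rank(shifted) == len(blks)
```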
We can also use the idea of viewing V as an F[X]-module to prove the Cayley-Hamilton theorem. In fact, we don’t need F to be a field.
Theorem 11.4.3 (Cayley-Hamilton theorem). Let M be a finitely-generated R-module,
where R is some commutative ring. Let α : M → M be an R-module homomorphism.
Let A be a matrix representation of α under some choice of generators, and let p(t) =
det(tI − A). Then p(α ) = 0.
Proof. We consider M as an R[X]-module with action given by
( f (X))(m) = f (α )m.
Suppose e1 , · · · , en span M, and that for all i, we have
\[
\alpha(e_i) = \sum_{j=1}^{n} a_{ij} e_j.
\]
Then
\[
\sum_{j=1}^{n} (X\delta_{ij} - a_{ij}) e_j = 0.
\]
We write C for the matrix with entries
c_{ij} = Xδ_{ij} − a_{ij} ∈ R[X].
We now use the fact that
adj(C)C = det(C)I,
which we proved in IB Linear Algebra (and the proof did not assume that the underlying
ring is a field). Expanding this out, we get the following equation (in R[X]).
χα (X)I = det(XI − A)I = (adj(XI − A))(XI − A).
Writing this in components, and multiplying by ek , we have
\[
\chi_\alpha(X)\,\delta_{ik}\, e_k = \sum_{j=1}^{n} (\operatorname{adj}(XI - A))_{ij}\,(X\delta_{jk} - a_{jk})\, e_k.
\]
Then for each i, we sum over k to obtain
\[
\sum_{k=1}^{n} \chi_\alpha(X)\,\delta_{ik}\, e_k = \sum_{j,k=1}^{n} (\operatorname{adj}(XI - A))_{ij}\,(X\delta_{jk} - a_{jk})\, e_k = 0,
\]
by our choice of ai j . But the left hand side is just χα (X)ei . So χα (X) acts trivially on all
of the generators ei. So it in fact acts trivially. So χα(α) is the zero map (since acting by X is the same as acting by α, by construction). ■
Note that if we want to prove this just for matrices, we don’t really need the theory of
rings and modules. It just provides a convenient language to write the proof in.
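The theorem can also be checked numerically. The sketch below (not the proof technique above, but the Faddeev-LeVerrier recursion, a standard trace-based way to get the characteristic polynomial; the matrix A is an arbitrary example) computes det(tI − A) in exact rational arithmetic and verifies that p(A) = 0:

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(tI - A),
    via the Faddeev-LeVerrier recursion in exact arithmetic."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    M = I
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k   # next coefficient
        coeffs[n - k] = c
        M = [[AM[i][j] + c * I[i][j] for j in range(n)] for i in range(n)]
    return coeffs

def eval_poly(coeffs, A):
    """Evaluate p(A) for p with coefficient list `coeffs` (low to high)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    P = I
    out = [[coeffs[0] * I[i][j] for j in range(n)] for i in range(n)]
    for c in coeffs[1:]:
        P = mat_mul(P, A)
        out = [[out[i][j] + c * P[i][j] for j in range(n)] for i in range(n)]
    return out

A = [[2, 1, 0], [0, 1, -1], [3, 0, 1]]   # arbitrary integer matrix
p = char_poly(A)
assert eval_poly(p, A) == [[0] * 3 for _ in range(3)]   # Cayley-Hamilton
```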
11.5 Conjugacy of matrices*
We are now going to do some fun computations of conjugacy classes of matrices, using
what we have got so far.
Lemma 11.5.1. Let α, β : V → V be two linear maps. Then Vα ≅ Vβ as F[X]-modules if and only if α and β are conjugate as linear maps, i.e. there is some γ : V → V such that α = γ^{-1}βγ.
This is not a deep theorem. It is in some sense just a tautology. All we have to do is to unwrap what these statements say.
Proof. Let γ : Vα → Vβ be an F[X]-module isomorphism. Then for v ∈ V, we notice that α(v) is just X · v in Vα, and β(v) is just X · v in Vβ. So we get
\[
\beta \circ \gamma(v) = X \cdot (\gamma(v)) = \gamma(X \cdot v) = \gamma \circ \alpha(v),
\]
using the definition of an F[X]-module homomorphism.
So we know βγ = γα, and hence α = γ^{-1}βγ.
Conversely, let γ : V → V be a linear isomorphism such that γ^{-1}βγ = α. We now claim that γ : Vα → Vβ is an F[X]-module isomorphism. We just have to check that, for f = a0 + a1X + · · · + anX^n,
\begin{align*}
\gamma(f \cdot v) &= \gamma(f(\alpha)(v)) \\
&= \gamma\big((a_0 + a_1\alpha + \cdots + a_n\alpha^n)(v)\big) \\
&= a_0\gamma(v) + a_1\gamma(\alpha(v)) + a_2\gamma(\alpha^2(v)) + \cdots + a_n\gamma(\alpha^n(v)) \\
&= (a_0 + a_1\beta + a_2\beta^2 + \cdots + a_n\beta^n)(\gamma(v)) \\
&= f \cdot \gamma(v),
\end{align*}
where the fourth equality uses γ(α^k(v)) = β^k(γ(v)), which follows from γα = βγ.
■
So classifying linear maps up to conjugation is the same as classifying modules.
We can reinterpret this a little bit, using our classification of finitely-generated modules.
Corollary 11.5.1. There is a bijection between conjugacy classes of n × n matrices over F and sequences of monic polynomials d1, · · · , dr such that d1 | d2 | · · · | dr and deg(d1 · · · dr) = n.
Example 11.5.1. Let’s classify conjugacy classes in GL2(F), i.e. we need to classify F[X]-modules of the form
\[
\frac{F[X]}{(d_1)} \oplus \frac{F[X]}{(d_2)} \oplus \cdots \oplus \frac{F[X]}{(d_r)}
\]
which are two-dimensional as F-modules. As we must have deg(d1 d2 · · · dr ) = 2, we either
have a quadratic thing or two linear things, i.e. either
(i) r = 1 and deg(d1 ) = 2,
(ii) r = 2 and deg(d1 ) = deg(d2 ) = 1. In this case, since we have d1 | d2 , and they are
both monic linear, we must have d1 = d2 = X − λ for some λ .
In the first case, the module is
\[
\frac{F[X]}{(d_1)},
\]
where, say,
\[
d_1 = X^2 + a_1 X + a_2.
\]
In the second case, we get
\[
\frac{F[X]}{(X - \lambda)} \oplus \frac{F[X]}{(X - \lambda)}.
\]
What does this say? In the first case, we use the basis 1, X, and the linear map has matrix
\[
\begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix}.
\]
In the second case, this is
\[
\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}.
\]
Do these cases overlap? Suppose the two of them are conjugate. Then they have the same
determinant and same trace. So we know
\[
-a_1 = 2\lambda, \qquad a_2 = \lambda^2.
\]
So in fact our polynomial is
\[
X^2 + a_1 X + a_2 = X^2 - 2\lambda X + \lambda^2 = (X - \lambda)^2.
\]
This is just the polynomial of a Jordan block. So the matrix
\[
\begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix}
\]
is conjugate to the Jordan block
\[
\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix},
\]
but this is not conjugate to λI, e.g. by looking at eigenspaces. So these cases are disjoint.
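The eigenspace argument is easy to check concretely. Here is a tiny Python sketch (λ = 2 is an arbitrary choice, not from the notes): for the companion matrix C of (X − 2)² = X² − 4X + 4, the matrix N = C − 2I is nonzero but squares to zero, so the 2-eigenspace of C is one-dimensional and C cannot be conjugate to the scalar matrix 2I.

```python
C = [[0, -4],
     [1,  4]]                     # companion matrix of X^2 - 4X + 4

N = [[C[0][0] - 2, C[0][1]],
     [C[1][0], C[1][1] - 2]]      # N = C - 2I

N2 = [[sum(N[i][k] * N[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

assert N != [[0, 0], [0, 0]]      # C is not the scalar matrix 2I
assert N2 == [[0, 0], [0, 0]]     # but (C - 2I)^2 = 0: one Jordan block
```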
Note that we have done more work than we really needed, since λI is invariant under conjugation.
But the first case is not too satisfactory. We can further classify it as follows. If X^2 + a1X + a2 is reducible, then it is
\[
(X - \lambda)(X - \mu)
\]
for some µ, λ ∈ F. If λ = µ, then the matrix is conjugate to
\[
\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}.
\]
Otherwise, it is conjugate to
\[
\begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}.
\]
In the case where X 2 + a1 X + a2 is irreducible, there is nothing we can do in general.
However, we can look at some special scenarios and see if there is anything we can do.
Example 11.5.2. Consider GL2(Z/3). We want to classify its conjugacy classes. By the general theory, we know everything is conjugate to one of
\[
\begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}, \qquad
\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}, \qquad
\begin{pmatrix} 0 & -a_2 \\ 1 & -a_1 \end{pmatrix},
\]
with
\[
X^2 + a_1 X + a_2
\]
irreducible. So we need to figure out what the irreducibles are.
A reasonable strategy is to guess. Given any quadratic, it is easy to see whether it is irreducible, since we can check whether it has any roots, and there are just three things to try. However, we can be slightly more clever. We first count how many irreducibles we are expecting, and then find that many of them.
There are 9 monic quadratic polynomials in total, since a1 , a2 ∈ Z/3. The reducibles
are (X − λ)^2 or (X − λ)(X − µ) with λ ≠ µ. There are three of each kind. So we have 6
reducible polynomials, and so 3 irreducible ones.
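We can also let a computer do the guessing. A quick Python script (an illustration, not part of the notes): a monic quadratic over Z/3 is irreducible if and only if it has no root in Z/3, so we just test all nine of them.

```python
# Each pair (a1, a2) stands for the monic quadratic X^2 + a1*X + a2 over Z/3.
irreducible = []
for a1 in range(3):
    for a2 in range(3):
        # irreducible over Z/3 iff no element of Z/3 is a root
        if all((x * x + a1 * x + a2) % 3 != 0 for x in range(3)):
            irreducible.append((a1, a2))

# Exactly the three polynomials X^2 + 1, X^2 + X + 2, X^2 + 2X + 2
assert irreducible == [(0, 1), (1, 2), (2, 2)]
```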
We can then check that
\[
X^2 + 1, \qquad X^2 + X + 2, \qquad X^2 + 2X + 2
\]
are the irreducible polynomials. So every matrix in GL2(Z/3) is conjugate to one of
\[
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & -2 \\ 1 & -1 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & -2 \\ 1 & -2 \end{pmatrix}, \qquad
\begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}, \qquad
\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix},
\]
where λ , µ ∈ (Z/3)× (since the matrix has to be invertible). The number of conjugacy
classes of each type is 1, 1, 1, 3 and 2 respectively. So there are 8 conjugacy classes. The first three
classes have elements of order 4, 8, 8 respectively, by trying. We notice that the identity
matrix has order 1, and
\[
\begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}
\]
has order 2 otherwise. Finally, for the last type, we have
\[
\operatorname{ord}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = 3, \qquad
\operatorname{ord}\begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix} = 6.
\]
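The stated orders can be confirmed by direct computation mod 3. A short Python script (an illustration, not part of the notes) that multiplies each representative by itself until it hits the identity:

```python
def mul(A, B, p=3):
    """Multiply two 2x2 matrices over Z/p."""
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

def order(M, p=3):
    """Order of an invertible matrix M in GL2(Z/p)."""
    I = ((1, 0), (0, 1))
    P, n = M, 1
    while P != I:
        P, n = mul(P, M, p), n + 1
    return n

assert order(((0, 2), (1, 0))) == 4   # companion of X^2 + 1 (i.e. [0 -1; 1 0])
assert order(((0, 1), (1, 2))) == 8   # companion of X^2 + X + 2
assert order(((0, 1), (1, 1))) == 8   # companion of X^2 + 2X + 2
assert order(((1, 0), (1, 1))) == 3
assert order(((2, 0), (1, 2))) == 6
```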
Note that we also have
\[
|GL_2(Z/3)| = 48 = 2^4 \cdot 3.
\]
Since there is no element of order 16, the Sylow 2-subgroup of GL2 (Z/3) is not cyclic.
To construct the Sylow 2-subgroup, we might start with an element of order 8, say
\[
B = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}.
\]
To make a subgroup of order 16, a sensible guess would be to take an element of order 2, but that doesn’t work, since B^4 will already give you the element of order 2. Instead, we pick
\[
A = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}.
\]
We notice
\[
A^{-1}BA =
\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}
\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix}
\begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 2 & 2 \\ 2 & 0 \end{pmatrix}
= B^3.
\]
So this is a bit like the dihedral group.
We know that
⟨B⟩ ◁ ⟨A, B⟩.
Also, we know |⟨B⟩| = 8. So if we can show that ⟨B⟩ has index 2 in ⟨A, B⟩, then this is the
Sylow 2-subgroup. By the second isomorphism theorem, something we have never used
in our life, we know
\[
\frac{\langle A, B\rangle}{\langle B\rangle} \cong \frac{\langle A\rangle}{\langle A\rangle \cap \langle B\rangle}.
\]
We can list things out, and then find
\[
\langle A\rangle \cap \langle B\rangle = \left\langle \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \right\rangle \cong C_2.
\]
We also know ⟨A⟩ ≅ C4. So we know
\[
\frac{|\langle A, B\rangle|}{|\langle B\rangle|} = 2.
\]
So |⟨A, B⟩| = 16. So this is the Sylow 2-subgroup. In fact, it is
\[
\langle A, B \mid B^8 = e,\ A^2 = B^4,\ A^{-1}BA = B^3 \rangle.
\]
We call this the semi-dihedral group of order 16, because it is a bit like a dihedral group.
Note that finding this subgroup was purely guesswork. There is no method to know
that A and B are the right choices.
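The guesswork is easy to verify by brute force. A short Python script (an illustration, not part of the notes) closes {A, B} under multiplication mod 3 and checks that we really do get a group of order 16 satisfying the defining relation:

```python
def mul(M, N, p=3):
    """Multiply two 2x2 matrices over Z/p."""
    return tuple(tuple(sum(M[i][k] * N[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

A = ((0, 2), (1, 0))
B = ((0, 1), (1, 2))

# Close {A, B} under multiplication: in a finite group this yields <A, B>.
group = {A, B}
while True:
    new = {mul(X, Y) for X in group for Y in group} - group
    if not new:
        break
    group |= new

# A group of order 16 inside GL2(Z/3), of order 48 = 16 * 3,
# is a Sylow 2-subgroup.
assert len(group) == 16

# The relation A^{-1} B A = B^3 (note A^4 = I, so A^{-1} = A^3).
Ainv = mul(mul(A, A), A)
B3 = mul(mul(B, B), B)
assert mul(mul(Ainv, B), A) == B3
```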