
J. Broida
UCSD Fall 2009
Phys 130B
QM II
Supplementary Notes on Mathematics
Part I: Linear Algebra
1  Linear Transformations
Let me very briefly review some basic properties of linear transformations and their
matrix representations that will be useful to us in this course. Most of this should
be familiar to you, but I want to make sure you know my notation and understand
how I think about linear transformations. I assume that you are already familiar
with the three elementary row operations utilized in the Gaussian reduction of
matrices:
(α) Interchange two rows.
(β) Multiply one row by a nonzero scalar.
(γ) Add a scalar multiple of one row to another.
I also assume that you know how to find the inverse of a matrix, and you know
what a vector space is.
If V is an n-dimensional vector space over a field F (which you can think of as
either R or C), then a linear transformation on V is a mapping T : V → V with
the property that for all x, y ∈ V and a ∈ F we have
T (x + y) = T (x) + T (y)
and
T (ax) = aT (x) .
We will frequently write T x rather than T (x). That T (0) = 0 follows either by
letting a = 0 or noting that T (x) = T (x + 0) = T (x) + T (0).
By way of notation, we will write T ∈ L(V ) if T is a linear transformation from
V to V . A more general notation very often used is to write T ∈ L(U, V ) to denote
a linear transformation T : U → V from a space U to a space V .
A set of vectors {v1, . . . , vn} is said to be linearly independent if Σ_{i=1}^n ai vi =
0 implies that ai = 0 for all i = 1, . . . , n. The set {vi} is also said to span V if
every vector in V can be written as a linear combination of the vi . A basis for V
is a set of linearly independent vectors that also spans V . The dimension of V is
the (unique) number of vectors in any basis.
A simple but extremely useful fact is that every vector x ∈ V has a unique expansion in terms of any given basis {ei}. Indeed, if we have two such expansions x = Σ_{i=1}^n xi ei and x = Σ_{i=1}^n x′i ei, then Σ_{i=1}^n (xi − x′i) ei = 0. But the ei are linearly independent by definition so that xi − x′i = 0 for each i, and hence we must have xi = x′i and the expansion is unique as claimed. The scalars xi are called the components of x.
Now suppose we are given a set {v1 , v2 , . . . , vr } of linearly independent vectors in a finite-dimensional space V with dim V = n. Since V always has a basis
{e1 , e2 , . . . , en }, the set of n + r vectors {v1 , . . . , vr , e1 , . . . , en } will necessarily span
V . We know that v1 , . . . , vr are linearly independent, so check to see if e1 can be
written as a linear combination of the vi ’s. If it can, then delete it from the set. If
it can’t, then add it to the vi ’s. Now go to e2 and check to see if it can be written
as a linear combination of {v1 , . . . , vr , e1 }. If it can, delete it, and if it can’t, then
add it to the set. Continuing in this manner, we will eventually arrive at a subset
of {v1 , . . . , vr , e1 , . . . , en } that is linearly independent and spans V . In other words,
we have extended the set {v1 , . . . , vr } to a complete basis for V . The fact that this
can be done (at least in principle) is an extremely useful tool in many proofs.
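To make the extension procedure concrete, here is a small numerical sketch in NumPy (not part of the original notes): the rank test np.linalg.matrix_rank stands in for the question of whether a candidate vector is a linear combination of the vectors already kept, and the starting vectors are made up for the example.

    import numpy as np

    def extend_to_basis(v_list, e_list):
        """Extend a linearly independent list v_list to a basis of F^n by
        sweeping through e_list and keeping only those vectors that are not
        linear combinations of the vectors already kept."""
        basis = list(v_list)
        for e in e_list:
            # e is independent of the current set iff appending it raises the rank
            if np.linalg.matrix_rank(np.column_stack(basis + [e])) > len(basis):
                basis.append(e)
        return basis

    # two independent vectors in R^3, extended by the standard basis e1, e2, e3
    v1, v2 = np.array([1., 0., 1.]), np.array([0., 1., 1.])
    std = [np.array(row) for row in np.eye(3)]
    B = extend_to_basis([v1, v2], std)
    print(len(B), np.linalg.matrix_rank(np.column_stack(B)))   # 3 3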
A very important characterization of linear transformations that we may find
useful is the following. Define the set
Ker T = {x ∈ V : T x = 0} .
The set Ker T ⊂ V is called the kernel of T . In fact, it is not hard to show
that Ker T is actually a subspace of V . Recall that a mapping T is said to be
one-to-one if x ≠ y implies T x ≠ T y. The equivalent contrapositive statement
of this is that T is one-to-one if T x = T y implies x = y. Let T be a linear
transformation with Ker T = {0}, and suppose T x = T y. Then by linearity we
have T x − T y = T (x − y) = 0. But Ker T = {0} so we conclude that x − y = 0
or x = y. In other words, the fact that Ker T = {0} means that T must be one-to-one. Conversely, if T is one-to-one, then the fact that we always have T (0) = 0
means that Ker T = {0}. Thus a linear transformation is one-to-one if and only
if Ker T = {0}. A linear transformation T with Ker T = {0} is said to be a
nonsingular transformation.
If V has a basis {ei }, then any x ∈ V has a unique expansion which we will write
as x = xi ei . Note that here I am using the Einstein summation convention
where repeated indices are summed over. (The range of summation is always clear
from the context.) Thus xi ei is a shorthand notation for Σ_{i=1}^n xi ei. Since we will
almost exclusively work with Cartesian coordinates, there is no difference between
superscripts and subscripts, and I will freely raise or lower indices as needed for
notational clarity. In general, the summation convention should properly be applied
to an upper and a lower index, but we will sometimes ignore this, particularly when
it comes to angular momentum operators. Note also that summation indices are
dummy indices. By this we mean that the particular letter used to sum over is
irrelevant. In other words, xi ei is the same as xk ek , and we will frequently relabel
indices in many of our calculations.
In any case, since T is a linear map, we see that T (x) = T (xi ei ) = xi T (ei ) and
hence a linear transformation is fully determined by its values on a basis. Since T
maps V into V , it follows that T ei is just another vector in V , and hence we can
write
    T ei = ej aj i    (1)
where the scalar coefficients aj i define the matrix representation of T with respect to the basis {ei }. We sometimes write [T ] = (ai j ) to denote the fact that the
n×n matrix A = (ai j ) is the matrix representation of T . (And if we need to be clear
just what basis the matrix representation is with respect to, we will write [T ]e .) Be
sure to note that it is the row index that is summed over in this equation. This is
necessary so that the composition ST of two linear transformations S and T has a
matrix representation [ST ] = AB that is the product of the matrix representations
[S] = A of S and [T ] = B of T taken in the same order.
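As a quick numerical sketch of this convention (the two maps on R² below are made up, only the rule "the image of the ith basis vector is the ith column" comes from the text):

    import numpy as np

    # two made-up linear maps on R^2, given only by their action on vectors
    S = lambda x: np.array([2*x[0] + x[1], x[1]])
    T = lambda x: np.array([x[0] - x[1], 3*x[0]])

    e = np.eye(2)
    # the matrix of a map has the image of the ith basis vector as its ith column
    matS  = np.column_stack([S(e[:, i]) for i in range(2)])
    matT  = np.column_stack([T(e[:, i]) for i in range(2)])
    matST = np.column_stack([S(T(e[:, i])) for i in range(2)])

    print(np.allclose(matST, matS @ matT))   # True: [S∘T] = [S][T]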
We will denote the set of all n × n matrices over the field F by Mn (F ), and
the set of all m × n matrices over F by Mm×n (F ). Furthermore, if A ∈ Mm×n (F ),
we will label the rows of A by subscripts such as Ai , and the columns of A by
superscripts such as Aj . It is important to realize that each row vector Ai is just
a vector in F n , and each column vector Aj is just a vector in F m . Therefore the
rows of A span a subspace of F n called the row space and denoted by row(A).
Similarly, the columns span a subspace of F m called the column space col(A).
The dimension of row(A) is called the row rank rr(A) of A, and dim col(A) is
called the column rank cr(A).
What happens if we perform elementary row operations on A? Since all we do
is take linear combinations of the rows, it should be clear that the row space won’t
change, and hence rr(A) is also unchanged. However, the components of the column
vectors get mixed up, so it isn’t at all clear just what happens to either col(A) or
cr(A). In fact, while col(A) will change, it turns out that cr(A) remains unchanged.
Probably the easiest way to see this is to consider those columns of A that are
linearly dependent; with no loss of generality we can call them A^1, . . . , A^r.
Then their linear dependence means there are scalars x_1, . . . , x_r, not all zero, such that Σ_{i=1}^r A^i x_i = 0. In full form this is

    \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} x_1 + \cdots + \begin{pmatrix} a_{1r} \\ \vdots \\ a_{mr} \end{pmatrix} x_r = 0 .
But this is a system of m linear equations in r unknowns, and we have seen that the
solution set doesn't change under row equivalence. In other words, Σ_{i=1}^r Ã^i x_i = 0
for the same coefficients x_i. Then the same r columns of Ã are linearly dependent,
and hence both A and Ã have the same (n − r) independent columns, i.e., cr(A) =
cr(Ã). (There can't be more dependent columns of Ã than A because we can apply
the row operations in reverse to go from Ã to A. If Ã had more dependent columns,
then when we got back to A we would have more than we started with.)
Furthermore, it is also true that the dimension of the row and column spaces of
a matrix are the same, and this is in fact what is meant by the rank of a matrix.
To see this, think about the reduced row echelon form of a matrix. This form has
a 1 for the first entry of every nonzero row, and every other entry in the column
containing that 1 is 0. For example, the following matrix is in reduced row echelon
form:

    \begin{pmatrix} 1 & 0 & 5 & 0 & 2 \\ 0 & 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} .
Note that every column either contains a single 1 (with all other entries 0), or is a linear combination of the columns that contain such a single 1. In addition, the number of columns
containing a single 1 is the same as the number of nonzero rows. Therefore the
row rank and column rank are the same, and this common number is called the
rank of a matrix.
If A ∈ Mn (F ) is an n × n matrix that is row equivalent to the identity matrix
I, then A has rank n, and we say that A is nonsingular. If rank A < n then A is
said to be singular. It can be shown that a matrix is invertible (i.e., A−1 exists) if
and only if it is nonsingular.
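These statements are easy to spot-check numerically. The NumPy sketch below uses the reduced row echelon example above and a made-up 2 × 2 matrix (matrix_rank is NumPy's numerical rank):

    import numpy as np

    A = np.array([[1., 0., 5., 0., 2.],
                  [0., 1., 2., 0., 4.],
                  [0., 0., 0., 1., 7.],
                  [0., 0., 0., 0., 0.]])

    # row rank equals column rank: rank A = rank A^T
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 3 3

    # a square matrix is invertible exactly when its rank equals its size
    B = np.array([[1., 2.],
                  [3., 2.]])
    print(np.linalg.matrix_rank(B) == B.shape[0])                 # True, so B is nonsingular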
Now let A ∈ Mm×n(F) and B ∈ Mn×r(F) be such that the product AB is
defined. Since the (i, j)th entry of AB is given by (AB)ij = Σ_k aik bkj, we see that
the ith row of AB is given by a linear combination of the rows of B:

    (AB)_i = \left( \sum_k a_{ik} b_{k1}, \ldots, \sum_k a_{ik} b_{kr} \right) = \sum_k a_{ik} B_k .    (2a)

This shows that the row space of AB is a subspace of the row space of B. Another
way to write this is to observe that

    (AB)_i = \left( \sum_k a_{ik} b_{k1}, \ldots, \sum_k a_{ik} b_{kr} \right) = (a_{i1}, \ldots, a_{in}) \begin{pmatrix} b_{11} & \cdots & b_{1r} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nr} \end{pmatrix} = A_i B .    (2b)

Similarly, for the columns of a product we find that the jth column of AB is a
linear combination of the columns of A:

    (AB)^j = \begin{pmatrix} \sum_k a_{1k} b_{kj} \\ \vdots \\ \sum_k a_{mk} b_{kj} \end{pmatrix} = \sum_{k=1}^n \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix} b_{kj} = \sum_{k=1}^n A^k b_{kj}    (3a)

and therefore the column space of AB is a subspace of the column space of A. We
also have the result

    (AB)^j = \begin{pmatrix} \sum_k a_{1k} b_{kj} \\ \vdots \\ \sum_k a_{mk} b_{kj} \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{pmatrix} = A B^j .    (3b)
These formulas will be quite useful to us in a number of theorems and calculations.
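A small NumPy sketch of equations (2b) and (3b), using made-up random matrices (the row and column indices chosen are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(2, 3)).astype(float)
    B = rng.integers(-3, 4, size=(3, 4)).astype(float)
    AB = A @ B

    i, j = 1, 2
    # equation (2b): the ith row of AB is A_i B, a combination of the rows of B
    print(np.allclose(AB[i, :], A[i, :] @ B))    # True
    # equation (3b): the jth column of AB is A B^j, a combination of the columns of A
    print(np.allclose(AB[:, j], A @ B[:, j]))    # True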
Returning to linear transformations, it is extremely important to realize that T
takes the ith basis vector ei into the ith column of A = [T ]. This is easy to see
because with respect to the basis {ei } itself, the vectors ei have components simply
given by

    e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \cdots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} .

Then

    T e_i = e_j a^j_i = e_1 a^1_i + e_2 a^2_i + \cdots + e_n a^n_i
          = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} a^1_i + \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} a^2_i + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} a^n_i = \begin{pmatrix} a^1_i \\ a^2_i \\ \vdots \\ a^n_i \end{pmatrix}

which is just the ith column of (a^j_i).
Example 1. For example, let V have the basis {e1 , e2 , e3 }, and let T be the linear
transformation defined by
    T(e1) = 3e1 + e3
    T(e2) = e1 − 2e2 − e3
    T(e3) = e2 + e3

Then the representation of T (relative to this basis) is

    [T]_e = \begin{pmatrix} 3 & 1 & 0 \\ 0 & -2 & 1 \\ 1 & -1 & 1 \end{pmatrix} .
Now suppose we have another basis {ēi } for V . Since each basis vector ēi is just
some vector in V , it can be expressed in terms of the original basis {ei }. We can
think of this as defining another linear transformation P whose representation (pi j )
is called the transition matrix and is defined by
ēi = P ei = ej pj i .
(4)
Here we are being somewhat sloppy in using the same symbol P to denote both
the linear transformation P and its matrix representation P = (pi j ). Note that we
could equally well write each ei in terms of {ēj }, and hence the matrix P must be
invertible.
Now realize that a vector x ∈ V exists independently of any particular basis for
V . However, its components most definitely depend on the basis, and hence using
(4) we have
x = xj ej = x̄i ēi = x̄i ej pj i = (pj i x̄i )ej .
Equating coefficients of each ej (this is an application of the uniqueness of the
expansion in terms of a given basis) we conclude that xj = pj i x̄i or, equivalently,
x̄i = (p−1 )i j xj .
(5)
Equations (4) and (5) describe the relationship between vector components with
respect to two distinct bases. What about the matrix representation of a linear
transformation T with respect to bases {ei } and {ēi }? By definition we can write
both

    T ei = ej aj i    (6a)

and

    T ēi = ēj āj i .    (6b)
Using (4) in the right side of (6b) we have
T ēi = ek pk j āj i
On the other hand, we can use (4) in the left side of (6b) and then use (6a) to write
T ēi = T (ej pj i ) = pj i T ej = pj i ek ak j = ek ak j pj i
where in the last step we wrote the matrix product in the correct order. Now
equate both forms of T ēi and use the linear independence of the ek to conclude that
pk j āj i = ak j pj i which in matrix notation is just P Ā = AP . Since P is invertible
this can be written in the form that should be familiar to you:
Ā = P −1 AP .
(7)
A relationship of this form is called a similarity transformation. Be sure to note
that P goes from the basis {ei } to the basis {ēi }.
Conversely, suppose T is represented by A in the basis {ei}, and let Ā = P^{-1}AP.
Defining a new basis {ēi} by ēi = P ei = Σ_j ej p^j_i, it is straightforward to show that
the matrix representation of T relative to the basis {ēi} is just Ā.
Example 2. As an example, consider the linear transformation T : R3 → R3 (i.e.,
T ∈ L(R3)) defined by

    T(x, y, z) = \begin{pmatrix} 9x + y \\ 9y \\ 7z \end{pmatrix} .
Let {ei} be the standard basis for R3, and let {ēi} be the basis defined by

    ē1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad ē2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad ē3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} .
Let us first find the representation Ā = [T]ē directly from the definition T(ēi) = Σ_{j=1}^3 ēj ā^j_i = ēj ā^j_i. We will go through two ways of doing this to help clarify the various concepts involved.
We have T (ē1 ) = T (1, 0, 1) = (9, 0, 7). Then we write (9, 0, 7) = a(1, 0, 1) +
b(1, 0, −1) + c(0, 1, 1) and solve for a, b, c to obtain T (ē1 ) = 8ē1 + ē2 . Similarly, we
find T (ē2 ) = T (1, 0, −1) = (9, 0, −7) = ē1 + 8ē2 and T (ē3 ) = T (0, 1, 1) = (1, 9, 7) =
(−1/2)ē1 + (3/2)ē2 + 9ē3 . This shows that the representation [T]ē is given by

    Ā = [T]_ē = \begin{pmatrix} 8 & 1 & -1/2 \\ 1 & 8 & 3/2 \\ 0 & 0 & 9 \end{pmatrix} .
Another way is to use the fact that everything is simple with respect to the
standard basis for R3 . We see that T (e1 ) = T (1, 0, 0) = (9, 0, 0) = 9e1 , T (e2 ) =
T (0, 1, 0) = (1, 9, 0) = e1 + 9e2 and T (e3 ) = T (0, 0, 1) = (0, 0, 7) = 7e3 . Note that
this shows

    A = [T]_e = \begin{pmatrix} 9 & 1 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 7 \end{pmatrix}
which we will need below when we use the transition matrix to find Ā.
It is easy to see that ē1 = e1 + e3 , ē2 = e1 − e3 and ē3 = e2 + e3 , so inverting
these equations we have e1 = (1/2)(ē1 + ē2 ), e3 = (1/2)(ē1 − ē2 ) and e2 = ē3 − e3 =
−(1/2)(ē1 − ē2 ) + ē3 . Then using the linearity of T we have
T (ē1 ) = T (e1 + e3 ) = T (e1 ) + T (e3 ) = 9e1 + 7e3
= (9/2)(ē1 + ē2 ) + (7/2)(ē1 − ē2 )
= 8ē1 + ē2
T (ē2 ) = T (e1 − e3 ) = T (e1 ) − T (e3 ) = 9e1 − 7e3
= (9/2)(ē1 + ē2 ) − (7/2)(ē1 − ē2 )
= ē1 + 8ē2
T (ē3 ) = T (e2 + e3 ) = T (e2 ) + T (e3 ) = e1 + 9e2 + 7e3
= (1/2)(ē1 + ē2 ) − (9/2)(ē1 − ē2 ) + 9ē3 + (7/2)(ē1 − ē2 )
= −(1/2)ē1 + (3/2)ē2 + 9ē3
and, as expected, this gives the same result as we had above for [T ]ē .
Now we will use the transition matrix P to find Ā = [T]ē. The matrix P is defined by ēi = P ei = Σ_{j=1}^3 ej p^j_i = ej p^j_i, so that ēi is just the ith column of P, and we immediately have

    P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix} .
There are a number of ways to find P −1 which you should already be familiar with,
and I won’t bother to explain them. We will simply use the fact that the inverse
matrix is defined by ei = P −1 ēi and use the expressions we found above for each ei
in terms of the ēi ’s. This last approach is the easiest for us and we can just write
down the result

    P^{-1} = \frac{1}{2} \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 0 & 2 & 0 \end{pmatrix} .
We now see that

    [T]_ē = P^{-1} [T]_e P = \frac{1}{2} \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} 9 & 1 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 7 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 8 & 1 & -1/2 \\ 1 & 8 & 3/2 \\ 0 & 0 & 9 \end{pmatrix}
which agrees with our previous approaches.
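The whole similarity transformation is also a one-line numerical check; a minimal NumPy sketch using the matrices of Example 2:

    import numpy as np

    A = np.array([[9., 1., 0.],
                  [0., 9., 0.],
                  [0., 0., 7.]])       # [T]_e from Example 2

    P = np.array([[1., 1., 0.],
                  [0., 0., 1.],
                  [1., -1., 1.]])      # columns are the e-components of ē1, ē2, ē3

    Abar = np.linalg.inv(P) @ A @ P    # equation (7)
    print(Abar)                        # [[8, 1, -0.5], [1, 8, 1.5], [0, 0, 9]]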
Also realize that a vector X = (x, y, z) ∈ R3 has components x, y, z only with
respect to the standard basis {ei} for R3. In other words

    X = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = x e_1 + y e_2 + z e_3 .
But with respect to the basis {ēi} we have

    \begin{pmatrix} x̄ \\ ȳ \\ z̄ \end{pmatrix} = P^{-1} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & -1 \\ 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{2} \begin{pmatrix} x - y + z \\ x + y - z \\ 2y \end{pmatrix}

so that

    X = (1/2)(x − y + z) ē1 + (1/2)(x + y − z) ē2 + y ē3 = x̄ ē1 + ȳ ē2 + z̄ ē3 .
As we will see later in the course, the Clebsch-Gordan coefficients that you may
have seen are nothing more than the entries in the (unitary) transition matrix that
takes you between the |j1 j2 m1 m2⟩ basis and the |j1 j2 j m⟩ basis in the vector space
of two-particle angular momentum states.
If T ∈ L(V ) is a linear transformation, then the image of T is the set
Im T = {T x : x ∈ V } .
It is also easy to see that Im T is a subspace of V . Furthermore, we define the rank
of T to be the number
rank T = dim(Im T ) .
By picking a basis for Ker T and extending it to a basis for all of V , it is not hard
to show that the following result holds, often called the rank theorem:
dim(Im T ) + dim(Ker T ) = dim V .
(8)
It can also be shown that the rank of a linear transformation T is equal to
the rank of any matrix representation of T (which is independent of similarity
transformations). This is a consequence of the fact that T ei is the ith column of
the matrix representation of T , and the set of all such vectors T ei spans Im T .
Then rank T is the number of linearly independent vectors T ei , which is also the
dimension of the column space of [T ]. But the dimension of the row and column
spaces of a matrix are the same, and this is what is meant by the rank of a matrix.
Thus rank T = rank[T ].
Note that if T is one-to-one, then Ker T = {0} so that dim Ker T = 0. It then
follows from (8) that rank[T ] = rank T = dim(Im T ) = dim V = n so that [T ] is
invertible.
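Equation (8) is easy to check numerically. In the sketch below (NumPy only, with a made-up singular matrix), the rank is the number of nonzero singular values and dim Ker T is the number of numerically zero ones; the tolerance 1e-12 is an arbitrary choice.

    import numpy as np

    # a made-up singular 4x4 matrix: its last row is the sum of the first two
    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 0.],
                  [2., 0., 1., 3.],
                  [1., 3., 1., 1.]])

    rank = np.linalg.matrix_rank(A)
    sing_vals = np.linalg.svd(A, compute_uv=False)
    nullity = int(np.sum(sing_vals < 1e-12))             # dim Ker A
    print(rank, nullity, rank + nullity == A.shape[1])   # 3 1 True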
Another result we will need is the following.
Theorem 1. If A and B are any matrices for which the product AB is defined,
then the row space of AB is a subspace of the row space of B, and the column space
of AB is a subspace of the column space of A.
Proof. As we saw above, using (AB)i = Σ_k aik Bk it follows that the ith row of
AB is in the space spanned by the rows of B, and hence the row space of AB is a
subspace of the row space of B.
As to the column space, this was also shown above. Alternatively, note that the
column space of AB is just the row space of (AB)T = B T AT , which is a subspace
of the row space of AT by the first part of the theorem. But the row space of AT
is just the column space of A.
Corollary. rank(AB) ≤ min{rank(A), rank(B)}.
Proof. Let row(A) be the row space of A, and let col(A) be the column space of A.
Then
rank(AB) = dim(row(AB)) ≤ dim(row(B)) = rank(B)
while
rank(AB) = dim(col(AB)) ≤ dim(col(A)) = rank(A).
The last topic I want to cover in this section is to briefly explain the mathematics
of two-particle states. While this isn’t really necessary for this course and we won’t
deal with it in detail, it should help you better understand what is going on when
we add angular momenta. In addition, this material is necessary to understand
direct product representations of groups, which is quite important in its own right.
So, given two vector spaces V and V ′ , we may define a bilinear map V × V ′ →
V ⊗ V ′ that takes ordered pairs (v, v ′ ) ∈ V × V ′ and gives a new vector denoted
by v ⊗ v′. Since this map is bilinear by definition (meaning that it is linear in each
variable separately), if we have the linear combinations v = Σ_i x_i v_i and v′ = Σ_j y_j v′_j,
then v ⊗ v′ = Σ_{i,j} x_i y_j (v_i ⊗ v′_j). In particular, if V has basis {ei} and V ′ has basis
{e′j }, then {ei ⊗ e′j } is a basis for V ⊗ V ′ which is then of dimension (dim V )(dim V ′ )
and called the direct (or tensor) product of V and V ′ .
If we are given two operators A ∈ L(V ) and B ∈ L(V ′ ), the direct product of
A and B is the operator A ⊗ B defined on V ⊗ V ′ by
(A ⊗ B)(v ⊗ v ′ ) := A(v) ⊗ B(v ′ ) .
We know that the matrix representation of an operator is defined by its values on a
basis, and the ith basis vector goes to the ith column of the matrix representation.
In the case of the direct product, we choose an ordered basis by taking all of the
(dim V )(dim V ′ ) = mn elements ei ⊗ e′j in the obvious order
{e1 ⊗ e′1 , . . . , e1 ⊗ e′n , e2 ⊗ e′1 , . . . , e2 ⊗ e′n , . . . , em ⊗ e′1 , . . . , em ⊗ e′n } .
Now our matrix elements are labeled by double subscripts because each basis vector
is labeled by two subscripts.
The (ij)th column of C = A ⊗ B is given in the usual way by acting on ei ⊗ e′j
with A ⊗ B:
(A ⊗ B)(ei ⊗ e′j ) = Aei ⊗ Be′j = ek ak i ⊗ e′l bl j = (ek ⊗ e′l )ak i bl j
= (ek ⊗ e′l )(A ⊗ B)kl ij .
For example, the (1, 1)th column of C is the vector (A⊗B)(e1 ⊗e′1 ) = ak 1 bl 1 (ek ⊗e′l )
given by
(a1 1 b1 1 , . . . , a1 1 bn 1 , a2 1 b1 1 , . . . , a2 1 bn 1 , . . . , am 1 b1 1 , . . . , am 1 bn 1 )
and in general, the (i, j)th column is given by
(a1 i b1 j , . . . , a1 i bn j , a2 i b1 j , . . . , a2 i bn j , . . . , am i b1 j , . . . , am i bn j ) .
If we write this as the column vector it is,

    \begin{pmatrix} a^1_i b^1_j \\ \vdots \\ a^1_i b^n_j \\ \vdots \\ a^m_i b^1_j \\ \vdots \\ a^m_i b^n_j \end{pmatrix}

then it is not hard to see this shows that the matrix C has the block matrix form

    C = \begin{pmatrix} a^1_1 B & a^1_2 B & \cdots & a^1_m B \\ \vdots & \vdots & & \vdots \\ a^m_1 B & a^m_2 B & \cdots & a^m_m B \end{pmatrix} .
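This block structure is what NumPy's np.kron builds (with the same lexicographic ordering of the product basis), so the construction can be checked directly; the operators and vectors below are made up for the illustration.

    import numpy as np

    # made-up operators on V (dim 2) and V' (dim 3)
    A = np.array([[1., 2.],
                  [3., 4.]])
    B = np.array([[0., 1., 0.],
                  [1., 0., 2.],
                  [0., 0., 1.]])

    C = np.kron(A, B)       # block form [[a11*B, a12*B], [a21*B, a22*B]]
    print(C.shape)          # (6, 6)

    # (A ⊗ B)(v ⊗ v') = Av ⊗ Bv' in the same ordered basis
    v, vp = np.array([1., -1.]), np.array([2., 0., 1.])
    print(np.allclose(C @ np.kron(v, vp), np.kron(A @ v, B @ vp)))   # True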
As I said, we will see an application of this formalism when we treat the addition
of angular momentum.
2  The Levi-Civita Symbol and the Vector Cross Product
In order to ease into the notation we will use, we begin with an elementary treatment
of the vector cross product. This will give us a very useful computational tool that
is of importance in and of itself. While you are probably already familiar with the
cross product, we will still go through its development from scratch just for the sake
of completeness.
To begin with, consider two vectors a and b in R3 (with Cartesian coordinates).
There are two ways to define their vector product (or cross product) a × b.
The first way is to define a × b as that vector with norm given by
    ‖a × b‖ = ‖a‖ ‖b‖ sin θ
where θ is the angle between a and b, and whose direction is such that the triple
(a, b, a × b) has the same “orientation” as the standard basis vectors (x̂, ŷ, ẑ). This
is commonly referred to as “the right hand rule.” In other words, if you rotate a
into b thru the smallest angle between them with your right hand as if you were
using a screwdriver, then the screwdriver points in the direction of a × b. Note that
by definition, a × b is perpendicular to the plane spanned by a and b.
The second way to define a × b is in terms of its vector components. I will start
from this definition and show that it is in fact equivalent to the first definition. So,
we define a × b to be the vector c with components
cx = (a × b)x = ay bz − az by
cy = (a × b)y = az bx − ax bz
cz = (a × b)z = ax by − ay bx
Before proceeding, note that instead of labeling components by (x, y, z) it will
be very convenient for us to use (x1 , x2 , x3 ). This is standard practice, and it will
greatly facilitate many equations throughout the remainder of these notes. Using
this notation, the above equations are written
c1 = (a × b)1 = a2 b3 − a3 b2
c2 = (a × b)2 = a3 b1 − a1 b3
c3 = (a × b)3 = a1 b2 − a2 b1
We now see that each equation can be obtained from the previous by cyclically
permuting the subscripts 1 → 2 → 3 → 1.
Using these equations, it is easy to multiply out components and verify that
a · c = a1 c1 + a2 c2 + a3 c3 = 0, and similarly b · c = 0. This shows that a × b is
perpendicular to both a and b, in agreement with our first definition.
Next, there are two ways to show that ka × bk is also the same as in the first
definition. The easy way is to note that any two vectors a and b in R3 (both based
at the same origin) define a plane. So we choose our coordinate axes so that a lies
along the x1 -axis as shown below.
[Figure: a lies along the x1-axis and b lies in the x1 x2-plane, making angle θ with a; the height of b above the x1-axis is h.]
Then a and b have components a = (a1 , 0, 0) and b = (b1 , b2 , 0) so that
(a × b)1 = a2 b3 − a3 b2 = 0
(a × b)2 = a3 b1 − a1 b3 = 0
(a × b)3 = a1 b2 − a2 b1 = a1 b2
and therefore c = a × b = (0, 0, a1 b2). But a1 = ‖a‖ and b2 = h = ‖b‖ sin θ so that
‖c‖² = Σ_{i=1}^3 c_i² = (a1 b2)² = (‖a‖ ‖b‖ sin θ)² and therefore

    ‖a × b‖ = ‖a‖ ‖b‖ sin θ .
Since both the length of a vector and the angle between two vectors are independent
of the orientation of the coordinate axes, this result holds for arbitrary a and b.
Therefore ‖a × b‖ is the same as in our first definition.
The second way to see this is with a very unenlightening brute force calculation:
    ‖a × b‖² = (a × b) · (a × b) = (a × b)1² + (a × b)2² + (a × b)3²
    = (a2 b3 − a3 b2)² + (a3 b1 − a1 b3)² + (a1 b2 − a2 b1)²
    = a2² b3² + a3² b2² + a3² b1² + a1² b3² + a1² b2² + a2² b1² − 2(a2 b3 a3 b2 + a3 b1 a1 b3 + a1 b2 a2 b1)
    = (a2² + a3²) b1² + (a1² + a3²) b2² + (a1² + a2²) b3² − 2(a2 b2 a3 b3 + a1 b1 a3 b3 + a1 b1 a2 b2)
    = (add and subtract terms)
    = (a1² + a2² + a3²) b1² + (a1² + a2² + a3²) b2² + (a1² + a2² + a3²) b3²
      − (a1² b1² + a2² b2² + a3² b3²) − 2(a2 b2 a3 b3 + a1 b1 a3 b3 + a1 b1 a2 b2)
    = (a1² + a2² + a3²)(b1² + b2² + b3²) − (a1 b1 + a2 b2 + a3 b3)²
    = ‖a‖² ‖b‖² − (a · b)² = ‖a‖² ‖b‖² − ‖a‖² ‖b‖² cos²θ
    = ‖a‖² ‖b‖² (1 − cos²θ) = ‖a‖² ‖b‖² sin²θ

so again we have ‖a × b‖ = ‖a‖ ‖b‖ sin θ.
To see the geometrical meaning of the vector product, first take a look at the
parallelogram with sides defined by a and b.
[Figure: parallelogram with sides a and b, angle θ between them, and height h = b sin θ.]
In the figure, the height h is equal to b sin θ (where b = kbk and similarly for a),
and the area of the parallelogram is equal to the area of the two triangles plus the
area of the rectangle:
    area = 2 · (1/2)(b cos θ)h + (a − b cos θ)h = ah = ab sin θ = ‖a × b‖ .
Now suppose we have a third vector c that is not coplanar with a and b, and
consider the parallelepiped defined by the three vectors as shown below.
[Figure: parallelepiped with edges a, b, c; the vector a × b is normal to the base spanned by a and b, and θ is the angle between a × b and c.]
The volume of this parallelepiped is given by the area of the base times the height,
and hence is equal to
    Vol(a, b, c) = ‖a × b‖ ‖c‖ cos θ = (a × b) · c .
So we see that the so-called scalar triple product (a× b)·c represents the volume
spanned by the three vectors.
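All three properties (perpendicularity, the norm formula, and the volume interpretation) are easy to verify numerically; a short NumPy sketch with made-up vectors:

    import numpy as np

    a = np.array([2., 0., 0.])
    b = np.array([1., 3., 0.])
    c = np.array([0., 1., 2.])

    axb = np.cross(a, b)
    print(np.dot(a, axb), np.dot(b, axb))     # 0.0 0.0 : a x b is perpendicular to a and b

    cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sin_t = np.sqrt(1.0 - cos_t**2)
    print(np.isclose(np.linalg.norm(axb),
                     np.linalg.norm(a) * np.linalg.norm(b) * sin_t))   # True

    print(np.dot(axb, c))    # 12.0 : volume of the parallelepiped spanned by a, b, c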
Most of this discussion so far should be familiar to most of you. Now we turn
to a formalism that is probably not so familiar. Our formulation of determinants
will use a generalization of the permutation symbol that we now introduce. Just
keep in mind that the long term benefits of what we are about to do far outweigh
the effort required to learn it.
While the concept of permutation should be fairly intuitive, let us make some
rather informal definitions. If we have a set of n numbers {a1 , a2 , . . . , an }, then
these n numbers can be arranged into n! ordered collections (ai1 , ai2 , . . . , ain ) where
(i1 , i2 , . . . , in ) is just the set (1, 2, . . . , n) arranged in any one of the n! possible
orderings. Such an arrangement is called a permutation of the set {a1 , a2 , . . . , an }.
If we have a set S of n numbers, then we denote the set of all permutations of these
numbers by Sn . This is called the permutation group of order n. Because there
are n! rearrangements (i.e., distinct orderings) of a set of n numbers (this can really
be any n objects), the permutation group of order n consists of n! elements. It is
conventional to denote an element of Sn (i.e., a particular permutation) by Greek
letters such as σ, τ, θ etc.
Now, it is fairly obvious intuitively that any permutation can be achieved by a
suitable number of interchanges of pairs of elements. Each interchange of a pair
is called a transposition. (The formal proof of this assertion is, however, more
difficult than you might think.) For example, let the ordered set (1, 2, 3, 4) be
permuted to the ordered set (4, 2, 1, 3). This can be accomplished as a sequence of
transpositions as follows:
    (1, 2, 3, 4) → (4, 2, 3, 1) → (4, 2, 1, 3) ,

where the first arrow is the transposition 1 ↔ 4 and the second is 1 ↔ 3.
It is also easy enough to find a different sequence that yields the same final result,
and hence the sequence of transpositions resulting in a given permutation is by
no means unique. However, it is a fact (also not easy to prove formally) that
whatever sequence you choose, the number of transpositions is either always an
even number or always an odd number. In particular, if a permutation σ consists
of m transpositions, then we define the sign of the permutation by
sgn σ = (−1)m .
Because of this, it makes sense to talk about a permutation as being either even
(if m is even) or odd (if m is odd).
Now that we have a feeling for what it means to talk about an even or an odd
permutation, let us define the Levi-Civita symbol εijk (also frequently referred
to as the permutation symbol) by

    εijk =  +1 if (i, j, k) is an even permutation of (1, 2, 3)
            −1 if (i, j, k) is an odd permutation of (1, 2, 3)
             0 if (i, j, k) is not a permutation of (1, 2, 3) .
In other words,
ε123 = −ε132 = ε312 = −ε321 = ε231 = −ε213 = 1
and εijk = 0 if there are any repeated indices. We also say that εijk is antisymmetric in all three indices, meaning that it changes sign upon interchanging any
two indices. For a given order (i, j, k) the resulting number εijk is also called the
sign of the permutation.
Before delving further into some of the properties of the Levi-Civita symbol,
let's take a brief look at how it is used. Given two vectors a and b, we can let i = 1
and form the double sum Σ_{j,k=1}^3 ε1jk aj bk. Since εijk = 0 if any two indices are
repeated, the only possible values for j and k are 2 and 3. Then

    Σ_{j,k=1}^3 ε1jk aj bk = ε123 a2 b3 + ε132 a3 b2 = a2 b3 − a3 b2 = (a × b)1 .
But the components of the cross product are cyclic permutations of each other,
and εijk doesn’t change sign under cyclic permutations, so we have the important
general result

    (a × b)_i = Σ_{j,k=1}^3 εijk aj bk .    (9)
(A cyclic permutation is one of the form 1 → 2 → 3 → 1 or x → y → z → x.)
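Equation (9) can be checked directly by building the εijk array and contracting it with np.einsum; this is only an illustrative sketch, not an efficient way to compute cross products.

    import numpy as np

    # build the Levi-Civita symbol as an explicit 3x3x3 array
    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k] = 1.0     # even permutations
        eps[i, k, j] = -1.0    # odd permutations

    a = np.array([1., 2., 3.])
    b = np.array([4., 5., 6.])

    # equation (9): (a x b)_i = eps_ijk a_j b_k
    cross = np.einsum('ijk,j,k->i', eps, a, b)
    print(np.allclose(cross, np.cross(a, b)))   # True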
Now, in order to handle various vector identities, we need to prove some other
properties of the Levi-Civita symbol. The first identity to prove is this:
    Σ_{i,j,k=1}^3 εijk εijk = 3! = 6 .    (10)
But this is actually easy, because (i, j, k) must all be different, and there are 3!
ways to order (1, 2, 3). In other words, there are 3! permutations of {1, 2, 3}. For
every case where all three indices are different, whether εijk is +1 or −1, we always
have (εijk )2 = +1, and therefore summing over the 3! possibilities yields the desired
result.
Recalling the Einstein summation convention, it is important to keep the placement of any free (i.e., unsummed over) indices the same on both sides of an equation. For example, we would always write something like A_{ij} B^{jk} = C_i{}^k and not
A_{ij} B^{jk} = C_{ik}. In particular, the ith component of the cross product is written

    (a × b)_i = ε_{ijk} a^j b^k .    (11)

As mentioned earlier, for our present purposes, raising and lowering an index is
purely a notational convenience. And in order to maintain the proper index placement, we will frequently move an index up or down as necessary. While this may
seem quite confusing at first, with a little practice it becomes second nature and
results in vastly simplified calculations.

Using this convention, equation (10) is simply written ε_{ijk} ε^{ijk} = 6. This also
applies to the Kronecker delta, so that we have expressions like a^i δ_{ij} = Σ_{i=1}^3 a^i δ_{ij} =
a_j (where δ^i{}_j is numerically the same as δ_{ij}). An inhomogeneous system of linear
equations would be written as simply a^i{}_j x^j = y^i, and the dot product as

    a · b = a_i b^i = a^i b_i .    (12)

Note also that indices that are summed over are "dummy indices" meaning, for
example, that a_i b^i = a_k b^k. This is simply another way of writing Σ_{i=1}^3 a_i b^i =
a_1 b^1 + a_2 b^2 + a_3 b^3 = Σ_{k=1}^3 a_k b^k.
As we have said, the Levi-Civita symbol greatly simplifies many calculations
dealing with vectors. Let’s look at some examples.
Example 3. Let us take a look at the scalar triple product. We have
a · (b × c) = ai (b × c)i = ai εijk bj ck
= bj εjki ck ai
= bj (c × a)j
(because εijk = −εjik = +εjki )
= b · (c × a) .
Note also that this formalism automatically takes into account the anti-symmetry
of the cross product:
(c × a)i = εijk cj ak = −εikj cj ak = −εikj ak cj = −(a × c)i .
It doesn’t get any easier than this.
Of course, this formalism works equally well with vector calculus equations involving the gradient ∇. This is the vector defined by
    ∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z = x̂1 ∂/∂x1 + x̂2 ∂/∂x2 + x̂3 ∂/∂x3 = ei ∂/∂xi .
In fact, it will also be convenient to simplify our notation further by defining ∇i =
∂/∂xi = ∂i , so that ∇ = ei ∂i .
Example 4. Let us prove the well-known identity ∇ · (∇ × a) = 0. We have
∇ · (∇ × a) = ∇i (∇ × a)i = ∂i (εijk ∂j ak ) = εijk ∂i ∂j ak .
But now notice that εijk is antisymmetric in i and j (so that εijk = −εjik ), while
the product ∂i ∂j is symmetric in i and j (because we assume that the order of
differentiation can be interchanged so that ∂i ∂j = ∂j ∂i ). Then
εijk ∂i ∂j = −εjik ∂i ∂j = −εjik ∂j ∂i = −εijk ∂i ∂j
where the last step follows because i and j are dummy indices, and we can therefore
relabel them. But then εijk ∂i ∂j = 0 and we have proved our identity.
The last step in the previous example is actually a special case of a general
result. To see this, suppose that we have an object Aij··· that is labeled by two or
more indices, and suppose that it is antisymmetric in two of those indices (say i, j).
This means that Aij··· = −Aji··· . Now suppose that we have another object Sij···
that is symmetric in i and j, so that Sij··· = Sji··· . If we multiply A times S and
sum over the indices i and j, then using the symmetry and antisymmetry properties
of S and A we have
    Aij··· Sij··· = −Aji··· Sij···    by the antisymmetry of A
                  = −Aji··· Sji···    by the symmetry of S
                  = −Aij··· Sij···    by relabeling the dummy indices i and j

and therefore we have the general result

    Aij··· Sij··· = 0 .
It is also worth pointing out that the indices i and j need not be the first pair of indices, nor do they need to be adjacent. For example, we still have A···i···j··· S···i···j··· =
0.
Now suppose that we have an arbitrary object T ij without any particular symmetry properties. Then we can turn this into an antisymmetric object T [ij] by a
process called antisymmetrization as follows:
    T ij → T [ij] := (1/2!)(T ij − T ji) .
In other words, we add up all possible permutations of the indices, with the sign
of each permutation being either +1 (for an even permutation) or −1 (for an odd
permutation), and then divide this sum by the total number of permutations, which
in this case is 2!. If we have something of the form T ijk then we would have

    T ijk → T [ijk] := (1/3!)(T ijk − T ikj + T kij − T kji + T jki − T jik)
where we alternate signs with each transposition. The generalization to an arbitrary
number of indices should be clear. Note also that we could antisymmetrize only
over a subset of the indices if required.
It is also important to note that it is impossible to have a nonzero antisymmetric
object with more indices than the dimension of the space we are working in. This
is simply because at least one index will necessarily be repeated. For example, if we
are in R3 , then anything of the form T ijkl must have at least one index repeated
because each index can only range between 1, 2 and 3.
Now, why did we go through all of this? Well, first recall that we can write
the Kronecker delta in any of the equivalent forms δ_{ij} = δ^i{}_j = δ^{ij}. Then we can
construct quantities like

    δ_i^{[1} δ_j^{2]} = (1/2!)(δ_i^1 δ_j^2 − δ_i^2 δ_j^1) = δ_{[i}^1 δ_{j]}^2

and

    δ_i^{[1} δ_j^2 δ_k^{3]} = (1/3!)(δ_i^1 δ_j^2 δ_k^3 − δ_i^1 δ_j^3 δ_k^2 + δ_i^3 δ_j^1 δ_k^2 − δ_i^3 δ_j^2 δ_k^1 + δ_i^2 δ_j^3 δ_k^1 − δ_i^2 δ_j^1 δ_k^3) .
In particular, we now want to show that

    ε_{ijk} = 3! δ_i^{[1} δ_j^2 δ_k^{3]} .    (13)

Clearly, if i = 1, j = 2 and k = 3 we have

    3! δ_1^{[1} δ_2^2 δ_3^{3]} = 3! (1/3!)(δ_1^1 δ_2^2 δ_3^3 − δ_1^1 δ_2^3 δ_3^2 + δ_1^3 δ_2^1 δ_3^2 − δ_1^3 δ_2^2 δ_3^1 + δ_1^2 δ_2^3 δ_3^1 − δ_1^2 δ_2^1 δ_3^3)
    = 1 − 0 + 0 − 0 + 0 − 0 = 1 = ε_{123}
so equation (13) is correct in this particular case. But now we make the crucial
observation that both sides of equation (13) are antisymmetric in (i, j, k), and hence
the equation must hold for all values of (i, j, k). This is because any permutation of
(i, j, k) results in the same change of sign on both sides, and both sides also equal
0 if any two indices are repeated. Therefore equation (13) is true in general.
To derive what is probably the most useful identity involving the Levi-Civita
symbol, we begin with the fact that ε123 = 1. Multiplying the left side of equation
(13) by 1 in this form yields
    ε_{ijk} ε_{123} = 3! δ_i^{[1} δ_j^2 δ_k^{3]} .
But now we again make the observation that both sides are antisymmetric in
(1, 2, 3), and hence both sides are equal for all values of the upper indices, and
we have the fundamental result
    ε_{ijk} ε^{nlm} = 3! δ_i^{[n} δ_j^l δ_k^{m]} .    (14)
We now set n = k and sum over k. (This process of setting two indices equal to
each other and summing is called contraction.) Using the fact that
    δ^k{}_k = Σ_{k=1}^3 δ^k{}_k = 3

along with terms such as δ_i^k δ_k^m = δ_i^m we find

    ε_{ijk} ε^{klm} = 3! δ_i^{[k} δ_j^l δ_k^{m]}
    = δ_i^k δ_j^l δ_k^m − δ_i^k δ_j^m δ_k^l + δ_i^m δ_j^k δ_k^l − δ_i^m δ_j^l δ_k^k + δ_i^l δ_j^m δ_k^k − δ_i^l δ_j^k δ_k^m
    = δ_i^m δ_j^l − δ_i^l δ_j^m + δ_i^m δ_j^l − 3 δ_i^m δ_j^l + 3 δ_i^l δ_j^m − δ_i^l δ_j^m
    = δ_i^l δ_j^m − δ_i^m δ_j^l .

In other words, we have the extremely useful result

    ε_{ijk} ε^{klm} = δ_i^l δ_j^m − δ_i^m δ_j^l .    (15)
This result is so useful that it should definitely be memorized.
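Equation (15) is also easy to verify by brute force, treating εijk and δij as explicit arrays and letting np.einsum do the contraction over k:

    import numpy as np

    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k], eps[i, k, j] = 1.0, -1.0

    delta = np.eye(3)

    # left side of (15): eps_ijk eps_klm summed over k
    lhs = np.einsum('ijk,klm->ijlm', eps, eps)
    # right side of (15): delta_il delta_jm - delta_im delta_jl
    rhs = np.einsum('il,jm->ijlm', delta, delta) - np.einsum('im,jl->ijlm', delta, delta)
    print(np.allclose(lhs, rhs))   # True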
Example 5. Let us derive the well-known triple vector product known as the
“bac − cab” rule. We simply compute using equation (15):
[a × (b × c)]i = εijk aj (b × c)k = εijk εklm aj bl cm
= (δil δjm − δim δjl )aj bl cm = am bi cm − aj bj ci
= bi (a · c) − ci (a · b)
and therefore
a × (b × c) = b(a · c) − c(a · b) .
We also point out that some of the sums in this derivation can be done in more
than one way. For example, we have either δil δjm aj bl cm = am bi cm = bi (a · c) or
δil δjm aj bl cm = aj bi cj = bi (a · c), but the end result is always the same. Note also
that at every step along the way, the only index that isn’t repeated (and hence
summed over) is i.
Example 6. Equation (15) is just as useful in vector calculus calculations. Here is
an example to illustrate the technique.
[∇ × (∇ × a)]i = εijk ∂ j (∇ × a)k = εijk εklm ∂ j ∂l am
= (δil δjm − δim δjl )∂ j ∂l am = ∂ j ∂i aj − ∂ j ∂j ai
= ∂i (∇ · a) − ∇2 ai
and hence we have the identity
∇ × (∇ × a) = ∇(∇ · a) − ∇2 a
which is very useful in discussing the theory of electromagnetic waves.
3  Determinants
In treating vectors in R3 , we used the permutation symbol εijk defined in the
last section. We are now ready to apply the same techniques to the theory of
determinants. The idea is that we want to define a mapping from a matrix A ∈
Mn (F ) to F in a way that has certain algebraic properties. Since a matrix in
Mn (F ) has components aij with i and j ranging from 1 to n, we are going to need
a higher dimensional version of the Levi-Civita symbol already introduced. The
obvious extension to n dimensions is the following.
We define

    ε_{i1···in} =  +1 if (i1, . . . , in) is an even permutation of (1, . . . , n)
                   −1 if (i1, . . . , in) is an odd permutation of (1, . . . , n)
                    0 if (i1, . . . , in) is not a permutation of (1, . . . , n) .

Again, there is no practical difference between ε^{i1···in} and ε_{i1···in}. Using this, we
define the determinant of A = (aij) ∈ Mn(F) to be the number
det A = εi1 ··· in a1i1 a2i2 · · · anin .
(16)
Look carefully at what this expression consists of. Since εi1 ··· in vanishes unless
(i1 , . . . , in ) are all distinct, and there are n! such distinct orderings, we see that
det A consists of n! terms in the sum, where each term is a product of n factors aij ,
and where each term consists precisely of one factor from each row and each column
of A. In other words, det A is a sum of terms where each term is a product of one
element from each row and each column, and the sum is over all such possibilities.
The determinant is frequently written as
    det A = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} .

The determinant of an n × n matrix is said to be of order n. Note also that the
determinant is only defined for square matrices.
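Definition (16) translates almost literally into code. The sketch below (plain Python plus NumPy, with the sign of a permutation computed by counting inversions) evaluates the n!-term sum and compares it with np.linalg.det for the matrix of Example 1; it is meant only to illustrate the definition, since this is an O(n!) algorithm.

    import numpy as np
    from itertools import permutations

    def sign(p):
        """Sign of a permutation p of (0, ..., n-1), via the parity of its inversions."""
        inversions = sum(1 for i in range(len(p))
                           for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def det_leibniz(A):
        """Sum over all permutations of sgn(p) * a_{1,p(1)} * ... * a_{n,p(n)}  (eq. (16))."""
        n = A.shape[0]
        return sum(sign(p) * np.prod([A[r, p[r]] for r in range(n)])
                   for p in permutations(range(n)))

    A = np.array([[3., 1., 0.],
                  [0., -2., 1.],
                  [1., -1., 1.]])
    print(det_leibniz(A), np.linalg.det(A))   # both are -2.0 (up to roundoff)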
Example 7. Leaving the easier 2 × 2 case to you to verify, we will work out the
3 × 3 case and show that it gives the same result that you probably learned in a
more elementary course. So, for A = (aij ) ∈ M3 (F ) we have
det A = εijk a1i a2j a3k
= ε123 a11 a22 a33 + ε132 a11 a23 a32 + ε312 a13 a21 a32
+ ε321 a13 a22 a31 + ε231 a12 a23 a31 + ε213 a12 a21 a33
= a11 a22 a33 − a11 a23 a32 + a13 a21 a32
− a13 a22 a31 + a12 a23 a31 − a12 a21 a33
You may recognize this in either of the mnemonic forms (sometimes called Sarrus's rule)

[Diagram: Sarrus's rule. In the first form the 3 × 3 array (aij) is written with its first two columns repeated to the right; the three (+) products run along the diagonals from upper left to lower right, and the three (−) products along the diagonals from lower left to upper right. The second form draws the same (+) and (−) diagonals, wrapped around, on the 3 × 3 array itself.]
Here, we are to add together all products of terms connected by a (+) line, and
subtract all of the products connected by a (−) line. It can be shown that this 3 × 3
determinant may be expanded as a sum of three 2 × 2 determinants.
Example 8. Let A = (aij) be a diagonal matrix, i.e., aij = 0 if i ≠ j. Then

    det A = ε_{i1···in} a_{1i1} · · · a_{nin} = ε_{1···n} a_{11} · · · a_{nn} = a_{11} · · · a_{nn} = ∏_{i=1}^n a_{ii}

so that

    \begin{vmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{vmatrix} = ∏_{i=1}^n a_{ii} .

In particular, we see that det I = 1.
We now prove a number of useful properties of determinants. These are all very
straightforward applications of the definition (16) once you have become comfortable
with the notation. In fact, in my opinion, this approach to determinants affords the
simplest way in which to arrive at these results, and is far less confusing than the
usual inductive proofs.
Theorem 2. For any A ∈ Mn (F ) we have
det A = det AT .
Proof. This is simply an immediate consequence of our definition of determinant.
We saw that det A is a sum of all possible products of one element from each row
and each column, and no product can contain more than one term from a given
column because the corresponding ε symbol would vanish. This means that an
equivalent way of writing all n! such products is (note the order of subscripts is
reversed)
det A = εi1 ··· in ai1 1 · · · ain n .
But aij = aT ji so this is just
det A = εi1 ··· in ai1 1 · · · ain n = εi1 ··· in aT 1i1 · · · aT nin = det AT .
In order to help us gain some additional practice manipulating these quantities,
we prove this theorem again based on another result which we will find very useful
in its own right. We start from the definition det A = εi1 ··· in a1i1 · · · anin . Again
using ε1··· n = 1 we have
ε1··· n det A = εi1 ··· in a1i1 · · · anin .
(17)
By definition of the permutation symbol, the left side of this equation is antisymmetric in (1, . . . , n). But so is the right side because, taking a1i1 and a2i2 as an
example, we see that
εi1 i2 ··· in a1i1 a2i2 · · · anin = εi1 i2 ··· in a2i2 a1i1 · · · anin
= −εi2 i1 ··· in a2i2 a1i1 · · · anin
= −εi1 i2 ··· in a2i1 a1i2 · · · anin
where the last line follows by a relabeling of the dummy indices i1 and i2 .
So, by a now familiar argument, both sides of equation (17) must be true for
any values of the indices (1, . . . , n) and we have the extremely useful result
εj1 ··· jn det A = εi1 ··· in aj1 i1 · · · ajn in .
(18)
This equation will turn out to be very helpful in many proofs that would otherwise
be considerably more difficult.
Let us now use equation (18) to prove Theorem 2. We begin with the analogous
result to equation (10). This is
εi1 ··· in εi1 ··· in = n!.
(19)
Using this, we multiply equation (18) by εj1 ··· jn to yield
n! det A = εj1 ··· jn εi1 ··· in aj1 i1 · · · ajn in .
On the other hand, by definition of det AT we have
det AT = εi1 ··· in aT 1i1 · · · aT nin = εi1 ··· in ai1 1 · · · ain n .
Multiplying the left side of this equation by 1 = ε1··· n and again using the antisymmetry of both sides in (1, . . . , n) yields
εj1 ··· jn det AT = εi1 ··· in ai1 j1 · · · ain jn .
(This also follows by applying equation (18) to AT directly.)
Now multiply this last equation by εj1 ··· jn to obtain
n! det AT = εi1 ··· in εj1 ··· jn ai1 j1 · · · ain jn .
Relabeling the dummy indices i and j we have
n! det AT = εj1 ··· jn εi1 ··· in aj1 i1 · · · ajn in
which is exactly the same as the above expression for n! det A, and we have again
proved Theorem 2.
Let us restate equation (18) as a theorem for emphasis, and also look at two of
its immediate consequences.
Theorem 3. If A ∈ Mn (F ), then
εj1 ··· jn det A = εi1 ··· in aj1 i1 · · · ajn in .
Corollary 1. If B ∈ Mn (F ) is obtained from A ∈ Mn (F ) by interchanging two
rows of A, then det B = − det A.
Proof. This is really just what the theorem says in words. (See the discussion
between equations (17) and (18).) For example, let B result from interchanging
rows 1 and 2 of A. Then
det B = εi1 i2 ··· in b1i1 b2i2 · · · bnin = εi1 i2 ··· in a2i1 a1i2 · · · anin
= εi1 i2 ··· in a1i2 a2i1 · · · anin = −εi2 i1 ··· in a1i2 a2i1 · · · anin
= −εi1 i2 ··· in a1i1 a2i2 · · · anin
= − det A = ε213···n det A .
where again the next to last line follows by relabeling.
Corollary 2. If A ∈ Mn (F ) has two identical rows, then det A = 0.
Proof. If B is the matrix obtained by interchanging two identical rows of A, then
by the previous corollary we have
det A = det B = − det A
and therefore det A = 0.
Here is another way to view Theorem 3 and its corollaries. If we view det A as
a function of the rows of A, then the corollaries state that det A = 0 if any two
rows are the same, and det A changes sign if two nonzero rows are interchanged. In
other words, we have
det(Aj1 , . . . , Ajn ) = εj1 ··· jn det A .
(20)
If it isn’t immediately obvious to you that this is true, then note that for (j1 , . . . , jn ) =
(1, . . . , n) it’s just an identity. So by the antisymmetry of both sides, it must be
true for all j1 , . . . , jn .
Looking at the definition det A = εi1 ··· in a1i1 · · · anin , we see that we can view
the determinant as a function of the rows of A: det A = det(A1 , . . . , An ). Since
each row is actually a vector in F n , we can replace A1 (for example) by any linear
combination of two vectors in F n so that A1 = rB1 + sC1 where r, s ∈ F and
B1 , C1 ∈ F n . Let B = (bij ) be the matrix with rows Bi = Ai for i = 2, . . . , n, and
let C = (cij ) be the matrix with rows Ci = Ai for i = 2, . . . , n. Then
det A = det(A1 , A2 , . . . , An ) = det(rB1 + sC1 , A2 , . . . , An )
= εi1 ··· in (rb1i1 + sc1i1 )a2i2 · · · anin
= rεi1 ··· in b1i1 a2i2 · · · anin + sεi1 ··· in c1i1 a2i2 · · · anin
= r det B + s det C.
Since this argument clearly could have been applied to any of the rows of A, we
have proved the following theorem.
Theorem 4. Let A ∈ Mn (F ) have row vectors A1 , . . . , An and assume that for
some i = 1, . . . , n we have
Ai = rBi + sCi
where Bi, Ci ∈ F n and r, s ∈ F. Let B ∈ Mn(F) have rows A1, . . . , Ai−1, Bi, Ai+1, . . . , An and let C ∈ Mn(F) have rows A1, . . . , Ai−1, Ci, Ai+1, . . . , An. Then

    det A = r det B + s det C.
Besides the very easy to handle diagonal matrices, another type of matrix that
is easy to deal with are the triangular matrices. To be precise, a matrix A ∈ Mn (F )
is said to be upper-triangular if aij = 0 for i > j, and A is said to be lower-triangular if aij = 0 for i < j. Thus a matrix is upper-triangular if it is of the form

    \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{pmatrix}

and lower-triangular if it is of the form

    \begin{pmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & 0 & \cdots & 0 \\ a_{31} & a_{32} & a_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix} .
We will use the term triangular to mean either upper- or lower-triangular.
Theorem 5. If A ∈ Mn (F ) is a triangular matrix, then
    det A = ∏_{i=1}^n a_{ii} .
Proof. If A is lower-triangular, then A is of the form shown above. Now look
carefully at the definition det A = εi1 ··· in a1i1 · · · anin . Since A is lower-triangular
we have aij = 0 for i < j. But then we must have i1 = 1 or else a1i1 = 0. Now
consider a2i2 . Since i1 = 1 and a2i2 = 0 if 2 < i2 , we must have i2 = 2. Next, i1 = 1
and i2 = 2 means that i3 = 3 or else a3i3 = 0. Continuing in this way we see that
the only nonzero term in the sum is when ij = j for each j = 1, . . . , n and hence
    det A = ε_{12···n} a_{11} · · · a_{nn} = ∏_{i=1}^n a_{ii} .
If A is an upper-triangular matrix, then the theorem follows from Theorem 2.
An obvious corollary is the following (which was also shown directly in Example
8).
Corollary. If A ∈ Mn(F) is diagonal, then det A = ∏_{i=1}^n a_{ii}.
It is important to realize that because det AT = det A, Theorem 3 and its
corollaries apply to columns as well as to rows. Furthermore, these results now
allow us to easily see what happens to the determinant of a matrix A when we apply
elementary row (or column) operations to A. In fact, if you think for a moment,
the answer should be obvious. For a type α transformation (i.e., interchanging two
rows), we have just seen that det A changes sign (Theorem 3, Corollary 1). For
a type β transformation (i.e., multiply a single row by a nonzero scalar), we can
let r = k, s = 0 and Bi = Ai in Theorem 4 to see that det A → k det A. And for
a type γ transformation (i.e., add a multiple of one row to another) we have (for
Ai → Ai + kAj and using Theorems 4 and 3, Corollary 2)
det(A1 , . . . , Ai + kAj , . . . , An ) = det A + k det(A1 , . . . , Aj , . . . , Aj , . . . , An )
= det A + 0 = det A.
Summarizing these results, we have the following theorem.
Theorem 6. Suppose A ∈ Mn (F ) and let B ∈ Mn (F ) be row equivalent to A.
(i) If B results from the interchange of two rows of A, then det B = − det A.
(ii) If B results from multiplying any row (or column) of A by a scalar k, then
det B = k det A.
(iii) If B results from adding a multiple of one row of A to another row, then
det B = det A.
Corollary. If R is the reduced row-echelon form of a matrix A, then det R = 0 if
and only if det A = 0.
Proof. This follows from Theorem 6 since A and R are row-equivalent.
Now, A ∈ Mn (F ) is singular if rank A < n. Hence there must be at least one
zero row in the reduced row echelon form R of A, and thus det A = det R = 0.
Conversely, if rank A = n, then the reduced row echelon form R of A is just I, and
hence det R = 1 ≠ 0. Therefore det A ≠ 0. In other words, we have shown that
Theorem 7. A ∈ Mn (F ) is singular if and only if det A = 0.
Finally, let us prove a basic result that you already know, i.e., that the determinant of a product of matrices is the product of the determinants.
Theorem 8. If A, B ∈ Mn (F ), then
det(AB) = (det A)(det B).
Proof. If either A or B is singular (i.e., their rank is less than n) then so is AB
(by the corollary to Theorem 1). But then (by Theorem 7) either det A = 0 or
det B = 0, and also det(AB) = 0 so the theorem is true in this case.
Now assume that both A and B are nonsingular, and let C = AB. Then
Ci = (AB)i = Σ_k aik Bk for each i = 1, . . . , n so that from an inductive extension
of Theorem 4 we see that

    det C = det(C1, . . . , Cn) = det( Σ_{j1} a1j1 Bj1, . . . , Σ_{jn} anjn Bjn )
          = Σ_{j1} · · · Σ_{jn} a1j1 · · · anjn det(Bj1, . . . , Bjn) .

But det(Bj1, . . . , Bjn) = εj1···jn det B (see equation (20)) so we have

    det C = Σ_{j1} · · · Σ_{jn} a1j1 · · · anjn εj1···jn det B = (det A)(det B) .
Corollary. If A ∈ Mn (F ) is nonsingular, then
det A−1 = (det A)−1 .
Proof. If A is nonsingular, then A−1 exists, and hence by the theorem we have
1 = det I = det(AA−1 ) = (det A)(det A−1 )
and therefore
det A−1 = (det A)−1 .
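Both Theorem 8 and its corollary are easy to spot-check numerically with random (hence generically nonsingular) matrices; a minimal NumPy sketch:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # True
    print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A)))     # True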
4  Diagonalizing Matrices
If T ∈ L(V ), then an element λ ∈ F is called an eigenvalue of T if there exists a
nonzero vector v ∈ V such that T v = λv. In this case we call v an eigenvector
of T belonging to the eigenvalue λ. Note that an eigenvalue may be zero, but an
eigenvector is always nonzero by definition. It is important to realize (particularly in
quantum mechanics) that eigenvectors are only specified up to an overall constant.
This is because if T v = λv, then for any c ∈ F we have T (cv) = c(T v) = cλv = λ(cv)
so that cv is also an eigenvector with eigenvalue λ. Because of this, we are always
free to normalize our eigenvectors to any desired value.
If T has an eigenvalue λ, then T v = λv or (T − λ)v = 0. But this means that
v ∈ Ker(T − λ1) with v 6= 0, so that T − λ1 is singular. Conversely, if T − λ1 is
singular, then there exists v 6= 0 such that (T − λ1)v = 0 or T v = λv. Thus we
have proved that a linear operator T ∈ L(V ) has an eigenvalue λ ∈ F if and only if
T − λ1 is singular. (This is exactly the same as saying λ1 − T is singular.)
In an exactly analogous manner we define the eigenvalues and eigenvectors of a
matrix A ∈ Mn (F ). Thus we say that an element λ ∈ F is an eigenvalue of A
if there exists a nonzero (column) vector v ∈ F n such that Av = λv, and we call v
an eigenvector of A belonging to the eigenvalue λ. Given a basis {ei } for F n , we
can write this matrix eigenvalue equation in terms of components as ai j v j = λv i
or, written out as

    Σ_{j=1}^n a_{ij} v_j = λ v_i ,    i = 1, . . . , n .    (21a)

Writing λv_i = Σ_{j=1}^n λ δ_{ij} v_j, we can write (21a) in the form

    Σ_{j=1}^n (λ δ_{ij} − a_{ij}) v_j = 0 .    (21b)
If A has an eigenvalue λ, then λI − A is singular so that
det(λI − A) = 0 .
(22)
Another way to think about this is that if the matrix (operator) λI − A is nonsingular, then (λI − A)−1 would exist. But then multiplying the equation (λI − A)v = 0
from the left by (λI − A)−1 implies that v = 0, which is impossible if v is to be an
eigenvector of A.
It is also worth again pointing out that there is no real difference between the
statements det(λ1 − A) = 0 and det(A − λ1) = 0, and we will use whichever one is
most appropriate for what we are doing at the time.
Example 9. Let us find all of the eigenvectors and associated eigenvalues of the
matrix

    A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} .

This means that we must find a vector v = (x, y) such that Av = λv. In matrix
notation, this equation takes the form

    \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = λ \begin{pmatrix} x \\ y \end{pmatrix}

and the equation (A − λI)v = 0 becomes

    \begin{pmatrix} 1 − λ & 2 \\ 3 & 2 − λ \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 0 .

This is equivalent to the system

    (1 − λ)x + 2y = 0
    3x + (2 − λ)y = 0 .    (23)

By (22) we must have

    \begin{vmatrix} 1 − λ & 2 \\ 3 & 2 − λ \end{vmatrix} = λ² − 3λ − 4 = (λ − 4)(λ + 1) = 0 .
We thus see that the eigenvalues are λ = 4 and λ = −1. (The roots of this
polynomial are found either by inspection, or by applying the elementary quadratic
formula.)
Substituting λ = 4 into equations (23) yields
−3x + 2y = 0
3x − 2y = 0
or y = (3/2)x. This means that every eigenvector corresponding to the eigenvalue
λ = 4 has the form v = (x, 3x/2). In other words, every multiple of the vector
v = (2, 3) is also an eigenvector with eigenvalue equal to 4. If we substitute λ = −1
in equations (23), then we similarly find y = −x, and hence every multiple of
the vector v = (1, −1) is an eigenvector with eigenvalue equal to −1. (Note that
both of equations (23) give the same information. This is not surprising because
the determinant of the coefficients vanishes so we know that the rows are linearly
dependent, and hence each supplies the same information.)
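For comparison, here is how Example 9 looks numerically; np.linalg.eig returns the eigenvalues (in no guaranteed order) and unit-normalized eigenvectors as columns.

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 2.]])

    evals, evecs = np.linalg.eig(A)
    print(evals)     # 4 and -1, in no particular order

    # each column of evecs is a (unit-normalized) eigenvector; the one for
    # lambda = 4 is proportional to (2, 3), the one for lambda = -1 to (1, -1)
    for lam, v in zip(evals, evecs.T):
        print(np.allclose(A @ v, lam * v))   # True, True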
Let us denote the set of all polynomials over the field F by F [x]. Thus p ∈ F[x]
means that p = a0 + a1 x + a2 x2 + · · · + an xn where each ai ∈ F and an ≠ 0.
The number n is called the degree of p and denoted by deg p. If an = 1 the
polynomial is said to be monic. In high school you learned how to do long division,
and an inductive application of this process yields the following result, called the
division algorithm: Given f, g ∈ F[x] with g ≠ 0, there exist unique polynomials
q, r ∈ F[x] such that f = qg +r where either r = 0 or deg r < deg g. The polynomial
q is called the quotient and r is called the remainder.
If f (x) ∈ F[x], then c ∈ F is said to be a zero or root of f if f (c) = 0. If
f, g ∈ F[x] and g ≠ 0, then we say that f is divisible by g (or g divides f ) over
F if f = qg for some q ∈ F[x]. In other words, f is divisible by g if the remainder
in the division of f by g is zero. In this case we also say that g is a factor of f .
Suppose that we divide f by x − c. By the division algorithm we know that
f = (x − c)q + r where either r = 0 or deg r < deg(x − c) = 1. But then either
r = 0 or deg r = 0 in which case r ∈ F. Either way, substituting x = c we have
f (c) = (c − c)q + r = r. Thus the remainder in the division of f by x − c is f (c).
This result is called the remainder theorem. As a consequence of this, we see
that x − c will be a factor of f if and only if f (c) = 0, a result called the factor
theorem. If c is such that (x − c)m divides f but no higher power of x − c divides
f , then we say that c is a root of multiplicity m. In counting the number of roots
a polynomial has, we shall always count a root of multiplicity m as m roots. A root
of multiplicity 1 is frequently called a simple root.
The fields R and C are by far the most common fields used by physicists. However, there is an extremely important fundamental difference between them. A field
F is said to be algebraically closed if every polynomial f ∈ F[x] with deg f > 0
has at least one zero (or root) in F . It is a fact (not at all easy to prove) that the
complex number field C is algebraically closed.
Let F be algebraically closed, and let f ∈ F[x] be of degree n ≥ 1. Since F is
algebraically closed there exists a1 ∈ F such that f (a1 ) = 0, and hence by the factor
theorem, f = (x − a1 )q1 where q1 ∈ F[x] and deg q1 = n − 1. (This is a consequence
of the general fact that if deg p = m and deg q = n, then deg pq = m + n. Just look
at the largest power of x in the product pq = (a0 + a1 x + a2 x2 + · · · + am xm )(b0 +
b1 x + b2 x2 + · · · + bn xn ).)
Now, by the algebraic closure of F there exists a2 ∈ F such that q1 (a2 ) = 0,
and therefore q1 = (x − a2 )q2 where deg q2 = n − 2. It is clear that we can continue
this process a total of n times, finally arriving at
    f = c(x − a1)(x − a2) ··· (x − an) = c ∏_{i=1}^n (x − ai)

where c ∈ F is nonzero. In particular, c = 1 if qn−1 is monic.
Observe that while this shows that any polynomial of degree n over an algebraically closed field has exactly n roots, it doesn’t require that these roots be
distinct, and in general they are not.
Note also that while the field C is algebraically closed, it is not true that R is
algebraically closed. This should be obvious because any quadratic equation of the
form ax² + bx + c = 0 has solutions given by the quadratic formula

    x = ( −b ± √(b² − 4ac) ) / 2a

and if b² − 4ac < 0, then there is no solution for x in the real number system.
Given a matrix A = (aij) ∈ Mn(F), the trace of A is defined by tr A = Σ_{i=1}^n aii .
An important property of the trace is that it is cyclic:

    tr AB = Σ_{i=1}^n (AB)ii = Σ_{i=1}^n Σ_{j=1}^n aij bji = Σ_{j=1}^n Σ_{i=1}^n bji aij = Σ_{j=1}^n (BA)jj = tr BA .

As a consequence of this, we see that the trace is invariant under similarity transformations.
In other words, if A′ = P⁻¹AP, then tr A′ = tr P⁻¹AP = tr AP P⁻¹ = tr A.
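A short NumPy sketch (my addition, not from the notes) illustrating both the cyclic property and the similarity invariance of the trace for randomly chosen matrices; the only assumption is that the random P is nonsingular, which holds almost surely:

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(2, 3, 3))        # two random 3x3 matrices
    P = rng.normal(size=(3, 3))              # assumed nonsingular

    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                    # True
    print(np.isclose(np.trace(np.linalg.inv(P) @ A @ P), np.trace(A)))     # True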
Let A ∈ Mn(F) be a matrix representation of T. The matrix xI − A is called
the characteristic matrix of A, and the expression det(x1 − T) = 0 is called
the characteristic (or secular) equation of T. The determinant det(x1 − T) is
frequently denoted by ∆T(x). Writing out the determinant in a particular basis,
we see that det(x1 − T) is of the form

              | x − a11    −a12     ···     −a1n   |
              |  −a21    x − a22    ···     −a2n   |
    ∆T(x) =   |    .        .                 .    |
              |    .        .                 .    |
              |  −an1     −an2      ···   x − ann  |

where A = (aij) is the matrix representation of T in the chosen basis. Since the
expansion of a determinant contains exactly one element from each row and each
column, we see that (and this is a very good exercise for you to show)

    det(x1 − T) = (x − a11)(x − a22) ··· (x − ann)
                    + terms containing at most n − 2 factors of the form x − aii
                    + ··· + terms with no factors containing x
                = xⁿ − (tr A)x^(n−1) + terms of lower degree in x + (−1)ⁿ det A.     (24)

This monic polynomial is called the characteristic polynomial of T.
Using Theorem 8 and its corollary, we see that if A′ = P −1 AP is similar to A,
then
det(xI − A′ ) = det(xI − P −1 AP ) = det[P −1 (xI − A)P ] = det(xI − A) .
We thus see that similar matrices have the same characteristic polynomial (the converse of this statement is not true), and hence also the same eigenvalues. Therefore
the eigenvalues (not eigenvectors) of an operator T ∈ L(V ) do not depend on the
basis chosen for V .
Note that since both the determinant and trace are invariant under similarity
transformations, we may as well write tr T and det T (rather than tr A and det A)
since these are independent of the particular basis chosen.
Since the characteristic polynomial is of degree n in x, it follows from the discussion
above that if we are in an algebraically closed field (such as C), then there
must exist n roots. In this case the characteristic polynomial may be factored into
the form

    det(x1 − T) = (x − λ1)(x − λ2) ··· (x − λn)                          (25)

where the eigenvalues λi are not necessarily distinct. Expanding this expression we
have

    det(x1 − T) = xⁿ − ( Σ_{i=1}^n λi ) x^(n−1) + ··· + (−1)ⁿ λ1 λ2 ··· λn .

Comparing this with the above general expression for the characteristic polynomial,
we see that

    tr T = Σ_{i=1}^n λi                                                  (26a)

and

    det T = ∏_{i=1}^n λi .                                               (26b)
(You can easily verify these for the matrix in Example 9.) It should be remembered
that this result only applies to an algebraically closed field (or to any other field F
as long as all n roots of the characteristic polynomial lie in F ).
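For instance, here is a small NumPy check of equations (26a) and (26b) for the matrix of Example 9 (my addition; not part of the original notes):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 2.0]])
    evals = np.linalg.eigvals(A)

    print(np.isclose(evals.sum(), np.trace(A)))          # True:  4 + (-1) = 3 = tr A
    print(np.isclose(evals.prod(), np.linalg.det(A)))    # True:  4 * (-1) = -4 = det A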
If v1 , v2 , . . . , vr are eigenvectors belonging to the distinct eigenvalues λ1 , λ2 , . . . , λr
of T ∈ L(V ), then it can be shown that the set {v1 , v2 , . . . , vr } is linearly independent. Therefore, if T has n distinct eigenvalues (and it can’t have more than n)
there are n linearly independent eigenvectors which then form a basis for V .
Let us now take a careful look at what happens if a space V has a basis of
eigenvectors of an operator T . Suppose that T ∈ L(V ) with dim V = n. If V
has a basis {v1 , . . . , vn } that consists entirely of eigenvectors of T , then the matrix
representation of T in this basis is defined by
    T(vi) = Σ_{j=1}^n vj aji = λi vi = Σ_{j=1}^n δji λj vj

and therefore aji = δji λj . In other words, T is represented by a diagonal matrix in a
basis of eigenvectors, and the diagonal elements of [T]v are precisely the eigenvalues
of T. Conversely, if T is represented by a diagonal matrix aji = δji λj relative to
some basis {vi}, then reversing the argument shows that each vi is an eigenvector
of T. This proves the following theorem.
Theorem 9. A linear operator T ∈ L(V ) can be represented by a diagonal matrix
if and only if V has a basis consisting of eigenvectors of T . If this is the case, then
the diagonal elements of the matrix representation are precisely the eigenvalues of
T . (Note however, that the eigenvalues need not necessarily be distinct.)
If T ∈ L(V ) is represented in some basis {ei } by a matrix A, and in the basis
of eigenvectors {vi } by a diagonal matrix D, then the discussion above Example 2
tells us that A and D must be similar matrices. This proves the following version
of Theorem 9, which we state as a corollary.
Corollary 1. A matrix A ∈ Mn (F ) is similar to a diagonal matrix D if and only
if A has n linearly independent eigenvectors.
Corollary 2. A linear operator T ∈ L(V ) can be represented by a diagonal matrix
if T has n = dim V distinct eigenvalues.
Proof. This follows from our discussion above.
Note that the existence of n = dim V distinct eigenvalues of T ∈ L(V ) is a
sufficient but not necessary condition for T to have a diagonal representation. For
example, the identity operator has the usual diagonal representation, but its only
eigenvalues are λ = 1. In general, if any eigenvalue has multiplicity greater than
1, then there will be fewer distinct eigenvalues than the dimension of V . However,
in this case it may be possible to choose an appropriate linear combination of
eigenvectors in each eigenspace so the matrix of T will still be diagonal.
We say that a matrix A is diagonalizable if it is similar to a diagonal matrix
D. If P is a nonsingular matrix such that D = P −1 AP , then we say that P
diagonalizes A. It should be noted that if λ is an eigenvalue of a matrix A with
eigenvector v (i.e., Av = λv), then for any nonsingular matrix P we have
(P −1 AP )(P −1 v) = P −1 Av = P −1 λv = λ(P −1 v).
In other words, P −1 v is an eigenvector of P −1 AP . Similarly, we say that T ∈ L(V )
is diagonalizable if there exists a basis for V that consists entirely of eigenvectors
of T .
How do we actually go about diagonalizing a matrix? If T ∈ L(V) and A is the
matrix representation of T in a basis {ei}, then P is defined to be the transformation
that takes the basis {ei} into the basis {vi} of eigenvectors. In other words,
vi = P ei = Σ_j ej pji . This means that the ith column of (pji) is just the ith eigenvector
of A. The fact that P must be nonsingular coincides with the requirement that T
(or A) have n linearly independent eigenvectors vi.
Example 10. In Example 9 we found the eigenvectors v1 = (2, 3) (corresponding
to the eigenvalue λ1 = 4) and v2 = (1, −1) (corresponding to λ2 = −1) of the
matrix

    A = ( 1  2 )
        ( 3  2 ) .

Then the transition matrix P is given by

    P = ( 2   1 )
        ( 3  −1 )

and you can use your favorite method to show that

    P⁻¹ = (1/5) ( 1   1 )
                ( 3  −2 ) .

Then

    P⁻¹AP = (1/5) ( 1   1 ) ( 1  2 ) ( 2   1 )
                  ( 3  −2 ) ( 3  2 ) ( 3  −1 )

          = ( 4   0 )
            ( 0  −1 )  =  D.

It is also easy to see that det A = −4 = λ1 λ2 and tr A = 3 = λ1 + λ2 .
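A small NumPy verification of Example 10 (my addition); the columns of P are the eigenvectors found above:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 2.0]])
    P = np.array([[2.0,  1.0],      # columns are the eigenvectors (2, 3) and (1, -1)
                  [3.0, -1.0]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))          # [[ 4.  0.]
                                    #  [ 0. -1.]]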
5  More on Diagonalization
In the previous section we showed that an operator T ∈ L(V ) can be represented
by a diagonal matrix if and only if it has a basis of eigenvectors. However, we
haven’t addressed the conditions under which such a basis will exist, or the types
of matrices that will in fact be diagonalizable. One very general characterization
deals with the concepts of algebraic and geometric multiplicities. Unfortunately, in
order to explain these terms and show how they are useful we must first develop
some additional concepts. Since these notes aren’t meant to be a complete course
in linear algebra, we will be fairly brief in our discussion.
First note that one eigenvalue can belong to more than one linearly independent
eigenvector. In fact, if T ∈ L(V ) and λ is an eigenvalue of T , then the set
Vλ := {v ∈ V : T v = λv}
of all eigenvectors of T belonging to λ is a subspace of V called the eigenspace of
λ. It is also easy to see that Vλ = Ker(λ1 − T ).
Suppose we are given a matrix A = (aij) ∈ M_{m×n}(F). Then, by partitioning the
rows and columns of A in some manner, we obtain what is called a block matrix.
To illustrate, suppose A ∈ M_{3×5}(R) is given by

    A = ( 7  5   5  4  −1 )
        ( 2  1  −3  0   5 )
        ( 0  8   2  1  −9 ) .

Then we may partition A into blocks to obtain (for example) the matrix

    A = ( A11  A12 )
        ( A21  A22 )

where

    A11 = ( 7  5  5 )        A12 = ( 4  −1 )

    A21 = ( 2  1 −3 )        A22 = ( 0   5 )
          ( 0  8  2 )              ( 1  −9 ) .
If A and B are block matrices that are partitioned into the same number of
blocks such that each of the corresponding blocks is of the same size, then it is clear
that (in an obvious notation)

    A + B = ( A11 + B11   ···   A1n + B1n )
            (     .                  .    )
            (     .                  .    )
            ( Am1 + Bm1   ···   Amn + Bmn ) .

In addition, if C and D are block matrices such that the number of columns in
each Cij is equal to the number of rows in each Djk, then the product of C and D
is also a block matrix CD where (CD)ik = Σ_j Cij Djk . Thus block matrices are
multiplied as if each block were just a single element of each matrix in the product.
In other words, each (CD)ik is a matrix that is the sum of a product of matrices.
The proof of this fact is an exercise in matrix multiplication, and is left to you.
The proof of the next theorem is just a careful analysis of the definition of
determinant, and is omitted.

Theorem 10. If A ∈ Mn(F) is a block triangular matrix of the form

    A = ( A11  A12  A13  ···  A1k )
        (  0   A22  A23  ···  A2k )
        (  .    .    .          . )
        (  0    0    0   ···  Akk )

where each Aii is a square matrix and the 0's are zero matrices of appropriate size,
then

    det A = ∏_{i=1}^k det Aii .
Example 11. Consider the matrix

    A = ( 1  −1   2   3 )
        ( 2   2   0   2 )
        ( 4   1  −1  −1 )
        ( 1   2   3   0 ) .

Subtract multiples of row 1 from rows 2, 3 and 4 to obtain the matrix

    ( 1  −1   2    3 )
    ( 0   4  −4   −4 )
    ( 0   5  −9  −13 )
    ( 0   3   1   −3 ) .

Now subtract 5/4 times row 2 from row 3, and 3/4 times row 2 from row 4. This
yields the matrix

    B = ( 1  −1   2   3 )
        ( 0   4  −4  −4 )
        ( 0   0  −4  −8 )
        ( 0   0   4   0 )

with det B = det A (see the discussion at the beginning of Section 4). Since B is in
block triangular form we have

    det A = det B = | 1  −1 | | −4  −8 |
                    | 0   4 | |  4   0 |  =  4(32) = 128.
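A NumPy check of this example (my addition, not part of the original notes): the determinant of A, the determinant of the row-reduced matrix B, and the product of the determinants of the diagonal blocks of B all agree:

    import numpy as np

    A = np.array([[1, -1,  2,  3],
                  [2,  2,  0,  2],
                  [4,  1, -1, -1],
                  [1,  2,  3,  0]], dtype=float)
    B = np.array([[1, -1,  2,  3],
                  [0,  4, -4, -4],
                  [0,  0, -4, -8],
                  [0,  0,  4,  0]], dtype=float)

    blocks = (B[:2, :2], B[2:, 2:])
    print(np.linalg.det(A))                                   # 128.0 (up to roundoff)
    print(np.linalg.det(B))                                   # 128.0
    print(np.prod([np.linalg.det(blk) for blk in blocks]))    # 4 * 32 = 128.0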
Next, suppose T ∈ L(V ) and let W be a subspace of V . Then W is said to be
invariant under T (or simply T -invariant) if T (w) ∈ W for every w ∈ W . For
example, if V = R3 then the xy-plane is invariant under the linear transformation
that rotates every vector in R3 about the z-axis. As another example, note that if
v ∈ V is an eigenvector of T , then T (v) = λv for some λ ∈ F, and hence v generates
a one-dimensional subspace of V that is invariant under T (this is not necessarily
the same as the eigenspace of λ).
Another way to describe the invariance of W under T is to say that T (W ) ⊂ W .
Then clearly T 2 (W ) = T (T (W )) ⊂ W , and in general T n (W ) ⊂ W for every
n ∈ Z+ . Since W is a subspace of V , this means f (T )(W ) ⊂ W for any f (x) ∈ F[x].
In other words, if W is invariant under T , then W is also invariant under any
polynomial in T (over the same field as W ).
If W ⊂ V is T -invariant, we may focus our attention on the effect of T on W
alone. To do this, we define the restriction of T to W as that operator T |W : W →
W defined by (T |W )(w) = T (w) for every w ∈ W . In other words, the restriction
is an operator T |W that acts only on the subspace W , and gives the same result as
the full operator T gives when it acts on those vectors in V that happen to be in
W . We will frequently write TW instead of T |W .
Now suppose T ∈ L(V ) and let W ⊂ V be a T -invariant subspace. Furthermore,
let {v1 , . . . , vn } be a basis for V , where the first m < n vectors form a basis for W .
If A = (aij ) is the matrix representation of T relative to this basis for V , then a
little thought should convince you that A must be of the block matrix form
    A = ( B  C )
        ( 0  D )
where aij = 0 for j ≤ m and i > m. This is because T (w) ∈ W and any w ∈ W
has components (w1 , . . . , wm , 0, . . . , 0) relative to the above basis for V . It should
also be reasonably clear that B is just the matrix representation of TW . The formal
proof of this fact is given in our next theorem.
Theorem 11. Let W be a subspace of V and suppose T ∈ L(V ). Then W is
T -invariant if and only if T can be represented in the block matrix form
    A = ( B  C )
        ( 0  D )
where B is a matrix representation of TW .
Proof. First suppose that W is T -invariant. Choose a basis {v1 , . . . , vm } for W ,
and extend this to a basis {v1 , . . . , vm , vm+1 , . . . , vn } for V . Then, since T (vi ) ∈ W
for each i = 1, . . . , m there exist scalars bij such that
TW (vi ) = T (vi ) = v1 b1i + · · · + vm bmi
for each i = 1, . . . , m. In addition, since T (vi ) ∈ V for each i = m + 1, . . . , n there
also exist scalars cij and dij such that
T (vi ) = v1 c1i + · · · + vm cmi + vm+1 dm+1,i + · · · + vn dni
for each i = m + 1, . . . , n.
Because T takes the ith basis vector into the ith column of the matrix representation of T , we see that this representation is given by an n × n matrix A of the
form

    A = ( b11   ···   b1m   c1,m+1     ···   c1n   )
        ( b21   ···   b2m   c2,m+1     ···   c2n   )
        (  .           .       .               .   )
        ( bm1   ···   bmm   cm,m+1     ···   cmn   )
        (  0    ···    0    dm+1,m+1   ···   dm+1,n )
        (  .           .       .               .   )
        (  0    ···    0    dn,m+1     ···   dnn   )

or, in block matrix form as

    A = ( B  C )
        ( 0  D )
where B is an m × m matrix that represents TW , C is an m × (n − m) matrix, and
D is an (n − m) × (n − m) matrix.
Conversely, if A has the stated form and {v1 , . . . , vn } is a basis for V , then the
subspace W of V defined by vectors of the form
    w = Σ_{i=1}^m αi vi

where each αi ∈ F will be invariant under T. Indeed, for each i = 1, . . . , m we have

    T(vi) = Σ_{j=1}^n vj aji = v1 b1i + ··· + vm bmi ∈ W

and hence T(w) = Σ_{i=1}^m αi T(vi) ∈ W .
Given a linear operator T ∈ L(V ), what we have called the multiplicity of an
eigenvalue λ is the largest positive integer m such that (x − λ)m divides the characteristic polynomial ∆T (x). This is properly called the algebraic multiplicity
of λ, in contrast to the geometric multiplicity which is the number of linearly
independent eigenvectors belonging to that eigenvalue. In other words, the geometric multiplicity of λ is the dimension of Vλ . In general, we will use the word
“multiplicity” to mean the algebraic multiplicity. The set of all eigenvalues of a
linear operator T ∈ L(V ) is called the spectrum of T . If some eigenvalue in the
spectrum of T is of algebraic multiplicity greater than 1, then the spectrum is said
to be degenerate.
If T ∈ L(V ) has an eigenvalue λ of algebraic multiplicity m, then it is not hard
for us to show that the dimension of the eigenspace Vλ must be less than or equal
to m. Note that since every element of Vλ is an eigenvector of T with eigenvalue λ,
the space Vλ must be a T -invariant subspace of V . Furthermore, every basis for Vλ
will obviously consist of eigenvectors corresponding to λ.
Theorem 12. Let T ∈ L(V ) have eigenvalue λ. Then the geometric multiplicity of
λ is always less than or equal to its algebraic multiplicity. In other words, if λ has
algebraic multiplicity m, then dim Vλ ≤ m.
Proof. Suppose dim Vλ = r and let {v1 , . . . , vr } be a basis for Vλ . Now extend
this to a basis {v1 , . . . , vn } for V . Relative to this basis, T must have the matrix
representation (see Theorem 11)
    ( λIr  C )
    (  0   D ) .
Applying Theorem 10 and the fact that the determinant of a diagonal matrix is just
the product of its (diagonal) elements, we see that the characteristic polynomial
∆T (x) of T is given by
    ∆T(x) = | (x − λ)Ir      −C       |
            |     0      xIn−r − D    |  =  det[(x − λ)Ir] det(xIn−r − D)

           = (x − λ)^r det(xIn−r − D)

which shows that (x − λ)^r divides ∆T(x). Since by definition m is the largest
positive integer such that (x − λ)^m | ∆T(x), it follows that r ≤ m.
Note that a special case of this theorem arises when an eigenvalue is of (algebraic) multiplicity 1. In this case, it then follows that the geometric and algebraic
multiplicities are necessarily equal. We now proceed to show just when this will be
true in general. Recall that any polynomial over an algebraically closed field will
factor into linear terms (see equation (25)).
Theorem 13. Assume that T ∈ L(V ) has a characteristic polynomial that factors into (not necessarily distinct) linear terms. Let T have distinct eigenvalues
λ1 , . . . , λr with (algebraic) multiplicities m1 , . . . , mr respectively, and let dim Vλi =
di . Then T is diagonalizable if and only if mi = di for each i = 1, . . . , r.
Proof. Let dim V = n. We note that since the characteristic polynomial of T is
of degree n and factors into linear terms, it follows that m1 + · · · + mr = n. We
first assume that T is diagonalizable. By definition, this means that V has a basis
consisting of n linearly independent eigenvectors of T . Since each of these basis
eigenvectors must belong to at least one of the eigenspaces Vλi , it follows that
V = Vλ1 + · · · + Vλr and consequently n ≤ d1 + · · · + dr . From Theorem 12 we know
that di ≤ mi for each i = 1, . . . , r and hence
n ≤ d1 + · · · + dr ≤ m1 + · · · + mr = n
which implies d1 + · · · + dr = m1 + · · · + mr or
(m1 − d1 ) + · · · + (mr − dr ) = 0.
But each term in this equation is nonnegative (by Theorem 12), and hence we must
have mi = di for each i.
Conversely, suppose di = mi for each i = 1, . . . , r. For each i, we know that
any basis for Vλi consists of linearly independent eigenvectors corresponding to the
eigenvalue λi , while on the other hand we know that eigenvectors corresponding to
distinct eigenvalues are linearly independent. Therefore the union B of the bases of
{Vλi } forms a linearly independent set of d1 + · · · + dr = m1 + · · · + mr vectors.
But m1 + · · · + mr = n = dim V , and hence B forms a basis for V . Since this
shows that V has a basis of eigenvectors of T , it follows by definition that T must
be diagonalizable.
Example 12. Consider the operator T ∈ L(R³) defined by

    T(x, y, z) = (9x + y, 9y, 7z).

Relative to the standard basis for R³, the matrix representation of T is given by

    A = ( 9  1  0 )
        ( 0  9  0 )
        ( 0  0  7 )

and hence the characteristic polynomial is

    ∆A(λ) = det(A − λI) = (9 − λ)²(7 − λ)

which is a product of linear factors. However,

    A − 9I = ( 0  1   0 )
             ( 0  0   0 )
             ( 0  0  −2 )

which clearly has rank equal to 2, and hence nul(A − 9I) = 3 − 2 = 1 which is
not the same as the algebraic multiplicity of λ = 9 (which is 2). Thus T is not
diagonalizable.
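As a numerical illustration (my addition), the geometric multiplicity of λ = 9 can be read off as the nullity of A − 9I:

    import numpy as np

    A = np.array([[9.0, 1.0, 0.0],
                  [0.0, 9.0, 0.0],
                  [0.0, 0.0, 7.0]])

    rank = np.linalg.matrix_rank(A - 9 * np.eye(3))
    print(rank)      # 2, so dim Ker(A - 9I) = 3 - 2 = 1 < 2 = algebraic multiplicity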
Example 13. Consider the operator on R³ defined by the following matrix:

    A = (  5  −6  −6 )
        ( −1   4   2 )
        (  3  −6  −4 ) .

In order to avoid factoring a cubic polynomial, we compute the characteristic polynomial
∆A(x) = det(xI − A) by applying elementary row operations as follows (you
should be able to see exactly what elementary row operations were performed in
each step; see the discussion at the beginning of Section 4).

    det(xI − A) = | x − 5     6       6    |     | x − 2     0    −(x − 2) |
                  |   1     x − 4    −2    |  =  |   1     x − 4     −2    |
                  |  −3       6     x + 4  |     |  −3       6      x + 4  |

                            |  1     0     −1  |             |  1     0     −1  |
                = (x − 2)   |  1   x − 4   −2  |  = (x − 2)  |  0   x − 4   −1  |
                            | −3     6    x + 4|             |  0     6    x + 1|

                = (x − 2) | x − 4    −1   |
                          |   6     x + 1 |

                = (x − 2)²(x − 1).
We now see that A has eigenvalue λ1 = 1 with (algebraic) multiplicity 1, and
eigenvalue λ2 = 2 with (algebraic) multiplicity 2. From Theorem 12 we know that
the algebraic and geometric multiplicities of λ1 are necessarily the same and equal
to 1, so we need only consider λ2 . Observing that

    A − 2I = (  3  −6  −6 )
             ( −1   2   2 )
             (  3  −6  −6 )
it is obvious that rank(A − 2I) = 1, and hence nul(A − 2I) = 3 − 1 = 2. This shows
that A is indeed diagonalizable.
Let us now construct bases for the eigenspaces Wi = Vλi . This means we seek
vectors v = (x, y, z) ∈ R3 such that (A − λi I)v = 0. This is easily solved by the
usual row reduction techniques as follows. For λ1 = 1 we have

    A − I = (  4  −6  −6 )     (  1   0  −1 )     ( 1   0  −1 )
            ( −1   3   2 )  →  ( −1   3   2 )  →  ( 0   3   1 )
            (  3  −6  −5 )     (  3  −6  −5 )     ( 0  −6  −2 )

          → ( 1  0  −1 )
            ( 0  3   1 )
            ( 0  0   0 )
which has the solutions x = z and y = −z/3 = −x/3. Therefore W1 is spanned by
the single eigenvector v1 = (3, −1, 3). As to λ2 = 2, we proceed in a similar manner
to obtain

    A − 2I = (  3  −6  −6 )     ( 1  −2  −2 )
             ( −1   2   2 )  →  ( 0   0   0 )
             (  3  −6  −6 )     ( 0   0   0 )
which implies that any vector (x, y, z) with x = 2y + 2z will work. For example, we
can let x = 0 and y = 1 to obtain z = −1, and hence one basis vector for W2 is
given by v2 = (0, 1, −1). If we let x = 1 and y = 0, then we have z = 1/2 so that
another independent basis vector for W2 is given by v3 = (2, 0, 1).
In terms of these eigenvectors, the transformation matrix P that diagonalizes A
is given by

    P = (  3   0   2 )
        ( −1   1   0 )
        (  3  −1   1 )
and I leave it to you to verify that AP = P D (i.e., P −1 AP = D) where D is the
diagonal matrix with diagonal elements d11 = 1 and d22 = d33 = 2.
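Here is that verification carried out in NumPy (my addition; the columns of P are the eigenvectors v1, v2, v3 constructed above):

    import numpy as np

    A = np.array([[ 5., -6., -6.],
                  [-1.,  4.,  2.],
                  [ 3., -6., -4.]])
    P = np.array([[ 3.,  0.,  2.],     # columns are v1, v2, v3
                  [-1.,  1.,  0.],
                  [ 3., -1.,  1.]])

    D = np.linalg.inv(P) @ A @ P
    print(np.round(D, 10))                             # diag(1, 2, 2)
    print(np.linalg.matrix_rank(A - 2 * np.eye(3)))    # 1, so dim V_2 = 2 as claimed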
6  Diagonalizing Normal Matrices
The previous section described some general conditions under which a matrix may
be diagonalized. However, in physics the most useful matrices are either real symmetric (in the case of Mn (R)), Hermitian (in the case of Mn (C)) or unitary (also
for Mn (C)). In this section I will show that in fact all of these can always be
diagonalized.
One tool that we will find useful is the Gram-Schmidt orthogonalization
process which you have probably seen before. However, just in case, I will give a
complete statement and proof. It will also be useful to note that any orthogonal
set of vectors is necessarily linearly independent. To see this, let {v1, . . . , vn} be an
orthogonal set, and suppose Σ_{i=1}^n ai vi = 0. Taking the inner product with vj we
have

    0 = Σ_{i=1}^n ai ⟨vj, vi⟩ = Σ_{i=1}^n ai δij = aj .
Since this holds for all j, we have ai = 0 for all i so that {vi } is linearly independent
as claimed.
Theorem 14. Let V be a finite-dimensional inner product space. Then there exists
an orthonormal set of vectors that forms a basis for V .
Proof. Let dim V = n and let {u1 , . . . , un } be a basis for V . We will construct a
new basis {w1, . . . , wn} such that ⟨wi, wj⟩ = δij . To begin, we choose

    w1 = u1 / ‖u1‖

so that

    ‖w1‖² = ⟨w1, w1⟩ = ⟨u1/‖u1‖, u1/‖u1‖⟩ = ⟨u1, u1⟩/‖u1‖² = ‖u1‖²/‖u1‖² = 1

and hence w1 is a unit vector. We now take u2 and subtract off its “projection”
along w1. This will leave us with a new vector v2 that is orthogonal to w1. Thus,
we define

    v2 = u2 − ⟨w1, u2⟩w1

so that

    ⟨w1, v2⟩ = ⟨w1, u2⟩ − ⟨w1, u2⟩⟨w1, w1⟩ = 0 .
If we let

    w2 = v2 / ‖v2‖

then {w1, w2} is an orthonormal set. (That v2 ≠ 0 will be shown below.)
We now go to u3 and subtract off its projection along w1 and w2 . In other
words, we define
    v3 = u3 − ⟨w2, u3⟩w2 − ⟨w1, u3⟩w1

so that ⟨w1, v3⟩ = ⟨w2, v3⟩ = 0. Choosing

    w3 = v3 / ‖v3‖

we now have an orthonormal set {w1, w2, w3}.
It is now clear that given an orthonormal set {w1 , . . . , wk }, we let
    v_{k+1} = u_{k+1} − Σ_{i=1}^k ⟨wi, u_{k+1}⟩wi

so that v_{k+1} is orthogonal to w1, . . . , wk, and hence we define

    w_{k+1} = v_{k+1} / ‖v_{k+1}‖ .
It should now be obvious that we can construct an orthonormal set of n vectors
from our original basis of n vectors. To finish the proof, we need only show that
w1 , . . . , wn are linearly independent.
To see this, note first that since u1 and u2 are linearly independent, w1 and
u2 must also be linearly independent, and hence v2 ≠ 0 by definition of linear
independence. Thus w2 exists and {w1, w2} is linearly independent (since they are
orthogonal). Next, {w1, w2, u3} is linearly independent since w1 and w2 are in the
linear span of u1 and u2. Hence v3 ≠ 0 so that w3 exists, and again {w1, w2, w3} is
linearly independent.
In general then, if {w1, . . . , wk} is linearly independent, it follows that the set
{w1, . . . , wk, u_{k+1}} is also independent since {w1, . . . , wk} is in the linear span of
{u1, . . . , uk}. Hence v_{k+1} ≠ 0 and w_{k+1} exists. Then {w1, . . . , w_{k+1}} is linearly
independent. Thus {w1, . . . , wn} forms a basis for V, and ⟨wi, wj⟩ = δij .
Corollary (Gram-Schmidt process). Let {u1 , . . . , un } be a linearly independent
set of vectors in an inner product space V . Then there exists a set of orthonormal
vectors w1 , . . . , wn ∈ V such that the linear span of {u1 , . . . , uk } is equal to the
linear span of {w1 , . . . , wk } for each k = 1, . . . , n.
Proof. This corollary follows by a careful inspection of the proof of Theorem 14.
We emphasize that the Gram-Schmidt algorithm (the “orthogonalization process” of Theorem 14) as such applies to any inner product space, and is not restricted
to only finite-dimensional spaces.
Example 14. Consider the following basis vectors for R3 :
u1 = (3, 0, 4)
u2 = (−1, 0, 7)
u3 = (2, 9, 11).
Let us apply the Gram-Schmidt process (with the standard inner product on R³)
to obtain a new orthonormal basis for R³.
Since ‖u1‖ = √(9 + 16) = 5, we define

    w1 = u1/5 = (3/5, 0, 4/5).

Next, using ⟨w1, u2⟩ = −3/5 + 28/5 = 5 we let

    v2 = (−1, 0, 7) − (3, 0, 4) = (−4, 0, 3).

Since ‖v2‖ = 5, we have

    w2 = (−4/5, 0, 3/5).

Finally, using ⟨w1, u3⟩ = 10 and ⟨w2, u3⟩ = 5 we let

    v3 = (2, 9, 11) − (−4, 0, 3) − (6, 0, 8) = (0, 9, 0)

and hence, since ‖v3‖ = 9, our third basis vector becomes

    w3 = (0, 1, 0).
I leave it to you to show that {w1 , w2 , w3 } does indeed form an orthonormal basis
for R3 .
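The proof of Theorem 14 translates almost line by line into code. The following is a minimal sketch of my own (not from the notes) of the Gram-Schmidt process for a list of linearly independent vectors in Rⁿ, applied to the basis of Example 14:

    import numpy as np

    def gram_schmidt(vectors):
        # returns orthonormal w1, ..., wn spanning the same nested subspaces,
        # assuming the input vectors are linearly independent
        ws = []
        for u in vectors:
            v = u - sum(np.dot(w, u) * w for w in ws)   # subtract projections onto earlier w's
            ws.append(v / np.linalg.norm(v))
        return ws

    u1, u2, u3 = np.array([3., 0., 4.]), np.array([-1., 0., 7.]), np.array([2., 9., 11.])
    for w in gram_schmidt([u1, u2, u3]):
        print(np.round(w, 6))
    # [ 0.6  0.   0.8]   [-0.8  0.   0.6]   [ 0.  1.  0.]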
Recall that the transpose of A = (aij) ∈ Mn(C) is the matrix A^T = (aij)^T = (aji).
Then for any A, B ∈ Mn(C) we have

    (AB)^T_ij = (AB)_ji = Σ_{k=1}^n a_jk b_ki = Σ_{k=1}^n b^T_ik a^T_kj = (B^T A^T)_ij

and therefore (AB)^T = B^T A^T as you should already know.
Now suppose we have a matrix A ∈ Mn (C). We define the adjoint (or Hermitian adjoint) of A to be the matrix A† = A∗T . In other words, the adjoint of A is
its complex conjugate transpose. From what we just showed, it is easy to see that
(AB)† = B † A† .
If it so happens that A† = A, then A is said to be a Hermitian matrix.
If a matrix U ∈ Mn (C) has the property that U † = U −1 , then we say U is
unitary. Thus a matrix U is unitary if U U † = U † U = I. (I state without proof
that in a finite-dimensional space, it is only necessary to require either U U † = I
or U † U = I. In other words, in finite dimensions (or for any finite group) the
existence of a left (right) inverse implies the existence of a right (left) inverse.
However, the full definition is necessary in the case of infinite-dimensional spaces.)
We also see that the product of two unitary matrices U and V is unitary since
(U V )† U V = V † U † U V = V † IV = V † V = I. If a matrix N ∈ Mn (C) has the
property that it commutes with its adjoint, i.e., N N † = N † N , then N is said to
be a normal matrix. Note that Hermitian and unitary matrices are automatically
normal.
Example 15. Consider the matrix A ∈ M2(C) given by

    A = (1/√2) ( 1  −1 )
               ( i   i ) .

Then the adjoint of A is given by

    A† = (1/√2) (  1  −i )
                ( −1  −i )
and I leave it to you to verify that AA† = A† A = I, and hence show that A is
unitary.
A convenient property of the adjoint is this. If A ∈ Mn (C) and x, y ∈ Cn , then
Ax ∈ Cn also, so we may use the standard inner product on Cn to write (using
A† = A∗T )
    ⟨Ax, y⟩ = Σ_{i=1}^n (Ax)_i* y_i = Σ_{i,j=1}^n a_ij* x_j* y_i = Σ_{i,j=1}^n x_j* a†_ji y_i = ⟨x, A†y⟩.      (27)
In the particular case of a unitary matrix, we see that

    ⟨Ux, Uy⟩ = ⟨x, U†Uy⟩ = ⟨x, y⟩

so that unitary transformations also preserve the angle between two vectors (and
hence maintain orthogonality as well). Choosing y = x we also see that

    ‖Ux‖² = ⟨Ux, Ux⟩ = ⟨x, U†Ux⟩ = ⟨x, Ix⟩ = ⟨x, x⟩ = ‖x‖²
so that unitary transformations preserve lengths of vectors, i.e., they are really just
rotations in Cn .
It is well worth pointing out that in the case of a real matrix A ∈ Mn (R), instead
of the adjoint A† we have the transpose AT and equation (27) becomes
    ⟨Ax, y⟩ = ⟨x, A^T y⟩

or equivalently

    ⟨A^T x, y⟩ = ⟨x, Ay⟩.                                                (28)
We will use this below when we prove that a real symmetric matrix has all real
eigenvalues.
Note that since U ∈ Mn (C), the rows Ui and columns U i of U are just vectors
in Cn . This means we can take their inner product relative to the standard inner
product on Cn . Writing out the relation U U † = I in terms of components, we have
    (UU†)_ij = Σ_{k=1}^n u_ik u†_kj = Σ_{k=1}^n u_ik u*_jk = Σ_{k=1}^n u*_jk u_ik = ⟨Uj, Ui⟩ = δij

and from U†U = I we see that

    (U†U)_ij = Σ_{k=1}^n u†_ik u_kj = Σ_{k=1}^n u*_ki u_kj = ⟨U^i, U^j⟩ = δij .
In other words, a matrix is unitary if and only if its rows (or columns) each form
an orthonormal set. Note we have shown that if the rows (columns) of U ∈ Mn (C)
form an orthonormal set, then so do the columns (rows), and either of these is a
sufficient condition for U to be unitary. For example, you can easily verify that the
matrix A in Example 15 satisfies these conditions.
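For example, here is that check carried out in NumPy for the matrix of Example 15 (my addition, not from the notes):

    import numpy as np

    A = (1 / np.sqrt(2)) * np.array([[1, -1],
                                     [1j, 1j]])
    Adag = A.conj().T

    print(np.allclose(A @ Adag, np.eye(2)))            # True: A A† = I
    print(np.allclose(Adag @ A, np.eye(2)))            # True: A† A = I
    print(np.isclose(np.vdot(A[0], A[1]), 0))          # rows are orthogonal
    print(np.isclose(np.vdot(A[:, 0], A[:, 1]), 0))    # columns are orthogonal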
It is also worth pointing out that Hermitian and unitary matrices have important
analogues over the real number system. If A ∈ Mn (R) is Hermitian, then A = A† =
AT , and we say A is symmetric. If U ∈ Mn (R) is unitary, then U −1 = U † = U T ,
and we say U is orthogonal. Repeating the above calculations over R, it is easy
to see that a real matrix is orthogonal if and only if its rows (or columns) form an
orthonormal set.
Let us summarize what we have shown so far in this section.
Theorem 15. The following conditions on a matrix U ∈ Mn (C) are equivalent:
(i) U is unitary.
(ii) The rows Ui of U form an orthonormal set.
(iii) The columns U i of U form an orthonormal set.
Note that the equivalence of (ii) and (iii) in this theorem means that the rows
of U form an orthonormal set if and only if the columns of U form an orthonormal
set. But the rows of U are just the columns of U T , and hence U is unitary if and
only if U T is unitary.
Corollary. The following conditions on a matrix A ∈ Mn (R) are equivalent:
(i) A is orthogonal.
(ii) The rows Ai of A form an orthonormal set.
(iii) The columns Ai of A form an orthonormal set.
Our next theorem details several useful properties of orthogonal and unitary
matrices.
Theorem 16. (i) If A is an orthogonal matrix, then det A = ±1.
(ii) If U is a unitary matrix, then |det U | = 1. Alternatively, det U = eiφ for
some real number φ.
Proof. (i) We have AAT = I, and hence
    1 = det I = det(AA^T) = (det A)(det A^T) = (det A)²

so that det A = ±1.
(ii) If UU† = I then, as above, we have

    1 = det I = det(UU†) = (det U)(det U†) = (det U)(det U^T)* = (det U)(det U)* = |det U|² .
Since the absolute value is defined to be positive, this shows |det U | = 1 and hence
det U = eiφ for some real φ.
Example 16. Let us take a look at rotations in R² as shown, for example, in
the figure below. Recall that if we have two bases {ei} and {ēi}, then they are
related by a transition matrix A = (aij) defined by ēi = Σ_j ej aji . In addition, if
X = Σ_i xi ei = Σ_i x̄i ēi , then xi = Σ_j aij x̄j . If both {ei} and {ēi} are orthonormal
bases, then

    ⟨ei, ēj⟩ = ⟨ei, Σ_k ek akj⟩ = Σ_k akj ⟨ei, ek⟩ = Σ_k akj δik = aij .
k
Using the usual dot product on R2 as our inner product and referring to the figure
below, we see that the elements aij are given by
a11 = e1 · ē1 = |e1 | |ē1 | cos θ = cos θ
a12 = e1 · ē2 = |e1 | |ē2 | cos(π/2 + θ) = − sin θ
a21 = e2 · ē1 = |e2 | |ē1 | cos(π/2 − θ) = sin θ
a22 = e2 · ē2 = |e2 | |ē2 | cos θ = cos θ
[Figure: the x1x2 axes and the rotated x̄1x̄2 axes, with the vector X and the rotation angle θ between them.]
Thus the matrix A is given by

    (aij) = ( cos θ   − sin θ )
            ( sin θ     cos θ ) .
I leave it to you to compute directly and show ATA = AAT = I and det A = +1.
Example 17. Referring to Example 16, we can show that any (real) 2×2 orthogonal
matrix with det A = +1 has the form

    (aij) = ( cos θ   − sin θ )
            ( sin θ     cos θ )

for some θ ∈ R. To see this, suppose A has the form

    ( a  b )
    ( c  d )

where a, b, c, d ∈ R. Since A is orthogonal, its rows form an orthonormal set, and
hence we have

    a² + b² = 1,    c² + d² = 1,    ac + bd = 0,    ad − bc = 1

where the last equation follows from det A = 1.
If a = 0, then the first of these equations yields b = ±1, the third then yields
d = 0, and the last yields −c = 1/b = ±1 which is equivalent to c = −b. In other
words, if a = 0, then A has either of the forms

    (  0  1 )        ( 0  −1 )
    ( −1  0 )   or   ( 1   0 ) .

The first of these is of the required form if we choose θ = −90° = −π/2, and the
second is of the required form if we choose θ = +90° = +π/2.
Now suppose that a ≠ 0. From the third equation we have c = −bd/a, and
substituting this into the second equation, we find (a² + b²)d² = a². Using the first
equation, this becomes a² = d² or a = ±d. If a = −d, then the third equation
yields b = c, and hence the last equation yields −a² − b² = 1 which is impossible.
Therefore a = +d, the third equation then yields c = −b, and we are left with

    ( a  −c )
    ( c   a ) .

Since det A = a² + c² = 1, there exists a real number θ such that a = cos θ and
c = sin θ which gives us the desired form for A.
One of the most important and useful properties of matrices over C is that they
can always be put into triangular form by an appropriate transformation. To show
this, it will be helpful to recall from Section 1 that if A and B are two matrices for
which the product AB is defined, then the ith row of AB is given by (AB)i = Ai B
and the ith column of AB is given by (AB)i = AB i .
Theorem 17 (Schur Canonical Form). If A ∈ Mn (C), then there exists a unitary matrix U ∈ Mn (C) such that U † AU is upper-triangular. Furthermore, the
diagonal entries of U † AU are just the eigenvalues of A.
Proof. The proof is by induction. If n = 1 there is nothing to prove, so we assume
the theorem holds for any square matrix of size n − 1 ≥ 1, and suppose A is of size
n. Since we are dealing with the algebraically closed field C, we know that A has n
(not necessarily distinct) eigenvalues. Let λ be one of these eigenvalues, and denote
the corresponding eigenvector by Ũ¹. We extend Ũ¹ to a basis for Cⁿ, and by the
Gram-Schmidt process we assume this basis is orthonormal. From our discussion
above, we see that this basis may be used as the columns of a unitary matrix Ũ
with Ũ¹ as its first column. We then see that

    (Ũ†AŨ)¹ = Ũ†(AŨ)¹ = Ũ†(AŨ¹) = Ũ†(λŨ¹) = λ(Ũ†Ũ¹) = λ(Ũ†Ũ)¹ = λI¹

and hence Ũ†AŨ has the form

    Ũ†AŨ = ( λ   ∗ ··· ∗ )
           ( 0           )
           ( .      B    )
           ( .           )
           ( 0           )

where B ∈ Mn−1(C) and the ∗'s are (in general) nonzero scalars.
By our induction hypothesis, we may choose a unitary matrix W ∈ Mn−1(C)
such that W†BW is upper-triangular. Let V ∈ Mn(C) be a unitary matrix of the
form

    V = ( 1   0 ··· 0 )
        ( 0           )
        ( .      W    )
        ( .           )
        ( 0           )

and define the unitary matrix U = ŨV ∈ Mn(C). Then

    U†AU = (ŨV)†A(ŨV) = V†(Ũ†AŨ)V

is upper-triangular since (in an obvious shorthand notation)

    V†(Ũ†AŨ)V = ( 1  0  ) ( λ  ∗ ) ( 1  0 )  =  ( 1  0  ) ( λ   ∗  )  =  ( λ     ∗     )
                 ( 0  W† ) ( 0  B ) ( 0  W )     ( 0  W† ) ( 0   BW )     ( 0   W†BW   )

and W†BW is upper-triangular by the induction hypothesis.
It is easy to see (since the determinant of a triangular matrix is the product of
the diagonal entries) that the roots of det(λI − U†AU) are just the diagonal entries
of U†AU because λI − U†AU is of the upper triangular form

    ( λ − (U†AU)11        ∗              ···         ∗        )
    (      0        λ − (U†AU)22         ···         ∗        )
    (      .              .                           .        )
    (      0              0              ···   λ − (U†AU)nn   )
where the *’s are just some in general nonzero entries. But
det(λI − U † AU ) = det[U † (λI − A)U ] = det(λI − A)
so that A and U † AU have the same eigenvalues.
Corollary. If A ∈ Mn (R) has all its eigenvalues in R, then the matrix U defined
in Theorem 17 may be chosen to have all real entries.
Proof. If λ ∈ R is an eigenvalue of A, then A − λI is a real matrix with determinant
det(A − λI) = 0, and therefore the homogeneous system of equations (A − λI)X = 0
has a real solution. Defining Ũ¹ = X, we may now proceed as in Theorem 17. The
details are left to you.
We say that two matrices A, B ∈ Mn (C) are unitarily similar if there exists a
unitary matrix U such that B = U † AU = U −1 AU . Since this defines an equivalence
relation on the set of all matrices in Mn (C), it is also common to say that A and
B are unitarily equivalent.
I leave it to you to show that if A and B are unitarily similar and A is normal,
then B is also normal. In particular, suppose U is unitary and N is such that
U † N U = D is diagonal. Since any diagonal matrix is automatically normal, it
follows that N must be normal also. In other words, any matrix unitarily similar
to a diagonal matrix is normal. We now show that the converse is also true, i.e.,
that any normal matrix is unitarily similar to a diagonal matrix. This extremely
important result is the basis for many physical applications in both classical and
quantum physics.
To see this, suppose N is normal, and let U † N U = D be the Schur canonical
form of N . Then D is both upper-triangular and normal (since it is unitarily
similar to a normal matrix). We claim that the only such matrices are diagonal.
For, consider the (1, 1) elements of DD† and D† D. From what we showed above,
we have
    (DD†)11 = ⟨D1, D1⟩ = |d11|² + |d12|² + ··· + |d1n|²

and

    (D†D)11 = ⟨D^1, D^1⟩ = |d11|² + |d21|² + ··· + |dn1|² .

But D is upper-triangular so that d21 = ··· = dn1 = 0. By normality we must have
(DD†)11 = (D†D)11, and therefore d12 = ··· = d1n = 0 also. In other words, with
the possible exception of the (1, 1) entry, all entries in the first row and column of
D must be zero. In the same manner, we see that

    (DD†)22 = ⟨D2, D2⟩ = |d21|² + |d22|² + ··· + |d2n|²

and

    (D†D)22 = ⟨D^2, D^2⟩ = |d12|² + |d22|² + ··· + |dn2|² .
Since the fact that D is upper-triangular means d32 = · · · = dn2 = 0 and we just
showed that d21 = d12 = 0, it again follows by normality that d23 = · · · = d2n = 0.
Thus all entries in the second row and column with the possible exception of the
(2, 2) entry must be zero.
Continuing this procedure, it is clear that D must be diagonal as claimed. In
other words, an upper-triangular normal matrix is necessarily diagonal. This discussion proves the following very important theorem.
Theorem 18. A matrix N ∈ Mn (C) is normal if and only if there exists a unitary
matrix U such that U † N U is diagonal.
Corollary. If A = (aij ) ∈ Mn (R) is symmetric, then its eigenvalues are real and
there exists an orthogonal matrix S such that S T AS is diagonal.
Proof. If the eigenvalues are real, then the rest of this corollary follows from the
corollary to Theorem 17 and the real analogue of the proof of Theorem 18.
Now suppose A = AT so that aij = aji . If λ is an eigenvalue of A, then there
exists a (nonzero and not necessarily real) vector x ∈ Cn such that Ax = λx and
hence

    ⟨x, Ax⟩ = λ⟨x, x⟩ = λ‖x‖² .

On the other hand, using equation (28) we see that

    ⟨x, Ax⟩ = ⟨A^T x, x⟩ = ⟨x, A^T x⟩* = ⟨x, Ax⟩* = λ*⟨x, x⟩* = λ*‖x‖² .

Subtracting these last two equations yields (λ − λ*)‖x‖² = 0 and hence λ = λ*
since ‖x‖ ≠ 0 by definition.
Let me make some observations. Note that any basis relative to which a normal
matrix N is diagonal is by definition a basis of eigenvectors. The unitary transition
matrix U that diagonalizes N has columns that are precisely these eigenvectors,
and since the columns of a unitary matrix are orthonormal, it follows that the
eigenvector basis is in fact orthonormal. Of course, the analogous result holds for a
real symmetric matrix also.
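In NumPy this is exactly what np.linalg.eigh provides for Hermitian (or real symmetric) matrices: it returns real eigenvalues together with an orthonormal eigenvector basis. A small sketch of my own (not from the notes):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.normal(size=(4, 4))
    A = M + M.T                        # a real symmetric matrix

    evals, S = np.linalg.eigh(A)       # columns of S are orthonormal eigenvectors
    print(np.allclose(S.T @ S, np.eye(4)))              # True: S is orthogonal
    print(np.allclose(S.T @ A @ S, np.diag(evals)))     # True: S^T A S is diagonal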
Part II: Vector Calculus

7  Surfaces
To begin, it’s worth being a little bit careful in defining a surface. For simplicity
we restrict attention to R3 , and since this isn’t an advanced calculus course, we’ll
stick to a less formal approach and rely instead on intuition. So, a surface in R3 is
best represented parametrically as
x = x(u, v) = (x(u, v), y(u, v), z(u, v))
where (u, v) ∈ A ⊂ R2 .
[Figure: the parameter region in the uv-plane and its image surface in xyz-space, showing the curves u = const and v = const.]
For v = v0 = const, the curve u → x(u, v0) has tangent vector

    ∂x/∂u = (∂x/∂u, ∂y/∂u, ∂z/∂u)

which is traditionally denoted by xu. Similarly, for u = u0 = const we have the
curve v → x(u0, v) with tangent

    ∂x/∂v = (∂x/∂v, ∂y/∂v, ∂z/∂v) := xv .

At any point (u, v), the vectors xu and xv span the tangent plane to the surface.
Their cross product is the normal to the surface (because it's normal to both xu
and xv) and hence we have the surface normal

    xu × xv = | x̂        ŷ        ẑ      |
              | ∂x/∂u    ∂y/∂u    ∂z/∂u  | .
              | ∂x/∂v    ∂y/∂v    ∂z/∂v  |

Therefore the principal normal is the unit normal

    n̂ = (xu × xv) / ‖xu × xv‖ .
[Figure: the tangent vectors xu and xv and the unit normal n̂ at a point of the surface.]
Example 18. The sphere of radius a can be expressed as
r = x(θ, ϕ) = (a sin θ cos ϕ, a sin θ sin ϕ, a cos θ)
where 0 ≤ θ ≤ π and 0 ≤ ϕ ≤ 2π. Then we have
    xθ × xϕ = |       x̂                ŷ              ẑ      |
              |  a cos θ cos ϕ    a cos θ sin ϕ    −a sin θ   |
              | −a sin θ sin ϕ    a sin θ cos ϕ        0      |

            = a² sin²θ cos ϕ x̂ + a² sin²θ sin ϕ ŷ + a² sin θ cos θ ẑ

so that

    ‖xθ × xϕ‖² = (a² sin²θ cos ϕ)² + (a² sin²θ sin ϕ)² + (a² sin θ cos θ)²
               = a⁴ sin²θ

and therefore

    n̂ = (xθ × xϕ) / ‖xθ × xϕ‖ = sin θ cos ϕ x̂ + sin θ sin ϕ ŷ + cos θ ẑ .

Note that this is just n̂ = (1/a)r. But ‖r‖ = ‖x(θ, ϕ)‖ = a so we have n̂ = r̂ as
should have been expected.
What about the area of a surface? We subdivide the region A ⊂ R2 into small
rectangles and look at the image of each little rectangle.
[Figure: a small rectangle in the parameter region A and its image on the surface, with corners x(u, v), x(u + du, v) and x(u, v + dv).]
For the infinitesimal rectangle on the surface we have the distance between points
given by

    x(u + du, v) − x(u, v) ≈ (∂x/∂u) du = xu du
    x(u, v + dv) − x(u, v) ≈ (∂x/∂v) dv = xv dv

so the element of area is

    dS = ‖xu du × xv dv‖ = ‖xu × xv‖ du dv

and hence the total area of the surface is given by

    S = ∫_A ‖xu × xv‖ du dv .

It also follows that the integral of a function f(x, y, z) over the surface is given by

    ∫_A f(x, y, z) dS = ∫_A f(x(u, v), y(u, v), z(u, v)) ‖xu × xv‖ du dv .
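As a numerical illustration (my addition, using a crude Riemann sum rather than anything from the notes), we can check this formula on the sphere of Example 18, where ‖xθ × xϕ‖ = a² sin θ and the area should come out to 4πa²:

    import numpy as np

    a = 2.0
    n = 400
    theta = np.linspace(0, np.pi, n)
    phi = np.linspace(0, 2 * np.pi, n)
    dtheta, dphi = theta[1] - theta[0], phi[1] - phi[0]

    T, _ = np.meshgrid(theta, phi, indexing="ij")
    S = np.sum(a**2 * np.sin(T)) * dtheta * dphi   # Riemann sum of ||x_theta x x_phi||
    print(S, 4 * np.pi * a**2)                     # approximately equal (about 50.27)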
8  Gradients
Now let us consider the gradient of a function f(x, y, z). We know that the
differential of the function is given by

    df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz .

Since dx = dx x̂ + dy ŷ + dz ẑ, we may define the vector

    ∇f = (∂f/∂x) x̂ + (∂f/∂y) ŷ + (∂f/∂z) ẑ

so that

    df = ∇f · dx .

The vector operator ∇ is called the gradient vector:

    ∇ = (∂/∂x, ∂/∂y, ∂/∂z) = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z .
To understand what this represents physically, suppose that the displacement
dx is in the direction of constant f , i.e., dx is tangent to the surface f = const.
Then clearly df = 0 in that direction so that df = ∇f · dx = 0. But this means
that ∇f ⊥dx; in other words, the gradient is orthogonal to the surface. And since
df = ∇f · dx = ‖∇f‖ ‖dx‖ cos θ, this shows that df will be largest when cos θ = 1,
i.e., when dx points along ∇f .
Other coordinate systems work the same way. For example, if we have f (r, θ, ϕ),
then

    df = (∂f/∂r) dr + (∂f/∂θ) dθ + (∂f/∂ϕ) dϕ
while
dx = dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂ .
Remark : The first thing to formulate is how to find an infinitesimal displacement dx
in a curvilinear coordinate system. Let us consider the usual spherical coordinates
as an example.
[Figure: spherical coordinates (r, θ, ϕ) of the position vector x.]
Writing ‖x‖ = r, the position vector x has (x, y, z) coordinates

    x = (r sin θ cos ϕ, r sin θ sin ϕ, r cos θ) .

If we let ui stand for the ith coordinate of a general curvilinear coordinate system,
then a unit vector in the ui direction is by definition

    ûi = (∂x/∂ui) / ‖∂x/∂ui‖ .

For our spherical coordinates we have for r:

    ∂x/∂r = (sin θ cos ϕ, sin θ sin ϕ, cos θ)

and

    ‖∂x/∂r‖ = ⟨∂x/∂r, ∂x/∂r⟩^(1/2) = 1

so that

    r̂ = (sin θ cos ϕ, sin θ sin ϕ, cos θ)        and        ∂x/∂r = r̂ .

For θ:

    ∂x/∂θ = (r cos θ cos ϕ, r cos θ sin ϕ, −r sin θ)

and

    ‖∂x/∂θ‖ = ⟨∂x/∂θ, ∂x/∂θ⟩^(1/2) = r

so that

    θ̂ = (cos θ cos ϕ, cos θ sin ϕ, − sin θ)      and        ∂x/∂θ = r θ̂ .

For ϕ:

    ∂x/∂ϕ = (−r sin θ sin ϕ, r sin θ cos ϕ, 0)

and

    ‖∂x/∂ϕ‖ = ⟨∂x/∂ϕ, ∂x/∂ϕ⟩^(1/2) = r sin θ

so that

    ϕ̂ = (− sin ϕ, cos ϕ, 0)                      and        ∂x/∂ϕ = r sin θ ϕ̂ .

Putting this all together we see that

    dx = (∂x/∂r) dr + (∂x/∂θ) dθ + (∂x/∂ϕ) dϕ

or

    dx = dr r̂ + r dθ θ̂ + r sin θ dϕ ϕ̂ .
You can also easily verify that the unit vectors r̂, θ̂, ϕ̂ constructed in this manner
are orthonormal.
While this was the correct way to find dx, the easy way to find it in various
coordinate systems is to hold two variables constant, vary the third, and see what
the resulting displacement is. In the case of spherical coordinates, holding θ, ϕ
constant and varying r we have dx = dr r̂. Holding r, ϕ constant and varying θ
we have dx = rdθ θ̂. Finally, holding r, θ constant and varying ϕ we have dx =
r sin θ dϕ ϕ̂. Putting these together we obtain the general displacement dx = dr r̂ +
rdθ θ̂ + r sin θdϕ ϕ̂. Note also that each of the different dx’s lies on the edge of an
infinitesimal “cube,” and hence the volume element in spherical coordinates is the
product of the sides of the cube, or d³x = r² sin θ dr dθ dϕ.
In any case, we see that we can write

    df = ( (∂f/∂r) r̂ + (1/r)(∂f/∂θ) θ̂ + (1/(r sin θ))(∂f/∂ϕ) ϕ̂ ) · (dr r̂ + r dθ θ̂ + r sin θ dϕ ϕ̂)

and hence from df = ∇f · dx it follows that

    ∇ = r̂ ∂/∂r + θ̂ (1/r) ∂/∂θ + ϕ̂ (1/(r sin θ)) ∂/∂ϕ .
It is important to realize that this is the form of the gradient with respect to an
orthonormal set of basis vectors. Such a basis is called a non-coordinate basis in
differential geometry.
9  Rotations
First recall the definition of the matrix representation of a linear operator: If T ∈
L(U, V ) where U has basis {ui } and V has basis {vi }, then the matrix representation
(ai j ) of T with respect to these bases is defined by T ui = vj aj i . We will sometimes
write A = [T ]vu to denote the fact that A is the matrix representation of T with
respect to the given bases of U and V . Since with respect to {vi }, the basis vectors
themselves have coordinates v1 = (1, 0, . . . , 0), . . . , vn = (0, . . . , 0, 1), we see that

    T ui = v1 a^1_i + ··· + vn a^n_i =  ( 1 )             ( 0 )            ( a^1_i )
                                        ( . ) a^1_i + ··· ( . ) a^n_i  =   (   .   )
                                        ( 0 )             ( 1 )            ( a^n_i )
which is the ith column of the matrix (aj i ). In other words, a linear transformation
T takes the ith basis vector into the ith column of the matrix representation of T .
The result of T acting on an arbitrary vector x ∈ U is then given by
T x = T (xi ui ) = xi T ui = xi vj aj i = (aj i xi )vj .
If we write y = T x, then y = y j vj = (aj i xi )vj and we see that
y j = aj i xi .
Suppose we have a vector space V with two bases {ei } and {ēi }. Then any basis
vector ēi can be written in terms of the basis {ei } as
ēi = ej pj i
where the transition matrix (pi j ) is necessarily nonsingular (since we can also
write ei in terms of the basis {ēi }). It is useful to think of the transition matrix as
defining a linear operator P by ēi = P (ei ), and then (pi j ) is the matrix representation of P relative to the bases {ei } and {ēi } as defined in the usual manner. Since
any x ∈ V can be written in terms of either basis, we have
x = xj ej = x̄i ēi = x̄i ej pj i = (pj i x̄i )ej
which implies that
    x^j = p^j_i x̄^i      or equivalently      x̄^i = (p⁻¹)^i_j x^j .      (29)
What we will now do is focus on rotations in R2 . Rotating the basis vectors is just
a particular type of change of basis, and the transition matrix is just a rotation
matrix.
To begin with, let r be a vector in R2 and consider a counterclockwise rotation
of the x1 x2 -plane about the x3 -axis as shown below. (For simplicity of notation, we
will let x = x1 and y = x2 .)
[Figure: the basis vectors e1, e2 and the rotated basis ē1, ē2 (rotated by θ), with the vector r making angle φ with ē1.]

The vectors ei and ēi are the usual orthonormal basis vectors with ‖ei‖ = ‖ēi‖ = 1.
From the geometry of the diagram we see that
ē1 = (cos θ)e1 + (sin θ)e2
ē2 = −(sin θ)e1 + (cos θ)e2
so that ēi = P (ei ) = ej pj i and the transition matrix (pj i ) is given by
    (p^j_i) = ( cos θ   − sin θ )
              ( sin θ     cos θ ) .                                      (30)
You can easily compute the matrix P −1 , but it is better to make the general observation that rotating the coordinate system doesn’t change the length of r. So using
‖r‖² = x^i x^i = x̄^j x̄^j together with x^i = p^i_j x̄^j this becomes

    x^i x^i = p^i_j x̄^j p^i_k x̄^k = (p^T)^k_i p^i_j x̄^j x̄^k := x̄^j x̄^j

so that we must have

    (p^T)^k_i p^i_j = δ^k_j .
In matrix notation this is just P T P = I which implies that P T = P −1 . This is the
definition of an orthogonal transformation (or orthogonal matrix). In other
words, a matrix A ∈ Mn (F ) is said to be orthogonal if and only if AT = A−1 .
As an important consequence of this definition, note that if A is orthogonal,
then

    1 = det I = det(AA⁻¹) = det(AA^T) = (det A)(det A^T) = (det A)²

and hence

    det A = ±1 .                                                         (31)
Going back to our example rotation, we therefore have
    P⁻¹ = P^T = (  cos θ   sin θ )
                ( − sin θ  cos θ )
so that x̄i = (p−1 )i j xj = (pT )i j xj or
x̄1 = (cos θ)x1 + (sin θ)x2
x̄2 = −(sin θ)x1 + (cos θ)x2
To check these results, we first verify that P −1 = P T :
    P^T P = (  cos θ   sin θ ) ( cos θ   − sin θ )  =  ( 1  0 )  =  I .
            ( − sin θ  cos θ ) ( sin θ     cos θ )     ( 0  1 )
Next, from the diagram we see that
x1 = r cos(θ + φ) = r cos θ cos φ − r sin θ sin φ
= (cos θ)x̄1 − (sin θ)x̄2
x2 = r sin(θ + φ) = r sin θ cos φ + r cos θ sin φ
= (sin θ)x̄1 + (cos θ)x̄2
In matrix form this is

    ( x1 )  =  ( cos θ   − sin θ ) ( x̄1 )
    ( x2 )     ( sin θ     cos θ ) ( x̄2 )                                (32)

or, alternatively,

    ( x̄1 )  =  (  cos θ   sin θ ) ( x1 )
    ( x̄2 )     ( − sin θ  cos θ ) ( x2 )                                 (33)
which is the same as we saw above using (pT )i j .
To be completely precise, the rotation that we have just described is properly
called a passive transformation because it left the vector alone and rotated the
coordinate system. An alternative approach is to leave the coordinate system alone
and rotate the vector itself. This is called an active transformation. One must be
very careful when reading the literature to be aware of just which type of rotation
is under consideration. Let’s compare the two types of rotation.
With an active transformation we have the following situation:
[Figure: the vector r at angle φ and the rotated vector r̄ = T(r) at angle θ + φ, in the fixed x1x2 coordinate system.]
Here the vector r is rotated by θ to give the vector r̄ where, of course, ‖r‖ = ‖r̄‖.
In the passive case we defined the transition matrix P by ēi = P (ei ). Now, in the
active case we define a linear transformation T by r̄ = T (r). From the diagram,
the components of r̄ are given by
x̄1 = r cos(θ + φ) = r cos θ cos φ − r sin θ sin φ
= (cos θ)x1 − (sin θ)x2
x̄2 = r sin(θ + φ) = r sin θ cos φ + r cos θ sin φ
= (sin θ)x1 + (cos θ)x2
or
    ( x̄1 )  =  ( cos θ   − sin θ ) ( x1 )
    ( x̄2 )     ( sin θ     cos θ ) ( x2 ) .                              (34)

Another way to write this is
(x̄1 , x̄2 ) = T (x1 , x2 ) = ((cos θ)x1 − (sin θ)x2 , (sin θ)x1 + (cos θ)x2 ).
Then the first column of [T ] is
T (e1 ) = T (1, 0) = (cos θ, sin θ)
and the second column is
T (e2 ) = T (0, 1) = (− sin θ, cos θ)
so that
    [T] = ( cos θ   − sin θ )
          ( sin θ     cos θ )
as in equation (34).
Carefully compare the matrix in equation (34) with that in equation (33). The
matrix in equation (34) is obtained from the matrix in equation (33) by letting
θ → −θ. This is the effective difference between active and passive rotations. If a
passive transformation rotates the coordinate system counterclockwise by an angle
θ, then the corresponding active transformation rotates the vector by the same angle
but in the clockwise direction.
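A short NumPy sketch (mine, not from the notes) making this comparison concrete: the passive matrix of equation (33) with angle θ acts on components in the same way as the active matrix of equation (34) with angle −θ:

    import numpy as np

    def active(theta):      # rotates the vector counterclockwise by theta, as in eq. (34)
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    def passive(theta):     # new components after rotating the axes by theta, as in eq. (33)
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s], [-s, c]])

    theta = 0.3
    x = np.array([1.0, 2.0])
    print(np.allclose(passive(theta) @ x, active(-theta) @ x))   # True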
Now let us see how the scalar and vector products behave under orthogonal
transformations in R3 . Let a, b ∈ R3 , and consider a · b = kak kbk cos θ. Since
the magnitude of a vector doesn’t change under a rotation, and the angle between
two vectors also doesn’t change, it should be clear that a · b is also invariant under
rotations. That’s why it’s called a scalar product. However, we can easily prove
the invariance directly as follows.
Under an orthogonal transformation a → ā we have ā_i = (p^T)_i^j a_j (and similarly
for b) where the rotation matrix P is orthogonal (P⁻¹ = P^T). Then

    ā · b̄ = ā_i b̄_i = (p^T)_i^k (p^T)_i^l a_k b_l = p^l_i (p^T)_i^k a_k b_l = δ_kl a_k b_l = a_k b_k = a · b
so that the scalar product is indeed invariant under orthogonal transformations.
However, what can we say about the cross product? Since the definition of
a × b depends on the right-hand rule, it is reasonable to suspect that there will
be some kind of dependence on the particular orthogonal transformation (since we
know that the determinant of an orthogonal transformation can be either +1 or
−1). Proceeding as in the scalar product, we have (using Theorem 3)
    ā × b̄ = (ā × b̄)_i ē_i = ε_ijk ā_j b̄_k ē_i = ε_ijk (p^T)_j^r (p^T)_k^s a_r b_s p^t_i e_t
           = ε_ijk p^t_i p^r_j p^s_k a_r b_s e_t = ε_trs det(p^i_j) a_r b_s e_t
           = (a × b)_t e_t det P
           = (a × b) det P .
Now, a proper rotation is one that has det P = +1. These are the rotations
that can be continuously connected to the identity transformation (which obviously
has det I = +1). So for a proper rotation we have ā × b̄ = a × b. However, under
an improper rotation (i.e., parity or inversion of coordinates) we have det P = −1
so that ā × b̄ = −a × b. This is why the cross product of two vectors is called a
pseudo vector or an axial vector—its definition depends on the orientation of
the coordinate system.
We have seen that under a rotation, the components of a vector x transform as
xi → x̄i = (p−1 )i j xj (see equation (29)), and this can be taken as the definition
of a vector, i.e., a quantity whose components transform under proper rotations
according to this rule. It is not hard to show directly that under proper rotations,
the cross product transforms as a vector. The proof is essentially what we just did
above, noting that P⁻¹P = I (so that δ_li = (p⁻¹)_l^r p^r_i) and det P⁻¹ = (det P)⁻¹ = +1:

    (ā × b̄)_i = ε_ijk ā_j b̄_k = ε_ijk (p⁻¹)_j^m (p⁻¹)_k^n a_m b_n = ε_ljk δ_li (p⁻¹)_j^m (p⁻¹)_k^n a_m b_n
               = ε_ljk (p⁻¹)_l^r p^r_i (p⁻¹)_j^m (p⁻¹)_k^n a_m b_n = ε_rmn (det P⁻¹) p^r_i a_m b_n
               = (a × b)_r p^r_i = (p^T)_i^r (a × b)_r
               = (p⁻¹)_i^r (a × b)_r
which is the definition of how vectors transform under proper rotations.
Finally, what can we say about how the gradient transforms? (For simplicity,
we restrict consideration to cartesian coordinates only.) From x^k = p^k_i x̄^i we have

    p^k_i = ∂x^k/∂x̄^i

and hence from the chain rule we have

    ∂/∂x̄^i = (∂x^k/∂x̄^i) ∂/∂x^k = p^k_i ∂/∂x^k .
Then (again using P −1 = P T )
    ∇ = Σ_i ē_i ∂/∂x̄^i = Σ_{ijk} e_j p^j_i p^k_i ∂/∂x^k = Σ_{ijk} e_j p^j_i (p⁻¹)_i^k ∂/∂x^k

      = Σ_{jk} e_j δ^{jk} ∂/∂x^k = Σ_j e_j ∂/∂x^j

      = ∇ .
In other words, the gradient is invariant under rotations, just like an ordinary vector.
10  The Divergence Theorem
Suppose we have a vector field A = A(x, y, z) defined throughout some region of
R3 . The field A could represent the velocity field of a fluid flow, or the electric
field of some charge distribution, or a lot of other things. Consider a small volume
element dxdydz centered at a point (x, y, z).
[Figure: a small volume element with sides dx, dy, dz centered at the point (x, y, z), together with the field A(x, y, z).]
Let dS represent an outward directed surface element. For example, on the near
face we would have dS = dydz x̂. You can think of dS as n̂ dS, i.e., dS is normal to
the surface. Then the flux dφ of A thru the surface element dS is given by
dφ = A · dS = A · n̂ dS .
Example 19. As an example of a flux calculation, let A(x, y, z) = zẑ, and let us
find the flux of A across the sphere described in Example 18. We saw that
    xθ × xϕ = a² sin²θ cos ϕ x̂ + a² sin²θ sin ϕ ŷ + a² sin θ cos θ ẑ

and

    n̂ = (xθ × xϕ) / ‖xθ × xϕ‖ = sin θ cos ϕ x̂ + sin θ sin ϕ ŷ + cos θ ẑ .
Now A = zẑ = a cos θ ẑ so that A · n̂ = a cos²θ, and hence the total flux of A across
the surface of the sphere is given by

    φ = ∫ A · n̂ dS = ∫ A · n̂ ‖xθ × xϕ‖ dθ dϕ

      = ∫ (a cos²θ)(a² sin θ) dθ dϕ = 2πa³ ∫_0^π cos²θ sin θ dθ

      = 2πa³ ∫_{−1}^{1} cos²θ d cos θ = 2πa³ · (2/3)

      = (4/3)πa³ .
You should be able to see the reason for this answer.
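A quick numerical check of this flux integral (my addition; it simply evaluates the double integral ∫(a cos²θ)(a² sin θ) dθ dϕ by a Riemann sum):

    import numpy as np

    a = 1.5
    n = 1000
    theta = np.linspace(0, np.pi, n)
    dtheta = theta[1] - theta[0]

    flux = 2 * np.pi * np.sum(a * np.cos(theta)**2 * a**2 * np.sin(theta)) * dtheta
    print(flux, 4 * np.pi * a**3 / 3)     # both approximately 14.14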
Now, we were given A at the center (x, y, z) of the volume element. To evaluate
dφ on the surface, we need A on the surface. On the right-hand face of the volume
element we have the field
A(x, y + dy/2, z) = A(x, y, z) +
∂A dy
∂y 2
so the flux thru the right-hand side is
∂Ay dy
dxdz .
dφR = A · n̂ dS = A · ŷ dxdz = Ay +
∂y 2
On the left-hand side we have
A(x, y − dy/2, z) = A(x, y, z) −
and
dφL = A · n̂ dS = −A · ŷ dxdz =
∂A dy
∂y 2
∂Ay dy
dxdz
− Ay +
∂y 2
where in this case the outward unit normal is n̂ = −ŷ. Therefore, the net outward
flux thru these two faces is
dφL + dφR =
∂Ay
dxdydz .
∂y
We can clearly do the same for the other two pairs of faces, so adding up the
total for the entire volume element we obtain
$$d\phi_{\rm total} = \left(\frac{\partial A_x}{\partial x} + \frac{\partial A_y}{\partial y} + \frac{\partial A_z}{\partial z}\right)dx\,dy\,dz := \nabla\cdot A\,d^3x := ({\rm div}\,A)\,d^3x$$
where this expression defines the divergence of the vector field A. So what div A
represents is the outward flux of A per unit volume.
If we have a large volume, then we break it up into infinitesimal volume elements
and note that the contribution to the common interior faces will cancel. Then the
total flux out of the bounding surface is given by
$$\phi_{\rm total} = \int d^3x\,\nabla\cdot A\,.$$
On the other hand, we see from $d\phi = A\cdot\hat n\,dS$ that the total flux is also given by
$$\phi_{\rm total} = \int A\cdot\hat n\,dS\,.$$
Equating these two expressions yields the Divergence Theorem:
Theorem 19. Let V be a volume in R3 with outward oriented bounding surface S.
If A is a vector field defined over V with continuous first partial derivatives on an
open set containing V , then
$$\int_V d^3x\,\nabla\cdot A = \int_S A\cdot\hat n\,dS\,.$$
While we haven’t given a careful definition of orientation, for our purposes we
take it as an intuitively clear concept.
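Before moving on, here is a small numerical illustration of the theorem (an addition to these notes; the field A = (x², yz, z) and the unit cube are arbitrary choices). Since ∇·A = 2x + z + 1, the volume integral over [0,1]³ is 5/2, and the total outward flux through the six faces should agree:

```python
import numpy as np

# Each component written so that it broadcasts over arrays of face coordinates.
Ax = lambda x, y, z: x**2 + 0*y + 0*z
Ay = lambda x, y, z: y*z  + 0*x
Az = lambda x, y, z: z    + 0*x + 0*y

n = 200
s = (np.arange(n) + 0.5) / n              # midpoint grid on [0, 1]
U, V = np.meshgrid(s, s, indexing="ij")
dA = (1.0 / n)**2                         # area element on each face

flux  = (Ax(1.0, U, V) - Ax(0.0, U, V)).sum() * dA   # x = 1 and x = 0 faces
flux += (Ay(U, 1.0, V) - Ay(U, 0.0, V)).sum() * dA   # y = 1 and y = 0 faces
flux += (Az(U, V, 1.0) - Az(U, V, 0.0)).sum() * dA   # z = 1 and z = 0 faces

print(flux)   # approximately 2.5, matching the volume integral of div A
```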
11
Stokes’ Theorem
Stokes’ Theorem is probably the most important of all vector integral theorems. In
fact, in its most general form (which is formulated in terms of differential forms on
a manifold, and is beyond the scope of these notes) it includes the classical Green’s
theorem and the divergence theorem as special cases. We begin with a discussion
of line integrals in R3 .
Suppose we have a vector field
A(x, y, z) = Ax (x, y, z)x̂ + Ay (x, y, z)ŷ + Az (x, y, z)ẑ
and a line element dr = x̂dx + ŷdy + ẑdz. Then the line integral of A along a curve
C is defined by
$$\int_C A\cdot dr = \int_C \bigl(A_x(x,y,z)\,dx + A_y(x,y,z)\,dy + A_z(x,y,z)\,dz\bigr)\,.$$
Example 20. Consider the two-dimensional vector field A(x, y) = x̂ cos x + ŷ sin x
and let C be the curve $C:\ \mathbf{r}(t) = t\,\hat x + t^2\,\hat y$ for $-1 \le t \le 2$. Then
$$\int_C A\cdot dr = \int_C A\cdot \mathbf{r}'(t)\,dt = \int_{-1}^{2}(\hat x\cos x + \hat y\sin x)\cdot(\hat x + 2t\,\hat y)\,dt = \int_{-1}^{2}(\cos t + 2t\sin t)\,dt$$
$$= \Bigl[\,3\sin t - 2t\cos t\,\Bigr]_{-1}^{2} = 3(\sin 2 + \sin 1) - 2(2\cos 2 + \cos 1) \approx 5.84$$
where in evaluating the integral we integrated by parts to write $\int t\sin t\,dt = -t\cos t + \int\cos t\,dt$.
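A quick trapezoid-rule check of this value (an addition; the number of sample points is arbitrary):

```python
import numpy as np

# Integrate A(r(t)) . r'(t) along r(t) = (t, t^2) for -1 <= t <= 2.
t = np.linspace(-1.0, 2.0, 200001)
dt = t[1] - t[0]
integrand = np.cos(t) + 2*t*np.sin(t)                        # A(r(t)) . r'(t)
value = np.sum((integrand[:-1] + integrand[1:]) / 2) * dt    # trapezoid rule

exact = 3*(np.sin(2) + np.sin(1)) - 2*(2*np.cos(2) + np.cos(1))
print(value, exact)                                          # both approximately 5.84
```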
Let us first consider the integral $\oint A\cdot dr$ around an infinitesimal path in the
xy-plane. (The circle on $\oint$ simply means that the integral is taken around a closed
path.)
[Figure: an infinitesimal rectangular path with sides dx and dy in the xy-plane, traversed counterclockwise as seen from the positive z-axis, with the field A(x, y, z) at its center.]
Note that the path is “positively oriented,” meaning that if your right hand has
its fingers in the direction of the path, then your thumb points along the positive
z-axis.
For this path we have
$$\oint A\cdot dr = \oint A_x\,dx + \oint A_y\,dy\,.$$
The first integral on the right-hand side has contributions at y − dy/2 and at y + dy/2:
$$\oint A_x\,dx = A_x(x, y - dy/2, z)\,dx - A_x(x, y + dy/2, z)\,dx = \left(A_x - \frac{\partial A_x}{\partial y}\frac{dy}{2}\right)dx - \left(A_x + \frac{\partial A_x}{\partial y}\frac{dy}{2}\right)dx$$
$$= -\frac{\partial A_x}{\partial y}\,dy\,dx$$
where the (−) sign between the two terms in the first line is because along the top
path we have dr = −dx x̂. Similarly,
$$\oint A_y\,dy = -A_y(x - dx/2, y, z)\,dy + A_y(x + dx/2, y, z)\,dy = \frac{\partial A_y}{\partial x}\,dx\,dy$$
so that
$$\oint A\cdot dr = \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)dx\,dy\,.$$
We now define the “vector” ∇ × A by
$$\nabla\times A = \left(\frac{\partial A_z}{\partial y} - \frac{\partial A_y}{\partial z}\right)\hat x + \left(\frac{\partial A_x}{\partial z} - \frac{\partial A_z}{\partial x}\right)\hat y + \left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right)\hat z = \begin{vmatrix} \hat x & \hat y & \hat z \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ A_x & A_y & A_z \end{vmatrix}$$
or
$$(\nabla\times A)_i = \varepsilon_{ijk}\,\frac{\partial}{\partial x^j}A_k\,.$$
The quantity ∇ × A is called the curl of A, and is frequently written curl A. If
∇ × A = 0, then we say that A is irrotational, and because of this, curl A is
sometimes called rot A.
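As a supplementary check (an addition; the sample field is an arbitrary choice), one can verify with sympy that the index formula $(\nabla\times A)_i = \varepsilon_{ijk}\,\partial A_k/\partial x^j$ reproduces the explicit component expressions above:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
A = [x*y*z, x + z**2, sp.sin(x*y)]          # sample components A_x, A_y, A_z
X = [x, y, z]

eps = sp.LeviCivita                          # Levi-Civita symbol eps(i, j, k)
curl_index = [sp.simplify(sum(eps(i, j, k) * sp.diff(A[k], X[j])
                              for j in range(3) for k in range(3)))
              for i in range(3)]

curl_explicit = [sp.diff(A[2], y) - sp.diff(A[1], z),
                 sp.diff(A[0], z) - sp.diff(A[2], x),
                 sp.diff(A[1], x) - sp.diff(A[0], y)]

print(all(sp.simplify(ci - ce) == 0 for ci, ce in zip(curl_index, curl_explicit)))  # True
```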
Returning to our integral above, we have $dx\,dy = \|dS\|$ where $dS = \hat z\,dS$, and
hence we can write our result in the form $\oint A\cdot dr = (\nabla\times A)\cdot dS$. This result
was derived for an infinitesimal element of area in the xy-plane. What about other
orientations of the area element? Well, we have shown that the gradient transforms
like a vector under rotations, as does the cross product of two vectors. This means
that the curl behaves like a vector. But we also know that the scalar product of
two vectors is a scalar, and thus
$$\oint A\cdot dr = (\nabla\times A)\cdot dS$$
is true independently of the orientation of the infinitesimal area element dS.
To handle a large area S, we do the obvious and break the area up into very
small rectangles, each traversed in the positive direction:
[Figure: a surface S subdivided into small positively oriented rectangular elements, with external boundary ∂S.]
For the ith area element we have
$$\oint A\cdot dr_i = (\nabla\times A)\cdot dS_i\,.$$
If we sum these over all elements i, then in the term $\sum_i\oint A\cdot dr_i$ from the left-hand
side, the $dr_i$'s in adjacent paths will cancel, so that we are left with $\sum_i\oint A\cdot dr_i =
\oint_{\partial S} A\cdot dr$ where the integral is taken over the external boundary ∂S. And summing
the terms on the right-hand side we clearly have $\sum_i(\nabla\times A)\cdot dS_i = \int_S(\nabla\times A)\cdot dS$.
In other words, we have (essentially) proved Stokes’ Theorem:
Theorem 20. Let S be a piecewise smooth oriented surface bounded by a simple,
closed piecewise smooth curve ∂S that also has positive orientation. If the vector
field A(x, y, z) has continuous first partial derivatives on an open set containing S,
then
$$\int_S(\nabla\times A)\cdot dS = \oint_{\partial S} A\cdot dr\,.$$
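Here is a small numerical illustration of the theorem (an addition to the notes; the planar field A = (xy, x², 0) and the unit square are arbitrary choices). Since (∇ × A)·ẑ = 2x − x = x, both sides should equal 1/2:

```python
import numpy as np

Ax = lambda x, y: x*y
Ay = lambda x, y: x**2 + 0*y      # +0*y so the lambda broadcasts over arrays

n = 1000
s = (np.arange(n) + 0.5) / n      # midpoint grid on [0, 1]
ds = 1.0 / n
X, Y = np.meshgrid(s, s, indexing="ij")

surface_side = X.sum() * ds**2    # integral of (curl A).zhat = x over the square

# counterclockwise boundary: bottom (y=0, +x), right (x=1, +y), top (y=1, -x), left (x=0, -y)
line_side = (Ax(s, 0.0).sum() + Ay(1.0, s).sum()
             - Ax(s, 1.0).sum() - Ay(0.0, s).sum()) * ds

print(surface_side, line_side)    # both approximately 0.5
```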
12
Curvilinear Coordinates
We now want to take a look at how to formulate the gradient, divergence, curl and
laplacian in a general curvilinear coordinate system in R3 . Throughout this section,
we refrain from using the summation convention, so that repeated indices do not
imply a summation.
To begin with, we assume that we have three families of mutually orthogonal
surfaces which we label by the variables q1 , q2 , q3 . These are called the curvilinear
coordinates of a point in R3 . (By mutually orthogonal surfaces, we mean that at
each point in the region under consideration, the three coordinate curves intersect
in a mutually orthogonal manner.)
Let there be an infinitesimal region under consideration, and let dli be an element of length perpendicular to the surface qi = const. The length dli is really
the distance between surfaces at qi and qi + dqi in the infinitesimal region under
consideration, and in general can be written in the form
dli = hi dqi
where hi is in general a function of the coordinates q1 , q2 , q3 .
We also have the three unit vectors ûi defined by (see Section 8)
$$\hat u_i = \frac{\partial \mathbf{x}/\partial q_i}{\|\partial \mathbf{x}/\partial q_i\|}\,.$$
Each ûi is orthogonal to the surface of constant qi . (For example, think of x̂ as
being orthogonal to a surface of constant x.) However, be sure to realize that
their orientation in space depends on their location, except in the particular case
of rectangular coordinates. Thus a general displacement dx can be written
dx = dl1 û1 + dl2 û2 + dl3 û3
= h1 dq1 û1 + h2 dq2 û2 + h3 dq3 û3
and the volume element is dl1 dl2 dl3 or
d3 x = h1 h2 h3 dq1 dq2 dq3 .
If we write dli = hi dqi ûi , then recalling the scalar triple product defined in Section
2, we can write
$$d^3x = {\rm Vol}(dl_1, dl_2, dl_3) = |(h_1 dq_1\,\hat u_1 \times h_2 dq_2\,\hat u_2)\cdot(h_3 dq_3\,\hat u_3)| = \begin{vmatrix} h_1 dq_1 & 0 & 0 \\ 0 & h_2 dq_2 & 0 \\ 0 & 0 & h_3 dq_3 \end{vmatrix} = h_1 h_2 h_3\,dq_1\,dq_2\,dq_3\,.$$
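As an illustration (an addition to the notes), the scale factors for spherical coordinates can be computed directly from x(q) with sympy; the result is (h_r, h_θ, h_φ) = (1, r, r sin θ), so that d³x = r² sin θ dr dθ dφ:

```python
import sympy as sp

# Spherical coordinates (q1, q2, q3) = (r, theta, phi).
r, th, ph = sp.symbols('r theta phi', positive=True)
x = sp.Matrix([r*sp.sin(th)*sp.cos(ph),
               r*sp.sin(th)*sp.sin(ph),
               r*sp.cos(th)])

# h_i^2 = (dx/dq_i) . (dx/dq_i); taking square roots (with 0 < theta < pi)
# gives h_r = 1, h_theta = r, h_phi = r sin(theta).
h_squared = [sp.simplify(x.diff(q).dot(x.diff(q))) for q in (r, th, ph)]
print(h_squared)          # [1, r**2, r**2*sin(theta)**2]
```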
Now suppose we have a scalar function f = f (q1 , q2 , q3 ). Then its differential is
given by
$$df = \frac{\partial f}{\partial q_1}dq_1 + \frac{\partial f}{\partial q_2}dq_2 + \frac{\partial f}{\partial q_3}dq_3\,.$$
But dqi = (1/hi )dli so we have
$$df = \frac{1}{h_1}\frac{\partial f}{\partial q_1}dl_1 + \frac{1}{h_2}\frac{\partial f}{\partial q_2}dl_2 + \frac{1}{h_3}\frac{\partial f}{\partial q_3}dl_3$$
and in terms of dx = dl1 û1 + dl2 û2 + dl3 û3 this may be written as df = ∇f · dx
where ∇f is defined by
$$\nabla f = \frac{1}{h_1}\frac{\partial f}{\partial q_1}\hat u_1 + \frac{1}{h_2}\frac{\partial f}{\partial q_2}\hat u_2 + \frac{1}{h_3}\frac{\partial f}{\partial q_3}\hat u_3 \qquad (35)$$
or
$$\nabla f = \frac{\partial f}{\partial l_1}\hat u_1 + \frac{\partial f}{\partial l_2}\hat u_2 + \frac{\partial f}{\partial l_3}\hat u_3\,.$$
For example, recall from the beginning of Section 8 that in spherical coordinates
we had dx = dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂. Then a straightforward application of
equation (35) again yields
$$\nabla = \hat r\,\frac{\partial}{\partial r} + \hat\theta\,\frac{1}{r}\frac{\partial}{\partial\theta} + \hat\varphi\,\frac{1}{r\sin\theta}\frac{\partial}{\partial\varphi}\,.$$
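A supplementary sympy check of equation (35) in spherical coordinates (an addition; the sample scalar f = r³ cos θ, i.e. z(x² + y² + z²) in cartesian form, is an arbitrary choice):

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
x = r*sp.sin(th)*sp.cos(ph); y = r*sp.sin(th)*sp.sin(ph); z = r*sp.cos(th)

# components of grad f in the (rhat, thetahat, phihat) basis according to (35)
f = r**3 * sp.cos(th)
grad_sph = sp.Matrix([sp.diff(f, r), sp.diff(f, th)/r, sp.diff(f, ph)/(r*sp.sin(th))])

# cartesian gradient of the same scalar, projected onto the spherical unit vectors
X, Y, Z = sp.symbols('X Y Z')
f_cart = Z*(X**2 + Y**2 + Z**2)
grad_cart = sp.Matrix([sp.diff(f_cart, v) for v in (X, Y, Z)]).subs({X: x, Y: y, Z: z})

rhat  = sp.Matrix([sp.sin(th)*sp.cos(ph), sp.sin(th)*sp.sin(ph),  sp.cos(th)])
thhat = sp.Matrix([sp.cos(th)*sp.cos(ph), sp.cos(th)*sp.sin(ph), -sp.sin(th)])
phhat = sp.Matrix([-sp.sin(ph), sp.cos(ph), 0])

proj = sp.Matrix([rhat.dot(grad_cart), thhat.dot(grad_cart), phhat.dot(grad_cart)])
print(sp.simplify(grad_sph - proj))   # the zero vector
```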
Now for the divergence of a vector field A. We do exactly what we did earlier
and compute the flux dφ = A · dS = A · n̂ dS thru the faces of a small volume
element centered at the point (q1 , q2 , q3 ), with h-values h1 , h2 , h3 at the center.
[Figure: an infinitesimal curvilinear volume element with edge lengths dl1, dl2, dl3 along the q1, q2, q3 directions, with the field A(q1, q2, q3) at its center.]
On the right-hand side of the element we have n̂ = û2 so (recall dφ = A · n̂ dS
where dS = dli dlj = hi hj dqi dqj is evaluated on the surface of the element)
$$
\begin{aligned}
d\phi_R &= A(q_1, q_2 + dq_2/2, q_3)\cdot\hat u_2\,dl_1\,dl_3 \\
&= \left(A_2 + \frac{\partial A_2}{\partial q_2}\frac{dq_2}{2}\right)\left(h_1 + \frac{\partial h_1}{\partial q_2}\frac{dq_2}{2}\right)\left(h_3 + \frac{\partial h_3}{\partial q_2}\frac{dq_2}{2}\right)dq_1\,dq_3 \\
&= A_2 h_1 h_3\,dq_1\,dq_3 + \frac{\partial}{\partial q_2}(A_2 h_1 h_3)\,dq_1\,\frac{dq_2}{2}\,dq_3
\end{aligned}
$$
where we have kept terms thru 3rd order in dqi .
On the left-hand side we have n̂ = −û2 so
$$
\begin{aligned}
d\phi_L &= -A(q_1, q_2 - dq_2/2, q_3)\cdot\hat u_2\,dl_1\,dl_3 \\
&= -\left(A_2 - \frac{\partial A_2}{\partial q_2}\frac{dq_2}{2}\right)\left(h_1 - \frac{\partial h_1}{\partial q_2}\frac{dq_2}{2}\right)\left(h_3 - \frac{\partial h_3}{\partial q_2}\frac{dq_2}{2}\right)dq_1\,dq_3 \\
&= -A_2 h_1 h_3\,dq_1\,dq_3 + \frac{\partial}{\partial q_2}(A_2 h_1 h_3)\,dq_1\,\frac{dq_2}{2}\,dq_3\,.
\end{aligned}
$$
The net outward flux thru these faces is then given by
$$d\phi_2 = d\phi_R + d\phi_L = \frac{\partial}{\partial q_2}(A_2 h_1 h_3)\,dq_1\,dq_2\,dq_3\,.$$
We repeat this for the other two pairs of faces (you can just cyclically permute
the indices 1 → 2 → 3 → 1) to get the total net outward flux. By definition, the net
outward flux per unit volume is the divergence, so using d3 x = h1 h2 h3 dq1 dq2 dq3 we
obtain
$$\nabla\cdot A = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}(A_1 h_2 h_3) + \frac{\partial}{\partial q_2}(A_2 h_3 h_1) + \frac{\partial}{\partial q_3}(A_3 h_1 h_2)\right]\,. \qquad (36)$$
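Specializing equation (36) to spherical coordinates, where (h₁, h₂, h₃) = (1, r, r sin θ), should reproduce the familiar spherical divergence. The following sympy sketch (an addition; A_r, A_θ, A_φ are arbitrary functions of the coordinates) confirms this:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
A1, A2, A3 = [sp.Function(n)(r, th, ph) for n in ('A_r', 'A_theta', 'A_phi')]
h1, h2, h3 = 1, r, r*sp.sin(th)

# equation (36) with spherical scale factors
div_36 = (sp.diff(A1*h2*h3, r) + sp.diff(A2*h3*h1, th) + sp.diff(A3*h1*h2, ph)) / (h1*h2*h3)

# standard spherical-coordinate divergence
div_std = (sp.diff(r**2*A1, r)/r**2
           + sp.diff(sp.sin(th)*A2, th)/(r*sp.sin(th))
           + sp.diff(A3, ph)/(r*sp.sin(th)))

print(sp.simplify(div_36 - div_std))   # 0
```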
Next we look at curl A. We have seen that $\oint A\cdot dr = (\nabla\times A)\cdot dS$ is true
independently of the orientation of dS. Because of this, we can define the curl by
$$(\nabla\times A)_n = \lim_{S\to 0}\frac{1}{S}\oint A\cdot dr$$
where (∇ × A)n = (∇ × A) · n̂ is the component of the curl normal to dS = n̂ dS.
Let n̂ = û1 and consider this path of integration:
[Figure: a small closed rectangular path of integration in the q2–q3 surface with n̂ = û1, made up of four segments labeled (a), (b), (c), (d).]
For this path we have $dS = dl_2\,dl_3 = h_2 h_3\,dq_2\,dq_3$. The segments (a)–(d) have the
following contributions to the integral $\oint A\cdot dr = \oint A\cdot dl$:
(a) On path (a) we have dl = −dl3 û3 = −h3 dq3 û3 and hence
A · dl = −A3 dl3 = −A3 h3 dq3 .
(b) On path (b) we have dl = dl3 û3 = h3 dq3 û3 and we must evaluate both A
and h3 at q2 + dq2 . Then
$$A\cdot dl = \left(A_3 + \frac{\partial A_3}{\partial q_2}dq_2\right)\left(h_3 + \frac{\partial h_3}{\partial q_2}dq_2\right)dq_3 = A_3 h_3\,dq_3 + \frac{\partial}{\partial q_2}(A_3 h_3)\,dq_2\,dq_3$$
where we only keep terms thru second order in dqi .
(c) Path (c) is again easy because dl = dl2 û2 = h2 dq2 û2 so
A · dl = A2 dl2 = A2 h2 dq2 .
(d) Here we have dl = −dl2 û2 = −h2 dq2 û2 where A2 and h2 must be evaluated
at q3 + dq3 . Then as for path (b) we have
$$A\cdot dl = -\left(A_2 + \frac{\partial A_2}{\partial q_3}dq_3\right)\left(h_2 + \frac{\partial h_2}{\partial q_3}dq_3\right)dq_2 = -A_2 h_2\,dq_2 - \frac{\partial}{\partial q_3}(A_2 h_2)\,dq_2\,dq_3\,.$$
Adding up these four contributions and using the above definition of curl we obtain
$$
(\nabla\times A)_1 = \frac{1}{h_2 h_3\,dq_2\,dq_3}\left[\frac{\partial}{\partial q_2}(A_3 h_3)\,dq_2\,dq_3 - \frac{\partial}{\partial q_3}(A_2 h_2)\,dq_2\,dq_3\right]
= \frac{1}{h_2 h_3}\left[\frac{\partial}{\partial q_2}(A_3 h_3) - \frac{\partial}{\partial q_3}(A_2 h_2)\right]\,.
$$
The corresponding results for the other two coordinate surfaces come from letting
1 → 2 → 3 → 1 again, and we can write these results simply in the form of a
determinant:
$$\nabla\times A = \frac{1}{h_1 h_2 h_3}\begin{vmatrix} h_1\hat u_1 & h_2\hat u_2 & h_3\hat u_3 \\ \partial/\partial q_1 & \partial/\partial q_2 & \partial/\partial q_3 \\ h_1 A_1 & h_2 A_2 & h_3 A_3 \end{vmatrix}\,. \qquad (37)$$
This is the general expression for curl A in curvilinear coordinates.
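As with the divergence, we can specialize (37) to spherical coordinates and compare with the standard result; the sketch below (an addition; the component functions are arbitrary) checks the q₁ (radial) component:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
Ar, Ath, Aph = [sp.Function(n)(r, th, ph) for n in ('A_r', 'A_theta', 'A_phi')]
h1, h2, h3 = 1, r, r*sp.sin(th)

# q1-component of (37): (1/(h2 h3)) [ d(h3 A3)/dq2 - d(h2 A2)/dq3 ]
curl_r_37  = (sp.diff(h3*Aph, th) - sp.diff(h2*Ath, ph)) / (h2*h3)

# standard r-component of the curl in spherical coordinates
curl_r_std = (sp.diff(sp.sin(th)*Aph, th) - sp.diff(Ath, ph)) / (r*sp.sin(th))

print(sp.simplify(curl_r_37 - curl_r_std))   # 0
```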
Finally, the laplacian is defined by ∇2 f = ∇ · ∇f , so we can use our previous
results, equations (35) and (36), to write
$$\nabla^2 f = \frac{1}{h_1 h_2 h_3}\left[\frac{\partial}{\partial q_1}\!\left(\frac{h_2 h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\!\left(\frac{h_3 h_1}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\!\left(\frac{h_1 h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right]\,. \qquad (38)$$
Note that the three terms are cyclic permutations of each other.
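The same specialization to spherical coordinates works for (38); the following sympy sketch (an addition; f is an arbitrary function of the coordinates) confirms that it reproduces the familiar spherical laplacian:

```python
import sympy as sp

r, th, ph = sp.symbols('r theta phi', positive=True)
f = sp.Function('f')(r, th, ph)
h1, h2, h3 = 1, r, r*sp.sin(th)

# equation (38) with spherical scale factors
lap_38 = (sp.diff(h2*h3/h1 * sp.diff(f, r), r)
          + sp.diff(h3*h1/h2 * sp.diff(f, th), th)
          + sp.diff(h1*h2/h3 * sp.diff(f, ph), ph)) / (h1*h2*h3)

# standard spherical-coordinate laplacian
lap_std = (sp.diff(r**2 * sp.diff(f, r), r)/r**2
           + sp.diff(sp.sin(th)*sp.diff(f, th), th)/(r**2*sp.sin(th))
           + sp.diff(f, ph, 2)/(r**2*sp.sin(th)**2))

print(sp.simplify(lap_38 - lap_std))   # 0
```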