Further
Engineering Mathematics
(Algebra and Multivariable Calculus)
J. N. RIDLEY
MATH2011/2/4
© J. N. Ridley
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, in any
form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior
written permission of the publisher.
First printed 1999
Revised 2016
The cover picture represents a Fourier series approximation to the sawtooth function
arctan(tan x) for −2π ≤ x ≤ 2π.
TABLE OF CONTENTS

Calculus Chapter 1 — Differential equations
    Linear equations and operators
    D-operator methods
    Complex exponentials
    Stability
    Answers

Calculus Chapter 2 — Vector functions of a scalar
    Vector differentiation
    Curvature
    Torsion
    Trajectories and orthogonal trajectories
    Answers

Calculus Chapter 3 — Scalar and vector fields
    Scalar fields and quadric surfaces
    Directional derivatives
    The operator del and vector fields
    Potential functions
    Stationary points and optimization
    Answers

Calculus Chapter 4 — Vector integration
    Path integrals in scalar fields
    Path integrals in vector fields
    Double and repeated integrals
    Change of variables in double integrals
    Green’s theorem
    Answers

Algebra Chapter 1 — Complex numbers
    Real-imaginary form
    The complex plane
    Modulus-argument form
    Euler’s formula
    Roots and polynomials
    Complex exponentials, logarithms, and powers
    Answers

Algebra Chapter 2 — Convergence of series
    Indeterminate forms
    Convergence of series I
    Convergence of series II
    Convergence of series III
    Answers

Algebra Chapter 3 — Linear algebra
    Linear spaces
    Bases and dimension
    Independence and rank
    Eigenvalues and eigenvectors
    Diagonalization
    The characteristic polynomial
    Answers

Algebra Chapter 4 — Orthonormality
    Dot products and orthonormal bases
    Unitary and hermitian matrices
    Applications
    Fourier series
    Answers
Calculus Chapter 1
Differential equations
Linear equations and operators
Differential equations were introduced in first year, together with techniques for solving
first order equations of the types variables separable, homogeneous, linear, and exact. The
differential equation y ′ + p(t)y = q(t) is called linear (we shall use t rather than x because
in practical examples it is usually time), and simultaneous algebraic equations, which can
be written as a single matrix equation AX = B, are also called linear. What do these
different situations have in common, to justify using the same word?
One clue can be found by considering the form of the solutions. As shown previously, the general solution of the linear differential equation above is y = µ⁻¹ ∫ µq dt + cµ⁻¹, where µ = e∫p(t) dt is the integrating factor and c is an arbitrary constant. The general solution can be written as a sum y = y1 + y0 , where y1 = µ⁻¹ ∫ µq dt and y0 = cµ⁻¹.
Now y1 is a particular solution of the given equation, since it is obtained by putting c = 0.
Thus we have y1′ + p(t)y1 = q(t) (see Question 1 below). However, what happens if we substitute y0 in the left hand side of the equation? Note that ln µ = ∫ p(t) dt, so µ′/µ = p(t), and

y0′ + p(t)y0 = −cµ⁻²µ′ + cµ⁻¹p(t) = cµ⁻¹(p(t) − µ′/µ) = 0.
Thus y0 is a solution of the equation with zero on the right hand side, and it is a general
solution of this equation because it involves the arbitrary constant c. This means that for
any first order linear differential equation we can split the general solution into a sum
(Particular solution of given equation) + (General solution of equation with 0 on RHS).
(Sometimes the equation with 0 on the right hand side is called “homogeneous” because it is
of degree one (linear) in y and its derivative. This must not be confused with homogeneous
differential equations in which every term is of the same total degree in x and y.)
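The split above can be checked symbolically. The following sympy sketch verifies that y1 solves the given equation and y0 the homogeneous one; the choices p(t) = 2t and q(t) = e^(−t²), and sympy itself, are assumptions for illustration, not from the text.

```python
# Sketch: verify y = y1 + y0 for y' + p(t)y = q(t), with the
# illustrative (assumed) choices p(t) = 2t and q(t) = exp(-t^2).
import sympy as sp

t, c = sp.symbols('t c')
p = 2*t
q = sp.exp(-t**2)

mu = sp.exp(sp.integrate(p, t))     # integrating factor mu = e^(int p dt)
y1 = sp.integrate(mu*q, t) / mu     # particular solution of y' + p y = q
y0 = c / mu                         # general solution of y' + p y = 0

# y1 solves the given equation, y0 solves the homogeneous one:
assert sp.simplify(sp.diff(y1, t) + p*y1 - q) == 0
assert sp.simplify(sp.diff(y0, t) + p*y0) == 0
```

Here y1 works out to t e^(−t²), and the general solution is y1 + y0 exactly as described.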
Similarly, the solution of the matrix equation AX = B can be split into X = X1 + X0 ,
where X1 is a particular solution of the given equation, and X0 , which contains all arbitrary
constants, is the general solution of the equation with 0 on the right hand side.
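The same split X = X1 + X0 can be seen concretely in sympy; the singular matrix A and vector B below are assumed illustrative values, not one of the exercises.

```python
# Sketch: X = X1 + X0 for A X = B with an assumed singular matrix A.
import sympy as sp

A = sp.Matrix([[1, 2], [2, 4]])
B = sp.Matrix([3, 6])

X, params = A.gauss_jordan_solve(B)      # general solution, with free parameters
X1 = X.subs({tau: 0 for tau in params})  # particular solution: parameters set to 0
X0 = sp.expand(X - X1)                   # homogeneous part, carrying the parameters

assert A*X1 == B                          # X1 solves the given equation
assert sp.expand(A*X0) == sp.zeros(2, 1)  # X0 solves the homogeneous equation
```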
Now we have seen a similarity in the general form of the solutions, but what is the
underlying reason for it? The essence lies in what is called a linear operator, or linear
process or linear system. We have met the idea of a function as something, like a button
on a calculator, that takes an input number and converts it into an output number:
x −→ f −→ f (x).
An operator is more general than a function, in that its input and output need not simply
be numbers, but can be more general mathematical objects, e.g. vectors or even functions
themselves. For example, the differential equation above is associated with an operator
that takes an input function y(t) and converts it into the output function y ′ (t) + p(t)y(t).
y(t) −→ [ ] −→ y ′ (t) + p(t)y(t).
(There is no need at the moment to give the operator a name.) Such operators arise frequently in chemical processes, in vibrations, and in electrical analogues of these. Similarly,
the matrix equation is associated with the operator that takes an input n × 1 vector X
and multiplies it by the constant m × n matrix A to give the m × 1 output vector AX.
X −→ [ A ] −→ AX.
With operators, as with functions, there is a unique and well defined output for every
appropriate input. Solving an operator equation amounts to solving the inverse problem,
associated with the inverse operator, which is obtained by reversing the arrows, or
swopping input and output, thus:
y ′ + p(t)y = q(t) −→ [ ]−1 −→ y = ?      or      AX = B −→ [ A ]−1 −→ X = ?.
For example, solving the differential equation y ′ + p(t)y = q(t) is the same as asking,
“Given that y ′ + p(t)y = q(t), what is y?”, or “If q(t) is put into the inverse operator, what
function y will come out?”. Similarly, solving the algebraic equation AX = B is the same
as asking, “Given that AX = B, what is X?”, or “If B is put into the inverse operator,
what vector X will come out?”. However, with inverse operators, the output (or solution
of the equation) is not usually unique, and sometimes does not even exist, as has been
seen before for linear algebraic equations.
We now restrict ourselves to linear operators. An operator is said to be linear if the
output is proportional to the input, and if two inputs are combined, then the resulting output is obtained by combining the separate outputs. To express this more mathematically,
suppose we have an operator P that takes an input y to the output denoted P y:
y −→ [ P ] −→ P y.
Then the operator P is defined to be linear if
(1) P (cy) = c(P y) for any constant c,
(2) P (y1 + y2 ) = P y1 + P y2 for any two inputs y1 and y2 .
The rules for differentiation and for matrix algebra show immediately that the two operators considered above are linear.
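These two rules can be tested mechanically. A sympy sketch checking them for the operator y −→ y′ + p(t)y, with a non-linear operator for contrast (the setup is illustrative, not from the text):

```python
# Sketch: check linearity rules (1) and (2) for P: y -> y' + p(t)y.
import sympy as sp

t, c = sp.symbols('t c')
p = sp.Function('p')(t)
u = sp.Function('u')(t)
v = sp.Function('v')(t)

P = lambda y: sp.diff(y, t) + p*y

assert sp.simplify(P(c*u) - c*P(u)) == 0            # rule (1): scaling
assert sp.simplify(P(u + v) - (P(u) + P(v))) == 0   # rule (2): additivity

# A non-example: Q(y) = y^2 fails additivity (the difference is 2uv).
Q = lambda y: y**2
assert sp.simplify(Q(u + v) - (Q(u) + Q(v))) != 0
```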
The null space of a linear operator P consists of all y0 such that P y0 = 0. Thus saying
that y0 is in the null space of P means the same as saying that y0 is a solution of the
homogeneous equation. If y0 is in the null space of P , then, roughly speaking, y0 = P −1 0,
so y0 is sometimes called a zero-input solution, since it can arise when zero is input to
the inverse operator. Another name is to say that y0 is a natural response of the system,
since it can occur even if there are no external influences.
Theorem (Superposition Principle). If x0 and y0 are in the null space of a linear
operator P , then the sum x0 + y0 and any constant multiple cy0 are also in the null space
of P .
Proof. Since x0 and y0 are in the null space of P , we have P x0 = 0 = P y0 ; then by
linearity
P (x0 + y0 ) = P x0 + P y0 = 0 + 0 = 0 and P (cy0 ) = c(P y0 ) = c0 = 0,
so x0 + y0 and cy0 are also in the null space.
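A minimal sympy sketch of the Superposition Principle for the operator X −→ AX, where A is an assumed singular example:

```python
# Sketch: sums and multiples of null-space vectors stay in the null space.
import sympy as sp

c = sp.symbols('c')
A = sp.Matrix([[1, 2], [2, 4]])
x0 = A.nullspace()[0]            # a basis vector of the null space of A

assert A*x0 == sp.zeros(2, 1)
assert A*(x0 + x0) == sp.zeros(2, 1)          # sums stay in the null space
assert sp.expand(A*(c*x0)) == sp.zeros(2, 1)  # multiples stay in the null space
```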
Theorem. If P is a linear operator, then the general solution of the equation P y = b
can be written in the form y = y1 + y0 , where y1 is any particular solution of the given
equation, and y0 is a general element of the null space.
Proof. There are two things to prove: firstly that anything of the form y1 +y0 is a solution,
and secondly that any solution is of the form y1 + y0 . Firstly, suppose y = y1 + y0 , where
y1 is a particular solution of the given equation, and y0 is in the null space. This means
that P y1 = b and P y0 = 0. Then
P y = P (y1 + y0 ) = P y1 + P y0      (by linearity)
    = b + 0 = b,
so y is a solution. Secondly, suppose we have any solution y, as well as our particular
solution y1 . This means that P y = b and also P y1 = b. Let y0 = y − y1 ; then
P y0 = P (y − y1 ) = P y − P y1      (by linearity)
     = b − b = 0,
so y0 is in the null space, and clearly y = y0 + y1 .
For a linear differential equation P y = b, the particular solution y1 satisfying y1 = y1′ =
y1′′ = · · · = 0 at time t = 0 is called the zero-state solution with input b, since it
arises when the external force b is applied to a system at rest. Any particular solution,
whatever its initial state, is also called a forced response of the system. It follows from
this theorem that the general solution is the sum of the zero-state solution and the general
zero-input solution.
Tutorial questions — Linear equations and operators
1. Show that if y1 = µ⁻¹ ∫ µq(t) dt, where µ = e∫p(t) dt , then y1 is a solution of the first order linear differential equation y ′ + p(t)y = q(t). (Hint: write µy1 = ∫ µq(t) dt, then differentiate both sides, and use the fact that p(t) = µ′/µ.)
2. Solve the first order linear differential equations below, and show that the general solution is the sum of a particular solution of the given equation and a general solution of the homogeneous equation.

(a) y ′ + 2ty = e−t²      (b) dy/dt − y/(t2 + 1) = 1/(t2 + 1).

3. Use Gaussian elimination (and complex algebra) to solve the matrix equations below,
where Z is an unknown vector in C2 or C3 . Write the general solution in the form
Z = Z1 + Z0 , where Z1 is a particular solution of the given equation (with no arbitrary
constants) and Z0 is a general solution of the homogeneous equation (including arbitrary
constants).
(a) [ 1   i ] Z = [ 1 + i ]      (b) [ 1  i  1+i ] Z = [  1  ]
    [ i  −1 ]     [ −1 + i ]         [ i  0   1  ]     [ 2−i ].
4. The following list gives the outputs P (y) obtained when different operators P act on
the general function y = y(t). Simplify P (y1 + y2 ) and P y1 + P y2 , and see if they
are equal. Similarly, compare P (cy) (where c is constant) and c(P y). Hence determine
which operators are linear.
(a) P (y) = 2y      (b) P (y) = y2      (c) P (y) = yy ′      (d) P (y) = ty ′ + y.
5. The following list gives the outputs P X obtained when different operators P act on the
general 2 × 1 column vector X. Compare P (X1 + X2 ) with P X1 + P X2 and P (cX)
with c(P X). Hence determine which operators are linear.
(a) P X = 2X      (b) P X = |X|      (c) P X = X T      (d) P X = X T X.
6. (a) Use the rules for differentiation and integration to show that the differentiation operator (y −→ y ′ ) and the definite integral operator (y −→ ∫ₐᵇ y dt) are linear operators.
(b) Use the rules of matrix algebra to show that multiplication by a fixed matrix A (i.e. X −→ AX) is a linear operator.
7. Find the null spaces of the linear operators in Question 4. (Hint: put the given output
equal to 0, and solve for y.)
D-operator methods
In this section we shall use linearity to solve nth order linear differential equations
with constant coefficients. The general form is
an dny/dtn + an−1 dn−1y/dtn−1 + · · · + a2 d2y/dt2 + a1 dy/dt + a0 y = f (t),
where y = y(t) and the coefficients an , . . . , a0 are real constants. Since the leading coefficient an must be non-zero, we can take it as equal to 1, by dividing through by it if
necessary. We use D to denote the differentiation operator d/dt, so we can write the equation as
Dn y + an−1 Dn−1 y + · · · + a2 D2 y + a1 Dy + a0 y = f (t), or P (D)y = f (t),
where the operator P (D) = Dn + an−1 Dn−1 + · · · + a2 D2 + a1 D + a0 . P (D) is called
a D-operator and has the form of a polynomial in D. It is easy to show that every
such D-operator is linear, since differentiation and multiplication by constants are linear
operators.
A special property of D-operators, which is true only because the coefficients are all
constant, is that if we operate first by P (D) and then by Q(D), the result is the same
as multiplying the two polynomials and then operating by their product P (D)Q(D). It
is also the same as first operating by Q(D) and then by P (D). This can be written
mathematically as
P (D)(Q(D)y) = (P (D)Q(D))y = Q(D)(P (D)y),
or diagrammatically, that the three processes below are identical:

y −→ [ P (D) ] −→ [ Q(D) ] −→ P (D)Q(D)y,
y −→ [ P (D)Q(D) ] −→ P (D)Q(D)y,
y −→ [ Q(D) ] −→ [ P (D) ] −→ P (D)Q(D)y.
This is analogous to the result that the mixed second partial derivatives of a function of
two variables are equal, i.e. that
∂2z/∂x∂y = ∂2z/∂y∂x,
which means that the output is the same if we operate on z first by ∂/∂y and then by ∂/∂x,
or the other way round. The importance
of this property is that a D-operator can be factorized, just like any polynomial, and the
effect is the same as if we operate by the factors one at a time. By the Fundamental
Theorem of Algebra we can assume that P (D) can be factorized into factors of degree one,
though complex coefficients may be required.
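This commuting property can be sketched in sympy for the assumed factors D − 1 and D − 2 (an illustrative choice, not from the text):

```python
# Sketch: applying (D-1) after (D-2) matches the product operator D^2 - 3D + 2,
# and the other order gives the same result.
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')(t)
D = lambda f: sp.diff(f, t)

u = D(y) - 2*y                    # first operate by (D - 2)
one_at_a_time = D(u) - u          # then by (D - 1)
product = D(D(y)) - 3*D(y) + 2*y  # operate by (D - 1)(D - 2) = D^2 - 3D + 2
assert sp.simplify(one_at_a_time - product) == 0

w = D(y) - y                      # the reverse order: (D - 2) after (D - 1)
assert sp.simplify((D(w) - 2*w) - product) == 0
```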
It follows that the solution of D-operator equations can be found by repeated first
order methods. Firstly, if P (D) itself is of degree one, say P (D) = D − α, then we have
(D − α)y = f (t), which is a first order linear differential equation with integrating factor
e−αt . The solution is

y = eαt ∫ e−αt f (t) dt + Aeαt .

It is easy to verify that eαt ∫ e−αt f (t) dt is a particular solution and that Aeαt is in the
null space of D − α, as the theory predicts. Secondly, if P (D) is of degree two, say
P (D) = (D − α)(D − β), then we substitute u = (D − β)y. This converts P (D)y = f (t)
into (D − α)u = f (t), which can be solved for u. We then find y by solving (D − α)y = u,
since u is now known.
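The two-step method can be sketched in sympy; the factors D − 1 and D − 2 and the forcing function e3t below are assumed illustrative choices.

```python
# Sketch: solve (D-1)(D-2)y = e^{3t} by two first order steps.
import sympy as sp

t = sp.symbols('t')
f = sp.exp(3*t)

# Step 1: (D - 1)u = f, using the integrating factor e^{-t}:
u = sp.exp(t) * sp.integrate(sp.exp(-t)*f, t)
# Step 2: (D - 2)y = u, using the integrating factor e^{-2t}:
y = sp.exp(2*t) * sp.integrate(sp.exp(-2*t)*u, t)

# y is then a particular solution of y'' - 3y' + 2y = e^{3t}:
assert sp.simplify(sp.diff(y, t, 2) - 3*sp.diff(y, t) + 2*y - f) == 0
```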
Repeated first order methods are very tedious in practice for equations of order higher
than two, so it is worthwhile finding a general solution technique. From the theory of
linear operators we can break it into two parts:
• find a general function in the null space, and
• find a particular solution.
The general solution will then be the sum of these two. The particular solution y1 is usually
called a particular integral, since it is the output of an inverse differentiation operator.
The general function y0 in the null space is called the complementary function, since
it completes the solution by being added to the particular integral. Thus y = y1 + y0 , as
before, or
General Solution = Particular Integral + Complementary Function
= Zero-state Solution + Zero-input Solution
= Forced Response + Natural Response
Firstly, to find the complementary function, we have to solve the homogeneous equation
P (D)y0 = 0. If P (D) is of degree one, say P (D) = D − α, we showed above that the
solution of (D − α)y0 = 0 is y0 = Aeαt , where A is an arbitrary constant. Next suppose
P (D) is of degree two, say P (D) = (D − α)(D − β). By putting u0 = (D − β)y0 we obtain
(D − α)u0 = 0, so u0 = Ceαt say, where C is arbitrary. We must now solve the first order
equation (D − β)y0 = Ceαt , which gives
y0 = eβt ∫ e−βt Ceαt dt + Beβt .

The integral is equal to (C/(α − β)) e(α−β)t if α ≠ β, and to Ct if α = β. Thus the solution of (D − α)(D − β)y0 = 0 is

y0 = Aeαt + Beβt      if α ≠ β,
y0 = (At + B)eαt      if α = β,
where A and B are new arbitrary constants. This result can easily be generalized for
higher order operators, and the following can be proved by induction if necessary.
Theorem. If P (D) = (D − α)k (D − β)l . . . , where α, β, . . . are all distinct, then a general
function y0 in the null space of P (D) has the form
y0 = (A1 + A2 t + · · · + Ak tk−1 )eαt + (B1 + B2 t + · · · + Bl tl−1 )eβt + · · · ,
where A1 , . . . , Ak , B1 , . . . , Bl , . . . are arbitrary constants, equal in number to the degree of
P (D).
Notice how each k-fold factor (D − α)k of P (D) contributes the exponential eαt multiplied
by a polynomial with k arbitrary coefficients. The simplest case is when no factors are
repeated, i.e. k = l = · · · = 1, and P (D) = (D − α)(D − β) . . . . The complementary
function is then simply
y0 = Aeαt + Beβt + · · · ,
with one arbitrary constant for each exponential and no powers of t appearing.
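A sympy sketch of this theorem, building y0 from the roots of the assumed example P (s) = (s − 1)2 (s + 2) and checking that P (D)y0 = 0:

```python
# Sketch: build the complementary function of P(D) = (D-1)^2 (D+2) from the
# roots of P(s); each root of multiplicity k contributes a polynomial of
# degree k-1 times an exponential.
import sympy as sp

s, t = sp.symbols('s t')
P = sp.Poly((s - 1)**2 * (s + 2), s)   # expands to s^3 - 3s + 2

y0, i = 0, 0
for root, mult in sp.roots(P).items():
    for j in range(mult):
        y0 += sp.symbols(f'A{i}') * t**j * sp.exp(root*t)
        i += 1

# Check y0 is in the null space of P(D) = D^3 - 3D + 2:
residual = sp.diff(y0, t, 3) - 3*sp.diff(y0, t) + 2*y0
assert sp.simplify(residual) == 0
assert i == 3   # number of arbitrary constants equals the degree of P
```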
Now that we have found the complementary function y0 , we must find a particular integral y1 , that is, a solution of P (D)y1 = f (t). Since D-operators behave like polynomials,
it is convenient to write inverse D-operators like reciprocals, so we shall write
y1 = (1/P (D)) f (t),

though it should be remembered that this is not uniquely defined, because there are infinitely many solutions. In particular, (1/P (D))(P (D)f (t)) is not unique, although

P (D)((1/P (D))f (t)) = f (t)

always. (This is similar to the fact that sin(arcsin x) = x for all x for which it is defined, but arcsin(sin x) ≠ x in general.)
In order to find particular integrals, we need to discuss the action of D-operators on
products in which one factor is exponential. By the product rule for differentiation, we
have
D(eβt v1 ) = eβt Dv1 + βeβt v1 = eβt (D + β)v1 .
By induction on r it follows that Dr (eβt v1 ) = eβt (D + β)r v1 , and by adding constant
multiples of such terms we get the Shift Rule:
P (D)(eβt v1 ) = eβt P (D + β)v1
for any polynomial operator P (D). It says that eβt can be brought to the left past any
D-operator provided D is replaced by D + β.
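The Shift Rule can be verified in sympy for an assumed example P (D) = D2 + 1, with β left symbolic:

```python
# Sketch: Shift Rule check, P(D)(e^{beta t} v) = e^{beta t} P(D + beta) v,
# for the assumed operator P(D) = D^2 + 1 and a general function v(t).
import sympy as sp

t, b = sp.symbols('t beta')
v = sp.Function('v')(t)
D = lambda f: sp.diff(f, t)

lhs = D(D(sp.exp(b*t)*v)) + sp.exp(b*t)*v             # P(D)(e^{beta t} v)
# P(D + beta) = (D + beta)^2 + 1 = D^2 + 2 beta D + beta^2 + 1:
rhs = sp.exp(b*t)*(D(D(v)) + 2*b*D(v) + (b**2 + 1)*v)

assert sp.simplify(lhs - rhs) == 0
```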
If we now put v1 (t) = (1/P (D + β)) a(t), then P (D)(eβt v1 ) = eβt P (D + β)v1 = eβt a(t), so

(1/P (D)) eβt a(t) = eβt (1/P (D + β)) a(t),
which shows that the Shift Rule is valid for inverse D-operators also. We assume that
f (t) = eβt a(t), the product of an exponential and a polynomial, so all that is left is to determine (1/P (D + β)) a(t), an inverse D-operator acting on a polynomial.
By taking out the lowest non-zero term in P (D + β), we can write
P (D + β) = q0 Dk (1 + {q1 D + · · · }),
where q0 ≠ 0, but q1 and k may be zero. This gives

y1 = (1/P (D)) eβt a(t) = eβt (1/P (D + β)) a(t) = (eβt /q0 ) D−k (1 + {q1 D + · · · })−1 a(t).

We now expand the reciprocal (1 + {q1 D + · · · })−1 using the binomial series (1 + u)−1 = 1 − u + u2 − · · · to as many terms as are required, and let the resulting polynomial D-operator act on a(t). (Since a(t) is a polynomial in t, operating by sufficiently high powers of D always gives zero, so an infinite series in D will never be required.) Finally, if k ≠ 0, we must operate by D−k , i.e. integrate k times, and multiply by eβt /q0 .
The procedure for finding a particular integral y1 = (1/P (D)) eβt a(t) can be summarized as follows. The simplest case is when a(t) is a constant, say a(t) = a, and P (β) ≠ 0. Then k = 0 and q0 = P (β) and only the constant term in the binomial expansion is required, so

(1/P (D)) aeβt = aeβt /P (β)      provided a is constant and P (β) ≠ 0.
If the polynomial a(t) is not constant, or if P (β) = 0, then we need to perform the following
steps.
(1) If β ≠ 0, use the Shift Rule to write (1/P (D)) eβt a(t) = eβt (1/P (D + β)) a(t).
(2) Write 1/P (D + β) = (1/q0 ) D−k (1 + {q1 D + · · · })−1 .
(3) Expand (1 + {q1 D + · · · })−1 to a polynomial of the same degree as a(t), and let it operate on a(t).
(4) If k ≠ 0, operate by D−k , i.e. integrate k times.
(5) Divide by q0 and multiply by eβt .
Note that P (D) should be left in the simplest form, which is often not the factorized
form. Furthermore, steps (3) and (4) can often be simplified by algebraic manipulation of
the operators.
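As a sketch of the whole procedure, the following sympy fragment finds a particular integral of the assumed example (D2 + 3D + 2)y = 4t by the binomial expansion (here β = 0, q0 = 2, k = 0):

```python
# Sketch: particular integral of (D^2 + 3D + 2)y = 4t by the binomial method.
# 1/(2 + 3D + D^2) = (1/2)(1 + (3D + D^2)/2)^{-1}, and expanding the reciprocal
# gives (1/2)(1 - 3D/2 + ...); higher powers of D kill the degree-1 polynomial.
import sympy as sp

t = sp.symbols('t')
D = lambda f: sp.diff(f, t)
a = 4*t

y1 = sp.Rational(1, 2)*(a - sp.Rational(3, 2)*D(a))
assert sp.expand(y1) == 2*t - 3

# Verify it solves the equation:
assert sp.simplify(D(D(y1)) + 3*D(y1) + 2*y1 - 4*t) == 0
```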
Tutorial questions — D-operator methods
8. Find (D2 + 3D + 2)(cy) and (D2 + 3D + 2)(y1 + y2 ), where c is constant. Hence show
that D2 + 3D + 2 is a linear operator.
*Show that a general D-operator Dn + an−1 Dn−1 + · · · + a0 is linear.
9. Verify that (d/dt + 2)((d/dt + 1)y) and (d/dt + 1)((d/dt + 2)y) are both equal to (D2 + 3D + 2)y.
10. Show that D(D + t)y ≠ (D + t)Dy. (Hint: simplify each side separately, and don’t
forget the product rule for differentiation, when it is required.)
11. Use Question 1 to verify that y1 = eαt ∫ e−αt f (t) dt is a particular solution of the
equation (D − α)y = f (t), and that y0 = Aeαt is a general function in the null space.
(Hint: put p(t) = −α, and replace q(t) by f (t) and by zero.)
12. Use repeated first order methods to solve the D-operator equations:
(a) (D2 + 3D + 2)y = 0      (b) d2y/dt2 − 4 dy/dt + 4y = 0      (c) ÿ + 3ẏ + 2y = 4t.
13. Write down complementary functions for the D-operators:
(a) (D + 2)(D − 1)2      (b) D2 (D − 2)3      (c) d3/dt3 + d2/dt2 − d/dt − 1      (d) (D2 − 1)2 .
14. Find complementary functions and particular integrals for the equations from Ques-
tion 12, and compare the solutions with those obtained previously.
15. Find the complete general solutions of the D-operator equations below. (Hint: complementary functions were found in Question 13.)
(a) (D + 2)(D − 1)2 y = e−t      (b) D2 (D − 2)3 y = et      (c) (D3 + D2 − D − 1)y = t      (d) (d2/dt2 − 1)2 y = t3 .
* 16. Use integration by parts several times (preferably in one line) to show that

eαt ∫ e−αt f (t) dt = −(1/α) ( f (t) + f ′ (t)/α + f ′′ (t)/α2 + · · · ).

We know from Question 11 that the left hand side is a particular solution of (D − α)y = f (t). Show that the right hand side can be obtained by expanding 1/(D − α) as a binomial series and applying it to f (t). (This justifies the series expansion of an inverse D-operator, provided that the right hand side converges.)
Complex exponentials
The procedure for finding complementary functions depended on factorizing P (D) into
factors of degree one. By the Fundamental Theorem of Algebra this factorization is always
possible, but the factors might involve complex numbers. However, this does not cause
any difficulties, because by Euler’s formula we know that if α = a + ib, then
Aeαt = Aeat+ibt = Aeat (cos bt + i sin bt),
where the arbitrary constant A may also be complex. However, since P (D) has real
coefficients, it follows that each non-real factor D − α has a corresponding conjugate factor D − ᾱ, and the solution has a corresponding term Beᾱt = Beat (cos bt − i sin bt).
Thus the complex portion of the complementary function can be written as a sum of pairs
of the form
Aeαt + Beᾱt = eat ((A + B) cos bt + i(A − B) sin bt).
If the solution is to be real for all t, then A + B and i(A − B) must both be pure real, say
P = A + B and Q = i(A − B). This gives A = ½(P − iQ) and B = ½(P + iQ) = Ā. If we write P + iQ = Reiγ (modulus-argument form), then A = ½Re−iγ , so this part of the solution becomes

Aeαt + Āeᾱt = 2 Re(Aeαt ) = Re(Re−iγ eat+ibt ) = Reat cos(bt − γ).
Thus the real-valued term in the complementary function corresponding to the pair of
conjugate factors (D − α)(D − ᾱ) = (D − a)2 + b2 can be written as
eat (P cos bt + Q sin bt) or Reat cos(bt − γ),
where P, Q and R, γ are pairs of arbitrary real constants. For repeated factors, we add
similar expressions, each multiplied by an appropriate power of t.
Particular integrals for cosine or sine functions can be found by writing them as the
real or imaginary part of a complex exponential, and using the results from the previous
section.
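A sympy sketch of this device for the assumed example (D2 + 2D + 2)y = 5 cos t:

```python
# Sketch: particular integral of (D^2 + 2D + 2)y = 5 cos t via a complex
# exponential: 5 cos t = Re(5 e^{it}), so y1 = Re(5 e^{it} / P(i)).
import sympy as sp

t = sp.symbols('t', real=True)
P_i = sp.I**2 + 2*sp.I + 2                  # P(i) for P(s) = s^2 + 2s + 2, i.e. 1 + 2i
y1 = sp.re(sp.expand_complex(5*sp.exp(sp.I*t) / P_i))

D = lambda f: sp.diff(f, t)
assert sp.simplify(y1 - (sp.cos(t) + 2*sp.sin(t))) == 0
assert sp.simplify(D(D(y1)) + 2*D(y1) + 2*y1 - 5*sp.cos(t)) == 0
```

The particular integral comes out as cos t + 2 sin t, which the final assertion checks against the original equation.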
Tutorial questions — Complex exponentials
17. Find real-valued solutions for the following homogeneous D-operator equations. (Look in Algebra Chapter 1 for the factors of (b), or subtract and add 4D2 to complete the square.)
(a) (D2 + 4)y = 0      (b) d4y/dt4 + 4y = 0      (c) (D2 + 4D + 5)y = 0
(d) (D2 + 1)2 y = 0      (e) (D3 + D2 + D + 1)y = 0      (f) ÿ + 2ẏ + 2y = 0.
18. Find particular integrals for the following D-operator equations. Hence write down the complete general solutions. (Complementary functions were found in Question 17.)
(a) ÿ + 4y = t2 + e−t      (b) (D4 + 4)y = sin t      (c) d2y/dt2 + 4 dy/dt + 5y = 8t sin t
(d) (D2 + 1)2 y = t3      (e) (D3 + D2 + D + 1)y = te−t      (f) (D2 + 2D + 2)y = 5 cos t.
19. Find real general solutions of the simultaneous differential equations below by the following method: differentiate the first equation to get ẍ in terms of ẋ and ẏ, and substitute
for ẏ so that ẍ is expressed in terms of ẋ, x, and y. Then eliminate y from the expressions for ẍ and ẋ, and solve the resulting D-operator equation for x. Finally, find y
by using the first equation to express y in terms of x and ẋ. (Dots indicate derivatives
with respect to t.)
(a) ẋ = y, ẏ = −x      (b) ẋ = −x + y, ẏ = −x − y.
Stability
A real exponential function eat will tend to zero as t → ∞ if a < 0 and will tend to infinity
if a > 0. This remains true (in magnitude) if eat is multiplied by a polynomial in t, since
exponentials beat powers, or if eat is multiplied by a sine or cosine, which oscillates. Thus
the complementary function of P (D) will tend to zero as t → ∞ provided Re(α) < 0 for
every linear factor s − α of P (s). (We replace D by s since we are thinking of it as a
complex variable, not an operator.) These values of α are called the zeros of P (s), since
s = α is a solution of P (s) = 0. They are also called the poles of
1
P (s) ,
i.e., the poles of a
rational function of s are the values of s for which the denominator is zero, so the function
is undefined.
If the complementary function (or zero-input solution) does tend to zero, then it is said
to be transient (which means temporary), and the system is said to be stable, since the
zero-input solutions die away as t → ∞. For a stable system, all solutions of P (D)y = f (t)
(with the same initial conditions) will tend to the same limiting solution as t → ∞ and
the transients die away.
Thus we have the following important result.
Theorem. A D-operator P (D) is stable if and only if all the poles of 1/P (s) have negative real part, i.e., lie to the left of the imaginary axis.
Figure 1.1. Underdamped and overdamped stable systems
The most important example of a stable system arises from damped simple harmonic
motion: the operator is P(D) = mD² + 2kD + c², where m, k, and c are all positive, and
the equation P(D)y = f(t) then becomes mÿ = −c²y − 2kẏ + f(t). By Newton's second
law, if y denotes displacement, then the equation corresponds to a restoring force −c²y
proportional to the displacement, a resisting force −2kẏ proportional to the velocity, and
an external applied force f(t). Stability means that if there is no external force, then
the displacement dies away. The electrical analogue is the RLC-circuit, with equation
(LD² + RD + 1/C)Q(t) = E(t). Here L, R, and C are constants (inductance, resistance,
and capacitance respectively), Q(t) is the charge on the capacitor, and E(t) is the applied
voltage. Such a system is said to be underdamped if the zeros are non-real, because the
transients oscillate (because of the complex exponentials) as they die away. The system
is overdamped if the zeros are real; the transients die away without oscillation. These
situations are illustrated in Figure 1.1.
Figure 1.2. Values of 1/|P(s)| for s in the left half plane

Another important feature of a stable system P(D)y = f(t) is the limiting response to
a periodic input f(t) = e^(iωt). By the stability assumption, all the zeros of P(s) lie to the
left of the imaginary axis, so P(iω) ≠ 0. Therefore the particular integral is

(1/P(D)) e^(iωt) = (1/P(iω)) e^(iωt),

using the simplest case for evaluating particular integrals. The limiting output, as the
transients die away, is therefore (1/P(iω)) e^(iωt). Its amplitude or modulus is 1/|P(iω)|, which is
called the dynamic gain of the system, since it is the ratio of the magnitudes of the
output and the input. Figure 1.2 plots the surface of values of 1/|P(s)| for values of s on and
to the left of the imaginary axis. The vertical section on the near side (along the imaginary
axis, where s = iω) gives the values of 1/|P(iω)|. The two poles of 1/P(s) (where the surface
goes up to infinity) are also clearly visible.
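The dynamic gain is easy to evaluate numerically; the sketch below (the function name `dynamic_gain` and the sample operator are our own choices, not the text's; NumPy assumed) computes 1/|P(iω)| for a stable second-order operator:

```python
import numpy as np

def dynamic_gain(coeffs, omega):
    """Gain 1/|P(i*omega)| for the operator with the given coefficients
    (highest power first), at forcing frequency omega."""
    return 1.0 / abs(np.polyval(coeffs, 1j * omega))

# P(D) = D^2 + D + 4 (stable).  P(2i) = -4 + 2i + 4 = 2i, so the gain
# peaks near omega = 2, the undamped natural frequency.
for w in (0.0, 2.0, 10.0):
    print(w, dynamic_gain([1.0, 1.0, 4.0], w))
```

Plotting such gains against ω gives the familiar frequency-response curve, i.e. the near-side section of the surface in Figure 1.2.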
Tutorial questions — Stability
20. Determine which of the operators in Question 17 are stable.
21. (a) If P(s) = ms² + 2ks + c², where m, k, and c are all positive, find the zeros of P(s).
(Hint: complete the square, or use the quadratic formula.)
(b) If 0 < k < c√m, define ω = √(mc² − k²), and show that the zeros are at (−k ± iω)/m.
Hence find real expressions for the complementary functions, and show that they oscillate and die away (underdamping).
(c) If k > c√m, define ω = √(k² − mc²), and show that the zeros are at (−k ± ω)/m.
Hence find the complementary functions, and show that they die away with at most
one turning point (overdamping).
(d) If k = c√m, find the complementary functions. Do they behave more like underdamped or overdamped solutions?
22. If P(D) = D² + 2ζωD + ω², where ω > 0, show that the system is stable and overdamped
if ζ > 1, stable and underdamped if 0 < ζ < 1, and unstable if ζ < 0.
General tutorial questions
Use these for extra practice or revision.
23. Find complete general solutions of the equations below.
(a) (D + 2)(D − 1)²y = 6e^t
(b) D²(D − 2)³y = 4te^(2t)
(c) d³y/dt³ + d²y/dt² − dy/dt − y = 8 cosh t
(d) (D² − 1)²y = 4te^t.
24. Solve the following homogeneous differential equations under the given initial conditions:
(a) d²y/dt² + 5 dy/dt + 6y = 0,   y(0) = 1, y′(0) = 0.
(b) d²y/dt² + 2 dy/dt + y = 0,   y(0) = 1, y′(0) = 0.
(c) d²y/dt² + y = 0,   y(0) = 1, y′(0) = 1.
(d) ẍ + 4ẋ + 13x = 0 such that x = 2 and ẋ = 1 when t = 0.
(e) d⁴y/dt⁴ − y = 0 such that y = 1 and dy/dt = d²y/dt² = d³y/dt³ = 0 when t = 0.
25. Solve the following differential equations, using the given initial conditions to find the
constants. Indicate which part of the solution is the forced response and which is the
natural or free response.
(a) d²y/dt² + 3 dy/dt + 2y = e^(−t),   y(0) = y′(0) = 0.
(b) d²y/dt² + 3 dy/dt + 2y = e^(−t),   y(0) = y′(0) = 1.
(c) d²y/dt² + 2 dy/dt + y = te^(−t),   y(0) = 1, y′(0) = 0.
(d) ẍ(t) − x(t) = e^(−t) + sin t,   x(0) = ẋ(0) = 0.
26. A signal x(t) = te^(−2t) is applied to a system satisfying the differential equation

d²y/dt² + 4 dy/dt + 8y = x(t).

The initial conditions of this system are y(0) = 1/2 and y′(0) = 1/4. Find the forced and
natural responses.
27. If P (D) = (D + 1)2 and Q(D) = D − 1, find the full general solution of P (D)y =
Q(D)te−t . (Hint: First simplify the right hand side.)
Answers
1. µy1 = ∫ µq(t) dt, so µy1′ + µ′y1 = µq(t). Now substitute for µ′ and cancel µ.
2. (a) y = y1 + y0, where y1 = te^(−t²) and y0 = ce^(−t²). Check that y0′ + 2ty0 = 0.
(b) y = y1 + y0, where y1 = −1 and y0 = ce^(arctan t). Check that y0′ − y0/(t² + 1) = 0.
3. (a) Z1 + Z0 = (−1 − 2i, 0) + c(i, 1),
(b) Z1 + Z0 = (1 + i, 2 − 2i, 0) + c(−i, −2 + i, 1).
4. (a) P(cy) = 2cy = c(Py), P(y1 + y2) = 2(y1 + y2) = Py1 + Py2, linear;
(b) P(cy) = c²y² ≠ cy² = c(Py), P(y1 + y2) = (y1 + y2)² ≠ y1² + y2² = Py1 + Py2,
non-linear;
(c) P(cy) = c²yy′ ≠ cyy′ = c(Py), P(y1 + y2) = (y1 + y2)(y1′ + y2′) ≠ y1y1′ + y2y2′ =
Py1 + Py2, non-linear;
(d) P(cy) = tcy′ + cy = c(Py), P(y1 + y2) = t(y1′ + y2′) + (y1 + y2) = (ty1′ + y1) + (ty2′ + y2) =
Py1 + Py2, linear.
5. (a) P(cX) = 2cX = c(PX), P(X1 + X2) = 2(X1 + X2) = PX1 + PX2, linear;
(b) P(cX) = |c||X| ≠ c|X| = c(PX), P(X1 + X2) = |X1 + X2| ≠ |X1| + |X2| = PX1 + PX2,
non-linear;
(c) P(cX) = (cX)^T = c(X^T) = c(PX), P(X1 + X2) = (X1 + X2)^T = PX1 + PX2, linear;
(d) P(cX) = c²X^T X ≠ c(X^T X) = c(PX), P(X1 + X2) = (X1 + X2)^T (X1 + X2) ≠
X1^T X1 + X2^T X2 = PX1 + PX2, non-linear.
7. (a) y = 0,   (d) y = c/t.
9. (d/dt + 2)(d/dt + 1)y = (d/dt + 2)(dy/dt + y) = (d/dt)(dy/dt + y) + 2(dy/dt + y) =
(d²y/dt² + dy/dt) + (2 dy/dt + 2y) = d²y/dt² + 3 dy/dt + 2y = (D² + 3D + 2)y.
10. D(D + t)y = D(y′ + ty) = y″ + (ty′ + y) (product rule), while (D + t)Dy = (D + t)y′ =
y″ + ty′. This shows that if the coefficients are not constants, then the operators cannot
simply be multiplied like polynomials.
12. (a) y = Ae^(−2t) + Be^(−t), (b) y = (At + B)e^(2t), (c) y = 2t − 3 + Ae^(−2t) + Be^(−t).
13. (a) Ae^(−2t) + (B1 + B2t)e^t, (b) A1 + A2t + (B1 + B2t + B3t²)e^(2t),
(c) Ae^t + (B1 + B2t)e^(−t), (d) (A1 + A2t)e^t + (B1 + B2t)e^(−t).
15. (a) (1/4)e^(−t) + Ae^(−2t) + (B1 + B2t)e^t,
(b) −e^t + A1 + A2t + (B1 + B2t + B3t²)e^(2t),
(c) −t + 1 + Ae^t + (B1 + B2t)e^(−t),
(d) t³ + 12t + (A1 + A2t)e^t + (B1 + B2t)e^(−t).
16. ∫ e^(−αt) f(t) dt = (e^(−αt)/(−α)) f(t) − (e^(−αt)/(−α)²) f′(t) + (e^(−αt)/(−α)³) f″(t) − ···.
17. (a) y0 = P cos 2t + Q sin 2t or y0 = R cos(2t − γ),
(b) y0 = e^t(P cos t + Q sin t) + e^(−t)(R cos t + S sin t),
(c) y0 = e^(−2t)(P cos t + Q sin t),
(d) y0 = P cos t + Q sin t + t(R cos t + S sin t),
(e) y0 = Ae^(−t) + P cos t + Q sin t,
(f) y0 = e^(−t)(P cos t + Q sin t).
18. (a) y = (1/4)(t² − 1/2) + (1/5)e^(−t) + P cos 2t + Q sin 2t,
(b) y = (1/5) sin t + e^t(P cos t + Q sin t) + e^(−t)(R cos t + S sin t),
(c) y = cos t(−t + 1) + sin t(t − 1/2) + e^(−2t)(P cos t + Q sin t),
(d) y = t³ − 12t + P cos t + Q sin t + t(R cos t + S sin t),
(e) y = ((1/4)t² + (1/2)t)e^(−t) + Ae^(−t) + P cos t + Q sin t,
(f) y = cos t + 2 sin t + e^(−t)(P cos t + Q sin t).
19. (a) x = P cos t + Q sin t, y = Q cos t − P sin t,
(b) x = e−t (P cos t + Q sin t),
y = e−t (Q cos t − P sin t).
20. (c) and (f) are stable.
21. (a) (−k ± √(k² − mc²))/m.
(b) y0 = e^(−kt/m)(P cos(ωt/m) + Q sin(ωt/m)), which dies away because of the negative
exponent, but oscillates because of the sine and cosine terms.
(c) y0 = Ae^(−(k+ω)t/m) + Be^(−(k−ω)t/m). This dies away because both exponents are negative,
since 0 < ω < k. Solve ẏ0 = 0 to find one turning point (at most).
(d) y0 = (A1 + A2t)e^(−kt/m), no oscillation, like overdamped solutions.
22. Poles are at s = ω(−ζ ± √(ζ² − 1)). These are real and negative if ζ > 1, non-real but
in the left half-plane if 0 < ζ < 1, and in the right half-plane if ζ < 0.
23. (a) t²e^t + Ae^(−2t) + (B1 + B2t)e^t,
(b) ((1/24)t⁴ − (1/6)t³)e^(2t) + A1 + A2t + (B1 + B2t + B3t²)e^(2t),
(c) te^t − t²e^(−t) + Ae^t + (B1 + B2t)e^(−t),
(d) ((1/6)t³ − (1/2)t²)e^t + (A1 + A2t)e^t + (B1 + B2t)e^(−t).
24. (a) y(t) = 3e^(−2t) − 2e^(−3t), (b) y(t) = e^(−t) + te^(−t), (c) y(t) = sin t + cos t,
(d) x = e^(−2t)(2 cos 3t + (5/3) sin 3t), (e) y(t) = (1/2)(cosh t + cos t).
25. (a) y(t) = e^(−2t) + (t − 1)e^(−t); forced response is te^(−t), natural response is e^(−2t) − e^(−t).
(b) y(t) = −e^(−2t) + (t + 2)e^(−t); forced response is te^(−t), natural response is 2e^(−t) − e^(−2t).
(c) y(t) = e^(−t) + te^(−t) + (1/6)t³e^(−t); forced response is (1/6)t³e^(−t), natural response is e^(−t) + te^(−t).
(d) x(t) = (1/2)(e^t − e^(−t) − te^(−t) − sin t); forced response is −(1/2)(sin t + te^(−t)), natural response
is sinh t.
26. General solution is y = e^(−2t)(A cos 2t + B sin 2t) + (1/4)te^(−2t); forced response is (1/4)te^(−2t),
natural response is (1/2)e^(−2t)(cos 2t + sin 2t).
27. Q(D)te^(−t) = (1 − 2t)e^(−t). Solution is y = e^(−t)(A + Bt + (1/2)t² − (1/3)t³).
Calculus Chapter 2
Vector functions of a scalar
Vector differentiation
A real vector function of a single real variable, say r = r(t), can be thought of as the
parametric representation of a curve, since we have
r(t) = (x(t), y(t))   or   r(t) = (x(t), y(t), z(t)),
depending on whether r is a two-dimensional or three-dimensional vector. It is often
helpful to think of the curve as representing the path of a particle that has position or
displacement vector r(t) at time t.
In the plane, complex numbers form a more powerful alternative to vector notation,
and a plane curve can be represented as z = z(t) = x(t) + iy(t). Polar equations provide
yet another representation for plane curves, since the polar equation r = r(θ) is the same
as the parametric equation
r = (r(θ) cos θ, r(θ) sin θ) = r(θ)(cos θ, sin θ).
Vector differentiation is straightforward, since each component is differentiated separately. Its interpretation is also self-evident: if r(t) is thought of as a displacement
vector at time t, then (d/dt)r(t) is the velocity vector (which is parallel or tangential to the
curve at that point), and (d²/dt²)r(t) is the acceleration vector. The rules for vector differentiation are also precisely as expected, so the rules need not be specifically learnt, in spite
of the fact that there are three possible kinds of product. The reason is that products
of vectors are built up from sums and products of their components, for which the ordinary rules of differentiation apply. (In the plane, differentiation of complex numbers also
corresponds to vector differentiation, as is shown below.)
Rules for vector differentiation. If u(t) and v(t) are vector functions of t, if φ(t) is a
scalar function of t, and if c is a constant, then
d/dt (cu) = c du/dt,
d/dt (u + v) = du/dt + dv/dt,
d/dt (φu) = φ du/dt + (dφ/dt) u,
d/dt (u · v) = u · dv/dt + du/dt · v,
d/dt (u × v) = u × dv/dt + du/dt × v.

Since (d/dt)r(t) is a tangent vector to the curve r = r(t), it follows that a unit tangent
vector can be found by dividing it by its magnitude. Such a unit tangent vector is denoted
by u or sometimes û, so we have u = (1/|ṙ|) ṙ.

If (d/dt)r(t) is thought of as the velocity vector of a particle on a curve, then its magnitude
|(d/dt)r(t)| is the (scalar) speed of the particle, i.e. the rate of change of distance along
the curve, i.e., the rate of change of arc length. We denote arc length by s, so we have
ds/dt = |(d/dt)r(t)|, and the arc length from the point where t = α to the point where t = β is
given by

Arc length = ∫_α^β |dr/dt| dt.
Sometimes a curve is oriented, i.e., given a direction, so that arc length is taken as positive
in one direction and negative in the opposite direction. For an oriented curve, we must
write

ds/dt = ± |(d/dt)r(t)|,

since ds/dt is obviously negative if arc length s decreases as the parameter t increases.
Tutorial questions — Vector differentiation
1. A particle moves along the curve with displacement vector r = (2t2 , t2 − 4t, 3t − 5) at
time t.
(a) Find the velocity and acceleration vectors of the particle at any time t.
(b) Find the components, in the direction of the vector (1, −3, 2), of the velocity and
acceleration vectors at time t = 2. (Hint: the component of a vector a in the direction
of a vector b is the number |a| cos φ, where φ is the angle between a and b.)
(c) Find the components, in the direction tangential to the path of the particle, of the
velocity and acceleration vectors at time t = 1.
2. If u = (t², t, 1) and v = (ln t, e^t, t), verify the rules for differentiation for d/dt (u · v) and
d/dt (u × v) by evaluating each side separately.
3. For a plane curve with polar equation r = r(θ), note that r = r(θ)(cos θ, sin θ). Use the
rules for vector differentiation with respect to θ to show that
(i) dr/dθ = (dr/dθ)(cos θ, sin θ) + r(−sin θ, cos θ)
(ii) d²r/dθ² = (d²r/dθ² − r)(cos θ, sin θ) + 2(dr/dθ)(−sin θ, cos θ).
Note that the right hand sides are sums of perpendicular vectors. Deduce that

|dr/dθ| = √(r² + (dr/dθ)²).
4. If |u| is constant, show that u is perpendicular to u̇. (Hint: |u|2 = u·u; now differentiate
both sides.)
5. Suppose a curve z = z(t) in the complex plane is written in the form z = r(t)e^(iθ(t)).
(i) Find dz/dt and write it in real-imaginary form.
(ii) Write the curve in vector form r = r(t) = (x(t), y(t)). Then find dr/dt and show that
it is the vector form of dz/dt.
This shows that differentiation of curves in the complex plane coincides with vector
differentiation. (Complex differentiation is often easier to perform.)
6. For each of the following curves, find the unit tangent vector at a general point on the
curve, and find the arc length between the given points. (The parameter values for the
initial and final points must be evaluated to find the endpoints of integration.)
(a) r = e^t(cos t, sin t, 1) between the points r = (1, 0, 1) and r = e^π(−1, 0, 1).
(b) r = (t, cosh t, sinh t) between the points r = (ln 2, 5/4, 3/4) and r = (ln 3, 5/3, 4/3).
(c) z = e^((1+iπ)t) between the points z = 1 and z = e².
(d) r = 1 + cos θ (polars) between the points where θ = −π and θ = π. (Hint: use the
formula for dr/dθ given in Question 3.)
Sketch the curve in (a). (Hint: show that the curve lies on the surface z = √(x² + y²).
What is this surface?)
7. In electromagnetic signalling a wave is often represented by a complex curve z = z(t),
where |z| is the amplitude and arg(z) is the angular displacement. Thus (d/dt) arg(z) =
angular velocity = 2π × frequency. A carrier wave zc(t) has constant amplitude r and
constant angular velocity ω, so zc = re^(iωt). Suppose this wave carries a real signal
f(t) = cos αt.
(a) In AM signalling the amplitude is modulated, and the modulated wave z(t) is obtained by simply multiplying the carrier wave by the signal, i.e. z(t) = f(t)zc(t). Show
that z(t) = (1/2)r(e^(i(ω+α)t) + e^(i(ω−α)t)). Note how the modulated wave is made up of two
waves with angular velocities ω ± α (one on each side of the carrier). These are called
sidebands.
(b) In FM signalling the frequency alone is modulated, and the modulated wave z(t)
satisfies (d/dt) arg(z) = (d/dt) arg(zc) + f(t). Show that (d/dt) arg(z) = ω + cos αt, and solve
for arg(z) (assuming arg(z) = 0 at t = 0). Hence show that the modulated wave
is z(t) = r exp(i(ωt + (1/α) sin αt)). It can be shown that there are infinitely many sidebands, with angular velocities ω ± α, ω ± 2α, and so on, but with decreasing amplitudes.
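The AM identity in part (a) is easy to confirm numerically; the sample values of r, ω, and α below are arbitrary choices of ours:

```python
import numpy as np

# AM: multiplying the carrier r*e^{i*w*t} by the signal cos(a*t) should
# equal the sum of the two sidebands (r/2)(e^{i(w+a)t} + e^{i(w-a)t}).
r, w, a = 2.0, 50.0, 3.0
t = np.linspace(0.0, 1.0, 1000)
modulated = np.cos(a * t) * r * np.exp(1j * w * t)
sidebands = 0.5 * r * (np.exp(1j * (w + a) * t) + np.exp(1j * (w - a) * t))
print(np.max(np.abs(modulated - sidebands)))   # ≈ 0 (rounding error only)
```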
Curvature
In elementary curve sketching we obtained a qualitative description of the curvature of a
plane curve: if the second derivative is positive, then the curve bends or curves upward
(irrespective of whether it is increasing or decreasing), while if the second derivative is
negative, then the curve bends downward. We now establish a quantitative description,
i.e. we shall measure the sharpness of the bend, not only the direction of bending. We are
also not restricted to plane curves. We showed in the previous section that if s denotes
arc length along a curve r = r(t), then ds/dt = |dr/dt| and a unit tangent vector is given by
u = (dr/dt) ÷ |dr/dt|. Thus u = (dr/dt) ÷ (ds/dt), which by the Chain Rule gives

u = dr/ds.
It is therefore convenient to use arc length s as the parameter for the curve.
Figure 2.1. Changes in curvature (low curvature to high curvature)
Since u is a unit vector, it is perpendicular to its derivative du/ds, as shown in Question 4.
Let |du/ds| = κ (kappa) and n = (1/κ) du/ds (assuming κ ≠ 0), so

du/ds = κn.

It follows that n is a unit vector, and n ⊥ u, so n is called the unit normal vector
to the curve. It is easy to show that for a straight line κ = 0, and for a circle κ is the
reciprocal of the radius. Therefore as κ increases, the radius decreases, and the bend
becomes sharper, so we call κ the curvature of the curve, and 1/κ is called the radius of
curvature at each point. For a general curve the curvature κ varies from point to point,
as illustrated in Figure 2.1. The normal vector n points towards the inside of the bend,
and the point r + (1/κ)n is called the centre of curvature.
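The claim that κ is the reciprocal of the radius for a circle can be verified numerically, using κ = |du/dt| ÷ |dr/dt| (the helper names and finite-difference step below are our own sketch; NumPy assumed):

```python
import numpy as np

def curvature(r, t, h=1e-5):
    """kappa = |du/dt| / |dr/dt| at parameter t, where u is the unit
    tangent vector, estimated with central differences."""
    def unit_tangent(t):
        dr = (np.array(r(t + h)) - np.array(r(t - h))) / (2 * h)
        return dr / np.linalg.norm(dr)
    du = (unit_tangent(t + h) - unit_tangent(t - h)) / (2 * h)
    dr = (np.array(r(t + h)) - np.array(r(t - h))) / (2 * h)
    return np.linalg.norm(du) / np.linalg.norm(dr)

# Circle of radius 5: the curvature should be 1/5 at every point.
circle = lambda t: (5 * np.cos(t), 5 * np.sin(t))
print(curvature(circle, 0.7))   # ≈ 0.2
```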
For a curve in the (x, y) plane the situation is simpler. We have u = (dx/ds, dy/ds), and
the only unit vectors perpendicular to this are ±(−dy/ds, dx/ds), so n must be one of these.
Since u is a unit vector, we can write u = (cos φ, sin φ), say. The angle φ is called the
inclination of the curve, since tan φ = (dy/ds) ÷ (dx/ds), which is the slope dy/dx. Then by the Chain
Rule we have

du/ds = (dφ/ds)(−sin φ, cos φ),

where (−sin φ, cos φ) is a unit vector perpendicular to u. By comparing this with the
previous expression for du/ds, we see that

κ = dφ/ds and n = ±(−sin φ, cos φ) = ±(−dy/ds, dx/ds), as mentioned previously.
For an explicit plane curve y = y(x), a slightly different convention is used, to avoid the
plus or minus signs. With this convention, we use the formulae

κ = dφ/ds and n = (−sin φ, cos φ) = (−dy/ds, dx/ds).

This choice of n always points upward, rather than towards the centre of curvature, and
κ is not always positive. Since dy/dx = tan φ and ds/dx = √(1 + tan²φ) = sec φ, it follows that

d²y/dx² = d/dx (tan φ) = sec²φ (dφ/dx) = sec²φ (dφ/ds)(ds/dx) = sec³φ (dφ/ds) = κ sec³φ.

Thus κ and y″ have the same sign, and

κ = y″/sec³φ = y″/(1 + (y′)²)^(3/2),

so the sign of κ, like the sign of y″, indicates the direction of curving.
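A quick check of the explicit formula for y = x², whose curvature should be 2(1 + 4x²)^(−3/2) (compare the answer to Question 10(a)); the helper name is our own:

```python
import numpy as np

def kappa_explicit(dy, d2y, x):
    """kappa = y'' / (1 + (y')^2)^(3/2) for an explicit plane curve y = y(x),
    given its first and second derivative functions."""
    return d2y(x) / (1 + dy(x) ** 2) ** 1.5

# For y = x^2: y' = 2x, y'' = 2, so kappa = 2 / (1 + 4x^2)^(3/2).
for x in (0.0, 0.5, 2.0):
    k = kappa_explicit(lambda x: 2 * x, lambda x: 2.0, x)
    print(x, k, 2 / (1 + 4 * x**2) ** 1.5)   # the two values agree
```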
An important application of plane curvature is in the bending of a horizontal beam.
The curvature at each point is proportional to the bending moment, κ = BM/EI, where
the constant EI depends on the cross-section of the beam and the material of which it is
made. If the deflection is slight, then the slope y′ is small, so ds/dx ≈ 1 and κ ≈ y″. Thus
EIy″ ≈ BM, and EIy^(4) ≈ d²(BM)/dx², which is the load per unit length.
Tutorial questions — Curvature
8. Find the curvature at a general point on each of the following curves. (Hint: first find
u, then du/dt. Then κ = |du/dt| ÷ |dr/dt|, since this expression is equal to |du/dt| ÷ (ds/dt) = |du/ds|.)
(a) r = a + bt (a, b constant)   (b) r = a(cos t, sin t) (a constant)
(c) r = (t, t²)   (d) r = (t, cosh t)
(e) r = (2t − sin 2t, 1 − cos 2t) (cycloid)   (f) z = (1 − it)e^(it) for t > 0 (see Question 5)
(g) r = (cos t, ln(sec t + tan t) − sin t)   (h) r = ae^(bθ) (use Question 3).
9. The path traced out by the centres of curvature of a curve is called its evolute, and
the original curve is called the involute of its evolute. Find the evolutes of the curves
below, i.e. determine r + (1/κ)n.
(a) The cycloid in Question 8(e). Show that the evolute is also a cycloid.
(b) The logarithmic spiral in Question 8(h). Show that the evolute is also a logarithmic
spiral.
(c) The curve in Question 8(f). Show that the evolute is the unit circle, so the curve is
an involute of a circle, which is the profile used for gear teeth.
10. Use the formula for the curvature of an explicit plane curve y = y(x) to find the
curvatures of the following curves:
(a) y = x²   (b) y = sin x   (c) y = cosh x   (d) y = ln sec x.
11. Find the maximum curvature (in absolute value), and the x value(s) at which it occurs,
on the curves
(a) y = (1/3)x³   (b) y = e^x.
12. In designing a road for high-speed travel, it is advisable that the curvature be continuous
(i.e., not change abruptly). In particular, when a straight road starts to bend, the
curvature at the beginning of the bend should be zero.
Suppose we have two portions of straight road. One portion has equation y = m(x − a)
for x ≥ a, and the other has equation y = −m(x + a) for x ≤ −a. A road with
equation y = y(x) for −a ≤ x ≤ a is being designed to join them. What can you say
about y, y ′ , and y ′′ at x = ±a?
Show that y = −(2am/π) cos(πx/(2a)) gives a satisfactory road. What is the maximum curvature
(at x = 0)?
Find values of A, B, and C so that the curve y = Ax⁴ + Bx² + C is also satisfactory,
and find the maximum curvature. Why is this road slightly better than the previous
one?
Torsion
For a curve r = r(t), the plane passing through the point r(t) and containing the vectors u
and n is called the osculating plane of the curve at that point. If the curve does not
lie in a fixed plane, then in addition to its curvature (which takes place in the osculating
plane) it has torsion or twisting out of the osculating plane. The osculating plane is
perpendicular to the cross product b = u × n, which is called the binormal to the curve.
Note that |b| = |u||n| sin(π/2), so b is also a unit vector. If the curve is a plane curve, then
b is constant, so db/ds = 0. More generally, db/ds measures how fast the osculating plane is
changing, i.e. how much the curve is twisting.
Since u, n and b are mutually perpendicular unit vectors, we can use dot products to
express any vector in terms of its components in those three directions. In particular, we
can write

db/ds = (db/ds · u)u + (db/ds · n)n + (db/ds · b)b.
Now b · u is constant, so db/ds · u = −b · du/ds = −b · κn = 0. Similarly, b · b is constant, so
db/ds · b = 0. Thus the only non-zero component is db/ds · n, which we write as −τ, and call τ
(tau) the torsion of the curve. Thus we have τ = −(db/ds) · n, and

db/ds = −τn.

To find the remaining derivative dn/ds, we note that n = −u × b, so

dn/ds = −du/ds × b − u × db/ds = −κn × b + u × (τn) = −κu + τb.
The three expressions for the derivatives are called the Serret-Frenet formulae and can
be written as follows (using dashes to denote derivatives with respect to s):

u′ =  κn,
n′ = −κu + τb,
b′ =      −τn,

or, in matrix form,

[u′]   [ 0   κ   0][u]
[n′] = [−κ   0   τ][n]
[b′]   [ 0  −τ   0][b].

Note that the matrix of kappas and taus is the negative of its transpose; such matrices are
called skew-symmetric. The simplest formulae for evaluating curvature and torsion are

κ = |u′| and τ = κ⁻²(u · (u′ × u″))
(see tutorial question below for τ ). The scalar triple product in the formula for τ is just a
determinant. Remember that the dashes denote derivatives with respect to arc length s;
if another parameter is given, then the chain rule must be used.
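For the circular helix r = (a cos t, a sin t, bt) these formulae can be checked numerically against the known values κ = a/(a² + b²) and τ = b/(a² + b²) (compare Question 13(a) and its answer); the finite-difference helpers below are our own sketch, with the curve reparametrized by arc length:

```python
import numpy as np

# Helix with a = 2, b = 1: exact kappa = 2/5 = 0.4, tau = 1/5 = 0.2.
a, b = 2.0, 1.0
c = np.hypot(a, b)   # |dr/dt| is constant, so s = c*t is arc length

def r(s):
    return np.array([a * np.cos(s / c), a * np.sin(s / c), b * s / c])

def d(f, s, h=1e-3):
    # central difference with respect to arc length s
    return (f(s + h) - f(s - h)) / (2 * h)

u = lambda s: d(r, s)       # unit tangent u = r'(s)
up = lambda s: d(u, s)      # u'
upp = lambda s: d(up, s)    # u''

s0 = 0.8
kappa = np.linalg.norm(up(s0))
tau = np.dot(u(s0), np.cross(up(s0), upp(s0))) / kappa**2
print(kappa, tau)   # ≈ 0.4, 0.2
```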
Tutorial questions — Torsion
13. Find the curvature and torsion of the following curves:
(a) r = (a cos t, a sin t, bt)   (b) r = (t, t², (2/3)t³).
Find the osculating plane at a general point on the helix in (a).
14. Use the Serret-Frenet formulae to prove:
(i) u × u′ = κb   (ii) u″ = −κ²u + κ′n + κτb   (iii) (u × u′) · u″ = κ²τ.
15. Find the curvature and torsion of curves with the following unit tangent vectors, given
that s denotes arc length. (Hint: (d/ds) sech s = −sech s tanh s, (d/ds) tanh s = sech²s, and
sech²s + tanh²s = 1.)
(a) u = (1/√2)(tanh s, sech s, 1)   (b) u = (−sin s tanh s, cos s tanh s, sech s).
Trajectories and orthogonal trajectories
Suppose at each point r in a region in the plane or in space we have a vector v = v(r)
defined. We say that v forms a vector field in the region. Vector fields arise in practice
chiefly from forces or velocities, and they are then called force fields or velocity fields,
respectively. If we think of the vector field as attaching an arrow to each point in the
region, then we can imagine joining these arrows and thereby filling the region with curves
such that v(r) is always a tangent to the curve through the point r. These curves are called
the trajectories of the field, or, in special cases, the lines of force or streamlines. Note
that the trajectories give us the direction of v at each point, but they say nothing about
the magnitude of v.
To find a general trajectory r = r(t) for a vector field v = v(r), we note that, since v is
tangential to the curve, we have (if we choose the parameter t suitably)
dr/dt = v(r).
We must therefore solve this differential equation to find the trajectories, which can all
be found by taking different values of the arbitrary constant(s). If the trajectories are
sketched, then the orientation of the field should be indicated by inserting arrows on the
trajectories.
A curve r = r(t) that is perpendicular to a two dimensional vector field v = v(r) at
each point r is called an orthogonal trajectory to the field. Orthogonal trajectories can
be found by solving the differential equation
dr/dt · v(r) = 0.
They do not have arrows, since there is no direction involved, but the distance between
adjacent orthogonal trajectories can indicate the magnitude of the vector field, just as
the distance between adjacent contours on a surface indicates the steepness of the slope.
Three dimensional vector fields have orthogonal surfaces, since there are infinitely many
orthogonal curves at any point.
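When the differential equation dr/dt = v(r) resists hand solution, a trajectory can be traced numerically. A sketch using a standard fourth-order Runge-Kutta stepper (function names are ours), applied to the field v = (2y, −x), whose trajectories are the ellipses x² + 2y² = c (Question 16(a)):

```python
import numpy as np

def trajectory(v, r0, dt=1e-3, steps=5000):
    """Integrate dr/dt = v(r) with a classical RK4 stepper, returning the
    sampled trajectory points."""
    r = np.array(r0, dtype=float)
    out = [r.copy()]
    for _ in range(steps):
        k1 = v(r)
        k2 = v(r + 0.5 * dt * k1)
        k3 = v(r + 0.5 * dt * k2)
        k4 = v(r + dt * k3)
        r = r + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(r.copy())
    return np.array(out)

# Along any computed trajectory of v = (2y, -x), the quantity x^2 + 2y^2
# should remain (numerically) constant.
pts = trajectory(lambda r: np.array([2 * r[1], -r[0]]), (1.0, 0.0))
c = pts[:, 0] ** 2 + 2 * pts[:, 1] ** 2
print(c.min(), c.max())   # both ≈ 1.0
```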
Tutorial questions — Trajectories and orthogonal trajectories
16. Find equations of the trajectories of the following vector fields:
(a) v = (2y, −x)   (b) v = (3y, x)   (c) v = (y, 2)
(d) v = (2x, y)   (e) v = (x² − 1, xy)   (f) v = (cosh x cos y, sinh x sin y)
(g) v = (x, y ln |y|)   (h) v = (y, z, −y)   (i) v = (x − y, x + y, 1).
17. (i) Sketch the trajectories of the vector fields in Question 16(a) to (e) roughly, indicating
their directions.
(ii) Find the equations of the orthogonal trajectories of the vector fields in Question 16(a)
to (e), and sketch them roughly.
18. (a) Sketch the curves y = (1/2)(x + 1) + c/(x − 1) inside the square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 for
values of c between −1/2 and 1/2.
(b) Suppose x and y denote the relative concentrations of two chemicals in a reactor,
so 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. The speed of the reaction is given by the vector field
v = x(x − 1, x − y). Show that the curves in (a) are the trajectories of this vector field,
and indicate their directions on your sketch.
(c) Find the maximum value of y on the trajectory starting at the point (12/13, 0). If the
reaction proceeds until x = 0, what is the final value of y?
Answers
1. (a) v = (4t, 2t − 4, 3), a = (4, 2, 0),
(b) component of v is √14 and of a is −√(2/7),
(c) component of v is √29 and of a is 12/√29.
4. Differentiation gives 0 = (d/dt)(u · u) = 2u · u̇.
5. (i) ż = ṙe^(iθ) + ire^(iθ)θ̇ = ṙ(cos θ + i sin θ) + r(i cos θ − sin θ)θ̇ = (ṙ cos θ − rθ̇ sin θ) + i(ṙ sin θ +
rθ̇ cos θ).
(ii) r = (r cos θ, r sin θ), so ṙ = (ṙ cos θ − rθ̇ sin θ, ṙ sin θ + rθ̇ cos θ), which corresponds to ż.
6. (a) u = (1/√3)(cos t − sin t, sin t + cos t, 1), arc length = √3(e^π − 1).
(b) u = (1/√2)(sech t, tanh t, 1), arc length = 7√2/12.
(c) u = (1/√(1 + π²))(1 + iπ)e^(iπt), arc length = √(1 + π²)(e² − 1).
(d) u = (−sin(3θ/2), cos(3θ/2)), arc length = 8.
The curve in (a) lies on the surface z = r, which is a circular cone.
8. (a) 0 (straight line), (b) 1/a (circle of radius a), (c) 2(1 + 4t²)^(−3/2), (d) sech²t,
(e) (1/4) cosec t,
(f) ż = te^(it) so |ż| = t and u = e^(it). Then u̇ = ie^(it), so |u̇| = 1 and κ = |u̇|/|ż| = 1/t.
(g) cot t, (h) (1/(a√(b² + 1))) e^(−bθ).
9. (a) r + (1/κ)n = (2t + sin 2t, −1 + cos 2t) = r(t + π/2) − (π, 2) (the original curve shifted left
by π and down by 2),
(b) r + (1/κ)n = abe^(bθ)(−sin θ, cos θ) = be^(−bπ/2) r(θ + π/2),
(c) z + (1/κ)n = (1 − it)e^(it) + t(ie^(it)) = e^(it).
10. (a) 2(1 + 4x²)^(−3/2), (b) −sin x (1 + cos²x)^(−3/2), (c) sech²x, (d) cos x.
11. (a) |κ| = 2 · 5^(5/4) · 6^(−3/2) at x = ±5^(−1/4), (b) |κ| = 2 · 3^(−3/2) at x = −(1/2) ln 2.
12. y = 0, y′ = ±m, y″ = 0 at x = ±a (i.e., the same values as the lines through those points).
The curve y = −(2am/π) cos(πx/(2a)) has maximum curvature πm/(2a). The curve
y = −(m/(8a³))(x⁴ − 6a²x² + 5a⁴) has maximum curvature 3m/(2a), which is slightly less
than πm/(2a). Lower curvature means easier cornering.
13. (a) κ = a/(a² + b²), τ = b/(a² + b²). Osculating plane r · (b sin t, −b cos t, a) = abt.
(b) κ = τ = 2(1 + 2t²)^(−2).
15. (a) κ = (1/√2) sech s, τ = −(1/√2) sech s.
(b) κ = 1, τ = 2 sech s. Note that this curve has constant curvature, but variable torsion.
16. (a) x² + 2y² = c, (b) x² − 3y² = c, (c) y² = 4x + c, (d) y² = cx, (e) x² ± y²/c² = 1,
(f) cosh x = c sin y, (g) y = ±e^(cx),
(h) r = (P sin t − Q cos t + R, P cos t + Q sin t, −P sin t + Q cos t),
(i) r = (Re^t cos(t − α), Re^t sin(t − α), t + β).
17. (a) y = cx², (b) x³y = c, (c) y² = Ae^(−x), (d) 2x² + y² = c, (e) Ax² = e^(x²+y²).
Figure 2.2. Trajectories and orthogonal trajectories
18. (c) Maximum value 8/13. Final value 72/169.
Calculus Chapter 3
Scalar and vector fields
Scalar fields and quadric surfaces
The previous chapter dealt with vector functions of a scalar variable, which represent
parametric equations of a curve. In this chapter we first consider real scalar functions of
a real vector variable, which are called scalar fields. A scalar field in two dimensions,
say z = φ(x, y), gives an explicit definition of a surface, and the curves defined implicitly
by the equations z = constant (or φ(x, y) = constant) are the level curves or horizontal
sections or contours of that surface. In three dimensions a scalar field, say w = φ(x, y, z),
cannot be visualized directly (since it needs four dimensions), but the implicit equations
φ(x, y, z) = constant define surfaces, which are called the level surfaces of the field.
The simplest surface in three dimensions is a plane, which can be defined explicitly by
z = px + qy + r if it is not vertical, but is best defined completely generally by an implicit
equation of the form ax + by + cz = d or r · n = d, where r = (x, y, z), as usual, and
(a, b, c) = n, a constant vector normal to the plane.
The next simplest surfaces are the paraboloids, in which z is an explicit quadratic
function of x and y. After rotating or shifting the axes, if necessary, we obtain a canonical
equation of the form z = αx² + βy². The surface is an elliptic paraboloid (cup or cap) if α
and β have the same sign, because then its level curves are ellipses. If α and β have opposite
signs, then the level curves are hyperbolas, and the surface is a hyperbolic paraboloid or
saddle. If one of α and β is zero, then the surface is a parabolic cylinder. Paraboloids
and parabolic cylinders are explicit examples of what are called quadric surfaces.
Figure 3.1. Implicit quadric surfaces — ellipsoid, elliptic cylinder, hyperboloid of
one sheet, cone, hyperboloid of two sheets, hyperbolic cylinder
Other quadric surfaces cannot be defined explicitly, because they involve quadratic
terms in all three variables. After possible rotation and/or shift of all three axes (which
will be described in Algebra Chapter 4), they can be represented by an implicit equation of
the form αx² + βy² + γz² = c. Implicit quadric surfaces are illustrated in Figure 3.1; they
can be told apart by considering the families of horizontal or vertical sections obtained
by keeping one variable constant. Such sections will be ellipses if the coefficients of the
other two variables have the same sign, and hyperbolas if the coefficients have opposite
signs. Elliptical sections may not exist for some values of the variable being kept constant,
because a sum of two positive quantities must be positive. This makes it possible to
distinguish different quadric surfaces from their equations.
If all sections are ellipses, then the surface is an ellipsoid, of which a sphere is the best
known example. If two families of sections are hyperbolas and the other family consists of
ellipses, then the surface is in general an elliptical hyperboloid, of which there are two
kinds. If there is an elliptical section through the origin (i.e. by putting one variable equal
to zero), then the surface is like a cooling tower, and is called a hyperboloid of one sheet.
If the section through the origin is a single point, then the surface is an elliptical cone. If
there is no elliptical section through the origin, then the surface is in two separate parts,
and is called a hyperboloid of two sheets. Finally, if one of the coefficients is zero (i.e. if
one variable does not appear in the equation), then the surface is an elliptic or hyperbolic
cylinder, depending on the signs of the other two coefficients.
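These sign-counting rules lend themselves to a short program. The sketch below is our own illustration (the function name and return labels are not from the text), and it assumes the equation is already in the canonical form αx² + βy² + γz² = c:

```python
def classify_quadric(a, b, g, c):
    """Classify the implicit quadric a*x^2 + b*y^2 + g*z^2 = c by the
    section-counting rules above (axes assumed already rotated/shifted)."""
    coeffs = [a, b, g]
    zeros = coeffs.count(0)
    if zeros >= 2:
        return "degenerate (fewer than two quadratic terms)"
    if zeros == 1:
        p, q = (k for k in coeffs if k != 0)
        if c == 0:
            return "degenerate cylinder"
        if p * q > 0:                        # remaining sections are ellipses
            return "elliptic cylinder" if p * c > 0 else "empty"
        return "hyperbolic cylinder"         # remaining sections are hyperbolas
    if c == 0:
        if min(coeffs) > 0 or max(coeffs) < 0:
            return "single point"
        return "elliptic cone"
    # count quadratic terms whose sign agrees with the constant c
    agree = sum(1 for k in coeffs if k * c > 0)
    return {3: "ellipsoid",
            2: "hyperboloid of one sheet",
            1: "hyperboloid of two sheets",
            0: "empty"}[agree]
```

Applied to the equations of Question 1 below, this reproduces the identifications listed in the Answers.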
Calculus Chapter 3
Tutorial questions — Scalar fields and quadric surfaces
1. Describe the vertical and horizontal sections of the following quadric surfaces in three
dimensions, and hence identify the surfaces. For elliptical sections, state for which
values of the variable they exist.
(a) x² + 2y² + 3z² = 4
(b) x² + 2y² − 3z² = 4
(c) x² + 2y² − 3z² = −4
(d) x² + 2y² − 3z² = 0
(e) x² + 2y² + 3z = 4
(f) x² − 2y² + 3z = 4
(g) x² − 2y² = 4
(h) x² + 2y² = 4
(i) x² + 2y² + 3z² = −4.
Directional derivatives
A partial derivative represents the rate of change of a scalar field when all except one of
the independent variables are kept constant. For an explicitly defined surface z = z(x, y),
the partial derivatives ∂z/∂x and ∂z/∂y can also be thought of as the slopes of vertical
sections parallel to the x axis and y axis respectively. We now consider the problem of
determining the rate of change of a scalar field at a given point in an arbitrary direction,
not only parallel to an axis.
Suppose φ = φ(r), and suppose u is a unit vector. The directional derivative of φ at
the point r in the direction of u is denoted dφ/ds, and defined by

dφ/ds = lim_{∆s→0} [φ(r + u∆s) − φ(r)]/∆s.
Since u is a unit vector, it follows that the directional derivative measures the rate of
change of φ per unit change in displacement in the direction parallel to u. Thus the
independent variable s denotes arc length, and one reason for using a total derivative sign
is that (in the given direction) φ can be thought of as depending on the single variable s
only. However, the notation ∂φ/∂s is sometimes used, as is ∂φ/∂u, which has the advantage
of indicating the direction, but is misleading because the differentiation is not with respect
to u. Directional derivatives in the directions of the standard unit vectors i, j, and k are
the familiar partial derivatives ∂φ/∂x, ∂φ/∂y, and ∂φ/∂z.
Theorem. At any point the directional derivative of φ in the direction of a unit vector u
is given by

dφ/ds = (∂φ/∂x, ∂φ/∂y) · u   or   dφ/ds = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) · u,

depending on whether the scalar field φ is defined in two or three dimensions.
Proof. Suppose s denotes arc length along any curve r = r(s) passing through the required
point in the direction of u. Then u = dr/ds, since both sides are unit vectors in the same
direction. By the chain rule for partial differentiation, we have

dφ/ds = (∂φ/∂x)(dx/ds) + (∂φ/∂y)(dy/ds) + (∂φ/∂z)(dz/ds)
      = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) · (dx/ds, dy/ds, dz/ds)
      = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) · dr/ds
      = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) · u,

which is the required result for a scalar field in three dimensions. The proof in two
dimensions is obtained by omitting all terms involving z.
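The theorem is easy to test numerically: a central-difference estimate of dφ/ds should agree with grad φ · u. The helper names and the sample field below are our own choices for illustration:

```python
import math

def directional_derivative(phi, r, u, h=1e-6):
    """Central-difference estimate of dphi/ds at r in the unit direction u."""
    rp = [ri + h * ui for ri, ui in zip(r, u)]
    rm = [ri - h * ui for ri, ui in zip(r, u)]
    return (phi(*rp) - phi(*rm)) / (2 * h)

def grad(phi, r, h=1e-6):
    """Central-difference gradient of phi at the point r."""
    g = []
    for i in range(len(r)):
        rp = list(r); rm = list(r)
        rp[i] += h; rm[i] -= h
        g.append((phi(*rp) - phi(*rm)) / (2 * h))
    return g

phi = lambda x, y, z: x * y * z + x**2             # sample scalar field
r = (1.0, 2.0, 3.0)
u = (1 / math.sqrt(3),) * 3                        # unit vector along (1, 1, 1)

lhs = directional_derivative(phi, r, u)            # dphi/ds directly
rhs = sum(gi * ui for gi, ui in zip(grad(phi, r), u))   # grad(phi) . u
```

Both numbers agree with the exact value (yz + 2x, xz, xy) · u = 13/√3 at this point.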
Figure 3.2. Surface with path and its horizontal projection
In two dimensions, the geometrical interpretation is as follows. A scalar field φ(x, y),
with two independent variables, can be visualized as an explicit surface z = φ(x, y).
Imagine a smooth path x = x(s), y = y(s), z = 0, in the horizontal plane, as shown in
Figure 3.2, where s denotes arc length in the plane. Vertically above this horizontal path
lies a path on the surface, which is given by

x = x(s),   y = y(s),   z = φ(x(s), y(s)),

since z = φ(x, y). The directional derivative dφ/ds or dz/ds then measures the slope or
steepness of the path on the surface. For the simplest visualization, imagine a straight
line in the plane: at each point on the line, dφ/ds is the slope of the vertical section of
the surface parallel to that line.
We cannot visualize a three-dimensional scalar field w = φ(x, y, z), but it can be thought
of as a variable, say temperature, existing at all points within a certain region in space.
At any point, and in any direction, dw/ds (or dφ/ds) represents the rate of change of
temperature if you move from the given point in the given direction.
The vector (∂φ/∂x, ∂φ/∂y) or (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) is called the gradient of φ and is
denoted grad φ. Thus

grad φ = (∂φ/∂x, ∂φ/∂y)   or   (∂φ/∂x, ∂φ/∂y, ∂φ/∂z).
The formula for the directional derivative can now be written

dφ/ds = grad φ · u   or   dφ/ds = |grad φ| cos α,

where u is the unit vector in the required direction and α is the angle between u and
grad φ. It follows that dφ/ds is the component of grad φ in the direction of u.
Corollary 1. At any given point, the directional derivative dφ/ds has a maximum value
equal to |grad φ|, which is obtained by taking u in the direction of grad φ.

Proof. At a given point, the only variable is the direction, which is determined by the
angle α between u and grad φ. The maximum value of cos α is 1, when α = 0. Since
dφ/ds = |grad φ| cos α, it follows that the maximum value is |grad φ|, which occurs when
α = 0, i.e., when u is in the direction of grad φ.
In particular, the steepest slope at the point (x, y, z) on an explicit surface z = z(x, y) is

|grad z| = |(∂z/∂x, ∂z/∂y)| = √((∂z/∂x)² + (∂z/∂y)²),

and it occurs in the direction of the vector grad z = (∂z/∂x, ∂z/∂y).
Corollary 2. (i) The vector grad φ is normal to an implicitly defined curve or surface
φ = constant.
(ii) (∂z/∂x, ∂z/∂y, −1) is a normal vector to the explicit surface z = z(x, y) at the point
(x, y, z).

Proof. (i) Suppose u is a unit tangent vector (in any direction) at a point on the curve or
surface φ = constant; then at that point dφ/ds = 0 in the direction of u, because φ is
constant on the curve or surface. Thus 0 = dφ/ds = grad φ · u, i.e. u is perpendicular to
grad φ. This means that grad φ is perpendicular to every tangent at that point, so grad φ
is normal to the curve or surface.

(ii) If we define φ(x, y, z) = z(x, y) − z, then the surface z = z(x, y) can be written as
φ = 0, and by part (i) a normal is grad φ, which is equal to (∂z/∂x, ∂z/∂y, −1).
(The letter z is being used in two senses in part (ii): as a function of x and y, and as
the third variable. This abuse of notation is convenient, but can be confusing if you don’t
think clearly. An alternative proof for part (ii) is given as a tutorial question below.)
Knowing the normal to a surface at a point enables us to determine the tangent plane
to the surface at that point, i.e., the plane that passes through the point and has the same
normal as the surface there. At points where grad φ = 0 the surface may not be smooth,
and the tangent plane may not exist.
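Corollary 2(ii) gives a direct recipe for the tangent plane. The sketch below is our own helper (with the partial derivatives estimated by central differences rather than computed exactly); applied to the surface z = x⁴ − 3x³y + x²y² of Question 3 at (1, −2), it recovers the plane 30x − 7y − z = 33 given in the Answers for Question 7:

```python
def tangent_plane(zfun, x0, y0, h=1e-6):
    """Return (a, b, c, d) with a*x + b*y + c*z = d, the tangent plane to
    z = zfun(x, y) at (x0, y0), built from the normal (z_x, z_y, -1)."""
    zx = (zfun(x0 + h, y0) - zfun(x0 - h, y0)) / (2 * h)
    zy = (zfun(x0, y0 + h) - zfun(x0, y0 - h)) / (2 * h)
    z0 = zfun(x0, y0)
    # plane through (x0, y0, z0): zx*(x - x0) + zy*(y - y0) - (z - z0) = 0
    return zx, zy, -1.0, zx * x0 + zy * y0 - z0

a, b, c, d = tangent_plane(lambda x, y: x**4 - 3 * x**3 * y + x**2 * y**2,
                           1.0, -2.0)
# expected: approximately 30x - 7y - z = 33
```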
Tutorial questions — Directional derivatives
2. If z = cos(π(2x² − y²)), find the directional derivatives of z at the point (½, ½) in the
directions of the vectors
(a) (1, 0)   (b) (0, 1)   (c) (1, 1)   (d) (−1, 1)   (e) (3, −4).
How could the answers to (a) and (b) be found without using directional derivatives?
3. Let z = x⁴ − 3x³y + x²y².
(i) Find the directional derivative of z at the point where x = 2 and y = 1 in the
direction of the tangent to the curve r = (t² + 1, t³).
(ii) In what direction is z increasing at the maximum rate at the point where x = 1 and
y = −2, and what is the maximum rate of increase?
4. Consider the point r = (x, y, z) on a general explicit surface z = z(x, y).
(i) Show that the vectors (0, 1, ∂z/∂y) and (1, 0, ∂z/∂x) are tangent to the surface at
this point. (Hint: by holding either x or y constant we obtain vertical sections of the
surface, and tangent vectors to these curves are found by partially differentiating r with
respect to the other variable.) Use cross products to prove that (∂z/∂x, ∂z/∂y, −1) is a
normal to the surface.
(ii) If the vector (u, v, w) is tangential to the surface at this point, show that
w = u ∂z/∂x + v ∂z/∂y. (Hint: tangent is perpendicular to normal.)
(iii)* Show that a steepest tangent vector to the surface is parallel to
(∂z/∂x, ∂z/∂y, |grad z|²). (Hint: the steepest tangent vector (u, v, w) occurs when
(u, v) ∥ grad z.)
5. Prove that grad(r · n) = n for any constant vector n. (Hint: put n = (a, b, c) and first
simplify r · n.) Deduce that n is normal to the surface with implicit equation r · n = d
(constant). Interpret this result geometrically. (Hint: what surface does the equation
represent, and what is a normal to this surface?)
6. Find the directional derivative of φ(x, y, z) = 4x² + y² − z² at a general point r in the
direction of a given unit vector u = (u₁, u₂, u₃).
Sketch the surface φ(r) = 4.
Evaluate grad φ at the points (1, 0, 0), (1, 2, 2), and (0, √5, −1). Mark these points on
your sketch, and confirm by inspection in each case that grad φ is perpendicular to the
surface.
Write down an equation of the tangent plane at the point (1, 2, 2).
Sketch the surface φ(r) = 0, and verify that grad φ = 0 at the origin. Is the surface
smooth at that point?
7. Find an equation of the tangent plane to the surface in Question 3 at the point where
x = 1 and y = −2. (Hint: don't forget z.)
The operator del and vector fields
The gradient of a scalar field can be conveniently expressed by introducing a new symbol,
which also serves to define other important operators. We define the operator ∇, called
del or nabla, by

∇ = (∂/∂x, ∂/∂y, ∂/∂z).

Then, symbolically, ∇φ = (∂φ/∂x, ∂φ/∂y, ∂φ/∂z), which is just grad φ. From the rules for
differentiation it is easy to verify that ∇ is a linear operator, i.e., that

∇(cφ) = c(∇φ) and ∇(φ + ψ) = ∇φ + ∇ψ,

for any constant c and any scalar functions φ and ψ. Since φ is a scalar function, but ∇φ is
a vector function, it follows that ∇ acts on a scalar field to give a vector field. Vector fields
were introduced in the section on trajectories in Calculus Chapter 2; the most important
problem, which we shall deal with in the next section, is the inverse problem for ∇: i.e., to
determine whether a given vector field can be written in the form ∇φ, and, if so, to find φ.
The symbol ∇ can be used to define two more operators, by using the notation of dot
and cross product of vectors. Firstly, the divergence of a vector field v = (v₁, v₂, v₃) is
written div v and is defined by

div(v₁, v₂, v₃) = ∂v₁/∂x + ∂v₂/∂y + ∂v₃/∂z = (∂/∂x, ∂/∂y, ∂/∂z) · (v₁, v₂, v₃),

so div v = ∇ · v, if the dot is interpreted symbolically as in dot products. (For two-dimensional
functions we simply omit the third components in ∇ and v.) The divergence of a vector field is
a scalar quantity, and, roughly speaking, measures the rate at which the trajectories are
separating or diverging from one another. It can be shown by conservation of mass that the
divergence of a velocity field can be non-zero only if the density is changing or if matter is
being added to or taken away from the system. We shall explain divergence more precisely
in Calculus Chapter 4.
Secondly, the curl of a vector field v = (v₁, v₂, v₃) is written curl v and is defined by

curl(v₁, v₂, v₃) = (∂v₃/∂y − ∂v₂/∂z, ∂v₁/∂z − ∂v₃/∂x, ∂v₂/∂x − ∂v₁/∂y)

                 = det [  i      j      k
                         ∂/∂x   ∂/∂y   ∂/∂z
                          v₁     v₂     v₃  ],

so curl v = ∇ × v, where the cross is interpreted symbolically as in cross products. Beware,
however, that the vector function curl v is not necessarily perpendicular to the vector
function v, because ∇ × v is only symbolically a cross product. If we have a two-dimensional
field v = v(x, y), then we can put v₃ = 0, and observe that partial derivatives with respect
to z are also zero, so the formula simplifies to

curl(v₁, v₂) = (0, 0, ∂v₂/∂x − ∂v₁/∂y).

For a velocity field v it will be shown later that the component of curl v in any direction is
twice the angular velocity of the field in the plane perpendicular to that direction, so curl
measures the rotation of the field. The curl of a force field will be shown to be related to
the work done per unit area when traversing small closed paths.
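Both operators can be approximated by central differences, which gives a quick way to experiment with fields. The helper names below are our own; the sample field is (y, z, −x) from Question 10(b), whose curl works out to the constant vector (−1, 1, −1):

```python
def div(v, r, h=1e-6):
    """Central-difference estimate of the divergence of v at the point r."""
    total = 0.0
    for i in range(3):
        rp = list(r); rm = list(r)
        rp[i] += h; rm[i] -= h
        total += (v(*rp)[i] - v(*rm)[i]) / (2 * h)
    return total

def curl(v, r, h=1e-6):
    """Central-difference estimate of the curl of v at the point r."""
    def d(i, j):  # partial of component v_i with respect to variable j
        rp = list(r); rm = list(r)
        rp[j] += h; rm[j] -= h
        return (v(*rp)[i] - v(*rm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

v = lambda x, y, z: (y, z, -x)     # field from Question 10(b)
c = curl(v, (0.3, -0.7, 1.2))      # constant (-1, 1, -1) at every point
```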
Finally, since grad φ is a vector function, the operators curl and div can be applied to
it. The curl of the gradient is always zero, since

curl(grad φ) = curl(∂φ/∂x, ∂φ/∂y, ∂φ/∂z)
             = (∂/∂y(∂φ/∂z) − ∂/∂z(∂φ/∂y), ∂/∂z(∂φ/∂x) − ∂/∂x(∂φ/∂z),
                ∂/∂x(∂φ/∂y) − ∂/∂y(∂φ/∂x)),

which is equal to the zero vector, since the mixed partial derivatives in each component
are equal to each other, for all functions we meet.
The divergence of the gradient is not necessarily zero; we have

div(grad φ) = div(∂φ/∂x, ∂φ/∂y, ∂φ/∂z)
            = ∂/∂x(∂φ/∂x) + ∂/∂y(∂φ/∂y) + ∂/∂z(∂φ/∂z)
            = ∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z².
The operator div grad is called Laplace's operator; it acts on a scalar function to give
a scalar function, and div(grad φ) = ∇ · (∇φ) = ∇²φ, say, so symbolically we write ∇² for
∇ · ∇. Thus we have

∇²φ = ∇ · (∇φ) = ∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z²   or   ∂²φ/∂x² + ∂²φ/∂y².
Laplace’s operator arises in problems of heat transfer, diffusion, or vibration in two or three
dimensions, where it can be shown that ∇2 φ is proportional to
(i.e., time-independent) situations we hav
∂φ
∂t
∂φ
∂t
or
∂2φ
∂t2 .
For steady state
= 0, from which it follows that
∇2 φ = 0,
which is called Laplace’s equation. A function φ satisfying Laplace’s equation is said
to be harmonic, so harmonic functions form the null space of ∇2 . They will reappear
frequently.
,
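As a numerical illustration (the helper name and the five-point stencil below are our own additions, not from the text), the function ½ ln(x² + y²) from Question 8(a) satisfies Laplace's equation away from the origin, while x² + y² does not:

```python
import math

def laplacian2d(phi, x, y, h=1e-4):
    """Five-point finite-difference estimate of phi_xx + phi_yy."""
    return (phi(x + h, y) + phi(x - h, y) + phi(x, y + h) + phi(x, y - h)
            - 4 * phi(x, y)) / h**2

harmonic = lambda x, y: 0.5 * math.log(x**2 + y**2)
val = laplacian2d(harmonic, 1.3, -0.4)                     # close to 0
val2 = laplacian2d(lambda x, y: x**2 + y**2, 1.3, -0.4)    # close to 4
```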
From the rules for differentiation it is easy to show that the above operators are all
linear; their properties can be summarized in the following table.

Operator     symbol   acts on   to give
grad         ∇        scalar    vector
div          ∇·       vector    scalar
curl         ∇×       vector    vector
Laplace's    ∇²       scalar    scalar
Tutorial questions — The operator del and vector fields
8. Find ∇φ for the scalar functions φ below:
(a) ½ ln(x² + y²)   (b) e^(xy) + sin(x + y)   (c) xyz + e^(x+y+z)   (d) x² + y² + z².
9. Find ∇ · v for each of the vector fields v below:
(a) (cosh x cos y, − sinh x sin y)   (b) eˣ(cos y, sin y)
(c) (yzeˣ, xzeʸ, xyeᶻ)   (d) (x − y, y − z, z − x).
10. Find ∇ × v for each of the vector fields v below:
(a) (x, y, z)   (b) (y, z, −x)   (c) (y, z, −y)   (d) (xy, yz, zx)
(e) (y² sin z, 2xy sin z, xy² cos z + z)   (f) (x − y, x + y, 1).
Determine (curl v) · v in each case, to see whether curl v is perpendicular to v.
11. Verify that curl grad φ = 0 for φ = 2xy − xy² and φ = x²yeᶻ.
* 12. It was stated above that the mixed second partial derivatives of reasonable functions
are equal. Here is an example of a function for which they differ at one point. Let a
and b be constants, and define z = (ax + by)⁴/(x² + y²) when (x, y) ≠ (0, 0) and z = 0
when (x, y) = (0, 0).
(i) Show that z = b⁴y² when x = 0, for all y, including y = 0. Deduce that ∂z/∂y = 0
when (x, y) = (0, 0).
(ii) Show by rules for differentiation that
∂z/∂y = 2(ax + by)³(2bx² + by² − axy)/(x² + y²)² = 4a³bx when y = 0 and x ≠ 0.
By part (i) this formula holds when x = 0 also. Thus ∂z/∂y = 4a³bx when y = 0, for
all x. Deduce that ∂/∂x(∂z/∂y) = 4a³b at the point (x, y) = (0, 0).
(iii) By symmetry, interchanging x and y, it follows that ∂/∂y(∂z/∂x) = 4ab³ when
x = 0, for all y, in particular at the point (x, y) = (0, 0). Thus at the origin
∂/∂x(∂z/∂y) ≠ ∂/∂y(∂z/∂x) unless ab = 0 or a = ±b.
13. Find ∇²φ for the scalar functions φ in Question 8, and state which functions are
harmonic.
14. Show by the product rule for differentiation that ∇(φψ) = φ(∇ψ) + (∇φ)ψ for scalar
functions φ and ψ, and deduce that ∇²(φψ) = φ∇²ψ + 2∇φ · ∇ψ + ψ∇²φ.
Potential functions
If a vector field v is such that there exists a scalar function φ satisfying
grad φ = v,
then we say that v is a conservative field or a gradient field, and we call φ a potential
function for the vector field v. Thus a conservative field v is one for which the inverse
problem for ∇ can be solved. Since ∇ is a linear operator, and the gradient of a constant
is 0, it follows that an arbitrary constant can be added to a potential function, as with an
indefinite integral. In fact, finding a potential function for a one dimensional vector field
is simply integrating it.
The simplest examples of conservative fields are force fields (the word “conservative”
comes from conservation of energy). For a force field the potential function φ is the
potential energy, i.e., the work done by the field in moving a particle to the point in
question. The starting point (zero potential) from which the particle is moved is arbitrary,
which explains the arbitrary constant in φ.
Theorem. A vector field v is conservative if and only if curl v = 0.
Proof. If v is conservative, then v = grad φ, so curl v = curl(grad φ), which is the zero
vector, as shown in the previous section. We shall prove the converse once we have studied
integration in vector fields.
Once we know from the above test that a field v is conservative, it is straightforward
to find a potential function φ for v: if grad φ = v, then (∂φ/∂x, ∂φ/∂y, ∂φ/∂z) = (v₁, v₂, v₃),
say, so

∂φ/∂x = v₁,   ∂φ/∂y = v₂,   ∂φ/∂z = v₃.

We can write each of these equations in integral form, remembering that each integral has
an arbitrary "constant" that may involve the variables that are being temporarily kept
constant, and we obtain

φ(x, y, z) = ∫ v₁ dx + α(y, z) = ∫ v₂ dy + β(z, x) = ∫ v₃ dz + γ(x, y).

By using the three equations successively, in either derivative or integral form, it is possible
to obtain a single expression for φ that involves only an arbitrary (genuine) constant.
For a two-dimensional conservative field, finding a potential function is similar to solving
an exact differential equation, and the test for conservativeness is essentially the same test
as for exactness.
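However the pieces are amalgamated, the result can always be checked by differentiating again. The sketch below (helper names are our own) verifies numerically that φ = x sin 2y, a potential for the field (sin 2y, 2x cos 2y) of Question 15(d), satisfies grad φ = v:

```python
import math

v = lambda x, y: (math.sin(2 * y), 2 * x * math.cos(2 * y))   # Question 15(d)
phi = lambda x, y: x * math.sin(2 * y)                        # candidate potential

def grad2(f, x, y, h=1e-6):
    """Central-difference gradient of a two-dimensional scalar function."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

# grad phi should reproduce v at every sample point
checks = [(0.5, 0.3), (-1.2, 2.0), (3.0, -0.7)]
errors = [max(abs(g - w) for g, w in zip(grad2(phi, x, y), v(x, y)))
          for x, y in checks]
```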
If v is a conservative vector field with potential function φ, then the curves or surfaces
with equation φ(x, y, z) = constant are called equipotentials. The equipotentials are
orthogonal trajectories or surfaces to the field, because v = grad φ and we have shown
earlier in this chapter that grad φ is a normal to any curve or surface with equation
φ = constant.
Tutorial questions — Potential functions
15. Test whether each of the following vector fields is conservative (by finding its curl), and
then, where possible, find a potential function.
(a) (2xy, x² + y²)   (b) (y, −x + √(xy))   (c) (xy − x⁻¹, x² + 1)   (d) (sin 2y, 2x cos 2y).
16. (i) Find the orthogonal trajectories of the field in Question 15(a), and verify that the
equipotentials coincide with the orthogonal trajectories.
(ii) Find an equation for the orthogonal trajectories of the field in Question 15(b). Write
the equation in the form φ(x, y) = constant, and show that ∇φ is parallel to the original
field. (Thus you have found a conservative field with the same trajectories, which is
the same as finding an integrating factor for the differential equation satisfied by the
orthogonal trajectories.)
17. (i) Show that the differential equation M(x, y) dx + N(x, y) dy = 0 is exact if and only
if the vector field (M(x, y), N(x, y), 0) is conservative.
(ii) Assuming that the differential equation in (i) is exact, show that solving it is the
same procedure as finding the equipotentials of the vector field.
18. Find a potential function for each field in Question 10 for which ∇ × v = 0.
19. Find the potential function φ for the conservative field (x, y, −z) such that φ is zero at
the origin. Identify and sketch roughly, on the same axes, the equipotential surfaces
φ = 0 (passing through the origin), φ = 1, and φ = −1.
20. (To illustrate a possible danger in the amalgamation method, because of identities
between functions.) Show that the field v = (−y, x)/(x² + y²) is conservative, and find
the potential function that is zero on the positive x axis.
Stationary points and optimization
We have previously discussed stationary points on a curve y = y(x), i.e., points at which
the derivative y′ is zero. If y″ ≠ 0, then its sign can be used to distinguish whether the
point is a maximum or minimum, but if y″ = 0, then there is no conclusion without
considering nearby points: the point might be a point of inflexion, or a maximum or
minimum. We now wish to extend the classification of stationary points to surfaces
z = z(x, y), which will also allow us to maximize or minimize real functions of two real
variables.
Firstly, to locate the stationary points we find the points where the tangent plane is
horizontal, i.e., where ∂z/∂x = 0 and ∂z/∂y = 0 simultaneously. Secondly, the nature of a
stationary point is determined by considering a second approximation to the surface near
the point. This means that the surface is approximated by an explicit quadric surface.
If the approximation is an elliptic paraboloid (cap or cup), then the point is a proper
maximum or proper minimum, and if the approximation is a hyperbolic paraboloid,
then the point is a saddle point. If, however, the approximation is a parabolic cylinder,
then there is in general no conclusion about the surface itself, without considering nearby
points.
Figure 3.3. Second approximation to a point on a surface
We therefore need to obtain a second approximation to z(x + ∆x, y + ∆y) (point C in
Figure 3.3) in terms of z(x, y) (point A) and appropriate first and second partial derivatives.
We know that, to a second approximation,

f(x + ∆x) = f(x) + f′(x)∆x + ½f″(x)∆x² + · · · .

If we apply this result in the vertical section z = z(x, y + ∆y), where y + ∆y is constant,
then we obtain

z(x + ∆x, y + ∆y) = z(x, y + ∆y) + z_x(x, y + ∆y)∆x + ½z_xx(x, y + ∆y)∆x² + · · · ,  (1)

where the subscripts denote partial derivatives. This approximates the z value at C in
terms of z and its derivatives evaluated at B. Similarly, working in the vertical section
x = constant, we can approximate each of the functions on the right hand side of (1)
(which are evaluated at B) in terms of values at A, using y as the variable. We obtain

z(x, y + ∆y) = z(x, y) + z_y(x, y)∆y + ½z_yy(x, y)∆y² + · · · ,  (2)
z_x(x, y + ∆y) = z_x(x, y) + z_xy(x, y)∆y + · · · ,  (3)
z_xx(x, y + ∆y) = z_xx(x, y) + · · · .  (4)

Now substitute (2), (3), and (4) in (1) to get

z(x + ∆x, y + ∆y) = {z(x, y) + z_y(x, y)∆y + ½z_yy(x, y)∆y² + · · · }  (from (2))
    + {z_x(x, y) + z_xy(x, y)∆y + · · · }∆x  (from (3))
    + ½{z_xx(x, y) + · · · }∆x²  (from (4)),

i.e., z + ∆z = z + z_x∆x + z_y∆y + ½{z_xx∆x² + 2z_xy∆x∆y + z_yy∆y²} + · · · .  (5)
This result is true at any point where the surface is smooth; if we subtract z from both
sides and remember that z_x = 0 and z_y = 0 at a stationary point, then we obtain

∆z ≈ ½{z_xx∆x² + 2z_xy∆x∆y + z_yy∆y²},

which shows clearly how the second approximation to ∆z near a stationary point is a
quadratic function of ∆x and ∆y.
We can now classify the stationary point by determining the sign of ∆z in all possible
vertical sections through the point. If ∆z is positive in all directions, then the point is a
proper minimum; if ∆z is negative in all directions, then the point is a proper maximum;
if ∆z is positive in some directions and negative in others, then the point is a saddle point.
By completing the square in the approximation for ∆z, and writing λ = ∆x/∆y, we obtain

∆z ≈ (∆y²/(2z_xx)){(z_xx λ + z_xy)² + z_xx z_yy − z_xy²}.

If z_xx z_yy − z_xy² > 0, then the expression in curly brackets is always positive, so in every
direction ∆z has the same sign as z_xx, giving a proper maximum or minimum. If
z_xx z_yy − z_xy² < 0, then the expression in brackets is positive in some directions and
negative in others, so ∆z can be both positive and negative, giving a saddle. (The expression
for ∆z is valid only if z_xx ≠ 0, but there is a similar expression if z_yy ≠ 0, and the situation
where both are zero is easy to check.) The conclusions can be summarized in a table:
z_xx z_yy − z_xy² < 0    Saddle.
z_xx z_yy − z_xy² > 0    Proper minimum if z_xx > 0 or z_yy > 0.
                         Proper maximum if z_xx < 0 or z_yy < 0.
z_xx z_yy − z_xy² = 0    See below.
If z_xx z_yy − z_xy² = 0 at an isolated stationary point, then the second approximation to
∆z has the same sign as z_xx in all directions except one, in which it is zero. This suggests
that the surface is like a ridge or a trough. Unfortunately, third order effects may alter the
surface near the stationary point, and the conclusion may not be valid. However, if there
is a whole curve of stationary points along which z_xx z_yy − z_xy² = 0, then we can indeed
conclude that there is an improper maximum or minimum, roughly like a parabolic
cylinder, at all points on the curve. The sign of z_xx or z_yy can be used, as above, to tell
whether it is an improper maximum or minimum.
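The table translates directly into a small classifier. The sketch below is our own (second partials estimated by central differences); applied to z = x² + cos y from Question 22(a), it reports the saddle at (0, 0) and the proper minimum at (0, π):

```python
import math

def classify_stationary(zfun, x, y, h=1e-4, tol=1e-6):
    """Classify a stationary point of z = zfun(x, y) by the sign of the
    discriminant z_xx*z_yy - z_xy^2, as in the table above."""
    zxx = (zfun(x + h, y) - 2 * zfun(x, y) + zfun(x - h, y)) / h**2
    zyy = (zfun(x, y + h) - 2 * zfun(x, y) + zfun(x, y - h)) / h**2
    zxy = (zfun(x + h, y + h) - zfun(x + h, y - h)
           - zfun(x - h, y + h) + zfun(x - h, y - h)) / (4 * h**2)
    disc = zxx * zyy - zxy**2
    if disc < -tol:
        return "saddle"
    if disc > tol:
        return "proper minimum" if zxx > 0 else "proper maximum"
    return "no conclusion"

f = lambda x, y: x**2 + math.cos(y)   # surface from Question 22(a)
```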
Tutorial questions — Stationary points and optimization
* 21. Show that if you approximate z at the point C in Figure 3.3 by using firstly the vertical
section x + ∆x constant and secondly the vertical section y constant, then the end
result (equation (5)) is the same. (Hint: interchange x and y in equations (1)–(4). It is
essential that z_xy and z_yx be equal.)
22. Locate and classify the stationary points on the following surfaces, remembering that
x, y, and z are real:
(a) z = x² + cos y
(b) z = x³ − 3x + cos y
(c) z = (x³ − x + 6)(y³ − y)
(d) z = (x³ − x)(y² + 1)
(e) z = (x³ − x + 6)(y² + 1)
(f) z = cos x cos y.
23. Find the maximum volume of a cuboid (i.e., box with rectangular faces) that will fit with
its base on the (x, y) plane and its four upper corners touching the elliptic paraboloid
z = 1 − x² − 2y². (Hint: the base vertices are at the points (±x, ±y).) Check that the
volume is a proper maximum.
24. An open tin box with five rectangular faces (i.e. no lid) is to have volume 32 cubic units.
What are its dimensions if the minimum area of sheet metal is used? Check that the
area is a proper minimum.
25. Show that the surface z = x² − 2xy + y² has an improper minimum at all points along
the line y = x. Identify the surface, and sketch it roughly.
26. Classify, where possible, the stationary points on the following surfaces, remembering
that x, y, and z are real:
(a) z = x²y² − 2xy   (b) z = x/y + y/x   (c) z = x²y² − x².
* 27. Solve the equation z = 0 if z = y(y² − x⁴). Hence sketch the horizontal section z = 0
on the surface z = y(y² − x⁴) and determine the sign of z in the regions between the
portions of the contour. Show that z changes sign six times as you circle the origin.
Deduce that the stationary point at the origin is not one of the standard types. (This is
because it is an isolated stationary point at which z_xx z_yy − z_xy² = 0. Verify this fact.)
28. Verify by means of a sketch that a torus (e.g. a motor car inner tube or doughnut or
tenniquoit ring), when held vertically, has a proper maximum, a proper minimum, and
two saddles. Describe the stationary points when it is held horizontally.
29. If P and Q are arbitrary points on the lines r = (1 + 2t, 1 + t, t) and
r = (3 + u, 2 − u, 4 + 2u) respectively, express |PQ|² in terms of t and u. Find the
minimum value of |PQ|² and verify that the minimum occurs when the vector PQ is
perpendicular to (the direction vectors of) both lines.
* 30. After an experiment with n observations, it is required to draw the best line y = mx + c
through the n observed points (x₁, y₁), . . . , (xₙ, yₙ) in the plane. Note that the vertical
difference or error between the line y = mx + c and the point (xᵢ, yᵢ) is mxᵢ + c − yᵢ.
Let S(m, c) = Σᵢ₌₁ⁿ (mxᵢ + c − yᵢ)², the sum of the squared errors. Find expressions
for m and c that will minimize S. (Hint: note that S ≥ 0 and S → ∞ as m → ±∞
or c → ±∞, so a proper minimum is the only possibility.) This is called the method of
Least Squares, and was invented by Gauss.
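Setting ∂S/∂m = 0 and ∂S/∂c = 0 gives two linear equations in m and c (the normal equations), which can be solved directly. The function below is our own sketch of this; fed three collinear points it recovers the exact line:

```python
def least_squares_line(points):
    """Fit y = m*x + c by minimizing S(m, c) = sum of (m*x_i + c - y_i)^2,
    solving the normal equations from dS/dm = 0 and dS/dc = 0."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx          # zero only if all x_i coincide
    m = (n * sxy - sx * sy) / det
    c = (sxx * sy - sx * sxy) / det
    return m, c

m, c = least_squares_line([(0, 1), (1, 3), (2, 5)])   # points on y = 2x + 1
```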
* 31. It is possible to optimize a function f(r) over all points that satisfy the equation
φ(r) = 0 (i.e. subject to some restriction on the points r) by optimizing the function
f(r) + λφ(r), where λ is a dummy variable. This is called the method of Lagrange
multipliers. Use Lagrange multipliers to find the point(s) on the curve
x² + 16xy − 11y² = 100 that are closest to the origin. (Hint: let f(x, y) = x² + y² (the
square of the distance from the origin) and let φ = x² + 16xy − 11y² − 100. Remember
that the partial derivative with respect to λ must also be zero.) Notice that it is not
easy to solve for y and express the distance in terms of x only, otherwise the problem
could be solved by elementary calculus.
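The result given in the Answers (λ = −1/5 at (x, y) = ±(4, 2)) can be checked numerically: grad(f + λφ) should vanish there, and the constraint should hold. A small sketch (variable names are ours, derivatives by central differences):

```python
f = lambda x, y: x**2 + y**2                             # squared distance
phi = lambda x, y: x**2 + 16 * x * y - 11 * y**2 - 100   # constraint
lam = -0.2                                               # lambda = -1/5
L = lambda x, y: f(x, y) + lam * phi(x, y)

x0, y0, h = 4.0, 2.0, 1e-6
Lx = (L(x0 + h, y0) - L(x0 - h, y0)) / (2 * h)           # should vanish
Ly = (L(x0, y0 + h) - L(x0, y0 - h)) / (2 * h)           # should vanish
```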
Answers
1. (a) Ellipsoid (|x| < 2, |y| < √2, |z| < 2/√3), (b) hyperboloid of one sheet (elliptical
sections for all values of z), (c) hyperboloid of two sheets (elliptical sections for |z| > 2/√3),
(d) elliptic cone (elliptical sections for |z| ≠ 0), (e) elliptic paraboloid (cap) (elliptical
sections for z < 4/3), (f) hyperbolic paraboloid (saddle), (g) hyperbolic cylinder,
(h) elliptic cylinder (elliptical sections for all values of z), (i) surface does not exist (sum
of squares cannot be negative).
2. (a) −√2π = ∂z/∂x, (b) π/√2 = ∂z/∂y, (c) −π/2, (d) 3π/2, (e) −√2π.
3. (i) t = 1, u = (2, 3)/√13, and dz/ds = −48/√13. (ii) Direction (30, −7), maximum
rate √949.
4. (ii) (u, v, w) · (∂z/∂x, ∂z/∂y, −1) = 0.
(iii)* (u, v) ∥ grad z, so u = k ∂z/∂x and v = k ∂z/∂y for some k. By part (ii) it follows
that w = k(∂z/∂x)² + k(∂z/∂y)² = k|grad z|².
5. r · n = d (or ax + by + cz = d) represents a plane, and n (or (a, b, c)) is a normal to the
plane.
6. dφ/ds = 8xu₁ + 2yu₂ − 2zu₃. Hyperboloid of one sheet.
grad φ = (8, 0, 0), (8, 4, −4), (0, 2√5, 2). Tangent plane 2x + y − z = 2.
Surface φ = 0 is a double elliptic cone, and the origin is the vertex, where it is not smooth.
7. z = 11; point on surface is (1, −2, 11); plane is 30x − 7y − z = 33.
8. (a) (x, y)/(x² + y²), (b) (ye^(xy) + cos(x + y), xe^(xy) + cos(x + y)),
(c) (yz + e^(x+y+z), xz + e^(x+y+z), xy + e^(x+y+z)), (d) 2(x, y, z).
9. (a) 0, (b) 2eˣ cos y, (c) yzeˣ + xzeʸ + xyeᶻ, (d) 3.
10. (a) 0, (b) (−1, 1, −1), (c) (−2, 0, −1), (d) (−y, −z, −x), (e) 0, (f) (0, 0, 2).
Dot product (curl v) · v = 0 only when curl v = 0.
13. (a) 0, harmonic, (b) (x² + y²)e^(xy) − 2 sin(x + y), (c) 3e^(x+y+z), (d) 6.
15. (a) x²y + ⅓y³ + c, (b) none, (c) none, (d) x sin 2y + c.
16. (i) Put x = uy; solution y(x² + ⅓y²) = c.
(ii) Put x = uy; solution φ(x, y) = ln y + 2√(x/y) = c.
∇φ = x^(−1/2)y^(−3/2)(y, −x + √(xy)).
17. ∇ × (M, N, 0) = (0, 0, ∂N/∂x − ∂M/∂y), so the curl is zero if and only if
∂N/∂x = ∂M/∂y, which is the test for exactness.
18. (a) ½(x² + y² + z²) + c, (e) xy² sin z + ½z² + c.
19. φ = ½(x² + y² − z²). Equipotential surfaces are hyperboloid of two sheets inside cone
inside hyperboloid of one sheet.
20. arctan(y/x) (i.e., the polar angle). NB arctan(y/x) = π/2 − arctan(x/y).
22. (a) Saddles at (0, 2kπ), proper minima at (0, (2k+1)π).
(b) Saddles at (1, 2kπ) and at (−1, (2k+1)π), proper maxima at (−1, 2kπ), proper minima
at (1, (2k+1)π).
(c) Proper maximum at (−1/√3, −1/√3), proper minimum at (−1/√3, 1/√3), saddles at
(1/√3, 1/√3), (1/√3, −1/√3), (−2, ±1), (−2, 0).
(d) Saddles at (±1/√3, 0).
(e) Saddle at (−1/√3, 0), proper minimum at (1/√3, 0).
(f) Saddles at (π/2 + kπ, π/2 + mπ). Proper stationary points at (nπ, lπ): maxima if l + n
is even and minima if l + n is odd.
23. V = 4xyz = 4xy(1 − x² − 2y²). Maximum when x = ½, y = 1/(2√2), V = 1/(2√2).
24. xyz = 32 and A = xy + 2yz + 2zx = xy + 2(x + y)·32/(xy). Dimensions for minimum
area: 4 × 4 × 2.
25. z = (x − y)², parabolic cylinder. See sketch below.
Figure 3.4. Sketches for Questions 25 and 27
26. (a) Saddle at (0, 0). Improper minimum along xy = 1.
(b) Improper minimum along x = y. Improper maximum along x = −y.
(c) Improper stationary points along x = 0; maxima if −1 < y < 1 and minima if |y| > 1.
27. Level curves z = 0 have equations y = 0 or y = ±x2 . See sketch above.
28. Improper maximum along upper surface and improper minimum along lower surface.
29. |PQ| = √3 when PQ = (−1, 1, 1).
Calculus Chapter 3
30. ∂S/∂c = 2 Σ(mxi + c − yi) = 0, ∴ m Σxi + c Σ1 = Σyi.
    ∂S/∂m = 2 Σxi(mxi + c − yi) = 0, ∴ m Σxi² + c Σxi = Σxi yi.
    By Cramer's Rule
        (m, c)ᵀ = 1/((Σxi)² − n Σxi²) · ( Σxi  −n ; −Σxi²  Σxi ) (Σyi, Σxi yi)ᵀ.
31. Minima when λ = −1/5, (x, y) = ±(4, 2).
Calculus Chapter 4
Vector integration
Path integrals in scalar fields
Suppose φ = φ(r) is a scalar field defined in some region in the plane or in space, and C is
a finite curve or path in the region. If s denotes arc length along C, then on C we can
use s as parameter and write r = r(s). Thus we have
a chain of dependence in which φ depends on x and y (or, in space, on x, y, and z), each of which in turn depends on s,
so φ is a function of s, exactly as in the section on directional derivatives in Calculus
Chapter 3, but now we wish to integrate with respect to s instead of differentiate. We
define
    ∫_C φ ds = ∫_{s0}^{s1} φ(r(s)) ds,
where s = s0 and s = s1 respectively define the initial and final points r0 and r1 of the
path C. Such an integral is called a path integral, or sometimes a line integral, even
when the path C is not a straight line. If C is parametrized by a different variable, say t,
then ds/dt = ±|dr/dt|, as remarked in Calculus Chapter 2, so, using integration by substitution, the formula becomes
    ∫_C φ ds = ± ∫_{t0}^{t1} φ(r(t)) |dr/dt| dt,
where the parameter values t = t0 and t = t1 respectively define the initial and final points of the path C. The minus sign before the integral must be inserted when ds/dt = −|dr/dt|, i.e., when the parameter t decreases as s increases.
If φ = 1 (constant), then the path integral ∫_C ds represents the arc length of C. The ratio ∫_C x ds / ∫_C ds is the x-component of the centre of mass of the curve, and there are similar expressions for y- and z-components. Alternatively, if φ is the mass per unit length, then ∫_C φ ds is the total mass of the curve. For a positive two dimensional field φ(x, y), the path integral ∫_C φ ds represents the area of the vertical cliff face exposed when the surface z = φ(x, y) is cut, as if by a band saw, along the curve C in the (x, y) plane. This is still true if φ(x, y) assumes negative values, provided that areas of regions below the (x, y) plane are taken to be negative.
Figure 4.1. Path integral representing area of a vertical face
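The definition above is easy to check numerically. The following Python sketch (an addition of ours, not part of the printed text; the function name and step count are illustrative) approximates ∫_C φ ds by sampling φ at the midpoint of each small step and multiplying by the chord length of that step:

```python
import math

def path_integral_scalar(phi, r, t0, t1, n=50_000):
    """Approximate the path integral of the scalar field phi along the
    curve r(t), t0 <= t <= t1, using short chords as the element ds."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt             # midpoint of the i-th step
        xa, ya = r(t - dt / 2)
        xb, yb = r(t + dt / 2)
        ds = math.hypot(xb - xa, yb - ya)   # chord length, approximately ds
        x, y = r(t)
        total += phi(x, y) * ds
    return total

# A quarter of the unit circle, traversed anti-clockwise from (1, 0) to (0, 1)
quarter = lambda t: (math.cos(t), math.sin(t))

arc = path_integral_scalar(lambda x, y: 1.0, quarter, 0.0, math.pi / 2)
mass = path_integral_scalar(lambda x, y: x * y, quarter, 0.0, math.pi / 2)
print(arc)   # approximately pi/2, the arc length (phi = 1)
print(mass)  # approximately 0.5
```

With φ = 1 the routine recovers the arc length, as remarked above; the parameter t here increases with s, so no sign adjustment is needed.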
Tutorial questions — path integrals in scalar fields
1. Evaluate ∫_C φ ds in the following cases.
(a) φ(x, y, z) = y e^{x+z} and C is the path r = (ln sin t, t√2, ln sec t) from the point where t = π/6 to the point where t = π/3.
(b) φ(x, y) = xy and C is the path from (1, 0) to (0, 1) along
    (i) a straight line,
    (ii) a quarter circle anti-clockwise,
    (iii) a three-quarter circle clockwise.
(Choose your own parametrizations, and remember the sign.)
Path integrals in vector fields
The integrands in the path integrals in the previous section above occur in regions in which
a scalar field φ = φ(r) is defined. More important path integrals are those in which the
integrand is a component of a vector field, say v = v(r), defined in the region. The unit
tangent vector to the curve C is u = dr/ds, so the component of v in the direction tangential to C is v · dr/ds, and the integral along C of the tangential component of v is
    ∫_{s0}^{s1} v · (dr/ds) ds,   or   ∫_C v · dr,
by the usual process of symbolically “cancelling differentials” because of the chain rule.
The integral is evaluated in terms of a general parameter t by using the expression
    ∫_C v · dr = ∫_{t0}^{t1} v · (dr/dt) dt.
Since no absolute values appear in the formula, no adjustment of signs is necessary if the
parameter t decreases along the curve C.
For a two dimensional vector field v = (f, g), the dot product v · dr = f dx + g dy, since
dr = (dx, dy), so we can also write
    ∫_C v · dr = ∫_C f dx + g dy.
For a force field, say F, the integral ∫_C F · dr represents the work done by the field in moving a particle along the path C, since Work = ∫ Force d(Distance) and at each point only the tangential component of the force is relevant. A familiar result from mechanics is that the work done is equal to the change in potential. This result is true for any conservative field F, i.e., any field that has a potential function.
Theorem. If the vector field F is conservative, and has potential function φ, then
    ∫_C F · dr = φ(r1) − φ(r0),
where r0 and r1 are respectively the initial and final points of the curve C.
Proof. Since φ is a potential function for F, we have F = ∇φ. Thus F · dr/ds = ∇φ · u = dφ/ds, by the formula for directional derivatives, where s denotes arc length along the path C. If the points r0 and r1 are given by parameter values s = s0 and s = s1 respectively, then
    ∫_C F · dr = ∫_{s0}^{s1} F · (dr/ds) ds = ∫_{s0}^{s1} (dφ/ds) ds = [φ(r(s))]_{s0}^{s1} = φ(r1) − φ(r0), as required.
This theorem shows that the value of an integral in a conservative field depends only on
the initial and final points of the path, not on the path itself. We say that such integrals are
independent of the path. This is rather like a definite integral in elementary calculus,
with the potential function playing the part of the indefinite integral.
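The theorem can be illustrated numerically. In the Python sketch below (ours, with a made-up conservative field, not one taken from the text) F = ∇φ for φ = x²y, and integrating along two quite different paths joining (0, 0) to (1, 1) gives the same value φ(1, 1) − φ(0, 0) = 1:

```python
def path_integral_vector(F, r, t0, t1, n=50_000):
    """Approximate the path integral of F along the curve r(t) by summing
    F (at the midpoint of each step) dotted with the step Delta-r."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        x, y = r(t)
        fx, fy = F(x, y)
        xa, ya = r(t - dt / 2)
        xb, yb = r(t + dt / 2)
        total += fx * (xb - xa) + fy * (yb - ya)   # F . Delta-r
    return total

F = lambda x, y: (2 * x * y, x * x)   # gradient of phi = x^2 y, so conservative

W_line = path_integral_vector(F, lambda t: (t, t), 0.0, 1.0)      # straight line
W_par = path_integral_vector(F, lambda t: (t, t * t), 0.0, 1.0)   # parabola
print(W_line, W_par)   # both approximately 1.0 = phi(1,1) - phi(0,0)
```

The two results agree to many decimal places, illustrating independence of the path for a conservative field.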
Corollary. If F is a conservative vector field defined throughout a plane region containing a closed curve C and its interior, then ∮_C F · dr = 0.
(Notice how a little circle is put on the integral sign to emphasize that the integral is around a closed curve.)
Proof. The result follows immediately from the theorem, since for a closed curve the initial
and final points coincide, i.e., r0 = r1 . (We shall show after Green’s theorem why F must
be defined throughout the interior of C, as well as on C itself.)
(This is similar to the fact that ∫_{−a}^{a} (1/x) dx ≠ [ln |x|]_{−a}^{a}, because the integrand is undefined at the point x = 0 between the endpoints.)
Tutorial questions — path integrals in vector fields
2. Evaluate ∫_C v · dr in the following cases.
(a) v = (x² − y², 2xy) and C is the hyperbola x² − y² = 1 from (1, 0) to (2, √3).
(b) v = (cos x, y sec x) and C is the portion of the parabola y² = 4x from (0, 0) to (π/4, √π).
(c) v = (xz, yz, 2xy) and C is the path r = (t cos t, t sin t, t) from (0, 0, 0) to (−π, 0, π).
(d) v = (yz, xz, xy) and C is the same path as in (c).
(e) v = (x, y) and C is the ellipse x²/a² + y²/b² = 1. (Choose a parametrization going anti-clockwise all the way around C.)
(f) v = (y, −x) and C is the same ellipse as in (e).
3. Use the potential functions found in Calculus Chapter 3 Questions 15 and 18 to evaluate ∫_C F · dr for the following conservative fields F between the points given.
(a) F = (2xy, x² + y²) from (0, 0) to (1, 3).
(b) F = (sin 2y, 2x cos 2y) from (1, 0) to (2, 2π/3).
(c) F = (x, y, z) from (1, 1, 1) to (2, −3, 6).
(d) F = (y² sin z, 2xy sin z, xy² cos z + z) from (0, 0, 0) to (1, −1, π).
Check your answers by integrating along the straight line between the points, using the parametrization r = (1 − t)r0 + tr1 from t = 0 to t = 1.
4. At time t = 0 a particle of unit mass is at rest at the point (0, 1) in the force field F =
(2, −y).
(i) Solve Newton’s equation F = mr̈ for r(t), using the initial conditions given to
evaluate the arbitrary constants. (Hint: (2, −y) = 1(ẍ, ÿ); solve separately for x and y.)
(ii) Evaluate the work done between time t = 0 and time t = π by integration along the
path r = r(t).
(iii) Find a potential function for F and use it to re-calculate the work done.
(iv) Find ṙ and hence evaluate the kinetic energy (1/2)m|ṙ|² of the particle at time t = π.
Verify that the gain in kinetic energy is equal to the work done.
5. Prove the result of Question 4(iv) in general as follows:
(i) Show that d/dt ((1/2) ṙ · ṙ) = r̈ · ṙ, using rules for vector differentiation.
(ii) Deduce that ∫ F · dr is equal to the change in the kinetic energy (1/2)m|ṙ|². (Hint: substitute F = mr̈ and put dr = ṙ dt.)
6. Show that the vector field F = (1/(x² + y²)) (−y, x) is conservative where it is defined. Evaluate ∮ F · dr around the unit circle without using the potential function. Why is the answer not zero?
Double and repeated integrals
An ordinary definite integral ∫_a^b f(x) dx represents (if f(x) is positive) the area of a plane region bounded by the lines x = a, x = b, the x axis, and the curve y = f(x). If we now take a function of two variables f = f(x, y) (again assumed for simplicity to be positive), defined in a bounded region R in the (x, y) plane, then the double integral ∬_R f(x, y) dA represents the volume of a solid region lying above the (x, y) plane, below the surface z = f(x, y), and within “walls” erected on the boundary of R. In particular, if f(x, y) = 1 (constant), then the solid has constant height 1 and horizontal cross-sections identical with R, so its volume (i.e. cross-sectional area times height) is equal to the area of R. Thus we have
    ∬_R dA = Area of R.
Figure 4.2. Double integral as the limit of a sum
A double integral can be regarded as the limit of a sum, as shown in Figure 4.2.
Imagine the base region R split up into small subregions numbered by two indices, say
i and j. Suppose the (i, j)th subregion has area ∆Aij and contains the point (xi , yj ). Then
f (xi , yj )∆Aij represents the volume of a pillar of height f (xi , yj ) erected on this subregion
and stretching up to a point on the surface. By adding up the volumes of all of these pillars, we obtain an approximation to the total volume, i.e.
    Σ_{i=1}^{m} Σ_{j=1}^{n} f(xi, yj) ∆Aij ≈ ∬_R f(x, y) dA.
This becomes an exact expression for the double integral if we let m and n tend to infinity
in such a way that the subareas ∆Aij all tend to zero.
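The limit-of-a-sum description translates directly into code. The sketch below (our own illustration; the names and subdivision counts are arbitrary) splits a rectangle into m × n equal subregions and sums the pillar volumes f(xi, yj)∆Aij:

```python
def double_integral_riemann(f, a, b, c, d, m=400, n=400):
    """Approximate the double integral of f(x, y) over the rectangle
    [a, b] x [c, d] as a sum of pillar volumes f(x_i, y_j) * dA,
    sampling f at the midpoint of each subregion."""
    dx, dy = (b - a) / m, (d - c) / n
    dA = dx * dy
    total = 0.0
    for i in range(m):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y) * dA
    return total

V = double_integral_riemann(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0)
print(V)   # approximately 0.25, the double integral of xy over the unit square
```

Increasing m and n corresponds to letting the subareas ∆Aij tend to zero, and the sum tends to the exact value of the double integral.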
The average value of a function f(x, y) over a region R is defined to be the ratio
    ∬_R f(x, y) dA / ∬_R dA = ∬_R f(x, y) dA / Area of R,
because of the obvious interpretation
    Average height = Total volume / Base area.
For practical purposes we need a more efficient procedure for evaluating double integrals. This comes by slicing the solid region into plane vertical sections, which we first take parallel to the y axis. The area of each vertical section is a path integral, say ∫_{y0(x)}^{y1(x)} f(x, y) dy, along a line on which x is constant and ds = dy. The endpoints y = y0(x) and y = y1(x) are equations of boundary curves of R, as shown in Figure 4.2. Since Volume = ∫ Area d(Length), and since there is a section of this type for each x from x = a to x = b, we obtain
    ∬_R f(x, y) dA = ∫_a^b ∫_{y0(x)}^{y1(x)} f(x, y) dy dx,
which is called a repeated integral.
Figure 4.3. Repeated integrals
Similarly, if we take vertical sections parallel to the x axis, as also illustrated in Figure 4.3, then the area of a typical section is ∫_{x0(y)}^{x1(y)} f(x, y) dx, where y is constant, and x = x0(y) and x = x1(y) are equations of the left and right boundaries of R. By integrating these areas with respect to y from y = c to y = d, as shown, we see that
    ∬_R f(x, y) dA = ∫_c^d ∫_{x0(y)}^{x1(y)} f(x, y) dx dy,
which is a repeated integral in which the order of integration has been reversed.
Figure 4.4. Reversing the order of integration
If the region R is not convex, so that some vertical sections cut the boundary more than
twice, or if the equations for the boundary curves change, then R must be subdivided, and
the repeated integrals over each subregion evaluated separately, and then added together.
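A repeated integral with variable inner limits can be approximated numerically in the same spirit. In the sketch below (ours; the function name and step counts are arbitrary) the inner endpoints are functions of the outer variable, and the same region — between y = x² and y = x — is integrated in both orders, giving the same value:

```python
import math

def repeated_integral(f, a, b, lo, hi, m=800, n=800):
    """Approximate the repeated integral of f(outer, inner) for
    a <= outer <= b and lo(outer) <= inner <= hi(outer), midpoint rule."""
    d_out = (b - a) / m
    total = 0.0
    for i in range(m):
        u = a + (i + 0.5) * d_out
        d_in = (hi(u) - lo(u)) / n
        inner = sum(f(u, lo(u) + (j + 0.5) * d_in) for j in range(n)) * d_in
        total += inner * d_out
    return total

# Integral of xy over the region between y = x^2 and y = x (0 <= x <= 1) is 1/24.
A = repeated_integral(lambda x, y: x * y, 0.0, 1.0, lambda x: x * x, lambda x: x)
# Order reversed: x runs from y to sqrt(y) for 0 <= y <= 1.
B = repeated_integral(lambda y, x: x * y, 0.0, 1.0, lambda y: y, lambda y: math.sqrt(y))
print(A, B)   # both approximately 1/24
```

Agreement of the two values is a useful check that the reversed limits really describe the same region.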
Tutorial questions — double and repeated integrals
7. Evaluate ∬_R xy dA as the limit of a sum, where R is the rectangle with vertices (0, 0), (0, 1), (1, 0), and (1, 1). (Hint: take ∆Aij = 1/(mn) and (xi, yj) = (i/m, j/n), where i goes from 1 to m, and j goes from 1 to n.)
Check your answer by expressing the double integral as a repeated integral.
8. Evaluate the double integrals ∬_R f(x, y) dx dy as repeated integrals. Check your answers by reversing the order of integration.
(a) f(x, y) = x/y; R is bounded by the curves xy = 1, y = 1, and x = 2.
(b) f(x, y) = √x − y²; R is the region between the curves y = x² and x = y⁴.
(c) f(x, y) = cos(x + y); R is the rectangle with vertices (0, 0), (0, π), (π/2, 0), (π/2, π).
(d) f(x, y) = √(xy); R is the region between the curves y = √x and y = x.
9. Find the areas of the regions in Question 8. Hence calculate the average values of the given functions over the corresponding regions.
10. By sketching the region, then reversing the order of integration, evaluate:
(a) ∫_0^4 ∫_{√y}^{2} 3e^{x³} dx dy,    (b) ∫_0^4 ∫_0^{√(4−x)} √(y² + x) dy dx.
11. If R is the quadrilateral with vertices (0, 0), (1, 1), (2, 6), (0, 4), write ∬_R f(x, y) dy dx as the sum of two repeated integrals, inserting the correct lower and upper endpoints on each integral sign. How many separate repeated integrals would you need if you reversed the order of integration?
12. Find the volumes of the solids enclosed by the surfaces below. (Hint: integrate the
difference between the upper and lower z values. The boundaries of the region of
integration are found by using those equations that do not involve z (which represent
vertical “walls” for the solid), or by equating z values to find where the upper and lower
surfaces meet.)
(a) x = 0, y = x, z = 0, z = 1 − y³.    (b) z² = 4ax, x² + y² = ax.
(c) z = x² + y², z = 2x.    (d) x² + y² = 1, y² + z² = 1.
Change of variables in double integrals
In integration by substitution, when we replace an integral ∫ f(x) dx by ∫ f(x(u)) (dx/du) du, the factor dx/du can be thought of as a linear scaling factor, since in general the u scale on the axis will differ from the x scale, and may also vary from point to point. Similarly, when the variables x and y in a double integral are replaced by new variables u and v, a scaling factor for area must be introduced, since increases of ∆u in u and ∆v in v will not in general increase the area by simply ∆u∆v. The scaling factor for area is called the Jacobian of the transformation, and it is denoted J or ∂(x,y)/∂(u,v).
Theorem. If r = (x, y), where x = x(u, v) and y = y(u, v), then the Jacobian J or ∂(x,y)/∂(u,v) is given by the formula
    ∂(x,y)/∂(u,v) = |∂r/∂u × ∂r/∂v|.
Proof. The lines u constant, v constant, u + ∆u constant, and v + ∆v constant transform into curves in the (x, y) plane with the same equations, interpreted implicitly, as shown in Figure 4.5. Thus the rectangle bounded by these lines in the (u, v) plane transforms into a region in the (x, y) plane with four curved boundaries as shown. However, using first approximations of the type x(u + ∆u, v) ≈ x(u, v) + (∂x/∂u)∆u, we obtain
    r(u + ∆u, v) = (x(u + ∆u, v), y(u + ∆u, v)) ≈ (x, y) + (∂x/∂u, ∂y/∂u)∆u = r + (∂r/∂u)∆u,
    r(u, v + ∆v) = (x(u, v + ∆v), y(u, v + ∆v)) ≈ (x, y) + (∂x/∂v, ∂y/∂v)∆v = r + (∂r/∂v)∆v.
Figure 4.6. Jacobian of a transformation
It follows that the region with curved boundaries in the (x, y) plane can be approximated by the parallelogram defined by the vectors (∂r/∂u)∆u and (∂r/∂v)∆v. The parallelogram has area |(∂r/∂u)∆u × (∂r/∂v)∆v|, which is equal to |∆u| |∆v| |∂r/∂u × ∂r/∂v|, whereas the area of the rectangle is simply |∆u| |∆v|. Thus we have
    area in (x, y) plane / corresponding area in (u, v) plane ≈ area of parallelogram / area of rectangle = |∂r/∂u × ∂r/∂v|.
In the limit, as ∆u and ∆v tend to zero, the result becomes exact, and gives the required formula.
Corollary 1. An alternative formula for the Jacobian J or ∂(x,y)/∂(u,v) is
    ∂(x,y)/∂(u,v) = |det ( xu  xv ; yu  yv )|,
where the subscripts denote partial derivatives.
Proof. It is easy to show that ∂r/∂u × ∂r/∂v = (0, 0, xu yv − yu xv), so
    |∂r/∂u × ∂r/∂v| = |xu yv − yu xv| = |det ( xu  xv ; yu  yv )|.
Corollary 2. ∂(u,v)/∂(x,y) = (∂(x,y)/∂(u,v))⁻¹, i.e. the Jacobian of the inverse transformation is the reciprocal of J.
Proof. The result is, in a sense, obvious, since the Jacobians are ratios of corresponding areas in the two planes. More formally, we know that the matrices ( xu  xv ; yu  yv ) and ( ux  uy ; vx  vy ) are inverses of each other, so their determinants are reciprocals of each other, which by Corollary 1 gives the required result.
With the use of the Jacobian, a double or repeated integral can be transformed as follows:
    ∬_R f(x, y) dx dy = ∬_S f(x(u, v), y(u, v)) |∂(x,y)/∂(u,v)| du dv.
The endpoints of the integrals with respect to u and v are obtained by expressing the equations of the boundary of the original region R in terms of u and v. The most important transformation is to polar co-ordinates r and θ, for which we have
    ∂(x,y)/∂(r,θ) = |xr yθ − yr xθ| = |cos θ (r cos θ) − sin θ (−r sin θ)| = r(cos²θ + sin²θ) = r,
so a double integral transforms from cartesian to polar co-ordinates thus:
    ∬_R f(x, y) dx dy = ∬_S f(r cos θ, r sin θ) r dr dθ.
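The polar form is easily tested numerically. This Python sketch (ours; the radius and subdivision counts are arbitrary) sums f(r cos θ, r sin θ) r ∆r ∆θ over a disc, with the Jacobian factor r supplying the area element; for f = e^{−x²−y²} the exact value over a disc of radius R is π(1 − e^{−R²}):

```python
import math

def polar_double_integral(f, R, m=600, n=600):
    """Approximate the double integral of f(x, y) over the disc
    x^2 + y^2 <= R^2 using polar co-ordinates: dA = r dr dtheta,
    with the Jacobian factor r included."""
    dr = R / m
    dth = 2 * math.pi / n
    total = 0.0
    for i in range(m):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += f(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

# Integral of e^(-x^2 - y^2) over the disc of radius 2 is pi * (1 - e^(-4))
G = polar_double_integral(lambda x, y: math.exp(-(x * x + y * y)), 2.0)
print(G)   # approximately 3.084
```

Omitting the factor r in the sum gives a visibly wrong answer, which is a quick way to convince yourself that the Jacobian is really needed.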
Tutorial questions — change of variables in double integrals
13. For the transformation x = √(u − v), y = √(u + v),
    (i) find the Jacobian ∂(x,y)/∂(u,v),
    (ii) solve for u and v in terms of x and y,
    (iii) find the Jacobian ∂(u,v)/∂(x,y),
    (iv) verify that (∂(x,y)/∂(u,v)) (∂(u,v)/∂(x,y)) = 1.
14. For a transformation r = r(u, v), explain why the vector ∂r/∂u is a tangent to the curve with implicit equation v = constant, and why the vector ∇v is a normal to the same curve. Find the dot product of these two vectors, and prove that the dot product is zero.
15. If ∂x/∂u = ∂y/∂v and ∂x/∂v = −∂y/∂u (the Cauchy-Riemann equations), show that the vectors ∂r/∂u and ∂r/∂v are perpendicular and of the same magnitude. (It follows that a small square transforms approximately to a square.) Also show that ∂(x,y)/∂(u,v) = |∂r/∂u|².
16. If R is the triangle with vertices (0, 0), (3, 1), (1, 2), find the equations of the boundaries of R. Transform these equations by the substitutions x = 3u + v, y = u + 2v, and find the Jacobian of the transformation. Use the transformation to evaluate ∬_R dx dy/(x + 2y + 5).
17. By making the transformation x = u² − v², y = uv (where u > 0 and v ≥ 0), evaluate the integral ∬_R √(x² + 4y²) dy dx, where R is the region bounded by the positive x axis, the positive y axis, and the parabola y² = 1 − x.
* 18. Sketch the region bounded by the rays with polar angles θ and θ + ∆θ, and the circles with centre at the origin and radii r and r + ∆r. Find the area of the region (the difference between two sectors). Hence prove directly that ∂(x,y)/∂(r,θ) = r.
19. Re-do the integrals in Question 12(b)–(c) by changing to polar co-ordinates.
20. Sketch the region of integration for the integral ∫_0^1 ∫_0^{√(x−x²)} √(1 − x² − y²) dy dx and then evaluate the integral by changing to polar co-ordinates.
21. Use the transformation x = (1/2)(v + u/v), y = (1/2)(v − u/v) to evaluate ∬_R (x² − y²) dx dy, where R is the region bounded by the curves x = y, x² − y² = 1, x + y = 1, x + y = 2.
22. Sketch the region R in the first quadrant bounded by the curves xy² = 1, xy² = 4, y² = 4x, y² = 9x. Find the area of R by using the transformation x² = u/v, y⁴ = uv, where u > 0 and v > 0.
23. The work done during one cycle of a Carnot engine is equal to the area of the region in the (p, v) plane enclosed by the curves pv = x0, pv = x1, pv^γ = y0, pv^γ = y1, where x0, x1, y0, y1 and γ are positive constants (and γ ≠ 1). Determine this area by using the substitutions x = pv, y = pv^γ. (Hint: ∂(p,v)/∂(x,y) = (∂(x,y)/∂(p,v))⁻¹.)
* 24. (A proof that ∫_0^∞ e^{−x²} dx = (1/2)√π.) If Q(R) denotes the region in the first quadrant lying inside the circle x² + y² = R², show, by changing to polar co-ordinates, that
    ∬_{Q(R)} e^{−x²−y²} dx dy = (π/4)(1 − e^{−R²}).
If S(R) denotes the square 0 ≤ x ≤ R, 0 ≤ y ≤ R, show that Q(R) ⊂ S(R) ⊂ Q(R√2) (use a sketch), and deduce that
    (π/4)(1 − e^{−R²}) < ∫_0^R ∫_0^R e^{−x²−y²} dx dy < (π/4)(1 − e^{−2R²}).
Show that the integral in the middle is equal to (∫_0^R e^{−x²} dx)², and deduce the final result by letting R → ∞ and then taking square roots.
Green’s theorem
We proved in Chapter 3 that if a vector field is conservative, then its curl is zero, but we
were unable to prove the converse. We now do so by means of Green’s theorem, which
connects a path integral around the boundary of a region with a double integral over the
region itself.
Green’s theorem. If f(x, y) and g(x, y) are functions defined on a simple closed curve C and throughout its interior region R, then
    ∬_R (∂g/∂x − ∂f/∂y) dA = ∮_C f dx + g dy.
(The path integral on the right hand side can also be written as ∮_C (f, g) · dr, and must be taken in the positive, i.e. anticlockwise, direction.)
Proof. We suppose the region R is convex, from which it follows that it can be expressed either in the form a ≤ x ≤ b, y0(x) ≤ y ≤ y1(x) or in the form c ≤ y ≤ d, x0(y) ≤ x ≤ x1(y), as shown in Figure 4.4. (Otherwise split R up into convex subregions, and treat them separately.) We write the double integral on the left hand side as the difference of two repeated integrals, choosing the order of integration appropriately for each one. This gives
    ∬_R (∂g/∂x − ∂f/∂y) dA = ∫_c^d ∫_{x0(y)}^{x1(y)} (∂g/∂x) dx dy − ∫_a^b ∫_{y0(x)}^{y1(x)} (∂f/∂y) dy dx
        = ∫_c^d [g(x, y)]_{x0(y)}^{x1(y)} dy − ∫_a^b [f(x, y)]_{y0(x)}^{y1(x)} dx
        = ∫_c^d (g(x1(y), y) − g(x0(y), y)) dy − ∫_a^b (f(x, y1(x)) − f(x, y0(x))) dx
        = ∮_C g dy + ∮_C f dx = ∮_C (f dx + g dy).
Corollary 1. If R is a plane region with boundary C, then
    Area of R = ∮_C x dy = −∮_C y dx = (1/2) ∮_C (−y dx + x dy).
Proof. Since the area of R is equal to ∬_R 1 dA, it follows from Green’s theorem that the area is also equal to ∮_C f dx + g dy for any functions f and g such that ∂g/∂x − ∂f/∂y = 1. The above are just some examples (firstly f = 0 and g = x, secondly f = −y and g = 0, and thirdly f = −(1/2)y and g = (1/2)x).
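Corollary 1 turns an area computation into a one-dimensional integral over the boundary alone. The following sketch (our own illustration, not from the text) applies Area = (1/2)∮(−y dx + x dy) to the ellipse x = a cos t, y = b sin t, whose area is πab:

```python
import math

def area_by_boundary(r, n=100_000):
    """Area enclosed by the closed curve r(t), 0 <= t <= 2*pi, traversed
    anti-clockwise, via Green's theorem: Area = (1/2) * loop-integral of
    (-y dx + x dy), approximated with small steps dx, dy."""
    dt = 2 * math.pi / n
    area = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = r(t)
        xa, ya = r(t - dt / 2)
        xb, yb = r(t + dt / 2)
        area += 0.5 * (x * (yb - ya) - y * (xb - xa))
    return area

a, b = 3.0, 2.0
E = area_by_boundary(lambda t: (a * math.cos(t), b * math.sin(t)))
print(E)   # approximately pi * a * b = 6*pi
```

Traversing the curve clockwise instead would negate the result, in keeping with the requirement that C be taken in the positive direction.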
Corollary 2. If F is a vector field in the (x, y) plane defined on a simple closed curve C and throughout its interior region R, then:
(i) ∬_R (curl F · k) dA = ∮_C F · dr = ∮_C (F · u) ds, where u is the unit anticlockwise tangent vector to the curve C,
(ii) ∬_R div F dA = ∮_C (F · n) ds, where n is the unit outward normal vector to the curve C.
In words, this result says that:
(i) the double integral of curl F · k over the interior of C is equal to the path integral of the anticlockwise tangential component of F around C,
(ii) the double integral of div F over the interior of C is equal to the path integral of the outward normal component of F around C.
Proof. (i) Write F = (F1, F2, 0); then curl F = (0, 0, ∂F2/∂x − ∂F1/∂y) and curl F · k = ∂F2/∂x − ∂F1/∂y. Thus by Green’s theorem with f = F1 and g = F2 we have
    ∬_R (curl F · k) dA = ∬_R (∂F2/∂x − ∂F1/∂y) dA = ∮_C F1 dx + F2 dy = ∮_C F · dr.
Next by definition we have
    ∮_C F · dr = ∮_C F · (dr/ds) ds = ∮_C F · u ds,
since the unit tangent vector u = dr/ds = (dx/ds, dy/ds).
(ii) It is clear (see the section on plane curvature in Calculus Chapter 2) that the vector (dy/ds, −dx/ds) is perpendicular to u, has the same magnitude as u, and points outward from C. Thus n = (dy/ds, −dx/ds), or
    n ds = (dy, −dx),
and by Green’s theorem with f = −F2 and g = F1 we have
    ∬_R div F dA = ∬_R (∂F1/∂x + ∂F2/∂y) dA = ∮_C (−F2 dx + F1 dy) = ∮_C F · (dy/ds, −dx/ds) ds = ∮_C F · n ds.
The quantity ∮_C F · dr is often called the circulation of the vector field F around the closed curve C.
Corollary 3 — interpretation of curl and div. If F is a smooth velocity field in the
(x, y) plane, then at any point:
(i) curl F · k is equal to the circulation per unit area, or twice the angular velocity of the
field about a vertical axis through the point,
(ii) div F is the radial outflow rate per unit area at that point.
Proof. (i) Let C be a circle of radius ∆r centred at the point and let R denote its interior. Then
    Average value of curl F · k inside C = ∬_R (curl F · k) dA / (π∆r²).
Now angular velocity is equal to tangential velocity divided by radius, so
    2 × average angular velocity around C = (2/∆r) × average tangential velocity around C
        = (2/∆r) · ∮_C F · dr / (2π∆r)    (since the length of C is 2π∆r)
        = ∮_C F · dr / (π∆r²) = circulation / area.
By Corollary 2(i) the two expressions are equal, and the result follows by letting ∆r → 0, because if curl F is continuous, then, as the circle shrinks down to the point, the averages on the left hand sides tend to the actual values at the point.
(ii) Similarly we have
    Average value of div F inside C = ∬_R div F dA / (π∆r²).
Since F · n is the outward radial component of velocity across C, it follows that ∮_C (F · n) ds is the total outflow across C, and therefore
    Total outflow across C per unit area = ∮_C (F · n) ds / (π∆r²).
By Corollary 2(ii) the two right hand sides are equal, and the result follows by equating the left hand sides and letting ∆r → 0, as before, and using the fact that div F is continuous.
Corollary 4. If a vector field F defined in a plane region R satisfies curl F = 0, then F is conservative. (Note that F must be defined throughout the interior of every simple closed curve lying in R.)
Proof. Since curl F = 0, we have curl F · k = 0, so by Corollary 2 ∮ F · dr = 0 for every closed curve in R. It follows that ∫ F · dr is independent of the path, because if D1 and D2 are two paths with the same initial and final points, then ∫_{D1} F · dr − ∫_{D2} F · dr is an integral around a closed curve (out along D1 and back along D2), so it is equal to zero. Now choose some fixed point r0 in R, and define a scalar field φ by φ(r) = ∫_{r0}^{r} F · dr for every point r in R. (The integral can be taken along any path from r0 to r.)
If we take a path r = r(s), then, using the Fundamental Theorem of Calculus, we have
    dφ/ds = d/ds ∫_{s0}^{s} F · (dr/ds) ds = F · (dr/ds) = F · u,
where u is the unit tangent vector to the path. But by the formula for directional derivatives we also have
    dφ/ds = ∇φ · u.
From these two expressions for dφ/ds we see that F · u = ∇φ · u for all unit vectors u, from which it follows that F = ∇φ. Thus φ is a potential function for F, and so F is conservative.
We have now proved that the following statements about a plane vector field F are all
equivalent:
• F is conservative,
• F has a potential function φ such that F = ∇φ,
• ∫ F · dr depends only on the endpoints of the path of integration,
• ∇ × F = 0.
Tutorial questions — Green’s theorem
25. Verify Green’s theorem in the following cases, by evaluating the integrals on both sides
and showing that they are equal:
(a) f = xy, g = x + y, R is the region bounded by the curves y = x2 and y = x + 2.
(b) f = x + y, g = xy, R is the region in the first quadrant bounded by the curves
y = sin x, y = 1, and x = 0.
26. Evaluate the path integrals ∫_C f dx + g dy below, by using Green’s theorem to rewrite them as double integrals.
(a) f = cos x sin y − xy, g = sin x cos y, C is the circle x² + y² = 1.
(b) f = (1/x)e^y, g = e^y ln x + 2x, C is the boundary of the region enclosed by the curves y = x⁴ + 1 and y = 2.
(c) f = (1/2)y² sin x + x ln y, g = x − y cos x + (1/2)x²y⁻¹, C is the circle x² + (y − 2)² = 1.
27. Use Green’s theorem to find the areas of the regions enclosed by the curves below.
(a) The four cusped hypocycloid r = a(cos³φ, sin³φ) (−π ≤ φ ≤ π).
(b) The loop in the curve r(t) = (t³ − 3t, 10/(t² + 1)). (Hint: first find distinct values t0 and t1 such that r(t0) = r(t1), to find the crossing point.)
28. If F = (xy, x + y), and R is the region enclosed by the curves y = x and y = √x, evaluate ∬_R (curl F · k) dy dx and ∮_C F · dr, and verify that they are equal.
Similarly, evaluate ∬_R div F dy dx and ∮_C F · n ds, and verify that they are equal.
Answers
1. (a) 5π/(3√6) − (1/√2) ln 3,  (b) (i) √2/6, (ii) 1/2, (iii) −1/2.
2. (a) 17/3,  (b) (1/2)√2 + 2 ln(1 + √2),  (c) (1/3)π³ − (1/2)π²,  (d) 0,  (e) 0 (putting r = (a cos t, b sin t)),  (f) −2πab.
3. (a) 12,  (b) −√3,  (c) 23,  (d) (1/2)π².
4. (i) r = (t², cos t), ṙ = (2t, − sin t).
6. 2π. (F is not defined at the origin.)
7. 1/4.
8. (a) 2 ln 2 − 3/4,  (b) 1/7,  (c) −2,  (d) 2/27.
9. Average values are: (a) (2 ln 2 − 3/4)/(1 − ln 2),  (b) 15/49,  (c) −4/π²,  (d) 4/9.
10. (a) e⁸ − 1,  (b) 8.
11. ∫_0^1 ∫_x^{x+4} f(x, y) dy dx + ∫_1^2 ∫_{5x−4}^{x+4} f(x, y) dy dx. Three integrals.
12. (a) 3/10,  (b) (32/15)a³,  (c) π/2,  (d) 16/3.
13. (i) (1/2)(u² − v²)^{−1/2},  (ii) u = (1/2)(x² + y²), v = (1/2)(y² − x²),  (iii) 2xy.
14. The curve is r = r(u, v), where v is constant and u is a parameter, so ∂r/∂u is a tangent. The curve is given implicitly by v = constant, so ∇v is a normal. ∇v · ∂r/∂u = (∂v/∂x)(∂x/∂u) + (∂v/∂y)(∂y/∂u) = ∂v/∂u, which is zero since v is kept constant when differentiating partially with respect to u.
15. |∂r/∂u| = |∂r/∂v| and (∂r/∂u) · (∂r/∂v) = 0, so the sides of the parallelogram are of equal length and at right angles to each other.
16. Boundaries y = (1/3)x, y = 2x, y = (1/2)(−x + 5). Integral = 1 − ln 2.
17. 28/45.
18. Area = (1/2)(r + ∆r)²∆θ − (1/2)r²∆θ = (r + (1/2)∆r)∆r∆θ. Find the Jacobian by dividing through by ∆r∆θ and letting ∆r and ∆θ tend to zero.
20. π/6 − 2/9.
21. (1/4) ln 2.
22. J = (1/4)u^{−1/4}v^{−5/4}. Integral = (4/(3√6))(√3 − √2)(2√2 − 1).
23. (1/(γ − 1))(x1 − x0) ln(y1/y0).
25. (a) 9/4,  (b) 1 − 3π/8.
26. (a) 0,  (b) 16/5,  (c) π.
27. (a) (3/8)πa²,  (b) t0 = −√3, t1 = √3, area = 20(2π − 3√3).
28. 1/10, 1/4.
Algebra Chapter 1
Complex numbers
Real-imaginary form
A disadvantage of the real number system is that it is impossible to take the square root
of a negative number. In the 18th century, mathematicians overcame the problem by
adjoining what they called an imaginary number i such that
i2 = −1.
(Simplify i³, i⁴, and so on, until you see the pattern. Then repeat with i⁻ⁿ, for n = 1, 2, 3, . . . ) In any useful number system it must be possible to perform the usual algebraic
operations, so if you adjoin i to your real number system R, then you must also include
all real multiples iy (with y in R) and all sums x + iy (with x and y in R). These make
up what is called the complex number system C. A general complex number z can
thus be written
z = x + iy
(with x and y in R).
When z is written like this, we say that z is in real-imaginary form, we call x the real
part of z and y the imaginary part of z, and we write
x = Re(z)
and
y = Im(z).
Note that what we call the imaginary part of z is actually a real number: it is the coefficient
of i. The multiple iy is called a pure imaginary number, and x is called a pure real
number. For two complex numbers to be equal, their real parts and their imaginary
parts must be equal. Thus from a complex equation we can obtain two real equations,
by equating real and imaginary parts, but this should be regarded as a last resort, and,
wherever possible, one should do all manipulations in complex form.
The algebraic operations can all be performed in C, using the usual rules and the fact
that i² = −1. Thus if z = x + iy and w = u + iv, then
    z + w = (x + iy) + (u + iv) = (x + u) + i(y + v)
    z − w = (x + iy) − (u + iv) = (x − u) + i(y − v)
    zw = (x + iy)(u + iv) = xu + iyu + ixv + i²yv = (xu − yv) + i(xv + yu).
It follows immediately that
Re(z ± w) = Re(z) ± Re(w) and Im(z ± w) = Im(z) ± Im(w),
i.e., the real or imaginary part of a sum or difference is equal to the sum or difference of
the real or imaginary parts, but the real or imaginary part of a product is not the same
as the product of the real or imaginary parts.
For division we have to multiply top and bottom by the conjugate surd of the denominator, i.e., the complex number obtained by changing the sign of the imaginary part only. Thus we have
    z/w = (x + iy)/(u + iv) = (x + iy)(u − iv)/((u + iv)(u − iv)) = (1/(u² + v²)) ((xu + vy) + i(yu − vx)).
Note that the last denominator u² + v² is non-zero unless both u and v are zero, so division by any non-zero complex number is possible.
Conjugation is so important that we have a special notation for it: if z = x + iy, then we define the complex conjugate z̄ (pronounced z bar) by
    z̄ = x − iy.
It is easy to verify that the conjugate of z ± w is z̄ ± w̄, the conjugate of z × w is z̄ × w̄, and the conjugate of z ÷ w is z̄ ÷ w̄, so conjugation behaves well with all four algebraic operations. From the definition it also follows easily that the conjugate of z̄ is z, i.e., conjugating twice gets you back to the number itself. Note, too, that z̄ = z if and only if z is pure real, and that
    z z̄ = (x + iy)(x − iy) = x² + y² ≥ 0,
so z z̄ is real and non-negative for any complex number z, just as x² is real and non-negative for any real number x.
Algebra Chapter 1
61
However, it is important to remember that there are no restrictions on the square of a
general complex number. It follows that every complex number has a square root (in fact,
two square roots, which are negatives of each other). One method of finding the square
roots of a complex number is as follows. Given a complex number c, we must show that
the equation z 2 = c can always be solved. If z = x + iy and c = a + ib, then by equating
real and imaginary parts we obtain
x2 − y 2 = a
(1)
2xy = b.
(2)
We can now use (2) to substitute for y in (1), which gives x⁴ − ax² − b²/4 = 0, which
is a quadratic equation in x². We solve this and then substitute back in (2) to find y.
Alternatively, by squaring and adding (1) and (2), then taking square roots, we have

    x² + y² = √(a² + b²).    (3)
Now we can add and subtract (1) and (3), and take square roots again, to get x and y, from
which we can write down z. Note that there are only two values of z, not four, because
the signs of x and y have to be chosen in such a way that equation (2) is satisfied.
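The square-root procedure above translates directly into a short routine. The sketch below is a Python illustration (the function name and the use of math.hypot are our own choices, not from the text).

```python
import math

def complex_sqrt(c):
    # Square root of c = a + ib by equations (1)-(3) in the text.
    a, b = c.real, c.imag
    m = math.hypot(a, b)           # equation (3): x^2 + y^2 = sqrt(a^2 + b^2)
    x = math.sqrt((m + a) / 2)     # half the sum of (1) and (3)
    y = math.sqrt((m - a) / 2)     # half the difference of (3) and (1)
    if b < 0:                      # choose signs so that 2xy = b, equation (2)
        y = -y
    return complex(x, y)           # the other square root is its negative

r = complex_sqrt(3 - 4j)
print(r, r * r)                    # r = 2 - i, and r*r recovers 3 - 4i
```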
Tutorial questions — Real-imaginary form
1. Express in real-imaginary form:
    (a) (p + iq)/(p − iq)
    (b) (p + iq)²/(p − iq)² − (p − iq)²/(p + iq)²
    (c) (1 − i)³(√3 − i)/(1 + i√3)²
    (d) 17(1 + 2i)/((1 + 3i)(1 + 4i))
    (e) −√2(1 − i)/(i(√3 + i)(1 + i√3))
    (f) 2/(1 + cos θ + i sin θ).
2. By expressing each side separately in terms of x, y, u, v (where z = x+iy and w = u+iv),
show that:
    (a) Im(zw) ≠ Im(z) Im(w)
    (b) Re(z/w) ≠ Re(z)/Re(w).
3. Prove the following identities for a general complex number z:

    Re(z) = (z + z̄)/2  and  Im(z) = (z − z̄)/(2i).
4. Some people find complex algebra difficult to believe in, but it corresponds to a part of
matrix algebra, as follows.
(i) If z = x + iy and

        Z = ( x   y )
            ( −y  x ),

show that the rows of Z are the real and imaginary
parts of z and iz respectively. If w = u + iv and W is the matrix corresponding to w,
using the same rule, write down W .
(ii) Use matrix algebra to find Z + W , ZW , and W Z. Note that ZW = W Z.
(iii) Use complex algebra to find z + w, zw, and wz. Note that zw = wz.
(iv) Show that, using the rule from part (i), the matrices in part (ii) correspond to the
complex numbers in part (iii).
5. Using the notation of Question 4, show that the matrices ZW⁻¹ and W⁻¹Z are equal,
and both correspond to the complex quotient z/w.
What matrix operations correspond to conjugation? (There are two answers.)
6. Show, by writing z = x + iy and w = u + iv and then simplifying both sides of each
equation, that
    (z ± w)¯ = z̄ ± w̄,    (z × w)¯ = z̄ × w̄,    (z ÷ w)¯ = z̄ ÷ w̄.
7. (a) If z̄ = w − 2z, show, without breaking into real and imaginary parts, that 3z = 2w − w̄.
    (Hint: take conjugates of the given equation, then eliminate z̄.)
    (b) If z = (1 + i)w + (3 − i)w̄, express w in terms of z and z̄. (Take conjugates, and
    then eliminate w̄.)
    (c) If z + w̄ + z̄w = 0, express w in terms of z and z̄.
8. (a) Find the complex square roots of 2i, 3 − 4i, −5 + 12i, 2 + 2i.
(b) Use the quadratic formula and the results of (a) to solve the equations
(i) z 2 + (1 − i)z − i = 0
(iii) iz 2 + (2 − 2i)z − 14 − 5i = 0
(ii) iz 2 + 5z − 1 − 7i = 0
(iv) z 2 − 2(1 + i)z − 2 = 0.
9. (a) Find the three complex cube roots of 8, by solving the equation z³ − 2³ = 0. (Hint:
find one obvious factor by the Remainder Theorem, then divide and solve the resulting
quadratic equation.)
(b) Find the four complex fourth roots of −4. (Hint: find the square roots of the square
roots of −4.)
The complex plane
One of many unforeseen benefits of the complex number system is its geometric structure,
which is related to its algebraic properties in a way similar to that in which geometric
results can be proved by vector algebra. We know that the real numbers lie on a line.
In order to represent the complex numbers geometrically, we need two real co-ordinates,
one for the real parts and one for the imaginary parts. We therefore identify the complex
number z = x + iy with the point (x, y) in the plane (or with its position vector, also
written (x, y)), and we can write
x + iy ←→ (x, y) or sometimes x + iy = (x, y).
This representation is called the complex plane or Argand diagram. Since x + 0i ↔
(x, 0), it follows that pure real numbers lie on the horizontal axis, which is called the real
axis. This shows how the real number line R lies inside the complex number plane C.
Similarly, since iy = 0 +iy ↔ (0, y), pure imaginary numbers lie on the vertical axis, which
is called the imaginary axis.
Addition and subtraction of complex numbers then correspond precisely with addition
and subtraction of vectors in the plane, since
(x + iy) ± (u + iv) = (x ± u) + i(y ± v) while (x, y) ± (u, v) = (x ± u, y ± v).
These can be described geometrically by the parallelogram of vectors. The points 0, z, w,
z + w thus form the vertices of a parallelogram whose diagonals are the vectors z + w and
z − w. Note that the complex number z − w corresponds to the vector from the point w
to the point z.
Multiplication by real numbers corresponds to scalar multiplication of vectors, since
x(u + iv) = (xu) + i(xv) while x(u, v) = (xu, xv),
which always results in a parallel vector. On the other hand, multiplication by a pure
imaginary number leads to a perpendicular vector, since
(iy)(u + iv) = (−yv) + i(yu) and (u, v) · (−yv, yu) = 0.
In general, complex multiplication is a combination of a rotation and a scaling, as we shall
show later. Complex conjugation changes the sign of the imaginary part of a number,
while leaving the real part unchanged, so it is easy to see that it corresponds to reflecting
the number in the real axis. The important result z z̄ = x² + y² shows that z z̄ is the
square of the distance of the point z from the origin.
An equation z = z(t) (where t is real) represents the parametric equations of a curve in
the plane, since by equating real and imaginary parts we can obtain x = x(t) and y = y(t).
Similarly, an expression f (z) = constant can be an implicit representation of a curve in
the plane.
Tutorial questions — The complex plane
10. Sketch all points z in the complex plane satisfying:
    (a) Re(z) = 2        (b) Im(z) = −1             (c) Re(z) ≥ 0
    (d) Im(z) > 1        (e) e⁻¹ < Re(z) < e        (f) −π < Im(z) ≤ π.
11. Draw the triangle with vertices 1, 2, and 2 + 2i in the complex plane. Then draw the
triangles obtained by
(i) adding 1 − i to each vertex,
(ii) multiplying each vertex by 1 − i,
(iii) conjugating each vertex.
Note the geometrical effect of each operation.
12. Sketch in the complex plane the three cube roots of 8 and the four fourth roots of −4.
(See Question 9.)
13. Show that the average of two complex numbers is the midpoint of the line segment
joining them. By considering the parallelogram with vertices 0, z, w, and z + w, show
that the diagonals of a parallelogram bisect each other. (Hint: find the midpoint of
each diagonal.)
14. Show that the dot product of the position vectors of two complex numbers z and w is
    equal to Re(z̄w) or Re(zw̄). (Hint: write z = x + iy and w = u + iv.) Deduce that if
    z/w is pure imaginary, then the position vectors of z and w are perpendicular to each
    other.
15. Describe geometrically the effect of the following operations on a general complex number z, and write down the expression obtained after the operation.
(a) conjugation
(b) multiplication by i
(c) multiplication by −1
(d) adding 2 − i
(e) conjugation then multiplication by i
(f) multiplication by i followed by conjugation.
Why are (e) and (f) different?
16. (a) A curve in the complex plane has parametric equation z = t + it2 . Equate real and
imaginary parts, eliminate the parameter, and identify the curve.
(b) Another curve has implicit equation Re(z²) = 1. Write the implicit equation in
terms of x and y, and identify the curve.
17. Sketch the curve Re((2 + i)z) = −1. Show that a general straight line can be written in
the form Re(αz) = c, where α is a complex constant and c is a real constant. (Hint: write
α = a + ib.)
Modulus-argument form
We have shown that real-imaginary form (z = x+iy) corresponds to Cartesian co-ordinates
for a point in the plane, since x + iy corresponds to the point (x, y). If we now change to
polar co-ordinates (r, θ), where we know that x = r cos θ and y = r sin θ, we can substitute
for x and y to give
z = r(cos θ + i sin θ).
We say z is written in modulus-argument form; r is called the modulus of z, written
r = |z|, and θ is called the argument of z, written θ = arg(z). The argument of a
complex number is not unique, since adding any whole number of 2πs to the angle does
not change the point represented. The principal value of the argument is the value
satisfying −π < θ ≤ π.
From polar co-ordinates we also know that r = √(x² + y²) and tan θ = y/x. Thus we have

    |x + iy| = √(x² + y²)  and  tan arg(x + iy) = y/x if x ≠ 0.
Since |z| = r, the modulus of z is the distance between the point z and the origin 0, and
similarly by the distance formula |a − b| is the distance between the points a and b. A very
important consequence is that
    z z̄ = |z|².

This formula again shows that z z̄ is real and non-negative (like the square of a real number),
although, as we remarked previously, z² can take on any value. When real formulae are
generalized to complex equivalents, x² must often be replaced by z z̄ (not simply z²) for
the formulae to remain true.
Modulus-argument form is particularly convenient for multiplication, since if we have
z = r(cos θ + i sin θ) and w = s(cos φ + i sin φ), then
    zw = rs((cos θ cos φ − sin θ sin φ) + i(sin θ cos φ + cos θ sin φ)) = rs(cos(θ + φ) + i sin(θ + φ)),
using compound angle formulae. Thus to multiply two numbers we multiply their moduli
and add their arguments. For complex division, since we multiply top and bottom by the
conjugate of the denominator, it is easy to see that we divide one modulus by the other
and subtract one argument from the other.
From the modulus-argument form of the product zw above it follows that

    |zw| = |z||w|  and  |z/w| = |z|/|w|,
    arg(zw) = arg(z) + arg(w)  and  arg(z/w) = arg(z) − arg(w),
although the arguments may not be principal values. The laws for arguments are like the
laws of logarithms, and we shall see the reason for this in a later section.
If several complex numbers are multiplied by a fixed complex number z, then their
moduli are all multiplied by |z| and their arguments increased by arg(z). This means
geometrically that the position vectors of the points are all multiplied by a scaling factor
|z| (which means an enlargement if |z| > 1 and a reduction if |z| < 1), and rotated through
an angle arg(z).
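This geometric description is easy to test numerically. The sketch below is a Python illustration (not part of the text); cmath.phase returns the principal argument, and the chosen points avoid any 2π wrap-around.

```python
import cmath

z = 1 + 1j                         # scaling factor sqrt(2), rotation pi/4
points = [2 + 0j, 1 + 1j, -1 + 2j]

for p in points:
    q = z * p                      # multiply each point by the fixed z
    print(abs(q) / abs(p),                      # always |z| = sqrt(2)
          cmath.phase(q) - cmath.phase(p))      # always arg(z) = pi/4
```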
There is no exact formula for the modulus of a sum; all we can say is that
|z + w| ≤ |z| + |w|.
This is called the Triangle Inequality, since the length of any one side of a triangle cannot
exceed the sum of the lengths of the other two sides. The triangle with vertices 0, z, and
z + w has sides of lengths |z + w|, |z|, and |w|.
Tutorial questions — Modulus-argument form
18. If z = 1 − i and w = −1 + i√3, express w and z in modulus-argument form (principal
values).
19. Show that if z is in modulus-argument form, then the corresponding matrix Z (see
Question 4) is |z| times a rotation matrix.
20. Prove by induction on n that (cos θ + i sin θ)^n = cos nθ + i sin nθ. (This is called De
Moivre’s theorem.)
21. Obtain expressions for |z̄| and arg(z̄) in terms of |z| and arg(z).
    Prove that z⁻¹ = z̄/|z|².
22. Sketch all points z in the complex plane satisfying:
    (a) |z| = 3          (b) |z − 1| = 2        (c) |z − 2 + i| ≤ 1
    (d) 1 < |z| ≤ 2      (e) |2z + 3| > 4       (f) |z| < 1 and 0 ≤ arg(z) ≤ π/2.
23. Explain geometrically the difference, if any, between the statements z = w and |z| = |w|.
What is the meaning, if any, of the statements z < w and |z| < |w|?
24. Prove that |a + b|² + |a − b|² = 2(|a|² + |b|²). (Hint: |z|² = z z̄.) Re-state this result as
a geometrical theorem about parallelograms.
25. Prove that a − ā is pure imaginary for any complex number a. Hence show that if
    |z| = |w|, then (z + w)/(z − w) is pure imaginary. (Hint: make the denominator real.) Deduce
    that the diagonals of a rhombus (i.e. a parallelogram with sides of equal length) intersect
    at right angles.
26. (i) Use the fact that (modulus)2 = (real part)2 +(imaginary part)2 to identify the curves
|z − i| = |z − 2 + 3i| and |z + 1| = 3|z − 2|.
    (ii) Show that if α and β are fixed complex numbers and k is a positive real constant,
    then the equation |z − α| = k|z − β| represents a line if k = 1 and a circle if k ≠ 1.
    (iii) Identify the polar curves 2π|z| = arg(z), and Re(z) = |z|² − |z|.
27. (a) If α, β, and z are complex numbers, show, using Figure 1.1, that the angle subtended
    at z by the line from α to β is equal in magnitude to arg((z − α)/(z − β)).

Figure 1.1. Angle subtended at a point
    (b) Find a cartesian equation for the curve arg((z − 1)/(z + 1)) = c (constant), and hence describe
    the curve. (Hint: tan(argument) = (imaginary part)/(real part).) *Which theorem in
Euclidean geometry does this illustrate?
28. If z = r(cos θ + i sin θ) and w = s(cos φ + i sin φ), prove that

    z/w = (r/s)(cos(θ − φ) + i sin(θ − φ)).
29. If z and w are as in Question 18, find w³/z² first by using real-imaginary form and then
    by using modulus-argument form.
30. Prove the triangle inequality |z + w| ≤ |z| + |w| algebraically as follows: square the left
    hand side, and then use the results |a|² = aā, a + ā = 2 Re(a), and Re(a) ≤ |a|.
31. If |z| < 1, prove that 1/|1 − z| ≤ 1/(1 − |z|). (Hint: write 1 = z + (1 − z), take moduli, and
    use the triangle inequality.)
Euler’s formula
The equation z(θ) = cos θ +i sin θ (with θ real), which is a parametric equation for the unit
circle, has many properties reminiscent of exponential functions. We have shown above
that z(θ)z(φ) = z(θ + φ), like the law of exponents. Furthermore,
    dz(θ)/dθ = −sin θ + i cos θ = i(i sin θ + cos θ) = iz(θ).
This differential equation has solution z(θ) = Ae^{iθ} (assuming that the usual rules of
calculus still hold), and we can show that A = 1 by observing that z(0) = cos 0 + i sin 0 = 1.
These results led Euler to the remarkable formula
    e^{iθ} = cos θ + i sin θ.
Final confirmation of this result can be obtained by considering the Maclaurin series of
each side. Important special cases are

    e^{iπ/2} = i,    e^{iπ} = −1,    e^{2iπ} = 1.
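These special cases can be confirmed with Python's cmath module (an illustration, not part of the text); the tiny imaginary parts left over are floating-point rounding.

```python
import cmath, math

for theta in (math.pi / 2, math.pi, 2 * math.pi):
    lhs = cmath.exp(1j * theta)                       # e^{i*theta}
    rhs = complex(math.cos(theta), math.sin(theta))   # cos(theta) + i*sin(theta)
    print(theta, lhs, abs(lhs - rhs))                 # the difference is ~1e-16
```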
With the aid of Euler’s formula a general complex number can be written

    x + iy = re^{iθ},

which links real-imaginary and modulus-argument forms, i.e., cartesian and polar co-ordinates. This exponential version of modulus-argument form is preferable, since multiplication and division now obey the usual rules

    re^{iθ} × se^{iφ} = rs e^{i(θ+φ)}  and  re^{iθ} ÷ se^{iφ} = (r/s) e^{i(θ−φ)}.
By taking conjugates of Euler’s formula (or replacing θ by −θ) we obtain e^{−iθ} = cos θ −
i sin θ, and by adding and subtracting the expressions for e^{±iθ} we have

    cos θ = (e^{iθ} + e^{−iθ})/2  and  sin θ = (e^{iθ} − e^{−iθ})/(2i).
This explains the similar properties of circular and hyperbolic functions, since in first year
we defined

    cosh θ = (e^θ + e^{−θ})/2  and  sinh θ = (e^θ − e^{−θ})/2.
The above formulae enable various identities concerning real functions to be proved much
more simply. For example, to express cos nθ and/or sin nθ in terms of powers of cos θ or
sin θ we may use the Binomial Theorem (writing C(n, r) for the binomial coefficient) to write

    e^{niθ} = (e^{iθ})^n = (cos θ + i sin θ)^n = Σ_{r=0}^{n} C(n, r) cos^{n−r}θ · i^r sin^r θ,
and then simplify and equate real or imaginary parts. For the reverse process, we have for
example that

    cos^n θ = (½(e^{iθ} + e^{−iθ}))^n = 2^{−n} Σ_{r=0}^{n} C(n, r) e^{(n−r)iθ} e^{−riθ},
which can also be simplified. Other identities are obtained by substituting z = reiθ in
familiar series expressions (geometric, binomial, or Maclaurin) derived in first year, and
then equating real or imaginary parts. These identities, expressing functions in terms of
cosines or sines of multiple angles, are called Fourier series, and we shall meet them
again in Algebra Chapter 4.
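As a numerical sanity check on this technique, the sketch below (Python; an illustration with an arbitrarily chosen angle, not part of the text) compares cos 4θ with the real part of (cos θ + i sin θ)⁴ expanded by the Binomial Theorem.

```python
import math

theta = 0.7                                   # any test angle will do
c, s = math.cos(theta), math.sin(theta)

# Real part of (c + is)^4 = c^4 + 4c^3(is) + 6c^2(is)^2 + 4c(is)^3 + (is)^4:
real_part = c**4 - 6 * c**2 * s**2 + s**4

print(real_part, math.cos(4 * theta))         # the two values agree
```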
Tutorial questions — Euler’s formula
32. Express in the form re^{iθ}:  sin α − i cos α,    e^{2π+i},    e^{iπ/6} + e^{iπ/3}.
33. Put z = iθ in the Maclaurin series expansion e^z = Σ_{n=0}^{∞} z^n/n!. Split the right hand side
    into sums over even and odd integers, and note that if n = 2k (even), then i^n = (−1)^k,
    while if n = 2k + 1 (odd), then i^n = (−1)^k i. Hence verify Euler’s formula. (Hint: use
    the Maclaurin series for sin θ and cos θ.)
34. Express cos 4θ and sin 5θ in terms of powers of cos θ and sin θ respectively, and express
    sin⁵θ and cos⁶θ in terms of multiple angles.
35. Find the sum to n terms of the series below. (They all arise from geometric series.)
    (a) sin θ + sin 3θ + sin 5θ + · · ·
    (b) cos θ + (1/3) cos 2θ + (1/9) cos 3θ + · · ·
    (c) sin α + x sin(α + β) + x² sin(α + 2β) + · · ·
    (d) cosh 1 + cosh 2 + cosh 3 + · · · .
36. Find the sum to infinity of the Fourier series:
    (a) cos θ + (1/3) cos 2θ + (1/9) cos 3θ + · · · (θ real)
    (b) sin α + x sin(α + β) + x² sin(α + 2β) + · · · (α and β real, and |x| < 1)
    (c) 1 + cos θ + (1/2!) cos 2θ + (1/3!) cos 3θ + · · · . (Hint: this is the real part of a
        familiar Maclaurin series.)
Roots and polynomials
We showed earlier that any non-zero complex number has two square roots. We can now
show that it has n n-th roots for every natural number n. If a = re^{iθ}, then a = re^{i(θ+2kπ)},
so

    a^{1/n} = (re^{i(θ+2kπ)})^{1/n} = r^{1/n} e^{iθ/n} e^{2kiπ/n}.

Here k can be any integer, but increasing k by n increases the argument of a^{1/n} by 2π, and
therefore defines the same value of the n-th root. The n different roots can thus be found
by taking any n successive values of k; the roots all lie on the circle of radius |a|^{1/n}, and
are equally spaced like spokes on a wheel, since each one is obtained from the previous one
by multiplying by e^{2iπ/n}, i.e., by rotating through 1/n of a revolution.
From the above it follows that the equation z^n = a has n solutions, i.e., that the
polynomial z^n − a factorizes into n linear factors, since by the Remainder Theorem every
solution z = α of a polynomial equation p(z) = 0 corresponds to a linear factor (z − α)
of the polynomial p(z). Gauss proved the Fundamental Theorem of Algebra, which states
that if complex coefficients are allowed, then every polynomial of degree n factorizes into
n linear factors. Assuming this result, we can now prove the following theorem, which was
used in finding the null space of a D-operator.
Theorem. If a polynomial p(z) has real coefficients, then the non-real solutions of p(z) =
0 (if any) occur in conjugate pairs. In other words, if p(α) = 0 then p(ᾱ) = 0 also.
Proof. Suppose p(z) = Σ_{r=0}^{n} a_r z^r, where the coefficients a_r are all real, so ā_r = a_r. Then

    0 = 0̄ = (p(α))¯ = (a_n α^n + · · · + a_1 α + a_0)¯
      = ā_n (ᾱ)^n + · · · + ā_1 ᾱ + ā_0
      = a_n (ᾱ)^n + · · · + a_1 ᾱ + a_0 = p(ᾱ),
as required.
It follows, if α is not real, that p(z) is divisible by the product

    (z − α)(z − ᾱ) = z² − 2z Re(α) + |α|²,
which is a quadratic with real coefficients and negative discriminant. This explains why
every polynomial with real coefficients factorizes into real linear or quadratic factors, as
we used in first year partial fractions. Partial fractions can sometimes be simplified by
using complex numbers, since then all factors are linear.
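The n-th roots described above can be generated directly. The following sketch is a Python illustration (the function nth_roots is our own name, not from the text); it lists the n roots of a and confirms that each satisfies z^n = a.

```python
import cmath, math

def nth_roots(a, n):
    # The n n-th roots r^{1/n} e^{i(theta + 2k*pi)/n}, k = 0, 1, ..., n-1.
    r = abs(a) ** (1.0 / n)
    theta = cmath.phase(a)
    return [r * cmath.exp(1j * (theta + 2 * math.pi * k) / n) for k in range(n)]

for z in nth_roots(8, 3):          # the three cube roots of 8 (Question 9)
    print(z, z**3)                 # each root cubes back to 8, up to rounding
```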
Tutorial questions — Roots and polynomials
37. Find in modulus-argument form and plot on the complex plane all the n-th roots of a
for the following values of n and a:
(a) n = 3 and a = 1
(b) n = 4 and a = 1 + i
(c) n = 3 and a = −2 + 2i
(d) n = 5 and a = i.
38. Find the cube roots of 8 and the fourth roots of −4. Compare with your answers to
Question 9.
39. If w = (3z + 2)/(2z + 1), show that z = (−w + 2)/(2w − 3). Hence solve the equation
    (3z + 2)³ = −27(2z + 1)³ by first expressing it in terms of w.
40. If z is an n-th root of 1 but z ≠ 1, show that 1 + z + z² + · · · + z^{n−1} = 0. (Hint: multiply
    through by z − 1.) Solve the equation with n = 5, and hence factorize 1 + z + z² + z³ + z⁴
    into linear factors with complex coefficients, and then into quadratic factors with real
    coefficients. By equating coefficients of z, prove that cos(2π/5) + cos(4π/5) = −1/2.
41. By first putting w = z³, solve the equation z⁶ + 4z³ + 8 = 0. Indicate all six roots
    in the complex plane. Hence factorize z⁶ + 4z³ + 8 into quadratic factors with real
    coefficients.
42. Factorize the polynomials below into linear factors, given that z + 1 + i is a factor of
each of them:
    (a) z³ + (−3 + i)z² + (1 − 4i)z + 5 + 5i
    (b) z⁴ + 4z³ + 11z² + 14z + 10.
43. Find complex partial fractions for the following rational functions:
    (a) 4/(z⁴ − 1)        (b) 16/(z⁴ + 4).
Complex exponentials, logarithms, and powers
Complex exponentials are defined using Euler’s formula:

    e^z = e^{x+iy} = e^x e^{iy} = e^x cos y + i e^x sin y.

Complex circular and hyperbolic functions can then be obtained from the familiar expressions in terms of e^{±iz} and e^{±z}.
Complex logarithms and powers are similarly defined by means of Euler’s formula, but
are in general many-valued, because of the fact that the argument is not unique. Since
the argument can be increased by any number of complete revolutions, the most general
modulus-argument form is z = re^{i(θ+2kπ)}. This immediately gives

    ln z = ln r + i(θ + 2kπ) = ln |z| + i(arg(z) + 2kπ),
so the infinitely many values of ln z all have the same real part, and therefore lie equally
spaced on a vertical line. The principal value of ln z is the value arising from the principal
value of the argument. Finally, the values of the complex power z^w are given by

    z^w = e^{w ln z} = e^{w(ln |z| + i arg(z) + 2kπi)} = e^{w(ln |z| + i arg(z))} e^{(2πiw)k}.

The power z^w has a unique value if e^{2πiw} = 1, i.e., if w is an integer. In general, the many
values lie on a logarithmic spiral of the form αe^{βt} with α = e^{w(ln |z| + i arg(z))} and β = 2πiw.
The spiral closes down to a circle if |e^{2πiw}| = 1, and opens up to a straight line if e^{2πiw}
is pure real. The principal value of z^w is the value arising from the principal value of the
argument.
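A few of the infinitely many values can be listed explicitly. The sketch below (Python; an illustration, not part of the text) evaluates values of ln i and the corresponding values of i^i, which are all real; cmath.log returns only the principal value, so the loop supplies the 2kπi terms.

```python
import cmath, math

z, w = 1j, 1j                           # compute values of ln i and of i^i
for k in range(-1, 2):
    log_zk = cmath.log(z) + 2j * math.pi * k      # ln|z| + i(arg z + 2k*pi)
    print(k, log_zk, cmath.exp(w * log_zk))       # e^{w ln z} = e^{-pi(1+4k)/2}
```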
Tutorial questions — Complex exponentials, logarithms, and
powers
44. Prove from the definitions that:
    (a) cosh²z − sinh²z = 1
    (b) cosh(z + w) = cosh z cosh w + sinh z sinh w
    (c) sinh²z = ½(cosh 2z − 1)
    (d) cosh²z = ½(cosh 2z + 1)
    (e) (cosh z + sinh z)^n = cosh nz + sinh nz
    (f) sinh(z + 2kπi) = sinh z
    (g) tanh z = (e^{2z} − 1)/(e^{2z} + 1)
    (h) sech²z + tanh²z = 1.
45. Evaluate:
    (a) cosh(2 ln(2 + √3))    (b) tanh(½π√2 e^{iπ/4})
    (c) sin(2π/3 + i ln 5)    (d) tan(π + ½ i ln 2).
46. If cos(x + iy) = u + iv, where x, y, u, and v are real, express u and v as functions of x
    and y. Hence show that |cos(x + iy)|² = cos²x + sinh²y = −sin²x + cosh²y.
    Similarly, if z = x + iy, write sin z in real-imaginary form, and show that sin z ≠
    Im(e^{iz}).
47. Evaluate (in general) and plot in the complex plane, indicating which is the principal
    value:
    (a) ln i             (b) ln(1 + i)          (c) ln(−3 + i√3)       (d) ln 1
    (e) i^i              (f) i^{(1/3)+i}        (g) (1 + i)^{(1/4)−i}  (h) tan((1/(2i)) ln((1 + i)/(1 − i))).
48. Show that |1 + z| = 1 + 2r cos θ + r 2 and arg(1 + z) = arctan r sin θ/(1 + r cos θ) if
P∞
n
z = reiθ . By substituting in the Maclaurin series ln(1+z) = − n=1 (−1)n zn (assuming
r < 1) and equating real and imaginary parts, show that
(e) ii
1
2
2
ln(1 + 2r cos θ + r ) = −
∞
X
n=1
(−r)n
cos nθ
n
∞
X
sin nθ
r sin θ =−
(−r)n
.
arctan
1 + r cos θ
n
n=1
and
By letting r → 1, and using half angle formulae, show that

    ln(2|cos ½θ|) = Σ_{n=1}^{∞} (−1)^{n+1} cos nθ/n

    and

    arctan(tan ½θ) = Σ_{n=1}^{∞} (−1)^{n+1} sin nθ/n.

Sketch the graph of the function arctan(tan ½θ) for −2π ≤ θ ≤ 2π. (It is called a
sawtooth wave, and the right hand side is its Fourier series representation.)
Answers
1. (a) (p² − q²)/(p² + q²) + i·2pq/(p² + q²),   (b) 8ipq(p² − q²)/(p² + q²)²,
   (c) ½(√3 − 1) + ½(√3 + 1)i,   (d) (1/10)(3 − 29i),   (e) (1/(2√2))(1 − i),
   (f) 1 − i sin θ/(1 + cos θ) = 1 − i tan ½θ.
4. (i) W = ( u   v )
           ( −v  u ).
5. Conjugation corresponds to transposing or forming the adjoint.
7. (b) w = (1/8)(−1 + i)z + (1/8)(3 − i)z̄,   (c) w = (z̄ − z²)/(z z̄ − 1).
8. (a) ±(1 + i); ±(2 − i); ±(2 + 3i); ±(√(1 + √2) + i√(−1 + √2)).
   (b) (i) −1, i;  (ii) 1 + 2i, −1 + 3i;  (iii) 4 − i, −2 + 3i;
   (iv) 1 + i ± (√(1 + √2) + i√(−1 + √2)).
9. (a) 2, −1 ± i√3;  z³ − 2³ = (z − 2)(z² + 2z + 4).   (b) √(±2i) = ±1 ± i.
10. See sketches overleaf.
12. Spaced at 120° apart on a circle of radius 2, and at 90° on a circle of radius √2.
13. Average = ½(x + u) + ½i(y + v); midpoint = (½(x + u), ½(y + v)).  ½(0 + (z + w)) = ½(z + w).
14. Re(z̄w) = Re(zw̄) = xu + yv = (x, y) · (u, v).  z/w = (zw̄)/(u² + v²).
15. (a) Reflection in the real axis; z̄.   (b) Rotation through π/2; iz.   (c) Rotation through
    π; −z.   (d) Shift 2 units right and 1 unit down; z + 2 − i.   (e) Reflection in the line
    y = x; iz̄.   (f) Reflection in the line y = −x; (iz)¯ = −iz̄.
Figure 1.2. Sketches for Question 10.
16. (a) y = x², parabola.  (b) z² = x² − y² + 2ixy, so Re(z²) = x² − y² = 1, hyperbola.
17. Line y = 2x + 1. For y = mx + c let α = −m − i. For x = c (vertical line) let α = 1.
18. z = √2(cos(−π/4) + i sin(−π/4)),  w = 2(cos(2π/3) + i sin(2π/3)).
19. Z = r ( cos θ    sin θ )
          ( −sin θ   cos θ ).
21. |z̄| = |z|, arg(z̄) = −arg(z).
22. See sketches.
Figure 1.3. Sketches for Question 22.
23. z = w means z and w are the same point, |z| = |w| means z and w are the same
distance from the origin. z < w is meaningless, |z| < |w| means z is nearer to the origin
than w.
24. The sum of the squares on the diagonals equals twice the sum of the squares on two
adjacent sides.
25. a − ā = 2i Im(a).
    (z + w)/(z − w) = |z − w|⁻²(z̄w − zw̄) when |z| = |w|.
    If the vertices are at the points 0, z, z + w, w, then the diagonal vectors of the rhombus
    are z + w and z − w, which are perpendicular because of Question 14.
26. (i) x² + (y − 1)² = (x − 2)² + (y + 3)², giving y = ½(x − 3) (line).
    (x + 1)² + y² = 9((x − 2)² + y²), giving x² + y² − (19/4)x + 35/8 = 0. Complete the square to
    get a circle with centre at (19/8, 0) and radius 9/8.
    (ii) If α = a_1 + ia_2 and β = b_1 + ib_2, then (x − a_1)² + (y − a_2)² = k²((x − b_1)² + (y − b_2)²).
    The coefficient of (x² + y²) vanishes if k = 1.
    (iii) |z| = r and arg z = θ, so r = θ/(2π) (Archimedean spiral) and r = 1 + cos θ (cardioid).
27. (a) Angle = arg(z − α) − arg(z − β) = arg((z − α)/(z − β)).
    (b) 2y/(x² + y² − 1) = tan c, giving x² + y² − 2y cot c − 1 = 0, a circle with centre at i cot c on the
    imaginary axis and passing through the points ±1. *The angle subtended by a chord of a circle is the
    same at any point on the arc.
29. 4i.
32. (a) e^{i(α−π/2)},   (b) e^{2π} e^{i},   (c) ((1 + √3)/√2) e^{iπ/4}.
34. cos 4θ = 8cos⁴θ − 8cos²θ + 1,   sin 5θ = 16sin⁵θ − 20sin³θ + 5 sin θ,
    sin⁵θ = (1/16)(sin 5θ − 5 sin 3θ + 10 sin θ),   cos⁶θ = (1/32)(cos 6θ + 6 cos 4θ + 15 cos 2θ + 10).
35. (a) (2 sin θ − sin(2n+1)θ + sin(2n−1)θ)/(2(1 − cos 2θ)) = sin²nθ/sin θ,
    (b) 3(3 cos θ − 1 − 3^{1−n} cos(n+1)θ + 3^{−n} cos nθ)/(2(5 − 3 cos θ)),
    (c) (x^{n+1} sin(α + (n−1)β) − x^n sin(α + nβ) − x sin(α − β) + sin α)/(x² − 2x cos β + 1),
    (d) (e^{n+1} − e^{−n} − e + 1)/(2(e − 1)).
36. (a) 3(3 cos θ − 1)/(2(5 − 3 cos θ)),   (b) (−x sin(α − β) + sin α)/(x² − 2x cos β + 1),
    (c) Re(e^{cos θ + i sin θ}) = e^{cos θ} cos(sin θ).
37. (a) 1, e^{±2iπ/3}, or 1, ½(−1 ± i√3);   (b) ±2^{1/8} e^{iπ/16}, ±2^{1/8} e^{9iπ/16};
    (c) √2 e^{iπ/4}, √2 e^{11iπ/12}, √2 e^{−5iπ/12};   (d) e^{iπ/10}, i, e^{9iπ/10}, e^{−3iπ/10}, e^{−7iπ/10}.
38. 2, 2e^{±2iπ/3} and √2 e^{±iπ/4}, √2 e^{±3iπ/4}.
39. −5/9 or −1/2 ± i/(6√3).
40. z = e^{±2iπ/5} or z = e^{±4iπ/5};
    (z² − 2z cos(2π/5) + 1)(z² − 2z cos(4π/5) + 1).
41. z = √2 e^{±iπ/4} = 1 ± i, or z = √2 e^{±5iπ/12}, or z = √2 e^{±11iπ/12};
    z⁶ + 4z³ + 8 = (z² − 2z + 2)(z² − 2√2 z cos(5π/12) + 2)(z² − 2√2 z cos(11π/12) + 2).
42. (a) (z + 1 + i)(z − 2 + i)(z − 2 − i), (b) (z + 1 + i)(z + 1 − i)(z + 1 + 2i)(z + 1 − 2i).
43. (a) 1/(z − 1) − 1/(z + 1) + i/(z − i) − i/(z + i),
    (b) (1 + i)/(z + 1 + i) + (1 − i)/(z + 1 − i) + (−1 + i)/(z − 1 + i) + (−1 − i)/(z − 1 − i).
45. (a) 7,   (b) (e^π + 1)/(e^π − 1),   (c) (1/10)(13√3 − 12i),   (d) i/3.
46. u = cos x cosh y, v = −sin x sinh y.  sin z = sin x cosh y + i cos x sinh y.  Im(e^{iz}) = e^{−y} sin x.
47. (a) i(π/2 + 2kπ),   (b) ½ ln 2 + i(π/4 + 2kπ),   (c) ½ ln 12 + i(5π/6 + 2kπ),   (d) 2kπi,
    (e) e^{−π(1+4k)/2},   (f) e^{−π(3−i)/6} (e^{2π(−1+i/3)})^k,
    (g) 2^{1/8} e^{π/4} e^{i(π−8 ln 2)/16} (i e^{2π})^k,   (h) tan(π/4 + kπ) = 1.
Algebra Chapter 2
Convergence of series
Indeterminate forms
Before dealing with convergence itself, we need some techniques for dealing with indeterminate forms, i.e., limits of the form 0/0. In first year we proved
L’Hôpital’s Rule. If lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, and if f(x) and g(x) are sufficiently
smooth near x = a, then

    lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x),

provided that the limit on the right hand side exists.
Proof. The functions f(x) and g(x) can be represented near x = a as Taylor series with
constant term 0 = f(a) = g(a), so

    f(x)/g(x) = (0 + (x − a)f′(a) + ½(x − a)²f″(a) + · · ·)/(0 + (x − a)g′(a) + ½(x − a)²g″(a) + · · ·)
              = (f′(a) + ½(x − a)f″(a) + · · ·)/(g′(a) + ½(x − a)g″(a) + · · ·),

by cancelling (x − a). The result follows by letting x → a.
If the limit on the right hand side of L’Hôpital’s Rule is again an indeterminate form,
then the process of differentiating numerator and denominator can be repeated. L’Hôpital’s
Rule can also be used to evaluate limits of the form 1^∞, by taking logarithms beforehand,
and exponentials afterward.
L’Hôpital’s Rule is also valid for limits of the form ∞/∞. For let F(x) = 1/f(x) and
G(x) = 1/g(x). Then f(x)/g(x) = G(x)/F(x), to which L’Hôpital’s Rule can be applied,
since it is of the form 0/0. Thus

    lim_{x→a} f(x)/g(x) = lim_{x→a} G(x)/F(x) = lim_{x→a} G′(x)/F′(x) = lim_{x→a} (g(x)⁻² g′(x))/(f(x)⁻² f′(x))
                        = (lim_{x→a} f(x)/g(x))² · lim_{x→a} g′(x)/f′(x),

from which the result follows.
Indeterminate forms f(x)/g(x) as x → ∞ can also be determined by L’Hôpital’s Rule,
by substituting x = 1/h and letting h → 0, provided f(1/h) and g(1/h) are smooth at and near
h = 0. The proof appears as a tutorial question.
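The rule is easy to check numerically by evaluating the quotient at points approaching the limit. The sketch below (Python, an illustration, not part of the text) does this for lim_{x→0} (2 ln(1 + x) + x² − 2x)/x³; three applications of L’Hôpital’s Rule give the value 2/3.

```python
import math

f = lambda x: 2 * math.log(1 + x) + x**2 - 2 * x
g = lambda x: x**3

for x in (0.1, 0.01, 0.001):
    print(x, f(x) / g(x))          # approaches 2/3 as x -> 0
```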
Tutorial questions — Indeterminate forms
1. Evaluate the following limits:
    (a) lim_{x→0} (2 sin x − tan x)/(e^x − 1)
    (b) lim_{x→0} (sin x − tan x)/(ln(1 + x) − x)
    (c) lim_{x→0} (e^{5x} − 2x)^{1/x}
    (d) lim_{x→0} (2 ln(1 + x) + x² − 2x)/x³
    (e) lim_{x→π/2} (sec x − tan x)
    (f) lim_{x→1} (1 + 2 ln x)^{1/(x−1)}
    (g) lim_{x→0} x^x
    (h) lim_{x→π} (sin x)/(1 − 2 sin(x/6))
    (i) lim_{x→π/2} (cosec x)/(1 + cot x)
    (j) lim_{x→0} (cosec x)/(1 + cot x)
    (k) lim_{x→e} ln(ln x)/(ln x − 1)
    (l) lim_{x→1} ln(ln x) sin πx
    (m) lim_{x→0} (arctan x − x)/x³
    (n) lim_{z→i} (z − i)/(e^{πz} + 1).
2. By first taking logarithms, evaluate the limits of the following expressions as n → ∞:
    (a) n^{1/n}    (b) n^{1/√n}    (c) n^{1/ln n}    (d) n^{1/√(ln n)}.
3. Prove that L’Hôpital’s Rule can (under suitable conditions) be used for indeterminate
    forms f(x)/g(x) as x → ∞ as follows. Let h = 1/x, and let F(h) = f(x) and G(h) = g(x).
    Show that F′(h) = −x² f′(x). (Hint: put y = f(x) = F(h) and use the fact that
    dy/dx = (dy/dh)(dh/dx).) Now apply L’Hôpital’s Rule to F(h)/G(h).
4. Prove that exponentials beat powers, i.e., prove that lim_{x→∞} e^{kx}/x^m = ∞ for positive
    k and m, however small k may be and however large m may be. (Hint: apply L’Hôpital’s
    Rule n times, where n is an integer greater than m.)
    Deduce that powers beat logarithms by substituting y = e^x.
    Prove that lim_{n→∞} (1 + a/n)^{bn} = e^{ab}. (Hint: take logarithms.)
Convergence of series I
Convergence and divergence of series have been seen before: a series Σ a_n is said to
converge to the number S if the partial sums a_1 + a_2 + · · · + a_N tend to S as N → ∞.
Roughly speaking, this means that the more terms you add on, the closer you get to
the number S, which is sometimes called the sum to infinity. A series that does not
converge is said to diverge. We shall describe some tests to determine whether or not
a series converges; they are important because not many sums to infinity can be found
exactly. We often use a computer to approximate a sum to infinity by the partial sum for
say N = 100 or 1000. This is called truncating the series, but it is meaningless if the
series does not converge. Even if the series converges, it is valuable to know how fast it
converges, so that we can estimate the truncation error, which is the difference between
Algebra Chapter 2
77
the sum to infinity and the partial sum. Note that in testing for convergence it is only
the infinite tail of the series that matters: we can ignore any number of terms at the
beginning, because a finite sum must have a finite value, and therefore cannot affect the
overall convergence or divergence, though it obviously does affect the sum to infinity, if it
exists.
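The notions of partial sum and truncation error can be seen in a tiny computation. The sketch below (our own illustration, not part of the text) uses the geometric series Σ 1/2^n, whose sum to infinity is exactly 1, so the truncation error of each partial sum is computable directly:

```python
# Partial sums S_N = a_1 + ... + a_N of the series sum 1/2^n.
def partial_sum(terms, N):
    """Sum the first N terms a_1, ..., a_N given by the function `terms`."""
    return sum(terms(n) for n in range(1, N + 1))

a = lambda n: 1 / 2**n   # a_n = 1/2^n; the sum to infinity is exactly 1
S = 1.0

for N in (5, 10, 20):
    print(N, S - partial_sum(a, N))   # truncation error shrinks as N grows
```

For this particular series the truncation error after N terms is exactly 1/2^N, so the printed errors halve with every extra term.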
It has been shown that if a series converges, then its nth term tends to zero as n → ∞.
This applies to series with real or complex terms, and it can be used as a basic test for
divergence, because it can be stated in the following form.
Divergence Test. If a_n ↛ 0 as n → ∞, then Σ a_n diverges.
Notice that if an does tend to zero, then there is no conclusion: the series may still converge
or diverge. There is another important general result, which also applies to series with
real or complex terms.
Theorem. If Σ |a_n| converges, then Σ a_n also converges, and we say Σ a_n is absolutely
convergent.
This result seems obvious, since |a1 + a2 + · · · | ≤ |a1 | + |a2 | + · · · , but the proof is rather
subtle, and will be omitted.
The next test for series enables us to deal with fast converging or diverging series, usually
those whose terms involve exponentials or factorials of n. The simplest are the geometric
series Σ a r^{n−1}, in which r is the common ratio between the (n + 1)th term and the nth
term. A geometric series with common ratio r converges if |r| < 1 and diverges if |r| ≥ 1.
Almost the same applies to series in which the ratio tends (in modulus) to a constant, even
if the ratio is not exactly constant.
Ratio Test. Suppose |a_{n+1}/a_n| → L as n → ∞. Then the series Σ a_n converges if L < 1
and diverges if L > 1. There is no conclusion if L = 1.
The proof is technical, and will be omitted, but the result is the same as that for geometric
series, except that in the border area between convergence and divergence (i.e., L = 1)
there is no conclusion. For this reason the Ratio Test is useless unless n appears in an
exponent or factorial. Notice that since the Ratio Test deals with the absolute value of
the ratio, it is actually a test for absolute convergence, and can also be applied to series
with complex terms.
The Ratio Test is particularly valuable for power series, i.e., series like Maclaurin or
Taylor series involving powers of z or z − a. The Ratio Test, if applied to a power series
Σ c_n (z − a)^n, can lead to the conclusion that the power series converges if |z − a| < R, say,
and diverges if |z − a| > R. This means that the series converges if the point z is inside
the circle with centre a and radius R, and diverges if z is outside the circle. There is no
conclusion from the Ratio Test if |z − a| = R, i.e., if z is on the circle, which is called the
circle of convergence of the series.
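Numerically, the radius R can be estimated from the ratio of successive coefficients, since the Ratio Test compares |c_{n+1}(z − a)^{n+1}| with |c_n (z − a)^n|. A minimal sketch of our own, using the series Σ n z^n as the example:

```python
def radius_estimate(c, n):
    """Estimate the radius of convergence by the coefficient ratio |c_n / c_{n+1}|."""
    return abs(c(n) / c(n + 1))

c = lambda n: n                    # coefficients of sum n z^n
print(radius_estimate(c, 10))      # 10/11, still rough
print(radius_estimate(c, 10_000))  # very close to the true radius R = 1
```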
Tutorial questions — Convergence of series I
5. Use one of the tests above to determine, if possible, whether the following series converge
or diverge:
(a) Σ n/2^n
(b) Σ 3^n/(n^2 √n)
(c) Σ (−3)^n/n!
(d) Σ 1/n
(e) Σ 2^n/(2 − 2n)
(f) Σ √n/(1 + n)
(g) Σ n!/n^n
(h) Σ (1/n)(12(1 + i)/17)^n
(i) Σ (1/n)(29(1 + i)/41)^n.
6. Use the Ratio Test to show that the binomial series Σ (α choose n) z^n converges for |z| < 1 and
diverges for |z| > 1. (The binomial coefficient (α choose n) = α(α − 1) · · · (α − n + 1)/n!, and if the series
converges, then its sum is (1 + z)^α.)
7. Use the Ratio Test to show that the Maclaurin series for e^z converges for all z.
Deduce that z^n/n! → 0 as n → ∞. (Hint: consider the nth term of the series.)
This shows that factorials beat exponentials (which beat powers, which beat logarithms).
8. Use the Ratio Test to find the circles of convergence of the following power series. Then
sketch the circles in the complex z plane.
(a) Σ n z^n
(b) Σ (2z − 1)^n/n
(c) Σ n! z^n/n^n
(d) Σ (nz)^n/n!
(e) Σ 2^n (z − i)^{2n}/n
(f) Σ (1)(3)(5) · · · (2n − 1) 2^n (z − 2)^n/n!.
Show that (2n choose n) = (−4)^n (−1/2 choose n). Deduce that Σ_{n=0}^∞ (2n choose n)(z − 2)^n = (9 − 4z)^{−1/2} where
it converges. Indicate the point on the circle of convergence where the right hand side
is undefined.
9. If L < 1, where L is the limit found in the Ratio Test, show that if Σ_{n=0}^∞ a_n is approximated
by the partial sum Σ_{n=0}^N a_n, then the absolute value of the truncation error can
be estimated by |a_{N+1}|/(1 − L). (Hint: replace the (N + 1)th and later terms in modulus by a
geometric series with common ratio L, since |a_{N+2}| ≈ L |a_{N+1}|, and so on.)
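The estimate in Question 9 can be tried out numerically. The check below (our own example) uses Σ n/2^n, whose sum to infinity is exactly 2 and whose term ratio tends to L = 1/2:

```python
def a(n):
    return n / 2**n          # a_n = n/2^n, with a_{n+1}/a_n -> L = 1/2

N = 20
partial = sum(a(n) for n in range(1, N + 1))
actual_error = 2.0 - partial          # the sum to infinity is exactly 2
estimate = a(N + 1) / (1 - 0.5)       # geometric estimate |a_{N+1}|/(1 - L)

print(actual_error, estimate)         # the estimate is within a few percent
```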
Convergence of series II
The next tests apply to series for which the Ratio Test fails, i.e., those for which the ratio
of successive terms tends in modulus to 1. This includes series in which the nth term
Algebra Chapter 2
79
involves only powers or logarithms of n. Unfortunately these tests are restricted to series
with positive terms only, which behave more predictably than general series.
Lemma. If Σ a_n is a series with positive terms, then either it converges, or its partial
sums tend to infinity.
The important point is that the partial sums cannot oscillate, and the reason is clear: since
the terms are all positive, it follows that adding on an extra term must always increase the
value of the partial sum, so the partial sums are increasing. (This is analogous to the fact
that if the derivative of a function is positive, then the function itself is increasing.) There
are now only two possibilities: either the partial sums continue increasing to infinity, or
they must level out to a limiting value. The formal proof of this seemingly obvious result
is again too theoretical for us.
The simplest series of this type are sums of a constant power of n.
Theorem (the p-series). If p is constant, then the series Σ 1/n^p diverges if p ≤ 1 and
converges if p > 1.
[Graphs of y = x^{−p} and y = (x + 1)^{−p}, with a step function between them.]
Figure 2.1. Comparison between series and integrals
Proof. If p ≤ 0, then the nth term n−p does not tend to zero, so the series diverges. If
p > 0, we can draw the graphs y = x−p for x ≥ 1 and y = (x + 1)−p for x ≥ 0, as
shown in Figure 2.1. By joining the graphs with horizontal lines, as shown, we obtain the
graph of a step function lying between the two graphs. The area under the step function
between x = 0 and x = N is the sum of areas of rectangles with base length 1 and heights
1−p , 2−p , . . . , N −p respectively. The areas under the other two graphs can be found by
integration. Since the step function lies between (x + 1)^{−p} and x^{−p}, it follows that

∫_0^N (x + 1)^{−p} dx < Σ_{n=1}^N n^{−p} < 1 + ∫_1^N x^{−p} dx.

If p ≠ 1 then this becomes

(1/(1 − p))((N + 1)^{1−p} − 1) < Σ_{n=1}^N n^{−p} < 1 + (1/(1 − p))(N^{1−p} − 1).
If 0 < p < 1, then both (N + 1)^{1−p} and N^{1−p} tend to infinity (because the exponent 1 − p
is positive), so the partial sums also tend to infinity, and the series diverges. On the other
hand, if p > 1, then both (N + 1)^{1−p} and N^{1−p} tend to zero (because the exponent 1 − p
is negative), so the expressions on the left and right tend to 1/(p − 1) and 1 + 1/(p − 1) respectively.
Thus the partial sums cannot tend to infinity, and, by the Lemma, the series must converge
to a value lying between 1/(p − 1) and 1 + 1/(p − 1). The case p = 1 must be treated separately,
because the integrals involve logarithms. The inequalities become

ln(N + 1) < Σ_{n=1}^N n^{−1} < 1 + ln N,

and since ln N → ∞ as N → ∞ it follows that the series diverges when p = 1.
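The bounds in the proof are easy to confirm numerically. For p = 2 they place the sum between 1/(p − 1) = 1 and 1 + 1/(p − 1) = 2, and the true sum is π^2/6 ≈ 1.6449 (a quick stdlib check of our own):

```python
import math

p = 2
S = sum(n**-p for n in range(1, 200_001))   # partial sum of the p-series
print(S)                                    # about 1.64493
assert 1 / (p - 1) < S < 1 + 1 / (p - 1)    # the bounds from the proof
```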
Notice that the infinite series Σ_{n=1}^∞ n^{−p} converges for the same values of p as the
infinite integral ∫_1^∞ x^{−p} dx, as shown in Calculus Chapter 1. This theorem also shows
that the divergent series Σ 1/n, which is called the harmonic series, represents, roughly
speaking, a borderline between convergence and divergence. It diverges incredibly slowly:
if a computer summed ten million terms a second, and kept on calculating for a year, then
the partial sum would be about ln(10^7 × 60 × 60 × 24 × 365), which is less than 34, yet the
partial sums eventually go to infinity.
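The arithmetic in this paragraph can be checked with the standard approximation H_N ≈ ln N + γ, where γ ≈ 0.5772 is Euler’s constant (a check of our own, not part of the text):

```python
import math

N = 10**7 * 60 * 60 * 24 * 365     # terms summed in a year at 10^7 per second
approx = math.log(N) + 0.5772      # H_N ~ ln N + gamma
print(approx)                      # still below 34, as the text says
```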
The p-series are used as a set of standards by which other series of positive terms can
be compared. There are two tests for comparing series having positive terms, each saying
essentially that if such a series converges, then so does any such series with smaller terms,
and if it diverges, then so does any series with larger terms.
Comparison Test. If 0 ≤ a_n ≤ b_n for all n sufficiently large, then:
(i) if Σ b_n converges, then Σ a_n converges,
(ii) if Σ a_n diverges, then Σ b_n diverges.
(The phrase “all n sufficiently large” draws attention to the fact that only the tail of the
series matters when testing for convergence or divergence.)
Limit Comparison Test. If a_n and b_n are positive, and a_n/b_n tends to a non-zero number
K as n → ∞, then Σ a_n and Σ b_n either both converge or both diverge.
(Notice the difference between this and the Ratio Test: this test compares corresponding
terms of two different series of positive real numbers, while the Ratio Test compares the
ratio of absolute values of successive terms of the same series. In this test there is a
conclusion for any limit K not equal to zero or infinity, while the Ratio Test fails only
when the limit L = 1.)
Proof. If a_n ≤ b_n for all n ≥ M say, then Σ_{n=M}^N a_n ≤ Σ_{n=M}^N b_n for all N ≥ M, and the
Comparison Test follows by the Lemma. For the Limit Comparison Test, note that if
a_n/b_n → K then for n sufficiently large we must have at least (1/2)K ≤ a_n/b_n ≤ 2K. This gives
b_n ≤ (2/K) a_n and a_n ≤ 2K b_n,
and the result follows from the Comparison Test.
To use these tests in practice, we have to approximate a given series by a simpler series
(usually a p-series) whose convergence or divergence is known. We can use inequalities
like the fact that powers beat logarithms (i.e., eventually n^m > (ln n)^k for any positive
constants k and m). In addition, based on the fact that we can assume n is large, so 1/n is
small, we can use first or second approximations to obtain a simpler series. Once it has
been found, we apply one of the comparison tests.
The final test for series with positive terms is also a kind of comparison, using the fact
that integrals are in general easier to evaluate than sums. It generalizes the proof used
for p-series, and emphasizes the link between convergence of series and convergence of
integrals, as discussed in Calculus Chapter 1.
Integral Test. If f(x) is a positive decreasing function for x ≥ M, then Σ_{n=M}^∞ f(n) and
∫_M^∞ f(x) dx either both converge or both diverge.
Proof. Replace the graphs in Figure 2.1 by the graph y = f (x) for x ≥ M and the graph
y = f (x + 1) for x ≥ M − 1. Then by the same argument we obtain
∫_{M−1}^N f(x + 1) dx < Σ_{n=M}^N f(n) < f(M) + ∫_M^N f(x) dx,
and the result follows by the Lemma.
Tutorial questions — Convergence of series II
10. Use a second approximation to ln(1 + x) (or a sketch graph) to show that x ≥ ln(1 + x).
Deduce that Σ_{n=1}^N 1/n > Σ_{n=1}^N ln(1 + 1/n).
Rewrite the sum on the right hand side without sigma notation, and show by the laws
of logarithms that its value is ln(N + 1).
By letting N → ∞, deduce from the above that the harmonic series diverges.
11. Use appropriate comparison tests to determine whether the following series converge or
diverge.
(a) Σ 3/(n + 1)
(b) Σ (5√n + 1)/(n^2 + 1)
(c) Σ 1/(n + 2)
(d) Σ (√(n^2 + 1) − n)
(e) Σ sin(1/n)
(f) Σ (n^{1/2} + 1)/(n^{1/3} + 5)
(g) Σ (n^{−2} + 1)/(n^{−3} + 5)
(h) Σ (ln n)/n
(i) Σ (ln n)/n^2
(j) Σ 1/(n ln n)
(k) Σ 1/(n^2 ln n)
(l) Σ 1/(n(ln n)^2).
* 12. Which part of the Limit Comparison Test remains true if K = 0, and which part
if K = ∞?
Convergence of series III
There is only one simple test, besides the Ratio Test, for series whose terms are not all
positive. It applies to certain alternating series, i.e., series whose terms are alternately
positive and negative.
Alternating Series Test. If c_n is positive and decreasing, and if c_n → 0 as n → ∞,
then the alternating series Σ (−1)^n c_n and Σ (−1)^{n+1} c_n both converge.
Proof. A partial sum of Σ (−1)^{n+1} c_n can be written as

c_1 − (c_2 − c_3) − (c_4 − c_5) − · · · or as (c_1 − c_2) + (c_3 − c_4) + (c_5 − c_6) + · · · .

Since the values of c_n are decreasing, each bracketed term is positive. It follows that
the odd partial sums start at the value c_1 and decrease, while the even partial sums
start at 0 and increase. However, the difference in magnitude between the Nth partial
sum S_N and the (N + 1)th partial sum S_{N+1} is c_{N+1}, which tends to zero as N → ∞.
Therefore the odd and even partial sums must tend to the same limit, which means that
the series Σ (−1)^{n+1} c_n converges. Finally, Σ (−1)^n c_n = −Σ (−1)^{n+1} c_n, so if one series
converges, then the other one converges also.
Corollary. In absolute value, the truncation error for an alternating series satisfying the
conditions of the test is less than the first term omitted.
Proof. Suppose the series converges to the sum S. From the proof of the Alternating Series
Test it follows that the odd and even partial sums oscillate from one side of S to the other.
Thus |SN − S| < |SN − SN+1 | = cN+1 , as required.
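The Corollary is easy to watch in action. The sketch below (our own example) truncates the alternating series π/4 = 1 − 1/3 + 1/5 − · · · and compares the error with the first omitted term:

```python
import math

def leibniz_partial(N):
    """Sum of the first N terms of 1 - 1/3 + 1/5 - ..."""
    return sum((-1)**k / (2*k + 1) for k in range(N))

N = 1000
error = abs(math.pi / 4 - leibniz_partial(N))
first_omitted = 1 / (2*N + 1)
print(error < first_omitted)       # the error is below the first omitted term
```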
When using the test, it is important to check that the terms decrease in magnitude, as
well as alternate in sign (and tend to zero). The test actually guarantees the convergence
of Σ e^{inθ} c_n for any constant θ such that 0 < θ < 2π, though we cannot prove it here.
What we have proved is convergence when θ = π, since e^{inπ} = (−1)^n. The general result
makes it sometimes possible to establish the convergence or divergence of a power series
at points on its circle of convergence.
If a series Σ a_n converges, but Σ |a_n| does not, then we say Σ a_n is conditionally
convergent. A surprising result is that the terms of a real conditionally convergent series
can be rearranged, without omitting or repeating any terms, to sum to any desired value T .
You simply take the next positive terms until the partial sum is greater than T , then the
next negative terms until the partial sum is less than T , and so on.
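The rearrangement procedure just described is easily mechanised. A sketch of the algorithm (our own implementation), applied to the alternating harmonic series 1 − 1/2 + 1/3 − · · · with target T = 1:

```python
def rearranged_terms(T, count):
    """Generate `count` terms of the rearranged series targeting sum T."""
    pos, neg = 1, 2          # next odd (positive) and even (negative) denominators
    total, out = 0.0, []
    while len(out) < count:
        if total <= T:
            term = 1 / pos; pos += 2      # next positive term 1/odd
        else:
            term = -1 / neg; neg += 2     # next negative term -1/even
        total += term
        out.append(term)
    return out, total

terms, total = rearranged_terms(1.0, 100_000)
print(total)    # close to 1, even though the ordinary sum is ln 2 ~ 0.693
```

The first few generated terms, (1 + 1/3) − 1/2 + 1/5 − · · ·, agree with the rearrangement given in the answer to Question 17.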
Tutorial questions — Convergence of series III
13. Prove that the Maclaurin series for ln(1 + x) converges at x = 1 but not at x = −1.
What is the sum to infinity at x = 1?
14. Roughly how many terms do you need in order to obtain an estimate of π/4, correct to
four decimal places, by truncating the Maclaurin series for arctan x with x = 1? (Hint:
the absolute truncation error must be less than 0.5 × 10^{−4}.)
How many terms do you need if you use the identity π/4 = arctan(1/2) + arctan(1/3) and
write the right hand side as a single alternating series? (Use a calculator, and trial and
error.)
* 15. Make up a real alternating series in which the terms tend to zero, but the series still
diverges. (Hint: make the series of positive terms converge, and the series of negative
terms diverge.)
* 16. Determine the behaviour of the power series of Question 8(a) and (b) on their circles of
convergence.
* 17. The series Σ_{n=1}^∞ (−1)^{n+1}/n converges to ln 2 (see Question 13). Rearrange the terms of
the series to make a new series whose sum is 1, and write down the first few terms
of the new series. Plan a computer program to generate any number of terms of this
rearranged series.
18. Test the following assorted series for convergence:
(a) Σ (2 + i)^{−n}
(b) Σ (2n + 1)^{−2}
(c) Σ 1/√(n(n + 1))
(d) Σ (−1)^n/√n
(e) Σ 1/(√(n + 1) − √n)
(f) Σ ((4 + 5i)/6)^n
(g) Σ 1/(n ln n)
(h) Σ n/(n^2 + 1)
(i) Σ sin(π/n^2)
(j) Σ n^n/n!
(k) Σ n!/n^n
(l) Σ 1/n^{1−(1/n)}
(m) Σ 1/n^{2+(1/n)}
(n) Σ ln(1 + 1/√n)
(o) Σ ((1 + 1/n)^{1/3} − 1)
(p) Σ arctan(n^{−2})
(q) Σ 1/(n ln n ln ln n)
(r) Σ cos(nπ)/(2n + 1)
(s) Σ cos((2n + 1)π)/(2n + 1)
(t) Σ sin(nπ)/(2n + 1)
(u) Σ (−1)^n n/(n^2 − 1)
(v) Σ 1/(n(ln n)^q)
(w) Σ 1/n^{1+(1/n)}
*(x) Σ (−1)^n/n^{2+(−1)^n}.
19. Determine for which (real) values of x and y the series Σ x^n n^{−y} converges, and shade
the region of convergence in the (x, y) plane. (Hint: regard x and y as parameters: first
use the Ratio Test to find regions of convergence and divergence, and use other tests
on the boundary between the regions. Use solid lines in your sketch for portions of the
boundary where the series converges, and dotted lines otherwise.)
Answers
1. (a) 1, (b) 0, (c) e^3, (d) 2/3, (e) 0, (f) e^2, (g) 1, (h) 0, (i) 1, (j) 1, (k) 1, (l) 0, (m) −1/3, (n) −1/π.
2. (a) 1, (b) 1, (c) e, (d) ∞.
4. After n applications of L’Hôpital’s Rule, the numerator is k^n e^{kx}, which still tends to
infinity, but the denominator is a constant times x^{m−n}, which tends to zero.
5. (a) C, (b) D, (c) C, (d) no conclusion (so far), (e) D, (f) no conclusion, (g) C, (h) C, (i) D.
8. (a) |z| = 1, (b) |z − 1/2| = 1/2, (c) |z| = e, (d) |z| = 1/e, (e) |z − i| = 1/√2, (f) |z − 2| = 1/4. Sum
equals (1 − 4(z − 2))^{−1/2}, undefined at z = 9/4.
9. Absolute truncation error ≈ |a_{N+1}|{1 + L + L^2 + · · ·}.
10. ln(1 + x) = x − (1/2)x^2 + · · · < x, ∴ 1/n > ln(1 + 1/n), ∴ Σ 1/n > Σ ln(1 + 1/n).
11. (a) D, (b) C, (c) D, (d) D, (e) D, (f) D, (g) D, (h) D, (i) C, (j) D, (k) C, (l) C.
12. If K = 0, then Σ b_n convergent implies Σ a_n convergent, and Σ a_n divergent implies
Σ b_n divergent only. If K = ∞, interchange a_n and b_n.
13. ln 2.
14. 10 000 terms, 5 terms.
15. * See Question 18(x)
16. * Converges (a) nowhere on the circle, (b) everywhere on the circle except at z = 1.
17. * (1 + 1/3) − 1/2 + 1/5 − 1/4 + (1/7 + 1/9) − 1/6 + (1/11 + 1/13) − 1/8 + · · · .
18. (a) C (RT), (b) C (CT/LCT), (c) D (CT/LCT), (d) C (AST), (e) D (nth term), (f) D
(RT), (g) D (IT), (h) D (CT/LCT), (i) C (CT/LCT), (j) D (RT and L’H), (k) C (RT), (l) D
(CT), (m) C (CT), (n) D (LCT and L’H), (o) D (LCT and L’H), (p) C (LCT and L’H),
(q) D (IT), (r) C (AST), (s) D (CT), (t) C (all terms are zero), (u) C (AST), (v) C if q > 1
(IT), (w) D (LCT and L’H), *(x) D (drop sigma notation, and consider odd and even
terms separately).
19. See sketch overleaf.
Algebra Chapter 2
85
Figure 2.2. Sketch for Question 19
Algebra Chapter 3
Linear algebra
Linear spaces
In Calculus Chapter 1 we met the idea of a linear operator, i.e., an operator, say P ,
such that P (cy) = c(P y) for any constant c, and P (y1 + y2 ) = P y1 + P y2 for any inputs
y1 and y2 . By using these facts repeatedly, we can say that
P (α1 y1 + · · · + αn yn ) = α1 (P y1 ) + · · · + αn (P yn ), or P ( Σ_{j=1}^n αj yj ) = Σ_{j=1}^n αj (P yj ),

for any constants α1 , . . . , αn and any inputs y1 , . . . , yn . The expression α1 y1 + · · · + αn yn
or Σ_{j=1}^n αj yj is called a linear combination of y1 , . . . , yn . Thus a linear combination of
elements is a sum of constant multiples of those elements. (One reason for the word linear
is that the expression is of degree one in the variables y1 , . . . , yn .)
A set of vectors or functions is called a linear space if it is closed under the formation
of linear combinations, i.e., if every linear combination of elements of the set is again in the
set. If only real constants are permissible, then the space is called a real linear space;
if complex constants are allowed, then the space is called a complex linear space. The
following results about linear spaces are important; brief proofs, where necessary, are given
in brackets.
1. All vectors with n real entries form a real linear space, denoted Rn . Similarly, all
vectors with n complex entries form a complex linear space, denoted Cn . Row vectors
are denoted by bold lower case letters, e.g. r or a or v; column vectors are written in
upper case, e.g. X or Z.
2. All real-valued functions form a real linear space, and all complex-valued functions form
a complex linear space.
3. The inputs to a linear operator form a linear space, which is called the domain of the
operator. (This is because in order to test for linearity, every linear combination of
inputs must also be an input.)
4. Every linear space contains the zero element. (This is because if y is in the space, then
so is y + (−y).)
5. The outputs of a linear operator form a linear space, which is called the range space
or image space of the operator. (This is because any linear combination of outputs,
say α1 (P y1 ) + · · · + αn (P yn ), can by the properties of linear operators be re-written
α1 (P y1 ) + · · · + αn (P yn ) = P (α1 y1 + · · · + αn yn ),
so it is also an output.)
6. The null space of a linear operator is a linear space. (This is because if y1 , . . . , yn are
in the null space of P , then P (y1 ) = P (y2 ) = · · · = P (yn ) = 0, and by linearity
P (α1 y1 + · · · + αn yn ) = α1 (P y1 ) + · · · + αn (P yn ) = α1 0 + · · · + αn 0 = 0,
so the linear combination α1 y1 + · · · + αn yn is also in the null space.) This result is
called the superposition principle, and was also proved in Calculus Chapter 1.
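The superposition principle can be seen concretely with a matrix operator. In this sketch (our own numbers, not from the text), P is a 2 × 3 matrix, y1 and y2 lie in its null space, and so does every linear combination of them:

```python
def apply(P, y):
    """Matrix-vector product P y."""
    return [sum(P[i][j] * y[j] for j in range(len(y))) for i in range(len(P))]

P = [[1, 2, 3],
     [2, 4, 6]]          # a 2x3 operator; its rows are proportional
y1 = [2, -1, 0]          # P y1 = 0
y2 = [3, 0, -1]          # P y2 = 0
combo = [5*a + 7*b for a, b in zip(y1, y2)]   # alpha1 y1 + alpha2 y2

print(apply(P, y1), apply(P, y2), apply(P, combo))   # all zero vectors
```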
Tutorial questions — Linear spaces and subspaces
1. Prove that if C1 , C2 , . . . , Cn are m × 1 column vectors, then the linear combination
Σ_{j=1}^n xj Cj is the same as the matrix product CX, where C is the m × n matrix whose
columns are C1 , C2 , . . . , Cn , and X is the n × 1 column vector with entries x1 , x2 , . . . , xn .
(Hint: write out CX and use the definition of matrix product.)
2. Determine whether the following sets of vectors (z, w) in C2 are linear spaces. (In each
case, take two general vectors (z1 , w1 ) and (z2 , w2 ) satisfying the given equation; then
let (z3 , w3 ) = α1 (z1 , w1 ) + α2 (z2 , w2 ), and see whether or not (z3 , w3 ) also satisfies the
same equation.)
(a) all (z, w) such that z + iw = 0.
(b) all (z, w) such that z + iw = 1 + i.
(c) all (z, w) such that z 2 − iw2 = 1 + i.
(d) all (z, w) such that z 2 − iw2 = 0.
3. Use Gauss-Jordan elimination to solve (where possible) the systems of equations AZ =
B with the following augmented matrices, giving the complete general solution, where
it exists. If the solution is not unique, write it in the form Z = Z1 + Z0 , where Z1 is a
particular solution and Z0 is a general vector in the null space.

(a) ( 1     2+i  :  1+i )      (b) ( 1     −1+i  :  1+i )
    ( 1−i   3    :  i   )          ( 1−i   2i    :  2   )

(c) ( 1     i      1−i  :  1+i )
    ( 1+i   −1+i   2    :  2i  )

(d) ( 1     2+i    :  1−i )
    ( 1−i   3−i    :  −2i )
    ( i     −1+2i  :  1+i )
4. (Revision.) (i) Find the inverses of the following matrices T by reducing the augmented
matrix T : I by Gauss-Jordan elimination to I : T^{−1}.
(ii) Check your answers by working out T T^{−1}.
(iii) Double-check your answers by finding T^{−1} using the formula T^{−1} = (1/det(T)) adj(T).

(a) ( 1  13 )      (b) ( 1      1+i )
    ( 1  1  )          ( 1−2i   3   )

(c) ( 1  2  3 )    (d) ( 1   i     1+i  )
    ( 2  3  4 )        ( i   0     1    )
    ( 3  4  6 )        ( 1   −1+i  −1+i )
Bases and dimension
A linear space (unless it consists of zero alone) contains infinitely many elements, but
usually they can be expressed as linear combinations of relatively few of these elements.
For example, any vector (x, y, z) in R3 can be expressed as a linear combination of i, j,
and k, since
(x, y, z) = xi + yj + zk,
and this expression is, in fact, unique. Similarly, any function y(t) in the null space of
the operator D2 + 1 can be written uniquely as a linear combination of cos t and sin t,
thus: y(t) = P cos t + Q sin t. We say that i, j, k, form a basis of R3 , and that cos t and
sin t form a basis of the null space of D2 + 1.
More generally, a basis of a linear space is a set of elements in the space such that
every element in the space can be written uniquely as a linear combination of the basis
elements. Notice that:
• the basis elements must themselves be in the space,
• for every element in the space, an expression as a linear combination of basis elements
must exist, and
• for every element in the space, the expression as a linear combination of basis elements
must be unique.
Theorem. In Rn or Cn , n column vectors form a basis if and only if the matrix of
which they are the columns is invertible.
Proof. Suppose the vectors are C1 , . . . , Cn , forming the matrix (C1 . . . Cn ) = C. By matrix
multiplication (see Question 1) it is easy to show that a linear combination x1 C1 +· · ·+xn Cn
is the same as the matrix product CX, where X is the column vector (x1 . . . xn )T . To say
that C1 , . . . , Cn form a basis is to say that every n × 1 vector B has a unique expression as
a linear combination x1 C1 + · · · + xn Cn . By the previous remarks this means precisely that
the equation CX = B has a unique solution for every n × 1 vector B. From elementary
matrix algebra we know that a solution of the equation CX = B will exist for every B
and be unique if and only if C is invertible, in which case X = C −1 B.
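A concrete instance of the theorem (our own numbers): the columns (1, 1)^T and (1, −1)^T of C form a basis of R2 because det C = −2 ≠ 0, and the unique coordinates of any B come from X = C^{−1}B:

```python
def solve_2x2(C, B):
    """Solve C X = B for a 2x2 matrix C by the inverse formula."""
    (a, b), (c, d) = C
    det = a*d - b*c
    assert det != 0, "columns do not form a basis"
    x = ( d*B[0] - b*B[1]) / det
    y = (-c*B[0] + a*B[1]) / det
    return x, y

C = [[1, 1],        # row 1: first entries of the two basis columns
     [1, -1]]       # row 2: second entries
x, y = solve_2x2(C, [3, 1])
print(x, y)         # the unique coefficients with x*(1,1) + y*(1,-1) = (3,1)
```

Here 2·(1, 1) + 1·(1, −1) = (3, 1), so the coordinates of (3, 1) in this basis are (2, 1).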
The above theorem is true if column is replaced by row throughout. In particular, the
standard basis of Rn or Cn consists of the n vectors forming the columns (or rows) of
the identity matrix In , which is obviously invertible, since In^{−1} = In .
There are many ways to choose a basis of a linear space, but it can be proved that
the number of elements in any basis will be the same. For example, in R3 any three
non-coplanar vectors form a basis, as the above theorem shows, since three non-coplanar
vectors form a matrix with non-zero determinant. With fewer than three vectors not every
element of R3 will be expressible, and with more than three the expressions will not be
unique. Similarly, cos(t−α) and sin(t−β) in general form a basis of the null space of D2 +1,
as do eit and e−it (if complex coefficients are allowed), but you always need two functions
for a basis of this space. This suggests that, although the basis elements themselves can
be chosen in many ways, the number of basis elements is always the same.
The dimension of a linear space is the number of elements in any basis of the space.
Thus R3 has dimension three (so this meaning of dimension corresponds with our usual
meaning), and the null space of D2 + 1 has dimension two. A plane through the origin in
R3 also has dimension two (any two non-collinear vectors in the plane form a basis), and
a line through the origin in R2 or R3 has dimension one. The origin on its own is said to
have dimension zero.
Theorem. The dimension of the null space of a linear operator is equal to the number of
arbitrary constants in the general solution of the inverse problem for that operator.
Proof. The general solution of P y = b is of the form y = y1 + y0 , where y1 is a particular
solution and y0 is a general element of the null space. If y0 involves n arbitrary constants,
then it is a linear combination of n basis elements, so the null space has dimension n.
In particular, the solution of P y = b is unique if the null space of P consists of zero only
(no arbitrary constants in the solution, and dimension zero for the null space). For an n th
order linear differential operator we also see that the null space has dimension n, because
the solution requires n integrations and therefore involves n arbitrary constants.
Tutorial questions — Bases and dimension
5. Use your working from Question 3 to find a basis of the null space of the coefficient
matrices there. Then write down the dimension of each null space.
6. Find a basis of the null space of each of the differential operators below, and verify that
the dimension of the null space is equal to the degree of the operator. (Use complex
numbers where necessary for simplicity.)
(a) D2 − 4D + 3
(b) D2 + 2D + 2
(c) D3
(d) Dn
(e) D4 + 4.
7. (a) Prove that if a and b are row vectors in R2 , and if T is the matrix whose rows are
a and b, then | det(T )| is equal to the area of the parallelogram whose sides are formed
by the vectors a and b. (Hint: put the third component equal to 0 and use the fact
that area = |a × b|.)
Use the theorem to prove that any two non-collinear vectors form a basis of R2 .
(b) Similarly, prove that any three non-coplanar vectors in R3 form a basis of R3 , by
using the fact that the determinant of the matrix they form algebraically is plus or
minus the volume of the parallelepiped they form geometrically.
Independence and rank
The uniqueness of expressions for vectors in terms of a basis comes from a property called
independence. Elements in a linear space are said to be independent if either of the
following is true (in which case it can be shown that the other is also true):
• It is impossible to express any one of the elements as a linear combination of the others.
• The only way to express zero as a linear combination of the elements is to have all
coefficients equal to zero.
The first definition makes it clear that in R3 two vectors are independent if they are not
collinear, and three vectors are independent if they are not coplanar. Comparing these
definitions with the definition of basis shows clearly that basis elements in any linear space
must always be independent.
Theorem. (i) Column vectors C1 , . . . Cn in Cm are independent if the equation CX = 0
has the unique solution X = 0 (i.e., the null space consists of 0 alone), where C is the
m × n matrix (C1 . . . Cn ).
(ii) Functions y1 (x), . . . , yn (x) are independent if their Wronskian is non-zero.
The Wronskian W (y1 , . . . , yn ) is the determinant of a matrix whose entries are the functions
and their derivatives of successive orders. For n = 2 and 3 we have:

W(y1 , y2 ) = det ( y1   y2  ) = y1 y2′ − y2 y1′ ,
                  ( y1′  y2′ )

W(y1 , y2 , y3 ) = det ( y1   y2   y3  )
                       ( y1′  y2′  y3′ ) .
                       ( y1″  y2″  y3″ )
Proof. (i) Column vectors. If X = (x1 , . . . , xn )T , then, as in Question 1, CX = x1 C1 +
· · · + xn Cn . If CX = 0 has the unique solution x1 = x2 = · · · = xn = 0, then C1 , . . . , Cn
are independent by the second definition of independence above. (Note that the matrix
C need not be square, but, if it is, then its columns are independent if and only if it is
non-singular, i.e., if det C 6= 0 or C −1 exists. )
(ii) Functions. Suppose a1 y1 + a2 y2 + a3 y3 = 0, where a1 , a2 , a3 are numbers. Differentiate
twice to get a1 y1′ + a2 y2′ + a3 y3′ = 0 and a1 y1′′ + a2 y2′′ + a3 y3′′ = 0. These equations give the
matrix equation

( y1   y2   y3  ) ( a1 )   ( 0 )
( y1′  y2′  y3′ ) ( a2 ) = ( 0 ) .
( y1″  y2″  y3″ ) ( a3 )   ( 0 )
Since the Wronskian W (y1 , y2 , y3 ) is the determinant of the coefficient matrix, it follows
that if W (y1 , y2 , y3 ) 6= 0, then the coefficient matrix is invertible, so the numbers a1 , a2 , a3
are all zero (as required for independence).
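Part (ii) can be spot-checked numerically. For y1 = e^x and y2 = e^{−x} the Wronskian is y1 y2′ − y2 y1′ = −2, non-zero everywhere, so the pair is independent (our own example; the derivative functions are supplied explicitly):

```python
import math

def wronskian2(y1, dy1, y2, dy2, x):
    """W(y1, y2) at the point x, given the derivative functions explicitly."""
    return y1(x) * dy2(x) - y2(x) * dy1(x)

W = wronskian2(math.exp, math.exp,
               lambda x: math.exp(-x), lambda x: -math.exp(-x), 0.7)
print(W)    # approximately -2 at every x, hence nowhere zero
```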
The rank of a matrix is the number of non-zero rows in the Gauss or Gauss-Jordan
form of the matrix. It can be shown that the rank is equal to the dimension of the range or
image space, and is also equal to the maximum number of independent rows or columns in
the matrix. The rank also gives information about the existence or uniqueness of solutions
of linear equations.
Theorem. Suppose the m × n matrix A has rank r. Then:
(i) The null space of A, i.e., the space of solutions of the equation AX = 0, has dimension n − r.
(ii) If r = m, then the solution of any equation AX = B may not be unique, but will
always exist.
(iii) If r = n, then the solution of any equation AX = B is unique, assuming it exists.
Proof. (i) After Gauss-Jordan elimination of AX = O, there are r non-zero rows or equations; together, these express r of the n unknowns in terms of the remaining n−r unknowns,
which are arbitrary. Thus there are n − r arbitrary constants in the solution, so the null
space has dimension n − r.
(ii) If r = m, then every row in the normal form of A is non-zero, so a solution must exist.
(iii) If r = n, then there are no arbitrary constants, so the solution (if it exists) is unique.
Corollary. Column vectors C1 , . . . , Cn are independent if and only if the matrix (C1 . . . Cn )
has rank n.
Proof. The result follows from part (iii) of this theorem and part (i) of the previous theorem.
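Rank and the dimension of the null space can be computed exactly by row reduction. A small sketch of our own (`fractions.Fraction` avoids round-off):

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination."""
    A = [[Fraction(v) for v in row] for row in rows]
    r, cols = 0, len(A[0])
    for c in range(cols):
        # find a pivot in column c at or below row r
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f*b for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 1, 1]]      # second row is twice the first
r = rank(A)
print(r, 3 - r)      # rank 2, so the null space has dimension 1
```

Here the second row is twice the first, so the rank is 2 and the null space of this 3 × 3 matrix has dimension 3 − 2 = 1, as in part (i) of the theorem.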
Tutorial questions — Independence and rank
8. Use your working from Question 3 to determine which coefficient matrices there have
all their columns independent.
9. Use Wronskians to test the following sets of functions for independence:
(a) ex , e−x
(b) cos x, sin x, 1
(c) cos2 x, sin2 x, 1
(d) 1, sec x, tan x.
10. Use Wronskians to show that the basis functions you found in Question 6(a)–(c) are
independent.
11. Show that two general functions y1 and y2 are not independent if their quotient is a
constant. Use the quotient rule for differentiation to show that this is the same as
W (y1 , y2 ) = 0.
12. (a) Find the rank of each of the coefficient matrices in Question 3 by counting the nonzero rows in the normal form. Verify in each case that the dimension of the null space
(i.e., the number of arbitrary constants) is equal to the difference between the number
of columns and the rank.
(b) Given a square matrix A, explain why existence of a solution for AX = B for all
possible choices of B always goes together with uniqueness.
Eigenvalues and eigenvectors
A linear operator whose inputs and outputs come from the same linear space is said to be
a linear transformation of that space. For example, multiplication by a (square) n × n
matrix A is a linear transformation of Rn or Cn . Similarly, any D operator P (D) is a
linear transformation of the space of all infinitely differentiable functions.
With a transformation, it is possible for an element to be taken to itself, or a multiple
of itself. These elements, and the scalar multiples involved, turn out to be very important.
Geometrically, it says that the output is parallel to the input, so the direction is the same,
though the magnitude may be different.
A number α is said to be an eigenvalue of a linear transformation P if there is a
non-zero element y such that P y = αy. If α is an eigenvalue of P , then the eigenspace
of P corresponding to α consists of all elements y such that P y = αy. The elements of
the eigenspace are called eigenvectors or eigenfunctions. For example, every number
α is an eigenvalue of the differentiation operator d/dt, with corresponding eigenfunctions
y = ce^{αt}, where c is an arbitrary constant, since dy/dt = αy. Similarly, 2 is an eigenvalue
of the matrix A = (0 −4; 1 4), with corresponding eigenvectors X = c(−2; 1), since by direct
calculation it is easy to show that AX = 2X.
If α is an eigenvalue of an operator P , then the corresponding eigenspace consists of all y
such that P y = αy, or P y − αy = 0, which we may write (by analogy with D operators) in
the form (P − α)y = 0. Thus the eigenspace corresponding to the eigenvalue α is the null
space of the operator P − α, and is therefore a linear space. The eigenspace must have
dimension at least one, since by definition of eigenvalue the eigenspace contains a non-zero
element.
Theorem. If A is an n × n matrix, then λ is an eigenvalue of A if and only if det(A −
λIn ) = 0.
Proof. The number λ is an eigenvalue of A if and only if there is a non-zero solution for
the equation AX = λX, i.e., AX − λX = 0, i.e., (A − λIn )X = 0. (Notice how we must
insert the identity matrix In to make the matrix algebra meaningful.) This equation has
a non-unique solution if and only if the coefficient matrix A − λIn is singular, i.e., if and
only if det(A − λIn ) = 0.
The equation det(A − λIn ) = 0 is called the characteristic equation of the matrix A,
and the left hand side det(A − λIn ) is called the characteristic polynomial of A, which
we write as cA (λ), so
cA (λ) = det(A − λIn ).
The characteristic polynomial is found by subtracting the variable λ from each of the
diagonal entries of A, and then taking the determinant. Eigenvalues are sometimes called
characteristic roots, since they are the roots of the characteristic equation cA (λ) = 0.
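As a quick numerical check of this recipe, the following Python sketch uses the 2 × 2 example matrix A = (0 −4; 1 4) from the text, forms its characteristic polynomial cA(λ) = λ² − tr(A)λ + det(A), and verifies the eigenpair stated earlier:

```python
# Sketch using the example matrix from the text: A = (0 -4; 1 4).
# For a 2x2 matrix, cA(lam) = lam^2 - tr(A)*lam + det(A).
A = [[0, -4], [1, 4]]
trace = A[0][0] + A[1][1]                    # 4
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]      # 4

def cA(lam):
    # cA(lam) = lam^2 - 4*lam + 4 = (lam - 2)^2, so 2 is the eigenvalue
    return lam*lam - trace*lam + det

# Verify that X = (-2, 1) is a corresponding eigenvector: AX = 2X.
X = (-2, 1)
AX = (A[0][0]*X[0] + A[0][1]*X[1], A[1][0]*X[0] + A[1][1]*X[1])
```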
Tutorial questions — Eigenvalues and eigenvectors
13. For each matrix A and each vector X below, show that the matrix product AX is a
scalar multiple of X. (This shows that X is an eigenvector; the multiplying factor is
the corresponding eigenvalue.)
(a) A = (0 1; 1 0); X = (1; 1) and X = (1; −1).
(b) A = (1 2; 3 2); X = (2; 3) and X = (1; −1).
(c) A = (1 −1; −2 2); X = (1; 1) and X = (−1; 2).
(d) A = (1 −2; 1 3); X = (1 + i; −1) and X = (1 − i; −1).
(e) A = (0 −1; 1 2); X = (1; −1).
14. Find det(A − λI2 ) for the matrices in Question 13 and verify that the solutions of the
equation det(A − λI2 ) = 0 are the eigenvalues as found above.
15. Find and solve the characteristic equations of the following matrices, and hence find
their (real or complex) eigenvalues and corresponding eigenvectors. (Remember that
for a real matrix, conjugate complex eigenvalues have conjugate eigenvectors.)








(a) (0 0 4; 1 0 4; 0 1 −1)  (b) (−1 −1 1; 1 0 −2; 0 1 2)  (c) (−2 −3 −3; 3 4 3; −3 −3 −2)  (d) (−1 −1 −6; 1 0 3; 0 1 1).
Diagonalization
A diagonal matrix is a square matrix for which all entries not on the main diagonal
are zero. The notation diag (α1 , . . . , αn ) denotes an n × n diagonal matrix with entries
α1 , . . . , αn on the main diagonal, and zeros everywhere else. If there is a non-singular
matrix T such that T −1 AT is a diagonal matrix, then we say that A can be diagonalized.
The main result is that the columns of T are eigenvectors of A, and the diagonal entries
are the corresponding eigenvalues.
Theorem. An n × n matrix A can be diagonalized if and only if A has n independent
eigenvectors.
Proof.
⇒ Suppose T −1 AT = D, where D = diag (α1 , . . . , αn ). Let C1 , . . . , Cn be
the columns of T , so they are independent, since T is non-singular. From the equation
T −1 AT = D we have AT = T D, so
A(C1 . . . Cn) = (C1 . . . Cn) diag(α1, . . . , αn),
∴ (AC1 . . . ACn) = (α1C1 . . . αnCn),
which shows that C1 , . . . , Cn are eigenvectors of A.
⇐ Suppose the n independent eigenvectors are C1 , . . . , Cn , with corresponding eigenvalues α1 , . . . , αn . Let T = (C1 . . . Cn ), the matrix whose columns are C1 , . . . , Cn . Then
the same calculations show that AT = T D, and T is invertible, since its columns are
independent, so T −1 AT = D.
An n × n matrix A has n eigenvalues, because they are the zeros of the characteristic
polynomial, which is of degree n; if the eigenvalues are all distinct, then it can be shown
that the corresponding n eigenvectors are always independent, and by the theorem A can
be diagonalized. Thus diagonalization can fail only if A has a repeated eigenvalue.
Diagonalization has many applications, since it simplifies matrix calculations. For example, if T⁻¹AT = D, then A = TDT⁻¹ and A² = TDT⁻¹TDT⁻¹ = TD²T⁻¹. In general, Aⁿ = TDⁿT⁻¹, which makes it easy to find Aⁿ, since if D = diag(α1, α2, . . . ), then Dⁿ = diag(α1ⁿ, α2ⁿ, . . . ). It follows that Aⁿ → O as n → ∞ if the eigenvalues of A are all less than 1 in modulus.
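The identity Aⁿ = TDⁿT⁻¹ can be checked numerically. The sketch below uses a 2 × 2 example of my own (not from the text), with eigenvalues 5 and −2 and exact rational arithmetic:

```python
# Sketch: verify A^3 = T D^3 T^(-1) for a 2x2 matrix with eigenvalues 5, -2.
from fractions import Fraction

def matmul(P, Q):
    return [[sum(P[i][k]*Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

A = [[Fraction(1), Fraction(4)], [Fraction(3), Fraction(2)]]   # eigenvalues 5 and -2
T = [[Fraction(1), Fraction(4)], [Fraction(1), Fraction(-3)]]  # columns are eigenvectors
detT = T[0][0]*T[1][1] - T[0][1]*T[1][0]
Tinv = [[ T[1][1]/detT, -T[0][1]/detT],
        [-T[1][0]/detT,  T[0][0]/detT]]
D = [[Fraction(5), Fraction(0)], [Fraction(0), Fraction(-2)]]

A3 = matmul(A, matmul(A, A))                                 # A^3 directly
D3 = [[D[0][0]**3, Fraction(0)], [Fraction(0), D[1][1]**3]]  # D^3 = diag(5^3, (-2)^3)
A3_diag = matmul(T, matmul(D3, Tinv))                        # T D^3 T^(-1)
```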
Diagonalization can also be used to solve simultaneous linear differential equations to determine trajectories. Suppose dX/dt = AX, where X = (x, y)ᵀ or (x, y, z)ᵀ, and A is a constant matrix such that T⁻¹AT = diag(α1, α2, . . . ). Define Y = T⁻¹X, so X = TY, and dX/dt = T dY/dt since T is constant. The equation then becomes dY/dt = T⁻¹ATY = diag(α1, α2, . . . )Y, which can easily be solved, since the variables in Y are now separate. Similar methods can be used to find the paths of particles in time-dependent fields, where dX/dt = AX + B(t), or to solve the D-operator equation P(D)y = f(t): if we define X = (y, y′, y′′, . . . , y⁽ⁿ⁻¹⁾)ᵀ, then we obtain an equation of the form dX/dt = AX + B(t).
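The decoupling idea can be illustrated in code. In this hedged Python sketch the matrix and the constants a, b are my own choices: A has eigenpairs (3, (1, 2)) and (−1, (1, −2)), so the combination of separate exponential solutions should satisfy dX/dt = AX, which is checked by a central finite difference:

```python
# Sketch: X(t) = a*e^{3t}*(1,2) + b*e^{-t}*(1,-2) solves dX/dt = AX for the
# matrix A below (eigenvalues 3 and -1); a and b are arbitrary constants.
import math

A = [[1, 1], [4, 1]]
a, b = 0.5, -1.25   # arbitrary constants (my own choice)

def X(t):
    e1, e2 = math.exp(3*t), math.exp(-t)
    return (a*e1 + b*e2, 2*a*e1 - 2*b*e2)

t, h = 0.7, 1e-6
x, y = X(t)
# central-difference approximation to dX/dt at time t
dx = (X(t+h)[0] - X(t-h)[0]) / (2*h)
dy = (X(t+h)[1] - X(t-h)[1]) / (2*h)
# the field AX at the same point
fx, fy = A[0][0]*x + A[0][1]*y, A[1][0]*x + A[1][1]*y
```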
Tutorial questions — Diagonalization
16. In Question 13(a) to (d), let T be the matrix whose columns are the given eigenvectors.
Find T −1 , and verify by matrix multiplication that in each case T −1 AT is a diagonal
matrix with the eigenvalues on the diagonal.
Find all eigenvalues and corresponding eigenvectors of the matrix in Question 13(e).
Why does this show that the matrix cannot be diagonalized?
17. Use your working from Question 15(a) to (c) to verify that AT = T D, where A is the
given matrix, T is a matrix of independent eigenvectors, and D is the diagonal matrix
with the corresponding eigenvalues on the diagonal.
18. Diagonalize the matrix A = (1/5)(1 4; 3 2) (hint: first take the fraction inside the matrix), and hence find lim_{n→∞} Aⁿ.
19. (a) Show by matrix multiplication that (I + A + A² + · · · + Aⁿ)(I − A) = I − Aⁿ⁺¹ for any square matrix A. Deduce that if Aⁿ → O as n → ∞, then I + A + A² + · · · + Aⁿ → (I − A)⁻¹. (Compare the geometric series formula.)
(b) In a recycle chemical reactor involving k reagents, the quantities of reagents present
can be considered to form a vector in Rk . After n recycles the vector is denoted Xn ,
and it can be shown that Xn = X0 + AXn−1 , where X0 is the input vector and A is a
fixed matrix representing the reaction and the removal of finished product.
(i) Show that Xn = (I + A + A2 + · · · + An )X0 . (Hint: use induction.)
(ii) By letting n → ∞ in (i), and using part (a), show that if all the eigenvalues of A
have modulus less than 1, then Xn approaches a steady state vector X∞ , which is given
by the formula X∞ = (I − A)−1 X0 .
20. The voltages v0, v1, . . . at the nodes in a ladder network of resistors satisfy the difference equation vn+1 − (5/2)vn + vn−1 = 0. Find a constant matrix A such that
(vn+1; vn) = A(vn; vn−1).
By diagonalizing A, find an expression for vn in terms of v0 and v1 . Assuming v0 = 10
and v11 = 0, find the last non-zero voltage v10 .
21. In a gas absorption column, the concentration of solute leaving the n-th plate satisfies
the equation yn+1 − (Q + 1)yn + Qyn−1 = 0. Use diagonalization to find an expression
for yn in terms of y0 (the input concentration) and y1 .
22. Use diagonalization to find the trajectories in the vector fields below. (Hint: take transposes and re-write them in the form Ẋ = AX.)
(a) (x + y, 4x + y)
(b) (−x − y + z, x − 2z, y + 2z).
(Hint for (b): use Question 15(b) and leave your answer in complex form.)
23. Use diagonalization to find the paths of particles in the time-dependent velocity fields:
(a) (x + y − e^{t}, 4x + y + e^{t})
(b) (4z + e^{−2t}, x + 4z + e^{−2t}, y − z + e^{−2t}).
(Hint for (b): use Question 15(a).)
The characteristic polynomial
The coefficients in the characteristic polynomial of a matrix can be expressed as sums of
subdeterminants of the matrix. In particular, the trace of a square matrix A, denoted
tr(A), is the sum of the diagonal entries, and is one of the coefficients in cA(λ).
Theorem. For an n × n matrix A = (aij ):
(i) the constant term in cA(λ) is det(A),
(ii) the coefficient of λⁿ in cA(λ) is (−1)ⁿ,
(iii) the coefficient of λⁿ⁻¹ in cA(λ) is (−1)ⁿ⁻¹ tr(A).
(Note that for n > 2 these formulae do not give every coefficient.)
Proof. (i) The constant term in cA (λ) is cA (0), which is det(A − 0In ), i.e., det(A), as
stated.
(ii) and (iii). These results are easily verified for n = 1 and n = 2. To prove them by
induction, suppose they are true for n = k. Let n = k + 1, and expand det(A − λIk+1 ) by
its first row; we obtain
cA (λ) = det(A − λIk+1 ) = (a11 − λ) det(B − λIk ) + other terms,
where B is the matrix obtained by deleting the first row and column of A. The other
terms in the expansion of the determinant involve only lower powers than λk , since at
least two appearances of λ are deleted in each of them. Next, det(B − λIk) = cB(λ), and by the inductive hypothesis cB(λ) = (−1)ᵏλᵏ + (−1)ᵏ⁻¹λᵏ⁻¹ tr(B) + lower powers. Thus, representing lower powers by dots, we have
cA(λ) = (a11 − λ)[(−1)ᵏλᵏ + (−1)ᵏ⁻¹λᵏ⁻¹ tr(B) + · · ·]
= (−1)ᵏ⁺¹λᵏ⁺¹ + (−1)ᵏ[a11 + tr(B)]λᵏ + · · ·
= (−1)ᵏ⁺¹λᵏ⁺¹ + (−1)ᵏ tr(A)λᵏ + · · · , as required.
Corollary. The sum of the eigenvalues of a square matrix is equal to the trace, and the
product of the eigenvalues is equal to the determinant.
Proof. With the notation above, suppose the eigenvalues of A are α1 , . . . , αn . Then by the
Factor Theorem and the theorem above
cA (λ) = (α1 − λ) . . . (αn − λ),
since the polynomials on each side have the same zeros and the same coefficient of λⁿ. The result now follows by equating constant terms and coefficients of λⁿ⁻¹.
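A short Python check of the corollary, using the matrix of Question 15(a) together with its eigenvalues 2, −2, −1 as given in the answers: the eigenvalue sum should equal the trace and the eigenvalue product should equal the determinant.

```python
# Sketch: sum of eigenvalues = trace, product of eigenvalues = determinant,
# for the matrix of Question 15(a) with eigenvalues 2, -2, -1.
A = [[0, 0, 4],
     [1, 0, 4],
     [0, 1, -1]]
eigenvalues = [2, -2, -1]

trace = A[0][0] + A[1][1] + A[2][2]
# determinant by cofactor expansion along the first row
det = (A[0][0]*(A[1][1]*A[2][2] - A[1][2]*A[2][1])
     - A[0][1]*(A[1][0]*A[2][2] - A[1][2]*A[2][0])
     + A[0][2]*(A[1][0]*A[2][1] - A[1][1]*A[2][0]))

s = sum(eigenvalues)
p = eigenvalues[0]*eigenvalues[1]*eigenvalues[2]
```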
For the final result, we need to consider matrix polynomials p(A) of the form p(A) = pk Aᵏ + · · · + p1 A + p0 In, where the coefficients are constants and again we have to insert
the identity matrix after p0 for the expression to be meaningful. It is easy to show that
p(A)q(A) = q(A)p(A), even though the factors in general matrix products cannot be
interchanged. This means that the algebra of matrix polynomials (like that of D-operators)
corresponds to that of ordinary polynomials.
Cayley-Hamilton theorem. A square matrix A satisfies its own characteristic equation,
i.e., cA (A) = O, where O denotes the zero matrix.
Proof. Firstly,
cA(λ)In = det(A − λIn)In = (A − λIn) adj(A − λIn),   (1)
since P adj(P) = det(P)In for any n × n matrix P. Next, if we write cA(λ) = Σ_{j=0}^{n} cj λʲ, then
cA(A) − cA(λ)In = Σ_{j=0}^{n} cj(Aʲ − λʲIn) = (A − λIn)Q(λ), say,   (2)
since each term Aʲ − λʲIn has a factor A − λIn. (The term for j = 0 vanishes.) If we now
add (1) and (2), then we obtain
cA (A) = (A − λIn ){adj(A − λIn ) + Q(λ)} = (A − λIn )R(λ), say,
which is impossible unless cA (A) = O, since the left hand side does not involve λ, whereas
the right hand side is of degree at least one in λ. (This can be verified as follows: if
R(λ) ≠ O, then write R(λ) = Rd λᵈ + · · · + R0, where Rd ≠ O, then multiply out and equate coefficients of λᵈ⁺¹.)
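The theorem is easy to test numerically for 2 × 2 matrices, where cA(λ) = λ² − tr(A)λ + det(A); this is exactly the content of Question 25 below. A Python sketch with arbitrary entries of my own:

```python
# Sketch: verify the Cayley-Hamilton theorem for a 2x2 matrix, i.e. that
# A^2 - tr(A)*A + det(A)*I is the zero matrix.
a, b, c, d = 3, -1, 7, 2          # arbitrary entries of A = (a b; c d)
A = [[a, b], [c, d]]
tr = a + d
det = a*d - b*c
A2 = [[a*a + b*c, a*b + b*d],     # A^2 written out entry by entry
      [c*a + d*c, c*b + d*d]]
cA_of_A = [[A2[i][j] - tr*A[i][j] + (det if i == j else 0) for j in range(2)]
           for i in range(2)]
```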
The Cayley-Hamilton theorem can be used in applications in which diagonalization
is impossible because of insufficient independent eigenvectors. For example, if Ẋ = AX,
where A is a constant matrix, then we can differentiate to get Ẍ = AẊ = A²X. Repeating this process, we get dⁿX/dtⁿ = AⁿX for all n, and by taking linear combinations of these we obtain P(d/dt)X = P(A)X for any polynomial P(λ). In particular, if we take P(λ) = cA(λ), the characteristic polynomial, then P(A) = cA(A) = O by the Cayley-Hamilton theorem. Therefore cA(d/dt)X = OX = 0. This vector D-operator equation can be solved as in Calculus Chapter 1, but with arbitrary constant vectors, not numbers. The arbitrary constants can be found by equating corresponding vector coefficients in Ẋ and AX.
Tutorial questions — The characteristic polynomial
24. Verify with the matrices in Question 15 above that the coefficient of λ² in the characteristic polynomial is equal to the trace of the matrix and to the sum of the eigenvalues.
Similarly, verify that the constant term in the characteristic polynomial is equal to the
determinant of the matrix and to the product of the eigenvalues.
It can be shown that (for 3 × 3 matrices only) the coefficient of −λ in the characteristic
polynomial is the sum of the cofactors of the entries on the diagonal. Verify this fact
with the matrices in Question 15.
25. Find the characteristic polynomial cA(λ) of a general 2 × 2 matrix A = (a b; c d), and hence verify that cA(λ) = λ² − λ tr A + det A. Then show that cA(A) is the zero matrix,
i.e., verify the Cayley-Hamilton theorem for 2 × 2 matrices.
26. A common howler is to give a false proof of the Cayley-Hamilton theorem by saying
cA (A) = det(A − AIn ) = det(O) = 0. Why is this wrong?
27. Verify the Cayley-Hamilton theorem for the matrix in Question 15(a), using the characteristic polynomial in factorized form.
28. (a) If T⁻¹AT = D and p(A) = pk Aᵏ + · · · + p0 In is a general matrix polynomial, show that T⁻¹p(A)T = p(D).
Also show that if D = diag(α1, . . . , αn), then p(D) = diag(p(α1), . . . , p(αn)).
(b) Deduce that if A is diagonalizable, then cA(A) = T diag(cA(α1), . . . , cA(αn)) T⁻¹, where α1, . . . , αn are the eigenvalues of A. Now show that the right hand side is the
zero matrix. (You have just found another proof of the Cayley-Hamilton theorem, but
this proof is valid only for diagonalizable matrices.)
29. Use the Cayley-Hamilton theorem to find parametric equations for the streamlines in
the velocity fields below. (Hint: take transposes and solve Ẋ = vᵀ = AX.)
(a) v = (y, z, −x−3y−3z)
(b) v = (x−3y+2z, 2x−5y+3z, 3x−7y+4z).
Verify in each case that the matrix cannot be diagonalized.
30. Solve the matrix differential equations in Calculus Chapter 1 Question 19 by using diagonalization or the Cayley-Hamilton theorem.
31. Two square matrices A and B are said to be similar if there is a non-singular matrix T
such that T −1 AT = B. Thus to say a matrix can be diagonalized is the same as saying
that it is similar to a diagonal matrix.
(a) Verify that the matrices in Question 15(a) to (d) have the same eigenvalues as the
diagonal matrices to which they are similar.
(b) Prove that similar matrices have the same characteristic polynomial (and therefore
the same eigenvalues). (Hint: if B = T −1 AT , show that B − λIn = T −1 (A − λIn )T and
then find cB (λ). Remember that the determinant of a product is equal to the product
of the determinants.)
Answers
2. Only (a) is a linear space.
3. (a) Z = Z1 = (1 − 4i; 1 + 2i) and Z0 = 0,
(b) Z1 = (1 + i; 0) and Z0 = z2 (1 − i; 1),
(c) Z1 = (1 + i; 0; 0) and Z0 = z2 (−i; 1; 0) + z3 (−1 + i; 0; 1),
(d) Z1 = (1 − i; 1) and Z0 = z2 (−2 − i; 1).
4. Check your own answers!
5. (a) No basis elements, dimension zero.
(b) Basis (1 − i; 1), dimension one.
(c) Basis (−i; 1; 0) and (−1 + i; 0; 1), dimension two.
(d) Basis (−2 − i; 1), dimension one.
6. (a) Basis e^{t} and e^{3t}, dimension two,
(b) Basis e^{(−1+i)t} and e^{(−1−i)t}, dimension two,
(c) Basis 1, t, t², dimension three,
(d) Basis 1, t, . . . , t^{n−1}, dimension n,
(e) Basis e^{(±1±i)t}, dimension four.
7. (a) Vectors non-collinear ⇒ area ≠ 0 ⇒ det ≠ 0 ⇒ matrix invertible.
8. Only (a), because the solution is unique (the null space contains only the zero vector).
9. Independent: (a), (b), (d).
10. (Other answers are possible.) (a) W = −2e^{4t}, (b) W = e^{−2t}, (c) W = 2.
11. Not independent means y2 = c1 y1 for some constant c1 .
12. (a) Rank two, one, one, one.
(b) If m = n, then r = m (existence) coincides with r = n (uniqueness).
13. Eigenvalues: (a) 1 and −1, (b) 4 and −1, (c) 0 and 3, (d) 2 − i and 2 + i, (e) 1.
14. (a) λ² − 1, (b) λ² − 3λ − 4, (c) λ² − 3λ, (d) λ² − 4λ + 5, (e) (λ − 1)².
  
 

15. (a) Eigenvalues 2, −2, −1: eigenvectors a(2; 3; 1), b(−2; −1; 1), c(−4; 0; 1).
(b) Eigenvalues 1, i, −i: eigenvectors a(1; −1; 1), b(1 − 2i; −2 + i; 1), c(1 + 2i; −2 − i; 1).
(c) Eigenvalues 1, 1, −2: eigenvectors a(−1; 0; 1) + b(−1; 1; 0), c(1; −1; 1).
(d) Eigenvalues 1, 1, −2: eigenvectors a(−3; 0; 1), b(3; −3; 1).
16. Eigenvalue 1: eigenvector a(1; −1). T is not invertible.
18. Eigenvalues 1, −2/5. Aⁿ → (1/7)(3 4; 3 4).
19. (a) (I + A + · · · + Aⁿ)(I − A) → I. (b)(i) X_{k+1} = X0 + AX_k = X0 + A(I + A + · · · + Aᵏ)X0. (b)(ii) Aⁿ → O as n → ∞.
20. A = (5/2 −1; 1 0); T = (2 1; 1 2), say (not unique).
vn = (1/3)[(2^{1+n} − 2^{1−n})v1 − (2^{n} − 2^{2−n})v0].
Put v0 = 10 and v11 = 0 and solve for v1. v10 = 15/(2^{11} − 2^{−11}).
21. A = (Q + 1 −Q; 1 0); T = (Q 1; 1 1), say (not unique).
yn = [(Qⁿ − 1)y1 − (Qⁿ − Q)y0]/(Q − 1).

  

22. (a) (x; y) = (1 −1; 2 2)(ae^{3t}; be^{−t}),
(b) (x; y; z) = (1 1−2i 1+2i; −1 −2+i −2−i; 1 1 1)(ae^{t}; be^{it}; ce^{−it}).
23. (a) (x; y) = (1 1; 2 −2)(ae^{3t} + e^{t}/8; be^{−t} − 3e^{t}/8),
(b) T⁻¹B = (7/12; 3/4; −1/3). X = (x; y; z) = (2 −2 −4; 3 −1 0; 1 1 1)(ae^{2t} − 7e^{−2t}/48; be^{−2t} + 3te^{−2t}/4; ce^{−t} + e^{−2t}/3).
25. cA(A) = A² − (a + d)A + (ad − bc)I2, which simplifies to the zero matrix.
26. The definition cA (λ) = det(A − λIn ) is valid only if λ is a scalar, so you cannot replace
λ by the matrix A.
27. (A − 2I3 )(A + 2I3 )(A + I3 ) = O.
28. (a) T⁻¹p(A)T = T⁻¹(Σ pj Aʲ)T = Σ pj (T⁻¹AʲT) = Σ pj Dʲ = p(D).
(b) T⁻¹cA(A)T = cA(diag(α1, . . . , αn)) = diag(cA(α1), . . . , cA(αn)). Finally, cA(α1) = · · · = cA(αn) = 0 since the eigenvalues are the roots of the characteristic equation.


29. (a) A = (0 1 0; 0 0 1; −1 −3 −3), cA(λ) = −(λ + 1)³. (d/dt + 1)³X = O, so X = Pe^{−t} + Qte^{−t} + Rt²e^{−t}. Ẋ = (−P + Q)e^{−t} + (−Q + 2R)te^{−t} + (−R)t²e^{−t}. AX = APe^{−t} + AQte^{−t} + ARt²e^{−t}. By equating coefficients on the right hand sides it follows that (A + I)R = O, (A + I)Q = 2R, (A + I)P = Q. Solve first for R, then for Q, then for P, to get R = (r; −r; r) (eigenvector), Q = (4r + q; −2r − q; q), P = (6r + 2q + p; −2r − q − p; p).
(b) A = (1 −3 2; 2 −5 3; 3 −7 4), cA(λ) = −λ³, X = (a b c; a 2a+b −4a+b+c; a 4a+b −6a+2b+c)(t²; t; 1).
Algebra Chapter 4
Orthonormality
Dot products and orthonormal bases
We have discussed bases and dimension in general linear spaces of vectors or functions, but
we have not extended the geometrical ideas of lengths and angles, which in two or three
real dimensions are obtained from the dot product. We now generalize the dot product
to n real or complex dimensions, and we define
a · b = (a1, a2, . . . , an) · (b1, b2, . . . , bn) = a1b̄1 + a2b̄2 + · · · + anb̄n = Σ_{k=1}^{n} ak b̄k.
The conjugates in the second factors can be ignored if the vectors have real entries, but are essential for complex vectors because the dot product of a vector with itself must still be the square of its magnitude, as with real vectors. The definition gives
a · a = Σ_{k=1}^{n} ak āk = Σ_{k=1}^{n} |ak|²,
which is real and non-negative, so we may take square roots and define
|a| = √(a · a).
Note that |a| = 0 only if a = 0, i.e., the only vector with zero magnitude is the zero vector
itself. Although it is not possible to visualize complex vectors, we still say that vectors a
and b are orthogonal (or normal or perpendicular) if a · b = 0.
If a and b are row vectors (i.e., 1 × n matrices), then the dot product is equal to a
matrix product:
a · b = a b̄ᵀ,
where the conjugate in the second factor must not be forgotten. It follows from matrix
algebra that
(i) (a + d) · b = a · b + d · b for any vectors a, b, and d.
(ii) (ca) · b = c(a · b) for any vectors a and b, and any constant c.
(iii) b · a = (a · b)‾, the complex conjugate of a · b, for any vectors a and b.
Properties (i) and (ii) show that dotting with a fixed vector b is a linear operator. For
column vectors A and B the dot product is given by a slightly different matrix product,
A · B = Aᵀ B̄,
but properties (i)–(iii) remain true.
An orthonormal basis of a linear space is a basis consisting of mutually perpendicular
unit vectors. For example, in R3 the standard basis vectors i, j, and k form an orthonormal
basis, as do the unit tangent u, the unit normal n, and the binormal b at any point on
a smooth curve. This means that orthonormal basis vectors have dot product equal to 0
with one another (because they are mutually perpendicular) and equal to 1 with themselves
(because they are unit vectors). Thus we have that e1 , e2 , . . . , en form an orthonormal basis
if and only if
ek · em = 0 if k ≠ m, and ek · em = 1 if k = m.
The most important thing about any orthonormal basis is that any vector can easily
be expressed in terms of its components in the directions of the basis vectors, where
components are obtained via dot products.
Theorem. If e1 , e2 , . . . , en form an orthonormal basis of Cn , then a general vector a can
be uniquely expressed in the form
a = Σ_{k=1}^{n} λk ek, where λk = a · ek.
Proof. Since e1 , e2 , . . . , en form a basis, it follows that there is a unique expression for a,
say a = λ1 e1 + λ2 e2 + · · · + λn en . Therefore, for k = 1, . . . , n we have
a · ek = (λ1 e1 + λ2 e2 + · · · + λn en ) · ek = λ1 (e1 · ek ) + λ2 (e2 · ek ) + · · · + λn (en · ek ),
which equals λk , since all dot products are zero except for ek · ek , which is equal to 1.
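The component formula λk = a · ek can be illustrated with a small Python sketch; the orthonormal basis of C² and the vector a below are my own example, not from the text:

```python
# Sketch: components with respect to an orthonormal basis of C^2 are the dot
# products a . e_k (with conjugation in the second factor), and summing
# lambda_k * e_k reconstructs a.
import math

def dot(u, v):
    """Complex dot product with conjugates on the second factor."""
    return sum(x * y.conjugate() for x, y in zip(u, v))

s = 1 / math.sqrt(2)
e1 = (s*1, s*1j)      # (1, i)/sqrt(2)
e2 = (s*1, -s*1j)     # (1, -i)/sqrt(2)

a = (2, 3j)
lam1, lam2 = dot(a, e1), dot(a, e2)
recon = tuple(lam1*x + lam2*y for x, y in zip(e1, e2))
```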
Tutorial questions — Dot products and orthonormal bases
1. Use dot products to find the lengths of the vectors below, and show that each pair of
vectors is orthogonal.
(a) (i, 1 + 2i) and (−1 − 3i, 1 + i) in C2.
(b) (−i, 2, 1 + i) and (1 + i, i, 2 − i) in C3 .
(c) (1 + i, 1 − i, i, 4 + 2i) and (1 + 2i, 1 − i, −3 − 3i, −i) in C4 .
2. Use the properties of dot products to show that a · (cb) = c(a · b).
3. If (1, α, β) is orthogonal to (α, β, 1), show that ᾱ + αβ̄ + β = 0.
Deduce that β = (α² − ᾱ)/(1 − |α|²) if |α| ≠ 1. (Hint: take conjugates of the equation above, and then eliminate β̄.)
If α = 1 + i, find β, verify that the resulting vectors are orthogonal, and find their length.
4. Show that the vectors u = (1/5)(2, 2, −1 + 4i), v = (1/5)(2, −1 + 4i, 2), and w = (1/5)(−1 + 4i, 2, 2) form an orthonormal basis of C3.
5. If u, v, and w, are as in Question 4, and if a = (1, i, −1), find the components a · u,
a · v, and a · w. Verify that a = (a · u)u + (a · v)v + (a · w)w.
6. (i) If ε = e^{iπ/N}, show that ε̄ = ε⁻¹ and that ε^{2N} = 1. Deduce that Σ_{j=1}^{2N} ε^{(n−m)j} = 0 for any integers n and m for which ε^{n−m} ≠ 1. (Hint: the sum is a geometric series with common ratio ε^{n−m}.) Evaluate the sum if ε^{n−m} = 1 (the geometric series formula is not valid).
(ii) If we define en := (1/√(2N)) (εⁿ, ε²ⁿ, . . . , ε^{2Nn}), show that en · em = (1/(2N)) Σ_{j=1}^{2N} ε^{(n−m)j}. Deduce from part (i) that the vectors en, for any 2N successive values of n, form an orthonormal basis of C^{2N}.
7. If e1 , e2 , e3 , . . . form an orthonormal basis, prove that for any vector a
|a|2 = |a · e1 |2 + |a · e2 |2 + |a · e3 |2 + · · · .
(Hint: write a = λ1e1 + λ2e2 + · · ·, then substitute for the first a in a · a, and simplify, using properties of dot products. Finally, remember that λ1 = a · e1, etc.)
Unitary and hermitian matrices
We say that an n × n matrix U is unitary if its rows or columns form an orthonormal
basis of Cn . Real unitary matrices are also called orthogonal. Since we proved in Algebra
Chapter 3 that an n × n matrix is invertible if and only if its columns (or rows) form a
basis of Cn , it follows that a unitary matrix always has an inverse. In fact, the inverse of
a unitary matrix is very easy to find.
Theorem. An n × n matrix U is unitary if and only if U⁻¹ = Ūᵀ.
Proof. Suppose r1, r2, . . . , rn are the rows of U. Then r̄1ᵀ, r̄2ᵀ, . . . , r̄nᵀ are the columns of Ūᵀ, and by matrix multiplication we have
U Ūᵀ = (r1; r2; . . . ; rn)(r̄1ᵀ r̄2ᵀ · · · r̄nᵀ) = (r1 · r1, r1 · r2, . . . , r1 · rn; r2 · r1, r2 · r2, . . . , r2 · rn; . . . ; rn · r1, rn · r2, . . . , rn · rn),
since rk r̄mᵀ = rk · rm for all k and m. If r1, r2, . . . , rn form an orthonormal basis of Cn, then the matrix of dot products has 1s on the diagonal and 0s elsewhere, so it is the identity matrix. Thus we have U Ūᵀ = In, i.e., Ūᵀ = U⁻¹. Conversely, if Ūᵀ = U⁻¹, then the matrix of dot products is the identity matrix, and it follows immediately that r1, r2, . . . , rn form an orthonormal basis of Cn. The proof for column vectors is similar, and appears as a tutorial question.
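A hedged Python sketch of this unitarity criterion, applied to the matrix whose rows are the vectors e_n of Question 6 with N = 2 (so ε = i and the rows are e_{−1}, e_0, e_1, e_2, as in Question 9); the check confirms that U Ūᵀ is the identity:

```python
# Sketch: rows e_n = (1/sqrt(2N)) * (eps^n, eps^{2n}, ..., eps^{2Nn}) with
# eps = e^{i*pi/N}; the matrix of 2N successive rows should be unitary.
import cmath, math

N = 2
eps = cmath.exp(1j * cmath.pi / N)   # eps = i when N = 2
rows = [[(eps**(n*j)) / math.sqrt(2*N) for j in range(1, 2*N + 1)]
        for n in range(-N + 1, N + 1)]

def entry(k, m):
    # entry (k, m) of U * conj(U)^T is the dot product of rows k and m
    return sum(x * y.conjugate() for x, y in zip(rows[k], rows[m]))

ok = all(abs(entry(k, m) - (1 if k == m else 0)) < 1e-12
         for k in range(2*N) for m in range(2*N))
```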
The property that U⁻¹ = Ūᵀ, which characterizes unitary matrices, was shown in first
year Algebra Chapter 4 to hold for rotation matrices. In fact, since rotations about the
origin do not change lengths of vectors or the angles between them, it follows that rotations
take orthonormal bases to orthonormal bases, and hence that any rotation of axes in R2 or
R3 is defined by a (real) unitary matrix. Reflections of the axes in a line or plane through
the origin also leave lengths and angles unchanged (in magnitude) and also correspond to
unitary matrices.
A square matrix A is said to be hermitian (after the French mathematician Hermite) if Āᵀ = A. This means that akj = ājk for all j and k. A real hermitian matrix is
called symmetric, since it is equal to its transpose. Hermitian matrices occur in various
applications, as we shall show later; the important thing about them is that they can
always be diagonalized by a unitary matrix. We start with a preliminary result.
Lemma. If A is an n × n hermitian matrix, then (uA) · v = u · (vA) for any 1 × n row
vectors u and v. Similarly (AX) · Y = X · (AY ) for any n × 1 column vectors X and Y .
Proof. By the expression for the dot product as a matrix product, and since the transpose of a matrix product is the product of the transposes in the reverse order, we have
u · (vA) = u (v̄Ā)ᵀ = u Āᵀ v̄ᵀ = u A v̄ᵀ = (uA) · v.
The proof for column vectors appears as a tutorial question.
Theorem. If A is an n × n hermitian matrix, then
(i) the eigenvalues of A are all real,
(ii) there is an orthonormal basis of Cn consisting of eigenvectors of A,
(iii) there is a unitary matrix U such that ŪᵀAU is diagonal.
Proof. (i) Suppose X is a non-zero column eigenvector of A, with corresponding eigenvalue α. Then AX = αX, so
(AX) · X = (αX) · X = α(X · X) = α|X|²,
using properties of dot products. But by the lemma we have
(AX) · X = X · (AX) = X · (αX) = ᾱ(X · X) = ᾱ|X|².
(By Question 2 the constant α must be conjugated when it is taken out of the second factor.) Since |X|² ≠ 0, we have α = ᾱ, i.e., α is real.
(ii) Suppose X and Y are column eigenvectors corresponding to distinct eigenvalues α
and β. Then (AX) · Y = (αX) · Y = α(X · Y ) and also
(AX) · Y = X · (AY) = X · (βY) = β̄(X · Y) = β(X · Y),
since A is hermitian and β is real by (i). By equating the right hand sides we see that
X · Y = 0, since α ≠ β. This shows that eigenvectors for distinct eigenvalues are orthogonal,
and by dividing by their magnitudes we can make them unit vectors. The details for
repeated eigenvalues are beyond us at this stage.
(iii) Let U be a matrix whose columns form an orthonormal basis of eigenvectors of A, as obtained in (ii). Then U is unitary (because its columns are mutually perpendicular unit vectors), so U⁻¹ = Ūᵀ. Also U⁻¹AU is diagonal (because the columns of U are eigenvectors). By combining these results we see that ŪᵀAU is diagonal, as required.
The following table summarizes diagonalizability properties of square matrices:
• Any hermitian matrix is diagonalizable by a unitary matrix of eigenvectors.
• A non-hermitian matrix with distinct eigenvalues is diagonalizable, but the invertible
matrix of eigenvectors may not be unitary.
• A non-hermitian matrix with one or more repeated eigenvalues may or may not be
diagonalizable, depending on the total number of independent eigenvectors.
Tutorial questions — Unitary and hermitian matrices
8. Show that the matrix U below is unitary, by calculating U Ūᵀ.

U = (1/√5)(1+i, 0, −1+i, 1; 1, 1+i, 0, −1+i; 0, −1+i, 1, 1+i; −1+i, 1, 1+i, 0).
9. For N = 1 and N = 2 write down the matrix whose rows are the vectors e−N+1 , . . . , eN
given in Question 6. Verify that each matrix is unitary.
10. If C1, C2, . . . , Cn are the column vectors of an n × n matrix U, show that Uᵀ Ū is the matrix of dot products of C1, C2, . . . , Cn. (Hint: remember that Ck · Cm = (Ck)ᵀ C̄m because they are column vectors.) Deduce that U⁻¹ = Ūᵀ if and only if C1, C2, . . . , Cn form an orthonormal basis.
11. (i) If U is a unitary matrix, show that |det U| = 1. (Hint: remember U Ūᵀ = In, and take determinants of both sides, noting that det(Ū) = (det U)‾. Why?) It follows that a real unitary matrix has determinant equal to ±1. The sign determines whether it is a
rotation or a reflection matrix.
(ii) Show that the 2 × 2 rotation matrix (cos α, sin α; −sin α, cos α) is unitary and has determinant +1.
(iii) Show that the 2 × 2 matrix (cos α, sin α; sin α, −cos α) is unitary and has determinant −1. (It corresponds to a reflection of axes in the line with polar angle (1/2)α.)
12. If A is an n × n hermitian matrix and X and Y are n × 1 column vectors, prove that
(AX) · Y = X · (AY ). Also prove that the diagonal entries of A are real.
13. Find unitary matrices that will diagonalize the following hermitian matrices, i.e., find
orthonormal bases of eigenvectors. Make sure that they are all unit vectors, and that the
eigenvectors for a repeated eigenvalue are orthogonal. Distinct eigenvalues are given.
(Save (e) and (f) for revision.)
(a) (9 −2; −2 6)   (b) (7 −9; −9 −17)   (c) (4, −1+i; −1−i, 5)
(d) (−1 4 −8; 4 −7 −4; −8 −4 −1)  (±9)
(e) (185 48 −12; 48 313 −36; −12 −36 178)  (169, 338)
(f) (1, −2i, 0, 2; 2i, −2, −3, i; 0, −3, 0, 3i; 2, −i, −3i, −2)  (3, −3, −6).
14. (i) If A = [ a  b ]
               [ b  d ] ,
a general 2 × 2 real symmetric matrix, solve the characteristic equation c_A(x) = 0, and show that the eigenvalues of A are ½(a+d) + √(¼(a−d)² + b²) = λ₁, say, and ½(a+d) − √(¼(a−d)² + b²) = λ₂, say.
(ii) Show that the circle with centre at the point P(½(a+d), 0) and passing through the point Q(a, b) has equation c_A(x) + y² = 0. (Hint: let R(x, y) be on the circle; then |RP|² = |PQ|² = |PT|² + |TQ|², where T is the point (a, 0).) Let L (on the left) and R (on the right) be the points where the circle cuts the x axis. Show that R is (λ₁, 0) and L is (λ₂, 0). (Hint: put y = 0 in the equation of the circle.) The circle is called Mohr's circle; it provides a graphical method for finding eigenvalues of 2 × 2 real symmetric matrices.
(iii) Draw (accurately) Mohr's circles for the matrices
   [ 9  −2 ]        [ 7   −9  ]
   [ −2  6 ]   and  [ −9  −17 ] .
Hence find their eigenvalues and check your answers by solving the characteristic equations. Then find the eigenvectors and verify that they are parallel to LQ and RQ.
(iv)* Show that the eigenvectors of the general symmetric matrix A are parallel to LQ and RQ. (Hint: show that vector LQ = (a − λ₂, b) and that (A − λ₁I)(LQ)^T = 0.)
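The construction in Question 14 is easy to try numerically. The sketch below (my own code; the function name is not from the text) computes centre ± radius of Mohr's circle and checks the results against the eigenvalues of the two matrices in part (iii).

```python
# Mohr's circle for a 2x2 real symmetric matrix [[a, b], [b, d]]:
# the eigenvalues are centre +/- radius, where the centre is ((a+d)/2, 0)
# and the radius is the distance |PQ| from the centre to Q(a, b).
from math import hypot

def mohr_eigenvalues(a, b, d):
    centre = (a + d) / 2
    radius = hypot((a - d) / 2, b)       # |PQ|
    return centre + radius, centre - radius   # lambda_1 >= lambda_2

# [[9, -2], [-2, 6]]: eigenvalues should be 10 and 5.
l1, l2 = mohr_eigenvalues(9, -2, 6)
assert abs(l1 - 10) < 1e-9 and abs(l2 - 5) < 1e-9

# [[7, -9], [-9, -17]]: eigenvalues should be 10 and -20.
l1, l2 = mohr_eigenvalues(7, -9, -17)
assert abs(l1 - 10) < 1e-9 and abs(l2 + 20) < 1e-9
```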
Applications
There are many applications of diagonalization of hermitian matrices by unitary matrices. First of all, vector differential equations can be solved, as before. In particular, if we have a real force field F = AX, where A is a constant matrix and X = (x, y, z)^T, then, since force fields are conservative (unless the total energy in the system changes), it follows that A is symmetric. Hence there exists a real unitary matrix U such that U^T AU is diagonal. The path of a particle of mass m in such a field satisfies the differential equation F = mẌ, i.e., Ẍ = (1/m)AX. If we define Y = U^T X (which corresponds to a rotation of axes to the directions given by the columns of U), then X = UY and we obtain Ÿ = (1/m)(U^T AU)Y, which is easily solved, since U^T AU is diagonal.
Secondly, real quadratic forms in two or more variables can be reduced to canonical form by this method. If Q = ax² + by² + cz² + 2dxy + 2exz + 2fyz, then it is easy to check that Q = X^T AX, where X = (x, y, z)^T as before, and

   A = [ a  d  e ]
       [ d  b  f ]
       [ e  f  c ] .

If we again define Y = U^T X, then we obtain Q = Y^T(U^T AU)Y, which is in canonical form, because U^T AU is diagonal. The new axes, in the directions of the columns of U, are called the principal axes of the quadratic form. The coefficients in the canonical form are the diagonal entries in U^T AU, i.e., the eigenvalues of A.
In particular, an explicit quadric surface, say

   z = ax² + 2bxy + cy² = X^T AX = αu² + βv²,   where X = (x, y)^T and A = [ a  b ]
                                                                          [ b  c ] ,

is a saddle if the eigenvalues α and β of A have opposite signs, a cup if they are both positive, and a cap if they are both negative. Since det A = αβ, the product of the eigenvalues, we have a saddle if det A < 0, and a cup or cap if det A > 0. In the latter case, the sign of the trace of A will distinguish cup from cap, since tr A = α + β, the sum of the eigenvalues.
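The det/trace test above can be stated in a few lines of code. This is an illustrative sketch (the names are mine, and the degenerate det A = 0 case is my own labelling, since the test says nothing there):

```python
def classify(a, b, c):
    """Classify z = a x^2 + 2 b x y + c y^2 via det A = ac - b^2 and tr A = a + c."""
    det, tr = a * c - b * b, a + c
    if det < 0:
        return "saddle"              # eigenvalues of opposite sign
    if det > 0:
        return "cup" if tr > 0 else "cap"
    return "degenerate"              # a zero eigenvalue; the test is inconclusive

assert classify(1, 0, -1) == "saddle"   # z = x^2 - y^2
assert classify(1, 0, 2) == "cup"       # z = x^2 + 2 y^2
assert classify(-1, 0, -2) == "cap"     # z = -x^2 - 2 y^2
```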
Near a stationary point on any surface, we showed in Calculus Chapter 3 how ∆z is approximated by a quadratic form in ∆x and ∆y, thus:

   ∆z ≈ z_xx ∆x² + 2z_xy ∆x∆y + z_yy ∆y²,

in which the coefficient matrix is

   [ z_xx  z_xy ]
   [ z_xy  z_yy ] ,

which has determinant z_xx z_yy − z_xy² and trace z_xx + z_yy. From the previous paragraph it follows that there is a saddle if z_xx z_yy − z_xy² < 0, and if z_xx z_yy − z_xy² > 0, then there is a cup or cap (proper minimum or maximum), which can be distinguished by inspecting the sign of z_xx + z_yy. This confirms the conclusions reached in Calculus Chapter 3.
*Thirdly, if a rigid body is under stress but in equilibrium, then the stresses at any point can be expressed in the form of a matrix (strictly speaking, a tensor)

   A = [ σ_x   τ_xy  τ_xz ]
       [ τ_yx  σ_y   τ_yz ]
       [ τ_zx  τ_zy  σ_z  ] ,

where the σs are tensile or compressive stresses parallel to the axes, and the τs are shear stresses. Since the body is in equilibrium, it follows that τ_xy = τ_yx, etc., so the stress tensor A is symmetric. If the axes are rotated to the directions of a real unitary matrix U, then it can be shown that the stress tensor becomes U^T AU. In particular, if we take U to be a matrix of eigenvectors, then U^T AU is diagonal, so the shear stresses are all zero, and the longitudinal stresses are the eigenvalues of A, which are called the principal stresses. The corresponding axes (in the directions of the eigenvectors) are called the principal axes of stress.
*For plane stresses, where the stress tensor A is 2 × 2, Mohr's circle (see Question 14 with a = σ_x, b = τ_xy, and d = σ_y) is very useful. We have shown that the points Q(σ_x, τ_xy) and S(σ_y, τ_xy) lie on the circle. If the axes are rotated, then the entries in the new stress tensor U^T AU correspond to other points on the same circle. This was shown in Question 14 for the principal stresses, which are the eigenvalues of A. (More precisely, if the u and v axes are obtained by rotating through an angle θ, then the point Q′(σ_u, τ_uv) on the circle satisfies ∠QLQ′ = −θ, i.e., ∠QPQ′ = −2θ.)
*Fourthly, if A is an m × n matrix (not necessarily square), it is easy to show that the matrices AĀ^T and Ā^T A are both hermitian. Thus there exist unitary matrices U and V, and real diagonal matrices D and E, such that Ū^T(AĀ^T)U = D and V̄^T(Ā^T A)V = E. The non-zero entries in D and E are the same, are all positive, and are equal in number to the rank of A. It can also be shown that (after rearrangement of columns, if necessary) Ū^T AV = S, say, an m × n diagonal matrix whose non-zero entries are the square roots of those in D and E. The diagonal entries in S are called the singular values of A, and the expression A = USV̄^T is called the singular value decomposition of A.
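As a hedged numerical sketch of the last statement (my own 2 × 2 real example, not from the text): the singular values of a real matrix A are the square roots of the eigenvalues of A^T A, found here with the same centre-and-radius formula as Mohr's circle.

```python
# Singular values of A = [[3, 0], [4, 5]] via the eigenvalues of A^T A.
from math import sqrt

A = [[3.0, 0.0], [4.0, 5.0]]

# Form B = A^T A, a symmetric matrix [[p, q], [q, r]].
p = A[0][0]**2 + A[1][0]**2
q = A[0][0]*A[0][1] + A[1][0]*A[1][1]
r = A[0][1]**2 + A[1][1]**2

# Eigenvalues of B, largest first (Mohr-circle formula again).
centre, radius = (p + r) / 2, sqrt(((p - r) / 2)**2 + q**2)
singular_values = [sqrt(centre + radius), sqrt(centre - radius)]

# For this A, A^T A = [[25, 20], [20, 25]] with eigenvalues 45 and 5.
assert abs(singular_values[0] - sqrt(45)) < 1e-9
assert abs(singular_values[1] - sqrt(5)) < 1e-9
```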
Tutorial questions — Applications
15. (i) Find curl(rA) if A is a constant 3 × 3 matrix and r = (x, y, z). Deduce that the
vector field rA is conservative if and only if A is symmetric.
(ii) Use the diagonalization obtained in Question 13(b) to find the general path of a
particle of unit mass in the force field F = (7x − 9y, −9x − 17y).
(iii) Repeat part (ii) with the force field F = (−x + 4y − 8z, 4x − 7y − 4z, −8x − 4y − z),
using the diagonalization obtained in Question 13(d).
16. Discuss how the signs of the eigenvalues of a 3 × 3 real symmetric matrix A determine the nature of the implicit quadric surface X^T AX = 1. (See Calculus Chapter 3.)
17. Use the eigenvalues from Question 13(d) to rewrite the left hand sides below in canonical form. Hence identify the implicit quadric surfaces defined by the equations.
(a) −x² − 7y² − z² + 8xy − 16xz − 8yz = 9
(b) −x² − 7y² − z² + 8xy − 16xz − 8yz = −9
(c) −x² − 7y² − z² + 8xy − 16xz − 8yz = 0.
* 18. Use Question 14(ii) and the fact that the matrices A and U −1 AU have the same characteristic polynomial (see Algebra Chapter 3 Question 31(b)) to show that rotation of
axes does not change the Mohr circle.
19. If A is an m × n matrix, show that Ā^T A is hermitian, and that its eigenvalues are non-negative. (Hint: let X be a non-zero column eigenvector corresponding to an eigenvalue σ. Show that |AX|² = σ|X|².)
Fourier series
It is known that a smooth function can be approximated near 0 by a polynomial of degree n;
then by letting n → ∞ we obtain the Maclaurin series, which gives an exact infinite series
expression for the function, at least near 0. We now show how orthonormal bases can
be used to find another series expression, called the Fourier series, for a function. The
difference is that:
• to have a Maclaurin series the function must be differentiable infinitely many times,
• to have a Fourier series the function must be periodic, but need not be even continuous.
Suppose therefore that the function f (t) is periodic of period 2l, i.e., there is a positive
number l such that f (t + 2l) = f (t) for all t. This simply means that the graph y =
f(t) repeats itself after any interval of length 2l. We allow f(t) to take complex values, but t must always be real. For example, the function e^{it} is periodic of period 2π, since e^{i(t+2π)} = e^{it}e^{2iπ} = e^{it}. The sound wave of a musical note is periodic, but that of a noise
is usually not periodic.
To obtain the Fourier series of a function of period 2l we use the method of interpolation
or sampling, as in digital recordings. We subdivide any interval of length 2l, say the interval
from t = 0 to t = 2l, into 2N equal subintervals of length ∆t, where ∆t = l/N. This gives 2N values f(k∆t) from k = 1 to k = 2N, which we divide by √(2N) and write as a vector

   f = (1/√(2N)) (f(∆t), f(2∆t), …, f(2N∆t)),

which we call the sampling vector of the function f(t).
Theorem. The sampling vectors of the functions e^{niπt/l}, for n = −N+1, …, N, form an orthonormal basis of C^{2N}.

Proof. Let e_n denote the sampling vector of e^{niπt/l}. Then, by definition of sampling vector,

   e_n = (1/√(2N)) (e^{niπ∆t/l}, e^{2niπ∆t/l}, …, e^{2Nniπ∆t/l}).

But ∆t/l = 1/N, so

   e_n = (1/√(2N)) (e^{niπ/N}, e^{2niπ/N}, …, e^{2Nniπ/N}) = (1/√(2N)) (ε^n, ε^{2n}, …, ε^{2Nn}),

if we put ε = e^{iπ/N}. Thus e_n is exactly the same as in Question 6, where we showed that we can form an orthonormal basis of C^{2N} by taking e_n for any 2N successive values of n, in particular, from n = −N+1 to n = N.
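The theorem is easy to verify numerically for a small case. The sketch below (not part of the original) checks that for N = 2 the vectors e₋₁, e₀, e₁, e₂ are orthonormal in C⁴, using the convention that the dot product conjugates the second factor.

```python
# Orthonormality of the sampling vectors e_n for N = 2.
import cmath

N = 2
eps = cmath.exp(1j * cmath.pi / N)   # a primitive 2N-th root of unity

def e(n):
    # sampling vector of e^{n i pi t / l}, sampled at t = k l/N, k = 1..2N
    return [eps ** (k * n) / (2 * N) ** 0.5 for k in range(1, 2 * N + 1)]

for n in range(-N + 1, N + 1):
    for m in range(-N + 1, N + 1):
        dot = sum(a * b.conjugate() for a, b in zip(e(n), e(m)))
        assert abs(dot - (1 if n == m else 0)) < 1e-12
print("orthonormal")
```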
We can now express our sampling vector f in terms of this orthonormal basis, to get

   f = Σ_{n=−N+1}^{N} λ_n e_n,   where λ_n = f · e_n.

The left hand side is the sampling vector of f(t), and the right hand side is the sampling vector of the function Σ_{n=−N+1}^{N} λ_n e^{niπt/l}, which is also periodic of period 2l. Since the sampling vectors of the two functions are equal, it follows that the function values coincide whenever t is a multiple of l/N, i.e., at 2N points in every period. At intermediate values of t the functions may not be exactly equal, and the best we can do in general is to write

   f(t) ≈ Σ_{n=−N+1}^{N} λ_n e^{inπt/l}   for all t.   (∗)
[Figure 4.1. Fourier approximations to a square wave, for N = 4, N = 12, and N = 36]
In order to improve the accuracy of this approximation, we let ∆t → 0, i.e., N → ∞.
This means that the number of points in each period where the two functions are exactly
equal tends to infinity, so the right hand side gets closer and closer to f (t). The effect
of increasing N is shown in Figure 4.1: the function f (t) is a square wave, shown by the
dotted line in each graph. The approximations for different values of N are given by solid
lines: note how they oscillate from one side of f (t) to the other, and how they become
more accurate as N increases, except near the discontinuities. In the limit, the result is
exact, except at the discontinuities themselves.
Letting N → ∞ obviously turns the sum on the right hand side into an infinite series
from n = −∞ to n = ∞. What is the effect on each coefficient λn , where n is kept
constant?
   lim_{N→∞} λ_n = lim_{N→∞} f · e_n
   = lim_{N→∞} (1/√(2N)) (f(∆t), …, f(2N∆t)) · (1/√(2N)) (e^{inπ/N}, …, e^{2Ninπ/N})
   = lim_{N→∞} (1/(2N)) Σ_{k=1}^{2N} f(k∆t) e^{−iknπ/N}   (conjugates in the second factor!)
   = (1/(2l)) lim_{∆t→0} Σ_{k=1}^{2N} f(k∆t) e^{−iknπ∆t/l} ∆t   (since ∆t = l/N)
   = (1/(2l)) ∫₀^{2l} f(t) e^{−inπt/l} dt,
since an integral can be expressed as the limit of a sum. The limit of λn is called the nth
complex Fourier coefficient of the function f(t), and we shall denote it by c_n. If we let N tend to infinity in equation (∗), then we expect to obtain f(t) = Σ_{n=−∞}^{∞} c_n e^{inπt/l},
which is called the complex Fourier series expansion of f (t). Although the two sides are
exactly equal at infinitely many points in each period, this does not necessarily mean at
every single point. However, all periodic functions we encounter are equal to their Fourier
series expressions at all points of continuity.
Because f(t) and e^{−inπt/l} are both periodic of period 2l, the integral expression for c_n
can be evaluated over any interval of length 2l. In practice, it is usually best to integrate
from −l to l, in order to simplify the integration if f (t) is an even or odd function. Thus we
have the following result, although we are unable to prove the details about convergence,
and we assume the function is sufficiently well behaved.
Theorem. If f(t) is of period 2l, then the complex Fourier series of f(t) is

   Σ_{n=−∞}^{∞} c_n e^{inπt/l},   where c_n = (1/(2l)) ∫_{−l}^{l} f(t) e^{−inπt/l} dt.
At points where f (t) is continuous the Fourier series converges to f (t), and at finite discontinuities the Fourier series converges to the midpoint of the jump.
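A hedged numerical sketch of the theorem (my own example): the integral for c_n is approximated by a midpoint Riemann sum for the period-2 function with f(t) = 1 − t on (−1, 1), and compared with the coefficients obtained by integration by parts.

```python
# Approximate c_n = (1/2l) * integral_{-l}^{l} f(t) e^{-i n pi t / l} dt
# for the period-2 sawtooth f(t) = 1 - t on (-1, 1), so l = 1.
import cmath

l, M = 1.0, 20000        # half-period l and number of sample points

def f(t):
    return 1 - t          # one period, -1 < t < 1

def c(n):
    dt = 2 * l / M
    total = 0
    for k in range(M):
        t = -l + (k + 0.5) * dt              # midpoint rule
        total += f(t) * cmath.exp(-1j * n * cmath.pi * t / l) * dt
    return total / (2 * l)

# Integration by parts gives c_0 = 1 and c_n = (-1)^n / (i n pi) for n != 0.
assert abs(c(0) - 1) < 1e-6
for n in (1, -1, 2, 3):
    assert abs(c(n) - (-1) ** n / (1j * n * cmath.pi)) < 1e-6
```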
By pairing off the terms for ±1, ±2, …, we can re-write the Fourier series as

   c₀ + Σ_{n=1}^{∞} (c_n e^{inπt/l} + c_{−n} e^{−inπt/l}),

and if f(t) is real, then it is easy to see from the integral formula for the Fourier coefficients that c_{−n} = c̄_n, so c₀ is real, and the Fourier series becomes

   c₀ + Σ_{n=1}^{∞} (c_n e^{inπt/l} + c̄_n e^{−inπt/l}) = c₀ + Σ_{n=1}^{∞} 2 Re(c_n e^{inπt/l}),

since c̄_n e^{−inπt/l} is the complex conjugate of c_n e^{inπt/l}. If we now define a_n = c_n + c̄_n = 2 Re(c_n) and b_n = i(c_n − c̄_n) = −2 Im(c_n), then c₀ = ½a₀ and c_n = ½(a_n − ib_n), so

   2 Re(c_n e^{inπt/l}) = Re((a_n − ib_n)(cos(nπt/l) + i sin(nπt/l))) = a_n cos(nπt/l) + b_n sin(nπt/l).
Thus for a real-valued function f(t) of period 2l the Fourier series can be written as

   ½a₀ + Σ_{n=1}^{∞} (a_n cos(nπt/l) + b_n sin(nπt/l)),

where

   a_n = (1/l) ∫_{−l}^{l} f(t) cos(nπt/l) dt   and   b_n = (1/l) ∫_{−l}^{l} f(t) sin(nπt/l) dt.
The results about convergence are, of course, the same as before. In practice it is generally
easier to find the complex Fourier series and then combine the terms for ±n, since only
one integration is required.
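As an illustration of this conversion (my own code): for the period-2 sawtooth with f(t) = 1 − t on (−1, 1), integration by parts gives c₀ = 1 and c_n = (−1)^n/(inπ); forming a_n = 2 Re(c_n) and b_n = −2 Im(c_n) and summing the real series at a point of continuity recovers the function value.

```python
# Partial sums of the real Fourier series built from the complex coefficients.
from math import cos, sin, pi

def partial_sum(t, terms=5000):
    total = 1.0                            # a_0/2, since a_0 = 2 c_0 = 2
    for n in range(1, terms + 1):
        cn = (-1) ** n / (1j * n * pi)     # complex Fourier coefficient
        a_n, b_n = 2 * cn.real, -2 * cn.imag
        total += a_n * cos(n * pi * t) + b_n * sin(n * pi * t)
    return total

# At t = 0.5, a point of continuity, the series should approach f(0.5) = 0.5.
assert abs(partial_sum(0.5) - 0.5) < 1e-2
```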
Tutorial questions — Fourier series
20. If f (t) has period 2 and f (t) = 1 − t for −1 < t < 1, find the complex Fourier series
of f (t). Use the theorem about convergence of Fourier series to sketch the graph of the
function to which the series converges between t = −4 and t = 4. Rewrite the series in
real form.
21. Find the complex Fourier series of the function f(x) of period 1 such that f(x) = 0 for −½ < x < 0 and f(x) = x for 0 < x < ½. Write the series in real form, and sketch the graph of the function to which the series converges between x = 0 and x = 4. By considering the value to which the series converges when x = ½, show that 1 + 1/9 + 1/25 + 1/49 + ··· = π²/8.
22. Find the complex Fourier series of the function f(x) of period 2π such that f(x) = x² for −π < x < π. Write it in real form, and by considering x = 0 and x = π deduce that

   Σ_{n=1}^{∞} (−1)^{n−1}/n² = π²/12   and   Σ_{n=1}^{∞} 1/n² = π²/6.
23. If f(t) has period 2π and f(t) = e^{iαt} for −π < t < π, where α is not an integer, find the complex Fourier series for f(t). (Hint: e^{±iπ} = −1.)
To what value does the series converge when t = 0? (Hint: f(t) is continuous at t = 0.) Deduce that

   π/sin πα = Σ_{n=−∞}^{∞} (−1)^n/(α − n).

By what test can you be sure that the series on the right hand side converges?
Show that f(t) jumps from the value e^{iαπ} to the value e^{−iαπ} at t = π. To what value does the Fourier series converge here? By pairing off the terms for ±1, ±2, … show that

   Σ_{n=1}^{∞} 1/(n² − α²) = (1 − πα cot πα)/(2α²).

By what test can you be sure that the series on the left hand side converges? Use L'Hôpital's Rule to find the limit of the right hand side as α → 0, and hence evaluate Σ_{n=1}^{∞} 1/n².
24. (i) Assuming f(t) is real-valued, apply the formulae a_n = 2 Re(c_n) and b_n = −2 Im(c_n) to the integral expression for c_n, and hence obtain the formulae for a_n and b_n given above.
(ii) If in addition f(t) is either an even function or an odd function, show that these formulae simplify further, as follows:
if f(t) is an even function, then a_n = (2/l) ∫₀^l f(t) cos(nπt/l) dt and b_n = 0;
if f(t) is an odd function, then a_n = 0 and b_n = (2/l) ∫₀^l f(t) sin(nπt/l) dt.
25. The following Fourier series of period 2π were found in Algebra Chapter 1 Question 48:

   ln(2|cos ½θ|) = Σ_{n=1}^{∞} ((−1)^{n+1}/n) cos nθ   and   arctan(tan ½θ) = Σ_{n=1}^{∞} ((−1)^{n+1}/n) sin nθ,

where the first function is even, and the second function is odd. Using the formulae for a_n and b_n given in Question 24, show without integrating that

   ∫₀^π ln(2|cos ½θ|) cos nθ dθ = (−1)^{n+1}π/(2n) = ∫₀^π arctan(tan ½θ) sin nθ dθ.

Confirm this result by simplifying arctan(tan ½θ) and evaluating the second integral. (The first integral cannot be evaluated by elementary methods.)
* 26. Show that if f(t) is periodic of period 2l, and if f is the sampling vector, as above, then lim_{N→∞} f · f = (1/(2l)) ∫_{−l}^{l} |f(t)|² dt. Hence adapt the result of Question 7 to prove that

   (1/(2l)) ∫_{−l}^{l} |f(t)|² dt = Σ_{n=−∞}^{∞} |c_n|².

(This is called Parseval's identity.)
Answers

1. (a) √6, 2√3; (b) √7, 2√2; (c) 5, √26.
2. a · (cb) = conj((cb) · a) = conj(c(b · a)) = c̄ conj(b · a) = c̄(a · b).
3. β = 1 − 3i, |(1, α, β)| = √13.
5. a · u = (3/5)(1 + 2i), a · v = (1/5)(4 − i), a · w = (1/5)(−3 − 2i).
6. (i) Sum is 2N if ε^{n−m} = 1. (Every term equals 1.)
7. |a|² = a · a = (λ₁e₁ + λ₂e₂ + ···) · a = λ₁(e₁ · a) + λ₂(e₂ · a) + ··· = λ₁ conj(a · e₁) + λ₂ conj(a · e₂) + ··· = λ₁λ̄₁ + λ₂λ̄₂ + ··· = |λ₁|² + |λ₂|² + ···.
9. (1/√2) [ 1   1 ]   and   (1/2) [ −i  −1   i  1 ]
          [ −1  1 ]               [ 1    1   1  1 ]
                                  [ i   −1  −i  1 ]
                                  [ −1   1  −1  1 ] .

11. (i) Transposing does not change the determinant, and conjugating the matrix also conjugates the determinant, since the conjugate of a sum or product is equal to the sum or product of the conjugates.
13. N.B. The unitary matrices given are not the only correct ones.
(a) λ = 5, 10, U = (1/√5) [ 1  −2 ]
                          [ 2   1 ]
(b) λ = −20, 10, U = (1/√10) [ 1  −3 ]
                             [ 3   1 ]
(c) λ = 3, 6, U = (1/√3) [ 1−i  −1  ]
                         [ 1     1+i ]
(d) λ = 9, −9, −9, U = (1/3) [ 2   1   2 ]
                             [ 1   2  −2 ]
                             [ −2  2   1 ]
(e) λ = 169, 169, 338, U = (1/13) [ −3  12   4 ]
                                  [ 4   −3  12 ]
                                  [ 12   4  −3 ]
(f) λ = 3, 3, −3, −6, U = (1/√3) [ 1   i  −1  0 ]
                                 [ 0  −1   i  1 ]
                                 [ i   1   0  1 ]
                                 [ 1   0   1  i ] .
14. (i) c_A(x) = x² − (a + d)x + ad − b² = (x − ½(a + d))² − (¼(a − d)² + b²).
(ii) |RP|² = (x − ½(a + d))² + y², and |PQ|² = ¼(a − d)² + b². Points L and R are where y = 0, so c_A(x) = 0.
(iii) Centre at (7½, 0), passing through (9, −2). Centre at (−5, 0), passing through (7, −9).
(iv) (A − λ₁I)(LQ)^T has first component (a − λ₁)(a − λ₂) + b² = c_A(a) + b² and second component b(a − λ₂) + (d − λ₁)b = b{(a + d) − (λ₂ + λ₁)}. The first component is zero by part (ii), since Q(a, b) is on the circle, and the second component is zero because a + d = tr A = λ₁ + λ₂.
15. (i) If A = (a_ij), then curl(rA) = (a₂₃ − a₃₂, a₃₁ − a₁₃, a₁₂ − a₂₁).
(ii) (x, y)^T = U (A cos √20 t + B sin √20 t, C cosh √10 t + D sinh √10 t)^T. (U as in Question 13.)
(iii) (x, y, z)^T = U (A cosh 3t + B sinh 3t, C cos 3t + D sin 3t, E cos 3t + F sin 3t)^T.
16. +, +, + ellipsoid; +, +, − hyperboloid of one sheet; +, −, − hyperboloid of two sheets; +, +, 0 elliptic cylinder; +, −, 0 hyperbolic cylinder; +, 0, 0 pair of planes. Others impossible with 1 on RHS.
17. (a) u² − v² − w² = 1, hyperboloid of two sheets; (b) u² − v² − w² = −1, hyperboloid of one sheet; (c) u² − v² − w² = 0, cone.
18. If B = U^T AU = U⁻¹AU, then c_B(x) = c_A(x) by Algebra Chapter 3 Question 31(b). The equation of the Mohr circle of B is c_B(x) + y² = 0, which is the same equation as the Mohr circle of A.
19. conj(Ā^T A)^T = (A^T Ā)^T = Ā^T A, so Ā^T A is hermitian.
(Ā^T A)X = σX, so X̄^T(Ā^T A)X = X̄^T(σX), i.e. conj(AX)^T(AX) = σX̄^T X, i.e. |AX|² = σ|X|².
20. 1 + (1/π) Σ_{n≠0} ((−1)^n/(in)) e^{inπt} = 1 + (2/π) Σ_{n=1}^{∞} ((−1)^n/n) sin nπt.

[Graph: the sawtooth limit function, plotted for −4 ≤ t ≤ 4.]
21. 1/8 + (1/4) Σ_{n≠0} ((−1)^{n+1}/(niπ) − (1 − (−1)^n)/(n²π²)) e^{2niπx}
= 1/8 + (1/2) Σ_{n=1}^{∞} (((−1)^{n+1}/(nπ)) sin 2nπx − ((1 − (−1)^n)/(n²π²)) cos 2nπx).

[Graph: the limit function, plotted for 0 ≤ x ≤ 4.]

Converges to ¼ at x = ½ (jumps from ½ to 0), so ¼ = 1/8 + (1/π²)(1 + 1/3² + 1/5² + ···).
22. π²/3 + 2 Σ_{n≠0} ((−1)^n/n²) e^{inx} = π²/3 + 4 Σ_{n=1}^{∞} ((−1)^n/n²) cos nx. Converges to 0 at x = 0 and to π² at x = π. (Points of continuity.)
23. (sin πα/π) Σ_{n=−∞}^{∞} ((−1)^n/(α − n)) e^{int}. Converges to e^{iα·0} = 1 at t = 0 (point of continuity). Check convergence by the alternating series test. Converges to ½(e^{iαπ} + e^{−iαπ}) = cos απ (midpoint of jump) at t = π. Separate the term for n = 0 and combine the terms for ±n. Use the limit comparison test with the p-series for p = 2. Σ 1/n² = π²/6.
25. arctan(tan ½θ) = ½θ for −π < θ < π, so the second integral = ½ ∫₀^π θ sin nθ dθ = ½[−(1/n)θ cos nθ + (1/n²) sin nθ]₀^π = (−1)^{n+1}π/(2n).