University of Auckland Maths208 Coursebook

Contents

Calculus
1.1 Functions of Two and Three Variables
    1.1.1 Review
    1.1.2 The Chain Rule
    1.1.3 Implicit differentiation
    1.1.4 Gradient and directional derivatives
    1.1.5 Optimisation
    1.1.6 Constrained Optimisation — Lagrange Multipliers
1.2 Sequences & Series
    1.2.1 Sequences
    1.2.2 Series
    1.2.3 Taylor Polynomials
    1.2.4 Taylor Series
1.3 Integration Techniques
    1.3.1 Review: Substitution
    1.3.2 Integration by Parts

Linear algebra
2.1 Vector Spaces
    2.1.1 Review: Solving Linear Systems of Equations
    2.1.2 Linear Combinations
    2.1.3 Linear Independence and Dependence
    2.1.4 Definition of a Vector Space
    2.1.5 Basis and Dimension
    2.1.6 Subspaces of R^n
    2.1.7 Matrices and their Associated Subspaces in R^n
    2.1.8 The General Solution of Ax = b
2.2 Inner Products and Orthogonality
    2.2.1 Orthogonal and orthonormal bases
    2.2.2 Orthogonal projection of one vector onto the line spanned by another vector
    2.2.3 The Gram-Schmidt process
    2.2.4 Least squares solutions of systems of linear equations
2.3 Eigenvalues
    2.3.1 Determinants
    2.3.2 Eigenvalues: The characteristic equation of a matrix
    2.3.3 Diagonalisation of a Matrix
    2.3.4 Symmetric Matrices
    2.3.5 Markov Chains
    2.3.6 Discrete Dynamical Systems

Differential equations
3.1 First-Order Differential Equations
    3.1.1 Introduction
    3.1.2 Terminology
    3.1.3 First-Order Differential Equations
3.2 Systems of First-Order Differential Equations
    3.2.1 First-order linear homogeneous equations
    3.2.2 Systems of first-order linear DEs
3.3 Homogeneous Linear Second-Order DEs with constant coefficients
    3.3.1 Introduction
    3.3.2 Solving Homogeneous Linear Second-Order DEs
    3.3.3 Homogeneous Linear DEs with Constant Coefficients
    3.3.4 Equivalence of Second-Order DE and First-Order System of DEs

Appendix
4.1 Vectors
    4.1.1 Vector Arithmetic
    4.1.2 Length, distance, and angles in R^n
4.2 Vector Representation of Lines and Planes
    4.2.1 Vector Representation of Lines and Planes
4.3 Systems of Linear Equations and Matrices
    4.3.1 Systems of Linear Equations
    4.3.2 Matrix notation and concepts
Calculus
1.1 Functions of Two and Three Variables
1.1.1 Review
Partial derivatives were covered in MATHS 108; as background material, we revise them in this subsection.
A function of one variable
y = f(x)
is a rule that assigns to each value of the independent variable x in a set on the x-axis exactly one value of the dependent variable y. The graph of a continuous function y = f(x) is a curve in the xy-plane.
A function of two variables
z = f (x, y)
is a rule that assigns to each pair of values of the independent variables x and y in a region of the xy-plane exactly one value of the dependent variable z. The graph of a continuous function z = f(x, y) is a surface in xyz-space; two examples are shown below.

[Figure: two example surfaces z = f(x, y).]
CONTENTS
5
Recall the notation for the partial derivatives of a function z = f(x, y) of two variables:
First Partial Derivatives
∂f/∂x = fx,   ∂f/∂y = fy
Second Partial Derivatives
∂²f/∂x² = ∂/∂x (∂f/∂x) = (fx)x = fxx,
∂²f/∂y∂x = ∂/∂y (∂f/∂x) = (fx)y = fxy,
∂²f/∂x∂y = ∂/∂x (∂f/∂y) = (fy)x = fyx,
∂²f/∂y² = ∂/∂y (∂f/∂y) = (fy)y = fyy.
Notes:
For the functions we will meet in this course, the second partial derivatives satisfy fxy = fyx. However, this is not the case in general.
Example 1.1.1.
Find fx, fy, fxx, fxy, fyx and fyy for the following functions of two variables:
(a) f(x, y) = x sin(y)
(b) f(x, y) = √(x² + y²)
(c) f(x, y) = x√(xy)
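As a quick check on the hand computation, the following Matlab sketch (a hypothetical session in the style of Example 1.1.4 below; Symbolic Math Toolbox assumed) computes all six partial derivatives for part (a):

%Matlab-session
syms x y
f = x*sin(y);       % part (a)
fx = diff(f,x)      % sin(y)
fy = diff(f,y)      % x*cos(y)
fxx = diff(fx,x)    % 0
fxy = diff(fx,y)    % cos(y)
fyx = diff(fy,x)    % cos(y), equal to fxy as noted above
fyy = diff(fy,y)    % -x*sin(y)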
Example 1.1.2.
In many production processes, manufacturing costs consist of fixed costs (purchase or rental of equipment and facilities) and two variable costs: capital and labour. We let k denote units of capital, and l units of labour.
If the variable cost function is
C(k, l) = 108 + 18k + 40l,
find ∂C/∂k and ∂C/∂l, the marginal costs of capital and labour respectively.
Example 1.1.3.
Find fx , fy , fz , fxz , fyx and fzx , for the following functions of three variables:
(a) f (x, y, z) = ln(x2 + y 2 + z 2 )
(b) f (x, y, z) = e2xy−z
Example 1.1.4.
The Matlab code to calculate the partial derivatives in
the first example above is
%Matlab-session
syms x y z
f = log(x^2+y^2+z^2);   % define f symbolically (not as a string)
diff(f,x)               % fx
diff(f,y)               % fy
diff(f,z)               % fz
diff(diff(f,x),z)       % fxz
diff(diff(f,y),x)       % fyx
diff(diff(f,z),x)       % fzx
1.1.2 The Chain Rule
Recall the Chain Rule for differentiation of a composite
function of one variable:
y = y(u(x)) ⇒ y′ = y′(u(x)) u′(x),
or in more suggestive notation
dy/dx = (dy/du)(du/dx).
We now look at the extension of the Chain Rule to functions of more than one variable in the following two
cases.
Case 1:
Let z = f(u, v), u = u(x), and v = v(x).
In this case we can show the hierarchy of dependency as in the diagram of Figure 1.1: z branches to u and v, each of which depends on x.
To find the rate of change of z with respect to x, we need to travel down all paths from z to x, taking derivatives or partial derivatives as we go, multiplying together all derivatives along each path and adding these products:
Chain Rule
dz/dx = (∂z/∂u)(du/dx) + (∂z/∂v)(dv/dx),
and, if u and v are instead functions of y,
dz/dy = (∂z/∂u)(du/dy) + (∂z/∂v)(dv/dy).

[Figure 1.1: Chain rule dependency diagram: z, with edges ∂z/∂u and ∂z/∂v to u and v, and edges du/dx and dv/dx down to x.]
Note:
• If z is a function of one variable x, then use the notation dz/dx for the derivative.
• If z is a function of more than one variable, for instance z = z(x, y), then use the notation ∂z/∂x and ∂z/∂y for the derivatives.
Example 1.1.5.
(a) z = uv², u = cos(x), v = sin(x). Find dz/dx.
(b) z = e^x sin(y), x = t², y = ln(t). Find dz/dt.
(c) w = √x/y + y/z, x = t, y = cos(2t) and z = e^(−3t). Find dw/dt.
(d) v = xy²z³, x = sin(t), y = cos(t), z = 1 + e^(2t). Find dv/dt.
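For part (a), a hedged Matlab check (Symbolic Math Toolbox assumed) is to substitute u(x) and v(x) into z before differentiating; the result agrees with the Chain Rule computation (∂z/∂u)(du/dx) + (∂z/∂v)(dv/dx):

%Matlab-session
syms x
u = cos(x); v = sin(x);
z = u*v^2;                    % z as a function of x alone
dzdx = simplify(diff(z,x))    % 2*cos(x)^2*sin(x) - sin(x)^3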
Case 2: Let z = z(u, v), u = u(x, y) and v = v(x, y).
We can show the hierarchy of dependency in the following diagram: z branches to u and v with edges ∂z/∂u and ∂z/∂v; u and v each branch to x and y with edges ∂u/∂x, ∂u/∂y, ∂v/∂x and ∂v/∂y.
To find the rate of change of z with respect to x, again travel down all paths from z to x (i.e. consider all dependencies of z on x), taking partial derivatives as you go, and multiplying together all derivatives along each path:
Chain Rule
∂z/∂x = (∂z/∂u)(∂u/∂x) + (∂z/∂v)(∂v/∂x),
∂z/∂y = (∂z/∂u)(∂u/∂y) + (∂z/∂v)(∂v/∂y).
Example 1.1.6.
(a) z = u cos(v) sin(u), u = e^(xy²), v = x² + y. Find ∂z/∂x and ∂z/∂y.
(b) z = ln(x² + y²), x = t² − s², y = 2st. Find ∂z/∂s and ∂z/∂t.
(c) z = x/y, x = re^(st), y = rse^t. Find ∂z/∂r, ∂z/∂s, and ∂z/∂t when (r, s, t) = (1, 2, 0).
1.1.3 Implicit differentiation
A function given by an equation F (x, y) = 0 is called
an implicit function. By contrast, a function y = f (x)
(i.e., y is expressed explicitly in terms of x) is called an
explicit function.
For example, the functions given by the equation F(x, y) = x² + y² − 1 = 0 are implicit functions, but the functions y = √(1 − x²) and y = −√(1 − x²) are explicit.
To differentiate an implicit function, we can differentiate the given equation directly; we do not need to find the explicit function y = f(x). Indeed, it is not always possible to express a function explicitly.
The technique of implicit differentiation was covered in
MATHS 108 and is illustrated in the following example.
Example 1.1.7.
Consider a function y = f(x) given by the equation x² + y² − 1 = 0. Find the derivative dy/dx.
To find the derivative dy/dx, differentiate both sides of the given equation with respect to x and apply the Chain Rule (remembering that y is a function of x!), which gives:
2x + 2y dy/dx = 0.
So,
dy/dx = −2x/(2y) = −x/y.
In general, if a differentiable function y = f(x) is given by the equation F(x, y) = 0, differentiating both sides of the equation F(x, y) = 0 with respect to x gives
Fx + Fy dy/dx = 0.
Thus,
dy/dx = −Fx/Fy.
Similarly, if a differentiable function x = f(y) is given by the equation F(x, y) = 0, differentiating both sides of the equation F(x, y) = 0 with respect to y gives
Fy + Fx dx/dy = 0.
Thus,
dx/dy = −Fy/Fx.
Given F(x, y) = 0:
dy/dx = −Fx/Fy,   dx/dy = −Fy/Fx.
For a function z = f(x, y) of two variables given implicitly by an equation F(x, y, z) = 0, we can find the corresponding partial derivatives in the same way.
Example 1.1.8.
Given the equation
e^(xyz) − x² + 3y² + z² = 208,
find ∂z/∂x and ∂z/∂y.
To find the partial derivative ∂z/∂x, differentiate both sides of the given equation with respect to x and apply the Chain Rule (remembering that y can be regarded as a constant and hence z is a function of x!), which gives:
e^(xyz) y(z + x ∂z/∂x) − 2x + 2z ∂z/∂x = 0.
It follows that
∂z/∂x = (2x − e^(xyz) yz)/(2z + e^(xyz) xy).
To find the partial derivative ∂z/∂y, differentiate both sides of the given equation with respect to y and apply the Chain Rule (remembering that x can be regarded as a constant and hence z is a function of y!), which gives:
e^(xyz) x(z + y ∂z/∂y) + 6y + 2z ∂z/∂y = 0.
It follows that
∂z/∂y = −(6y + e^(xyz) xz)/(2z + e^(xyz) xy).
It can be shown that, given F(x, y, z) = 0:
∂z/∂x = −Fx/Fz,   ∂z/∂y = −Fy/Fz,
∂x/∂y = −Fy/Fx,   ∂x/∂z = −Fz/Fx,
∂y/∂x = −Fx/Fy,   ∂y/∂z = −Fz/Fy.
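These formulae are straightforward to evaluate symbolically. A minimal Matlab sketch (Symbolic Math Toolbox assumed) applying ∂z/∂x = −Fx/Fz and ∂z/∂y = −Fy/Fz to the equation of Example 1.1.8:

%Matlab-session
syms x y z
F = exp(x*y*z) - x^2 + 3*y^2 + z^2 - 208;   % write the equation as F(x,y,z) = 0
dzdx = simplify(-diff(F,x)/diff(F,z))       % (2x - y*z*e^(xyz))/(x*y*e^(xyz) + 2z)
dzdy = simplify(-diff(F,y)/diff(F,z))       % -(6y + x*z*e^(xyz))/(x*y*e^(xyz) + 2z)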
Extra for interest: these formulae are generalised in MATHS 340, where vector functions are defined by more than one equation.
Now, continuing the previous example: given the equation
e^(xyz) − x² + 3y² + z² = 208,
find ∂y/∂x, ∂y/∂z, ∂x/∂y and ∂x/∂z using the formulae above.
1.1.4 Gradient and directional derivatives
Definition 1.1.9.
The vector of first derivatives of f(x, y) evaluated at (x0, y0),
∇f(x0, y0) = (fx, fy)|(x0,y0),
is called the gradient of the function f at (x0, y0).
The vector of first derivatives of f(x, y, z) evaluated at (x0, y0, z0),
∇f(x0, y0, z0) = (fx, fy, fz)|(x0,y0,z0),
is called the gradient of f at (x0, y0, z0).
We say "grad f" for ∇f.
Example 1.1.10.
(a) Given a surface z = x² − y. Find ∇z(1, 2).
(b) Find the equation of the level curve of the surface z = x² − y through the point (1, 2). Recall that along a level curve of a function f, the value of f is constant.
(c) Find the equation of the tangent line to the level curve of z through (1, 2).
(d) Find a unit vector in the direction of this tangent line.
(e) Find the dot product of this unit vector and ∇z(1, 2).

[Figures: surface plot and level curves of z = x² − y; contour plot with gradient vectors.]
This last example illustrates the fact that if u is a vector
tangent to the level curve of f at (x0 , y0 ), then
∇f (x0 , y0 ) · u = 0,
i.e., the tangent to the level curve at (x0 , y0 ) and ∇f (x0 , y0 )
are at right angles.
FACT The tangent to the level curve at (x0 , y0 ) and
∇f (x0 , y0 ) are at right angles.
In fact, we will shortly see that the gradient ∇f (x0 , y0 )
points from (x0 , y0 ) in the direction in which the value
of f is increasing most rapidly.
Recall that for a function y = f(x) of one variable,
df/dx |x=x0 = lim(h→0) [f(x0 + h) − f(x0)]/h
measures the rate of change of f at the point (x0, f(x0)) as x changes.
Moving up dimensions, there is now more than one independent variable, so we need to specify the direction in which we are interested in observing a rate of change. The rate of change of f(x, y) at (x0, y0) in the direction of a given non-zero vector v is given by:
Du f(x0, y0) = lim(h→0) [f(x0 + h u1, y0 + h u2) − f(x0, y0)]/h,
where u = (u1, u2) = v/‖v‖ is a unit vector in the direction of v.
Definition 1.1.11.
If u is any unit vector, the directional derivative of f at
(x0 , y0 ) in the direction of u, given by
Du f (x0 , y0 ) = ∇f (x0 , y0 ) · u ,
measures the rate of change of f at (x0 , y0 ) as (x, y)
moves from (x0 , y0 ) in the direction of u.
Example 1.1.12.
Find the directional derivative of f(x, y) = 5 − x² + y² at the point (1, 1)
(a) in the direction of (3, −4);
(b) in the direction of (1, 1);
(c) in the direction of the gradient of f at (1, 1).

[Figures: the surface z = 5 − x² + y² with the point (1, 1, 5) marked, and its contour plot around (1, 1).]
If we are determining a climbing route with a topographical map, how can we tell the fastest way to ascend a mountain (risks and rivers aside)?
Common knowledge suggests we cut across the contours at right angles. Here's why. Assume
• altitude at any point (x, y) is given by the function f(x, y), and
• our current position is (x0, y0).
We know that
Du f(x0, y0) = ∇f(x0, y0) · u = ‖∇f(x0, y0)‖ ‖u‖ cos θ.   (1.1)
Consider the expression for the directional derivative of f at a fixed (x0, y0), and let the direction u vary over all possible values. What changes?
• (x0, y0) is fixed, so ‖∇f(x0, y0)‖ is fixed,
• u (the direction in which we examine the change in f) is not fixed, but ‖u‖ = 1 is constant,
• cos θ varies as θ, the angle between ∇f (x0 , y0 )
and u, varies.
In other words only cos θ changes as we look at rates of
change of f in different directions.
So when is the directional derivative Du f(x0, y0) largest? Recall that
−1 ≤ cos θ ≤ 1, and cos θ = 1 when θ = 0.
Therefore the directional derivative Du f(x0, y0) is largest when u and ∇f(x0, y0) are parallel; that is, when the direction u is the same as the direction of the gradient vector ∇f(x0, y0). We've derived:
FACT
(i) The direction of maximum change in a function f (x, y) at a point (x0 , y0 ) is the direction of the gradient
vector, ∇f (x0 , y0 ).
(ii) The direction of maximum negative change in a function f (x, y) at a point (x0 , y0 ) is the direction of
−∇f (x0 , y0 ) (the direction opposite to ∇f (x0 , y0 )).
(iii) The maximum change of f at (x0 , y0 ) is k∇f (x0 , y0 )k.
Now, we consider topographical maps again: If a function f (x, y) measures altitude at position (x, y), and you
are standing at (x0 , y0 ), then ∇f (x0 , y0 ) points in the
direction of steepest increase in altitude from (x0 , y0 ).
If you want to descend from where you are most steeply,
take the direction −∇f (x0 , y0 ).
Consider a function of three variables f(x, y, z). If u is any unit vector, the directional derivative of f at (x0, y0, z0) in the direction of u,
Du f(x0, y0, z0) = ∇f(x0, y0, z0) · u,
measures the rate of change of f at (x0, y0, z0) as (x, y, z) moves from (x0, y0, z0) in the direction of u.
Example 1.1.13.
Given a function f(x, y, z) = x²y − yz³ + z and a point P = (1, −2, 0).
(a) Calculate the gradient ∇f of f at the point P.
(b) Find the directional derivative Du f of f at the point P in the direction:
(i) of the vector v = (2, 1, −2).
(ii) of the negative z-axis.
(iii) from P to Q = (4, −3, 1).
(c) In which direction does the function increase fastest
at P and what is its rate of change in that direction?
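A short Matlab sketch for parts (a) and (b)(i) (hypothetical session; Symbolic Math Toolbox assumed), evaluating the gradient at P and dotting it with a unit vector:

%Matlab-session
syms x y z
f = x^2*y - y*z^3 + z;
g = gradient(f, [x y z]);                  % symbolic gradient (fx, fy, fz)
gP = double(subs(g, [x y z], [1 -2 0]))    % gradient at P: (-4, 1, 1)
v = [2; 1; -2];
Du = dot(gP, v/norm(v))                    % directional derivative at P: -3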
Example 1.1.14.
In Matlab we can graph functions of more than one variable, along with level curves:
• the surf or mesh commands plot surfaces
• the surfc or meshc commands plot surfaces and their level curves (contours)
• the contour command plots level curves in the (x, y) plane
• the gradient function evaluates gradient vectors numerically
• the quiver function plots vector fields (gradients)
Matlab plots numerically: meshgrid must be called to produce a set of points in the plane as domain for the plots. Here we use Matlab to observe the graph of z = xe^(−x²−y²) and to plot a vector field consisting of the gradients on a contour plot.

clf;
colormap(gray);
[x,y] = meshgrid(-2:.2:2, -2:.2:2);
z = x .* exp(-x.^2 - y.^2);
surfc(x,y,z);
figure % open another figure
colormap(gray);
[px,py] = gradient(z,.2,.2);
contour(x,y,z), hold on, quiver(x,y,px,py), hold off

Figure 1.2: Surface of z = xe^(−x²−y²) with contours and gradients
1.1.5 Optimisation
Recall the one-dimensional case:
Suppose that f = f (x) is differentiable, then for f to
have a maximum or a minimum at a point x, the slope of
f must be zero, i.e.,
f ′ (x) = 0.
If f is twice differentiable, then the second derivative
test determines whether f has a relative maximum, a relative minimum, or neither.
Summary 1.1.15.
Suppose that f ′ (x0 ) = 0, and
• if f ′′ (x0 ) > 0 then f has a relative minimum at x0
• if f ′′ (x0 ) < 0 then f has a relative maximum at
x0
• if f ′′ (x0 ) = 0 then the test is inconclusive (use the
first derivative near x0 to analyse)
Now the two-dimensional case:
Recall that the directional derivative of f at the point x = (x, y) in the direction v = (v1, v2) is given by
Dv f(x) = lim(t→0) [f(x + tv) − f(x)]/t.
Since this is the rate at which f increases at x in the direction v, for f to have a relative maximum or minimum we must have
Dv f(x) = 0, for all v.
Since Dv f(x) = ∇f(x) · v = v1 fx + v2 fy, the condition that all directional derivatives be zero (the analogue of f′(x) = 0) can be written
∇f(x) = 0.
What is the analogue of the second derivative test? Since Dv f is a function of two variables, we may take a directional derivative again, to obtain
Du Dv f(x) = (Du(Dv f))(x) = u^T [fxx(x) fxy(x); fyx(x) fyy(x)] v.
This follows by applying Dv f = v1 fx + v2 fy twice.
The square matrix of second-order partial derivatives of a function f,
Hf = [fxx fxy; fyx fyy],
is called the Hessian of f.
This matrix is symmetric, i.e. Hf = Hf^T, provided fxy = fyx.
We now apply the one-dimensional result. The second derivative of f in the direction v is given by
Dv² f = Dv(Dv f) = v^T [fxx fxy; fyx fyy] v = v^T Hf v.   (1.2)
Suppose ∇f(x) = 0, and that Dv² f(x) > 0 for all directions v. Then the function of one variable obtained by restricting f to the line through x in the direction v has zero first derivative at x and second derivative Dv² f(x) > 0, and so has a local minimum at x. Since this is true for all directions v, a condition for f to have a local minimum at x (the analogue of f″(x) > 0) is that Dv² f(x) > 0 for all directions v.
Definition 1.1.16.
A square symmetric matrix A is said to be
(i) positive-definite if v^T A v > 0 for all v ≠ 0,
(ii) negative-definite if v^T A v < 0 for all v ≠ 0,
(iii) indefinite if it is neither positive-definite nor negative-definite.
This argument leads to the second derivative test for relative maxima and minima of functions of several variables.
Summary 1.1.17.
Suppose that ∇f(x) = 0, i.e., fx(x) = 0 and fy(x) = 0, and
• Hf(x) is positive definite ⇒ f(x) is a relative minimum; e.g., f(x, y) = x² + y² at (0, 0)
• Hf(x) is negative definite ⇒ f(x) is a relative maximum; e.g., f(x, y) = −x² − y² at (0, 0)
• Hf(x) is indefinite ⇒ f(x) is a saddle point; e.g., f(x, y) = x² − y² at (0, 0)
Otherwise the test is inconclusive; e.g., f(x, y) = x⁴ + y⁴ at (0, 0).
The points (x, y, f (x, y)) where ∇f (x, y) = 0 are called
critical points of the function f .
There are a number of tests for a symmetric matrix A to be positive definite, including that the principal minors (see the section on quadratic forms) be positive, i.e.
det([a11]) = a11 > 0,   det(A) = a11 a22 − a12² > 0.
This leads to:
Second partial derivative test for functions of several variables
Suppose (x0, y0) is a critical point of f, and
• if det Hf(x0, y0) > 0 and fxx(x0, y0) > 0 then f has a minimum at (x0, y0),
• if det Hf(x0, y0) > 0 and fxx(x0, y0) < 0 then f has a maximum at (x0, y0),
• if det Hf(x0, y0) < 0 then f has a saddle point at (x0, y0),
• if det Hf(x0, y0) = 0 then the test is inconclusive.
Note that in two dimensions a critical point can be a saddle point, which has directions in which the function is increasing and decreasing.

[Figures: surfaces illustrating a minimum, a maximum, and a saddle point.]
Example 1.1.18.
Find and classify the critical points of
f(x, y) = 4xy − x⁴ − y⁴.
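As a check, the critical points and the Hessian test can be automated in Matlab (a sketch assuming the Symbolic Math Toolbox; real solutions only):

%Matlab-session
syms x y
f = 4*x*y - x^4 - y^4;
g = gradient(f, [x y]);                   % (fx, fy)
S = solve(g == 0, [x y], 'Real', true);   % real critical points
H = hessian(f, [x y]);                    % Hessian Hf
for k = 1:numel(S.x)
    pt = double([S.x(k), S.y(k)]);
    Hk = double(subs(H, [x y], pt));
    fprintf('(%g,%g): det Hf = %g, fxx = %g\n', pt, det(Hk), Hk(1,1))
end
% (0,0): det Hf = -16 (saddle); (1,1) and (-1,-1): det Hf = 128, fxx = -12 (maxima)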
Example 1.1.19.
An oil refinery produces two grades of petrol: standard and super. Weekly production is represented by x megalitres of standard and y megalitres of super. Weekly revenue is 60xy, and costs are x³ + y³ (units K$). It follows that the weekly profit is given by the function f(x, y) = 60xy − x³ − y³.
Obviously, the refinery management team would like to maximise this profit. How can this be done?
Example 1.1.20.
Find and classify the critical points of f(x, y) = x³ − 3xy − y³.

[Figure: surface plot of f(x, y) = x³ − 3xy − y³.]
1.1.6 Constrained Optimisation — Lagrange Multipliers
We now consider the problem of finding the maximum or
minimum of a function f (x, y) subject to the constraint
that the point (x, y) lie on some curve given by
g(x, y) = 0.
Since the gradient of a function is orthogonal to its level
curves, ∇g is orthogonal to the curve g = 0 at each point
on the curve. Further, the function f increases most in
the direction ∇f (and decreases most in the direction
−∇f ), so that if ∇f is not parallel to ∇g at a point on
the curve, then we can increase or decrease the value of
f by moving along the curve g = 0. Thus the condition
for a point on the curve g = 0 to give a maximum or a
minimum of f is that ∇f and ∇g be parallel, i.e.,
∇f = λ∇g, for some λ ≠ 0.
Equivalently, the level curves of f and g must touch at
this point. Since vectors are equal if and only if their
components are equal, this condition gives two equations
fx = λgx ,
fy = λgy .
This gives the method of Lagrange multipliers:
To find the maximum and minimum of the function
f (x, y) subject to the constraint
g(x, y) = 0,
find the points (x, y) which satisfy the equations
∇f = λ∇g,
g = 0.
These can be expanded to a system of equations
fx (x, y) = λgx (x, y),
fy (x, y) = λgy (x, y),
g(x, y) = 0.
For these points evaluate f (x, y) to find which point (or
points) gives the maximum and the minimum.
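This system of equations can be handed to a symbolic solver. A Matlab sketch for Example 1.1.21 below (f(x, y) = xy subject to x² + y² = 1; Symbolic Math Toolbox assumed):

%Matlab-session
syms x y lam
f = x*y;  g = x^2 + y^2 - 1;
eqns = [gradient(f,[x y]) == lam*gradient(g,[x y]); g == 0];
S = solve(eqns, [x y lam], 'Real', true);
for k = 1:numel(S.x)                       % evaluate f at each candidate point
    pt = double([S.x(k), S.y(k)]);
    fprintf('(%g,%g): f = %g\n', pt, double(subs(f, [x y], pt)))
end
% maximum 1/2 at (1/sqrt(2),1/sqrt(2)) and (-1/sqrt(2),-1/sqrt(2)); minimum -1/2 at the other two points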
Example 1.1.21.
Find the maximum and minimum of the function f(x, y) = xy subject to the constraint x² + y² = 1.

Example 1.1.22.
The sales of a fixed price product depend on the cost of materials C, and the amount of labour L, according to the relationship
S = 10CL − 2L².
Further, budget constraints require that C + L = 12. Find the maximum sales obtainable firstly by the method of Lagrange multipliers, and secondly by substituting for one of the variables from the constraint equation.

[Figures: the objective function, and the objective function with the constraint.]
Lagrange multipliers in three dimensions
For functions of three (or more) variables the method of Lagrange multipliers works, by the same reasoning, with the constraint g(x, y, z) = 0 requiring that the points (x, y, z) lie on some surface in R³.
Example 1.1.23.
Find the points on the sphere x² + y² + z² = 36 that are closest and farthest from the point P(1, 2, 2).
Lagrange multipliers with two constraints
For f (x, y, z) we can also require that the points (x, y, z) lie on a curve given by the intersection of two surfaces
g(x, y, z) = 0,
h(x, y, z) = 0.
This leads to the following Lagrange multiplier method with two constraints:
To find the maximum and minimum of the function
f (x, y, z) subject to the constraints
g(x, y, z) = 0,
h(x, y, z) = 0,
find the points (x, y, z) which satisfy the equations
∇f = λ∇g + µ∇h,
g = 0,
h = 0.
For these points evaluate f (x, y, z) to find which give
the maximum and the minimum.
The scalars λ and µ are called the Lagrange multipliers of the problem, and must not both be zero.
Example 1.1.24.
Find the extreme values of u = f(x, y, z) = xy − 3z² subject to the conditions that x + y + z = 24 and z − y = 4.
1.2 Sequences & Series
1.2.1 Sequences
A sequence is a list of infinitely many numbers in a particular order.
An arbitrary sequence is written as
a1, a2, a3, . . .   or   {an}, n = 1, 2, 3, . . . , where
• the first term is a1,
• the second term is a2, and
• an is the nth (general) term, n = 1, 2, . . . .
Typically a sequence is defined by a formula for the general term an .
We usually start our sequences with the index n = 1, but
they can start with any index by adjusting the formula
for an .
Example 1.2.1.
For the following sequences, whose nth general term is given below, write out the first three terms:
(a) an = 3^n/n!
(b) an = 2^n/n
(c) an = 3^n/n³
Example 1.2.2.
For the following sequences find a formula for an:
(a) 1, 1/2, 1/3, 1/4, 1/5, · · ·   an =
(b) 1, 2, 4, 8, 16, · · ·   an =
(c) 1/2, 2/3, 3/4, 4/5, · · ·   an =
(d) 1, −1, 1, −1, 1, −1, · · ·   an =

[Figures: plots of the four sequences against n.]
Example 1.2.3.
The Fibonacci sequence:
Two new-born rabbits (a male and a female) are taken to a remote island and released. There are no rabbit predators on the island, and no other rabbits occupy the island.
• Each month, each pair of rabbits produces a new-born pair, one of each gender.
• Each pair reproduces at age 2 months.
The population after n months, in pairs of rabbits, is given by
a1 = 1,
a2 = 1,
an+1 = an + an−1 , n ≥ 2.
Write down the first 10 terms of the sequence.
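A short Matlab loop generating the first 10 terms of this recurrence:

%Matlab-session
a = zeros(1,10);
a(1) = 1; a(2) = 1;
for n = 2:9
    a(n+1) = a(n) + a(n-1);   % each month's pairs: last month's plus new-borns
end
a   % 1 1 2 3 5 8 13 21 34 55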
Generally, our interest is in what is happening eventually, i.e. in the trend of the terms in the sequence.
Convergence
Definition 1.2.4.
We say a sequence {an} converges to L if the terms in the sequence eventually become arbitrarily close to L. In this case, we write
lim(n→∞) an = L.
Otherwise, if lim(n→∞) an does not exist, we say the sequence {an} diverges.
Example 1.2.5.
For each of the following geometric sequences, determine whether it is convergent and fill out the table below:
(a) an = 1/2^n
(b) an = 1.1^n
(c) an = (−1/2)^n
(d) an = (−1)^n
(e) an = (−1.1)^n

an:               1/2^n   1.1^n   (−1/2)^n   (−1)^n   (−1.1)^n
lim(n→∞) an:
{an} converges?

[Figures: plots of each sequence against n.]
More generally, we have:
Convergence of a geometric sequence
For 0 ≤ r < 1:   lim(n→∞) r^n = 0
For −1 < r < 0:  lim(n→∞) r^n = 0
For r = 1:       lim(n→∞) r^n = 1
For r = −1:      lim(n→∞) r^n does not exist
For r > 1:       lim(n→∞) r^n does not exist
For r < −1:      lim(n→∞) r^n does not exist
Limit-taking techniques
In the previous subsection, we used graphs to guess the limits of geometric sequences. Now we introduce methods to find the exact limits of some sequences. First of all, it is not hard to see the following useful facts:
Suppose that the sequences {an} and {bn} converge and k is a constant. Then
• lim(n→∞) k an = k lim(n→∞) an,
• lim(n→∞) (an + bn) = lim(n→∞) an + lim(n→∞) bn,
• lim(n→∞) (an bn) = lim(n→∞) an × lim(n→∞) bn.
Squeezing Theorem
Given a sequence {an}, suppose that there exist two other sequences {bn} and {cn} such that {an} is squeezed between them:
(i) bn ≤ an ≤ cn for all n ≥ n0 (where n0 is some positive integer), and
(ii) lim(n→∞) bn = lim(n→∞) cn = L (i.e. they both have the same limit).
Then, letting n → ∞ in bn ≤ an ≤ cn gives L ≤ lim(n→∞) an ≤ L, so
lim(n→∞) an = L.
This theorem is useful for taking limits of sequences involving sin(n), cos(n), and (−1)^n.
Example 1.2.6.
Use the Squeezing Theorem to find the limit of each sequence below defined by
(a) an = (−1)^n/n
(b) an = sin(n)/(2n)
Example 1.2.7.
The Matlab code to find the limits in the previous two
examples is
% Calculate various limits
syms n;
limit((-1)^n/n, inf)
limit(sin(n)/(2*n), inf)
Example 1.2.8.
Prove the following: If lim(n→∞) |an| = 0 then lim(n→∞) an = 0.
Now we introduce a powerful limit-taking technique, L'Hôpital's Rule, to find the exact limits of some sequences.
L'Hôpital's Rule
L'Hôpital's Rule is used to find the limit of indeterminate forms such as 0/0, ∞/∞, 0 × ∞, ∞ − ∞, 0^0, ∞^0, 1^∞.
L'Hôpital's Rule: For indeterminate products/quotients follow the steps:
1. If necessary, rearrange the expression into fractional form f(n)/g(n), so that lim(n→∞) f(n)/g(n) becomes ∞/∞ or 0/0. (Note that f(n) and g(n) need to have derivatives.)
2. Use lim(n→∞) f(n)/g(n) = lim(n→∞) f′(n)/g′(n).
(Note: step 2 is invalid if lim(n→∞) f(n)/g(n) is not an indeterminate form!)
The following examples illustrate this theorem.
Example 1.2.9.
Use L'Hôpital's Rule to find the limit of the following sequences defined by
(a) an = (n − 3)/(n + 2)
(b) bn = (2n² − 3n + 1)/(5n² − 6)
(c) cn = (1 + n + 2n³)/(1 − n − n³)
Note:
All sequences in the previous example are polynomial fractions. Alternatively, to find the limit of a polynomial fraction, divide through by the highest power of n in the denominator, and then let n → ∞. This leads to the following useful formula:
lim(n→∞) (a n^k + . . .)/(b n^l + . . .) = a/b if k = l;   0 if k < l;   ∞ if k > l.
Example 1.2.10.
Use L'Hôpital's Rule to find the limit of the following sequences defined by
(a) an = ln(n)/n
(b) bn = e^n/n²
(c) cn = n e^(−n)
Now consider a sequence {an} which has an indeterminate form of 0^0 or ∞^0 or 1^∞. If we consider the sequence {ln(an)}, we obtain another sequence which has an indeterminate form of 0 × ∞ or 0/0 or ∞/∞. Hence we can use our previous results. This method uses the following two steps:
(a) Find lim(n→∞) ln(an) = b.
(b) Undo the logarithms: lim(n→∞) an = e^b.
Note:
• The second step uses the following fact:
lim(n→∞) ln(an) = b  ⟺  lim(n→∞) an = e^b.
• Indeed, more generally, we have the following fact: if an > 0, an → a and bn → b, then an^bn → a^b.
Example 1.2.11.
Calculate the limit of the sequence defined by an = n^(1/n).

Example 1.2.12.
Determine whether the sequence an = (1 + 1/n)^n is convergent or not.

[Figure: the terms (1 + 1/n)^n increasing towards e ≈ 2.71828.]
A derivation of e
Example (1.2.12) illustrates a very important formula:
lim(n→∞) (1 + a/n)^n = e^a,
and more generally
lim(n→∞) (1 + a/n)^(nb) = e^(ab).   (1.3)
Before using this important formula to find the limits of some indeterminate sequences with form 1^∞, we introduce an interpretation of e by considering the continuous compounding of interest.
Example 1.2.13.
Assume that the annual interest rate is 100%. If you deposit $1 into a bank, how much money is there in your bank account after one year?
• If it is compounded once a year: $(1 + 1) = $2.
• If it is compounded twice a year (i.e., every half-year): (1 + 1/2) + (1 + 1/2)(1/2) = (1 + 1/2)^2 = 2.25.
• If it is compounded three times a year: (1 + 1/3)^3 ≈ 2.37.
• If it is compounded four times a year: (1 + 1/4)^4 ≈ 2.44.
• If it is compounded 12 times a year (i.e., every month): (1 + 1/12)^12 ≈ 2.61.
• If it is compounded 52 times a year (i.e., every week): (1 + 1/52)^52 ≈ 2.69.
• If it is compounded 365 times a year (i.e., every day): (1 + 1/365)^365 ≈ 2.71.
• If it is compounded n times a year: (1 + 1/n)^n.
• If it is compounded continuously in a year (i.e., every moment): lim(n→∞) (1 + 1/n)^n = e ≈ 2.72.
Note that lim(n→∞) (1 + 1/n)^n ≠ 1. In fact,
lim(n→∞) (1 + 1/n)^n = e = 2.718281828459045 · · ·
Example 1.2.14.
Find the limit of each of the following sequences defined by
(a) an = ((n + 3)/n)^n = (1 + 3/n)^n.
Thus, lim(n→∞) an = lim(n→∞) (1 + 3/n)^n = e^3.
(b) bn = (1 + (1/8)/n)^(n+3) = (1 + (1/8)/n)^n (1 + (1/8)/n)^3,
and lim(n→∞) (1 + (1/8)/n)^n = e^(1/8) and lim(n→∞) (1 + (1/8)/n)^3 = 1,
⇒ lim(n→∞) bn = e^(1/8).

Example 1.2.15.
Find the limit of each of the following sequences defined by
(a) an = (1 + 2/(n + 1))^n
(b) bn = (1 − 3/n)^n
(c) cn = (1 + 1/(2n²))^(3n²−1)
Example 1.2.16.
If money is deposited in an account where interest of i% p.a. is compounded n times per year, after m periods of compounding an initial deposit P(0) is worth
P(0) (1 + i/n)^m.
After a period of time t (in units of years), the number of compounds will be m = nt, and the value P(t) of the account then is
P(t) = P(0) (1 + i/n)^(nt).
If the money P(0) is deposited in an account where interest of i% p.a. is compounded continuously, after a period of time t years, then the value P(t) of the account is the following limit:
P(t) = lim(n→∞) P(0) (1 + i/n)^(nt).
Find the value P(t) by calculating the limit.
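A symbolic check of this limit in Matlab (Symbolic Math Toolbox assumed; the rate is written r here to avoid Matlab's built-in imaginary unit i). In line with formula (1.3), it returns P0*exp(r*t):

%Matlab-session
syms n t P0 r positive
limit(P0*(1 + r/n)^(n*t), n, inf)   % P0*exp(r*t)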
1.2.2 Series
A series is an ordered sum of infinitely many numbers. If the numbers we are adding are a1, a2, a3, . . ., their sum
a1 + a2 + a3 + . . .   is written   ∑_{n=1}^∞ an.
Example 1.2.17.
Use the ∑ notation to express the following series.
(a) −1 + 1 − 1 + 1 − 1 + 1 − . . . =
(b) 1 − 1 + 1 − 1 + 1 − 1 + . . . =
(c) 1 + 1/2 + 1/3 + 1/4 + · · · =
(d) 1 + 1/2 + 1/4 + 1/8 + · · · =
(e) −1/2 + 1/4 − 1/8 + · · · =
We are mainly interested in whether a series "adds up" to some number. Often the value itself is not important. Some series obviously do not add up, e.g.
∑_{n=1}^∞ 1 = 1 + 1 + 1 + 1 + . . . = ∞,
while for others it isn't so easy to see.
Example 1.2.18.
1. ∑_{n=0}^∞ (1/2)^n = 2
2. Harmonic series: ∑_{n=1}^∞ 1/n = ∞
Example 1.2.19.
The Matlab code to evaluate the two series above is
% Calculate various series
syms n
symsum((1/2)^n, 0,inf)
symsum(1/n, 1,inf)
Partial sums
To find out whether a series has a finite sum, we start by adding up a few terms, and then progressively more, to see if there is a pattern in these partial sums.
Definition 1.2.20.
The partial sum of the first n terms of a series ∑_{n=1}^∞ an is denoted by sn. Thus,
s1 = a1
s2 = a1 + a2
s3 = a1 + a2 + a3
s4 = a1 + a2 + a3 + a4
  ⋮
sn = a1 + . . . + an
The partial sums of a series form a sequence: s1, s2, s3, . . . , sn, . . . .
In some cases, we can see a pattern in the sequence {sn}:
Example 1.2.21.
∑_{n=1}^∞ ln(n/(n + 1))
Using the rule of logarithms ln(a/b) = ln a − ln b:
sn = ln(1/2) + ln(2/3) + ln(3/4) + · · · + ln((n − 1)/n) + ln(n/(n + 1)) =
Definition 1.2.22.
If the sequence of partial sums of a series ∑_{n=1}^∞ an converges to a real number L, i.e. if lim(n→∞) sn = L, then we call L the sum of the series, and say the series ∑_{n=1}^∞ an converges to L; for short,
∑_{n=1}^∞ an = L.
Otherwise, we say the series ∑_{n=1}^∞ an diverges.
Note:
If ∑_{n=1}^∞ an = lim(n→∞) sn = ∞, then the series ∑_{n=1}^∞ an diverges.
It follows from Definition (1.2.22) that the series ∑_{n=1}^∞ ln(n/(n + 1)) in Example (1.2.21) diverges, as
lim(n→∞) sn = − lim(n→∞) ln(n + 1) = −∞.
This type of series is called telescoping, because of the way it simplifies. In the next subsection, we will apply Definition (1.2.22) (i.e., partial sums) to discuss geometric series.
However, sometimes a pattern in the partial sums isn't so easy to see, and we need some special methods.
Note: Applying Definition (1.2.22), the following properties can be established.
(a) For every positive integer n0:
(i) ∑_{n=1}^∞ an converges ⟹ ∑_{n=n0}^∞ an converges,
(ii) ∑_{n=1}^∞ an diverges ⟹ ∑_{n=n0}^∞ an diverges.
(b) For every non-zero real constant C:
(i) ∑_{n=1}^∞ C an converges ⟹ ∑_{n=1}^∞ an converges,
(ii) ∑_{n=1}^∞ C an diverges ⟹ ∑_{n=1}^∞ an diverges,
(iii) and furthermore, when ∑_{n=1}^∞ an converges, ∑_{n=1}^∞ C an = C ∑_{n=1}^∞ an.
(c) If ∑_{n=1}^∞ an and ∑_{n=1}^∞ bn converge, then
(i) ∑_{n=1}^∞ (an + bn) converges, and
(ii) ∑_{n=1}^∞ (an + bn) = ∑_{n=1}^∞ an + ∑_{n=1}^∞ bn.
Geometric series
If you decide to walk a certain distance, say 1 kilometre, each day walking half of the distance remaining, you'll never reach your destination no matter how long you live.
The first day you walk 1/2 km.
The second day you walk 1/4 km.
The nth day you walk (1/2)^n km.
If you kept to this programme forever, you'd walk the kilometre. So it must be that
∑_{n=1}^∞ (1/2)^n = 1.
The sum of all the distances, being successive powers of a fixed number 1/2, forms a geometric series.
Definition 1.2.23.
A series ∑_{n=1}^∞ an is geometric if an+1/an is constant for all n.
A geometric series can be written as
a ∑_{n=0}^∞ r^n
where
• the starting index is now 0, for convenience,
• r = an+1/an (a constant ratio of successive terms),
• a is the first term of the series.

Example 1.2.24.
Express the following geometric series in the form a ∑_{n=0}^∞ r^n.
(a) 1/4 + 1/8 + 1/16 + · · · =
(b) 1/3 = 0.3̇
Geometric series have the convenient property that if they converge, their sum can be found.
Partial sums of ∑_{n=0}^∞ r^n:
s1 = 1
s2 = 1 + r
s3 = 1 + r + r²
  ⋮
sn = 1 + r + r² + r³ + . . . + r^(n−1)
Now we use some algebra:
r sn =     r + r² + r³ + . . . + r^(n−1) + r^n
sn   = 1 + r + r² + r³ + . . . + r^(n−1)
⟹ r sn − sn = r^n − 1
⟹ sn (r − 1) = r^n − 1
⟹ sn = (1 − r^n)/(1 − r),   if r ≠ 1.
In this case, we have an expression for sn, the partial sum of the geometric series. It follows that the sum of a geometric series ∑_{n=0}^∞ r^n is
∑_{n=0}^∞ r^n = lim(n→∞) sn = lim(n→∞) (1 − r^n)/(1 − r).
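A quick numerical illustration of these partial sums (plain numeric Matlab, no toolbox needed), here for r = 1/2, where sn should approach 1/(1 − r) = 2:

%Matlab-session
r = 1/2;
n = 0:19;
s = cumsum(r.^n)            % partial sums of 1 + r + r^2 + ...
(1 - r.^(n+1))/(1 - r)      % the closed form (1 - r^n)/(1 - r); matches s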
Convergence result for Geometric Series
If |r| < 1:  ∑_{n=0}^∞ r^n = 1/(1 − r).
If |r| ≥ 1:  ∑_{n=0}^∞ r^n diverges.
Example 1.2.25.
For the following geometric series determine if the series converges or diverges and, if it converges, state the limit.
(a) ∑_{n=0}^∞ e^n
(b) ∑_{n=0}^∞ e^(−n)
(c) ∑_{n=1}^∞ (−1/3)^n
(d) ∑_{n=0}^∞ (3/2)^n
Example 1.2.26.
Write the number
0.61̇ = 0.611 . . . = 6/10 + 1/100 + 1/1000 + . . .
as a geometric series, and find its sum as a fraction.
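A symbolic check in Matlab (Symbolic Math Toolbox assumed): after the first digit, the repeating tail is a geometric series with ratio 1/10.

%Matlab-session
syms n
sym(6)/10 + symsum((sym(1)/100)*(sym(1)/10)^n, n, 0, inf)   % returns 11/18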
Ratio Test for convergence
There are many series which are not geometric. We introduce a test closely related to the test of convergence of geometric series, to determine the convergence of other more general series.
Ratio Test for ∑_{n=1}^∞ an: If lim(n→∞) |an+1/an|
  < 1, the series converges;
  > 1, the series diverges;
  = 1, the test is inconclusive.

Example 1.2.27.
Determine whether the given series converge or diverge.
(a) ∑_{n=0}^∞ e^n/n!
(b) ∑_{n=1}^∞ n!/n^n

Example 1.2.28.
Just so that you don't think this test works for every series:
(i) ∑_{n=1}^∞ 1/n (diverges)
(ii) ∑_{n=1}^∞ 1/n² (converges)
Note: It can be shown (using methods covered in MATHS 250) that in general:
Hyperharmonic or p-series
∑_{n=1}^∞ 1/n^p is convergent if and only if p > 1.
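A Matlab sketch of the ratio test for Example 1.2.27(a) (Symbolic Math Toolbox assumed): the ratio an+1/an = e/(n + 1) tends to 0 < 1, so the series converges.

%Matlab-session
syms n
a = exp(sym(1))^n/factorial(n);            % an = e^n/n!
L = limit(abs(subs(a,n,n+1)/a), n, inf)    % returns 0 < 1: converges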
1.2.3 Taylor Polynomials
Local Approximations
The tangent line to the graph of a function y = f (x) at a
point where x = c has equation
y = f (c) + f ′ (c)(x − c).
In some sense, the tangent line is the line “closest” to the
curve at this point.
The tangent line to f (x) at x = c shares the following
properties with y = f (x):
(i) the same y-value as y = f (x) at x = c,
(ii) the same y ′ -value as y = f (x) at x = c.
The tangent line to y = f (x) is called a linear approximation to f (x).
Example 1.2.29.
The tangent line of y = x² at (1, 1) is
y − 1 = 2(x − 1), or y = 2x − 1.
The equation of the tangent line y = 2x − 1 is a polynomial of degree one in x.

[Figure 1.3: y = x² and its tangent line y = 2x − 1 at (1, 1).]
Sometimes we want to make a better approximation than
a tangent line.
Consider the problem: given a function, find a polynomial of given degree k which is closest to the function, at
least in some local region.
This problem is of importance, since
• polynomials are easy to manipulate (e.g. to evaluate, graph, differentiate and integrate) and
• we can sometimes approximate a function well
about a specific point by a simple polynomial.
To ensure that we can find such a polynomial, we restrict
our study to functions f (x) which have at least as many
derivatives as the degree of the polynomial we want in
the region of interest.
If we only care to approximate f (x) about a point c up to
and including its kth derivative for some number k, then
we can do no better than its Taylor Polynomial pk (x):
Definition 1.2.30.
The Taylor Polynomial of degree k for f(x) about a point c is the polynomial pk(x) of degree k such that:
pk(x) = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2! + . . . + f^(k)(c)(x − c)^k/k!
      = ∑_{n=0}^k f^(n)(c)(x − c)^n/n!
When c = 0, we obtain the Maclaurin polynomial of degree k:
pk(x) = f(0) + f′(0)x + f″(0)x²/2! + . . . + f^(k)(0)x^k/k!
      = ∑_{n=0}^k f^(n)(0)x^n/n!
Note: If we find the Taylor polynomials pn(x) of y = f(x) about a point x = c, then
p0(x) is the horizontal line through (c, f(c)): p0(x) = f(c). It gives the correct value of the function at the point x = c.
p1(x) is the tangent line to f(x) through (c, f(c)): p1(x) = f(c) + f′(c)(x − c). It gives the correct value of the function and the slope of the tangent at the point x = c.
p2(x) is a parabola through (c, f(c)): p2(x) = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2!. It gives the correct value of the function, the slope of the tangent and the concavity at the point x = c.
pk(x) satisfies pk^(n)(c) = f^(n)(c) for n = 0, 1, 2, . . . , k. That is, f and its Taylor polynomial have the same value, and the same {first, second, . . . , kth} derivatives, at x = c.
Using Taylor polynomials to approximate functions introduces the notion of an error in the approximation:
Definition 1.2.31.
The error in using a Taylor polynomial pk(x) of y = f(x) to approximate the value of f(x) at a point x̄ is
error = |f(x̄) − pk(x̄)|.   (1.4)
Example 1.2.32.
For the function f(x) = e^x,
(a) find the Taylor polynomial of degree 3 about the centre c = 0, by tabulating f^(n)(x) and f^(n)(c) for n = 0, 1, 2, 3.
(b) At each of the following points, approximate e^x with p3(x), and use your calculator to find the error in this approximation. How does the error change as x approaches the centre of the approximation?
(a) x = 0   (b) x = 1   (c) x = −1   (d) x = 0.5   (e) x = −0.1   (f) x = 0.01

Figure 1.4: e^x and some of its Taylor polynomials p0, p1, p2, p3 about c = 0
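A Matlab sketch for this example (Symbolic Math Toolbox assumed; recent Matlab versions take the truncation order as a name-value argument, so 'Order', 4 gives the degree-3 polynomial):

%Matlab-session
syms x
p3 = taylor(exp(x), x, 0, 'Order', 4)            % 1 + x + x^2/2 + x^3/6
xs = [0 1 -1 0.5 -0.1 0.01];
errs = abs(exp(xs) - double(subs(p3, x, xs)))    % error shrinks toward the centre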
Example 1.2.33.
For the function f(x) = sin(x),
(a) find the Taylor polynomial of degree 5 about the centre c = 0, by tabulating f^(n)(x) and f^(n)(c) for n = 0, 1, . . . , 5.
(b) At each of the given points, approximate sin(x) with p5(x), and use your calculator to find the error in this approximation:
(a) x = 0   (b) x = −1   (c) x = 0.5   (d) x = −0.1   (e) x = 0.01

Figure 1.5: sin(x) and some of its Taylor polynomials p1, p3, p5 about the centre c = 0
Example 1.2.34.
For each of the given functions, find the Maclaurin polynomial p4(x), by tabulating f^(n)(x) and f^(n)(c) for n = 0, 1, . . . , 4:
(a) f(x) = ln(1 + x)
(b) f(x) = 1/(1 − x)
(c) f(x) = 1/(1 + x²)

Example 1.2.35.
Using Taylor or Maclaurin polynomials, approximate the following numbers to within an accuracy of 10^(−4).
(a) 1/0.95   (b) √4.01   (c) cos(π/12)   (d) √0.97   (e) 1/1.04   (f) ln 1.1

Example 1.2.36.
Matlab has a taylor function to compute Taylor polynomials, and a graphical tool taylortool to illustrate the approximation of Taylor polynomials to any given function. For example (using the current argument convention, where 'Order', k truncates at order k, giving degree k − 1):

%Matlab-session
syms x
% Taylor polynomial of degree 3 about c=0
taylor(1/(1+x), x, 0, 'Order', 4)
% Taylor polynomial of degree 4 about c=0
taylor(cos(x), x, 0, 'Order', 5)
% taylortool for graphical representation
taylortool('cos(x)')
1.2.4 Taylor Series
Definition 1.2.37.
The Taylor Series of a function f(x) about the centre c is the series:
∑_{n=0}^∞ f^(n)(c)(x − c)^n/n! = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2! + . . . + f^(n)(c)(x − c)^n/n! + . . .
When c = 0, the Taylor series becomes the Maclaurin series for f(x):
∑_{n=0}^∞ f^(n)(0)x^n/n! = f(0) + f′(0)x + f″(0)x²/2! + . . . + f^(n)(0)x^n/n! + . . .
Example 1.2.38.
Find the Maclaurin series for the following functions:
(a) e^x   (b) cos(x)   (c) sin(x)   (d) 1/(1 − x)

Example 1.2.39.
Find the Taylor series for 1/x about c = 1 by writing the function as an expression in the variable x − 1.
In the sections on geometric series and the ratio test, we saw series whose nth terms are constants. In Maclaurin and Taylor series we have series whose nth term involves a variable x. Such a series is known as a power series.
Definition 1.2.40.
(i) A power series in x has the form
a0 + a1 x + a2 x² + a3 x³ + · · · = ∑_{n=0}^∞ an x^n
and is said to be centred at x = 0.
(ii) More generally, a power series in (x − c) has the form
a0 + a1(x − c) + a2(x − c)² + a3(x − c)³ + · · · = ∑_{n=0}^∞ an(x − c)^n
and is said to be centred at x = c.
Notes:
• If we truncate a Taylor series at the (x − c)^k term, we get the corresponding Taylor polynomial pk(x).
• As x varies, the power series ∑_{n=0}^∞ an(x − c)^n may or may not converge.
Theorem 1.2.41.
One of the following three properties characterises any power series in x − c:
(i) The series converges for all x (an infinite interval of convergence).
(ii) The series converges only when x = c (a trivial interval of convergence).
(iii) There is a finite interval on the x-axis centred at c, say (c − R, c + R), such that
• within this interval, the series converges,
• outside this interval, the series diverges,
• at the end-points of this interval, anything may happen.
This interval is called the interval of convergence of the power series. The radius of convergence R is the distance from the centre to either end-point.
This result is useful for knowing when we can use Taylor approximations: if x̄ is in the interval of convergence of a Taylor series for f(x), then f(x̄) can be approximated by the value of any associated Taylor polynomial pk(x̄).
Example 1.2.42.
Write out the first few terms of the following power series. In the last two cases the series are geometric: identify when the series converges.
(a) ∑_{n=0}^∞ x^n/n!
(b) ∑_{n=0}^∞ (−1)^n x^(2n)/(2n)!
(c) ∑_{n=0}^∞ x^n
(d) ∑_{n=0}^∞ (x − 2)^n

Example 1.2.43.
All the power series in the preceding example are in fact Taylor series for particular functions. That is, these power series have the form:
∑_{n=0}^∞ an(x − c)^n = ∑_{n=0}^∞ f^(n)(c)(x − c)^n/n!
Can you recognise the functions?
Recall: the ratio test says a series ∑ an is convergent when lim(n→∞) |an+1/an| < 1.
So, a power series ∑ an(x − c)^n = ∑ un is convergent when
lim(n→∞) |un+1/un| < 1.
Solution Technique 1.2.44. To determine if a power series in x − c,
∑ an(x − c)^n = ∑ un,
converges at a point x:
• Put the value of x in the expression for the nth term. The series is no longer a power series but a regular series: each term is a number.
• Find the limit l = lim(n→∞) |un+1/un| and use the Ratio test.
Example 1.2.45.
The Taylor series for e^x about the centre c = 0 is ∑_{n=0}^∞ x^n/n!.
Determine whether the series converges when
(a) x = 0   (b) x = 1   (c) x = 100
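A Matlab check (Symbolic Math Toolbox assumed): for every fixed x the ratio-test limit is 0 < 1, so the series converges for all x, and symsum recovers e^x:

%Matlab-session
syms n x
u = x^n/factorial(n);
limit(abs(subs(u,n,n+1)/u), n, inf)   % |x|/(n+1) -> 0 for every fixed x
symsum(u, n, 0, inf)                  % returns exp(x)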
Taylor series from known formulae
It can be shown that the Taylor series of a function about a centre c is unique. The Taylor series of a function f(x) is nothing but a power series with the function f(x) as its sum.
For example, on the interval |x| < 1, the function f(x) = 1/(1 − x) is the sum of the power series ∑_{n=0}^∞ x^n. Thus, the power series ∑_{n=0}^∞ x^n is the Taylor series of f(x) = 1/(1 − x) on (−1, 1).
If a function f(x) can be expressed as a power series in any way, then that power series is the Taylor series of this function f(x).
Geometric series
a/(1 − r) = a + ar + ar² + ar³ + · · ·
(i) c = 0. An expression of the form A/(Bx + C) can be written as a/(1 − br) by solving for a and b. In this form, we apply the series expansion a/(1 − br) = a ∑_{n=0}^∞ (br)^n to obtain the Maclaurin series for A/(Bx + C).
(ii) c ≠ 0. Writing A/(Bx + C) = a/(1 − b(x − c)) and solving for a and b, we use this form to find the Taylor series for A/(Bx + C) about x = c.
Example 1.2.46.
Find the Taylor series for the function f(x) = 2/(7 − 3x) and the associated interval of convergence
(a) about the centre c = 0;
(b) about the centre c = 2.
Exponentials
The function y = e^x has Maclaurin series
e^x = 1 + x + x²/2! + x³/3! + · · · = ∑_{n=0}^∞ x^n/n!.
We can obtain the Taylor series for e^x centred at any c by rewriting the Maclaurin series for e^x as a series in x − c:
e^x = e^(c+(x−c)) = e^c e^(x−c) = e^c × ∑_{n=0}^∞ (x − c)^n/n!.
Example 1.2.47.
Use this technique to find the Taylor series for e^x about x = 1.
Example 1.2.48.
Find the Taylor series for the function f(x) = e^(2x+6) and give the interval of convergence I
(a) about centre c = 0;
(b) about centre c = 1.
We now investigate finding other Taylor series from known formulae. So far, we have found the following four such formulae:
1/(1 − x) = 1 + x + x² + . . . + x^n + . . . = ∑_{n=0}^∞ x^n,   −1 < x < 1.   (1.5)
e^x = 1 + x + x²/2! + . . . + x^n/n! + . . . = ∑_{n=0}^∞ x^n/n!,   −∞ < x < ∞.   (1.6)
sin x = x − x³/3! + . . . + (−1)^n x^(2n+1)/(2n + 1)! + . . . = ∑_{n=0}^∞ (−1)^n x^(2n+1)/(2n + 1)!,   −∞ < x < ∞.   (1.7)
cos x = 1 − x²/2! + x⁴/4! − . . . + (−1)^n x^(2n)/(2n)! + . . . = ∑_{n=0}^∞ (−1)^n x^(2n)/(2n)!,   −∞ < x < ∞.   (1.8)
Inside their interval of convergence, power series have a very nice property: like polynomials, they can be differentiated and integrated term by term.
Fact: If f(x) = ∑_{n=0}^∞ an(x − c)^n, then inside its interval of convergence:
f′(x) = ∑_{n=0}^∞ an n(x − c)^(n−1),
∫ f(x) dx = ∑_{n=0}^∞ an (x − c)^(n+1)/(n + 1) + C.
Logarithms
A most important application of this fact uses the relationship
ln(1 + x) = ∫ 1/(1 + x) dx
and the series
1/(1 − x) = 1 + x + x² + . . . + x^n + . . . = ∑_{n=0}^∞ x^n,   −1 < x < 1.
Example 1.2.49.
Find the Maclaurin series of ln(1 + x) and its interval of convergence.
ln(1 + x) = ∑_{n=0}^∞ (−1)^n x^(n+1)/(n + 1),   if −1 < x ≤ 1.
So, to find the Taylor series for ln(Ax + B) about x = c, we solve
Ax + B = a(1 + b(x − c)).
Example 1.2.50.
Find the Taylor series for the function f(x) = ln(5x − 4) and the interval of convergence about the centre c = 1.
Example 1.2.51.
Find the Taylor series for the function f(x) = 1/(x² − 3x + 2) about the centre c = 0 and the interval of convergence.
Hint: f(x) = 1/(x² − 3x + 2) = 1/((x − 1)(x − 2)) = 1/(x − 2) − 1/(x − 1).
Example 1.2.52.
(a) Find the Taylor series for sin x about the centre c = π/4.
(b) Find the Maclaurin series for the function f(x) = sin(x)/x.
(c) Find ∫ f(x) dx = ∫ (sin x)/x dx.
1.3 Integration Techniques
1.3.1 Review: Substitution
In MATHS 108, the basic rules for differentiation were covered, and integration was introduced.
Integration is the reverse process to differentiation. However, it is not as simple as differentiation and not every
expression can be integrated by a simple application of rules.
A rule of differentiation may produce a rule of integration. For instance, integration by substitution, covered in MATHS 108, is derived from the Chain Rule:
d/dx (f(u(x))) = f′(u(x)) u′(x).   (1.9)
Note: the Chain Rule produces a product, with one factor the derivative of part of the other. The technique of integration by substitution goes in the opposite direction:
∫ f′(u(x)) u′(x) dx = ∫ f′(u) du = f(u) + c.   (1.10)
Differentiation of this result with respect to x returns the original integrand f′(u(x)) u′(x), verifying the answer. By making a change of variables in anticipation of the Chain Rule, an integral may be transformed into a much simpler one.
Example 1.3.1.
Find ∫ x√(3x² + 5) dx.
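A symbolic check of this integral in Matlab (Symbolic Math Toolbox assumed), in the style of the int session in Example 1.3.8 below:

%Matlab-session
syms x
int(x*sqrt(3*x^2 + 5), x)   % (3*x^2 + 5)^(3/2)/9, plus an arbitrary constant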
Example 1.3.2.
Find the following integrals by substitution:
(a) ∫ (1 + 3x)⁵ dx
(b) ∫ x/(1 + x²) dx
(c) ∫ x/(1 + x) dx
(d) ∫ e^(sin θ) cos θ dθ
(e) ∫ ln(s)/s ds
(f) ∫ sin(2y) dy
(g) ∫ e^(−4x) dx
(h) ∫ u/(1 + u)⁴ du
(i) ∫ x√(2x² − 3) dx
(j) ∫ 4x/(2x² − 3)^(1/3) dx
(k) ∫ (ln(x))⁴/x dx
(l) ∫ cos(3t) sin(3t) dt
(m) ∫ tan(πθ) dθ
(n) ∫ sin(ln(z))/z dz
Not every integral can be evaluated directly or by substitution. For example, ∫ (ln x)/x dx = (1/2)(ln x)² + c can be found by a change of variables, but ∫ ln x dx = x ln x − x + c cannot be found directly or by substitution.
We move to another rule of differentiation for a second integration technique: the Product Rule yields Integration by Parts.
1.3.2 Integration by Parts
Recall that if u = u(x) and v = v(x) are differentiable functions of a variable x, then the Product Rule says
d/dx (uv) = (du/dx) v + u (dv/dx).
Integrating both sides with respect to x gives
uv = ∫ (du/dx) v dx + ∫ u (dv/dx) dx,
and rearranging terms,
∫ u v′ dx = uv − ∫ u′ v dx,   (1.11)
or
∫ u dv = uv − ∫ v du.   (1.12)
Note:
• The assignment of u(x) and v′(x) is crucial.
• There may be more than one way to assign them. Generally, choose for v′(x) a part which can be integrated easily, but only as long as the new integral ∫ u′(x)v(x) dx is at least as easy as the original integral.

Example 1.3.3.
Evaluate ∫ x e^x dx. Consider both choices:
(i) u = e^x, dv = x dx, so that du = e^x dx, v = x²/2:   ∫ x e^x dx =
(ii) u = x, dv = e^x dx, so that du = dx, v = e^x:   ∫ x e^x dx =

When applying the integration by parts formula ∫ u dv = uv − ∫ v du, choose the function to be differentiated, u(x), from as high on the following list as possible (LIATE):
• Logarithms
• Inverse trigonometric functions
• Algebraic functions
• Trigonometric functions
• Exponential functions
Example 1.3.4.
Evaluate ∫ (x − 2) sin 2x dx.
Example 1.3.5.
Evaluate ∫ x ln x dx.
Example 1.3.6.
Evaluate ∫ ln x dx. (Hint: Recognise the integrand as the product 1 · ln x.)
Note:
We may use integration by parts repeatedly,
. . . or we may construct an equation and solve it.
Example 1.3.7.
Evaluate ∫ t² e^(−3t) dt.
Example 1.3.8.
The Matlab code to integrate in the previous example is
% Matlab-session
% - symbolic integration
syms t
int(t^2*exp(-3*t),t)
Example 1.3.9.
Evaluate ∫ sin(θ) e^(−θ) dθ.
Example 1.3.10.
Find the following integrals:
(a) ∫ ln(2x)/x dx
(b) ∫ x√(x² + 1) dx
(c) ∫ x/√(x² − 1) dx
(d) ∫ 5x²/√(2x³ + 1) dx
(e) ∫ 4x² (3x³ + 2)^(1/5) dx
(f) ∫ e^(3x−4) dx
(g) ∫ x² ln(x) dx
(h) ∫ x e^(3x−4) dx
(i) ∫ 3/(2x − 1) dx
(j) ∫ 2x e^(−x+4) dx
(k) ∫ (x² + 1) e^(2−3x) dx
(l) ∫ (x² − x + 3) e^(2−3x) dx
(m) ∫ 2x²√(3x³ + 2) dx
(n) ∫ cos(πθ) e^θ dθ
(o) ∫ 6/(2 − 3x) dx
(p) ∫ sin(πx) e^(−2x) dx
Linear algebra
2.1 Vector Spaces
Throughout this course, all vectors and matrices we consider will have real components - values on the real number line R.
We consider Rn as an extension of R2 and R3 . The
geometry of vectors, points, lines and planes in the 2-dimensional plane R2 and 3-dimensional space R3 is extended to vectors of n components.
A vector can be thought of as either
(a) a geometrical object in R2 or R3 with length and direction, or
(b) an ordered list of n components (row or column), as for example the variable x and constant b in a system of linear equations Ax = b.

[Figure: head-to-tail vector addition in R3, showing u, v, w, u + v and u + v + w.]
2.1.1 Review: Solving Linear Systems of Equations
Linear algebra is chiefly concerned with techniques for
solving systems of linear equations.
A system of m linear equations in the n unknowns x1, x2, · · · , xn has the form

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
  ...
am1 x1 + am2 x2 + · · · + amn xn = bm     (2.13)
All coefficients aij , right-hand-side terms bi and unknowns
xi will be real numbers in Maths 208.
The system (2.13) can be written in matrix form as
Ax = b,
where
• A is an m × n matrix,
• x an n × 1 vector of unknowns, and
• b an m × 1 vector of numbers.
If b = 0, we call the system homogeneous.
Otherwise we call it inhomogeneous. Recall that any
system of linear equations has either
(i) one solution,
(ii) infinitely many solutions,
(iii) no solution.
Elementary Row-Operations
We classify three types of “elementary row-operations”:
(i) row-exchange: Interchange any two rows (ri ↔ rj ).
(ii) row-multiple: Multiply a row by a non-zero constant (ri → kri ).
(iii) row-addition: Replace a row by itself plus any multiple of another row (ri → ri − krj ).
Definition 2.1.1.
Two matrices A and B related by a series of row-operations
are written “A∼B”.
We say A and B are row-equivalent.
Echelon and Reduced Echelon Form
Definition 2.1.2.
(a) A matrix A (not necessarily square) is in echelon form if
• all rows of zeros are at the bottom
• the first non-zero entry in every row (called
a “pivot” or “leading entry”) is to the right of
the first non-zero entry in the previous row
(step-like pattern of leading entries)
• the leading entry in every row has zeros below it
(b) A matrix A is in reduced echelon form if
• it is in echelon form
• the leading entry in every row (the pivot) is 1
• each leading 1 is the only non-zero entry in
its column
Definition 2.1.3.
The number of leading entries (pivots) in the echelon
form of a matrix A is called the rank of A, denoted by
rank(A).
Notes:
1. Echelon form for a matrix is not unique - there are many possibilities.
However, in echelon form the number of pivots is unique. This is the number of non-zero rows in echelon
form.
2. Reduced echelon form for any matrix is unique (there is only one).
You can already solve a linear system in several ways:
(a) If A is square (n × n) and has an inverse: x = A−1 b.
(b) Elementary row-operations can be used to reduce the augmented matrix [A|b] to
(i) echelon form: use back-substitution to solve.
(ii) reduced echelon form: read off the solution.
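You can see (b)(ii) in action in Matlab; a small sketch with an arbitrary invertible example (not a matrix from these notes):

%Matlab-session
% solve Ax = b by reducing the augmented matrix [A|b]
A=[1 2; 3 4];
b=[5; 6];
rref([A b])   % reduced echelon form: the solution is the last column
A\b           % Matlab's built-in solver, for comparison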
Summary 2.1.4.
For square matrices, the following are equivalent conditions:
• A−1 exists – i.e. A is non-singular, or invertible
• det(A) ≠ 0
• A ∼ In
• every row and column of echelon form has a pivot
• rank(A) = n
• the system of linear equations Ax = b has a unique solution
for every b
• the homogeneous system of linear equations Ax = 0 has only
the trivial solution x = 0
And when A does not have an inverse:
Summary 2.1.5.
For square matrices, the following are equivalent conditions:
• A−1 does not exist – A is a singular matrix
• det(A) = 0
• A ≁ In
• not every row and column of echelon form has a pivot
• rank(A) < n
• the homogeneous system of linear equations Ax = 0 has a
non-trivial solution x ≠ 0
Now we begin to study vectors.
Notation: We will sometimes write vectors as row-vectors, and sometimes as columns. We will not distinguish
between the two except where matrix multiplication dictates one form over the other.
2.1.2 Linear Combinations
Definition 2.1.6.
A vector b is a linear combination of vectors v1, v2, . . . , vk if we can write
b = x1 v1 + x2 v2 + · · · + xk vk     (2.14)
for scalars x1, x2, . . . , xk.
Note: We can write this linear combination in matrix form, letting A = [v1 v2 · · · vk] be the matrix with columns v1, . . . , vk. Then (2.14) becomes b = Ax.
So a linear combination of vectors (2.14) is no more than the vector form of a matrix equation.
We will spend some time studying how to simplify linear combinations. They sometimes appear more complicated than they really are.
Example 2.1.7.
(a) Find constants c1 and c2 so that [2; 3; 4] = c1 [1; 1; 1] + c2 [1; 2; 3].
(b) Use your answer to (a) to simplify the linear combination
x1 [1; 1; 1] + x2 [1; 2; 3] + x3 [2; 3; 4].
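Part (a) can be checked in Matlab; backslash returns the coefficients because the system is consistent (a sketch using the vectors above):

%Matlab-session
% Example 2.1.7(a): solve for c1 and c2
A=[1 1; 1 2; 1 3];   % columns are the vectors [1,1,1] and [1,2,3]
b=[2 3 4]';
c=A\b                % c = [c1; c2] = [1; 1]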
It isn’t always easy to see how to simplify a linear combination of vectors. That’s the subject of our next section.
2.1.3 Linear Independence and Dependence
Linear independence is a very important notion: it holds
the key to the idea of dimension, and is fundamental in
most applications of linear algebra.
Definition 2.1.8.
(i) We say that a set of vectors {v1 , v2 , . . . , vn } is
linearly independent if the only solution of
c1 v1 + c2 v2 + · · · + cn vn = 0
(2.15)
is c1 = c2 = . . . = cn = 0.
(ii) A set of vectors which is not linearly independent is called linearly dependent.
Notes:
• (2.15) is the vector form of the matrix equation Ax = 0 for the matrix A = [v1 v2 · · · vn].
From this perspective, vectors v1, v2, . . . , vn are linearly independent if
Ax = 0 ⇒ x = 0,
and linearly dependent if Ax = 0 for some x ≠ 0.
• Linear dependence of a set of vectors {v1, . . . , vn} means one of the vectors vi can be written as a linear combination of the others.
From the reduced echelon form of a matrix (sometimes
abbreviated rref), it is particularly easy to detect linear
dependence among its columns.
Example 2.1.9.
Consider the columns of the matrix
A = [1 1 1 1; 1 1 1 1; 1 2 3 4] = [v1 v2 v3 v4].
Reducing A to reduced echelon form,
[1 1 1 1; 1 1 1 1; 1 2 3 4] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0] = U.
We have the augmented system for simultaneously solving the pair of equations
c1 v1 + c2 v2 = v3
d1 v1 + d2 v2 = v4 .
 
As any vector [a; b; c] can be written
[a; b; c] = a [1; 0; 0] + b [0; 1; 0] + c [0; 0; 1],     (2.16)
we can write the non-pivot columns of reduced echelon form in terms of the pivot columns. So in the matrix U above,
column3 = −column1 + 2·column2,
column4 = −2·column1 + 3·column2.
(a) Verify that in the original matrix A, the same
relationships hold, establishing the linear dependence of the sets {v1 , v2 , v3 } and {v1 , v2 , v4 }.
(b) Using these relationships, simplify the general
linear combination
c1 v1 + c2 v2 + c3 v3 + c4 v4
(c) Alternatively, solve the homogeneous system
c1 v1 + c2 v2 + c3 v3 + c4 v4 = 0
directly to determine linear dependence. Compare
with (a).
Row-reduction of a matrix to reduced echelon form preserves the relationships between the columns, but makes
them more obvious!
Linear dependence among vectors v1, v2, · · · , vk is visible in the reduced echelon form U of a matrix:
A = [v1 v2 · · · vk] ∼ [1 ∗ 0 0 ∗ ∗ . . . a; 0 0 1 0 ∗ ∗ . . . b; 0 0 0 1 ∗ ∗ . . . c; 0 0 0 0 0 0 . . . 0; 0 0 0 0 0 0 . . . 0] = U.
• Divide the reduced echelon-form matrix U between the pivot columns and the non-pivot columns.
• The set of pivot columns of U is linearly independent. So also is the set of corresponding columns of A.
• Each non-pivot column of U is a linear combination of the pivot columns, by (2.16).
• The same linear relationships hold for the columns of A.
• As illustrated, vk = a v1 + b v3 + c v4.
Notes:
• The ordering of the vectors v1 , v2 , · · · vk is not
unique, but this ordering determines which relationships the reduced echelon form elicits. We
could have written the vi in a different order to
get a different linearly-independent set, and a different, but equivalent, set of linear dependence relationships.
• A pivot column of reduced echelon form does not
yield a linear relationship with other columns –
there are only zeros above and below the pivot.
• The column-relationships of reduced echelon form
hold for each intermediary matrix in the row-reduction.
FACT A matrix of rank s has exactly s linearly independent columns.
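Matlab's rref makes these column relationships visible; a sketch using the matrix of Example 2.1.9:

%Matlab-session
% reduced echelon form exposes the column relationships
A=[1 1 1 1; 1 1 1 1; 1 2 3 4];
U=rref(A)   % read the dependence relations from the non-pivot columns
rank(A)     % the number of linearly independent columns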
Example 2.1.10.
Identify the linear relationships between the columns of the given matrices from their reduced-echelon forms.
(a) [1 2 3; 2 3 4; 3 4 4]
(b) [1 1 1; 1 2 3; 1 0 −1; 0 0 1] ∼ [1 0 0; 0 1 0; 0 0 1; 0 0 0]
(c) [0 1 1 1; 0 1 2 3; 1 0 1 0; 0 0 1 1] ∼ I4
(d) [0 1 1 1 0; 0 1 0 1 0; 1 1 1 −1 0; −1 1 0 1 0] ∼ [1 0 0 0 0; 0 1 0 0 0; 0 0 1 0 0; 0 0 0 1 0]
(e), (f) [two further matrices with their reduced echelon forms; illegible in this copy]
Example 2.1.11.
Verify the following for any vectors you choose:
(a) A single non-zero vector is always linearly independent.
(b) Two vectors are linearly independent if they are not multiples of each other (i.e. if they are not parallel).
(c) Any two non-zero vectors that are linearly dependent are multiples of each other.
(d) Any n + 1 vectors in Rn are linearly dependent.
Example 2.1.12.
Are the following sets of vectors linearly independent? If not, find a linear relationship between them.
(a) {[1, 2, 3], [2, 4, 6]}
(b) {[1, 2, 3], [2, 4, −6]}
(c) {[1, 2], [1, 3], [1, 4]}
(d) {[1, 0, 0], [0, 1, 0], [0, 0, 1]}
(e) {[1, 1, 0], [2, 1, 0], [1, 0, 1]}
(f) {[−1, 1, 2], [1, −2, 0], [−1, 0, 1], [−1, 1, 0]}
The span of a set of vectors
Two distinct points (one bound vector) determine a line. Three points not all on the same line (two linearly-independent vectors) determine a plane. The plane is the “span” of these vectors.

Definition 2.1.13.
The span of vectors u1, u2, . . . , uk in Rn is {c1 u1 + · · · + ck uk : c1, · · · , ck ∈ R}, denoted by Span(u1, · · · , uk).

[Figure: the span of two vectors u and v in R3 is the plane containing u, 2u, v and u + v.]

Note: We use the notation {x, y} to denote the set containing two members, x and y.
The span of a set of vectors is the set of all linear combinations of that set of vectors.
We say that vectors u1, u2, . . . , uk span a set X if every x ∈ X can be written
x = c1 u1 + · · · + ck uk, for c1, · · · , ck ∈ R.
Note: A linear combination of linearly-dependent vectors can always be simplified, so the span of a linearly-dependent set is actually the span of a smaller set of linearly-independent vectors.
Example 2.1.14.
The span of any number of vectors is closed under scalar multiplication and addition.
Consider the case of just two vectors u1 and u2 .
Two vectors x = c1 u1 + c2 u2 and y = d1 u1 + d2 u2 are both members of Span(u1, u2). In addition,
rx = (rc1 )u1 + (rc2 )u2
x + y = (c1 + d1 )u1 + (c2 + d2 )u2
are also in Span(u1 , u2 ).
Recall that
(i) the equation of a line through the origin in R2 is
[x; y] = t [d1; d2],   t ∈ R,
(ii) the equation of a line through the origin in R3 is
[x; y; z] = t [d1; d2; d3],   t ∈ R,
(iii) the equation of a plane through the origin in R3 is
[x; y; z] = s [d1; d2; d3] + t [d1′; d2′; d3′],   s, t ∈ R,
where the sets of direction vectors {d} and {d, d′} are linearly independent.
Summary 2.1.15. (i) The span of one linearly independent vector in R2 or R3 is a line through the origin.
(ii) The span of two linearly independent vectors in R2 is all of R2 .
(iii) The span of two linearly independent vectors in R3 is a plane through the origin.
(iv) The span of three linearly independent vectors in R3 is all of R3 .
Example 2.1.16.
Characterise the following sets in R3, and sketch them.
(a) The span of u1 = [2, 1, 1].
(b) span{[2, 1, 1], [4, 2, 2]}.
(c) span{[2, 1, 0], [1, 2, 0]}.
(d) span{[2, 0, 1], [3, 0, −1]}.
(e) span{[2, 0, 1], [3, 0, −1], [4, 0, 3]}.
(f) span{[2, 0, 1], [3, 0, −1], [0, 1, 0]}.
Example 2.1.17.
The Matlab function spanner.m (available from course
resources) generally takes two arguments, both vectors
in R2 or R3 , and illustrates their span. Copy it to your
working directory, and call it with two vectors, or as below with two vectors and a pause interval (in seconds).
Rotate the graph to see the shape of the span.
%Matlab-session: span of two vectors
a=[1 2 2];
b=[1 1 1]; % row or column vectors
spanner(a,b);
Figure 2.6: Output of vs1.m.
2.1.4 Definition of a Vector Space
In its abstract form, a vector space is a non-empty set V of objects, called vectors, for which addition and
multiplication by scalars are defined. For any vectors u, v and w in V , and any scalars r and s, the following
ten properties are satisfied:
1. u + v is in V
2. u + v = v + u
3. u + (v + w) = (u + v) + w
4. There is a zero vector 0 in V so that u + 0 = u
5. u in V means −u is also in V , and u + (−u) = 0
6. ru is in V
7. r(u + v) = ru + rv
8. (r + s)u = ru + su
9. r(su) = (rs)u
10. 1u = u
Theorem 2.1.18.
The span of any set of vectors in Rn is a vector space.
Euclidean Vector Spaces Rn
The physical entities we know as lines, planes and 3-d space can easily be seen to be vector spaces - by confirming
that they satisfy all the properties above. We call them R1 , R2 and R3 respectively. You live in R3 .
[Figure: the Euclidean vector spaces R1, R2 and R3 - a line, a plane, and 3-dimensional space.]
With our geometric intuition about lines, planes and space, we can attribute a lot of properties to these vector
spaces.
Many of these properties extend, algebraically at least, to sets of vectors with an arbitrary number n of components. The set of all such vectors turns out to be a vector space too, Rn .
Example 2.1.19.
The vectors in Rn are the n-tuples (x1 , x2 , · · · xn ), where each component xi is real. We can easily verify that
Rn is a vector space for any integer n > 0.
Definition 2.1.20.
For any positive integer n, the vector space Rn of all vectors with n components is known as (n-dimensional)
Euclidean vector space.
Other vector spaces
There are a variety of things which are not geometrically recognisable Euclidean vector spaces, but which have
an analogous structure, and can therefore be called real vector spaces.
Some examples:
1. An m×n matrix may be considered as a list of mn numbers, written in a particular rectangular fashion. Under
addition of matrices and multiplication of matrices by a
scalar, all m × n matrices form a vector space.
2. Polynomials of degree ≤ n in the variable x: all expressions of the form p(x) = a0 + a1 x + · · · + an xn .
3. In the following subsections we will meet the set of all solutions of the matrix equation Ax = 0 for any particular m × n matrix A. This set is called the nullspace of the matrix A, denoted Null(A).
4. Also, all vectors b for which the linear system Ax = b
has a solution, for any given matrix A. This set is called
the column space of the matrix A, denoted Col(A).
5. In the Differential Equations section we will see systems of homogeneous linear ordinary differential equations, e.g.,
(a) dy/dt + y = 0,
(b) d²y/dt² + 4 dy/dt + 3y = 0.
The set of solutions of each of these differential equations forms a vector space of functions.
We conclude by mentioning that everything we do with real numbers, R, can equally well be done with complex
numbers C.
Vector spaces over C are of great importance in many applications, but for this course we will stick to real vector
spaces.
2.1.5 Basis and Dimension
Definition 2.1.21.
If V is a vector space and B a set of vectors in V , we say
that B is a basis of V if
(i) B is a linearly independent set;
(ii) B spans V .
Notes:
• It’s easy to determine whether a set of vectors B is
linearly independent (the rank of a matrix counts
its linearly independent columns).
• To determine whether the set B spans the vector
space V , we must verify that every y ∈ V is a
linear combination of vectors in B.
(Linear independence ensures that there are not too many
vectors in a basis. Spanning ensures that there are not too
few. )
If a vector space V has a basis B, then
(a) any set of vectors in V that is larger than B is linearly dependent.
(b) any set of vectors in V that is smaller than B does
not span the vector space.
The standard basis of Rn.
Among the bases of Rn, the special basis of vectors each with one component equal to 1 and all other components 0,
e1 = [1; 0; . . . ; 0], e2 = [0; 1; . . . ; 0], . . . , en = [0; 0; . . . ; 1],
is called the standard basis of Rn.
Any vector x ∈ Rn can be written
[x1; x2; . . . ; xn] = x1 e1 + x2 e2 + · · · + xn en.
Example 2.1.22.
(a) Sketch the standard bases for R2 and R3 .
(b) To sketch a plane in R3 , sketch its basis vectors
and complete a parallelogram with lines parallel
to these vectors.
Definition 2.1.23.
Any vector space V that is not just the zero vector has infinitely many bases. But every basis for V has the same
number of elements. This number is called the dimension of V , denoted dim(V ).
Note: The dimension of a vector space is the number of
linearly independent vectors required to span the vector
space.
We already have one basis, the standard basis, for any
Euclidean vector space Rn . This gives the result:
FACT
Euclidean vector space Rn has dimension n.
This result agrees with our colloquial use of the word
dimension – we think of
• lines as one-dimensional,
• planes (like the black-board) as 2-dimensional
• space as 3-d.
2.1.6 Subspaces of Rn
Definition 2.1.24.
Let W be a set of some vectors in Rn , i.e. W is a subset
of Rn . If W satisfies the following properties:
(i) if v1 and v2 are in W , then v1 + v2 ∈ W ,
(ii) if v ∈ W and r is a scalar, then rv ∈ W ,
then the set W is called a subspace of Rn .
In other words, a subspace W of Euclidean space Rn
contains the origin, and is closed under addition and scalar
multiplication.
Notes:
1. Every subspace of any vector space contains the
origin. (Let r = 0 in (ii) above.)
2. A subspace W of a vector space V is a vector
space within a vector space.
3. dim(W ) ≤ dim(V ).
FACT The span of any set of vectors in Rn forms a vector space. If we call this vector space W, and if the spanning vectors are a linearly independent set, they form a basis for W.
Example 2.1.25.
(a) A one-dimensional subspace of Rn is called a line.
(b) A two-dimensional subspace of Rn is called a plane.
Etc.
Example 2.1.26.
Show that the line [x; y; z] = [1; 1; 3] + t [1; 2; 0] in R3 is not a subspace of R3.
Finding a basis for a subspace of Rn
Example 2.1.27.
Suppose we have a subspace S of R3: S = Span{[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]}.
The vectors v1, . . . , v4 of S span it - we simply have to determine a linearly independent set among them.
From the reduced echelon form:
A = [1 2 3 4; 5 6 7 8; 9 10 11 12] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0].
v1 and v2 are linearly independent, so they form a basis for S.
Could we have seen this without reducing A to reduced
echelon form?
• We could look at the echelon form of A, and observe which columns contain the pivots.
• These are the pivot-columns in reduced echelon form too.
• The corresponding columns of A are linearly independent, and form a basis for the vector space
S.
To find a basis for the span of vectors v1 , v2 , · · · , vn :
• form the matrix A = [v1 v2 · · · vn ] with these
vectors as its columns,
• reduce A to echelon form U (not unique),
• identify the columns of U containing the pivots,
• the corresponding columns of the original matrix
A are linearly independent and form a basis for
Span {v1 , v2 , · · · , vn }.
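This recipe translates directly into Matlab, whose rref can return the pivot-column indices as a second output; a sketch using the matrix of Example 2.1.27:

%Matlab-session
% basis for the span of the columns of A
A=[1 2 3 4; 5 6 7 8; 9 10 11 12];
[R,jb]=rref(A);   % jb lists the pivot columns
basis=A(:,jb)     % the corresponding columns of A form a basis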
Example 2.1.28.
Given the set of vectors S = {[1, 1, 1], [1, 1, −1], [3, 3, −1], [0, 0, 1]},
find a basis for the vector space in R3 spanned by the set S.
2.1.7 Matrices and their Associated Subspaces in Rn
Associated with any m × n matrix A are two special subspaces: its nullspace Null(A), and its column space Col(A).

Column space
Definition 2.1.29.
The column space of a matrix A, denoted Col(A), is the span of its columns.
If A = [v1 v2 · · · vn] is an m × n matrix,
Col(A) = Span{v1, v2, · · · , vn}     (2.17)
       = {x1 v1 + x2 v2 + · · · + xn vn : x ∈ Rn}
       = {b ∈ Rm : Ax = b for some x ∈ Rn}.
Example 2.1.30.
As the span of a set of vectors, the column space Col(A)
of any matrix A is a subspace of Rm .
A basis for Col(A) is found with the technique of Section 2.1.6.
To find a basis for the column space Col(A) of a
matrix A,
• reduce A to echelon form U (not unique)
• identify the columns of U containing the pivots
• the corresponding columns of the original matrix
A form a basis for the column space Col(A).
Example 2.1.31.
Find a basis and the dimension of the column space of the matrices
(a) A = [1 2 −1 2; 1 2 1 −2; 1 2 −3 6] ∼ [1 2 −1 2; 0 0 1 −2; 0 0 0 0]
(b) B = [1 2 1; 2 4 3; 3 6 4] ∼ [1 2 1; 0 0 1; 0 0 0]
We can only solve a system of equations Ax = b if b ∈ Col(A).
For if A = [v1 v2 · · · vn] is an m × n matrix, then
Ax = x1 v1 + x2 v2 + · · · + xn vn
(Ax is a linear combination of the columns of A). So
Ax = b ⟺ b ∈ Col(A).
FACT
A system of linear equations Ax = b is consistent (has a solution)
if and only if b is a linear combination of the columns of A.
Example 2.1.32.
The system of equations [1 1; 1 1][x; y] = [1; 2] is inconsistent.
We cannot write [1; 2] = x1 [1; 1] + x2 [1; 1] (see Figure 2.7).

[Figure 2.7: Ax = b is inconsistent when b ∉ Col(A); here Col(A) is the line spanned by [1; 1], and the point (1, 2) lies off it.]
Example 2.1.33.
Matlab has a symbolic colspace command to find the
column space of a matrix.
%Matlab-session
%
a=[1 1 1;1 0 -1;1 -2 1];
colspace(sym(a))
Nullspace
Definition 2.1.34.
If A is an m × n matrix, the set of all solutions of the
homogeneous system Ax = 0 is called the nullspace of
A. It is denoted by Null(A).
To find Null(A), solve Ax = 0 for x.
Example 2.1.35.
Find the nullspace of the matrix
A = [1 2 3 4; 2 3 4 5; 3 4 5 6] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0]
and write it as the span of a set of vectors.
Notes:
• Null(A) is what we call the general solution of the
system Ax = 0.
• Every matrix has the vector 0 in its nullspace.
• If A has linearly independent columns, Null(A) = {0}, the set containing only the zero vector – the only solution of Ax = 0 is the trivial solution x = 0.
• If A has linearly dependent columns, Null(A) ≠ {0} – there are non-trivial solutions x ≠ 0 of Ax = 0.
Recall that the rank of a matrix counts the number of
linearly independent columns.
Example 2.1.36.
Find the nullspace of the following matrices by finding the general solution of the associated homogeneous systems:
(a) A = [1 2; 3 4] ∼ [1 0; 0 1].
(b) B = [1 2 3; 4 5 6; 7 8 9] ∼ [1 0 −1; 0 1 2; 0 0 0].
If Bx = 0, then x = [x1; x2; x3], x3 is free, and the variables x1 and x2 are bound:
[x1; x2; x3] = x3 [1; −2; 1].
(c) C = [1 2 3 4; 5 6 7 8; 9 10 11 12] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0].
If Cx = 0, then
x = [x1; x2; x3; x4] = [x3 + 2x4; −2x3 − 3x4; x3; x4] = x3 [1; −2; 1; 0] + x4 [2; −3; 0; 1].
(d) A = [1 2 3 4 5; 6 7 9 10 11; 12 13 14 15 16] ∼ [1 0 0 −1 −2; 0 1 0 1 2; 0 0 1 1 1].
(e) A = [1 2 3 4 5; 6 7 8 10 11; 12 13 14 15 16] ∼ [1 0 −1 0 −1; 0 1 2 0 1; 0 0 0 1 1].
Example 2.1.37.
For A an m × n matrix, Null(A) is a subspace of Rn .
That is:
FACT The set of solutions of any homogeneous system
of linear equations in n unknowns is a subspace of Rn .
Definition 2.1.38.
The dimension of Null(A) is called the nullity of A.
• If the n columns of A are linearly independent,
Null(A) = {0}, and we say the nullity of A is 0.
• The dimension of Null(A) is the number of free
variables in the solution of the system Ax = b :
the total number of variables (n) minus the number of bound variables (rank(A)).
To find a basis for the nullspace Null(A) of an m × n
matrix A:
• reduce A to echelon form U ,
• back-substitute to solve U x = 0 – this solves
Ax = 0,
• write the solution in vector form, as a linear combination of some vectors,
• a basis for Null(A) consists of these vectors in Rn .
Example 2.1.39.
Find the nullity of the given matrices:
(a) A = [1 2 3; 4 5 6; 7 8 9] ∼ [1 0 −1; 0 1 2; 0 0 0]
(b) B = [1 2 3 4; 5 6 7 8; 9 10 11 12] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0]
(c) C = [1 0 0 −1; 0 0 −1 1; 0 0 0 0]
Example 2.1.40.
The Matlab function eigshow illustrates the nullspace of a 2 × 2
matrix very well. Use your mouse to move the x vector around,
and see where Ax is.
When Ax is at the origin, x ∈ Null(A).
%Matlab-session
A=[1 2; 2 4]
rank(A)
null(A)                  % unit vectors
null(sym(A))             % not unit-vectors
B=[1 2 3; 2 3 4; 4 5 6]
u=null(sym(B))
eigshow([1 3;4 2]/4);    % rank 2 matrix
eigshow(A);              % rank 1 matrix

FACT Vectors in Null(A) are orthogonal to every row of A.
Example 2.1.41.
Verify the fact above for the matrix
A = [1 2 3 4; 2 3 4 5; 3 4 5 6] ∼ [1 0 −1 −2; 0 1 2 3; 0 0 0 0].
2.1.8 The General Solution of Ax = b
The nullspace of a matrix A, the solution of Ax = 0,
plays an important part in the general solution of inhomogeneous systems of equations Ax = b. This is because matrix multiplication is a linear transformation: it possesses the property of linearity.
Linearity
Definition 2.1.42.
For any matrix A, compatible vectors u and v, and scalar
c,
A(cv) = cAv,
A(u + v) = Au + Av
These two properties together characterise linear transformations.
Note: We say matrix multiplication is a linear operation.
So also are the operations of differentiation and integration:
• (c1 f + c2 g)′ = c1 f′ + c2 g′,
• ∫ (c1 f + c2 g) dx = c1 ∫ f dx + c2 ∫ g dx.
Solutions of Ax = b
In particular, if u ∈ Null(A),
A(cu + v) = Av
So if we know just one (any) solution v of Ax = b, then
we know a whole family of solutions of Ax = b:
{v + u : Av = b and Au = 0}.     (2.18)
We call any solution of Ax = b a particular solution.
It turns out that (2.18) describes every possible solution
of Ax = b.
The general solution of an inhomogeneous system Ax = b, the set of all possible solutions, can be written:
{v + u : Av = b, u ∈ Null(A)}.     (2.19)
In words,
general solution of Ax = b = particular solution of Ax = b + general solution of homogeneous system Ax = 0.     (2.20)
The reduced echelon form of the augmented matrix contains all this information:
Example 2.1.43.
The linear system Ax = b given by
x1 + 2x2 + 3x3 + 4x4 = 5
6x1 + 7x2 + 8x3 + 9x4 = 10
11x1 + 12x2 + 13x3 + 14x4 = 15
has reduced echelon form
[1 2 3 4 5; 6 7 8 9 10; 11 12 13 14 15] ∼ [1 0 −1 −2 −3; 0 1 2 3 4; 0 0 0 0 0].
Show that the general solution of Ax = b is
x = [−3; 4; 0; 0] + c1 [1; −2; 1; 0] + c2 [2; −3; 0; 1], where c1, c2 ∈ R.
Notes:
• The general solution of a system of equations Ax =
b is a translation of the nullspace.
• When solutions to a linear system of equations
Ax = b exist, the nullspace of A provides the
“infinitely many” part of the solution. The general
solution x to Ax = b has the same contribution
from Null(A) for all b for which a solution exists.
• If the echelon form of A has k columns without a
pivot, there are k free variables in the solution of
every consistent system Ax = b.
Example 2.1.44.
Find the general solutions of the following inhomogeneous systems. Compare with your answers to Ex 2.1.36.
(a) [1 2 3; 4 5 6; 7 8 9][x1; x2; x3] = [−1; 2; 5]
Note: [1 2 3 −1; 4 5 6 2; 7 8 9 5] ∼ [1 0 −1 3; 0 1 2 −2; 0 0 0 0]
(b) [1 2 3 4; 5 6 7 8; 9 10 11 12] x = [2; 4; 6]
Note: [1 2 3 4 2; 5 6 7 8 4; 9 10 11 12 6] ∼ [1 0 −1 −2 −1; 0 1 2 3 3/2; 0 0 0 0 0]
(c) [3 2 1; 6 4 2; 9 6 3] x = [5; 10; 15]
Note: [3 2 1 5; 6 4 2 10; 9 6 3 15] ∼ [1 2/3 1/3 5/3; 0 0 0 0; 0 0 0 0]
Example 2.1.45.
A Matlab plot of the nullspace and general solution of this last example shows that the general solution is a vertical translation of the nullspace.

%Matlab-session
% plot plane from vector form
a=[3 2 1; 6 4 2; 9 6 3];
b=[5 10 15]';
rank(a)
c=null(a)
u=-1:1:1;
[x,y]=meshgrid(u,u);
z_1=5-3*x-2*y;           % general solution plane
mesh(x,y,z_1);           % plot
hold on
text(1,1,0,'general solution');
v=-1:0.5:1;
[x,y]=meshgrid(v,v);
z_2=-3*x-2*y;            % nullspace plane
text(1,1,-5,'nullspace');
text(0,0,0,'(0,0,0)');
mesh(x,y,z_2);           % plot
title('general solution and nullspace');
hold off;

[Figure: the two planes plotted by the session above; the "general solution" plane is the "nullspace" plane translated vertically through (0, 0, 5).]
The relationship between nullity and rank
Recall that the rank of a matrix A is the number of non-zero rows or pivots in any echelon form of A.
• The pivot-columns of a matrix in echelon form are linearly independent.
• The pivot-rows of a matrix in echelon form are linearly independent.
• There are as many pivot rows as pivot columns in any echelon form. This is the rank of the matrix.
This tells us that
The rank of a matrix is the dimension of its column space:
rank(A) = dim(Col(A)).
Note: If A is a square n × n matrix, then
rank(A) = n ⇔ det(A) ≠ 0 ⇔ A⁻¹ exists.
An important result relating the nullity of a matrix to the rank is the following:
FACT
If A is an m × n matrix, then
rank(A) + nullity(A) = number of columns of A.
(2.21)
This simply states that in the general solution of any linear system Ax = b, the number of bound variables plus
the number of free variables is the total number of variables – as the number of components of x is equal to the
number of columns of A.
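Equation (2.21) is easy to confirm numerically; a sketch using an earlier matrix:

%Matlab-session
% rank(A) + nullity(A) = number of columns of A
A=[1 2 3 4; 5 6 7 8; 9 10 11 12];
rank(A)           % 2
size(null(A),2)   % nullity is 2, and 2 + 2 = 4 = size(A,2)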
Example 2.1.46.
Find the rank and nullity of the matrices
(a) A = [1 1 1; 1 1 1; 1 1 1].
(b) A = [1 1 1; −1 1 1; −1 −1 1].
Because the columns of a matrix A are linearly independent only when
Null(A) = {0},
or
nullity(A) = 0 ,
we can use equation (2.21) to establish when a given set of vectors is a basis for a vector space V :
A given set of vectors in Rm is a basis for a vector space V of dimension n when
• there are n vectors
• they are linearly independent
this will be true when they are the columns of an m × n matrix A, and either
• Null(A) = {0}, or equivalently
• rank(A) = n.
2.2 Inner Products and Orthogonality
2.2.1 Orthogonal and orthonormal bases
Definition 2.2.1.
(a) A basis {v1 , v2 , . . . , vn } of an n-dimensional vector
space V is called orthogonal if its members are pairwise
orthogonal:
(i) vi · vj = 0 for i ≠ j.
(b) A basis is called orthonormal if it is orthogonal
(above) and consists of unit vectors:
(ii) kvi k = 1
or equivalently
vi · vi = 1.
(c) The members vi of an orthonormal basis satisfy
vi · vj = viᵀvj = 1 if i = j, and 0 if i ≠ j.     (2.22)
Definition 2.2.2.
A matrix whose columns are orthonormal vectors is called
an orthogonal matrix.
Notes:
• An n × k orthogonal matrix A has rank k, and satisfies the equation
AᵀA = Ik.     (2.23)
• An n × n orthogonal matrix A has rank n, and satisfies the very useful property
AᵀA = In = AAᵀ, so A⁻¹ = Aᵀ.     (2.24)
Example 2.2.3.
The standard basis of Rn is an orthonormal basis.
Example 2.2.4.
The vectors v1 = [1, 1, 1], v2 = [1, 0, −1], v3 = [1, −2, 1]
form an orthogonal basis B of R3 . The basis is not orthonormal.
An orthogonal basis can be made into an orthonormal
basis by ‘normalising’ the basis vectors - by dividing
each vector by its length. This gives a basis of unit vectors in the same directions as the original vectors.
Example 2.2.5.
B = {[1, 1, 1], [1, 0, −1], [1, −2, 1]} is an orthogonal basis of R3. The lengths of its vectors are √3, √2 and √6, respectively. Therefore,
{[1/√3, 1/√3, 1/√3], [1/√2, 0, −1/√2], [1/√6, −2/√6, 1/√6]}
is an orthonormal basis of R3.
Why would we want an orthonormal basis?
In the next topic, and in many computational situations,
we will want to express a given vector x0 as a linear
combination of a basis {v1 , · · · , vn } of Rn .
That is, we find coefficients c1 , c2 , . . . , cn , so that
x0 = c1 v1 + · · · + cn vn.
(2.25)
Formulating this as a matrix equation, we write A = [v1 v2 · · · vn].
Then (2.25) says x0 = Ac for c = [c1; . . . ; cn].
Because the vi form a basis, the matrix A has an inverse:
x0 = Ac ⇒ c = A⁻¹ x0.
If the set {v1, v2, · · · , vn} forms an orthonormal basis, then A is an orthogonal matrix, so A⁻¹ = Aᵀ (see (2.24)):
Ac = x0 ⇒ c = Aᵀ x0 = [v1ᵀ; v2ᵀ; · · · ; vnᵀ] x0 = [v1 · x0; v2 · x0; · · · ; vn · x0],
as viᵀ x0 = vi · x0.
In this case, (2.25) has a particularly simple solution
x0 = (v1 · x0)v1 + (v2 · x0)v2 + · · · + (vn · x0)vn.     (2.26)
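A sketch verifying (2.26), using the orthonormal basis of Example 2.2.5 and an arbitrary vector x0:

%Matlab-session
% coefficients in an orthonormal basis are dot products (2.26)
v1=[1 1 1]'/sqrt(3); v2=[1 0 -1]'/sqrt(2); v3=[1 -2 1]'/sqrt(6);
x0=[1 2 3]';
c=[v1 v2 v3]'*x0               % c(i) = vi . x0
c(1)*v1+c(2)*v2+c(3)*v3 - x0   % zero, up to rounding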
Example 2.2.6.
Confirm that the vector x = [1; 2] can be written as a linear combination of the orthonormal basis vectors u = (1/√2)[1; 1], v = (1/√2)[1; −1] by verifying that
x = (x · u)u + (x · v)v.

Example 2.2.7.
Write the vector [1; 2; 3] as a linear combination of the set of orthonormal basis vectors
{(1/√6)[1; −2; 1], (1/√3)[1; 1; 1], (1/√2)[1; 0; −1]}.
2.2.2 Orthogonal projection of one vector onto the line spanned by another vector
Definition 2.2.8.
If v and w are non-zero vectors in Rn, we define the orthogonal projection of v onto the line spanned by w to be the vector
projw v = ((w · v)/(w · w)) w.     (2.27)
Note: w · v and w · w are scalars; projw v is a vector in the direction of w, so the direction of projw v is the unit vector
w/‖w‖ = projw v/‖projw v‖.
(Recall: Two vectors u1 and u2 have the same direction, or are parallel, if a unit vector in the direction of one is the same as a unit vector in the direction of the other.)
Formula (2.27) can be derived from Figure 2.8: letting θ denote the angle between v and w,
cos θ = A/H = ‖projw v‖/‖v‖,
so ‖projw v‖ = ‖v‖ cos θ = ‖v‖ (w · v)/(‖w‖‖v‖)     (by (4.78))
             = (w · v)/‖w‖.
Therefore
projw v = ‖projw v‖ · (projw v/‖projw v‖) = ((w · v)/‖w‖) · (w/‖w‖),
a length times a direction.
An important property of the orthogonal projection, shown in Figure 2.8, is that
v − projw v is orthogonal to w.     (2.28)

Example 2.2.9.
Prove that (v − projw v) · w = 0 using the projection formula (2.27).

[Figure 2.8: v, w, projw v and v − projw v.]
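Both the formula and the orthogonality property (2.28) are quickly confirmed in Matlab (the vectors here are arbitrary):

%Matlab-session
% orthogonal projection (2.27) and property (2.28)
v=[1 2 3]'; w=[1 1 0]';
p=(dot(w,v)/dot(w,w))*w   % projw v
dot(v-p,w)                % zero: v - projw v is orthogonal to w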
2.2.3 The Gram-Schmidt process
When projecting one vector onto another, we also create an orthogonal pair of vectors. The Gram-Schmidt
process is an algorithm that extends that idea, creating an orthogonal set of vectors. The Gram-Schmidt process
is an algorithm that takes an arbitrary basis of a vector space V ,
(any basis)
B = u1 , u2 , . . . , un
and creates from it an orthogonal basis of V ,
B ′ = v1 , v2 , . . . , vn
(orthogonal basis)
This is done using the perpendicularity property (2.28), see Figure 2.8. Here is an algorithm that creates orthogonal vectors v1 , v2 , . . . , one by one.
v1 = u1
v2 = u2 − proj v1 u2
v3 = u3 − proj v1 u3 − proj v2 u3
···
Computationally,
v1 = u1
v2 = u2 − ((u2 · v1)/(v1 · v1)) v1
v3 = u3 − ((u3 · v1)/(v1 · v1)) v1 − ((u3 · v2)/(v2 · v2)) v2
· · ·
Notes:
1. Every coefficient (ui · vj)/(vj · vj) = (ui · vj)/‖vj‖² is a scalar, not a vector! At each step, we subtract multiples of the vj we’ve already found from the uk we’re working on.
2. You should check that each new vk is orthogonal to all the vi already found.
Example 2.2.10.
Use the Gram-Schmidt process on the basis u1 = [1, 0, 0], u2 = [1, 2, 0], u3 = [1, 2, 3] of R3 to produce an
orthogonal basis of R3 .
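A minimal Matlab sketch of the process (the course's gs.m may differ; this version assumes linearly independent columns, and uses the basis of Example 2.2.10):

%Matlab-session
% Gram-Schmidt on the columns of A
A=[1 1 1; 0 2 2; 0 0 3];   % u1, u2, u3 as columns
V=zeros(size(A));
for k=1:size(A,2)
    v=A(:,k);
    for j=1:k-1            % subtract projections onto earlier v's
        v=v-(dot(A(:,k),V(:,j))/dot(V(:,j),V(:,j)))*V(:,j);
    end
    V(:,k)=v;
end
V       % orthogonal columns
V'*V    % diagonal, confirming orthogonality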
Example 2.2.11.
Use the Gram-Schmidt process on the basis u1 = [1, 1, 1], u2 = [−1, 1, 0], u3 = [1, 2, 1] of R3 to produce an
orthogonal basis for R3 .
Example 2.2.12.
Find an orthogonal basis of the column space of the matrix A = [1 2 3; 4 5 6; 7 8 9].
Example 2.2.13.
Matlab only has a numerical version of Gram-Schmidt: orth. This effectively finds a basis for the span of the
vectors in question (their column-space) and then applies Gram-Schmidt to this basis.
%Matlab-session
%
a=[1 2 3; 4 5 6; 7 8 9]
rank(a)
b=orth(a)
b'*b
b*b'
gs(a)
2.2.4 Least squares solutions of systems of linear equations
So far, we have only focused on solving consistent systems of linear equations - where the system
Ax = b
has either a unique solution, or infinitely many solutions.
We turn now to inconsistent systems of linear equations
- systems for which there is no solution.
Why would we want to study a system that had no solution?
Suppose that although we know that no solution exists,
we still need to do the best we can.
Can we define a notion of “approximate solution”?
An important example is the fitting of a set of points to a
line.
Fitting data points to a line
Consider the set S of four points:
S: (−1, −1), (0, 0), (2, 1) and (3, 1).

[Figure: the four points of S plotted in the plane; they are not collinear.]

The points are not collinear: there is no line (or linear equation)
y = mx + c
which is satisfied by all four points (xi, yi) of S. In other words, the system of equations
−m + c = −1
0m + c = 0
2m + c = 1
3m + c = 1     (2.29)
is inconsistent.
Example 2.2.14.
Show, using Gaussian elimination, that (2.29) is inconsistent.
Fitting a set of points S to a line means proposing a linear model for the points: a linear equation
ȳ = mx + c     (2.30)
which is “best satisfied” by all the points of S.
• If the system (2.29) were consistent, this model (2.30) would be satisfied exactly at each point (xi, yi) of S.
• Instead it is satisfied at each xi by a fitted y-coordinate
ȳi = m xi + c.     (2.31)
• The points (xi, ȳi) all lie on the fitted line.

[Figure 2.9: Fitting points to a line.]

Definition 2.2.15.
The error in the least-squares fit ȳ = mx + c of data points {(xi, yi)}, i = 1, . . . , n, is given by
error = ‖y − ȳ‖ = √( Σᵢ (yi − ȳi)² ),     (2.33)
where y = [y1 y2 · · · yn] is the vector of y-values in the data-set, and ȳ is the vector of corresponding fitted y-values.
The name of the method “Least Squares” comes from trying to minimise this error term, which mathematically is equivalent to the problem of minimising the expression Σᵢ (yi − ȳi)².

We write our linear model y = mx + c applied to a set of points S = {(xi, yi)}, i = 1, . . . , 4, in matrix form:
[x1 1; x2 1; x3 1; x4 1][m; c] = [y1; y2; y3; y4],
specifically
[−1 1; 0 1; 2 1; 3 1][m; c] = [−1; 0; 1; 1].     (2.32)
We abbreviate this linear model, naming x = [m; c], as Ax = b.
The Method of Least Squares, the Normal Equation
We can solve a system of equations Ax = b for an m × n matrix A only if b is in the column space of A,
Col(A) = {Ax : x ∈ Rn}.
The process of Least Squares fits non-linear data to a line with slope m and y-intercept c, so that the vector x̄ = [m; c] makes Ax as close as possible to b.

[Figure 2.10: x̄ minimises ‖b − Ax‖; Ax̄ is the point of Col(A) nearest b, and b − Ax̄ is orthogonal to Col(A).]

• For any vector x = [m; c], the distance from Ax to a given vector b is ‖b − Ax‖.
• As x varies, Ax ranges over Col(A).
• The minimum distance from the subspace Col(A) to b is the orthogonal distance from Col(A) to b. Suppose this minimal distance is given by ‖Ax̄ − b‖ - see Figure 2.10. Then
(i) x̄ is the least squares solution to the system Ax = b.
(ii) the orthogonal distance ‖b − Ax̄‖ gives the least-squares error.
(iii) the least squares solution x̄ minimises the least-squares error.
• If the vector b − Ax̄ is orthogonal to Col(A), then b − Ax̄ is orthogonal to each column of A: if A = [v1 v2 · · · vn], this orthogonality means
v1 · (b − Ax̄) = v1ᵀ(b − Ax̄) = 0
v2 · (b − Ax̄) = v2ᵀ(b − Ax̄) = 0
  ...
vn · (b − Ax̄) = vnᵀ(b − Ax̄) = 0
In matrix notation,
[v1ᵀ; v2ᵀ; · · · ; vnᵀ](b − Ax̄) = 0, or Aᵀ(b − Ax̄) = 0.
• This means
AᵀA x̄ = Aᵀ b.

Definition 2.2.16.
The normal equation for an inconsistent system of equations Ax = b is
AᵀA x = Aᵀ b.     (2.34)

In the case of fitting data to a line, the vector x represents the vector of unknowns x = [m; c] which solves the Least Squares problem Ax = b.
Notes: In our example above,
• A has size (4 × 2)
• Aᵀ has size (2 × 4)
• AᵀA is square, and has size (2 × 2).
The normal equation usually gives a smaller system of equations than the original inconsistent system.
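For the four data points above, the normal equation can be formed and solved directly; a sketch:

%Matlab-session
% least-squares line for S via the normal equation (2.34)
A=[-1 1; 0 1; 2 1; 3 1];
b=[-1 0 1 1]';
xbar=(A'*A)\(A'*b)   % [m; c] = [0.5; -0.25]
norm(b-A*xbar)       % the least-squares error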
Example 2.2.17.
(a) Find the least-squares line for the data-points (−1, −1), (0, 0), (2, 1) and (3, 1), and the least-squares error.
(b) Find the y-value on this least-squares line corresponding to each x-value
(i) x = 1   (ii) x = 2   (iii) x = 3/2

[Figure: the least-squares line y = 0.5x − 0.25 through the four data points.]
Example 2.2.18.
Find the line of best fit for the data (−2, 1), (−1, 0),
(0, −1), (1, −1), and (2, −3), and the least-squares error.
Power Laws
An important application of the method of Least Squares is to models involving power laws: relationships of the form y ∝ x^d.
This relationship occurs in many natural and social phenomena:
• Metabolic processes in mice, rabbits, dogs, goats, men, women, and elephants, are proportional to body weight to the power 3/4:
metabolic rate ∝ body weight^(3/4)
• Human brain weight and body weight are related by:
human brain weight ∝ body weight^(2/3)
• Zipf’s law states that the frequency of a word’s usage in English text is inversely proportional to the ranking of the word’s frequency - which is to say that a word ranked nth in a frequency table occurs 1/n times as often as the word ranked 1st.
The power d in a hypothesised relationship y ∝ x^d can be found by the method of least squares, after taking logarithms of the expression:
y = k x^d  ⇒  ln(y) = ln(k) + d ln(x),
which, in relating ln(x) to ln(y) (seen in the log-log plot of x and y below), is the equation of a straight line of slope d and ln(y)-intercept of ln(k). To find the values of k and d in a given relationship,
(a) data (xi, yi) is transformed to (ln(xi), ln(yi)),
(b) the method of least squares is used to determine the values of ln(k) and d in the log-log relationship ln(y) = ln(k) + d ln(x),
(c) the values of k and d are read off.
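A sketch of this procedure in Matlab, using data generated (purely for illustration) from y = 0.5 x^(2/3):

%Matlab-session
% recover k and d in y = k*x^d from a log-log least-squares fit
x=[1 2 3 4 5];
y=0.5*x.^(2/3);               % illustrative data
p=polyfit(log(x),log(y),1);   % fits ln(y) = p(1)*ln(x) + p(2)
d=p(1)                        % the power d
k=exp(p(2))                   % the constant k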
[Figure: the curve y = 0.5 x^(2/3) plotted against x, and its log-log plot ln(y) = ln(0.5) + (2/3) ln(x), a straight line of slope 2/3.]
Solving Inconsistent Linear Systems
We now have a method of solving any inconsistent system of linear equations:
Solution Technique 2.2.19. To solve an inconsistent system of equations
Ax = b
using the method of least-squares:
(i) form the normal equation
AᵀA x = Aᵀ b
(by multiplying both sides of the original equation on the left by Aᵀ);
(ii) solve the normal equation for x.
The least-squares error is given by ‖Ax − b‖.
Example 2.2.20.
(a) Solve the inconsistent system [1 −1; 3 1; 1 2][x1; x2] = [2; 1; −3] using the method of least-squares.
(b) Find a least squares solution and error of (the obviously inconsistent) system
[1 0; 0 1; 1 1][x1; x2] = [1; 1; 0]
(c) Find a least squares solution and error of the system of equations
x1 + x2 = 2
x1 − x2 = 0
2x1 + x2 = 2
Example 2.2.21.
If a system of equations Ax = b is inconsistent, Matlab’s x = A\b solves it by the method of least-squares.
%Matlab-session
% a\b solves inconsistent systems
% using least-squares
a=[1 -1; 3 1; 1 2]
b=[2 1 -3]'
x=a\b
x1=inv(a'*a)*a'*b
x1-x    %check
2.3 Eigenvalues
We begin this topic with a review of finding the determinant of an n × n matrix.
For given matrices B, we will be interested in problems where
Bu = 0
(2.35)
for vectors u.
We know already that (2.35) has a non-zero solution u if and only if det(B) = 0. (See page 195)
2.3.1 Determinants
Determinants by Cofactors
Firstly we define a few special types of matrices.
Definition 2.3.1.
(i) The main diagonal of a square n × n matrix A consists of the terms a11, a22, . . . , ann.
(ii) A square matrix is
diagonal if all non-zero entries are on the main diagonal: [∗ 0 0; 0 ∗ 0; 0 0 ∗];
upper-triangular if all non-zero entries are on or above the main diagonal: [∗ ∗ ∗; 0 ∗ ∗; 0 0 ∗];
lower-triangular if all non-zero entries are on or below the main diagonal: [∗ 0 0; ∗ ∗ 0; ∗ ∗ ∗];
triangular if it is either upper-triangular or lower-triangular.
Definition 2.3.2.
If A is a square n × n matrix, its (i, j)-minor, Aij, is the (n − 1) × (n − 1) submatrix resulting when the ith row and jth column of A are deleted.
Example 2.3.3.
(a) For A = [1 0; −2 −1], A11 = −1 and A12 = −2.
(b) Find A12 and A23 for A = [1 2 3; 4 5 6; 7 8 9].

Definition 2.3.4.
The (i, j)-cofactor of a square matrix A is defined to be
Cij = (−1)^(i+j) det(Aij),     (2.36)
where Aij is the ij-minor of A (Definition 2.3.2).

Example 2.3.5.
(a) Find all cofactors of A = [1 0; −2 −1].
(b) Find C12 and C23 for A = [1 2 3; 4 5 6; 7 8 9].

The determinant of a matrix can be calculated by cofactor expansion along any row or column. The idea is to
expand along a row or column with the most zeros:
To calculate the determinant of a square n × n matrix A:
• Cofactor expansion across the ith row is given by the formula
det A = ai1 Ci1 + ai2 Ci2 + ai3 Ci3 + · · · + ain Cin.
• Cofactor expansion down the jth column is given by the formula
det A = a1j C1j + a2j C2j + a3j C3j + · · · + anj Cnj.
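Matlab's det gives a quick check on any hand expansion; a sketch using the matrix of Example 2.3.7 below:

%Matlab-session
% check a cofactor expansion against det
A=[1 2 0; 1 -1 1; 2 0 1];
det(A)   % compare with your expansion along a row or column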
Example 2.3.6.
Find the determinants of the following diagonal and triangular matrices using the method of cofactors.
(a) [1 2; 0 3]
(b) [1 0 0; 4 5 0; 6 0 0]
(c) [1 2 3; 0 4 5; 0 0 0]
(d) [0 0 0; 1 0 0; 0 0 0]
(e) [0 3; 1 0]
(f) [1 0 0; 0 2 0; 0 0 3]
Clearly:
The determinant of any triangular matrix is just the product of the entries on its main diagonal.
Example 2.3.7.
Find the determinant of the given matrix using cofactor expansion: A = [1 2 0; 1 −1 1; 2 0 1].
A second method for finding the determinant of a matrix uses row-reduction. For large matrices, this can be
faster than the method of cofactors.
Determinants by Row-Operations
A square matrix in echelon form is upper-triangular. If we reduce a matrix to echelon form, its determinant is
apparent. So we investigate how the determinant of a matrix changes as we apply row-operations to it.
Example 2.3.8.
Compare the determinant of the matrix A = [a b; c d] before and after the following row-operations:
1. r1 ↔ r2
2. r1 → −r1
3. r2 → −r2
4. r2 → r2 − kr1
5. r1 → r1 + kr2
Regardless of the size of a square matrix, row-operations have the following effects on its determinant:

row operation   A ∼ U                  change in determinant
row-swap        ri ↔ rj                det(U) = −det(A)
row-multiple    rj → k rj              det(U) = k det(A)
row-addition    rj → rj + k ri         det(U) = det(A)

Example 2.3.9.
Using row reduction, find the determinant of A = [1 2 0; 1 −1 1; 2 0 1].
We simply mention here that a third method for finding determinants uses the fact that for square n × n matrices
A and B,
det(AB) = det(A) det(B).
(2.37)
2.3.2 Eigenvalues: The characteristic equation
of a matrix.
Definition 2.3.10.
Given a square n × n matrix A, we can sometimes find
a non-zero vector v ∈ Rn and a corresponding scalar λ,
such that
Av = λv.
(2.38)
We call any non-zero vector v which satisfies (2.38) an
eigenvector of A, and the corresponding scalar λ an
eigenvalue.
The matrix equation (2.38) can be rewritten
(A − λI)v = 0.
(2.39)
In this form, it represents a homogeneous system of linear equations. This system always has the trivial (zero)
solution v = 0.
The eigenvectors are the non-trivial solutions of (2.39), i.e. the non-zero members of the nullspace of A − λI. If any exist (refer page 195),
Null(A − λI) ≠ {0},
which means det(A − λI) = 0.
Definition 2.3.11.
For an n × n matrix A,
det(A − λI) = 0
(2.40)
is called the characteristic equation of A.
It is a polynomial of degree n and has at most n solutions.
Consequently:
(i) The eigenvalues of a matrix A are all the solutions
λ of the characteristic equation of A,
det(A − λI) = 0.
(ii) The eigenvectors of a matrix A corresponding to
eigenvalue λ are the non-trivial solutions v of the
equation
(A − λI)v = 0.
Notes:
• The pivots of a matrix are not necessarily the
eigenvalues.
• We restrict ourselves in this course to matrices
whose characteristic polynomials have only real
solutions.
Example 2.3.12.
Find the eigenvalues and corresponding eigenvectors of A = [2 1; 1 2].
Definition 2.3.13.
(i) If λ is an eigenvalue of a square matrix A, the
nullspace Null(A − λI) is called the eigenspace
of A corresponding to λ.
It consists of the zero vector and all eigenvectors
of A corresponding to λ.
(ii) The dimension of the eigenspace of A corresponding to any eigenvalue λ is dim(Null(A −
λI)), the nullity of the matrix A − λI.
FACT A basis for Null(A − λI) gives a set of eigenvectors for A corresponding to λ.
We can relate eigenvalues and eigenvectors to our theory of linear algebra with the following facts (compare with the tables on page 195).
For any square n × n matrix A:
• If A has eigenvalue 0, then:
– det(A) = 0 (as det(A) = det(A − 0I)).
– the eigenspace of A corresponding to λ = 0 is Null(A).
– the nullity of A is the dimension of this eigenspace.
– the eigenvectors of A corresponding to λ = 0 are the non-zero vectors of Null(A).
– A is singular.
– rank(A) < n.
• If A has all eigenvalues non-zero:
– A is non-singular.
– rank(A) = n.
• Eigenvectors of A corresponding to different eigenvalues are linearly independent.
• If A has n distinct eigenvalues, the corresponding eigenvectors form a basis for Rn.

Notes:
• An eigenvector v of A corresponding to eigenvalue λ is not unique. If λ is an eigenvalue of A, then Null(A − λI) ≠ {0}: Null(A − λI) is a subspace of Rn of dimension greater than zero, and has infinitely many members.
• Any basis for this subspace Null(A − λI) gives a set of eigenvectors corresponding to eigenvalue λ.
• If A has eigenvalue 0, the corresponding eigenspace is the nullspace of A, Null(A).

Example 2.3.14.
The matrix A = [3 1; 6 2] has eigenvalues 5 and 0. Find a basis for Null(A), the eigenspace of A corresponding to λ = 0.
Example 2.3.15.
Find the characteristic equation of the diagonal matrix A = [λ1 0; 0 λ2].
Show that the eigenvalues of A are λ1 and λ2.
What are the corresponding eigenvectors?
FACT The eigenvalues of a triangular matrix are its diagonal entries. Note: This is not true for matrices in
general!
Example 2.3.16.
Find the eigenvectors of A = [2 0 0; 0 −1 0; 0 0 −1].
Solution: The eigenvalues are 2, −1 and −1.

λ = 2: (A − 2I)v = [0 0 0; 0 −3 0; 0 0 −3][v1; v2; v3] = 0.
Then v2 = 0, v3 = 0, and v1 is a free variable (the first-row equation says 0 = 0 – no information), so
v = [v1; 0; 0] = v1 [1; 0; 0], for any v1 ≠ 0.
Any non-zero multiple of v is an eigenvector of A corresponding to λ = 2.

λ = −1: (A + 1I)v = [3 0 0; 0 0 0; 0 0 0][v1; v2; v3] = 0.
Then v1 = 0, and v2 and v3 are free variables (rows 2 and 3 say 0 = 0 – no information), so
v = [0; v2; v3] = v2 [0; 1; 0] + v3 [0; 0; 1], for any v2 and v3 not both zero.
Null(A + 1I) has dimension 2, and has basis the pair [0; 1; 0], [0; 0; 1].
Any non-zero vector in Null(A + 1I) is an eigenvector of A corresponding to λ = −1.

FACT The eigenvectors of a diagonal (n × n) matrix are
the vectors of the standard basis of Rn : {ei }ni=1 .
Example 2.3.17.
The characteristic equation of the matrix A = [1 2 1; 3 6 3; 2 4 2] is 9λ² − λ³ = 0.
Find its eigenvalues and corresponding eigenvectors.
Example 2.3.18.
The Matlab function eigshow is designed to illustrate eigenvectors and eigenvalues of 2 × 2 matrices. Use it to help
familiarise yourself with the physical meaning of eigenvectors. Some of the default matrices given have eigenvalues
which are complex (not real).
A=[1 0; 0 -2]
eigshow(A)
Example 2.3.19.
The Matlab command eig(a) returns the eigenvalues of a in a column vector. The command [v, d] = eig(a) (two outputs) returns two matrices:
1. v is a matrix whose columns are the eigenvectors,
2. d is a diagonal matrix whose diagonal entries are the corresponding eigenvalues.
For a matrix a, the poly(a) command finds the coefficients of the characteristic polynomial of a (using its eigenvalues!)

%Matlab-session
% eigenvalues
b=[1 2 1; 3 6 3; 2 4 2]
[v,d]=eig(b)                            % numerical
a=sym(b)                                % symbolic
[v,d]=eig(a)
char_poly_s=poly(a,'lambda')            % char eqn
char_poly_roots_s=solve(char_poly_s)    % its roots
char_poly_n=poly(b)                     % char eqn
char_poly_roots_n=roots(char_poly_n)    % its roots
2.3.3 Diagonalisation of a Matrix
An n × n matrix with n distinct eigenvalues has n corresponding linearly independent eigenvectors. These eigenvectors form a basis for Rn .
If
• the eigenvalues are λ1, λ2, . . . , λn,
• the corresponding eigenvectors are v1, v2, · · · , vn, and
• we form the matrices
V = [v1 v2 · · · vn],   D = [λ1 0 . . . 0; 0 λ2 . . . 0; . . . ; 0 0 . . . λn],
then V is a non-singular matrix – with n linearly independent columns, it has rank n, and
AV = V D   or   A = V D V⁻¹.     (2.41)
This relationship can also be written V⁻¹AV = D.
Definition 2.3.20.
A square matrix with linearly independent eigenvectors
can be diagonalised as
V −1 AV = D
(2.42)
where V is the matrix with columns the eigenvectors of
A, and D the diagonal matrix with diagonal entries the
corresponding eigenvalues.
Note: The eigenvectors of a matrix are not unique, so the
matrix V above is not unique. But the diagonal matrix
D of eigenvalues is unique.
Example 2.3.21.
The matrix A = [1 2 0; 0 3 0; 2 −4 2] has eigenvalues 1, 2 and 3, and corresponding eigenvectors [−1; 0; 2], [0; 0; 1] and [−1; −1; 2].
Diagonalise the matrix A.
Example 2.3.22.
Matlab’s eig function can be used to verify this diagonalisation:
%Matlab-session - diagonalisation
a=sym([3 -3 2; -4 2 1; 3 3 -1]);
[v,d]=eig(a)
a*v-v*d
inv(v)*a*v
v*d*inv(v)
Not every n × n matrix can be diagonalised, or diagonalised simply:

Example 2.3.23.
(a) The matrix A = [0 1; 0 0] has repeated eigenvalue λ = 0, so all its eigenvectors are in Null(A). But rank(A) = 1, so dim(Null(A)) = 1. The only eigenvector of A is in the direction of [1; 0]. A cannot be diagonalised.
(b) B = [1 0 0; 1 1 0; 2 0 1] has three eigenvalues but only two linearly independent eigenvectors. B cannot be diagonalised.
(c) The rotation matrix Q = [cos θ −sin θ; sin θ cos θ] has complex (not real) eigenvalues and eigenvectors - although Q has real entries, its diagonalisation is complex.
Powers of a Matrix
If a matrix A can be diagonalised as A = V D V⁻¹ (so A has n linearly independent eigenvectors), higher powers of A can be evaluated very simply:
A^k = (V D V⁻¹)(V D V⁻¹) · · · (V D V⁻¹) = V D^k V⁻¹.
Most conveniently,
D^k = [λ1^k 0 . . . 0; 0 λ2^k . . . 0; . . . ; 0 0 . . . λn^k].
This is one application of diagonalisation. Another occurs in solving linear systems Ax = b. We will illustrate this in the next section.
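A sketch of this shortcut, using the matrix of Example 2.3.12:

%Matlab-session
% powers of a matrix via diagonalisation
A=[2 1; 1 2];
[V,D]=eig(A);
k=5;
V*D^k*inv(V)   % V D^k V^(-1)
A^k            % direct computation agrees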
2.3.4 Symmetric Matrices
A matrix A is symmetric if A = AT . This automatically means that A is square – symmetric matrices always
have eigenvalues and eigenvectors.
Symmetric matrices occur in many applications. Luckily, they have some very nice features:
• All eigenvalues of a symmetric matrix are real.
• Eigenvectors of a symmetric matrix A corresponding to distinct eigenvalues are orthogonal:
A vi = λi vi, A vj = λj vj, i ≠ j ⇒ vi · vj = 0.
• A symmetric n × n matrix always has n linearly independent eigenvectors, even though its eigenvalues may
not be distinct. So we can always use the eigenvectors of a symmetric matrix as a basis for the underlying
vector space.
• In the case of repeated eigenvalues, the Gram-Schmidt procedure and normalisation may be applied to any basis of eigenvectors to obtain an orthonormal basis of eigenvectors.
Orthogonal Diagonalisation
A symmetric matrix A can always be diagonalised:
To diagonalise a symmetric n × n matrix A,
• form a matrix V = [v1 v2 · · · vn] of orthonormal eigenvectors of A (not unique).
• V is an orthogonal matrix: V −1 = V T .
• The expression
A = V DV T
or
V T AV = D
(2.43)
is called an orthogonal diagonalisation of A.
• Every symmetric matrix has an orthogonal diagonalisation.
Example 2.3.24.
The matrix A = [1 −2 2; −2 1 −2; 2 −2 1] is symmetric, with eigenvalues −1, −1 and 5, and corresponding eigenvectors [1; 1; 0], [1; 0; −1], [1; −1; 1]. Find an orthogonal diagonalisation of A.
Quadratic forms
The expression c1 x1 + c2 x2 + · · · + cn xn is a linear form on x ∈ Rn. In matrix notation, this can be written Cx for C = [c1 c2 . . . cn].
Definition 2.3.25.
An expression of the form
c11 x1² + c22 x2² + · · · + cnn xn² + Σ_{i≠j} cij xi xj
is called a quadratic form on x ∈ Rn.
Notes:
• There are no linear terms in a quadratic form, only terms of the form cij xi xj.
• We call terms cij xi xj, where i ≠ j, cross-product terms, or mixed terms.
• We generally combine mixed terms and write cij xi xj + cji xj xi = c′ij xi xj.
• In matrix notation, a quadratic form can be written
Σ_{i,j=1..n} cij xi xj = xᵀQx     (2.44)
for the n × n matrix Q = (cij).
• In this expression, the matrix Q is symmetric, as we can always arrange to have cij = cji.
Example 2.3.26.
Verify that x1² + 2 x1 x2 − 4 x2 x3 + 5 x3² can be written
[x1 x2 x3] [1 1 0; 1 0 −2; 0 −2 5] [x1; x2; x3].
Example 2.3.27.
Rewrite the following quadratic forms as x^T Qx for symmetric matrices Q:
(a) 2x1² + 3x2²
(b) x1² + x1x2 + x2²
(c) x1² + x2² − x3² − 2x1x2 + 6x2x3
(d) −x1² + 3x2² − x1x2 + 4x2x3
(e) 2x1² + 3x2² − 6x1x3 + 10x2x3 + 9x3²
The orthogonal diagonalisation of a symmetric matrix Q = V DV^T, with eigenvalues λ1, λ2, . . . , λn, allows us to simplify quadratic forms x^T Qx:

x^T Qx = x^T (V DV^T)x = (x^T V)D(V^T x) = (V^T x)^T D(V^T x) = y^T Dy, where y = V^T x,
       = λ1y1² + λ2y2² + · · · + λnyn². (2.45)
The change of variables y = V^T x simplifies the quadratic form to:
x^T Qx = y^T Dy = λ1y1² + λ2y2² + · · · + λnyn² (2.46)
with
• no cross-product terms yi yj (i ≠ j),
• the coefficient of each term yi² the corresponding eigenvalue λi of Q.
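The following Matlab sketch illustrates this change of variables on the matrix Q of Example 2.3.26; the test vector x is arbitrary:

%Matlab-session - a quadratic form with its cross terms removed (a sketch)
q=[1 1 0; 1 0 -2; 0 -2 5];   % the Q of Example 2.3.26
x=[1; 2; 3];                 % an arbitrary test vector
[v,d]=eig(q);
y=v'*x;                      % change of variables y = V'x
x'*q*x                       % the original quadratic form
y'*d*y                       % sum of lambda_i*y_i^2 - the same value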
Notes:
• Trivially, x^T Qx = 0 if x = 0.
• If all eigenvalues λi are non-negative, x^T Qx ≥ 0 for all x.
• If all eigenvalues λi are non-positive, x^T Qx ≤ 0 for all x.
Definition 2.3.28.
A quadratic form x^T Qx is said to be
(i) positive-definite if x^T Qx > 0 for all x ≠ 0
(ii) negative-definite if x^T Qx < 0 for all x ≠ 0
(iii) indefinite if x^T Qx takes positive and negative values, depending on x.
But the quadratic form x^T Qx, via (2.46), is intimately connected to the eigenvalues and eigenvectors of Q, so we can naturally rephrase these definitions:
Theorem 2.3.29.
A symmetric matrix Q is
(i) positive-definite if all its eigenvalues are positive.
(ii) negative-definite if all its eigenvalues are negative.
(iii) indefinite if it has positive and negative eigenvalues.
The task of finding all eigenvalues of an n × n symmetric matrix to determine whether the associated quadratic
form is positive-definite becomes difficult as n increases. We now present an alternative method for determining
positive-definiteness which does not require solving the characteristic equation.
[Diagram: a 4 × 4 matrix (aij) shown four times, highlighting in turn its first, second, third and fourth principal submatrices, i.e. the k × k blocks of its first k rows and columns, k = 1, 2, 3, 4.]
Definition 2.3.30.
(i) The kth principal submatrix of an n × n matrix A is the k × k matrix obtained by deleting all but its first
k rows and k columns.
(ii) The kth principal minor of A, Mk , is the determinant of its kth principal submatrix.
A symmetric n × n matrix Q is
• positive-definite iff (if and only if) all principal minors M1 , M2 , . . . Mn of Q are positive.
• positive semi-definite iff all principal minors of Q are non-negative.
• negative-definite iff the principal minors of Q alternate in sign: M1 < 0, M2 > 0, M3 < 0, . . .
• negative semi-definite iff the principal minors of Q alternate in sign: M1 ≤ 0, M2 ≥ 0, M3 ≤ 0, . . .
• indefinite iff it is neither positive-semi-definite nor negative semi-definite.
Example 2.3.31.
Classify the given matrices as positive-definite, positive semi-definite, negative-definite, negative semi-definite or indefinite.
(a) [1 2; 2 1]
(b) [3 −1; −1 3]
(c) [−3 1; 1 −3]
(d) [0 1; 1 0]
(e) [1 −1; −1 −1]
(f) [2 2 0; 2 2 0; 0 0 2]
(g) [1 2 1; 2 1 1; 1 1 2]
(h) [2 2 0; 2 2 0; 0 0 2]
(i) [2 −1 0; −1 2 −1; 0 −1 2]
Example 2.3.32.
Matlab can handle tasks like evaluating a succession of principal minors very easily. Call the local function ev4
with the command “ev4(a)” for matrix a.
function ev4(a)
%matlab function to compute principal minors
[m,n]=size(a);
if (m~=n)
    disp(' matrix must be square');
    return;
else
    for i=1:n
        disp(['principal submatrix ',int2str(i)])
        disp(a(1:i,1:i))
        disp('principal minor')
        disp(det(a(1:i,1:i)));
    end;
end;
return;
2.3.5 Markov Chains
State vectors
In many practical situations, a person or object is always
in 1 of 2 states.
For example,
• a person is either a smoker or a non-smoker.
• A plant is either in flower or not in flower.
A Markov process may involve more than 2 states, but
in this section we will keep to just 2 states for clarity.
A state vector is used to record the number in each state.
Markov processes can be modelled with row state-vectors and with column state-vectors. We will use column state-vectors in this course.
Example 2.3.33.
Suppose a survey of 90 dentists finds that 25 dentists smoke and 65 dentists do not smoke. This may be recorded as the state vector [25; 65].
Notes:
• The entries in a state vector refer to the actual
number in each state.
• The order in which the numbers are put in the state
vector is important. We must indicate clearly that
the first number, 25, refers to those who smoke,
and the second number, 65, refers to those who do
not smoke.
In a Markov process, people or objects are able to move
from one state to the other state at certain fixed time
intervals. With the group of dentists mentioned above,
their state of smoking or not-smoking may be recorded at
a regular monthly medical examination. Because movement is possible between states, the state vector will change.
We will use the notation
1. v0 for the initial state vector,
2. v1 for the state vector after 1 time interval,
3. v2 for the state vector after 2 time intervals,
and so on.
The stochastic matrix
In a Markov process, the probability that a person, or
object, moves from one state to another state depends
entirely on the present state.
Returning to the above group of 90 dentists, let us say
• the probability a smoker will change into a non-smoker is 0.2,
• while the probability a non-smoker will change into a smoker is 0.1.
It is convenient to record this information in the form of a stochastic matrix, S. Using column vectors we write
S = [0.8 0.1; 0.2 0.9].
Notes:
• Every entry in a stochastic matrix represents a probability, so it must be a real number from the interval [0,1].
• When column state-vectors are being used, each
column in a stochastic matrix must sum to 1.
• To interpret the meaning of a stochastic matrix, a clear indication must have already been given of the order of the states.
Example 2.3.34.
We return to the group of dentists.
We keep the order used for the state vector above, that is
smoker: non-smoker.
The first column in this stochastic matrix indicates the
probabilities for a dentist who smokes. There is a probability of 0.8 they will remain a smoker and a probability of 0.2 they will change to the other state and become a non-smoker. Similarly, the second column in this
stochastic matrix indicates the probabilities for a dentist
who does not smoke. There is a probability of 0.1 they
will change into a smoker and a probability of 0.9 they
will remain a non-smoker. Multiplication by the stochastic matrix is used to find the next state vector. Thus if v0 = [25; 65], then we can find v1 by the calculation:
v1 = Sv0 = [0.8 0.1; 0.2 0.9][25; 65] = [26.5; 63.5].
And
v2 = Sv1 = [0.8 0.1; 0.2 0.9][26.5; 63.5] = [27.55; 62.45].
Notes:
• the entries in the state vectors are written as decimals and not rounded to whole numbers.
• the sum of the entries in the state vectors remains the same.
If the alternative notation of row-vectors is used, then the transpose of S is used. The corresponding calculations are as follows. The initial state vector is now written v0 = [25 65]. We can find v1 by the calculation:
v1 = v0 S^T = [25 65][0.8 0.2; 0.1 0.9] = [26.5 63.5].
Similarly,
v2 = v1 S^T = [26.5 63.5][0.8 0.2; 0.1 0.9] = [27.55 62.45].
Henceforth, we use only column vectors.
The state vector after n time periods
The key result is that we can write the nth state vector vn as
vn = S^n v0.
This is because
v1 = Sv0
v2 = Sv1 = S²v0
v3 = Sv2 = S²v1 = S³v0
...
vn = Svn−1 = S²vn−2 = · · · = S^n v0.
If we wish to find the expected state vector after say 5 or
10 time periods, it may be convenient to diagonalise the
matrix S, as this makes it easier to find a high power of
S.
Example 2.3.35.
Continuing the example about the dentists: S = [0.8 0.1; 0.2 0.9] and v0 = [25; 65]. Find v10.
Answer: First find eigenvalues and eigenvectors for S: S has eigenvalues 1 and 0.7 with associated eigenvectors [1; 2] and [−1; 1] respectively. Diagonalising:
S = [1 −1; 2 1] [1 0; 0 0.7] [1/3 1/3; −2/3 1/3]
Hence
S^{10} = [1 −1; 2 1] [1 0; 0 0.7^{10}] [1/3 1/3; −2/3 1/3] = [0.3522 0.3239; 0.6478 0.6761].
Hence
v10 = S^{10}v0 = [0.3522 0.3239; 0.6478 0.6761][25; 65] = [29.86; 60.14].
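The answer can be checked in Matlab by computing S^{10}v0 directly (a sketch):

%Matlab-session - checking v10 directly (a sketch)
s=[0.8 0.1; 0.2 0.9];
v0=[25; 65];
v10=s^10*v0     % approximately [29.86; 60.14], as above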
The long-term state vector
If a Markov process vk+1 = Avk is allowed to continue
for a very large number of time periods, the sequence of
state vectors v0 , v1 , v2 , v3 , · · · , vk , · · · may eventually
become constant for all subsequent time periods:
Avk = vk , for all k ≥ n.
(2.47)
Note. If
Avn = vn ,
then vn is an eigenvector of A corresponding to λ = 1.
For the given Markov process vk+1 = Avk , the vector
vn of (2.47) is called both
• a steady-state vector, as there is no change from
state vn to any subsequent state:
vn+1 = Avn = vn .
The sequence of state vectors for the Markov process, given state vector v0 , is
{v0 , v1 , v2 , v3 , · · · , vn−1 , vn , vn , vn , · · · }
• a long-term state vector, written v∞:
v∞ = lim_{k→∞} vk = lim_{k→∞} A^k v0 = vn.
The long-term state-vector makes the final outcome of the Markov process clear.
Facts about stochastic matrices:
• Every stochastic matrix has eigenvalue λ = 1.
• All other eigenvalues λ satisfy |λ| ≤ 1 – they may not all be real, but in this course they will be.
• A corresponding eigenvector v for λ = 1 gives the long-term (and steady-state) state vector: Av = v.
It is customary to scale the eigenvector corresponding to
λ = 1 so that its entries sum to the same value as the
initial state vector.
Example 2.3.36.
Continuing the example about the dentists: S = [0.8 0.1; 0.2 0.9] and v0 = [25; 65]. Find v∞.
Answer: S has eigenvalue 1 with associated eigenvector [1; 2]. So v∞ will be a multiple of [1; 2], scaled so that the components sum to 90, as this is the number of dentists in the study. Hence v∞ = [30; 60]. The long-term situation will be that 30 of the dentists will smoke and 60 will not smoke.
Check: [0.8 0.1; 0.2 0.9][30; 60] = [30; 60].
Notice that the initial state vector is not used in the computation of v∞ , except to determine the total number in
the study.
Example 2.3.37.
A car-rental agency has cars in 3 cities, say Auckland, Wellington and Christchurch. Initially, the proportion of the stock in the three cities is given by a vector x0 = [x0^A; x0^W; x0^C].
Suppose that, after a week,
• of the cars that started out in Auckland, 1/2 are in Auckland, 1/4 are in Wellington, 1/4 are in Christchurch;
• of the cars that started out in Wellington, 1/4 are in Auckland, 1/2 are in Wellington, 1/4 are in Christchurch;
• of the cars that started out in Christchurch, 1/8 are in Auckland, 1/8 are in Wellington, 3/4 are in Christchurch;
and that initially, one-third of cars are in each city.
(a) Formulate the transition matrix and initial state vector for weekly car-movement, with columns labelled "From: A, W, C" and rows labelled "To: A, W, C".
(b) Is it possible to keep cars in stock in all three locations, preferably at a near-constant ratio?
Example 2.3.38.
We can see more of these computations on Matlab: call the local
function ev5.m with three arguments: the first a matrix, the second
a starting vector, and the third the number of steps to evaluate:
>> a=[1/2 1/4 1/8; 1/4 1/2 1/8; 1/4 1/4 3/4]
>> x0=[1/4 1/2 1/4]’
>> ev5(a,x0,10);
function b=ev5(a,x0,n)
%matlab function to demonstrate Markov chain
b=a*x0
n=min(n, 20);
for i=2:n
    b=a*b;
    disp(b);
end;
return;
2.3.6 Discrete Dynamical Systems
Introduction
Processes which evolve through time, like
• the invasion of new territory by an introduced biological species,
• the growth of a population within a region,
• the spread of disease through a population,
• the saturation of a market by a new product,
can be modelled either by discrete steps, or as a continuous process.
Definition 2.3.39.
A difference equation which models the evolution of a
process through time in discrete steps is known as a discrete dynamical system: it is a model or rule of the form
xn+1 = Axn .
(2.48)
For this course, we focus on the case that A is a matrix of
size n × n, and x0 , x1 , x2 , . . . are vectors (of size n × 1).
Example 2.3.40.
Many species of insect have a discrete breeding season,
e.g. an annual breeding season, and the nth seasonal
population, xn , can be modelled, at least initially, by an
equation of the form
xn+1 = axn ,
where a is a number reflecting the birth-rate of the population. If we go a little further and incorporate deaths
into the model as well as births, we get something like
xn+1 = bxn (M − xn ),
a logistic model of population growth.
Example 2.3.41.
The simple one-year model above, xn+1 = axn , can be
adapted to species which take longer to reach breeding
age. We could break the population up into a vector of “young” and “adult”,
xn = [xn^young; xn^adult],
as only adults reproduce, yet the young born this year may be adults next breeding season. This would give us a yearly growth-model of the form
xn+1 = [0 a; b c] xn, (2.49)
where a again reflects the birth-rate of the population.
All these models are simple, and we have to keep in mind
they will have shortcomings. But they can still help us to
analyse a problem. We state here an important saying: “All models are wrong, but some models are useful.” – George Box
Example 2.3.42.
Write
an+1 = 2an + 3bn
bn+1 = an − bn
in matrix form.
Example 2.3.43.
If the yearly growth model (2.49) is given by
xn+1 = [0 1; 1/2 1/2] xn,
and x0 = [2; 10], what is x4?
What is happening to the distribution of young and adults in time?
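A Matlab sketch of the iteration (the loop simply applies xn+1 = Axn four times):

%Matlab-session - iterating the yearly model (a sketch)
a=[0 1; 1/2 1/2];
x=[2; 10];
for n=1:4
    x=a*x;      % x_{n+1} = A x_n
end
x               % this is x_4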
Eigenvector decomposition and the long-term
For a given initial vector x0 , the rule (2.48), xn = Axn−1 ,
defines outcomes xn into the future. The matrix A is a
transition matrix: it describes the transition of a system
from one time-step to the next.
Assuming the model holds far enough ahead, we are often interested in the long-term future, in
lim_{n→∞} xn.
When a matrix A has distinct eigenvalues, its corresponding eigenvectors form a basis for the underlying vector
space. This means any vector in the vector space can be
written as a linear combination of these eigenvectors (a
basis spans the vector space). This makes it very easy to
see how A acts on vectors.
For example, suppose the 2×2 matrix A has eigenvalues λ1 and λ2 (λ1 ≠ λ2), with corresponding eigenvectors v1 and v2. If x0 is any vector in R², we can write
x0 = c1v1 + c2v2. (2.50)
Then
Ax0 = c1Av1 + c2Av2 = c1(λ1v1) + c2(λ2v2). (2.51)
With xn = Axn−1,
x1 = Ax0 = c1λ1v1 + c2λ2v2
x2 = A²x0 = c1λ1²v1 + c2λ2²v2
...
xn = A^n x0 = c1λ1^n v1 + c2λ2^n v2. (2.52)
We now present some simple discrete dynamical systems, with a plot of the solution vector xn , n = 1, 2, · · ·
in the x − y plane through time. Since the solution xn
is a vector-valued function of n, we need three dimensions to plot it. If we let n vary, then for fixed values
of c1 and c2 , xn traces out a curve in the (x, y)–plane.
A collection of such curves for a set of values of c1 and
c2 is called a phase portrait. Some phase portraits (with
arrows indicating the direction that we move along the
curves as n increases) are given for the solutions below.
Example 2.3.44.
If xn+1 = [1.05 0; 0 0.95] xn, write an expression for xn using (2.52).
(i) If x0 = [4; 3], what is x10?
(ii) If x0 = [1; 10], what is x10?
(iii) For any x0 with positive entries, what does the model predict will be the long-term distribution?
[Phase portrait: trajectories in the (x, y)-plane, with the eigen-directions v1 and v2 marked.]
Example 2.3.45.
If xn+1 = [0.95 0; 0 1.05] xn, write an expression for xn using (2.52).
(i) If x0 = [4; 3], what is x10?
(ii) If x0 = [1; 10], what is x10?
(iii) For any x0 with positive entries, what does the model predict will be the long-term distribution?
[Phase portrait: trajectories in the (x, y)-plane, with the eigen-directions v1 and v2 marked.]
Example 2.3.46.
If xn+1 = [1.15 0; 0 1.05] xn, write an expression for xn using (2.52).
(i) If x0 = [4; 3], what is x10?
(ii) If x0 = [1; 10], what is x10?
(iii) For any x0 with positive entries, what does the model predict will be the long-term distribution?
[Phase portrait: trajectories in the (x, y)-plane, with the eigen-directions v1 and v2 marked.]
Example 2.3.47.
If xn+1 = [0.95 0; 0 0.90] xn, write an expression for xn using (2.52).
(i) If x0 = [4; 3], what is x10?
(ii) If x0 = [1; 10], what is x10?
(iii) For any x0 with positive entries, what does the model predict will be the long-term distribution?
[Phase portrait: trajectories in the (x, y)-plane, with the eigen-directions v1 and v2 marked.]
Example 2.3.48.
If xn+1 = [0.95 0; 0 0] xn, write an expression for xn using (2.52).
(i) If x0 = [4; 3], what is x10?
(ii) If x0 = [1; 10], what is x10?
(iii) For any x0 with positive entries, what does the model predict will be the long-term distribution?
[Phase portrait: trajectories in the (x, y)-plane.]
Example 2.3.49.
Stoats were introduced into this country to control rabbits: if their habitat is not threatened and they don’t find preferable food sources, they will prey on rabbits and keep their numbers down. Suppose we model stoat-rabbit interaction in a specific environment on a yearly scale, from year n to year n + 1, by the equations:
sn+1 = 0.4sn + 0.9rn
rn+1 = −psn + 1.2rn (2.53)
where sn is the number of stoats in the region, rn the number of rabbits, and p is a positive predation parameter specific to this environment. The first equation says that with no rabbits for food, 40% of the stoat population will survive till the next year, and the 1.2 in the second equation says that with no stoats to control their numbers, rabbits will increase at a rate of 20% per year. The term −psn measures the rate of predation of stoats on rabbits.
Let’s assume the predation parameter p has value 0.11.
To five decimal places of accuracy, the eigenvalues of the matrix A = [0.4 0.9; −0.11 1.2] are λ1 = 0.55302 and λ2 = 1.04698. The corresponding eigenvectors are v1 = [0.98585; 0.16761] and v2 = [0.81197; 0.58370] respectively.
(a) Use (2.52) to write xn in terms of the eigenvectors of the transition matrix.
(b) If x0 = [s0; r0] = [1; 5] = −7.9121v1 + 10.8380v2, what is the long-term ratio of stoats to rabbits?
(c) If x0 = [s0; r0] = [5; 2] = 2.9466v1 + 2.5803v2, what is the long-term ratio of stoats to rabbits?
(d) If x0 = [s0; r0] = [98585; 16761], what does the model predict in the long-term?
[Phase portrait: trajectories in the (s, r)-plane, with the eigen-directions v1 and v2 marked.]
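A Matlab sketch of this computation (note that eig normalises eigenvectors to unit length, so the coefficients returned differ from those printed above, but the products ci vi, and hence xn, are the same):

%Matlab-session - eigenvector decomposition for the stoat-rabbit model (a sketch)
a=[0.4 0.9; -0.11 1.2];
x0=[1; 5];
[v,d]=eig(a);     % eigenvalues approximately 0.55302 and 1.04698
c=v\x0;           % coefficients with x0 = c(1)*v(:,1) + c(2)*v(:,2)
x50=v*d^50*c      % x_50 = c1*lambda1^50*v1 + c2*lambda2^50*v2
x50(1)/x50(2)     % tends to the ratio of the entries of v2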
Differential equations
3.1 First-Order Differential Equations
3.1.1 Introduction
A differential equation is an equation between the derivatives and values of an unknown function.
Example 3.1.1.
In the differential equation
dy/dt = y + t
y is an unknown function of the variable t. The differential equation tells us that at each value of t, the value of the derivative of y with respect to t is equal to the value of y + t.
Example 3.1.2.
The following are examples of DEs.
(a) dy/dt = 3t
(b) dy/dx = 5x + 1
(c) dy/dx = 5y + 1
(d) y′ + 2xy = sin(x)
(e) ∂²f/∂x² + ∂²f/∂y² = 0
In this course, we consider only differential equations
that involve derivatives of functions of one variable. Such
differential equations (DEs for short) are called ordinary
DEs.
A partial DE involves derivatives of a function of more than one variable. Example 3.1.2 (e) is a partial DE.
A derivative measures the rate of change of a dependent variable with respect to some independent variable. In Example 3.1.1, t is the independent variable and y is the dependent variable (y is a function depending on t).
The independent variable may represent
• time
• distance of some object from a fixed point
• cost of an item
• demand for a product
Many quantities which depend on time or space obey
physical laws which are naturally and simply formulated
as DEs.
3.1.2 Terminology
Features of a DE that are significant to us:
order
The order of a DE is the highest derivative appearing in the equation, e.g.,
• dy/dt = 3t is first order (has order 1)
• d³y/dt³ − t dy/dt + (t² − 1)y = e^t has order 3
linear or nonlinear
Suppose the dependent variable in a DE is y, which is a function of t. The DE is linear if it can be written so that each term is of the form y or one of its derivatives multiplied by a (possibly constant) function of t; otherwise it is nonlinear, e.g.,
• d²y/dt² + y = 0 is linear
• dy/dt = 1/y is nonlinear.
solution
A solution of a DE is any function that, when substituted
for the unknown dependent variable in the DE, gives a
true statement, i.e.,
A solution satisfies the DE.
Notes:
• a solution y may be
– explicit: given in the form y = f(t), or
– implicit: given in the form F(t, y) = 0
• a solution of a DE may have a restricted domain, e.g., y = 1/t satisfies dy/dt = −y² for t ≠ 0.
Example 3.1.3.
(a) Verify that y = 5t + 3 is an explicit solution of dy/dt = 5.
(b) Verify that xy² = 2 is an implicit solution of y² + 2xy dy/dx = 0.
Example 3.1.4.
You already know how to solve some DEs, e.g.,
dy/dt = t ⇒ y = t²/2 + c,
where c is some arbitrary constant. Here, if an additional condition is imposed, such as y(0) = 1, then there is a unique solution, e.g., in this case y = t²/2 + 1.
Example 3.1.5.
Here are some more DEs that you can already solve:
(a) x′ = 0 ⇒ x = c, where c is an arbitrary constant
(b) y′ = 2
(c) dy/dt = 2t
(d) dy/dt = y
(e) d²y/dt² = 1 ⇒ dy/dt = t + c ⇒ y = t²/2 + ct + d, where c, d are arbitrary constants
Example 3.1.6.
Verify that the given functions are solutions of the accompanying DEs:
(a) y = cx², xy′ = 2y
(b) y² = e^{2x} + c, yy′ = e^{2x}
(c) y = ce^{kx}, y′ = ky
(d) y = c1 sin(2x) + c2 cos(2x), y′′ + 4y = 0
(e) y = c1e^{2x} + c2e^{−2x}, y′′ − 4y = 0
(f) y = x tan(x), xy′ = y + x² + y²
(g) x² = 2y² ln(y), y′ = xy/(x² + y²)
(h) y² = x² − cx, 2xyy′ = x² + y²
(i) y = c² + c/x, y + xy′ = x⁴(y′)²
(j) y = ce^{y/x}, y′ = y²/(xy − x²)
(k) y + sin(y) = x, (y cos(y) − sin(y) + x)y′ = y
Existence of solutions
Most differential equations we will meet are guaranteed
to have solutions, by the following result.
Existence Theorem: Suppose that f(t, y) is continuous on the rectangle R := {(t, y) : a < t < b, c < y < d} and (t0, y0) is in the rectangle. Then there is a function y defined on an open interval containing t0 which satisfies
dy/dt = f(t, y), y(t0) = y0.
This theorem says that so long as the right hand side of
the DE is nice enough, then solutions to the DE exist.
Initial Conditions, Initial-Value Problems
The DE
y dy/dx = 1
has solution
y² = 2(x + c).
The number c is an arbitrary constant of integration, and can take any scalar value.
• A 1st-order DE is solved by one integration.
• This introduces one constant c into the solution.
There is in fact a family of solutions, each member
of the family corresponding to a different value of
c.
• To pick a specific member of the family, we need
to know just one point on the solution curve.
Initial condition
The initial condition (abbreviated IC) of a 1st-order DE
is an observation of the unknown function at one specific
point x = x0 :
y(x0 ) = y0 .
Example 3.1.7.
An initial condition for the DE
y dy/dx = 1
is y(0) = −1, or (x0, y0) = (0, −1).
Notes:
• x0 is thought of as an initial point, and y0 the corresponding initial y-value.
• The graph of the solution satisfying y(x0 ) = y0 is
a curve passing through the point (x0 , y0 ).
Example 3.1.8.
dy
= 1 is y 2 = 2(x + c), with c arbidx
trary. Find the solution which also satisfies y(0) = −1.
The solution of y
Definition 3.1.9.
An Initial-Value Problem (abbreviated IVP) is the pair
Differential Equation + Initial Conditions.
In this course you will see aspects of the following three
main approaches for finding and understanding the solutions of DEs
• analytic – sometimes a formula for the solution
can be found
• numerical – for many DEs numerical approximations to the values of the solution can be found,
e.g., we will see Euler’s method
• qualitative – often we can describe important features of solutions, such as their long term behaviour,
e.g., by considering the slope field.
3.1.3 First-Order Differential Equations
First we consider methods for finding analytic solutions.
Separable Equations
A separable first-order DE is one that can be written in
the form
y ′ = f (x)g(y).
For these the derivative of y can be separated into two
factors, one involving only x and one involving only y.
If the derivative involves only x, then we can integrate it
directly. For many DEs used in modelling the derivative
involves only y, and these are said to be autonomous.
Solution Technique 3.1.10. To solve the separable DE
dy/dx = y′ = f(x)g(y):
• If g(y) = 0, then y = constant, for all x
• If g(y) ≠ 0, then
– write the DE in the form dy/g(y) = f(x) dx
– integrate: ∫ dy/g(y) = ∫ f(x) dx + c
– solve for y
Example 3.1.11.
Solve the following DEs, checking your answers.
(a) The autonomous DE dy/dx = y. Solution: y = Ce^x.
Check: y′ =
See the graphs of the solution y = Ce^x when C = −1, 1, 2.
[Graph: y = Ce^x for C = 2, 1, −1.]
(b) y dy/dx = 1. Solution: y² = 2(x + c).
Check: y′ =
[Graph: y² = 2(x + c) for c = −1, 0, 1.]
(c) y′ + 2xy = 0 ⇒ y = Ce^{−x²}
Check: y′ =   ⇒ y′ + 2xy =
[Graph: y = Ce^{−x²} for C = 2, 1, −1.]
Example 3.1.12.
Solve the following IVPs. Some members of the family of solutions of each DE have been sketched. Locate the solution of each IVP.
(a) y′ = x², y(1) = 4/3
Check: y′ =
[Graph: y = x³/3 + c for c = 1, 0, −2, with the point (1, 4/3) marked.]
(b) xy dy/dx = 2, y(e) = 4
[Graph: y² = 4 ln|x| + c (x > 0) for c = 12, 0, −5, with the point (e, 4) marked.]
(c) dy/dx = 3(y − 1), y(0) = 2.
[Graph: y = 1 + Ce^{3x} for C = 1, 2, 3; solutions with y0 > 1 grow, y0 = 1 stays constant, y0 < 1 decrease.]
(d) dy/dt + ty = y, y(1) = 3
[Graph: y = Ce^{t−t²/2} for C = 0, 1, 3e^{−1/2}, with the point (1, 3) marked.]
Example 3.1.13.
Matlab can be used to solve DEs. The previous example can be solved by the Matlab code:
%Matlab-SESSION
% Specify the DE
%    y'+t*y = y
% Note derivative of y written Dy
DE = 'Dy+t*y=y';
de_solution=dsolve(DE)
% You can also specify initial conditions
IC='y(1)=3';
ivp_solution=dsolve(DE,IC)
simplified=simplify(ivp_solution)
This gives the solutions for the DE and IVP:
%>> solveDEs
de_solution = C1*exp(-1/2*t*(t-2))
ivp_solution = 3/exp(1/2)*exp(-1/2*t*(t-2))
simplified = 3*exp(-1/2*(t-1)^2)
First-Order Linear Equations
First-order linear DEs are an important class of DEs which can usually be solved analytically.
A first-order linear DE can be written as
a(x)y ′ + b(x)y = c(x).
When a(x) = 0, we don’t have a DE.
In intervals where a(x) 6= 0, we can rewrite the DE as
y ′ + p(x)y = f (x).
(3.54)
Notes:
• In this form, the coefficient of y ′ is 1.
• We call this form (3.54) of the DE, where y ′ has a coefficient of 1, standard form.
a(x)y′ + b(x)y = c(x) has standard form:
y′ + (b(x)/a(x))y = c(x)/a(x).
• The DE is defined only where p(x) = b(x)/a(x) and f(x) = c(x)/a(x) are defined.
• Separable 1st-order DEs may be linear or nonlinear.
Example 3.1.14.
Put the following DEs into the form of (3.54), with leading coefficient 1.
(a) 2y ′ + 3xy = 4x
(b) xy ′ + 3y = 4
Our method of solution will be to multiply (3.54) by an appropriate function µ called an integrating factor (IF
for short), so that the LHS becomes the derivative of µy.
Note: µ (pronounced “mu") is the Greek lower case “m", for “multiply".
Solution Technique 3.1.15. To solve the 1st-order linear DE
y′ + p(x)y = f(x):
• multiply through by the IF, µ(x) = e^{∫p(x)dx}
• recognise the new DE as
(µ(x)y)′ = µ(x)f(x) (3.55)
• integrate: µ(x)y = ∫ µ(x)f(x) dx + c
• solve for y
Example 3.1.16.
Solve the following DEs by using an appropriate integrating factor, and check your answers:
(a) y′ − 3y = 0 (This is separable. Solve it both ways.)
[Graph: y = y(0)e^{3t} for y(0) = 10 and y(0) = −10.]
(b) y′ + 2y = e^{−4x+7}
(c) x²y′ + 2xy = 3x + 1, y(1) = 1
(d) y′ + (3x/2)y = 2x, y(0) = 1
(e) y′ + 3y/x = x⁴, y(1) = 1
(f) x dy/dx + y = 2x, y(1) = 0
[Graph: y = x + c/x for c = −1, 0, 2.]
(g) y′ + 2y = x² + 4x + 7
(h) y′ = x + y
(i) x dy/dx − 4y = x⁶e^x
First-order Applications
See the textbook reference sheet.
Newton’s Law of Cooling
A heated object cools at a rate which depends on the surrounding (ambient) temperature:
Newton’s Law of Cooling states that the temperature T (t) of a cooling object changes at a rate proportional to
the difference between its temperature and the ambient temperature TA .
dT/dt = k(T − TA) (3.56)
This is a separable DE.
Example 3.1.17.
Coffee poured into a cup has temperature 75° Celsius, and 10 minutes later has cooled to 65°. The ambient temperature is 20°.
(a) Find a formula for the temperature T as a function of time t.
(b) How long before the coffee cools to 50°?
[Graph: T(t) = 20 + 55e^{−0.02t}, passing through (0, 75), (10, 65) and (30, 50); t in minutes.]
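A Matlab sketch of the arithmetic behind this example, using the solution form T(t) = TA + (T0 − TA)e^{kt}:

%Matlab-session - fitting Newton's law of cooling (a sketch)
TA=20; T0=75;
k=log((65-TA)/(T0-TA))/10     % from T(10)=65; about -0.02
t50=log((50-TA)/(T0-TA))/k    % time to reach 50 degrees; about 30 minutes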
Example 3.1.18.
A bath is run to a temperature of 40◦ . The temperature in the bathroom is 18◦ . After 15 minutes, the bath-water
is at 36◦ . How long before it is 33◦ ?
Example 3.1.19.
A small country has $10 billion of paper currency in circulation, and each day $50 million comes into the
country’s banks. The government decides to introduce new currency by having the banks replace old bills with
new ones whenever old currency comes into the banks. Let x = x(t) denote the amount of new currency, in
billions of dollars, in circulation at time t, with x(0) = $0.
(a) Formulate an initial-value problem that represents the flow of the new currency into circulation.
(b) Solve the initial-value problem you’ve specified in (a).
(c) How long does it take for the new bills to account for 90% of the currency in circulation?
Solution:
(a) If time t is measured in days, and x(t) measures, in billions of dollars, the amount of new currency in circulation after t days, then the amount of new currency going out from day to day is:
x(tomorrow) − x(today) = (fraction of total currency coming into banks tomorrow) × (total amount of old currency remaining at end of today)
= (50/10,000)[10 − x(t)]
so x(t + 1) − x(t) = 0.005(10 − x(t)).
With time t measured in days,
dx/dt ≈ (x(t + 1) − x(t))/((t + 1) − t) = 0.005(10 − x) (the approximation improves as t gets larger),
so the initial-value problem is
dx/dt = 0.005(10 − x), x(0) = 0.
(b) The DE is separable, and for the duration of the problem, x < 10.
dx/(10 − x) = 0.005 dt
⇒ −ln(10 − x) = 0.005t + c
⇒ ln(10 − x) = −0.005t + c
⇒ 10 − x = Ce^{−0.005t}.
When t = 0, x = 0, so C = 10. Hence
x = 10(1 − e^{−0.005t}).
(c) When x = 9, what is t?
9 = 10(1 − e^{−0.005t})
⇒ 1 − e^{−0.005t} = 0.9
⇒ e^{−0.005t} = 0.1
⇒ t = ln(0.1)/(−0.005) = 460.5170 ≈ 461 days.
[Figure 3.11: New currency in circulation; graph of x(t) = 10(1 − e^{−0.005t}) with the point (461, 9) marked, t in days.]
It takes just under 461 days (between 15 and 16 months) to get 90% of the new currency in circulation.
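As a quick symbolic check (a sketch), dsolve can be asked to solve the same IVP:

%Matlab-SESSION
% a symbolic check of the currency IVP (a sketch)
sol=dsolve('Dx=0.005*(10-x)','x(0)=0')
% sol should be equivalent to x = 10*(1-exp(-0.005*t))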
Example 3.1.20.
A personal computer has a useful life of 5 years, and depreciates at a rate directly proportional to the time
remaining in its 5-year life-span. It is purchased for $2,500, and one year later its value is $1800. Write an
equation for the value of the computer t years into its life for t > 0, and find its value 4.5 years after its purchase.
Example 3.1.21.
A store’s sales change at a rate directly proportional to the square root of the amount spent on advertising. With
no advertising, sales are $50,000 per year, and with $16,000 spent on advertising per year, its sales are $70,000.
Write an equation describing the store’s sales as a function of the amount spent on advertising.
Slope fields
A first-order DE
dy/dx = f(x, y)
says that the slope of a solution y(x) which passes through the point (x0, y0) is f(x0, y0). We can draw these slopes at a grid of points to get a picture of what the family of solutions looks like. This works even if we cannot find a formula for the solution to the DE.
slope field
The slope field or direction field of the first-order DE
dy/dx = f(x, y)
is a plot of the slope f(x, y) at a set of points in the (x, y)–plane.
Note: The graph of a solution which satisfies an initial condition y(x0 ) = y0 will have slopes close to those in
the slope field (the finer the grid the closer the slopes).
Thus we can get the approximate shape of the solution that satisfies an initial condition y(x0 ) = y0 , by starting
at the point (x0 , y0 ) and following the direction field, always ensuring that the curve drawn is tangential to the
slopes.
Example 3.1.22.
The adjacent picture shows the slope field for the DE dy/dt = y + t on a grid over −3 ≤ t, y ≤ 3.
[Slope field plot for dy/dt = y + t.]
Use the slope field to sketch some approximate solutions to the DE, including the solution that satisfies the initial condition y(1) = 2.
Example 3.1.23.
(a) Solve the IVP dy/dt = y + t, y(0) = 0.
(b) Construct the slope-field of this DE by
1. completing the table of the values of y′ below, and
2. sketching line-segments with these slopes on the graph below.
[Blank grid over −3 ≤ t, y ≤ 3 for sketching the slope field.]

Values of f(t, y) = y + t:
       t=-3  t=-2  t=-1  t=0  t=1  t=2  t=3
y=3     0     1     2     3    4    5    6
y=2    -1     0
y=1
y=0
y=-1
y=-2
y=-3

Locate the solution to the IVP on the slopefield.
Example 3.1.24.
Slope fields can easily be plotted by a computer using a software package such as Matlab.
The Matlab code to plot the slope field in the previous example, using a local function slopefld.m, is:
%Matlab-SESSION
%
% Create the slope field for the DE
%
y’ = y+t
% and plot it a 30x30 mesh grid over [-3,3]^2
%
delta_t=0.5;
[t,y]=meshgrid(-3:delta_t:3,-3:delta_t:3);
dt=ones(size(t));
dy=y+t;
axis square;
slopefld(t,y,dt,dy,'r-'); %, hold off, axis image
title('slopefield');
Copy slopefld.m to your working directory before you run this file. delta_t controls the grid spacing.
Example 3.1.25.
Sketch the slope field and some solution curves for dy/dt = y.
[Blank grid over −3 ≤ t, y ≤ 3.]
Solve the IVP y′ = y, y(0) = 1.
[Blank grid over −3 ≤ t, y ≤ 3.]
Identify this solution on the slope-field. Notice how the solutions follow the slope field.
Example 3.1.26.
Sketch the slope field for dy/dt = 2y/t, showing the solution y = 2t² passing through y(1) = 2.
[Blank grid over −3 ≤ t, y ≤ 3.]
Example 3.1.27.
Sketch the slope field for dy/dt = 2t + 1, showing the solution y = t² + t − 4 which satisfies the initial condition y(−2) = −2.
[Blank grid over −3 ≤ t, y ≤ 3.]
Euler’s method
Not all initial value problems of the form
dy/dt = f(t, y), y(t0) = y0
have analytic solutions. We can find approximations
y1, y2, y3, . . .
to the value of the solution at the sequence of points
t1 = t0 + h, t2 = t1 + h = t0 + 2h, . . .
by using a numerical method.
If the step-size h is “small” and the method is any good, then the derivative of y at tn should be close to the slope of the line through (tn, yn) and (tn+1, yn+1), i.e.,
Δy/Δt = (yn+1 − yn)/(tn+1 − tn) ≈ f(tn, yn).
By setting these equal, and solving for yn+1, we obtain
yn+1 = yn + f(tn, yn)Δt, Δt = tn+1 − tn = h.
The corresponding approximation scheme is called Euler’s method.
The Euler formula can also be found by considering the process of linear approximation drawn on the direction field of y′ = f(t, y): With
yn+1 = yn + f(tn, yn)h,
yn+1 lies on the line through (tn, yn) with slope f(tn, yn). This slope is illustrated in the slopefield.
[Figure: the direction field of y′ = f(t, y), with the Euler points (t0, y0), (t1, y1), (t2, y2) marked.]
Solution Technique 3.1.28. Euler’s method for approximating the solution of
dy/dt = f(t, y), y(t0) = y0:
• choose a step size h, and let
t1 = t0 + h, t2 = t1 + h = t0 + 2h, . . . , tn+1 = tn + h = t0 + (n + 1)h, . . .
• compute successively
y1 = y0 + hf(t0, y0),
y2 = y1 + hf(t1, y1),
. . .
yn+1 = yn + hf(tn, yn), . . .
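The scheme is easy to implement. Here is a minimal sketch as a local function (euler1 is a made-up name, not a file supplied with the course); for Example 3.1.30(a) below it would be called as [t,y]=euler1(@(t,y) t+y, 0, 0, 0.5, 6).

function [t,y]=euler1(f,t0,y0,h,nsteps)
%matlab function - a sketch of Euler's method
% f is a function handle, e.g. f=@(t,y) t+y
t=t0; y=y0;
for n=1:nsteps
    y(n+1)=y(n)+h*f(t(n),y(n));    % y_{n+1} = y_n + h f(t_n, y_n)
    t(n+1)=t(n)+h;
end
return;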
Definition 3.1.29.
If
(i) the solution y = y(t) of the DE y′ = f(t, y) is known,
(ii) and Euler’s method generates a sequence of approximate-solution values {(ti, yi)}, i = 1, . . . , n,
then the error in the approximate solution at the point t = ti is
error = |y(ti) − yi|.
• The calculations for Euler’s method can be set out in a table.
• Euler’s method can easily be implemented on a computer - e.g. in Matlab.
• It can be shown that the error in Euler’s method at any point is roughly proportional to the step-size h.
Thus Euler’s method converges to the true value of the solution (at a point) as h → 0.
By making finer approximations to f (tn , yn ) one can come up with other methods which converge at faster
rates.
Example 3.1.30.
Use Euler’s method to approximate a solution to the IVP y′ = t + y, y(0) = 0:
(a) on the interval [0, 3] with h = 0.5,
(b) on the interval [0, 1.5] with h = 0.25.
[Two blank tables for the working, with rows n = 0, 1, . . . , 6 and columns n, tn, yn, f(tn, yn), yn + hf(tn, yn).]
Example 3.1.31.
Use Euler’s method to approximate a solution to the IVP y′ + ty = t, y(0) = 0:
(a) on the interval [0, 1] with h = 0.5,
(b) on the interval [0, 1] with h = 0.1.
Compare your approximations with the analytic solution.
Further Reading: the Logistic Equation
The model of growth/decay we’ve seen,
exponential growth:
y ′ (t) = ky(t),
is the simplest. This models early growth or low population numbers, where no constraints inhibit growth.
A more realistic model for the growth of a large population, e.g. spread of disease or permeation of the market
by a new commodity, is given by the logistic equation:
y ′ (t) = y(t)(B − Dy(t)),
for constants B and D. Here
• y(t) ≥ 0 represents population at time t
• y ′ (0) > 0 — initially, the population y is growing
• B is the constant growth (birth) rate
• Dy(t) is the death rate, assumed directly proportional to the size of the population.
The logistic equation says
rate of population growth = population(birth rate - death rate)
A typical solution to the logistic equation has the following graph:
[Graph: solution of the logistic equation, rising from x(0) and levelling off at B/D.]
Note: The population levels off as the death-rate becomes close to the birth-rate.
This model can accurately predict many constrained-growth situations, e.g.
• insect and bacteria populations in restricted space
• the spread of an epidemic in a confined population
• impact of advertising in a local community
• spread of information in a local community
Example 3.1.32.
Solve the equation y′(t) = y(t)(2 − y(t)) with initial condition y(0) = 1.
The DE is separable: When y = 0 or 2, y′ = 0 and y has a critical point. The IC says y starts out at 1, and y′(0) = 1 > 0. For 0 < y < 2, we can divide through:
∫ dy/(y(2 − y)) = ∫ dt.
Expand the integrand using partial fractions. Rewrite
1/(y(2 − y)) = a/y + b/(2 − y).
Clearing denominators,
1 = a(2 − y) + by.
y = 0 ⇒ 1 = 2a ⇒ a = 1/2.
y = 2 ⇒ 1 = 2b ⇒ b = 1/2.
So
∫ dy/(y(2 − y)) = (1/2)∫ dy/y + (1/2)∫ dy/(2 − y) = ∫ dt
⇒ (1/2)(ln|y| − ln|2 − y|) = t + c
⇒ (1/2) ln|y/(2 − y)| = t + c
⇒ y/(2 − y) = Ce^{2t}.
Applying the initial condition, C = 1. Solving for y,
y = 2e^{2t}/(1 + e^{2t}).
[Graph of y = 2e^{2t}/(1 + e^{2t}).]
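As a quick symbolic check of this answer (a sketch):

%Matlab-SESSION
% check the logistic IVP (a sketch)
sol=dsolve('Dy=y*(2-y)','y(0)=1')
simplify(sol)   % should be equivalent to 2*exp(2*t)/(1+exp(2*t))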
3.2 Systems of First-Order Differential
Equations
3.2.1 First-order linear homogeneous equations
A first-order linear differential equation is homogeneous if it has the form
y′ + p(t)y = 0.
Suppose that p(t) is a constant a:
y′ + ay = 0, or y′ = −ay. (3.57)
Guess a solution of the form
y = ce^{λt} (3.58)
and put it into the DE:
y′ + ay = λce^{λt} + ace^{λt} = 0 ⇒ ce^{λt}(λ + a) = 0 ⇒ λ + a = 0. (3.59)
Equation (3.59) is called the characteristic equation of the DE (3.57). It has solution λ = −a. Thus
y = ce^{−at} (3.60)
is a solution of (3.57).
Note: We could have also obtained this solution using
separation of variables or an integrating factor.
Example 3.2.1.
Write down the characteristic equation for each of the
following DEs and use it to find a one-parameter family
of solutions to the DE.
(a) y ′ + 3y = 0
(b) 2y ′ = y
(c) y ′ − 5y = 0
3.2.2 Systems of first-order linear DEs
Now consider two linear constant-coefficient homogeneous differential equations in two dependent variables
x1 (t) and x2 (t)
x′1 = ax1 + bx2
,
x′2 = cx1 + dx2
or
x′ = Ax, (3.61)
′
x1
x1
a b
′
.
where x =
, x = ′ , and A =
c d
x2
x2
Notice the similarity between this and the DE (3.57).
To solve this system of DEs, we take our cue from the
one-dimensional case (3.58) and guess a vector solution
λt v e
λt
x = ve = 1 λt ,
(3.62)
v2 e
where v is a vector of constants.
Substituting this into the system gives
λv1 eλt
′
x =
= λeλt v = Aeλt v,
λv2 eλt
i.e., Av = λv, so that λ must be an eigenvalue of A and
v a corresponding eigenvector.
This gives two solutions of the DE:
x1 = eλ1 t v1 ,
and
x2 = eλ2 t v2 ,
where v1 and v2 are eigenvectors of A for the eigenvalues λ1 and λ2 .
If the eigenvalues are distinct (λ1 6= λ2 ),
• it can be shown that the eigenvectors v1 and v2
are linearly independent,
• all linear combinations of eλ1 t v1 and eλ2 t v2 are
also solutions of the DE
• these two solutions form a basis of all solutions
of the DE.
The solution of x′ (t) = Ax(t) is
x(t) = c1 eλ1 t v1 + c2 eλ2 t v2
(3.63)
where λ1 and λ2 are the eigenvalues of A, v1 and v2 the
corresponding eigenvectors, and c1 and c2 are arbitrary
constants.
Solution Technique 3.2.2. To solve the system of DEs
x1′ = ax1 + bx2
x2′ = cx1 + dx2
• write the differential equations in system form: x′(t) = Ax(t) with A = [a b; c d],
• find the eigenvalues λ1 and λ2 and corresponding eigenvectors v1 and v2 of A,
• the solution of the original DE is:
x(t) = c1e^{λ1t}v1 + c2e^{λ2t}v2
where c1 and c2 are arbitrary constants,
• apply the initial conditions to determine c1 and c2.
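A Matlab sketch of this technique, using the matrix and initial condition of Example 3.2.4(c) below (eig normalises the eigenvectors, so the constants differ from the hand calculation, but the resulting solution is the same):

%Matlab-session - solving x' = Ax by eigen-decomposition (a sketch)
a=[7 -1; 3 3]; x0=[1; 0];   % matrix and IC of Example 3.2.4(c)
[v,d]=eig(a);               % eigenvalues 4 and 6 on the diagonal of d
c=v\x0;                     % solve Vc = x0 for the constants
syms t
x=c(1)*exp(d(1,1)*t)*v(:,1)+c(2)*exp(d(2,2)*t)*v(:,2)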
Since the solution x(t) is a vector-valued function of t, we need three dimensions to plot it. If we let t vary, then,
for fixed values of c1 and c2 , x(t) traces out a curve in the (x, y)–plane. A collection of such curves for a set of
values of c1 and c2 is called a phase portrait. Some phase portraits (with arrows indicating the direction that we
move along the curves as t increases) are given for the solutions below.
x′ = Ax for diagonal matrices A
The eigenvalues of a diagonal matrix are its diagonal entries, and the corresponding eigenvectors are the standard basis vectors e1 = [1; 0] and e2 = [0; 1].
Example 3.2.3.
Solve the system of first-order DEs:
(a) d/dt [x(t); y(t)] = [−x(t); y(t)]
[Phase portrait with eigen-directions v1 and v2.]
(b) d/dt [x(t); y(t)] = [−x(t); −2y(t)]
[Phase portrait.]
(c) [x; y]′ = [x; −2y]
[Phase portrait.]
(d) [x; y]′ = [1.5x; 0.5y]
[Phase portrait.]
(e) [x; y]′ = [x; 2y]
[Phase portrait.]
x′ = Ax for general matrices A
Example 3.2.4.
(a) x′ = [1 −3; 0 −2]x:
The solution of the DE is
x = c1[1; 0]e^t + c2[1; 1]e^{−2t}
[Phase portrait.]
(b) x′ = [−1/4 3/4; 5/4 1/4]x
The solution of the DE is
x = c1[−1; 1]e^{−t} + c2[0.6; 1]e^t
[Phase portrait.]
(c) x′ = [7 −1; 3 3]x, x(0) = [1; 0].
Solution: x = (3/2)[1; 1]e^{6t} − (1/2)[1; 3]e^{4t}.
[Phase portrait.]
(d) x′ = [1 −1; 2 −2]x, x(0) = [1; 0].
Solution: x = [2; 2] − [1; 2]e^{−t}.
[Phase portrait.]
Extra for interest: The method above gives a way to find analytic solutions to linear constant coefficient homogeneous systems of DEs. However, many interesting DEs are not of this type, in which case qualitative and
numerical methods are often useful for finding out about the behaviour of solutions.
Example 3.2.5.
The Matlab code
%Matlab-SESSION
%
% Solve the system of DEs using ode23
% needs auxiliary file, v_prime, defining x' and y'
% x'=-y; y'=5-2y-x^2-xy;
clear t v_out; % clear variables we name here
[t,v_out]=ode23(@v_prime,[0,20],[2;1]);
plot(v_out(:,1),v_out(:,2)); % phase-plane plot of x against y
xlabel('x');
ylabel('y');
can be used to numerically integrate and plot solutions to the system
dx/dt = −y, dy/dt = 5 − 2y − x² − xy.
[Plot of a numerically integrated trajectory in the (x, y)-plane.]
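The auxiliary file v_prime is not printed here; one possible version, consistent with the comments in the session above, is:

function vdot=v_prime(t,v)
%v_prime - a possible version of the auxiliary file (a sketch)
% implements x'=-y; y'=5-2y-x^2-xy with v=[x; y]
x=v(1); y=v(2);
vdot=[-y; 5-2*y-x^2-x*y];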
Example 3.2.6.
Use Matlab to numerically integrate and plot solutions to the system
dx/dt = 2x(1 − x/2) − xy, dy/dt = 3y(1 − y/3) − 2xy,
for x ∈ [0, 4], y ∈ [0, 4]. What is the long-term behaviour of solutions to the DE?
3.3 Homogeneous Linear Second-Order
DEs with constant coefficients
3.3.1 Introduction
A second-order linear DE is one which can be written in
the form
a(t)y ′′ + b(t)y ′ + c(t)y = d(t).
As in the first-order case, we only consider intervals of t where a(t) ≠ 0, and then rewrite the DE as
y′′ + p(t)y′ + q(t)y = f(t), (3.64)
with leading coefficient 1.
Note: The DE is defined only where p(t), q(t) and f (t)
are all defined.
Terminology
homogeneous/inhomogeneous
The second-order linear DE is said to be homogeneous
if it has a zero right-hand-side, i.e., has the form
y ′′ + p(t)y ′ + q(t)y = 0,
otherwise, if the right-hand-side term f (t) is not the zero
function, then the DE is called inhomogeneous.
solution
A solution of the DE
y ′′ + p(t)y ′ + q(t)y = f (t)
is a function y that, when substituted for the unknown
dependent variable, yields a true statement from the DE.
A solution y of a DE satisfies it.
Example 3.3.1.
Verify the following:
(a) y = cos(t) is a solution of y ′′ + y = 0.
(b) e−t and e−3t are solutions of
y ′′ + 4y ′ + 3y = 0.
(c) u = et is not a solution of u′′ − 4u = et .
vector space of solutions
If y1 and y2 are solutions of the homogeneous DE
y′′ + p(t)y′ + q(t)y = 0,
then so is any linear combination y = c1y1 + c2y2:
y = c1y1 + c2y2
y′ =
y′′ =
Plug these expressions into
y′′ + p(t)y′ + q(t)y
and see that after the cancellations, you get 0. Thus the set of solutions of this DE forms a vector space. This is sometimes called the superposition principle.
Superposition Principle
If y1 and y2 are two solutions of a homogeneous linear DE
y′′ + p(t)y′ + q(t)y = 0,
then any linear combination
c1y1 + c2y2
of them is also a solution of the same DE.
Initial-Value Problems
A linear second-order DE
y ′′ + p(t)y ′ + q(t)y = f (t)
can be solved by two integrations, which introduces two
arbitrary constants of integration. You will see why this
is, in the example at the end of this section. This example
shows that such a DE can be rewritten as a first-order
system (whose solution gives the two constants).
Consequently, the family of solutions of a 2nd-order DE
is parameterised by two arbitrary constants.
A specific solution in this family requires 2 observations.
We will consider the so-called initial conditions.
• Initial Conditions (ICs): Take the values
y(t0 ), y ′ (t0 )
i.e., make both observations at one initial point t0
As for first order DEs we have:
Initial-Value Problem (IVP):
Differential Equation + Initial Conditions (ICs)
It can be shown that if the coefficients p(t) and q(t)
and the right-hand-side f (t) are well-behaved functions,
then
• a 2nd-order IVP always has a unique solution.
3.3.2 Solving Homogeneous Linear Second-Order
DEs
We restrict ourselves to solving 2nd-order, and some 3rd-order linear homogeneous DEs.
Much of the theory extends directly to DEs of higher
order.
Our goal is:
To find the most general solution of a homogeneous linear DE
y ′′ + p(t)y ′ + q(t)y = 0,
so that we can solve all Initial-Value Problems based on
it.
Example 3.3.2.
Show the following
(a) y(t) = 0 is a solution of
y ′′ + 4y ′ + 3y = 0,
but it doesn’t satisfy the ICs y(0) = 1, y ′ (0) = 1.
(b) y(t) = e−t is a solution of
y ′′ + 4y ′ + 3y = 0,
but it doesn’t satisfy the ICs y(0) = 1, y ′ (0) = 1.
We have already seen that the set of all solutions of a
2nd-order homogeneous DE is a vector space. Since
the solution involves two arbitrary constants, this vector space is 2-dimensional. More generally, the set of
solutions of an nth–order homogeneous DE forms an n–dimensional vector space.
To decide whether we have found enough solutions to
form a basis for the space of solutions, we need to know
when sets of functions are linearly independent.
Linear Independence of functions
Two or more functions y1 (t), y2 (t), . . . yn (t) are said to
be linearly independent on an interval I if whenever a
linear combination of them is equal to zero on the whole
interval:
c1 y1 (t) + c2 y2 (t) + . . . + cn yn (t) = 0 for all t in I,
then
c1 = c2 = . . . = cn = 0,
i.e., only the trivial linear combination equals the zero
function on I.
If they are not linearly independent, they are said to be
linearly dependent.
Notes:
• The same definition is given in the linear algebra part. For a function to be the zero function, every value must be 0:
c1y1(t) + c2y2(t) + . . . + cnyn(t) = 0, t ∈ I,
whereas for a vector c1v1 + c2v2 + . . . + cnvn to be the zero vector it must be 0 in each coordinate.
• For DEs, I is the interval where we solve the DE:
where the coefficient functions and the right-hand
side are defined.
• Functions (or vectors) are linearly dependent when
one of them can be written as a linear combination
of others.
Example 3.3.3.
If two functions f1 and f2 are linearly dependent on
some interval I, then there are constants c1 and c2 , not
both zero, such that
c1 f1 (t) + c2 f2 (t) = 0 for all t ∈ I,
i.e., one is a scalar multiple of the other.
Example 3.3.4.
Show that the following sets of functions are linearly dependent, by verifying the given relationship:
(a) t, t − 1, t + 3 on (−∞, ∞),
and c1 t + c2 (t − 1) + c3 (t + 3) = 0
for c1 = −4, c2 = 3, c3 = 1.
(b) √t + 5t, √t + 5, t − 1, t² on (0, ∞),
and c1(√t + 5t) + c2(√t + 5) + c3(t − 1) + c4t² = 0
for c1 = 1, c2 = −1, c3 = −5, c4 = 0.
For a set of three or more functions the easiest way to
determine whether they are linearly independent on an
interval is via their Wronskian:
Wronskian
If y1(t), y2(t), . . . yn(t) all have at least n − 1 derivatives, then their Wronskian is defined to be the determinant
W(y1, y2, . . . yn) = det [ y1 y2 · · · yn ; y1′ y2′ · · · yn′ ; y1′′ y2′′ · · · yn′′ ; ... ; y1^{(n−1)} y2^{(n−1)} · · · yn^{(n−1)} ].
Test for Linear Independence
If y1(t), y2(t), . . . yn(t) all have at least n − 1 derivatives, and
W(y1, y2, . . . yn)(t) ≠ 0
for at least one point t in I, then the functions y1(t), y2(t), . . . yn(t) are linearly independent on I. If the determinant is zero on all of I, the functions are linearly dependent.
Example 3.3.5.
Compute the Wronskian of each of the following sets
of functions, and determine whether the functions are
linearly independent on the intervals given.
(a) 1 and t on (−∞, ∞)
(b) t and t − 1 on (−∞, ∞)
(c) 1, t and t² on (−∞, ∞)
(d) t, t − 1, t + 3 on (0, ∞)
(e) t and t² on (−∞, ∞)
(f) 1 and e^t on (−∞, ∞)
(g) t and e^t on (−∞, ∞)
(h) e^{−t} and e^{−3t} on (−∞, ∞)
(i) e^t and te^t on (−∞, ∞)
(j) sin(t) and cos(t) on [−2π, 2π]
(k) e^{λ1t} and e^{λ2t} on (−∞, ∞), for λ1 ≠ λ2
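Case (k) can also be checked symbolically in Matlab (a sketch):

%Matlab-session - Wronskian for case (k) (a sketch)
syms t l1 l2
y1=exp(l1*t); y2=exp(l2*t);
W=simplify(det([y1 y2; diff(y1,t) diff(y2,t)]))
% W = (l2-l1)*exp(t*(l1+l2)), nonzero whenever l1 ~= l2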
If y1(t), y2(t), . . . yn(t) are n linearly independent functions on an interval I, we know that W(y1, y2, . . . yn)(t0) ≠ 0 for some t0 ∈ I. If, in addition, they are also solutions of an nth-order homogeneous linear DE on any part of I (let’s assume it’s all of I), we know a little more:
Facts about the Wronskian of a Set of Solutions to a Homogeneous Linear DE
(i) If y1, y2, . . . yn are n linearly independent solutions of an nth-order homogeneous linear DE on an interval I, then
W(y1, y2, . . . yn)(t) ≠ 0
for every t in I.
(ii) Any set of m solutions y1, y2, . . . ym, m > n, of an nth-order homogeneous linear DE, have
W(y1, y2, . . . ym)(t) = 0 for all t.
You now know enough about DEs to prove these facts in the case of a second-order DE:
y ′′ + p(t)y ′ + q(t)y = 0
(3.65)
Solutions y1 and y2 of (3.65) satisfy
y1′′ + p(t)y1′ + q(t)y1 = 0
y2′′ + p(t)y2′ + q(t)y2 = 0
so
y1′′ = −p(t)y1′ − q(t)y1 (3.66)
y2′′ = −p(t)y2′ − q(t)y2. (3.67)
The Wronskian of y1 and y2 is
W(y1, y2) = det[y1 y2; y1′ y2′] = y1y2′ − y1′y2.
And if they are linearly independent,
W(y1, y2)(t0) ≠ 0 at some t0 ∈ I.
Differentiate this Wronskian:
d/dt W(y1, y2) = y1′y2′ + y1y2′′ − y1′′y2 − y1′y2′
= y1y2′′ − y1′′y2
= y1(−p(t)y2′ − q(t)y2) − (−p(t)y1′ − q(t)y1)y2
from (3.66) and (3.67). This simplifies to give
dW/dt = −pW.
This is a separable 1st-order DE. Solving,
W(y1, y2) = Ce^{−∫p(t)dt}.
Now impose a special initial condition at a particular t = t0:
W(y1, y2)(t0) = its non-zero value.
(y1 and y2 are linearly independent, so this Wronskian has a non-zero value at at least one point t0.)
The solution to this IVP is:
W(y1, y2) = W(y1, y2)(t0)e^{−∫p(t)dt}.
Neither factor on the right-hand side is zero, so we’ve proved the result!
Example 3.3.6.
Suppose y1 , y2 and y3 are three solutions of the second order DE (3.65). For i = 1, 2 and 3,
yi′′ + p(t)yi′ + q(t)yi = 0
so
yi′′ = −p(t)yi′ − q(t)yi
Use Gaussian elimination to show that the Wronskian of these three functions is constantly 0.
Now to the topic of finding enough linearly independent solutions of a DE:
Fundamental Set of Solutions for a Linear DE
Any set of n linearly independent solutions of an nth-order homogeneous linear DE is called a fundamental set
of solutions of the DE. Since the set of solutions is an n–dimensional vector space, these form a basis for it.
basis
A fundamental set of solutions of a homogeneous linear DE is a basis of solutions for the DE, i.e., every solution
of the DE can be written as a linear combination of a fundamental set of solutions.
This means we can write the most general solution of an nth-order homogeneous linear DE:
Solution Technique 3.3.7. To solve a homogeneous linear DE of order n and associated IVPs:
• find a fundamental set of solutions of the DE:
y1 , y2 , . . . , yn
• the general solution of the DE is the linear combination
y = c1 y1 + c2 y2 + . . . + cn yn
for arbitrary constants c1 , . . . , cn .
• solve for the constants by applying the initial conditions
Example 3.3.8.
Solve the IVP
y ′′ + 4y ′ + 3y = 0,
y(0) = −1, y ′ (0) = 2.
3.3.3 Homogeneous Linear DEs with Constant
Coefficients.
We restrict ourselves now to the simplest type of homogeneous linear DE, that with constant coefficients. In the
second-order case, this is of the form
ay ′′ + by ′ + cy = 0,
a, b and c constants, a 6= 0.
What follows is a general method of finding a fundamental set of solutions of an nth-order homogeneous linear DE with constant coefficients. We demonstrate the
method in the 2nd-order and 3rd-order cases.
Characteristic Equation
To solve
ay ′′ + by ′ + cy = 0,
a, b and c constants, a 6= 0,
(3.68)
we start out by guessing a solution
y = ke^{λt}, y′ = λke^{λt}, y′′ = λ²ke^{λt}.
Putting this y into (3.68),
0 = k(aλ²e^{λt} + bλe^{λt} + ce^{λt}) = ke^{λt}(aλ² + bλ + c).
Since e^{λt} ≠ 0 for any t, and we assume that k ≠ 0, it must be that aλ² + bλ + c = 0.
Definition 3.3.9.
The equation
aλ² + bλ + c = 0 (3.69)
obtained by making the substitution y = ke^{λt} in the DE
ay′′ + by′ + cy = 0
is called the characteristic equation of the DE.
Notes: 1. Note the similarity between the characteristic equation of the DE (3.69) and the differential equation itself (3.68).
2. Solutions λ of the characteristic equation give solutions e^{λt} of the DE.
Solutions of the characteristic equation
The characteristic equation aλ² + bλ + c = 0 for a second-order DE is a quadratic equation. It therefore has the solution
λ = (−b ± √(b² − 4ac))/(2a).
Three Possible Outcomes in Solving the Characteristic Equation:
b² − 4ac > 0: two distinct real roots λ1 and λ2 of the characteristic equation, yielding two solutions e^{λ1t} and e^{λ2t} of the DE.
b² − 4ac = 0: one repeated real root λ1 of the characteristic equation, yielding one solution e^{λ1t} of the DE.
b² − 4ac < 0: two complex roots λ1 and λ2 of the characteristic equation, yielding two complex-valued solutions e^{λ1t} and e^{λ2t} of the DE.
For this course, we restrict ourselves to cases (i) and (ii) above – when b² − 4ac ≥ 0.¹
(i) Distinct Roots of the Characteristic Equation
In case (i), there are two real distinct solutions for λ, giving as many solutions e^{λt} as the order of the DE. By Example 3.3.5 (k), these are linearly independent, so by solving the characteristic equation we get a fundamental set of solutions of the DE.
Fundamental set of solutions when
characteristic equation has distinct real roots
An nth-order constant-coefficient homogeneous DE whose characteristic equation has n distinct real roots λ1, λ2, . . . , λn has the fundamental set of solutions
e^{λ1t}, e^{λ2t}, . . . , e^{λnt}.
¹ The case b² − 4ac < 0 involves complex numbers (not a part of this course) and Euler’s formula e^{it} = cos(t) + i sin(t).
E.g. y′′ + 4y = 0 has characteristic equation λ² + 4 = 0 ⇒ λ = ±2i. The general solution is y = c1e^{2it} + c2e^{−2it}, which can be rewritten, using Euler’s formula, as y = d1 cos 2t + d2 sin 2t.
This can be verified directly:
y′ = −2d1 sin 2t + 2d2 cos 2t,
y′′ = −4d1 cos 2t − 4d2 sin 2t,
so that y′′ + 4y = 0.
Example 3.3.10.
Find the general solution of the following DEs by solving the characteristic equation and writing the general
solution as a linear combination of the corresponding solutions of the DE. Check your answers.
(a) 2y ′ + 3y = 0
(b) y ′′ − 2y ′ − 3y = 0
(c) y ′′ + 5y ′ + 4y = 0
(d) y ′′ − 9y = 0
(e) y ′′′ + 2y ′′ − 3y ′ − 6y = 0
(f) y ′′′ + 4y ′′ + y ′ − 6y = 0
Note: When you have to factor a cubic, unless the factorisation is obvious, try to guess a root: if you find a number c that makes the polynomial equal to 0, then λ − c is a factor. To find the others, divide the polynomial by λ − c using long division.
(ii) Repeated Roots of the Characteristic Equation
Example 3.3.11.
Solving the characteristic equation for y ′′ + 2y ′ + y = 0,
λ2 + 2λ + 1 = (λ + 1)2 = 0,
we see that λ = −1 is the only root, giving just one
solution y = e−t of the DE.
In every case of repeated roots, we get fewer solutions of the form y = eλt than the order of the DE. A method related to integrating factors tells us the following:
If a root λ of the characteristic equation is repeated k times, then
eλt , teλt , t2 eλt , . . . , tk−1 eλt
are all solutions of the DE. Further, these solutions are linearly independent.
Example 3.3.12. (a) Show that te−t is another solution of the DE y ′′ + 2y ′ + y = 0.
(b) Show that te−t is linearly independent from e−t .
Example 3.3.13.
Find the characteristic equation of y ′′′ − 6y ′′ + 12y ′ − 8y = 0, factorise and solve it.
Then show that y = e2t , y = te2t and y = t2 e2t are linearly independent solutions of the DE.
Hint: Computation of the Wronskian of this set of solutions shows that they are linearly independent.
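Following the hint, the Wronskian can be computed symbolically in Matlab; a minimal sketch, assuming the Symbolic Math Toolbox:

%Matlab-session
% Wronskian of exp(2t), t*exp(2t), t^2*exp(2t)
syms t
y=[exp(2*t), t*exp(2*t), t^2*exp(2*t)];
W=[y; diff(y,t); diff(y,t,2)];  %rows: y, y', y''
simplify(det(W))  %a non-zero multiple of exp(6*t), so linearly independent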
Example 3.3.14.
Solve the following DEs:
(a) y ′′ + 2y ′ + y = 0
(b) y ′′ + 6y ′ + 9y = 0
(c) y ′′′ + y ′′ − y ′ − y = 0
(d) y ′′′ + 4y ′′ + 4y ′ = 0
Solution Technique 3.3.15. To find the general solution of an nth-order homogeneous linear DE with constant coefficients:
• form the characteristic equation and find its roots λ
• form the corresponding solutions eλt of the DE
• if a root λ of the characteristic equation is repeated k times, supplement eλt with teλt , t2 eλt , . . . , tk−1 eλt
• the general solution is a linear combination of this fundamental set of solutions
Initial-Value Problems
Solving the DE with initial conditions determines the constants of integration in the general solution; use either
substitution or Gaussian elimination to find them.
Example 3.3.16.
(a) y ′′ + 5y ′ + 4y = 0,
y(0) = 0, y ′ (0) = 1
(b) 2y ′′ − 5y ′ − 3y = 0,
y(0) = 1, y ′ (0) = −2
3.3.4 Equivalence of Second Order DE and First-Order System of DEs
We now illustrate the equivalence of a second-order homogeneous DE and a corresponding system of two first-order DEs, in the case that the roots of the characteristic equation are distinct.
Example 3.3.17.
y ′′ + 5y ′ + 4y = 0, y(0) = 0, y ′ (0) = 1. Form the vector
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} y \\ y' \end{pmatrix}. \quad \text{Then } \mathbf{v}' = \begin{pmatrix} y' \\ y'' \end{pmatrix},$$
and solving for y ′′ from the DE: y ′′ = −4y − 5y ′ , so
$$\mathbf{v}' = \begin{pmatrix} y' \\ y'' \end{pmatrix} = \begin{pmatrix} y' \\ -4y - 5y' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -4 & -5 \end{pmatrix}\begin{pmatrix} y \\ y' \end{pmatrix}.$$
Also from the DE,
$$\mathbf{v}(0) = \begin{pmatrix} y(0) \\ y'(0) \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
The IVP can be restated in system form as
$$\mathbf{v}' = A\mathbf{v}, \quad \mathbf{v}(0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \text{for matrix } A = \begin{pmatrix} 0 & 1 \\ -4 & -5 \end{pmatrix}. \tag{3.70}$$
The matrix A has characteristic equation λ2 + 5λ + 4 = 0, with roots λ1 = −1, λ2 = −4, the eigenvalues of A. The corresponding eigenvectors are
$$\mathbf{v}_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ -4 \end{pmatrix}$$
respectively. The general solution of (3.63) is
$$\mathbf{x} = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 = c_1 e^{-t}\begin{pmatrix} 1 \\ -1 \end{pmatrix} + c_2 e^{-4t}\begin{pmatrix} 1 \\ -4 \end{pmatrix}.$$
Then apply the initial condition
$$\mathbf{x}(0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Example 3.3.18.
Solve the IVP y ′′ − 2y ′ − 3y = 0, y(0) = 1, y ′ (0) = 1 using a system of first-order DEs.
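The eigenvalues and eigenvectors of A can be checked numerically with eig; a small sketch (note that eig scales each eigenvector to unit length, so the columns returned are multiples of (1, −1)T and (1, −4)T):

%Matlab-session
% eigenvalues and eigenvectors of the system matrix
A=[0 1; -4 -5];
[V,D]=eig(A)  %columns of V: eigenvectors; diagonal of D: eigenvalues -1, -4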
It is usually quicker to solve higher order, linear constant coefficient homogeneous DEs directly (i.e., without
first converting to a system). However, if we want to use numerical or qualitative techniques we usually convert
to a system. For example, to study the DE
y ′′ + sin y = 0
(a model for a pendulum if the amplitude is large), the analytic methods above will not work, and we can instead
rewrite it as a system
y ′ = z,
z ′ = − sin y
and study this numerically.
Example 3.3.19.
Numerical solution to pendulum problem (different from above).
%Matlab-SESSION
%
% Solve the system of DEs using ode23
% needs auxiliary file, v2_prime.m, defining x' and y'
% x'=-y; y'=5-2y-x^2-xy;
clear t v_out; % clear variables we name here
[t,v_out]=ode23(@v2_prime,[0,2000],[2;1]);
plot(v_out(:,1),v_out(:,2));
xlabel('y');
ylabel('z');
axis square;
title('solution to pendulum problem');
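The auxiliary file is not reproduced in the coursebook; a minimal sketch of what v2_prime.m could look like, matching the comment x'=-y; y'=5-2y-x^2-xy:

%contents of the assumed auxiliary file v2_prime.m
function vp=v2_prime(t,v)
% v(1) plays the role of x, v(2) the role of y
vp=[-v(2); 5-2*v(2)-v(1)^2-v(1)*v(2)];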
[Two phase-plane plots titled "solution to pendulum problem", plotting z against y.]
Figure 3.12: Early solution, and then solution over much longer time-scale.
4.1 Vectors
4.1.1 Vector Arithmetic
We can add and subtract vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) ∈ Rn just as we do real numbers,
and interpret the results geometrically:
u + v = (u1 , . . . , un ) + (v1 , . . . , vn ) = (u1 + v1 , . . . , un + vn ) ∈ Rn
and we can multiply v by a scalar c (i.e. a number):
cv = (cv1 , . . . , cvn )
to stretch/shrink the vector’s length, or reverse its direction.
[Figure: parallelogram pictures of the sum u + v, the difference v − u, and the negative −u.]
Example 4.1.1.
Sketch the vectors u = (1, 2) and v = (3, −1). On the same graph, sketch
(a) u + v
(b) u − v
(c) −2u
(d) 3u − 2v
Properties of vectors in Rn If u = (u1 , u2 , · · · , un ), v = (v1 , v2 , · · · , vn ) and w = (w1 , w2 , · · · , wn ) are vectors in Rn and k and l are scalars, then
(a) u + v = v + u
(b) u + (v + w) = (u + v) + w
(c) u + 0 = 0 + u = u
(d) u + (−u) = 0; i.e. u − u = 0
(e) k(lu) = (kl)u
(f) k(u + v) = ku + kv
(g) (k + l)u = ku + lu
(h) 1u = u
Here 0 = (0, 0, · · · , 0) is the zero vector of Rn , with n components.
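These operations are easy to check numerically; a small Matlab sketch using the vectors of Example 4.1.1:

%Matlab-session
% vector arithmetic with u=(1,2), v=(3,-1)
u=[1;2]; v=[3;-1];
u+v      %(4,1)
u-v      %(-2,3)
-2*u     %(-2,-4)
3*u-2*v  %(-3,8)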
4.1.2 Length, distance, and angles in Rn
The length ‖v‖ of a vector v ∈ Rn is defined as
$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.$$
In R2 this is Pythagoras's theorem. It can also be stated
$$\|\mathbf{v}\|^2 = v_1^2 + v_2^2 + \cdots + v_n^2. \tag{4.71}$$
Definition 4.1.2.
Vectors of length 1 are called unit vectors.
Note: For any v ≠ 0, v/‖v‖ is a unit vector.
In R2 or R3 , any two vectors v and w that originate at a common point and which are not parallel (i.e. v ≠ kw) form two sides of a triangle. The vector v − w (or w − v) forms the third side. The sides have lengths ‖v‖, ‖w‖ and ‖v − w‖, respectively - see Fig 4.13.
[Figure 4.13: Distance between vectors v and w: the triangle with sides v, w and w − v.]
Definition 4.1.3.
The distance between vectors v and w in Rn is
$$\|\mathbf{w} - \mathbf{v}\| = \sqrt{(w_1 - v_1)^2 + (w_2 - v_2)^2 + \cdots + (w_n - v_n)^2}. \tag{4.72}$$
Angles in Rn , orthogonality
We now present a tool which helps in the computation of lengths of vectors, and angles between them:
Definition 4.1.4.
The dot product of two vectors v and w in Rn is denoted by v · w, and defined as
$$\mathbf{v} \cdot \mathbf{w} = \underbrace{\mathbf{v}^T}_{1\times n}\;\underbrace{\mathbf{w}}_{n\times 1} = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n.$$
The dot product of two vectors is also referred to as their scalar product.
Properties of the dot product in Rn If u, v, w are vectors in Rn and r is a scalar,
• (u + v) · w = u · w + v · w;
• (ru) · v = r(u · v);
• u · v = v · u;
• u · u > 0 whenever u ≠ 0.
We can restate the length, or magnitude, of a vector v ∈ Rn in terms of the dot product, as
$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \tag{4.73}$$
or alternatively,
$$\mathbf{v} \cdot \mathbf{v} = \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + \cdots + v_n^2. \tag{4.74}$$
Pythagoras's theorem says that perpendicular, or right-angled, or orthogonal vectors v and w in R2 (v ⊥ w) satisfy
$$\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2. \tag{4.75}$$
(See Figure 4.14.)
[Figure 4.14: If v ⊥ w, then v · w = 0: a right triangle with legs of lengths ‖v‖ and ‖w‖ and hypotenuse v − w.]
Using dot products,
$$\|\mathbf{v} - \mathbf{w}\|^2 = (\mathbf{v} - \mathbf{w}) \cdot (\mathbf{v} - \mathbf{w}) = \mathbf{v} \cdot \mathbf{v} - \mathbf{v} \cdot \mathbf{w} - \mathbf{w} \cdot \mathbf{v} + \mathbf{w} \cdot \mathbf{w} = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 - 2\,\mathbf{v} \cdot \mathbf{w}. \tag{4.76}$$
So if v and w are perpendicular, then v · w = 0.
We extend this result to Rn :
Definition 4.1.5.
Non-zero vectors v and w in Rn are said to be orthogonal when v · w = 0.
Again from R2 , the law of cosines tells us that vectors v and w with angle θ between them (not necessarily perpendicular) satisfy
$$\|\mathbf{v} - \mathbf{w}\|^2 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 - 2\|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta. \tag{4.77}$$
Comparing this with (4.76), we get
$$\mathbf{v} \cdot \mathbf{w} = \|\mathbf{v}\|\,\|\mathbf{w}\|\cos\theta. \tag{4.78}$$
[Figure: triangle with sides of lengths ‖v‖ and ‖w‖, included angle θ, and third side of length ‖v − w‖.]
Restating this result in higher dimensions:
Definition 4.1.6.
The angle θ between non-zero vectors v and w in Rn has cos θ = (v · w)/(‖v‖ ‖w‖). That is,
$$\theta = \cos^{-1}\left(\frac{\mathbf{v} \cdot \mathbf{w}}{\|\mathbf{v}\|\,\|\mathbf{w}\|}\right).$$
Note: If the angle between two vectors is π/2, they are said to be orthogonal.
%Matlab-session
%dot product, length, unit vector, angle
clf
u=[1 2 2]'          %column vectors
v=[1 1 1]'
u'*v                %u.v, or dot(u,v)
norm(u)             %length of u
norm(u-v)           %distance between u and v
a=u/norm(u)         %unit vector in direction of u
b=v/norm(v)         %unit vector in direction of v
theta=acos(a'*b)    %angle between u and v
axis([0 2 0 2 0 2]);         %set up axes for plot
A=[u,zeros(3,1),v]           %3 pts: u,0,v in mtx
plot3(A(1,:),A(2,:),A(3,:))  %plot them
text(1,2,2,'u')              %label u at (1,2,2)
text(1,1,1,'v')              %label v at (1,1,1)
Example 4.1.7.
If two unit vectors u and v are parallel, what is u · v?
4.2 Vector Representation of Lines and Planes
4.2.1 Vector Representation of Lines and Planes
The equations for lines and planes in R2 and R3 can be written in vector form or as a system of linear equations.
We denote by r = (x, y, z) an arbitrary point in R3 , and by r0 = (x0 , y0 , z0 ) a particular point.
When the z-coordinate is 0, we have a point in R2 .
(i) The vector form of the equation of a line through point r0 with direction d is
r = r0 + td,  t ∈ R.  (4.79)
(ii) The vector form of the equation of a plane through point r0 with directions d and d′ is
r = r0 + sd + td′ ,  s, t ∈ R.  (4.80)
[Figures: a line through r0 with direction vector d, and a plane through r0 with direction vectors d and d′ .]
Note:
• lines have one direction vector,
• planes have two direction vectors.
Example 4.2.1.
The solution set of the equation x + 2y − 3z = 0 is a plane.
(a) Find the equation of the plane in vector form.
(b) Verify that it is a plane through the origin perpendicular to the vector (1, 2, −3).
In R3 , arbitrary non-parallel vectors v and w, emanating from the origin, lie in exactly one plane:
r = sv + tw,
the span of the vectors v and w. A vector which uniquely determines this plane is its normal vector.
Definition 4.2.2.
A vector n is normal to a plane P if it is perpendicular to every vector in P .
Given a plane, how do we find a vector normal to it?
The Cross-Product
Definition 4.2.3.
The cross-product of vectors v and w in R3 is denoted v × w, and defined as
$$\mathbf{v} \times \mathbf{w} = \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix}\mathbf{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix}\mathbf{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}\mathbf{k} \tag{4.81}$$
If v and w are not parallel, v × w is a vector perpendicular to them both.
In (4.81), we use the notation
$$\mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
Often, for convenience, we write the formula (4.81) in the briefer form
$$\mathbf{v} \times \mathbf{w} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \tag{4.82}$$
(4.82) is not a real determinant, but it may be helpful as a mnemonic.
[Figure: the vector n = v × w is normal to the plane P containing v and w.]
Example 4.2.4.
Find the following cross-products:
(a) i × j
(b) i × k
(c) $\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} \times \begin{pmatrix} 0 \\ -2 \\ 1 \end{pmatrix}$
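Matlab's cross and dot functions make it easy to verify that v × w is perpendicular to both v and w; a small sketch with arbitrarily chosen vectors:

%Matlab-session
% cross-product and perpendicularity check
v=[1;2;3]; w=[4;5;6];
n=cross(v,w)  %(-3,6,-3)
dot(n,v)      %0, so n is perpendicular to v
dot(n,w)      %0, so n is perpendicular to w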
[Figure 4.15: Area of parallelogram with sides v and w is |v × w|; its height is |v| sin θ, where θ is the angle between v and w.]
Example 4.2.5. (a) Choose any non-parallel vectors v and w in R3 and verify that v × w is a vector perpendicular to them both.
(b) For any non-parallel vectors v and w in R3 with angle θ between them,
(i) |v × w| = |v||w| sin(θ).
(ii) The area of a parallelogram with sides the vectors v and w is |v × w|.
The normal equation of a plane in R3
The plane through a point r0 = (x0 , y0 , z0 ) and containing vectors v and w has vector form r = r0 + sv + tw. Here r = (x, y, z) represents an arbitrary point on the plane. Referring to Figure 4.16:
• The vector r − r0 = (x − x0 , y − y0 , z − z0 ) from r0 to r represents an arbitrary vector in the plane (in the span of v and w).
• The vector n = v × w is normal to the plane.
• The vector r − r0 lies in the plane, so is orthogonal to n:
(r − r0 ) · n = (x − x0 , y − y0 , z − z0 ) · n = 0.
This holds for all points (x, y, z) in the plane.
[Figure 4.16: Normal vector n = v × w to the plane P through (x0 , y0 , z0 ).]
The normal equation of a plane through point
r0 = (x0 , y0 , z0 ) with normal vector n = (a, b, c), is ax + by + cz = d where d = ax0 + by0 + cz0 .
In particular, the equation of a plane through the origin with normal (a, b, c) is
ax + by + cz = 0.
Recall: The equation of a line in R2 through the origin is ax + by = 0.
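As a worked illustration (the point and direction vectors here are chosen for the example, not taken from the coursebook): the plane through r0 = (1, 0, 0) containing v = (1, 1, 0) and w = (0, 1, 1) has normal n = v × w = (1, −1, 1) and d = n · r0 = 1, so its normal equation is x − y + z = 1. The same computation in Matlab:

%Matlab-session
% normal equation of a plane from a point and two direction vectors
r0=[1;0;0]; v=[1;1;0]; w=[0;1;1];
n=cross(v,w)  %normal vector (1,-1,1)
d=dot(n,r0)   %right-hand side: the plane is x-y+z=1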
Example 4.2.6.
(a) Find the normal equation of the plane through the origin and the vectors v = (1, 2, 1) and w = (1, 0, −1).
(b) Find the normal equation of the plane containing the vectors (1, 1, 1) and (1, 1, −1), and through the point (1, 2, 1).
(c) Find the normal equation of the plane spanned by the vectors (1, 1, 1) and (1, 1, −1).
(d) Find the equation of the line of intersection of the two planes 2x + 3y − z = 0 and x − y = 0.
(e) Find the equation of the plane through the three points (1, 1, 1), (1, 0, −1) and (1, 2, 3).
(f) Write the equation of the plane z = 1 in vector form.
(g) Write the equation of the plane through the origin, with normal k, in vector form.
Example 4.2.7.
Matlab plots the vector form of lines and planes if strings specifying the coordinates are given.
The following script illustrates this feature and its cross-product function: Two vectors and their span (a plane)
are illustrated. A normal to this plane is obtained with the cross function. Rotate the graph to see the normal to
the plane.
%Matlab-session: plot plane from vector form
view([-30,-60]); hold on;        %set viewpoint
ezplot3('1*s','1*s','s',[0,5]);  %parametric form of line thru (1,1,1) and (0,0,0)
axis image;                      %don't scale axes differently
ezplot3('0','t','-3*t',[0,5]);   %parametric form of line thru (0,1,-3) and (0,0,0)
ezmesh('1*s','1*s+t','s-3*t');   %parametric form of plane through these two lines
%(use ezplot3 to plot lines in 3d, and ezmesh to plot planes in 3d)
u=5*[0 1 -3]';
v=5*[1 1 1]';
n=cross(u,v)/25;                 %normal vector to u and v
title_string=sprintf(...         %continuation on next line
'normal line: %g x+ %g y+ %g z=0',n(1),n(2),n(3));
text(u(1),u(2),u(3),'u');        %label u
text(v(1),v(2),v(3),'v');        %label v
text(n(1),n(2),n(3),'n');        %label n
title(title_string);             %apply title to graph
plot3([n(1),0,-n(1)],[n(2),0,-n(2)],[n(3),0,-n(3)]); %plot normal vector too
hold off;
[Plot titled "normal line: 4x + −3y + −1z = 0": the plane spanned by u and v, with its normal n.]
4.3 Systems of Linear Equations and Matrices
4.3.1 Systems of Linear Equations
A linear equation in n variables x1 , x2 , · · · , xn (for any positive integer n) has the structure
c1 x1 + c2 x2 + · · · + cn xn = b.
(4.83)
The coefficients c1 , · · · cn and right-hand-side term b are given (known) real numbers.
An expression is linear when all variables $x_i$ appear as linear (first power) terms $x_i^1$, $i = 1, 2, \ldots, n$.
For example 2x1 − 3x2 − 4x3 = 9 is a linear equation in three variables.
A linear expression contains
• no quadratic terms $x_i^2$ or $x_i x_j$,
• no cubic terms $x_i^3$, $x_i x_j^2$,
• no other non-linear terms, e.g. $\sqrt{x_i}$, $\sin x_i$ etc.
Example 4.3.1.
The quadratic equation x2 + 2x + 1 = 0 is not linear.
Lines in the plane are graphs of linear equations
The simplest linear equation represents a line in the (x, y)-plane:
ax + by = d.
This may be more familiar to you in the form
y = mx + c.
(4.84)
For every real number x, a corresponding y = mx + c is determined. The graph of the line is the set of all points
(x, y) satisfying (4.84).
Planes in 3-d space are the graphs of linear equations
The next-simplest linear equation involves three variables,
ax + by + cz = d
(4.85)
and represents a plane in (x, y, z)-space.
The simplest planes are the co-ordinate planes of R3 :
• the xy-plane: all points (x, y, z) with z = 0
• the yz-plane: all points (x, y, z) with x = 0
• the xz-plane: all points (x, y, z) with y = 0
[Figure 4.17: The co-ordinate planes of 3-d space: the planes x = 0, y = 0 and z = 0.]
Example 4.3.2.
Draw the graph of the plane
2x − y = 0 in 3-d space.
Hint:
• it contains the line y = 2x, and
• z is not specified, so it is “free”.
Definition 4.3.3.
A solution of the linear equation
a1 x1 + a2 x2 + · · · + an xn = b.
is a set of real numbers s1 , s2 , · · · , sn which when substituted for the variables x1 , x2 , · · · , xn satisfy the equation.
Example 4.3.4.
The linear equation
2x1 + 3x2 = 1
has one solution s1 = −4, s2 = 3, because
2(−4) + 3(3) = 1.
We say “(x1 , x2 ) = (−4, 3) is a solution.”
Graph the set of all solutions of 2x1 + 3x2 = 1 in the x1 x2 -plane. How many solutions are there to this equation?
Definition 4.3.5.
A system of linear equations is a set of linear equations. A system of m linear equations in n unknowns has the
form
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \tag{4.86}$$
Every coefficient aij is a real number.
Example 4.3.6.
The set of equations
x1 + 3x2 = 1
x1 + x2 = 1
(4.87)
is a system of two linear equations in two unknowns.
Definition 4.3.7.
A solution of the system of linear equations (4.86) is a set of real-number values for the variables x1 , x2 , · · · , xn
which satisfies all equations simultaneously.
Example 4.3.8.
The system of linear equations (4.87) has one solution x1 = 1, x2 = 0, as
1 + 3(0) = 1 and 1 + 0 = 1.  (4.88)
The point (x1 , x2 ) = (1, 0) is the intersection of the lines given at (4.87).
Example 4.3.9.
Using Figure 4.17, locate the solution of the system of equations
x = 0,
y=0
in the variables x, y and z (3-d space), by observing the intersection of the given planes.
Example 4.3.10.
The system of equations
x = 0,
x=1
has no solution.
Definition 4.3.11.
If a system has no solution, we say it is inconsistent.
If a system is not inconsistent, it is consistent.
Notes:
• To solve a system of equations in n unknowns, we need a consistent system of n independent equations.
• Any fewer independent equations and we have undetermined, or free variables.
• Any more independent equations and we have inconsistency.
[Figure 4.18: Pairs of lines in the (x1 , x2 )-plane: (a) No solution, (b) One solution, and (c) Infinitely many solutions.]
Example 4.3.12.
From Figure 4.18:
(a) The system of equations
2x1 − x2 = 1
2x1 − x2 = 2
is inconsistent.
(b) The system of equations
2x1 − x2 = 1
x1 + 2x2 = 2
has exactly one solution.
(c) The system of equations
2x1 − x2 = 1
4x1 − 2x2 = 2
has infinitely many solutions.
A system of linear equations falls into one of three categories: it has either
• no solution
• one solution
• infinitely many solutions
Definition 4.3.13.
A system of linear equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$
with every right-hand side zero is called homogeneous.
Notes:
1. A homogeneous linear system has the solution xi = 0, for all i. This is called the trivial solution of the
system.
2. The trivial solution is the obvious solution of a homogeneous system. It always exists.
3. When solving homogeneous systems, we will focus on finding non-trivial solutions.
In Matlab, an explicit system of equations can be solved with the "solve" command:
%Matlab script:
%Solve a system of equations when given explicitly.
eqn1='2*x + 3*y - z = 0'
eqn2=' - y + z = 1'
eqn3='x - 2*y - z = -1'
[x,y,z]=solve(eqn1, eqn2, eqn3)  %solve 3 eqns simultaneously
[x,y]=solve(eqn1,eqn2)           %solution has one free variable - by default the last (z)
[x,z]=solve(eqn1,eqn2,'x,z')     %make y the free variable
It's easy to solve two equations in two unknowns with simple algebra, but for more equations in more variables, we tend to abbreviate the process to systematise our work.
Augmented Matrices
Definition 4.3.14.
A set of equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \tag{4.89}$$
can be abbreviated by an array of numbers, i.e. a matrix:
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{pmatrix} \tag{4.90}$$
(4.90) is called the augmented matrix of coefficients for the system of linear equations (4.89). It is often written [A|b].
In this review, we will now focus on solving systems of linear equations by working with the augmented matrix of the system.
Example 4.3.15.
Write the augmented matrix for the system
$$\begin{aligned} 2x + 3y - z &= 0 \\ -y + z &= 1 \\ x - 2y - z &= -1 \end{aligned}$$
In Matlab, this is done by the command:
%Form augmented matrix from system of equations Ax=b
A=[2 3 -1; 0 -1 1; 1 -2 -1];
b=[0 1 -1]';  % or b=[0;1;-1];
Ab=[A,b]      %form augmented matrix
sol=rref(Ab)  %solves system - reduced echelon form
sol_2=A\b     %same, as long as a unique solution exists
Elementary Row-Operations
The algebra needed to solve a system of linear equations can be viewed as a sequence of operations on the rows
of the augmented matrix of coefficients of the system.
We classify three types of such “elementary row-operations”:
(i) row-exchange: Interchange any two rows (ri ↔ rj ).
(ii) row-multiple: Multiply a row by a non-zero constant (ri → kri ).
(iii) row-addition: Replace a row by itself plus any multiple of another row (ri → ri − krj ).
Example 4.3.16.
In sequence, apply the row operations
(i) r1 ↔ r2
(ii) r3 → r3 − r1
to the augmented matrix
$$\left(\begin{array}{ccc|c} 2 & 3 & -1 & 0 \\ 1 & -1 & 1 & 1 \\ 1 & -2 & -1 & -1 \end{array}\right)$$
In Matlab:
%row-reductions by hand - very useful
%using matrix A = [2 3 -1 0; 1 -1 1 1; 1 -2 -1 -1]
echo on;
A=[2 3 -1 0; 1 -1 1 1; 1 -2 -1 -1];
E=A;                     %backup A - in case of errors
E=[E(2,:);E(1,:);E(3,:)] %r_2 <-> r_1
E(3,:)=E(3,:)-E(1,:)     %r_3 -> r_3-r_1
echo off;
Definition 4.3.17.
Two matrices A and B related by a series of row-operations are written “A∼B”.
We say A and B are row-equivalent.
Echelon and Reduced Echelon Form
Definition 4.3.18.
(a) A matrix A (not necessarily square) is in echelon form if
• all rows of zeros are at the bottom
• the first non-zero entry in every row (called a “pivot” or “leading entry”) is to the right of the first
non-zero entry in the previous row (step-like pattern of leading entries)
• the leading entry in every row has zeros below it
(b) A matrix A is in reduced echelon form if
• it is in echelon form
• the leading entry in every row (the pivot) is 1
• each leading 1 is the only non-zero entry in its column
Notes:
1. Echelon form for a matrix is not unique - there are many possibilities.
However, in echelon form the number of pivots is unique. This is the number of non-zero rows in echelon
form.
2. Reduced echelon form for any matrix is unique (there is only one).
In the following examples, the pivots are shown in bold.
Example 4.3.19.
Neither echelon form nor reduced-echelon form need have non-zero entries on their diagonal.
(a) $\begin{pmatrix} \mathbf{1} & 0 \\ 0 & 0 \end{pmatrix}$ - reduced echelon
(b) $\begin{pmatrix} \mathbf{2} & -1 & 1 \\ 0 & 0 & \mathbf{3} \\ 0 & 0 & 0 \end{pmatrix}$ - echelon
Example 4.3.20.
Reduced echelon form need not have all non-diagonal entries zero (again, the pivots are shown in bold).
(a) $\begin{pmatrix} \mathbf{1} & 1 \\ 0 & 0 \end{pmatrix}$
(b) $\begin{pmatrix} \mathbf{1} & 0 & 0 \\ 0 & \mathbf{1} & 1 \\ 0 & 0 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} \mathbf{1} & 0 & 0 & 2 \\ 0 & \mathbf{1} & 0 & -3 \\ 0 & 0 & \mathbf{1} & 2 \end{pmatrix}$
Definition 4.3.21.
The rank of a matrix A, denoted rank(A), is the number of non-zero rows in any echelon form of A.
Note: The rank of a matrix is the number of pivots in echelon form.
Gaussian Elimination
Definition 4.3.22.
Gaussian elimination is the procedure of using elementary row-operations to reduce a matrix to echelon form.
Reducing a matrix to reduced echelon form with elementary row-operations is sometimes known as Gauss-Jordan elimination.
FACT The solution of a system of linear equations is not changed by performing elementary row-operations on
the augmented matrix of the system.
Example 4.3.23.
In Matlab:
%Matlab-session
% row-reduction to reduced echelon form
%using matrix A = [2 -3 1 0 4; 1 1 2 2 0; 3 0 -1 4 5; 1 6 5 6 -4]
echo on
A=[2 -3 1 0 4; 1 1 2 2 0; 3 0 -1 4 5; 1 6 5 6 -4]
C=rref(A)             % reduced echelon form
pause                 % wait till user presses enter
format rat; C         % show C again, this time in fractions
pause
[C,pivotcols]=rref(A) %another way to call rref:
                      % ask rref for two answers (on lhs)
                      % the second will be the pivot columns
rank(A); format; echo off  % back to defaults
% back to defaults
We now recall the procedure of solving a system of linear equations using row-reduction and back-substitution
for the 3 × 3 case, with an example.
Example 4.3.24.
The system of linear equations
$$\begin{aligned} 2x_1 + x_2 + x_3 &= 1 \\ 4x_1 + x_2 &= -2 \\ -2x_1 + 2x_2 + x_3 &= 7 \end{aligned} \tag{4.91}$$
has augmented matrix
$$\left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 4 & 1 & 0 & -2 \\ -2 & 2 & 1 & 7 \end{array}\right)$$
We row-reduce it to echelon form. At each step,
• the pivot (leading non-zero entry) for the current row is shown in bold,
• the non-zero entries below the pivot are eliminated with row operations.
$$\left(\begin{array}{ccc|c} \mathbf{2} & 1 & 1 & 1 \\ 4 & 1 & 0 & -2 \\ -2 & 2 & 1 & 7 \end{array}\right) \underset{r_2 \to r_2 - 2r_1}{\sim} \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 0 & -1 & -2 & -4 \\ -2 & 2 & 1 & 7 \end{array}\right) \underset{r_3 \to r_3 + r_1}{\sim} \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 0 & \mathbf{-1} & -2 & -4 \\ 0 & 3 & 2 & 8 \end{array}\right) \underset{r_3 \to r_3 + 3r_2}{\sim} \left(\begin{array}{ccc|c} 2 & 1 & 1 & 1 \\ 0 & -1 & -2 & -4 \\ 0 & 0 & \mathbf{-4} & -4 \end{array}\right)$$
The last matrix is in echelon form.
The solution can be "read off" in reverse order, from x3 to x1 , using backwards (or back) substitution:
row 3: −4x3 = −4 ⇒ x3 = 1
row 2:
row 1:
Using Matlab,
%Matlab-session
% Solve linear system with unique solution
% 1) from matrix A and rhs b
%using matrix A = [2 1 1; 4 1 0; -2 2 1]
echo on                 %echo commands as executed
A=[2 1 1; 4 1 0; -2 2 1];
b=[1 -2 7]';
A\b                     %find solution if it exists
pause
% 2) from rref of augmented matrix
Ab=[A,b]                %form augmented matrix
rref(Ab)                %read off solutions from rref
pause
% 3) from explicit equations, symbolically
[x1,x2,x3]=solve('2*x1+x2+x3=1, 4*x1+x2+0*x3=-2, -2*x1+2*x2+x3=7')
echo off
Solution Technique 4.3.25. To solve a linear system of equations Ax = b using row-reduction:
• form the augmented matrix [A|b], and
• either reduce the matrix to reduced echelon form and read off the solutions, or
(i) use Gaussian elimination to row-reduce it to echelon form
(ii) solve with back-substitution
Nature of the solution of a linear system
Note: In solving a system of linear equations with augmented matrix [A|b] for variables x1 , x2 , . . . , xn , the
augmented matrix [A|b] is reduced to echelon form.
For this echelon form:
(a) If the leading non-zero entry (pivot) in any row is in the last column, e.g.
$$[A|b] \sim \left(\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 0 & \mathbf{1} \end{array}\right),$$
the system is inconsistent: there is no solution.
(b) Otherwise, the system is consistent. In this case,
(i) if any column i on the left-hand side has no pivot, the corresponding xi is a free variable - it can take any real value, e.g.
$$[A|b] \sim \left(\begin{array}{cccc|c} \mathbf{1} & 2 & 1 & 2 & 3 \\ 0 & \mathbf{1} & 1 & 0 & 1 \\ 0 & 0 & 0 & \mathbf{1} & 4 \end{array}\right) \quad (x_3 \text{ is free}).$$
Another example of this is when there is a row of zeros at the bottom of echelon form:
$$[A|b] \sim \left(\begin{array}{cc|c} \mathbf{1} & 0 & 2 \\ 0 & 0 & 0 \end{array}\right) \quad (x_2 \text{ is free}).$$
(ii) if column i on the left-hand side has a pivot, then xi is a bound variable - it is determined. The number of bound variables is rank(A).
(iii) if every column of the left-hand side contains a pivot, there is a unique solution, e.g.
$$[A|b] \sim \left(\begin{array}{ccc|c} \mathbf{1} & 2 & 1 & 3 \\ 0 & \mathbf{1} & 1 & 3 \\ 0 & 0 & \mathbf{1} & 4 \end{array}\right).$$
Note: Several linear systems can be solved simultaneously by augmenting the coefficient matrix with any number of right-hand-side vectors. For example, we write the augmented matrix for a set of systems
Ax = b1 ,  Ax = b2 ,  Ax = b3
as
$$\left[\,A \mid \mathbf{b}_1\ \mathbf{b}_2\ \mathbf{b}_3\,\right]. \tag{4.92}$$
The reduced echelon form of this more general augmented matrix then yields the solution of all systems at once.
Example 4.3.26.
$$\left(\begin{array}{cccc|c} \mathbf{1} & 4 & -1 & 0 & 2 \\ 0 & \mathbf{1} & -1 & 2 & 2 \\ 0 & 0 & 0 & \mathbf{1} & 1 \end{array}\right)$$
is the echelon form of the augmented matrix of a linear system of equations.
The pivots are in columns 1, 2 and 4 - so x1 , x2 and x4 are bound, and x3 is free. Suppose x3 = s ∈ R. Using back-substitution,
row 3: x4 = 1
row 2: x2 = s − 2 + 2 = s
row 1: x1 = −4s + s + 2 = −3s + 2.
The solution is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} -3s+2 \\ s \\ s \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ 1 \\ 1 \\ 0 \end{pmatrix}s + \begin{pmatrix} 2 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
This is called the general solution of the system.
Example 4.3.27.
Find the general solution of the pair of systems whose augmented matrix in echelon form is
$$\left(\begin{array}{cccc|cc} 2 & 1 & 1 & 0 & 1 & 3 \\ 0 & 0 & 1 & -1 & 0 & -2 \\ 0 & 0 & 0 & -2 & 6 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right).$$
In Matlab, you can provide names for the bound variables and get a general solution. The command rref returns
the pivot-columns (corresponding to bound variables). The free variables are the variables which are not bound.
%Matlab-session
% Solve a linear system with infinitely many solutions
% 1) using row-reduction
A=[1 1 1 1; 0 0 1 -1; 0 0 0 -2; 0 0 0 0];
b=[3 0 6 0]'
[I,pivotcols]=rref([A,b])
pause
% 2) using solve
[x1,x3,x4]=solve('x1+x2+x3+x4-3, x3-x4, -2*x4-6','x1,x3,x4')
pause
% 3) using solve, with user-input of free variable value:
x_2=input('enter value of free variable x_2: ');
[x1,x2,x3,x4]=solve(...
'x1+x2+x3+x4=3, x3-x4, -2*x4-6, x2-x_2');
v=subs([x1,x2,x3,x4]);
disp(v);
disp('now see how A\b works when there is not a unique solution')
disp('press enter to continue');
A\b
If the system has at most three variables, Matlab can graph the individual equations. Their common intersection
is the solution. Rotate the graph to see the solution.
%Matlab-session
% Solve a linear system with infinitely many solutions
% (a line) - we plot the solution too
view(37,65);
A=[1 1 1; 1 1 -1];
b=[3 1]'
[I,pivotcols]=rref([A,b])
disp('infinitely many solutions from rref.')
disp('now display symbolic solution');
pause
syms x1 x2 x3;              %specify bound variables as symbolic
[x1,x2,x3]=solve('x1+x2+x3-3, x1+x2-x3-1',x1,x2,x3)
ezplot3(x1,x2,x3,[-3 3]);   %plot solution
hold on;                    %superimpose graphs
u=-3:0.1:3;
[x,y]=meshgrid(u,u);
mesh(x,y,3-x-y)             %plot first equation
mesh(x,y,-1+x+y); hold off; %plot second equation
[Plot titled "x = −x2+2, y = x2, z = 1": the solution line, together with the two planes.]
4.3.2 Matrix notation and concepts
Introduction
• A matrix is a rectangular array of numbers ordered in rows and columns.
• A matrix with m rows and n columns has size m × n.
For example, the matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 & 0 \\ 4 & 1 & 0 & -3 \\ -2 & 2 & 1 & -1 \end{pmatrix}$$
has size 3 × 4.
• The number in the ith row and jth column of a matrix A is denoted aij , and called the (i, j) entry or (i, j) component or term of A.
• An m × n matrix A is generally written
A = (aij )m×n
Note: All matrices we study will be real matrices: their entries numbers from the real number-line R.
• Special sizes:
– An m × n matrix with m = n is called square.
– A 1 × 1 matrix [a11 ] is written a11 and called a scalar. It is an element of R - a real number.
– A matrix with one column is called a column vector. An m × 1 column vector is an element of Rm .
– A matrix with one row is called a row vector. A 1 × n row vector is an element of Rn .
• Two matrices are equal if they are the same size, and all corresponding entries are equal, i.e.:
A = B ⇔ aij = bij for all relevant i and j.
Special Matrix Structure
Diagonal and Triangular Matrices
Definition 4.3.28.
The main diagonal of an m × n matrix A consists of the terms a11 , a22 , . . . , akk , where k = min(m, n).
Example 4.3.29.


Circle the entries on the main diagonal of the matrix
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix}.$$
Definition 4.3.30.
A square matrix is
• diagonal if all non-zero entries are on the main diagonal: $\begin{pmatrix} * & 0 & 0 \\ 0 & * & 0 \\ 0 & 0 & * \end{pmatrix}$;
• upper-triangular if all non-zero entries are on or above the main diagonal: $\begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{pmatrix}$;
• lower-triangular if all non-zero entries are on or below the main diagonal: $\begin{pmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{pmatrix}$;
• triangular if it is either upper-triangular or lower-triangular.
Note: A square matrix in echelon form is upper-triangular.
Example 4.3.31.
Classify the following matrices as upper-triangular, lower-triangular or diagonal:
(a) $\begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 0 & 0 \\ 4 & 5 & 0 \\ 6 & 0 & 0 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 4 & 5 \\ 0 & 0 & 3 \end{pmatrix}$
(d) $\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
(e) $\begin{pmatrix} 0 & 3 \\ 1 & 0 \end{pmatrix}$
(f) $\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
Matrix Operations
Matrix Addition/Subtraction
We define addition and subtraction on matrices of the same size, component-wise.
If A = (aij )m×n and B = (bij )m×n ,
A + B = (aij + bij )m×n .
A − B = (aij − bij )m×n .
The order in which matrices are added is irrelevant:
A + B = B + A,
and
A + (B + C) = (A + B) + C.
Scalar Multiplication
The scalar multiple of matrix A = (aij )m×n and real scalar r has
• the same size as A,
• rA = (raij )m×n ,
i.e. each term of rA is just r multiplied by the corresponding term of A.
Scalar multiplication satisfies the distributive property
r(A + B) = rA + rB.
Matrix Multiplication
If matrix A has as many columns as matrix B has rows, the matrix product AB is defined:
(i, j)-entry: (ab)ij = (ith row of A) · (jth column of B)  (a dot product)  (4.93)
compatibility/size: (m × n) × (n × p) = (m × p).
Example 4.3.32.
If
$$A = \begin{pmatrix} 1 & 0 & -1 & 1 \\ 0 & 1 & -1 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -1 & 0 & 2 \\ -1 & 1 & 1 \\ 0 & 1 & -1 \\ 1 & -1 & 0 \end{pmatrix},$$
then
(ab)13 = (1st row of A) · (3rd column of B) = (1, 0, −1, 1) · (2, 1, −1, 0) = (1)(2) + (0)(1) + (−1)(−1) + (1)(0) = 3.
In full,
$$AB = \begin{pmatrix} 0 & -2 & 3 \\ -1 & 0 & 2 \end{pmatrix}.$$
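The product in Example 4.3.32 can be confirmed in Matlab, where * is matrix multiplication:

%Matlab-session
% the product AB of Example 4.3.32
A=[1 0 -1 1; 0 1 -1 0];
B=[-1 0 2; -1 1 1; 0 1 -1; 1 -1 0];
A*B  %expect [0 -2 3; -1 0 2]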
Identity Matrix
In real arithmetic, the number 1 is the multiplicative identity - multiply any number n by 1 and you get that number back: n · 1 = 1 · n = n for n ∈ R. The identity matrices In perform the same role in matrix multiplication - for an m × n matrix A,
$$A I_n = I_m A = A. \tag{4.94}$$
Here, the identity In is a square n × n matrix:
$$I_n = (i_{ij})_{n \times n}, \qquad i_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}, \qquad I_n = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \tag{4.95}$$
Example 4.3.33.
Show (4.94) for $A = \begin{pmatrix} 1 & 2 & 3 \\ -1 & 0 & 1 \end{pmatrix}$.
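Matlab's eye(n) builds In , so (4.94) can be checked directly for the matrix of Example 4.3.33:

%Matlab-session
% identity matrices: A*I_3 = I_2*A = A for a 2x3 matrix A
A=[1 2 3; -1 0 1];
A*eye(3)  %equals A
eye(2)*A  %equals A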
Note: The order in which matrices are multiplied is important. It may not be that AB is the same as BA.
In general, for matrices A (m × n) and B (p × q),
• multiplication may not be defined in both orders,
• multiplication may be compatible in both orders, but AB and BA may not be the same size.
• multiplication may be defined in both orders, but AB ≠ BA.
Example 4.3.34.
For the following pairs of matrices A and B, are the matrix products AB and BA both defined? If so, is AB = BA?
(a) $A = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
(b) $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 1 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & -1 & 2 \end{pmatrix}$.
(c) $A = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{pmatrix}$.
Matrix multiplication satisfies the properties
A(BC) = (AB)C
A(B + C) = AB + AC
(A + B)C = AC + BC
Example4.3.35.



1
0 0
−1 0 0
For A = −1 −1 0 and B = −1 1 0, find AB and BA.
0 −1 1
1 0 1
The product of two lower-triangular matrices is again lower-triangular.
The product of two upper-triangular matrices is again upper-triangular.
Matrix-Vector Multiplication
The multiplication of an n × m matrix A with an m × 1 vector x can be expressed in two ways (illustrated here for a 3 × 3 matrix):
$$A\mathbf{x} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$
(1) matrix form
$$A\mathbf{x} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 \end{pmatrix}, \tag{4.96}$$
(2) vector form
$$A\mathbf{x} = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} x_1 + \begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} x_2 + \begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix} x_3 . \tag{4.97}$$
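The two forms give the same result, which is easy to confirm numerically; a small sketch with an arbitrary 3 × 3 example:

%Matlab-session
% matrix form vs vector (column-combination) form of A*x
A=[1 2 0; 3 4 1; 0 1 2]; x=[5;6;7];
A*x                                  %matrix form
A(:,1)*x(1)+A(:,2)*x(2)+A(:,3)*x(3)  %vector form: the same answer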
Matrix Transpose
Definition 4.3.36.
By "flipping" an m × n matrix A along its main diagonal, we obtain its transpose, the n × m matrix AT . Notationally,
$$A^T = (a^T_{ij}) = (a_{ji}).$$
Example 4.3.37.
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 1 & 0 \\ -2 & 2 & 1 \end{pmatrix} \Rightarrow A^T = \begin{pmatrix} 2 & 4 & -2 \\ 1 & 1 & 2 \\ 1 & 0 & 1 \end{pmatrix}.$$
Definition 4.3.38.
A matrix A equal to its own transpose
A = AT
is called symmetric.
Example 4.3.39.
Which of the following matrices are symmetric?
(a) $A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$
(b) $B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 3 & -2 & 1 \end{pmatrix}$
(c) $C = \begin{pmatrix} -2 & 1 & 0 \\ -1 & 2 & 1 \\ 0 & -1 & 2 \end{pmatrix}$
(d) $D = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 7 \\ 3 & 7 & -1 \end{pmatrix}$
Determinant of a matrix
Associated with a square n × n matrix A is a scalar called the determinant of A, written either det(A) or |A|.
In the scalar case, det(a) = a. In the 2 × 2 case,
$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
Cofactors, covered in 108, give this formula directly, and a formula for the determinant of a 3 × 3 matrix that you may have seen before:
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \tag{4.98}$$
Of course, using cofactors, you can evaluate the determinant down any column or across any row to make for
less work. And with Matlab,
%Matlab-session
%
echo on
a=[1 2 3;2 3 4;1 1 -1]
det(a)
syms x
b=[exp(x) exp(-x); exp(x) -exp(-x)]
det(b)
simplify(ans)
echo off;
Example 4.3.40.
(a) det(In ) = 1 for any n > 0.
(b) Find $\begin{vmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{vmatrix}$.
(c) Find $\begin{vmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{vmatrix}$.
FACT
The determinant of a triangular matrix is
the product of its diagonal terms.
Note: This is not the case for matrices in general.
Inverse of a matrix
Definition 4.3.41. (i) If an n × n matrix A has an inverse, we say it is invertible, or non-singular.
(ii) Its inverse is another n × n matrix A−1 (read “A inverse”), with the property that
AA−1 = A−1 A = In .
(iii) A square matrix without an inverse is called singular.
When a square matrix has an inverse, its reduced echelon form is the identity; performing the same row-reduction simultaneously on the identity produces A−1 .
To invert a matrix using Gaussian elimination
• form the augmented matrix [A : In ]
• use Gaussian elimination simultaneously on both parts of the augmented matrix to reduce A (on the left) to
In
• the resulting expression on the right is A−1 - the augmented matrix is now [In : A−1 ].
FACT an n × n matrix A has an inverse exactly when its reduced echelon form is In , the identity matrix,
A ∼ In . So a square matrix A is singular when its reduced echelon form is not the identity matrix.
Notes
1. The inverse of a square lower-triangular matrix is again lower-triangular.
2. Similarly, the inverse of a square upper-triangular matrix is again upper-triangular.
Example 4.3.42.
Find the inverse of each triangular matrix:
(a) $\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
(b) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}$
(c) $\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & -1 \end{pmatrix}$
Matlab uses two equivalent notations for finding the inverse of a matrix directly. We can row-reduce the augmented matrix in Matlab too.
%Matlab-session
% Invert a matrix
echo on; format rat
A=[2 1 -1; 1 -1 1; 1 -2 1]
inv(A)         % directly
A^(-1)         %equivalent expression
pause
AI=[A,eye(3)]  % using Gaussian elimination
rref(AI)
echo off; format
Example 4.3.43.
When matrices A and B have inverses, their product AB has inverse B −1 A−1 . Using matrix-multiplication,
show this.
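For reference, the check amounts to one line of matrix algebra:
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A I_n A^{-1} = A A^{-1} = I_n ,$$
and similarly (B−1 A−1 )(AB) = In , so B−1 A−1 is the inverse of AB.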
Matrix inverses and Ax = b
When a square matrix A is invertible (non-singular), the solution of the linear system
Ax = b
is given by pre-multiplying both sides of this equation by A−1 :
A−1 Ax = A−1 b
(4.99)
∴ x = A−1 b
In other words, if A is a non-singular matrix, every system of linear equations Ax = b has a unique solution.
Note: This yields the important result:
If A−1 exists, then Ax = 0 ⇒ x = 0  (4.100)
- i.e. there is no solution other than the trivial one.
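A small Matlab sketch comparing the two approaches on an arbitrary invertible system (both lines should return the same x):

%Matlab-session
% solving Ax=b via the inverse, and via row-reduction (backslash)
A=[2 1; 1 3]; b=[3;5];
x1=inv(A)*b  %procedure (4.99): x = A^(-1)*b
x2=A\b       %preferred: Gaussian elimination, less work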
Example 4.3.44.
Use your answer to Example 4.3.42 to solve the linear system
$$\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & -1 \end{pmatrix}\mathbf{x} = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$$
In practice, unless we need to find the inverse of a matrix for some other reason, the procedure (4.99) is seldom used to solve larger systems of equations. It is much less work (about one-third) to use row-reduction. And with less computation involved, fewer computational errors are incurred.
Summary 4.3.45.
The following are equivalent conditions:
• A−1 exists – i.e. A is non-singular, or invertible
• det(A) ≠ 0
• A ∼ In
• every row and column of echelon form has a pivot
• rank(A) = n
• the system of linear equations Ax = b has a unique solution for every b
• the homogeneous system of linear equations Ax = 0 has only the trivial solution x = 0
Consequently, when A does not have an inverse:
Summary 4.3.46.
The following are equivalent conditions:
• A−1 does not exist – A is a singular matrix
• det(A) = 0
• A ≁ In
• not every row and column of echelon form has a pivot
• rank(A) < n
• the homogeneous system of linear equations Ax = 0 has a non-trivial solution x ≠ 0