
Lecture Notes, Part 2 (21-22)

Contents

Preface

21 Vector spaces associated with matrices, 1 of 2
   21.1 Null space, column space and row space of a matrix
   21.2 Bases for N(A), CS(A) and RS(A)
   21.3 Exercises for self study
   21.4 Relevant sections from the textbooks

22 Vector spaces associated with matrices, 2 of 2
   22.1 Orthogonality between N(A) and RS(A)
   22.2 The rank-nullity theorem
   22.3 Cartesian descriptions for N(A), CS(A) and RS(A)
   22.4 Linear independence and span of a set X revisited
   22.5 Consistency of a linear system revisited
   22.6 Exercises for self study
   22.7 Relevant sections from the textbooks

23 Linear transformations, 1 of 6
   23.1 The main definitions
   23.2 Identity, compositions, linear combinations, inverse
   23.3 Range and kernel
   23.4 Rank-nullity theorem for linear transformations
   23.5 Exercises for self study
   23.6 Relevant sections from the textbooks

24 Linear transformations, 2 of 6
   24.1 Matrix representation of a linear T : Rn → Rm
   24.2 Reflections, rotations and stretches in R2
   24.3 The matrix A_T^{B→C}
   24.4 Exercises for self study
   24.5 Relevant sections from the textbooks

25 Linear transformations, 3 of 6
   25.1 Change of basis and transition matrix
   25.2 The transition matrix P_{B→B′}
   25.3 Exercises for self study
   25.4 Relevant sections from the textbooks

26 Linear transformations, 4 of 6
   26.1 Change of basis and linear transformations
   26.2 Similarity
   26.3 Diagonalisable linear transformations
   26.4 Eigenvalues, eigenvectors and eigenspaces
   26.5 Exercises for self study
   26.6 Relevant sections from the textbooks

27 Linear transformations, 5 of 6
   27.1 Diagonalisation
   27.2 Algebraic and geometric multiplicity
   27.3 Exercises for self study
   27.4 Relevant sections from the textbooks

28 Linear transformations, 6 of 6
   28.1 Orthogonal matrices
   28.2 Orthogonal diagonalisation
   28.3 Symmetric matrices and quadratic forms
   28.4 Exercises for self study
   28.5 Relevant sections from the textbooks

29 Multivariate calculus, 1 of 5
   29.1 Functions of Two Variables
   29.2 Partial derivatives
   29.3 Geometrical interpretation of the partial derivatives
   29.4 Tangent planes
   29.5 Exercises for self study
   29.6 Relevant sections from the textbooks

30 Multivariate calculus, 2 of 5
   30.1 The gradient
   30.2 The derivative
   30.3 Directional derivatives
   30.4 The rate of change of a function f : R2 → R
   30.5 Exercises for self study
   30.6 Relevant sections from the textbooks

31 Multivariate calculus, 3 of 5
   31.1 Functions of n variables
   31.2 Tangent hyperplanes
   31.3 Stationary points
   31.4 Contours, gradient and directional derivatives
   31.5 Vector-valued functions
   31.6 The general chain rule
   31.7 Adapting the chain rule
   31.8 Exercises for self study
   31.9 Relevant sections from the textbooks

32 Multivariate calculus, 4 of 5
   32.1 The second derivative of a function
   32.2 Taylor polynomial for a scalar-valued function
   32.3 Classification of stationary points based on the Taylor polynomial P2
   32.4 Classifying f′′ using the principal minors
   32.5 Convex sets, convex and concave functions f : Rn → R
   32.6 Convexity and concavity for twice differentiable functions
   32.7 Exercises for self study
   32.8 Relevant sections from the textbooks

33 Multivariate calculus, 5 of 5
   33.1 Motivating Lagrange’s method
   33.2 Lagrange’s method with an equality constraint
   33.3 Regarding the form of the Lagrangian
   33.4 Regarding the applicability of Lagrange’s method
   33.5 The Lagrange multiplier
   33.6 Exercises for self study
   33.7 Relevant sections from the textbooks

34 Differential and difference equations, 1 of 5
   34.1 Interest compounding
   34.2 Nominal and effective interest
   34.3 Discounting and present value
   34.4 Arithmetic sequences and their partial sums
   34.5 Geometric sequences and their partial sums
   34.6 Exercises for self study
   34.7 Relevant sections from the textbooks

35 Differential and difference equations, 2 of 5
   35.1 Complex numbers
   35.2 Euler’s formula and polar exponential form
   35.3 Operations on C
   35.4 Roots of polynomials
   35.5 Exercises for self study
   35.6 Relevant sections from the textbooks

36 Differential and difference equations, 3 of 5
   36.1 Difference equations
   36.2 Difference equations of the form P(E)yx = 0
   36.3 Difference equations of the form P(E)yx = Q(x)
   36.4 Exercises for self study
   36.5 Relevant sections from the textbooks

37 Differential and difference equations, 4 of 5
   37.1 Linear ODEs with constant coefficients
   37.2 Solving ODEs of the form P(D)y = 0
   37.3 Solving ODEs of the form P(D)y = Q(x)
   37.4 Exercises for self study
   37.5 Relevant sections from the textbooks

38 Differential and difference equations, 5 of 5
   38.1 Ordinary and partial differential equations
   38.2 Separable ODEs
   38.3 Introduction to partial differential equations
   38.4 Exact ODEs
   38.5 Linear ODEs
   38.6 Homogeneous ODEs
   38.7 Solving Homogeneous ODEs
   38.8 Solving ODEs by changing the dependent variable
   38.9 Exercises for self study
   38.10 Relevant sections from the textbooks

39 Systems of difference and differential equations
   39.1 Linear homogeneous systems of difference equations
   39.2 Linear homogeneous systems of differential equations
   39.3 Exercises for self study
   39.4 Relevant sections from the textbooks
Preface
These lecture notes are intended as a self-contained study resource for the MA100 Mathematical Methods course at the LSE. At the same time, they are designed to complement
the MA100 course texts, Linear Algebra, Concepts and Methods by Martin Anthony and
Michele Harvey, and Calculus, Concepts and Methods by Ken Binmore and Joan Davies.
I am grateful to Martin Anthony and Michele Harvey for allowing me to use some materials from their Linear Algebra, Concepts and Methods textbook and to Michele Harvey for
carefully reading and commenting on an earlier draft of the Calculus part of these lecture
notes. I am also grateful to Siri Kouletsis for her invaluable help with typing and editing
the manuscript and for various improvements to its content.
21 Vector spaces associated with matrices, 1 of 2
In this section, we will consider three vector spaces associated with a matrix A: the null
space of A, the column space of A and the row space of A. We will find a basis for each of
these vector spaces and then, in Lecture 22, we will establish relationships between these
spaces which will pave the way for introducing linear transformations in Lecture 23.
21.1 Null space, column space and row space of a matrix
Recall that the null space N (A) of an m × n matrix A is the set of solutions of the linear
system Ax = 0 and that N (A) is a subspace of Rn . There are two other vector spaces
associated with A: the column space of A and the row space of A. Their definitions are
given below.
If A is an m × n matrix, and if c1 , c2 , . . . , cn denote the columns of A, then the column
space of A, CS(A), is the linear span of the columns of A:
CS(A) = Lin {c1 , c2 , . . . , cn } .
Note that each column ci is an m × 1 vector, so each ci belongs to Rm . Hence, since the
linear span of a set of vectors in a vector space V is a subspace of V , it follows that the
column space CS(A) is a subspace of Rm .
Similarly, if A is an m × n matrix, and if r1 , r2 , . . . , rm denote the transposed rows of A,
then the row space of A, RS(A), is the linear span of the transposed rows of A:
RS(A) = Lin {r1 , r2 , . . . , rm } .
Note that each transposed row ri is an n × 1 vector, so each ri belongs to Rn . Therefore,
the row space RS(A) is a subspace of Rn .
Example 21.1.1 Consider the 3 × 5 matrix A given by

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix}.
The column space of A is given by

CS(A) = Lin { (1, 1, 3)^T, (2, 1, 5)^T, (1, 0, 2)^T, (3, 5, 11)^T, (4, −1, 7)^T }
and is a subspace of R3. On the other hand, the row space of A is given by

RS(A) = Lin { (1, 2, 1, 3, 4)^T, (1, 1, 0, 5, −1)^T, (3, 5, 2, 11, 7)^T }
and is a subspace of R5 . To find the null space of A, we need to solve the homogeneous
system Ax = 0. Since the right-hand side of this equation is the zero vector, instead of
working with the augmented matrix (A|0), we can simply consider the coefficient matrix
A. Performing a few elementary row operations on A (the steps are omitted), we obtain
its reduced row echelon form:

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We see that x1 and x2 are leading variables and that x3 , x4 and x5 can be regarded as free
parameters. Therefore, the general solution of the homogeneous system Ax = 0 is given
by

(x1, x2, x3, x4, x5)^T = x3 (1, −1, 1, 0, 0)^T + x4 (−7, 2, 0, 1, 0)^T + x5 (6, −5, 0, 0, 1)^T.
It follows that the null space of A is given by

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T }.
This is a subspace of R5 .
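If you want to check a computation like this by machine, the following minimal Python sketch (using the sympy library, which is an aside and not part of the course material) reproduces the reduced row echelon form and the null space basis found above.

```python
import sympy as sp

# The matrix A from Example 21.1.1
A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

# Reduced row echelon form and the positions of the leading ones
R, pivot_columns = A.rref()
print(R)              # matches RRE(A) above
print(pivot_columns)  # (0, 1): leading ones in the first two columns

# A basis for N(A); sympy builds it from the free parameters, as in the text
for v in A.nullspace():
    print(v.T)
```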
21.2 Bases for N(A), CS(A) and RS(A)
Given an m × n matrix A, we would like to find bases for its null space, its column space
and its row space. It turns out that the reduced row echelon form of A gives us all the
information we need in order to obtain these bases. Let us illustrate how using the matrix
A from Example 21.1.1.
A basis for N (A):
We claim that the three vectors appearing in the parametric solution

(x1, x2, x3, x4, x5)^T = x3 (1, −1, 1, 0, 0)^T + x4 (−7, 2, 0, 1, 0)^T + x5 (6, −5, 0, 0, 1)^T
of the homogeneous system Ax = 0 constitute a basis B1 for N(A), that is,

B1 = { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T }.
In order to prove this statement, we need to show that the set B1 is a linearly independent
set which spans N (A). First, B1 spans N (A) since any solution x of the equation Ax = 0
(that is, any element of N (A)) is obviously in the linear span of B1 as it can be expressed
as a linear combination of the vectors in B1 . Moreover, B1 is a linearly independent set.
This is because by construction of the general solution of the system Ax = 0, each vector
in B1 contains a leading one in a position where all the remaining vectors in B1 have zeros:
indeed, the three vectors in B1 have the form

B1 = { (∗, ∗, 1, 0, 0)^T, (∗, ∗, 0, 1, 0)^T, (∗, ∗, 0, 0, 1)^T }.
This implies that no vector in B1 can be written as a linear combination of the remaining
vectors in B1 and therefore, by Theorem 19.1.1, B1 is a linearly independent set.
Hence B1 is a basis for N (A) and since B1 consists of three vectors, we deduce that N (A)
is a 3-dimensional subspace of R5 .
A basis for CS(A):
We claim that the columns of A corresponding to the leading columns of RRE(A) (that
is, the columns of A corresponding to the columns of RRE(A) that contain the leading
ones) constitute a basis B2 for CS(A). In Example 21.1.1, the leading ones of RRE(A)
appear in the first two columns, so we take the corresponding columns of A, namely,

B2 = { (1, 1, 3)^T, (2, 1, 5)^T }.
In order to establish this result, we need to show that the set B2 is a linearly independent
set which spans CS(A). First, B2 is a linearly independent set because the elementary
row operations that turn A into RRE(A), namely

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},

transform the submatrix \begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} into \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, i.e.

\begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.

This implies that the matrix \begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} has full column rank, which establishes the linear independence of its columns (1, 1, 3)^T and (2, 1, 5)^T. In addition, B2 spans CS(A) because any
other column of A is a linear combination of the two columns in B2 . More precisely,
consider the submatrix (c1 c2 c3 ) of A = (c1 c2 c3 c4 c5 ) and perform the same elementary
row operations that turn A into RRE(A). We get

\begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 0 \\ 3 & 5 & 2 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
The absence of a third leading one implies that (c1 c2 c3 ) does not have full column rank,
so the set {c1, c2, c3} is a linearly dependent set. In particular, the matrix

\begin{pmatrix} 1 & 0 & −1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
tells us that the reduced row echelon form of the augmented matrix (c1 c2 c3 | 0) is

\begin{pmatrix} 1 & 0 & −1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
which implies that the general solution of the corresponding homogeneous system (c1 c2 c3 )x = 0
is given by

x = (x1, x2, x3)^T = x3 (1, −1, 1)^T.
Given that the matrix equation

(c1 c2 c3) (x1, x2, x3)^T = 0
can be expressed as

x1 c1 + x2 c2 + x3 c3 = 0,

the particular solution x = (1, −1, 1)^T (obtained by choosing x3 = 1 in the general solution
for x) implies that
c1 − c2 + c3 = 0.
Hence, c3 is a linear combination of the two columns in B2 ,
c3 = −c1 + c2 ,
which shows that c3 ∈ Lin {c1 , c2 } = Lin(B2 ).
Next, by inspecting RRE(A), we identify the reduced row echelon form of the submatrix
(c1 c2 c4) of A = (c1 c2 c3 c4 c5):

\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 5 \\ 3 & 5 & 11 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & 7 \\ 0 & 1 & −2 \\ 0 & 0 & 0 \end{pmatrix}.
Considering the general solution of the homogeneous system (c1 c2 c4 )x = 0 and following
an identical argument as above, we now obtain the statement that
−7c1 + 2c2 + c4 = 0.
Hence, c4 is a linear combination of the two columns in B2 ,
c4 = 7c1 − 2c2 ,
which implies that c4 ∈ Lin {c1 , c2 } = Lin(B2 ).
Finally, by inspecting RRE(A) once more, we can identify the reduced row echelon form
of the submatrix (c1 c2 c5) of A = (c1 c2 c3 c4 c5):

\begin{pmatrix} 1 & 2 & 4 \\ 1 & 1 & −1 \\ 3 & 5 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −6 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{pmatrix}.
By a similar reasoning, we now infer that
6c1 − 5c2 + c5 = 0,
and hence
c5 = −6c1 + 5c2 .
Therefore, c5 ∈ Lin {c1 , c2 } = Lin(B2 ).
We have thus shown that all the other columns of A can be expressed as linear combinations
of the columns in B2 and hence B2 is a basis for CS(A). Moreover, since B2 consists of
two vectors, we deduce that CS(A) is a 2-dimensional subspace of R3 .
A practical approach: Note that we do not really have to consider the individual submatrices (c1 c2 c3 ), (c1 c2 c4 ) and (c1 c2 c5 ) of A in order to express c3 , c4 and c5 as linear
combinations of c1 and c2. The basis for the null space of A found before, namely

B1 = { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },

reveals directly which linear combinations of c1, c2 produce the remaining columns c3, c4 and c5 of A. More precisely, each of the basis vectors in the null space of A gives a particular solution of the homogeneous system (c1 c2 c3 c4 c5)(x1, x2, x3, x4, x5)^T = 0. Using the fact that
this equation amounts to x1 c1 + x2 c2 + x3 c3 + x4 c4 + x5 c5 = 0, we conclude immediately
that the following three linear combinations of the columns of A are equal to the zero
vector:
c1 − c2 + c3 = 0,
−7c1 + 2c2 + c4 = 0,
6c1 − 5c2 + c5 = 0.
Solving these equations for c3 , c4 and c5 in terms of c1 and c2 yields the required linear
combinations.
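As a quick numerical sanity check of these relations (an aside, not part of the original notes), one can verify them with numpy:

```python
import numpy as np

A = np.array([[1, 2, 1, 3, 4],
              [1, 1, 0, 5, -1],
              [3, 5, 2, 11, 7]])
c1, c2, c3, c4, c5 = A.T  # rows of A.T are the columns of A

# The null-space relations translate into these column identities
print(np.array_equal(c3, -c1 + c2))      # True
print(np.array_equal(c4, 7*c1 - 2*c2))   # True
print(np.array_equal(c5, -6*c1 + 5*c2))  # True
```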
A basis for RS(A):
We claim that the transposed leading rows of RRE(A) (that is, the rows of RRE(A) that
contain the leading ones) constitute a basis B3 for RS(A). In Example 21.1.1, the leading
ones of RRE(A) appear in the first two rows, so B3 is given by

B3 = { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T }.
In order to establish this fact, we need to show that the set B3 is a linearly independent
set which spans RS(A). First, B3 is a linearly independent set because of the positions of
the leading ones. More precisely, each vector in B3 has a leading one where all the other
vectors in B3 have zeros. In the particular example above, the vectors in B3 are of the
form

B3 = { (1, 0, ∗, ∗, ∗)^T, (0, 1, ∗, ∗, ∗)^T }.
This implies that no vector in B3 is a linear combination of the remaining vectors in B3 .
Hence, by Theorem 19.1.1, B3 is a linearly independent set. To show that the set B3 spans
RS(A), we just need to observe that the elementary row operations (that is, Ri ↔ Rj, Ri ↦ λRi and Ri ↦ Ri + λRj) guarantee that each row in RRE(A) is a linear combination of the original rows of A. Hence, we deduce that RS(RRE(A)) ⊆ RS(A). Moreover,
each such elementary row operation is invertible, which by a similar argument implies that
RS(A) ⊆ RS(RRE(A)). Combining these two statements gives RS(A) = RS(RRE(A)).
Using this result and also the fact that, by definition, Lin(B3 ) = RS(RRE(A)), we conclude that Lin(B3 ) = RS(A), which shows that B3 spans RS(A). We have thus shown
that B3 is a basis for RS(A). Moreover, since B3 consists of two vectors, RS(A) is a
2-dimensional subspace of R5 .
Remark 21.2.1 Let us emphasise that a basis for RS(A) consists of the leading rows of
the reduced row echelon form RRE(A). This is all that the above proof guarantees - there
is no reason to try to identify a ‘corresponding set’ of rows of the original matrix A.
Remark 21.2.2 Let us also emphasise that a basis for CS(A) consists of the columns of
A that correspond to the leading columns of RRE(A). The leading columns of RRE(A)
themselves do not form a basis for CS(A). This is because row operations do not preserve
the column space of a matrix in general.
21.3 Exercises for self study
Exercise 21.3.1 Consider the following matrix A:

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 & −1 \\ 1 & 3 & 2 & 0 & 1 \end{pmatrix}.
(a) Find a basis B1 for the row space of A.
(b) Find a basis B2 for the column space of A.
(c) Find a basis B3 for the null space of A.
Exercise 21.3.2 Consider the matrix A from Exercise 21.3.1 and the bases B1 , B2 and
B3 obtained there:
(a) Using B3 , express each column of A as a linear combination of the basis vectors in B2 .
(b) Find the reduced row echelon form of AT .
(c) Find a basis C1 for the column space of AT , a basis C2 for the row space of AT and
explain why Lin(B1 ) = Lin(C1 ) and Lin(B2 ) = Lin(C2 ).
Exercise 21.3.3 Consider the set of vectors X = {v1, v2, v3, v4} where

v1 = (1, 2, −4)^T,   v2 = (2, 1, 3)^T,   v3 = (−1, 7, −29)^T,   v4 = (9, 6, 8)^T.
(a) Find the reduced row echelon form of the matrix A = (v1 v2 v3 v4 ) whose columns are
the vectors in X.
(b) Argue that Lin(X) = CS(A) and hence find a basis B for Lin(X).
(c) Find the coordinates (vi )B of each vector vi in X with respect to the basis B of Lin(X).
Exercise 21.3.4 Consider the set X and the matrix A from Exercise 21.3.3:
(a) Find the reduced row echelon form of the matrix AT .
(b) Find a basis C for RS(AT ).
(c) Argue that RS(AT ) = Lin(X) and find the coordinates (vi )C of each vector vi in X
with respect to the basis C of Lin(X).
21.4 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 5.3 and 6.5 of our Algebra Textbook are relevant.
22 Vector spaces associated with matrices, 2 of 2
In this section, we complete the presentation of the vector spaces N (A), CS(A) and
RS(A) associated with a matrix A. We start by establishing certain relationships between
these vector spaces and then we use the results obtained in Lecture 21 to revisit the linear
independence and the linear span of a set X of vectors as well as the issue of the consistency
of a general system Ax = b.
22.1 Orthogonality between N(A) and RS(A)
Consider any m × n matrix A of rank k; for example, consider the 3 × 5 matrix A of rank
2 presented in Example 21.1.1, whose reduced row echelon form is given below:

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix},   RRE(A) = \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We have already established that both N(A) and RS(A) are subspaces of R5 and that

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },   RS(A) = Lin { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T },
where each of the above sets of vectors constitutes a basis for the corresponding subspace.
The spaces N (A) and RS(A) are orthogonal to each other with respect to the usual scalar
product on R5 in the sense that any vector in N (A) is orthogonal to any vector in RS(A).
Indeed, a vector x belongs to N (A) if and only if it satisfies the homogeneous equation
Ax = 0. This implies that x is orthogonal to the transposed rows of A, which in turn
implies that x is orthogonal to any vector in RS(A). In the case of the example given
above, one can verify this fact directly by using the basis vectors for N(A) and RS(A):

⟨(1, −1, 1, 0, 0)^T, (1, 0, −1, 7, −6)^T⟩ = 0,   ⟨(−7, 2, 0, 1, 0)^T, (1, 0, −1, 7, −6)^T⟩ = 0,   ⟨(6, −5, 0, 0, 1)^T, (1, 0, −1, 7, −6)^T⟩ = 0,

⟨(1, −1, 1, 0, 0)^T, (0, 1, 1, −2, 5)^T⟩ = 0,   ⟨(−7, 2, 0, 1, 0)^T, (0, 1, 1, −2, 5)^T⟩ = 0,   ⟨(6, −5, 0, 0, 1)^T, (0, 1, 1, −2, 5)^T⟩ = 0.
By the above results and the bilinearity of the scalar product, it follows that any linear
combination of the basis vectors of N (A) is orthogonal to any linear combination of the
basis vectors of RS(A). Hence, N (A) and RS(A) are orthogonal to each other.
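The pairwise scalar products above can also be checked in one step: stacking the basis vectors of N(A) and RS(A) as the rows of two matrices, the product of one with the transpose of the other collects every scalar product. A small numpy sketch (an aside, not from the original notes):

```python
import numpy as np

N_basis = np.array([[1, -1, 1, 0, 0],
                    [-7, 2, 0, 1, 0],
                    [6, -5, 0, 0, 1]])    # rows: basis of N(A)
RS_basis = np.array([[1, 0, -1, 7, -6],
                     [0, 1, 1, -2, 5]])   # rows: basis of RS(A)

# Entry (i, j) is the scalar product of the i-th N(A) vector with the j-th RS(A) vector
print(N_basis @ RS_basis.T)   # the 3 x 2 zero matrix
```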
22.2 The rank-nullity theorem
Still working with the example above, note that if we add the dimension of N (A) to
the dimension of RS(A), we obtain the dimension of the vector space R5 . This is not a
coincidence: The dimension of RS(A) is equal to the number of leading ones in RRE(A);
that is, it is equal to the rank of A:
dim(RS(A)) = rank(A).
The dimension of N (A) is equal to the number of free parameters appearing in the general
solution of the homogeneous system Ax = 0; this number is known as the nullity of A:
dim(N (A)) = nullity(A).
Now recall that any column of A which does not contain a leading one implies the existence
of a free parameter in the general solution of the system Ax = 0. Hence, the sum of the
number of leading ones and the number of free parameters is equal to the number of
columns of A. In other words, we have that
rank(A) + nullity(A) = number of columns of A.
This is known as the Rank-Nullity theorem for matrices. In the example above, the
rank of A is 2, the nullity of A is 3 and the number of columns of A is 5.
Also note that although the column space of A and the row space of A are in general
subspaces of different vector spaces (indeed, above, CS(A) ⊆ R3 and RS(A) ⊆ R5 ), the
dimensions of these vector spaces are always equal; that is,
dim(CS(A)) = dim(RS(A)).
This is because both these dimensions are equal to the number of leading ones in RRE(A);
i.e., the rank of A. Accordingly, the Rank-Nullity theorem for matrices can be expressed
in various alternative ways, as follows: For any m × n matrix A, we have that
rank(A) + nullity(A) = n,
dim(CS(A)) + dim(N (A)) = n,
dim(RS(A)) + dim(N (A)) = n.
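A one-line machine check of the rank-nullity theorem for the running example, again with sympy (an illustrative aside):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

rank = A.rank()               # number of leading ones in RRE(A): 2
nullity = len(A.nullspace())  # number of free parameters: 3
print(rank + nullity == A.cols)   # True: 2 + 3 = 5 columns
```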
Remark 22.2.1 Given an m × n matrix A, the only vector in Rn which belongs to both
N (A) and RS(A) is the zero vector. This is because, by the orthogonality of RS(A) and
N (A), a vector v belongs to both RS(A) and N (A) only if it is orthogonal to itself:
⟨v, v⟩ = 0.

By the positivity of the scalar product on Rn, the only vector v ∈ Rn that has the property that ⟨v, v⟩ = 0 is the zero vector 0 ∈ Rn. This means that for any m × n matrix A, the
intersection of the row space of A and the null space of A consists of only the zero vector
in Rn :
RS(A) ∩ N (A) = {0} .
22.3 Cartesian descriptions for N(A), CS(A) and RS(A)
Let us use again our previous example. We have found a basis for each of the vector spaces
N (A), RS(A) and CS(A), so each of these spaces can be regarded as the linear span of
its basis:

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },   RS(A) = Lin { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T },   CS(A) = Lin { (1, 1, 3)^T, (2, 1, 5)^T }.
It is useful to be able to describe each of the subspaces N (A), RS(A) and CS(A) by a
set of Cartesian equations. Let us start with N (A), whose vector parametric description
is given below:

(x1, x2, x3, x4, x5)^T = s (1, −1, 1, 0, 0)^T + t (−7, 2, 0, 1, 0)^T + u (6, −5, 0, 0, 1)^T.
We note that N (A) is a 3-dimensional subspace of R5 . Hence, a Cartesian description for N (A) amounts to imposing two independent restrictions on the five variables
x1 , x2 , x3 , x4 , x5 . With this in mind, consider the 2 × 5 matrix D of rank 2 whose rows are
the transposed basis vectors of RS(A). Since the transposed rows of D are orthogonal to
any vector in N(A), the general solution of the homogeneous system Dx = 0,

\begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \end{pmatrix} (x1, x2, x3, x4, x5)^T = (0, 0)^T,
which obviously contains three free parameters, coincides with the parametric equation for
N (A) given above. It follows that a Cartesian description for N (A) is given by the system
of equations
x1 − x3 + 7x4 − 6x5 = 0
x2 + x3 − 2x4 + 5x5 = 0
associated with the homogeneous system Dx = 0.
Similarly, RS(A) is a 2-dimensional subspace of R5, described by the parametric equation

(x1, x2, x3, x4, x5)^T = λ (1, 0, −1, 7, −6)^T + µ (0, 1, 1, −2, 5)^T.
A Cartesian description for RS(A) requires three independent restrictions on the five
variables x1 , x2 , x3 , x4 , x5 . Let us therefore consider the 3×5 matrix E of rank 3 whose rows
are the transposed basis vectors of N (A). Since the transposed rows of E are orthogonal
to any vector in RS(A), the general solution of the homogeneous system Ex = 0,

\begin{pmatrix} 1 & −1 & 1 & 0 & 0 \\ −7 & 2 & 0 & 1 & 0 \\ 6 & −5 & 0 & 0 & 1 \end{pmatrix} (x1, x2, x3, x4, x5)^T = (0, 0, 0)^T,
which obviously contains two free parameters, coincides with the parametric equation of
RS(A) given above. It follows that a Cartesian description for RS(A) is given by the system
of equations

x1 − x2 + x3 = 0

−7x1 + 2x2 + x4 = 0

6x1 − 5x2 + x5 = 0
associated with the homogeneous system Ex = 0.
Regarding a Cartesian description for CS(A), note that the columns of A are the rows of
AT , which means that
CS(A) = RS(AT ).
Using the fact that RS(AT ) is orthogonal to N (AT ) and following an identical argument as
above, we deduce that a Cartesian description for CS(A) corresponds to the homogeneous
system Fx = 0, where F is the matrix whose rows are the transposed basis vectors of
N (AT ). This method for obtaining a Cartesian description for CS(A) is illustrated at the
end of the next subsection and also in Exercise 22.6.2.
22.4 Linear independence and span of a set X revisited
Suppose that we are given a set of vectors X = {v1 , . . . , vk } where each vector vi belongs
to Rn. For example, let us consider the following five vectors in R3,

X = { (1, 1, 3)^T, (2, 1, 5)^T, (1, 0, 2)^T, (3, 5, 11)^T, (4, −1, 7)^T },
which allows us to utilise the matrix A of Example 21.1.1; i.e., the columns of A are the
vectors in X. The reduced row echelon form of the matrix A is given below:

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
Regarding the issue of the linear independence or dependence of the set X, we see that A
does not have full column rank, so X is a linearly dependent set. A question arises (i):
Can we identify a subset S of X that consists of linearly independent vectors and thereby
express the remaining vectors in X as linear combinations of the vectors in S? Similarly,
regarding the issue of the linear span of X, we see that A does not have full row rank, so X
does not span R3 . A question again arises: (ii) Can we identify the subspace Lin(X) ⊂ R3
spanned by the set X and also obtain a basis B and a Cartesian description for this
subspace of R3 ? Realising that Lin(X) = CS(A), both these questions have already been
answered in the previous subsections.
More precisely, a subset S of X consisting of linearly independent vectors corresponds to
a basis for CS(A). We have already found that a basis B for CS(A) consists of the first two columns of A; that is, B = {c1, c2} where c1 = (1, 1, 3)^T and c2 = (2, 1, 5)^T. Moreover, as we
saw in subsection 21.2, the basis vectors in any basis of N(A), such as the three vectors

(1, −1, 1, 0, 0)^T,   (−7, 2, 0, 1, 0)^T,   (6, −5, 0, 0, 1)^T,
imply the existence of corresponding linear combinations of the columns {ci } of A (and
hence of the vectors {vi } in X) that are equal to the zero vector:
v1 − v2 + v3 = 0,
−7v1 + 2v2 + v4 = 0,
6v1 − 5v2 + v5 = 0.
These combinations allow us to express the remaining vectors in X as linear combinations
of the vectors in B = {v1 , v2 }:
v3 = −v1 + v2 ,
v4 = 7v1 − 2v2 ,
v5 = −6v1 + 5v2 .
Finally, since
Lin(X) = CS(A) = RS(AT )
and RS(AT ) is orthogonal to N (AT ), a Cartesian description for the two-dimensional
subspace Lin(X) ⊂ R3 corresponds to the matrix equation
Fx = 0,
where F is the 1 × 3 matrix of rank 1 whose single row is a basis vector for N (AT ). So,
we just need to find the reduced row echelon form of

A^T = \begin{pmatrix} 1 & 1 & 3 \\ 2 & 1 & 5 \\ 1 & 0 & 2 \\ 3 & 5 & 11 \\ 4 & −1 & 7 \end{pmatrix},

which turns out to be the matrix

RRE(A^T) = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
and then identify a basis for its null space. We have

N(A^T) = Lin { (−2, −1, 1)^T },
so the required 1 × 3 matrix F of rank 1 is
F = (−2  −1  1)
and a Cartesian description for Lin(X) is given by

(−2  −1  1) (x1, x2, x3)^T = 0;

i.e., by −2x1 − x2 + x3 = 0. Alternatively, we can calculate the cross product of the two vectors in the basis B = {c1, c2} = { (1, 1, 3)^T, (2, 1, 5)^T } in order to obtain a normal vector for
Lin(X) and hence a Cartesian description. However, note that although the cross product
of two vectors is applicable here (where Lin(X) is a 2-dimensional subspace of R3 ) it will
not be applicable in general (where Lin(X) may be a k-dimensional subspace of Rn ). Only
the first method (based on the orthogonality between CS(A) and N (AT )) is applicable in
general.
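The general method, obtaining a Cartesian description of CS(A) = Lin(X) from a basis of N(A^T), is easy to automate. A minimal sympy sketch (an aside; it uses the matrix of Example 21.1.1):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

# Rows of F are the transposed basis vectors of N(A^T)
F = sp.Matrix.vstack(*[v.T for v in A.T.nullspace()])
print(F)        # Matrix([[-2, -1, 1]]) (up to scaling)
print(F * A)    # zero row: every column of A satisfies -2*x1 - x2 + x3 = 0
```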
22.5 Consistency of a linear system revisited
Let us now consider a general linear system Ax = b where A is a given matrix and b is a
given vector. We already know that the system is consistent if and only if the rank of the
augmented matrix (A|b) is equal to the rank of the coefficient matrix A, that is, if and
only if
ρ((A|b)) = ρ(A).
An alternative statement amounting to the consistency of the system Ax = b is that the
vector b belongs to the column space of A:
b ∈ CS(A).
Let us prove this statement by considering the columns of A = (c1 . . . ck ). If the system
Ax = b is consistent, it must have at least one solution for the vector x; let us denote this
solution by x = (s1, ..., sk)^T. Then, since the equation

(c1 . . . ck) (s1, ..., sk)^T = b
amounts to the equation
s1 c1 + · · · + sk ck = b,
we deduce that b is a linear combination of the columns of A; i.e., b ∈ CS(A). Conversely,
if b ∈ CS(A), then there exist scalars s1 , s2 , ..., sk such that
s1 c1 + · · · + sk ck = b;
i.e., such that

(c1 . . . ck) (s1, ..., sk)^T = b.
Hence, the linear system Ax = b, where A = (c1 . . . ck), admits at least one solution x = (s1, ..., sk)^T and is therefore consistent. Thus, we have shown that Ax = b is consistent if
and only if b ∈ CS(A); i.e., we have
ρ((A|b)) = ρ(A)
if and only if
b ∈ CS(A).
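The rank criterion is straightforward to test numerically. A short sympy sketch (an aside; the right-hand side b below is just a sample choice):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])
b = sp.Matrix([1, 0, 2])            # sample right-hand side (this is column c3 of A)

augmented = A.row_join(b)           # the augmented matrix (A | b)
print(A.rank() == augmented.rank()) # True here, so Ax = b is consistent and b lies in CS(A)
```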
22.6 Exercises for self study
Exercise 22.6.1 Consider the set of vectors X = {v1, v2, v3, v4} where

v1 = (0, 1, 2)^T,   v2 = (3, 1, 5)^T,   v3 = (−3, 1, −1)^T,   v4 = (3, 4, 11)^T.
(a) Find the matrices RRE(A) and RRE(AT ), where A = (v1 v2 v3 v4 ) is the matrix whose
columns are the vectors in X.
(b) Obtain bases B1 , B2 , B3 and B4 for the vector spaces CS(A), RS(A), N (A) and
N (AT ), respectively.
(c) Use the bases B2 and B3 to confirm that RS(A) is orthogonal to N (A).
(d) Briefly explain why CS(A) is orthogonal to N (AT ) and confirm this fact by using the
bases B1 and B4 .
Exercise 22.6.2 Consider the set X and the matrix A from Exercise 22.6.1:
(a) Is X a linearly independent set? Does X span R3 ? Briefly justify your answers.
(b) Obtain a basis for Lin(X) and also a Cartesian description for Lin(X).
(c) Also obtain Cartesian descriptions for RS(A), N (A) and N (AT ).
Exercise 22.6.3 Consider the set of vectors X = {v1, v2, v3} where

v1 = (0, 1, 1, 0, 3)^T,   v2 = (1, 2, 2, 3, 5)^T,   v3 = (1, 1, 1, 1, 1)^T.
(a) Find the matrices RRE(A) and RRE(AT ), where A = (v1 v2 v3 ) is the matrix whose
columns are the vectors in X.
(b) Obtain bases B and C for the vector spaces CS(A) and N (AT ), respectively.
(c) Hence, obtain a Cartesian description for CS(A) ⊂ R5 .
(d) Use your Cartesian description from part (c) to confirm that each vector in X belongs
to CS(A).
Exercise 22.6.4 Consider the set X and the matrix A from Exercise 22.6.3:
(a) Is X a linearly independent set? Briefly justify your answer.
(b) Obtain a Cartesian description and a basis D for Lin(X) ⊂ R5 . Also state the dimension
of Lin(X).
(c) Find the coordinates (v1 )D , (v2 )D and (v3 )D of the vectors in X with respect to your
basis D of Lin(X).
22.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 5.3 and 6.5 of our Algebra Textbook are relevant.
23 Linear transformations, 1 of 6
For the next six lectures, we focus on special types of functions between vector spaces known
as linear transformations. In this lecture, we introduce the relevant definitions and a few
fundamental results; among them, the rank-nullity theorem for linear transformations.
23.1 The main definitions
Recall that a function from a set X to a set Y is a rule which assigns to every element
x ∈ X a unique element y ∈ Y .
Now suppose that V and W are not just sets, but vector spaces. A function T : V → W is called linear if for all vectors u, v ∈ V and all scalars α ∈ R:
1. T (u + v) = T (u) + T (v), and
2. T (αu) = αT (u).
Any such linear function is known as a linear transformation. In the special case where
W = V , a linear transformation T : V → V is known as a linear operator.
Conditions 1 and 2 imply, and are implied by, the single condition that for all vectors
u, v ∈ V and all scalars α, β ∈ R:
T (αu + βv) = αT (u) + βT (v).
Therefore, a linear transformation T : V → W maps linear combinations of vectors in V
to the same linear combinations of the image vectors in W . In this sense, T preserves the
‘linearity’ of the vector space V . In particular, T maps the zero vector in V to the zero
vector in W . This can be seen in a number of ways. For instance, take any x ∈ V . Then,
by the linearity of T , we have T (0) = T (0x) = 0T (x) = 0.
Example 23.1.1 The function F1 : R → R defined by F1 (x) = 3x is a linear transformation, since for any vectors x, y ∈ R and any scalars α, β ∈ R, we have
F1 (αx + βy) = 3(αx + βy) = α(3x) + β(3y) = αF1 (x) + βF1 (y).
On the other hand, neither of the functions F2 : R → R and F3 : R → R defined by F2(x) = 3x + 2 and F3(x) = x² is linear. We have

F2(αx + βy) = 3(αx + βy) + 2 ≠ α(3x + 2) + β(3y + 2) = αF2(x) + βF2(y)

and

F3(αx + βy) = (αx + βy)² ≠ α(x²) + β(y²) = αF3(x) + βF3(y).
Example 23.1.2 Let A be an m × n matrix and let T : Rn → Rm be the function defined
by matrix multiplication: T (x) = Ax. Then T is a linear transformation. We have
T (u + v) = A(u + v) = Au + Av = T (u) + T (v)
and also
T (αu) = A(αu) = αAu = αT (u).
Example 23.1.3 Let V be the set of all functions of the form f (x) = a + bx, where a, b ∈
R. This is a vector space under the standard operations of pointwise addition and scalar
multiplication of functions. The transformation T : V → V defined by differentiation, that
is, T(f) = f′, is a linear transformation. First, T is well-defined, since for all f ∈ V, where f(x) = a + bx, the image vector f′ is given by f′(x) = b, which is an element of V. To show that T is a linear transformation, we use the properties of the derivative: Take any two elements f, g ∈ V and any scalars α, β ∈ R. Then

T(αf + βg) = (αf + βg)′ = αf′ + βg′ = αT(f) + βT(g).
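A quick symbolic confirmation of this linearity property, using sympy (an aside, not part of the notes):

```python
import sympy as sp

x, a, b, c, d, alpha, beta = sp.symbols('x a b c d alpha beta')
f = a + b*x        # a generic element of V
g = c + d*x        # another generic element of V

lhs = sp.diff(alpha*f + beta*g, x)               # T(alpha*f + beta*g)
rhs = alpha*sp.diff(f, x) + beta*sp.diff(g, x)   # alpha*T(f) + beta*T(g)
print(sp.simplify(lhs - rhs) == 0)               # True
```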
23.2 Identity, compositions, linear combinations, inverse
Given a vector space V , the linear transformation T : V → V defined by T (v) = v is called
the identity transformation.
The composition of two linear transformations is again a linear transformation. In particular, if T : V → W and S : W → U , then the composite transformation S ◦ T , denoted
by ST , is the linear transformation defined by
(S ◦ T )(v) = (ST )(v) = S(T (v)).
Note that ST means ‘T followed by S’; that is, V \xrightarrow{T} W \xrightarrow{S} U.
A linear combination of linear transformations is again a linear transformation. More
precisely, if S and T are both linear transformations between vector spaces V and W , i.e.
if S : V → W and T : V → W , then the sum S + T and the scalar multiple αS, α ∈ R,
are linear transformations between V and W , and therefore so is the linear combination
αS + βT for any choice of α, β ∈ R.
Finally, let V and W be finite-dimensional vector spaces of the same dimension, and let
T : V → W be a linear transformation. Then, if it exists, the inverse T −1 of T is the
unique linear transformation T −1 : W → V such that
T⁻¹(T(v)) = v   for all v ∈ V.

23.3 Range and kernel
Suppose that T is a linear transformation from a vector space V to a vector space W .
Then the range of T , denoted by R(T ), is defined as the set
R(T ) = {w ∈ W | w = T (v) for some v ∈ V } ⊆ W.
The kernel of T , denoted by ker(T ), is defined as the set
ker(T ) = {v ∈ V | T (v) = 0} ⊆ V,
where 0 is the zero vector of W .
The proof of the following theorem is omitted but is quite straightforward:
Theorem 23.3.1 The kernel and the range of a linear transformation T : V → W are
subspaces of V and W , respectively.
Below, we find the range and kernel of each of the linear transformations presented in
Examples 23.1.1, 23.1.2 and 23.1.3:
Example 23.3.2 The range of the function F1 : R → R defined by F1 (x) = 3x is the set
R(F1 ) = {y ∈ R | y = 3x for some x ∈ R} .
We will show that
R(F1 ) = R
by proving that both statements R(F1) ⊆ R and R ⊆ R(F1) hold true. Indeed, if y ∈ R(F1), then y = 3x for some x ∈ R, which implies that y ∈ R. Hence, R(F1) ⊆ R. Conversely, given any y ∈ R, we can write y = 3(y/3), so there exists an x ∈ R, namely x = y/3, such that y = 3x. Hence, y ∈ R(F1), which shows that R ⊆ R(F1). The equality of the sets
R(F1 ) and R follows.
The kernel of the function F1 : R → R is the set
ker(F1 ) = {x ∈ R | 3x = 0} .
Clearly,
ker(F1 ) = {0}
because the unique solution of the equation 3x = 0 is x = 0.
Example 23.3.3 Given any m × n matrix A, the range of the function T : Rn → Rm
defined by T (x) = Ax is the set
R(T ) = {y ∈ Rm | y = Ax for some x ∈ Rn } .
We claim the general result that
R(T ) = CS(A).
To prove this statement, we need to show that R(T ) ⊆ CS(A) and CS(A) ⊆ R(T ). If
y ∈ R(T ), then y = Ax for some x ∈ Rn . This implies that y is a linear combination of
the columns of A = (c1 . . . cn), since

y = Ax = (c1 . . . cn) (x1, ..., xn)^T = x1 c1 + · · · + xn cn.
Hence, y ∈ CS(A), which shows that R(T ) ⊆ CS(A). Conversely, if y ∈ CS(A), then y
is some linear combination of the columns of A = (c1 . . . cn ); that is,
y = x1 c1 + · · · + xn cn.

This implies that y = Ax where x = (x1, ..., xn)^T ∈ Rn. Hence, y ∈ R(T), which shows that
CS(A) ⊆ R(T ). The equality between the sets CS(A) and R(T ) follows.
The kernel of T is the set
ker(T ) = {x ∈ Rn | Ax = 0} .
We claim the general result that
ker(T ) = N (A).
To prove this, we need to show that ker(T ) ⊆ N (A) and N (A) ⊆ ker(T ). Indeed, if
x ∈ ker(T ), then Ax = 0. Therefore, x solves the homogeneous linear system Ax = 0. It
23
follows that x ∈ N (A), which shows that ker(T ) ⊆ N (A). Conversely, if x ∈ N (A), then
x solves the homogeneous linear system Ax = 0. Hence x ∈ ker(T ), which shows that
N (A) ⊆ ker(T ). The equality ker(T ) = N (A) follows.
Example 23.3.4 Given the vector space V consisting of all functions of the form f (x) =
ax + b, where a, b ∈ R, the range of the linear transformation T : V → V defined by
T(f) = f′ is the set

R(T) = {g ∈ V | g = f′ for some f ∈ V}.
We will show that R(T ) = W , where W is the subspace of V consisting of all functions of
the form f (x) = c, where c ∈ R. You may find it useful to confirm that W is a subspace of
V by using the Subspace Criterion. Let us first show that R(T ) ⊆ W . Indeed, if g ∈ R(T ),
then g = f′ for some f ∈ V. Since any f ∈ V has the form f(x) = ax + b, we see that f′(x) = a, so g(x) = a. Hence g ∈ W. Moreover, we also have that W ⊆ R(T). Indeed, if g ∈ W, then g(x) = c for some c ∈ R. Hence, g(x) can be written in the form g(x) = f′(x)
where f (x) = cx ∈ V . It follows that g ∈ R(T ). This completes the proof that R(T ) = W .
The kernel of T is the set
ker(T) = {f ∈ V | f′ = 0},
where 0 denotes the identically zero function in V . We will show that ker(T ) = U , where
U is the subspace of V consisting of all functions of the form f (x) = c, where c ∈ R. Note
that U and W correspond to the same subspace of V ; namely, the subspace consisting of all
constant functions. However, U is a subspace of the domain V of T while W is a subspace
of the codomain V of T . Let us first show that ker(T ) ⊆ U . Indeed, if f ∈ ker(T ), then
f′ = 0, where 0 is the identically zero function of V. Hence f(x) = c for some c ∈ R, which shows that f ∈ U. Moreover, we also have that U ⊆ ker(T). Indeed, if f ∈ U, then f(x) = c for some c ∈ R. Hence, f′(x) = 0 where 0 is the identically zero function in V. It
follows that f ∈ ker(T ). This completes the proof that ker(T ) = U .
23.4 Rank-nullity theorem for linear transformations
Going back to Example 23.3.3, we saw that given any m × n matrix A, the linear transformation T : Rn → Rm defined by T (x) = Ax satisfies
R(T ) = CS(A)
and
ker(T ) = N (A).
In particular, if the m × n matrix A has rank k, we have that
k = rank(A) = dim(CS(A)) = dim(R(T )).
Then, A has nullity n − k, so we also have that
n − k = nullity(A) = dim(N (A)) = dim(ker(T )).
Thus, the Rank-Nullity theorem associated with A can be expressed in the form
dim(R(T )) + dim(ker(T )) = n.
More generally, for any linear transformation T : V → W whose domain V is a finite-dimensional vector space (not necessarily Euclidean), we have the following theorem, known
as the Rank-Nullity theorem for linear transformations:
Theorem 23.4.1 Suppose that T is a linear transformation from a finite-dimensional
vector space V to a vector space W . Then
rank(T ) + nullity(T ) = dim(V ).
The proof of this theorem can be found in section 7.2 of our algebra textbook. Note that
this result holds even if W is not finite-dimensional.
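For the matrix transformations of Example 23.3.3 the theorem can be checked mechanically: dim R(T) is the number of vectors in a basis of CS(A), and dim ker(T) the number in a basis of N(A). A sympy sketch (an aside):

```python
import sympy as sp

# T : R^5 -> R^3 defined by T(x) = A x, with A the matrix of Example 21.1.1
A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

dim_range = len(A.columnspace())   # dim R(T) = dim CS(A) = rank(T) = 2
dim_kernel = len(A.nullspace())    # dim ker(T) = dim N(A) = nullity(T) = 3
print(dim_range + dim_kernel == A.cols)   # True: rank(T) + nullity(T) = dim(V) = 5
```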
23.5 Exercises for self study
Exercise 23.5.1 For each of the following linear transformations, find a basis for the
kernel of T , ker(T ), and the range of T , R(T ). Verify the Rank-Nullity theorem in each
case:

(a) T : R2 → R3 by T((x, y)^T) = \begin{pmatrix} 1 & 2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} (x, y)^T = (x + 2y, 0, 0)^T,

(b) T : R3 → R3 by T((x, y, z)^T) = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} (x, y, z)^T = (x + y + z, y + z, z)^T.
Exercise 23.5.2 Give an example of a matrix A such that the linear transformation
T : R3 → R3 defined as T (x) = Ax has the following properties:
ker(T ) ⊂ R3 is the line with Cartesian equation x = y = z, and
R(T ) ⊂ R3 is the plane with Cartesian equation 2x + y − z = 0.
Exercise 23.5.3 For the following linear transformation T , find a basis for the kernel
of T , ker(T ), and the range of T , R(T ). Obtain a Cartesian description and a vector
parametric description for ker(T) and R(T):

T : R3 → R3 by T((x, y, z)^T) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ −1 & 0 & 1 \end{pmatrix} (x, y, z)^T = (x + y, y + z, −x + z)^T.
Exercise 23.5.4 (a) Define what we mean by a linear transformation T : V → W from
a vector space V to a vector space W .
(b) Hence, show that any linear transformation must map the zero vector in V to the zero
vector in W .
(c) Find a basis B1 for the kernel, ker(S), of the linear transformation S : R3 → R2 defined
by

S((x, y, z)^T) = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & 4 \end{pmatrix} (x, y, z)^T.
Also find a basis B2 for the range, R(S), of S.
(d) Obtain a Cartesian description for ker(S) and a vector parametric description for R(S).
(e) Is the linear transformation S invertible? Briefly justify your answer.
23.6 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 7.1 and 7.2 of our Algebra Textbook are relevant.
24 Linear transformations, 2 of 6
In this section we focus on linear transformations T : V → W between finite-dimensional
vector spaces V and W and show that any such transformation can be represented by a
matrix. We start by considering linear transformations T : Rn → Rm between Euclidean
vector spaces, in which case a matrix representation for T arises naturally. Examples of
such linear transformations include reflections, rotations and stretches. These are all transformations from Rn to itself. We obtain matrices representing reflections, rotations and
stretches in the case where n = 2. We then extend our discussion to linear transformations
T : V → W between any finite-dimensional vector spaces V and W . In this general case,
a matrix representation for T arises only if a basis B is introduced for the domain V of
T and a basis C is introduced for the codomain W of T . Accordingly, we talk about the
matrix representation of T : V → W with respect to the bases B and C.
24.1 Matrix representation of a linear T : Rn → Rm
We saw in Section 23 that given any m × n matrix A, we can define an associated linear
transformation T : Rn → Rm by T (v) = Av. There is a reverse connection: for every
linear transformation T : Rn → Rm , there is a matrix A such that T (v) = Av. In this
context, we will denote the matrix by AT in order to identify it as the matrix corresponding
to T . This should not be confused with the notation AT for the transpose of a matrix.
The following theorem tells us how to construct AT given any linear transformation T : Rn → Rm.
Theorem 24.1.1 Suppose that T : Rn → Rm is a linear transformation. Let {e1 , e2 , . . . , en }
denote the standard basis of the domain of T , Rn , and let AT be the matrix whose columns
are the vectors T (e1 ), T (e2 ), . . . , T (en ) ∈ Rm : that is,
AT = ( T(e1) T(e2) . . . T(en) ).
Then, for every x ∈ Rn , T (x) = AT x.
Proof Let x = (x1, x2, ..., xn)^T be any vector in Rn. Then

x = (x1, x2, ..., xn)^T = x1 (1, 0, ..., 0)^T + x2 (0, 1, ..., 0)^T + · · · + xn (0, 0, ..., 1)^T = x1 e1 + x2 e2 + · · · + xn en.
Then by the linearity properties of T we have
T (x) = T (x1 e1 + x2 e2 + · · · + xn en )
= T (x1 e1 ) + T (x2 e2 ) + · · · + T (xn en )
= x1 T (e1 ) + x2 T (e2 ) + · · · + xn T (en ).
But this last expression is just a linear combination of the columns of AT = ( T(e1) T(e2) . . . T(en) ), so we have

T(x) = ( T(e1) T(e2) . . . T(en) ) (x1, x2, ..., xn)^T = AT x,
which completes the proof.
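The construction in this proof translates directly into code: apply T to each standard basis vector and use the images as columns. A numpy sketch (an aside; it uses the map of Example 24.1.2 below):

```python
import numpy as np

def T(v):
    """The linear map of Example 24.1.2: T(x, y, z) = (2x + y + z, x - y)."""
    x, y, z = v
    return np.array([2*x + y + z, x - y])

# Columns of A_T are the images of the standard basis vectors
A_T = np.column_stack([T(e) for e in np.eye(3)])
print(A_T)   # [[ 2.  1.  1.]
             #  [ 1. -1.  0.]]

v = np.array([1.0, 2.0, 3.0])
print(np.allclose(A_T @ v, T(v)))   # True: T(x) = A_T x
```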
Example 24.1.2 Let T : R3 → R2 be the linear transformation given by

T((x, y, z)^T) = (2x + y + z, x − y)^T.
To find the matrix AT associated with this linear transformation, we calculate the images
of the standard basis vectors e1, e2 and e3. We have

T(e1) = T((1, 0, 0)^T) = (2, 1)^T,   T(e2) = T((0, 1, 0)^T) = (1, −1)^T,   T(e3) = T((0, 0, 1)^T) = (1, 0)^T.
Hence, the 2 × 3 matrix A = ( T(e1) T(e2) T(e3) ) representing T : R3 → R2 is

A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & −1 & 0 \end{pmatrix}.
Note that, indeed,

A (x, y, z)^T = \begin{pmatrix} 2 & 1 & 1 \\ 1 & −1 & 0 \end{pmatrix} (x, y, z)^T = (2x + y + z, x − y)^T = T((x, y, z)^T).
24.2 Reflections, rotations and stretches in R2
We will consider three types of linear transformations T : R2 → R2 , namely, reflections,
rotations and stretches, and construct the matrices corresponding to these. Since the
matrix of any linear transformation T : Rn → Rm is determined by the way in which T
acts on the standard basis of the domain Rn , all that we have to do is to calculate the
images T(e1) and T(e2) of the standard basis vectors and then construct AT by the formula AT = ( T(e1) T(e2) ). We start with reflections:
Reflections A reflection in the x-axis is depicted below. It leaves the basis vector e1 unchanged and sends the basis vector e2 to −e2. Its effect on a general vector v = (a, b)^T ∈ R2 is also depicted below. Note that in the diagram, we have identified the domain with the codomain of T and regarded them both as a single copy of R2:
Figure 24.2.1
The matrix AT representing T is given by

AT = ( T(e1) T(e2) ) = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.

Then, for any vector v = (a, b)^T ∈ R2, we have

T((a, b)^T) = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} (a, b)^T = (a, −b)^T,

in agreement with the illustration above.
Rotations An anticlockwise rotation by an angle θ, where 0 < θ < π/2, is visualised below:
Figure 24.2.2
The matrix AT representing this rotation can be read directly from the diagram. We have:

AT = ( T(e1) T(e2) ) = \begin{pmatrix} \cos\theta & −\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.

Then, for any vector v = (a, b)^T ∈ R2, we have

T((a, b)^T) = \begin{pmatrix} \cos\theta & −\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} (a, b)^T = (a\cos\theta − b\sin\theta, a\sin\theta + b\cos\theta)^T.
Stretches A stretch by a factor of k ∈ R in the x-direction and a factor of l ∈ R in the
y-direction is depicted below.
Figure 24.2.3
The corresponding matrix AT is thus

AT = ( T(e1) T(e2) ) = \begin{pmatrix} k & 0 \\ 0 & l \end{pmatrix}.
Invertible linear transformations Now, in general, an important relationship between
a linear transformation T : Rn → Rn and the n × n matrix AT representing it is that T is
invertible only if the matrix AT is invertible. This result is stated without proof but is a
consequence of the fact that each T : Rn → Rn uniquely determines an n × n matrix AT ,
and vice versa. Rotations, reflections and stretches by non-zero factors are all invertible
transformations. In particular, the inverse of a reflection in the x-axis is another reflection
in the x-axis, the inverse of an anticlockwise rotation by θ is an anticlockwise rotation by
−θ (i.e., a clockwise rotation by θ) and the inverse of a stretch by a factor of k ≠ 0 in the x-direction and a factor of l ≠ 0 in the y-direction is a stretch by a factor of 1/k in the
x-direction and a factor of 1/l in the y-direction.
If T is a linear transformation from Rn to Rn and T −1 exists, then
A_{T⁻¹} A_T = A_T A_{T⁻¹} = I.
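For instance, one can check numerically that the rotation matrices satisfy this relation, with the rotation by −θ acting as the inverse. A numpy sketch (an aside):

```python
import numpy as np

def rotation(theta):
    """Matrix of an anticlockwise rotation of R^2 by theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.7                               # any angle
R, R_inv = rotation(theta), rotation(-theta)
print(np.allclose(R_inv @ R, np.eye(2)))  # True
print(np.allclose(R @ R_inv, np.eye(2)))  # True
```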
24.3 The matrix A_T^{B→C}
In this subsection, we find a matrix representation for a linear transformation
T : V → W from a finite-dimensional vector space V to a finite-dimensional vector space
W . The spaces V and W may not be Euclidean spaces. Provided that a basis B for the
domain V of T and a basis C for the codomain W of T are introduced, the elements v ∈ V and T(v) ∈ W can be represented by coordinate vectors (v)B and (T(v))C with respect to B and C, respectively. The resulting matrix representing T : V → W with respect to B and C is denoted by A_T^{B→C}.
We begin with the following theorem:
Theorem 24.3.1 Let V be a finite-dimensional vector space and let T be a linear transformation from V to a vector space W . Then T is completely determined by how it operates
on a basis of V .
Proof Let dim(V ) = n, and let B = {v1 , v2 , . . . , vn } be a basis of V . Then any v ∈ V
can be uniquely expressed as a linear combination of these basis vectors: v = α1 v1 +α2 v2 +
· · · + αn vn . So, by the linearity of T ,
T (v) = T (α1 v1 + α2 v2 + · · · + αn vn )
= α1 T (v1 ) + α2 T (v2 ) + · · · + αn T (vn ).
That is, if v ∈ V is expressed as a linear combination of the basis vectors, then the image
T (v) is the same linear combination of the images of the basis vectors. Therefore, if we
know how T operates on the basis vectors, we know how T operates on all v ∈ V .
In the particular case where both $V$ and $W$ are finite-dimensional vector spaces, and provided that a basis $B$ for $V$ and a basis $C$ for $W$ have been introduced, this result allows us to find a matrix representation for the linear transformation $T$. To be more specific, let $\dim(V) = n$, $\dim(W) = m$ and let the corresponding bases be given by $B = \{v_1, v_2, \ldots, v_n\}$ and $C = \{w_1, w_2, \ldots, w_m\}$. Furthermore, let $(v)_B$ be the coordinate vector of $v \in V$ with respect to the $B$ basis and let $\big(T(v)\big)_C$ be the coordinate vector of the image of $v$ with respect to the $C$ basis. Then, by working with these coordinate vectors (rather than with the vectors themselves), we can find a matrix such that
\[
\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B.
\]
The following theorem tells us how. Its proof is omitted, but it is analogous to that of Theorem 24.1.1.

Theorem 24.3.2 Let $T : V \to W$ be a linear transformation from an $n$-dimensional vector space $V$ to an $m$-dimensional vector space $W$. Let $B = \{v_1, v_2, \ldots, v_n\}$ denote a basis for the domain $V$ and $C = \{w_1, w_2, \ldots, w_m\}$ denote a basis for the codomain $W$. Furthermore, let $A_T^{B\to C}$ be the $m \times n$ matrix whose columns are the coordinate vectors $\big(T(v_1)\big)_C, \big(T(v_2)\big)_C, \ldots, \big(T(v_n)\big)_C$ of the images of the $B$-basis vectors with respect to the $C$ basis; that is,
\[
A_T^{B\to C} = \Big(\big(T(v_1)\big)_C\ \big(T(v_2)\big)_C\ \ldots\ \big(T(v_n)\big)_C\Big).
\]
Then, for every $v \in V$, $\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B$.

Note that if $V$ is $\mathbb{R}^n$, $W$ is $\mathbb{R}^m$ and $B$ and $C$ are the standard bases for $V$ and $W$, then $A_T^{B\to C}$ becomes the matrix $A_T$ introduced in subsection 24.1; that is,
\[
A_T^{B\to C} = \Big(\big(T(v_1)\big)_C\ \big(T(v_2)\big)_C\ \ldots\ \big(T(v_n)\big)_C\Big) = \big(T(e_1)\ T(e_2)\ \ldots\ T(e_n)\big) = A_T.
\]
Below, we illustrate Theorem 24.3.2 by means of an example where $T$ is a linear transformation between Euclidean spaces, $B$ is the standard basis for the domain of $T$, and $C$ is a non-standard basis for the codomain of $T$.
Example 24.3.3 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + 2y + z\\ x - z\end{pmatrix}.
\]
Let $B$ be the standard basis for the domain $\mathbb{R}^3$ of $T$,
\[
B = \{e_1, e_2, e_3\} = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\},
\]
and let $C$ be the following basis for the codomain $\mathbb{R}^2$ of $T$:
\[
C = \{w_1, w_2\} = \left\{\begin{pmatrix}1\\4\end{pmatrix}, \begin{pmatrix}2\\3\end{pmatrix}\right\}.
\]
Find the matrix $A_T^{B\to C}$ representing $T$ with respect to the $B$ and $C$ bases.
Following Theorem 24.3.2, we need to find the coordinate vectors $\big(T(e_1)\big)_C$, $\big(T(e_2)\big)_C$ and $\big(T(e_3)\big)_C$ of the images of the $B$-basis vectors with respect to the $C$ basis. Using the definition of $T$ we get
\[
T(e_1) = T\!\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}, \qquad
T(e_2) = T\!\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix}, \qquad
T(e_3) = T\!\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
where the images $\begin{pmatrix}1\\1\end{pmatrix}$, $\begin{pmatrix}2\\0\end{pmatrix}$, $\begin{pmatrix}1\\-1\end{pmatrix}$ are elements of $\mathbb{R}^2$. In order to obtain the coordinates $\big(T(e_i)\big)_C$ of each image vector $T(e_i)$ with respect to the $C$ basis of $\mathbb{R}^2$, we need to express each $T(e_i)$ as a linear combination of the $C$-basis vectors $w_1, w_2$. In other words, for each $T(e_i)$, we need to find scalars $\alpha, \beta$ such that
\[
\alpha w_1 + \beta w_2 = T(e_i).
\]
Given each $T(e_i)$, this equation is equivalent to solving the linear system
\[
D\mathbf{x} = T(e_i),
\]
where $D = (w_1\ w_2)$ is the matrix whose columns are the vectors $w_1, w_2$, and $\mathbf{x} = \begin{pmatrix}\alpha\\\beta\end{pmatrix}$ is the vector of the unknowns. A fast way of solving this system is to invert $D$ and then use the relation $\mathbf{x} = D^{-1}T(e_i)$. We have:
\[
D = \begin{pmatrix}1 & 2\\ 4 & 3\end{pmatrix}, \quad \text{so} \quad D^{-1} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}.
\]
Hence, for $T(e_1)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-1/5\\ 3/5\end{pmatrix},
\]
for $T(e_2)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}2\\0\end{pmatrix} = \begin{pmatrix}-6/5\\ 8/5\end{pmatrix},
\]
for $T(e_3)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}-1\\ 1\end{pmatrix}.
\]
We conclude that
\[
\big(T(e_1)\big)_C = \begin{pmatrix}-1/5\\ 3/5\end{pmatrix}_C, \qquad
\big(T(e_2)\big)_C = \begin{pmatrix}-6/5\\ 8/5\end{pmatrix}_C, \qquad
\big(T(e_3)\big)_C = \begin{pmatrix}-1\\ 1\end{pmatrix}_C,
\]
corresponding to the equations
\[
T(e_1) = -\tfrac{1}{5}w_1 + \tfrac{3}{5}w_2, \qquad
T(e_2) = -\tfrac{6}{5}w_1 + \tfrac{8}{5}w_2, \qquad
T(e_3) = -w_1 + w_2.
\]
It follows that the $2 \times 3$ matrix $A_T^{B\to C}$ that represents $T : \mathbb{R}^3 \to \mathbb{R}^2$ with respect to the bases $B$ and $C$ is
\[
A_T^{B\to C} = \begin{pmatrix}-1/5 & -6/5 & -1\\ 3/5 & 8/5 & 1\end{pmatrix}.
\]
For any $v \in \mathbb{R}^3$, we have that
\[
\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B.
\]
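The column-by-column computation above is easy to reproduce numerically. The sketch below (not part of the original notes; the function and variable names are ours) uses the data of Example 24.3.3, solving $D\mathbf{x} = T(e_i)$ for each standard basis vector and assembling $A_T^{B\to C}$:

import numpy as np

def T(v):
    x, y, z = v
    return np.array([x + 2*y + z, x - z])

D = np.array([[1.0, 2.0],        # columns are the C-basis vectors w1, w2
              [4.0, 3.0]])

E = np.eye(3)
cols = [np.linalg.solve(D, T(E[:, j])) for j in range(3)]
A_BC = np.column_stack(cols)
print(A_BC)    # [[-0.2 -1.2 -1. ], [ 0.6  1.6  1. ]], i.e. [[-1/5, -6/5, -1], [3/5, 8/5, 1]]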
Note that the matrix $A_T$ representing the same transformation $T$ with respect to the standard basis of $\mathbb{R}^3$ and the standard basis of $\mathbb{R}^2$ is $A_T = \begin{pmatrix}1 & 2 & 1\\ 1 & 0 & -1\end{pmatrix}$. Indeed, since the coordinates $(v)$ of a vector $v \in \mathbb{R}^n$ with respect to the standard basis of $\mathbb{R}^n$ coincide with the entries of $v$, i.e. $(v) = v$, we can read the matrix $A_T$ directly from the definition of $T$:
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1 & 2 & 1\\ 1 & 0 & -1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + 2y + z\\ x - z\end{pmatrix}.
\]
We complete this topic by generalising the result established in subsection 23.3 that for any linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ between Euclidean vector spaces defined by $T(x) = Ax$, we have
\[
\ker(T) = N(A_T), \qquad R(T) = CS(A_T).
\]
The generalisation is straightforward but is stated without proof.

Theorem 24.3.4 Let $T : V \to W$ be a linear transformation between finite-dimensional vector spaces and let $B$ and $C$ be bases for the domain $V$ and the codomain $W$ of $T$, respectively. Then
\[
\ker(T) = N(A_T^{B\to C}), \qquad R(T) = CS(A_T^{B\to C}),
\]
where the coordinates of each vector in $N(A_T^{B\to C})$ refer to the $B$ basis of $V$ and the coordinates of each vector in $CS(A_T^{B\to C})$ refer to the $C$ basis of $W$.
24.4 Exercises for self study

Exercise 24.4.1 $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^2$ are linear transformations defined by
\[
T\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-x\\-y\end{pmatrix}
\qquad \text{and} \qquad
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-y\\x\end{pmatrix}.
\]
(a) Sketch the effects of $T$ and $S$ on the standard basis of $\mathbb{R}^2$ and hence describe $T$ and $S$ in words.
(b) Find the matrices $A_T$ and $A_S$ representing $T$ and $S$; that is, find $A_T^{B\to C}$ and $A_S^{B\to C}$ where $B$ and $C$ are both the standard basis of $\mathbb{R}^2$.
(c) Describe in words the linear transformation $S^2T$. Then check your answer by multiplying the corresponding matrices.
Exercise 24.4.2 $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^2$ are linear transformations with respective matrices
\[
A_T = \begin{pmatrix}\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\end{pmatrix}
\qquad \text{and} \qquad
A_S = \begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}.
\]
(a) Describe $T$ and $S$ in words.
(b) Illustrate $ST$ and $TS$ by considering their effects on the standard basis of $\mathbb{R}^2$ and show that $ST \neq TS$.
Exercise 24.4.3 (a) Find the matrix $A_T$ representing the reflection $T : \mathbb{R}^2 \to \mathbb{R}^2$ in the line $y = x$ by considering the effect of $T$ on the standard basis of $\mathbb{R}^2$.
(b) Explain why we should expect that $A_T^2 = I$ and then verify this directly.

Exercise 24.4.4 Let $V$ be the vector space of all functions $f : \mathbb{R} \to \mathbb{R}$ of the form $f(x) = a + bx + cx^2$, where vector addition and scalar multiplication are defined in the standard way:
\[
(f + g)(x) := f(x) + g(x), \qquad (\alpha f)(x) := \alpha f(x).
\]
Consider the transformation $T : V \to V$ defined by differentiation, i.e.
\[
T(f) = f'.
\]
(a) Show that the transformation $T$ is well-defined; that is, show that the image vector $T(f)$ is indeed an element of $V$.
(b) Show that $T$ is a linear transformation.
You are given the basis $B = \{f_1, f_2, f_3\}$ for $V$, where $f_1(x) = 1 + x + x^2$, $f_2(x) = 3 + 2x$ and $f_3(x) = 4x + 5x^2$. You are also given the basis $C = \{g_1, g_2, g_3\}$ for $V$, where $g_1(x) = 1$, $g_2(x) = x$ and $g_3(x) = x^2$.
(c) Find the matrix $A_T^{B\to C}$ representing $T$ with respect to the bases $B$ and $C$.
(d) Find the null space $N(A_T^{B\to C})$ and the column space $CS(A_T^{B\to C})$.
(e) Using your answers to part (d), find the kernel and the range of $T$.
24.5
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 7.1 and 7.2 of our Algebra Textbook are relevant.
25 Linear transformations, 3 of 6

25.1 Change of basis and transition matrix
Let us consider the Euclidean space $\mathbb{R}^n$. Suppose that the vectors $v_1, v_2, \ldots, v_n$ form a basis $B$ for $\mathbb{R}^n$. Then, as we have seen, any $x \in \mathbb{R}^n$ can be written in exactly one way as a linear combination
\[
x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n
\]
of the vectors in the basis $B$. The vector
\[
(x)_B = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}_B
\]
is the coordinate vector of $x$ with respect to $B = \{v_1, v_2, \ldots, v_n\}$. Note that the subscript $B$ may be omitted from the right hand side of the above equation as long as it is clear that the coordinates $\alpha_1, \alpha_2, \ldots, \alpha_n$ refer to the basis $B$. In other words, we can also write $(x)_B = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}$. In the particular case where $B$ is the standard basis $\{e_1, e_2, \ldots, e_n\}$ for $\mathbb{R}^n$, the coordinate vector $(x)$ coincides with $x$ itself. This is because if $x = \begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix} \in \mathbb{R}^n$, then
\[
x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n, \quad \text{hence} \quad (x) = \begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}.
\]
In practice, in order to find the coordinates of a given vector $x$ with respect to a basis $B = \{v_1, v_2, \ldots, v_n\}$, we just need to solve the system of linear equations
\[
\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = x.
\]
This system can be expressed in the form
\[
(v_1\ v_2\ \ldots\ v_n)\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix} = x,
\]
where $(v_1\ v_2\ \ldots\ v_n)$ is the matrix whose columns are the vectors in the basis $B$. Denoting this matrix by $P_B$,
\[
P_B = (v_1\ v_2\ \ldots\ v_n),
\]
and using the fact that
\[
\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix} = (x)_B \qquad \text{and} \qquad x = (x),
\]
the above equation becomes
\[
P_B (x)_B = (x).
\]
The matrix $P_B$ links the coordinates $(x)_B$ of $x$ with respect to the $B$-basis to the coordinates $(x)$ of $x$ with respect to the standard basis. It is for this reason that $P_B$ is called the transition matrix from $B$-coordinates to standard coordinates. Note that the $n \times n$ matrix $P_B$ is invertible because its columns form a basis for $\mathbb{R}^n$, which means that the rank of $P_B$ is $n$. So we can also write
\[
(x)_B = P_B^{-1}(x).
\]
The matrix $P_B^{-1}$ is then the transition matrix from standard coordinates to $B$-coordinates.
Example 25.1.1 Let $B$ be the following set of vectors in $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\2\\-1\end{pmatrix}, \begin{pmatrix}2\\-1\\4\end{pmatrix}, \begin{pmatrix}3\\2\\1\end{pmatrix}\right\};
\]
that is,
\[
f_1 = e_1 + 2e_2 - e_3, \qquad f_2 = 2e_1 - e_2 + 4e_3, \qquad f_3 = 3e_1 + 2e_2 + e_3.
\]
(a) Show that $B$ is a basis for $\mathbb{R}^3$.
(b) Consider the vector $v \in \mathbb{R}^3$ whose coordinate vector with respect to the $B$-basis is $\begin{pmatrix}4\\1\\-5\end{pmatrix}_B$. Find the standard coordinate vector $(v)$ of $v$.
Regarding part (a), to show that $B$ is a basis, we can form the matrix $(f_1\ f_2\ f_3)$,
\[
\begin{pmatrix}1 & 2 & 3\\ 2 & -1 & 2\\ -1 & 4 & 1\end{pmatrix},
\]
and evaluate its determinant. We find that this is equal to $4 \neq 0$, so $B$ is a basis for $\mathbb{R}^3$; i.e. $(f_1\ f_2\ f_3)$ has full row rank and full column rank. In particular, having shown that $B$ is a basis, we have $(f_1\ f_2\ f_3) = P_B$; namely, the transition matrix from $B$-coordinates to standard coordinates.

Regarding part (b), since
\[
(v)_B = \begin{pmatrix}4\\1\\-5\end{pmatrix}_B,
\]
we have
\[
v = 4f_1 + f_2 - 5f_3.
\]
We can find the standard coordinates $(v)$ either by expressing $v$ as a linear combination of the standard basis vectors $e_1, e_2, e_3$, according to
\[
v = 4f_1 + f_2 - 5f_3 = 4(e_1 + 2e_2 - e_3) + (2e_1 - e_2 + 4e_3) - 5(3e_1 + 2e_2 + e_3) = -9e_1 - 3e_2 - 5e_3,
\]
that is, by using
\[
(v) = v = 4\begin{pmatrix}1\\2\\-1\end{pmatrix} + \begin{pmatrix}2\\-1\\4\end{pmatrix} - 5\begin{pmatrix}3\\2\\1\end{pmatrix} = \begin{pmatrix}-9\\-3\\-5\end{pmatrix},
\]
or, faster, by applying the formula derived previously:
\[
(v) = P_B(v)_B = \begin{pmatrix}1 & 2 & 3\\ 2 & -1 & 2\\ -1 & 4 & 1\end{pmatrix}\begin{pmatrix}4\\1\\-5\end{pmatrix}_B = \begin{pmatrix}-9\\-3\\-5\end{pmatrix}.
\]


Example 25.1.2 Given the vector $w = \begin{pmatrix}5\\7\\-3\end{pmatrix}$, find the coordinate vector $(w)_B$, where the basis $B$ is given in Example 25.1.1.

To find the $B$-coordinates of $w$, we can either solve the equation
\[
\begin{pmatrix}5\\7\\-3\end{pmatrix} = \alpha_1\begin{pmatrix}1\\2\\-1\end{pmatrix} + \alpha_2\begin{pmatrix}2\\-1\\4\end{pmatrix} + \alpha_3\begin{pmatrix}3\\2\\1\end{pmatrix}
\]
and identify $(w)_B$ with the solution $\begin{pmatrix}\alpha_1\\\alpha_2\\\alpha_3\end{pmatrix}_B$ of this equation, or we can use the transition matrix $P_B^{-1}$ from standard coordinates to $B$-coordinates; i.e.,
\[
(w)_B = P_B^{-1}(w).
\]
Omitting the steps, we find that
\[
(w)_B = P_B^{-1}(w) = \begin{pmatrix}1\\-1\\2\end{pmatrix}_B.
\]
This implies that
\[
w = f_1 - f_2 + 2f_3,
\]
which is verified below:
\[
(w) = w = f_1 - f_2 + 2f_3 = \begin{pmatrix}1\\2\\-1\end{pmatrix} - \begin{pmatrix}2\\-1\\4\end{pmatrix} + 2\begin{pmatrix}3\\2\\1\end{pmatrix} = \begin{pmatrix}5\\7\\-3\end{pmatrix}.
\]
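Both examples can be checked with a few lines of NumPy (a sketch using the numbers above; it is not part of the original notes): $P_B$ maps $B$-coordinates to standard coordinates, and solving with $P_B$ maps back.

import numpy as np

P_B = np.array([[ 1,  2, 3],     # columns are f1, f2, f3
                [ 2, -1, 2],
                [-1,  4, 1]], dtype=float)

v_B = np.array([4, 1, -5], dtype=float)
print(P_B @ v_B)                 # [-9. -3. -5.], as in Example 25.1.1

w = np.array([5, 7, -3], dtype=float)
print(np.linalg.solve(P_B, w))   # [ 1. -1.  2.], as in Example 25.1.2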
25.2 The transition matrix $P_{B\to B'}$

More generally, suppose that we are given a basis $B$ of $\mathbb{R}^n$, another basis $B'$ of $\mathbb{R}^n$, and the coordinates $(v)_B$ of a vector $v \in \mathbb{R}^n$. Then, the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates, and hence the coordinates $(v)_{B'}$ of $v$, can be calculated as follows:

First, we change from $B$-coordinates to standard coordinates using $(v) = P_B(v)_B$ and then change from standard coordinates to $B'$-coordinates using $(v)_{B'} = P_{B'}^{-1}(v)$. The combined effect on the initial coordinate vector $(v)_B$ is
\[
(v)_{B'} = P_{B'}^{-1}P_B\,(v)_B,
\]
which implies that the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates is
\[
P_{B\to B'} = P_{B'}^{-1}P_B.
\]
An alternative method for calculating $P_{B\to B'}$ is provided by the following theorem. The theorem is stated without proof, but you are asked to derive it in Exercise 25.3.3.

Theorem 25.2.1 Let $B$ and $B'$ be two bases of $\mathbb{R}^n$, where the first basis is $B = \{v_1, v_2, \ldots, v_n\}$. Then the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates is given by
\[
P_{B\to B'} = \big((v_1)_{B'}\ (v_2)_{B'}\ \ldots\ (v_n)_{B'}\big),
\]
where the columns of the matrix $P_{B\to B'}$ consist of the coordinates of the $B$-basis vectors with respect to the $B'$ basis.

Note that in the particular case when $B'$ is the standard basis for $\mathbb{R}^n$, both ways of deriving $P_{B\to B'}$ result in the transition matrix $P_B$ from $B$-coordinates to standard coordinates, as expected. In the former case, we get
\[
P_{B\to B'} = P_{B'}^{-1}P_B = I^{-1}P_B = IP_B = P_B,
\]
and in the latter case, we get
\[
P_{B\to B'} = \big((v_1)_{B'}\ (v_2)_{B'}\ \ldots\ (v_n)_{B'}\big) = \big((v_1)\ (v_2)\ \ldots\ (v_n)\big) = P_B.
\]
Also note that the second method for calculating $P_{B\to B'}$ is directly applicable to any finite-dimensional vector space $V$, where the concept of a 'standard' basis may not be available. In contrast, the first method - that is, using $P_{B\to B'} = P_{B'}^{-1}P_B$ - presupposes the existence of a standard basis for $V$, so it is not applicable unless we nominate a basis for $V$ to play the role of the 'standard' basis. The following example illustrates this point.
Example 25.2.2 Consider the set
\[
V = \{f : \mathbb{R} \to \mathbb{R} \mid f(x) = a + bx \text{ for some } a, b \in \mathbb{R}\},
\]
which is a vector space under the standard operations of pointwise addition and scalar multiplication of functions. The following sets $B$, $C$ and $D$ are all bases for $V$:
\[
B = \{f_1, f_2\} \quad \text{where } f_1(x) = 1,\ f_2(x) = x,
\]
\[
C = \{g_1, g_2\} \quad \text{where } g_1(x) = 2,\ g_2(x) = 1 + x,
\]
\[
D = \{h_1, h_2\} \quad \text{where } h_1(x) = 2 + x,\ h_2(x) = 1 + 2x.
\]
Now consider an element $f \in V$ whose coordinate vector with respect to the $C$ basis is $(f)_C = \begin{pmatrix}3\\5\end{pmatrix}_C$. Find $(f)_D$.
Let us first note that we can solve this question from first principles, without using any transition matrices: The statement $(f)_C = \begin{pmatrix}3\\5\end{pmatrix}_C$ implies that $f = 3g_1 + 5g_2$, which amounts to the statement that for all $x \in \mathbb{R}$,
\[
f(x) = 3g_1(x) + 5g_2(x) = 3(2) + 5(1 + x) = 11 + 5x.
\]
In order to find the coordinates $(f)_D$ of $f$ we just need to express $f$ as a linear combination of the $D$-basis vectors. In other words, we need to find scalars $\alpha_1, \alpha_2$ such that for all $x \in \mathbb{R}$,
\[
11 + 5x = \alpha_1(2 + x) + \alpha_2(1 + 2x) = (2\alpha_1 + \alpha_2) + x(\alpha_1 + 2\alpha_2).
\]
Since this equation must hold for all $x \in \mathbb{R}$, it must be satisfied identically in $x$, which implies that
\[
2\alpha_1 + \alpha_2 = 11, \qquad \alpha_1 + 2\alpha_2 = 5.
\]
Solving this simultaneous system, we find $\alpha_1 = \tfrac{17}{3}$ and $\alpha_2 = -\tfrac{1}{3}$, which gives
\[
(f)_D = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D.
\]
The problem with this approach is that it is not systematic. If we were given another element of $V$ and asked the same question, we would need to start over.
A systematic approach amounts to obtaining $(f)_D$ by using the transition matrix $P_{C\to D}$ from $C$-coordinates to $D$-coordinates. Of course, here, we realise that there is no 'standard' basis for $V$ unless we nominate one of the bases $B$, $C$ or $D$ to play that role. So, let us start with the second method for calculating $P_{C\to D}$, since this method does not presuppose the presence of a standard basis for $V$.

Using the result that $P_{C\to D} = \big((g_1)_D\ (g_2)_D\big)$, all that we need to do is to express the $C$-basis vectors $g_1, g_2$ as linear combinations of the $D$-basis vectors $h_1, h_2$. Starting from $g_1$, let $g_1 = a_1 h_1 + a_2 h_2$. Then, for all $x \in \mathbb{R}$, we need to satisfy
\[
g_1(x) = a_1 h_1(x) + a_2 h_2(x),
\]
i.e.,
\[
2 = a_1(2 + x) + a_2(1 + 2x).
\]
Since this equation must hold identically in $x$, we obtain
\[
2 = 2a_1 + a_2, \qquad 0 = a_1 + 2a_2,
\]
whose solution is $a_1 = \tfrac{4}{3}$, $a_2 = -\tfrac{2}{3}$. Hence $(g_1)_D = \begin{pmatrix}4/3\\ -2/3\end{pmatrix}_D$.

Similarly, for $g_2$, let $g_2 = b_1 h_1 + b_2 h_2$. Then, for all $x \in \mathbb{R}$, we need to satisfy
\[
g_2(x) = b_1 h_1(x) + b_2 h_2(x),
\]
i.e.
\[
1 + x = b_1(2 + x) + b_2(1 + 2x).
\]
By a similar argument as above, we obtain the simultaneous system
\[
1 = 2b_1 + b_2, \qquad 1 = b_1 + 2b_2,
\]
whose solution is $b_1 = \tfrac{1}{3}$, $b_2 = \tfrac{1}{3}$. Hence $(g_2)_D = \begin{pmatrix}1/3\\ 1/3\end{pmatrix}_D$.
Hence
\[
P_{C\to D} = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix},
\]
and
\[
(f)_D = P_{C\to D}(f)_C = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix}\begin{pmatrix}3\\5\end{pmatrix}_C = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D.
\]
We have thus recovered our previous answer.
For completeness, let us calculate $P_{C\to D}$ by the first method. As already discussed, this presupposes that we nominate, say, $B$ as the 'standard' basis for $V$. We then need to find transition matrices $P_C$ from $C$-coordinates to 'standard' coordinates and $P_D$ from $D$-coordinates to 'standard' coordinates, and finally apply the formula $P_{C\to D} = P_D^{-1}P_C$.

Nominating $B$ as the 'standard' basis and expressing everything in $B$-coordinates, we have, by inspection:
\[
B = \left\{\begin{pmatrix}1\\0\end{pmatrix}_B, \begin{pmatrix}0\\1\end{pmatrix}_B\right\}, \qquad
C = \left\{\begin{pmatrix}2\\0\end{pmatrix}_B, \begin{pmatrix}1\\1\end{pmatrix}_B\right\}, \qquad
D = \left\{\begin{pmatrix}2\\1\end{pmatrix}_B, \begin{pmatrix}1\\2\end{pmatrix}_B\right\}.
\]
We can even drop the subscript $B$ since we have decided to treat $B$ as the 'standard basis':
\[
B = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}, \qquad
C = \left\{\begin{pmatrix}2\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}, \qquad
D = \left\{\begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix}\right\}.
\]
Exactly as for Euclidean spaces, the above expressions imply that
\[
P_C = \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} \qquad \text{and} \qquad P_D = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}.
\]
Hence,
\[
P_D^{-1} = \frac{1}{3}\begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}
\]
and
\[
P_{C\to D} = P_D^{-1}P_C = \frac{1}{3}\begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}\begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix}.
\]
The conclusion that $(f)_D = P_{C\to D}(f)_C = P_{C\to D}\begin{pmatrix}3\\5\end{pmatrix}_C = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D$ follows once more.
25.3 Exercises for self study
Exercise 25.3.1 (a) Show that the following sets $B$ and $C$ are bases for $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\3\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}
\]
and
\[
C = \{g_1, g_2, g_3\} = \left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\-1\end{pmatrix}\right\}.
\]
(b) Given $(v) = \begin{pmatrix}3\\-1\\2\end{pmatrix}$, find $(v)_B$.
(c) Given $(w)_C = \begin{pmatrix}2\\1\\3\end{pmatrix}_C$, find $(w)$ and $(w)_B$.
Exercise 25.3.2 Consider the basis $B$ for $\mathbb{R}^3$ given in Exercise 25.3.1.
(a) Write down each $B$-basis vector $f_i$ as a linear combination of the standard basis vectors $e_1, e_2, e_3$ of $\mathbb{R}^3$.
(b) Using any method of your choice, invert the system in part (a); that is, express each standard basis vector $e_i$ as a linear combination of the $B$-basis vectors $f_1, f_2, f_3$.
(c) Given the vector $(v) = \begin{pmatrix}3\\-1\\2\end{pmatrix}$, use your answer to part (b) to express $v$ as a linear combination of the $B$-basis vectors. Verify that your answer agrees with Exercise 25.3.1, part (b).
Exercise 25.3.3 Consider two arbitrary bases $B$ and $C$ for $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} \qquad \text{and} \qquad C = \{g_1, g_2, g_3\}.
\]
The transition matrix $P_{B\to C}$ from $B$-coordinates to $C$-coordinates is defined by the property that
\[
\forall v \in \mathbb{R}^3, \quad P_{B\to C}(v)_B = (v)_C,
\]
where $(v)_B$ and $(v)_C$ are the coordinates of $v \in \mathbb{R}^3$ with respect to the $B$ and $C$ bases, respectively. By choosing $v = f_1$, $v = f_2$ and $v = f_3$ in the above relation, prove the result given in the lectures that $P_{B\to C} = \big((f_1)_C\ (f_2)_C\ (f_3)_C\big)$.
Exercise 25.3.4 Consider an anticlockwise rotation $T : \mathbb{R}^2 \to \mathbb{R}^2$ by an angle $\theta = -\frac{\pi}{6}$.
(a) Write down the matrix $A_T$ of the linear transformation which accomplishes this rotation.
(b) Write down the images $T(e_1)$ and $T(e_2)$ of the standard basis vectors $e_1, e_2$.
Now consider the basis $B$ of $\mathbb{R}^2$ given by $B = \{f_1, f_2\}$ where $f_1 = T(e_1)$, $f_2 = T(e_2)$.
(c) Write down the transition matrix $P_B$ from $B$-coordinates to standard coordinates and verify that, numerically, $P_B = A_T$.
(d) Given any vector $x \in \mathbb{R}^2$, let its standard coordinates be denoted by $(x) = \begin{pmatrix}x\\y\end{pmatrix}$ and its $B$-coordinates be denoted by $(x)_B = \begin{pmatrix}X\\Y\end{pmatrix}_B$. Now, a curve $\mathcal{C} \subset \mathbb{R}^2$ is described in standard coordinates $(x, y)$ by the Cartesian equation
\[
3x^2 + 2\sqrt{3}\,xy + 5y^2 = 6.
\]
Find the Cartesian equation of this curve in the new $B$-coordinates $(X, Y)$.
(e) Hence sketch this curve.
25.4
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 7.3 of our Algebra Textbook is relevant.
26 Linear transformations, 4 of 6

26.1 Change of basis and linear transformations
Given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, we have seen that there is a corresponding matrix $A_T$, namely
\[
A_T = \big(T(e_1)\ T(e_2)\ \ldots\ T(e_n)\big),
\]
such that $T(x) = A_Tx$ for all $x \in \mathbb{R}^n$. More generally, given a basis
\[
B = \{f_1, f_2, \ldots, f_n\}
\]
for the domain $\mathbb{R}^n$ of $T$ and a basis
\[
C = \{g_1, g_2, \ldots, g_m\}
\]
for the codomain $\mathbb{R}^m$ of $T$, we have also seen that there is a matrix $A_T^{B\to C}$, namely
\[
A_T^{B\to C} = \Big(\big(T(f_1)\big)_C\ \big(T(f_2)\big)_C\ \ldots\ \big(T(f_n)\big)_C\Big),
\]
such that $\big(T(x)\big)_C = A_T^{B\to C}(x)_B$ for all $x \in \mathbb{R}^n$.

As expected, there is a relation between the matrices $A_T^{B\to C}$ and $A_T$ which involves the transition matrices $P_B$ and $P_C$ that accomplish the corresponding coordinate changes in $\mathbb{R}^n$ and $\mathbb{R}^m$. Starting from the fact that, $\forall x \in \mathbb{R}^n$,
\[
T(x) = A_T(x)
\]
and using the relationships $T(x) = P_C\big(T(x)\big)_C$ and $(x) = P_B(x)_B$, we find that, $\forall x \in \mathbb{R}^n$,
\[
P_C\big(T(x)\big)_C = A_TP_B(x)_B.
\]
Multiplying this equation by $P_C^{-1}$ on the left yields
\[
\big(T(x)\big)_C = P_C^{-1}A_TP_B(x)_B.
\]
Hence, since we also have that $\big(T(x)\big)_C = A_T^{B\to C}(x)_B$, we obtain the following relationship between the matrices $A_T$ and $A_T^{B\to C}$:
\[
A_T^{B\to C} = P_C^{-1}A_TP_B.
\]
Example 26.1.1 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ given by
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + y + z\\ 2x - z\end{pmatrix}.
\]
Let $B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ be a basis for the domain of $T$ and $C = \{g_1, g_2\} = \left\{\begin{pmatrix}1\\2\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix}\right\}$ be a basis for the codomain of $T$. Calculate the matrices $A_T$, $P_B$, $P_C$ and $A_T^{B\to C}$.
We have
\[
T(e_1) = T\!\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad
T(e_2) = T\!\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad
T(e_3) = T\!\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
so
\[
A_T = \big(T(e_1)\ T(e_2)\ T(e_3)\big) = \begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}.
\]
We also have
\[
P_B = \begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{pmatrix} \qquad \text{and} \qquad P_C = \begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix},
\]
hence
\[
A_T^{B\to C} = P_C^{-1}A_TP_B = -\frac{1}{3}\begin{pmatrix}1 & -2\\ -2 & 1\end{pmatrix}\begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{pmatrix} = \begin{pmatrix}-1/3 & -4/3 & -1\\ 5/3 & 5/3 & 1\end{pmatrix}.
\]
Example 26.1.2 Use the information from Example 26.1.1 to verify that $A_T^{B\to C}$ and $A_T$ represent the same transformation $T$.

To show the equivalence of these two representations of $T$, let us compare the effect of $A_T^{B\to C}$ on the basis $B$ of $\mathbb{R}^n$ with the effect of $A_T$ on the standard basis of $\mathbb{R}^n$.

Starting from $A_T^{B\to C}$, this matrix tells us that
\[
A_T^{B\to C}\begin{pmatrix}1\\0\\0\end{pmatrix}_B = \begin{pmatrix}-1/3\\ 5/3\end{pmatrix}_C, \qquad
A_T^{B\to C}\begin{pmatrix}0\\1\\0\end{pmatrix}_B = \begin{pmatrix}-4/3\\ 5/3\end{pmatrix}_C, \qquad
A_T^{B\to C}\begin{pmatrix}0\\0\\1\end{pmatrix}_B = \begin{pmatrix}-1\\ 1\end{pmatrix}_C,
\]
which amounts to the relations
\[
T(f_1) = -\tfrac{1}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_2) = -\tfrac{4}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_3) = -g_1 + g_2.
\]
Alternatively, one can read these relations directly from the matrix
\[
A_T^{B\to C} = \Big(\big(T(f_1)\big)_C\ \big(T(f_2)\big)_C\ \big(T(f_3)\big)_C\Big) = \begin{pmatrix}-1/3 & -4/3 & -1\\ 5/3 & 5/3 & 1\end{pmatrix}.
\]
Similarly, the matrix $A_T = \begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}$ tells us that
\[
A_T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad
A_T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad
A_T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
which amount to the relations
\[
T(e_1) = \bar{e}_1 + 2\bar{e}_2, \qquad T(e_2) = \bar{e}_1, \qquad T(e_3) = \bar{e}_1 - \bar{e}_2.
\]
The bars are used over the standard-basis vectors $\bar{e}_1, \bar{e}_2$ of the codomain $\mathbb{R}^2$ in order to distinguish these vectors from the standard basis vectors $e_1, e_2$ and $e_3$ of the domain $\mathbb{R}^3$. Now, using the linearity of $T$ and the fact that $f_1 = e_1 + e_2 + e_3$, $f_2 = e_2 + e_3$ and $f_3 = e_3$, we deduce that
\[
T(f_1) = T(e_1) + T(e_2) + T(e_3) = (\bar{e}_1 + 2\bar{e}_2) + (\bar{e}_1) + (\bar{e}_1 - \bar{e}_2) = 3\bar{e}_1 + \bar{e}_2,
\]
\[
T(f_2) = T(e_2) + T(e_3) = (\bar{e}_1) + (\bar{e}_1 - \bar{e}_2) = 2\bar{e}_1 - \bar{e}_2,
\]
\[
T(f_3) = T(e_3) = \bar{e}_1 - \bar{e}_2.
\]
Let us now compare the above relations with the relations
\[
T(f_1) = -\tfrac{1}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_2) = -\tfrac{4}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_3) = -g_1 + g_2
\]
obtained by using the matrix $A_T^{B\to C}$. Since $g_1 = \bar{e}_1 + 2\bar{e}_2$ and $g_2 = 2\bar{e}_1 + \bar{e}_2$, we get
\[
T(f_1) = -\tfrac{1}{3}(\bar{e}_1 + 2\bar{e}_2) + \tfrac{5}{3}(2\bar{e}_1 + \bar{e}_2) = 3\bar{e}_1 + \bar{e}_2,
\]
\[
T(f_2) = -\tfrac{4}{3}(\bar{e}_1 + 2\bar{e}_2) + \tfrac{5}{3}(2\bar{e}_1 + \bar{e}_2) = 2\bar{e}_1 - \bar{e}_2,
\]
\[
T(f_3) = -(\bar{e}_1 + 2\bar{e}_2) + (2\bar{e}_1 + \bar{e}_2) = \bar{e}_1 - \bar{e}_2,
\]
which are precisely the relations obtained previously, derived from the matrix $A_T$. Hence, the matrices $A_T$ and $A_T^{B\to C}$ represent the same transformation $T$.
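The same equivalence can be confirmed numerically. The sketch below (not part of the original notes; it uses the matrices of Example 26.1.1) computes $A_T^{B\to C} = P_C^{-1}A_TP_B$ and checks one column against $T(f_1)$:

import numpy as np

A_T = np.array([[1, 1,  1],
                [2, 0, -1]], dtype=float)
P_B = np.array([[1, 0, 0],
                [1, 1, 0],
                [1, 1, 1]], dtype=float)          # columns: f1, f2, f3
P_C = np.array([[1, 2],
                [2, 1]], dtype=float)             # columns: g1, g2

A_BC = np.linalg.solve(P_C, A_T @ P_B)            # = P_C^{-1} A_T P_B
print(A_BC)    # [[-1/3 -4/3 -1], [5/3 5/3 1]] (printed as decimals)

# each column of A_BC holds the C-coordinates of T(f_i); check the first column:
f1 = P_B[:, 0]
assert np.allclose(P_C @ A_BC[:, 0], A_T @ f1)    # both sides equal T(f1) = (3, 1)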
26.2 Similarity

Of particular interest is the special case where the domain and the codomain of a linear transformation are the same Euclidean space, that is, $T : \mathbb{R}^n \to \mathbb{R}^n$, and the bases for the domain and the codomain coincide, that is, $B = C$. In this case, the general equation connecting $A_T$ and $A_T^{B\to C}$ reduces to
\[
A_T^{B\to B} = P_B^{-1}A_TP_B,
\]
and the matrices $A_T$ and $A_T^{B\to B}$ are called similar.

In general, a square matrix $N$ is called similar to another square matrix $M$ if there exists an invertible square matrix $P$ such that
\[
N = P^{-1}MP.
\]
Note that we also have $PNP^{-1} = M$, which means that there exists an invertible matrix $Q$, namely $Q = P^{-1}$, such that
\[
M = Q^{-1}NQ.
\]
Hence, $M$ is similar to $N$ as well. Similar matrices, such as $A_T$ and $A_T^{B\to B}$, represent the same linear transformation $T$ in different bases.
More generally, for any bases $B$ and $C$ for $\mathbb{R}^n$, we have
\[
A_T^{B\to B} = P_B^{-1}A_TP_B \qquad \text{and} \qquad A_T^{C\to C} = P_C^{-1}A_TP_C.
\]
Solving the first equation for $A_T$ and substituting the resulting expression in the second equation, we get
\[
A_T^{C\to C} = P_C^{-1}P_BA_T^{B\to B}P_B^{-1}P_C.
\]
Now, recall from subsection 25.2 that
\[
P_{C\to B} = P_B^{-1}P_C \qquad \text{and} \qquad P_{C\to B}^{-1} = P_{B\to C} = P_C^{-1}P_B.
\]
Therefore, the above relation becomes
\[
A_T^{C\to C} = P_{C\to B}^{-1}A_T^{B\to B}P_{C\to B},
\]
which establishes the fact that the matrices $A_T^{B\to B}$ and $A_T^{C\to C}$ are similar. The first matrix represents $T$ with respect to the $B$ basis, and the second matrix represents $T$ with respect to the $C$ basis.
26.3 Diagonalisable linear transformations

Given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$, suppose that we are able to find a basis $B$ for $\mathbb{R}^n$ such that the matrix $A_T^{B\to B}$ representing $T$ is diagonal; that is,
\[
A_T^{B\to B} = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix},
\]
where $k_1, k_2, \ldots, k_n$ are some given constants. Working with the particular basis $B = \{f_1, f_2, \ldots, f_n\}$, it is very easy to understand the effect of the transformation $T$.
We have
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_B = \begin{pmatrix}k_1\\0\\\vdots\\0\end{pmatrix}_B = k_1\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_B = k_1(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}_B = \begin{pmatrix}0\\k_2\\\vdots\\0\end{pmatrix}_B = k_2\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}_B = k_2(f_2)_B,
\]
and so on, up to
\[
\big(T(f_n)\big)_B = A_T^{B\to B}(f_n)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}_B = \begin{pmatrix}0\\0\\\vdots\\k_n\end{pmatrix}_B = k_n\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}_B = k_n(f_n)_B,
\]
which implies that $T$ stretches each basis vector $f_i$ by a factor of $k_i$. In other words,
\[
T(f_1) = k_1f_1, \qquad T(f_2) = k_2f_2, \qquad \ldots, \qquad T(f_n) = k_nf_n.
\]
Some of the most important applications of linear algebra utilise properties of diagonal matrices. Such applications require finding a basis $B$ of $\mathbb{R}^n$ (or, generally, of a vector space $V$) with respect to which the matrix $A_T^{B\to B}$ representing a given linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ (or, generally, $T : V \to V$) is diagonal. We will see that we will not be able to achieve such simplicity with every given linear transformation $T$. However, whenever $T$ is diagonalisable in the above sense, the technique of finding such a suitable basis $B$ is known as diagonalisation. We will discuss the process of diagonalisation in detail in the next lecture. For the time being, let us consider a simple illustration of this process, which allows us to introduce the concepts of eigenvectors, eigenvalues and eigenspaces.
26.4 Eigenvalues, eigenvectors and eigenspaces

Consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
T\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}x + 3y\\ -x + 5y\end{pmatrix} = \begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.
\]
The effect of $T$ on the standard basis $\{e_1, e_2\}$ of $\mathbb{R}^2$, namely
\[
T(e_1) = T\Big(\begin{pmatrix}1\\0\end{pmatrix}\Big) = \begin{pmatrix}1\\-1\end{pmatrix} = e_1 - e_2, \qquad
T(e_2) = T\Big(\begin{pmatrix}0\\1\end{pmatrix}\Big) = \begin{pmatrix}3\\5\end{pmatrix} = 3e_1 + 5e_2,
\]
is sketched below:
Figure 26.4.1
Although we can see the effect of T on the standard basis, we cannot claim that we have
fully understood what T does geometrically.
Instead, as a working hypothesis, let us assume that there exists a basis $B = \{f_1, f_2\}$ of $\mathbb{R}^2$ such that $A_T^{B\to B}$ is a diagonal matrix; that is,
\[
A_T^{B\to B} = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}
\]
for some $k, l \in \mathbb{R}$. If this is the case, the effect of $T$ on the $B$-basis vectors $f_1$ and $f_2$ is very clear. We have
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}_B = \begin{pmatrix}k\\0\end{pmatrix}_B = k\begin{pmatrix}1\\0\end{pmatrix}_B = k(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}_B = \begin{pmatrix}0\\l\end{pmatrix}_B = l\begin{pmatrix}0\\1\end{pmatrix}_B = l(f_2)_B,
\]
which implies the relations
\[
T(f_1) = kf_1 \qquad \text{and} \qquad T(f_2) = lf_2.
\]
Note that these are geometric (that is, coordinate-independent) relations and are therefore
valid with respect to any chosen basis - in particular, the standard basis: So, in standard
coordinates, we are looking for vectors f1 , f2 such that
AT f1 = kf1
and
AT f2 = lf2 .
Let us see if we can find such a basis $\{f_1, f_2\}$ for the given transformation $T$. We start by noting that the above requirements for $f_1, f_2$ can be expressed as a single requirement: we are looking for vectors $x \neq 0$ such that $A_Tx = \lambda x$ for some $\lambda \in \mathbb{R}$. The condition that $x \neq 0$ is needed because otherwise $x$ cannot be a basis vector.

The equation
\[
A_Tx = \lambda x, \qquad x \neq 0,
\]
is called an eigenvalue equation. The vector $x$, which is simply stretched by $A_T$ by a factor of $\lambda$, is called an eigenvector of $A_T$. The corresponding value of $\lambda$, which gives the amount of stretching, is called an eigenvalue of $A_T$.

The eigenvalue equation $A_Tx = \lambda x$, $x \neq 0$, seems to contain too many unknowns at this stage since neither $x$ nor $\lambda$ is known. However, the key thing to notice is that $x \neq 0$: Arranging the equation in the form of the homogeneous system
\[
(A_T - \lambda I)x = 0, \qquad x \neq 0,
\]
we immediately deduce that in order for a non-zero solution for $x$ to exist, the determinant of the square matrix $A_T - \lambda I$ must be zero:
\[
|A_T - \lambda I| = 0.
\]
If the determinant $|A_T - \lambda I|$ were not zero, the matrix $A_T - \lambda I$ would be invertible, which would imply that the only solution of the homogeneous system $(A_T - \lambda I)x = 0$ is the trivial solution $x = 0$, contrary to our requirement that $x \neq 0$.

The equation $|A_T - \lambda I| = 0$ is called the characteristic polynomial equation for $A_T$. The solutions of this equation are the eigenvalues $\lambda$ of $A_T$. Here, we have $A_T = \begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}$, so the characteristic polynomial equation for $A_T$ is
\[
\begin{vmatrix}1 - \lambda & 3\\ -1 & 5 - \lambda\end{vmatrix} = \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0,
\]
which implies that $A_T$ has two distinct eigenvalues, $\lambda_1 = 2$ and $\lambda_2 = 4$.
Having found the two eigenvalues of $A_T$, we go back to the eigenvalue equation $(A_T - \lambda I)x = 0$, $x \neq 0$. For each eigenvalue $\lambda$, we now need to solve this equation for the corresponding eigenvector $x$. So, let us denote the eigenvectors corresponding to $\lambda_1$ and $\lambda_2$ by $v_1$ and $v_2$, respectively. Clearly, for each eigenvalue $\lambda_i$, the equation $(A_T - \lambda_iI)v_i = 0$, $v_i \neq 0$, amounts to the requirement that the non-zero vector $v_i$ belongs to the null space $N(A_T - \lambda_iI)$ of the matrix $A_T - \lambda_iI$.

Starting with $\lambda_1 = 2$, we have
\[
v_1 \in N(A_T - \lambda_1I) = N\Big(\begin{pmatrix}-1 & 3\\ -1 & 3\end{pmatrix}\Big), \qquad v_1 \neq 0,
\]
which tells us that the non-zero vector $v_1 \in \mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$. Similarly, with $\lambda_2 = 4$, we have
\[
v_2 \in N(A_T - \lambda_2I) = N\Big(\begin{pmatrix}-3 & 3\\ -1 & 1\end{pmatrix}\Big), \qquad v_2 \neq 0,
\]
which tells us that the non-zero vector $v_2 \in \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}$.

The subspace $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\} \subset \mathbb{R}^2$ is known as the eigenspace of the transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ associated with the eigenvalue 2. The subspace $\mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\} \subset \mathbb{R}^2$ is the eigenspace of $T$ associated with the eigenvalue 4.
The transformation $T$ stretches any vector belonging to the eigenspace $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$ (including the zero vector in a trivial sense) by a factor of $\lambda_1 = 2$ and stretches any vector belonging to the eigenspace $\mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}$ (including the zero vector in a trivial sense) by a factor of $\lambda_2 = 4$. Moreover, the vectors
\[
f_1 = \begin{pmatrix}3\\1\end{pmatrix} \in \mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\} \qquad \text{and} \qquad f_2 = \begin{pmatrix}1\\1\end{pmatrix} \in \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}
\]
spanning these subspaces of $\mathbb{R}^2$ are linearly independent, so they form a basis for $\mathbb{R}^2$:
\[
B = \{f_1, f_2\} = \left\{\begin{pmatrix}3\\1\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}.
\]
A basis for $\mathbb{R}^2$ consisting of eigenvectors of $T$ (that is, a basis such as $B$) can be obtained by selecting any other basis vectors from the eigenspaces of $T$; for example, we could have selected $\begin{pmatrix}6\\2\end{pmatrix}$ from the first eigenspace and $\begin{pmatrix}-3\\-3\end{pmatrix}$ from the second eigenspace. All choices of eigenvectors are equally good for the applications we have in mind. The only vector that belongs to an eigenspace (in fact to both of them) but should never be selected as an eigenvector is the zero vector.
Based on our previous discussion, it should be clear that the matrix $A_T^{B\to B}$ representing $T$ with respect to the basis $B$ is diagonal, with its diagonal entries equal to the eigenvalues $\lambda_1, \lambda_2$:
\[
A_T^{B\to B} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}.
\]
Indeed, this matrix implies that
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}_B = \begin{pmatrix}2\\0\end{pmatrix}_B = 2\begin{pmatrix}1\\0\end{pmatrix}_B = 2(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}_B = \begin{pmatrix}0\\4\end{pmatrix}_B = 4\begin{pmatrix}0\\1\end{pmatrix}_B = 4(f_2)_B,
\]
and these relations reproduce the geometric (i.e., coordinate-independent) relations
\[
T(f_1) = 2f_1 \qquad \text{and} \qquad T(f_2) = 4f_2,
\]
previously obtained using standard coordinates.

Alternatively, we can show that $A_T^{B\to B}$ is diagonal by using the transition matrix $P_B = (f_1\ f_2) = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$ from $B$-coordinates to standard coordinates and the similarity relation $A_T^{B\to B} = P_B^{-1}A_TP_B$. We have
\[
A_T^{B\to B} = P_B^{-1}A_TP_B = \frac{1}{2}\begin{pmatrix}1 & -1\\ -1 & 3\end{pmatrix}\begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}\begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}.
\]
The effect of $T$ on the basis $B$ of $\mathbb{R}^2$ (and hence, by the linearity of $T$, on any vector $v \in \mathbb{R}^2$) is depicted below:
Figure 26.4.2
Going back to Figure 26.4.1, we can now understand why the effect of $T$ on the standard basis vectors $e_1$ and $e_2$ looks so complicated. Expressing the standard basis vector $e_1 = \begin{pmatrix}1\\0\end{pmatrix}$ as a linear combination of the eigenvectors $f_1 = \begin{pmatrix}3\\1\end{pmatrix}$ and $f_2 = \begin{pmatrix}1\\1\end{pmatrix}$,
\[
e_1 = \tfrac{1}{2}f_1 - \tfrac{1}{2}f_2,
\]
and using the linearity of $T$ and the stretches $T(f_1) = 2f_1$ and $T(f_2) = 4f_2$, we find that
\[
T(e_1) = \tfrac{1}{2}T(f_1) - \tfrac{1}{2}T(f_2) = \tfrac{1}{2}(2f_1) - \tfrac{1}{2}(4f_2) = f_1 - 2f_2 = e_1 - e_2,
\]
which is not just a stretch of $e_1$. Similarly, expressing $e_2 = \begin{pmatrix}0\\1\end{pmatrix}$ as a linear combination of $f_1 = \begin{pmatrix}3\\1\end{pmatrix}$ and $f_2 = \begin{pmatrix}1\\1\end{pmatrix}$ according to
\[
e_2 = -\tfrac{1}{2}f_1 + \tfrac{3}{2}f_2,
\]
we see that
\[
T(e_2) = -\tfrac{1}{2}T(f_1) + \tfrac{3}{2}T(f_2) = -\tfrac{1}{2}(2f_1) + \tfrac{3}{2}(4f_2) = -f_1 + 6f_2 = 3e_1 + 5e_2,
\]
which is not just a stretch of $e_2$.
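The eigenvalue calculation for this example is easy to reproduce numerically. A minimal NumPy sketch (not part of the original notes; it uses the eigenvectors chosen above):

import numpy as np

A_T = np.array([[ 1, 3],
                [-1, 5]], dtype=float)

eigvals, eigvecs = np.linalg.eig(A_T)
print(eigvals)                            # [2. 4.] (possibly listed in a different order)

P_B = np.array([[3, 1],                   # columns: the eigenvectors f1, f2 chosen above
                [1, 1]], dtype=float)
D = np.linalg.solve(P_B, A_T @ P_B)       # = P_B^{-1} A_T P_B
assert np.allclose(D, np.diag([2.0, 4.0]))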
26.5 Exercises for self study

Exercise 26.5.1 Consider the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}3x - y\\ -x + 3y\end{pmatrix}.
\]
Also consider the standard basis $B$ for $\mathbb{R}^2$ given by
\[
B = \{e_1, e_2\} = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}.
\]
(a) Calculate the effect of $S$ on the standard basis vectors and sketch $e_1$, $e_2$, $S(e_1)$ and $S(e_2)$ on a single copy of $\mathbb{R}^2$.
(b) Write down the matrix $A_S$ representing $S$ with respect to the standard basis; that is, write down $A_S = A_S^{B\to B}$.
(c) Calculate the effect of $S$ on the vectors
\[
f_1 = \begin{pmatrix}1\\1\end{pmatrix} \qquad \text{and} \qquad f_2 = \begin{pmatrix}1\\-1\end{pmatrix}
\]
and add $f_1$, $f_2$, $S(f_1)$ and $S(f_2)$ to your sketch.
Now consider the basis $C$ for $\mathbb{R}^2$ given by
\[
C = \{f_1, f_2\} = \left\{\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\end{pmatrix}\right\}.
\]
(d) Using your answer to part (c), write down the matrix $A_S^{C\to C}$ representing $S$ with respect to the $C$ basis.
(e) Verify your answer for $A_S^{C\to C}$ by using the relation $A_S^{C\to C} = P_C^{-1}A_SP_C$, where $P_C$ is the transition matrix from $C$-coordinates to standard coordinates; i.e., $P_C = P_{C\to B}$ where $B$ is the standard basis.
Exercise 26.5.2 Building on Exercise 26.5.1, consider the matrix $A_S$ representing $S$ with respect to the standard basis.
(a) Write down the eigenvalue equation associated with $A_S$.
(b) Solve the characteristic polynomial equation $|A_S - \lambda I| = 0$ to find the eigenvalues of $A_S$.
(c) For each eigenvalue $\lambda$, find a basis for the corresponding eigenspace $N(A_S - \lambda I)$.
(d) Hence, find a basis $E = \{g_1, g_2\}$ for $\mathbb{R}^2$ such that the corresponding matrix $A_S^{E\to E}$ representing $S$ is diagonal.
(e) Verify that $A_S^{E\to E} = P_E^{-1}A_SP_E$ is diagonal by directly multiplying the matrices on the right hand side, where $P_E$ is the transition matrix from $E$-coordinates to standard coordinates.
Exercise 26.5.3 Consider the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-7x + 9y\\ -6x + 8y\end{pmatrix}.
\]
Also consider the standard basis $B$ for $\mathbb{R}^2$ given by
\[
B = \{e_1, e_2\} = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}.
\]
(a) Calculate the effect of $S$ on the standard basis vectors and sketch $e_1$, $e_2$, $S(e_1)$ and $S(e_2)$ on a single copy of $\mathbb{R}^2$.
(b) Write down the matrix $A_S$ representing $S$ with respect to the standard basis; that is, write down $A_S = A_S^{B\to B}$.
(c) Calculate the effect of $S$ on the basis $C = \{f_1, f_2\} = \left\{\begin{pmatrix}3\\2\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}$ and then find the matrix $A_S^{C\to C}$ representing $S$ with respect to the $C$ basis using the relation
\[
A_S^{C\to C} = \Big(\big(S(f_1)\big)_C\ \big(S(f_2)\big)_C\Big).
\]
(d) Solve the characteristic polynomial equation $|A_S - \lambda I| = 0$ and find the eigenspace $N(A_S - \lambda I)$ corresponding to each solution $\lambda$ of this equation. Verify that your findings agree with your answer to part (c).
Exercise 26.5.4 Consider the matrix $A = \begin{pmatrix}1 & 4\\ 3 & 2\end{pmatrix}$.
(a) Find the eigenvalues of $A$ and the corresponding eigenvectors.
(b) Hence, find an invertible matrix $P$ such that $P^{-1}AP = D$ where $D$ is a diagonal matrix.
26.6
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 8.1 of our Algebra Textbook is relevant.
27 Linear transformations, 5 of 6

27.1 Diagonalisation
In the last lecture we saw examples of linear transformations $T : \mathbb{R}^2 \to \mathbb{R}^2$ where a basis $B$ of $\mathbb{R}^2$ can be found such that the matrix $A_T^{B\to B}$ representing $T$ is diagonal. We will now present a number of theorems which tell us exactly when a given linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is 'diagonalisable' in the above sense.

The central problem that we are dealing with is the solution of the eigenvalue equation
\[
Ax = \lambda x, \qquad x \neq 0,
\]
where the $n \times n$ matrix $A$ can be regarded as representing a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ with respect to the standard basis $\{e_1, e_2, \ldots, e_n\}$ of $\mathbb{R}^n$. The subscript $T$ has been omitted from the matrix $A$ for simplicity, but it is important to remember the geometric context to which the matrix $A$ refers; i.e., the fact that $T(x) = Ax$.

Let us first review the material presented in the last lecture: We showed there that the eigenvalue equation $Ax = \lambda x$, $x \neq 0$, cannot have non-zero solutions for $x$ unless the square matrix $A - \lambda I$ fails to be invertible. The resulting characteristic equation,
\[
|A - \lambda I| = 0,
\]
involves a polynomial of degree $n$ on its left hand side and yields the eigenvalues of the matrix $A$. For each solution $\lambda_0$ of this characteristic equation, the eigenspace corresponding to $\lambda_0$ is the null space of the matrix $A - \lambda_0 I$; that is, the subspace $N(A - \lambda_0 I)$ of $\mathbb{R}^n$. Any vector $x \in N(A - \lambda_0 I)$ other than the zero vector $0 \in \mathbb{R}^n$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_0$.
In the special case where the characteristic equation $|A - \lambda I| = 0$ for the $n \times n$ matrix $A$ yields $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, the following theorem, stated without proof, guarantees that the matrix $A$ is diagonalisable. In other words, it guarantees the existence of an invertible matrix $P$ and a corresponding diagonal matrix $D$ such that
\[
P^{-1}AP = D; \qquad \text{i.e.} \qquad A = PDP^{-1}.
\]
This relation implies that $A$ is similar to a diagonal matrix, which is precisely what the term 'diagonalisable' means. The relevant theorem is given below:

Theorem 27.1.1 Eigenvectors corresponding to distinct eigenvalues are linearly independent.
In order to appreciate the relevance of this theorem, observe that in all the examples of linear transformations $T : \mathbb{R}^2 \to \mathbb{R}^2$ encountered in the last lecture, the two eigenvalues of the $2 \times 2$ matrix $A_T$ are distinct, and indeed, the corresponding eigenvectors are linearly independent. It is for this reason that the matrix $P_B = (v_1\ v_2)$ constructed from these eigenvectors is invertible and can be regarded as the transition matrix from $B$-coordinates to standard coordinates. The resulting matrix $A_T^{B\to B}$ representing $T$ with respect to the $B$-basis is diagonal, since the relations
\[
\lambda_1\begin{pmatrix}1\\0\end{pmatrix}_B = \lambda_1(v_1)_B = \big(T(v_1)\big)_B = A_T^{B\to B}(v_1)_B = A_T^{B\to B}\begin{pmatrix}1\\0\end{pmatrix}_B
\]
and
\[
\lambda_2\begin{pmatrix}0\\1\end{pmatrix}_B = \lambda_2(v_2)_B = \big(T(v_2)\big)_B = A_T^{B\to B}(v_2)_B = A_T^{B\to B}\begin{pmatrix}0\\1\end{pmatrix}_B
\]
imply that
\[
A_T^{B\to B} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix},
\]
and the fact that $A_T$ can be expressed in the form
\[
A_T = P_BA_T^{B\to B}P_B^{-1}
\]
implies that $A_T$ is diagonalisable; i.e., similar to the diagonal matrix $A_T^{B\to B}$.
The exact same methodology can be extended to any square $n \times n$ matrix $A$, provided that the characteristic equation $|A - \lambda I| = 0$ produces $n$ distinct real eigenvalues; i.e., provided that the characteristic polynomial can be written in the form
\[
|A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda),
\]
where all the eigenvalues in the set $\{\lambda_i\}$ are real and distinct.

If this is the case, Theorem 27.1.1 guarantees that the corresponding eigenvectors $v_1, \ldots, v_n$ are linearly independent and hence form a basis $B = \{v_1, \ldots, v_n\}$ of $\mathbb{R}^n$. In particular, each eigenspace $N(A - \lambda_iI)$ is a one-dimensional subspace of $\mathbb{R}^n$, since
\[
N(A - \lambda_iI) = \mathrm{Lin}\{v_i\}.
\]
When constructing the basis $B$, it does not matter which particular element $v_i$ of each eigenspace we take, as long as each $v_i \neq 0$. The ordering of the eigenvectors in the basis $B$ does not matter either: different orderings lead to different invertible transition matrices $P_B$, and hence different pairs $(P_B, A_T^{B\to B})$ of invertible and diagonal matrices, but in each case, $A_T$ is similar to a diagonal matrix $A_T^{B\to B}$; i.e.,
\[
A_T = P_BA_T^{B\to B}P_B^{-1}.
\]
More generally, even if A does not have distinct eigenvalues, the existence of a basis B for
Rn consisting of eigenvectors of A is enough to guarantee that A is diagonalisable. The
relevant theorem, which can be regarded as the main theorem of diagonalisation, is stated
below in two different, equivalent, ways, linked by the fact that n linearly independent
vectors in Rn form a basis B for Rn .
Main Theorem 27.1.2 (a) An $n \times n$ matrix $A$ is diagonalisable if and only if it has $n$
linearly independent eigenvectors.
Main Theorem 27.1.2 (b) An n × n matrix A is diagonalisable if and only if there
exists a basis B of Rn consisting of eigenvectors of A.
The main theorem brings the following question: Exactly when does an n × n matrix A
have n linearly independent eigenvectors, so that these can form a basis B of Rn ?
In order to motivate the answer - given by Theorem 27.2.2 at the end of the next subsection
- let us examine two typical examples of 2 × 2 matrices that do not produce bases of
eigenvectors for R2 and, then, a third example of a 2 × 2 matrix that does produce such a
basis.
Example 27.1.3 Consider the rotation matrix $A_T = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$ where $\theta = \frac{\pi}{2}$; that is, consider a rotation $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $\frac{\pi}{2}$ anticlockwise. Then
\[
A_T = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}.
\]
Since $T$ only rotates vectors in $\mathbb{R}^2$, it preserves the length $\|v\|$ of each vector $v$; hence, $T$ does not stretch any vector. Accordingly, we expect $A_T$ to have no real eigenvalues. Indeed, the characteristic equation $|A_T - \lambda I| = 0$ is a quadratic equation of negative discriminant,
\[
|A_T - \lambda I| = \begin{vmatrix}-\lambda & -1\\ 1 & -\lambda\end{vmatrix} = \lambda^2 + 1 = 0,
\]
which confirms that $A_T$ is not diagonalisable.

It is worth mentioning here that $A_T$ is diagonalisable if we introduce complex eigenvalues, but this means that we are talking about a complex vector space. We will not cover such vector spaces in our course.
Example 27.1.4 Consider the matrix
\[
A_T = \begin{pmatrix}2 & 1\\ 0 & 2\end{pmatrix},
\]
whose effect on the standard basis of $\mathbb{R}^2$,
\[
T(e_1) = A_T\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix} = 2e_1, \qquad
T(e_2) = A_T\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix} = e_1 + 2e_2,
\]
is depicted below:
Clearly, the one-dimensional subspace Lin{e1 } is an eigenspace of the eigenvalue λ = 2
since any vector v ∈ Lin {e1 } (i.e., any vector of the form v = ke1 for some k ∈ R) is
stretched by a factor of 2.
Hence, with v1 = e1 being an eigenvector of T corresponding to λ1 = 2, is there another
eigenvector v2 of T which is linearly independent from v1 = e1 and hence results in a basis
B = {v1 , v2 } for R2 ? If the answer is yes, then A is diagonalisable by the Main Theorem
27.1.2.
Let us check: The characteristic equation $|A_T - \lambda I| = 0$ gives
\[
|A_T - \lambda I| = \begin{vmatrix}2 - \lambda & 1\\ 0 & 2 - \lambda\end{vmatrix} = (2 - \lambda)^2 = 0,
\]
so $\lambda = 2$ is a repeated eigenvalue. Since the two eigenvalues of $A$ are not distinct, Theorem 27.1.1 cannot guarantee that $A$ is diagonalisable; however, it cannot exclude that possibility either: We need to check further, by calculating the corresponding eigenspace $N(A_T - 2I)$. We have
\[
N(A_T - 2I) = N\Big(\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\Big),
\]
which implies that there is only one free parameter in the general solution of the system $\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$, and hence $N(A_T - 2I) = \mathrm{Lin}\left\{\begin{pmatrix}1\\0\end{pmatrix}\right\} = \mathrm{Lin}\{e_1\}$ is a one-dimensional subspace of $\mathbb{R}^2$. Obviously, we cannot take our 'second eigenvector' $v_2$ to belong to this subspace, since any such $v_2$ will be a scalar multiple of $v_1 = e_1$ and hence the set $\{v_1, v_2\}$ will not be a basis for $\mathbb{R}^2$. In particular, the matrix $(v_1\ v_2)$ will not be invertible. We conclude that the matrix $A_T$ is not diagonalisable.
The fact that a repeated eigenvalue arose in the last example was not the reason that
diagonalisation failed. The following example is rather trivial but it does show that A may
be diagonalisable even if it has a repeated eigenvalue.
Example 27.1.5 Consider the matrix representing a stretch in the $x$-direction by a factor of 3 and a stretch in the $y$-direction by the same factor:
\[
A_T = \big(T(e_1)\ T(e_2)\big) = \begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}.
\]
The matrix $A_T$ is diagonal so it is also diagonalisable in a trivial sense, since it is similar to itself: $A_T = I^{-1}A_TI$. Let us focus on the structure of its eigenspaces. The characteristic equation
\[
|A_T - \lambda I| = \begin{vmatrix}3 - \lambda & 0\\ 0 & 3 - \lambda\end{vmatrix} = (3 - \lambda)^2 = 0
\]
yields a repeated eigenvalue $\lambda_1 = \lambda_2 = 3$, and hence there is a single eigenspace, namely
\[
N(A_T - 3I) = N\Big(\begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}\Big) = \mathbb{R}^2,
\]
which is two-dimensional. Therefore, every vector in $\mathbb{R}^2$ is stretched by a factor of 3, and hence any basis $B$ of $\mathbb{R}^2$ is a basis of eigenvectors of this matrix. For example, we can take $B = \{e_1, e_2\}$ and $P_B = I = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$, or we can take any other basis $\{v_1, v_2\}$ and $P_B = (v_1\ v_2)$. In all cases, the corresponding matrix $A_T^{B\to B}$ is diagonal and equal to $A_T$, since
\[
A_T^{B\to B} = P_B^{-1}A_TP_B = P_B^{-1}\begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}P_B = P_B^{-1}(3I)P_B = 3P_B^{-1}IP_B = 3I = \begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}.
\]
27.2 Algebraic and geometric multiplicity
In order to understand the three previous examples and describe with precision what
makes an n × n matrix A diagonalisable, we need to introduce the concepts of algebraic
and geometric multiplicity of eigenvalues.
A real eigenvalue λ0 of an n × n matrix A is said to have algebraic multiplicity k if
k is the largest integer such that (λ − λ0 )k is a factor of the characteristic polynomial
|A − λI|. The geometric multiplicity of an eigenvalue λ0 of A is the dimension of the
corresponding eigenspace N (A − λ0 I). The characteristic polynomial of Example 27.1.3
produced no real eigenvalues, and those encountered in Examples 27.1.4 and 27.1.5 both
produced a real eigenvalue of algebraic multiplicity 2. In the former case, the geometric
multiplicity of that eigenvalue was 1, and in the latter case, the geometric multiplicity was
2.
Note that the geometric multiplicity of any real eigenvalue λ0 of A is at least one. This is
because of the fact that |A − λ0 I| = 0 implies that the n × n matrix A − λ0 I does not have
full column rank, which in turn implies that the eigenvalue equation (A − λ0 I)x = 0 has
at least one free parameter in its general solution. Moreover, we also have an upper bound
on the geometric multiplicity of an eigenvalue λ0 of A. Stating the relevant result without
proof, we have that for any eigenvalue λ0 of an n × n matrix A, the geometric multiplicity
cannot exceed the algebraic multiplicity of λ0 . Note that, due to these bounds, if A has
n distinct real eigenvalues then each one of them has algebraic and geometric multiplicity
equal to 1.
Going back to Examples 27.1.3 - 27.1.5, also note that the only matrix that was diagonalisable had a repeated eigenvalue whose algebraic and geometric multiplicities were equal.
In the other cases, either AT had at least one non-real eigenvalue or AT had at least
one eigenvalue whose geometric multiplicity was not equal to the corresponding algebraic
multiplicity.
These results are captured by the following theorem, stated without proof, which gives
us precise conditions in order for an n × n matrix A to be diagonalisable. To properly
understand what this theorem states, it is important to keep in mind the Fundamental Theorem of Algebra; namely, the result that a polynomial equation of degree $n$ has exactly $n$, generally complex, roots. (Recall that the real numbers are a subset of the complex numbers, so any real number is also a complex number.)
Theorem 27.2.2 A matrix A is diagonalisable if and only if its characteristic polynomial
equation yields only real eigenvalues and the geometric and algebraic multiplicities of each
such eigenvalue are equal.
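A small sketch (not part of the original notes) that makes the two multiplicities concrete for the matrix of Example 27.1.4: the algebraic multiplicity is read off from the characteristic polynomial, while the geometric multiplicity is the dimension of the eigenspace, i.e. $n$ minus the rank of $A - \lambda_0 I$.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# coefficients of the characteristic polynomial of A
char_poly = np.poly(A)            # [ 1. -4.  4.]  ->  (t - 2)^2, so algebraic multiplicity 2

geom_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geom_mult)                  # 1 < 2, so A is not diagonalisable (cf. Theorem 27.2.2)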
This theorem is illustrated in the Exercises below.
27.3 Exercises for self study

Exercise 27.3.1 Consider the matrix $A = \begin{pmatrix}2 & -1 & 2\\ 0 & 1 & 2\\ 0 & 0 & 3\end{pmatrix}$.
(a) Find the eigenvalues and the corresponding eigenvectors of $A$.
(b) Hence, find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
Exercise 27.3.2 (a) If possible, diagonalise $A = \begin{pmatrix}2 & 1\\ -4 & 6\end{pmatrix}$; i.e., find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$.
(b) Diagonalise $A = \begin{pmatrix}1 & 4\\ 3 & 2\end{pmatrix}$ and hence calculate $A^{10}$.
Exercise 27.3.3 (a) Give the definition of a square matrix $A$ being diagonalisable.
(b) Define the algebraic and geometric multiplicities of an eigenvalue $\lambda$ of $A$.
A linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ is represented by the matrix
\[
A_T^{B\to B} = \begin{pmatrix}4 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 3\end{pmatrix}
\]
with respect to the basis $B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}3\\1\\0\end{pmatrix}\right\}$ for $\mathbb{R}^3$.
(c) Express each of the image vectors $T(f_1)$, $T(f_2)$, $T(f_3)$ as a linear combination of the $B$-basis vectors.
(d) Find the matrix $A_T$ representing $T$ with respect to the standard basis $\{e_1, e_2, e_3\} = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ of $\mathbb{R}^3$.
(e) Obtain a Cartesian description for the eigenspace associated with the largest eigenvalue
of AT .
Exercise 27.3.4 (a) Consider the matrices
\[
A = \begin{pmatrix}1 & 1 & 1\\ 0 & 1 & -1\\ 1 & 0 & 2\end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix}-2 & 1 & -2\\ -1 & 0 & 1\\ 2 & 1 & 2\end{pmatrix}.
\]
Show that neither matrix is diagonalisable.
(b) Diagonalise $C = \begin{pmatrix}-1 & 3 & 0\\ 0 & 2 & 0\\ -3 & 3 & 2\end{pmatrix}$; that is, write $C$ in the form $C = PDP^{-1}$ for some invertible matrix $P$ and diagonal matrix $D$.
27.4 Relevant sections from the textbooks

• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 8.1, 8.2 and 8.3 of our Algebra Textbook are relevant.
28
Linear transformations, 6 of 6
We now focus on symmetric matrices and a special form of diagonalisation applicable to
symmetric matrices, known as orthogonal diagonalisation. We begin by introducing orthogonal matrices. Then, we link these matrices to the process of orthogonal diagonalisation
and, finally, we discuss applications of orthogonal diagonalisation to quadratic forms and
conic sections.
28.1
Orthogonal matrices
An n × n matrix P is said to be orthogonal if PT P = PPT = I. This means that P is
invertible and its inverse P−1 is equal to its transpose PT ; that is, PT = P−1 .
At first it appears that the use of the term ‘orthogonal’ for a matrix P satisfying
PT P = PPT = I has little to do with the concept of orthogonality of vectors. However, there is a close connection, captured by the following theorem.
Theorem 28.1.1 An n × n matrix P is orthogonal if and only if its columns are pairwise orthogonal, and each has length 1; that is, if and only if the columns of P form an
orthonormal basis for Rn .
Proof Suppose that $P$ is an orthogonal matrix; i.e., $P^TP = PP^T = I$. Let the vectors $x_1, x_2, \ldots, x_n \in \mathbb{R}^n$ be the columns of $P = (x_1\ x_2\ \ldots\ x_n)$, so that $P^T$ is the matrix whose rows are $x_1^T, x_2^T, \ldots, x_n^T$. Then, the matrix equation $P^TP = I$ can be expressed as
\[
\begin{pmatrix}x_1^T\\ x_2^T\\ \vdots\\ x_n^T\end{pmatrix}(x_1\ x_2\ \ldots\ x_n) =
\begin{pmatrix}x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n\\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n\\ \vdots & \vdots & \ddots & \vdots\\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n\end{pmatrix} =
\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix},
\]
which implies the relations
\[
x_i^Tx_i = \langle x_i, x_i\rangle = \|x_i\|^2 = 1 \qquad \text{and} \qquad x_i^Tx_j = \langle x_i, x_j\rangle = 0 \ \text{ if } i \neq j.
\]
This shows that the columns of $P$ form an orthonormal basis for $\mathbb{R}^n$ with respect to the standard scalar product. Conversely, if the columns of $P$ form an orthonormal basis for $\mathbb{R}^n$ with respect to the standard scalar product, then
\[
x_i^Tx_i = \langle x_i, x_i\rangle = \|x_i\|^2 = 1 \qquad \text{and} \qquad x_i^Tx_j = \langle x_i, x_j\rangle = 0 \ \text{ if } i \neq j,
\]
so the matrix $P$ is orthogonal:
\[
P^TP = \begin{pmatrix}x_1^T\\ x_2^T\\ \vdots\\ x_n^T\end{pmatrix}(x_1\ x_2\ \ldots\ x_n) =
\begin{pmatrix}x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n\\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n\\ \vdots & \vdots & \ddots & \vdots\\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n\end{pmatrix} =
\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix} = I.
\]
Note that if the matrix P is orthogonal then so is its transpose PT , because
(PT )(PT )T = PT P = I = PPT = (PT )T (PT ).
Hence, Theorem 28.1.1 can also be expressed in the form: An n × n matrix P is orthogonal
if and only if the transposed rows of P form an orthonormal basis for Rn .
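Both characterisations are easy to check on a concrete matrix. A sketch (not part of the original notes; the rotation angle is an arbitrary choice of ours) verifying that $P^TP = I$ and that the columns are pairwise orthogonal unit vectors:

import numpy as np

theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],     # a rotation matrix is orthogonal
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(P.T @ P, np.eye(2))
assert np.allclose(P @ P.T, np.eye(2))

# column check: unit length and mutual orthogonality
assert np.isclose(np.linalg.norm(P[:, 0]), 1.0)
assert np.isclose(np.linalg.norm(P[:, 1]), 1.0)
assert np.isclose(P[:, 0] @ P[:, 1], 0.0)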
28.2
Orthogonal diagonalisation
A matrix A is said to be orthogonally diagonalisable if there is an orthogonal matrix
P such that P−1 AP = PT AP = D where D is a diagonal matrix.
Note that the fact that A is diagonalisable means that the columns of P form a basis for Rn
consisting of eigenvectors of A. The additional fact that A is orthogonally diagonalisable
means that the columns of P form an orthonormal basis for Rn consisting of eigenvectors
of A. Putting these facts together, we have the following theorem:
Theorem 28.2.1 A matrix A is orthogonally diagonalisable if and only if there is an
orthonormal basis for Rn consisting of eigenvectors of A.
Let us look at some examples.
Example 28.2.2 The matrix
\[
A = \begin{pmatrix}7 & -15\\ 2 & -4\end{pmatrix}
\]
is diagonalisable but is not orthogonally diagonalisable. Omitting the calculations, the eigenvalues of $A$ are $\lambda_1 = 1$ and $\lambda_2 = 2$. The eigenspace corresponding to $\lambda_1$ is $\mathrm{Lin}\left\{\begin{pmatrix}5\\2\end{pmatrix}\right\}$ and the eigenspace corresponding to $\lambda_2 = 2$ is $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$. Since we have
\[
\left\langle\begin{pmatrix}5\\2\end{pmatrix}, \begin{pmatrix}3\\1\end{pmatrix}\right\rangle \neq 0,
\]
no eigenvector in the eigenspace of $\lambda_1$ is perpendicular to any eigenvector in the eigenspace of $\lambda_2$. Hence it is not possible to find an orthonormal set of eigenvectors for $A$ in order to orthogonally diagonalise this matrix.
Example 28.2.3 Now consider the matrix
\[
A = \begin{pmatrix}5 & -3\\ -3 & 5\end{pmatrix}.
\]
The characteristic polynomial equation is
\[
|A - \lambda I| = \begin{vmatrix}5 - \lambda & -3\\ -3 & 5 - \lambda\end{vmatrix} = \lambda^2 - 10\lambda + 16 = 0,
\]
so the eigenvalues of $A$ are $\lambda_1 = 2$ and $\lambda_2 = 8$. The corresponding eigenspaces are
\[
N(A - 2I) = N\Big(\begin{pmatrix}3 & -3\\ -3 & 3\end{pmatrix}\Big) = N\Big(\begin{pmatrix}1 & -1\\ 0 & 0\end{pmatrix}\Big) = \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\},
\]
\[
N(A - 8I) = N\Big(\begin{pmatrix}-3 & -3\\ -3 & -3\end{pmatrix}\Big) = N\Big(\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}\Big) = \mathrm{Lin}\left\{\begin{pmatrix}-1\\1\end{pmatrix}\right\}.
\]
Since
\[
\left\langle\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}-1\\1\end{pmatrix}\right\rangle = 0,
\]
the eigenspaces $N(A - 2I)$ and $N(A - 8I)$ are orthogonal. It is now straightforward to create an orthonormal basis of eigenvectors by selecting a unit eigenvector from each eigenspace. For example, $B = \left\{\begin{pmatrix}\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}}\end{pmatrix}, \begin{pmatrix}-\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}}\end{pmatrix}\right\}$ is such an orthonormal basis of eigenvectors of $A$.

If we let
\[
P = P_B = \begin{pmatrix}\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\end{pmatrix} \qquad \text{and} \qquad D = \begin{pmatrix}2 & 0\\ 0 & 8\end{pmatrix},
\]
then $P$ is orthogonal and $P^TAP = P^{-1}AP = D$. We say that $A$ has been orthogonally diagonalised; i.e., it has been expressed in the form
\[
A = PDP^T,
\]
where $P$ is an orthogonal matrix and $D$ a diagonal matrix.
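The same computation can be done in NumPy; the sketch below is not part of the original notes. np.linalg.eigh is designed for symmetric (Hermitian) matrices and returns an orthonormal set of eigenvectors directly:

import numpy as np

A = np.array([[ 5.0, -3.0],
              [-3.0,  5.0]])

eigvals, P = np.linalg.eigh(A)           # eigh: eigendecomposition for symmetric matrices
print(eigvals)                           # [2. 8.]

assert np.allclose(P.T @ P, np.eye(2))   # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(eigvals))
assert np.allclose(P @ np.diag(eigvals) @ P.T, A)   # A = P D P^T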
Note that the matrix $A$ in this example is symmetric, whereas the matrix $A$ in Example 28.2.2 is not. This brings us to the next question: Which matrices can be orthogonally
diagonalised? The answer is given by the following theorem, stated without proof.
Theorem 28.2.4 The matrix A is orthogonally diagonalisable if and only if A is symmetric.
Recall that an n × n matrix A is not diagonalisable unless all of its eigenvalues are real.
Hence, Theorem 28.2.4 implies the following result as a corollary:
Corollary 28.2.5 If A is a symmetric matrix, then all of its eigenvalues are real.
Moreover, Theorem 28.2.4 implies that, even if some eigenvalues of a symmetric matrix A
are repeated, the eigenspaces corresponding to distinct eigenvalues of A are orthogonal.
Otherwise it would have been impossible for A to have an orthonormal basis of eigenvectors, which is necessary for its orthogonal diagonalisation. This implies, in turn, that
eigenvectors corresponding to distinct eigenvalues of a symmetric matrix A are orthogonal.
It is instructive to prove this result as an independent theorem:
Theorem 28.2.6 If the matrix A is symmetric, then eigenvectors corresponding to
distinct eigenvalues are orthogonal.
Proof Suppose that λ, µ are any two distinct eigenvalues of A and that x, y are corresponding eigenvectors. Then Ax = λx and Ay = µy. Now consider the matrix product
xT Ay. Since Ay = µy, we have
xT Ay = xT (Ay) = xT (µy) = µxT y.
But also, Ax = λx. Since A is symmetric, A = AT . Substituting and using the properties
of the transpose of a matrix, we have
xT Ay = xT AT y = (xT AT )y = (Ax)T y = (λx)T y = λxT y.
Equating these two expressions for xT Ay, we deduce that µxT y = λxT y; that is,
(µ − λ)xT y = 0.
But since, by assumption, $\lambda$ and $\mu$ are distinct, we have $\mu - \lambda \neq 0$. Hence, we must have that $x^Ty = \langle x, y\rangle = 0$, which tells us precisely that $x$ and $y$ are orthogonal.
It is now straightforward to see how to construct an orthonormal basis B of eigenvectors
for a symmetric matrix A: For each eigenvalue λ of A, we use the Gram-Schmidt process to
create an orthonormal basis for the corresponding eigenspace N (A − λI). If an eigenspace
is one-dimensional, we just need to ensure that its basis vector is of unit length. If an
eigenspace is multi-dimensional, the full Gram-Schmidt process needs to be applied. Then,
the fact that eigenspaces corresponding to distinct eigenvalues are orthogonal guarantees
that the resulting basis B of eigenvectors is an orthonormal basis for Rn .
28.3 Symmetric matrices and quadratic forms
An important application of orthogonal diagonalisation is to the analysis of quadratic
forms. A quadratic form in two variables x and y is an expression of the form
\[
q(x, y) = ax^2 + 2cxy + by^2.
\]
This can be written as
\[
q(x, y) = x^T A x,
\]
where x = (x, y)^T and A is the symmetric matrix
\[
A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}.
\]
It is useful to verify that expanding the matrix product xT Ax gives
q(x, y) = ax2 + 2cxy + by 2 . Note that there are other ways of writing q(x, y) as a product
of matrices, xT Bx, where B is not symmetric, but these are of no interest to us here;
our focus is on the case where the matrix is symmetric. Similarly, a quadratic form in n
variables x1 , x2 , . . . , xn is an expression of the form
\[
q(x_1, x_2, \ldots, x_n) = x^T A x,
\]
where A is a symmetric n × n matrix and x = (x1 , x2 , . . . , xn )^T ∈ Rn .
Example 28.3.1 The following is a quadratic form in three variables:
\[
q(x_1, x_2, x_3) = 5x_1^2 + 10x_2^2 + 2x_3^2 + 4x_1 x_2 + 2x_1 x_3 - 6x_2 x_3.
\]
We have q(x1 , x2 , x3 ) = x^T Ax, where x = (x1 , x2 , x3 )^T and A is the symmetric matrix
\[
A = \begin{pmatrix} 5 & 2 & 1 \\ 2 & 10 & -3 \\ 1 & -3 & 2 \end{pmatrix}.
\]
Note that it is quite easy to derive the 3 × 3 symmetric matrix A from the expression
q(x1 , x2 , x3 ) of the quadratic form – and conversely – without having to perform matrix
multiplications. Specifically, the diagonal entries aii of A are the coefficients of the corresponding quadratic terms in the quadratic form q(x1 , x2 , x3 ), and the off-diagonal entries
of A, namely aij = aji , are half of the coefficients of the corresponding non-quadratic terms
in q(x1 , x2 , x3 ).
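As an illustrative aside (not from the notes), the reading-off rule just described can be checked symbolically. The sketch below, assuming SymPy is available, builds the matrix of Example 28.3.1 from the coefficients of q and confirms that x^T Ax reproduces the quadratic form:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
q = 5*x1**2 + 10*x2**2 + 2*x3**2 + 4*x1*x2 + 2*x1*x3 - 6*x2*x3

# diagonal entries: coefficients of the squared terms;
# off-diagonal entries: half the coefficients of the cross terms
A = sp.Matrix([[5, 2, 1],
               [2, 10, -3],
               [1, -3, 2]])
x = sp.Matrix([x1, x2, x3])

print(sp.expand((x.T * A * x)[0] - q) == 0)   # True: x^T A x equals q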
Due to certain practical applications we have in mind, we would like to know the set of all
possible values that a quadratic form q(x), x ∈ Rn may take. The technique of orthogonal
diagonalisation turns out to be very useful in this context. First, we need some terminology.
Let q(x) = xT Ax (with AT = A) be a quadratic form. Then:
• q(x) is positive definite if q(x) ≥ 0 for all x, and q(x) = 0 only when x = 0,
• q(x) is positive semi-definite if q(x) ≥ 0 for all x,
• q(x) is negative definite if q(x) ≤ 0 for all x, and q(x) = 0 only when x = 0,
• q(x) is negative semi-definite if q(x) ≤ 0 for all x,
• q(x) is indefinite if it is neither positive semi-definite nor negative semi-definite; that is, if there exist x1 , x2 such that q(x1 ) < 0 and q(x2 ) > 0.
Now suppose that we have found an orthogonal matrix P that orthogonally diagonalises
the symmetric matrix A. In other words, we have found P such that PT = P−1 and
PT AP = D, where D is a diagonal matrix. We perform the usual change of variables
x = Pz, which means that P is regarded as the transition matrix PB from coordinates in
the orthonormal basis B of eigenvectors of A to standard coordinates and z is regarded as
the coordinate vector (x)B of x with respect to the B basis. Then
q(x) = xT Ax = (Pz)T A(Pz) = zT (PT AP)z = zT Dz.
Note that D is a diagonal matrix whose entries are the eigenvalues of A; in other words, D
plays the role of the matrix A_T^{B→B} representing the transformation
T : Rn → Rn with respect to the B basis, where T (x) = Ax. Let us suppose that the
eigenvalues of A (in the order in which they appear in D) are λ1 , λ2 , . . . , λn . Among the
set {λ1 , λ2 , . . . , λn }, one or more eigenvalues may be repeated. Let X1 , X2 , . . . , Xn be the
coordinates of x with respect to the orthonormal basis B; that is, write
\[
z = (x)_B = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}.
\]
Then, the fact that q(x) = z^T D z implies that
\[
q(x) = \lambda_1 X_1^2 + \lambda_2 X_2^2 + \cdots + \lambda_n X_n^2,
\]
which is simply a linear combination of squares.
Now suppose that all the eigenvalues are positive. Then we can conclude that, for all z,
the quadratic form q(x) is greater than or equal to zero, and also that q(x) is zero only
when z is the zero vector. But because of the way in which x and z are related (x = Pz
and z = PT x), x = 0 if and only if z = 0. Therefore, if all the eigenvalues are positive, the
quadratic form is positive definite. Conversely, assume the quadratic form q(x) is positive
definite, so that x^T Ax > 0 for all x ≠ 0. Then, letting x = ui be a unit eigenvector
corresponding to the eigenvalue λi , we find that
\[
q(u_i) = u_i^T A u_i = u_i^T \lambda_i u_i = \lambda_i u_i^T u_i = \lambda_i \|u_i\|^2 = \lambda_i > 0,
\]
so each eigenvalue λi of A is positive. Therefore we have shown the first part of the
following result (the other parts arise from similar reasoning):
Theorem 28.3.2 Suppose that the quadratic form q(x) is given by q(x) = xT Ax, where
AT = A. Then:
• q(x) is positive definite if and only if all eigenvalues of A are positive,
• q(x) is positive semi-definite if and only if all eigenvalues of A are non-negative,
• q(x) is negative definite if and only if all eigenvalues of A are negative,
• q(x) is negative semi-definite if and only if all eigenvalues of A are non-positive,
• q(x) is indefinite if and only if at least one eigenvalue of A is negative and at least
one eigenvalue of A is positive.
The above terminology extends to the symmetric matrix A itself. So, a symmetric matrix
A is
• positive definite if and only if all its eigenvalues are positive,
• positive semi-definite if and only if all its eigenvalues are non-negative,
• negative definite if and only if all its eigenvalues are negative,
• negative semi-definite if and only if all its eigenvalues are non-positive,
• indefinite if and only if at least one of its eigenvalues is negative and at least one of
its eigenvalues is positive.
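As a brief aside (not part of the notes), the eigenvalue tests above are straightforward to automate. A minimal Python sketch, assuming NumPy, classifies a symmetric matrix by the signs of its eigenvalues:

import numpy as np

def classify(A, tol=1e-12):
    # classify a symmetric matrix by the signs of its eigenvalues
    eig = np.linalg.eigvalsh(A)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semi-definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[5.0, -3.0], [-3.0, 5.0]])))  # positive definite (eigenvalues 2, 8)
print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))    # indefinite (eigenvalues -1, 3)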
A final note about quadratic forms in R2 is that they are directly related to conic sections.
Recall that conic sections are curves in R2 which are described by a Cartesian equation of
the form
Ax2 + Bxy + Cy 2 + Dx + Ey + F = 0
for some given real numbers A, B, C, D, E and F . Among these curves, any conic whose
Cartesian equation can be written in the simpler form
ax2 + 2cxy + by 2 = k
can be expressed as a matrix equation involving a quadratic form on its left hand side;
namely
xT Ax = k,
where
\[
x = \begin{pmatrix} x \\ y \end{pmatrix}
\quad\text{and}\quad
A = \begin{pmatrix} a & c \\ c & b \end{pmatrix} = A^T.
\]
The technique of orthogonal diagonalisation can be used in this context to find an orthonormal basis B of R2 so that the conic section is in standard position and orientation
with respect to the B-coordinate axes.
28.4 Exercises for self study
Exercise 28.4.1 (a) Orthogonally diagonalise the matrix
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.
\]
(b) Use the result in part (a) to sketch the curve xT Ax = 3 in the xy-plane.
Exercise 28.4.2 Consider the quadratic form
\[
g(x, y) = x^T A x,
\]
where x = (x, y)^T and A is the matrix encountered in Exercise 28.4.1. Also consider the
matrices P and D that accomplish the orthogonal diagonalisation of A in Exercise 28.4.1;
i.e., the matrices P and D such that
\[
P^{-1} A P = P^T A P = D.
\]
(a) Use the matrix P as a transition matrix PB from B-coordinates (X, Y ) to standard
coordinates (x, y); that is, let
\[
x = P_B (x)_B, \quad\text{i.e.,}\quad \begin{pmatrix} x \\ y \end{pmatrix} = P_B \begin{pmatrix} X \\ Y \end{pmatrix}.
\]
Then express the quadratic form g(x, y) = xT Ax in terms of X, Y .
(b) Hence, classify the quadratic form g(x, y).
Exercise 28.4.3 Let C be the curve defined by 3x^2 + 2√3 xy + 5y^2 = 6.
(a) Find a symmetric matrix A such that C is given by
xT Ax = 6.
(b) Orthogonally diagonalise A; i.e., find an orthogonal matrix P and a diagonal matrix
D such that
P−1 AP = PT AP = D.
(c) Consider P as a transition matrix PB from B-coordinates to standard coordinates and
hence sketch the curve C in the xy-plane, showing the standard and the B-coordinate axes
on your diagram.
Exercise 28.4.4 Denote the columns of the matrix P obtained in Exercise 28.4.3 by u1
and u2 (i.e., P = (u1 u2 )) and consider the matrix Q = (f1 f2 ) = (−u2 u1 ).
(a) Argue that Q is an orthogonal matrix.
(b) Consider Q as a transition matrix PE from E-coordinates to standard coordinates,
where E = {f1 , f2 } = {−u2 , u1 }. Then find the corresponding diagonal matrix F such that
Q−1 AQ = QT AQ = F.
(c) Make a new sketch of the curve C encountered in Exercise 28.4.3, showing the standard
and the E-coordinate axes on your diagram.
28.5 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 10.3, 11.1 and 11.2 of our Algebra Textbook are relevant.
29 Multivariate calculus, 1 of 5
29.1 Functions of Two Variables
Let D be a subset of R2 . A function f : D → R is a rule that assigns to each element
(x1 , x2 ) ∈ D a unique real number f (x1 , x2 ) ∈ R. Any such function f is called a real-valued
function of two variables. For simplicity, we will assume D = R2 .
The graph of f : R2 → R consists of all points (x1 , x2 , x3 ) ∈ R3 for which
x3 = f (x1 , x2 ).
This equation imposes a single restriction on the variables x1 , x2 , x3 and hence describes a
two-dimensional surface in R3 . In general, the equation x3 = f (x1 , x2 ) may be non-linear,
in which case the corresponding surface is curved and quite difficult to visualise. Computer
packages, such as Maple, can be useful aids for the purpose of visualising the graph of f .
A linear function f : R2 → R is a function of the form
f (x1 , x2 ) = ax1 + bx2
where a and b are given real numbers. The graph of f consists of all points (x1 , x2 , x3 ) ∈ R3
for which x3 = ax1 + bx2 . We can arrange this equation in the standard form ax1 + bx2 − x3 = 0.
This describes a non-vertical plane in R3 which passes through the origin and has a
normal vector (a, b, −1)^T .
An affine function f : R2 → R is a function of the form
f (x1 , x2 ) = ax1 + bx2 + c
where a, b and c are given real numbers. The graph of f consists of all points (x1 , x2 , x3 ) ∈
R3 for which x3 = ax1 +bx2 +c. In standard form this equation becomes ax1 +bx2 −x3 = −c.
It describes a non-vertical plane in R3 which does not pass through the origin (unless we
consider the linear case c = 0) and has a normal vector (a, b, −1)^T .
A homogeneous function of degree n is a function f : R2 → R which satisfies the
condition that, for all x1 , x2 and λ,
f (λx1 , λx2 ) = λn f (x1 , x2 ).
For example, the function
f (x1 , x2 ) = 4x1^3 − 5x1^2 x2 + x2^3
is homogeneous of degree 3 because
\[
f(\lambda x_1, \lambda x_2) = 4(\lambda x_1)^3 - 5(\lambda x_1)^2 (\lambda x_2) + (\lambda x_2)^3 = \lambda^3 (4x_1^3 - 5x_1^2 x_2 + x_2^3) = \lambda^3 f(x_1, x_2).
\]
The function f (x1 , x2 ) = x1 x2^4 − x1^3 x2^2 + 7x1^5 is homogeneous of degree 5 because each term
is homogeneous of degree 5. On the other hand, the function f (x1 , x2 ) = x1^2 + 6x1 x2 + x2
is not homogeneous because the first two terms are homogeneous of degree 2 but the last
term is homogeneous of degree 1.
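As a quick aside (not in the original notes), homogeneity can be verified symbolically. The sketch below, assuming SymPy, checks that the first function above is homogeneous of degree 3 while the last one is not homogeneous:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)

f = 4*x1**3 - 5*x1**2*x2 + x2**3
# for a homogeneous function, f(lam*x1, lam*x2)/f(x1, x2) reduces to a pure power of lam
print(sp.cancel(f.subs({x1: lam*x1, x2: lam*x2}) / f))   # lam**3, so degree 3

g = x1**2 + 6*x1*x2 + x2
ratio = sp.cancel(g.subs({x1: lam*x1, x2: lam*x2}) / g)
print(ratio.has(x1))   # True: the ratio still depends on x1, so g is not homogeneous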
Another important class of functions f : R2 → R consists of homogeneous functions of
degree 2 of the form
f (x1 , x2 ) = ax1^2 + bx2^2 ,
where a, b are given real numbers and (a, b) ≠ (0, 0). Their graphs belong to a class called
quadric surfaces. We shall examine an example of a quadric surface shortly, after we
discuss horizontal sections and vertical sections of a general graph x3 = f (x1 , x2 )
in R3 . The horizontal sections give rise to what is known as a contour map of the
corresponding surface.
Consider the surface corresponding to the graph of a function f : R2 → R; that is, the
surface in R3 described by the Cartesian equation x3 = f (x1 , x2 ). This surface may be
curved, as illustrated below:
Figure 29.1.1
Horizontal sections of the surface x3 = f (x1 , x2 ) correspond to cutting this surface with
horizontal planes x3 = c for various values of c. The curve of intersection of the surface
x3 = f (x1 , x2 ) with a horizontal plane of the form x3 = c is called a contour. Regarded as a
curve in R3 this contour consists of all points (x1 , x2 , x3 ) in R3 that satisfy the simultaneous
equations x3 = f (x1 , x2 ) and x3 = c. Different values of c give rise to different contours:
Figure 29.1.2
Projecting all these contours onto the x1 x2 -plane produces what is known as a contour
map of the surface x3 = f (x1 , x2 ). Each contour in the map is labelled by its characteristic
value c. Regarded as a curve on the x1 x2 -plane, the c-contour is described by the equation
c = f (x1 , x2 ). This equation is obtained by eliminating the variable x3 from the simultaneous system of equations x3 = f (x1 , x2 ) and x3 = c which describe the same contour as
a curve in R3 . In this way, the surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of
contours, all lying on a single copy of the x1 x2 -plane:
Figure 29.1.3
Vertical sections are similar to horizontal sections. The surface x3 = f (x1 , x2 ) is now
cut by vertical planes. These planes are usually chosen either parallel to the x2 x3 -plane
(that is, planes of the form x1 = a) or parallel to the x1 x3 -plane (that is, planes of the form
x2 = b). In the former case, the surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of
curves x3 = f (a, x2 ). Each value of a gives rise to a curve in this collection, and all these
curves are depicted on a single copy of the x2 x3 -plane. Similarly, in the latter case, the
surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of curves x3 = f (x1 , b) which are
depicted on a single copy of the x1 x3 -plane. Vertical sections where the vertical plane is
not parallel to the x1 x3 -plane or the x2 x3 -plane are also possible.
Let us illustrate these ideas by examining a few horizontal and vertical sections of a quadric
surface. Further examples can be found in section 3.1 of our Calculus textbook.
Example 29.1.4 Consider the function f : R2 → R given by
f (x1 , x2 ) = x21 + x22 .
Regarded as a curve in R3 , each contour of the graph x3 = x21 + x22 is described by the
simultaneous system
x3 = x21 + x22 and x3 = c.
By eliminating the variable x3 from the above system, we obtain
c = x21 + x22 ,
which describes the relevant contour as a curve on the x1 x2 -plane. This is a circle of radius
√c. Clearly, if c < 0 (i.e., if the horizontal plane x3 = c in R3 lies below the origin (0, 0, 0)),
there are no values of x1 , x2 that satisfy the equation c = x1^2 + x2^2 , so the graph of f and the
horizontal plane x3 = c do not intersect. Also observe that as the value of c increases (i.e.,
as the horizontal plane x3 = c moves higher in R3 ), the radius √c of the circle c = x1^2 + x2^2
increases. This information is depicted in the two graphs below:
Figure 29.1.5
Let us also consider a few vertical sections of the graph of f with vertical planes of the
form x2 = b. Regarded as a curve in R3 , each such curve of intersection is described by the
simultaneous system
x3 = x21 + x22 and x2 = b.
By eliminating the variable x2 from the above system, we obtain
x3 = x21 + b2 ,
which describes the relevant curve within the context of the x1 x3 -plane. Each such curve
is a parabola. Observing the first graph in Figure 29.1.5, it should be clear that as the
value of b2 increases (i.e., as the vertical plane x2 = b moves away from the origin of R3 ),
the lowest point on the intersection curve x3 = x21 + b2 moves upwards.
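Plots of this kind are easy to generate with a computer package. A minimal Python sketch (not part of the notes; it assumes NumPy and Matplotlib are installed) draws a contour map of f (x1 , x2 ) = x1^2 + x2^2 together with a few vertical sections x3 = x1^2 + b^2:

import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-3, 3, 200)
x2 = np.linspace(-3, 3, 200)
X1, X2 = np.meshgrid(x1, x2)
X3 = X1**2 + X2**2                          # the surface x3 = x1^2 + x2^2

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# horizontal sections x3 = c, projected onto the x1x2-plane (the contour map)
cs = ax1.contour(X1, X2, X3, levels=[1, 4, 9])
ax1.clabel(cs)
ax1.set_title('contours c = x1^2 + x2^2')

# vertical sections x2 = b, drawn on the x1x3-plane
for b in [0, 1, 2]:
    ax2.plot(x1, x1**2 + b**2, label='x2 = %d' % b)
ax2.legend()
ax2.set_title('vertical sections x3 = x1^2 + b^2')

plt.show()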
29.2 Partial derivatives
The partial derivative of f : R2 → R in the x1 -direction with x2 kept constant is defined
by:
\[
\frac{\partial f}{\partial x_1} = \lim_{h \to 0} \frac{f(x_1 + h, x_2) - f(x_1, x_2)}{h}.
\]
Similarly, the partial derivative of f in the x2 -direction with x1 kept constant is defined
by:
\[
\frac{\partial f}{\partial x_2} = \lim_{h \to 0} \frac{f(x_1, x_2 + h) - f(x_1, x_2)}{h}.
\]
The partial derivative symbol ∂ is used in place of the ordinary derivative symbol d in
order to emphasise that the function f is being differentiated with respect to one of its
variables while the other variable is kept constant.
A convenient notation for the partial derivatives with respect to x1 and x2 is fx1 and fx2 , respectively.
These should be regarded as analogous to the symbol f 0 used for an ordinary derivative.
For example, fx1 (a, b) means the partial derivative of f (x1 , x2 ) with respect to x1 evaluated
at the point (a, b).
The rules for partial differentiation follow the rules for ordinary differentiation, with the
understanding that the variable that is kept constant is treated as a fixed number. This
means that, for most practical applications, the definition of the partial derivative as a
limit is not required. Instead, one uses ordinary rules of differentiation. Let us see an
example:
Example 29.2.1 Find the partial derivatives fx1 and fx2 of the function f : R2 → R
given by
\[
f = x_2 \sin(x_1^3 + 5x_2) + x_1 x_2 + 4x_1 .
\]
Treating x2 as a fixed number, we find that
\[
f_{x_1} = x_2 \cos(x_1^3 + 5x_2)(3x_1^2) + x_2 + 4.
\]
Treating x1 as a fixed number, we find that
\[
f_{x_2} = \sin(x_1^3 + 5x_2) + x_2 \cos(x_1^3 + 5x_2)(5) + x_1 .
\]
The second-order derivatives fx1 x1 , fx1 x2 , fx2 x1 , fx2 x2 (as well as higher-order derivatives)
are calculated in a similar manner. For example, starting from
\[
f_{x_1} = x_2 \cos(x_1^3 + 5x_2)(3x_1^2) + x_2 + 4
\]
and now treating x1 as a fixed number, we find that
\[
f_{x_1 x_2} = \cos(x_1^3 + 5x_2)(3x_1^2) - x_2 (3x_1^2) \sin(x_1^3 + 5x_2)(5) + 1.
\]
Note that the mixed second-order derivatives of f commute; that is,
\[
f_{x_1 x_2} = f_{x_2 x_1}.
\]
All functions f : R2 → R considered in this course will have the above property.
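As an aside (not in the original notes), the calculations of Example 29.2.1 can be checked symbolically. The sketch below, assuming SymPy, computes fx1, fx2 and the mixed second-order derivatives, and confirms that the latter commute:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x2*sp.sin(x1**3 + 5*x2) + x1*x2 + 4*x1

fx1 = sp.diff(f, x1)       # matches x2*cos(x1^3 + 5x2)*(3x1^2) + x2 + 4
fx2 = sp.diff(f, x2)       # matches sin(x1^3 + 5x2) + x2*cos(x1^3 + 5x2)*(5) + x1
print(fx1)
print(fx2)

fx1x2 = sp.diff(f, x1, x2)
fx2x1 = sp.diff(f, x2, x1)
print(sp.simplify(fx1x2 - fx2x1) == 0)   # True: the mixed derivatives commute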
29.3 Geometrical interpretation of the partial derivatives
The partial derivatives fx1 and fx2 of a function f : R2 → R have the following geometric
meaning: Recall that the graph of f is a two-dimensional surface in R3 described by the
Cartesian equation x3 = f (x1 , x2 ). In general, the surface may be curved. Let us now
imagine slicing the surface x3 = f (x1 , x2 ) by the vertical plane x2 = b, where b is some real
number. In other words, let us consider the vertical section of the graph of f associated
with the vertical plane x2 = b. As already discussed, if the resulting curve of intersection
is regarded as a curve in R3 , its Cartesian description consists of the simultaneous system
of equations x3 = f (x1 , x2 ) and x2 = b. However, the same curve can also be regarded as
a curve lying on the two-dimensional vertical plane x2 = b. Then, it is described by the
Cartesian equation x3 = f (x1 , b), obtained by eliminating the variable x2 from the set of
equations x3 = f (x1 , x2 ) and x2 = b. In order to emphasise that x2 has been eliminated
and that x3 depends on x1 alone, we can rewrite the Cartesian equation of this curve as
x3 = g(x1 ), where g(x1 ) := f (x1 , b). The curve x3 = g(x1 ) can now be regarded as a curve
on the x1 x3 -plane, without any reference to x2 .
The partial derivative of f (x1 , x2 ) with respect to x1 evaluated at the point (a, b) is simply
the ordinary derivative of the function g(x1 ) evaluated at a; that is,
\[
\frac{\partial f}{\partial x_1}(a, b) = \frac{dg}{dx_1}(a).
\]
In other words, the partial derivative fx1 (a, b) is the slope of the curve x3 = g(x1 ) at a.
Note that the 2 × 1 direction vector of the tangent line to the curve x3 = g(x1 ) at a is
given by
\[
\begin{pmatrix} 1 \\ f_{x_1}(a, b) \end{pmatrix}.
\]
Indeed, regarded as a vector on the x1 x3 -plane, this vector describes a displacement of 1
unit in the x1 -direction and a displacement equal to the slope fx1 (a, b) in the x3 -direction.
The same vector can be regarded as a vector in R3 , in which case it becomes
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}.
\]
The component of this vector in the x2 -direction is zero because the line to which this
vector is tangent lies entirely on the plane x2 = b.
Similarly, we can imagine slicing the surface x3 = f (x1 , x2 ) by the vertical plane x1 = a.
If the resulting curve of intersection is regarded as a curve in R3 , its Cartesian description
consists of the set of equations x3 = f (x1 , x2 ) and x1 = a. The same curve can also be
regarded as a curve on the vertical plane x1 = a. In this case, it is described by the
Cartesian equation x3 = f (a, x2 ), obtained by eliminating the variable x1 from the set
of equations x3 = f (x1 , x2 ) and x1 = a. Since x3 depends on x2 alone, we can rewrite
the Cartesian equation of this curve as x3 = h(x2 ), where h(x2 ) := f (a, x2 ). The curve
x3 = h(x2 ) can now be regarded as a curve on the x2 x3 -plane, without any reference to x1 .
The partial derivative of the function f (x1 , x2 ) with respect to x2 evaluated at (a, b) is the
ordinary derivative of the function h(x2 ) evaluated at b; that is,
\[
\frac{\partial f}{\partial x_2}(a, b) = \frac{dh}{dx_2}(b).
\]
In other words, the partial derivative fx2 (a, b) is the slope of the curve x3 = h(x2 ) at b.
Note that the 2 × 1 direction vector of the tangent line to the curve x3 = h(x2 ) at b is given
by
\[
\begin{pmatrix} 1 \\ f_{x_2}(a, b) \end{pmatrix}.
\]
Indeed, regarded as a vector on the x2 x3 -plane, this vector describes a displacement of 1
unit in the x2 -direction and a displacement equal to the slope fx2 (a, b) in the x3 -direction.
The same vector can be regarded as a vector in R3 , in which case it becomes
\[
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}.
\]
The component of this vector in the x1 -direction is zero because the line to which this
vector is tangent lies entirely on the plane x1 = a.
Let us illustrate the geometrical objects introduced so far with the aid of an example. Note
that Maple can be a very useful tool here, because most surfaces are difficult to sketch by
hand or even to visualise.
Example 29.3.1 Consider the function f : R2 → R given by f (x1 , x2 ) = x1 2 + x2 2 .
Also consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
x1 = 0. Confirm that the partial derivative fx2 (0, 1) gives the slope of this curve when
x2 = 1. Consider also the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x2 = 1. Confirm that the partial derivative fx1 (0, 1) gives the slope of this
curve when x1 = 0.
Let us first consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical
plane x1 = 0. Regarded as a curve lying on a copy of the x2 x3 -plane, this curve is described
by the Cartesian equation
x3 = h(x2 ) := f (0, x2 ) = x22 .
The ordinary derivative h0 (x2 ) evaluated at the point x2 = 1 gives the slope of the curve
x3 = h(x2 ) at that point. We have h0 (x2 ) = 2x2 , so h0 (1) = 2. This is indeed equal to the
partial derivative fx2 (0, 1).
Similarly, let us consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x2 = 1. Regarded as a curve lying on a copy of the x1 x3 -plane, this curve is
described by the Cartesian equation
x3 = g(x1 ) := f (x1 , 1) = x21 + 1.
The ordinary derivative g 0 (x1 ) evaluated at the point x1 = 0 gives the slope of the curve
x3 = g(x1 ) at that point. We have g 0 (x1 ) = 2x1 , so g 0 (0) = 0. This is indeed equal to the
partial derivative fx1 (0, 1).
The relevant diagrams are presented below:
Figure 29.3.2
29.4
Tangent planes
Before we introduce the concept of the tangent plane to the graph of a function f : R2 → R
at a given point (a, b, f (a, b)) ∈ R3 on this graph, let us recall that a function f : R → R of
a single variable x may not admit a non-vertical tangent line at a given point (a, f (a)) ∈ R2
on its graph, because it may not be differentiable there. For example, the curve y = x^{1/3}
does not admit a non-vertical tangent line at (0, 0) because the derivative of the function
f (x) = x^{1/3} does not exist at 0. Similarly, we cannot expect that a general surface in R3
of the form x3 = f (x1 , x2 ) will always admit a non-vertical tangent plane. However, it
can be shown that if the function f (x1 , x2 ) is continuous and has continuous partial
derivatives fx1 (x1 , x2 ) and fx2 (x1 , x2 ) at a point (a, b), then the graph x3 = f (x1 , x2 ) in
R3 does admit a non-vertical tangent plane at the point (a, b, f (a, b)). In this case, we say that
f (x1 , x2 ) is differentiable at (a, b). Note that continuity and differentiability in R2 go
beyond the scope of this course, so we will not be concerned with the justification of this
last statement.
Provided that f (x1 , x2 ) is differentiable at (a, b), let us consider the vectors
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}
\]
introduced in the previous subsection. Recall that the first vector is the direction vector of
the tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical
plane x2 = b at the point (a, b, f (a, b)) and that, similarly, the second vector is the direction
vector of the tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x1 = a at (a, b, f (a, b)).
These two tangent lines define a unique plane that contains them, as illustrated below:
Figure 29.4.1
This is the plane which passes through the intersection point (a, b, f (a, b)) of these two
tangent lines and has the direction vectors of these lines as its direction vectors. It is called
the tangent plane to the surface x3 = f (x1 , x2 ) at the point (a, b, f (a, b)). Note that
a normal vector n for this plane can be found by requiring that n is orthogonal to the
direction vectors of the plane. Using the scalar product, it is easy to check that
\[
n = \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix}
\]
is orthogonal to both direction vectors and hence is the required vector. Hence, a vector
parametric equation in R3 describing this tangent plane is given by
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} a \\ b \\ f(a, b) \end{pmatrix}
+ s \begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}
+ t \begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}
\]
and a corresponding Cartesian equation is given by
\[
\left\langle \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} - \begin{pmatrix} a \\ b \\ f(a, b) \end{pmatrix},\;
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} \right\rangle = 0.
\]
The latter can be expressed in the form
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
Note that this is analogous to the Cartesian equation in R2 of the tangent line to the curve
y = f (x) at the point (a, f (a)); namely
y − f (a) = f 0 (a)(x − a).
Indeed, instead of the ordinary derivative f 0 (a) multiplying the difference (x − a) on the
right hand side of the Cartesian equation, we now have two partial derivatives fx1 (a, b) and
fx2 (a, b) multiplying the corresponding differences x1 − a and x2 − b. On the left hand side
of the Cartesian equation, we always have the dependent variable minus the output of the
function.
Remark 29.4.2 Since the Cartesian equation
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
is easy to memorise, it provides the best way for reconstructing all the geometrical objects
of interest: First, arranging the Cartesian equation in standard form, we read from the
coefficients of x1 , x2 and x3 that a normal vector n to the plane is
\[
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix}.
\]
Then we obtain the direction vectors of the plane by using the trick that two linearly
independent vectors perpendicular to n = (α, β, −1)^T are (1, 0, α)^T and (0, 1, β)^T . Finally, we
identify a position vector on the plane by using the point (a, b, f (a, b)).
Example 29.4.3 Find a parametric and a Cartesian equation for the tangent plane to
the surface x3 = f (x1 , x2 ) at the point (0, 2, 5), where f (x1 , x2 ) = x1 2 + x2 2 + 1.
We have fx1 = 2x1 and fx2 = 2x2 , so
fx1 (0, 2) = 0
and
fx2 (0, 2) = 4.
Hence, the direction vectors of the tangent plane are
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix},
\]
and the normal vector is
\[
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}.
\]
A Cartesian description for the tangent plane is therefore given (in scalar product form)
by
\[
\left\langle \begin{pmatrix} x_1 - 0 \\ x_2 - 2 \\ x_3 - 5 \end{pmatrix}, \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix} \right\rangle = 0;
\]
i.e., by
\[
x_3 - 5 = 0(x_1 - 0) + 4(x_2 - 2),
\]
and a vector parametric description is given by
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix} + s \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix}.
\]
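As a side note (not part of the notes), the recipe of this example can be mechanised. The sketch below, assuming SymPy, recomputes the tangent plane of Example 29.4.3 from the partial derivatives:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + x2**2 + 1
a, b = 0, 2                                   # point of tangency (0, 2, f(0, 2)) = (0, 2, 5)

fx1 = sp.diff(f, x1).subs({x1: a, x2: b})     # 0
fx2 = sp.diff(f, x2).subs({x1: a, x2: b})     # 4
f_ab = f.subs({x1: a, x2: b})                 # 5

tangent_plane = sp.Eq(x3 - f_ab, fx1*(x1 - a) + fx2*(x2 - b))
print(tangent_plane)                          # x3 - 5 = 4*(x2 - 2)
print(sp.Matrix([fx1, fx2, -1]).T)            # normal vector (0, 4, -1)^T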
29.5 Exercises for self study
Exercise 29.5.1 Using the simpler notation x1 = x, x2 = y, show that the function
f : R2 → R given by
\[
f(x, y) = \frac{x^3 + y^3}{x + y}
\]
is homogeneous of degree n = 2. Hence verify the so-called Euler’s formula valid for
homogeneous functions; that is, verify that
xfx + yfy = nf.
Exercise 29.5.2 For the function g : R2 → R given by
g(x, y) = 32x − 6x2 + 8xy + 16y − 3y 2 − 20,
find all the points (a, b, g(a, b)) ∈ R3 where the tangent plane to the graph of g is horizontal.
Exercise 29.5.3 Consider the function f : R2 → R defined by
f (x1 , x2 ) = 4x21 + x22 .
(a) Provide a Cartesian description for the graph of f in R3 .
(b) On a single copy of the x1 x2 -plane, sketch the curve of intersection of the graph with
the horizontal plane x3 = c for c = 0, 4, 16, i.e., sketch the contours f (x1 , x2 ) = c for
c = 0, 4, 16.
(c) On a single copy of the x1 x3 -plane, sketch the curve of intersection of the graph of f
with the vertical plane x2 = b for b = −1, 0, 1; i.e., sketch the vertical sections x3 = f (x1 , b)
for b = −1, 0, 1.
Now consider the vertical section x3 = g(x1 ) := f (x1 , 1).
(d) Evaluate the ordinary derivative g′(x1 ) at x1 = 2 and show that it is equal to the
partial derivative ∂f (x1 , x2 )/∂x1 evaluated at x1 = 2, x2 = 1.
Exercise 29.5.4 Let f : R2 → R be the function introduced in Exercise 29.5.3; i.e.,
f (x1 , x2 ) = 4x21 + x22 .
(a) Calculate the partial derivatives fx1 , fx2 .
(b) Find a Cartesian and a parametric description for the tangent plane Π to the graph of
f at the point (0, 2, 4) ∈ R3 .
(c) On a copy of the x1 x2 -plane, sketch the line ` of intersection of the plane Π with
the horizontal plane x3 = 4. On the same copy of the x1 x2 -plane, sketch the contour
f (x1 , x2 ) = 4.
(d) What is the relation between the line ` and the contour f (x1 , x2 ) = 4?
29.6 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.1, 3.2 and 3.3 of our Calculus Textbook are relevant.
30 Multivariate calculus, 2 of 5
30.1 The gradient
Consider a function f : R2 → R. The gradient of f , denoted by ∇f , is defined as the
column vector
\[
\nabla f := \begin{pmatrix} f_{x_1} \\ f_{x_2} \end{pmatrix}.
\]
For the purpose of interpreting the gradient geometrically, consider the surface x3 =
f (x1 , x2 ) in R3 and a point (a, b, f (a, b)) on this surface. We assume that f is differentiable at (a, b), which means that this surface admits a non-vertical tangent plane at
(a, b, f (a, b)). The Cartesian equation of this plane is
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
We also assume that fx1 (a, b) and fx2 (a, b) are not both zero. This ensures that the tangent
plane at (a, b, f (a, b)) is not horizontal. We further consider the intersection of the surface
x3 = f (x1 , x2 ) with the horizontal plane x3 = f (a, b). The resulting curve of intersection
is a contour containing the point (a, b, f (a, b)). This point lies on the contour because it
satisfies both equations x3 = f (x1 , x2 ) and x3 = f (a, b). The relevant geometrical objects
are illustrated below:
Figure 30.1.1
Let us focus in particular on the intersection of the horizontal plane x3 = f (a, b) with the
tangent plane x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b). Recall that the tangent
plane has been assumed non-horizontal, so the intersection of these two non-parallel planes
is a line. Regarded as a geometrical object in R3 , this line is described by the set of
equations
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
and
x3 = f (a, b).
The same line can be regarded as a line on the x1 x2 -plane, in which case its Cartesian
equation is
0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
This equation is obtained by eliminating x3 from the set of equations x3 = f (a, b) and
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) which define this line in R3 .
We repeat this process for the contour as well. Starting from the equations
x3 = f (x1 , x2 )
and
x3 = f (a, b)
that describe this contour as a curve in R3 , we eliminate x3 and derive the Cartesian
equation
f (x1 , x2 ) = f (a, b)
which describes the same contour as a curve on the x1 x2 -plane.
Both this contour and the line 0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) are illustrated on
the x1 x2 plane, below:
Figure 30.1.2
Observe that the line is tangent to the contour (this fact is proven in Exercise 30.5.1). Moreover, the coefficients of x1 and x2
in the Cartesian equation 0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) of this line are precisely
the components of the gradient ∇f (a, b). Therefore, the geometrical interpretation of the
gradient is the following:
∇f (a, b) is the normal vector to the contour f (x1 , x2 ) = f (a, b) at the point (a, b). Moreover, it can be shown that this vector points in the direction of increasing contour-values;
that is, if this vector is placed at the point (a, b) on the contour f (x1 , x2 ) = f (a, b), then
it points towards a contour f (x1 , x2 ) = c with c > f (a, b). This information has been
included in Figure 30.1.2 above.
Finally, note that if fx1 (a, b) and fx2 (a, b) are both zero, ∇f (a, b) vanishes and cannot be
considered as a normal vector to any contour. This is the reason why we excluded this
case from our previous discussion. Of course, points where both partial derivatives vanish
do arise, and there is nothing wrong with saying that the gradient is zero at these points.
It is only the claim that the gradient is a normal vector to a contour that fails there.
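A small numerical check of this interpretation (not part of the notes) can be done with NumPy for f (x1 , x2 ) = x1^2 + x2^2 at the point (3, 4): the gradient should be orthogonal to the tangent direction of the contour through that point (the direction d of Exercise 30.5.1):

import numpy as np

def grad_f(x1, x2):
    # gradient of f(x1, x2) = x1^2 + x2^2
    return np.array([2.0*x1, 2.0*x2])

a, b = 3.0, 4.0
g = grad_f(a, b)                    # [6, 8]

# tangent direction to the contour f = f(a, b): d = (f_x2(a, b), -f_x1(a, b))
d = np.array([g[1], -g[0]])

print(np.dot(g, d))                 # 0.0: the gradient is normal to the contour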
30.2 The derivative
We noted last time that the Cartesian equation
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
for the tangent plane to the surface x3 = f (x1 , x2 ) at the point (a, b, f (a, b)) resembles the
Cartesian equation
y − f (a) = f 0 (a)(x − a)
for the tangent line to the curve y = f (x) at the point (a, f (a)).
We would like to explore this similarity further by writing the equation of the tangent
plane in the compact form
\[
x_3 - f(a) = f'(a)(x - a),
\]
where the two-dimensional vectors x and a are given by x = (x1 , x2 )^T and a = (a, b)^T .
For this to be the case, the symbol f (a) should be understood as f (a, b) and the derivative
f ′(a) should correspond to the row vector
\[
f'(a) = \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \end{pmatrix}^T,
\]
where T indicates transposing. Then, indeed, the Cartesian equation of the tangent plane is recovered from the
equation x3 − f (a) = f ′(a)(x − a) via matrix multiplication. This consideration leads to
the following definition:
The derivative of f : R2 → R at a point x ∈ R2 is defined by
\[
f'(x) = \begin{pmatrix} f_{x_1}(x_1, x_2) \\ f_{x_2}(x_1, x_2) \end{pmatrix}^T.
\]
An alternative symbol for the derivative is Df . Note that the derivative Df and the
gradient ∇f are simply the transpose of each other.
30.3 Directional derivatives
The directional derivative generalises the concept of the partial derivative of a function
f : R2 → R.
Recall that the partial derivative fx1 (a, b) is the slope of the tangent line to the curve
of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane x2 = b measured at
(a, b, f (a, b)), and that the partial derivative fx2 (a, b) is the slope of the tangent line to the
curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane x1 = a measured
at (a, b, f (a, b)).
We would like to extend this idea and find the slope, measured at (a, b, f (a, b)), of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with a vertical plane
through (a, b, f (a, b)) which is not necessarily aligned with one of the horizontal coordinate
axes. To this end, let us introduce a general horizontal direction vector u on the x1 x2 -plane,
and let us define the directional derivative fu (a, b) as the slope at (a, b, f (a, b)) of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
through (a, b, f (a, b)) aligned with the direction vector u. This definition indeed reproduces
the geometric meaning of fx1 (a, b) when u = (1, 0)^T and that of fx2 (a, b) when u = (0, 1)^T .
In the illustration below, we can see the surface x3 = f (x1 , x2 ), a general point (a, b, f (a, b))
on this surface, as well as the curves of intersection of this surface with a number of vertical
planes through (a, b, f (a, b)), each corresponding to a different choice of the horizontal
direction vector u.
Figure 30.3.1
The key observation is that the tangent lines at (a, b, f (a, b)) to these curves of intersection
all belong to the tangent plane to the surface at (a, b, f (a, b)). Hence, the direction vectors
of these tangent lines are all perpendicular to the normal vector (fx1 (a, b), fx2 (a, b), −1)^T of this plane.
Among these direction vectors, recall that (1, 0, fx1 (a, b))^T is the direction vector of the tangent
line associated with u = (1, 0)^T . This is because the vertical plane x2 = b that contains this
tangent line is aligned with the horizontal direction u = (1, 0)^T . Similarly, (0, 1, fx2 (a, b))^T is the
direction vector of the tangent line associated with the vector u = (0, 1)^T . This is because
the vertical plane x1 = a that contains this tangent line is aligned with the horizontal
direction u = (0, 1)^T :
Figure 30.3.2
Now note that the vector (1, 0, fx1 (a, b))^T implies that, along the corresponding tangent line,
a displacement of 1 unit in the x1 -direction results in a displacement equal to the slope
fx1 (a, b) of this line in the x3 -direction. Similarly, the vector (0, 1, fx2 (a, b))^T implies that,
along the corresponding tangent line, a displacement of 1 unit in the x2 -direction results
in a displacement equal to the slope fx2 (a, b) of this line in the x3 -direction.
Following the same reasoning, let (u1 , u2 , fu (a, b))^T be the direction vector of the tangent line
associated with u = (u1 , u2 )^T . The vertical plane that contains this tangent line is the vertical
plane through (a, b, f (a, b)) aligned with the horizontal direction u = (u1 , u2 )^T . In order for
fu (a, b) to have the meaning of the slope of this tangent line, the vector (u1 , u2 , fu (a, b))^T must
imply that a displacement of 1 unit in the u-direction results in a displacement equal to the
slope fu (a, b) in the x3 -direction.
For this to be the case, the vector u must be of unit length; that is, we must have
that √(u1^2 + u2^2) = 1. Otherwise, the third component fu (a, b) of (u1 , u2 , fu (a, b))^T cannot be
interpreted as a slope. Indeed, the slope of a line in the direction (α, β, γ)^T is equal to the
vertical displacement γ only if the horizontal displacement has length equal to 1; that is,
only if √(α^2 + β^2) = 1. For example, the vectors (1, 0)^T and (0, 1)^T are both of unit length, and
this fact is consistent with the partial derivatives in (1, 0, fx1 (a, b))^T and (0, 1, fx2 (a, b))^T being
the slopes of the corresponding tangent lines.
Now, with the horizontal direction vector u being a unit vector, it is easy to find an
expression for the directional derivative fu (a, b) in terms of u and the gradient ∇f (a, b)
by using the fact that the vector (u1 , u2 , fu (a, b))^T belongs to the tangent plane at (a, b, f (a, b)).
In particular, (u1 , u2 , fu (a, b))^T must be perpendicular to the normal vector (fx1 (a, b), fx2 (a, b), −1)^T
of this plane; that is, we must have
\[
\left\langle \begin{pmatrix} u_1 \\ u_2 \\ f_u(a, b) \end{pmatrix}, \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} \right\rangle = 0.
\]
Solving this equation for the directional derivative fu (a, b), we find that
\[
f_u(a, b) = \left\langle \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \end{pmatrix} \right\rangle = \langle u, \nabla f(a, b) \rangle, \quad \text{where } \|u\| = 1.
\]
Of course, the direction of the directional derivative fw (a, b) may be described equally well
by a horizontal vector w which is not necessarily of unit length. The above formula then
needs to be adjusted so that it becomes applicable:
\[
f_w(a, b) = \left\langle \frac{w}{\|w\|}, \nabla f(a, b) \right\rangle, \quad \text{where } w \neq 0.
\]
This formula reduces to the previous one when ||w|| = 1.
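As an aside (not in the notes), the formula can be compared with a finite-difference estimate of the slope. The sketch below, assuming NumPy, does this for f (x1 , x2 ) = x1^2 + x2^2 at (3, 3) in the direction w = (1, 1)^T:

import numpy as np

def f(x):
    return x[0]**2 + x[1]**2

def grad_f(x):
    return np.array([2.0*x[0], 2.0*x[1]])

a = np.array([3.0, 3.0])
w = np.array([1.0, 1.0])              # not of unit length
u = w / np.linalg.norm(w)             # normalise first

directional = np.dot(u, grad_f(a))    # inner product of w/||w|| with grad f(a, b)

t = 1e-6                              # small displacement in the direction u
estimate = (f(a + t*u) - f(a)) / t

print(directional)                    # 6*sqrt(2), approximately 8.485
print(estimate)                       # approximately the same value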
30.4 The rate of change of a function f : R2 → R
Consider the expression
\[
f_w(a, b) = \left\langle \frac{w}{\|w\|}, \nabla f(a, b) \right\rangle,
\]
which gives the directional derivative of f in the direction w ≠ 0. Recall that the right
hand side of this equation is equal to the product of the lengths of the vectors w/||w|| and
∇f (a, b) multiplied by the cosine of the angle θ between them. Since w/||w|| is a unit vector,
we find that fw (a, b) = ||∇f (a, b)|| cos(θ).
Clearly, the maximum value that fw (a, b) can have is ||∇f (a, b)||. This occurs when
cos(θ) = 1, i.e., when θ = 0, and hence corresponds to the vector w pointing in the
direction of the gradient ∇f (a, b) itself; i.e., in the direction orthogonal to the contour
f (x1 , x2 ) = f (a, b) at the point (a, b), towards increasing contour values.
Now recall that the directional derivative fw (a, b) measures the slope at (a, b, f (a, b)) of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
through (a, b, f (a, b)) aligned with the horizontal vector w. Hence, more simply, we can say
that fw (a, b) measures the rate of change of f at (a, b, f (a, b)) in the direction w. It follows
that the maximum rate of increase of f at (a, b, f (a, b)) is ||∇f (a, b)|| and occurs in the
direction w = ∇f (a, b). In other words, the rate of change of f in this direction is positive
and given by f∇f (a,b) (a, b) = ||∇f (a, b)||. The maximum rate of decrease of f at (a, b, f (a, b))
is also ||∇f (a, b)|| (in absolute value) and occurs in the direction w = −∇f (a, b), i.e., when
θ = π and cos(θ) = −1. In other words, the rate of change of f in this direction is negative
and given by f−∇f (a,b) (a, b) = −||∇f (a, b)||. Finally, fw (a, b) = 0 when θ = π/2, in which
case w is perpendicular to the gradient ∇f (a, b) and therefore tangent to the contour at
(a, b, f (a, b)).
30.5 Exercises for self study
Exercise 30.5.1 Consider the line ` of intersection of the horizontal plane x3 = f (a, b)
with the tangent plane x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b); i.e., the line
described in the context of the x1 x2 -plane by the Cartesian equation
0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
Also consider the contour c of intersection of the horizontal plane x3 = f (a, b) with the
graph x3 = f (x1 , x2 ) of the function f ; i.e., the contour described in the context of the
x1 x2 -plane by the Cartesian equation
f (x1 , x2 ) = f (a, b).
(a) Starting from the above equation and using implicit differentiation, show that the
tangent line to the contour c at (x1 , x2 ) = (a, b) has direction vector
\[
d = \begin{pmatrix} f_{x_2}(a, b) \\ -f_{x_1}(a, b) \end{pmatrix}.
\]
(b) Hence show that the line ` is the tangent line to the contour c at (x1 , x2 ) = (a, b).
Exercise 30.5.2 Consider the function f : R2 → R given by
f (x1 , x2 ) = x21 + x22 .
Also consider the point (3, 3, 18) on the graph of f . This point lies on the contour c,
described in the context of the x1 x2 -plane by the Cartesian equation
x21 + x22 = 18.
(a) Find the directional derivative fu (3, 3) in the direction u = (1, 0)^T and show that it
is equal to the partial derivative fx1 (3, 3). Sketch the contour c on the x1 x2 -plane and
indicate the direction u on your graph as a vector starting at the point (3, 3).
(b) Find the directional derivative fv (3, 3) in the horizontal direction v = (1, 1)^T . Indicate
the direction v on your graph as a vector starting at the point (3, 3).
(c) Calculate fx1 (3, 3) by using the definition of the partial derivative
\[
f_u(3, 3) = f_{x_1}(3, 3) = \lim_{t \to 0^+} \frac{f(3 + t, 3) - f(3, 3)}{t}.
\]
The above relation tells us that, in the context of R3 , the slope fx1 (3, 3) is the change in
the vertical distance f (3 + t, 3) − f (3, 3) divided by the change in the horizontal distance
\[
\left\| \begin{pmatrix} 3 + t \\ 3 \end{pmatrix} - \begin{pmatrix} 3 \\ 3 \end{pmatrix} \right\| = \sqrt{t^2 + 0^2} = t
\]
associated with a displacement tu = t(1, 0)^T in the horizontal x1 -direction.
(d) Calculate fv (3, 3) using a similar limit and verify your value for fv (3, 3) obtained in
part (b).
Exercise 30.5.3 Consider the function f : R2 → R given by f (x1 , x2 ) = x1 2 + x2 2 .
(a) Obtain a Cartesian description in R3 for the contour c obtained by slicing the surface
x3 = x1 2 + x2 2 by the horizontal plane x3 = 25.
(b) Obtain a Cartesian and a vector parametric description in R3 for the tangent plane Π
to the surface x3 = x1 2 + x2 2 at the point (3, 4, 25).
(c) Hence obtain a Cartesian and a parametric equation in R3 for the tangent line ` to the
contour at the point (3, 4, 25).
Now eliminate x3 from the description of the contour c and the line `; i.e., regard c and `
as geometric objects on the x1 x2 -plane.
(d) Obtain a Cartesian description for c as well as a Cartesian and a vector parametric
description for ` within the context of the x1 x2 -plane.
Exercise 30.5.4 Consider the contour described in the context of the x1 x2 -plane by the
Cartesian equation
x21 + x22 = 25.
(a) Sketch this contour and identify the point (3, 4) on your graph.
(b) Use the argument based on implicit differentiation (presented in Exercise 30.5.1) to
find a 2 × 1 direction vector d for the tangent line to this contour at (3, 4).
(c) Confirm that the vector d is orthogonal to the gradient vector ∇f (3, 4), where f (x1 , x2 ) =
x21 + x22 .
(d) Also confirm that ∇f (3, 4) points in the direction of increasing contour values; i.e., if
∇f (3, 4) starts at (3, 4), then it points towards a contour f (x1 , x2 ) = k where k > 25.
30.6 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.4, 3.5 and 3.6 of our Calculus Textbook are relevant.
31 Multivariate calculus, 3 of 5
31.1 Functions of n variables
Let D be a subset of Rn . A function f : D → R is a rule that assigns to each point
(x1 , x2 , ..., xn ) ∈ D a unique real number f (x1 , x2 , ..., xn ) ∈ R. Any such function f is
called a real-valued function of n variables. For simplicity, we will assume D = Rn .
The graph of f : Rn → R consists of all points (x1 , x2 , ...xn , xn+1 ) ∈ Rn+1 for which
xn+1 = f (x1 , x2 , ..., xn ). It corresponds to what is known as a hypersurface in Rn+1 . A
hypersurface in Rn+1 is an n-dimensional subset of Rn+1 which is generally curved.
If f is a linear function of the form f (x1 , x2 , ..., xn ) = a1 x1 + a2 x2 + ... + an xn , the hypersurface
xn+1 = a1 x1 + a2 x2 + ... + an xn
is an n-dimensional hyperplane in Rn+1 through the origin. If f is an affine function of the
form f (x1 , x2 , ..., xn ) = a1 x1 + a2 x2 + ... + an xn + c, the hypersurface
xn+1 = a1 x1 + a2 x2 + ... + an xn + c
is an n-dimensional hyperplane in Rn+1 which does not contain the origin (unless c is
zero). In other words, linear and affine functions f : Rn → R produce graphs which are
n-dimensional flats in Rn+1 . General functions f : Rn → R produce graphs which are
curved.
31.2 Tangent hyperplanes
By identical reasoning to that of the case n = 2, the tangent hyperplane to the hypersurface
xn+1 = f (x1 , x2 , ..., xn ) in Rn+1 at the point (a1 , a2 , ..., an , f (a1 , a2 , ..., an )) is described by
the Cartesian equation
xn+1 − f (a1 , a2 , ..., an ) = fx1 (a1 , a2 , ..., an )(x1 − a1 ) + · · · + fxn (a1 , a2 , ..., an )(xn − an ).
A vector n perpendicular to this hyperplane can be identified by looking at the coefficients
of x1 , x2 , ..., xn , xn+1 . It is given by
\[
n = \begin{pmatrix} f_{x_1}(a_1, a_2, \ldots, a_n) \\ f_{x_2}(a_1, a_2, \ldots, a_n) \\ \vdots \\ f_{x_n}(a_1, a_2, \ldots, a_n) \\ -1 \end{pmatrix}.
\]
A vector parametric equation can be derived from the Cartesian equation via the Gaussian
elimination method.
Example 31.2.1 Find a Cartesian and a vector parametric equation in R4 for the tangent
hyperplane to the hypersurface x4 = x1 2 + x2 2 + x3 2 at the point (1, 2, 3, 14).
The partial derivatives are fx1 = 2x1 , fx2 = 2x2 , fx3 = 2x3 , so at (1, 2, 3) they become
fx1 = 2, fx2 = 4, fx3 = 6. A Cartesian description in R4 for the 3-dimensional tangent
hyperplane to the 3-dimensional hypersurface x4 = x21 + x22 + x23 is given by
x4 − 14 = 2(x1 − 1) + 4(x2 − 2) + 6(x3 − 3),
which yields
2x1 + 4x2 + 6x3 − x4 = 14.
A vector parametric description can be obtained by Gaussian elimination using the augmented matrix (2 4 6 − 1 | 14). Equivalently, to avoid fractions, we can treat x4 as the
leading variable and regard x1 = s, x2 = t, x3 = λ as free parameters. Solving for x4 , we
get x4 = 2s + 4t + 6λ − 14; i.e.,
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ -14 \end{pmatrix}
+ s \begin{pmatrix} 1 \\ 0 \\ 0 \\ 2 \end{pmatrix}
+ t \begin{pmatrix} 0 \\ 1 \\ 0 \\ 4 \end{pmatrix}
+ \lambda \begin{pmatrix} 0 \\ 0 \\ 1 \\ 6 \end{pmatrix}.
\]
Note that the three
 direction vectors are
fx1 (1, 2, 3)
2
fx2 (1, 2, 3)  4 
  
n=
fx3 (1, 2, 3) =  6  to the hyperplane.
−1
−1
31.3
perpendicular
to
the
normal
Stationary points
A stationary point of a function f : Rn → R is a point where all partial derivatives of f
become zero. At any such point, the Cartesian equation of the tangent hyperplane reduces
to xn+1 − f (a1 , a2 , ..., an ) = 0. When n = 2, the tangent hyperplane is just a regular plane
in R3 , and the equation x3 − f (a1 , a2 ) = 0 implies that this plane is horizontal. Similarly,
when n = 1, the tangent hyperplane is just a line in R2 , and the equation x2 − f (a1 ) = 0
implies that this line is horizontal. For n > 2, the word “horizontal” may not have the
same meaning but is still convenient to use. So we will continue using it.
In order to find the stationary points of f : Rn → R, we need to solve a set of n equations
(generally, non-linear), obtained by setting all the partial derivatives of f to zero. A simple
example follows, but several more examples and exercises can be found in our calculus
textbook. Note that solving simultaneous systems of non-linear equations is significantly
harder than solving linear systems, so some practice may be needed.
Example 31.3.1 Find the stationary points of the function f : R3 → R, given by
\[
f(x_1, x_2, x_3) = x_1^3 + x_1 x_2 - x_2 x_3.
\]
We have
\[
\begin{cases}
f_{x_1} = 3x_1^2 + x_2 = 0 \\
f_{x_2} = x_1 - x_3 = 0 \\
f_{x_3} = -x_2 = 0.
\end{cases}
\]
The last equation implies that x2 = 0. Then, the first equation implies that x1 = 0, which
also makes x3 = 0 on the basis of the second equation. So only the origin (0, 0, 0) is a
stationary point.
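As a side note (not part of the notes), simultaneous systems like this can also be solved symbolically. The sketch below, assuming SymPy, recovers the stationary point of Example 31.3.1:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**3 + x1*x2 - x2*x3

# set all partial derivatives equal to zero and solve the (generally non-linear) system
equations = [sp.diff(f, v) for v in (x1, x2, x3)]
print(equations)                           # [3*x1**2 + x2, x1 - x3, -x2]
print(sp.solve(equations, (x1, x2, x3)))   # [(0, 0, 0)]: only the origin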
31.4 Contours, gradient and directional derivatives
Consider the graph of a function f : Rn → R. As already discussed, this is an n-dimensional hypersurface in Rn+1 described by the equation xn+1 = f (x1 , x2 , ..., xn ). A
contour corresponds to the intersection of this hypersurface with the n-dimensional “horizontal” hyperplane xn+1 = c. In Rn+1 , this contour is described by the set of equations
xn+1 = f (x1 , x2 , ..., xn ) and xn+1 = c. Two equations in Rn+1 eliminate two degrees of freedom and hence imply that this contour is an (n − 1)-dimensional object. The same contour
is described in the n-dimensional x1 x2 ...xn -space by the single equation c = f (x1 , x2 , ..., xn ),
obtained by eliminating xn+1 from the set of equations xn+1 = f (x1 , x2 , ..., xn ) and xn+1 =
c.
The gradient ∇f of f : Rn → R is a vector contained in the n-dimensional x1 x2 ...xn space. In this sense, it is a “horizontal” vector. This vector is normal to the contour
c = f (x1 , x2 , ..., xn ) and points in the direction of increasing contour-values.
The directional derivative in the ‘horizontal’ direction u (i.e., a direction contained in
the n-dimensional x1 x2 ...xn -space) is defined by
\[
f_u = \left\langle \nabla f, \frac{u}{\|u\|} \right\rangle.
\]
It generalises the concept of the partial derivative of f and gives the rate of change of f in
the direction u. Exercise 31.8.1 provides a review of all these concepts.
31.5 Vector-valued functions
For completeness, let us also consider functions whose domain and codomain are both multidimensional spaces. We will not study tangent spaces, contours and directional derivatives
for vector-valued functions but we will study the differentiation of such functions by the
chain rule, as this arises frequently in practical applications.
Let D be a subset of Rn . A function f : D → Rm is a rule that assigns to each vector
\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in D
\quad\text{a unique vector}\quad
\begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix} \in R^m.
\]
Any such function f is called a vector-valued function. The m real-valued functions
f1 (x1 , x2 , ..., xn ), f2 (x1 , x2 , ..., xn ), ..., fm (x1 , x2 , ..., xn ) are called the component functions of f . The domain D will be assumed equal to Rn for simplicity.
Let (x1 , x2 , ..., xn , xn+1 , xn+2 , ..., xn+m ) be a Cartesian coordinate system for Rn+m . The n
coordinates (x1 , x2 , ..., xn ) correspond to the independent variables in the domain of f . The
m coordinates (xn+1 , xn+2 , ..., xn+m ) correspond to the dependent variables in the codomain
of f .
The graph of f : Rn → Rm consists of all points (x1 , x2 , ..., xn , xn+1 , xn+2 , ..., xn+m ) ∈ Rn+m
for which
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix}.
\]
Since we have m independent equations for (n + m) variables, the graph of f is an n-dimensional surface in Rn+m . This is consistent with the fact that there are n independent
variables in the domain of f , so the graph of f is an n-dimensional object in Rn+m .
Example 31.5.1 Interpret geometrically the graph of f : R2 → R3 whose component
functions are given by f1 (x1 , x2 ) = 4x1 +x2 , f2 (x1 , x2 ) = x1 x2 and f3 (x1 , x2 ) = x1 2 +x2 2 +1.
The graph of this function is in R5 . The coordinates (x1 , x2 ) correspond to the domain of f
and the coordinates (x3 , x4 , x5 ) correspond to the codomain of f . The graph of f : R2 → R3
consists of all points (x1 , x2 , x3 , x4 , x5 ) ∈ R5 for which
\[
\begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 4x_1 + x_2 \\ x_1 x_2 \\ x_1^2 + x_2^2 + 1 \end{pmatrix}.
\]
Each of these equations describes a 4-dimensional hypersurface in R5 , and the graph of f is
the intersection of these three hypersurfaces. Since there are three independent equations
for five variables, the graph of f is a 2-dimensional surface in R5 . This is in agreement with
the fact that there are two independent variables in the domain of f .
The graph of a linear vector-valued function f : Rn → Rm has the form
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\]
where each aij ∈ R. This graph corresponds to an n-dimensional flat in Rn+m that contains
the origin.
Similarly, the graph of an affine vector-valued function f : Rn → Rm can be expressed in
the matrix form
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
+ \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix},
\]
where each aij ∈ R and each ck ∈ R. This graph corresponds to an n-dimensional flat in
Rn+m that does not contain the origin unless each ck = 0.
The derivative of f : Rn → Rm at a general point x ∈ Rn is defined as the matrix
\[
f'(x) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(x) & \dfrac{\partial f_1}{\partial x_2}(x) & \ldots & \dfrac{\partial f_1}{\partial x_n}(x) \\[2mm]
\dfrac{\partial f_2}{\partial x_1}(x) & \dfrac{\partial f_2}{\partial x_2}(x) & \ldots & \dfrac{\partial f_2}{\partial x_n}(x) \\[2mm]
\vdots & & & \vdots \\[2mm]
\dfrac{\partial f_m}{\partial x_1}(x) & \dfrac{\partial f_m}{\partial x_2}(x) & \ldots & \dfrac{\partial f_m}{\partial x_n}(x)
\end{pmatrix}.
\]
In other words, row i (where 1 ≤ i ≤ m) consists of the n partial derivatives of the
component function fi (x). Note that this expression reduces to the derivative f 0 (x) =
(∇f (x))T of a scalar-valued function f : Rn → R if we set m = 1.
31.6 The general chain rule
We are now in a position to extend the chain rule to compositions of vector-valued functions. The rule is called general because vector-valued functions incorporate all the functions encountered so far in this course. Let us begin by recalling the chain rule involving
the composition f ◦ g of two scalar-valued functions f : R → R and g : R → R of a
single variable. Let f be given by y = f (x) and g be given by x = g(t). Then, by
the chain rule, the derivative dy(t)/dt of the composite function y(t) = f (g(t)) is equal to
\[
\frac{dy(t)}{dt} = \frac{df(x)}{dx}\bigg|_{x=g(t)} \frac{dg(t)}{dt}.
\]
This result can be expressed in a clearer, adapted, notation as
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt},
\]
with the understanding that dy(x)/dx is evaluated at x = g(t).
Example 31.6.1 In an adapted notation, let y(x) = x^3 and x(t) = sin(t). The composite function y(t) = y(x(t)) is given by y(t) = sin^3 (t). By applying the chain rule
dy(t)/dt = (dy(x)/dx)(dx(t)/dt), we find that
\[
\frac{dy(t)}{dt} = 3x^2 \cos(t) = 3\sin^2(t) \cos(t).
\]
Let us consider the composition f ◦ g of a scalar-valued function f : Rn → R of n variables
and a vector-valued function g : R → Rn consisting of n component functions of a single
variable.
Let f be given by y = f (x) and g be given by x = g(t). Then, the composite function
y(t) = f (g(t)) is a scalar-valued function of a single variable. Its derivative dy(t)/dt is equal
to the matrix product
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt}.
\]
As we discussed previously, dy(x)/dx is a 1 × n row vector and dx(t)/dt is an n × 1 column
vector.
Example 31.6.2 Let y(x) = x1^2 + x2 x3 + x3 be a scalar-valued function of three variables
and let x(t) = (t^2 + t, 3t + 1, t^4 − 5)^T be a vector-valued function consisting of three component
functions of a single variable. Then, the composite function y(t) is a scalar-valued function
of a single variable given by
\[
y(t) = (t^2 + t)^2 + (3t + 1)(t^4 - 5) + (t^4 - 5).
\]
The chain rule dy(t)/dt = (dy(x)/dx)(dx(t)/dt) gives
\[
\frac{dy(t)}{dt} = \begin{pmatrix} 2x_1 & x_3 & x_2 + 1 \end{pmatrix} \begin{pmatrix} 2t + 1 \\ 3 \\ 4t^3 \end{pmatrix}.
\]
Performing the matrix multiplication and expressing the answer in terms of t we find that
\[
\frac{dy(t)}{dt} = 2(t^2 + t)(2t + 1) + (t^4 - 5)(3) + [(3t + 1) + 1](4t^3).
\]
You may confirm this result by differentiating the composite function y(t) = (t2 + t)2 +
(3t + 1)(t4 − 5) + (t4 − 5) directly.
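As an aside (not in the notes), this chain-rule calculation can be verified symbolically. The sketch below, assuming SymPy, forms the matrix product of Example 31.6.2 and compares it with the derivative of the composite function computed directly:

import sympy as sp

x1, x2, x3, t = sp.symbols('x1 x2 x3 t')

y = x1**2 + x2*x3 + x3                        # scalar-valued function of three variables
x = sp.Matrix([t**2 + t, 3*t + 1, t**4 - 5])  # vector-valued function of t

dy_dx = sp.Matrix([[sp.diff(y, v) for v in (x1, x2, x3)]])   # 1 x 3 row vector
dx_dt = sp.diff(x, t)                                        # 3 x 1 column vector

subs = {x1: x[0], x2: x[1], x3: x[2]}         # evaluate dy/dx at x = x(t)
chain = sp.expand((dy_dx.subs(subs) * dx_dt)[0])

direct = sp.expand(sp.diff(y.subs(subs), t))  # differentiate the composite directly
print(sp.simplify(chain - direct) == 0)       # True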
Finally, let us consider the composition f ◦ g of a vector-valued function
f : Rn → Rm consisting of m component functions of n variables and a vector-valued
function g : Rk → Rn consisting of n component functions of k variables. Let f be given
by y = f (x) and g be given by x = g(t). Then, the composite function y(t) = f (g(t)) is a
vector-valued function consisting of m component functions of k variables. Its derivative
dy(t)/dt is equal to the matrix product
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt},
\]
where dy(x)/dx is an m × n matrix and dx(t)/dt is an n × k matrix.
Example 31.6.3  Let

    y(x) = [ x1 x4   ]
           [ x2 + x3 ]

be a vector-valued function consisting of two component functions of four variables and let

           [ t1 + t3 ]
    x(t) = [  t1 t2  ]
           [  t3^2   ]
           [   t2    ]

be a vector-valued function consisting of four component functions of three variables.
Then, the composite function y(t) = y(x(t)) is a vector-valued function consisting of 2
component functions of 3 variables given by

    y(t) = [ (t1 + t3) t2 ]
           [ t1 t2 + t3^2 ] .

Applying the chain rule dy(t)/dt = (dy(x)/dx)(dx(t)/dt) we obtain

                                     [ 1    0    1   ]
    dy(t)/dt = [ x4  0  0  x1 ]      [ t2   t1   0   ]
               [ 0   1  1  0  ]      [ 0    0    2t3 ]
                                     [ 0    1    0   ] .

Performing the matrix multiplication and expressing the answer in terms of t we find that

    dy(t)/dt = [ t2   t1 + t3   t2  ]
               [ t2   t1        2t3 ] .

You may confirm this result by calculating the derivative of the composite function
y(t) = ( (t1 + t3) t2 ,  t1 t2 + t3^2 )^T directly.
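As a hedged numerical check of Example 31.6.3 (the probe point t = (1, 2, 3), the helper `jacobian` and the step size are illustrative choices, not part of the notes), one can compare the product of the two derivative matrices with a finite-difference derivative of the composite function.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the derivative matrix of f at x."""
    x = np.asarray(x, dtype=float)
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

# g : R^3 -> R^4 and f : R^4 -> R^2 from Example 31.6.3
g = lambda t: np.array([t[0] + t[2], t[0] * t[1], t[2] ** 2, t[1]])
f = lambda x: np.array([x[0] * x[3], x[1] + x[2]])

t = np.array([1.0, 2.0, 3.0])
product = jacobian(f, g(t)) @ jacobian(g, t)     # (dy/dx at x = g(t)) times (dx/dt)
direct  = jacobian(lambda s: f(g(s)), t)         # derivative of the composite function
print(np.allclose(product, direct))              # expected: True
```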
31.7  Adapting the chain rule
Sometimes it becomes necessary to adapt the chain rule, as in the following two examples.

Example 31.7.1  The length x1, the width x2 and the height x3 of a rectangular box are
all functions of time according to x1(t) = 2t, x2(t) = t^2 and x3(t) = t. Find the rate of
change of the volume as a function of time.

The volume of the box is a scalar-valued function of three variables given by
V(x) = x1 x2 x3. As a function of t, the volume V is expressed by the composite function
V(t) = V(x(t)) = (2t)(t^2)(t) = 2t^4. Differentiating this expression directly, we find that
the rate of change of the volume is dV(t)/dt = 8t^3. Alternatively, using the chain rule
dV(t)/dt = (dV(x)/dx)(dx(t)/dt) we obtain the expression

                                          [ 2  ]
    dV(t)/dt = [ x2 x3   x1 x3   x1 x2 ]  [ 2t ] .
                                          [ 1  ]

Multiplying the matrices and expressing the answer in terms of t we confirm that

    dV(t)/dt = 2(t^2)(t) + 2t(2t)(t) + (2t)(t^2) = 8t^3.
Example 31.7.2  Let y = f(x1, x2, x3) be a scalar-valued function of three variables
and suppose that x2 and x3 both depend on x1 via some functions x2 = g(x1) and
x3 = h(x1). Adapt the chain rule in order to express the derivative dy(x1)/dx1 of the
function y(x1) = f(x1, g(x1), h(x1)) as a function of x1.

Let us adapt the chain rule to this case by letting x = (x1, x2, x3)^T and considering x as
a vector-valued function of the single variable x1 according to

            [   x1   ]
    x(x1) = [ g(x1)  ] .
            [ h(x1)  ]

Then, the function y(x1) = f(x1, g(x1), h(x1)) can be regarded as the composition
y(x1) = f(x(x1)). The chain rule dy(x1)/dx1 = (df(x)/dx)(dx(x1)/dx1) gives us the
matrix product

                                                       [     1       ]
    dy(x1)/dx1 = [ ∂f(x)/∂x1   ∂f(x)/∂x2   ∂f(x)/∂x3 ] [ dg(x1)/dx1  ] ,
                                                       [ dh(x1)/dx1  ]

which can be written as

    dy(x1)/dx1 = ∂f(x)/∂x1 + (∂f(x)/∂x2)(dg(x1)/dx1) + (∂f(x)/∂x3)(dh(x1)/dx1).

This result can be expressed in the alternative (adapted) notation

    dy(x1)/dx1 = ∂y(x)/∂x1 + (∂y(x)/∂x2)(dx2(x1)/dx1) + (∂y(x)/∂x3)(dx3(x1)/dx1).

The use of the ordinary derivative on the left hand side makes clear that we regard y as a
function of x1 alone, via the composite function f(x(x1)). Similarly, the use of the partial
derivatives on the right hand side makes clear that we regard y as a function of all three
variables x1, x2, x3, via the original function f(x).

Another way of arriving at the same result (which is less formal than the previous) is by
realising that the expression y(x1) = f(x1, g(x1), h(x1)) implies that y responds to changes
of x1 via three different channels: (i) directly, (ii) via x2 and (iii) via x3.
Hence the total response of y to changes of x1 is obtained by combining these three
contributions according to

    dy(x1)/dx1 = ∂f(x)/∂x1 + (∂f(x)/∂x2)(dg(x1)/dx1) + (∂f(x)/∂x3)(dh(x1)/dx1).
31.8  Exercises for self study
Exercise 31.8.1  Consider the three-dimensional hypersurface in R^4 described by the
Cartesian equation

    x4 = f(x1, x2, x3) = x1^2 + x2^2 + x3^2.

(a) Slice this hypersurface by the ‘horizontal’ hyperplane in R^4 described by the equation
x4 = 14 and obtain a Cartesian description for the resulting two-dimensional contour,
regarded as a geometrical object in R^4.
Also consider the point (1, 2, 3, 14) ∈ R^4 that lies on this contour.
(b) Find the gradient at the point (1, 2, 3) on this contour (where now the contour is
regarded as a geometrical object in the ‘horizontal’ x1 x2 x3 -space) and write down the
Cartesian equation of the plane tangent to this contour at (1, 2, 3).
(c) Calculate the directional derivative fu(1, 2, 3) in the ‘horizontal’ direction
u = (1, 1, 1)^T by using the formula based on the scalar product.
(d) Finally, repeat the calculation of fu(1, 2, 3) by using the definition of the directional
derivative as a limit. Confirm that you get the same answer.
Exercise 31.8.2  (a) The length and width of a rectangle decrease at the rate of 2cm per
minute and 3cm per minute respectively. When the length of the rectangle is 6m and the
width is 3m, how fast is the area changing?
(b) Suppose that z = f(x, y) is a function of two variables x and y, and that y depends on
x via the function y = g(x). Write down an expression for the ordinary derivative dz/dx
in terms of the partial derivatives of f and the ordinary derivative of g.
Exercise 31.8.3  Let f(x, y, z) = 3x^2 + 2y^2 − z^2.
(a) Find ∂f/∂x, ∂f/∂y and ∂f/∂z.
(b) Obtain a Cartesian equation in R^4 for the tangent hyperplane to the hypersurface
u = 3x^2 + 2y^2 − z^2 at (1, 1, 1, 4).
(c) Obtain in the context of the xyz-space a Cartesian equation for the plane that is
tangent to the surface

    3x^2 + 2y^2 − z^2 = 13

at the point (2, −1, −1).
Write down its normal vector as a 3 × 1 vector in the context of the xyz-space and also as
a 4 × 1 ‘horizontal’ vector in the context of R^4.
Exercise 31.8.4  (a) The function f : R^3 → R^2 with component functions f1 and f2 is
defined by

    u = f1(x, y, z) = x^2 + y^2 + z^2;    v = f2(x, y, z) = x − y.

Find all the points x = (x, y, z)^T such that f(x) = (8, 0)^T, and describe the curve
consisting of these points.
(b) Write down the derivative f'(x) of f.
31.9  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.8, 5.1, 5.2, 5.3 and 5.4 of our Calculus Textbook are relevant.
32  Multivariate calculus, 4 of 5

32.1  The second derivative of a function
The second derivative d²f(x)/dx² of a function f : R^n → R is defined by taking the
derivative of the (transpose of the) derivative of f(x); that is, the derivative of the
gradient ∇f(x):

    d²f(x)/dx² = d/dx [ (df(x)/dx)^T ] = d/dx [ ∇f(x) ].

This definition results in a symmetric n × n matrix whose entries are the second order
partial derivatives of f(x). The common notation for the second derivative of f(x) is
f''(x).
Example 32.1.1  Find the second derivative f''(x) of the function f : R^3 → R given by
f(x) = 2 x1 x2 + x3^4.

The first derivative f'(x) is the 1 × 3 row vector given by

    f'(x) = [ ∂f(x)/∂x1   ∂f(x)/∂x2   ∂f(x)/∂x3 ].

Applying this definition to the given function f(x) = 2 x1 x2 + x3^4 we find that

    f'(x) = [ 2x2   2x1   4x3^3 ].

Transposing the derivative of f we obtain the gradient of f, which is the 3 × 1 column
vector

                         [ ∂f(x)/∂x1 ]   [  2x2  ]
    ∇f(x) = (f'(x))^T =  [ ∂f(x)/∂x2 ] = [  2x1  ] .
                         [ ∂f(x)/∂x3 ]   [ 4x3^3 ]
We can now regard the gradient ∇f(x) as a vector-valued function consisting of three
component functions of three variables. Hence, its derivative (which is the second
derivative of f(x)) is given by the 3 × 3 matrix

    d²f(x)/dx² = d/dx [ ∇f(x) ]

                   [ ∂²f/∂x1²      ∂²f/∂x2∂x1   ∂²f/∂x3∂x1 ]   [ 0   2   0      ]
                 = [ ∂²f/∂x1∂x2    ∂²f/∂x2²     ∂²f/∂x3∂x2 ] = [ 2   0   0      ] .
                   [ ∂²f/∂x1∂x3    ∂²f/∂x2∂x3   ∂²f/∂x3²   ]   [ 0   0   12x3^2 ]

The reason that this matrix is symmetric is that the partial derivatives of f(x) commute;
that is, for all i and j in the set {1, 2, 3}, we have that ∂²f(x)/∂xi∂xj = ∂²f(x)/∂xj∂xi.
This will always be the case for the functions considered in this course. Functions which
do not have this property do exist but will not be covered.
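As a hedged numerical cross-check of Example 32.1.1 (the helper `hessian`, the step size and the evaluation point are illustrative choices, not part of the notes), the second derivative can be approximated by central differences; the computed matrix should be symmetric up to rounding error.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Approximate the n x n matrix of second-order partial derivatives of f at x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: 2 * x[0] * x[1] + x[2] ** 4      # the function of Example 32.1.1
print(hessian(f, [1.0, 1.0, 1.0]))             # close to [[0,2,0],[2,0,0],[0,0,12]]
```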
32.2  Taylor polynomial for a scalar-valued function
Recall that the Taylor polynomial P2(x) of degree two for a twice-differentiable function
f : R → R about a point a ∈ R is given by

    P2(x) = f(a) + f'(a)(x − a) + (1/2) f''(a)(x − a)^2.

The graph of P2(x) approximates the graph of the function f(x) near the point a in the
sense that P2(a) = f(a), P2'(a) = f'(a) and P2''(a) = f''(a).

Having now defined the first and the second derivative of a twice-differentiable
scalar-valued function f : R^n → R of n variables, it is straightforward to proceed to the
corresponding second-order Taylor polynomial.

If the point of expansion is a = (a1, a2, ..., an)^T ∈ R^n, the Taylor polynomial P2 of f
about a is given by

    P2(x) = f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a).

Notice that f(a) is a scalar, the first derivative f'(a) is a 1 × n row vector, the difference
x − a is an n × 1 column vector and the second derivative f''(a) is an n × n matrix. The
order of the matrix multiplications is such that every term in the Taylor polynomial P2(x)
is a scalar.

Example 32.2.1  Find the Taylor polynomial P2(x) about a = (3, 2, 1)^T ∈ R^3 for the
function f : R^3 → R given by f(x) = 2 x1 x2 + x3^4.
The first derivative is given by

    df(x)/dx = [ 2x2   2x1   4x3^3 ],    so    f'(a) = [ 4   6   4 ].

The second derivative is given by

                  [ 0   2   0      ]                     [ 0   2   0  ]
    d²f(x)/dx² =  [ 2   0   0      ] ,   so   f''(a) =   [ 2   0   0  ] .
                  [ 0   0   12x3^2 ]                     [ 0   0   12 ]

Finally, we have that f(a) = 13. Hence, the Taylor polynomial P2(x) is given by

                             [ x1 − 3 ]                                          [ 0  2  0  ] [ x1 − 3 ]
    P2(x) = 13 + [ 4  6  4 ] [ x2 − 2 ] + (1/2) [ x1 − 3   x2 − 2   x3 − 1 ]     [ 2  0  0  ] [ x2 − 2 ] ,
                             [ x3 − 1 ]                                          [ 0  0  12 ] [ x3 − 1 ]

which simplifies to

    P2(x) = 2 x1 x2 + 3 − 8 x3 + 6 x3^2.

The graph of P2(x) in R^4 approximates the graph of f(x) near the point (3, 2, 1, 13) ∈ R^4
in the sense that P2(x) and f(x) agree at a, their first derivatives P2'(x) and f'(x) agree at
a and their second derivatives P2''(x) and f''(x) agree at a. You may find verifying this
fact useful.
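The agreement between f and P2 near the point of expansion can also be inspected numerically. The sketch below simply transcribes the quantities of Example 32.2.1; the probe point near a is an arbitrary illustrative choice.

```python
import numpy as np

a      = np.array([3.0, 2.0, 1.0])
f      = lambda x: 2 * x[0] * x[1] + x[2] ** 4
grad_a = np.array([4.0, 6.0, 4.0])                        # f'(a) as a row vector
hess_a = np.array([[0, 2, 0], [2, 0, 0], [0, 0, 12.0]])   # f''(a)

def P2(x):
    d = x - a
    return f(a) + grad_a @ d + 0.5 * d @ hess_a @ d

x = a + np.array([0.01, -0.02, 0.015])
print(f(x), P2(x))    # the two values agree to high accuracy near a
```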
32.3  Classification of stationary points based on the Taylor polynomial P2
The key result established for differentiable functions of a single variable still holds: a
differentiable function f : R^n → R can have a local extremum only at a stationary point
a ∈ R^n; that is, a point where the derivative f'(a) is equal to the 1 × n zero vector. The
definitions of local extrema and strict local extrema are extended to R^n in a
straightforward way. For example, a local maximum of f : R^n → R is a point a ∈ R^n
with the property that f(a) ≥ f(x) for all x sufficiently near a. If a stationary point a of
f : R^n → R is neither a local maximum nor a local minimum, it is called a saddle point.

In order to classify a stationary point a of a twice-differentiable function f : R^n → R we
can consider the Taylor polynomial approximation of the function f about a:

    f(x) ≈ f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a).

Since f'(a) = 0 at the stationary point a, we find that

    f(x) ≈ f(a) + (1/2)(x − a)^T f''(a)(x − a).
This implies that the scalar difference f(x) − f(a) is given approximately by

    f(x) − f(a) ≈ (1/2)(x − a)^T f''(a)(x − a)    for all x near a.

We therefore deduce that the stationary point a is:

(i) a local minimum (in fact, a strict local minimum) if
(1/2)(x − a)^T f''(a)(x − a) > 0 for all x near a such that x ≠ a,

(ii) a local maximum (in fact, a strict local maximum) if
(1/2)(x − a)^T f''(a)(x − a) < 0 for all x near a such that x ≠ a.

Note that (x − a)^T f''(a)(x − a) is a quadratic form in the variable ‘x − a’. Hence we
deduce that the point a is

(i) a local minimum (in fact, a strict local minimum) if f''(a) is positive definite,

(ii) a local maximum (in fact, a strict local maximum) if f''(a) is negative definite.

(iii) On the other hand, if f''(a) is indefinite, then a is a saddle point.

It may also happen that f''(a) is none of the above; that is, f''(a) may be positive
semi-definite but not positive definite or it may be negative semi-definite but not negative
definite. Can we conclude that a is a non-strict local minimum in the first case? Similarly,
can we conclude that a is a non-strict local maximum in the second case?

The answer is ‘no’: if none of (i), (ii), (iii) holds, the test based on the quadratic Taylor
polynomial (and hence on the second derivative f''(a)) is inconclusive. This is because if
none of (i), (ii), (iii) holds, the quadratic form (x − a)^T f''(a)(x − a) fails to reproduce the
behaviour of the function f near a. A higher-order Taylor polynomial is needed.
Example 32.3.1  Consider the function f : R^2 → R given by

    f(x1, x2) = −x1^3 + 4 x1 x2 − 2 x2^2 + 1.

Show that (x1, x2) = (0, 0) is a stationary point of f and classify it by considering the
second-order Taylor polynomial of f about (0, 0).

The partial derivatives of f are fx1 = −3x1^2 + 4x2 and fx2 = 4x1 − 4x2. Both partial
derivatives become zero at (0, 0), so (0, 0) is indeed a stationary point.

The second derivative of f is

    f''(x1, x2) = [ −6x1   4  ] ,   which becomes   [ 0   4  ]   when evaluated at (0, 0).
                  [  4    −4  ]                     [ 4  −4  ]

Therefore, the second-order Taylor approximation of f about (0, 0) is

    f(x1, x2) ≈ 1 + (1/2) [ x1   x2 ] [ 0   4 ] [ x1 ]
                                      [ 4  −4 ] [ x2 ] .
In order to deduce the nature of the point (0, 0), we consider the eigenvalues of f''(0, 0).
The characteristic polynomial gives

    det [ −λ      4    ]  =  λ² + 4λ − 16  =  (λ + 2)² − 20,
        [  4   −4 − λ  ]

so the eigenvalues are

    λ1 = −2 + √20 > 0    and    λ2 = −2 − √20 < 0,

and hence f''(0, 0) is indefinite. We conclude that (0, 0) is a saddle point.
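Since f''(0, 0) is a small symmetric matrix, its definiteness can also be read off its eigenvalues with numpy.linalg.eigvalsh. The following is a minimal sketch of the eigenvalue test used above (the helper name and tolerance are illustrative choices), not a general-purpose classifier.

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of the symmetric matrix H = f''(a)."""
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > tol):
        return "strict local minimum (positive definite)"
    if np.all(lam < -tol):
        return "strict local maximum (negative definite)"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point (indefinite)"
    return "test inconclusive (semi-definite)"

print(classify(np.array([[0.0, 4.0], [4.0, -4.0]])))   # f''(0, 0) of Example 32.3.1: saddle point
```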
Example 32.3.2  Show that the function f of Example 32.3.1 has another stationary
point and use the second-order Taylor approximation about that point to deduce that it is
a local maximum.

The partial derivatives of f are fx1 = −3x1^2 + 4x2 and fx2 = 4x1 − 4x2. To find all the
stationary points of f we require that these derivatives are equal to zero. The equation
fx2 = 0 implies that x1 = x2. Using this relation in the equation fx1 = 0 we find that
x1(−3x1 + 4) = 0. This implies that x1 = 0 or x1 = 4/3. Hence, besides (0, 0), there is a
stationary point at (x1, x2) = (4/3, 4/3).

The second derivative of f is

    f''(x1, x2) = [ −6x1   4  ] ,   which becomes   [ −8   4  ]   when evaluated at (4/3, 4/3).
                  [  4    −4  ]                     [  4  −4  ]

The second-order Taylor approximation of f about (4/3, 4/3) is given by

    f(x1, x2) ≈ f(4/3, 4/3) + (1/2) [ x1 − 4/3   x2 − 4/3 ] [ −8   4 ] [ x1 − 4/3 ]
                                                            [  4  −4 ] [ x2 − 4/3 ] .

The eigenvalues of f''(4/3, 4/3) are found by solving the characteristic polynomial equation

    det [ −8 − λ     4    ]  =  λ² + 12λ + 16  =  (λ + 6)² − 20  =  0,
        [   4     −4 − λ  ]

which gives

    λ1 = −6 + √20 < 0    and    λ2 = −6 − √20 < 0.

Hence f''(4/3, 4/3) is negative definite, so (4/3, 4/3) is a (strict) local maximum.
Example 32.3.3  Consider the function f : R^2 → R given by f(x1, x2) = x1^2 + x2^3.
Classify the stationary point(s) of f.

The partial derivatives of f are fx1 = 2x1 and fx2 = 3x2^2, so (0, 0) is the only stationary
point of f. The second derivative of f is

    f''(x1, x2) = [ 2    0   ] ,   which becomes   [ 2   0 ]   at (0, 0).
                  [ 0   6x2  ]                     [ 0   0 ]

Accordingly, the Taylor polynomial P2 of f about (0, 0) is

    P2(x1, x2) = (1/2) [ x1   x2 ] [ 2   0 ] [ x1 ]
                                   [ 0   0 ] [ x2 ] .

The eigenvalues of f''(0, 0) are λ1 = 2 and λ2 = 0, so f''(0, 0) is positive semi-definite
but not positive definite and the test is inconclusive. We can understand why the test fails
here. Performing the matrix multiplications, we see that

    P2(x1, x2) = x1^2.

Indeed, P2 approximates f near (0, 0) only to second-order degree of accuracy, which is
why the cubic term x2^3 of f does not appear in P2. Observing that for small positive ε,
f(0, ε) = ε^3 > 0 and f(0, −ε) = −ε^3 < 0, it is clear that (0, 0) is a saddle point. Although
it is true that the quadratic form P2(x1, x2) = x1^2 is positive semi-definite and hence
cannot take negative values, this fact cannot be used in order to draw conclusions about
the nature of the stationary point (0, 0).
32.4  Classifying f'' using the principal minors
We saw in the previous subsection that, given a stationary point a of a twice differentiable
function f : R^n → R, a conclusive classification for a arises only if f''(a) is positive
definite, negative definite or indefinite. It is possible to decide whether or not f''(a) is one
of these three types of matrices by using a test based on the so-called principal minors of
f''(a). In many cases, this test is much faster than finding the eigenvalues of f''(a), so let
us present it.

Let a ∈ R^n be a stationary point of a twice-differentiable function f : R^n → R and
consider f''(a). As already discussed, the second derivative of f is the n × n symmetric
matrix

              [ fx1x1(a)   fx1x2(a)   ...   fx1xn(a) ]
              [ fx2x1(a)   fx2x2(a)   ...   fx2xn(a) ]
    f''(a) =  [    ...        ...     ...      ...   ] .
              [ fxnx1(a)   fxnx2(a)   ...   fxnxn(a) ]

The principal minors of f''(a) are the determinants of the top left hand square
sub-matrices of f''(a); that is, they are the following numbers:

    det[ fx1x1(a) ],   det [ fx1x1(a)   fx1x2(a) ] ,   ...,   det f''(a).
                           [ fx2x1(a)   fx2x2(a) ]

Consider any of the above k × k sub-matrices of f''(a); i.e., any of the above principal
minors.

• If k is an even number, the principal minor is referred to as an even principal minor.

• If k is an odd number, the principal minor is referred to as an odd principal minor.
For example, det[ fx1x1(a) ] is an odd principal minor of f''(a) since k = 1, and

    det [ fx1x1(a)   fx1x2(a) ]
        [ fx2x1(a)   fx2x2(a) ]

is an even principal minor of f''(a) since k = 2.

The following results are stated without proof:

(i) The matrix f''(a) is positive definite if and only if all the principal minors of f''(a)
are strictly greater than zero.

(ii) The matrix f''(a) is negative definite if and only if all even principal minors of
f''(a) are strictly greater than zero and all odd principal minors of f''(a) are strictly
less than zero.

(iii) If the principal minors of f''(a) do not follow any of the above two patterns and,
additionally, det(f''(a)) ≠ 0, then f''(a) is indefinite.

If det(f''(a)) = 0, the classification test for f''(a) based on the principal minors of f''(a)
is inconclusive, but f''(a) can still be classified using the eigenvalue test.
Example 32.4.1  Classify the stationary points (0, 0) and (4/3, 4/3) of the function f
presented in Examples 32.3.1 and 32.3.2 by using the test based on the principal minors.

The second derivative of the function f is

    f''(x1, x2) = [ −6x1   4  ] .
                  [  4    −4  ]

At the point (0, 0), f''(0, 0) = [ 0  4 ; 4  −4 ], the odd principal minor of f''(0, 0) is
det(fx1x1(0, 0)) = 0 and the even principal minor of f''(0, 0) is det(f''(0, 0)) = −16. Since
det(f''(0, 0)) ≠ 0, the test is conclusive. The matrix f''(0, 0) is neither positive definite
nor negative definite, hence (0, 0) is a saddle point, in agreement with our findings in
Example 32.3.1.

At the point (4/3, 4/3), f''(4/3, 4/3) = [ −8  4 ; 4  −4 ], the odd principal minor of
f''(4/3, 4/3) is det(fx1x1(4/3, 4/3)) = −8 and the even principal minor is
det(f''(4/3, 4/3)) = 16. Since det(f''(4/3, 4/3)) ≠ 0, the test is conclusive. The matrix
f''(4/3, 4/3) is negative definite, hence (4/3, 4/3) is a (strict) local maximum, in agreement
with our findings in Example 32.3.2.
Example 32.4.2  Classify the following 2 × 2 symmetric matrices using the test based on
the principal minors. Whenever this fails, switch to the eigenvalue test.

    (a) [ 1  1 ],   (b) [ 1  1 ],   (c) [ −2   0 ],   (d) [ −1  0 ],   (e) [ 1  2 ].
        [ 1  3 ]        [ 1  1 ]        [  0  −3 ]        [  0  0 ]        [ 2  1 ]

(a) The odd and even principal minors of [ 1  1 ; 1  3 ] are 1 and 2 respectively. So this is
a positive definite matrix.

(b) The odd and even principal minors of [ 1  1 ; 1  1 ] are 1 and 0 respectively. Since the
determinant is zero, we cannot use the classification based on the principal minors. The
eigenvalues are found by solving

    det [ 1 − λ     1    ]  =  λ² − 2λ  =  λ(λ − 2)  =  0.
        [   1     1 − λ  ]

Hence λ1 = 0 and λ2 = 2, which implies that the matrix is positive semi-definite but not
positive definite.

(c) The odd and even principal minors of [ −2  0 ; 0  −3 ] are −2 and 6 respectively. So
this is a negative definite matrix.

(d) The odd and even principal minors of [ −1  0 ; 0  0 ] are −1 and 0 respectively. Since
the determinant is zero, we cannot use the classification based on the principal minors.
The eigenvalues are λ1 = −1 and λ2 = 0, so this matrix is negative semi-definite but not
negative definite.

(e) The odd and even principal minors of [ 1  2 ; 2  1 ] are 1 and −3 respectively. So this
is an indefinite matrix.
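For the 2 × 2 case the principal-minor test reduces to checking the top-left entry and the determinant. The sketch below (the helper name and tolerance are illustrative choices, not part of the notes) reproduces the pattern of parts (a) to (e), falling back to the eigenvalue test when the determinant vanishes.

```python
import numpy as np

def classify_2x2(A, tol=1e-12):
    """Apply the principal-minor test to a 2 x 2 symmetric matrix A = f''(a)."""
    m1, m2 = A[0, 0], np.linalg.det(A)        # odd and even principal minors
    if abs(m2) < tol:                         # zero determinant: the minor test is inconclusive
        lam = np.linalg.eigvalsh(A)
        if np.all(lam >= -tol):
            return "positive semi-definite (not positive definite)"
        if np.all(lam <= tol):
            return "negative semi-definite (not negative definite)"
        return "indefinite"
    if m1 > 0 and m2 > 0:
        return "positive definite"
    if m1 < 0 and m2 > 0:
        return "negative definite"
    return "indefinite"

for A in ([[1, 1], [1, 3]], [[1, 1], [1, 1]], [[-2, 0], [0, -3]],
          [[-1, 0], [0, 0]], [[1, 2], [2, 1]]):
    print(classify_2x2(np.array(A, dtype=float)))
```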
32.5  Convex sets, convex and concave functions f : R^n → R
Recall that a convex function f : R → R has the property that the line segment between
any two points on its graph lies above or on this graph. A concave function f : R → R
has the property that the line segment between any two points on its graph lies below or
on this graph.

Let us also recall the alternative description of concavity and convexity for functions
f : R → R, relying on the idea of a convex set in R^2. A convex set in R^2 is a set S such
that for any two position vectors x, y in S, the line segment joining x and y lies entirely
in S. Formally, given position vectors x, y in S, the line segment joining x and y is the set
of all position vectors v described by the parametric equation v = x + t(y − x), where the
parameter t ∈ R satisfies 0 ≤ t ≤ 1. Therefore, a set S ⊆ R^2 is convex if for all position
vectors x, y in S and for all t ∈ R such that 0 ≤ t ≤ 1, we have that the position vector
v = x + t(y − x) is also in S.

In that context, a convex function f : R → R is defined as a function with the property
that the set S ⊆ R^2 of position vectors lying above or on the graph of f in R^2 is a convex
set. Also, a concave function f : R → R is defined as a function with the property that
the set S ⊆ R^2 of position vectors lying below or on the graph of f in R^2 is a convex set.

Let us now extend the definition of a convex set from R^2 to any Euclidean space R^k: a
convex set in R^k is a set S such that for any two position vectors x, y in S, the line
segment {v | v = x + t(y − x), t ∈ R, 0 ≤ t ≤ 1} joining x and y lies entirely in S.

A function f : R^n → R is a convex function if the set of points lying above or on the
graph of f in R^(n+1) is a convex set, and f is a concave function if the set of points lying
below or on the graph of f in R^(n+1) is a convex set.
32.6  Convexity and concavity for twice differentiable functions
If a function f : R^n → R is twice differentiable, it is possible to determine whether or not
it is concave or convex by examining its quadratic Taylor polynomial.

Recall that if a differentiable function f : R → R is convex, then all the tangent lines to
its graph in R^2 lie below or on this graph. Similarly, if a differentiable function f : R → R
is concave, then all the tangent lines to its graph in R^2 lie above or on this graph.

By analogy, if a differentiable function f : R^n → R is convex, then all the tangent
hyperplanes to its graph in R^(n+1) lie below or on this graph, and if a differentiable
function f : R^n → R is concave, then all the tangent hyperplanes to its graph in R^(n+1)
lie above or on this graph. Illustrations of these statements when n = 2 can be found in
section 6.4 of our textbook.

Hence, given that the graph of f : R^n → R is described by the Cartesian equation
x_{n+1} = f(x) (where x ∈ R^n) and that the tangent hyperplane at a general point
(a, f(a)) on this graph is described by the Cartesian equation
x_{n+1} = f(a) + f'(a)(x − a), it follows that for all x and a:

    f(x) ≤ f(a) + f'(a)(x − a) if f is concave;

i.e., any tangent hyperplane is above or on the graph, and

    f(x) ≥ f(a) + f'(a)(x − a) if f is convex;

i.e., any tangent hyperplane is below or on the graph.

Assuming, further, that f is twice differentiable, we can use its quadratic Taylor
polynomial about an arbitrary point a,

    f(x) ≈ f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a),

in order to obtain the following equivalent statements: for all x and a,

• f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a) ≤ f(a) + f'(a)(x − a) if f is concave,
and

• f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a) ≥ f(a) + f'(a)(x − a) if f is convex.

Simplifying, we derive equivalent statements that involve only the quadratic Taylor term.
Specifically, for all x and a,

• (x − a)^T f''(a)(x − a) ≤ 0 if f is concave, and

• (x − a)^T f''(a)(x − a) ≥ 0 if f is convex.

Linking these results to our classification of symmetric matrices, we see that a twice
differentiable function f : R^n → R is

• concave if and only if f''(a) is negative semi-definite for all a ∈ R^n, and
• convex if and only if f''(a) is positive semi-definite for all a ∈ R^n.
The importance of convex and concave functions for optimisation problems is given by
the following theorem, stated without proof:

Theorem 32.6.1  (i) If f''(a) is negative semi-definite for all a ∈ R^n, then a local
maximum of f is automatically a global maximum. (ii) If f''(a) is positive semi-definite
for all a ∈ R^n, then a local minimum of f is automatically a global minimum.
Example 32.6.2  Consider the function f : R^2 → R defined by

    f(x1, x2) = 2x1^2 + 2 x1 x2 + x2^2 − 4x1.

Find its stationary points. Investigate if this function has any global extrema and, if yes,
identify them.

Since f is differentiable in R^2, we know that a local extremum can only appear at a
stationary point of f. In addition, we know that a global extremum is also a local
extremum, so the only candidates for global extrema are the stationary points of f.

In order to find the stationary points of f we set its partial derivatives equal to zero. This
results in the system of equations 4x1 + 2x2 − 4 = 0 and 2x1 + 2x2 = 0. The second
equation implies that x2 = −x1. Hence, the first equation becomes 2x1 − 4 = 0, which
implies that x1 = 2. Therefore, the only stationary point of f is (x1, x2) = (2, −2).

In order to classify this stationary point and also determine whether or not f is concave
or convex, we calculate the second derivative of f. We have

    f''(a) = [ 4   2 ]    for all a ∈ R^2.
             [ 2   2 ]

The odd and even principal minors are 4 and 4. Hence, f''(a) is a positive definite matrix
for all a ∈ R^2.

In particular, f''(2, −2) is a positive definite matrix and therefore the stationary point
(2, −2) is a local minimum (in fact, a strict local minimum). Now, since f''(a) is positive
definite and hence positive semi-definite for all a ∈ R^2, f is a convex function. It follows
that (2, −2) is a global minimum. Moreover, since (2, −2) is the only stationary point of
f, it is actually the unique global minimum. Note that f cannot have a global maximum,
because there is no other stationary point and hence no candidate for a global maximum.
In order to confirm this directly, let us evaluate f at points of the form (0, t). We have
f(0, t) = t^2. Hence, as t → ∞, we see that f(0, t) → ∞. The fact that f grows without
an upper bound implies that f has no global maximum.
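A quick numerical sanity check of Example 32.6.2 is sketched below; the grid, its bounds and the tolerance are arbitrary illustrative choices and the grid sample is only evidence, not a proof. The constant Hessian has positive eigenvalues, and no sampled value of f falls below f(2, −2).

```python
import numpy as np

f = lambda x1, x2: 2 * x1 ** 2 + 2 * x1 * x2 + x2 ** 2 - 4 * x1

# The Hessian is the constant matrix [[4, 2], [2, 2]]; both eigenvalues are positive,
# so f is convex and the stationary point (2, -2) is the global minimum.
print(np.linalg.eigvalsh(np.array([[4.0, 2.0], [2.0, 2.0]])))

# Sample f on a coarse grid and confirm no sampled value is below f(2, -2) = -4.
grid = np.linspace(-10, 10, 201)
values = np.array([[f(a, b) for b in grid] for a in grid])
print(values.min() >= f(2, -2) - 1e-9)   # expected: True
```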
32.7  Exercises for self study
Exercise 32.7.1  Find all the stationary points of the function f : R^2 → R defined by

    f(x, y) = xy^2 + x^2 y − xy.

Does f have any global extrema?
Exercise 32.7.2  For the following function, find all the stationary points and classify
them as local maxima, local minima, or saddle points. Show that f does not have any
global extrema.

    f(x, y) = 1 − y^3 − 3yx^2 − 3y^2 − 3x^2.

Exercise 32.7.3  For each of the following functions, find all the stationary points and
classify them as local maxima, local minima, or saddle points.

(a) f(x, y) = 4xy − x^4 − y^4

(b) f(x, y) = 4x^2 e^y − 2x^4 − e^(4y)

Exercise 32.7.4  Consider the function f : R^2 → R defined by

    f(x, y) = x^2 + 6xy + 6y^2 + 7.

Find its stationary points. Investigate if this function has any global extrema and, if yes,
identify them.
32.8  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 4.6, 4.7, 4.8, 6.1, 6.2, 6.3 and 6.4 of our Calculus Textbook are relevant.
33  Multivariate calculus, 5 of 5
In this section we will focus on constrained optimisation problems; namely, problems
where the function f : R^n → R is optimised on a proper subset D ⊂ R^n defined by
imposing certain constraints on the components of the variable x ∈ R^n.
Methods for dealing with such optimisation problems usually involve optimising f on the
interior of D, then separately optimising f on the boundary of D (provided, of course,
that this boundary is included in D) and finally reaching an overall conclusion based on
these individual optimisation problems. An example of such an approach is presented in
the Practice Questions.
We start by presenting a widely used method for dealing with constrained optimisation
problems, known as Lagrange’s method. This method is applicable to optimisation
problems subject to equality constraints. It can also be applied to problems subject to
inequality constraints provided that the latter can be reduced to equality ones.
33.1  Motivating Lagrange's method
Let us motivate Lagrange's method by using an example. Suppose that our task is to
maximise the sum x1 + x2 of two real numbers x1 and x2 subject to the constraint that the
sum of their squares is equal to 1.

Rephrasing this problem, we want to maximise the so-called objective function
f(x1, x2) = x1 + x2 in the feasible region D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = 1}.
Approaching this problem graphically, the feasible region is a circle of radius 1 centred at
the origin, and each contour f (x1 , x2 ) = c (which determines the set of points (x1 , x2 ) ∈ R2
whose sum is equal to the constant c) is a line of gradient −1.
A few of these contours are illustrated below:
Observing that c4 > c3 > 1 > 0 > −1 > c2 > c1 , we are interested in the largest possible
value of the constant c such that the contour f (x1 , x2 ) = c has points (or point) in common
with the feasible region D. Clearly, the required value is c3 and the point in common is B.
This gives the solution of the maximisation problem.
Suppose now that our task is to minimise the sum x1 + x2 in the feasible region D. Using
the same approach, we see that the required value of c is c2 and the corresponding point
is A, which is the solution to the minimisation problem.
In order to calculate the coordinates of the points A and B, we observe that at each of
these points the contour f(x1, x2) = c is tangent to the constraint curve x1^2 + x2^2 = 1
which defines the region D.

Moreover, if we introduce the function G(x1, x2) = x1^2 + x2^2, then the constraint curve
x1^2 + x2^2 = 1 is simply the contour G(x1, x2) = 1. Thus, recalling that the normal
vectors to the contours f(x1, x2) = c and G(x1, x2) = 1 are the gradient vectors ∇f and
∇G, we obtain the following two conditions that need to be satisfied by the coordinates
(x1, x2) of A and B:

(i) ∇f(x1, x2) = λ∇G(x1, x2),
(ii) G(x1 , x2 ) = 1.
The first condition expresses the fact that the contour f (x1 , x2 ) = c is tangent to the
contour G(x1 , x2 ) = 1 at A and B (and hence ∇f is a scalar multiple of ∇G) and the
second condition expresses the fact that A and B must lie in the feasible region D.
Let us solve this system of equations. Expressing condition (i) in terms of the partial
derivatives of f and G we obtain the equations
fx1 = λGx1 (i.e., 1 = 2λx1) and fx2 = λGx2 (i.e., 1 = 2λx2).
These two equations and the constraint G(x1 , x2 ) = 1 form a system of three equations for
the variables x1 , x2 and λ. We can eliminate λ from the first two equations in order to
obtain an equation involving x1 and x2 alone, and then use this equation in the constraint
G(x1 , x2 ) = 1 in order to find x1 and x2 .
Following this plan, we see that the equations 1 = 2λx1 and 1 = 2λx2 imply that
λ = 1/(2x1) and λ = 1/(2x2). Hence, we have x1 = x2. Using this equation in the
constraint G(x1, x2) = 1 we see that 2x2^2 = 1. This yields two solutions for x2, namely
x2 = 1/√2 and x2 = −1/√2. The corresponding points are (x1, x2) = (1/√2, 1/√2) and
(x1, x2) = (−1/√2, −1/√2). These are respectively the coordinates of the points B and A.
The contours of f where B and A lie are respectively x1 + x2 = √2 and x1 + x2 = −√2.
Therefore, the maximum value of the sum x1 + x2 subject to the constraint
x1^2 + x2^2 = 1 is √2 and the minimum value is −√2.
33.2  Lagrange's method with an equality constraint
The idea of tangency used in the previous section provides the basis for Lagrange’s method.
We will only discuss cases where the optimisation problem can be reduced to a problem
involving a single equality constraint. Optimisation problems subject to multiple equality
and inequality constraints are accompanied by certain complications whose treatment goes
beyond the scope of our course.
Let us introduce Lagrange’s approach using the optimisation problem of the previous section. We define the so-called Lagrangian L(x1 , x2 , λ) by
L(x1 , x2 , λ) = f (x1 , x2 ) + λ(1 − G(x1 , x2 )),
where the functions f and G are those defined in the previous section; namely, f (x1 , x2 ) =
x1 + x2 and G(x1 , x2 ) = x21 + x22 .
Treating the Lagrangian L as a function of three variables, we find its stationary points by
setting all three partial derivatives to zero. We obtain:
Lx1 = fx1 − λGx1 = 0,
Lx2 = fx2 − λGx2 = 0,
Lλ = 1 − G(x1 , x2 ) = 0.
We recognise that the first two equations correspond to the statement
(i) ∇f (x1 , x2 ) = λ∇G(x1 , x2 )
and the last equation gives the constraint
(ii) G(x1 , x2 ) = 1.
We have therefore recovered the system of equations of the previous section. We already
know that the solutions (x1, x2) = (1/√2, 1/√2) and (x1, x2) = (−1/√2, −1/√2) of these
equations are the coordinates of the maximum B and the minimum A, respectively.
33.3  Regarding the form of the Lagrangian
One may ask: Is the Lagrangian function L(x1 , x2 , λ) = f (x1 , x2 ) + λ(1 − G(x1 , x2 )) the
only function that reproduces the tangency condition (i) and the constraint (ii)?
The answer is no. There are many alternative Lagrangian functions that reproduce these
two conditions. For example, f can be replaced in L by a function such as 2f , the term
+λ can be replaced by −λ and the term 1 − G(x1, x2) can be rescaled to become, say,
5 − 5G(x1 , x2 ) or 8G(x1 , x2 ) − 8.
The reason that we have used the conventions f , +λ and 1 − G(x1 , x2 ) (i.e., the constant
comes first, followed by the term involving the variables) will become clear shortly, when
we discuss the interpretation of λ and its relation to f .
Note that in our textbook, the constraint function 1 − G(x1 , x2 ) is denoted by g(x1 , x2 ).
Accordingly, the Lagrangian is written as L(x1 , x2 , λ) = f (x1 , x2 ) + λg(x1 , x2 ) and the
equations derived from the Lagrangian yield the conditions (i) ∇f (x1 , x2 )+λ∇g(x1 , x2 ) = 0
and (ii) g(x1 , x2 ) = 0.
33.4  Regarding the applicability of Lagrange's method
Observation 1: In the example of the previous section, both stationary points of the
Lagrangian gave constrained extrema of the objective function f .
Question 1: Can we claim in general that each stationary point of L gives a constrained
extremum of f ?
The answer is no. As the following example shows, there may be stationary points of L
that do not give constrained extrema of f .
Example 33.4.1 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region sketched below:
The points A, B, C and D are all points where the contour f (x1 , x2 ) = c is tangent to the
contour of the constraint function. Therefore, we expect that all these points will arise as
stationary points of the Lagrangian. Clearly, the point A gives the constrained minimum
of f and the point D gives the constrained maximum. However, points B and C give
neither constrained minima nor constrained maxima.
Observation 2: In Example 33.4.1 as well as in our initial example, the stationary points
of the Lagrangian included among them both the constrained extrema of f .
Question 2: Is it correct to say that if the Lagrangian has stationary points, then points
corresponding to constrained extrema of f can always be found among them?
The answer is again no. In the optimisation problem presented below, the Lagrangian has
two stationary points but no constrained extrema that correspond to them. In fact, this
optimisation problem does not admit any constrained extrema.
Example 33.4.2 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region D = {(x1 , x2 ) ∈ R2 | x1 x2 = 16} :
Clearly, points A and B arise as stationary points of the Lagrangian since the relevant
contours become parallel there. However, as c decreases to −∞ or as c increases to ∞ the
contour f (x1 , x2 ) = c keeps intersecting the feasible region. Hence, there is neither a lower
bound nor an upper bound on c. We deduce that f has no constrained extrema.
Observation 3: In all the examples seen so far, if a constrained extremum of f exists,
then Lagrange’s method is able to find it.
Question 3: Is it correct to say that if a point corresponding to a constrained extremum of
the objective function exists then it always appears as a stationary point of the Lagrangian?
The answer is again no. In the following example, the optimisation problem admits both
a constrained maximum and a constrained minimum but the corresponding points do
not arise as stationary points of the Lagrangian. This is because the Lagrangian is not
differentiable at these points.
Example 33.4.3 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region sketched below:
We see that point A corresponds to the constrained minimum of f and that point B
corresponds to the constrained maximum of f . However, the gradient of the constraint
function is not defined at any of these points, so Lagrange’s method is not applicable.
Question 4: Taking the issue raised in Example 33.4.3 into account, let us modify the
previous question: Provided that the objective function and the constraint function are
both differentiable, can we claim that if a constrained extremum of f exists then the
corresponding point always appears as a stationary point of the Lagrangian?
Strictly speaking, the answer is still no. There is an additional requirement that the
constraint function has to satisfy before we can have a valid claim.
This requirement is known as the constraint qualification. For optimisation problems
that can be reduced to problems subject to a single equality constraint (such as the problems that we will discuss in this course) the constraint qualification requires that the gradient of the constraint function is non-zero at the point where the constrained extremum
arises. Consider the following example:
Example 33.4.4  Maximise and minimise the objective function f(x1, x2) = x1 + x2 in
the feasible region D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = 0}.
Note that D consists only of the origin; i.e., D = {(0, 0)}. Therefore, both the constrained
maximum and the constrained minimum of f occur at (0, 0). However, the gradient

    ∇g = [ 2x1 ]
         [ 2x2 ]

of the constraint function g(x1, x2) = x1^2 + x2^2 vanishes at (0, 0), so the tangency
condition ∇f(0, 0) + λ∇g(0, 0) = 0 produces the inconsistent statement

    [ 1 ]   [ 0 ]
    [ 1 ] = [ 0 ]

and therefore Lagrange's method fails to identify the constrained maximum and the
constrained minimum at (0, 0).
Summarising: After taking all these issues into account, let us finally state what Lagrange’s theorem says when applied to optimisation problems that can be reduced to
problems subject to a single equality constraint:
If
(i) the Lagrangian L is differentiable,
(ii) the constraint qualification is satisfied, and
(iii) a constrained extremum for the optimisation problem exists,
then
the point corresponding to this extremum always appears as a stationary point of
the Lagrangian.
Note that optimisation problems where the Lagrangian is not differentiable or the constraint qualification condition is not satisfied will not be covered in this course. So the
cases illustrated in Examples 33.4.3 and 33.4.4 are not relevant for our exams. However, it
is important to know all the conditions that need to be satisfied before applying Lagrange’s
theorem.
So, for our purposes, Lagrange’s theorem suggests the following approach to solving optimisation problems subject to a single equality constraint:
(i) Establish the existence of an optimal solution to the optimisation problem. This is
usually accomplished by considering the contours of the objective function.
(ii) Find the stationary points of the Lagrangian. By Lagrange’s theorem, these are the
only candidates for the optimal solution.
(iii) Evaluate the objective function at each of these candidates in order to decide where
the optimal solution occurs (keeping in mind that there may be many optimal solutions).
33.5  The Lagrange multiplier
The parameter λ appearing in the Lagrangian L(x1, x2, λ) = f(x1, x2) + λg(x1, x2) is
called the Lagrange multiplier. Let the constraint function g have the form
g(x1, x2) = b − G(x1, x2) for some constant b (recall that putting the constant first is
precisely the convention that we used when we introduced the Lagrangian earlier).

Let (x1*(b), x2*(b), λ*(b)) be a stationary point of the Lagrangian corresponding to a
constrained extremum of the optimisation problem (we are not interested in all the
stationary points of the Lagrangian but only in those corresponding to the constrained
extrema). As the notation suggests, any stationary point of L can be regarded as a
function of b.

Moreover, regard f as a function of b via the composite function

    f(b) = f(x1*(b), x2*(b)).

Note that f(b) is the value of f at the constrained extremum (x1*(b), x2*(b), λ*(b)); in
other words, f(b) is the optimal value of f.

We have the following result, which is stated without proof:

Theorem 33.5.1  The rate of change of f(b) with respect to b is equal to the value of λ
at that particular constrained extremum:

    df(b)/db = λ*(b).
The significance of this result is the following:
If b is increased in the constraint b − G(x1, x2) = 0 by a small amount ∆b (that is, if we
are solving a new optimisation problem where the objective function is f and the
constraint is b + ∆b − G(x1, x2) = 0) then the optimal value of f (associated with the
constrained extremum (x1*(b), x2*(b), λ*(b))) changes by approximately λ*(b)∆b.

Let us illustrate this by considering the following example, which is an extension of the
example presented at the beginning of this week's lecture notes:

Example 33.5.2  Maximise and minimise

    f(x1, x2) = x1 + x2

in the feasible region

    D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = b},

where b is some given constant.
Introducing the constraint function g(x1, x2) = b − x1^2 − x2^2 and the corresponding
Lagrangian L(x1, x2, λ) = f(x1, x2) + λg(x1, x2), we obtain the following three equations:

    1 − 2λx1 = 0,
    1 − 2λx2 = 0,
    b − x1^2 − x2^2 = 0.

Solving this system in the way explained previously, we obtain the constrained optima

    (x1*(b), x2*(b), λ*(b)) = ( √b/√2,  √b/√2,  1/√(2b) )

and

    (x1*(b), x2*(b), λ*(b)) = ( −√b/√2,  −√b/√2,  −1/√(2b) ).

The first point corresponds to the constrained maximum of f; the second point
corresponds to the constrained minimum of f. The values f(b) = f(x1*(b), x2*(b)) at these
optima are f(b) = √(2b) at the constrained maximum and f(b) = −√(2b) at the
constrained minimum.

At each of these constrained extrema, it is easy to confirm that

    df(b)/db = λ*(b),

where λ*(b) is the corresponding value of λ; that is, λ*(b) = 1/√(2b) at the constrained
maximum and λ*(b) = −1/√(2b) at the constrained minimum.

Hence, if in the context of a new optimisation problem we replace b by b + ∆b, the optimal
value of f at each constrained optimum of the new optimisation problem will
approximately be f(b) + λ*(b)∆b, where f(b) and λ*(b) are the optimal value of f and the
value of λ at the corresponding constrained optimum of the old optimisation problem.

In particular, the value of f will approximately be √(2b) + ∆b/√(2b) at the constrained
maximum of the new optimisation problem and the value of f will approximately be
−√(2b) − ∆b/√(2b) at the constrained minimum of the new optimisation problem.
This is illustrated below:
Hence, Lagrange’s method not only solves the given optimisation problem (whenever of
course the conditions of Lagrange’s theorem are satisfied) but it also carries information
about the optimal value of f subject to a slightly modified constraint. This additional
information is certainly of interest in optimisation problems. For example, in Economics,
one frequently optimises a production function subject to a given budget constraint. Having
found the optimal solution, it is useful to know by how much the optimal value of the
production function will change if one slightly increases or reduces the budget.
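The sensitivity interpretation can be checked directly on Example 33.5.2. The sketch below uses the closed-form optimal value derived above; the base value of b and the step size are arbitrary illustrative choices.

```python
import numpy as np

# Constrained maximum of x1 + x2 subject to x1^2 + x2^2 = b (Example 33.5.2):
# optimal value f(b) = sqrt(2 b) and multiplier lambda*(b) = 1 / sqrt(2 b).
f_opt   = lambda b: np.sqrt(2 * b)
lam_opt = lambda b: 1 / np.sqrt(2 * b)

b, db = 1.0, 1e-6
numerical_slope = (f_opt(b + db) - f_opt(b - db)) / (2 * db)
print(numerical_slope, lam_opt(b))     # both close to 1/sqrt(2), confirming df(b)/db = lambda*(b)
```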
33.6  Exercises for self study
Exercise 33.6.1 Consider the cost function C : R2 → R defined by
C(x, y) = 4x2 + 4y 2 − 2xy − 40x − 140y + 1800
for a firm producing two goods x and y.
(a) Show that C is a convex function on R2 and hence find its global minimum in the
feasible region D ⊂ R2 defined by the inequalities x ≥ 0 and y ≥ 0.
(b) Suppose that the production requirement
x + y ≥ 25
is introduced additionally to x ≥ 0 and y ≥ 0. Find the production levels x and y which
minimise C on the feasible region R ⊂ R2 defined by all three inequalities x ≥ 0, y ≥ 0
and x + y ≥ 25.
(c) Is the Lagrangian function defined by
L(x, y, λ) = C(x, y) + λ(25 − x − y)
suitable for solving the minimisation problem described in part (b)?
Exercise 33.6.2 Consider the cost function C : R2 → R introduced in Exercise 33.6.1
and the feasible region S ⊂ R2 defined by the inequalities
x ≥ 0, y ≥ 0 and x + y ≥ 35.
It is given that C attains a global minimum on S.
(a) Sketch S and explain why the global minimum of C on S must occur on the boundary
of S.
(b) Minimise C on S by eliminating one of the variables x or y using the fact that the
optimal solution occurs on the boundary of S.
(c) Hence, write down a Lagrangian L(x, y, λ) that is suitable for the minimisation of C
on S and use it to verify your answer to part (b).
Exercise 33.6.3  The production function P for a particular manufacturer has the
Cobb-Douglas form

    P(x, y) = 100 x^(3/5) y^(2/5),
where the variables x and y represent labour and capital, respectively. The cost of labour
is 150 pounds per unit and the cost of capital is 250 pounds per unit; i.e., the cost function
is
C(x, y) = 150x + 250y.
(a) Sketch roughly the feasible region D ⊂ R2 defined by x ≥ 0, y ≥ 0 and the requirement
that the total cost of capital and labour cannot exceed 100,000 pounds.
(b) Sketch a few contours of the production function in order to establish the existence of
a point M on the boundary of D corresponding to the constrained maximum of P on D.
(c) Write down a suitable Lagrangian for the maximisation of P on D and use it to find the
coordinates (x∗ , y ∗ ) of the point M and the corresponding value of the Lagrange multiplier
λ∗ . Also find the value of P at M .
(d) Suppose that the total budget for capital and labour is increased by a small amount ε
to (100, 000 + ε) pounds. Determine to first order in ε the maximum value of P subject to
the modified budget constraint.
Exercise 33.6.4 Consider the production function P and the cost function C introduced
in Exercise 33.6.3.
(a) Sketch roughly the feasible region R ⊂ R2 defined by x ≥ 0, y ≥ 0 and the requirement
that the total production cannot be less than 20,000 product units.
(b) Sketch a few contours of the cost function in order to establish the existence of a point
m on the boundary of R corresponding to the constrained minimum of C on R.
(c) Write down a suitable Lagrangian for the minimisation of C on R and use it to find the
coordinates (x∗ , y ∗ ) of the point m and the corresponding value of the Lagrange multiplier
λ∗ . Also find the value of C at m.
(d) Suppose that the total production requirement is increased by a small amount δ to
(20, 000 + δ) units. Determine to first order in δ the minimum value of C subject to the
modified production requirement.
33.7  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 6.6, 6.7 and 6.8 of our Calculus Textbook are relevant.
34  Differential and difference equations, 1 of 5

34.1  Interest compounding
Consider the following three problems:

Problem 1: An amount P, called the principal, is invested for t years at an annual
interest rate r. Interest is compounded annually. Calculate the final sum.

• After 1 year, the amount is P(1 + r).

• After 2 years, the amount is P(1 + r)^2.

• Hence, after t years, the amount is P(1 + r)^t.

Problem 2: An amount P is invested for t years at an annual interest rate r. Interest is
compounded quarterly. Calculate the final sum.

In this case, the annual interest rate r is divided by 4 and interest is compounded 4 times
a year. Therefore:

• After 1 year, the amount is P(1 + r/4)^4.

• After 2 years, the amount is P(1 + r/4)^8.

• Hence, after t years, the amount is P(1 + r/4)^(4t).
More generally, let an amount P be invested for t years at an annual interest rate r and
let interest be compounded m times a year. Then, the amount after t years is
P(1 + r/m)^(mt).

Problem 3: An amount P is invested for t years at an annual interest rate r. Interest is
compounded continuously. Calculate the final sum.

What we need to calculate here is the limit of P(1 + r/m)^(mt) as m → ∞. In order to do
so, we use the fact that lim_{s→∞} (1 + 1/s)^s = e, where e is the base of the natural
logarithm. This result implies that the amount after 1 year is given by

    lim_{m→∞} P(1 + r/m)^m = lim_{m/r→∞} P[(1 + r/m)^(m/r)]^r = lim_{s→∞} P[(1 + 1/s)^s]^r = P e^r.

Therefore, the amount after t years is P e^(rt).
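The convergence of m-times-a-year compounding towards continuous compounding can be seen numerically; in the sketch below the principal, rate and horizon are arbitrary illustrative values.

```python
import math

P, r, t = 1000.0, 0.05, 10                 # principal, nominal annual rate, years (illustrative)

for m in (1, 4, 12, 365):                  # compounding m times a year
    print(m, P * (1 + r / m) ** (m * t))

print("continuous", P * math.exp(r * t))   # the limit as m grows without bound
```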
Remark 34.1.1  One way of showing that lim_{s→∞} (1 + 1/s)^s = e is by finding the
Taylor series of (1 + x/s)^s about 0 and taking its limit as s → ∞. The resulting
expression can be recognised as the Taylor series of e^x. Evaluating at x = 1 yields the
required result lim_{s→∞} (1 + 1/s)^s = e.
34.2  Nominal and effective interest
The annual rate r used in the calculations above is called the nominal rate. When interest
is not compounded annually, we can calculate the so-called effective annual rate re. This
is the annual rate that would need to be given if the compounding occurred only once a
year. Depending on the type of compounding, we have the following cases:

If interest on a principal P is compounded m times a year, then at the end of the year we
have P(1 + re) = P(1 + r/m)^m, which implies that re = (1 + r/m)^m − 1. This expression
reduces to re = r when m = 1.

If interest on a principal P is compounded continuously, then at the end of the year we
have P(1 + re) = P e^r, which implies that re = e^r − 1.

An example of calculating effective rates is given in the Practice Questions.
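A minimal sketch of the two effective-rate formulas is given below; the nominal rate is an arbitrary illustrative value.

```python
import math

r = 0.08                                     # nominal annual rate (illustrative)
for m in (1, 2, 4, 12):
    print(m, (1 + r / m) ** m - 1)           # effective rate with m compoundings a year
print("continuous", math.exp(r) - 1)         # effective rate under continuous compounding
```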
34.3  Discounting and present value
A future sum of money S is not worth as much as a present sum S, since money available
now can earn interest in the meantime. The process of determining the present value P
of a future sum S is called discounting. The discount rate is the nominal interest rate r
used in the calculation of the present value. Depending on the type of compounding used
in this calculation, we have the following cases:

If interest is compounded m times a year at a discount rate r, then the present value P of
a sum S in t years satisfies the equation P(1 + r/m)^(mt) = S, which implies that

    P = S(1 + r/m)^(−mt).

If interest is compounded continuously at a discount rate r, then the present value P of a
sum S in t years satisfies the equation P e^(rt) = S, which implies that P = S e^(−rt).
Example 34.3.1  Find the present value of 100 pounds to be paid in 20 years, assuming
that the discount rate under continuous compounding is 0.05.

We have P = 100 e^(−(0.05)(20)) = 100 e^(−1) ≈ 36.79.

Example 34.3.2  An antique car is currently worth A pounds. Its estimated value V is
expected to increase according to the formula V = A(1.2)^√t, where t is measured in
years. Assuming that the discount rate under continuous compounding is 0.05, find how
long the investor should keep this car in order to maximise its present value.

The present value P of the car to be sold in t years is given by P = A(1.2)^√t e^(−0.05t).
Since ln is an increasing function, maximising P with respect to t is the same as
maximising ln(P) with respect to t. We have:

    ln(P) = ln(A) + √t ln(1.2) − 0.05t,

so differentiating with respect to t yields

    (1/P) dP/dt = (1/(2√t)) ln(1.2) − 0.05.

For a stationary point, we have dP/dt = 0. This is equivalent to 1/√t = 0.1/ln(1.2), so the
stationary point is t_o = (10 ln(1.2))^2 ≈ 3.32 years.

In order to establish that this is a global maximum, we can investigate how the sign of
dP/dt changes for values of t on the interval (0, ∞). Since P > 0 for all t ∈ (0, ∞), the
sign of dP/dt is the same as the sign of (1/P) dP/dt. In other words, the sign of dP/dt is
the same as the sign of (1/(2√t)) ln(1.2) − 0.05, and we already know that the latter
expression vanishes at the stationary point t_o = (10 ln(1.2))^2 and that it is positive on
the interval (0, t_o) and negative on the interval (t_o, ∞). Hence, the stationary point t_o
corresponds to a global maximum of the function P, and the investment should be kept
for approximately 3.32 years.
34.4  Arithmetic sequences and their partial sums
Consider an arithmetic sequence {ui} with first term a and common difference d. We
have:

    u1 = a,
    u2 = a + d,
    u3 = a + 2d,

which implies that the general term is given by

    un = a + (n − 1)d.

The so-called nth partial sum sn of this sequence is given by adding its first n terms:

    sn = u1 + u2 + ... + un.

In order to find a simple expression for sn we can use the fact that

    2sn = (u1 + u2 + ... + un) + (un + u_(n−1) + ... + u1)
        = (u1 + un) + (u2 + u_(n−1)) + ... + (un + u1)
        = [2a + (n − 1)d] + [2a + (n − 1)d] + ... + [2a + (n − 1)d]
        = n[2a + (n − 1)d].

It follows that the nth partial sum is given by

    sn = (n/2) [2a + (n − 1)d].
Geometric sequences and their partial sums
Consider a geometric sequence {ui } with first term a and common ratio r. We have:
u1 = a,
u2 = ar,
u3 = ar2 ,
which implies that the general term is given by
un = arn−1 .
In order to find a simple expression for the nth partial sum sn of this sequence we can
subtract
sn = a + ar + ar2 + ... + arn−1
from
r sn = ar + ar2 + ... + arn−1 + arn .
Cancelling out equal terms we find that
r sn − sn = arn − a
which yields
sn = a
rn − 1
,
r−1
121
for r 6= 1.
Note that if r = 1, the geometric sequence becomes an arithmetic sequence with first term
a and common difference d = 0. Hence, we have that
sn = na,
if r = 1.
Example 34.5.1  Suppose that a one-off deposit P is made at the beginning of year 1.
In addition, a deposit D is made at the beginning of each subsequent year. The account
is earning interest at a nominal rate r, compounded annually and paid at the end of each
year. Calculate the sum accumulated at the end of t years, just after the interest is paid.

At the end of year 1 the sum is P(1 + r).
At the end of year 2 the sum is [P(1 + r) + D](1 + r) = P(1 + r)^2 + D(1 + r).
Similarly, at the end of year 3 the sum is P(1 + r)^3 + D(1 + r)^2 + D(1 + r).
Hence, at the end of year t the sum is
P(1 + r)^t + D(1 + r)^(t−1) + ... + D(1 + r)^2 + D(1 + r).

Recognising the geometric sequence

    u1 = D(1 + r),
    u2 = D(1 + r)^2,
    ...
    u_(t−1) = D(1 + r)^(t−1),

we can write the sum yt accumulated at the end of t years as

    yt = P(1 + r)^t + s_(t−1),

where the partial sum s_(t−1) = u1 + u2 + ... + u_(t−1) should be evaluated using D(1 + r)
as first term and (1 + r) as common ratio.

Therefore,

    yt = P(1 + r)^t + D(1 + r) [(1 + r)^(t−1) − 1] / [(1 + r) − 1]
       = P(1 + r)^t + (D/r) [(1 + r)^t − (1 + r)]
       = [P + D/r] (1 + r)^t − (D/r)(1 + r),

which is the final simplified expression for the sum accumulated at the end of t years.

Note that the sequence yt can be generated by the rule

    y_(n+1) = (y_n + D)(1 + r)

subject to the initial condition y1 = P(1 + r). This is an example of a difference equation.
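The recursion and the closed form of Example 34.5.1 can be compared numerically; in the sketch below the principal, deposit, rate and horizon are arbitrary illustrative values.

```python
P, D, r, t = 1000.0, 200.0, 0.05, 10

# Iterate the difference equation y_{n+1} = (y_n + D)(1 + r) with y_1 = P(1 + r).
y = P * (1 + r)
for _ in range(t - 1):
    y = (y + D) * (1 + r)

# Closed form derived in Example 34.5.1.
closed = (P + D / r) * (1 + r) ** t - (D / r) * (1 + r)
print(y, closed)    # the two values agree
```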
34.6  Exercises for self study
Exercise 34.6.1  (a) Find the effective annual rate of interest on £1000 at 8% compounded

(i) annually

(ii) quarterly

(iii) continuously.
(b) Determine the interest rate needed to have money double in 8 years when compounded
semiannually.
Exercise 34.6.2 (a) A deposit of £P is made at the beginning of each month for t years
in an account that is compounded monthly at an interest rate r. Find the sum accumulated
after t years.
(b) Write down a difference equation subject to a suitable initial condition which corresponds to the above problem.
Exercise 34.6.3 The estimated value of a collection of antiques bought for investment
is increasing according to the formula
V = 325,000 (1.95)^(t^(2/5)).
The discount rate under continuous compounding is 6.8%. How long should the collection
be held to maximise the present value?
Exercise 34.6.4 A loan L is obtained from a bank at the beginning of year 1. A payment
P toward repaying the loan is made at the beginning of year 2 and at the beginning of
each subsequent year, until the loan is finally repaid. At the end of each year starting from
year 1, the bank charges interest on the outstanding loan at an annual rate r. The interest
is added to the loan.
(a) Find a simplified expression for the outstanding loan Lt at the end of year t, just after
the interest is added.
(b) Write down a difference equation corresponding to this problem.
34.7 Relevant sections from the textbooks
There are no relevant sections from our textbooks this week.
35 Differential and difference equations, 2 of 5
35.1 Complex numbers
A complex number, denoted z, is a number of the form
z = x + iy
where the so-called real part x and imaginary part y, denoted
Re(z) = x,
Im(z) = y,
are both real numbers and i is defined by the property that i2 = −1. The set of all
complex numbers is denoted by C. Any complex number z whose imaginary part is zero,
is, of course, real. Any complex number z whose real part is zero is said to be purely
imaginary. We can visualise each element z = x + iy of C on a plane as depicted below:
The x-axis is called the real axis and the y-axis is called the imaginary axis. The plane
equipped with these axes is known as the complex plane. Since x and y can be thought
of as Cartesian coordinates for the complex number z = x + iy, we refer to the form x + iy
as the Cartesian form of z.
An alternative description of the number x + iy is obtained by using the so-called polar
coordinates (r, θ) on R2 , depicted below:
The real, non-negative number r is known as the modulus of z, denoted |z|, and the real
number θ is known as the argument of z:
Mod(z) = |z| = r,
Arg(z) = θ.
The relation between the Cartesian and the polar coordinates of z can be derived using
trigonometry. Given polar coordinates (r, θ), the Cartesian coordinates (x, y) can be found
using
x = rcos(θ)
and
y = rsin(θ).
Conversely, given Cartesian coordinates (x, y), the polar coordinates (r, θ) can be found
using
r = √(x² + y²), cos θ = x/√(x² + y²), sin θ = y/√(x² + y²).
Note that the solution θ of the simultaneous system of equations cos θ = x/√(x² + y²) and sin θ = y/√(x² + y²) is not defined unambiguously. This is because θ + 2nπ is also a solution,
where n ∈ Z. The so-called principal argument θ corresponds to the choice −π < θ ≤ π.
The principal argument θ is unique, unless (x, y) = (0, 0), in which case r = 0 and hence
the value of θ is undefined.
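For concreteness, the conversion between Cartesian and polar coordinates can be carried out with Python's cmath module; cmath.polar returns the argument in [−π, π], essentially the principal argument used above (the number z below is an illustrative choice).

import cmath, math

z = complex(1, math.sqrt(3))        # z = 1 + sqrt(3) i
r, theta = cmath.polar(z)           # modulus and argument of z
print(r, theta, math.pi / 3)        # 2.0 and approximately pi/3
print(cmath.rect(r, theta))         # back to Cartesian form: approximately (1 + 1.732j)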
Example 35.1.1 Given the complex number z ∈ C, find its polar coordinates (r, θ) in
the following cases:
(i) z = 1 + √3 i   (ii) z = −√3 + i   (iii) z = −1 − √3 i   (iv) z = √3 − i
These numbers are sketched on the complex plane below:
Selecting the principal argument in each case, we find:
(i) r = √(1² + (√3)²) = 2, cos θ = 1/2, sin θ = √3/2, so θ = π/3;
(ii) r = √((−√3)² + 1²) = 2, cos θ = −√3/2, sin θ = 1/2, so θ = 5π/6;
(iii) r = √((−1)² + (−√3)²) = 2, cos θ = −1/2, sin θ = −√3/2, so θ = −2π/3;
(iv) r = √((√3)² + (−1)²) = 2, cos θ = √3/2, sin θ = −1/2, so θ = −π/6.
35.2 Euler’s formula and polar exponential form
Recall that the Taylor series of the real-valued functions e^x, sin(x) and cos(x) converge for
all x ∈ R and are given by
e^x = 1 + x + x²/2! + x³/3! + ...
sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ...
cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ...
These results are valid even in the case where the variable x is replaced by a complex
variable z. In particular, letting z = iθ, θ ∈ R, and using the fact that i2 = −1, we obtain
e^(iθ) = 1 + iθ + (iθ)²/2! + (iθ)³/3! + (iθ)⁴/4! + (iθ)⁵/5! + ...
= 1 + iθ − θ²/2! − i θ³/3! + θ⁴/4! + i θ⁵/5! + ...
= (1 − θ²/2! + θ⁴/4! − ...) + i (θ − θ³/3! + θ⁵/5! − ...)
= cos(θ) + i sin(θ).
The importance of this result, known as Euler’s formula, is that it can be used to express
any complex number z as a product. Indeed, starting with the Cartesian form x + iy of a
complex number z, we obtain
z = x + iy = r cos(θ) + i r sin(θ) = r(cos(θ) + i sin(θ)) = r e^(iθ).
When z is written in the form z = reiθ , it is said to be expressed in polar exponential
form. While the Cartesian form x + iy is best for addition and subtraction, the polar
exponential form reiθ is best for multiplication and division of complex numbers. These
operations are defined next.
35.3 Operations on C
Addition, subtraction and multiplication of complex numbers can be defined by treating
these numbers as polynomials in i and using i2 = −1.
In particular, letting
z1 = x1 + iy1 and z2 = x2 + iy2 , we have:
z1 ± z2 = (x1 + iy1 ) ± (x2 + iy2 ) = (x1 ± x2 ) + i(y1 ± y2 ),
z1 z2 = (x1 + iy1 )(x2 + iy2 ) = (x1 x2 − y1 y2 ) + i(x1 y2 + y1 x2 ).
Note that C is closed under these three operations. Indeed, on the right hand side of the
above equations, we recognise the Cartesian form of an element of C. Regarding division,
this is defined as follows: For z2 ≠ 0,
z1/z2 = (x1 + iy1)/(x2 + iy2) = [(x1 + iy1)(x2 − iy2)] / [(x2 + iy2)(x2 − iy2)]
= [(x1x2 + y1y2) + i(−x1y2 + y1x2)] / (x2² + y2²)
= (x1x2 + y1y2)/(x2² + y2²) + i (−x1y2 + y1x2)/(x2² + y2²).
Note that the complex number on the right hand side again has the Cartesian form, which
shows that C is closed under division as well.
The complex number x2 − iy2, by which we have multiplied the numerator and denominator of the fraction (x1 + iy1)/(x2 + iy2) above, is called the complex conjugate of x2 + iy2. In general, the complex conjugate of a complex number z = x + iy, denoted z̄, is defined by
z̄ = x − iy.
In other words,
Re(z̄) = Re(z) and Im(z̄) = −Im(z).
Clearly, the Cartesian forms of the product z1 z2 and the ratio z1/z2 look complicated and are not very convenient to use in practice. By expressing z1 and z2 in polar exponential form, multiplication and division become as easy as addition and subtraction. Indeed, we have:
z1 z2 = (r1 e^(iθ1))(r2 e^(iθ2)) = (r1 r2) e^(i(θ1 + θ2)),
and, for z2 ≠ 0,
z1/z2 = (r1 e^(iθ1))/(r2 e^(iθ2)) = (r1/r2) e^(i(θ1 − θ2)).
Note that the complex conjugate of z = r e^(iθ) is given by z̄ = r e^(−iθ), since
z̄ = r cos(θ) − i r sin(θ) = r(cos(θ) − i sin(θ)) = r(cos(−θ) + i sin(−θ)) = r e^(−iθ).
In general, replacing ‘i’ by ‘−i’ converts the complex number z into its conjugate z̄.
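These rules (moduli multiply or divide, arguments add or subtract) can be verified numerically with Python's cmath module; z1 and z2 below are illustrative values.

import cmath

z1, z2 = 1 + 1j, 3 - 4j
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)
prod = (r1 * r2) * cmath.exp(1j * (t1 + t2))   # (r1 r2) e^{i(theta1 + theta2)}
quot = (r1 / r2) * cmath.exp(1j * (t1 - t2))   # (r1/r2) e^{i(theta1 - theta2)}
print(abs(prod - z1 * z2) < 1e-12)             # True
print(abs(quot - z1 / z2) < 1e-12)             # True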
Example 35.3.1 Given z = 1 + √3 i and w = −3 + 3√3 i, find
(i) zw   (ii) z^6   (iii) w^3   (iv) w̄^10 z^20
Regarding z, we have
r = √(1² + (√3)²) = 2 and cos(θ) = 1/2, sin(θ) = √3/2,
so z = 2e^(iπ/3). Regarding w, we have
r = √((−3)² + (3√3)²) = 6 and cos(θ) = −3/6, sin(θ) = 3√3/6,
so w = 6e^(i2π/3). Therefore,
(i) zw = (2e^(iπ/3))(6e^(i2π/3)) = 12e^(iπ) = 12(cos(π) + i sin(π)) = −12,
(ii) z^6 = (2e^(iπ/3))^6 = 2^6 e^(i2π) = 2^6(cos(2π) + i sin(2π)) = 2^6,
(iii) w^3 = (6e^(i2π/3))^3 = 6^3 e^(i2π) = 6^3(cos(2π) + i sin(2π)) = 6^3,
(iv) w̄^10 z^20 = (6e^(−i2π/3))^10 (2e^(iπ/3))^20 = 6^10 e^(−i20π/3) 2^20 e^(i20π/3) = 6^10 2^20.

35.4 Roots of polynomials
The Fundamental Theorem of Algebra asserts that a polynomial of degree n with complex
coefficients has n complex roots (not necessarily distinct), and can therefore be factorised
into n linear factors. If the coefficients are restricted to real numbers, the polynomial can
be factorised into a product of polynomials of degree 1 with real coefficients and quadratic
polynomials of negative discriminant with real coefficients. Any such quadratic polynomial
can be further factorised into a product of polynomials of degree 1 with complex coefficients.
The proof of the Fundamental Theorem of Algebra is beyond the scope of our course, but
we note the following useful result:
Theorem 35.4.1 Non-real roots of polynomials with real coefficients appear in conjugate
pairs.
Proof Let P (x) = a0 + a1 x + · · · + an xn , ai ∈ R, be a polynomial of degree n. We
shall show that if z is a root of P (x), then so is z̄. Let z be a complex number such that
P (z) = 0. Then
a0 + a1 z + a2 z² + ··· + an z^n = 0.
Conjugating both sides of this equation, and using the fact that 0, being a real number, is equal to its own complex conjugate, together with the following properties of the complex conjugate, which you may find useful to confirm (the complex conjugate of a sum is the sum of the conjugates, and the complex conjugate of a product is the product of the conjugates), we obtain
ā0 + ā1 z̄ + ā2 z̄² + ··· + ān z̄^n = 0.
Since the coefficients ai are real numbers, āi = ai, so this becomes
a0 + a1 z̄ + a2 z̄² + ··· + an z̄^n = 0.
That is, P(z̄) = 0, so the number z̄ is also a root of P(x).
Example 35.4.2 Let us consider the polynomial
x³ − 2x² − 2x − 3 = (x − 3)(x² + x + 1).
The quadratic polynomial x² + x + 1 can be further factorised according to
x² + x + 1 = (x + 1/2)² + 3/4 = (x + 1/2 + (√3/2)i)(x + 1/2 − (√3/2)i).
Letting w = −1/2 − (√3/2)i, we obtain
x³ − 2x² − 2x − 3 = (x − 3)(x − w)(x − w̄),
i.e., the complex roots appear in conjugate pairs.
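A numerical root finder shows the same pattern, one real root and one conjugate pair; this small sketch assumes the numpy package is available.

import numpy as np

roots = np.roots([1, -2, -2, -3])   # coefficients of x^3 - 2x^2 - 2x - 3
print(roots)                        # approximately 3, -0.5 + 0.866j, -0.5 - 0.866j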
35.5 Exercises for self study
Exercise 35.5.1 Plot z = √3 − i and w = 1 + i as points on the complex plane. Express z and w in polar exponential form and find q = z^6/w^10 in both polar exponential and Cartesian form.
Exercise 35.5.2 Find the roots w and w̄ of the equation x2 − 4x + 7 = 0. For these
values of w and w̄, find the real and imaginary parts of the following functions:
f(t) = e^(wt), t ∈ R, and g(t) = w^t, t ∈ Z+.
Exercise 35.5.3 (a) Show that the sequence
yt = α m^t + β m̄^t,
where m = 1 + √3 i, satisfies the difference equation
yt+2 − 2yt+1 + 4yt = 0
for arbitrary complex constants α and β.
(b) Show that yt can be written in the form
yt = r^t (α e^(iθt) + β e^(−iθt)),
where (r, θ) are the polar coordinates of m.
Exercise 35.5.4 Referring to the expression yt = r^t (α e^(iθt) + β e^(−iθt)) from Exercise 35.5.3:
(a) Use Euler’s formula e^(iθ) = cos(θ) + i sin(θ) to write yt in the form
yt = r^t (A cos(θt) + B sin(θt)),
expressing A and B in terms of α and β.
(b) Hence find the most general condition on the complex constants α and β which makes
yt a real sequence; that is, which makes A and B both real.
35.6 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 13.1 of our Algebra Textbook is relevant.
36 Differential and difference equations, 3 of 5
36.1 Difference equations
Difference equations are recurrence relations satisfied by a sequence {yx }. The sequence
{yx } is the unknown of the problem, and x ∈ N is an index.
Example 36.1.1 An arithmetic sequence {yx } with first term a and common difference
d satisfies the difference equation
yx+1 = yx + d,
subject to the initial condition
y1 = a.
The solution of this equation was derived in Lecture 34 by writing out a few terms and
observing the resulting pattern:
yx = a + (x − 1)d.
Note that substituting the solution yx = a + (x − 1)d into the difference equation yields
a + xd = a + (x − 1)d + d,
which is an identity in x, as required.
Example 36.1.2 Similarly, a geometric sequence {yx } with first term a and common
ratio r satisfies the difference equation
yx+1 = ryx ,
subject to the initial condition
y1 = a.
The solution
yx = arx−1
of this equation was also derived in Lecture 34 by writing out the first few terms and
observing the pattern.
Example 36.1.3 In addition, we saw in Lecture 34 that the difference equations generating arithmetic and geometric sequences can be ‘combined’ to yield a difference equation
of the form
yx+1 = ryx + d,
subject to the initial condition
y1 = a.
An example of a sequence satisfying a difference equation of the above form was given in
Example 34.5.1.
The equations presented in the above examples are known as linear difference equations
with constant coefficients. This is the only type of difference equation that we are going
to study in this course. The relevant definition follows:
A linear difference equation with constant coefficients of order n is a difference equation
of the form
P (E)yx = Q(x),
where Q(x) is a given function and P (E) is a given polynomial of degree n of the so-called
shift operator E.
The shift operator E operates on the sequence {yx } by the following rule:
E(yx ) = yx+1 .
The operator E takes the input sequence {a, b, c, d, ...} and returns the output sequence
{b, c, d, ...}. Indeed, the relation E(yx ) = yx+1 implies that
E(y1 ) = y2 ,
i.e.,
E(a) = b,
E(y2 ) = y3 ,
i.e.,
E(b) = c,
E(y3 ) = y4 ,
i.e.,
E(c) = d,
and so on. This also implies that
E 2 (yx ) = E(yx+1 ) = yx+2 ,
E 3 (yx ) = E 2 (yx+1 ) = E(yx+2 ) = yx+3 ,
and so on.
Example 36.1.4 The 3rd-order difference equation
2yx+3 − 4yx+2 − 5yx+1 + 3yx = 9x²
can be expressed in the form P (E)yx = Q(x) where
P (E) = 2E³ − 4E² − 5E + 3 and Q(x) = 9x².
Difference equations of the form P (E)yx = Q(x) are split into two categories:
• If Q(x) ≠ 0, the equation P (E)yx = Q(x) is called non-homogeneous.
• If Q(x) ≡ 0, the equation P (E)yx = 0 is called homogeneous.
We study both these cases below, starting from the homogeneous case.
36.2 Difference equations of the form P (E)yx = 0
Consider the following example.
Example 36.2.1 Find the general solution (i.e. the set of all solutions) of the difference
equation
yx+2 + 6yx+1 + 8yx = 0.
This equation has the form P (E)yx = 0, where the polynomial P (E) is given by
P (E) = E 2 + 6E + 8.
Note that the constant term in this polynomial is non-zero. This requirement will always
be imposed on difference equations. In order to understand why doing so is reasonable,
replace the number 8 by 0 above and show that the resulting difference equation can be
regarded as a first-order difference equation whose constant term is non-zero.
In order to find solutions of the equation (E² + 6E + 8)yx = 0, we consider sequences of the form yx = m^x, where m ≠ 0 is some constant to be determined.
We note that
E(m^x) = m^(x+1) = m·m^x and E²(m^x) = m^(x+2) = m²·m^x,
so the difference equation (E² + 6E + 8)yx = 0 becomes
(m² + 6m + 8)m^x = 0.
Hence, we deduce that as long as m is a solution of the so-called auxiliary equation m² + 6m + 8 = 0, the sequence yx = m^x solves (E² + 6E + 8)yx = 0. Here, the solutions of the auxiliary equation are
m1 = −2 and m2 = −4,
so we obtain two solutions
yx = (−2)^x and yx = (−4)^x
of the equation (E² + 6E + 8)yx = 0. We now observe that this equation is linear, which
implies that any linear combination of the solutions yx = (−2)^x and yx = (−4)^x is also a solution:
(E² + 6E + 8)[α(−2)^x + β(−4)^x] = α(E² + 6E + 8)(−2)^x + β(E² + 6E + 8)(−4)^x = α(0) + β(0) = 0.
In this way, we arrive at the solution
yx = α(−2)^x + β(−4)^x
which contains two arbitrary constants. Moreover, since the difference equation (E² + 6E + 8)yx = 0 is second order, and we already have a solution containing two arbitrary constants, we have found the general solution. This follows from an argument based on Linear Algebra, presented below:
Consider the vector space V of all sequences. The operations of vector addition and scalar
multiplication (which turn the set V of all such sequences into a vector space) are defined
below:
∀yx ∈ V and ∀zx ∈ V : (y + z)x := yx + zx ,
∀yx ∈ V and ∀λ ∈ R : (λy)x := λyx .
Note that the sequence ox given by ox = 0 ∀x ∈ N plays the role of the zero vector in V .
Indeed,
∀yx ∈ V : (y + o)x = yx + ox = yx + 0 = yx ,
∀yx ∈ V : (0y)x = ox .
The shift operator E : V → V defined by E(yx ) = yx+1 maps the vector space V onto
itself. E maps the input vector {a, b, c, d, ...} ∈ V to the output vector {b, c, d, ...} ∈ V .
This is a linear transformation, because
∀yx ∈ V and ∀zx ∈ V : E((y + z)x ) = (y + z)x+1 = yx+1 + zx+1 = E(yx ) + E(zx ),
∀yx ∈ V and ∀λ ∈ R : E((λy)x ) = (λy)x+1 = λyx+1 = λE(yx ).
Moreover, it is not difficult to show that any polynomial P (E) : V → V is also a linear
transformation from the vector space V onto itself.
Therefore, solving the homogeneous difference equation
P (E)yx = 0
is equivalent to finding the kernel of the linear transformation
P (E) : V → V,
i.e., the vector subspace of V consisting of all sequences in V which are mapped to the
zero sequence {ox } ∈ V . In this way, the problem of finding the general solution of the
difference equation P (E)yx = 0 reduces to the problem of finding a basis for the kernel
of P (E) : V → V . Given such a basis, every solution {yx } of the difference equation
P (E)yx = 0 will be a linear combination of the basis vectors. We now need the following
theorem, stated without proof:
Theorem 36.2.2 If the linear transformation P (E) : V → V defined by
P (E) = an E^n + an−1 E^(n−1) + ... + a1 E + a0
is such that a0 ≠ 0, then dim(ker(P (E))) = n.
Theorem 36.2.2 implies that the general solution of the homogeneous equation P (E)yx = 0
where P (E) = an E n + an−1 E n−1 + · · · + a1 E + a0 contains exactly n arbitrary constants. In the particular case of Example 36.2.1, having found a basis {(−2)x , (−4)x }
for the 2-dimensional kernel of the operator P (E) = E 2 + 6E + 8, we can be certain that
yx = α(−2)x + β(−4)x , where α and β are arbitrary constants, is the general solution of
the difference equation yx+2 + 6yx+1 + 8yx = 0.
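A quick numerical check, with illustrative values for the arbitrary constants α and β, confirms that this expression does satisfy the difference equation.

alpha, beta = 1.5, -0.7            # illustrative values of the arbitrary constants
y = lambda x: alpha * (-2) ** x + beta * (-4) ** x
print(all(abs(y(x + 2) + 6 * y(x + 1) + 8 * y(x)) < 1e-6 for x in range(10)))   # True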
The general method of solution of P (E)yx = 0 based on Theorem 36.2.2 is presented below:
Method of Solution
Let a difference equation of order n have the form P (E)yx = 0, where the constant term
of the polynomial P (m) is not zero. We solve the auxiliary equation P (m) = 0:
Case 1: If the solutions m1 , m2 , . . . , mn of this equation are all distinct, then the general
solution of the difference equation P (E)yx = 0 is
yx = α1 (m1 )x + α2 (m2 )x + · · · + αn (mn )x
where α1 , α2 , . . . , αn are arbitrary constants.
Case 2: If a solution m has algebraic multiplicity k, then this particular m contributes in
the general solution for yx the term
yx = ... + (β1 + β2 x + β3 x² + ··· + βk x^(k−1)) m^x + ...
where β1 , β2 , . . . , βk are arbitrary constants. We will explain the reason why this form
arises later in the course.
Example 36.2.3 Suppose that the polynomial P (E) in the difference equation P (E)yx =
0 factorises as follows:
P (m) = (m − 4)(m − 3)²(m + 1)(m + 9)⁴.
Then, the general solution for yx is
yx = α1 4^x + (α2 + α3 x) 3^x + α4 (−1)^x + (α5 + α6 x + α7 x² + α8 x³)(−9)^x,
where α1 , α2 , . . . , α8 are arbitrary constants.
In order to complete the theory, we need to consider the possibility that some of the
solutions of the auxiliary equation P (m) = 0 may be non-real. Since the coefficients of
the polynomial P (m) are all real, whenever a non-real solution for m arises, it must be
accompanied by its complex conjugate.
Complex Conjugate Pairs
Suppose that P (m) = 0 admits a pair of complex conjugate solutions. Let these solutions
be expressed in polar exponential form as
m1 = reiθ
and
m2 = re−iθ ,
where
0 ≤ θ < π.
Also suppose for simplicity that this pair is not repeated. In other words, m1 and m2 are
distinct solutions.
Following the pattern described above, the general solution for yx contains a term of the
form
yx = ... + α1 (r e^(iθ))^x + α2 (r e^(−iθ))^x + ...,
where α1 and α2 are arbitrary complex constants. This term can be expressed equivalently as
yx = ... + (α1 e^(iθx) + α2 e^(−iθx)) r^x + ....
Let us now impose the requirement that the sequence yx be real. It is not difficult to show that the most general condition on α1 and α2 compatible with this requirement is α2 = ᾱ1, i.e., α1 and α2 are complex conjugates of each other. This leads to a term of the form
yx = ... + (β1 cos(θx) + β2 sin(θx)) r^x + ...,
where β1 and β2 are real arbitrary constants.
In other words, given a complex conjugate pair m1 = reiθ and m2 = re−iθ , the modulus r
is raised to the power of x (in the general solution for yx ) and the argument θ becomes the
argument of the trigonometric functions.
Example 36.2.4 Suppose that the polynomial P (m) associated with the difference equation P (E)yx = 0 can be factorised as follows:
P (m) ≡ (m + 5)²(m − 3 − 2i)(m − 3 + 2i).
The modulus r of 3 ± 2i is equal to r = √(3² + 2²) = √13. The angle θ, where in this context we can always choose 0 ≤ θ < π, is found by solving
cos(θ) = 3/√13 and sin(θ) = 2/√13.
Then the general solution for yx is given by
yx = (α1 + α2 x)(−5)^x + (α3 cos(θx) + α4 sin(θx)) 13^(x/2),
where α1 , α2 , α3 and α4 are all real arbitrary constants.
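Expanding P(m) gives the 4th-order recurrence yx+4 + 4yx+3 − 22yx+2 − 20yx+1 + 325yx = 0, and the claimed general solution can be checked numerically with Python for illustrative values of the constants:

import math

a1, a2, a3, a4 = 0.3, -1.2, 2.0, 0.5          # illustrative arbitrary constants
r, theta = math.sqrt(13), math.atan2(2, 3)    # cos(theta) = 3/sqrt(13), sin(theta) = 2/sqrt(13)

def y(x):
    return (a1 + a2 * x) * (-5) ** x + (a3 * math.cos(theta * x) + a4 * math.sin(theta * x)) * r ** x

residual = lambda x: y(x + 4) + 4 * y(x + 3) - 22 * y(x + 2) - 20 * y(x + 1) + 325 * y(x)
print(all(abs(residual(x)) < 1e-5 for x in range(8)))   # True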
Finally, before we proceed to the non-homogeneous case P (E)yx = Q(x), let us present
a result about the long term behaviour of the solutions of the homogeneous equation
P (E)yx = 0.
Theorem 36.2.5 Given a homogeneous linear difference equation with constant coefficients of order n, P (E)yx = 0, all solutions yx tend to 0 as x → ∞ if and only if all roots
of the auxiliary equation P (m) = 0 have modulus less than 1.
36.3 Difference equations of the form P (E)yx = Q(x)
Consider the non-homogeneous difference equation
P (E)yx = Q(x),
where Q(x) is a given function. Its general solution is constructed in three steps:
Step 1: We find the general solution of the homogeneous part
P (E)yx = 0.
This general solution is called the complementary sequence, denoted (CS)x . It is
obtained by the method developed for homogeneous difference equations. If the polynomial
P (E) has degree n, then (CS)x contains n real arbitrary constants.
Step 2: We find a single solution of the non-homogeneous equation
P (E)yx = Q(x).
This solution is called a particular sequence, denoted (P S)x . For simple functions Q(x),
a method by which we can obtain a particular sequence will be presented in Examples 36.3.1
and 36.3.2 below as well as in the Exercises.
Step 3: The general solution of the non-homogeneous equation P (E)yx = Q(x) is then
the sum of the particular sequence and the complementary sequence:
yx = (P S)x + (CS)x .
Before we proceed to the examples, let us confirm that the above expression solves P (E)yx =
Q(x).
Indeed, since P (E)(CS)x = 0 and P (E)(P S)x = Q(x), we see that the linearity of the
difference equation implies that
P (E)yx = P (E)[(P S)x + (CS)x ] = P (E)(P S)x + P (E)(CS)x = Q(x) + 0 = Q(x).
Let us also show that yx = (P S)x + (CS)x provides the general solution of
P (E)yx = Q(x): To this end, we need to show that any solution sx of the equation
P (E)yx = Q(x) can be written in the form sx = (P S)x + (CS)x for some suitable choice
of the constants in (CS)x .
Assuming that we are given a solution sx , let us consider the sequence
sx − (P S)x .
This sequence satisfies the homogeneous equation P (E)yx = 0 because
P (E)[sx − (P S)x ] = P (E)sx − P (E)(P S)x = Q(x) − Q(x) = 0.
Hence, using the argument based on Linear Algebra, we deduce that the vector
sx − (P S)x
belongs to the null space of the linear transformation
P (E) : V → V.
Hence sx − (P S)x must be a linear combination of the basis vectors in the null space of
P (E) and hence it must have the form (CS)x for some suitable choice of constants (i.e. the
scalars in the linear combination (CS)x ). This proves that any solution sx can be written
in the form
sx = (P S)x + (CS)x ,
for some suitable choice of constants in (CS)x .
Example 36.3.1 Solve the difference equation
(E 2 − 5E + 6)yx = 5.
Considering the homogeneous part (E 2 −5E +6)yx = 0, we find that the auxiliary equation
m2 − 5m + 6 = 0 yields the distinct roots
m1 = 3,
m2 = 2.
The complementary sequence is therefore
(CS)x = A(3)x + B(2)x ,
where A and B are arbitrary real constants.
As a particular sequence, we try a sequence yx that has a chance of producing an identity
in x when substituted into the non-homogeneous equation. Recall that this is precisely
what we mean by a solution of a difference equation. For simple choices of Q(x), this can
usually be achieved by considering a particular sequence which has the same general form
as the function Q(x). Here, Q(x) = 5, so we try a constant sequence yx = a. Indeed,
substituting yx = a into the equation
yx+2 − 5yx+1 + 6yx = 5
yields
a − 5a + 6a = 5,
which means that a = 5/2 and hence a particular sequence is
(P S)x = 5/2.
Therefore, the general solution of the equation (E² − 5E + 6)yx = 5 is
yx = (P S)x + (CS)x = 5/2 + A(3)^x + B(2)^x,
where A and B are arbitrary real constants. Note that the general solution contains two
arbitrary constants, as should be the case for a second order difference equation.
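The general solution can again be checked numerically for illustrative values of A and B:

A, B = 1.3, -0.4                                  # illustrative arbitrary constants
y = lambda x: 5 / 2 + A * 3 ** x + B * 2 ** x
print(all(abs(y(x + 2) - 5 * y(x + 1) + 6 * y(x) - 5) < 1e-9 for x in range(12)))   # True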
Example 36.3.2 Solve the difference equation encountered in Example 34.5.1,
yx+1 = (yx + D)(1 + r),
subject to the initial condition
y1 = P (1 + r).
Recall that P and D are deposits, r is the annual interest rate, x is an index denoting the
year and yx is the accumulated amount. We arrange this difference equation in the form
yx+1 − (1 + r)yx = D(1 + r).
Regarding the homogeneous part [E − (1 + r)]yx = 0, the auxiliary equation
m − (1 + r) = 0
yields the single root
m = (1 + r).
Hence the complementary sequence is given by
(CS)x = A(1 + r)x ,
where A is an arbitrary constant. Regarding a particular sequence we try
yx = a,
where a is a constant to be determined by substituting yx into the non-homogeneous
equation yx+1 − (1 + r)yx = D(1 + r). We find
a − (1 + r)a = D(1 + r),
which yields
a = −(D/r)(1 + r).
Hence, a particular sequence is
(P S)x = −(D/r)(1 + r).
The general solution of the difference equation yx+1 − (1 + r)yx = D(1 + r) is therefore
yx = −(D/r)(1 + r) + A(1 + r)^x,
where A is an arbitrary constant. In order to determine A we use the initial condition y1 = P(1 + r). This yields
[A − (D/r)](1 + r) = P(1 + r),
which implies that
A = P + D/r.
Hence, the solution to our problem is given by
yx = −(D/r)(1 + r) + (P + D/r)(1 + r)^x
or, equivalently, by
yx = P(1 + r)^x + (D/r)[(1 + r)^x − (1 + r)],
in agreement with the result obtained in Example 34.5.1.
36.4 Exercises for self study
Exercise 36.4.1 (a) Find the general solution of the difference equation
yx+2 − yx+1 − 2yx = 4x.
(b) Consider the initial conditions y1 = 1 and y2 = 2. Generate y3 and y4 using the
difference equation directly.
(c) Find the particular solution of the difference equation subject to the initial conditions
y1 = 1 and y2 = 2 and verify that it reproduces the terms y3 and y4 calculated in part (b).
Exercise 36.4.2 (a) Find the general solution of the difference equation
yx+2 − yx+1 − 2yx = (−1)x .
Hint: For a particular sequence, try yx = ax(−1)x .
(b) Find the particular solution satisfying y3 = 1 and y6 = 10.
Exercise 36.4.3 Consider the difference equation
st+1 = st (1 + R) + D
subject to the initial condition s1 = D, where R and D are some constants.
(a) Write this difference equation in the form P (E)st = Q(t) for a suitable polynomial P
and function Q(t).
(b) Solve the difference equation subject to the given initial condition.
Exercise 36.4.4 Find the general solution of the difference equation
yx+3 − 3yx+2 + 9yx+1 + 13yx = 5x + 3.
36.5 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 14.2, 14.3, 14.4, 14.5, 14.6, 14.7 and 14.8 of our Calculus Textbook are relevant.
37 Differential and difference equations, 4 of 5
37.1 Linear ODEs with constant coefficients
A linear ordinary differential equation with constant coefficients of order n is a differential equation for the function y(x) of the form
P (D)y = Q(x)
where Q(x) is a given function and P (D) is a given polynomial of degree n in the differential operator D = d/dx.
Example 37.1.1 The 3rd-order linear differential equation
2 d³y/dx³ + d²y/dx² − 5 dy/dx + 3y = sin(x)
can be expressed in the form P (D)y = Q(x) where
P (D) = 2D³ + D² − 5D + 3 and Q(x) = sin(x).
As with difference equations,
• If Q(x) ≠ 0, the equation P (D)y = Q(x) is called non-homogeneous.
• If Q(x) ≡ 0, the equation P (D)y = 0 is called homogeneous.
We study both these cases below, starting from the homogeneous case.
37.2 Solving ODEs of the form P (D)y = 0
Let us motivate the method of solution by using an example.
Example 37.2.1 Solve the ordinary differential equation
d²y/dx² + 6 dy/dx + 8y = 0.
This equation has the form P (D)y = 0 where the polynomial P (D) is given by
P (D) = D² + 6D + 8.
Consider functions of the form y(x) = e^(mx) where m is some constant to be determined. We note that
D e^(mx) = (d/dx) e^(mx) = m e^(mx) and D² e^(mx) = (d²/dx²) e^(mx) = m² e^(mx),
so by substituting y(x) = e^(mx) in the differential equation (D² + 6D + 8)y = 0, we obtain
(m² + 6m + 8) e^(mx) = 0.
Clearly, if m is a solution of the auxiliary equation m² + 6m + 8 = 0, then the function y(x) = e^(mx) is a solution of the differential equation (D² + 6D + 8)y = 0. The equation m² + 6m + 8 = 0 yields
m1 = −2 and m2 = −4,
so the functions
y(x) = e^(−2x) and y(x) = e^(−4x)
are both solutions of the differential equation (D² + 6D + 8)y = 0. Moreover, since the differential equation is linear, any linear combination of the solutions y(x) = e^(−2x) and y(x) = e^(−4x) is also a solution. Indeed, for any constants α and β, we have
(D² + 6D + 8)[α e^(−2x) + β e^(−4x)] = α(D² + 6D + 8)e^(−2x) + β(D² + 6D + 8)e^(−4x)
= α((−2)² + 6(−2) + 8)e^(−2x) + β((−4)² + 6(−4) + 8)e^(−4x)
= α(0)e^(−2x) + β(0)e^(−4x)
= 0.
As with difference equations, the key point is this: We have a solution
y(x) = αe−2x + βe−4x
of the second order ODE (D2 + 6D + 8)y = 0 which contains two arbitrary constants.
Hence, we have the general solution of this ODE. We discussed why this is the case in
Lecture 36 in the context of the homogeneous equation P (E)yx = 0. The only difference
now is that D² + 6D + 8 defines a linear operator on the vector space C∞ of infinitely differentiable functions, where the vectors e^(−2x) and e^(−4x) form a basis for the 2-dimensional kernel of this
operator.
Let us now generalise the above method so that it becomes applicable to any differential
equation of the form P (D)y = 0. We will consider some typical examples immediately
afterwards.
Method of Solution
Given any ordinary differential equation of the form P (D)y = 0 of order n, we solve the
polynomial equation P (m) = 0, called again the auxiliary equation.
Case 1: If the solutions m1 , m2 , ..., mn are all distinct, then the general solution of the
differential equation P (D)y = 0 is
y(x) = α1 e^(m1 x) + α2 e^(m2 x) + ··· + αn e^(mn x)
where α1 , α2 , ..., αn are arbitrary constants.
Case 2: If a solution m has algebraic multiplicity k, then this particular m contributes in
the general solution for y(x) the term
y(x) = ··· + (β1 + β2 x + β3 x² + ... + βk x^(k−1)) e^(mx) + ···
where β1 , β2 , . . . , βk are arbitrary constants.
Example 37.2.2 Suppose that the polynomial P (D) in the differential equation P (D)y =
0 is such that the auxiliary equation P (m) factorises as follows:
P (m) = (m − 4)(m − 3)²(m + 1)(m + 9)⁴.
Then, the general solution for y(x) is
y(x) = α1 e^(4x) + (α2 + α3 x) e^(3x) + α4 e^(−x) + (α5 + α6 x + α7 x² + α8 x³) e^(−9x),
where α1 , α2 , . . . , α8 are arbitrary constants.
Example 37.2.3 Solve the 4th-order differential equation
d⁴y/dx⁴ − d³y/dx³ − 2 d²y/dx² = 0.
The polynomial P (D) is given by P (D) = D⁴ − D³ − 2D², so the auxiliary equation for m is
m⁴ − m³ − 2m² = 0.
We factorise the left hand side of this equation to obtain
m⁴ − m³ − 2m² ≡ m²(m² − m − 2) ≡ m²(m − 2)(m + 1),
so the solutions are
m = 0 (algebraic multiplicity 2), m = 2 and m = −1.
Therefore, the general solution of the differential equation (D⁴ − D³ − 2D²)y = 0 is
y(x) = (α1 + α2 x)e^(0x) + α3 e^(2x) + α4 e^(−x) = α1 + α2 x + α3 e^(2x) + α4 e^(−x),
where α1 , α2 , α3 and α4 are arbitrary constants.
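The same answer can be cross-checked symbolically; this sketch assumes the sympy package is available, and the labelling of the arbitrary constants in its output may differ.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 4) - y(x).diff(x, 3) - 2 * y(x).diff(x, 2), 0)
print(sp.dsolve(ode, y(x)))   # y(x) = C1 + C2*x + C3*exp(-x) + C4*exp(2*x), up to constant labels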
Finally, let us complete the study of the differential equation P (D)y = 0 by considering
what happens when the solutions of the auxiliary equation P (m) = 0 are non-real. Recall
that since the coefficients of the polynomial P (m) are all real, whenever a non-real solution
for m arises, it must be accompanied by its complex conjugate.
Complex Conjugate Pairs
Suppose that P (m) = 0 admits a pair of complex conjugate solutions. Let these solutions
be
m1 = a + ib and m2 = a − ib,
where a and b are real numbers. Also suppose, for simplicity, that this pair is not repeated.
In other words, m1 and m2 are distinct solutions. The general solution for y(x) then
contains a term of the form
y(x) = ... + α1 e^(m1 x) + α2 e^(m2 x) + ...,
where α1 and α2 are arbitrary complex constants. Using the relations m1 = a + ib and m2 = a − ib, this term becomes
y(x) = ... + α1 e^(ax+ibx) + α2 e^(ax−ibx) + ...,
which can be expressed in the form
y(x) = ... + (α1 e^(ibx) + α2 e^(−ibx)) e^(ax) + ....
Following the same argument used for difference equations, requiring that y(x) be a real function forces α1 and α2 to be complex conjugates, in which case we obtain
y(x) = ... + (β1 cos(bx) + β2 sin(bx)) e^(ax) + ...,
where β1 and β2 are real arbitrary constants.
To summarise, given a complex conjugate pair m1 and m2 , the real part a becomes the
argument of the exponential function and the imaginary part b becomes the argument of
the trigonometric functions. Such solutions describe oscillations with amplitude that varies
in time.
Example 37.2.4 Suppose that the polynomial P (m) associated with the differential
equation P (D)y = 0 can be factorised as follows:
P (m) ≡ (m + 5)2 (m − 3 − 2i)(m − 3 + 2i).
Then, the general solution for y(x) is given by
y(x) = (α1 + α2 x)e−5x + (α3 cos(2x) + α4 sin(2x))e3x
where α1 , α2 , α3 and α4 are all real arbitrary constants.
Finally, let us present a result about the long term behaviour of the solutions of P (D)y = 0:
Given a homogeneous linear differential equation with constant coefficients of
order n, P (D)y = 0, all solutions y(x) tend to 0 as x → ∞ if and only if all
roots of the auxiliary equation P (m) = 0 have negative real part.
In general, it is possible to analyse the behaviour of the solutions of P (D)y = 0 in a systematic way by focusing on the dominant term appearing in these solutions. The following
example illustrates how:
Example 37.2.5 Discuss the behaviour as x → ∞ of the general solution
y(x) = Ae−3x + Be2x + Ce3x + e−x (Dcosx + Esinx) :
We observe that e−3x → 0 and e−x → 0 as x → ∞. Therefore,
y(x) → Be2x + Ce3x .
The dominant term as x → ∞ is e3x . Therefore, the constant C will determine the
behaviour of the solution.
If C > 0, then y(x) → ∞.
If C < 0, then y(x) → −∞.
If C = 0, then the dominant term is e2x . Therefore, the constant B will determine the
behaviour of the solution. Hence,
If C = 0 and B > 0, then y(x) → ∞.
If C = 0 and B < 0, then y(x) → −∞.
If C = 0 and B = 0, then y(x) → 0 for all A and D and E.
37.3 Solving ODEs of the form P (D)y = Q(x)
As with difference equations, the general solution of the non-homogeneous equation P (D)y =
Q(x) is constructed in three steps:
STEP 1: We find the general solution of the homogeneous part
P (D)y(x) = 0.
This solution is called the complementary function and is denoted by (CF )(x). It can
be obtained by the method developed for homogeneous ODEs. If the polynomial P (D)
has degree n, then (CF )(x) contains n real arbitrary constants.
STEP 2: We then find a single solution of the non-homogeneous equation
P (D)y(x) = Q(x).
This solution is called a particular integral and is denoted by (P I)(x). The method by
which we obtain a particular integral is similar to that introduced for difference equations.
STEP 3: The general solution of the non-homogeneous equation P (D)y(x) = Q(x) is the
sum of the particular integral and the complementary function:
y(x) = (P I)(x) + (CF )(x).
Example 37.3.1 Solve the differential equation
(D2 − 3D + 2)y(x) = 5.
Considering the homogeneous part (D2 − 3D + 2)y(x) = 0, we find that the auxiliary
equation m2 − 3m + 2 = 0 yields the distinct roots
m1 = 1,
m2 = 2.
The complementary function is therefore
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
For a particular integral, we try a function (P I)(x) that has a chance of producing an
identity in x when substituted into the non-homogeneous differential equation. Recall that
this is precisely what we mean by a solution.
Here, we need to satisfy y''(x) − 3y'(x) + 2y(x) = 5, so a constant solution y(x) = a should definitely work. Indeed, substituting y(x) = a into y''(x) − 3y'(x) + 2y(x) = 5 we find
2a = 5,
which means that a particular integral is
(P I)(x) = 5/2.
Therefore, the general solution of the equation (D² − 3D + 2)y(x) = 5 is
y(x) = (P I)(x) + (CF )(x) = 5/2 + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
Example 37.3.2 Solve the differential equation
(D2 − 3D + 2)y(x) = cos(x).
The left hand side is identical to that of Example 37.3.1, so the complementary function
remains
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
Regarding a particular integral, we need to satisfy y''(x) − 3y'(x) + 2y(x) = cos(x), so a
natural candidate is y(x) = acos(x) where a is a constant to be determined. However,
substituting this expression into the differential equation is clearly going to produce sin(x)
terms as well; so a single constant a will not be enough to satisfy an identity involving both
functions cos(x) and sin(x).
Indeed, if y(x) = a cos(x), then y'(x) = −a sin(x) and y''(x) = −a cos(x), so the equation y''(x) − 3y'(x) + 2y(x) = cos(x) becomes
−acos(x) + 3asin(x) + 2acos(x) = cos(x).
Rearranging, we get
(a − 1)cos(x) + 3asin(x) = 0,
which cannot be an identity in x unless both coefficients vanish. This leads us to the
inconsistent system
a=1
and
a = 0.
On the other hand, if we try y(x) = acos(x) + bsin(x) as a particular integral, we will have
two constants a and b at our disposal and still two functions to deal with (since cos(x) and
sin(x) are not going to produce any more functions when substituted into the differential
equation). Therefore, the form y(x) = acos(x) + bsin(x) for a particular integral should
work:
Letting y(x) = a cos(x) + b sin(x), we find y'(x) = −a sin(x) + b cos(x) and y''(x) = −a cos(x) − b sin(x), so the equation
y''(x) − 3y'(x) + 2y(x) = cos(x)
becomes
−acos(x) − bsin(x) + 3asin(x) − 3bcos(x) + 2acos(x) + 2bsin(x) = cos(x).
Rearranging, we get
(a − 3b − 1)cos(x) + (b + 3a)sin(x) = 0.
This produces an identity in x provided that the coefficients of cos(x) and sin(x) are zero.
Solving the resulting set of simultaneous equations, we find
a = 1/10 and b = −3/10.
Therefore, a particular integral is
(P I)(x) = (1/10) cos(x) − (3/10) sin(x)
and the general solution is
y(x) = (1/10) cos(x) − (3/10) sin(x) + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
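Substituting the particular integral back into the equation is easy to automate; a small sympy check (assuming that package is available):

import sympy as sp

x = sp.symbols('x')
PI = sp.cos(x) / 10 - 3 * sp.sin(x) / 10               # the particular integral found above
print(sp.simplify(PI.diff(x, 2) - 3 * PI.diff(x) + 2 * PI - sp.cos(x)))   # 0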
Example 37.3.3 Solve the differential equation
(D2 − 3D + 2)y(x) = 4e−x .
The complementary function is still
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
Regarding a particular integral, we need to satisfy y''(x) − 3y'(x) + 2y(x) = 4e^(−x), so a natural candidate is y(x) = a e^(−x) for some a to be determined. This expression is actually sufficient, because substituting it into the differential equation is not going to produce any other functions. In other words, a single constant a is sufficient to deal with the single function e^(−x).
Indeed, letting y(x) = a e^(−x), we get y'(x) = −a e^(−x) and y''(x) = a e^(−x), so the equation
y''(x) − 3y'(x) + 2y(x) = 4e^(−x)
becomes
a e^(−x) + 3a e^(−x) + 2a e^(−x) = 4e^(−x).
Rearranging, we get
(6a − 4)e^(−x) = 0,
which produces a valid identity provided that a = 2/3. Therefore, a particular integral is
(P I)(x) = (2/3) e^(−x)
and the general solution is
y(x) = (2/3) e^(−x) + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
Example 37.3.4 Solve the differential equation
(D2 − 3D + 2)y(x) = 3ex .
The complementary function is still
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
We now need to note that the function ex on the right hand side of the equation also appears
in the complementary function. As a result, the selection of a particular integral requires
some thought. Let us recall what the problem is: We need to satisfy y''(x) − 3y'(x) + 2y(x) =
3ex . A natural candidate is y(x) = aex for some a to be determined. However, any
term of the form aex is included in the complementary function and therefore satisfies
the homogeneous equation. Therefore, if we substitute y(x) = aex into the equation
y''(x) − 3y'(x) + 2y(x) = 3e^x we will obtain the inconsistent relation 0 = 3e^x. As with
difference equations, multiplying by x in order to obtain y(x) = axex fixes the problem.
Note that you are not going to encounter cases which are more complicated than this
one, so you do not need to worry about extending this rule further. General rules for
constructing particular integrals do exist; some can be found in our Calculus textbook.
Here, letting y(x) = axe^x, we find y'(x) = ae^x + axe^x and y''(x) = 2ae^x + axe^x. The equation
y''(x) − 3y'(x) + 2y(x) = 3e^x
then becomes
2ae^x + axe^x − 3ae^x − 3axe^x + 2axe^x = 3e^x.
Rearranging, we get
(−a − 3)e^x = 0,
which gives a valid identity provided that
a = −3.
Therefore, a particular integral is
(P I)(x) = −3xe^x
and the general solution is
y(x) = −3xe^x + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
37.4 Exercises for self study
Exercise 37.4.1 Write down a linear differential equation P (D)y = 0 whose solutions
include the function e^(−2x) and another such equation whose solutions include the function
x2 ex . Hence write down a homogeneous linear differential equation whose solutions include
both e−2x and x2 ex together with its general solution.
Exercise 37.4.2 Find the general solution of the differential equation
d⁴y/dx⁴ + 2 d³y/dx³ + d²y/dx² = 2cos(x).
Exercise 37.4.3 (a) Find the particular solution of the equation
d²y/dx² − 2 dy/dx + 10y = 4
which satisfies the conditions y(0) = 1 and y(π/6) = 0.
(b) Describe the behaviour of this solution as x → ∞.
Exercise 37.4.4 Find the general solution of the differential equation
d⁴y/dx⁴ + 2 d³y/dx³ + d²y/dx² = sin(x).
37.5 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 14.1 to 14.8 of our Calculus Textbook are all relevant.
38 Differential and difference equations, 5 of 5
38.1 Ordinary and partial differential equations
In Lecture 37, we discussed ordinary differential equations of the special form
P (D)y(x) = Q(x). In general, a differential equation is an equation which contains
at least one derivative of an unknown function f . A solution of a differential equation
is a relation between the variables involved in the differential equation which is free from
derivatives and which is consistent with the differential equation.
Example 38.1.1 The equation df/dx = f is a differential equation for the unknown function
f . A solution of this differential equation is f = ex . This defines a relation between the
variables x and f which is free from derivatives and which is consistent with the differential
equation in the following sense: If we replace f by ex in the differential equation we obtain
the identity ex = ex . On the other hand, the relation f = x2 is not a solution of this
differential equation. If we replace f by x2 in the differential equation we obtain 2x = x2
which is not an identity in x.
A differential equation for f which involves only one independent variable is called an
ordinary differential equation. A differential equation for f which involves two or
more independent variables is called a partial differential equation.
Example 38.1.2 The equation x ∂f/∂x + y ∂f/∂y + z ∂f/∂z = 2f is a partial differential equation for f(x, y, z). A solution of this equation is f = x² + y² + z². If we replace f by x² + y² + z² in the differential equation we obtain the identity 2x² + 2y² + 2z² = 2x² + 2y² + 2z².
The order of a differential equation is the highest order of any derivative appearing in the
equation.
Example 38.1.3 The equation x ∂f/∂x = f² + y is a first-order partial differential equation for f(x, y), and the equation d³f/dx³ = f⁵ + 2x⁶ − df/dx is a third-order ordinary differential equation for f(x).
The degree of a differential equation is the algebraic degree with which the derivative of
highest order appears in the differential equation.
Example 38.1.4 The equation x (∂f/∂x)⁴ + ∂f/∂y = f is a first-order partial differential equation of degree four for f(x, y), and the equation (d³f/dx³)² = (df/dx)⁵ + f is a third-order ordinary differential equation of degree two for f(x).
The general solution of a differential equation is the collection of all solutions of this
equation.
Given a differential equation, initial and boundary conditions are additional requirements
that solutions of this equation must satisfy. Boundary and initial conditions limit the
general solution of a differential equation.
38.2 Separable ODEs
For all ordinary differential equations, we will denote the independent variable by x and
the unknown function by y(x). Of course, given an ordinary differential equation for y(x),
we can also interpret it as defining x in terms of y; that is, as a differential equation for
the local inverse function x(y). Sometimes this is actually preferable.
A separable ordinary differential equation is an equation that can be arranged in the form
dy/dx = F(x)G(y),
where F(x) and G(y) are given functions of x and y. Note the product structure on the right hand side. The most familiar case of a separable differential equation is the case G(y) = 1. The equation then becomes
dy/dx = F(x).
Its general solution is given by any primitive of F(x) plus an arbitrary constant:
y(x) = ∫ F(x) dx + C.
Example 38.2.1 Solve the separable differential equation
dy/dx = 1/(x² + 1)
subject to the condition that y = 5 when x = 0.
By recognition, we have y(x) = ∫ 1/(x² + 1) dx = arctan(x) + C, which corresponds to the general solution of the above equation. Imposing the condition that y = 5 when x = 0, we obtain the particular solution of this equation given by
y(x) = arctan(x) + 5.
Another simple case of a separable differential equation is the case F(x) = 1. The equation then becomes dy/dx = G(y). This is actually identical to the previous case if we regard it as a differential equation for the ‘local inverse function’ x(y).
Example 38.2.2 Find the general solution of the separable differential equation
dy/dx = y² + 1.
We realise that this equation is equivalent to
dx/dy = 1/(y² + 1).
Hence, by recognition, we have x(y) = ∫ 1/(y² + 1) dy = arctan(y) + C. Note that we can make y the subject of this relation, obtaining
y(x) = tan(x − C).
This corresponds to the general solution of the equation dy/dx = y² + 1. Indeed, replacing y by tan(x − C) in this equation we obtain sec²(x − C) = tan²(x − C) + 1, which is an identity in x.
The method of solution followed in Examples 38.2.1 and 38.2.2 was to separate the variables and then integrate. This method can be applied to any differential equation of the form dy/dx = F(x)G(y), which justifies why these equations are called separable. The method of solution is simply this: We separate the variables,
dy/G(y) = F(x) dx,
and integrate in order to obtain the general solution
∫ dy/G(y) = ∫ F(x) dx + C,
where C is an arbitrary constant of integration.
Example 38.2.3 Solve the differential equation
dy/dx = y²/x².
We see that this is a separable equation. We arrange it in the form ∫ dy/y² = ∫ dx/x² and perform the integration in order to obtain
−1/y = −1/x + C.
Making y the subject of this equation leads to the so-called explicit form of the general solution; namely
y(x) = x/(1 − Cx).
You may confirm by the quotient rule that y(x) = x/(1 − Cx) is indeed a solution of the differential equation dy/dx = y²/x².
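This confirmation can also be done with sympy (assuming the package is available):

import sympy as sp

x, C = sp.symbols('x C')
y = x / (1 - C * x)
print(sp.simplify(y.diff(x) - y**2 / x**2))   # 0, so y = x/(1 - Cx) solves dy/dx = y^2/x^2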
38.3 Introduction to partial differential equations
Solving general partial differential equations goes beyond the scope of this course. However,
the need for solving such equations does arise even in the context of analysing ordinary
differential equations. For example, the so-called exact ordinary differential equations
(presented in the next subsection) require some knowledge of simple partial differential
equations.
The partial differential equations for a function f (x, y) that are needed for our purposes
have the following form:
∂^(i+j) f(x, y)/(∂x^i ∂y^j) = G(x, y),
where i and j take values in the set {0, 1, 2} and G(x, y) is a given function.
Any such equation is solved by successive integrations of the partial derivatives ∂^(i+j) f(x, y)/(∂x^i ∂y^j) until we reach the function f(x, y). Note that every time we integrate with respect to one of the independent variables x or y, we need to introduce an arbitrary function of the other variable, not just an arbitrary constant.
Example 38.3.1 Find the general solution of the partial differential equation
∂²f/∂x∂y = x² + xy + 3.
Let us integrate first with respect to y and then with respect to x, noting that the order in
which we integrate does not affect the final answer. Regarding the y integration, we treat
x as a fixed number and introduce an arbitrary function k(x), because this is the most
general term that can be added to the primitive which is consistent with the requirement
that its partial derivative with respect to y is equal to zero. We find
∂f/∂x = x²y + (1/2)xy² + 3y + k(x).
Regarding the x integration, we now treat y as a fixed number. We also realise that the
integral of k(x) is another arbitrary function K(x). Finally, we introduce an arbitrary
function L(y), because this is the most general term that can be added to the primitive
which is consistent with the requirement that its partial derivative with respect to x is
equal to zero. We obtain the final answer
f = (1/3)x³y + (1/4)x²y² + 3xy + K(x) + L(y).
You may confirm that this expression for f (x, y) solves the partial differential equation
∂²f/∂x∂y = x² + xy + 3.
38.4 Exact ODEs
Consider a first order differential equation arranged in the form
M (x, y)dx + N (x, y)dy = 0,
where M (x, y) and N (x, y) are given functions of x and y. This equation is equivalent to dy/dx = −M (x, y)/N (x, y), so it is a quite general first-order differential equation. Now suppose that there exists a function F (x, y) with the property that
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y).
Note that such a function may not actually exist; however, if it exists, the differential
equation M (x, y)dx + N (x, y)dy = 0 can be expressed in the form
∂F (x, y)/∂x dx + ∂F (x, y)/∂y dy = 0.
Dividing by dx we obtain the equivalent equation
∂F (x, y)/∂x + ∂F (x, y)/∂y · dy/dx = 0,
which implies that the derivative of the composite function F (x, y(x)) with respect to x is equal to 0; i.e., (d/dx) F (x, y(x)) = 0.
Hence, the expression F (x, y(x)) = C for some arbitrary constant C yields the general solution of the differential equation. We say that the relation F (x, y) = C defines the function y(x) implicitly in terms of x.
Before we develop this theory further, let us consider an example of a differential equation of the form M (x, y)dx + N (x, y)dy = 0 for which a function F (x, y) with the property that ∂F/∂x = M (x, y) and ∂F/∂y = N (x, y) does exist. This example clarifies what one means by the statement that “F (x, y) = C defines the general solution y(x) implicitly in terms of x”.
Example 38.4.1 Solve the differential equation (y + 3x²)dx + x dy = 0.
Following the previous approach, we would like to find a function F (x, y) such that
∂F/∂x = y + 3x² and ∂F/∂y = x.
We see that we have a system of simultaneous partial differential equations of the simplest
kind. Let us solve it by finding the general solution of the PDE on the left and then
substituting this solution into the PDE on the right in order to obtain a solution consistent
with both partial differential equations.
The general solution of the PDE on the left is F (x, y) = xy + x³ + g(y), where g(y) is an arbitrary function. Substituting this solution into the PDE on the right we find
x + g'(y) = x,
which implies that g(y) = A, where A is an arbitrary constant. Therefore, we update our solution F (x, y) = xy + x³ + g(y), which now becomes F (x, y) = xy + x³ + A.
According to the theory developed so far, the general solution y(x) of the differential equation (y + 3x²)dx + x dy = 0 is obtained implicitly by setting F (x, y) equal to a constant C. Realising that the constant A in F (x, y) = xy + x³ + A can be absorbed in the constant C, we have
xy + x³ = C.
This defines the general solution y(x) implicitly in terms of x. By making y the subject of this equation, we obtain the general solution y(x) in explicit form:
y(x) = C/x − x².
You may confirm that this expression solves the differential equation dy/dx = (−y − 3x²)/x, which corresponds to the expanded form (y + 3x²)dx + x dy = 0.
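A short sympy check of the explicit solution (assuming the package is available):

import sympy as sp

x, C = sp.symbols('x C')
y = C / x - x**2
# dy/dx should equal -(y + 3x^2)/x, the expanded form of (y + 3x^2)dx + x dy = 0
print(sp.simplify(y.diff(x) + (y + 3 * x**2) / x))   # 0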
Let us now resume the development of the theory by stating conditions on the given
functions M (x, y) and N (x, y) that guarantee the existence of a function F (x, y). As in
Example 38.4.1, the function F (x, y) exists provided that it satisfies the system of partial
differential equations
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y).
However, since the partial derivatives of F (x, y) commute, the above system of equations
holds if and only if
∂M/∂y = ∂²F/∂y∂x = ∂²F/∂x∂y = ∂N/∂x.
Any differential equation of the form M (x, y)dx + N (x, y)dy = 0 such that the given
functions M (x, y) and N (x, y) satisfy
∂M/∂y = ∂N/∂x
is called an exact ordinary differential equation.
Revisiting Example 38.4.1, we can confirm that ∂M/∂y = 1 and ∂N/∂x = 1. So the differential equation in Example 38.4.1 was exact, which explains why it was possible to find a function F (x, y) in that case.
We are now able to summarise the method of solution of the differential equation
M (x, y)dx + N (x, y)dy = 0.
We first check if ∂M/∂y = ∂N/∂x. If this relation is not valid, the equation is not exact, so we need to follow some other approach. If this relation holds, the equation is exact and F (x, y) exists. In order to find F (x, y) we solve the system of partial differential equations
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y),
whose solution is guaranteed by the fact that the equation is exact. Having found F (x, y),
we set it equal to a constant C. The relation F (x, y) = C defines the general solution y(x)
of the differential equation M (x, y)dx + N (x, y)dy = 0 implicitly. If possible, we make y
the subject of F (x, y) = C in order to obtain the general solution for y(x) explicitly.
38.5 Linear ODEs
A linear ordinary differential equation in y is an equation for the function y(x) that can
be arranged in the form
dy/dx + P (x)y = Q(x),
where P (x) and Q(x) are given functions. In order to derive the solution of this equation,
we express it in the form
[P (x)y − Q(x)] dx + (1) dy = 0
and check if this equation is exact. We have:
∂[P (x)y − Q(x)]/∂y = P (x) and ∂(1)/∂x = 0.
We conclude that this equation is not exact unless the given function P (x) = 0, in which case the equation becomes the separable equation dy/dx = Q(x). We already know how to solve a separable equation, so the case P (x) = 0 is of no real interest. In the case of a non-zero P (x), the interesting fact is that the equation
[P (x)y − Q(x)] dx + dy = 0
is made exact by multiplying it by the function I(x) = e^(∫P(x)dx). This function I(x) is called an integrating factor. In order to confirm this, consider the equivalent differential equation arranged in the form
[e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x)] dx + e^(∫P(x)dx) dy = 0
and perform the standard test. We have:
∂[e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x)]/∂y = e^(∫P(x)dx) P (x) and ∂(e^(∫P(x)dx))/∂x = e^(∫P(x)dx) P (x),
so the equation is now exact. Let us therefore solve it as an exact equation. We know that
a function F (x, y) exists such that
∂F/∂x = e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x) and ∂F/∂y = e^(∫P(x)dx).
Let us first integrate the partial differential equation on the right in order to obtain its
general solution. We see that this is given by
F = e^(∫P(x)dx) y + g(x),
where g(x) is an arbitrary function. We substitute this general solution into the partial
differential equation on the left in order to obtain a solution consistent with both partial
differential equations. We find that
e^(∫P(x)dx) P (x)y + g'(x) = e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x),
which reduces to
g'(x) = −e^(∫P(x)dx) Q(x).
Therefore,
g(x) = −∫ e^(∫P(x)dx) Q(x) dx,
and the function F (x, y) is updated to
F (x, y) = e^(∫P(x)dx) y − ∫ e^(∫P(x)dx) Q(x) dx.
Finally, we set F (x, y) equal to a constant C in order to obtain the general solution of the linear differential equation dy/dx + P (x)y = Q(x) for the function y(x). This gives
e^(∫P(x)dx) y − ∫ e^(∫P(x)dx) Q(x) dx = C.
Denoting the integrating factor by I(x) = e^(∫P(x)dx), we obtain a rather simple general solution for y(x) in explicit form:
y(x) = (1/I(x)) [∫ I(x)Q(x) dx + C].
Now that we have derived this result, we can simply memorise it. So, we have the following
method of solution:
Given a linear differential equation in y, arrange it in the form
dy/dx + P (x)y = Q(x)
(that is, arrange it so that the coefficient of dy/dx is equal to 1) and calculate the integrating factor I(x) = e^(∫P(x)dx). Then, the general solution for y(x) is given by
y(x) = (1/I(x)) [∫ I(x)Q(x) dx + C].
An example of a linear equation is given in the Practice Questions. Several more examples
of separable, exact and linear differential equations can be found in our Calculus Textbook.
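As a further sanity check on the integrating-factor formula, the Python/SymPy sketch below builds I(x) and the general solution for a hypothetical example dy/dx + y/x = x (the helper linear_solution is our own, not a library routine):

import sympy as sp

x, C = sp.symbols('x C')

def linear_solution(P, Q):
    I = sp.exp(sp.integrate(P, x))          # integrating factor I(x) = e^(integral of P dx)
    return (sp.integrate(I*Q, x) + C) / I   # y(x) = (1/I(x)) * (integral of I*Q dx + C)

# Hypothetical example: P(x) = 1/x, Q(x) = x.
y = linear_solution(1/x, x)
print(sp.simplify(y))   # C/x + x**2/3

# Cross-check against SymPy's own ODE solver; the result should agree
# with y above, up to the name of the arbitrary constant.
f = sp.Function('f')
print(sp.dsolve(sp.Eq(f(x).diff(x) + f(x)/x, x), f(x)))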
38.6 Homogeneous ODEs
Recall that a function f (x, y) is called homogeneous of degree n if
f(λx, λy) = λ^n f(x, y).
A homogeneous ordinary differential equation of degree n is a differential equation that
can be expressed in the form
M(x, y) + N(x, y)\,\frac{dy}{dx} = 0,
where the given functions M (x, y) and N (x, y) are both homogeneous functions of degree
n.
Example 38.6.1 The ordinary differential equation
\frac{(x + y)^4}{xy} + (x^2 − y^2)\,\frac{dy}{dx} = 0
is homogeneous of degree 2 because the functions
M(x, y) = \frac{(x + y)^4}{xy}   and   N(x, y) = x^2 − y^2
are both homogeneous of degree 2. Indeed, we have
M(tx, ty) = \frac{(tx + ty)^4}{(tx)(ty)} = \frac{t^4(x + y)^4}{t^2\,xy} = t^2 M(x, y)
and
N(tx, ty) = (tx)^2 − (ty)^2 = t^2(x^2 − y^2) = t^2 N(x, y).
Example 38.6.2 The ordinary differential equation
\frac{3y}{x} + \frac{2x}{y}\,\frac{dy}{dx} = 0
is homogeneous of degree 0. Without going through the proof, it should be clear that the
functions
M(x, y) = \frac{3y}{x}   and   N(x, y) = \frac{2x}{y}
are both homogeneous of degree 0.
Example 38.6.3 On the other hand, the ordinary differential equation
4x^3 + 5y^2\,\frac{dy}{dx} = 0
is not homogeneous. This is because the degree of the homogeneous function M(x, y) = 4x^3
(which is 3) is not equal to the degree of the homogeneous function N(x, y) = 5y^2 (which
is 2).
Example 38.6.4 Similarly, the ordinary differential equation
x^2 + \frac{4}{x} + y^2\,\frac{dy}{dx} = 0
is not homogeneous. This is because the function M(x, y) = x^2 + \frac{4}{x} is not homogeneous.
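When in doubt, this kind of classification is easy to check symbolically. The Python/SymPy sketch below is our own aside (the helper homogeneous_degree is not a library function); it tests whether f(tx, ty) = t^n f(x, y) for the functions in the examples above:

import sympy as sp

x, y, t = sp.symbols('x y t', positive=True)

def homogeneous_degree(f):
    # Ratio f(tx, ty)/f(x, y); for a homogeneous function of degree n this is t**n.
    ratio = sp.simplify(f.subs({x: t*x, y: t*y}, simultaneous=True) / f)
    n = sp.simplify(t*sp.diff(ratio, t)/ratio)   # equals n whenever ratio == t**n
    if n.free_symbols == set() and sp.simplify(ratio - t**n) == 0:
        return n
    return None   # not homogeneous

print(homogeneous_degree((x + y)**4/(x*y)))   # 2    (Example 38.6.1)
print(homogeneous_degree(x**2 - y**2))        # 2    (Example 38.6.1)
print(homogeneous_degree(4*x**3))             # 3    (Example 38.6.3)
print(homogeneous_degree(x**2 + 4/x))         # None (Example 38.6.4)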
38.7 Solving Homogeneous ODEs
In order to solve a homogeneous ODE we replace the dependent variable y(x) by the new
dependent variable z(x) defined by
z(x) = \frac{y(x)}{x}.
In other words, we express y and \frac{dy}{dx} in terms of x, z and \frac{dz}{dx} according to
y = xz   and   \frac{dy}{dx} = z + x\,\frac{dz}{dx}
and use these expressions in the differential equation in order to eliminate y and \frac{dy}{dx}. It
can be shown that this always results in an ordinary differential equation for z(x) which
is separable. After solving this differential equation for z(x), we use the relation y(x) =
xz(x) in order to obtain the corresponding solution for y(x).
Example 38.7.1 For x > 0, solve the ordinary differential equation
2x^2\,\frac{dy}{dx} = x^2 + y^2
subject to the condition that y = 7 when x = 1.
We observe that this is a homogeneous ODE of degree 2. We use the relations
y = xz   and   \frac{dy}{dx} = z + x\,\frac{dz}{dx}
to obtain an ODE for the function z(x), namely
2x^2\left(z + x\,\frac{dz}{dx}\right) = x^2 + x^2 z^2.
We eliminate the factor x^2,
2\left(z + x\,\frac{dz}{dx}\right) = 1 + z^2,
and send the term 2z to the right hand side. The resulting equation is clearly separable:
2x\,\frac{dz}{dx} = z^2 − 2z + 1.
We separate the variables and integrate:
2\int \frac{dz}{z^2 − 2z + 1} = \int \frac{dx}{x}.
The denominator of the integrand on the left hand side is a complete square, so we have
2\int \frac{dz}{(z − 1)^2} = \int \frac{dx}{x}
which yields the general solution for z(x) in implicit form:
−\frac{2}{z − 1} = \ln(x) + C.
Note that we do not need to have ln|x| because we have been told that x > 0.
Before we apply the condition (x, y) = (1, 7) let us find the corresponding solution for the
function y(x). In fact, we can obtain the latter in explicit form. To this end, we make z(x)
the subject of the above relation to find that
z = 1 − \frac{2}{\ln(x) + C}
and then replace z by the ratio \frac{y}{x} in order to obtain the general solution for y(x):
y = x\left(1 − \frac{2}{\ln(x) + C}\right).
Finally, using the condition that y is equal to 7 when x is equal to 1, we find that
Finally, using the condition that y is equal to 7 when x is equal to 1, we find that
7 = 1 − \frac{2}{C}.
The solution of this equation is
The solution of this equation is
C = −\frac{1}{3}.
Hence, the particular solution for y(x) consistent with both the differential equation and
the given condition is
y = x\left(1 − \frac{2}{\ln(x) − \frac{1}{3}}\right).
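A quick Python/SymPy check (our own verification, not part of the notes) confirms that this function satisfies both the differential equation and the condition y(1) = 7:

import sympy as sp

x = sp.symbols('x', positive=True)
y = x*(1 - 2/(sp.log(x) - sp.Rational(1, 3)))

print(y.subs(x, 1))                                     # 7
print(sp.simplify(2*x**2*sp.diff(y, x) - x**2 - y**2))  # 0, so 2x^2 dy/dx = x^2 + y^2 holds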
38.8 Solving ODEs by changing the dependent variable
We saw in the previous section that homogeneous equations are solved by a change of
variable which transforms the homogeneous equation for y(x) into a separable equation
for z(x) = \frac{y(x)}{x}. More generally, a change of variable may convert a rather complicated
ODE into one of the forms that we have already studied. Several examples and exercises
regarding this technique can be found in sections 12.8 and 12.13 of our Calculus textbook.
A differential equation that requires a change of variable is presented in Exercise 38.9.4
below.
38.9 Exercises for self study
Exercise 38.9.1 Find the general solution of the following equations by any appropriate
method.
(a) (x^2 + 6x)\,dy = y^2\,dx − 12\,dy
(b) 2x\,\frac{dy}{dx} = 2\sqrt{x} + y + 2
Exercise 38.9.2 Solve the linear differential equation
x^2\,\frac{dy}{dx} + 2xy = 6x^2
(a) as a homogeneous differential equation
(b) as an exact differential equation.
Exercise 38.9.3 Find the general solution of the following equations by any appropriate
method:
(a) xy\,\frac{dy}{dx} = e^{−(3x^2+5)}
(b) \bigl(\cos(y) + y\cos(x)\bigr)\,dx + \bigl(\sin(x) − x\sin(y)\bigr)\,dy = 0
Exercise 38.9.4 (a) Find the general solution of the following equation by an appropriate
method:
xy\,\frac{dy}{dx} − y^2 = 3x^2 e^{2y/x}.
(b) Solve the following differential equation by changing the dependent variable from y to
u = arctan(y):
(1 + y^2) = \bigl(\arctan(y) − x\bigr)\,\frac{dy}{dx}.
38.10 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Chapter 12 of our Calculus Textbook is relevant.
39 Systems of difference and differential equations
39.1 Linear homogeneous systems of difference equations
Example 39.1.1 Consider two sequences {xt } and {yt } related as follows: x0 = 1, y0 = 2,
and, for t ≥ 0,
x_{t+1} = 7x_t − 15y_t
y_{t+1} = 2x_t − 4y_t
i.e.,
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} x_t \\ y_t \end{pmatrix}.
Clearly, the above difference equations together with the initial conditions determine the
sequences {xt } and {yt } uniquely; i.e.,
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} −23 \\ −6 \end{pmatrix},   \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} −23 \\ −6 \end{pmatrix} = \begin{pmatrix} −71 \\ −22 \end{pmatrix},
and so on. However, as these equations stand, we do not have a method for obtaining
explicit expressions for the solutions. The problem is that these equations are coupled. In
order to solve xt+1 = 7xt − 15yt for xt we need to know yt ; however, we cannot find yt by
solving yt+1 = 2xt − 4yt because we do not know xt .
Diagonalisation provides a way out of this problem. Indeed, as long as the matrix
A = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}
appearing in the system \begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = A\begin{pmatrix} x_t \\ y_t \end{pmatrix} is diagonalisable; that is,
as long as an invertible matrix P and a diagonal matrix D exist such that A = PDP^{−1},
the system of equations \begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = A\begin{pmatrix} x_t \\ y_t \end{pmatrix} can be expressed in the form
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = PDP^{−1}\begin{pmatrix} x_t \\ y_t \end{pmatrix},   i.e.,   P^{−1}\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = DP^{−1}\begin{pmatrix} x_t \\ y_t \end{pmatrix}.
Then, if we let x_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix} and introduce a new variable z_t = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} by
z_t = P^{−1}x_t   i.e.,   x_t = Pz_t,
our system becomes
z_{t+1} = Dz_t.
This means that the corresponding equations for the sequences {Xt } and {Yt } have been
uncoupled and can now be solved.
In particular, it turns out that the eigenvalues of A are λ1 = 1 and λ2 = 2 and the
corresponding eigenspaces are
N(A − I) = N\begin{pmatrix} 6 & −15 \\ 2 & −5 \end{pmatrix} = Lin\left\{\begin{pmatrix} 5 \\ 2 \end{pmatrix}\right\}
N(A − 2I) = N\begin{pmatrix} 5 & −15 \\ 2 & −6 \end{pmatrix} = Lin\left\{\begin{pmatrix} 3 \\ 1 \end{pmatrix}\right\}.
Therefore, A can be written in the form A = PDP^{−1}, where P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix} and D =
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. Substituting the diagonal matrix D in the system z_{t+1} = Dz_t, we obtain the
uncoupled equations
\begin{pmatrix} X_{t+1} \\ Y_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} X_t \\ Y_t \end{pmatrix}   i.e.,   X_{t+1} = X_t,  Y_{t+1} = 2Y_t
These equations can be expressed in the standard way
(E − 1)X_t = 0
(E − 2)Y_t = 0,
where E is the shift operator. Their general solutions are
X_t = a(1)^t = a
Y_t = b(2)^t,
where a and b are arbitrary constants.
The corresponding solution for the original variables xt and yt can then be obtained by
using the relation xt = Pzt :
\begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} a \\ b(2)^t \end{pmatrix} = \begin{pmatrix} 5a + 3b(2)^t \\ 2a + b(2)^t \end{pmatrix}.
Finally, using the initial conditions x0 = 1 and y0 = 2, we obtain the simultaneous system
5a + 3b = 1
2a + b = 2
i.e.,   a = 5,  b = −8,
which implies that the required sequences {x_t}, {y_t} are
x_t = 25 − 24(2)^t
y_t = 10 − 8(2)^t.
We can verify that the first few terms, namely
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} −23 \\ −6 \end{pmatrix},   \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} −71 \\ −22 \end{pmatrix},
are in agreement with the terms obtained by using the system xt+1 = Axt directly.
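The diagonalisation itself can also be confirmed with a few Python/SymPy lines (our own aside, not part of the notes):

import sympy as sp

A = sp.Matrix([[7, -15], [2, -4]])
P = sp.Matrix([[5, 3], [2, 1]])
D = sp.diag(1, 2)

print(A*P - P*D)         # zero matrix: the columns of P are eigenvectors for 1 and 2
print(P*D*P.inv() - A)   # zero matrix: A = P D P^(-1)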
An alternative, faster, method for obtaining the above solutions is the following: Starting
from the system xt+1 = Axt , we realise that
x_1 = Ax_0,  x_2 = Ax_1 = A^2 x_0,  x_3 = Ax_2 = A^3 x_0, ..., etc.
This gives us an expression for the solution x_t; namely
x_t = A^t x_0.
We now use the fact that
A^t = (PDP^{−1})^t = PD^t P^{−1},
which gives us the required solution for xt in explicit form:
x_t = PD^t P^{−1} x_0 = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} (1)^t & 0 \\ 0 & (2)^t \end{pmatrix}\begin{pmatrix} −1 & 3 \\ 2 & −5 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 & 3(2)^t \\ 2 & (2)^t \end{pmatrix}\begin{pmatrix} 5 \\ −8 \end{pmatrix} = \begin{pmatrix} 25 − 24(2)^t \\ 10 − 8(2)^t \end{pmatrix}.
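For a numerical confirmation, the Python/NumPy sketch below (our own check, not part of the notes) compares the closed form PD^tP^{−1}x_0 and the explicit formulas with direct iteration of x_{t+1} = Ax_t:

import numpy as np

A = np.array([[7.0, -15.0], [2.0, -4.0]])
P = np.array([[5.0, 3.0], [2.0, 1.0]])     # columns: the eigenvectors (5, 2) and (3, 1)
eigenvalues = np.array([1.0, 2.0])
x0 = np.array([1.0, 2.0])

for t in range(5):
    closed = P @ np.diag(eigenvalues**t) @ np.linalg.inv(P) @ x0   # P D^t P^(-1) x0
    formula = np.array([25 - 24*2**t, 10 - 8*2**t])                # the sequences found above
    direct = np.linalg.matrix_power(A, t) @ x0                     # A^t x0
    print(t, closed, formula, direct)   # all three agree for each t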
39.2 Linear homogeneous systems of differential equations
Example 39.2.1 Solve the system of differential equations
\frac{dx(t)}{dt} = 7x(t) − 15y(t)
\frac{dy(t)}{dt} = 2x(t) − 4y(t)
for the functions x(t) and y(t), subject to the initial conditions x(0) = 1 and
y(0) = 2. Note that the coefficient matrix of this system is the matrix A introduced
in Example 39.1.1.
As with the system of difference equations, we let x(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} and introduce a new
variable z(t) = \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} by x(t) = Pz(t), where P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix} is the transition matrix used
in Example 39.1.1. The system of differential equations for the new functions X(t) and
Y(t) is then diagonal:
\frac{dX(t)}{dt} = X(t)
\frac{dY(t)}{dt} = 2Y(t)
These equations are both linear with constant coefficients and are also separable. Regarding
them as separable, we have
\int \frac{dX}{X} = \int dt ⇐⇒ \ln X = t + α ⇐⇒ X = e^{α+t} = Ae^t
and
\int \frac{dY}{Y} = 2\int dt ⇐⇒ \ln Y = 2t + β ⇐⇒ Y = e^{β+2t} = Be^{2t},
where A and B are arbitrary constants. Hence, the general solution for the original variables
x(t) and y(t) is given by
\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = P\begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} Ae^t \\ Be^{2t} \end{pmatrix} = \begin{pmatrix} 5Ae^t + 3Be^{2t} \\ 2Ae^t + Be^{2t} \end{pmatrix}.
Given the initial conditions x(0) = 1 and y(0) = 2 we obtain the simultaneous system
5A + 3B = 1
2A + B = 2
i.e.,   A = 5,  B = −8,
which implies that the required particular solutions for x(t) and y(t) are
x(t) = 25e^t − 24e^{2t}
y(t) = 10e^t − 8e^{2t}.
You may find it useful to verify that these functions indeed solve the system
\frac{dx(t)}{dt} = 7x(t) − 15y(t)
\frac{dy(t)}{dt} = 2x(t) − 4y(t).
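This verification can be delegated to Python/SymPy as well (our own check, not from the notes): the particular solutions are substituted into the system and into the initial conditions.

import sympy as sp

t = sp.symbols('t')
x = 25*sp.exp(t) - 24*sp.exp(2*t)
y = 10*sp.exp(t) - 8*sp.exp(2*t)

print(sp.simplify(sp.diff(x, t) - (7*x - 15*y)))   # 0, so dx/dt = 7x - 15y
print(sp.simplify(sp.diff(y, t) - (2*x - 4*y)))    # 0, so dy/dt = 2x - 4y
print(x.subs(t, 0), y.subs(t, 0))                  # 1 2, matching x(0) = 1 and y(0) = 2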
39.3 Exercises for self study
Exercise 39.3.1 (a) Find the general solution of the following system of linear differential
equations:
x'(t) = x(t) + 4y(t)
y'(t) = 3x(t) + 2y(t)
(b) Then find the unique solution satisfying the initial conditions x(0) = 1 and y(0) = 0.
Exercise 39.3.2 Find the general solution of the following system of difference equations
x_{t+1} = x_t + y_t
y_{t+1} = −2x_t + 4y_t
z_{t+1} = 4z_t
163
Exercise 39.3.3 Two supermarkets compete for customers in a region with 10,000
shoppers. It is assumed that each shopper shops exactly once in any given week (by going
to only one of the two supermarkets). It is known that during any given week, supermarket
A will keep 70% of its customers while losing 30% to supermarket B, and that supermarket
B will keep 80% of its customers while losing 20% to supermarket A. At the end of a certain
week (call it week zero), the total population of 10,000 shoppers was distributed as follows:
9,000 went to supermarket A and 1,000 went to supermarket B.
Let the variable x_t be given by x_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, where x_t is the number of shoppers shopping
at supermarket A in week t and y_t is the number of shoppers shopping at supermarket
B in week t. Write down a system of difference equations in the form x_{t+1} = Mx_t for a
suitable matrix M and also state a suitable initial condition x_0 = \begin{pmatrix} a \\ b \end{pmatrix} which model the
given situation; i.e., which can be used to predict the number of shoppers shopping at each
supermarket in any future week t.
Exercise 39.3.4 Referring to the so-called Markov process described in Exercise 39.3.3,
diagonalise M and solve the system of difference equations subject to the appropriate
initial conditions. What is the long term distribution of shoppers shopping in the two
supermarkets (as percentages of the total number of shoppers in the region)?
39.4 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 9.2 and 9.3 of our Algebra Textbook are relevant.