
Lecture Notes, Part 2 (21-22)

Contents

Preface

21 Vector spaces associated with matrices, 1 of 2
   21.1 Null space, column space and row space of a matrix
   21.2 Bases for N(A), CS(A) and RS(A)
   21.3 Exercises for self study
   21.4 Relevant sections from the textbooks

22 Vector spaces associated with matrices, 2 of 2
   22.1 Orthogonality between N(A) and RS(A)
   22.2 The rank-nullity theorem
   22.3 Cartesian descriptions for N(A), CS(A) and RS(A)
   22.4 Linear independence and span of a set X revisited
   22.5 Consistency of a linear system revisited
   22.6 Exercises for self study
   22.7 Relevant sections from the textbooks

23 Linear transformations, 1 of 6
   23.1 The main definitions
   23.2 Identity, compositions, linear combinations, inverse
   23.3 Range and kernel
   23.4 Rank-nullity theorem for linear transformations
   23.5 Exercises for self study
   23.6 Relevant sections from the textbooks

24 Linear transformations, 2 of 6
   24.1 Matrix representation of a linear T : Rn → Rm
   24.2 Reflections, rotations and stretches in R2
   24.3 The matrix A_T^{B→C}
   24.4 Exercises for self study
   24.5 Relevant sections from the textbooks

25 Linear transformations, 3 of 6
   25.1 Change of basis and transition matrix
   25.2 The transition matrix P_{B→B′}
   25.3 Exercises for self study
   25.4 Relevant sections from the textbooks

26 Linear transformations, 4 of 6
   26.1 Change of basis and linear transformations
   26.2 Similarity
   26.3 Diagonalisable linear transformations
   26.4 Eigenvalues, eigenvectors and eigenspaces
   26.5 Exercises for self study
   26.6 Relevant sections from the textbooks

27 Linear transformations, 5 of 6
   27.1 Diagonalisation
   27.2 Algebraic and geometric multiplicity
   27.3 Exercises for self study
   27.4 Relevant sections from the textbooks

28 Linear transformations, 6 of 6
   28.1 Orthogonal matrices
   28.2 Orthogonal diagonalisation
   28.3 Symmetric matrices and quadratic forms
   28.4 Exercises for self study
   28.5 Relevant sections from the textbooks

29 Multivariate calculus, 1 of 5
   29.1 Functions of Two Variables
   29.2 Partial derivatives
   29.3 Geometrical interpretation of the partial derivatives
   29.4 Tangent planes
   29.5 Exercises for self study
   29.6 Relevant sections from the textbooks

30 Multivariate calculus, 2 of 5
   30.1 The gradient
   30.2 The derivative
   30.3 Directional derivatives
   30.4 The rate of change of a function f : R2 → R
   30.5 Exercises for self study
   30.6 Relevant sections from the textbooks

31 Multivariate calculus, 3 of 5
   31.1 Functions of n variables
   31.2 Tangent hyperplanes
   31.3 Stationary points
   31.4 Contours, gradient and directional derivatives
   31.5 Vector-valued functions
   31.6 The general chain rule
   31.7 Adapting the chain rule
   31.8 Exercises for self study
   31.9 Relevant sections from the textbooks

32 Multivariate calculus, 4 of 5
   32.1 The second derivative of a function
   32.2 Taylor polynomial for a scalar-valued function
   32.3 Classification of stationary points based on the Taylor polynomial P2
   32.4 Classifying f′′ using the principal minors
   32.5 Convex sets, convex and concave functions f : Rn → R
   32.6 Convexity and concavity for twice differentiable functions
   32.7 Exercises for self study
   32.8 Relevant sections from the textbooks

33 Multivariate calculus, 5 of 5
   33.1 Motivating Lagrange’s method
   33.2 Lagrange’s method with an equality constraint
   33.3 Regarding the form of the Lagrangian
   33.4 Regarding the applicability of Lagrange’s method
   33.5 The Lagrange multiplier
   33.6 Exercises for self study
   33.7 Relevant sections from the textbooks

34 Differential and difference equations, 1 of 5
   34.1 Interest compounding
   34.2 Nominal and effective interest
   34.3 Discounting and present value
   34.4 Arithmetic sequences and their partial sums
   34.5 Geometric sequences and their partial sums
   34.6 Exercises for self study
   34.7 Relevant sections from the textbooks

35 Differential and difference equations, 2 of 5
   35.1 Complex numbers
   35.2 Euler’s formula and polar exponential form
   35.3 Operations on C
   35.4 Roots of polynomials
   35.5 Exercises for self study
   35.6 Relevant sections from the textbooks

36 Differential and difference equations, 3 of 5
   36.1 Difference equations
   36.2 Difference equations of the form P(E)yx = 0
   36.3 Difference equations of the form P(E)yx = Q(x)
   36.4 Exercises for self study
   36.5 Relevant sections from the textbooks

37 Differential and difference equations, 4 of 5
   37.1 Linear ODEs with constant coefficients
   37.2 Solving ODEs of the form P(D)y = 0
   37.3 Solving ODEs of the form P(D)y = Q(x)
   37.4 Exercises for self study
   37.5 Relevant sections from the textbooks

38 Differential and difference equations, 5 of 5
   38.1 Ordinary and partial differential equations
   38.2 Separable ODEs
   38.3 Introduction to partial differential equations
   38.4 Exact ODEs
   38.5 Linear ODEs
   38.6 Homogeneous ODEs
   38.7 Solving Homogeneous ODEs
   38.8 Solving ODEs by changing the dependent variable
   38.9 Exercises for self study
   38.10 Relevant sections from the textbooks

39 Systems of difference and differential equations
   39.1 Linear homogeneous systems of difference equations
   39.2 Linear homogeneous systems of differential equations
   39.3 Exercises for self study
   39.4 Relevant sections from the textbooks
Preface
These lecture notes are intended as a self-contained study resource for the MA100 Mathematical Methods course at the LSE. At the same time, they are designed to complement
the MA100 course texts, Linear Algebra, Concepts and Methods by Martin Anthony and
Michele Harvey, and Calculus, Concepts and Methods by Ken Binmore and Joan Davies.
I am grateful to Martin Anthony and Michele Harvey for allowing me to use some materials from their Linear Algebra, Concepts and Methods textbook and to Michele Harvey for
carefully reading and commenting on an earlier draft of the Calculus part of these lecture
notes. I am also grateful to Siri Kouletsis for her invaluable help with typing and editing
the manuscript and for various improvements to its content.
21 Vector spaces associated with matrices, 1 of 2
In this section, we will consider three vector spaces associated with a matrix A: the null
space of A, the column space of A and the row space of A. We will find a basis for each of
these vector spaces and then, in Lecture 22, we will establish relationships between these
spaces which will pave the way for introducing linear transformations in Lecture 23.
21.1 Null space, column space and row space of a matrix
Recall that the null space N (A) of an m × n matrix A is the set of solutions of the linear
system Ax = 0 and that N (A) is a subspace of Rn . There are two other vector spaces
associated with A: the column space of A and the row space of A. Their definitions are
given below.
If A is an m × n matrix, and if c1 , c2 , . . . , cn denote the columns of A, then the column
space of A, CS(A), is the linear span of the columns of A:
CS(A) = Lin {c1 , c2 , . . . , cn } .
Note that each column ci is an m × 1 vector, so each ci belongs to Rm . Hence, since the
linear span of a set of vectors in a vector space V is a subspace of V , it follows that the
column space CS(A) is a subspace of Rm .
Similarly, if A is an m × n matrix, and if r1 , r2 , . . . , rm denote the transposed rows of A,
then the row space of A, RS(A), is the linear span of the transposed rows of A:
RS(A) = Lin {r1 , r2 , . . . , rm } .
Note that each transposed row ri is an n × 1 vector, so each ri belongs to Rn . Therefore,
the row space RS(A) is a subspace of Rn .
Example 21.1.1 Consider the 3 × 5 matrix A given by

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix}.
The column space of A is given by

CS(A) = Lin { (1, 1, 3)^T, (2, 1, 5)^T, (1, 0, 2)^T, (3, 5, 11)^T, (4, −1, 7)^T }
and is a subspace of R3. On the other hand, the row space of A is given by

RS(A) = Lin { (1, 2, 1, 3, 4)^T, (1, 1, 0, 5, −1)^T, (3, 5, 2, 11, 7)^T }
and is a subspace of R5 . To find the null space of A, we need to solve the homogeneous
system Ax = 0. Since the right-hand side of this equation is the zero vector, instead of
working with the augmented matrix (A|0), we can simply consider the coefficient matrix
A. Performing a few elementary row operations on A (the steps are omitted), we obtain
its reduced row echelon form:

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We see that x1 and x2 are leading variables and that x3 , x4 and x5 can be regarded as free
parameters. Therefore, the general solution of the homogeneous system Ax = 0 is given
by

(x1, x2, x3, x4, x5)^T = x3 (1, −1, 1, 0, 0)^T + x4 (−7, 2, 0, 1, 0)^T + x5 (6, −5, 0, 0, 1)^T.
It follows that the null space of A is given by

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T }.
This is a subspace of R5 .
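If you want to check a computation like this by machine, the following minimal Python sketch (using the sympy library, which is an aside and not part of the course material) reproduces the reduced row echelon form and the null space basis found above.

```python
import sympy as sp

# The matrix A from Example 21.1.1
A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

# Reduced row echelon form and the positions of the leading ones
R, pivot_columns = A.rref()
print(R)              # matches RRE(A) above
print(pivot_columns)  # (0, 1): leading ones in the first two columns

# A basis for N(A); sympy builds it from the free parameters, as in the text
for v in A.nullspace():
    print(v.T)
```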
21.2 Bases for N(A), CS(A) and RS(A)
Given an m × n matrix A, we would like to find bases for its null space, its column space
and its row space. It turns out that the reduced row echelon form of A gives us all the
information we need in order to obtain these bases. Let us illustrate how using the matrix
A from Example 21.1.1.
A basis for N (A):
We claim that the three vectors appearing in the parametric solution

(x1, x2, x3, x4, x5)^T = x3 (1, −1, 1, 0, 0)^T + x4 (−7, 2, 0, 1, 0)^T + x5 (6, −5, 0, 0, 1)^T
of the homogeneous system Ax = 0 constitute a basis B1 for N(A), that is,

B1 = { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T }.
In order to prove this statement, we need to show that the set B1 is a linearly independent
set which spans N (A). First, B1 spans N (A) since any solution x of the equation Ax = 0
(that is, any element of N (A)) is obviously in the linear span of B1 as it can be expressed
as a linear combination of the vectors in B1 . Moreover, B1 is a linearly independent set.
This is because by construction of the general solution of the system Ax = 0, each vector
in B1 contains a leading one in a position where all the remaining vectors in B1 have zeros:
indeed, the three vectors in B1 have the form

B1 = { (∗, ∗, 1, 0, 0)^T, (∗, ∗, 0, 1, 0)^T, (∗, ∗, 0, 0, 1)^T }.
This implies that no vector in B1 can be written as a linear combination of the remaining
vectors in B1 and therefore, by Theorem 19.1.1, B1 is a linearly independent set.
Hence B1 is a basis for N (A) and since B1 consists of three vectors, we deduce that N (A)
is a 3-dimensional subspace of R5 .
A basis for CS(A):
We claim that the columns of A corresponding to the leading columns of RRE(A) (that
is, the columns of A corresponding to the columns of RRE(A) that contain the leading
ones) constitute a basis B2 for CS(A). In Example 21.1.1, the leading ones of RRE(A)
appear in the first two columns, so we take the corresponding columns of A, namely,

B2 = { (1, 1, 3)^T, (2, 1, 5)^T }.
In order to establish this result, we need to show that the set B2 is a linearly independent
set which spans CS(A). First, B2 is a linearly independent set because the elementary
row operations that turn A into RRE(A), namely

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},

transform the submatrix \begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} into \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, i.e.

\begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.

This implies that the matrix \begin{pmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 5 \end{pmatrix} has full column rank, which establishes the linear independence of its columns (1, 1, 3)^T and (2, 1, 5)^T. In addition, B2 spans CS(A) because any
other column of A is a linear combination of the two columns in B2 . More precisely,
consider the submatrix (c1 c2 c3 ) of A = (c1 c2 c3 c4 c5 ) and perform the same elementary
row operations that turn A into RRE(A). We get

\begin{pmatrix} 1 & 2 & 1 \\ 1 & 1 & 0 \\ 3 & 5 & 2 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
The absence of a third leading one implies that (c1 c2 c3 ) does not have full column rank,
so the set {c1, c2, c3} is a linearly dependent set. In particular, the matrix

\begin{pmatrix} 1 & 0 & −1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
tells us that the reduced row echelon form of the augmented matrix (c1 c2 c3 | 0) is

\begin{pmatrix} 1 & 0 & −1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
which implies that the general solution of the corresponding homogeneous system (c1 c2 c3 )x = 0
is given by

x = (x1, x2, x3)^T = x3 (1, −1, 1)^T.
Given that the matrix equation

(c1 c2 c3) (x1, x2, x3)^T = 0
can be expressed as

x1 c1 + x2 c2 + x3 c3 = 0,

the particular solution x = (1, −1, 1)^T (obtained by choosing x3 = 1 in the general solution
for x) implies that
c1 − c2 + c3 = 0.
Hence, c3 is a linear combination of the two columns in B2 ,
c3 = −c1 + c2 ,
which shows that c3 ∈ Lin {c1 , c2 } = Lin(B2 ).
Next, by inspecting RRE(A), we identify the reduced row echelon form of the submatrix
(c1 c2 c4) of A = (c1 c2 c3 c4 c5):

\begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 5 \\ 3 & 5 & 11 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & 7 \\ 0 & 1 & −2 \\ 0 & 0 & 0 \end{pmatrix}.
Considering the general solution of the homogeneous system (c1 c2 c4 )x = 0 and following
an identical argument as above, we now obtain the statement that
−7c1 + 2c2 + c4 = 0.
Hence, c4 is a linear combination of the two columns in B2 ,
c4 = 7c1 − 2c2 ,
which implies that c4 ∈ Lin {c1 , c2 } = Lin(B2 ).
Finally, by inspecting RRE(A) once more, we can identify the reduced row echelon form
of the submatrix (c1 c2 c5) of A = (c1 c2 c3 c4 c5):

\begin{pmatrix} 1 & 2 & 4 \\ 1 & 1 & −1 \\ 3 & 5 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −6 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{pmatrix}.
By a similar reasoning, we now infer that
6c1 − 5c2 + c5 = 0,
and hence
c5 = −6c1 + 5c2 .
Therefore, c5 ∈ Lin {c1 , c2 } = Lin(B2 ).
We have thus shown that all the other columns of A can be expressed as linear combinations
of the columns in B2 and hence B2 is a basis for CS(A). Moreover, since B2 consists of
two vectors, we deduce that CS(A) is a 2-dimensional subspace of R3 .
A practical approach: Note that we do not really have to consider the individual submatrices (c1 c2 c3 ), (c1 c2 c4 ) and (c1 c2 c5 ) of A in order to express c3 , c4 and c5 as linear
combinations of c1 and c2. The basis for the null space of A found before, namely

B1 = { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },

reveals directly which linear combinations of c1, c2 produce the remaining columns c3, c4 and c5 of A. More precisely, each of the basis vectors in the null space of A gives a particular solution of the homogeneous system (c1 c2 c3 c4 c5)(x1, x2, x3, x4, x5)^T = 0. Using the fact that
this equation amounts to x1 c1 + x2 c2 + x3 c3 + x4 c4 + x5 c5 = 0, we conclude immediately
that the following three linear combinations of the columns of A are equal to the zero
vector:
c1 − c2 + c3 = 0,
−7c1 + 2c2 + c4 = 0,
6c1 − 5c2 + c5 = 0.
Solving these equations for c3 , c4 and c5 in terms of c1 and c2 yields the required linear
combinations.
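As a quick numerical sanity check of these relations (an aside, not part of the original notes), one can verify them with numpy:

```python
import numpy as np

A = np.array([[1, 2, 1, 3, 4],
              [1, 1, 0, 5, -1],
              [3, 5, 2, 11, 7]])
c1, c2, c3, c4, c5 = A.T  # rows of A.T are the columns of A

# The null-space relations translate into these column identities
print(np.array_equal(c3, -c1 + c2))      # True
print(np.array_equal(c4, 7*c1 - 2*c2))   # True
print(np.array_equal(c5, -6*c1 + 5*c2))  # True
```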
A basis for RS(A):
We claim that the transposed leading rows of RRE(A) (that is, the rows of RRE(A) that
contain the leading ones) constitute a basis B3 for RS(A). In Example 21.1.1, the leading
ones of RRE(A) appear in the first two rows, so B3 is given by

B3 = { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T }.
In order to establish this fact, we need to show that the set B3 is a linearly independent
set which spans RS(A). First, B3 is a linearly independent set because of the positions of
the leading ones. More precisely, each vector in B3 has a leading one where all the other
vectors in B3 have zeros. In the particular example above, the vectors in B3 are of the
form

B3 = { (1, 0, ∗, ∗, ∗)^T, (0, 1, ∗, ∗, ∗)^T }.
This implies that no vector in B3 is a linear combination of the remaining vectors in B3 .
Hence, by Theorem 19.1.1, B3 is a linearly independent set. To show that the set B3 spans
RS(A), we just need to observe that the elementary row operations (that is, Ri ↔ Rj, Ri ↦ λRi and Ri ↦ Ri + λRj) guarantee that each row in RRE(A) is a linear combination of the original rows of A. Hence, we deduce that RS(RRE(A)) ⊆ RS(A). Moreover,
each such elementary row operation is invertible, which by a similar argument implies that
RS(A) ⊆ RS(RRE(A)). Combining these two statements gives RS(A) = RS(RRE(A)).
Using this result and also the fact that, by definition, Lin(B3 ) = RS(RRE(A)), we conclude that Lin(B3 ) = RS(A), which shows that B3 spans RS(A). We have thus shown
that B3 is a basis for RS(A). Moreover, since B3 consists of two vectors, RS(A) is a
2-dimensional subspace of R5 .
Remark 21.2.1 Let us emphasise that a basis for RS(A) consists of the leading rows of
the reduced row echelon form RRE(A). This is all that the above proof guarantees - there
is no reason to try to identify a ‘corresponding set’ of rows of the original matrix A.
Remark 21.2.2 Let us also emphasise that a basis for CS(A) consists of the columns of
A that correspond to the leading columns of RRE(A). The leading columns of RRE(A)
themselves do not form a basis for CS(A). This is because row operations do not preserve
the column space of a matrix in general.
21.3 Exercises for self study
Exercise 21.3.1 Consider the following matrix A:

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 & −1 \\ 1 & 3 & 2 & 0 & 1 \end{pmatrix}.
(a) Find a basis B1 for the row space of A.
(b) Find a basis B2 for the column space of A.
(c) Find a basis B3 for the null space of A.
Exercise 21.3.2 Consider the matrix A from Exercise 21.3.1 and the bases B1 , B2 and
B3 obtained there:
(a) Using B3 , express each column of A as a linear combination of the basis vectors in B2 .
(b) Find the reduced row echelon form of AT .
(c) Find a basis C1 for the column space of AT , a basis C2 for the row space of AT and
explain why Lin(B1 ) = Lin(C1 ) and Lin(B2 ) = Lin(C2 ).
Exercise 21.3.3 Consider the set of vectors X = {v1, v2, v3, v4} where

v1 = (1, 2, −4)^T,   v2 = (2, 1, 3)^T,   v3 = (−1, 7, −29)^T,   v4 = (9, 6, 8)^T.
(a) Find the reduced row echelon form of the matrix A = (v1 v2 v3 v4 ) whose columns are
the vectors in X.
(b) Argue that Lin(X) = CS(A) and hence find a basis B for Lin(X).
(c) Find the coordinates (vi )B of each vector vi in X with respect to the basis B of Lin(X).
Exercise 21.3.4 Consider the set X and the matrix A from Exercise 21.3.3:
(a) Find the reduced row echelon form of the matrix AT .
(b) Find a basis C for RS(AT ).
(c) Argue that RS(AT ) = Lin(X) and find the coordinates (vi )C of each vector vi in X
with respect to the basis C of Lin(X).
21.4 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 5.3 and 6.5 of our Algebra Textbook are relevant.
22 Vector spaces associated with matrices, 2 of 2
In this section, we complete the presentation of the vector spaces N (A), CS(A) and
RS(A) associated with a matrix A. We start by establishing certain relationships between
these vector spaces and then we use the results obtained in Lecture 21 to revisit the linear
independence and the linear span of a set X of vectors as well as the issue of the consistency
of a general system Ax = b.
22.1 Orthogonality between N(A) and RS(A)
Consider any m × n matrix A of rank k; for example, consider the 3 × 5 matrix A of rank
2 presented in Example 21.1.1, whose reduced row echelon form is given below:

A = \begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix},   RRE(A) = \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We have already established that both N(A) and RS(A) are subspaces of R5 and that

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },   RS(A) = Lin { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T },
where each of the above sets of vectors constitutes a basis for the corresponding subspace.
The spaces N (A) and RS(A) are orthogonal to each other with respect to the usual scalar
product on R5 in the sense that any vector in N (A) is orthogonal to any vector in RS(A).
Indeed, a vector x belongs to N (A) if and only if it satisfies the homogeneous equation
Ax = 0. This implies that x is orthogonal to the transposed rows of A, which in turn
implies that x is orthogonal to any vector in RS(A). In the case of the example given
above, one can verify this fact directly by using the basis vectors for N(A) and RS(A):

⟨(1, −1, 1, 0, 0)^T, (1, 0, −1, 7, −6)^T⟩ = 0,   ⟨(−7, 2, 0, 1, 0)^T, (1, 0, −1, 7, −6)^T⟩ = 0,   ⟨(6, −5, 0, 0, 1)^T, (1, 0, −1, 7, −6)^T⟩ = 0,

⟨(1, −1, 1, 0, 0)^T, (0, 1, 1, −2, 5)^T⟩ = 0,   ⟨(−7, 2, 0, 1, 0)^T, (0, 1, 1, −2, 5)^T⟩ = 0,   ⟨(6, −5, 0, 0, 1)^T, (0, 1, 1, −2, 5)^T⟩ = 0.
By the above results and the bilinearity of the scalar product, it follows that any linear
combination of the basis vectors of N (A) is orthogonal to any linear combination of the
basis vectors of RS(A). Hence, N (A) and RS(A) are orthogonal to each other.
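The pairwise scalar products above can also be checked in one step: stacking the basis vectors of N(A) and RS(A) as the rows of two matrices, the product of one with the transpose of the other collects every scalar product. A small numpy sketch (an aside, not from the original notes):

```python
import numpy as np

N_basis = np.array([[1, -1, 1, 0, 0],
                    [-7, 2, 0, 1, 0],
                    [6, -5, 0, 0, 1]])    # rows: basis of N(A)
RS_basis = np.array([[1, 0, -1, 7, -6],
                     [0, 1, 1, -2, 5]])   # rows: basis of RS(A)

# Entry (i, j) is the scalar product of the i-th N(A) vector with the j-th RS(A) vector
print(N_basis @ RS_basis.T)   # the 3 x 2 zero matrix
```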
22.2 The rank-nullity theorem
Still working with the example above, note that if we add the dimension of N (A) to
the dimension of RS(A), we obtain the dimension of the vector space R5 . This is not a
coincidence: The dimension of RS(A) is equal to the number of leading ones in RRE(A);
that is, it is equal to the rank of A:
dim(RS(A)) = rank(A).
The dimension of N (A) is equal to the number of free parameters appearing in the general
solution of the homogeneous system Ax = 0; this number is known as the nullity of A:
dim(N (A)) = nullity(A).
Now recall that any column of A which does not contain a leading one implies the existence
of a free parameter in the general solution of the system Ax = 0. Hence, the sum of the
number of leading ones and the number of free parameters is equal to the number of
columns of A. In other words, we have that
rank(A) + nullity(A) = number of columns of A.
This is known as the Rank-Nullity theorem for matrices. In the example above, the
rank of A is 2, the nullity of A is 3 and the number of columns of A is 5.
Also note that although the column space of A and the row space of A are in general
subspaces of different vector spaces (indeed, above, CS(A) ⊆ R3 and RS(A) ⊆ R5 ), the
dimensions of these vector spaces are always equal; that is,
dim(CS(A)) = dim(RS(A)).
This is because both these dimensions are equal to the number of leading ones in RRE(A);
i.e., the rank of A. Accordingly, the Rank-Nullity theorem for matrices can be expressed
in various alternative ways, as follows: For any m × n matrix A, we have that
rank(A) + nullity(A) = n,
dim(CS(A)) + dim(N (A)) = n,
dim(RS(A)) + dim(N (A)) = n.
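A one-line machine check of the rank-nullity theorem for the running example, again with sympy (an illustrative aside):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

rank = A.rank()               # number of leading ones in RRE(A): 2
nullity = len(A.nullspace())  # number of free parameters: 3
print(rank + nullity == A.cols)   # True: 2 + 3 = 5 columns
```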
Remark 22.2.1 Given an m × n matrix A, the only vector in Rn which belongs to both
N (A) and RS(A) is the zero vector. This is because, by the orthogonality of RS(A) and
N (A), a vector v belongs to both RS(A) and N (A) only if it is orthogonal to itself:
⟨v, v⟩ = 0.

By the positivity of the scalar product on Rn, the only vector v ∈ Rn that has the property that ⟨v, v⟩ = 0 is the zero vector 0 ∈ Rn. This means that for any m × n matrix A, the
intersection of the row space of A and the null space of A consists of only the zero vector
in Rn :
RS(A) ∩ N (A) = {0} .
22.3 Cartesian descriptions for N(A), CS(A) and RS(A)
Let us use again our previous example. We have found a basis for each of the vector spaces
N (A), RS(A) and CS(A), so each of these spaces can be regarded as the linear span of
its basis:

N(A) = Lin { (1, −1, 1, 0, 0)^T, (−7, 2, 0, 1, 0)^T, (6, −5, 0, 0, 1)^T },   RS(A) = Lin { (1, 0, −1, 7, −6)^T, (0, 1, 1, −2, 5)^T },   CS(A) = Lin { (1, 1, 3)^T, (2, 1, 5)^T }.
It is useful to be able to describe each of the subspaces N (A), RS(A) and CS(A) by a
set of Cartesian equations. Let us start with N (A), whose vector parametric description
is given below:

(x1, x2, x3, x4, x5)^T = s (1, −1, 1, 0, 0)^T + t (−7, 2, 0, 1, 0)^T + u (6, −5, 0, 0, 1)^T.
We note that N (A) is a 3-dimensional subspace of R5 . Hence, a Cartesian description for N (A) amounts to imposing two independent restrictions on the five variables
x1 , x2 , x3 , x4 , x5 . With this in mind, consider the 2 × 5 matrix D of rank 2 whose rows are
the transposed basis vectors of RS(A). Since the transposed rows of D are orthogonal to
any vector in N(A), the general solution of the homogeneous system Dx = 0,

\begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \end{pmatrix} (x1, x2, x3, x4, x5)^T = (0, 0)^T,
which obviously contains three free parameters, coincides with the parametric equation for
N (A) given above. It follows that a Cartesian description for N (A) is given by the system
of equations
x1 − x3 + 7x4 − 6x5 = 0
x2 + x3 − 2x4 + 5x5 = 0
associated with the homogeneous system Dx = 0.
Similarly, RS(A) is a 2-dimensional subspace of R5, described by the parametric equation

(x1, x2, x3, x4, x5)^T = λ (1, 0, −1, 7, −6)^T + µ (0, 1, 1, −2, 5)^T.
A Cartesian description for RS(A) requires three independent restrictions on the five
variables x1 , x2 , x3 , x4 , x5 . Let us therefore consider the 3×5 matrix E of rank 3 whose rows
are the transposed basis vectors of N (A). Since the transposed rows of E are orthogonal
to any vector in RS(A), the general solution of the homogeneous system Ex = 0,

\begin{pmatrix} 1 & −1 & 1 & 0 & 0 \\ −7 & 2 & 0 & 1 & 0 \\ 6 & −5 & 0 & 0 & 1 \end{pmatrix} (x1, x2, x3, x4, x5)^T = (0, 0, 0)^T,
which obviously contains two free parameters, coincides with the parametric equation of
RS(A) given above. It follows that a Cartesian description for RS(A) is given by the system
of equations

x1 − x2 + x3 = 0

−7x1 + 2x2 + x4 = 0

6x1 − 5x2 + x5 = 0
associated with the homogeneous system Ex = 0.
Regarding a Cartesian description for CS(A), note that the columns of A are the rows of
AT , which means that
CS(A) = RS(AT ).
Using the fact that RS(AT ) is orthogonal to N (AT ) and following an identical argument as
above, we deduce that a Cartesian description for CS(A) corresponds to the homogeneous
system Fx = 0, where F is the matrix whose rows are the transposed basis vectors of
N (AT ). This method for obtaining a Cartesian description for CS(A) is illustrated at the
end of the next subsection and also in Exercise 22.6.2.
22.4 Linear independence and span of a set X revisited
Suppose that we are given a set of vectors X = {v1 , . . . , vk } where each vector vi belongs
to Rn. For example, let us consider the following five vectors in R3,

X = { (1, 1, 3)^T, (2, 1, 5)^T, (1, 0, 2)^T, (3, 5, 11)^T, (4, −1, 7)^T },
which allows us to utilise the matrix A of Example 21.1.1; i.e., the columns of A are the
vectors in X. The reduced row echelon form of the matrix A is given below:

\begin{pmatrix} 1 & 2 & 1 & 3 & 4 \\ 1 & 1 & 0 & 5 & −1 \\ 3 & 5 & 2 & 11 & 7 \end{pmatrix} \xrightarrow{\text{RRE}} \begin{pmatrix} 1 & 0 & −1 & 7 & −6 \\ 0 & 1 & 1 & −2 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
Regarding the issue of the linear independence or dependence of the set X, we see that A
does not have full column rank, so X is a linearly dependent set. A question arises (i):
Can we identify a subset S of X that consists of linearly independent vectors and thereby
express the remaining vectors in X as linear combinations of the vectors in S? Similarly,
regarding the issue of the linear span of X, we see that A does not have full row rank, so X
does not span R3 . A question again arises: (ii) Can we identify the subspace Lin(X) ⊂ R3
spanned by the set X and also obtain a basis B and a Cartesian description for this
subspace of R3 ? Realising that Lin(X) = CS(A), both these questions have already been
answered in the previous subsections.
More precisely, a subset S of X consisting of linearly independent vectors corresponds to
a basis for CS(A). We have already found that a basis B for CS(A) consists of the first two columns of A; that is, B = {c1, c2} where c1 = (1, 1, 3)^T and c2 = (2, 1, 5)^T. Moreover, as we
saw in subsection 21.2, the basis vectors in any basis of N(A), such as the three vectors

(1, −1, 1, 0, 0)^T,   (−7, 2, 0, 1, 0)^T,   (6, −5, 0, 0, 1)^T,
imply the existence of corresponding linear combinations of the columns {ci } of A (and
hence of the vectors {vi } in X) that are equal to the zero vector:
v1 − v2 + v3 = 0,
−7v1 + 2v2 + v4 = 0,
6v1 − 5v2 + v5 = 0.
These combinations allow us to express the remaining vectors in X as linear combinations
of the vectors in B = {v1 , v2 }:
v3 = −v1 + v2 ,
v4 = 7v1 − 2v2 ,
v5 = −6v1 + 5v2 .
Finally, since
Lin(X) = CS(A) = RS(AT )
and RS(AT ) is orthogonal to N (AT ), a Cartesian description for the two-dimensional
subspace Lin(X) ⊂ R3 corresponds to the matrix equation
Fx = 0,
where F is the 1 × 3 matrix of rank 1 whose single row is a basis vector for N (AT ). So,
we just need to find the reduced row echelon form of

A^T = \begin{pmatrix} 1 & 1 & 3 \\ 2 & 1 & 5 \\ 1 & 0 & 2 \\ 3 & 5 & 11 \\ 4 & −1 & 7 \end{pmatrix},

which turns out to be the matrix

RRE(A^T) = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
and then identify a basis for its null space. We have

N(A^T) = Lin { (−2, −1, 1)^T },
so the required 1 × 3 matrix F of rank 1 is
F = (−2  −1  1)
and a Cartesian description for Lin(X) is given by

(−2  −1  1) (x1, x2, x3)^T = 0;

i.e., by −2x1 − x2 + x3 = 0. Alternatively, we can calculate the cross product of the two vectors in the basis B = {c1, c2} = { (1, 1, 3)^T, (2, 1, 5)^T } in order to obtain a normal vector for
Lin(X) and hence a Cartesian description. However, note that although the cross product
of two vectors is applicable here (where Lin(X) is a 2-dimensional subspace of R3 ) it will
not be applicable in general (where Lin(X) may be a k-dimensional subspace of Rn ). Only
the first method (based on the orthogonality between CS(A) and N (AT )) is applicable in
general.
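The general method, obtaining a Cartesian description of CS(A) = Lin(X) from a basis of N(A^T), is easy to automate. A minimal sympy sketch (an aside; it uses the matrix of Example 21.1.1):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

# Rows of F are the transposed basis vectors of N(A^T)
F = sp.Matrix.vstack(*[v.T for v in A.T.nullspace()])
print(F)        # Matrix([[-2, -1, 1]]) (up to scaling)
print(F * A)    # zero row: every column of A satisfies -2*x1 - x2 + x3 = 0
```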
22.5 Consistency of a linear system revisited
Let us now consider a general linear system Ax = b where A is a given matrix and b is a
given vector. We already know that the system is consistent if and only if the rank of the
augmented matrix (A|b) is equal to the rank of the coefficient matrix A, that is, if and
only if
ρ((A|b)) = ρ(A).
An alternative statement amounting to the consistency of the system Ax = b is that the
vector b belongs to the column space of A:
b ∈ CS(A).
Let us prove this statement by considering the columns of A = (c1 . . . ck ). If the system
Ax = b is consistent, it must have at least one solution for the vector x; let us denote this
solution by x = (s1, ..., sk)^T. Then, since the equation

(c1 . . . ck) (s1, ..., sk)^T = b
amounts to the equation
s1 c1 + · · · + sk ck = b,
we deduce that b is a linear combination of the columns of A; i.e., b ∈ CS(A). Conversely,
if b ∈ CS(A), then there exist scalars s1 , s2 , ..., sk such that
s1 c1 + · · · + sk ck = b;
i.e., such that

(c1 . . . ck) (s1, ..., sk)^T = b.
Hence, the linear system Ax = b, where A = (c1 . . . ck), admits at least one solution x = (s1, ..., sk)^T and is therefore consistent. Thus, we have shown that Ax = b is consistent if
and only if b ∈ CS(A); i.e., we have
ρ((A|b)) = ρ(A)
if and only if
b ∈ CS(A).
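The rank criterion is straightforward to test numerically. A short sympy sketch (an aside; the right-hand side b below is just a sample choice):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])
b = sp.Matrix([1, 0, 2])            # sample right-hand side (this is column c3 of A)

augmented = A.row_join(b)           # the augmented matrix (A | b)
print(A.rank() == augmented.rank()) # True here, so Ax = b is consistent and b lies in CS(A)
```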
22.6 Exercises for self study
Exercise 22.6.1 Consider the set of vectors X = {v1, v2, v3, v4} where

v1 = (0, 1, 2)^T,   v2 = (3, 1, 5)^T,   v3 = (−3, 1, −1)^T,   v4 = (3, 4, 11)^T.
(a) Find the matrices RRE(A) and RRE(AT ), where A = (v1 v2 v3 v4 ) is the matrix whose
columns are the vectors in X.
(b) Obtain bases B1 , B2 , B3 and B4 for the vector spaces CS(A), RS(A), N (A) and
N (AT ), respectively.
(c) Use the bases B2 and B3 to confirm that RS(A) is orthogonal to N (A).
(d) Briefly explain why CS(A) is orthogonal to N (AT ) and confirm this fact by using the
bases B1 and B4 .
Exercise 22.6.2 Consider the set X and the matrix A from Exercise 22.6.1:
(a) Is X a linearly independent set? Does X span R3 ? Briefly justify your answers.
(b) Obtain a basis for Lin(X) and also a Cartesian description for Lin(X).
(c) Also obtain Cartesian descriptions for RS(A), N (A) and N (AT ).
Exercise 22.6.3 Consider the set of vectors X = {v1, v2, v3} where

v1 = (0, 1, 1, 0, 3)^T,   v2 = (1, 2, 2, 3, 5)^T,   v3 = (1, 1, 1, 1, 1)^T.
(a) Find the matrices RRE(A) and RRE(AT ), where A = (v1 v2 v3 ) is the matrix whose
columns are the vectors in X.
(b) Obtain bases B and C for the vector spaces CS(A) and N (AT ), respectively.
(c) Hence, obtain a Cartesian description for CS(A) ⊂ R5 .
(d) Use your Cartesian description from part (c) to confirm that each vector in X belongs
to CS(A).
Exercise 22.6.4 Consider the set X and the matrix A from Exercise 22.6.3:
(a) Is X a linearly independent set? Briefly justify your answer.
(b) Obtain a Cartesian description and a basis D for Lin(X) ⊂ R5 . Also state the dimension
of Lin(X).
(c) Find the coordinates (v1 )D , (v2 )D and (v3 )D of the vectors in X with respect to your
basis D of Lin(X).
22.7 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 5.3 and 6.5 of our Algebra Textbook are relevant.
23 Linear transformations, 1 of 6
For the next six lectures, we focus on special types of functions between vector spaces known
as linear transformations. In this lecture, we introduce the relevant definitions and a few
fundamental results; among them, the rank-nullity theorem for linear transformations.
23.1 The main definitions
Recall that a function from a set X to a set Y is a rule which assigns to every element
x ∈ X a unique element y ∈ Y .
Now suppose that V and W are not just sets, but vector spaces. A function T : V → W is called linear if for all vectors u, v ∈ V and all scalars α ∈ R:
1. T (u + v) = T (u) + T (v), and
2. T (αu) = αT (u).
Any such linear function is known as a linear transformation. In the special case where
W = V , a linear transformation T : V → V is known as a linear operator.
Conditions 1 and 2 imply, and are implied by, the single condition that for all vectors
u, v ∈ V and all scalars α, β ∈ R:
T (αu + βv) = αT (u) + βT (v).
Therefore, a linear transformation T : V → W maps linear combinations of vectors in V
to the same linear combinations of the image vectors in W . In this sense, T preserves the
‘linearity’ of the vector space V . In particular, T maps the zero vector in V to the zero
vector in W . This can be seen in a number of ways. For instance, take any x ∈ V . Then,
by the linearity of T , we have T (0) = T (0x) = 0T (x) = 0.
Example 23.1.1 The function F1 : R → R defined by F1 (x) = 3x is a linear transformation, since for any vectors x, y ∈ R and any scalars α, β ∈ R, we have
F1 (αx + βy) = 3(αx + βy) = α(3x) + β(3y) = αF1 (x) + βF1 (y).
On the other hand, neither of the functions F2 : R → R and F3 : R → R defined by F2(x) = 3x + 2 and F3(x) = x² is linear. We have

F2(αx + βy) = 3(αx + βy) + 2 ≠ α(3x + 2) + β(3y + 2) = αF2(x) + βF2(y)

and

F3(αx + βy) = (αx + βy)² ≠ α(x²) + β(y²) = αF3(x) + βF3(y).
Example 23.1.2 Let A be an m × n matrix and let T : Rn → Rm be the function defined
by matrix multiplication: T (x) = Ax. Then T is a linear transformation. We have
T (u + v) = A(u + v) = Au + Av = T (u) + T (v)
and also
T (αu) = A(αu) = αAu = αT (u).
Example 23.1.3 Let V be the set of all functions of the form f (x) = a + bx, where a, b ∈
R. This is a vector space under the standard operations of pointwise addition and scalar
multiplication of functions. The transformation T : V → V defined by differentiation, that
is, T(f) = f′, is a linear transformation. First, T is well-defined, since for all f ∈ V, where f(x) = a + bx, the image vector f′ is given by f′(x) = b, which is an element of V. To show that T is a linear transformation, we use the properties of the derivative: Take any two elements f, g ∈ V and any scalars α, β ∈ R. Then

T(αf + βg) = (αf + βg)′ = αf′ + βg′ = αT(f) + βT(g).
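A quick symbolic confirmation of this linearity property, using sympy (an aside, not part of the notes):

```python
import sympy as sp

x, a, b, c, d, alpha, beta = sp.symbols('x a b c d alpha beta')
f = a + b*x        # a generic element of V
g = c + d*x        # another generic element of V

lhs = sp.diff(alpha*f + beta*g, x)               # T(alpha*f + beta*g)
rhs = alpha*sp.diff(f, x) + beta*sp.diff(g, x)   # alpha*T(f) + beta*T(g)
print(sp.simplify(lhs - rhs) == 0)               # True
```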
23.2 Identity, compositions, linear combinations, inverse
Given a vector space V , the linear transformation T : V → V defined by T (v) = v is called
the identity transformation.
The composition of two linear transformations is again a linear transformation. In particular, if T : V → W and S : W → U , then the composite transformation S ◦ T , denoted
by ST , is the linear transformation defined by
(S ◦ T )(v) = (ST )(v) = S(T (v)).
Note that ST means ‘T followed by S’; that is, V \xrightarrow{T} W \xrightarrow{S} U.
A linear combination of linear transformations is again a linear transformation. More
precisely, if S and T are both linear transformations between vector spaces V and W , i.e.
if S : V → W and T : V → W , then the sum S + T and the scalar multiple αS, α ∈ R,
are linear transformations between V and W , and therefore so is the linear combination
αS + βT for any choice of α, β ∈ R.
Finally, let V and W be finite-dimensional vector spaces of the same dimension, and let
T : V → W be a linear transformation. Then, if it exists, the inverse T −1 of T is the
unique linear transformation T −1 : W → V such that
T⁻¹(T(v)) = v   for all v ∈ V.

23.3 Range and kernel
Suppose that T is a linear transformation from a vector space V to a vector space W .
Then the range of T , denoted by R(T ), is defined as the set
R(T ) = {w ∈ W | w = T (v) for some v ∈ V } ⊆ W.
The kernel of T , denoted by ker(T ), is defined as the set
ker(T ) = {v ∈ V | T (v) = 0} ⊆ V,
where 0 is the zero vector of W .
The proof of the following theorem is omitted but is quite straightforward:
Theorem 23.3.1 The kernel and the range of a linear transformation T : V → W are
subspaces of V and W , respectively.
Below, we find the range and kernel of each of the linear transformations presented in
Examples 23.1.1, 23.1.2 and 23.1.3:
Example 23.3.2 The range of the function F1 : R → R defined by F1 (x) = 3x is the set
R(F1 ) = {y ∈ R | y = 3x for some x ∈ R} .
We will show that
R(F1 ) = R
by proving that both statements R(F1) ⊆ R and R ⊆ R(F1) hold true. Indeed, if y ∈ R(F1), then y = 3x for some x ∈ R, which implies that y ∈ R. Hence, R(F1) ⊆ R. Conversely, given any y ∈ R, we can write y = 3(y/3), so there exists an x ∈ R, namely x = y/3, such that y = 3x. Hence, y ∈ R(F1), which shows that R ⊆ R(F1). The equality of the sets
R(F1 ) and R follows.
The kernel of the function F1 : R → R is the set
ker(F1 ) = {x ∈ R | 3x = 0} .
Clearly,
ker(F1 ) = {0}
because the unique solution of the equation 3x = 0 is x = 0.
Example 23.3.3 Given any m × n matrix A, the range of the function T : Rn → Rm
defined by T (x) = Ax is the set
R(T ) = {y ∈ Rm | y = Ax for some x ∈ Rn } .
We claim the general result that
R(T ) = CS(A).
To prove this statement, we need to show that R(T ) ⊆ CS(A) and CS(A) ⊆ R(T ). If
y ∈ R(T ), then y = Ax for some x ∈ Rn . This implies that y is a linear combination of
the columns of A = (c1 . . . cn), since

y = Ax = (c1 . . . cn) (x1, ..., xn)^T = x1 c1 + · · · + xn cn.
Hence, y ∈ CS(A), which shows that R(T ) ⊆ CS(A). Conversely, if y ∈ CS(A), then y
is some linear combination of the columns of A = (c1 . . . cn ); that is,
y = x1 c1 + · · · + xn cn.

This implies that y = Ax where x = (x1, ..., xn)^T ∈ Rn. Hence, y ∈ R(T), which shows that
CS(A) ⊆ R(T ). The equality between the sets CS(A) and R(T ) follows.
The kernel of T is the set
ker(T ) = {x ∈ Rn | Ax = 0} .
We claim the general result that
ker(T ) = N (A).
To prove this, we need to show that ker(T ) ⊆ N (A) and N (A) ⊆ ker(T ). Indeed, if
x ∈ ker(T ), then Ax = 0. Therefore, x solves the homogeneous linear system Ax = 0. It
23
follows that x ∈ N (A), which shows that ker(T ) ⊆ N (A). Conversely, if x ∈ N (A), then
x solves the homogeneous linear system Ax = 0. Hence x ∈ ker(T ), which shows that
N (A) ⊆ ker(T ). The equality ker(T ) = N (A) follows.
Example 23.3.4 Given the vector space V consisting of all functions of the form f (x) =
ax + b, where a, b ∈ R, the range of the linear transformation T : V → V defined by
T(f) = f′ is the set

R(T) = {g ∈ V | g = f′ for some f ∈ V}.
We will show that R(T ) = W , where W is the subspace of V consisting of all functions of
the form f (x) = c, where c ∈ R. You may find it useful to confirm that W is a subspace of
V by using the Subspace Criterion. Let us first show that R(T ) ⊆ W . Indeed, if g ∈ R(T ),
then g = f′ for some f ∈ V. Since any f ∈ V has the form f(x) = ax + b, we see that f′(x) = a, so g(x) = a. Hence g ∈ W. Moreover, we also have that W ⊆ R(T). Indeed, if g ∈ W, then g(x) = c for some c ∈ R. Hence, g(x) can be written in the form g(x) = f′(x)
where f (x) = cx ∈ V . It follows that g ∈ R(T ). This completes the proof that R(T ) = W .
The kernel of T is the set
ker(T) = {f ∈ V | f′ = 0},
where 0 denotes the identically zero function in V . We will show that ker(T ) = U , where
U is the subspace of V consisting of all functions of the form f (x) = c, where c ∈ R. Note
that U and W correspond to the same subspace of V ; namely, the subspace consisting of all
constant functions. However, U is a subspace of the domain V of T while W is a subspace
of the codomain V of T . Let us first show that ker(T ) ⊆ U . Indeed, if f ∈ ker(T ), then
f′ = 0, where 0 is the identically zero function of V. Hence f(x) = c for some c ∈ R, which shows that f ∈ U. Moreover, we also have that U ⊆ ker(T). Indeed, if f ∈ U, then f(x) = c for some c ∈ R. Hence, f′(x) = 0 where 0 is the identically zero function in V. It
follows that f ∈ ker(T ). This completes the proof that ker(T ) = U .
23.4 Rank-nullity theorem for linear transformations
Going back to Example 23.3.3, we saw that given any m × n matrix A, the linear transformation T : Rn → Rm defined by T (x) = Ax satisfies
R(T ) = CS(A)
and
ker(T ) = N (A).
In particular, if the m × n matrix A has rank k, we have that
k = rank(A) = dim(CS(A)) = dim(R(T )).
Then, A has nullity n − k, so we also have that
n − k = nullity(A) = dim(N (A)) = dim(ker(T )).
Thus, the Rank-Nullity theorem associated with A can be expressed in the form
dim(R(T )) + dim(ker(T )) = n.
More generally, for any linear transformation T : V → W whose domain V is a finite-dimensional vector space (not necessarily Euclidean), we have the following theorem, known
as the Rank-Nullity theorem for linear transformations:
Theorem 23.4.1 Suppose that T is a linear transformation from a finite-dimensional
vector space V to a vector space W . Then
rank(T ) + nullity(T ) = dim(V ).
The proof of this theorem can be found in section 7.2 of our algebra textbook. Note that
this result holds even if W is not finite-dimensional.
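For the matrix transformations of Example 23.3.3 the theorem can be checked mechanically: dim R(T) is the number of vectors in a basis of CS(A), and dim ker(T) the number in a basis of N(A). A sympy sketch (an aside):

```python
import sympy as sp

# T : R^5 -> R^3 defined by T(x) = A x, with A the matrix of Example 21.1.1
A = sp.Matrix([[1, 2, 1, 3, 4],
               [1, 1, 0, 5, -1],
               [3, 5, 2, 11, 7]])

dim_range = len(A.columnspace())   # dim R(T) = dim CS(A) = rank(T) = 2
dim_kernel = len(A.nullspace())    # dim ker(T) = dim N(A) = nullity(T) = 3
print(dim_range + dim_kernel == A.cols)   # True: rank(T) + nullity(T) = dim(V) = 5
```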
23.5 Exercises for self study
Exercise 23.5.1 For each of the following linear transformations, find a basis for the
kernel of T , ker(T ), and the range of T , R(T ). Verify the Rank-Nullity theorem in each
case:

(a) T : R2 → R3 by T((x, y)^T) = \begin{pmatrix} 1 & 2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} (x, y)^T = (x + 2y, 0, 0)^T,

(b) T : R3 → R3 by T((x, y, z)^T) = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} (x, y, z)^T = (x + y + z, y + z, z)^T.
Exercise 23.5.2 Give an example of a matrix A such that the linear transformation
T : R3 → R3 defined as T (x) = Ax has the following properties:
ker(T ) ⊂ R3 is the line with Cartesian equation x = y = z, and
R(T ) ⊂ R3 is the plane with Cartesian equation 2x + y − z = 0.
Exercise 23.5.3 For the following linear transformation T , find a basis for the kernel
of T , ker(T ), and the range of T , R(T ). Obtain a Cartesian description and a vector
parametric description for ker(T) and R(T):

T : R3 → R3 by T((x, y, z)^T) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ −1 & 0 & 1 \end{pmatrix} (x, y, z)^T = (x + y, y + z, −x + z)^T.
Exercise 23.5.4 (a) Define what we mean by a linear transformation T : V → W from
a vector space V to a vector space W .
(b) Hence, show that any linear transformation must map the zero vector in V to the zero
vector in W .
(c) Find a basis B1 for the kernel, ker(S), of the linear transformation S : R3 → R2 defined
by

S((x, y, z)^T) = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & 4 \end{pmatrix} (x, y, z)^T.
Also find a basis B2 for the range, R(S), of S.
(d) Obtain a Cartesian description for ker(S) and a vector parametric description for R(S).
(e) Is the linear transformation S invertible? Briefly justify your answer.
23.6 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 7.1 and 7.2 of our Algebra Textbook are relevant.
24 Linear transformations, 2 of 6
In this section we focus on linear transformations T : V → W between finite-dimensional
vector spaces V and W and show that any such transformation can be represented by a
matrix. We start by considering linear transformations T : Rn → Rm between Euclidean
vector spaces, in which case a matrix representation for T arises naturally. Examples of
such linear transformations include reflections, rotations and stretches. These are all transformations from Rn to itself. We obtain matrices representing reflections, rotations and
stretches in the case where n = 2. We then extend our discussion to linear transformations
T : V → W between any finite-dimensional vector spaces V and W . In this general case,
a matrix representation for T arises only if a basis B is introduced for the domain V of
T and a basis C is introduced for the codomain W of T . Accordingly, we talk about the
matrix representation of T : V → W with respect to the bases B and C.
24.1 Matrix representation of a linear T : Rn → Rm
We saw in Section 23 that given any m × n matrix A, we can define an associated linear
transformation T : Rn → Rm by T (v) = Av. There is a reverse connection: for every
linear transformation T : Rn → Rm , there is a matrix A such that T (v) = Av. In this
context, we will denote the matrix by AT in order to identify it as the matrix corresponding
to T . This should not be confused with the notation AT for the transpose of a matrix.
The following theorem tells us how to construct AT given any linear transformation T : Rn → Rm.
Theorem 24.1.1 Suppose that T : Rn → Rm is a linear transformation. Let {e1 , e2 , . . . , en }
denote the standard basis of the domain of T , Rn , and let AT be the matrix whose columns
are the vectors T (e1 ), T (e2 ), . . . , T (en ) ∈ Rm : that is,
AT = ( T(e1) T(e2) . . . T(en) ).
Then, for every x ∈ Rn , T (x) = AT x.
Proof Let x = (x1, x2, ..., xn)^T be any vector in Rn. Then

x = (x1, x2, ..., xn)^T = x1 (1, 0, ..., 0)^T + x2 (0, 1, ..., 0)^T + · · · + xn (0, 0, ..., 1)^T = x1 e1 + x2 e2 + · · · + xn en.
Then by the linearity properties of T we have
T (x) = T (x1 e1 + x2 e2 + · · · + xn en )
= T (x1 e1 ) + T (x2 e2 ) + · · · + T (xn en )
= x1 T (e1 ) + x2 T (e2 ) + · · · + xn T (en ).
But this last expression is just a linear combination of the columns of AT = ( T(e1) T(e2) . . . T(en) ), so we have

T(x) = ( T(e1) T(e2) . . . T(en) ) (x1, x2, ..., xn)^T = AT x,
which completes the proof.
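The construction in this proof translates directly into code: apply T to each standard basis vector and use the images as columns. A numpy sketch (an aside; it uses the map of Example 24.1.2 below):

```python
import numpy as np

def T(v):
    """The linear map of Example 24.1.2: T(x, y, z) = (2x + y + z, x - y)."""
    x, y, z = v
    return np.array([2*x + y + z, x - y])

# Columns of A_T are the images of the standard basis vectors
A_T = np.column_stack([T(e) for e in np.eye(3)])
print(A_T)   # [[ 2.  1.  1.]
             #  [ 1. -1.  0.]]

v = np.array([1.0, 2.0, 3.0])
print(np.allclose(A_T @ v, T(v)))   # True: T(x) = A_T x
```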
Example 24.1.2 Let T : R3 → R2 be the linear transformation given by

T((x, y, z)^T) = (2x + y + z, x − y)^T.
To find the matrix AT associated with this linear transformation, we calculate the images
of the standard basis vectors e1, e2 and e3. We have

T(e1) = T((1, 0, 0)^T) = (2, 1)^T,   T(e2) = T((0, 1, 0)^T) = (1, −1)^T,   T(e3) = T((0, 0, 1)^T) = (1, 0)^T.
Hence, the 2 × 3 matrix A = ( T(e1) T(e2) T(e3) ) representing T : R3 → R2 is

A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & −1 & 0 \end{pmatrix}.
Note that, indeed,

A (x, y, z)^T = \begin{pmatrix} 2 & 1 & 1 \\ 1 & −1 & 0 \end{pmatrix} (x, y, z)^T = (2x + y + z, x − y)^T = T((x, y, z)^T).
24.2 Reflections, rotations and stretches in R2
We will consider three types of linear transformations T : R2 → R2 , namely, reflections,
rotations and stretches, and construct the matrices corresponding to these. Since the
matrix of any linear transformation T : Rn → Rm is determined by the way in which T
acts on the standard basis of the domain Rn , all that we have to do is to calculate the
images T(e1) and T(e2) of the standard basis vectors and then construct AT by the formula AT = ( T(e1) T(e2) ). We start with reflections:
Reflections A reflection in the x-axis is depicted below. It leaves the basis vector e1 unchanged and sends the basis vector e2 to −e2. Its effect on a general vector v = (a, b)^T ∈ R2 is also depicted below. Note that in the diagram, we have identified the domain with the codomain of T and regarded them both as a single copy of R2:
Figure 24.2.1
The matrix AT representing T is given by

AT = ( T(e1) T(e2) ) = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.

Then, for any vector v = (a, b)^T ∈ R2, we have

T((a, b)^T) = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix} (a, b)^T = (a, −b)^T,

in agreement with the illustration above.
Rotations An anticlockwise rotation by an angle θ, where 0 < θ < π/2, is visualised below:
Figure 24.2.2
The matrix AT representing this rotation can be read directly from the diagram. We have:

AT = ( T(e1) T(e2) ) = \begin{pmatrix} \cos\theta & −\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.

Then, for any vector v = (a, b)^T ∈ R2, we have

T((a, b)^T) = \begin{pmatrix} \cos\theta & −\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} (a, b)^T = (a\cos\theta − b\sin\theta, a\sin\theta + b\cos\theta)^T.
Stretches A stretch by a factor of k ∈ R in the x-direction and a factor of l ∈ R in the
y-direction is depicted below.
Figure 24.2.3
The corresponding matrix AT is thus

AT = ( T(e1) T(e2) ) = \begin{pmatrix} k & 0 \\ 0 & l \end{pmatrix}.
Invertible linear transformations Now, in general, an important relationship between
a linear transformation T : Rn → Rn and the n × n matrix AT representing it is that T is
invertible only if the matrix AT is invertible. This result is stated without proof but is a
consequence of the fact that each T : Rn → Rn uniquely determines an n × n matrix AT ,
and vice versa. Rotations, reflections and stretches by non-zero factors are all invertible
transformations. In particular, the inverse of a reflection in the x-axis is another reflection
in the x-axis, the inverse of an anticlockwise rotation by θ is an anticlockwise rotation by
−θ (i.e., a clockwise rotation by θ) and the inverse of a stretch by a factor of k ≠ 0 in the x-direction and a factor of l ≠ 0 in the y-direction is a stretch by a factor of 1/k in the
x-direction and a factor of 1/l in the y-direction.
If T is a linear transformation from Rn to Rn and T −1 exists, then
A_{T⁻¹} A_T = A_T A_{T⁻¹} = I.
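For instance, one can check numerically that the rotation matrices satisfy this relation, with the rotation by −θ acting as the inverse. A numpy sketch (an aside):

```python
import numpy as np

def rotation(theta):
    """Matrix of an anticlockwise rotation of R^2 by theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.7                               # any angle
R, R_inv = rotation(theta), rotation(-theta)
print(np.allclose(R_inv @ R, np.eye(2)))  # True
print(np.allclose(R @ R_inv, np.eye(2)))  # True
```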
24.3 The matrix A_T^{B→C}
In this subsection, we find a matrix representation for a linear transformation
T : V → W from a finite-dimensional vector space V to a finite-dimensional vector space
W . The spaces V and W may not be Euclidean spaces. Provided that a basis B for the
domain V of T and a basis C for the codomain W of T are introduced, the elements v ∈ V and T(v) ∈ W can be represented by coordinate vectors (v)B and (T(v))C with respect to B and C, respectively. The resulting matrix representing T : V → W with respect to B and C is denoted by A_T^{B→C}.
We begin with the following theorem:
Theorem 24.3.1 Let V be a finite-dimensional vector space and let T be a linear transformation from V to a vector space W . Then T is completely determined by how it operates
on a basis of V .
Proof Let dim(V ) = n, and let B = {v1 , v2 , . . . , vn } be a basis of V . Then any v ∈ V
can be uniquely expressed as a linear combination of these basis vectors: v = α1 v1 +α2 v2 +
· · · + αn vn . So, by the linearity of T ,
T (v) = T (α1 v1 + α2 v2 + · · · + αn vn )
= α1 T (v1 ) + α2 T (v2 ) + · · · + αn T (vn ).
That is, if v ∈ V is expressed as a linear combination of the basis vectors, then the image
T (v) is the same linear combination of the images of the basis vectors. Therefore, if we
know how T operates on the basis vectors, we know how T operates on all v ∈ V .
In the particular case where both $V$ and $W$ are finite-dimensional vector spaces, and provided that a basis $B$ for $V$ and a basis $C$ for $W$ have been introduced, this result allows us to find a matrix representation for the linear transformation $T$. To be more specific, let $\dim(V) = n$, $\dim(W) = m$ and let the corresponding bases be given by $B = \{v_1, v_2, \ldots, v_n\}$ and $C = \{w_1, w_2, \ldots, w_m\}$. Furthermore, let $(v)_B$ be the coordinate vector of $v \in V$ with respect to the $B$ basis and let $\big(T(v)\big)_C$ be the coordinate vector of the image of $v$ with respect to the $C$ basis. Then, by working with these coordinate vectors (rather than with the vectors themselves), we can find a matrix such that
\[
\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B.
\]
The following theorem tells us how. Its proof is omitted, but it is analogous to that of Theorem 24.1.1.

Theorem 24.3.2 Let $T : V \to W$ be a linear transformation from an $n$-dimensional vector space $V$ to an $m$-dimensional vector space $W$. Let $B = \{v_1, v_2, \ldots, v_n\}$ denote a basis for the domain $V$ and $C = \{w_1, w_2, \ldots, w_m\}$ denote a basis for the codomain $W$. Furthermore, let $A_T^{B\to C}$ be the $m \times n$ matrix whose columns are the coordinate vectors $\big(T(v_1)\big)_C, \big(T(v_2)\big)_C, \ldots, \big(T(v_n)\big)_C$ of the images of the $B$-basis vectors with respect to the $C$ basis; that is,
\[
A_T^{B\to C} = \Big(\big(T(v_1)\big)_C\ \big(T(v_2)\big)_C\ \ldots\ \big(T(v_n)\big)_C\Big).
\]
Then, for every $v \in V$, $\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B$.

Note that if $V$ is $\mathbb{R}^n$, $W$ is $\mathbb{R}^m$ and $B$ and $C$ are the standard bases for $V$ and $W$, then $A_T^{B\to C}$ becomes the matrix $A_T$ introduced in subsection 24.1; that is,
\[
A_T^{B\to C} = \Big(\big(T(v_1)\big)_C\ \big(T(v_2)\big)_C\ \ldots\ \big(T(v_n)\big)_C\Big) = \big(T(e_1)\ T(e_2)\ \ldots\ T(e_n)\big) = A_T.
\]
Below, we illustrate Theorem 24.3.2 by means of an example where $T$ is a linear transformation between Euclidean spaces, $B$ is the standard basis for the domain of $T$, and $C$ is a non-standard basis for the codomain of $T$.
Example 24.3.3 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + 2y + z\\ x - z\end{pmatrix}.
\]
Let $B$ be the standard basis for the domain $\mathbb{R}^3$ of $T$,
\[
B = \{e_1, e_2, e_3\} = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\},
\]
and let $C$ be the following basis for the codomain $\mathbb{R}^2$ of $T$:
\[
C = \{w_1, w_2\} = \left\{\begin{pmatrix}1\\4\end{pmatrix}, \begin{pmatrix}2\\3\end{pmatrix}\right\}.
\]
Find the matrix $A_T^{B\to C}$ representing $T$ with respect to the $B$ and $C$ bases.
Following Theorem 24.3.2, we need to find the coordinate vectors $\big(T(e_1)\big)_C$, $\big(T(e_2)\big)_C$ and $\big(T(e_3)\big)_C$ of the images of the $B$-basis vectors with respect to the $C$ basis. Using the definition of $T$ we get
\[
T(e_1) = T\!\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}, \qquad
T(e_2) = T\!\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix}, \qquad
T(e_3) = T\!\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
where the images $\begin{pmatrix}1\\1\end{pmatrix}$, $\begin{pmatrix}2\\0\end{pmatrix}$, $\begin{pmatrix}1\\-1\end{pmatrix}$ are elements of $\mathbb{R}^2$. In order to obtain the coordinates $\big(T(e_i)\big)_C$ of each image vector $T(e_i)$ with respect to the $C$ basis of $\mathbb{R}^2$, we need to express each $T(e_i)$ as a linear combination of the $C$-basis vectors $w_1, w_2$. In other words, for each $T(e_i)$, we need to find scalars $\alpha, \beta$ such that
\[
\alpha w_1 + \beta w_2 = T(e_i).
\]
Given each $T(e_i)$, this equation is equivalent to solving the linear system
\[
D\mathbf{x} = T(e_i),
\]
where $D = (w_1\ w_2)$ is the matrix whose columns are the vectors $w_1, w_2$, and $\mathbf{x} = \begin{pmatrix}\alpha\\\beta\end{pmatrix}$ is the vector of the unknowns. A fast way of solving this system is to invert $D$ and then use the relation $\mathbf{x} = D^{-1}T(e_i)$. We have:
\[
D = \begin{pmatrix}1 & 2\\ 4 & 3\end{pmatrix}, \quad \text{so} \quad D^{-1} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}.
\]
Hence, for $T(e_1)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-1/5\\ 3/5\end{pmatrix},
\]
for $T(e_2)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}2\\0\end{pmatrix} = \begin{pmatrix}-6/5\\ 8/5\end{pmatrix},
\]
for $T(e_3)$,
\[
\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \frac{1}{5}\begin{pmatrix}-3 & 2\\ 4 & -1\end{pmatrix}\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}-1\\ 1\end{pmatrix}.
\]
We conclude that
\[
\big(T(e_1)\big)_C = \begin{pmatrix}-1/5\\ 3/5\end{pmatrix}_C, \qquad
\big(T(e_2)\big)_C = \begin{pmatrix}-6/5\\ 8/5\end{pmatrix}_C, \qquad
\big(T(e_3)\big)_C = \begin{pmatrix}-1\\ 1\end{pmatrix}_C,
\]
corresponding to the equations
\[
T(e_1) = -\tfrac{1}{5}w_1 + \tfrac{3}{5}w_2, \qquad
T(e_2) = -\tfrac{6}{5}w_1 + \tfrac{8}{5}w_2, \qquad
T(e_3) = -w_1 + w_2.
\]
It follows that the $2 \times 3$ matrix $A_T^{B\to C}$ that represents $T : \mathbb{R}^3 \to \mathbb{R}^2$ with respect to the bases $B$ and $C$ is
\[
A_T^{B\to C} = \begin{pmatrix}-1/5 & -6/5 & -1\\ 3/5 & 8/5 & 1\end{pmatrix}.
\]
For any $v \in \mathbb{R}^3$, we have that
\[
\big(T(v)\big)_C = A_T^{B\to C}\,(v)_B.
\]
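The column-by-column computation above is easy to reproduce numerically. The sketch below (not part of the original notes; the function and variable names are ours) uses the data of Example 24.3.3, solving $D\mathbf{x} = T(e_i)$ for each standard basis vector and assembling $A_T^{B\to C}$:

import numpy as np

def T(v):
    x, y, z = v
    return np.array([x + 2*y + z, x - z])

D = np.array([[1.0, 2.0],        # columns are the C-basis vectors w1, w2
              [4.0, 3.0]])

E = np.eye(3)
cols = [np.linalg.solve(D, T(E[:, j])) for j in range(3)]
A_BC = np.column_stack(cols)
print(A_BC)    # [[-0.2 -1.2 -1. ], [ 0.6  1.6  1. ]], i.e. [[-1/5, -6/5, -1], [3/5, 8/5, 1]]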
Note that the matrix $A_T$ representing the same transformation $T$ with respect to the standard basis of $\mathbb{R}^3$ and the standard basis of $\mathbb{R}^2$ is $A_T = \begin{pmatrix}1 & 2 & 1\\ 1 & 0 & -1\end{pmatrix}$. Indeed, since the coordinates $(v)$ of a vector $v \in \mathbb{R}^n$ with respect to the standard basis of $\mathbb{R}^n$ coincide with the entries of $v$, i.e. $(v) = v$, we can read the matrix $A_T$ directly from the definition of $T$:
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1 & 2 & 1\\ 1 & 0 & -1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + 2y + z\\ x - z\end{pmatrix}.
\]
We complete this topic by generalising the result established in subsection 23.3 that for any linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ between Euclidean vector spaces defined by $T(x) = Ax$, we have
\[
\ker(T) = N(A_T), \qquad R(T) = CS(A_T).
\]
The generalisation is straightforward but is stated without proof.

Theorem 24.3.4 Let $T : V \to W$ be a linear transformation between finite-dimensional vector spaces and let $B$ and $C$ be bases for the domain $V$ and the codomain $W$ of $T$, respectively. Then
\[
\ker(T) = N(A_T^{B\to C}), \qquad R(T) = CS(A_T^{B\to C}),
\]
where the coordinates of each vector in $N(A_T^{B\to C})$ refer to the $B$ basis of $V$ and the coordinates of each vector in $CS(A_T^{B\to C})$ refer to the $C$ basis of $W$.
24.4 Exercises for self study

Exercise 24.4.1 $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^2$ are linear transformations defined by
\[
T\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-x\\-y\end{pmatrix}
\qquad \text{and} \qquad
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-y\\x\end{pmatrix}.
\]
(a) Sketch the effects of $T$ and $S$ on the standard basis of $\mathbb{R}^2$ and hence describe $T$ and $S$ in words.
(b) Find the matrices $A_T$ and $A_S$ representing $T$ and $S$; that is, find $A_T^{B\to C}$ and $A_S^{B\to C}$ where $B$ and $C$ are both the standard basis of $\mathbb{R}^2$.
(c) Describe in words the linear transformation $S^2T$. Then check your answer by multiplying the corresponding matrices.
Exercise 24.4.2 $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^2$ are linear transformations with respective matrices
\[
A_T = \begin{pmatrix}\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\end{pmatrix}
\qquad \text{and} \qquad
A_S = \begin{pmatrix}-1 & 0\\ 0 & 1\end{pmatrix}.
\]
(a) Describe $T$ and $S$ in words.
(b) Illustrate $ST$ and $TS$ by considering their effects on the standard basis of $\mathbb{R}^2$ and show that $ST \neq TS$.
Exercise 24.4.3 (a) Find the matrix $A_T$ representing the reflection $T : \mathbb{R}^2 \to \mathbb{R}^2$ in the line $y = x$ by considering the effect of $T$ on the standard basis of $\mathbb{R}^2$.
(b) Explain why we should expect that $A_T^2 = I$ and then verify this directly.

Exercise 24.4.4 Let $V$ be the vector space of all functions $f : \mathbb{R} \to \mathbb{R}$ of the form $f(x) = a + bx + cx^2$, where vector addition and scalar multiplication are defined in the standard way:
\[
(f + g)(x) := f(x) + g(x), \qquad (\alpha f)(x) := \alpha f(x).
\]
Consider the transformation $T : V \to V$ defined by differentiation, i.e.
\[
T(f) = f'.
\]
(a) Show that the transformation $T$ is well-defined; that is, show that the image vector $T(f)$ is indeed an element of $V$.
(b) Show that $T$ is a linear transformation.
You are given the basis $B = \{f_1, f_2, f_3\}$ for $V$, where $f_1(x) = 1 + x + x^2$, $f_2(x) = 3 + 2x$ and $f_3(x) = 4x + 5x^2$. You are also given the basis $C = \{g_1, g_2, g_3\}$ for $V$, where $g_1(x) = 1$, $g_2(x) = x$ and $g_3(x) = x^2$.
(c) Find the matrix $A_T^{B\to C}$ representing $T$ with respect to the bases $B$ and $C$.
(d) Find the null space $N(A_T^{B\to C})$ and the column space $CS(A_T^{B\to C})$.
(e) Using your answers to part (d), find the kernel and the range of $T$.
24.5
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 7.1 and 7.2 of our Algebra Textbook are relevant.
25 Linear transformations, 3 of 6

25.1 Change of basis and transition matrix
Let us consider the Euclidean space $\mathbb{R}^n$. Suppose that the vectors $v_1, v_2, \ldots, v_n$ form a basis $B$ for $\mathbb{R}^n$. Then, as we have seen, any $x \in \mathbb{R}^n$ can be written in exactly one way as a linear combination
\[
x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n
\]
of the vectors in the basis $B$. The vector
\[
(x)_B = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}_B
\]
is the coordinate vector of $x$ with respect to $B = \{v_1, v_2, \ldots, v_n\}$. Note that the subscript $B$ may be omitted from the right hand side of the above equation as long as it is clear that the coordinates $\alpha_1, \alpha_2, \ldots, \alpha_n$ refer to the basis $B$. In other words, we can also write $(x)_B = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}$. In the particular case where $B$ is the standard basis $\{e_1, e_2, \ldots, e_n\}$ for $\mathbb{R}^n$, the coordinate vector $(x)$ coincides with $x$ itself. This is because if $x = \begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix} \in \mathbb{R}^n$, then
\[
x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n, \quad \text{hence} \quad (x) = \begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}.
\]
In practice, in order to find the coordinates of a given vector $x$ with respect to a basis $B = \{v_1, v_2, \ldots, v_n\}$, we just need to solve the system of linear equations
\[
\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = x.
\]
This system can be expressed in the form
\[
(v_1\ v_2\ \ldots\ v_n)\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix} = x,
\]
where $(v_1\ v_2\ \ldots\ v_n)$ is the matrix whose columns are the vectors in the basis $B$. Denoting this matrix by $P_B$,
\[
P_B = (v_1\ v_2\ \ldots\ v_n),
\]
and using the fact that
\[
\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix} = (x)_B \qquad \text{and} \qquad x = (x),
\]
the above equation becomes
\[
P_B (x)_B = (x).
\]
The matrix $P_B$ links the coordinates $(x)_B$ of $x$ with respect to the $B$-basis to the coordinates $(x)$ of $x$ with respect to the standard basis. It is for this reason that $P_B$ is called the transition matrix from $B$-coordinates to standard coordinates. Note that the $n \times n$ matrix $P_B$ is invertible because its columns form a basis for $\mathbb{R}^n$, which means that the rank of $P_B$ is $n$. So we can also write
\[
(x)_B = P_B^{-1}(x).
\]
The matrix $P_B^{-1}$ is then the transition matrix from standard coordinates to $B$-coordinates.
Example 25.1.1 Let $B$ be the following set of vectors in $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\2\\-1\end{pmatrix}, \begin{pmatrix}2\\-1\\4\end{pmatrix}, \begin{pmatrix}3\\2\\1\end{pmatrix}\right\};
\]
that is,
\[
f_1 = e_1 + 2e_2 - e_3, \qquad f_2 = 2e_1 - e_2 + 4e_3, \qquad f_3 = 3e_1 + 2e_2 + e_3.
\]
(a) Show that $B$ is a basis for $\mathbb{R}^3$.
(b) Consider the vector $v \in \mathbb{R}^3$ whose coordinate vector with respect to the $B$-basis is $\begin{pmatrix}4\\1\\-5\end{pmatrix}_B$. Find the standard coordinate vector $(v)$ of $v$.
Regarding part (a), to show that $B$ is a basis, we can form the matrix $(f_1\ f_2\ f_3)$,
\[
\begin{pmatrix}1 & 2 & 3\\ 2 & -1 & 2\\ -1 & 4 & 1\end{pmatrix},
\]
and evaluate its determinant. We find that this is equal to $4 \neq 0$, so $B$ is a basis for $\mathbb{R}^3$; i.e. $(f_1\ f_2\ f_3)$ has full row rank and full column rank. In particular, having shown that $B$ is a basis, we have $(f_1\ f_2\ f_3) = P_B$; namely, the transition matrix from $B$-coordinates to standard coordinates.

Regarding part (b), since
\[
(v)_B = \begin{pmatrix}4\\1\\-5\end{pmatrix}_B,
\]
we have
\[
v = 4f_1 + f_2 - 5f_3.
\]
We can find the standard coordinates $(v)$ either by expressing $v$ as a linear combination of the standard basis vectors $e_1, e_2, e_3$, according to
\[
v = 4f_1 + f_2 - 5f_3 = 4(e_1 + 2e_2 - e_3) + (2e_1 - e_2 + 4e_3) - 5(3e_1 + 2e_2 + e_3) = -9e_1 - 3e_2 - 5e_3,
\]
that is, by using
\[
(v) = v = 4\begin{pmatrix}1\\2\\-1\end{pmatrix} + \begin{pmatrix}2\\-1\\4\end{pmatrix} - 5\begin{pmatrix}3\\2\\1\end{pmatrix} = \begin{pmatrix}-9\\-3\\-5\end{pmatrix},
\]
or, faster, by applying the formula derived previously:
\[
(v) = P_B(v)_B = \begin{pmatrix}1 & 2 & 3\\ 2 & -1 & 2\\ -1 & 4 & 1\end{pmatrix}\begin{pmatrix}4\\1\\-5\end{pmatrix}_B = \begin{pmatrix}-9\\-3\\-5\end{pmatrix}.
\]


Example 25.1.2 Given the vector $w = \begin{pmatrix}5\\7\\-3\end{pmatrix}$, find the coordinate vector $(w)_B$, where the basis $B$ is given in Example 25.1.1.

To find the $B$-coordinates of $w$, we can either solve the equation
\[
\begin{pmatrix}5\\7\\-3\end{pmatrix} = \alpha_1\begin{pmatrix}1\\2\\-1\end{pmatrix} + \alpha_2\begin{pmatrix}2\\-1\\4\end{pmatrix} + \alpha_3\begin{pmatrix}3\\2\\1\end{pmatrix}
\]
and identify $(w)_B$ with the solution $\begin{pmatrix}\alpha_1\\\alpha_2\\\alpha_3\end{pmatrix}_B$ of this equation, or we can use the transition matrix $P_B^{-1}$ from standard coordinates to $B$-coordinates; i.e.,
\[
(w)_B = P_B^{-1}(w).
\]
Omitting the steps, we find that
\[
(w)_B = P_B^{-1}(w) = \begin{pmatrix}1\\-1\\2\end{pmatrix}_B.
\]
This implies that
\[
w = f_1 - f_2 + 2f_3,
\]
which is verified below:
\[
(w) = w = f_1 - f_2 + 2f_3 = \begin{pmatrix}1\\2\\-1\end{pmatrix} - \begin{pmatrix}2\\-1\\4\end{pmatrix} + 2\begin{pmatrix}3\\2\\1\end{pmatrix} = \begin{pmatrix}5\\7\\-3\end{pmatrix}.
\]
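Both examples can be checked with a few lines of NumPy (a sketch using the numbers above; it is not part of the original notes): $P_B$ maps $B$-coordinates to standard coordinates, and solving with $P_B$ maps back.

import numpy as np

P_B = np.array([[ 1,  2, 3],     # columns are f1, f2, f3
                [ 2, -1, 2],
                [-1,  4, 1]], dtype=float)

v_B = np.array([4, 1, -5], dtype=float)
print(P_B @ v_B)                 # [-9. -3. -5.], as in Example 25.1.1

w = np.array([5, 7, -3], dtype=float)
print(np.linalg.solve(P_B, w))   # [ 1. -1.  2.], as in Example 25.1.2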
25.2 The transition matrix $P_{B\to B'}$

More generally, suppose that we are given a basis $B$ of $\mathbb{R}^n$, another basis $B'$ of $\mathbb{R}^n$, and the coordinates $(v)_B$ of a vector $v \in \mathbb{R}^n$. Then, the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates, and hence the coordinates $(v)_{B'}$ of $v$, can be calculated as follows:

First, we change from $B$-coordinates to standard coordinates using $(v) = P_B(v)_B$ and then change from standard coordinates to $B'$-coordinates using $(v)_{B'} = P_{B'}^{-1}(v)$. The combined effect on the initial coordinate vector $(v)_B$ is
\[
(v)_{B'} = P_{B'}^{-1}P_B\,(v)_B,
\]
which implies that the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates is
\[
P_{B\to B'} = P_{B'}^{-1}P_B.
\]
An alternative method for calculating $P_{B\to B'}$ is provided by the following theorem. The theorem is stated without proof, but you are asked to derive it in Exercise 25.3.3.

Theorem 25.2.1 Let $B$ and $B'$ be two bases of $\mathbb{R}^n$, where the first basis is $B = \{v_1, v_2, \ldots, v_n\}$. Then the transition matrix $P_{B\to B'}$ from $B$-coordinates to $B'$-coordinates is given by
\[
P_{B\to B'} = \big((v_1)_{B'}\ (v_2)_{B'}\ \ldots\ (v_n)_{B'}\big),
\]
where the columns of the matrix $P_{B\to B'}$ consist of the coordinates of the $B$-basis vectors with respect to the $B'$ basis.

Note that in the particular case when $B'$ is the standard basis for $\mathbb{R}^n$, both ways of deriving $P_{B\to B'}$ result in the transition matrix $P_B$ from $B$-coordinates to standard coordinates, as expected. In the former case, we get
\[
P_{B\to B'} = P_{B'}^{-1}P_B = I^{-1}P_B = IP_B = P_B,
\]
and in the latter case, we get
\[
P_{B\to B'} = \big((v_1)_{B'}\ (v_2)_{B'}\ \ldots\ (v_n)_{B'}\big) = \big((v_1)\ (v_2)\ \ldots\ (v_n)\big) = P_B.
\]
Also note that the second method for calculating $P_{B\to B'}$ is directly applicable to any finite-dimensional vector space $V$, where the concept of a 'standard' basis may not be available. In contrast, the first method - that is, using $P_{B\to B'} = P_{B'}^{-1}P_B$ - presupposes the existence of a standard basis for $V$, so it is not applicable unless we nominate a basis for $V$ to play the role of the 'standard' basis. The following example illustrates this point.
Example 25.2.2 Consider the set
\[
V = \{f : \mathbb{R} \to \mathbb{R} \mid f(x) = a + bx \text{ for some } a, b \in \mathbb{R}\},
\]
which is a vector space under the standard operations of pointwise addition and scalar multiplication of functions. The following sets $B$, $C$ and $D$ are all bases for $V$:
\[
B = \{f_1, f_2\} \quad \text{where } f_1(x) = 1,\ f_2(x) = x,
\]
\[
C = \{g_1, g_2\} \quad \text{where } g_1(x) = 2,\ g_2(x) = 1 + x,
\]
\[
D = \{h_1, h_2\} \quad \text{where } h_1(x) = 2 + x,\ h_2(x) = 1 + 2x.
\]
Now consider an element $f \in V$ whose coordinate vector with respect to the $C$ basis is $(f)_C = \begin{pmatrix}3\\5\end{pmatrix}_C$. Find $(f)_D$.
Let us first note that we can solve this question from first principles, without using any transition matrices: The statement $(f)_C = \begin{pmatrix}3\\5\end{pmatrix}_C$ implies that $f = 3g_1 + 5g_2$, which amounts to the statement that for all $x \in \mathbb{R}$,
\[
f(x) = 3g_1(x) + 5g_2(x) = 3(2) + 5(1 + x) = 11 + 5x.
\]
In order to find the coordinates $(f)_D$ of $f$ we just need to express $f$ as a linear combination of the $D$-basis vectors. In other words, we need to find scalars $\alpha_1, \alpha_2$ such that for all $x \in \mathbb{R}$,
\[
11 + 5x = \alpha_1(2 + x) + \alpha_2(1 + 2x) = (2\alpha_1 + \alpha_2) + x(\alpha_1 + 2\alpha_2).
\]
Since this equation must hold for all $x \in \mathbb{R}$, it must be satisfied identically in $x$, which implies that
\[
2\alpha_1 + \alpha_2 = 11, \qquad \alpha_1 + 2\alpha_2 = 5.
\]
Solving this simultaneous system, we find $\alpha_1 = \tfrac{17}{3}$ and $\alpha_2 = -\tfrac{1}{3}$, which gives
\[
(f)_D = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D.
\]
The problem with this approach is that it is not systematic. If we were given another element of $V$ and asked the same question, we would need to start over.
A systematic approach amounts to obtaining $(f)_D$ by using the transition matrix $P_{C\to D}$ from $C$-coordinates to $D$-coordinates. Of course, here, we realise that there is no 'standard' basis for $V$ unless we nominate one of the bases $B$, $C$ or $D$ to play that role. So, let us start with the second method for calculating $P_{C\to D}$, since this method does not presuppose the presence of a standard basis for $V$.

Using the result that $P_{C\to D} = \big((g_1)_D\ (g_2)_D\big)$, all that we need to do is to express the $C$-basis vectors $g_1, g_2$ as linear combinations of the $D$-basis vectors $h_1, h_2$. Starting from $g_1$, let $g_1 = a_1 h_1 + a_2 h_2$. Then, for all $x \in \mathbb{R}$, we need to satisfy
\[
g_1(x) = a_1 h_1(x) + a_2 h_2(x),
\]
i.e.,
\[
2 = a_1(2 + x) + a_2(1 + 2x).
\]
Since this equation must hold identically in $x$, we obtain
\[
2 = 2a_1 + a_2, \qquad 0 = a_1 + 2a_2,
\]
whose solution is $a_1 = \tfrac{4}{3}$, $a_2 = -\tfrac{2}{3}$. Hence $(g_1)_D = \begin{pmatrix}4/3\\ -2/3\end{pmatrix}_D$.

Similarly, for $g_2$, let $g_2 = b_1 h_1 + b_2 h_2$. Then, for all $x \in \mathbb{R}$, we need to satisfy
\[
g_2(x) = b_1 h_1(x) + b_2 h_2(x),
\]
i.e.
\[
1 + x = b_1(2 + x) + b_2(1 + 2x).
\]
By a similar argument as above, we obtain the simultaneous system
\[
1 = 2b_1 + b_2, \qquad 1 = b_1 + 2b_2,
\]
whose solution is $b_1 = \tfrac{1}{3}$, $b_2 = \tfrac{1}{3}$. Hence $(g_2)_D = \begin{pmatrix}1/3\\ 1/3\end{pmatrix}_D$.
Hence
\[
P_{C\to D} = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix},
\]
and
\[
(f)_D = P_{C\to D}(f)_C = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix}\begin{pmatrix}3\\5\end{pmatrix}_C = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D.
\]
We have thus recovered our previous answer.
For completeness, let us calculate $P_{C\to D}$ by the first method. As already discussed, this presupposes that we nominate, say, $B$ as the 'standard' basis for $V$. We then need to find transition matrices $P_C$ from $C$-coordinates to 'standard' coordinates and $P_D$ from $D$-coordinates to 'standard' coordinates, and finally apply the formula $P_{C\to D} = P_D^{-1}P_C$.

Nominating $B$ as the 'standard' basis and expressing everything in $B$-coordinates, we have, by inspection:
\[
B = \left\{\begin{pmatrix}1\\0\end{pmatrix}_B, \begin{pmatrix}0\\1\end{pmatrix}_B\right\}, \qquad
C = \left\{\begin{pmatrix}2\\0\end{pmatrix}_B, \begin{pmatrix}1\\1\end{pmatrix}_B\right\}, \qquad
D = \left\{\begin{pmatrix}2\\1\end{pmatrix}_B, \begin{pmatrix}1\\2\end{pmatrix}_B\right\}.
\]
We can even drop the subscript $B$ since we have decided to treat $B$ as the 'standard basis':
\[
B = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}, \qquad
C = \left\{\begin{pmatrix}2\\0\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}, \qquad
D = \left\{\begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix}\right\}.
\]
Exactly as for Euclidean spaces, the above expressions imply that
\[
P_C = \begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} \qquad \text{and} \qquad P_D = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}.
\]
Hence,
\[
P_D^{-1} = \frac{1}{3}\begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}
\]
and
\[
P_{C\to D} = P_D^{-1}P_C = \frac{1}{3}\begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}\begin{pmatrix}2 & 1\\ 0 & 1\end{pmatrix} = \begin{pmatrix}4/3 & 1/3\\ -2/3 & 1/3\end{pmatrix}.
\]
The conclusion that $(f)_D = P_{C\to D}(f)_C = P_{C\to D}\begin{pmatrix}3\\5\end{pmatrix}_C = \begin{pmatrix}17/3\\ -1/3\end{pmatrix}_D$ follows once more.
25.3 Exercises for self study
Exercise 25.3.1 (a) Show that the following sets $B$ and $C$ are bases for $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\3\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}
\]
and
\[
C = \{g_1, g_2, g_3\} = \left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\-1\end{pmatrix}\right\}.
\]
(b) Given $(v) = \begin{pmatrix}3\\-1\\2\end{pmatrix}$, find $(v)_B$.
(c) Given $(w)_C = \begin{pmatrix}2\\1\\3\end{pmatrix}_C$, find $(w)$ and $(w)_B$.
Exercise 25.3.2 Consider the basis $B$ for $\mathbb{R}^3$ given in Exercise 25.3.1.
(a) Write down each $B$-basis vector $f_i$ as a linear combination of the standard basis vectors $e_1, e_2, e_3$ of $\mathbb{R}^3$.
(b) Using any method of your choice, invert the system in part (a); that is, express each standard basis vector $e_i$ as a linear combination of the $B$-basis vectors $f_1, f_2, f_3$.
(c) Given the vector $(v) = \begin{pmatrix}3\\-1\\2\end{pmatrix}$, use your answer to part (b) to express $v$ as a linear combination of the $B$-basis vectors. Verify that your answer agrees with Exercise 25.3.1, part (b).
Exercise 25.3.3 Consider two arbitrary bases $B$ and $C$ for $\mathbb{R}^3$:
\[
B = \{f_1, f_2, f_3\} \qquad \text{and} \qquad C = \{g_1, g_2, g_3\}.
\]
The transition matrix $P_{B\to C}$ from $B$-coordinates to $C$-coordinates is defined by the property that
\[
\forall v \in \mathbb{R}^3, \quad P_{B\to C}(v)_B = (v)_C,
\]
where $(v)_B$ and $(v)_C$ are the coordinates of $v \in \mathbb{R}^3$ with respect to the $B$ and $C$ bases, respectively. By choosing $v = f_1$, $v = f_2$ and $v = f_3$ in the above relation, prove the result given in the lectures that $P_{B\to C} = \big((f_1)_C\ (f_2)_C\ (f_3)_C\big)$.
Exercise 25.3.4 Consider an anticlockwise rotation $T : \mathbb{R}^2 \to \mathbb{R}^2$ by an angle $\theta = -\frac{\pi}{6}$.
(a) Write down the matrix $A_T$ of the linear transformation which accomplishes this rotation.
(b) Write down the images $T(e_1)$ and $T(e_2)$ of the standard basis vectors $e_1, e_2$.
Now consider the basis $B$ of $\mathbb{R}^2$ given by $B = \{f_1, f_2\}$ where $f_1 = T(e_1)$, $f_2 = T(e_2)$.
(c) Write down the transition matrix $P_B$ from $B$-coordinates to standard coordinates and verify that, numerically, $P_B = A_T$.
(d) Given any vector $x \in \mathbb{R}^2$, let its standard coordinates be denoted by $(x) = \begin{pmatrix}x\\y\end{pmatrix}$ and its $B$-coordinates be denoted by $(x)_B = \begin{pmatrix}X\\Y\end{pmatrix}_B$. Now, a curve $\mathcal{C} \subset \mathbb{R}^2$ is described in standard coordinates $(x, y)$ by the Cartesian equation
\[
3x^2 + 2\sqrt{3}\,xy + 5y^2 = 6.
\]
Find the Cartesian equation of this curve in the new $B$-coordinates $(X, Y)$.
(e) Hence sketch this curve.
25.4
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 7.3 of our Algebra Textbook is relevant.
26 Linear transformations, 4 of 6

26.1 Change of basis and linear transformations
Given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, we have seen that there is a corresponding matrix $A_T$, namely
\[
A_T = \big(T(e_1)\ T(e_2)\ \ldots\ T(e_n)\big),
\]
such that $T(x) = A_Tx$ for all $x \in \mathbb{R}^n$. More generally, given a basis
\[
B = \{f_1, f_2, \ldots, f_n\}
\]
for the domain $\mathbb{R}^n$ of $T$ and a basis
\[
C = \{g_1, g_2, \ldots, g_m\}
\]
for the codomain $\mathbb{R}^m$ of $T$, we have also seen that there is a matrix $A_T^{B\to C}$, namely
\[
A_T^{B\to C} = \Big(\big(T(f_1)\big)_C\ \big(T(f_2)\big)_C\ \ldots\ \big(T(f_n)\big)_C\Big),
\]
such that $\big(T(x)\big)_C = A_T^{B\to C}(x)_B$ for all $x \in \mathbb{R}^n$.

As expected, there is a relation between the matrices $A_T^{B\to C}$ and $A_T$ which involves the transition matrices $P_B$ and $P_C$ that accomplish the corresponding coordinate changes in $\mathbb{R}^n$ and $\mathbb{R}^m$. Starting from the fact that, $\forall x \in \mathbb{R}^n$,
\[
T(x) = A_T(x)
\]
and using the relationships $T(x) = P_C\big(T(x)\big)_C$ and $(x) = P_B(x)_B$, we find that, $\forall x \in \mathbb{R}^n$,
\[
P_C\big(T(x)\big)_C = A_TP_B(x)_B.
\]
Multiplying this equation by $P_C^{-1}$ on the left yields
\[
\big(T(x)\big)_C = P_C^{-1}A_TP_B(x)_B.
\]
Hence, since we also have that $\big(T(x)\big)_C = A_T^{B\to C}(x)_B$, we obtain the following relationship between the matrices $A_T$ and $A_T^{B\to C}$:
\[
A_T^{B\to C} = P_C^{-1}A_TP_B.
\]
Example 26.1.1 Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ given by
\[
T\!\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}x + y + z\\ 2x - z\end{pmatrix}.
\]
Let $B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ be a basis for the domain of $T$ and $C = \{g_1, g_2\} = \left\{\begin{pmatrix}1\\2\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix}\right\}$ be a basis for the codomain of $T$. Calculate the matrices $A_T$, $P_B$, $P_C$ and $A_T^{B\to C}$.
We have
\[
T(e_1) = T\!\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad
T(e_2) = T\!\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad
T(e_3) = T\!\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
so
\[
A_T = \big(T(e_1)\ T(e_2)\ T(e_3)\big) = \begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}.
\]
We also have
\[
P_B = \begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{pmatrix} \qquad \text{and} \qquad P_C = \begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix},
\]
hence
\[
A_T^{B\to C} = P_C^{-1}A_TP_B = -\frac{1}{3}\begin{pmatrix}1 & -2\\ -2 & 1\end{pmatrix}\begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\end{pmatrix} = \begin{pmatrix}-1/3 & -4/3 & -1\\ 5/3 & 5/3 & 1\end{pmatrix}.
\]
Example 26.1.2 Use the information from Example 26.1.1 to verify that $A_T^{B\to C}$ and $A_T$ represent the same transformation $T$.

To show the equivalence of these two representations of $T$, let us compare the effect of $A_T^{B\to C}$ on the basis $B$ of $\mathbb{R}^n$ with the effect of $A_T$ on the standard basis of $\mathbb{R}^n$.

Starting from $A_T^{B\to C}$, this matrix tells us that
\[
A_T^{B\to C}\begin{pmatrix}1\\0\\0\end{pmatrix}_B = \begin{pmatrix}-1/3\\ 5/3\end{pmatrix}_C, \qquad
A_T^{B\to C}\begin{pmatrix}0\\1\\0\end{pmatrix}_B = \begin{pmatrix}-4/3\\ 5/3\end{pmatrix}_C, \qquad
A_T^{B\to C}\begin{pmatrix}0\\0\\1\end{pmatrix}_B = \begin{pmatrix}-1\\ 1\end{pmatrix}_C,
\]
which amounts to the relations
\[
T(f_1) = -\tfrac{1}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_2) = -\tfrac{4}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_3) = -g_1 + g_2.
\]
Alternatively, one can read these relations directly from the matrix
\[
A_T^{B\to C} = \Big(\big(T(f_1)\big)_C\ \big(T(f_2)\big)_C\ \big(T(f_3)\big)_C\Big) = \begin{pmatrix}-1/3 & -4/3 & -1\\ 5/3 & 5/3 & 1\end{pmatrix}.
\]
Similarly, the matrix $A_T = \begin{pmatrix}1 & 1 & 1\\ 2 & 0 & -1\end{pmatrix}$ tells us that
\[
A_T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix}, \qquad
A_T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad
A_T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix},
\]
which amount to the relations
\[
T(e_1) = \bar{e}_1 + 2\bar{e}_2, \qquad T(e_2) = \bar{e}_1, \qquad T(e_3) = \bar{e}_1 - \bar{e}_2.
\]
The bars are used over the standard-basis vectors $\bar{e}_1, \bar{e}_2$ of the codomain $\mathbb{R}^2$ in order to distinguish these vectors from the standard basis vectors $e_1, e_2$ and $e_3$ of the domain $\mathbb{R}^3$. Now, using the linearity of $T$ and the fact that $f_1 = e_1 + e_2 + e_3$, $f_2 = e_2 + e_3$ and $f_3 = e_3$, we deduce that
\[
T(f_1) = T(e_1) + T(e_2) + T(e_3) = (\bar{e}_1 + 2\bar{e}_2) + (\bar{e}_1) + (\bar{e}_1 - \bar{e}_2) = 3\bar{e}_1 + \bar{e}_2,
\]
\[
T(f_2) = T(e_2) + T(e_3) = (\bar{e}_1) + (\bar{e}_1 - \bar{e}_2) = 2\bar{e}_1 - \bar{e}_2,
\]
\[
T(f_3) = T(e_3) = \bar{e}_1 - \bar{e}_2.
\]
Let us now compare the above relations with the relations
\[
T(f_1) = -\tfrac{1}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_2) = -\tfrac{4}{3}g_1 + \tfrac{5}{3}g_2, \qquad
T(f_3) = -g_1 + g_2
\]
obtained by using the matrix $A_T^{B\to C}$. Since $g_1 = \bar{e}_1 + 2\bar{e}_2$ and $g_2 = 2\bar{e}_1 + \bar{e}_2$, we get
\[
T(f_1) = -\tfrac{1}{3}(\bar{e}_1 + 2\bar{e}_2) + \tfrac{5}{3}(2\bar{e}_1 + \bar{e}_2) = 3\bar{e}_1 + \bar{e}_2,
\]
\[
T(f_2) = -\tfrac{4}{3}(\bar{e}_1 + 2\bar{e}_2) + \tfrac{5}{3}(2\bar{e}_1 + \bar{e}_2) = 2\bar{e}_1 - \bar{e}_2,
\]
\[
T(f_3) = -(\bar{e}_1 + 2\bar{e}_2) + (2\bar{e}_1 + \bar{e}_2) = \bar{e}_1 - \bar{e}_2,
\]
which are precisely the relations obtained previously, derived from the matrix $A_T$. Hence, the matrices $A_T$ and $A_T^{B\to C}$ represent the same transformation $T$.
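The same equivalence can be confirmed numerically. The sketch below (not part of the original notes; it uses the matrices of Example 26.1.1) computes $A_T^{B\to C} = P_C^{-1}A_TP_B$ and checks one column against $T(f_1)$:

import numpy as np

A_T = np.array([[1, 1,  1],
                [2, 0, -1]], dtype=float)
P_B = np.array([[1, 0, 0],
                [1, 1, 0],
                [1, 1, 1]], dtype=float)          # columns: f1, f2, f3
P_C = np.array([[1, 2],
                [2, 1]], dtype=float)             # columns: g1, g2

A_BC = np.linalg.solve(P_C, A_T @ P_B)            # = P_C^{-1} A_T P_B
print(A_BC)    # [[-1/3 -4/3 -1], [5/3 5/3 1]] (printed as decimals)

# each column of A_BC holds the C-coordinates of T(f_i); check the first column:
f1 = P_B[:, 0]
assert np.allclose(P_C @ A_BC[:, 0], A_T @ f1)    # both sides equal T(f1) = (3, 1)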
26.2 Similarity

Of particular interest is the special case where the domain and the codomain of a linear transformation are the same Euclidean space, that is, $T : \mathbb{R}^n \to \mathbb{R}^n$, and the bases for the domain and the codomain coincide, that is, $B = C$. In this case, the general equation connecting $A_T$ and $A_T^{B\to C}$ reduces to
\[
A_T^{B\to B} = P_B^{-1}A_TP_B,
\]
and the matrices $A_T$ and $A_T^{B\to B}$ are called similar.

In general, a square matrix $N$ is called similar to another square matrix $M$ if there exists an invertible square matrix $P$ such that
\[
N = P^{-1}MP.
\]
Note that we also have $PNP^{-1} = M$, which means that there exists an invertible matrix $Q$, namely $Q = P^{-1}$, such that
\[
M = Q^{-1}NQ.
\]
Hence, $M$ is similar to $N$ as well. Similar matrices, such as $A_T$ and $A_T^{B\to B}$, represent the same linear transformation $T$ in different bases.
More generally, for any bases $B$ and $C$ for $\mathbb{R}^n$, we have
\[
A_T^{B\to B} = P_B^{-1}A_TP_B \qquad \text{and} \qquad A_T^{C\to C} = P_C^{-1}A_TP_C.
\]
Solving the first equation for $A_T$ and substituting the resulting expression in the second equation, we get
\[
A_T^{C\to C} = P_C^{-1}P_BA_T^{B\to B}P_B^{-1}P_C.
\]
Now, recall from subsection 25.2 that
\[
P_{C\to B} = P_B^{-1}P_C \qquad \text{and} \qquad P_{C\to B}^{-1} = P_{B\to C} = P_C^{-1}P_B.
\]
Therefore, the above relation becomes
\[
A_T^{C\to C} = P_{C\to B}^{-1}A_T^{B\to B}P_{C\to B},
\]
which establishes the fact that the matrices $A_T^{B\to B}$ and $A_T^{C\to C}$ are similar. The first matrix represents $T$ with respect to the $B$ basis, and the second matrix represents $T$ with respect to the $C$ basis.
26.3 Diagonalisable linear transformations

Given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$, suppose that we are able to find a basis $B$ for $\mathbb{R}^n$ such that the matrix $A_T^{B\to B}$ representing $T$ is diagonal; that is,
\[
A_T^{B\to B} = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix},
\]
where $k_1, k_2, \ldots, k_n$ are some given constants. Working with the particular basis $B = \{f_1, f_2, \ldots, f_n\}$, it is very easy to understand the effect of the transformation $T$.
We have
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_B = \begin{pmatrix}k_1\\0\\\vdots\\0\end{pmatrix}_B = k_1\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}_B = k_1(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}_B = \begin{pmatrix}0\\k_2\\\vdots\\0\end{pmatrix}_B = k_2\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}_B = k_2(f_2)_B,
\]
and so on, up to
\[
\big(T(f_n)\big)_B = A_T^{B\to B}(f_n)_B = \begin{pmatrix}k_1 & 0 & \cdots & 0\\ 0 & k_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & k_n\end{pmatrix}\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}_B = \begin{pmatrix}0\\0\\\vdots\\k_n\end{pmatrix}_B = k_n\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}_B = k_n(f_n)_B,
\]
which implies that $T$ stretches each basis vector $f_i$ by a factor of $k_i$. In other words,
\[
T(f_1) = k_1f_1, \qquad T(f_2) = k_2f_2, \qquad \ldots, \qquad T(f_n) = k_nf_n.
\]
Some of the most important applications of linear algebra utilise properties of diagonal matrices. Such applications require finding a basis $B$ of $\mathbb{R}^n$ (or, generally, of a vector space $V$) with respect to which the matrix $A_T^{B\to B}$ representing a given linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ (or, generally, $T : V \to V$) is diagonal. We will see that we will not be able to achieve such simplicity with every given linear transformation $T$. However, whenever $T$ is diagonalisable in the above sense, the technique of finding such a suitable basis $B$ is known as diagonalisation. We will discuss the process of diagonalisation in detail in the next lecture. For the time being, let us consider a simple illustration of this process, which allows us to introduce the concepts of eigenvectors, eigenvalues and eigenspaces.
26.4 Eigenvalues, eigenvectors and eigenspaces

Consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
T\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}x + 3y\\ -x + 5y\end{pmatrix} = \begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.
\]
The effect of $T$ on the standard basis $\{e_1, e_2\}$ of $\mathbb{R}^2$, namely
\[
T(e_1) = T\Big(\begin{pmatrix}1\\0\end{pmatrix}\Big) = \begin{pmatrix}1\\-1\end{pmatrix} = e_1 - e_2, \qquad
T(e_2) = T\Big(\begin{pmatrix}0\\1\end{pmatrix}\Big) = \begin{pmatrix}3\\5\end{pmatrix} = 3e_1 + 5e_2,
\]
is sketched below:
Figure 26.4.1
Although we can see the effect of T on the standard basis, we cannot claim that we have
fully understood what T does geometrically.
Instead, as a working hypothesis, let us assume that there exists a basis $B = \{f_1, f_2\}$ of $\mathbb{R}^2$ such that $A_T^{B\to B}$ is a diagonal matrix; that is,
\[
A_T^{B\to B} = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}
\]
for some $k, l \in \mathbb{R}$. If this is the case, the effect of $T$ on the $B$-basis vectors $f_1$ and $f_2$ is very clear. We have
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}_B = \begin{pmatrix}k\\0\end{pmatrix}_B = k\begin{pmatrix}1\\0\end{pmatrix}_B = k(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}k & 0\\ 0 & l\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}_B = \begin{pmatrix}0\\l\end{pmatrix}_B = l\begin{pmatrix}0\\1\end{pmatrix}_B = l(f_2)_B,
\]
which implies the relations
\[
T(f_1) = kf_1 \qquad \text{and} \qquad T(f_2) = lf_2.
\]
Note that these are geometric (that is, coordinate-independent) relations and are therefore
valid with respect to any chosen basis - in particular, the standard basis: So, in standard
coordinates, we are looking for vectors f1 , f2 such that
AT f1 = kf1
and
AT f2 = lf2 .
Let us see if we can find such a basis $\{f_1, f_2\}$ for the given transformation $T$. We start by noting that the above requirements for $f_1, f_2$ can be expressed as a single requirement: we are looking for vectors $x \neq 0$ such that $A_Tx = \lambda x$ for some $\lambda \in \mathbb{R}$. The condition that $x \neq 0$ is needed because otherwise $x$ cannot be a basis vector.

The equation
\[
A_Tx = \lambda x, \qquad x \neq 0,
\]
is called an eigenvalue equation. The vector $x$, which is simply stretched by $A_T$ by a factor of $\lambda$, is called an eigenvector of $A_T$. The corresponding value of $\lambda$, which gives the amount of stretching, is called an eigenvalue of $A_T$.

The eigenvalue equation $A_Tx = \lambda x$, $x \neq 0$, seems to contain too many unknowns at this stage since neither $x$ nor $\lambda$ is known. However, the key thing to notice is that $x \neq 0$: Arranging the equation in the form of the homogeneous system
\[
(A_T - \lambda I)x = 0, \qquad x \neq 0,
\]
we immediately deduce that in order for a non-zero solution for $x$ to exist, the determinant of the square matrix $A_T - \lambda I$ must be zero:
\[
|A_T - \lambda I| = 0.
\]
If the determinant $|A_T - \lambda I|$ were not zero, the matrix $A_T - \lambda I$ would be invertible, which would imply that the only solution of the homogeneous system $(A_T - \lambda I)x = 0$ is the trivial solution $x = 0$, contrary to our requirement that $x \neq 0$.

The equation $|A_T - \lambda I| = 0$ is called the characteristic polynomial equation for $A_T$. The solutions of this equation are the eigenvalues $\lambda$ of $A_T$. Here, we have $A_T = \begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}$, so the characteristic polynomial equation for $A_T$ is
\[
\begin{vmatrix}1 - \lambda & 3\\ -1 & 5 - \lambda\end{vmatrix} = \lambda^2 - 6\lambda + 8 = (\lambda - 4)(\lambda - 2) = 0,
\]
which implies that $A_T$ has two distinct eigenvalues, $\lambda_1 = 2$ and $\lambda_2 = 4$.
Having found the two eigenvalues of $A_T$, we go back to the eigenvalue equation $(A_T - \lambda I)x = 0$, $x \neq 0$. For each eigenvalue $\lambda$, we now need to solve this equation for the corresponding eigenvector $x$. So, let us denote the eigenvectors corresponding to $\lambda_1$ and $\lambda_2$ by $v_1$ and $v_2$, respectively. Clearly, for each eigenvalue $\lambda_i$, the equation $(A_T - \lambda_iI)v_i = 0$, $v_i \neq 0$, amounts to the requirement that the non-zero vector $v_i$ belongs to the null space $N(A_T - \lambda_iI)$ of the matrix $A_T - \lambda_iI$.

Starting with $\lambda_1 = 2$, we have
\[
v_1 \in N(A_T - \lambda_1I) = N\Big(\begin{pmatrix}-1 & 3\\ -1 & 3\end{pmatrix}\Big), \qquad v_1 \neq 0,
\]
which tells us that the non-zero vector $v_1 \in \mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$. Similarly, with $\lambda_2 = 4$, we have
\[
v_2 \in N(A_T - \lambda_2I) = N\Big(\begin{pmatrix}-3 & 3\\ -1 & 1\end{pmatrix}\Big), \qquad v_2 \neq 0,
\]
which tells us that the non-zero vector $v_2 \in \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}$.

The subspace $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\} \subset \mathbb{R}^2$ is known as the eigenspace of the transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ associated with the eigenvalue 2. The subspace $\mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\} \subset \mathbb{R}^2$ is the eigenspace of $T$ associated with the eigenvalue 4.
The transformation $T$ stretches any vector belonging to the eigenspace $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$ (including the zero vector in a trivial sense) by a factor of $\lambda_1 = 2$ and stretches any vector belonging to the eigenspace $\mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}$ (including the zero vector in a trivial sense) by a factor of $\lambda_2 = 4$. Moreover, the vectors
\[
f_1 = \begin{pmatrix}3\\1\end{pmatrix} \in \mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\} \qquad \text{and} \qquad f_2 = \begin{pmatrix}1\\1\end{pmatrix} \in \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\}
\]
spanning these subspaces of $\mathbb{R}^2$ are linearly independent, so they form a basis for $\mathbb{R}^2$:
\[
B = \{f_1, f_2\} = \left\{\begin{pmatrix}3\\1\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}.
\]
A basis for $\mathbb{R}^2$ consisting of eigenvectors of $T$ (that is, a basis such as $B$) can be obtained by selecting any other basis vectors from the eigenspaces of $T$; for example, we could have selected $\begin{pmatrix}6\\2\end{pmatrix}$ from the first eigenspace and $\begin{pmatrix}-3\\-3\end{pmatrix}$ from the second eigenspace. All choices of eigenvectors are equally good for the applications we have in mind. The only vector that belongs to an eigenspace (in fact to both of them) but should never be selected as an eigenvector is the zero vector.
Based on our previous discussion, it should be clear that the matrix $A_T^{B\to B}$ representing $T$ with respect to the basis $B$ is diagonal, with its diagonal entries equal to the eigenvalues $\lambda_1, \lambda_2$:
\[
A_T^{B\to B} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}.
\]
Indeed, this matrix implies that
\[
\big(T(f_1)\big)_B = A_T^{B\to B}(f_1)_B = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}1\\0\end{pmatrix}_B = \begin{pmatrix}2\\0\end{pmatrix}_B = 2\begin{pmatrix}1\\0\end{pmatrix}_B = 2(f_1)_B,
\]
\[
\big(T(f_2)\big)_B = A_T^{B\to B}(f_2)_B = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix}_B = \begin{pmatrix}0\\4\end{pmatrix}_B = 4\begin{pmatrix}0\\1\end{pmatrix}_B = 4(f_2)_B,
\]
and these relations reproduce the geometric (i.e., coordinate-independent) relations
\[
T(f_1) = 2f_1 \qquad \text{and} \qquad T(f_2) = 4f_2,
\]
previously obtained using standard coordinates.

Alternatively, we can show that $A_T^{B\to B}$ is diagonal by using the transition matrix $P_B = (f_1\ f_2) = \begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix}$ from $B$-coordinates to standard coordinates and the similarity relation $A_T^{B\to B} = P_B^{-1}A_TP_B$. We have
\[
A_T^{B\to B} = P_B^{-1}A_TP_B = \frac{1}{2}\begin{pmatrix}1 & -1\\ -1 & 3\end{pmatrix}\begin{pmatrix}1 & 3\\ -1 & 5\end{pmatrix}\begin{pmatrix}3 & 1\\ 1 & 1\end{pmatrix} = \begin{pmatrix}2 & 0\\ 0 & 4\end{pmatrix}.
\]
The effect of $T$ on the basis $B$ of $\mathbb{R}^2$ (and hence, by the linearity of $T$, on any vector $v \in \mathbb{R}^2$) is depicted below:
Figure 26.4.2
Going back to Figure 26.4.1, we can now understand why the effect of $T$ on the standard basis vectors $e_1$ and $e_2$ looks so complicated. Expressing the standard basis vector $e_1 = \begin{pmatrix}1\\0\end{pmatrix}$ as a linear combination of the eigenvectors $f_1 = \begin{pmatrix}3\\1\end{pmatrix}$ and $f_2 = \begin{pmatrix}1\\1\end{pmatrix}$,
\[
e_1 = \tfrac{1}{2}f_1 - \tfrac{1}{2}f_2,
\]
and using the linearity of $T$ and the stretches $T(f_1) = 2f_1$ and $T(f_2) = 4f_2$, we find that
\[
T(e_1) = \tfrac{1}{2}T(f_1) - \tfrac{1}{2}T(f_2) = \tfrac{1}{2}(2f_1) - \tfrac{1}{2}(4f_2) = f_1 - 2f_2 = e_1 - e_2,
\]
which is not just a stretch of $e_1$. Similarly, expressing $e_2 = \begin{pmatrix}0\\1\end{pmatrix}$ as a linear combination of $f_1 = \begin{pmatrix}3\\1\end{pmatrix}$ and $f_2 = \begin{pmatrix}1\\1\end{pmatrix}$ according to
\[
e_2 = -\tfrac{1}{2}f_1 + \tfrac{3}{2}f_2,
\]
we see that
\[
T(e_2) = -\tfrac{1}{2}T(f_1) + \tfrac{3}{2}T(f_2) = -\tfrac{1}{2}(2f_1) + \tfrac{3}{2}(4f_2) = -f_1 + 6f_2 = 3e_1 + 5e_2,
\]
which is not just a stretch of $e_2$.
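The eigenvalue calculation for this example is easy to reproduce numerically. A minimal NumPy sketch (not part of the original notes; it uses the eigenvectors chosen above):

import numpy as np

A_T = np.array([[ 1, 3],
                [-1, 5]], dtype=float)

eigvals, eigvecs = np.linalg.eig(A_T)
print(eigvals)                            # [2. 4.] (possibly listed in a different order)

P_B = np.array([[3, 1],                   # columns: the eigenvectors f1, f2 chosen above
                [1, 1]], dtype=float)
D = np.linalg.solve(P_B, A_T @ P_B)       # = P_B^{-1} A_T P_B
assert np.allclose(D, np.diag([2.0, 4.0]))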
26.5 Exercises for self study

Exercise 26.5.1 Consider the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}3x - y\\ -x + 3y\end{pmatrix}.
\]
Also consider the standard basis $B$ for $\mathbb{R}^2$ given by
\[
B = \{e_1, e_2\} = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}.
\]
(a) Calculate the effect of $S$ on the standard basis vectors and sketch $e_1$, $e_2$, $S(e_1)$ and $S(e_2)$ on a single copy of $\mathbb{R}^2$.
(b) Write down the matrix $A_S$ representing $S$ with respect to the standard basis; that is, write down $A_S = A_S^{B\to B}$.
(c) Calculate the effect of $S$ on the vectors
\[
f_1 = \begin{pmatrix}1\\1\end{pmatrix} \qquad \text{and} \qquad f_2 = \begin{pmatrix}1\\-1\end{pmatrix}
\]
and add $f_1$, $f_2$, $S(f_1)$ and $S(f_2)$ to your sketch.
Now consider the basis $C$ for $\mathbb{R}^2$ given by
\[
C = \{f_1, f_2\} = \left\{\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\end{pmatrix}\right\}.
\]
(d) Using your answer to part (c), write down the matrix $A_S^{C\to C}$ representing $S$ with respect to the $C$ basis.
(e) Verify your answer for $A_S^{C\to C}$ by using the relation $A_S^{C\to C} = P_C^{-1}A_SP_C$, where $P_C$ is the transition matrix from $C$-coordinates to standard coordinates; i.e., $P_C = P_{C\to B}$ where $B$ is the standard basis.
Exercise 26.5.2 Building on Exercise 26.5.1, consider the matrix $A_S$ representing $S$ with respect to the standard basis.
(a) Write down the eigenvalue equation associated with $A_S$.
(b) Solve the characteristic polynomial equation $|A_S - \lambda I| = 0$ to find the eigenvalues of $A_S$.
(c) For each eigenvalue $\lambda$, find a basis for the corresponding eigenspace $N(A_S - \lambda I)$.
(d) Hence, find a basis $E = \{g_1, g_2\}$ for $\mathbb{R}^2$ such that the corresponding matrix $A_S^{E\to E}$ representing $S$ is diagonal.
(e) Verify that $A_S^{E\to E} = P_E^{-1}A_SP_E$ is diagonal by directly multiplying the matrices on the right hand side, where $P_E$ is the transition matrix from $E$-coordinates to standard coordinates.
Exercise 26.5.3 Consider the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
\[
S\Big(\begin{pmatrix}x\\y\end{pmatrix}\Big) = \begin{pmatrix}-7x + 9y\\ -6x + 8y\end{pmatrix}.
\]
Also consider the standard basis $B$ for $\mathbb{R}^2$ given by
\[
B = \{e_1, e_2\} = \left\{\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}\right\}.
\]
(a) Calculate the effect of $S$ on the standard basis vectors and sketch $e_1$, $e_2$, $S(e_1)$ and $S(e_2)$ on a single copy of $\mathbb{R}^2$.
(b) Write down the matrix $A_S$ representing $S$ with respect to the standard basis; that is, write down $A_S = A_S^{B\to B}$.
(c) Calculate the effect of $S$ on the basis $C = \{f_1, f_2\} = \left\{\begin{pmatrix}3\\2\end{pmatrix}, \begin{pmatrix}1\\1\end{pmatrix}\right\}$ and then find the matrix $A_S^{C\to C}$ representing $S$ with respect to the $C$ basis using the relation
\[
A_S^{C\to C} = \Big(\big(S(f_1)\big)_C\ \big(S(f_2)\big)_C\Big).
\]
(d) Solve the characteristic polynomial equation $|A_S - \lambda I| = 0$ and find the eigenspace $N(A_S - \lambda I)$ corresponding to each solution $\lambda$ of this equation. Verify that your findings agree with your answer to part (c).
Exercise 26.5.4 Consider the matrix $A = \begin{pmatrix}1 & 4\\ 3 & 2\end{pmatrix}$.
(a) Find the eigenvalues of $A$ and the corresponding eigenvectors.
(b) Hence, find an invertible matrix $P$ such that $P^{-1}AP = D$ where $D$ is a diagonal matrix.
26.6
Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 8.1 of our Algebra Textbook is relevant.
27 Linear transformations, 5 of 6

27.1 Diagonalisation
In the last lecture we saw examples of linear transformations $T : \mathbb{R}^2 \to \mathbb{R}^2$ where a basis $B$ of $\mathbb{R}^2$ can be found such that the matrix $A_T^{B\to B}$ representing $T$ is diagonal. We will now present a number of theorems which tell us exactly when a given linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is 'diagonalisable' in the above sense.

The central problem that we are dealing with is the solution of the eigenvalue equation
\[
Ax = \lambda x, \qquad x \neq 0,
\]
where the $n \times n$ matrix $A$ can be regarded as representing a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ with respect to the standard basis $\{e_1, e_2, \ldots, e_n\}$ of $\mathbb{R}^n$. The subscript $T$ has been omitted from the matrix $A$ for simplicity, but it is important to remember the geometric context to which the matrix $A$ refers; i.e., the fact that $T(x) = Ax$.

Let us first review the material presented in the last lecture: We showed there that the eigenvalue equation $Ax = \lambda x$, $x \neq 0$, cannot have non-zero solutions for $x$ unless the square matrix $A - \lambda I$ fails to be invertible. The resulting characteristic equation,
\[
|A - \lambda I| = 0,
\]
involves a polynomial of degree $n$ on its left hand side and yields the eigenvalues of the matrix $A$. For each solution $\lambda_0$ of this characteristic equation, the eigenspace corresponding to $\lambda_0$ is the null space of the matrix $A - \lambda_0 I$; that is, the subspace $N(A - \lambda_0 I)$ of $\mathbb{R}^n$. Any vector $x \in N(A - \lambda_0 I)$ other than the zero vector $0 \in \mathbb{R}^n$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_0$.
In the special case where the characteristic equation $|A - \lambda I| = 0$ for the $n \times n$ matrix $A$ yields $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, the following theorem, stated without proof, guarantees that the matrix $A$ is diagonalisable. In other words, it guarantees the existence of an invertible matrix $P$ and a corresponding diagonal matrix $D$ such that
\[
P^{-1}AP = D; \qquad \text{i.e.} \qquad A = PDP^{-1}.
\]
This relation implies that $A$ is similar to a diagonal matrix, which is precisely what the term 'diagonalisable' means. The relevant theorem is given below:

Theorem 27.1.1 Eigenvectors corresponding to distinct eigenvalues are linearly independent.
In order to appreciate the relevance of this theorem, observe that in all the examples of linear transformations $T : \mathbb{R}^2 \to \mathbb{R}^2$ encountered in the last lecture, the two eigenvalues of the $2 \times 2$ matrix $A_T$ are distinct, and indeed, the corresponding eigenvectors are linearly independent. It is for this reason that the matrix $P_B = (v_1\ v_2)$ constructed from these eigenvectors is invertible and can be regarded as the transition matrix from $B$-coordinates to standard coordinates. The resulting matrix $A_T^{B\to B}$ representing $T$ with respect to the $B$-basis is diagonal, since the relations
\[
\lambda_1\begin{pmatrix}1\\0\end{pmatrix}_B = \lambda_1(v_1)_B = \big(T(v_1)\big)_B = A_T^{B\to B}(v_1)_B = A_T^{B\to B}\begin{pmatrix}1\\0\end{pmatrix}_B
\]
and
\[
\lambda_2\begin{pmatrix}0\\1\end{pmatrix}_B = \lambda_2(v_2)_B = \big(T(v_2)\big)_B = A_T^{B\to B}(v_2)_B = A_T^{B\to B}\begin{pmatrix}0\\1\end{pmatrix}_B
\]
imply that
\[
A_T^{B\to B} = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix},
\]
and the fact that $A_T$ can be expressed in the form
\[
A_T = P_BA_T^{B\to B}P_B^{-1}
\]
implies that $A_T$ is diagonalisable; i.e., similar to the diagonal matrix $A_T^{B\to B}$.
The exact same methodology can be extended to any square $n \times n$ matrix $A$, provided that the characteristic equation $|A - \lambda I| = 0$ produces $n$ distinct real eigenvalues; i.e., provided that the characteristic polynomial can be written in the form
\[
|A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda),
\]
where all the eigenvalues in the set $\{\lambda_i\}$ are real and distinct.

If this is the case, Theorem 27.1.1 guarantees that the corresponding eigenvectors $v_1, \ldots, v_n$ are linearly independent and hence form a basis $B = \{v_1, \ldots, v_n\}$ of $\mathbb{R}^n$. In particular, each eigenspace $N(A - \lambda_iI)$ is a one-dimensional subspace of $\mathbb{R}^n$, since
\[
N(A - \lambda_iI) = \mathrm{Lin}\{v_i\}.
\]
When constructing the basis $B$, it does not matter which particular element $v_i$ of each eigenspace we take, as long as each $v_i \neq 0$. The ordering of the eigenvectors in the basis $B$ does not matter either: different orderings lead to different invertible transition matrices $P_B$, and hence different pairs $(P_B, A_T^{B\to B})$ of invertible and diagonal matrices, but in each case, $A_T$ is similar to a diagonal matrix $A_T^{B\to B}$; i.e.,
\[
A_T = P_BA_T^{B\to B}P_B^{-1}.
\]
More generally, even if A does not have distinct eigenvalues, the existence of a basis B for
Rn consisting of eigenvectors of A is enough to guarantee that A is diagonalisable. The
relevant theorem, which can be regarded as the main theorem of diagonalisation, is stated
below in two different, equivalent, ways, linked by the fact that n linearly independent
vectors in Rn form a basis B for Rn .
Main Theorem 27.1.2 (a) An $n \times n$ matrix $A$ is diagonalisable if and only if it has $n$
linearly independent eigenvectors.
Main Theorem 27.1.2 (b) An n × n matrix A is diagonalisable if and only if there
exists a basis B of Rn consisting of eigenvectors of A.
The main theorem brings the following question: Exactly when does an n × n matrix A
have n linearly independent eigenvectors, so that these can form a basis B of Rn ?
In order to motivate the answer - given by Theorem 27.2.2 at the end of the next subsection
- let us examine two typical examples of 2 × 2 matrices that do not produce bases of
eigenvectors for R2 and, then, a third example of a 2 × 2 matrix that does produce such a
basis.
Example 27.1.3 Consider the rotation matrix $A_T = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$ where $\theta = \frac{\pi}{2}$; that is, consider a rotation $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $\frac{\pi}{2}$ anticlockwise. Then
\[
A_T = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}.
\]
Since $T$ only rotates vectors in $\mathbb{R}^2$, it preserves the length $\|v\|$ of each vector $v$; hence, $T$ does not stretch any vector. Accordingly, we expect $A_T$ to have no real eigenvalues. Indeed, the characteristic equation $|A_T - \lambda I| = 0$ is a quadratic equation of negative discriminant,
\[
|A_T - \lambda I| = \begin{vmatrix}-\lambda & -1\\ 1 & -\lambda\end{vmatrix} = \lambda^2 + 1 = 0,
\]
which confirms that $A_T$ is not diagonalisable.

It is worth mentioning here that $A_T$ is diagonalisable if we introduce complex eigenvalues, but this means that we are talking about a complex vector space. We will not cover such vector spaces in our course.
Example 27.1.4 Consider the matrix
\[
A_T = \begin{pmatrix}2 & 1\\ 0 & 2\end{pmatrix},
\]
whose effect on the standard basis of $\mathbb{R}^2$,
\[
T(e_1) = A_T\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix} = 2e_1, \qquad
T(e_2) = A_T\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix} = e_1 + 2e_2,
\]
is depicted below:
Clearly, the one-dimensional subspace Lin{e1 } is an eigenspace of the eigenvalue λ = 2
since any vector v ∈ Lin {e1 } (i.e., any vector of the form v = ke1 for some k ∈ R) is
stretched by a factor of 2.
Hence, with v1 = e1 being an eigenvector of T corresponding to λ1 = 2, is there another
eigenvector v2 of T which is linearly independent from v1 = e1 and hence results in a basis
B = {v1 , v2 } for R2 ? If the answer is yes, then A is diagonalisable by the Main Theorem
27.1.2.
Let us check: The characteristic equation $|A_T - \lambda I| = 0$ gives
\[
|A_T - \lambda I| = \begin{vmatrix}2 - \lambda & 1\\ 0 & 2 - \lambda\end{vmatrix} = (2 - \lambda)^2 = 0,
\]
so $\lambda = 2$ is a repeated eigenvalue. Since the two eigenvalues of $A$ are not distinct, Theorem 27.1.1 cannot guarantee that $A$ is diagonalisable; however, it cannot exclude that possibility either: We need to check further, by calculating the corresponding eigenspace $N(A_T - 2I)$. We have
\[
N(A_T - 2I) = N\Big(\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\Big),
\]
which implies that there is only one free parameter in the general solution of the system $\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$, and hence $N(A_T - 2I) = \mathrm{Lin}\left\{\begin{pmatrix}1\\0\end{pmatrix}\right\} = \mathrm{Lin}\{e_1\}$ is a one-dimensional subspace of $\mathbb{R}^2$. Obviously, we cannot take our 'second eigenvector' $v_2$ to belong to this subspace, since any such $v_2$ will be a scalar multiple of $v_1 = e_1$ and hence the set $\{v_1, v_2\}$ will not be a basis for $\mathbb{R}^2$. In particular, the matrix $(v_1\ v_2)$ will not be invertible. We conclude that the matrix $A_T$ is not diagonalisable.
The fact that a repeated eigenvalue arose in the last example was not the reason that
diagonalisation failed. The following example is rather trivial but it does show that A may
be diagonalisable even if it has a repeated eigenvalue.
Example 27.1.5 Consider the matrix representing a stretch in the $x$-direction by a factor of 3 and a stretch in the $y$-direction by the same factor:
\[
A_T = \big(T(e_1)\ T(e_2)\big) = \begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}.
\]
The matrix $A_T$ is diagonal so it is also diagonalisable in a trivial sense, since it is similar to itself: $A_T = I^{-1}A_TI$. Let us focus on the structure of its eigenspaces. The characteristic equation
\[
|A_T - \lambda I| = \begin{vmatrix}3 - \lambda & 0\\ 0 & 3 - \lambda\end{vmatrix} = (3 - \lambda)^2 = 0
\]
yields a repeated eigenvalue $\lambda_1 = \lambda_2 = 3$, and hence there is a single eigenspace, namely
\[
N(A_T - 3I) = N\Big(\begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix}\Big) = \mathbb{R}^2,
\]
which is two-dimensional. Therefore, every vector in $\mathbb{R}^2$ is stretched by a factor of 3, and hence any basis $B$ of $\mathbb{R}^2$ is a basis of eigenvectors of this matrix. For example, we can take $B = \{e_1, e_2\}$ and $P_B = I = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}$, or we can take any other basis $\{v_1, v_2\}$ and $P_B = (v_1\ v_2)$. In all cases, the corresponding matrix $A_T^{B\to B}$ is diagonal and equal to $A_T$, since
\[
A_T^{B\to B} = P_B^{-1}A_TP_B = P_B^{-1}\begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}P_B = P_B^{-1}(3I)P_B = 3P_B^{-1}IP_B = 3I = \begin{pmatrix}3 & 0\\ 0 & 3\end{pmatrix}.
\]
27.2 Algebraic and geometric multiplicity
In order to understand the three previous examples and describe with precision what
makes an n × n matrix A diagonalisable, we need to introduce the concepts of algebraic
and geometric multiplicity of eigenvalues.
A real eigenvalue λ0 of an n × n matrix A is said to have algebraic multiplicity k if
k is the largest integer such that (λ − λ0 )k is a factor of the characteristic polynomial
|A − λI|. The geometric multiplicity of an eigenvalue λ0 of A is the dimension of the
corresponding eigenspace N (A − λ0 I). The characteristic polynomial of Example 27.1.3
produced no real eigenvalues, and those encountered in Examples 27.1.4 and 27.1.5 both
produced a real eigenvalue of algebraic multiplicity 2. In the former case, the geometric
multiplicity of that eigenvalue was 1, and in the latter case, the geometric multiplicity was
2.
Note that the geometric multiplicity of any real eigenvalue λ0 of A is at least one. This is
because of the fact that |A − λ0 I| = 0 implies that the n × n matrix A − λ0 I does not have
full column rank, which in turn implies that the eigenvalue equation (A − λ0 I)x = 0 has
at least one free parameter in its general solution. Moreover, we also have an upper bound
on the geometric multiplicity of an eigenvalue λ0 of A. Stating the relevant result without
proof, we have that for any eigenvalue λ0 of an n × n matrix A, the geometric multiplicity
cannot exceed the algebraic multiplicity of λ0 . Note that, due to these bounds, if A has
n distinct real eigenvalues then each one of them has algebraic and geometric multiplicity
equal to 1.
Going back to Examples 27.1.3 - 27.1.5, also note that the only matrix that was diagonalisable had a repeated eigenvalue whose algebraic and geometric multiplicities were equal.
In the other cases, either AT had at least one non-real eigenvalue or AT had at least
one eigenvalue whose geometric multiplicity was not equal to the corresponding algebraic
multiplicity.
These results are captured by the following theorem, stated without proof, which gives
us precise conditions in order for an n × n matrix A to be diagonalisable. To properly
understand what this theorem states, it is important to keep in mind the Fundamental Theorem of Algebra; namely, the result that a polynomial equation of degree $n$ has exactly $n$, generally complex, roots. (Recall that the real numbers are a subset of the complex numbers, so any real number is also a complex number.)
Theorem 27.2.2 A matrix A is diagonalisable if and only if its characteristic polynomial
equation yields only real eigenvalues and the geometric and algebraic multiplicities of each
such eigenvalue are equal.
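A small sketch (not part of the original notes) that makes the two multiplicities concrete for the matrix of Example 27.1.4: the algebraic multiplicity is read off from the characteristic polynomial, while the geometric multiplicity is the dimension of the eigenspace, i.e. $n$ minus the rank of $A - \lambda_0 I$.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
lam = 2.0
n = A.shape[0]

# coefficients of the characteristic polynomial of A
char_poly = np.poly(A)            # [ 1. -4.  4.]  ->  (t - 2)^2, so algebraic multiplicity 2

geom_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))
print(geom_mult)                  # 1 < 2, so A is not diagonalisable (cf. Theorem 27.2.2)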
This theorem is illustrated in the Exercises below.
27.3 Exercises for self study

Exercise 27.3.1 Consider the matrix $A = \begin{pmatrix}2 & -1 & 2\\ 0 & 1 & 2\\ 0 & 0 & 3\end{pmatrix}$.
(a) Find the eigenvalues and the corresponding eigenvectors of $A$.
(b) Hence, find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
Exercise 27.3.2 (a) If possible, diagonalise $A = \begin{pmatrix}2 & 1\\ -4 & 6\end{pmatrix}$; i.e., find an invertible matrix $P$ and a diagonal matrix $D$ such that $A = PDP^{-1}$.
(b) Diagonalise $A = \begin{pmatrix}1 & 4\\ 3 & 2\end{pmatrix}$ and hence calculate $A^{10}$.
Exercise 27.3.3 (a) Give the definition of a square matrix $A$ being diagonalisable.
(b) Define the algebraic and geometric multiplicities of an eigenvalue $\lambda$ of $A$.
A linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ is represented by the matrix
\[
A_T^{B\to B} = \begin{pmatrix}4 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & 3\end{pmatrix}
\]
with respect to the basis $B = \{f_1, f_2, f_3\} = \left\{\begin{pmatrix}1\\2\\0\end{pmatrix}, \begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}3\\1\\0\end{pmatrix}\right\}$ for $\mathbb{R}^3$.
(c) Express each of the image vectors $T(f_1)$, $T(f_2)$, $T(f_3)$ as a linear combination of the $B$-basis vectors.
(d) Find the matrix $A_T$ representing $T$ with respect to the standard basis $\{e_1, e_2, e_3\} = \left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ of $\mathbb{R}^3$.
(e) Obtain a Cartesian description for the eigenspace associated with the largest eigenvalue
of AT .
Exercise 27.3.4 (a) Consider the matrices
\[
A = \begin{pmatrix}1 & 1 & 1\\ 0 & 1 & -1\\ 1 & 0 & 2\end{pmatrix} \qquad \text{and} \qquad B = \begin{pmatrix}-2 & 1 & -2\\ -1 & 0 & 1\\ 2 & 1 & 2\end{pmatrix}.
\]
Show that neither matrix is diagonalisable.
(b) Diagonalise $C = \begin{pmatrix}-1 & 3 & 0\\ 0 & 2 & 0\\ -3 & 3 & 2\end{pmatrix}$; that is, write $C$ in the form $C = PDP^{-1}$ for some invertible matrix $P$ and diagonal matrix $D$.
27.4 Relevant sections from the textbooks

• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 8.1, 8.2 and 8.3 of our Algebra Textbook are relevant.
28
Linear transformations, 6 of 6
We now focus on symmetric matrices and a special form of diagonalisation applicable to
symmetric matrices, known as orthogonal diagonalisation. We begin by introducing orthogonal matrices. Then, we link these matrices to the process of orthogonal diagonalisation
and, finally, we discuss applications of orthogonal diagonalisation to quadratic forms and
conic sections.
28.1
Orthogonal matrices
An n × n matrix P is said to be orthogonal if PT P = PPT = I. This means that P is
invertible and its inverse P−1 is equal to its transpose PT ; that is, PT = P−1 .
At first it appears that the use of the term ‘orthogonal’ for a matrix P satisfying
PT P = PPT = I has little to do with the concept of orthogonality of vectors. However, there is a close connection, captured by the following theorem.
Theorem 28.1.1 An n × n matrix P is orthogonal if and only if its columns are pairwise orthogonal, and each has length 1; that is, if and only if the columns of P form an
orthonormal basis for Rn .
Proof Suppose that $P$ is an orthogonal matrix; i.e., $P^TP = PP^T = I$. Let the vectors $x_1, x_2, \ldots, x_n \in \mathbb{R}^n$ be the columns of $P = (x_1\ x_2\ \ldots\ x_n)$, so that $P^T$ is the matrix whose rows are $x_1^T, x_2^T, \ldots, x_n^T$. Then, the matrix equation $P^TP = I$ can be expressed as
\[
\begin{pmatrix}x_1^T\\ x_2^T\\ \vdots\\ x_n^T\end{pmatrix}(x_1\ x_2\ \ldots\ x_n) =
\begin{pmatrix}x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n\\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n\\ \vdots & \vdots & \ddots & \vdots\\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n\end{pmatrix} =
\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix},
\]
which implies the relations
\[
x_i^Tx_i = \langle x_i, x_i\rangle = \|x_i\|^2 = 1 \qquad \text{and} \qquad x_i^Tx_j = \langle x_i, x_j\rangle = 0 \ \text{ if } i \neq j.
\]
This shows that the columns of $P$ form an orthonormal basis for $\mathbb{R}^n$ with respect to the standard scalar product. Conversely, if the columns of $P$ form an orthonormal basis for $\mathbb{R}^n$ with respect to the standard scalar product, then
\[
x_i^Tx_i = \langle x_i, x_i\rangle = \|x_i\|^2 = 1 \qquad \text{and} \qquad x_i^Tx_j = \langle x_i, x_j\rangle = 0 \ \text{ if } i \neq j,
\]
so the matrix $P$ is orthogonal:
\[
P^TP = \begin{pmatrix}x_1^T\\ x_2^T\\ \vdots\\ x_n^T\end{pmatrix}(x_1\ x_2\ \ldots\ x_n) =
\begin{pmatrix}x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n\\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n\\ \vdots & \vdots & \ddots & \vdots\\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n\end{pmatrix} =
\begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix} = I.
\]
Note that if the matrix P is orthogonal then so is its transpose PT , because
(PT )(PT )T = PT P = I = PPT = (PT )T (PT ).
Hence, Theorem 28.1.1 can also be expressed in the form: An n × n matrix P is orthogonal
if and only if the transposed rows of P form an orthonormal basis for Rn .
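Both characterisations are easy to check on a concrete matrix. A sketch (not part of the original notes; the rotation angle is an arbitrary choice of ours) verifying that $P^TP = I$ and that the columns are pairwise orthogonal unit vectors:

import numpy as np

theta = 0.3
P = np.array([[np.cos(theta), -np.sin(theta)],     # a rotation matrix is orthogonal
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(P.T @ P, np.eye(2))
assert np.allclose(P @ P.T, np.eye(2))

# column check: unit length and mutual orthogonality
assert np.isclose(np.linalg.norm(P[:, 0]), 1.0)
assert np.isclose(np.linalg.norm(P[:, 1]), 1.0)
assert np.isclose(P[:, 0] @ P[:, 1], 0.0)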
28.2
Orthogonal diagonalisation
A matrix A is said to be orthogonally diagonalisable if there is an orthogonal matrix
P such that P−1 AP = PT AP = D where D is a diagonal matrix.
Note that the fact that A is diagonalisable means that the columns of P form a basis for Rn
consisting of eigenvectors of A. The additional fact that A is orthogonally diagonalisable
means that the columns of P form an orthonormal basis for Rn consisting of eigenvectors
of A. Putting these facts together, we have the following theorem:
Theorem 28.2.1 A matrix A is orthogonally diagonalisable if and only if there is an
orthonormal basis for Rn consisting of eigenvectors of A.
Let us look at some examples.
Example 28.2.2 The matrix
\[
A = \begin{pmatrix}7 & -15\\ 2 & -4\end{pmatrix}
\]
is diagonalisable but is not orthogonally diagonalisable. Omitting the calculations, the eigenvalues of $A$ are $\lambda_1 = 1$ and $\lambda_2 = 2$. The eigenspace corresponding to $\lambda_1$ is $\mathrm{Lin}\left\{\begin{pmatrix}5\\2\end{pmatrix}\right\}$ and the eigenspace corresponding to $\lambda_2 = 2$ is $\mathrm{Lin}\left\{\begin{pmatrix}3\\1\end{pmatrix}\right\}$. Since we have
\[
\left\langle\begin{pmatrix}5\\2\end{pmatrix}, \begin{pmatrix}3\\1\end{pmatrix}\right\rangle \neq 0,
\]
no eigenvector in the eigenspace of $\lambda_1$ is perpendicular to any eigenvector in the eigenspace of $\lambda_2$. Hence it is not possible to find an orthonormal set of eigenvectors for $A$ in order to orthogonally diagonalise this matrix.
Example 28.2.3 Now consider the matrix
\[
A = \begin{pmatrix}5 & -3\\ -3 & 5\end{pmatrix}.
\]
The characteristic polynomial equation is
\[
|A - \lambda I| = \begin{vmatrix}5 - \lambda & -3\\ -3 & 5 - \lambda\end{vmatrix} = \lambda^2 - 10\lambda + 16 = 0,
\]
so the eigenvalues of $A$ are $\lambda_1 = 2$ and $\lambda_2 = 8$. The corresponding eigenspaces are
\[
N(A - 2I) = N\Big(\begin{pmatrix}3 & -3\\ -3 & 3\end{pmatrix}\Big) = N\Big(\begin{pmatrix}1 & -1\\ 0 & 0\end{pmatrix}\Big) = \mathrm{Lin}\left\{\begin{pmatrix}1\\1\end{pmatrix}\right\},
\]
\[
N(A - 8I) = N\Big(\begin{pmatrix}-3 & -3\\ -3 & -3\end{pmatrix}\Big) = N\Big(\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}\Big) = \mathrm{Lin}\left\{\begin{pmatrix}-1\\1\end{pmatrix}\right\}.
\]
Since
\[
\left\langle\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}-1\\1\end{pmatrix}\right\rangle = 0,
\]
the eigenspaces $N(A - 2I)$ and $N(A - 8I)$ are orthogonal. It is now straightforward to create an orthonormal basis of eigenvectors by selecting a unit eigenvector from each eigenspace. For example, $B = \left\{\begin{pmatrix}\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}}\end{pmatrix}, \begin{pmatrix}-\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}}\end{pmatrix}\right\}$ is such an orthonormal basis of eigenvectors of $A$.

If we let
\[
P = P_B = \begin{pmatrix}\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}}\\[2pt] \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}}\end{pmatrix} \qquad \text{and} \qquad D = \begin{pmatrix}2 & 0\\ 0 & 8\end{pmatrix},
\]
then $P$ is orthogonal and $P^TAP = P^{-1}AP = D$. We say that $A$ has been orthogonally diagonalised; i.e., it has been expressed in the form
\[
A = PDP^T,
\]
where $P$ is an orthogonal matrix and $D$ a diagonal matrix.
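The same computation can be done in NumPy; the sketch below is not part of the original notes. np.linalg.eigh is designed for symmetric (Hermitian) matrices and returns an orthonormal set of eigenvectors directly:

import numpy as np

A = np.array([[ 5.0, -3.0],
              [-3.0,  5.0]])

eigvals, P = np.linalg.eigh(A)           # eigh: eigendecomposition for symmetric matrices
print(eigvals)                           # [2. 8.]

assert np.allclose(P.T @ P, np.eye(2))   # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(eigvals))
assert np.allclose(P @ np.diag(eigvals) @ P.T, A)   # A = P D P^T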
Note that the matrix $A$ in this example is symmetric, whereas the matrix $A$ in Example 28.2.2 is not. This brings us to the next question: Which matrices can be orthogonally
diagonalised? The answer is given by the following theorem, stated without proof.
Theorem 28.2.4 The matrix A is orthogonally diagonalisable if and only if A is symmetric.
Recall that an n × n matrix A is not diagonalisable unless all of its eigenvalues are real.
Hence, Theorem 28.2.4 implies the following result as a corollary:
Corollary 28.2.5 If A is a symmetric matrix, then all of its eigenvalues are real.
Moreover, Theorem 28.2.4 implies that, even if some eigenvalues of a symmetric matrix A
are repeated, the eigenspaces corresponding to distinct eigenvalues of A are orthogonal.
Otherwise it would have been impossible for A to have an orthonormal basis of eigenvectors, which is necessary for its orthogonal diagonalisation. This implies, in turn, that
eigenvectors corresponding to distinct eigenvalues of a symmetric matrix A are orthogonal.
It is instructive to prove this result as an independent theorem:
Theorem 28.2.6 If the matrix A is symmetric, then eigenvectors corresponding to
distinct eigenvalues are orthogonal.
Proof Suppose that λ, µ are any two distinct eigenvalues of A and that x, y are corresponding eigenvectors. Then Ax = λx and Ay = µy. Now consider the matrix product
xT Ay. Since Ay = µy, we have
xT Ay = xT (Ay) = xT (µy) = µxT y.
But also, Ax = λx. Since A is symmetric, A = AT . Substituting and using the properties
of the transpose of a matrix, we have
xT Ay = xT AT y = (xT AT )y = (Ax)T y = (λx)T y = λxT y.
Equating these two expressions for xT Ay, we deduce that µxT y = λxT y; that is,
(µ − λ)xT y = 0.
But since, by assumption, $\lambda$ and $\mu$ are distinct, we have $\mu - \lambda \neq 0$. Hence, we must have that $x^Ty = \langle x, y\rangle = 0$, which tells us precisely that $x$ and $y$ are orthogonal.
It is now straightforward to see how to construct an orthonormal basis B of eigenvectors
for a symmetric matrix A: For each eigenvalue λ of A, we use the Gram-Schmidt process to
create an orthonormal basis for the corresponding eigenspace N (A − λI). If an eigenspace
is one-dimensional, we just need to ensure that its basis vector is of unit length. If an
eigenspace is multi-dimensional, the full Gram-Schmidt process needs to be applied. Then,
the fact that eigenspaces corresponding to distinct eigenvalues are orthogonal guarantees
that the resulting basis B of eigenvectors is an orthonormal basis for Rn .
28.3 Symmetric matrices and quadratic forms
An important application of orthogonal diagonalisation is to the analysis of quadratic
forms. A quadratic form in two variables x and y is an expression of the form
\[
q(x, y) = ax^2 + 2cxy + by^2.
\]
This can be written as
\[
q(x, y) = x^T A x,
\]
where x = (x, y)^T and A is the symmetric matrix
\[
A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}.
\]
It is useful to verify that expanding the matrix product xT Ax gives
q(x, y) = ax2 + 2cxy + by 2 . Note that there are other ways of writing q(x, y) as a product
of matrices, xT Bx, where B is not symmetric, but these are of no interest to us here;
our focus is on the case where the matrix is symmetric. Similarly, a quadratic form in n
variables x1 , x2 , . . . , xn is an expression of the form
\[
q(x_1, x_2, \ldots, x_n) = x^T A x,
\]
where A is a symmetric n × n matrix and x = (x1 , x2 , . . . , xn )^T ∈ Rn .
Example 28.3.1 The following is a quadratic form in three variables:
\[
q(x_1, x_2, x_3) = 5x_1^2 + 10x_2^2 + 2x_3^2 + 4x_1 x_2 + 2x_1 x_3 - 6x_2 x_3.
\]
We have q(x1 , x2 , x3 ) = x^T Ax, where x = (x1 , x2 , x3 )^T and A is the symmetric matrix
\[
A = \begin{pmatrix} 5 & 2 & 1 \\ 2 & 10 & -3 \\ 1 & -3 & 2 \end{pmatrix}.
\]
Note that it is quite easy to derive the 3 × 3 symmetric matrix A from the expression
q(x1 , x2 , x3 ) of the quadratic form – and conversely – without having to perform matrix
multiplications. Specifically, the diagonal entries aii of A are the coefficients of the corresponding quadratic terms in the quadratic form q(x1 , x2 , x3 ), and the off-diagonal entries
of A, namely aij = aji , are half of the coefficients of the corresponding non-quadratic terms
in q(x1 , x2 , x3 ).
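As an illustrative aside (not from the notes), the reading-off rule just described can be checked symbolically. The sketch below, assuming SymPy is available, builds the matrix of Example 28.3.1 from the coefficients of q and confirms that x^T Ax reproduces the quadratic form:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
q = 5*x1**2 + 10*x2**2 + 2*x3**2 + 4*x1*x2 + 2*x1*x3 - 6*x2*x3

# diagonal entries: coefficients of the squared terms;
# off-diagonal entries: half the coefficients of the cross terms
A = sp.Matrix([[5, 2, 1],
               [2, 10, -3],
               [1, -3, 2]])
x = sp.Matrix([x1, x2, x3])

print(sp.expand((x.T * A * x)[0] - q) == 0)   # True: x^T A x equals q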
Due to certain practical applications we have in mind, we would like to know the set of all
possible values that a quadratic form q(x), x ∈ Rn may take. The technique of orthogonal
diagonalisation turns out to be very useful in this context. First, we need some terminology.
Let q(x) = xT Ax (with AT = A) be a quadratic form. Then:
• q(x) is positive definite if q(x) ≥ 0 for all x, and q(x) = 0 only when x = 0,
• q(x) is positive semi-definite if q(x) ≥ 0 for all x,
• q(x) is negative definite if q(x) ≤ 0 for all x, and q(x) = 0 only when x = 0,
• q(x) is negative semi-definite if q(x) ≤ 0 for all x,
• q(x) is indefinite if it is neither positive semi-definite nor negative semi-definite; that is, if there exist x1 , x2 such that q(x1 ) < 0 and q(x2 ) > 0.
Now suppose that we have found an orthogonal matrix P that orthogonally diagonalises
the symmetric matrix A. In other words, we have found P such that PT = P−1 and
PT AP = D, where D is a diagonal matrix. We perform the usual change of variables
x = Pz, which means that P is regarded as the transition matrix PB from coordinates in
the orthonormal basis B of eigenvectors of A to standard coordinates and z is regarded as
the coordinate vector (x)B of x with respect to the B basis. Then
q(x) = xT Ax = (Pz)T A(Pz) = zT (PT AP)z = zT Dz.
Note that D is a diagonal matrix whose entries are the eigenvalues of A; in other words, D
plays the role of the matrix A_T^{B→B} representing the transformation
T : Rn → Rn with respect to the B basis, where T (x) = Ax. Let us suppose that the
eigenvalues of A (in the order in which they appear in D) are λ1 , λ2 , . . . , λn . Among the
set {λ1 , λ2 , . . . , λn }, one or more eigenvalues may be repeated. Let X1 , X2 , . . . , Xn be the
coordinates of x with respect to the orthonormal basis B; that is, write
\[
z = (x)_B = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}.
\]
Then, the fact that q(x) = z^T D z implies that
\[
q(x) = \lambda_1 X_1^2 + \lambda_2 X_2^2 + \cdots + \lambda_n X_n^2,
\]
which is simply a linear combination of squares.
Now suppose that all the eigenvalues are positive. Then we can conclude that, for all z,
the quadratic form q(x) is greater than or equal to zero, and also that q(x) is zero only
when z is the zero vector. But because of the way in which x and z are related (x = Pz
and z = PT x), x = 0 if and only if z = 0. Therefore, if all the eigenvalues are positive, the
quadratic form is positive definite. Conversely, assume the quadratic form q(x) is positive
definite, so that x^T Ax > 0 for all x ≠ 0. Then, letting x = ui be a unit eigenvector
corresponding to the eigenvalue λi , we find that
\[
q(u_i) = u_i^T A u_i = u_i^T \lambda_i u_i = \lambda_i u_i^T u_i = \lambda_i \|u_i\|^2 = \lambda_i > 0,
\]
so each eigenvalue λi of A is positive. Therefore we have shown the first part of the
following result (the other parts arise from similar reasoning):
Theorem 28.3.2 Suppose that the quadratic form q(x) is given by q(x) = xT Ax, where
AT = A. Then:
• q(x) is positive definite if and only if all eigenvalues of A are positive,
• q(x) is positive semi-definite if and only if all eigenvalues of A are non-negative,
• q(x) is negative definite if and only if all eigenvalues of A are negative,
• q(x) is negative semi-definite if and only if all eigenvalues of A are non-positive,
• q(x) is indefinite if and only if at least one eigenvalue of A is negative and at least
one eigenvalue of A is positive.
The above terminology extends to the symmetric matrix A itself. So, a symmetric matrix
A is
• positive definite if and only if all its eigenvalues are positive,
• positive semi-definite if and only if all its eigenvalues are non-negative,
• negative definite if and only if all its eigenvalues are negative,
• negative semi-definite if and only if all its eigenvalues are non-positive,
• indefinite if and only if at least one of its eigenvalues is negative and at least one of
its eigenvalues is positive.
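As a brief aside (not part of the notes), the eigenvalue tests above are straightforward to automate. A minimal Python sketch, assuming NumPy, classifies a symmetric matrix by the signs of its eigenvalues:

import numpy as np

def classify(A, tol=1e-12):
    # classify a symmetric matrix by the signs of its eigenvalues
    eig = np.linalg.eigvalsh(A)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semi-definite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[5.0, -3.0], [-3.0, 5.0]])))  # positive definite (eigenvalues 2, 8)
print(classify(np.array([[1.0, 2.0], [2.0, 1.0]])))    # indefinite (eigenvalues -1, 3)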
A final note about quadratic forms in R2 is that they are directly related to conic sections.
Recall that conic sections are curves in R2 which are described by a Cartesian equation of
the form
Ax2 + Bxy + Cy 2 + Dx + Ey + F = 0
for some given real numbers A, B, C, D, E and F . Among these curves, any conic whose
Cartesian equation can be written in the simpler form
ax2 + 2cxy + by 2 = k
can be expressed as a matrix equation involving a quadratic form on its left hand side;
namely
xT Ax = k,
where
\[
x = \begin{pmatrix} x \\ y \end{pmatrix}
\quad\text{and}\quad
A = \begin{pmatrix} a & c \\ c & b \end{pmatrix} = A^T.
\]
The technique of orthogonal diagonalisation can be used in this context to find an orthonormal basis B of R2 so that the conic section is in standard position and orientation
with respect to the B-coordinate axes.
28.4 Exercises for self study
Exercise 28.4.1 (a) Orthogonally diagonalise the matrix
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.
\]
(b) Use the result in part (a) to sketch the curve xT Ax = 3 in the xy-plane.
Exercise 28.4.2 Consider the quadratic form
\[
g(x, y) = x^T A x,
\]
where x = (x, y)^T and A is the matrix encountered in Exercise 28.4.1. Also consider the
matrices P and D that accomplish the orthogonal diagonalisation of A in Exercise 28.4.1;
i.e., the matrices P and D such that
\[
P^{-1} A P = P^T A P = D.
\]
(a) Use the matrix P as a transition matrix PB from B-coordinates (X, Y ) to standard
coordinates (x, y); that is, let
\[
x = P_B (x)_B, \quad\text{i.e.,}\quad \begin{pmatrix} x \\ y \end{pmatrix} = P_B \begin{pmatrix} X \\ Y \end{pmatrix}.
\]
Then express the quadratic form g(x, y) = xT Ax in terms of X, Y .
(b) Hence, classify the quadratic form g(x, y).
Exercise 28.4.3 Let C be the curve defined by 3x^2 + 2√3 xy + 5y^2 = 6.
(a) Find a symmetric matrix A such that C is given by
xT Ax = 6.
(b) Orthogonally diagonalise A; i.e., find an orthogonal matrix P and a diagonal matrix
D such that
P−1 AP = PT AP = D.
(c) Consider P as a transition matrix PB from B-coordinates to standard coordinates and
hence sketch the curve C in the xy-plane, showing the standard and the B-coordinate axes
on your diagram.
Exercise 28.4.4 Denote the columns of the matrix P obtained in Exercise 28.4.3 by u1
and u2 (i.e., P = (u1 u2 )) and consider the matrix Q = (f1 f2 ) = (−u2 u1 ).
(a) Argue that Q is an orthogonal matrix.
(b) Consider Q as a transition matrix PE from E-coordinates to standard coordinates,
where E = {f1 , f2 } = {−u2 , u1 }. Then find the corresponding diagonal matrix F such that
Q−1 AQ = QT AQ = F.
(c) Make a new sketch of the curve C encountered in Exercise 28.4.3, showing the standard
and the E-coordinate axes on your diagram.
28.5 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 10.3, 11.1 and 11.2 of our Algebra Textbook are relevant.
29 Multivariate calculus, 1 of 5
29.1 Functions of Two Variables
Let D be a subset of R2 . A function f : D → R is a rule that assigns to each element
(x1 , x2 ) ∈ D a unique real number f (x1 , x2 ) ∈ R. Any such function f is called a real-valued
function of two variables. For simplicity, we will assume D = R2 .
The graph of f : R2 → R consists of all points (x1 , x2 , x3 ) ∈ R3 for which
x3 = f (x1 , x2 ).
This equation imposes a single restriction on the variables x1 , x2 , x3 and hence describes a
two-dimensional surface in R3 . In general, the equation x3 = f (x1 , x2 ) may be non-linear,
in which case the corresponding surface is curved and quite difficult to visualise. Computer
packages, such as Maple, can be useful aids for the purpose of visualising the graph of f .
A linear function f : R2 → R is a function of the form
f (x1 , x2 ) = ax1 + bx2
where a and b are given real numbers. The graph of f consists of all points (x1 , x2 , x3 ) ∈ R3
for which x3 = ax1 + bx2 . We can arrange this equation in the standard form ax1 + bx2 − x3 = 0.
This describes a non-vertical plane in R3 which passes through the origin and has a
normal vector (a, b, −1)^T .
An affine function f : R2 → R is a function of the form
f (x1 , x2 ) = ax1 + bx2 + c
where a, b and c are given real numbers. The graph of f consists of all points (x1 , x2 , x3 ) ∈
R3 for which x3 = ax1 +bx2 +c. In standard form this equation becomes ax1 +bx2 −x3 = −c.
It describes a non-vertical plane in R3 which does not pass through the origin (unless we
consider the linear case c = 0) and has a normal vector (a, b, −1)^T .
A homogeneous function of degree n is a function f : R2 → R which satisfies the
condition that, for all x1 , x2 and λ,
f (λx1 , λx2 ) = λn f (x1 , x2 ).
For example, the function
f (x1 , x2 ) = 4x1^3 − 5x1^2 x2 + x2^3
is homogeneous of degree 3 because
\[
f(\lambda x_1, \lambda x_2) = 4(\lambda x_1)^3 - 5(\lambda x_1)^2 (\lambda x_2) + (\lambda x_2)^3 = \lambda^3 (4x_1^3 - 5x_1^2 x_2 + x_2^3) = \lambda^3 f(x_1, x_2).
\]
The function f (x1 , x2 ) = x1 x2^4 − x1^3 x2^2 + 7x1^5 is homogeneous of degree 5 because each term
is homogeneous of degree 5. On the other hand, the function f (x1 , x2 ) = x1^2 + 6x1 x2 + x2
is not homogeneous because the first two terms are homogeneous of degree 2 but the last
term is homogeneous of degree 1.
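As a quick aside (not in the original notes), homogeneity can be verified symbolically. The sketch below, assuming SymPy, checks that the first function above is homogeneous of degree 3 while the last one is not homogeneous:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)

f = 4*x1**3 - 5*x1**2*x2 + x2**3
# for a homogeneous function, f(lam*x1, lam*x2)/f(x1, x2) reduces to a pure power of lam
print(sp.cancel(f.subs({x1: lam*x1, x2: lam*x2}) / f))   # lam**3, so degree 3

g = x1**2 + 6*x1*x2 + x2
ratio = sp.cancel(g.subs({x1: lam*x1, x2: lam*x2}) / g)
print(ratio.has(x1))   # True: the ratio still depends on x1, so g is not homogeneous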
Another important class of functions f : R2 → R consists of homogeneous functions of
degree 2 of the form
f (x1 , x2 ) = ax1^2 + bx2^2 ,
where a, b are given real numbers and (a, b) ≠ (0, 0). Their graphs belong to a class called
quadric surfaces. We shall examine an example of a quadric surface shortly, after we
discuss horizontal sections and vertical sections of a general graph x3 = f (x1 , x2 )
in R3 . The horizontal sections give rise to what is known as a contour map of the
corresponding surface.
Consider the surface corresponding to the graph of a function f : R2 → R; that is, the
surface in R3 described by the Cartesian equation x3 = f (x1 , x2 ). This surface may be
curved, as illustrated below:
Figure 29.1.1
Horizontal sections of the surface x3 = f (x1 , x2 ) correspond to cutting this surface with
horizontal planes x3 = c for various values of c. The curve of intersection of the surface
x3 = f (x1 , x2 ) with a horizontal plane of the form x3 = c is called a contour. Regarded as a
curve in R3 this contour consists of all points (x1 , x2 , x3 ) in R3 that satisfy the simultaneous
equations x3 = f (x1 , x2 ) and x3 = c. Different values of c give rise to different contours:
Figure 29.1.2
Projecting all these contours onto the x1 x2 -plane produces what is known as a contour
map of the surface x3 = f (x1 , x2 ). Each contour in the map is labelled by its characteristic
value c. Regarded as a curve on the x1 x2 -plane, the c-contour is described by the equation
c = f (x1 , x2 ). This equation is obtained by eliminating the variable x3 from the simultaneous system of equations x3 = f (x1 , x2 ) and x3 = c which describe the same contour as
a curve in R3 . In this way, the surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of
contours, all lying on a single copy of the x1 x2 -plane:
Figure 29.1.3
Vertical sections are similar to horizontal sections. The surface x3 = f (x1 , x2 ) is now
cut by vertical planes. These planes are usually chosen either parallel to the x2 x3 -plane
(that is, planes of the form x1 = a) or parallel to the x1 x3 -plane (that is, planes of the form
x2 = b). In the former case, the surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of
curves x3 = f (a, x2 ). Each value of a gives rise to a curve in this collection, and all these
curves are depicted on a single copy of the x2 x3 -plane. Similarly, in the latter case, the
surface x3 = f (x1 , x2 ) in R3 is visualised as a collection of curves x3 = f (x1 , b) which are
depicted on a single copy of the x1 x3 -plane. Vertical sections where the vertical plane is
not parallel to the x1 x3 -plane or the x2 x3 -plane are also possible.
Let us illustrate these ideas by examining a few horizontal and vertical sections of a quadric
surface. Further examples can be found in section 3.1 of our Calculus textbook.
Example 29.1.4 Consider the function f : R2 → R given by
f (x1 , x2 ) = x21 + x22 .
Regarded as a curve in R3 , each contour of the graph x3 = x21 + x22 is described by the
simultaneous system
x3 = x21 + x22 and x3 = c.
By eliminating the variable x3 from the above system, we obtain
c = x21 + x22 ,
which describes the relevant contour as a curve on the x1 x2 -plane. This is a circle of radius
√c. Clearly, if c < 0 (i.e., if the horizontal plane x3 = c in R3 lies below the origin (0, 0, 0)),
there are no values of x1 , x2 that satisfy the equation c = x1^2 + x2^2 , so the graph of f and the
horizontal plane x3 = c do not intersect. Also observe that as the value of c increases (i.e.,
as the horizontal plane x3 = c moves higher in R3 ), the radius √c of the circle c = x1^2 + x2^2
increases. This information is depicted in the two graphs below:
Figure 29.1.5
Let us also consider a few vertical sections of the graph of f with vertical planes of the
form x2 = b. Regarded as a curve in R3 , each such curve of intersection is described by the
simultaneous system
x3 = x21 + x22 and x2 = b.
By eliminating the variable x2 from the above system, we obtain
x3 = x21 + b2 ,
which describes the relevant curve within the context of the x1 x3 -plane. Each such curve
is a parabola. Observing the first graph in Figure 29.1.5, it should be clear that as the
value of b2 increases (i.e., as the vertical plane x2 = b moves away from the origin of R3 ),
the lowest point on the intersection curve x3 = x21 + b2 moves upwards.
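Plots of this kind are easy to generate with a computer package. A minimal Python sketch (not part of the notes; it assumes NumPy and Matplotlib are installed) draws a contour map of f (x1 , x2 ) = x1^2 + x2^2 together with a few vertical sections x3 = x1^2 + b^2:

import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-3, 3, 200)
x2 = np.linspace(-3, 3, 200)
X1, X2 = np.meshgrid(x1, x2)
X3 = X1**2 + X2**2                          # the surface x3 = x1^2 + x2^2

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# horizontal sections x3 = c, projected onto the x1x2-plane (the contour map)
cs = ax1.contour(X1, X2, X3, levels=[1, 4, 9])
ax1.clabel(cs)
ax1.set_title('contours c = x1^2 + x2^2')

# vertical sections x2 = b, drawn on the x1x3-plane
for b in [0, 1, 2]:
    ax2.plot(x1, x1**2 + b**2, label='x2 = %d' % b)
ax2.legend()
ax2.set_title('vertical sections x3 = x1^2 + b^2')

plt.show()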
29.2 Partial derivatives
The partial derivative of f : R2 → R in the x1 -direction with x2 kept constant is defined
by:
\[
\frac{\partial f}{\partial x_1} = \lim_{h \to 0} \frac{f(x_1 + h, x_2) - f(x_1, x_2)}{h}.
\]
Similarly, the partial derivative of f in the x2 -direction with x1 kept constant is defined
by:
\[
\frac{\partial f}{\partial x_2} = \lim_{h \to 0} \frac{f(x_1, x_2 + h) - f(x_1, x_2)}{h}.
\]
The partial derivative symbol ∂ is used in place of the ordinary derivative symbol d in
order to emphasise that the function f is being differentiated with respect to one of its
variables while the other variable is kept constant.
A convenient notation for the partial derivatives with respect to x1 and x2 is fx1 and fx2 , respectively.
These should be regarded as analogous to the symbol f 0 used for an ordinary derivative.
For example, fx1 (a, b) means the partial derivative of f (x1 , x2 ) with respect to x1 evaluated
at the point (a, b).
The rules for partial differentiation follow the rules for ordinary differentiation, with the
understanding that the variable that is kept constant is treated as a fixed number. This
means that, for most practical applications, the definition of the partial derivative as a
limit is not required. Instead, one uses ordinary rules of differentiation. Let us see an
example:
Example 29.2.1 Find the partial derivatives fx1 and fx2 of the function f : R2 → R
given by
\[
f = x_2 \sin(x_1^3 + 5x_2) + x_1 x_2 + 4x_1 .
\]
Treating x2 as a fixed number, we find that
\[
f_{x_1} = x_2 \cos(x_1^3 + 5x_2)(3x_1^2) + x_2 + 4.
\]
Treating x1 as a fixed number, we find that
\[
f_{x_2} = \sin(x_1^3 + 5x_2) + x_2 \cos(x_1^3 + 5x_2)(5) + x_1 .
\]
The second-order derivatives fx1 x1 , fx1 x2 , fx2 x1 , fx2 x2 (as well as higher-order derivatives)
are calculated in a similar manner. For example, starting from
\[
f_{x_1} = x_2 \cos(x_1^3 + 5x_2)(3x_1^2) + x_2 + 4
\]
and now treating x1 as a fixed number, we find that
\[
f_{x_1 x_2} = \cos(x_1^3 + 5x_2)(3x_1^2) - x_2 (3x_1^2) \sin(x_1^3 + 5x_2)(5) + 1.
\]
Note that the mixed second-order derivatives of f commute; that is,
\[
f_{x_1 x_2} = f_{x_2 x_1}.
\]
All functions f : R2 → R considered in this course will have the above property.
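As an aside (not in the original notes), the calculations of Example 29.2.1 can be checked symbolically. The sketch below, assuming SymPy, computes fx1, fx2 and the mixed second-order derivatives, and confirms that the latter commute:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x2*sp.sin(x1**3 + 5*x2) + x1*x2 + 4*x1

fx1 = sp.diff(f, x1)       # matches x2*cos(x1^3 + 5x2)*(3x1^2) + x2 + 4
fx2 = sp.diff(f, x2)       # matches sin(x1^3 + 5x2) + x2*cos(x1^3 + 5x2)*(5) + x1
print(fx1)
print(fx2)

fx1x2 = sp.diff(f, x1, x2)
fx2x1 = sp.diff(f, x2, x1)
print(sp.simplify(fx1x2 - fx2x1) == 0)   # True: the mixed derivatives commute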
29.3 Geometrical interpretation of the partial derivatives
The partial derivatives fx1 and fx2 of a function f : R2 → R have the following geometric
meaning: Recall that the graph of f is a two-dimensional surface in R3 described by the
Cartesian equation x3 = f (x1 , x2 ). In general, the surface may be curved. Let us now
imagine slicing the surface x3 = f (x1 , x2 ) by the vertical plane x2 = b, where b is some real
number. In other words, let us consider the vertical section of the graph of f associated
with the vertical plane x2 = b. As already discussed, if the resulting curve of intersection
is regarded as a curve in R3 , its Cartesian description consists of the simultaneous system
of equations x3 = f (x1 , x2 ) and x2 = b. However, the same curve can also be regarded as
a curve lying on the two-dimensional vertical plane x2 = b. Then, it is described by the
Cartesian equation x3 = f (x1 , b), obtained by eliminating the variable x2 from the set of
equations x3 = f (x1 , x2 ) and x2 = b. In order to emphasise that x2 has been eliminated
and that x3 depends on x1 alone, we can rewrite the Cartesian equation of this curve as
x3 = g(x1 ), where g(x1 ) := f (x1 , b). The curve x3 = g(x1 ) can now be regarded as a curve
on the x1 x3 -plane, without any reference to x2 .
The partial derivative of f (x1 , x2 ) with respect to x1 evaluated at the point (a, b) is simply
the ordinary derivative of the function g(x1 ) evaluated at a; that is,
\[
\frac{\partial f}{\partial x_1}(a, b) = \frac{dg}{dx_1}(a).
\]
In other words, the partial derivative fx1 (a, b) is the slope of the curve x3 = g(x1 ) at a.
Note that the 2 × 1 direction vector of the tangent line to the curve x3 = g(x1 ) at a is
given by
\[
\begin{pmatrix} 1 \\ f_{x_1}(a, b) \end{pmatrix}.
\]
Indeed, regarded as a vector on the x1 x3 -plane, this vector describes a displacement of 1
unit in the x1 -direction and a displacement equal to the slope fx1 (a, b) in the x3 -direction.
The same vector can be regarded as a vector in R3 , in which case it becomes
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}.
\]
The component of this vector in the x2 -direction is zero because the line to which this
vector is tangent lies entirely on the plane x2 = b.
Similarly, we can imagine slicing the surface x3 = f (x1 , x2 ) by the vertical plane x1 = a.
If the resulting curve of intersection is regarded as a curve in R3 , its Cartesian description
consists of the set of equations x3 = f (x1 , x2 ) and x1 = a. The same curve can also be
regarded as a curve on the vertical plane x1 = a. In this case, it is described by the
Cartesian equation x3 = f (a, x2 ), obtained by eliminating the variable x1 from the set
of equations x3 = f (x1 , x2 ) and x1 = a. Since x3 depends on x2 alone, we can rewrite
the Cartesian equation of this curve as x3 = h(x2 ), where h(x2 ) := f (a, x2 ). The curve
x3 = h(x2 ) can now be regarded as a curve on the x2 x3 -plane, without any reference to x1 .
The partial derivative of the function f (x1 , x2 ) with respect to x2 evaluated at (a, b) is the
ordinary derivative of the function h(x2 ) evaluated at b; that is,
\[
\frac{\partial f}{\partial x_2}(a, b) = \frac{dh}{dx_2}(b).
\]
In other words, the partial derivative fx2 (a, b) is the slope of the curve x3 = h(x2 ) at b.
Note that the 2 × 1 direction vector of the tangent line to the curve x3 = h(x2 ) at b is given
by
\[
\begin{pmatrix} 1 \\ f_{x_2}(a, b) \end{pmatrix}.
\]
Indeed, regarded as a vector on the x2 x3 -plane, this vector describes a displacement of 1
unit in the x2 -direction and a displacement equal to the slope fx2 (a, b) in the x3 -direction.
The same vector can be regarded as a vector in R3 , in which case it becomes
\[
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}.
\]
The component of this vector in the x1 -direction is zero because the line to which this
vector is tangent lies entirely on the plane x1 = a.
Let us illustrate the geometrical objects introduced so far with the aid of an example. Note
that Maple can be a very useful tool here, because most surfaces are difficult to sketch by
hand or even to visualise.
Example 29.3.1 Consider the function f : R2 → R given by f (x1 , x2 ) = x1 2 + x2 2 .
Also consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
x1 = 0. Confirm that the partial derivative fx2 (0, 1) gives the slope of this curve when
x2 = 1. Consider also the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x2 = 1. Confirm that the partial derivative fx1 (0, 1) gives the slope of this
curve when x1 = 0.
Let us first consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical
plane x1 = 0. Regarded as a curve lying on a copy of the x2 x3 -plane, this curve is described
by the Cartesian equation
x3 = h(x2 ) := f (0, x2 ) = x22 .
The ordinary derivative h0 (x2 ) evaluated at the point x2 = 1 gives the slope of the curve
x3 = h(x2 ) at that point. We have h0 (x2 ) = 2x2 , so h0 (1) = 2. This is indeed equal to the
partial derivative fx2 (0, 1).
Similarly, let us consider the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x2 = 1. Regarded as a curve lying on a copy of the x1 x3 -plane, this curve is
described by the Cartesian equation
x3 = g(x1 ) := f (x1 , 1) = x21 + 1.
The ordinary derivative g 0 (x1 ) evaluated at the point x1 = 0 gives the slope of the curve
x3 = g(x1 ) at that point. We have g 0 (x1 ) = 2x1 , so g 0 (0) = 0. This is indeed equal to the
partial derivative fx1 (0, 1).
The relevant diagrams are presented below:
Figure 29.3.2
29.4
Tangent planes
Before we introduce the concept of the tangent plane to the graph of a function f : R2 → R
at a given point (a, b, f (a, b)) ∈ R3 on this graph, let us recall that a function f : R → R of
a single variable x may not admit a non-vertical tangent line at a given point (a, f (a)) ∈ R2
on its graph, because it may not be differentiable there. For example, the curve y = x^{1/3}
does not admit a non-vertical tangent line at (0, 0) because the derivative of the function
f (x) = x^{1/3} does not exist at 0. Similarly, we cannot expect that a general surface in R3
of the form x3 = f (x1 , x2 ) will always admit a non-vertical tangent plane. However, it
can be shown that if the function f (x1 , x2 ) is continuous and has continuous partial
derivatives fx1 (x1 , x2 ) and fx2 (x1 , x2 ) at a point (a, b), then the graph x3 = f (x1 , x2 ) in
R3 does admit a non-vertical tangent plane at the point (a, b, f (a, b)). In this case, we say that
f (x1 , x2 ) is differentiable at (a, b). Note that continuity and differentiability in R2 go
beyond the scope of this course, so we will not be concerned with the justification of this
last statement.
Provided that f (x1 , x2 ) is differentiable at (a, b), let us consider the vectors
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}
\]
introduced in the previous subsection. Recall that the first vector is the direction vector of
the tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical
plane x2 = b at the point (a, b, f (a, b)) and that, similarly, the second vector is the direction
vector of the tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the
vertical plane x1 = a at (a, b, f (a, b)).
These two tangent lines define a unique plane that contains them, as illustrated below:
Figure 29.4.1
This is the plane which passes through the intersection point (a, b, f (a, b)) of these two
tangent lines and has the direction vectors of these lines as its direction vectors. It is called
the tangent plane to the surface x3 = f (x1 , x2 ) at the point (a, b, f (a, b)). Note that
a normal vector n for this plane can be found by requiring that n is orthogonal to the
direction vectors of the plane. Using the scalar product, it is easy to check that
\[
n = \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix}
\]
is orthogonal to both direction vectors and hence is the required vector. Hence, a vector
parametric equation in R3 describing this tangent plane is given by
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} a \\ b \\ f(a, b) \end{pmatrix}
+ s \begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix}
+ t \begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix}
\]
and a corresponding Cartesian equation is given by
\[
\left\langle \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} - \begin{pmatrix} a \\ b \\ f(a, b) \end{pmatrix},\;
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} \right\rangle = 0.
\]
The latter can be expressed in the form
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
Note that this is analogous to the Cartesian equation in R2 of the tangent line to the curve
y = f (x) at the point (a, f (a)); namely
y − f (a) = f 0 (a)(x − a).
Indeed, instead of the ordinary derivative f 0 (a) multiplying the difference (x − a) on the
right hand side of the Cartesian equation, we now have two partial derivatives fx1 (a, b) and
fx2 (a, b) multiplying the corresponding differences x1 − a and x2 − b. On the left hand side
of the Cartesian equation, we always have the dependent variable minus the output of the
function.
Remark 29.4.2 Since the Cartesian equation
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
is easy to memorise, it provides the best way for reconstructing all the geometrical objects
of interest: First, arranging the Cartesian equation in standard form, we read from the
coefficients of x1 , x2 and x3 that a normal vector n to the plane is
\[
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix}.
\]
Then we obtain the direction vectors of the plane by using the trick that two linearly
independent vectors perpendicular to n = (α, β, −1)^T are (1, 0, α)^T and (0, 1, β)^T . Finally, we
identify a position vector on the plane by using the point (a, b, f (a, b)).
Example 29.4.3 Find a parametric and a Cartesian equation for the tangent plane to
the surface x3 = f (x1 , x2 ) at the point (0, 2, 5), where f (x1 , x2 ) = x1 2 + x2 2 + 1.
We have fx1 = 2x1 and fx2 = 2x2 , so
fx1 (0, 2) = 0
and
fx2 (0, 2) = 4.
Hence, the direction vectors of the tangent plane are
\[
\begin{pmatrix} 1 \\ 0 \\ f_{x_1}(a, b) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 0 \\ 1 \\ f_{x_2}(a, b) \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix},
\]
and the normal vector is
\[
\begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}.
\]
A Cartesian description for the tangent plane is therefore given (in scalar product form)
by
\[
\left\langle \begin{pmatrix} x_1 - 0 \\ x_2 - 2 \\ x_3 - 5 \end{pmatrix}, \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix} \right\rangle = 0;
\]
i.e., by
\[
x_3 - 5 = 0(x_1 - 0) + 4(x_2 - 2),
\]
and a vector parametric description is given by
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix} + s \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 1 \\ 4 \end{pmatrix}.
\]
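As a side note (not part of the notes), the recipe of this example can be mechanised. The sketch below, assuming SymPy, recomputes the tangent plane of Example 29.4.3 from the partial derivatives:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + x2**2 + 1
a, b = 0, 2                                   # point of tangency (0, 2, f(0, 2)) = (0, 2, 5)

fx1 = sp.diff(f, x1).subs({x1: a, x2: b})     # 0
fx2 = sp.diff(f, x2).subs({x1: a, x2: b})     # 4
f_ab = f.subs({x1: a, x2: b})                 # 5

tangent_plane = sp.Eq(x3 - f_ab, fx1*(x1 - a) + fx2*(x2 - b))
print(tangent_plane)                          # x3 - 5 = 4*(x2 - 2)
print(sp.Matrix([fx1, fx2, -1]).T)            # normal vector (0, 4, -1)^T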
29.5 Exercises for self study
Exercise 29.5.1 Using the simpler notation x1 = x, x2 = y, show that the function
f : R2 → R given by
\[
f(x, y) = \frac{x^3 + y^3}{x + y}
\]
is homogeneous of degree n = 2. Hence verify the so-called Euler’s formula valid for
homogeneous functions; that is, verify that
xfx + yfy = nf.
Exercise 29.5.2 For the function g : R2 → R given by
g(x, y) = 32x − 6x2 + 8xy + 16y − 3y 2 − 20,
find all the points (a, b, g(a, b)) ∈ R3 where the tangent plane to the graph of g is horizontal.
Exercise 29.5.3 Consider the function f : R2 → R defined by
f (x1 , x2 ) = 4x21 + x22 .
(a) Provide a Cartesian description for the graph of f in R3 .
(b) On a single copy of the x1 x2 -plane, sketch the curve of intersection of the graph with
the horizontal plane x3 = c for c = 0, 4, 16, i.e., sketch the contours f (x1 , x2 ) = c for
c = 0, 4, 16.
(c) On a single copy of the x1 x3 -plane, sketch the curve of intersection of the graph of f
with the vertical plane x2 = b for b = −1, 0, 1; i.e., sketch the vertical sections x3 = f (x1 , b)
for b = −1, 0, 1.
Now consider the vertical section x3 = g(x1 ) := f (x1 , 1).
(d) Evaluate the ordinary derivative g′(x1 ) at x1 = 2 and show that it is equal to the
partial derivative ∂f (x1 , x2 )/∂x1 evaluated at x1 = 2, x2 = 1.
Exercise 29.5.4 Let f : R2 → R be the function introduced in Exercise 29.5.3; i.e.,
f (x1 , x2 ) = 4x21 + x22 .
(a) Calculate the partial derivatives fx1 , fx2 .
(b) Find a Cartesian and a parametric description for the tangent plane Π to the graph of
f at the point (0, 2, 4) ∈ R3 .
(c) On a copy of the x1 x2 -plane, sketch the line ` of intersection of the plane Π with
the horizontal plane x3 = 4. On the same copy of the x1 x2 -plane, sketch the contour
f (x1 , x2 ) = 4.
(d) What is the relation between the line ` and the contour f (x1 , x2 ) = 4?
29.6 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.1, 3.2 and 3.3 of our Calculus Textbook are relevant.
30 Multivariate calculus, 2 of 5
30.1 The gradient
Consider a function f : R2 → R. The gradient of f , denoted by ∇f , is defined as the
column vector
\[
\nabla f := \begin{pmatrix} f_{x_1} \\ f_{x_2} \end{pmatrix}.
\]
For the purpose of interpreting the gradient geometrically, consider the surface x3 =
f (x1 , x2 ) in R3 and a point (a, b, f (a, b)) on this surface. We assume that f is differentiable at (a, b), which means that this surface admits a non-vertical tangent plane at
(a, b, f (a, b)). The Cartesian equation of this plane is
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
We also assume that fx1 (a, b) and fx2 (a, b) are not both zero. This ensures that the tangent
plane at (a, b, f (a, b)) is not horizontal. We further consider the intersection of the surface
x3 = f (x1 , x2 ) with the horizontal plane x3 = f (a, b). The resulting curve of intersection
is a contour containing the point (a, b, f (a, b)). This point lies on the contour because it
satisfies both equations x3 = f (x1 , x2 ) and x3 = f (a, b). The relevant geometrical objects
are illustrated below:
Figure 30.1.1
Let us focus in particular on the intersection of the horizontal plane x3 = f (a, b) with the
tangent plane x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b). Recall that the tangent
plane has been assumed non-horizontal, so the intersection of these two non-parallel planes
is a line. Regarded as a geometrical object in R3 , this line is described by the set of
equations
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
and
x3 = f (a, b).
The same line can be regarded as a line on the x1 x2 -plane, in which case its Cartesian
equation is
0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
This equation is obtained by eliminating x3 from the set of equations x3 = f (a, b) and
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) which define this line in R3 .
We repeat this process for the contour as well. Starting from the equations
x3 = f (x1 , x2 )
and
x3 = f (a, b)
that describe this contour as a curve in R3 , we eliminate x3 and derive the Cartesian
equation
f (x1 , x2 ) = f (a, b)
which describes the same contour as a curve on the x1 x2 -plane.
Both this contour and the line 0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) are illustrated on
the x1 x2 plane, below:
Figure 30.1.2
Observe that the line is tangent to the contour (this fact is proven in Exercise 30.5.1). Moreover, the coefficients of x1 and x2
in the Cartesian equation 0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b) of this line are precisely
the components of the gradient ∇f (a, b). Therefore, the geometrical interpretation of the
gradient is the following:
∇f (a, b) is the normal vector to the contour f (x1 , x2 ) = f (a, b) at the point (a, b). Moreover, it can be shown that this vector points in the direction of increasing contour-values;
that is, if this vector is placed at the point (a, b) on the contour f (x1 , x2 ) = f (a, b), then
it points towards a contour f (x1 , x2 ) = c with c > f (a, b). This information has been
included in Figure 30.1.2 above.
Finally, note that if fx1 (a, b) and fx2 (a, b) are both zero, ∇f (a, b) vanishes and cannot be
considered as a normal vector to any contour. This is the reason why we excluded this
case from our previous discussion. Of course, points where both partial derivatives vanish
do arise, and there is nothing wrong with saying that the gradient is zero at these points.
It is only the claim that the gradient is a normal vector to a contour that fails there.
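A small numerical check of this interpretation (not part of the notes) can be done with NumPy for f (x1 , x2 ) = x1^2 + x2^2 at the point (3, 4): the gradient should be orthogonal to the tangent direction of the contour through that point (the direction d of Exercise 30.5.1):

import numpy as np

def grad_f(x1, x2):
    # gradient of f(x1, x2) = x1^2 + x2^2
    return np.array([2.0*x1, 2.0*x2])

a, b = 3.0, 4.0
g = grad_f(a, b)                    # [6, 8]

# tangent direction to the contour f = f(a, b): d = (f_x2(a, b), -f_x1(a, b))
d = np.array([g[1], -g[0]])

print(np.dot(g, d))                 # 0.0: the gradient is normal to the contour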
30.2 The derivative
We noted last time that the Cartesian equation
x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b)
for the tangent plane to the surface x3 = f (x1 , x2 ) at the point (a, b, f (a, b)) resembles the
Cartesian equation
y − f (a) = f 0 (a)(x − a)
for the tangent line to the curve y = f (x) at the point (a, f (a)).
We would like to explore this similarity further by writing the equation of the tangent
plane in the compact form
\[
x_3 - f(a) = f'(a)(x - a),
\]
where the two-dimensional vectors x and a are given by x = (x1 , x2 )^T and a = (a, b)^T .
For this to be the case, the symbol f (a) should be understood as f (a, b) and the derivative
f ′(a) should correspond to the row vector
\[
f'(a) = \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \end{pmatrix}^T,
\]
where T indicates transposing. Then, indeed, the Cartesian equation of the tangent plane is recovered from the
equation x3 − f (a) = f ′(a)(x − a) via matrix multiplication. This consideration leads to
the following definition:
The derivative of f : R2 → R at a point x ∈ R2 is defined by
\[
f'(x) = \begin{pmatrix} f_{x_1}(x_1, x_2) \\ f_{x_2}(x_1, x_2) \end{pmatrix}^T.
\]
An alternative symbol for the derivative is Df . Note that the derivative Df and the
gradient ∇f are simply the transpose of each other.
30.3 Directional derivatives
The directional derivative generalises the concept of the partial derivative of a function
f : R2 → R.
Recall that the partial derivative fx1 (a, b) is the slope of the tangent line to the curve
of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane x2 = b measured at
(a, b, f (a, b)), and that the partial derivative fx2 (a, b) is the slope of the tangent line to the
curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane x1 = a measured
at (a, b, f (a, b)).
We would like to extend this idea and find the slope, measured at (a, b, f (a, b)), of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with a vertical plane
through (a, b, f (a, b)) which is not necessarily aligned with one of the horizontal coordinate
axes. To this end, let us introduce a general horizontal direction vector u on the x1 x2 -plane,
and let us define the directional derivative fu (a, b) as the slope at (a, b, f (a, b)) of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
through (a, b, f (a, b)) aligned with the direction vector u. This definition indeed reproduces
the geometric meaning of fx1 (a, b) when u = (1, 0)^T and that of fx2 (a, b) when u = (0, 1)^T .
In the illustration below, we can see the surface x3 = f (x1 , x2 ), a general point (a, b, f (a, b))
on this surface, as well as the curves of intersection of this surface with a number of vertical
planes through (a, b, f (a, b)), each corresponding to a different choice of the horizontal
direction vector u.
Figure 30.3.1
The key observation is that the tangent lines at (a, b, f (a, b)) to these curves of intersection
all belong to the tangent plane to the surface at (a, b, f (a, b)). Hence, the direction vectors
of these tangent lines are all perpendicular to the normal vector (fx1 (a, b), fx2 (a, b), −1)^T of this plane.
Among these direction vectors, recall that (1, 0, fx1 (a, b))^T is the direction vector of the tangent
line associated with u = (1, 0)^T . This is because the vertical plane x2 = b that contains this
tangent line is aligned with the horizontal direction u = (1, 0)^T . Similarly, (0, 1, fx2 (a, b))^T is the
direction vector of the tangent line associated with the vector u = (0, 1)^T . This is because
the vertical plane x1 = a that contains this tangent line is aligned with the horizontal
direction u = (0, 1)^T :
Figure 30.3.2
Now note that the vector (1, 0, fx1 (a, b))^T implies that, along the corresponding tangent line,
a displacement of 1 unit in the x1 -direction results in a displacement equal to the slope
fx1 (a, b) of this line in the x3 -direction. Similarly, the vector (0, 1, fx2 (a, b))^T implies that,
along the corresponding tangent line, a displacement of 1 unit in the x2 -direction results
in a displacement equal to the slope fx2 (a, b) of this line in the x3 -direction.
Following the same reasoning, let (u1 , u2 , fu (a, b))^T be the direction vector of the tangent line
associated with u = (u1 , u2 )^T . The vertical plane that contains this tangent line is the vertical
plane through (a, b, f (a, b)) aligned with the horizontal direction u = (u1 , u2 )^T . In order for
fu (a, b) to have the meaning of the slope of this tangent line, the vector (u1 , u2 , fu (a, b))^T must
imply that a displacement of 1 unit in the u-direction results in a displacement equal to the
slope fu (a, b) in the x3 -direction.
For this to be the case, the vector u must be of unit length; that is, we must have
that √(u1^2 + u2^2) = 1. Otherwise, the third component fu (a, b) of (u1 , u2 , fu (a, b))^T cannot be
interpreted as a slope. Indeed, the slope of a line in the direction (α, β, γ)^T is equal to the
vertical displacement γ only if the horizontal displacement has length equal to 1; that is,
only if √(α^2 + β^2) = 1. For example, the vectors (1, 0)^T and (0, 1)^T are both of unit length, and
this fact is consistent with the partial derivatives in (1, 0, fx1 (a, b))^T and (0, 1, fx2 (a, b))^T being
the slopes of the corresponding tangent lines.
Now, with the horizontal direction vector u being a unit vector, it is easy to find an
expression for the directional derivative fu (a, b) in terms of u and the gradient ∇f (a, b)
by using the fact that the vector (u1 , u2 , fu (a, b))^T belongs to the tangent plane at (a, b, f (a, b)).
In particular, (u1 , u2 , fu (a, b))^T must be perpendicular to the normal vector (fx1 (a, b), fx2 (a, b), −1)^T
of this plane; that is, we must have
\[
\left\langle \begin{pmatrix} u_1 \\ u_2 \\ f_u(a, b) \end{pmatrix}, \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \\ -1 \end{pmatrix} \right\rangle = 0.
\]
Solving this equation for the directional derivative fu (a, b), we find that
\[
f_u(a, b) = \left\langle \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \begin{pmatrix} f_{x_1}(a, b) \\ f_{x_2}(a, b) \end{pmatrix} \right\rangle = \langle u, \nabla f(a, b) \rangle, \quad \text{where } \|u\| = 1.
\]
Of course, the direction of the directional derivative fw (a, b) may be described equally well
by a horizontal vector w which is not necessarily of unit length. The above formula then
needs to be adjusted so that it becomes applicable:
\[
f_w(a, b) = \left\langle \frac{w}{\|w\|}, \nabla f(a, b) \right\rangle, \quad \text{where } w \neq 0.
\]
This formula reduces to the previous one when ||w|| = 1.
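As an aside (not in the notes), the formula can be compared with a finite-difference estimate of the slope. The sketch below, assuming NumPy, does this for f (x1 , x2 ) = x1^2 + x2^2 at (3, 3) in the direction w = (1, 1)^T:

import numpy as np

def f(x):
    return x[0]**2 + x[1]**2

def grad_f(x):
    return np.array([2.0*x[0], 2.0*x[1]])

a = np.array([3.0, 3.0])
w = np.array([1.0, 1.0])              # not of unit length
u = w / np.linalg.norm(w)             # normalise first

directional = np.dot(u, grad_f(a))    # inner product of w/||w|| with grad f(a, b)

t = 1e-6                              # small displacement in the direction u
estimate = (f(a + t*u) - f(a)) / t

print(directional)                    # 6*sqrt(2), approximately 8.485
print(estimate)                       # approximately the same value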
30.4 The rate of change of a function f : R2 → R
Consider the expression
\[
f_w(a, b) = \left\langle \frac{w}{\|w\|}, \nabla f(a, b) \right\rangle,
\]
which gives the directional derivative of f in the direction w ≠ 0. Recall that the right
hand side of this equation is equal to the product of the lengths of the vectors w/||w|| and
∇f (a, b) multiplied by the cosine of the angle θ between them. Since w/||w|| is a unit vector,
we find that fw (a, b) = ||∇f (a, b)|| cos(θ).
Clearly, the maximum value that fw (a, b) can have is ||∇f (a, b)||. This occurs when
cos(θ) = 1, i.e., when θ = 0, and hence corresponds to the vector w pointing in the
direction of the gradient ∇f (a, b) itself; i.e., in the direction orthogonal to the contour
f (x1 , x2 ) = f (a, b) at the point (a, b), towards increasing contour values.
Now recall that the directional derivative fw (a, b) measures the slope at (a, b, f (a, b)) of the
tangent line to the curve of intersection of the surface x3 = f (x1 , x2 ) with the vertical plane
through (a, b, f (a, b)) aligned with the horizontal vector w. Hence, more simply, we can say
that fw (a, b) measures the rate of change of f at (a, b, f (a, b)) in the direction w. It follows
that the maximum rate of increase of f at (a, b, f (a, b)) is ||∇f (a, b)|| and occurs in the
direction w = ∇f (a, b). In other words, the rate of change of f in this direction is positive
and given by f∇f (a,b) (a, b) = ||∇f (a, b)||. The maximum rate of decrease of f at (a, b, f (a, b))
is also ||∇f (a, b)|| (in absolute value) and occurs in the direction w = −∇f (a, b), i.e., when
θ = π and cos(θ) = −1. In other words, the rate of change of f in this direction is negative
and given by f−∇f (a,b) (a, b) = −||∇f (a, b)||. Finally, fw (a, b) = 0 when θ = π/2, in which
case w is perpendicular to the gradient ∇f (a, b) and therefore tangent to the contour at
(a, b, f (a, b)).
30.5 Exercises for self study
Exercise 30.5.1 Consider the line ` of intersection of the horizontal plane x3 = f (a, b)
with the tangent plane x3 − f (a, b) = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b); i.e., the line
described in the context of the x1 x2 -plane by the Cartesian equation
0 = fx1 (a, b)(x1 − a) + fx2 (a, b)(x2 − b).
Also consider the contour c of intersection of the horizontal plane x3 = f (a, b) with the
graph x3 = f (x1 , x2 ) of the function f ; i.e., the contour described in the context of the
x1 x2 -plane by the Cartesian equation
f (x1 , x2 ) = f (a, b).
(a) Starting from the above equation and using implicit differentiation, show that the
tangent line to the contour c at (x1 , x2 ) = (a, b) has direction vector
\[
d = \begin{pmatrix} f_{x_2}(a, b) \\ -f_{x_1}(a, b) \end{pmatrix}.
\]
(b) Hence show that the line ` is the tangent line to the contour c at (x1 , x2 ) = (a, b).
Exercise 30.5.2 Consider the function f : R2 → R given by
f (x1 , x2 ) = x21 + x22 .
Also consider the point (3, 3, 18) on the graph of f . This point lies on the contour c,
described in the context of the x1 x2 -plane by the Cartesian equation
x21 + x22 = 18.
(a) Find the directional derivative fu (3, 3) in the direction u = (1, 0)^T and show that it
is equal to the partial derivative fx1 (3, 3). Sketch the contour c on the x1 x2 -plane and
indicate the direction u on your graph as a vector starting at the point (3, 3).
(b) Find the directional derivative fv (3, 3) in the horizontal direction v = (1, 1)^T . Indicate
the direction v on your graph as a vector starting at the point (3, 3).
(c) Calculate fx1 (3, 3) by using the definition of the partial derivative
\[
f_u(3, 3) = f_{x_1}(3, 3) = \lim_{t \to 0^+} \frac{f(3 + t, 3) - f(3, 3)}{t}.
\]
The above relation tells us that, in the context of R3 , the slope fx1 (3, 3) is the change in
the vertical distance f (3 + t, 3) − f (3, 3) divided by the change in the horizontal distance
\[
\left\| \begin{pmatrix} 3 + t \\ 3 \end{pmatrix} - \begin{pmatrix} 3 \\ 3 \end{pmatrix} \right\| = \sqrt{t^2 + 0^2} = t
\]
associated with a displacement tu = t(1, 0)^T in the horizontal x1 -direction.
(d) Calculate fv (3, 3) using a similar limit and verify your value for fv (3, 3) obtained in
part (b).
Exercise 30.5.3 Consider the function f : R2 → R given by f (x1 , x2 ) = x1 2 + x2 2 .
(a) Obtain a Cartesian description in R3 for the contour c obtained by slicing the surface
x3 = x1 2 + x2 2 by the horizontal plane x3 = 25.
(b) Obtain a Cartesian and a vector parametric description in R3 for the tangent plane Π
to the surface x3 = x1 2 + x2 2 at the point (3, 4, 25).
(c) Hence obtain a Cartesian and a parametric equation in R3 for the tangent line ` to the
contour at the point (3, 4, 25).
Now eliminate x3 from the description of the contour c and the line `; i.e., regard c and `
as geometric objects on the x1 x2 -plane.
(d) Obtain a Cartesian description for c as well as a Cartesian and a vector parametric
description for ` within the context of the x1 x2 -plane.
Exercise 30.5.4 Consider the contour described in the context of the x1 x2 -plane by the
Cartesian equation
x21 + x22 = 25.
(a) Sketch this contour and identify the point (3, 4) on your graph.
(b) Use the argument based on implicit differentiation (presented in Exercise 30.5.1) to
find a 2 × 1 direction vector d for the tangent line to this contour at (3, 4).
(c) Confirm that the vector d is orthogonal to the gradient vector ∇f (3, 4), where f (x1 , x2 ) =
x21 + x22 .
(d) Also confirm that ∇f (3, 4) points in the direction of increasing contour values; i.e., if
∇f (3, 4) starts at (3, 4), then it points towards a contour f (x1 , x2 ) = k where k > 25.
30.6 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.4, 3.5 and 3.6 of our Calculus Textbook are relevant.
31 Multivariate calculus, 3 of 5
31.1 Functions of n variables
Let D be a subset of Rn . A function f : D → R is a rule that assigns to each point
(x1 , x2 , ..., xn ) ∈ D a unique real number f (x1 , x2 , ..., xn ) ∈ R. Any such function f is
called a real-valued function of n variables. For simplicity, we will assume D = Rn .
The graph of f : Rn → R consists of all points (x1 , x2 , ...xn , xn+1 ) ∈ Rn+1 for which
xn+1 = f (x1 , x2 , ..., xn ). It corresponds to what is known as a hypersurface in Rn+1 . A
hypersurface in Rn+1 is an n-dimensional subset of Rn+1 which is generally curved.
If f is a linear function of the form f (x1 , x2 , ..., xn ) = a1 x1 + a2 x2 + ... + an xn , the hypersurface
xn+1 = a1 x1 + a2 x2 + ... + an xn
is an n-dimensional hyperplane in Rn+1 through the origin. If f is an affine function of the
form f (x1 , x2 , ..., xn ) = a1 x1 + a2 x2 + ... + an xn + c, the hypersurface
xn+1 = a1 x1 + a2 x2 + ... + an xn + c
is an n-dimensional hyperplane in Rn+1 which does not contain the origin (unless c is
zero). In other words, linear and affine functions f : Rn → R produce graphs which are
n-dimensional flats in Rn+1 . General functions f : Rn → R produce graphs which are
curved.
31.2 Tangent hyperplanes
By identical reasoning to that of the case n = 2, the tangent hyperplane to the hypersurface
xn+1 = f (x1 , x2 , ..., xn ) in Rn+1 at the point (a1 , a2 , ..., an , f (a1 , a2 , ..., an )) is described by
the Cartesian equation
xn+1 − f (a1 , a2 , ..., an ) = fx1 (a1 , a2 , ..., an )(x1 − a1 ) + · · · + fxn (a1 , a2 , ..., an )(xn − an ).
A vector n perpendicular to this hyperplane can be identified by looking at the coefficients
of x1 , x2 , ..., xn , xn+1 . It is given by
\[
n = \begin{pmatrix} f_{x_1}(a_1, a_2, \ldots, a_n) \\ f_{x_2}(a_1, a_2, \ldots, a_n) \\ \vdots \\ f_{x_n}(a_1, a_2, \ldots, a_n) \\ -1 \end{pmatrix}.
\]
A vector parametric equation can be derived from the Cartesian equation via the Gaussian
elimination method.
Example 31.2.1 Find a Cartesian and a vector parametric equation in R4 for the tangent
hyperplane to the hypersurface x4 = x1 2 + x2 2 + x3 2 at the point (1, 2, 3, 14).
The partial derivatives are fx1 = 2x1 , fx2 = 2x2 , fx3 = 2x3 , so at (1, 2, 3) they become
fx1 = 2, fx2 = 4, fx3 = 6. A Cartesian description in R4 for the 3-dimensional tangent
hyperplane to the 3-dimensional hypersurface x4 = x21 + x22 + x23 is given by
x4 − 14 = 2(x1 − 1) + 4(x2 − 2) + 6(x3 − 3),
which yields
2x1 + 4x2 + 6x3 − x4 = 14.
A vector parametric description can be obtained by Gaussian elimination using the augmented matrix (2 4 6 − 1 | 14). Equivalently, to avoid fractions, we can treat x4 as the
leading variable and regard x1 = s, x2 = t, x3 = λ as free parameters. Solving for x4 , we
get x4 = 2s + 4t + 6λ − 14; i.e.,
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ -14 \end{pmatrix}
+ s \begin{pmatrix} 1 \\ 0 \\ 0 \\ 2 \end{pmatrix}
+ t \begin{pmatrix} 0 \\ 1 \\ 0 \\ 4 \end{pmatrix}
+ \lambda \begin{pmatrix} 0 \\ 0 \\ 1 \\ 6 \end{pmatrix}.
\]
Note that the three
 direction vectors are
fx1 (1, 2, 3)
2
fx2 (1, 2, 3)  4 
  
n=
fx3 (1, 2, 3) =  6  to the hyperplane.
−1
−1
31.3
perpendicular
to
the
normal
Stationary points
A stationary point of a function f : Rn → R is a point where all partial derivatives of f
become zero. At any such point, the Cartesian equation of the tangent hyperplane reduces
to xn+1 − f (a1 , a2 , ..., an ) = 0. When n = 2, the tangent hyperplane is just a regular plane
in R3 , and the equation x3 − f (a1 , a2 ) = 0 implies that this plane is horizontal. Similarly,
when n = 1, the tangent hyperplane is just a line in R2 , and the equation x2 − f (a1 ) = 0
implies that this line is horizontal. For n > 2, the word “horizontal” may not have the
same meaning but is still convenient to use. So we will continue using it.
In order to find the stationary points of f : Rn → R, we need to solve a set of n equations
(generally, non-linear), obtained by setting all the partial derivatives of f to zero. A simple
example follows, but several more examples and exercises can be found in our calculus
textbook. Note that solving simultaneous systems of non-linear equations is significantly
harder than solving linear systems, so some practice may be needed.
Example 31.3.1 Find the stationary points of the function f : R3 → R, given by
\[
f(x_1, x_2, x_3) = x_1^3 + x_1 x_2 - x_2 x_3.
\]
We have
\[
\begin{cases}
f_{x_1} = 3x_1^2 + x_2 = 0 \\
f_{x_2} = x_1 - x_3 = 0 \\
f_{x_3} = -x_2 = 0.
\end{cases}
\]
The last equation implies that x2 = 0. Then, the first equation implies that x1 = 0, which
also makes x3 = 0 on the basis of the second equation. So only the origin (0, 0, 0) is a
stationary point.
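As a side note (not part of the notes), simultaneous systems like this can also be solved symbolically. The sketch below, assuming SymPy, recovers the stationary point of Example 31.3.1:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**3 + x1*x2 - x2*x3

# set all partial derivatives equal to zero and solve the (generally non-linear) system
equations = [sp.diff(f, v) for v in (x1, x2, x3)]
print(equations)                           # [3*x1**2 + x2, x1 - x3, -x2]
print(sp.solve(equations, (x1, x2, x3)))   # [(0, 0, 0)]: only the origin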
31.4 Contours, gradient and directional derivatives
Consider the graph of a function f : Rn → R. As already discussed, this is an n-dimensional hypersurface in Rn+1 described by the equation xn+1 = f (x1 , x2 , ..., xn ). A
contour corresponds to the intersection of this hypersurface with the n-dimensional “horizontal” hyperplane xn+1 = c. In Rn+1 , this contour is described by the set of equations
xn+1 = f (x1 , x2 , ..., xn ) and xn+1 = c. Two equations in Rn+1 eliminate two degrees of freedom and hence imply that this contour is an (n − 1)-dimensional object. The same contour
is described in the n-dimensional x1 x2 ...xn -space by the single equation c = f (x1 , x2 , ..., xn ),
obtained by eliminating xn+1 from the set of equations xn+1 = f (x1 , x2 , ..., xn ) and xn+1 =
c.
The gradient ∇f of f : Rn → R is a vector contained in the n-dimensional x1 x2 ...xn space. In this sense, it is a “horizontal” vector. This vector is normal to the contour
c = f (x1 , x2 , ..., xn ) and points in the direction of increasing contour-values.
The directional derivative in the ‘horizontal’ direction u (i.e., a direction contained in
the n-dimensional x1 x2 ...xn -space) is defined by
\[
f_u = \left\langle \nabla f, \frac{u}{\|u\|} \right\rangle.
\]
It generalises the concept of the partial derivative of f and gives the rate of change of f in
the direction u. Exercise 31.8.1 provides a review of all these concepts.
31.5 Vector-valued functions
For completeness, let us also consider functions whose domain and codomain are both multidimensional spaces. We will not study tangent spaces, contours and directional derivatives
for vector-valued functions but we will study the differentiation of such functions by the
chain rule, as this arises frequently in practical applications.
Let D be a subset of Rn . A function f : D → Rm is a rule that assigns to each vector
\[
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in D
\quad\text{a unique vector}\quad
\begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix} \in R^m.
\]
Any such function f is called a vector-valued function. The m real-valued functions
f1 (x1 , x2 , ..., xn ), f2 (x1 , x2 , ..., xn ), ..., fm (x1 , x2 , ..., xn ) are called the component functions of f . The domain D will be assumed equal to Rn for simplicity.
Let (x1 , x2 , ..., xn , xn+1 , xn+2 , ..., xn+m ) be a Cartesian coordinate system for Rn+m . The n
coordinates (x1 , x2 , ..., xn ) correspond to the independent variables in the domain of f . The
m coordinates (xn+1 , xn+2 , ..., xn+m ) correspond to the dependent variables in the codomain
of f .
The graph of f : Rn → Rm consists of all points (x1 , x2 , ..., xn , xn+1 , xn+2 , ..., xn+m ) ∈ Rn+m
for which
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix}.
\]
Since we have m independent equations for (n + m) variables, the graph of f is an n-dimensional surface in Rn+m . This is consistent with the fact that there are n independent
variables in the domain of f , so the graph of f is an n-dimensional object in Rn+m .
Example 31.5.1 Interpret geometrically the graph of f : R2 → R3 whose component
functions are given by f1 (x1 , x2 ) = 4x1 +x2 , f2 (x1 , x2 ) = x1 x2 and f3 (x1 , x2 ) = x1 2 +x2 2 +1.
The graph of this function is in R5 . The coordinates (x1 , x2 ) correspond to the domain of f
and the coordinates (x3 , x4 , x5 ) correspond to the codomain of f . The graph of f : R2 → R3
consists of all points (x1 , x2 , x3 , x4 , x5 ) ∈ R5 for which
\[
\begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 4x_1 + x_2 \\ x_1 x_2 \\ x_1^2 + x_2^2 + 1 \end{pmatrix}.
\]
Each of these equations describes a 4-dimensional hypersurface in R5 , and the graph of f is
the intersection of these three hypersurfaces. Since there are three independent equations
for five variables, the graph of f is a 2-dimensional surface in R5 . This is in agreement with
the fact that there are two independent variables in the domain of f .
The graph of a linear vector-valued function f : Rn → Rm has the form
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},
\]
where each aij ∈ R. This graph corresponds to an n-dimensional flat in Rn+m that contains
the origin.
Similarly, the graph of an affine vector-valued function f : Rn → Rm can be expressed in
the matrix form
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \\ \vdots \\ x_{n+m} \end{pmatrix}
= \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
+ \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix},
\]
where each aij ∈ R and each ck ∈ R. This graph corresponds to an n-dimensional flat in
Rn+m that does not contain the origin unless each ck = 0.
The derivative of f : Rn → Rm at a general point x ∈ Rn is defined as the matrix
\[
f'(x) = \begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(x) & \dfrac{\partial f_1}{\partial x_2}(x) & \ldots & \dfrac{\partial f_1}{\partial x_n}(x) \\[2mm]
\dfrac{\partial f_2}{\partial x_1}(x) & \dfrac{\partial f_2}{\partial x_2}(x) & \ldots & \dfrac{\partial f_2}{\partial x_n}(x) \\[2mm]
\vdots & & & \vdots \\[2mm]
\dfrac{\partial f_m}{\partial x_1}(x) & \dfrac{\partial f_m}{\partial x_2}(x) & \ldots & \dfrac{\partial f_m}{\partial x_n}(x)
\end{pmatrix}.
\]
In other words, row i (where 1 ≤ i ≤ m) consists of the n partial derivatives of the
component function fi (x). Note that this expression reduces to the derivative f 0 (x) =
(∇f (x))T of a scalar-valued function f : Rn → R if we set m = 1.
31.6 The general chain rule
We are now in a position to extend the chain rule to compositions of vector-valued functions. The rule is called general because vector-valued functions incorporate all the functions encountered so far in this course. Let us begin by recalling the chain rule involving
the composition f ◦ g of two scalar-valued functions f : R → R and g : R → R of a
single variable. Let f be given by y = f (x) and g be given by x = g(t). Then, by
the chain rule, the derivative dy(t)/dt of the composite function y(t) = f (g(t)) is equal to
\[
\frac{dy(t)}{dt} = \frac{df(x)}{dx}\bigg|_{x=g(t)} \frac{dg(t)}{dt}.
\]
This result can be expressed in a clearer, adapted, notation as
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt},
\]
with the understanding that dy(x)/dx is evaluated at x = g(t).
Example 31.6.1 In an adapted notation, let y(x) = x^3 and x(t) = sin(t). The composite function y(t) = y(x(t)) is given by y(t) = sin^3 (t). By applying the chain rule
dy(t)/dt = (dy(x)/dx)(dx(t)/dt), we find that
\[
\frac{dy(t)}{dt} = 3x^2 \cos(t) = 3\sin^2(t) \cos(t).
\]
Let us consider the composition f ◦ g of a scalar-valued function f : Rn → R of n variables
and a vector-valued function g : R → Rn consisting of n component functions of a single
variable.
Let f be given by y = f (x) and g be given by x = g(t). Then, the composite function
y(t) = f (g(t)) is a scalar-valued function of a single variable. Its derivative dy(t)/dt is equal
to the matrix product
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt}.
\]
As we discussed previously, dy(x)/dx is a 1 × n row vector and dx(t)/dt is an n × 1 column
vector.
Example 31.6.2 Let y(x) = x1^2 + x2 x3 + x3 be a scalar-valued function of three variables
and let x(t) = (t^2 + t, 3t + 1, t^4 − 5)^T be a vector-valued function consisting of three component
functions of a single variable. Then, the composite function y(t) is a scalar-valued function
of a single variable given by
\[
y(t) = (t^2 + t)^2 + (3t + 1)(t^4 - 5) + (t^4 - 5).
\]
The chain rule dy(t)/dt = (dy(x)/dx)(dx(t)/dt) gives
\[
\frac{dy(t)}{dt} = \begin{pmatrix} 2x_1 & x_3 & x_2 + 1 \end{pmatrix} \begin{pmatrix} 2t + 1 \\ 3 \\ 4t^3 \end{pmatrix}.
\]
Performing the matrix multiplication and expressing the answer in terms of t we find that
\[
\frac{dy(t)}{dt} = 2(t^2 + t)(2t + 1) + (t^4 - 5)(3) + [(3t + 1) + 1](4t^3).
\]
You may confirm this result by differentiating the composite function y(t) = (t2 + t)2 +
(3t + 1)(t4 − 5) + (t4 − 5) directly.
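As an aside (not in the notes), this chain-rule calculation can be verified symbolically. The sketch below, assuming SymPy, forms the matrix product of Example 31.6.2 and compares it with the derivative of the composite function computed directly:

import sympy as sp

x1, x2, x3, t = sp.symbols('x1 x2 x3 t')

y = x1**2 + x2*x3 + x3                        # scalar-valued function of three variables
x = sp.Matrix([t**2 + t, 3*t + 1, t**4 - 5])  # vector-valued function of t

dy_dx = sp.Matrix([[sp.diff(y, v) for v in (x1, x2, x3)]])   # 1 x 3 row vector
dx_dt = sp.diff(x, t)                                        # 3 x 1 column vector

subs = {x1: x[0], x2: x[1], x3: x[2]}         # evaluate dy/dx at x = x(t)
chain = sp.expand((dy_dx.subs(subs) * dx_dt)[0])

direct = sp.expand(sp.diff(y.subs(subs), t))  # differentiate the composite directly
print(sp.simplify(chain - direct) == 0)       # True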
Finally, let us consider the composition f ◦ g of a vector-valued function
f : Rn → Rm consisting of m component functions of n variables and a vector-valued
function g : Rk → Rn consisting of n component functions of k variables. Let f be given
by y = f (x) and g be given by x = g(t). Then, the composite function y(t) = f (g(t)) is a
vector-valued function consisting of m component functions of k variables. Its derivative
dy(t)/dt is equal to the matrix product
\[
\frac{dy(t)}{dt} = \frac{dy(x)}{dx} \frac{dx(t)}{dt},
\]
where dy(x)/dx is an m × n matrix and dx(t)/dt is an n × k matrix.
Example 31.6.3  Let

    y(x) = [ x1 x4   ]
           [ x2 + x3 ]

be a vector-valued function consisting of two component functions of four variables and let

           [ t1 + t3 ]
    x(t) = [  t1 t2  ]
           [  t3^2   ]
           [   t2    ]

be a vector-valued function consisting of four component functions of three variables.
Then, the composite function y(t) = y(x(t)) is a vector-valued function consisting of 2
component functions of 3 variables given by

    y(t) = [ (t1 + t3) t2 ]
           [ t1 t2 + t3^2 ] .

Applying the chain rule dy(t)/dt = (dy(x)/dx)(dx(t)/dt) we obtain

                                     [ 1    0    1   ]
    dy(t)/dt = [ x4  0  0  x1 ]      [ t2   t1   0   ]
               [ 0   1  1  0  ]      [ 0    0    2t3 ]
                                     [ 0    1    0   ] .

Performing the matrix multiplication and expressing the answer in terms of t we find that

    dy(t)/dt = [ t2   t1 + t3   t2  ]
               [ t2   t1        2t3 ] .

You may confirm this result by calculating the derivative of the composite function
y(t) = ( (t1 + t3) t2 ,  t1 t2 + t3^2 )^T directly.
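As a hedged numerical check of Example 31.6.3 (the probe point t = (1, 2, 3), the helper `jacobian` and the step size are illustrative choices, not part of the notes), one can compare the product of the two derivative matrices with a finite-difference derivative of the composite function.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the derivative matrix of f at x."""
    x = np.asarray(x, dtype=float)
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J

# g : R^3 -> R^4 and f : R^4 -> R^2 from Example 31.6.3
g = lambda t: np.array([t[0] + t[2], t[0] * t[1], t[2] ** 2, t[1]])
f = lambda x: np.array([x[0] * x[3], x[1] + x[2]])

t = np.array([1.0, 2.0, 3.0])
product = jacobian(f, g(t)) @ jacobian(g, t)     # (dy/dx at x = g(t)) times (dx/dt)
direct  = jacobian(lambda s: f(g(s)), t)         # derivative of the composite function
print(np.allclose(product, direct))              # expected: True
```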
31.7  Adapting the chain rule
Sometimes it becomes necessary to adapt the chain rule, as in the following two examples.

Example 31.7.1  The length x1, the width x2 and the height x3 of a rectangular box are
all functions of time according to x1(t) = 2t, x2(t) = t^2 and x3(t) = t. Find the rate of
change of the volume as a function of time.

The volume of the box is a scalar-valued function of three variables given by
V(x) = x1 x2 x3. As a function of t, the volume V is expressed by the composite function
V(t) = V(x(t)) = (2t)(t^2)(t) = 2t^4. Differentiating this expression directly, we find that
the rate of change of the volume is dV(t)/dt = 8t^3. Alternatively, using the chain rule
dV(t)/dt = (dV(x)/dx)(dx(t)/dt) we obtain the expression

                                          [ 2  ]
    dV(t)/dt = [ x2 x3   x1 x3   x1 x2 ]  [ 2t ] .
                                          [ 1  ]

Multiplying the matrices and expressing the answer in terms of t we confirm that

    dV(t)/dt = 2(t^2)(t) + 2t(2t)(t) + (2t)(t^2) = 8t^3.
Example 31.7.2  Let y = f(x1, x2, x3) be a scalar-valued function of three variables
and suppose that x2 and x3 both depend on x1 via some functions x2 = g(x1) and
x3 = h(x1). Adapt the chain rule in order to express the derivative dy(x1)/dx1 of the
function y(x1) = f(x1, g(x1), h(x1)) as a function of x1.

Let us adapt the chain rule to this case by letting x = (x1, x2, x3)^T and considering x as
a vector-valued function of the single variable x1 according to

            [   x1   ]
    x(x1) = [ g(x1)  ] .
            [ h(x1)  ]

Then, the function y(x1) = f(x1, g(x1), h(x1)) can be regarded as the composition
y(x1) = f(x(x1)). The chain rule dy(x1)/dx1 = (df(x)/dx)(dx(x1)/dx1) gives us the
matrix product

                                                       [     1       ]
    dy(x1)/dx1 = [ ∂f(x)/∂x1   ∂f(x)/∂x2   ∂f(x)/∂x3 ] [ dg(x1)/dx1  ] ,
                                                       [ dh(x1)/dx1  ]

which can be written as

    dy(x1)/dx1 = ∂f(x)/∂x1 + (∂f(x)/∂x2)(dg(x1)/dx1) + (∂f(x)/∂x3)(dh(x1)/dx1).

This result can be expressed in the alternative (adapted) notation

    dy(x1)/dx1 = ∂y(x)/∂x1 + (∂y(x)/∂x2)(dx2(x1)/dx1) + (∂y(x)/∂x3)(dx3(x1)/dx1).

The use of the ordinary derivative on the left hand side makes clear that we regard y as a
function of x1 alone, via the composite function f(x(x1)). Similarly, the use of the partial
derivatives on the right hand side makes clear that we regard y as a function of all three
variables x1, x2, x3, via the original function f(x).

Another way of arriving at the same result (which is less formal than the previous) is by
realising that the expression y(x1) = f(x1, g(x1), h(x1)) implies that y responds to changes
of x1 via three different channels: (i) directly, (ii) via x2 and (iii) via x3.
Hence the total response of y to changes of x1 is obtained by combining these three
contributions according to

    dy(x1)/dx1 = ∂f(x)/∂x1 + (∂f(x)/∂x2)(dg(x1)/dx1) + (∂f(x)/∂x3)(dh(x1)/dx1).
31.8  Exercises for self study
Exercise 31.8.1  Consider the three-dimensional hypersurface in R^4 described by the
Cartesian equation

    x4 = f(x1, x2, x3) = x1^2 + x2^2 + x3^2.

(a) Slice this hypersurface by the ‘horizontal’ hyperplane in R^4 described by the equation
x4 = 14 and obtain a Cartesian description for the resulting two-dimensional contour,
regarded as a geometrical object in R^4.
Also consider the point (1, 2, 3, 14) ∈ R^4 that lies on this contour.
(b) Find the gradient at the point (1, 2, 3) on this contour (where now the contour is
regarded as a geometrical object in the ‘horizontal’ x1 x2 x3 -space) and write down the
Cartesian equation of the plane tangent to this contour at (1, 2, 3).
(c) Calculate the directional derivative fu(1, 2, 3) in the ‘horizontal’ direction
u = (1, 1, 1)^T by using the formula based on the scalar product.
(d) Finally, repeat the calculation of fu(1, 2, 3) by using the definition of the directional
derivative as a limit. Confirm that you get the same answer.
Exercise 31.8.2  (a) The length and width of a rectangle decrease at the rate of 2cm per
minute and 3cm per minute respectively. When the length of the rectangle is 6m and the
width is 3m, how fast is the area changing?
(b) Suppose that z = f(x, y) is a function of two variables x and y, and that y depends on
x via the function y = g(x). Write down an expression for the ordinary derivative dz/dx
in terms of the partial derivatives of f and the ordinary derivative of g.
Exercise 31.8.3  Let f(x, y, z) = 3x^2 + 2y^2 − z^2.
(a) Find ∂f/∂x, ∂f/∂y and ∂f/∂z.
(b) Obtain a Cartesian equation in R^4 for the tangent hyperplane to the hypersurface
u = 3x^2 + 2y^2 − z^2 at (1, 1, 1, 4).
(c) Obtain in the context of the xyz-space a Cartesian equation for the plane that is
tangent to the surface

    3x^2 + 2y^2 − z^2 = 13

at the point (2, −1, −1).
Write down its normal vector as a 3 × 1 vector in the context of the xyz-space and also as
a 4 × 1 ‘horizontal’ vector in the context of R^4.
Exercise 31.8.4  (a) The function f : R^3 → R^2 with component functions f1 and f2 is
defined by

    u = f1(x, y, z) = x^2 + y^2 + z^2;    v = f2(x, y, z) = x − y.

Find all the points x = (x, y, z)^T such that f(x) = (8, 0)^T, and describe the curve
consisting of these points.
(b) Write down the derivative f'(x) of f.
31.9  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 3.8, 5.1, 5.2, 5.3 and 5.4 of our Calculus Textbook are relevant.
32  Multivariate calculus, 4 of 5

32.1  The second derivative of a function
The second derivative d²f(x)/dx² of a function f : R^n → R is defined by taking the
derivative of the (transpose of the) derivative of f(x); that is, the derivative of the
gradient ∇f(x):

    d²f(x)/dx² = d/dx [ (df(x)/dx)^T ] = d/dx [ ∇f(x) ].

This definition results in a symmetric n × n matrix whose entries are the second order
partial derivatives of f(x). The common notation for the second derivative of f(x) is
f''(x).
Example 32.1.1  Find the second derivative f''(x) of the function f : R^3 → R given by
f(x) = 2 x1 x2 + x3^4.

The first derivative f'(x) is the 1 × 3 row vector given by

    f'(x) = [ ∂f(x)/∂x1   ∂f(x)/∂x2   ∂f(x)/∂x3 ].

Applying this definition to the given function f(x) = 2 x1 x2 + x3^4 we find that

    f'(x) = [ 2x2   2x1   4x3^3 ].

Transposing the derivative of f we obtain the gradient of f, which is the 3 × 1 column
vector

                         [ ∂f(x)/∂x1 ]   [  2x2  ]
    ∇f(x) = (f'(x))^T =  [ ∂f(x)/∂x2 ] = [  2x1  ] .
                         [ ∂f(x)/∂x3 ]   [ 4x3^3 ]
We can now regard the gradient ∇f(x) as a vector-valued function consisting of three
component functions of three variables. Hence, its derivative (which is the second
derivative of f(x)) is given by the 3 × 3 matrix

    d²f(x)/dx² = d/dx [ ∇f(x) ]

                   [ ∂²f/∂x1²      ∂²f/∂x2∂x1   ∂²f/∂x3∂x1 ]   [ 0   2   0      ]
                 = [ ∂²f/∂x1∂x2    ∂²f/∂x2²     ∂²f/∂x3∂x2 ] = [ 2   0   0      ] .
                   [ ∂²f/∂x1∂x3    ∂²f/∂x2∂x3   ∂²f/∂x3²   ]   [ 0   0   12x3^2 ]

The reason that this matrix is symmetric is that the partial derivatives of f(x) commute;
that is, for all i and j in the set {1, 2, 3}, we have that ∂²f(x)/∂xi∂xj = ∂²f(x)/∂xj∂xi.
This will always be the case for the functions considered in this course. Functions which
do not have this property do exist but will not be covered.
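As a hedged numerical cross-check of Example 32.1.1 (the helper `hessian`, the step size and the evaluation point are illustrative choices, not part of the notes), the second derivative can be approximated by central differences; the computed matrix should be symmetric up to rounding error.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Approximate the n x n matrix of second-order partial derivatives of f at x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: 2 * x[0] * x[1] + x[2] ** 4      # the function of Example 32.1.1
print(hessian(f, [1.0, 1.0, 1.0]))             # close to [[0,2,0],[2,0,0],[0,0,12]]
```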
32.2  Taylor polynomial for a scalar-valued function
Recall that the Taylor polynomial P2(x) of degree two for a twice-differentiable function
f : R → R about a point a ∈ R is given by

    P2(x) = f(a) + f'(a)(x − a) + (1/2) f''(a)(x − a)^2.

The graph of P2(x) approximates the graph of the function f(x) near the point a in the
sense that P2(a) = f(a), P2'(a) = f'(a) and P2''(a) = f''(a).

Having now defined the first and the second derivative of a twice-differentiable
scalar-valued function f : R^n → R of n variables, it is straightforward to proceed to the
corresponding second-order Taylor polynomial.

If the point of expansion is a = (a1, a2, ..., an)^T ∈ R^n, the Taylor polynomial P2 of f
about a is given by

    P2(x) = f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a).

Notice that f(a) is a scalar, the first derivative f'(a) is a 1 × n row vector, the difference
x − a is an n × 1 column vector and the second derivative f''(a) is an n × n matrix. The
order of the matrix multiplications is such that every term in the Taylor polynomial P2(x)
is a scalar.

Example 32.2.1  Find the Taylor polynomial P2(x) about a = (3, 2, 1)^T ∈ R^3 for the
function f : R^3 → R given by f(x) = 2 x1 x2 + x3^4.
The first derivative is given by

    df(x)/dx = [ 2x2   2x1   4x3^3 ],    so    f'(a) = [ 4   6   4 ].

The second derivative is given by

                  [ 0   2   0      ]                     [ 0   2   0  ]
    d²f(x)/dx² =  [ 2   0   0      ] ,   so   f''(a) =   [ 2   0   0  ] .
                  [ 0   0   12x3^2 ]                     [ 0   0   12 ]

Finally, we have that f(a) = 13. Hence, the Taylor polynomial P2(x) is given by

                             [ x1 − 3 ]                                          [ 0  2  0  ] [ x1 − 3 ]
    P2(x) = 13 + [ 4  6  4 ] [ x2 − 2 ] + (1/2) [ x1 − 3   x2 − 2   x3 − 1 ]     [ 2  0  0  ] [ x2 − 2 ] ,
                             [ x3 − 1 ]                                          [ 0  0  12 ] [ x3 − 1 ]

which simplifies to

    P2(x) = 2 x1 x2 + 3 − 8 x3 + 6 x3^2.

The graph of P2(x) in R^4 approximates the graph of f(x) near the point (3, 2, 1, 13) ∈ R^4
in the sense that P2(x) and f(x) agree at a, their first derivatives P2'(x) and f'(x) agree at
a and their second derivatives P2''(x) and f''(x) agree at a. You may find verifying this
fact useful.
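The agreement between f and P2 near the point of expansion can also be inspected numerically. The sketch below simply transcribes the quantities of Example 32.2.1; the probe point near a is an arbitrary illustrative choice.

```python
import numpy as np

a      = np.array([3.0, 2.0, 1.0])
f      = lambda x: 2 * x[0] * x[1] + x[2] ** 4
grad_a = np.array([4.0, 6.0, 4.0])                        # f'(a) as a row vector
hess_a = np.array([[0, 2, 0], [2, 0, 0], [0, 0, 12.0]])   # f''(a)

def P2(x):
    d = x - a
    return f(a) + grad_a @ d + 0.5 * d @ hess_a @ d

x = a + np.array([0.01, -0.02, 0.015])
print(f(x), P2(x))    # the two values agree to high accuracy near a
```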
32.3  Classification of stationary points based on the Taylor polynomial P2
The key result established for differentiable functions of a single variable still holds: a
differentiable function f : R^n → R can have a local extremum only at a stationary point
a ∈ R^n; that is, a point where the derivative f'(a) is equal to the 1 × n zero vector. The
definitions of local extrema and strict local extrema are extended to R^n in a
straightforward way. For example, a local maximum of f : R^n → R is a point a ∈ R^n
with the property that f(a) ≥ f(x) for all x sufficiently near a. If a stationary point a of
f : R^n → R is neither a local maximum nor a local minimum, it is called a saddle point.

In order to classify a stationary point a of a twice-differentiable function f : R^n → R we
can consider the Taylor polynomial approximation of the function f about a:

    f(x) ≈ f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a).

Since f'(a) = 0 at the stationary point a, we find that

    f(x) ≈ f(a) + (1/2)(x − a)^T f''(a)(x − a).
This implies that the scalar difference f(x) − f(a) is given approximately by

    f(x) − f(a) ≈ (1/2)(x − a)^T f''(a)(x − a)    for all x near a.

We therefore deduce that the stationary point a is:

(i) a local minimum (in fact, a strict local minimum) if
(1/2)(x − a)^T f''(a)(x − a) > 0 for all x near a such that x ≠ a,

(ii) a local maximum (in fact, a strict local maximum) if
(1/2)(x − a)^T f''(a)(x − a) < 0 for all x near a such that x ≠ a.

Note that (x − a)^T f''(a)(x − a) is a quadratic form in the variable ‘x − a’. Hence we
deduce that the point a is

(i) a local minimum (in fact, a strict local minimum) if f''(a) is positive definite,

(ii) a local maximum (in fact, a strict local maximum) if f''(a) is negative definite.

(iii) On the other hand, if f''(a) is indefinite, then a is a saddle point.

It may also happen that f''(a) is none of the above; that is, f''(a) may be positive
semi-definite but not positive definite or it may be negative semi-definite but not negative
definite. Can we conclude that a is a non-strict local minimum in the first case? Similarly,
can we conclude that a is a non-strict local maximum in the second case?

The answer is ‘no’: if none of (i), (ii), (iii) holds, the test based on the quadratic Taylor
polynomial (and hence on the second derivative f''(a)) is inconclusive. This is because if
none of (i), (ii), (iii) holds, the quadratic form (x − a)^T f''(a)(x − a) fails to reproduce the
behaviour of the function f near a. A higher-order Taylor polynomial is needed.
Example 32.3.1  Consider the function f : R^2 → R given by

    f(x1, x2) = −x1^3 + 4 x1 x2 − 2 x2^2 + 1.

Show that (x1, x2) = (0, 0) is a stationary point of f and classify it by considering the
second-order Taylor polynomial of f about (0, 0).

The partial derivatives of f are fx1 = −3x1^2 + 4x2 and fx2 = 4x1 − 4x2. Both partial
derivatives become zero at (0, 0), so (0, 0) is indeed a stationary point.

The second derivative of f is

    f''(x1, x2) = [ −6x1   4  ] ,   which becomes   [ 0   4  ]   when evaluated at (0, 0).
                  [  4    −4  ]                     [ 4  −4  ]

Therefore, the second-order Taylor approximation of f about (0, 0) is

    f(x1, x2) ≈ 1 + (1/2) [ x1   x2 ] [ 0   4 ] [ x1 ]
                                      [ 4  −4 ] [ x2 ] .
In order to deduce the nature of the point (0, 0), we consider the eigenvalues of f''(0, 0).
The characteristic polynomial gives

    det [ −λ      4    ]  =  λ² + 4λ − 16  =  (λ + 2)² − 20,
        [  4   −4 − λ  ]

so the eigenvalues are

    λ1 = −2 + √20 > 0    and    λ2 = −2 − √20 < 0,

and hence f''(0, 0) is indefinite. We conclude that (0, 0) is a saddle point.
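Since f''(0, 0) is a small symmetric matrix, its definiteness can also be read off its eigenvalues with numpy.linalg.eigvalsh. The following is a minimal sketch of the eigenvalue test used above (the helper name and tolerance are illustrative choices), not a general-purpose classifier.

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a stationary point from the eigenvalues of the symmetric matrix H = f''(a)."""
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > tol):
        return "strict local minimum (positive definite)"
    if np.all(lam < -tol):
        return "strict local maximum (negative definite)"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point (indefinite)"
    return "test inconclusive (semi-definite)"

print(classify(np.array([[0.0, 4.0], [4.0, -4.0]])))   # f''(0, 0) of Example 32.3.1: saddle point
```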
Example 32.3.2  Show that the function f of Example 32.3.1 has another stationary
point and use the second-order Taylor approximation about that point to deduce that it is
a local maximum.

The partial derivatives of f are fx1 = −3x1^2 + 4x2 and fx2 = 4x1 − 4x2. To find all the
stationary points of f we require that these derivatives are equal to zero. The equation
fx2 = 0 implies that x1 = x2. Using this relation in the equation fx1 = 0 we find that
x1(−3x1 + 4) = 0. This implies that x1 = 0 or x1 = 4/3. Hence, besides (0, 0), there is a
stationary point at (x1, x2) = (4/3, 4/3).

The second derivative of f is

    f''(x1, x2) = [ −6x1   4  ] ,   which becomes   [ −8   4  ]   when evaluated at (4/3, 4/3).
                  [  4    −4  ]                     [  4  −4  ]

The second-order Taylor approximation of f about (4/3, 4/3) is given by

    f(x1, x2) ≈ f(4/3, 4/3) + (1/2) [ x1 − 4/3   x2 − 4/3 ] [ −8   4 ] [ x1 − 4/3 ]
                                                            [  4  −4 ] [ x2 − 4/3 ] .

The eigenvalues of f''(4/3, 4/3) are found by solving the characteristic polynomial equation

    det [ −8 − λ     4    ]  =  λ² + 12λ + 16  =  (λ + 6)² − 20  =  0,
        [   4     −4 − λ  ]

which gives

    λ1 = −6 + √20 < 0    and    λ2 = −6 − √20 < 0.

Hence f''(4/3, 4/3) is negative definite, so (4/3, 4/3) is a (strict) local maximum.
Example 32.3.3  Consider the function f : R^2 → R given by f(x1, x2) = x1^2 + x2^3.
Classify the stationary point(s) of f.

The partial derivatives of f are fx1 = 2x1 and fx2 = 3x2^2, so (0, 0) is the only stationary
point of f. The second derivative of f is

    f''(x1, x2) = [ 2    0   ] ,   which becomes   [ 2   0 ]   at (0, 0).
                  [ 0   6x2  ]                     [ 0   0 ]

Accordingly, the Taylor polynomial P2 of f about (0, 0) is

    P2(x1, x2) = (1/2) [ x1   x2 ] [ 2   0 ] [ x1 ]
                                   [ 0   0 ] [ x2 ] .

The eigenvalues of f''(0, 0) are λ1 = 2 and λ2 = 0, so f''(0, 0) is positive semi-definite
but not positive definite and the test is inconclusive. We can understand why the test fails
here. Performing the matrix multiplications, we see that

    P2(x1, x2) = x1^2.

Indeed, P2 approximates f near (0, 0) only to second-order degree of accuracy, which is
why the cubic term x2^3 of f does not appear in P2. Observing that for small positive ε,
f(0, ε) = ε^3 > 0 and f(0, −ε) = −ε^3 < 0, it is clear that (0, 0) is a saddle point. Although
it is true that the quadratic form P2(x1, x2) = x1^2 is positive semi-definite and hence
cannot take negative values, this fact cannot be used in order to draw conclusions about
the nature of the stationary point (0, 0).
32.4  Classifying f'' using the principal minors
We saw in the previous subsection that, given a stationary point a of a twice differentiable
function f : R^n → R, a conclusive classification for a arises only if f''(a) is positive
definite, negative definite or indefinite. It is possible to decide whether or not f''(a) is one
of these three types of matrices by using a test based on the so-called principal minors of
f''(a). In many cases, this test is much faster than finding the eigenvalues of f''(a), so let
us present it.

Let a ∈ R^n be a stationary point of a twice-differentiable function f : R^n → R and
consider f''(a). As already discussed, the second derivative of f is the n × n symmetric
matrix

              [ fx1x1(a)   fx1x2(a)   ...   fx1xn(a) ]
              [ fx2x1(a)   fx2x2(a)   ...   fx2xn(a) ]
    f''(a) =  [    ...        ...     ...      ...   ] .
              [ fxnx1(a)   fxnx2(a)   ...   fxnxn(a) ]

The principal minors of f''(a) are the determinants of the top left hand square
sub-matrices of f''(a); that is, they are the following numbers:

    det[ fx1x1(a) ],   det [ fx1x1(a)   fx1x2(a) ] ,   ...,   det f''(a).
                           [ fx2x1(a)   fx2x2(a) ]

Consider any of the above k × k sub-matrices of f''(a); i.e., any of the above principal
minors.

• If k is an even number, the principal minor is referred to as an even principal minor.

• If k is an odd number, the principal minor is referred to as an odd principal minor.
For example, det[ fx1x1(a) ] is an odd principal minor of f''(a) since k = 1, and

    det [ fx1x1(a)   fx1x2(a) ]
        [ fx2x1(a)   fx2x2(a) ]

is an even principal minor of f''(a) since k = 2.

The following results are stated without proof:

(i) The matrix f''(a) is positive definite if and only if all the principal minors of f''(a)
are strictly greater than zero.

(ii) The matrix f''(a) is negative definite if and only if all even principal minors of
f''(a) are strictly greater than zero and all odd principal minors of f''(a) are strictly
less than zero.

(iii) If the principal minors of f''(a) do not follow any of the above two patterns and,
additionally, det(f''(a)) ≠ 0, then f''(a) is indefinite.

If det(f''(a)) = 0, the classification test for f''(a) based on the principal minors of f''(a)
is inconclusive, but f''(a) can still be classified using the eigenvalue test.
Example 32.4.1  Classify the stationary points (0, 0) and (4/3, 4/3) of the function f
presented in Examples 32.3.1 and 32.3.2 by using the test based on the principal minors.

The second derivative of the function f is

    f''(x1, x2) = [ −6x1   4  ] .
                  [  4    −4  ]

At the point (0, 0), f''(0, 0) = [ 0  4 ; 4  −4 ], the odd principal minor of f''(0, 0) is
det(fx1x1(0, 0)) = 0 and the even principal minor of f''(0, 0) is det(f''(0, 0)) = −16. Since
det(f''(0, 0)) ≠ 0, the test is conclusive. The matrix f''(0, 0) is neither positive definite
nor negative definite, hence (0, 0) is a saddle point, in agreement with our findings in
Example 32.3.1.

At the point (4/3, 4/3), f''(4/3, 4/3) = [ −8  4 ; 4  −4 ], the odd principal minor of
f''(4/3, 4/3) is det(fx1x1(4/3, 4/3)) = −8 and the even principal minor is
det(f''(4/3, 4/3)) = 16. Since det(f''(4/3, 4/3)) ≠ 0, the test is conclusive. The matrix
f''(4/3, 4/3) is negative definite, hence (4/3, 4/3) is a (strict) local maximum, in agreement
with our findings in Example 32.3.2.
Example 32.4.2  Classify the following 2 × 2 symmetric matrices using the test based on
the principal minors. Whenever this fails, switch to the eigenvalue test.

    (a) [ 1  1 ],   (b) [ 1  1 ],   (c) [ −2   0 ],   (d) [ −1  0 ],   (e) [ 1  2 ].
        [ 1  3 ]        [ 1  1 ]        [  0  −3 ]        [  0  0 ]        [ 2  1 ]

(a) The odd and even principal minors of [ 1  1 ; 1  3 ] are 1 and 2 respectively. So this is
a positive definite matrix.

(b) The odd and even principal minors of [ 1  1 ; 1  1 ] are 1 and 0 respectively. Since the
determinant is zero, we cannot use the classification based on the principal minors. The
eigenvalues are found by solving

    det [ 1 − λ     1    ]  =  λ² − 2λ  =  λ(λ − 2)  =  0.
        [   1     1 − λ  ]

Hence λ1 = 0 and λ2 = 2, which implies that the matrix is positive semi-definite but not
positive definite.

(c) The odd and even principal minors of [ −2  0 ; 0  −3 ] are −2 and 6 respectively. So
this is a negative definite matrix.

(d) The odd and even principal minors of [ −1  0 ; 0  0 ] are −1 and 0 respectively. Since
the determinant is zero, we cannot use the classification based on the principal minors.
The eigenvalues are λ1 = −1 and λ2 = 0, so this matrix is negative semi-definite but not
negative definite.

(e) The odd and even principal minors of [ 1  2 ; 2  1 ] are 1 and −3 respectively. So this
is an indefinite matrix.
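For the 2 × 2 case the principal-minor test reduces to checking the top-left entry and the determinant. The sketch below (the helper name and tolerance are illustrative choices, not part of the notes) reproduces the pattern of parts (a) to (e), falling back to the eigenvalue test when the determinant vanishes.

```python
import numpy as np

def classify_2x2(A, tol=1e-12):
    """Apply the principal-minor test to a 2 x 2 symmetric matrix A = f''(a)."""
    m1, m2 = A[0, 0], np.linalg.det(A)        # odd and even principal minors
    if abs(m2) < tol:                         # zero determinant: the minor test is inconclusive
        lam = np.linalg.eigvalsh(A)
        if np.all(lam >= -tol):
            return "positive semi-definite (not positive definite)"
        if np.all(lam <= tol):
            return "negative semi-definite (not negative definite)"
        return "indefinite"
    if m1 > 0 and m2 > 0:
        return "positive definite"
    if m1 < 0 and m2 > 0:
        return "negative definite"
    return "indefinite"

for A in ([[1, 1], [1, 3]], [[1, 1], [1, 1]], [[-2, 0], [0, -3]],
          [[-1, 0], [0, 0]], [[1, 2], [2, 1]]):
    print(classify_2x2(np.array(A, dtype=float)))
```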
32.5  Convex sets, convex and concave functions f : R^n → R
Recall that a convex function f : R → R has the property that the line segment between
any two points on its graph lies above or on this graph. A concave function f : R → R
has the property that the line segment between any two points on its graph lies below or
on this graph.

Let us also recall the alternative description of concavity and convexity for functions
f : R → R, relying on the idea of a convex set in R^2. A convex set in R^2 is a set S such
that for any two position vectors x, y in S, the line segment joining x and y lies entirely
in S. Formally, given position vectors x, y in S, the line segment joining x and y is the set
of all position vectors v described by the parametric equation v = x + t(y − x), where the
parameter t ∈ R satisfies 0 ≤ t ≤ 1. Therefore, a set S ⊆ R^2 is convex if for all position
vectors x, y in S and for all t ∈ R such that 0 ≤ t ≤ 1, we have that the position vector
v = x + t(y − x) is also in S.

In that context, a convex function f : R → R is defined as a function with the property
that the set S ⊆ R^2 of position vectors lying above or on the graph of f in R^2 is a convex
set. Also, a concave function f : R → R is defined as a function with the property that
the set S ⊆ R^2 of position vectors lying below or on the graph of f in R^2 is a convex set.

Let us now extend the definition of a convex set from R^2 to any Euclidean space R^k: a
convex set in R^k is a set S such that for any two position vectors x, y in S, the line
segment {v | v = x + t(y − x), t ∈ R, 0 ≤ t ≤ 1} joining x and y lies entirely in S.

A function f : R^n → R is a convex function if the set of points lying above or on the
graph of f in R^(n+1) is a convex set, and f is a concave function if the set of points lying
below or on the graph of f in R^(n+1) is a convex set.
32.6  Convexity and concavity for twice differentiable functions
If a function f : R^n → R is twice differentiable, it is possible to determine whether or not
it is concave or convex by examining its quadratic Taylor polynomial.

Recall that if a differentiable function f : R → R is convex, then all the tangent lines to
its graph in R^2 lie below or on this graph. Similarly, if a differentiable function f : R → R
is concave, then all the tangent lines to its graph in R^2 lie above or on this graph.

By analogy, if a differentiable function f : R^n → R is convex, then all the tangent
hyperplanes to its graph in R^(n+1) lie below or on this graph, and if a differentiable
function f : R^n → R is concave, then all the tangent hyperplanes to its graph in R^(n+1)
lie above or on this graph. Illustrations of these statements when n = 2 can be found in
section 6.4 of our textbook.

Hence, given that the graph of f : R^n → R is described by the Cartesian equation
x_{n+1} = f(x) (where x ∈ R^n) and that the tangent hyperplane at a general point
(a, f(a)) on this graph is described by the Cartesian equation
x_{n+1} = f(a) + f'(a)(x − a), it follows that for all x and a:

    f(x) ≤ f(a) + f'(a)(x − a) if f is concave;

i.e., any tangent hyperplane is above or on the graph, and

    f(x) ≥ f(a) + f'(a)(x − a) if f is convex;

i.e., any tangent hyperplane is below or on the graph.

Assuming, further, that f is twice differentiable, we can use its quadratic Taylor
polynomial about an arbitrary point a,

    f(x) ≈ f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a),

in order to obtain the following equivalent statements: for all x and a,

• f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a) ≤ f(a) + f'(a)(x − a) if f is concave,
and

• f(a) + f'(a)(x − a) + (1/2)(x − a)^T f''(a)(x − a) ≥ f(a) + f'(a)(x − a) if f is convex.

Simplifying, we derive equivalent statements that involve only the quadratic Taylor term.
Specifically, for all x and a,

• (x − a)^T f''(a)(x − a) ≤ 0 if f is concave, and

• (x − a)^T f''(a)(x − a) ≥ 0 if f is convex.

Linking these results to our classification of symmetric matrices, we see that a twice
differentiable function f : R^n → R is

• concave if and only if f''(a) is negative semi-definite for all a ∈ R^n, and
• convex if and only if f''(a) is positive semi-definite for all a ∈ R^n.
The importance of convex and concave functions for optimisation problems is given by
the following theorem, stated without proof:

Theorem 32.6.1  (i) If f''(a) is negative semi-definite for all a ∈ R^n, then a local
maximum of f is automatically a global maximum. (ii) If f''(a) is positive semi-definite
for all a ∈ R^n, then a local minimum of f is automatically a global minimum.
Example 32.6.2  Consider the function f : R^2 → R defined by

    f(x1, x2) = 2x1^2 + 2 x1 x2 + x2^2 − 4x1.

Find its stationary points. Investigate if this function has any global extrema and, if yes,
identify them.

Since f is differentiable in R^2, we know that a local extremum can only appear at a
stationary point of f. In addition, we know that a global extremum is also a local
extremum, so the only candidates for global extrema are the stationary points of f.

In order to find the stationary points of f we set its partial derivatives equal to zero. This
results in the system of equations 4x1 + 2x2 − 4 = 0 and 2x1 + 2x2 = 0. The second
equation implies that x2 = −x1. Hence, the first equation becomes 2x1 − 4 = 0, which
implies that x1 = 2. Therefore, the only stationary point of f is (x1, x2) = (2, −2).

In order to classify this stationary point and also determine whether or not f is concave
or convex, we calculate the second derivative of f. We have

    f''(a) = [ 4   2 ]    for all a ∈ R^2.
             [ 2   2 ]

The odd and even principal minors are 4 and 4. Hence, f''(a) is a positive definite matrix
for all a ∈ R^2.

In particular, f''(2, −2) is a positive definite matrix and therefore the stationary point
(2, −2) is a local minimum (in fact, a strict local minimum). Now, since f''(a) is positive
definite and hence positive semi-definite for all a ∈ R^2, f is a convex function. It follows
that (2, −2) is a global minimum. Moreover, since (2, −2) is the only stationary point of
f, it is actually the unique global minimum. Note that f cannot have a global maximum,
because there is no other stationary point and hence no candidate for a global maximum.
In order to confirm this directly, let us evaluate f at points of the form (0, t). We have
f(0, t) = t^2. Hence, as t → ∞, we see that f(0, t) → ∞. The fact that f grows without
an upper bound implies that f has no global maximum.
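A quick numerical sanity check of Example 32.6.2 is sketched below; the grid, its bounds and the tolerance are arbitrary illustrative choices and the grid sample is only evidence, not a proof. The constant Hessian has positive eigenvalues, and no sampled value of f falls below f(2, −2).

```python
import numpy as np

f = lambda x1, x2: 2 * x1 ** 2 + 2 * x1 * x2 + x2 ** 2 - 4 * x1

# The Hessian is the constant matrix [[4, 2], [2, 2]]; both eigenvalues are positive,
# so f is convex and the stationary point (2, -2) is the global minimum.
print(np.linalg.eigvalsh(np.array([[4.0, 2.0], [2.0, 2.0]])))

# Sample f on a coarse grid and confirm no sampled value is below f(2, -2) = -4.
grid = np.linspace(-10, 10, 201)
values = np.array([[f(a, b) for b in grid] for a in grid])
print(values.min() >= f(2, -2) - 1e-9)   # expected: True
```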
32.7  Exercises for self study
Exercise 32.7.1  Find all the stationary points of the function f : R^2 → R defined by

    f(x, y) = xy^2 + x^2 y − xy.

Does f have any global extrema?
Exercise 32.7.2  For the following function, find all the stationary points and classify
them as local maxima, local minima, or saddle points. Show that f does not have any
global extrema.

    f(x, y) = 1 − y^3 − 3yx^2 − 3y^2 − 3x^2.

Exercise 32.7.3  For each of the following functions, find all the stationary points and
classify them as local maxima, local minima, or saddle points.

(a) f(x, y) = 4xy − x^4 − y^4

(b) f(x, y) = 4x^2 e^y − 2x^4 − e^(4y)

Exercise 32.7.4  Consider the function f : R^2 → R defined by

    f(x, y) = x^2 + 6xy + 6y^2 + 7.

Find its stationary points. Investigate if this function has any global extrema and, if yes,
identify them.
32.8  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 4.6, 4.7, 4.8, 6.1, 6.2, 6.3 and 6.4 of our Calculus Textbook are relevant.
33  Multivariate calculus, 5 of 5
In this section we will focus on constrained optimisation problems; namely, problems
where the function f : R^n → R is optimised on a proper subset D ⊂ R^n defined by
imposing certain constraints on the components of the variable x ∈ R^n.
Methods for dealing with such optimisation problems usually involve optimising f on the
interior of D, then separately optimising f on the boundary of D (provided, of course,
that this boundary is included in D) and finally reaching an overall conclusion based on
these individual optimisation problems. An example of such an approach is presented in
the Practice Questions.
We start by presenting a widely used method for dealing with constrained optimisation
problems, known as Lagrange’s method. This method is applicable to optimisation
problems subject to equality constraints. It can also be applied to problems subject to
inequality constraints provided that the latter can be reduced to equality ones.
33.1  Motivating Lagrange's method
Let us motivate Lagrange's method by using an example. Suppose that our task is to
maximise the sum x1 + x2 of two real numbers x1 and x2 subject to the constraint that the
sum of their squares is equal to 1.

Rephrasing this problem, we want to maximise the so-called objective function
f(x1, x2) = x1 + x2 in the feasible region D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = 1}.
Approaching this problem graphically, the feasible region is a circle of radius 1 centred at
the origin, and each contour f (x1 , x2 ) = c (which determines the set of points (x1 , x2 ) ∈ R2
whose sum is equal to the constant c) is a line of gradient −1.
A few of these contours are illustrated below:
Observing that c4 > c3 > 1 > 0 > −1 > c2 > c1 , we are interested in the largest possible
value of the constant c such that the contour f (x1 , x2 ) = c has points (or point) in common
with the feasible region D. Clearly, the required value is c3 and the point in common is B.
This gives the solution of the maximisation problem.
Suppose now that our task is to minimise the sum x1 + x2 in the feasible region D. Using
the same approach, we see that the required value of c is c2 and the corresponding point
is A, which is the solution to the minimisation problem.
In order to calculate the coordinates of the points A and B, we observe that at each of
these points the contour f(x1, x2) = c is tangent to the constraint curve x1^2 + x2^2 = 1
which defines the region D.

Moreover, if we introduce the function G(x1, x2) = x1^2 + x2^2, then the constraint curve
x1^2 + x2^2 = 1 is simply the contour G(x1, x2) = 1. Thus, recalling that the normal
vectors to the contours f(x1, x2) = c and G(x1, x2) = 1 are the gradient vectors ∇f and
∇G, we obtain the following two conditions that need to be satisfied by the coordinates
(x1, x2) of A and B:

(i) ∇f(x1, x2) = λ∇G(x1, x2),
(ii) G(x1 , x2 ) = 1.
The first condition expresses the fact that the contour f (x1 , x2 ) = c is tangent to the
contour G(x1 , x2 ) = 1 at A and B (and hence ∇f is a scalar multiple of ∇G) and the
second condition expresses the fact that A and B must lie in the feasible region D.
Let us solve this system of equations. Expressing condition (i) in terms of the partial
derivatives of f and G we obtain the equations
fx1 = λGx1 (i.e., 1 = 2λx1) and fx2 = λGx2 (i.e., 1 = 2λx2).
These two equations and the constraint G(x1 , x2 ) = 1 form a system of three equations for
the variables x1 , x2 and λ. We can eliminate λ from the first two equations in order to
obtain an equation involving x1 and x2 alone, and then use this equation in the constraint
G(x1 , x2 ) = 1 in order to find x1 and x2 .
Following this plan, we see that the equations 1 = 2λx1 and 1 = 2λx2 imply that
λ = 1/(2x1) and λ = 1/(2x2). Hence, we have x1 = x2. Using this equation in the
constraint G(x1, x2) = 1 we see that 2x2^2 = 1. This yields two solutions for x2, namely
x2 = 1/√2 and x2 = −1/√2. The corresponding points are (x1, x2) = (1/√2, 1/√2) and
(x1, x2) = (−1/√2, −1/√2). These are respectively the coordinates of the points B and A.
The contours of f where B and A lie are respectively x1 + x2 = √2 and x1 + x2 = −√2.
Therefore, the maximum value of the sum x1 + x2 subject to the constraint
x1^2 + x2^2 = 1 is √2 and the minimum value is −√2.
33.2  Lagrange's method with an equality constraint
The idea of tangency used in the previous section provides the basis for Lagrange’s method.
We will only discuss cases where the optimisation problem can be reduced to a problem
involving a single equality constraint. Optimisation problems subject to multiple equality
and inequality constraints are accompanied by certain complications whose treatment goes
beyond the scope of our course.
Let us introduce Lagrange’s approach using the optimisation problem of the previous section. We define the so-called Lagrangian L(x1 , x2 , λ) by
L(x1 , x2 , λ) = f (x1 , x2 ) + λ(1 − G(x1 , x2 )),
where the functions f and G are those defined in the previous section; namely, f (x1 , x2 ) =
x1 + x2 and G(x1 , x2 ) = x21 + x22 .
Treating the Lagrangian L as a function of three variables, we find its stationary points by
setting all three partial derivatives to zero. We obtain:
Lx1 = fx1 − λGx1 = 0,
Lx2 = fx2 − λGx2 = 0,
Lλ = 1 − G(x1 , x2 ) = 0.
We recognise that the first two equations correspond to the statement
(i) ∇f (x1 , x2 ) = λ∇G(x1 , x2 )
and the last equation gives the constraint
(ii) G(x1 , x2 ) = 1.
We have therefore recovered the system of equations of the previous section. We already
know that the solutions (x1, x2) = (1/√2, 1/√2) and (x1, x2) = (−1/√2, −1/√2) of these
equations are the coordinates of the maximum B and the minimum A, respectively.
33.3  Regarding the form of the Lagrangian
One may ask: Is the Lagrangian function L(x1 , x2 , λ) = f (x1 , x2 ) + λ(1 − G(x1 , x2 )) the
only function that reproduces the tangency condition (i) and the constraint (ii)?
The answer is no. There are many alternative Lagrangian functions that reproduce these
two conditions. For example, f can be replaced in L by a function such as 2f , the term
+λ can be replaced by −λ and the term 1 − G(x1, x2) can be rescaled to become, say,
5 − 5G(x1 , x2 ) or 8G(x1 , x2 ) − 8.
The reason that we have used the conventions f , +λ and 1 − G(x1 , x2 ) (i.e., the constant
comes first, followed by the term involving the variables) will become clear shortly, when
we discuss the interpretation of λ and its relation to f .
Note that in our textbook, the constraint function 1 − G(x1 , x2 ) is denoted by g(x1 , x2 ).
Accordingly, the Lagrangian is written as L(x1 , x2 , λ) = f (x1 , x2 ) + λg(x1 , x2 ) and the
equations derived from the Lagrangian yield the conditions (i) ∇f (x1 , x2 )+λ∇g(x1 , x2 ) = 0
and (ii) g(x1 , x2 ) = 0.
33.4  Regarding the applicability of Lagrange's method
Observation 1: In the example of the previous section, both stationary points of the
Lagrangian gave constrained extrema of the objective function f .
Question 1: Can we claim in general that each stationary point of L gives a constrained
extremum of f ?
The answer is no. As the following example shows, there may be stationary points of L
that do not give constrained extrema of f .
Example 33.4.1 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region sketched below:
The points A, B, C and D are all points where the contour f (x1 , x2 ) = c is tangent to the
contour of the constraint function. Therefore, we expect that all these points will arise as
stationary points of the Lagrangian. Clearly, the point A gives the constrained minimum
of f and the point D gives the constrained maximum. However, points B and C give
neither constrained minima nor constrained maxima.
Observation 2: In Example 33.4.1 as well as in our initial example, the stationary points
of the Lagrangian included among them both the constrained extrema of f .
Question 2: Is it correct to say that if the Lagrangian has stationary points, then points
corresponding to constrained extrema of f can always be found among them?
The answer is again no. In the optimisation problem presented below, the Lagrangian has
two stationary points but no constrained extrema that correspond to them. In fact, this
optimisation problem does not admit any constrained extrema.
Example 33.4.2 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region D = {(x1 , x2 ) ∈ R2 | x1 x2 = 16} :
Clearly, points A and B arise as stationary points of the Lagrangian since the relevant
contours become parallel there. However, as c decreases to −∞ or as c increases to ∞ the
contour f (x1 , x2 ) = c keeps intersecting the feasible region. Hence, there is neither a lower
bound nor an upper bound on c. We deduce that f has no constrained extrema.
Observation 3: In all the examples seen so far, if a constrained extremum of f exists,
then Lagrange’s method is able to find it.
Question 3: Is it correct to say that if a point corresponding to a constrained extremum of
the objective function exists then it always appears as a stationary point of the Lagrangian?
The answer is again no. In the following example, the optimisation problem admits both
a constrained maximum and a constrained minimum but the corresponding points do
not arise as stationary points of the Lagrangian. This is because the Lagrangian is not
differentiable at these points.
Example 33.4.3 Maximise and minimise the objective function f (x1 , x2 ) = x1 + x2 in
the feasible region sketched below:
We see that point A corresponds to the constrained minimum of f and that point B
corresponds to the constrained maximum of f . However, the gradient of the constraint
function is not defined at any of these points, so Lagrange’s method is not applicable.
Question 4: Taking the issue raised in Example 33.4.3 into account, let us modify the
previous question: Provided that the objective function and the constraint function are
both differentiable, can we claim that if a constrained extremum of f exists then the
corresponding point always appears as a stationary point of the Lagrangian?
Strictly speaking, the answer is still no. There is an additional requirement that the
constraint function has to satisfy before we can have a valid claim.
This requirement is known as the constraint qualification. For optimisation problems
that can be reduced to problems subject to a single equality constraint (such as the problems that we will discuss in this course) the constraint qualification requires that the gradient of the constraint function is non-zero at the point where the constrained extremum
arises. Consider the following example:
Example 33.4.4  Maximise and minimise the objective function f(x1, x2) = x1 + x2 in
the feasible region D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = 0}.
Note that D consists only of the origin; i.e., D = {(0, 0)}. Therefore, both the constrained
maximum and the constrained minimum of f occur at (0, 0). However, the gradient

    ∇g = [ 2x1 ]
         [ 2x2 ]

of the constraint function g(x1, x2) = x1^2 + x2^2 vanishes at (0, 0), so the tangency
condition ∇f(0, 0) + λ∇g(0, 0) = 0 produces the inconsistent statement

    [ 1 ]   [ 0 ]
    [ 1 ] = [ 0 ]

and therefore Lagrange's method fails to identify the constrained maximum and the
constrained minimum at (0, 0).
Summarising: After taking all these issues into account, let us finally state what Lagrange’s theorem says when applied to optimisation problems that can be reduced to
problems subject to a single equality constraint:
If
(i) the Lagrangian L is differentiable,
(ii) the constraint qualification is satisfied, and
(iii) a constrained extremum for the optimisation problem exists,
then
the point corresponding to this extremum always appears as a stationary point of
the Lagrangian.
Note that optimisation problems where the Lagrangian is not differentiable or the constraint qualification condition is not satisfied will not be covered in this course. So the
cases illustrated in Examples 33.4.3 and 33.4.4 are not relevant for our exams. However, it
is important to know all the conditions that need to be satisfied before applying Lagrange’s
theorem.
So, for our purposes, Lagrange’s theorem suggests the following approach to solving optimisation problems subject to a single equality constraint:
(i) Establish the existence of an optimal solution to the optimisation problem. This is
usually accomplished by considering the contours of the objective function.
(ii) Find the stationary points of the Lagrangian. By Lagrange’s theorem, these are the
only candidates for the optimal solution.
(iii) Evaluate the objective function at each of these candidates in order to decide where
the optimal solution occurs (keeping in mind that there may be many optimal solutions).
33.5  The Lagrange multiplier
The parameter λ appearing in the Lagrangian L(x1, x2, λ) = f(x1, x2) + λg(x1, x2) is
called the Lagrange multiplier. Let the constraint function g have the form
g(x1, x2) = b − G(x1, x2) for some constant b (recall that putting the constant first is
precisely the convention that we used when we introduced the Lagrangian earlier).

Let (x1*(b), x2*(b), λ*(b)) be a stationary point of the Lagrangian corresponding to a
constrained extremum of the optimisation problem (we are not interested in all the
stationary points of the Lagrangian but only in those corresponding to the constrained
extrema). As the notation suggests, any stationary point of L can be regarded as a
function of b.

Moreover, regard f as a function of b via the composite function

    f(b) = f(x1*(b), x2*(b)).

Note that f(b) is the value of f at the constrained extremum (x1*(b), x2*(b), λ*(b)); in
other words, f(b) is the optimal value of f.

We have the following result, which is stated without proof:

Theorem 33.5.1  The rate of change of f(b) with respect to b is equal to the value of λ
at that particular constrained extremum:

    df(b)/db = λ*(b).
The significance of this result is the following:
If b is increased in the constraint b − G(x1, x2) = 0 by a small amount ∆b (that is, if we
are solving a new optimisation problem where the objective function is f and the
constraint is b + ∆b − G(x1, x2) = 0) then the optimal value of f (associated with the
constrained extremum (x1*(b), x2*(b), λ*(b))) changes by approximately λ*(b)∆b.

Let us illustrate this by considering the following example, which is an extension of the
example presented at the beginning of this week's lecture notes:

Example 33.5.2  Maximise and minimise

    f(x1, x2) = x1 + x2

in the feasible region

    D = {(x1, x2) ∈ R^2 | x1^2 + x2^2 = b},

where b is some given constant.
Introducing the constraint function g(x1, x2) = b − x1^2 − x2^2 and the corresponding
Lagrangian L(x1, x2, λ) = f(x1, x2) + λg(x1, x2), we obtain the following three equations:

    1 − 2λx1 = 0,
    1 − 2λx2 = 0,
    b − x1^2 − x2^2 = 0.

Solving this system in the way explained previously, we obtain the constrained optima

    (x1*(b), x2*(b), λ*(b)) = ( √b/√2,  √b/√2,  1/√(2b) )

and

    (x1*(b), x2*(b), λ*(b)) = ( −√b/√2,  −√b/√2,  −1/√(2b) ).

The first point corresponds to the constrained maximum of f; the second point
corresponds to the constrained minimum of f. The values f(b) = f(x1*(b), x2*(b)) at these
optima are f(b) = √(2b) at the constrained maximum and f(b) = −√(2b) at the
constrained minimum.

At each of these constrained extrema, it is easy to confirm that

    df(b)/db = λ*(b),

where λ*(b) is the corresponding value of λ; that is, λ*(b) = 1/√(2b) at the constrained
maximum and λ*(b) = −1/√(2b) at the constrained minimum.

Hence, if in the context of a new optimisation problem we replace b by b + ∆b, the optimal
value of f at each constrained optimum of the new optimisation problem will
approximately be f(b) + λ*(b)∆b, where f(b) and λ*(b) are the optimal value of f and the
value of λ at the corresponding constrained optimum of the old optimisation problem.

In particular, the value of f will approximately be √(2b) + ∆b/√(2b) at the constrained
maximum of the new optimisation problem and the value of f will approximately be
−√(2b) − ∆b/√(2b) at the constrained minimum of the new optimisation problem.
This is illustrated below:
Hence, Lagrange’s method not only solves the given optimisation problem (whenever of
course the conditions of Lagrange’s theorem are satisfied) but it also carries information
about the optimal value of f subject to a slightly modified constraint. This additional
information is certainly of interest in optimisation problems. For example, in Economics,
one frequently optimises a production function subject to a given budget constraint. Having
found the optimal solution, it is useful to know by how much the optimal value of the
production function will change if one slightly increases or reduces the budget.
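The sensitivity interpretation can be checked directly on Example 33.5.2. The sketch below uses the closed-form optimal value derived above; the base value of b and the step size are arbitrary illustrative choices.

```python
import numpy as np

# Constrained maximum of x1 + x2 subject to x1^2 + x2^2 = b (Example 33.5.2):
# optimal value f(b) = sqrt(2 b) and multiplier lambda*(b) = 1 / sqrt(2 b).
f_opt   = lambda b: np.sqrt(2 * b)
lam_opt = lambda b: 1 / np.sqrt(2 * b)

b, db = 1.0, 1e-6
numerical_slope = (f_opt(b + db) - f_opt(b - db)) / (2 * db)
print(numerical_slope, lam_opt(b))     # both close to 1/sqrt(2), confirming df(b)/db = lambda*(b)
```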
33.6  Exercises for self study
Exercise 33.6.1 Consider the cost function C : R2 → R defined by
C(x, y) = 4x2 + 4y 2 − 2xy − 40x − 140y + 1800
for a firm producing two goods x and y.
(a) Show that C is a convex function on R2 and hence find its global minimum in the
feasible region D ⊂ R2 defined by the inequalities x ≥ 0 and y ≥ 0.
(b) Suppose that the production requirement
x + y ≥ 25
is introduced additionally to x ≥ 0 and y ≥ 0. Find the production levels x and y which
minimise C on the feasible region R ⊂ R2 defined by all three inequalities x ≥ 0, y ≥ 0
and x + y ≥ 25.
(c) Is the Lagrangian function defined by
L(x, y, λ) = C(x, y) + λ(25 − x − y)
suitable for solving the minimisation problem described in part (b)?
Exercise 33.6.2 Consider the cost function C : R2 → R introduced in Exercise 33.6.1
and the feasible region S ⊂ R2 defined by the inequalities
x ≥ 0, y ≥ 0 and x + y ≥ 35.
It is given that C attains a global minimum on S.
(a) Sketch S and explain why the global minimum of C on S must occur on the boundary
of S.
(b) Minimise C on S by eliminating one of the variables x or y using the fact that the
optimal solution occurs on the boundary of S.
(c) Hence, write down a Lagrangian L(x, y, λ) that is suitable for the minimisation of C
on S and use it to verify your answer to part (b).
Exercise 33.6.3  The production function P for a particular manufacturer has the
Cobb-Douglas form

    P(x, y) = 100 x^(3/5) y^(2/5),
where the variables x and y represent labour and capital, respectively. The cost of labour
is 150 pounds per unit and the cost of capital is 250 pounds per unit; i.e., the cost function
is
C(x, y) = 150x + 250y.
(a) Sketch roughly the feasible region D ⊂ R2 defined by x ≥ 0, y ≥ 0 and the requirement
that the total cost of capital and labour cannot exceed 100,000 pounds.
(b) Sketch a few contours of the production function in order to establish the existence of
a point M on the boundary of D corresponding to the constrained maximum of P on D.
(c) Write down a suitable Lagrangian for the maximisation of P on D and use it to find the
coordinates (x∗ , y ∗ ) of the point M and the corresponding value of the Lagrange multiplier
λ∗ . Also find the value of P at M .
(d) Suppose that the total budget for capital and labour is increased by a small amount ε
to (100, 000 + ε) pounds. Determine to first order in ε the maximum value of P subject to
the modified budget constraint.
Exercise 33.6.4 Consider the production function P and the cost function C introduced
in Exercise 33.6.3.
(a) Sketch roughly the feasible region R ⊂ R2 defined by x ≥ 0, y ≥ 0 and the requirement
that the total production cannot be less than 20,000 product units.
(b) Sketch a few contours of the cost function in order to establish the existence of a point
m on the boundary of R corresponding to the constrained minimum of C on R.
(c) Write down a suitable Lagrangian for the minimisation of C on R and use it to find the
coordinates (x∗ , y ∗ ) of the point m and the corresponding value of the Lagrange multiplier
λ∗ . Also find the value of C at m.
(d) Suppose that the total production requirement is increased by a small amount δ to
(20, 000 + δ) units. Determine to first order in δ the minimum value of C subject to the
modified production requirement.
33.7  Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 6.6, 6.7 and 6.8 of our Calculus Textbook are relevant.
34  Differential and difference equations, 1 of 5

34.1  Interest compounding
Consider the following three problems:

Problem 1: An amount P, called the principal, is invested for t years at an annual
interest rate r. Interest is compounded annually. Calculate the final sum.

• After 1 year, the amount is P(1 + r).

• After 2 years, the amount is P(1 + r)^2.

• Hence, after t years, the amount is P(1 + r)^t.

Problem 2: An amount P is invested for t years at an annual interest rate r. Interest is
compounded quarterly. Calculate the final sum.

In this case, the annual interest rate r is divided by 4 and interest is compounded 4 times
a year. Therefore:

• After 1 year, the amount is P(1 + r/4)^4.

• After 2 years, the amount is P(1 + r/4)^8.

• Hence, after t years, the amount is P(1 + r/4)^(4t).
More generally, let an amount P be invested for t years at an annual interest rate r and
let interest be compounded m times a year. Then, the amount after t years is
P(1 + r/m)^(mt).

Problem 3: An amount P is invested for t years at an annual interest rate r. Interest is
compounded continuously. Calculate the final sum.

What we need to calculate here is the limit of P(1 + r/m)^(mt) as m → ∞. In order to do
so, we use the fact that lim_{s→∞} (1 + 1/s)^s = e, where e is the base of the natural
logarithm. This result implies that the amount after 1 year is given by

    lim_{m→∞} P(1 + r/m)^m = lim_{m/r→∞} P[(1 + r/m)^(m/r)]^r = lim_{s→∞} P[(1 + 1/s)^s]^r = P e^r.

Therefore, the amount after t years is P e^(rt).
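The convergence of m-times-a-year compounding towards continuous compounding can be seen numerically; in the sketch below the principal, rate and horizon are arbitrary illustrative values.

```python
import math

P, r, t = 1000.0, 0.05, 10                 # principal, nominal annual rate, years (illustrative)

for m in (1, 4, 12, 365):                  # compounding m times a year
    print(m, P * (1 + r / m) ** (m * t))

print("continuous", P * math.exp(r * t))   # the limit as m grows without bound
```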
Remark 34.1.1  One way of showing that lim_{s→∞} (1 + 1/s)^s = e is by finding the
Taylor series of (1 + x/s)^s about 0 and taking its limit as s → ∞. The resulting
expression can be recognised as the Taylor series of e^x. Evaluating at x = 1 yields the
required result lim_{s→∞} (1 + 1/s)^s = e.
34.2  Nominal and effective interest
The annual rate r used in the calculations above is called the nominal rate. When interest
is not compounded annually, we can calculate the so-called effective annual rate re. This
is the annual rate that would need to be given if the compounding occurred only once a
year. Depending on the type of compounding, we have the following cases:

If interest on a principal P is compounded m times a year, then at the end of the year we
have P(1 + re) = P(1 + r/m)^m, which implies that re = (1 + r/m)^m − 1. This expression
reduces to re = r when m = 1.

If interest on a principal P is compounded continuously, then at the end of the year we
have P(1 + re) = P e^r, which implies that re = e^r − 1.

An example of calculating effective rates is given in the Practice Questions.
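A minimal sketch of the two effective-rate formulas is given below; the nominal rate is an arbitrary illustrative value.

```python
import math

r = 0.08                                     # nominal annual rate (illustrative)
for m in (1, 2, 4, 12):
    print(m, (1 + r / m) ** m - 1)           # effective rate with m compoundings a year
print("continuous", math.exp(r) - 1)         # effective rate under continuous compounding
```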
34.3  Discounting and present value
A future sum of money S is not worth as much as a present sum S, since money available
now can earn interest in the meantime. The process of determining the present value P
of a future sum S is called discounting. The discount rate is the nominal interest rate r
used in the calculation of the present value. Depending on the type of compounding used
in this calculation, we have the following cases:

If interest is compounded m times a year at a discount rate r, then the present value P of
a sum S in t years satisfies the equation P(1 + r/m)^(mt) = S, which implies that

    P = S(1 + r/m)^(−mt).

If interest is compounded continuously at a discount rate r, then the present value P of a
sum S in t years satisfies the equation P e^(rt) = S, which implies that P = S e^(−rt).
Example 34.3.1  Find the present value of 100 pounds to be paid in 20 years, assuming
that the discount rate under continuous compounding is 0.05.

We have P = 100 e^(−(0.05)(20)) = 100 e^(−1) ≈ 36.79.

Example 34.3.2  An antique car is currently worth A pounds. Its estimated value V is
expected to increase according to the formula V = A(1.2)^√t, where t is measured in
years. Assuming that the discount rate under continuous compounding is 0.05, find how
long the investor should keep this car in order to maximise its present value.

The present value P of the car to be sold in t years is given by P = A(1.2)^√t e^(−0.05t).
Since ln is an increasing function, maximising P with respect to t is the same as
maximising ln(P) with respect to t. We have:

    ln(P) = ln(A) + √t ln(1.2) − 0.05t,

so differentiating with respect to t yields

    (1/P) dP/dt = (1/(2√t)) ln(1.2) − 0.05.

For a stationary point, we have dP/dt = 0. This is equivalent to 1/√t = 0.1/ln(1.2), so the
stationary point is t_o = (10 ln(1.2))^2 ≈ 3.32 years.

In order to establish that this is a global maximum, we can investigate how the sign of
dP/dt changes for values of t on the interval (0, ∞). Since P > 0 for all t ∈ (0, ∞), the
sign of dP/dt is the same as the sign of (1/P) dP/dt. In other words, the sign of dP/dt is
the same as the sign of (1/(2√t)) ln(1.2) − 0.05, and we already know that the latter
expression vanishes at the stationary point t_o = (10 ln(1.2))^2 and that it is positive on
the interval (0, t_o) and negative on the interval (t_o, ∞). Hence, the stationary point t_o
corresponds to a global maximum of the function P, and the investment should be kept
for approximately 3.32 years.
34.4  Arithmetic sequences and their partial sums
Consider an arithmetic sequence {ui} with first term a and common difference d. We
have:

    u1 = a,
    u2 = a + d,
    u3 = a + 2d,

which implies that the general term is given by

    un = a + (n − 1)d.

The so-called nth partial sum sn of this sequence is given by adding its first n terms:

    sn = u1 + u2 + ... + un.

In order to find a simple expression for sn we can use the fact that

    2sn = (u1 + u2 + ... + un) + (un + u_(n−1) + ... + u1)
        = (u1 + un) + (u2 + u_(n−1)) + ... + (un + u1)
        = [2a + (n − 1)d] + [2a + (n − 1)d] + ... + [2a + (n − 1)d]
        = n[2a + (n − 1)d].

It follows that the nth partial sum is given by

    sn = (n/2) [2a + (n − 1)d].
Geometric sequences and their partial sums
Consider a geometric sequence {ui } with first term a and common ratio r. We have:
u1 = a,
u2 = ar,
u3 = ar2 ,
which implies that the general term is given by
un = arn−1 .
In order to find a simple expression for the nth partial sum sn of this sequence we can
subtract
sn = a + ar + ar2 + ... + arn−1
from
r sn = ar + ar2 + ... + arn−1 + arn .
Cancelling out equal terms we find that
r sn − sn = arn − a
which yields
sn = a
rn − 1
,
r−1
121
for r 6= 1.
Note that if r = 1, the geometric sequence becomes an arithmetic sequence with first term
a and common difference d = 0. Hence, we have that
sn = na,
if r = 1.
Example 34.5.1  Suppose that a one-off deposit P is made at the beginning of year 1.
In addition, a deposit D is made at the beginning of each subsequent year. The account
is earning interest at a nominal rate r, compounded annually and paid at the end of each
year. Calculate the sum accumulated at the end of t years, just after the interest is paid.

At the end of year 1 the sum is P(1 + r).
At the end of year 2 the sum is [P(1 + r) + D](1 + r) = P(1 + r)^2 + D(1 + r).
Similarly, at the end of year 3 the sum is P(1 + r)^3 + D(1 + r)^2 + D(1 + r).
Hence, at the end of year t the sum is
P(1 + r)^t + D(1 + r)^(t−1) + ... + D(1 + r)^2 + D(1 + r).

Recognising the geometric sequence

    u1 = D(1 + r),
    u2 = D(1 + r)^2,
    ...
    u_(t−1) = D(1 + r)^(t−1),

we can write the sum yt accumulated at the end of t years as

    yt = P(1 + r)^t + s_(t−1),

where the partial sum s_(t−1) = u1 + u2 + ... + u_(t−1) should be evaluated using D(1 + r)
as first term and (1 + r) as common ratio.

Therefore,

    yt = P(1 + r)^t + D(1 + r) [(1 + r)^(t−1) − 1] / [(1 + r) − 1]
       = P(1 + r)^t + (D/r) [(1 + r)^t − (1 + r)]
       = [P + D/r] (1 + r)^t − (D/r)(1 + r),

which is the final simplified expression for the sum accumulated at the end of t years.

Note that the sequence yt can be generated by the rule

    y_(n+1) = (y_n + D)(1 + r)

subject to the initial condition y1 = P(1 + r). This is an example of a difference equation.
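The recursion and the closed form of Example 34.5.1 can be compared numerically; in the sketch below the principal, deposit, rate and horizon are arbitrary illustrative values.

```python
P, D, r, t = 1000.0, 200.0, 0.05, 10

# Iterate the difference equation y_{n+1} = (y_n + D)(1 + r) with y_1 = P(1 + r).
y = P * (1 + r)
for _ in range(t - 1):
    y = (y + D) * (1 + r)

# Closed form derived in Example 34.5.1.
closed = (P + D / r) * (1 + r) ** t - (D / r) * (1 + r)
print(y, closed)    # the two values agree
```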
34.6  Exercises for self study
Exercise 34.6.1  (a) Find the effective annual rate of interest on £1000 at 8% compounded

(i) annually

(ii) quarterly

(iii) continuously.
(b) Determine the interest rate needed to have money double in 8 years when compounded
semiannually.
Exercise 34.6.2 (a) A deposit of £P is made at the beginning of each month for t years
in an account that is compounded monthly at an interest rate r. Find the sum accumulated
after t years.
(b) Write down a difference equation subject to a suitable initial condition which corresponds to the above problem.
Exercise 34.6.3 The estimated value of a collection of antiques bought for investment
is increasing according to the formula
V = 325,000 (1.95)^(t^(2/5)).
The discount rate under continuous compounding is 6.8%. How long should the collection
be held to maximise the present value?
Exercise 34.6.4 A loan L is obtained from a bank at the beginning of year 1. A payment
P toward repaying the loan is made at the beginning of year 2 and at the beginning of
each subsequent year, until the loan is finally repaid. At the end of each year starting from
year 1, the bank charges interest on the outstanding loan at an annual rate r. The interest
is added to the loan.
(a) Find a simplified expression for the outstanding loan Lt at the end of year t, just after
the interest is added.
(b) Write down a difference equation corresponding to this problem.
34.7 Relevant sections from the textbooks
There are no relevant sections from our textbooks this week.
35 Differential and difference equations, 2 of 5
35.1 Complex numbers
A complex number, denoted z, is a number of the form
z = x + iy
where the so-called real part x and imaginary part y, denoted
Re(z) = x,
Im(z) = y,
are both real numbers and i is defined by the property that i2 = −1. The set of all
complex numbers is denoted by C. Any complex number z whose imaginary part is zero,
is, of course, real. Any complex number z whose real part is zero is said to be purely
imaginary. We can visualise each element z = x + iy of C on a plane as depicted below:
The x-axis is called the real axis and the y-axis is called the imaginary axis. The plane
equipped with these axes is known as the complex plane. Since x and y can be thought
of as Cartesian coordinates for the complex number z = x + iy, we refer to the form x + iy
as the Cartesian form of z.
An alternative description of the number x + iy is obtained by using the so-called polar
coordinates (r, θ) on R2 , depicted below:
The real, non-negative number r is known as the modulus of z, denoted |z|, and the real
number θ is known as the argument of z:
Mod(z) = |z| = r,
Arg(z) = θ.
The relation between the Cartesian and the polar coordinates of z can be derived using
trigonometry. Given polar coordinates (r, θ), the Cartesian coordinates (x, y) can be found
using
x = rcos(θ)
and
y = rsin(θ).
Conversely, given Cartesian coordinates (x, y), the polar coordinates (r, θ) can be found
using
r = √(x² + y²), cos θ = x/√(x² + y²), sin θ = y/√(x² + y²).
Note that the solution θ of the simultaneous system of equations cos θ = x/√(x² + y²) and sin θ = y/√(x² + y²) is not defined unambiguously. This is because θ + 2nπ is also a solution,
where n ∈ Z. The so-called principal argument θ corresponds to the choice −π < θ ≤ π.
The principal argument θ is unique, unless (x, y) = (0, 0), in which case r = 0 and hence
the value of θ is undefined.
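For concreteness, the conversion between Cartesian and polar coordinates can be carried out with Python's cmath module; cmath.polar returns the argument in [−π, π], essentially the principal argument used above (the number z below is an illustrative choice).

import cmath, math

z = complex(1, math.sqrt(3))        # z = 1 + sqrt(3) i
r, theta = cmath.polar(z)           # modulus and argument of z
print(r, theta, math.pi / 3)        # 2.0 and approximately pi/3
print(cmath.rect(r, theta))         # back to Cartesian form: approximately (1 + 1.732j)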
Example 35.1.1 Given the complex number z ∈ C, find its polar coordinates (r, θ) in
the following cases:
(i) z = 1 + √3 i   (ii) z = −√3 + i   (iii) z = −1 − √3 i   (iv) z = √3 − i
These numbers are sketched on the complex plane below:
Selecting the principal argument in each case, we find:
(i) r = √(1² + (√3)²) = 2, cos θ = 1/2, sin θ = √3/2, so θ = π/3;
(ii) r = √((−√3)² + 1²) = 2, cos θ = −√3/2, sin θ = 1/2, so θ = 5π/6;
(iii) r = √((−1)² + (−√3)²) = 2, cos θ = −1/2, sin θ = −√3/2, so θ = −2π/3;
(iv) r = √((√3)² + (−1)²) = 2, cos θ = √3/2, sin θ = −1/2, so θ = −π/6.
35.2 Euler’s formula and polar exponential form
Recall that the Taylor series of the real-valued functions e^x, sin(x) and cos(x) converge for
all x ∈ R and are given by
e^x = 1 + x + x²/2! + x³/3! + ...
sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + ...
cos(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + ...
These results are valid even in the case where the variable x is replaced by a complex
variable z. In particular, letting z = iθ, θ ∈ R, and using the fact that i2 = −1, we obtain
e^(iθ) = 1 + iθ + (iθ)²/2! + (iθ)³/3! + (iθ)⁴/4! + (iθ)⁵/5! + ...
= 1 + iθ − θ²/2! − i θ³/3! + θ⁴/4! + i θ⁵/5! + ...
= (1 − θ²/2! + θ⁴/4! − ...) + i (θ − θ³/3! + θ⁵/5! − ...)
= cos(θ) + i sin(θ).
The importance of this result, known as Euler’s formula, is that it can be used to express
any complex number z as a product. Indeed, starting with the Cartesian form x + iy of a
complex number z, we obtain
z = x + iy = r cos(θ) + i r sin(θ) = r(cos(θ) + i sin(θ)) = r e^(iθ).
When z is written in the form z = reiθ , it is said to be expressed in polar exponential
form. While the Cartesian form x + iy is best for addition and subtraction, the polar
exponential form reiθ is best for multiplication and division of complex numbers. These
operations are defined next.
35.3 Operations on C
Addition, subtraction and multiplication of complex numbers can be defined by treating
these numbers as polynomials in i and using i2 = −1.
In particular, letting
z1 = x1 + iy1 and z2 = x2 + iy2 , we have:
z1 ± z2 = (x1 + iy1 ) ± (x2 + iy2 ) = (x1 ± x2 ) + i(y1 ± y2 ),
z1 z2 = (x1 + iy1 )(x2 + iy2 ) = (x1 x2 − y1 y2 ) + i(x1 y2 + y1 x2 ).
Note that C is closed under these three operations. Indeed, on the right hand side of the
above equations, we recognise the Cartesian form of an element of C. Regarding division,
this is defined as follows: For z2 ≠ 0,
z1/z2 = (x1 + iy1)/(x2 + iy2) = [(x1 + iy1)(x2 − iy2)] / [(x2 + iy2)(x2 − iy2)]
= [(x1x2 + y1y2) + i(−x1y2 + y1x2)] / (x2² + y2²)
= (x1x2 + y1y2)/(x2² + y2²) + i (−x1y2 + y1x2)/(x2² + y2²).
Note that the complex number on the right hand side again has the Cartesian form, which
shows that C is closed under division as well.
The complex number x2 − iy2, by which we have multiplied the numerator and denominator of the fraction (x1 + iy1)/(x2 + iy2) above, is called the complex conjugate of x2 + iy2. In general, the complex conjugate of a complex number z = x + iy, denoted z̄, is defined by
z̄ = x − iy.
In other words,
Re(z̄) = Re(z) and Im(z̄) = −Im(z).
Clearly, the Cartesian forms of the product z1 z2 and the ratio z1/z2 look complicated and are not very convenient to use in practice. By expressing z1 and z2 in polar exponential form, multiplication and division become as easy as addition and subtraction. Indeed, we have:
z1 z2 = (r1 e^(iθ1))(r2 e^(iθ2)) = (r1 r2) e^(i(θ1 + θ2)),
and, for z2 ≠ 0,
z1/z2 = (r1 e^(iθ1))/(r2 e^(iθ2)) = (r1/r2) e^(i(θ1 − θ2)).
Note that the complex conjugate of z = r e^(iθ) is given by z̄ = r e^(−iθ), since
z̄ = r cos(θ) − i r sin(θ) = r(cos(θ) − i sin(θ)) = r(cos(−θ) + i sin(−θ)) = r e^(−iθ).
In general, replacing ‘i’ by ‘−i’ converts the complex number z into its conjugate z̄.
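These rules (moduli multiply or divide, arguments add or subtract) can be verified numerically with Python's cmath module; z1 and z2 below are illustrative values.

import cmath

z1, z2 = 1 + 1j, 3 - 4j
r1, t1 = cmath.polar(z1)
r2, t2 = cmath.polar(z2)
prod = (r1 * r2) * cmath.exp(1j * (t1 + t2))   # (r1 r2) e^{i(theta1 + theta2)}
quot = (r1 / r2) * cmath.exp(1j * (t1 - t2))   # (r1/r2) e^{i(theta1 - theta2)}
print(abs(prod - z1 * z2) < 1e-12)             # True
print(abs(quot - z1 / z2) < 1e-12)             # True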
Example 35.3.1 Given z = 1 + √3 i and w = −3 + 3√3 i, find
(i) zw   (ii) z^6   (iii) w^3   (iv) w̄^10 z^20
Regarding z, we have
r = √(1² + (√3)²) = 2 and cos(θ) = 1/2, sin(θ) = √3/2,
so z = 2e^(iπ/3). Regarding w, we have
r = √((−3)² + (3√3)²) = 6 and cos(θ) = −3/6, sin(θ) = 3√3/6,
so w = 6e^(i2π/3). Therefore,
(i) zw = (2e^(iπ/3))(6e^(i2π/3)) = 12e^(iπ) = 12(cos(π) + i sin(π)) = −12,
(ii) z^6 = (2e^(iπ/3))^6 = 2^6 e^(i2π) = 2^6(cos(2π) + i sin(2π)) = 2^6,
(iii) w^3 = (6e^(i2π/3))^3 = 6^3 e^(i2π) = 6^3(cos(2π) + i sin(2π)) = 6^3,
(iv) w̄^10 z^20 = (6e^(−i2π/3))^10 (2e^(iπ/3))^20 = 6^10 e^(−i20π/3) 2^20 e^(i20π/3) = 6^10 2^20.

35.4 Roots of polynomials
The Fundamental Theorem of Algebra asserts that a polynomial of degree n with complex
coefficients has n complex roots (not necessarily distinct), and can therefore be factorised
into n linear factors. If the coefficients are restricted to real numbers, the polynomial can
be factorised into a product of polynomials of degree 1 with real coefficients and quadratic
polynomials of negative discriminant with real coefficients. Any such quadratic polynomial
can be further factorised into a product of polynomials of degree 1 with complex coefficients.
The proof of the Fundamental Theorem of Algebra is beyond the scope of our course, but
we note the following useful result:
Theorem 35.4.1 Non-real roots of polynomials with real coefficients appear in conjugate
pairs.
Proof Let P (x) = a0 + a1 x + · · · + an xn , ai ∈ R, be a polynomial of degree n. We
shall show that if z is a root of P (x), then so is z̄. Let z be a complex number such that
P (z) = 0. Then
a0 + a1 z + a2 z² + ··· + an z^n = 0.
Conjugating both sides of this equation, and using the fact that 0, being a real number, is equal to its own complex conjugate, together with the following properties of the complex conjugate, which you may find useful to confirm (the complex conjugate of a sum is the sum of the conjugates, and the complex conjugate of a product is the product of the conjugates), we obtain
ā0 + ā1 z̄ + ā2 z̄² + ··· + ān z̄^n = 0.
Since the coefficients ai are real numbers, āi = ai, so this becomes
a0 + a1 z̄ + a2 z̄² + ··· + an z̄^n = 0.
That is, P(z̄) = 0, so the number z̄ is also a root of P(x).
Example 35.4.2 Let us consider the polynomial
x³ − 2x² − 2x − 3 = (x − 3)(x² + x + 1).
The quadratic polynomial x² + x + 1 can be further factorised according to
x² + x + 1 = (x + 1/2)² + 3/4 = (x + 1/2 + (√3/2)i)(x + 1/2 − (√3/2)i).
Letting w = −1/2 − (√3/2)i, we obtain
x³ − 2x² − 2x − 3 = (x − 3)(x − w)(x − w̄),
i.e., the complex roots appear in conjugate pairs.
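A numerical root finder shows the same pattern, one real root and one conjugate pair; this small sketch assumes the numpy package is available.

import numpy as np

roots = np.roots([1, -2, -2, -3])   # coefficients of x^3 - 2x^2 - 2x - 3
print(roots)                        # approximately 3, -0.5 + 0.866j, -0.5 - 0.866j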
35.5 Exercises for self study
Exercise 35.5.1 Plot z = √3 − i and w = 1 + i as points on the complex plane. Express z and w in polar exponential form and find q = z^6/w^10 in both polar exponential and Cartesian form.
Exercise 35.5.2 Find the roots w and w̄ of the equation x2 − 4x + 7 = 0. For these
values of w and w̄, find the real and imaginary parts of the following functions:
f(t) = e^(wt), t ∈ R, and g(t) = w^t, t ∈ Z+.
Exercise 35.5.3 (a) Show that the sequence
yt = α m^t + β m̄^t,
where m = 1 + √3 i, satisfies the difference equation
yt+2 − 2yt+1 + 4yt = 0
for arbitrary complex constants α and β.
(b) Show that yt can be written in the form
yt = r^t (α e^(iθt) + β e^(−iθt)),
where (r, θ) are the polar coordinates of m.
Exercise 35.5.4 Referring to the expression yt = r^t (α e^(iθt) + β e^(−iθt)) from Exercise 35.5.3:
(a) Use Euler’s formula e^(iθ) = cos(θ) + i sin(θ) to write yt in the form
yt = r^t (A cos(θt) + B sin(θt)),
expressing A and B in terms of α and β.
(b) Hence find the most general condition on the complex constants α and β which makes
yt a real sequence; that is, which makes A and B both real.
35.6 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Section 13.1 of our Algebra Textbook is relevant.
36 Differential and difference equations, 3 of 5
36.1 Difference equations
Difference equations are recurrence relations satisfied by a sequence {yx }. The sequence
{yx } is the unknown of the problem, and x ∈ N is an index.
Example 36.1.1 An arithmetic sequence {yx } with first term a and common difference
d satisfies the difference equation
yx+1 = yx + d,
subject to the initial condition
y1 = a.
The solution of this equation was derived in Lecture 34 by writing out a few terms and
observing the resulting pattern:
yx = a + (x − 1)d.
Note that substituting the solution yx = a + (x − 1)d into the difference equation yields
a + xd = a + (x − 1)d + d,
which is an identity in x, as required.
Example 36.1.2 Similarly, a geometric sequence {yx } with first term a and common
ratio r satisfies the difference equation
yx+1 = ryx ,
subject to the initial condition
y1 = a.
The solution
yx = arx−1
of this equation was also derived in Lecture 34 by writing out the first few terms and
observing the pattern.
Example 36.1.3 In addition, we saw in Lecture 34 that the difference equations generating arithmetic and geometric sequences can be ‘combined’ to yield a difference equation
of the form
yx+1 = ryx + d,
subject to the initial condition
y1 = a.
An example of a sequence satisfying a difference equation of the above form was given in
Example 34.5.1.
The equations presented in the above examples are known as linear difference equations
with constant coefficients. This is the only type of difference equation that we are going
to study in this course. The relevant definition follows:
A linear difference equation with constant coefficients of order n is a difference equation
of the form
P (E)yx = Q(x),
where Q(x) is a given function and P (E) is a given polynomial of degree n of the so-called
shift operator E.
The shift operator E operates on the sequence {yx } by the following rule:
E(yx ) = yx+1 .
The operator E takes the input sequence {a, b, c, d, ...} and returns the output sequence
{b, c, d, ...}. Indeed, the relation E(yx ) = yx+1 implies that
E(y1 ) = y2 ,
i.e.,
E(a) = b,
E(y2 ) = y3 ,
i.e.,
E(b) = c,
E(y3 ) = y4 ,
i.e.,
E(c) = d,
and so on. This also implies that
E 2 (yx ) = E(yx+1 ) = yx+2 ,
E 3 (yx ) = E 2 (yx+1 ) = E(yx+2 ) = yx+3 ,
and so on.
Example 36.1.4 The 3rd-order difference equation
2yx+3 − 4yx+2 − 5yx+1 + 3yx = 9x²
can be expressed in the form P (E)yx = Q(x) where
P (E) = 2E³ − 4E² − 5E + 3 and Q(x) = 9x².
Difference equations of the form P (E)yx = Q(x) are split into two categories:
• If Q(x) ≠ 0, the equation P (E)yx = Q(x) is called non-homogeneous.
• If Q(x) ≡ 0, the equation P (E)yx = 0 is called homogeneous.
We study both these cases below, starting from the homogeneous case.
36.2 Difference equations of the form P (E)yx = 0
Consider the following example.
Example 36.2.1 Find the general solution (i.e. the set of all solutions) of the difference
equation
yx+2 + 6yx+1 + 8yx = 0.
This equation has the form P (E)yx = 0, where the polynomial P (E) is given by
P (E) = E 2 + 6E + 8.
Note that the constant term in this polynomial is non-zero. This requirement will always
be imposed on difference equations. In order to understand why doing so is reasonable,
replace the number 8 by 0 above and show that the resulting difference equation can be
regarded as a first-order difference equation whose constant term is non-zero.
In order to find solutions of the equation (E² + 6E + 8)yx = 0, we consider sequences of the form yx = m^x, where m ≠ 0 is some constant to be determined.
We note that
E(m^x) = m^(x+1) = m·m^x and E²(m^x) = m^(x+2) = m²·m^x,
so the difference equation (E² + 6E + 8)yx = 0 becomes
(m² + 6m + 8)m^x = 0.
Hence, we deduce that as long as m is a solution of the so-called auxiliary equation m² + 6m + 8 = 0, the sequence yx = m^x solves (E² + 6E + 8)yx = 0. Here, the solutions of the auxiliary equation are
m1 = −2 and m2 = −4,
so we obtain two solutions
yx = (−2)^x and yx = (−4)^x
of the equation (E² + 6E + 8)yx = 0. We now observe that this equation is linear, which
implies that any linear combination of the solutions yx = (−2)^x and yx = (−4)^x is also a solution:
(E² + 6E + 8)[α(−2)^x + β(−4)^x] = α(E² + 6E + 8)(−2)^x + β(E² + 6E + 8)(−4)^x = α(0) + β(0) = 0.
In this way, we arrive at the solution
yx = α(−2)^x + β(−4)^x
which contains two arbitrary constants. Moreover, since the difference equation (E² + 6E + 8)yx = 0 is second order, and we already have a solution containing two arbitrary constants, we have found the general solution. This follows from an argument based on Linear Algebra, presented below:
Consider the vector space V of all sequences. The operations of vector addition and scalar
multiplication (which turn the set V of all such sequences into a vector space) are defined
below:
∀yx ∈ V and ∀zx ∈ V : (y + z)x := yx + zx ,
∀yx ∈ V and ∀λ ∈ R : (λy)x := λyx .
Note that the sequence ox given by ox = 0 ∀x ∈ N plays the role of the zero vector in V .
Indeed,
∀yx ∈ V : (y + o)x = yx + ox = yx + 0 = yx ,
∀yx ∈ V : (0y)x = ox .
The shift operator E : V → V defined by E(yx ) = yx+1 maps the vector space V onto
itself. E maps the input vector {a, b, c, d, ...} ∈ V to the output vector {b, c, d, ...} ∈ V .
This is a linear transformation, because
∀yx ∈ V and ∀zx ∈ V : E((y + z)x ) = (y + z)x+1 = yx+1 + zx+1 = E(yx ) + E(zx ),
∀yx ∈ V and ∀λ ∈ R : E((λy)x ) = (λy)x+1 = λyx+1 = λE(yx ).
Moreover, it is not difficult to show that any polynomial P (E) : V → V is also a linear
transformation from the vector space V onto itself.
Therefore, solving the homogeneous difference equation
P (E)yx = 0
is equivalent to finding the kernel of the linear transformation
P (E) : V → V,
i.e., the vector subspace of V consisting of all sequences in V which are mapped to the
zero sequence {ox } ∈ V . In this way, the problem of finding the general solution of the
difference equation P (E)yx = 0 reduces to the problem of finding a basis for the kernel
of P (E) : V → V . Given such a basis, every solution {yx } of the difference equation
P (E)yx = 0 will be a linear combination of the basis vectors. We now need the following
theorem, stated without proof:
Theorem 36.2.2 If the linear transformation P (E) : V → V defined by
P (E) = an E^n + an−1 E^(n−1) + ... + a1 E + a0
is such that a0 ≠ 0, then dim(ker(P (E))) = n.
Theorem 36.2.2 implies that the general solution of the homogeneous equation P (E)yx = 0
where P (E) = an E n + an−1 E n−1 + · · · + a1 E + a0 contains exactly n arbitrary constants. In the particular case of Example 36.2.1, having found a basis {(−2)x , (−4)x }
for the 2-dimensional kernel of the operator P (E) = E 2 + 6E + 8, we can be certain that
yx = α(−2)x + β(−4)x , where α and β are arbitrary constants, is the general solution of
the difference equation yx+2 + 6yx+1 + 8yx = 0.
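A quick numerical check, with illustrative values for the arbitrary constants α and β, confirms that this expression does satisfy the difference equation.

alpha, beta = 1.5, -0.7            # illustrative values of the arbitrary constants
y = lambda x: alpha * (-2) ** x + beta * (-4) ** x
print(all(abs(y(x + 2) + 6 * y(x + 1) + 8 * y(x)) < 1e-6 for x in range(10)))   # True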
The general method of solution of P (E)yx = 0 based on Theorem 36.2.2 is presented below:
Method of Solution
Let a difference equation of order n have the form P (E)yx = 0, where the constant term
of the polynomial P (m) is not zero. We solve the auxiliary equation P (m) = 0:
Case 1: If the solutions m1 , m2 , . . . , mn of this equation are all distinct, then the general
solution of the difference equation P (E)yx = 0 is
yx = α1 (m1 )x + α2 (m2 )x + · · · + αn (mn )x
where α1 , α2 , . . . , αn are arbitrary constants.
Case 2: If a solution m has algebraic multiplicity k, then this particular m contributes in
the general solution for yx the term
yx = ... + (β1 + β2 x + β3 x² + ··· + βk x^(k−1)) m^x + ...
where β1 , β2 , . . . , βk are arbitrary constants. We will explain the reason why this form
arises later in the course.
Example 36.2.3 Suppose that the polynomial P (E) in the difference equation P (E)yx =
0 factorises as follows:
P (m) = (m − 4)(m − 3)²(m + 1)(m + 9)⁴.
Then, the general solution for yx is
yx = α1 4^x + (α2 + α3 x) 3^x + α4 (−1)^x + (α5 + α6 x + α7 x² + α8 x³)(−9)^x,
where α1 , α2 , . . . , α8 are arbitrary constants.
In order to complete the theory, we need to consider the possibility that some of the
solutions of the auxiliary equation P (m) = 0 may be non-real. Since the coefficients of
the polynomial P (m) are all real, whenever a non-real solution for m arises, it must be
accompanied by its complex conjugate.
Complex Conjugate Pairs
Suppose that P (m) = 0 admits a pair of complex conjugate solutions. Let these solutions
be expressed in polar exponential form as
m1 = reiθ
and
m2 = re−iθ ,
where
0 ≤ θ < π.
Also suppose for simplicity that this pair is not repeated. In other words, m1 and m2 are
distinct solutions.
Following the pattern described above, the general solution for yx contains a term of the
form
yx = ... + α1 (r e^(iθ))^x + α2 (r e^(−iθ))^x + ...,
where α1 and α2 are arbitrary complex constants. This term can be expressed equivalently as
yx = ... + (α1 e^(iθx) + α2 e^(−iθx)) r^x + ....
Let us now impose the requirement that the sequence yx be real. It is not difficult to show that the most general condition on α1 and α2 compatible with this requirement is α2 = ᾱ1, i.e., α1 and α2 are complex conjugates of each other. This leads to a term of the form
yx = ... + (β1 cos(θx) + β2 sin(θx)) r^x + ...,
where β1 and β2 are real arbitrary constants.
In other words, given a complex conjugate pair m1 = reiθ and m2 = re−iθ , the modulus r
is raised to the power of x (in the general solution for yx ) and the argument θ becomes the
argument of the trigonometric functions.
Example 36.2.4 Suppose that the polynomial P (m) associated with the difference equation P (E)yx = 0 can be factorised as follows:
P (m) ≡ (m + 5)²(m − 3 − 2i)(m − 3 + 2i).
The modulus r of 3 ± 2i is equal to r = √(3² + 2²) = √13. The angle θ, where in this context we can always choose 0 ≤ θ < π, is found by solving
cos(θ) = 3/√13 and sin(θ) = 2/√13.
Then the general solution for yx is given by
yx = (α1 + α2 x)(−5)^x + (α3 cos(θx) + α4 sin(θx)) 13^(x/2),
where α1 , α2 , α3 and α4 are all real arbitrary constants.
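Expanding P(m) gives the 4th-order recurrence yx+4 + 4yx+3 − 22yx+2 − 20yx+1 + 325yx = 0, and the claimed general solution can be checked numerically with Python for illustrative values of the constants:

import math

a1, a2, a3, a4 = 0.3, -1.2, 2.0, 0.5          # illustrative arbitrary constants
r, theta = math.sqrt(13), math.atan2(2, 3)    # cos(theta) = 3/sqrt(13), sin(theta) = 2/sqrt(13)

def y(x):
    return (a1 + a2 * x) * (-5) ** x + (a3 * math.cos(theta * x) + a4 * math.sin(theta * x)) * r ** x

residual = lambda x: y(x + 4) + 4 * y(x + 3) - 22 * y(x + 2) - 20 * y(x + 1) + 325 * y(x)
print(all(abs(residual(x)) < 1e-5 for x in range(8)))   # True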
Finally, before we proceed to the non-homogeneous case P (E)yx = Q(x), let us present
a result about the long term behaviour of the solutions of the homogeneous equation
P (E)yx = 0.
Theorem 36.2.5 Given a homogeneous linear difference equation with constant coefficients of order n, P (E)yx = 0, all solutions yx tend to 0 as x → ∞ if and only if all roots
of the auxiliary equation P (m) = 0 have modulus less than 1.
36.3 Difference equations of the form P (E)yx = Q(x)
Consider the non-homogeneous difference equation
P (E)yx = Q(x),
where Q(x) is a given function. Its general solution is constructed in three steps:
Step 1: We find the general solution of the homogeneous part
P (E)yx = 0.
This general solution is called the complementary sequence, denoted (CS)x . It is
obtained by the method developed for homogeneous difference equations. If the polynomial
P (E) has degree n, then (CS)x contains n real arbitrary constants.
Step 2: We find a single solution of the non-homogeneous equation
P (E)yx = Q(x).
This solution is called a particular sequence, denoted (P S)x . For simple functions Q(x),
a method by which we can obtain a particular sequence will be presented in Examples 36.3.1
and 36.3.2 below as well as in the Exercises.
Step 3: The general solution of the non-homogeneous equation P (E)yx = Q(x) is then
the sum of the particular sequence and the complementary sequence:
yx = (P S)x + (CS)x .
Before we proceed to the examples, let us confirm that the above expression solves P (E)yx =
Q(x).
Indeed, since P (E)(CS)x = 0 and P (E)(P S)x = Q(x), we see that the linearity of the
difference equation implies that
P (E)yx = P (E)[(P S)x + (CS)x ] = P (E)(P S)x + P (E)(CS)x = Q(x) + 0 = Q(x).
Let us also show that yx = (P S)x + (CS)x provides the general solution of
P (E)yx = Q(x): To this end, we need to show that any solution sx of the equation
P (E)yx = Q(x) can be written in the form sx = (P S)x + (CS)x for some suitable choice
of the constants in (CS)x .
Assuming that we are given a solution sx , let us consider the sequence
sx − (P S)x .
This sequence satisfies the homogeneous equation P (E)yx = 0 because
P (E)[sx − (P S)x ] = P (E)sx − P (E)(P S)x = Q(x) − Q(x) = 0.
Hence, using the argument based on Linear Algebra, we deduce that the vector
sx − (P S)x
belongs to the null space of the linear transformation
P (E) : V → V.
Hence sx − (P S)x must be a linear combination of the basis vectors in the null space of
P (E) and hence it must have the form (CS)x for some suitable choice of constants (i.e. the
scalars in the linear combination (CS)x ). This proves that any solution sx can be written
in the form
sx = (P S)x + (CS)x ,
for some suitable choice of constants in (CS)x .
Example 36.3.1 Solve the difference equation
(E 2 − 5E + 6)yx = 5.
Considering the homogeneous part (E 2 −5E +6)yx = 0, we find that the auxiliary equation
m2 − 5m + 6 = 0 yields the distinct roots
m1 = 3,
m2 = 2.
The complementary sequence is therefore
(CS)x = A(3)x + B(2)x ,
where A and B are arbitrary real constants.
As a particular sequence, we try a sequence yx that has a chance of producing an identity
in x when substituted into the non-homogeneous equation. Recall that this is precisely
what we mean by a solution of a difference equation. For simple choices of Q(x), this can
usually be achieved by considering a particular sequence which has the same general form
as the function Q(x). Here, Q(x) = 5, so we try a constant sequence yx = a. Indeed,
substituting yx = a into the equation
yx+2 − 5yx+1 + 6yx = 5
yields
a − 5a + 6a = 5,
which means that a = 5/2 and hence a particular sequence is
(P S)x = 5/2.
Therefore, the general solution of the equation (E² − 5E + 6)yx = 5 is
yx = (P S)x + (CS)x = 5/2 + A(3)^x + B(2)^x,
where A and B are arbitrary real constants. Note that the general solution contains two
arbitrary constants, as should be the case for a second order difference equation.
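The general solution can again be checked numerically for illustrative values of A and B:

A, B = 1.3, -0.4                                  # illustrative arbitrary constants
y = lambda x: 5 / 2 + A * 3 ** x + B * 2 ** x
print(all(abs(y(x + 2) - 5 * y(x + 1) + 6 * y(x) - 5) < 1e-9 for x in range(12)))   # True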
Example 36.3.2 Solve the difference equation encountered in Example 34.5.1,
yx+1 = (yx + D)(1 + r),
subject to the initial condition
y1 = P (1 + r).
Recall that P and D are deposits, r is the annual interest rate, x is an index denoting the
year and yx is the accumulated amount. We arrange this difference equation in the form
yx+1 − (1 + r)yx = D(1 + r).
Regarding the homogeneous part [E − (1 + r)]yx = 0, the auxiliary equation
m − (1 + r) = 0
yields the single root
m = (1 + r).
Hence the complementary sequence is given by
(CS)x = A(1 + r)x ,
where A is an arbitrary constant. Regarding a particular sequence we try
yx = a,
where a is a constant to be determined by substituting yx into the non-homogeneous
equation yx+1 − (1 + r)yx = D(1 + r). We find
a − (1 + r)a = D(1 + r),
which yields
a = −(D/r)(1 + r).
Hence, a particular sequence is
(P S)x = −(D/r)(1 + r).
The general solution of the difference equation yx+1 − (1 + r)yx = D(1 + r) is therefore
yx = −(D/r)(1 + r) + A(1 + r)^x,
where A is an arbitrary constant. In order to determine A we use the initial condition y1 = P(1 + r). This yields
[A − (D/r)](1 + r) = P(1 + r),
which implies that
A = P + D/r.
Hence, the solution to our problem is given by
yx = −(D/r)(1 + r) + (P + D/r)(1 + r)^x
or, equivalently, by
yx = P(1 + r)^x + (D/r)[(1 + r)^x − (1 + r)],
in agreement with the result obtained in Example 34.5.1.
36.4 Exercises for self study
Exercise 36.4.1 (a) Find the general solution of the difference equation
yx+2 − yx+1 − 2yx = 4x.
(b) Consider the initial conditions y1 = 1 and y2 = 2. Generate y3 and y4 using the
difference equation directly.
(c) Find the particular solution of the difference equation subject to the initial conditions
y1 = 1 and y2 = 2 and verify that it reproduces the terms y3 and y4 calculated in part (b).
Exercise 36.4.2 (a) Find the general solution of the difference equation
yx+2 − yx+1 − 2yx = (−1)x .
Hint: For a particular sequence, try yx = ax(−1)x .
(b) Find the particular solution satisfying y3 = 1 and y6 = 10.
Exercise 36.4.3 Consider the difference equation
st+1 = st (1 + R) + D
subject to the initial condition s1 = D, where R and D are some constants.
(a) Write this difference equation in the form P (E)st = Q(t) for a suitable polynomial P
and function Q(t).
(b) Solve the difference equation subject to the given initial condition.
Exercise 36.4.4 Find the general solution of the difference equation
yx+3 − 3yx+2 + 9yx+1 + 13yx = 5x + 3.
36.5 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 14.2, 14.3, 14.4, 14.5, 14.6, 14.7 and 14.8 of our Calculus Textbook are relevant.
37 Differential and difference equations, 4 of 5
37.1 Linear ODEs with constant coefficients
A linear ordinary differential equation with constant coefficients of order n is a differential equation for the function y(x) of the form
P (D)y = Q(x)
where Q(x) is a given function and P (D) is a given polynomial of degree n in the differential operator D = d/dx.
Example 37.1.1 The 3rd-order linear differential equation
2 d³y/dx³ + d²y/dx² − 5 dy/dx + 3y = sin(x)
can be expressed in the form P (D)y = Q(x) where
P (D) = 2D³ + D² − 5D + 3 and Q(x) = sin(x).
As with difference equations,
• If Q(x) ≠ 0, the equation P (D)y = Q(x) is called non-homogeneous.
• If Q(x) ≡ 0, the equation P (D)y = 0 is called homogeneous.
We study both these cases below, starting from the homogeneous case.
37.2 Solving ODEs of the form P (D)y = 0
Let us motivate the method of solution by using an example.
Example 37.2.1 Solve the ordinary differential equation
d²y/dx² + 6 dy/dx + 8y = 0.
This equation has the form P (D)y = 0 where the polynomial P (D) is given by
P (D) = D² + 6D + 8.
Consider functions of the form y(x) = e^(mx) where m is some constant to be determined. We note that
D e^(mx) = (d/dx) e^(mx) = m e^(mx) and D² e^(mx) = (d²/dx²) e^(mx) = m² e^(mx),
so by substituting y(x) = e^(mx) in the differential equation (D² + 6D + 8)y = 0, we obtain
(m² + 6m + 8) e^(mx) = 0.
Clearly, if m is a solution of the auxiliary equation m² + 6m + 8 = 0, then the function y(x) = e^(mx) is a solution of the differential equation (D² + 6D + 8)y = 0. The equation m² + 6m + 8 = 0 yields
m1 = −2 and m2 = −4,
so the functions
y(x) = e^(−2x) and y(x) = e^(−4x)
are both solutions of the differential equation (D² + 6D + 8)y = 0. Moreover, since the differential equation is linear, any linear combination of the solutions y(x) = e^(−2x) and y(x) = e^(−4x) is also a solution. Indeed, for any constants α and β, we have
(D² + 6D + 8)[α e^(−2x) + β e^(−4x)] = α(D² + 6D + 8)e^(−2x) + β(D² + 6D + 8)e^(−4x)
= α((−2)² + 6(−2) + 8)e^(−2x) + β((−4)² + 6(−4) + 8)e^(−4x)
= α(0)e^(−2x) + β(0)e^(−4x)
= 0.
As with difference equations, the key point is this: We have a solution
y(x) = αe−2x + βe−4x
of the second order ODE (D2 + 6D + 8)y = 0 which contains two arbitrary constants.
Hence, we have the general solution of this ODE. We discussed why this is the case in
Lecture 36 in the context of the homogeneous equation P (E)yx = 0. The only difference
now is that D² + 6D + 8 defines a linear operator on the vector space C∞ of infinitely differentiable functions, where the vectors e^(−2x) and e^(−4x) form a basis for the 2-dimensional kernel of this
operator.
Let us now generalise the above method so that it becomes applicable to any differential
equation of the form P (D)y = 0. We will consider some typical examples immediately
afterwards.
Method of Solution
Given any ordinary differential equation of the form P (D)y = 0 of order n, we solve the
polynomial equation P (m) = 0, called again the auxiliary equation.
Case 1: If the solutions m1 , m2 , ..., mn are all distinct, then the general solution of the
differential equation P (D)y = 0 is
y(x) = α1 e^(m1 x) + α2 e^(m2 x) + ··· + αn e^(mn x)
where α1 , α2 , ..., αn are arbitrary constants.
Case 2: If a solution m has algebraic multiplicity k, then this particular m contributes in
the general solution for y(x) the term
y(x) = ··· + (β1 + β2 x + β3 x² + ... + βk x^(k−1)) e^(mx) + ···
where β1 , β2 , . . . , βk are arbitrary constants.
Example 37.2.2 Suppose that the polynomial P (D) in the differential equation P (D)y =
0 is such that the auxiliary equation P (m) factorises as follows:
P (m) = (m − 4)(m − 3)²(m + 1)(m + 9)⁴.
Then, the general solution for y(x) is
y(x) = α1 e^(4x) + (α2 + α3 x) e^(3x) + α4 e^(−x) + (α5 + α6 x + α7 x² + α8 x³) e^(−9x),
where α1 , α2 , . . . , α8 are arbitrary constants.
Example 37.2.3 Solve the 4th-order differential equation
d⁴y/dx⁴ − d³y/dx³ − 2 d²y/dx² = 0.
The polynomial P (D) is given by P (D) = D⁴ − D³ − 2D², so the auxiliary equation for m is
m⁴ − m³ − 2m² = 0.
We factorise the left hand side of this equation to obtain
m⁴ − m³ − 2m² ≡ m²(m² − m − 2) ≡ m²(m − 2)(m + 1),
so the solutions are
m = 0 (algebraic multiplicity 2), m = 2 and m = −1.
Therefore, the general solution of the differential equation (D⁴ − D³ − 2D²)y = 0 is
y(x) = (α1 + α2 x)e^(0x) + α3 e^(2x) + α4 e^(−x) = α1 + α2 x + α3 e^(2x) + α4 e^(−x),
where α1 , α2 , α3 and α4 are arbitrary constants.
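The same answer can be cross-checked symbolically; this sketch assumes the sympy package is available, and the labelling of the arbitrary constants in its output may differ.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 4) - y(x).diff(x, 3) - 2 * y(x).diff(x, 2), 0)
print(sp.dsolve(ode, y(x)))   # y(x) = C1 + C2*x + C3*exp(-x) + C4*exp(2*x), up to constant labels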
Finally, let us complete the study of the differential equation P (D)y = 0 by considering
what happens when the solutions of the auxiliary equation P (m) = 0 are non-real. Recall
that since the coefficients of the polynomial P (m) are all real, whenever a non-real solution
for m arises, it must be accompanied by its complex conjugate.
Complex Conjugate Pairs
Suppose that P (m) = 0 admits a pair of complex conjugate solutions. Let these solutions
be
m1 = a + ib and m2 = a − ib,
where a and b are real numbers. Also suppose, for simplicity, that this pair is not repeated.
In other words, m1 and m2 are distinct solutions. The general solution for y(x) then
contains a term of the form
y(x) = ... + α1 e^(m1 x) + α2 e^(m2 x) + ...,
where α1 and α2 are arbitrary complex constants. Using the relations m1 = a + ib and m2 = a − ib, this term becomes
y(x) = ... + α1 e^(ax+ibx) + α2 e^(ax−ibx) + ...,
which can be expressed in the form
y(x) = ... + (α1 e^(ibx) + α2 e^(−ibx)) e^(ax) + ....
Following the same argument used for difference equations, requiring that y(x) be a real function forces α1 and α2 to be complex conjugates, in which case we obtain
y(x) = ... + (β1 cos(bx) + β2 sin(bx)) e^(ax) + ...,
where β1 and β2 are real arbitrary constants.
To summarise, given a complex conjugate pair m1 and m2 , the real part a becomes the
argument of the exponential function and the imaginary part b becomes the argument of
the trigonometric functions. Such solutions describe oscillations with amplitude that varies
in time.
Example 37.2.4 Suppose that the polynomial P (m) associated with the differential
equation P (D)y = 0 can be factorised as follows:
P (m) ≡ (m + 5)2 (m − 3 − 2i)(m − 3 + 2i).
Then, the general solution for y(x) is given by
y(x) = (α1 + α2 x)e−5x + (α3 cos(2x) + α4 sin(2x))e3x
where α1 , α2 , α3 and α4 are all real arbitrary constants.
Finally, let us present a result about the long term behaviour of the solutions of P (D)y = 0:
Given a homogeneous linear differential equation with constant coefficients of
order n, P (D)y = 0, all solutions y(x) tend to 0 as x → ∞ if and only if all
roots of the auxiliary equation P (m) = 0 have negative real part.
In general, it is possible to analyse the behaviour of the solutions of P (D)y = 0 in a systematic way by focusing on the dominant term appearing in these solutions. The following
example illustrates how:
Example 37.2.5 Discuss the behaviour as x → ∞ of the general solution
y(x) = Ae−3x + Be2x + Ce3x + e−x (Dcosx + Esinx) :
We observe that e−3x → 0 and e−x → 0 as x → ∞. Therefore,
y(x) → Be2x + Ce3x .
The dominant term as x → ∞ is e3x . Therefore, the constant C will determine the
behaviour of the solution.
If C > 0, then y(x) → ∞.
If C < 0, then y(x) → −∞.
If C = 0, then the dominant term is e2x . Therefore, the constant B will determine the
behaviour of the solution. Hence,
If C = 0 and B > 0, then y(x) → ∞.
If C = 0 and B < 0, then y(x) → −∞.
If C = 0 and B = 0, then y(x) → 0 for all A and D and E.
37.3 Solving ODEs of the form P (D)y = Q(x)
As with difference equations, the general solution of the non-homogeneous equation P (D)y =
Q(x) is constructed in three steps:
STEP 1: We find the general solution of the homogeneous part
P (D)y(x) = 0.
This solution is called the complementary function and is denoted by (CF )(x). It can
be obtained by the method developed for homogeneous ODEs. If the polynomial P (D)
has degree n, then (CF )(x) contains n real arbitrary constants.
STEP 2: We then find a single solution of the non-homogeneous equation
P (D)y(x) = Q(x).
This solution is called a particular integral and is denoted by (P I)(x). The method by
which we obtain a particular integral is similar to that introduced for difference equations.
STEP 3: The general solution of the non-homogeneous equation P (D)y(x) = Q(x) is the
sum of the particular integral and the complementary function:
y(x) = (P I)(x) + (CF )(x).
Example 37.3.1 Solve the differential equation
(D2 − 3D + 2)y(x) = 5.
Considering the homogeneous part (D2 − 3D + 2)y(x) = 0, we find that the auxiliary
equation m2 − 3m + 2 = 0 yields the distinct roots
m1 = 1,
m2 = 2.
The complementary function is therefore
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
For a particular integral, we try a function (P I)(x) that has a chance of producing an
identity in x when substituted into the non-homogeneous differential equation. Recall that
this is precisely what we mean by a solution.
Here, we need to satisfy y''(x) − 3y'(x) + 2y(x) = 5, so a constant solution y(x) = a should definitely work. Indeed, substituting y(x) = a into y''(x) − 3y'(x) + 2y(x) = 5 we find
2a = 5,
which means that a particular integral is
(P I)(x) = 5/2.
Therefore, the general solution of the equation (D² − 3D + 2)y(x) = 5 is
y(x) = (P I)(x) + (CF )(x) = 5/2 + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
Example 37.3.2 Solve the differential equation
(D2 − 3D + 2)y(x) = cos(x).
The left hand side is identical to that of Example 37.3.1, so the complementary function
remains
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
Regarding a particular integral, we need to satisfy y''(x) − 3y'(x) + 2y(x) = cos(x), so a
natural candidate is y(x) = acos(x) where a is a constant to be determined. However,
substituting this expression into the differential equation is clearly going to produce sin(x)
terms as well; so a single constant a will not be enough to satisfy an identity involving both
functions cos(x) and sin(x).
Indeed, if y(x) = a cos(x), then y'(x) = −a sin(x) and y''(x) = −a cos(x), so the equation y''(x) − 3y'(x) + 2y(x) = cos(x) becomes
−acos(x) + 3asin(x) + 2acos(x) = cos(x).
Rearranging, we get
(a − 1)cos(x) + 3asin(x) = 0,
which cannot be an identity in x unless both coefficients vanish. This leads us to the
inconsistent system
a=1
and
a = 0.
On the other hand, if we try y(x) = acos(x) + bsin(x) as a particular integral, we will have
two constants a and b at our disposal and still two functions to deal with (since cos(x) and
sin(x) are not going to produce any more functions when substituted into the differential
equation). Therefore, the form y(x) = acos(x) + bsin(x) for a particular integral should
work:
Letting y(x) = a cos(x) + b sin(x), we find y'(x) = −a sin(x) + b cos(x) and y''(x) = −a cos(x) − b sin(x), so the equation
y''(x) − 3y'(x) + 2y(x) = cos(x)
becomes
−acos(x) − bsin(x) + 3asin(x) − 3bcos(x) + 2acos(x) + 2bsin(x) = cos(x).
Rearranging, we get
(a − 3b − 1)cos(x) + (b + 3a)sin(x) = 0.
This produces an identity in x provided that the coefficients of cos(x) and sin(x) are zero.
Solving the resulting set of simultaneous equations, we find
a = 1/10 and b = −3/10.
Therefore, a particular integral is
(P I)(x) = (1/10) cos(x) − (3/10) sin(x)
and the general solution is
y(x) = (1/10) cos(x) − (3/10) sin(x) + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
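Substituting the particular integral back into the equation is easy to automate; a small sympy check (assuming that package is available):

import sympy as sp

x = sp.symbols('x')
PI = sp.cos(x) / 10 - 3 * sp.sin(x) / 10               # the particular integral found above
print(sp.simplify(PI.diff(x, 2) - 3 * PI.diff(x) + 2 * PI - sp.cos(x)))   # 0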
Example 37.3.3 Solve the differential equation
(D2 − 3D + 2)y(x) = 4e−x .
The complementary function is still
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
Regarding a particular integral, we need to satisfy y''(x) − 3y'(x) + 2y(x) = 4e^(−x), so a natural candidate is y(x) = a e^(−x) for some a to be determined. This expression is actually sufficient, because substituting it into the differential equation is not going to produce any other functions. In other words, a single constant a is sufficient to deal with the single function e^(−x).
Indeed, letting y(x) = a e^(−x), we get y'(x) = −a e^(−x) and y''(x) = a e^(−x), so the equation
y''(x) − 3y'(x) + 2y(x) = 4e^(−x)
becomes
a e^(−x) + 3a e^(−x) + 2a e^(−x) = 4e^(−x).
Rearranging, we get
(6a − 4)e^(−x) = 0,
which produces a valid identity provided that a = 2/3. Therefore, a particular integral is
(P I)(x) = (2/3) e^(−x)
and the general solution is
y(x) = (2/3) e^(−x) + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
Example 37.3.4 Solve the differential equation
(D2 − 3D + 2)y(x) = 3ex .
The complementary function is still
(CF )(x) = Aex + Be2x ,
where A and B are arbitrary real constants.
We now need to note that the function ex on the right hand side of the equation also appears
in the complementary function. As a result, the selection of a particular integral requires
some thought. Let us recall what the problem is: We need to satisfy y''(x) − 3y'(x) + 2y(x) =
3ex . A natural candidate is y(x) = aex for some a to be determined. However, any
term of the form aex is included in the complementary function and therefore satisfies
the homogeneous equation. Therefore, if we substitute y(x) = aex into the equation
y''(x) − 3y'(x) + 2y(x) = 3e^x we will obtain the inconsistent relation 0 = 3e^x. As with
difference equations, multiplying by x in order to obtain y(x) = axex fixes the problem.
Note that you are not going to encounter cases which are more complicated than this
one, so you do not need to worry about extending this rule further. General rules for
constructing particular integrals do exist; some can be found in our Calculus textbook.
Here, letting y(x) = axe^x, we find y'(x) = ae^x + axe^x and y''(x) = 2ae^x + axe^x. The equation
y''(x) − 3y'(x) + 2y(x) = 3e^x
then becomes
2ae^x + axe^x − 3ae^x − 3axe^x + 2axe^x = 3e^x.
Rearranging, we get
(−a − 3)e^x = 0,
which gives a valid identity provided that
a = −3.
Therefore, a particular integral is
(P I)(x) = −3xe^x
and the general solution is
y(x) = −3xe^x + Ae^x + Be^(2x),
where A and B are arbitrary real constants.
37.4 Exercises for self study
Exercise 37.4.1 Write down a linear differential equation P (D)y = 0 whose solutions
include the function e^(−2x) and another such equation whose solutions include the function
x2 ex . Hence write down a homogeneous linear differential equation whose solutions include
both e−2x and x2 ex together with its general solution.
Exercise 37.4.2 Find the general solution of the differential equation
d⁴y/dx⁴ + 2 d³y/dx³ + d²y/dx² = 2cos(x).
Exercise 37.4.3 (a) Find the particular solution of the equation
d²y/dx² − 2 dy/dx + 10y = 4
which satisfies the conditions y(0) = 1 and y(π/6) = 0.
(b) Describe the behaviour of this solution as x → ∞.
Exercise 37.4.4 Find the general solution of the differential equation
d⁴y/dx⁴ + 2 d³y/dx³ + d²y/dx² = sin(x).
37.5 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Sections 14.1 to 14.8 of our Calculus Textbook are all relevant.
38 Differential and difference equations, 5 of 5
38.1 Ordinary and partial differential equations
In Lecture 37, we discussed ordinary differential equations of the special form
P (D)y(x) = Q(x). In general, a differential equation is an equation which contains
at least one derivative of an unknown function f . A solution of a differential equation
is a relation between the variables involved in the differential equation which is free from
derivatives and which is consistent with the differential equation.
Example 38.1.1 The equation df/dx = f is a differential equation for the unknown function
f . A solution of this differential equation is f = ex . This defines a relation between the
variables x and f which is free from derivatives and which is consistent with the differential
equation in the following sense: If we replace f by ex in the differential equation we obtain
the identity ex = ex . On the other hand, the relation f = x2 is not a solution of this
differential equation. If we replace f by x2 in the differential equation we obtain 2x = x2
which is not an identity in x.
A differential equation for f which involves only one independent variable is called an
ordinary differential equation. A differential equation for f which involves two or
more independent variables is called a partial differential equation.
Example 38.1.2 The equation x ∂f/∂x + y ∂f/∂y + z ∂f/∂z = 2f is a partial differential equation for f(x, y, z). A solution of this equation is f = x² + y² + z². If we replace f by x² + y² + z² in the differential equation we obtain the identity 2x² + 2y² + 2z² = 2x² + 2y² + 2z².
The order of a differential equation is the highest order of any derivative appearing in the
equation.
Example 38.1.3 The equation x ∂f/∂x = f² + y is a first-order partial differential equation for f(x, y), and the equation d³f/dx³ = f⁵ + 2x⁶ − df/dx is a third-order ordinary differential equation for f(x).
The degree of a differential equation is the algebraic degree with which the derivative of
highest order appears in the differential equation.
Example 38.1.4 The equation x (∂f/∂x)⁴ + ∂f/∂y = f is a first-order partial differential equation of degree four for f(x, y), and the equation (d³f/dx³)² = (df/dx)⁵ + f is a third-order ordinary differential equation of degree two for f(x).
The general solution of a differential equation is the collection of all solutions of this
equation.
Given a differential equation, initial and boundary conditions are additional requirements
that solutions of this equation must satisfy. Boundary and initial conditions limit the
general solution of a differential equation.
38.2 Separable ODEs
For all ordinary differential equations, we will denote the independent variable by x and
the unknown function by y(x). Of course, given an ordinary differential equation for y(x),
we can also interpret it as defining x in terms of y; that is, as a differential equation for
the local inverse function x(y). Sometimes this is actually preferable.
A separable ordinary differential equation is an equation that can be arranged in the form
dy/dx = F(x)G(y),
where F(x) and G(y) are given functions of x and y. Note the product structure on the right hand side. The most familiar case of a separable differential equation is the case G(y) = 1. The equation then becomes
dy/dx = F(x).
Its general solution is given by any primitive of F(x) plus an arbitrary constant:
y(x) = ∫ F(x) dx + C.
Example 38.2.1 Solve the separable differential equation
dy/dx = 1/(x² + 1)
subject to the condition that y = 5 when x = 0.
By recognition, we have y(x) = ∫ 1/(x² + 1) dx = arctan(x) + C, which corresponds to the general solution of the above equation. Imposing the condition that y = 5 when x = 0, we obtain the particular solution of this equation given by
y(x) = arctan(x) + 5.
Another simple case of a separable differential equation is the case F(x) = 1. The equation then becomes dy/dx = G(y). This is actually identical to the previous case if we regard it as a differential equation for the ‘local inverse function’ x(y).
Example 38.2.2 Find the general solution of the separable differential equation
dy/dx = y² + 1.
We realise that this equation is equivalent to
dx/dy = 1/(y² + 1).
Hence, by recognition, we have x(y) = ∫ 1/(y² + 1) dy = arctan(y) + C. Note that we can make y the subject of this relation, obtaining
y(x) = tan(x − C).
This corresponds to the general solution of the equation dy/dx = y² + 1. Indeed, replacing y by tan(x − C) in this equation we obtain sec²(x − C) = tan²(x − C) + 1, which is an identity in x.
The method of solution followed in Examples 38.2.1 and 38.2.2 was to separate the variables and then integrate. This method can be applied to any differential equation of the form dy/dx = F(x)G(y), which justifies why these equations are called separable. The method of solution is simply this: We separate the variables,
dy/G(y) = F(x) dx,
and integrate in order to obtain the general solution
∫ dy/G(y) = ∫ F(x) dx + C,
where C is an arbitrary constant of integration.
Example 38.2.3 Solve the differential equation
dy/dx = y²/x².
We see that this is a separable equation. We arrange it in the form ∫ dy/y² = ∫ dx/x² and perform the integration in order to obtain
−1/y = −1/x + C.
Making y the subject of this equation leads to the so-called explicit form of the general solution; namely
y(x) = x/(1 − Cx).
You may confirm by the quotient rule that y(x) = x/(1 − Cx) is indeed a solution of the differential equation dy/dx = y²/x².
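This confirmation can also be done with sympy (assuming the package is available):

import sympy as sp

x, C = sp.symbols('x C')
y = x / (1 - C * x)
print(sp.simplify(y.diff(x) - y**2 / x**2))   # 0, so y = x/(1 - Cx) solves dy/dx = y^2/x^2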
38.3 Introduction to partial differential equations
Solving general partial differential equations goes beyond the scope of this course. However,
the need for solving such equations does arise even in the context of analysing ordinary
differential equations. For example, the so-called exact ordinary differential equations
(presented in the next subsection) require some knowledge of simple partial differential
equations.
The partial differential equations for a function f (x, y) that are needed for our purposes
have the following form:
∂^(i+j) f(x, y)/(∂x^i ∂y^j) = G(x, y),
where i and j take values in the set {0, 1, 2} and G(x, y) is a given function.
Any such equation is solved by successive integrations of the partial derivatives ∂^(i+j) f(x, y)/(∂x^i ∂y^j) until we reach the function f(x, y). Note that every time we integrate with respect to one of the independent variables x or y, we need to introduce an arbitrary function of the other variable, not just an arbitrary constant.
Example 38.3.1 Find the general solution of the partial differential equation
∂²f/∂x∂y = x² + xy + 3.
Let us integrate first with respect to y and then with respect to x, noting that the order in
which we integrate does not affect the final answer. Regarding the y integration, we treat
x as a fixed number and introduce an arbitrary function k(x), because this is the most
general term that can be added to the primitive which is consistent with the requirement
that its partial derivative with respect to y is equal to zero. We find
∂f/∂x = x²y + (1/2)xy² + 3y + k(x).
Regarding the x integration, we now treat y as a fixed number. We also realise that the
integral of k(x) is another arbitrary function K(x). Finally, we introduce an arbitrary
function L(y), because this is the most general term that can be added to the primitive
which is consistent with the requirement that its partial derivative with respect to x is
equal to zero. We obtain the final answer
f = (1/3)x³y + (1/4)x²y² + 3xy + K(x) + L(y).
You may confirm that this expression for f (x, y) solves the partial differential equation
∂²f/∂x∂y = x² + xy + 3.
38.4 Exact ODEs
Consider a first order differential equation arranged in the form
M (x, y)dx + N (x, y)dy = 0,
where M (x, y) and N (x, y) are given functions of x and y. This equation is equivalent to dy/dx = −M (x, y)/N (x, y), so it is a quite general first-order differential equation. Now suppose that there exists a function F (x, y) with the property that
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y).
Note that such a function may not actually exist; however, if it exists, the differential
equation M (x, y)dx + N (x, y)dy = 0 can be expressed in the form
∂F (x, y)/∂x dx + ∂F (x, y)/∂y dy = 0.
Dividing by dx we obtain the equivalent equation
∂F (x, y)/∂x + ∂F (x, y)/∂y · dy/dx = 0,
which implies that the derivative of the composite function F (x, y(x)) with respect to x is equal to 0; i.e., (d/dx) F (x, y(x)) = 0.
Hence, the expression F (x, y(x)) = C for some arbitrary constant C yields the general solution of the differential equation. We say that the relation F (x, y) = C defines the function y(x) implicitly in terms of x.
Before we develop this theory further, let us consider an example of a differential equation of the form M (x, y)dx + N (x, y)dy = 0 for which a function F (x, y) with the property that ∂F/∂x = M (x, y) and ∂F/∂y = N (x, y) does exist. This example clarifies what one means by the statement that “F (x, y) = C defines the general solution y(x) implicitly in terms of x”.
Example 38.4.1 Solve the differential equation (y + 3x²)dx + x dy = 0.
Following the previous approach, we would like to find a function F (x, y) such that
∂F/∂x = y + 3x² and ∂F/∂y = x.
We see that we have a system of simultaneous partial differential equations of the simplest
kind. Let us solve it by finding the general solution of the PDE on the left and then
substituting this solution into the PDE on the right in order to obtain a solution consistent
with both partial differential equations.
The general solution of the PDE on the left is F (x, y) = xy + x³ + g(y), where g(y) is an arbitrary function. Substituting this solution into the PDE on the right we find
x + g'(y) = x,
which implies that g(y) = A, where A is an arbitrary constant. Therefore, we update our solution F (x, y) = xy + x³ + g(y), which now becomes F (x, y) = xy + x³ + A.
According to the theory developed so far, the general solution y(x) of the differential equation (y + 3x²)dx + x dy = 0 is obtained implicitly by setting F (x, y) equal to a constant C. Realising that the constant A in F (x, y) = xy + x³ + A can be absorbed in the constant C, we have
xy + x³ = C.
This defines the general solution y(x) implicitly in terms of x. By making y the subject of this equation, we obtain the general solution y(x) in explicit form:
y(x) = C/x − x².
You may confirm that this expression solves the differential equation dy/dx = (−y − 3x²)/x, which corresponds to the expanded form (y + 3x²)dx + x dy = 0.
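A short sympy check of the explicit solution (assuming the package is available):

import sympy as sp

x, C = sp.symbols('x C')
y = C / x - x**2
# dy/dx should equal -(y + 3x^2)/x, the expanded form of (y + 3x^2)dx + x dy = 0
print(sp.simplify(y.diff(x) + (y + 3 * x**2) / x))   # 0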
Let us now resume the development of the theory by stating conditions on the given
functions M (x, y) and N (x, y) that guarantee the existence of a function F (x, y). As in
Example 38.4.1, the function F (x, y) exists provided that it satisfies the system of partial
differential equations
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y).
However, since the partial derivatives of F (x, y) commute, the above system of equations
holds if and only if
∂M/∂y = ∂²F/∂y∂x = ∂²F/∂x∂y = ∂N/∂x.
Any differential equation of the form M (x, y)dx + N (x, y)dy = 0 such that the given
functions M (x, y) and N (x, y) satisfy
∂M/∂y = ∂N/∂x
is called an exact ordinary differential equation.
Revisiting Example 38.4.1, we can confirm that ∂M/∂y = 1 and ∂N/∂x = 1. So the differential equation in Example 38.4.1 was exact, which explains why it was possible to find a function F (x, y) in that case.
We are now able to summarise the method of solution of the differential equation
M (x, y)dx + N (x, y)dy = 0.
We first check if ∂M/∂y = ∂N/∂x. If this relation is not valid, the equation is not exact, so we need to follow some other approach. If this relation holds, the equation is exact and F (x, y) exists. In order to find F (x, y) we solve the system of partial differential equations
∂F/∂x = M (x, y) and ∂F/∂y = N (x, y),
whose solution is guaranteed by the fact that the equation is exact. Having found F (x, y),
we set it equal to a constant C. The relation F (x, y) = C defines the general solution y(x)
of the differential equation M (x, y)dx + N (x, y)dy = 0 implicitly. If possible, we make y
the subject of F (x, y) = C in order to obtain the general solution for y(x) explicitly.
38.5 Linear ODEs
A linear ordinary differential equation in y is an equation for the function y(x) that can
be arranged in the form
dy/dx + P (x)y = Q(x),
where P (x) and Q(x) are given functions. In order to derive the solution of this equation,
we express it in the form
[P (x)y − Q(x)] dx + (1) dy = 0
and check if this equation is exact. We have:
∂[P (x)y − Q(x)]/∂y = P (x) and ∂(1)/∂x = 0.
We conclude that this equation is not exact unless the given function P (x) = 0, in which case the equation becomes the separable equation dy/dx = Q(x). We already know how to solve a separable equation, so the case P (x) = 0 is of no real interest. In the case of a non-zero P (x), the interesting fact is that the equation
[P (x)y − Q(x)] dx + dy = 0
is made exact by multiplying it by the function I(x) = e^(∫P(x)dx). This function I(x) is called an integrating factor. In order to confirm this, consider the equivalent differential equation arranged in the form
[e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x)] dx + e^(∫P(x)dx) dy = 0
and perform the standard test. We have:
∂[e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x)]/∂y = e^(∫P(x)dx) P (x) and ∂(e^(∫P(x)dx))/∂x = e^(∫P(x)dx) P (x),
so the equation is now exact. Let us therefore solve it as an exact equation. We know that
a function F (x, y) exists such that
∂F/∂x = e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x) and ∂F/∂y = e^(∫P(x)dx).
Let us first integrate the partial differential equation on the right in order to obtain its
general solution. We see that this is given by
F = e^(∫P(x)dx) y + g(x),
where g(x) is an arbitrary function. We substitute this general solution into the partial
differential equation on the left in order to obtain a solution consistent with both partial
differential equations. We find that
e^(∫P(x)dx) P (x)y + g'(x) = e^(∫P(x)dx) P (x)y − e^(∫P(x)dx) Q(x),
which reduces to
g'(x) = −e^(∫P(x)dx) Q(x).
Therefore,
g(x) = −∫ e^(∫P(x)dx) Q(x) dx,
and the function F (x, y) is updated to
F (x, y) = e^(∫P(x)dx) y − ∫ e^(∫P(x)dx) Q(x) dx.
Finally, we set F (x, y) equal to a constant C in order to obtain the general solution of the linear differential equation dy/dx + P (x)y = Q(x) for the function y(x). This gives
e^(∫P(x)dx) y − ∫ e^(∫P(x)dx) Q(x) dx = C.
Denoting the integrating factor by I(x) = e^(∫P(x)dx), we obtain a rather simple general solution for y(x) in explicit form:
y(x) = (1/I(x)) [∫ I(x)Q(x) dx + C].
Now that we have derived this result, we can simply memorise it. So, we have the following
method of solution:
Given a linear differential equation in y, arrange it in the form
dy/dx + P (x)y = Q(x)
(that is, arrange it so that the coefficient of dy/dx is equal to 1) and calculate the integrating factor I(x) = e^(∫P(x)dx). Then, the general solution for y(x) is given by
y(x) = (1/I(x)) [∫ I(x)Q(x) dx + C].
An example of a linear equation is given in the Practice Questions. Several more examples
of separable, exact and linear differential equations can be found in our Calculus Textbook.
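As a further sanity check on the integrating-factor formula, the Python/SymPy sketch below builds I(x) and the general solution for a hypothetical example dy/dx + y/x = x (the helper linear_solution is our own, not a library routine):

import sympy as sp

x, C = sp.symbols('x C')

def linear_solution(P, Q):
    I = sp.exp(sp.integrate(P, x))          # integrating factor I(x) = e^(integral of P dx)
    return (sp.integrate(I*Q, x) + C) / I   # y(x) = (1/I(x)) * (integral of I*Q dx + C)

# Hypothetical example: P(x) = 1/x, Q(x) = x.
y = linear_solution(1/x, x)
print(sp.simplify(y))   # C/x + x**2/3

# Cross-check against SymPy's own ODE solver; the result should agree
# with y above, up to the name of the arbitrary constant.
f = sp.Function('f')
print(sp.dsolve(sp.Eq(f(x).diff(x) + f(x)/x, x), f(x)))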
38.6 Homogeneous ODEs
Recall that a function f (x, y) is called homogeneous of degree n if
f(λx, λy) = λ^n f(x, y).
A homogeneous ordinary differential equation of degree n is a differential equation that
can be expressed in the form
M(x, y) + N(x, y)\,\frac{dy}{dx} = 0,
where the given functions M (x, y) and N (x, y) are both homogeneous functions of degree
n.
Example 38.6.1 The ordinary differential equation
\frac{(x + y)^4}{xy} + (x^2 − y^2)\,\frac{dy}{dx} = 0
is homogeneous of degree 2 because the functions
M(x, y) = \frac{(x + y)^4}{xy}   and   N(x, y) = x^2 − y^2
are both homogeneous of degree 2. Indeed, we have
M(tx, ty) = \frac{(tx + ty)^4}{(tx)(ty)} = \frac{t^4(x + y)^4}{t^2\,xy} = t^2 M(x, y)
and
N(tx, ty) = (tx)^2 − (ty)^2 = t^2(x^2 − y^2) = t^2 N(x, y).
Example 38.6.2 The ordinary differential equation
\frac{3y}{x} + \frac{2x}{y}\,\frac{dy}{dx} = 0
is homogeneous of degree 0. Without going through the proof, it should be clear that the
functions
M(x, y) = \frac{3y}{x}   and   N(x, y) = \frac{2x}{y}
are both homogeneous of degree 0.
Example 38.6.3 On the other hand, the ordinary differential equation
4x^3 + 5y^2\,\frac{dy}{dx} = 0
is not homogeneous. This is because the degree of the homogeneous function M(x, y) = 4x^3
(which is 3) is not equal to the degree of the homogeneous function N(x, y) = 5y^2 (which
is 2).
Example 38.6.4 Similarly, the ordinary differential equation
x^2 + \frac{4}{x} + y^2\,\frac{dy}{dx} = 0
is not homogeneous. This is because the function M(x, y) = x^2 + \frac{4}{x} is not homogeneous.
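When in doubt, this kind of classification is easy to check symbolically. The Python/SymPy sketch below is our own aside (the helper homogeneous_degree is not a library function); it tests whether f(tx, ty) = t^n f(x, y) for the functions in the examples above:

import sympy as sp

x, y, t = sp.symbols('x y t', positive=True)

def homogeneous_degree(f):
    # Ratio f(tx, ty)/f(x, y); for a homogeneous function of degree n this is t**n.
    ratio = sp.simplify(f.subs({x: t*x, y: t*y}, simultaneous=True) / f)
    n = sp.simplify(t*sp.diff(ratio, t)/ratio)   # equals n whenever ratio == t**n
    if n.free_symbols == set() and sp.simplify(ratio - t**n) == 0:
        return n
    return None   # not homogeneous

print(homogeneous_degree((x + y)**4/(x*y)))   # 2    (Example 38.6.1)
print(homogeneous_degree(x**2 - y**2))        # 2    (Example 38.6.1)
print(homogeneous_degree(4*x**3))             # 3    (Example 38.6.3)
print(homogeneous_degree(x**2 + 4/x))         # None (Example 38.6.4)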
38.7 Solving Homogeneous ODEs
In order to solve a homogeneous ODE we replace the dependent variable y(x) by the new
dependent variable z(x) defined by
z(x) = \frac{y(x)}{x}.
In other words, we express y and \frac{dy}{dx} in terms of x, z and \frac{dz}{dx} according to
y = xz   and   \frac{dy}{dx} = z + x\,\frac{dz}{dx}
and use these expressions in the differential equation in order to eliminate y and \frac{dy}{dx}. It
can be shown that this always results in an ordinary differential equation for z(x) which
is separable. After solving this differential equation for z(x), we use the relation y(x) =
xz(x) in order to obtain the corresponding solution for y(x).
Example 38.7.1 For x > 0, solve the ordinary differential equation
2x^2\,\frac{dy}{dx} = x^2 + y^2
subject to the condition that y = 7 when x = 1.
We observe that this is a homogeneous ODE of degree 2. We use the relations
y = xz   and   \frac{dy}{dx} = z + x\,\frac{dz}{dx}
to obtain an ODE for the function z(x), namely
2x^2\left(z + x\,\frac{dz}{dx}\right) = x^2 + x^2 z^2.
We eliminate the factor x^2,
2\left(z + x\,\frac{dz}{dx}\right) = 1 + z^2,
and send the term 2z to the right hand side. The resulting equation is clearly separable:
2x\,\frac{dz}{dx} = z^2 − 2z + 1.
We separate the variables and integrate:
2\int \frac{dz}{z^2 − 2z + 1} = \int \frac{dx}{x}.
The denominator of the integrand on the left hand side is a complete square, so we have
2\int \frac{dz}{(z − 1)^2} = \int \frac{dx}{x}
which yields the general solution for z(x) in implicit form:
−\frac{2}{z − 1} = \ln(x) + C.
Note that we do not need to have ln|x| because we have been told that x > 0.
Before we apply the condition (x, y) = (1, 7) let us find the corresponding solution for the
function y(x). In fact, we can obtain the latter in explicit form. To this end, we make z(x)
the subject of the above relation to find that
z = 1 − \frac{2}{\ln(x) + C}
and then replace z by the ratio \frac{y}{x} in order to obtain the general solution for y(x):
y = x\left(1 − \frac{2}{\ln(x) + C}\right).
Finally, using the condition that y is equal to 7 when x is equal to 1, we find that
Finally, using the condition that y is equal to 7 when x is equal to 1, we find that
7 = 1 − \frac{2}{C}.
The solution of this equation is
The solution of this equation is
C = −\frac{1}{3}.
Hence, the particular solution for y(x) consistent with both the differential equation and
the given condition is
y = x\left(1 − \frac{2}{\ln(x) − \frac{1}{3}}\right).
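A quick Python/SymPy check (our own verification, not part of the notes) confirms that this function satisfies both the differential equation and the condition y(1) = 7:

import sympy as sp

x = sp.symbols('x', positive=True)
y = x*(1 - 2/(sp.log(x) - sp.Rational(1, 3)))

print(y.subs(x, 1))                                     # 7
print(sp.simplify(2*x**2*sp.diff(y, x) - x**2 - y**2))  # 0, so 2x^2 dy/dx = x^2 + y^2 holds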
38.8 Solving ODEs by changing the dependent variable
We saw in the previous section that homogeneous equations are solved by a change of
variable which transforms the homogeneous equation for y(x) into a separable equation
for z(x) = \frac{y(x)}{x}. More generally, a change of variable may convert a rather complicated
ODE into one of the forms that we have already studied. Several examples and exercises
regarding this technique can be found in sections 12.8 and 12.13 of our Calculus textbook.
A differential equation that requires a change of variable is presented in Exercise 38.9.4
below.
38.9 Exercises for self study
Exercise 38.9.1 Find the general solution of the following equations by any appropriate
method.
(a) (x^2 + 6x)\,dy = y^2\,dx − 12\,dy
(b) 2x\,\frac{dy}{dx} = 2\sqrt{x} + y + 2
Exercise 38.9.2 Solve the linear differential equation
x^2\,\frac{dy}{dx} + 2xy = 6x^2
(a) as a homogeneous differential equation
(b) as an exact differential equation.
Exercise 38.9.3 Find the general solution of the following equations by any appropriate
method:
(a) xy\,\frac{dy}{dx} = e^{−(3x^2+5)}
(b) \bigl(\cos(y) + y\cos(x)\bigr)\,dx + \bigl(\sin(x) − x\sin(y)\bigr)\,dy = 0
Exercise 38.9.4 (a) Find the general solution of the following equation by an appropriate
method:
xy\,\frac{dy}{dx} − y^2 = 3x^2 e^{2y/x}.
(b) Solve the following differential equation by changing the dependent variable from y to
u = arctan(y):
(1 + y^2) = \bigl(\arctan(y) − x\bigr)\,\frac{dy}{dx}.
38.10 Relevant sections from the textbooks
• K. Binmore and J. Davies, Calculus, Concepts and Methods, Cambridge University Press.
Chapter 12 of our Calculus Textbook is relevant.
39 Systems of difference and differential equations
39.1 Linear homogeneous systems of difference equations
Example 39.1.1 Consider two sequences {xt } and {yt } related as follows: x0 = 1, y0 = 2,
and, for t ≥ 0,
x_{t+1} = 7x_t − 15y_t
y_{t+1} = 2x_t − 4y_t
i.e.,
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} x_t \\ y_t \end{pmatrix}.
Clearly, the above difference equations together with the initial conditions determine the
sequences {xt } and {yt } uniquely; i.e.,
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} −23 \\ −6 \end{pmatrix},   \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}\begin{pmatrix} −23 \\ −6 \end{pmatrix} = \begin{pmatrix} −71 \\ −22 \end{pmatrix},
and so on. However, as these equations stand, we do not have a method for obtaining
explicit expressions for the solutions. The problem is that these equations are coupled. In
order to solve xt+1 = 7xt − 15yt for xt we need to know yt ; however, we cannot find yt by
solving yt+1 = 2xt − 4yt because we do not know xt .
Diagonalisation provides a way out of this problem. Indeed, as long as the matrix
A = \begin{pmatrix} 7 & −15 \\ 2 & −4 \end{pmatrix}
appearing in the system \begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = A\begin{pmatrix} x_t \\ y_t \end{pmatrix} is diagonalisable; that is,
as long as an invertible matrix P and a diagonal matrix D exist such that A = PDP^{−1},
the system of equations \begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = A\begin{pmatrix} x_t \\ y_t \end{pmatrix} can be expressed in the form
\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = PDP^{−1}\begin{pmatrix} x_t \\ y_t \end{pmatrix},   i.e.,   P^{−1}\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} = DP^{−1}\begin{pmatrix} x_t \\ y_t \end{pmatrix}.
Then, if we let x_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix} and introduce a new variable z_t = \begin{pmatrix} X_t \\ Y_t \end{pmatrix} by
z_t = P^{−1}x_t   i.e.,   x_t = Pz_t,
our system becomes
z_{t+1} = Dz_t.
This means that the corresponding equations for the sequences {Xt } and {Yt } have been
uncoupled and can now be solved.
In particular, it turns out that the eigenvalues of A are λ1 = 1 and λ2 = 2 and the
corresponding eigenspaces are
N(A − I) = N\begin{pmatrix} 6 & −15 \\ 2 & −5 \end{pmatrix} = Lin\left\{\begin{pmatrix} 5 \\ 2 \end{pmatrix}\right\}
N(A − 2I) = N\begin{pmatrix} 5 & −15 \\ 2 & −6 \end{pmatrix} = Lin\left\{\begin{pmatrix} 3 \\ 1 \end{pmatrix}\right\}.
Therefore, A can be written in the form A = PDP^{−1}, where P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix} and D =
\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. Substituting the diagonal matrix D in the system z_{t+1} = Dz_t, we obtain the
uncoupled equations
\begin{pmatrix} X_{t+1} \\ Y_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} X_t \\ Y_t \end{pmatrix}   i.e.,   X_{t+1} = X_t,  Y_{t+1} = 2Y_t
These equations can be expressed in the standard way
(E − 1)X_t = 0
(E − 2)Y_t = 0,
where E is the shift operator. Their general solutions are
X_t = a(1)^t = a
Y_t = b(2)^t,
where a and b are arbitrary constants.
The corresponding solution for the original variables xt and yt can then be obtained by
using the relation xt = Pzt :
\begin{pmatrix} x_t \\ y_t \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} a \\ b(2)^t \end{pmatrix} = \begin{pmatrix} 5a + 3b(2)^t \\ 2a + b(2)^t \end{pmatrix}.
Finally, using the initial conditions x0 = 1 and y0 = 2, we obtain the simultaneous system
5a + 3b = 1
2a + b = 2
i.e.,   a = 5,  b = −8,
which implies that the required sequences {x_t}, {y_t} are
x_t = 25 − 24(2)^t
y_t = 10 − 8(2)^t.
We can verify that the first few terms, namely
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} −23 \\ −6 \end{pmatrix},   \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} −71 \\ −22 \end{pmatrix},
are in agreement with the terms obtained by using the system xt+1 = Axt directly.
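The diagonalisation itself can also be confirmed with a few Python/SymPy lines (our own aside, not part of the notes):

import sympy as sp

A = sp.Matrix([[7, -15], [2, -4]])
P = sp.Matrix([[5, 3], [2, 1]])
D = sp.diag(1, 2)

print(A*P - P*D)         # zero matrix: the columns of P are eigenvectors for 1 and 2
print(P*D*P.inv() - A)   # zero matrix: A = P D P^(-1)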
An alternative, faster, method for obtaining the above solutions is the following: Starting
from the system xt+1 = Axt , we realise that
x_1 = Ax_0,  x_2 = Ax_1 = A^2 x_0,  x_3 = Ax_2 = A^3 x_0, ..., etc.
This gives us an expression for the solution x_t; namely
x_t = A^t x_0.
We now use the fact that
A^t = (PDP^{−1})^t = PD^t P^{−1},
which gives us the required solution for xt in explicit form:
x_t = PD^t P^{−1} x_0 = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} (1)^t & 0 \\ 0 & (2)^t \end{pmatrix}\begin{pmatrix} −1 & 3 \\ 2 & −5 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 & 3(2)^t \\ 2 & (2)^t \end{pmatrix}\begin{pmatrix} 5 \\ −8 \end{pmatrix} = \begin{pmatrix} 25 − 24(2)^t \\ 10 − 8(2)^t \end{pmatrix}.
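For a numerical confirmation, the Python/NumPy sketch below (our own check, not part of the notes) compares the closed form PD^tP^{−1}x_0 and the explicit formulas with direct iteration of x_{t+1} = Ax_t:

import numpy as np

A = np.array([[7.0, -15.0], [2.0, -4.0]])
P = np.array([[5.0, 3.0], [2.0, 1.0]])     # columns: the eigenvectors (5, 2) and (3, 1)
eigenvalues = np.array([1.0, 2.0])
x0 = np.array([1.0, 2.0])

for t in range(5):
    closed = P @ np.diag(eigenvalues**t) @ np.linalg.inv(P) @ x0   # P D^t P^(-1) x0
    formula = np.array([25 - 24*2**t, 10 - 8*2**t])                # the sequences found above
    direct = np.linalg.matrix_power(A, t) @ x0                     # A^t x0
    print(t, closed, formula, direct)   # all three agree for each t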
39.2 Linear homogeneous systems of differential equations
Example 39.2.1 Solve the system of differential equations
\frac{dx(t)}{dt} = 7x(t) − 15y(t)
\frac{dy(t)}{dt} = 2x(t) − 4y(t)
for the functions x(t) and y(t), subject to the initial conditions x(0) = 1 and
y(0) = 2. Note that the coefficient matrix of this system is the matrix A introduced
in Example 39.1.1.
As with the system of difference equations, we let x(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} and introduce a new
variable z(t) = \begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} by x(t) = Pz(t), where P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix} is the transition matrix used
in Example 39.1.1. The system of differential equations for the new functions X(t) and
Y(t) is then diagonal:
\frac{dX(t)}{dt} = X(t)
\frac{dY(t)}{dt} = 2Y(t)
These equations are both linear with constant coefficients and are also separable. Regarding
them as separable, we have
\int \frac{dX}{X} = \int dt ⇐⇒ \ln X = t + α ⇐⇒ X = e^{α+t} = Ae^t
and
\int \frac{dY}{Y} = 2\int dt ⇐⇒ \ln Y = 2t + β ⇐⇒ Y = e^{β+2t} = Be^{2t},
where A and B are arbitrary constants. Hence, the general solution for the original variables
x(t) and y(t) is given by
\begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = P\begin{pmatrix} X(t) \\ Y(t) \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} Ae^t \\ Be^{2t} \end{pmatrix} = \begin{pmatrix} 5Ae^t + 3Be^{2t} \\ 2Ae^t + Be^{2t} \end{pmatrix}.
Given the initial conditions x(0) = 1 and y(0) = 2 we obtain the simultaneous system
5A + 3B = 1
2A + B = 2
i.e.,   A = 5,  B = −8,
which implies that the required particular solutions for x(t) and y(t) are
x(t) = 25e^t − 24e^{2t}
y(t) = 10e^t − 8e^{2t}.
You may find it useful to verify that these functions indeed solve the system
\frac{dx(t)}{dt} = 7x(t) − 15y(t)
\frac{dy(t)}{dt} = 2x(t) − 4y(t).
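This verification can be delegated to Python/SymPy as well (our own check, not from the notes): the particular solutions are substituted into the system and into the initial conditions.

import sympy as sp

t = sp.symbols('t')
x = 25*sp.exp(t) - 24*sp.exp(2*t)
y = 10*sp.exp(t) - 8*sp.exp(2*t)

print(sp.simplify(sp.diff(x, t) - (7*x - 15*y)))   # 0, so dx/dt = 7x - 15y
print(sp.simplify(sp.diff(y, t) - (2*x - 4*y)))    # 0, so dy/dt = 2x - 4y
print(x.subs(t, 0), y.subs(t, 0))                  # 1 2, matching x(0) = 1 and y(0) = 2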
39.3 Exercises for self study
Exercise 39.3.1 (a) Find the general solution of the following system of linear differential
equations:
x'(t) = x(t) + 4y(t)
y'(t) = 3x(t) + 2y(t)
(b) Then find the unique solution satisfying the initial conditions x(0) = 1 and y(0) = 0.
Exercise 39.3.2 Find the general solution of the following system of difference equations
x_{t+1} = x_t + y_t
y_{t+1} = −2x_t + 4y_t
z_{t+1} = 4z_t
163
Exercise 39.3.3 Two supermarkets compete for customers in a region with 10,000
shoppers. It is assumed that each shopper shops exactly once in any given week (by going
to only one of the two supermarkets). It is known that during any given week, supermarket
A will keep 70% of its customers while losing 30% to supermarket B, and that supermarket
B will keep 80% of its customers while losing 20% to supermarket A. At the end of a certain
week (call it week zero), the total population of 10,000 shoppers was distributed as follows:
9,000 went to supermarket A and 1,000 went to supermarket B.
Let the variable x_t be given by x_t = \begin{pmatrix} x_t \\ y_t \end{pmatrix}, where x_t is the number of shoppers shopping
at supermarket A in week t and y_t is the number of shoppers shopping at supermarket
B in week t. Write down a system of difference equations in the form x_{t+1} = Mx_t for a
suitable matrix M and also state a suitable initial condition x_0 = \begin{pmatrix} a \\ b \end{pmatrix} which model the
given situation; i.e., which can be used to predict the number of shoppers shopping at each
supermarket in any future week t.
Exercise 39.3.4 Referring to the so-called Markov process described in Exercise 39.3.3,
diagonalise M and solve the system of difference equations subject to the appropriate
initial conditions. What is the long term distribution of shoppers shopping in the two
supermarkets (as percentages of the total number of shoppers in the region)?
39.4 Relevant sections from the textbooks
• M. Anthony and M. Harvey, Linear Algebra, Concepts and Methods, Cambridge University Press.
Sections 9.2 and 9.3 of our Algebra Textbook are relevant.