Contents

1 Calculus
  1.1 Functions of Two and Three Variables
      1.1.1 Review
      1.1.2 The Chain Rule
      1.1.3 Implicit differentiation
      1.1.4 Gradient and directional derivatives
      1.1.5 Optimisation
      1.1.6 Constrained Optimisation — Lagrange Multipliers
  1.2 Sequences & Series
      1.2.1 Sequences
      1.2.2 Series
      1.2.3 Taylor Polynomials
      1.2.4 Taylor Series
  1.3 Integration Techniques
      1.3.1 Review: Substitution
      1.3.2 Integration by Parts

2 Linear algebra
  2.1 Vector Spaces
      2.1.1 Review: Solving Linear Systems of Equations
      2.1.2 Linear Combinations
      2.1.3 Linear Independence and Dependence
      2.1.4 Definition of a Vector Space
      2.1.5 Basis and Dimension
      2.1.6 Subspaces of Rn
      2.1.7 Matrices and their Associated Subspaces in Rn
      2.1.8 The General Solution of Ax = b
  2.2 Inner Products and Orthogonality
      2.2.1 Orthogonal and orthonormal bases
      2.2.2 Orthogonal projection of one vector onto the line spanned by another vector
      2.2.3 The Gram-Schmidt process
      2.2.4 Least squares solutions of systems of linear equations
  2.3 Eigenvalues
      2.3.1 Determinants
      2.3.2 Eigenvalues: The characteristic equation of a matrix
      2.3.3 Diagonalisation of a Matrix
      2.3.4 Symmetric Matrices
      2.3.5 Markov Chains
      2.3.6 Discrete Dynamical Systems

3 Differential equations
  3.1 First-Order Differential Equations
      3.1.1 Introduction
      3.1.2 Terminology
      3.1.3 First-Order Differential Equations
  3.2 Systems of First-Order Differential Equations
      3.2.1 First-order linear homogeneous equations
      3.2.2 Systems of first-order linear DEs
  3.3 Homogeneous Linear Second-Order DEs with constant coefficients
      3.3.1 Introduction
      3.3.2 Solving Homogeneous Linear Second-Order DEs
      3.3.3 Homogeneous Linear DEs with Constant Coefficients
      3.3.4 Equivalence of Second Order DE and First-Order System of DEs

4 Appendix
  4.1 Vectors
      4.1.1 Vector Arithmetic
      4.1.2 Length, distance, and angles in Rn
  4.2 Vector Representation of Lines and Planes
      4.2.1 Vector Representation of Lines and Planes
  4.3 Systems of Linear Equations and Matrices
      4.3.1 Systems of Linear Equations
      4.3.2 Matrix notation and concepts

Calculus

1.1 Functions of Two and Three Variables

1.1.1 Review

Partial derivatives were covered in MATHS 108; as background material, we revise them in this subsection.

A function of one variable y = f(x) is a rule that assigns to each value of the independent variable x in a set on the x-axis exactly one value of the dependent variable y. The graph of a continuous function y = f(x) is a curve in the xy-plane.

A function of two variables z = f(x, y) is a rule that assigns to each pair of values of the independent variables x and y in a region of the xy-plane exactly one value of the dependent variable z. The graphs of continuous functions z = f(x, y) are surfaces in xyz-space; see the two examples below.

[Figure: surface plots of two functions of two variables]

Recall the notation for the partial derivatives of a function z = f(x, y) of two variables:

First Partial Derivatives
    ∂f/∂x = fx,    ∂f/∂y = fy

Second Partial Derivatives
    ∂²f/∂x² = ∂/∂x (∂f/∂x) = (fx)x = fxx,
    ∂²f/∂y∂x = ∂/∂y (∂f/∂x) = (fx)y = fxy,
    ∂²f/∂x∂y = ∂/∂x (∂f/∂y) = (fy)x = fyx,
    ∂²f/∂y² = ∂/∂y (∂f/∂y) = (fy)y = fyy.

Notes: For the functions we will meet in this course, the second partial derivatives satisfy fxy = fyx.
However, this is not the case in general.

Example 1.1.1. Find fx, fy, fxx, fxy, fyx and fyy for the following functions of two variables:
(a) f(x, y) = x sin(y)
(b) f(x, y) = √(x² + y²)
(c) f(x, y) = x√(xy)

Example 1.1.2. In many production processes, manufacturing costs consist of fixed costs (purchase or rental of equipment and facilities) and two variable costs: capital and labour. We let k denote units of capital, and l units of labour. If the variable cost function is
    C(k, l) = 108 + 18k + 40l,
find ∂C/∂k and ∂C/∂l, the marginal costs of capital and labour respectively.

Example 1.1.3. Find fx, fy, fz, fxz, fyx and fzx for the following functions of three variables:
(a) f(x, y, z) = ln(x² + y² + z²)
(b) f(x, y, z) = e^(2xy−z)

Example 1.1.4. The Matlab code to calculate the partial derivatives in the first example above is

%Matlab-session
syms x y z
f = log(x^2 + y^2 + z^2);
diff(f,x)
diff(f,y)
diff(f,z)
diff(diff(f,x),z)
diff(diff(f,y),x)
diff(diff(f,z),x)

1.1.2 The Chain Rule

Recall the Chain Rule for differentiation of a composite function of one variable:
    y = y(u(x))  ⇒  y′ = y′(u(x)) u′(x),
or in more suggestive notation
    dy/dx = (dy/du)(du/dx).
We now look at the extension of the Chain Rule to functions of more than one variable in the following two cases.

Case 1: Let z = f(u, v), u = u(x), and v = v(x). In this case the hierarchy of dependency is: z depends on u and v, each of which depends on x. To find the rate of change of z with respect to x, we travel down all paths from z to x, taking derivatives or partial derivatives as we go, multiplying together all derivatives along each path and adding these products:

Chain Rule
    dz/dx = (∂z/∂u)(du/dx) + (∂z/∂v)(dv/dx)

[Figure 1.1: Chain rule dependency diagram]

Note:
• If z is a function of one variable x, then use the notation dz/dx for the derivative.
• If z is a function of more than one variable, for instance z = z(x, y), then use the notation ∂z/∂x and ∂z/∂y for the derivatives.

Example 1.1.5.
(a) z = uv², u = cos(x), v = sin(x). Find dz/dx.
(b) z = e^x sin(y), x = t², y = ln(t). Find dz/dt.
(c) w = x/y + y/z, x = √t, y = cos(2t) and z = e^(−3t). Find dw/dt.
(d) v = xy²z³, x = sin(t), y = cos(t), z = 1 + e^(2t). Find dv/dt.

Case 2: Let z = z(u, v), u = u(x, y) and v = v(x, y). The hierarchy of dependency: z depends on u and v, each of which depends on x and y. To find the rate of change of z with respect to x, again travel down all paths from z to x (i.e. consider all dependencies of z on x), taking partial derivatives as you go, and multiplying together all derivatives along each path:

Chain Rule
    ∂z/∂x = (∂z/∂u)(∂u/∂x) + (∂z/∂v)(∂v/∂x),
    ∂z/∂y = (∂z/∂u)(∂u/∂y) + (∂z/∂v)(∂v/∂y).

Example 1.1.6.
(a) z = u cos(v) sin(u), u = e^(xy²), v = x² + y. Find ∂z/∂x and ∂z/∂y.
(b) z = ln(x² + y²), x = t² − s², y = 2st. Find ∂z/∂s and ∂z/∂t.
(c) z = x/y, x = re^(st), y = rse^t. Find ∂z/∂r, ∂z/∂s, and ∂z/∂t when (r, s, t) = (1, 2, 0).
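The Chain Rule can be checked symbolically in Matlab. The following is a minimal sketch of mine for Example 1.1.5(b), assuming the Symbolic Math Toolbox: it compares direct differentiation after substitution with the two-path chain-rule sum.

% Check of the Chain Rule for Example 1.1.5(b): z = e^x sin(y),
% with x = t^2 and y = ln(t).  (A sketch, not the notes' worked solution.)
syms t u v
x = t^2; y = log(t);
z = exp(x)*sin(y);                    % substitute first ...
dz_direct = simplify(diff(z, t))
zs = exp(u)*sin(v);                   % ... then compare with the chain rule:
dz_chain = simplify(subs(diff(zs,u), {u,v}, {x,y})*diff(x,t) ...
                  + subs(diff(zs,v), {u,v}, {x,y})*diff(y,t))
% dz_direct and dz_chain agree, as the Chain Rule predicts.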
1.1.3 Implicit differentiation

A function given by an equation F(x, y) = 0 is called an implicit function. By contrast, a function y = f(x) (i.e., y expressed explicitly in terms of x) is called an explicit function. For example, the functions given by the equation F(x, y) = x² + y² − 1 = 0 are implicit functions, but the functions y = √(1 − x²) and y = −√(1 − x²) are explicit.

To differentiate an implicit function, we can differentiate the given equation directly; we do not need to find the explicit function y = f(x). Indeed, it is not always possible to express a function explicitly. The technique of implicit differentiation was covered in MATHS 108 and is illustrated in the following example.

Example 1.1.7. Consider a function y = f(x) given by the equation x² + y² − 1 = 0. Find the derivative dy/dx.
To find the derivative dy/dx, differentiate both sides of the given equation with respect to x and apply the Chain Rule (remembering that y is a function of x!), which gives:
    2x + 2y (dy/dx) = 0.
So,
    dy/dx = −2x/(2y) = −x/y.

In general, if a differentiable function y = f(x) is given by the equation F(x, y) = 0, differentiating both sides of the equation F(x, y) = 0 with respect to x gives
    Fx + Fy (dy/dx) = 0,  and thus  dy/dx = −Fx/Fy.
Similarly, if a differentiable function x = f(y) is given by the equation F(x, y) = 0, differentiating both sides with respect to y gives
    Fy + Fx (dx/dy) = 0,  and thus  dx/dy = −Fy/Fx.

Given F(x, y) = 0:
    dy/dx = −Fx/Fy,    dx/dy = −Fy/Fx.

For a function of two variables given by the equation F(x, y, z) = 0, we can find the corresponding partial derivatives in the same way.

Example 1.1.8. Given the equation e^(xyz) − x² + 3y² + z² = 208, find ∂z/∂x and ∂z/∂y.
To find the partial derivative ∂z/∂x, differentiate both sides of the given equation with respect to x and apply the Chain Rule (remembering that y can be regarded as a constant, and hence z is a function of x!), which gives:
    e^(xyz) y(z + x ∂z/∂x) − 2x + 2z ∂z/∂x = 0.
It follows that
    ∂z/∂x = (2x − e^(xyz) yz) / (2z + e^(xyz) xy).
To find the partial derivative ∂z/∂y, differentiate both sides of the given equation with respect to y and apply the Chain Rule (remembering that x can be regarded as a constant, and hence z is a function of y!), which gives:
    e^(xyz) x(z + y ∂z/∂y) + 6y + 2z ∂z/∂y = 0.
It follows that
    ∂z/∂y = −(6y + e^(xyz) xz) / (2z + e^(xyz) xy).

It can be shown that, given F(x, y, z) = 0:
    ∂z/∂x = −Fx/Fz,    ∂z/∂y = −Fy/Fz,
    ∂x/∂y = −Fy/Fx,    ∂x/∂z = −Fz/Fx,
    ∂y/∂x = −Fx/Fy,    ∂y/∂z = −Fz/Fy.

Extra for Interest: These formulae are generalised further in MATHS 340, where vector functions are defined by more than one equation.

Now, continuing the previous example: given the equation e^(xyz) − x² + 3y² + z² = 208, find ∂x/∂y, ∂x/∂z, ∂y/∂x and ∂y/∂z using the formulae above.

1.1.4 Gradient and directional derivatives

Definition 1.1.9. The vector of first derivatives of f(x, y) evaluated at (x₀, y₀),
    ∇f(x₀, y₀) = (fx, fy)|(x₀, y₀),
is called the gradient of the function f at (x₀, y₀). The vector of first derivatives of f(x, y, z) evaluated at (x₀, y₀, z₀),
    ∇f(x₀, y₀, z₀) = (fx, fy, fz)|(x₀, y₀, z₀),
is called the gradient of f at (x₀, y₀, z₀). We say "grad f" for ∇f.

Example 1.1.10.
(a) Given the surface z = x² − y, find ∇z(1, 2).
(b) Find the equation of the level curve of the surface z = x² − y through the point (1, 2). Recall that along a level curve of a function f, the value of f is constant.
(c) Find the equation of the tangent line to the level curve of z through (1, 2).
(d) Find a unit vector in the direction of this tangent line.
(e) Find the dot product of this unit vector and ∇z(1, 2).

[Figures: surface plot with level curves; contour plot with gradient vectors]
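A quick numerical sketch of Example 1.1.10 (the worked values here are my own check, not the printed solution): for z = x² − y the gradient at (1, 2) is (2x, −1)|(1,2) = (2, −1), and the level curve x² − y = −1 has tangent direction (1, 2) there.

% Check that the gradient is orthogonal to the level curve at (1,2)
g = [2, -1];                % gradient of z = x^2 - y at (1,2)
u = [1, 2]/norm([1, 2]);    % unit vector along the tangent line
dot(g, u)                   % returns 0: gradient is perpendicular to the level curve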
This last example illustrates the fact that if u is a vector tangent to the level curve of f at (x₀, y₀), then ∇f(x₀, y₀) · u = 0, i.e., the tangent to the level curve at (x₀, y₀) and ∇f(x₀, y₀) are at right angles.

FACT: The tangent to the level curve at (x₀, y₀) and ∇f(x₀, y₀) are at right angles.

In fact, we will shortly see that the gradient ∇f(x₀, y₀) points from (x₀, y₀) in the direction in which the value of f is increasing most rapidly.

Recall that for a function y = f(x) of one variable,
    df/dx |(x = x₀) = lim_{h→0} [f(x₀ + h) − f(x₀)]/h
measures the rate of change of f at the point (x₀, f(x₀)) as x changes. Moving up dimensions, there is now more than one independent variable, so we need to specify the direction in which we are interested in observing a rate of change. The rate of change of f(x, y) at (x₀, y₀) in the direction of a given non-zero vector v is given by:
    Du f(x₀, y₀) = lim_{h→0} [f(x₀ + hu₁, y₀ + hu₂) − f(x₀, y₀)]/h,
where u = (u₁, u₂) = v/‖v‖ is a unit vector in the direction of v.

Definition 1.1.11. If u is any unit vector, the directional derivative of f at (x₀, y₀) in the direction of u, given by
    Du f(x₀, y₀) = ∇f(x₀, y₀) · u,
measures the rate of change of f at (x₀, y₀) as (x, y) moves from (x₀, y₀) in the direction of u.

Example 1.1.12. Find the directional derivative of f(x, y) = 5 − x² + y² at the point (1, 1)
(a) in the direction of (3, −4);
(b) in the direction of (1, 1);
(c) in the direction of the gradient of f at (1, 1).

[Figures: the surface z = 5 − x² + y² with the points (1, 1, 5) and (1, 1, 0) marked, and the corresponding contour plot]

If we are determining a climbing route with a topographical map, how can we tell the fastest way to ascend a mountain (risks and rivers aside)? Common knowledge suggests we cut across the contours at right angles. Here's why. Assume that
• altitude at any point (x, y) is given by the function f(x, y), and
• our current position is (x₀, y₀).
We know that
    Du f(x₀, y₀) = ∇f(x₀, y₀) · u = ‖∇f(x₀, y₀)‖ ‖u‖ cos θ.    (1.1)
Consider this expression for the directional derivative of f at a fixed (x₀, y₀), and let the direction u vary over all possible values. What changes?
• (x₀, y₀) is fixed, so ‖∇f(x₀, y₀)‖ is fixed;
• u (the direction in which we examine the change in f) is not fixed, but ‖u‖ = 1 is constant;
• cos θ varies as θ, the angle between ∇f(x₀, y₀) and u, varies.
In other words, only cos θ changes as we look at rates of change of f in different directions. So when is the directional derivative Du f(x₀, y₀) largest? Recall that −1 ≤ cos θ ≤ 1, and cos θ = 1 when θ = 0. Therefore the directional derivative Du f(x₀, y₀) is largest when u and ∇f(x₀, y₀) are parallel; that is, when the direction u is the same as the direction of the gradient vector ∇f(x₀, y₀). We've derived:

FACT
(i) The direction of maximum change in a function f(x, y) at a point (x₀, y₀) is the direction of the gradient vector ∇f(x₀, y₀).
(ii) The direction of maximum negative change in a function f(x, y) at a point (x₀, y₀) is the direction of −∇f(x₀, y₀) (the direction opposite to ∇f(x₀, y₀)).
(iii) The maximum rate of change of f at (x₀, y₀) is ‖∇f(x₀, y₀)‖.
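This FACT can be seen numerically. The following small sketch (my own illustration, using f(x, y) = 5 − x² + y² from Example 1.1.12) scans all unit directions u(θ) and confirms that Du f(1, 1) is largest when u points along the gradient.

% Scan directions and find where D_u f(1,1) = grad f . u is maximal
g = [-2, 2];                          % grad f at (1,1): (-2x, 2y)
theta = linspace(0, 2*pi, 361);
Du = g(1)*cos(theta) + g(2)*sin(theta);
[Dmax, k] = max(Du);
Dmax                                  % equals norm(g) = 2*sqrt(2)
[cos(theta(k)), sin(theta(k))]        % approx g/norm(g) = (-0.7071, 0.7071)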
Now we consider topographical maps again: if a function f(x, y) measures altitude at position (x, y), and you are standing at (x₀, y₀), then ∇f(x₀, y₀) points in the direction of steepest increase in altitude from (x₀, y₀). If you want to descend most steeply from where you are, take the direction −∇f(x₀, y₀).

Consider a function of three variables f(x, y, z). If u is any unit vector, the directional derivative of f at (x₀, y₀, z₀) in the direction of u, given by
    Du f(x₀, y₀, z₀) = ∇f(x₀, y₀, z₀) · u,
measures the rate of change of f at (x₀, y₀, z₀) as (x, y, z) moves in the direction of u.

Example 1.1.13. Given the function f(x, y, z) = x²y − yz³ + z and the point P = (1, −2, 0).
(a) Calculate the gradient ∇f of f at the point P.
(b) Find the directional derivative Du f of f at the point P in the direction:
    (i) of the vector v = (2, 1, −2);
    (ii) of the negative z-axis;
    (iii) from P to Q = (4, −3, 1).
(c) In which direction does the function increase fastest at P, and what is its rate of change in that direction?

Example 1.1.14. In Matlab we can graph functions of more than one variable, along with level curves:
• the surf or mesh commands plot surfaces
• the surfc or meshc commands plot surfaces and their level curves (contours)
• the contour command plots level curves in the (x, y) plane
• the gradient function evaluates gradient vectors numerically
• the quiver function plots vector fields (gradients)
Matlab plots numerically: meshgrid must be called to produce a set of points in the plane as domain for the plots. Here we use Matlab to observe the graph of z = xe^(−x²−y²) and to plot a vector field consisting of the gradients on a contour plot.

clf; colormap(gray);
[x,y] = meshgrid(-2:.2:2, -2:.2:2);
z = x .* exp(-x.^2 - y.^2);
surfc(x,y,z);
figure   %open another figure
colormap(gray);
[px,py] = gradient(z,.2,.2);
contour(x,y,z), hold on, quiver(x,y,px,py), hold off

Figure 1.2: Surface of z = xe^(−x²−y²) with contours and gradients
1.1.5 Optimisation

Recall the one-dimensional case: suppose that f = f(x) is differentiable. Then for f to have a maximum or a minimum at a point x, the slope of f must be zero, i.e., f′(x) = 0. If f is twice differentiable, then the second derivative test determines whether f has a relative maximum, a relative minimum, or neither.

Summary 1.1.15. Suppose that f′(x₀) = 0, and
• if f″(x₀) > 0 then f has a relative minimum at x₀;
• if f″(x₀) < 0 then f has a relative maximum at x₀;
• if f″(x₀) = 0 then the test is inconclusive (use the first derivative near x₀ to analyse).

Now the two-dimensional case. Recall that the directional derivative of f at the point x = (x, y) in the direction v = (v₁, v₂) is given by
    Dv f(x) = lim_{t→0} [f(x + tv) − f(x)]/t.
Since this is the rate at which f increases at x in the direction v, for f to have a relative maximum or minimum we must have Dv f(x) = 0 for all v. Since Dv f(x) = ∇f(x) · v = v₁fx + v₂fy, the condition that all directional derivatives be zero (the analogue of f′(x) = 0) can be written
    ∇f(x) = 0.
What is the analogue of the second derivative test? Since Dv f is a function of two variables, we may take a directional derivative again, to obtain
    Du Dv f(x) = (Du(Dv f))(x) = uᵀ [ fxx(x)  fxy(x) ; fyx(x)  fyy(x) ] v.
This follows by applying Dv f = v₁fx + v₂fy twice. The square matrix of second-order partial derivatives of a function f,
    Hf = [ fxx  fxy ]
         [ fyx  fyy ],
is called the Hessian of f. This matrix is symmetric, i.e. Hf = Hfᵀ, provided fxy = fyx.

We now apply the one-dimensional result. The second derivative of f in the direction v is given by
    Dv² f = Dv(Dv f) = vᵀ Hf v.    (1.2)
Suppose ∇f(x) = 0, and that Dv² f(x) > 0 for all directions v. Then the function of one variable obtained by restricting f to the line through x in the direction v has zero first derivative at x, second derivative Dv² f(x) > 0, and so has a local minimum at x. Since this is true for all directions v, a condition for f to have a local minimum at x (the analogue of f″(x) > 0) is that Dv² f(x) > 0 for all directions v.

Definition 1.1.16. A square symmetric matrix A is said to be
(i) positive-definite if vᵀAv > 0 for all v ≠ 0;
(ii) negative-definite if vᵀAv < 0 for all v ≠ 0;
(iii) indefinite if it is neither positive-definite nor negative-definite.

This argument leads to the second derivative test for relative maxima and minima of functions of several variables.

Summary 1.1.17. Suppose that ∇f(x) = 0, i.e., fx(x) = 0 and fy(x) = 0, and
• Hf(x) is positive definite ⇒ f(x) is a relative minimum; e.g., f(x, y) = x² + y² at (0, 0);
• Hf(x) is negative definite ⇒ f(x) is a relative maximum; e.g., f(x, y) = −x² − y² at (0, 0);
• Hf(x) is indefinite ⇒ f has a saddle point at x; e.g., f(x, y) = x² − y² at (0, 0).
Otherwise the test is inconclusive; e.g., f(x, y) = x⁴ + y⁴ at (0, 0).

The points (x, y, f(x, y)) where ∇f(x, y) = 0 are called critical points of the function f.

There are a number of tests for a symmetric matrix A to be positive definite, including that the principal minors (see the section on quadratic forms) be positive, i.e.
    det([a₁₁]) = a₁₁ > 0,    det(A) = a₁₁a₂₂ − a₁₂² > 0.
This leads to:

Second partial derivative test for functions of several variables. Suppose (x₀, y₀) is a critical point of f, and
• if det Hf(x₀, y₀) > 0 and fxx(x₀, y₀) > 0 then f has a minimum at (x₀, y₀);
• if det Hf(x₀, y₀) > 0 and fxx(x₀, y₀) < 0 then f has a maximum at (x₀, y₀);
• if det Hf(x₀, y₀) < 0 then f has a saddle point at (x₀, y₀);
• if det Hf(x₀, y₀) = 0 then the test is inconclusive.
Note that in two dimensions a critical point can be a saddle point, which has directions in which the function is increasing and directions in which it is decreasing.

Example 1.1.18. Find and classify the critical points of f(x, y) = 4xy − x⁴ − y⁴.

Example 1.1.19. An oil refinery produces two grades of petrol: standard and super. Weekly production is represented by x mega-litres of standard and y mega-litres of super. Weekly revenue is 60xy, and costs are x³ + y³ (units K$). It follows that the weekly profit is given by the function
    f(x, y) = 60xy − x³ − y³.
Obviously, the refinery management team would like to maximise this profit. How can this be done?

Example 1.1.20. Find and classify the critical points of f(x, y) = x³ − 3xy − y³.

[Figure: surface plot of f(x, y) = x³ − 3xy − y³]
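The second partial derivative test can be carried out symbolically in Matlab. Here is a hedged sketch for Example 1.1.18 (it assumes the Symbolic Math Toolbox, and is my own illustration, not the notes' worked solution):

% Critical points and Hessian for f(x,y) = 4xy - x^4 - y^4
syms x y
f  = 4*x*y - x^4 - y^4;
cp = solve([diff(f,x)==0, diff(f,y)==0], [x, y]);  % may also return complex roots;
                                                   % the real ones are (0,0), (1,1), (-1,-1)
H  = hessian(f, [x, y]);                           % [fxx fxy; fyx fyy]
H0 = double(subs(H, {x,y}, {0,0}));  det(H0)           % -16 < 0: saddle point
H1 = double(subs(H, {x,y}, {1,1}));  det(H1), H1(1,1)  % 128 > 0 and fxx = -12 < 0: maximum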
1.1.6 Constrained Optimisation — Lagrange Multipliers

We now consider the problem of finding the maximum or minimum of a function f(x, y) subject to the constraint that the point (x, y) lie on some curve given by g(x, y) = 0. Since the gradient of a function is orthogonal to its level curves, ∇g is orthogonal to the curve g = 0 at each point on the curve. Further, the function f increases most in the direction ∇f (and decreases most in the direction −∇f), so that if ∇f is not parallel to ∇g at a point on the curve, then we can increase or decrease the value of f by moving along the curve g = 0. Thus the condition for a point on the curve g = 0 to give a maximum or a minimum of f is that ∇f and ∇g be parallel, i.e.,
    ∇f = λ∇g, for some λ ≠ 0.
Equivalently, the level curves of f and g must touch at this point. Since vectors are equal if and only if their components are equal, this condition gives two equations:
    fx = λgx,    fy = λgy.

This gives the method of Lagrange multipliers: to find the maximum and minimum of the function f(x, y) subject to the constraint g(x, y) = 0, find the points (x, y) which satisfy the equations
    ∇f = λ∇g,    g = 0.
These can be expanded to a system of equations
    fx(x, y) = λgx(x, y),
    fy(x, y) = λgy(x, y),
    g(x, y) = 0.
For these points, evaluate f(x, y) to find which point (or points) gives the maximum and the minimum.

Example 1.1.21. Find the maximum and minimum of the function f(x, y) = xy subject to the constraint x² + y² = 1.

Example 1.1.22. The sales of a fixed-price product depend on the cost of materials C and the amount of labour L according to the relationship
    S = 10CL − 2L².
Further, budget constraints require that C + L = 12. Find the maximum sales obtainable, firstly by the method of Lagrange multipliers, and secondly by substituting for one of the variables from the constraint equation.

[Figures: the objective function; the objective function and constraint]

Lagrange multipliers in three dimensions

For functions of three (or more) variables the method of Lagrange multipliers works, by the same reasoning, with the constraint g(x, y, z) = 0 being that the points (x, y, z) lie on some surface in R³.

Example 1.1.23. Find the points on the sphere x² + y² + z² = 36 that are closest to and farthest from the point P(1, 2, 2).

Lagrange multipliers with two constraints

For f(x, y, z) we can also require that the points (x, y, z) lie on a curve given by the intersection of two surfaces g(x, y, z) = 0, h(x, y, z) = 0. This leads to the following Lagrange multiplier method with two constraints: to find the maximum and minimum of the function f(x, y, z) subject to the constraints g(x, y, z) = 0 and h(x, y, z) = 0, find the points (x, y, z) which satisfy the equations
    ∇f = λ∇g + µ∇h,    g = 0,    h = 0.
For these points, evaluate f(x, y, z) to find which give the maximum and the minimum. The scalars λ and µ are called the Lagrange multipliers of the problem, and must not both be zero.

Example 1.1.24. Find the extreme values of u = f(x, y, z) = xy − 3z² subject to the conditions x + y + z = 24 and z − y = 4.
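The Lagrange system can also be handed to Matlab. A minimal symbolic sketch for the first example, 1.1.21 (assuming the Symbolic Math Toolbox; the candidate points it returns still need to be compared by evaluating f):

% Solve grad f = lambda * grad g together with g = 0,
% for f = xy and g = x^2 + y^2 - 1 (Example 1.1.21)
syms x y lambda
f = x*y;  g = x^2 + y^2 - 1;
S = solve([diff(f,x)==lambda*diff(g,x), ...
           diff(f,y)==lambda*diff(g,y), g==0], [x, y, lambda]);
[S.x, S.y, subs(f, {x,y}, {S.x, S.y})]   % candidate points and their f-values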
1.2 Sequences & Series

1.2.1 Sequences

A sequence is a list of infinitely many numbers in a particular order. An arbitrary sequence is written as a₁, a₂, a₃, … or {aₙ}_{n=1}^∞, where
• the first term is a₁,
• the second term is a₂, and
• aₙ is the nth (general) term, n = 1, 2, ….
Typically a sequence is defined by a formula for the general term aₙ. We usually start our sequences with the index n = 1, but they can start with any index by adjusting the formula for aₙ.

Example 1.2.1. For the following sequences, whose nth general term is given below, write out the first three terms:
(a) aₙ = 3^n/n!
(b) aₙ = 2^n/n
(c) aₙ = 3^n/n³

Example 1.2.2. For the following sequences, find a formula for aₙ:
(a) 1, 1/2, 1/3, 1/4, 1/5, …    aₙ =
(b) 1, 2, 4, 8, 16, …    aₙ =
(c) 1/2, 2/3, 3/4, 4/5, …    aₙ =
(d) 1, −1, 1, −1, 1, −1, 1, −1, …    aₙ =
[Plots of the first twenty terms of each sequence]

Example 1.2.3. The Fibonacci sequence: two new-born rabbits (a male and a female) are taken to a remote island and released. There are no rabbit predators on the island, and no other rabbits occupy the island.
• Each month, each pair of rabbits produces a new-born pair, one of each gender.
• Each pair reproduces at age 2 months.
The population after n months, in pairs of rabbits, is given by
    a₁ = 1,  a₂ = 1,  aₙ₊₁ = aₙ + aₙ₋₁,  n ≥ 2.
Write down the first 10 terms of the sequence.

Generally, our interest is in what happens eventually, i.e. in the trend of the terms in the sequence.

Convergence

Definition 1.2.4. We say a sequence {aₙ}_{n=1}^∞ converges to L if the terms in the sequence eventually become arbitrarily close to L. In this case, we write
    lim_{n→∞} aₙ = L.
Otherwise, if lim_{n→∞} aₙ does not exist, we say the sequence {aₙ}_{n=1}^∞ diverges.

Example 1.2.5. For each of the following geometric sequences, determine whether it is convergent and fill out the table:
(a) aₙ = 1/2^n
(b) aₙ = 1.1^n
(c) aₙ = (−1/2)^n
(d) aₙ = (−1)^n
(e) aₙ = (−1.1)^n
[Plots of the first twenty terms of each sequence]

    aₙ        | lim_{n→∞} aₙ | {aₙ}_{n=1}^∞ converges?
    (1/2)^n   |              |
    1.1^n     |              |
    (−1/2)^n  |              |
    (−1)^n    |              |
    (−1.1)^n  |              |

More generally, we have:

Convergence of a geometric sequence
    For 0 ≤ r < 1,    lim_{n→∞} r^n = 0.
    For −1 < r < 0,   lim_{n→∞} r^n = 0.
    For r = 1,        lim_{n→∞} r^n = 1.
    For r = −1,       lim_{n→∞} r^n does not exist.
    For r > 1,        lim_{n→∞} r^n does not exist.
    For r < −1,       lim_{n→∞} r^n does not exist.

Limit-taking techniques

In the previous subsection, we used graphs to guess the limits of geometric sequences. Now we introduce methods to find the exact limits of some sequences. First of all, it is not hard to see the following useful facts.

Suppose that the sequences {aₙ}_{n=1}^∞ and {bₙ}_{n=1}^∞ converge and k is a constant. Then
• lim_{n→∞} kaₙ = k lim_{n→∞} aₙ;
• lim_{n→∞} (aₙ + bₙ) = lim_{n→∞} aₙ + lim_{n→∞} bₙ;
• lim_{n→∞} (aₙbₙ) = lim_{n→∞} aₙ × lim_{n→∞} bₙ.

Squeezing Theorem. Given a sequence {aₙ}_{n=1}^∞, suppose that there exist two other sequences {bₙ}_{n=1}^∞ and {cₙ}_{n=1}^∞ such that {aₙ}_{n=1}^∞ is squeezed between them:
(i) bₙ ≤ aₙ ≤ cₙ for all n ≥ n₀ (where n₀ is some positive integer), and
(ii) lim_{n→∞} bₙ = lim_{n→∞} cₙ = L (i.e. they both have the same limit).
Then
    lim_{n→∞} aₙ = L.
This theorem is useful for taking limits of sequences involving sin(n), cos(n), and (−1)^n.

Example 1.2.6. Use the Squeezing Theorem to find the limit of each sequence below, defined by
(a) aₙ = (−1)^n/n
(b) aₙ = sin(n)/(2n)

Example 1.2.7. The Matlab code to find the limits in the previous two examples is

% Calculate various limits
syms n
limit((-1)^n/n, inf)
limit(sin(n)/(2*n), inf)

Example 1.2.8. Prove the following: if lim_{n→∞} |aₙ| = 0 then lim_{n→∞} aₙ = 0.
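A quick numeric look at Example 1.2.6(b) (my own illustration): since −1/(2n) ≤ sin(n)/(2n) ≤ 1/(2n), the squeezed limit is 0, and the computed terms agree.

% Tail behaviour of a_n = sin(n)/(2n)
n  = 1:1000;
an = sin(n)./(2*n);
an(end)                  % about 4.1e-4: small, consistent with the limit 0
max(abs(an(900:1000)))   % bounded by 1/1800, as the squeeze predicts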
Now we introduce a powerful limit-taking technique, L’Hôpital’s Rule, to find the exact limits of some sequences.

L’Hôpital’s Rule

L’Hôpital’s Rule is used to find the limit of indeterminate forms such as 0/0, ∞/∞, 0 × ∞, ∞ − ∞, 0⁰, ∞⁰, 1^∞. For indeterminate products/quotients, follow the steps:
1. If necessary, rearrange the expression into fractional form f(n)/g(n), so that lim_{n→∞} f(n)/g(n) has the form ∞/∞ or 0/0. (Note that f(n) and g(n) need to have derivatives.)
2. Use
    lim_{n→∞} f(n)/g(n) = lim_{n→∞} f′(n)/g′(n).
(Note: step 2 is invalid if lim_{n→∞} f(n)/g(n) is not an indeterminate form!)

The following examples illustrate this theorem.

Example 1.2.9. Use L’Hôpital’s Rule to find the limit of the following sequences, defined by
(a) aₙ = (n − 3)/(n + 2)
(b) bₙ = (2n² − 3n + 1)/(5n² − 6)
(c) cₙ = (1 + n + 2n³)/(1 − n − n³)

Note: all sequences in the previous example are polynomial fractions. Alternatively, to find the limit of a polynomial fraction, divide through by the highest power of n in the denominator, and then let n → ∞. This leads to the following useful formula:
    lim_{n→∞} (an^k + …)/(bn^l + …) = a/b if k = l;  0 if k < l;  ∞ if k > l.

Example 1.2.10. Use L’Hôpital’s Rule to find the limit of the following sequences, defined by
(a) aₙ = ln(n)/n
(b) bₙ = e^n/n²
(c) cₙ = ne^(−n)

Now consider a sequence {aₙ}_{n=1}^∞ which has an indeterminate form of 0⁰ or ∞⁰ or 1^∞. If we consider the sequence {ln(aₙ)}_{n=1}^∞, we obtain another sequence which has an indeterminate form of 0 × ∞ or 0/0 or ∞/∞; hence we can use our previous results. This method uses the following two steps:
(a) Find lim_{n→∞} ln(aₙ) = b.
(b) Undo the logarithms: lim_{n→∞} aₙ = e^b.
Note:
• The second step uses the following fact:
    lim_{n→∞} ln(aₙ) = b  ⇔  lim_{n→∞} aₙ = e^b.
• Indeed, more generally, we have the following fact: if aₙ > 0, aₙ → a and bₙ → b, then aₙ^(bₙ) → a^b.

Example 1.2.11. Calculate the limit of the sequence defined by aₙ = n^(1/n).

Example 1.2.12. Determine whether the sequence aₙ = (1 + 1/n)^n is convergent or not.
[Plot of the first twenty terms, approaching 2.71828…]

A derivation of e

Example (1.2.12) illustrates a very important formula:
    lim_{n→∞} (1 + a/n)^n = e^a,
and more generally
    lim_{n→∞} (1 + a/n)^(nb) = e^(ab).    (1.3)
Before using this important formula to find the limits of some indeterminate sequences of the form 1^∞, we introduce an interpretation of e by considering the continuous compounding of interest.

Example 1.2.13. Assume that the annual interest rate is 100%. If you deposit $1 into a bank, how much money is there in your bank account after one year?
• If it is compounded once a year: $(1 + 1) = $2.
• If it is compounded twice a year (i.e., every half-year): (1 + 1/2) + (1 + 1/2)(1/2) = (1 + 1/2)² = 2.25.
• If it is compounded three times a year: (1 + 1/3)² + (1 + 1/3)²(1/3) = (1 + 1/3)³ ≈ 2.37.
• If it is compounded four times a year: (1 + 1/4)⁴ ≈ 2.44.
• If it is compounded 12 times a year (i.e., every month): (1 + 1/12)¹² ≈ 2.61.
• If it is compounded 52 times a year (i.e., every week): (1 + 1/52)⁵² ≈ 2.69.
• If it is compounded 365 times a year (i.e., every day): (1 + 1/365)³⁶⁵ ≈ 2.71.
• If it is compounded n times a year: (1 + 1/n)^n.
• If it is compounded continuously in a year (i.e., every moment): lim_{n→∞} (1 + 1/n)^n = e ≈ 2.72.

Note that lim_{n→∞} (1 + 1/n)^n ≠ 1. In fact,
    lim_{n→∞} (1 + 1/n)^n = e = 2.718281828459045…

Example 1.2.14. Find the limit of each of the following sequences, defined by
(a) aₙ = ((n + 3)/n)^n = (1 + 3/n)^n. Thus,
    lim_{n→∞} aₙ = lim_{n→∞} (1 + 3/n)^n = e³.
(b) bₙ = (1 + (1/8)/n)^(n+3) = (1 + (1/8)/n)^n · (1 + (1/8)/n)³. Since
    lim_{n→∞} (1 + (1/8)/n)^n = e^(1/8)  and  lim_{n→∞} (1 + (1/8)/n)³ = 1,
it follows that lim_{n→∞} bₙ = e^(1/8).
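Formula (1.3) is easy to support numerically and symbolically; a small sketch of mine:

% lim (1 + a/n)^n = e^a, illustrated with a = 3
a = 3;
n = 10.^(2:7);
(1 + a./n).^n        % approaches e^3
exp(a)               % = 20.0855...
syms m               % and symbolically (Symbolic Math Toolbox):
limit((1 + 3/m)^m, m, inf)   % returns exp(3)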
Example 1.2.15. Find the limit of each of the following sequences, defined by
(a) aₙ = (1 + 2/(n + 1))^n
(b) bₙ = (1 − 3/n)^n
(c) cₙ = (1 + 1/(2n²))^(3n²−1)

Example 1.2.16. If money is deposited in an account where interest of i% p.a. is compounded n times per year, after m periods of compounding an initial deposit P(0) is worth
    P(0)(1 + i/n)^m.
After a period of time t (in units of years), the number of compounds will be m = nt, and the value P(t) of the account then is
    P(t) = P(0)(1 + i/n)^(nt).
If the money P(0) is deposited in an account where interest of i% p.a. is compounded continuously, then after a period of time t years the value P(t) of the account is the following limit:
    P(t) = lim_{n→∞} P(0)(1 + i/n)^(nt).
Find the value P(t) by calculating the limit.

1.2.2 Series

A series is an ordered sum of infinitely many numbers. If the numbers we are adding are a₁, a₂, a₃, …, their sum a₁ + a₂ + a₃ + … is written Σ_{n=1}^∞ aₙ.

Example 1.2.17. Use the Σ notation to express the following series.
(a) −1 + 1 − 1 + 1 − 1 + 1 − … =
(b) 1 − 1 + 1 − 1 + 1 − 1 + … =
(c) 1 + 1/2 + 1/3 + 1/4 + … =
(d) 1 + 1/2 + 1/4 + 1/8 + … =
(e) −1/2 + 1/4 − 1/8 + … =

We are mainly interested in whether a series "adds up" to some number. Often the value itself is not important. Some series obviously do not add up, e.g.
    Σ_{n=1}^∞ 1 = 1 + 1 + 1 + 1 + … = ∞,
while for others it isn't so easy to see.

Example 1.2.18.
1. Σ_{n=0}^∞ (1/2)^n = 2.
2. Harmonic series: Σ_{n=1}^∞ 1/n = ∞.

Example 1.2.19. The Matlab code to evaluate the two series above is

% Calculate various series
syms n
symsum((1/2)^n, 0, inf)
symsum(1/n, 1, inf)

Partial sums

To find out whether a series has a finite sum, we start by adding up a few terms, and then progressively more, to see if there is a pattern in these partial sums.

Definition 1.2.20. The partial sum of the first n terms of a series Σ_{n=1}^∞ aₙ is denoted by sₙ. Thus,
    s₁ = a₁
    s₂ = a₁ + a₂
    s₃ = a₁ + a₂ + a₃
    s₄ = a₁ + a₂ + a₃ + a₄
    ⋮
    sₙ = a₁ + … + aₙ
The partial sums of a series form a sequence: s₁, s₂, s₃, …, sₙ, ….

In some cases, we can see a pattern in the sequence {sₙ}.

Example 1.2.21. Σ_{n=1}^∞ ln(n/(n+1)). Using the rule of logarithms ln(a/b) = ln a − ln b,
    sₙ = ln(1/2) + ln(2/3) + ln(3/4) + … + ln((n−1)/n) + ln(n/(n+1)) = ln(1) − ln(n+1) = −ln(n+1).

Definition 1.2.22. If the sequence of partial sums of a series Σ_{n=1}^∞ aₙ converges to a real number L, i.e. if
    lim_{n→∞} sₙ = L,
then we call L the sum of the series, and say the series Σ_{n=1}^∞ aₙ converges to L; for short,
    Σ_{n=1}^∞ aₙ = L.
Otherwise, we say the series Σ_{n=1}^∞ aₙ diverges.
Note: if Σ_{n=1}^∞ aₙ = lim_{n→∞} sₙ = ∞, then the series Σ_{n=1}^∞ aₙ diverges.

It follows from Definition (1.2.22) that the series Σ_{n=1}^∞ ln(n/(n+1)) in Example (1.2.21) diverges, as
    lim_{n→∞} sₙ = −lim_{n→∞} ln(n + 1) = −∞.
The series here is called telescoping, because of the way it simplifies. In the next subsection, we will apply Definition (1.2.22) (i.e., partial sums) to discuss geometric series. However, sometimes a pattern in the partial sums isn't so easy to see, and we need some special methods.
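The partial sums of Example 1.2.21 can also be computed numerically; a small sketch of mine using cumsum:

% Partial sums s_n of sum ln(k/(k+1)), which telescope to -ln(n+1)
k = 1:10000;
s = cumsum(log(k./(k+1)));
[s(10), s(100), s(10000)]     % compare with:
-log([11, 101, 10001])        % identical: s_n = -ln(n+1), diverging to -inf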
Note: applying Definition (1.2.22), the following properties can be established.
(a) For every positive integer n₀:
    (i) Σ_{n=1}^∞ aₙ converges ⇒ Σ_{n=n₀}^∞ aₙ converges;
    (ii) Σ_{n=1}^∞ aₙ diverges ⇒ Σ_{n=n₀}^∞ aₙ diverges.
(b) For every non-zero real constant C:
    (i) Σ_{n=1}^∞ aₙ converges ⇒ Σ_{n=1}^∞ Caₙ converges;
    (ii) Σ_{n=1}^∞ aₙ diverges ⇒ Σ_{n=1}^∞ Caₙ diverges;
    (iii) furthermore, when Σ_{n=1}^∞ aₙ converges, Σ_{n=1}^∞ Caₙ = C Σ_{n=1}^∞ aₙ.
(c) If Σ_{n=1}^∞ aₙ and Σ_{n=1}^∞ bₙ converge, then
    (i) Σ_{n=1}^∞ (aₙ + bₙ) converges, and
    (ii) Σ_{n=1}^∞ (aₙ + bₙ) = Σ_{n=1}^∞ aₙ + Σ_{n=1}^∞ bₙ.

Geometric series

If you decide to walk a certain distance, say 1 kilometre, each day walking half of the distance remaining, you'll never reach your destination no matter how long you live. The first day you walk 1/2 km. The second day you walk 1/4 km. The nth day you walk (1/2)^n km. If you kept to this programme forever, you'd walk the kilometre. So it must be that
    Σ_{n=1}^∞ (1/2)^n = 1.
The sum of all the distances, being successive powers of a fixed number 1/2, forms a geometric series.

Definition 1.2.23. A series Σ_{n=1}^∞ aₙ is geometric if aₙ₊₁/aₙ is constant for all n. A geometric series can be written as
    a Σ_{n=0}^∞ r^n,
where
• the starting index is now 0, for convenience,
• r = aₙ₊₁/aₙ (the constant ratio of successive terms),
• a is the first term of the series.

Example 1.2.24. Express the following geometric series in the form a Σ_{n=0}^∞ r^n.
(a) 1/4 + 1/8 + 1/16 + … =
(b) 1/3 = 0.3̇

Geometric series have the convenient property that if they converge, their sum can be found. Partial sums of Σ_{n=0}^∞ r^n:
    s₁ = 1
    s₂ = 1 + r
    s₃ = 1 + r + r²
    ⋮
    sₙ = 1 + r + r² + r³ + … + r^(n−1).
Now we use some algebra:
    rsₙ =     r + r² + r³ + … + r^(n−1) + r^n
    sₙ  = 1 + r + r² + r³ + … + r^(n−1)
⇒ rsₙ − sₙ = r^n − 1 ⇒ sₙ(r − 1) = r^n − 1
⇒ sₙ = (1 − r^n)/(1 − r), if r ≠ 1.
In this case, we have an expression for sₙ, the partial sum of the geometric series. It follows that the sum of a geometric series Σ_{n=0}^∞ r^n is
    Σ_{n=0}^∞ r^n = lim_{n→∞} sₙ = lim_{n→∞} (1 − r^n)/(1 − r).

Convergence result for Geometric Series
    If |r| < 1:  Σ_{n=0}^∞ r^n = 1/(1 − r).
    If |r| ≥ 1:  Σ_{n=0}^∞ r^n diverges.

Example 1.2.25. For the following geometric series, determine whether the series converges or diverges, and if it converges, state the limit.
(a) Σ_{n=0}^∞ e^n
(b) Σ_{n=0}^∞ e^(−n)
(c) Σ_{n=1}^∞ (−1/3)^n
(d) Σ_{n=0}^∞ (3/2)^n

Example 1.2.26. Write the number
    0.61̇ = 0.611… = 6/10 + 1/100 + …
as a geometric series, and find its sum as a fraction.

Ratio Test for convergence

There are many series which are not geometric. We introduce a test, closely related to the test for convergence of geometric series, to determine the convergence of other more general series.

Ratio Test for Σ_{n=1}^∞ aₙ: if lim_{n→∞} |aₙ₊₁/aₙ| is
    < 1, the series converges;
    > 1, the series diverges;
    = 1, the test is inconclusive.

Example 1.2.27. Determine whether the given series converge or diverge.
(a) Σ_{n=0}^∞ e^n/n!
(b) Σ_{n=1}^∞ n!/n^n

Example 1.2.28. Just so that you don't think this test works for every series:
(i) Σ_{n=1}^∞ 1/n (diverges)
(ii) Σ_{n=1}^∞ 1/n² (converges)

Note: it can be shown (using methods covered in MATHS 250) that in general:
Hyperharmonic or p-series: Σ_{n=1}^∞ 1/n^p is convergent if and only if p > 1.
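Numeric support for Example 1.2.27(b) (my own sketch): for aₙ = n!/n^n the ratio is aₙ₊₁/aₙ = (n/(n+1))^n, which tends to 1/e < 1, so the series converges by the ratio test.

% The ratio a_(n+1)/a_n = (n/(n+1))^n approaches 1/e
n = [1 10 100 1000 10000];
(n./(n+1)).^n      % approaches 0.3679...
1/exp(1)           % = 0.3679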
1.2.3 Taylor Polynomials

Local Approximations

The tangent line to the graph of a function y = f(x) at a point where x = c has equation
    y = f(c) + f′(c)(x − c).
In some sense, the tangent line is the line "closest" to the curve at this point. The tangent line to f(x) at x = c shares the following properties with y = f(x):
(i) the same y-value as y = f(x) at x = c,
(ii) the same y′-value as y = f(x) at x = c.
The tangent line to y = f(x) is called a linear approximation to f(x).

Example 1.2.29. The tangent line of y = x² at (1, 1) is
    y − 1 = 2(x − 1), or y = 2x − 1.
[Figure 1.3: y = x² and its tangent line at (1, 1)]
The equation of the tangent line, y = 2x − 1, is a polynomial of degree one in x.

Sometimes we want to make a better approximation than a tangent line. Consider the problem: given a function, find the polynomial of given degree k which is closest to the function, at least in some local region. This problem is of importance, since
• polynomials are easy to manipulate (e.g. to evaluate, graph, differentiate and integrate), and
• we can sometimes approximate a function well about a specific point by a simple polynomial.
To ensure that we can find such a polynomial, we restrict our study to functions f(x) which have at least as many derivatives as the degree of the polynomial we want in the region of interest. If we only care to approximate f(x) about a point c up to and including its kth derivative for some number k, then we can do no better than its Taylor polynomial pₖ(x).

Definition 1.2.30. The Taylor polynomial of degree k for f(x) about a point c is the polynomial pₖ(x) of degree k such that
    pₖ(x) = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2! + … + f^(k)(c)(x − c)^k/k!
          = Σ_{n=0}^k f^(n)(c)(x − c)^n/n!.
When c = 0, we obtain the Maclaurin polynomial of degree k:
    pₖ(x) = f(0) + f′(0)x + f″(0)x²/2! + … + f^(k)(0)x^k/k! = Σ_{n=0}^k f^(n)(0)x^n/n!.

Note: if we find the Taylor polynomials pₙ(x) of y = f(x) about a point x = c, then:
• p₀(x) is the horizontal line through (c, f(c)): p₀(x) = f(c). It gives the correct value of the function at the point x = c.
• p₁(x) is the tangent line to f(x) through (c, f(c)): p₁(x) = f(c) + f′(c)(x − c). It gives the correct value of the function and slope of the tangent at the point x = c.
• p₂(x) is a parabola through (c, f(c)): p₂(x) = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2!. It gives the correct value of the function, slope of tangent and concavity at the point x = c.
• In general, pₖ(x) satisfies
    pₖ^(n)(c) = f^(n)(c),  n = 0, 1, 2, …, k.
That is, f and its Taylor polynomial have the same value, and the same {first, second, …, kth} derivatives, at x = c.

Using Taylor polynomials to approximate functions introduces the notion of an error in the approximation.

Definition 1.2.31. The error in using a Taylor polynomial pₖ(x) of y = f(x) to approximate the value of f(x) at a point x̄ is
    error = |f(x̄) − pₖ(x̄)|.    (1.4)

Example 1.2.32. For the function f(x) = e^x,
(a) find the Taylor polynomial of degree 3 about the centre c = 0:
    n | f^(n)(x) | f^(n)(c)
    0 |          |
    1 |          |
    2 |          |
    3 |          |
(b) at each of the following points, approximate e^x with p₃(x), and use your calculator to find the error in this approximation. How does the error change as x approaches the centre of the approximation?
    (a) x = 0    (b) x = 1    (c) x = −1    (d) x = 0.5    (e) x = −0.1    (f) x = 0.01
[Figure 1.4: e^x and some of its Taylor polynomials about c = 0]

Example 1.2.33. For the function f(x) = sin(x),
(a) find the Taylor polynomial of degree 5 about the centre c = 0:
    n | f^(n)(x) | f^(n)(c)
    0 |          |
    1 |          |
    2 |          |
    3 |          |
    4 |          |
    5 |          |
(b) at each of the given points, approximate sin(x) with p₅(x), and use your calculator to find the error in this approximation:
    (a) x = 0    (b) x = −1    (c) x = 0.5    (d) x = −0.1    (e) x = 0.01
[Figure 1.5: sin(x) and some of its Taylor polynomials about the centre c = 0]
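The errors in Example 1.2.33 can be tabulated at once rather than point by point on a calculator; a minimal sketch of mine, with p₅ the degree-5 Maclaurin polynomial of sin(x):

% Errors |sin(x) - p5(x)| at the points of Example 1.2.33(b)
p5  = @(x) x - x.^3/6 + x.^5/120;
x   = [0 -1 0.5 -0.1 0.01];
err = abs(sin(x) - p5(x))   % the error shrinks rapidly as x approaches the centre 0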
Example 1.2.34. For each of the given functions, find the Maclaurin polynomial p₄(x), completing a table of n, f^(n)(x) and f^(n)(c) for n = 0, 1, 2, 3, 4:
(a) f(x) = ln(1 + x)
(b) f(x) = 1/(1 − x)
(c) f(x) = 1/(1 + x²)

Example 1.2.35. Using Taylor or Maclaurin polynomials, approximate the following numbers to within an accuracy of 10⁻⁴:
(a) √0.95    (b) √4.01    (c) cos(π/12)    (d) 1/0.97    (e) 1/1.04    (f) ln 1.1

Example 1.2.36. Matlab has a taylor function to compute Taylor polynomials, and a graphical tool taylortool to illustrate the approximation of Taylor polynomials to any given function. For example,

%Matlab-session
syms x
% Taylor polynomial of degree 3 about c=0
taylor(1/(1+x), 4, 0)
% Taylor polynomial of degree 4 about c=0
taylor(cos(x), 5, 0)
% taylortool for graphical representation
taylortool('cos(x)')

1.2.4 Taylor Series

Definition 1.2.37. The Taylor series of a function f(x) about the centre c is the series
    Σ_{n=0}^∞ f^(n)(c)(x − c)^n/n! = f(c) + f′(c)(x − c) + f″(c)(x − c)²/2! + … + f^(n)(c)(x − c)^n/n! + ….
When c = 0, the Taylor series becomes the Maclaurin series for f(x):
    Σ_{n=0}^∞ f^(n)(0)x^n/n! = f(0) + f′(0)x + f″(0)x²/2! + … + f^(n)(0)x^n/n! + ….

Example 1.2.38. Find the Maclaurin series for the following functions:
(a) e^x    (b) cos(x)    (c) sin(x)    (d) 1/(1 − x)

Example 1.2.39. Find the Taylor series for 1/x about c = 1, by writing the function as an expression in the variable x − 1.

In the sections on geometric series and the ratio test, we saw series whose nth terms are constants. In Maclaurin and Taylor series we have series whose nth term involves a variable x. Such a series is known as a power series.

Definition 1.2.40.
(i) A power series in x has the form
    a₀ + a₁x + a₂x² + a₃x³ + … = Σ_{n=0}^∞ aₙx^n
and is said to be centred at x = 0.
(ii) More generally, a power series in (x − c) has the form
    a₀ + a₁(x − c) + a₂(x − c)² + a₃(x − c)³ + … = Σ_{n=0}^∞ aₙ(x − c)^n
and is said to be centred at x = c.

Notes:
• If we truncate a Taylor series at the (x − c)^k term, we get the corresponding Taylor polynomial pₖ(x).
• As x varies, the power series Σ_{n=0}^∞ aₙ(x − c)^n may or may not converge.

Theorem 1.2.41. One of the following three properties characterises any power series in x − c:
(i) The series converges for all x (an infinite interval of convergence).
(ii) The series converges only when x = c (a trivial interval of convergence).
(iii) There is a finite interval on the x-axis centred at c, running from c − R to c + R, such that
    • within this interval, the series converges;
    • outside this interval, the series diverges;
    • at the end-points of this interval, anything may happen.
This interval is called the interval of convergence of the power series. The radius of convergence R is the distance from the centre to either end-point.

This result is useful for knowing when we can use Taylor approximations: if x̄ is in the interval of convergence of a Taylor series for f(x), then f(x̄) can be approximated by the value of any associated Taylor polynomial pₖ(x̄).

Example 1.2.42. Write out the first few terms of the following power series. In the last two cases the series are geometric: identify when the series converges.
(a) Σ_{n=0}^∞ x^n/n!
(b) Σ_{n=0}^∞ (−1)^n x^(2n)/(2n)!
(c) Σ_{n=0}^∞ x^n
(d) Σ_{n=0}^∞ (x − 2)^n
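The convergence behaviour of the geometric series in Example 1.2.42(c) is easy to see numerically; a quick sketch of mine:

% Partial sums of sum x^n: converge to 1/(1-x) inside |x| < 1, blow up outside
N = 0:200;
x = 0.5;  sum(x.^N)    % 2.0000, and indeed 1/(1-0.5) = 2
x = 1.2;  sum(x.^N)    % enormous: the series diverges for |x| >= 1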
Example 1.2.43. All the power series in the preceding example are in fact Taylor series for particular functions. That is, these power series have the form
    Σ_{n=0}^∞ aₙ(x − c)^n = Σ_{n=0}^∞ f^(n)(c)(x − c)^n/n!.
Can you recognise the functions?

Recall: the ratio test says a series Σaₙ is convergent when lim_{n→∞} |aₙ₊₁/aₙ| < 1. So a power series Σaₙ(x − c)^n = Σuₙ is convergent when lim_{n→∞} |uₙ₊₁/uₙ| < 1.

Solution Technique 1.2.44. To determine if a power series in x − c, Σaₙ(x − c)^n = Σuₙ, converges at a point x:
• Put the value of x in the expression for the nth term. The series is no longer a power series but a regular series: each term is a number.
• Find the limit l = lim_{n→∞} |uₙ₊₁/uₙ| and use the Ratio test.

Example 1.2.45. The Taylor series for e^x about the centre c = 0 is Σ_{n=0}^∞ x^n/n!. Determine whether the series converges when
(a) x = 0    (b) x = 1    (c) x = 100

Taylor series from known formulae

It can be shown that the Taylor series of a function about a centre c is unique. The Taylor series of a function f(x) is nothing but a power series with the function f(x) as its sum. For example, on the interval |x| < 1, the function f(x) = 1/(1 − x) is the sum of the power series Σ_{n=0}^∞ x^n. Thus the power series Σ_{n=0}^∞ x^n is the Taylor series of f(x) = 1/(1 − x) on (−1, 1). If a function f(x) can be expressed as a power series in any way, then that power series is the Taylor series of the function f(x).

Geometric series:
    a/(1 − r) = a + ar + ar² + ar³ + ….
(i) c = 0. An expression of the form A/(Bx + C) can be written A/(Bx + C) = a/(1 − bx) by solving for a and b. In this form, we apply the series expansion a/(1 − bx) = a Σ_{n=0}^∞ (bx)^n to obtain the Maclaurin series for A/(Bx + C).
(ii) c ≠ 0. Writing A/(Bx + C) = a/(1 − b(x − c)) and solving for a and b, we use this form to find the Taylor series for A/(Bx + C) about x = c.

Example 1.2.46. Find the Taylor series for the function f(x) = 2/(7 − 3x) and the associated interval of convergence
(a) about the centre c = 0;
(b) about the centre c = 2.

Exponentials

The function y = e^x has Maclaurin series
    e^x = 1 + x + x²/2! + x³/3! + … = Σ_{n=0}^∞ x^n/n!.
We can obtain the Taylor series for e^x centred at any c by rewriting the Maclaurin series for e^x as a series in x − c:
    e^x = e^(c+(x−c)) = e^c e^(x−c) = e^c × Σ_{n=0}^∞ (x − c)^n/n!.

Example 1.2.47. Use this technique to find the Taylor series for e^x about x = 1.

Example 1.2.48. Find the Taylor series for the function f(x) = e^(2x+6) and give the interval of convergence I
(a) about the centre c = 0;
(b) about the centre c = 1.

We now investigate finding other Taylor series from known formulae. So far, we have found the following four such formulae:
    1/(1 − x) = 1 + x + x² + … + x^n + … = Σ_{n=0}^∞ x^n,    −1 < x < 1.    (1.5)
    e^x = 1 + x + x²/2! + … + x^n/n! + … = Σ_{n=0}^∞ x^n/n!,    −∞ < x < ∞.    (1.6)
    sin x = x − x³/3! + … + (−1)^n x^(2n+1)/(2n+1)! + … = Σ_{n=0}^∞ (−1)^n x^(2n+1)/(2n+1)!,    −∞ < x < ∞.    (1.7)
    cos x = 1 − x²/2! + x⁴/4! − … + (−1)^n x^(2n)/(2n)! + … = Σ_{n=0}^∞ (−1)^n x^(2n)/(2n)!,    −∞ < x < ∞.    (1.8)

Inside their interval of convergence, power series have a very nice property: like polynomials, they can be differentiated and integrated term by term.

Fact: If f(x) = Σ_{n=0}^∞ aₙ(x − c)^n, then inside its interval of convergence:
    f′(x) = Σ_{n=0}^∞ aₙ n(x − c)^(n−1),
    ∫ f(x) dx = Σ_{n=0}^∞ aₙ (x − c)^(n+1)/(n + 1) + C.

Logarithms

A most important application of this fact uses the relationship
    ln(1 + x) = ∫ 1/(1 + x) dx
and the series
    1/(1 − x) = 1 + x + x² + … + x^n + … = Σ_{n=0}^∞ x^n,    −1 < x < 1,
which with x replaced by −x gives 1/(1 + x) = Σ_{n=0}^∞ (−1)^n x^n.
Example 1.2.49. Find the Maclaurin series of ln(1 + x) and its interval of convergence.
    ln(1 + x) = Σ_{n=0}^∞ (−1)^n x^(n+1)/(n + 1),  if −1 < x ≤ 1.

So, to find the Taylor series for ln(Ax + B) about x = c, we solve Ax + B = a(1 + b(x − c)).

Example 1.2.50. Find the Taylor series for the function f(x) = ln(5x − 4), and the interval of convergence, about the centre c = 1.

Example 1.2.51. Find the Taylor series for the function f(x) = 1/(x² − 3x + 2) about the centre c = 0, and the interval of convergence.
Hint:
    f(x) = 1/(x² − 3x + 2) = 1/((x − 1)(x − 2)) = 1/(x − 2) − 1/(x − 1).

Example 1.2.52.
(a) Find the Taylor series for sin x about the centre c = π/4.
(b) Find the Maclaurin series for the function f(x) = sin(x)/x.
(c) Find ∫ f(x) dx = ∫ (sin x)/x dx.

1.3 Integration Techniques

1.3.1 Review: Substitution

In MATHS 108, the basic rules for differentiation were covered, and integration was introduced. Integration is the reverse process to differentiation. However, it is not as simple as differentiation, and not every expression can be integrated by a simple application of rules. A rule of differentiation may produce a rule of integration. For instance, integration by substitution, covered in MATHS 108, is derived from the Chain Rule:
    d/dx (f(u(x))) = f′(u(x)) u′(x).    (1.9)
Note: the Chain Rule produces a product, with one factor the derivative of part of the other. The technique of integration by substitution goes in the opposite direction:
    ∫ f′(u(x)) u′(x) dx = ∫ f′(u) du = f(u) + c.    (1.10)
Differentiation of this result with respect to x returns the original integrand f′(u(x)) u′(x), verifying the answer. By making a change of variables in anticipation of the Chain Rule, an integral may be transformed into a much simpler one.

Example 1.3.1. Find ∫ x√(3x² + 5) dx.

Example 1.3.2. Find the following integrals by substitution:
(a) ∫ (1 + 3x)⁵ dx
(b) ∫ x/(1 + x²) dx
(c) ∫ x/(1 + x) dx
(d) ∫ e^(sin θ) cos θ dθ
(e) ∫ ln(s)/s ds
(f) ∫ sin(2y) dy
(g) ∫ e^(−4x) dx
(h) ∫ u/(1 + u)⁴ du
(i) ∫ x√(2x² − 3) dx
(j) ∫ 4x/∛(2x² − 3) dx
(k) ∫ (ln(x))⁴/x dx
(l) ∫ cos(3t) sin(3t) dt
(m) ∫ tan(πθ) dθ
(n) ∫ sin(ln(z))/z dz
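Substitution integrals can be checked with Matlab's symbolic int, as in Example 1.3.8 later. A minimal sketch of mine for two of the integrals above (assumes the Symbolic Math Toolbox; Matlab omits the constant of integration):

% Symbolic checks of two substitution integrals
syms x theta
int(x*sqrt(3*x^2 + 5), x)               % Example 1.3.1: (3x^2+5)^(3/2)/9
int(exp(sin(theta))*cos(theta), theta)  % Example 1.3.2(d): exp(sin(theta))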
Consider both choices: R x u = ex dv = x dx (i) xe dx = 2 du = ex dx v = x2 When applying the integration by parts formula Z (ii) R xex dx u dv = uv − ated, u(x), from as high on the following list as possible (LIATE): • Logarithms • Inverse trigonometric functions • Algebraic • Trigonometric functions • Exponential functions Z = u =x du = dx dv = ex dx v = ex v du choose the function to be differenti- CONTENTS 59 ExampleR1.3.4. Evaluate (x − 2) sin 2x dx. ExampleR1.3.5. Evaluate x ln x dx. Example R1.3.6. Evaluate ln x dx. (Hint: Recognise the integrand as the product 1 · ln x.) Note: We may use integration by parts repeatedly, . . . or we may construct an equation and solve it. ExampleR1.3.7. Evaluate t2 e−3t dt. Example 1.3.8. The Matlab code to integrate in the previous example is % Matlab-session % - symbolic integration syms t int(t^2*exp(-3*t),t) CONTENTS 60 ExampleR1.3.9. Evaluate sin(θ)e−θ dθ. Example 1.3.10. Find the following integrals: Z ln(2x) (a) dx x R √ (b) x x2 + 1 dx (c) Z x √ dx 2 x −1 (d) Z 5x2 √ dx 2x3 + 1 CONTENTS (e) Z (f) Z (g) Z (h) Z 4x2 p 3x−4 e 61 5 3x3 + 2 dx dx 2 x ln(x) dx 3x−4 xe dx (i) Z 3 dx 2x − 1 (j) Z 2xe−x+4 dx (k) Z (x2 + 1)e2−3x dx (l) Z (x2 − x + 3)e2−3x dx CONTENTS (m) Z (n) Z 2x2 p 62 3x3 + 2 dx θ cos(πθ)e dθ (o) Z 6 dx 2 − 3x (p) Z sin(πx)e−2x dx Linear algebra 2.1 Vector Spaces Throughout this course, all vectors and matrices we consider will have real components - values on the real number line R. We consider Rn as an extension of R2 and R3 . The geometry of vectors, points, lines and planes in the 2dimensional plane R2 and 3-dimensional space R3 is extended to vectors of n components. A vector can be thought of as either z w u+v+w (a) a geometrical object in R2 or R3 with length and direction v (b) an ordered list of n components (row or column), as for example the variable x and constant b in a system of linear equations Ax = b. u x 2.1.1 Review: Solving Linear Systems of Equations Linear algebra is chiefly concerned with techniques for solving systems of linear equations. A system of m linear equations in the n unknowns x1 , x2 , · · · xn has the form a11 x1 a21 x1 .. . + + .. . a12 x2 a22 x2 am1 x1 + am2 x2 + ··· + ··· .. .. . . + ··· +a1n xn = +a2n xn = b1 b2 +amn xn = bm (2.13) All coefficients aij , right-hand-side terms bi and unknowns xi will be real numbers in Maths 208. The system (2.13) can be written in matrix form as Ax = b, where • A is an m × n matrix, • x an n × 1 vector of unknowns, and • b an m × 1 vector of numbers. 63 u+v y CONTENTS 64 If b = 0, we call the system homogeneous. Otherwise we call it inhomogeneous. Recall that any system of linear equations has either (i) one solution, (ii) infinitely many solutions, (iii) no solution. Elementary Row-Operations We classify three types of such “elementary row-operations”: (i) row-exchange: Interchange any two rows (ri ↔ rj ). (ii) row-multiple: Multiply a row by a non-zero constant (ri → kri ). (iii) row-addition: Replace a row by itself plus any multiple of another row (ri → ri − krj ). Definition 2.1.1. Two matrices A and B related by a series of row-operations are written “A∼B”. We say A and B are row-equivalent. Echelon and Reduced Echelon Form Definition 2.1.2. 
Echelon and Reduced Echelon Form

Definition 2.1.2.
(a) A matrix A (not necessarily square) is in echelon form if
• all rows of zeros are at the bottom,
• the first non-zero entry in every row (called a "pivot" or "leading entry") is to the right of the first non-zero entry in the previous row (a step-like pattern of leading entries),
• the leading entry in every row has zeros below it.
(b) A matrix A is in reduced echelon form if
• it is in echelon form,
• the leading entry in every row (the pivot) is 1,
• each leading 1 is the only non-zero entry in its column.

Definition 2.1.3. The number of leading entries (pivots) in the echelon form of a matrix A is called the rank of A, denoted by rank(A).

Notes:
1. Echelon form for a matrix is not unique – there are many possibilities. However, in echelon form the number of pivots is unique. This is the number of non-zero rows in echelon form.
2. Reduced echelon form for any matrix is unique (there is only one).

You can already solve a linear system in several ways:
(a) If A is square (n × n) and has an inverse: x = A⁻¹b.
(b) Elementary row-operations can be used to reduce the augmented matrix [A|b] to
(i) echelon form: use back-substitution to solve;
(ii) reduced echelon form: read off the solution.

Summary 2.1.4. For square matrices, the following are equivalent conditions:
• A⁻¹ exists – i.e. A is non-singular, or invertible
• det(A) ≠ 0
• A ∼ Iₙ
• every row and column of echelon form has a pivot
• rank(A) = n
• the system of linear equations Ax = b has a unique solution for every b
• the homogeneous system of linear equations Ax = 0 has only the trivial solution x = 0

And when A does not have an inverse:

Summary 2.1.5. For square matrices, the following are equivalent conditions:
• A⁻¹ does not exist – A is a singular matrix
• det(A) = 0
• A ≁ Iₙ
• not every row and column of echelon form has a pivot
• rank(A) < n
• the homogeneous system of linear equations Ax = 0 has a non-trivial solution x ≠ 0

Now we begin to study vectors.

Notation: We will sometimes write vectors as row-vectors, and sometimes as columns. We will not distinguish between the two except where matrix multiplication dictates one form over the other.

2.1.2 Linear Combinations

Definition 2.1.6. A vector b is a linear combination of vectors v₁, v₂, . . . , v_k if we can write
\[ b = x_1v_1 + x_2v_2 + \cdots + x_kv_k \tag{2.14} \]
for scalars x₁, x₂, . . . , x_k.

Note: We can write this linear combination in matrix form, letting A = [v₁ v₂ ⋯ v_k], the matrix with the vᵢ as its columns. Then (2.14) becomes b = Ax. So a linear combination of vectors (2.14) is no more than the vector-form of a matrix equation.

We will spend some time studying how to simplify linear combinations. They sometimes appear more complicated than they really are.

Example 2.1.7.
(a) Find constants c₁ and c₂ so that
\[ \begin{pmatrix}2\\3\\4\end{pmatrix} = c_1\begin{pmatrix}1\\1\\1\end{pmatrix} + c_2\begin{pmatrix}1\\2\\3\end{pmatrix}. \]
(b) Use your answer to (a) to simplify the linear combination
\[ x_1\begin{pmatrix}1\\1\\1\end{pmatrix} + x_2\begin{pmatrix}1\\2\\3\end{pmatrix} + x_3\begin{pmatrix}2\\3\\4\end{pmatrix} \]

It isn't always easy to see how to simplify a linear combination of vectors. That's the subject of our next section.

2.1.3 Linear Independence and Dependence

Linear independence is a very important notion: it holds the key to the idea of dimension, and is fundamental in most applications of linear algebra.

Definition 2.1.8.
(i) We say that a set of vectors {v₁, v₂, . . . , vₙ} is linearly independent if the only solution of
\[ c_1v_1 + c_2v_2 + \cdots + c_nv_n = 0 \tag{2.15} \]
is c₁ = c₂ = ⋯ = cₙ = 0.
(ii) A set of vectors which is not linearly independent is called linearly dependent.
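Independence can be tested in Matlab with rank; a minimal sketch, reusing the vectors of Example 2.1.7:

%Matlab-session
% - rank test on the vectors of Example 2.1.7
V=[1 1 2; 1 2 3; 1 3 4];   % columns v1, v2, v3
rank(V)                    % rank 2 < 3 columns: the set is dependent
null(sym(V))               % a non-trivial solution of Vc = 0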
Notes:
• (2.15) is the vector form of the matrix equation Ax = 0 for the matrix A = [v₁ v₂ ⋯ vₙ]. From this perspective, the vectors v₁, v₂, . . . , vₙ are linearly independent if Ax = 0 ⇒ x = 0, and linearly dependent if Ax = 0 for some x ≠ 0.
• Linear dependence of a set of vectors {vᵢ} means one of the vectors vᵢ can be written as a linear combination of the others.

From the reduced echelon form of a matrix (sometimes abbreviated rref), it is particularly easy to detect linear dependence among its columns.

Example 2.1.9. Consider the columns of the matrix
\[ A = \begin{pmatrix}1&1&1&1\\1&1&1&1\\1&2&3&4\end{pmatrix} = [v_1\; v_2\; \cdots\; v_4]. \]
Reducing A to reduced echelon form (pivots in columns 1 and 2),
\[ \begin{pmatrix}1&1&1&1\\1&1&1&1\\1&2&3&4\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix} = U. \]
We have the augmented system for simultaneously solving the pair of equations c₁v₁ + c₂v₂ = v₃ and d₁v₁ + d₂v₂ = v₄.

As any vector (a, b, c)ᵀ can be written
\[ \begin{pmatrix}a\\b\\c\end{pmatrix} = a\begin{pmatrix}1\\0\\0\end{pmatrix} + b\begin{pmatrix}0\\1\\0\end{pmatrix} + c\begin{pmatrix}0\\0\\1\end{pmatrix}, \tag{2.16} \]
we can write the non-pivot columns of reduced echelon form in terms of the pivot columns. So in the matrix U above,
column₃ = −column₁ + 2 · column₂,
column₄ = −2 · column₁ + 3 · column₂.
(a) Verify that in the original matrix A, the same relationships hold, establishing the linear dependence of the sets {v₁, v₂, v₃} and {v₁, v₂, v₄}.
(b) Using these relationships, simplify the general linear combination c₁v₁ + c₂v₂ + c₃v₃ + c₄v₄.
(c) Alternatively, solve the homogeneous system c₁v₁ + c₂v₂ + c₃v₃ + c₄v₄ = 0 directly to determine linear dependence. Compare with (a).

Row-reduction of a matrix to reduced echelon form preserves the relationships between the columns, but makes them more obvious!

Linear dependence among vectors v₁, v₂, ⋯, v_k is visible in the reduced echelon form U of a matrix:
\[ A = [v_1\; v_2\; \cdots\; v_k] \sim \begin{pmatrix}1&*&0&0&*&*&\dots&a\\0&0&1&0&*&*&\dots&b\\0&0&0&1&*&*&\dots&c\\0&0&0&0&0&0&\dots&0\\0&0&0&0&0&0&\dots&0\end{pmatrix} = U. \]
• Divide the reduced echelon-form matrix U between the pivot columns and the non-pivot columns.
• The set of pivot columns of U is linearly independent. So also is the set of corresponding columns of A.
• Each non-pivot column of U is a linear combination of the pivot columns, by (2.16).
• The same linear relationships hold for the columns of A.
• As illustrated, v_k = av₁ + bv₃ + cv₄.

Notes:
• The ordering of the vectors v₁, v₂, ⋯, v_k is not unique, but this ordering determines which relationships the reduced echelon form elicits. We could have written the vᵢ in a different order to get a different linearly-independent set, and a different, but equivalent, set of linear dependence relationships.
• A pivot column of reduced echelon form does not yield a linear relationship with other columns – there are only zeros above and below the pivot.
• The column-relationships of reduced echelon form hold for each intermediary matrix in the row-reduction.

FACT: A matrix of rank s has exactly s linearly independent columns.

Example 2.1.10. Identify the linear relationships between the columns of the given matrices from their reduced-echelon forms.
(a) $\begin{pmatrix}1&2&3\\2&3&4\\3&4&4\end{pmatrix}$
(b) $\begin{pmatrix}1&1&1\\1&2&3\\1&0&-1\\0&1&0\end{pmatrix} \sim \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\\0&0&0\end{pmatrix}$
(c) $\begin{pmatrix}0&1&2&3\\1&0&1&0\\0&0&1&1\\0&1&1&1\end{pmatrix} \sim I_4$
(d) $\begin{pmatrix}0&1&1&1&0\\1&1&1&-1&0\\0&-1&1&0&1\\1&0&1&0&1\end{pmatrix}$
(e) $\begin{pmatrix}1&1&0&1\\-1&1&0&1\\1&1&2&1\\0&0&3&-1\end{pmatrix}$
(f) $\begin{pmatrix}0&0&0&0&-1\\0&1&0&1&0\\1&-1&1&0&0\\0&1&-1&-1&1\\0&0&0&-1&0\end{pmatrix}$
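The column relationships can be read off in Matlab; a minimal sketch using the matrix of Example 2.1.9:

%Matlab-session
% - column relationships from the reduced echelon form (Example 2.1.9)
A=[1 1 1 1; 1 1 1 1; 1 2 3 4];
U=rref(A)                   % non-pivot columns written in pivot columns
A(:,3), -A(:,1)+2*A(:,2)    % the same relationship holds in A itself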
Example 2.1.11. Verify the following for any vectors you choose:
(a) A single non-zero vector is always linearly independent.
(b) Two vectors are linearly independent if they are not multiples of each other (i.e. if they are not parallel).
(c) Any two non-zero vectors that are linearly dependent are multiples of each other.
(d) Any n + 1 vectors in Rⁿ are linearly dependent.

Example 2.1.12. Are the following sets of vectors linearly independent? If not, find a linear relationship between them.
(a) {[1, 2, 3], [2, 4, 6]}
(b) {[1, 2, 3], [2, 4, −6]}
(c) {[1, 2], [1, 3], [1, 4]}
(d) {[0, 0, 1], [0, 1, 0], [1, 0, 0]}
(e) {[2, 1, 0], [1, 1, 0], [1, 0, 1]}
(f) {[1, 1, 2], [−1, −2, 0], [−1, 0, 1], [−1, 1, 0]}

The span of a set of vectors

Two distinct points (one bound vector) determine a line. Three points not all on the same line (two linearly-independent vectors) determine a plane. The plane is the "span" of these vectors.

Definition 2.1.13. The span of vectors u₁, u₂, . . . , u_k in Rⁿ is
\[ \{c_1u_1 + \cdots + c_ku_k : \; c_1, \cdots, c_k \in \mathbb{R}\}, \]
denoted by Span(u₁, ⋯, u_k).

[Figure: Span(u, v) – the plane through the origin containing u, v, u + v and 2u.]

Note: We use the notation {x, y} to denote the set containing two members, x and y. The span of a set of vectors is the set of all linear combinations of that set of vectors. We say that vectors u₁, u₂, . . . , u_k span a set X if every x ∈ X can be written x = c₁u₁ + ⋯ + c_ku_k, for c₁, ⋯, c_k ∈ R.

Note: A linear combination of linearly-dependent vectors can always be simplified, so the span of a linearly-dependent set is actually the span of a smaller set of linearly-independent vectors.

Example 2.1.14. The span of any number of vectors is closed under scalar multiplication and addition. Consider the case of just two vectors u₁ and u₂. Two vectors x = c₁u₁ + c₂u₂ and y = d₁u₁ + d₂u₂ are both members of Span(u₁, u₂). In addition,
rx = (rc₁)u₁ + (rc₂)u₂,
x + y = (c₁ + d₁)u₁ + (c₂ + d₂)u₂
are also in Span(u₁, u₂).

Recall that
(i) the equation of a line through the origin in R² is
\[ \begin{pmatrix}x\\y\end{pmatrix} = t\begin{pmatrix}d_1\\d_2\end{pmatrix}, \quad t \in \mathbb{R}, \]
(ii) the equation of a line through the origin in R³ is
\[ \begin{pmatrix}x\\y\\z\end{pmatrix} = t\begin{pmatrix}d_1\\d_2\\d_3\end{pmatrix}, \quad t \in \mathbb{R}, \]
(iii) the equation of a plane through the origin in R³ is
\[ \begin{pmatrix}x\\y\\z\end{pmatrix} = s\begin{pmatrix}d_1\\d_2\\d_3\end{pmatrix} + t\begin{pmatrix}d'_1\\d'_2\\d'_3\end{pmatrix}, \quad s, t \in \mathbb{R}, \]
where the sets of direction vectors {d} and {d, d′} are linearly independent.

Summary 2.1.15.
(i) The span of one linearly independent vector in R² or R³ is a line through the origin.
(ii) The span of two linearly independent vectors in R² is all of R².
(iii) The span of two linearly independent vectors in R³ is a plane through the origin.
(iv) The span of three linearly independent vectors in R³ is all of R³.

Example 2.1.16. Characterise the following sets in R³, and sketch them.
(a) The span of u₁ = [2, 1, 1].
(b) span{[4, 1, 2], [2, 2, 1]}.
(c) span{[1, 1, 0], [2, 2, 0]}.
(d) span{[3, 0, 1], [2, 0, −1]}.
(e) span{[3, 0, 1], [4, 0, −1], [2, 0, 3]}.
(f) span{[3, 0, 1], [0, 0, −1], [2, 1, 0]}.

Example 2.1.17. The Matlab function spanner.m (available from course resources) generally takes two arguments, both vectors in R² or R³, and illustrates their span. Copy it to your working directory, and call it with two vectors, or as below with two vectors and a pause interval (in seconds). Rotate the graph to see the shape of the span.

%Matlab-session: span of two vectors
a=[1 2 2]; b=[1 1 1];   % row or column vectors
spanner(a,b);

[Figure 2.6: Output of vs1.m.]
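Whether a given b lies in a span can also be checked with ranks; a minimal sketch, with vectors chosen only for illustration:

%Matlab-session
% - b is in Span(u1,u2) iff appending b does not raise the rank
u1=[1;0;1]; u2=[0;1;1]; b=[2;3;5];
rank([u1 u2]), rank([u1 u2 b])   % equal ranks: b = 2*u1 + 3*u2 is in the span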
2.1.4 Definition of a Vector Space

In its abstract form, a vector space is a non-empty set V of objects, called vectors, for which addition and multiplication by scalars are defined. For any vectors u, v and w in V, and any scalars r and s, the following ten properties are satisfied:
1. u + v is in V
2. u + v = v + u
3. u + (v + w) = (u + v) + w
4. There is a zero vector 0 in V so that u + 0 = u
5. u in V means −u is also in V, and u + (−u) = 0
6. ru is in V
7. r(u + v) = ru + rv
8. (r + s)u = ru + su
9. r(su) = (rs)u
10. 1u = u

Theorem 2.1.18. The span of any set of vectors in Rⁿ is a vector space.

Euclidean Vector Spaces Rⁿ

The physical entities we know as lines, planes and 3-d space can easily be seen to be vector spaces – by confirming that they satisfy all the properties above. We call them R¹, R² and R³ respectively. You live in R³.

[Figure: coordinate axes for R¹, R² and R³.]

With our geometric intuition about lines, planes and space, we can attribute a lot of properties to these vector spaces. Many of these properties extend, algebraically at least, to sets of vectors with an arbitrary number n of components. The set of all such vectors turns out to be a vector space too, Rⁿ.

Example 2.1.19. The vectors in Rⁿ are the n-tuples (x₁, x₂, ⋯, xₙ), where each component xᵢ is real. We can easily verify that Rⁿ is a vector space for any integer n > 0.

Definition 2.1.20. For any positive integer n, the vector space Rⁿ of all vectors with n components is known as (n-dimensional) Euclidean vector space.

Other vector spaces

There are a variety of things which are not geometrically recognisable Euclidean vector spaces, but which have an analogous structure, and can therefore be called real vector spaces. Some examples:
1. An m × n matrix may be considered as a list of mn numbers, written in a particular rectangular fashion. Under addition of matrices and multiplication of matrices by a scalar, all m × n matrices form a vector space.
2. Polynomials of degree ≤ n in the variable x: all expressions of the form p(x) = a₀ + a₁x + ⋯ + aₙxⁿ.
3. In the following subsections we will see that the set of all solutions of the matrix equation Ax = 0, for any particular m × n matrix A, forms a vector space. This set is called the nullspace of the matrix A, denoted Null(A).
4. Also, the set of all vectors b for which the linear system Ax = b has a solution, for any given matrix A. This set is called the column space of the matrix A, denoted Col(A).
5. In the Differential Equations section we will see systems of homogeneous linear ordinary differential equations, e.g.,
(a) dy/dt + y = 0,
(b) d²y/dt² + 4 dy/dt + 3y = 0.
The set of solutions of each of these differential equations forms a vector space of functions.

We conclude by mentioning that everything we do with real numbers, R, can equally well be done with complex numbers C. Vector spaces over C are of great importance in many applications, but for this course we will stick to real vector spaces.

2.1.5 Basis and Dimension

Definition 2.1.21. If V is a vector space and B a set of vectors in V, we say that B is a basis of V if
(i) B is a linearly independent set;
(ii) B spans V.

Notes:
• It's easy to determine whether a set of vectors B is linearly independent (the rank of a matrix counts its linearly independent columns).
• To determine whether the set B spans the vector space V, we must verify that every y ∈ V is a linear combination of vectors in B.
(Linear independence ensures that there are not too many vectors in a basis. Spanning ensures that there are not too few.)
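For n vectors in Rⁿ, both conditions can be checked at once; a minimal sketch, using the columns that will reappear as the orthogonal basis of Example 2.2.4:

%Matlab-session
% - n vectors form a basis of R^n iff the matrix with those columns is non-singular
B=[1 1 1; 1 0 -2; 1 -1 1];   % columns [1;1;1], [1;0;-1], [1;-2;1]
det(B)                       % non-zero, so the columns are a basis of R^3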
If a vector space V has a basis B, then
(a) any set of vectors in V that is larger than B is linearly dependent;
(b) any set of vectors in V that is smaller than B does not span the vector space.

The standard basis of Rⁿ

Among the bases of Rⁿ, the special basis of vectors each with one component equal to 1 and all other components 0,
\[ e_1 = \begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}, \quad e_2 = \begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}, \]
is called the standard basis of Rⁿ. Any vector x ∈ Rⁿ can be written
\[ \begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} = x_1\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix} + x_2\begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix} + \cdots + x_n\begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}. \]

Example 2.1.22.
(a) Sketch the standard bases for R² and R³.
(b) To sketch a plane in R³, sketch its basis vectors and complete a parallelogram with lines parallel to these vectors.

Definition 2.1.23. Any vector space V that is not just the zero vector has infinitely many bases. But every basis for V has the same number of elements. This number is called the dimension of V, denoted dim(V).

Note: The dimension of a vector space is the number of linearly independent vectors required to span the vector space.

We already have one basis, the standard basis, for any Euclidean vector space Rⁿ. This gives the result:

FACT: Euclidean vector space Rⁿ has dimension n.

This result agrees with our colloquial use of the word dimension – we think of
• lines as one-dimensional,
• planes (like the black-board) as 2-dimensional,
• space as 3-d.

2.1.6 Subspaces of Rⁿ

Definition 2.1.24. Let W be a set of some vectors in Rⁿ, i.e. W is a subset of Rⁿ. If W satisfies the following properties:
(i) if v₁ and v₂ are in W, then v₁ + v₂ ∈ W,
(ii) if v ∈ W and r is a scalar, then rv ∈ W,
then the set W is called a subspace of Rⁿ.

In other words, a subspace W of Euclidean space Rⁿ contains the origin, and is closed under addition and scalar multiplication.

Notes:
1. Every subspace of any vector space contains the origin. (Let r = 0 in (ii) above.)
2. A subspace W of a vector space V is a vector space within a vector space.
3. dim(W) ≤ dim(V).

FACT: The span of any set of vectors in Rⁿ forms a vector space. If we call this vector space W, and if the spanning vectors are a linearly independent set, they form a basis for W.

Example 2.1.25.
(a) A one-dimensional subspace of Rⁿ is called a line.
(b) A two-dimensional subspace of Rⁿ is called a plane. Etc.

Example 2.1.26. Show that the line
\[ \begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1\\1\\0\end{pmatrix} + t\begin{pmatrix}1\\2\\3\end{pmatrix} \]
in R³ is not a subspace of R³.

Finding a basis for a subspace of Rⁿ

Example 2.1.27. Suppose we have a subspace S of R³:
S = Span{[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]}.
The vectors v₁, . . . , v₄ of S span it – we simply have to determine a linearly independent set among them. From the reduced echelon form:
\[ A = \begin{pmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix}. \]
v₁ and v₂ are linearly independent, so they form a basis for S.

Could we have seen this without reducing A to reduced echelon form?
• We could look at the echelon form of A, and observe which columns contain the pivots.
• These are the pivot-columns in reduced echelon form too.
• The corresponding columns of A are linearly independent, and form a basis for the vector space S.

To find a basis for the span of vectors v₁, v₂, ⋯, vₙ:
• form the matrix A = [v₁ v₂ ⋯ vₙ] with these vectors as its columns,
• reduce A to echelon form U (not unique),
• identify the columns of U containing the pivots,
• the corresponding columns of the original matrix A are linearly independent and form a basis for Span{v₁, v₂, ⋯, vₙ}.

Example 2.1.28. Given the set of vectors S = {[1, 1, 1], [3, 1, −1], [0, 3, −1], [1, 0, 1]}, find a basis for the vector space in R³ spanned by the set S.
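This recipe is one line in Matlab; a minimal sketch with the matrix of Example 2.1.27:

%Matlab-session
% - basis for a span: keep the pivot columns of A
A=[1 2 3 4; 5 6 7 8; 9 10 11 12];
[R,p]=rref(A);   % p lists the pivot columns of A
A(:,p)           % these columns form a basis for the span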
2.1.7 Matrices and their Associated Subspaces in Rⁿ

Associated with any m × n matrix A are two special subspaces: its nullspace Null(A), and its column space Col(A).

Column space

Definition 2.1.29. The column space of a matrix A, denoted Col(A), is the span of its columns. If A = [v₁ v₂ ⋯ vₙ] is an m × n matrix,
\[
\begin{aligned}
\mathrm{Col}(A) &= \mathrm{Span}\{v_1, v_2, \cdots, v_n\}\\
&= \{x_1v_1 + x_2v_2 + \cdots + x_nv_n, \text{ for some } x \in \mathbb{R}^n\}\\
&= \{b \in \mathbb{R}^m : Ax = b, \; x \in \mathbb{R}^n\}
\end{aligned}
\tag{2.17}
\]

Example 2.1.30. As the span of a set of vectors, the column space Col(A) of any matrix A is a subspace of Rᵐ. A basis for Col(A) is found with the technique of Section 2.1.6.

To find a basis for the column space Col(A) of a matrix A,
• reduce A to echelon form U (not unique),
• identify the columns of U containing the pivots,
• the corresponding columns of the original matrix A form a basis for the column space Col(A).

Example 2.1.31. Find a basis and the dimension of the column space of the matrices
\[ \text{(a)}\; A = \begin{pmatrix}1&2&-1&2\\1&2&1&-2\\1&2&-3&6\end{pmatrix} \sim \begin{pmatrix}1&2&-1&2\\0&0&1&-2\\0&0&0&0\end{pmatrix} \qquad \text{(b)}\; B = \begin{pmatrix}1&2&1\\2&4&3\\3&6&4\end{pmatrix} \sim \begin{pmatrix}1&2&1\\0&0&1\\0&0&0\end{pmatrix} \]

We can only solve a system of equations Ax = b if b ∈ Col(A). For if A = [v₁ v₂ ⋯ vₙ] is an m × n matrix, then
Ax = x₁v₁ + x₂v₂ + ⋯ + xₙvₙ
(Ax is a linear combination of the columns of A). So
Ax = b ⟺ b ∈ Col(A).

FACT: A system of linear equations Ax = b is consistent (has a solution) if and only if b is a linear combination of the columns of A.

Example 2.1.32. The system of equations
\[ \begin{pmatrix}1&1\\1&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix} \]
is inconsistent. We cannot write $\begin{pmatrix}1\\2\end{pmatrix} = x_1\begin{pmatrix}1\\1\end{pmatrix}$ (see Figure 2.7).

[Figure 2.7: Ax = b inconsistent when b ∉ Col(A).]

Example 2.1.33. Matlab has a symbolic colspace command to find the column space of a matrix.

%Matlab-session
%
a=[1 1 1;1 0 -1;1 -2 1];
colspace(sym(a))

Nullspace

Definition 2.1.34. If A is an m × n matrix, the set of all solutions of the homogeneous system Ax = 0 is called the nullspace of A. It is denoted by Null(A). To find Null(A), solve Ax = 0 for x.

Example 2.1.35. Find the nullspace of the matrix
\[ A = \begin{pmatrix}1&2&3&4\\2&3&4&5\\3&4&5&6\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix} \]
and write it as the span of a set of vectors.

Notes:
• Null(A) is what we call the general solution of the system Ax = 0.
• Every matrix has the vector 0 in its nullspace.
• If A has linearly independent columns, Null(A) = {0} – the only solution of Ax = 0 is the trivial solution x = 0, so Null(A) is the set containing only the zero vector.
• If A has linearly dependent columns, Null(A) ≠ {0} – there are non-trivial solutions x ≠ 0 of Ax = 0.

Recall that the rank of a matrix counts the number of linearly independent columns.

Example 2.1.36. Find the nullspace of the following matrices by finding the general solution of the associated homogeneous systems:
\[ \text{(a)}\; A = \begin{pmatrix}1&2\\3&4\end{pmatrix} \sim \begin{pmatrix}1&0\\0&1\end{pmatrix}. \]
\[ \text{(b)}\; B = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix} \sim \begin{pmatrix}1&0&-1\\0&1&2\\0&0&0\end{pmatrix}. \]
If Bx = 0, then x = (x₁; x₂; x₃), x₃ is free, and the variables x₁ and x₂ are bound:
\[ \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}1\\-2\\1\end{pmatrix}x_3. \]
\[ \text{(c)}\; C = \begin{pmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix}. \]
If Cx = 0, then
\[ x = \begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}x_3+2x_4\\-2x_3-3x_4\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}1\\-2\\1\\0\end{pmatrix}x_3 + \begin{pmatrix}2\\-3\\0\\1\end{pmatrix}x_4. \]
\[ \text{(d)}\; A = \begin{pmatrix}1&2&3&4&5\\6&7&9&10&11\\12&13&14&15&16\end{pmatrix} \sim \begin{pmatrix}1&0&0&-1&-2\\0&1&0&1&2\\0&0&1&1&1\end{pmatrix}. \]
\[ \text{(e)}\; A = \begin{pmatrix}1&2&3&4&5\\6&7&8&10&11\\12&13&14&15&16\end{pmatrix} \sim \begin{pmatrix}1&0&-1&0&-1\\0&1&2&0&1\\0&0&0&1&1\end{pmatrix}. \]

Example 2.1.37. For A an m × n matrix, Null(A) is a subspace of Rⁿ. That is:

FACT: The set of solutions of any homogeneous system of linear equations in n unknowns is a subspace of Rⁿ.

Definition 2.1.38. The dimension of Null(A) is called the nullity of A.
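Matlab's symbolic null (used again in Example 2.1.40 below) returns exactly such a spanning set; a minimal check for the matrix of Example 2.1.35:

%Matlab-session
% - a basis for the nullspace of the matrix of Example 2.1.35
A=[1 2 3 4; 2 3 4 5; 3 4 5 6];
null(sym(A))   % columns span Null(A)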
• If the n columns of A are linearly independent, Null(A) = {0}, and we say the nullity of A is 0.
• The dimension of Null(A) is the number of free variables in the solution of the system Ax = b: the total number of variables (n) minus the number of bound variables (rank(A)).

To find a basis for the nullspace Null(A) of an m × n matrix A:
• reduce A to echelon form U,
• back-substitute to solve Ux = 0 – this solves Ax = 0,
• write the solution in vector form, as all linear combinations of some vectors,
• a basis for Null(A) consists of these vectors in Rⁿ.

Example 2.1.39. Find the nullity of the given matrices:
\[ \text{(a)}\; A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix} \sim \begin{pmatrix}1&0&-1\\0&1&2\\0&0&0\end{pmatrix} \quad \text{(b)}\; B = \begin{pmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix} \quad \text{(c)}\; C = \begin{pmatrix}1&0&0&-1\\0&0&-1&1\\0&0&0&0\end{pmatrix} \]

Example 2.1.40. The Matlab function eigshow illustrates the nullspace of a 2 × 2 matrix very well. Use your mouse to move the x vector around, and see where Ax is. When Ax is at the origin, x ∈ Null(A).

%Matlab-session
A=[1 2; 2 4]              % rank 1 matrix
rank(A)
null(A)                   % unit vectors
null(sym(A))              % not unit-vectors
B=[1 2 3; 2 3 4; 4 5 6]   % rank 2 matrix
u=null(sym(B))
eigshow([1 3;4 2]/4);
eigshow(A);

FACT: Vectors in Null(A) are orthogonal to every row of A.

Example 2.1.41. Verify the fact above for the matrix
\[ A = \begin{pmatrix}1&2&3&4\\2&3&4&5\\3&4&5&6\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2\\0&1&2&3\\0&0&0&0\end{pmatrix}. \]

2.1.8 The General Solution of Ax = b

The nullspace of a matrix A, the solution of Ax = 0, plays an important part in the general solution of inhomogeneous systems of equations Ax = b. This is because matrix multiplication is a linear transformation – it possesses the property of linearity.

Linearity

Definition 2.1.42. For any matrix A, compatible vectors u and v, and scalar c,
A(cv) = cAv,   A(u + v) = Au + Av.
These two properties together characterise linear transformations.

Note: We say matrix multiplication is a linear operation. So also are the operations of differentiation and integration:
• (c₁f + c₂g)′ = c₁f′ + c₂g′,
• ∫(c₁f + c₂g) dx = c₁∫f dx + c₂∫g dx.

Solutions of Ax = b

In particular, if u ∈ Null(A),
A(cu + v) = Av.
So if we know just one (any) solution v of Ax = b, then we know a whole family of solutions of Ax = b:
{v + u : Av = b and Au = 0}. (2.18)
We call any solution of Ax = b a particular solution. It turns out that (2.18) describes every possible solution of Ax = b.

The general solution of an inhomogeneous system Ax = b, the set of all possible solutions, can be written:
{v + u : Av = b, u ∈ Null(A)}. (2.19)

general solution of Ax = b = particular solution of Ax = b + general solution of the homogeneous system Ax = 0. (2.20)

The reduced echelon form of the augmented matrix contains all this information:

Example 2.1.43. The linear system Ax = b given by
x₁ + 2x₂ + 3x₃ + 4x₄ = 5
6x₁ + 7x₂ + 8x₃ + 9x₄ = 10
11x₁ + 12x₂ + 13x₃ + 14x₄ = 15
has reduced echelon form
\[ \begin{pmatrix}1&2&3&4&5\\6&7&8&9&10\\11&12&13&14&15\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2&-3\\0&1&2&3&4\\0&0&0&0&0\end{pmatrix} \]
Show that the general solution of Ax = b is
\[ x = \begin{pmatrix}-3\\4\\0\\0\end{pmatrix} + c_1\begin{pmatrix}1\\-2\\1\\0\end{pmatrix} + c_2\begin{pmatrix}2\\-3\\0\\1\end{pmatrix}, \quad \text{where } c_1, c_2 \in \mathbb{R}. \]

Notes:
• The general solution of a system of equations Ax = b is a translation of the nullspace.
• When solutions to a linear system of equations Ax = b exist, the nullspace of A provides the "infinitely many" part of the solution. The general solution x of Ax = b has the same contribution from Null(A) for all b for which a solution exists.
• If the echelon form of A has k columns without a pivot, there are k free variables in the solution of every consistent system Ax = b.
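Both pieces of the general solution can be read off in Matlab; a minimal sketch for Example 2.1.43:

%Matlab-session
% - general solution of Ax = b (Example 2.1.43)
A=[1 2 3 4; 6 7 8 9; 11 12 13 14]; b=[5;10;15];
rref([A b])    % read off a particular solution and the free variables
null(sym(A))   % the homogeneous part of the general solution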
Example 2.1.44. Find the general solutions of the following inhomogeneous systems. Compare with your answers to Example 2.1.36.
\[ \text{(a)}\; \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}-1\\2\\5\end{pmatrix} \qquad \text{Note: } \begin{pmatrix}1&2&3&-1\\4&5&6&2\\7&8&9&5\end{pmatrix} \sim \begin{pmatrix}1&0&-1&3\\0&1&2&-2\\0&0&0&0\end{pmatrix} \]
\[ \text{(b)}\; \begin{pmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\end{pmatrix}x = \begin{pmatrix}2\\4\\6\end{pmatrix} \qquad \text{Note: } \begin{pmatrix}1&2&3&4&2\\5&6&7&8&4\\9&10&11&12&6\end{pmatrix} \sim \begin{pmatrix}1&0&-1&-2&-1\\0&1&2&3&3/2\\0&0&0&0&0\end{pmatrix} \]
\[ \text{(c)}\; \begin{pmatrix}3&2&1\\6&4&2\\9&6&3\end{pmatrix}x = \begin{pmatrix}5\\10\\15\end{pmatrix} \qquad \text{Note: } \begin{pmatrix}3&2&1&5\\6&4&2&10\\9&6&3&15\end{pmatrix} \sim \begin{pmatrix}1&2/3&1/3&5/3\\0&0&0&0\\0&0&0&0\end{pmatrix} \]

Example 2.1.45. A Matlab plot of the nullspace and general solution of this last example shows that the general solution is a vertical translation of the nullspace.

%Matlab-session
% plot plane from vector form
a=[3 2 1; 6 4 2; 9 6 3];
b=[5 10 15]';
rank(a)
c=null(a)
u=-1:1:1;
[x,y]=meshgrid(u,u);
z_1=5-3*x-2*y;                 % solution
mesh(x,y,z_1);                 % plot
hold on
text(1,1,0,'general solution');
v=-1:0.5:1;
[x,y]=meshgrid(v,v);
z_2=-3*x-2*y;                  % nullspace
text(1,1,-5,'nullspace');
text(0,0,0,'(0,0,0)');
mesh(x,y,z_2);                 % plot
title('general solution and nullspace');
hold off;

[Figure: "general solution and nullspace" – the plane of solutions, a vertical translation of the nullspace plane through (0, 0, 0).]

The relationship between nullity and rank

Recall that the rank of a matrix A is the number of non-zero rows or pivots in any echelon form of A.
• The pivot-columns of a matrix in echelon form are linearly independent.
• The pivot-rows of a matrix in echelon form are linearly independent.
• There are as many pivot rows as pivot columns in any echelon form. This is the rank of the matrix.

This tells us that: the rank of a matrix is the dimension of its column space: rank(A) = dim(Col(A)).

Note: If A is a square n × n matrix, then rank(A) = n ⇔ det(A) ≠ 0 ⇔ A⁻¹ exists.

An important result relating the nullity of a matrix to the rank is the following:

FACT: If A is an m × n matrix, then
rank(A) + nullity(A) = number of columns of A. (2.21)

This simply states that in the general solution of any linear system Ax = b, the number of bound variables plus the number of free variables is the total number of variables – as the number of components of x is equal to the number of columns of A.

Example 2.1.46. Find the rank and nullity of the matrices
\[ \text{(a)}\; A = \begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix} \qquad \text{(b)}\; A = \begin{pmatrix}1&1&1\\-1&1&1\\-1&-1&1\end{pmatrix} \]

Because the columns of a matrix A are linearly independent only when Null(A) = {0}, or nullity(A) = 0, we can use equation (2.21) to establish when a given set of vectors is a basis for a vector space V:

A given set of vectors in Rᵐ is a basis for a vector space V of dimension n when
• there are n vectors,
• they are linearly independent –
this will be true when they are the columns of an m × n matrix A, and either
• Null(A) = {0}, or equivalently
• rank(A) = n.

2.2 Inner Products and Orthogonality

2.2.1 Orthogonal and orthonormal bases

Definition 2.2.1.
(a) A basis {v₁, v₂, . . . , vₙ} of an n-dimensional vector space V is called orthogonal if its members are pairwise orthogonal:
(i) vᵢ · vⱼ = 0 for i ≠ j.
(b) A basis is called orthonormal if it is orthogonal (above) and consists of unit vectors:
(ii) ‖vᵢ‖ = 1, or equivalently vᵢ · vᵢ = 1.
(c) The members vᵢ of an orthonormal basis satisfy
\[ v_i \cdot v_j = v_i^Tv_j = \begin{cases}1 & i = j\\0 & i \neq j\end{cases}. \tag{2.22} \]

Definition 2.2.2. A matrix whose columns are orthonormal vectors is called an orthogonal matrix.

Notes:
• An n × k orthogonal matrix A has rank k, and satisfies the equation
AᵀA = I_k. (2.23)
• An n × n orthogonal matrix A has rank n, and satisfies the very useful property
AᵀA = Iₙ = AAᵀ, so A⁻¹ = Aᵀ. (2.24)
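Property (2.24) is easy to check numerically; a minimal sketch, with an orthogonal matrix chosen for illustration:

%Matlab-session
% - an orthogonal matrix satisfies Q'*Q = I
Q=[1/sqrt(2) 1/sqrt(2); 1/sqrt(2) -1/sqrt(2)];
Q'*Q   % the identity, so inv(Q) = Q'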
Example 2.2.3. The standard basis of Rⁿ is an orthonormal basis.

Example 2.2.4. The vectors v₁ = [1, 1, 1], v₂ = [1, 0, −1], v₃ = [1, −2, 1] form an orthogonal basis B of R³. The basis is not orthonormal.

An orthogonal basis can be made into an orthonormal basis by 'normalising' the basis vectors – by dividing each vector by its length. This gives a basis of unit vectors in the same directions as the original vectors.

Example 2.2.5. B = {[1, 1, 1], [1, 0, −1], [1, −2, 1]} is an orthogonal basis of R³. The lengths of its vectors are √3, √2 and √6, respectively. Therefore,
\[ \left\{\left[\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\right], \left[\tfrac{1}{\sqrt2}, 0, -\tfrac{1}{\sqrt2}\right], \left[\tfrac{1}{\sqrt6}, -\tfrac{2}{\sqrt6}, \tfrac{1}{\sqrt6}\right]\right\} \]
is an orthonormal basis of R³.

Why would we want an orthonormal basis?

In the next topic, and in many computational situations, we will want to express a given vector x₀ as a linear combination of a basis {v₁, ⋯, vₙ} of Rⁿ. That is, we find coefficients c₁, c₂, . . . , cₙ, so that
x₀ = c₁v₁ + ⋯ + cₙvₙ. (2.25)
Formulating this as a matrix equation, we write A = [v₁ v₂ ⋯ vₙ]. Then (2.25) says x₀ = Ac for c = (c₁; ⋯; cₙ). Because the vᵢ form a basis, the matrix A has an inverse:
x₀ = Ac ⇒ c = A⁻¹x₀.
If the set {v₁, v₂, ⋯, vₙ} forms an orthonormal basis, then A is an orthogonal matrix, so A⁻¹ = Aᵀ (see (2.24)):
\[ Ac = x_0 \;\Rightarrow\; c = A^Tx_0 = \begin{pmatrix}v_1^T\\v_2^T\\\vdots\\v_n^T\end{pmatrix}x_0 = \begin{pmatrix}v_1\cdot x_0\\v_2\cdot x_0\\\vdots\\v_n\cdot x_0\end{pmatrix}, \]
as vᵢᵀx₀ = vᵢ · x₀. In this case, (2.25) has a particularly simple solution
x₀ = (v₁ · x₀)v₁ + (v₂ · x₀)v₂ + ⋯ + (vₙ · x₀)vₙ. (2.26)

Example 2.2.6. Confirm that the vector x = (1; 2) can be written as a linear combination of the orthonormal basis vectors
\[ u = \tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix}, \qquad v = \tfrac{1}{\sqrt2}\begin{pmatrix}1\\-1\end{pmatrix} \]
by verifying that x = (x · u)u + (x · v)v.

Example 2.2.7. Write the vector (1; 2; 3) as a linear combination of the set of orthonormal basis vectors
\[ \left\{\frac{1}{\sqrt6}\begin{pmatrix}1\\-2\\1\end{pmatrix}, \frac{1}{\sqrt3}\begin{pmatrix}1\\1\\1\end{pmatrix}, \frac{1}{\sqrt2}\begin{pmatrix}1\\0\\-1\end{pmatrix}\right\} \]

2.2.2 Orthogonal projection of one vector onto the line spanned by another vector

Definition 2.2.8. If v and w are non-zero vectors in Rⁿ, we define the orthogonal projection of v onto the line spanned by w to be the vector
\[ \mathrm{proj}_w v = \frac{w\cdot v}{w\cdot w}\,w. \tag{2.27} \]

Note: w·v and w·w are scalars; proj_w v is a vector in the direction of w, so the direction of proj_w v is the unit vector
\[ \frac{\mathrm{proj}_w v}{\|\mathrm{proj}_w v\|} = \frac{w}{\|w\|}. \]
(Recall: Two vectors u₁ and u₂ have the same direction, or are parallel, if a unit vector in the direction of one is the same as a unit vector in the direction of the other.)

Formula (2.27) can be derived from Figure 2.8: letting θ denote the angle between v and w,
\[ \cos\theta = \frac{A}{H} = \frac{\|\mathrm{proj}_w v\|}{\|v\|}, \]
so
\[ \|\mathrm{proj}_w v\| = \|v\|\cos\theta = \|v\|\,\frac{w\cdot v}{\|w\|\|v\|} \;\text{ (by (4.78))} = \frac{w\cdot v}{\|w\|} \]
\[ \Rightarrow\; \mathrm{proj}_w v = \|\mathrm{proj}_w v\|\cdot\frac{w}{\|w\|} = \underbrace{\frac{w\cdot v}{\|w\|}}_{\text{length}}\cdot\underbrace{\frac{w}{\|w\|}}_{\text{direction}} \]

An important property of the orthogonal projection, shown in Figure 2.8, is that
v − proj_w v is orthogonal to w. (2.28)

[Figure 2.8: proj_w v and v − proj_w v.]

Example 2.2.9. Prove that (v − proj_w v) · w = 0 using the projection formula (2.27).
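A quick numeric check of (2.27) and (2.28), with vectors chosen only for illustration:

%Matlab-session
% - projection of v onto the line spanned by w, and orthogonality (2.28)
v=[1;2;3]; w=[1;1;1];
p=(dot(w,v)/dot(w,w))*w   % proj_w v = 2*[1;1;1]
dot(v-p,w)                % 0: v - proj_w v is orthogonal to w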
2.2.3 The Gram-Schmidt process

When projecting one vector onto another, we also create an orthogonal pair of vectors. The Gram-Schmidt process is an algorithm that extends that idea, creating an orthogonal set of vectors. It takes an arbitrary basis of a vector space V,
B = {u₁, u₂, . . . , uₙ} (any basis),
and creates from it an orthogonal basis of V,
B′ = {v₁, v₂, . . . , vₙ} (orthogonal basis).
This is done using the perpendicularity property (2.28); see Figure 2.8.

Here is an algorithm that creates orthogonal vectors v₁, v₂, . . . , one by one:
v₁ = u₁
v₂ = u₂ − proj_{v₁} u₂
v₃ = u₃ − proj_{v₁} u₃ − proj_{v₂} u₃
⋯
Computationally,
\[
\begin{aligned}
v_1 &= u_1\\
v_2 &= u_2 - \frac{u_2\cdot v_1}{v_1\cdot v_1}v_1\\
v_3 &= u_3 - \frac{u_3\cdot v_1}{v_1\cdot v_1}v_1 - \frac{u_3\cdot v_2}{v_2\cdot v_2}v_2\\
&\cdots
\end{aligned}
\]
Notes:
1. Every coefficient $\dfrac{u_i\cdot v_j}{v_j\cdot v_j} = \dfrac{u_i\cdot v_j}{\|v_j\|^2}$ is a scalar, not a vector! At each step, we subtract multiples of the vⱼ we've already found from the u_k we're working on.
2. You should check that each new v_k is orthogonal to all the vⱼ already found.

Example 2.2.10. Use the Gram-Schmidt process on the basis u₁ = [1, 0, 0], u₂ = [1, 2, 0], u₃ = [1, 2, 3] of R³ to produce an orthogonal basis of R³.

Example 2.2.11. Use the Gram-Schmidt process on the basis u₁ = [1, 1, 1], u₂ = [−1, 1, 0], u₃ = [1, 2, 1] of R³ to produce an orthogonal basis for R³.

Example 2.2.12. Find an orthogonal basis of the column space of the matrix
\[ A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}. \]

Example 2.2.13. Matlab only has a numerical version of Gram-Schmidt: orth. This effectively finds a basis for the span of the vectors in question (their column-space) and then applies Gram-Schmidt to this basis.

%Matlab-session
%
a=[1 2 3; 4 5 6; 7 8 9]
rank(a)
b=orth(a)
b'*b
b*b'
gs(a)

2.2.4 Least squares solutions of systems of linear equations

So far, we have only focused on solving consistent systems of linear equations – where the system Ax = b has either a unique solution, or infinitely many solutions. We turn now to inconsistent systems of linear equations – systems for which there is no solution.

Why would we want to study a system that had no solution? Suppose that although we know that no solution exists, we still need to do the best we can. Can we define a notion of "approximate solution"? An important example is the fitting of a set of points to a line.

Fitting data points to a line

Consider the set S of four points:
S: (−1, −1), (0, 0), (2, 1) and (3, 1).

[Figure: the points (−1, −1), (0, 0), (2, 1) and (3, 1) in the xy-plane.]

The points are not collinear: there is no line (or linear equation) y = mx + c which is satisfied by all four points (xᵢ, yᵢ) of S. In other words, the system of equations
−m + c = −1
0m + c = 0
2m + c = 1
3m + c = 1 (2.29)
is inconsistent.

Example 2.2.14. Show, using Gaussian elimination, that (2.29) is inconsistent.

Fitting a set of points S to a line means proposing a linear model for the points: a linear equation
ȳ = mx + c (2.30)
which is "best satisfied" by all the points of S.
• If the system (2.29) were consistent, this model (2.30) would be satisfied exactly at each point (xᵢ, yᵢ) of S.
• Instead it is satisfied at each xᵢ by a fitted y-coordinate
ȳᵢ = mxᵢ + c. (2.31)
• The points (xᵢ, ȳᵢ) all lie on the fitted line.

[Figure 2.9: Fitting points to a line.]

We write our linear model y = mx + c applied to a set of points S = {(xᵢ, yᵢ)}⁴ᵢ₌₁ in matrix form:
\[ \begin{pmatrix}x_1&1\\x_2&1\\x_3&1\\x_4&1\end{pmatrix}\begin{pmatrix}m\\c\end{pmatrix} = \begin{pmatrix}y_1\\y_2\\y_3\\y_4\end{pmatrix}, \quad\text{specifically}\quad \begin{pmatrix}-1&1\\0&1\\2&1\\3&1\end{pmatrix}\begin{pmatrix}m\\c\end{pmatrix} = \begin{pmatrix}-1\\0\\1\\1\end{pmatrix}. \tag{2.32} \]
We abbreviate this linear model, naming x = (m; c), as Ax = b.
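In Matlab the matrix of (2.32) is assembled directly from the data; a minimal sketch:

%Matlab-session
% - the matrix form (2.32) of the line-fitting model
X=[-1; 0; 2; 3]; y=[-1; 0; 1; 1];
A=[X ones(size(X))]   % each row is [x_i 1]
% A*[m;c] = y is the (inconsistent) system (2.29)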
Definition 2.2.15. The error in the least-squares fit ȳ = mx + c of data points {(xᵢ, yᵢ)}ⁿᵢ₌₁ is given by
\[ \text{error} = \|y - \bar y\| = \sqrt{\sum_{i=1}^n (y_i - \bar y_i)^2} \tag{2.33} \]
where y = (y₁, y₂, ⋯, yₙ) is the vector of y-values in the data-set, and ȳ is the vector of corresponding fitted y-values.

The name of the method, "Least Squares", comes from trying to minimise this error term, which mathematically is equivalent to the problem of minimising the expression $\sum_{i=1}^n (y_i - \bar y_i)^2$.

The Method of Least Squares, the Normal Equation

We can solve a system of equations Ax = b for an m × n matrix A only if b is in the column space of A, Col(A) = {Ax : x ∈ Rⁿ}. The process of Least Squares fits non-linear data to a line with slope m and y-intercept c, so that the vector x̄ = (m; c) makes Ax as close as possible to b.

[Figure 2.10: x̄ minimises ‖b − Ax‖; the residual b − Ax̄ is orthogonal to Col(A).]

• For any vector x = (m; c), the distance from Ax to a given vector b is ‖b − Ax‖.
• As x varies, Ax ranges over Col(A).
• The minimum distance from the subspace Col(A) to b is the orthogonal distance from Col(A) to b. Suppose this minimal distance is given by ‖Ax̄ − b‖ – see Figure 2.10. Then
(i) x̄ is the least squares solution to the system Ax = b;
(ii) the orthogonal distance ‖b − Ax̄‖ gives the least-squares error;
(iii) the least squares solution x̄ minimises the least-squares error.
• If the vector b − Ax̄ is orthogonal to Col(A), then b − Ax̄ is orthogonal to each column of A: if A = [v₁ v₂ ⋯ vₙ], this orthogonality means
v₁ · (b − Ax̄) = v₁ᵀ(b − Ax̄) = 0
v₂ · (b − Ax̄) = v₂ᵀ(b − Ax̄) = 0
⋮
vₙ · (b − Ax̄) = vₙᵀ(b − Ax̄) = 0
In matrix notation,
\[ \begin{pmatrix}v_1^T\\v_2^T\\\vdots\\v_n^T\end{pmatrix}(b - A\bar x) = 0 \quad\text{or}\quad A^T(b - A\bar x) = 0. \]
• This means AᵀAx̄ = Aᵀb.

Definition 2.2.16. The normal equation for an inconsistent system of equations Ax = b is
AᵀAx = Aᵀb. (2.34)

In the case of fitting data to a line, the vector x represents the vector of unknowns x = (m; c) which solves the Least Squares problem Ax = b.

Notes: In our example above,
• A has size 4 × 2,
• Aᵀ has size 2 × 4,
• AᵀA is square, and has size 2 × 2.
The normal equation usually gives a smaller system of equations than the original inconsistent system.

Example 2.2.17.
(a) Find the least-squares line for the data-points (−1, −1), (0, 0), (2, 1) and (3, 1), and the least-squares error.
(b) Find the y-value on this least-squares line corresponding to each x-value:
(i) x = 1, (ii) x = 2, (iii) x = 3.

[Figure: the least-squares line y = .5x − .25 through the data points.]

Example 2.2.18. Find the line of best fit for the data (−2, 1), (−1, 0), (0, −1), (1, −1), and (2, −3), and the least-squares error.
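A sketch of the normal-equation computation for Example 2.2.17(a); the output can be checked against the figure:

%Matlab-session
% - least-squares line for the points of Example 2.2.17
A=[-1 1; 0 1; 2 1; 3 1]; b=[-1; 0; 1; 1];
x=(A'*A)\(A'*b)    % normal equation: m = 0.5, c = -0.25
err=norm(b-A*x)    % the least-squares error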
Power Laws

An important application of the method of Least Squares is to models involving power laws: relationships of the form y ∝ x^d. This relationship occurs in many natural and social phenomena:
• Metabolic processes in mice, rabbits, dogs, goats, men, women, and elephants are proportional to body weight to the power 3/4:
metabolic rate ∝ body weight^(3/4)
• Human brain weight and body weight are related by:
human brain weight ∝ body weight^(2/3)
• Zipf's law states that the frequency of a word's usage in English text is inversely proportional to the ranking of the word's frequency – which is to say that a word ranked nth in a frequency table occurs 1/n times as often as the word ranked 1st.

The power d in a hypothesised relationship y ∝ x^d can be found by the method of least squares, after taking logarithms of the expression:
y = kx^d ⇒ ln(y) = ln(k) + d ln(x),
which, in relating ln(x) to ln(y) (seen in the log-log plot of x and y below), is the equation of a straight line of slope d and ln(y)-intercept ln(k).

To find the values of k and d in a given relationship,
(a) the data (xᵢ, yᵢ) is transformed to (ln(xᵢ), ln(yᵢ)),
(b) the method of least squares is used to determine the values of ln(k) and d in the log-log relationship ln(y) = ln(k) + d ln(x),
(c) the values of k and d are read off.

[Figure: a power-law curve plotted against x, and the corresponding log-log plot – the straight line ln(y) = ln(0.5) + 2.5 ln(x).]

Solving Inconsistent Linear Systems

We now have a method of solving any inconsistent system of linear equations:

Solution Technique 2.2.19. To solve an inconsistent system of equations Ax = b using the method of least-squares:
(i) form the normal equation AᵀAx = Aᵀb (by multiplying both sides of the original equation on the left by Aᵀ);
(ii) solve the normal equation for x.
The least-squares error is given by ‖Ax − b‖.

Example 2.2.20.
(a) Solve the inconsistent system
\[ \begin{pmatrix}1&-1\\3&1\\1&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}2\\1\\-3\end{pmatrix} \]
using the method of least-squares.
(b) Find a least squares solution and error of the (obviously inconsistent) system
\[ \begin{pmatrix}1&0\\0&1\\1&1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}1\\1\\0\end{pmatrix} \]
(c) Find a least squares solution and error of the system of equations
x₁ + x₂ = 2
x₁ − x₂ = 0
2x₁ + x₂ = 2

Example 2.2.21. If a system of equations Ax = b is inconsistent, Matlab's x = A\b solves it by the method of least-squares.

%Matlab-session
% a\b solves inconsistent systems
% using least-squares
a=[1 -1; 3 1; 1 2]
b=[2 1 -3]'
x=a\b
x1=inv(a'*a)*a'*b
x1-x   %check

2.3 Eigenvalues

We begin this topic with a review of finding the determinant of an n × n matrix. For given matrices B, we will be interested in problems where
Bu = 0 (2.35)
for vectors u. We know already that (2.35) has a non-zero solution u if and only if det(B) = 0. (See page 195.)

2.3.1 Determinants

Determinants by Cofactors

Firstly we define a few special types of matrices.

Definition 2.3.1.
(i) The main diagonal of a square n × n matrix A consists of the terms a₁₁, a₂₂, . . . , aₙₙ.
(ii) A square matrix is
diagonal if all non-zero entries are on the main diagonal: $\begin{pmatrix}*&0&0\\0&*&0\\0&0&*\end{pmatrix}$;
upper-triangular if all non-zero entries are on or above the main diagonal: $\begin{pmatrix}*&*&*\\0&*&*\\0&0&*\end{pmatrix}$;
lower-triangular if all non-zero entries are on or below the main diagonal: $\begin{pmatrix}*&0&0\\*&*&0\\*&*&*\end{pmatrix}$;
triangular if it is either upper-triangular or lower-triangular.

Definition 2.3.2. If A is a square n × n matrix, its (i, j)-minor, A_ij, is the (n − 1) × (n − 1) submatrix resulting when the ith row and jth column of A are deleted.

Example 2.3.3.
(a) For A = $\begin{pmatrix}1&0\\-2&-1\end{pmatrix}$, A₁₁ = −1 and A₁₂ = −2.
(b) Find A₁₂ and A₂₃ for A = $\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$.

Definition 2.3.4. The (i, j)-cofactor of a square matrix A is defined to be
C_ij = (−1)^(i+j) det(A_ij), (2.36)
where A_ij is the (i, j)-minor of A (Definition 2.3.2).

Example 2.3.5.
(a) Find all cofactors of A = $\begin{pmatrix}1&0\\-2&-1\end{pmatrix}$.
(b) Find C₁₂ and C₂₃ for A = $\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$.

The determinant of a matrix can be calculated by cofactor expansion along any row or column. The idea is to expand along a row or column with the most zeros.

To calculate the determinant of a square n × n matrix A:
• Cofactor expansion across the ith row is given by the formula
det A = a_{i1}C_{i1} + a_{i2}C_{i2} + a_{i3}C_{i3} + ⋯ + a_{in}C_{in}.
• Cofactor expansion down the jth column is given by the formula
det A = a_{1j}C_{1j} + a_{2j}C_{2j} + a_{3j}C_{3j} + ⋯ + a_{nj}C_{nj}.
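A numeric check of the row-1 expansion against Matlab's built-in det, using the matrix of Example 2.3.5(b):

%Matlab-session
% - cofactor expansion across row 1 vs built-in det
A=[1 2 3; 4 5 6; 7 8 9];
C11=det(A(2:3,2:3)); C12=-det(A(2:3,[1 3])); C13=det(A(2:3,[1 2]));
A(1,1)*C11 + A(1,2)*C12 + A(1,3)*C13   % 0
det(A)                                 % agrees: this A is singular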
Example 2.3.6. Find the determinants of the following diagonal and triangular matrices using the method of cofactors.
\[ \text{(a)}\; \begin{pmatrix}1&2\\0&3\end{pmatrix} \quad \text{(b)}\; \begin{pmatrix}4&5\\0&6\end{pmatrix} \quad \text{(c)}\; \begin{pmatrix}1&2&3\\0&4&5\\0&0&0\end{pmatrix} \quad \text{(d)}\; \begin{pmatrix}0&0&0\\1&0&0\\0&1&0\end{pmatrix} \quad \text{(e)}\; \begin{pmatrix}0&3\\1&0\end{pmatrix} \quad \text{(f)}\; \begin{pmatrix}1&0&0\\0&2&0\\0&0&3\end{pmatrix} \]

Clearly: The determinant of any triangular matrix is just the product of the entries on its main diagonal.

Example 2.3.7. Find the determinant of the given matrix using cofactor expansion:
\[ A = \begin{pmatrix}1&2&0\\1&-1&1\\2&0&1\end{pmatrix}. \]

A second method for finding the determinant of a matrix uses row-reduction. For large matrices, this can be faster than the method of cofactors.

Determinants by Row-Operations

A square matrix in echelon form is upper-triangular. If we reduce a matrix to echelon form, its determinant is apparent. So we investigate how the determinant of a matrix changes as we apply row-operations to it.

Example 2.3.8. Compare the determinant of the matrix A = $\begin{pmatrix}a&b\\c&d\end{pmatrix}$ before and after the following row-operations:
1. r₁ ↔ r₂
2. r₁ → −r₁
3. r₂ → −r₂
4. r₂ → r₂ − kr₁
5. r₁ → r₁ + kr₂

Regardless of the size of a square matrix, row-operations have the following effects on its determinant:
• row-swap (A ∼ U by rⱼ ↔ rᵢ): det(U) = −det(A)
• row-multiple (A ∼ U by rⱼ → krⱼ): det(U) = k det(A)
• row-addition (A ∼ U by rⱼ → rⱼ + krᵢ): det(U) = det(A)

Example 2.3.9. Using row reduction, find the determinant of
\[ A = \begin{pmatrix}1&2&0\\1&-1&1\\2&0&1\end{pmatrix}. \]

We simply mention here that a third method for finding determinants uses the fact that for square n × n matrices A and B,
det(AB) = det(A) det(B). (2.37)

2.3.2 Eigenvalues: The characteristic equation of a matrix

Definition 2.3.10. Given a square n × n matrix A, we can sometimes find a non-zero vector v ∈ Rⁿ and a corresponding scalar λ, such that
Av = λv. (2.38)
We call any non-zero vector v which satisfies (2.38) an eigenvector of A, and the corresponding scalar λ an eigenvalue.

The matrix equation (2.38) can be rewritten
(A − λI)v = 0. (2.39)
In this form, it represents a homogeneous system of linear equations. This system always has the trivial (zero) solution v = 0. The eigenvectors are the non-trivial solutions of (2.39), i.e. the non-zero members of the nullspace of A − λI. If any exist (refer page 195),
Null(A − λI) ≠ {0}, which means det(A − λI) = 0.

Definition 2.3.11. For an n × n matrix A,
det(A − λI) = 0 (2.40)
is called the characteristic equation of A. It is a polynomial equation of degree n and has at most n solutions.

Consequently:
(i) The eigenvalues of a matrix A are all the solutions λ of the characteristic equation of A, det(A − λI) = 0.
(ii) The eigenvectors of a matrix A corresponding to eigenvalue λ are the non-trivial solutions v of the equation (A − λI)v = 0.

Notes:
• The pivots of a matrix are not necessarily the eigenvalues.
• We restrict ourselves in this course to matrices whose characteristic polynomials have only real solutions.

Example 2.3.12. Find the eigenvalues and corresponding eigenvectors of A = $\begin{pmatrix}2&1\\1&2\end{pmatrix}$.
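The hand computation can be compared with Matlab's eig (introduced properly in Example 2.3.19 below):

%Matlab-session
% - eigenvalues and eigenvectors of the matrix of Example 2.3.12
A=[2 1; 1 2];
[v,d]=eig(A)   % columns of v are eigenvectors, diag(d) the eigenvalues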
Definition 2.3.13.
(i) If λ is an eigenvalue of a square matrix A, the nullspace Null(A − λI) is called the eigenspace of A corresponding to λ. It consists of the zero vector and all eigenvectors of A corresponding to λ.
(ii) The dimension of the eigenspace of A corresponding to any eigenvalue λ is dim(Null(A − λI)), the nullity of the matrix A − λI.

FACT: A basis for Null(A − λI) gives a set of eigenvectors for A corresponding to λ.

Notes:
• An eigenvector v of A corresponding to eigenvalue λ is not unique. If λ is an eigenvalue of A, then Null(A − λI) ≠ {0}; Null(A − λI) is a subspace of Rⁿ of dimension greater than zero, and has infinitely many members.
• Any basis for this subspace Null(A − λI) gives a set of eigenvectors corresponding to eigenvalue λ.
• If A has eigenvalue 0, the corresponding eigenspace is the nullspace of A, Null(A).
• Eigenvectors of A corresponding to different eigenvalues are linearly independent.
• If A has n distinct eigenvalues, the corresponding eigenvectors form a basis for Rⁿ.

We can relate eigenvalues and eigenvectors to our theory of linear algebra with the following facts (compare with the tables on page 195). For any square n × n matrix A:
• If A has eigenvalue 0, then:
– det(A) = 0 (as det(A) = det(A − 0I)),
– the eigenspace of A corresponding to λ = 0 is Null(A),
– the nullity of A is the dimension of this eigenspace,
– the eigenvectors of A corresponding to λ = 0 are the non-zero vectors of Null(A),
– A is singular,
– rank(A) < n.
• If A has all eigenvalues non-zero:
– A is non-singular,
– rank(A) = n.

Example 2.3.14. The matrix A = $\begin{pmatrix}3&1\\6&2\end{pmatrix}$ has eigenvalues 5 and 0. Find a basis for Null(A), the eigenspace of A corresponding to λ = 0.

Example 2.3.15. Find the characteristic equation of the diagonal matrix
\[ A = \begin{pmatrix}\lambda_1&0\\0&\lambda_2\end{pmatrix}. \]
Show that the eigenvalues of A are λ₁ and λ₂. What are the corresponding eigenvectors?

FACT: The eigenvalues of a triangular matrix are its diagonal entries.
Note: This is not true for matrices in general!

Example 2.3.16. Find the eigenvectors of
\[ A = \begin{pmatrix}2&0&0\\0&-1&0\\0&0&-1\end{pmatrix}. \]
Solution: The eigenvalues are 2, −1 and −1.

λ = 2:
\[ (A - 2I)v = \begin{pmatrix}0&0&0\\0&-3&0\\0&0&-3\end{pmatrix}\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}. \]
So v₂ = 0, v₃ = 0, and v₁ is a free variable (the first-row equation says 0 = 0 – no information).
\[ v = \begin{pmatrix}v_1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\0\\0\end{pmatrix}v_1, \quad\text{for any } v_1 \neq 0. \]
Any non-zero multiple of v is an eigenvector of A corresponding to λ = 2.

λ = −1:
\[ (A + 1I)v = \begin{pmatrix}3&0&0\\0&0&0\\0&0&0\end{pmatrix}\begin{pmatrix}v_1\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}. \]
So v₁ = 0, and v₂ and v₃ are free variables (rows 2 and 3 say 0 = 0 – no information).
\[ v = \begin{pmatrix}0\\v_2\\v_3\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}v_2 + \begin{pmatrix}0\\0\\1\end{pmatrix}v_3, \]
for any v₂ and v₃ not both zero. Null(A + 1I) has dimension 2, and has basis the pair {(0; 1; 0), (0; 0; 1)}. Any non-zero vector in Null(A + 1I) is an eigenvector of A corresponding to λ = −1.

FACT: The eigenvectors of a diagonal n × n matrix are the vectors of the standard basis of Rⁿ: {eᵢ}ⁿᵢ₌₁.

Example 2.3.17. The characteristic equation of the matrix
\[ A = \begin{pmatrix}1&2&1\\3&6&3\\2&4&2\end{pmatrix} \]
is 9λ² − λ³ = 0. Find its eigenvalues and corresponding eigenvectors.

Example 2.3.18. The Matlab function eigshow is designed to illustrate eigenvectors and eigenvalues of 2 × 2 matrices. Use it to help familiarise yourself with the physical meaning of eigenvectors. Some of the default matrices given have eigenvalues which are complex (not real).
A=[1 0; 0 -2]
eigshow(A)

Example 2.3.19. The Matlab command eig(a) returns the eigenvalues of a in a column vector. The command [v, d] = eig(a) (two outputs) returns a pair of matrices:
1. v is a matrix whose columns are the eigenvectors,
2. d is a diagonal matrix whose diagonal entries are the corresponding eigenvalues.
For a matrix a, the poly(a) command finds the coefficients of the characteristic polynomial of a (using its eigenvalues!).

%Matlab-session
% eigenvalues
b=[1 2 1; 3 6 3; 2 4 2]
[v,d]=eig(b)                           % numerical
a=sym(b)
[v,d]=eig(a)                           % symbolic
char_poly_s=poly(a,'lambda')           % char eqn
char_poly_roots_s=solve(char_poly_s)   % its roots
char_poly_n=poly(b)                    % char eqn
char_poly_roots_n=roots(char_poly_n)   % its roots
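A direct check that the columns returned by eig satisfy (2.38), using the matrix of Example 2.3.17:

%Matlab-session
% - verify Av = vD columnwise
b=[1 2 1; 3 6 3; 2 4 2];
[v,d]=eig(sym(b));
b*v - v*d   % the zero matrix: each column satisfies Av = lambda*v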
2.3.3 Diagonalisation of a Matrix

An n × n matrix with n distinct eigenvalues has n corresponding linearly independent eigenvectors. These eigenvectors form a basis for Rⁿ. If
• the eigenvalues are λ₁, λ₂, . . . , λₙ,
• the corresponding eigenvectors are v₁, v₂, ⋯, vₙ, and
• we form the matrices
\[ V = [v_1\; v_2\; \cdots\; v_n], \qquad D = \begin{pmatrix}\lambda_1&0&\dots&0\\0&\lambda_2&\dots&0\\\vdots&&\ddots&\vdots\\0&0&\dots&\lambda_n\end{pmatrix}, \]
then V is a non-singular matrix – with n linearly independent columns, it has rank n, and
AV = VD or A = VDV⁻¹. (2.41)
This relationship can also be written V⁻¹AV = D.

Definition 2.3.20. A square matrix with linearly independent eigenvectors can be diagonalised as
V⁻¹AV = D, (2.42)
where V is the matrix with columns the eigenvectors of A, and D the diagonal matrix with diagonal entries the corresponding eigenvalues.

Note: The eigenvectors of a matrix are not unique, so the matrix V above is not unique. But the diagonal matrix D of eigenvalues is unique.

Example 2.3.21. The matrix
\[ A = \begin{pmatrix}1&2&0\\0&3&0\\2&-4&2\end{pmatrix} \]
has eigenvalues 1, 2 and 3, and corresponding eigenvectors
\[ \begin{pmatrix}-1\\0\\2\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix} \text{ and } \begin{pmatrix}-1\\-1\\2\end{pmatrix}. \]
Diagonalise the matrix A.

Example 2.3.22. Matlab's eig function can be used to verify a diagonalisation:

%Matlab-session - diagonalisation
a=sym([3 -3 2; -4 2 1; 3 3 -1]);
[v,d]=eig(a)
a*v-v*d
inv(v)*a*v
v*d*inv(v)

Not every n × n matrix can be diagonalised, or diagonalised simply:

Example 2.3.23.
(a) The matrix A = $\begin{pmatrix}0&1\\0&0\end{pmatrix}$ has repeated eigenvalue λ = 0, so all its eigenvectors are in Null(A). But rank(A) = 1, so dim(Null(A)) = 1. The only eigenvector of A is in the direction of $\begin{pmatrix}1\\0\end{pmatrix}$. A cannot be diagonalised.
(b) B = $\begin{pmatrix}1&0&0\\1&1&0\\2&0&1\end{pmatrix}$ has three eigenvalues but only two linearly independent eigenvectors. B cannot be diagonalised.
(c) The rotation matrix Q = $\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix}$ has complex (not real) eigenvalues and eigenvectors – although Q has real entries, its diagonalisation is complex.

Powers of a Matrix

If a matrix A can be diagonalised as A = VDV⁻¹ (so A has n linearly independent eigenvectors), higher powers of A can be evaluated very simply:
Aᵏ = (VDV⁻¹)(VDV⁻¹) ⋯ (VDV⁻¹) = VDᵏV⁻¹.
Most conveniently,
\[ D^k = \begin{pmatrix}\lambda_1^k&0&\dots&0\\0&\lambda_2^k&\dots&0\\\vdots&&\ddots&\vdots\\0&0&\dots&\lambda_n^k\end{pmatrix}. \]
This is one application of diagonalisation. Another occurs in solving linear systems Ax = b. We will illustrate this in the next section.
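A sketch of this shortcut on the matrix of Example 2.3.21:

%Matlab-session
% - powers via diagonalisation: A^k = V*D^k*inv(V)
A=sym([1 2 0; 0 3 0; 2 -4 2]);
[V,D]=eig(A);
V*D^5*inv(V) - A^5   % the zero matrix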
2.3.4 Symmetric Matrices

A matrix A is symmetric if A = Aᵀ. This automatically means that A is square – symmetric matrices always have eigenvalues and eigenvectors. Symmetric matrices occur in many applications. Luckily, they have some very nice features:
• All eigenvalues of a symmetric matrix are real.
• Eigenvectors of a symmetric matrix A corresponding to distinct eigenvalues are orthogonal:
Avᵢ = λᵢvᵢ, Avⱼ = λⱼvⱼ, i ≠ j ⇒ vᵢ · vⱼ = 0.
• A symmetric n × n matrix always has n linearly independent eigenvectors, even though its eigenvalues may not be distinct. So we can always use the eigenvectors of a symmetric matrix as a basis for the underlying vector space.
• In the case of repeated eigenvalues, the Gram-Schmidt procedure and normalisation may be applied to any basis of eigenvectors to obtain an orthonormal basis of eigenvectors.

Orthogonal Diagonalisation

A symmetric matrix A can always be diagonalised. To diagonalise a symmetric n × n matrix A,
• form a matrix V = [v₁ v₂ ⋯ vₙ] of orthonormal eigenvectors of A (not unique);
• V is an orthogonal matrix: V⁻¹ = Vᵀ;
• the expression
A = VDVᵀ or VᵀAV = D (2.43)
is called an orthogonal diagonalisation of A.
• Every symmetric matrix has an orthogonal diagonalisation.

Example 2.3.24. The matrix
\[ A = \begin{pmatrix}1&-2&2\\-2&1&-2\\2&-2&1\end{pmatrix} \]
is symmetric, with eigenvalues −1, −1 and 5, and corresponding eigenvectors
\[ \begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}, \begin{pmatrix}1\\-1\\1\end{pmatrix}. \]
Find an orthogonal diagonalisation of A.

Quadratic forms

The expression c₁x₁ + c₂x₂ + ⋯ + cₙxₙ is a linear form on x ∈ Rⁿ. In matrix notation, this can be written Cx for C = [c₁ c₂ . . . cₙ].

Definition 2.3.25. An expression of the form
\[ c_{11}x_1^2 + c_{22}x_2^2 + \cdots + c_{nn}x_n^2 + \sum_{i\neq j} c_{ij}x_ix_j \]
is called a quadratic form on x ∈ Rⁿ.

Notes:
• There are no linear terms in a quadratic form, only terms of the form cᵢⱼxᵢxⱼ.
• We call terms cᵢⱼxᵢxⱼ, where i ≠ j, cross-product terms, or mixed terms.
• We generally combine mixed terms and write cᵢⱼxᵢxⱼ + cⱼᵢxⱼxᵢ = c′ᵢⱼxᵢxⱼ.
• In matrix notation, a quadratic form can be written
\[ \sum_{i,j=1}^n c_{ij}x_ix_j = x^TQx \tag{2.44} \]
for the n × n matrix Q = (cᵢⱼ).
• In this expression, the matrix Q is symmetric, as we can always arrange to have cᵢⱼ = cⱼᵢ.

Example 2.3.26. Verify that x₁² + 2x₁x₂ − 4x₂x₃ + 5x₃² can be written
\[ \begin{pmatrix}x_1&x_2&x_3\end{pmatrix}\begin{pmatrix}1&1&0\\1&0&-2\\0&-2&5\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}. \]

Example 2.3.27. Rewrite the following quadratic forms as xᵀQx for symmetric matrices Q:
(a) 2x₁² + 3x₂²
(b) x₁² + x₁x₂ + x₂²
(c) x₁² + x₂² − x₃² − 2x₁x₂ + 6x₂x₃
(d) −x₁² + 3x₂² − x₁x₂ + 4x₂x₃
(e) 2x₁² + 3x₂² − 6x₁x₃ + 10x₂x₃ + 9x₃²

The orthogonal diagonalisation of a symmetric matrix Q = VDVᵀ, with eigenvalues λ₁, λ₂, . . . , λₙ, allows us to simplify quadratic forms xᵀQx:
\[
\begin{aligned}
x^TQx &= x^T(VDV^T)x = (x^TV)D(V^Tx) = (V^Tx)^TD(V^Tx)\\
&= y^TDy \quad\text{where } y = V^Tx\\
&= \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2. \tag{2.45}
\end{aligned}
\]
The change of variables y = Vᵀx simplifies the quadratic form to
\[ x^TQx = y^TDy = \sum_{i=1}^n \lambda_iy_i^2 \tag{2.46} \]
with
• no cross-product terms yᵢyⱼ (i ≠ j),
• the coefficient of each term yᵢ² the corresponding eigenvalue λᵢ of Q.

Notes:
• Trivially, xᵀQx = 0 if x = 0.
• If all eigenvalues λᵢ are non-negative, xᵀQx ≥ 0 for all x.
• If all eigenvalues λᵢ are non-positive, xᵀQx ≤ 0 for all x.

Definition 2.3.28. A quadratic form xᵀQx is said to be
(i) positive-definite if xᵀQx > 0 for all x ≠ 0,
(ii) negative-definite if xᵀQx < 0 for all x ≠ 0,
(iii) indefinite if xᵀQx takes positive and negative values, depending on x.

But the quadratic form xᵀQx, via (2.46), is intimately connected to the eigenvalues and eigenvectors of Q, so we can naturally rephrase these definitions:

Theorem 2.3.29. A symmetric matrix Q is
(i) positive-definite if all its eigenvalues are positive,
(ii) negative-definite if all its eigenvalues are negative,
(iii) indefinite if it has positive and negative eigenvalues.
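A symbolic check of Example 2.3.26:

%Matlab-session
% - expanding x'Qx symbolically (Example 2.3.26)
Q=[1 1 0; 1 0 -2; 0 -2 5];
syms x1 x2 x3
x=[x1; x2; x3];
expand(x.'*Q*x)   % x1^2 + 2*x1*x2 - 4*x2*x3 + 5*x3^2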
The task of finding all eigenvalues of an n × n symmetric matrix to determine whether the associated quadratic form is positive-definite becomes difficult as n increases. We now present an alternative method for determining positive-definiteness which does not require solving the characteristic equation.

Definition 2.3.30.
(i) The kth principal submatrix of an n × n matrix A is the k × k matrix obtained by deleting all but its first k rows and k columns.
(ii) The kth principal minor of A, M_k, is the determinant of its kth principal submatrix.

[Figure: the first, second, third and fourth principal submatrices of a 4 × 4 matrix – its leading 1 × 1, 2 × 2, 3 × 3 and 4 × 4 blocks.]

A symmetric n × n matrix Q is
• positive-definite iff (if and only if) all principal minors M₁, M₂, . . . , Mₙ of Q are positive;
• positive semi-definite iff all principal minors of Q are non-negative;
• negative-definite iff the principal minors of Q alternate in sign: M₁ < 0, M₂ > 0, M₃ < 0, . . . ;
• negative semi-definite iff the principal minors of Q alternate in sign: M₁ ≤ 0, M₂ ≥ 0, M₃ ≤ 0, . . . ;
• indefinite iff it is neither positive semi-definite nor negative semi-definite.

Example 2.3.31. Classify the given matrices as positive-definite, positive semi-definite, negative-definite, negative semi-definite or indefinite.
\[ \text{(a)}\; \begin{pmatrix}1&2\\2&1\end{pmatrix} \quad \text{(b)}\; \begin{pmatrix}3&-1\\-1&3\end{pmatrix} \quad \text{(c)}\; \begin{pmatrix}-3&1\\1&-3\end{pmatrix} \quad \text{(d)}\; \begin{pmatrix}0&1\\1&0\end{pmatrix} \quad \text{(e)}\; \begin{pmatrix}1&-1\\-1&-1\end{pmatrix} \]
\[ \text{(f)}\; \begin{pmatrix}2&2&0\\2&2&0\\0&0&2\end{pmatrix} \quad \text{(g)}\; \begin{pmatrix}1&2&1\\2&1&1\\1&1&2\end{pmatrix} \quad \text{(h)}\; \begin{pmatrix}2&2&0\\2&2&0\\0&0&2\end{pmatrix} \quad \text{(i)}\; \begin{pmatrix}2&-1&0\\-1&2&-1\\0&-1&2\end{pmatrix} \]

Example 2.3.32. Matlab can handle tasks like evaluating a succession of principal minors very easily. Call the local function ev4 with the command "ev4(a)" for matrix a.

function ev4(a)
%matlab function to compute principal minors
[m,n]=size(a);
if (m~=n)
    disp(' matrix must be square');
    return;
else
    for i=1:n
        disp(['principal submatrix ',int2str(i)])
        disp(a(1:i,1:i))
        disp('principal minor')
        disp(det(a(1:i,1:i)));
    end;
end;
return;
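A direct check for matrix (i) of Example 2.3.31:

%Matlab-session
% - principal minors of matrix (i) above
Q=[2 -1 0; -1 2 -1; 0 -1 2];
[det(Q(1,1)) det(Q(1:2,1:2)) det(Q)]   % 2, 3, 4: all positive, so Q is positive-definite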
• To interpret the meaning of a stochastic matrix a clear indication must have already been given of the order of the states. Example 2.3.34. We return to the group of dentists. We keep the order used for the state vector above, that is smoker: non-smoker. The first column in this stochastic matrix indicates the probabilities for a dentist who smokes. There is a probability of 0.8 they will remain a smoker and a probability of 0.2 they will change to the other state and become a non-smoker. Similarly, the second column in this stochastic matrix indicates the probabilities for a dentist who does not smoke. There is a probability of 0.1 they will change into a smoker and a probability of 0.9 they will remain a non-smoker. Multiplication by the stochastic matrix is used to find the next state vector. Thus if 25 v0 = , then we can find v1 by the calculation: 65 v1 = Sv0 = 0.8 0.1 0.2 0.9 25 26.5 = . 65 63.5 And 0.8 0.1 v2 = Sv1 = 0.2 0.9 26.5 27.55 = . 63.5 62.45 Notes: • the entries in the state vectors are written as decimals and not rounded to whole numbers. CONTENTS 118 • the sum of the entries in the state vectors remains the same. If the alternative notation of row-vectors is used, then the transpose of S is used. The corresponding calculations are as follows. Theinitial state vector is now written v0 = 25 65 . We can find v1 by the calculation: 0.8 0.2 T v1 = v0 S = 25 65 = 26.5 63.5 . 0.1 0.9 Similarly, 0.8 0.2 v2 = v1 S = 26.5 63.5 = 27.55 62.45 . 0.1 0.9 T Henceforth, we use only column vectors. The state vector after n time periods The key result is that we can write the nth state vector vn as vn = Sn v0 . This is because v1 = Sv0 v2 = Sv1 v3 = Sv2 .. . = S 2 v0 = S 2 v1 = S 3 v0 vn = Svn−1 = S 2 vn−2 = · · · S n v0 If we wish to find the expected state vector after say 5 or 10 time periods, it may be convenient to diagonalise the matrix S, as this makes it easier to find a high power of S. Example 2.3.35. Continuing theexample about the dentists: 0.8 0.1 25 S= and v0 = . Find v10 . 0.2 0.9 65 Answer: First find eigenvalues and eigenvectors for S: S eigenvalue 1 and 0.7 with associated eigenvectors has −1 1 and respectively. Diagonalising: 2 1 " 1 1# 3 3 1 −1 1 0 S = 0 0.7 − 2 1 2 1 3 3 10 " 1 1# 3 3 1 −1 1 0 Hence S 10 = 2 1 0 0.710 − 2 1 3 3 0.3522 0.3239 = 0.6478 0.6761 0.3522 0.3239 25 10 Hence, v10 = S v0 = = 0.6478 0.6761 65 29.86 . 60.14 CONTENTS 119 The long-term state vector If a Markov process vk+1 = Avk is allowed to continue for a very large number of time periods, the sequence of state vectors v0 , v1 , v2 , v3 , · · · , vk , · · · may eventually become constant for all subsequent time periods: Avk = vk , for all k ≥ n. (2.47) Note. If Avn = vn , then vn is an eigenvector of A corresponding to λ = 1. For the given Markov process vk+1 = Avk , the vector vn of (2.47) is called both • a steady-state vector, as there is no change from state vn to any subsequent state: vn+1 = Avn = vn . The sequence of state vectors for the Markov process, given state vector v0 , is {v0 , v1 , v2 , v3 , · · · , vn−1 , vn , vn , vn , · · · } • a long-term state vector, written v∞ : v∞ = lim vk = lim Ak v0 = vn k→∞ k→∞ The long-term state-vector makes the final outcome of the Markov process clear. Facts about stochastic matrices: • Every stochastic matrix has eigenvalue λ = 1. • All other eigenvalues λ satisfy |λ| ≤ 1 – they may not all be real, but in this course they will be. 
• A corresponding stochastic eigenvector vλ=1 gives the long-term (and steady-state) state vector: Av = v It is customary to scale the eigenvector corresponding to λ = 1 so that its entries sum to the same value as the initial state vector. Example 2.3.36. Continuing theexample about the dentists. 0.8 0.1 25 S= and v0 = . Find v∞ . Answer: 0.2 0.9 65 CONTENTS 120 1 S has eigenvalue 1 with associated eigenvector . So 2 1 so that the components sum v∞ will be a multiple of 2 to 90, asthis is the number of dentists in the study. Hence 30 v∞ = . The long-term situation will be that 30 of 60 the dentists willsmoke and 60 will not smoke. Check: 0.8 0.1 30 30 = . 0.2 0.9 60 60 Notice that the initial state vector is not used in the computation of v∞ , except to determine the total number in the study. Example 2.3.37. A car-rental agency has cars in 3 cities, say Auckland, Wellington and Christchurch. Initially, the proportion of thestock in the three cities is xA 0 . given by a vector x0 = xW 0 xC 0 Suppose that, after a week, • of the cars that started out in Auckland, – – – 1 2 1 4 1 4 are in Auckland are in Wellington are in Christchurch • of the cars that started out in Wellington, – – – 1 4 1 2 1 4 are in Auckland are in Wellington are in Christchurch From A W C • of the cars that started out in Christchurch, – – – 1 8 1 8 3 4 are in Auckland are in Wellington are in Christchurch and that initially, one-third of cars are in each city. (a) Formulate the transition matrix and initial state vector for weekly car-movement. A To W C CONTENTS 121 (b) Is it possible to keep cars in stock in all three locations, preferably at a near-constant ratio? Example 2.3.38. We can see more of these computations on Matlab: call the local function ev5.m with three arguments: the first a matrix, the second a starting vector, and the third the number of steps to evaluate: >> a=[1/2 1/4 1/8; 1/4 1/2 1/8; 1/4 1/4 3/4] >> x0=[1/4 1/2 1/4]’ >> ev5(a,x0,10); function b=ev5(a,x0,n) %matlab function to demonstrate Markov chain b=a*x0 n=min(n, 20); for i=2:n b=a*b; disp(b); end; return; CONTENTS 122 2.3.6 Discrete Dynamical Systems Introduction Processes which evolve through time, like • the invasion of new territory by an introduced biological species, • the growth of a population within a region, • the spread of disease through a population, • the saturation of a market by a new product, can be modelled either by discrete steps, or as a continuous process. Definition 2.3.39. A difference equation which models the evolution of a process through time in discrete steps is known as a discrete dynamical system: it is a model or rule of the form xn+1 = Axn . (2.48) For this course, we focus on the case that A is a matrix of size n × n, and x0 , x1 , x2 , . . . are vectors (of size n × 1). Example 2.3.40. Many species of insect have a discrete breeding season, e.g. an annual breeding season, and the nth seasonal population, xn , can be modelled, at least initially, by an equation of the form xn+1 = axn , where a is a number reflecting the birth-rate of the population. If we go a little further and incorporate deaths into the model as well as births, we get something like xn+1 = bxn (M − xn ), a logistic model of population growth. Example 2.3.41. The simple one-year model above, xn+1 = axn , can be adapted to species which take longer to reach breeding age. 
We could break the population up into a vector of “young" and “adult", young x xn = nadult , xn as only adults reproduce, yet the young born this year may be adults next breeding season. This would give us a yearly growth-model of the form 0 a xn+1 = x , (2.49) b c n where a again reflects the birth-rate of the population. CONTENTS 123 All these models are simple, and we have to keep in mind they will have shortcomings. But they can still help us to analyse a problem. We state here an important saying: “All models are wrong, but some models are useful. – George Box” Example 2.3.42. Write an+1 = 2an + 3bn bn+1 = an − bn in matrix form. Example 2.3.43. If the yearly growth model (2.49) is given by 0 1 xn+1 = x , 1/2 1/2 n 2 and x0 = , what is x4 ? 10 What is happening to the distribution of young and adults in time? Eigenvector decomposition and the long-term For a given initial vector x0 , the rule (2.48), xn = Axn−1 , defines outcomes xn into the future. The matrix A is a transition matrix: it describes the transition of a system from one time-step to the next. Assuming the model holds far enough ahead, we are often interested in the long-term future, in lim xn . n→∞ When a matrix A has distinct eigenvalues, its corresponding eigenvectors form a basis for the underlying vector space. This means any vector in the vector space can be written as a linear combination of these eigenvectors (a basis spans the vector space). This makes it very easy to see how A acts on vectors. For example, suppose the 2×2 matrix A has eigenvalues λ1 and λ2 (λ1 6= λ2 ), with corresponding eigenvectors v1 and v2 . If x0 is any vector in R2 , we can write x0 = c1 v1 + c2 v2 Then Ax0 = c1 Av1 + c2 Av2 = c1 (λ1 v1 ) + c2 (λ2 v2 ). (2.50) (2.51) With xn = Axn−1 , x1 = Ax0 = c1 λ1 v1 + c2 λ2 v2 x2 = A2 x0 = c1 λ21 v1 + c2 λ22 v2 .. . xn = An x0 = c1 λn1 v1 + c2 λn2 v2 (2.52) CONTENTS 124 We now present some simple discrete dynamical systems, with a plot of the solution vector xn , n = 1, 2, · · · in the x − y plane through time. Since the solution xn is a vector-valued function of n, we need three dimensions to plot it. If we let n vary, then for fixed values of c1 and c2 , xn traces out a curve in the (x, y)–plane. A collection of such curves for a set of values of c1 and c2 is called a phase portrait. Some phase portraits (with arrows indicating the direction that we move along the curves as n increases) are given for the solutions below. 10 Example 2.3.44. 1.05 0 If xn+1 = x , 0 0.95 n write an expression for xn using (2.52). 4 (i) If x0 = , what is x10 ? 3 1 (ii) If x0 = , what is x10 ? 10 5 v2 y v1 0 x -5 (iii) For any x0 with positive entries, what does the model predict will be the long-term distribution? -10 -4 -2 0 2 4 CONTENTS 125 10 Example 2.3.45. 0.95 0 If xn+1 = x , 0 1.05 n write an expression for xn using (2.52). 4 (i) If x0 = , what is x10 ? 3 1 (ii) If x0 = , what is x10 ? 10 y v2 5 v1 0 x -5 (iii) For any x0 with positive entries, what does the model predict will be the long-term distribution? -10 -8 Example 2.3.46. 1.15 0 If xn+1 = x , 0 1.05 n write an expression for xn using (2.52). -6 -4 -2 0 2 4 6 8 8 y v2 6 4 2 4 (i) If x0 = , what is x10 ? 3 1 (ii) If x0 = , what is x10 ? 10 (iii) For any x0 with positive entries, what does the model predict will be the long-term distribution? v1 x 0 -2 -4 -6 -8 -8 -6 -4 -2 0 2 4 6 8 CONTENTS 126 Example 2.3.47. 0.95 0 If xn+1 = x , 0 0.90 n write an expression for xn using (2.52). y v2 4 2 4 (i) If x0 = , what is x10 ? 3 1 (ii) If x0 = , what is x10 ? 
10 (iii) For any x0 with positive entries, what does the model predict will be the long-term distribution? v1 -2 -4 -4 Example 2.3.48. 0.95 0 If xn+1 = x , 0 0 n write an expression for xn using (2.52). 4 (i) If x0 = , what is x10 ? 3 1 (ii) If x0 = , what is x10 ? 10 (iii) For any x0 with positive entries, what does the model predict will be the long-term distribution? x 0 -2 0 2 4 CONTENTS 127 Example 2.3.49. Stoats were introduced into this country to control rabbits: if their habitat is not threatened and they don’t find preferable food sources, they will prey on rabbits and keep their numbers down. Suppose we model stoat-rabbit interaction in a specific environment on a yearly scale, from year n to year n + 1, by the equations: sn+1 = 0.4sn + 0.9rn rn+1 = −psn + 1.2rn (2.53) where sn is the number of stoats in the region, rn the number of rabbits, and p is a positive predation parameter specific to this environment. The first equation says that with no rabbits for food, 40% of the stoat population will survive till the next year, and the 1.2 in the second equation says that with no stoats to control their numbers, rabbits will increase at a rate of 20% per year. The term −psn measures the rate of predation of stoats on rabbits. Let’s assume the predation parameter p has value 0.11. 0.4 0.9 To five decimal places of accuracy, the eigenvalues of the matrix A = are λ1 = .55302 and −0.11 1.2 λ2 = 1.04698. 0.81197 0.98585 respectively. and v2 = The corresponding eigenvectors are v1 = 0.58370 0.16761 (a) Use (2.52) to write xn in terms of the eigenvectors of the transition matrix. 1 s = −7.9121v1 + 10.8380v2 , what is the long-term ratio of stoats to rabbits? (b) If x0 = 0 = 5 r0 5 s = 2.9466v1 + 2.5803v2 , what is the long-term ratio of stoats to rabbits? (c) If x0 = 0 = 2 r0 s0 98585 (d) If x0 = = , what does the model predict in the long-term? r0 16761 10 y 5 v2 v1 x 0 -5 -10 -10 -5 0 5 10 Differential equations 3.1 First-Order Differential Equations 3.1.1 Introduction A differential equation is an equation between the derivatives and values of an unknown function. Example 3.1.1. In the differential equation dy =y+t dt y is an unknown function of the variable t. The differential equation tells us that at each value of t, the value of the derivative of y with respect to t is equal to the value of y + t. Example 3.1.2. The following are examples of DEs. (a) dy dt = 3t (b) dy dx = 5x + 1 (c) dy dx = 5y + 1 (d) y ′ + 2xy = sin(x) (e) ∂2f ∂x2 + ∂2f ∂y 2 =0 In this course, we consider only differential equations that involve derivatives of functions of one variable. Such differential equations (DEs for short) are called ordinary DEs. A partial DE involves derivatives of a function of more than one variable. Example 5.1.2 (e) is partial DE. A derivative measures the rate of change of a dependent variable with respect to some independent variable. In Example 5.1.1, t is the independent variable and y is the dependent variable (y is a function depending on t). The independent variable may represent • time • distance of some object from a fixed point 128 CONTENTS 129 • cost of an item • demand for a product Many quantities which depend on time or space obey physical laws which are naturally and simply formulated as DEs. 
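Before developing the terminology, it is worth knowing that Matlab's symbolic solver can already handle simple DEs such as Example 3.1.1. A minimal sketch, assuming the Symbolic Math Toolbox is available (the same string-based dsolve syntax appears in the Matlab sessions later in this chapter):

%Matlab-SESSION
% Solve the DE of Example 3.1.1 symbolically:  y' = y + t
DE = 'Dy = y + t';
de_solution = dsolve(DE)
% should return the one-parameter family  y = C1*exp(t) - t - 1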
3.1.2 Terminology

Features of a DE that are significant to us:

order The order of a DE is the highest derivative appearing in the equation, e.g.,
• $\frac{dy}{dt} = 3t$ is first order (has order 1)
• $\frac{d^3y}{dt^3} - t^2\frac{dy}{dt} + (t-1)y = e^t$ has order 3

linear or nonlinear Suppose the dependent variable in a DE is $y$, which is a function of $t$. The DE is linear if it can be written so that each term is of the form $y$, or one of its derivatives, multiplied by a (possibly constant) function of $t$; otherwise it is nonlinear, e.g.,
• $\frac{d^2y}{dt^2} + y = 0$ is linear
• $\frac{dy}{dt} = \frac{1}{y}$ is nonlinear.

solution A solution of a DE is any function that, when substituted for the unknown dependent variable in the DE, gives a true statement, i.e., a solution satisfies the DE.

Notes:
• a solution $y$ may be
 – explicit: given in the form $y = f(t)$, or
 – implicit: given in the form $F(t, y) = 0$
• a solution of a DE may have a restricted domain,
 e.g., $y = \frac{1}{t}$ satisfies $\frac{dy}{dt} = -y^2$ for $t \neq 0$.

Example 3.1.3.
(a) Verify that $y = 5t + 3$ is an explicit solution of $\frac{dy}{dt} = 5$.
(b) Verify that $xy^2 = 2$ is an implicit solution of $y^2 + 2xy\frac{dy}{dx} = 0$.

Example 3.1.4. You already know how to solve some DEs, e.g.,
$$\frac{dy}{dt} = t \;\Rightarrow\; y = \frac{1}{2}t^2 + c,$$
where $c$ is some arbitrary constant. Here, if an additional condition is imposed, such as $y(0) = 1$, then there is a unique solution, e.g., in this case $y = t^2/2 + 1$.

Example 3.1.5. Here are some more DEs that you can already solve:
(a) $x' = 0 \Rightarrow x = c$, where $c$ is an arbitrary constant
(b) $y' = 2$
(c) $\frac{dy}{dt} = 2t$
(d) $\frac{dy}{dt} = y$
(e) $\frac{d^2y}{dt^2} = 1 \Rightarrow \frac{dy}{dt} = t + c \Rightarrow y = \frac{t^2}{2} + ct + d$, where $c, d$ are arbitrary constants

Example 3.1.6. Verify that the given functions are solutions of the accompanying DEs:
(a) $y = cx^2$, $xy' = 2y$
(b) $y^2 = e^{2x} + c$, $yy' = e^{2x}$
(c) $y = ce^{kx}$, $y' = ky$
(d) $y = c_1\sin(2x) + c_2\cos(2x)$, $y'' + 4y = 0$
(e) $y = c_1 e^{2x} + c_2 e^{-2x}$, $y'' - 4y = 0$
(f) $y = x\tan(x)$, $xy' = y + x^2 + y^2$
(g) $x^2 = 2y^2\ln(y)$, $y' = \frac{xy}{x^2 + y^2}$
(h) $y^2 = x^2 - cx$, $2xyy' = x^2 + y^2$
(i) $y = c^2 + \frac{c}{x}$, $y + xy' = x^4(y')^2$
(j) $y = ce^{y/x}$, $y' = \frac{y^2}{xy - x^2}$
(k) $y + \sin(y) = x$, $(y\cos(y) - \sin(y) + x)\,y' = y$

Existence of solutions

Most differential equations we will meet are guaranteed to have solutions, by the following result.

Existence Theorem: Suppose that $f(t, y)$ is continuous on the rectangle $R := \{(t, y) : a < t < b,\; c < y < d\}$ and $(t_0, y_0)$ is in the rectangle. Then there is a function $y$ defined on an open interval containing $t_0$ which satisfies
$$\frac{dy}{dt} = f(t, y), \qquad y(t_0) = y_0.$$

This theorem says that so long as the right-hand side of the DE is nice enough, then solutions to the DE exist.

Initial Conditions, Initial-Value Problems

The DE
$$y\frac{dy}{dx} = 1$$
has solution $y^2 = 2(x + c)$. The number $c$ is an arbitrary constant of integration, and can take any scalar value.
• A 1st-order DE is solved by one integration.
• This introduces one constant $c$ into the solution. There is in fact a family of solutions, each member of the family corresponding to a different value of $c$.
• To pick a specific member of the family, we need to know just one point on the solution curve.

initial condition The initial condition (abbreviated IC) of a 1st-order DE is an observation of the unknown function at one specific point $x = x_0$: $y(x_0) = y_0$.

Example 3.1.7. An initial condition for the DE
$$y\frac{dy}{dx} = 1$$
is $y(0) = -1$, or $(x_0, y_0) = (0, -1)$.

Notes:
• $x_0$ is thought of as an initial point, and $y_0$ the corresponding initial $y$-value.
• The graph of the solution satisfying $y(x_0) = y_0$ is a curve passing through the point $(x_0, y_0)$.
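Verification tasks like those in Examples 3.1.3 and 3.1.6 can also be automated symbolically. A minimal sketch, assuming the Symbolic Math Toolbox:

%Matlab-SESSION
% Verify Example 3.1.3(a): y = 5t + 3 is a solution of dy/dt = 5
syms t
y = 5*t + 3;
residual = diff(y, t) - 5
% residual simplifies to 0, so y satisfies the DE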
Example 3.1.8. dy = 1 is y 2 = 2(x + c), with c arbidx trary. Find the solution which also satisfies y(0) = −1. The solution of y Definition 3.1.9. An Initial-Value Problem (abbreviated IVP) is the pair Differential Equation + Initial Conditions. In this course you will see aspects of the following three main approaches for finding and understanding the solutions of DEs • analytic – sometimes a formula for the solution can be found • numerical – for many DEs numerical approximations to the values of the solution can be found, e.g., we will see Euler’s method • qualitative – often we can describe important features of solutions, such as their long term behaviour, e.g., by considering the slope field. CONTENTS 133 3.1.3 First-Order Differential Equations First we consider methods for finding analytic solutions. Separable Equations A separable first-order DE is one that can be written in the form y ′ = f (x)g(y). For these the derivative of y can be separated into two factors, one involving only x and one involving only y. If the derivative involves only x, then we can integrate it directly. For many DEs used in modelling the derivative involves only y, and these are said to be autonomous. Solution Technique 3.1.10. To solve the separable DE dy = y ′ = f (x)g(y) : dx • If g(y) = 0, then y = constant, for all x • If g(y) 6= 0, then – write the DE in the form dy g(y) = f (x) dx R R dy = f (x) dx + c – integrate: g(y) – solve for y Example 3.1.11. Solve the following DEs, checking your answers. (a) The autonomous DE dy dx y = Cex =y y 10 5 C=2 y C=1 Check: y ′ = See the graphs of the solution y = Cex when C = −1, 1, 2. t 0 C = −1 −5 −10 −4 −2 0 x 2 4 CONTENTS 134 y 2 = 2(x + c) 5 =1 y (b) dy y dx c=1 0 c = −1 c=0 Check: y ′ = −5 −1 1 3 5 x y = Ce−x 2 2 1.5 (c) y ′ + 2xy = 0 Check: ⇒ y′ C=2 C=1 y′ = + 2xy = y 1 0.5 0 −0.5 −1 −2 0 C = −1 2 x Example 3.1.12. Solve the following IVPs. Some members of the family of solutions of each DE have been sketched. Locate the solution of each IVP. (a) y ′ = x2 , y(1) = 4 3 Check: y ′ = y= x3 3 +c y 5 c=1 c=0 0 (1, 43 ) • c = −2 −5 −2 = 2, y(e) = 4 0 1 2 x y 2 = 4 ln |x| + c, x0 > 0 5 • (e, 4) c = 12 c=0 y (b) x y dy dx −1 c = −5 0 −5 0 2 4 x CONTENTS (c) dy dx 135 y = 1 + Ce3x = 3(y − 1), y(0) = 2. 20 C=3 10 C=2 y0 > 1 0 y C=1 y0 = 1 y0 < 1 −10 −20 −1 0 1 x t2 + ty = y, y = Cet− 2 y(1) = 3 • (1, 3) 3 1 2 C = 3e− 2 1 C=1 y (d) dy dt 0 −2 C=0 −1 1 Example 3.1.13. Matlab can be used to solve DEs. The previous example can be solved by the Matlab code: %Matlab-SESSION % Specify the DE % % y’+t*y = y % % Note derivative of y written Dy % DE = ’ Dy+t*y=y’; de_solution=dsolve(DE) % You can also specify initial conditions IC=’y(1)=3’ ; ivp_solution=dsolve(DE,IC) simplified=simplify(ivp_solution) This gives the solutions for the DE and IVP: %>> solveDEs de_solution = C1*exp(-1/2*t*(t-2)) ivp_solution = 3/exp(1/2)*exp(-1/2*t*(t-2)) simplified = 3*exp(-1/2*(t-1)^2) 0 2 t 3 4 5 CONTENTS 136 First-Order Linear Equations First-order linear DEs are an important class of DEs which can usually be solved analytically. A first-order linear DE can be written as a(x)y ′ + b(x)y = c(x). When a(x) = 0, we don’t have a DE. In intervals where a(x) 6= 0, we can rewrite the DE as y ′ + p(x)y = f (x). (3.54) Notes: • In this form, the coefficient of y ′ is 1. • We call this form (3.54) of the DE, where y ′ has a coefficient of 1, standard form. a(x)y ′ + b(x)y = c(x) has standard form : y′ + c(x) b(x) y= . a(x) a(x) b(x) c(x) and f (x) = are defined. 
a(x) a(x) • Separable 1st-order DEs may be linear or nonlinear. • The DE is defined only where p(x) = Example 3.1.14. Put the following DEs into the form of (3.54), with leading coefficient 1. (a) 2y ′ + 3xy = 4x (b) xy ′ + 3y = 4 Our method of solution will be to multiply (3.54) by an appropriate function µ called an integrating factor (IF for short), so that the LHS becomes the derivative of µy. Note: µ (pronounced “mu") is the Greek lower case “m", for “multiply". Solution Technique 3.1.15. To solve the 1st-order linear DE y ′ + p(x)y = f (x) : R • multiply through by IF, µ(x) = e p(x)dx • recognise the new DE as (µ(x)y)′ = µ(x).f (x) • integrate: µ(x)y = • solve for y R µ(x)f (x) dx + c (3.55) CONTENTS 137 Example 3.1.16. Solve the following DEs by using an appropriate integrating factor, and check your answers: (a) y ′ − 3y = 0 (This is separable. Solve it both ways.) y = y(0)e3t 40 30 y 20 10 y(0) = 10 0 −10 −2 (b) y ′ + 2y = e−4x+7 (c) x2 y ′ + 2xy = 3x + 1, (d) y ′ + 3x 2 y = 2x y(1) = 1 y(0) = 1 y(0) = −10 0 t 2 CONTENTS (e) y ′ + 3y x 138 = x4 , y(1) = 1 y =x+ c x y 10 0 c = −1 c=2 −10 −5 −4 −3 −2 −1 dy (f) x dx + y = 2x, y(1) = 0 (g) y ′ + 2y = x2 + 4x + 7 (h) y ′ = x + y (i) x dy − 4y = x6 ex dx c=0 0 x 1 2 3 4 5 CONTENTS 139 First-order Applications See the textbook reference sheet. Newton’s Law of Cooling A heated object cools at a rate which depends on the surrounding (ambient) temperature: Newton’s Law of Cooling states that the temperature T (t) of a cooling object changes at a rate proportional to the difference between its temperature and the ambient temperature TA . dT = k(T − TA ) dt (3.56) This is a separable DE. Example 3.1.17. Coffee poured into a cup has temperature 75◦ Celsius, and 10 minutes later has cooled to 65◦ . The ambient temperature is 20◦ . (a) Find a formula for the temperature T as a function of time t. T (t) = 20 + 55e−0.02t 75 • (0, 75) (b) How long before the coffee cools to 50◦ ? • (10, 65) T(t) 65 • (30, 50) 50 20 0 30 60 90 120 150 t (minutes) Example 3.1.18. A bath is run to a temperature of 40◦ . The temperature in the bathroom is 18◦ . After 15 minutes, the bath-water is at 36◦ . How long before it is 33◦ ? 180 CONTENTS 140 Example 3.1.19. A small country has $10 billion of paper currency in circulation, and each day $50 million comes into the country’s banks. The government decides to introduce new currency by having the banks replace old bills with new ones whenever old currency comes into the banks. Let x = x(t) denote the amount of new currency, in billions of dollars, in circulation at time t, with x(0) = $0. (a) Formulate an initial-value problem that represents the flow of the new currency into circulation. (b) Solve the initial-value problem you’ve specified in (i). (c) How long does it take for the new bills to account for 90% of the currency in circulation? Solution: (a) If time t is measured in days, and x(t) measures, in billions of dollars, the amount of new currency in circulation after t days, then the amount of new currency going out from day to day is: fraction of total currency total amount of old currency x(tomorrow) − x(today) = coming into banks tomorrow remaining of today at end 50 = [10 − x(t)] 10, 000 so x(t + 1) − x(t) = 0.005(10 − x(t)). With time t measured in days, dx x(t + 1) − x(t) ≈ = 0.005(10 − x) (the approximation improves as t gets larger), dt (t + 1) − 1 so the initial-value problem is dx = 0.005(10 − x), x(0) = 0. dt When t = 0, x = 0 so C = 10. x = 10(1 − e−0.005t ). billion$ (b) The D.E. 
is separable, and for the duration of the problem, x < 10. dx = 0.005 dt 10−x ⇒ − ln(10 − x) = 0.005t + c . ⇒ ln(10 − x) = −0.005t + c −0.005t ⇒ 10 − x = Ce . (c) When x = 9, what is t? 9 −0.005t ⇒1−e ⇒ e−0.005t ⇒t = 10(1 − e−0.005t ) = .9 = .1 ln(.1) = = 460.5170 ≈ 461 days . −0.005 T (t) = 10(1 − e−0.005t ) 11 10 9 8 7 6 5 4 3 2 1 0 • (461, 9) 0 100 200 300 400 500 600 700 t (days) Figure 3.11: New currency in circulation It takes just under 461 days (between 15 and 16 months) to get 90% of the new currency in circulation. 800 900 1000 CONTENTS 141 Example 3.1.20. A personal computer has a useful life of 5 years, and depreciates at a rate directly proportional to the time remaining in its 5-year life-span. It is purchased for $2,500, and one year later its value is $1800. Write an equation for the value of the computer t years into its life for t > 0, and find its value 4.5 years after its purchase. Example 3.1.21. A store’s sales change at a rate directly proportional to the square root of the amount spent on advertising. With no advertising, sales are $50,000 per year, and with $16,000 spent on advertising per year, its sales are $70,000. Write an equation describing the store’s sales as a function of the amount spent on advertising. CONTENTS 142 Slope fields A first-order DE dy = f (x, y). dx says that the slope of a solution y(x) which passes through the point (x0 , y0 ) is f (x0 , y0 ). We can draw these slopes at a grid of points to get a picture of what the family of solutions looks like. This works even if we can not find a formula for the solution to the DE. slope field The slope field or direction field of the first-order DE dy = f (x, y) dx is a plot of the slope f (x, y) at a set of points in the (x, y)–plane. Note: The graph of a solution which satisfies an initial condition y(x0 ) = y0 will have slopes close to those in the slope field (the finer the grid the closer the slopes). Thus we can get the approximate shape of the solution that satisfies an initial condition y(x0 ) = y0 , by starting at the point (x0 , y0 ) and following the direction field, always ensuring that the curve drawn is tangential to the slopes. Example 3.1.22. The adjacent picture shows the slope field for the DE dy = y + t on a grid over −3 ≤ t, y ≤ 3. dt slopefield 4 3 2 1 0 −1 −2 −3 −4 −4 Use the slope field to sketch some approximate solutions to the DE, including the solution that satisfies the initial condition y(1) = 2. −3 −2 −1 0 1 2 3 4 CONTENTS 143 Example 3.1.23. dy = y + t, dt (a) Solve the IVP y(0) = 0. (b) Construct the slope-field of this DE by 1. completing the table of the values of y ′ below, and 2. sketching line-segments with these slopes on the graph below. 4 3 . . . . . . . 2 . . . . . . . 1 . . . . . . . 0 . . . . . . . -1 . . . . . . . -2 . . . . . . . -3 . . . . . . . -3 -2 -1 0 1 2 3 Values of f (t, y) = y + t t=-2 1 0 t=-1 2 t=0 3 t=1 4 t=2 5 t=3 6 y y=3 y=2 y=1 y=0 y=-1 y=-2 y=-3 t=-3 0 -1 0 0 0 0 0 t Locate the solution to the IVP on the slopefield. Example 3.1.24. Slope fields can easily be plotted by a computer using a software package such as Matlab. 
The Matlab code to plot the slope field in the previous example, using a local function slopefld.m, is: %Matlab-SESSION % % Create the slope field for the DE % y’ = y+t % and plot it a 30x30 mesh grid over [-3,3]^2 % delta_t=0.5; [t,y]=meshgrid(-3:delta_t:3,-3:delta_t:3); dt=ones(size(t)); dy=y+t; axis square; slopefld(t,y,dt,dy,’r-’);%, hold off, axis image title(’slopefield’); Copy slopefld.m to your working directory before you run this file. delta_t controls the grid spacing. 4 CONTENTS 144 Example 3.1.25. Sketch the slope field and some solution curves for dy dt = y. t=-3 t=-2 t=-1 t=0 t=1 t=2 4 t=3 y y=3 y=2 y=1 y=0 y=-1 y=-2 y=-3 Solve the IVP y ′ = y, y(0) = 1. 3 . . . . . . . 2 . . . . . . . 1 . . . . . . . 0 . . . . . . . -1 . . . . . . . -2 . . . . . . . -3 . . . . . . . -3 -2 -1 0 t 1 2 3 3 . . . . . . . 2 . . . . . . . 1 . . . . . . . 0 . . . . . . . -1 . . . . . . . -2 . . . . . . . -3 . . . . . . . -3 -2 -1 0 1 2 3 Identify this solution on the slope-field. Notice how the solutions follow the slope field. 4 4 Example 3.1.26. 2y Sketch the slope field for dy dt = t , showing the solution y = 2t2 passing through y(1) = 2. t=-3 t=-2 t=-1 t=0 t=1 t=2 t=3 y y=3 y=2 y=1 y=0 y=-1 y=-2 y=-3 4 t 4 Example 3.1.27. Sketch the slope field for dy dt = 2t + 1, showing the solution y = t2 + t − 4 which satisfies the initial condition y(−2) = −2. y=3 y=2 y=1 y=0 y=-1 y=-2 y=-3 t=-2 t=-1 t=0 t=1 t=2 t=3 y t=-3 3 . . . . . . . 2 . . . . . . . 1 . . . . . . . 0 . . . . . . . -1 . . . . . . . -2 . . . . . . . -3 . . . . . . . -3 -2 -1 0 1 2 3 t 4 CONTENTS 145 Euler’s method Not all initial value problems of the form dy = f (t, y), dt y(t0 ) = y0 have analytic solutions. We can find approximations y1 , y2 , y3 , . . . to the value of the solution at the sequence of points t1 = t0 + h, t2 = t1 + h = t0 + 2h, ... by using a numerical method. If the step-size h is “small” and the method is any good, then the derivative of y at tn should be close to the slope of the line through (tn , yn ) and (tn+1 , yn+1 ), i.e., ∆y yn+1 − yn = ≈ f (tn , yn ). ∆t tn+1 − tn By setting these equal, and solving for yn+1 , we obtain yn+1 = yn + f (tn , yn )∆t, ∆t = tn+1 − tn = h. The corresponding approximation scheme is called Euler’s method. The Euler formula can also be found by considering the process of linear approximation drawn on the direction field of y ′ = f (t, y): With yn+1 = yn + f (tn , yn )h yn+1 lies on the line through (tn , yn ) with slope f (tn , yn ). This slope is illustrated in the slopefield. y' = f(t,y) 5 4 3 y2 2 y1 1 y0 0 0 t0 0.5 1 t1 1.5 2 t2 2.5 CONTENTS 146 Solution Technique 3.1.28. Euler’s method for approximating the solution of dy = f (t, y), dt y(t0 ) = y0 • choose a step size h, and let t1 = t0 + h, t2 = t1 + h = t0 + 2h, .. . tn+1 = tn + h = t0 + (n + 1)h, . . . • compute successively y1 = y0 + hf (t0 , y0 ), y2 = y1 + hf (t1 , y1 ), .. . yn+1 = yn + hf (tn , yn ), ... Definition 3.1.29. If (i) the solution y = y(t) of the DE y ′ = f (t, y) is known, (ii) and Euler’s method generates a sequence of approximate-solution values {(ti , yi )}ni=1 , then the error in the approximate solution at the point t = ti is error = |y(ti ) − yi |. • The calculations for Euler’s method can be set out in a table. • Euler’s method can easily be implemented on a computer - e.g. in Matlab. • It can be shown that the error in Euler’s method at any point is roughly proportional to the step-size h. Thus Euler’s method converges to the true value of the solution (at a point) as h → 0. 
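To make the remark about computer implementation concrete, here is a minimal sketch of Euler's method as a local Matlab function (the name euler_m and its interface are ours, for illustration only — it is not one of the course's supplied files):

function [t, y] = euler_m(f, t0, y0, h, n)
%EULER_M  n steps of Euler's method for y' = f(t,y), y(t0) = y0
t = zeros(n+1, 1); y = zeros(n+1, 1);
t(1) = t0; y(1) = y0;
for i = 1:n
    y(i+1) = y(i) + h*f(t(i), y(i));  % y_{n+1} = y_n + h f(t_n, y_n)
    t(i+1) = t(i) + h;
end
end

For instance, [t, y] = euler_m(@(t,y) t + y, 0, 0, 0.5, 6) tabulates the approximation asked for in Example 3.1.30(a) below.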
By making finer approximations to f (tn , yn ) one can come up with other methods which converge at faster rates. Example 3.1.30. Use Euler’s method to approximate a solution to the IVP y ′ = t + y, y(0) = 0: (a) on the interval [0, 3] with h = 0.5, (b) on the interval [0, 1.5] with h = 0.25. n 0 1 2 3 4 5 6 tn yn f ( tn , y n ) yn + hf ( t n , y n ) n 0 1 2 3 4 5 6 tn yn Example 3.1.31. Use Euler’s method to approximate a solution to the IVP y ′ + ty = t, y(0) = 0: (a) on the interval [0, 1] with h = 0.5, (b) on the interval [0, 1] with h = 0.1. Compare your approximations with the analytic solution. f ( tn , y n ) yn + hf ( t n , y n ) CONTENTS 147 Further Reading: the Logistic Equation The model of growth/decay we’ve seen, exponential growth: y ′ (t) = ky(t), is the simplest. This models early growth or low population numbers, where no constraints inhibit growth. A more realistic model for the growth of a large population, e.g. spread of disease or permeation of the market by a new commodity, is given by the logistic equation: y ′ (t) = y(t)(B − Dy(t)), for constants B and D. Here • y(t) ≥ 0 represents population at time t • y ′ (0) > 0 — initially, the population y is growing • B is the constant growth (birth) rate • Dy(t) is the death rate, assumed directly proportional to the size of the population. The logistic equation says rate of population growth = population(birth rate - death rate) A typical solution to the logistic equation has the following graph solution of logistic equation B D x x (0) 200 100 0 100 200 300 t Note: The population levels off as the death-rate becomes close to the birth-rate. This model can accurately predict many constrained-growth situations, e.g. • insect and bacteria populations in restricted space • the spread of an epidemic in a confined population • impact of advertising in a local community • spread of information in a local community Example 3.1.32. Solve the equation y ′ (t) = y(t)(2 − y(t)) with initial condition y(0) = 1. The DE is separable: When y = 0 or 2, y ′ = 0 and y has a critical point. The IC says y starts out at 1, and y ′ (0) = 1 > 0. For 0 < y < 2, we can divide through: Z Z dy = dt. y(2 − y) CONTENTS 148 Expand the integrand using partial fractions: 1 a b = + . y(2 − y) y 2−y 1 = a(2 − y) + b y. 1 y = 0 ⇒ 1 = 2a ⇒ a = . 2 1 y = 2 ⇒ 1 = 2b ⇒ b = . 2 Rewrite Clearing denominators, So Z dy 1 = y(2 − y) 2 Z ⇒ 1 dy + 2−y Z y= 2e2t 1 + e2t 2 3 . 2e2t 1+e2t 2.5 2 x 1.5 1 0.5 0 −0.5 −4 −3 −2 −1 0 t 1 = Z dt 1 (− ln |2 − y| + ln |y|) = t + c 2 y 1 = t + c. ⇒ ln 2 2−y y ⇒ = Ce2t . 2−y Applying the initial condition, C = 1. Solving for y, ... x= 1 dy y 4 CONTENTS 149 3.2 Systems of First-Order Differential Equations 3.2.1 First-order linear homogeneous equations A first-order linear differential equation is homogeneous if it has the form y ′ + p(t)y = 0 Suppose that p(t) is a constant a, y ′ + ay = 0 or y ′ = −ay. (3.57) Guess a solution of the form y = ceλt (3.58) and put it into the DE: y ′ + ay = λceλt + aceλt = 0 ⇒ ceλt (λ + a) = 0 ⇒ λ + a = 0. (3.59) Equation (3.59) is called the characteristic equation of the DE (3.57). It has solution λ = −a. Thus y = ce−at (3.60) is a solution of (3.57). Note: We could have also obtained this solution using separation of variables or an integrating factor. Example 3.2.1. Write down the characteristic equation for each of the following DEs and use it to find a one-parameter family of solutions to the DE. 
(a) $y' + 3y = 0$
(b) $2y' = y$
(c) $y' - 5y = 0$

3.2.2 Systems of first-order linear DEs

Now consider two linear constant-coefficient homogeneous differential equations in two dependent variables $x_1(t)$ and $x_2(t)$:
$$x_1' = ax_1 + bx_2, \qquad x_2' = cx_1 + dx_2, \qquad\text{or}\qquad \mathbf{x}' = A\mathbf{x}, \qquad (3.61)$$
where $\mathbf{x} = \begin{pmatrix}x_1\\x_2\end{pmatrix}$, $\mathbf{x}' = \begin{pmatrix}x_1'\\x_2'\end{pmatrix}$, and $A = \begin{pmatrix}a&b\\c&d\end{pmatrix}$.

Notice the similarity between this and the DE (3.57). To solve this system of DEs, we take our cue from the one-dimensional case (3.58) and guess a vector solution
$$\mathbf{x} = \mathbf{v}e^{\lambda t} = \begin{pmatrix}v_1 e^{\lambda t}\\ v_2 e^{\lambda t}\end{pmatrix}, \qquad (3.62)$$
where $\mathbf{v}$ is a vector of constants. Substituting this into the system gives
$$\mathbf{x}' = \begin{pmatrix}\lambda v_1 e^{\lambda t}\\ \lambda v_2 e^{\lambda t}\end{pmatrix} = \lambda e^{\lambda t}\mathbf{v} = Ae^{\lambda t}\mathbf{v},$$
i.e., $A\mathbf{v} = \lambda\mathbf{v}$, so that $\lambda$ must be an eigenvalue of $A$ and $\mathbf{v}$ a corresponding eigenvector. This gives two solutions of the DE:
$$\mathbf{x}_1 = e^{\lambda_1 t}\mathbf{v}_1 \quad\text{and}\quad \mathbf{x}_2 = e^{\lambda_2 t}\mathbf{v}_2,$$
where $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors of $A$ for the eigenvalues $\lambda_1$ and $\lambda_2$.

If the eigenvalues are distinct ($\lambda_1 \neq \lambda_2$),
• it can be shown that the eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent,
• all linear combinations of $e^{\lambda_1 t}\mathbf{v}_1$ and $e^{\lambda_2 t}\mathbf{v}_2$ are also solutions of the DE,
• these two solutions form a basis of all solutions of the DE.

The solution of $\mathbf{x}'(t) = A\mathbf{x}(t)$ is
$$\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2 \qquad (3.63)$$
where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$, $\mathbf{v}_1$ and $\mathbf{v}_2$ the corresponding eigenvectors, and $c_1$ and $c_2$ are arbitrary constants.

Solution Technique 3.2.2. To solve the system of DEs
$$x_1' = ax_1 + bx_2, \qquad x_2' = cx_1 + dx_2:$$
• write the differential equations in system form: $\mathbf{x}'(t) = A\mathbf{x}(t)$ with $A = \begin{pmatrix}a&b\\c&d\end{pmatrix}$,
• find the eigenvalues $\lambda_1$ and $\lambda_2$ and corresponding eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ of $A$,
• the solution of the original DE is $\mathbf{x}(t) = c_1 e^{\lambda_1 t}\mathbf{v}_1 + c_2 e^{\lambda_2 t}\mathbf{v}_2$, where $c_1$ and $c_2$ are arbitrary constants,
• apply the initial conditions to determine $c_1$ and $c_2$.

Since the solution $\mathbf{x}(t)$ is a vector-valued function of $t$, we need three dimensions to plot it. If we let $t$ vary, then, for fixed values of $c_1$ and $c_2$, $\mathbf{x}(t)$ traces out a curve in the $(x, y)$–plane. A collection of such curves for a set of values of $c_1$ and $c_2$ is called a phase portrait. Some phase portraits (with arrows indicating the direction that we move along the curves as $t$ increases) are given for the solutions below.

$\mathbf{x}' = A\mathbf{x}$ for diagonal matrices $A$

The eigenvalues of a diagonal matrix are its diagonal entries, and the corresponding eigenvectors are the standard basis vectors $\mathbf{e}_1 = \begin{pmatrix}1\\0\end{pmatrix}$ and $\mathbf{e}_2 = \begin{pmatrix}0\\1\end{pmatrix}$.

Example 3.2.3. Solve the system of first-order DEs:
(a) $\frac{d}{dt}\begin{pmatrix}x(t)\\y(t)\end{pmatrix} = \begin{pmatrix}-x(t)\\y(t)\end{pmatrix}$ [Phase portrait, with eigen-directions $\mathbf{v}_1$, $\mathbf{v}_2$ marked.]
(b) $\frac{d}{dt}\begin{pmatrix}x(t)\\y(t)\end{pmatrix} = \begin{pmatrix}-x(t)\\-2y(t)\end{pmatrix}$ [Phase portrait.]
(c) $\begin{pmatrix}x\\y\end{pmatrix}' = \begin{pmatrix}x\\-2y\end{pmatrix}$ [Phase portrait.]
(d) $\begin{pmatrix}x\\y\end{pmatrix}' = \begin{pmatrix}1.5x\\0.5y\end{pmatrix}$ [Phase portrait.]
(e) $\begin{pmatrix}x\\y\end{pmatrix}' = \begin{pmatrix}x\\2y\end{pmatrix}$ [Phase portrait.]

$\mathbf{x}' = A\mathbf{x}$ for general matrices $A$

Example 3.2.4.
(a) $\mathbf{x}' = \begin{pmatrix}1&-3\\0&-2\end{pmatrix}\mathbf{x}$: the solution of the DE is
$$\mathbf{x} = c_1 e^{t}\begin{pmatrix}1\\0\end{pmatrix} + c_2 e^{-2t}\begin{pmatrix}1\\1\end{pmatrix}. \quad\text{[Phase portrait.]}$$
(b) $\mathbf{x}' = \begin{pmatrix}-1/4&3/4\\5/4&1/4\end{pmatrix}\mathbf{x}$: the solution of the DE is
$$\mathbf{x} = c_1 e^{-t}\begin{pmatrix}-1\\1\end{pmatrix} + c_2 e^{t}\begin{pmatrix}0.6\\1\end{pmatrix}. \quad\text{[Phase portrait.]}$$
(c) $\mathbf{x}' = \begin{pmatrix}7&-1\\3&3\end{pmatrix}\mathbf{x}$, $\mathbf{x}(0) = \begin{pmatrix}1\\0\end{pmatrix}$. Solution: $\mathbf{x} = \frac{3}{2}e^{6t}\begin{pmatrix}1\\1\end{pmatrix} - \frac{1}{2}e^{4t}\begin{pmatrix}1\\3\end{pmatrix}$. [Phase portrait.]
(d) $\mathbf{x}' = \begin{pmatrix}1&-1\\2&-2\end{pmatrix}\mathbf{x}$, $\mathbf{x}(0) = \begin{pmatrix}1\\0\end{pmatrix}$. Solution: $\mathbf{x} = \begin{pmatrix}2\\2\end{pmatrix} - e^{-t}\begin{pmatrix}1\\2\end{pmatrix}$. [Phase portrait.]

Extra for interest: The method above gives a way to find analytic solutions to linear constant coefficient homogeneous systems of DEs.
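Solution Technique 3.2.2 translates directly into a few lines of Matlab. A sketch, assuming $A$ has distinct real eigenvalues, using the matrix and initial condition of Example 3.2.4(c):

%Matlab-SESSION
% Solve x' = Ax, x(0) = x0 by eigen-decomposition (Example 3.2.4(c))
A  = [7 -1; 3 3];
x0 = [1; 0];
[V, D] = eig(A);              % columns of V: eigenvectors; diag(D): eigenvalues
c = V \ x0;                   % constants from  x(0) = c1*v1 + c2*v2
t = 1;                        % evaluate the solution at, say, t = 1
x = V * (exp(diag(D)*t) .* c) % x(t) = c1*e^(l1*t)*v1 + c2*e^(l2*t)*v2

Note that eig normalises its eigenvectors to unit length, so the constants in c differ from the hand-computed $c_1 = 3/2$, $c_2 = -1/2$ by the corresponding scalings, but the vector $\mathbf{x}(t)$ comes out the same.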
However, many interesting DEs are not of this type, in which case qualitative and numerical methods are often useful for finding out about the behaviour of solutions. Example 3.2.5. The Matlab code %Matlab-SESSION % % Solve the system of DEs using ode23 % needs auxiliary file, v_prime, defining x’ and y’ % x’=-y; y’=5-2y-x^2-xy; clear t v_out; % clear variables we name here [t,v_out]=ode23(@v_prime,[0,20],[2;1]); xlabel(’x’); ylabel(’y’); plot(v_out(:,1),v_out(:,2)); can be used to numerically integrate and plot solutions to the system dx = −y, dt dy = 5 − 2y − x2 − xy. dt 4 2 0 −2 −4 −6 −8 −5 −4 −3 −2 −1 0 1 2 Example 3.2.6. Use Matlab to numerically integrate and plot solutions to the system x dy y dx = 2x(1 − ) − xy, = 3y(1 − ) − 2xy, dt 2 dt 3 for x ∈ [0, 4], y ∈ [0, 4]. What is the long-term behaviour of solutions to the DE? CONTENTS 156 3.3 Homogeneous Linear Second-Order DEs with constant coefficients 3.3.1 Introduction A second-order linear DE is one which can be written in the form a(t)y ′′ + b(t)y ′ + c(t)y = d(t). As in the first-order case, we only consider intervals of t where a(t) 6= 0, and then rewrite the DE as y ′′ + p(t)y ′ + q(t)y = f (t), (3.64) with leading coefficient 1. Note: The DE is defined only where p(t), q(t) and f (t) are all defined. Terminology homogeneous/inhomogeneous The second-order linear DE is said to be homogeneous if it has a zero right-hand-side, i.e., has the form y ′′ + p(t)y ′ + q(t)y = 0, otherwise, if the right-hand-side term f (t) is not the zero function, then the DE is called inhomogeneous. solution A solution of the DE y ′′ + p(t)y ′ + q(t)y = f (t) is a function y that, when substituted for the unknown dependent variable, yields a true statement from the DE. A solution y of a DE satisfies it. Example 3.3.1. Verify the following: (a) y = cos(t) is a solution of y ′′ + y = 0. (b) e−t and e−3t are solutions of y ′′ + 4y ′ + 3y = 0. (c) u = et is not a solution of u′′ − 4u = et . CONTENTS 157 vector space of solutions If y1 an y2 are solutions of the homogeneous DE y ′′ + p(t)y ′ + q(t)y = 0, then so is any linear combination y = c1 y1 + c2 y2 y = c1 y1 + c2 y2 y′ = y ′′ = Superposition Principle Plug these expressions into y ′′ + p(t)y ′ + q(t)y If y1 and y2 are two solutions of a homogeneous linear DE y ′′ + p(t)y ′ + q(t)y = 0, and see that after the cancellations, you get 0. Thus the then any linear combination set of solutions of this DE forms a vector space. This is c1 y1 + c2 y2 sometimes called the superposition principle. of them is also a solution of the same DE. Initial-Value Problems A linear second-order DE y ′′ + p(t)y ′ + q(t)y = f (t) can be solved by two integrations, which introduces two arbitrary constants of integration. You will see why this is, in the example at the end of this section. This example shows that such a DE can be rewritten as a first-order system (whose solution gives the two constants). Consequently, the family of solutions of a 2nd-order DE is parameterised by two arbitrary constants. A specific solution in this family requires 2 observations. We will consider the so-called initial conditions. • Initial Conditions (ICs): Take the values y(t0 ), y ′ (t0 ) i.e., make both observations at one initial point t0 As for first order DEs we have: Initial-Value Problem (IVP): Differential Equation + Initial Conditions (ICs) It can be shown that if the coefficients p(t) and q(t) and the right-hand-side f (t) are well-behaved functions, then • a 2nd-order IVP always has a unique solution. 
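The superposition principle is easy to confirm symbolically for the DE of Example 3.3.1(b). A minimal sketch, assuming the Symbolic Math Toolbox:

%Matlab-SESSION
% Superposition check for  y'' + 4y' + 3y = 0  (Example 3.3.1(b))
syms t c1 c2
y = c1*exp(-t) + c2*exp(-3*t);                 % combination of two solutions
residual = simplify(diff(y,t,2) + 4*diff(y,t) + 3*y)
% residual = 0 for every c1, c2: each linear combination is again a solution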
CONTENTS 158 3.3.2 Solving Homogeneous Linear Second-Order DEs We restrict ourselves to solving 2nd-order, and some 3rdorder linear homogeneous DEs. Much of the theory extends directly to DEs of higher order. Our goal is: To find the most general solution of a homogeneous linear DE y ′′ + p(t)y ′ + q(t)y = 0, so that we can solve all Initial-Value Problems based on it. Example 3.3.2. Show the following (a) y(t) = 0 is a solution of y ′′ + 4y ′ + 3y = 0, but it doesn’t satisfy the ICs y(0) = 1, y ′ (0) = 1. (b) y(t) = e−t is a solution of y ′′ + 4y ′ + 3y = 0, but it doesn’t satisfy the ICs y(0) = 1, y ′ (0) = 1. We have already seen that the set of all solutions of a 2nd-order homogeneous DE is a vector space. Since the solution involves two arbitrary constants, this vector space is 2-dimensional. More generally, the set of solutions of an nth–order homogeneous DE forms a n– dimensional vector space. To decide whether we have found enough solutions to form a basis for the space of solutions, we need to know when sets of functions are linearly independent. Linear Independence of functions Two or more functions y1 (t), y2 (t), . . . yn (t) are said to be linearly independent on an interval I if whenever a linear combination of them is equal to zero on the whole interval: c1 y1 (t) + c2 y2 (t) + . . . + cn yn (t) = 0 for all t in I, then c1 = c2 = . . . = cn = 0, i.e., only the trivial linear combination equals the zero function on I. If they are not linearly independent, they are said to be linearly dependent. Notes: CONTENTS 159 • The same definition is given in the linear algebra part. For a function to be the zero function, every value must be 0, c1 y1 (t) + c2 y2 (t) + . . . + cn yn (t) = 0, t∈I whereas for a vector c1 v1 + c2 v2 + . . . + cn vn to be the zero vector it must be 0 in each coordinate. • For DEs, I is the interval where we solve the DE: where the coefficient functions and the right-hand side are defined. • Functions (or vectors) are linearly dependent when one of them can be written as a linear combination of others. Example 3.3.3. If two functions f1 and f2 are linearly dependent on some interval I, then there are constants c1 and c2 , not both zero, such that c1 f1 (t) + c2 f2 (t) = 0 for all t ∈ I, i.e., one is a scalar multiple of the other. Example 3.3.4. Show that the following sets of functions are linearly dependent, by verifying the given relationship: (a) t, t − 1, t + 3 on (−∞, ∞), and c1 t + c2 (t − 1) + c3 (t + 3) = 0 for c1 = −4, c2 = 3, c3 = 1. √ √ t − 1, t2 on (0, ∞), (b) t +√5t, t + 5, √ and c1 ( t+5t)+c2 ( t+5))+c3 (t−1)+c4 t2 = 0 for c1 = 1, c2 = −1, c3 = −5, c4 = 0. For a set of three or more functions the easiest way to determine whether they are linearly independent on an interval is via their Wronskian: Wronskian If y1 (t), y2 (t), . . . yn (t) all have at least n − 1 derivatives, then their Wronskian is defined to be the determinant W (y1 , y2 , . . . yn ) = det y1 y1′ y1′′ .. . (n−1) y1 y2 y2′ y2′′ (n−1) y2 ... ... ... yn yn′ yn′′ .. . (n−1) . . . yn . CONTENTS 160 Test for Linear Independence If y1 (t), y2 (t), . . . yn (t) all have at least n − 1 derivatives, and W (y1 , y2 , . . . yn )(t) 6= 0 for at least one point t in I, then the functions y1 (t), y2 (t), . . . yn (t) are linearly independent on I. If the determinant is zero on all of I, the functions are linearly dependent. Example 3.3.5. Compute the Wronskian of each of the following sets of functions, and determine whether the functions are linearly independent on the intervals given. 
(a) 1 and t on (−∞, ∞) (b) t and t − 1 on (−∞, ∞) (c) 1 and t, t2 on (−∞, ∞) (d) t, t − 1, t + 3 on (0, ∞) (e) t and t2 on (−∞, ∞) (f) 1 and et on (−∞, ∞) (g) t and et on (−∞, ∞) (h) e−t and e−3t on (−∞, ∞) (i) et and tet on (−∞, ∞) (j) sin(t) and cos(t) on [−2π, 2π] (k) eλ1 t and eλ2 t on (−∞, ∞), for λ1 6= λ2 CONTENTS 161 If y1 (t), y2 (t), . . . yn (t) are n linearly independent functions on an interval I, we know that W (y1 , y2 , . . . yn )(t0 ) 6= 0 for some t0 ∈ I. If, in addition, they are also solutions of an nth -order homogeneous linear DE on any part of I (let’s assume it’s all of I), we know a little more: Facts about the Wronskian of a Set of Solutions to a Homogeneous Linear DE (i) If y1 , y2 , . . . yn are n linearly independent solutions of an nth-order homogeneous linear DE on an interval I, then W (y1 , y2 , . . . yn )(t) 6= 0 for every t in I. (ii) Any set of m solutions y1 , y2 , . . . ym , m>n of an nth-order homogeneous linear DE, have W (y1 , y2 , . . . ym )(t) = 0 for all t. You now know enough about DEs to prove these facts in the case of a second-order DE: y ′′ + p(t)y ′ + q(t)y = 0 (3.65) Solutions y1 and y2 of (3.65) satisfy y1′′ + p(t)y1′ + q(t)y1 = 0 y2′′ + p(t)y2′ + q(t)y2 = 0 so y1′′ = −p(t)y1′ − q(t)y1 y2′′ = −p(t)y2′ − q(t)y2 . The Wronskian of y1 and y2 is y1 y2 W (y1 , y2 ) = det ′ = y1 y2′ − y1′ y2 . y1 y2′ And if they are linearly independent, W (y1 , y2 )(t0 ) 6= 0 at some t0 ∈ I. Differentiate this Wronskian: d W (y1 , y2 ) = y1′ y2′ + y1 y2′′ − y1′′ y2 − y1′ y2′ dt = y1 y2′′ − y1′′ y2 . = y1 (−p(t)y2′ − q(t)y2 ) − (−p(t)y1′ − q(t)y1 )y2 from (3.66) and (3.67). This simplifies to give d W = −pW. dt (3.66) (3.67) CONTENTS 162 This is a separable 1st-order DE. Solving, R W (y1 , y2 ) = Ce− p(t) dt . Now impose a special initial condition at a particular t = t0 : W (y1 , y2 )(t0 ) = its non-zero value. (y1 and y2 are linearly independent, so this Wronskian has a non-zero value at at least one point t0 .) The solution to this IVP is: R W (y1 , y2 ) = W (y1 , y2 )(t0 )e− p(t) dt . Neither factor on the right-hand side is zero, so we’ve proved the result! Example 3.3.6. Suppose y1 , y2 and y3 are three solutions of the second order DE (3.65). For i = 1, 2 and 3, yi′′ + p(t)yi′ + q(t)yi = 0 so yi′′ = −p(t)yi′ − q(t)yi Use Gaussian elimination to show that Wronskian of these three functions is constantly 0. Now to the topic of finding enough linearly independent solutions of a DE: Fundamental Set of Solutions for a Linear DE Any set of n linearly independent solutions of an nth-order homogeneous linear DE is called a fundamental set of solutions of the DE. Since the set of solutions is an n–dimensional vector space, these form a basis for it. basis A fundamental set of solutions of a homogeneous linear DE is a basis of solutions for the DE, i.e., every solution of the DE can be written as a linear combination of a fundamental set of solutions. This means we can write the most general solution of an nth-order homogeneous linear DE: Solution Technique 3.3.7. To solve a homogeneous linear DE of order n and associated IVPs: • find a fundamental set of solutions of the DE: y1 , y2 , . . . , yn • the general solution of the DE is the linear combination y = c1 y1 + c2 y2 + . . . + cn yn for arbitrary constants c1 , . . . , cn . • solve for the constants by applying the initial conditions Example 3.3.8. Solve the IVP y ′′ + 4y ′ + 3y = 0, y(0) = −1, y ′ (0) = 2. CONTENTS 163 3.3.3 Homogeneous Linear DEs with Constant Coefficients. 
We restrict ourselves now to the simplest type of homogeneous linear DE, that with constant coefficients. In the second-order case, this is of the form ay ′′ + by ′ + cy = 0, a, b and c constants, a 6= 0. What follows is a general method of finding a fundamental set of solutions of an nth-order homogeneous linear DE with constant coefficients. We demonstrate the method in the 2nd-order and 3rd-order cases. Characteristic Equation To solve ay ′′ + by ′ + cy = 0, a, b and c constants, a 6= 0, (3.68) we start out by guessing a solution y = k eλt y ′ = λ k eλt y ′′ = λ2 k eλt Putting this y into (3.68), 0 = k (aλ2 eλt + bλeλt + ceλt ) = k eλt (aλ2 + bλ + c) Since eλt 6= 0 for any t, and we assume that k 6= 0, it must be that aλ2 + bλ + c = 0. Definition 3.3.9. The equation aλ2 + bλ + c = 0. (3.69) obtained by making the substitution y = k eλt in the DE ay ′′ + by ′ + cy = 0 is called the characteristic equation of the DE. Notes: 1. Note the similarity between the characteristic equation of the DE (3.69) and the differential equation itself (3.68). 2. Solutions λ of the characteristic equation give solutions eλt of the DE. Solutions of the characteristic equation The characteristic equation aλ2 + bλ + c = 0 for a secondorder DE is a quadratic equation. It therefore has the solution √ −b ± b2 − 4ac . λ= 2a CONTENTS 164 Three Possible Outcomes in Solving the Characteristic Equation: b2 − 4ac > 0, two distinct real roots λ1 and λ2 of the characteristic equation, yielding two solutions eλ1 t and eλ2 t of the DE. b2 − 4ac = 0, one repeated real root λ1 of the characteristic equation, yielding one solution eλ1 t of the DE. b2 − 4ac < 0, two complex roots λ1 and λ2 of the characteristic equation, yielding two complex-valued solutions eλ1 t and eλ2 t of the DE. For this course, we restrict ourselves to cases (i) and (ii) above - when b2 − 4ac ≥ 0.1 (i) Distinct Roots of the Characteristic Equation In case (i), there are two real distinct solutions for λ, giving as many solutions eλt as the order of the DE. By Example 3.5.5 (k), these are linearly independent, so by solving the characteristic equation we get a fundamental set of solutions of the DE. Fundamental set of solutions when characteristic equation has distinct real roots An nth-order constant-coefficient homogeneous DE whose characteristic equation has n distinct real roots λ1 , λ2 , . . . , λn has the fundamental set of solutions eλ1 t , eλ2 t , . . . , eλn t . 1 The case b2 − 4ac < 0 involves complex numbers (not a part of this course) and Euler’s formula eit = cos(t) + i sin(t). E.g. y ′′ + 4y = 0 has characteristic equation λ2 + 4 = 0⇒λ = ±2i. The general solution is y = c1 e2it + c2 e−2it which can be rewritten, using Euler’s formula, as y = d1 cos 2t + d2 sin 2t. This can be verified directly: y ′ = −2d1 sin 2t + 2d2 cos 2t, ′′ y = −4d1 cos 2t − 4d2 sin 2t. So that y ′′ + 4y = 0.X CONTENTS 165 Example 3.3.10. Find the general solution of the following DEs by solving the characteristic equation and writing the general solution as a linear combination of the corresponding solutions of the DE. Check your answers. (a) 2y ′ + 3y = 0 (b) y ′′ − 2y ′ − 3y = 0 (c) y ′′ + 5y ′ + 4y = 0 (d) y ′′ − 9y = 0 (e) y ′′′ + 2y ′′ − 3y ′ − 6y = 0 (f) y ′′′ + 4y ′′ + y ′ − 6y = 0 Note: When you have to factor a cubic, unless the factorisation is obvious, try to guess a root: if you find a number c that makes the polynomial equal to 0, then t−c is a factor. To find the others, divide the polynomial by t − c using long division. 
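When the characteristic polynomial is awkward to factor by hand, Matlab's roots function finds its roots numerically from the coefficient vector; a sketch for the characteristic equation of Example 3.3.10(e):

%Matlab-SESSION
% Roots of the characteristic equation of Example 3.3.10(e):
%   lambda^3 + 2*lambda^2 - 3*lambda - 6 = 0
r = roots([1 2 -3 -6])
% gives -2, sqrt(3) and -sqrt(3) (approx. -2.0000, 1.7321, -1.7321),
% so the fundamental set is  e^(-2t), e^(sqrt(3)*t), e^(-sqrt(3)*t)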
CONTENTS 166 (ii) Repeated Roots of the Characteristic Equation Example 3.3.11. Solving the characteristic equation for y ′′ + 2y ′ + y = 0, λ2 + 2λ + 1 = (λ + 1)2 = 0, we see that λ = −1 is the only root, giving just one solution y = e−t of the DE. In every case of repeated roots, we get fewer solutions of the form y = eλt than the order of the DE. A method related to integrating factors tells us the following: If a root λ of the characteristic equation is repeated k times, then y = eλt , teλt , t2 eλt , . . . , tk−1 eλt are all solutions of the DE. Further, these solutions are linearly independent. Example 3.3.12. (a) Show that te−t is another solution of the DE y ′′ + 2y ′ + y = 0. (b) Show that te−t is linearly independent from e−t . Example 3.3.13. Find the characteristic equation of y ′′′ − 6y ′′ + 12y ′ − 8y = 0, factorise and solve it. Then show that y = e2t , y = te2t and y = t2 e2t are linearly independent solutions of the DE. Hint: Computation of the Wronskian of these sets of solutions shows that they are linearly independent. CONTENTS 167 Example 3.3.14. Solve the following DEs: Solution Technique 3.3.15. To find the general solution of an nth -order homogeneous linear DE with constant coefficients: (a) y ′′ + 2y ′ + y = 0 • form characteristic equation and find its roots λ (b) y ′′ + 6y ′ • form corresponding solutions eλt of DE + 9y = 0 • if a root λ of characteristic equation is repeated k times, supplement eλt with teλt , t2 eλt , . . . , tk−1 eλt • the general solution is a linear combination of this fundamental set of solutions (c) y ′′′ + y ′′ − y ′ − y = 0 (d) y ′′′ + 4y ′′ + 4y ′ = 0 Initial-Value Problems Solving the DE with initial conditions determines the constants of integration in the general solution; use either substitution or Gaussian elimination to find them. Example 3.3.16. (a) y ′′ + 5y ′ + 4y = 0, y(0) = 0, y ′ (0) = 1 (b) 2y ′′ − 5y ′ − 3y = 0, y(0) = 1, y ′ (0) − 2 CONTENTS 168 3.3.4 Equivalence of Second Order DE and First-Order System of DEs We now illustrate the equivalence of a second-order homogeneous DE and a corresponding system of two firstorder DEs, in the case that the roots of the characteristic equation are distinct. Example 3.3.17. y v y ′′ + 5y ′ + 4y = 0, y(0) = 0, y ′ (0) = 1. Form the vector v = 1 = ′ . v2 y ′ y Then v′ = ′′ , y and solving for y ′′ from the DE: y ′′ = −4y − 5y ′ , so ′ y y′ 0 1 y = = ′′ ′ y −4y − 5y −4 −5 y ′ y(0) 0 Also from the DE, v(0) = ′ = . y (0) 1 The IVP can be restated in system form as v′ = Av v(0) = 0 , 1 for matrix A = The matrix A has characteristic equation λ2 + 5λ + 4 = 0, with roots λ1 = −1, λ2 = −4, the eigenvalues of A. The corresponding eigenvectors are 1 1 v1 = and v2 = respectively. −1 −4 The general solution of (3.63) is x = c1 eλ1 t v1 + c2 eλ2 t v2 = c1 e−t Then apply the initial condition x′ (0) 1 1 + c2 e−4t . −1 −4 0 = . 1 Example 3.3.18. Solve the IVP y ′′ − 2y ′ − 3y = 0, y(0) = 1, y ′ (0) = 1 using a system of first-order DEs. (3.70) CONTENTS 169 It is usually quicker to solve higher order, linear constant coefficient homogeneous DEs directly (i.e., without first converting to a system). However, if we want to use numerical or qualitative techniques we usually convert to a system. For example, to study the DE y ′′ + sin y = 0 (a model for a pendulum if the amplitude is large), the analytic methods above will not work, and we can instead rewrite it as a system y ′ = z, z ′ = − sin y and study this numerically. Example 3.3.19. Numerical solution to pendulum problem (different from above). 
%Matlab-SESSION % % Solve the system of DEs using ode23 % needs auxiliary file, v_prime, defining x’ and y’ % x’=-y; y’=5-2y-x^2-xy; clear t v_out; % clear variables we name here [t,v_out]=ode23(@v2_prime,[0,2000],[2;1]); plot(v_out(:,1),v_out(:,2)); xlabel(’y’); ylabel(’z’); axis square; title(’solution to pendulum problem’); solution to pendulum problem 1.5 1 z 0.5 0 −0.5 −1 −1.5 1.5 2 2.5 3 3.5 4 4.5 5 4.5 5 y solution to pendulum problem 1.5 1 z 0.5 0 −0.5 −1 −1.5 1.5 2 2.5 3 3.5 4 y Figure 3.12: Early solution, and then solution over much longer time-scale. CONTENTS 170 4.1 Vectors 4.1.1 Vector Arithmetic We can add and subtract vectors u = (u1 , u2 , . . . un ) and v = (v1 , v2 , . . . vn ) ∈ Rn just as we do real numbers, and interpret the results geometrically: u + v = (u1 , . . . , un ) + (v1 , . . . , vn ) = (u1 + v1 , . . . , un + vn ) ∈ Rn and we can multiply v by a scalar c (i.e. a number): cv = (cv1 , . . . , cvn ) to stretch/shrink the vector’s length, or reverse its direction. u −u v−u v v v−u u+v u −u Example 4.1.1. 1 3 Sketch the vectors u = and v = . On the same graph, sketch 2 −1 (a) u + v (c) −2u (b) u − v (d) 3u − 2v Properties of vectors in Rn If u = (u1 , u2 , · · · , un ), v = (v1 , v2 , · · · , vn ) and w = (w1 , w2 , · · · , wn ) are vectors in Rn and k and l are scalars, then (a) u + v = v + u (b) u + (v + w) = (u + v) + w (c) u + 0 = 0 + u = u (d) u + (−u) = 0; i.e. (e) k(lu) = (kl)u u−u=0 (f) k(u + v) = ku + kv (g) (k + l)u = ku + lu (h) 1u = u Here 0 = (0, 0, · · · , 0) is the zero vector of Rn , with n components. CONTENTS 171 4.1.2 Length, distance, and angles in Rn The length kvk of a vector v ∈ Rn is defined as kvk = q v12 + v22 + · · · + vn2 In R2 this is Pythagoras’s theorem. It can also be stated kvk2 = v12 + v22 + · · · + vn2 . (4.71) Definition 4.1.2. Vectors of length 1 are called unit vectors. v Note: For any v 6= 0, kvk is a unit vector. 2 3 In R or R , any two vectors v and w that originate at a common point and which are not parallel (i.e. v 6= kw), form two sides of a triangle. The vector v − w (or w − v) forms the third side. The sides have lengths kvk and kwk, and kv − wk, respectively - see Fig 4.13. v |w − v| w Figure 4.13: Distance between vectors v and w. Definition 4.1.3. The distance between vectors v and w in Rn is p kw − vk = (w1 − v1 )2 + (w2 − v2 )2 + · · · + (wn − vn )2 . (4.72) Angles in Rn , orthogonality We now present a tool which helps in the computation of lengths of vectors, and angles between them: Definition 4.1.4. The dot product of two vectors v and w in Rn is denoted by v · w, and defined as v · w = vT w = v1 w1 + v2 w2 + · · · + vn wn . (1×n)(n×1) Note that v · w is a scalar. The dot product of two vectors is also referred to as their scalar product. Properties of the dot product in Rn If u, v, w are vectors in Rn and r is a scalar, • (u + v) · w = u · w + v · w; • (ru) · v = r(u · v); • u · v = v · u; • u · u > 0 whenever u 6= 0. We can restate the length, or magnitude, of a vector v ∈ Rn in terms of the dot-product, as √ kvk = v · v (4.73) CONTENTS 172 or alternatively, v · v = kvk2 = v12 + v22 + · · · + vn2 . Pythagoras’s theorem says that perpendicular, or right-angled, or orthogonal vectors v and w in satisfy kv − wk2 = kvk2 + kwk2 . (v ⊥ w) (4.74) R2 (Figure 4.14) (4.75) w kv − wk kwk kvk v v−w Figure 4.14: If v ⊥ w, then v.w = 0. Using dot products, kv − wk2 = (v − w) · (v − w) = v · v − v.w − w.v + w.w = kvk2 + kwk2 − 2v.w. (4.76) So if v and w are perpendicular, then v.w = 0. 
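For instance, in $\mathbb{R}^2$ the vectors $\mathbf{v} = (1, 1)$ and $\mathbf{w} = (1, -1)$ have $\mathbf{v}\cdot\mathbf{w} = 1 - 1 = 0$, and indeed $\|\mathbf{v} - \mathbf{w}\|^2 = \|(0, 2)\|^2 = 4 = \|\mathbf{v}\|^2 + \|\mathbf{w}\|^2 = 2 + 2$, as Pythagoras requires.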
We extend this result to R^n:

Definition 4.1.5. Non-zero vectors v and w in R^n are said to be orthogonal when v · w = 0.

Again from R^2, the law of cosines tells us that vectors v and w with angle θ between them (not necessarily perpendicular) satisfy

    ‖v − w‖^2 = ‖v‖^2 + ‖w‖^2 − 2‖v‖‖w‖ cos θ.                 (4.77)

Comparing this with (4.76), we get

    v · w = ‖v‖‖w‖ cos θ.                                      (4.78)

[Figure: triangle with sides ‖v‖, ‖w‖ and ‖v − w‖, and angle θ between v and w.]

Restating this result in higher dimensions:

Definition 4.1.6. The angle θ between non-zero vectors v and w in R^n has

    cos θ = (v · w)/(‖v‖‖w‖),  that is,  θ = cos⁻¹((v · w)/(‖v‖‖w‖)).

Note: If the angle between two vectors is π/2, they are said to be orthogonal.

%Matlab-session
%dot product, length, unit vector, angle
clf
u=[1 2 2]'                  %column vectors
v=[1 1 1]'
u'*v                        %u.v or dot(u,v)
norm(u)                     %length of u
norm(u-v)                   %distance between u and v
a=u/norm(u)                 %unit vector in direction of u
b=v/norm(v)                 %unit vector in direction of v
theta=acos(a'*b)            %angle between u and v
axis([0 2 0 2 0 2]);        %set up axes for plot
A=[u,zeros(3,1),v]          %3 pts: u,0,v in mtx
plot3(A(1,:),A(2,:),A(3,:)) %plot them
text(1,2,2,'u')             %label u at (1,2,2)
text(1,1,1,'v')             %label v at (1,1,1)

Example 4.1.7. If two unit vectors u and v are parallel, what is u · v?

4.2 Vector Representation of Lines and Planes

4.2.1 Vector Representation of Lines and Planes

The equations for lines and planes in R^2 and R^3 can be written in vector form or as a system of linear equations. We denote by r = (x, y, z) an arbitrary point in R^3, and by r0 = (x0, y0, z0) a particular point. When the z-coordinate is 0, we have a point in R^2.

(i) The vector form of the equation of a line through point r0 with direction d is

    r = r0 + t d,  t ∈ R.                                      (4.79)

(ii) The vector form of the equation of a plane through point r0 with directions d and d' is

    r = r0 + s d + t d',  s, t ∈ R.                            (4.80)

Note:
• lines have one direction vector,
• planes have two direction vectors.

[Figures: a line through r0 with direction d, and a plane through r0 with directions d and d'.]

Example 4.2.1. The solution set of the equation x + 2y − 3z = 0 is a plane.
(a) Find the equation of the plane in vector form.
(b) Verify that it is a plane through the origin perpendicular to the vector (1, 2, −3).

In R^3, arbitrary non-parallel vectors v and w, emanating from the origin, lie in exactly one plane: r = s v + t w, the span of the vectors v and w. A vector which uniquely determines this plane is its normal vector.

Definition 4.2.2. A vector n is normal to a plane P if it is perpendicular to every vector in P.

Given a plane, how do we find a vector normal to it?

The Cross-Product

Definition 4.2.3. The cross-product of vectors v and w in R^3 is denoted v × w, and defined as

    v × w = det[v2 v3; w2 w3] i − det[v1 v3; w1 w3] j + det[v1 v2; w1 w2] k
          = (v2 w3 − v3 w2) i − (v1 w3 − v3 w1) j + (v1 w2 − v2 w1) k.       (4.81)

If v and w are not parallel, v × w is a vector perpendicular to them both. In (4.81), we use the notation i = [1; 0; 0], j = [0; 1; 0], k = [0; 0; 1].

Often, for convenience, we write the formula (4.81) in a briefer form as

    v × w = det[i j k; v1 v2 v3; w1 w2 w3].                    (4.82)

(4.82) is not a real determinant, but it may be helpful as a mnemonic.

[Figure: the normal n = v × w to the plane P spanned by v and w.]
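A one-line check in Matlab that v × w is perpendicular to both factors (the vectors here are our own illustrative choice):

%Matlab sketch (illustrative)
v = [1 2 3]';  w = [0 1 1]';
n = cross(v, w)   % returns [-1; -1; 1]
dot(n, v)         % 0: n is perpendicular to v
dot(n, w)         % 0: n is perpendicular to w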
Example 4.2.4. Find the following cross-products:
(a) i × j
(b) i × k
(c) (−1, 1, 0) × (0, −2, 1)

Figure 4.15: Area of parallelogram with sides v and w is |v × w|; its height is |v| sin θ.

Example 4.2.5.
(a) Choose any non-parallel vectors v and w in R^3 and verify that v × w is a vector perpendicular to them both.
(b) For any non-parallel vectors v and w in R^3 with angle θ between them,
    (i) |v × w| = |v||w| sin θ.
    (ii) The area of a parallelogram with sides the vectors v and w is |v × w|.

The normal equation of a plane in R^3

The plane through a point r0 = (x0, y0, z0) and containing vectors v and w has vector form r = r0 + s v + t w, where r = (x, y, z) represents an arbitrary point on the plane. Referring to Figure 4.16:
• The vector r − r0 = (x − x0, y − y0, z − z0) from r0 to r represents an arbitrary vector in the plane (in the span of v and w).
• The vector n = v × w is normal to the plane.
• The vector r − r0 lies in the plane, so is orthogonal to n:

    (r − r0) · n = (x − x0, y − y0, z − z0) · n = 0.

This holds for all points (x, y, z) in the plane.

Figure 4.16: Normal vector to a plane.

The normal equation of a plane through point r0 = (x0, y0, z0) with normal vector n = (a, b, c) is

    ax + by + cz = d,  where d = a x0 + b y0 + c z0.

In particular, the equation of a plane through the origin with normal (a, b, c) is ax + by + cz = 0.

Recall: The equation of a line in R^2 through the origin is ax + by = 0.

Example 4.2.6.
(a) Find the normal equation of the plane through the origin and the vectors v = (1, 2, 1) and w = (1, 0, −1).
(b) Find the normal equation of the plane containing the vectors (1, 1, 1) and (1, 1, −1), and through the point (1, 2, 1).
(c) Find the normal equation of the plane spanned by the vectors (1, 1, 1) and (1, 1, −1).
(d) Find the equation of the line of intersection of the two planes 2x + 3y − z = 0 and x − y = 0.
(e) Find the equation of the plane through the three points (1, 1, 1), (1, 0, −1) and (1, 2, 3).
(f) Write the equation of the plane z = 1 in vector form.
(g) Write the equation of the plane through the origin, with normal k, in vector form.

Example 4.2.7. Matlab plots the vector form of lines and planes if strings specifying the coordinates are given. The following script illustrates this feature and the cross-product function: two vectors and their span (a plane) are plotted, and a normal to this plane is obtained with the cross function. Rotate the graph to see the normal to the plane.

%Matlab-session: plot plane from vector form
view([-30,-60]); hold on;        %set viewpoint
ezplot3('1*s','1*s','s',[0,5]);
   % parametric form of line thru (1,1,1) and (0,0,0)
axis image;                      %don't scale axes differently
ezplot3('0','t','-3*t',[0,5]);
   %parametric form of line thru (0,1,-3) and (0,0,0)
ezmesh('1*s','1*s+t','s-3*t');
   %parametric form of plane through these two lines
   %(use ezplot3 to plot lines in 3d, and ezmesh to plot planes in 3d)
u=5*[0 1 -3]'; v=5*[1 1 1]';
n=cross(u,v)/25;                 % normal vector to u and v
title_string=sprintf(...         %continuation on next line
   'normal line: %g x+ %g y+ %g z=0',n(1),n(2),n(3));
text(u(1),u(2),u(3),'u');        % label u
text(v(1),v(2),v(3),'v');        % label v
text(n(1),n(2),n(3),'n');        % label n
title(title_string);             %apply title to graph
plot3([n(1),0,-n(1)],[n(2),0,-n(2)],[n(3),0,-n(3)]);
   %plot normal vector too
hold off;

[Figure: the plane spanned by u and v, with its normal line; the title reads "normal line: 4 x+ -3 y+ -1 z=0".]
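To summarise the method by hand, here is the normal-equation computation for one small example (the vectors are our own illustrative choice): for the plane through the origin spanned by v = (1, 0, 2) and w = (0, 1, 1),

    n = v × w = (0·1 − 2·1, 2·0 − 1·1, 1·1 − 0·0) = (−2, −1, 1),

so the normal equation is −2x − y + z = 0. As a check, n · v = −2 + 0 + 2 = 0 and n · w = 0 − 1 + 1 = 0, so n is indeed perpendicular to both spanning vectors.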
4.3 Systems of Linear Equations and Matrices

4.3.1 Systems of Linear Equations

A linear equation in n variables x1, x2, ..., xn (for any positive integer n) has the structure

    c1 x1 + c2 x2 + ... + cn xn = b.                           (4.83)

The coefficients c1, ..., cn and right-hand-side term b are given (known) real numbers. An expression is linear when all variables xi appear as linear (first-power) terms xi^1, i = 1, 2, ..., n. For example, 2x1 − 3x2 − 4x3 = 9 is a linear equation in three variables. A linear expression contains
• no quadratic terms xi^2 or xi xj,
• no cubic terms xi^3, xi xj^2,
• no other non-linear terms, e.g. sqrt(xi), sin xi etc.

Example 4.3.1. The quadratic equation x^2 + 2x + 1 = 0 is not linear.

Lines in the plane are graphs of linear equations

The simplest linear equation represents a line in the (x, y)-plane: ax + by = d. This may be more familiar to you in the form

    y = mx + c.                                                (4.84)

For every real number x, a corresponding y = mx + c is determined. The graph of the line is the set of all points (x, y) satisfying (4.84).

Planes in 3-d space are the graphs of linear equations

The next-simplest linear equation involves three variables,

    ax + by + cz = d,                                          (4.85)

and represents a plane in (x, y, z)-space. The simplest planes are the co-ordinate planes of R^3:
• the xy-plane: all points (x, y, z) with z = 0
• the yz-plane: all points (x, y, z) with x = 0
• the xz-plane: all points (x, y, z) with y = 0

Figure 4.17: The co-ordinate planes of 3-d space.

Example 4.3.2. Draw the graph of the plane 2x − y = 0 in 3-d space.
Hint:
• it contains the line y = 2x, and
• z is not specified, so it is "free".

Definition 4.3.3. A solution of the linear equation

    a1 x1 + a2 x2 + ... + an xn = b

is a set of real numbers s1, s2, ..., sn which, when substituted for the variables x1, x2, ..., xn, satisfy the equation.

Example 4.3.4. The linear equation 2x1 + 3x2 = 1 has a solution s1 = −4, s2 = 3, because 2(−4) + 3(3) = 1. We say "(x1, x2) = (−4, 3) is a solution." Graph the set of all solutions of 2x1 + 3x2 = 1 in the x1x2-plane. How many solutions are there to this equation?

Definition 4.3.5. A system of linear equations is a set of linear equations. A system of m linear equations in n unknowns has the form

    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2                        (4.86)
      ...
    am1 x1 + am2 x2 + ... + amn xn = bm

Every coefficient aij is a real number.

Example 4.3.6. The set of equations

    x1 + 3x2 = 1
    x1 +  x2 = 1                                               (4.87)

is a system of two linear equations in two unknowns.
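The geometry of such a small system can be previewed with a quick plot; a minimal sketch for (4.87), with plotting commands of our own:

%Matlab sketch (illustrative): the two lines of system (4.87)
x1 = linspace(-2, 4);
plot(x1, (1 - x1)/3, x1, 1 - x1)  % x2 solved from each equation
xlabel('x_1'); ylabel('x_2');
legend('x_1 + 3x_2 = 1', 'x_1 + x_2 = 1');  % they cross at one point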
Definition 4.3.7. A solution of the system of linear equations (4.86) is a set of real-number values for the variables x1, x2, ..., xn which satisfies all equations simultaneously.

Example 4.3.8. The system of linear equations (4.87) has one solution,

    x1 = 1,  x2 = 0,                                           (4.88)

as 1 + 3(0) = 1 and 1 + 0 = 1. The point (x1, x2) = (1, 0) is the intersection of the lines given at (4.87).

Example 4.3.9. Using Figure 4.17, locate the solution of the system of equations x = 0, y = 0 in the variables x, y and z (3-d space), by observing the intersection of the given planes.

Example 4.3.10. The system of equations x = 0, x = 1 has no solution.

Definition 4.3.11. If a system has no solution, we say it is inconsistent. If a system is not inconsistent, it is consistent.

Notes:
• To solve a system of equations in n unknowns, we need a consistent system of n equations.
• Any fewer independent equations and we have undetermined, or free, variables.
• Any more independent equations and we have inconsistency.

Figure 4.18: (a) No solution, (b) One solution, and (c) Infinitely many solutions.

Example 4.3.12. From Figure 4.18:
(a) The system of equations
    2x1 − x2 = 1
    2x1 − x2 = 2
is inconsistent.
(b) The system of equations
    2x1 − x2 = 1
    x1 + 2x2 = 2
has exactly one solution.
(c) The system of equations
    2x1 − x2 = 1
    4x1 − 2x2 = 2
has infinitely many solutions.

A system of linear equations falls into one of three categories: it has either
• no solution,
• one solution, or
• infinitely many solutions.

Definition 4.3.13. A system of linear equations

    a11 x1 + a12 x2 + ... + a1n xn = 0
    a21 x1 + a22 x2 + ... + a2n xn = 0
      ...
    am1 x1 + am2 x2 + ... + amn xn = 0

with every right-hand side zero is called homogeneous.

Notes:
1. A homogeneous linear system has the solution xi = 0 for all i. This is called the trivial solution of the system.
2. The trivial solution is the obvious solution of a homogeneous system. It always exists.
3. When solving homogeneous systems, we will focus on finding non-trivial solutions.

In Matlab, an explicit system of equations can be solved with the "solve" command:

%Matlab script:
%Solve a system of equations when given explicitly.
eqn1='2*x + 3*y - z = 0'
eqn2='    - y + z = 1'
eqn3='x - 2*y - z = -1'
[x,y,z]=solve(eqn1, eqn2, eqn3) %solve 3 eqns simultaneously
[x,y]=solve(eqn1,eqn2)  %solution has one free variable - by default, the last (z)
[x,z]=solve(eqn1,eqn2,'x,z')    %make y the free variable

It's easy to solve two equations in two unknowns with simple algebra, but for more equations in more variables, we abbreviate the process to systematise our work.

Augmented Matrices

Definition 4.3.14. A set of equations

    a11 x1 + a12 x2 + ... + a1n xn = b1
    a21 x1 + a22 x2 + ... + a2n xn = b2                        (4.89)
      ...
    am1 x1 + am2 x2 + ... + amn xn = bm

can be abbreviated by an array of numbers, i.e. a matrix:

    [ a11 a12 ... a1n | b1 ]
    [ a21 a22 ... a2n | b2 ]                                   (4.90)
    [  ...            | ...]
    [ am1 am2 ... amn | bm ]

(4.90) is called the augmented matrix of coefficients for the system of linear equations (4.89). It is often written [A|b]. In this review, we will now focus on solving systems of linear equations by working with the augmented matrix of the system.

Example 4.3.15. Write the augmented matrix for the system

    2x + 3y −  z =  0
       −  y +  z =  1
     x − 2y −  z = −1
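For checking your answer against the Matlab session below: here the coefficient matrix, right-hand side and augmented matrix are

    A = [2 3 −1; 0 −1 1; 1 −2 −1],  b = [0; 1; −1],
    [A|b] = [2  3 −1 |  0]
            [0 −1  1 |  1]
            [1 −2 −1 | −1]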
In Matlab, this is done by the command:

%Form augmented matrix from system of equations Ax=b
A=[2 3 -1; 0 -1 1; 1 -2 -1];
b=[0 1 -1]';   % or b=[0;1;-1];
Ab=[A,b]       %form augmented matrix
sol=rref(Ab)   %solves system - reduced echelon form
sol_2=A\b      %same, as long as a unique solution exists

Elementary Row-Operations

The algebra needed to solve a system of linear equations can be viewed as a sequence of operations on the rows of the augmented matrix of coefficients of the system. We classify three types of such "elementary row-operations":
(i) row-exchange: interchange any two rows (ri ↔ rj).
(ii) row-multiple: multiply a row by a non-zero constant (ri → k ri).
(iii) row-addition: replace a row by itself plus a multiple of another row (ri → ri + k rj).

Example 4.3.16. In sequence, apply the row operations (i) r1 ↔ r2 and (ii) r3 → r3 − r1 to the augmented matrix

    [ 2  3 −1 |  0 ]
    [ 1 −1  1 |  1 ]
    [ 1 −2 −1 | −1 ]

In Matlab:

%row-reductions by hand - very useful
%using matrix A= 2  3 -1  0
%               1 -1  1  1
%               1 -2 -1 -1
echo on;
A=[2 3 -1 0; 1 -1 1 1; 1 -2 -1 -1];
E=A;                      %backup A - in case of errors
E=[E(2,:);E(1,:);E(3,:)]  %r_2 <-> r_1
E(3,:)=E(3,:)-E(1,:)      %r_3 -> r_3-r_1
echo off;

Definition 4.3.17. Two matrices A and B related by a series of row-operations are written "A ∼ B". We say A and B are row-equivalent.

Echelon and Reduced Echelon Form

Definition 4.3.18.
(a) A matrix A (not necessarily square) is in echelon form if
• all rows of zeros are at the bottom,
• the first non-zero entry in every row (called a "pivot" or "leading entry") is to the right of the first non-zero entry in the previous row (step-like pattern of leading entries),
• the leading entry in every row has zeros below it.
(b) A matrix A is in reduced echelon form if
• it is in echelon form,
• the leading entry in every row (the pivot) is 1,
• each leading 1 is the only non-zero entry in its column.

Notes:
1. Echelon form for a matrix is not unique: there are many possibilities. However, in echelon form the number of pivots is unique. This is the number of non-zero rows in echelon form.
2. Reduced echelon form for any matrix is unique (there is only one).

In the following examples, the pivot in each row is its leading (first non-zero) entry.

Example 4.3.19. Neither echelon form nor reduced echelon form need have non-zero entries on the main diagonal.

(a) [ 1 0 ]   - reduced echelon (pivot: the 1 in row 1)
    [ 0 0 ]

(b) [ 2 −1 1 ]
    [ 0  0 3 ]   - echelon (pivots: the 2 and the 3)
    [ 0  0 0 ]

Example 4.3.20. Reduced echelon form need not have all non-diagonal entries zero.

(a) [ 1 1 ]
    [ 0 0 ]

(b) [ 1 0 0 ]
    [ 0 1 1 ]
    [ 0 0 0 ]

(c) [ 1 0 0  2 ]
    [ 0 1 0 −3 ]
    [ 0 0 1  2 ]

Definition 4.3.21. The rank of a matrix A, denoted rank(A), is the number of non-zero rows in any echelon form of A.

Note: The rank of a matrix is the number of pivots in echelon form.

Gaussian Elimination

Definition 4.3.22. Gaussian elimination is the procedure of using elementary row-operations to reduce a matrix to echelon form. Reducing a matrix to reduced echelon form with elementary row-operations is sometimes known as Gauss-Jordan elimination.

FACT: The solution of a system of linear equations is not changed by performing elementary row-operations on the augmented matrix of the system.
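A small instance of this fact (our own example): the system x + y = 2, x − y = 0 has the solution (1, 1); after the row operation r2 → r2 − r1 it becomes x + y = 2, −2y = −2, whose solution is again (1, 1).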
Example 4.3.23. In Matlab:

%Matlab-session
% row-reduction to reduced echelon form
%using matrix A= 2 -3  1 0  4
%               1  1  2 2  0
%               3  0 -1 4  5
%               1  6  5 6 -4
echo on
A =[2 -3 1 0 4; 1 1 2 2 0; 3 0 -1 4 5; 1 6 5 6 -4]
C = rref(A)      % reduced echelon form
pause            % wait till user presses enter
format rat; C    % show C again, this time in fractions
pause
[C,pivotcols] = rref(A)  %another way to call rref:
                 % ask rref for two answers (on lhs)
                 % the second will be the pivot columns
rank(A);
format; echo off % back to defaults

We now recall the procedure of solving a system of linear equations using row-reduction and back-substitution for the 3 × 3 case, with an example.

Example 4.3.24. The system of linear equations

     2x1 +  x2 + x3 =  1
     4x1 +  x2      = −2                                       (4.91)
    −2x1 + 2x2 + x3 =  7

has augmented matrix

    [ 2 1 1 |  1 ]
    [ 4 1 0 | −2 ]
    [−2 2 1 |  7 ]

We row-reduce it to echelon form. At each step, the pivot (leading non-zero entry) of the current row is used to eliminate the non-zero entries below it:

    [ 2 1 1 |  1 ]                      [ 2  1  1 |  1 ]                [ 2  1  1 |  1 ]
    [ 4 1 0 | −2 ]          ∼           [ 0 −1 −2 | −4 ]       ∼        [ 0 −1 −2 | −4 ]
    [−2 2 1 |  7 ]   r2 → r2 − 2r1,     [ 0  3  2 |  8 ]   r3 → r3+3r2  [ 0  0 −4 | −4 ]
                     r3 → r3 + r1

and the last matrix is in echelon form. The solution can be "read off" in reverse order, from x3 to x1, using backwards (or back) substitution:

    row 3: −4x3 = −4  ⇒  x3 = 1
    row 2: ...
    row 1: ...

Using Matlab,

%Matlab-session
% Solve linear system with unique solution
% 1) from matrix A and rhs b
%using matrix A=  2 1 1
%                4 1 0
%               -2 2 1
echo on          %echo commands as executed
A = [2 1 1; 4 1 0; -2 2 1];
b = [1 -2 7]';
A\b              %find solution if exists
pause
% 2) from rref of augmented matrix
Ab = [A,b]       %form augmented matrix
rref(Ab)         %read off solutions from rref
pause
% 3) from explicit equations, symbolically
[x1,x2,x3]=solve('2*x1+x2+x3=1, 4*x1+x2+0*x3=-2, -2*x1+2*x2+x3=7')
echo off

Solution Technique 4.3.25. To solve a linear system of equations Ax = b using row-reduction:
• form the augmented matrix [A|b], and
• either reduce the matrix to reduced echelon form and read off the solutions, or
  (i) use Gaussian elimination to row-reduce it to echelon form,
  (ii) solve with back-substitution.

Nature of the solution of a linear system

Note: In solving a system of linear equations with augmented matrix [A|b] for variables x1, x2, ..., xn, the augmented matrix [A|b] is reduced to echelon form. For this echelon form:

(a) If the leading non-zero entry (pivot) in any row is in the last column, e.g.

    [A|b] ∼ [ 1 0 | 2 ]
            [ 0 0 | 1 ]   (pivot in the last column of row 2)

the system is inconsistent: there is no solution.

(b) Otherwise, the system is consistent. In this case,

(i) if any column i of the left-hand side has no pivot, the corresponding xi is a free variable: it can take any real value, e.g.

    [A|b] ∼ [ 1 2 1 0 | 2 ]
            [ 0 1 1 0 | 3 ]   (no pivot in column 3, so x3 is free)
            [ 0 0 0 1 | 4 ]

Another example of this is when there is a row of zeros at the bottom of echelon form:

    [A|b] ∼ [ 1 0 | 2 ]
            [ 0 0 | 0 ]   (no pivot in column 2, so x2 is free)

(ii) if column i of the left-hand side has a pivot, then xi is a bound variable: it is determined. The number of bound variables is rank(A).

(iii) if every column of the left-hand side contains a pivot, there is a unique solution, e.g.

    [A|b] ∼ [ 1 2 1 | 3 ]
            [ 0 1 1 | 3 ]   (a pivot in every left-hand column)
            [ 0 0 1 | 4 ]

Note: Several linear systems can be solved simultaneously by augmenting the coefficient matrix with any number of right-hand-side vectors. For example, we write the augmented matrix for a set of systems Ax = b1, Ax = b2, Ax = b3 as

    [ A | b1 b2 b3 ].                                          (4.92)

The reduced echelon form of this more general augmented matrix then yields the solution of all systems at once.
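A sketch of (4.92) in Matlab, using the matrix of Example 4.3.24 (the second right-hand side b2 is our own illustrative choice):

%Matlab sketch (illustrative): solve Ax = b1 and Ax = b2 at once
A  = [2 1 1; 4 1 0; -2 2 1];
b1 = [1 -2 7]';  b2 = [0 1 1]';
rref([A, b1, b2])  % last two columns hold both solutions
A\[b1, b2]         % same answers, one column per right-hand side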
Example 4.3.26. The matrix

    [ 1 4 −1 0 | 2 ]
    [ 0 1 −1 2 | 2 ]
    [ 0 0  0 1 | 1 ]

is the echelon form of the augmented matrix of a linear system of equations. The pivots are in columns 1, 2 and 4, so x1, x2 and x4 are bound, and x3 is free. Suppose x3 = s ∈ R. Using back-substitution,

    row 3: x4 = 1
    row 2: x2 = s − 2 + 2 = s
    row 1: x1 = −4s + s + 2 = −3s + 2.

The solution is

    [x1]   [−3s + 2]      [−3]   [2]
    [x2] = [   s   ]  = s [ 1] + [0]
    [x3]   [   s   ]      [ 1]   [0]
    [x4]   [   1   ]      [ 0]   [1]

This is called the general solution of the system.

Example 4.3.27. Find the general solution of the pair of systems whose augmented matrix in echelon form is

    [ 2 1 1  0 | 1  3 ]
    [ 0 0 1 −1 | 0 −2 ]
    [ 0 0 0 −2 | 6  2 ]
    [ 0 0 0  0 | 0  0 ]

In Matlab, you can provide names for the bound variables and get a general solution. The command rref returns the pivot columns (corresponding to bound variables). The free variables are the variables which are not bound.

%Matlab-session
% Solve a linear system with infinitely many solutions
% 1) using row-reduction
A = [1 1 1 1; 0 0 1 -1; 0 0 0 -2; 0 0 0 0];
b=[3 0 6 0]'
[I,pivotcols]=rref([A,b])
pause
% 2) using solve
[x1,x3,x4]=solve('x1+x2+x3+x4-3, x3-x4, -2*x4-6','x1,x3,x4')
pause
% 3) using solve, with user-input of free variable value:
x_2=input('enter value of free variable x_2: ');
[x1,x2,x3,x4]=solve(...
   'x1+x2+x3+x4=3, x3-x4, -2*x4-6, x2-x_2');
v=subs([x1,x2,x3,x4]); disp(v);
disp('now see how A\b works when there is not a unique solution')
disp('press enter to continue'); pause
A\b

If the system has at most three variables, Matlab can graph the individual equations. Their common intersection is the solution. Rotate the graph to see the solution.

%Matlab-session
% Solve a linear system with infinitely many solutions
% (a line) - we plot the solution too
view(37,65);
A = [1 1 1; 1 1 -1]; b=[3 1]'
[I,pivotcols]=rref([A,b])
disp('infinitely many solutions from rref.')
disp('now display symbolic solution'); pause
syms x1 x2 x3;             %specify bound variables as symbolic
[x1,x2,x3]=solve('x1+x2+x3-3, x1+x2-x3-1',x1,x2,x3)
ezplot3(x1,x2,x3,[-3 3]);  %plot solution
hold on;                   %superimpose graphs
u=-3:0.1:3; [x,y]=meshgrid(u,u);
mesh(x,y,3-x-y)            %plot first equation
mesh(x,y,-1+x+y); hold off; %plot second equation

[Figure: the solution line x = −x2 + 2, y = x2, z = 1, where the two planes intersect.]

4.3.2 Matrix notation and concepts

Introduction

• A matrix is a rectangular array of numbers ordered in rows and columns.
• A matrix with m rows and n columns has size m × n. For example, the matrix

    A = [a11 a12 a13 a14]   [ 2 1 1  0]
        [a21 a22 a23 a24] = [ 4 1 0 −3]
        [a31 a32 a33 a34]   [−2 2 1 −1]

has size 3 × 4.
• The number in the ith row and jth column of a matrix A is denoted aij, and called the (i, j) entry (or (i, j) component, or term) of A.
• An m × n matrix A is generally written A = (aij)m×n.

Note: All matrices we study will be real matrices: their entries are numbers from the real number-line R.

• Special sizes:
  - An m × n matrix with m = n is called square.
  - A 1 × 1 matrix [a11] is written a11 and called a scalar. It is an element of R, a real number.
  - A matrix with one column is called a column vector. An m × 1 column vector is an element of R^m.
  - A matrix with one row is called a row vector. A 1 × n row vector is an element of R^n.
• Two matrices are equal if they are the same size and all corresponding entries are equal, i.e.

    A = B  ⇔  aij = bij for all relevant i and j.

Special Matrix Structure

Diagonal and Triangular Matrices

Definition 4.3.28. The main diagonal of an m × n matrix A consists of the terms a11, a22, ..., akk, where k = min(m, n).
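A quick Matlab check of size and main diagonal, using the 3 × 4 matrix displayed above (the commands are our own sketch):

%Matlab sketch (illustrative)
A = [2 1 1 0; 4 1 0 -3; -2 2 1 -1];
size(A)   % returns 3 4: m rows, n columns
diag(A)   % main diagonal: 2, 1, 1 (k = min(3,4) = 3 terms)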
Example 4.3.29. Circle the entries on the main diagonal of the matrix

    A = [1  2  3  4]
        [5  6  7  8]
        [9 10 11 12]

Definition 4.3.30. A square matrix is
• diagonal if all non-zero entries are on the main diagonal, e.g. [∗ 0 0; 0 ∗ 0; 0 0 ∗];
• upper-triangular if all non-zero entries are on or above the main diagonal, e.g. [∗ ∗ ∗; 0 ∗ ∗; 0 0 ∗];
• lower-triangular if all non-zero entries are on or below the main diagonal, e.g. [∗ 0 0; ∗ ∗ 0; ∗ ∗ ∗];
• triangular if it is either upper-triangular or lower-triangular.

Note: A square matrix in echelon form is upper-triangular.

Example 4.3.31. Classify the following matrices as upper-triangular, lower-triangular or diagonal:

(a) [1 2]
    [0 3]

(b) [3 0 0]
    [4 5 0]
    [1 0 6]

(c) [1 2 3]
    [0 4 5]
    [0 0 0]

(d) [0 0 0]
    [1 0 0]
    [0 0 1]

(e) [1 0]
    [6 1]

(f) [1 0 0]
    [0 2 0]
    [0 0 3]

Matrix Operations

Matrix Addition/Subtraction

We define addition and subtraction on matrices of the same size, component-wise. If A = (aij)m×n and B = (bij)m×n,

    A + B = (aij + bij)m×n,    A − B = (aij − bij)m×n.

The order in which matrices are added is irrelevant: A + B = B + A, and A + (B + C) = (A + B) + C.

Scalar Multiplication

The scalar multiple of matrix A = (aij)m×n and real scalar r has
• the same size as A,
• rA = (r aij)m×n, i.e. each term of rA is just r multiplied by the corresponding term of A.

Scalar multiplication satisfies the distribution property r(A + B) = rA + rB.

Matrix Multiplication

If matrix A has as many columns as matrix B has rows, the matrix product AB is defined:

    (i,j)-entry:  (ab)ij = (ith row of A) · (jth column of B)   (4.93)

(a dot product), with compatibility/size (m × n) × (n × p) = (m × p).

Example 4.3.32. If

    A = [1 0 −1 1]   and   B = [−1  0  2]
        [0 1 −1 0]             [−1  1  1]
                               [ 0  1 −1]
                               [ 1 −1  0]

then

    (ab)13 = (1st row of A) · (3rd column of B)
           = (1, 0, −1, 1) · (2, 1, −1, 0)
           = (1)(2) + (0)(1) + (−1)(−1) + (1)(0) = 3.

In full,

    AB = [ 0 −2 3]
         [−1  0 2]

Identity Matrix

In real arithmetic, the number 1 is the multiplicative identity: multiply any number n by 1 and you get that number back, n · 1 = 1 · n = n for n ∈ R. The identity matrices In perform the same role in matrix multiplication: for an m × n matrix A,

    A In = Im A = A.                                           (4.94)

Here, the identity In is the square n × n matrix with 1s on the main diagonal and 0s elsewhere:

    In = (iij)n×n,  where iij = 1 if i = j and iij = 0 if i ≠ j.   (4.95)

Example 4.3.33. Show (4.94) for A = [1 2 3; −1 0 1].

Note: The order in which matrices are multiplied is important. It may not be that AB is the same as BA. In general, for matrices A (m × n) and B (p × q),
• multiplication may not be defined in both orders,
• multiplication may be compatible in both orders, but AB and BA may not be the same size,
• multiplication may be defined in both orders, but AB ≠ BA.

Example 4.3.34. For the following pairs of matrices A and B, are the matrix products AB and BA both defined? If so, is AB = BA?

(a) A = [1 2]    B = [1 1]
        [1 1]        [1 1]

(b) A = [1 2]    B = [1  0 1]
        [2 1]        [0 −1 2]
        [1 1]

(c) A = [1 2 3]  B = [1 1]
                     [1 1]
                     [1 1]

Matrix multiplication satisfies the properties

    A(BC) = (AB)C,   A(B + C) = AB + AC,   (A + B)C = AC + BC.

Example 4.3.35. For A = [1 0 0; −1 −1 0; 0 −1 1] and B = [−1 0 0; −1 1 0; 1 0 1], find AB and BA.

The product of two lower-triangular matrices is again lower-triangular. The product of two upper-triangular matrices is again upper-triangular.
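A quick Matlab check of these two facts, using the lower-triangular matrices of Example 4.3.35 (our own sketch; istril is available in recent Matlab releases):

%Matlab sketch (illustrative)
A = [1 0 0; -1 -1 0; 0 -1 1];
B = [-1 0 0; -1 1 0; 1 0 1];
A*B, B*A      % both products are lower-triangular, but they differ
istril(A*B)   % returns logical 1 (true): A*B is lower-triangular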
Matrix-Vector Multiplication

The multiplication of an n × m matrix A with an m × 1 vector x can be expressed in two ways (shown here for a 3 × 3 matrix):

(1) matrix form:

    Ax = [a11 a12 a13][x1]   [a11 x1 + a12 x2 + a13 x3]
         [a21 a22 a23][x2] = [a21 x1 + a22 x2 + a23 x3]        (4.96)
         [a31 a32 a33][x3]   [a31 x1 + a32 x2 + a33 x3]

(2) vector form (a linear combination of the columns of A):

    Ax = [a11]      [a12]      [a13]
         [a21] x1 + [a22] x2 + [a23] x3.                       (4.97)
         [a31]      [a32]      [a33]

Matrix Transpose

Definition 4.3.36. By "flipping" an m × n matrix A along its main diagonal, we obtain its transpose, the n × m matrix Aᵀ. Notationally, Aᵀ = (aᵀij) = (aji).

Example 4.3.37.

    A = [ 2 1 1]        Aᵀ = [2  4 −2]
        [ 4 1 0]   ⇒         [1  1  2]
        [−2 2 1]             [1  0  1]

Definition 4.3.38. A matrix A equal to its own transpose, A = Aᵀ, is called symmetric.

Example 4.3.39. Which of the following matrices are symmetric?

(a) A = [1 0]
        [0 2]

(b) B = [1  0 0]
        [1  2 0]
        [3 −2 1]

(c) C = [−2  1 0]
        [−1  2 1]
        [ 0 −1 2]

(d) D = [1 2  3]
        [2 5  7]
        [3 7 −1]

Determinant of a matrix

Associated with a square n × n matrix A is a scalar called the determinant of A, written either det(A) or |A|. In the scalar case, det(a) = a. In the 2 × 2 case,

    det [a b]  =  | a b |  =  ad − bc.
        [c d]     | c d |

Cofactors, covered in 108, give this formula directly, and a formula for the determinant of a 3 × 3 matrix that you may have seen before:

    | a11 a12 a13 |
    | a21 a22 a23 | = a11 | a22 a23 | − a12 | a21 a23 | + a13 | a21 a22 |    (4.98)
    | a31 a32 a33 |       | a32 a33 |       | a31 a33 |       | a31 a32 |

Of course, using cofactors, you can evaluate the determinant down any column or across any row to make for less work. And with Matlab,

%Matlab-session
echo on
a=[1 2 3; 2 3 4; 1 1 -1]
det(a)
syms x
b=[exp(x) exp(-x); exp(x) -exp(-x)]
det(b)
simplify(ans)
echo off;

Example 4.3.40.
(a) det(In) = 1 for any n > 0.
(b) Find det [a 0 0; 0 b 0; 0 0 c].
(c) Find det [a b c; 0 d e; 0 0 f].

FACT: The determinant of a triangular matrix is the product of its diagonal terms.
Note: This is not the case for matrices in general.

Inverse of a matrix

Definition 4.3.41.
(i) If an n × n matrix A has an inverse, we say it is invertible, or non-singular.
(ii) Its inverse is another n × n matrix A⁻¹ (read "A inverse"), with the property that A A⁻¹ = A⁻¹ A = In.
(iii) A square matrix without an inverse is called singular.

When a square matrix has an inverse, its reduced echelon form is the identity. Row-reduction performed simultaneously on the identity derives A⁻¹. To invert a matrix using Gaussian elimination:
• form the augmented matrix [A : In],
• use Gaussian elimination simultaneously on both parts of the augmented matrix to reduce A (on the left) to In,
• the resulting expression on the right is A⁻¹; the augmented matrix is now [In : A⁻¹].

FACT: An n × n matrix A has an inverse exactly when its reduced echelon form is In, the identity matrix, i.e. A ∼ In. So a square matrix A is singular when its reduced echelon form is not the identity matrix.

Notes:
1. The inverse of a square lower-triangular matrix is again lower-triangular.
2. Similarly, the inverse of a square upper-triangular matrix is again upper-triangular.

Example 4.3.42. Find the inverse of each triangular matrix:

(a) [ 1 0 0]
    [−2 1 0]
    [ 0 0 1]

(b) [1  0 0]
    [0  1 0]
    [0 −3 1]

(c) [1 2  1]
    [0 2  1]
    [0 0 −1]

Matlab uses two equivalent notations for finding the inverse of a matrix directly. We can row-reduce the augmented matrix in Matlab too.

%Matlab-session
% Invert a matrix
echo on; format rat
A = [2 1 -1; 1 -1 1; 1 -2 1]
inv(A)        % directly
A^(-1)        %equivalent expression
pause
AI=[A,eye(3)] % using Gaussian elimination
rref(AI)
echo off; format
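A small worked instance of the [A : In] procedure, with a 2 × 2 matrix of our own choosing: for A = [1 2; 1 3],

    [1 2 | 1 0]   ∼    [1 2 |  1 0]    ∼     [1 0 |  3 −2]
    [1 3 | 0 1] r2→r2−r1 [0 1 | −1 1] r1→r1−2r2 [0 1 | −1  1]

so A⁻¹ = [3 −2; −1 1]; multiplying out confirms A A⁻¹ = I2.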
Example 4.3.43. When matrices A and B have inverses, their product AB has inverse B⁻¹A⁻¹. Using matrix multiplication, show this.

Matrix inverses and Ax = b

When a square matrix A is invertible (non-singular), the solution of the linear system Ax = b is given by pre-multiplying both sides of this equation by A⁻¹:

    A⁻¹ A x = A⁻¹ b,  ∴  x = A⁻¹ b.                            (4.99)

In other words: if A is a non-singular matrix, every system of linear equations Ax = b has a unique solution.

Note: This yields the important result:

    If A⁻¹ exists, then Ax = 0 ⇒ x = 0,                        (4.100)

i.e. there is no other solution.

Example 4.3.44. Use your answer to Example 4.3.42 to solve the linear system

    [1 2  1]       [2]
    [0 2  1] x  =  [3]
    [0 0 −1]       [1]

In practice, unless we need to find the inverse of a matrix for some other reason, the procedure (4.99) is seldom used to solve larger systems of equations. It is much less work (about one-third) to use row-reduction. And with less computation involved, fewer computational errors are incurred.

Summary 4.3.45. The following are equivalent conditions:
• A⁻¹ exists, i.e. A is non-singular, or invertible
• det(A) ≠ 0
• A ∼ In
• every row and column of echelon form has a pivot
• rank(A) = n
• the system of linear equations Ax = b has a unique solution for every b
• the homogeneous system of linear equations Ax = 0 has only the trivial solution x = 0

Consequently, when A does not have an inverse:

Summary 4.3.46. The following are equivalent conditions:
• A⁻¹ does not exist; A is a singular matrix
• det(A) = 0
• A ≁ In
• not every row and column of echelon form has a pivot
• rank(A) < n
• the homogeneous system of linear equations Ax = 0 has a non-trivial solution x ≠ 0
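These equivalent conditions can be explored numerically; a minimal Matlab sketch (both matrices are our own illustrative choices):

%Matlab sketch (illustrative): testing the equivalent conditions
A = [2 1 1; 4 1 0; -2 2 1];
det(A), rank(A)   % det(A) = 8, non-zero; rank 3
rref(A)           % the 3x3 identity, so A is non-singular
S = [1 2; 2 4];
det(S), rank(S)   % det(S) = 0; rank 1
rref(S)           % not the identity, so S is singular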