Worldwide Differential Equations
with Linear Algebra
Robert McOwen
© 2012, Worldwide Center of Mathematics, LLC
ISBN 978-0-9842071-2-1
v.1217161311
Contents

0.1 Preface

1 First-Order Differential Equations
1.1 Differential Equations and Mathematical Models
1.2 Geometric Analysis and Existence/Uniqueness
1.3 Separable Equations & Applications
1.4 Linear Equations & Applications
1.5 Other Methods
1.6 Additional Exercises

2 Second-Order Differential Equations
2.1 Introduction to Higher-Order Equations
2.2 General Solutions for Second-Order Equations
2.3 Homogeneous Equations with Constant Coefficients
2.4 Free Mechanical Vibrations
2.5 Nonhomogeneous Equations with Constant Coefficients
2.6 Forced Mechanical Vibrations
2.7 Electrical Circuits
2.8 Additional Exercises

3 Laplace Transform
3.1 Laplace Transform and Its Inverse
3.2 Transforms of Derivatives, Initial-Value Problems
3.3 Shifting Theorems
3.4 Discontinuous Inputs
3.5 Convolutions
3.6 Additional Exercises

4 Systems of Linear Equations and Matrices
4.1 Introduction to Systems and Matrices
4.2 Gaussian Elimination
4.3 Reduced Row-Echelon Form and Rank
4.4 Inverse of a Square Matrix
4.5 The Determinant of a Square Matrix
4.6 Cofactor Expansions
4.7 Additional Exercises

5 Vector Spaces
5.1 Vectors in R^n
5.2 General Vector Spaces
5.3 Subspaces and Spanning Sets
5.4 Linear Independence
5.5 Bases and Dimension
5.6 Row and Column Spaces
5.7 Inner Products and Orthogonality
5.8 Additional Exercises

6 Linear Transformations and Eigenvalues
6.1 Introduction to Transformations and Eigenvalues
6.2 Diagonalization and Similarity
6.3 Symmetric and Orthogonal Matrices
6.4 Additional Exercises

7 Systems of First-Order Equations
7.1 Introduction to First-Order Systems
7.2 Theory of First-Order Linear Systems
7.3 Eigenvalue Method for Homogeneous Systems
7.4 Applications to Multiple Tank Mixing
7.5 Applications to Mechanical Vibrations
7.6 Additional Exercises

Appendix A Complex Numbers
Appendix B Review of Partial Fractions
Appendix C Table of Integrals
Appendix D Table of Laplace Transforms
Appendix E Answers to Some Exercises

Index
0.1 Preface
This textbook is designed for a one-semester undergraduate course in ordinary differential equations and linear algebra. We have had such a course at Northeastern University
since our conversion from the quarter to semester system required us to offer one course
instead of two. Many other institutions have a similarly combined course, perhaps for a
similar reason; consequently, there are many other textbooks available that cover both
differential equations and linear algebra. Let me describe some of the features of my
book and draw some contrasts with the other texts on this subject.
Because many students taking the course at Northeastern are electrical engineering
majors who concurrently take a course in circuits, we always include the Laplace transform in the first half of the course. For this reason, in my textbook I cover first and
second-order differential equations as well as the Laplace transform in the first three
chapters, then I turn to linear algebra in Chapters 4-6, and finally draw on both in the
analysis of systems of differential equations in Chapter 7. This ordering of material is
unusual (perhaps unique) amongst other textbooks for this course, which generally alternate more between differential equations and linear algebra, and put Laplace transform
near the end of the book.
Another feature of my textbook is a fairly concise writing style and selection of
topics. I find that many textbooks on this subject are excessively long: they use a
verbose writing style, include too many sections, and many of the sections contain too
much material. As an instructor using such a book for a one-semester course, I am
constantly deciding what to not cover: not only what sections to skip, but what topics
in each section to leave out. I think that students using such a textbook also find it
difficult to know what has been covered and what has not. On the other hand, I think it
is good to have some additional or optional material, to provide some flexibility for the
instructor, and to make the book more appropriate for advanced or honors students.
Consequently, in my book I have tried to make judicious choices about what material to
include, and to arrange it in such a way as to conveniently allow the instructor to omit
certain topics. For example, an instructor can cover separable first-order differential
equations with applications to unlimited population growth and Newton’s law of cooling,
and then decide whether or not to include the subsections on resistive force models and
on the logistic model for population growth.
The careful selection and arrangement of material is also reflected in the exercises
for the student. At the end of each section I have provided exercises that are designed
to develop fairly basic skills, and I grouped problems together according to the skill
that they are intended to develop. For example, Exercise #1 may address a certain
skill, and there are six specific problems (a-f) to do this. Moreover, the answers to all
of these exercises (not just the odd-numbered ones) are provided in the Appendix. In
fact, some exercises have solution videos on YouTube, which is indicated by Solution ;
a full list of the solution videos can be found at
http://www.centerofmath.org/textbooks/diff eq/supplements.html
In addition to the exercises at the end of each section, I have provided at the end of
each chapter a list of Additional Exercises. These include exercises involving additional
applications and some more challenging problems. Only the odd-numbered problems
from the Additional Exercises sections are given answers in the Appendix.
Let me add that, in order to keep this book at a length that is convenient for a single
semester course, I have had to leave out some important topics. For example, I have not
tried to cover numerical methods in this book. While I believe that numerical methods
(including the use of computational software) should be taught along with theoretical
techniques, there are so many of the latter in a course that covers both differential
equations and linear algebra, that it seemed inadvisable to try to also include numerical
methods. Consequently, I made the difficult decision to leave numerical methods out of
this textbook.
On the other hand, I have taken some advantage of the fact that this book is being
primarily distributed in electronic form to expand the coverage. For example, I have
included links to online resources (especially Wikipedia articles) that provide more information about topics that are only briefly mentioned in this book. Again, I have tried
to make judicious choices about this: if a Wikipedia article on a certain topic exists but
does not provide significantly more information than is given in the text, then I chose
not to include it.
I hope that the choices that I have made in writing this book make it a valuable
learning tool for the students and instructors alike.
Robert McOwen
June 2012
Chapter 1
First-Order Differential Equations

1.1 Differential Equations and Mathematical Models
A differential equation is an equation that involves an unknown function and its
derivatives. These arise naturally in the physical sciences. For example, Newton’s
second law of motion F = ma concerns the acceleration a of an object of mass m under
a force F . But if we denote the object’s velocity by v and assume that F could depend
on both v and t, then this can be written as a first-order differential equation for v
m dv/dt = F(t, v).    (1.1)
The simplest example of (1.1) is when F is a constant, such as the gravitational force Fg
near the surface of the earth. In this case, Fg = mg where g is the constant acceleration
due to gravity, which is given approximately by g ≈ 9.8 m/sec2 ≈ 32 ft/sec2 . If we use
this in (1.1), we can easily integrate to find v(t):
m dv/dt = mg  ⇒  dv/dt = g  ⇒  v(t) − v0 = ∫_0^t g dt  ⇒  v(t) = gt + v0,
where v0 is the initial velocity. Notice that we need to know the initial velocity in order
to determine the velocity at time t.
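Readers who like to verify such calculations by computer algebra can reproduce this with a minimal sympy sketch; the symbol names below are our own choices, not the text's, and the only assumption is that the sympy library is installed.

import sympy as sp

t, g, v0, m = sp.symbols('t g v0 m', positive=True)
v = sp.Function('v')

# integrate m dv/dt = m g subject to v(0) = v0
sol = sp.dsolve(sp.Eq(m * v(t).diff(t), m * g), v(t), ics={v(0): v0})
print(sol)   # Eq(v(t), g*t + v0), matching v(t) = g t + v0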
While equations in which time is the independent variable occur frequently in applications, it is often more convenient to consider x as the independent variable. Let us
use this notation and consider a first-order differential equation in which we can
solve for dy/dx in terms of x and y:
dy/dx = f(x, y).    (1.2)
We are frequently interested in finding a solution of (1.2) that also satisfies an initial
condition:
y(x0) = y0.    (1.3)
Fig.1. Gravitational force
The combination of (1.2) and (1.3) is called an initial-value problem. The class
of all solutions of (1.2) is called the general solution and it usually depends upon a
constant that can be evaluated to find the particular solution satisfying a given initial
condition. Most of this chapter is devoted to finding general solutions for (1.2), as well
as particular solutions of initial-value problems, when f (x, y) takes various forms.
A very easy case of (1.2) occurs when f is independent of y, i.e.
dy/dx = f(x),
since we can simply integrate to obtain the general solution as
y(x) = ∫ f(x) dx + C,   where C is an arbitrary constant.
On the other hand, if we also require y to satisfy the initial condition (1.3), then we
can evaluate C to find the particular solution. This technique was used to solve the
gravitational force problem in the first paragraph and should be familiar from calculus,
but further examples are given in the Exercises.
In (1.1), if we replace v by dx/dt where x denotes the position of the object and we
assume that F could also depend on x, then we obtain an example of a second-order
differential equation for x:
m d^2x/dt^2 = F(t, x, dx/dt).    (1.4)
An instance of (1.4) is the damped, forced spring-mass system considered in Chapter 2:
m d^2x/dt^2 + c dx/dt + kx = F(t).    (1.5)
But now initial conditions at t0 must specify the values of both x and dx/dt:
x(t0) = x0,   dx/dt(t0) = v0.    (1.6)

Fig.2. Spring-mass system
In general, the order of the differential equation is determined by the highest-order
derivative of the unknown function appearing in the equation. Moreover, an initial-value
problem for an n-th-order differential equation should specify the values of the unknown function and its first n − 1 derivatives at some initial point.
An important concept for differential equations is linearity: a differential equation
is linear if the unknown function and all of its derivatives occur linearly. For example,
(1.1) is linear if F(t, v) = f(t) + g(t)v, but not if F(t, v) = v^2. Similarly, (1.4) is linear if F(t, x, v) = f(t) + g(t)x + h(t)v, but not if F(t, x, v) = sin x or F(t, x, v) = e^v. For example, (1.5) is linear. (Note that the coefficient functions f(t), etc., do not have to be linear in t.)
In the above examples, the independent variable is sometimes x and sometimes t.
However, when it is clear what the independent variable is, we may use a prime (') to denote derivatives; for example, we can write (1.5) as

m x'' + c x' + k x = F(t).
Notation: for simplicity, we generally do not show the dependence of the unknown function on the independent variable, so we will not write m x''(t) + c x'(t) + k x(t) = F(t).
Mathematical Models
Before we begin the analysis of differential equations, let us consider a little more carefully how they arise in mathematical models. Mathematical models are used to reach
conclusions and make predictions about the physical world. Suppose there is a particular physical system that we want to study. Let us describe the modeling process for
the system in several steps:
1. Abstraction: Describe the physical system using mathematical terms and relationships; this provides the model itself.
2. Analysis: Apply mathematical analysis of the model to obtain mathematical
conclusions.
3. Interpretation: Use the mathematical conclusions to obtain conclusions about
the physical system.
4. Refinement (if necessary): If the conclusions of the model do not agree with
experiments, it may be necessary to replace or at least refine the model to make
it more accurate.
In this textbook, of course, we consider models involving differential equations. Perhaps the simplest case is population growth. It is based upon the observation that
populations of all kinds grow (or shrink) at a rate proportional to the size of the population. If we let P (t) denote the size of the population at time t, then this translates
to the mathematical statement dP/dt = kP, where k is the proportionality constant: if
k > 0 then the population is growing, and if k < 0 then it is shrinking. If we know the
population is P0 at time t = 0, then we have an initial-value problem for a first-order
linear differential equation:
dP/dt = kP,   P(0) = P0.    (1.7)
This is our mathematical model for population growth. It can easily be analyzed by
separation of variables (see Section 1.3), and the solution is found to be P(t) = P0 e^{kt}.
When k > 0, the interpretation of this analysis is that the population grows exponentially, and without any upper bound. While this may be true for a while, growing
populations eventually begin to slow down due to additional factors like overcrowding
or limited food supply. This means that the model must be refined to account for these
additional factors; we will discuss one such refinement of (1.7) in Section 1.3. We should
also mention that the case k < 0 in (1.7) provides a model for radioactive decay.
Similar reasoning lies behind Newton’s law of cooling in heat transfer: it is
observed that a body with temperature T that is higher than the ambient temperature
A will cool at a rate proportional to the temperature differential T − A. Consequently,
if the initial temperature T0 is greater than A, then the body will cool: rapidly at first,
Fig.3. Population growth
but then gradually as it decreases to A. Our mathematical model for cooling is the
initial-value problem for a first-order linear differential equation:
dT/dt = −k(T − A),   T(0) = T0,    (1.8)

Fig.4. Newton's law of cooling
where k > 0. In fact, if T0 is less than A, then (1.8) also governs the warming of
the body; cf. Example 4 in Section 1.3 where we shall solve (1.8) by separation of
variables. But for now, observe that (1.8) implies that T = A is an equilibrium, i.e.
dT /dt = 0 and the object remains at the ambient temperature. We shall have more
to say about equilibria in the next section. Also, note that we have assumed that the
proportionality constant k is independent of T ; of course, this may not be strictly true
for some materials, which means that a refinement of the model may be necessary.
We can also use mathematical models to study resistive forces that depend on the
velocity of a moving body; these are of the form (1.1) and will be discussed in Section 1.3. Other mathematical models discussed in this textbook include the damped,
forced spring-mass system (1.5) and other mechanical vibrations, electrical circuits, and
mixture problems.
Remark. Differential equations involving unknown functions of a single variable, e.g.
x(t) or y(x), are often called ordinary differential equations. On the other hand,
differential equations involving unknown functions of several variables, such as u(x, y),
are called partial differential equations since the derivatives are partial derivatives,
u_x and u_y. We shall not consider partial differential equations in this textbook.
Exercises
1. For each differential equation, determine (i) the order, and (ii) whether it is linear:
(a) y' + x y^2 = cos x
(b) x'' + 2x' + x = sin t
(c) y''' + y = x^2
(d) x''' + t x = x^2
2. For the given differential equation, use integration to (i) find the general solution,
and (ii) find the particular solution satisfying the initial condition y(0) = 1.
(a) dy/dt = sin t
(b) dy/dx = x e^{x^2}
(c) dy/dx = x cos x
(d) dy/dx = 1/√(1 − x)
3. A rock is hurled into the air with an initial velocity of 64 ft/sec. Assuming only
gravitational force with g = 32 ft/sec2 applies, when does the rock reach its
maximum height? What is the maximum height that the rock achieves?
4. A ball is dropped from a tower that is 100 m tall. Assuming only gravitational
force g = 9.8 m/sec2 , how long does it take to reach the ground? What is the
speed upon impact?
1.2 Geometric Analysis and Existence/Uniqueness
A first-order differential equation of the form
dy/dx = f(x, y)    (1.9)
defines a slope field (or direction field) in the xy-plane: the value f (x, y) is the slope
of a tiny line segment at the point (x, y). It has geometrical significance for solution
curves (also called integral curves) which are simply graphs of solutions y(x) of the
equation: at each point (x0 , y0 ) on a solution curve, f (x0 , y0 ) is the slope of the tangent
line to the curve. If we sketch the slope field, and then try to draw curves which are
tangent to this field at each point, we get a good idea how the various solutions behave.
In particular, if we pick (x0 , y0 ) and try to draw a curve passing through (x0 , y0 ) which
is everywhere tangent to the slope field, we should get the graph of the solution for the
initial-value problem dy/dx = f (x, y), y(x0 ) = y0 .
Let us illustrate this with a simple example:
dy/dx = 2x + y.    (1.10)
We first take a very low-tech approach and simply calculate the slope at various values
of x and y:
          y = −2   −1.5    −1   −0.5     0    0.5     1    1.5     2
x = −2        −6   −5.5    −5   −4.5    −4   −3.5    −3   −2.5    −2
x = −1        −4   −3.5    −3   −2.5    −2   −1.5    −1   −0.5     0
x =  0        −2   −1.5    −1   −0.5     0    0.5     1    1.5     2
x =  1         0    0.5     1    1.5     2    2.5     3    3.5     4
x =  2         2    2.5     3    3.5     4    4.5     5    5.5     6

Figure 1. Values of f(x, y) = 2x + y at various values of x and y
Then we plot these values in the xy plane: see Figure 2 which includes the slope field
at even more points than those computed above. (The slopes are color-coded according
to how “steep” they are.) Using the slope field, we can then sketch solution curves: in
Figure 2 we see the solution curve starting at (−1, 0); note that it is everywhere tangent
to the slope field.
Obviously, this is a very crude and labor-intensive analysis. Happily, technology can
come to our assistance: there are many excellent computational and graphing programs
such as MATLAB, Mathematica, Maple, etc that can plot slope fields and compute
solution curves (not just sketch them). These programs utilize numerical methods
to “solve” differential equations; however, we shall not discuss them here.
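As an illustration only (not the book's own software), here is a minimal Python sketch, assuming numpy and matplotlib are installed, that draws the slope field of (1.10) and traces the solution curve through (−1, 0) with small Euler steps; the grid spacing and step size are our own arbitrary choices.

import numpy as np
import matplotlib.pyplot as plt

f = lambda x, y: 2 * x + y

# slope field: a short segment of slope f(x, y) centered at each grid point
for xi in np.linspace(-2, 2, 17):
    for yi in np.linspace(-2, 2, 17):
        s = f(xi, yi)
        dx = 0.08 / np.hypot(1.0, s)   # keep every segment the same length
        plt.plot([xi - dx, xi + dx], [yi - s * dx, yi + s * dx], color='gray', lw=0.8)

# crude Euler trace of the solution curve starting at (-1, 0)
x, y, h = -1.0, 0.0, 0.01
pts = [(x, y)]
while x < 2:
    y += h * f(x, y)
    x += h
    pts.append((x, y))
plt.plot(*zip(*pts), color='red')
plt.xlim(-2, 2); plt.ylim(-2, 2)
plt.show()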
Existence and Uniqueness of Solutions
This graphical analysis of (1.9) suggests that we can always find a solution curve passing
through any point (x0 , y0 ), i.e. there exists a solution of (1.9) satisfying the initial
condition y(x0 ) = y0 . It also seems that there is only one such solution curve, i.e.
the solution of the initial-value problem is unique. This question of the existence
and uniqueness of solutions to an initial-value problem is so fundamental that we now
carefully formulate the conditions under which it is true.
Fig.2. Slope field and the solution curve from (−1, 0).
Theorem 1. If f(x, y) is continuous in an open rectangle R = (a, b) × (c, d) in the xy-plane that contains the point (x0, y0), then there exists a solution y(x) to the initial-value problem

dy/dx = f(x, y),   y(x0) = y0,    (1.11)
that is defined in an open interval I = (α, β) containing x0 . In addition, if the partial
derivative ∂f /∂y is continuous in R, then the solution y(x) of (1.11) is unique.
Figure 3. Existence and Uniqueness of a Solution.

This theorem is proved using successive approximations, but we shall not give the details; see, for example, [1]. However, let us make a couple of remarks that will help clarify the need for the various hypotheses.

Remark 1. We might hope that existence holds with α = a and β = b, and indeed this is the case for linear equations f(x, y) = a(x)y + b(x). But for nonlinear equations, this need not be the case. For example, if we plot the slope field and some solution curves for y' = y^2, we see that the solutions seem to approach vertical asymptotes; this suggests that they are not defined on I = (−∞, ∞) even though f = y^2 is continuous on the whole xy-plane. (When we discuss separation of variables in Section 1.3, we will be able to confirm the existence of these vertical asymptotes for many nonlinear equations.)

Fig.4. Slope field and solution curves for y' = y^2.
Remark 2. Regarding the condition for uniqueness, Exercise 3 in Section 1.3 shows that
the initial-value problem y' = y^{2/3}, y(0) = 0 has two different solutions; the uniqueness
claim in the theorem does not apply to this example since ∂f /∂y is not continuous in
any rectangle containing (0, 0).
Qualitative Analysis
When the function f in (1.9) does not depend on x, the equation is called autonomous:
dy/dx = f(y).    (1.12)
For such equations, the slope fields do not depend upon the value of x, so changing the
value of x0 merely translates the solution curve horizontally. But more importantly,
values of y0 for which f(y0) = 0 are called critical points, and provide equilibrium
solutions of (1.12), namely y(x) ≡ y0 is a constant solution. (We mentioned this phenomenon in connection with Newton’s law of cooling in the previous section.)
Given an equilibrium solution y0 for the autonomous equation (1.12), nearby solution
curves often approach y0 as x increases, or move away from y0 as x increases. This
phenomenon is called stability:
1. If all nearby solution curves approach y0 , then the equilibrium is called a sink,
which is stable;
2. If at least some nearby solution curves do not approach y0 , then the equilibrium
is called unstable;
3. If all nearby solution curves move away from y0, then the equilibrium is called a source, which is unstable.

Fig.5. Stability of critical points.
The reason for the terms “sink” and “source” can be seen from the graph. We
illustrate this with an example.
Example 1. Find all equilibrium solutions for y' = y^2 − 4y + 3 and determine the stability of each one.

Solution. We find f(y) = y^2 − 4y + 3 = (y − 1)(y − 3) = 0 has solutions y = 1 and
y = 3, so these are the equilibrium solutions. To determine their stability, we need
to consider the behavior of solution curves in the three intervals (−∞, 1), (1, 3), and
(3, ∞); this is determined by the sign of f (y) in these intervals. But it is clear (for
example, by sampling f at the points y = 0, 2, 4) that f (y) < 0 on the interval (1, 3)
and f (y) > 0 on the intervals (−∞, 1) and (3, ∞). From this we determine that all
solutions near y = 1 tend towards y = 1 (like water flows towards a drain), so this
critical point is asymptotically stable, and it resembles a “sink”. On the other hand,
near y = 3 solutions tend away from y = 3 (like water flowing from a spigot), so this
critical point is unstable, and in fact is a “source”.
□
Note that an equilibrium point y0 may be unstable without being a source: this occurs
if some nearby solution curves approach y0 as x increases while others move away; such
an equilibrium may be called semistable. Let us consider an example of this.
Fig.6. Stability of critical points in Example 1.
Example 2. Find all equilibrium solutions for y' = y^3 − 2y^2 and determine the stability of each one.

Solution. We find f(y) = y^2(y − 2) = 0 has solutions y = 0 and y = 2, so these are the
equilibrium solutions. For y < 0 and 0 < y < 2 we have f (y) < 0, so nearby solutions
starting below y = 0 tend towards −∞ while solutions starting between y = 0 and y = 2
tend towards y = 0. This means that y = 0 is semistable. On the other hand, solutions
starting above y = 2 tend towards +∞, so y = 2 is a source.
□
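The sign check used in Examples 1 and 2 is easy to automate. The following small Python sketch (ours, assuming sympy is available) classifies the equilibria of y' = f(y) by sampling the sign of f just below and just above each root; the threshold eps is an arbitrary choice.

import sympy as sp

y = sp.symbols('y')

def classify(f, eps=1e-6):
    for r in sorted(sp.solve(sp.Eq(f, 0), y)):
        left = f.subs(y, r - eps)    # sign of f just below the equilibrium
        right = f.subs(y, r + eps)   # sign of f just above the equilibrium
        if left > 0 and right < 0:
            kind = 'sink (stable)'
        elif left < 0 and right > 0:
            kind = 'source (unstable)'
        else:
            kind = 'semistable (unstable)'
        print(f'y = {r}: {kind}')

classify(y**2 - 4*y + 3)   # y = 1: sink,        y = 3: source   (Example 1)
classify(y**3 - 2*y**2)    # y = 0: semistable,  y = 2: source   (Example 2)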
Exercises
1. Sketch the slope field and some solution curves for the following equations
(a) dy/dx = x − y
(b) dy/dx = y − sin x
2. For each initial-value problem, determine whether Theorem 1 applies to show the
existence of a unique solution in an open interval I containing x = 0.
(a) dy/dx = x y^{4/3},   y(0) = 0
(b) dy/dx = x y^{1/3},   y(0) = 0
(c) dy/dx = x y^{1/3},   y(0) = 1
3. Find all equilibrium solutions for the following autonomous equations, and determine the stability of each equilibrium.
(a) dy/dt = y − y^2
(b) dy/dt = y^2 − y^3
(c) dy/dt = y − y^3
(d) dy/dt = y sin y

Fig.7. Stability of critical points in Example 2.
4. A cylindrical tank with cross-sectional area A is being filled with water at the rate
of k ft3 /sec, but there is a small hole in the bottom of the tank of cross-sectional
area a. Torricelli's law in hydrodynamics states that the water exits the hole with velocity v = √(2gy), where g = 32 ft/sec2 is acceleration due to gravity and y is the depth of water in the tank.

(a) Show that y(t) satisfies the differential equation

dy/dt = (k − a√(2gy)) / A.
(b) Find the equilibrium depth of water in the tank. Is it a stable equilibrium?
Fig.8. Tank with Hole.
1.3 Separable Equations & Applications
A first-order differential equation is called separable if it can be put into the form
p(y) dy/dx = q(x).    (1.13)
The equation is called “separable” because, if we formally treat dy/dx as a fraction of
differentials, then (1.13) can be put into the form
p(y) dy = q(x) dx,
in which the variables are “separated” by the equality sign. We can now integrate both
sides of this equation to obtain
∫ p(y) dy = ∫ q(x) dx,

which yields

P(y) = Q(x) + C,    (1.14)
where P(y) is an antiderivative of p(y), Q(x) is an antiderivative of q(x), and C is an
arbitrary constant. Since we were a little cavalier in our derivation of (1.14), let us
verify that it solves (1.13): if we differentiate both sides with respect to x and use the
chain rule on the left-hand side to calculate
(d/dx) P(y) = p(y) dy/dx,
we see that (1.14) implies (1.13). This means that we have indeed solved (1.13), although
(1.14) only gives the solution y(x) in implicit form. It may be possible to solve (1.14)
explicitly for y as a function of x; otherwise, it is acceptable to leave the solution in
the implicit form (1.14). If an initial condition is given, the constant C in (1.14) can be
evaluated.
Example 1. Let us solve the initial-value problem
dy/dx = 2xy^2,   y(0) = 1.
Solution. We separate variables and integrate both sides to find

∫ y^{−2} dy = ∫ 2x dx,

and then evaluate both integrals to obtain

−y^{−1} = x^2 + C.

We can solve for y to obtain the general solution of the differential equation:

y(x) = −1/(x^2 + C).

Finally we use the initial condition y(0) = 1 to evaluate C = −1 and obtain the solution of the initial-value problem as

y(x) = 1/(1 − x^2).

Notice that this solution is not defined at x = ±1, so the largest interval containing x = 0 on which the solution is continuous is I = (−1, 1). On the other hand, f(x, y) = 2xy^2 is continuous for all −∞ < x, y < ∞, so this provides another nonlinear example that in Theorem 1 of Section 1.2 we do not always have α = a and β = b. □
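For readers who like a symbolic cross-check, this is a minimal sympy sketch (ours; it assumes only that sympy is installed) that solves the same initial-value problem and recovers 1/(1 − x^2).

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# dy/dx = 2 x y^2 with y(0) = 1
sol = sp.dsolve(sp.Eq(y(x).diff(x), 2*x*y(x)**2), y(x), ics={y(0): 1})
print(sp.simplify(sol.rhs))   # -1/(x**2 - 1), i.e. 1/(1 - x**2)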
Example 2. Let us find the general solution for
dy/dx = (1 + y^2) cos x.
Solution. We separate variables and integrate both sides to obtain
∫ dy/(1 + y^2) = ∫ cos x dx.
Evaluating the integrals yields
tan^{−1} y = sin x + C.
This defines y(x) implicitly, but we can find y(x) explicitly by taking tan of both sides:
y(x) = tan[sin x + C].
□
The following example involves the model for population growth that was mentioned
in Section 1.1.
Example 3. A population of bacteria grows at a rate proportional to its size. In 1
hour it increases from 500 to 1,500. What will the population be after 2 hours?
Solution. Letting P (t) denote the population at time t, we have the initial-value
problem
dP/dt = kP,   P(0) = 500.
Fig.1. y(x) = 1/(1 − x^2).
We do not yet know the value of k, but we have the additional data point P(1) = 1,500. If we separate variables and integrate we obtain

∫ dP/P = ∫ k dt.

This yields ln |P| = kt + c where c is an arbitrary constant. Exponentiating both sides, we find

|P(t)| = e^{kt+c} = C e^{kt},   where C = e^c.

Since P is nonnegative, we may remove the absolute value sign, and using the initial condition P(0) = 500 we obtain

P(t) = 500 e^{kt}.

Now letting t = 1 we find 1,500 = 500 e^k, which we may solve for k to obtain k = ln 3. Hence (see Fig.2)

P(t) = 500 e^{t ln 3}.

We now evaluate at t = 2 to obtain P(2) = 500 e^{2 ln 3} = 500 e^{ln 9} = 500 (9) = 4,500. □

Fig.2. P(t) = 500 e^{t ln 3}.
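As a quick numeric cross-check (ours, using only the Python standard library):

import math

k = math.log(3)                    # from P(1) = 1500 = 500 e^k
print(500 * math.exp(2 * k))       # ≈ 4500, the predicted population after 2 hours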
The next example concerns Newton’s law of cooling/warming, which was also discussed in Section 1.1.
Example 4. A dish of leftovers is removed from a refrigerator at 35◦ F and placed in
an oven at 200◦ F. After 10 minutes, it is 75◦ F. How long will it take until it is 100◦ F?
Solution. Let the temperature of the leftovers at time t be T (t). Since the ambient
temperature is A = 200, our mathematical model (1.8) involves the differential equation
dT/dt = −k(T − 200).
If we separate variables and integrate we find
∫ dT/(T − 200) = −∫ k dt.
Evaluating the integrals yields ln |T − 200| = −kt + c where c is an arbitrary constant. But in this problem we have T < 200, so ln |T − 200| = ln(200 − T) = −kt + c, and exponentiating yields 200 − T = Ce^{−kt} where C = e^c. Let us write this as

T(t) = 200 − Ce^{−kt},
and then use the initial condition T (0) = 35 to evaluate C:
35 = 200 − C e^0  ⇒  C = 165.
In order to determine k we use T (10) = 75:
75 = 200 − 165 e^{−10k}  ⇒  e^{−10k} = 25/33  ⇒  k = (1/10) ln(33/25) = 0.02776.
Therefore, we have obtained

T(t) = 200 − 165 e^{−0.02776 t}.

Finally, to determine when the dish is 100°F, we solve T(t) = 100 for t:

200 − 165 e^{−0.02776 t} = 100  ⇒  e^{−0.02776 t} = 100/165  ⇒  t = 18.04.

So the dish is 100°F after approximately 18 minutes. □
Fig.3. Graph of T(t) = 200 − 165 e^{−0.02776 t}.
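A short numeric check of Example 4 (ours, standard library only) recomputes k from T(10) = 75 and then solves T(t) = 100 for t.

import math

A, T0, Ttarget = 200.0, 35.0, 100.0
k = math.log(33 / 25) / 10                    # from 75 = 200 - 165 e^{-10k}
t = math.log((A - T0) / (A - Ttarget)) / k    # solve A - (A - T0) e^{-k t} = Ttarget
print(round(k, 5), round(t, 2))               # 0.02776  18.04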
Resistive Force Models
Consider an object that is moving with velocity v, but whose motion is subject to a
resistive force Fr (sometimes called drag). For example, such a resistive force is
experienced by an object moving along a surface in the form of friction, or an object
moving through the atmosphere in the form of air resistance. In all cases, the resistive
force is zero when v = 0 and increases as v increases; but the specific relationship
between v and Fr depends on the particular physical system.
Let us consider a relatively simple relationship between v and Fr . If c and α are
positive constants, then
Fr = −c v^α    (1.15)
has the properties we are looking for: Fr = 0 when v = 0 and Fr increases as v increases.
(The minus sign accounts for the fact that the force decelerates the object.) If α = 1,
then Fr depends linearly upon v; this linear relationship is convenient for analysis,
but may not be appropriate in all cases. In fact, for objects moving through a fluid
(including the air), physical principles may be used to predict that α = 1 is reasonable
for relatively low velocities, but α = 2 is more appropriate for high velocities. Let us use
separation of variables to solve a problem involving low velocity.
Example 5. Predict the motion of an object of mass m falling near the earth’s surface
under the force of gravity and air resistance that is proportional to velocity.
Solution. Let y(t) denote the distance travelled at time t, and v = dy/dt. According
to Newton’s second law of motion, ma = Fg + Fr , where Fg = mg is the force due to
gravity (positive since y is measured downwards) and Fr = − c v as in (1.15) with α = 1
is air resistance. Consequently, we have the first-order differential equation
m dv/dt = mg − c v.    (1.16)

Notice that (1.16) is an autonomous equation with equilibrium

v* = mg/c;    (1.17)

at this velocity, the object has zero acceleration. (Using qualitative analysis, we can reach some additional conclusions at this point; see Exercise 12.) But (1.16) is also a separable equation, so we can solve it. If we separate the variables v and t and then integrate, we obtain

∫ m dv/(c v − mg) = −∫ dt.

Fig.4. Forces on a falling object
We can use a substitution u = c v − mg to evaluate the left-hand side:

∫ m dv/(c v − mg) = (m/c) ∫ du/u = (m/c) ln |u| + C = (m/c) ln |c v − mg| + C.

Since we easily integrate −∫ dt = −t + C, we conclude

(m/c) ln |c v − mg| = −t + C1  ⇒  ln |c v − mg| = −ct/m + C2  ⇒  c v − mg = C3 e^{−ct/m}  ⇒  v = mg/c + C4 e^{−ct/m}.

We can use the initial velocity v0 to evaluate the constant C4 and conclude

v(t) = mg/c + (v0 − mg/c) e^{−ct/m}.

In particular, we note that

v(t) → mg/c = v*   as t → ∞,

leading to the interpretation of the equilibrium v* as the terminal velocity. □

Fig.5. v(t) approaching the terminal velocity.
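A minimal sympy sketch (ours; symbol names are assumptions, and sympy must be installed) confirms both the formula for v(t) and the terminal velocity mg/c.

import sympy as sp

t, m, g, c, v0 = sp.symbols('t m g c v0', positive=True)
v = sp.Function('v')

# m dv/dt = m g - c v with v(0) = v0
sol = sp.dsolve(sp.Eq(m * v(t).diff(t), m * g - c * v(t)), v(t), ics={v(0): v0})
print(sp.simplify(sol.rhs))          # equivalent to mg/c + (v0 - mg/c) e^{-ct/m}
print(sp.limit(sol.rhs, t, sp.oo))   # g*m/c, the terminal velocity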
The Logistic Model for Population Growth
In Section 1.1 we described the simple model for population growth
dP/dt = k P,   where k > 0 is a constant,
and analyzed it in Example 3 above to conclude that a population of bacteria will
continue to grow exponentially. The model may be accurate for a while, but eventually
other factors such as overcrowding or limited food supply will tend to slow the population
growth. How can we refine the model to account for this?
Suppose that the population is in an environment that supports a maximum population M : as long as P (t) < M then P (t) will increase, but if P (t) > M then P (t) will
decrease. A simple model for this is a nonlinear equation called the logistic model :
dP/dt = k P (M − P),   where k, M > 0 are constants.    (1.18)
This has the important features that we are looking for:
0 < P < M  ⇒  dP/dt > 0,  i.e. P(t) is increasing;
P > M  ⇒  dP/dt < 0,  i.e. P(t) is decreasing.
Moreover, (1.18) is an autonomous equation with critical points at P = 0 and P = M ;
qualitative analysis shows that P = M is a stable equilibrium, so for any positive initial
population P0 we have P (t) → M as t → ∞. (Of course, the critical point at P = 0
corresponds to the trivial solution of a zero population; qualitative analysis shows it is
unstable, which is what we expect since any positive initial population will grow and
move away from P = 0.)
Beyond qualitative analysis, we notice that (1.18) is separable, and we can solve it
by first writing:
dP/(P(M − P)) = k dt.
To integrate the left-hand side, we use a partial fraction decomposition (cf. Appendix
B)
1/(P(M − P)) = (1/M) [1/P + 1/(M − P)],

so

∫ dP/(P(M − P)) = (1/M) (ln P − ln |M − P|) + c = (1/M) ln(P/|M − P|) + c.
This yields
ln(P/|M − P|) = M kt + c1,   where c1 = −M c,
and we can exponentiate to conclude
P/|M − P| = C e^{Mkt},   where C = e^{c1} > 0.
In this formula, we need C > 0; but if we allow C to be negative then we can remove
the absolute value signs on M − P and simply write
P/(M − P) = C e^{Mkt},   for some constant C.
If we let t = 0, then we can evaluate C and conclude
P/(M − P) = P0 e^{Mkt}/(M − P0).
A little algebra enables us to solve for P and write our solution as
P(t) = M P0 / (P0 + (M − P0) e^{−Mkt}).    (1.19)
Notice that, regardless of the value of P0 , we have P (t) → M as t → ∞, as expected
from our qualitative analysis.
Example 3 (revisited). If, after 2 hours, the population of bacteria in Example 3 has
only increased to 3, 500 instead of 4, 500 as predicted in Example 3, we might suppose
that a logistic model is more accurate. Find the maximum population M .
Solution. In addition to the initial population P0 = 500, we now have two data points:
P (1) = 1, 500 and P (2) = 3, 500. This should be enough to evaluate the constants k
and M in (1.19), although it is only M that we are asked to find. In particular, we have
the following two equations involving k and M :
500 M / (500 + (M − 500) e^{−Mk}) = 1,500   and   500 M / (500 + (M − 500) e^{−2Mk}) = 3,500.
In general, solving such a system of nonlinear equations may require the assistance of a
computer algebra system. However, in this case, we can use simple algebra to find M .
First, in each equation we solve for the exponential involving k:
e^{−Mk} = (M − 1,500) / (3 (M − 500))   and   e^{−2Mk} = (M − 3,500) / (7 (M − 500)).

But e^{−2Mk} = (e^{−Mk})^2, so we obtain

(M − 1,500)^2 / (9 (M − 500)^2) = (M − 3,500) / (7 (M − 500)).

If we cross-multiply and simplify, we obtain

2M^2 − 15,000 M = 0  ⇒  M = 7,500.

Thus the maximum population that this collection of bacteria can achieve is 7,500. Of course, once we have found M, we can easily find k from

k = −(1/M) ln[(M − 1,500) / (3 (M − 500))] ≈ 0.000167,

and graph the solution as in Figure 6. □

Fig.6. Solution of Example 3 (revisited).
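Since the text notes that such nonlinear systems often call for a computer algebra system, here is a minimal sympy sketch (ours; the symbol names and the elimination of k are our own choices) that recovers M and k from the two data points.

import sympy as sp

M = sp.symbols('M', positive=True)
e1 = (M - 1500) / (3 * (M - 500))    # e^{-Mk} from P(1) = 1500
e2 = (M - 3500) / (7 * (M - 500))    # e^{-2Mk} from P(2) = 3500

# (e^{-Mk})^2 = e^{-2Mk}; cross-multiply and cancel the common factor (M - 500)
poly = sp.expand(7 * (M - 1500)**2 - 9 * (M - 500) * (M - 3500))
print(sp.factor(poly), sp.solve(poly, M))   # -2*M*(M - 7500); the meaningful root is M = 7500

k = sp.log(1 / e1.subs(M, 7500)) / 7500
print(k, float(k))                          # log(7/2)/7500 ≈ 0.000167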
Exercises
1. Find the general solution of the following differential equations
(a) dy/dx + 2xy = 0
(b) dy/dx = y^2/(1 + x^2)
(c) dy/dx = (cos x)/y^2
(d) dy/dx = e^x/(1 + y^2)
(e) x dx/dt = x^2 + 1
(f) dx/dt = 3√(xt)  (x, t > 0)
2. Find the solution of the following initial-value problems
(a) dy/dx = y cos x,   y(0) = 1
(b) dy/dx = (x^2 + 1)/y,   y(1) = √3
(c) dy/dt = y (3t^2 − 1),   y(1) = −2
3. Consider the initial-value problem
dy/dx = y^{2/3},   y(0) = 0.

(a) Use separation of variables to obtain the solution y(x) = x^3/27.
(b) Observe that another solution is y(x) ≡ 0.
(c) Since this initial-value problem has two distinct solutions, uniqueness does
not apply. What hypothesis in Theorem 1 in Section 1.2 does not hold?
4. A city has a population of 50,000 in the year 2000 and 63,000 in 2010. Assuming
the population grows at a rate proportional to its size, find the population in the
year 2011.
Solution
5. A population of 1, 000 bacteria grows to 1, 500 in an hour. When will it reach
2, 000?
6. Radioactive substances decay at a rate proportional to the amount, i.e. dN/dt = −kN with k a positive constant. Uranium-238 has a half-life of 4.5 × 10^9 years.
How long will it take 500 grams of Uranium-238 to decay to 400 grams?
7. Radiocarbon dating is based on the decay of the radioactive isotope ^{14}C of carbon, which has a half-life of 5,730 years. While alive, an organism maintains equal amounts of ^{14}C and ordinary carbon ^{12}C, but upon death the ratio of ^{14}C to ^{12}C starts to decrease. If a bone is found to contain 70% as much ^{14}C as ^{12}C,
how old is it?
8. A cake is removed from a 350◦ F oven and placed on a cooling rack in a 70◦ F room.
After 30 minutes the cake is 200◦ F. When will it be 100◦ F?
9. A piece of paper is placed in a 500°F oven, and after 1 minute its temperature rises from 70°F to 250°F. How long until the paper reaches its combustion temperature
of 451◦ F?
10. A dead body is found at Noon in an office that is maintained at 72◦ F. If the body
is 82◦ F when it is found, and has cooled to 80◦ F at 1 pm, estimate the time of
death. (Assume a living body maintains a temperature of 98.6◦ F.)
11. A drug is being administered intravenously to a patient’s body at the rate of 1
mg/min. The body removes 2% of the drug per minute. If there is initially no
drug in the patient’s body, find the amount at time t.
12. Perform a qualitative analysis on (1.16) to determine the stability of the equilibrium (1.17). Use this stability to conclude that (1.17) is the terminal velocity.
13. A rock with mass 1 kg is hurled into the air with an initial velocity of 10 m/sec.
In addition to gravitational force Fg with g = 9.8 m/sec2 , assume the moving rock
experiences air resistance (1.15) with c = .02 and α = 1. When does the rock
achieve its maximum height? What is the maximum height that it achieves?
14. Consider an object falling near the earth’s surface under the force of gravity and
air resistance that is proportional to the square of the velocity, i.e. (1.15) with
α = 2. Use a qualitative analysis to find the terminal velocity v ∗ (in terms of m,
g, and c).
1.4 Linear Equations & Applications
As defined in Section 1.1, a linear first-order differential equation is one in which the
unknown function and its first-order derivative occur linearly. If the unknown function
is y(x), then this means that the equation is of the form a(x)y' + b(x)y = c(x). Provided a(x) is nonzero, we can put the equation into standard form

y' + p(x)y = q(x).    (1.20)
We want to find a procedure for obtaining the general solution of (1.20).
The idea is to multiply the equation by a function I(x) so that the left-hand side
is the derivative of a product: this means that we can simply integrate both sides and
then solve for y(x). Because multiplication by I(x) reduces the problem to integration,
I(x) is called an integrating factor. Let us see how to define I(x). Multiplication of
the left-hand side of (1.20) by I(x) yields (using y' instead of dy/dx)

I(x)y' + I(x) p(x) y.
On the other hand, the product rule implies
[I(x)y]' = I(x)y' + I'(x)y.

Comparing these formulas, we see that we want I(x) to satisfy I'(x) = I(x)p(x). But
this is a separable differential equation for I, which we know how to solve:
dI/dx = I p  ⇒  ∫ dI/I = ∫ p dx  ⇒  ln |I(x)| = ∫ p dx + c,

where c is an arbitrary constant. Exponentiating, we obtain |I(x)| = C exp(∫ p dx).
But we only need one integrating factor, so we can choose C as convenient. In fact, we
can choose C = ±1 so as to obtain the following formula for our integrating factor:
I(x) = e^{∫ p(x) dx}.    (1.21)
At this point, we can use I(x) as described to solve (1.20) and obtain a general solution
formula. However, it is more important to remember the method than to remember the
solution formula, so let us first apply the method to a simple example.
Example 1. Find the general solution of
xy' + 2y = 9x.
Solution. We first need to put the equation in standard form by dividing by x:
y' + (2/x) y = 9.
(Of course, dividing by x makes us worry about the possibility that x = 0, but the
analysis will at least be valid for −∞ < x < 0 and 0 < x < ∞.) Comparing with (1.20) we see that p(x) = 2/x, so ∫ p dx = 2 ln |x|, and according to (1.21) we have

I(x) = e^{2 ln |x|} = x^2.
Multiplying the standard form of our equation by I(x) we obtain
x^2 y' + 2xy = 9x^2.
Recognizing the left-hand side as (x^2 y)', we can integrate:

(x^2 y)' = 9x^2  ⇒  x^2 y = ∫ 9x^2 dx = 3x^3 + C,

where C is an arbitrary constant. Thus our general solution is

y = 3x + C x^{−2}. □
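A quick sympy check (ours, assuming sympy is installed) of Example 1:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(x * y(x).diff(x) + 2 * y(x), 9 * x), y(x))
print(sol)   # Eq(y(x), C1/x**2 + 3*x), matching y = 3x + C x^{-2}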
Of course, we can also solve initial-value problems for linear equations.
Example 2. Let us find the solution of the initial-value problem
y' + 2x y = x,   y(0) = 1.

Solution. We see that p(x) = 2x, so ∫ p dx = x^2 and our integrating factor is I(x) = e^{x^2}. Multiplying by I(x) we have

e^{x^2} y' + 2x e^{x^2} y = x e^{x^2}.
But by the product rule, we see that this can be written

(d/dx)[e^{x^2} y] = x e^{x^2}.
Integrating both sides (using substitution on the right hand side) yields
e^{x^2} y = ∫ x e^{x^2} dx = (1/2) ∫ e^u du = (1/2) e^u + C = (1/2) e^{x^2} + C.

We can divide by our integrating factor to obtain our general solution
y(x) = 1/2 + C e^{−x^2}.

Now we can use our initial condition at x = 0 to conclude

1 = y(0) = 1/2 + C  ⇒  C = 1/2,

and our solution may be written as y(x) = (1 + e^{−x^2})/2; see Fig.1. □

Fig.1. y(x) = (1 + e^{−x^2})/2.

In Example 2 we were lucky that it was possible to integrate x e^{x^2} by substitution.
However, even if we cannot evaluate some of the integrals, the method can still be used.
In fact, if we carry out the procedure in the general case (1.20), we obtain the solution
formula
y(x) = e^{−∫ p dx} [ ∫ q(x) e^{∫ p(x) dx} dx + c ].    (1.22)
But we repeat that it is better to remember the method than the solution formula.
In the next example, we encounter an integral that must be evaluated piecewise.
Example 3. Solve the initial-value problem
y' + 2y = q(x),   y(0) = 2,
where q(x) is the function
q(x) = 2  if 0 ≤ x ≤ 1,   and   q(x) = 0  if x > 1.
Solution. Since p(x) = 2 we find I(x) = e^{2x}, and multiplication as before gives

(d/dx)[e^{2x} y] = q(x) e^{2x}.

Fig.2. Graph of q(x)

Integration yields

e^{2x} y = ∫ q(x) e^{2x} dx.
This shows that the solution is continuous for x ≥ 0, but we need to evaluate the integral
separately for 0 ≤ x ≤ 1 and x > 1. For 0 ≤ x ≤ 1 we have q(x) = 2, and integration
yields
∫ 2 e^{2x} dx = e^{2x} + C0,
where C0 is an arbitrary constant. Solving for y yields y(x) = 1 + C0 e^{−2x}. Using the initial condition y(0) = 2, we have C0 = 1 and our solution is

y(x) = 1 + e^{−2x},   for 0 ≤ x ≤ 1.
To find the solution for x > 1, we need to perform the integration with q(x) = 0:
e^{2x} y = ∫ 0 dx = C1  ⇒  y(x) = C1 e^{−2x}.
To find the constant C1 we need to know the value of y(x) at some x > 1. We do not
have this information, but if the solution y(x) is to be continuous for x ≥ 0, the limiting
value of y(x) as x ↓ 1 must agree with the value y(1) provided by our solution formula
on 0 ≤ x ≤ 1. In other words,
C1 e^{−2} = y(1) = 1 + e^{−2}.

Fig.3. Graph of solution to Example 3
We conclude C1 = 1 + e^2, so y(x) = (1 + e^2) e^{−2x} for x > 1. Putting these formulas together, we conclude that the solution is given by

y(x) = 1 + e^{−2x}  if 0 ≤ x ≤ 1,   and   y(x) = (1 + e^2) e^{−2x}  if x > 1.  □
Mixture Problems
Fig 4. Mixing tank.
Suppose we have a tank containing a solution, a solute dissolved in a solvent such as salt
in water (i.e. brine). A different concentration of the solution flows into the tank at some
rate, and the well-mixed solution is drawn off at a possibly different rate. Obviously,
the amount of solute in the tank can vary with time, and we want to derive a differential
equation that governs this system.
Suppose we know ci , the concentration of the solute in the inflow, and ri , the rate
of the inflow; then the product ci ri is the rate at which the solute is entering the tank.
Suppose we also know the rate of the outflow ro . However, the concentration of the
outflow co can vary with time and is yet to be determined. Let x(t) denote the amount
of solute in the tank at time t > 0 and suppose that we are given x0 , the amount of salt
in the tank at time t = 0. If V (t) denotes the volume of solution in the tank at time t
(which will vary with time unless ri = ro ), then we have
co(t) = x(t)/V(t).
Moreover, the rate at which the solute is flowing out of the tank is the product co ro , so
the rate of change of x(t) is the difference
dx/dt = ci ri − co ro = ci ri − (x(t)/V(t)) ro.
Rearranging this, we have an initial-value problem for a first-order linear differential equation for x(t):

dx/dt + (ro/V(t)) x(t) = ci ri,   x(0) = x0.
Since it is easy to find V (t) from ri , ro , and V0 , the initial volume of solution in the
tank, we can solve this problem using an integrating factor. Let us consider an example.
Example 4. A 40 liter tank is initially half-full of water. A solution containing 10
grams per liter of salt begins to flow in at 4 liters per minute and the mixed solution
flows out at 2 liters per minute. How much salt is in the tank just before it overflows?
Solution. We know that ri = 4, ci = 10, and ro = 2. We also know that V (0) = 20
and dV /dt = ri − ro = 2, so V (t) = 2t + 20. If we let x(t) denote the amount of salt in
the tank at time t, then x(0) = 0 and the rate of change of x is
dx/dt = (4)(10) − (2) x/(2t + 20).
We can rewrite this as an initial-value problem
dx/dt + (1/(t + 10)) x = 40,   x(0) = 0.

Since ∫ p dt = ∫ dt/(t + 10) = ln(t + 10), the integrating factor is I(t) = e^{ln(t+10)} = t + 10. This yields

(d/dt)[(t + 10)x] = 40(t + 10),

which we integrate to find

(t + 10)x = 20(t + 10)^2 + C  ⇒  x(t) = 20(t + 10) + C(t + 10)^{−1}.

But we can use the initial condition x(0) = 0 to evaluate C = −2,000, and we know that overflow occurs when V(t) = 40, i.e. t = 10, so the amount of salt at overflow is

x(10) = 20 · 20 − 2,000/20 = 300 grams. □
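A short sympy check (ours; it assumes sympy is installed) solves the same mixing-tank problem and evaluates the salt content at overflow.

import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')

ode = sp.Eq(x(t).diff(t) + x(t) / (t + 10), 40)
sol = sp.dsolve(ode, x(t), ics={x(0): 0})
print(sp.simplify(sol.rhs))                 # equivalent to 20(t + 10) - 2000/(t + 10)
print(sp.simplify(sol.rhs.subs(t, 10)))     # 300, the grams of salt at overflow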
Exercises
1. Find the general solution for each of the following linear differential equations.
(Prime ' denotes d/dx.)
(a) y' + y = 2
(b) y' + 3y = x e^{−3x}
(c) y' + 3y = x e^{−2x}
(d) xy' + y = √x  (x > 0)
(e) dx/dt + (1/(t + 1)) x = 2
(f) t^2 dy/dt − 3ty = t^6 sin t  (t > 0)
2. Solve the following initial-value problems. (Prime ' denotes d/dx.)
(a) y' + y = e^x,   y(0) = 1
(b) y' + (cot x) y = cos x,   y(π/2) = 1
(c) t dy/dt + y = t^{−1},   y(−1) = 1
(d) y' + y = q(x),   y(0) = 2,   where q(x) = 1 if 0 ≤ x ≤ 1 and q(x) = 0 if x > 1
(e) y' − (1/x) y = p(x),   y(1) = 1,   where p(x) = 1 − x if 0 ≤ x ≤ 1 and p(x) = 0 if x > 1
3. A 100 gallon tank initially contains 10 lbs salt dissolved in 40 gallons of water.
Brine containing 1 lb salt per gallon begins to flow into the tank at the rate of 3
gal/min and the well-mixed solution is drawn off at the rate of 1 gal/min. How
much salt is in the tank when it is about to overflow?
Solution
4. A tank initially contains 100 liters of pure water. Brine containing 3 lb salt/liter
begins to enter the tank at 1 liter/min, and the well-mixed solution is drawn off
at 2 liters/min.
(a) How much salt is in the solution after 10 min?
(b) What is the maximum amount of salt in the tank during the 100 minutes it
takes for the tank to drain?
5. A reservoir is filled with 1 billion cubic feet of polluted water that initially contains
0.2% pollutant. Every day 400 million cubic feet of pure water enters the reservoir
and the well-mixed solution flows out at the same rate. When will the pollutant
concentration in the lake be reduced to 0.1%?
6. In Example 4, suppose that the inflow only contains 5 grams per liter of salt,
but all other quantities are the same. In addition, suppose that when the tank
becomes full at t = 10 minutes, the inflow is shut off but the solution continues
to be drawn off at 2 liters per minute. How much salt will be in the tank when it
is once again half-full?
7. The rate of change of the temperature T (t) of a body is still governed by (1.8)
when the ambient temperature A(t) varies with time. Suppose the body is known
to have k = 0.2 and initially is at 20◦ C; suppose also that A(t) = 20e−t . Find the
temperature T (t).
1.5 Other Methods
In this section we gather together some additional methods for solving first-order differential equations. The first basic method is to find a substitution that simplifies the
equation, enabling us to solve it using the techniques that we have already discussed; the
two specific cases that we shall discuss are homogeneous equations and Bernoulli
equations. The second basic method is to use multi-variable calculus techniques to
find solutions using the level curves of a potential function for a gradient vector field;
the equations to which this method applies are called exact equations.
Homogeneous Equations
A first-order differential equation in the form

dy/dx = F(y/x)    (1.23)
is called homogeneous. Since the right-hand side only depends upon v = y/x, let us
introduce this as the new dependent variable:
v = y/x  ⇒  y = xv  ⇒  dy/dx = v + x dv/dx.
Substituting this into (1.23) we obtain
x dv/dx = F(v) − v.    (1.24)
But (1.24) is separable, so we can find the solution v(x). Then we obtain the solution
of (1.23) simply by letting y(x) = x v(x). Let us consider an example.
Example 1. Find the general solution of
(x + y) y' = x − y.
Solution. This equation is not separable and it is not linear, so we cannot use those
techniques. However, if we divide through by (x + y), we realize the equation is homogeneous:
dy/dx = (x − y)/(x + y) = (1 − (y/x))/(1 + (y/x)).
Introducing v = y/x, we obtain
v + x dv/dx = (1 − v)/(1 + v)  ⇒  x dv/dx = (1 − 2v − v^2)/(1 + v).
Separating variables and integrating both sides, we find
∫ (1 + v)/(1 − 2v − v^2) dv = ∫ dx/x = ln |x| + c.
To integrate the left-hand side, we use the substitution w = 1 − 2v − v^2 to calculate

∫ (1 + v)/(1 − 2v − v^2) dv = −(1/2) ∫ dw/w = −(1/2) ln |w| + c = ln(|w|^{−1/2}) + c.
A function f(x, y) is called "homogeneous of degree d" if f(tx, ty) = t^d f(x, y) for any t > 0. If f(x, y) = F(y/x), then f(x, y) is homogeneous of degree 0.
If we exponentiate ln(|w|^{−1/2}) = ln |x| + c, we obtain

w = C x^{−2},

where we have removed the absolute value sign on w by allowing C to be positive or negative. Using w = 1 − 2v − v^2, v = y/x, and some algebra, we obtain

x^2 − 2xy − y^2 = C. □
Notice that this defines the solution y(x) implicitly.
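The implicit solution can be verified directly; here is a short sympy sketch (ours, assuming sympy is installed) that differentiates x^2 − 2xy − y^2 = C implicitly and recovers the original equation.

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

Phi = x**2 - 2*x*y(x) - y(x)**2
# implicit differentiation: d/dx Phi = 0, solved for y'
dydx = sp.solve(sp.Eq(sp.diff(Phi, x), 0), y(x).diff(x))[0]
print(sp.simplify(dydx - (x - y(x)) / (x + y(x))))   # 0, so (x + y) y' = x - y holds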
Of course, we can also solve initial-value problems for homogeneous equations.
Example 2. Find the solution of the initial-value problem
xy^2 y' = x^3 + y^3,   y(1) = 2.
Solution. If we divide through by xy^2, we see that the equation is homogeneous:

dy/dx = x^2/y^2 + y/x = (y/x)^{−2} + (y/x).
In terms of v = y/x, we obtain

v + x dv/dx = 1/v^2 + v,

and separating variables yields

∫ v^2 dv = ∫ dx/x = ln |x| + c.
Integrating and then replacing v by y/x, we obtain
y^3 = x^3 (3 ln |x| + c).
Using the initial condition y(1) = 2, we can evaluate c = 8. Moreover, since ln |x| is
not defined at x = 0 and our initial condition occurs at x = 1, this solution should be
restricted to x > 0; thus we can remove the absolute value signs on x. Then take the
cube root of both sides to find
y(x) = x (3 ln x + 8)^{1/3}   for x > 0. □
Bernoulli Equations
A first-order differential equation in the form
dy/dx + p(x)y = q(x) y^α,    (1.25)
where α is a real number, is called a Bernoulli equation. If α = 0 or α = 1, then
(1.25) is linear; otherwise it is not linear. However, there is a simple substitution that
reduces (1.25) to a linear equation. In fact, if we replace y by
v = y^{1−α},    (1.26)
then

dy/dx = (1/(1 − α)) y^α dv/dx   and   p(x) y = p(x) y^α v,

so (1.25) becomes

dv/dx + (1 − α) p(x) v = (1 − α) q(x).    (1.27)
Now we can use an integrating factor to solve (1.27) for v, and then use (1.26) to recover
y. Let us perform a simple example.
Example 3. Find the general solution of
x^2 y' + 2xy = 3y^4.
Solution. We divide by x^2 to put this into the form (1.25) with α = 4:

y' + (2/x) y = (3/x^2) y^4.
We introduce v = y^{−3}, but rather than just plugging into (1.27), it is better to derive the equation that v satisfies:

v = y^{−3}  ⇒  y = v^{−1/3}  ⇒  y' = −(1/3) v^{−4/3} v',

⇒  −(1/3) v^{−4/3} v' + (2/x) v^{−1/3} = (3/x^2) v^{−4/3},

and after some elementary algebra we obtain the linear equation

v' − (6/x) v = −9/x^2.
As integrating factor, we take
I(x) = e^(−6 ∫ x^(−1) dx) = e^(−6 ln |x|) = x^(−6),
which enables us to find v:
(x^(−6) v)′ = −9 x^(−8)   ⇒   x^(−6) v = (9/7) x^(−7) + C   ⇒   v = (9/7) x^(−1) + C x⁶.
Finally, we use y = v^(−1/3) to find our desired solution:
y(x) = ( (9/7) x^(−1) + C x⁶ )^(−1/3). □
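As a quick sanity check, one can substitute this formula back into the original equation. Here is a minimal sketch, assuming Python with sympy (illustrative only, not part of the text):

import sympy as sp

x, C = sp.symbols('x C')

# Candidate general solution of x^2 y' + 2 x y = 3 y^4 obtained in Example 3.
y = (sp.Rational(9, 7)/x + C*x**6) ** sp.Rational(-1, 3)

# Substitute into the equation; multiplying by y^(-4) clears the fractional
# powers so that simplify() can reduce the residual to zero.
residual = (x**2 * sp.diff(y, x) + 2*x*y - 3*y**4) * y**(-4)
print(sp.simplify(residual))   # expected output: 0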
Exact Equations
Let us consider a first-order equation in differential form:
M (x, y) dx + N (x, y) dy = 0.
(1.28)
For example, dy/dx = f (x, y) is equivalent to (1.28) if we take M (x, y) = −f (x, y) and
N (x, y) = 1, but there are many other possibilities. In fact, we are interested in the
case that there is a differentiable function Φ(x, y) so that
M = ∂Φ/∂x   and   N = ∂Φ/∂y.     (1.29)
If we can find Φ(x, y) satisfying (1.29), then we say that (1.28) is exact and we call
Φ a potential function for the vector field (M, N ). The significance of the potential
function is that its level curves, i.e.
Φ(x, y) = c,
(1.30)
implicitly define functions y(x) which are solutions of (1.28); we know this since taking
the differential of (1.30) is exactly (1.28). Let us consider an example.
Example 4. Suppose we want the general solution of
2x sin y dx + x2 cos y dy = 0.
By inspection, we see that Φ(x, y) = x² sin y has the desired partial derivatives
∂Φ/∂x = 2x sin y   and   ∂Φ/∂y = x² cos y.
Consequently,
x² sin y = c
provides the general solution (in implicit form). □
But how do we know when (1.28) is exact, and how do we construct Φ(x, y)? Recall
from multivariable calculus that if Φ(x, y) has continuous second-order derivatives (i.e.
∂ 2 Φ/∂x2 , ∂ 2 Φ/∂y∂x, ∂ 2 Φ/∂x∂y, and ∂ 2 Φ/∂y 2 are all continuous functions of x and y)
then “mixed partials are equal”:
∂²Φ/∂x∂y = ∂²Φ/∂y∂x.     (1.31)
Putting (1.31) together with (1.29), we find that (1.28) being exact implies
∂M/∂y = ∂N/∂x.     (1.32)
On the other hand, if M , N satisfy (1.32), then we want to construct Φ so that (1.29)
holds. Let us define
Φ(x, y) = ∫ M(x, y) dx + g(y)     (1.33)
where g(y) is a function that is to be determined. Whatever g(y) is, we have ∂Φ/∂x =
M , but we want to choose g(y) so that
N(x, y) = ∂Φ/∂y = ∫ ∂M/∂y (x, y) dx + g′(y).
Consequently, g′(y) must satisfy
g′(y) = N(x, y) − ∫ ∂M/∂y (x, y) dx.     (1.34)
But this requires the right-hand side in (1.34) to be independent of x. Is this true? We
check by differentiating it with respect to x:
∂/∂x [ N(x, y) − ∫ ∂M/∂y (x, y) dx ] = ∂N/∂x − ∂M/∂y = 0
since we have assumed (1.32). We summarize these conclusions as a theorem.
Theorem 1. Suppose M (x, y) and N (x, y) are continuously differentiable functions on
a rectangle R given by a < x < b and c < y < d. The equation (1.28) is exact if and
only if (1.32) holds. Moreover, in this case, the potential function Φ(x, y) is given by
(1.33) where g(y) is determined by (1.34).
Example 5. Find the solution of the initial-value problem
y 2 ex dx + (2y ex + cos y) dy = 0,
y(0) = π.
Solution. Let us first check to see whether the differential equation is exact. Letting
M = y 2 ex and N = 2y ex + cos y, we compute
∂M/∂y = 2y e^x   and   ∂N/∂x = 2y e^x,
so we conclude the equation is exact. Next, as in (1.33), we define
Φ(x, y) = ∫ y² e^x dx + g(y) = y² e^x + g(y).
We know that Φx = M , so we check Φy = N , i.e. we want
2y ex + g 0 (y) = 2y ex + cos y.
Notice that the terms involving x drop out and we are left with g 0 (y) = cos y, which
we easily solve to find g(y) = sin y. We conclude that the general solution is given
implicitly by
Φ(x, y) = y 2 ex + sin y = c.
To find the solution satisfying the initial condition, we simply plug in x = 0 and y = π
to evaluate c = π 2 . Consequently, the solution of the initial-value problem is given
implicitly by
y 2 ex + sin y = π 2 .
Remark 1. Of course, instead of defining Φ by (1.33), (1.34) we could have defined
Φ(x, y) = ∫ N(x, y) dy + h(x),
where h(x) is chosen to satisfy
h′(x) = M(x, y) − ∫ ∂N/∂x (x, y) dy. □
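The construction in (1.32)–(1.34) is mechanical enough to automate. The following is a minimal sketch, assuming Python with sympy (the helper name potential is ours): it tests the exactness condition (1.32) and builds a potential function, reproducing Example 5.

import sympy as sp

x, y = sp.symbols('x y')

def potential(M, N):
    """Return Phi with M = Phi_x and N = Phi_y, or None if (1.32) fails."""
    if sp.simplify(sp.diff(M, y) - sp.diff(N, x)) != 0:
        return None                               # not exact
    Phi = sp.integrate(M, x)                      # (1.33), up to g(y)
    gprime = sp.simplify(N - sp.diff(Phi, y))     # (1.34)
    return Phi + sp.integrate(gprime, y)

# Example 5: M = y^2 e^x, N = 2 y e^x + cos y.
Phi = potential(y**2*sp.exp(x), 2*y*sp.exp(x) + sp.cos(y))
print(Phi)   # y**2*exp(x) + sin(y), so the solutions are Phi(x, y) = c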
Exercises
1. Determine whether the given differential equation is homogeneous, and if so find
the general solution.
(a) 2xy y′ = x² + 2y²,
(b) x y′ = y + 2√(xy)   (x, y > 0),
(c) x(x + y) y′ = y (x − y),
(d) (x² − y²) y′ = 2xy.
2. Find the general solution of the following Bernoulli equations
(a) x² y′ + 2xy = 5y³,
(b) x y′ + 6y = 3x y^(4/3).
3. Determine whether the given equation in differential form is exact, and if so find
the general solution.
(a) 2x e^y dx + (x² e^y − sin y) dy = 0,
(b) (sin y + cos x) dx + x cos y dy = 0,
(c) sin x e^y dx + (cos x + e^y) dy = 0,
(d) (cos x + ln y) dx + (x/y + e^y) dy = 0.
4. Solve the following initial-value problems (which could be homogeneous, Bernoulli,
or exact).
(a) (2x² + y²) dx − xy dy = 0, y(1) = 2,
(b) (2xy² + 3x²) dx + (2x²y + 4y³) dy = 0, y(1) = 0,
(c) (y + y³) dx − dy = 0, y(0) = 1.
1.6 Additional Exercises
1. A car is traveling at 30 m/sec when the driver slams on the brakes and the car
skids 100 m. Assuming the braking system provided constant deceleration, how
long did it take for the car to stop?
2. Suppose a car skids 50 m after the brakes are applied at 100 km/hr. How far will
the same car skid if the brakes are applied at 150 km/hr?
For the next two problems, the graph represents the velocity v(t) of a particle moving
along the x-axis for 0 ≤ t ≤ 10. Sketch the graph of the position function x(t).
3.–4. [Graphs of v(t) omitted; the labeled points include (3, 2), (5, 2), and (7, 2).]
For the following two problems, identify which differential equation corresponds to the
given slope field.
5. (a) dy/dx = xy²   (b) dy/dx = y²   (c) dy/dx = y + x²   (d) dy/dx = x + y²
[Slope field omitted.]
6. (a) dy/dx = x sin y   (b) dy/dx = x cos y   (c) dy/dx = y sin x   (d) dy/dx = y cos y
[Slope field omitted.]
The next two problems concern the autonomous equation dy/dx = f (y). For the function
f (y) given by the graph, find the equilibrium solutions and determine their stability.
7.–8. [Graphs of f(y) omitted; axis marks at y = −1, 0, 1.]
9. For all values of c, determine whether the autonomous equation
dy/dt = y² + 2y + c
has equilibrium solutions; if so, determine their stability.
10. Is it possible for a non-autonomous equation dy/dx = f (x, y) to have an equilibrium solution y(x) ≡ y0 ? If not, explain. If so, give an example.
11. A hemispherical tank of radius R contains water at a depth of y, but a small hole in the bottom of the tank has cross-sectional area a, so Torricelli's law says that water exits the tank with velocity v = √(2gy). The volume of water in the tank depends upon y, and can be computed by V(y) = ∫₀^y A(u) du, where A(y) is the horizontal cross-sectional area of the tank at height y above the hole.
(a) Show that A(y) = π[R² − (R − y)²].
(b) Show that y(t) satisfies the differential equation
A(y) dy/dt = −a √(2gy).
Fig.1. Hemispherical Tank with a Hole.
12. A hemispherical tank as in the previous problem has radius R = 1 m and the hole
has cross-sectional area a = 1 cm2 . Suppose the tank begins full. How long will
it take for all the water to drain out through the hole?
13. The population of Country X grows at a rate proportional to its size with proportionality constant k = 0.1. However, during hard times, there is continuous
emigration. If the initial population is one million, what annual emigration rate
will keep the population to one and a half million in ten years?
14. The rate of change of the temperature T (t) of a body is still governed by (1.8)
when the ambient temperature A(t) varies with time. Suppose the dish of leftovers
in Example 4 of Section 1.3 is placed in the oven at room temperature 70◦ F, and
then the oven is turned up to 200◦ F. Assume the oven heats at a constant rate
from 70◦ to 200◦ in 10 minutes, after which it remains at 200◦ . How long until
the leftovers are 100◦ ? (Note: the same value of k applies here as in Example 4.)
Find the general solution using whatever method is appropriate.
15. dy/dx = 1 + x + y + xy
16. dy/dx = sin(y + x)/(sin y cos x) − 1
17. y 0 + (x ln x)−1 y = x
18. dy/dx = −(x3 + y 2 )/(2xy)
Solve the following initial-value problems using whatever method is appropriate.
19. y′ + y² sin x = 0, y(π) = 1/2
20. t dy/dt − y = t² e^(−t), y(1) = 3
21. (y + e^y) dy + (e^(−x) − x) dx = 0, y(0) = 1
22. (x² + 1) y′ + 2x³ y = 6x e^(−x²), y(0) = −1
23. tx dx/dt = t² + 3x², x(1) = 1
24. (y + 2x sin y cos y) y′ = 3x² − sin² y, y(0) = π
Chapter 2
Second-Order Differential Equations
2.1 Introduction to Higher-Order Equations
An nth-order differential equation, or differential equation of order n, involves
an unknown function y(x) and its derivatives up to order n. Generally such an equation
is in the form y (n) = f (x, y, . . . , y (n−1) ) where y (k) denotes the k-th order derivative of
y(x); note that this generalizes (1.2) in Section 1.1. But we usually assume the equation
is linear, so can be put in the form
y^(n) + a1(x) y^(n−1) + · · · + an(x) y = f(x),
(2.1)
where the coefficients a1 (x), . . . , an (x) are functions of x. We can also write y (k) as
Dk y and aj (x)y (k) as aj Dk y. This enables us to write (2.1) as
D^n y + a1 D^(n−1) y + · · · + an y = (D^n + a1 D^(n−1) + · · · + an) y = f.
If we introduce the linear differential operator of order n as
L = Dn + a1 Dn−1 + · · · + an ,
Differential operators L
provide a convenient way
of writing linear differential
equations as Ly = f
then we can simplify (2.1) even further: Ly = f. When f is nonzero, then (2.1) (in any
of its notational forms) is called a nonhomogeneous equation. On the other hand,
we say that
Ly = y^(n) + a1(x) y^(n−1) + · · · + an(x) y = 0,
(2.2)
is a homogeneous equation. Homogeneous differential equations are not only important for their own sake, but will prove important in studying the solutions of an
associated nonhomogeneous equation.
Let us further discuss the linearity of the differential operator L. If y1 and y2 are
two functions that are sufficiently differentiable that L(y1 ) and L(y2 ) are both defined,
then L(y1 + y2 ) is also defined and in fact L(y1 + y2 ) = L(y1 ) + L(y2 ). Moreover, if
The use of the term “homogeneous” to refer to (2.2)
is distinct from its usage in
Section 1.5.
c1 is any constant, then L(c1 y1 ) = c1 L(y1 ). We can combine these statements in one
formula that expresses the linearity of L:
L(c1 y1 + c2 y2) = c1 L(y1) + c2 L(y2)   for any constants c1 and c2.     (2.3)
Superposition just means that solutions of a homogeneous equation can be added together or multiplied by a constant; they can also be added to a solution of an associated nonhomogeneous equation.
This linearity provides an important property for the solutions of both (2.1) and (2.2)
called superposition.
Theorem 1. (a) If y1 and y2 are two solutions of the homogeneous equation (2.2), then
the linear combination
y(x) = c1 y1 (x) + c2 y2 (x),
where c1 and c2 are arbitrary constants,
is also a solution of (2.2).
(b) If yp is a particular solution of the nonhomogeneous equation (2.1) and y0 is any
solution of the associated homogeneous equation (2.2), then the linear combination
y(x) = yp (x) + c y0 (x),
where c is an arbitrary constant,
is also a solution of the nonhomogeneous equation (2.1).
Proof. We simply use the linearity of L:
L(c1 y1 + c2 y2 ) = c1 Ly1 + c2 Ly2 = 0 + 0 = 0
L(yp + c y0 ) = L(yp ) + cL(y0 ) = f + 0 = f.
2
Example 1. (a) y 00 + y = 0 and (b) y 00 + y = 2ex .
(a) The functions y1 (x) = cos x and y2 (x) = sin x clearly both satisfy y 00 + y = 0, hence
so does the linear combination
y(x) = c1 cos x + c2 sin x.
(b) The function yp (x) = ex clearly satisfies y 00 + y = 2ex , and hence so does
y(x) = ex + c1 cos x + c2 sin x.
2
In the next section we will discuss the general solution for a second-order equation
and its role in solving initial-value problems. But in the remainder of this section
we discuss another useful approach to analyzing higher-order equations, and then an
important application of second-order equations to mechanical vibrations.
Conversion to a System of First-Order Differential Equations
One approach to analyzing a higher-order equation is to convert it to a system of
first-order equations: this is especially important since it can be used for higher-order
nonlinear equations. Let us see how to do this with the linear equation (2.1). We begin
by renaming y as y1 , then renaming y 0 as y2 so that y10 = y2 . Similarly, if we rename y 00
as y3 , then we have y20 = y3 . We continue in this fashion; however, instead of renaming
y (n) , we use (2.1) to express it in terms of y1 , . . . , yn . We summarize this as follows:
y1′ = y2
y2′ = y3
⋮
yn′ = f − a1 yn − a2 yn−1 − · · · − an y1.     (2.4)
This is an example of a system of first-order differential equations, also called a first-order system. Our experience with first-order equations suggests that an initial-value problem for (2.4) involves specifying the values η1, . . . , ηn for y1, . . . , yn at some point x0:
y1(x0) = η1,   y2(x0) = η2,   . . . ,   yn(x0) = ηn.     (2.5)
Recalling how y1, . . . , yn were defined, this means that an initial-value problem for (2.1) should specify the value of y and its first n − 1 derivatives at x = x0:
y(x0) = η1,   y′(x0) = η2,   . . . ,   y^(n−1)(x0) = ηn.     (2.6)
Let us consider an example.
An initial-value problem for a differential equation of order n specifies the unknown function and its first n − 1 derivatives at the initial point.
Example 2. Convert the initial-value problem for the second-order equation
y 00 + 2y 0 + 3y = sin x,
y(0) = 0, y 0 (0) = 1
to an initial-value problem for a first-order system.
Solution. Let y1 = y and y2 = y′. We find that y1′ = y′ = y2 and y2′ = y″ = sin x − 2y′ − 3y = sin x − 2y2 − 3y1. Moreover, y1(0) = y(0) = 0 and y2(0) = y′(0) = 1. Consequently, the initial-value problem can be written as
y1′ = y2,                        y1(0) = 0
y2′ = sin x − 2y2 − 3y1,         y2(0) = 1. □
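Writing a higher-order equation as a first-order system is exactly the form that standard numerical solvers expect. The following is a minimal sketch, assuming Python with numpy and scipy (not part of the text), which integrates the system just obtained.

import numpy as np
from scipy.integrate import solve_ivp

# First-order system from Example 2: y1' = y2, y2' = sin x - 2 y2 - 3 y1.
def rhs(x, Y):
    y1, y2 = Y
    return [y2, np.sin(x) - 2*y2 - 3*y1]

# Initial conditions y1(0) = 0, y2(0) = 1, integrated for 0 <= x <= 10.
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 1.0], dense_output=True)

# sol.sol(x)[0] approximates the original unknown y(x); for instance:
print(sol.sol(1.0)[0])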
Of course, replacing a single (higher-order) equation by a system of (first-order)
equations introduces complications associated with manipulating systems of equations;
but this is the purpose of linear algebra, that we begin to study in Chapter 4. We shall
further develop the theory of systems of first-order differential equations in Chapter 7,
after we have developed the tools from linear algebra that we will need.
Mechanical Vibrations: Spring-Mass Systems
Suppose an object of mass m is attached to a spring. Compressing or stretching the
spring causes it to exert a restorative force Fs that tries to return the spring to its
natural length. Moreover, it has been observed that the magnitude of this restorative
force is proportional to the amount it has been stretched or compressed; this is called
Hooke’s law. If we denote by x the amount that the spring is stretched beyond its
natural length, then Hooke’s law may be written
Fs = −kx,   where k > 0 is called the spring constant.     (2.7)
Fig.1. Spring's Force.
This may be viewed as our mathematical model for the force of the spring: if x > 0
then Fs acts in the negative x direction, and if x < 0 then Fs acts in the positive x
direction, in both cases working to restore the equilibrium position x = 0.
Now suppose the object is in motion. Invoking Newton’s second law, F = ma, we
see that the motion x(t) of the object is governed by m d2 x/dt2 = −kx. Writing this in
the form (2.1) we obtain
m d²x/dt² + kx = 0.     (2.8)
This is an example of a homogeneous second-order differential equation. What should
we take as initial conditions for (2.8)? At this point, we can either call upon (2.6), or
our experience that the motion is only determined if we know both the initial position
of the object and its initial velocity. Thus we take as initial conditions for (2.8):
x(0) = x0   and   dx/dt (0) = v0.     (2.9)
Fig.2. Vertical Spring.
The equation (2.8) together with the conditions (2.9) is an initial-value problem.
Notice in (2.8) that the motion was horizontal, so we did not consider the effect
of gravity. If the spring is hanging vertically and the motion of the object is also
vertical, then gravity does play a role. Let us now denote by y the amount the spring
is stretched beyond its natural length. The positive y direction is downward, which is
also the direction of the gravitational force Fg , so the sum of forces is −ky + mg. This
leads to the equation
m d²y/dt² + ky = mg.     (2.10)
This is an example of a nonhomogeneous second-order differential equation. Letting a
mass stretch a vertically hung spring is also a good way to determine the spring constant
k, since both the force mg and the displacement y are known (cf. Exercises 4 and 5).
Of course, other forces could be involved as well. For example, the object attached
to the horizontal spring may be resting on a table, and its motion could be impeded by
the resistive force of friction. As discussed in Section 1.3, resistive forces Fr generally
depend upon the velocity of the object, in this case dx/dt. If we assume that the
frictional force is proportional to the velocity, then Fr = −c dx/dt where c is called the
damping coefficient, and in place of (2.8) we have
m d²x/dt² + c dx/dt + kx = 0.     (2.11)
Fig.3. Spring with Friction.
This is another example of a homogeneous second-order differential equation. The
vertical spring, of course, is not subject to friction, but we could consider air resistance
as a resistive force; it is clear how to modify (2.10) to account for this.
Restorative and resistive forces are called internal forces in the system because
they depend upon position or velocity. Mechanical vibrations involving only internal
forces are called free vibrations; hence, (2.8) and (2.11) are both examples of free
vibrations, and these will be analyzed in Section 2.4. However, there could be other
forces involved in mechanical vibrations that are independent of the position or velocity;
these are called external forces. Of course, gravity is an example of a constant external
force, but more interesting external forces depend on time t, for example a periodic
electromagnetic force. This leads us to the nonhomogeneous second-order differential
equation mentioned in Section 1.1. We shall study such forced vibrations later in this
chapter.
Exercises
1. Verify that the given functions y1 and y2 satisfy the given homogeneous differential
equation.
(a) y1 (x) = ex , y2 (x) = e−x ; y 00 − y = 0.
(b) y1 (x) = cos 2x, y2 (x) = sin 2x; y 00 + 4y = 0.
(c) y1 (x) = e−x cos x, y2 (x) = e−x sin x; y 00 + 2y 0 + 2y = 0.
(d) y1(x) = x^(−1), y2(x) = x^(−2);   y″ + (4/x) y′ + (2/x²) y = 0.
2. Convert the initial-value problem for the second-order equation to an initial-value
problem for a first-order system:
(a) y 00 − 9y = 0, y(0) = 1, y 0 (0) = −1.
(b) y 00 + 3y 0 − y = ex , y(0) = 1, y 0 (0) = 0.
(c) y 00 = y 2 , y(0) = 1, y 0 (0) = 0.
(d) y 00 + y 0 + 5 sin y = ex , y(0) = 1, y 0 (0) = 1.
3. Convert each of the second order equations (2.8), (2.10), and (2.11) to a system
of first-order equations.
4. A vertically hung spring is stretched .5 m when a 10 kg mass is attached. Assuming
the spring obeys Hooke’s law (2.7), find the spring constant k. (Include the units
in your answer.)
5. A 10 lb weight stretches a vertical spring 2 ft. Assuming the spring obeys Hooke’s
law (2.7), find the spring constant k. (Include the units in your answer.)
2.2 General Solutions for Second-Order Equations
Let us write our second-order linear differential equation in the form
y 00 + p(x)y 0 + q(x)y = f (x).
(2.12)
An initial-value problem for (2.12) consists of specifying both the value of y and y 0
at some point x0 :
y(x0 ) = y0 and y 0 (x0 ) = y1 .
(2.13)
The existence and uniqueness of a solution to this initial-value problem is provided by
the following:
Theorem 1. Suppose that the functions p, q, and f are all continuous on an open
interval I containing the point x0 . Then for any numbers y0 and y1 , there is a unique
solution of (2.12) satisfying (2.13).
For a linear equation with
continuous coefficients on
an interval, the solution
of an initial-value problem
exists throughout the interval and is unique
Like Theorem 1 in Section 1.2, this existence and uniqueness theorem is proved using
successive approximations, although we shall not give the details here. However, let us
observe that existence and uniqueness holds on all of I since the equation is linear.
Of course, Theorem 1 above applies to the case f = 0, i.e. a homogeneous second-order linear differential equation
y 00 + p(x)y 0 + q(x)y = 0.
(2.14)
Notice that (2.14) admits the trivial solution y(x) ≡ 0. In fact, as an immediate
consequence of the uniqueness statement in Theorem 1, we have the following:
Corollary 1. Suppose that p and q are continuous on an open interval I containing
the point x0 , and y(x) is a solution of (2.14) satisfying y(x0 ) = 0 = y 0 (x0 ). Then y is
the trivial solution.
Let us see how to solve an initial-value problem for (2.14) with an example.
Example 1. Find the solution for the initial-value problem
y″ + y = 0,   y(0) = 1, y′(0) = −1.
Using a linear combination of solutions to solve an initial-value problem.
Solution. As we saw in the previous section, the linear combination
y(x) = c1 cos x + c2 sin x
provides a two-parameter family of solutions of y 00 + y = 0. If we evaluate this and its
derivative at x = 0 we obtain
y(0) = c1 cos 0 + c2 sin 0 = c1 ,
y 0 (0) = −c1 sin 0 + c2 cos 0 = c2 .
Using the given initial conditions, we find c1 = 1 and c2 = −1, so the unique solution is
y(x) = cos x − sin x.
2
This example demonstrates the usefulness of taking linear combinations of solutions
in order to solve initial-value problems for homogeneous equations. However, we need
to make sure that we are taking linear combinations of solutions that are truly different
from each other; in Example 1, we could not use y1(x) = cos x and y2(x) = 2 cos x because
the linear combination
y(x) = c1 y1 (x) + c2 y2 (x) = c1 cos x + 2c2 cos x = (c1 + 2c2 ) cos x
Linear independence for
more than two functions
will be discussed in Chapter 5
will not have enough flexibility to satisfy both initial conditions y(0) = 1 and y 0 (0) = −1.
This leads us to the notion of linear independence: two functions defined on an interval
I are said to be linearly dependent if they are constant multiples of each other;
otherwise, they are linearly independent.
Theorem 2. Suppose the functions p and q are continuous on an interval I, and y1 and
y2 are two solutions of (2.14) that are linearly independent on I. Then every solution
y of this equation can be expressed as a linear combination of y1 and y2 , i.e.
y(x) = c1 y1 (x) + c2 y2 (x)
for all x ∈ I.
Thus, if y1 and y2 are linearly independent solutions of (2.14), we call y(x) = c1 y1 (x) +
c2 y2 (x) the general solution of (2.14).
To prove this theorem, we need to use linear algebra for a system of two equations
in two unknowns x1 , x2 . In particular, for any constants a, b, c, and d, the system
a x1 + b x2 = 0
c x1 + d x2 = 0
(2.15)
always admits the trivial solution x1 = 0 = x2 , but we want to know when it admits a
nontrivial solution: at least one of x1 or x2 is nonzero. Also, we want to know when,
for any values of y1 and y2 , the system
a x1 + b x2 = y1
c x1 + d x2 = y2
(2.16)
has a solution x1 , x2 .
Lemma 1. (a) The equations (2.15) admit a nontrivial solution (x1 , x2 ) if and only if
ad − bc = 0.
(b) The equations (2.16) admit a unique solution (x1 , x2 ) for each choice of (y1 , y2 ) if
and only if ad − bc ≠ 0.
Since the quantity ad − bc is so important for the solvability of (2.15) and (2.16), we give it a special name, the determinant:
det [ a  b ; c  d ] ≡ ad − bc.     (2.17)
Lemma 1 is easily proved using elementary algebra (cf. Exercise 13), but it is also a
special case of the n variable version that we shall derive in Chapter 4; its appearance
here provides extra motivation for the material presented in Chapter 4.
Given two differentiable functions f and g on an interval I, let us define the Wronskian of f and g to be the function
W(f, g) = det [ f  g ; f′  g′ ] = f g′ − f′ g.
The relevance of the Wronskian for linear independence is given in the following:
Theorem 3. Suppose y1 and y2 are differentiable functions on the interval I.
(a) If W(y1, y2)(x0) ≠ 0 for some x0 ∈ I, then y1 and y2 are linearly independent on I.
(b) If y1 and y2 are both solutions of (2.14) such that W (y1 , y2 )(x0 ) = 0 for some x0 ∈ I,
then y1 and y2 are linearly dependent on I.
Remark 1. In this theorem, it is somewhat remarkable that the condition on the Wronskian is only made at one point x0, but the conclusions hold on the whole interval I.
Proof. (a) We will prove the contrapositive: if y1 and y2 are linearly dependent, then
W (y1 , y2 ) ≡ 0 on I. In fact, y1 and y2 being linearly dependent means y2 = c y1 for
some constant c. Hence
W (y1 , y2 ) = y1 y20 − y10 y2 = y1 (c y1 )0 − y10 (c y1 ) = c(y1 y10 − y10 y1 ) ≡ 0.
The Wronskian is named after the Polish mathematician and philosopher Józef Hoene-Wroński (1776–1853).
(b) Suppose W (y1 , y2 )(x0 ) = 0 for some point x0 in I; we want to show that y1 and y2
must be linearly dependent, i.e. for some constants c1 and c2, not both zero, we have c1 y1(x) +
c2 y2 (x) = 0 on I. Consider the 2×2 system
c1 y1 (x0 ) + c2 y2 (x0 ) = 0
c1 y10 (x0 ) + c2 y20 (x0 ) = 0.
The condition in Lemma 1 (a) that we can find a nontrivial solution c1 , c2 is just
W (y1 , y2 )(x0 ) = 0, which we have assumed. Using these nonzero values c1 , c2 , define
y(x) = c1 y1 (x) + c2 y2 (x).
As a linear combination of solutions of (2.14), y is also a solution of (2.14), and it
satisfies the zero initial conditions at x0 :
y(x0 ) = 0,
y 0 (x0 ) = 0.
(2.18)
By Corollary 1, we must have y(x) ≡ 0. But this means that y1 and y2 are linearly
dependent on I.
2
Example 2. (a) Show that f (x) = 2 and g(x) = sin2 x + cos2 x are linearly dependent.
(b) Show that f (x) = e2x and g(x) = e3x are linearly independent.
Solution. (a) By trigonometry, g(x) ≡ 1 so f = 2g, and hence f and g are linearly
dependent. (We may compute W (f, g) ≡ f g 0 − f 0 g ≡ 0, but we cannot use Theorem 3
to conclude that f and g are linearly dependent without knowing that both functions
satisfy an equation of the form (2.14).) (b) W(f, g) = e^(2x) · 3e^(3x) − 2e^(2x) · e^(3x) = e^(5x) ≠ 0, so
Theorem 3 (a) implies that f and g are linearly independent. (This was already obvious
since f is not a constant multiple of g.)
2
Now we are ready to prove that the general solution of (2.14) is given as a linear
combination of two linearly independent solutions.
Proof of Theorem 2. Consider any solution y of (2.14) on I and pick any point x0
in I. Given our two linearly independent solutions y1 and y2 , we ask whether we can
solve the following system for constants c1 and c2 :
c1 y1 (x0 ) + c2 y2 (x0 ) = y(x0 )
c1 y10 (x0 ) + c2 y20 (x0 ) = y 0 (x0 ).
Since W(y1, y2)(x0) = y1(x0) y2′(x0) − y1′(x0) y2(x0) ≠ 0, Lemma 1 (b) assures us that
we can find c1 , c2 . Using these constants, let us define
z(x) = c1 y1 (x) + c2 y2 (x).
But z is a solution of (2.14) with z(x0 ) = y(x0 ) and z 0 (x0 ) = y 0 (x0 ), so by uniqueness
we must have z(x) ≡ y(x) for x in I. In other words, y is a linear combination of y1
and y2 .
2
Consequently, if we have two linearly independent solutions of a second-order homogeneous equation, we can use them to solve an initial-value problem.
Example 3. Use y1 = e2x and y2 = e3x to solve the initial-value problem
y 00 − 5y 0 + 6y = 0,
y(0) = 1, y 0 (0) = 4.
Solution. We can easily check that both y1 and y2 satisfy the differential equation, and
they are obviously linearly independent (or we can use Theorem 3 (a) as in Example
2). So the general solution is y(x) = c1 e2x + c2 e3x , and we need only choose c1 and c2
to satisfy the initial conditions. But y(0) = c1 + c2 = 1 and y′(x) = 2c1 e^(2x) + 3c2 e^(3x) ⇒ y′(0) = 2c1 + 3c2 = 4, so c1 = −1 and c2 = 2. We conclude
y(x) = 2 e^(3x) − e^(2x). □
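Determining c1 and c2 amounts to solving a 2×2 linear system of the kind covered by Lemma 1, a first taste of the linear algebra developed in Chapter 4. A minimal sketch, assuming Python with numpy (illustrative only):

import numpy as np

# Example 3: y = c1 e^{2x} + c2 e^{3x} with y(0) = 1 and y'(0) = 4.
# Evaluating y and y' at x = 0 gives  c1 + c2 = 1  and  2 c1 + 3 c2 = 4.
A = np.array([[1.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 4.0])

c1, c2 = np.linalg.solve(A, b)
print(c1, c2)   # -1.0 2.0, i.e. y(x) = 2 e^{3x} - e^{2x}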
Now that we know how to find the general solution of a second-order homogeneous
equation, we naturally want to know how to do the same for a nonhomogeneous equation.
This is provided by the following.
Theorem 4. Suppose that the functions p, q, and f are all continuous on an open
interval I, yp is a particular solution of the nonhomogeneous equation (2.12), and y1 ,
y2 are linearly independent solutions of the homogeneous equation (2.14). Then every
solution y of (2.12) can be written in the form
y(x) = yp (x) + c1 y1 (x) + c2 y2 (x)
for some constants c1 , c2 .
Proof . Since L(yp ) = f = L(y), by linearity we have L(y − yp ) = L(y) − L(yp ) = 0.
So y − yp is a solution of the homogeneous equation (2.14), and we can use Theorem 2
to find constants c1 , c2 so that
y(x) − yp (x) = c1 y1 (x) + c2 y2 (x).
But this is what we wanted to show. □
Another way of stating this theorem is that the general solution of (2.12) can be
written as
y(x) = yp (x) + yc (x),
(2.19)
where yp (x) is any particular solution of (2.12) and yc (x) is the general solution of the
associated homogeneous equation (2.14); yc is called the complementary solution for
(2.12).
Let us consider an example of using the general solution to solve an initial-value
problem for a nonhomogeneous equation.
Example 4. Find the solution for the initial-value problem
y 00 + y = 2ex ,
y(0) = 1, y 0 (0) = 0.
Solution. We first want to find two linearly independent solutions of the associated
homogeneous equation y″ + y = 0. But, in Example 1, we observed that y1(x) = cos x
and y2 (x) = sin x are solutions, and they are obviously linearly independent (or we can
compute the Wronskian to confirm this). So our complementary solution is
yc (x) = c1 cos x + c2 sin x.
The general solution for a
nonhomogeneous equation
requires a particular solution and the general solution of the homogeneous
equation
We need to have a particular solution yp of y 00 + y = 2ex . In Section 2.5 we will discuss
a systematic way of finding yp , but in this case we might notice that yp (x) = ex works.
Thus our general solution is
y(x) = ex + c1 cos x + c2 sin x,
and it is just a matter of finding the constants c1 and c2 so that the initial conditions
are satisfied:
y(x) = e^x + c1 cos x + c2 sin x ⇒ y(0) = 1 + c1 = 1 ⇒ c1 = 0,
y′(x) = e^x − c1 sin x + c2 cos x ⇒ y′(0) = 1 + c2 = 0 ⇒ c2 = −1.
So the solution is y(x) = e^x − sin x. □
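The same initial-value problem can be checked with a computer algebra system. A minimal sketch, assuming Python with sympy (not part of the text):

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# y'' + y = 2 e^x with y(0) = 1 and y'(0) = 0, as in Example 4.
ode = sp.Eq(y(x).diff(x, 2) + y(x), 2*sp.exp(x))
sol = sp.dsolve(ode, y(x),
                ics={y(0): 1, y(x).diff(x).subs(x, 0): 0})
print(sol)   # y(x) = exp(x) - sin(x)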
Exercises
For Exercises 1-4 below, (a) verify that y1 and y2 satisfy the given second-order equation,
and (b) find the solution satisfying the given initial conditions (I.C.).
1. y 00 − y = 0; y1 (x) = ex , y2 (x) = e−x . I.C. y(0) = 1, y 0 (0) = 0.
2. y 00 − 3y 0 + 2y = 0; y1 (x) = ex , y2 (x) = e2x . I.C. y(0) = 0, y 0 (0) = −1.
3. y 00 − 2y 0 + y = 0; y1 (x) = ex , y2 (x) = x ex . I.C. y(0) = 1, y 0 (0) = 3.
4. y 00 + 2y 0 + 2y = 0; y1 (x) = e−x cos x, y2 (x) = e−x sin x. I.C. y(0) = 1, y 0 (0) = 1.
For Exercises 5-8 below, determine whether the given pair of functions is linearly independent on I = (0, ∞).
5. f (x) = 1 + x2 , g(x) = 1 − x2 .
7. f (x) = 3 x2 , g(x) = 4 e2 ln x .
6. f (x) = cos x, g(x) = sin x.
8. f (x) = x, g(x) = x ex .
For Exercises 9-12 below, (a) verify that yp satisfies the given second-order equation,
(b) verify that y1 and y2 satisfy the associated homogeneous equation, and (c) find the
solution of the given equation with initial conditions (I.C.).
9. y 00 − y = x; yp (x) = −x, y1 (x) = ex , y2 (x) = e−x . I.C. y(0) = 1, y 0 (0) = 0.
10. y 00 + 9y = 3; yp (x) = 1/3, y1 (x) = cos 3x, y2 (x) = sin 3x. I.C. y(0) = 0, y 0 (0) = 1.
11. y 00 − 2y 0 + 2y = 4x; yp (x) = 2x + 2, y1 (x) = ex cos x, y2 (x) = ex sin x.
I.C. y(0) = 0 = y 0 (0).
12. x2 y 00 − 2x y 0 + 2y = 2; yp = 1, y1 (x) = x, y2 (x) = x2 . I.C. y(1) = 0 = y 0 (1).
The following exercise proves Lemma 1 concerning the linear systems (2.15) and (2.16).
13. a. If a = c = 0 or b = d = 0, show (2.15) has a nontrivial solution (x1, x2).
b. Assuming a ≠ 0 and ad = bc, show that (2.15) has a nontrivial solution.
c. If (2.15) has a nontrivial solution, show that ad − bc = 0.
d. If ad − bc ≠ 0, show that (2.16) has a solution (x1, x2) for any choice of (y1, y2).
e. If ad − bc ≠ 0, show that the solution (x1, x2) in (d) is unique.
2.3 Homogeneous Equations with Constant Coefficients
In the previous section we saw the importance of having linearly independent solutions
in order to obtain the general solution for a homogeneous 2nd-order linear differential
equation. In this section we shall describe how to find these linearly independent solutions when the equation has constant coefficients. Since the method works for nth-order
equations, not just for n = 2, we shall initially describe it in this more general context.
Let us recall from Section 2.1 that a homogeneous linear differential equation
of order n with constant coefficients can be written in the form
y (n) + a1 y (n−1) + · · · + an y = 0,
(2.20)
where the a1 , . . . , an are constants. Alternatively, (2.20) can be written as Ly = 0,
where L is the nth-order differential operator
L = Dn + a1 Dn−1 + · · · + an .
(2.21)
In fact, if we introduce the characteristic polynomial for (2.20),
p(r) = rn + a1 rn−1 + · · · + an ,
(2.22)
then we can write the differential operator L as L = p(D).
We begin our analysis of (2.20) with the simplest case n = 1, i.e. we want to solve
(D + a1 )y = 0. Let us change notation slightly and solve
(D − r1 )y = 0.
(2.23)
But this is just y 0 = r1 y which we can easily solve to find y(x) = C er1 x . In particular,
with C = 1 we have the exponential solution
y1 (x) = er1 x .
(2.24)
But this calculation has implications for the general case. Namely, suppose that r1 is a
root of the characteristic polynomial p, i.e. satisfies the characteristic equation
p(r) = 0,
(2.25)
Then p(r) = q(r)(r − r1 ) for some polynomial q of degree n − 1, and so
Ly1 = p(D)y1 = q(D)(D − r1 )y1 = 0
In other words, we see that if r1 is a root of the characteristic polynomial p, then
y1 (x) = er1 x is a solution of (2.20). Since an nth-order polynomial can have up to n
roots, this promises to generate several solutions of (2.20).
Example 1. Find solutions of the 3rd-order differential equation y 000 + 3y 00 + 2y 0 = 0.
Solution. The characteristic equation is r3 + 3r2 + 2r = 0. Now cubic polynomials can
generally be difficult to factor, but in this case r itself is a factor so we easily obtain
r3 + 3r2 + 2r = r (r2 + 3r + 2) = r (r + 1) (r + 2).
Since we have three roots r = 0, −1, −2, we have three solutions:
y1(x) = e^(0·x) = 1,   y2(x) = e^(−x),   y3(x) = e^(−2x). □
Generalizing Example 1, if the characteristic equation for (2.20) has n distinct real
roots r1 , . . . , rn , then we get n solutions y1 (x) = er1 x , . . . , yn (x) = ern x . But the
fact that the characteristic polynomial factors as (r − r1 ) · · · (r − rn ) means that the
differential operator L can be similarly factored:
L = (D − r1 )(D − r2 ) · · · (D − rn ).
(2.26)
(Incidentally, notice that the order in which the factors (D − ri ) appear does not matter
since (D − ri)(D − rj) = D² − (ri + rj) D + ri rj = (D − rj)(D − ri).)
But now suppose that the roots of p(r) are not all distinct; can we still generate as
many solutions? For example, let us generalize (2.23) and consider
(D − r1 )m1 y = 0,
(2.27)
for an integer m1 ≥ 1; can we generate m1 solutions? Let us try to solve this by
generalizing (2.24) to consider
y(x) = u(x) er1 x ,
(2.28)
where the function u(x) is to be determined. But we can easily calculate
(D − r1) y = u′ e^(r1 x) + u r1 e^(r1 x) − r1 u e^(r1 x) = u′ e^(r1 x)
⋮
(D − r1)^(m1) y = (D^(m1) u) e^(r1 x).
Therefore, we want u to satisfy Dm1 u = 0, which means that u can be any polynomial
of degree less than m1 : u(x) = c0 + c1 x + c2 x2 + · · · + cm1 −1 xm1 −1 . But this means
that we have indeed generated m1 solutions of (2.27):
y1 (x) = er1 x , y2 (x) = x er1 x , . . . , ym1 (x) = xm1 −1 er1 x .
(2.29)
For the same reasons as before, this construction can be extended to handle operators
L whose characteristic polynomial factors into powers of linear terms:
L = p(D) = (D − r1 )m1 (D − r2 )m2 · · · (D − rk )mk ,
(2.30)
where the r1 , . . . , rk are distinct.
Example 2. Find solutions of the 3rd-order differential equation y 000 − 2y 00 + y 0 = 0.
Solution. The characteristic polynomial factors as p(r) = r³ − 2r² + r = r (r − 1)² = 0.
The roots are r = 0 and r = 1 (double). So r = 0 contributes one solution, namely
y0 = 1, and r = 1 contributes two solutions, namely y1 (x) = ex and y2 (x) = x ex .
2
Evidently, generating all solutions by this method requires us to be able to completely
factor the characteristic polynomial p(r). For higher-order equations, this could be
problematic; but we know how to factor quadratic polynomials (possibly encountering
complex roots), so we now restrict our attention to second-order equations.
Second-Order Equations
We shall describe how to find the general solution for all second-order homogeneous
equations. Let us change notation slightly and write (2.20) for n = 2 as
a y 00 + b y 0 + c y = 0,
(2.31)
where a ≠ 0. The characteristic equation for (2.31) is
a r2 + b r + c = 0,
(2.32)
which is solved using the quadratic formula
r = ( −b ± √(b² − 4ac) ) / (2a).
As is often the case with quadratic equations, we encounter different cases depending
on the roots. The case of distinct real roots is the simplest.
Theorem 1. If (2.32) has distinct real roots r1 , r2 , then the general solution of (2.31)
is given by
y(x) = c1 er1 x + c2 er2 x .
Proof. If we let y1 (x) = er1 x and y2 (x) = er2 x , then we have two solutions that
are not constant multiples of each other, so they are linearly independent. (Linear
independence can also be checked using the Wronskian; cf. Exercise 1.) Therefore,
Theorem 2 in Section 2.2 implies the general solution is given by c1 er1 x + c2 er2 x . 2
Case 1: Distinct Real
Roots
Example 3. Find the general solution of y 00 + 2y 0 − 3y = 0.
Solution. The characteristic equation is r2 + 2r − 3 = 0. We can factor it to find the
roots:
r2 + 2r − 3 = (r + 3)(r − 1) ⇒ r = −3, 1.
Consequently, the general solution is
y(x) = c1 ex + c2 e−3x .
2
Now let us consider the case that (2.32) has a double real root r1 = −b/(2a) (since
b2 − 4ac = 0). This means that the characteristic polynomial factors as ar2 + br + c =
a(r − r1 )2 , which in turn means that (2.31) can be written as
(aD2 + bD + c)y = a(D − r1 )2 y = 0.
(2.33)
But we have seen above that this has two distinct solutions
y1 (x) = er1 x
and y2 (x) = x er1 x .
Observing that these solutions are not constant multiples of each other (or using the
Wronskian; cf. Exercise 1), we conclude that they are linearly independent and hence
generate the general solution of (2.31):
Case 2: Double Real
Root
Theorem 2. If (2.32) has a double real root r1 , then the general solution of (2.31) is
given by
y(x) = c1 er1 x + c2 x er1 x .
Example 4. Find the general solution of y 00 + 2y 0 + y = 0.
Solution. The characteristic equation is r2 + 2r + 1 = (r + 1)2 = 0, so r = −1 is a
double root. The general solution is therefore
y(x) = c1 e−x + c2 x e−x .
2
Finally we recall that (2.32) need not have any real roots: for example, the quadratic
formula shows that r2 + 2r + 2 = 0 has two complex roots, r = −1 ± i. In fact, since
we are assuming that a, b, and c are all real numbers, if r = α + iβ is a complex root
(where α, β are real numbers), then the complex conjugate r̄ = α − iβ is also a root:
a r² + b r + c = 0   ⇒   a r̄² + b r̄ + c = (the conjugate of a r² + b r + c) = 0.
Case 3: Complex Roots. (For a review of complex numbers, see the Appendix.)
So we have two solutions of (2.31):
y1(x) = e^(rx) = e^((α+iβ)x)   and   y2(x) = e^(r̄x) = e^((α−iβ)x).
However, these are both complex-valued solutions of the real-valued equation (2.31); if
possible, can we find real-valued solutions? To do this, we need to use Euler’s formula
(see Appendix A):
eiθ = cos θ + i sin θ.
Applying this to both y1 and y2 we find
y1 (x) = e(α+iβ)x = eαx eiβx = eαx (cos βx + i sin βx)
y2 (x) = e(α−iβ)x = eαx e−iβx = eαx (cos βx − i sin βx),
where in the last step we used cos(−βx) = cos βx and sin(−βx) = − sin βx, since cosine
is an even function and sine is an odd function. But then by linearity we find
(y1 + y2)/2 = e^(αx) cos βx is a real-valued solution of (2.31), and
(y1 − y2)/(2i) = e^(αx) sin βx is a real-valued solution of (2.31).
Theorem 3. If (2.32) has a complex-valued root r = α + iβ where β ≠ 0, then the
general solution of (2.31) is given by
y(x) = eαx (c1 cos βx + c2 sin βx).
Proof. We have seen that ỹ1 (x) = eαx cos βx and ỹ2 (x) = eαx sin βx are both solutions
of (2.31), and they are not constant multiples of each other, so they are linearly independent. (Linear independence can also be checked using the Wronskian; cf. Exercise
1.)
2
Example 5. Find the general solution of y″ + 8y′ + 20y = 0.
Solution. The characteristic equation is r² + 8r + 20 = 0, which we solve using the quadratic formula
r = ( −8 ± √(64 − 80) ) / 2 = −4 ± 2i.
We see that we have complex conjugate roots, so let us select one, say r = −4 + 2i. Theorem 3 tells us that the general solution is
y(x) = e^(−4x) (c1 cos 2x + c2 sin 2x). □
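Finding and classifying the roots can also be done numerically. A minimal sketch, assuming Python with numpy (illustrative only), applied to the characteristic equation of Example 5:

import numpy as np

# Coefficients [a, b, c] of the characteristic polynomial r^2 + 8 r + 20.
a, b, c = 1.0, 8.0, 20.0
print(np.roots([a, b, c]))        # the complex pair -4 + 2j, -4 - 2j

# The three cases of this section, read off from the discriminant b^2 - 4ac.
disc = b**2 - 4*a*c
if disc > 0:
    print("Case 1: distinct real roots")
elif disc == 0:
    print("Case 2: double real root")
else:
    print("Case 3: complex conjugate roots")   # printed here, since disc = -16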
Let us recall that the general solution is used to solve initial-value problems by
evaluating the constants. Thus, in all cases we can now find the solution of (2.31)
satisfying the initial conditions
y(x0 ) = y0
and y 0 (x0 ) = y1 .
Examples are given in the Exercises.
Exercises
1. Use the Wronskian to verify that the following pairs of functions encountered in
the proofs of Theorems 1, 2, and 3 are linearly independent:
(a) y1 (x) = er1 x and y2 (x) = er2 x where r1 , r2 are distinct real numbers,
(b) y1 (x) = erx and y2 (x) = x erx where r is a real number,
(c) ỹ1 (x) = eαx cos βx and ỹ2 (x) = eαx sin βx where α, β are real numbers with
β 6= 0.
2. Find the general solution
(a) y″ − 4y = 0.
(b) y″ + 4y = 0.
(c) y″ + 6y′ + 9y = 0.
(d) y″ + 2y′ − 15y = 0.
(e) 9y″ + 12y′ + 4y = 0.
(f) y″ + 8y′ + 25y = 0.
(g) y″ + 2y′ − 2y = 0.
(h) y″ − 6y′ + 11y = 0.
3. Find the solution of the initial-value problem
(a) y″ − 9y = 0, y(0) = 1, y′(0) = −1.
(b) y″ + 9y = 0, y(0) = 1, y′(0) = −1.
(c) y″ − 10y′ + 25y = 0, y(0) = −1, y′(0) = 1.
(d) y″ − 6y′ + 25y = 0, y(0) = 1, y′(0) = −1.
(e) 2y″ − 7y′ + 3y = 0, y(0) = 0, y′(0) = 2.
(f) y″ − 4y′ + 5y = 0, y(0) = 1, y′(0) = 0.
2.4 Free Mechanical Vibrations
In this section we will apply the techniques developed in the preceding sections of this
chapter to study the homogeneous second-order linear differential equations associated
with a free (i.e. unforced) mechanical vibration. One simple model of such a vibration is
the spring-mass system described in Section 2.1, namely we consider an object of mass
m attached to a spring which exerts a linear restorative force with spring-constant k
and whose motion is subject to a linear resistive force with damping coefficient c:
m d²x/dt² + c dx/dt + kx = 0,     (2.34)
Fig.1. Spring-mass-dashpot System.
where x denotes the displacement from the equilibrium position at x = 0. Note that
the damping force could be due to friction if the mass is supported by a horizontal
surface, or due to air resistance if the mass is moving through the air. In a controlled
experiment, the damping force is usually induced by a “dashpot,” a device designed
to exert a resistive force proportional to the velocity of the object; an example of a
dashpot is a shock absorber, like one has in a car suspension. Note that we have used x
as the unknown function, suggesting that the motion is lateral; we can also use (2.34)
for vertical vibrations provided we factor out the effect of gravity (see Exercise 8).
Since (2.34) has constant coefficients, we know that we can find the general solution
(and hence the solution for initial-value problems) using the roots of the characteristic
equation. However, there are different cases depending upon the values of m, k, and c.
We start with the simplest case of undamped motion, i.e. c = 0.
Undamped Motion
Suppose that the motion occurs in a vacuum so there is no air resistance, and there are
no other damping factors. Then we take c = 0 in (2.34) and obtain
m d²x/dt² + kx = 0.     (2.35)
The characteristic equation is m r² + k = 0, which has purely imaginary roots
r = ± iω,   where ω = √(k/m),
and so the general solution of (2.35) is
x(t) = c1 cos ωt + c2 sin ωt.
(2.36)
This function is clearly a periodic function with period 2π/ω, and the quantity ω is
called the circular frequency.
Given initial conditions on x and dx/dt, we can evaluate the constants c1 and c2 and
obtain a particular solution. While the solution is clearly periodic, it is difficult to see
the amplitude and shape of the graph. However, a little trigonometry will enable us to
put the solution (2.36) into the more convenient amplitude-phase form
x(t) = A cos(ωt − φ)
where A > 0 and 0 ≤ φ < 2π,
(2.37)
in which A is clearly the amplitude of the periodic motion. The angle φ is called the
phase shift, and controls how much the graph is shifted from that of A cos ωt: the first
peak in the graph of (2.37) occurs at
t* = φ/ω     (2.38)
instead of at t = 0 (see Fig.2). The quantity (2.38) is called the time lag.
Fig.2. Undamped Motion.
In order to put the solution (2.36) in the form (2.37), we use the difference of angles formula for cosine
cos(α − β) = cos α cos β + sin α sin β,
to conclude
cos(ωt − φ) = cos ωt cos φ + sin ωt sin φ.
Comparing this with (2.36), we see that we need A and φ to satisfy
A cos φ = c1
and A sin φ = c2 .
(2.39)
To determine A from these equations, we square and add them, then take the square root:
A² cos² φ + A² sin² φ = A² = c1² + c2²   ⇒   A = √(c1² + c2²).
To obtain the value of φ, we can divide them:
A sin φ / (A cos φ) = tan φ = c2/c1.
From this formula, we may be tempted to conclude that φ = tan−1 (c2 /c1 ), but we
need to be careful: tan−1 returns a value in (−π/2, π/2) and we want 0 ≤ φ < 2π!
Consequently, we may need to add either π or 2π to tan−1 (c2 /c1 ) in order to ensure
that (2.39) is indeed satisfied. Let us see how this works in an example.
Example 1. Suppose a mass of 0.5 kg is attached to a spring with spring constant
k = 2 N/m. Find the motion in amplitude-phase form if (a) the spring is compressed
1 m and then released, and (b) the spring is compressed 1 m and then given an initial
velocity of 1 m/sec towards the equilibrium position.
Solution. With m = 0.5 and k = 2, the equation (2.35) becomes
x00 + 4x = 0.
The characteristic equation r2 + 4 = 0 has imaginary roots r = ± 2 i, which means the
general solution is
x(t) = c1 cos 2t + c2 sin 2t.
We see that the oscillation has circular frequency ω = 2, and we can use the initial
conditions to determine c1 , c2 . In case (a), the mass is released with zero velocity from
the point x = −1, so the initial conditions are
x(0) = −1
and x0 (0) = 0.
(2.40)
Recall that tan−1 is the
principal branch of the
inverse tangent function
and returns values in
(−π/2, π/2).
Using these conditions to evaluate the constants, we find c1 = −1 and c2 = 0, so our
particular solution is x(t) = − cos 2t. However, this is not in amplitude-phase form
(since we want a positive amplitude); we want to write
− cos 2t = A cos(2t − φ).
Obviously we take A = 1 and from (2.39) we find that we want
cos φ = −1   and   sin φ = 0.
But this implies φ = π and our solution can be written
x(t) = cos(2t − π).
The graph appears in Figure 3; note that the time lag t* = π/2 is found by setting 2t − π = 0.
Fig.3. Solution x(t) = cos(2t − π) for Example 1a.
Now let us consider the initial conditions (b), which can be expressed as
x(0) = −1
and x0 (0) = 1.
(2.41)
(Note that x0 (0) > 0 since x(0) < 0 and the initial velocity is towards the equilibrium
at x = 0.) Using these to evaluate the constants, we find c1 = −1 and c2 = 1/2, so our
particular solution is
x(t) = − cos 2t + (1/2) sin 2t.
To put this in amplitude-phase form we want
A = √( (−1)² + (1/2)² ) = √5 / 2   and   tan φ = −1/2.
Now tan⁻¹(−0.5) ≈ −0.4636 radians, but we cannot take φ = −0.4636 since we want 0 ≤ φ < 2π. In fact, −0.4636 is in the fourth quadrant, but A cos φ = −1 < 0 and A sin φ = 1/2 > 0 imply that φ should be in the second quadrant. Consequently, we add π to tan⁻¹(−0.5):
φ = −0.4636 + π ≈ 2.678 radians.
Therefore, our solution in amplitude-phase form is
x(t) ≈ (√5 / 2) cos(2t − 2.678).
Note that the time lag is t* = 2.678/2 = 1.339. □
Fig.4. Solution to Example 1b has amplitude √5/2 and time lag 1.339.
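The quadrant bookkeeping for φ is exactly what the two-argument arctangent does automatically. Here is a minimal sketch, assuming Python (the helper name amplitude_phase is ours), that converts (c1, c2) into amplitude-phase form and reproduces both parts of Example 1.

import math

def amplitude_phase(c1, c2):
    # c1 cos(wt) + c2 sin(wt) = A cos(wt - phi) with A > 0 and 0 <= phi < 2*pi.
    A = math.hypot(c1, c2)
    phi = math.atan2(c2, c1) % (2*math.pi)   # atan2 picks the correct quadrant
    return A, phi

print(amplitude_phase(-1.0, 0.0))   # (1.0, pi)              -- Example 1(a)
print(amplitude_phase(-1.0, 0.5))   # (about 1.118, 2.678)   -- Example 1(b)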
Damped Motion: Three Cases
With c > 0, we want to find the general solution of (2.34). The characteristic equation
is mr² + cr + k = 0 with roots given by the quadratic formula:
r = ( −c ± √(c² − 4mk) ) / (2m).
We see that there are three cases:
• c2 > 4mk ⇒ two distinct real roots. This case is called overdamped.
• c2 = 4mk ⇒ one double real root. This case is called critically damped.
• c2 < 4mk ⇒ two complex conjugate roots. This case is called underdamped.
Let us investigate each case separately.
Overdamped Motion. In this case c² > 4mk implies that √(c² − 4mk) is real and 0 < √(c² − 4mk) < c. Hence the characteristic equation has two negative real roots:
r1 = ( −c − √(c² − 4mk) ) / (2m)   <   r2 = ( −c + √(c² − 4mk) ) / (2m)   <   0.
As we saw in the previous section, the general solution is of the form
x(t) = c1 er1 t + c2 er2 t ,
and we can evaluate the constants if we are given initial conditions. But regardless of
the values of c1 , c2 the fact that r1 and r2 are negative implies that x(t) decays rapidly
(exponentially) to x = 0: this is not surprising since the damping factor c is so large
that it rapidly overcomes the tendency for oscillatory motion that we observed when
c = 0. In fact, if c1 and c2 are both positive, we see that x(t) remains positive (as
it rapidly approaches 0) and never passes through the equilibrium x = 0. For certain
values of c1 and c2 it is possible for the motion to pass through the equilibrium at x = 0
once, but that is the most oscillation that can occur. See Figure 5 for the graphs of
several particular solutions satisfying x(0) = 1 but with various values for x0 (0).
Critically Damped Motion. In this case c2 = 4mk implies that there is one negative
(double) root
r = −c/(2m) < 0.
As we saw in the previous section, the general solution is of the form
x(t) = c1 e^(−ct/2m) + c2 t e^(−ct/2m) = e^(−ct/2m) (c1 + c2 t).
Although (c1 + c2 t) may grow (if c2 ≠ 0), the factor e^(−ct/2m) decays so much more rapidly that x(t) behaves much like the overdamped case: solutions can pass through x = 0 at most once.
Underdamped Motion. In this case c2 < 4mk implies that there are two complex
conjugate roots
r = −c/(2m) ± iµ,   where µ = √(4mk − c²) / (2m) > 0.
Fig.5. Overdamped vibration with x(0) = 1 and various values of x′(0).
As we saw in the previous section, the general solution is of the form
x(t) = e^(−ct/2m) (c1 cos µt + c2 sin µt).
But the parenthetical term is of the same form as the general solution in the undamped case, i.e. (2.36), so we can put it in amplitude-phase form and obtain
x(t) = A e^(−ct/2m) cos(µt − φ),   where A cos φ = c1, A sin φ = c2.     (2.42)
Fig.6. Underdamped vibration with time-varying amplitude (dotted curve).
Now this is interesting: we have exponential decay (due to the factor e^(−ct/2m)) but also oscillation (due to cos(µt − φ)). This is sometimes described as pseudo-periodic motion (with pseudo-frequency µ and pseudo-period T = 2π/µ) with time-varying amplitude A e^(−ct/2m). We are not surprised to find that, for very small damping c, the pseudo-frequency µ is very close to the frequency ω of the undamped vibration; perhaps more interesting is to note that c > 0 implies µ < ω, so the presence of damping slows down the frequency of the oscillation. See Figure 6 for a graph of pseudo-periodic motion.
Example 2. Suppose that the mass and spring of Example 1 is now attached to a
dashpot which exerts a resistive force proportional to velocity with c = 0.2 N/(m/s).
Find the motion in the two cases (a) and (b) described in Example 1.
Solution. We find that (2.34) becomes
0.5 x00 + 0.2 x0 + 2 x = 0.
The characteristic equation can be written as r² + 0.4r + 4 = 0, so the quadratic formula gives us the roots
r = ( −0.4 ± √(−15.84) ) / 2 ≈ −0.2 ± 1.99 i.
(In particular, since c² = 0.04 < 4 = 4mk, we see that the system is underdamped, which is consistent with having complex roots.) The general solution can be written as
x(t) = e^(−0.2 t) (c1 cos µt + c2 sin µt),   where µ = √3.96 ≈ 1.99.
Notice that the pseudo-frequency of this oscillation is µ ≈ 1.990, which indeed is a little
smaller than the circular frequency ω = 2 of Example 1.
Now we want to use initial conditions to evaluate the constants c1 , c2 . In case a),
we use initial conditions (2.40). First, we evaluate x(t) at t = 0 to find c1 :
x(0) = e⁰ (c1 cos 0 + c2 sin 0) = −1   ⇒   c1 = −1.
Then we differentiate x(t) to find
x′(t) = −0.2 e^(−0.2 t) (c1 cos µt + c2 sin µt) + e^(−0.2 t) (−c1 µ sin µt + c2 µ cos µt),
and we evaluate this to find c2:
x′(0) = −0.2 c1 + c2 µ = 0   ⇒   c2 = −0.2/µ ≈ −0.2/1.99 ≈ −0.101.
We conclude that the solution of our initial-value problem is
x(t) = e^(−0.2 t) ( −cos µt − (0.2/µ) sin µt ).
But now we'd like to put it in the form (2.42), so we want to find A and φ so that
A cos φ = −1   and   A sin φ = −0.2/µ ≈ −0.101.
We immediately see that A ≈ √(1 + (0.101)²) ≈ 1.01 and φ should be in the 3rd quadrant (where cos and sin are negative) satisfying tan φ = c2/c1 ≈ 0.101. Since tan⁻¹(0.101) = 0.101 is in the first quadrant, we must add π:
φ ≈ 0.101 + π ≈ 3.24.
Therefore, we may approximate the solution in case a) by
x(t) ≈ (1.01) e^(−0.2 t) cos(µt − 3.24),   µ ≈ 1.99.     (2.43)
The graph of this solution appears in Figure 7.
Fig.7. Solution to Example 2a.
Finally, in case b) we use the initial conditions (2.41) to find c1, c2. Proceeding as before, we again find c1 = −1 but now c2 = 0.8/µ, so our solution may be written
x(t) = e^(−0.2 t) ( −cos µt + (0.8/µ) sin µt ).
To obtain the form (2.42), we want A and φ to satisfy
A cos φ = −1   and   A sin φ = 0.8/µ ≈ 0.8/1.99 ≈ 0.40.
So A ≈ √(1 + (0.40)²) ≈ 1.08 and φ should be in the second quadrant (where cos is negative and sin is positive) satisfying tan φ = c2/c1 ≈ −0.40. Since tan⁻¹(−0.40) ≈ −0.38 is in the fourth quadrant, we again must add π:
φ ≈ −0.38 + π ≈ 2.76.
Therefore, we may approximate the solution in case b) by
x(t) ≈ (1.08) e^(−0.2 t) cos(µt − 2.76),   µ ≈ 1.99.     (2.44)
The graph of this solution appears in Figure 8. □
Fig.8. Solution to Example 2b.
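As a numerical cross-check, one can integrate 0.5 x″ + 0.2 x′ + 2x = 0 directly with the initial conditions of case (a) and compare against the amplitude-phase approximation (2.43). A minimal sketch, assuming Python with numpy and scipy (not part of the text):

import numpy as np
from scipy.integrate import solve_ivp

m, c, k = 0.5, 0.2, 2.0

# x1 = x, x2 = x'; the second-order equation becomes a first-order system.
def rhs(t, X):
    x1, x2 = X
    return [x2, -(c*x2 + k*x1)/m]

# Case (a): x(0) = -1, x'(0) = 0.
sol = solve_ivp(rhs, (0.0, 10.0), [-1.0, 0.0], dense_output=True, rtol=1e-8)

# Amplitude-phase approximation (2.43).
mu = np.sqrt(3.96)
approx = lambda t: 1.01 * np.exp(-0.2*t) * np.cos(mu*t - 3.24)

t = 5.0
print(sol.sol(t)[0], approx(t))   # the two values agree to about two decimals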
Other Mechanical Vibrations
The model for a spring-mass system was discussed in great detail in this section, but
the principles can be applied to many other forms of mechanical vibrations. One other
example that we shall now discuss is the motion of a pendulum consisting of a mass
m attached to one end of a rod of length L. We assume that the rod has negligible
mass and the end not attached to the mass is fixed in place, but allowed to pivot as the
pendulum swings back and forth. To describe the position of the mass at time t, let
θ(t) denote the angle that the rod makes with the vertical, so θ = 0 corresponds to the
rod hanging straight down. We want to derive a differential equation satisfied by θ(t).
We shall use the physical principle of conservation of energy, namely the sum of
the kinetic energy and the potential energy must remain constant. The kinetic energy
is Ekin = (1/2) m v², so we need to find v in terms of θ and L. But the distance along the arc of motion is Lθ and L is fixed, so v = L dθ/dt. Consequently, the kinetic energy is
Ekin = (1/2) m L² (dθ/dt)².
Fig.9. Pendulum.
The potential energy is Epot = mgh where h is the height above its equilibrium position
(when θ = 0). Figure 10 shows that L − h = L cos θ or h = L(1 − cos θ), so
Epot = mgL(1 − cos θ).
By conservation of energy, we know
(1/2) m L² (dθ/dt)² + mgL(1 − cos θ) = C,
where C is a constant.
If we differentiate both sides of this equation using the chain rule, we obtain
m L² (dθ/dt)(d²θ/dt²) + mgL sin θ (dθ/dt) = 0.
Fig.10. Pendulum trigonometry.
We can factor mL dθ/dt out of this equation and conclude
L d²θ/dt² + g sin θ = 0.     (2.45)
Now this is a nonlinear second-order differential equation, so we may fear that the techniques of this section will not apply. However, if the oscillations are small, then θ is small and by Taylor series approximation we know sin θ ≈ θ. Thus we might expect that (2.45) is well-approximated by the linear equation
L d²θ/dt² + g θ = 0.    (2.46)
This process of approximating (under certain circumstances) a nonlinear equation by a linear one is called linearization. Again, we emphasize that we can only expect solutions of (2.46) to be good approximations of the solutions of (2.45) when the oscillations are small.
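The quality of the linearization is easy to check numerically. Here is a short Python sketch (illustrative only; the values of g, L, and the initial angle are assumptions, not from the text) that integrates (2.45) and (2.46) side by side with scipy:

```python
import numpy as np
from scipy.integrate import solve_ivp

g, L = 9.8, 0.5          # illustrative values (SI units)
theta0 = 0.1             # small initial angle in radians

def nonlinear(t, y):     # y = [theta, dtheta/dt], equation (2.45)
    return [y[1], -(g / L) * np.sin(y[0])]

def linearized(t, y):    # equation (2.46)
    return [y[1], -(g / L) * y[0]]

t_eval = np.linspace(0, 10, 500)
sol_nl = solve_ivp(nonlinear, (0, 10), [theta0, 0.0], t_eval=t_eval, rtol=1e-8)
sol_ln = solve_ivp(linearized, (0, 10), [theta0, 0.0], t_eval=t_eval, rtol=1e-8)

# The two angle histories stay close for theta0 = 0.1; repeating the run with
# theta0 = 1.0 shows the linearized solution drifting noticeably out of phase.
print(np.max(np.abs(sol_nl.y[0] - sol_ln.y[0])))
```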
Exercises
In Exercises 1-5, we ignore damping forces.
1. If a 2 kg mass is attached to a spring with constant k = 8 N/m and set in motion,
find the period and circular frequency of the motion. Solution
2. A 16 lb weight is attached to a spring with constant k = 8 lb/ft and set in motion.
Find the period and circular frequency of the motion.
3. A mass of 3 kg is attached to a spring with constant k = 12 N/m, then the spring
is stretched 1 m beyond its natural length and given an initial velocity of 1 m/sec
back towards its equilibrium position. Find the circular frequency, period, and
amplitude of the motion.
4. A 2 kg mass is attached to a spring with constant k = 18 N/m. Given initial conditions x(0) = 1 = x'(0), find the motion x(t) in amplitude-phase form (2.37).
5. A 1 kg mass is attached to a spring with constant k = 16 N/m. Find the motion x(t) in amplitude-phase form (2.37) if x(0) = 1 and x'(0) = −1.
6. For the given values of mass m, damping coefficient c, spring constant k, initial
position x0 , and initial velocity v0 : i) find the solution x(t), and ii) state whether
the motion is overdamped, critically damped, or underdamped.
(a) m = 1, c = 6, k = 8, x0 = 0, v0 = 2.
(b) m = 2, c = 5, k = 2, x0 = 3, v0 = 0.
(c) m = 1, c = 2, k = 2, x0 = 1, v0 = 0.
(d) m = 4, c = 12, k = 9, x0 = 1, v0 = 1.
(e) m = 2, c = 12, k = 50, x0 = 1, v0 = 2.
7. For the following underdamped systems, find x(t) in the form A e^{−αt} cos(µt − φ),
and identify the pseudo-frequency of the oscillation. What would the circular
frequency ω be if the damping were removed?
(a) m = 1, c = 2, k = 2, x0 = 1, v0 = 0. Solution
(b) m = 2, c = 1, k = 8, x0 = −4, v0 = 2.
8. If we consider a vertical spring-mass-dashpot system, then we need to include
gravity as a force. Consequently, in place of (2.34) we have
m d²y/dt² + c dy/dt + ky = mg,
where y is the distance (measured downward) that the spring has been stretched
beyond its natural length. However, when we first attach the mass m, the vertical
spring is stretched so that the equilibrium position is now at y = mg/k. (Why?) If
we let x(t) = y(t) − mg/k denote the distance that the spring is stretched beyond
its new equilibrium position, show that x satisfies (2.34).
9. A pendulum has a mass of 10 kg attached to a rod of length 1/2 m. Use linearization to find the circular frequency and period of small oscillations. If the mass is
doubled to 20 kg, what is the effect on the period?
2.5 Nonhomogeneous Equations with Constant Coefficients
In this section we want to develop a method for finding the general solution of a second-order nonhomogeneous linear differential equation with constant coefficients. Using the
theory in Section 2.2, we know that the general solution is of the form y = yp +yc , where
yp is a particular solution and yc is the general solution for the associated homogeneous
equation. In Section 2.3 we developed a method for finding yc , so now we focus on yp .
As in the homogeneous case, the basic method works for equations of order n so we first
develop the method in that context.
Let us consider a nonhomogeneous linear differential equation of order n with constant coefficients

Ly = y^(n) + a1 y^(n−1) + · · · + an y = f(x),    (2.47)

where L is the differential operator D^n + a1 D^{n−1} + · · · + an and the coefficients a1, . . . , an
are constants; the given function f is what makes (2.47) nonhomogeneous. The technique for finding a particular solution of (2.47) is called the method of undetermined
coefficients. It involves making an educated guess about what form yp should take
based upon the form of f . This yields a trial solution yp involving one or more “undetermined coefficients” that may be evaluated simply by plugging into (2.47). Let us
proceed with some examples to see how easy it is.
Example 1. Find the general solution of y'' + 4y = 3 e^{2x}.

Solution. Although we are anxious to find yp, it is usually best to start with yc, which we need for the general solution anyway. So we solve y'' + 4y = 0 by using the roots of the characteristic equation r² + 4 = 0, which are r = ±2i. This gives us the complementary solution

yc = c1 cos 2x + c2 sin 2x.

Now what is our educated guess for yp based upon f(x) = 3 e^{2x}? We know that exponentials differentiate as exponentials, so a reasonable assumption is

yp(x) = A e^{2x},

where A is our undetermined coefficient. To see if this works and to evaluate A, we simply plug into y'' + 4y = 3 e^{2x}:

yp'' + 4yp = 4A e^{2x} + 4A e^{2x} = 8A e^{2x} = 3 e^{2x}.

We see that 8A = 3 or A = 3/8. In other words yp(x) = (3/8) e^{2x} and our general solution is

y(x) = (3/8) e^{2x} + c1 cos 2x + c2 sin 2x.
Example 2. Find the general solution of y'' + 3y' + 2y = sin x.

Solution. We start with the homogeneous equation y'' + 3y' + 2y = 0. The characteristic equation r² + 3r + 2 = 0 factors as (r + 1)(r + 2) = 0, so the complementary solution is

yc(x) = c1 e^{−x} + c2 e^{−2x}.

What does f(x) = sin x suggest that we use for yp(x)? We could try yp(x) = A sin x, but when we plug into y'' + 3y' + 2y = sin x we'll get cosine as well as sine terms on the left hand side; learning from this experience, we are led to the trial solution

yp(x) = A cos x + B sin x,

where A and B are to be determined. We plug into y'' + 3y' + 2y = sin x to find

(−A cos x − B sin x) + 3(−A sin x + B cos x) + 2(A cos x + B sin x) = sin x.

Rearranging terms on the left hand side, we obtain

(−3A + B) sin x + (A + 3B) cos x = sin x.

Comparing left and right hand sides gives us two equations that A and B must satisfy:

−3A + B = 1,    A + 3B = 0.

We easily solve these equations to find B = 1/10 and A = −3/10, so yp(x) = (1/10) sin x − (3/10) cos x and our general solution is

y(x) = (1/10) sin x − (3/10) cos x + c1 e^{−x} + c2 e^{−2x}.
Example 3. Find the general solution of y'' + 4y' + 5y = x².

Solution. The associated homogeneous equation y'' + 4y' + 5y = 0 has characteristic equation r² + 4r + 5 = 0. The quadratic formula yields roots r = −2 ± i, so the complementary solution is

yc(x) = e^{−2x} (c1 cos x + c2 sin x).

Turning to the nonhomogeneous equation, f(x) = x² is a polynomial, and derivatives of polynomials are polynomials, so we suspect yp should be a polynomial. Let us take as our trial solution

yp(x) = Ax² + Bx + C.

Plugging this into y'' + 4y' + 5y = x² and collecting terms on the left according to the power of x:

5Ax² + (8A + 5B)x + (2A + 4B + 5C) = x².

Comparing the left and right hand sides, we get the following equations:

5A = 1,    8A + 5B = 0,    2A + 4B + 5C = 0.

We easily solve these equations to find A = 1/5, B = −8/25, and C = 22/125, so the general solution is

y(x) = x²/5 − (8/25) x + 22/125 + e^{−2x} (c1 cos x + c2 sin x).
At this point, we have managed to find yp when f(x) in (2.47) is an exponential function, a periodic function involving sine or cosine, or a polynomial. We can summarize these results in a table:

f(x)                                      Expected trial yp(x)
e^{αx}, α a real number                   A e^{αx}
cos βx or sin βx, β a real number         A cos βx + B sin βx
x^k, k an integer ≥ 0                     Ak x^k + · · · + A0

Table 1. The expected trial solutions for certain functions f(x)
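As a quick sanity check (a sympy sketch, not part of the text), a computer algebra system confirms the particular solutions found by hand in Examples 1-3; the output of dsolve contains the same particular terms alongside the complementary solution:

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

odes = [
    sp.Eq(y(x).diff(x, 2) + 4 * y(x), 3 * sp.exp(2 * x)),             # Example 1
    sp.Eq(y(x).diff(x, 2) + 3 * y(x).diff(x) + 2 * y(x), sp.sin(x)),  # Example 2
    sp.Eq(y(x).diff(x, 2) + 4 * y(x).diff(x) + 5 * y(x), x ** 2),     # Example 3
]
for ode in odes:
    # dsolve returns the general solution y = y_c + y_p for each equation
    print(sp.dsolve(ode))
```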
But does this table always work? Let us consider one more example:
Example 4. Find the general solution of y'' − 3y' + 2y = e^x.

Solution. According to the table, we should take yp(x) = A e^x. But when we plug that into the equation, on the left hand side we obtain

A e^x − 3A e^x + 2A e^x = 0.

However, comparing with the right hand side of the equation we get 0 = e^x, which cannot be satisfied no matter what A is! What went wrong?

For one thing, we forgot to find the complementary solution, so let us do that now. The associated homogeneous equation is Ly = y'' − 3y' + 2y = 0, and its characteristic equation r² − 3r + 2 = (r − 1)(r − 2) = 0 has roots r = 1, 2, so we obtain

yc(x) = c1 e^x + c2 e^{2x}.

We now see that the trial function yp(x) = A e^x is a solution of the homogeneous equation Ly = 0, so there is no way that it can be a particular solution of the nonhomogeneous equation. What can we do?

We shall use a trick called the annihilator method. We first write our nonhomogeneous equation as

(D − 1)(D − 2)y = e^x.

Now let us apply the operator D − 1 to both sides; since D − 1 annihilates e^x, we get a third-order homogeneous equation

(D − 1)²(D − 2)y = 0.

But in Section 2.3, we found three distinct solutions of this homogeneous equation:

y1(x) = e^x,    y2(x) = x e^x,    and    y3(x) = e^{2x}.

The first and last of these are included in yc, so we discard them and use the middle one for our trial solution with an undetermined coefficient:

yp(x) = A x e^x.

Calculating yp' = A e^x + A x e^x and yp'' = 2A e^x + A x e^x, we plug into y'' − 3y' + 2y = e^x:

(2A e^x + A x e^x) − 3(A e^x + A x e^x) + 2A x e^x = e^x.

But the terms involving x e^x on the left hand side cancel each other, leaving us with only −A e^x on the left hand side. Comparing with e^x on the right hand side, we obtain A = −1. We conclude that yp(x) = −x e^x is a particular solution of y'' − 3y' + 2y = e^x and the general solution is

y(x) = −x e^x + c1 e^x + c2 e^{2x}.
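A brief sympy sketch (illustrative only) confirms the key point of Example 4: the naive trial solution A e^x is annihilated by the operator L, while the corrected trial solution A x e^x is not:

```python
import sympy as sp

x, A = sp.symbols('x A')
L = lambda f: f.diff(x, 2) - 3 * f.diff(x) + 2 * f   # the operator L of Example 4

# The naive trial solution is annihilated by L ...
print(sp.simplify(L(A * sp.exp(x))))                 # -> 0

# ... while A*x*e^x leaves a usable equation to solve for A:
residual = sp.simplify(L(A * x * sp.exp(x)) - sp.exp(x))
print(sp.solve(residual, A))                         # -> [-1], i.e. y_p = -x e^x
```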
Let us now consider how to generalize Table 1 to include more complicated functions
f (x), as well as cases (as in Example 4) where the expected trial solution yp fails. At this
point we restrict our attention to second-order equations for which we have complete
knowledge of the roots of the characteristic polynomial.
Second-Order Equations
Let us consider the second-order nonhomogeneous linear differential equation

Ly = a y'' + b y' + c y = f(x),    (2.48)

which has characteristic polynomial

p(r) = a r² + b r + c,

so that we can write the differential operator as L = p(D). Let us recall that there are three cases for the roots of p(r) = 0: i) distinct real roots r1, r2, ii) a double real root r1 = r2, or iii) complex conjugate roots r = α ± iβ.

The type of function f(x) that we shall consider in (2.48) is

f(x) = x^k e^{αx} cos βx  or  x^k e^{αx} sin βx,    (2.49)

where k is a nonnegative integer and α, β are real numbers. Based upon Table 1, our expected trial solution is

yp(x) = (Ak x^k + · · · + A0) e^{αx} cos βx + (Bk x^k + · · · + B0) e^{αx} sin βx.    (2.50)

What could go wrong? Some term in (2.50) could be annihilated by p(D). What are the possibilities?

• If β = 0 and p(α) = 0, then α is a real root with multiplicity m = 1 or 2. Then p(D)[A0 e^{αx}] = 0, so we must multiply yp by x^m.

• If p(α ± iβ) = 0, then p(D)[A0 e^{αx} cos βx] = 0 = p(D)[B0 e^{αx} sin βx], so we must multiply yp by x. (Note that p(α + iβ) = 0 ⇔ p(α − iβ) = 0.)
We summarize these results in the following:

f(x)                                        General trial yp(x)
x^k e^{αx} cos βx or x^k e^{αx} sin βx      x^m (Ak x^k + · · · + A0) e^{αx} cos βx + x^m (Bk x^k + · · · + B0) e^{αx} sin βx,
                                            where m is the smallest integer 0, 1, or 2 such that no term is annihilated by L

Table 2. The general case for undetermined coefficients
Example 5. Find the general solution of y'' − 4y' + 5y = e^{2x} sin x.

Solution. We first find the complementary solution by solving y'' − 4y' + 5y = 0. The characteristic equation r² − 4r + 5 = 0 has complex roots r = 2 ± i, so

yc(x) = e^{2x} (c1 cos x + c2 sin x).

Our first guess for finding a particular solution would be yp = e^{2x} (A cos x + B sin x), but this is a solution of the homogeneous equation, so we must multiply by x:

yp(x) = x e^{2x} (A cos x + B sin x).

Let us differentiate this twice:

yp'(x) = e^{2x} (A cos x + B sin x) + x e^{2x} [(2A + B) cos x + (−A + 2B) sin x],

yp''(x) = e^{2x} [(4A + 2B) cos x + (−2A + 4B) sin x] + x e^{2x} [(3A + 4B) cos x + (−4A + 3B) sin x].

If we plug this into y'' − 4y' + 5y = e^{2x} sin x and simplify, after some algebra we get

2B cos x − 2A sin x = sin x    ⇒    A = −1/2  and  B = 0.

We conclude that our particular solution is

yp(x) = −(1/2) x e^{2x} cos x,

and the general solution is

y(x) = −(1/2) x e^{2x} cos x + e^{2x} (c1 cos x + c2 sin x).
Let us do one more example to illustrate the different roles played by x^m and Ak x^k + · · · + A0 in Table 2.
Example 6. Find the general solution for y'' + y = x sin x.

Solution. We first note that the general solution of the homogeneous equation y'' + y = 0 is

yc(x) = c1 cos x + c2 sin x.

Our first guess for a particular solution would be to use m = 0 in Table 2, i.e. yp(x) = (A1 x + A0) cos x + (B1 x + B0) sin x. But both A0 cos x and B0 sin x are solutions of the homogeneous equation, so we must use m = 1 in Table 2:

yp(x) = x (A1 x + A0) cos x + x (B1 x + B0) sin x = (A1 x² + A0 x) cos x + (B1 x² + B0 x) sin x.

Now no terms in yp are solutions of the homogeneous equation, so this should work! Let us proceed by differentiating yp:

yp' = (2A1 x + A0) cos x − (A1 x² + A0 x) sin x + (2B1 x + B0) sin x + (B1 x² + B0 x) cos x,

yp'' = 2A1 cos x − (2A1 x + A0) sin x − (2A1 x + A0) sin x − (A1 x² + A0 x) cos x + 2B1 sin x + (2B1 x + B0) cos x + (2B1 x + B0) cos x − (B1 x² + B0 x) sin x.

Plug this into the equation:

y'' + y = [4B1 x + 2A1 + 2B0] cos x + [−4A1 x − 2A0 + 2B1] sin x = x sin x.

Comparing the coefficients, we have the following equations:

−4A1 = 1,    4B1 = 0,    2A1 + 2B0 = 0,    −2A0 + 2B1 = 0.

We easily solve these to find A1 = −1/4, B1 = 0, A0 = 0, and B0 = 1/4, so our particular solution is

yp(x) = −(x²/4) cos x + (x/4) sin x.

Therefore, the general solution is

y(x) = −(x²/4) cos x + (x/4) sin x + c1 cos x + c2 sin x.
Finally, let us observe that it is easy to generalize the method of undetermined
coefficients to functions of the form f (x) = f1 (x) + f2 (x), where f1 and f2 are of
different types in the table: take yp to be the sum of the two trial solutions indicated.
(Be careful to use different m parameters for the two indicated trial solutions.) Let us
illustrate this with an example.
Example 7. Find the general solution for y'' − 4y = 8 e^{2x} + 5 cos x.

Solution. We first find the general solution of the associated homogeneous equation y'' − 4y = 0. The characteristic equation is r² − 4 = (r − 2)(r + 2) = 0, so

yc(x) = c1 e^{2x} + c2 e^{−2x}.

In f(x) = 8 e^{2x} + 5 cos x, the term e^{2x} is a solution of the homogeneous equation, but cos x and sin x are not. Consequently, we take as our trial solution

yp(x) = A x e^{2x} + B cos x + C sin x.

We calculate yp' = A e^{2x} + 2A x e^{2x} − B sin x + C cos x and yp'' = 4A e^{2x} + 4A x e^{2x} − B cos x − C sin x. We plug these into our differential equation (the x e^{2x} terms cancel) to obtain

4A e^{2x} − 5B cos x − 5C sin x = 8 e^{2x} + 5 cos x.

We conclude A = 2, B = −1, and C = 0, so yp(x) = 2 x e^{2x} − cos x, and the general solution is

y(x) = 2 x e^{2x} − cos x + c1 e^{2x} + c2 e^{−2x}.
Remark 1. There is another method for finding a particular solution of a nonhomogeneous equation when undetermined coefficients fails, either because the equation has
nonconstant coefficients, or f (x) is not of the form appearing in Table 2. This method,
called “variation of parameters,” is discussed in Exercises 3 and 4.
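For readers who want to see variation of parameters in action before working the exercises, here is a minimal sympy sketch (not from the text) of the recipe, assuming two independent homogeneous solutions y1, y2 are already known; the worked case is Exercise 4(a):

```python
import sympy as sp

x = sp.symbols('x')

def variation_of_parameters(y1, y2, f):
    """Particular solution of y'' + p y' + q y = f from two homogeneous solutions."""
    W = y1 * sp.diff(y2, x) - y2 * sp.diff(y1, x)        # Wronskian W(y1, y2)
    u1 = sp.integrate(-y2 * f / W, x)
    u2 = sp.integrate(y1 * f / W, x)
    return sp.simplify(u1 * y1 + u2 * y2)

# y'' + 3y' + 2y = e^x with y1 = e^{-x}, y2 = e^{-2x}:
print(variation_of_parameters(sp.exp(-x), sp.exp(-2 * x), sp.exp(x)))   # exp(x)/6
```

The printed result, e^x/6, agrees with the answer undetermined coefficients would give for this right-hand side.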
Exercises
1. Find the general solution y(x):
(a) y'' + 3y' + 2y = 4 e^{−3x}.
(b) y'' + 4y = 2 cos 3x.
(c) y'' + y' − 2y = 4x + 1.
(d) y'' + 9y = 2 sin 3x.
(e) y'' + 2y' = 4x + 3.
(f) y'' − y' + 2y = 3 e^x + 4 cos 2x.
2. Find the solution of the initial-value problem:
(a) y'' − 4y = 8 e^{2x}, y(0) = 1, y'(0) = 0.
(b) y'' + 4y' + 4y = 5 e^{−2x}, y(0) = 1, y'(0) = −1.
(c) y'' + 3y' + 2y = x + 6 e^x, y(0) = 0, y'(0) = 1.
(d) y'' + y = 1 + 3 sin x, y(0) = 2, y'(0) = 0.
3. Another method for finding a particular solution of the nonhomogeneous equation

y'' + p(x) y' + q(x) y = f(x)    (2.51)

is called variation of parameters. Starting with linearly independent solutions y1, y2 of the associated homogeneous equation, we write

yp(x) = u1(x) y1(x) + u2(x) y2(x),    (2.52)

and try to find functions u1, u2 so that yp satisfies (2.51).

(a) Impose the additional condition on u1, u2 that

u1' y1 + u2' y2 = 0,    (2.53)

and plug (2.52) into (2.51) to obtain

u1' y1' + u2' y2' = f.    (2.54)

(b) Eliminate u2' from (2.53)-(2.54) and solve for u1' to find

u1' = − y2 f / W(y1, y2).

Obtain an analogous expression for u2'.

(c) Show that yp is given by

yp(x) = −y1(x) ∫ [y2(x) f(x) / W(y1, y2)(x)] dx + y2(x) ∫ [y1(x) f(x) / W(y1, y2)(x)] dx.    (2.55)
4. Use (2.55) to find a particular solution:
(a) y'' + 3y' + 2y = e^x
(b) y'' + y = sin² x
(c) y'' + y = tan x
(d) y'' + 9y = sec 3x
2.6 Forced Mechanical Vibrations
In this section we apply the techniques of the preceding section to study the nonhomogeneous second-order linear differential equation associated to a forced mechanical
vibration. In particular, we consider the spring-mass system (damped or undamped)
that we studied in Section 2.4, but add a time-varying external forcing function f (t).
According to Newton’s Second Law, the equation governing this motion is
m d²x/dt² + c dx/dt + kx = f(t).    (2.56)
We know that the general solution of (2.56) is of the form
x(t) = xp (t) + xc (t),
where xc(t) is the general solution of the homogeneous equation m x'' + c x' + kx = 0 that
we discussed in detail in Section 2.4. So the problem is reduced to finding a particular
solution xp (t), which we can do using the method of undetermined coefficients.
We saw that we can use the method of undetermined coefficients when f(t) is a linear combination of terms of the form t^k e^{αt} cos βt and t^k e^{αt} sin βt. However, we shall usually consider a simple periodic forcing function f(t) of the form

f(t) = F0 cos ωt.    (2.57)
Notice that f (t) has amplitude F0 and a circular frequency ω that we call the forcing
frequency. We know that damping can make a big difference in the form of the general
solution, so let us begin by considering the undamped case.
If we take c = 0 in (2.56), we obtain

m d²x/dt² + kx = F0 cos ωt.    (2.58)
We know that the complementary solution is

xc(t) = c1 cos ω0 t + c2 sin ω0 t,    where ω0 = √(k/m).
The quantity ω0 is called the natural frequency of the spring-mass system, and is
independent of the forcing frequency ω. It is not surprising that the behavior of the
particular solution depends on whether these two frequencies are equal or not.
Fig.1. Spring-mass system
with external forcing.
Undamped, Forced Vibrations: ω ≠ ω0
(We can use A cos ωt instead of A cos ωt + B sin ωt since the left hand side does not involve x'.)
As our trial solution for (2.58), let us take
xp (t) = A cos ωt.
Since xp''(t) = −ω² A cos ωt, we obtain from (2.58)

(−m ω² + k) A cos ωt = F0 cos ωt,

so

A = F0 / (k − m ω²) = (F0/m) / (ω0² − ω²).

Thus our particular solution is

xp(t) = [(F0/m) / (ω0² − ω²)] cos ωt,

and our general solution is

x(t) = [(F0/m) / (ω0² − ω²)] cos ωt + c1 cos ω0 t + c2 sin ω0 t.    (2.59)
Once we have determined c1 and c2 from given initial conditions, we can put that part
of the solution into amplitude-phase form and write the solution as
Fig.2. Spring-mass system with periodic external forcing.

x(t) = [(F0/m) / (ω0² − ω²)] cos ωt + C cos(ω0 t − φ).    (2.60)
In this form, it is clear that the solution is a superposition of two vibrations, one with
circular frequency ω0 (the natural frequency) and one with frequency ω (the forcing
frequency). If ω and ω0 happen to satisfy pω = qω0 for some positive integers p and q,
then the solution is periodic with period T = 2πp/ω0 = 2πq/ω; otherwise, the vibration
is not periodic, and can appear quite complicated (see Figure 2).
Although we have assumed ω 6= ω0 , it is possible for ω to be close to ω0 ; in this case
something quite interesting occurs. For simplicity, let us assume that the mass is initially
at rest when the external force begins, i.e. we have initial conditions x(0) = 0 = x0 (0).
Using these in (2.59), we find

(F0/m) / (ω0² − ω²) + c1 = 0    and    c2 = 0,

so our solution is

x(t) = [(F0/m) / (ω0² − ω²)] (cos ωt − cos ω0 t).
Now we use the trigonometric identity 2 sin A sin B = cos(A − B) − cos(A + B) with
A = (ω0 + ω)t/2 and B = (ω0 − ω)t/2 to put our solution in the form
x(t) = [(2F0/m) / (ω0² − ω²)] sin((ω0 − ω)t/2) sin((ω0 + ω)t/2).    (2.61)
Since ω ≈ ω0 , we see that sin(ω0 − ω)t/2 has very small frequency (i.e. is slowly varying)
compared with sin(ω0 + ω)t/2. Therefore, we can consider our solution to be rapidly
varying with frequency (ω0 + ω)/2 and a slowly varying amplitude (see Figure 3):
x(t) = A(t) sin((ω0 + ω)t/2),    where A(t) = [(2F0/m) / (ω0² − ω²)] sin((ω0 − ω)t/2).
This phenomenon is experienced in acoustics in the form of beats: if two musical
instruments play two notes very close in pitch (i.e. frequency), then their combined
volume (i.e. amplitude) will be heard to vary slowly in regular “beats.”
Fig.3. “Beats” in amplitude result when natural and forcing frequencies are close.
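The beat pattern in (2.61) is easy to verify numerically. The following Python sketch (illustrative values only, not from the text) evaluates the solution and checks that the slowly varying amplitude A(t) envelopes the fast oscillation:

```python
import numpy as np

m, F0 = 1.0, 1.0
omega0, omega = 3.0, 2.8       # natural and forcing frequencies, chosen close together

t = np.linspace(0, 60, 4000)
x = F0 / m / (omega0**2 - omega**2) * (np.cos(omega * t) - np.cos(omega0 * t))

# Slowly varying amplitude A(t) from (2.61); |A(t)| is the beat envelope.
A = 2 * F0 / m / (omega0**2 - omega**2) * np.sin((omega0 - omega) * t / 2)

print(np.all(np.abs(x) <= np.abs(A) + 1e-12))   # x never exceeds its envelope
```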
Undamped, Forced Vibrations: ω = ω0 (Resonance)

If ω = ω0, then the denominator in (2.59) is zero, so that formula cannot be used for the solution. In fact, f(t) = F0 cos ω0 t is a solution of the homogeneous equation, so we need to modify our trial solution in the method of undetermined coefficients:

xp(t) = t (A cos ω0 t + B sin ω0 t).

(We need to include both A cos ω0 t and B sin ω0 t because we have multiplied by t.) If we plug into the equation (2.58), we obtain

−2Aω0 sin ω0 t + 2Bω0 cos ω0 t = (F0/m) cos ω0 t,

which implies

A = 0    and    B = F0 / (2mω0),

and so our particular solution is

xp(t) = [F0 / (2mω0)] t sin ω0 t.    (2.62)
This solution oscillates with frequency ω0 but ever-increasing amplitude due to the
factor t (see Figure 4). This is called resonance.
The phenomenon of resonance is very important in physical structures like buildings
and bridges which have a natural frequency of motion, and are subject to external
forces such as wind or earthquakes. If the external force is applied periodically with
the same frequency as the natural frequency of the structure, it will lead to larger and
larger vibrations which could eventually destroy the structure. Moreover, this can occur
even if the external force is very small but is sustained over a long period of time. For
example, the Broughton Suspension Bridge near Manchester, England, collapsed in 1831
when British troops marched across in step; as a result of this event, the British military
issued an order that troops should “break step” when marching across a bridge.
Damped, Forced Vibrations
Physical systems usually involve some damping forces, so it is important to analyze
(2.56) when c is not zero. As above, we consider the periodic forcing function (2.57), so
we want to study the behavior of solutions to
m
d2 x
dx
+c
+ kx = F0 cos ωt,
2
dt
dt
(2.63)
Fig.4. Resonance results
when natural and forcing
frequencies coincide.
Recall that we solved the homogeneous equation in Section 2.4, and found the complementary solution xc (t) in the three cases: overdamped, critically damped, and underdamped. In each case, we found that xc (t) → 0 exponentially quickly as t → ∞. For
this reason, xc (t) is called the transient part of the general solution; this terminology
is also applied to a solution of an initial-value problem (see Example 1 below).
Now let us find a particular solution xp (t) for (2.63). We take as our trial solution
xp(t) = A cos ωt + B sin ωt.    (2.64)
We can plug this into our equation and evaluate the coefficients A and B to find (see Exercise 4):
A = (k − mω²) F0 / [(k − mω²)² + c²ω²]    and    B = cωF0 / [(k − mω²)² + c²ω²].    (2.65)
However, we would like to write (2.64) in amplitude-phase form C cos(ωt − φ), so let us
introduce C > 0 and 0 ≤ φ < 2π satisfying
C cos φ = A and C sin φ = B.
We immediately find (after some easy algebra)

C = √(A² + B²) = F0 / √((k − mω²)² + c²ω²).
To find φ, we observe that C, B > 0 ⇒ sin φ > 0 ⇒ 0 < φ < π; of course, we also have tan φ = B/A, so we will have

φ = tan⁻¹(B/A)  if A > 0 (i.e. if k > mω²),
φ = π/2  if A = 0 (i.e. if k = mω²),    (2.66)
φ = tan⁻¹(B/A) + π  if A < 0 (i.e. if k < mω²).
In any case, we can write our particular solution as

xp(t) = [F0 / √((k − mω²)² + c²ω²)] cos(ωt − φ).    (2.67)

Fig.5. Vibration becomes steady-periodic.
Fig.6. Practical resonance.
This is, of course, a periodic function. Since the general solution of (2.63) is of the form
x(t) = xp (t) + xc (t) and we have already observed that xc (t) → 0 as t → ∞, we see that
every solution of (2.63) tends towards xp (t) as t → ∞. For this reason, xp is called the
steady-periodic solution of (2.63), and sometimes denoted xsp .
Since c > 0, we do not have true resonance: in particular, the trial solution (2.64) is never a solution of the associated homogeneous equation, so we do not need to multiply by t. However, if c is very small and ω is close to ω0 = √(k/m), the denominator in (2.67) can be very close to zero, which means that the amplitude of the steady-periodic solution can be very large. This is called practical resonance: starting from equilibrium, the increasing amplitudes can look a lot like resonance before they taper off into the steady-periodic solution (see Figure 6). Moreover, a very large amplitude in the steady-periodic solution can be as destructive for buildings and bridges as true resonance, so it needs to be avoided. In fact, all physical structures experience at least a small amount of damping, so practical resonance is actually more relevant than true resonance. Incidentally, one might suspect that maximal practical resonance occurs when ω = ω0, i.e. k − mω² = 0, but this is not quite correct (cf. Exercise 5).
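The steady-periodic amplitude in (2.67), and the maximizing frequency described in Exercise 5, are simple to evaluate numerically. A short Python sketch (the numbers are illustrative, lightly damped values chosen in the spirit of Exercise 8, not taken from the text):

```python
import numpy as np

def steady_periodic_amplitude(m, c, k, F0, omega):
    """Amplitude of x_sp from (2.67)."""
    return F0 / np.sqrt((k - m * omega**2) ** 2 + (c * omega) ** 2)

def practical_resonance_frequency(m, c, k):
    """Maximizer of the steady-periodic amplitude (Exercise 5); needs c**2 < 2*m*k."""
    return np.sqrt((2 * m * k - c**2) / (2 * m**2))

m, c, k, F0 = 1.0, 0.2, 16.0, 5.0          # illustrative, lightly damped system
omega_star = practical_resonance_frequency(m, c, k)
for omega in (2.0, omega_star, 6.0):
    print(omega, steady_periodic_amplitude(m, c, k, F0, omega))
```

The amplitude printed at omega_star is far larger than at the off-resonant frequencies, which is exactly the practical-resonance effect described above.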
Example 1. Consider a mass of 1 kg attached to a spring with k = 17 N/m and a dashpot with c = 2 N/(m/s). If the mass begins at equilibrium and then is subjected to a periodic force f(t) = 5 cos 4t, (a) find the solution, (b) identify the transient and steady-periodic parts of the solution, and (c) discuss practical resonance for various periodic forcing functions.

Solution. The initial-value problem governing the motion is

x'' + 2x' + 17x = 5 cos 4t,    x(0) = 0 = x'(0).
Rather than simply plugging into the solution formulas derived above, let us apply the
solution method; this is usually the best way to approach a specific problem. So we
begin with finding the general solution of the associated homogeneous equation
x'' + 2x' + 17x = 0.

The characteristic equation is r² + 2r + 17 = 0, which has complex solutions r = −1 ± 4i. So the system is underdamped and the complementary solution is

xc(t) = e^{−t} (c1 cos 4t + c2 sin 4t).

To find a particular solution of the nonhomogeneous equation, we use

xp(t) = A cos 4t + B sin 4t.
Plugging into the nonhomogeneous equation and equating coefficients of cosine and sine,
we obtain
A + 8B = 5    and    B − 8A = 0.

We easily solve these equations to find A = 1/13 and B = 8/13, so our particular solution is

xp(t) = (1/13) cos 4t + (8/13) sin 4t.

We now have our general solution

x(t) = (1/13) cos 4t + (8/13) sin 4t + e^{−t} (c1 cos 4t + c2 sin 4t).

We next use our initial conditions to evaluate c1 and c2:

x(0) = 1/13 + c1 = 0  ⇒  c1 = −1/13,
x'(0) = 32/13 − c1 + 4c2 = 0  ⇒  c2 = −33/52.

We conclude that the solution of the initial-value problem is

x(t) = (1/13) cos 4t + (8/13) sin 4t − (1/13) e^{−t} cos 4t − (33/52) e^{−t} sin 4t.

Fig.7. Example 1 solution.
We easily identify the transient and steady-periodic parts of the solution:

xsp(t) = (1/13) cos 4t + (8/13) sin 4t,    xtr(t) = −(1/13) e^{−t} cos 4t − (33/52) e^{−t} sin 4t.

Note that the amplitude of the steady-periodic solution is √65/13 = √(5/13) ≈ 0.620, and we can put the steady-periodic solution into amplitude-phase form (as in (2.67))

xsp(t) = (√65/13) cos(4t − 1.45),

where we have used phase angle φ = tan⁻¹(8) ≈ 1.45.
Now let us discuss practical resonance. Notice that the natural frequency of the spring-mass system (without damping) is ω0 = √17 ≈ 4.123, so the forcing frequency ω = 4 is relatively close to ω0; this means that practical resonance should be contributing to the steady-periodic amplitude of √65/13 ≈ 0.620. On the other hand, if we had taken a forcing function with the same magnitude but a frequency ω̃ further from ω0, then the amplitude of the resulting steady-periodic solution should be reduced. For example, if we take f̃(t) = 5 cos 2t, then the steady-periodic solution according to (2.67) is

x̃sp(t) = (5/√185) cos(2t − φ)

for some phase angle φ; notice that the amplitude 5/√185 ≈ 0.368 is much reduced from 0.620! (Of course, we could choose another frequency that would increase the effect of practical resonance. If we use the value ω* = √15 found in Exercise 5, we find that the forcing function f̃(t) = 5 cos(ω*t) results in a steady-periodic solution with amplitude 5/8 = 0.625, which is slightly larger than 0.620.)
While Example 1 illustrates the effect of practical resonance, one might protest
that an increase in amplitude from 0.368 to 0.620 (or even to 0.625) does not seem as
catastrophic as the effect of resonance when there is no damping. But the value c = 2
that was used in Example 1 is larger (compared with m = 1 and k = 17) than would
be expected in cases in which practical resonance might be a concern. If a very small
value of c is used, then the amplitude of the steady-periodic solution that results from
using a near-resonant forcing frequency can become very large; cf. Exercise 8.
Exercises
1. A mass of 1 kg is attached to a spring with constant k = 4 N/m. Initially at
equilibrium, a periodic external force of f (t) = cos 3t begins to affect the mass for
t > 0. Find the resultant motion x(t).
2. A 100 lb weight is attached to a vertical spring, which stretches the spring 1 ft.
Then the weight is subjected to a periodic external force f (t) = f0 cos ωt. For
what value of the circular frequency ω will resonance occur?
3. For the following undamped, forced vibrations m x'' + kx = f(t), identify the natural and forcing frequencies, ω0 and ω, and determine whether resonance occurs.
If resonance does not occur but ω ≈ ω0 , find the amplitude and frequency of the
beats.
(a) 2x'' + 18x = 5 cos 3t.
(b) 3x'' + 11x = 2 cos 2t.
(c) 6x'' + 7x = 3 cos t.
(d) 3x'' + 12x = 4 cos 2t.
4. Evaluate the constants A and B in (2.64) and confirm that they have the values
(2.65).
5. To find the frequency ω* that will produce maximal practical resonance, we want to minimize the denominator in (2.67). Use calculus to show that, provided c² < 2mk, such a minimum occurs at

ω* = √((2mk − c²)/(2m²)).

(Note that ω* ≈ √(k/m) for c very small.)
6. For the given values of m, c, k, and f (t), assume the forced vibration is initially
at equilibrium. For t > 0, find the motion x(t), and identify the steady-periodic
and transient parts.
(a) m = 1, c = 4, k = 5, f (t) = 10 cos 3t.
Solution
(b) m = 1, c = 6, k = 13, f (t) = 29 sin 5t.
(c) m = 2, c = 2, k = 1, f (t) = 5 cos t.
7. For the given values of m, c, k, and f (t), find the steady-periodic solution in
amplitude-phase form A cos(ωt − φ).
(a) m = 1, c = 2, k = 5, f (t) = 26 cos 3t.
(b) m = 1, c = 2, k = 6, f (t) = 5 sin 4t.
8. Let m = 1, c = 0.1, k = 16, and f (t) = 5 cos ωt where ω is to be determined.
For each of the values of ω below, calculate the amplitude of the steady-periodic
solution. Can you account for the dramatically different numbers?
(a) ω = 2,    (b) ω = 4,    (c) ω = 6.
2.7 Electrical Circuits
In this section we consider another application of second-order differential equations,
namely to an electrical circuit involving a resistor, an inductor, and a capacitor. If we
let vR , vL , and vC denote the respective voltage changes across these three components,
then Kirchhoff's circuit laws tell us
vR + vL + vC = v(t),    (2.68)
where v(t) is the (possibly) time-varying voltage provided by an electrical source in the
circuit. The quantities vR , vL , and vC can be expressed in terms of the electric charge
on the capacitor q(t), which is measured in coulombs (C), and the electric current in
the circuit i(t), which is measured in amperes (A):
Fig.1. An RLC Circuit.
• According to Ohm’s Law, the voltage drop across the resistor is proportional to
the current. If we denote this proportionality constant by R, measured in ohms
(Ω), then we obtain
vR = R i.
• A capacitor stores charge and opposes the passage of current. The resultant
difference in voltage across the capacitor is proportional to the charge, and it is
customary to denote this proportionality constant by 1/C, where C is measured
in farads (F). Consequently, we have
vC = (1/C) q.
• An inductor stores energy and opposes the passage of current. The resultant
difference in voltage across the inductor is proportional to the rate of change of
the current; if we denote this constant by L, measured in henrys (H), then we
have
vL = L di/dt.
Substituting these relations into Kirchhoff's voltage law (2.68), we obtain

L di/dt + R i + (1/C) q = v(t).
But the current is just the rate of change of the charge, so
i(t) = dq/dt,    (2.69)
and we obtain the second-order linear nonhomogeneous equation for q:
L d²q/dt² + R dq/dt + (1/C) q = v(t).    (2.70)
In applications, the current is usually more important than the charge, but if we solve
(2.70) with initial conditions
q(0) = q0    and    i(0) = i0    (2.71)
to find q(t), then we can find i(t) using (2.69).
If we compare (2.70) with (2.56), we see that there is a mechanical-electric analogue
provided by the following table:
LRC Circuit      Spring-Mass System
q(t)             x(t)
L                m
R                c
1/C              k
v(t)             f(t)
To apply this analogue, let us consider the homogeneous equation associated with (2.70):

L d²q/dt² + R dq/dt + (1/C) q = 0.    (2.72)
We find the characteristic equation is L r² + R r + C⁻¹ = 0, which has roots

r = [−R ± √(R² − 4L/C)] / (2L).

Analogous to mechanical vibrations, we have the three cases:

• R² > 4L/C is overdamped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 e^{µt} + c2 e^{−µt}),  where µ = √(R² − 4L/C) / (2L).

• R² = 4L/C is critically damped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 + c2 t).

• R² < 4L/C is underdamped and the general solution of (2.72) is

q(t) = e^{−Rt/2L} (c1 cos µt + c2 sin µt),  where µ = √(4L/C − R²) / (2L).

In particular, if R = 0 then µ simplifies to 1/√(LC); we generally write

ω0 = 1/√(LC),
as this corresponds to the natural frequency of the “undamped circuit.”
Recall that the general solution of (2.70) is given by
q(t) = qp (t) + qc (t),
where qp is a particular solution of (2.70) and qc is the general solution of (2.72). Since
lim_{t→∞} qc(t) = 0,
we call qc the transient part of the solution. To determine a particular solution qp of
(2.70), we can use undetermined coefficients, depending on the form of v(t). The two
most common cases are when v(t) ≡ v0 is a constant (this corresponds to direct current
such as supplied by a battery) or v(t) is periodic (this corresponds to alternating
current, such as household power). We consider the periodic case first.
Let us assume our periodic electric source is v(t) = V0 cos ωt, where V0 is the amplitude and ω is the frequency, so (2.70) becomes
L d²q/dt² + R dq/dt + (1/C) q = V0 cos ωt.    (2.73)
But we used undetermined coefficients to find a particular solution for (2.73) under the
mechanical vibration analogue, so we can simply avail ourselves of the formula (2.67),
and make the appropriate substitutions to obtain
qp(t) = [V0 / √((C⁻¹ − Lω²)² + R²ω²)] cos(ωt − φ),    (2.74)
where 0 < φ < π is the phase shift given by (2.66). Since q(t) → qp (t) as t → ∞, we
call qp the steady-periodic charge and denote it by qsp , just as we did for mechanical
vibrations. By differentiating qsp we obtain the steady-periodic current:
isp(t) = [V0 / √((1/(Cω) − Lω)² + R²)] sin(ωt − φ − π),    (2.75)

where we have used −sin(θ) = sin(θ − π) to remove the negative sign that results from differentiating the cosine.
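Formulas (2.74) and (2.75) are convenient to evaluate by computer. Here is a minimal Python sketch (assuming only the two formulas above); the circuit values are those of Example 1 below, so the printed amplitudes can be compared with the example:

```python
import numpy as np

def steady_periodic_amplitudes(L, R, C, V0, omega):
    """Amplitudes of q_sp from (2.74) and of i_sp from (2.75)."""
    q_amp = V0 / np.sqrt((1 / C - L * omega**2) ** 2 + (R * omega) ** 2)
    i_amp = V0 / np.sqrt((1 / (C * omega) - L * omega) ** 2 + R ** 2)
    return q_amp, i_amp

# R = 3 ohms, C = 5e-3 F, L = 1e-2 H, 110 V at 60 Hz (omega = 120*pi)
q_amp, i_amp = steady_periodic_amplitudes(1e-2, 3, 5e-3, 110, 2 * np.pi * 60)
print(q_amp, i_amp)        # roughly 6.6e-2 and 24.9
```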
Example 1. Suppose a resistor of 3 ohms, a capacitor of 5 × 10⁻³ farads, and an inductor of 10⁻² henrys are connected in an electric circuit with a 110 volt, 60 Hz alternating current generator. Find the steady-periodic current isp(t).

Solution. We have R = 3, C = 5 × 10⁻³, and L = 10⁻². Frequency 60 Hz means ω = (2π)(60) = 120π, so the differential equation for q(t) is

10⁻² q'' + 3q' + 200 q = 110 cos(120π t).
We first find qsp. We can plug our values for L, R, and C into (2.74) to obtain

qsp(t) = [110 / √((200 − 10⁻² ω²)² + 9 ω²)] cos(ωt − φ),    ω = 120π,

but to obtain the phase shift φ we need to appeal again to our mechanical vibration analogue: making the appropriate substitutions into (2.65), we find φ satisfies

cos φ = (200 − 10⁻² ω²) / √((200 − 10⁻² ω²)² + 9 ω²)    and    sin φ = 3ω / √((200 − 10⁻² ω²)² + 9 ω²).

Since 200 − 10⁻² ω² = 200 − 10⁻²(120π)² < 0, we see that cos φ < 0, so φ is in the second quadrant, and we compute

φ = tan⁻¹( 3ω / (200 − 10⁻² ω²) ) + π ≈ 2.39.

Evaluating the amplitude in qsp, we can write

qsp(t) ≈ 6.61 × 10⁻² cos(120π t − 2.39).

This is the steady-periodic charge, but to find the steady-periodic current we need to differentiate. Since 120π × 6.61 × 10⁻² ≈ 24.9 and −sin(θ) = sin(θ − π), we obtain

isp(t) ≈ −24.9 sin(120π t − 2.39) = 24.9 sin(120π t − 5.53).
In the previous problem, we did not need (and were not given) initial conditions.
But if we had been given initial conditions, then we could have used them to find the
solution q(t) satisfying the initial conditions. Since q(t) = qp (t) + qc (t), this means we
would also have found the transient part of the solution, namely qc (t). Some examples of
this occur in the Exercises. But now let us consider a problem involving direct current.
Example 2. Suppose the circuit in the previous example is disconnected from the
alternating current generator, and all the charge is allowed to decay. At t = 0, a
battery supplying a constant power of 110 volts is attached. Find the charge q(t) and
the current i(t).
Solution. At t = 0 there is no charge (q(0) = 0) and no current (i(0) = q 0 (0) = 0), so
our initial-value problem for q(t) is
10⁻² q'' + 3q' + 200 q = 110,    q(0) = 0 = q'(0).    (2.76)
This is a nonhomogeneous equation that we can easily solve to find q(t) and then
differentiate to find i(t). But if we differentiate the equation and use q'' = i' and q''' = i'', we obtain a homogeneous equation for i(t), namely
10⁻² i'' + 3i' + 200 i = 0.
However, we need initial conditions to solve this equation. We certainly have i(0) = q'(0) = 0, but what about i'(0)? We simply evaluate 10⁻² i' + 3i + 200q = 110 at t = 0 to obtain

10⁻² i'(0) + 3 i(0) + 200 q(0) = 110    ⇒    i'(0) = 1.1 × 10⁴.
Consequently, our initial-value problem for i(t) becomes
10⁻² i'' + 3 i' + 200 i = 0,    i(0) = 0,  i'(0) = 1.1 × 10⁴.
To solve this initial-value problem, we consider the characteristic equation:
r² + 300 r + 2 × 10⁴ = 0    ⇒    r = [−300 ± √(9 × 10⁴ − 8 × 10⁴)] / 2 = −100, −200.
We see the circuit is overdamped and the general solution is
i(t) = c1 e^{−100t} + c2 e^{−200t}.
We next use the initial conditions to evaluate c1 and c2 : c1 = 1.1 × 102 , c2 = −1.1 × 102 .
So the current in the circuit is
i(t) = 1.1 × 10² (e^{−100t} − e^{−200t}).
In particular, we see that the current decays rapidly to zero even though there is a
constant electric source.
Now we can simply integrate i(t) to find
q(t) = −1.1 e^{−100t} + 0.55 e^{−200t} + c,
and then use q(0) = 0 to obtain c = 0.55. We conclude that the charge is given by
q(t) = 0.55 − 1.1 e^{−100t} + 0.55 e^{−200t}.
Note that q(t) → 0.55 as t → ∞, so it does not decay to zero. (Of course, q(t) could also have been found directly by solving the nonhomogeneous equation (2.76).)
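The charge found in Example 2 can be double-checked symbolically. A sympy sketch (illustrative only, not part of the text):

```python
import sympy as sp

t = sp.symbols('t', positive=True)
q = sp.Function('q')

# Initial-value problem (2.76): 10^{-2} q'' + 3 q' + 200 q = 110, q(0) = q'(0) = 0
ode = sp.Eq(sp.Rational(1, 100) * q(t).diff(t, 2) + 3 * q(t).diff(t) + 200 * q(t), 110)
sol = sp.dsolve(ode, q(t), ics={q(0): 0, q(t).diff(t).subs(t, 0): 0})

# Expect q(t) = 0.55 - 1.1 e^{-100 t} + 0.55 e^{-200 t}, as found in the example.
print(sol)
```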
Electrical Resonance
Resonance can occur in electrical circuits in the same way that it occurs in mechanical vibrations, except that we need to specify whether we are considering q(t) or i(t). In the undamped case (R = 0), we see that true resonance occurs at ω0 = 1/√(LC); this can be found by making the denominator in either (2.74) or (2.75) equal to zero, or by considering the roots of the characteristic equation L r² + C⁻¹ = 0 that is associated
with (2.73). Moreover, we see that practical resonance for qsp (t) can occur if the
denominator in (2.74) is small; as for mechanical vibrations, this requires R to be
small and ω close to ω0 , but the resonant frequency ω ∗ is that which minimizes the
denominator in (2.74). We can also consider practical resonance for isp : it occurs at
the frequency ω # that minimizes the denominator in (2.75). It is interesting that the
resonant frequencies ω ∗ and ω # are not exactly the same (see Exercises 1 and 2).
Recall from Section 2.6 that resonance in mechanical vibrations can be a very destructive phenomenon. However, in electrical circuits, resonance is used in some very
useful applications. One example is tuning a radio. To tune in a radio station broadcasting with frequency ω, we can adjust
the capacitor C so that ω coincides with the resonant frequency of the circuit 1/√(LC), i.e. we let C = 1/(Lω²). This maximizes the
amplitude of isp , i.e. the volume of the sound of that station, enabling us to hear it
above stations broadcasting at other frequencies.
Exercises
1. Find the frequency ω ∗ in the electric source that will induce practical resonance
for the steady-periodic charge qsp (t) in (2.74).
2. Find the frequency ω # in the electric source that will induce practical resonance
for the steady-periodic current isp (t).
3. For the following LRC circuits with periodic electric source v(t), find the steady-periodic current in the form isp(t) = I0 sin(ωt − δ), where I0 > 0 and 0 ≤ δ < 2π.
(a) R = 30, L = 10, C = 0.02, v(t) = 50 cos 2t.
(b) R = 20, L = 10, C = 0.01, v(t) = 200 cos 5t
(c) R = 3/2, L = 1/2, C = 2/3, v(t) = 13 cos 3t
4. For the following LRC circuits with electric source v(t), assume the charge and
current are initially zero, and find the current i(t) for t > 0. Is the circuit overdamped, critically damped, or underdamped?
(a) L = 1, R = 2, C = 1/5, v(t) = 10.
(b) L = 2, R = 60, C = 0.0025, v(t) = 100e−5t , q(0) = 1, i(0) = 0.
(c) L = 10, R = 20, C = 0.01, v(t) = 130 cos 2t, q(0) = 0, i(0) = 0.
(d) L = 1/2, R = 3/2, C = 1, v(t) = 5 cos 2t, q(0) = 1, i(0) = 0
2.8 Additional Exercises
1. An object of unknown mass m stretches a vertical spring 2 feet. Then the object
is given an upward velocity of 3 ft/sec. Find the period of the resultant motion.
2. An object of unknown mass m is attached to a spring with spring constant k = 1
N/m. If the mass is pulled 1 m beyond its equilibrium position and given velocity
1 m/sec back towards its equilibrium position, this results in an oscillation with
amplitude 3 m. Find the mass m.
3. Two springs S1 and S2 both have natural length 1 m, but different spring constants: k1 = 2 N/m and k2 = 3 N/m. Suppose the springs are attached to two
walls 1 m apart with a mass m = 1 kg between them.
(a) In the equilibrium position for this new arrangement, what are the lengths
of S1 and S2 ? (Neglect the “length” of the mass.)
(b) If the mass is displaced from the equilibrium position and released, find the
frequency of the resultant vibration.
4. Consider the springs S1 , S2 and mass m = 1 kg as in the previous exercise, but
now assume the two walls are 3 m apart. Answer the same questions (a) and (b).
5. A mass m = 1 kg is attached to a spring with constant k = 4 N/m and a dashpot
with variable damping coefficient c. If the mass is to be pulled 2 m beyond its
equilibrium (stretching the spring) and released with zero velocity, what value of
c ensures that the mass will pass through the equilibrium position and compress
the spring exactly 1 m before reversing direction?
6. A mass m = 1 kg is attached to a spring with constant k = 4 N/m and a dashpot exerting a resistive force proportional to velocity but with unknown coefficient c. The mass is pulled 1 m beyond its equilibrium, resulting in a motion with pseudo-frequency µ = √3. Find c.
7. Consider the homogeneous equation with nonconstant coefficients

y'' + (1/x) y' − (1/x²) y = 0    for x > 0.
(a) Assuming y = x^r, find two linearly independent solutions y1, y2.
(b) Use variation of parameters (see Exercise 3 in Section 2.5) to find a particular
solution of
y'' + (1/x) y' − (1/x²) y = √x    for x > 0.
8. Consider the homogeneous equation with nonconstant coefficients

y'' + (1/x) y' + (1/x²) y = 0    for x > 0.

(a) Find two linearly independent (real-valued) solutions y1, y2. (Hint: first try y = x^r.)
Fig.1. Exercise 3.
(b) Use variation of parameters to find a particular solution of

y'' + (1/x) y' − (1/x²) y = (ln x)/x²    for x > 0.
9. A car's suspension system of springs and shock absorbers may be modeled as a spring-mass-dashpot system (see Figure 2); irregularities in the road surface act as a forcing function. Suppose the mass of the car is 500 kg, the springs in the suspension system have constant k = 10⁴ N/m, and the shock absorbers have damping coefficient c = 10³ N/(m/sec). Suppose the road surface has periodic vertical displacements 0.5 cos(πx/5), where x is the position along the road.

(a) Show that y(t), the vertical displacement from the car's equilibrium position, satisfies

y'' + 2y' + 20y = 10 cos(πvt/5),

where v is the speed of the car.
Fig.2. Exercise 9.
(b) If the shock absorbers break, what speed of the car will induce resonance
along this road?
10. Let us model a tall building as a spring-mass system: a horizontal force F = kx is
required to displace the top floor of the building a distance x from its equilibrium
position. (The model treats the top floor as the mass and the rest of the building
as the spring.)
(a) Suppose the top floor has mass 10³ kg and a force of 500 N is required to displace it 1 m. If a wind exerts a periodic force f(t) = 100 cos(πt/2) N on the top floor, find the magnitude of the horizontal displacements.
(b) What frequency ω of the periodic wind force f (t) = 100 cos ωt will produce
resonance?
Fig.3. Exercise 10.
(c) If the building is initially in equilibrium and a wind force f(t) = 100 cos ωt with the resonant frequency ω found in (b) begins to act, how long will it take until the magnitude of the horizontal displacement of the top floor reaches 10 m?
11. Suppose that a pendulum of rod length L and mass m experiences a horizontal force f(t) as in Figure 8.

Fig.8. Forced Pendulum.

(a) Show that the nonlinear model (2.45) becomes

mL d²θ/dt² + mg sin θ = cos θ f(t).

(b) For small oscillations, show that this is well-approximated by the linear equation

mL d²θ/dt² + mg θ = f(t).
Chapter 3
Laplace Transform
3.1 Laplace Transform and Its Inverse
The basic idea of the Laplace transform is to replace a differential equation with an
algebraic equation. This is achieved by starting with a function f of the variable t,
and transforming the function into a new function F of a new variable s, in such a
way that derivatives in t are transformed into multiplication by s. As we shall see, this
is particularly useful when solving initial-value problems for differential equations that
involve discontinuous terms. Let us discuss the details.
Definition 1. If f(t) is defined for t ≥ 0, then its Laplace transform F(s), also denoted Lf(s) or L[f(t)], is defined by

F(s) = L[f(t)] = ∫₀^∞ e^{−st} f(t) dt,    (3.1)

for values of s for which the improper integral converges.

(The Laplace transform is named after the French mathematician Pierre Simon de Laplace (1749-1827).)
Recall that the improper integral is defined by

∫₀^∞ e^{−st} f(t) dt = lim_{N→∞} ∫₀^N e^{−st} f(t) dt,

and if the limit exists then we say that the improper integral converges. Notice that the integrand e^{−st} f(t) depends on s and t; it is typical that the improper integral converges for some values of s and not for others. Let us calculate some examples.
Example 1. If f(t) ≡ 1 for t ≥ 0, then for s ≠ 0 we calculate

F(s) = L[1] = ∫₀^∞ e^{−st} dt = lim_{N→∞} ∫₀^N e^{−st} dt = lim_{N→∞} [ −(1/s) e^{−st} ]₀^N = lim_{N→∞} ( 1/s − (1/s) e^{−sN} ).

Now, provided s > 0, we have (1/s) e^{−sN} → 0 as N → ∞, so we conclude

L[1] = 1/s    for s > 0.    (3.2)
Example 2. If f(t) = e^{at} for a real number a, then

F(s) = ∫₀^∞ e^{−st} e^{at} dt = ∫₀^∞ e^{(a−s)t} dt = [ e^{(a−s)t} / (a − s) ]₀^∞.

Now if s > a, then lim_{N→∞} e^{(a−s)N}/(a − s) = 0; and since evaluating e^{(a−s)t}/(a − s) at t = 0 gives 1/(a − s), we conclude

L[e^{at}] = 1/(s − a)    for s > a.    (3.3)

It is worth noting that (3.3) continues to hold if a is a complex number and we replace the condition s > a by s > Re(a).
As we do more examples, we shall gradually treat improper integrals less formally.
Example 3. If f(t) = t^n for a positive integer n, then we can use integration by parts to compute

F(s) = L[t^n] = ∫₀^∞ e^{−st} t^n dt = [ −e^{−st} t^n / s ]₀^∞ + (n/s) ∫₀^∞ e^{−st} t^{n−1} dt.

Provided s > 0, we can say lim_{t→∞} e^{−st} t^n = 0 since exponential decay is greater than polynomial growth, and evaluating e^{−st} t^n at t = 0 also gives zero. So

F(s) = (n/s) ∫₀^∞ e^{−st} t^{n−1} dt.

We have not reached our answer yet, but we have made progress: the power of t has been reduced from n to n − 1. This suggests that we apply integration by parts iteratively until we reduce to the Laplace transform of t⁰ = 1, which we know from Example 1:

(n/s) ∫₀^∞ e^{−st} t^{n−1} dt = (n/s)((n−1)/s) ∫₀^∞ e^{−st} t^{n−2} dt = · · · = (n!/s^n) ∫₀^∞ e^{−st} dt = n!/s^{n+1}.

We conclude

L[t^n] = n!/s^{n+1}    for s > 0.    (3.4)
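The transforms derived so far can be reproduced with a computer algebra system. A brief sympy sketch (illustrative only; concrete functions are used so the output matches (3.2)-(3.6) directly):

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

for f in (sp.S.One, sp.exp(2 * t), t**3, sp.sin(3 * t), sp.cos(3 * t)):
    print(f, '->', sp.laplace_transform(f, t, s, noconds=True))
# Expected: 1 -> 1/s,  exp(2t) -> 1/(s - 2),  t**3 -> 6/s**4,
#           sin(3t) -> 3/(s**2 + 9),  cos(3t) -> s/(s**2 + 9)
```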
Example 4. If f(t) = sin bt for a real number b ≠ 0, then we can use a table of integrals (cf. Appendix C) to compute

F(s) = ∫₀^∞ e^{−st} sin bt dt = [ e^{−st} (−s sin bt − b cos bt) / (s² + b²) ]₀^∞ = b / (s² + b²)    for s > 0.

We have shown

L[sin bt] = b / (s² + b²)    for s > 0.    (3.5)

A similar calculation shows

L[cos bt] = s / (s² + b²)    for s > 0.    (3.6)
The next example generalizes Example 3.
Example 5. Suppose f(t) = t^a for a real number a > −1. Then we use the substitution u = st (so du = s dt) to evaluate the Laplace transform:

L[t^a] = ∫₀^∞ e^{−st} t^a dt = ∫₀^∞ e^{−u} (u^a/s^a) (du/s) = (1/s^{a+1}) ∫₀^∞ e^{−u} u^a du.

We conclude that

L[t^a] = Γ(a + 1) / s^{a+1}.    (3.7)
In (3.7) we have used the gamma function Γ(x) that is defined for x > 0 by

Γ(x) = ∫₀^∞ e^{−t} t^{x−1} dt.    (3.8)
The gamma function arises in several fields of mathematics, including probability and statistics. Some of its important properties include

Γ(1) = 1,    (3.9a)
Γ(x + 1) = x Γ(x),    (3.9b)
Γ(1/2) = √π.    (3.9c)

(These properties are not difficult to check: see Exercise 1.) Using (3.9b) we see that Γ(n + 1) = n!, so if a = n then (3.7) agrees with (3.4).
Now let us discuss the linearity of the Laplace transform. The next result follows
immediately from the linearity of the integral in the definition of the Laplace transform.
Theorem 1. If f (t) and g(t) are defined for t ≥ 0, and their Laplace transforms F (s)
and G(s) are defined for s > a, then the Laplace transform of f (t) + g(t) exists for s > a
and equals F (s) + G(s). In other words,
L[f (t) + g(t)] = L[f (t)] + L[g(t)].
Moreover, if λ is a real or complex number, then
L[λf (t)] = λL[f (t)].
This result enables us to compute the Laplace transform of more complicated-looking
functions.
Example 6. If f(t) = 2t³ + 5e^{−t} − 4 sin 2t, then the Laplace transform is

F(s) = L[f(t)] = 2L[t³] + 5L[e^{−t}] − 4L[sin 2t] = 2(3!/s⁴) + 5(1/(s + 1)) − 4(2/(s² + 4)) = 12/s⁴ + 5/(s + 1) − 8/(s² + 4)    for s > 0.
Note: There is a table of frequently encountered Laplace transforms in Appendix D.
Existence of the Laplace Transform
We have defined the Laplace transform and computed it for several examples. However, we have not discussed conditions on f (t) that are sufficient to guarantee that its
Laplace transform F (s) exists, at least for sufficiently large s. We do this now.
For the Laplace transform to exist, we need the improper integral in (3.1) to converge.
We can allow f to be discontinuous, but not too wild.
Fig.1. A piecewise continuous function.
Definition 2. A function f defined on a closed interval [a, b] is piecewise continuous
if [a, b] can be divided into a finite number of subintervals such that
(a) f is continuous on each subinterval, and
(b) f has finite limits (from within) at the endpoints of each subinterval.
A function is piecewise continuous on [0, ∞) if it is piecewise continuous on [0, b]
for each finite value b > 0.
An important example of a piecewise continuous function is the unit step function

u(t) = { 0, for t < 0;  1, for t ≥ 0 }.    (3.10)
Notice that u is continuous on the subinterval (−∞, 0) with a finite limit (from below) at the endpoint 0:

u(0−) = lim_{t→0−} u(t) = 0.

And u is continuous on the subinterval [0, ∞) with a finite limit (from above) at the endpoint 0:

u(0+) = lim_{t→0+} u(t) = 1.

Fig.2. Graph of u(t − a).
The discontinuity of u at t = 0 is called a finite jump discontinuity; this is exactly
the kind of discontinuity allowed for piecewise continuous functions. We can also shift
the discontinuity from 0 to another point a:
ua(t) = u(t − a) = { 0, for t < a;  1, for t ≥ a }.    (3.11)
Example 7. If a > 0, then let us compute the Laplace transform of ua(t):

L[ua(t)] = ∫₀^∞ e^{−st} ua(t) dt = ∫_a^∞ e^{−st} dt = [ −e^{−st}/s ]_a^∞ = e^{−as}/s.    (3.12)
Even if f is continuous, the improper integral in (3.1) might diverge if f grows too
rapidly at infinity, so we need to impose some limit on its growth.
Definition 3. A function f defined on [0, ∞) is of exponential order c as t → ∞
(where c is a real number) if there exist positive constants M and T so that
|f(t)| ≤ M e^{ct}    for all t > T.    (3.13)
All of the functions that we have considered so far are certainly of exponential order as t → ∞; on the other hand, f(t) = e^{t²} grows too rapidly as t → ∞ for the Laplace transform to exist.
We can now state and prove our existence theorem for the Laplace transform.
Theorem 2. If f is piecewise continuous on [0, ∞) and of exponential order c as t → ∞,
then the Laplace transform F (s) = L[f (t)] exists for all s > c (where c is the constant
in (3.13)).
Proof. Assume s > c. By the comparison theorem of integral calculus, we need only show ∫₀^∞ g(t) dt < ∞ for some positive function g satisfying e^{−st} |f(t)| ≤ g(t) for t ≥ 0. Let g(t) = M̃ e^{(c−s)t} where M̃ is greater than or equal to the M in (3.13). Then we have e^{−st} |f(t)| ≤ g(t) for t ≥ T, and since f(t) is bounded on [0, T], we can increase M̃ if necessary to have

e^{−st} |f(t)| ≤ g(t) ≡ M̃ e^{(c−s)t}    for t ≥ 0.

Now we simply compute (recalling that s > c)

∫₀^∞ g(t) dt = [ M̃ e^{(c−s)t} / (c − s) ]₀^∞ = M̃ / (s − c) < ∞.
Inverse Laplace Transform
In applications, we also need to be able to recover f (t) from its Laplace transform
F (s): this is called the inverse Laplace transform and is denoted L−1 . Of course, we
need to make sure that f is uniquely determined by its transform F . While this need
not be true for piecewise continuous f (see Exercise 2), it is true if we assume that f is
continuous.
Theorem 3. Suppose that f (t) and g(t) are continuous on [0, ∞) and of exponential
order c as t → ∞. By Theorem 2, Laplace transforms F (s) and G(s) exist. If F (s) =
G(s) for s > c, then f (t) = g(t) for all t.
The proof of this theorem is not difficult, but for the details we refer to [2]. However, let
us comment on the significance of the assumption that f is continuous. Recall that we
want to use the Laplace transform to solve differential equations, and these differential
equations could have discontinuous terms which need to be transformed. (More will
be said about discontinuous inputs in Section 3.4.) However, the solution that we will
recover using the inverse Laplace transform is usually a continuous function, so the
assumption of continuity in Theorem 3 is no restriction for us.
Formulas for L can now be inverted to obtain formulas for L⁻¹:

L[t^n] = n!/s^{n+1}    ⇒    L⁻¹[1/s^{n+1}] = t^n/n!
L[e^{at}] = 1/(s − a)    ⇒    L⁻¹[1/(s − a)] = e^{at}
L[sin bt] = b/(s² + b²)    ⇒    L⁻¹[b/(s² + b²)] = sin bt
L[cos bt] = s/(s² + b²)    ⇒    L⁻¹[s/(s² + b²)] = cos bt
Moreover, the linearity of L means that L−1 is also linear, so the above formulas may
be used to handle more complicated expressions.
Example 8. Find the inverse Laplace transform of F(s) = 5/s³ + 1/(s² + 4).

Solution. We use linearity and then the above formulas:

    L^{−1}[F(s)] = (5/2) L^{−1}[ 2/s³ ] + (1/2) L^{−1}[ 2/(s² + 4) ] = (5/2) t² + (1/2) sin 2t.  □
Frequently we encounter functions F (s) that need to be simplified using a partial
fraction decomposition before we can apply the inverse Laplace transform. (For a review
of partial fraction decompositions, see Appendix B.)
Example 9. Find the inverse Laplace transform of F(s) = 1/(s(s + 1)).
Solution. We cannot find F(s) in our table, but we can use partial fractions to express it using terms that do appear in the table. Let us write

    1/(s(s + 1)) = A/s + B/(s + 1) = ((A + B)s + A)/(s(s + 1)).
This means A + B = 0 and A = 1, so B = −1. We now can take the inverse Laplace
transform:
    L^{−1}[ 1/(s(s + 1)) ] = L^{−1}[ 1/s − 1/(s + 1) ] = L^{−1}[ 1/s ] − L^{−1}[ 1/(s + 1) ] = 1 − e^{−t}.  □
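As a quick software check of Example 9 (a sketch assuming Python with SymPy, not part of the text), both the partial fraction step and the inverse transform can be reproduced:

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s')

    F = 1/(s*(s + 1))
    print(sp.apart(F, s))                          # should print 1/s - 1/(s + 1)
    print(sp.inverse_laplace_transform(F, s, t))   # should print 1 - exp(-t), valid for t >= 0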
Hyperbolic Sine and Cosine

The hyperbolic sine and cosine functions defined by

    sinh t = (e^t − e^{−t})/2   and   cosh t = (e^t + e^{−t})/2    (3.14)

[Fig.3. Graphs of cosh and sinh.]

share properties similar to those of sin and cos (see Exercise 3); their definitions are also reminiscent of the expressions for sin and cos in terms of exponentials (cf. (A.9) in Appendix A). However, our interest in sinh and cosh stems from the fact that their Laplace transforms are similar to those for sin and cos. In fact, using the definition of sinh bt and cosh bt in terms of e^{bt} and e^{−bt}, for which we know the Laplace transform, it is easy to verify (see Exercise 3c) the following:

    L[sinh bt] = b/(s² − b²)   and   L[cosh bt] = s/(s² − b²).    (3.15)
We add these formulas to the table in Appendix C since they can be useful in finding
inverse Laplace transforms.
Exercises
1. In this exercise we verify the gamma function’s properties (3.9).
(a) Verify directly that Γ(1) = 1.
(b) Use an integration by parts in Γ(x + 1) to obtain x Γ(x) when x > 0.
   (c) Use the substitution u = √t in the formula for Γ(1/2) and then apply the well-known Gaussian formula ∫_{−∞}^{∞} e^{−u²} du = √π to show Γ(1/2) = √π.
2. In Example 7, we showed that L[u_a(t)] = s^{−1} e^{−as}. Let ũ_a(t) be defined like u_a except with ũ_a(a) = 0 (whereas u_a(a) = 1). Compute L[ũ_a(t)]. Why does this not contradict Theorem 3?
3. (a) Show that sinh(−t) = − sinh t and cosh(−t) = cosh t, i.e. sinh is an odd function and cosh is an even function.
   (b) Show that d/dt sinh t = cosh t and d/dt cosh t = sinh t. (In particular, you do not need to worry about the pesky minus sign that occurs in d/dt cos t = − sin t.)
   (c) Verify the Laplace transform formulas in (3.15).
4. Apply Definition 1 to find the Laplace transform of the following functions:
   (a) f(t) = e^{2t+1}
   (b) f(t) = t e^t
   (c) f(t) = e^t sin 2t   (Solution)
   (d) f(t) = { 1, if 0 ≤ t ≤ 1;  0, if t > 1 }   (Sol'n)
   (e) f(t) = { t, if 0 ≤ t ≤ 1;  0, if t > 1 }
5. Use the formulas derived in this section (or the table in Appendix C) to find the Laplace transform of the following functions:
   (a) f(t) = cos 3t
   (b) g(t) = 3t² − 5e^{3t}
   (c) h(t) = 2 cos 3t − 3 sin 2t
   (d) f(t) = t^{3/2}
   (e) g(t) = √t + 1
   (f) h(t) = (1 + t)³
   (g) f(t) = 3 sinh 2t + 2 sin 3t
   (h) g(t) = 3 u(t − π)
6. Find the inverse Laplace transform of the following functions:
   (a) F(s) = 2/(s − 2)
   (b) F(s) = 2/(s√s)
   (c) F(s) = 1/(s² − 4)
   (d) F(s) = (2s − 3)/(s² + 4)
   (e) F(s) = (3s − 2)/(s² − 16)
   (f) F(s) = e^{−2s}/(3s)
7. Use a partial fraction decomposition in order to find the inverse Laplace transform:
   (a) F(s) = 1/(s(s + 2))
   (b) F(s) = (s − 1)/((s + 1)(s² + 1))   (Solution)
   (c) F(s) = −2/(s²(s − 2))
   (d) F(s) = s/((s + 1)(s + 2)(s + 3))
8. Using complex partial fraction decompositions (CPFD's, see the last subsection in Appendix B), we can avoid having to take the inverse Laplace transform of a rational function with an irreducible quadratic denominator. For example,

       b/(s² + b²) = b/((s + ib)(s − ib)) = A/(s + ib) + B/(s − ib) = (i/2) · 1/(s + ib) − (i/2) · 1/(s − ib),

   so

       L^{−1}[ b/(s² + b²) ] = (i/2) L^{−1}[ 1/(s + ib) ] − (i/2) L^{−1}[ 1/(s − ib) ]
                             = (i/2) e^{−ibt} − (i/2) e^{ibt} = (e^{ibt} − e^{−ibt})/(2i) = sin bt,

   where we used (3.3) with a = ±ib, and then (A.9) in the last step. Use a CPFD to find the inverse Laplace transform of the following:

   (a) F(s) = s/(s² + b²)
   (b) F(s) = 1/(s² + 2s + 2)
   (c) F(s) = 1/(s³ + 4s)
   (d) F(s) = (2s + 4)/(s² + 4s + 8)

3.2  Transforms of Derivatives, Initial-Value Problems
Recall that we want to use the Laplace transform to solve differential equations, so we need to know how to evaluate L[f′(t)]. To allow the possibility that f′(t) has jump discontinuities, we define a continuous function f to be piecewise differentiable on [a, b] if f′(t) exists at all but finitely many points and it is piecewise continuous on [a, b]; we say f is piecewise differentiable on [0, ∞) if it is piecewise differentiable on [0, b] for every finite b.

Theorem 1. Suppose f(t) is piecewise differentiable on [0, ∞) and of exponential order c as t → ∞. Then L[f′(t)](s) exists for s > c and is given by

    L[f′(t)] = s L[f(t)] − f(0) = s F(s) − f(0).    (3.16)
Proof. For simplicity, we assume that f′ is continuous on [0, ∞). We use integration by parts to evaluate L[f′(t)]:

    L[f′(t)] = ∫_0^∞ e^{−st} f′(t) dt = [ e^{−st} f(t) ]_{t=0}^{t=∞} − ∫_0^∞ (−s e^{−st}) f(t) dt.

But since f is of exponential order c as t → ∞, we have e^{−st}|f(t)| ≤ M e^{(c−s)t}, and this tends to zero as t → ∞ for s > c. Consequently,

    [ e^{−st} f(t) ]_{t=0}^{t=∞} = lim_{t→∞} e^{−st} f(t) − f(0) = 0 − f(0),

and −∫_0^∞ (−s e^{−st}) f(t) dt = ∫_0^∞ s e^{−st} f(t) dt = s L[f(t)], so we obtain (3.16).  □
Let us immediately apply this to an initial-value problem.
Example 1. Solve y′ + 3y = 1, y(0) = 1.

Solution. Let us denote the Laplace transform of y(t) by Y(s), i.e. L[y(t)] = Y(s). Using (3.16) we have

    L[y′(t)] = s Y(s) − y(0) = s Y(s) − 1.

So, taking the Laplace transform of the differential equation and solving for Y(s) yields

    s Y + 3Y − 1 = 1/s   ⇒   Y(s) = 1/(s + 3) + 1/(s(s + 3)).

Before we can take the inverse Laplace transform, we need to perform a partial fraction decomposition on the last term:

    1/(s(s + 3)) = (1/3)/s + (−1/3)/(s + 3).

Using this in Y(s) and then taking the inverse Laplace transform, we find our solution:

    Y(s) = (1/3)/s + (2/3)/(s + 3)   ⇒   y(t) = 1/3 + (2/3) e^{−3t}.  □
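As a brief software cross-check of Example 1 (a sketch assuming Python with SymPy, which is not part of this text), one can invert the Y(s) found above and also solve the initial-value problem directly:

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s')
    y = sp.Function('y')

    Y = sp.Rational(1, 3)/s + sp.Rational(2, 3)/(s + 3)
    print(sp.inverse_laplace_transform(Y, s, t))        # should print 1/3 + 2*exp(-3*t)/3

    ode = sp.Eq(y(t).diff(t) + 3*y(t), 1)
    print(sp.dsolve(ode, y(t), ics={y(0): 1}))          # should agree with the answer above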
Of course, we would also like to use the Laplace transform to solve second-order (and possibly higher-order) differential equations, so we want to evaluate L[f″(t)]. But if we assume that f and f′ are both piecewise differentiable and of exponential type on [0, ∞), then we can apply (3.16) twice, once with f′ in place of f, and once as it stands:

    L[f″(t)] = s L[f′(t)] − f′(0) = s ( s L[f(t)] − f(0) ) − f′(0).

Let us record the result as

    L[f″(t)] = s² F(s) − s f(0) − f′(0).    (3.17)

We leave as Exercise 1 the generalization of (3.17) to n-th order derivatives.
We can apply this to initial-value problems.
Example 2. Solve y″ + 4y = 1, y(0) = 1, y′(0) = 3.

Solution. We apply the Laplace transform to the equation to obtain

    s² Y(s) − s − 3 + 4Y = 1/s.

Next we solve for Y(s):

    (s² + 4) Y(s) = 1/s + s + 3 = (s² + 3s + 1)/s   ⇒   Y(s) = (s² + 3s + 1)/(s(s² + 4)).

We now find the partial fraction decomposition for Y(s):

    Y(s) = (1/4)/s + ((3/4)s + 3)/(s² + 4) = (1/4)·(1/s) + (3/4)·s/(s² + 4) + (3/2)·2/(s² + 4).

In this last form, we can easily apply the inverse Laplace transform to obtain our solution:

    y(t) = 1/4 + (3/4) cos 2t + (3/2) sin 2t.  □
We can, of course, use the Laplace transform to solve initial-value problems for
mass-spring-dashpot systems as in Sections 2.4 and 2.6.
Example 3. Suppose a mass m = 1 is attached to a spring with constant k = 9, and subject to an external force of f(t) = sin 2t; ignore friction. If the spring is initially at equilibrium (x₀ = 0 = v₀), then use the Laplace transform to find the motion x(t).

Solution. The initial-value problem that we must solve is

    x″ + 9x = sin 2t,   x(0) = 0 = x′(0).

If we take the Laplace transform of this equation and let X(s) = L[x(t)], we obtain

    s² X(s) + 9X(s) = 2/(s² + 4).

Solving for X(s) and using a partial fraction decomposition, we obtain

    X(s) = 2/((s² + 4)(s² + 9)) = (2/5)/(s² + 4) − (2/5)/(s² + 9) = (1/5)·2/(s² + 4) − (2/15)·3/(s² + 9).

In this last form it is easy to apply the inverse Laplace transform to obtain

    x(t) = (1/5) sin 2t − (2/15) sin 3t.  □
We now illustrate one more use of Theorem 1, namely providing a short-cut in the
calculation of some Laplace transforms.
Example 4. Show that

    L[t sin bt] = 2bs/(s² + b²)².    (3.18)

Solution. Let f(t) = t sin bt, so we want to find F(s). Let us differentiate f(t) twice:

    f′(t) = sin bt + bt cos bt,   f″(t) = 2b cos bt − b² t sin bt,

and observe that f(0) = 0 = f′(0). Now we can apply the Laplace transform to f″(t) to conclude

    s² F(s) = 2b · s/(s² + b²) − b² F(s).

Finally, we solve this for F(s) and obtain (3.18).  □
Derivatives of Transforms
Theorem 1 shows that differentiation of f with respect to t corresponds under L to
multiplication of F by s (at least when f (0) = 0). We might wonder if differentiation
of F with respect to s corresponds to multiplication of f by t. We can check this
by differentiating (3.1) with respect to s. If we are allowed to differentiate under the
integral sign, we obtain
    F′(s) = d/ds ∫_0^∞ e^{−st} f(t) dt = ∫_0^∞ ∂/∂s [ e^{−st} f(t) ] dt = −∫_0^∞ e^{−st} [ t f(t) ] dt = −L[t f(t)],
so we see that our expectation was off by a minus sign! The differentiation under the
integral sign is justified provided the improper integral converges “uniformly” (see, for
example, [2]), so we conclude the following:
Theorem 2. Suppose f(t) is piecewise continuous on [0, ∞) and of exponential order c as t → ∞. Then, for s > c, its Laplace transform F(s) satisfies

    F′(s) = −L[t f(t)].

Repeating the differentiation multiple times yields the following useful formula:

    L[t^n f(t)] = (−1)^n F^{(n)}(s),    (3.19)
where n is a positive integer. In fact, with n = 1 and f (t) = sin bt, we see that (3.18) is
also a consequence of this formula. Let us consider another example.
Example 5. Show that

    L[t e^{at}] = 1/(s − a)²   for s > a.    (3.20)

Solution. Letting f(t) = e^{at}, we have F(s) = 1/(s − a) for s > a, and so (3.19) implies

    L[t e^{at}] = −d/ds [ 1/(s − a) ] = 1/(s − a)².  □
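Theorem 2 is easy to illustrate in software: differentiating a known transform reproduces (3.18) and (3.20). Here is a minimal sketch, assuming Python with SymPy (not part of the text).

    import sympy as sp

    s, b, a = sp.symbols('s b a', positive=True)

    F_sin = b/(s**2 + b**2)                    # L[sin bt]
    print(sp.simplify(-sp.diff(F_sin, s)))     # should print 2*b*s/(s**2 + b**2)**2, i.e. L[t sin bt]

    F_exp = 1/(s - a)                          # L[e^{at}] (for s > a)
    print(sp.simplify(-sp.diff(F_exp, s)))     # should print (s - a)**(-2), i.e. L[t e^{at}]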
Exercises
1. If f, f′, …, f^{(n−1)} are piecewise differentiable functions of exponential type on [0, ∞), show that the Laplace transform of the nth-order derivative is given by

       L[f^{(n)}(t)] = s^n F(s) − s^{n−1} f(0) − s^{n−2} f′(0) − ··· − f^{(n−1)}(0).
2. Use the Laplace transform to solve the following initial-value problems for first-order equations.
   (a) y′ + y = e^t,        y(0) = 2.
   (b) y′ + 3y = 2e^{−t},   y(0) = 1.
   (c) y′ − y = 6 cos t,    y(0) = 2.
3. Use the Laplace transform to solve the following initial-value problems for second-order equations.
   (a) y″ + 4y = 0,              y(0) = −3, y′(0) = 5.
   (b) y″ + y′ − 2y = 0,         y(0) = 1, y′(0) = −1.
   (c) y″ − y = 12 e^{2t},       y(0) = 1 = y′(0).   (Solution)
   (d) y″ − y = 8 cos t,         y(0) = 0 = y′(0).
   (e) y″ − 5y′ + 6y = 0,        y(0) = 1, y′(0) = −1.
   (f) y″ + 4y′ + 3y = cosh 2t,  y(0) = 0 = y′(0).
4. Use the Laplace transform to find the motion x(t) of a mass-spring-dashpot system
with m = 1, c = 6, k = 8, x0 = 0, and v0 = 2.
5. Use the Laplace transform to find the motion x(t) of a forced mass-spring-dashpot system with m = 1, c = 5, k = 6, and F(t) = cos t. Assume the mass is initially at equilibrium.
6. Use the technique of Example 4 or Theorem 2 to take the Laplace transform of the following functions:
   (a) t² e^{at}      (b) t² sin bt      (c) t cos bt      (d) t u₁(t)

3.3  Shifting Theorems
Suppose f(t) has a Laplace transform F(s). What is the Laplace transform of e^{at} f(t)? The answer is found by a simple calculation:

    L[e^{at} f(t)] = ∫_0^∞ e^{−st} e^{at} f(t) dt = ∫_0^∞ e^{−(s−a)t} f(t) dt = F(s − a).

Now the function F(s − a) is just the function F(s) "shifted" or "translated" by the number a, so we see that multiplication of f by e^{at} corresponds to shifting F(s) by a. We summarize this as our first "shifting theorem":

Theorem 1. Suppose f has Laplace transform F(s) defined for s > c, and a is a real number. Then e^{at} f(t) has Laplace transform defined for s > a + c by

    L[e^{at} f(t)] = F(s − a).    (3.21)
A rather trivial application of this theorem shows

    L[1] = 1/s   ⇒   L[e^{at}] = L[e^{at} · 1] = 1/(s − a),

but of course, we already knew this. More importantly, we obtain the following Laplace transforms that we did not know before:

    L[e^{at} t^n] = n!/(s − a)^{n+1}              (s > a)    (3.22a)
    L[e^{at} cos bt] = (s − a)/((s − a)² + b²)    (s > a)    (3.22b)
    L[e^{at} sin bt] = b/((s − a)² + b²)          (s > a)    (3.22c)
Example 1. Find the inverse Laplace transform of the following functions:

    F(s) = 2/(s − 2)³,        G(s) = (s + 2)/(s² + 2s + 5).
Solution. For L^{−1}[F(s)] we can apply (3.22a) with a = 2 = n; but to see how Theorem 1 applies in this case, observe that L[t²] = 2/s³ implies L[e^{2t} t²] = 2/(s − 2)³, and hence

    f(t) = L^{−1}[ 2/(s − 2)³ ] = e^{2t} t².

Now let us consider G(s). Since the denominator does not factor into linear factors, we cannot use a partial fraction decomposition. However, we can "complete the square" to write it as

    s² + 2s + 5 = (s + 1)² + 4 = (s + 1)² + 2².

Hence

    G(s) = (s + 2)/(s² + 2s + 5) = ((s + 1) + 1)/((s + 1)² + 2²) = (s + 1)/((s + 1)² + 2²) + (1/2)·2/((s + 1)² + 2²).

In this form we can easily use Theorem 1 to take the inverse Laplace transform to conclude

    g(t) = L^{−1}[ (s + 1)/((s + 1)² + 2²) ] + (1/2) L^{−1}[ 2/((s + 1)² + 2²) ] = e^{−t} cos 2t + (1/2) e^{−t} sin 2t.  □
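One way to check the answer for G(s) is to transform the claimed g(t) forward and compare; here is a brief sketch assuming Python with SymPy (not part of the text).

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s', positive=True)

    g = sp.exp(-t)*sp.cos(2*t) + sp.exp(-t)*sp.sin(2*t)/2
    G = sp.laplace_transform(g, t, s, noconds=True)
    print(sp.simplify(G - (s + 2)/(s**2 + 2*s + 5)))   # should print 0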
Theorem 1 is frequently useful in solving initial-value problems.
Example 2. Solve y″ + 6y′ + 34y = 0, y(0) = 3, y′(0) = 1.

Solution. We take the Laplace transform of the equation and use the initial conditions to obtain

    (s² Y − 3s − 1) + 6(sY − 3) + 34Y = 0.

We can rearrange this and solve for Y to find

    Y(s) = (3s + 19)/(s² + 6s + 34).

The denominator does not factor, but we can complete the square to write it as

    s² + 6s + 34 = s² + 6s + 9 + 25 = (s + 3)² + 5².

Using this and rearranging the numerator yields

    Y(s) = (3s + 19)/((s + 3)² + 5²) = 3·(s + 3)/((s + 3)² + 5²) + 2·5/((s + 3)² + 5²).

Finally, we can apply (3.22) to obtain our solution

    y(t) = 3 e^{−3t} cos 5t + 2 e^{−3t} sin 5t.  □
Since multiplication by the exponential function e^{at} corresponds to a shift in the Laplace transform, it is natural to wonder whether multiplication of the Laplace transform by an exponential corresponds to a shift in the original function. This is essentially true, but there is an additional subtlety due to the fact that the original function is only assumed defined for t ≥ 0. In fact, we know L[1] = 1/s, and in Example 7 in Section 3.1 we found that L[u_a(t)] = e^{−as}/s. So multiplication of L[1] by e^{−as} does not correspond to a simple shift of the function 1 (which would have no effect), but corresponds to multiplication of 1 by u_a(t) = u(t − a). This observation is generalized in our second shifting theorem:
Theorem 2. Suppose f(t) has Laplace transform F(s) defined for s > c, and suppose a > 0. Then

    L[u(t − a) f(t − a)] = e^{−as} F(s).    (3.23)

Proof. We simply compute, using the definition of u(t − a) and a change of the integration variable t → t̃ = t − a:

    L[u(t − a) f(t − a)] = ∫_0^∞ e^{−st} u(t − a) f(t − a) dt = ∫_a^∞ e^{−st} f(t − a) dt
                         = ∫_0^∞ e^{−s(t̃ + a)} f(t̃) dt̃ = e^{−sa} ∫_0^∞ e^{−st̃} f(t̃) dt̃ = e^{−sa} F(s).  □
Theorem 2 is useful in taking the Laplace transform of functions that are defined piecewise, provided they can be written in the form u(t − a)f (t − a).
Example 2. Find the Laplace transform of

    g(t) = { 0, if 0 ≤ t < 1;  t − 1, if t ≥ 1 }.

[Fig.1. Graph of g(t) in Example 2.]

Solution. If we let f(t) = t then g(t) = u(t − 1) f(t − 1). Since f(t) has Laplace transform F(s) = 1/s², we use Theorem 2 with a = 1 to conclude

    L[g(t)] = L[u(t − 1) f(t − 1)] = e^{−s} · 1/s².  □
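Because g(t) vanishes for t < 1, Example 2 can also be checked straight from the definition of L; a minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t')
    s = sp.Symbol('s', positive=True)

    # g(t) = t - 1 for t >= 1 and 0 otherwise, so the integral starts at t = 1.
    G = sp.integrate(sp.exp(-s*t)*(t - 1), (t, 1, sp.oo))
    print(sp.simplify(G))   # should print exp(-s)/s**2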
On the other hand, Theorem 2 is also useful in taking the inverse Laplace transform of
functions involving exponentials.
Example 3. Find the inverse Laplace transform of e^{−πs} s/(s² + 4).

Solution. If we let F(s) = s/(s² + 4), then its inverse Laplace transform is f(t) = cos 2t. Consequently, using Theorem 2 with a = π, we find

    L^{−1}[ e^{−πs} s/(s² + 4) ] = u(t − π) cos 2(t − π) = u(t − π) cos 2t,

where in the last step we used the fact that cosine has period 2π. The graph of this solution is shown in Figure 2.

[Fig.2. Graph of u(t − π) cos 2t.]  □
Now let us apply Theorem 2 to an initial-value problem.
Example 4. Solve x″ + 4x = 2u_π(t), x(0) = 0, x′(0) = 1.

Solution. We take the Laplace transform of the problem to obtain

    s² X(s) − 1 + 4X(s) = 2e^{−πs}/s.

Solving for X(s), we obtain

    X(s) = 2e^{−πs}/(s(s² + 4)) + 1/(s² + 4).

If we use partial fractions, we can write

    2/(s(s² + 4)) = A/s + (Bs + C)/(s² + 4) = (1/2)( 1/s − s/(s² + 4) ),

so X(s) becomes

    X(s) = (1/2) e^{−πs} ( 1/s − s/(s² + 4) ) + (1/2)·2/(s² + 4).

Now we can apply Theorem 2 to conclude

    x(t) = (1/2) u(t − π)(1 − cos 2(t − π)) + (1/2) sin 2t = (1/2) u(t − π)(1 − cos 2t) + (1/2) sin 2t,

where in the last step we used cos 2(t − π) = cos 2t.  □
Exercises
1. Use Theorem 1 to calculate the Laplace transform of the following functions:
   (a) t e^{2t}          (d) e^{−4t} sin 3t
   (b) t² e^{−3t}        (e) t (e^t + e^{−t})   (Solution)
   (c) e^{3t} cos 5t     (f) t³ e^t + e^{−t} cos 2t
2. Use Theorem 1 to find the inverse Laplace transform of the following functions:
   (a) F(s) = 3/(s − 2)³                 (c) F(s) = (s + 3)/(s² + 4s + 13)
   (b) F(s) = 1/(s² + 4s + 4)  (Solution)   (d) F(s) = 1/√(s + 3)
3. Solve the following initial-value problems involving homogeneous equations:
   (a) y″ − 4y′ + 8y = 0,   y(0) = 0, y′(0) = 1.
   (b) y″ + 6y′ + 10y = 0,  y(0) = 2, y′(0) = 1.
   (c) y″ − 6y′ + 25y = 0,  y(0) = 2, y′(0) = 3.   (Solution)
4. Solve the following initial-value problems involving nonhomogeneous equations:
   (a) y″ + 4y′ + 8y = e^{−t},   y(0) = 0 = y′(0)
   (b) y″ − 4y = 3 t e^t,        y(0) = 0 = y′(0)
   (c) y″ − y = 8 e^t sin 2t,    y(0) = 0 = y′(0)
5. Use Theorem 2 to find the Laplace transform of the following functions:
   (a) f(t) = { 0 for 0 ≤ t < π;  sin(t − π) for t ≥ π }
   (b) g(t) = { 0 for 0 ≤ t < 1;  e^t for t ≥ 1 }
6. Use Theorem 2 to find the inverse Laplace transforms of the following functions:
   (a) 2 e^{−s}/(s² + 9)   (Solution)
   (b) e^{−3s}/(s² + 2s + 5)
7. Use the Laplace transform to find the motion x(t) of a forced mass-spring-dashpot
system with m = 1, c = 4, k = 5, and F (t) = −20 cos 3t. Assume the mass is
initially at equilibrium.
8. Solve the initial-value problem x″ + 9x = 4 u_π(t) sin(t − π), x(0) = 0, x′(0) = 1.
3.4  Discontinuous Inputs
In this section we focus on differential equations involving discontinuous inputs. An example is the damped spring-mass-dashpot model with external forcing that we studied in Section 2.6,

    m d²x/dt² + c dx/dt + k x = f(t),    (3.24)

but now we allow the forcing term f(t) to be a discontinuous function. Such discontinuous forcing functions arise naturally in applications where the forcing may suddenly be turned on or off. In fact, we also want to consider systems in which the external force occurs in the form of an impulse, i.e. a very sudden and short input.
When f(t) is a piecewise continuous function, the first step is to express it using unit step functions; then we can use Theorem 2 in Section 3.3 to take the Laplace transform, and carry on with the solution. In this context, it is worth observing that if a > 0 and f(t) is a function that is defined for t ≥ 0, then multiplication by u(t − a) has the effect of "turning off the input" for 0 ≤ t < a:

    u(t − a) f(t) = { 0 for 0 ≤ t < a;  f(t) for t ≥ a }.

[Fig.1. Graph of u(t − a)f(t).]

On the other hand, multiplication by (1 − u(t − a)) has the effect of turning off the input for t ≥ a:

    (1 − u(t − a)) f(t) = { f(t) for 0 ≤ t < a;  0 for t ≥ a }.

[Fig.2. Graph of (1 − u(t − a))f(t).]

These observations can be useful in expressing a piecewise continuous function in terms of unit step functions.
Example 1. The function

    f(t) = { 0 for 0 ≤ t < 2π;  sin t for 2π ≤ t < 4π;  0 for t ≥ 4π }

can be thought of as sin t, but turned off for 0 ≤ t < 2π and also for t ≥ 4π. To turn off sin t for 0 ≤ t < 2π, we multiply by u(t − 2π) to obtain u(t − 2π) sin t. Now we multiply this by (1 − u(t − 4π)) in order to also turn it off for t ≥ 4π. The result is

    f(t) = (1 − u(t − 4π)) u(t − 2π) sin t = u(t − 2π) sin t − u(t − 4π) u(t − 2π) sin t
         = u(t − 2π) sin t − u(t − 4π) sin t,

where in the last step we have used the fact that u(t − 4π)u(t − 2π) = u(t − 4π). Since sine has period 2π, sin t = sin(t − 2π) = sin(t − 4π), so we can rewrite this as

    f(t) = u(t − 2π) sin(t − 2π) − u(t − 4π) sin(t − 4π).

The advantage of this last form of f(t) is that we are ready to apply (3.23) to compute its Laplace transform.  □
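Since this f(t) vanishes outside [2π, 4π), its transform can also be computed directly and compared with the answer (3.23) predicts. A minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t')
    s = sp.Symbol('s', positive=True)

    F_direct = sp.integrate(sp.exp(-s*t)*sp.sin(t), (t, 2*sp.pi, 4*sp.pi))
    F_shifted = sp.exp(-2*sp.pi*s)/(s**2 + 1) - sp.exp(-4*sp.pi*s)/(s**2 + 1)
    print(sp.simplify(F_direct - F_shifted))   # should print 0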
Now let us apply this technique to solving initial-value problems involving piecewise
continuous functions. We begin with a first-order equation.
Example 2. Solve the initial-value problem

    y′ − y = g(t),  y(0) = 0,   where g(t) = { cos t for 0 ≤ t < π;  0 for t ≥ π }.

Solution. We can express g(t) as (1 − u_π(t)) cos t, where by "turning off" cos t at t = π we have a jump discontinuity. Now we want to take the Laplace transform of g(t), but u_π(t) cos t is not in the form to apply (3.23). However, if we recall from trigonometry that cos(t − π) = − cos t, then we can write

    g(t) = cos t + u(t − π) cos(t − π).

This allows us to use (3.23) to compute the Laplace transform of g(t) and obtain

    G(s) = s/(s² + 1) + e^{−πs} s/(s² + 1).

We use this to take the Laplace transform of the equation and then solve for Y(s):

    Y(s) = s/((s − 1)(s² + 1)) · (1 + e^{−πs}).

We can use partial fractions to write

    s/((s − 1)(s² + 1)) = (1/2)( 1/(s − 1) − s/(s² + 1) + 1/(s² + 1) ),

so

    L^{−1}[ s/((s − 1)(s² + 1)) ] = (1/2)( e^t − cos t + sin t ).

If we use (3.23), we compute

    L^{−1}[ s e^{−πs}/((s − 1)(s² + 1)) ] = (1/2) u(t − π)( e^{t−π} − cos(t − π) + sin(t − π) ).

Using cos(t − π) = − cos t and sin(t − π) = − sin t, we conclude that our solution is given by

    y(t) = (1/2)(e^t − cos t + sin t) + (1/2) u(t − π)(e^{t−π} + cos t − sin t).  □

[Fig.3. Graph of f(t) in Example 1.]
Finally, we apply this to (3.24) when f is piecewise continuous.
Example 3. Suppose a mass of 1 kg is attached to a spring with constant k = 4. The mass is initially in equilibrium, but for 0 ≤ t < 2π experiences a periodic force cos 2t, and then the force is turned off. Determine the motion, ignoring friction (i.e. let c = 0).

Solution. As usual we let x denote the displacement from equilibrium that stretches the spring, so the initial-value problem that we want to solve is

    x″ + 4x = f(t),  x(0) = 0 = x′(0),   where f(t) = { cos 2t for 0 ≤ t < 2π;  0 for t ≥ 2π }.    (3.25)

We can write f(t) as

    f(t) = (1 − u(t − 2π)) cos 2t = cos 2t − u(t − 2π) cos 2(t − 2π),

where cos 2t = cos 2(t − 2π) since cos is periodic with period 2π. We can now take the Laplace transform of f(t) to obtain

    F(s) = s/(s² + 4) − e^{−2πs} s/(s² + 4).

Now we can apply the Laplace transform to the differential equation and conclude that

    X(s) = s/(s² + 4)² − e^{−2πs} s/(s² + 4)².

To take the inverse Laplace transform, we can apply (3.18) to conclude

    L^{−1}[ s/(s² + 4)² ] = (1/4) t sin 2t.

If we combine this with (3.23), we obtain

    L^{−1}[ e^{−2πs} s/(s² + 4)² ] = (1/4) u(t − 2π)(t − 2π) sin 2(t − 2π) = (1/4) u(t − 2π)(t − 2π) sin 2t.

Thus we can express our solution as

    x(t) = (1/4)[ t − u(t − 2π)(t − 2π) ] sin 2t = { (t/4) sin 2t for 0 ≤ t < 2π;  (π/2) sin 2t for t ≥ 2π }.

[Fig.4. Graph of x(t) in Example 3.]

The graph of this solution may be seen in Figure 4. Notice that, for 0 ≤ t < 2π, the amplitude of the oscillations steadily increases since the forcing frequency matches the natural frequency, i.e. we have resonance. But once the force is removed at t = 2π, the oscillations continue at the amplitude that has been attained.  □
Impulsive Force: Dirac Delta Function

If a force f(t) is applied to an object over a time interval [t₁, t₂], then the impulse I represents the resultant change in the momentum mv of the object. Since this change in momentum can be calculated by mv(t₂) − mv(t₁) = m ∫_{t₁}^{t₂} v′(t) dt = m ∫_{t₁}^{t₂} a(t) dt, we see that the impulse over [t₁, t₂] is given by

    I = ∫_{t₁}^{t₂} f(t) dt.    (3.26)
Now suppose that the force is applied over a very short time interval [t₁, t₁ + ε], but has such a large magnitude that its impulse is 1. For example, for ε > 0 very small, consider the function

    d_ε(t) = { 1/ε for 0 ≤ t < ε;  0 for t < 0 and t ≥ ε },

[Fig.5. Graph of d_ε(t).]

whose graph is shown in Figure 5. Then we calculate its impulse over (−∞, ∞) and find

    ∫_{−∞}^{∞} d_ε(t) dt = ∫_0^ε (1/ε) dt = 1.

So the impulse of d_ε is 1 for all ε > 0. But if we take ε → 0, what happens to d_ε? For t < 0 we have d_ε(t) = 0 for all ε > 0, and for t₀ > 0 we have d_ε(t₀) = 0 for ε < t₀. On the other hand, d_ε(0) = 1/ε → +∞ as ε → 0, so

    lim_{ε→0} d_ε(t) = { 0 for all t ≠ 0;  +∞ for t = 0 }.

Now let us be bold and define the Dirac delta function δ(t) to be the limit of d_ε(t) as ε → 0. It has the following significant properties:

    δ(t) = 0 for t ≠ 0   and   ∫_{−∞}^{∞} δ(t) dt = 1.    (3.27)
If we think of δ(t) as a function, this is very strange indeed: it is zero for all t ≠ 0 and yet its integral is 1! In fact, δ(t) is not a function at all, but a "generalized function" or "distribution." Such objects are analyzed by their effect on actual functions through integration, and in this capacity δ(t) performs perfectly well. For example, if g(t) is any continuous function, then by the mean value theorem of calculus, we know

    (1/ε) ∫_0^ε g(t) dt = g(t*)   for some 0 < t* < ε.

[Fig.6. The mean value theorem.]

But as ε → 0, we have t* → 0, so

    ∫_{−∞}^{∞} δ(t) g(t) dt = lim_{ε→0} ∫_{−∞}^{∞} d_ε(t) g(t) dt = lim_{ε→0} (1/ε) ∫_0^ε g(t) dt = g(0).

In fact, if we consider δ(t − a) for any real number a, we may generalize this calculation to obtain the important property

    ∫_{−∞}^{∞} δ(t − a) g(t) dt = g(a)   for any continuous function g(t).    (3.28)

Now we are in business, since (3.28) enables us to compute the Laplace transform of δ(t − a):
    L[δ(t − a)] = ∫_0^∞ e^{−st} δ(t − a) dt = e^{−as}   for a > 0,    (3.29)

and in the special case a = 0 we can take the limit a → 0 to obtain

    L[δ(t)] = 1.    (3.30)
We can now use the Laplace transform to solve initial-value problems involving the
Dirac delta function.
Example 4. Solve the initial-value problem y′ + 3y = δ(t − 1), y(0) = 1.

Solution. If we take the Laplace transform of this equation, we obtain

    sY − 1 + 3Y = e^{−s},

where we have used (3.29) with a = 1 to calculate L[δ(t − 1)]. We easily solve this for Y(s),

    Y(s) = 1/(s + 3) + e^{−s}/(s + 3),

and then take the inverse Laplace transform, using (3.23) on the second term:

    y(t) = e^{−3t} + u₁(t) e^{−3(t−1)}.

[Fig.7. The solution for Example 4.]

A sketch of this solution appears in Figure 7. Notice that the jump discontinuity at t = 1 corresponds to the time when the impulse δ(t − 1) is applied.  □
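The two transform facts used in Example 4 are easy to reproduce in software; a minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t = sp.Symbol('t', positive=True)
    s = sp.Symbol('s', positive=True)

    # L[delta(t - 1)] from the definition; should print exp(-s)
    print(sp.integrate(sp.exp(-s*t)*sp.DiracDelta(t - 1), (t, 0, sp.oo)))

    # inverse of e^{-s}/(s + 3); should print a shifted decay, u_1(t) e^{-3(t-1)},
    # written with a Heaviside(t - 1) factor
    print(sp.inverse_laplace_transform(sp.exp(-s)/(s + 3), s, t))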
The Dirac delta function is especially useful in applications where it can be used to
express an instantaneous force. For example, if we consider a spring-mass system, then
a sudden, sharp blow to the mass that imparts an impulse I0 over a very short time
interval near the time t0 can be modeled using
±I0 δ(t − t0 ),
where ± depends on the direction of the impulse. Let us illustrate this with an example.
Example 5. A vertically hung spring with constant k = 1 is attached to a mass of 1 kg. At t = 0 the spring is in its equilibrium position but with a downward velocity of 1 m/sec. At t = π sec, the mass is given an instantaneous blow that imparts an impulse of 2 units of momentum in an upward direction. Find the resultant motion.

Solution. As usual, we measure y in a downward direction and assume y = 0 is the equilibrium position after the mass has been attached. Thus the impulse imparted by the instantaneous blow at t = π has the value −2 and the initial-value problem that we must solve is

    y″ + y = −2δ(t − π),  y(0) = 0, y′(0) = 1.

If we take the Laplace transform of this equation we obtain

    s² Y − 1 + Y = −2 e^{−πs},

which we can solve for Y to find

    Y(s) = 1/(s² + 1) − 2 e^{−πs}/(s² + 1).

Now we can take the inverse Laplace transform to find

    y(t) = sin t − 2 u(t − π) sin(t − π).

But if we recall from trigonometry that sin(t − π) = − sin t, then we can write our solution as

    y(t) = (1 + 2u_π(t)) sin t = { sin t for 0 ≤ t < π;  3 sin t for t ≥ π }.

[Fig.8. The solution for Example 5.]

The graph appears in Figure 8. Notice that the amplitude of the vibration is 1 until the impulse occurs at t = π; thereafter the amplitude is 3.  □
Exercises
1. Express the following functions using step functions u_a(t) = u(t − a):
   (a) f(t) = { 0 for 0 ≤ t < 1;  t − 1 for 1 ≤ t < 2;  1 for t ≥ 2 }
   (b) g(t) = { 1 for 0 ≤ t < 2π;  cos t for t ≥ 2π }
2. Solve the following first-order initial-value problems:
   (a) y′ − y = 3u₁(t),    y(0) = 0.
   (b) y′ + 2y = u₅(t),    y(0) = 1.
3. Solve the following second-order initial-value problems:
   (a) y″ − y = u₁(t),                    y(0) = 2, y′(0) = 0.
   (b) y″ − 4y = u₁(t) − u₂(t),           y(0) = 0, y′(0) = 2.
   (c) y″ + 4y′ + 5y = 5u₃(t),            y(0) = 2, y′(0) = 0.
   (d) y″ + 3y′ + 2y = 10 u_π(t) sin t,   y(0) = 0 = y′(0).   (Solution)
4. Solve the following first-order initial-value problems:
   (a) y′ − 2y = δ(t − 1),            y(0) = 1.
   (b) y′ − 5y = e^{−t} + δ(t − 2),   y(0) = 0.
5. Solve the following second-order initial-value problems:
   (a) y″ + 4y = δ(t − 3),                     y(0) = 1, y′(0) = 0.   (Solution)
   (b) y″ + 4y′ + 4y = 1 + δ(t − 2),           y(0) = 0 = y′(0).
   (c) y″ + 4y′ + 5y = δ(t − π) + δ(t − 2π),   y(0) = 0, y′(0) = 2.
   (d) y″ + 16y = cos 3t + 2 δ(t − π/2),       y(0) = 0 = y′(0).
6. A 1 kg mass is attached to a spring with constant k = 9 N/m. Initially, the mass
is in equilibrium, but for 0 ≤ t ≤ 2π experiences a periodic force of 4 sin t N, and
then the force is turned off. Ignoring friction (let c = 0), determine the motion.
7. A mass m = 1 is attached to a spring with constant k = 5 and a dashpot with
coefficient c = 2. Initially in equilibrium, the mass is subjected to a constant force
f = 1 for 0 ≤ t ≤ π, and then the force is turned off. Find the motion.
8. A mass m = 1 is attached to a spring with constant k = 1 and no friction (c = 0).
At t = 0, the mass is pulled from its equilibrium at x = 0 to x = 1, extending the
spring, and released; then at t = 1 sec, the mass is given a unit impulse in the
positive x-direction. Determine the motion x(t).
3.5  Convolutions
When solving an initial-value problem using the Laplace transform, we may want to take the inverse Laplace transform of a product of two functions whose inverse Laplace transforms are known, i.e. F(s)G(s) where F(s) = L[f(t)] and G(s) = L[g(t)]. In general, L^{−1}[F(s)G(s)] ≠ f(t)g(t), so we need to know how to take the inverse Laplace transform of a product. This leads us to the "convolution" of two functions.

Definition 1. If f and g are piecewise continuous functions for t ≥ 0, then the convolution is the function defined for t ≥ 0 by

    f ⋆ g(t) = ∫_0^t f(t − τ) g(τ) dτ.
If we make the substitution u = t − τ in the integral, then du = −dτ and

    ∫_0^t f(t − τ) g(τ) dτ = −∫_t^0 f(u) g(t − u) du = ∫_0^t g(t − u) f(u) du = g ⋆ f(t),

so convolution is commutative: f ⋆ g = g ⋆ f. It is also easy to check that convolution is associative, i.e. f ⋆ (g ⋆ h) = (f ⋆ g) ⋆ h, and distributive, i.e. f ⋆ (g + h) = f ⋆ g + f ⋆ h.
Let us compute an example.

Example 1. If f(t) = t and g(t) = sin t, then we use integration by parts to calculate

    f ⋆ g(t) = ∫_0^t (t − τ) sin τ dτ = t ∫_0^t sin τ dτ − ∫_0^t τ sin τ dτ
             = −t [ cos τ ]_{τ=0}^{τ=t} + [ τ cos τ ]_{τ=0}^{τ=t} − ∫_0^t cos τ dτ
             = −t (cos t − 1) + t cos t − [ sin τ ]_{τ=0}^{τ=t} = t − sin t.

We conclude that t ⋆ sin t = t − sin t.  □
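Example 1 is also easy to confirm with software, and at the same time one can see Theorem 1 below in action: the transform of the convolution equals the product (1/s²)·(1/(s² + 1)). A minimal sketch, assuming Python with SymPy (not part of the text):

    import sympy as sp

    t, tau = sp.symbols('t tau', positive=True)
    s = sp.Symbol('s', positive=True)

    conv = sp.integrate((t - tau)*sp.sin(tau), (tau, 0, t))
    print(sp.simplify(conv))                                        # should print t - sin(t)
    print(sp.simplify(sp.laplace_transform(conv, t, s, noconds=True)
                      - 1/(s**2*(s**2 + 1))))                       # should print 0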
Now let us see how convolution relates to the Laplace transform.
Theorem 1. If f and g are both piecewise continuous on [0, ∞) and of exponential order c as t → ∞, then

    L[f ⋆ g(t)] = F(s) G(s)   for s > c.
Proof. Let us write G(s) as

    G(s) = ∫_0^∞ e^{−su} g(u) du = ∫_τ^∞ e^{−s(t−τ)} g(t − τ) dt = e^{sτ} ∫_τ^∞ e^{−st} g(t − τ) dt,

where we have replaced the integration variable u by t = u + τ; here τ > 0 is fixed but below we will allow it to vary as an integration variable. Using this we find

    F(s) G(s) = ∫_0^∞ e^{−sτ} f(τ) dτ · G(s) = ∫_0^∞ e^{−sτ} f(τ) G(s) dτ
              = ∫_0^∞ f(τ) ∫_τ^∞ e^{−st} g(t − τ) dt dτ.

Now in this last integral, we want to change the order of integration: instead of integrating τ ≤ t < ∞ and then 0 ≤ τ < ∞, we want to integrate 0 ≤ τ ≤ t and then 0 ≤ t < ∞ (see Figure 1). But this means

    F(s) G(s) = ∫_0^∞ ∫_0^t f(τ) e^{−st} g(t − τ) dτ dt = ∫_0^∞ e^{−st} ( ∫_0^t f(τ) g(t − τ) dτ ) dt = L[f ⋆ g(t)],

which establishes our result.  □

[Fig.1. Domain of integration.]
One application of Theorem 1 is simply to compute inverse Laplace transforms.
Example 2. Find the inverse Laplace transform:

    L^{−1}[ 1/(s²(s + 1)) ].

Solution. We could use partial fractions to express s^{−2}(s + 1)^{−1} as a sum of terms that are easier to invert through the Laplace transform. But, since f(t) = t has Laplace transform F(s) = 1/s² and g(t) = e^{−t} has Laplace transform G(s) = 1/(s + 1), we can also use Theorem 1:

    L^{−1}[ 1/(s²(s + 1)) ] = f ⋆ g(t) = ∫_0^t (t − τ) e^{−τ} dτ = t ∫_0^t e^{−τ} dτ − ∫_0^t τ e^{−τ} dτ
                            = −t [ e^{−τ} ]_{τ=0}^{τ=t} + [ τ e^{−τ} ]_{τ=0}^{τ=t} − ∫_0^t e^{−τ} dτ = e^{−t} + t − 1.  □
Another application of Theorem 1 is to derive solution formulas when the input
function is not yet known. This can be important in applications where the system is
fixed but the input function is allowed to vary; for example, consider a fixed spring-mass-dashpot system with different forcing functions f(t).
Example 3. Show that the solution of the initial-value problem

    x″ + ω² x = f(t),   x(0) = 0 = x′(0),

is given by

    x(t) = (1/ω) ∫_0^t sin ω(t − τ) f(τ) dτ.    (3.31)

Solution. We assume that f(t) is piecewise continuous on [0, ∞) and of exponential order as t → ∞. Taking the Laplace transform, we obtain

    (s² + ω²) X(s) = F(s),

where F(s) = L[f(t)]. Solving for X(s), we find

    X(s) = 1/(s² + ω²) · F(s) = (1/ω) · ω/(s² + ω²) · F(s).

But we recognize ω/(s² + ω²) as the Laplace transform of sin ωt, so we may apply Theorem 1 to obtain (3.31).  □
Remark 1. The solution formula (3.31) provides the mapping of the system’s input f (t)
to its output or response, x(t). Under the Laplace transform, the relationship of the input
to the output is given by X(s) = H(s)F(s), where H(s) is called the transfer function. Of course, for the system in Example 3, the transfer function is H(s) = (s² + ω²)^{−1}.
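Formula (3.31) is easy to test on a sample input; here ω = 2 and f(t) = 1 are illustrative choices, and the sketch assumes Python with SymPy (not part of the text).

    import sympy as sp

    t, tau = sp.symbols('t tau')
    omega = 2
    f = sp.Integer(1)                                   # sample forcing f(tau) = 1

    x = sp.integrate(sp.sin(omega*(t - tau))*f, (tau, 0, t))/omega
    print(sp.simplify(x))                               # should print 1/4 - cos(2*t)/4
    print(sp.simplify(x.diff(t, 2) + omega**2*x))       # should print 1, i.e. equals f(t)
    print(x.subs(t, 0), x.diff(t).subs(t, 0))           # should print 0 0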
Exercises
1. Calculate the convolution of the following functions:
   (a) f(t) = t, g(t) = 1
   (b) f(t) = t, g(t) = e^{at} (a ≠ 0)
   (c) f(t) = t², g(t) = sin t
   (d) f(t) = e^{at}, g(t) = e^{bt} (a ≠ b)
2. Find the inverse Laplace transform using Theorem 1. (You may want to use the table of integrals in Appendix C.)
   (a) 2/((s − 1)(s − 2))      (b) s/((s + 1)(s² + 1))      (c) 1/(s³ + s)
3. Find the Laplace transform of the given function:
   (a) f(t) = ∫_0^t (t − τ)² cos 2τ dτ
   (b) f(t) = ∫_0^t e^{t−τ} sin 3τ dτ
   (c) f(t) = ∫_0^t sin(t − τ) cos τ dτ
4. Determine solution formulas like (3.31) for the following initial-value problems:
   (a) y″ − y = f(t),          y(0) = 0 = y′(0)
   (b) y″ + 2y′ + 5y = f(t),   y(0) = 0 = y′(0)
   (c) y″ + 2y′ + y = f(t),    y(0) = 0 = y′(0)
3.6  Additional Exercises
1. Find the values of c so that f(t) = t^{100} e^{2t} is of exponential order c.

2. Show that f(t) = sin(e^{t²}) is of exponential order 0, but f′(t) is not of exponential order at all!
3. Consider the function

       f(t) = { 1, for 2k ≤ t < 2k + 1;  −1, for 2k + 1 ≤ t < 2k + 2 },

   where k = 0, 1, … . Use Definition 1 and a geometric series to show

       L[f(t)] = (1 − e^{−s}) / (s(1 + e^{−s}))   for s > 0.
4. Consider the function

       f(t) = { 1, for 2k ≤ t < 2k + 1;  0, for 2k + 1 ≤ t < 2k + 2 },

   where k = 0, 1, … . Use Definition 1 and a geometric series to show

       L[f(t)] = 1 / (s(1 + e^{−s}))   for s > 0.
5. Consider the saw-tooth function f(t) given in the sketch. Find L[f(t)]. (Hint: Sketch f′(t).)

   [Fig.1. f(t) in Exercise 5.]

6. Consider the modified saw-tooth function f(t) given in the sketch. Find L[f(t)]. (Hint: Sketch f′(t).)

   [Fig.2. f(t) in Exercise 6.]

7. Solve the following initial-value problem (see Exercise 1 in Section 3.2):

       x^{(4)} − x = 0,   x(0) = 1, x′(0) = x″(0) = x^{(3)}(0) = 0.

8. Solve the following initial-value problem (see Exercise 1 in Section 3.2):

       x^{(4)} + 2x″ + x = 1,   x(0) = x′(0) = x″(0) = x^{(3)}(0) = 0.

9. Find the inverse Laplace transform of F(s) = 1/(s⁴ − a⁴).

10. Find the inverse Laplace transform of F(s) = s²/(s⁴ − a⁴).

11. Consider the equation

        y″ + y = f(t)   where   f(t) = u₀(t) + 2 Σ_{k=1}^{∞} (−1)^k u_{kπ}(t).
(a) Explain why the infinite series converges for each t in [0, ∞).
(b) Sketch the graph of f (t). Is it periodic? If so, find the period.
(c) Use the Laplace transform to find the solution satisfying y(0) = 0 = y 0 (0).
(d) Do you see resonance? Explain.
12. Consider the equation

        y″ + y = g(t)   where   g(t) = Σ_{k=0}^{∞} u_{kπ}(t).
(a) Explain why the infinite series converges for each t in [0, ∞).
(b) Sketch the graph of g(t). Is it periodic? If so, find the period.
(c) Use the Laplace transform to find the solution satisfying y(0) = 0 = y 0 (0).
(d) Would you describe this as resonance? Explain.
13. Suppose a mass m = 1 is connected to a spring with constant k = 4 and experiences a series of unit impulses at t = nπ for n = 0, 1, 2, …, so that the following equation holds:

        x″ + 4x = Σ_{n=0}^{∞} δ_{nπ}(t).
(a) If the mass is initially at equilibrium, find the motion x(t) for t > 0.
(b) Sketch x(t). Do you see resonance?
14. Suppose the spring in the previous problem is replaced by one for which k = 1, so that the equation becomes

        x″ + x = Σ_{n=0}^{∞} δ_{nπ}(t).
(a) If the mass is initially at equilibrium, find the motion x(t) for t > 0.
(b) Sketch x(t). Explain why there is no resonance.
Chapter 4

Systems of Linear Equations and Matrices

4.1  Introduction to Systems and Matrices
In this chapter we want to study the solvability of linear systems, i.e. systems of
linear (algebraic) equations. A simple example is the system of two equations in two
unknowns x1 , x2 :
    a x₁ + b x₂ = y₁
    c x₁ + d x₂ = y₂,    (4.1)
where a, b, c, d, y1 , y2 are known. Note that these are algebraic rather than differential
equations; however, as we saw in Section 2.2, the solvability theory for (4.1) is essential
for proving the linear independence of solutions of 2nd-order differential equations.
Moreover, the linear algebra theory and notation that we shall develop will be useful
for our study of systems of differential equations in Chapter 7. However, before we
generalize (4.1) to more equations and unknowns, let us review the results for (4.1) that
should be familiar from high school.
One way to solve (4.1) is by substitution: solve for one variable in one equation,
say x1 in terms of x2 , and then substitute it in the other equation to obtain an equation
only involving x2 ; solve this equation for x2 and use this to find x1 . Another method
for solving (4.1) is elimination: multiply one equation by a constant so that adding it
to the other equation eliminates one of the variables, say x1 ; solve this equation for x2
and substitute this back into one of the original equations to find x1 . Let us illustrate
the method of elimination with a simple example.
Example 1. Find the solution of
x1 + 2 x2 = 6
3 x1 − x2 = 4.
Solution. We multiply the second equation by 2 to obtain
x1 + 2 x2 = 6
6 x1 − 2x2 = 8.
105
106
CHAPTER 4. SYSTEMS OF LINEAR EQUATIONS AND MATRICES
If we add these equations, we obtain 7 x1 = 14, i.e. x2 has been eliminated. We find
x1 = 2 and then plug back into either of the original equations to find x2 = 2.
2
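For a quick numerical confirmation of this unique solution, here is a sketch assuming Python with NumPy (not part of the text):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, -1.0]])
    b = np.array([6.0, 4.0])
    print(np.linalg.solve(A, b))   # should print [2. 2.], i.e. x1 = 2, x2 = 2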
In this example we not only found a solution of the linear system, but showed that it
is the unique solution (i.e. there are no other values of x1 , x2 satisfying the equation).
In fact, we can interpret this result geometrically if we consider each equation in the
system as defining a straight line in the x₁, x₂-plane:

    x₁ + 2x₂ = 6    ⇔   x₂ = −(1/2) x₁ + 3   (slope −1/2, x₂-intercept 3)
    6x₁ − 2x₂ = 8   ⇔   x₂ = 3x₁ − 4         (slope 3, x₂-intercept −4).

[Fig.1. Geometric solution of Example 1.]
The fact that there is a unique solution is geometrically the fact that the two lines
intersect in one point (see Figure 1). Of course, two lines do not necessarily intersect
(they could be parallel); in this case the two equations do not admit any solution and
are called inconsistent. There is one other possibility that we illustrate in the next
example.
Example 2. Discuss the solvability of

    a)  x₁ − x₂ = 1           b)  x₁ − x₂ = 1
        2x₁ − 2x₂ = 3             2x₁ − 2x₂ = 2.
Solution. a) If we multiply the 1st equation by 2 and subtract it from the second
equation we obtain 0=1! There is no choice of x1 , x2 to make this statement true, so we
conclude that the equations are inconsistent; if we graph the two lines, we find they are
parallel. b) If we perform the same algebra on these equations, we obtain 0=0! While
this is certainly true, it does not enable us to find the values of x1 , x2 . In fact, the two
equations are just constant multiples of each other, so we have an infinite number of
solutions all satisfying x1 − x2 = 1.
2
We conclude that there are three possibilities for the solution set of (4.1):
• There is a unique solution;
• There is no solution;
• There are infinitely many solutions.
We shall see that these three possibilities also apply to linear systems involving more
equations and more unknowns.
Matrix Notation and Algebra
Let us now begin to develop the notation and theory for the general case of a linear
system of m equations in n unknowns:
    a₁₁ x₁ + a₁₂ x₂ + ··· + a₁ₙ xₙ = c₁
    a₂₁ x₁ + a₂₂ x₂ + ··· + a₂ₙ xₙ = c₂
      ⋮
    a_{m1} x₁ + a_{m2} x₂ + ··· + a_{mn} xₙ = c_m.    (4.2)
We want to find the solution set, i.e. all values of x₁, …, xₙ satisfying (4.2). The numbers a_{ij} and c_k are usually real numbers, but they could also be complex numbers; in that case we must allow the unknowns x_j to also be complex. The a_{ij} are called the coefficients of the system (4.2) and we can gather them together in an array called a matrix:

    A = [ a₁₁   a₁₂   ···  a₁ₙ
          a₂₁   a₂₂   ···  a₂ₙ
           ⋮     ⋮          ⋮
          a_{m1}  a_{m2}  ···  a_{mn} ].    (4.3)
We also want to introduce vector notation for the unknowns x₁, …, xₙ and the values on the right-hand side c₁, …, c_m. In fact, let us represent these as "column vectors":

    x = [ x₁ ]        c = [ c₁ ]
        [ x₂ ]            [ c₂ ]
        [ ⋮  ]            [ ⋮  ]
        [ xₙ ],           [ c_m ].    (4.4)

We want to be able to perform various algebraic operations with matrices and vectors, including a definition of multiplication that will enable us to write (4.2) in the form

    A x = c,    (4.5)

where Ax represents the product of A and x that we have yet to define.
Let us now discuss the definition and algebra of matrices more systematically.
Definition 2. An (m × n)-matrix is a rectangular array of numbers arranged in m
rows and n columns. The numbers in the array are called the elements of the matrix.
An (m × 1)-matrix is called a column vector and a (1 × n)-matrix is called a row
vector; the elements of a vector are also called its components.
We shall generally denote matrices by bold faced capital letters such as A or B, although
we sometimes write A = (aij ) or B = (bij ) to identify the elements of the matrix. We
shall generally denote vectors by bold faced lower-case letters such as x or b. To
distinguish numbers from vectors and matrices, we frequently use the term scalar for
any number (usually a real number, but complex numbers can be used as well).
Now let us discuss matrix algebra. Here are the two simplest rules:
Multiplication by a Scalar. For a matrix A and a scalar λ we define λA by multiplying each element by λ.
Matrix Addition. For (m × n)-matrices A and B, we define their sum A+B by
adding elementwise.
Let us illustrate these rules with an example:
Example 3. Compute λA, λB, A + B, and where
1 2 3
1
λ = −2,
A=
,
B=
4 5 6
0
0
1
−1
.
−1
In the notation aij , the
first subscript refers to the
row and the second refers
to the column
Solution. To compute λA and λB, we multiply each element of the matrix by −2:

    λA = [ −2   −4   −6        λB = [ −2   0  2
           −8  −10  −12 ]  and         0  −2  2 ].

To compute A + B we simply add the corresponding elements:

    A + B = [ 1  2  3     [ 1  0  −1     [ 2  2  2
              4  5  6 ] +   0  1  −1 ] =   4  6  5 ].  □
We also want to be able to multiply two matrices together, at least if they are of the correct sizes. Before defining this, let us recall the definition of the dot product of two vectors x = (x₁, x₂, …, xₙ) and y = (y₁, y₂, …, yₙ):

    x · y = x₁y₁ + x₂y₂ + ··· + xₙyₙ.    (4.6)
Now we use the dot product to define matrix multiplication. If we are given matrices A and B, we want to define the matrix C so that AB = C. To determine the element c_{ij} in the ith row and jth column of C, we consider the ith row of A and the jth column of B as vectors and take their dot product:

    c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + ··· + a_{in} b_{nj}.

Note that this is only possible if the number of columns in A equals the number of rows in B. Let us summarize this as follows:

Matrix Multiplication. For an (m × n)-matrix A with elements a_{ij} and an (n × p)-matrix B with elements b_{ij}, we define their product C = AB to be the (m × p)-matrix with elements

    c_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}.    (4.7)
Before we compute numerical examples, let us observe that this definition of matrix multiplication is exactly what we need to represent the linear system (4.2) as (4.5):

    A x = [ a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ
            a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ
                        ⋮
            a_{m1}x₁ + a_{m2}x₂ + ··· + a_{mn}xₙ ],

so A x = c is exactly the same thing as (4.2).
Now let us compute some numerical examples of matrix multiplication. Notice that
the matrices A and B in Example 3 are both (2 × 3)-matrices so the number of columns
of A does not match the number of rows of B, and they cannot be multiplied. Let us
try another example where the matrices are of the correct sizes.
Example 4. Compute AB and BA where

    A = [ 1  2  3        B = [  1   0
          4  5  6 ]  and        1   0
                               −1  −1 ].

Solution. Note that A is a (2 × 3)-matrix and B is a (3 × 2)-matrix, so the product AB is defined and is a (2 × 2)-matrix:

    AB = [ 1 + 2 − 3   0 + 0 − 3     [ 0  −3
           4 + 5 − 6   0 + 0 − 6 ] =   3  −6 ].

Note that the product BA is also defined and results in a (3 × 3)-matrix:

    BA = [  1   0   [ 1  2  3     [  1 + 0    2 + 0    3 + 0       [  1   2   3
            1   0     4  5  6 ] =    1 + 0    2 + 0    3 + 0    =     1   2   3
           −1  −1 ]                 −1 − 4   −2 − 5   −3 − 6 ]       −5  −7  −9 ].  □
This last example shows that, even if two matrices A and B are of compatible sizes so
that both A B and B A are defined, we need not have A B = B A; in fact A B and
B A need not even be of the same size!
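A short numerical sketch of Example 4 (assuming Python with NumPy, not part of the text) makes the non-commutativity plain; here AB and BA are not even the same size:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    B = np.array([[1, 0],
                  [1, 0],
                  [-1, -1]])
    print(A @ B)   # should print the (2 x 2) matrix [[0, -3], [3, -6]]
    print(B @ A)   # should print the (3 x 3) matrix [[1, 2, 3], [1, 2, 3], [-5, -7, -9]]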
Let us summarize some of the properties of matrix algebra that are valid:
Properties of Matrix Addition and Multiplication. Let A, B, and C be matrices of
the appropriate sizes so that the indicated algebraic operations are well-defined.
Commutativity of Addition: A + B = B + A
Associativity of Addition: A + (B + C) = (A + B) + C
Associativity of Multiplication: A (B C) = (A B) C
Distributivity: A (B + C) = A B + A C and (A + B) C = A C + B C
Again, notice that commutativity of matrix multiplication is conspicuously absent since
in general A B 6= B A.
Now let us introduce an important class of matrices and a special member.
Definition 3. An (n × n)-matrix, i.e. one with the same number of rows and columns,
is called a square matrix. The square matrix with 1’s on its main diagonal, i.e. from
the upper left to lower right, and 0’s everywhere else is called the identity matrix. The
identity matrix is denoted by I, or Iₙ if we want to emphasize the size is (n × n). The identity matrix is

    I = [ 1  0  ···  0
          0  1  ···  0
          ⋮  ⋮   ⋱   ⋮
          0  0  ···  1 ].
If A and B are square matrices of the same size (n × n) then both A B and B A are
defined (although they need not be equal to each other). In particular, the identity
matrix has the property
IA = A = AI
(4.8)
for any square matrix A (of the same size).
Transpose of a Matrix
If A = (a_{ij}) is an (m × n)-matrix, then its transpose Aᵀ is the (n × m)-matrix obtained by using the rows of A as the columns of Aᵀ; hence the columns of A become the rows of Aᵀ. If we denote the elements of Aᵀ by aᵀ_{ij}, then we find

    aᵀ_{ij} = a_{ji}.    (4.9)

Here are a couple of simple examples:

    [ 1  2  3 ]ᵀ   [ 1  4 ]           [ 1  2   3 ]ᵀ   [ 1   1  4 ]
    [ 4  5  6 ]  = [ 2  5 ]    and    [ 1  0  −1 ]  = [ 2   0  5 ]
                   [ 3  6 ]           [ 4  5   6 ]    [ 3  −1  6 ].
Note that the transpose of a square matrix is also a square matrix (of the same size), and may be obtained by "mirror reflection" of the elements of A across the main diagonal. A square matrix A is symmetric if Aᵀ = A, i.e. a_{ij} = a_{ji} for i, j. Here are some examples:

    [ 1  0   1 ]                        [  1  0   1 ]
    [ 0  0   0 ]  is symmetric, but     [  0  0   0 ]  is not symmetric.
    [ 1  0  −1 ]                        [ −1  0  −1 ]
The following properties of the transpose are easily verified and left as exercises:

    i) (Aᵀ)ᵀ = A,    ii) (A + B)ᵀ = Aᵀ + Bᵀ,    iii) (AB)ᵀ = BᵀAᵀ.
But the true significance of the transpose lies in its relationship to the dot product. In particular, if A is an (m × n)-matrix, x is an m-vector, and y is an n-vector, then

    x · Ay = Aᵀx · y.    (4.10)

It is elementary (although perhaps a little confusing) to directly verify (4.10):

    x · Ay = Σ_{i=1}^{m} x_i ( Σ_{j=1}^{n} a_{ij} y_j ) = Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij} x_i y_j,

    Aᵀx · y = Σ_{i=1}^{n} ( Σ_{j=1}^{m} aᵀ_{ij} x_j ) y_i = Σ_{i=1}^{n} ( Σ_{j=1}^{m} a_{ji} x_j ) y_i = Σ_{i=1}^{n} Σ_{j=1}^{m} a_{ji} x_j y_i.

We see these two quantities are the same (only the roles of i and j are different), so this verifies (4.10).
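Identity (4.10) is also easy to spot-check numerically on randomly chosen A, x, and y; a minimal sketch, assuming Python with NumPy (not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 4
    A = rng.standard_normal((m, n))
    x = rng.standard_normal(m)
    y = rng.standard_normal(n)

    lhs = x @ (A @ y)        # x . (A y)
    rhs = (A.T @ x) @ y      # (A^T x) . y
    print(np.isclose(lhs, rhs))   # should print True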
Exercises
1. For the following systems of two equations with two unknowns, determine whether
(i) there is a unique solution, and if so find it, (ii) there is no solution, or (iii)
there are infinitely many solutions:
   (a) 3x₁ + 2x₂ = 9            (c) x₁ − 2x₂ = 3
       x₁ − x₂ = 8                  2x₁ − 4x₂ = 5

   (b) 2x₁ + 3x₂ = 1  (Sol'n)   (d) 4x₁ − 2x₂ = 12
       3x₁ + 5x₂ = 3                6x₁ − 3x₂ = 18
2. For the following matrices A and B calculate 2A, 3B, A + B, and A − 2B:

       A = [ 1  0  −1        B = [ 0  1  2
             1  2   3 ],           2  1  0 ]
3. For the following matrices, calculate AB if it exists:

   (a) A = [ 1  2      B = [ 2   1        (Sol'n)
             3  4 ],         0  −1 ]

   (b) A = [ 1  2      B = [ 2   1
             3  4 ],         0  −1 ]

   (c) A = [ 1  2      B = [ 2   1
             3  4 ],         0  −1
                             1   3 ]

   (d) A = [ 1  2      B = [  1  0
             4  3            −1  1 ]
             5  6 ],
4. Calculate Aᵀ, Bᵀ, AB, BA, AᵀBᵀ, and BᵀAᵀ:

       A = [ 3  1  −1        B = [ −1   1
             2  4   5 ],            2   3
                                    0  −4 ]
5. Determine which of the following square matrices are symmetric:

       A = [ 1   2        B = [  1  2
             2  −1 ],          −2  1 ],

       C = [ 1   2   3        D = [ 0  −1  −2
             2   0  −1              1   0   3
             3  −1   4 ],           2   3   0 ]
4.2  Gaussian Elimination
In this section we want to use matrices to address the solvability of (4.2). In addition to the coefficient matrix A in (4.3) and the vector c in (4.4), we can gather both together in a single matrix called the augmented coefficient matrix:

    [ a₁₁   a₁₂   ···  a₁ₙ  | c₁
      a₂₁   a₂₂   ···  a₂ₙ  | c₂
       ⋮     ⋮          ⋮   |  ⋮
      a_{m1}  a_{m2}  ···  a_{mn} | c_m ]    (4.11)

(The optional vertical line distinguishes the vector column from the coefficient matrix.) The objective of "Gaussian elimination" is to perform operations on (4.11) that do not change the solution set, but convert (4.11) to a new form from which the solution set is easily described. The operations that we are allowed to perform are exactly those used in the method of elimination reviewed in the previous section. Let us illustrate this with an example:
Example 1. Apply the method of elimination to solve
x + 2y + z = 4
3x + 6y + 7z = 20
2x + 5y + 9z = 19.
Solution. We can multiply the first equation by −3 and add it to the second equation:
x + 2y + z = 4
4z = 8
2x + 5y + 9z = 19.
Now multiply the first equation by −2 and add it to the third equation:
x + 2y + z = 4
4z = 8
y + 7z = 11.
Now multiply the second equation by 1/4 and then interchange the second and third
equations:
x + 2y + z = 4
y + 7z = 11
z = 2.
The last row tells us that z = 2, but the other rows have a nice triangular form allowing
us to iteratively substitute back into the previous equation to find the other unknowns:
z = 2 into 2nd equation ⇒ y + 7(2) = 11 ⇒ y = −3,
z = 2, y = −3 into 1st equation ⇒ x + 2(−3) + (2) = 4 ⇒ x = 8.
We have our (unique) solution: x = 8, y = −3, z = 2.  □
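For a quick numerical confirmation of this solution, here is a sketch assuming Python with NumPy (not part of the text):

    import numpy as np

    A = np.array([[1.0, 2.0, 1.0],
                  [3.0, 6.0, 7.0],
                  [2.0, 5.0, 9.0]])
    c = np.array([4.0, 20.0, 19.0])
    print(np.linalg.solve(A, c))   # should print [ 8. -3.  2.]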
In this example, we used three types of operations that did not affect the solution set: i) multiply one equation by a number and add it to another equation, ii) multiply one equation by a nonzero number, and iii) interchange two equations. However, we carried along the unknowns x, y, and z unnecessarily; we could have performed these operations on the augmented coefficient matrix:

    [ 1 2 1 | 4       [ 1 2 1 | 4       [ 1 2 1 | 4        [ 1 2 1 | 4
      3 6 7 | 20   →    0 0 4 | 8    →    0 0 4 | 8     →     0 1 7 | 11
      2 5 9 | 19 ]      2 5 9 | 19 ]      0 1 7 | 11 ]        0 0 1 | 2 ]

Here we see that the final version of the augmented matrix is indeed in a triangular form that enables us to conclude z = 2 and then "back substitute" as before to find y = −3 and x = 8. When performed on a matrix, we call these "elementary row operations":
Elementary Row Operations (EROs). The following actions on an (m × n)-matrix
are called elementary row operations:
• Multiply any row by a nonzero number.
• Interchange any two rows.
• Multiply one row by a number and add it to another row (leaving the first row
unchanged).
It will often be convenient to denote the rows of a matrix by R1 , R2 , etc.
Notice that elementary row operations are reversible: if an ERO transforms the
matrix A into the matrix B, then there is an ERO that transforms B to A. This
enables us to make the following definition.
Definition 1. Matrices A and B are called row-equivalent if one may be transformed
into the other by means of elementary row operations. In this case we write A ∼ B.
The fact that EROs are reversible also means that the following holds:
Theorem 1. If the augmented coefficient matrix for a system of linear equations can be
transformed by EROs to the augmented coefficient matrix for another system of linear
equations, then the two systems have the same solution set.
We can now specify that Gaussian elimination involves the use of elementary row
operations to convert a given augmented coefficient matrix into a form from which we
can easily determine the solution set. However, we have not specified what this final
form should look like. We address this next:
Row-Echelon Form (REF). An (m × n)-matrix is in echelon or row-echelon form
if it meets the following conditions:
1. All rows consisting entirely of zeros lie beneath all nonzero rows.
2. The first nonzero element in any row is a 1; this element is called a leading 1.
3. Any leading 1 must lie to the right of any leading 1’s above it.
If you add λR1 to R2 , be
sure to leave R1 in place.
At first glance, it is difficult to tell what this definition means, so let us give some simple examples. It is easy to check that the following three matrices are all in row-echelon form:

    [ 1 2 3      [ 1 0 −1  1      [ 1 2 3 4
      0 1 2        0 0  1 −1        0 1 0 1
      0 0 1 ]      0 0  0  0 ]      0 0 0 1
                                    0 0 0 0 ]

In particular, note that a matrix does not need to be a square matrix to be in row-echelon form. However, here are some examples of matrices that are not in row-echelon form:

    [ 1 2 3      [ 1 2 3      [ 1 1 −1  1
      0 0 0        0 1 2        1 0  1 −1
      0 0 1 ]      0 0 2 ]      0 0  0  0 ]

In fact, the first of these violates condition 1, the second violates condition 2, and the third violates condition 3.
Now let us suppose that we have used elementary row operations to put the augmented coefficient matrix for a linear system into row-echelon form. How do we determine the solution set?
Determining the Solution Set from the REF. Suppose an augmented coefficient matrix
is in row-echelon form.
1. The leading variables correspond to columns that contain a leading 1.
2. All other variables are called free variables; set each free variable equal to a
free parameter.
3. Use the first nonzero row (from the bottom) to solve for the corresponding
leading variable in terms of the free parameters.
4. Continue up the matrix until all leading variables have been expressed in terms
of the free parameters. (This process is called back substitution.)
Let us illustrate this with two examples.
Example 2. Suppose a system of linear equations has an augmented coefficient matrix
in the following form:
[ 1 2 −1 1; 0 0 1 −1 ].
Determine the solution set.
Solution. This matrix has 4 columns, but the last one corresponds to the right hand
side of the system of equations, so there are 3 variables x1 , x2 , and x3 (although we
could also have called them x, y, z). Since the first and third columns both contain
leading 1’s, we identify x1 and x3 as leading variables. Since the second column does
not contain a leading 1, we recognize x2 as a free variable, and we let x2 = t, where t is
a free parameter. Now the first nonzero row from the bottom is the second row and it
tells us
x3 = −1.
The remaining nonzero row is the first one and it tells us
x1 + 2x2 − x3 = 1.
But we know x2 = t and x3 = −1, so we “back substitute” to obtain x1 + 2t − (−1) = 1
or x1 = −2t. The solutions set is
x1 = −2t,
x2 = t,
x3 = −1
where t is any real number.
In particular, we have found that the system has an infinite number of solutions.
2
Example 3. Suppose a system of linear equations has an augmented coefficient matrix
in the following form:
[ 1 2 3 0; 0 1 0 −1; 0 0 0 1 ].
Determine the solution set.
Solution. The last equation reads 0 = 1! This is impossible, so the system is inconsistent, i.e. there are no solutions to this system.
2
Remark 1. Generalizing Example 3, we see that if an augmented coefficient matrix has
a row in the form (0 . . . 0 b) where b ≠ 0, then there cannot be a solution, i.e. the system
is inconsistent.
There is one more piece of our analysis: we would like an algorithmic procedure to
go from the original augmented coefficient matrix to the row-echelon form. It is this
algorithm that often is called “Gaussian elimination.”
Gaussian Elimination Algorithm. Suppose A is an (m × n)-matrix.
1. Find the leftmost nonzero column; this is called a pivot column and the top
position in this column is called the pivot position. (If A = 0, go to Step 6.)
2. Use elementary row operations to put a 1 in the pivot position.
3. Use elementary row operations to put zeros in all positions below the pivot
position.
4. If there are no more nonzero rows below the pivot position, go to Step 6; otherwise go to Step 5.
5. Repeat Steps 1-4 with the sub-matrix below and to the right of the pivot position.
6. The matrix is in row-echelon form.
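The algorithm above translates almost line for line into code. Here is a minimal Python sketch (the function name and the use of floating-point arithmetic are my own choices for illustration): it scans for a pivot column, swaps a nonzero entry into the pivot position, scales the pivot to 1, and clears the entries below it.

def row_echelon(M):
    """Return a row-echelon form of the matrix M (a list of rows)."""
    A = [row[:] for row in M]          # work on a copy
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):               # Step 1: scan columns left to right
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue                   # no pivot in this column
        A[pivot_row], A[r] = A[r], A[pivot_row]        # ERO: interchange rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]   # Step 2: make the pivot a 1
        for i in range(pivot_row + 1, m):              # Step 3: zeros below the pivot
            f = A[i][col]
            A[i] = [a - f * b for a, b in zip(A[i], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break                      # Steps 4/6: no nonzero rows left below
    return A

# Example 4 below:
print(row_echelon([[1, -1, 3, -3, 0], [2, 0, 10, -6, 2], [3, 0, 15, -4, -2]]))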
Example 4. Use Gaussian elimination to put the following matrix into row-echelon form:
[ 1 −1 3 −3 0; 2 0 10 −6 2; 3 0 15 −4 −2 ].
Solution. The first column is nonzero, so it is the pivot column and we already have 1 in the pivot position, which we have colored red in the displayed matrices below. We use EROs to put zeros below the pivot position:
[ 1 −1 3 −3 0; 2 0 10 −6 2; 3 0 15 −4 −2 ] ∼ [ 1 −1 3 −3 0; 0 2 4 0 2; 3 0 15 −4 −2 ]   (add −2R1 to R2)
∼ [ 1 −1 3 −3 0; 0 2 4 0 2; 0 3 6 5 −2 ]   (add −3R1 to R3)
Now we repeat the process on the 2 × 4-matrix containing the new pivot column [ 2; 3 ]. There is a 2 in the pivot position, so we color it red below as we replace it with a 1:
[ 1 −1 3 −3 0; 0 2 4 0 2; 0 3 6 5 −2 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 3 6 5 −2 ]   (multiply R2 by 1/2)
Now we want zeros below the pivot position:
[ 1 −1 3 −3 0; 0 1 2 0 1; 0 3 6 5 −2 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 5 −5 ]   (add −3R2 to R3)
Finally, we look at the last row, and see that there is a 5 in the pivot position, which we make a 1:
[ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 5 −5 ] ∼ [ 1 −1 3 −3 0; 0 1 2 0 1; 0 0 0 1 −1 ]   (multiply R3 by 1/5)
This is in row-echelon form.   □
Let us do one more example where we start with a system of linear equations and
finish with the solution set.
Example 5. Find the solution set for
x1 + 2 x2 + x4 = 3
x1 + x2 + x3 + x4 = 1
x2 − x3 = 2.
Solution. Notice that there are three equations with four unknowns, so putting the
augmented coefficient matrix into row-echelon form will result in at most three leading
variables. Consequently, there must be at least one free variable; even at this stage we
know that there must be an infinite number of solutions.
Now let us put the augmented coefficient matrix into row-echelon form:
[ 1 2 0 1 3; 1 1 1 1 1; 0 1 −1 0 2 ] ∼ [ 1 2 0 1 3; 0 −1 1 0 −2; 0 1 −1 0 2 ]   (add −R1 to R2)
∼ [ 1 2 0 1 3; 0 1 −1 0 2; 0 0 0 0 0 ]   (add R2 to R3, then multiply R2 by −1).
We see that x3 = s and x4 = t are both free variables and we can solve for x1 and x2
in terms of them:
R2 : x2 − s = 2 ⇒ x2 = s + 2
R1 : x1 + 2(s + 2) + t = 3 ⇒ x1 = −1 − 2s − t.
We can write the solution set as (x1 , x2 , x3 , x4 ) = (−1 − 2s − t, s + 2, s, t), where s, t are
any real numbers.
2
Homogeneous Systems
Of particular interest is (4.5) when c is the vector 0, whose components are all zero. In
this case, we want to solve Ax = 0, which is called a homogeneous system. (Note the
similarity with a homogeneous linear differential equation L(D)y = 0, as discussed in
Chapter 2.) One solution is immediately obvious, namely the trivial solution x = 0.
The question is whether there are nontrivial solutions. Gaussian elimination enables us
to answer this question. Moreover, since the augmented coefficient matrix has all zeros
in the last column, we can ignore it in our Gaussian elimination.
Example 6. Find the solution set for
x1 + x2 + x3 − x4 = 0
−x1 − x3 + 2 x4 = 0
x1 + 3 x2 + x3 + 2x4 = 0.
Solution. To begin with, the number of equations m = 3 is less than the number of
variables n = 4, so we suspect there will be an infinite number of solutions. We apply
Gaussian elimination to the coefficient matrix to find the solution set:
[ 1 1 1 −1; −1 0 −1 2; 1 3 1 2 ] ∼ [ 1 1 1 −1; 0 1 0 1; 0 2 0 3 ]   (add R1 to R2, then −R1 to R3)
∼ [ 1 1 1 −1; 0 1 0 1; 0 0 0 1 ]   (add −2R2 to R3).
Remember that the augmented coefficient matrix has another column of zeros on the
right, so the last row means we must have x4 = 0. We also see that x3 is a free variable,
so we let x3 = s. From the second row we conclude x2 = 0. Finally, the first row tells
us x1 + 0 + s − 0 = 0, so x1 = −s. Our infinite solution set is
x1 = −s, x2 = 0, x3 = s, x4 = 0.
It includes the trivial solution x = 0 since we can take s = 0.
2
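If a computer algebra system is available, this kind of calculation is easy to check. A small sketch using SymPy (an assumed dependency, not something the text relies on) for the homogeneous system of Example 6:

from sympy import Matrix

# Coefficient matrix of Example 6 (the homogeneous system Ax = 0).
A = Matrix([[1, 1, 1, -1],
            [-1, 0, -1, 2],
            [1, 3, 1, 2]])

print(A.rref())        # reduced row-echelon form and the pivot columns
print(A.nullspace())   # a basis for the solution set; here a multiple of (-1, 0, 1, 0)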
Systems with Complex Coefficients
The linear systems and matrices that we have considered so far all had real coefficients,
but the principles that we have followed work equally well when the coefficients are
complex numbers; of course, in this case the solutions will also be complex numbers.
In order to put a 1 in the leading position, we often need to multiply by the complex
conjugate; for a review of this and other properties of complex numbers, cf. Appendix
A. Let us consider an example.
Example 7. Find the solution set for
(1 + i) x1 + (1 − 3i) x2 = 0
(−1 + i) x1 + (3 + i) x2 = 0.
Solution. The augmented coefficient matrix for this system is
[ 1+i  1−3i  0; −1+i  3+i  0 ].
We want to use elementary row operations to put this into row-echelon form. But since
the system is homogeneous, as we saw above we may omit the last column and just
work with the coefficient matrix. We begin by making (1 + i) real by multiplying by its
complex conjugate (1 − i) and then dividing by the resultant real number:
[ 1+i  1−3i; −1+i  3+i ] ∼ [ 2  −2−4i; −1+i  3+i ] ∼ [ 1  −1−2i; −1+i  3+i ].
Now we want to make the entry below the leading 1 a zero, so we multiply the first row by 1 − i and add it to the second row:
[ 1  −1−2i; −1+i  3+i ] ∼ [ 1  −1−2i; 0  0 ].
This is in row-echelon form and we see that x2 is a free variable, so we let x2 = t and
we solve for x1 to find x1 = (1 + 2i)t. But since we are allowing complex numbers, we
can allow t to take complex values, so our solution set is (x1 , x2 ) = ((1 + 2i)t, t) where
t is any complex number.
2
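The same software check works over the complex numbers. A minimal SymPy sketch for Example 7 (again an assumed dependency; the scaling step is mine, just to match the form of the answer):

from sympy import Matrix, I, simplify

A = Matrix([[1 + I, 1 - 3*I],
            [-1 + I, 3 + I]])

null = A.nullspace()                # basis vectors for the solutions of Ax = 0
v = simplify(null[0] / null[0][1])  # scale so that x2 = 1
print(v)                            # expect (1 + 2i, 1), matching x1 = (1 + 2i)t, x2 = t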
Linear systems with complex coefficients are of interest in their own right. However,
they also come up when studying real linear systems; for example when the coefficient
matrix has complex eigenvalues (cf. Section 6.1).
Exercises
1. Determine whether the following matrices are in row-echelon form:
(a) [ 1 2 3 0; 0 1 2 3; 0 0 0 1 ]        (b) [ 1 2 3 0; 0 0 1 0; 0 1 0 0 ]
(c) [ 2 1 3 0; 0 1 0 0; 0 0 1 0 ]        (d) [ 0 1 2 0; 0 1 0 0; 0 0 0 0 ]
2. Use Gaussian elimination to put the following matrices into row-echelon form:
(a) [ 2 3; 1 −1 ]        (b) [ 0 1 2; 0 1 3; 0 2 4 ]
(c) [ 2 1 0 −1; 0 0 1 0; 1 2 3 4 ]        (d) [ 0 1; 1 2; 1 0 ]
3. For the following linear systems, put the augmented coefficient matrix into rowechelon form, and then use back substitution to find all solutions:
(a)
(b)
2x1 + 8x2 + 3x3 = 2
x1 + 3x2 + 2x3 = 5
3x1 − 6x2 − 2x3 = 1
Sol 0 n
2x1 − 4x2 + x3 = 17
x1 − 2x2 − 2x3 = −9
2x1 + 7x2 + 4x3 = 8
(c)
4x1 + 3x2 + x3 = 8
(d)
2x1 + 5x2 + 12x3 = 6
2x1 + x2 = 3
3x1 + x2 + 5x3 = 12
−x1 + x3 = 1
5x1 + 8x2 + 21x3 = 17
x1 + 2x2 + x3 = 3
4. Apply Gaussian elimination to the coefficient matrix of the following homogeneous systems to determine the solution set.
(a) 3x1 + 2x2 − 3x3 = 0,  2x1 + x2 + x3 = 0,  5x1 − 4x2 + x3 = 0
(b) 2x1 − x2 − x3 = 0,  5x1 − x2 + 2x3 = 0,  x1 + x2 + 4x3 = 0
(c) x1 − 5x3 + 4x4 = 0,  x2 + 2x3 − 7x4 = 0
(d) x1 − 3x2 + 6x4 = 0,  x3 + 9x4 = 0
5. Find all solutions for the following systems with complex coefficients:
(a) (1 − i) x1 + 2i x2 = 0,  (1 + i) x1 − 2 x2 = 0
(b) i x1 + x2 = 1,  2 x1 + (1 − i) x2 = i
(c) i x1 + (1 − i) x2 = 0,  (−1 + i) x1 + 2 x2 = 0
(d) x1 + i x2 = 1,  x1 − x2 = 1 + i
4.3 Reduced Row-Echelon Form and Rank
As we saw in the previous section, every (m × n)-matrix can be put into row-echelon
form. However, the row-echelon form is not unique: any row containing a leading 1
can be added to any row above it, and the result is still in row-echelon form. On the
other hand, every (m × n)-matrix can be put into a particular row-echelon form that is
unique: this is called reduced row-echelon form.
Reduced Row-Echelon Form (RREF). An (m×n)-matrix is in reduced row-echelon
form if it meets the following conditions:
1. It is in row-echelon form.
2. Any leading 1 is the only nonzero element in that column.
Of course, we need to know how to put a matrix into reduced row-echelon form; this is
straight-forward, but it is given the name “Gauss-Jordan elimination”:
Gauss-Jordan Elimination Algorithm. Suppose A is an (m × n)-matrix.
1. Use Gaussian elimination to put A in row-echelon form.
2. Use each leading 1 to make all entries above it equal to zero.
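In practice the RREF is easy to compute with software. A short sketch using SymPy's built-in rref() (an assumed dependency; the matrix used is the one from Example 1 below):

from sympy import Matrix

A = Matrix([[1, 1, 5, 2, -1],
            [0, 1, 3, -1, 1],
            [2, 0, 4, 1, 1]])

R, pivots = A.rref()   # R is rref(A); pivots gives the indices of the pivot columns
print(R)               # expect rows (1 0 2 0 1), (0 1 3 0 0), (0 0 0 1 -1)
print(pivots)          # expect (0, 1, 3): x1, x2, x4 are the leading variables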
The following are examples of matrices in reduced row-echelon form:
[ 1 0 0; 0 1 0; 0 0 0 ]        [ 0 1 0 2; 0 0 1 −1; 0 0 0 0 ]        [ 1 2 0; 0 0 1; 0 0 0 ].
The Gauss-Jordan algorithm shows that every matrix can be put into reduced row-echelon form. A little experimentation should convince you that two distinct matrices
in reduced row-echelon form cannot be row-equivalent; this means the RREF is unique.
While this is not really a proof, let us record this as a theorem:
Theorem 1. Every (m × n)-matrix A is row-equivalent to a unique matrix in reduced
row-echelon form denoted rref(A).
Let us illustrate the theorem with an example:
Example 1. Put the following matrix A into reduced row-echelon form, and use it to find all solutions of Ax = 0:
A = [ 1 1 5 2 −1; 0 1 3 −1 1; 2 0 4 1 1 ].
Solution. We first put the matrix into row-echelon form. The first column is the pivot column, and it already has a 1 in the pivot position, so we want to put zeros in the positions below it:
[ 1 1 5 2 −1; 0 1 3 −1 1; 2 0 4 1 1 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 −2 −6 −3 3 ]   (add −2R1 to R3)
Now we consider the (2 × 4)-matrix beginning with the column [ 1; −2 ]. We have a 1 in the pivot position, so we put a zero in the position below:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 −2 −6 −3 3 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 −5 5 ]   (add 2R2 to R3)
Finally, we consider the bottom row, and we want a leading 1 where −5 is:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 −5 5 ] ∼ [ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 1 −1 ]   (multiply R3 by −1/5)
This is in row-echelon form. For reduced row-echelon form, we want to put zeros above each leading 1. We will move a little more quickly:
[ 1 1 5 2 −1; 0 1 3 −1 1; 0 0 0 1 −1 ] ∼ [ 1 1 5 2 −1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add R3 to R2)
∼ [ 1 1 5 0 1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add −2R3 to R1)
∼ [ 1 0 2 0 1; 0 1 3 0 0; 0 0 0 1 −1 ]   (add −R2 to R1)
The last matrix is rref(A). From this we conclude that x3 = s and x5 = t are free variables, and we can use them to find the other variables:
x1 + 2x3 + x5 = 0   ⇒   x1 = −2s − t
x2 + 3x3 = 0   ⇒   x2 = −3s
x4 − x5 = 0   ⇒   x4 = t.
Notice that we did not need to use back substitution to solve for the leading variables x1 , x2 , and x4 .   □
Remark 1. The RREF is more convenient than the REF for finding solutions since
it does not require back substitution. However, for very large systems, the REF plus
back substitution is computationally more efficient than using extra ERO’s to convert
the REF to RREF.
It is often useful to express solutions of linear systems in vector form, in which free
parameters appear as multiplicative factors for fixed vectors. This is most conveniently
done using column vectors. We illustrate this with two examples.
Example 2. Express the solutions in vector form for Example 1.
Solution. We found above that the solutions are (x1 , x2 , x3 , x4 , x5 ) = (−2s − t, −3s, s, t, t)
where s and t are free parameters, but we want to express the solution in the vector
form sv1 + tv2 . To find the vectors v1 and v2 we separate the s and t parts:
x = [ −2s−t; −3s; s; t; t ] = s [ −2; −3; 1; 0; 0 ] + t [ −1; 0; 0; 1; 1 ] = s v1 + t v2 .
We can also express solutions of nonhomogeneous systems in vector form, but now
there will be a fixed vector v0 that has no free parameters as multiplicative factors.
Example 3. Express the solutions in vector form for Example 5 of Section 4.2.
Solution. We found (x1 , x2 , x3 , x4 ) = (−1 − 2s − t, 2 + s, s, t), so
x = [ −1−2s−t; 2+s; s; t ] = [ −1; 2; 0; 0 ] + s [ −2; 1; 1; 0 ] + t [ −1; 0; 0; 1 ] = v0 + s v1 + t v2 .   □
The Rank of a Matrix
Definition 1. The rank of an (m × n)-matrix is the number of nonzero rows in its
reduced row-echelon form.
It is clear that rank(A) ≤ m. Moreover, since each nonzero row in rref(A) contains
exactly one leading 1, we see that rank(A) is also the number of leading variables; in
particular, rank(A) cannot exceed the total number of variables, so rank(A)≤ n.
Example 4. Find the rank of the following matrices
A = [ −1 1 2; 3 2 4; 2 3 7 ]        B = [ −1 1 2; 3 2 4; 2 3 6 ].        (4.12)
Solution. We put A into reduced row-echelon form (moving still more quickly):
[ −1 1 2; 3 2 4; 2 3 7 ] ∼ [ 1 −1 −2; 0 5 10; 0 5 11 ] ∼ [ 1 −1 −2; 0 1 2; 0 0 1 ] ∼ [ 1 0 0; 0 1 0; 0 0 1 ].
So we conclude that rank(A) = 3. Similarly, we put B into reduced row-echelon form:
[ −1 1 2; 3 2 4; 2 3 6 ] ∼ [ 1 −1 −2; 0 5 10; 0 5 10 ] ∼ [ 1 −1 −2; 0 1 2; 0 0 0 ] ∼ [ 1 0 0; 0 1 2; 0 0 0 ].
We conclude that rank(B) = 2.   □
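For reference, a quick software check of Example 4 (SymPy assumed; rank() simply counts the nonzero rows of the internally computed RREF):

from sympy import Matrix

A = Matrix([[-1, 1, 2], [3, 2, 4], [2, 3, 7]])
B = Matrix([[-1, 1, 2], [3, 2, 4], [2, 3, 6]])

print(A.rank(), B.rank())   # expect 3 and 2, as found above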
Remark 2. Since all row-echelon forms of a matrix have the same number of zero rows,
the rank is actually equal to the number of nonzero rows in any of its row-echelon forms.
The rank of the (m × n)-matrix A has important implications for the solvability
of a linear system Ax = c. If rank (A) = m then rref(A) has no zero rows; therefore
the RREF of the augmented coefficient matrix has no rows of the form (0 0 · · · 0 1),
so Ax = c is solvable (no matter what c is!). If rank (A) = n then there is a leading
1 in each column, which means there are no free variables and there is at most one
solution. On the other hand, if rank (A) < n, then there is at least one free variable;
if the system is consistent, there will be infinitely many solutions, but if inconsistent,
then no solutions. We summarize this discussion in the following:
Theorem 2. Consider a system of m linear equations in n unknowns and let A be the
coefficient matrix.
(a) We always have rank (A) ≤ m and rank (A) ≤ n;
(b) If rank (A) = m then the system is consistent and has at least one solution;
(c) If rank (A) = n then the system has at most one solution;
(d) If rank (A) < n then the system has either infinitely many solutions or none.
This theorem is of more theoretical than practical significance: for a given linear system one should use Gaussian elimination (or Gauss-Jordan elimination) to obtain the
solution set, which provides more information than the conclusions of the theorem.
Nevertheless, the theorem yields some useful observations.
Corollary 1. A linear system with fewer equations than unknowns either has infinitely
many solutions or none.
A homogeneous linear system always admits the trivial solution, so we can sharpen the
conclusions in this case.
Corollary 2. Suppose A is the coefficient matrix for a homogeneous linear system with
n unknowns. If rank(A) = n then 0 is the only solution, but if rank(A) < n then there
are infinitely many solutions.
Finally, let us consider the special case of a linear system with the same number of
equations and unknowns; in this case the coefficient matrix A is a square matrix.
Corollary 3. If A is an (n × n)-matrix, then the linear system Ax = c has a unique
solution x for every choice of c if and only if rank(A) = n.
This last corollary will be very useful in the next section when we consider the inverse
of a square matrix.
As we observed, in practice one simply uses the RREF to determine solvability, so
let us do an example, using it to illustrate the theoretical results above.
Example 5. Determine the solution set for the linear system
x1 + 2x2 + 3x3 = −2
2x1 + 5x2 + 5x3 = −3
x1 + 3x2 + 2x3 = −1.
Solution. We put the augmented coefficient matrix into RREF:
[ 1 2 3 −2; 2 5 5 −3; 1 3 2 −1 ] ∼ [ 1 2 3 −2; 0 1 −1 1; 0 1 −1 1 ] ∼ [ 1 0 5 −4; 0 1 −1 1; 0 0 0 0 ].
We see that the coefficient matrix A has rank 2, but this system is consistent. We let
x3 = t be a free variable and the solution set is (x1 , x2 , x3 ) = (−4 − 5t, 1 + t, t). (Note
that, for other values of c, the system Ax = c may be inconsistent!)
2
Exercises
1. The following matrices are in row-echelon form; find the reduced row-echelon form:
(a) [ 1 2; 0 1 ]        (b) [ 1 2 3; 0 1 4; 0 0 0 ]        (c) [ 1 2 3 4; 0 0 1 0; 0 0 0 1 ]        (d) [ 1 −1 3 0; 0 1 2 0; 0 0 1 −1 ]
2. For the following linear systems, put the augmented coefficient matrix into reduced
row-echelon form, and use this to find the solution set:
(a)
x1 + 2x2 + x3 = 1
3x1 + 5x2 − x3 = 14
(b)
3x1 + 5x2 + x3 = 0
x1 + 2x2 + x3 = 3
2x1 + 6x2 + 7x3 = 10
4x1 − 2x2 − 3x3 + x4 = 3
(c)
2x1 + 5x2 + 6x3 = 2
(d)
2x1 − 2x2 − 5x3 = −10
4x1 + x2 + 2x3 + x4 = 17
3x1 + x3 + x4 = 12
(e)
x1 − 2x2 + 3x3 + 2x4 + x5 = 10
2x1 − 4x2 + 8x3 + 3x4 + 10x5 = 7
3x1 − 6x2 + 10x3 + 6x4 + 5x5 = 27
3x1 + x2 + x3 + 6x4 = 14
x1 − 2x2 + 5x3 − 5x4 = −7
4x1 + x2 + 2x3 + 7x4 = 17
Solution
(f) x1 + x2 + x3 = 6
2x1 − 2x2 − 5x3 = −13
3x1 + x3 + x4 = 13
4x1 − 2x2 − 3x3 = −3
3. For the systems in Exercise 2, find the solution set in vector form. Sol’n (d)
4. Determine the solutions in vector form for Ax = 0.




2 1 5
1 −1 0 −1
(a) A = −1 1 −1 , Sol’n
(b) A = 2 1 3 7  ,
1 1 3
3 −2 1 0


0 1 0
1
1 0 −1 0 
1 0 −1 2 7


(c) A = 
,
(d) A =
.
0 0 1 −1
0 1 2 −3 4
1 1 0
0
5. Determine the rank of the following matrices:

0
−3 12
2 3
, (c) −1
(a)
Sol’n , (b)
2 −8
−1 4
0


1 2
−1
0 1 , (d)  1
0 −1
1
−2
3
−3

−2
1
7
4.4 Inverse of a Square Matrix
In this section we study the invertibility of a square matrix: if A is an (n × n)-matrix,
then we say that A is invertible if there exists an (n × n)-matrix B such that
A B = I   and   B A = I,        (4.13)
where I is the (n × n) identity matrix. We shall explain shortly why we are interested in
the invertibility of square matrices, but first let us observe that the matrix B in (4.13)
is unique:
Proposition 1. If A is an (n × n)-matrix and two (n × n)-matrices B and C satisfy
A B = I = B A and A C = I = C A, then B = C.
Proof. Since A C = I, we can multiply on the left by B to conclude B (A C) = B I = B.
But B(A C) = (BA) C = I C = C, so B = C.
2
This uniqueness enables us to introduce the following notation and terminology.
Definition 1. If A is an invertible (n × n)-matrix then we denote by A−1 the unique
(n × n)-matrix satisfying
A A−1 = I = A−1 A.
We call A−1 the inverse matrix (or just the inverse) of A.
Of course, the notation A−1 is reminiscent of a−1 , which is the inverse of the nonzero
real number a. And just like with real numbers, not all square matrices are invertible:
certainly the zero matrix 0 (i.e. with all elements equal to 0) is not invertible, but also
the (2 × 2)-matrix
A = [ a 0; 0 0 ],   where a is any real number,
is not invertible (cf. Exercise 1(a)). Matrices that are not invertible are called singular.
Let us observe that inverses have some interesting properties:
Proposition 2. If A and B are invertible matrices of the same size, then
(a) A−1 is invertible and (A−1 )−1 = A;
(b) AB is invertible and (AB)−1 = B−1 A−1 .
Proof. (a) The formula AA−1 = I = A−1 A actually shows that A−1 is invertible
and (A−1 )−1 = A. (b) We just observe (AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIA−1 =
AA−1 = I and similarly (B−1 A−1 )(AB) = I.
2
Now let us explain our primary interest in the inverse of a matrix. Suppose we have
a linear system of n equations in n unknowns that we write as
A x = b,
(4.14)
where A is the coefficient matrix. If A is invertible, then we can solve (4.14) simply by
letting x = A−1 b :
x = A−1 b   ⇒   A x = A (A−1 b) = (A A−1 ) b = I b = b.
Solving Ax=b.
If A is an invertible (n × n)-matrix, then we can solve (4.14) by letting
x = A−1 b.
Consequently, we would like to find a method for calculating A−1 .
When A is a (2×2)-matrix, the construction of A−1 is provided by a simple formula:
Inverting a (2 × 2)-matrix.
Recall from Section 2.2 the determinant of a (2 × 2)-matrix:
A = [ a b; c d ]   ⇒   det(A) = ad − bc.
If det(A) ≠ 0, then A is invertible and
A−1 = (1/det(A)) [ d −b; −c a ].
If det(A) = 0, then A is not invertible.
To verify that the above formula for A−1 works when det(A) ≠ 0, just compute AA−1
and A−1 A. When det(A) = 0, then A is row-equivalent to a matrix with a row of
zeros; but then b can be chosen so that the row operations performed on the augmented
coefficient matrix will be inconsistent, so (4.14) is not solvable, and A is not invertible.
Example 1. Find the inverses for the following matrices:
a) A = [ 1 2; 3 4 ]        b) B = [ 2 3; 4 6 ]
Solution. det(A) = 4 − 6 = −2, so A is invertible and
A−1 = −(1/2) [ 4 −2; −3 1 ] = [ −2 1; 3/2 −1/2 ].
On the other hand, det(B) = 12 − 12 = 0, so B is not invertible.
Example 2. Solve the following linear systems
(a) 2x1 + 7x2 = 1,  x1 + 3x2 = −1        (b) 2x1 + 7x2 = 2,  x1 + 3x2 = 0.
Solution. Both systems (a) and (b) have the same coefficient matrix
A = [ 2 7; 1 3 ].
We calculate det(A) = −1 and
A−1 = −[ 3 −7; −1 2 ] = [ −3 7; 1 −2 ].
We can now use A−1 to solve both (a) and (b):
(a) x = [ −3 7; 1 −2 ][ 1; −1 ] = [ −10; 3 ]
(b) x = [ −3 7; 1 −2 ][ 2; 0 ] = [ −6; 2 ].   □
We now turn to the general case of trying to invert an n × n-matrix. We first observe
that the invertibility of A is determined by its rank.
Theorem 1. An (n × n)-matrix A is invertible if and only if rank(A) = n.
The proof of this theorem actually provides a method for computing A−1 , but we need
some notation. For j = 1, . . . , n, let ej denote the jth column vector of I:
I = ( e1 · · · en )   where   e1 = [ 1; 0; . . . ; 0 ],   e2 = [ 0; 1; . . . ; 0 ],   etc.
Proof. ⇒: If A−1 exists, then for any b we can solve Ax = b uniquely by x=A−1 b.
But we know from Theorem 2 in Section 4.3 that a unique solution of Ax = b requires
rank(A) = n. So A being invertible implies rank(A) = n.
⇐: We assume rank(A) = n, and we want to construct A−1 . For j = 1, . . . , n, let xj
be the unique solution of Axj = ej , and let X denote the matrix with column vectors
xj . Then we compute
AX = ( Ax1 Ax2 · · · Axn ) = ( e1 e2 · · · en ) = I.
We have shown AX = I, but in order to conclude X = A−1 we also need to show
XA = I. To do this, we use a little trick: multiply AX = I on the right by A to obtain
AXA = IA = A   ⇔   A(XA − I) = 0.
Now this last equation says that every column vector y in XA-I satisfies Ay = 0. But
using Corollary 2 of Section 4.3, we know that Ay = 0 only has the trivial solution
y = 0, so we conclude that every column vector in XA-I is zero, i.e. XA = I. We
indeed conclude that X = A−1 .
2
Now let us use the proof of Theorem 1 to obtain an algorithm for finding the inverse
of a square matrix. To be specific, let us take n = 3. The matrix X has column vectors
x1 , x2 , x3 found by solving Ax1 = e1 , Ax2 = e2 , Ax3 = e3 . To find x1 we use
Gauss-Jordan elimination:
[ a11 a12 a13 1; a21 a22 a23 0; a31 a32 a33 0 ] ∼ [ 1 0 0 x11; 0 1 0 x21; 0 0 1 x31 ]   for some values x11 , x21 , x31
and then we let
x1 = [ x11; x21; x31 ].
We then do the same thing with Ax2 = e2 and Ax3 = e3 to find x2 and x3 . But in each case, the row operations are the same, i.e. to transform A to I, so we might as well do all three at once:
[ a11 a12 a13 1 0 0; a21 a22 a23 0 1 0; a31 a32 a33 0 0 1 ] ∼ [ 1 0 0 x11 x12 x13; 0 1 0 x21 x22 x23; 0 0 1 x31 x32 x33 ].
Then we let X = A−1 be the (3 × 3)-matrix on the right. This process is called the
Gauss-Jordan method for inverting a matrix.
Gauss-Jordan Method for Inverting a Matrix.
If A and X are (n × n)-matrices such that
( A | I ) ∼ ( I | X ),
then A is invertible and X = A−1 .
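The boxed method can be sketched directly in code: form the augmented matrix (A | I), run Gauss-Jordan elimination, and read off X from the right half. The following minimal Python sketch is my own illustration (it uses floating-point arithmetic and raises an error when A is singular), not an implementation from the text:

def gauss_jordan_inverse(A):
    """Invert a square matrix by row-reducing (A | I) to (I | X)."""
    n = len(A)
    # build the augmented matrix (A | I)
    M = [list(map(float, A[i])) + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]               # interchange rows if needed
        p = M[col][col]
        M[col] = [x / p for x in M[col]]              # make the pivot a 1
        for r in range(n):                            # clear the rest of the column
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                     # the right half is the inverse

# Example 3 below: expect approximately [[8, -29, 3], [-5, 19, -2], [2, -8, 1]].
print(gauss_jordan_inverse([[3, 5, 1], [1, 2, 1], [2, 6, 7]]))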
Example 3. Use the Gauss-Jordan method to determine whether the following matrix is invertible, and if so find A−1 :
A = [ 3 5 1; 1 2 1; 2 6 7 ].
Solution. We apply the Gauss-Jordan method:
[ 3 5 1 | 1 0 0; 1 2 1 | 0 1 0; 2 6 7 | 0 0 1 ] ∼ [ 1 2 1 | 0 1 0; 3 5 1 | 1 0 0; 2 6 7 | 0 0 1 ]
∼ [ 1 2 1 | 0 1 0; 0 −1 −2 | 1 −3 0; 0 2 5 | 0 −2 1 ]
∼ [ 1 2 1 | 0 1 0; 0 1 2 | −1 3 0; 0 0 1 | 2 −8 1 ]
∼ [ 1 2 0 | −2 9 −1; 0 1 0 | −5 19 −2; 0 0 1 | 2 −8 1 ]
∼ [ 1 0 0 | 8 −29 3; 0 1 0 | −5 19 −2; 0 0 1 | 2 −8 1 ]
Since we were able to transform A into I by ERO’s, we conclude A is invertible and
A−1 = [ 8 −29 3; −5 19 −2; 2 −8 1 ].
Of course, we can check our work by making sure that A−1 A = I.   □
Now let us illustrate the use of the inverse matrix to solve a linear system.
Example 4. Use the inverse of the coefficient matrix to solve
3 x1 + 5 x2 + x3 = 1
x1 + 2 x2 + x3 = 0
2 x1 + 6 x2 + 7 x3 = −1.
Solution. If we write the above linear system in the form Ax = b, then we see that
A = [ 3 5 1; 1 2 1; 2 6 7 ],        b = [ 1; 0; −1 ].
However, we computed A−1 in the previous example, so we can calculate x = A−1 b:
x = [ 8 −29 3; −5 19 −2; 2 −8 1 ][ 1; 0; −1 ] = [ 5; −3; 1 ].
In other words, the solution is x1 = 5, x2 = −3, x3 = 1.   □
Remark 1. Of course, instead of computing A−1 , we can solve (4.14) using Gaussian
elimination, which is computationally more efficient. However, the advantage of finding
A−1 is that we can use it repeatedly if we want to solve (4.14) for several values of b.
This was illustrated in Example 2.
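This remark is easy to see with NumPy (an assumed dependency). Computing A−1 once lets it be reused for several right-hand sides, as in Example 2; for a single right-hand side, np.linalg.solve (Gaussian elimination) is normally preferred:

import numpy as np

A = np.array([[2.0, 7.0], [1.0, 3.0]])     # coefficient matrix from Example 2
A_inv = np.linalg.inv(A)                   # computed once

for b in (np.array([1.0, -1.0]), np.array([2.0, 0.0])):
    print(A_inv @ b)                       # expect (-10, 3) and (-6, 2)

# For one right-hand side, solving directly is more efficient:
print(np.linalg.solve(A, np.array([1.0, -1.0])))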
Invertibility Conditions
Let us gather together various equivalent conditions for the invertibility of a square
matrix.
Invertibility Conditions.
For an (n × n)-matrix A, the following are equivalent.
1. A is invertible,
2. A is row-equivalent to I,
3. rank(A)=n,
4. Ax = 0 implies x = 0,
5. for any n-vector b there exists a unique solution of Ax = b,
6. det(A) ≠ 0.
The equivalence of Conditions 1 and 2 follows from the Gauss-Jordan method and the
equivalence of 1 and 3 is just Theorem 1. The equivalences with Conditions 4 and 5 are
not difficult to prove; see Exercise 3. For completeness, we have included Condition 6
involving the determinant, which we shall discuss in the following section.
Exercises
1. (a) For any number a, let A = [ a 0; 0 0 ]. Show that there is no (2 × 2)-matrix C so that AC = I.
(b) For any number b, let B = [ 0 b; 0 0 ]. Show that there is no (2 × 2)-matrix C so that BC = I.
2. If A is invertible and k is a positive integer, show that Ak is invertible and
(Ak )−1 = (A−1 )k .
3. (a) Show that Invertibility Conditions 3 and 4 are equivalent.
(b) Show that Invertibility Conditions 4 and 5 are equivalent.
Hint: are there free variables?
4. Find the inverses of the following matrices (if they exist):
1 −1
3 2
3 −2
(a)
,
(b)
,
(c)
,
1 2
4 3
−6 4






2 7 3
1 2 −3
1 5 1
(d) 2 5 0, (e) 1 3 2, (f)  2 6 −2 Sol’n
3 7 9
−1 1 4
2 7 1
5. Find the inverses of the
3 7
(a) A =
,
2 5

1 3
(c) A = 2 8
3 10
following matrices A and use them to solve Ax = b:
−1
3 2
5
b=
Sol’n ,
(b) A =
, b=
,
3
5 4
6

 


 
2
1
1 4 3
6
3 , b = 1,
(d) A = 1 4 5 , b = 0.
6
2
2 5 1
6
6. Write the following systems as Ax = b, then find A−1 and the solution x:
(a)
5x1 + 12x2 = 5
7x1 + 17x2 = 5
Sol’n
(b)
x1 + 4x2 + 13x3 = 5
3x1 + 2x2 + 12x3 = −1
7x1 + 9x2 = 3
5x1 + 7x2 = 2
2x1 + x2 + 3x3 = 3
x1 + x2 + 5x3 = 5
(c)
.
(d)
x1 − x2 + 2x3 = 6
3x1 + 3x2 + 5x3 = 9
4.5 The Determinant of a Square Matrix
For an (n × n)-matrix A we want to assign a number det(A) that can be used to
determine whether A is invertible. Recall the definition of det(A) for n = 2,
det [ a b; c d ] = ad − bc,        (4.15)
and the fact that Lemma 1 in Section 2.2 showed A is invertible if and only if det(A) ≠ 0.
The goal of this section is to achieve this for n > 2.
Before we give the definition of det(A) for general n ≥ 2, let us list some of the
properties that we want to hold. The first three pertain to the effects that elementary
row operations have upon the determinant; the fourth refers to matrices in upper
triangular form, i.e. having all zeros below the main diagonal:
[ a11 a12 a13 · · · a1n
   0  a22 a23 · · · a2n
   0   0  a33 · · · a3n
   ⋮    ⋮    ⋮   ⋱   ⋮
   0   0   0  · · · ann ]
A matrix with all zeros above the main diagonal is in lower triangular form.
Desired Properties of the Determinant. Let A and B be (n × n)-matrices.
P1: If B is obtained from A by interchanging two rows, then det(B) = −det(A).
P2: If B is obtained from A by multiplying one row by k then det(B) = k det(A).
P3: If B is obtained from A by adding a multiple of one row to another row
(leaving the first row unchanged), then det(B) = det(A).
P4: If A is an upper (or lower) triangular matrix, then det(A)= a11 a22 · · · ann .
It is easy to see that these properties are satisfied by the definition (4.15) for (2 × 2)matrices. Once we give the definition of det(A) for (n×n)-matrices, we shall show these
properties hold in general, but for now let us show that they imply the invertibility
condition that we mentioned in the previous section:
Theorem 1. If A is a square matrix, then A is invertible if and only if det(A) ≠ 0.
Proof. We already know that A is invertible if and only if Gaussian elimination makes
A ∼ I. But P1-P3 show that ERO’s cannot change a nonzero determinant to a zero
determinant. (Recall that multiplication of a row by k is an ERO only if k is nonzero.)
Since P4 implies det(I) = 1 ≠ 0, we conclude that det(A) ≠ 0 is equivalent to A being
invertible.
2
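Properties P1-P3 and P4 also give an efficient way to compute determinants: reduce the matrix toward triangular form while tracking row interchanges and the pivots. The following floating-point sketch is my own illustration of that idea (the matrix of Example 1 below is used as a check):

def det_by_elimination(M):
    """Compute det(M) by Gaussian elimination, using properties P1-P4."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    det = 1.0
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return 0.0                     # no pivot in this column: the matrix is singular
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            det = -det                     # P1: a row interchange flips the sign
        det *= A[col][col]                 # P4: each pivot becomes a diagonal factor
        p = A[col][col]
        for r in range(col + 1, n):        # P3: these operations do not change the determinant
            f = A[r][col] / p
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return det

print(det_by_elimination([[2, 1, 3], [-1, 2, 6], [4, 1, 12]]))   # expect 45.0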
Remark 1. Sometimes it is convenient to write |A| instead of det(A), but we must
remember that |A| could be negative!
Before turning to the formal definition, let us use the above properties to actually
compute the determinant of some square matrices.
Example 1. Calculate the determinant of
A = [ 2 1 3; −1 2 6; 4 1 12 ].
Solution. We use ERO’s with P1-P3 to put the matrix into upper triangular form and then use P4 to calculate the determinant:
det[ 2 1 3; −1 2 6; 4 1 12 ] = −det[ −1 2 6; 2 1 3; 4 1 12 ] = det[ 1 −2 −6; 2 1 3; 4 1 12 ]
   (apply P1 to R1 with R2; then apply P2 to R1 with k = −1)
= det[ 1 −2 −6; 0 5 15; 0 9 36 ]
   (apply P3 first to R1 and R2; then to R1 and R3)
= 45 det[ 1 −2 −6; 0 1 3; 0 1 4 ] = 45 det[ 1 −2 −6; 0 1 3; 0 0 1 ] = 45.
   (apply P2 to R2 and to R3; then apply P3 to R2 and R3; finally use P4)
Example 2. Calculate the determinant of
A = [ 0 1 −1 1; −1 0 1 1; 1 −1 0 1; −1 −1 −1 0 ].
Solution. Again we use ERO’s to put the matrix into upper triangular form so that we can use P4:
det[ 0 1 −1 1; −1 0 1 1; 1 −1 0 1; −1 −1 −1 0 ]
= det[ 1 0 −1 −1; 0 1 −1 1; 1 −1 0 1; −1 −1 −1 0 ]   (interchange R1 and R2, then multiply the new R1 by −1)
= det[ 1 0 −1 −1; 0 1 −1 1; 0 −1 1 2; 0 −1 −2 −1 ]   (add −R1 to R3 and R1 to R4)
= det[ 1 0 −1 −1; 0 1 −1 1; 0 0 0 3; 0 0 −3 0 ]   (add R2 to R3 and to R4)
= −det[ 1 0 −1 −1; 0 1 −1 1; 0 0 −3 0; 0 0 0 3 ] = −(1)(1)(−3)(3) = 9.   (interchange R3 and R4, then use P4)
Defining the Determinant using Permutations
A permutation p1 , . . . , pn of the integers 1, . . . , n is just a reordering of the ordered
n-tuple (1, . . . , n). For example, with n = 4, we could take (p1 , p2 , p3 , p4 ) = (2, 1, 3, 4)
or (p1 , p2 , p3 , p4 ) = (1, 2, 4, 3); these are both obtained by a simple interchange of
two elements of (1, 2, 3, 4). In general, every permutation can be realized as a sequence
of simple interchanges of the elements of (1, . . . , n), and it is easy to see that there
are n! permutations of (1, . . . , n). The permutation is called even or odd depending
on whether it requires an even or odd number of such interchanges. (Although a given
permutation can be realized in several different ways as sequences of simple interchanges,
the number of such interchanges is either even or odd; this fact is not entirely obvious,
but we shall not bother to give a formal proof.) For a given permutation (p1 , . . . , pn )
we define its sign by
σ(p1 , . . . , pn ) =   1,   if (p1 , . . . , pn ) is an even permutation of (1, . . . , n)
                       −1,  if (p1 , . . . , pn ) is an odd permutation of (1, . . . , n).        (4.16)
Note that a simple interchange of two elements of a permutation changes its sign. For
example, with n = 3, we have σ(1, 2, 3) = 1, σ(2, 1, 3) = −1, σ(2, 3, 1) = 1, etc.
Now suppose that we are given a square matrix A with elements aij . The product of
the elements on the diagonal is a11 a22 · · · ann ; notice that each row and each column of
A contributes a factor to this n-product. We want to consider other n-products to which
each row and column contributes a factor. Such a general n-product may be written
as a1p1 a2p2 · · · anpn where (p1 , . . . , pn ) is a permutation of (1, . . . , n). The determinant
of A is then defined by summing these n-products over all n! such permutations, and
using the sign of each permutation:
Definition 1. For an n × n-matrix A = (aij ), we define its determinant by
det(A) = Σ σ(p1 , p2 , . . . , pn ) a1p1 a2p2 · · · anpn ,
where the sum is taken over all n! permutations (p1 , p2 , . . . , pn ) of (1, 2, . . . , n).
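Definition 1 can be turned into a (deliberately naive) program: sum over all n! permutations, with the sign computed by counting inversions. The sketch below is mine, purely to make the definition concrete; it is far too slow for anything but small matrices.

from itertools import permutations

def sign(p):
    """Sign of a permutation: (-1) raised to the number of inversions."""
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    n = len(A)
    total = 0
    for p in permutations(range(n)):       # all n! permutations of the column indices
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]             # one factor from each row and each column
        total += term
    return total

print(det_by_permutations([[2, 1, 3], [-1, 2, 6], [4, 1, 12]]))   # expect 45, matching Example 1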
Let us verify that this definition coincides with (4.15) when n = 2. But for n = 2,
the only permutations of (1, 2) are (1, 2) itself and (2, 1). Moreover, σ(1, 2) = 1 and
σ(2, 1) = −1, so
det [ a11 a12; a21 a22 ] = a11 a22 − a12 a21 ,
which agrees with (4.15).
For n = 3, let us carry out the calculation of det(A) and obtain a way of remembering
the result. We have 3! = 6 permutations of (1, 2, 3) and their signs are
σ(1, 2, 3) = σ(2, 3, 1) = σ(3, 1, 2) = 1
σ(1, 3, 2) = σ(2, 1, 3) = σ(3, 2, 1) = −1.
We conclude that
det[ a11 a12 a13; a21 a22 a23; a31 a32 a33 ] = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a12 a21 a33 − a13 a22 a31 .        (4.17)
This result may be remembered as follows. First create a 3 × 5 matrix by repeating the
first two columns as columns four and five respectively:
a11 a12 a13 a11 a12
a21 a22 a23 a21 a22
a31 a32 a33 a31 a32
Then add together the 3-products on the three downward diagonals and subtract from
them the 3-products on the three upward diagonals. The result is exactly det(A).
Before we show that det indeed satisfies the desired properties P1-P4, let us add two
additional properties:
P5: If two rows of A are the same, then det(A) = 0.
P6: Suppose that A, B, and C are all (n × n)-matrices which are identical except
the ith row of A is the sum of the ith row of B and the ith row of C; then
det(A)=det(B)+det(C).
Theorem 2. The determinant, i.e. det, satisfies the properties P1-P6.
Proof. Interchanging two rows of a matrix affects the determinant by changing each
even permutation to an odd one and vice versa (see Exercise 3), so P1 holds. Since each
of the products a1p1 · · · anpn contains exactly one element of each row, multiplying one
row of a matrix by k multiplies the product a1p1 · · · anpn by k; thus P2 holds. Turning to
P5, if two rows of A are equal, then interchanging them does not affect A, but changes
the sign of det(A) according to P1; thus det(A)= −det(A), which implies det(A)=0.
Going on to P6, we suppose that the elements of A, B, and C are all ajk for j ≠ i and
aik = bik + cik   for k = 1, . . . , n.
Then
det(A) = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · aipi · · · anpn
       = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · (bipi + cipi ) · · · anpn
       = Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · bipi · · · anpn + Σ σ(p1 , . . . , pn ) a1p1 a2p2 · · · cipi · · · anpn
       = det(B) + det(C),
proving P6.
Finally, we prove P4. If A= (aij ) is upper triangular, then aij = 0 whenever i > j.
So the only nonzero terms
σ(p1 , . . . , pn ) a1p1 · · · anpn
occur when pi ≥ i. Since the pi must be distinct, the only possibility is pi = i for
i = 1, . . . , n. Thus the sum reduces to a single term:
det(A) = σ(1, 2, . . . , n)a11 a22 · · · ann = a11 a22 · · · ann .
This proves P4.
2
There is one more useful property of determinants that relates to the transpose that
was discussed at the end of Section 4.1:
P7: det(AT ) = det(A).
Proof of P7. We recall that aTij = aji , so
det(AT ) = Σ σ(p1 , p2 , . . . , pn ) ap1 1 ap2 2 · · · apn n .        (4.18)
But (p1 , p2 , . . . , pn ) is a permutation of (1, 2, . . . , n), so by rearranging the factors we
have
ap1 1 ap2 2 · · · apn n = a1q1 a2q2 · · · anqn ,
(4.19)
for some permutation (q1 , q2 , . . . , qn ) of (1, 2, . . . , n). Now we claim that
σ(p1 , p2 , . . . , pn ) = σ(q1 , q2 , . . . , qn ).
(4.20)
The reason (4.20) is true is that (4.19) implies the number of simple interchanges of
(1, 2, . . . , n) to achieve (p1 , p2 , . . . , pn ) must equal the number of simple interchanges of
(1, 2, . . . , n) to achieve
(q1 , q2 , . . . , qn ). But now plugging (4.19) and (4.20) into (4.18),
we find that det AT = det(A).
2
Remark 2. Using P7 it is clear that in properties P1-P3 and P5-P6 the word “row” may be replaced by the word “column,” and it explains why P4 holds for lower triangular matrices as well as upper triangular ones.
Exercises
1. Use properties P1-P4 to find the determinants of the following matrices:




3 1 −1
1 2 3
(b) 5 3 1  Solution
(a) 2 5 8 
2 2 0
3 7 13




0
1 −1 1
1 −1 0 4
−1 0
3
1 −1
1
2 4


(d) 
(c) 


−1 1
1 −1 0
1
3 2
−1 1 −1 0
2 −2 −2 2
2. Find the values of k for which the system has a nontrivial solution:
(a)
x1 + kx2 = 0
kx1 + 9x2 = 0
x1 + 2x2 + kx3 = 0
Solution
(b)
2x1 + kx2 + x3 = 0
−x1 − 2x2 + kx3 = 0
3. Calculate the following signs of permutations (with n = 4):
(a)
σ(1, 3, 2, 4),
4. Use properties

i
(a) 1
0
(b)
σ(3, 4, 1, 2),
(c)
σ(4, 3, 2, 1).
P1-P4 to find the determinants of the following


1+i
−1 0
−i −1 Solution
(b)  1
1
1
2i
complex matrices:

1 i
0 1
1 0
5. Another important property of determinants is det(AB) = det(A) det(B). By
direct calculation, verify this formula for any (2 × 2)−matrices A and B.
4.6 Cofactor Expansions
In this section we derive another method for calculating the determinant of a square
matrix. We also discuss another method for computing the inverse matrix. First we
introduce some terminology.
Definition 1. For any element aij of an (n × n)-matrix A, the minor Mij of aij is the
determinant of the (n − 1) × (n − 1) matrix obtained by deleting that row and column.
The cofactor Cij of aij is (−1)i+j Mij .
The signs of the cofactors alternate ±:
[ + − + · · ·
  − + − · · ·
  + − + · · ·
  ⋮  ⋮  ⋮  ⋱ ]
For example, if we consider the matrix
A = [ 1 2 3; 0 1 −1; 2 3 2 ],
we see that a23 = −1, its minor is
M23 = det[ 1 2; 2 3 ] = 3 − 4 = −1,
and its cofactor is
C23 = (−1)^(2+3) M23 = (−1)(−1) = 1.
If we choose any row of A, multiply each element by its cofactor and take the sum,
we obtain the cofactor expansion along that row. For example, the cofactor expansion
along the first row is
a11 C11 + a12 C12 + · · · + a1n C1n = a11 M11 − a12 M12 + · · · + (−1)^(1+n) a1n M1n .
We can also compute the cofactor expansion along any column of A. The significance
of cofactor expansions is that they may be used to compute the determinant.
Theorem 1. For an (n × n)-matrix A, the cofactor expansion along any row or
column is det(A).
We shall prove this theorem below, but first let us see it in action. To see how it works
for a (2 × 2)-matrix, let us select the first row for our cofactor expansion. The theorem
tells us
det [ a11 a12; a21 a22 ] = (−1)^(1+1) a11 M11 + (−1)^(1+2) a12 M12 = a11 M11 − a12 M12 .
But M11 = a22 and M12 = a21 so we obtain det(A) = a11 a22 − a12 a21 , which is the expected result. Similarly, let us use the first row for a (3 × 3)-matrix:
det[ a11 a12 a13; a21 a22 a23; a31 a32 a33 ] = a11 M11 − a12 M12 + a13 M13 ,
where
M11 = det[ a22 a23; a32 a33 ],   M12 = det[ a21 a23; a31 a33 ],   M13 = det[ a21 a22; a31 a32 ].
If we evaluate these 2 × 2 determinants, the result agrees with (4.17). Of course, we did
not have to use the first row for these cofactor expansions and in some cases it makes
more sense to choose other rows or columns.
Example 1. Calculate the determinant of the matrix
A = [ 1 2 3 2; 0 1 −1 0; 2 3 2 1; 1 0 1 0 ].
Solution. We first select a row or column for the cofactor expansion, and we may as well choose one that simplifies the calculation. Note that the second row has two zeros, so let us use that. (The last row or last column would work equally well.)
det[ 1 2 3 2; 0 1 −1 0; 2 3 2 1; 1 0 1 0 ] = 1·det[ 1 3 2; 2 2 1; 1 1 0 ] − (−1)·det[ 1 2 2; 2 3 1; 1 0 0 ]
= det[ 1 3 2; 2 2 1; 1 1 0 ] + det[ 1 2 2; 2 3 1; 1 0 0 ]
We now must evaluate the determinants of two (3 × 3)-matrices. For the first of these, let us use the last row for the cofactor expansion since it has a zero (although we might have chosen the last column for the same reason):
det[ 1 3 2; 2 2 1; 1 1 0 ] = det[ 3 2; 2 1 ] − det[ 1 2; 2 1 ] = (3 − 4) − (1 − 4) = 2.
We also use the last row to evaluate the determinant of the second 3 × 3 matrix:
det[ 1 2 2; 2 3 1; 1 0 0 ] = det[ 2 2; 3 1 ] = 2 − 6 = −4.
We conclude that det(A) = 2 − 4 = −2.   □
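Theorem 1 suggests a simple recursive program: expand along the first row, computing each minor by deleting a row and a column. The sketch below is mine, for illustration only; as discussed at the end of this section it is far too slow for large matrices.

def det_by_cofactors(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue                                       # zero entries contribute nothing
        minor = [row[:j] + row[j+1:] for row in A[1:]]     # delete row 1 and column j
        cofactor = (-1) ** j * det_by_cofactors(minor)     # (-1)^(1+j) with 0-based j
        total += A[0][j] * cofactor
    return total

# Example 1 above: expect -2.
print(det_by_cofactors([[1, 2, 3, 2], [0, 1, -1, 0], [2, 3, 2, 1], [1, 0, 1, 0]]))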
We can sometimes use elementary row operations to simplify the matrix before we
use a cofactor expansion.
Example 2. Calculate the determinant of the matrix
A = [ 1 2 3; −1 −1 −2; 2 3 4 ].
Solution. None of the rows or columns contain zeros, but we can use ERO’s to change this fact:
det[ 1 2 3; −1 −1 −2; 2 3 4 ] = det[ 1 2 3; 0 1 1; 0 −1 −2 ] = det[ 1 1; −1 −2 ] = −2 + 1 = −1.   □
Proof of Theorem 1. We first show that det(A) may be computed using a cofactor
expansion along the first row. Note that
det(A) = Σ σ(p1 , p2 , . . . , pn ) a1p1 a2p2 · · · anpn
       = a11 Σ_{pi ≠ 1} σ(1, p2 , . . . , pn ) a2p2 · · · anpn
       + a12 Σ_{pi ≠ 2} σ(2, p2 , . . . , pn ) a2p2 · · · anpn
         ⋮
       + a1n Σ_{pi ≠ n} σ(n, p2 , . . . , pn ) a2p2 · · · anpn        (4.21)
Now we observe that the (n − 1) × (n − 1)-matrix obtained by deleting the first row and column is (apq ) with p, q = 2, . . . , n, and its determinant is the minor for a11 :
M11 = Σ_{pi ≠ 1} σ(p2 , . . . , pn ) a2p2 · · · anpn = Σ_{pi ≠ 1} σ(1, p2 , . . . , pn ) a2p2 · · · anpn ,
since σ(1, p2 , . . . , pn ) = σ(p2 , . . . , pn ), which is the sign of the permutation (p2 , . . . , pn ) of (2, . . . , n). Similarly, the (n − 1) × (n − 1)-matrix obtained by deleting the first row and second column is (apq ) with p ≠ 1, q ≠ 2, and its determinant is
M12 = Σ_{pi ≠ 2} σ(p2 , . . . , pn ) a2p2 · · · anpn = − Σ_{pi ≠ 2} σ(2, p2 , . . . , pn ) a2p2 · · · anpn ,
since σ(2, p2 , . . . , pn ) = −σ(p2 , 2, p3 , . . . , pn ) = −σ(p2 , p3 , . . . , pn ) for any permutation
(p2 , . . . , pn ) of (1, 3, . . . , n). But recall that C12 = −M12 . Continuing in this fashion we
find that (4.21) is just the cofactor expansion along the first row:
det(A) = a11 C11 + a12 C12 + · · · + a1n C1n .
Now suppose that we want to use the second row for the cofactor expansion. Let
A0 denote the matrix obtained by switching the first and second rows of A. Then
the cofactor expansion along the first row of A0 , which we know from the above is
det(A0 ), is the negative of the cofactor expansion of A along its second row (since each
factor (−1)2+j has changed sign). But from Property P1 of determinants, we also have
det(A0 )= −det(A), so the cofactor expansion of A along its second row indeed yields
det(A). Expansion along other rows of A are treated analogously.
If we want to use a cofactor expansion along a column, we note that a column of A
is a row of AT and we have det(AT )=det(A).
2
Adjoint Method for Computing the Inverse of a Matrix
In Section 4.4 we used the Gauss-Jordan method to compute the inverse of a square
matrix. Here we discuss another method for computing the inverse that uses cofactors.
First we need some additional terminology:
Definition 2. If A is an n × n matrix with elements aij , let Cij denote the cofactor
for aij and C denote the cofactor matrix, i.e. C = (Cij ). Then the adjoint matrix
for A is the transpose of C:
Adj(A) = CT .
Now we show how to use the adjoint matrix to compute the inverse of a matrix.
Theorem 2. If A is a square matrix with det(A) ≠ 0, then
A−1 = (1/det(A)) Adj(A).        (4.22)
Let us confirm that this coincides with the formula in Section 4.4 for (2 × 2)-matrices:
A = [ a b; c d ]   ⇒   C = [ d −c; −b a ]   ⇒   Adj(A) = [ d −b; −c a ]
so we obtain the familiar formula
A−1 = (1/(ad − bc)) [ d −b; −c a ].
Now let us use (4.22) in an example.
Example 3. Find the inverse of the matrix
A = [ 1 2 3; −1 −1 −2; 2 3 4 ].
Solution. In Example 2 we computed det(A) = −1, so A is invertible. Let us compute the cofactors:
C11 = det[ −1 −2; 3 4 ] = 2,      C12 = −det[ −1 −2; 2 4 ] = 0,      C13 = det[ −1 −1; 2 3 ] = −1,
C21 = −det[ 2 3; 3 4 ] = 1,       C22 = det[ 1 3; 2 4 ] = −2,        C23 = −det[ 1 2; 2 3 ] = 1,
C31 = det[ 2 3; −1 −2 ] = −1,     C32 = −det[ 1 3; −1 −2 ] = −1,     C33 = det[ 1 2; −1 −1 ] = 1.
So
Adj(A) = CT = [ 2 1 −1; 0 −2 −1; −1 1 1 ]   ⇒   A−1 = [ −2 −1 1; 0 2 1; 1 −1 −1 ].
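Formula (4.22) is also available directly in SymPy (an assumed dependency), which makes it easy to check Example 3:

from sympy import Matrix

A = Matrix([[1, 2, 3], [-1, -1, -2], [2, 3, 4]])

print(A.det())                    # expect -1
print(A.adjugate())               # the transpose of the cofactor matrix, Adj(A)
print(A.adjugate() / A.det())     # formula (4.22); this should equal A.inv()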
Proof of Theorem 2. Theorem 1 states that, if we multiply the elements of any row
(or column) of A by their respective cofactors and take the sum, we obtain det(A):
ai1 Ci1 + · · · + ain Cin = det(A).        (4.23)
However, we now want to show that if we multiply the elements of any row by the
cofactors of a different row and take the sum, we get zero:
ai1 Cj1 + · · · + ain Cjn = 0   if i ≠ j.        (4.24)
To show this, let B denote the matrix obtained by adding the ith row to the jth row
of A. By Property P3, det(B)=det(A). If we take the cofactor expansion along the jth
row of B, we obtain
det(B) = Σ_{k=1}^{n} (ajk + aik ) Cjk = Σ_{k=1}^{n} ajk Cjk + Σ_{k=1}^{n} aik Cjk .
But the first sum on the right is just det(A), so we have
det(A) = det(A) + Σ_{k=1}^{n} aik Cjk ,
which proves (4.24). (A similar result holds for elements and cofactors in different
columns of A.)
Now let B= [det(A)]−1 Adj(A); to prove (4.22) we want to show AB = I = BA.
We compute the elements of AB using (4.23) and (4.24):
(AB)ij = Σ_{k=1}^{n} aik bkj = (1/det(A)) Σ_{k=1}^{n} aik Adj(A)kj = (1/det(A)) Σ_{k=1}^{n} aik Cjk = { 1, if i = j; 0, if i ≠ j }.
This shows that AB = I and a similar calculation shows BA = I.
2
Computational Efficiency
We now have two different methods for computing the determinant (and finding the
inverse) of an (n × n)-matrix: (a) using elementary row operations and properties of
the determinant as in Section 4.5, and (b) using cofactor expansions as in this section.
However, for large values of n, computing a determinant by method (b) requires many
more calculations than method (a). For example, if we compare the number of multiplications (which are computationally more demanding than additions) we find that
method (b) requires n! multiplications, while method (a) requires fewer than n3 . But
n! grows much more rapidly than n3 as n increases. For example, 20! ≈ 2.4 × 10^18, so to compute the determinant of a (20 × 20)-matrix by method (a) requires less than 4,000 multiplications, but by method (b) requires about 2.4 × 10^18 multiplications! Similarly, the
number of multiplications involved in using (4.22) to calculate the inverse of a matrix is
much larger than those involved in the Gauss-Jordan algorithm. For this reason, computational programs use ERO’s instead of cofactor expansions to compute determinants
and inverses of matrices.
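The comparison is dramatic even for modest n; a quick check using Python's standard math module:

import math

for n in (5, 10, 20):
    print(n, n**3, math.factorial(n))   # roughly n^3 multiplications for method (a) vs n! for method (b)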
Exercises
1. Use a cofactor expansion to compute the following determinants:
(a) det[ 1 2 3; 0 1 0; 4 5 6 ]        (b) det[ 1 2 3; 0 1 1; 4 5 6 ]
(c) det[ 0 1 −1; 2 0 −2; 4 2 1 ]        (d) det[ 1 2 3 4; 2 −1 1 0; 3 2 0 0; 4 3 0 0 ]
(e) det[ 2 0 3 1; 1 4 −2 3; 0 2 −1 0; 1 3 −2 4 ]        (f) det[ 1 0 −1 0; 0 1 0 −1; −1 0 −1 0; 0 1 0 1 ]
2. Use elementary row operations to simplify and then perform a cofactor expansion to evaluate the following determinants:
(a) det[ −1 1 2; 1 −1 −2; 2 −1 −2 ]        (b) det[ −1 1 2; 1 1 −2; 2 −1 −2 ]
(c) det[ 1 2 −1 3; 2 4 −1 6; 3 1 5 −1; 6 2 9 −2 ]        (d) det[ 2 1 4 2; 5 5 −3 7; 6 3 10 3; 4 2 −4 4 ]
3. Use Theorem 2 to find the inverses for the following matrices:
(a) [ 2 3 0; 2 1 5; 0 −1 2 ]        (b) [ 3 5 2; −2 3 −4; −5 0 5 ]
4. To illustrate the issue of computational efficiency, compute A−1 for the following (4 × 4)-matrix A in two different ways: a) using the Gauss-Jordan method, and b) using (4.22). Which did you find to be shorter?
A = [ 1 0 −1 0; 0 −1 1 0; 1 0 0 −1; 0 1 0 0 ]
4.7 Additional Exercises
1. The equations
a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
define two planes in R3 . Are the usual three cases (a unique solution, no solution,
or an infinite number of solutions) all possibilities? Give a geometric explanation.
2. The equations
a1 x + b1 y + c1 z = 0
a2 x + b2 y + c2 z = 0
3
define two planes in R . Give a geometric explanation for why there must be an
infinite number of solutions.
3. If A and B are (n×n)-matrices, is it always true that (A+B)(A−B) = A2 −B2 ?
4. If A and B are upper triangular matrices of the same size, prove that AB is upper
triangular.
5. If A and B are symmetric (n×n)-matrices, is it always true that AB is symmetric?
6. If A is any matrix (not necessarily square), prove that AAT is a symmetric matrix.
7. If A is invertible and symmetric, show that A−1 is also symmetric.
8. If A is symmetric and B is any matrix of the same size, show that BT AB is
symmetric.
9. A square matrix A is called skew-symmetric if AT = −A. Show that a skewsymmetric matrix must have zeros along its main diagonal.
10. If A is an (n×n)-matrix, its trace tr(A) is the sum of the main diagonal elements:
a11 +a22 +· · ·+ann . If B is also an (n×n)-matrix, show tr(A+B) = tr(A)+tr(B).
11. A square matrix A is called nilpotent if Ak is the zero matrix for some positive
integer k. Determine which of the following matrices are nilpotent.
A = [ 0 1; 0 0 ],        B = [ 1 0; 0 −1 ],
C = [ 0 1 2; 0 0 3; 0 0 0 ],        D = [ 1 2 3; 0 0 0; 0 0 0 ]
12. (a) If A = [ a b; c d ], show that A2 = (a + d)A − (ad − bc)I.
(b) Find a matrix A without ±1 on the main diagonal such that A2 = I.
(c) Find a matrix A with zeros on its main diagonal such that A2 = I.
(d) Find a matrix A with zeros on its main diagonal such that A2 = −I.
(e) Find a matrix A ≠ 0 and A ≠ I such that A2 = A.
Determine the solution set for the system of linear equations
x1 + 3x2 + 2x3 = 5
x1 + 3x2 + 2x3 = 5
13.
x1 − x2 + 3x3 = 3
14.
x1 − x2 + 3x3 = 3
3x1 + x2 + 8x3 = 11
3x1 + x2 + 8x3 = 0
3x1 + 2x2 − x4 − 2x5 = 0
2x1 + x2 + 4x4 = −1
3x1 + x2 + 5x4 = −1
15.
5x1 + 2x2 + x3 − 3x4 − x5 = 0
ix1 + x2 + (i − 1)x4 = 0
(2 − i)x1 − x2 + 2(1 − i)x3 = 2 + i
ix1 + x2 + 2ix3 = i
17.
2x1 + x2 − x4 − x5 = 0
16.
4x1 + x2 + x3 + 9x4 = −3
18.
ix1 + 2x2 + (i − 2)x4 = 0
−ix1 + x3 = 0
x1 + ix2 = 1 + 2i
Determine the values of k for which the system of linear equations a) has a unique
solution, b) has no solutions, and c) has an infinite number of solutions.
x1 − x2 = 2
19.
x1 − x2 + ix3 = 1
3x1 − x2 + x3 = 7
20.
x1 − 3x2 − k 2 x3 = −k
ix1 + (1 − i)x2 = −1 + i
ix1 − ix2 + k 2 x3 = k
Find the rank of the matrix A and find the solutions in vector form for Ax=b.

 
9
3 −3
6
4  , b = 0
21. A =  2 −2
0
−7 7 −14

 

1
4 2 1
1
3 3 2

 
23. A = 
3 3 3 , b = 1
1
3 2 1


0
22. A = 0
0

1
3
24. A = 
2
2
1
3
2
1
1
3
−3
2
1
0

1
2 ,
1
 
1
b = 0
0

 
2
0 1
8
−2 3
, b =  
3
1 2
9
−5 2
For a linear system Ax = b, let r = rank(A) and r# = rank(A|b).
25. If r < r# , show that the system is inconsistent and has no solution.
26. If r = r# , show that the system is consistent, and
(a) there is a unique solution if and only if r# = n,
(b) there are an infinite number of solutions if and only if r# < n.
Find the inverse of A and use it to solve the matrix equation AX = B.
27.

1
A = 2
1
5
1
7

1
−2 ,
2

1
B= 0
−1

0 1
3 0
0 1
28.

6
A = 5
3
5
3
4

3
2 ,
2

2
B = 1
1

−1 1
0 1
−1 2
For the next two problems, assume that A and B are both (n × n)-matrices.
29. If AB = I, show that both A and B are invertible and B = A−1 .
30. If AB is invertible, show that both A and B are invertible.
Chapter 5
Vector Spaces
5.1 Vectors in Rn
In this chapter we want to discuss vector spaces and the concepts associated with
them: subspaces, linear independence of vectors, bases, and dimension. We begin in
this section with the vector space Rn , which generalizes the familiar two- and three-dimensional cases, R2 and R3 .
Rn is the collection of all n-tuples of real numbers (x1 , . . . , xn ); the quantities
x1 , . . . , xn are called coordinates. When n = 2, the coordinates are generally labelled (x, y) and values for x and y represent the location of a point P in the plane;
when n = 3, the coordinates are generally labelled (x, y, z) and locate a point P in
3-dimensional space. For both n = 2 and n = 3, the coordinates of P may also be
considered as the components of the vector OP that may be visualized as an arrow
pointing from the origin O to the point P , i.e. the arrow has its “tail” at the origin and
its “head” at P (see Figures 1 and 2). As in Section 4.1, we shall generally represent
vectors using bold-face letters such as v, u, etc. In fact, the origin itself may be considered as a vector: for n = 2, we have 0 = (0, 0) and for n = 3, we have 0 = (0, 0, 0). For
n ≥ 4, it is not so easy to visualize (x1 , . . . , xn ) as a point or a vector, but we shall be
able to treat it in exactly the same way as for n = 2 or 3.
Definition 3. A vector in Rn is an n-tuple v of real numbers (v1 , . . . , vn ) called the
components of v, and we write v = (v1 , . . . , vn ). The zero vector is 0 = (0, . . . , 0).
Two important algebraic operations on vectors are addition and scalar multiplication. For n = 2 or n = 3, vector addition is defined geometrically using the parallelogram rule: u + v is the vector obtained by placing the tail of v at the head of u (or
vice versa); see Figure 3. However, the components of u + v are obtained algebraically
simply by adding the components. Similarly, a vector for n = 2 or 3 may be multiplied
by a real number r simply by multiplying component-wise. Generalizing this, we are
able to define vector addition and scalar multiplication in Rn as follows:
Fig.1. (x, y) as a point P and as a vector v in R2
Fig.2. (x, y, z) as a point P and as a vector v in R3
Fig.3. The parallelogram rule for vector addition
Vector Addition. If u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) are vectors in Rn , then
their sum is the vector u + v in Rn with components:
u + v = (u1 + v1 , . . . , un + vn ).
Scalar Multiplication. If u = (u1 , . . . , un ) is a vector in Rn and r is any real number
then ru is the vector in Rn with components:
ru = (ru1 , . . . , run ).
Note that real numbers are often called scalars to distinguish them from vectors.
It is not difficult to verify that vector addition and scalar multiplication in Rn satisfy
the following properties:
Rn as a Vector Space. If u, v, w are vectors in Rn and r, s are real numbers, then:
• u + v = v + u   (commutativity of vector addition)
• u + (v + w) = (u + v) + w   (associativity of vector addition)
• u + 0 = 0 + u   (zero element)
• u + (−u) = −u + u = 0   (additive inverse)
• 1u = u   (multiplicative identity)
• (rs)u = r(su)   (associativity of scalar multiplication)
• r(u + v) = ru + rv   (distributivity over vector addition)
• (r + s)u = ru + su   (distributivity over scalar addition)
As we shall see in the next section, these properties of Rn pertain to a more general
class of objects called “vector spaces.”
There is another important aspect of vectors in R2 and R3 that carries over to Rn ,
and that is the notion of the magnitude of a vector. Recall that the distance to the
origin of P = (x, y) in R^2 or Q = (x, y, z) in R^3 is given by the square root of the sum
of the squares of the coordinates:
dist(O, P) = √(x^2 + y^2)   or   dist(O, Q) = √(x^2 + y^2 + z^2).
But these quantities also represent the respective lengths of the vectors \overrightarrow{OP} and \overrightarrow{OQ}.
Consequently, the following definition is natural:
Definition 4. The magnitude or length of the vector v = (v1 , . . . , vn ) in R^n is
‖v‖ = √(v1^2 + · · · + vn^2).
If ‖v‖ = 1, then v is called a unit vector.
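The text itself does not use software, but readers who want to experiment can reproduce the component-wise operations and the norm with NumPy. The following is only an illustrative sketch; the particular vectors are arbitrary examples.

```python
import numpy as np

u = np.array([1.0, 3.0])               # a vector in R^2
v = np.array([-1.0, 4.0])

print(u + v)                           # component-wise addition: [0. 7.]
print(2 * u)                           # scalar multiplication: [2. 6.]

w = np.array([1.0, -1.0, 0.0, 2.0])    # a vector in R^4
norm_w = np.linalg.norm(w)             # sqrt(1 + 1 + 0 + 4) = sqrt(6)
unit_w = w / norm_w                    # a unit vector in the direction of w
print(norm_w, np.linalg.norm(unit_w))  # sqrt(6) and 1.0
```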
Finally, let us mention the special unit vectors that point in the direction of the
coordinate axes:
i = (1, 0) and j = (0, 1) in R^2,
and
i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) in R^3.
Fig.4. The unit vectors i and j in R^2
In Rn , we use a different notation:
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1)
The significance of these special unit vectors is that other vectors may be expressed in
terms of them. For example,
in R^2: v = (2, 3) ⇒ v = 2i + 3j,
in R^3: v = (1, −1, 5) ⇒ v = i − j + 5k.
Having a set of vectors that may be used to write other vectors is an important concept
that we shall return to later in this chapter.
Exercises
1. If u = (1, 3) and v = (−1, 4) are vectors in R2 , find the following vectors and
sketch them:
(a) 2u, (b) −v, (c) u + 2v.
2. If u = (2, 0, −1) and v = (−1, 4, 1) are vectors in R3 , find the following vectors
and sketch them:
(a) 2u, (b) −v, (c) u + 2v.
3. If u = (0, 2, 0, −1) and v = (−1, 1, 4, 1) are vectors in R4 , find the following
vectors:
(a) 2u, (b) −v, (c) u + 2v.
4. Find the length of the following vectors:
(a) u = (−1, 1), (b) v = (2, 0, 3), (c) w = (1, −1, 0, 2).
5. If u = (1, 0, −1, 2) and v = (−2, 3, 5, −1), find the length of w = u + v.
6. If v = (1, −1, 2), find a number r so that u = rv is a unit vector.
7. Express v = (2, −3) in terms of i and j.
8. Express v = (2, −3, 5) in terms of i, j, and k.
9. If u = (1, 0, −1) and v = (−2, 4, 6), find w so that 2u + v + 2w = 0.
Fig.5. The unit vectors i, j, and k in R^3
5.2 General Vector Spaces
In this section we give a general definition of a vector space V and discuss several
important examples. We shall also explore some additional properties of vector spaces
that follow from the definition. In the definition, the set of “scalars” is either the real
numbers R or the complex numbers C; if the former then we call V a real vector
space; if the latter then V is a complex vector space.
Definition of a Vector Space
Suppose V is a nonempty set, whose elements we call vectors, on which are defined
both vector addition and multiplication by scalars. We say that V (with its set of
scalars) is a vector space provided the following conditions hold:
Closure under Addition. If u and v are in V , then the sum u + v is in V .
Closure under Scalar Multiplication. If u is in V and r is a scalar, then ru is in V .
Commutativity of Addition. If u and v are in V , then u + v = v + u.
Associativity of Addition. If u, v, and w are in V , then (u + v) + w = u + (v + w).
Existence of Zero Vector. There is a vector 0 in V so that 0 + v = v for all v in
V.
Existence of Additive Inverses. For any v in V there exists a vector w in V such
that v + w = 0. (We usually write w = −v.)
Multiplicative Identity. The scalar 1 satisfies 1v = v for all v in V .
Associativity of Scalar Multiplication. If v is in V and r, s are scalars, then (rs)v =
r(sv).
Distributivity over Vector Addition. If u and v are in V and r is a scalar, then
r(u + v) = ru + rv.
Distributivity over Scalar Addition. If v is in V and r, s are scalars, then (r+s)v =
rv + sv.
This is a long list of conditions, but most of them are pretty intuitive. In fact, the first
two conditions may seem trivial, but they are the most critical conditions: we need to
know that we can add vectors and multiply by scalars and stay within the set V .
Let us now discuss several examples. In each example we need to identify both the
vectors V and the set of scalars, but usually the latter is understood.
Example 1. Let V = Rn . The fact that this is a real vector space was discussed in
the previous section.
2
Example 2. Let V = Cn . Of course, by Cn we mean n-tuples of complex numbers which can be added together component-wise, or multiplied by complex scalars
component-wise, exactly as we did in the previous example. We leave checking the
remaining conditions that V is a complex vector space as an exercise; see Exercise 1.
(Although it is not natural to do so, we could have taken V = Cn with real scalars and
obtain a “real vector space.”)
2
Example 3. Let M m×n (R) be the collection of (m × n)-matrices with real elements.
We claim this is a real vector space. Considering matrices as “vectors” may seem
strange, but we observed in Section 4.1 that we can add two (m × n)-matrices simply
by adding their corresponding elements, and we can multiply an (m × n)-matrix by a
scalar simply by multiplying element-wise. So V is closed under vector addition and
scalar multiplication; we leave checking the remaining conditions that V is a real vector
space as an exercise; see Exercise 2. (We also could have taken V = M m×n (C), i.e.
the collection of (m × n)-matrices with complex elements, to obtain a complex vector
space.)
2
The next two examples involve vector spaces of functions.
Example 4. Let F(I) denote the real-valued functions on an interval I. At first sight,
this seems a lot more complicated than the previous examples since the “vectors” are
now functions! But we know how to add two functions f and g together, simply by
adding their function values at each point:
(f + g)(x) = f (x) + g(x)
for any x in I.
And we can multiply any real-valued function f by a real number r, simply by multiplying the function value by r at each point:
(rf )(x) = rf (x)
for any x in I.
So V is closed under vector addition and scalar multiplication. Most of the other
conditions are rather easily verified, but let us observe that the “zero vector” is just the
function O(x) that is identically zero at each point of I, and the additive inverse of a
function f in V is just the function −f which takes the value −f (x) at each x in I. 2
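As an informal supplement to Example 4, the pointwise operations on F(I) can be mimicked in Python by building new functions from old ones; this sketch is not part of the text and is only meant to make the “functions as vectors” idea concrete.

```python
import math

# Pointwise operations mirroring (f + g)(x) = f(x) + g(x) and (r f)(x) = r f(x).
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(r, f):
    return lambda x: r * f(x)

f, g = math.sin, math.cos
h = add(scale(2.0, f), g)        # the "vector" 2f + g
print(h(0.0))                    # 2*sin(0) + cos(0) = 1.0

zero = lambda x: 0.0             # the zero "vector" O(x)
neg_f = scale(-1.0, f)           # the additive inverse -f
print(add(f, neg_f)(1.2))        # 0.0 at every point
```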
The next example shows that vector spaces of functions are relevant for differential
equations.
Example 5. The real-valued solutions of the second-order linear differential equation
y'' + p(x)y' + q(x)y = 0    (5.1)
form a real vector space V . To confirm this, recall from Chapter 2 that we can write
(5.1) as L y = 0, where L y = y'' + p(x)y' + q(x)y. Moreover, L is a linear operator, so
L(c1 y1 + c2 y2 ) = c1 L(y1 ) + c2 L(y2 ).    (5.2)
Now, to show that V is closed under vector addition, we let y1 and y2 be in V so that
L(y1 ) = 0 = L(y2 ). Then we use (5.2) with c1 = 1 = c2 to conclude
L(y1 + y2 ) = L(y1 ) + L(y2 ) = 0 + 0 = 0,
i.e. y1 + y2 is in V . Similarly, to show that V is closed under scalar multiplication, we
let r be a real number and y be in V , i.e. Ly = 0. Then r y is also in V since we can
use (5.2) with c1 = r, y1 = y, and c2 = 0 to conclude
L(r y) = r L y = r 0 = 0,
i.e. r y is in V . (We should be more precise about the differentiability properties of the
functions p and q, as well as the functions in the vector space V , but this would not
change the simple fact that solutions of (5.1) can be added together or multiplied by
scalars.)
2
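Example 5 can be illustrated symbolically for a particular choice of coefficients. The sketch below takes the special case p = 0, q = 1 (so the equation is y'' + y = 0, chosen only for illustration) and checks that an arbitrary linear combination of two solutions is again a solution.

```python
import sympy as sp

x, c1, c2 = sp.symbols('x c1 c2')
y1, y2 = sp.sin(x), sp.cos(x)          # two known solutions of y'' + y = 0

L = lambda y: sp.diff(y, x, 2) + y     # the operator L y = y'' + y
y = c1*y1 + c2*y2                      # an arbitrary linear combination
print(sp.simplify(L(y)))               # 0, so c1*y1 + c2*y2 solves the equation
```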
Additional Properties of Vector Spaces
Now let us consider some additional properties of vector spaces that follow from the
definition:
Theorem 1. Suppose V is a vector space. Then the following hold:
(a) 0 v = 0 for any vector v in V .
(b) r 0 = 0 for any scalar r.
(c) The zero vector is unique.
(d) For any vector v in V , its additive inverse is unique.
(e) For any vector v in V , its additive inverse is (−1)v.
(f ) If r v = 0 for some scalar r and v in V , then either r = 0 or v = 0.
Most of these properties, like 0 v = 0 or r 0 = 0, seem intuitively obvious because they
are trivial for V = Rn (and other examples). But we have introduced the definition of
a vector space in a general context, so we need to prove the properties in this context;
this means being rather pedantic in the proofs below. The benefit, however, is that once
we have shown that a property is true for a general vector space, we know it is true in
all instances, and we do not need to check it for each example.
Proof. (a) Let v be any vector in V . Multiplication of v by the scalar 0 is in V and
(since 0 = 0 + 0) we know by distributivity that 0v = (0 + 0)v = 0v + 0v. Moreover, the
additive inverse −0v for 0v is in V , so we can add it to both sides and use associativity
of addition to conclude 0 = 0v −0v = (0v +0v)−0v = 0v +(0v −0v) = 0v, as claimed.
(b) Let r be any scalar. The zero vector satisfies 0 = 0 + 0, so by distributivity we have
r0 = r0+r0. Now the additive inverse −r0 for r0 is in V , so we can add it to both sides
and use associativity to conclude 0 = −r0+r0 = −r0+(r0+r0) = (−r0+r0)+r0 = r0,
as claimed.
(c) Suppose there are two zero vectors, 01 and 02 , i.e. v + 01 = v for all v in V and
w + 02 = w for all w in V . Letting v = 02 and w = 01 , we obtain 02 + 01 = 02 and
01 + 02 = 01 . But by commutativity of addition, we know 02 + 01 = 01 + 02 , so we
conclude 01 = 02 , as desired.
(d) Suppose that v has two additive inverses w1 and w2 , i.e. v + w1 = 0 = v + w2 . But
w1 has an additive inverse −w1 which we can add to v + w1 = 0 to conclude v = −w1 .
Substitute this for v in 0 = v + w2 and conclude 0 = −w1 + w2 . Adding w1 to this
yields w1 = w2 , as desired.
(e) For any v in V , we use (a), 0 = 1 + (−1), and distributivity to conclude
0 = 0 v = (1 + (−1))v = v + (−1)v.
Now this shows that (−1)v is an additive inverse for v; by the uniqueness in (d) we
conclude that (−1)v is the additive inverse for v, i.e. −v = (−1)v.
(f) If r ≠ 0, then r^{-1} exists. Let us multiply rv = 0 by r^{-1} and use (b) to obtain
r^{-1}(rv) = r^{-1} 0 = 0. But r^{-1}(rv) = (r^{-1} r)v = v, so v = 0.
2
Exercises
1. Verify that V = Cn is a complex vector space.
2. Verify that V = M m×n (R) is a real vector space.
3. Determine whether the following sets are real vector spaces (and state what fails
if they are not):
(a) The set of all integers (. . . , −1, 0, 1, 2, . . . ), Solution
(b) The set of all rational numbers,
(c) The set of all nonnegative real numbers,
(d) The set of all upper triangular n × n matrices,
(e) The set of all upper triangular square matrices,
(f) The set of all polynomials of degree ≤ 2 (i.e. ax^2 + bx + c, a, b, c reals),
(g) The set of all polynomials of degree 2 (i.e. ax^2 + bx + c, a, b, c reals, a ≠ 0),
(h) The set of all solutions of the differential equation y'' + y = 0,*
(i) The set of all solutions of the differential equation y'' + y = sin x.*
* You need not solve the differential equation.
5.3 Subspaces and Spanning Sets
In this section we consider subsets of vector spaces that themselves form vector spaces.
Definition 1. Suppose that S is a nonempty subset of a vector space V . If S itself
is a vector space under the addition and scalar multiplication that is defined in V ,
then we say that S is a subspace of V .
Since S inherits its algebraic structure from V , the main issue is showing that S is
closed under vector addition and scalar multiplication. Thus we have the following.
Theorem 1. A nonempty subset S of a vector space V is itself a vector space if and
only if it is closed under vector addition and scalar multiplication.
Proof. If S is a vector space, by definition it is closed under vector addition and scalar
multiplication. Conversely, suppose S is closed under these operations; we must prove
that the remaining conditions of a vector space hold. But the various commutativity,
associativity, and distributivity conditions are inherited from V ; the multiplicative identity is also inherited, so we need only show that the zero vector and additive inverses
(which we know exist in V ) actually lie in S. But if we choose any vector v in S,
then 0 v and (−1) v are scalar multiples of v, so lie in S by hypothesis. However, by
Theorem 1 of Section 5.2, 0 v is the zero vector 0 and (−1) v is the additive inverse of
v. We conclude that S is indeed a vector space.
2
Let us discuss several examples. The first one has several variations within it.
Example 1. Let V = Rn and let a = (a1 , . . . , an ) be a nonzero vector. Let S be the
set of vectors v = (x1 , . . . , xn ) in Rn whose dot product with a is zero:
a · v = a1 x1 + · · · + an xn = 0.    (5.3)
We easily check that S is closed under addition: if v = (x1 , . . . , xn ) and w = (y1 , . . . , yn )
are in S, then v + w = (x1 + y1 , . . . , xn + yn ) and
a · (v + w) = a1 (x1 +y1 ) + · · · + an (xn +yn )
= (a1 x1 + · · · +an xn ) + (a1 y1 + · · · +an yn ) = 0 + 0 = 0
shows that v+w is in S. We also easily check that S is closed under scalar multiplication:
if v = (x1 , . . . , xn ) is in S and r is any scalar, then rv = (rx1 , . . . , rxn ) and
a · (rv) = a1 (r x1 ) + · · · + an (r xn ) = r(a1 x1 + · · · + an xn ) = r 0 = 0
shows that rv is in S. By Theorem 1, S is a subspace.
Let us consider the special case a = (0, . . . , 0, 1). Then (5.3) implies that xn = 0; in
other words, S is the set of vectors of the form v = (x1 , . . . , xn−1 , 0) where x1 , . . . , xn−1
are unrestricted. But this means that S can be identified with Rn−1 .
Now suppose that n = 3. For those familiar with multivariable calculus, (5.3) says
that S is the set of vectors perpendicular to the vector a. In other words, S is a two-dimensional plane in V = R^3 that passes through the origin (since S must contain the
zero vector).
2
Fig.1. S = R^{n−1} as a subspace in R^n
The next example shows that the solutions of a homogeneous linear system Ax = 0
form a subspace.
Example 2. Let A be an (m × n)-matrix and let S be the vectors x in Rn satisfying
Ax = 0. We easily verify that S is closed under addition: if x and y are in S, then
A(x + y) = Ax + Ay = 0 + 0 = 0,
so x + y is in S. We also easily verify that S is closed under scalar multiplication: if x is in
S and r is any scalar then
A(r x) = r Ax = r 0 = 0,
so rx is in S. We conclude that S is a subspace of Rn .
2
Example 2 is so important that it leads us to make the following definition.
Definition 2. If A is an (m × n)-matrix, then the solution space for Ax = 0 is a
subspace of Rn called the nullspace of A and is denoted N (A).
The next example concerns functions as vectors, as in Example 4 in Section 5.2. In
fact, let F denote the real-valued functions on (−∞, ∞), which is a real vector space.
Example 3. Let Pn denote the polynomials with real coefficients and degree ≤ n:
p(x) = a0 + a1 x + · · · + an x^n ,  where a0 , . . . , an are real numbers.
Then Pn is a subspace of F since the sum of two polynomials in Pn is also a polynomial
in Pn , and the scalar multiple of a polynomial in Pn is also in Pn .
2
Having given several examples of subspaces of vector spaces, let us consider some
subsets of R2 that fail to be subspaces (and the reason that they fail):
• The union of the x-coordinate axis and the y-coordinate axis (not closed under
vector addition);
• The upper half plane H+ = {(x, y) : y ≥ 0} (not closed under multiplication by
negative scalars);
• The points on the parabola y = x2 (not closed under either vector addition or
scalar multiplication).
Linear Combinations and Spanning Sets
There are another couple of important concepts concerning vector spaces and their
subspaces: linear combinations of vectors and spanning sets. To help explain these
concepts, let us consider a special case.
Suppose that v1 and v2 are two nonzero vectors in R3 . Let S be the vectors in R3
that can be written as v = c1 v1 + c2 v2 for some constants c1 , c2 . We see that S is
closed under addition since
v = c1 v1 + c2 v2 and w = d1 v1 + d2 v2  ⇒  v + w = (c1 + d1 )v1 + (c2 + d2 )v2 ,
and closed under scalar multiplication since
v = c1 v1 + c2 v2 and r is a scalar  ⇒  rv = (r c1 )v1 + (r c2 )v2 .
Consequently, S is a subspace of R3 . Note that S is generally a plane in R3 (see Figure
2); however, if v1 and v2 are co-linear, i.e. v1 = kv2 for some scalar k, then S is the
line in R3 containing v1 and v2 (see Figure 3). In any case, S passes through the origin.
Taking the sum of constant multiples of a given set of vectors as we did above is so
important that we give it a name:
Fig.2. Two nonzero vectors v1 , v2 in R^3
Definition 3. If v1 , . . . , vn are vectors in a vector space V , then a linear combination of these vectors is any vector of the form
v = c1 v1 + · · · + cn vn    (5.4)
where the c1 , . . . , cn are scalars.
The collection of all linear combinations (5.4) is called the linear span of
v1 , . . . , vn and is denoted by span(v1 , . . . , vn ). If every vector v in V can be
written as in (5.4), then we say that V is spanned by v1 , . . . , vn , or v1 , . . . , vn
is a spanning set for V .
Since span(v1 , . . . , vn ) is closed under vector addition and scalar multiplication, we have
the following:
Theorem 2. If v1 , . . . , vn are vectors in a vector space V , then span(v1 , . . . , vn ) is a
subspace of V .
In the particular case of two nonzero vectors v1 and v2 in R3 , we see that span(v1 ,v2 )
is either a line or a plane (depending on whether v1 and v2 are colinear or not). In
fact, if v1 and v2 are colinear, then span(v1 ,v2 ) = span(v1 ) = span(v2 ).
Fig.3. Two colinear vectors v1 , v2 in R^3
Clearly, whether
or not v1 and v2 are colinear makes a big difference to the number of vectors required
in a spanning set; this is related to the more general notions of “linear independence”
and “dimension” that we shall discuss in the next two sections.
For the rest of this section, we begin to address two important questions concerning
spanning sets:
Question 1. Given a set of vectors v1 , . . . , vn , what other vectors lie in their span?
Question 2. Given a subspace S of V , can we find a set of vectors that spans S?
To answer Question 1, we must solve for the constants c1 , . . . , cn in (5.4). But we can
use Gaussian elimination to do exactly this! Let us consider two examples; in the first
example, the vector v in (5.4) is specified.
Example 4. Does v = (3, 3, 4) lie in the span of v1 = (1, −1, 2) and v2 = (2, 1, 3) in
R3 ?
Solution. We want to know whether there exist constants c1 and c2 so that
c_1 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix}.
But we can write this as a linear system for c1 and c2
c1 + 2c2 = 3
−c1 + c2 = 3
2c1 + 3c2 = 4,
and then solve by applying Gaussian elimination to the augmented coefficient matrix:
\begin{pmatrix} 1 & 2 & 3 \\ -1 & 1 & 3 \\ 2 & 3 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 3 & 6 \\ 0 & -1 & -2 \end{pmatrix}  (add R1 to R2 and (−2)R1 to R3)
\sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & -1 & -2 \end{pmatrix}  (multiply R2 by 1/3)
\sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}  (add R2 to R3).
From this we see that c2 = 2 and then we back substitute to find c1 : c1 + 2(2) = 3
implies c1 = −1. So v does lie in span(v1 ,v2 ), and in fact
\begin{pmatrix} 3 \\ 3 \\ 4 \end{pmatrix} = -\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}.
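The coefficients found in Example 4 can also be obtained numerically. A minimal sketch using NumPy's least-squares solver (exact here, since v really does lie in the span) is given below; it is only a check of the hand computation.

```python
import numpy as np

v1 = np.array([1.0, -1.0, 2.0])
v2 = np.array([2.0, 1.0, 3.0])
v  = np.array([3.0, 3.0, 4.0])

A = np.column_stack([v1, v2])                  # the 3x2 matrix [v1 v2]
c, residual, rank, _ = np.linalg.lstsq(A, v, rcond=None)
print(c)                                       # approximately [-1.  2.]
print(np.allclose(A @ c, v))                   # True: v = -v1 + 2*v2
```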
Another variant of Question 1 is whether a given set of n (or more) vectors spans
all of Rn ? In this case, the vector v in (5.4) is allowed to be any vector in Rn ; and yet
we can still use Gaussian elimination to obtain our answer.
Example 5. Do the three vectors v1 = (1, −1, 2), v2 = (3, −4, 7), and v3 = (2, −1, 3)
span all of R3 ?
Solution. For any vector v = (x1 , x2 , x3 ) we want to know whether we can find
constants c1 , c2 , and c3 so that
c_1 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + c_2 \begin{pmatrix} 3 \\ -4 \\ 7 \end{pmatrix} + c_3 \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
Even though we do not know the values x1 , x2 , x3 , we can perform Gaussian elimination
on the augmented coefficient matrix:
\begin{pmatrix} 1 & 3 & 2 & x_1 \\ -1 & -4 & -1 & x_2 \\ 2 & 7 & 3 & x_3 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & -1 & 1 & x_2 + x_1 \\ 0 & 1 & -1 & x_3 - 2x_1 \end{pmatrix}
\sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & 0 & 0 & x_2 - x_1 + x_3 \\ 0 & 1 & -1 & x_3 - 2x_1 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 2 & x_1 \\ 0 & 1 & -1 & x_3 - 2x_1 \\ 0 & 0 & 0 & x_2 - x_1 + x_3 \end{pmatrix}.
From this row-echelon form, we see that there are vectors v for which c1 , c2 , and c3
cannot be found: just choose a vector v = (x1 , x2 , x3 ) for which x2 − x1 + x3 ≠ 0.
Consequently, the vectors v1 , v2 , v3 do not span all of R3 . (In fact, to answer the
question in this example, we did not even need to introduce x1 , x2 , x3 ; it would have
sufficed to show that a row-echelon form of the matrix A = [v1 v2 v3 ] contains a row
of zeros.)
2
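The conclusion of Example 5 can be checked quickly with a rank computation: a zero row in the row-echelon form of [v1 v2 v3] means the rank is less than 3, so the vectors cannot span R^3. The following SymPy snippet is only a sanity test of that fact.

```python
import sympy as sp

A = sp.Matrix([[ 1,  3, 2],
               [-1, -4, -1],
               [ 2,  7, 3]])     # columns are v1, v2, v3

print(A.rank())                  # 2 < 3, so v1, v2, v3 do not span R^3
print(A.rref()[0])               # the reduced row-echelon form has a row of zeros
```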
Now let us turn to Question 2. An important case is when the subspace S is the
nullspace of a matrix. In this case we can find a spanning set by using Gauss-Jordan
elimination to achieve reduced row-echelon form, and then expressing the solution in
terms of free variables that play the role of the constants in (5.4).
Example 6. Find vectors that span the nullspace of
A = \begin{pmatrix} 1 & -4 & 1 & -4 \\ 1 & 2 & 1 & 8 \\ 1 & 1 & 1 & 6 \end{pmatrix}.
Solution. We want to find a spanning set of vectors for the solutions x of Ax = 0. Let
us use ERO’s to convert A to reduced row-echelon form:
\begin{pmatrix} 1 & -4 & 1 & -4 \\ 1 & 2 & 1 & 8 \\ 1 & 1 & 1 & 6 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 1 & 2 & 1 & 8 \\ 1 & -4 & 1 & -4 \end{pmatrix}  (switching R1 and R3)
\sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 0 & 2 \\ 0 & -5 & 0 & -10 \end{pmatrix}  (adding (−1)R1 to R2 and to R3)
\sim \begin{pmatrix} 1 & 1 & 1 & 6 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (adding 5R2 to R3)
\sim \begin{pmatrix} 1 & 0 & 1 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}  (adding (−1)R2 to R1).
From this we see that x3 = s and x4 = t are free variables, and we easily determine x1
and x2 in terms of s and t: x1 = −s − 4t and x2 = −2t. So we can write our solution
in vector form as
x = \begin{pmatrix} -s - 4t \\ -2t \\ s \\ t \end{pmatrix} = \begin{pmatrix} -s \\ 0 \\ s \\ 0 \end{pmatrix} + \begin{pmatrix} -4t \\ -2t \\ 0 \\ t \end{pmatrix} = s \begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} -4 \\ -2 \\ 0 \\ 1 \end{pmatrix}.
In other words, we have expressed our solution x as a linear combination of the two
vectors,
v1 = (−1, 0, 1, 0) and v2 = (−4, −2, 0, 1).
Therefore these two vectors span the nullspace of A.
2
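A symbolic nullspace computation reproduces the spanning vectors of Example 6 (up to the labelling of the free variables). The short SymPy check below is offered only as a verification.

```python
import sympy as sp

A = sp.Matrix([[1, -4, 1, -4],
               [1,  2, 1,  8],
               [1,  1, 1,  6]])

for vec in A.nullspace():
    print(vec.T)      # Matrix([[-1, 0, 1, 0]]) and Matrix([[-4, -2, 0, 1]])
```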
Exercises
1. Determine whether the following subsets of R2 are subspaces (and state at least
one condition that fails if not). Sketch the set:
(a) The set of all vectors v = (x, y) such that 2x + 3y = 0, Solution
(b) The set of all vectors v = (x, y) such that x + y = 1,
(c) The set of all vectors v = (x, y) such that x y = 0,
(d) The set of all vectors v = (x, y) such that |x| = |y|.
2. Determine whether the following subsets of R3 are subspaces (and state at least
one condition that fails if not). Sketch the set:
(a) The set of all vectors v = (x, y, z) such that z = 0, Solution
(b) The set of all vectors v = (x, y, z) such that x + y + z = 0,
(c) The set of all vectors v = (x, y, z) such that z = 2y,
(d) The set of all vectors v = (x, y, z) such that x^2 + y^2 + z^2 = 1.
3. Let M 2 (R) denote the (2 × 2)-matrices A with real elements. Determine whether
the following subsets are subspaces (and state at least one condition that fails if
not):
(a) The invertible matrices,
(b) The matrices with determinant equal to 1,
(c) The lower triangular matrices,
(d) The symmetric matrices (AT = A).
4. Determine whether v lies in span(v1 ,v2 ):
(a) v = (5, 6, 7), v1 = (1, 0, −1), v2 = (1, 2, 3), Solution
(b) v = (0, −2, 0), v1 = (1, 0, −1), v2 = (1, 2, 3),
(c) v = (2, 7, −1, 2), v1 = (1, 2, −2, −1), v2 = (0, 3, 3, 4),
(d) v = (1, 2, 3, 4), v1 = (1, 2, −2, −1), v2 = (0, 3, 3, 4).
5. If possible, express w as a linear combination of v1 , v2 , and v3
(a) w = (4, 5, 6), v1 = (2, −1, 4), v2 = (3, 0, 1), v3 = (1, 2, −1), Solution
(b) w = (2, 1, 2), v1 = (1, 2, 3), v2 = (−1, 1, −2), v3 = (1, 5, 4),
(c) w = (1, 0, 0), v1 = (1, 0, 1), v2 = (2, −3, 4), v3 = (3, 5, 2).
6. Find vectors

1
(a) 4
7

1
(d) 2
2
that span the nullspace of the


2 3
3
5 6 Sol’n
(b) −1
8 9
2


1 −4
5 13 14
5 11 13
(e) 2 −1
1 2
7 17 22
following matrices:


1 2i
1
1 −2i
(c) 2
1
i
1

−3 −7
1
7
3
11
−1
−1
0

0 2
2 7
2 5
7. Find vectors that span the solution sets of the homogeneous linear systems:
(a) x1 − x2 + 2x3 = 0
    2x1 + x2 − 2x3 = 0
    x1 − 4x2 + 8x3 = 0
(b) x1 + 3x2 + 8x3 − x4 = 0
    x1 − 3x2 − 10x3 + 5x4 = 0
    x1 + 4x2 + 11x3 − 2x4 = 0.

5.4 Linear Independence
For vectors v1 , . . . , vn in a vector space V , we know that S = span(v1 , . . . , vn ) is a
subspace of V . But it could be that not all the vi are needed to generate S. For
example, if v1 , v2 are nonzero colinear vectors in R3 , then S is the line containing both
v1 and v2 , so is generated by v1 alone (or by v2 alone), i.e. we do not need both v1
and v2 . Two vectors that are scalar multiples of each other are not only called colinear,
but “linearly dependent;” in fact, we encountered this terminology for two functions in
Section 2.2. The generalization of the concept to n vectors is provided by the following:
Definition 1. A finite collection v1 , . . . , vn of vectors in a vector space V is linearly
independent if the only scalars c1 , . . . , cn for which
c1 v1 + c2 v2 + · · · + cn vn = 0    (5.5)
are c1 = c2 = · · · = cn = 0. Thus v1 , . . . , vn are linearly dependent if (5.5)
holds for some scalars c1 , . . . , cn that are not all zero.
Technically, the second sentence in this definition should read “Thus the collection
{v1 , . . . , vn } is linearly dependent if. . . ”, because linear independence is a property of
a collection of vectors rather than the vectors themselves. However, we will frequently
use the more casual wording when the meaning is clear.
The linear independence of v1 , . . . , vn ensures that vectors in their linear span can
be represented uniquely as a linear combination of the v1 , . . . , vn :
Theorem 1. If v1 , . . . , vn are linearly independent vectors in a vector space V and
v is any vector in span(v1 , . . . , vn ), then there are unique scalars c1 , . . . , cn for
which
v = c1 v1 + · · · + cn vn .
Proof. If we have v = c1 v1 + · · · + cn vn and v = d1 v1 + · · · + dn vn , then v − v = 0
implies
(c1 − d1 )v1 + · · · + (cn − dn )vn = 0.
But linear independence then implies c1 − d1 = · · · = cn − dn = 0, i.e. di = ci for all
i = 1, . . . , n.
2
Let us consider some examples of linearly independent vectors in Rn .
Example 1. The standard unit vectors e1 = (1, 0, . . . , 0),. . . ,en = (0, 0, . . . , 1) in Rn
are linearly independent since the vector equation
c1 e1 + · · · + cn en = 0
means (c1 , c2 , . . . , cn ) = (0, 0, . . . , 0), i.e. c1 = c2 = · · · = cn = 0.  □
Example 2. Let v1 = (1, 0, 1), v2 = (2, −3, 4), and v3 = (3, 5, 2) in R^3. Determine
whether these vectors are linearly independent.
Solution. We want to know whether c1 v1 + c2 v2 + c3 v3 = 0 has a nontrivial solution
c1 , c2 , c3 . Now we can write this equation as the homogeneous linear system
c1 + 2c2 + 3c3 = 0
−3c2 + 5c3 = 0
c1 + 4c2 + 2c3 = 0,
and we can determine the solution set by applying Gaussian elimination to the coefficient
matrix:
\begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & 5 \\ 1 & 4 & 2 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.
But this tells us c3 = 0 and, by back substitution, c1 = c2 = 0. So the vectors are
linearly independent.
2
In Example 2, we observed that the issue of linear independence for three vectors in
R3 reduces to solving a homogeneous linear system of three equations for the three unknowns, c1 , c2 , c3 . But this observation generalizes to vectors v1 , . . . , vk in Rn . Namely,
let A denote the (n × k)-matrix using v1 , . . . , vk as column vectors, which we write as
A = [ v1 v2 · · · vk ],
and then study the solutions of
A c = 0.
(5.6)
Let us summarize this as follows:
Theorem 2. Let v1 , . . . , vk be vectors in Rn and A = [ v1 · · · vk ]. Then v1 , . . . , vk
are linearly independent if and only if A c = 0 has only the trivial solution c = 0.
Our study in Chapter 4 of solving linear systems such as (5.6) implies the following:
Corollary 1. Let v1 , . . . , vk be vectors in Rn and A = [ v1 · · · vk ].
(a) If k > n then v1 , . . . , vk are linearly dependent.
(b) If k = n then v1 , . . . , vk are linearly dependent if and only if det(A)=0.
Remark 1. If k < n, then we need to use Theorem 2 instead of Corollary 1.
Example 3. Determine whether the given vectors are linearly independent in R4 :
(a) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0), v4 = (4, 0, 0, 0), v5 = (5, 4, 3, 2).
(b) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0), v4 = (4, 0, 0, 0).
(c) v1 = (1, 0, 1, 2), v2 = (2, 0, 4, 8), v3 = (3, 2, 1, 0).
Solution. (a) We have k = 5 vectors in R4 , so Corollary 1 (a) implies that v1 , . . . , v5
are linearly dependent.
(b) We have k = 4 vectors in R4 , so we appeal to Corollary 1 (b). We form the
(4 × 4)-matrix A and compute its determinant:
det(A) = \begin{vmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 0 \\ 1 & 4 & 1 & 0 \\ 2 & 8 & 0 & 0 \end{vmatrix} = -2 \begin{vmatrix} 1 & 2 & 4 \\ 1 & 4 & 0 \\ 2 & 8 & 0 \end{vmatrix} = -8 \begin{vmatrix} 1 & 4 \\ 2 & 8 \end{vmatrix} = -8(8 - 8) = 0.
So det(A)= 0, and Corollary 1 implies that v1 , . . . , v4 are linearly dependent.
(c) We have k = 3 vectors in R^4, so we use Theorem 2 instead of Corollary 1:
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 2 \\ 1 & 4 & 1 \\ 2 & 8 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 2 & -2 \\ 0 & 4 & -6 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
shows that Ac = 0 has only the trivial solution c = 0. Hence v1 , v2 , v3 are linearly
independent.
2
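The three parts of Example 3 translate directly into rank and determinant checks in the spirit of Theorem 2 and Corollary 1. The sketch below is only a numerical confirmation of the hand computations.

```python
import numpy as np

v1 = [1, 0, 1, 2]; v2 = [2, 0, 4, 8]; v3 = [3, 2, 1, 0]
v4 = [4, 0, 0, 0]; v5 = [5, 4, 3, 2]

# (a) five vectors in R^4: k > n, necessarily dependent
A = np.column_stack([v1, v2, v3, v4, v5])
print(np.linalg.matrix_rank(A) < 5)          # True -> linearly dependent

# (b) four vectors in R^4: dependent if and only if det = 0
B = np.column_stack([v1, v2, v3, v4])
print(np.isclose(np.linalg.det(B), 0.0))     # True -> linearly dependent

# (c) three vectors in R^4: independent if and only if the rank equals 3
C = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(C) == 3)         # True -> linearly independent
```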
While Corollary 1 provides us with nice shortcuts for determining whether or not a
collection of vectors is linearly independent, to find the nonzero constants so that (5.5)
holds, we need to invoke Theorem 2.
Example 3 (revisited). For the collection of vectors in (a) and (b), find a nontrivial
linear combination satisfying (5.5).
Solution. (a) We let A = [ v1 · · · v5 ] and solve (5.6) by Gauss-Jordan elimination:
A = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 0 & 2 & 0 & 4 \\ 1 & 4 & 1 & 0 & 3 \\ 2 & 8 & 0 & 0 & 2 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 0 & 0 & 8 & -3 \\ 0 & 1 & 0 & -2 & 1 \\ 0 & 0 & 1 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We see that c4 = s and c5 = t are free variables, and in terms of these we find c1 =
−8s + 3t, c2 = 2s − t, c3 = −2t. We have an infinite number of solutions, but we can
arbitrarily pick s = 0 and t = 1 to obtain c1 = 3, c2 = −1, c3 = −2, c4 = 0, c5 = 1:
3v1 − v2 − 2v3 + v5 = 0.
(b) We let A = [ v1 · · · v4 ] and solve (5.6) by Gauss-Jordan elimination as in (a):
A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 2 & 0 \\ 1 & 4 & 1 & 0 \\ 2 & 8 & 0 & 0 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 0 & 0 & 8 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
We see that c4 = s is a free variable, and c1 = −8s, c2 = 2s, c3 = 0. We arbitrarily pick
s = 1 to write
−8v1 + 2v2 + v4 = 0.
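Both dependencies just found are exactly what a symbolic nullspace computation returns, one vector per free variable. The following sketch recovers them for part (a); it is only a verification.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, 4, 5],
               [0, 0, 2, 0, 4],
               [1, 4, 1, 0, 3],
               [2, 8, 0, 0, 2]])       # columns are v1, ..., v5

for vec in A.nullspace():              # one basis vector per free variable
    print(vec.T, (A * vec).T)          # one of them is (3, -1, -2, 0, 1);
                                       # A*vec is the zero vector in every case
```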
Linear Independence for Functions and the Wronskian
Now let us consider linear independence for a set of functions. Recall that in Section 2.2
we said two functions are linearly independent if, like two vectors, they are not scalar
multiples of each other. For a larger number of functions, we apply Definition 1 to
functions and conclude:
Definition 2. A finite collection of functions {f1 , f2 , . . . , fn } is linearly independent
on an interval I if the only scalars c1 , c2 , . . . , cn for which
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
for all x in I
are c1 = c2 = · · · = cn = 0.
Also recall from Section 2.2 that the Wronskian of two differentiable functions was
useful in determining their linear independence. For a larger number of functions, we
require more differentiability, so let us recall
C^k(I) = {the functions on the interval I with continuous derivatives up to order k}.
Definition 3. For functions f1 , f2 , . . . , fn in C^{n−1}(I), we define the Wronskian as
W [f1 , f2 , . . . , fn ] = \begin{vmatrix} f_1 & f_2 & \cdots & f_n \\ f_1' & f_2' & \cdots & f_n' \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)} & f_2^{(n-1)} & \cdots & f_n^{(n-1)} \end{vmatrix}.
Let us consider an example.
Example 4. Let f1 (x) = x, f2 (x) = x^2, and f3 (x) = x^3 on I = (−∞, ∞). We compute
W (f1 , f2 , f3 )(x) = \begin{vmatrix} x & x^2 & x^3 \\ 1 & 2x & 3x^2 \\ 0 & 2 & 6x \end{vmatrix} = x \begin{vmatrix} 2x & 3x^2 \\ 2 & 6x \end{vmatrix} - \begin{vmatrix} x^2 & x^3 \\ 2 & 6x \end{vmatrix} = 2x^3.  □
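The 3 × 3 determinant in Example 4 is easy to reproduce symbolically. The sketch below builds the Wronskian matrix by differentiating, exactly as in Definition 3, and then takes its determinant; it is included only as an optional check.

```python
import sympy as sp

x = sp.symbols('x')
fs = [x, x**2, x**3]

# Row k holds the k-th derivatives of the functions (k = 0, 1, 2).
W = sp.Matrix([[sp.diff(f, x, k) for f in fs] for k in range(3)])
print(sp.simplify(W.det()))      # 2*x**3
```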
Let us now show how the Wronskian is useful for establishing the linear independence
of a collection of functions.
Theorem 3. Let f1 , f2 , . . . , fn be functions in C n−1 (I) with W [f1 , f2 , . . . , fn ](x0 ) 6= 0
for at least one point x0 in I. Then {f1 , f2 , . . . , fn } is linearly independent.
Proof. To prove the theorem, we assume
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
for all x in I,
and we want to show c1 = c2 = · · · = cn = 0. But we can differentiate this equation
n − 1 times to obtain
c1 f1 (x) + c2 f2 (x) + · · · + cn fn (x) = 0
c1 f1'(x) + c2 f2'(x) + · · · + cn fn'(x) = 0
⋮
c1 f1^{(n−1)}(x) + c2 f2^{(n−1)}(x) + · · · + cn fn^{(n−1)}(x) = 0,
which we view as a linear system for the unknowns c1 , c2 , . . . , cn . Now this linear
system can have a nontrivial solution (c1 , c2 , . . . , cn ) only if the determinant of its coefficient matrix is zero; however, the determinant of the coefficient matrix is precisely
W [f1 , f2 , . . . , fn ], which we assumed is nonzero at x0 . We conclude that c1 = c2 = · · · =
cn = 0, and hence that {f1 , f2 , . . . , fn } is linearly independent.
2
Example 4 (revisited). Using this theorem and W [x, x^2, x^3] = 2x^3, we see that
{x, x^2, x^3} is linearly independent.
2
It is important to note that, as we saw for n = 2 in Section 2.2, Theorem 3 can not
be used to conclude that a collection of functions is linearly dependent: it does not say
that if W [f1 , f2 , . . . , fn ](x) = 0 for all x in I then {f1 , f2 , . . . , fn } is linearly dependent.
Instead, to show a collection {f1 , f2 , . . . , fn } is linearly dependent on I, we need to
display constants c1 , . . . , cn (not all zero) so that c1 f1 (x) + · · · + cn fn (x) = 0 for all x ∈ I.
Example 5. Let f1 (x) = 1, f2 (x) = x^2, and f3 (x) = 3 + 5x^2. Are these functions
linearly independent?
Solution. If we think they might be linearly independent, we can try computing the
Wronskian:
W [f1 , f2 , f3 ] = \begin{vmatrix} 1 & x^2 & 3 + 5x^2 \\ 0 & 2x & 10x \\ 0 & 2 & 10 \end{vmatrix} = \begin{vmatrix} 2x & 10x \\ 2 & 10 \end{vmatrix} = 0.
Since the Wronskian is identically zero, Theorem 3 does not tell us anything. We now
suspect that the functions are linearly dependent, but to verify this, we must find a
linear combination that vanishes. We observe that f3 = 3f1 + 5f2 , i.e.
3f1 + 5f2 − f3 = 0.
That this linear combination of f1 , f2 , f3 vanishes shows that the functions are linearly
dependent.
2
Exercises
1. Determine whether the given vectors are linearly independent; if linearly dependent, find a linear combination that vanishes.
(a) v1 = (1, 2), v2 = (−1, 0)
(b) v1 = (1, 2), v2 = (−1, 0), v3 = (0, 1)
(c) v1 = (1, −1, 0), v2 = (0, 1, −1), v3 = (1, 1, 1) Solution
(d) v1 = (2, −4, 6), v2 = (−5, 10, −15) Solution
(e) v1 = (2, 1, 0, 0), v2 = (3, 0, 1, 0), v3 = (4, 0, 0, 1)
(f) v1 = (1, 2, 0, 0), v2 = (0, −1, 0, 1), v3 = (0, 0, 1, 1), v4 = (1, 0, −1, 0)
2. Find all values of c for which the vectors are linearly independent.
(a) v1 = (1, c), v2 = (−1, 2) Solution
(b) v1 = (1, c, 0), v2 = (1, 0, 1), v3 = (0, 1, −c)
(c) v1 = (1, 0, 0, c), v2 = (0, c, 0, 1), v3 = (0, 0, 1, 1), v4 = (0, 0, 2, 2)
3. Use the Wronskian to show the given functions are linearly independent on the
given interval I.
(a) f1 (x) = sin x, f2 (x) = cos x, f3 (x) = x, I = (−∞, ∞). Solution
(b) f1 (x) = 1, f2 (x) = x, f3 (x) = x^2, f4 (x) = x^3, I = (−∞, ∞).
(c) f1 (x) = 1, f2 (x) = x^{−1}, f3 (x) = x^{−2}, I = (0, ∞).
4. The Wronskian of the given functions vanishes. Show they are linearly dependent.
(a) f1 (x) = x, f2 (x) = x + x^2, f3 (x) = x − x^2. Solution
(b) f1 (x) = 1 − 2 cos^2 x, f2 (x) = 3 + sin^2 x, f3 (x) = π.
(c) f1 (x) = e^x, f2 (x) = cosh x, f3 (x) = sinh x.
5.5 Bases and Dimension
We are now able to introduce the notion of a basis for a vector space:
Definition 1. A finite set of vectors B = {v1 , v2 , . . . , vn } in a vector space V is a
basis if
(a) the vectors are linearly independent, and
(b) the vectors span V .
The role of a basis for V is to provide a representation for vectors in V : if v is any
vector in V , then we know (since {v1 , v2 , . . . , vn } spans V ) that there are constants
c1 , c2 , . . . , cn so that
v = c1 v1 + c2 v2 + · · · + cn vn ,
(5.7)
and we know (by Theorem 1 in Sec. 5.4) that the constants c1 , c2 , . . . , cn are unique.
The most familiar example of a basis is the standard basis for Rn :
e1 = (1, 0, 0, . . . , 0)
e2 = (0, 1, 0, . . . , 0)
⋮
en = (0, 0, 0, . . . , 1).
That {e1 , e2 , . . . , en } is linearly independent was confirmed in Example 1 in Sec. 5.4.
That {e1 , e2 , . . . , en } spans Rn follows from the fact that any v = (v1 , v2 , . . . , vn ) in Rn
can be realized as a linear combination of e1 , e2 , . . . , en :
v = v1 e1 + v2 e2 + · · · + vn en .
A basis for a vector space is not unique. In Rn , for example, there are many choices
of bases and we can easily determine when a collection of n vectors forms a basis. In
fact, given vectors v1 , . . . , vn in Rn , we form the matrix A = [v1 · · · vn ] with the vj
as column vectors, and use Corollary 1 (b) of Section 5.4 to conclude that v1 , . . . , vn
are linearly independent if and only if det(A) ≠ 0. But the invertibility conditions in
Section 4.4 show that det(A) ≠ 0 is also equivalent to being able to solve Ac = v for
any v in Rn , i.e. to showing that v1 , . . . , vn span Rn . In other words, we have just
proved the following:
Theorem 1. A collection {v1 , . . . , vn } of n vectors in Rn is a basis if and only if
A = [v1 · · · vn ] satisfies det(A) ≠ 0.
Example 1. Show that v1 = (1, 2, 3), v2 = (1, 0, −1), v3 = (0, 1, 0) is a basis for R3 .
Solution. Let A = [v1 v2 v3 ]. We compute
det(A) = \begin{vmatrix} 1 & 1 & 0 \\ 2 & 0 & 1 \\ 3 & -1 & 0 \end{vmatrix} = -\begin{vmatrix} 1 & 1 \\ 3 & -1 \end{vmatrix} = -(-1 - 3) = 4 ≠ 0.  □
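Theorem 1 reduces the basis question in R^n to a single determinant, so Example 1 can be confirmed with one line of NumPy; this is offered only as a numerical check.

```python
import numpy as np

A = np.column_stack([[1, 2, 3], [1, 0, -1], [0, 1, 0]])   # columns v1, v2, v3
print(np.linalg.det(A))   # approximately 4 (nonzero), so {v1, v2, v3} is a basis for R^3
```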
Now let us return to a general vector space V . Although a basis for V is not unique,
the number of vectors in each basis is, and that is what we will call the “dimension” of
V . Towards that end we have the following:
Theorem 2. If a vector space V has a basis consisting of n vectors, then any set of
more than n vectors is linearly dependent.
Proof. Let {v1 , v2 , . . . , vn } be a basis for V and let A = {w1 , . . . , wm } where m > n.
Let us write each wi in terms of the basis vectors:
w1 = a11 v1 + · · · + a1n vn
w2 = a21 v1 + · · · + a2n vn
⋮    (5.8)
wm = am1 v1 + · · · + amn vn .
To show that A is linearly dependent, we want to find c1 , . . . , cm , not all zero, so that
c1 w1 + · · · + cm wm = 0.    (5.9)
Replace each wi in (5.9) by its expression in terms of v1 , . . . , vn in (5.8):
c1 (a11 v1 + · · · + a1n vn ) + · · · + cm (am1 v1 + · · · + amn vn ) = 0.
Let us rearrange this equation, collecting the vi terms together:
(c1 a11 + · · · + cm am1 )v1 + · · · + (c1 a1n + · · · + cm amn )vn = 0.
However, since {v1 , v2 , . . . , vn } is linearly independent, the coefficients of this linear
combination must vanish:
c1 a11 + · · · + cm am1 = 0
⋮
c1 a1n + · · · + cm amn = 0.
But this is a homogeneous linear system with m unknowns c1 , . . . , cm and n equations;
since m > n, we know from Chapter 4 that it has an infinite number of solutions; in
particular, it has a nontrivial solution. But this shows that A is linearly dependent. 2
Now suppose we have two bases A = {v1 , . . . , vn } and B = {w1 , . . . , wm } for a
vector space. Since A is a basis and B is linearly independent, Theorem 2 implies
m ≤ n. Interchanging the roles of A and B we conclude n ≤ m. Thus m = n and we
obtain the following:
Corollary 1. Any two bases for a vector space V have the same number of vectors.
Using this corollary, we can make the following definition of “dimension.”
Definition 2. If a vector space V has a basis containing n vectors, then n is the
dimension of V which we write as dim(V ). We also say that V is finite-dimensional.
In particular, since we showed above that e1 , . . . , en is a basis for Rn , we see that
dim(Rn ) = n, as expected.
Remark 1. This definition does not mean that every vector space has finite dimension,
i.e. a basis of finitely many vectors. If there is no such finite basis, then the vector
space is called “infinite dimensional.” Although we have limited exposure to infinite
dimensional vector spaces in this book, an example is discussed in Exercise 2.
If we know that V has dimension n, then, for a collection of n vectors, the conditions
(a) and (b) in Definition 1 are equivalent.
Theorem 3. Suppose dim(V ) = n and B = {v1 , . . . , vn } is a set of n vectors in V .
Then: B is a basis for V ⇔ B is linearly independent ⇔ B spans V .
Proof. It suffices to show (a) B is linearly independent ⇒ B spans V , and (b) B spans
V ⇒ B is linearly independent.
(a) Let {v1 , v2 , . . . , vn } be linearly independent and let v be any vector in V . We
want to show that v lies in span(v1 , . . . , vn ). But by Theorem 2, we know that
{v, v1 , v2 , . . . , vn } is linearly dependent, so we must have
cv + c1 v1 + c2 v2 + · · · + cn vn = 0
for some nontrivial choice of constants c, c1 , . . . , cn . If c = 0, then we would have
c1 v1 + c2 v2 + · · · + cn vn = 0, which by the linear independence of {v1 , v2 , . . . , vn }
means c1 = c2 = · · · = cn = 0, i.e. this is a trivial choice of the constants c, c1 , . . . , cn . So
we must have c ≠ 0, and so we can divide by it and rearrange to have
v = c̃1 v1 + c̃2 v2 + · · · + c̃n vn ,
where c̃i = −ci /c.
But this says that v is in span(v1 , . . . , vn ). Hence, {v1 , v2 , . . . , vn } spans V .
(b) Assume that B = {v1 , v2 , . . . , vn } spans V . If B is not linearly independent, then we
will show that B contains a proper subset B′ that spans V and is linearly independent,
i.e. is a basis for V ; since B′ has fewer than n vectors, this contradicts dim(V ) = n. So
assume B is linearly dependent: c1 v1 + · · · + ck vk = 0 for some constants ci not all zero.
Suppose cj ≠ 0. Then we can express vj as a linear combination of the other vectors:
vj = c′1 v1 + · · · + c′j−1 vj−1 + c′j+1 vj+1 + · · · + c′k vk ,  where c′i = −ci /cj .
Thus vj is in the span of A = {v1 , . . . , vj−1 , vj+1 , . . . , vk }, so A also spans V . If A is
also linearly dependent, then we repeat the procedure above. Since we started with a
finite set of vectors, this process will eventually terminate in a linearly independent set
B′ that spans V and contains fewer than n vectors, contradicting dim(V ) = n. This
contradiction shows that B itself must be linearly independent.
2
In a sense, Theorem 3 can be viewed as a generalization of Theorem 1 to vector
spaces other than R^n where we do not have access to a condition like det(A) ≠ 0. For
example, it applies to the important case of subspaces S of Rn : if dim(S) = k < n
and we are given a collection of k vectors v1 , . . . , vk in S, then each vi is an n-vector
and so the matrix A = [v1 · · · vk ] has n rows and k columns; thus det(A) is not even
defined! However, using Theorem 3 to show that {v1 , . . . , vk } is a basis for S, we need
only show either (a) they are linearly independent or (b) they span S.
Example 3. The equation x + 2y − z = 0 defines a plane, which is a subspace S of
R3 . The vectors v1 = (1, 1, 3) and v2 = (2, 1, 4) clearly lie on the plane. Moreover,
v1 ≠ kv2 , so the vectors are linearly independent. By Theorem 3, {v1 , v2 } forms a
basis for S.
2
Let us continue to discuss subspaces in the context of bases and dimension. If S is
a subspace of a vector space V , then (by Theorem 2) we have
dim(S) ≤ dim(V ).
But a stronger statement is that we can add vectors to a basis for S to obtain a basis
for V :
Theorem 4. A basis {v1 , . . . , vk } for S can be extended to a basis {v1 , . . . , vk , . . . , vn }
for V .
Proof. Let {w1 , . . . , wn } be a basis for V . Is w1 in S? If yes, discard it and let
S 0 = S; if not, let vk+1 = w1 so {v1 , . . . , vk , vk+1 } is a linearly independent set and
let S 0 = span{v1 , . . . , vk , vk+1 }. Now repeat this process with w2 , w3 , etc. After
we exhaust {w1 , . . . , wn }, we will have a linearly independent set {v1 , . . . , vℓ } whose
linear span includes all the {w1 , . . . , wn }, i.e. {v1 , . . . , vℓ } spans V . But this means
{v1 , . . . , vℓ } is a basis for V and we must have ℓ = n.
2
Example 3 (revisited). To extend the basis {v1 , v2 } for S to a basis for R3 , we
follow the proof of Theorem 4: try adding one of e1 , e2 , e3 . Let us try {v1 , v2 , e1 }. We
compute
\begin{vmatrix} 1 & 2 & 1 \\ 1 & 1 & 0 \\ 3 & 4 & 0 \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 3 & 4 \end{vmatrix} = 1 ≠ 0.
So by Theorem 1, {v1 , v2 , e1 } is a basis for R3 .
2
Now let us consider a different problem: can we find a basis for a subspace S? For
example, suppose we are given vectors w1 , . . . , wm and we let S = span{w1 , . . . , wm }.
We don’t want to remove the wj one by one until we get a linearly independent set!
There is a neat answer to this problem, but we must put it off until the next section.
On the other hand, there is another important class of subspaces, namely the nullspace
of an (m × n)-matrix (or, equivalently, the solution space for a homogeneous linear
system), for which we know how to find a spanning set that in fact is a basis.
Basis for a Solution Space
Recall that in Section 5.3 we constructed a spanning set for the nullspace of an (m × n)-matrix A, i.e. the solution space of
Ax = 0.    (5.10)
To summarize, we used Gauss-Jordan elimination to put the matrix into reduced row-echelon form. In order for there to be a nontrivial solution, there must be free variables. In that
case, we introduced free parameters, and expressed the leading variables in terms of
them. Putting the resultant solutions x into vector form, the spanning set of vectors
appears with the free parameters as coefficients. We now show that this spanning set of
vectors is also linearly independent, so forms a basis for the solution space.
Suppose that there are k free variables and ℓ = n − k leading variables in the
reduced row-echelon form of A. For simplicity of notation, let us suppose the leading
variables are the first ℓ variables x1 , . . . , xℓ and the free variables are the last k variables
xℓ+1 , . . . , xn . We introduce k free parameters t1 , . . . , tk for the free variables: xℓ+j = tj
for j = 1, . . . , k. After we express the leading variables in terms of the free parameters,
the solution in vector form looks like
x = \begin{pmatrix} b_{11} t_1 + \cdots + b_{1k} t_k \\ \vdots \\ b_{\ell 1} t_1 + \cdots + b_{\ell k} t_k \\ t_1 \\ t_2 \\ \vdots \\ t_k \end{pmatrix} = t_1 \begin{pmatrix} b_{11} \\ \vdots \\ b_{\ell 1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + t_2 \begin{pmatrix} b_{12} \\ \vdots \\ b_{\ell 2} \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + t_k \begin{pmatrix} b_{1k} \\ \vdots \\ b_{\ell k} \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}.
We see that the k vectors
v_1 = \begin{pmatrix} b_{11} \\ \vdots \\ b_{\ell 1} \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} b_{12} \\ \vdots \\ b_{\ell 2} \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad v_k = \begin{pmatrix} b_{1k} \\ \vdots \\ b_{\ell k} \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}    (5.11)
span the solution space. A linear combination c1 v1 + · · · + ck vk = 0 takes the form
c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \begin{pmatrix} * \\ \vdots \\ * \\ c_1 \\ c_2 \\ \vdots \\ c_k \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},
where ∗ indicates terms that we don’t care about; we see that c1 = c2 = · · · = ck =
0. Consequently, v1 , . . . , vk are linearly independent, and hence form a basis for the
solution space of (5.10).
Let us discuss an example.
Example 4. Find the dimension and a basis for the solution space of the linear system
x1 + 3x2 + 4x3 + 5x4 = 0
2x1 + 6x2 + 9x3 + 5x4 = 0.
Solution. We write the coefficient matrix and use Gauss-Jordan elimination to put it
in reduced row-echelon form:
\begin{pmatrix} 1 & 3 & 4 & 5 \\ 2 & 6 & 9 & 5 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 0 & 25 \\ 0 & 0 & 1 & -5 \end{pmatrix}.
We see that x2 = s and x4 = t are free variables, and the leading variables are x1 =
−3s − 25t and x3 = 5t. In vector form we have
x = \begin{pmatrix} -3s - 25t \\ s \\ 5t \\ t \end{pmatrix} = s \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} -25 \\ 0 \\ 5 \\ 1 \end{pmatrix}.
We find that the vectors v1 = (−3, 1, 0, 0) and v2 = (−25, 0, 5, 1) span the solution
space. Notice that, where one vector has a 1 the other vector has a 0, so v1 , v2 are
linearly independent and form a basis for the solution space. In particular, the dimension
of the solution space is 2.
2
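The same answer drops out of a symbolic nullspace computation, which also reports the dimension as the number of basis vectors returned; the following is merely a check of Example 4.

```python
import sympy as sp

A = sp.Matrix([[1, 3, 4, 5],
               [2, 6, 9, 5]])

basis = A.nullspace()
print(len(basis))          # 2, the dimension of the solution space
for vec in basis:
    print(vec.T)           # (-3, 1, 0, 0) and (-25, 0, 5, 1)
```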
Exercises
1. Determine whether the given set of vectors is a basis for Rn .
(a) v1 = (1, 2), v2 = (3, 4);
(b) v1 = (1, 2, 3), v2 = (2, 3, 4), v3 = (1, 0, −1); Solution
(c) v1 = (1, 0, −1), v2 = (1, 2, 3), v3 = (0, 1, 0);
(d) v1 = (1, 2, 3, 4), v2 = (1, 0, −1, 0), v3 = (0, 1, 0, 1);
(e) v1 = (1, 0, 0, 0), v2 = (1, 2, 0, 0), v3 = (0, −1, 0, 1), v4 = (1, 2, 3, 4).
2. Show that the collection P of all real polynomials on (−∞, ∞) is an “infinite-dimensional” vector space.
3. Find the dimension and a basis for the solution space.
(a) x1 − x2 + 3x3 = 0
    2x1 − 3x2 − x3 = 0   Sol’n
(b) 3x1 + x2 + 6x3 + x4 = 0
    2x1 + x2 + 5x3 − 2x4 = 0
(c) x1 − 3x2 − 10x3 + 5x4 = 0
    x1 + 4x2 + 11x3 − 2x4 = 0
    x1 + 3x2 + 8x3 − x4 = 0
(d) x1 + 3x2 − 4x3 − 8x4 + 6x5 = 0
    x1 + 2x3 + x4 + 3x5 = 0
    2x1 + 7x2 − 10x3 − 19x4 + 13x5 = 0
4. Find the dimension and a basis for the nullspace of the given matrix A.
(a) \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} Sol’n ;  (b) \begin{pmatrix} -2 & 4 \\ 3 & -6 \end{pmatrix} ;  (c) \begin{pmatrix} 1 & 2 & 3 \\ -1 & 0 & 1 \\ 1 & 6 & 11 \end{pmatrix} ;  (d) \begin{pmatrix} 1 & -1 & 2 & 3 \\ 2 & -1 & 3 & 4 \\ 1 & 0 & 1 & 1 \\ 3 & -1 & 4 & 5 \end{pmatrix}.

5.6 Row and Column Spaces
Consider an (m × n)-matrix A. We may think of each row of A as an n-vector and
consider their linear span: this forms a vector space called the row space of A and is
denoted Row(A). Since elementary row operations on A involve no more than linear
combinations of the row vectors and are reversible, this has no effect on their linear
span. We have proved the following:
Theorem 1. If two matrices A and B are row-equivalent, then Row(A) = Row(B).
Since A has m rows, the dimension of Row(A) is at most m, but it could be less. In
particular, if E is a row-echelon form for A, then the dimension of Row(A) is the same
as the number of nonzero rows in E, which we identified in Section 4.3 as rank(A). Thus
we have the following:
Corollary 1. dim(Row(A)) = rank(A).
In fact, we can use the same reasoning to find a basis for Row(A):
Algorithm 1: Finding a Basis for Row(A)
Use ERO’s to put A into row-echelon form E. Then the nonzero rows of E form
a basis for Row(A).
Example 1. Find a basis for the row space of
A = \begin{pmatrix} 1 & 2 & 1 & 0 & 2 \\ 2 & 3 & 3 & -2 & 1 \\ 3 & 4 & 5 & -3 & -1 \\ 1 & 3 & 0 & 2 & 5 \end{pmatrix}.
Solution. We use ERO’s to find the reduced row-echelon form
E = \begin{pmatrix} 1 & 0 & 3 & 0 & -8 \\ 0 & 1 & -1 & 0 & 5 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We conclude that rank(A)=3 and a basis for the row space is v1 = (1, 0, 3, 0, −8), v2 =
(0, 1, −1, 0, 5), v3 = (0, 0, 0, 1, −1). Of course, we did not need to find the reduced row-echelon form; if we had used another echelon form, we would obtain a different (but
equivalent) basis for Row(A).
2
Having used the row vectors of A to form a vector space, we can do the same with
the column vectors: the linear span of the n column vectors of A is a vector space
called the column space and denoted Col(A). We are interested in finding a basis
for Col(A), but obtaining it is more subtle than for Row(A) since elementary row
operations need not preserve Col(A). However, if we let E denote a row-echelon form
for A, then we shall see that it can be used to select a basis for Col(A). In fact, let us
denote the column vectors of A by c1 , . . . , cn and the column vectors of E by d1 , . . . , dn ,
i.e.
A = [ c1 · · · cn ]   and   E = [ d1 · · · dn ].
Because A and E are row-equivalent, they have the same solution sets, so
Ax = 0
⇔
E x = 0.
But if x = (x1 , . . . , xn ), then we can write this equivalence using column vectors as
x 1 c1 + · · · + x n cn = 0
⇔
x1 d1 + · · · + xn dn = 0.
This shows that linear dependence amongst the vectors d1 , . . . , dn is exactly mirrored
by linear dependence amongst the vectors c1 , . . . , cn . However, we know a subset of the
column vectors of E that is linearly independent: they are the pivot columns, i.e. the
columns that contain the leading 1’s. In fact, if E is in reduced row-echelon form (as
we had in Example 1), then these pivot columns are the vectors
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , er = (0, . . . , 0, 1, . . . , 0) where r = rank(E),
which are clearly independent. But in any row-echelon form, the pivot columns still
take the form
(1, 0, . . . , 0), (∗, 1, 0, . . . , 0), . . . , (∗, ∗, . . . , ∗, 1, . . . , 0)
where ∗ denotes some number, so their linear independence remains clear. Moreover,
the pivot columns span Col(E). We conclude that exactly the same column vectors of
A are linearly independent and span Col(A). We can summarize this as follows:
Algorithm 2: Finding a Basis for Col(A)
Use ERO’s to put A into row-echelon form E. Then a basis for Col(A) is obtained by selecting the column vectors of A that correspond to the columns of E
containing the leading 1’s.
Since the number of leading ones is the same as the number of nonzero rows, we immediately have the following consequence:
Theorem 2. For any matrix A we have dim(Row(A)) = dim(Col(A)).
We sometimes abbreviate the above theorem with the statement: for any matrix, the
row and column ranks are equal. We already acknowledged that the term rank(A)
that we introduced in Section 4.3 coincides with the row rank of A, but we now see
that it also coincides with the column rank of A, further justifying its simple moniker
“rank(A).”
Example 2. Find a basis for the column space of the matrix in Example 1.
Solution. We see that the column vectors in E containing leading 1’s are the first,
second, and fourth columns; notice that these three vectors are indeed linearly independent and span Col(E). We select the corresponding columns of A as a basis for
Col(A): v1 = (1, 2, 3, 1), v2 = (2, 3, 4, 3), v3 = (0, −2, −3, 2). As expected, we have
dim(Col(A)) = 3 = dim(Row(A)).
2
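Algorithms 1 and 2 and the rank-nullity identity can all be read off from one reduced row-echelon computation. The sketch below redoes Examples 1 and 2 with SymPy, purely as a check.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 1,  0,  2],
               [2, 3, 3, -2,  1],
               [3, 4, 5, -3, -1],
               [1, 3, 0,  2,  5]])

E, pivots = A.rref()
print(pivots)                                        # (0, 1, 3): columns 1, 2, 4 of A

row_basis = [E.row(i) for i in range(len(pivots))]   # nonzero rows of E (Algorithm 1)
col_basis = [A.col(j) for j in pivots]               # matching columns of A (Algorithm 2)
print(A.rank() + len(A.nullspace()))                 # 3 + 2 = 5 = n (rank-nullity)
```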
We want to mention one more result that is similar to the dimension results that we
have obtained in this section. It concerns the dimension of the nullspace of a matrix.
Definition 1. For any matrix A, the dimension of its nullspace N (A) is called the
nullity of A and is denoted by null(A).
Notice that null(A) is just the number of free variables in E = rref(A), and the number
of free variables is the total number of variables minus the number of leading 1’s in E,
which is just rank(A). In other words, we obtain the following:
The Rank-Nullity Identity
For an (m × n)-matrix A we have:
rank(A) + null(A) = n.
Finding a Basis for a Linear Span
Given a collection of (possibly) linearly dependent vectors v1 , . . . , vk in Rn that
span a subspace W , we can use either Algorithm 1 or Algorithm 2 to find a basis for
W : we can let v1 , . . . , vk denote the rows of a (k × n)-matrix and apply Algorithm 1,
or we can let v1 , . . . , vk denote the columns of an (n × k)-matrix and use Algorithm 2.
However, there is a difference: only Algorithm 2 will select a basis from the collection
v1 , . . . , vk .
Example 3. Let S be the linear span of the vectors v1 = (1, 3, 0, 1), v2 = (0, 1, 1, −1),
v3 = (−1, −1, 2, −3), v4 = (3, 7, −1, 4) in R4 . Select a basis for S from these vectors.
Solution. It is not immediately clear whether S is a two or three dimensional subspace,
or possibly all of R^4. However, if we use the vectors as column vectors in the 4×4-matrix
A = \begin{pmatrix} 1 & 0 & -1 & 3 \\ 3 & 1 & -1 & 7 \\ 0 & 1 & 2 & -1 \\ 1 & -1 & -3 & 4 \end{pmatrix},
then S is just the column space of A, and Algorithm 2 will achieve exactly what we
want. In fact, if we put A in reduced row-echelon form, we obtain
E = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
The pivot columns of E are the first, second, and fourth columns, so we conclude that
S is spanned by v1 , v2 , and v4 .
2
Exercises
1. For the given matrix A, find a basis for Row(A) and for Col(A).
(a) \begin{pmatrix} 3 & -9 \\ -1 & 3 \end{pmatrix},  (b) \begin{pmatrix} 3 & 2 & -1 \\ 1 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix},  (c) \begin{pmatrix} 3 & 1 & 0 & 2 \\ 2 & 1 & 1 & 1 \\ 1 & 0 & -1 & 1 \end{pmatrix}
2. Let W denote the linear span of the given set of vectors. Select from the vectors
a basis for W .
(a) v1 = (0, 3, 2), v2 = (1, 2, 1), v3 = (−1, 1, 1)
(b) v1 = (2, 3, 0, 1), v2 = (1, 1, 1, 0), v3 = (3, 5, −1, 2), v4 = (−1, 0, 1, 0)
(c) v1 = (1, −1, 1, −1), v2 = (2, −1, 3, 1), v3 = (3, −4, 2, −6), v4 = (4, −2, 6, 2)
5.7 Inner Products and Orthogonality
In Section 4.1, we defined the dot product of vectors v and w in R^n: v · w = v1 w1 +
· · · + vn wn . In particular, we have v · v = v1^2 + · · · + vn^2, so the magnitude or length of v can
be expressed using the dot product as
‖v‖ = √(v · v).    (5.12)
In this section we will also call ‖v‖ the norm of the vector v. Recall from analytic
geometry in R3 that two vectors v and w are orthogonal if v · w = 0. We can use the
same definition in Rn , namely v and w are orthogonal if
v · w = 0.
(5.13)
The dot product in R^n is an example of an “inner product”: if we write
⟨v, w⟩ = v · w   for vectors v and w in R^n,    (5.14)
then we see that the following properties hold:
• hv, vi ≥ 0, and hv, vi = 0 if and only if v = 0;
• hv, wi = hw, vi for any vectors v and w;
• hλv, wi = λhv, wi for any scalar λ and vectors v and w;
• hu + v, wi = hu, wi + hv, wi for any vectors u, v, and w;
• kvk = √hv, vi is the norm of v.
What about an inner product for vectors in Cn ? We no longer want to use hv, wi = v · w, since the norm/magnitude of a complex number is |z| = √(z z̄) and not √(z²). Instead, let us define
hv, wi = v · w̄ = v1 w̄1 + v2 w̄2 + · · · + vn w̄n   for vectors v and w in Cn .   (5.15)
With this definition, we see that we need to modify one of the properties listed above, since we no longer have hv, wi = hw, vi, but rather hv, wi = \overline{hw, vi}. Of course, for real vectors v and w, we have v · w̄ = v · w, so the definitions (5.14) and (5.15) coincide, and there is no harm in using (5.15) for Rn as well.
Let us now give the general definition of an inner product on a vector space; the
complex conjugate signs may be removed in case V is a real vector space.
Definition 1. An inner product on a vector space V associates to each pair of vectors
v and w a scalar hv, wi satisfying the following properties:
1. hv, vi ≥ 0 for any vector v, and hv, vi = 0 if and only if v = 0;
2. hv, wi = \overline{hw, vi} for any vectors v and w;
3. hλv, wi = λhv, wi for any scalar λ and vectors v and w;
4. hu + v, wi = hu, wi + hv, wi for any vectors u, v, and w.
A vector space with an inner product is called an inner product space. The norm of
a vector v in an inner product space is defined by
kvk = √hv, vi.   (5.16)
As a consequence of Properties 2 and 3 we have (see Exercise 2):
hv, λwi = λ̄ hv, wi   for any scalar λ and vectors v and w.   (5.17)
It is important to realize that subspaces of inner product spaces inherit the inner
product.
Example 1. Let V be a subspace of Rn . Then V is an inner product space under the
dot product. (The analogous statement about subspaces of Cn is also true.)
2
In Rn we know that the norm of a vector v corresponds to its length, so a vector with
norm 1 is a unit vector. Of particular interest in Rn are collections of vectors such as
{e1 , e2 , . . . , en } which form a basis. Notice that the collection {e1 , e2 , . . . , en } has two
additional properties: i) each element is a unit vector, and ii) any two distinct elements
are orthogonal. We want to consider these properties in any inner product space.
Definition 2. Let V be an inner product space.
• A vector v is a unit vector if kvk = 1;
• Two vectors v and w are orthogonal (and we write v ⊥ w) if hv, wi = 0;
• A collection of vectors {v1 , . . . , vn } is an orthogonal set of vectors if vi ⊥ vj
for all i 6= j;
• An orthogonal set of unit vectors {v1 , . . . , vn } is called an orthonormal set.
Example 2. The zero vector 0 is orthogonal to every vector v in V . This is a simple
consequence of the linearity of the inner product:
h0, vi = hv − v, vi = hv, vi − hv, vi = 0.
2
Example 3. In R3 , the vectors v1 = (1, 0, −1) and v2 = (1, 2, 1) are orthogonal since
hv1 , v2 i = (1, 0, −1) · (1, 2, 1) = 1 − 1 = 0.
Consequently, {v1 , v2 } is an orthogonal set. But v1 and v2 are not unit vectors since kv1 k = √2 and kv2 k = √6. However, if we let
u1 = (1/√2)(1, 0, −1)   and   u2 = (1/√6)(1, 2, 1),
then {u1 , u2 } is an orthonormal set. 2
If {v1 , . . . , vn } is an orthogonal set that forms a basis for V , then we say that
{v1 , . . . , vn } is an orthogonal basis; similarly, an orthonormal basis is an orthonormal set that is also a basis.
Example 4. We know that i = (1, 0) and j = (0, 1) form a basis for R2 . Since i and j
are orthogonal unit vectors, they form an orthonormal basis for R2 . But there are many
other orthonormal bases for R2 . For example, v1 = (1, 1) and v2 = (1, −1) are linearly
independent (since the determinant of the matrix [v1 v2 ] is not zero), so {v1 , v2 } is a
basis for R2 . They are also orthogonal since v1 · v2 = 0, so {v1 , v2 } is an orthogonal
basis. Note v1 and v2 are not unit vectors, but we can normalize them by defining
u1 = (1/√2)(1, 1)   and   u2 = (1/√2)(1, −1).
Now we see that {u1 , u2 } is an orthonormal basis for R2 .
2
We are beginning to suspect that orthogonality is somehow related to linear independence. The following result confirms our suspicions.
Theorem 1. If {v1 , . . . , vn } is an orthogonal set of nonzero vectors in an inner product
space V , then the vectors v1 , . . . , vn are linearly independent.
Proof. Suppose {v1 , . . . , vn } is an orthogonal set of nonzero vectors, and we have
scalars c1 , . . . , cn so that
c1 v1 + · · · + cn vn = 0.
Now let us take the inner product of both sides with v1 :
hv1 , c1 v1 + · · · + cn vn i = hv1 , 0i.
Using linearity and the fact that hv1 , vj i = 0 for all j 6= 1, the left-hand side is just
c1 hv1 , v1 i = c1 kv1 k2 . Since the right hand side is just 0, we have
c1 kv1 k2 = 0.
But since v1 is nonzero, we know kv1 k 6= 0, so we must have c1 = 0. We can repeat
this for j = 1, . . . , n and conclude c1 = c2 = · · · = cn = 0. This means that v1 , . . . , vn
are linearly independent.
2
Corollary 1. In an n-dimensional inner product space V , an orthogonal set of n nonzero vectors forms a basis for V .
Orthogonal Projections and the Gram-Schmidt Procedure
Does a finite-dimensional inner product space always have an orthonormal basis?
The answer is not only “yes”, but there is a procedure for finding it. Let h, i be an
inner product on a vector space V and suppose {w1 , . . . , wn } is a basis for V . We shall
show how to use {w1 , . . . , wn } to define an orthonormal basis {u1 , . . . , un } for V . The
procedure involves “orthogonal projection,” which we now discuss.
Let u be a unit vector in V . For any vector v in V , we define the projection of v
onto u to be the vector
proju (v) = hv, uiu.
(5.18)
Notice that proju (v) is a scalar multiple of u, so it points in the same (or directly
opposite) direction as u. Moreover, the vector
orthu (v) = v − proju (v),
(5.19)
is orthogonal to u since
hv − proju (v), ui = hv, ui − hhv, uiu, ui = hv, ui − hv, uikuk2 = 0.
(Notice that we need u to be a unit vector in the last equality.) So we can write v as
the sum of two vectors that are orthogonal to each other:
v = proju (v) + orthu (v).
(5.20)
Fig.1. Projection of v
onto the unit vector u
Orthogonal projection onto a unit vector can be generalized to orthogonal projection
onto the span U of a finite set {u1 , u2 , . . . , un } of orthonormal vectors. Namely, we
define the projection of v onto U = span({u1 , u2 , . . . , un }) to be the vector
projU (v) = hv, u1 iu1 + hv, u2 iu2 + · · · + hv, un iun ,
(5.21)
which is a linear combination of {u1 , u2 , . . . , un }, and so lies in U . Analogous to (5.19),
we can define
orthU (v) = v − projU (v),
(5.22)
which is orthogonal to each of the basis vectors {u1 , u2 , . . . , un } since
Fig. 1. Projection of v onto the subspace U
hv − projU (v), ui i = hv, ui i − hhv, u1 iu1 + · · · + hv, un iun , ui i
= hv, ui i − hv, u1 ihu1 , ui i − · · · − hv, un ihun , ui i
= hv, ui i − hv, ui i = 0.
Consequently, we can write v as the sum of two vectors
v = projU (v) + orthU (v),   (5.23)
where projU (v) lies in U and orthU (v) is orthogonal to U .
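For readers who want to experiment, the projection formula (5.21) is easy to code. The sketch below assumes NumPy is available and uses made-up vectors in R3 purely for illustration.

```python
import numpy as np

def proj_onto(v, orthonormal_set):
    """Projection of v onto span(u1, ..., uk), where the u_i are assumed
    to be an orthonormal set of real vectors, as in (5.21)."""
    U = np.column_stack(orthonormal_set)
    return U @ (U.T @ v)          # sum of <v, u_i> u_i

# Hypothetical example: U spanned by e1 and e2 in R^3.
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
v  = np.array([3.0, -2.0, 5.0])

p = proj_onto(v, [u1, u2])        # (3, -2, 0), lies in U
orth = v - p                      # (0, 0, 5), orthogonal to u1 and u2
print(p, orth, np.dot(orth, u1), np.dot(orth, u2))
```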
The Gram-Schmidt Procedure.
Suppose V is an n-dimensional inner product space with basis {v1 , . . . , vn }. First let
u1 = v1 /kv1 k   and   V1 = span{u1 }.
Next we let
w2 = v2 − projV1 (v2 ) = v2 − proju1 (v2 )
so that w2 is orthogonal to u1 , and then we let
u2 = w2 /kw2 k   and   V2 = span{u1 , u2 }.
We can continue this process iteratively: having defined {u1 , . . . , uk } and Vk = span(u1 , . . . , uk ), we define wk+1 by
wk+1 = vk+1 − projVk (vk+1 ),
so that wk+1 is orthogonal to {u1 , . . . , uk }, and then we normalize to obtain uk+1 .
We stop when we have exhausted the set {v1 , . . . , vn }.
We have established the following.
Theorem 2. Let {v1 , . . . , vn } be a basis for an inner product space V . Then V has an
orthonormal basis {u1 , . . . , un } generated by the Gram-Schmidt procedure.
Example 5. Find an orthonormal basis for the plane in R3 spanned by the vectors
v1 = (1, −1, 0) and v2 = (0, 1, −1).
Solution. We follow the Gram-Schmidt procedure. Let
u1 = v1 /kv1 k = (1/√2)(1, −1, 0).
Next we calculate
proju1 (v2 ) = hv2 , u1 i u1 = (−1/√2) · (1/√2)(1, −1, 0) = (1/2)(−1, 1, 0),
and let
w2 = v2 − proju1 (v2 ) = (0, 1, −1) − (1/2)(−1, 1, 0) = (1/2)(1, 1, −2).
Finally, we normalize w2 to obtain u2 :
kw2 k = √6/2   ⇒   u2 = (1/√6)(1, 1, −2).
Thus {u1 , u2 } is an orthonormal basis for the plane spanned by {v1 , v2 }.
2
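The boxed procedure translates directly into a few lines of code. The following is a minimal sketch, assuming NumPy is available and that the input vectors are linearly independent; it is checked here on the vectors of Example 5.

```python
import numpy as np

def gram_schmidt(vectors):
    """Gram-Schmidt procedure for linearly independent vectors in R^n,
    a direct transcription of the boxed procedure using the dot product."""
    ortho = []
    for v in vectors:
        w = v - sum(np.dot(v, u) * u for u in ortho)   # v - proj_{V_k}(v)
        ortho.append(w / np.linalg.norm(w))            # normalize
    return ortho

v1 = np.array([1.0, -1.0, 0.0])
v2 = np.array([0.0,  1.0, -1.0])
u1, u2 = gram_schmidt([v1, v2])
print(u1)   # approximately (1, -1, 0)/sqrt(2)
print(u2)   # approximately (1, 1, -2)/sqrt(6)
```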
Exercises
1. Let V = C([0, 1]) be the real-valued continuous functions on [0, 1], and let hf, gi = ∫_0^1 f (x) g(x) dx.
(a) Show that V is a real vector space.
(b) Show that h , i is an inner product on V .
(c) Show that f (x) = sin πx and g(x) = cos πx are orthogonal.
2. Prove (5.17).
3. The following are special cases of the orthogonal projections (5.19) and (5.21).
(a) If v is a scalar multiple of a unit vector u, show that orthu (v) = 0.
(b) If v is in U = span(u1 , . . . , un ), where {u1 , . . . , un } is an orthonormal set,
show that orthU (v) = 0.
4. (a) Find all values of c so that v = (2, c) and w = (3, 1) are orthogonal.
(b) Find all values of c so that v = (−1, 2, c) and w = (3, 4, 5) are orthogonal.
(c) Find all values of c so that v = (1, 2, c) and w = (1, −2, c) are orthogonal.
5. Use the Gram-Schmidt procedure to find an orthonormal basis for the vector space
spanned by the given vectors
(a) v1 = (6, 3, 2), v2 = (2, −6, 3)
(b) v1 = (1, −1, −1), v2 = (2, 1, −1)
(c) v1 = (1, 0, 1, 0), v2 = (0, 1, 1, 0), v3 = (0, 1, 0, 1)
5.8  Additional Exercises
1. Let V be the collection of all infinite series Σ_{k=1}^∞ ak of real numbers that converge absolutely, i.e. Σ_{k=1}^∞ |ak | < ∞. Is V a real vector space?
2. Let V be the collection of all infinite sequences {zk }_{k=1}^∞ of complex numbers that converge to zero, i.e. lim_{k→∞} zk = 0. Is V a complex vector space?
3. Let F[0, ∞) denote the vector space of real-valued functions on [0, ∞) and EO[0, ∞)
denote those functions that are of exponential order, i.e. |f (t)| ≤ M ect for some
positive constants M and c. Is EO a subspace of F?
4. Let F denote the vector space of real-valued functions on (−∞, ∞) and P denote
those functions that are periodic on (−∞, ∞). Is P a subspace of F?
5. Let S1 and S2 be subspaces of a vector space V . Show that the intersection S1 ∩S2
is also a subspace of V .
6. Let S1 and S2 be subspaces of a vector space V and let S1 + S2 denote the vectors
in V of the form v = v1 + v2 , where vi is in Si . Show that S1 + S2 is a subspace
of V .
7. Let V = M 2×2 (R) be the vector space of all real (2×2)-matrices. Are the following
four matrices linearly independent?
A1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  A2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},  A3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},  A4 = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
8. Let V = M 2×2 (C) be the vector space of all complex (2 × 2)-matrices. Are the
following three matrices linearly independent?
A1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  A2 = \begin{bmatrix} 0 & i \\ 0 & 0 \end{bmatrix},  A3 = \begin{bmatrix} 0 & 0 \\ 1+i & 0 \end{bmatrix}.
9. Let Sym3×3 (R) denote all real symmetric (3 × 3)-matrices.
(a) Show that Sym3×3 (R) is a subspace of M 3×3 (R).
(b) Find a basis for Sym3×3 (R).
(c) What is the dimension of Sym3×3 (R)?
10. Let Tri+^{3×3}(R) denote the vector space of all upper triangular real (3×3)-matrices. Find a basis and the dimension for Tri+^{3×3}(R).
11. Let f1 (x) = x and f2 (x) = |x| for −∞ < x < ∞.
(a) Show that f2 is not in C 1 (−∞, ∞).
(b) Show that {f1 , f2 } is linearly independent on (−∞, ∞).
12. Let f1 (x) = x, f2 (x) = |x|, and
f3 (x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } x > 0. \end{cases}
Is {f1 , f2 , f3 } linearly independent on (−∞, ∞)?
13. Let P2 (x) denote the quadratic polynomials in x (i.e. degree ≤ 2), which is a
vector space (cf. Example 3 in Section 5.3). Find a basis for P2 (x).
14. Let P2 (x, y) denote the quadratic polynomials in x and y (i.e. degree ≤ 2). Confirm
that P2 (x, y) is a vector space and find its dimension.
15. Show that C n (I), where n is any nonnegative integer and I is an interval, is an
infinite-dimensional vector space.
16. Show that the complex numbers C can be considered to be a real vector space.
Find its dimension.
17. If S is a subspace of a finite-dimensional vector space V and dim(S) = dim(V ),
then show S = V .
18. Give an example of an infinite-dimensional subspace S of an infinite-dimensional
vector space V , such that S 6= V .
19. Without performing calculations, determine the rank of the matrix and use the rank-nullity identity to determine the nullity.
A = \begin{bmatrix} 1 & -2 & 3 & -4 \\ -2 & 4 & -6 & 8 \\ 3 & -6 & 9 & -12 \end{bmatrix}
20. Without performing calculations, determine the rank of the matrix and use the rank-nullity identity to determine the nullity.
B = \begin{bmatrix} 2 & 1 & -3 & 0 & 4 \\ 0 & 1 & 2 & 3 & 4 \\ -4 & -2 & 6 & 0 & -8 \\ 0 & -2 & -4 & -6 & -8 \end{bmatrix}
21. If v and w are orthogonal vectors in a real inner product space, show that
kv + wk2 = kvk2 + kwk2 .
(*)
22. If v and w are vectors in a real inner product space that satisfy the above equality
(∗), show that v and w are orthogonal.
If S is a subspace of a real inner product space V , let its orthogonal complement S ⊥
be the set of vectors v in V satisfying hv, si = 0 for all s in S. The next four problems
concern this object.
23. Show that S ⊥ is a subspace of V .
24. Show that S ∩ S ⊥ = {0}.
25. If S1 is a subspace of S2 , show that S2⊥ is a subspace of S1⊥ .
26. Show that S is a subspace of (S ⊥ )⊥ .
Chapter 6
Linear Transformations and Eigenvalues
6.1  Introduction to Transformations and Eigenvalues
In this chapter we will consider linear transformations between vector spaces, T : V →
W , and especially linear transformations on a vector space, T : V → V . Let us give a
careful definition.
Definition 3. A linear transformation between vector spaces V and W is a mapping
T : V → W that assigns to each vector v in V a vector w = T v in W satisfying
• T (u + v) = T (u) + T (v) for all vectors u and v in V , and
• T (s v) = s T (v) for all vectors v and scalars s.
If V = W , then T : V → V is a linear transformation on V .
Two additional properties that follow immediately are
T (0) = 0   and   T (−v) = −T (v)   for all v in V .   (6.1)
There are linear transformations between infinite-dimensional vector spaces (see Exercises 1 & 2 in Section 6.4), but the most common linear transformations are given by
matrices. In fact, any (m × n)-matrix A defines a linear transformation A : Rn → Rm ,
and any (n × n)-matrix A defines a linear transformation on Rn .
Fig.1. Linear
transformation between
vector spaces.
Example 1. The (2 × 2)-matrix
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
defines a linear transformation on R2 ; let us investigate its behavior. Recalling the basis vectors i = (1, 0) and j = (0, 1) for R2 , we see that
Ai = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} = j   and   Aj = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \end{bmatrix} = -i.
Fig. 2. Rotation of the plane by 90◦
In both cases, we see that A has the effect of rotating the vector 90◦ counterclockwise.
Since i and j form a basis for R2 , we see that A rotates every vector in R2 in this way;
in particular, for every nonzero vector v, Av is perpendicular to v.
2
Example 2. The (2 × 2)-matrix
A = \begin{bmatrix} 2 & 0 \\ 0 & 1/2 \end{bmatrix}
also defines a linear transformation on R2 ; let us explore its properties. Notice that Ai = 2i and Aj = (1/2)j. Thus A stretches vectors in the i-direction by a factor of 2, and it compresses vectors in the j-direction by a factor of 1/2. Clearly the two vectors i and j play a special role for the matrix A. 2
Fig. 3. Stretching in the x-direction, compressing in the y-direction.
Eigenvalues and Eigenvectors
In the case of a linear transformation on a vector space T : V → V , of particular interest
is any vector v that is mapped to a scalar multiple of itself, i.e.
T v = λv,
for some scalar λ.
(6.2)
Of course, (6.2) always holds when v = 0 (by (6.1)), but we are interested in nonzero
solutions of (6.2).
Definition 4. An eigenvalue for a linear transformation T on a vector space V is a
scalar λ so that (6.2) holds for some nonzero vector v ∈ V . The vector v is called an
eigenvector associated with the eigenvalue λ, and (λ, v) is called an eigenpair.
Fig.4. The action of T on
an eigenvector.
Notice that an eigenvector is not unique. In fact, when V = Rn and T = A is an
(n × n)-matrix, we may write (6.2) as
(A − λI) v = 0.
(6.3)
We see that the eigenvectors for the eigenvalue λ (together with the zero vector 0)
form the nullspace of the matrix A − λI. In particular, the solution space of (6.3) is a
subspace of Rn ; consequently, if v is an eigenvector for λ then so is s v for any nonzero
scalar s, and if v and w are both eigenvectors for λ then so is the sum v + w. We call
this subspace the eigenspace for λ; the dimension of the eigenspace could be 1 or it
could be greater than 1.
In order to find eigenvalues and eigenvectors for A, we use the linear algebra of
matrices that we studied in Chapter 4. In particular, the existence of a nontrivial
solution of (6.3) means that the matrix A − λI is singular, i.e. that
det(A − λI) = 0.
(6.4)
But p(λ) = det(A−λI) is just an nth-order polynomial called the characteristic polynomial and (6.4) is called the characteristic equation. So finding the eigenvalues of
A reduces to finding the roots of its characteristic polynomial, and for each eigenvalue
we can then solve (6.3) to find its eigenvectors. We summarize this as follows:
Finding the eigenvalues & eigenvectors of a matrix A
1. Find all roots λ of the characteristic equation (6.4); these are the eigenvalues.
2. For each eigenvalue λ, solve (6.3) to find all eigenvectors v associated with λ.
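As a computational aside, numerical software carries out both steps at once. The sketch below assumes NumPy is available and uses the matrix of Example 3 below; it is meant only as a check on hand calculations.

```python
import numpy as np

A = np.array([[ 5.0,  7.0],
              [-2.0, -4.0]])

# numpy.linalg.eig returns the eigenvalues (roots of the characteristic
# polynomial) and a matrix whose columns are corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                       # the eigenvalues 3 and -2 (in some order)

# Each column v satisfies (A - lambda I) v = 0, i.e. A v = lambda v:
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True
```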
Example 3. Find the eigenvalues and a basis for each eigenspace for the (2 × 2)-matrix
A = \begin{bmatrix} 5 & 7 \\ -2 & -4 \end{bmatrix}.
Solution. We first calculate the characteristic polynomial
det(A − λI) = \begin{vmatrix} 5-\lambda & 7 \\ -2 & -4-\lambda \end{vmatrix} = (5 − λ)(−4 − λ) − (−2)(7) = λ² − λ − 6.
Next we find the roots of the characteristic polynomial by factoring it:
λ² − λ − 6 = (λ − 3)(λ + 2) = 0   ⇒   λ1 = 3, λ2 = −2.
Let us first find an eigenvector associated with the eigenvalue λ1 = 3. We must solve
(A − λ1 I) v1 = (A − 3I) v1 = 0. Writing v1 = (x, y), we must solve
\begin{bmatrix} 2 & 7 \\ -2 & -7 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
Since the system is homogeneous, we recall from Chapter 4 that we want to use elementary row operations to put the coefficient matrix into row-echelon form. Adding the top
row to the bottom row, and then dividing the top row by 2, we obtain
\begin{bmatrix} 2 & 7 \\ -2 & -7 \end{bmatrix} \sim \begin{bmatrix} 2 & 7 \\ 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 7/2 \\ 0 & 0 \end{bmatrix}.
We see that y is a free variable, so we can let y = s and solve for x to find x = −7s/2.
To get one eigenvector, it is convenient to let s = 2:
v1 = (−7, 2).
(Of course, we could as easily have taken s = −2 to obtain v1 = (7, −2).)
Now we turn to the eigenvalue λ2 = −2. We must solve (A + 2I) v2 = 0. Writing
v2 = (x, y), we must solve
\begin{bmatrix} 7 & 7 \\ -2 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
We can put the coefficient matrix into row echelon form
\begin{bmatrix} 7 & 7 \\ -2 & -2 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},
and conclude that y = s is free and x = −s. For example, we can take s = 1 to obtain
v2 = (−1, 1).
(Of course, using s = −1 to obtain v2 = (1, −1) works just as well.)
2
In Example 3 we found each eigenvalue had a one-dimensional eigenspace spanned
by a single eigenvector. However, eigenspaces could be multi-dimensional, as the next
example illustrates.
Example 4. Find the eigenvalues and a basis for each eigenspace for the (3 × 3)-matrix


A = \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix}.
Solution. We calculate the characteristic polynomial
det(A − λI) = \begin{vmatrix} 4-\lambda & -3 & 1 \\ 2 & -1-\lambda & 1 \\ 0 & 0 & 2-\lambda \end{vmatrix} = (4 − λ) \begin{vmatrix} -1-\lambda & 1 \\ 0 & 2-\lambda \end{vmatrix} − 2 \begin{vmatrix} -3 & 1 \\ 0 & 2-\lambda \end{vmatrix}
= (λ² − 3λ + 2)(2 − λ) = (λ − 1)(λ − 2)(2 − λ).
We find that there are just two eigenvalues, λ1 = 1 and λ2 = 2, the latter having
multiplicity 2.
To find the eigenvectors associated with λ1 = 1, we put A − I into row-echelon form:
 
 
 


A − I = \begin{bmatrix} 3 & -3 & 1 \\ 2 & -2 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 2 & -2 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
We see that x2 = t is a free variable, x3 = 0, and x1 = t. We can choose t = 1 to get
v1 = (1, 1, 0).
For the eigenvectors associated with λ2 = 2, we put A − 2I into row-echelon form:

 

A − 2I = \begin{bmatrix} 2 & -3 & 1 \\ 2 & -3 & 1 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -3/2 & 1/2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
We see that x2 = s and x3 = t are free variables, and x1 = (3/2)s − (1/2)t. Taking
s = 2 and t = 0 we get one eigenvector associated with λ2 = 2
v2 = (3, 2, 0),
and if we take s = 0 and t = 2, we get another eigenvector associated with λ2 = 2
v3 = (−1, 0, 2).
2
Example 4 shows that an eigenvalue with multiplicity 2 can have two linearly independent eigenvectors, so the eigenspace has dimension 2. Note that there are two
notions of multiplicity for an eigenvalue λ: the algebraic multiplicity ma is the multiplicity as a root of the characteristic polynomial and the geometric multiplicity
mg is the dimension of the associated eigenspace. The following example shows that, if
ma > 1, these two notions of multiplicity need not coincide.
Example 5. Find the eigenvalues and a basis for each eigenspace for the (3 × 3)-matrix

B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{bmatrix}.
Solution. The upper triangular form of this matrix makes it particularly easy to
calculate its eigenvalues:

det(B − λI) = \begin{vmatrix} 1-\lambda & 2 & 3 \\ 0 & 1-\lambda & 2 \\ 0 & 0 & 2-\lambda \end{vmatrix} = (1 − λ)²(2 − λ).
We see that the characteristic polynomial has two roots: λ1 = 1 has algebraic multiplicity 2, and λ2 = 2 has algebraic multiplicity 1. To find the geometric multiplicities,
let us find the eigenvectors for both eigenvalues.
For λ1 = 1, we put the matrix B − I into row echelon form:

B − I = \begin{bmatrix} 0 & 2 & 3 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 0 & 1 & 3/2 \\ 0 & 0 & 2 \\ 0 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
We see the first variable is free and the other two must be zero, i.e. v1 = s (1, 0, 0) for
any value s. We see that the eigenspace has dimension 1 and, if we take s = 1, we get
a particular eigenvector
v1 = (1, 0, 0).
Notice that λ1 = 1 has ma = 2 but mg = 1.
For λ2 = 2 we proceed similarly:

B − 2I = \begin{bmatrix} -1 & 2 & 3 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & -3 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -7 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix},
so the third variable is free and we obtain the general solution v2 = t (7, 2, 1). We see
that this eigenspace has dimension 1 (so ma = 1 = mg ), and we can choose t = 1 to
obtain a particular eigenvector
v2 = (7, 2, 1).
2
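The two multiplicities can also be read off symbolically. The sketch below assumes SymPy is available and uses the matrix B of Example 5; it is only a check, not part of the text's method.

```python
import sympy as sp

# Algebraic vs. geometric multiplicity for the matrix B of Example 5.
B = sp.Matrix([[1, 2, 3],
               [0, 1, 2],
               [0, 0, 2]])

# eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
for lam, ma, basis in B.eigenvects():
    print(lam, "m_a =", ma, "m_g =", len(basis))
# Expected: eigenvalue 1 has m_a = 2 but m_g = 1; eigenvalue 2 has m_a = m_g = 1.
```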
Complex Eigenvalues
Even when A is a real (n × n)-matrix, some of the roots of the characteristic polynomial
p(λ) could be complex numbers. But if λ is complex, then the solutions of (6.3) are
complex vectors, so do not qualify as eigenvectors for A : Rn → Rn ; consequently, λ
is not an eigenvalue. In particular, we see that there are linear transformations on Rn
without any eigenvalues; this is illustrated by Example 1 above.
However, if we now let V = Cn and consider A : Cn → Cn , then a complex root
of p(λ) is an eigenvalue since we can now find (complex) solutions of (6.3) which are
perfectly valid as eigenvectors. Of course, when V = Cn , we could allow the matrix
A to have complex elements, but let us continue to assume that A has real elements
and see what benefits we obtain. One benefit is that the coefficients of p(λ) being real
means that the complex roots of p(λ) = 0 occur in complex conjugate pairs:
p(λ) = 0   ⇔   p(λ̄) = \overline{p(λ)} = 0.
A second benefit is that if v is an eigenvector for λ, then v̄ is an eigenvector for λ̄ since
Av = λv   ⇔   Av̄ = \overline{Av} = λ̄v̄.
We restate this as:
If A is a square matrix with real elements, then (λ, v) is an eigenpair if and only if (λ̄, v̄) is an eigenpair.
This is a useful observation: it means that, having solved the linear system (6.3) to find an eigenvector for a complex eigenvalue λ, we do not need to solve a linear system to find an eigenvector for λ̄! Let us illustrate this with an example.
Example 6. Find the eigenvalues and a basis for each eigenspace for the (2 × 2)-matrix
A = \begin{bmatrix} -3 & -1 \\ 5 & 1 \end{bmatrix}.
Solution. We first find the roots of the characteristic equation:
det(A − λI) = \begin{vmatrix} -3-\lambda & -1 \\ 5 & 1-\lambda \end{vmatrix} = λ² + 2λ + 2 = 0   ⇒   λ = −1 ± i.
As expected, the eigenvalues λ1 = −1 + i and λ2 = −1 − i are conjugate pairs. Let us
find an eigenvector for λ1 :
A − λ1 I = \begin{bmatrix} -2-i & -1 \\ 5 & 2-i \end{bmatrix} \sim \begin{bmatrix} 5 & 2-i \\ 5 & 2-i \end{bmatrix} \sim \begin{bmatrix} 5 & 2-i \\ 0 & 0 \end{bmatrix}.
From this we can conclude that an eigenvector for λ1 is v1 = (−2 + i, 5). To find an eigenvector for the eigenvalue λ2 = −1 − i, instead of solving a linear system, we simply take the complex conjugate of v1 : v2 = v̄1 = (−2 − i, 5). Thus we have two eigenpairs:
λ1 = −1 + i, v1 = (−2 + i, 5)
λ2 = −1 − i, v2 = (−2 − i, 5).
2
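Numerically, the conjugate-pair structure is visible immediately. The sketch below (assuming NumPy) uses the matrix of Example 6 and checks that the conjugate of an eigenpair is again an eigenpair.

```python
import numpy as np

# Complex eigenvalues of the real matrix in Example 6 come in conjugate pairs.
A = np.array([[-3.0, -1.0],
              [ 5.0,  1.0]])

lams, vecs = np.linalg.eig(A)
print(lams)                       # approximately [-1.+1.j, -1.-1.j]

# If (lam, v) is an eigenpair, so is the conjugate pair (conj(lam), conj(v)):
lam, v = lams[0], vecs[:, 0]
print(np.allclose(A @ np.conj(v), np.conj(lam) * np.conj(v)))   # True
```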
Exercises
1. Find all eigenvalues and a basis for each eigenspace for the following matrices. If
an eigenvalue has algebraic multiplicity ma > 1, find its geometric multiplicity mg .
(a) \begin{bmatrix} 4 & -3 \\ 2 & -1 \end{bmatrix} (Solution);   (b) \begin{bmatrix} 10 & -8 \\ 6 & -4 \end{bmatrix};   (c) \begin{bmatrix} 1 & 6 \\ 2 & -3 \end{bmatrix};
(d) \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};   (e) \begin{bmatrix} 3 & 6 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix};   (f) \begin{bmatrix} 7 & -8 & 6 \\ 8 & -9 & 6 \\ 0 & 0 & -1 \end{bmatrix}.
2. The following matrices have (some) complex eigenvalues. Find all eigenvalues and
associated eigenvectors.




(a) \begin{bmatrix} -2 & 1 \\ -1 & -2 \end{bmatrix} (Sol'n);   (b) \begin{bmatrix} -2 & -6 \\ 3 & 4 \end{bmatrix};   (c) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -2 \\ 0 & 2 & 0 \end{bmatrix};   (d) \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.
3. Suppose λ is an eigenvalue with associated eigenvector v for a square matrix A.
Show that λn is an eigenvalue with associated eigenvector v for the matrix An .
4. Consider the linear transformation T : R2 → R2 that reflects a vector in the x-axis: see Figure 5. Geometrically determine all eigenvectors and their eigenvalues.
6.2  Diagonalization and Similarity
In this section we investigate when an (n × n)-matrix A has n eigenvectors v1 , . . . , vn
that form a basis, and how we can use such a basis to transform A into a particularly
simple form.
Definition 1. An (n × n)-matrix A is called diagonalizable if it has n linearly
independent eigenvectors v1 , . . . , vn . In this case, we call v1 , . . . , vn an eigenbasis.
We shall explain shortly why we use the term “diagonalizable,” but first let us explore
when we have linearly independent eigenvectors.
Suppose (λ1 , v1 ) and (λ2 , v2 ) are two eigenpairs for A with λ1 6= λ2 ; we claim that
v1 and v2 are linearly independent. To check this, we want to show that c1 v1 +c2 v2 = 0
implies c1 = 0 = c2 . But if we apply (A − λ1 I) to c1 v1 + c2 v2 and note (A − λ1 I)v1 = 0
while (A − λ1 I)v2 = (λ2 − λ1 )v2 , then we conclude
c1 v1 + c2 v2 = 0
⇒
c2 (λ2 − λ1 )v2 = 0.
Since λ2 − λ1 6= 0, we must have c2 = 0, and hence c1 = 0. This reasoning can be
extended to prove the following useful result.
Theorem 1. Suppose v1 , . . . , vk are eigenvectors for A, corresponding to distinct
eigenvalues λ1 , . . . , λk . Then v1 , . . . , vk are linearly independent.
Fig. 5. Reflection in the x-axis
Proof. The proof is by mathematical induction. By the reasoning above, we know the
result is true for k = 2. Now we assume it is true for k − 1 and we prove it for k. Let
c1 v1 + c2 v2 + · · · + ck vk = 0.
(6.5)
We want to show c1 = c2 = · · · = ck = 0. If we apply (A − λ1 I) to this equation and
use (A − λ1 I)v1 = 0 as well as (A − λ1 I)vj = (λj − λ1 )vj for j 6= 1, we obtain
c2 (λ2 − λ1 )v2 + · · · + ck (λk − λ1 )vk = 0.
By the induction hypothesis, we know that v2 , . . . , vk are linearly independent, so
c2 (λ2 − λ1 ) = · · · = ck (λk − λ1 ) = 0.
But (λj − λ1 ) 6= 0 for j 6= 1, so c2 = · · · = ck = 0. But if we plug these into (6.5), we
conclude c1 = 0 too. We conclude that v1 , . . . , vk are linearly independent.
2
Of course, if A has n distinct eigenvalues, then there are n linearly independent eigenvectors:
Corollary 1. If an (n × n)-matrix A has n distinct eigenvalues, then A is diagonalizable.
However, it is not necessary for a matrix to have distinct eigenvalues in order for it to
be diagonalizable. The (3 × 3)-matrix A in Example 4 in the previous section has only
two eigenvalues, but it has three linearly independent eigenvectors, and so A is diagonalizable. On the other hand, the (3×3)-matrix B in Example 5 in the previous section has only two linearly independent eigenvectors, and so B is not diagonalizable. Clearly, the
issue is whether the algebraic multiplicity ma (λ) and the geometric multiplicity mg (λ)
coincide for each eigenvalue λ. We restate this observation as a theorem:
Theorem 2. A square matrix A is diagonalizable if and only if ma (λ) = mg (λ) for
each eigenvalue λ.
Remark 1. All of the above results apply to complex eigenvalues and eigenvectors as
well as real ones. However, as observed at the end of the previous section, we then need
to view A as a transformation on Cn ; see Exercise 2.
Now let us explain the significance of the term “diagonalizable”. If A has n linearly
independent eigenvectors v1 , . . . , vn , then let us introduce the (n×n)-matrix E obtained
by using the vj as column vectors:
E = v1 v2 · · · vn .
Since the column vectors are linearly independent, the column rank of E is n. But, as
we saw in Section 4.6, this means that the row rank of E is also n, which implies that E
is invertible: E−1 exists. Now let us denote by (λ1 , . . . , λn ) the eigenvalues associated
with these eigenvectors (although the λj need not be distinct). Since Avj = λj vj for
each j = 1, . . . , n, we see that the product of A and E yields
AE = Av1 Av2 · · · Avn = λ1 v1 λ2 v2 · · · λn vn .
Now let D denote the diagonal matrix with the λj on its main diagonal:


D = Diag(λ1 , . . . , λn ) = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.
If we take the product of E and D we get
ED = [λ1 v1  λ2 v2  · · ·  λn vn ].
We conclude that AE = ED. But if we multiply this equation on the left by E−1 , we
obtain
E−1 AE = D = Diag(λ1 , . . . , λn ).
(6.6)
Thus, by multiplying A on the right by its eigenvector matrix E and on the left by E−1 ,
we have turned A into the diagonal matrix with its eigenvalues on the main diagonal.
The relationship (6.6) is important enough that it is given a name:
Definition 2. Two matrices A and B are similar if there is an invertible matrix S
such that S−1 AS = B.
Moreover, the above process is reversible, so we have proved the following result that
expresses the true meaning of saying that A is diagonalizable:
Theorem 3. A square matrix has a basis of eigenvectors if and only if it is similar
to a diagonal matrix.
Let us illustrate this theorem with an example.
Example 1. The (3 × 3)-matrix

A = \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix}
in Example 4 of Section 6.1 was found to have an eigenvalue λ1 = 1 with eigenvector
v1 = (1, 1, 0), and an eigenvalue λ2 = 2 with two eigenvectors, v2 = (3, 2, 0) and
v3 = (−1, 0, 2); for notational convenience we write λ3 = 2. Let E be the matrix with
these eigenvectors as column vectors:


E = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
Using our methods of Section 4.4, we compute the inverse of E to find


E−1 = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix}.
Now we simply compute

E−1 AE = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} \begin{bmatrix} 4 & -3 & 1 \\ 2 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
The result is indeed a diagonal matrix with the eigenvalues on the main diagonal.
2
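The same verification can be done numerically; the following sketch assumes NumPy is available and simply repeats the computation of Example 1.

```python
import numpy as np

# Verifying E^{-1} A E = D for Example 1.
A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])

D = np.linalg.inv(E) @ A @ E
print(np.round(D, 10))            # Diag(1, 2, 2)
```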
In (6.6), A and D clearly have the same eigenvalues; but is there a relationship
between the eigenvalues of similar matrices in general? Yes: if (λ, v) is an eigenpair for
B, i.e. Bv = λv, and B = S−1 AS, then S−1 ASv = λv and we can multiply on the left
by S to obtain Aw = λw where w = Sv. In particular, we have shown the following:
Theorem 4. Similar matrices have the same eigenvalues and geometric multiplicities.
Application to Computing Powers of a Matrix
In certain applications, we may have a square matrix A and need to compute its successive powers A2 , A3 , etc. Let us mention such an application:
Transition Matrix. Suppose that a sequence of vectors x0 , x1 , x2 , . . . is defined iteratively by
xk+1 = A xk ,
where A is a square matrix. Then A is called the transition matrix, and we can use a
power of A to express each xk in terms of x0 : x1 = Ax0 , x2 = Ax1 = AAx0 = A2 x0 ,
x3 = Ax2 = AA2 x0 = A3 x0 , etc. In general, we have
x k = Ak x 0 .
Thus, to generate the sequence xk , we need to compute the matrix powers Ak .
Fig.3. Transition matrix
generates a sequence
Now Ak can be computed simply by the usual matrix multiplication, but the calculation becomes increasingly lengthy as k increases. However, if A is diagonalizable,
then we can use (6.6) to greatly simplify the calculation. In fact, observe
A = E D E−1 ,
A2 = E D E−1 E D E−1 = E D2 E−1 ,
A3 = E D2 E−1 E D E−1 = E D3 E−1 ,
⋮
Ak = E Dk E−1 .
Moreover, if D = Diag(λ1 , λ2 , . . . , λn ), then D2 = Diag(λ1^2 , λ2^2 , . . . , λn^2 ), and in general Dk = Diag(λ1^k , λ2^k , . . . , λn^k ).
We conclude that
Ak = E Diag(λ1^k , λ2^k , . . . , λn^k ) E−1 .   (6.7)
Example 2. Find A5 where A is the (3 × 3)-matrix in Example 1.
Solution. In Example 1 we found E as well as E−1 . Consequently, we can apply (6.7):
A5 = E Diag(λ1^5 , λ2^5 , λ3^5 ) E−1
= \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1^5 & 0 & 0 \\ 0 & 2^5 & 0 \\ 0 & 0 & 2^5 \end{bmatrix} \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} = \begin{bmatrix} 94 & -93 & 31 \\ 62 & -61 & 31 \\ 0 & 0 & 32 \end{bmatrix}.
This method has certainly simplified the calculation of A5 !
2
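Formula (6.7) is straightforward to check in software. The sketch below assumes NumPy and compares the eigendecomposition route with direct repeated multiplication, for the matrix of Example 2.

```python
import numpy as np

A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])
lams = np.array([1.0, 2.0, 2.0])

# A^5 via (6.7): E Diag(lambda_i^5) E^{-1}
A5 = E @ np.diag(lams**5) @ np.linalg.inv(E)
print(np.round(A5))                                   # [[94 -93 31], [62 -61 31], [0 0 32]]
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))  # True
```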
Application to the Matrix Exponential
We are familiar with the exponential function ex where x is a real or complex number.
It turns out that we can apply the exponential function to an (n × n)-matrix A to
obtain an (n × n)-matrix eA called the matrix exponential. This is defined formally by simply replacing x with A in the familiar formula e^x = 1 + x + (1/2)x² + · · · :
eA = I + A + A²/2! + A³/3! + · · ·   (6.8)
It can be shown that the series (6.8) always converges to an (n × n)-matrix; the problem
that we address here is how to compute eA when A is diagonalizable.
The simplest case is when A itself is a diagonal matrix: A = Diag(λ1 , . . . , λn ). In
this case, we know that Ak = Diag(λ1^k , . . . , λn^k ) for k = 1, 2, . . . , and so
eA = I + Diag(λ1 , . . . , λn ) + (1/2!) Diag(λ1^2 , . . . , λn^2 ) + · · ·
= Diag(1 + λ1 + (1/2!)λ1^2 + · · · , . . . , 1 + λn + (1/2!)λn^2 + · · · )
= Diag(e^{λ1} , . . . , e^{λn} ).
Example 3.
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}   ⇒   eA = \begin{bmatrix} e & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e^3 \end{bmatrix}. 2
It is only slightly more work to calculate eA when A is diagonalizable:
A = E D E−1   ⇒   Ak = E Dk E−1 for k = 1, 2, . . .
⇒   eA = I + E D E−1 + (E D2 E−1 )/2! + · · · = E (I + D + D2 /2! + · · · ) E−1 = E eD E−1 .
We summarize this as
eA = E Diag(e^{λ1} , . . . , e^{λn} ) E−1 .   (6.9)
Example 4. Let us compute eA for the matrix A in Example 1. There we had
E = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix},   D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix},   E−1 = \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix}.
Consequently,
eA = \begin{bmatrix} 1 & 3 & -1 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} e & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e^2 \end{bmatrix} \begin{bmatrix} -2 & 3 & -1 \\ 1 & -1 & 1/2 \\ 0 & 0 & 1/2 \end{bmatrix} = \begin{bmatrix} -2e + 3e^2 & 3e - 3e^2 & -e + e^2 \\ -2e + 2e^2 & 3e - 2e^2 & -e + e^2 \\ 0 & 0 & e^2 \end{bmatrix}. 2
We shall find the matrix exponential useful when studying systems of first-order equations in Chapter 7.
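As a computational aside, formula (6.9) can be compared against a general-purpose matrix exponential routine. The sketch below assumes NumPy and SciPy are available and uses the matrix of Example 4.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, -3.0, 1.0],
              [2.0, -1.0, 1.0],
              [0.0,  0.0, 2.0]])
E = np.array([[1.0, 3.0, -1.0],
              [1.0, 2.0,  0.0],
              [0.0, 0.0,  2.0]])
lams = np.array([1.0, 2.0, 2.0])

# e^A via (6.9): E Diag(e^{lambda_i}) E^{-1}
expA = E @ np.diag(np.exp(lams)) @ np.linalg.inv(E)
print(np.allclose(expA, expm(A)))    # True: agrees with SciPy's expm
```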
Exercises
1. Determine whether the given matrix is diagonalizable; if so, find a matrix E and a diagonal matrix D so that E−1 AE = D.
(a) \begin{bmatrix} -7 & 4 \\ -4 & 1 \end{bmatrix} (Solution);   (b) \begin{bmatrix} 4 & -2 \\ 1 & 1 \end{bmatrix};   (c) \begin{bmatrix} 6 & -6 \\ 4 & -4 \end{bmatrix};
(d) \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix};   (e) \begin{bmatrix} 1 & 3 & 0 \\ -1 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix};   (f) \begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
2. The following matrices have (some) complex eigenvalues. Determine whether the given matrix is diagonalizable; if so, find a matrix E and a diagonal matrix D so that E−1 AE = D.
(a) \begin{bmatrix} 0 & 1 \\ -2 & 2 \end{bmatrix} (Sol'n);   (b) \begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix};   (c) \begin{bmatrix} 0 & 0 & 5 \\ 1 & 0 & -7 \\ 0 & 1 & 3 \end{bmatrix}.
3. Calculate Ak for the given matrix A and integer k.
(a) \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix}, k = 5;   (b) \begin{bmatrix} 1 & 3 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, k = 6;   (c) \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 0 \\ 0 & -2 & 2 \end{bmatrix}, k = 8.
4. Calculate eA for the following matrices A:
(a) \begin{bmatrix} 3 & -2 \\ 1 & 0 \end{bmatrix};   (b) \begin{bmatrix} 1 & 3 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix};   (c) \begin{bmatrix} 1 & -2 & 1 \\ 0 & 1 & 0 \\ 0 & -2 & 2 \end{bmatrix}.
6.3  Symmetric and Orthogonal Matrices
Recall that a square matrix A is symmetric if AT = A. In this section we shall see
that symmetric matrices are always diagonalizable. In fact, the matrix S for which
S−1 AS = Diag(λ1 , . . . , λn ) can be chosen to have some special additional properties.
To discuss the results, let h , i denote the natural inner product on Rn or Cn :
hv, wi = v · w̄. Recall that v and w are orthogonal if hv, wi = 0. Now, if A is a real symmetric (n × n)-matrix, then for any (real or complex) vectors v and w we have
hAv, wi = Av · w̄          (by definition of h , i)
 = v · AT w̄                (using (4.10) in Section 4.1)
 = v · A w̄                 (since A is symmetric)
 = v · \overline{Aw}        (since A is real)
 = hv, Awi                 (by definition of h , i).
We summarize this calculation as a lemma:
Lemma 1. If A is a real symmetric (n × n)-matrix, then
hAv, wi = hv, Awi
for any vectors v and w in Rn or Cn .
Now let us state the main result for symmetric matrices.
Theorem 1. If A is a real symmetric (n × n)-matrix, then it is diagonalizable.
Moreover:
(a) All eigenvalues of A are real;
(b) Eigenvectors corresponding to distinct eigenvalues are orthogonal;
(c) A has a set of n orthonormal eigenvectors.
Proof. We here prove (a) and (b); part (c) will be proved at the end of this section.
(a) If λ is an eigenvalue with eigenvector v, then Av = λv implies
hAv, vi = hλv, vi = λhv, vi = λkvk2 .
Using this and the inner product property hu, wi = \overline{hw, ui}, we obtain
hv, Avi = \overline{hAv, vi} = λ̄kvk2 .
But by Lemma 1 we have hAv, vi = hv, Avi, so
0 = hAv, vi − hv, Avi = (λ − λ̄)kvk2 .
Since kvk ≠ 0, we conclude λ − λ̄ = 0, i.e. λ is real.
(b) Suppose we have distinct eigenvalues λ1 and λ2 with associated eigenvectors v1 and
v2 . Then
hAv1 , v2 i = hλ1 v1 , v2 i = λ1 hv1 , v2 i
and
hv1 , Av2 i = hv1 , λ2 v2 i = λ̄2 hv1 , v2 i = λ2 hv1 , v2 i,
where in the last step λ̄2 = λ2 follows from (a).
hv1 , Av2 i, so
0 = λ1 hv1 , v2 i − λ2 hv1 , v2 i = (λ1 − λ2 )hv1 , v2 i.
But since we assumed λ1 − λ2 6= 0, we conclude hv1 , v2 i = 0, i.e. v1 ⊥ v2 .
Example 1. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}.
Solution. First we find the eigenvalues:
\begin{vmatrix} 1-\lambda & 2 & 1 \\ 2 & 4-\lambda & 2 \\ 1 & 2 & 1-\lambda \end{vmatrix} = \begin{vmatrix} 0 & 2\lambda & -\lambda^2 + 2\lambda \\ 0 & -\lambda & 2\lambda \\ 1 & 2 & 1-\lambda \end{vmatrix} = λ²(6 − λ).
So λ = 0 is a double eigenvalue and λ = 6 is a single eigenvalue.
For λ = 0 we find its eigenvectors by finding the RREF of the matrix:

 

\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}   ⇒   x2 = s,  x3 = t,  x1 = −2s − t.
If we express the solution in vector form


 
 
x = \begin{bmatrix} -2s - t \\ s \\ t \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.
We see that the zero eigenspace is spanned by the vectors
v1 = (−1, 0, 1)
and v2 = (−2, 1, 0).
(We chose v1 to be the simpler vector to make subsequent calculations a little easier.)
Now let us apply Gram-Schmidt to convert these to an orthonormal basis:
u1 = v1 /kv1 k = (1/√2)(−1, 0, 1),
w2 = v2 − hv2 , u1 iu1 = (−2, 1, 0) − (2/√2) · (1/√2)(−1, 0, 1) = (−1, 1, −1),
u2 = w2 /kw2 k = (1/√3)(−1, 1, −1).
Now let us find the eigenvector associated with the eigenvalue λ = 6:

 

A − 6I = \begin{bmatrix} -5 & 2 & 1 \\ 2 & -2 & 2 \\ 1 & 2 & -5 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}   ⇒   x3 = t,  x2 = 2t,  x1 = t.
Choosing t = 1 we obtain the eigenvector v3 = (1, 2, 1), and we normalize to find
u3 = v3 /kv3 k = (1/√6)(1, 2, 1).
We have found our orthonormal set of eigenvectors {u1 , u2 , u3 }. 2
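For a quick numerical check of Theorem 1, NumPy's routine for symmetric matrices returns exactly this kind of data. The sketch below (assuming NumPy) uses the matrix of Example 1.

```python
import numpy as np

# For a real symmetric matrix, numpy.linalg.eigh returns real eigenvalues and
# a matrix U whose columns are an orthonormal set of eigenvectors.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])

lams, U = np.linalg.eigh(A)
print(np.round(lams, 10))                     # eigenvalues 0, 0, 6
print(np.allclose(U.T @ U, np.eye(3)))        # columns are orthonormal
print(np.allclose(A @ U, U @ np.diag(lams)))  # each column is an eigenvector
```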
Remark 1. Since a real symmetric matrix A has all real eigenvalues, we can also take
eigenvectors to be real. Consequently, we henceforth use Rn as our vector space and dot
product as the inner product.
Orthogonal Matrices
Theorem 1 shows that a real symmetric matrix A has an eigenbasis, so (by the results
of the previous section) A is diagonalizable: E−1 AE = D, where E is the matrix with
the eigenbasis as column vectors and D is the diagonal matrix with the eigenvalues on
the diagonal. But since the eigenbasis provided by Theorem 1 is orthonormal, E has
some nice additional features.
Let O be an invertible real (n × n)-matrix whose column vectors u1 , . . . , un are
orthonormal. Then OT has u1 , . . . , un as its row vectors, so
 


OT O = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1 \cdot u_1 & u_1 \cdot u_2 & \cdots & u_1 \cdot u_n \\ u_2 \cdot u_1 & u_2 \cdot u_2 & \cdots & u_2 \cdot u_n \\ \vdots & \vdots & & \vdots \\ u_n \cdot u_1 & u_n \cdot u_2 & \cdots & u_n \cdot u_n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I.   (6.10)
Now, if we multiply the matrix equation OT O = I on the right by O−1 , we obtain
OT = O−1 . This property is so important that we give it a special name.
Definition 1. A square real matrix O is called orthogonal if
OT = O−1 .
(6.11)
The calculation (6.10) shows the following
Proposition 1. A real (n × n)-matrix O is orthogonal if and only if its column vectors
form an orthonormal basis for Rn .
Example 2. Find the inverse of the matrix
O = \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{bmatrix}.
Solution. We observe that the column vectors are orthonormal, so O is an orthogonal
matrix and we can use (6.11) to compute its inverse:
O−1 = OT = \begin{bmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 0 & 1 & 0 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \end{bmatrix}. 2
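The defining properties (6.10) and (6.11) are easy to confirm numerically; the sketch below assumes NumPy and uses the matrix O of Example 2.

```python
import numpy as np

s = 1 / np.sqrt(2)
O = np.array([[  s, 0.0,   s],
              [0.0, 1.0, 0.0],
              [ -s, 0.0,   s]])

print(np.allclose(O.T @ O, np.eye(3)))        # (6.10): O^T O = I
print(np.allclose(np.linalg.inv(O), O.T))     # (6.11): O^{-1} = O^T
```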
Another important property enjoyed by orthogonal matrices is that they preserve
the length of vectors. In fact, this property characterizes orthogonal matrices:
Proposition 2. A real (n × n)-matrix O is orthogonal if and only if kO vk = kvk for
all v ∈ Rn .
Proof. It is easy to see that OT = O−1 implies kOvk = kvk for all v:
kO vk2 = O v · O v = OT O v · v = O−1 O v · v = v · v = kvk2 .
Conversely, let us assume kO vk = kvk for all v ∈ Rn . Then in particular kOei k =
kei k = 1, so the vectors u1 = Oe1 , . . . , un = Oen are all unit vectors. Moreover, for
i ≠ j we have
kui + uj k2 = kOei + Oej k2           (by definition of ui )
 = kO(ei + ej )k2                      (by linearity)
 = kei + ej k2                         (by our assumption kO vk = kvk for all v ∈ Rn )
 = kei k2 + 2 ei · ej + kej k2          (by expanding (ei + ej ) · (ei + ej ))
 = kei k2 + kej k2                      (since ei · ej = 0)
 = kOei k2 + kOej k2                    (by our assumption kO vk = kvk for all v ∈ Rn )
 = kui k2 + kuj k2                      (by definition of ui ).
On the other hand, kui + uj k2 = kui k2 + 2 ui · uj + kuj k2 , so we conclude ui · uj = 0.
Consequently, {u1 , . . . , un } is an orthonormal basis for Rn . But u1 = Oe1 , . . . , un =
Oen are the column vectors of O, so Proposition 1 implies that O is an orthogonal
matrix, as we wanted to show.
2
It is still not immediately clear why a matrix satisfying (6.11) should be called
“orthogonal.” The following result shows the terminology is justified since an orthogonal
matrix maps orthogonal vectors to orthogonal vectors.
Corollary 2. If O is an orthogonal (n × n)-matrix and v, w are orthogonal vectors in
Rn , then Ov and Ow are orthogonal vectors in Rn .
Proof. Using kOvk = kvk and kOwk = kwk, we compute
kOv + Owk2 = kOvk2 + 2 Ov · Ow + kOwk2
= kvk2 + 2 Ov · Ow + kwk2 .
Meanwhile, using v · w = 0, we find
kv + wk2 = kvk2 + kwk2 .
But these must be equal since kO(v + w)k = kv + wk. We conclude Ov · Ow = 0. 2
Now let us return to our diagonalization of a real symmetric matrix A. If we use
the orthonormal eigenvectors provided by Theorem 1, then E is orthogonal, so let us
denote it by O. Hence we obtain
Theorem 2. If A is a real symmetric (n × n)-matrix, then there is an orthogonal
matrix O so that
OT A O = Diag(λ1 , . . . , λn ),
(6.12)
where λ1 , . . . , λn are the eigenvalues for A.
An (n × n)-matrix A satisfying (6.12) is called orthogonally diagonalizable. Note
that “orthogonally diagonalizable” is a stronger condition than just “diagonalizable.”
Example 1 (revisited). For the matrix A given in Example 1, the orthogonal matrix
O that provides the diagonalization OT A O = Diag(0, 0, 6) is
 1

− √2 − √13 √16

√1
√2 
O= 0
.
2
3
6
1
1
√1
√
√
− 3
2
6
Change of Basis
There is an important interpretation of (6.12) as expressing the action of A in terms
of a nonstandard basis for Rn . Let us discuss what changing the basis means for the
action of A before we explore its application to (6.12) and the proof of Theorem 1(c).
Suppose {v1 , . . . , vn } is an orthonormal basis for Rn (different from the standard
basis {e1 , . . . , en }). Then any vector w in Rn can be expressed either as a linear
combination of the standard basis vectors {e1 , . . . , en } or this new basis {v1 , . . . , vn }.
Let us denote the coefficients of these linear combinations respectively as x1 , . . . , xn and
y1 , . . . , y n :
w = x1 e1 + · · · + xn en = y1 v1 + · · · + yn vn .
(6.13)
Notice that, like x1 , . . . , xn , the coefficients y1 , . . . , yn are unique (cf. Exercise 4). Therefore, we can think of y1 , . . . , yn as the coordinates of w in the basis {v1 , . . . , vn }
in the same way that x1 , . . . , xn are the coordinates of w in the basis {e1 , . . . , en },
provided we do not change the order of the v1 , . . . , vn . Therefore we specify that
B = {v1 , . . . , vn } is an ordered orthonormal basis for Rn , and we call y1 , . . . , yn in
(6.13) the B-coordinates for w.
What is the relationship between the standard coordinates xi and the B-coordinates
yi in (6.13)? If we let O be the matrix [v1 · · · vn ], then we see from (6.13) that x = O y.
Moreover, O is an orthogonal matrix, so we can multiply both sides of this last equation
by OT = O−1 to conclude y = OT x.
Now suppose that we have a linear transformation T : Rn → Rn . Usually such a
linear transformation is defined using an (n × n)-matrix A and matrix multiplication:
T w = A x,
where w = x1 e1 + · · · + xn en .
(6.14)
On the other hand, if we are first given the linear transformation T , then we can find
the matrix A so that (6.14) holds simply by computing T e1 , . . . , T en , and using them
as column vectors. But we could also express T using matrix multiplication on the
coordinates in the basis B = {v1 , . . . , vn }:
If w = y1 v1 + · · · + yn vn , then
T w = z1 v1 + · · · + zn vn ,
where z = AB y.
Here AB is the (n × n)-matrix that transforms the B-coordinates for w to the B-coordinates for T w. What is the relationship between the matrices A and AB ? Using
y = OT x, we must have AB OT x = OT A x, which we can abbreviate as AB OT =
OT A. But now we can multiply on the right by O to express this as AB = OT A O.
We summarize this discussion in the following:
Proposition 3. Suppose B = {v1 , . . . , vn } is an ordered orthonormal basis for Rn .
Then, for any vector w in Rn , its coordinates x1 , . . . , xn in the basis {e1 , . . . , en }
and its coordinates y1 , . . . , yn in the basis {v1 , . . . , vn } are related by
Oy = x
or equivalently
y = OT x,
where O = [v1 · · · vn ]. Moreover, the action of an (n × n)-matrix A : Rn → Rn
(using the x-coordinates) can be expressed in terms of the y-coordinates for the
basis B using the matrix
AB = OT A O.
(6.15)
Now let us interpret the formula (6.12) in light of Proposition 3: it simply says that
if we compute the action of a real symmetric (n × n)-matrix A in terms of its basis of
orthonormal eigenvectors B, we get a diagonal matrix:
AB = Diag(λ1 , . . . , λn ).
(6.16)
In other words, a real symmetric matrix admits a basis {v1 , . . . , vn } in which the matrix
has been diagonalized.
Proof of Theorem 1(c). We want to show that a real symmetric (n × n)-matrix
A has n orthonormal eigenvectors. We proceed by induction, namely we assume that
any real symmetric (n − 1) × (n − 1)-matrix has n − 1 orthonormal eigenvectors. But
we know all eigenvalues of A are real, so let us pick one and call it λ1 . Let u1 be a
normalized eigenvector associated with λ1 , and let V1 be the set of vectors in Rn that
are orthogonal to u1 :
V1 = {w in Rn such that w · u1 = 0}.
Notice that V1 is an (n − 1)-dimensional real vector space, so we can use the Gram-Schmidt procedure (cf. Section 5.7) to find an orthonormal basis for V1 ; we denote this
basis for V1 by u2 , . . . , un . Note that u1 , u2 , . . . , un is an orthonormal basis for Rn .
If v is in V1 , then Av is also in V1 since Av · u1 = 0:
Av · u1 = v · Au1 = v · λ1 u1 = λ1 v · u1 = 0.
So A maps V1 to itself. In terms of the basis B = {u1 , . . . , un } for Rn , we see that
AB = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \tilde{A} \end{bmatrix},
where Ã is a real (n − 1) × (n − 1)-matrix. Is Ã symmetric? Clearly it is if AB is symmetric. But by (6.15), AB = OT A O, where O = [u1 · · · un ], so
(AB )T = (OT A O)T = OT A O = AB .
We conclude that AB is symmetric and hence Ã is symmetric. Consequently, by hypothesis, Ã has (n − 1) orthonormal eigenvectors ũ2 , . . . , ũn in V1 . (These are most likely
different from u2 , . . . , un , which were not required to be eigenvectors.) Thus AB has n
orthonormal eigenvectors: u1 , ũ2 , . . . , ũn . But this implies that A has n orthonormal
eigenvectors (cf. Exercise 5).
2
Example 3. Let B = {u1 , u2 } be the ordered orthonormal basis for R2 consisting of u1 = (1/√2)(1, −1) and u2 = (1/√2)(1, 1), and T : R2 → R2 be a linear transformation satisfying T u1 = u1 + 2u2 and T u2 = u1 − u2 . Find the matrix A satisfying (6.14).
Solution. In this case we have
O = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix},   OT = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix},   and   AB = \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}.
Consequently
A = O AB OT = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 3/2 & -3/2 \\ -1/2 & -3/2 \end{bmatrix}. 2
Exercises
1. For the following symmetric (n × n)-matrices, find a set of n orthonormal eigenvectors:
(a) \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}   (b) \begin{bmatrix} 4 & 6 \\ 6 & 9 \end{bmatrix}
(c) \begin{bmatrix} 3 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}   (d) \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
2. For the following matrices A, find an orthogonal matrix O and a diagonal matrix D so that OT AO = D:
(a) \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}   (b) \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
(c) \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}   (d) \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 1 \\ 0 & 1 & 3 \end{bmatrix}
3. Use (6.11) to find the inverses of the following matrices:
√


 √
1 −1
1/√3 1/ 2 0
(b) √12 0 0
(a)  1/ √3
0√ 1
1 1
−1/ 3 1/ 2 0
√


2
1 √0 −1
1
0 

√2 1
(c) 
−1
2 −1 √0 
−1 0
1
2

√0
2
0
4. Show that the coefficients y1 , . . . , yn in (6.13) are unique.
5. If A is an (n × n)-matrix with n orthonormal eigenvectors and B = OT AO where
O is an orthogonal (n × n)-matrix, show that B has n orthonormal eigenvectors.
6. Let B = {u1 , u2 , u3 } be the ordered orthonormal basis for R3 consisting of vectors u1 = (1/√2)(1, 0, 1), u2 = (0, 1, 0), and u3 = (1/√2)(−1, 0, 1), and let T : R3 → R3 be a linear transformation satisfying T u1 = u1 + u3 , T u2 = u2 − u3 , and T u3 = u1 − u2 . Find the matrix A satisfying (6.14).
6.4  Additional Exercises
1. Let P denote the polynomials with real coefficients on (−∞, ∞) and let T denote
differentiation: T (f ) = f ′ . (Recall from Exercise 2 in Section 5.5 that P is infinite-dimensional.)
(a) Show that T : P → P is a linear transformation.
(b) Find all eigenvalues of T : P → P and describe the eigenspace.
2. Let V = C ∞ (R) denote the “smooth” real-valued functions on (−∞, ∞), i.e.
functions for which all derivatives are continuous. As in the previous problem, let
T denote differentiation: T (f ) = f 0 .
(a) Show that V is infinite-dimensional.
(b) Find all eigenvalues of T : V → V and describe the eigenspace.
3. If A is a square matrix, show that it is invertible if and only if 0 is not an
eigenvalue.
4. If A is an invertible matrix, show that λ is an eigenvalue for A if and only if λ−1
is an eigenvalue for A−1 .
5. If B is a square matrix, show that B and BT have the same eigenvalues. (Recall
from Section 4.5 that det(A) = det(AT ).)
6. Suppose v is an eigenvector with eigenvalue λ for A and also an eigenvector with
eigenvalue µ for B.
(a) Show v is an eigenvector for AB and find its eigenvalue.
(b) Show v is an eigenvector for A + B and find its eigenvalue.
7. Let M 2 (R) denote the (2 × 2)-matrices with real coefficients and
A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
(a) Show that TA : B 7→ AB defines a linear transformation on the vector space
M 2 (R).
(b) Find all eigenvalues and an eigenbasis for TA .
8. As in the previous problem, consider
A = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}
as defining a linear transformation TA on M 2 (R). Find all eigenvalues and an
eigenbasis for TA .
9. For a square matrix A, show that the constant term (i.e. coefficient of λ0 ) in the
characteristic polynomial p(λ) is det(A).
A stochastic matrix is one for which the sum of the elements in each column is 1;
such matrices arise naturally in certain applications. A (2 × 2) stochastic matrix takes
the simple form
A = \begin{bmatrix} p & 1-q \\ 1-p & q \end{bmatrix}   where 0 < p, q < 1.   (6.17)
An interesting feature of stochastic matrices is that the limit Ak v as k → ∞ is the same
for all vectors v = (x, y) with the same value for x + y. This will be illustrated in the
exercises below.
10. Let A be the stochastic matrix (6.17).
(a) Show that the eigenvalues of A are λ1 = 1 and λ2 = p + q − 1 where |λ2 | < 1.
(b) If v1 and v2 are eigenvectors for λ1 and λ2 respectively, show that Ak v1 = v1
for all k = 1, 2, . . . and Ak v2 → 0 as k → ∞.
(c) If v0 = (x0 , y0 ) has the property x0 + y0 = 1, show that v1 = Av0 has
the same property: v1 = (x1 , y1 ) satisfies x1 + y1 = 1.
(d) Show that Ak → [v1∗ v1∗ ] as k → ∞, where v1∗ = (x∗1 , y1∗ ) is the eigenvector
for the eigenvalue λ1 = 1 satisfying x∗1 + y1∗ = 1.
11. The population of a state is fixed at 10 million, but divided into x million in urban
areas and y million in rural areas, so x + y = 10. Each year, 20% of the urban
population moves to rural areas, and 30% of the rural population moves to urban
areas.
(a) If the initial populations are x0 , y0 , introduce a stochastic matrix A as in
(6.17) so that the populations x1 , y1 after one year satisfy
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = A \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}.
(b) If initially x0 = 5 = y0 , find the populations after 10 years, i.e. x10 and y10 .
(c) If initially x0 = 7 and y0 = 3, find x10 and y10 .
(d) Explain the similar results that you obtained in (b) and (c) by referring to
part (d) of the previous exercise.
Chapter 7
Systems of First-Order Equations
7.1  Introduction to First-Order Systems
In this chapter we want to study a first-order system of the form
dx1 /dt = f1 (x1 , . . . , xn , t)
dx2 /dt = f2 (x1 , . . . , xn , t)
⋮
dxn /dt = fn (x1 , . . . , xn , t),   (7.1)
with initial conditions of the form
x1 (t0 ) = b1 , x2 (t0 ) = b2 , . . . , xn (t0 ) = bn .
(7.2)
Since all differential equations in this chapter will have t as the independent variable, we
often will write xi ′ in place of dxi /dt. As we saw in Section 2.1, a system of first-order
equations arises naturally in the study of an nth-order differential equation. For example, recall the initial-value problem for a forced mechanical vibration that we studied in
Section 2.6:
mx′′ + cx′ + kx = f (t)   (7.3)
x(0) = x0 ,   x′ (0) = v0 .
If we introduce x1 = x and x2 = x′ , then (7.3) is equivalent to the first-order system with initial conditions
x1 ′ = x2 ,   x1 (0) = x0 ,
x2 ′ = (1/m)(f (t) − c x2 − k x1 ),   x2 (0) = v0 .   (7.4)
Fig. 1. Spring-mass system
We shall study this system and its generalization to coupled mechanical vibrations later
in this chapter.
But first-order systems like (7.1) also arise naturally in applications involving first-order processes. For example, let us generalize the mixing problems of Section 1.4 to
two tanks as in Figure 1. If we let r0 and c0 denote respectively the rate and the
concentration of the solute, say salt, in the inflow to Tank 1, and we let rj denote
the rate of outflow from Tank j (for j = 1, 2) then we get a first-order system for the
amounts of salt x1 (t) and x2 (t) in the two tanks:
x1 ′ = k0 − k1 x1
x2 ′ = k1 x1 − k2 x2 ,   (7.5)
Fig. 1. Mixing with Two Tanks
where k0 = c0 r0 , k1 (t) = r1 /V1 (t), and k2 (t) = r2 /V2 (t). If we are given the initial
amounts of salt in each tank, x1 (0) and x2 (0), then we expect that we can solve (7.5)
to find the amounts x1 (t) and x2 (t) at any time t.
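As a computational aside, a system like (7.5) can be integrated numerically. The sketch below assumes SciPy is available; for simplicity it also assumes constant tank volumes (so k1 and k2 are constants), and the rate values are made up purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical constant rates for the two-tank system (7.5).
k0, k1, k2 = 2.0, 0.5, 0.25

def rhs(t, x):
    x1, x2 = x
    return [k0 - k1 * x1,           # x1' = k0 - k1 x1
            k1 * x1 - k2 * x2]      # x2' = k1 x1 - k2 x2

# Start both tanks with no salt and integrate over 0 <= t <= 40.
sol = solve_ivp(rhs, (0.0, 40.0), [0.0, 0.0], t_eval=np.linspace(0.0, 40.0, 5))
print(sol.t)
print(sol.y)    # x1(t) approaches k0/k1 = 4 and x2(t) approaches k0/k2 = 8
```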
Existence and Uniqueness of Solutions
Notice that we can simplify (7.1) by using vector notation: if we let x = (x1 , . . . , xn )
and f = (f1 , . . . , fn ), then we can write (7.1) and its initial condition (7.2) as
dx/dt = f (x, t),   x(t0 ) = x0 ,   (7.6)
where we have let x0 = (b1 , . . . , bn ). If a solution x(t) exists, then it can be considered as
defining a curve in n-dimensional space starting at the point x0 . Let us now address the
question of whether a unique solution exists and for how long. In the following, we use
fx to denote ∂fi /∂xj for i, j = 1, . . . , n, i.e. all first-order derivatives of the components
fi with respect to the xj .
Theorem 1. Suppose t0 is a fixed point in an interval I = (a, b) and x0 is a fixed point
in Rn such that f (x, t) and fx (x, t) are continuous for a < t < b and |x − x0 | < R.
Then there is a unique solution x(t) of (7.6) defined for t in a neighborhood of t0 , i.e.
for |t − t0 | < ε, where ε is positive but may be small.
This theorem is quite analogous to Theorem 1 in Section 1.2, and is also proved using
successive approximations. In particular, since the dependence of f upon x may be
nonlinear, the solution x(t) need not be defined on all of the interval I.
Autonomous Systems and Geometric Analysis
Fig. 2. Vector Field f and a Solution Curve
Like first-order equations, first-order systems are called autonomous if the function f
in (7.6) does not depend on t :
dx
= f (x).
(7.7)
dt
Notice that f assigns an n-vector to each point x; in other words, f is a vector field
on Rn . Moreover, a solution x(t) of (7.7) is a curve in Rn whose tangent vector at each
point is given by the vector field at that point; this is illustrated in Figure 2 for n = 2.
By sketching the vector field and a few sample solution curves, we obtain geometric
7.1. INTRODUCTION TO FIRST-ORDER SYSTEMS
205
insight into the behavior of solutions of a first-order autonomous system. When n = 2,
such pictures are called phase plane portraits. Some examples are shown in Figures
3 and 4; we have added arrows to indicate the direction of increasing t.
Like first-order autonomous equations, first-order autonomous systems admit equilibrium solutions and stability analysis. Values x0 for which f (x0 ) = 0, i.e. for which the
vector field vanishes, are called critical points and they correspond to equilibrium
solutions of (7.7) if we let x(t) ≡ x0 . The notion of stability for an equilibrium solution of a first-order system is similar to the case of a single equation that we discussed
in Section 1.2, but is a little different since it involves solution curves in n-dimensional
space:
• An equilibrium solution x0 for (7.7) is stable if all nearby solutions remain nearby.
This is certainly true if all nearby solutions approach x0 ; as in Section 1.2 such an
equilibrium is called a sink. But for systems it is possible for solutions to remain
nearby by orbiting around the equilibrium; such an equilibrium is called a center.
These stable equilibria are illustrated below.
x2
x1
Fig.3. Phase Plane
x2
Fig.5. Two Stable Equilibria: a Sink (left) and a Center (right).
• An equilibrium solution x0 for (7.7) is unstable if at least some nearby solutions
move away from x0 . This is certainly true if all nearby solutions move away from
the equilibrium; as in Section 1.2 such an equilibrium is called a source. However,
for systems it could be that some solutions approach x0 while others move away;
such an equilibrium is called a saddle. These unstable equilibria are illustrated
below.
Fig.6. Two Unstable Equilibria: a Source (left) and a Saddle (right).
x1
Fig.4. Phase Plane
206
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
In applications where initial conditions can only be approximated, an unstable equilibrium is of limited significance since a very small perturbation could yield an extremely
different outcome. On the other hand, stable equilibria generally represent very important states of a physical system.
Example 1. Let us consider damped free vibrations of a mechanical system, i.e. (7.3)
with f ≡ 0:
mx00 + cx0 + kx = 0.
(7.8)
If we introduce x1 = x and x2 = x0 , then we obtain the first-order system
x01 = x2
x02 = −
x2
x1
We see that the only critical point of (7.9) is x1 = 0, x2 = 0, corresponding to the
trivial solution of (7.8) x(t) ≡ 0. We shall study (7.9) in detail in Section 7.5, but for
now let us use the solutions that we obtained in Chapter 2 for (7.8) to reach conclusions
about the stability of the critical point (0, 0) for (7.9). Let us consider the overdamped
(c2 > 4mk) and underdamped (c2 < 4mk) cases separately.
If (7.8) is overdamped, then we know from Section 2.6 that the general solution is
x(t) = c1 er1 t + c2 er2 t
Fig.7. Phase Plane for an
Overdamped Vibration
(7.9)
c
k
x1 −
x2 .
m
m
where r1 < r2 < 0.
Recalling that x1 = x and x2 = x0 , we see that the general solution of (7.9) can be
written
c1 er1 t + c2 er2 t
x(t) = 0 r1 t
where c0j = cj rj .
c1 e + c02 er2 t
We see that both components of x(t) tend to zero exponentially as t → ∞, so (0, 0) is
a sink; this is illustrated in Figure 7 (with m = k = 1, c = 3).
If (7.8) is underdamped, then we know from Section 2.6 that the general solution is
ct
x(t) = e− 2m (c1 cos µt + c2 sin µt)
x2
where µ > 0.
In terms of our vector formulation, we have
x1
Fig.8. Phase Plane for an
Underdamped Vibration
− ct
e 2m (c1 cos µt + c2 sin µt)
x(t) = − ct 0
e 2m (c1 cos µt + c02 sin µt)
for certain constants c0j .
We again have both components decaying exponentially to zero as t → ∞, but both
also oscillate about 0. If we plot the solution curves in the phase plane, we see that they
spiral about (0, 0) as they approach it; such an equilibrium is called a stable spiral.
Thus for both overdamped and underdamped mechanical vibrations (as well as critically damped vibrations whose phase plane diagram looks much like that in Figure 7),
the origin is a stable equilibrium. Clearly this equilibrium is important as the “end
state” of the damped system.
2
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
207
Exercises
1. For the following second-order linear equations: i) use x1 = x and x2 = x0 to
replace the equation with an equivalent first-order system; ii) use the general
solution for the second-order equation to determine the stability of the critical
point (0, 0) for the first-order system.
(a) x00 − 4x0 + 3x = 0,
(b) x00 + 9x = 0,
(c) x00 − 2x0 − 3x = 0,
(d) x00 + 5x0 + 6x = 0.
2. For the following second-order nonlinear equations: i) use x1 = x and x2 = x0
to replace the equation with an equivalent first-order system; ii) find all critical
points for the system.
(a) x00 + x(x − 1) = 0,
(b) x00 + x0 + ex = 1,
(c) x00 = sin x,
(d) x00 = (x0 )2 + x(x − 1).
3. For each of the first-order systems obtained in Exercise 2, plot the vector field
near each critical point to determine its stability.
7.2
Theory of First-Order Linear Systems
Now let us bring our knowledge of linear algebra into the analysis of a system of
first-order linear equations or first-order linear system of the form
x01 = a11 (t)x1 + a12 (t)x2 + · · · + a1n (t)xn + f1 (t)
x02 = a21 (t)x1 + a22 (t)x2 + · · · + a2n (t)xn + f2 (t)
..
.
(7.10)
x0n = an1 (t)x1 + an2 (t)x2 + · · · + ann (t)xn + fn (t),
where aij (t) and fi (t) are known functions of t. If f1 = · · · fn = 0, then (7.10) is
homogeneous; otherwise it is nonhomogeneous. An initial-value problem for (7.10)
specifies the values of x1 , . . . xn at some value t0 of t:
x1 (t0 ) = b1 ,
x2 (t0 ) = b2 ,
...
xn (t0 ) = bn ,
(7.11)
where b1 , . . . , bn are given numbers.
As we saw in Section 2.1, higher-order equations can be reduced to first-order systems
by introducing additional unknown functions. Similarly, given a first-order system of
differential equations, we may be able to eliminate variables and obtain a single higherorder equation involving just one of the unknown functions, say x1 ; if we can solve that
equation for x1 , then we can use it to obtain the other unknown functions. This method
of elimination is particularly useful for a first-order linear system with two equations
and two unknowns since it reduces to a second-order linear differential equation.
Example 1. (a) Find the general solution of
x01 = x2
x02 = 2x1 + x2 .
208
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
(b) Find the particular solution satisfying the initial conditions
x1 (0) = −1,
x2 (0) = 0.
Solution. (a) We differentiate the first equation, and then use both equations to
eliminate x2 :
x001 = x02 = 2x1 + x2 = 2x1 + x01 .
This is a second-order equation for x1 : x001 − x01 − 2x1 = 0. The characteristic equation
is r2 − r − 2 = (r + 1)(r − 2), so the general solution is
x1 (t) = c1 e−t + c2 e2t .
Now we use the first equation to obtain x2 from x1 simply by differentiation:
x2 (t) = −c1 e−t + 2c2 e2t .
(b) We use the initial conditions to find the constants c1 and c2 :
x1 (0) = c1 + c2 = −1
x2 (0) = −c1 + 2c2 = 0.
We easily solve these two equations to find c1 = −2/3 and c2 = −1/3. So the solution
satisfying the initial conditions is
x1 (t) = − 32 e−t − 13 e2t ,
x2 (t) = 23 e−t − 23 e2t .
2
If we use vector and matrix notation, we can greatly simplify the expression (7.10).
We define an (n × n)-matrix valued function A(t) and n-vector valued function f (t) by

 

a11 a12 · · · a1n
f1
 a21 a22 · · · a2n 
 f2 

 

A= .
f =  . .
,
 ..
 .. 

an1
an2
· · · ann
fn
These are functions of t, which we generally assume are continuous on an interval I,
continuity being defined in terms of their element functions. Now we can write (7.10)
as
x0 = A(t) x + f (t),
(7.12)
where x is the n-vector valued function treated as a column vector


x1 (t)
 x2 (t) 


x(t) =  .  .
 .. 
xn (t)
Similarly, the initial condition is simplified using vector notation:
x(t0 ) = b.
(7.13)
The existence and uniqueness theorem discussed in the previous section assures us that
solutions of of the initial-value problem (7.12)-(7.13) exist, but the linearity of (7.12)
implies a stronger result:
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
209
Theorem 1. If A(t) and f (t) are continuous on an interval I containing t0 , then
(7.12)-(7.13) has a unique solution x defined on I.
Note that linearity implies that existence and uniqueness hold on all of the interval I,
and not just near t0 .
As was the case for second-order equations, the solution of the nonhomogeneous
system (7.12) is closely connected with the associated homogeneous system
x0 = A(t) x.
(7.14)
If c1 , c2 are constants and x1 , x2 are both solutions of (7.14), then by linearity, c1 x1 +
c2 x2 is also a solution, so the solutions of (7.14) form a vector space. In fact, if the
solutions x1 , . . . , xn are linearly independent, then we want to show that every solution
of (7.14) is of the form
x(t) = c1 x1 (t) + · · · cn xn (t)
(7.15)
for some choice of the constants c1 , . . . , cn . As in previous sections, the linear independence of functions is determined by the “Wronskian.”
Definition 1. Let x1 , . . . , xn be n-vector functions defined on an interval I. The Wronskian of x1 , . . . , xn is the determinant of the (n × n)-matrix obtained by using the xj
as column vectors:
W [x1 , . . . , xn ](t) = det x1 (t) x2 (t) · · ·
xn (t) .
Theorem 2. Suppose x1 , . . . , xn are n-vector-valued functions on an interval I.
(a) If W [x1 , . . . , xn ](t0 ) 6= 0 for some t0 in I, then x1 , . . . , xn are linearly independent
functions.
(b) If x1 , . . . , xn are all solutions of (7.14) and W [x1 , . . . , xn ](t0 ) = 0 for some t0 in
I, then x1 , . . . , xn are linearly dependent on I.
Proof. (a) Let c1 x1 (t) + · · · cn xn (t) ≡ 0 for t in I; we must show c1 = · · · = cn = 0.
But if we let X(t) denote the (n×n)-matrix with the xj as column vectors, and c denote
the vector (c1 , . . . , cn ), then we can write our assumption as X(t) c = 0; in particular,
we have X(t0 ) c = 0. But W [x1 , . . . , xn ](t0 ) = det X(t0 ) 6= 0, so X(t0 ) is invertible and
we must have c = 0.
(b) If W [x1 , . . . , xn ](t0 ) = 0 then the n vectors x1 (t0 ), . . . , xn (t0 ) are linearly dependent,
so there exists a nonzero vector c = (c1 , . . . , cn ) such that c1 x1 (t0 ) + · · · cn xn (t0 ) = 0.
Using these constants, define x(t) = c1 x1 (t) + · · · cn xn (t), which satisfies (7.14) and
x(t0 ) = 0. But the uniqueness statement in Theorem 1 implies that x(t) ≡ 0, i.e.
c1 x1 (t) + · · · cn xn (t) ≡ 0 for all t in I, so x1 , . . . , xn are linearly dependent on I.
2
Example 2. Show that the following three 3-vector functions are linearly independent
on I = (−∞, ∞):
x1 (t) = (1, 0, et ),
x2 (t) = (0, 1, et ),
x3 (t) = (sin t, cos t, 0).
210
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Solution. We compute the Wronskian
W [x1 , x2 , x3 ](t) =
1
0
et
0
1
et
sin t
1
cos t = t
e
0
0
cos t
+ sin t t
e
0
1
= −et (cos t + sin t).
et
Now the Wronskian does vanish at some points, but all we need is one point were
it is nonzero. Since W [x1 , x2 , x3 ](0) = −1, we conclude that x1 , x2 , x3 are linearly
independent.
2
Now we are ready to find all solutions for the homogeneous system (7.14).
Theorem 3. Suppose A(t) is continuous on an interval I and x1 , . . . , xn are linearly
independent solutions of (7.14). Then every solution of (7.14) is of the form (7.15).
Proof. Let S denote the vector space of all solutions of (7.14). It suffices to show
that S is n-dimensional, since then any linearly independent set of n solutions forms a
basis for S. Pick t0 in I and for each j = 1, . . . , n, let xj be the unique solution of the
initial-value problem
x0 = A x,
x(t0 ) = ej .
Now let x be any solution of (7.14) and consider its value at t0 . Since {e1 , . . . , en } is a
basis, we can find constants c1 , . . . , cn so that x(t0 ) = c1 e1 + · · · + cn en . Using these
same constants, let us define a solution of (7.14) by y = c1 x1 + · · · cn xn . Then x and y
are both solutions of (7.14) with the same value at t0 : x(t0 ) = y(t0 ). So the uniqueness
statement in Theorem 1 implies that x ≡ y, i.e. we have written x in the form (7.15).
2
As a consequence of this theorem, if x1 , . . . , xn are linearly independent solutions of
(7.14) then we define (7.15) to be the general solution of (7.14). As with secondorder equations, the general solution may be used to solve initial-value problems.
Example 3. Show that the following functions are linearly independent solutions of
the given homogeneous system, and use them to find the solution satisfying the given
initial conditions:
t 2t 4
2
1
2e
e
0
.
x1 (t) =
, x2 (t) =
; x =
x, x(0) =
−3et
−e2t
−3 −1
0
Solution. To show x1 and x2 are linearly independent, we compute the Wronskian:
W [x1 , x2 ](t) =
2et
−3et
e2t
= −2e3t + 3e3t = e3t 6= 0.
−e2t
Before proceeding, let us observe that we can rewrite x1 and x2 as
2
1
x 1 = et
and x2 = e2t
.
−3
−1
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
211
This slightly simplifies the calculation showing that x1 and x2 are solutions, e.g.
1
4
2
4
2
1
2
1
x02 = 2e2t
and
x2 = e2t
= e2t
= 2e2t
.
−1
−3 −1
−3 −1 −1
−2
−1
It also allows us to express the general solution in multiple ways:
t 2t 2
1
2e
e
2c1 et + c2 e2t
.
x(t) = c1 et
+ c2 e2t
= c1
+
c
=
2
−3
−1
−3et
−e2t
−3c1 et − c2 e2t
Finally, we can use the initial conditions to specify the constants:
2c1 + c2
1
x(0) =
=
⇒ c1 = −1, c2 = 3.
−3c1 − c2
0
We conclude that the solution is
−2et + 3e2t
x(t) =
.
3et − 3e2t
2
As with second-order equations, the general solution for the nonhomogeneous system
(7.12) is obtained from a particular solution and the general solution of (7.14).
Theorem 4. Suppose A(t) and f (t) are continuous on an interval I and xp is a
particular solution of (7.12). Then every solution of (7.12) is of the form
x(t) = xp (t) + xh (t),
where xh is the general solution of (7.14).
To find a particular solution xp for (7.12), the simplest case is when A and f are both
independent of t, since the system is then autonomous and we can use an equilibrium
solution as xp . This reduces to solving the algebraic equation Ax + f = 0.
Example 4. Find the general solution for
4
2
2
x0 =
x+
.
−3 −1
0
Solution. We observe that the associated homogeneous system is the same as in Example 3, so we know that
2
1
xh (t) = c1 et
+ c2 e2t
.
(7.16)
−3
−1
Since the system is autonomous, we find the equilibrium solution by solving the algebraic
equation:
4
2
2
0
xp +
=
.
−3 −1
0
0
212
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
We achieve this, as usual, by putting the augmented coefficient matrix into REF:
1 1/2 −1/2
1
4
2 −2
∼
⇒ xp =
.
−3
−3 −1 0
0 1
−3
We conclude that the general solution is
1
2
1
t
2t
x(t) =
+ c1 e
+ c2 e
.
−3
−3
−1
2
When A is constant and the components of f are functions of t, then we can try to
find xp using the method of undetermined coefficients, as we did for second-order
equations in Section 2.5. Since we are dealing with systems, the method can become a
little more complicated; but rather than trying to cover all cases, let us illustrate what
is involved with an example.
Example 5. Find the general solution for
4
2
2
0
x =
x+ t .
−3 −1
e
Solution. We only need xp since xh is given by (7.16). The functions 2 and et require
different forms in the method of undetermined coefficients, so let us write
2
2
0
f (t) = t =
+ t ,
e
0
e
and try to find separate particular solutions x1p and x2p for
2
0
f1 (t) =
and f2 (t) = t .
0
e
For f1 , we get the autonomous system as in Example 4, so we could find x1p as an
equilibrium as we did there. But let us take the perspective of undetermined coefficients
and assume x1p is a constant vector a = (A, B). We plug-in to evaluate A and B:
0
4
2
2
4A + 2B + 2
4A + 2B = −2
0
x1p =
and
x +
=
⇒
0
−3 −1 p
0
−3A − B
−3A − B = 0.
We can solve the coefficients to conclude A = 1 and B = −3 to obtain x1p = (1, −3).
For f2 , we might try x2p = et a where a is a constant vector, but this fails; the reason
is that vectors of the form et a may be a solution of the associated homogeneous system
given by (7.16). Instead, let us play it safe by taking:
x2p = tet a + et b,
where a and b are constant vectors.
If we plug into x0 = Ax + f2 , we find
−2
−2/3
−2/3
a=
and b =
+s
,
3
0
1
where s is a free variable.
We may choose s = −1 and conclude
x2p = tet
−2t
.
3t − 1
So the general solution is x = x1p + x2p + xh where xh is given by (7.16).
2
7.2. THEORY OF FIRST-ORDER LINEAR SYSTEMS
213
Remark 1. While Theorems 3 and 4 have important and useful conclusions, they do
not provide a method for actually finding n linearly independent solutions of (7.14).
This topic will be discussed in detail in the next section.
Exercises
1. Use the method of elimination to obtain the general solution for the following
first-order linear systems. If initial conditions are given, also find the particular
solution satisfying them.
(a) x01 = 2x2 , x02 = −2x1
Solution
(b) x01 = 3x1 + x2 , x02 = −2x1 , x1 (0) = 1, x2 (0) = 0
(c) x01 = 2x1 − 3x2 , x02 = x1 − 2x2
(d) x01 = x1 + 2x2 + 5e4t , x02 = 2x1 + x2 .
2. Express each of the systems in the previous problem in matrix form x0 = Ax + f .
3. (i) Verify that the given vector functions xj (t) are linearly independent, (ii) Verify
that they satisfy the given homogeneous first-order system, and (iii) Use them to
solve the initial-value problem.
−3 2
0
3t 1
−2t 2
0
(a) x1 = e
, x2 = e
; x =
x, x(0) =
.
3
1
−3 4
5
sin 2t
− cos 2t
0 2
1
(b) x1 =
, x2 =
; x0 =
x, x(0) =
;
cos 2t
sin 2t
−2 0
1
 2t 
 −t 
 −t 


 
e
−e
−e
0 1 1
0
(c) x1 = e2t , x2 =  e−t , x3 =  0 ; x0 = 1 0 1 x, x(0) = 0;
e2t
0
e−t
1 1 0
5
 t






3t
5t
3 −2 0
2e
−2e
2e
(d) x1 = 2et , x2 =  0 , x3 = −2e5t ; x0 = −1 3 −2 x,
et
e3t
e5t
0 −1 3
x(0) = (4, 0, 0).
4. Find a particular solution xp as an equilibrium for the following nonhomogeneous
(but autonomous) systems.
1 2
0
1 2
1
0
0
(b) x =
x+
(a) x =
x+
, Solution
3 4
1
2 1
−1


 


 
1 −1 −2
0
1 −1 −2
−1
(c) x0 = 2 −1 −3 x + −2
(d) x0 = 2 −1 −3 x +  1 .
3 −3 −5
1
3 −2 −5
0
5. Use undetermined coefficients to find a particular solution xp of the following
nonhomogeneous equations.
1 3
4 e2t
0 1
sin 2t
0
0
(a) x =
x+
;
(b) x =
x−
;
2 1
−3 e2t
2 0
4 cos 2t
t t
1 2
e
1 1
e
Atet
0
x+
;
(d)
x
=
x
+
,
try
x
=
.
(c) x0 =
p
2 1
3 e2t
0 2
3et
Bet
214
7.3
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Eigenvalue Method for Homogeneous Systems
In this section we shall investigate how to find the general solution for the homogeneous
system of first-order equations (7.14) when A is a constant matrix:
x0 = A x.
(7.17)
Notice that (7.17) is autonomous, and the trivial solution x(t) ≡ 0 is an equilibrium.
(Assuming that A is invertible, x = 0 is the only equilibrium solution of (7.17).) We
are anxious to construct nonconstant solutions. Suppose that λ is a real eigenvalue for
A with real eigenvector v, and consider the vector function
x(t) = eλt v.
(7.18)
Then x0 = λeλt v and Ax = A(eλt v) = eλt Av = eλt λv, so (7.18) is a solution of (7.17).
Note that, for all values of t, x(t) lies on the ray passing through the vector v, so a
solution of the form (7.18) is called a straight-line solution of (7.17). Moreover, if
λ < 0 then x(t) → 0 as t → ∞, while if λ > 0 then x(t) tends away from 0 as t → ∞. So
the signs of the eigenvalues determines the stability of the equilibrium solution x(t) ≡ 0.
Example 1. Find the general solution of
5
0
x =
8
−4
x,
−7
and determine the stability of the equilibrium solution x(t) ≡ 0.
Solution. We first find the eigenvalues and eigenvectors for A:
det(A − λI) =
5−λ
8
−4
= λ2 + 2λ − 3 = (λ + 3)(λ − 1).
−7 − λ
So A has eigenvalues λ1 = 1 and λ2 = −3. For λ1 = 1 we find an eigenvector
4 −4
1 −1
1
A − λ1 I =
∼
⇒ select v1 =
,
8 −8
0 0
1
Fig.1.
Straight-line solu-
tion x1 (t).
and then use (7.18) to define a straight-line solution of (7.17):
t 1
x1 (t) = e
.
1
Notice that x1 (t) moves away from 0 as t → ∞; see Figure 1. Similarly, for λ2 = −3
we find
8 −4
2 −1
1
A − λ2 I =
∼
⇒ select v2 =
,
8 −4
0 0
2
and obtain the straight-line solution
x2 (t) = e
−3t
1
.
2
Notice that x2 (t) → 0 as t → ∞; see Figure 2. Moreover, x1 and x2 are linearly
Fig.2.
Straight-line solu-
tion x2 (t).
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
215
independent since
W [x1 , x2 ](t) =
et
et
e−3t
= e−2t 6= 0,
2e−3t
x2
so the general solution is
x(t) = c1 et
1
1
+ c2 e−3t
.
1
2
x1
If we choose an initial condition for which c1 = 0, then x(t) → 0. However, if c1 6= 0,
then x(t) tends to infinity (in some direction) as t → ∞. Consequently, x = 0 is an
unstable equilibrium, in fact a saddle point; see Figure 3.
2
This example shows how we can proceed in general. Suppose A has n linearly independent eigenvectors v1 , . . . , vn with associated eigenvalues λ1 , . . . , λn (not necessarily
distinct). Then we have n solutions of (7.17) that are in the special form (7.18):
x1 (t) = eλ1 t v1 , . . . , xn (t) = eλn t vn .
Moreover, the solutions are linearly independent since
W [x1 , . . . , xn ](t) = det([ eλ1 t v1 · · · eλn t vn ]) = e(λ1 +···+λn )t det([ v1 · · · vn ]) 6= 0
for any t. Notice that the above remarks apply to complex eigenvalues and eigenvectors
as well as real ones. However, since we are interested in real-valued solutions of (7.17),
let us first restrict our attention to the case of real eigenvalues and eigenvectors. We
have shown the following:
Theorem 1. If A is an (n×n)-matrix of real constants that has an eigenbasis v1 , . . . , vn
for Rn , then the general solution of (7.17) is given by
x(t) = c1 eλ1 t v1 + · · · + cn eλn t vn ,
where λ1 , . . . , λn are the (not necessarily distinct) eigenvalues associated respectively
with the eigenvectors v1 , . . . , vn .
Example 2. Find the general solution of

4 −3
x0 =  2 −1
0 0

1
1  x,
2
and determine the stability of the equilibrium solution x(t) ≡ 0.
Solution. Notice that the coefficient matrix is the matrix A of Example 4 in Section
6.1. There we found that A has an eigenvalue λ1 = 1 with eigenvector v1 = (1, 1, 0);
this lead to the solution
 
1
x1 (t) = et 1 .
0
Fig.3. Full Phase Plane for
Example 1.
216
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
We also found that A has an eigenvalue λ2 = 2 with two linearly independent eigenvectors, v2 = (3, 2, 0) and v3 = (−1, 0, 2); these lead to the two solutions
 
 
3
−1
x2 (t) = e2t 2 and x3 (t) = e2t  0  .
0
2
We conclude that the general solution is
 
 
 
1
3
−1
x(t) = c1 et 1 + c2 e2t 2 + c3 e2t  0  .
0
0
2
Notice that all solutions tend away from (0, 0, 0) as t increases, so x = 0 is an unstable
equilibrium, in fact a source point.
2
Complex Eigenvalues
Suppose that the characteristic polynomial p(λ) for the constant matrix A has a complex
root λ = α + βi leading to a complex eigenvector v. For the reasons above, we see that
eλt v is a solution of (7.17), but it is complex-valued. However, if A is real-valued then
we would prefer to have real-valued solutions of (7.17). Is there anything to be done?
Recall that complex eigenvalues and eigenvectors for a real-valued matrix A come
in complex conjugate pairs. So w(t) = eλt v and w(t) = eλt v are linearly independent
complex-valued solutions of (7.17). If we decompose w(t) into its real and imaginary
parts, then we can write
w(t) = a(t) + ib(t)
and w(t) = a(t) − ib(t).
But a linear combination of two solutions of (7.17) is again a solution, so
a(t) =
1
(w(t) + w(t))
2
and b(t) =
1
(w(t) − w(t))
2i
are both real-valued solutions of (7.17). Moreover, a(t) and b(t) are linearly independent
since their linear span (over C) includes both w(t) and w(t). We have shown the
following.
Theorem 2. If A is an (n × n)-matrix of real constants that has a complex eigenvalue
λ and eigenvector v, then the real and imaginary parts of w(t) = eλt v are linearly
independent real-valued solutions of (7.17): x1 (t) = Re(w(t)) and x2 (t) = Im(w(t)).
Example 3. Find the general solution of
−1
x0 =
−2
2
x.
−1
Solution. We find the eigenvalues of A:
det(A − λI) =
−1 − λ
−2
2
= λ2 + 2λ + 5 = 0.
−1 − λ
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
217
Using the quadratic formula, we find λ = −1 ± 2i. Let us take λ1 = −1 + 2i and find
an associated eigenvector:
−2i
2
2
2i
1 i
1
A − λ1 I =
∼
∼
⇒ select v1 =
.
−2 −2i
−2 −2i
0 0
i
We now have a complex-valued solution
(−1+2i)t
w(t) = e
1
−t 2it 1
.
=e e
i
i
In order to find the real and imaginary parts of this solution, we need to use Euler’s
formula (see Appendix A):
e2it = cos 2t + i sin 2t.
Now we compute
x2
w(t) = e−t
cos 2t + i sin 2t
cos 2t
sin 2t
= e−t
+ i e−t
,
− sin 2t + i cos 2t
− sin 2t
cos 2t
x1
and take the real and imaginary parts to obtain two real-valued solutions:
cos 2t
−t
−t sin 2t
x1 (t) = e
and x2 (t) = e
.
− sin 2t
cos 2t
We conclude that the general solution is
cos 2t
sin 2t
x(t) = c1 e−t
+ c2 e−t
.
− sin 2t
cos 2t
We see that for any choice of c1 and c2 , x(t) → 0 as t → ∞, so the equilibrium at x = 0
is stable; in fact is a stable spiral point (see Figure 2).
2
Remark 1. When (λ, v) is a complex eigenpair, we should not call w(t) = eλt v a
“straight-line solution” since we have seen that the real-valued solutions do not move in
straight lines, but rather circle the equilibrium 0 in spirals or closed curves.
Defective Eigenvalues and Generalized Eigenvectors
Recall from Section 6.1, that a real eigenvalue may have algebraic multiplicity ma greater
than its geometric multiplicity mg ; in this case, the eigenvalue is called defective and
d = ma − mg is called its defect. When A has a defective eigenvalue, it fails to have
an eigenbasis, so we cannot use Theorem 1. However, we would still like to find the
general solution for (7.17). What can be done?
In answering this question, we shall encounter the following concept.
Definition 1. If A is a square matrix and p is a positive integer, then a nonzero
solution v of
(A − λ)p v = 0
(7.19)
is called a generalized eigenvector for the eigenvalue λ.
Fig.2. Phase Plane for Example 3.
218
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Of course, eigenvectors correspond to the special case p = 1 in (7.19). Moreover, if
(7.19) holds for p > 1, then it could hold for a smaller value of p; but if we let p0 denote
the smallest positive integer for which (7.19) holds, then v1 = (A − λ)p0 −1 v 6= 0 and
satisfies (A−λI)v1 = (A−λ)p0 v = 0, so v1 is an eigenvector for λ. In particular, we see
that generalized eigenvectors for an eigenvalue λ only exist when an actual eigenvector
exists. But when the eigenvalue is defective, generalized eigenvectors can be used to
find solutions of (7.17) by generalizing the definition (7.18).
Let us first discuss the case of an eigenvalue λ with ma = 2 and mg = 1. We have
one linearly independent eigenvector v which can be used to create a solution of (7.17):
⇒
Av = λv
x1 (t) = eλt v.
Recalling the case of multiple roots of the characteristic polynomial for a higher-order
differential equation, we might try multiplying this by t to create another solution; but
defining x2 (t) = t eλt v does not work. Before we give up on this idea, let us generalize
it and try
x2 (t) = eλt (tu + w),
where the vectors u and w are to be determined. Compute and compare x02 and Ax2 :
x02 = eλt (tλu + λw + u)
and Ax2 = eλt (tAu + Aw).
For these to be equal for all t, we require λu = Au and λw + u = Aw. In other words,
if we take u = v and w satisfies (A − λ)w = v, then x2 (t) is indeed a solution of (7.17)!
Moreover, we claim that x1 (t) and x2 (t) are linearly independent. To verify this, we
compute the Wronskian at t = 0 to find W [x1 , x2 ](0) = det( v w ), so we only need
show that v and w are linearly independent vectors. But
c1 v + c2 w = 0 ⇒ (A − λ)(c1 v + c2 w) = c2 v = 0
shows c2 = 0 and hence c1 = 0, confirming that v and w are linearly independent. With
a change in notation, let us summarize this analysis as follows
If λ is an eigenvalue for A with ma = 2 and mg = 1, then two linearly independent
solutions of (7.17) are
• x1 (t) = eλt v1 where v1 is an eigenvector for λ
• x2 (t) = eλt (tv1 + v2 ) where (A − λI)v2 = v1 .
Note that v2 is a generalized eigenvector for λ since (A − λI)2 v2 = 0 although
(A − λI)v2 6= 0. Below, we shall discuss why we can always solve (A − λI)v2 = v1 .
Example 4. Find two linearly independent solutions of
1 −2
0
x =
x.
2 5
Solution. We find the eigenvalues of the matrix A:
det(A − λI) =
1−λ
2
−2
= λ2 − 6λ + 9 = (λ − 3)2 ,
5−λ
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
219
so we have only one eigenvalue, namely λ = 3. Now we find its eigenvectors:
−2 −2
1 1
1
A − λI =
∼
⇒ select v1 =
.
2
2
0 0
−1
x2
We see that the eigenspace is one-dimensional so mg = 1, whereas we had ma = 2; so
λ = 3 is defective. Now we want to solve (A − λ)v2 = v1 :
−2
2
−2
2
1
−1
∼
1
0
1
0
−1/2
0
x1
implies that x2 = s is free and x1 = −1/2 − s, so we can choose s = 0 to obtain
Fig.3. Phase Plane for Ex-
−1/2
v2 =
.
0
ample 4.
We conclude that we have the two linearly independent solutions
1
1
−1/2
x1 (t) = e3t
and x2 (t) = e3t t
+
.
−1
−1
0
Both of these solutions grow exponentially as t → ∞, so the equilibrium at x = 0 is
unstable; in fact, it is a source (see Figure 3).
2
The above analysis naturally raise two questions:
• How did we know that we could solve (A − λI)v2 = v1 ?
• What happens if the defect d is greater than 1?
To address the first question, we change the logic. We first pick v2 to be a generalized
eigenvector, so that (A − λ)2 v2 = 0 but (A − λ)v2 6= 0. Then we define v1 = (A − λ)v2 .
We know v1 is nonzero and (A − λ)v1 = 0, so v1 is an eigenvector; we are done.
Let us use this logic to address the second question. We suppose v is a generalized
eigenvector for the eigenvalue λ and let p > 1 be the smallest integer such that (7.19)
holds, i.e.
(A − λI)p v = 0 and (A − λI)p−1 v 6= 0.
If we let
v1 := (A − λI)p−1 v,
(7.20)
then we see that (A − λI) annihilates v1 , i.e. (A − λI)v1 = 0, so v1 is a true eigenvector
for the eigenvalue λ. On the other hand,
v2 := (A − λI)p−2 v, . . . , vp := (A − λI)0 v = v,
are generalized eigenvectors because they are annihilated by powers of (A − λI):
(A − λI)2 v2 = 0, . . . , (A − λI)p vp = 0.
220
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
This sequence is called a chain of generalized eigenvectors of length p associated
with λ. Notice that the chain is based upon a true eigenvector v1 . Moreover, the chain
leads to the construction of p distinct solutions of (7.17) in the following way:
x1 (t) = eλt v1 ,
x2 (t) = eλt (tv1 + v2 ) ,
2
t
λt
v1 + tv2 + v3 ,
x3 (t) = e
2
..
.
p−1
tp−2
t
λt
v1 +
v2 + · · · + vp .
xp (t) = e
(p − 1)!
(p − 2)!
(7.21)
We can easily show x1 (t),. . . ,xp (t) are linearly independent, as we did for p = 2.
If mg = 1, then there is only one linearly independent eigenvector on which to
base a chain of generalized eigenvectors. But if mg > 1, then we may have multiple
chains associated with the same eigenvalue. However, it turns out that the sum of the
lengths of all chains of generalized eigenvectors associated with an eigenvalue λ is always
equal to its algebraic multiplicity ma . Moreover, if we have two chains of generalized
eigenvectors based upon linearly independent eigenvectors, then the union of all these
vectors is linearly independent. (These are both consequences of the “Jordan normal
form” for A; for more details, see [4].) If we follow the above prescription for the
construction of solutions of (7.17) from each chain of generalized eigenvectors, then we
obtain a collection of ma linearly independent solutions, as desired.
Let us illustrate this procedure with one more example.
Example 5. Find the general solution for

−3 0
x0 =  −1 −1
1
0

−4
−1  x.
1
Solution. Let us find the eigenvalues of the coefficient matrix A:
det(A − λI) =
−3 − λ
−1
1
0
−1 − λ
0
so λ = −1 is the only eigenvalue and ma = 3. Let

 
−2 0 −4
1 0
A − λI = A + I = −1 0 −1 ∼ 0 0
1 0 2
0 0
−4
−1 = −(1 + λ)3 ,
1−λ
us find its eigenvectors:
 

0
0
1 ⇒ select v1 = 1 .
0
0
We see that mg = 1 and so the defect is d = 2. We want to construct a chain of
generalized eigenvectors of length p = 3 for the eigenvalue λ = −1. Let us find the
vector v by solving (7.20), i.e. (A + I)2 v = v1 . But we compute

2 

0 0 0
−2 0 −4
(A + I)2 = −1 0 −1 = 1 0 2 ,
1 0 2
0 0 0
7.3. EIGENVALUE METHOD FOR HOMOGENEOUS SYSTEMS
221
so we can solve for v:

0
1
0
0
0
0

 
0
0
2 v = 1
0
0
 
1
⇒ select v = 0 .
0
We then take v2 = (A + I)v and v3 = v:

   
−2 0 −4
1
−2
v2 = −1 0 −1 0 = −1
1 0 2
0
1
 
1
and v3 = 0 .
0
Our three linearly independent solutions are
   
 
    
  
−2
1
0
0
−2
2 0
t
x1 (t) = e−t 1, x2 (t) = e−t t 1 + −1, x3 (t) = e−t  1 + t −1 + 0.
2
0
1
0
0
0
1
Exercises
1. Use Theorem 1 to obtain the general solution of the given first-order systems; if
initial conditions are given, find the particular solution satisfying them.
2 3
4 1
7
0
0
(a) x =
x;
(b) x =
x, x(0) =
; Sol’n
2 1
6 −1
0

 



1
9
4 0
5 0 −6
(d) x0 = −6 −1 0 x, x(0) =  0 .
(c) x0 = 2 −1 −2 x;
−1
6
4 3
4 −2 −4
2. The matrices in the following systems have complex eigenvalues; use Theorem 2
to find the general (real-valued) solution; if initial conditions are given, find the
particular solution satisfying them.
4 −3
1 −5
3
0
0
(a) x =
x;
(b) x =
x, x(0) =
; Sol’n
3 4
1 −1
−1




 
1 0
0
0 2 0
1
(c) x0 = 0 −1 −6 x;
(d) x0 = −2 0 0 x, x(0) = 2.
0 3
5
0 0 3
3
3. The matrices in the following systems have (some) defective eigenvalues; use generalized eigenvectors to find the general solution.
−2 1
3 −1
(a) x0 =
x;
(b) x0 =
x.
−1 −4
1 5
4. Find the general solution for the nonhomogeneous first-order system. If initial
conditions are given, also find the solution satisfying them.
t
3 4
1
6 −7
8e
1
0
0
(a) x =
x+
; (b) x =
x+
, x(0) =
; Sol’n
3 2
2
1 −2
0
0


 


 2t 
5 0 −6
3e
9
4 0
−1
(c) x0 = 2 −1 −2 x +  0 ;
(d) x0 = −6 −1 0 x +  4 .
4 −2 −4
0
6
4 3
−7
222
7.4
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Applications to Multiple Tank Mixing
Suppose that we have two tanks that contain solutions involving the same solute and
solvent (such as salt dissolved in water) but of different concentrations; the two solutions
are allowed to flow between the two tanks, changing the concentrations in each tank.
This will lead to a first-order linear system involving the rates of change of x1 (t), the
amount of solute in Tank 1 at time t, and x2 (t), the amount of solute in Tank 2. We
must solve this first-order linear system to find x1 and x1 .
Fig.1. An Open System involving Two Tanks
Let us consider one arrangement, illustrated in Figure 1, in which the solutions not
only flow between the two tanks, but there is inflow to Tank 1 and outflow from Tank
2; such a system that is open to the outside is called an open system. Suppose the
inflow has rate ri and concentration ci while the outflow has rate ro . In addition, the
solution in Tank 1 is pumped into Tank 2 at the rate r12 and the solution in Tank 2 is
pumped into Tank 1 at the rate r21 . The concentration of the solution in Tank 1, c1 (t),
is determined by x1 (t) and the volume of solution in Tank 1 at time t, which we denote
by V1 (t); similarly, the concentration of the solution in Tank 2, c2 (t), is determined by
x2 (t) and V2 (t):
x1 (t)
x2 (t)
c1 (t) =
and
c2 (t) =
.
V1 (t)
V2 (t)
Now let us find the equations for the rate of change of x1 (t) and x2 (t). After a moment’s
thought, it is easy to see that the following equations hold:
dx1
x1
x2
= ci ri −
r12 +
r21
dt
V1
V2
dx2
x1
x2
x2
=
r12 −
r21 −
ro .
dt
V1
V2
V2
If we introduce the vector function x(t) = (x1 (t), x2 (t)), then we recognize this as a
nonhomogeneous first-order system as in (7.12):
0 r12
− V1
x1
= r12
x2
V1
r21
x1
V2
−r21 −ro
x2
V2
+
ri ci
.
0
(7.22)
If we are also given initial conditions on x, then we can solve this system using the
techniques discussed in this section.
7.4. APPLICATIONS TO MULTIPLE TANK MIXING
223
Example 1. Suppose that each tank initially contains 2 grams of salt dissolved in 10
Liters of water, so
x1 (0) = 2 g = x2 (0)
and V1 (0) = 10 L = V2 (0).
Moreover, suppose the concentration and flow-rate parameters in (7.22) are
ci = 1 g/L, ri = 6 L/min, r12 = 8 L/min, r21 = 2 L/min, ro = 6 L/min.
Notice that both tanks have the same net in-flow and out-flow, so V1 (t) = 10 = V2 (t)
for all t. Consequently, the system in (7.22) with initial-conditions becomes
0 x1
−0.8
=
x2
0.8
0.2
x1
6
+
,
−0.8 x2
0
x1 (0)
2
=
.
x2 (0)
2
Since the matrix A is constant, we can apply the eigenvalue method to solve this
problem. We compute the eigenvalues of A:
−0.8 − λ
0.2
det
= (0.8 + λ)2 − 0.16 = 0 ⇒ λ = −0.8 ± 0.4.
0.8
−0.8 − λ
Let us compute an eigenvector for λ1 = −1.2:
0.4 0.2
2 1
∼
⇒
0.8 0.4
0 0
v1 =
Similarly, we compute an eigenvector for λ2 = −0.4:
−0.4 0.2
−2 1
∼
⇒
0.8 −0.4
0 0
1
.
−2
1
v2 =
.
2
The general solution of the homogeneous equation is
1
−1.2t
−0.4t 1
xh (t) = c1 e
+ c2 e
.
−2
2
Now we need a particular solution of the nonhomogeneous equation, so let us use undetermined coefficients:
A
−0.8A + 0.2B + 6 = 0
xp =
⇒
⇒ A = 10 = B.
B
0.8A − 0.8B = 0
We conclude that our general solution is
1
10
−1.2t
−0.4t 1
x(t) = c1 e
+ c2 e
+
.
−2
2
10
Now let us use the initial conditions to evaluate c1 and c2 :
c1 + c2 + 10 = 2
−2c1 + 2c2 + 10 = 2
⇒
c1 = −2
c2 = −6.
224
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Finally, we are able to write our solution as
x1 (t) = 10 − 2 e−1.2t − 6 e−0.4t
x2 (t) = 10 + 4 e−1.2t − 12 e−0.4t .
Recalling that the volume in each tank remains 10 L, we see that the concentration in
both tanks tends to 1 g/L as t → ∞, which matches the concentration of the in-flow. 2
Of course, the same ideas apply to mixing between more than two tanks, and will
lead to a larger system of first-order linear equations. For example, consider the system
of three tanks in Figure 2 in which the solution in Tank 1 is pumped into Tank 2, the
solution in Tank 2 is pumped into Tank 3, and the solution in Tank 3 is pumped into
Tank 1, all at the same rate r. In this system notice that no solution is added from
outside the system or removed from the system: for this reason, the system in Figure 2
is called closed.
Fig.2. A Closed System involving Three Tanks
For j = 1, 2, 3, we let xj denote the amount of solute and Vj denote the volume of solution in Tank j; notice that Vj is independent of t. We then find that the concentration
in Tank j is given by cj = xj /Vj , so the rates of change are given by
dx1
x1
x3
= −r
+r
= −k1 x1 + k3 x3
dt
V1
V3
dx2
x2
x1
= −r
+r
= k1 x1 − k2 x2
dt
V2
V1
dx3
x3
x2
= −r
+r
= k2 x2 − k3 x3
dt
V3
V2
where kj = r/Vj are constants. If we introduce the vector function x(t) = (x1 (t), x2 (t), x3 (t)),
then we can write this homogeneous system as


−k1
0
k3
0 x
x0 =  k1 −k2
(7.23)
0
k2 −k3
7.4. APPLICATIONS TO MULTIPLE TANK MIXING
225
Since the coefficient matrix is constant, we can use the methods of this chapter to find
the general solution of (7.23), and if we are given initial conditions then we can find the
particular solution.
Example 2. Suppose that r = 20 L/min, V1 = V2 = 40 L, and V3 = 100 L. Moreover,
suppose that initially, there is 18 grams of salt in Tank 1 and no salt in Tanks 2 and 3.
With these values, we find that the initial-value problem for (7.23) becomes


 
−0.5
0
0.2
18
0  x,
x0 =  0.5 −0.5
x(0) =  0  .
0
0.5 −0.2
0
We find the eigenvalues by solving

−0.5 − λ
0
−0.5 − λ
det  0.5
0
0.5

0.2
 = −λ(λ2 + 1.2 λ + 0.45) = 0,
0
−0.2 − λ
which yields one real and two complex eigenvalues
λ1 = 0,
λ2 = −0.6 + 0.3 i,
Now we compute an eigenvector for λ1
 

1
−0.5
0
0.2
 0.5 −0.5
0  ∼ 0
0
0
0.5 −0.2
λ3 = −0.6 − 0.3 i.
= 0 by
0
1
0

−0.4
−0.4
0
⇒
select
 
2
v1 = 2 .
5
Using (7.18), we obtain one solution that is a constant:
 
2
x1 (t) ≡ 2 .
5
For λ2 = −0.6 + 0.3 i, we compute a complex eigenvector by
 

1 0.2 − 0.6 i
0.1 − 0.3 i
0
0.2
 ∼ 0

1
0.5
0.1 − 0.3 i
0
0
0
0
0.5
0.4 − 0.3 i


−1 − 3 i
⇒ select v2 = −4 + 3 i .
5

0
0.8 − 0.6 i
0
This provides a complex-valued solution




−1 − 3 i
−1 − 3 i
w(t) = e(−0.6+0.3 i)t −4 + 3 i = e−0.6 t (cos .3 t + i sin .3 t) −4 + 3 i
5
5


− cos .3 t + 3 sin .3 t − i(3 cos .3 t + sin .3 t)
= e−0.6 t −4 cos .3 t − 3 sin .3 t + i(3 cos .3 t − 4 sin .3 t) ,
5 cos .3 t + i 5 sin .3 t
226
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
so we take real and imaginary parts to get two linearly independent real-valued solutions




− cos .3 t + 3 sin .3 t
−3 cos .3 t − sin .3 t
x2 (t) = e−0.6 t −4 cos .3 t − 3 sin .3 t and x3 (t) = e−0.6 t  3 cos .3 t − 4 sin .3 t  .
5 cos .3 t
5 sin .3 t
The general solution is
x(t) = c1 x1 (t) + c2 x2 (t) + c3 x3 (t).
If we evaluate this at t = 0 and use the initial condition, we obtain
2c1 − c2 − 3c3 = 18
2c1 − 4c2 + 3c3 = 0
⇒
c1 = 2, c2 = −2, c3 = −4,
5c1 + 5c2 = 0
and our final solution is


 
14 cos .3t − 2 sin .3t
4
x(t) =  4  + e−0.6 t  −4 cos .3t + 22 sin .3t  .
−10 cos .3t − 20 sin .3t
10
Notice that, as t → ∞, we have xj /Vj → 0.1 g/L for each j = 1, 2, 3, so the three tanks
will eventually have a uniform concentration of salt, as expected.
2
Exercises
1. As in Figure 1, suppose that a salt solution of 4 g/L is pumped into Tank 1 at
the rate of 3 L/min, the solution in Tank 2 is pumped out at the same rate, but
the solutions in the two tanks are mixed by pumping from Tank 1 to 2 at r12 = 4
L/min and from Tank 2 to 1 at r21 = 1 L/min. Suppose both tanks initially
contain 20 L of salt solution, but the solution in Tank 1 has 30 g of salt and the
solution in Tank 2 has 60 g of salt. Find the amount of salt in each tank at time
t. What happens to the concentration in both tanks as t → ∞?
2. Suppose we have a closed system of two tanks with flow between them of 3 L/min
in each direction. Initially both tanks contain 15 grams of a certain chemical, but
in Tank 1 it is dissolved in 6 Liters of water and in Tank 2 it is dissolved in 12
Liters of water. Find the amounts of the chemical in each tank at time t.
3. For the closed system with three tanks as in Figure 2, suppose that the rate of flow
is r = 120 L/min, and the volumes of the three tanks are V1 = 20, V2 = 6, and
V3 = 40 Liters. Assume initially that Tank 1 contains 37 g salt, Tank 2 contains
29 g salt, but there is no salt in Tank 3. Find the amounts of salt in each tank,
x1 , x2 , and x3 , and time t.
4. Three 10 L tanks are arranged as in Figure 3. At t = 0 all tanks are full of water
in which the following amounts of salt are dissolved: Tank 1 has 10 g, Tank 2 has
5 g, and Tank 3 has no salt. A salt solution of 1 g/L begins to flow into Tank 1
at 2 L/min and the mixed solution flows out of Tank 2 at the same rate; the flow
between the tanks is 1 L/min. Find the amount of salt in each tank at time t.
Fig.3. Exercise 4.
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
7.5
227
Applications to Mechanical Vibrations
In this section we use first-order systems to study mechanical vibrations. Let us begin with the simple spring-mass-dashpot vibration that we studied in Chapter 2 using
second-order equations. Recall (Example 1 in Section 7.1) that introducing x = (x, x0 )
enables us to replace the second-order equation with the first-order system
0
1
x0 = Ax, where A =
.
(7.24)
−k/m −c/m
Let us compute the eigenvalues for A:
det(A − λI) =
−λ
−k/m
c
k
1
= λ2 + λ + .
−c/m − λ
m
m
So the eigenvalues of A are given by
λ=
−c ±
√
c2 − 4mk
.
2m
As we found in Section 2.4, the behavior of the solutions is heavily dependent on the
sign of c2 − 4mk.
In the overdamped case c2 − 4mk > 0, A has two distinct real eigenvalues, both
negative:
√
√
−c + c2 − 4mk
−c − c2 − 4mk
< λ2 =
< 0.
λ1 =
2m
2m
We know that each eigenvalue has a one-dimensional eigenspace, so we can find an
eigenbasis {v1 , v2 } for R2 . The general solution in this case is
x(t) = c1 eλ1 t v1 + c2 eλ2 t v2 .
We see that all solutions decay without oscillation to zero as t → ∞, just as we found
in Section 2.4. (Recall that x(t) is simply the first component of x.)
In the critically damped case c2 − 4mk = 0, we have one (double) eigenvalue λ =
−c/2m. We leave it as an exercise (Exercise 1) to show that this eigenvalue is defective
and to find the general solution.
In the underdamped case c2 − 4mk < 0, A has two complex conjugate eigenvalues:
√
−c
4mk − c2
λ± =
± iµ, where µ =
.
2m
2m
The associated eigenvectors v± also occur in complex conjugate pairs, yielding linearly
independent complex solutions w(t) = eλt v and w(t) = eλt v (where, say, λ = λ+ ).
However, as we saw in Section 7.3, we can obtain real-valued solutions by taking the
real and imaginary parts of one of these:
c
x1 (t) = Re(eλt v) = e− 2m t Re(eiµ v)
c
x2 (t) = Im(eλt v) = e− 2m t Im(eiµ v).
Fig.1.
Spring-mass-dashpot
system
228
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
Recalling Euler’s formula eiµ = cos µ + i sin µ, we see that both x1 and x2 oscillate as
c
they decay to zero (due to the factor e− 2m t ), just as we found in Section 2.4.
Example 1. Let us consider m = 0.5 kg, c = 0.2 N/(m/sec), and k = 2 N/m, as in
Example 2 of Section 2.4. Then k/m = 4 and c/m = 0.4, so
√
0
1
A=
with eigenvalues λ± = −0.2 ± i 3.96.
−4 −0.4
√
Using µ = 3.96 and λ = −0.2 + iµ, we find
0.2 − iµ
1
1 .05 + i µ4
A − λI =
,
∼
0
0
−4
−0.2 − iµ
so the eigenspace is t(−.05 − i µ4 , 1) where t is a free complex parameter. Let us choose
t = −(.05 + i µ4 )−1 so that the resultant eigenvector v has first component v1 = 1. We
let
1
λt
−0.2t
w(t) = e v = e
(cos µt + i sin µt)
,
v2
and then take real and imaginary parts to obtain
−0.2t cos µt
−0.2t sin µt
x1 (t) = e
and x2 (t) = e
,
y1 (t)
y2 (t)
where y1 (t) and y2 (t) oscillate with frequency µ. The general solution of our first-order
system is x(t) = c1 x1 (t) + c2 x2 (t). But if we are only interested in the position function
x(t), we just consider the first component of x(t) to obtain
x(t) = c1 e−0.2t cos µt + c2 e−0.2t sin µt,
exactly as we found in Section 2.4.
2
Coupled Mechanical Vibrations
Now let us consider a more complicated system in which two masses m1 and m2 are
attached to three springs with spring constants k1 , k2 , and k12 as in Figure 2; we also
assume mi is subject to a linear damping force with coefficient ci . Let x denote the
displacement of m1 from its equilibrium position and y denote the displacement of m2
from its equilibrium (both measured from left to right). Then Newton’s law implies
that the following second-order equations hold (cf. Exercise 2):
Fig.2. Coupled
spring-mass system
m1 x00 = −k1 x + k12 (y − x) − c1 x0
m2 y 00 = −k2 y − k12 (y − x) − c2 y 0
(7.25)
We can express (7.25) as a first-order system by introducing x = (x1 , x2 , x3 , x4 ), where
x1 = x, x2 = x0 , x3 = y, and x4 = y 0 . Then the two second-order equations can be
replaced by a first-order system:
x01 = x2
k1 + k12
c1
k12
x1 −
x2 +
x3
x02 = −
m1
m1
m1
x03 = x4
k12
k12 + k2
c2
x04 =
x1 −
x3 −
x4 .
m2
m2
m2
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
To simplify our equations, let us now assume c1 =
system can be written in matrix form:


0
1
0
0
−q1 0 q2 0
 x, where
x0 = 
 0
0
0
1
q3 0 −q4 0
229
c2 = 0. In this case, our first-order

q1 = (k1 + k12 )/m1



q2 = k12 /m1
.
 q3 = k12 /m2


q4 = (k12 + k2 )/m2
(7.26)
If we calculate the characteristic polynomial of the (4×4)-matrix, we find its eigenvalues
satisfy
(λ2 + q1 )(λ2 + q4 ) − q2 q3 = 0
(7.27)
It turns out that there are no real eigenvalues: there will be four complex eigenvalues
that occur in two complex conjugate pairs ± i ω1 and ± i ω2 (see Exercise 3). The
quantities ω1 and ω2 appear as circular frequencies in the general solution, and hence
are called the natural frequencies of the coupled system. Let us consider an example.
Example 2. Suppose m1 = 2, m2 = 1, k1 = 2, k2 = 1, and k12 = 2 in (7.26). Let
us determine the natural frequencies of the system and find the general solution. We
calculate q1 = 2 = q3 , q2 = 1, and q4 = 3, and use (7.27) to find the eigenvalues:
(λ2 + 2)(λ2 + 3) − 2 = λ4 + 5λ2 + 4 = (λ2 + 1)(λ2 + 4) = 0
With λ1 = i we find

−i 1
−2 −i
A − iI = 
0
0
2
0
0
1
−i
−3
 
1 0 0
0
0 1 0
0
∼
1  0 0 1
0 0 0
−i

i
−1

i 
0
⇒
⇒
λ = ±i, ±2i.
v1 = (−i, 1, −i, 1).
(We can find v2 = v1 for λ2 = −i, but we won’t need it.) The eigenpair (λ1 , v1 )
produces the complex-valued solution


sin t − i cos t
cos t + i sin t

w1 (t) = eit v1 = (cos t + i sin t)v1 = 
sin t − i cos t ,
cos t + i sin t
and we take the real and imaginary parts to obtain two linearly independent solutions




sin t
− cos t
cos t
 sin t 



x1 (t) = 
 sin t  and x2 (t) = − cos t .
cos t
sin t
With λ = 2i we find

−2i
1
 −2 −2i
A − 2i I = 
 0
0
2
0
0
1
−2i
−3
 
0
1
0
0 
∼
1  0
−2i
0
i/2
1
0
0

0 0
i 0 

1 i/2
0 0
⇒
v3 = (i, −2, −2i, 4).
230
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
This produces the complex-valued solution


− sin 2t + i cos 2t
−2 cos 2t − 2i sin 2t

w3 (t) = e2it v3 = (cos 2t + i sin 2t) v3 = 
 2 sin 2t − 2i cos 2t 
4 cos 2t + 4i sin 2t
and we take the real and imaginary parts to obtain two linearly independent solutions




cos 2t
− sin 2t
 −2 sin 2t 
−2 cos 2t



x3 (t) = 
 2 sin 2t  and x4 (t) = −2 cos 2t .
4 sin 2t
4 cos 2t
We see that the natural frequencies of the system are ω1 = 1 and ω2 = 2, and the
general solution of the first-order system is x = c1 x1 + c2 x2 + c3 x3 + c4 x4 . However, if
we only want the position functions x(t) and y(t), then we pick out the first and third
components of x:
x(t) = c1 sin t − c2 cos t − c3 sin 2t + c4 cos 2t
y(t) = c1 sin t − c2 cos t + 2c3 sin 2t − 2c4 cos 2t.
Note that this is oscillatory about the equilibrium solution (0, 0, 0, 0), so the equilibrium
is a center.
2
We can also consider coupled mechanical vibrations with external forces, which leads
to a nonhomogeneous first-order system. When there is no damping and the external
force is periodic with forcing frequency equal to one of the natural frequencies, then we
encounter resonance. However, this analysis is simplified if we use second-order systems,
which we discuss next.
Undamped, Forced Vibrations as Second-Order Systems
When the coupled second-order equations (7.25) do not involve damping (i.e. when c1 =
c2 = 0), then it is actually simpler to treat the equations as a second-order system;
this is particularly advantageous if we also have an external force f (t) = (f1 (t), f2 (t)).
In fact, the two second-order equations with the forcing terms added
m1 x00 = −(k1 + k12 ) x + k12 y + f1 (t)
m2 x00 = k12 x − (k2 + k12 ) y + f2 (t)
(7.28)
can be represented in matrix form as
x
x=
,
y
M=
m1
0
M x00 = K x + f (t) where
0
−k1 − k12
k12
, K=
,
m2
k12
−k2 − k12
(7.29)
f1 (t)
f (t) =
.
f2 (t)
The diagonal matrix M is called the mass matrix and the symmetric matrix K is
called the stiffness matrix ; note that these are (2 × 2)-matrices and simpler than the
(4 × 4)-matrix in (7.26).
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
231
To analyze (7.29), we multiply on the left by
−1
m1
0
M−1 =
0
m−1
2
to obtain
x00 = Ax + g(t)
−1
A=M
K
where
−1
and g(t) = M
(7.30)
f (t).
Using our experience with second-order linear equations, it is not difficult to show that
the general solution of (7.30) is
x(t) = xp (t) + xh (t),
(7.31)
where xp (t) is a particular solution of (7.31) and xh (t) is the general solution of the
homogenous second-order system
x00 = A x.
(7.32)
Since (7.32) is a second-order (2 × 2)-system, we expect four linearly independent
solutions, but how do we find them? From our experience with first-order homogenous
systems, we try solutions in the form
x(t) = eµt v,
(7.33)
where µ is a (possibly complex) scalar and v is a fixed vector. If we plug (7.33) into
(7.32), we obtain
eµt µ2 v = eµt A v ⇒ µ2 v = A v,
in other words, (µ2 , v) is an eigenpair for A. If A has a negative eigenvalue −ω12 (where
ω1 > 0) with real eigenvector v1 , then we can take µ = iω1 and obtain a complex-valued
solution of (7.32):
x(t) = eiω1 t v1 = (cos ω1 t + i sin ω1 t) v1 .
Taking real and imaginary parts, we obtain two linearly independent solutions of (7.32):
x1 (t) = (cos ω1 t) v1
and x2 (t) = (sin ω1 t) v1 .
If A has another negative eigenvalue − ω22 with ω1 6= ω2 , then we can repeat this process
and obtain two more solutions that are linearly independent of each other and the two
solutions above; taking a linear combination of these gives us the general solution of
(7.32). This argument can be modified to cover λ = 0 (see Exercise 7). Generalizing
this process to (n × n)-matrices A, we have the following theorem:
Theorem 1. If A is a real (n × n)-matrix with negative eigenvalues λ1 = −ω12 > · · · >
λn = −ωn2 and associated eigenvectors v1 , . . . , vn , then the general solution of (7.32) is
given by
n
X
x(t) =
(ai cos ωi t + bi sin ωi t) vi ,
i=1
where ai and bi are arbitrary constants. If λ1 = −ω12 is replaced by λ1 = 0 then
(a1 cos ω1 t + b1 sin ωi t) v1 should be replaced by (a1 + b1 t)v1 .
232
CHAPTER 7. SYSTEMS OF FIRST-ORDER EQUATIONS
How do we find a particular solution xp of (7.31)? Based upon our experience with
second-order equations and first-order systems, we shall use the method of undetermined
coefficients. As in Section 2.6, let us consider the case of a periodic forcing function
f (t), which in (7.30) means that we want
g(t) = (cos ωt) g0
(7.34)
where ω is the forcing frequency and g0 is a fixed vector. Let us try as our trial solution
xp (t) = (cos ωt) u,
where the constant vector u is to be determined. But
x00p (t) = (−ω 2 cos ωt) u
and Axp (t) = (cos ωt) Au,
so u should satisfy
(A + ω 2 I) u = −g0 .
(7.35)
Notice that (7.35) has a unique solution u provided ω is not an eigenvalue of A, i.e.
provided ω is not one of the natural frequencies of the system. On the other hand, if ω
is one of the natural frequencies of the system, then we encounter resonance.
Example 3. Suppose m1 = 2, m2 = 1, k1 = 2, k2 = 1, and k12 = 2 as in Example 2,
but now let us also assume that mass m2 is subject to a periodic force f2 (t) = cos ωt.
Let us find the general solution using a second-order system. In this case (7.29) becomes
2 0 00
−4 2
0
x =
x+
,
0 1
2 −3
cos ωt
and after multiplying by M−1 we obtain for (7.30)
−2 1
0
00
x =
x+
.
2 −3
cos ωt
Now we compute the eigenvalues and eigenvectors of A to find λ1 = −1 with eigenvector
v1 = (1, 1) and λ2 = −4 with eigenvector v2 = (1, −2). So we take ω1 = 1 and ω2 = 2
and obtain our four linearly independent solutions of the homogenous equation:
1
1
1
1
x1 (t) = sin t
, x2 (t) = cos t
, x3 (t) = sin 2t
, x4 (t) = cos 2t
.
1
1
−2
−2
To find xp we want to solve (7.35) with g0 = (0, 1):
−2 + ω 2
1
0
u=
.
2
−3 + ω 2
−1
We can solve this for u using the inverse matrix
−2 + ω 2
2
1
−3 + ω 2
−1
=
2
1
ω −3
−2
(ω 2 − 1)(ω 2 − 4)
−1
,
ω2 − 2
7.5. APPLICATIONS TO MECHANICAL VIBRATIONS
233
provided ω 6= 1, 2. We find
−1 1
−2 + ω 2
1
0
1
u=
= 2
,
2
−3 + ω 2
−1
(ω − 1)(ω 2 − 4) 2 − ω 2
and consequently the particular solution is
xp (t) =
cos ωt
1
.
(ω 2 − 1)(ω 2 − 4) 2 − ω 2
(7.36)
Putting this together, we obtain the general solution
cos ωt
1
1
x(t) = 2
+
2 + c1 sin t
2
2
−
ω
1
(ω − 1)(ω − 4)
1
1
1
c2 cos t
+ c3 sin 2t
+ c4 cos 2t
.
1
−2
−2
Of course, we needed to assume that the forcing frequency ω is not equal to either of the
natural frequencies ω1 = 1 and ω2 = 2 in order for xp to be defined in (7.36). Moreover,
for ω close to either of the natural frequencies we see that amplitude of the oscillations
of xp become very large; this is resonance.
2
Exercises
1. In the critically damped case c2 − 4mk = 0, show that the eigenvalue λ = −c/2m
for (7.24) is defective and find the general solution. Compare this with the result
for critically damped vibrations in Section 2.4.
2. Explain how Newton’s law implies that (7.25) holds.
3. For the parameters q1 , . . . , q4 satisfying (7.26), show that (7.27) always has four
roots of the form ± i ω1 , ± i ω2 where ω1 , ω2 > 0.
Fig.3. Exercise 4.
4. Suppose that masses m1 and m2 are only connected by two springs as in Figure
3. If m1 = 2, m2 = 1, k1 = 4 and k12 = 2, find the natural frequencies and the
general solution x(t) of the first-order system (7.26).
5. Replace the spring with constant k2 in (7.25) with a dashpot having damping
coefficient c2 > 0; see Figure 4. Find the first-order system that replaces (7.26).
6. Consider the two spring configuration of Exercise 4, but add an external force
of cos ωt that acts on m2 . For what forcing frequencies ω does resonance occur?
Assuming ω is nonresonant, find the general solution x(t) of the second-order
system (7.29).
Fig.4. Exercise 5.
7. If the (n × n)-matrix A has eigenvalue λ = 0 with (nonzero) eigenvector v, show
that two linearly independent solutions of x″ = Ax are x1 (t) = v and x2 (t) = tv.
8. Consider three masses m1 , m2 , and m3 connected by two springs as in Figure 5.
Suppose m1 = 3 = m3 , m2 = 2, and k12 = 12 = k23 . Use a second-order system
to find the general solution.
9. Consider the three mass and two spring system in Exercise 8, but add an external
force of 12 cos 3t acting on m2 . Find the general solution.
Fig.5. Exercise 8.
7.6 Additional Exercises
1. Consider the vector-valued functions x1 (t) = (t, t2 ) and x2 (t) = (t2 , t3 ). Show
that there is no first-order system x′ = A(t) x with A(t) continuous in t for which
x1 and x2 are both solutions.
2. Suppose A is a constant (n × n)-matrix and x′ = Ax admits a constant solution x(t) ≡ x0 where x0 ≠ 0. Can you identify one of the eigenvectors of A and its
eigenvalue?
3. If A is an (n × n)-matrix, then so is tA and we can define e^{tA} as in Section 6.2:
   e^{tA} = I + tA + (tA)²/2! + · · ·
If v is any vector, show that the unique solution of x′ = Ax, x(0) = v is given by x(t) = e^{tA} v.
4. If A is a diagonalizable (n×n)-matrix so that A = E D E⁻¹ where E is the matrix of eigenvectors for A and D = Diag(λ1, . . . , λn), show that
   e^{tA} = E Diag(e^{tλ1}, . . . , e^{tλn}) E⁻¹.
5. For the first-order system, compute the matrix exponential e^{tA} discussed in the previous two exercises and use it to find the solution of the initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 2 & -1 \\ -4 & 2 \end{pmatrix}\mathbf{x}, \qquad \mathbf{x}(0) = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$
6. For the first-order system, compute the matrix exponential e^{tA} and use it to find the solution of the initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} -3 & -2 \\ 9 & 3 \end{pmatrix}\mathbf{x}, \qquad \mathbf{x}(0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Consider two masses m1 and m2 connected to two walls by dashpots with respective
damping coefficients c1 and c2 , and connected to each other by a spring with constant
k; see Figure 1. If x and y denote the displacements from the equilibrium positions of
m1 and m2 respectively, then the following second-order equations hold:
m1 x″ = −c1 x′ − k(x − y)
m2 y″ = −c2 y′ − k(y − x).    (7.37)
Fig.1. Exercise 7.
7. (a) Let c1 = c2 = 1, m1 = m2 = 1, and k = 1. Using x1 = x, x2 = x′, x3 = y, and x4 = y′, replace the second-order system (7.37) by a first-order system.
(b) Find the general solution of the first-order system.
(c) Would you describe this system as overdamped, underdamped, or critically
damped?
(d) Do all solutions satisfy x(t), y(t) → 0 as t → ∞? What is the significance of
the zero eigenvalue?
8. (a) Let c1 = c2 = 3, m1 = m2 = 1, and k = 1. Using x1 = x, x2 = x′, x3 = y, and x4 = y′, replace the second-order system (7.37) by a first-order system.
(b) Find the general solution of the first-order system.
(c) Would you describe this system as overdamped, underdamped, or critically
damped?
(d) Do all solutions satisfy x(t), y(t) → 0 as t → ∞? What is the significance of
the zero eigenvalue?
Two railroad cars of masses m1 and m2 collide on a railroad track. At the time of
contact, t = 0, suppose m1 is moving with velocity v0 but m2 is stationary. However,
there is a buffer spring with constant k that prevents the cars from actually hitting each
other. Let x(t) denote the displacement of m1 and y(t) denote the displacement of m2
after contact. As long as contact is maintained, they satisfy the following system:
m1 x″ = −k(x − y),    x(0) = 0, x′(0) = v0,
m2 y″ = k(x − y),    y(0) = 0, y′(0) = 0.    (7.38)
Of course, at some t∗ > 0 the cars will separate and will continue with constant velocities
v1 and v2 for t > t∗ , but it is not immediately clear whether v1 and v2 are both positive.
9. Let m1 = 3, m2 = 2, k = 6, and v0 = 20.
(a) Find the solution of (7.38) for 0 < t < t∗ .
(b) Find the value of t∗ .
(c) What are the velocities v1 and v2 for t > t∗ ?
(d) Verify that momentum and energy are both conserved (i.e. check their values
for t < 0 and t > t∗ ).
10. Let m1 = 3, m2 = 4, k = 12, and v0 = 35.
(a) Find the solution of (7.38) for 0 < t < t∗ .
(b) Find the value of t∗ .
(c) What are the velocities v1 and v2 for t > t∗ ?
(d) Verify that momentum and energy are both conserved (i.e. check their values
for t < 0 and t > t∗ ).
Fig.2. Exercise 9.
When A is a diagonalizable (n × n)-matrix, we can use a change of variable to solve the nonhomogeneous system x′ = Ax + f(t). Let E denote the matrix of eigenvectors for A so that (see Section 6.2)
E⁻¹AE = D = Diag(λ1, . . . , λn)
where λ1, . . . , λn are the eigenvalues for A. Letting y = E⁻¹x yields (see Exercise 11) a first-order system for y:
y′ = D y + g(t)    where g = E⁻¹f.    (7.39)
The advantage of (7.39) is that it has been decoupled into n linear equations y′j = λj yj + gj(t), which can be solved (see Exercise 12) by an integrating factor. The solution of the original system is then obtained by computing x = Ey.
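A brief SymPy sketch of this change of variables follows; the 2 × 2 matrix A, the forcing f(t), and the constants c1, c2 are made-up placeholders for illustration only (they are not taken from the exercises).

```python
# A hedged SymPy sketch of the decoupling method above; A, f(t), c1, c2 are
# placeholders, not data from the exercises.
import sympy as sp

t, s = sp.symbols('t s')
c1, c2 = sp.symbols('c1 c2')

A = sp.Matrix([[1, 2], [2, 1]])     # assumed diagonalizable coefficient matrix
f = sp.Matrix([sp.exp(t), 0])       # assumed forcing term f(t)

E, D = A.diagonalize()              # columns of E are eigenvectors; D is diagonal
g = sp.simplify(E.inv() * f)        # g = E^{-1} f, as in (7.39)

# Each decoupled equation y_j' = lambda_j y_j + g_j(t) is solved by an
# integrating factor (compare Exercise 12).
consts = [c1, c2]
y = sp.Matrix([
    sp.exp(D[j, j]*t) * (sp.integrate(sp.exp(-D[j, j]*s) * g[j].subs(t, s), (s, 0, t))
                         + consts[j])
    for j in range(2)
])

x = sp.simplify(E * y)              # back to the original unknowns: x = E y
print(x)
print(sp.simplify(x.diff(t) - A*x - f))   # residual should be the zero vector
```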
11. Show that letting x = E y in x′ = Ax + f(t) yields (7.39).
12. Show that the system (7.39) with initial condition y(0) = b can be solved to find
$$y_j(t) = e^{\lambda_j t}\int_0^t e^{-\lambda_j s}\,g_j(s)\,ds + b_j\,e^{\lambda_j t}.$$
13. Use the method described above to solve the following initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\mathbf{x} + \begin{pmatrix} \sin t \\ 0 \end{pmatrix}, \qquad \mathbf{x}(0) = \begin{pmatrix} 2 \\ 0 \end{pmatrix}.$$
14. Use the method described above to solve the following initial-value problem:
$$\mathbf{x}' = \begin{pmatrix} 4 & -2 \\ 3 & -1 \end{pmatrix}\mathbf{x} + \begin{pmatrix} t \\ 2t \end{pmatrix}, \qquad \mathbf{x}(0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Appendix A
Complex Numbers
A complex number is of the form
z = a + b i,
(A.1)
where a and b are real numbers and i has the property that i² = −1. (In some books, the letter j is used in place of i.) The number a in (A.1) is called the real part of z, denoted a = Re(z), and b is called the imaginary part of z, denoted b = Im(z). Complex numbers can be added or multiplied together (using i² = −1) to again obtain a complex number.
Example 1. If w = 2 − i and z = 1 + 3i, find Re(w), Im(z), w + z, and wz.
Solution. Re(w) = 2, Im(z) = 3, w + z = 2 + 1 − i + 3i = 3 + 3i, and
wz = (2 − i)(1 + 3i) = 2 − i + 6i − 3i² = 2 + 5i + 3 = 5 + 5i.
2
Fig.1. The complex plane
If we identify the complex number z = x + y i with the point (x, y), we can represent complex numbers as points in the complex plane. In the complex plane, the
horizontal axis (or “x-axis”) is called the real axis and the vertical axis (or “y-axis”) is
called the imaginary axis. The addition of two complex numbers can be performed
in the complex plane simply by adding their coordinates, but multiplying two complex
numbers looks complicated in the complex plane; we shall soon see how to make sense
of it as well.
If z is a complex number, its complex conjugate, denoted z̄, is obtained by changing the sign of the imaginary part:
z = a + bi   ⇒   z̄ = a − b i.    (A.2)
If we take the product of z and z̄, we obtain the sum of the squares of its real and
imaginary parts:
z = a + bi   ⇒   z z̄ = (a + b i)(a − b i) = a² + b².
Since z z̄ is a nonnegative real number, we may take its square root and obtain a nonnegative real number called the modulus of z and denoted by |z|:
z = a + bi   ⇒   |z|² = z z̄ = a² + b².    (A.3)
Fig.2. The complex conjugate of a complex number
Note that the complex conjugate and modulus can be represented geometrically: if
z = x + y i is plotted in the complex plane, then z̄ = x − y i is just its reflection in the
horizontal axis, and |z| is the distance of the point (x, y) to the origin in the complex
plane.
Example 2. If z = −1 + 3i, find the complex conjugate z̄ and the modulus |z|.
Represent these geometrically in the complex plane.
Solution. We find z̄ = −1 − 3i and |z| = √((−1)² + 3²) = √10. We represent them in the complex plane in Figure 2.
2
In addition to multiplying complex numbers, we want to be able to divide by a
(nonzero) complex number. In particular, for a nonzero complex number z = a + bi, we
want to realize 1/(a + bi) in the form (A.1). This can be achieved by multiplying the
numerator and denominator by z̄ = a − bi:
$$\frac{1}{a+bi} = \frac{1}{a+bi}\cdot\frac{a-bi}{a-bi} = \frac{a-bi}{a^2+b^2} = \frac{a}{a^2+b^2} + \frac{-b}{a^2+b^2}\,i. \tag{A.4}$$
Now we can perform the division w/z simply by multiplying w and 1/z.
Example 3. If w = 2 − i and z = −1 + 3i, find z⁻¹ and w/z.
Solution.
$$z^{-1} = \frac{1}{-1+3i} = \frac{1}{-1+3i}\cdot\frac{-1-3i}{-1-3i} = -\frac{1}{10} - \frac{3}{10}\,i,$$
$$\frac{w}{z} = w\,z^{-1} = (2-i)\Bigl(-\frac{1}{10} - \frac{3}{10}\,i\Bigr) = -\frac{1}{2} - \frac{1}{2}\,i.$$
2
There is another way of writing complex numbers. The polar form is
z = r e^{iθ} = r (cos θ + i sin θ),    (A.5)
where r = |z| is the modulus of z and θ is the angle that z makes with the real axis in the complex plane; θ is called the argument of z and takes values in [0, 2π). Behind (A.5) is Euler’s formula, which states
e^{iθ} = cos θ + i sin θ.    (A.6)
Fig.3. The modulus and argument of a complex number
We can derive (A.6) by using complex numbers in the familiar power series
$$e^x = 1 + x + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty}\frac{x^n}{n!}, \qquad \cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \cdots, \qquad \sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \cdots \tag{A.7}$$
In fact, replacing x by iθ in (A.7) and collecting terms involving i, we find
$$e^{i\theta} = 1 + i\theta + \frac{1}{2!}(i\theta)^2 + \frac{1}{3!}(i\theta)^3 + \frac{1}{4!}(i\theta)^4 + \cdots = \Bigl(1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots\Bigr) + i\,\Bigl(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots\Bigr) = \cos\theta + i\sin\theta, \tag{A.8}$$
which is just (A.6). We can use (A.6) to express cos θ and sin θ as complex exponentials:
$$\cos\theta = \frac{e^{i\theta}+e^{-i\theta}}{2} \qquad\text{and}\qquad \sin\theta = \frac{e^{i\theta}-e^{-i\theta}}{2i}. \tag{A.9}$$
Moreover, using (A.5) we see that the product of two complex numbers is given by the
product of their moduli and the sum of their arguments:
$$z_1 = r_1 e^{i\theta_1},\ z_2 = r_2 e^{i\theta_2} \quad\Longrightarrow\quad z_1 z_2 = r_1 r_2\, e^{i(\theta_1+\theta_2)}. \tag{A.10}$$
This provides the geometric interpretation of multiplication that we referred to above.
Example 4. Let z1 = 1 + √3 i and z2 = −1 + √3 i. Express both z1 and z2 in polar form, and use them to illustrate the validity of the product rule (A.10).
Solution. Both z1 and z2 have modulus √(1 + 3) = 2. To find the argument θ1 for z1, we write
1 + √3 i = 2e^{iθ1} = 2(cos θ1 + i sin θ1).
So cos θ1 = 1/2 and sin θ1 = √3/2. We see that θ1 is in the 1st quadrant and tan θ1 = √3, so θ1 = π/3. To find the argument θ2 for z2 we write
−1 + √3 i = 2e^{iθ2} = 2(cos θ2 + i sin θ2).
So cos θ2 = −1/2 and sin θ2 = √3/2. We see that θ2 is in the 2nd quadrant and tan θ2 = −√3, so θ2 = 2π/3. We conclude that the polar forms are
z1 = 2 e^{iπ/3}   and   z2 = 2 e^{i 2π/3}.
Let us compute the product z1 z2 two ways:
z1 z2 = (1 + √3 i)(−1 + √3 i) = −4   and   z1 z2 = (2 e^{iπ/3})(2 e^{i 2π/3}) = 4 e^{iπ}.
Since e^{iπ} = −1, we have the same answer, as asserted by (A.10).
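For readers who like to check such arithmetic numerically, here is a small Python sketch (not part of the text) that redoes Example 4 with the standard library's cmath module; note that cmath reports arguments in (−π, π] rather than [0, 2π).

```python
# Sketch (not from the text): checking Example 4 with Python's cmath.
import cmath
import math

z1 = 1 + math.sqrt(3)*1j
z2 = -1 + math.sqrt(3)*1j

r1, t1 = cmath.polar(z1)        # expect (2, pi/3)
r2, t2 = cmath.polar(z2)        # expect (2, 2*pi/3)
print(r1, t1, r2, t2)

# Product two ways, as in the example:
print(z1*z2)                                 # (-4+0j), up to rounding
print(cmath.rect(r1*r2, t1 + t2))            # same point, via the rule (A.10)
```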
Exercises
1. For the following complex numbers z, find z̄ and |z|.
(a) 1 + 2i,
(b) 2 − 3i,
(c) −3 + 4i,
(d) 5 − 7i
2. For the following complex numbers w and z, find wz and w/z.
(a) w = 3 + 4i, z = 5i; (b) w = 1 + 2i, z = 1 − 2i; (c) w = 2 − 3i, z = 1 + i; (d) w = 4 + i, z = 1 + 4i.
3. Express the following complex numbers in polar form.
(a) −3i, (b) −3 − 3i, (c) 1 − i√3, (d) −3 + 4i
4. Use (A.6) to prove (A.9).
2
Fig.4. The geometric interpretation of complex multiplication
Appendix B
Review of Partial Fractions
In high school algebra, we learn to put fractions over a common denominator. This
applies not just to numbers like
$$\frac{1}{2} + \frac{1}{3} = \frac{5}{6},$$
but to rational functions, i.e. quotients of polynomials:
$$\frac{1}{x+1} + \frac{1}{x+2} = \frac{2x+3}{(x+1)(x+2)}. \tag{B.1}$$
However, when we want to integrate a complicated rational function, we often want
to “go the other way” and separate it into a sum of simpler rational functions. The
resulting sum is called a partial fraction decomposition (PFD). For example, if we
started with the rational function on the right in (B.1), the expression on the left is its
partial fraction decomposition.
In general, suppose we have a rational function
$$R(x) = \frac{p(x)}{q(x)} \qquad\text{where } \deg p < \deg q. \tag{B.2}$$
(Recall that if we do not have deg p < deg q, then we can perform polynomial division to
achieve this.) Now we factor q(x) into its linear and irreducible quadratic terms. (Recall
that “irreducible quadratic” means it does not contain any real linear factors, so x² + 1 is irreducible but x² − 1 is not irreducible since it can be factored as (x − 1)(x + 1).)
• Each factor of q(x) in the form (ax − b)^k contributes the following to the PFD:
$$\frac{A_1}{ax-b} + \frac{A_2}{(ax-b)^2} + \cdots + \frac{A_k}{(ax-b)^k}.$$
• Each factor of q(x) in the form (ax² + bx + c)^k contributes the following to the PFD:
$$\frac{A_1x+B_1}{ax^2+bx+c} + \frac{A_2x+B_2}{(ax^2+bx+c)^2} + \cdots + \frac{A_kx+B_k}{(ax^2+bx+c)^k}.$$
Once we have the correct form of the partial fraction decomposition, we simply recombine over the common denominator and compare with the original expression to
evaluate the constants.
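A quick way to check a decomposition obtained by hand is SymPy's apart function; the sketch below is not part of the text and simply reproduces the decompositions worked out in Examples 1 and 2 that follow.

```python
# A sketch (not from the text): checking partial fraction decompositions with SymPy.
import sympy as sp

x = sp.symbols('x')

# Example 1 below: (3x + 14)/(x^2 + x - 6) = 4/(x - 2) - 1/(x + 3)
print(sp.apart((3*x + 14)/(x**2 + x - 6), x))

# Example 2 below: (x^2 - 2x + 7)/((x + 1)(x^2 + 4)) = 2/(x + 1) - (x + 1)/(x^2 + 4)
print(sp.apart((x**2 - 2*x + 7)/((x + 1)*(x**2 + 4)), x))
```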
Example 1. Find the PFD for
$$\frac{3x+14}{x^2+x-6}.$$
Solution. The denominator factors as x² + x − 6 = (x − 2)(x + 3), so the partial fraction decomposition takes the form
$$\frac{3x+14}{x^2+x-6} = \frac{A}{x-2} + \frac{B}{x+3}.$$
We want to find A and B. We recombine over the common denominator and collect
terms in the numerator according to the power of x:
$$\frac{A}{x-2} + \frac{B}{x+3} = \frac{A(x+3)+B(x-2)}{(x-2)(x+3)} = \frac{(A+B)x+(3A-2B)}{(x-2)(x+3)}.$$
Comparing the numerator of this last expression with that of the original function, we
see that A and B must satisfy
A+B =3
and
3A − 2B = 14.
We can easily solve these simultaneously to obtain A = 4 and B = −1, so our PFD is
$$\frac{3x+14}{x^2+x-6} = \frac{4}{x-2} - \frac{1}{x+3}.$$
2
Example 2. Find the PFD for
$$\frac{x^2-2x+7}{(x+1)(x^2+4)}.$$
Solution. The denominator has already been factored into a linear and an irreducible quadratic factor, so the PFD takes the form
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A}{x+1} + \frac{Bx+C}{x^2+4}.$$
Now we recombine over the common denominator and collect terms in the numerator
according to the power of x:
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A(x^2+4)+(x+1)(Bx+C)}{(x+1)(x^2+4)} = \frac{(A+B)x^2+(B+C)x+4A+C}{(x+1)(x^2+4)}.$$
Comparing both sides of this equation, we see that A, B, C must satisfy the following
three equations:
A + B = 1, B + C = −2, 4A + C = 7.
These can be solved simultaneously (e.g. one can use Gaussian elimination as in Chapter 4) to obtain A = 2, B = −1, and C = −1. In other words, we have the PFD
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{2}{x+1} - \frac{x+1}{x^2+4}.$$
Example 3. Find the PFD for
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)}.$$
Solution. The denominator has already been factored into linear and irreducible quadratic factors, so the PFD takes the form
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{A_1}{x+1} + \frac{A_2}{(x+1)^2} + \frac{B_1x+C_1}{x^2+4} = \frac{(A_1+B_1)x^3+(A_1+A_2+2B_1+C_1)x^2+(4A_1+B_1+2C_1)x+4A_1+4A_2+C_1}{(x+1)^2(x^2+4)}.$$
So we get the following system
A1 + B 1 = 2
A1 + A2 + 2B1 + C1 = 4
4A1 + B1 + 2C1 = 0
4A1 + 4A2 + C1 = 3.
These may be solved simultaneously (for example, using Gaussian elimination) to find
A1 = 0, A2 = 1, B1 = 2, C1 = −1. Hence the PFD is
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{1}{(x+1)^2} + \frac{2x-1}{x^2+4}.$$
Trick for Evaluating the Constants
After recombining the terms in our PFD over the common denominator, we want the
resultant numerator to equal the numerator of the original rational function. But these
are both functions of x, so the equality must hold for all x. In particular, we can
choose convenient values of x which simplify the calculation of the constants; this works
especially well with linear factors. Let us redo Examples 1 & 2 using this method.
Example 1 (revisited). We wanted to find A and B so that
$$\frac{3x+14}{x^2+x-6} = \frac{A(x+3)+B(x-2)}{(x-2)(x+3)}.$$
Since there are two constants to find, we choose two values of x; the obvious choices are
x = 2, −3. If we plug x = 2 into both numerators, the term involving B vanishes and
we obtain 3(2) + 14 = A(5), i.e. 5A = 20 or A = 4. Similarly, we can plug x = −3 into
both numerators and obtain 3(−3) + 14 = B(−5), which means B = −1. (These agree
with the values found previously.)
2
Example 2 (revisited). We wanted to find A, B, and C so that
$$\frac{x^2-2x+7}{(x+1)(x^2+4)} = \frac{A(x^2+4)+(x+1)(Bx+C)}{(x+1)(x^2+4)}.$$
Since there are three constants, we choose three values of x. One obvious choice is
x = −1, and evaluating both numerators yields 10 = 5A, or A = 2. No other choices of
x will cause terms to vanish, so let us choose simple values like x = 0, 1:
x = 0  ⇒  7 = 4A + C = 8 + C  ⇒  C = −1,
x = 1  ⇒  6 = 5A + 2(B + C) = 8 + 2B  ⇒  B = −1.
(These agree with the values found previously.)
Allowing Complex Factorization
If we allow complex numbers in our PFD, then an “irreducible quadratic” polynomial
such as x² + 1 can be factored as (x + i)(x − i). This allows us to treat any quadratic polynomial in the denominator as a product of linear factors; we shall call this a complex
partial fraction decomposition (CPFD).
Example 3 (revisited). Find the CPFD for
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)}.$$
We factor x² + 4 as (x + 2i)(x − 2i), so we want to write
$$\frac{2x^3+4x^2+3}{(x+1)^2(x+2i)(x-2i)} = \frac{A_1}{x+1} + \frac{A_2}{(x+1)^2} + \frac{B}{x+2i} + \frac{C}{x-2i}.$$
But the work that we did before evaluated A1 = 0 and A2 = 1, so let us use the answer
that we obtained before and concentrate on B and C:
$$\frac{2x-1}{x^2+4} = \frac{2x-1}{(x+2i)(x-2i)} = \frac{B}{x+2i} + \frac{C}{x-2i} = \frac{B(x-2i)+C(x+2i)}{x^2+4} = \frac{(B+C)x+2i(C-B)}{x^2+4}.$$
Hence we get the system
B + C = 2   and   2i(C − B) = −1,
which can easily be solved to find B = 1 − i/4 and C = 1 + i/4. Thus
$$\frac{2x^3+4x^2+3}{(x+1)^2(x^2+4)} = \frac{1}{(x+1)^2} + \frac{1-\tfrac{i}{4}}{x+2i} + \frac{1+\tfrac{i}{4}}{x-2i}.$$
One application of a CPFD is to the Laplace transform; see Exercise 8 in Section 3.1.
2
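If you want to verify a complex decomposition like the one above, SymPy can compute it with apart(..., full=True); a brief sketch, not part of the text:

```python
# Sketch (not from the text): real and complex partial fraction decompositions in SymPy.
import sympy as sp

x = sp.symbols('x')
expr = (2*x**3 + 4*x**2 + 3)/((x + 1)**2 * (x**2 + 4))

print(sp.apart(expr, x))                    # real PFD, as in Example 3
print(sp.apart(expr, x, full=True).doit())  # CPFD with the poles x = -1, +/- 2i
```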
Appendix C
Table of Integrals
∫ u^n du = u^{n+1}/(n + 1) + C,  if n ≠ −1
∫ (1/u) du = ln |u| + C
∫ e^{au} du = (1/a) e^{au} + C,  if a ≠ 0
∫ a^u du = a^u/ln a + C,  if a > 0
∫ cos u du = sin u + C
∫ sin u du = −cos u + C
∫ sec² u du = tan u + C
∫ csc² u du = −cot u + C
∫ sec u tan u du = sec u + C
∫ csc u cot u du = −csc u + C
∫ tan u du = ln |sec u| + C
∫ cot u du = ln |sin u| + C
∫ sec u du = ln |sec u + tan u| + C
∫ csc u du = ln |csc u − cot u| + C
∫ du/√(a² − u²) = sin⁻¹(u/a) + C,  if a ≠ 0
∫ du/(a² + u²) = (1/a) tan⁻¹(u/a) + C,  if a ≠ 0
∫ du/(a² − u²) = (1/(2a)) ln |(u + a)/(u − a)| + C,  if a ≠ 0
∫ sin² u du = u/2 − (1/4) sin 2u + C
∫ cos² u du = u/2 + (1/4) sin 2u + C
∫ tan² u du = tan u − u + C
∫ cot² u du = −cot u − u + C
∫ u dv = uv − ∫ v du
∫ sin au sin bu du = sin((a − b)u)/(2(a − b)) − sin((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ cos au cos bu du = sin((a − b)u)/(2(a − b)) + sin((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ sin au cos bu du = −cos((a − b)u)/(2(a − b)) − cos((a + b)u)/(2(a + b)) + C,  if a² ≠ b²
∫ e^{au} sin bu du = e^{au}(a sin bu − b cos bu)/(a² + b²) + C
∫ e^{au} cos bu du = e^{au}(a cos bu + b sin bu)/(a² + b²) + C
∫ √(a² − u²) du = (u/2)√(a² − u²) + (a²/2) sin⁻¹(u/a) + C
∫ (√(a² − u²)/u) du = √(a² − u²) − a ln |(a + √(a² − u²))/u| + C
∫ √(u² ± a²) du = (u/2)√(u² ± a²) ± (a²/2) ln |u + √(u² ± a²)| + C
∫ du/√(u² ± a²) = ln |u + √(u² ± a²)| + C
Appendix D
Table of Laplace Transforms
Here a and b are real numbers, and the transforms will exist for sufficiently large s.
Function → Transform
f(t) → F(s)
f′(t) → s F(s) − f(0)
f″(t) → s² F(s) − s f(0) − f′(0)
∫₀ᵗ f(τ) dτ → F(s)/s
e^{at} f(t) → F(s − a)
u(t − a) f(t − a) → e^{−as} F(s)
∫₀ᵗ f(τ) g(t − τ) dτ → F(s) G(s)
t f(t) → −F′(s)
f(t)/t → ∫ₛ^∞ F(σ) dσ
1 → 1/s
t^n → n!/s^{n+1}
t^a → Γ(a + 1)/s^{a+1}
e^{at} → 1/(s − a)
t^n e^{at} → n!/(s − a)^{n+1}
cos bt → s/(s² + b²)
sin bt → b/(s² + b²)
cosh bt → s/(s² − b²)
sinh bt → b/(s² − b²)
e^{at} cos bt → (s − a)/((s − a)² + b²)
e^{at} sin bt → b/((s − a)² + b²)
u(t − a) → e^{−as}/s
δ(t − a) → e^{−as}
Appendix E
Answers to Some Exercises
Section 1.1
(d) Equilibria at y
=
nπ:
semistable for n = 0; sink
for n = 1, 3, . . . and n =
−2, −4, . . . ; source for n =
−1, −3, . . . and n = 2, 4, . . .
1. (a) 1st-order nonlinear
(b) 2nd-order linear
(c) 3rd-order linear
4. (b) y = k 2 /(2ga2 ) is a stable equilibrium.
(d) 3rd-order nonlinear
2. (a) y = 2 − cos t
Section 1.3
2
(b) y = 12 (1 + ex )
(c) y = x sin x + cos x
(d) y = 3 − 2(1 − x)1/2
3. The maximum height of 64 feet is
achieved after 2 sec.
4. The ball hits after 4.5 sec with a
speed of 44.3 m/sec.
Section 1.2
1. (a) y = C exp(−x2 )
−1
(b) y = tan−1
x+C
√
(c) y = 3 3 sin x + C
(d) y + 13 y 3 = ex + C
(e) x2 = C exp(2t) − 1
(f) x = (t3/2 + C)2
2. (a) y(x) = exp(sin x)
q
(b) y(x) = 23 x3 + 2x +
1
3
(c) y(t) = −2 exp(t3 − t)
2. (a) Yes
3. (c) ∂f /∂y not continuous at y = 0
(b) No
4. 64,473
(c) Yes
3. (a) y = 0 is a source, y = 1 is a
sink.
(b) y = 1 is a sink, y = 0 is a
semistable equilibrium.
(c) y = 1 is a sink, y = 0 is a source,
y = −1 is a sink.
5. 1.71 hours
6. t = 1.45 × 109 years
7. 2,948 years
8. 87 31 minutes
9. 4 minutes
10. 7:37 am
11. A(t) = 50(1 − e−.02t )
13. Rock achieves max height of 5.03 m
at the time t = 1.01 sec
p
14. v ∗ = mg/c
Section 1.4
1. (a) y = 2 + C exp(−x)
(b) y = (x2 /2 + c) exp(−3x)
(c) y = xe−2x − e−2x + ce−3x
(d) y = 23 x1/2 + cx−1
(e) x = t + 1 + c/(t + 1)
3
(f) y = t [−t cos t + sin t + c ]
2. (a) y =
(b) y =
(c) y =
3. (a)
(b)
(c)
(d)
Exact: x2 ey + cos y = C
Exact: x sin y + sin x = C
Not Exact
Exact: sin x + x ln y + ey = C
4. (a) y 2 = 4x2 (ln x + 1) (x > 0)
(b) x2 y 2 + x3 + y 4 = 1
(c) y = (2e−2x − 1)−1/2 (x < 21 ln 2)
Section 1.6: Odd numbers only
1. 6.7 seconds
3.
10
9
7
(
2
(
−x
1 + e if 0 ≤ x ≤ 1,
(e + 1)e−x if x > 1.
ln x
x
− 1 + x2 if 0 ≤ x ≤ 1,
1 if x > 1.
3. 81 lbs
4. (a) 27 lb, (b) 75 lb
5. 1.7 days
6. When the tank becomes full, there is
150 g of salt; and when it becomes
half-full again, there is 75 g of salt.
7. T (t) = 25e−0.2t − 5e−t
Section 1.5
1. (a) y 2 = x2 (ln |x| + C)
(b) y = x(ln x + C)2
(c) ln |xy| = C +
x
y
(d) y = C(x2 + y 2 )
x
8
1 x
−x
)
2 (e + e
1
2 (sin x + csc x)
ln |t|−1
t
(d) y =
(e) y =
2. (a) y = ( x2 + Cx4 )−1/2
(b) y = (x + Cx2 )−3
6
5
4
3
1
0 1
t
2
3
4
5
6
7
8
9 10
5. (c) dy/dx = y 2 + x
7. y = ±1 are sources, y = 0 is a sink.
9. If c < 1 then√2 equilibrium solutions:
y+ = −1 + √1 − c unstable (source)
y− = −1 − 1 − c stable (sink)
If c = 1 then 1 equilibrium solution:
y0 = −1 unstable
If c > 1 then no equilibrium solution.
11. (a) A(y) = πr2 where r is the base
of a right triangle with height R − y
and hypotenuse R.
Ry
(b) Differentiate V (y) = 0 A(u)du,
with respect to t using the chain rule
to obtain dV /dt = A(y)dy/dt. On
the other hand, the rate at which the
volume of water changes due to the
hole is the the cross-sectional area of
the hole times the velocity
√ through
the hole, i.e. −a v = −a 2gy.
13. Emigration of 71, 000 annually.
15. y(x) = C exp[x +
17. y(x) =
x2
2
−
x2
4 ln x
x2
2 ]
+
−1
C
ln x
19. y(x) = (1 − cos x)−1 .
21.
y2
2
+ ey − e−x −
23. x2 =
t2
4
2 (3t
x2
2
= e − 12 .
10. (c) y = 13 (1 − cos 3x + sin 3x)
11. (c) y = 2x + 2 − 2ex cos x.
12. (c) y = 1 − 2x + x2 .
Section 2.3
2. (a) y = c1 e2x + c2 e−2x
(b) y = c1 cos 2x + c2 sin 2x
− 1).
(c) y = c1 e−3x + c2 x e−3x
(d) y = c1 e3x + c2 e−5x
Section 2.1
2
2. (a) y10 = y2 , y1 (0) = 1
y20 = 9y1 , y2 (0) = −1
(b) y10 = y2 , y1 (0) = 1
y20 = ex − 3y2 + y1 , y2 (0) = 0
(c) y10 = y2 , y1 (0) = 1
y20 = y12 , y2 (0) = 0
(d) y10 = y2 , y1 (0) = 1
y20 = ex −y2 −5 sin y1 , y2 (0) = 1
3. (a) x01 = x2 , x02 = −kx1 /m
(b)
(c)
y10
x01
= y2 , y20 = g − ky1 /m
= x2 , x02 = −kx1 /m−cx2 /m
4. k = 196 N/m
2
(e) y = c1 e− 3 x + c2 x e− 3 x
(f) y = e−4x (c1 cos 3x + c2 sin 3x)
√
√
+ c2 e(−1+ 3)x
√
√
3x
(h) y = e (c1 cos 2x+c2 sin 2x)
(g) y = c1 e(−1−
3)x
3. (a) y = 31 e3x + 23 e−3x
(b) y = cos 3x −
1
3 sin 3x
5x
(c) y = 6xe5x − e
(d) y = e3x (cos 4x − sin 4x)
(e) y = 54 (e3x − ex/2 )
(f) y = e2x cos x − 2 e2x sin x
Section 2.4
1. T = π, ω = 2
5. k = 5 lb/ft
2. T = π/2, ω = 4
Section 2.2
3. ω = 2, T = π, A =
4. x(t) =
2. y = ex − e2x
5. x(t) =
x
3. y = e + 2x e
4. y = e−x (cos x + 2 sin x)
5. linearly independent
6. linearly independent
7. dependent: 4f − 3g = 0
8. linearly independent
9. (c) y = −x + ex
5/2
√
1. y = 12 (ex + e−x )
x
√
10
3
√
17
4
cos(3t − 0.322)
cos(4t − 6.038)
6. (a) i) x(t) = e−2t − e−4t , ii) overdamped
(b) i) x(t) = 4e−t/2 − e−2t , ii) overdamped
(c) i) x(t) = e−t (cos t + sin t),
ii) underdamped
−3t/2
,
(d) i) x(t) = e−3t/2 + 5t
2e
ii) critically damped
(e) i) x(t) = e−3t (cos 4t +
ii) underdamped
5
4
sin 4t),
7. (a)
√
2 e−t cos(t√− π/4),
µ = 1, ω = 2 ≈ 1.414
(b) 4.03 e−t/4 cos(1.98 t − 3.01),
µ = 1.98, ω = 2
√
9. ω = 19.6 = 4.43, T = 1.42. Doubling the mass has no effect on T !
Section 2.5
(b) xsp (t) = − 56 cos 5t − 31 sin 5t
xtr (t)
=
e−3t ( 56 cos 2t +
25
12 sin 2t)
(c) xsp (t) = − cos t + 2 sin t
xtr (t) = e−t/2 (cos(t/2) −
3 sin(t/2))
√
7. (a) xsp (t) = 13 cos(3t − 2.16)
√5
2 41
(b) xsp (t) =
1. (a) y(x) = 2e
−3x
− 25
(b) y(x) =
c2 sin 2x
+ c1 e
−x
+ c2 e
cos(4t − 4.04)
−2x
cos 3x + c1 cos 2x +
(c) y(x) = −2x − 32 + c1 e−2x + c2 ex
(d) y(x) = − 13 x cos 3x + c1 cos 3x +
c2 sin 3x
8. (a) 0.42, (b) 12.5, (c) 0.25
Section 2.7
q
1. ω ∗ = ω02 −
R2
2L2
(e) y(x) = x2 + 12 x + c1 + c2 e−2x
2. ω # = ω0
(f) y(x) = 32 ex − cos 2x − sin 2x +
√
√
ex/2 (c1 cos 27 x + c2 sin 27 x)
3. (a) isp (t) =
√10
37
sin(2t − 4.548)
(b) isp (t) =
√20
13
sin(5t − 5.69)
2. (a) y(x) = 2xe2x + e−2x
√
(b) y(x) = ( 52 x2 + x + 1)e−2x
x
2
(c) y(x) =
− 34 + ex − e−x + 34 e−2x
(d) y(x) = 1 −
3
2 sin x
4. (a)
(b)
3
2 x cos x
+ cos x +
1 x
6e
2
3
−
x
3
sin x
3
sin 3x +
4. (a) i(t) = 5e−t sin 2t, underdamped
−5t
(b) i(t) = − 10
− 10e−10t +
3 e
40 −20t
, overdamped
3 e
(c) i(t) = 2 cos 2t − 3 sin 2t +
e−t (−2 cos 3t+ 17
3 sin 3t), underdamped
2
(c) − cos x ln | sec x + tan x|
(d)
(c) isp (t) = 2 13 sin(3t − 5.30)
cos 3x
9
ln | cos 3x|
Section 2.6
(d) i(t) = sin 2t + 3 cos 2t − 3e−2t ,
overdamped
Section 2.8: Odd numbers only
1
5
1. x(t) = cos 2t −
√
2. ω = 4 2
1
5
cos 3t
3. (a) Resonance
(b) Beats: ampl. 4, freq. 0.045
(c) Beats: ampl. 6, freq. 0.04
(d) Resonance
6. (a) xsp (t) = − 41 cos 3t + 34 sin 3t,
xtr (t) = e−2t ( 14 cos t − 74 sin t).
1. π/2
3. (a) S1 , S√
2 have lengths 2/5, 3/5 resp.
(b) ω = 5
p
5. c = 4 ln 2/ π 2 + (ln 2)2
7. (a) y1 = x, y2 = x−1 ,
4
(b) yp = 21
x5/2
9. v = 7.1 m/sec
11. (a) Use ma = F in the direction tan(d) 2e−2t cos 2t
2
2
gent to the arc, so a = Ld θ/dt and
F = Fg + Ff , where Fg = −mg sin θ Section 3.2
is the component of the gravitational
2. (a) y = 23 e−t + 12 et
force, and Ff = f (t) cos θ is the
(b) y = e−t
component of the external horizontal
force.
(c) y = 5et − 3 cos t + 3 sin t
3. (a) y = −3 cos 2t +
Section 3.1
2 −2t
3e
−t
(b) y =
+
(b)
(c)
2
(s−1)2 +4
(e) y = 4e2t − 3e3t
(d)
1−e−s
s
(e)
1
s2
(f) y = 61 e−t −
3 −3t
10 e
5. (a)
(b)
(c)
−
(d) y = 2et + 2e−t − 4 cos t
e−s
s
−
e−s
s2
(e)
5. x(t) =
6. (a)
(f)
(g)
(h)
1
s
+
2s3/2
1
3
6
6
s + s2 + s3 + s4
6
6
s2 −4 + s2 +9
3 −πs
s e
6. (a) 2e
(d)
(e)
(f)
7. (a)
1
−3t
−4e−2t +cos t+sin t)
10 (3e
2
(s−a)3
(b)
6bs2 −2b3
(s2 +b2 )3
(c)
s2 −b2
(s2 +b2 )2
(d)
e−s
s
+
e−s
s2
1. (a) 1/(s − 2)2
sin 2t
(c) (s − 3)/((s − 3)2 + 25)
(e) 1/(s − 1)2 + 1/(s + 1)2
sinh 2t
e4t +
7
4
(f) 6/(s−1)4 +(s+1)/((s+1)2 +4)
e−4t
2. (a)
u(t − 2)
(c) e
+ cos t
(c)
1
2
(d)
− 12 e−t
(d)
+ t − 12 e2t
+ 2e
3 2t 2
2e t
−2t
(b) e
1
−2t
)
2 (1 − e
−t
−2t
8. (a) cos bt
(b) e−t sin t
1
4 (1
+
(d) 3/((s + 4)2 + 9)
(b) −e
(c)
1 2t
30 e
(b) 2/(s + 3)3
3
2
√
4√ t
π
1
2
5
4
1
3
+
Section 3.3
2t
(b) 2 cos 2t −
(c)
1 −2t
2e
4. x(t) = e−2t − e−4t
√
(d) 3 π/(4s5/2 )
π
− 5e + 4e2t
(c) y = 2e
s
s2 +9
6
5
s3 − s−3
2s
6
s2 +9 − s2 +4
√
sin 2t
1 t
3e
t
e
s−2
1
(1−s)2
4. (a)
5
2
− cos 2t)
−
3 −3t
2e
3. (a)
t
−2t
(cos 3t +
1
3
sin 3t)
−3t
e√
πt
1 2t
2 e sin 2t
−3t
(b) 2e
cos t + 7e−3t sin t
(c) 2e3t cos 4t − 43 e3t sin 4t
4. (a)
1 −t
5e
−
1 −2t
(sin 2t
10 e
+ 2 cos 2t)
(b) et (− 23 − t) −
1 −2t
12 e
+ 43 e2t
8. cos t + u1 (t) sin(t − 1)
−t
t
(c) e (2 − cos 2t − sin 2t) − e
Section 3.5
5. (a) e−πs s21+1
(b)
6. (a)
(b)
1. (a) f ? g(t) = t2 /2
e1−s
s−1
(b) f ? g(t) =
2
3 u1 (t) sin 3(t − 1)
1
−(t−3)
sin 2(t
2 u3 (t)e
(d) f ? g(t) = (ebt − eat )/(b − a)
1
2 (cos 3t
−2t
Section 3.4
2. (a) 2(e2t − et )
(b)
2. (a) 3u1 (t) (et−1 − 1)
(b) e−2t + 12 u5 (t)(1 − e−2(t−5) )
3. (a) 2 cosh t
+ u1 (t)(cosh(t − 1) − 1)
(b) sinh 2t
+ 14 u1 (t)(cosh 2(t − 1) − 1)
− 14 u2 (t)(cosh 2(t − 2) − 1)
−2t
2e
cos t + 4 e
sin t +
u(t − 3)[1 − e2(t−3) cos(t − 3) −
2e−2(t−3) sin(t − 3)]
(d) uπ (t)[sin t − cos t − 5e
2e−2(t−π) ]
2t
4. (a) e + u1 (t)e
1 5t
6 (e
−(t−π)
+
2(t−1)
− e−t ) + u2 (t)e5(t−2)
5. (a) cos 2t + 12 u3 (t) sin 2(t − 3)
1
−2t
) − 12 te−2t
4 (1 − e
−2(t−2)
+ u2 (t)(t −
2)e
(c) [2−e2π uπ (t)+e4π u2π (t)]e−2t sin t
(d)
+t−1
(b) 3/[(s − 1)(s2 + 9)]
1
1
7 (cos 3t−cos 4t)+ 2 uπ/2 (t) sin 4t
6.
1
2 (1
7.
1
−t
cos 2t + 12 e−t sin 2t] +
5 [−1 + e
uπ (t)
−(t−π)
cos 2t+ 12 e−(t−π) sin 2t]
5 [1−e
− u2π (t))(sin t −
1
3
(c) s/(s2 + 1)2
Rt
4. (a) y(t) = 0 sinh(t − τ )f (τ ) dτ
Rt
(b) 21 0 e−(t−τ ) sin 2(t − τ )f (τ ) dτ
Rt
(c) y(t) = 0 e−(t−τ ) (t − τ )f (τ ) dτ
Section 3.6: Odd numbers only
−2t
1
2
(b)
+ sin t − e−t ]
3. (a) 2/(s4 + 4s2 )
(b) g(t) = 1 − u2π (t) + u2π (t) cos t
(b)
1
2 [cos t
−t
(c) e
1. (a) f (t) = u1 (t) (t−1)−u2 (t) (t−2)
(c)
− at − 1)
(c) f ? g(t) = t − 2 + 2 cos t
− 3)
− 3 sin 3t − e−2t cos t +
7e
sin t)
1
8. 3 sin 3t + uπ (t) 16 sin 3t − 12 sin t
7.
1
at
a2 (e
2
sin 3t)
1. Any c > 2.
−s
3. 1s 1−e
−s
1+e
5.
1
s2
1−e−s
1+e−s
7. x(t) = 12 (cos t + cosh t)
9. f (t) = (2a3 )−1 (sinh at − sin at)
11. (a) For any t, only finitely many
terms are nonzero
(b) Period is 2π
(c) 1 − cos t + 2
cos(t − kπ))
P∞
k=1
ukπ (t)(1 −
(d) Resonance occurs since the forcing frequency equals the natural
frequency
13. (a)
1
2
sin 2t (1 + uπ (t) + u2π (t) + · · · )
(b) Yes, we have resonance: each
time the mass passes through
the equilibrium position in the
positive x direction, it is given
an additional unit impulse.

1
(d) 0
0

0
1
0
3. (a) x1 = 3, x2 = −2, x3 = 4
(b) x1 = 5 + 2t, x2 = t, x3 = 7
Section 4.1
1. (a) Unique solution x1 = 5, x2 = −3
(b) No solution
(c) Unique solution x1 = −4, x2 = 3
(d) Infinite number: 2x1 − x2 = 6
1 −2 −5
2. A − 2B =
−3 0
3
2 −1
3. (a) AB =
6 −1
2 −1 2
(b) AB =
6 −1 4
(c) AB does not exist
 
−1
(d) AB =  1 
−1
−1 10
4. AB =
6 −6


−1
3
6
14
13 
BA =  12
−8 −16 −20
5. A and C are symmetric.
Section 4.2
1. (a) Yes, (b) No, (c) No, (d) No
2. The REF is not unique, so other answers are possible:
1 −1
(a)
0 1


0 1 2
(b) 0 0 1
0 0 0


1 2 3 4
(c) 0 1 2 3
0 0 1 0
(c) x1 = 2, x2 = −1, x3 = 3
(d) Inconsistent (no solution)
4. (a) Only the trivial solution
(b) (x1 , x2 , x3 ) = (−t, −3t, t)
(c) (x1 , x2 , x3 , x4 ) =
(5s − 4t, −2s + 7t, s, t)
(d) (x1 , x2 , x3 , x4 ) =
(3s − 6t, s, −9t, t)
5. (a) x1 = (1 − i)t, x2 = t
(b) x1 = (−3 + i)/2,
x2 = (3 + 3i)/2
(c) x1 = (1 + i)t, x2 = t
(d) x1 = (1 + i)/2, x2 = (−1 − i)/2
Section 4.3
1 0
1. (a)
0 1

1 0
(b) 0 1
0 0

1 2
(c) 0 0
0 0

1 0
(d) 0 1
0 0

−5
4
0

0 0
1 0
0 1

0 5
0 2
1 −1
2. (a) x1 = 1, x2 = −1, x3 = 2
(b) No solution
(c) x1 = 3, x2 = −2, x3 = 4,
x4 = −1
(d) x1 = 3 − s − t, x2 = 5 + 2s − 3t,
x3 = s, x4 = t
(e) x1 = 5 + 2s − 3t, x2 = s,
x3 = −3−2t, x4 = 7+4t, x5 = t
(f) x1 = 2, x2 = 1, x3 = 3, x4 = 4
 
 
3
1
−2

3. (a) x = −1
(c) x = 
4
2
−1
 
 
 
−1
3
−1
−3
5
2
 

 
(d) x = 
0 + s  1  + t  0 
1
0
0
 
 
 
−3
2
5
0
1
0
 
 
 
 
 

(e) x = 
−3 + s 0 + t −2
4
0
7
1
0
0
 
2
1

(f) x = 
3
4
4. (a)
(b)
(c)
(d)
 
−2
x = t −1
1
 
 
−2
−1
−3
−1

 
x = s
 1  + t 0 
1
0
 
1
−1

x = t
1
1
 
 
 
1
−2
−7
−2
3
−4
 
 
 

 
 
x = r
 1 +s 0 +t 0 
0
1
0
0
0
1
5. (a) rank 2, (b) rank 1, (c) rank 3,
(d) rank 2
Section 4.4
2
4. (a) 13
−1
1
1
(b)
3
−4
−2
3

−5 −2
1
(c) No inverse. (d)  2
−4 −3


−13 42 −5
−9 1 
(e)  3
2
−7 1


−26 11 −14
1 
6
−1
4 
(f) 10
−8
3
−2
−26
5. (a) x =
11
 
6
(c) x = −1
−1

5
−2
5
8
(b) x =
−7


−13
(d) x =  7 
−3
1
2
3
25
(b) x = 41
−1
−10




−12
−107
(c) x = −128 (d) x =  0 
9
48
6. (a) x =
Section 4.5
1. (a) 2, (b) −8, (c) −24, (d) 1
2. (a) k = ±3, (b) k = 0, 4
3. (a) −1, (b) 1, (c) 1
4. (a) 2 + i, (b) −1 + 2i
Section 4.6
1. (a) −6, (b) −3, (c) −14,
(d) −4, (e) −4, (f) 1
2. (a) 0, (b) −4, (c) 0, (d) −180


7 −6 15
3. (a) A−1 = 21 −4 4 −10
−2 2
−4


15 −25 −26
1 
30 25
8 
(b) A−1 = 225
15 −25 19
4.
A−1

1
0
=
0
1
1
0
1
1
0
0
0
−1

1
1

1
1
Section 4.7: Odd numbers only
1. Two planes either intersect in a line,
a plane, or not at all. So cannot have
a unique solution: either no solution
or an infinite number of solutions.
3. No. Requires AB = BA.
Section 5.1
1. (a) (2, 6), (b) (1, −4), (c) (−1, 11)
2. (a) (4, 0, −2), (b) (1, −4, −1),
(c) (0, 8, 1)
3. (a) (0, 4, 0, −2), (b) (1, −1, −4, −1),
(c) (−2, 4, 8, 1)
√
√
√
4. (a) 2, (b) 13, (c) 6
√
5. 3 3
√
6. r = 1/ 6
7. 2i − 3j
5. No.
8. 2i − 3j + 5k
7. Take the transpose of I = AA−1 to
obtain I = (A−1 )T AT = (A−1 )T A.
9. w = (0, −2, −2)
Similarly, the transpose of I =
A−1 A yields I = AT (A−1 )T = Section 5.2
A(A−1 )T . These together imply
3. (a) No (not closed
A−1 = (A−1 )T .
multiplication by
9. Skew-symmetric ⇒ aji = −aij all i, j
(b) No (not closed
⇒ aii = −aii all i ⇒ aii = 0 all i.
multiplication by
11. (a) Nilpotent: A2 = 0
(c) No (not closed
(b) Not nilpotent (B3 = B)
(c) Nilpotent: C3 = 0
2
(d) Not nilpotent (D = D)
13. Inconsistent: no solution.
15. x1 = −t, x2 = −1 − 2t, x3 = −2 − 3t,
x4 = t
17. x1 = 1 + i − t, x2 = 1 − it, x3 = t
under scalar
all reals)
under scalar
all reals)
under scalar
multiplication by all reals)
(d) Yes
(e) No (need to be the same size to
define addition)
(f) Yes
(g) No (has no zero vector)
(h) Yes
(i) No (not closed under addition)
2
19. a) k 6= 1, b) k = 1, c) k = −1
Section 5.3
21. Rank is 1. No solutions.
23. Rank is 3. No solutions.
25. If we put A|b into reduced rowechelon form, there will be a row of
the form 0 · · · 0|1, so inconsistent.


−27 9 −5
27. X =  10 −3 2 
−22 6 −4
1. (a) Yes.
(b) No (has no zero vector).
(c) No (not closed under addition).
(d) No (not closed under addition).
2. (a) Yes.
(b) Yes.
(c) Yes.
(b) f1 − 2f2 + (7/π)f3 = 0
(d) No (has no zero vector).
(c) f1 − f2 − f3 = 0
3. (a) No (not closed under addition).
(b) No (not closed under scalar
multiplication).
(c) Yes.
(d) Yes.
4. (a) Yes. (b) No. (c) Yes. (d) No.
5. (a) w = 3v1 − 2v2 + 4v3
(b) Not possible.
(c) w =
26
7 v1
− 75 v2 − 37 v3
6. (a) v = (1, −2, 1)
(b) v = (−i, i, 1)
(c) v1 = (−2, −2, 1, 0)
v2 = (−5, −3, 0, 1)
(d) v = (2, −3, 1, 0)
(e) v1 = (−1, −1, 1, 0)
v2 = (−5, −3, 0, 1)
7. (a) v = (0, 2, 1)
(b) v1 = (1, −3, 1, 0)
v2 = (−2, 1, 0, 1)
Section 5.4
1. (a) linearly independent
(b) dependent
(c) linearly independent
(d) dependent
(e) linearly independent
(f) linearly independent
2. (a) linearly independent if c 6= −2
Section 5.5
1. (a) Yes, (b) No, (c) Yes,
(d) No, (e) Yes
2. Any finite collection {1, x, x2 , . . . , xk }
is linearly independent.
3. (a) v = (−10, −7, 1), dim=1
(b) v1 = (−1, −3, 1, 0),
v2 = (−3, 8, 0, 1), dim=2
(c) v1 = (1, −3, 1, 0),
v2 = (−2, 1, 0, 1), dim=2
(d) v1 = (−2, 2, 1, 0, 0),
v2 = (−1, 3, 0, 1, 0),
v3 = (−3, −1, 0, 0, 1), dim=3
4. (a) v = (2, 1), dim=1
(b) dim=0,
(c) v = (1, −2, 1), dim=1
(d) v1 = (−1, 1, 1, 0),
v2 = (−1, 2, 0, 1), dim=2
Section 5.6
1. (a) (1, −3) is a basis for Row(A);
(3, −1) is a basis for Col(A)
(b) v1 = (1, 0, −1), v2 = (0, 1, 1) is
a basis for Row(A);
w1 = (3, 1, 1), w2 = (2, 3, 2) is
a basis for Col(A)
(c) v1 = (1, 0, −1, 1), v2 =
(0, 1, 3, −1) is a basis for
Row(A);
w1 = (3, 2, 1), w2 = (1, 1, 0) is
a basis for Col(A)
(b) linearly independent if c 6= ±1
(c) v4 = 2v3 so the collection is
never linearly independent!
3. (a) W = −x
(b) W = 12
(c) W = −2x−6
4. (a) −2f1 + f2 + f3 = 0
2. (a) v1 , v2 is a basis
(b) v1 , v2 , v4 is a basis
(c) v1 , v2 is a basis
Section 5.7
2. hv, λwi = hλw, vi = λhw, vi =
λ hw, vi = λ hv, wi
3. (a) orthu (v) = λu − proju (v)
= λu − hv, uiu = λu − hλu, uiu
= λu − λhu, uiu = λu − λu = 0
√
4. (a) c = −6, (b) c = −1, (c) c = ± 3
5. (a) u1 = 17 (6, 3, 2), u2 = 17 (2, −6, 3)
(b) u1 =
u2 =
(c) u1 =
u2 =
u3 =
√1 (1, −1, −1),
3
√1 (4, 5, −1)
42
√1 (1, 0, 1, 0)
2
√1 (−1, 2, 1, 0),
6
√1 (1, 1, −1, 3)
12
15. C n (I) is a vector space since:
(rf )(n) = rf (n) .
It has no finite basis since for any
k, the set {1, x, x2 , . . . , xk } is linearly
independent.
1. Yes.
3. Yes.
17. Let n = dim(S) and {s1 , . . . , sn } be
a basis for S. Show that it is also a
basis for V .
19. nullity = 3.
5. (a) If f, g are in S1 ∩ S2 , then
f, g in S1 ⇒ f + g in S1 , and
f, g ∈ S2 ⇒ f + g in S2 , so
f + g is in S1 ∩ S2 .
(b) If f is in S1 ∩ S2 , then
f in S1 ⇒ rf in S1 , and
f in S2 ⇒ rf in S2 , so
rf is in S1 ∩ S2 .
7. No: A1 + 2A2 + 3A3 − A4 = 0

0
1
0
13. {1, x, x2 } is a basis.
(f + g)(n) = f (n) + g (n) and
Section 5.8: Odd numbers only
9. (b)

1
0
0
(b) If c1 f1 + c2 f2 = 0 then for x > 0
we have c1 x + c2 x = 0 so c1 = −c2 ,
while for x < 0 we have c1 x−c2 x = 0
so c1 = c2 . This means c1 = c2 = 0.
Basis:
 
0
0 0
0 0 , 0
0
0 0
0
1
0
 
0
0
0 , 0
0
0
 
0
0
0 , 0
0
1
0
0
0
 
1
0
0 , 0
0
0
1
0
0
(c) dim=6
kv+wk2 = kvk2 +hv, wi+hw, vi+kwk2 .
23. If v1 , v2 are in S ⊥ and s is in S, then
hv1 + v2 , si = hv1 , si + hv2 , si = 0 so
v1 + v2 is in S ⊥ . Similarly show rv
is in S ⊥ if v is.
25. If v is in S2⊥ then hv, si = 0 for all s
in S2 . But S1 ⊂ S2 so hv, si = 0 for
all s in S1 , i.e. v is in S1⊥ .

0
Section 6.1
0 ,
1
1. (a) λ1
λ2

0 0
(b) λ1
0 1 ,
λ2
1 0
(c) λ1
λ2
0
0
0
11. (a)
(
1
f20 (x) =
−1
21. Compute
for x > 0
for x < 0
so f20 is not continuous at x = 0.
= 3, v1 = (3, 1);
= −5, v2 = (−1, 1).
= 1, v1 = (1, 1);
= 2, v2 = (3, 2).
= 2, v1 = (1, 1);
= 4, v2 = (4, 3).
(d) λ1 = 1, v1 = (1, 0, 0);
λ2 = 2, v2 = (1, 1, 0);
λ3 = 3, v3 = (0, 0, 1).
(e) λ1 = 1 with ma = 2 = mg ,
v1 = (1, 0, 1), v2 = (−3, 1, 0);
λ2 = 3, v3 = (1, 0, 0).
(f) λ1 = −1, ma = 3, mg = 2,
v1 = (1, 1, 0), v2 = (−3, 0, 4)
(no 3rd eigenvector).
(b)
2. (a) λ1 = 1 + 3i, v1 = (−1 + i, 1);
λ2 = 1 − 3i, v2 = (−1 − i, 1).
(b) λ1 = −2 + i, v1 = (1, i);
λ2 = −2 − i, v2 = (1, −i).
(c)
(c) λ1 = 1, v1 = (1, 0, 0);
λ2 = 2i, v2 = (0, i, 1);
λ3 = −2i, v3 = (0, −i, 1).
(d) λ1 = i, v1 = (−i, 1, 0, 0),
v2 = (0, 0, i, 1);
λ2 = −i, v3 = (i, 1, 0, 0),
v4 = (0, 0, −i, 1).
3. (a)
3. An v = An−1 Av = λAn−1 v = · · · =
λn v.
(b)
4. λ1 = 1, v1 = (1, 0),
λ2 = −1, v2 = (0, 1)
(c)
Section 6.2
1
1. (a) E =
1
4. (a)
2
2
,D=
1
0
(b) Not diagonalizable
1 3
0
(c) E =
,D=
1 2
0


1 3 0
(d) E = 0 1 0,
0 0 1


1 0 0
D = 0 2 0
0 0 2
(e) Not diagonalizable


1 0 0 1
0 1 0 1

(f) E = 
0 0 0 1,
0 0 1 0


1 0 0 0
0 1 0 0 

D=
0 0 1 0 
0 0 0 −1
1 1
2i
0
,D=
i −i
0 −2i
1−i 1+i
E=
,
2
2
1+i
0
D=
0
1−i


1 1 − 2i 1 + 2i
E = 1 1 − 2i 1 + 2i,
1
5
5


1
0
0
0 
D = 0 1 + 2i
0
0
1 − 2i
63 −62
31 −30


1 189 0
0 64 0 
0 0 64


1 −510 255
0
1
0 
0 −510 256
−e + 2e2 2e − 2e2
−e + e2
2e − e2


e −3e + 3e2 0
0
e2
0
0
0
e2


e 2e − 2e2 −e + e2
0
e
0 
0 2e − 2e2
e2
2. (a) E =
0
3
(b)
0
2
(c)
Section 6.3
1. (a) u1 =
(b) u1 =
u2 =
(c) u1 =
√1 (1, −1), u2
2
√1 (3, −2),
13
√1 (2, 3)
13
√1 (−1, 1, 0),
2
u2 = (0, 0, 1),
u3 = √12 (1, 1, 0)
(d) u1 =
u2 =
u3 =
√1 (−1, 0, 1),
2
√
1
(1,
2, 1),
2
√
1
2 (1, − 2, 1)
=
√1 (1, 1)
2
√ √
1/ √2 1/√2
O=
−1/ 2 1/ 2
−2 0
D=
0 4
√ √
1/√2 1/ √2
O=
1/ 2 −1/ 2
0 0
D=
0 2


1
0√
0√
O = 0 1/ √2 1/√2
0 −1/ 2 1/ 2


−1 0 0
D =  0 −1 0
0
0 3


1
0√
0√
O = 0 1/ √2 1/√2
0 −1/ 2 1/ 2


2 0 0
D = 0 2 0
0 0 4
2. (a)
(b)
(c)
(d)
3. (a) O−1
(b) O−1
(c) O−1
 √
1/√3
= 1/ 2
0

1
= √12 −1
0

1

1  0
= 4
−1
√
2
√
1/ 3
0
1
√ 
−1/√ 3
1/ 2 
0

0 1

0
√ 1
2 0
√1
2
1
0

−1 −1
√
2 0 

−1 √1 
0
2
4. y1 v1 + · · · yn vn = z1 v1 + · · · zn vn
⇒ (y1 −z1 )v1 +· · ·+(yn −zn )vn = 0.
Use linearly independence.
5. If Auj = λj uj , let vj = Ouj .
Show that {v1 , . . . , vn } are orthnormal eigenvectors for B with the same
eigenvalues.
√


1/2
−1/2
1/ 2
√
√
6. A = 1/ 2
1√
−1/ 2
1/2 −1/ 2
3/2
Section 6.4: Odd numbers only
1. (a) (p(x) + q(x))0 = p0 (x) + q 0 (x)
and (rp(x))0 = rp0 (x)
(b) Only eigenvalue is λ = 0 and
eigenspace is the constants
3. A invertible ⇔ det(A) 6= 0
⇔ det(A − 0I) 6= 0
⇔ λ = 0 is not an eigenvalue
5. (B−λI)T = BT −λI so det(B−λI) =
det(BT − λI) ⇒ BT and B have
the same characteristic polynomial,
hence the same roots.
7. λ = ±1 and eigenbasis
1 0
0 1
0 0
0
,
,
,
0 0
0 0
1 0
0
0
1
9. Set λ = 0 in the characteristic equation.
.8 .3
11. (a) A =
.2 .7
(b) (x10 , y10 ) = (5.999, 4.001) ≈ (6, 4)
(c) (x10 , y10 ) = (6.001, 3.999) ≈ (6, 4)
(d) From (d) in previous problem,
1 3
10
∗ ∗
∗
,
A ≈ [v1 v1 ] where v1 = 5
2
5
6
so A10
≈
5 4 7
6
and A10
≈
.
3
4
Section 7.1
1. (a) i) x01 = x2 , x02 = 4x2 − 3x1
ii) (0, 0) is unstable (source)
(b) i) x01 = x2 , x02 = −9x1
ii) (0, 0) is stable (center)
(c) i) x01 = x2 , x02 = 3x1 + 2x2
ii) (0, 0) is unstable (saddle)
(d) i) x01 = x2 , x02 = −6x1 − 5x2
ii) (0, 0) is stable (sink)
2. (a) i) x01 = x2 , x02 = −x1 (x1 − 1)
ii) critical pts (0, 0) and (1, 0)
(b) i) x01 = x2 , x02 = −x2 − ex1 + 1
ii) critical pt (0, 0)
(c) i) x01 = x2 , x02 = sin x1
ii) critical pts (±nπ, 0),
n = 0, 1, 2, . . .
(d) i) x01 = x2 ,
x02 = (x2 )2 + x1 (x1 − 1)
ii) critical pts (0, 0), (1, 0)
3. (a) (0, 0) is a saddle, and
(1, 0) is a center.
(b) (0, 0) is a stable spiral.
(c) (±nπ, 0) is a saddle for n even,
and a center for n odd.
(d) (0, 0) is a center, and
(1, 0) is a saddle.
Section 7.2
1. (a) x1 = c1 cos 2t + c2 sin 2t
x2 = −c1 sin 2t + c2 cos 2t
(b) x1 = −et + 2e2t
x2 = 2et − 2e2t
(c) x1 = c1 et + c2 e−t
x2 = 13 c1 et + c2 e−t
(d) x1 = c1 e−t + c2 e3t + 3e4t
x2 = −c1 e−t + c2 e3t + 2e4t
0 x1
0 2 x1
2. (a)
=
x2
−2 0 x2
0 x1
3 1 x1
(b)
=
x2
−2 0 x2
0 x1
2 −3 x1
(c)
=
x2
1 −2 x2
0 4t x1
1 2 x1
5e
(d)
=
+
x2
2 1 x2
0
3t
2e − 2e−2t
3. (a) x =
6e3t − e−2t
sin 2t + cos 2t
(b) x =
cos 2t − sin 2t
 2t

e − e−t
(c) x = 53  e2t − e−t 
e2t + 2 e−t

et + 2e3t + e5t

et − e5t
(d) x = 
1 5t
1 t
3t
2e − e + 2e
1
−1
4. (a) xp =
(b) xp =
−1
1/2
 
 
1
−2
(c) xp =  3  (d) xp = −3
−1
0
1
5. (a) xp = e2t
−1
cos 2t
(b) xp =
− sin 2t
−2 e2t
(c) x =
− 1 et − e2t
2 t
−2te
(d) xp =
−3et

Section 7.3
1
3
1. (a) x = c1 e−t
+ c2 e4t
−1
2
−2t
5t
e
+ 6e
(b) x =
−6 e−2t + 6 e5t
 
 
 
2
3
6
(c) x = c1 2+c2 et 1+c3 e−t 1
2
2
5
 5t

3t
3e − 2e
(d) x(t) = −3e5t + 3e3t 
3e5t − 4e3t
− sin 3t
2. (a) x = c1 e4t
cos3t
cos 3t
+ c2 e4t
sin 3t
4 sin 2t + 3 cos 2t
(b) x =
2 sin 2t − cos 2t
 
1
(c) x = c1 et 0
 0

0
+ c2 e2t cos 3t + sin 3t
 − cos 3t

0
+ c3 e2t − sin 3t + cos 3t
sin 3t


2 sin 2t + cos 2t
(d) x = 2 cos 2t − sin 2t
3 e3t
c1 + c2 (t + 1)
−c1 − c2 t
4t c1 + c2 (t − 1)
(b) x = e
−c1 − c2 t
3. (a) x = e−3t
4. (a)
(b)
(c)
(d)
−1
6t 4
x(t) =
+ c1 e
1/2
3
−1
+ c2 e−t
1
1 5t 7
t −3
x(t) = e
+2e
1
−1
1
+ 12 e−t
1
 
 
6
7
x(t) = e2t 2 + c1 2
5
4
 
 
2
3
+ c2 et 1 + c3 e−t 1
2
2
 
 
1
1
x(t) = −2 + c1 e5t −1 +
3
1
 
 
0
−2
c2 e3t  3  + c3 e3t 0
1
0
Section 7.4
1. x1 (t) = 80 − 30 e−0.1t − 20 e−0.3t
x2 (t) = 80 − 60 e−0.1t + 40 e−0.3t
−3t/4
2. x1 (t) = 10 + 5 e
x2 (t) = 20 − 5 e−3t/4
3. x1 (t) = 20 + 12 e−11t + 5 e−18t
x2 (t) = 6 + 8 e−11t + 15 e−18t
x3 (t) = 40 − 20 e−11t − 20 e−18t
4. x1 (t) = 10
x2 (t) = 10 − 5 e−t/10
x3 (t) = 10 − 5 e−t/10 − 5e−t/5
Section 7.5
1. The general
solution is
ct
c1 + c2 t
− 2m
x=e
c
c
−c1 2m
− c2 t 2m
+ c2
The first component is the position
ct
and x(t) = e− 2m (c1 +c2 t) agrees with
the solution found in Section 2.4.
2. The spring forces on m1 come from
the left and middle springs, and
on m2 from the middle and right
springs. Calculate these.
3. Let µ = λ2 , apply the quadratic formula to find µ, and compare terms
to show µ± < 0.
4. Natural frequencies
and
 are ω = 1, 2 

− cos t
sin t
 sin t 
 cos t 



x(t) = c1 
 2 sin t  + c2 −2 cos t +
2 sin t
2 cos t




cos 2t
− sin 2t
−2 sin 2t
−2 cos 2t



c3 
 sin 2t  + c4  − cos 2t 
2 sin 2t
2 cos 2t


0
1
0
0
−q1 −λ q2
0 
 where
5. x0 = 
 0
0
−λ
1 
q3
0 −q3 −q4
q4 = c2 /m2 while q1 , q2 , q3 are as for
case of three springs.
6. Resonant frequencies are ω = 1, 2.
For nonresonant ω,
1
cos ωt
x(t) = (ω2 −1)(ω
+
2 −4)
3 − ω2
1
1
c1 cos t
+
c2 sin t
+
2
2
1
1
c3 cos 2t
+ c4 sin 2t
−2
−2
7. x00i (t) ≡ 0 and Ax1 = Av = 0 =
A(tv) = A(x2 (t)).
 
 
1
1
8. x(t)
=
a0 1 + b0 t 1 +
1
1
 
 
1
1
a1 cos 2t  0  + b1 sin 2t  0  +
−1
−1
 
 
1
1
a2 cos 4t −3 + b2 sin 4t −3
1
1
9. To the solution xh (t) in Exercise 8,
add the particular
 solution

−4
3t 
5
xp (t) = 2 cos
21
−4
Section 7.6: Odd numbers only
1. Since x1 (0) = (0, 0) = x2 (0), uniqueness of solutions would imply x1 (t) ≡
x2 (t), which is not the case.
3. We differentiate the series term-byterm and then factor 2out A:
x(t) = (I + tA + (tA)
2! + · · · )v
2
0
2
x (t) = (A + tA + t2! A3 + · · · )v
2
= A(I + tA + (tA)
2! + · · · )v
= AetA v = Ax(t).
Check I.C.: x(0) = e0 v = Iv = v.
"
#
4t
4t
5. etA =
1+e
2
4t
1−e
4
1+e4t
2
√
√
√

7
t
7 cos 27 t+sin
2
√


7
−4 sin

2 t √ 
√
+c4 e−t/2  √

7
7
− 7 cos 2 t−sin 2 t
√
4 sin 27 t
(c) The system is underdamped
since x1 = x and x3 = y oscillate as they decay.
(d) Not all solutions decay to zero
since can have c1 6= 0; but this
represents a shift in the location of the spring rather than a
change in the motion.
√
9. (a) x(t) = 12t + √85 sin 5t
√
12
y(t) = 12t − √
sin 5t
5
√
(b) t∗ = π/ 5
(c) v1 = 4, v2 = 24
(d) momentum is 60, energy is 600
11. x = Ey ⇒ x0 = Ey0 and Ax + f =
AEy + f , so Ey0 = AEy + f . Multiply both sides on the left by E−1 and
use E−1 AE = D.
cos t 5 t 5 −t − 2 + 4e + 4e
13. x =
− sin2 t + 45 et − 54 e−t
1−e
Appendix A
1 + e4t
x(t) =
√
2(1 − e4t )
1. (a) z = 1 − 2i, |z| = 5


√
0
1
0
0
(b) z = 2 + 3i, |z| = 13
−1 −1 1
0
x
7. (a) x0 = 
(c) z = −3 − 4i, |z| = 5
0
0
0
1
√
1
0 −1 −1
(d) z = 5 + 7i, |z| = 74
 
 
1
−1
2. (a) wz = −20 + 15i, wz = 54 − 35 i
0


−t  1 


(b) x(t) = c1   +c2 e  
1
−1
(b) wz = 5, wz = − 35 + 45 i
0
1
(c) wz = 5 − i, wz = − 12 − 52 i
√
√


√
7
7
cos 2 t− 7√sin 2 t
8
(d) wz = 17i, wz = 17
− 15
17 i


7
−4
cos
t


−t/2
2
√
√
√
+c3 e


− cos 27 t+ 7 sin 27 t 3. (a) 3 ei 3π/2 , (b) 3 ei 5π/4 ,
√
(c) 2 ei 5π/3 , (d) 5 ei 2.498 .
4 cos 27 t
Bibliography
[1] G. Birkhoff & G.-C. Rota, Ordinary Differential Equations (4th ed.), John Wiley,
New York, 1989.
[2] R. Churchill, Operational Mathematics (3rd ed.), McGraw-Hill, New York, 1972.
[3] M. Hirsch, S. Smale, Differential Equations, Dynamical Systems, and Linear Algebra, Academic Press, New York, 1974.
[4] G. Strang, Linear Algebra and its Applications (2nd ed.), Academic Press, 1980.
Index
nth-order differential equation, 35
adjoint matrix, 139
algebraic multiplicity, 185
alternating current, 73
amplitude-phase form, 51
argument of a complex number, 238
augmented coefficient matrix, 112
autonomous, 12, 204
back substitution, 114
basis
for a solution space, 167
for a vector space, 163
beats, 67
Bernoulli equations, 28
capacitor, 71
center, 205
chain of generalized eigenvectors, 219
characteristic equation
for a square matrix, 182
for an nth order equation, 45
characteristic polynomial
for a square matrix, 182
for an nth order equation, 45
circuits, electric, 71
circular frequency, 50
closed system (of tanks), 224
co-linear vectors, 153
coefficients, 35
coefficients of a system/matrix, 107
cofactor expansion, 136
cofactor matrix, 139
cofactor of a matrix, 136
column space of a matrix, 170
column vector, 107
complementary solution, 43
complex
conjugate, 237
number, 237
plane, 237
complex vector space, 148
components of a vector, 107, 145
conservation of energy, 56
convolution of functions, 100
cooling, 9
coordinates in Rn , 145
coupled mechanical vibrations, 228, 230
critical points, 12, 205
critically damped, 53
damped
forced vibrations, 67
free vibration, 53
damping coefficient, 38
dashpot, 50
defect (of an eigenvalue), 217
defective eigenvalue, 217
delta function, 96
determinant
of a (2 × 2)-matrix, 41, 126
of a square matrix, 131
diagonalizable matrix, 187
differential form, 29
Dirac delta function, 96
direct current, 73
direction field, 11
discontinuous inputs, 94
dot product, 108
drag, 17
echelon form, 113
eigenbasis, 187
eigenpair, 182
eigenvalue, 182
eigenvalue method, 214
eigenvector, 182
electric circuits, 71
electrical resonance, 76
elementary row operations, 113
elements of a matrix, 107
elimination
for first-order systems, 207
for linear systems, 105
equilibrium, 10, 12, 205
ERO, 113
Euler’s formula, 238
even permutation, 133
exact equations, 29
existence of solutions
for first-order equations, 11
for first-order systems, 204
for second-order equations, 39
exponential order, 82
external force, 38
first-order differential equation, 7
first-order linear system, 207
first-order system, 37, 203
forced vibrations, 65
forcing frequency, 65
free variables, 114
free vibrations, 38, 50
gamma function, 81
Gauss-Jordan method, 128
Gaussian elimination, 112
general solution
for 1st-order systems, 210
for 2nd-order equations, 41
for first-order equations, 8
generalized eigenvector, 217
geometric multiplicity, 185
Gram-Schmidt procedure, 176
homogeneous nth-order equation, 35
homogeneous equations (1st-order), 27
homogeneous first-order system, 207
homogeneous linear system, 117
Hooke’s law, 37
hyperbolic cosine, 84
hyperbolic sine, 84
identity matrix, 109
imaginary axis, 237
imaginary part, 237
implicit form, 14
impulsive force, 96
inconsistent equations, 106
inductor, 71
initial-value problem
for first-order equations, 8
for first-order system, 37
for second-order equation, 39
inner product, 173
integrating factor, 22
internal forces, 38
inverse Laplace transform, 83
inverse of a matrix, 125
invertible matrix, 125
jump discontinuity, 82
Laplace transform, 79
inverse, 83
leading 1, 113
leading variables, 114
length of a vector in Rn , 147
linear combination of vectors, 153
linear dependence
for functions, 40, 160
for vectors, 158
linear differential equation, 8
nth-order equations, 35
first-order equations, 21
second-order equations, 39
linear differential operator, 35
linear independence
for functions, 40, 160
for vectors, 158
linear span of vectors, 153
linear systems, 105
linear transformation, 181
linearization, 56
logistic model, 18
lower triangular form, 131
magnitude of a vector in Rn , 147
main diagonal of a matrix, 109
mass matrix, 230
mathematical models, 9
matrix, 107
addition, 107
determinant of, 131
elements of, 107
inverse, 125
multiplication, 108
scalar multiplication, 107
matrix exponential, 191
minor of a matrix, 136
mixture problems, 24
modulus, 237
multiple tank mixing, 222
multiplication of matrices, 108
multiplicity for an eigenvalue, 185
natural frequencies, 229
natural frequency, 65
nonhomogeneous nth-order equation, 35
nonhomogeneous first-order system, 207
norm of a vector, 172, 173
nullity of a matrix, 171
nullspace of a matrix, 152
odd permutation, 133
Ohm’s law, 72
open system (of tanks), 222
order (of a differential equation), 8
ordered basis, 198
ordinary differential equation, 10
orthogonal
basis, 174
matrix, 196
vectors, 172, 173, 193
orthogonally diagonalizable, 197
orthonormal
basis, 174
set, 173
overdamped, 53
parallelogram rule, 145
partial differential equation, 10
particular solution, 8
pendulum, 56
permutation, 133
phase plane portraits, 205
phase shift, 51
piecewise continuous, 82
piecewise differentiable, 86
pivot column, 115
pivot position, 115
polar form of a complex number, 238
population growth, 9
potential function, 30
practical resonance, 68, 76
projection
onto a subspace, 175
onto a vector, 175
pseudo-frequency, 54
pseudo-periodic, 54
radioactive decay, 9
radiocarbon dating, 21
rank of a matrix, 122
rank-nullity identity, 171
real axis, 237
real part, 237
real vector space, 148
reduced row-echelon form, 120
REF, 113
resistive force, 17
resistor, 71
resonance
for second-order equations, 67
for second-order systems, 232
restorative force, 37
row space of a matrix, 169
row vector, 107
row-echelon form, 113
row-equivalent matrices, 113
RREF, 120
saddle, 205
scalars, 146, 148
second-order system, 230
separable, 14
shifting theorems, 90
similar matrices, 189
singular matrix, 125
sink, 12, 205
skew-symmetric matrix, 142
slope field, 11
solution curve, 11
source, 12, 205
span(v1 , . . . , vn ), 153
spanning set, 153
spring constant, 37
square matrix, 109
stability
for first-order equations, 12
for first-order systems, 205
stable equilibrium, 12, 205
standard basis for Rn , 163
steady-periodic solution, 68
steady-periodic charge, 74
steady-periodic current, 74
step function, 82
stiffness matrix, 230
stochastic matrix, 202
straight-line solution, 214
subspace of a vector space, 151
substitution, 27, 105
successive approximations, 12
superposition, 36
symmetric matrix, 110
system
of first-order equations, 37, 203
of first-order linear equations, 207
of linear equations, 105
of second-order equations, 230
terminal velocity, 18
time lag, 51
Torricelli’s law, 14, 34
trace of a square matrix, 142
transfer function, 102
transient part, 68, 73
transition matrix, 190
transpose of a matrix, 110
trial solution, 58
triangular form, 131
trivial solution, 40, 117
undamped
forced vibrations, 66
free vibrations, 50
underdamped, 53
undetermined coefficients, 58, 212
uniqueness of solutions
for first-order equations, 11
unit step function, 82
unit vector
in Rn , 147
in an inner product space, 173
unstable equilibrium, 12, 205
upper triangular form, 131
variation of parameters, 64
vector field, 204
vector notation, 107
vector space, 148
vectors
in Rn , 145
in a vector space, 148
warming, 10
Wronskian, 41, 161
zero vector
in Rn , 145
in a vector space, 148