Differential Equations and Linear Algebra

Jason Underdown

December 8, 2014

Contents

Chapter 1. First Order Equations 1
1. Differential Equations and Modeling 1
2. Integrals as General and Particular Solutions 5
3. Slope Fields and Solution Curves 9
4. Separable Equations and Applications 13
5. Linear First–Order Equations 20
6. Application: Salmon Smolt Migration Model 26
7. Homogeneous Equations 28

Chapter 2. Models and Numerical Methods 31
1. Population Models 31
2. Equilibrium Solutions and Stability 34
3. Acceleration–Velocity Models 39
4. Numerical Solutions 41

Chapter 3. Linear Systems and Matrices 45
1. Linear and Homogeneous Equations 45
2. Introduction to Linear Systems 47
3. Matrices and Gaussian Elimination 50
4. Reduced Row–Echelon Matrices 53
5. Matrix Arithmetic and Matrix Equations 53
6. Matrices are Functions 53
7. Inverses of Matrices 57
8. Determinants 58

Chapter 4. Vector Spaces 61
1. Basics 61
2. Linear Independence 64
3. Vector Subspaces 65
4. Affine Spaces 65
5. Bases and Dimension 66
6. Abstract Vector Spaces 67

Chapter 5. Higher Order Linear Differential Equations 69
1. Homogeneous Differential Equations 69
2. Linear Equations with Constant Coefficients 70
3. Mechanical Vibrations 74
4. The Method of Undetermined Coefficients 76
5. The Method of Variation of Parameters 78
6. Forced Oscillators and Resonance 80
7. Damped Driven Oscillators 84

Chapter 6. Laplace Transforms 87
1. The Laplace Transform 87
2. The Inverse Laplace Transform 92
3. Laplace Transform Method of Solving IVPs 94
4. Switching 101
5. Convolution 102

Chapter 7. Eigenvalues and Eigenvectors 105
1. Introduction to Eigenvalues and Eigenvectors 105
2. Algorithm for Computing Eigenvalues and Eigenvectors 107

Chapter 8. Systems of Differential Equations 109
1. First Order Systems 109
2. Transforming a Linear DE Into a System of First Order DEs 112
3. Complex Eigenvalues and Eigenvectors 113
4. Second Order Systems 115

Chapter 1
First Order Equations

1. Differential Equations and Modeling

A differential equation is simply any equation that involves a function, say y(x), and any of its derivatives. For example,

(1) y'' = −y.

The above equation uses the prime notation (') to denote the derivative, which has the benefit of resulting in compact equations. However, the prime notation has the drawback that it does not indicate what the independent variable is. By just looking at equation 1 you can’t tell whether the independent variable is x or t or some other variable. That is, we don’t know if we’re looking for y(x) or y(t). So sometimes we will write our differential equations using the more verbose, but also clearer, Leibniz notation:

(1) d²y/dx² = −y

In the Leibniz notation, the dependent variable, in this case y, always appears in the numerator of the derivative, and the independent variable always appears in the denominator of the derivative.

Definition 1.1. The order of a differential equation is the order of the highest derivative that appears in it.

So the order of the previous equation is two. The order of the following equation is also two:

(2) x(y'')² = 36(y + x).

Even though y'' is squared in the equation, the highest order derivative is still just a second order derivative.

Our primary goal is to solve differential equations. Solving a differential equation requires us to find a function that satisfies the equation.
This simply means that if you replace every occurence of y in the differential equation with the found function, you get a valid equation. There are some similarities between solving differential equations and solving polynomial equations. For example, given a polynomial equation such as 3x2 − 4x = 4, it is easy to verify that x = 2 is a solution to the equation simply by substituting 2 in for x in the equation and checking whether the resulting statement is true. Analogously, it is easy to verify that y(x) = cos x satisfies, or is a solution to equation 1 by simply substituting cos x in for y in the equation and then checking if the resulting statement is true. ? (cos x)00 = − cos x ? (− sin x)0 = − cos x ? − cos x = − cos x X The biggest difference is that in the case of a polynomial equation our solutions took the form of real numbers, but in the differential equation case, our solutions take the form of functions. Example 1.2. Verify that y(x) = x3 − x is a solution of equation 2. y 00 = 6x ⇒ x(y 00 )2 = x(6x)2 = 36x3 = 36(y + x) 4 A basic study of differential equations involves two facets. Creating differential equations which encode the behavior of some real life situation. This is called modeling. The other facet is of course developing systematic solution techniques. We will examine both, but we will focus on developing solution techniques. 1.1. Mathematical Modeling. Imagine a large population or colony of bacteria in a petri dish. Suppose we wish to model the growth of bacteria in the dish. How could we go about that? Well, we have to start with some educated guesses or assumptions. Assume that the rate of change of this colony in terms of population is directly proportional to the current number of bacteria. That is to say that a larger population will produce more offspring than a smaller population during the same time interval. This seems reasonable, since we know that a single bacterium reproduces by splitting into two bacteria, and hence more bacteria will result in more offspring. How do we translate this into symbolic language? (3) ∆P = P ∆t 1. Differential Equations and Modeling 3 This says that the change in a population depends on the size of the population and the length of the time interval over which we make our population measurements. So if the time interval is short, then the population change will also be small. Similarly it roughly says that more bacteria correspond to more offspring, and vice versa. But if you look closely, the left hand side of equation 3 has units of number of bacteria, while the right hand side has units of number of bacteria times time. The equation can’t possibly be correct if the units don’t match. However to fix this we can multiply the left hand side by some parameter which has units of time, or we can multiply the right hand side by some parameter which has units of 1/time. Let’s multiply the right hand side by a parameter k which has units of 1/time. Then our equation becomes: (4) ∆P = kP ∆t Dividing both sides of the equation by ∆t and taking the limit as ∆t goes to zero, we get: dP ∆P = = kP lim ∆t→0 ∆t dt (5) dP = kP dt Here k is a constant of proportionality, a real number which allows us to balance the units on both sides of the equation and it also affords some freedom. In essence it allows us to defer saying how closely P and its derivative are related. If k is a large positive number, then that would imply a large rate of change, and a small positive number greater than zero but less than one would be a small rate of change. 
If k is negative then that would imply the population is shrinking in number. Example 1.3. If we let P (t) = Cekt , then a simple differentiation reveals that this is a solution to our population model in equation 5. Suppose that at time 0, there are 1000 bacteria in the dish. After one hour the population doubles to 2000. This data corresponds to the following two equations which allow us to solve for both C and k: 1000 = P (0) = Ce0 = C =⇒ C = 1000 2000 = P (1) = Cek The second equation implies 2000 = 1000ek which is equivalent to 2 = ek which is equivalent to k = ln 2. Thus we see that with these two bits of data we now know: P (t) = 1000eln(2)·t = 1000(eln(2) )t = 1000 · 2t This agrees exactly with our knowledge that bacteria multiply by splitting into two. 4 4 1. First Order Equations 1.2. Linear vs. Nonlinear. As you may have surmised we will not be able to exactly solve every differential equation that you can imagine. So it will be important to recognize which equations we can solve and those which we can’t. It turns out that a certain class of equations called linear equations are very amenable to several solution techniques and will always have a solution (under modest assumptions), whereas the complementary set of nonlinear equations are not always solvable. A linear differential equation is any differential equation where solution functions can be summed or scaled to get new solutions. Stated precisely, we mean: Definition 1.4. A differential equation is linear is equivalent to saying: If y1 (x) and y2 (x) are any solutions to the differential equation, and c is any scalar (real) number, then (1) y1 (x) + y2 (x) will be a solution and, (2) cy1 (x) will be a solution. This is a working definition, which we will change later. We will use it for now because it is simple to remember and does capture the essence of linearity, but we will see later on that we can make the definition more inclusive. That is to say that there are linear differential equations which don’t satisfy our current definition until after a certain piece of the equation has been removed. Example 1.5. Show that y1 (x) + y2 (x) is a solution to equation 1 when y1 (x) = cos x and y2 (x) = sin x. (y1 + y2 )00 = (cos x + sin x)00 = (− sin x + cos x)0 = (− cos x − sin x) = −(cos x + sin x) = −(y1 + y2 ) X 4 Notice that the above calculation does not prove that y 00 = −y is a linear differential equation. The reason for this is that summability and scalability have to hold for any solutions, but the above calculation just proves that summability holds for the two given solutions. We have no idea if there may be solutions which satisfy equation 1 but fail the summability test. The previous definition is useless for proving that a differential equation is linear. However, the negation of the definition is very useful for showing that a differential equation is nonlinear, because the requirements are much less stringent. Definition 1.6. A differential equation is nonlinear is equivalent to saying: y1 and y2 are any solutions to the differential equation, and c is any scalar (real) number, but 2. Integrals as General and Particular Solutions 5 (1) y1 + y2 is not a solution or, (2) cy1 is not a solution. Again, this is only a working definition. It captures the essence of nonlinearity, but since we will expand the definition of linearity to be more inclusive, we must by the same token change the definition of nonlinear in the future to be less inclusive. So let’s look at a nonlinear equation. 
Let y = differential equation: 1 c−x , then y will satisfy the y0 = y2 (6) because: y0 = 1 c−x 0 = (c − x)−1 0 = −(c − x)−2 · (−1) 1 = (c − x)2 = y2 We see that actually, y = any real number. 1 c−x is a whole family of solutions, because c can be Example 1.7. Use definition 1.6 to show that equation 6 is nonlinear. 1 1 and y2 (x) = 3−x . We know from the previous paragraph that Let y1 (x) = 5−x both of these are solutions to equation 6, but (y1 + y2 )0 = y10 + y20 1 1 = + 2 (5 − x) (3 − x)2 1 2 1 6= + + (5 − x)2 (5 − x)(3 − x) (3 − x)2 2 1 1 = + 5−x 3−x = (y1 + y2 )2 4 2. Integrals as General and Particular Solutions You probably didn’t realize it at the time, but every time you computed an indefinite integral in Calculus, you were solving a differential equation. For example if you R were asked to compute an indefinite integral such as f (x)dx where the integrand is some function f (x), then you were actually solving the differential equation dy (7) = f (x). dx 6 1. First Order Equations This is due to the fact that differentiation and integration are inverses of each other up to a constant. Which can be phrased mathematically as: Z Z dy y(x) = dx = f (x)dx = F (x) + C dx if F (x) is the antiderivative of f (x). Notice that the integration constant C can be any real number, so our solution y(x) = F (x) + C to equation 7 is not a single solution but actually a whole family of solutions, one for each value of C. Definition 1.8. A general solution to a differential equation is any solution which has an integration constant in it. As noted above, since a constant of integration is allowed to be any real number, a general solution is actually an infinite set of solutions, one for each value of the integration constant. We often say that a general solution with one integration constant forms a one parameter family of solutions. Example 1.9. Solve y 0 = x2 − 3 for y(x). x3 − 3x + C 3 Thus our general solution is y(x) = 31 x3 − 3x + C. Figure 2.1 shows plots of several solution curves for C values ranging from 0 to 3. Z y(x) = y 0 dx = Z (x2 − 3)dx = Figure 2.1. Family of solution curves for y 0 = x2 − 3. 4 Thus we see that whenever we can write a differential equation in the form y 0 = f (x) where the right hand side is only a function of x (or whatever the 2. Integrals as General and Particular Solutions 7 independent variable is, e.g. t) and does not involve y (or whatever the dependent variable is), then we can solve the equation merely by integrating. This is very useful. 2.1. Initial Value Problems (IVPs) and Particular Solutions. Definition 1.10. An initial value problem or IVP is a differential equation and a specific point which our solution curve must pass through. It is usually written: (8) y 0 = f (x, y) y(a) = b. Differential equations had their genesis in solving problems of motion, where the indpendent variable is time, t, hence the use of the word “initial”, to convey the notion of a starting point in time. Solving an IVP is a two step process. First you must find the general solution. Second you use the initial value y(a) = b to select one particular solution out of the whole family or set of solutions. Thus a particular solution is a single function which satisfies both the governing differential equation and passes through the initial value a.k.a. initial condition. Definition 1.11. A particular solution is a solution to an IVP. Example 1.12. Solve the IVP: y 0 = 3x − 2, y(2) = 5. 
y(x) = ∫ (3x − 2) dx = (3/2)x² − 2x + C
y(2) = (3/2)·2² − 2·2 + C = 5 ⟹ C = 3
y(x) = (3/2)x² − 2x + 3

2.2. Acceleration, Velocity, Position.

The method of integration extends to higher order equations. For example, when confronted with a differential equation of the form:

(9) d²y/dx² = f(x),

we simply integrate twice to solve for y(x), gaining two integration constants along the way.

y(x) = ∫ (dy/dx) dx = ∫ ( ∫ (d²y/dx²) dx ) dx = ∫ ( ∫ f(x) dx ) dx = ∫ (F(x) + C1) dx = G(x) + C1x + C2

where we are assuming G''(x) = F'(x) = f(x).

Acceleration is the time derivative of velocity (a(t) = v'(t)), and velocity is the time derivative of position (v(t) = x'(t)). Thus acceleration a(t) is the second derivative of position x(t) with respect to time, or a(t) = x''(t). If we let x(t) denote the position of a body, and we assume that the acceleration that the body experiences is constant with value a, then in the language of math this is written as:

(10) x''(t) = a

The right hand side of this is just the constant function f(t) = a, so this equation conforms to the form of equation 7. However the function name is x instead of y and the independent variable is t instead of x, but no matter, they are just names. To solve for x(t) we must integrate twice with respect to t, time.

(11) v(t) = x'(t) = ∫ x''(t) dt = ∫ a dt = at + v0

Here we’ve named our integration constant v0 because it must match the initial velocity, i.e. the velocity of the body at time t = 0. Now we integrate again.

(12) x(t) = ∫ v(t) dt = ∫ (at + v0) dt = (1/2)at² + v0t + x0

Again, we have named the integration constant x0 because it must match the initial position of the body, i.e. the position of the body at time t = 0.

Example 1.13. Suppose we wish to know how long it will take an object to fall from a height of 500 feet down to the ground, and we want to know its velocity when it hits the ground. We know from Physics that near the surface of the Earth the acceleration due to gravity is roughly constant with a value of 32 feet per second per second (ft/s²). Let x(t) represent the vertical position of the object with x = 0 corresponding to the ground and x(0) = x0 = 500. Since up is the positive direction and since the acceleration of the body is down towards the earth, a = −32. Although the problem says nothing about an initial velocity, it is safe to assume that v0 = 0.

x(t) = (1/2)at² + v0t + x0
x(t) = (1/2)(−32)t² + 0·t + 500
x(t) = −16t² + 500

We wish to know the time when the object will hit the ground, so we wish to solve the following equation for t:

0 = −16t² + 500
t² = 500/16
t = ±√(500/16)
t = ±(5/2)√5
t ≈ ±5.59

So we find that it will take approximately 5.59 seconds to hit the earth. We can use this knowledge and equation 11 to compute its velocity at the moment of impact.

v(t) = at + v0
v(t) = −32t
v(5.59) = −32 · 5.59
v(5.59) = −178.88 ft/s
v(5.59) ≈ −122 mi/hr

3. Slope Fields and Solution Curves

In section 1 we noticed that there are some similarities between solving polynomial equations and solving differential equations. Specifically, we noted that it is very easy to verify whether a function is a solution to a differential equation simply by plugging it into the equation and checking that the resulting statement is true. This is exactly analogous to checking whether a real number is a solution to a polynomial equation. Here we will explore another similarity.
You are certainly familiar with using the quadratic formula for solving quadratic equations, i.e. degree two polynomial equations. But you may not know that there are similar formulas for solving third degree and even fourth degree polynomial equations. Interestingly, it was proved early in the nineteenth century that there is no general formula similar to the quadratic formula which will tell us the roots of all fifth and higher degree 10 1. First Order Equations polynomial equations in terms of the coefficients. Put simply, we don’t have a formulaic way of solving all polynomial equations. We do have numerical techniques (e.g. Newton’s Method) of approximating the roots which work very well, but these do not reveal the exact value. As you might suspect, since differential equations are generally more complicated than polynomial equations the situation is even worse. No procedure exists by which a general differential equation can be solved explicitly. Thus, we are forced to use ad hoc methods which work on certain classes of differential equations. Therefore any study of differential equations necessarily requires one to learn various ways to classify equations based upon which method(s) will solve the equation. This is unfortunate. 3.1. Slope Fields and Graphing Approximate Solutions. Luckily, in the case of first order equations a simple graphical method exists by which we may estimate solutions by constraints on their graphs. This method of approximate solution uses a special plot called a slope field. Specifically, if we can write a differential equation in the form: dy = f (x, y) (13) dx then we can approximate solutions via the slope field plot. So how does one construct such a plot? The answer lies in noticing that the right hand side of equation 13 is a function of points in the xy plane which result in the left hand side which is exactly the slope of y(x), the solution function we seek! If we know the slope of a function at every point on the x–axis, then we can graphically reconstruct the solution function y(x). Creating a slope field plot is normally done via software on a computer. The basic algorithm that a computer employs to do this is essentially the following: (1) Divide the xy plane evenly into a grid of squares. (2) For each point (xi , yi ) in the grid do the following: (a) compute the slope, dy/dx = f (xi , yi ). (b) Draw a small bar centered on the point (xi , yi ) with slope computed above. (Each bar should be of equal length and short enough so that they do not overlap.) Let’s use Maple to create a slope field plot for the differential equation y (14) y0 = 2 . x +1 with(DEtools): DE := y’(x) = y(x)/(x^2+1) dfieldplot(DE, y(x), x=-4..4, y=-4..4, arrows=line) Maple Listing 1. Slope field plot example. See figure 3.1. Because any solution curve must be tangent to the bars in the slope field plot, it is fairly easy for your eye to detect possible routes that a solution curve could 3. Slope Fields and Solution Curves Figure 3.1. Slope field plot for y 0 = 11 y . x2 +1 take. One can immediately gain a feel for the qualitative behavior of a solution which is often more valuable than a quantitative solution when modeling. 3.2. Creating Slope Field Plots By Hand. The simple algorithm given above is fine for a computer program, but is very hard for a human to use in practice. However there is a simpler algorithm which can be done by hand with pencil and graph paper. 
The main idea is to find the isoclines in the slopefield, and plot regularly spaced, identical slope bars over the entire length of the isocline. Definition 1.14. An isocline is a line or curve decorated by regularly spaced short bars of constant slope. Example 1.15. Suppose we wish to create a slope–field plot for the differential equation dy = x − y = f (x, y). dx The method involves two steps. First, we create a table. Each row in the table corresponds to one isocline. Second, for each row in the table we graph the corresponding isocline and decorate it with regularly spaced bars, all of which have equal slope. The slope corresponds to the value in the first column of the table. Table 1 contains the data for seven isoclines, one for each integer slope value from −3, . . . , 3. We must graph each equation of a line from the third column, and decorate it with regularly spaced bars where the slope comes from the first column. Figure 3.2. Isocline slope–field plot for y 0 = x − y. 12 1. First Order Equations m -3 -2 -1 0 1 2 3 m = f (x, y) y = h(x) −3 = x − y −2 = x − y −1 = x − y 0=x−y 1=x−y 2=x−y 3=x−y y=x+3 y=x+2 y=x+1 y=x y=x−1 y=x−2 y=x−3 Table 1. Isocline method. 4 3.3. Existence and Uniqueness Theorem. It would be useful to have a simple test that tells us when a differential equation actually has a solution. We need to be careful here though, because recall that a general solution to a differential equation is actually an infinite family of solution functions, one for each value of the integration constant. We need to be more specific. What we should really ask is, “Does my IVP have a solution?” Recall that an IVP (Initial Value Problem) is a differential equation and an initial value, (8) y 0 = f (x, y) y(a) = b. If a particular solution exists, then our follow up question should be, “Is my particular solution unique?”. The following theorem gives a test that can be performed to answer both questions. Theorem 1.16 (Existence and Uniqueness). Consider the IVP dy = f (x, y) y(a) = b dx (1) Existence If f (x, y) is continuous on some rectangle R in the xy–plane which contains the point (a, b), then there exists a solution to the IVP on some open interval I containing the point a. ∂ (2) Uniqueness If in addition to the conditions in (1), ∂y f (x, y) is continuous on R, then the solution to the IVP is unique in I. √ Example 1.17. Consider the IVP: y 0 = 3 y y(0) = 0. Use theorem 1.16 to determine (1) whether or not a solution to the IVP exists, and (2) if one does, whether it is unique. (1) The cube root function is defined for all real numbers, and is continuous everywhere thus a solution to the IVP exists. 4. Separable Equations and Applications (2) f (x, y) = √ 3 13 1 y = y3 ∂f 1 2 = y− 3 ∂y 3 1 = p 3 3 y2 which is discontinuous at (0, 0), thus the solution is not unique. 4 4. Separable Equations and Applications In the previous section we explored a method of approximately solving a large class dy = f (x, y), where the right hand side is any of first order equations of the form dx function of both the independent variable x and the dependent variable y. The graphical method of creating a slope field plot is useful, but not ideal because it does not yield an exact solution function. Luckily, a large subclass (subset) of these equations, the so–called separable equations can be solved exactly. 
Essentially an equation is separable if the right hand side can be factored into a product of two functions, one a function of the independent variable, and the other a function of the dependent variable. Definition 1.18. A separable equation is any differential equation that can be written in the form: dy (15) = f (x)g(y). dx Example 1.19. Determine whether the following equations are separable or not. dy (1) = 3x2 y − 5xy dx dy x−4 (2) = 2 dx y +y+1 dy √ (3) = xy dx dy (4) = y2 dx dy (5) = 3y − x dx dy (6) = sin(x + y) + sin(x − y) dx dy (7) = exy dx dy (8) = ex+y dx Solutions: (1) separable: 3x2 y − 5xy = (3x2 − 5x)y 14 1. First Order Equations x−4 = (x − 4) 2 y +y+1 √ √ √ (3) separable: xy = x y (2) separable: 1 2 y +y+1 (4) separable: y 2 = y 2 · 1 (5) not separable (6) separable: sin(x + y) + sin(x − y) = 2 sin(x) cos(y) (7) not separable (8) separable: ex+y = ex · ey 4 Before explaining and justifying the method of separation of variables formally, it is helpful to see an example of how it works. A good way to remember this method is to remember that it allows us to treat derivatives written using the Leibniz notation as if they were actual fractions. Example 1.20. Solve the initial value problem: dy = −kxy, dx assuming k is a positive constant. y(0) = 4, dy = −kx dx y Z Z dy = −k x dx y x2 ln |y| = −k +C 2 eln|y| = e(−k k x2 2 +C) 2 |y| = e(− 2 x ) · eC y = C0 e let C0 = eC 2 (− k 2x ) Now plug in x = 0 and set y = 4 to solve for our parameter C0 . 4 = C0 e0 = C0 =⇒ k 2 y(x) = 4e− 2 x 4 There are several steps in the above solution which should raise an eyebrow. First, how can you pretend that the derivative dy/dx is a fraction when clearly it is just a symbol which represents a function? Second, why are we able to integrate with respect to x on the right hand side, but with respect to y which is a function of x on the left hand side? The rest of the solution just involves algebraic manipulations and is fine. The answer to both questions above is that what we did is simply “shorthand” for a more detailed, fully correct solution. Let’s start over and solve equation 15. 4. Separable Equations and Applications 15 dy = f (x)g(y) dx 1 dy = f (x) g(y) dx So far, so good, all we have to watch out for is when g(y) = 0, but that just means that our solutions y(x) might not be defined for the whole real line. Next, let’s integrate both sides of the equation with respect to x, and we’ll rewrite y as y(x) to remind us that it is a function of x. Z dy 1 g(y(x)) dx Z dx = f (x) dx Now, to help us integrate the left hand side, we will make a u–substitution. u = y(x) dy dx. dx Z du = f (x) dx du = Z 1 g(u) This equation matches up with the second line in the example above. The “shorthand” technique used in the example skips the step of making the u–substitution. If we can integrate both sides, then on the left hand side we will have some function of u = y(x), which we can hopefully solve for y(x). However, even if we cannot solve for y(x) explicitly, we will still have an implicit solution which can be useful. Now, let’s use the above technique of separation of variables to solve the Population model from section 1. Example 1.21. Find the general solution to the population model: (5) dP = kP. dt dP = kdt Z P Z dP = k dt P ln |P | = kt + C eln|P | = ekt+C eln|P | = ekt · eC , let P0 = eC (16) P (t) = P0 ekt 4 16 1. First Order Equations The separation of variables solution technique is important because it allows us to solve several nonlinear equations. 
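When a computer algebra system is available, it is also easy to check this kind of hand work. The short Maple session below is only a sketch (the labels DE1 and DE2 are names chosen here for illustration); it asks dsolve to reproduce the solutions found in examples 1.20 and 1.21.

DE1 := diff(y(x), x) = -k*x*y(x):    # the equation from example 1.20
dsolve({DE1, y(0) = 4}, y(x));       # should return y(x) = 4*exp(-k*x^2/2)
DE2 := diff(P(t), t) = k*P(t):       # the population model, equation 5
dsolve(DE2, P(t));                   # should return P(t) = _C1*exp(k*t)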
Let’s use the technique to solve equation 6 which is the first order, nonlinear differential equation we examined in section 1. Example 1.22. Solve y 0 = y 2 . dy = y2 dx Z Z dy = dx y2 Z Z y −2 dy = dx −y −1 = x + C 1 − =x+C y 1 =C −x y 1 y(x) = C −x absorb negative sign into C 4 4.1. Radioactive Decay. Most of the carbon in the world is of the isotope carbon–12, (126 C), but there are small amounts of carbon–14, (146 C) continuously being created in the upper atmosphere as a result of cosmic rays (neutrons in this case) colliding with nitrogen. 1 0n + 147 N → 146 C + 11 p The resulting 146 C is radioactive and will eventually beta decay to and an anti–neutrino: 14 6C 14 7 N, an electron → 147 N + e− + ν̄e The half–life of 146 C is 5732 years. This is the time it takes for half of the 146 C in a sample to decay to 147 N. The half–life is determined experimentally. From this knowledge we can solve for the constant of proportionality k: 1 P0 = P0 ek·5732 2 1 = ek·5732 2 1 ln = 5732k 2 ln(1) − ln(2) k= 5732 − ln(2) k= 5732 k ≈ −0.00012092589 4. Separable Equations and Applications 17 The fact that k is negative is to be expected, because we are expecting the population of carbon–14 atoms to diminish as time goes on since we are modeling exponential decay. Let us now see how we can use our new knowledge to reliably date ancient artifacts. All living things contain trace amounts of carbon–14. The proportion of carbon– 14 to carbon–12 in an organism is equal to the proportion in the atmosphere. This is because although carbon atoms in the organism continually decay, new radioactive carbon–14 atoms are taken in through respiration or consumption. That is to say that a living organism whether it be a plant or animal continually replenishes its supply of carbon–14. However, once it dies the process stops. If we assume that the amount of carbon–14 in the atmosphere has remained constant for the past several thousand years, then we can use our knowledge of differential equations to carbon date ancient artifacts that contain once living material such as wood. Example 1.23 (Carbon Dating). The logs of an old fort contain only 92% of the carbon–14 that modern day logs of the same type of wood contain. Assuming that the fort was built at about the same time as the logs were cut down, how old is the fort? Let’s assume that the decrease in the population of carbon–14 atoms is governed by the population equation dy/dt = ky, where y represents the number of carbon–14 atoms. From previous work, we know that solution to this equation is y(t) = y0 ekt , where y0 is the initial amount of carbon–14. We know that currently the wood contains 92% of the carbon–14 that it would have had upon being cut down, thus we can solve: 0.92y0 = y0 ekt ln(0.92) = kt ln(0.92) k 5732 ln(0.92) t= − ln(2) t ≈ 778 years t= 4 4.2. Diffusion. Another extremely important separable equation comes about from modeling diffusion. Diffusion is the spreading of something from a concentrated state to a less concentrated state. We will model the diffusion of salt across a semi– permeable membrane such as a cell wall. Imagine a cell, which contains a salt solution that is immersed in a bath of saline solution. If the salt concentration inside the cell is Figure 4.1. Cell in salt higher than outside the cell, then salt will on average, mostly bath 18 1. First Order Equations flow out of the cell, and vice versa. 
Let’s assume that the rate of change of salt concentration in the cell is proportional to the difference between the concentrations outside and inside the cell. Also, let’s assume that the surrounding bath is so much larger in volume than the cell, that its concentration remains essentially constant because the outflow from the cell is miniscule. We must translate these ideas into a model. If we let y(t) represent the salt concentration inside the cell, and A the constant concentration of the surrounding bath, then we get the diffusion equation: dy = k(A − y) dt (17) Again, k is a constant of proportionality with units, 1/time, and we assume k > 0. This is a separable equation, so we know how to solve it. Z Z dy = k dt A−y Z Z du − = k dt u − ln |A − y| = kt + C |A − y| = e−kt−C u = A − y, −du = dy let C0 = e−C |A − y| = C0 e−kt ( C0 e−kt A−y = −C0 e−kt ( C0 e−kt y = A− −C0 e−kt A>y A<y A>y A<y Thus we get two solutions depending on which concentration is initially higher. (18) y(t) = A − C0 e−kt A>y (19) −kt A<y y(t) = A + C0 e Actually, there is a third rather uninteresting solution which occurs when A = y, but then the right hand side of equation 17 is simply 0, which forces y(t) = A, the constant solution. A remark is in order here. Rather than memorizing the solution, it is far better to become familar with the steps of the solution. Example 1.24. Suppose a cell with a salt concentration of 5% is immersed in a bath of 15% salt solution. If the concentration in the cell doubles to 10% in 10 minutes, how long will it take for the salt concentration in the cell to reach 14%? We wish to solve the IVP: dy = k(.15 − y) y(0) = .05, dt along with the extra information y(10) = .10. 4. Separable Equations and Applications 19 Z Z dy = k dt .15 − y Z Z du − = k dt u − ln |.15 − y| = kt + C u = .15 − y, −du = dy |.15 − y| = e−kt−C .15 − y = C0 e−kt y = .15 − C0 e−kt .05 = .15 − C0 e0 ⇒ C0 = .10 Now we can use the second condition, (point on the solution curve), to determine k: y(t) = .15 − .10e−kt .10 = .15 − .10e−k·10 .15 − .10 e−k·10 = .10 1 −k · 10 = ln 2 ln(2) − ln(1) k= 10 ln(2) k= 10 Figure 4.2 graphs a couple of solution curves, for a few different starting cell concentrations. Notice that in the limit, as time goes to infinity all cells placed in this salt bath will approach a concentration of 15%. In other words, all cells will eventually come to equilibrium with their environment. with(DEtools) DE := y’(t) = k*(A-y(t)) A := .15 k := ln(2)/10 IVS := [y(0)=.25, y(0)=.15, y(0)=.05] # Initial values array DEplot(DE, y(t), t=0..60, IVS, y=0..0.3, linecolor=navy) Maple Listing 2. Diffusion example. See figure 4.2. 20 1. First Order Equations Figure 4.2. Three solution curves for example 1.24, showing the change in salt concentration due to diffusion. Finally, we wish to find the time at which the salt concentration of the cell will be exactly 14%. To find this time, we solve the following equation for t: .14 = .15 − .10e−kt .15 − .14 = .1 e−kt = .10 1 −kt = ln 10 −kt = ln(1) − ln(10) −kt = − ln(10) ln(10) k 10 ln(10) t= ln(2) t ≈ 33.22 minutes t= 4 5. Linear First–Order Equations 5.1. Linear vs. Nonlinear Redux. In section 1 we defined a differential equation to be linear if all of its solutions satisfied summability and scalability. A first–order, linear differential equation is any differential equation which can be written in the following form: (20) a(x)y 0 + b(x)y = c(x). 5. 
Linear First–Order Equations 21 If we think of y 0 and y as variables, then this equation is reminiscent of linear equations from algebra, except that the coefficients are now allowed to be functions of the independent variable x, instead of real numbers. Of course, y and y 0 are functions, not variables, but the analogy is useful. Notice that the coefficient functions are strictly forbidden from being functions of y or any of its derivatives. The above definition of linear extends to higher–order equations. For example, a fourth order, linear differential equation can be written in the form: (21) a4 (x)y (4) + a3 (x)y 000 + a2 (x)y 00 + a1 (x)y 0 + a0 (x)y = f (x) Definition 1.25. In general, an n–th order, linear differential equation is any equation which can be written in the form: (22) an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y 0 + a0 (x)y = f (x). This is not just a working definition. It is the definition that we will continue to use throughout the text. Notice that this definition is very different from the previous definition in section 1. That definition suffered from the defect that it was impossible to positively determine whether an equation was linear. We could only use it to determine when a differential equation is nonlinear. The above definition is totally different. You can use the above definition to tell on sight (with practice) whether or not a given differential equation is linear. Also notice that it suffers from being a poor tool for determining whether a given differential equation is nonlinear. This is because, you don’t know if perhaps your are just not being clever enough to write the equation in the form of the definition. These notes focus on solving linear equations, however recall from section 4 that we can solve nonlinear, first–order equations when they are separable. However, in general, solving higher order, nonlinear differential equations is much more difficult. However, not all is lost. A Norwegian mathematician named Sophus Lie (prononunced “lee”) discovered that if a differential equation possesses a type of transfomational symmetry, then that symmetry can be used to find solutions of the equation. His work led a German mathematician, Hermann Weyl, to extend Lie’s ideas and today Weyl’s work forms the foundations of much of modern Quantum Mechanics. Lie’s symmetry methods are beyond the scope of this book, but if you are a Physics student, you should definitely look into them after completing this course. 5.2. The Integrating Factor Method. Good news. We can solve any first order, linear differential equation! The caveat here is that the method involves integration, so a solution function might have to be defined in terms of an integral, that is, it might be an accumulation function. The first step in this method is to divide both sides of equation 20 by the coefficient function of y 0 , i.e. a(x). a(x)y 0 + b(x)y = c(x) =⇒ y0 + c(x) b(x) y= a(x) a(x) 22 1. First Order Equations We will rename b(x)/a(x) to p(x) and c(x)/a(x) to q(x) and rewrite this equation in what we will call standard form for a first order, linear equation. y 0 + p(x)y = q(x) (23) The reason for using p(x) and q(x) is simply because they are easier to write than b(x)/a(x) and c(x)/a(x). The heart of the method is what follows. If the left hand side of equation 23 were the derivative of some expression, then we could perhaps get rid of the prime on y 0 by integrating both sides and then algebraically solve for y(x). 
Notice that the left hand side of equation 23 almost resembles the result of differentiating the product of two functions. Recall the product rule: d [uv] = u0 v + uv 0 . dx Perhaps we can multiply both sides of equation 23 by something that will make the left hand side into an expression which is the derivative of a product of two functions. Remember, however, that we must multiply both sides of the equation by the same factor or else we will be solving an entirely different equation. Let’s call this factor ρ(x) because the Greek letter “rho” resembles the Latin letter “p”, and we will see that p(x) must be related to ρ(x). That is we want: d (24) [yρ] = y 0 ρ + ypρ dx By comparing with the product rule, we find that if ρ0 = pρ, then the expression y 0 ρ + ypρ0 will indeed be the derivative of the product yρ. Notice that we have reduced the problem down to solving a first order, separable equation that we know how to solve. ρ0 = p(x)ρ (25) =⇒ ρ=e R p(x)dx Upon multiplying both sides of equation 23 by the integrating factor ρ from equation 25, we get: R p(x)dx 0 R ) = q(x)e p(x)dx Z Z R R p(x)dx 0 (ye ) dx = q(x)e p(x)dx dx Z R R ye p(x)dx = q(x)e p(x)dx dx Z R R − p(x)dx y=e q(x)e p(x)dx dx (ye You should not try to memorize the formula above. Instead remember the following steps: (1) Put the first order linear equation in standard form. (2) Calculate ρ(x) = e R p(x)dx . (3) Multiply both sides of the equation by ρ(x). (4) Integrate both sides. 5. Linear First–Order Equations 23 (5) Solve for y(x). Example 1.26. Solve xy 0 − y = x3 for y(x). 1 y = x2 x (1) y 0 − (2) ρ(x) = e− (3) y 0 (4) R R dx x −1 = e− ln|x| = eln(|x| ) = 1 1 = |x| x x>0 1 1 1 − y 2 = x2 x x x y (5) y = 1 x 0 dx = R xdx 1 3 x + Cx 2 =⇒ y 1 1 = x2 + C x 2 x>0 4 An important fact to notice is that we ignore the constant of integration when computing the integrating factor ρ. This is because the constant of integration is part of the exponent of e. Assume P (x) is the antiderivative of p(x), then R ρ=e p(x)dx = e(P (x)+C) = eC · eP (x) = C1 eP (x) . Since we multiply both sides of the equation by the integrating factor, the C1 s cancel out. 5.3. Mixture Problems. One very common modeling technique heavily used throughout the sciences is called compartmental analysis. The idea is to model the spread of some measurable quantity such as a chemical as it travels from one compartment to the next. Compartment models are used in many fields including medicine, epidemiology, engineering, physics, climate science and the social sciences. Figure 5.1. A brine mixing tank We will build a simple model based upon a brine mixing tank. Imagine a mixing tank with a brine solution flowing into the tank, being well mixed, and then flowing out a spigot. If we let x(t) represent the amount of salt in the tank at time t, then the main idea of the model is: dx = “rate in - rate out”. dt 24 1. 
First Order Equations We will use the following names/symbols for the different quantities in the model: Symbol x(t) ci (t) fi (t) co (t) fo (t) v(t) Interpretation = = = = = = amount of salt in the tank (lbs) concentration of incoming solution (lbs/gal) flow rate of incoming solution (gal/min) concentration of outgoing solution (lbs/gal) flow rate of outgoing solution (gal/min) amount of brine in the tank (gal) Notice that if you multiply a concentration by a flow rate, then the units will be lbs/min which exactly match the units of the derivative dx/dt, hence our model is: (26) dx = ci (t)fi (t) − co (t)fo (t) dt Often, ci , fi and fo will be fixed quantities, but co (t) depends upon the amount of salt in the tank at time t, and the volume of brine in the tank at that time. If we assume that the incoming salt solution and the solution in the tank are perfectly mixed, then: x(t) (27) co (t) = . v(t) Often the flow rate in, fi , and the flow rate out, fo , will be equal. When this is the case, the volume of the tank will remain constant. However if the two flow rates do not match, then v(t) = [fi (t) − fo (t)]t + v0 , where v0 is the initial volume of the tank. Now we can rewrite equation 26, in the same form as the standard first order linear equation. (28) dx fo (t) + x = ci (t)fi (t) dt v(t) Example 1.27 (Brine Tank). A tank initially contains 200 gallons of brine, holding 50 lbs of salt. Salt water (brine) containing 2 lbs of salt per gallon flows into the tank at a constant rate of 4 gal/min. The mixture is kept uniform by constant stirring, and the mixture flows out at a rate of 4 gal/min. Find the amount of salt in the tank after 40 minutes. dx = ci fi − co fo dt dx 2 lb 4 gal x lb 4 gal = − dt gal min 200 gal min dx 1 =8− x dt 50 dx 1 + x=8 dt 50 5. Linear First–Order Equations 25 This equation can be solved via the integrating factor technique. ρ(t) = e R 1 50 dt = et/50 1 t/50 xe = 8et/50 50 d h t/50 i xe = 8et/50 dt Z Z d h t/50 i dt = 8 et/50 dt xe dt x0 et/50 + xet/50 = 8 · 50et/50 + C x(t) = e−t/50 [400et/50 + C] x(t) = 400 + Ce−t/50 Next we apply the initial condition x(0) = 50: 50 = 400 + Ce0 =⇒ C = −350 Finally, we compute x(40). x(t) = 400 − 350e−t/50 x(40) = 400 − 350e−40/50 x(40) ≈ 242.7 lbs Notice that limt→∞ x(t) = 400, which is exactly how much salt would be in a 200 gallon tank filled with brine at the incoming concentration of 2 lbs/gal. 4 In the previous example the inflow rate, fi and the outflow rate, fo were equal. This results in a convenient situation where the volume in the tank remains constant. However, this does not have to be the case. If fi 6= fo , then we need to find a valid expression for v(t). Example 1.28. Suppose we again have a 200 gallon tank that is initially filled with 50 gallons of pure water. If water flows in at a rate of 5 gal/min and flows out at a rate of 3 gal/min, when will the tank be full? The rate at which the volume of fluid in the tank changes depends on two factors, the initial volume of the tank, and the difference in flow rates. v(t) = v0 + [fi (t) − fo (t)]t In this example, we have: v(t) = 50 + [5 − 3]t v(t) = 50 + 2t. The tank will be completely full when v(t) = 200, and this will occur when t = 75. 4 26 1. First Order Equations 6. Application: Salmon Smolt Migration Model Salmon spend their early life in rivers, and then swim out to sea where they live their adult lives and gain most of their body mass. When they have matured, they return to the rivers to spawn. 
Usually they return with uncanny precision to the river where they were born, and even to the very spawning ground of their birth. The salmon run is the time when adult salmon, which have migrated from the ocean, swim to the upper reaches of rivers where they spawn on gravel beds. Unfortunately, the building of dams and the reservoirs produced by these dams have disrupted both the salmon run and the subsequent migration of their offspring to the ocean. Luckily, the problem of how to allow the adult salmon to migrate upstream past the tall dams has been solved with the introduction of fish ladders and in a few circumstances fish elevators. These devices allow the salmon to rise up in elevation to the level of the reservoir and thus overcome the dam. However, the reservoirs still cause problems for the new generation of salmon. About 90 to 150 days after deposition, the eggs or roe hatch. These young salmon called fry remain near their birthplace for 12 to 18 months before traveling downstream towards the ocean. Once they begin this migration to the ocean they are called smolts. The problem is that the reservoirs tend to be quite large and the smolt population literally becomes far less concentrated in the reservoir water than their original concentration in the stream or river which fed the reservoir. Thus the water exiting the reservoir through the spillway has a very low concentration of smolts. This increases the time required for the migration. The more time the smolts spend in the reservoir, the more likely it is that they will be preyed upon by larger fish. The question is how to speed up smolt migration through reservoirs in order to keep the salmon population at normal levels. Let s(t) be the number of smolts in the reservoir. It is impractical to measure the concentration of smolts in the river which feeds the reservoir (the tank). Instead we will assume that the smolts arrive at a steady rate, r which has units of fish/day. If we assume the smolts spread out thoroughly through the reservoir, then the outflow concentration of the smolts is simply the number of smolts in the reservoir, s(t) divided by the volume of the reservoir which for this part of the problem we will assume remains constant, v. Finally, assume the outflow of water from the reservoir is constant and denote it by f . We have the following IVP: (29) s(t) ds =r− f dt v s(0) = s0 We can use the integrating factor method to show that the solution to this IVP is: (30) s(t) = f vr vr + s0 − e− v t . f f 6. Application: Salmon Smolt Migration Model 27 ds f + s=r dt v Rf f dt v ρ(t) = e = evt Z f f se v t = re v t dt f se v t = vr f t ev + C f f Multiply both sides by e− v t : s(t) = f vr + Ce− v t f general solution Use the initial value, s(0) = s0 to find C: vr + Ce0 s0 = f vr C = s0 − f In the questions that follow, assume the following values, and keep all water measurements in millions of gallons. r = 1000 fish/day v = 50 million gallons f = 1 million gallons/day s0 = 25000 fish (1) How many fish are initially exiting the reservoir per day? (2) How many days will it take for the smolt population in the reservoir to reach 40000? One way to allow the smolts to pass through the reservoir more quickly is to draw down the reservoir. This means letting more water flow out than is flowing in. Reducing the volume of the reservoir increases the concentration of smolts resulting in a higher rate of smolts exiting the reservoir through the spillway. 
This situation can be modeled by the following IVP: ds s(t) =r− · fout dt v0 + ∆f t s(0) = s0 , where v0 is the initial volume of the reservoir and ∆f = fin − fout . Use this model and fin = 1 mil gal/day fout = 2 mil gal/day to find a function s(t) which gives the number of smolts in the reservoir at time t. (3) How many days will it take to reduce the smolt population from 25000 down to 20000? And what will the volume of the reservoir be? 28 1. First Order Equations 7. Homogeneous Equations A homogeneous function is a function with multiplicative scaling behaviour. If the input is multiplied by some factor then the output is multiplied by some power of this factor. Symbolically, if we let α be a scalar—any real number, then a function f (x) is homogeneous if f (αx) = αk f (x) for some positive integer k. For example, f (x) = 3x is homogeneous of degree 1 because f (αx) = 3(αx) = α3x = αf (x). In this example k = 1, hence we say f is homogeneous of degree one. A function whose graph is a line which does not pass through the origin, such as g(x) = 3x + 1 is not homogeneous because, g(αx) = 3(αx) + 1 = α(3x) + 1 6= α(3x + 1) = αg(x). Definition 1.29. A multivariable function, f (x, y, z) is homogeneous of degree k, if given a real number α the following holds f (αx, αy, αz) = αk f (x, y, z). In other words, scaling all of the inputs by the same factor results in the output being scaled by some power of that factor. Monomials in n variables form homogeneous functions. For example, the monomial in three variables: f (x, y, z) = 4x3 y 5 z 2 is homogeneous of degree 10 since, f (αx, αy, αz) = 4(αx)3 (αy)5 (αz)2 = α10 (4x3 y 5 z 2 ) = α10 f (x, y, z). Clearly, the degree of a monomial function is simply the sum of the exponents on each variable. Polynomials formed from monomials of the same degree are homogeneous functions. For example, the polynomial function g(x, y) = x3 + 5x2 y + 9xy 2 + y 3 is homogeneous of degree three since, g(αx, αy) = α3 g(x, y). Definition 1.30. A first order differential equation is homogeneous if it can be written in the form (31) a(x, y) dy + b(x, y) = 0, dx where a(x, y) and b(x, y) are homogeneous functions of the same degree. Suppose both a(x, y) and b(x, y) from equation (31) are of degree k, then we can rewrite equation (31) in the following manner: y k Z y xZ b 1, dy xy = F (32) =− . k dx x Z xZ a 1, x An example will illustrate the rewrite rule demonstrated in equation (32). 7. Homogeneous Equations 29 Example 1.31. Transform the following first order, homogeneous equation into dy the form dx = F ( xy ). (x2 + y 2 ) dy + (x2 + 2xy + y 2 ) = 0 dx (x2 + 2xy + y 2 ) dy =− dx (x2 + y 2 ) y y 2 2 Z x 1 + 2 + Z x x dy =− 2 y 2 dx Z xZ 1 + x 4 Definition 1.32. A multivariable function, f (x, y, z) is called scale invariant if given any scalar α, f (αx, αy, αz) = f (x, y, z). Lemma 1.33. A function of two variables f (x, y) is scale invariant iff the function depends only on the ratio xy of the two variables. In other words, there exists a function F such that y f (x, y) = F . x Proof. (⇒) Assume f (x, y) is scale invariant, then for all scalars α, f (αx, αy) = f (x, y). Pick α = 1/x, then f (αx, αy) = f (x/x, y/x) = f (1, y/x) = F xy . αy (⇐) Assume f (x, y) = F xy , then f (αx, αy) = F αx = F xy = f (x, y). Thus by the lemma, we could have defined a first order, homogeneous equation as one where the derivative is a scale invariant function. Equivalently we could have defined it to be an equation which has the form: y dy (33) =F . dx x 7.1. Solution Method. 
Homogeneous differential equations are special because they can be transformed into separable equations. Chapter 2 Models and Numerical Methods 1. Population Models 1.1. The Logistic Model. Our earlier population model suffered from the fact that eventually the population would “blow up” and grow at unrealistic rates. This was due to the fact that the solution involved an exponential function. Recall the model and solution: (5) (16) dP = kP dt P (t) = P0 ekt . Bacteria in a petri dish can’t reproduce forever because they eventually run out of food and space. In our previous population model, the constant of proportionality was actually the birth rate minus the death rate: k = β − δ, where k and therefore also β and δ have units of 1/time. To make our model more realistic, we need the birth rate to taper off as the population reaches a certain number or size. Perhaps the simplest way to accomplish this is to have it decrease linearly with population size. β(P ) = β0 − β1 P For this to make sense in the original equation, β0 must have units of 1/time, and β1 must have units of 1/(population·time). Let’s incorporate this new, decreasing birth rate into the original population model. 31 32 2. Models and Numerical Methods dP = [(β0 − β1 P ) − δ]P dt = P [(β0 − δ) − β1 P ] β0 − δ = β1 P −P β1 In order to get a simple, easy to remember equation, let’s let k = β1 and M = β0 −δ β1 . dP = kP (M − P ) dt (34) Notice that M has units of population. We have specifically written equation 34, in the form at the bottom of the derivation because M has a special meaning, it is the carrying capacity of the population. Notice that equation 34 is separable, so we know how to go about solving it. However, before we solve the logistic model, let’s refresh our memory of solving integrals via partial fractions, because we will need to use this when solving the logistic model. Let’s solve a simplified version of the logistic model, with k = 1 and M = 1. dx = x(1 − x) dt (35) Z 1 dx = dt x(1 − x) Z Z A B + dx = dt x (1 − x) A(1 − x) + Bx = 1 Z x = 0 : A(1 − 0) + B · 0 = 1 ⇒ A = 1 x = 1 : A(1 − 1) + B · 1 = 1 ⇒ B = 1 Z Z 1 1 + dx = dt x (1 − x) ln |x| − ln |1 − x| = t + C0 x = t + C0 ln 1 − x x t+C0 = C1 et 1 − x = e x x 1 − x = 1 − x x x−1 x ≥0 1−x 0≤x<1 x <0 1−x x<0 S x>1 1. Population Models 33 Let’s solve for x(t) for 0 ≤ x < 1: x = (1 − x)C1 et x + C1 xet = x0 et x(1 + C1 et ) = C1 et C 1 et 1 + C 1 et 1 x(t) = 1 + Ce−t x= (36) When x < 0 or x > 1, then we get: (37) x(t) = 1 1 − Ce−t The last solution occurs when x(t) = 1, because this forces dx/dt = 0. Figure 1.1 shows some solution curves superimposed on the slope field plot for x0 = x(1 − x). Notice that the solution x(t) = 1 seems to “attract” solution curves, but the solution x(t) = 0, “repels” solution curves. Figure 1.1. Slope field plot and solution curves for x0 = x(1 − x). Let us now use what we just learned to solve the logistic model, with an initial condition. (34) dP = kP (M − P ) dt P (0) = P0 34 2. Models and Numerical Methods Z Z 1 dP = k dt P (M − P ) Z Z 1/M 1/M + dP = k dt P (M − P ) Z Z 1 1 + dP = kM dt P (M − P ) P = kM t + C0 ln M −P P kM t M − P = C1 e P M − P P M − P = P P −M If we solve for the first case, we find: 0≤P <M P <0 S P >M P = (M − P )C1 ekM t P + P C1 ekM t = M C1 ekM t M C1 ekM t e−kM t · 1 + C1 ekM t e−kM t M C1 P = −kM t . 
e + C1 P = (38) Now we can plug in the initial condition to get a particular solution: M C1 1 + C1 P0 + P0 C1 = M C1 P0 = P0 = M C 1 − P0 C 1 P0 C1 = M − P0 0 M MP−P 0 P (t) = 0 e−kM t + MP−P 0 (39) P (t) = M P0 . P0 + (M − P0 )e−kM t 2. Equilibrium Solutions and Stability Whenever the right hand side of a first order equation only involves the dependent variable, then we can quickly determine the qualitative behavior of its solutions. 2. Equilibrium Solutions and Stability 35 For example, if a differential equation has the form: dy (40) = f (y). dx Definition 2.1. When the independent variable does not appear explicitly in a differential equation, we say that equation is autonomous. Recall from section 3 how a computer makes a slope field plot. It simply grids off the xy–plane and then at each vertex of the grid draws a short bar with slope corresponding to f (xi , yi ), however if the right hand side function is only a function of the dependent variable, y in this case, then the slope field does not depend on the independent variable, i.e. location on the x–axis. This means that for an autonomous equation, the slopes which lie on a horizontal line such as y = 2 are all equivalent and thus parallel. This means that if a solution curve is shifted (translated) left or right along the x-axis, then this shifted curve will also be a solution curve, because it will still fit the slope field. We have established an important property of autonomous equations, namely translation invariance. 2.1. Phase Diagrams. Consider the following autonomous differential equation: (41) y 0 = y(y − 2). Notice that the two constant functions y(x) = 0, and y(x) = 2 are solutions to equation 41. In fact any time you have an autonomous equation, any constant function which makes the right hand side of the equation zero will be a solution. This is because constant functions have slope zero. Thus as long as this constant value of y is a root of the right hand side, then that particular constant function will satisfy the equation. Notice that other constant functions such as y(x) = 1 and y(x) = 3 are not solutions of equation 41, because y 0 = 1(1 − 2) = −1 6= 0 and y 0 = 3(3 − 2) = 1 6= 0 respectively. Definition 2.2. Given an autonomous first order equation: y 0 = f (y), the solutions of f (y) = 0 are called critical points of the equation. So the critical points of equation 41 are y = 0 and y = 2. Definition 2.3. If c is a critical point of the autonomous first order equation y 0 = f (y), then y(x) ≡ c is an equilibrium solution of the equation. So the equilibrium solutions of equation 41, are y(x) = 0 and y(x) = 2. Something in equilibrium, is something that has settled and does not change with time, i.e. is contant. To create the phase diagram for this function we pick y values surrounding the critical points to determine whether the slope is positive or negative. y = −1 : −1(−1 − 2) = (−) (−) = + y = 1 : 1(1 − 2) = (+) (−) = − y = 3 : 3(3 − 2) = (+) (+) = + 36 2. Models and Numerical Methods Example 2.4. Create a phase diagram and plot several solution curves by hand for the differential equation: dx/dt = x3 − 7x2 + 10x. We factor the right hand side to find the critical points and hence equilibrium solutions. x3 − 7x2 + 10x = 0 x(x2 − 7x + 10) = 0 x(x − 2)(x − 5) = 0 The critical points are x = 0, 2, 5, and thus the equilibrium solutions are x(t) = 0, x(t) = 2 and x(t) = 5. Figure 2.1. Phase diagram for y 0 = y(y − 2). Figure 2.2. Phase diagram for x0 = x3 − 7x2 + 10x. Figure 2.3. 
Hand drawn solution curves for x0 = x3 − 7x2 + 10x. 2. Equilibrium Solutions and Stability 37 4 2.2. Logistic Model with Harvesting. A population of fish in a lake is often modeled accurately via the logistic model. But the question, “How do you take into account the decrease in fish numbers as a result of fishing?”, soon arises. If the amount of fish harvested from the lake is relatively constant per time period, then we can modify the original logistic model, equation 34, by simply subtracting the amount harvested. (42) dx = kx(M − x) − h dt Where h is the amount harvested, and where we have switched from the population being represented by the variable P to the variable x, simply because it is more familiar. Example 2.5. Suppose a lake has a carrying capacity of M = 16, 000 fish, and a k value of k = .125 = 18 . What is a safe yearly harvest rate? To simplify the numbers we have to deal with, let’s let x(t) measure the fish population in thousands. Then the equation we wish to examine is: 1 (43) x0 = x(16 − x) − h. 8 We don’t need to actually solve this differential equation to understand the behavior of its solutions. We just need to determine for which range of h values will the right hand side of the equation result in equilibrium solutions. Thus we only need to solve a quadratic equation with parameter h: 1 x(16 − x) − h = 0 8 x(16 − x) − 8h = 0 16x − x2 − 8h = 0 (44) (45) x2 − 16x + 8h = 0 √ −b ± b2 − 4ac x= √2a 16 ± 256 − 32h x= √2 16 ± 4 16 − 2h x= √2 x = 8 ± 2 16 − 2h Recall that if the discriminant is positive, i.e. 16 − 2h > 0, then we get two distinct rational roots. When the discriminant is zero, i.e. 16 − 2h = 0 we get a repeated rational root. And finally, when the discriminant is negative, i.e. 16−2h < 0, then we get two complex conjugate roots. The critical values are exactly the roots of the right hand side polynomial, and we only get equilibrium solutions for real critical values, thus if the fish population 38 2. Models and Numerical Methods (a) h = 10 (b) h = 8 (c) h = 7.5 (d) h = 6 Figure 2.4. Logistic model with harvesting is to survive the harvesting, then we must choose h so that we get at least one real root. Notice that for any value of h ≤ 8 we get at least one real root. Further, letting h = 8, 7.5, 6, 3.5 all result in the discriminant being a perfect square, which allows us to factor equation 44 nicely. x2 − 16x − 8(8) = x2 − 16x − 64 = (x − 8)(x − 8) x2 − 16x − 8(7.5) = x2 − 16x − 60 = (x − 6)(x − 10) x2 − 16x − 8(6) = x2 − 16x − 48 = (x − 4)(x − 12) x2 − 16x − 8(3.5) = x2 − 16x − 28 = (x − 2)(x − 14) Thus we find that any harvesting rate above 8,000 fish per year is sure to result in the depletion of all fish. But actually harvesting 8,000 fish per year is risky, because if you accidentally overharvest one year, you could eventually cause the depletion of all fish. So perhaps a harvesting level somewhere between 6,000 and 7,500 fish per year would be acceptable. 4 3. Acceleration–Velocity Models 39 3. Acceleration–Velocity Models In section 2 we modeled a falling object, but we ignored the frictional force due to wind resistance. Let’s fix that omission. The force due to wind resistance can be modeled by positing that the force will be in the opposite direction of motion, but proportional to velocity. FR = −kv (46) Recall from physics that Newton’s second law of motion: ΣF = ma = m(dv/dt), relates the sum of the forces acting on a body with its rate of change of momemtum. 
There are two forces acting on a falling body, one is the pull of gravity, and the other is a buoying force due to wind resistance. If we set up our y–axis with the positive y direction pointing upward and let zero correspond to ground level, then FR = −kv = −k(dy/dt). Note that this is an upward force because v is negative, thus the sum of the forces is: ΣF = FR + FG = −kv − mg. (47) Hence our governing IVP becomes: m (48) dv = −kv − mg dt k dv =− v−g dt m dv = −ρv − g dt v(0) = v0 This is a separable, first–order equation. Let’s solve it. Z 1 dv = − ρv + g Z dt 1 ln |ρv + g| = −t + C ρ ln |ρv + g| = −ρt + C eln|ρv+g| = e−ρt+C |ρv + g| = Ce−ρt −ρt ρv + g ≥ 0 Ce ρv + g = −Ce−ρt ρv + g < 0 g g v≥− Ce−ρt − ρ ρ v(t) = g g −Ce−ρt − v<− ρ ρ 40 2. Models and Numerical Methods Next, we plug in the initial condition v(0) = v0 to get a particular solution. g v0 = C − ρ g C = v0 + ρ g g g −ρt − v≥− v + 0 ρ e ρ ρ (49) v(t) = − v + g e−ρt − g v < − g 0 ρ ρ ρ Notice that the limit as time goes to infinity of both solutions is the same. g g mg g (50) lim v(t) = lim ± v0 + e−ρt − = − = − t→∞ t→∞ ρ ρ ρ k This limiting velocity is called terminal velocity. It is the fastest speed that a dropped object can achieve. Notice that it is negative because it is a downward velocity. The first solution of equation 49 handles the situation where the body is falling more slowly than terminal velocity. The second solution handles the case where the body or object is falling faster than terminal velocity, for example a projectile shot downward. Example 2.6. In example 1.13 we calculated that it would take approximately 5.59 seconds for an object to fall 500 feet, but we neglected the effects of wind resistance. Compute how long it will take for an object to fall 500 feet if ρ = .16, and compute its final velocity. Recall that v(t) = dy/dt, and since v(t) is only a function of the independent variable, t, we can integrate the velocity to find the position as a function of time. Since we are dropping the object from 500 feet, y0 = 500 and v0 = 0. Z Z dy y(t) = dt = v(t) dt dt Z g g y(t) = v0 + e−ρt − dt ρ ρ Z Z g g y(t) = v0 + e−ρt dt − dt ρ ρ g −1 −ρt g e − t+C y(t) = v0 + ρ ρ ρ 32 +C 500 = − (.16)2 C = 500 + 1250 C = 1750 (51) y(t) = −1250e−.16t − 200t + 1750 As you can see in figure 3.1, when we model the force due to wind resistance it adds almost a full second to the amount of time that it takes for an object to fall 4. Numerical Solutions 41 Figure 3.1. Falling object with and without wind resistance 500 feet. In fact it takes approximately 6.56 seconds to reach the ground. Knowing this, we can compute the final velocity. v(6.56) = −1250e−.16(6.56) − 200(6.56) + 1750 = −1250e−.16(6.56) − 200(6.56) + 1750 60 mi/hr ≈ −130 ft/s 88 ft/s mi ≈ −89 hr Thus it takes almost a full second longer to reach the ground (6.56 s vs. 5.59 s) and will be travelling at approximately -89 miles per hour as opposed to -122 miles per hour. 4 4. Numerical Solutions In actual real–world applications, more often than not, you won’t be able to find an analytic solution to the general first order IVP. (13) dy = f (x, y) dx y(a) = b In this situation it often makes sense to approximate a solution via simulation. We will look at an algorithm for creating approximate solutions called Euler’s Method, and named after Leonhard Euler, (pronounced like “oiler”). The algorithm 42 2. 
Models and Numerical Methods is easier to explain if the independent variable is time, so let’s rewrite the general first order equation above using time, t as the independent variable: (13) dy = f (t, y) dt y(t0 ) = y0 . The fundamental idea behind Euler’s Method and all numerical/simulation techniques is discretization. Essentially the idea is to change the independent variable time, t, from something that can take on any real number, to a variable that is only allowed to have values from a discrete, i.e. finite sequence. Each time value is separated from the next by a fixed period of time, the “tick” of our clock. The length of this “tick” depends on how accurately we wish to approximate exact solutions. Shorter tick lengths will result in more accurate approximations. Normally a solution function must be continuous, and smooth, a.k.a. differentiable. Discretizing time forces us to relax the smoothness requirement. The approximate solution curves we create will be continuous, but not smooth. They will have small angular corners at each tick of the clock, i.e. at each time in the discrete sequence of allowed time values. The goal of the algorithm is to create a sequence of pairs (ti , yi ) which when plotted and connected by straight line segments will approximate exact solution curves. The method of generating this sequence is recursive, i.e. computing the next pair in the sequence will require us to know the values of the previous pair in the sequence. This recursion is written via two equations: ti+1 = ti + ∆t yi+1 = yi + ∆y, where the subscript i+1 refers to the “next” value in the sequence, and the subscript i refers to the “previous” value in the sequence. There are two values in the above equations that we must compute, ∆t, and ∆y. ∆t is simply the length of each clock tick, which is a constant that we choose. ∆y on the other hand changes and must be computed using the discretized version of equation 13: (52) ∆y = f (ti , yi )∆t y(t0 ) = y0 We start the clock at time t0 , which we call time zero. This will often be zero, however any starting time will work. The time after one tick is labelled t1 , and the time after two ticks is labelled t2 and so on and so forth. We know from the initial condition y(t0 ) = y0 what y value corresponds to time zero, and with equation 52 we can approximate y1 as follows: (53) y1 ≈ y0 + ∆y = y0 + f (t0 , y0 )∆t. If we continue in this fashion, we can generate a table of (ti , yi ) pairs which, as long as ∆t is “small” will approximate a particular solution through the point (t0 , y0 ) in the ty plane. Generating the left hand column of our table of values couldn’t be easier. It is done via adding the same small time interval, ∆t to the 4. Numerical Solutions 43 current time, to get the next time, i.e. t1 = t0 + ∆t t2 = t1 + ∆t = t0 + 2∆t t3 = t2 + ∆t = t0 + 3∆t t4 = t3 + ∆t = t0 + 4∆t .. . (54) tn+1 = tn + ∆t = t0 + (n + 1)∆t Generating the yi values for this table is harder because unlike ∆t which stays constant, ∆y depends on the previous time and y value. y1 ≈ y0 + ∆y = y0 + f (t0 , y0 )∆t y2 ≈ y1 + ∆y = y1 + f (t1 , y1 )∆t y3 ≈ y2 + ∆y = y2 + f (t2 , y2 )∆t y4 ≈ y3 + ∆y = y3 + f (t3 , y3 )∆t .. . (55) yn+1 ≈ yn + ∆y = yn + f (tn , yn )∆t Equations 54 and 55, together with the initial condition y(t0 ) = y0 constitute the numerical solution technique known as Euler’s Method. Chapter 3 Linear Systems and Matrices 1. 
Linear and Homogeneous Equations In order to understand solution methods for higher–order differential equations, we need to switch from discussing differential equations to discussing algebraic equations, specifically linear algebraic equations. However, since the phrase “linear algebraic equation” is a bit of a mouthful, we will shorten it to the simpler “linear equation”. Recall how we provisionally defined a linear differential equation in definition 1.4 to be any differential equation where all of its solutions satisfy summability and scalability. If there is any justice in mathematical nomenclature, then linear equations should also involve summing and scaling, and indeed they do. A linear equation is any equation that can be written as a finite sum of scaled variables set equal to a scalar. For example, 2x + 3y = 1 is an example of a linear equation in two variables, namely x and y. Another example is: x − 2y + 3 = 7z − 11, because this can be rearranged to the equivalent equation: x − 2y − 7z = −14. Notice that there is no restriction on the number of variables other than that there must be a finite number. So for example, 2x = 3 is an example of a linear equation in one variable, and 4w − x + 3y + z = 0 is an example of a linear equation in four variables. Typically, once we go beyond four variables we begin to run out of the usual variable names, and thus switch to using subscripted variables and subscripted coefficients as in the following definition. Definition 3.1. A linear equation is a finite sum of scaled variables set equal to a scalar. More generally, a linear equation is any equation that can be written in the form: (56) a1 x1 + a2 x2 + a3 x3 + · · · + an xn = b 45 46 3. Linear Systems and Matrices In this case we would say that the equation has n variables, which stands for some as yet undetermined but finite number of variables. Notice the similarity between the definition of a linear equation given above and definition 1.25, which defined a linear differential equation as a differential equation that can be written as a scaled sum of the derivatives of a function set equal to a scalar function. If you think of each derivative as a distinct variable, then the above definition is very similar to our current definition for linear equations. Here we reproduce the equation from each definition for comparison. a1 x1 + a2 x2 + a3 x3 + · · · + an xn = b (56) (22) an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y 0 + a0 (x)y = f (x) We can add another variable and rearrange equation 56 to increase the similarity. an xn + an−1 xn−1 + · · · + a1 x1 + a0 x0 = b (56’) (22) an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y 0 + a0 (x)y = f (x) The difference between these two equations is that we switch from scalar coefficients and variables to scalar coefficient functions and derivatives of a function. Definition 3.2. A homogeneous equation is a linear equation where the right hand side of the equation is zero. Every linear equation has an associated homogeneneous equation which can be obtained by changing the constant term on the right hand side of the equation to 0. Later, we will see that undersanding the set of all solutions of a linear equation, what we will eventually call the solution space, will be facilitated by understanding the solutions to the homogenous equation. Homogeneous equations are special because they always have a solution, namely the origin. For example the following homogeneous equation in three variables has solution (0, 0, 0), as you can easily check. 
2x − 7y − z = 0

There are an infinite number of other solutions as well, for example (0, 1, −7) or (1/2, 0, 1) or (7, 1, 7). The interesting thing to realize is that we can take any number of solutions and sum them together to get a new solution. For example, (0, 1, −7) + (1/2, 0, 1) + (7, 1, 7) = (15/2, 2, 1) is yet another solution. Thus homogeneous equations have the summability property. And as you might guess, they also have the scalability property. For example 3 · (0, 1, −7) = (0, 3, −21) is again a solution to the above homogeneous equation.

Notice that summability and scalability do not hold for regular linear equations, just the homogeneous ones. For example 2x − 7y − z = 2 has solutions (7/2, 1, −2) and (1, 0, 0), but their sum, (9/2, 1, −2), is not a solution. Nor can we scale either of these two solutions to get new solutions. At this point it is natural to wonder why we call these equations “linear” at all if they don’t satisfy the summability or scalability properties.

The reason that we do classify these equations as linear is rather simple. Notice that plugging any solution of the homogeneous equation into the left hand side expression, 2x − 7y − z, produces zero. So if we add a solution of the homogeneous equation to a solution of the non–homogeneous linear equation, we get a new solution of the linear equation. For example, (7/2, 1, −2) is a solution to the linear equation, and (7, 1, 7) is a solution to the corresponding homogeneous equation. Their sum, (7/2, 1, −2) + (7, 1, 7) = (21/2, 2, 5), is a solution to the linear equation:

2(21/2) − 7(2) − (5) = 21 − 14 − 5 = 2

The explanation for this situation is simple. It works because the distributive property of multiplication over addition and subtraction holds, that is, a(x + y) = ax + ay.

2(21/2) − 7(2) − (5) = 2(7/2 + 7) − 7(1 + 1) − (−2 + 7)
= 2(7/2) + 2(7) − 7(1) − 7(1) − (−2) − (7)
= [2(7/2) − 7(1) − (−2)] + [2(7) − 7(1) − (7)]
= 2 + 0 = 2

What the above calculation shows is that given a solution to any linear equation, we can always add to it any solution of the corresponding homogeneous equation to get a new solution. This is perhaps the fundamental concept of linear equations. We will exploit this fact repeatedly when solving systems of linear equations, and later when we solve second order and higher order differential equations.

2. Introduction to Linear Systems

A linear system is a collection of one or more linear equations. For example,

2x − 2y = −2
3x + 4y = 11.

The above linear system is an example of a 2 × 2 system, pronounced “two by two”, because it has two equations in two unknowns, (x and y). When a system has equations with two unknowns as is the case above, the solution set will be the set of all pairs of real numbers, (x, y), that satisfy both equations simultaneously. Geometrically, since each equation above is the equation of a line in the xy–plane, the solution set will be the set of all points in the xy–plane that lie on both lines.

2.1. Method of Elimination. Finding the solution set is done by the method of elimination, which in the 2 × 2 case has three steps:
(1) Add a multiple of equation (1) to equation (2) such that one of the variables, perhaps x, will sum to 0 and hence be eliminated.
(2) Solve the resulting equation for the remaining variable, in this case y.
(3) Back–substitute the value found in step two into either of the original two equations, and solve for the remaining variable, in this case x.
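The three steps above translate directly into a short computation. The following sketch is written in Python (which this text does not otherwise use), and the helper name eliminate_2x2 is ours; it assumes the coefficient of x in equation (1) is nonzero and that the system has exactly one solution.

def eliminate_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by elimination.

    Assumes a1 != 0 and that the system has exactly one solution.
    """
    # Step (1): add -(a2/a1) times equation (1) to equation (2), eliminating x.
    m = -a2 / a1
    b2_new = b2 + m * b1      # y-coefficient of the new equation (2')
    c2_new = c2 + m * c1      # right hand side of the new equation (2')
    # Step (2): solve the resulting 1 x 1 system for y.
    y = c2_new / b2_new
    # Step (3): back-substitute y into equation (1) and solve for x.
    x = (c1 - b1 * y) / a1
    return x, y

# The system solved by hand in the next example: 2x - 2y = -2, 3x + 4y = 11.
print(eliminate_2x2(2, -2, -2, 3, 4, 11))   # (1.0, 2.0)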
Example 3.3. Use the method of elimination to solve the following system. 2x − 2y = −2 3x + 4y = 11 (1) (2) (1) Multiplying the first equation by −3/2 and adding to the second yields: −3x + 3y = 3 3x + 4y = 11 7y = 14 − 23 (1) (2) (10 ) (2) 7y = 14 implies y = 2. (3) Now plug y = 2 into equation (1): 2x − 2(2) = −2 2x − 4 = −2 2x = 2 x=1 The solution set is therefore {(1, 2)}, which we’ll often simply report as the pair (1,2). 4 Notice that the method of elimination transformed a 2 × 2 system down to a 1 × 1 system, namely 7y = 14. In other words, the method works by transforming the problem down to one that we already know how to solve. This is a common problem solving technique in math. A natural question to ask at this point is: “What are the possible outcomes of the method of elimination when applied to a 2×2 system?” Since the transformation down to a 1×1 system can always be performed this question is equivalent to asking, “What are the possible outcomes of solving a 1 × 1 system?” Let’s consider the following linear equation in the single variable x: ax = b, where a and b are real constants. We solve this system by multiplying both sides of the equation by the multiplicative inverse of a, namely a−1 or equivalently a1 , to yield: b x = a−1 b = . a However, upon a little reflection we realize that if a = 0, then we have a problem because 0 has no mulitiplicative inverse, or equivalently, division by 0 is undefined. 2. Introduction to Linear Systems 49 So clearly not every 1 × 1 system has a solution. There is also another interesting possibility, if both a and b are zero, then we get 0 · x = 0, and this equation is true for all possible values of x, or in other words it has infinitely many solutions. Let’s summarize our findings regarding ax = b: (1) a 6= 0: one unique solution. (2) a = 0 and b 6= 0: no solution. (3) a = 0 and b = 0: infinitely many solutions. Thus we have shown that solving a two by two system may result in three distinct possibilities. What do these three distinct possibilities correspond to geometrically? Recall that solving a two by two system corresponds to finding the set of all points in the intersection of two lines in the plane. The geometric analogies are as follows: (1) One unique solution :: two lines intersecting in a single point. (2) No solution :: two parallel lines which never intersect. (3) Infinitely many solutions :: two overlapping lines. The method of elimination extends naturally to solving a 3 × 3 system, that is a set of three linear equations in three unknowns. We will do one more example and then use the knowledge gained to generalize the method to n × n systems. Example 3.4. Since we know how to solve a 2 × 2 system, we will transform a 3 × 3 system down to a 2 × 2 system and proceed as we did in the previous example. Consider the following system, 3x − 2y + z = −1 (1) x + 5y + 2z = 11 (2) −x + 2y − z = 3 (3) The basic idea is to pick one equation and use it to eliminate a single variable from the other two equations. First we’ll use equation (2) to eliminate x from equation (1), and number the resulting equation (1’). 3x − 2y + z = −1 −3x − 15y − 6z = −33 − 17y − 5z = −34 (1) −3(2) (10 ) Now we use equation (2) to eliminate x from equation (3) and number the resulting equation (20 ). −x + 2y − z = 3 (3) x + 5y + 2z = 11 (2) 7y + z = 14 (20 ) Note, we could just as well have numbered the resulting equation (30 ) since we eliminated x from equation (3). 
I numbered it (20 ) because I like to think of it as the second equation in a 2 × 2 system. You can use whichever numbering scheme you prefer. 50 3. Linear Systems and Matrices We see that we have reduced the problem down to a 2 × 2 system: −17y − 5z = −34 (10 ) 7y + z = 14 (20 ) Now we must pick the next variable to eliminate. Eliminating z from the above system will be a little easier than eliminating y. (10 ) 5(20 ) (100 ) −17y − 5z = −34 35y + 5z = 70 18y = 36 Equation (100 ) becomes y = 2. Now that we know one of the values in the single prime system we can use either (10 ) or (20 ) to solve for z. Let’s use equation (10 ). −17(2) + 5z = −34 −34 + 5z = −34 5z = 0 z=0 Finally, we can use any one of equations (1), (2) or (3) to solve for x, let’s use equation (1). 3x − 2(2) + (0) = −1 3x − 4 = −1 3x = 3 x=1 The solution set to a 3 × 3 system is the set of all triples (x, y, z) that satisfy all three equations simultaneously. Geometrically, it is the set of all points at the intersection of three planes. For this problem the solution set is {(1, 2, 0)}. Clearly the point (1, 2, 0) is a solution of equation (1), but we should check that it also satisfies equations (2) and (3). ? (1) + 5(2) + 2(0) = 11 X ? −(1) + 2(2) − (0) = 3 X 4 3. Matrices and Gaussian Elimination The method of elimination for solving linear systems introduced in the previous section can be streamlined. The method works well for small systems of equations such as a 3 × 3 system, but as n grows, the number of variables one must write for an n × n system grows as n2 . In addition, one must repeatedly write other symbols such as addition, “+”, and “=”. However, we could just use the columns as a proxy for the variables and just retain the coefficients of the variables and the column of constants. 3. Matrices and Gaussian Elimination 51 Remark 3.5. The coefficients and column of constants encode all of the information in a linear system. For example, a linear system can be represented as a grid of numbers, which we will call an augmented matrix. 3 −2 1 2 3x − 2y + z = 2 0 ←→ 5 2 16 5y + 2z = 16 0 −x + 2y − z = 0 −1 2 −1 Gaussian Elimination is simply the method of elimination from the previous section applied systematically to an augmented matrix. In short, the goal of the algorithm is to transform an augmented matrix until it can be solved via back–substitution. This corresponds to transforming the augmented matrix until the left side forms a descending staircase (echelon) of zeros. To be precise, we wish to transform the left side of the augmented matrix into row–echelon form. Definition 3.6 (Row–Echelon Form). A matrix is in row–echelon form if it satisfies the following two properties: (1) Every row consisting of all zeros must lie beneath every row that has a nonzero element. (2) In each row that contains a nonzero element, the leading nonzero element of that row must be strictly to the right of the leading nonzero element in the row above it. Remark 3.7. The definition for row–echelon form is just a precise way of saying that the linear system has been transformed to a point where back–substitution can be used to solve the system. The operations that one is allowed to use in transforming an augmented matrix to row–echelon form are called the elementary row operations, and there are three of them. Definition 3.8 (Elementary Row Operations). There are three elementary row operations, which can be used to transform a matrix to row–echelon form. (1) Multiply any row by a nonzero constant. (2) Swap any two rows. 
(3) Add a constant multiple of any row to another row. Example 3.9. Solve the following linear system: −x + 2y − z = −2 2x − 3y + 4z = 1 2x + 3y + z = −2 −1 2 −1 −2 −1 2 −1 R2 +2R1 2 −3 0 1 4 1 2 −→ 2 3 1 −2 2 3 1 −2 −3 −2 R3 +2R1 −→ 52 3. Linear Systems and Matrices −1 0 0 2 −1 1 2 7 −1 −2 −3 −6 R3 −7R2 −→ −1 0 0 2 1 0 −1 2 −15 −2 −3 15 At this point, the last augmented matrix is in row–echelon form, however we can do two more elementary row ops to make back–substitution easier. 1 −2 1 2 −1 2 −1 −2 −1R1 0 0 1 2 −3 1 2 −3 −→ −1/15R3 0 0 −15 15 0 0 1 −1 The last augmented matrix corresponds to the linear system. x − 2y + z = 2 y + 2z = −3 z = −1 Back–substituting z = −1 into the second equation yields: y + 2(−1) = −3 y = −1 Back–substituting z = −1, y = −1 into the first equation yields: x − 2(−1) + (−1) = 2 x+1=2 x=1 Thus the solution set is: {(1, −1, −1)}, which is just a single point of R3 . 4 Definition 3.10. Two matrices are called row equivalent if one can be obtained by a (finite) sequence of elementary row operations. Theorem 3.11. If the augemented coefficient matrices of two linear systems are row equivalent, then the two systems have the same solution set. 3.1. Geometric Interpretation of Row–Echelon Forms. Recall that an augmented matrix directly corresponds to a linear system. The solution set of a linear system has a geometric interpretation. In the tables which follow, the following symbols will have a special meaning. ∗ = any nonzero number = any number The number of n×n, row–echelon matrix forms follows a pattern. It follows the pattern found in Pascal’s triangle. Once you pick n, then just read the nth row of Pascal’s triangle from left to right to determine the number of n × n matrices that have 0, 1, 2, . . . , n rows of all zeroes. So for example, when n = 4 there are exactly 4 unique row–echelon forms that correspond to one row of all zeros, 6 unique row– echelon forms that correspond to two rows of all zeros, and 4 unique row–echelon forms that correspond to three rows of all zeros. 6. Matrices are Functions 53 Table 1. All possible 2 × 2, row–echelon matrix forms All Zero Rows 0 1 2 Representative Matrices Solution Set ∗ Unique point in R2 0 ∗ ∗ 0 ∗ Line of points in R2 0 0 0 0 0 0 All points in R2 plane 0 0 Table 2. All possible 3 × 3, row–echelon matrix forms All Zero Rows Representative Matrices ∗ 0 ∗ 0 0 ∗ 0 Solution Set Unique point in R3 1 ∗ 0 0 ∗ 0 ∗ 0 0 0 0 0 0 ∗ 0 0 0 ∗ 0 0 ∗ Line of points in R3 0 2 ∗ 0 0 0 0 0 0 0 0 0 ∗ 0 0 0 0 0 0 0 0 0 0 ∗ 0 0 0 0 0 0 0 0 0 0 0 3 Plane of points in R3 All points in R3 4. Reduced Row–Echelon Matrices 5. Matrix Arithmetic and Matrix Equations 6. Matrices are Functions It is conceptually advantageous to change our perspective from viewing matrices and matrix equations as devices for solving linear systems to functions in their own right. Recall that the reason we defined matrix multiplication the way we did was so that we could write a linear system with many equations into a single matrix equation. That is, matrix multiplication allows us to write linear systems in a more compact form. The following illustrates the equivalence of the two notations. 54 3. Linear Systems and Matrices Table 3. Pascal’s Triangle n = 0: 1 n = 1: 1 n = 2: 1 n = 3: 1 n = 4: 1 n = 5: n = 6: 1 1 2x − y = 2 x + 3y = 1 2 1 3 4 5 6 1 3 6 10 15 1 4 10 20 ←→ 5 15 2 1 1 1 6 1 −1 x 2 = 3 y 1 If you squint a little bit, the above matrix equation evokes the notion of a function. This function takes a pair of numbers as input and outputs a pair of numbers. 
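To see this function point of view in action, here is a quick numerical check; it relies on the NumPy library, which is an assumption of this sketch rather than anything the text uses. The matrix of the system above, applied to the pair (1, 0), returns the pair (2, 1), so (1, 0) solves that system; applied to any other pair it simply returns some other pair.

import numpy as np

# The matrix from the system 2x - y = 2, x + 3y = 1, viewed as a function from R^2 to R^2.
A = np.array([[2.0, -1.0],
              [1.0,  3.0]])

print(A @ np.array([1.0, 0.0]))   # [ 2.  1.]  the input (1, 0) is mapped to (2, 1)
print(A @ np.array([0.0, 1.0]))   # [-1.  3.]  a different input gives a different output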
A function f : R → R, often written f (x) is defined by some expression such as f (x) = x2 + x + 5 When a function is defined in terms of an expression in x, then function application is achieved by replacing all occurences of the variable x in the definition with the input value and evaluating. For example f (3) is computed by f (3) = 32 + 3 + 5 = 17. In the matrix case, the function definition is the matrix itself, and application to an input is achieved via matrix multiplication. For example, 2 −1 4 3 = 1 3 5 19 In general if A is an m × n matrix, then we can place any column vector with n rows, i.e. a vector from Rn to the right of the matrix and multiplication will be defined. Upon multiplication we will get a new vector with m rows, a vector from Rm . In other words an m × n matrix is a function from Rn to Rm . We often denote this: Am×n : Rn → Rm . 6.1. Matrix Multiplication is Function Composition. The real reason matrix multiplication is defined the way it is, is so that it agrees with function composition. For example, if you have two n×n matrices, say A and B, then we know that, A(B~x) = (AB)~x because matrix multiplication is associative. But in the language of functions, the above equation says that applying function B to your input vector, ~x, and then 6. Matrices are Functions 55 applying A to the result is the same as first composing (multiplying) A and B and then applying the composite function (product) to the input vector. Example 3.12. This example demonstrates that matrix multiplication does indeed correspond with function composition. Let, 1 ~u = , 0 0 ~v = . 2 If we plot both of these vectors on coordinate axes then we get the “L” shaped figure you see to the right. Figure 6.1. ~ u and ~v The following matrix is called a rotation matrix, because it rotates all vectors by θ radians (or degrees) in a counter–clockwise direction around the origin. A= cos θ sin θ − sin θ cos θ Let θ = π/4, and apply A to both ~u and ~v . √ 2 1 −1 A= . 2 1 1 √ √ 2 1 −1 1 2 1 A~u = = 1 1 0 2 2 1 √ √ −1 2 1 −1 0 A~v = = 2 2 1 2 1 1 Figure 6.2. A~ u and A~v Next we introduce matrix B, which flips all vectors with respect to the y axis. Flipping with respect to the y axis simply entails changing the sign of the x component and leaving the y component untouched, and this is exactly what B does. B= −1 0 0 1 56 3. Linear Systems and Matrices −1 0 1 −1 B~u = = 0 1 0 0 −1 0 0 0 B~v = = 0 1 2 2 Figure 6.3. B~ u and B~v Next, we multiply matrices A and B in both orders and apply them to ~u and ~v . √ 2 1 −1 −1 0 C = BA = = 0 1 2 1 1 √ 2 1 −1 −1 0 D = AB = = 0 1 2 1 1 √ 2 −1 C~u = 1 2 √ 2 −1 C~v = 1 2 √ 2 −1 1 1 1 2 √ 2 −1 −1 2 −1 1 √ 2 −1 1 1 = 1 0 1 2 √ 1 1 0 = 2 1 1 2 Figure 6.4. BA~ u and BA~v √ 2 −1 D~u = 2 −1 √ 2 −1 D~v = 2 −1 √ 2 −1 −1 1 = 1 0 2 −1 √ −1 −1 0 = 2 1 2 1 Figure 6.5. AB~ u and AB~v You can use your thumb and index finger on your left hand, which form an upright “L” shape to verify that first applying A (the rotation) to both vectors followed by applying B (the flip) results in figure 6.4. Next, change the order and first apply B (the flip) followed by A (the rotation), and that results in figure 6.5. 7. Inverses of Matrices 57 Recall that function application is read from left to right, so AB~u corresponds to first applying B and then applying A. Adding parentheses may help: AB~u = A(B~u) 4 7. Inverses of Matrices Perhaps the most common problem in algebra is solving an equation. 
But you’ve probably never thought much about exactly what algebraic properties of arithmetic allow us to solve as simple an equation as 2x = 3. Undoubtedly, you can look at the equation and quickly arrive at an answer of x = 3/2, but what are the underlying algebraic principles which you are subconsciously employing to allow you to draw that conclusion? Suppose a, b are real numbers, can we always solve the equation: ax = b for any unknown x? No, not always. For example if a = 0 and b 6= 0, then there is no solution. This is the only case that does not have a solution because 0 is the only real number that does not have a multiplicative inverse. Assuming a 6= 0, you solve the equation in the following manner: ax = b −1 a (ax) = a−1 b −1 (a (existence of multiplicative inverses) −1 b (associativity of multiplication) −1 b (multiplicative inverse property) −1 b (multiplicative identity property) a)x = a 1x = a x=a Notice that we never needed to use the commutative property of multiplication nor distributivity. Associativity, inverses, and identity form the core of any algebraic system. Now we wish to solve matrix equations in a similar fashion, i.e. we wish to solve matrix equations by multipying both sides of an equation by the inverse of a matrix, e.g. A~x = ~b A−1 (A~x) = A−1~b (A−1 A)~x = A−1~b I~x = A−1~b ~x = A−1~b where A is a matrix and ~x and ~b are vectors. Since matrix multiplication is the same as composition of maps (functions), this method of solution amounts to finding the inverse of the map A and then applying it to the vector ~b. However, not all matrices have an inverse. 58 3. Linear Systems and Matrices 7.1. Inverse of a General 2 × 2 Matrix. In what follows, we will often need to compute the inverse of 2×2 matrix. It will save time if we can generate a formula or simple rule for determining when such a matrix is invertible and what the inverse is. To derive such a formula, we must compute the inverse of the matrix: A = ac db . a c b 1 d 0 1 0 1 0 1 R a 1 −→ 1 c b a 1 a d 0 b a 1 a 0 1 c − ad−bc a ad−bc 0 1 −→ −cR1 +R2 b −a R2 +R1 −→ 1 0 1 0 0 1 b a 1 a 0 ad−bc a − ac 1 −→ d ad−bc b − ad−bc c − ad−bc a ad−bc a ad−bc R2 Thus the inverse of A is: A −1 1 d −b = a ad − bc −c Clearly, we will not be able to invert A if ad − bc = 0, thus we have found the condition for the genral 2×2 matrix which determines whether or not it is invertible. 8. Determinants The determinant is a function which takes a square, n × n matrix and returns a real number. If we let Mn (R) denote the set of all n × n matrices with entries from R, then the determinant function has the following signature: det : Mn (R) → R. We denote this function two ways, det(A) = |A|. The algorithm for computing this function is defined recursively, similar to how the elimination algorithm was defined. Thus the first definition below, the definition for the minor of a matrix will use the term determinant which is defined later. This is just the nature of recursive algorithms. Definition 3.13. The ij th minor of a matrix A, denoted Mij , is the determinant of the matrix A with its ith row and j th column removed. For example, if A is a 4 × 4 matrix, then M23 is: a11 a12 a13 a14 a a a22 a23 a24 11 M23 = 21 = a31 a31 a32 a33 a34 a41 a41 a42 a43 a44 a12 a32 a42 a14 a34 a44 Definition 3.14. The ij th cofactor of a matrix A, denoted Aij , is defined to be, Aij = (−1)(i+j) Mij . Notice that cofactors are defined in terms of minors. Next we define the determinant in terms of cofactors. 8. Determinants 59 Definition 3.15. 
The determinant function is defined recursively, so we need two cases, a base case and a recursive case. • The determinant of a 1 × 1 matrix or scalar is just the scalar. • The determinant of a square, n × n matrix A, is det(A) = a11 A11 + a12 A12 + · · · + a1n A1n . Notice that a11 , a12 , . . . , a1n are just the elements in the first row of the matrix A. The A11 , A12 , . . . , A1n are cofactors of the matrix A. We call this a cofactor expansion along the first row. Let’s unwind the previous definitions to compute the determinant of the matrix, a b A= c d det(A) = a11 A11 + a12 A12 = aA11 + bA12 = a(−1)(1+1) d + b(−1)(1+2) c = ad − bc Notice that this same quantity appeared when we found the inverse of A in the previous section. This is no accident. The determinant is closely related to invertibility. Example 3.16. Compute det(A), if A is the following matrix. 1 0 3 A = 0 −2 2 −5 4 1 det(A) = a11 A11 + a12 A12 + a13 A13 = a11 (−1)1+1 M11 + a12 (−1)1+2 M12 + a13 (−1)1+3 M13 1+2 0 2 1+3 0 1+1 −2 2 = 1(−1) 4 1 + 0(−1) −5 1 + 3(−1) −5 −2 2 + 0 + 3 0 −2 = −5 4 1 4 −2 4 Using the definition for the determinant of a 2 × 2 matrix found in the previous section we get: = −10 + 0 + 3(−10) = −40 4 The recursive nature of the determinant makes it difficult to compute the determinants of large matrices even with computers. However, there are several key facts about the determinant which make computations easier. Most computer systems 60 3. Linear Systems and Matrices use the following theorem to make computing of determinants for large matrices feasible. Theorem 3.17 (Elementary Row Operations and the Determinant). Recall there are three elementary row operations. They each affect the computation of |A| differently. (1) Suppose B is obtained by swapping two rows of the matrix A, then |B| = − |A| . (2) Suppose B is obtained by multiplying a row of matrix A by a nonzero constant k, then |B| = k |A| . (3) Suppose B is obtained by adding a multiple of one row to another row in matrix A, then |B| = |A| . Theorem 3.18 (Determinants of Matrix Products). If A and B are n×n matrices, then |AB| = |A| |B| Definition 3.19. The transpose of a matrix is obtained by changing its rows into columns, or vice versa, and keeping their order intact. The transpose of a matrix A is denoted AT . Example 3.20. 2 0 2 5 1 0 −1 1 T 0 2 7 = 1 3 0 −2 0 0 7 2 −1 3 5 1 −2 4 Theorem 3.21 (Transpose Facts). The following properties of transposes are often useful. (1) (AT )T = A. (2) (A + B)T = AT + B T (3) (cA)T = c(AT ) (4) (AB)T = B T AT Theorem 3.22 (Determinants of Transposed Matrices). If A is a square matrix, then det(AT ) = det(A). 8.1. Geometric Interpretation of Determinants. Chapter 4 Vector Spaces 1. Basics Definition 4.1. A vector is an ordered list of real numbers. √ Vectors will be denoted by a lower case letter with an arrow over it, e.g. ~v = ( 2, −3, 0). Definition 4.2 (n-space). Rn = {(a1 , a2 , . . . , an ) | ai ∈ R for i = 1 . . . n}, which in words reads: Rn (pronounced “r”, “n”) is the set of all vectors with n real components. Example 4.3. A vector in R2 is a pair of real numbers. For example (3, 2) is a vector. We can interpret this pair in two ways: (1) a point in the xy–plane, or (2) an arrow whose tail is at the origin and whose tip is at (3, 2). A vector in R3 is a triple of real numbers, and can be interpreted as a point (x, y, z) in space, or as an arrow whose tail is at (0, 0, 0) and whose tip is at (x, y, z) 4 We sometimes call a vector with n components an n–tuple. 
The “tuple” part of n–tuple comes from quadruple, quintuple, sextuple, septuple, etc.. In this chapter we will mostly use the arrow interpretation of vectors, i.e. the notion of directional displacement. Since the arrow really only represents displacement, it doesn’t matter where we put the tail of the arrow. In fact you can translate (or move) a vector in R2 all around the plane and it remains the same vector, as long as you don’t rotate it or scale it. Similarly for a vector with three components or any number of components for that matter. Thus the defining characteristic of a vector is its direction and magnitude (or length). Definition 4.4. The magnitude of a vector is the distance from the origin to the point in Rn the vector represents. Magnitude is denoted by |~v | and is computed 61 62 4. Vector Spaces via the following generalization of the Pythagorean theorem: ! 12 n q X vi2 |~v | = v12 + v22 + · · · + vn2 = i=1 Example 4.5. Given the vector ~v = (3, −2, 4) ∈ R3 , p √ |~v | = 32 + (−2)2 + 42 = 29 4 Definition 4.6 (vector addition). Two vectors are added component–wise, meaning that given two vectors say, ~u = (u1 , u2 , . . . , un ) and ~v = (v1 , v2 , . . . , vn ), then ~u + ~v = (u1 + v1 , u2 + v2 , . . . , un + vn ). Notice that this way of defining addition of vectors only makes sense if both vectors have the same number of components. Definition 4.7 (scalar multiplication). A vector can be scaled by any real number, meaning that if a ∈ R and ~u = (u1 , u2 , . . . , un ), then a~u = (au1 , au2 , . . . , aun ). Scaling literally corresponds to stretching or contracting the magnitude of the vector. Scaling by a negative number is scaling and reflecting. If the vector is a pair, then the reflection corresponds to a reflection about the line y = −x which when written in general form is x + y = 0. A reflection in 3–space corresponds to a reflection about the plane x + y + z = 0, and so on and so forth. Definition 4.8. A vector space is a nonempty set, V , of vectors, along with the operations of vector addition, and scalar multiplication which satisfies the following requirements for all ~u, ~v , w ~ ∈ V , and for all scalars a, b ∈ R. (1) ~u + ~v = ~v + ~u (commutativity) (2) (~u + ~v ) + w ~ = ~u + (~v + w) ~ (additive associativity) ~ (3) ~v + 0 = ~v (additive identity) (4) ~v + −~v = ~0 (additive inverses) (5) a(~u + ~v ) = a~u + a~v (distributivity over vector addition) (6) (a + b)~u = a~u + b~u (distributivity over scalar addition) (7) (ab)~u = a(b~u) (multiplicative associativity) (8) 1~u = ~u (multiplicative identity) 1. Basics 63 Remark 4.9. Notice that the definition for a vector space does not require a way of multiplying two vectors to yield another vector. If you studied multi–variable calculus then you may be familiar with the cross product, which is a form of vector multiplication, but the definition of a vector space does not mention the cross product. However, being able to scale vectors by real numbers is a vital piece of the vector space definition. A vector space is probably your first introduction to a mathematical definition involving a “set with structure”. The structure here is provided by the operations of vector addition and scalar mulitplication, as well as the eight requirements that they must satisfy. We will see at the end of this chapter that a vector space isn’t defined by the objects in the set as much as by the rigid relationships between these objects that the eight requirements enforce. 
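The component–wise operations of definitions 4.6 and 4.7, and the magnitude of definition 4.4, are easy to experiment with numerically. A minimal sketch follows, assuming the NumPy library; the vector (3, −2, 4) is the one from Example 4.5, and the second vector is our own choice.

import numpy as np

u = np.array([3.0, -2.0, 4.0])    # the vector of Example 4.5
v = np.array([1.0,  0.0, 2.0])    # another vector in R^3

print(u + v)              # [ 4. -2.  6.]   component-wise addition (definition 4.6)
print(2.5 * u)            # [ 7.5 -5. 10.]  scalar multiplication (definition 4.7)
print(np.linalg.norm(u))  # 5.3851..., that is sqrt(29), matching Example 4.5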
In short, a vector space is a nonempty set that is closed under vector addition and scalar multiplication. Here, “closed” means that if you take any two vectors in the vector space and add them, then their sum will also be a vector in the vector space. Likewise if you scale any vector in the vector space, then the scaled version will also be an element of the vector space. Definition 4.10. A linear combination of vectors is a scaled sum of vectors. For example, a linear combination of the vectors ~v1 , ~v2 , . . . , ~vn could be written: c1~v1 + c2~v2 + · · · + cn~vn , where c1 , c2 , . . . , cn are real numbers (scalars). The concept defined above, of generating a new vector from other vectors is a foundational concept in the study of Linear Algebra. It will occur throughout the rest of this book. Given this definition, one can think of a vector space as a nonempty set of vectors that is closed under linear combinations. Which is to say that any linear combination of vectors from a vector space will again be an element of the vector space. You may be wondering why we make the requirement that a vector space must be a nonempty set. This is because every vector space must contain the zero vector, ~0. Since vector spaces are closed under scalar multiplication, any nonempty vector space will necessarily contain the zero vector. This is because you can always take any vector and scale it by the scalar, 0, to get the vector, ~0. As we develop the theory, you will see that it makes more sense for the set {~0} to be the smallest possible vector space rather than the empty set {}. This is the reason for requiring vector spaces to be nonempty. Linear combinations are important because they give us a new geometric perspective on the calculations we did in the previous chapter. 64 4. Vector Spaces Linear System 1x + 1y = 3 3x + 0y = 3 Matrix Equation 1 1 x 3 = 3 0 y 3 ⇐⇒ Linear Combination 1 1 3 x +y = 3 0 3 ⇐⇒ `1 : y = −x + 3 `2 : x = 1 y `1 x~u + y~v = w ~ y `2 (3, 3) 3 3 (1, 2) 2 2 1 1 1 2 3 4 w ~ ~v x 0 ~u 0 ~v 1 x 2 3 4 2. Linear Independence The following definition is one of the most fundamental concepts in all of linear algebra and will be used again and again. You must memorize it. Definition 4.11. A set of vectors, {~v1 , . . . , ~vn } is linearly dependent if there exist scalars, c1 , . . . , cn not all 0, such that (57) c1~v1 + c2~v2 + · · · + cn~vn = ~0. Remark 4.12. Clearly if c1 = c2 = · · · = cn = 0, that is if all the scalars are zero, then the equation is true, thus the “not all 0” phrase of the definition is key. A linearly dependent set of vectors is a set that possesses a relationship amongst the vectors. Specifically, you can write any vector in the set as a linear combination of the other vectors. For example we could solve equation (57) for ~v2 , as follows: c1 c3 cn (58) ~v2 = − ~v1 − ~v3 − · · · − ~vn . c2 c2 c2 Of course, ~v2 is not special. We can solve equation (57) for any of the vectors. Thus, one of the vectors is redundant because we can generate it via a linear combination of the others. It may be the case that there are other vectors in the set that are also redundant, but at least one is. So how does one figure out if there are scalars, c1 , . . . , cn not all zero such that equation (57) is satisfied? Example 4.13. Determine whether the following set of four vectors in R3 is linearly dependent. 0 3 0 1 0 , 2 , 0 , 1 −1 0 1 1 4. 
Affine Spaces 65 We must figure out if there exist scalars c1 , c2 , c3 , c4 not all zero such that 1 0 3 0 0 (59) c1 0 + c2 2 + c3 0 + c4 1 = 0 . −1 0 1 1 0 Let’s rewrite the last equation by absorbing the scalars into the vectors and summing them: 1 · c1 + 0 · c2 + 3 · c3 + 0 · c4 0 0 · c1 + 2 · c2 + 0 · c3 + 1 · c4 = 0 . −1 · c1 + 0 · c2 + 1 · c3 + 1 · c4 0 The last equation corresponds to a linear system of three equations in four unknowns c1 , . . . , c 4 ! 1c1 + 0c2 + 3c3 + 0c4 = 0 0c1 + 2c2 + 0c3 + 1c4 = 0 −1c1 + 0c2 + 1c3 + 1c4 = 0 And this system of equations can be written as a coefficient matrix times a column vector of the unknowns: c1 1 0 3 0 0 c 2 0 2 0 1 = 0 . (60) c3 −1 0 1 1 0 c4 Equations (59) and (60) are two different ways of writing the exact same thing! In other words, multiplying a matrix by a column vector is equivalent to making a linear combination of the columns of the matrix. The columns of the matrix in equation (60) are exactly the vectors of equation (59). We can write this as an augmented matrix and perform elemetary row operations to determine the solution set if any. However, since it is a homogeneous system, no matter what elementary row ops we apply, the rightmost column will always remain as all zeros, thus there is no point in writing it. Instead we only need to perform row ops on the coefficient matrix. 4 3. Vector Subspaces 4. Affine Spaces There is an interesting connection between solutions to homogeneous and nonhomogeneous linear systems. Lemma 4.14. If ~u and ~v are both solutions to the nonhomogeneous equation, (61) A~x = ~b, then their difference ~y = ~u − ~v is a solution to the associated homogeneous system: (62) A~x = ~0, 66 4. Vector Spaces Proof. This is a simple consequence of the fact that matrix multiplication distributes over vector addition. A~y = A(~u − ~v ) = A~u − A~v = ~b − ~b = ~0 This idea of the lemma is illustrated in figure 4.1 for the system: 1 2 x 12 (63) = , 0 0 y 0 which has the following associated homogeneous system: 1 2 x 0 (64) = . 0 0 y 0 y 6 x + 2y = 12 (4, 4) 4 ~u (−4, 2) (8, 2) 2 ~v ~u − ~v x −4 −2 0 2 4 6 8 10 12 ~v − ~u −2 (4, −2) x + 2y = 0 Figure 4.1. Affine solution space for equation 63 and vector subspace of solutions to equation 64. 5. Bases and Dimension We know that a vector space is simply a nonempty set of vectors that is closed under taking linear combinations, a natural question to ask is, “Is there some subset of vectors which allow us to generate (via linear combinations) every vector in the vector space?”. Since a set is always considered a subset of itself, the answer to this question is clearly yes, because the vector space itself can generate any vector in the vector space. But can we find a proper subset, or perhaps even a finite 6. Abstract Vector Spaces 67 subset from which we can generate all vectors in the vector space by taking linear combinations? The answer to this question is yes, in fact such a set exists for every vector space, although it might not always be finite. 6. Abstract Vector Spaces When we first defined a vector, we defined it to be an ordered list of real numbers, and we noted that the definining characteristic was that a vector had a direction and a magnitude. We then proceeded to define a vector space as a set of objects that obeyed eight rules. These eight rules revolved around addition or summing and multiplication or scaling. 
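Returning for a moment to Example 4.13, a machine computation confirms what the row reduction will show: four vectors in R3 cannot be linearly independent. The sketch below assumes the NumPy library, and the particular scalars c = (−3, 2, 1, −4) are one dependence relation we found by back–substitution; they are ours, not from the text.

import numpy as np

# Columns are the four vectors of Example 4.13.
A = np.array([[ 1.0, 0.0, 3.0, 0.0],
              [ 0.0, 2.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0, 1.0]])

print(np.linalg.matrix_rank(A))         # 3, which is less than the number of columns (4)

c = np.array([-3.0, 2.0, 1.0, -4.0])    # a nontrivial choice of scalars
print(A @ c)                            # [0. 0. 0.], so the four vectors are linearly dependent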
Finally, we observed that all but one of these eight rules could be summed up by saying that a vector space is a nonempty set of vectors that is closed under linear combinations (scaled sums). Where “closed” meant that if ~u, ~v ∈ V , then ~u + ~v ∈ V and if c ∈ R, then c~v ∈ V . The one vector space requirement that this “definition” does not satisfy is the very first requirement that addition must be commutative, i.e. ~u + ~v = ~v + ~u for all ~u, ~v ∈ V . It turns out that vector spaces are ubiquitous in mathematics. This section will give several examples. Example 4.15 (Matrices as a Vector Space). Consider the set of m × n matrices with real entries which we will denote, Mmn (R). This set is a vector space. To see why let A, B be elements of Mmn (R), and let c be any real number then (1) Mmn (R) is not empty, specifically it contains an m × n matrix made of all zeros which serves as our zero vector. (2) This set is closed under addition, A + B ∈ Mmn (R). (3) This set is closed under scalar multiplication, cA ∈ Mmn (R). (4) Matrix addition is commutative, A + B = B + A for all matrices in Mmn (R). More concretely, consider M22 (R), the set of 2 × 2 matrices with real entries. What subset of M22 (R), is a basis for this vector space? Well, whatever it is it must allow us to write any matrix as a linear combination of its elements. The simplest choice is called the standard basis, and in this case the simplest choice is the set: 1 0 0 1 0 0 0 0 B= , , , 0 0 0 0 1 0 0 1 This allow us to write for example, a b 1 0 0 =a +b c d 0 0 0 1 0 +c 0 1 0 0 +d 0 0 0 1 Is B really a basis? Clearly it spans the vector space M22 (R), but is there perhaps a smaller set which still spans M22 (R)? Also, how do we know that the four matrices in B are linearly independent? 4 Example 4.16 (Solution Space of Homogeneous, Linear Differential Equations). Consider the differential equation (65) y 00 + y = 0. 68 4. Vector Spaces You can check that both y1 = cos x, and y2 = sin x are solutions. But also, any linear combination of these two solutions is again a solution. To see this let y = ay1 + by2 where a and b are scalars, then: y = a cos x + b sin x y 0 = −a sin x + b cos x y 00 = −a cos x − b sin x So y 00 + y = (−a cos x − b sin x) + (a cos x + b sin x) = 0, and thus we see that the set of solutions is nonempty and closed under linear combinations and therefore a vector space. 4 Notice that if equation (65) were not homogeneous, that is if the right hand side of the equation were not zero, then the set of solutions would not form a vector space. Example 4.17 (Solution Space of Nonhomogeneous, Linear Differential Equations). Consider the differential equation y 00 + y = ex . (66) You can check that both 1 y1 = cos x + ex , and 2 1 y2 = sin x + ex 2 are solutions. However, linear combinations of these two solutions are not solutions. To see this let y = ay1 + by2 where a and b are scalars, then: 1 1 y = a cos x + ex + b sin x + ex 2 2 1 1 y 0 = a − sin x + ex + b cos x + ex 2 2 1 1 y 00 = a − cos x + ex + b − sin x + ex 2 2 1 1 1 1 y 00 + y = a ex + b ex + a ex + b ex 2 2 2 2 = (a + b)ex Thus y1 and y2 will only be solutions when a + b = 1. 4 Chapter 5 Higher Order Linear Differential Equations 1. Homogeneous Differential Equations Similar to how homogeneous systems of linear equations played an important role in developing the theory of vector spaces, a similar class of differential equations will be instrumental in understanding the theory behind solving higher order linear differential equations. 
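Before developing the general theory, a quick symbolic check of the closure property seen in Example 4.16 may be helpful: differentiating an arbitrary linear combination of cos x and sin x twice and adding the function back gives identically zero. The sketch assumes the SymPy library, which the text itself does not use.

import sympy as sp

x, a, b = sp.symbols('x a b')
y = a * sp.cos(x) + b * sp.sin(x)    # an arbitrary linear combination of cos x and sin x

# y'' + y simplifies to 0 no matter what a and b are, so the solution set
# of y'' + y = 0 is closed under linear combinations.
print(sp.simplify(sp.diff(y, x, 2) + y))    # 0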
Recall definition 1.25, which states that a differential equation is defined to be linear if it can be written in the form: (22) an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y 0 + a0 (x)y = f (x). Definition 5.1. A linear differential equation is called homogeneous if f (x) = 0. That is if it can be written in the form: (67) an (x)y (n) + an−1 (x)y (n−1) + · · · + a1 (x)y 0 + a0 (x)y = 0. where the right hand side of the equation is exactly zero. Similar to how the solution space of a homogeneous system of linear equations formed a vector subspace of Rn , the solutions of linear, homogeneous differential equations form a vector subspace of F the vector space of functions of a real variable. Theorem 5.2. The set of solutions to a linear homogeneous differential equation (equation 67) form a vector subspace of F, the set of all functions of a real variable. Proof. For sake of simplicity, we will only prove this for a general second order, homogeneous differential equation such as: (68) a2 (x)y 00 + a1 (x)y 0 + a0 (x)y = 0. The necessary modifications for a general nth order equation are left as an exercise. 69 70 5. Higher Order Linear Differential Equations Let V be the set of all solutions to equation 68. We must show three things: 1) V is nonempty, 2) V is closed under vector addition, 3) V is closed under scalar multiplication. (1) Clearly the function y(x) ≡ 0 is a solution to equation 68, thus V is nonempty. (2) Let y1 , y2 ∈ V and let y = y1 + y2 . If we plug this into equation 68 we get: a2 (x)y 00 + a1 (x)y 0 + a0 (x)y = a2 (x)(y1 + y2 )00 + a1 (x)(y1 + y2 )0 + a0 (x)(y1 + y2 ) = a2 (x)(y100 + y200 ) + a1 (x)(y10 + y20 ) + a0 (x)(y1 + y2 ) = [a2 (x)y100 + a1 (x)y10 + a0 (x)y1 ] + [a2 (x)y200 + a1 (x)y20 + a0 (x)y2 ] =0 (3) Let y1 ∈ V, α ∈ R and let y = αy1 . If we plug y into equation 68 we get: a2 (x)(αy1 )00 + a1 (x)(αy1 )0 + a0 (x)(αy1 ) = αa2 (x)y100 + αa1 (x)y10 + αa0 (x)y1 = α[a2 (x)y100 + a1 (x)y10 + a0 (x)y1 ] =0 Now that we know that the set of solutions of a linear, homogeneous equation form a vector space, the next obvious thing to do is figure out a way to generate a basis for the solution space. Then we will be able to write the general solution as a linear combination of the basis functions. For example, suppose a basis contains three functions y1 (x), y2 (x) and y3 (x), then a general solution would have the form: c1 y1 (x) + c2 y2 (x) + c3 y3 (x). At this point we don’t even know how to determine the dimension of a basis for the solution space let alone actual basis functions. Thus, it makes sense to impose some simplifying assumptions. Instead of considering general nth order, linear, homogeneous equations like equation 67, let’s only consider second order equations. In fact, it will be advantageous to be even more restrictive, let’s only consider equations with constant coefficients. Thus let’s consider DEs of the form: (69) y 00 + ay 0 + by = 0, where a and b are elements of R, the set of real numbers. 2. Linear Equations with Constant Coefficients 2.1. Linear Operators. If we let D = equation 69 like so, (70) d dx , and let D2 = d2 dx2 , then we can write D2 y + aDy + by = 0. Each term on the left now involves y, instead of different derivatives of y. This allows us to rewrite the equation as: (71) (D2 + aD + b)y = 0. 2. Linear Equations with Constant Coefficients 71 Equation 71 deserves some explanation. 
It is analogous to the matrix equation: (A2 + aA + b)~v = A2~v + aA~v + b~v = ~0, where we have replaced differention, (D) with a square matrix A and the function y is replaced with the vector, ~v . The expression (D2 + aD + b) is a function with domain F and codomain (range) also F. In other words it is a function on functions. You should already be familiar with the notion of the derivative as a function on functions. The only thing new we have introduced is combining multiple derivatives along with scalars into a single function. Also, we are using multiplicative notation to denote function application. That is, we will usually forgo the parentheses in (D2 + aD + b)(y) in favor of (D2 + aD + b)y. Thus we can rewrite a second order, linear, homogeneous DE with constant coefficients in two equivalent ways: y 00 + ay 0 + by = 0 ⇐⇒ (D2 + aD + b)y = 0. Finally, if we let L = D2 + aD + b, (72) then we can write equation 69 very compactly: Ly = 0. Solving this DE amounts to figuring out the set of functions which get mapped to the zero function, by the linear operator L = D2 + aD + b. Definition 5.3. A linear operator is a function often denoted L with signature: L:F →F that is, it is a function which maps functions to other functions such that L(c1 y1 + c2 y2 ) = c1 Ly1 + c2 Ly2 , where c1 , c2 are scalars and y1 , y2 are functions. Remark 5.4. The linear operator notation: (D2 + aD + b) is simply shorthand for a function on functions. In other words it is a name for a function similar to how we might associate the symbol f with the expression x2 + 1 by writing f (x) = x2 + 1. The difference is that the name also serves as the function definition. Remark 5.5. The linear operator notation: (D2 + aD + b) does not represent an expression that can be evaluated. Recall that this is analogous to the matrix equation A2 + aA + b, but although the sum A2 + aA is defined for square n × n matrices the second sum, aA + b is not even defined. For example, it makes no sense to add a 2 × 2 matrix to a scalar. Similarly it makes no sense to add the derivative operator to a scalar. However it does make sense to add the derivative of a function to a scalar multiple of a function. Of course, this notation works for first order and higher order differential equations as well. For example, y (4) y 0 − 3y = 0 ⇐⇒ (D − 3)y = 0, 00 ⇐⇒ (D4 + 6D2 + 12)y = 0. + 6y + 12y = 0 72 5. Higher Order Linear Differential Equations Lemma 5.6. First order linear operators commute. That is, if a and b are any real numbers and y is a function, then (73) (D − a)(D − b)y = (D − b)(D − a)y Proof. (D − a)(D − b)y = (D − a)(Dy − by) = D2 y + D(−by) − a(Dy) + aby = (D2 y − D(ay))) − (bDy − bay) = D(Dy − ay) − b(Dy − ay) = (D − b)(Dy − ay) = (D − b)(D − a)y This lemma provides a method for finding the solutions of linear, homogeneous DEs with constant coefficients. Example 5.7. Suppose we wish to find all solutions of (74) y 000 − 2y 00 − y 0 + 2y = 0. First rewrite the equation using linear operator notation, (D3 − 2D2 − D + 2)y = 0 and then factor the linear operator exactly like you factor a polynomial. (D − 2)(D2 − 1)y = 0 (D − 2)(D − 1)(D + 1)y = 0 Since first–order, linear operators commute, there are exactly three solutions to this equation. This is because if any one of the linear operators map y to the constant zero function, then all following operators will as well. In other words (D − a)0 = 0. 
Thus the three solutions are: (D − 2)y = 0 ⇔ y 0 = 2y (D − 1)y = 0 ⇔ y 0 = y (D + 1)y = 0 ⇔ y 0 = −y ⇔ y = c1 e2x ⇔ y = c2 ex ⇔ y = c3 e−x . Therefore the general solution of equation 74 is a linear combination of these three solutions: (75) y(x) = c1 e2x + c2 ex + c3 e−x . 4 The previous example leads us to believe that solutions to linear, homogeneous equations with constant coefficients will have the form y = erx . If we adopt the notation: p(D) = an Dn + an−1 Dn−1 + · · · + a1 D + a0 , 2. Linear Equations with Constant Coefficients 73 then we see that we can write a homogeneous linear equation as p(D)y = 0. In particular, we can write second order, homogeneous, linear equations in the following way a2 y 00 + a1 y 0 + a0 y = 0 (a2 D2 + a1 D + a0 )y = 0 p(D)y = 0 This means that we can interpret p(D) as a linear operator. 2.2. Repeated Roots. What if our linear operator has a repeated factor? For example (D − r)(D − r)y = 0, does this mean that the differential equation (76) (D − r)2 y = 0 ⇐⇒ y 00 − 2ry 0 + r2 y = 0 only has solution y1 = erx ? This is equivalent to saying that the solution space is one dimensional. But could the solution space still be two dimensional? Let’s guess that there is another solution y2 of the form y2 = u(x)y1 , where u(x) is some undetermined function, but with the restriction u(x) 6≡ c. We must not allow u(x) to be a constant function, otherwise y1 and y2 will be linearly dependent and will not form a basis for the solution space. (D − r)2 y2 = (D − r)2 u(x)y1 = (D − r)2 u(x)erx = (D − r)[(D − r)u(x)erx ] = (D − r)[Du(x)erx − ru(x)erx ] = (D − r)[u0 (x)erx + ru(x)erx − ru(x)erx ] = (D − r)[u0 (x)erx ] = D[u0 (x)erx ] − ru0 (x)erx = u00 (x)erx + ru0 (x)erx − ru0 (x)erx = u00 (x)erx = [D2 u(x)]erx = 0. It follows that if y2 = u(x)y1 (x) is to be a solution of equation 76, then: D2 u(x) = 0. In other words u(x) must satisfy u00 (x) = 0. We already know that degree one polynomials satisfy this constraint. Thus u(x) can be any linear polynomial, for example: u(x) = a0 + a1 x. Hence y2 (x) = (a0 + a1 x)erx and thus our general solution to equation 76 is a linear combination of the two solutions y1 and y2 : 74 5. Higher Order Linear Differential Equations y = c1 y 1 + c2 y 2 = c1 erx + c2 (a0 + a1 x)erx = c1 erx + c2 · a0 erx + c2 · a1 xerx = (c1 + c2 · a0 )erx + (c2 · a1 )xerx = (c∗1 + c∗2 x)erx Notice that the general solution is equivalent to y2 alone. Since y2 necessarily has two unknowns in it (a0 and a1 from the linear polynomial), this is reasonable. Hence the general solution to a second order, linear equation with the single repeated root r in its characteristic equation is given by: y = (c1 + c2 x)erx . The steps above can be extended to the situation where a linear operator consists of a product of k equal first order linear operators. Lemma 5.8. If the characteristic equation of a linear, homogeneous differential equation with constant coefficients has a repeated root of multiplicity k, for example (77) (D − r1 )(D − r2 ) · · · (D − rn )(D − r)k y = 0, then the part of the general solution corresponding to the root r has the form (78) (c1 + c2 x + c3 x2 + · · · + ck xk−1 )erx . 2.3. Complex Conjugate Roots. 3. Mechanical Vibrations A good physical example of a second order, linear differential equation with constant coefficients is provided by a mass, spring and dashpot setup as depicted below. A dashpot is simply a piston like device which provides resistance proportional to the rate at which it is compressed or pulled. 
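As a brief aside before the mechanical model is developed further: the operator–factoring computation in Example 5.7 is easy to check with a computer algebra system. The sketch below uses the sympy library, which is not part of these notes; it factors the characteristic polynomial of equation 74, verifies that the proposed general solution (75) satisfies the equation, and lets sympy's general solver reproduce the same family of solutions.

    import sympy as sp

    x, r = sp.symbols('x r')
    c1, c2, c3 = sp.symbols('c1 c2 c3')
    y = sp.Function('y')

    # Characteristic polynomial of y''' - 2y'' - y' + 2y = 0 (equation 74)
    print(sp.factor(r**3 - 2*r**2 - r + 2))    # factors as (r - 2)(r - 1)(r + 1)

    # General solution built from the roots 2, 1, -1 (equation 75)
    sol = c1*sp.exp(2*x) + c2*sp.exp(x) + c3*sp.exp(-x)
    lhs = sol.diff(x, 3) - 2*sol.diff(x, 2) - sol.diff(x) + 2*sol
    print(sp.simplify(lhs))                    # 0, so the combination solves the DE

    # sympy's general-purpose ODE solver gives the same family of solutions
    ode = sp.Eq(y(x).diff(x, 3) - 2*y(x).diff(x, 2) - y(x).diff(x) + 2*y(x), 0)
    print(sp.dsolve(ode, y(x)))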
It is like the shock absorbers found in cars. In this setup there are three separate forces acting on the mass. k c m Figure 3.1. Mass, spring, dashpot mechanical system (1) Spring: Fs = −kx (2) Dashpot: Fd = −cx0 (3) Driving Force: F (t) The spring provides a restorative force meaning that its force is proportional to but in the opposite direction of the displacement of the mass. Similarly, the force due to the dashpot is proportional to the velocity of the mass, but in the opposite direction. Finally, the driving force could correspond to any function we are capable 3. Mechanical Vibrations 75 of generating by physical means. For example, if the mass is made of iron, then we could use an electromagnet to periodically push and pull it in a sinusoidal fashion. We use Newton’s second law, which states that the sum of the forces applied to a body is equal to its mass times acceleration to derive the governing differential equation. Σf = ma −Fs − Fd + F (t) = mx00 −kx − cx0 + F (t) = mx00 mx00 + cx0 + kx = F (t) (79) As in the case of solving affine systems in chapter 4 finding the general solution of equation 79 is a two step process. First we must find all solutions to the associated homogeneous equation: mx00 + cx0 + kx = 0. (80) Next, we must find a single solution to the nonhomogeneous (or original) equation and add them together to get the general solution, i.e. family of all solutions. 3.1. Free Undamped Motion. First we will consider the simplest mass, spring dashpot system, one where there is no dashpot, and there is no driving force. Setting F (t) = 0 makes the equation homogeneous. In this case, equation 79 becomes mx00 + kx = 0. (81) We define ω0 = p k/m, which allows us to write the previous equation as: x00 + ω02 x = 0. (82) The characteristic equation of this DE is r2 + ω02 = 0, which has conjugate pure imaginary roots and yields the general solution: (83) x(t) = A cos ω0 t + B sin ω0 t It is difficult to graph the solution by hand because it is the sum of two trigonometric functions. However, we can always write a sum of two sinusoids as a single sinusoid. That is, we can rewrite our solution in the form: x(t) = C cos(ω0 t − α), (84) which is much easier to graph by hand. We just need a way to compute the amplitude C and the phase shift α. What makes this possible is the cosine subtraction trigonometric identity: (85) cos(θ − α) = cos θ cos α + sin θ sin α, which we rearrange to: (86) cos(θ − α) = cos α cos θ + sin α sin θ. 76 5. Higher Order Linear Differential Equations This formula allows us to rewrite our solution, equation 83 as follows: x(t) = A cos ω0 t + B sin ω0 t A B =C cos ω0 t + sin ω0 t C C = C (cos α cos ω0 t + sin α sin ω0 t) = C cos(ω0 t − α) where the substitutions, A = cos α C B = sin α C p C = A2 + B 2 are justified by the right triangle in figure 3.2. The final step follows from the cosine subtraction formula in equation 86. C B α A Figure 3.2. Right triangle for phase angle α Put info about changing arctan to a version with codomain (0, 2π) here. 3.2. Free Damped Motion. 4. The Method of Undetermined Coefficients Before we explain the method of undetermined coefficients we need to make a simple observation about nonhomogeneous or driven equations such as (87) y 00 + ay 0 + by = F (x). Solving such equations where the right hand side is nonzero will require us to actully find two different solutions yp and yh . The p stands for particular and the h stands for homogeneous. The following theorem explains why. Theorem 5.9. 
Suppose yp is a solution to: (88) y 00 + ay 0 + by = F (x). And suppose y1 and y2 are solutions to the associated homogeneous equation: (89) y 00 + ay 0 + by = 0. 4. The Method of Undetermined Coefficients 77 Then the function defined by y = yh + yp (90) y = c1 y1 + c2 y2 +yp | {z } yh is a solution to the original nonhomogeneous system, equation 88. Proof. (c1 y1 + c2 y2 + yp )00 + a(c1 y1 + c2 y2 + yp )0 + b(c1 y1 + c2 y2 + yp ) =(c1 y100 + c2 y200 + yp00 ) + a(c1 y10 + c2 y20 + yp0 ) + b(c1 y1 + c2 y2 + yp ) =c1 (y100 + ay10 + by1 ) + c2 (y200 + ay20 + by2 ) + (yp00 + ayp0 + byp ) =c1 · 0 + c2 · 0 + F (x) =F (x) Solving homogeneous, linear DEs with constant coefficients is simply a matter of finding the roots of the characteristic equation and then writing the general solution according to the types of roots and their multiplicities. But the method relies entirely on the fact that the equation is homogeneous, that is that the right hand side of the equation is zero. If we have a driven or nonhomogeneous equation such as (91) y (n) + an−1 y (n−1) + · · · + a1 y 0 + a0 y = F (x) then we can no longer rely upon factorization techniques to solve the characteristic equation: rn + an−1 rn−1 + · · · + a1 r + a0 = F (x). (92) y = yh + yp . Example 5.10. Find a particular solution of (93) y 00 + y 0 − 12y = 2x + 5. A particular solution will have the same form as the forcing function which in this case is F (x) = 2x + 5, that is it will have the form: yp = Ax + B. Here A and B are real coefficients which are as of yet “undetermined”, hence the name of the method. Our task is to determine what values for A and B will make yp a solution of equation 93. We can determine values for the undetermined coefficients by differentiating our candidate function (twice) and plugging the derivatives into equation 93: 4 78 5. Higher Order Linear Differential Equations 5. The Method of Variation of Parameters The method of undetermined coefficients examined in the previous section relied upon the fact that the forcing function f (x) on the right hand side of the differential equation had a finite number of linearly independent derivatives. What if this isn’t the case? For example consider the equation (94) y 00 + P (x)y 0 + Q(x)y = tan x. The derivatives of tan x are as follows: sec2 x, 2 sec2 x tan x, 4 sec2 x tan2 x + 2 sec4 x, . . . These functions are all linearly independent. In fact, tan x has an infinite number of linearly independent derivatives. Thus, clearly, the method of undetermined coefficients won’t work as a solution method for equation 94. The method of variation of parameters can handle this situation. It is a more general solution method, so in principle it can be used to solve any linear, non– homogeneous differential equation, but the method does force us to compute indefinite integrals, so it does not always yield closed form solutions. However, it will allow us to solve linear equations with non–constant coefficients such as: (95) y (n) + pn−1 (x)y (n−1 + · · · + p2 (x)y 00 + p1 (x)y 0 + p0 (x)y = f (x) Recall that the general solution to equation 95 will have the form y = yh + yp where yh is a solution to the associated homogeneous equation and is obtained via methods explained previously if the coefficients are all constant. If they are not all constants, then your only recourse at this point will be trial and error. This method assumes we already have a set of n linearly independent solutions to the associated homogeneous equation. 
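Before going further with variation of parameters, it is worth closing the loop on Example 5.10, whose final computation is not written out above. The short sketch below carries it through symbolically; it uses sympy, which these notes do not otherwise assume, and the names A and B are the undetermined coefficients from the example.

    import sympy as sp

    x, A, B = sp.symbols('x A B')

    # Candidate particular solution from Example 5.10: yp = A*x + B
    yp = A*x + B

    # Substitute yp into y'' + y' - 12*y and require the result to equal 2*x + 5
    residual = yp.diff(x, 2) + yp.diff(x) - 12*yp - (2*x + 5)

    # Equate coefficients of like powers of x and solve the resulting linear system
    eqs = sp.Poly(residual, x).all_coeffs()
    print(sp.solve(eqs, [A, B]))   # {A: -1/6, B: -31/72}

So yp = -(1/6)x - 31/72 is a particular solution of equation 93, which can be confirmed by substituting it back in by hand.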
The method of variation of parameters is only a method for finding a particular solution, yp . For sake of simplicity, we first derive the formula for a general second order linear equation, such as: (96) y 00 + P (x)y 0 + Q(x)y = f (x). We begin by assuming, or guessing that a particular solution might have the form (97) yp = u1 (x)y1 + u2 (x)y2 , where u1 , u2 are unknown functions, and y1 and y2 are known, linearly independent solutions to the homogeneous equation associated with equation 96. Our goal is to determine u1 and u2 . Since we have two unknown functions, we will need two equations which these functions must satisfy. One equation is obvious, our guess for yp must satisfy equation 96, but there is no other obvious equation. However, when we plug our guess for yp into equation 96, then we will find another equation which will greatly simplify the calculations. 5. The Method of Variation of Parameters 79 Before we can plug our guess for yp into equation 96 we need to compute two derivatives of yp : yp0 = u01 y1 + u1 y10 + u02 y2 + u2 y20 yp0 = (u1 y10 + u2 y20 ) + (u01 y1 + u02 y2 ). Since we have the freedom to impose one more equation’s worth of restrictions on u1 and u2 , it makes sense to impose the following condition: u01 y1 + u02 y2 = 0, (*) because then when we compute yp00 it won’t involve second derivatives of u1 or u2 . This will make solving for u1 and u2 possible. Assuming condition (*), yp0 and yp00 become: yp0 = u1 y10 + u2 y20 (98) yp00 = u01 y10 + u1 y100 + u02 y20 + u2 y200 yp00 = u01 y10 + u02 y20 + u1 y100 + u2 y200 Recall that by assumption, y1 and y2 both satisfy the homogeneous version of equation 96, thus we can write: yi00 = −P (x)yi0 − Q(x)yi Substituting this in for (99) yp00 yp00 = = u01 y10 u01 y10 + + y100 u02 y20 u02 y20 and y200 for i = 1, 2. in the equation above yields: + u1 (−P (x)y10 − Q(x)y1 ) + u2 (−P (x)y20 − Q(x)y2 ) − P (x)(u1 y10 + u2 y20 ) − Q(x)(u1 y1 + u2 y2 ). If we plug yp , yp0 and yp00 found in equations 97, 98 and 99 into the governing equation, 96, then we get: (( ((( y1(+(u2 y2 ) −P y10(+ u2 y20 )−Q(x)(u yp00 = u01 y10 + u02 y20 ( ((1( ((x)(u (((1( ( ((( +P y10(+ u2 y20 ) P (x)yp0 = ((1( ((x)(u ( (( ( +Q(x)(u y1(+(u2 y2 ) +Q(x)yp = ( 1 ( ( ( f (x) = u01 y10 + u02 y20 The last line above is our second condition which the unknowns u1 and u2 must satisfy. Combining the above condition with the previous condition (*), we get the following linear system of equations: u01 y1 + u02 y2 = 0 u01 y10 + u02 y20 = f (x) Which when written as a matrix equation becomes: y1 y2 u01 0 = f (x) y10 y20 u02 Notice that this system will have a unique solution if and only if the determinant of the 2 × 2 matrix is nonzero. This is the same condition as saying the Wronskian, W = W (y1 , y2 ) 6= 0. Since y1 and y2 were assumed to be linearly independent this 80 5. Higher Order Linear Differential Equations will be guaranteed. Therefore we can solve the system by multiplying both sides of the matrix equation by the inverse matrix: 0 0 1 u1 y2 −y2 0 = u02 y1 f (x) W −y10 0 1 −y2 f (x) u1 = u02 y1 f (x) W Integrating both of these equations with respect to x yields our solution: Z Z y1 f (x) y2 f (x) dx u2 (x) = dx. u1 (x) = − W W Assuming these integrals can be computed, then a particular solution to equation 96 will be given by: (100) yp (x) = Z Z y2 (x)f (x) y1 (x)f (x) − dx y1 (x) + dx y2 (x). W (x) W (x) It is interesting to point out that our solution for yp does not depend on the coefficient functions P (x) nor Q(x) at all. 
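As a check on formula (100), the sketch below applies it to the motivating equation y'' + y = tan x, taking y1 = cos x and y2 = sin x as the homogeneous solutions; the classical answer is yp = −cos x · ln|sec x + tan x|. The code uses sympy, which is not assumed anywhere in these notes, and is only one way such a verification might be automated.

    import sympy as sp

    x = sp.symbols('x')
    f = sp.tan(x)

    # Two linearly independent solutions of the homogeneous equation y'' + y = 0
    y1, y2 = sp.cos(x), sp.sin(x)

    # Wronskian W(y1, y2); here it is identically 1
    W = sp.simplify(y1*y2.diff(x) - y2*y1.diff(x))

    # Formula (100): u1 = -integral(y2*f/W), u2 = integral(y1*f/W), yp = u1*y1 + u2*y2
    u1 = -sp.integrate(y2*f/W, x)
    u2 = sp.integrate(y1*f/W, x)
    yp = u1*y1 + u2*y2

    # The residual yp'' + yp - tan(x) should vanish identically; the numeric
    # spot check guards against simplify leaving an unrecognized zero.
    residual = sp.simplify(yp.diff(x, 2) + yp - sp.tan(x))
    print(residual)
    print(sp.N(residual.subs(x, sp.Rational(7, 10))))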
Of course, if P (x) and Q(x) are anything other than constant functions, then we don’t have an algorithmic way of finding the required linearly independent solutions to the associated homogeneous equation anyway. This method is wholly dependent on being able to solve the associated homogeneous equation. 6. Forced Oscillators and Resonance Recall the mass, spring, dashpot setup of section 3. In that section we derived the following governing differential equation for such systems: (101) mx00 + cx0 + kx = F (t). Recall that depending on m, c and k the system will be either overdamped, critically damped or underdamped. In the last case we get oscillatory behavior. It is this last case that we are interested in now. Our goal for this section is to analyze the behavior of such systems when a periodic force is applied to the mass. In particular, we are interested in the situation where the period of the forcing function almost matches or exactly matches the natural period of the mass and spring. There are many ways in which to impart a force to the mass. One clever way to impart a periodic force to the mass is to construct it such that it contains a vertical, motorized, flywheel with an off–center, center of mass. A flywheel is simply any wheel with mass. Any rotating flywheel which does not have its center of mass at the physical center, will impart a force through the axle to its housing. For a real life example, consider a top–loading, washing machine. The water and clothes filled basket is a flywheel. These machines typically spin the load of clothes to remove the water. Sometimes, however, the clothes become unevenly distributed in the basket during the spin cycle. This causes the spinning basket to impart a, sometimes quite strong, oscillatory force to the washing machine. If the 6. Forced Oscillators and Resonance 81 basket is unbalanced in the right (wrong?) way, then the back and forth vibrations of the machine might even knock the washing machine off its base. Yet another way to force a mass, spring, dashpot system is to drive or force the mass via an electromagnet. If the electromagnet has a controllable power source, then it can be used to drive the mass in numerous different ways, i.e. F (t) can take on numerous shapes. Undamped Forced Oscillations. If there is no frictional damping, that is if c = 0, then the associated homogeneous equation for equation 101, mx00 + kx = 0, always results in oscillatory solutions. In this case, we let ω02 = k/m and rewrite the equation as: x00 + ω02 x = 0, which has characteristic equation, r2 + ω02 = 0, and hence has solution xh (t) = c1 cos ω0 t + c2 sin ω0 t. Here, ω0 is the natural frequency of the mass spring system, or the frequency at which the system naturally vibrates if pushed out of equilibrium. When we periodically force the system then we must find a particular solution, xp , in order to assemble the full solution x(t) = xh (t) + xp (t). The method of undetermined coefficients tells us to make a guess for xp which matches the forcing function F (t) and any of its linearly independent derivatives. If the periodic forcing function is modeled by: F (t) = F0 cos ωt ω 6= ω0 , then our guess for xp should be: xp = A cos ωt + B sin ωt, however since the governing equation does not have any first derivatives, B will necessarily be zero, thus we guess: xp = A cos ωt. Plugging this into equation 101, still with c = 0 yields: −mω 2 A cos ωt + kA cos ωt = F0 cos ωt so F0 /m F0 = 2 . 
k − mω 2 ω0 − ω 2 Thus the solution to equation 101, without damping, i.e. c = 0 is: A= x(t) = xh (t) + xp (t) x(t) = c1 cos ω0 t + c2 sin ω0 t + F0 /m cos ωt. ω02 − ω 2 Which with the technique from section 3 can be rewritten: (102) x(t) = C cos(ω0 t − α) + F0 /m cos(ωt). ω02 − ω 2 This is an important result because it helps us to understand the roles of the homogeneous and particular solutions! In words, this equation says that the response of a mass spring system to beingpperiodically forced is a superposition of two separate responses. Recall that C = c21 + c22 , and similarly α only depends on c1 and c2 which in turn only depend on the initial conditions. Also ω0 is simply a function of the properties of the mass and spring, thus the function on the left of (102), i.e. xh represents the system’s response to the initial conditions. Notice that the function 82 5. Higher Order Linear Differential Equations on the right of (102) depends on the driving amplitude (F0 ), driving frequency (ω) and also m and k, but not at all on the initial conditions. That is, the function on the right, i.e. xp is the system’s response to being driven or forced. The homogeneous solution is the system’s response to being disturbed from equilibrium. The particular solution is the system’s response to being periodically driven. The interesting thing is that these two functions are not intertwined in some complicated way. This observation is common to all linear systems, that is, a solution function to any linear system will consist of a superposition of the system’s response to intitial conditions and the system’s response to being driven. Beats. In the previous solution, we assumed that ω 6= ω0 . We had to do this so that xp would be linearly independent from xh . Now we will examine what happens as ω → ω0 , that is if we let the driving frequency (ω) get close to the natural frequency of oscillation for the mass and spring, (ω0 ). Clearly, as we let these two frequencies get close, the amplitude of the particular solution blows up! lim A(ω) = lim ω→ω0 ω→ω0 F0 /m = ∞. ω02 − ω We will solve for this situation exactly in the next subsection. But what can we say about the solution when ω ≈ ω0 ? This is easiest to analyze if we impose the initial conditions, x(0) = x0 (0) = 0 on the solution in (102). If we do so, then it is easy to compute the three unknowns: F0 c2 = 0 α = π + tan−1 0 = π. c1 = − 2 m(ω0 − ω 2 ) Recall that cos(ωt − π) = − cos(ωt), hence F0 F0 xh = C cos(ω0 t − π) = cos(ω0 t − π) = − cos ω0 t. 2 2 2 m(ω0 − ω ) m(ω0 − ω 2 ) Therefore, the solution to the IVP is: F0 x(t) = [cos ωt − cos ω0 t] 2 m(ω0 − ω 2 ) F0 = cos 12 (ω0 + ω) − 21 (ω0 − ω) t − cos m(ω02 − ω 2 ) F0 [cos(A − B) − cos(A + B)] = m(ω02 − ω 2 ) F0 = [2 sin A sin B] 2 m(ω0 − ω 2 ) 2F0 = sin 12 (ω0 + ω)t sin 12 (ω0 − ω)t 2 2 m(ω − ω ) 0 2F0 1 = sin 2 (ω0 − ω)t sin 12 (ω0 + ω)t m(ω02 − ω 2 ) 1 2 (ω0 + ω) + 12 (ω0 − ω) t = A(t) sin 21 (ω0 + ω)t. Here we have used a trigonometric substitution, so that we could write the solution as the product of two sine waves. We renamed the expression in large square brackets to A(t) which is suggestive of amplitude. Notice that A(t) varies sinusoidally, 6. Forced Oscillators and Resonance 83 but does so at a much slower frequency than the remaining sinusoidal factor. Thus the solution corresponds to a rapid oscillation with a slowly varying amplitude. This phenomenon is known as beats. Figure 6.1. 
Example of beats In our mechanical example of a driven mass, spring system, this solution corresponds to the mass moving back and forth at a frequency equal to the average of the natural frequency and the driving frequency, i.e. (ω0 + ω)/2. However, the amplitude of each oscillation varies smoothly from zero amplitude to some maximum amplitude and then back again. When the waves are sound waves, this corresponds to a single pitch played with varying amplitude or volume. It creates a “wah, wah” kind of sound. Musicians actually use beats to tune their instruments. For example when tuning a piano or guitar you can play a note with something known to be at the correct pitch and then tighten or loosen the string depending on whether the amplitude changes are getting closer in time or more spread out. Faster beats (amplitude changes) mean you are moving away from matching the pitch, whereas slower beats correspond to getting closer to the correct pitch. Resonance. What if we let the driving frequency match the natural frequency? That is, what if we let ω = ω0 ? Our governing equation is: (103) x00 + ω02 x = F0 cos ω0 t. m Since our usual guess for xp will match the homogeneous solution we must multiply our guess for xp by t, the independent variable. So our guess, and its derivatives are: xp (t) = t(A cos ω0 t + B sin ω0 t) x0p (t) = (A cos ω0 t + B sin ω0 t) + ω0 t(B cos ω0 t − A sin ω0 t) x00p (t) = 2ω0 (B cos ω0 t − A sin ω0 t) + ω02 t(−A cos ω0 t − B sin ω0 t). 84 5. Higher Order Linear Differential Equations Upon plugging these derivatives into equation (103), we get: ( ( ((( (B ( ( x00p = 2ω0 (B cos ω0 t − A sin ω0 t) + ω02 t(−A cos ω t − sin ω t) ( 0 0 ( ( ((( ( ( ( ( +ω02 xp = ω02( t(A cos( ω( t( + B sin ω0 t) ( (( 0 F0 cos ω0 t= 2ω0 (B cos ω0 t − A sin ω0 t) m Thus A = 0 and B = F0 /2mω0 , and our particular solution is: F0 (104) xp (t) = t sin ω0 t. 2mω0 Figure 6.2. An example of resonance. Functions plotted are: xp (t) = t sin(πt) and the lines x(t) = ±t. Figure 6.2 shows the graph of xp (t) for the values F0 = ω0 = π, m = 21 . Notice how the amplitude of oscillation grows linearly without bound, this is resonance. Physically, the mass spring system has a natural frequency at which it changes kinetic energy to potential energy and vice versa. When a driving force matches that natural frequency, work is done on the system and hence its total energy increases. 7. Damped Driven Oscillators 7. Damped Driven Oscillators Figure 6.3. http://xkcd.com/228/ 85 Chapter 6 Laplace Transforms The Laplace transform is an integral transform that can be used to solve IVPs. Essentially, it transforms a differential equation along with initial values into a rational function. Whereupon the task will be to rewrite the rational function into its partial fractions decomposition. After rewriting the rational function in this simplified form, you can then perform the inverse Laplace transform to find the solution. Just as with all other solution methods for higher–order, linear, differential equations, the Laplace transform method reduces the problem to an algebraic one, in this case a partial fractions decomposition problem. Unfortunately, the Laplace transform method typically requires more work, or computation than the previous methods of undetermined coefficients and variation of parameters. But the Laplace method is more powerful. It will allow us to solve equations with more complicated forcing functions than before. 
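Before developing the transform itself, here is one last check on the previous chapter: the resonance solution (104) can be verified symbolically. The sketch below uses sympy, which is not part of these notes, and the symbol names mirror equation (103).

    import sympy as sp

    t, w0, F0, m = sp.symbols('t omega0 F0 m', positive=True)

    # Claimed particular solution at resonance, equation (104)
    xp = F0/(2*m*w0) * t * sp.sin(w0*t)

    # It should satisfy x'' + omega0**2 * x = (F0/m)*cos(omega0*t), equation (103)
    residual = xp.diff(t, 2) + w0**2*xp - (F0/m)*sp.cos(w0*t)
    print(sp.simplify(residual))   # 0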
It is especially useful for analyzing electric circuits where the power is periodically switched on and off. 1. The Laplace Transform Definition 6.1. If a function f (t) is defined for t ≥ 0, then its Laplace transform is denoted F (s) and defined by the integral: F (s) = L {f (t)} = Z ∞ e−st f (t) dt 0 for all values of s for which the improper integral converges. Notice that the last sentence of the above definition reminds us that improper integrals do not necessarily converge, i.e. equal a finite number. Thus when computing Laplace transforms of functions, we must be careful to state any assumptions on s which we make to ensure convergence. Another way to think of this is that the 87 88 6. Laplace Transforms domain of a transformed function, say F (s) is almost never the whole real number line. Example 6.2. L {k} = k s Recall that both the integral and the limit operator are linear, so we can pull constants outside of these operations. Z ∞ L {k} = e−st k dt 0 b Z e−st dt = k lim b→∞ 0 t=b 1 −st = k lim − e b→∞ s t=0 *0 1 1 −sb + = k lim − e b→∞ s s = k s s>0 for s > 0. 4 Example 6.3. L eat = L eat = Z 1 s−a ∞ e−st eat dt Z0 ∞ = e(−s+a)t dt let u = (−s + a)t du = −(s − a)dt 0 ∞ −eu du s−a 0 −(s−a)t t=b −e = lim b→∞ s−a t=0 *0 −(s−a)b 1 −e = lim + b→∞ s − a s−a Z = = 1 s−a s>a for s > a. 4 Notice that the integral in the previous example diverges if s ≤ a, thus we must restrict the domain of the transformed function to s > a. Example 6.4. L {t} = 1 s2 1. The Laplace Transform 89 This integral will require us to use integration by parts with the following assignments, dv = e−st dt −e−st . du = dt v = s u=t L {t} = Z ∞ e−st t dt 0 t=∞ Z ∞ −te−st −e−st = − dt s s 0 t=0 0Z ∞ * −e−st −be−sb − dt = lim b→∞ s s 0 −st t=b e = − lim s>0 b→∞ s2 t=0 0 e−sb 1 s>0 = − lim − 2 b→∞ s2 s = 1 s2 s>0 for s > 0. 4 We will need to know the Laplace transforms of both sin kt and cos kt where k is any real number. Each of these transforms can be computed in a straightforward manner from the definition and using integration by parts twice. However, it is less work to compute both of them simultaneously by making a clever observation and then solving a system of linear equations. This way, instead of having to do integration by parts four times we will only need to do it twice, and it illustrates a nice relationship between the two transforms. First, we set up each integral in the definition to be solved via integration by parts. u= e−st −st du= −se A = L {cos kt} = Z dv= cos kt dt 1 dt v= sin kt k ∞ e−st cos kt dt 0 Z b s ∞ −st 1 = lim e−st sin kt 0 + e sin kt dt k b→∞ k 0 " # :0 1 s −sb = lim e sin k − 0 + L {sin kt} s > 0 b→∞ k k s = B s>0 (see below for definition of B) k 90 6. Laplace Transforms u= e−st −st du= −se B = L {sin kt} = dv= sin kt dt 1 dt v= − cos kt k ∞ Z e−st sin kt dt 0 Z b s ∞ −st 1 e cos kt dt lim e−st cos kt 0 − k b→∞ k 0 # " :0 1 s −sb = − lim e cos kb − 1 − L {cos kt} s > 0 k b→∞ k =− 1 s − A s>0 k k Thus we have the following system: s A− B =0 k s 1 A+B = , k k = which upon solving and recalling A = L {cos kt} and B = L {sin kt} yields: s + k2 k L {sin kt}= 2 s + k2 L {cos kt}= s2 Theorem 6.5. Linearity of the Laplace Transform If a, b ∈ R are constants, and f and g are any two functions whose Laplace transforms exist, then: (105) L {af (t) + bg(t)} = a L {f (t)} + b L {g(t)} = aF (s) + b G(s), for all s such that the Laplace transforms of both functions f and g exist. Proof. 
Recall that both the integral and the limit operators are linear, thus: Z ∞ L {af (t) + bg(t)} = e−st [af (t) + b g(t)] dt 0 Z c = lim e−st [af (t) + b g(t)] dt c→∞ 0 Z c Z c −st = a lim e f (t) dt + b lim e−st g(t) dt c→∞ 0 c→∞ 0 = a L {f (t)} + b L {g(t)} = aF (s) + b G(s) 1. The Laplace Transform Example 6.6. L {cosh kt} = 91 s s2 − k 2 ekt + e−kt L {cosh kt} = L 2 1 = L ekt + L e−kt 2 1 1 1 = + 2 s−k s+k 1 s + k + s − k = 2 s2 − k 2 s = 2 s − k2 4 Example 6.7. L {sinh kt} = k s2 − k 2 ekt − e−kt L {sinh kt} = L 2 1 kt = L e − L e−kt 2 1 1 1 = − 2 s−k s+k 1 s + k − s + k = 2 s2 − k 2 k = 2 s − k2 4 Theorem 6.8. Translation on the s-Axis If the Laplace transform of y(t) exists for s > b, then L eat y(t) = Y (s − a) for s > a + b. Proof. L eat y(t) = Z ∞ e−st eat y(t) dt 0 Z = ∞ e−(s−a)t y(t) dt 0 = Y (s − a) We have computed Laplace transforms of a few different functions, but the question naturally arises, can we compute a Laplace transform for every function? 92 6. Laplace Transforms The answer is no. So the next logical question is, what properties must a function have in order for its Laplace transform to exist? This is what we will examine here. Definition 6.9. A function f (t) is piecewise continuous on an interval [a, b] if the interval can be divided into a finite number of subintervals such that (1) f (t) is continuous on the interior of each subinterval, and (2) f (t) has a finite limit as t approaches each endpoint of each subinterval. Definition 6.10. A function f (t) is said to be of exponential order a or exponential of order a, if there exists positive constants M, a and T such that (106) |f (t)| ≤ M eat for all t ≥ T. Theorem 6.11. Existence of Laplace Transforms 2. The Inverse Laplace Transform Although there is a way to define the inverse Laplace transform as an integral transform, it is generally not necessary and actually more convenient to use other techniques, especially table lookup. For example, we already know, s , L {cos kt} = 2 s + k2 so certainly the inverse must satisfy: s L −1 = cos kt. s2 + k 2 Thus, we will define the inverse Laplace transform simply to be the transform satisfying: L {y(t)} = Y (s) ⇐⇒ y(t) = L −1 {Y (s)} That special double headed arrow has a specific meaning in Mathematics. It is often read, “if and only if”, which has a specific meaning in formal logic, but the colloquial way to understand it is simply that it means the two statements it connects are exactly equivalent. This means that one can be interchanged for the other in any logical chain of reasoning without changing the validity of the argument. Let’s do several examples of how to find inverse Laplace transforms. Example 6.12. Find L −1 1s . This follows directly from the computation we didin the previous section which showed that L {k} = ks , thus if k = 1, then L −1 1s = 1. 4 n o 1 Example 6.13. Find L −1 s+3 . Again we previously showed L {eat } = n o 1 we see that L −1 s−(−3) = e−3t . 1 s−a Therefore if we set a = −3, then 4 2. The Inverse Laplace Transform Example 6.14. Find L −1 n s s2 +5 93 o . This matches the formula L {cos kt} = o n √ thus L −1 s2s+5 = cos 5t. Example 6.15. Find L −1 n s+1 s2 −4 s s2 +k2 with k = √ 5 so that k 2 = 5, 4 o . After a simple rewrite, we see that the transforms of cosh kt and sinh kt apply. L −1 s+1 s2 − 4 =L −1 s 2 s −4 −1 1 2 s −4 +L 1 2 = cosh 2t + L −1 2 2 s −4 2 1 −1 = cosh 2t + L 2 s2 − 4 1 = cosh 2t + sinh 2t 2 4 Example 6.16. Find L −1 n o s (s−2)2 +9 . 
This example will rely on the translation on the s-axis theorem, or theorem 6.8 from the previous section, which summarized says: L eat y(t) = Y (s − a) eat y(t) = L −1 {Y (s − a)} L −1 s (s − 2)2 + 9 s−2+2 (s − 2)2 + 9 (s − 2) 2 −1 + L = L −1 (s − 2)2 + 9 (s − 2)2 + 9 2 3 = e2t cos 3t + L −1 3 (s − 2)2 + 9 2 3 = e2t cos 3t + L −1 3 (s − 2)2 + 9 2 = e2t cos 3t + e2t sin 3t 3 = L −1 4 Example 6.17. Find L −1 n o 1 2 s +4s+8 . First we complete the square in the denominator. s2 1 1 1 = 2 = + 4s + 8 (s + 4s + 4) − 4 + 8 (s + 2)2 + 4 94 6. Laplace Transforms Thus, L −1 1 s2 + 4s + 8 1 (s + 2)2 + 4 2 1 = L −1 2 (s + 2)2 + 4 1 2 = L −1 2 (s + 2)2 + 4 1 = e−2t sin 2t 2 = L −1 4 3. Laplace Transform Method of Solving IVPs To solve a differential equation via the Laplace transform we will begin by taking the Laplace transform of both sides of the equation. But by definition, a differential equation will involve derivatives of some unknown function, say y(t), thus we need to figure out what the Laplace transform does to derivatives such as y 0 (t), y 00 (t), y 000 (t) and so on and so forth. We will start, by making a simplying assumption, we will assume that y 0 (t) is continuous. Later, we will revise the following theorem such that y(t) is just required to be a piecewise continuous function. Theorem 6.18. Laplace Transforms of t–Derivatives If y 0 (t) is continuous, piecewise smooth and of exponential order a, then L {y 0 (t)} exists for s > a and is given by: L {y 0 (t)} = s L {y(t)} − y(0) s > a. Or the equivalent but more compact form: (107) L {y 0 (t)} = sY (s) − y(0) s > a. Proof. We begin with the defintion of the Laplace transform and use integration by parts with the following substitutions: u= e−st dv= y 0 (t) dt du= −se−st L {y 0 (t)} = Z v= y(t) ∞ e−st y 0 (t) dt 0 Z ∞ −st ∞ = e y(t) 0 + s e−st y(t) dt 0 ∞ = e−st y(t) 0 + s L {y(t)} :0 −sb = lim e y(b) − y(0) + s L {y(t)} b→∞ = s L {y(t)} − y(0) s>0 s > 0. Where limb→∞ [e−sb y(b)] = 0 provided s > a because y(t) was assumed to be exponential of order a. The reason we had to assume that y 0 (t) is continuous is 3. Laplace Transform Method of Solving IVPs 95 because we used the second Fundamental Theorem of Calculus to evaluate the definite integrals, and therefore we need the endpoints of Example 6.19. Solve the IVP: y 0 − 5y = 0 y(0) = 2 Taking the Laplace transform of both sides of the equation and using the linearity property we get: L {y 0 − 5y} = L {0} L {y 0 } − 5 L {y} = 0 sY (s) − y(0) − 5Y (s) = 0 Y (s)[s − 5] = y(0) 2 Y (s) = s−5 Now we take the inverse Laplace transform of both sides, to solve for y(t): 1 2 = 2 L −1 = 2e5t . (108) y(t) = L −1 {Y (s)} = L −1 s−5 s−5 n o 1 Where in the last step, we used the inverse Laplace transform L −1 s−a = eat . 4 We can use the previous theorem and the linearity property of the Laplace transform to compute the Laplace transform of a second derivative. First, let v 0 (t) = y 00 (t), so v(t) = y 0 (t) + C, where C is a constant of integration, then: L {y 00 (t)} = L {v 0 (t)} = s L {v(t)} − v(0) = s L {y 0 (t) + C} − [y 0 (0) + C] = s L {y 0 (t)} + s L {C} − [y 0 (0) + C] C = s[sY (s) − y(0)] + s − [y 0 (0) + C] s 2 0 = s Y (s) − sy(0) − y (0). This formula is worth remembering: L {y 00 (t)} = s2 Y (s) − sy(0) − y 0 (0) (109) Of course we can repeat the above procedure several times to obtain a corollary to the previous theorem. Corollary 6.20. 
Laplace Transforms of Higher Derivatives If a function y(t) and all of its derivatives up to the (n − 1) derivative are continuous and piecewise smooth for t ≥ 0. Further suppose that each is of exponential order a. Then L y (n) (t) exists when s > a and is: (110) L n o y (n) (t) = sn Y (s) − sn−1 y(0) − sn−2 y 0 (0) − · · · − y (n−1) (0). 96 6. Laplace Transforms If you examine all three results for Laplace transforms of derivatives in this section, you will notice that if the graph of a function passes through the origin, that is if y(0) = 0, and assuming y(t) meets the hypotheses of theorem 6.18, then differentiating in the t domain corresponds to multiplication by s in the s domain, or L {y 0 (t)} = sY (s). We can sometimes exploit this to our advantage as the next example illustrates. Example 6.21. L teat = 1 (s − a)2 Let y(t) = teat , then y(0) = 0, thus L {y 0 (t)} = L eat + teat = L eat + L ateat 1 + aY (s) = s−a Now using theorem 6.18, y(0) = 0 and the result just calculated, we get: 1 + aY (s) s−a 1 (s − a)Y (s) = s−a 1 Y (s) = (s − a)2 sY (s) = 4 The previous example exploited the fact that if y(0) = 0, then the Laplace transform of the derivative of y(t) is obtained simply by multiplying the Laplace transform of y(t) by s. In symbols this can be concisely stated: L {y 0 (t)} = sY (s) y(0) = 0. Thus multiplying by s in the s domain corresponds to differentiating with respect to t in the t domain, under the precise circumstance y(0) = 0. It is natural to wonder whether the inverse operation of multiplying by s, namely dividing by s corresponds to the inverse of the derivative namely integrating in the t domain. And it does! Theorem 6.22. Laplace Transforms of Integrals If y(t) is a piecewise continuous function and is of exponential order a for t ≥ T , then: Z t 1 1 L y(τ ) dτ = L {y(t)} = Y (s) s > a. s s 0 The inverse transform way to interpret the previous theorem is simply: Z t 1 y(τ ) dτ = L −1 Y (s) . s 0 3. Laplace Transform Method of Solving IVPs 97 Example 6.23. Use theorem 6.22 to find L −1 L −1 1 2 s(s + 1) Z n t L = −1 0 Z = 1 s(s2 +1) o . 1 2 s +1 dτ t sin τ dτ 0 t = [− cos τ ]0 = − cos t − − cos 0 = 1 − cos t 4 Example 6.24. Solve the following IVP: y 0 + y = sin t y(0) = 1 First we take the Laplace transform of both sides of the equation. L {y 0 (t) + y(t)} = L {sin t} 1 by theorem 6.5 L {y 0 (t)} + L {y(t)} = 2 s +1 1 sY (s) − y(0) + Y (s) = 2 by theorem 6.18 s +1 1 sY (s) − 1 + Y (s) = 2 applied initial condition y(0) = 1 s +1 1 +1 (s + 1)Y (s) = 2 s +1 1 1 Y (s) = + (s + 1)(s2 + 1) s + 1 Now if we apply the inverse Laplace transform to both sides of the last equation, we will get y(t) on the left, which is the solution function we seek! But in order to compute the inverse Laplace transform of the right hand side, we need to recognize 1 = it as the Laplace transform of some function or sum of functions. Since s+1 1 −t at s−(−1) the term on the right has inverse Laplace transform e , (recall L {e } = 1 s−a ). But the term on the left has no obvious inverse Laplace transform. Since the denominator is a product of irreducible factors, we can do a partial fractions decomposition. That is, 1 A Bs + C = + 2 (s + 1)(s2 + 1) s+1 s +1 2 A(s + 1) + (Bs + C)(s + 1) 1 = (s + 1)(s2 + 1) (s + 1)(s2 + 1) Equating just the numerators yields, 1 = A(s2 + 1) + (Bs + C)(s + 1) 1 = As2 + A + Bs2 + Bs + Cs + C 1 = (A + B)s2 + (B + C)s + (A + C) 98 6. Laplace Transforms By equating the coefficients of powers of s on both sides we get three equations in the three unknowns, A, B and C. 
A +B =0 B+C= 0 A +C= 1 Which you can check by inspection has solution A = 1/2, B = −1/2, C = 1/2. Thus, Y (s) = Y (s) = Y (s) = y(t) = y(t) = 1 2 − 12 s + 21 1 + s+1 s2 + 1 s+1 1 1 1 s−1 1 − + 2 2 s+1 2 s +1 s+1 3 1 1 s 1 1 − + 2 s+1 2 s2 + 1 2 s2 + 1 3 −1 1 1 −1 s 1 −1 1 L − L + L 2 s+1 2 s2 + 1 2 s2 + 1 3 −t 1 1 e − cos t + sin t 2 2 2 + 4 Example 6.25. Solve the following IVP: y 00 (t) + y(t) = cos 2t y(0) = 0, y 0 (0) = 1. We proceed by taking the Laplace transform of both sides of the equation and use the linearity property of the Laplace transform (theorem 6.5). L {y 00 (t) + y(t)} = L {cos 2t} s L {y 00 (t)} + L {y(t)} = 2 s +4 s s2 Y (s) − sy(0) − y 0 (0) + Y (s) = 2 s +4 s 2 s Y (s) − 1 + Y (s) = 2 s +4 s (s2 + 1)Y (s) − 1 = 2 s +4 s s2 + 4 (s2 + 1)Y (s) = 2 + 2 s +4 s +4 s2 + s + 4 Y (s) = 2 (s + 1)(s2 + 4) Now we must do a partial fractions decomposition of this rational function. As + B s2 + s + 4 Cs + D = 2 + 2 (s2 + 1)(s2 + 4) s +1 s +4 s2 + s + 4 = (As + B)(s2 + 4) + (Cs + D)(s2 + 1) s2 + s + 4 = (As3 + 4As + Bs2 + 4B) + (Cs3 + Cs + Ds2 + D) s2 + s + 4 = (A + C)s3 + (B + D)s2 + (4A + C)s + (4B + D) 3. Laplace Transform Method of Solving IVPs 99 A 0 A 1/3 0 1 B = 1 =⇒ B = 1 C 1 C −1/3 0 D 4 D 0 1 1 s 1 s 1 Y (s) = − + 3 s2 + 1 3 s2 + 4 s2 + 1 Finally, we apply the inverse Laplace transform to both sides to yield: 1 1 y(t) = cos t + sin t − cos 2t. 3 3 1 0 4 0 0 1 0 4 1 0 1 0 4 3.1. Electrical Circuits. A series electrical RLC circuit is analogous to a mass, spring, dashpot system. The resistor with resistance R measured in ohms (ω) is like the dashpot because it resists the flow of electrons. The capacitor with capacitance C measured in Farads is like the spring, because it converts electron flow into potential energy similar to how a spring converts kinetic energy into potential energy. Finally, the inductor with inductance L measured in Henries is like the mass, because it resists the flow of electrons initially, but once the current reaches its maximum, the inductor also resists any decrease in current. If we sit at some point in the circuit and count the amount of charge which passes as a function of time, and denote this by q(t), then Kirchoff’s Current Law and Kirchoff’s Voltage Law yield the following equation for a series RLC circuit. Lq 00 + Rq 0 + (111) 1 q = e(t) C By definition the current i(t) is the time rate of change of charge q(t), thus: Z t dq 0 i(t) = = q (t) =⇒ q(t) = i(τ ) dτ. dt 0 This allows us to rewrite equation 111, in the following way. Z 1 t 0 i(τ ) dτ = e(t) (112) Li + Ri + C 0 C Ip Is V0 sin(ωt) V0 sin(ωt) IL IC C L L R (b) Parallel RLC Circuit (a) Series RLC circuit Figure 3.1. RLC Circuits: a Series RLC configuration. b Parallel RLC configuration IR R 100 6. Laplace Transforms Example 6.26. Consider the series RLC circuit shown in figure 3.1, with R = 110Ω, L = 1 H, C = 0.001 F, and a battery supplying E0 = 90 V. Initially there is no current in the circuit and no charge on the capacitor. At time t = 0, the switch is closed and left closed for 1 second. At time t = 1 the switch is opened and left open. Find i(t), the current in the circuit as a function of time. If we substitute the values for R, L and C into equation 112, then we get: Z t (113) i0 + 110i + 1000 i(τ ) dτ = 90[1 − u(t − 1)]. 0 Because L Z t i(τ ) dτ 0 = 1 I(s), s the transformed equation becomes: sI(s) + 110I(s) + 1000 90 I(s) = (1 − e−s ). s s We solve this equation for I(s) to obtain: 90(1 − e−s ) . 
+ 110s + 1000 But we can use partial fractions to simplify: 90 1 1 = − , s2 + 110s + 1000 s + 10 s + 100 so we have 1 1 1 1 I(s) = − − − e−s . s + 10 s + 100 s + 10 s + 100 Whereupon we take the inverse Laplace transform and get: h i (114) i(t) = e−10t − e−100t − u(t − 1) e−10(t−1) − e−100(t−1) I(s) = s2 See figure 3.2 for the graph of the solution. Figure 3.2. Current as a function of time in a series RLC circuit. 4 4. Switching 101 4. Switching Definition 6.27. The unit step function corresponds to an “on switch”. It is defined by ( 0 t<a (115) ua (t) = u(t − a) = 1 t>a u(t − 2) 2u(t − 1) 2 2 1 1 0 −1 1 2 3 4 t −1 0 −1 1 2 3 t 4 −1 Figure 4.1. Examples of step functions This function acts like a switch for turning something on. For example, if you want to turn on the function f (t) = t2 at time t = 1, then you could multiply f (t) by u(t − 1). But more likely, you would probably like the function f (t) = t2 to act as if time begins at time t = 1. This is accomplished by first shifting the input to f , for example f (t − 1), and then multiplying by u(t − 1). t2 · u(t − 1) (t − 1)2 · u(t − 1) 2 2 1 1 0 −1 −1 1 2 3 4 t 0 −1 1 2 3 4 t −1 Figure 4.2. Switching on t2 versus (t − 1)2 via the step function u(t − 1) We can also repurpose the unit step function as a way to turn things off. Lemma 6.28. The unit step function, u(t − a), changes to a “switch off at time a” function when its input is multiplied by -1. ( 1 t<a u(a − t) = 0 t>a Proof. The unit step function is defined to be ( ( 0 t<a 0 u(t − a) = ⇐⇒ u(t − a) = 1 t>a 1 t−a<0 t−a>0 102 6. Laplace Transforms Multiplying the input by (-1) requires us to flip the inequalities in the above definition yielding: ( ( 0 a−t>0 0 t>a ⇐⇒ u(a − t) = u(a − t) = 1 a−t<0 1 t<a 5. Convolution Definition 6.29. Let f (t), g(t) be piecewise continuous functions for t ≥ 0. The convolution of f with g denoted by f ∗ g is defined by Z t (116) (f ∗ g)(t) = f (τ )g(t − τ ) dτ. 0 Theorem 6.30 (Convolution is Commutative). Let f (t) and g(t) be piecewise continuous on [0, ∞), then f ∗ g = g ∗ f. Proof. We can rewrite the convolution integral using the following substitution: v =t−τ ⇐⇒ τ =t−v Z f ∗g = =⇒ dτ = −dv. t f (τ )g(t − τ ) dτ 0 Z τ =t f (t − v)g(v) (−dv) = τ =0 When τ = 0, v = t − 0 = t and when τ = t, v = t − t = 0. Z 0 =− f (t − v)g(v) dv t Z t = g(v)f (t − v) dv 0 =g∗f Theorem 6.31 (The Convolution Theorem). If f (t) and g(t) are piecewise continuous and of exponential order c, then the Laplace transform of f ∗ g exists for s > c and is given by (117) L {f (t) ∗ g(t)} = F (s) · G(s), or equivalently, (118) L −1 {F (s) · G(s)} = f (t) ∗ g(t). 5. Convolution 103 Proof. We start with the definitions of the Laplace transform and of convolution and get the iterated integral: Z t Z ∞ −st L {f (t) ∗ g(t)} = e f (τ )g(t − τ ) dτ dt. 0 0 Next, notice that we can change the bounds of integration on the second integral if we multiply the integrand by the unit step function u(t − τ ), where τ is the variable and t is the switch off time (see lemma 6.28): Z ∞ Z ∞ f (τ )u(t − τ )g(t − τ ) dτ dt, L {f (t) ∗ g(t)} = e−st 0 0 Reversing the order of integration gives Z Z ∞ L {f (t) ∗ g(t)} = f (τ ) 0 ∞ e −st u(t − τ )g(t − τ ) dt dτ, 0 The integral in square brackets can be rewritten by theorem ?? as e−sτ G(s), giving Z ∞ Z ∞ L {f (t) ∗ g(t)} = f (τ )e−sτ G(s)dτ = G(s) e−sτ f (τ )dτ = F (s) · G(s). 0 0 Definition 6.32. 
The transfer function, H(s), of a linear system is the ratio of the Laplace transform of the output function to the Laplace transform of the input function when all initial conditions are zero. X(s) H(s) = . F (s) −1 Definition 6.33. The function h(t) = L {H(s)}, is called the impulse response function. It is called this because it is the system’s response to receiving a unit impulse of force at time zero. The impulse response function is also the unique solution to the following undriven (homogeneous) IVP: mx00 + cx0 + kx = 0; x(0) = 0, x0 (0) = 1 . m 1 0 0 *m 2 0 * * x (0) + c sX(s) − x(0) + kX(s) = 0 m s X(s) − s x(0) − 1 m s2 X(s) − + csX(s) + kX(s) = 0 m (ms2 + cs + k)X(s) = 1 X(s) = 1 ms2 + cs + k The importance of the impulse response function, h(t) is that once we know how the system responds to a unit impulse, then we can convolve that response 104 6. Laplace Transforms with any forcing function, f (t), to determine the system’s response to being driven in any manner. Chapter 7 Eigenvalues and Eigenvectors In the next chapter, we will see how some problems are more naturally modeled via a system or collection of differential equations. We will solve these systems of equations by first transforming the system into a matrix equation, then finding the eigenvalues and eigenvectors which belong to that matrix and finally constructing a solution from those eigenvalues and eigenvectors. We will also develop a simple algorithm to transform a single, higher order, linear, differential equation into a system of first order equations. For example, a third order, linear equation will transform into a system of three first order, linear equations. In general, an n–th order, linear differential equation can always be transformed into a system of n, first order, linear, differential equations. This system can be solved via eigenvalues and eigenvectors and then the reverse algorithm translates the matrix solution back to the context of the original problem. Thus the theory behind eigenvalues and eigenvectors has direct application to solving differential equations, but it actually does much more! In chapter 9, eigenvalues and eigenvectors will allow us to understand differential equations from a geometric perspective. Perhaps most surprising is that although the theory arises from a study of linear systems, it will also allow us to qualitatively understand nonlinear systems! A very large number of problems in science and engineering eventually distill down to “the eigenvalue problem”. From web search to petroleum exploration to archiving fingerprints to modeling the human heart, the variety of applications of this theory are so myriad that it would be a daunting task to try to enumerate them. 1. Introduction to Eigenvalues and Eigenvectors Let’s start with a square n × n matrix. Recall that such a matrix can be thought of as a function which maps Rn to itself. Unfortunately, even for the simplest case of a 2 × 2 matrix, we can’t graph this function like we did the functions in Calculus. 105 106 7. Eigenvalues and Eigenvectors This is because the graph would have to exist in a four dimensional space. The graph of a 3 × 3 matrix requires a six dimensional space, and in general the graph of an n × n matrix requires a 2n dimensional space. Since direct visualization of matrix mappings is not possible, we must get clever! A mapping takes an input and maps it to an output. That is, it changes one thing into another. But sometimes a mapping maps an input back to itself. 
Matrices map input vectors to output vectors. Some matrices have special vectors which get mapped exactly back to themselves, but usually this is not the case. However, many matrices do map certain vectors to scalar multiples of themselves. This situation is very common. A vector ~v which gets mapped to a scalar multiple of itself under the matrix A is called an eigenvector of A. In symbols we write: (119) A~v = λ~v . In the above equation, the symbol, λ (pronounced “lambda”), is the scalar multiplier of ~v . We call λ the eigenvalue associated with the eigenvector, ~v . An eigenvalue can be any real number, even zero. However, since ~v = ~0 is always a solution to equation (119) we will disallow the zero vector from being called an eigenvector. In other words we are only interested in the nontrival, i.e. non zero– vector solutions. 3 −1 Example 7.1. Consider the matrix A = . −1 3 1 The vector ~v1 = is an eigenvector of A with eigenvalue λ1 = 2, because: 1 3 −1 1 2 1 A~v1 = = =2 = λ1~v1 . −1 3 1 2 1 −1 The vector ~v2 = is an eigenvector of A with eigenvalue λ2 = 4, because: 1 3 −1 −1 −4 −1 A~v2 = = =4 = λ2~v2 . −1 3 1 4 1 Any scalar multiple of an eigenvector is again an eigenvector corresponding to the same eigenvalue. For example, 1 1 1 1/2 ~v1 = = 1/2 2 2 1 is an eigenvector because: 3 A(1/2)~v1 = −1 −1 1/2 1 1/2 = =2 = λ1 (1/2)~v1 . 3 1/2 1 1/2 4 The fact that a scalar multiple of an eigenvector is again an eigenvector corresponding to the same eigenvalue is simply a consequence of the fact that scalar 2. Algorithm for Computing Eigenvalues and Eigenvectors 107 multiplication of matrices and hence vectors commutes. That is, if λ, ~v form an eigenvalue, eigenvector pair for the matrix A, and c is any scalar, then A~v = λ~v cA~v = cλ~v A(c~v ) = λ(c~v ). 2. Algorithm for Computing Eigenvalues and Eigenvectors Given a square n × n matrix, A, how can we compute its eigenvalues and eigenvectors? We need to solve equation (119): A~v = λ~v , but this equation has two unknowns: λ which is a scalar and ~v which is a vector. The trick is to transform this equation into a homogeneous equation and use our knowledge of linear systems. First we rewrite equation (119) as A~v − λ~v = ~0. Notice that both terms on the left hand side involve ~v so let’s factor ~v out: (A − λ)~v = ~0. The last equation is problematic because it makes no sense to subtract the scalar λ from the matrix A! However there is an easy fix. Recall that the identity matrix I is called the identity exactly because it maps all vectors to themselves. That is, I~v = ~v for all vectors ~v . Thus we can rewrite the previous two equations as follows: A~v − λI~v = ~0 (120) (A − λI)~v = ~0. Now the quantity in parentheses makes sense because λI is an n×n matrix just like A. This last linear system is homogeneous and thus at least has solution ~v = ~0, but by definition we disallow the zero vector from being an eigenvector simply because it is an eigenvector for every matrix, and thus provides no information about A. Instead we are interested in the nonzero vectors which solve equation (120). Chapter 8 Systems of Differential Equations 1. First Order Systems A system of differential equations is simply a set or collection of DEs. A first order system is simply a set of first order, linear DEs. For example, dx dt = 3x − y dy = −x + 3y dt Solving a first order system is usually not as simple as solving two individual first order DEs. Notice that in the system above, we cannot solve for x(t) without also simultaneously solving for y(t). 
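The coefficient matrix of this system is exactly the matrix A from Example 7.1, so the eigenpairs found there are easy to confirm numerically before they are put to work below. The sketch uses numpy, which these notes do not assume; numpy returns unit-length eigenvectors, so they appear as scalar multiples of (1, 1) and (−1, 1).

    import numpy as np

    A = np.array([[3.0, -1.0],
                  [-1.0, 3.0]])

    # Eigenvalues and eigenvectors (as columns) of A
    vals, vecs = np.linalg.eig(A)
    print(vals)    # the eigenvalues 2 and 4, in whatever order numpy chooses
    print(vecs)    # columns are unit eigenvectors, multiples of (1,1) and (-1,1)

    # Direct check of A v = lambda v for each eigenpair
    for lam, v in zip(vals, vecs.T):
        print(np.allclose(A @ v, lam * v))   # True for both pairs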
This is because dx/dt depends on two varying quantities. When at least one of the equations in a system depends on more than one variable we say the system is coupled. This system can be rewritten as an equivalent matrix equation: 0 x 3 −1 x (121) = y0 −1 3 y which in turn can be written in the very compact form: (122) where ~x 0 = 0 x 3 , A = y0 −1 ~x 0 = A~x −1 x and ~x = . 3 y When we wish to emphasize the fact that both ~x 0 and ~x are vectors of functions and not just constant vectors, we will write equation (122) in the following way: (122) ~x 0(t) = A~x(t) 109 110 8. Systems of Differential Equations Our method of solution will closely parallel that of chapter 5, where we guessed what form the solution might take and then plugged that guess into the governing DE to determine constaints on our guess. For a system of n first order equations, we guess that solutions have the form (123) ~x(t) = ~v eλt where λ is a scalar (possibly complex), and where ~v is an n–dimensional vector of scalars (again, possibly complex). To be clear we are assuming that ~x(t) can be written in the following way: ~x(t) = ~v eλt x1 (t) v1 x2 (t) v2 .. = .. eλt . . xn (t) vn λt v1 e x1 (t) x2 (t) v2 eλt .. = .. . . xn (t) vn eλt If the vector–valued function ~x(t) = ~v eλt is to be a solution to equation (122) then its derivative must equal A~x. Computing its derivative yields: d λt ~x 0(t) = ~v e dt 0 λv1 eλt x1 (t) x02 (t) λv2 eλt .. = .. . . x0n (t) λvn eλt 0 λt v1 e x1 (t) x02 (t) v2 eλt .. = λ .. . . x0n (t) vn eλt ~x 0(t) = λ~v eλt (124) ~x 0(t) = λ~x(t) Equating the right hand sides of equation (122) and equation (124) gives: (125) A~x(t) = λ~x(t) which is the eigenvalue–eigenvector equation from chapter 7. The only difference being that now the eigenvector is a vector of functions of t as opposed to just scalars. Since the solutions of equation (125) are actually vector–valued functions we will usually refer to them as eigenfunctions rather than eigenvectors, however it is common to just say eigenvector as well. 1. First Order Systems 111 Guessing that our solution has the form ~x(t) = ~v eλt forces our solution to satisfy equation (125). Thus, solving a system of first order DEs is equivalent to computing the eigenvalues and eigenvectors of the matrix A which encodes the salient features of the system. We know from chapter 7 that if A is an n × n matrix, then it will have n linearly independent eigenvectors. The set of eigenpairs, n o {λ1 , ~v1 }, {λ2 , ~v2 }, . . . , {λn , ~vn } allow us to form a basis of eigenfunctions, o n ~x1 (t) = ~v1 eλ1 t , ~x2 (t) = ~v2 eλ2 t , . . . , ~xn (t) = ~vn eλn t which span the solution space of equation (125). Since the eigenfunctions form a basis for the solution space, we can express the solution ~x(t) as a linear combination of them, (126) ~x(t) = c1 ~x1 (t) + c2 ~x2 (t) + · · · + cn ~xn (t) ~x(t) = c1~v1 eλ1 t + c2~v2 eλ2 t + · · · + cn~vn eλn t Example 8.1. Let’s solve the example system from the beginning of the chapter, which we reproduce here, but let’s also add initial values. Since we have two first order DEs, we need two initial values. 0 x 3 −1 x (121) = x(0) = 6, y(0) = 0 y0 −1 3 y For convenience we will refer to the coefficient matrix above as A. Computing the eigenvalues of matrix A yields two distinct eigenvalues λ1 = 2 and λ2 = 4, because 3 − λ −1 = (3 − λ)2 − 1 |A − λI| = −1 3 − λ = (λ − 3)2 − 1 = λ2 − 6λ + 8 = (λ − 2)(λ − 4) = 0. 
Solving the eigenvector equation, (A − λI)~v = ~0 for each eigenvalue yields 1 −1 1 −1 1 • λ1 = 2 : A − 2I = ∼ =⇒ ~v1 = −1 1 0 0 1 −1 −1 1 1 1 • λ2 = 4 : A − 4I = ∼ =⇒ ~v2 = −1 −1 0 0 −1 Thus the general solution is ~x(t) = c1~v1 eλ1 t + c2~v2 eλ2 t x(t) 1 2t 1 4t = c1 e + c2 e y(t) 1 −1 Plugging in the initial values of x(0) = 6, y(0) = 0 yields the following linear system x(0) 1 1 1 1 c1 6 = c1 + c2 = = y(0) 1 −1 1 −1 c2 0 112 8. Systems of Differential Equations This system has solution c1 = 3 and c2 = 3, so the solution functions are: x(t) = 3e2t + 3e4t y(t) = 3e2t − 3e4t . 4 2. Transforming a Linear DE Into a System of First Order DEs The eigenvalue method can also be applied to second and higher order linear DEs. We start with a simple example. Example 8.2. Consider the homogeneous, linear second order DE, (127) y 00 + 5y 0 + 6y = 0. Suppose y(t) represents the displacement (position) of a mass in an undriven, damped, mass–spring system, then it is natural to let v(t) = y 0 (t) represent the velocity of the mass. Of course, v 0 (t) = y 00 (t) and this allows us to rewrite equation (127) as follows: v 0 + 5v + 6y = 0 =⇒ v 0 = −6y − 5v. Combining our substitution, v = y 0 and our rewrite of equation (127) together yields the following system of first–order equations 0 y0 = v y 0 1 y =⇒ = v 0 = −6y − 5v v0 −6 −5 v The coefficient matrix has eigenvalues λ1 = −2 and λ2 = −3, since 0 − λ 1 = λ(λ + 5) + 6 |A − λI| = −6 −5 − λ = λ2 + 5λ + 6 = (λ + 2)(λ + 3) = 0. Notice that the eigenvalue equation is exactly the characteristic equation which we studied in chapter 5. Solving the eigenvector equation, (A − λI)~v = ~0 for each eigenvalue yields 2 1 2 1 1 ∼ =⇒ ~v1 = • λ1 = −2 : A − (−2)I = −6 −3 0 0 −2 3 1 3 1 1 • λ2 = −3 : A − (−3)I = ∼ =⇒ ~v2 = −6 −2 0 0 −3 The general solution follows the pattern ~x(t) = c1~v1 eλ1 t + c2~v2 eλ2 t and is thus y(t) 1 −2t 1 −3t = c1 e + c2 e v(t) −2 −3 But we are only interested in y(t). That is, the general solution to equation (127) is just: y(t) = c1 e−2t + c2 e−3t . Notice that v(t) is of course just y 0 (t) and is superfluous information in this case. 4 3. Complex Eigenvalues and Eigenvectors 113 Clearly this method of solution requires more work than the method of chapter 5, so it would appear that there is no advantage to transforming a linear equation into a system of first order equations. However, we will see in the next chapter, that this method allows us to study linear DEs geometrically. In the case of second order, linear DEs, the graphical methods of the next chapter will allow us to understand mechanical systems and RLC circuits in a whole new way. 3. Complex Eigenvalues and Eigenvectors Recall that our method of solving the linear system ~x 0(t) = A~x(t), (122) involves guessing that the solution will have the form ~x(t) = ~v eλt . (123) This forces ~x(t) to satisfy the eigenvalue–eigenvector equation: (125) A~x(t) = λ~x(t). The eigenfunctions which satisfy this equation form a basis for the solution space of equation (122). Thus the general solution of the system is a linear combination of the eigenfunctions: ~x(t) = c1~v1 eλ1 t + c2~v2 eλ2 t + · · · + cn~vn eλn t (126) However if any eigenvalue, λi in the general solution is complex, then the solution will be complex–valued. We want real–valued solutions. The way out of this dilemma is to realize that a single eigenpair, {λi , ~vi } where both λi and ~vi are complex–valued can actually yield two real–valued eigenfunctions. 
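Before working out the complex case, a quick numerical cross-check of Example 8.1 may be reassuring: integrating the system directly should reproduce the closed-form solution. The sketch below uses numpy and scipy, neither of which is assumed by these notes, and the default integrator tolerances limit the agreement to a few digits.

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[3.0, -1.0],
                  [-1.0, 3.0]])

    # x' = A x with x(0) = (6, 0), as in Example 8.1
    sol = solve_ivp(lambda t, x: A @ x, (0.0, 1.0), [6.0, 0.0], dense_output=True)

    # Closed-form solution found in Example 8.1
    def exact(t):
        return np.array([3*np.exp(2*t) + 3*np.exp(4*t),
                         3*np.exp(2*t) - 3*np.exp(4*t)])

    t = 0.5
    print(sol.sol(t))   # numerical solution at t = 0.5
    print(exact(t))     # should agree to a few digits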
Suppose ~v e^{λt} satisfies equation (125), where both ~v and λ are complex–valued. Write λ = p + iq and ~v = ~a + i~b, where ~a and ~b are real vectors with components a1, …, an and b1, …, bn. Then we can expand ~v e^{λt} as follows:

\vec{v}\, e^{\lambda t} = \begin{bmatrix} a_1 + ib_1 \\ a_2 + ib_2 \\ \vdots \\ a_n + ib_n \end{bmatrix} e^{(p+iq)t}
  = \left( \vec{a} + i\vec{b} \right) e^{pt} \left( \cos qt + i \sin qt \right)
  = \underbrace{e^{pt}\left( \vec{a} \cos qt - \vec{b} \sin qt \right)}_{\vec{x}_1(t)} + i\, \underbrace{e^{pt}\left( \vec{a} \sin qt + \vec{b} \cos qt \right)}_{\vec{x}_2(t)}

The above just demonstrates that we can break up any complex–valued function into its real and imaginary parts. That is, we can rewrite ~v e^{λt} as:

\vec{v}\, e^{\lambda t} = \mathrm{Re}\left( \vec{v}\, e^{\lambda t} \right) + i\, \mathrm{Im}\left( \vec{v}\, e^{\lambda t} \right) = \vec{x}_1(t) + i\, \vec{x}_2(t),

where both ~x1(t) and ~x2(t) are real–valued functions. Since ~v e^{λt} satisfies equation (125), its derivative is λ~v e^{λt} = A~v e^{λt}, so it also satisfies equation (122); hence so does ~x1(t) + i ~x2(t). But because A is a real matrix and both differentiation and multiplication by A are linear operations, the real and imaginary parts individually satisfy the system as well:

\frac{d}{dt}\left[ \vec{x}_1(t) + i\, \vec{x}_2(t) \right] = A\left[ \vec{x}_1(t) + i\, \vec{x}_2(t) \right]
\vec{x}_1{}'(t) + i\, \vec{x}_2{}'(t) = A\vec{x}_1(t) + i\, A\vec{x}_2(t)

Equating the real and imaginary parts of both sides yields the desired result:

\vec{x}_1{}'(t) = A\vec{x}_1(t), \qquad \vec{x}_2{}'(t) = A\vec{x}_2(t).

In practice, you don't need to memorize any formulas. The only thing from above that you need to remember is that when confronted with a pair of complex conjugate eigenvalues, pick one of them and find its corresponding complex eigenvector, then with this eigenpair form the eigenfunction ~x(t) = ~v e^{λt}. The real and imaginary parts of this eigenfunction will be real–valued solutions of the system. That is, find the two functions

\vec{x}_1(t) = \mathrm{Re}\left( \vec{v}\, e^{\lambda t} \right) \quad \text{and} \quad \vec{x}_2(t) = \mathrm{Im}\left( \vec{v}\, e^{\lambda t} \right).

Then form the general solution by making a linear combination of all the solutions obtained in this way from the coefficient matrix:

\vec{x}(t) = c_1 \vec{x}_1(t) + c_2 \vec{x}_2(t) + \cdots + c_n \vec{x}_n(t).

Example 8.3. Consider the first–order, linear system:

x1' = 2x1 − 3x2
x2' = 3x1 + 2x2
\qquad \Longleftrightarrow \qquad \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \underbrace{\begin{bmatrix} 2 & -3 \\ 3 & 2 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

|A - \lambda I| = \begin{vmatrix} 2-\lambda & -3 \\ 3 & 2-\lambda \end{vmatrix} = (2-\lambda)^2 + 9 = 0
(\lambda - 2)^2 = -9
\lambda - 2 = \pm 3i
\lambda = 2 \pm 3i

Choosing λ = 2 + 3i,

A - \lambda I = \begin{bmatrix} 2-(2+3i) & -3 \\ 3 & 2-(2+3i) \end{bmatrix} = \begin{bmatrix} -3i & -3 \\ 3 & -3i \end{bmatrix}

Next, we need to solve the eigenvector equation (A − λI)~v = ~0. Elementary row operations preserve the solution space, and it is easier to solve the equation when the matrix A − λI is in reduced row–echelon form (RREF), or at least row–echelon form (REF):

\begin{bmatrix} -3i & -3 \\ 3 & -3i \end{bmatrix} \xrightarrow{R_1 + iR_2} \begin{bmatrix} 0 & 0 \\ 3 & -3i \end{bmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{bmatrix} 3 & -3i \\ 0 & 0 \end{bmatrix} \xrightarrow{(1/3)R_1} \begin{bmatrix} 1 & -i \\ 0 & 0 \end{bmatrix}

\begin{bmatrix} 1 & -i \\ 0 & 0 \end{bmatrix} \begin{bmatrix} i \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \implies \vec{v} = \begin{bmatrix} i \\ 1 \end{bmatrix}

Now that we have an eigenpair, we can form a complex–valued eigenfunction, which we rearrange into real and imaginary parts:

\vec{v}\, e^{\lambda t} = \begin{bmatrix} i \\ 1 \end{bmatrix} e^{(2+3i)t}
  = \begin{bmatrix} i \\ 1 \end{bmatrix} e^{2t} \left( \cos 3t + i \sin 3t \right)
  = e^{2t} \begin{bmatrix} i\cos 3t - \sin 3t \\ \cos 3t + i\sin 3t \end{bmatrix}
  = \underbrace{e^{2t} \begin{bmatrix} -\sin 3t \\ \cos 3t \end{bmatrix}}_{\vec{x}_1(t)} + i\, \underbrace{e^{2t} \begin{bmatrix} \cos 3t \\ \sin 3t \end{bmatrix}}_{\vec{x}_2(t)}

Finally, we form the general solution:

\vec{x}(t) = c_1 \vec{x}_1(t) + c_2 \vec{x}_2(t)
\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = c_1 e^{2t} \begin{bmatrix} -\sin 3t \\ \cos 3t \end{bmatrix} + c_2 e^{2t} \begin{bmatrix} \cos 3t \\ \sin 3t \end{bmatrix}

x1(t) = −c1 e^{2t} sin 3t + c2 e^{2t} cos 3t
x2(t) = c1 e^{2t} cos 3t + c2 e^{2t} sin 3t
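As with the earlier examples, this result can be spot–checked numerically. The sketch below is an illustrative aside (not part of the original text): it confirms the complex eigenvalues and verifies, via a finite–difference approximation of the derivative, that the real part ~x1(t) found above really does satisfy ~x' = A~x.

import numpy as np

A = np.array([[2.0, -3.0],
              [3.0,  2.0]])

print(np.linalg.eigvals(A))    # expected: the conjugate pair 2+3j and 2-3j

def x1(t):
    # Real part of v * exp(lambda*t) with v = (i, 1) and lambda = 2 + 3i.
    return np.exp(2.0 * t) * np.array([-np.sin(3.0 * t), np.cos(3.0 * t)])

def x1_prime(t, h=1e-6):
    # Centered finite-difference approximation of x1'(t).
    return (x1(t + h) - x1(t - h)) / (2.0 * h)

t = 0.7
print(np.allclose(x1_prime(t), A @ x1(t), atol=1e-4))   # expected: True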
4. Second Order Systems

Consider two masses connected with springs as shown in figure 4.1. Since this is a mechanical system, Newton's laws of motion apply, specifically the second law, ma = ΣF.

Figure 4.1. Two mass system (masses m1 and m2 coupled by springs k1, k2, k3)

Since each mass is attached to two springs, there are two forces which act upon each mass. Let's derive the equation of motion for mass one, m1. If we displace m1 a small amount to the right, then spring one, labelled k1, will pull it back. The force, according to Hooke's law, will be equal to −k1 x1. The negative sign simply indicates that the force will be in the negative direction.

The force on m1 due to spring two is complicated by the fact that both m1 and m2 can be displaced simultaneously. However, a simple thought experiment will clarify. Imagine displacing m2 two units to the right from its equilibrium position, and imagine displacing m1 only one unit to the right from its equilibrium. In this configuration, since spring two is stretched, it will pull m1 to the right with a force proportional to k2 times the amount of stretch in spring two. This stretch is exactly one unit, because x2 − x1 = 2 − 1 = 1. Therefore the equation of motion for m1 is:

m1 x1'' = −k1 x1 + k2 (x2 − x1).

To derive the equation of motion for mass two, m2, we will again imagine displacing m2 to the right by two units and m1 to the right by one unit. In this configuration, since spring two is stretched, it will pull m2 to the left. Spring three will be compressed one unit and hence push m2 to the left as well.

m2 x2'' = −k2 (x2 − x1) − k3 x2.

We wish to write this system as a matrix equation so we can apply the eigenvalue–eigenvector method. Thus we need to rearrange these two equations so that the variables x1 and x2 line up in columns:

m1 x1'' = −(k1 + k2) x1 + k2 x2
m2 x2'' = k2 x1 − (k2 + k3) x2

\begin{bmatrix} m_1 & 0 \\ 0 & m_2 \end{bmatrix} \begin{bmatrix} x_1'' \\ x_2'' \end{bmatrix} = \begin{bmatrix} -(k_1 + k_2) & k_2 \\ k_2 & -(k_2 + k_3) \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}

This matrix equation can be written very compactly as

(128)    M\vec{x}\,''(t) = K\vec{x}(t)

We will call matrix M the mass matrix and matrix K the stiffness matrix. Before we can apply the eigenvalue–eigenvector method we need to rewrite equation (128) so that it contains only one matrix. Luckily, the mass matrix is invertible, with inverse

M^{-1} = \begin{bmatrix} 1/m_1 & 0 \\ 0 & 1/m_2 \end{bmatrix}.

This allows us to rewrite equation (128) as

(129)    \vec{x}\,''(t) = A\vec{x}(t)

where A = M^{−1}K. To solve this system we will employ the same method as before, but since each equation now involves a second derivative, we will have to take that into account. We guess that the solution has the form ~x(t) = ~v e^{αt}. Differentiating our guess twice yields:

(130)    \vec{x}(t) = \vec{v}\, e^{\alpha t} \implies \vec{x}\,'(t) = \alpha \vec{v}\, e^{\alpha t} \implies \vec{x}\,''(t) = \alpha^2 \vec{v}\, e^{\alpha t} = \alpha^2 \vec{x}(t).

Equating the right hand sides of equation (129) and equation (130) yields

(131)    A\vec{x}(t) = \alpha^2 \vec{x}(t).

This is essentially the eigenvalue–eigenvector equation again if λ = α². But we need to

Figure 4.2. Three mass system (masses m1, m2, m3 coupled by springs k1, k2, k3, k4)

5. Nonhomogeneous Linear Systems