Dmitry Panchenko

Calculus I and II Essentials

ISBN-13: 978-1-9994190-7-3
ISBN-10: 1-9994190-7-3
1st edition
© 2022 Dmitriy Panchenko

Acknowledgement

This text grows out of the author’s experience teaching MAT135 & MAT136 at the University of Toronto during the 2021-2022 academic year. This course in its current form was developed between 2017 and 2021 under the direction of Sarah Mayes-Tang, with input from many members of a large teaching team, notably including Bernardo Galvão-Sousa, who also coordinated the course in the 2021-2022 academic year. The material in these notes reflects the structure of the course as designed by Professor Mayes-Tang, for example through the emphasis on a flipped classroom and in the selection of topics. The presentation is intended to complement the treatment of these topics as found in the standard MAT135 & MAT136 course material.

I want to thank the entire MAT135 & MAT136 teaching teams from the previous several years, and also MAT187 from last year, who created many of the exercises. I also want to thank all the students in these classes, whose feedback was very important to me, and whose positive energy made the classes a real pleasure to teach.

Contents

1 Functions
    1.1 Introduction
    1.2 Linear functions
    1.3 Exponential functions
    1.4 Logarithmic functions
    1.5 Logarithmic scales
    1.6 Trigonometric functions
    1.7 Polynomials and rational functions
    1.8 Inverse functions
    1.9 Limits and continuity
2 Derivatives
    2.1 Practical interpretation of derivatives
    2.2 Formal definition of derivative
    2.3 Derivatives and graphs
    2.4 Differentiation rules
    2.5 First applications: old and new
    2.6 Critical points
    2.7 Optimization problems
    2.8 Parametric families of functions
    2.9 Related rates
3 Integrals
    3.1 Definite integrals: the case of velocity
    3.2 Definite integrals: general case
    3.3 Fundamental Theorem of Calculus
    3.4 Application of FTC: differential equations
    3.5 Techniques of integration: substitution rule
    3.6 Techniques of integration: integration by parts
    3.7 Approximating integrals using Taylor polynomials
    3.8 CAS: computer algebra systems
    3.9 Improper integrals
    3.10 Slicing problems: geometry
    3.11 Slicing problems: densities
4 Differential equations
    4.1 Differential equations: qualitative analysis
    4.2 Differential equations: approximations
    4.3 Separable differential equations
    4.4 Lotka-Volterra predator-prey model
    4.5 The SIR model
    4.6 Approximating solutions by Taylor polynomials
5 Taylor polynomials and series
    5.1 From Taylor polynomials to Taylor series
    5.2 Transformations of Taylor series
    5.3 Ratio test and the radius of convergence
    5.4 Applications of Taylor series

Chapter 1 Functions

1.1 Introduction

In this chapter we will study various basic functions that appear frequently in Calculus, such as linear, exponential, logarithmic, power, polynomial, rational, and trigonometric functions. Of course, we will often combine these functions by adding, subtracting, multiplying, dividing, taking compositions and inverses. The functions themselves, but also various properties of functions and operations we can do with functions, can be:

• described in words, either in plain English or using mathematical terminology;
• expressed with mathematical formulas and notation;
• depicted and observed in figures via their graphs;
• represented by tables of values.

For this reason, learning Calculus is a lot like learning a new language, and even if you feel that you understood a new concept, it is important to be able to express it in different ways and translate it between words, formulas, graphs, and sometimes recognize it from a table. Let us show an example of how the same information can be expressed or observed in these different ways.

Example 1. Suppose that a function $T = f(t)$ describes temperature $T$ changing over time, and suppose that (in plain English) the temperature is growing but the growth is slowing down. Then we can express this using mathematical terminology by saying that the function is increasing and concave down. We can also observe this behaviour from the graph of $y = f(t)$, or use formulas and write that its first derivative is positive, $f'(t) > 0$, and second derivative is negative, $f''(t) < 0$ (as we will learn later).
Finally, if we are given a table of temperature values at a few equally spaced points in time, for example,

t    0    1      2    3      4     5
T    0    3.75   7    9.75   12    13.75

we can also see that the values are increasing, but the gaps between two consecutive temperatures are decreasing: 3.75, 3.25, 2.75, 2.25, 1.75. Of course, in this case we do not know what happens for all times $t$ but, from what we can see in the table, we can guess that the temperature is growing but it is growing slower and slower, so the function $T = f(t)$ is probably increasing and concave down.

Exercise 1. By looking at the table of values, is the function $T = f(t)$ increasing or decreasing? Concave up or concave down?

t    0    1     2    3     4    5
T    1    1.5   3    5.5   9    13.5

Domain and range. We will discuss terminology associated with functions all throughout this chapter, but here let us briefly discuss the domain and range of a function $y = f(x)$. Generally speaking, the domain of a function is the set of all possible inputs $x$ that we are allowed to plug into the function $f$, and the range is the set of all possible outputs $y$. For example, $y = \sqrt{x}$ has the domain $[0, \infty)$ because we are only allowed to plug nonnegative values $0 \le x < \infty$ into the square root, and the range is also $[0, \infty)$. However, in some cases the domain may be narrower for various reasons. For example, if in a given problem we are only interested in the function $y = \sqrt{x}$ on the interval $[1, 2]$, for the purpose of that problem the domain will be $[1, 2]$ and the range will be $[1, \sqrt{2}]$. In applied problems the domain may be limited by the physical constraints of the problem.

Example 2. Suppose we have a 10′′ × 10′′ piece of cardboard, and we cut off the four corners of size $x \times x$ and then fold the sides to make a box. The volume of the box $V = V(x)$ will depend on $x$. What is $V(x)$ and what is its domain?

Solution: The dimensions of the box ($\ell \times w \times h$) will be $(10 - 2x) \times (10 - 2x) \times x$, so the volume will be $V(x) = (10 - 2x)^2 x$.
The domain is $[0, 5]$, because we cannot physically cut off corners of size bigger than 5′′ × 5′′.

Exercise 2. A ball is tossed straight up with initial speed 10 m/s and initial height above ground of 2 m. The height of the ball at time $t$ is given by $h = -5t^2 + 10t + 2$ meters. What are the domain and range of the function that describes the height of the ball until the time it hits the ground? Hint: Recall the quadratic formula or see Section 1.7.

Answer to Exercise 1. It is increasing and the gaps between consecutive values are also increasing: 0.5, 1.5, 2.5, 3.5, 4.5, so the function is increasing and concave up.

Answer to Exercise 2. The domain is $[0, 1 + \sqrt{7/5}]$. As we can see in the figure, the main point here is that the height $h(t)$ is negative after time $t = 1 + \sqrt{7/5}$, so the formula is no longer applicable there. The maximum height will be 7 m at time $t = 1$, so the range is $[0, 7]$.

1.2 Linear functions

Linear functions are functions of the form
$$y = b + m \cdot x$$
where the constant $b$ is the y-intercept and the constant $m$ is the slope, $x$ is the independent variable (input of the function) and $y$ is the dependent variable (output of the function). Linear functions may be the simplest functions in Calculus, but they play a fundamental role because the notion of a derivative $f'(a)$ of a function $y = f(x)$ at a point $x = a$ will be based on approximating $f(x)$ at that point by linear functions. Let us take a look at a graph of a linear function and consider any two points $(x_1, y_1)$ and $(x_2, y_2)$ on this graph. We can see that:

• $y_1 = b + m \cdot x_1$ (first point).
• $y_2 = b + m \cdot x_2$ (second point).
• $\Delta x = x_2 - x_1$ is called the run.
• $\Delta y = y_2 - y_1$ is called the rise.
• $m = \tan(\theta) = \frac{\Delta y}{\Delta x}$ is called the slope.
• $b = b + m \cdot 0$ is the y-intercept.

The notation $\Delta$ is used often in Calculus and means increment. For example, above $\Delta x$ is the increment $x_2 - x_1$ of the variable $x$, and $\Delta y$ is the increment $y_2 - y_1$ of the variable $y$. The slope $m = \frac{\Delta y}{\Delta x}$ represents how much the output variable changes, $\Delta y$, relative to how much the input variable changes, $\Delta x$. We can think of it as a rate of change of $y$ with respect to $x$, and for linear functions it is always the same no matter what the interval $[x_1, x_2]$ is.

Example 1. If a car drives with a constant speed of 60 km/h, what is the distance $d$ it covers in $t$ hours? If we think of the distance as a function of time, what is the meaning of its slope?

Solution: Since Distance = Speed × Time when the speed is constant, in this case the distance is $d = 60 \cdot t$, measured in km/h × h = km. It is a linear function with y-intercept 0 and slope 60, so the meaning of the slope $m = \frac{\Delta d}{\Delta t}$ is speed. Constant speed means that distance is a linear function of time.

Exercise 1. Suppose that during photosynthesis at temperature 10°C in direct sunlight a leaf of some plant produces 30 µmol of glucose and oxygen per hour. What is the amount of glucose and oxygen produced during $t$ hours? If we think of this amount as a function of time, what is the meaning of its slope?

Example 2. What is the linear function whose graph passes through the points $(1, 3)$ and $(5, 2)$?

Solution: Since $\Delta x = 5 - 1 = 4$ and $\Delta y = 2 - 3 = -1$, the slope is $m = \frac{\Delta y}{\Delta x} = -\frac{1}{4} = -0.25$. To find the intercept, we can use either of the two points, for example the first one: $3 = b + m \cdot 1 = b - 0.25 \cdot 1 = b - 0.25$, so $b = 3 + 0.25 = 3.25$. The linear function is $y = 3.25 - 0.25x$.

Exercise 2. What is the linear function whose graph passes through the points $(4, 1)$ and $(0, 3)$?

Line from slope and one point. In Example 2 above, once we found the slope $m$, we computed the intercept $b$ by plugging in the coordinates of one point. Actually, if we know the slope $m$ and a point $(x_0, y_0)$ on the graph of a linear function, we can write the formula for this linear function directly:
$$y = y_0 + m \cdot (x - x_0).$$
Indeed, if $x = x_0$ then $y = y_0 + m \cdot (x_0 - x_0) = y_0 + m \cdot 0 = y_0$.
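As a rough illustration of the two recipes above (slope from two points, then the point-slope formula), here is a minimal Python sketch. The function names `line_through` and `point_slope` are hypothetical helpers introduced only for this illustration; they are not part of the course material.

```python
def line_through(p1, p2):
    """Return (b, m) for the line y = b + m*x through two points.

    Hypothetical helper for illustration only.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # slope = rise / run
    b = y1 - m * x1            # solve y1 = b + m*x1 for b
    return b, m


def point_slope(m, x0, y0):
    """Return the function x -> y0 + m*(x - x0) (point-slope form)."""
    return lambda x: y0 + m * (x - x0)


# The two points from Example 2.
b, m = line_through((1, 3), (5, 2))
f = point_slope(m, 1, 3)

print(b, m)        # 3.25 -0.25
print(f(1), f(5))  # 3.0 2.0
```

Both routes describe the same line: the intercept form $y = 3.25 - 0.25x$ and the point-slope form $y = 3 - 0.25(x - 1)$ agree at every $x$.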
It is very important to remember this formula, because it will be used frequently to write the equations of secant lines and tangent lines later on when we study derivatives. Let us see how we could have used it in the second half of Example 2.

Example 3. What is the linear function whose slope is $-0.25$ and whose graph passes through the point $(1, 3)$?

Solution: The function is $y = 3 - 0.25(x - 1)$. Of course, we can multiply out the second term and rewrite this as $y = 3 - 0.25(-1) - 0.25x = 3.25 - 0.25x$.

Exercise 3. What is the linear function whose slope is $\frac{7}{11}$ and whose graph passes through the point $(2, 1)$?

Tables and trend lines. If we are given a table of $(x, y)$ values, it is easy to check if they all lie on the graph of a linear function $y = b + m \cdot x$. We only need to check that all slopes between two consecutive values of $x$ are equal. This is especially easy if all increments $\Delta x$ are the same, in which case we only need to check that all increments $\Delta y$ are also the same.

Example 4. Do all the points in the table lie on the graph of a linear function? If yes, which one?

x    0    2    4    6    8    10
y    0    3    6    9    12   15

Solution: We can see that all increments $\Delta x$ between consecutive points are equal to 2, so we only need to check that all $\Delta y$ are also the same. Indeed, all $\Delta y$ are equal to 3, so the points lie on the graph of one linear function. Its slope is $\frac{\Delta y}{\Delta x} = \frac{3}{2} = 1.5$, and since it passes through the point $(0, 0)$, the formula is $y = 0 + 1.5(x - 0) = 1.5x$.

Exercise 4. Do all the points in the table lie on the graph of a linear function? If yes, which one?

x    −2   0    3    6    8    11
y    −3   1    7    13   17   23

Example 5. Sometimes the points might not be exactly on a straight line but pretty close to a straight line, as the points in the following table

x    0    2     4     6     8      10
y    0    3.4   5.5   8.8   12.6   15.1

shown as blue dots in the figure below.
One way to find the trend line (shown in red in the figure below) is to use the so-called least squares regression, which can be solved using optimization techniques studied later in this course. For now, we will simply mention how to find this line using Google Sheets:

• Enter x-values and y-values into two columns, and select those columns.
• Go to Insert > Chart.
• Under Chart Type select Scatter.
• Under Customize select Series.
• Scroll and check Trendline.
• A little below, under Label choose Use Equation.

The equation for the line is shown at the top of the chart, in this case $y = 1.52x - 0.0333$.

Exercise 5. Find the trend line (least squares regression) for the following data points:

x    −2     0     3     6      8      11
y    −3.3   1.1   7.5   12.4   16.6   23.4

Answer to Exercise 1. $y = 30t$ µmol. The slope 30 µmol/h is the rate of photosynthesis, or the rate of production of glucose and oxygen.

Answer to Exercise 2. $y = 3 - \frac{1}{2}x$.

Answer to Exercise 3. $y = 1 + \frac{7}{11}(x - 2) = -\frac{3}{11} + \frac{7}{11}x$.

Answer to Exercise 4. Yes, $y = 2x + 1$.

Answer to Exercise 5. $y = 2.01x + 0.914$.

1.3 Exponential functions

Let us recall basic algebra rules involving powers:
$$a^b a^c = a^{b+c}, \quad (a^b)^c = a^{bc}, \quad (ab)^c = a^c b^c,$$
$$\frac{a^b}{a^c} = a^{b-c}, \quad \frac{1}{a^c} = a^{-c}, \quad \Big(\frac{a}{b}\Big)^c = \frac{a^c}{b^c}.$$
Here we assume that all the terms make sense, i.e. we do not divide by zero, etc.

Example 1. Let us check that $2^{-x} = 0.5^x$.

Solution: Using the above rules, we can write
$$2^{-x} = \frac{1}{2^x} = \Big(\frac{1}{2}\Big)^x = 0.5^x.$$
Another way is to write $2^{-x} = 2^{(-1)x} = (2^{-1})^x = \big(\frac{1}{2}\big)^x = 0.5^x$. This means that $2^{-x}$ and $0.5^x$ are the same function of $x$.

Exercise 1. Check that (a) $100^x = 10^{2x}$, (b) $25^{-x/2} = 0.2^x$, (c) $2^{-2x} = 0.25^x$.

Exponential functions are functions of the form
$$P = P_0 \cdot a^t.$$

• $a > 0$ is a positive constant called the (exponential) base.
• $t$ is an independent variable which often represents time.
• $P$ is a dependent variable that sometimes represents a population.
• $P_0$ is the initial value of $P$ at time 0.
Indeed, $P(0) = P_0 \cdot a^0 = P_0$. The notation for both variables $t$ and $P$ can change depending on the situation, so
$$y = 0.5 \cdot 2^x, \quad h = 3 \cdot 0.5^{-t}, \quad P = 100 \cdot 2^{-t}, \quad z = 2 \cdot 3^{-t/2}$$
are all possible examples of exponential functions. Typical examples of processes described or modelled by exponential functions are:

• decay of a radioactive material;
• compound interest in a bank account;
• increase or decrease of a population;
• concentration of a drug in a patient’s body;
• probability of a lifetime of high quality products.

Let us take a look at the graphs of several exponential functions.

Exponential growth: $a > 1$. When the exponential base $a$ is bigger than 1, for example $y = 2^x$ in the above graph, the function is increasing as the variable $x$ increases. For example, if the variable increases by one unit then the function $y = 2^{x+1} = 2 \cdot 2^x$ doubles compared to $2^x$. On the same unit interval the function $y = 4^x$ quadruples, because $y = 4^{x+1} = 4 \cdot 4^x$, so it grows faster than $y = 2^x$ as we can see in the above figure. The bigger $a$ is, the faster the function $a^x$ grows.

Exponential decay: $0 < a < 1$. When the exponential base $a$ is less than 1, for example $y = 0.5^x$ in the above graph, the function is decreasing as the variable $x$ increases. For example, if the variable increases by one unit then the function $y = 0.5^{x+1} = 0.5 \cdot 0.5^x$ is half of $0.5^x$. On the same unit interval $y = 0.25^{x+1} = 0.25 \cdot 0.25^x$ is a quarter of $0.25^x$, so the function $y = 0.25^x$ decays faster than $y = 0.5^x$. The smaller $a$ is, the faster the function $a^x$ decays.

Example 2. In the figure we see the graphs of three exponential functions $y = p \cdot a^x$, $y = q \cdot b^x$, and $y = r \cdot c^x$. (a) Compare the constants $a$, $b$ and $c$. (b) Compare the constants $p$, $q$ and $r$.

Solution: (a) The constant $a$ is the smallest, because $a$ is less than 1 (exponential decay), and both $b$ and $c$ are bigger than 1 (exponential growth). Also, $b > c$ because the growth of $y = q \cdot b^x$ is faster. So, $a < c < b$.
(b) The constants $p$, $q$ and $r$ are initial values at $x = 0$, so we are comparing the y-intercepts: $q = r < p$.

Exercise 2. Sketch the graphs of the functions $y = 2 \cdot 1.2^x$, $y = 3 \cdot 1.1^x$ and $y = 2 \cdot 0.8^x$ on the same plot. (Check your answer using, for example, geogebra.org).

Rate of growth/decay. Given an exponential function $P = P_0 \cdot a^t$, let us express the base $a$ relative to 1 as $a = 1 + r$:
$$P = P_0 \cdot a^t = P_0 \cdot (1 + r)^t.$$
In the case of exponential growth, when $a > 1$, the constant $r = a - 1 > 0$ is called the (exponential) growth rate. In the case of exponential decay, when $0 < a < 1$, the constant $-r = 1 - a > 0$ is called the (exponential) decay rate.

Example 3. Find the exponential growth/decay rate of the functions $y = 2 \cdot 1.1^x$ and $y = 3 \cdot 0.98^x$.

Solution: Since $y = 2 \cdot 1.1^x = 2 \cdot (1 + 0.1)^x$, the growth rate is $r = 0.1$. Since $y = 3 \cdot 0.98^x = 3 \cdot (1 - 0.02)^x$, the decay rate is $-r = 0.02$.

Exercise 3. Write down an exponential function with the growth rate 0.01 and initial value 1.5. Write down an exponential function with the decay rate 0.04 and initial value 2.

Example 4. We deposit $D$ dollars into a savings account with annual interest 2%. How much money is in the account after $t$ years?

Solution: If we start with $D$ dollars, in one year we accumulate interest $0.02D$, so the total will be $D + 0.02D = D \cdot 1.02$. After two years the total will be $D \cdot 1.02 \cdot 1.02 = D(1.02)^2$, after three years $D(1.02)^3$, and after $t$ years $D(1.02)^t$. Actually, a typical savings account accumulates interest continuously, so if we close the account after $t$ years, where $t$ is not necessarily an integer, we will have $D(1.02)^t$ dollars.

Exercise 4. Suppose a radioactive material decays at the rate of 2.5% per year. What percent of the original will remain after 100 years?

Example 5. Suppose that annual sales at a bakery are growing at 2% per year. How can we model the annual sales?
Solution: If the sales during the current year total $A$ dollars, time $t = 1$ denotes next year, $t = 2$ is the year after next, etc., then as in the previous example the sales during year $t$ will be $A(1.02)^t$. If one prefers to denote the current year by $t = 1$ instead of $t = 0$ then we need to shift time by 1 so that the sales during year $t$ will be $A(1.02)^{t-1}$. Also, the difference with the previous example is that, for non-integer $t$, this formula does not have a particular meaning; for example, $A(1.02)^{2.5}$ does not directly represent anything related to sales at $t = 2.5$ years. Instead of annual sales, we could model the rate of sales using an exponential function, but this will be studied much later because, in this case, sales within any interval of time would be computed using integrals.

Exercise 5. Suppose that annual sales at a bakery are decreasing at 1% per year. How can we model the annual sales?

Continuous rate of growth/decay. Recall Euler’s number $e = 2.718281828\ldots$ and the natural logarithm function $\ln(x)$. We will review logarithms in the next section, but for now we only need to recall that any positive number $a > 0$ can be written as $a = e^{\kappa}$, and this equation can be solved for $\kappa$ using the natural logarithm:
$$a = e^{\kappa} \implies \kappa = \ln(a).$$
It means that, given an exponential function $P = P_0 \cdot a^t$, we can always express the base $a$ as $a = e^{\kappa}$ with $\kappa = \ln(a)$, and rewrite this function as
$$P = P_0 \cdot a^t = P_0 \cdot (e^{\kappa})^t = P_0 \cdot e^{\kappa t}.$$
In the case of exponential growth, when $a > 1$, the constant $\kappa = \ln(a) > 0$ is called the continuous growth rate. In the case of exponential decay, when $0 < a < 1$, the constant $-\kappa = -\ln(a) > 0$ is called the continuous decay rate.

Example 6. Find the continuous growth/decay rate of the functions $y = 2 \cdot 1.1^x$ and $y = 3 \cdot 0.98^x$.

Solution: In the first case, $\kappa = \ln(1.1) = 0.0953\ldots$ is the continuous growth rate, and we can write the function $y = 2 \cdot 1.1^x$ as $y = 2 \cdot e^{0.0953x}$. In the second case, $\kappa = \ln(0.98) = -0.0202\ldots$, so the continuous decay rate is 0.0202, and we can write the function $y = 3 \cdot 0.98^x$ as $y = 3 \cdot e^{-0.0202x}$. Notice that the decay rate itself is positive, but the fact that the function is decreasing (decay) instead of increasing (growth) is expressed by the minus sign in the exponent $e^{-0.0202x}$.

Exercise 6. Find the continuous growth/decay rate of the functions $y = 2 \cdot 1.02^x$ and $y = 7 \cdot 0.9^x$.

The reason why in Calculus we prefer to write exponential functions in the form $P = P_0 \cdot e^{\kappa t}$ is that the base $e$ is quite special and has nice properties, which will make it more convenient to use when dealing with derivatives and integrals later on. For example, both the derivative and integral of $e^x$ will be $e^x$ itself. For more about continuous rates: youtu.be/sbLWLvSfvwk.

Half-life. Many exponential decay processes are described by their half-life, which is the interval of time $H$ required for the quantity to decrease to one-half of its initial value. It is easy to guess the formula
$$P = P_0 \cdot \Big(\frac{1}{2}\Big)^{t/H} = P_0 \cdot 2^{-t/H}$$
because, in this case, $P(H)$ is equal to $P_0 \cdot \big(\frac{1}{2}\big)^{H/H} = \frac{P_0}{2}$. We can also find this formula by taking an exponential function $P = P_0 \cdot a^t$, making sure that $P(H)$ is equal to $\frac{P_0}{2}$, and solving for $a$:
$$P(H) = P_0 \cdot a^H = \frac{P_0}{2} \implies a^H = \frac{1}{2} \implies a = \Big(\frac{1}{2}\Big)^{1/H}.$$
Then $P = P_0 \cdot a^t = P_0 \cdot \big(\frac{1}{2}\big)^{t/H}$, which matches the formula above.

Example 7. Aspirin has a half-life of 20 minutes in a patient’s body (once absorbed in the upper gastrointestinal tract). How long does it take for 100 mg of aspirin to be reduced to 30 mg?

Solution: We want to find the time $t$ such that $P = 100 \cdot \big(\frac{1}{2}\big)^{t/20} = 100 \cdot (0.5)^{t/20} = 30$. Dividing both sides by 100 and then taking the natural logarithm of both sides:
$$(0.5)^{t/20} = 0.3 \implies \ln\big((0.5)^{t/20}\big) = \ln 0.3 \implies \frac{t}{20} \ln 0.5 = \ln 0.3,$$
so $t = 20\,\frac{\ln 0.3}{\ln 0.5} = 34.74$ min ≈ 34 minutes and 44 seconds. In the above calculation we used the property of the logarithm that $\ln(a^b) = b \ln(a)$.

Exercise 7.
If it takes 60 minutes for 100 mg of a drug to be reduced to 30 mg in a patient’s body, what is the half-life of this drug?

Key property of exponentials. The key property of an exponential function is that, on any interval of the same length $h$, it changes by the same factor. Indeed,
$$P(t + h) = P_0 \cdot a^{t+h} = P_0 \cdot a^t \cdot a^h = a^h \cdot P(t).$$
So the factor $\frac{P(t+h)}{P(t)} = a^h$ depends only on $h$ but not on $t$. One way we can use this is to find the formula for an exponential function if we know two points on its graph.

Example 8. Find the exponential function $y = y_0 \cdot a^x$ that passes through the points $(2, 3)$ and $(5, 7)$.

Solution: Since $3 = y_0 \cdot a^2$ and $7 = y_0 \cdot a^5$, we get $\frac{7}{3} = \frac{y_0 \cdot a^5}{y_0 \cdot a^2} = a^3$, so $a = \big(\frac{7}{3}\big)^{1/3}$ and $y = y_0 \cdot \big(\frac{7}{3}\big)^{x/3}$. To find $y_0$, we can use either of the two points, for example, $3 = y(2) = y_0 \cdot \big(\frac{7}{3}\big)^{2/3}$, so $y_0 = 3 \big(\frac{3}{7}\big)^{2/3}$.

Exercise 8. Find the exponential function $y = y_0 \cdot a^x$ that passes through the points $(2, 5)$ and $(6, 1)$.

We can also use the above property to check if the points in a table correspond to some exponential function.

Example 9. Do the values in the table correspond to some exponential function?

x    0    2    4    6      8
y    4    6    9    13.5   20.25

Solution: We see that the increments of $x$ in the table are all equal to $h = 2$, so the ratio $\frac{y(x+2)}{y(x)}$ should be the same for all $x$, equal to $a^2$ if $y = y_0 \cdot a^x$. Indeed,
$$\frac{6}{4} = \frac{9}{6} = \frac{13.5}{9} = \frac{20.25}{13.5} = 1.5,$$
so the values in the table correspond to an exponential function with $a^2 = 1.5$, or $a = \sqrt{1.5}$. We can find $y_0$ using any point in the table, for example the first one, $4 = y(0) = y_0 \cdot a^0 = y_0$, so $y = 4 \cdot (1.5)^{x/2}$.

Exercise 9. Do the values in the table correspond to some exponential function?

x    −5   −2   1    4    7
y    8    12   18   28   40.5

Later on we will discuss the so-called logarithmic scale, which will allow us to observe more easily whether the points lie on the graph of some exponential function, even in the case when the increments of $x$ are not necessarily equal.

Average exponential growth.
If the rate of growth constantly changes, how do we define the average growth rate over a given period of time? Let us answer this by looking at the example of world population growth.

Example 10. The world population in 1928 was 2 billion and in 1950 it was 2.5 billion. What was the average growth rate during this period?

Solution: If we had a constant rate of growth $r$ over the period of 22 years, we would have $2.5 = 2 \cdot (1 + r)^{22}$. Solving for $r$ we get $r = (1.25)^{1/22} - 1 \approx 0.01$. This means that the average rate of growth was about 1%.

Exercise 10. The world population in 1950 was 2.5 billion and in 1987 it was 5 billion. What was the average growth rate during this period?

Double exponentials. Functions of the form
$$y = a e^{-b e^{-ct}} \quad \text{and} \quad y = a e^{-b e^{ct}},$$
where $a > 0$, $b > 0$ and $c > 0$ are positive constants (parameters), can be used to describe various growth and decay processes, such as tumour growth or survival probabilities in a population. We will see examples of such models later on. Such functions, where one exponential appears inside another exponential, are also called Gompertz functions. Notice that $y = a e^{-b e^{-ct}}$ is increasing and $y = a e^{-b e^{ct}}$ is decreasing with $t$. Here we will only use these Gompertz functions to practice some basic calculations.

Example 11. What value does $y = a e^{-b e^{-ct}}$ approach as $t$ gets bigger and bigger?

Solution: First of all, as $t$ gets bigger, $e^{-ct}$ approaches 0 because it is an exponential decay function. This means that $e^{-b e^{-ct}}$ approaches $e^{-b \cdot 0} = e^0 = 1$ and, finally, $a e^{-b e^{-ct}}$ approaches $a$. This means that $y = a$ is the horizontal asymptote as $t$ approaches infinity.

Exercise 11. At what time $t$ is the function $y = a e^{-b e^{-ct}}$ equal to $\frac{1}{2}$? For such a time to exist, what do we need to assume about $a$?

Answer to Exercise 3. $y = 1.5 \cdot (1.01)^x$, $y = 2 \cdot (0.96)^x$.

Answer to Exercise 4. $y = y_0 \cdot (0.975)^t$ so $y(100) = y_0 \cdot (0.975)^{100} = y_0 \cdot 0.0795\ldots$, which is about 7.95% of the original amount.

Answer to Exercise 5.
If the sales during the current year total $A$ dollars, time $t = 1$ denotes next year, $t = 2$ is the year after next, etc., then the sales during year $t$ will be $A(0.99)^t$.

Answer to Exercise 6. The continuous growth rate of $y = 2 \cdot 1.02^x$ is $\kappa = \ln(1.02) = 0.0198$, and the continuous decay rate of $y = 7 \cdot 0.9^x$ is $-\kappa = -\ln(0.9) = 0.10536$.

Answer to Exercise 7. $30 = 100 \cdot 2^{-60/H}$, and solving for $H$ we get $H = -60\,\frac{\ln(2)}{\ln(0.3)} \approx 34.54$ min. See the next section if you need to review properties of logarithms that are used in such calculations.

Answer to Exercise 8. $y = 5^{3/2} \cdot 5^{-x/4} = 5^{3/2 - x/4}$.

Answer to Exercise 9. No, because all increments of $x$ are equal to 3, but the ratio $\frac{12}{8} = 1.5$ is different from $\frac{28}{18} = 1.555\ldots$.

Answer to Exercise 10. $r = 2^{1/37} - 1 = 0.01891$ or about 1.89%. For more about this example, see youtu.be/9_VJ2PvZBuo.

Answer to Exercise 11. There are two ways to answer this question. First, we showed in the previous example that the top asymptote in the figure is $y = a$, so the function takes values between 0 and $a$. If we want the function to be equal to 0.5 at some point $t$, we must have $a > 0.5$. Another way to solve this is to set $a e^{-b e^{-ct}} = 0.5$ and start solving for $t$:
$$e^{-b e^{-ct}} = \frac{1}{2a} \implies -b e^{-ct} = \ln\frac{1}{2a} = -\ln(2a) \implies e^{-ct} = \frac{\ln(2a)}{b}.$$
Before we take logarithms of both sides again, we must notice that the exponential $e^{-ct}$ on the left hand side is always positive, so the right hand side must also be positive. Since we agreed that $b > 0$, the numerator $\ln(2a)$ must be positive. This means that $2a$ should be bigger than 1 or, again, $a > 0.5$. So $a$ must be bigger than 0.5, otherwise such $t$ does not exist. If $a > 0.5$ then we can take logarithms of both sides of the last equation above to get that $t = -\frac{1}{c} \ln\frac{\ln(2a)}{b}$.

1.4 Logarithmic functions

Suppose we want to find a number $a$ such that $10^a$ is equal to 4. In other words, we want to solve $10^a = 4$ for $a$. Such a number $a$ is called $\log(4) = 0.602\ldots$
where the function $\log$ is called the logarithm base 10. More generally, because any positive number $b > 0$ can be an output of the exponential function $10^x$, we can also solve $10^a = b$ to find $a$ given any $b > 0$. Such $a$ is denoted $\log(b)$. To summarize:
$$b = 10^a \iff a = \log(b).$$
If base 10 in the exponent is replaced by Euler’s number $e = 2.71828\ldots$, then $\log$ is replaced by the natural logarithm $\ln(b)$,
$$b = e^a \iff a = \ln(b),$$
again assuming that $b > 0$.

Remark. One can similarly define a logarithm with any positive base, but we will only use $\log(x)$ and $\ln(x)$. In fact, most of the time we will use $\ln(x)$, and even $\log(x)$ might appear only occasionally. We mentioned in the previous section that the exponential function $e^x$ with the base $e$ is the most commonly used exponential function in Calculus because of its nice properties (that we will learn later on). As a consequence, the natural logarithm $\ln(x)$ is the most commonly used logarithmic function and, by default, ‘logarithm’ refers to $\ln(x)$.

Basic properties of logarithms. Let us take a look at the graph of $y = \ln(x)$ and discuss some of its properties after stating them first.

• The graph of $\ln(x)$ is the graph of $e^x$ flipped around the diagonal $y = x$.
• $\ln(x)$ is defined only for positive values $x > 0$.
• $\ln(x)$ approaches $-\infty$ when $x$ approaches 0 (vertical asymptote).
• $\ln(1) = 0$, i.e. the x-intercept is 1.
• $\ln(x)$ slowly approaches $+\infty$ as $x$ goes to infinity.

Let us explain each of these properties.

• The first property means that if $(a, b)$ is on the graph of $y = e^x$ then $(b, a)$ is on the graph of $y = \ln(x)$ (the two coordinates are flipped). This is true because, if $(a, b)$ is on the graph of $y = e^x$, this means that $b = e^a$, which means that $a = \ln(b)$, which means that $(b, a)$ is on the graph of $y = \ln(x)$.
• The second property is true because $e^a$ takes only positive values, so we can solve $e^a = x$ for $a$ only if $x$ is positive. This is a good time to mention the terminology of the domain and range of a function $y = f(x)$.
Domain means the set of all allowed inputs of a function, and range means the set of all possible outputs. For example, the domain of $e^x$ is the set of all real numbers $\mathbb{R} = (-\infty, \infty)$, while the range is the set of all positive numbers $(0, \infty)$. For $\ln(x)$, it is exactly the opposite: the domain is $(0, \infty)$ and the range is $\mathbb{R}$.
• The third property we can see from the graph, but we can also think of it this way. If $x > 0$ is small and $e^a = x$ then $a = \ln(x)$ must be a large negative number, because the exponential growth function $e^a$ takes small values $x$ when the input $a$ is approaching negative infinity.
• The fourth property is clear because $e^0 = 1$ implies that $0 = \ln(1)$.
• The last property has two parts: the first is that $\ln(x)$ never stops growing (so it does not have a horizontal asymptote!) and the second is that it grows slowly. First, why will $\ln(x)$ eventually be equal to any large number we want, for example, 100? That is because $\ln(x) = 100$ means that $x = e^{100}$ and, in this case, 'eventually' simply means $e^{100}$. This also illustrates how slowly the logarithm grows. Even though it will reach 100, we have to 'wait' until $e^{100}$, which is approximately equal to 26881171418161354484126255515800135873611118.

Example 1. Can the values in the following table correspond to an exponential or logarithmic function?

x:  −1   1   3   6   8     11
y:   0   2   4   5   5.5   5.9

Solution: The answer is no to both. It cannot be a logarithm because we cannot plug a negative number $x = -1$ into a logarithm (it is not in the domain). It cannot be an exponential because an exponential cannot take the value $y = 0$ (it is not in the range).

Exercise 1. What is the domain of $y = \ln(-x)$? What does its graph look like?

Algebraic properties of logarithms. From the definition above, one can derive several important algebraic properties of the logarithm:
$$\ln(ab) = \ln(a) + \ln(b), \quad \ln\frac{a}{b} = \ln(a) - \ln(b), \quad \ln(a^c) = c \cdot \ln(a), \quad \ln(e^x) = x, \quad e^{\ln x} = x.$$

Example 2. Find $x$ such that $4^{2x} = 7 \cdot 5^{-x/4}$.
Solution: Take the logarithm of both sides, $\ln(4^{2x}) = \ln(7 \cdot 5^{-x/4})$, and then apply the above rules:
$$2x\ln(4) = \ln(7) + \ln(5^{-x/4}) = \ln(7) - \frac{x}{4}\ln(5).$$
This is now a linear equation in $x$, so group all the terms with $x$ on one side:
$$\ln(7) = 2x\ln(4) + \frac{x}{4}\ln(5) = x\Bigl(2\ln(4) + \frac{\ln(5)}{4}\Bigr) = x\,\frac{8\ln(4) + \ln(5)}{4},$$
so $x = \frac{4\ln(7)}{8\ln(4) + \ln(5)} = 0.612895\ldots$.

Exercise 2. If the room temperature is 20°C and the temperature of a cup of coffee is 80°C at time $t = 0$ then the coffee will cool down according to the formula $T = 20 + 60 \cdot e^{-\kappa t}$ for some constant $\kappa$. At what time will the temperature reach 70°C? Your answer may depend on $\kappa$.

Example 3. Simplify $y = 3e^{-2\ln x}$.
Solution: We can write $3e^{-2\ln x} = 3(e^{\ln x})^{-2} = 3x^{-2} = \frac{3}{x^2}$. We can also take different steps: $3e^{-2\ln x} = 3e^{\ln(x^{-2})} = 3x^{-2} = \frac{3}{x^2}$.

Exercise 3. Simplify $y = 4e^{-3\ln(1/x)}$.

Example 4. Suppose that the temperature $T = T(t)$ of a cup of coffee is initially $T(0) = 90$°C and, if the room temperature is 20°C, it is decreasing according to the equation $-\ln(T - 20) = 0.1t + c$. Find $T(t)$.
Solution: First, we plug $T(0) = 90$ into the equation, $-\ln(90 - 20) = 0.1 \cdot 0 + c$, which gives that $c = -\ln(70)$. The equation becomes $-\ln(T - 20) = 0.1t - \ln(70)$ or $\ln(T - 20) = -0.1t + \ln(70)$. Exponentiating both sides we get that
$$T - 20 = e^{-0.1t + \ln(70)} = e^{-0.1t}e^{\ln(70)} = 70e^{-0.1t}.$$
Finally, $T = 20 + 70e^{-0.1t}$.

Exercise 4. Suppose that the number of people $N = N(t)$ that have heard a rumour spreading at a party is initially $N(0) = 1$ and, if there are 100 people attending the party, it is increasing according to the equation $\frac{1}{100}\ln\frac{N}{100 - N} = \frac{t}{50} + c$. Find $N(t)$.

Logarithmic scales. There are many quantities that are conventionally measured on logarithmic scales when the original (more physical) measurement can cover a wide range of values of very different orders of magnitude, from very small to very large. We will give three examples below.
The Richter magnitude $R$ of an earthquake is defined as
$$R = \log\frac{A}{A_0}$$
where $A$ is the maximum amplitude (measured in millimetres) recorded on a standard seismograph (the Wood-Anderson seismograph) at a distance of 100 km from the earthquake epicentre, and $A_0 = 0.001$ mm is the amplitude corresponding to the so-called 'standard earthquake'. The amplitude $A$ is one empirical parameter describing the strength of the earthquake. We can rewrite the above formula as
$$A = A_0 \cdot 10^R = 0.001 \cdot 10^R = 10^{R-3}.$$
This means that if the Richter magnitude $R$ increases by 1, the amplitude $A$ (or the strength of the earthquake) increases 10 times. For example, a magnitude 9 earthquake would correspond to $A = 10^6$ mm $= 1$ km, which means that in practice the measurements are not as simple as the definition suggests.

Example 5. How much stronger is an earthquake of magnitude 5.8 compared to an earthquake of magnitude 5.3? Is it $\frac{5.8}{5.3} \approx 1.094$ times stronger?
Solution: Because
$$\frac{A(5.8)}{A(5.3)} = \frac{A_0 \cdot 10^{5.8}}{A_0 \cdot 10^{5.3}} = 10^{0.5} = \sqrt{10},$$
the earthquake of magnitude 5.8 is $\sqrt{10} \approx 3.16$ times stronger than the earthquake of magnitude 5.3. It is not $\frac{5.8}{5.3} \approx 1.094$ times stronger, because the Richter magnitude measures strength on the logarithmic scale.

Exercise 5. How much stronger is an earthquake of magnitude 7.1 compared to an earthquake of magnitude 5?

In chemistry, the pH scale is used to measure the acidity or alkalinity of a solution in water according to the formula
$$\mathrm{pH} = \log\frac{1}{H^+} = -\log H^+$$
where $H^+$ is the number of moles of hydrogen ions per litre of solution.

In acoustics, the sound pressure level $L$ is measured in decibels (dB) according to the formula
$$L = 20 \cdot \log\frac{p}{p_0}$$
where $p$ is the sound pressure measured in pascals (Pa) and $p_0 = 20\ \mu\text{Pa} = 20 \cdot 10^{-6}$ Pa is the reference sound pressure considered as the threshold of human hearing (according to Wikipedia, roughly the sound of a mosquito flying 3 m away).

Example 6.
If your earphones can output 110 dB and your friend's earphones can output 100 dB, how much more damage can you do to your ears? Is it only 10% more?
Solution: If $p_1$ is the maximum sound pressure of your earphones and $p_2$ is the maximum sound pressure of your friend's earphones then the above formula gives
$$110 = 20 \cdot \log\frac{p_1}{p_0}, \qquad 100 = 20 \cdot \log\frac{p_2}{p_0}.$$
From here we can solve it in two ways. First, we can subtract the two equations and use properties of the logarithm,
$$110 - 100 = 20 \cdot \log\frac{p_1}{p_0} - 20 \cdot \log\frac{p_2}{p_0} = 20 \cdot \log\frac{p_1}{p_2},$$
which implies that $\log\frac{p_1}{p_2} = \frac{10}{20} = 0.5$, so $\frac{p_1}{p_2} = \sqrt{10} \approx 3.16$. This means that your earphones are 3.16 times as noisy in terms of sound pressure. Again, it is not just 10% more, because the dB measurement is on the logarithmic scale. Another way is first to solve the above pressure level equation for $p$,
$$p = p_0 \cdot 10^{L/20},$$
which gives that $p_1 = p_0 \cdot 10^{110/20}$ and $p_2 = p_0 \cdot 10^{100/20}$. Dividing the two equations, we again get $\frac{p_1}{p_2} = 10^{110/20 - 100/20} = \sqrt{10}$.

Exercise 6. If the sound pressure level of a jackhammer is 100 dB and that of a jet engine is 140 dB, how much louder is the jet engine in terms of sound pressure?

Answer to Exercise 1. Because we can only plug positive numbers into a logarithm, $-x$ must be positive, so $-x > 0$, or $x < 0$. The domain is all negative numbers, $(-\infty, 0)$. The graph will be the same as that of $\ln(x)$ flipped around the y-axis. It is always the case that the graph of $y = f(-x)$ is the graph of $y = f(x)$ flipped around the y-axis.

Answer to Exercise 2. Set $70 = 20 + 60 \cdot e^{-\kappa t}$, so $e^{-\kappa t} = \frac{5}{6}$, and taking logarithms, $-\kappa t = \ln\frac{5}{6} = -\ln\frac{6}{5}$ or $t = \frac{1}{\kappa}\ln\frac{6}{5} = \frac{\ln 1.2}{\kappa}$.

Answer to Exercise 3. $y = 4x^3$.

Answer to Exercise 4. Plugging in $t = 0$ and $N = 1$ we get that $c = \frac{1}{100}\ln\frac{1}{99}$. Then
$$\frac{1}{100}\ln\frac{N}{100 - N} = \frac{t}{50} + \frac{1}{100}\ln\frac{1}{99} \implies \ln\frac{N}{100 - N} = 2t + \ln\frac{1}{99}$$
$$\implies \frac{N}{100 - N} = \frac{1}{99}e^{2t} \implies 99N = 100e^{2t} - Ne^{2t}$$
$$\implies N(99 + e^{2t}) = 100e^{2t} \implies N = \frac{100e^{2t}}{99 + e^{2t}}.$$

Answer to Exercise 5. $10^{2.1} \approx 126$ times.
Answer to Exercise 6. $10^{140/20 - 100/20} = 100$.

1.5 Logarithmic scales

In the last section we have seen several examples of measurements on logarithmic scales, and in this section we will look at logarithmic scales from a different angle. Namely, we will use log-transformations to help us decide if some data points follow an exponential trend of the form $y = y_0 \cdot a^x$ or a power function trend of the form $y = c \cdot x^\kappa$.

Log scale. Let us begin with an exponential trend of the form $y = y_0 \cdot a^x$.

Example 1. Bacteria are grown in a petri dish and their surface area $A$ (cm²) is measured at various times $t$ (days).

t:  0.2   0.4   0.6   0.8   1.0   1.2   1.4   1.6    1.8    2.0
A:  2.5   2.6   3.9   5.3   5.9   7.0   9.4   12.1   14.4   17.6

The values are given in the above table. It is natural to assume that the growth is exponential, but it may be hard to tell just by looking at the graph of these points (shown in blue in the figure below). Before we discuss how a log-transformation can be used to see the exponential trend more clearly, let us first mention how to plot the exponential trend line (shown in red in the figure) in Google Sheets.
• Enter x-values and y-values into two columns, and select those columns.
• Go to Insert > Chart.
• Under Chart Type select Scatter.
• Under Customize select Series.
• Scroll and check Trendline.
• Under Type select Exponential.
• Under Label choose Use Equation.

The exponential trend line $A = 1.98e^{1.1t}$ looks like a good fit, but how could we expect this just by looking at the data points? One way is to transform the dependent variable, in this case the area $A$, into $\log(A)$ or $\ln(A)$. That is because if $A = A_0 \cdot a^t$ then, taking logarithms,
$$A = A_0 \cdot a^t \implies \log(A) = \log(A_0) + (\log a) \cdot t.$$
For example, if $A = 3.5 \cdot 2^t$ then $\log(A) = \log(3.5) + \log(2) \cdot t = 0.544 + 0.301 \cdot t$. This means that $\log(A)$ is a linear function of $t$ and, of course, it is easier to see visually if a function is linear.
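As a rough numerical check of this idea, the following Python sketch fits a straight line to $\ln(A)$ versus $t$ for the data of Example 1 by ordinary least squares and converts the result back to the form $A = A_0 e^{\kappa t}$. The helper name `fit_line` is made up for illustration, and fitting a least-squares line to the logarithms is an assumption about how such a trend line can be computed, not a description of what Google Sheets does internally. The result should be close to, but not necessarily identical to, the trend line quoted above.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line ys ~ intercept + slope * xs (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Data from Example 1: time t (days) and surface area A (cm^2).
t = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
A = [2.5, 2.6, 3.9, 5.3, 5.9, 7.0, 9.4, 12.1, 14.4, 17.6]

# Transform only the dependent variable, then fit a line to (t, ln A).
ln_A = [math.log(a) for a in A]
b, kappa = fit_line(t, ln_A)

# ln(A) = b + kappa * t is the same as A = e^b * e^(kappa * t).
A0 = math.exp(b)
print(f"A is approximately {A0:.2f} * exp({kappa:.2f} * t)")
```

Using the natural logarithm here makes the slope of the fitted line equal to the continuous growth rate directly; with base 10 the slope would have to be multiplied by $\ln(10)$.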
For example, let us add $\log(A)$ values to the above table:

t:       0.2    0.4    0.6    0.8    1.0    1.2    1.4    1.6    1.8    2.0
A:       2.5    2.6    3.9    5.3    5.9    7.0    9.4    12.1   14.4   17.6
log(A):  0.39   0.42   0.59   0.73   0.77   0.84   0.97   1.08   1.16   1.24

If we plot the values $\log(A)$ in the last row vs. time $t$ (shown in blue in the figure), we can see with the naked eye that these points follow a linear trend. By the way, in Google Sheets this figure can be produced from the above figure by a simple extra step:
• Under Customize select Vertical axis and select option Log scale.

Remark. If you look at the y-axis in this graph, you might notice something strange: the increment between 4 and 2 looks bigger than the increment between 6 and 4, which looks bigger than the increment between 8 and 6, etc. This is because the values 2, 4, 6, 8, etc. marked on the y-axis are the values of the original area $A$, while the actual values plotted on the graph are $\log(2)$, $\log(4)$, $\log(6)$, $\log(8)$, etc. This is just a convention with log-scale plots that allows us to see a possible linear trend of $\log(A)$ vs. $t$, while at the same time seeing the original values of $A$.

Exercise 1. Bacteria are grown in a petri dish and their surface area $A$ (cm²) is measured at various times $t$ (days):

t:  0.2   0.4   0.6   0.8   1.0   1.2   1.4   1.6    1.8    2.0
A:  1.9   2.2   3.3   4.2   4.9   7.8   9.4   11.7   15.8   20.4

Plot the data on the log-scale and fit the exponential trend line.

To summarize the above discussion, if on the log-scale $A$ is a linear function of $t$ (or approximately follows a linear trend) then $A$ is an exponential function of $t$ (or approximately follows an exponential trend). Using formulas:
$$\log(A) = b + m \cdot t \implies A = 10^{b + m \cdot t} = 10^b \cdot (10^m)^t.$$
Although base 10 is commonly used in log-scale plots, the same would hold if we used base $e$ and the natural logarithm:
$$\ln(A) = b + m \cdot t \implies A = e^{b + m \cdot t} = e^b \cdot e^{mt}.$$
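The two conversion formulas above can be sketched as one small Python helper. The function name `exponential_from_log_line` and the sample lines are made up for illustration; the only assumption is that the line on the log-scale is given by its intercept $b$, its slope $m$ and the base of the logarithm.

```python
import math

def exponential_from_log_line(b, m, base=10.0):
    """Turn log_base(y) = b + m*x into y = y0 * a**x (hypothetical helper).

    Returns (y0, a, rate), where rate = ln(a) is the continuous
    growth rate (it is negative in the case of decay).
    """
    y0 = base ** b              # initial value, 10^b or e^b
    a = base ** m               # growth factor per unit of x
    rate = m * math.log(base)   # ln(base^m) = m * ln(base)
    return y0, a, rate

# Illustrative line on the base-10 log-scale: log(y) = 2 + 0.5*x.
y0, a, rate = exponential_from_log_line(2.0, 0.5, base=10.0)

# Illustrative line on the natural-log scale: ln(y) = 1 - 2*x.
y0_e, a_e, rate_e = exponential_from_log_line(1.0, -2.0, base=math.e)

print(y0, a, rate)
print(y0_e, a_e, rate_e)
```

With base $e$ the continuous rate is simply the slope $m$, which is one reason the natural logarithm is convenient here.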
Notice that in both cases if the slope on the log-scale is positive, $m > 0$, then we have exponential growth on the original scale and, if the slope on the log-scale is negative, $m < 0$, then we have exponential decay on the original scale.

Example 2. Find $y$ as a function of $x$ if (a) $\log(y) = 2 + \frac{x}{2}$, (b) $\ln(y) = 1 - 2x$. What is the continuous growth/decay rate in both cases?
Solution: (a) $y = 10^{2 + x/2} = 10^2 \cdot 10^{x/2} = 100 \cdot (10^{1/2})^x$. This is an exponential growth function with the base $10^{1/2} \approx 3.1622$ and the continuous growth rate $\ln(10^{1/2}) = \frac{1}{2}\ln(10) \approx 1.151$. (b) $y = e^{1 - 2x} = e \cdot e^{-2x}$. This is an exponential decay function with the continuous decay rate 2.

Exercise 2. Find $y$ as a function of $x$ if on the log-scale it is given by $-1 - \frac{2x}{3}$. What is the continuous growth/decay rate?

Log-log scale. Next we will discuss how to use logarithmic scales to observe a trend given by a power function of the form $y = c \cdot x^\kappa$. In this case we will be using the log-log scale, because both the input variable and the output variable will be transformed by the logarithm. There are many examples of a power law relationship between variables: see, for example, wikipedia.org/wiki/Power_law. We will start by explaining the idea on a simple example.

Example 3. Consider the following data points $(x, y)$ shown in blue in the figure below:

x:  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
y:  0.003   0.024   0.070   0.196   0.347   0.607   0.934   1.656   2.178   2.941

We can see that the power function $y = 2.9 \cdot x^{3.03}$ (shown in red) is a good fit, and it can be found in the same way as above, only selecting Power Series instead of Exponential under the trend line type.
• Enter x-values and y-values into two columns, and select those columns.
• Go to Insert > Chart.
• Under Chart Type select Scatter.
• Under Customize select Series.
• Scroll and check Trendline.
• Under Type select Power Series.
• Under Label choose Use Equation.

How can we guess that a power function might be a good fit?
This can be done by transforming both variables $x$ and $y$ into $\log(x)$ and $\log(y)$. The reason is that if $y = c \cdot x^\kappa$ then, taking logarithms,
$$y = c \cdot x^\kappa \implies \log(y) = \log(c) + \kappa \cdot \log(x).$$
For example, if $y = 3.5 \cdot x^2$ then $\log(y) = \log(3.5) + 2 \cdot \log(x) = 0.544 + 2 \cdot \log(x)$. This means that $\log(y)$ is a linear function of $\log(x)$ and, again, it should be easier to see visually if a function is linear. For example, let us replace all the values in the above table by their logarithms:

log(x):  -1.00   -0.70   -0.52   -0.40   -0.30   -0.22   -0.15   -0.10   -0.05   0.00
log(y):  -2.46   -1.68   -1.06   -0.75   -0.42   -0.23    0.02    0.20    0.34   0.46

If we plot the values $\log(y)$ vs. $\log(x)$ (shown in blue in the figure), we can see with the naked eye that these points follow a linear trend (shown in red). To produce this figure in Google Sheets:
• Select Log scale option under Vertical axis.
• Select Log scale option under Horizontal axis.

Again, notice that the labels on the two axes are from the first $(x, y)$ table above, although the actual plot uses $\log(x)$ and $\log(y)$.

Exercise 3. Consider the following data points $(x, y)$ shown in blue in the figure below:

x:  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
y:  0.003   0.021   0.060   0.134   0.277   0.488   0.718   1.410   1.851   2.541

Plot the data on the log-log-scale and fit the power trend line.

To summarize the above discussion, if on the log-log-scale $y$ is a linear function of $x$ (or approximately follows a linear trend) then $y$ is a power function of $x$ (or approximately follows a power trend). Using formulas:
$$\log(y) = b + \kappa \cdot \log(x) \implies y = 10^{b + \kappa \cdot \log(x)} = 10^b \cdot x^\kappa.$$
Although base 10 is commonly used in log-scale plots, the same would hold if we used base $e$ and the natural logarithm:
$$\ln(y) = b + \kappa \cdot \ln(x) \implies y = e^{b + \kappa \cdot \ln(x)} = e^b \cdot x^\kappa.$$
Notice that in both cases the slope $\kappa$ of the line on the log-log-scale is the power in $c \cdot x^\kappa$ on the original scale.
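The same summary can be tried out numerically. The Python sketch below takes the $(x, y)$ data of Example 3, fits a least-squares line to $\log(y)$ versus $\log(x)$, and reads off the power $\kappa$ as the slope and the constant $c$ from the intercept. The helper name `fit_power_law` is hypothetical, and the least-squares fit on the log-log scale is an assumed way of producing a power trend line; the output should be close to, but need not match exactly, the trend line quoted in Example 3.

```python
import math

def fit_power_law(xs, ys):
    """Fit y ~ c * x**kappa via a least-squares line on the log-log scale.

    Hypothetical helper: regress log10(y) on log10(x); the slope is kappa
    and the intercept is log10(c).
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    kappa = sxy / sxx
    c = 10 ** (mean_ly - kappa * mean_lx)
    return c, kappa

# Data from Example 3.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
y = [0.003, 0.024, 0.070, 0.196, 0.347, 0.607, 0.934, 1.656, 2.178, 2.941]

c, kappa = fit_power_law(x, y)
print(f"y is approximately {c:.2f} * x^{kappa:.2f}")
```

Both variables must be positive for this to work, since the logarithm is only defined on $(0, \infty)$.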
We can see in the figure that the behaviour of such power functions is quite different depending on whether $\kappa < 0$, $0 < \kappa < 1$, or $\kappa > 1$. As a result, the slope of a linear trend on the log-log scale tells us about the shape of the function on the original scale.

Example 4. Find $y$ as a function of $x$ if (a) $\log(y) = 2 + \frac{1}{2}\log(x)$, (b) $\ln(y) = 1 - 2\ln(x)$.
Solution: (a) $y = 10^{2 + \frac{1}{2}\log(x)} = 10^2 x^{1/2} = 100 \cdot \sqrt{x}$. (b) $y = e^{1 - 2\ln(x)} = e \cdot x^{-2} = \frac{e}{x^2}$.

Exercise 4. Find $y$ as a function of $x$ if on the log-log-scale their relationship is linear with y-intercept $-1$ and slope $-\frac{2}{3}$.

Exercise 5. Match the lines 1, 2, 3 on the log-log-scale to the power functions A, B, C on the original scale.

Finally, let us take a look at one example of an application of the log-log-scale in biology. Here is an example from the paper¹. The figure shows log-log-plots of body mass $M$ (in kg) vs. body length $L$ (in m) of 66 plant species (white dots) and 67 animal species (black dots). This data shows that the mass of animals follows the power trend of the form $M = c \cdot L^{2.81}$ and the mass of plants follows the power trend of the form $M = c \cdot L^{2.95}$, so the slopes of the two lines in the figure are 2.81 and 2.95. Since mass $M$ is proportional to volume $V$, which should be roughly proportional to $L^3$, it seems natural that the law should be $M = c \cdot L^3$; indeed, both exponents 2.81 and 2.95 are close to 3. However, things are not so simple. The same paper mentions that the law for 20 year old human males is closer to $M = c \cdot L^5$ and another paper² shows that within the same species of plants the law is closer to $M = c \cdot L^4$.

¹ "The scaling of plant and animal body mass, length, and diameter" by K.J. Niklas.
² "Invariant scaling relationships for interspecific plant biomass production rates and body size" by K.J. Niklas and B.J. Enquist.

Exercise 6.
The figure below is from a New York Times article about bear markets.³ On the figure it says that "the vertical scale is adjusted so that percentage changes are comparable". What is the vertical scale, and why does it make percentage changes comparable?

³ https://www.nytimes.com/2022/06/13/business/bear-market-timeline-stocks.html

Summary of log-scales. To summarize why log-scales are so useful, let us compare the Dow Jones index on the original scale and on the logarithmic scale.
• Many quantities take values on vastly different scales. For example, for several decades the Dow Jones was below or around 100, while in the most recent two decades it was around or above 10000. In addition to examples from the last section (strength of earthquakes, acidity or alkalinity of a solution in water, sound pressure), other examples include mass and size of plants, or luminosities of stars⁴. In the first figure above we can see that, if we plot values of different magnitude on the same graph, we can barely distinguish the values that are relatively small. For example, we can barely see what is going on with the Dow Jones between 1900 and 1980. The logarithm grows slowly, so it can make very small and very large values comparable to each other.
• As we discussed in this section, the log-scale transforms an exponential function into a linear one. If we observe a linear trend on the log-scale, it suggests an exponential trend on the original scale. Although the Dow Jones index has large fluctuations, we can see that in the long term it follows a roughly linear trend on the log-scale.
• As we saw in the last exercise, the log-scale makes relative changes comparable. For example, when we look at the Dow Jones on the original scale, we might think that the Financial crisis of 2007–2008 was the worst stock market crash, while on the log-scale we can clearly see that the Wall Street Crash of 1929 was much worse.
• The same considerations apply to the log-log-scale, except that it turns a power trend into a linear one.

Answer to Exercise 1. The exponential trend line is $A = 1.44e^{1.33t}$.

Answer to Exercise 2. $y = 10^{-1 - 2x/3} = 10^{-1} \cdot 10^{-2x/3} = 0.1 \cdot (10^{-2/3})^x$. This is an exponential decay function with the base $10^{-2/3} \approx 0.2154$ and the continuous decay rate $-\ln(10^{-2/3}) = \frac{2}{3}\ln(10) \approx 1.535$.

Answer to Exercise 3. The power function trend line is $y = 2.33 \cdot x^{2.96}$.

Answer to Exercise 4. $y = 10^{-1 - \frac{2}{3}\log(x)} = 10^{-1}x^{-2/3} = \frac{0.1}{x^{2/3}}$.

Answer to Exercise 5. A3, B1, C2.

Answer to Exercise 6. This is a log-scale. In other words, the y-axis reflects the logarithm of the S&P 500 price $P(t)$. If the stock market changed by $100 \cdot r\%$ between time $t$ and $t + \Delta t$ then the price $P(t + \Delta t) = P(t) \cdot (1 + r)$. After taking logarithms, we get $\log P(t + \Delta t) - \log P(t) = \log(1 + r)$, and since the graph shows $y(t) = \log P(t)$, this means that $y(t + \Delta t) - y(t) = \log(1 + r)$. In other words, the same increment on the log-scale corresponds to the same percentage change. That is why the log-scale makes percentage changes comparable no matter what the actual value is.

⁴ https://en.wikipedia.org/wiki/Hertzsprung-Russell_diagram

1.6 Trigonometric functions

Basic transformations of functions. First, let us discuss what happens to the graph of a function $y = f(x)$ when we multiply and add a constant to its input and output:
$$y = b + a f\bigl(c(x - d)\bigr)$$
for some constants $a$, $b$, $c$ and $d$. Let us break this down into steps. At the same time, you can visualize by dragging sliders in this app: www.geogebra.org/m/csuqnyhc.
• Vertical stretching and flipping: $y = \pm a \cdot f(x)$. Multiplying the output $f(x)$ of a function by a constant $a$ stretches the graph by a factor of $a$ in the vertical direction, since the output is depicted on the y-axis. For example, the graph of $y = 2f(x)$ is stretched two times vertically, and the graph of $y = 0.5f(x)$ is stretched 0.5 times (or shrunk 2 times) vertically.
If we multiply by $-a$, the graph is also flipped upside down.
• Horizontal shrinking and flipping: $y = f(\pm c \cdot x)$. Multiplying the input $x$ of a function by a constant $c$ shrinks the graph $c$ times in the horizontal direction. For example, the graph of $y = f(2x)$ is shrunk 2 times horizontally, and the graph of $y = f(0.5x)$ is shrunk 0.5 times (or stretched 2 times) horizontally. This might sound a bit counterintuitive, but it is true. For example, if $c = 2$ then a point $(2x, f(2x))$ on the original graph of $y = f(x)$ becomes $(x, f(2x))$ on the graph of $y = f(2x)$, so the first coordinate $2x$ becomes $x$, so it is shrunk by a factor of 2. If we multiply by $-c$, the graph is also flipped horizontally around the y-axis.
• Shifting up or down: $y = f(x) \pm b$. Adding $b$ to or subtracting $b$ from $f(x)$ simply shifts the graph up or down by $b$.
• Shifting right or left: $y = f(x \pm d)$. Subtracting $d$ from the input $x$ shifts the graph to the right by $d$. Again, this might sound counterintuitive, but a point $(x - d, f(x - d))$ becomes $(x, f(x - d))$ so $x - d$ is shifted to the right by $d$. Adding $d$ shifts the graph to the left by $d$.

Example 1. If we stretch the graph of $y = f(x)$ both horizontally and vertically 2 times, shift it up by 1 and shift it to the left by 2, what function will this graph correspond to?
Solution: $y = 2f(x)$ stretches vertically, then $y = 2f(\frac{x}{2})$ stretches horizontally (shrinking by a factor of $\frac{1}{2}$ is stretching by a factor of 2), then $y = 1 + 2f(\frac{x}{2})$ shifts up by 1, and finally $y = 1 + 2f(\frac{x + 2}{2})$ shifts to the left by 2.

Exercise 1. If we shrink the graph of $y = f(x)$ vertically 2 times and shrink it horizontally 3 times, shift it down by 1 and shift it to the right by 7, what function will this graph correspond to?

Example 2. If we shift the graph of $y = f(x)$ up by 1 and to the left by 2, and then stretch it both horizontally and vertically 2 times, what function will this graph correspond to?
Solution: $y = 1 + f(x)$ shifts up by 1, then $y = 1 + f(x + 2)$ shifts to the left by 2, then $y = 2(1 + f(x + 2))$ stretches vertically 2 times and, finally, $y = 2(1 + f(\frac{x}{2} + 2))$ stretches horizontally 2 times (shrinking by a factor of $\frac{1}{2}$ is stretching by a factor of 2). Notice: compared to Example 1 we simply changed the order, but we got a very different answer.

Exercise 2. If we shift the graph of $y = f(x)$ down by 1 and to the right by 7, and then stretch it vertically 2 times and shrink it horizontally 3 times, what function will this graph correspond to?

Example 3. If the graph of $y = f(x)$ is given by the solid green curve, what is the function whose graph is given by the dashed blue curve? Hint: Use the grey dotted curve as a guideline.
Solution: The dashed curve looks like the dotted curve shifted by 2 to the right, and the dotted curve looks like the solid curve shrunk vertically by a factor of 2. So we shrink first, $y = 0.5f(x)$, and then shift to the right, $y = 0.5f(x - 2)$.

Exercise 3. If the graph of $y = f(x)$ is given by the solid green curve, what is the function whose graph is given by the dashed blue curve? Hint: Use the grey dotted curve as a guideline.

Sine and cosine. Let us recall the graphs of the functions $\sin(x)$ and $\cos(x)$, and recall some of their basic properties.
• Both fluctuate between $-1$ and $+1$. We say that their amplitude is equal to 1.
• Both are periodic functions with the period $2\pi$. This means that $\sin(x + 2\pi) = \sin(x)$ and $\cos(x + 2\pi) = \cos(x)$, and the graphs of these functions repeat the same pattern every $2\pi$.
• Many natural phenomena are (approximately) periodic, for example, daylight hours, average monthly temperatures, ocean tides, circadian rhythms, heart beat, etc. Cosine and sine can be used to describe (or model) some of them.
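Taking $f = \cos$ from the list above as a concrete function, the point made in Examples 1 and 2 (that the order of transformations matters) can be checked numerically. The Python sketch below is only an illustration: the helper names `stretch_vertically`, `stretch_horizontally`, `shift_up` and `shift_left` are made up, and each one simply returns a new function built from the given one.

```python
import math

# Each helper takes a function f and returns the transformed function.
def stretch_vertically(f, k):
    return lambda x: k * f(x)

def stretch_horizontally(f, k):
    # Stretching k times horizontally means replacing x by x / k.
    return lambda x: f(x / k)

def shift_up(f, b):
    return lambda x: f(x) + b

def shift_left(f, d):
    # Shifting left by d means replacing x by x + d.
    return lambda x: f(x + d)

f = math.cos

# Order of Example 1: stretch (vertically, horizontally), then shift (up, left).
g1 = shift_left(shift_up(stretch_horizontally(stretch_vertically(f, 2), 2), 1), 2)

# Order of Example 2: shift (up, left), then stretch (vertically, horizontally).
g2 = stretch_horizontally(stretch_vertically(shift_left(shift_up(f, 1), 2), 2), 2)

for x in [0.0, 1.0, 2.5]:
    # Compare with the closed forms found in the two solutions.
    assert math.isclose(g1(x), 1 + 2 * f((x + 2) / 2))
    assert math.isclose(g2(x), 2 * (1 + f(x / 2 + 2)))
    print(x, g1(x), g2(x))
```

The two printed columns differ, which confirms that applying the same four transformations in a different order produces a different function.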
Just like sine and cosine, a function $y = f(x)$ is said to be periodic with the period $T$ if $f(x + T) = f(x)$ for all $x$, in which case the graph of this function repeats every $T$ (units of $x$). Compared with the above general transformation $y = b + a f\bigl(c(x - d)\bigr)$, when dealing with $\sin(x)$ and $\cos(x)$ we will write this transformation as
$$y = b + a \cdot \cos\Bigl(\frac{2\pi}{T}(x - d)\Bigr)$$
by replacing the constant $c$ with $\frac{2\pi}{T}$, for some constant $T > 0$ (and, similarly, for $\sin(x)$). This means that we shrink the graph of $\cos(x)$ horizontally by a factor $\frac{2\pi}{T}$, which is the same as stretching it by a factor $\frac{T}{2\pi}$, which means that the period $2\pi$ becomes $T$. In other words, we express the horizontal stretch factor in terms of the period $T$, which has an important practical meaning and which is often easier to see visually. Let us look at the graph of this function.
• Vertical stretch factor $a$ is called amplitude.
• Vertical shift $b$ is also called average, since the function fluctuates around this level.
• Horizontal shift $d$ is also called phase shift.
• $T$ is the period.
• $\frac{2\pi}{T}$ is called angular frequency and $\frac{1}{T}$ is called frequency.

Example 4. What is the amplitude, average, phase shift, period, and frequency of $y = 2(0.5 + 1.5\cos(3x + 1))$?
Solution: If we rewrite it as $y = 1 + 3\cos\bigl(3(x + \frac{1}{3})\bigr)$, we see that the amplitude is 3, the average is 1, and the phase shift is $-\frac{1}{3}$ (minus because $+\frac{1}{3}$ means that we shift to the left). Since $\frac{2\pi}{T} = 3$, the period is $T = \frac{2\pi}{3}$, and the frequency is $\frac{1}{T} = \frac{3}{2\pi}$.

Exercise 4. What is the amplitude, average, phase shift, period, and frequency of $y = -3 + 4\cos\bigl(\frac{x - 2}{3}\bigr)$?

Example 5. What is the function in the figure on the left?
Solution: We see that the minimum is 0.2 and the maximum is 2.2, so the average is 1.2. Then the amplitude is the difference between the maximum and the average, $2.2 - 1.2 = 1$. The function looks like cosine shifted to the right by 0.4, so the phase shift is 0.4.
We can also see that the period is 1, for example by looking at the difference between consecutive peaks, $1.4 - 0.4 = 1$. Therefore, the function is $y = 1.2 + \cos(2\pi(x - 0.4))$.

Exercise 5. What is the function in the figure above on the right?

Daylight hours in Toronto. The length of daylight in Toronto can be approximated by the function
$$12.175 - 3.255\cos\Bigl(\frac{2\pi}{365}(t + 10)\Bigr)$$
where $t \le 365$ is measured in days. See: youtu.be/0ht56fyCIFQ?t=186.

Varying average and amplitude. The figure above⁵ depicts the monthly number of sunspots over 400 years of sunspot observations, which follow an approximately 11 year solar cycle. We can think of $T = 11$ as the period of this cycle, but the amplitude and average are varying over time. Let us look at a couple of simple examples of functions of this form:
$$y = b(x) + a(x) \cdot \cos\Bigl(\frac{2\pi}{T}(x - d)\Bigr)$$
where $b = b(x)$ is the average and $a = a(x)$ is the amplitude, both varying with $x$.

⁵ Robert A. Rohde, commons.wikimedia.org/wiki/File:Sunspot_Numbers.png

Example 6. Write down a possible model for the function below on the left.
Solution: The distance between consecutive peaks appears to be about 10, so we can take the period of one cycle to be $T = 10$. The curve is bounded above by the line that passes through $(0, 0)$ and $(90, 60)$, so its slope is $\frac{60}{90} = \frac{2}{3}$ and the line is $y = \frac{2x}{3}$. The curve is bounded below by the x-axis $y = 0$. That means that the average (line through the middle) is $b(x) = \frac{x}{3}$, and the amplitude is $a(x) = \frac{2x}{3} - \frac{x}{3} = \frac{x}{3}$. As a result, we can guess that the curve is $y = \frac{x}{3} + \frac{x}{3}\cos\bigl(\frac{2\pi}{10}x\bigr)$. It appears that the first peak is shifted from zero, so there might be a phase shift, but this is just an artifact of scaling cosine by a varying amplitude which is 0 at $x = 0$. The peaks away from zero appear to be near 10, 20, 30, etc., so there is no phase shift.

Exercise 6. Write down a possible model for the function above on the right.

Example 7.
The diameter of a Ferris Wheel is 20 meters, and at the lowest point it is 2 meters above the ground. It takes the Ferris Wheel 3 minutes to complete one revolution. What is the height $y = h(t)$ of a rider starting at time $t = 0$ at the lowest point?
Solution: Let us first find the coordinates $(x, y)$ of the rider in terms of the angle $\theta$ in the figure. Since the radius of the Ferris wheel is 10 m, the vertical side of the right triangle is $10\cos(\theta)$ and the horizontal side is $10\sin(\theta)$. This means that $x = 10\sin(\theta)$ and, because the center of the Ferris wheel is at height 12, the height $y$ is $12 - 10\cos(\theta)$. Now, what is the angle $\theta$ as a function of time, $\theta(t)$? Since the wheel is revolving at constant angular speed, $\theta(0) = 0$ and $\theta(3) = 2\pi$ (full revolution), we see that $\theta(t) = \frac{2\pi}{3}t$. Finally, we get that the height is $y = 12 - 10\cos\bigl(\frac{2\pi}{3}t\bigr)$.

Exercise 7. The diameter of a Ferris Wheel is 30 meters, and at the lowest point it is 5 meters above the ground. It takes the Ferris Wheel 2 minutes to complete one revolution. What is the height $y = h(t)$ of a rider starting at time $t = 0$ at the highest point?

Answer to Exercise 1. $y = -1 + \frac{1}{2}f(3(x - 7))$.

Answer to Exercise 2. $y = 2(f(3x - 7) - 1)$.

Answer to Exercise 3. $y = 0.5f(x + 1)$.

Answer to Exercise 4. Amplitude is 4, average is $-3$, phase shift is 2, period is $T = 6\pi$, and frequency is $\frac{1}{6\pi}$.

Answer to Exercise 5. $y = -2 + 5\cos\bigl(\frac{\pi}{3}(x - 3)\bigr)$. We can also view it as cosine flipped upside down: $y = -2 - 5\cos\bigl(\frac{\pi}{3}x\bigr)$.

Answer to Exercise 6. $y = \bigl(15 - \frac{x}{4}\bigr) + \bigl(15 - \frac{x}{4}\bigr)\cos\bigl(\frac{2\pi}{15}x\bigr)$.

Answer to Exercise 7. $y = 20 - 15\cos(\pi(1 + t))$.

1.7 Polynomials and rational functions

Polynomials. Polynomials are functions of the type
$$y = a_0 + a_1x + a_2x^2 + \ldots + a_{n-1}x^{n-1} + a_nx^n$$
where the constants $a_0, a_1, a_2, \ldots, a_n$ are called the coefficients of this polynomial. In other words, a polynomial is a sum of power functions $cx^\kappa$ with integer powers $\kappa \ge 0$.
The largest power $n$ is called the degree of a polynomial. For example:
• $1 - 2x + 2x^2$ is a polynomial of degree 2, also called a quadratic polynomial;
• $-3x + x^3$ is a polynomial of degree 3, also called a cubic polynomial.

Motivation. Polynomials play a fundamental role in Calculus. On the one hand, they arise naturally in applications (perhaps the most famous example is the height of a projectile changing over time due to gravity, such as an apple falling from a tree). On the other hand, they also serve as a very useful tool, because of the combination of the following two factors:
1. Polynomials are easy to work with, e.g. easy to differentiate and integrate.
2. Other functions can be approximated by polynomials.
This means that we can often replace a more complicated function by a polynomial if it helps us solve the problem. For example, here is a figure showing how $y = \cos(x)$ can be approximated near $x = 0$ by polynomials $P_n(x)$ of degree $n = 2, 4, 6$ and 8. We can see that the approximations get better on wider and wider intervals as the degree $n$ increases. This is a preview of the so-called Taylor polynomials that will be studied later on and that have numerous applications. The degree 8 polynomial is given by $1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!}$.

Here is another example. In Section 1.5, Example 1, we considered a table of data points and, using Google Sheets, fit an exponential trend curve $y = 1.98e^{1.1x}$. In the figure on the left, instead of Exponential, we now select a Polynomial trend curve, which turns out to be a quadratic polynomial $y = 2.5 - 0.525x + 4.01x^2$. We see that it fits the data very well and, as a result, this quadratic polynomial could also be used to describe the data if it is easier to work with or, especially, if this polynomial and its coefficients have some physical meaning.

Basic properties. Let us review some basic properties of polynomials.
• The graph of a polynomial of degree n can change direction (increasing or decreasing) at most n times. Here we count each increasing or decreasing stretch of the graph as one direction, so a polynomial of degree n has at most n such stretches (equivalently, at most n − 1 turning points). For example, the polynomial of degree 4 in the figure is decreasing, then increasing, then decreasing, then increasing again. All the examples in the figure change direction exactly n times, but it could be fewer than n times.
• A polynomial p(x) of degree n can have at most n roots, i.e. points x such that p(x) = 0 (where the graph meets the x-axis). In the figure above, the polynomials of degrees 1, 3 and 4 have 1, 3 and 4 roots correspondingly, but the polynomial of degree 2 has no roots.
• When x goes to +∞ or −∞ (when x takes very large positive or negative values), a polynomial p(x) of degree n ≥ 1 also goes to +∞ or −∞, and the sign can be determined by looking at the term $a_n x^n$ with the largest degree n, because this term dominates all the other terms for large x in the sense that it grows faster. For example, if $p(x) = 1 - 2x + x^2 - 3x^3$ then p(x) goes to +∞ when x goes to −∞, because the term $-3x^3$ becomes positive, e.g. $-3(-10)^3 = 3 \cdot 10^3 > 0$.

Example 1. For the polynomials (a) and (b) in the figure, determine the smallest possible degree n, whether n is even or odd, and the sign of the coefficient $a_n$.

Solution: (a) The graph changes direction 5 times, so the smallest possible degree is n = 5. Degree n must be odd, because p(x) goes to +∞ when x goes to −∞, and to −∞ when x goes to +∞. Coefficient $a_n$ must be negative, because $a_n x^n$ dominates and becomes negative when x goes to +∞.
(b) The graph changes direction 2 times, so the smallest possible degree is n = 2. Degree n must be even, because p(x) goes to −∞ both when x goes to −∞ and when x goes to +∞. Coefficient $a_n$ must be negative, because $a_n x^n$ dominates and becomes negative when x goes to +∞. Of course, this graph looks like a familiar upside down parabola.

Exercise 1. For the polynomials (c) and (d) in the above figure, determine the smallest possible degree n, whether degree n is even or odd, and the sign of the coefficient $a_n$.
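The claim in the third property above, that the leading term dominates for large x, can be checked numerically. The following is only an illustrative sketch in Python; the helper names `p` and `leading_term` are introduced here for the illustration and are not part of the text. It compares $p(x) = 1 - 2x + x^2 - 3x^3$ with its leading term $-3x^3$ and prints their ratio, which should approach 1 as x becomes large in either direction.

```python
def p(x):
    """The polynomial p(x) = 1 - 2x + x^2 - 3x^3 used in the text."""
    return 1 - 2 * x + x**2 - 3 * x**3


def leading_term(x):
    """The leading term -3x^3 of p(x) (helper name is an assumption)."""
    return -3 * x**3


# Compare p(x) with its leading term for large negative and positive x.
for x in (-1000, -100, -10, 10, 100, 1000):
    ratio = p(x) / leading_term(x)
    print(f"x = {x:>6}  p(x) = {p(x):>12}  p(x) / (-3x^3) = {ratio:.5f}")
```

Since the ratio is close to 1 for large |x|, the sign of p(x) there agrees with the sign of $-3x^3$: positive when x goes to −∞ and negative when x goes to +∞.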
If we know that a polynomial p(x) of degree n has n roots $x_1, \ldots, x_n$ then we can write this polynomial as
\[ y = p(x) = c(x - x_1)(x - x_2) \cdots (x - x_n) \]
for some constant c. If we also know the value of p(x) at some point x other than the roots, we can use it to determine c.

Example 2. Determine the cubic polynomial (a) in the figure.

Solution: We can see from the graph that the roots are −1, 1 and 3, so p(x) = c(x + 1)(x − 1)(x − 3). We can also see from the graph that p(0) = 6, so 6 = c(0 + 1)(0 − 1)(0 − 3) = 3c, and c = 2, so p(x) = 2(x + 1)(x − 1)(x − 3). If we need to, we can multiply this out to get $p(x) = 6 - 2x - 6x^2 + 2x^3$.

Exercise 2. Determine the cubic polynomial (b) in the above figure.

Example 3. In the figure, the graphs of two polynomials are visible in a limited region. The dashed red line is a cubic polynomial with negative leading coefficient, $a_3 < 0$. How many zeros does this polynomial have, and what can we say about their location?

Solution: We can see two roots, $x_1 = -3$ and $x_2 = +3$. Since $a_3 < 0$ and the leading term $a_3 x^3$ dominates, the polynomial must go to −∞ when x goes to +∞, so the graph must start decreasing eventually. The graph can change direction at most 3 times, which means that it must start decreasing somewhere on the right of the observed region, and so there will be another root $x_3 > 7$.

Exercise 3. In the above figure, the solid blue line is a quartic polynomial (of degree 4) with negative leading coefficient, $a_4 < 0$. How many zeros does this polynomial have, and what can we say about their location?

Quadratic polynomials. Polynomials of degree 2, $y = ax^2 + bx + c$, appear frequently in Calculus, so let us recall their basic properties. Their graphs are given by parabolas, which open upwards when a > 0 and open downwards when a < 0. If the discriminant $D = b^2 - 4ac$ is nonnegative, D ≥ 0, then this polynomial has roots
\[ x_1 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}, \qquad x_2 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}. \]
If the discriminant is equal to zero, D = 0, then the roots are the same. If D < 0 then there are no roots and the parabola is entirely above or below the x-axis. The extreme point of the parabola (minimum or maximum) is called the vertex and its coordinates are $x = -\frac{b}{2a}$ and $y = c - \frac{b^2}{4a}$. Later we will learn how to find this extreme point by setting the derivative to zero (since the slope of the parabola is zero at that point), but one can also find it using simple algebra, by completing the square.

Example 4. Find the roots (if they exist) and vertex of $y = 2x^2 - 2x - 4$ by completing the square.

Solution: First factor out the leading coefficient, $y = 2(x^2 - x - 2)$. Next, to complete the square for $x^2 - x - 2$, we want to create something that looks like $(x \pm r)^2 = x^2 \pm 2rx + r^2$. In this case, we want −x to look like −2rx, so we must take r = 0.5. Then we add and subtract $r^2 = 0.5^2$ and rewrite
\[ x^2 - x - 2 = x^2 - 2 \cdot 0.5 \cdot x + 0.5^2 - 0.5^2 - 2 = (x - 0.5)^2 - 0.5^2 - 2 = (x - 0.5)^2 - 2.25. \]
This finishes completing the square: $y = 2(x - 0.5)^2 - 4.5$. The vertex is at the point (0.5, −4.5), since the parabola opens upwards and will have a minimum when the term $(x - 0.5)^2$ is as small as possible, i.e. when x = 0.5. Another way to see this is by noticing that $y = 2(x - 0.5)^2 - 4.5$ is obtained from $y = x^2$ by stretching it vertically by a factor of 2, then shifting to the right by 0.5 and down by 4.5. We can also find the roots once we have completed the square:
\[ (x - 0.5)^2 - 2.25 = 0 \;\Rightarrow\; (x - 0.5)^2 = 2.25 \;\Rightarrow\; x - 0.5 = \pm 1.5 \;\Rightarrow\; x = -1, 2. \]
Of course, we could have also used the formulas above.

Exercise 4. Find the roots (if they exist) and vertex of $y = 2x^2 + 4x - 6$ by completing the square.

Rational functions. Rational functions are functions of the type
\[ y = \frac{p(x)}{q(x)} \]
where both the numerator p(x) and denominator q(x) are polynomials. Such functions are undefined whenever q(x) = 0, so the roots of the denominator are not in the domain.
Also, these roots are often vertical asymptotes, for example, when the numerator p(x) ≠ 0 at the same point, so the function will approach −∞ or +∞ as the variable x approaches the root of q(x). Rational functions can sometimes have horizontal asymptotes as x approaches −∞ and +∞, and we will see in the examples how these asymptotes can be determined.

Example 5. Find all vertical and horizontal asymptotes of $y = \frac{2x^2 - 4x - 96}{x^2 - x - 30}$.

Solution: First, we rewrite the function as
\[ y = \frac{2(x^2 - 2x - 48)}{x^2 - x - 30} = \frac{2(x + 6)(x - 8)}{(x + 5)(x - 6)} \]
by finding the roots of the quadratic polynomials in the numerator and denominator. The denominator has roots −5 and 6, and the numerator is not zero at these points, so the function will approach −∞ or +∞ as x approaches −5 and 6, which means that x = −5 and x = 6 are vertical asymptotes. Of course, the easiest way to see how the function approaches these vertical asymptotes is to graph it in any graphing calculator. If we want to sketch it without a graphing calculator, we can check the sign of ±∞ by considering values of x near a root. For example, if x is slightly bigger than 6 then the function will be negative, because the signs of all the factors will be $\frac{2(+)(-)}{(+)(+)} < 0$, so the function approaches −∞ from the right side of x = 6. Similarly, if x is slightly smaller than 6 then the function will be positive, because the signs of all the factors will be $\frac{2(+)(-)}{(+)(-)} > 0$, so the function approaches +∞ from the left side of x = 6. We can check what happens near x = −5 similarly.

To find the horizontal asymptote at infinity (if it exists), heuristically we can remember that the polynomial $2x^2 - 4x - 96$ is dominated by the leading term $2x^2$ when x is large, and $x^2 - x - 30$ is dominated by $x^2$, so for large x their ratio behaves like $\frac{2x^2}{x^2} = 2$ and so the horizontal asymptote will be y = 2. To make this heuristic argument more precise, we take the leading term (here $x^2$) in the denominator and divide both the numerator and denominator by it:
\[ \frac{2x^2 - 4x - 96}{x^2 - x - 30} = \frac{\frac{2x^2 - 4x - 96}{x^2}}{\frac{x^2 - x - 30}{x^2}} = \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} \to \frac{2 - 0 - 0}{1 - 0 - 0} = 2 \]
as x goes to infinity, because the terms where we divide by x or $x^2$ all go to zero. Finally, the function crosses the x-axis at x = −6 and x = 8, which are the roots of the polynomial in the numerator.

Exercise 5. Find all vertical and horizontal asymptotes of $y = \frac{x^2 - 3x - 2}{2x^2 - 8}$. Sketch the graph.

The calculation of the horizontal asymptote in the above example can be used to show the following.
• If the degrees of p(x) and q(x) are equal then the horizontal asymptote is the ratio of their leading coefficients.
• If the degree of p(x) is smaller than the degree of q(x) then the horizontal asymptote is y = 0.
• If the degree of p(x) is bigger than the degree of q(x) then there is no horizontal asymptote at infinity.

For example, in the above example, if we replace $x^2$ in the denominator by $x^3$ then
\[ \frac{2x^2 - 4x - 96}{x^3 - x - 30} = \frac{\frac{2x^2 - 4x - 96}{x^3}}{\frac{x^3 - x - 30}{x^3}} = \frac{\frac{2}{x} - \frac{4}{x^2} - \frac{96}{x^3}}{1 - \frac{1}{x^2} - \frac{30}{x^3}} \to \frac{0 - 0 - 0}{1 - 0 - 0} = 0, \]
so the asymptote is y = 0. On the other hand, if we remove the term $x^2$ altogether then
\[ \frac{2x^2 - 4x - 96}{-x - 30} = \frac{\frac{2x^2 - 4x - 96}{x}}{\frac{-x - 30}{x}} = \frac{2x - 4 - \frac{96}{x}}{-1 - \frac{30}{x}} \approx -(2x - 4), \]
in which case there is no horizontal asymptote.

Example 6. Match the blue solid curve to one of the following rational functions:
• $y = \frac{x}{(x-2)(x-3)}$,
• $y = \frac{x}{(x+2)(x-3)}$,
• $y = \frac{x^2}{(x+2)(x-3)}$,
• $y = \frac{x^2}{(x-2)(x-3)}$.

Solution: Because the vertical asymptotes in the figure are x = −2 and x = 3, the denominator should be (x − (−2))(x − 3) = (x + 2)(x − 3). Since the horizontal asymptote is y = 1, the degrees of the numerator and denominator should be equal, so the numerator should be $x^2$ and not x. This means that $y = \frac{x^2}{(x+2)(x-3)}$.

Exercise 6. Match the dashed green curve to one of the rational functions in the above example.
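The three degree rules for horizontal asymptotes stated above can be turned into a small procedure. Below is a minimal sketch in Python; the function name `horizontal_asymptote` and the convention of listing coefficients from the highest power down (with a nonzero leading coefficient) are assumptions made for this illustration only.

```python
from fractions import Fraction


def horizontal_asymptote(p_coeffs, q_coeffs):
    """Horizontal asymptote of p(x)/q(x) at infinity, or None if there is none.

    Coefficients are listed from the highest power to the constant term,
    and the first (leading) coefficient of each list must be nonzero.
    """
    deg_p = len(p_coeffs) - 1
    deg_q = len(q_coeffs) - 1
    if deg_p < deg_q:
        # Numerator has smaller degree: the ratio goes to 0.
        return Fraction(0)
    if deg_p == deg_q:
        # Equal degrees: ratio of the leading coefficients.
        return Fraction(p_coeffs[0], q_coeffs[0])
    # Numerator has bigger degree: no horizontal asymptote.
    return None


numerator = [2, -4, -96]                                   # 2x^2 - 4x - 96
print(horizontal_asymptote(numerator, [1, -1, -30]))       # x^2 - x - 30
print(horizontal_asymptote(numerator, [1, 0, -1, -30]))    # x^3 - x - 30
print(horizontal_asymptote(numerator, [-1, -30]))          # -x - 30
```

The three calls correspond to the three cases worked out in Example 5 and the discussion after it: asymptote y = 2, asymptote y = 0, and no horizontal asymptote. Exact fractions are used so that a ratio of leading coefficients such as 1/2 is not rounded.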
Answer to Exercise 1. (c) Degree n must be odd, because p(x) goes to +∞ when x goes to +∞, and to −∞ when x goes to −∞. Coefficient $a_n$ must be positive, again, because $a_n x^n$ dominates and becomes positive when x goes to +∞. The graph does not change direction, so the smallest possible degree would be n = 1, but because the graph is not linear, the smallest possible degree is n = 3.
(d) Degree n must be even, because p(x) goes to +∞ both when x goes to +∞ and when x goes to −∞. Coefficient $a_n$ must be positive, because $a_n x^n$ dominates and becomes positive when x goes to +∞. The graph changes direction 2 times, so the smallest possible degree would be n = 2, but because the graph does not look like a parabola, the smallest possible degree is n = 4.

Answer to Exercise 2. $y = -4(x + 0.5)(x - 0.5)(x - 2) = -2 + x + 8x^2 - 4x^3$.

Answer to Exercise 3. We can see one root, $x_1 = 6$. Since $a_4 < 0$ and the leading term $a_4 x^4$ dominates, the polynomial must go to −∞ when x goes to −∞, so the graph must eventually go down as we move to the left. The graph can change direction at most 4 times, and from what we can see there will be one more root somewhere to the left of −6, $x_2 < -6$.

Answer to Exercise 4. $y = 2(x + 1)^2 - 8$. Roots are −3 and 1. Vertex is (−1, −8).

Answer to Exercise 5. Vertical asymptotes are x = −2 and x = 2. Horizontal asymptote is y = 0.5. We can check that the function approaches +∞ from the left side of x = −2, −∞ from the right side of x = −2, +∞ from the left side of x = 2, and −∞ from the right side of x = 2.

Answer to Exercise 6. $y = \frac{x}{(x+2)(x-3)}$. The vertical asymptotes are the same, so the denominator should be the same. The horizontal asymptote is y = 0, so the degree of the numerator should be smaller, and the only such choice given was x. Another way to see this is to notice that the function changes sign as we cross x = 0.
If the numerator were $x^2$, it would not change sign at x = 0, so the behaviour would be like in the previous example, where the blue curve does not change sign at x = 0.

1.8 Inverse functions

Recall the definition of the natural logarithm:
\[ y = e^x \iff x = \ln(y). \]
This is one example of an inverse function. Before we give a general definition of an inverse function, let us consider some examples.

Example 1. Consider the function $y = \sqrt{x} + 1$ in the figure. What is the domain and range of this function? Solve this equation for x.

Solution: We are allowed to plug only nonnegative numbers into $\sqrt{x}$, so the domain of this function is [0, ∞). The output of $\sqrt{x}$ is also a nonnegative number, but since we add 1, the range is [1, ∞). To solve for x, we first write $y - 1 = \sqrt{x}$ and, squaring both sides, we get $x = (y - 1)^2$.

We can think of this answer x as a function of y, i.e. y is the input and $x = (y - 1)^2$ is the output. This function $x = (y - 1)^2$ is called the inverse function of $y = \sqrt{x} + 1$, but only if we restrict y to be in [1, ∞). Although the formula $(y - 1)^2$ also makes sense for y < 1, we can solve the original equation $y = \sqrt{x} + 1$ only when y is in the range of $y = \sqrt{x} + 1$, i.e. y ≥ 1. As we see in the figure, the input x in the domain [0, ∞) of the original function produces the output $\sqrt{x} + 1$ in the range [1, ∞), while the input y in the domain [1, ∞) of the inverse function $x = (y - 1)^2$ produces the output $(y - 1)^2$ in its range [0, ∞). The domain and range switch between the original function and the inverse function.

Exercise 1. Consider the function $y = \sqrt{x - 1}$. What is its domain and range? Solve this equation for x. What is the domain and range of the inverse function?

Definition. Let us now define the inverse function for any function y = f(x) whenever it is possible, and also explain when this is not possible.
If we look at the graph of the function y = f(x) in the figure, as we well know, given any input a in the domain of the function f depicted on the x-axis, the output is a point y = f(a) in the range of this function depicted on the y-axis. Now, let us take any point b in the range of this function, which means that it is an output b = f(x) for some point x in the domain. There may be more than one such x, as we will see in the examples below, but suppose for now that there is only one solution x, as in the figure. This x is denoted $f^{-1}(b)$ and it is pronounced "f inverse of b". In other words, if we can solve the equation b = f(x) for the unknown x and the solution is unique, it is denoted $x = f^{-1}(b)$.

Suppose that such a solution $x = f^{-1}(b)$ is unique for every point b in the range of f. If so, the function f is called invertible and $x = f^{-1}(y)$ is called the inverse function of y = f(x). Notice how the output y of f becomes the input of its inverse $f^{-1}$ and the input x of f becomes the output of $f^{-1}$. This means that:
• The domain of f is the range of $f^{-1}$ and the range of f is the domain of $f^{-1}$. Range and domain switch when taking an inverse.

To summarize in plain English, a function y = f(x) is invertible if, for any possible output y in the range of f, we can determine exactly what the input x was.

Warning! The superscript −1 in $f^{-1}$ is just a notation for the inverse, and it does not mean the reciprocal $\frac{1}{f}$. For example, in Example 1 above, $f(x) = \sqrt{x} + 1$ and $f^{-1}(y) = (y - 1)^2$, not $\frac{1}{\sqrt{x} + 1}$.

Example 2. Suppose p is the price (in dollars) a bakery sets for its plain bagel. Let q = f(p) depicted in the figure be the average monthly demand q for plain bagels when the price p is between $1 and $5. Is this function invertible? What is the domain and range of $f^{-1}$? What is the meaning of f(2), and what are the units of 2? What is the meaning of $f^{-1}(1000)$, and what are the units of 1000?
Solution: The domain of f is [1, 5] and the range is [0, 3000]. The function f is invertible because we can see from the graph that for every q in the range, there is a unique p in the domain such that q = f(p). The domain of the inverse function $p = f^{-1}(q)$ is [0, 3000] and its range is [1, 5]. The meaning of f(2) is the average monthly demand for plain bagels if the price is $2, with the units of 2 being dollars. The meaning of $f^{-1}(1000)$ is the price of a bagel at which the average monthly demand is 1000, with the units of 1000 being the number of bagels.

Exercise 2. After taking 100 mg of aspirin, the amount of aspirin in a patient's body is $m = f(t) = 100 \cdot (0.5)^{t/20}$ mg, where time t is measured in minutes. A therapeutic effect of aspirin becomes negligible after 4 half-lives, so we only consider this function on the domain between t = 0 and 4 half-lives. Is this function invertible? What is the domain and range of $f^{-1}$? What is the meaning of f(60), and what are the units of 60? What is the meaning of $f^{-1}(25)$, and what are the units of 25?

Horizontal line test. If we are given the graph of a function y = f(x), we can see that f is invertible if every horizontal line y = b intersects the graph not more than once. If it does not intersect then b is not in the range of f. If it intersects more than once, such b is the output f(x) of more than one input x, so we cannot determine $f^{-1}(b)$, and f is not invertible.

For example, if we look at the graph of y = sin(x), any horizontal line y = b for b in its range [−1, 1] intersects the graph many times, so the function is not invertible. However, if we consider y = sin(x) only on the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$ (solid blue piece in the figure) then the function becomes invertible, since it is strictly increasing on that interval and passes the horizontal line test. This inverse is called arcsine and is denoted $x = \sin^{-1}(y)$ or x = arcsin(y).
The domain of arcsine is [−1, 1] and the range is $[-\frac{\pi}{2}, \frac{\pi}{2}]$. As we can see here, the domain is very important when deciding whether a function is invertible.

Example 3. Is it true that $\sin^{-1}(\sin(x)) = x$ for all −∞ < x < ∞?

Solution: The answer is no. For example, if we take x = π, then sin(π) = 0 and $\sin^{-1}(0) = 0$, so $\sin^{-1}(\sin(\pi)) = 0 \neq \pi$. The problem here is that π is not in the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$ which was used to define the arcsine function. If we restrict to $-\frac{\pi}{2} \le x \le \frac{\pi}{2}$ then it is true that $\sin^{-1}(\sin(x)) = x$.

Exercise 3. Is it true that $\sin(\sin^{-1}(y)) = y$ for all −1 ≤ y ≤ 1?

Cancelling inverses. The above example shows that if f is invertible then the inverses cancel each other,
\[ f^{-1}(f(x)) = x \quad \text{and} \quad f(f^{-1}(y)) = y, \]
but x must be in the domain of f and y must be in the range of f that appear in the definition of the inverse $f^{-1}$.

Monotonic functions. The most common reason for a function to be invertible is when it is monotone, which means that it is strictly increasing or strictly decreasing, as in the above examples. A couple of points to keep in mind.
• The function (a) in the figure is increasing, but not strictly increasing, because it is equal to 0.2 on the interval 0 ≤ x ≤ 0.4. It does not pass the horizontal line test and is not invertible, so the 'strictly' part is important.
• A function does not have to be monotone to be invertible. For example, the function (b) in the figure is equal to x + 1 for −1 ≤ x ≤ 0 and equal to −x for 0 < x ≤ 1. It is not monotone, but it passes the horizontal line test. Of course, most common examples are monotone.

Example 4. Arnold takes a 100 mg tablet of ibuprofen every night before going to bed. If the half-life of ibuprofen is 20 minutes, is the amount of ibuprofen in Arnold's body invertible over a 72 hour period? What about over a 5 hour period starting right after he takes a pill?
Solution: The amount of ibuprofen in Arnold's body after taking a 100 mg pill is $m = f(t) = 100 \cdot (0.5)^{t/20}$ mg, where t is measured in minutes. This function is strictly decreasing, so it will be invertible over a 5 hour period, i.e. if we restrict the domain of t to [0, 300] minutes. However, after 24 hours Arnold will take another pill and the amount will jump from essentially zero to 100 mg, and then start decreasing again, as in the figure. So over a 72 hour period the function will not be invertible. By the way, the reason we restricted to 5 hours is that when t = 5 · 60 = 300 min, the amount m will be $100(0.5)^{15} \approx 0.003$ mg, which is essentially zero. Mathematically speaking, the function will be invertible up to 24 hours, but in practical terms the amount will eventually become zero and the formula will no longer be applicable.

Exercise 4. The price of mailing a letter by Canada Post in 2022 is:
• $1.07 up to 30 g,
• $1.30 over 30 g up to 50 g,
• $1.94 over 50 g up to 100 g,
• $3.19 over 100 g up to 200 g,
• $4.44 over 200 g up to 300 g,
• $5.09 over 300 g up to 400 g,
• $5.47 over 400 g up to 500 g.
Is the price invertible as a function of weight?

Graphs of inverse functions. If a function f is invertible, once we solve y = f(x) for x to find the inverse function $x = f^{-1}(y)$, we can use this function $f^{-1}$ with any variable as an input, such as x, t, etc. If we plot the graph of $y = f^{-1}(x)$ on the same x-y plane as the graph of the original function y = f(x), the two graphs will be mirror images around the diagonal y = x. We explained why this is so when we introduced logarithmic functions, but let us look at one more example. If we look at the graph of y = tan(x), it is not invertible because tan(x) is periodic, so it does not pass the horizontal line test. However, if we limit the domain to $(-\frac{\pi}{2}, \frac{\pi}{2})$, tangent is strictly increasing there, so we can define the inverse function, called $x = \tan^{-1}(y)$ or x = arctan(y).
If we plot it on the same x-y plane, i.e. we plot $y = \tan^{-1}(x)$, we can see that, indeed, the graph is a mirror image of the original graph around the diagonal. The domain of $\tan^{-1}(x)$ is the entire real line, −∞ < x < ∞, and the range is $(-\frac{\pi}{2}, \frac{\pi}{2})$. Since y = tan(x) has vertical asymptotes at $x = \frac{\pi}{2}$ and $x = -\frac{\pi}{2}$, $y = \tan^{-1}(x)$ has a horizontal asymptote $y = \frac{\pi}{2}$ as x approaches +∞ and a horizontal asymptote $y = -\frac{\pi}{2}$ as x approaches −∞. These functions will arise naturally in applications, but they are also good examples to keep in mind whenever we need a function with vertical or horizontal asymptotes.

Example 5. Give an example of a function that has a horizontal asymptote y = 2 as x approaches +∞ and a horizontal asymptote y = 0 as x approaches −∞.

Solution: If we shrink $y = \tan^{-1}(x)$ vertically by a factor π/2 (so that its asymptotes become y = 1 and y = −1) and then shift it up by 1, we will get what we want, so $y = \frac{2}{\pi}\tan^{-1}(x) + 1$ is one such example.

Exercise 5. Give an example of a function that has a horizontal asymptote y = −1 as x approaches +∞ and a horizontal asymptote y = 0 as x approaches −∞.

Answer to Exercise 1. Domain is [1, ∞), range is [0, ∞). Solution is $x = y^2 + 1$. If we think of this solution as the inverse function, then its domain is [0, ∞) and the range is [1, ∞).

Answer to Exercise 2. The function f(t) is an exponential decay with the half-life of 20 minutes, so the domain is [0, 80] and the range is [6.25, 100], because f(0) = 100 and f(80) = 6.25. It is invertible and, in fact, we can find the inverse $f^{-1}$ explicitly by solving $m = 100 \cdot (0.5)^{t/20}$ for t:
\[ t = \frac{20 \ln\frac{m}{100}}{\ln 0.5} = \frac{20 \ln\frac{100}{m}}{\ln 2}. \]
The domain of the inverse is [6.25, 100] and the range is [0, 80]. The meaning of f(60) is the amount of aspirin left in the body after 60 minutes. The meaning of $f^{-1}(25)$ is the number of minutes until there is only 25 mg left in a patient's body.

Answer to Exercise 3. Yes, because y is in the range of sin(x).

Answer to Exercise 4. No, because it is not strictly increasing: the price is the same for all weights within each bracket (for example, every letter up to 30 g costs $1.07), so the weight cannot be determined from the price.
Answer to Exercise 5. If we flip $y = \tan^{-1}(x)$ upside down, shrink it vertically by a factor π and then shift it down by $\frac{1}{2}$, we will get what we want, so $y = -\frac{1}{\pi}\tan^{-1}(x) - \frac{1}{2}$ is one such example.

1.9 Limits and continuity

We have already implicitly encountered the idea of a limit when we discussed vertical and horizontal asymptotes. Now we will consider a more general situation and introduce new terminology and notation that will be more compact and very convenient, especially once we start studying derivatives. For example, instead of saying that a function y = f(x) has a horizontal asymptote y = 2 as x approaches +∞, which means that f(x) approaches 2 as x approaches +∞, we can also say that the limit of f(x) as x goes to +∞ is equal to 2, and we will express this by writing
\[ \lim_{x \to +\infty} f(x) = 2. \]
Before we summarize all the definitions and notation, let us demonstrate them on a specific example.

Example 1. For the function in the figure:
• What are the left and right limits of f(x) at x = 0? Does the limit of f(x) at x = 0 exist? Is the function continuous at x = 0?
• What are the left and right limits at x = 2? Does the limit of f(x) at x = 2 exist? Is the function continuous at x = 2?
• What are the left and right limits at x = 3? Does the limit of f(x) at x = 3 exist? Is the function continuous at x = 3?
• What are the left and right limits of f(x) at x = −2? Does the limit of f(x) at x = −2 exist? Is the function continuous at x = −2?

Solution: We see that as x approaches 0 from the right side (i.e. x > 0 is getting close to 0), the value of the function f(x) approaches 8. In this case we say: the right limit of f(x) at x = 0 is equal to 8, which is expressed using mathematical notation as:
\[ \lim_{x \to 0^+} f(x) = 8. \]
Here, the notation $x \to 0^+$ means that x goes to 0 from the right. Similarly, we see that as x approaches 0 from the left side (i.e. x < 0 is getting close to 0), the value of the function f(x) also approaches 8.
In this case we say: the left limit of f(x) at x = 0 is equal to 8, and we write:
\[ \lim_{x \to 0^-} f(x) = 8. \]
Here, the notation $x \to 0^-$ means that x goes to 0 from the left. When the function approaches the same value from both sides, we say that the limit exists and, in this particular case, the limit of f(x) at x = 0 is equal to 8,
\[ \lim_{x \to 0} f(x) = 8. \]
Here, the notation $x \to 0$ means that x goes to 0 from both sides. From the graph we can see that f(0) = 5, indicated by the solid dot at (0, 5), so the limit of f(x) at x = 0 is not equal to f(0),
\[ \lim_{x \to 0} f(x) = 8 \neq 5 = f(0). \]
In this case, we say that the function f(x) is discontinuous (or not continuous) at x = 0.

In the second case of x = 2, we see that
\[ \lim_{x \to 2^-} f(x) = 6, \qquad \lim_{x \to 2^+} f(x) = 4, \]
and we see that f(2) = 6, indicated by the solid dot at (2, 6). Since the function approaches different values from the left and right sides, we say that the limit does not exist. In this case, we again say that the function f(x) is discontinuous at x = 2.

In the third case of x = 3, we see that
\[ \lim_{x \to 3} f(x) = 2, \]
because the function approaches the same value 2 from both sides, so the limit exists and is equal to 2. However, the function f(x) is undefined at x = 3, because the white open dot at (3, 2) indicates that the value of f(3) is not 2, and there is no solid dot anywhere on the line x = 3, which is a way to indicate that x = 3 is not in the domain of f. In this case, again, the function f(x) is discontinuous at x = 3 simply because there is no value f(3) to compare the limit with. Whenever a point x = a is not in the domain of f(x), the function cannot be continuous at that point.

Finally, in the last case of x = −2, we see that
\[ \lim_{x \to -2} f(x) = 6, \]
because the function approaches the same value 6 from both sides, so the limit exists and is equal to 6. The value of the function at that point is f(−2) = 6, so
\[ \lim_{x \to -2} f(x) = 6 = f(-2). \]
In this case, we say that the function f(x) is continuous at x = −2.

Exercise 1. For the function in the figure:
• What are the left and right limits of f(x) at x = 0? Does the limit of f(x) at x = 0 exist? Is the function continuous at x = 0?
• What are the left and right limits at x = 1? Does the limit of f(x) at x = 1 exist? Is the function continuous at x = 1?
• What are the left and right limits at x = 2? Does the limit of f(x) at x = 2 exist? Is the function continuous at x = 2?
• What are the left and right limits of f(x) at x = −1? Does the limit of f(x) at x = −1 exist? Is the function continuous at x = −1?

Definitions. To summarize the definitions in the above example:
• If a function f(x) approaches some value when x approaches a from the right, this value is called the right limit of f(x) at x = a and is denoted $\lim_{x \to a^+} f(x)$.
• If a function f(x) approaches some value when x approaches a from the left, this value is called the left limit of f(x) at x = a and is denoted $\lim_{x \to a^-} f(x)$.
• If both right and left limits of f(x) at x = a exist and are equal to each other, their value is called the limit of f(x) at x = a and is denoted $\lim_{x \to a} f(x)$.
• If $\lim_{x \to a} f(x) = f(a)$, i.e. the limit at x = a exists and is equal to the value of the function at that point, we say that the function is continuous at x = a.
• If the limit $\lim_{x \to a} f(x)$ does not exist, or it is not equal to f(a), or the function is undefined at x = a, we say that the function is discontinuous at x = a.

Examples. Typical functions, such as linear, power functions, polynomials, exponentials, logarithms, sine and cosine, are all continuous on their domains. What happens if we add, subtract, multiply, or divide two continuous functions f(x) and g(x)?
If f(x) approaches f(a) and g(x) approaches g(a) then, obviously, f(x) ± g(x) approaches f(a) ± g(a), f(x)g(x) approaches f(a)g(a), and $\frac{f(x)}{g(x)}$ approaches $\frac{f(a)}{g(a)}$ if g(a) ≠ 0, so:
• The sum, difference, product, and also ratio if g(a) ≠ 0, are all continuous at x = a if f(x) and g(x) are continuous at x = a.
Of course, dividing by zero can create a problem.

Example 2. Are $f(x) = \frac{1}{x-2}$ and $g(x) = x + \frac{2}{x}$ continuous for all −1 ≤ x ≤ 1?

Solution: $f(x) = \frac{1}{x-2}$ is continuous for all −1 ≤ x ≤ 1, because we divide by zero only when x = 2, which is not on the interval [−1, 1]. $g(x) = x + \frac{2}{x}$ is not continuous for all −1 ≤ x ≤ 1, because we divide by 0 when x = 0, so 0 is not in the domain of this function.

Exercise 2. Are $f(x) = \frac{1}{2x-1}$ and $g(x) = \frac{1}{\cos(x)}$ continuous for all 0 ≤ x ≤ 1?

In the next two problems we will use the following functions:
• $f(x) = \begin{cases} -\frac{x^2}{2} + 8, & x \le -2 \\ 6, & -2 < x \le 2 \\ -2x + 8, & 2 < x \end{cases}$
• $p(x) = \begin{cases} -\frac{x^2}{2} + 8, & x \le -2 \\ 6, & -2 < x \le 2 \\ \kappa x + 8, & 2 < x \end{cases}$
• $g(x) = \begin{cases} 13x - x^3, & x < 0 \\ x, & 0 \le x < 1 \\ -2x + 3, & 1 \le x \end{cases}$
• $q(x) = \begin{cases} 13x - x^3, & x < 0 \\ \kappa, & 0 \le x < 1 \\ -2x + 3, & 1 \le x \end{cases}$
The list notation means that each function is defined by different formulas on three different intervals. For example, if we want to find f(3), we see that 2 < 3, so x = 3 belongs to the last interval and f(3) = −2 · 3 + 8 = 2.

Example 3. Find where the function f(x) is discontinuous and explain why. For which value of the constant κ will the function p(x) be continuous?

Solution: On each interval the function is a polynomial, so it is continuous. We need to check if the function value jumps when the interval changes. The first time the interval changes is at x = −2. When x is approaching −2 from the left, where x < −2, the function is $-\frac{x^2}{2} + 8$, so it approaches $-\frac{(-2)^2}{2} + 8 = 6$. We can write this using formulas instead of words:
\[ \lim_{x \to -2^-} f(x) = \lim_{x \to -2^-} \left( -\frac{x^2}{2} + 8 \right) = -\frac{(-2)^2}{2} + 8 = 6. \]
When x is approaching −2 from the right, x now belongs to the second interval −2 < x ≤ 2 where the function is constant, 6, so it approaches 6:
\[ \lim_{x \to -2^+} f(x) = \lim_{x \to -2^+} 6 = 6. \]
The left and right limits are the same, 6, which means that the limit exists and is equal to 6, and the value of the function at x = −2 is also $f(-2) = -\frac{(-2)^2}{2} + 8 = 6$ (we are using the first formula because x = −2 belongs to the first interval x ≤ −2), so the function is continuous at x = −2.

The second time the interval changes is at x = 2. Again, let us compute the left and right limits using the corresponding formulas from the list:
\[ \lim_{x \to 2^-} f(x) = \lim_{x \to 2^-} 6 = 6, \qquad \lim_{x \to 2^+} f(x) = \lim_{x \to 2^+} (-2x + 8) = -2 \cdot 2 + 8 = 4. \]
Since the two limits are not equal, the limit does not exist, and the function is discontinuous at x = 2. Everywhere else the function is continuous.

To find the constant κ which ensures that p(x) is continuous at x = 2, let us compute the left and right limits using the corresponding formulas from the list:
\[ \lim_{x \to 2^-} p(x) = \lim_{x \to 2^-} 6 = 6, \qquad \lim_{x \to 2^+} p(x) = \lim_{x \to 2^+} (\kappa x + 8) = \kappa \cdot 2 + 8. \]
The limits must be equal, so we must have that 6 = κ · 2 + 8. Solving for κ we get that κ = −1. For κ = −1 the limit exists and is equal to 6, and the value of the function is p(2) = 6, so now the function p(x) is also continuous at x = 2.

Exercise 3. Find where the function g(x) is discontinuous and explain why. For which value of the constant κ will the function q(x) be continuous?

Example 4. The price of mailing a letter by Canada Post in 2022 is (where 'up to' means 'including'):
• $1.07 up to 30 g,
• $1.30 over 30 g up to 50 g,
• $1.94 over 50 g up to 100 g,
• $3.19 over 100 g up to 200 g,
• $4.44 over 200 g up to 300 g,
• $5.09 over 300 g up to 400 g,
• $5.47 over 400 g up to 500 g.
At which points is the price as a function of weight discontinuous? What are the right and left limits at those points?
Solution: The domain of the price p = p(w) of a letter as a function of weight is 0 < w ≤ 500. Beyond that weight it is no longer considered a letter. The first jump (discontinuity) is at 30 g, where the left limit is $1.07 and the right limit is $1.30. Other discontinuities are similar, at 50 g, 100 g, etc.

Exercise 4. 50 mg of a drug is injected into a patient at a constant rate for 12 seconds. After that the plasma concentration of the drug decreases exponentially with a half-life of 10 minutes. Express the quantity q = q(t) of the drug in the patient’s body as a continuous function of time t, measured in minutes. Make sure it is continuous at 12 seconds.

Example 5. What are the left and right limits of \(f(x) = \frac{|x|}{x}\) at x = 0? Does the limit at x = 0 exist? Is the function continuous at this point?

Solution: If x < 0 then \(f(x) = \frac{|x|}{x} = -1\). For example, if x = −1 then \(\frac{|-1|}{-1} = -1\). Similarly, if x > 0 then \(f(x) = \frac{|x|}{x} = 1\). This means that the left limit is \(\lim_{x \to 0^-} f(x) = -1\) and the right limit is \(\lim_{x \to 0^+} f(x) = 1\). We can conclude that the limit \(\lim_{x \to 0} f(x)\) does not exist, and the function is not continuous. Also, x = 0 is not in the domain since we cannot divide by zero, so the function could not be continuous even if the limit existed.

Exercise 5. What are the left and right limits of \(f(x) = \frac{|x-2|}{x-2}\) at x = 2? Does the limit at x = 2 exist? Is the function continuous at this point?

Example 6. Consider a rational function \(f(x) = \frac{x^2 - 3x + 2}{x^2 + x - 2}\). Compute \(\lim_{x \to 1} f(x)\). Is the function continuous at x = 1?

Solution: At x = 1 both the numerator and denominator are zero, so the function is undefined at x = 1, which means it cannot be continuous there whether or not the limit exists. Finding the roots of the quadratic polynomials, we can write

\[
\frac{x^2 - 3x + 2}{x^2 + x - 2} = \frac{(x-1)(x-2)}{(x-1)(x+2)}.
\]

When we take the limit x → 1, we consider x that approach 1 but are not equal to 1, so x − 1 ≠ 0 and we can cancel it:

\[
\lim_{x \to 1} \frac{x^2 - 3x + 2}{x^2 + x - 2} = \lim_{x \to 1} \frac{(x-1)(x-2)}{(x-1)(x+2)} = \lim_{x \to 1} \frac{x-2}{x+2} = \frac{1-2}{1+2} = -\frac{1}{3}.
\]

Exercise 6. Give an example of a rational function that is not continuous at x = 2 but has a limit at x = 2.

Intermediate Value Theorem. If we know that a function f(x) is continuous on some interval [a, b] then it must cross any value y between f(a) and f(b) at some point x inside the interval [a, b], because it is not allowed to jump over this value. Of course, there could be more than one such x (for example, there are three in the figure), but we know there is at least one. This is very useful, because we know that it is possible to solve the equation y = f(x) for x on the interval a ≤ x ≤ b, even if it might not be obvious how.

Example 7. Show that there is a number c ∈ [0, 1] such that \(e^c - 3c = 0\).

Solution: The function \(f(x) = e^x - 3x\) is continuous. We see that f(0) = 1 > 0 and f(1) = e − 3 = −0.2817 . . . < 0, so y = 0 is in between these two values. By the Intermediate Value Theorem (IVT for short), there must be a point c on the interval [0, 1] such that \(f(c) = e^c - 3c = 0\). Using a computer we can find that c = 0.619 . . . .

Exercise 7. Show that there is a number c ∈ [0, 1] such that \(c^7 + c^2 - 1 = 0\).

In the next two problems we will use the following two functions:

\[
f(x) = \begin{cases} -\frac{x^2}{2} + 8, & 0 \le x \le 2 \\ -2x + 8, & 2 < x \le 4 \end{cases}
\qquad
g(x) = \begin{cases} \sin(x), & 0 \le x \le \frac{\pi}{2} \\ x - \frac{1}{2}, & \frac{\pi}{2} < x \le 2 \end{cases}
\]

Example 8. Can we apply the IVT to the function f(x) on the interval [0, 4]? If not, find the value y between f(0) and f(4) such that y = f(x) has no solution x ∈ [0, 4].

Solution: We can check that \(\lim_{x \to 2^-} f(x) = 6\) and \(\lim_{x \to 2^+} f(x) = 4\), so the function is not continuous at x = 2, and the IVT cannot be applied.
On the first interval [0, 2], the parabola \(-\frac{x^2}{2} + 8\) is decreasing, and on the second interval [2, 4], the linear function −2x + 8 is also decreasing, so when the function f(x) jumps from 6 to 4 as we cross x = 2 it skips the values in between. This means that, for example, f(x) = 5 has no solutions for 0 ≤ x ≤ 4. The graph of this function is in Example 1 above.

Exercise 8. Can we apply the IVT to the function g(x) on the interval [0, 2]? If not, find the value y between g(0) and g(2) such that y = g(x) has no solution x ∈ [0, 2].

Growth and limits at infinity. When we discussed rational functions, we saw how to find their horizontal asymptotes. For example, using the limit notation introduced above, we can write

\[
\lim_{x \to \infty} \frac{2x^2 - 4x - 96}{x^2 - x - 30} = \lim_{x \to \infty} \frac{2 - \frac{4}{x} - \frac{96}{x^2}}{1 - \frac{1}{x} - \frac{30}{x^2}} = \frac{2 - 0 - 0}{1 - 0 - 0} = 2
\]

and the main idea in this calculation was to take the leading term in the denominator (here \(x^2\)) and divide both the numerator and denominator by it. This idea worked because we could easily compare which function grows faster, \(x\) or \(x^2\), by dividing and cancelling. For example, \(\frac{x}{x^2} = \frac{1}{x} \to 0\) means that \(x^2\) grows faster than \(x\). To generalize this idea, let us compare how various functions grow at infinity. We will say that a function g(x) grows faster than f(x) when x goes to infinity, or f(x) grows slower than g(x), if

\[
\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0.
\]

We can also express this more compactly by saying that g(x) dominates f(x) and write f(x) ≪ g(x) as x → ∞.

• Logarithms grow slower than power functions: \(\ln(x) \ll x^p\) for p > 0.
• Power functions grow slower when the power is smaller: \(x^p \ll x^q\) if p < q.
• Power functions grow slower than any exponential growth: \(x^p \ll a^x\) if a > 1.
• Exponentials grow slower when the base is smaller: \(a^x \ll b^x\) if a < b.

The second one we already know, and the last one is true because \(\frac{a^x}{b^x} = \big(\frac{a}{b}\big)^x \to 0\) since \(\big(\frac{a}{b}\big)^x\) is an exponential decay function when the base \(\frac{a}{b} < 1\).
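The four comparisons in the list can also be explored numerically. The Python sketch below is an illustration only, not a proof: the particular powers, bases and sample points are arbitrary choices made for this example, and the helper name growth_ratios is hypothetical. It evaluates the ratio f(x)/g(x) at a few increasing values of x, where f is the slower function in each pair.

```python
import math

# Each entry: (label, slower function f, faster function g).
# The specific powers and bases are illustrative assumptions, not from the text.
pairs = [
    ("ln(x) / x^0.5", math.log, lambda x: x ** 0.5),
    ("x^2 / x^3", lambda x: x ** 2, lambda x: x ** 3),
    ("x^10 / 2^x", lambda x: x ** 10, lambda x: 2.0 ** x),
    ("2^x / 3^x", lambda x: 2.0 ** x, lambda x: 3.0 ** x),
]

def growth_ratios(f, g, xs):
    """Return the values f(x) / g(x) at the sample points xs."""
    return [f(x) / g(x) for x in xs]

# Sample points are kept moderate so that 3^x still fits in a float.
xs = [10.0, 100.0, 500.0]

for label, f, g in pairs:
    print(label, growth_ratios(f, g, xs))
```

In every pair the printed ratios shrink as x grows, which is consistent with f(x) ≪ g(x). Note that the ratio ln(x)/x^0.5 shrinks quite slowly, so a few sample points can suggest the trend but cannot replace the limit argument.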
We will not spend time discussing why the first and third cases are true but, using a change of variables, both statements can be reduced to showing that \(x\) grows slower than \(e^x\), which is much more obvious. Remember also that an exponential growth function \(a^x\) for a > 1 can always be written as \(e^{\kappa x}\) for κ = ln(a) > 0. Now that we know how to compare the growth of basic functions, we can consider more complicated examples.

Example 9. Compute the limit \(\lim_{x \to +\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)}\).

Solution: The fastest growing function in the denominator is \(7^x\) so, if we divide both the numerator and denominator by it, we get that

\[
\lim_{x \to +\infty} \frac{7^x + x^2}{5 \cdot 7^x + \ln(x)} = \lim_{x \to +\infty} \frac{\frac{7^x + x^2}{7^x}}{\frac{5 \cdot 7^x + \ln(x)}{7^x}} = \lim_{x \to +\infty} \frac{1 + \frac{x^2}{7^x}}{5 + \frac{\ln(x)}{7^x}} = \frac{1 + 0}{5 + 0} = 0.2.
\]

Exercise 9. Compute the limit \(\lim_{x \to +\infty} \frac{\ln(x) + \sqrt{x}}{x - \sqrt{x}}\).

Example 10. For which values of κ > 0 does the limit \(\lim_{x \to +\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)}\) exist?

Solution: In words, the fastest growing term in the numerator is \(7^x\) and in the denominator it is \(e^{\kappa x}\), so the limit will exist if the denominator grows at least as fast, so \(e^{\kappa} \ge 7\) or κ ≥ ln(7). To make this explanation more precise, let us divide both the numerator and denominator by \(e^{\kappa x}\),

\[
\lim_{x \to +\infty} \frac{7^x + x^2}{e^{\kappa x} + \ln(x)} = \lim_{x \to +\infty} \frac{\frac{7^x}{e^{\kappa x}} + \frac{x^2}{e^{\kappa x}}}{1 + \frac{\ln(x)}{e^{\kappa x}}} = \lim_{x \to +\infty} \frac{\big(\frac{7}{e^{\kappa}}\big)^x + 0}{1 + 0} = \lim_{x \to +\infty} \Big(\frac{7}{e^{\kappa}}\Big)^x.
\]

This limit can exist only if \(\frac{7}{e^{\kappa}} \le 1\); otherwise, the expression grows exponentially. If \(\frac{7}{e^{\kappa}} = 1\) then the limit is 1 and if \(\frac{7}{e^{\kappa}} < 1\) then the limit is zero, because the expression decays exponentially.

Exercise 10. For which values of κ > 0 does the limit \(\lim_{x \to +\infty} \frac{x^{\kappa} + x^2}{2x^2 - \ln(x)}\) exist?

Some famous limits. We conclude this section with a list of a few famous limits that, in particular, will be useful when we study the derivatives of exponential and trigonometric functions:

\[
\lim_{n \to \infty} \Big(1 + \frac{x}{n}\Big)^n = e^x, \qquad \lim_{x \to 0} \frac{e^x - 1}{x} = 1, \qquad \lim_{x \to 0} \frac{\sin x}{x} = 1, \qquad \lim_{x \to 0} \frac{\cos x - 1}{x} = 0.
\]

There is no need to memorize these limits at this point but, if you have time, you can learn more about them in the videos in the footnotes.⁶ The first limit is, in fact, the definition of Euler’s number e and the exponential function \(e^x\). The second limit is a consequence of the first one, and it will be used to compute the derivative of \(e^x\). The last two limits are relatively easy consequences of the definition of sine and cosine, and will also be used to compute the derivatives of sin x and cos x.

⁶ https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0, https://youtu.be/dLXal60n3JQ.

Answer to Exercise 1. \(\lim_{x \to 0} f(x) = 1\) and the function is continuous at x = 0. \(\lim_{x \to 1^-} f(x) = 4\), \(\lim_{x \to 1^+} f(x) = 1\), the limit does not exist, the function is discontinuous. \(\lim_{x \to 2} f(x) = -1\), but the function is undefined at x = 2 and so discontinuous. \(\lim_{x \to -1} f(x) = -2 \ne -1 = f(-1)\), the function is discontinuous.

Answer to Exercise 2. \(f(x) = \frac{1}{2x-1}\) is not continuous for all 0 ≤ x ≤ 1, because we divide by zero when x = 0.5, so 0.5 is not in the domain of this function. \(g(x) = \frac{1}{\cos(x)}\) is continuous for all 0 ≤ x ≤ 1, because we divide by 0 only when cos(x) = 0, that is when \(x = \frac{\pi}{2}, \frac{3\pi}{2}, \ldots\), which are not on the interval 0 ≤ x ≤ 1.

Answer to Exercise 3. g(x) is always continuous. It is impossible to choose κ to make q(x) continuous.

Answer to Exercise 4. Since the time is measured in minutes, we need to write 12 seconds as 0.2 min. The quantity of drug increases linearly from 0 mg at time 0 min to 50 mg at time 0.2 min, so the slope is \(\frac{50}{0.2} = 250\), and q(t) = 250t for 0 ≤ t ≤ 0.2. After that it starts decaying exponentially with a half-life of 10 minutes, so a common mistake is to write it as \(50 \cdot 0.5^{t/10}\). However, this formula would be correct if the decay started at t = 0, while now exponential decay starts at t = 0.2. This means we need to shift this exponential decay function to the right by 0.2, so it will be \(q(t) = 50 \cdot 0.5^{(t-0.2)/10}\) for 0.2 < t.
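As a numerical sanity check of the two formulas just derived, the following Python sketch evaluates q(t) slightly to the left and to the right of t = 0.2. It is only a rough check under the assumptions of this answer (time in minutes, linear injection phase, half-life of 10 minutes); the step sizes are arbitrary choices made for illustration.

```python
def q(t):
    """Drug quantity in mg at time t (minutes), using the two formulas above."""
    if t <= 0.2:
        return 250 * t                       # linear injection phase
    return 50 * 0.5 ** ((t - 0.2) / 10)      # shifted exponential decay

# Values just left and just right of t = 0.2 should both approach q(0.2).
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, q(0.2 - eps), q(0.2 + eps))
```

Both columns of printed values approach 50 mg as the step shrinks, which is what continuity at t = 0.2 requires.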
We can check that

\[
q(t) = \begin{cases} 250t, & 0 \le t \le 0.2 \\ 50 \cdot 0.5^{(t-0.2)/10}, & 0.2 < t \end{cases}
\]

is continuous at all times including t = 0.2.

Answer to Exercise 5. The left limit is \(\lim_{x \to 2^-} f(x) = -1\) and the right limit is \(\lim_{x \to 2^+} f(x) = 1\), so the limit does not exist and the function is not continuous at x = 2.

Answer to Exercise 6. For example, \(\frac{x-2}{x-2}\).

Answer to Exercise 7. For \(f(x) = x^7 + x^2 - 1\) we have f(0) = −1 and f(1) = 1, so the IVT implies that such c exists. Using a computer, c = 0.8398 . . . .

Answer to Exercise 8. We cannot apply the IVT because the function is not continuous at \(x = \frac{\pi}{2}\). Both sin(x) and \(x - \frac{1}{2}\) are increasing on their corresponding intervals, and at \(x = \frac{\pi}{2}\) the function jumps from 1 to \(\frac{\pi - 1}{2} = 1.07\ldots\), so for example g(x) = 1.03 has no solution x ∈ [0, 2].

Answer to Exercise 9. The fastest growing function in the denominator is \(x\) so, if we divide both the numerator and denominator by it, we get that

\[
\lim_{x \to +\infty} \frac{\ln(x) + \sqrt{x}}{x - \sqrt{x}} = \lim_{x \to +\infty} \frac{\frac{\ln(x) + \sqrt{x}}{x}}{\frac{x - \sqrt{x}}{x}} = \lim_{x \to +\infty} \frac{\frac{\ln(x)}{x} + \frac{1}{\sqrt{x}}}{1 - \frac{1}{\sqrt{x}}} = \frac{0 + 0}{1 - 0} = 0.
\]

Answer to Exercise 10. κ ≤ 2. If κ = 2 then the limit is 1, if κ < 2 then the limit is 0.5.

Chapter 2
Derivatives

2.1 Practical interpretation of derivatives

In this section, we will start discussing derivatives rather informally and learn what it means that “the derivative f′(a) of a function y = f(x) at a point x = a is equal to m”. We will also look at the definition of the derivative using graphs, again very informally. Jumping ahead, we will think of the derivative f′(a) as

• the slope of the tangent line to the graph of y = f(x) at x = a, or
• the rate of change of the function near x = a.

However, our main goal will be to express the meaning of the derivative in plain English in practical situations, also paying attention to the units of the variables x and y. Of course, to learn how to actually compute derivatives we will need a more formal definition that will be discussed in the later sections.

Linear functions.
Let us first discuss linear functions and, as an illustration, let us use two examples that we have seen before:

• If a car drives with constant speed of 60 km/h then the distance it covers in t hours is d = 60 · t km.
• If during photosynthesis in direct sunlight at temperature 10°C a leaf of some plant produces 30 µmol of glucose and oxygen per hour then the amount of glucose and oxygen produced during t hours is y = 30 · t µmol.

Given a linear function y = b + m · x and any two points \((x_1, y_1)\) and \((x_2, y_2)\) on its graph, we recall that the slope m can be computed as

\[
m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.
\]

Notice that for a linear function the ratio \(\frac{\Delta y}{\Delta x}\) is the same constant m no matter what the two points \((x_1, y_1)\) and \((x_2, y_2)\) are and, when we learn a general definition of the derivative, we will see that, for a linear function, the slope m also happens to be its derivative.

How can we think about this quantity m? If the input x changes by ∆x and the output of our linear function changes by ∆y, the ratio \(\frac{\Delta y}{\Delta x}\) tells us how the output changes relative to the input, so it has the meaning of the rate of change of y with respect to x. For example:

• In the car example above, if the time changes by ∆t = 0.2 hours then the distance changes by ∆d = 12 km, and the ratio \(\frac{12}{0.2} = 60\) km/h is the change of distance relative to time, better known as speed.
• In the photosynthesis example, if the time changes by ∆t = 0.2 hours then the amount of glucose and oxygen changes by ∆y = 6 µmol, and the ratio \(\frac{6}{0.2} = 30\) µmol/h is the change of glucose and oxygen relative to time, which is the rate of photosynthesis.

Let us rephrase the same thing in a different way. What does it mean that the derivative (or slope) of a linear function is equal to m? It means that:

If the input changes by ∆x then the output changes by ∆y = m · ∆x.

The derivative 60 km/h means that, for example, between time 0.1 and 0.3 hours, the distance will change by 60 · 0.2 = 12 km.
The derivative 30 µmol/h means that between time 0.1 and 0.3 hours, the amount of glucose and oxygen produced will change by 30 · 0.2 = 6 µmol.

Derivative of a general function. Next, let us look at an informal definition of the derivative for a general function y = f(x). If the function is not linear then its slope may be constantly changing and the ratio \(\frac{\Delta y}{\Delta x}\) can depend on the points \((x_1, y_1)\) and \((x_2, y_2)\). However, if we zoom in very close to a particular point (a, f(a)) on the graph of the function (figure above on the right), the graph looks almost linear and we can draw a so-called tangent line that passes through the same point (a, f(a)) and has the same slope at that point. Of course, this zooming in procedure is a very informal definition of this slope, and we will make it more formal and precise later on, but for now:

• the slope of the tangent line at the point (a, f(a)) is called the derivative of a function y = f(x) at the point x = a and it is denoted f′(a).

By analogy with the linear functions, what does it mean that the derivative of a function y = f(x) at a point x = a is equal to f′(a)? It means that:

If ∆x is small then between x = a and x = a + ∆x the output y = f(x) will change by approximately f′(a)∆x.

By contrast with the linear functions, here the change of y is only approximately equal (≈) to f′(a)∆x, not exactly, and only if ∆x is small. Of course, we can rephrase the above statement slightly depending on the setting of the problem. For example, for the sake of clarity we will always select some specific small increment ∆x.

Example 1. The rate of photosynthesis depends on temperature, and suppose that at 10°C a leaf of some plant produces glucose and oxygen at a rate of 30 µmol/h. Let f(t) be the total amount of glucose and oxygen produced by the leaf since 12 pm, where time t is measured in hours.
If the temperature steadily rises and at 3 pm it reaches 10°C, what is f′(3) and what is its practical meaning?

Solution: Since t = 3 h corresponds to 3 pm and the temperature at that time is 10°C, the rate of photosynthesis is f′(3) = 30 µmol/h. Supposing that the temperature changes slowly, f′(3) = 30 µmol/h means in plain English that, for example, between 3:00 pm and 3:10 pm the leaf will produce approximately \(30 \times \frac{1}{6} = 5\) µmol of glucose and oxygen. Our choice of the 10 minute time interval is reasonable because the temperature is unlikely to change much on the scale of 10 minutes.

Exercise 1. Suppose that the average monthly sales S at a bakery, in dollars, are a function S = f(A) of its monthly spending on advertisement A, also in dollars. If f′(100) = 4.5, what are the units and practical meaning of this derivative?

Example 2. In the figure¹ and table below we see the average finish time in the 2009 New York marathon by age group and sex. Let T = f(A) be the average finish time for men of age A. In each age group, take the middle age (for example, in the 45-49 age group the middle age is 47) and suppose that the average finish time for men of that age is the same as for the group. For example, f(47) = 4h 13min, ignoring seconds. Estimate the derivative f′(52), give its units and describe its meaning. The table of values is:

A (years):   22    27    32    37    42    47    52    57    62    67
T (h:min):   4:12  4:06  4:08  4:10  4:09  4:13  4:22  4:36  4:47  5:12

¹ https://www.runtri.com/2010/11/new-york-city-marathon-average-finish.html

Solution: From the figure, we see that f(47) = 4h 13min, f(52) = 4h 22min, and f(57) = 4h 36min. We also see that the slope of the line connecting the values at A = 47 and A = 57 is a good candidate for the slope at A = 52, which is in the middle of those values. Since ∆A = 10 years and ∆T = 4h 36min − 4h 13min = 23 min, the slope of this line is \(\frac{\Delta T}{\Delta A} = 2.3\) min/year.
Of course, this is only an approximation, but if indeed f′(52) = 2.3 min/year then its meaning is the following: the average finish time of 53 year old runners is approximately 2.3 minutes slower than that of 52 year old runners.

Exercise 2. In the figure² we see the graph of the energy economy E, in miles/kWh, as a function E = f(S) of speed S, in mph, for the 2021 Porsche Taycan electric vehicle. Estimate the derivative f′(60), give its units and describe its meaning. From the graph: f(55) = 4.02, f(60) = 3.64, f(65) = 3.36.

² https://www.cleanmpg.com//community/index.php?media/35360/full

Tangent line. By our definition, the derivative f′(a) of a function y = f(x) at x = a is the slope of the tangent line to this function f(x) at the point x = a. Also, the tangent line passes through the same point (a, f(a)). We know the formula \(y = y_0 + m(x - x_0)\) of a linear function that has slope m and passes through the point \((x_0, y_0)\) so, according to this formula, the tangent line is

y = f(a) + f′(a) · (x − a).

Indeed, the slope of this line is f′(a) and, if we plug in x = a into this formula, the second term becomes zero and we get y = f(a), so the line passes through the point (a, f(a)). We can also write this equation in terms of the increments ∆y = y − f(a) and ∆x = x − a:

∆y = f′(a) · ∆x.

Example 3. We will later learn that the derivative of the function \(y = \sqrt{x}\) at any positive value x > 0 is equal to \(\frac{1}{2\sqrt{x}}\). What is its tangent line at x = 121? Using the tangent line, approximate \(\sqrt{132}\) and compare with the actual value.

Solution: The function at x = 121 is \(\sqrt{121} = 11\) and the derivative at x = 121 is \(\frac{1}{2\sqrt{121}} = \frac{1}{22}\), so the tangent line is \(y = 11 + \frac{1}{22}(x - 121)\). When x = 132, the tangent line gives \(y = 11 + \frac{11}{22} = 11.5\). The actual value is \(\sqrt{132} = 11.4891\ldots\), and we see that the function and the tangent line are quite close in this case even when the increment ∆x = 132 − 121 = 11 is not very small.
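The calculation in Example 3 can be reproduced with a short Python sketch. This is only an illustration of the tangent line formula y = f(a) + f′(a) · (x − a); the helper name tangent_line is a hypothetical choice, and the derivative 1/(2√x) is taken as given, exactly as in the example.

```python
import math

def tangent_line(f_a, slope, a):
    """Return the linear function x -> f(a) + f'(a) * (x - a)."""
    return lambda x: f_a + slope * (x - a)

a = 121.0
# Value and derivative of sqrt(x) at x = a (derivative formula assumed from the text).
line = tangent_line(math.sqrt(a), 1 / (2 * math.sqrt(a)), a)

approx = line(132.0)        # tangent line estimate of sqrt(132)
exact = math.sqrt(132.0)    # value computed directly
print(approx, exact, approx - exact)
```

The estimate is about 11.5, the direct value is about 11.4891, and the tangent line overestimates because the graph of the square root bends below its tangent lines.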
Exercise 3. We will later learn that the derivative of the function y = cos x is equal to −sin(x). What is its tangent line at \(x = \frac{\pi}{4}\)? Using the tangent line, approximate \(\cos(\frac{\pi}{5})\) and compare with the actual value. Recall that \(\cos(\frac{\pi}{4}) = \sin(\frac{\pi}{4}) = \frac{1}{\sqrt{2}}\).

Estimating derivatives from tables. In the two examples above about the New York marathon and the Porsche Taycan we actually estimated the derivative using nearby values before interpreting its practical meaning. Right now we will do a couple of similar examples, but we will spell out a bit more explicitly how we can estimate the derivatives in such cases.

Example 4. In the table below we see the average finish time (in hours) in the 2009 New York marathon by age group among women.

A (years):   22    27    32    37    42    47    52    57    62    67
T (hours):   4.62  4.53  4.6   4.67  4.65  4.75  4.88  5.15  5.47  5.57

If T = f(A) is the average finish time for women of age A, estimate f′(22), f′(67) and f′(52).

Solution: In the figure on the right, we plot the data points from the table and a dotted curve that interpolates smoothly between those points and could hypothetically represent the graph of the function T = f(A). If we knew this function, we could find the slope of the tangent line at any age A, which would give us the derivative f′(A). The problem is that we do not know this function, so we have to use the values given in the table.

Normally, if we could zoom in on the actual function, we could estimate the slope of the tangent line by the ratio \(\frac{\Delta y}{\Delta x}\) using nearby points. Right now, the closest points are the neighbouring points in the table, so we will use those values. For example, to estimate the derivative f′(22), we can use the points (22, 4.62) and (27, 4.53), for which ∆x = 27 − 22 = 5 years and ∆y = 4.53 − 4.62 = −0.09 hours. (Notice that we subtract the values in the same order.) As a result, \(\frac{-0.09}{5} = -0.018\) hour/year is our estimate for the derivative f′(22). We could also change the units from hours to minutes, using that 1 hour = 60 minutes, to get −0.018 hour/year = −60 · 0.018 min/year = −1.08 min/year.

Similarly, we can estimate f′(67) using the points (62, 5.47) and (67, 5.57), in which case we get \(\frac{5.57 - 5.47}{67 - 62} = 0.02\) hour/year = 1.2 min/year.

Finally, to estimate f′(52), we have several choices. We can use a point to the right to estimate \(f'(52) \approx \frac{5.15 - 4.88}{57 - 52} = 0.054\) hour/year = 3.24 min/year. We can use a point to the left to estimate \(f'(52) \approx \frac{4.75 - 4.88}{47 - 52} = 0.026\) hour/year = 1.56 min/year. Or we can take the average of the two estimates, which would be \(\frac{3.24 + 1.56}{2} = 2.4\) min/year. Of course, taking the average is not strictly necessary, but it would often give a better estimate. In the case when the increments ∆x are the same to the left and right of our point, averaging the two values is the same as computing the slope between those two neighbouring points, in this case \(\frac{5.15 - 4.75}{57 - 47} = 0.04\) hour/year = 2.4 min/year.

Exercise 4. The table below shows the energy economy E, in miles/kWh, at various speeds S, in mph, for the 2022 Audi GT RS. If E = f(S), estimate the derivatives f′(50), f′(60), and f′(75).

S (mph):         50    55    60    65    70    75
E (miles/kWh):   3.65  3.40  3.30  2.95  2.70  2.30

When derivatives do not exist. It is important to remember that derivatives do not always exist, and here are some examples when the derivative f′(a) is not defined.

• If the function is not continuous at a point x = a then the derivative f′(a) does not exist, as in the figure on the left at x = 0 where the function has a jump.
• If the function has a corner (also called a kink) at x = a then the derivative f′(a) does not exist, as in the case of y = |x| in the middle figure, because the slope on the right of x = 0 is different than on the left of x = 0.
• A less common example is in the figure on the right, where the function \(y = x \sin(\frac{1}{x})\) keeps fluctuating between the two lines y = x and y = −x as x approaches 0 so, again, there is no tangent line at x = 0.

Exercise. Draw two examples of graphs when derivatives do not exist, and give one example of a formula y = f(x) when a derivative does not exist. Specify at what points x the derivative is not defined and explain why.

Derivatives of inverse functions. Suppose that a function y = f(x) is invertible, so \(x = f^{-1}(y)\). In the figure on the right, we flipped the figure at the beginning of this section around the diagonal, so now it shows the inverse function \(x = f^{-1}(y)\). Notice that the x-axis and y-axis switched, b = f(a) is now on the horizontal axis, and the slope of the tangent line at y = b is the derivative of this inverse function \(f^{-1}\) at the point y = b, which is \((f^{-1})'(b)\). We will study how to calculate derivatives later on, including derivatives of inverse functions, but for now let us practice interpreting the meaning of this derivative \((f^{-1})'(b)\). Since the roles of x and y switch for inverse functions, we can say:

If ∆y is small then between y = b and y = b + ∆y the output \(x = f^{-1}(y)\) will change by approximately \((f^{-1})'(b) \cdot \Delta y\).

Again, this is just another way to state that \((f^{-1})'(b) \approx \frac{\Delta x}{\Delta y}\), where the increments ∆x and ∆y switched their roles, but should still be computed for two points close to \((b, f^{-1}(b)) = (b, a)\).

Example 5. The temperature of a cup of coffee is decreasing from 90°C at time t = 0 min according to the function T = f(t). Suppose that \((f^{-1})'(70) = -0.8\). What are the units of 70 and −0.8, and what is the practical interpretation of this derivative?

Solution: The inverse function is \(t = f^{-1}(T)\) so 70 must be in °C and −0.8 must be in min/°C, since this derivative has units of \(\frac{\Delta t}{\Delta T}\).
The meaning of this derivative is that if the coffee temperature decreased from 70°C to 65°C then the time it took is approximately −0.8 · (65 − 70) = 4 min.

Exercise 5. Suppose that r = f(t) is the number of centimetres of rainfall since midnight, where t is in hours. Suppose that the rainfall accumulated by 6 am is 12 cm. If f(t) is strictly increasing, is it invertible? Suppose that \((f^{-1})'(12) = 0.5\). What are the units of 12 and 0.5, and what is the practical interpretation of this derivative?

Example 6. Let T = f(A) be the average finish time for women of age A in the 2009 New York marathon from Example 4 above. Until about the age of 40 this function is not monotone, so not invertible, but if we restrict the domain to ages of 40 and above then it looks increasing and invertible. Estimate the derivative \((f^{-1})'(4.88)\) and give its units.

Solution: Since \(A = f^{-1}(T)\), the units of the derivative will be the units of \(\frac{\Delta A}{\Delta T}\), so year/hour. From the table we see that \(f^{-1}(4.88) = 52\), so we can use the nearby points (4.75, 47) and (5.15, 57) to estimate the slope. Notice how we changed the roles of A and T and now write the average time T first, because it is the input of the inverse function. Since the increments between those two points are ∆A = 57 − 47 = 10 years and ∆T = 5.15 − 4.75 = 0.4 hours, our estimate of the derivative is

\[
(f^{-1})'(4.88) \approx \frac{\Delta A}{\Delta T} = \frac{10}{0.4} = 25 \text{ year/hour}.
\]

In Example 4 we found that f′(52) ≈ 0.04 hour/year, which is exactly the reciprocal, \(0.04 = \frac{1}{25}\). This is not surprising because for the inverse function we used \(\frac{\Delta A}{\Delta T}\) instead of \(\frac{\Delta T}{\Delta A}\) for the original function T = f(A). Remember this example when studying later on how to compute derivatives of inverse functions, which will be based on the formula:

\[
b = f(a) \implies (f^{-1})'(b) = \frac{1}{f'(a)}.
\]

Exercise 6. Let E = f(S) be the energy economy function from Exercise 4 above. Estimate the derivative \((f^{-1})'(3.30)\) and give its units.
How does it relate to f′(60) in Exercise 4?

Answer to Exercise 1. The meaning of f′(100) = 4.5 is that, if the bakery spends $101 instead of $100 on monthly advertisement then its average sales will increase approximately by 4.5 · 1 = 4.5 dollars. The units of f′(100) are $/$. One can also cancel the $ units and think of the derivative as a unitless quantity.

Answer to Exercise 2. It looks like the slope of the line connecting the values at 55 mph and 65 mph would be a good approximation for the slope of the function at 60 mph. Since ∆S = 10 mph and ∆E = 3.36 − 4.02 = −0.66 miles/kWh, the slope of this line is \(\frac{\Delta E}{\Delta S} = -0.066\) (miles/kWh)/mph. Since mph is miles/hour, we can cancel miles in the units and use h/kWh as the units of this derivative. However, when interpreting its meaning we will keep using the increments of the original variables S and E, which have units mph and miles/kWh, so from this point of view there is no need to simplify the units. Again, the above calculation was only an approximation of the derivative, but if indeed f′(60) = −0.066 (miles/kWh)/mph then its meaning is the following: driving a car at the speed of 61 mph decreases the energy efficiency approximately by 0.066 miles/kWh compared to driving it at 60 mph.

Answer to Exercise 3. Since \(\cos(\frac{\pi}{4}) = \frac{1}{\sqrt{2}}\) and the derivative of cosine at \(\frac{\pi}{4}\) is \(-\sin(\frac{\pi}{4}) = -\frac{1}{\sqrt{2}}\), the tangent line is \(y = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}(x - \frac{\pi}{4})\). When \(x = \frac{\pi}{5}\), the tangent line gives \(y = \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}(\frac{\pi}{5} - \frac{\pi}{4}) = 0.818\ldots\). The actual value is \(\cos(\frac{\pi}{5}) = 0.809\ldots\), and the difference is about 0.009.

Answer to Exercise 4. f′(50) ≈ −0.05, f′(75) ≈ −0.08, and f′(60) ≈ −0.045. In the last one, the average was used. The units are (miles/kWh)/mph.

Answer to Exercise 5. Since \(t = f^{-1}(r)\), 12 is in cm and 0.5 is in hour/cm.
The interpretation of \((f^{-1})'(12) = 0.5\) is that, between 12 cm and 12.1 cm of rainfall accumulation it should take approximately 0.5 · 0.1 = 0.05 hours = 3 min. Since we know that 12 cm accumulated at 6 am, we can say that at 6 am it will take approximately 3 more minutes to accumulate another 1 mm of rainfall.

Answer to Exercise 6. \((f^{-1})'(3.30) \approx -22.22\) mph/(miles/kWh). This is the reciprocal of f′(60) from Exercise 4, because f(60) = 3.30.

2.2 Formal definition of derivative

In this section, we will translate the zooming in procedure we used to define the tangent line and its slope into formulas that we will later use to actually compute derivatives. Along the way, we will also discuss secant lines and their slopes, which have an important meaning of the average rate of change.

Secant lines and average rate of change. When we estimated the slope f′(a) of the tangent line, we used the ratio \(\frac{\Delta y}{\Delta x}\) of the increments of the input and output of our function y = f(x), but we said that this approximation works well only if we zoom in, which means that the two points should be pretty close to each other. It turns out that this ratio \(\frac{\Delta y}{\Delta x}\) has an important meaning and a special name even if the two points are not close to each other. In the figure on the right we pick two points (a, f(a)) and (b, f(b)) on the graph of the function y = f(x) and draw a line through those points. This line is called a secant line and its slope

\[
\frac{\Delta y}{\Delta x} = \frac{f(b) - f(a)}{b - a}
\]

is called the average rate of change of the function f(x) on the interval [a, b].

• Average rate of change has the same units and the same general meaning as the derivative, because the above ratio \(\frac{\Delta y}{\Delta x}\) describes how much the output changes, f(b) − f(a), relative to how much the input changes, b − a.
• To understand why this rate of change is called average, think that the function f(x) represents the position of a car at time x.
The car might travel from position f(a) to position f(b) between times a and b with constantly changing velocity, but if it travelled with constant velocity v then v must be \(\frac{\text{distance}}{\text{time}} = \frac{f(b) - f(a)}{b - a}\). The secant line represents a car travelling at constant speed from point f(a) to point f(b) between times a and b.
• In specific examples, when “rate of change” has a specific name then we can replace it by that name. For example, in the car example, we can say “average velocity” instead of “average rate of change”.

Example 1. What is the average rate of change of cos(x) on the interval [0, π]?

Solution: Because cos(0) = 1 and cos(π) = −1, the average rate of change equals \(\frac{\cos(\pi) - \cos(0)}{\pi - 0} = \frac{-1 - 1}{\pi - 0} = -\frac{2}{\pi} = -0.6366\ldots\).

Exercise 1. What is the average rate of change of \(e^x\) on the interval [a, a + 1]?

Example 2. Draw a graph of any concave down function on some interval [a, b] and compare the derivatives f′(a), f′(b) at the endpoints and the average rate of change \(\frac{f(b) - f(a)}{b - a}\).

Solution: One example of a graph of a concave down function is in the figure on the right. We can see that, as we move left to right, the slope of the tangent line is getting smaller and smaller. This means that f′(a) > f′(b), and the average rate of change is somewhere in between:

\[
f'(a) > \frac{f(b) - f(a)}{b - a} > f'(b).
\]

What happens if the function is concave up? See https://youtu.be/U7GajesEPCo.

Exercise 2. For the functions in the two figures, compare the following quantities: 0, f′(1), f(3) − f(2), and f′(4). Hint: 3 − 2 = 1.

Definition of derivative. Until now we have thought about the derivative f′(a) as the slope of the tangent line or the rate of change near the point x = a, but we have justified this informally by observing that a function looks almost like a straight line if we zoom in close enough. How can we turn this zooming in procedure into something more formal that can be used to calculate f′(a)?
We can notice that:
• If we move the point $b$ closer and closer to $a$, the slope $\frac{f(b) - f(a)}{b - a}$ of the secant line will get closer and closer to the slope of the tangent line $f'(a)$.
This is a geometric definition of the derivative. The process of moving $b$ closer and closer to $a$ should remind us of the concept of taking a limit and, using the language of limits, we can write
$$f'(a) = \lim_{b \to a} \frac{f(b) - f(a)}{b - a}.$$
Another way to write this is to rename the increment $b - a$ as $h$, so $b = a + h$, and let this increment $h$ become smaller and smaller,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
This is an algebraic definition of the derivative that translates the above geometric definition into formulas. For example, what is the derivative of a constant function $y = f(x) = c$? Since its graph is a horizontal line, the slope is equal to 0 everywhere, so $f'(a) = 0$. Now we can also see this using the algebraic definition,
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0.$$
Let us take a look at a few more examples of using this definition.

Example 3. Write down and simplify the definition of the derivative of $y = e^x$ at $x = 2$.
Solution: Using the above formula and properties of exponentials, the derivative is
$$\lim_{h \to 0} \frac{e^{2+h} - e^2}{h} = \lim_{h \to 0} \frac{e^2 e^h - e^2}{h} = \lim_{h \to 0} \frac{e^2 (e^h - 1)}{h} = e^2 \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
In the last step, we took the factor $e^2$ outside of the limit, because it is just a constant that does not depend on $h$. Here, we practiced using the definition of the derivative and simplified it a little bit, but we will come back to the last limit in a second.

Exercise 3. If we know that $f(1) = 0$, write down and simplify the definition of the derivative of $y = f(\cos(x))$ at $x = 0$.

Example 4. Write down and compute the derivative of $y = x^2$ at $x = 1$.
Solution: Before taking the limit, let us first simplify the slope of the secant line,
$$\frac{(1 + h)^2 - 1^2}{h} = \frac{(1 + 2h + h^2) - 1}{h} = \frac{2h + h^2}{h} = 2 + h.$$
When $h$ gets small, this slope approaches 2 because $2 + h \to 2 + 0 = 2$, so the derivative of $y = x^2$ at $x = 1$ is 2.

Exercise 4. Write down and compute the derivative of $y = x^3$ at $x = 1$. Hint: you can use that $(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3$.

Example 5. Write down and compute the derivative of $y = \sqrt{x}$ at $x = 9$.
Solution: We will use a special trick of multiplying and dividing by the so called conjugate and then use the identity $(a - b)(a + b) = a^2 - b^2$ to simplify the numerator, $(\sqrt{9 + h} - \sqrt{9})(\sqrt{9 + h} + \sqrt{9}) = (\sqrt{9 + h})^2 - (\sqrt{9})^2 = (9 + h) - 9 = h$:
$$\frac{\sqrt{9 + h} - \sqrt{9}}{h} = \frac{\sqrt{9 + h} - \sqrt{9}}{h} \cdot \frac{\sqrt{9 + h} + \sqrt{9}}{\sqrt{9 + h} + \sqrt{9}} = \frac{h}{h(\sqrt{9 + h} + \sqrt{9})} = \frac{1}{\sqrt{9 + h} + \sqrt{9}}.$$
When we take the limit $h \to 0$, we get $\frac{1}{\sqrt{9 + 0} + \sqrt{9}} = \frac{1}{6}$.

Exercise 5. Write down and compute the derivative of $y = \frac{1}{x}$ at $x = 3$.

Derivative as a function. If we can compute the derivative $f'(a)$ for all points $x = a$ where the derivative exists, then we can think of the derivative as a new function $y = f'(x)$. In the examples and exercises above, instead of choosing some specific value of $a$ to compute the derivative $f'(a)$, such as $a = 2, 1, 9$ or $3$, we could have chosen an arbitrary $x$ and the same calculations would have given us $f'(x)$. Let us see how this works on a couple of examples.

Example 6. Show that $(e^x)' = e^x$.
Solution: Setting up the derivative using the definition is the same as above:
$$(e^x)' = \lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h \to 0} \frac{e^x e^h - e^x}{h} = \lim_{h \to 0} \frac{e^x (e^h - 1)}{h} = e^x \cdot \lim_{h \to 0} \frac{e^h - 1}{h}.$$
Let us plug in smaller and smaller values of $h$ to see what number $\frac{e^h - 1}{h}$ approaches. For example, $\frac{e^{0.001} - 1}{0.001} = 1.0005\ldots$, $\frac{e^{0.0001} - 1}{0.0001} = 1.00005\ldots$, $\frac{e^{0.00001} - 1}{0.00001} = 1.000005\ldots$. We can see that this gets closer and closer to 1, so
$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1$$
and this shows what we wanted, $(e^x)' = e^x$. From now on we no longer need to calculate the derivative of $e^x$ at a specific point $a$, since we have a formula that works for all $x$.

Comment.
The truth is that, although we could see using a calculator that the above limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ was equal to 1, Euler's number $e = 2.718281828\ldots$ is actually chosen in such a way that this limit is 1. If you recall, this limit was mentioned at the end of Section 1.9 as one of the famous limits. You can learn more in the footnote links³ about where Euler's number $e$ comes from and how its definition implies the above limit. In Chapter 1, we also mentioned that $e$ is a very special base of an exponential function and what makes it special is exactly that the derivative of the function $y = e^x$ is the function $e^x$ itself.

³ https://youtu.be/sbLWLvSfvwk, https://youtu.be/IX1cZHz-bc0.

Exercise 6. In the above examples and exercises replace a specific $x = 1$, $9$ and $3$ by a general $x$ and show that
$$(x^2)' = 2x, \quad (x^3)' = 3x^2, \quad (\sqrt{x})' = \frac{1}{2\sqrt{x}} \quad \text{and} \quad \left(\frac{1}{x}\right)' = -\frac{1}{x^2}.$$
All the functions in Exercise 6 are power functions of the form $y = x^n$ for $n = 2$, $3$, $\frac{1}{2}$ and $-1$, and all the derivatives are given by the following power rule:
$$(x^n)' = n x^{n-1}.$$

Example 7. Using the power rule, compute the derivative of $\sqrt{x}$. What is the domain of this derivative function?
Solution: Using the power rule with $n = \frac{1}{2}$,
$$(\sqrt{x})' = (x^{1/2})' = \frac{1}{2} x^{1/2 - 1} = \frac{1}{2} x^{-1/2} = \frac{1}{2\sqrt{x}}.$$
Of course, the power rule applies only where the function and the derivative are well defined, so the domain of this derivative is $x > 0$.

Exercise 7. Using the power rule, compute the derivative of $\frac{1}{x^2}$. What is the domain of this derivative function?

Example 8. Given any power function $y = cx^n$, show that near any point $x$ in its domain,
$$\frac{\Delta y}{y} \approx n \frac{\Delta x}{x}.$$
If $n = 2$ and $x$ changes by 1%, by what percentage approximately does $y$ change?
Solution: If we move $\Delta x$ and $y$ to the opposite sides of the equation, what we want to show is that
$$\frac{\Delta y}{\Delta x} \approx n \frac{y}{x} = n \frac{c x^n}{x} = c n x^{n-1}.$$
But $c n x^{n-1}$ is the derivative of $y = cx^n$, which by definition of the derivative can be approximated by $\frac{\Delta y}{\Delta x}$.
It means that the above equation is just a rephrasing of the usual meaning of the derivative in the case of power functions. If $n = 2$ and $x$ changes by 1% then $\Delta x = 0.01x$, so $\frac{\Delta x}{x} = 0.01$ and $\frac{\Delta y}{y} \approx 2 \frac{\Delta x}{x} = 0.02$. So the above equation shows that, for a power function $y = cx^2$, if the input $x$ changes by 1% then the output changes by approximately 2%. Of course, we can change 2 to any other power $n$.

Exercise 8. If the radius of the sphere changes by 1%, by approximately what percentage does the volume change?

A list of important derivatives. It is very important to remember the algebraic and geometric definitions of the derivative above, just like it is important to know the practical meaning of the derivative. However, once we learn the derivatives of basic functions and a few rules of differentiation,⁴ we will be able to compute derivatives of pretty much any function in a mechanical way. This is a good time to memorize the following derivatives:
$$(x^n)' = n x^{n-1}, \quad (e^x)' = e^x, \quad (\ln x)' = \frac{1}{x}, \quad (\sin x)' = \cos x, \quad (\cos x)' = -\sin x.$$
We have already explained above the formula $(e^x)' = e^x$, and have checked several special cases of the power rule $(x^n)' = n x^{n-1}$. The derivatives of sine and cosine follow from some trigonometric identities and the famous limits mentioned at the end of Section 1.9; if you are interested you can learn more about it in the footnote link.⁵ The derivative of $\ln x$ and the general case of the power rule (for arbitrary power $n$) will be explained later when we discuss derivatives of inverse functions.

The famous limits at the end of Section 1.9 were used to compute the derivatives of $e^x$, $\sin x$ and $\cos x$, but once we know these derivatives we can reinterpret those limits as derivatives of these functions at zero.

Example 9. Compute the limit $\lim_{h \to 0} \frac{e^h - 1}{h}$ using that $(e^x)' = e^x$.
Solution: Since $\frac{e^h - 1}{h} = \frac{e^{0+h} - e^0}{h}$, the limit $\lim_{h \to 0} \frac{e^{0+h} - e^0}{h}$ is the definition of the derivative of $e^x$ at $x = 0$. Since $(e^x)' = e^x$, the derivative at $x = 0$ is equal to $e^0 = 1$, so the limit is 1.

Exercise 9. Compute the limit $\lim_{h \to 0} \frac{\sin h}{h}$ using that $(\sin x)' = \cos x$, and the limit $\lim_{h \to 0} \frac{\ln(1 + h)}{h}$ using that $(\ln x)' = \frac{1}{x}$.

First rules of differentiation. If the function $f(x)$ changes by 5 between $x = 0$ and $x = 1$ and another function $g(x)$ changes by 7 between $x = 0$ and $x = 1$ then the sum $f(x) + g(x)$ will change by $5 + 7 = 12$. This means that if we add two functions $f(x) + g(x)$ then the increment $\Delta y$ of the sum will be equal to the sum of their increments. Similarly, if we subtract two functions $f(x) - g(x)$ then the increment $\Delta y$ of the difference will be equal to the difference of their increments. Since we compute derivatives by looking at $\frac{\Delta y}{\Delta x}$ and then taking limits, this means that
$$\big(f(x) + g(x)\big)' = f'(x) + g'(x), \quad \big(f(x) - g(x)\big)' = f'(x) - g'(x).$$

⁴ To differentiate a function means to take its derivative, and differentiation means taking a derivative.
⁵ https://youtu.be/buqwRTJcEmw.

If the function $f(x)$ changes by 5 between $x = 0$ and $x = 1$ then the function $3f(x)$ will change by $3 \cdot 5 = 15$. This means that if we multiply our function by a constant $c$ then the increment $\Delta y$ will be multiplied by $c$, which implies that
$$\big(c f(x)\big)' = c f'(x).$$
The above two rules together are called the linearity of differentiation. They can also be called the sum rule, difference rule and constant multiple rule.

Example 10. Compute the derivative of $f(x) = \frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}}$.
Solution: First, using that $x^a x^b = x^{a+b}$ and $\frac{x^a}{x^b} = x^{a-b}$, we can simplify,
$$\frac{x(2x + 7\sqrt{x}) - 1}{x^{5/2}} = \frac{2x^2 + 7x^{3/2} - 1}{x^{5/2}} = 2x^{2 - 5/2} + 7x^{3/2 - 5/2} - x^{-5/2} = 2x^{-1/2} + 7x^{-1} - x^{-5/2}.$$
Then, using the above two rules and the power rule,
$$(2x^{-1/2} + 7x^{-1} - x^{-5/2})' = 2(x^{-1/2})' + 7(x^{-1})' - (x^{-5/2})'$$
$$= 2(-1/2)x^{-1/2 - 1} + 7(-1)x^{-1 - 1} - (-5/2)x^{-5/2 - 1}$$
$$= -x^{-3/2} - 7x^{-2} + (5/2)x^{-7/2} = -\frac{1}{x^{3/2}} - \frac{7}{x^2} + \frac{5}{2x^{7/2}}.$$
The last step is not strictly necessary and the answer could be left in the form $-x^{-3/2} - 7x^{-2} + (5/2)x^{-7/2}$. Both the original function and the derivative have domain $x > 0$.

Exercise 10. Compute the derivative of $f(x) = \frac{\sqrt{x}(1 - x) + \frac{2}{x}}{x^{5/2}}$. What is the tangent line to this function at $x = 1$?

Example 11. Compute the derivative of $f(x) = \cos x + 2\ln\frac{1}{\sqrt{x}} - 3e^x$.
Solution: First, let us simplify
$$2\ln\frac{1}{\sqrt{x}} = 2\ln(x^{-1/2}) = 2(-1/2)\ln(x) = -\ln(x)$$
and then use the rules of differentiation,
$$\big(\cos x - \ln(x) - 3e^x\big)' = (\cos x)' - (\ln(x))' - 3(e^x)' = -\sin x - \frac{1}{x} - 3e^x.$$

Exercise 11. Compute the derivative of $f(x) = -2\sin x - \ln(x^2) + 5e^x$.

Common notation for derivatives. Given a function $y = f(x)$, its derivative can be written in a number of ways, for example,
$$f'(x), \quad \frac{df}{dx}, \quad \frac{d}{dx} f(x), \quad y'(x), \quad \frac{dy}{dx}.$$
The last two, $y'(x)$ and $\frac{dy}{dx}$, can be used for any function, but it must be clear from the context which specific function $f(x)$ we are talking about. If we want to write a derivative at some specific point $x = a$ then we can use the following notation:
$$f'(a), \quad \frac{df}{dx}\Big|_{x=a}, \quad \frac{d}{dx} f(x)\Big|_{x=a}, \quad y'(a), \quad \frac{dy}{dx}\Big|_{x=a}.$$
For example, $\frac{d}{dx}(1 + x^2 + \cos x)\big|_{x=1}$ means that we first want to compute the derivative of the function $y = 1 + x^2 + \cos x$ and then plug in $x = 1$. We could also write $y'(1)$ since we know what the function is. However, we cannot write $(1 + 1^2 + \cos 1)'$, because it looks like we are taking the derivative of a constant $1 + 1^2 + \cos 1 = 2.5403\ldots$, which is zero.

Higher order derivatives. Since the derivative $f'(x)$ of a function $y = f(x)$ is a function itself, we can also compute its derivative $(f'(x))'$, which is called the second derivative of $f(x)$ and is denoted $f''(x)$.
If we take another derivative, we will get the third derivative $f'''(x)$. We can continue taking higher order derivatives as long as they are well defined. The derivative of order $n$ can be written in a number of ways, for example,
$$f^{(n)}(x), \quad \frac{d^n f}{dx^n}, \quad \frac{d^n}{dx^n} f(x), \quad y^{(n)}(x), \quad \frac{d^n y}{dx^n}.$$
As before, the last two can be used for any function, but it must be clear from the context which specific function $f(x)$ we are talking about. If we want to write a derivative at some specific point $x = a$ then we can use the following notation:
$$f^{(n)}(a), \quad \frac{d^n f}{dx^n}\Big|_{x=a}, \quad \frac{d^n}{dx^n} f(x)\Big|_{x=a}, \quad y^{(n)}(a), \quad \frac{d^n y}{dx^n}\Big|_{x=a}.$$
For the first three derivatives, when $n = 1$, $2$ or $3$, instead of writing $f^{(n)}$ we write $f'$, $f''$, $f'''$ and, instead of writing $y^{(n)}$, we write $y'$, $y''$, $y'''$. Of course, the linearity rules of differentiation apply to higher order derivatives because they apply at each step,
$$\frac{d^n}{dx^n}\big(f(x) \pm g(x)\big) = f^{(n)}(x) \pm g^{(n)}(x), \quad \frac{d^n}{dx^n}\big(c f(x)\big) = c f^{(n)}(x).$$

Example 12. Compute the eighth derivative of $f(x) = x^7 - 4x^6 + x^5 - 2x^2 + 1$.
Solution: First of all, by linearity of differentiation,
$$\frac{d^8}{dx^8}(x^7 - 4x^6 + x^5 - 2x^2 + 1) = \frac{d^8}{dx^8} x^7 - 4\frac{d^8}{dx^8} x^6 + \frac{d^8}{dx^8} x^5 - 2\frac{d^8}{dx^8} x^2 + \frac{d^8}{dx^8} 1.$$
Next, the derivative of a constant 1 is 0, so all higher derivatives of a constant will be zero. Every time we take a derivative of a power function, by the power rule, the power will decrease by 1. For example, the derivative of $x^2$ will become $2x$, then $2$, then $0$, so the third and higher derivatives of $x^2$ will be zero. For the same reason, if we take the derivative of $x^7$ eight times it will also become zero, and the same is true for the lower powers $x^6$ and $x^5$. So the answer is zero.

Exercise 12. Compute the fourth derivative of $\cos x$. Can you think of any other function that has the same fourth derivative as $\cos x$?

Answer to Exercise 1. $\frac{e^{a+1} - e^a}{(a + 1) - a} = \frac{e^a e^1 - e^a}{1} = \frac{e^a (e - 1)}{1} = (e - 1)e^a$.

Answer to Exercise 2.
The left figure: $0 < f'(4) < f(3) - f(2) < f'(1)$, because the slope is positive and decreasing as we move left to right, and because $f(3) - f(2)$ is the average rate of change on the interval $[2, 3]$. The right figure: $f'(1) < f(3) - f(2) < f'(4) < 0$, because the slope is negative and increasing as we move left to right, and because $f(3) - f(2)$ is the average rate of change on the interval $[2, 3]$.

Answer to Exercise 3. $\lim_{h \to 0} \frac{f(\cos(0 + h)) - f(\cos(0))}{h} = \lim_{h \to 0} \frac{f(\cos(h))}{h}$ because the second term is $f(\cos(0)) = f(1) = 0$.

Answer to Exercise 4. Before taking the limit, let us first simplify the slope of the secant line,
$$\frac{(1 + h)^3 - 1^3}{h} = \frac{(1 + 3h + 3h^2 + h^3) - 1}{h} = \frac{3h + 3h^2 + h^3}{h} = 3 + 3h + h^2.$$
When $h$ gets small, this slope approaches 3 because $3 + 3h + h^2 \to 3 + 0 + 0 = 3$, so the derivative of $y = x^3$ at $x = 1$ is 3.

Answer to Exercise 5. The answer is $-\frac{1}{9}$ because
$$\frac{\frac{1}{3+h} - \frac{1}{3}}{h} = \frac{\frac{3 - (3 + h)}{(3 + h)3}}{h} = \frac{-\frac{h}{(3 + h)3}}{h} = -\frac{1}{(3 + h)3} \to -\frac{1}{(3 + 0)3} = -\frac{1}{9}.$$

Answer to Exercise 6. We will not repeat all the calculations and only show the case of $\sqrt{x}$:
$$\frac{\sqrt{x + h} - \sqrt{x}}{h} = \frac{\sqrt{x + h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x + h} + \sqrt{x}}{\sqrt{x + h} + \sqrt{x}} = \frac{(x + h) - x}{h(\sqrt{x + h} + \sqrt{x})} = \frac{1}{\sqrt{x + h} + \sqrt{x}}.$$
When we take the limit $h \to 0$, we get $\frac{1}{\sqrt{x + 0} + \sqrt{x}} = \frac{1}{2\sqrt{x}}$. Of course, this only works when $x > 0$, so the derivative $(\sqrt{x})'$ exists only when $x > 0$.

Answer to Exercise 7. Using the power rule with $n = -2$, $\left(\frac{1}{x^2}\right)' = (x^{-2})' = -2x^{-2 - 1} = -2x^{-3} = -\frac{2}{x^3}$. The function and derivative are defined when $x \neq 0$.

Answer to Exercise 8. Volume is proportional to the radius cubed, $V = \frac{4\pi}{3} r^3$, so this is just like Example 8 with $n = 3$, and the volume will change by approximately 3%.

Answer to Exercise 9. Since $\frac{\sin h}{h} = \frac{\sin(0 + h) - \sin(0)}{h}$, the limit $\lim_{h \to 0} \frac{\sin(0 + h) - \sin(0)}{h}$ is the definition of the derivative of $\sin x$ at $x = 0$. Since $(\sin x)' = \cos x$, the derivative at $x = 0$ is equal to $\cos(0) = 1$, so the first limit is 1.
Since $\frac{\ln(1 + h)}{h} = \frac{\ln(1 + h) - \ln(1)}{h}$, the limit $\lim_{h \to 0} \frac{\ln(1 + h) - \ln(1)}{h}$ is the definition of the derivative of $\ln x$ at $x = 1$. Since $(\ln x)' = \frac{1}{x}$, the derivative at $x = 1$ is equal to 1, so the second limit is also 1.

Answer to Exercise 10. First we simplify the function as $x^{-2} - x^{-1} + 2x^{-7/2}$ and then take the derivative,
$$(x^{-2} - x^{-1} + 2x^{-7/2})' = (x^{-2})' - (x^{-1})' + 2(x^{-7/2})'$$
$$= (-2)x^{-2 - 1} - (-1)x^{-1 - 1} + 2(-7/2)x^{-7/2 - 1}$$
$$= -2x^{-3} + x^{-2} - 7x^{-9/2} = -\frac{2}{x^3} + \frac{1}{x^2} - \frac{7}{x^{9/2}}.$$
Since $f(1) = 2$ and $f'(1) = -8$, the tangent line is $y = 2 - 8(x - 1) = 10 - 8x$.

Answer to Exercise 11. $-2\cos x - \frac{2}{x} + 5e^x$.

Answer to Exercise 12. Consecutive derivatives of $\cos x$ will be $-\sin x$, $-\cos x$, $\sin x$ and $\cos x$. So the fourth derivative of $\cos x$ is $\cos x$ itself. Any function of the form $\cos x + ax^3 + bx^2 + cx + d$ will also have the fourth derivative equal to $\cos x$, because all the power functions will disappear after taking four derivatives, just like in Example 12.

2.3 Derivatives and graphs

Because the derivative $f'(x)$ represents the rate of change of the function $y = f(x)$ near a point $x$ and, at the same time, it is the slope of the tangent line at this point, we can relate the behaviour of the rate of change to the behaviour of the graph. You can watch the footnote link for a quick summary.⁶
• If $f'(x) > 0$ then the slope is positive and the function is increasing.
• If $f'(x) < 0$ then the slope is negative and the function is decreasing.
• If $f'(x) = 0$ then the tangent line is horizontal. Such points are examples of the so called critical points, but we will discuss them in later sections when we study optimization problems.
The sign of the second derivative $f''(x)$ tells us whether the first derivative $f'(x)$ is increasing or decreasing.
• If $f''(x) > 0$ then the slope is increasing and the function is concave up.
• If $f''(x) < 0$ then the slope is decreasing and the function is concave down.
Basic building blocks are the following four examples.
To help us talk about these four cases, let us imagine that the function $y = f(t)$ describes a position (or coordinate) $y$ of a car moving along a straight line as a function of time $t$. The straight line has a positive and negative direction, so the car can move forward or backward. The derivative $f'(t)$ represents velocity at time $t$, which can be positive (if the car is moving forward) or negative (if the car is moving backward). The absolute value of the velocity $|f'(t)|$ is called speed. The second derivative $f''(t)$ represents acceleration.

(a) The graph is increasing and concave up. If $f'(x) > 0$ then velocity is positive and the car is moving in the positive direction. If $f''(x) > 0$ then velocity is increasing and, in this case, the car is moving faster and faster. The fact that the slope is increasing means that the graph is concave up.

(b) The graph is increasing and concave down. If $f'(x) > 0$ then velocity is positive and the car is moving in the positive direction. If $f''(x) < 0$ then velocity is decreasing and, in this case, the car is moving slower and slower. The fact that the slope is decreasing means that the graph is concave down.

⁶ https://youtu.be/tCs5DK951Js.

(c) The graph is decreasing and concave up. If $f'(x) < 0$ then velocity is negative and the car is moving in the negative direction. If $f''(x) > 0$ then velocity is increasing and, again, increasing slope means that the graph is concave up. However, in this case, the speed is decreasing so the car is moving slower and slower. That is because, if the velocity increases from $-3$ to $-1$ then the speed decreases from 3 to 1.

(d) The graph is decreasing and concave down. If $f'(x) < 0$ then velocity is negative and the car is moving in the negative direction. If $f''(x) < 0$ then velocity is decreasing and, again, decreasing slope means that the graph is concave down. However, in this case, the speed is increasing so the car is moving faster and faster.
That is because, if the velocity decreased from $-1$ to $-3$ then the speed increased from 1 to 3.⁷

⁷ https://youtu.be/6NaUJ6OGcLU.

Example 1. A cup of hot coffee left at room temperature will cool down, but the rate of cooling will slow down. Translate this into a statement about derivatives and the graph of some function.
Solution: The function here is the temperature $T(t)$ of a cup of coffee as a function of time. Cooling down means that $T'(t) < 0$, and the rate of cooling slowing down means that $T''(t) > 0$. The graph of $T(t)$ will be decreasing and concave up.

Exercise 1. Between January and May, 2021, the decline in Lake Mead water levels has accelerated. Translate this into a statement about derivatives and the graph of some function.

The next example and exercise will refer to the following two figures.

Example 2. In the figure on the left, given the graph of $y = f(x)$ (solid black line), determine which curve is the graph of its derivative $f'(x)$, (a), (b), or (c).
Solution: The function is decreasing up to about $x = -4.5$ and right after that it starts increasing. This means that the derivative should be negative up to $-4.5$, so the graph should be below the x-axis, and right after $-4.5$ it should become positive, so the graph should be above the x-axis. The only graph with such behaviour is (b). Similar behaviour happens at $x = 0$ and about $x = 4.5$, where the function $f(x)$ changes direction and (b) changes sign by crossing the x-axis. Such points are called critical points. Notice also that when (b) is decreasing (between about $-2.5$ and $2.5$), the function $f(x)$ is concave down (the slope is decreasing), and when (b) is increasing, the function $f(x)$ is concave up (the slope is increasing). These points where the derivative changes direction (here about $-2.5$ and $2.5$) and the original function changes concavity are called inflection points.

Exercise 2.
In the figure on the right, given the graph of $y = f(x)$ (solid black line), determine which curve is the graph of its derivative $f'(x)$, (a), (b), or (c). Where are the critical points and inflection points of $f(x)$?

The next two examples will refer to the following two figures. Notice that the solid black line is the graph of the derivative $y = f'(x)$, not the original function.

Example 3. In the figure on the left, given the graph of $y = f'(x)$ (solid black line), determine which curve is the graph of $f(x)$, (a), (b), or (c).
Solution: The derivative is positive between about $-4$ and $2$ and negative outside of that interval, so the function $f(x)$ should be increasing between $-4$ and $2$ and decreasing outside. The answer is (b). The derivative changes direction at about $x = -1.2$, so the function (b) has an inflection point there, where it switches from concave up to concave down.

Exercise 3. In the figure on the right, given the graph of $y = f'(x)$ (solid black line), determine which curve is the graph of $f(x)$, (a), (b), or (c).

Example 4. Search "pole vault" on YouTube and watch some videos. Which of the two graphs below more accurately describes the horizontal position $x = x(t)$ of the vaulter as a function of time $t$?
Solution: During the approach before the jump, the athlete is accelerating, so the graph is increasing and concave up. Starting from the plant and take-off stage, the vaulter starts moving in the vertical direction but the horizontal movement slows down, and the distance between take-off and landing is relatively small. So the graph on the left is a more accurate description of the horizontal position $x(t)$ of the vaulter over time.

Exercise 4. Sketch a graph of the position of a tennis ball along the tennis court $\ell = f(t)$ as a function of time $t$ when the players hit the ball back and forth from the baseline.⁸

Example 5.
A skydiver free falling in a belly-to-earth (face down) position will approach the so called terminal speed of about 56 m/s. The speed is increasing due to gravity, but it is increasing slower and slower due to the drag force (air resistance). Sketch the graph of velocity as a function of time when the distance to the skydiver is measured (a) from the height of the jump or (b) from the ground.
Solution: (a) When the distance $f(t)$ to the skydiver is measured from the original point of the jump, it is increasing and the derivative (velocity) $f'(t)$ is positive. Also, velocity will be increasing, concave down (because it is increasing slower and slower due to the drag force), and it will have a horizontal asymptote equal to the terminal speed.
(b) When the height $f(t)$ of the skydiver is measured from the ground, it is decreasing and the derivative $f'(t)$ is negative. In this case the velocity is equal to minus the speed, so velocity will be decreasing, concave up (because it is decreasing slower and slower due to the drag force), and it will have a horizontal asymptote equal to minus the terminal speed. The two cases differ only by a minus sign, so the graph is simply flipped around the horizontal axis.

⁸ Photo from https://www.pexels.com/photo/two-person-playing-tennis-1619860/

Exercise 5. Sketch a graph of a function that satisfies the following properties:
• $\lim_{x \to -\infty} f(x) = 2$
• $f(-3) = 2$
• $\lim_{x \to -3} f(x)$ does not exist
• $f'(0) < \frac{f(1) - f(-1)}{1 - (-1)}$
• $f(x)$ has a vertical asymptote at $x = 2$
• $f(2) = 2$
• $f(x)$ is continuous at $x = 3$
• $f(x)$ is not differentiable at $x = 3$
• $f'(x) > 0$ for $x > 3$
• $f''(x) < 0$ for $x > 3$.

Example 6. Suppose that two functions $y = f(x)$ and $y = g(x)$ are equal at $x = a$, i.e. $f(a) = g(a)$, and $f'(a) < g'(a)$. Which function is bigger immediately to the right of $x = a$?
Solution: We can see from the figure that the graphs of $y = f(x)$ and $y = g(x)$ intersect at $x = a$ and the slope of $g(x)$ is bigger than the slope of $f(x)$ at the point of intersection. As a result, $g(x)$ is bigger immediately to the right of $x = a$.

Exercise 6. Suppose that two functions $y = f(x)$ and $y = g(x)$ are equal at $x = a$, i.e. $f(a) = g(a)$, and $f'(a) < g'(a)$. Which function is bigger immediately to the left of $x = a$?

Example 7. If $f''(x) > 0$ on some interval then $f'(x)$ is ______ and $f(x)$ is ______ on that interval.
Solution: If $f''(x) > 0$ on some interval then $f'(x)$ is increasing and $f(x)$ is concave up on that interval.

Exercise 7. If $f''(x) < 0$ on some interval then $f'(x)$ is ______ and $f(x)$ is ______ on that interval.

The next example and exercise will refer to the following two figures.

Example 8. In the figure above on the left, determine which graph corresponds to $f(x)$, $f'(x)$, and $f''(x)$. Where are the inflection points of $f(x)$, and what happens to $f'(x)$ and $f''(x)$ at those points?
Solution: We can see that at the points $x$ where the dotted blue curve (a) crosses the x-axis, the solid green curve (b) changes direction from increasing to decreasing or vice versa. This means that (a) is the derivative of (b). Similarly, where the solid green curve (b) crosses the x-axis, the dashed red curve (c) changes direction, so (b) is the derivative of (c). This means that (c) is the graph of $y = f(x)$, (b) is the graph of $y = f'(x)$, and (a) is the graph of $y = f''(x)$. Inflection points are where $f'(x)$ changes direction, which is at around $x = 0$, $0.75$, $1.85$, $3.2$. At inflection points $f'(x)$ changes direction, and $f''(x)$ crosses the x-axis.

Exercise 8. In the figure above on the right, determine which graph corresponds to $f(x)$, $f'(x)$, and $f''(x)$. Where are the inflection points of $f(x)$, and what happens to $f'(x)$ and $f''(x)$ at those points? What happens at $x = 0$?

Example 9.
Give an example of a function $f(x)$ such that $f''(0) = 0$ but $x = 0$ is not an inflection point of $f(x)$.
Solution: One example is $f(x) = x^4$ in the figure on the right. In this case, $f'(x) = 4x^3$ and $f''(x) = 12x^2$, so $f''(0) = 0$, but $f'(x)$ does not change direction at $x = 0$ and $f(x)$ does not change concavity at $x = 0$.

Exercise 9. Given $y = f'(x)$, sketch a continuous function $y = f(x)$ in the two figures below. Notice that in the second case, the derivative $f'(x)$ is not defined at $x = 0$, where it has a jump.

Answer to Exercise 1. The function here is the water level $h(t)$ of Lake Mead as a function of time. Water level declining means that $h'(t) < 0$, and the decline accelerating means that $h''(t) < 0$. The graph of $h(t)$ will be decreasing and concave down. Of course, water levels might fluctuate slightly, so we should talk about averages over a certain period of time.

Answer to Exercise 2. The answer is (b). Critical points are about $-4.8$, $-0.4$, $4.2$, where the function $f(x)$ changes direction and $f'(x)$ crosses the x-axis, and inflection points are about $-3.5$ and $1.5$, where the derivative $f'(x)$ changes direction.

Answer to Exercise 3. The answer is (b).

Answer to Exercise 4. https://youtu.be/aewfFlVg-MU

Answer to Exercise 5. You can check one by one that the graph below satisfies all the above properties. Notice that $\frac{f(1) - f(-1)}{1 - (-1)}$ is the slope of the line connecting the points $(-1, f(-1))$ and $(1, f(1))$, which is bigger than $f'(0) = 0$ in the figure. Also, although $f(x)$ has a vertical asymptote at $x = 2$, the dot at $(2, 2)$ indicates that we chose $f(2)$ to be equal to 2; this is a legal move although the function will be discontinuous at $x = 2$. The function $f(x)$ is continuous but not differentiable at $x = 3$ because it has a corner there.

Answer to Exercise 6. $f(x)$ is bigger immediately to the left of $x = a$.

Answer to Exercise 7. If $f''(x) < 0$ on some interval then $f'(x)$ is decreasing and $f(x)$ is concave down on that interval.
Answer to Exercise 8. (a) is the graph of $y = f(x)$, (b) is the graph of $y = f'(x)$, and (c) is the graph of $y = f''(x)$. Inflection points are where $f'(x)$ changes direction, which is at around $x = 1.1$, $2.3$, $3.6$. This is where (b) changes direction, and (c) crosses the x-axis. Notice that $x = 0$ is not an inflection point despite the fact that $f''(0) = 0$, because $f''$ only touches the x-axis but does not cross it and, as a result, $f'(x)$ does not change direction at $x = 0$. So it is possible that $f''(x) = 0$ but $x$ is not an inflection point.

Answer to Exercise 9. Possible sketches are in the figures above. The functions $y = f(x)$ could be shifted vertically, because adding a constant $y = f(x) + c$ does not affect the derivative, $(f(x) + c)' = f'(x)$. In the second figure, the function $f(x)$ has a corner at $x = 0$, so $f'(0)$ is undefined. The derivative jumps from a positive to a negative value, so an increasing function suddenly becomes decreasing (like a ball bouncing off a wall and changing direction suddenly). The derivative is increasing on both sides of $x = 0$, so the function is concave up on both sides. The function is increasing exactly when $f'(x) > 0$, i.e. when the graph of $y = f'(x)$ is above the x-axis.

2.4 Differentiation rules

In this section we will learn how to use four rules of differentiation.
• Product rule: the derivative of the product $f(x)g(x)$ of two functions is
$$\big(f(x)g(x)\big)' = f'(x)g(x) + f(x)g'(x).$$
• Quotient rule: the derivative of the ratio $\frac{f(x)}{g(x)}$ of two functions is
$$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}.$$
• Chain rule: the derivative of the composition $f(g(x))$ of two functions is
$$\big(f(g(x))\big)' = f'(g(x))g'(x).$$
• Inverse function rule: the derivative of the inverse function $f^{-1}(x)$ is
$$\big(f^{-1}(x)\big)' = \frac{1}{f'(f^{-1}(x))}.$$
A more convenient way to phrase the inverse function rule is that, if $b = f(a)$ then $(f^{-1})'(b) = \frac{1}{f'(a)}$.
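As a quick sanity check, the four rules can be compared against a difference quotient from the algebraic definition of the derivative. The following is only a minimal sketch: the sample functions $f(x) = \sin x$ and $g(x) = e^x$, the point $x = 0.7$, and the helper name `numerical_derivative` are illustrative choices that do not come from the text.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient (hypothetical helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative sample functions and their known derivatives.
f, df = math.sin, math.cos
g, dg = math.exp, math.exp
x = 0.7

# Right-hand sides of the product, quotient and chain rules.
product_rule = df(x) * g(x) + f(x) * dg(x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2
chain_rule = df(g(x)) * dg(x)

# Inverse function rule for exp, whose inverse is ln:
# if b = exp(a) then (ln)'(b) = 1 / exp'(a).
a = 0.7
b = math.exp(a)
inverse_rule = 1 / math.exp(a)

checks = {
    "product": (numerical_derivative(lambda t: f(t) * g(t), x), product_rule),
    "quotient": (numerical_derivative(lambda t: f(t) / g(t), x), quotient_rule),
    "chain": (numerical_derivative(lambda t: f(g(t)), x), chain_rule),
    "inverse": (numerical_derivative(math.log, b), inverse_rule),
}
for name, (numeric, formula) in checks.items():
    print(f"{name:8s} numeric = {numeric:.6f}   formula = {formula:.6f}")
```

In each row the two printed numbers should agree to several decimal places. This does not prove the rules, but it is a useful way to catch a sign error or a misplaced factor when applying them by hand.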
Of course, in all these rules we assume that everything is well defined on the right hand side of each equation. For example, we never divide by zero, etc.

Where the formulas come from. Below we will focus on learning how to use these rules, but if you are interested in learning where they come from, all the rules can be derived by simple manipulations from the algebraic definition of the derivative. You can learn more about the derivation of the chain rule⁹ and the inverse function rule¹⁰ in the footnote links. Here we will only show how to derive the product rule.

⁹ https://youtu.be/iJ94gm_-vsE
¹⁰ https://youtu.be/y9jzS-sUeM8

We want to see what happens to the ratio $\frac{\Delta y}{\Delta x}$ when the increment $\Delta x$ gets smaller and smaller and $y = f(x)g(x)$. If $\Delta f$ and $\Delta g$ are the increments of $f$ and $g$ then $f(x + h) = f(x) + \Delta f$, $g(x + h) = g(x) + \Delta g$, and we can rewrite the increment $\Delta y = \Delta(fg)$ of the product as
$$\Delta y = \Delta(fg) = f(x + h)g(x + h) - f(x)g(x) = \big(f(x) + \Delta f\big)\big(g(x) + \Delta g\big) - f(x)g(x) = \Delta f \cdot g(x) + f(x) \cdot \Delta g + \Delta f \cdot \Delta g.$$
After dividing by the increment $\Delta x$ (which we also call $h$),
$$\frac{\Delta y}{\Delta x} = \frac{\Delta f}{\Delta x} g(x) + f(x) \frac{\Delta g}{\Delta x} + \frac{\Delta f}{\Delta x} \Delta g \to f'(x)g(x) + f(x)g'(x) + f'(x) \cdot 0$$
as $\Delta x \to 0$, because the increment $\Delta g$ itself goes to 0. This is exactly the formula in the product rule. The quotient rule can be shown by a similar calculation.

Chain rule. We will start with the chain rule, because it is the most basic building block, and because it will give us a much richer collection of functions to play with when we use the product rule and quotient rule. The chain rule is sometimes called the outside-inside rule, because when we compute the derivative of the composition $f(g(x))$ we first take the derivative of the outside function $f'$, plug in the inside function $g(x)$ to get $f'(g(x))$, and then multiply by the derivative of the inside function $g'(x)$.

Example 1. State what the outside function $f(x)$ and inside function $g(x)$ are, and compute the derivative using the chain rule.
(a) $e^{2x+5}$
(b) $\cos(x^2)$
(c) $\cos^2(x)$
(d) $\ln\sqrt{x}$
(e) $\sqrt{e^{x^2} + \cos(x)}$
(f) $\dfrac{e^{\cos(x)}}{e^{2x}}$

Solution: (a) In $e^{2x+5}$, the outside function is $f(x) = e^x$ and the inside function is $g(x) = 2x + 5$, so $e^{2x+5} = f(g(x))$. Because $f'(x) = e^x$ and $g'(x) = 2$, the chain rule $f'(g(x))g'(x)$ gives $e^{2x+5} \cdot 2 = 2e^{2x+5}$.

(b) In $\cos(x^2)$, the outside function is $f(x) = \cos x$ and the inside function is $g(x) = x^2$, so $\cos(x^2) = f(g(x))$. Because $f'(x) = -\sin x$ and $g'(x) = 2x$, the chain rule $f'(g(x))g'(x)$ gives $-\sin(x^2) \cdot 2x = -2x\sin(x^2)$.

(c) In $\cos^2(x) = (\cos x)^2$, the outside function is $f(x) = x^2$ and the inside function is $g(x) = \cos x$, so $\cos^2(x) = f(g(x))$. Because $f'(x) = 2x$ and $g'(x) = -\sin x$, the chain rule $f'(g(x))g'(x)$ gives $2\cos x \cdot (-\sin x) = -2\cos x \sin x$.

(d) In $\ln\sqrt{x}$, the outside function is $f(x) = \ln x$ and the inside function is $g(x) = \sqrt{x} = x^{1/2}$, so $\ln\sqrt{x} = f(g(x))$. Because $f'(x) = \frac{1}{x}$ and $g'(x) = \frac{1}{2\sqrt{x}}$, the chain rule $f'(g(x))g'(x)$ gives $\frac{1}{\sqrt{x}} \cdot \frac{1}{2\sqrt{x}} = \frac{1}{2x}$. There is a much easier way to compute this derivative if we first simplify the function, $\ln\sqrt{x} = \ln(x^{1/2}) = \frac{1}{2}\ln x$; then the derivative is immediately $\frac{1}{2x}$.

(e) In $\sqrt{e^{x^2} + \cos(x)}$, the outside function is $f(x) = \sqrt{x} = x^{1/2}$ and the inside function is $g(x) = e^{x^2} + \cos(x)$. The derivative of the outside function is $f'(x) = \frac{1}{2\sqrt{x}}$. The derivative of the inside function, $(e^{x^2} + \cos(x))' = (e^{x^2})' - \sin x$, requires us to use the chain rule one more time to compute $(e^{x^2})' = e^{x^2} \cdot (x^2)' = e^{x^2} \cdot 2x = 2xe^{x^2}$. Finally, we get that
\[ f'(g(x))g'(x) = \frac{2xe^{x^2} - \sin x}{2\sqrt{e^{x^2} + \cos(x)}}. \]

(f) This problem might look like we need to use the quotient rule but, in fact, we can simplify the function as $e^{\cos(x) - 2x}$ and apply the chain rule. The outside function is $f(x) = e^x$ and the inside function is $g(x) = \cos(x) - 2x$. Because $f'(x) = e^x$ and $g'(x) = -\sin(x) - 2 = -(\sin(x) + 2)$, the chain rule $f'(g(x))g'(x)$ gives $-e^{\cos(x) - 2x}(\sin(x) + 2)$.
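The answers in Example 1 can be sanity-checked numerically. The sketch below is not part of the course material: it assumes a simple central-difference estimate of the derivative (the helper names `numerical_derivative` and `max_relative_error`, the step `h`, and the test point `x0 = 0.7` are illustrative choices) and compares that estimate with each formula obtained from the chain rule.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x); h is an assumed small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Each entry: (label, function, derivative obtained by the chain rule in Example 1).
examples = [
    ("(a)", lambda x: math.exp(2 * x + 5),
            lambda x: 2 * math.exp(2 * x + 5)),
    ("(b)", lambda x: math.cos(x ** 2),
            lambda x: -2 * x * math.sin(x ** 2)),
    ("(c)", lambda x: math.cos(x) ** 2,
            lambda x: -2 * math.cos(x) * math.sin(x)),
    ("(d)", lambda x: math.log(math.sqrt(x)),
            lambda x: 1 / (2 * x)),
    ("(e)", lambda x: math.sqrt(math.exp(x ** 2) + math.cos(x)),
            lambda x: (2 * x * math.exp(x ** 2) - math.sin(x))
                      / (2 * math.sqrt(math.exp(x ** 2) + math.cos(x)))),
    ("(f)", lambda x: math.exp(math.cos(x)) / math.exp(2 * x),
            lambda x: -math.exp(math.cos(x) - 2 * x) * (math.sin(x) + 2)),
]

def max_relative_error(x0=0.7):
    """Largest mismatch between the numerical slope and the chain-rule formula at x0."""
    errors = []
    for label, func, formula in examples:
        approx = numerical_derivative(func, x0)
        exact = formula(x0)
        errors.append(abs(approx - exact) / max(1.0, abs(exact)))
    return max(errors)

print(max_relative_error())
```

A tiny printed value means every formula agrees with the numerically estimated slope at the chosen point; a check like this catches sign and factor mistakes, but it is not a proof.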
The case when the inside function is linear, as in (a) above, is so common that it is worth stating explicitly as a special case of the chain rule:
\[ \big(f(mx + b)\big)' = m f'(mx + b). \]
One particularly important case is an exponential function $y = a^x$ with a general base $a > 0$. Recall that we can always rewrite it as $a^x = e^{\kappa x}$ with $\kappa = \ln(a)$, and we already know that $(e^{\kappa x})' = \kappa e^{\kappa x} = \ln(a)a^x$. This shows that
\[ (a^x)' = \ln(a)a^x. \]
It is worth remembering this formula instead of repeating the same argument.

Exercise 1. State what the outside function $f(x)$ and inside function $g(x)$ are, and compute the derivative using the chain rule.
(a) $2^{\sqrt{x}}$
(b) $\ln(x^2)$
(c) $\sqrt{\sin(x)}$
(d) $\ln(\cos(x))$
(e) $\sqrt{(2^{3x}3^{2x})^7}$
(f) $\dfrac{2^{\cos(x)}}{3^{\sin(x)}}$

Example 2. Given the graph of a function $y = f(x)$ in the figure and the table of values for the function $y = g(x)$, compute $\frac{d}{dx} f(g(x))\big|_{x=1}$.
\[
\begin{array}{c|cccc}
x & 0 & 1 & 2 & 3 \\ \hline
g(x) & 0 & 3 & 0.5 & 1 \\
g'(x) & 2 & 0.5 & -1 & 0
\end{array}
\]
Solution: If we use the chain rule, $\frac{d}{dx} f(g(x)) = f'(g(x))g'(x)$, and then plug in $x = 1$, we get $\frac{d}{dx} f(g(x))\big|_{x=1} = f'(g(1))g'(1)$. From the table, $g(1) = 3$ and $g'(1) = 0.5$, so $f'(g(1))g'(1) = f'(3) \cdot 0.5$. From the graph we see that $f'(3) = -2$, because between $x = 2$ and $x = 4$ the graph is a line connecting the points $(2, 4)$ and $(4, 0)$, so it has slope $-2$. This gives $\frac{d}{dx} f(g(x))\big|_{x=1} = f'(g(1))g'(1) = -2 \cdot 0.5 = -1$.

Exercise 2. In the setting of the previous problem, compute $\frac{d}{dx} g(f(x))\big|_{x=1}$.

Example 3. If the slope of $f(x)$ is always positive and the slope of $g(x)$ is always negative, are the following functions increasing or decreasing?
(a) $f(g(x))$ (b) $g(f(x))$ (c) $f(f(x))$ (d) $g(g(x))$

Solution: To decide if each function is increasing or decreasing, we will compute its derivative and check if it is positive or negative. We will use the symbols $\oplus$ and $\ominus$ to indicate if a quantity is positive or negative.
(a) $(f(g(x)))' = f'(g(x))g'(x) = \oplus \times \ominus = \ominus$, so decreasing.
(b) $(g(f(x)))' = g'(f(x))f'(x) = \ominus \times \oplus = \ominus$, so decreasing.
(c) $(f(f(x)))' = f'(f(x))f'(x) = \oplus \times \oplus = \oplus$, so increasing.
(d) $(g(g(x)))' = g'(g(x))g'(x) = \ominus \times \ominus = \oplus$, so increasing.

Exercise 3. Let $f(t)$ be the quantity (measured in kg) of a chemical produced by some chemical reaction up to time $t$ (measured in minutes). What is the rate of production of this chemical in g/sec at time $t$ seconds?

Product and quotient rule. Now, we will add the product and quotient rules into the mix.

Example 4. Compute the derivatives $\frac{d}{dx}\big(f(x)g(x)\big)\big|_{x=2}$ and $\frac{d}{dx}\frac{f(x)}{g(x)}\big|_{x=2}$ given the table of values:
\[
\begin{array}{c|cccc}
x & f(x) & g(x) & f'(x) & g'(x) \\ \hline
2 & 0.5 & 3 & -1 & 1
\end{array}
\]
Solution: By the product rule,
\[ \frac{d}{dx}\big(f(x)g(x)\big)\Big|_{x=2} = f'(2)g(2) + f(2)g'(2) = (-1) \cdot 3 + 0.5 \cdot 1 = -2.5. \]
By the quotient rule,
\[ \frac{d}{dx}\frac{f(x)}{g(x)}\Big|_{x=2} = \frac{f'(2)g(2) - f(2)g'(2)}{g(2)^2} = \frac{(-1) \cdot 3 - 0.5 \cdot 1}{3^2} = -\frac{3.5}{9}. \]

Exercise 4. Estimate the derivatives $\frac{d}{dx}\big(f(x)g(x)\big)\big|_{x=2.3}$ and $\frac{d}{dx}\frac{f(x)}{g(x)}\big|_{x=2.3}$ given the table of values:
\[
\begin{array}{c|cccc}
x & 1 & 2 & 3 & 4 \\ \hline
f(x) & 1.5 & 0.5 & 0 & 0.3 \\
g(x) & -1 & 0.25 & 0 & -0.35
\end{array}
\]

Example 5. Compute the derivatives of the following functions.
(a) $e^{-2x+5}\cos(3x)$
(b) $\tan(x)$
(c) $\dfrac{1}{\sin^2(x)}$
(d) $x^2 e^x \sin(x)$

Solution: (a) We use the product rule and then the chain rule,
\[
\begin{aligned}
\big(e^{-2x+5}\cos(3x)\big)' &= \big(e^{-2x+5}\big)'\cos(3x) + e^{-2x+5}\big(\cos(3x)\big)' \\
&= e^{-2x+5}(-2)\cos(3x) + e^{-2x+5}\big(-\sin(3x)\big)(3) \\
&= -2e^{-2x+5}\cos(3x) - 3e^{-2x+5}\sin(3x) \\
&= -e^{-2x+5}\big(2\cos(3x) + 3\sin(3x)\big).
\end{aligned}
\]
(b) We first rewrite $\tan(x) = \frac{\sin(x)}{\cos(x)}$ and then use the quotient rule,
\[
\begin{aligned}
\Big(\frac{\sin(x)}{\cos(x)}\Big)' &= \frac{(\sin(x))'\cos(x) - \sin(x)(\cos(x))'}{\cos^2(x)} = \frac{\cos(x)\cos(x) + \sin(x)\sin(x)}{\cos^2(x)} \\
&= \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x).
\end{aligned}
\]
It is good to remember that $\tan'(x) = \sec^2(x)$.

(c) One way to compute the derivative of $\frac{1}{\sin^2(x)}$ is to use the quotient rule (combined with the chain rule when taking the derivative of $\sin^2(x)$),
\[
\begin{aligned}
\Big(\frac{1}{\sin^2(x)}\Big)' &= \frac{(1)'\sin^2(x) - (1)(\sin^2(x))'}{(\sin^2(x))^2} = \frac{(0)\sin^2(x) - 2\sin(x)\cos(x)}{\sin^4(x)} \\
&= -\frac{2\sin(x)\cos(x)}{\sin^4(x)} = -\frac{2\cos(x)}{\sin^3(x)}.
\end{aligned}
\]
Another way to compute the derivative is to rewrite $\frac{1}{\sin^2(x)} = (\sin(x))^{-2}$ and use the chain rule with the outside function $f(x) = x^{-2}$ and the inside function $g(x) = \sin(x)$,
\[ \big((\sin(x))^{-2}\big)' = -2(\sin(x))^{-2-1}\cos(x) = -2(\sin(x))^{-3}\cos(x) = -\frac{2\cos(x)}{\sin^3(x)}. \]
(d) In this case, we need to apply the product rule twice. First, we can think of $x^2 e^x \sin(x)$ as the product of $x^2$ and $e^x\sin(x)$, so that
\[ \big(x^2 e^x \sin(x)\big)' = \big(x^2\big)' e^x\sin(x) + x^2\big(e^x\sin(x)\big)' = 2xe^x\sin(x) + x^2\big(e^x\sin(x)\big)'. \]
In the second term, we need to use the product rule again to compute
\[ (e^x\sin(x))' = (e^x)'\sin(x) + e^x(\sin(x))' = e^x\sin(x) + e^x\cos(x), \]
and then plug in above to get the final answer,
\[ \big(x^2 e^x \sin(x)\big)' = 2xe^x\sin(x) + x^2e^x\sin(x) + x^2e^x\cos(x) = xe^x\big(2\sin(x) + x\sin(x) + x\cos(x)\big). \]

In the last problem, where we computed the derivative of the product of three functions and used the product rule twice, the two steps can be combined into one easy-to-remember formula:
\[ \big(f(x)g(x)h(x)\big)' = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x). \]
The same rule works with four or more factors: we differentiate each factor separately, one at a time, and then add up all the terms.

Exercise 5. Compute the derivatives of the following functions.
(a) $e^{-x}(\cos(x) + \sin(x))$
(b) $\dfrac{\ln x}{1 + x^2}$
(c) $\dfrac{1}{(2^x + 1)^2}$
(d) $x\,2^{2x}\ln(1 + x^2)$

Inverse function rule. An explanation of the inverse function rule can be found in the footnote link (footnote 11), but the basic idea is quite simple and we have already seen it in Example 6 in Section 2.1. Basically, in the inverse function the roles of the variables $x$ and $y$ switch, so the derivative is approximated by $\frac{\Delta x}{\Delta y}$ instead of $\frac{\Delta y}{\Delta x}$. The only subtle point is that if $b = f(a)$ then the same increments $\Delta x$ and $\Delta y$ are used at $x = a$ for $f$ and at $y = b$ for $f^{-1}$; that is why the derivative of the inverse function at $b$ is the reciprocal of the derivative of the original function at $a$.

11 https://youtu.be/y9jzS-sUeM8
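The reciprocal relationship between the two slopes can be seen numerically. The following sketch is only an illustration under stated assumptions: the helper names `numerical_derivative` and `inverse_rule_gap` are hypothetical, derivatives are estimated by central differences, and the pairs $e^x$ with $\ln x$ and $\tan x$ with $\arctan x$ are chosen because Python's standard `math` module provides both a function and its inverse.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x); h is an assumed small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

def inverse_rule_gap(f, f_inverse, a):
    """Return |(f^{-1})'(b) - 1/f'(a)| where b = f(a), with both slopes estimated numerically."""
    b = f(a)
    slope_of_inverse_at_b = numerical_derivative(f_inverse, b)
    slope_of_f_at_a = numerical_derivative(f, a)
    return abs(slope_of_inverse_at_b - 1 / slope_of_f_at_a)

# f(x) = e^x at a = 1, so b = e and the inverse is ln.
gap_exp = inverse_rule_gap(math.exp, math.log, 1.0)
# f(x) = tan(x) at a = 0.5 (inside (-pi/2, pi/2)), with inverse arctan.
gap_tan = inverse_rule_gap(math.tan, math.atan, 0.5)

print(gap_exp, gap_tan)
```

Both gaps should come out close to zero, which is what the rule $(f^{-1})'(b) = 1/f'(a)$ predicts. Note that the slope of the inverse is measured at $b = f(a)$, not at $a$.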
Before we look at examples of using the inverse function rule, let us first review the practical meaning of the derivative of an inverse function.

Example 6. A chocolate store sells 150 Sacher tortes (small size, serves 10 people) each month for the price of \$40. If they lowered the price to \$35, they would sell 160 per month. If $N = S(p)$ is the number $N$ of tortes sold each month when the price is $p$ dollars, what formula does the above information correspond to?
(a) $(S^{-1})'(40) \approx -5$
(b) $S'(40) \approx -0.5$
(c) $S'(150) \approx 10$
(d) $(S^{-1})'(150) \approx -0.5$

Solution: When solving this type of problem, it is helpful to visualize it using the following diagram, which indicates the input variable and output variable for both $N = S(p)$ and its inverse $p = S^{-1}(N)$.

[Diagram: $S$ maps the price $p$ (in \$) to the number $N$ of tortes sold; $S^{-1}$ maps $N$ back to $p$.]

• The input of $S$ and $S'$ should be the price $p$, in this case $p = 40$, which eliminates (c).
• The input of $S^{-1}$ and $(S^{-1})'$ should be the number $N$ of tortes sold, in this case $N = 150$, which eliminates (a).
• The derivative $S'(40)$ can be approximated by $\frac{\Delta N}{\Delta p}$ (increment of the output over increment of the input). In our case, $\Delta N = 160 - 150 = 10$ and $\Delta p = 35 - 40 = -5$, so $S'(40) \approx \frac{10}{-5} = -2$. So (b) is not correct.
• The derivative $(S^{-1})'(150)$ can be approximated by $\frac{\Delta p}{\Delta N}$ (again, increment of the output over increment of the input), so $(S^{-1})'(150) \approx \frac{-5}{10} = -0.5$. So (d) is the correct answer.

Exercise 6. The total cost of owning a car depends on the APR (annual percentage rate) of the auto loan. Suppose that when the APR is 4%, the total cost is \$30,000, and lowering the APR to 3.5% will decrease the total cost to \$29,500. If $c = f(r)$ is the total cost $c$ at the rate $r$, what formula does the above information correspond to?
(a) $(f^{-1})'(4) \approx 0.5$
(b) $(f^{-1})'(30{,}000) \approx 0.001$
(c) $f'(30{,}000) \approx 500$
(d) $f'(4) \approx -500$

Next, let us practice the inverse function rule: if $b = f(a)$ then
\[ (f^{-1})'(b) = \frac{1}{f'(a)}. \]
Example 7. Given the graph of a function $y = f(x)$ in the figure and the table of values for the function $y = g(x)$, which is invertible, compute (a) $(g^{-1})'(3)$ and (b) $\frac{d}{dx} g^{-1}(f(x))\big|_{x=1}$.
\[
\begin{array}{c|cccc}
x & 0 & 1 & 2 & 3 \\ \hline
g(x) & 3 & 2 & 1.5 & 1.25 \\
g'(x) & -1.5 & -1 & -0.5 & -0.25
\end{array}
\]
Solution: (a) The inverse function rule tells us that $(g^{-1})'(3) = \frac{1}{g'(a)}$, where $a$ is such that $3 = g(a)$. Looking at the table, we see that $g(0) = 3$, so $a = 0$. This means that $(g^{-1})'(3) = \frac{1}{g'(0)} = \frac{1}{-1.5} = -\frac{2}{3}$.

(b) First, by the chain rule, $\frac{d}{dx} g^{-1}(f(x))\big|_{x=1} = (g^{-1})'(f(1))f'(1)$. Looking at the graph, we see that $f(1) = 2$ and $f'(1) = 2$, so $\frac{d}{dx} g^{-1}(f(x))\big|_{x=1} = 2(g^{-1})'(2) = \frac{2}{g'(a)}$, where $2 = g(a)$. Looking at the table, $g(1) = 2$, so $a = 1$ and $\frac{2}{g'(1)} = \frac{2}{-1} = -2$.

Exercise 7. Given the functions in Example 7, compute (a) $(g^{-1})'(1.5)$ and (b) $\frac{d}{dx} g^{-1}(f(x))\big|_{x=3}$.

When we discussed inverse functions, we defined three classic ones: $\ln(x)$ as the inverse of $e^x$, $\arcsin(x)$ as the inverse of $\sin(x)$ on $[-\frac{\pi}{2}, \frac{\pi}{2}]$, and $\arctan(x)$ as the inverse of $\tan(x)$ on $(-\frac{\pi}{2}, \frac{\pi}{2})$. One can use the inverse function rule to show that:
\[ \big(\ln(x)\big)' = \frac{1}{x}, \qquad \big(\arctan(x)\big)' = \frac{1}{1 + x^2}, \qquad \big(\arcsin(x)\big)' = \frac{1}{\sqrt{1 - x^2}}. \]
The first one is explained in the footnote link (footnote 12), so let us show the second one here.

12 https://youtu.be/-W7r1Ug062w

Example 8. Show that
\[ \big(\arctan(x)\big)' = \frac{1}{1 + x^2}. \]
Solution: Before solving the problem, recall the graph of $\arctan(x)$. Its range is $(-\frac{\pi}{2}, \frac{\pi}{2})$, and it has horizontal asymptotes at $+\infty$ and $-\infty$, so the slope is approaching $0$ there. We can see that $\frac{1}{1+x^2} \to 0$ as $x \to +\infty$ or $-\infty$, so the formula matches the behaviour of the slope. Also, $\arctan(x)$ is increasing, which matches the fact that $\frac{1}{1+x^2} > 0$.

Now, let us prove the formula. Recall that we already proved that $\tan'(x) = \sec^2(x)$. If $b = \tan(a)$ for some $a \in (-\frac{\pi}{2}, \frac{\pi}{2})$ then
\[ \arctan'(b) = \frac{1}{\tan'(a)} = \frac{1}{\sec^2(a)}. \]
However, the answer should be in terms of $b$, so we need to express $\sec^2(a)$ in terms of $b = \tan(a)$, which can be done by finding the relationship between $\sec(a)$ and $\tan(a)$ among the Pythagorean trigonometric identities (footnote 13): $\sec^2(a) = 1 + \tan^2(a) = 1 + b^2$, so $\arctan'(b) = \frac{1}{1 + b^2}$, which is exactly what we wanted.

Exercise 8. Show that
\[ \big(\arcsin(x)\big)' = \frac{1}{\sqrt{1 - x^2}}. \]
Hint: you can use the identity $\cos(x) = \sqrt{1 - \sin^2(x)}$ for $x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$.

Beyond essentials: logarithmic differentiation. Although this rule is a bit more advanced, it is actually quite simple and very useful. It says that
\[ f'(x) = f(x) \cdot \big(\ln f(x)\big)' \]
whenever $f(x) > 0$. The main point of this rule is that sometimes it is easier to calculate the derivative of the logarithm $\ln f(x)$ of a function $f(x)$ than to calculate $f'(x)$ directly. We need $f(x)$ to be positive because we are only allowed to plug positive values into $\ln(x)$. When $f(x)$ is negative, we can use a more general rule
\[ f'(x) = f(x) \cdot \big(\ln|f(x)|\big)'. \]
The logarithmic differentiation rule can be used, for example, to prove the power rule $(x^n)' = nx^{n-1}$ for all $n$. If you recall, we only checked this rule in a few special cases, but for general $n$ it can be obtained by logarithmic differentiation. For the explanation of the logarithmic differentiation rule and the demonstration of the power rule, see the footnote link (footnote 14).

13 https://en.wikipedia.org/wiki/Pythagorean_trigonometric_identity
14 https://youtu.be/hwrTON7VAGw

Answer to Exercise 1.
(a) $f(x) = 2^x$, $g(x) = \sqrt{x}$, $(f(g(x)))' = \ln(2)\,2^{\sqrt{x}} \cdot \frac{1}{2\sqrt{x}}$.
(b) $f(x) = \ln(x)$, $g(x) = x^2$, $(f(g(x)))' = \frac{2x}{x^2} = \frac{2}{x}$.
(c) $f(x) = \sqrt{x}$, $g(x) = \sin(x)$, $(f(g(x)))' = \frac{\cos(x)}{2\sqrt{\sin(x)}}$.
(d) $f(x) = \ln(x)$, $g(x) = \cos(x)$, $(f(g(x)))' = -\frac{\sin(x)}{\cos(x)} = -\tan(x)$.
(e) Since $2^{3x} = (2^3)^x = 8^x$ and $3^{2x} = (3^2)^x = 9^x$, we can simplify $2^{3x}3^{2x} = 8^x 9^x = (8 \cdot 9)^x = 72^x$ and $\sqrt{(2^{3x}3^{2x})^7} = \sqrt{(72^x)^7} = (72^x)^{7/2} = (72^{7/2})^x$.
This means that we can apply the rule $(a^x)' = \ln(a)a^x$ with $a = 72^{7/2}$ to get the derivative $\ln(72^{7/2})(72^{7/2})^x = \frac{7}{2}\ln(72)(72^{7/2})^x$. It is important to try to simplify, if possible, before taking derivatives.
(f) This might look like we need the quotient rule. However, we can rewrite
\[ \frac{2^{\cos(x)}}{3^{\sin(x)}} = \frac{e^{\ln(2)\cos(x)}}{e^{\ln(3)\sin(x)}} = e^{\ln(2)\cos(x) - \ln(3)\sin(x)} \]
and use the chain rule with $f(x) = e^x$ and $g(x) = \ln(2)\cos(x) - \ln(3)\sin(x)$ to get the derivative
\[ e^{\ln(2)\cos(x) - \ln(3)\sin(x)}\big(-\ln(2)\sin(x) - \ln(3)\cos(x)\big) = -\frac{2^{\cos(x)}}{3^{\sin(x)}}\big(\ln(2)\sin(x) + \ln(3)\cos(x)\big). \]

Answer to Exercise 2. $\frac{d}{dx} g(f(x))\big|_{x=1} = g'(f(1))f'(1) = g'(2)f'(1) = -1 \cdot 2 = -2$.

Answer to Exercise 3. Notice that time changed from minutes to seconds and quantity changed from kilograms to grams. First, the quantity (in kg) of the chemical produced up to time $t$ seconds will be equal to $f(\frac{t}{60})$, because $t$ sec $= \frac{t}{60}$ min. The rate of production is its derivative and, using the chain rule, $(f(\frac{t}{60}))' = f'(\frac{t}{60})\frac{1}{60}$, which is measured in kg/sec. Since we want g/sec, we translate
\[ f'\big(\tfrac{t}{60}\big)\tfrac{1}{60} \text{ kg/sec} = f'\big(\tfrac{t}{60}\big)\tfrac{1}{60} \times 1000 \text{ g/sec} = f'\big(\tfrac{t}{60}\big)\tfrac{100}{6} \text{ g/sec}. \]

Answer to Exercise 4. We could estimate the first derivative, that of the product, in two ways. First, since $x = 2.3$ is between $2$ and $3$, we could estimate it by $\frac{\Delta y}{\Delta x}$ between these two points with $y = f(x)g(x)$, so
\[ \frac{d}{dx}\big(f(x)g(x)\big)\Big|_{x=2.3} \approx \frac{f(3)g(3) - f(2)g(2)}{3 - 2} = \frac{0 \cdot 0 - 0.5 \cdot 0.25}{1} = -0.125. \]
Another way is to use the product rule $f'(2.3)g(2.3) + f(2.3)g'(2.3)$ first and then estimate $f(2.3)$, $g(2.3)$, $f'(2.3)$ and $g'(2.3)$. To estimate the derivatives, we could use $\frac{\Delta y}{\Delta x}$ between $x = 2$ and $x = 3$,
\[ f'(2.3) \approx \frac{f(3) - f(2)}{3 - 2} = -0.5, \qquad g'(2.3) \approx \frac{g(3) - g(2)}{3 - 2} = -0.25. \]
To estimate $f(2.3)$ we could use the straight line connecting $(2, 0.5)$ and $(3, 0)$. We already computed its slope, $m = -0.5$, and it passes through the point $(3, 0)$, so the line is $y = 0 - 0.5(x - 3)$ and $f(2.3) \approx -0.5(2.3 - 3) = 0.35$.
Similarly, to estimate $g(2.3)$ we could use the straight line connecting $(2, 0.25)$ and $(3, 0)$, which is $y = -0.25(x - 3)$, so $g(2.3) \approx -0.25(2.3 - 3) = 0.175$. Finally, plugging in all the estimates,
\[ f'(2.3)g(2.3) + f(2.3)g'(2.3) \approx (-0.5)(0.175) + 0.35(-0.25) = -0.175. \]
For the derivative $\frac{d}{dx}\frac{f(x)}{g(x)}\big|_{x=2.3}$, the first method will not work, because the ratio $\frac{0}{0}$ is undefined at $x = 3$, so let us use the second method via the quotient rule:
\[ \frac{d}{dx}\frac{f(x)}{g(x)}\Big|_{x=2.3} = \frac{f'(2.3)g(2.3) - f(2.3)g'(2.3)}{g(2.3)^2} \approx \frac{(-0.5)(0.175) - 0.35(-0.25)}{0.175^2} = 0. \]
Actually, whenever we have $\frac{0}{0}$ at one of the endpoints, we will always get the estimate for the derivative of $\frac{f(x)}{g(x)}$ equal to $0$ if we use straight lines to approximate each function. This is because, given our line estimates $f(x) \approx -0.5(x - 3)$ and $g(x) \approx -0.25(x - 3)$ between $x = 2$ and $x = 3$, if we divide them we will get $\frac{f(x)}{g(x)} \approx \frac{-0.5(x-3)}{-0.25(x-3)} = 2$, so our estimate of the ratio is constant, and its derivative is $0$.

Answer to Exercise 5.
(a) $-2e^{-x}\sin(x)$.
(b) $\dfrac{1 + x^2 - 2x^2\ln(x)}{x(1 + x^2)^2}$.
(c) $-\dfrac{(2\ln 2)2^x}{(1 + 2^x)^3}$.
(d) $2^{2x}\ln(1 + x^2) + 2\ln(2)\,x\,2^{2x}\ln(1 + x^2) + \dfrac{2x^2 2^{2x}}{1 + x^2}$. If we write $2^{2x} = 4^x$ then we can also write this as $4^x\ln(1 + x^2) + \ln(4)\,x\,4^x\ln(1 + x^2) + \dfrac{2x^2 4^x}{1 + x^2}$.

Answer to Exercise 6. (b) $(f^{-1})'(30{,}000) \approx \frac{\Delta r}{\Delta c} = \frac{-0.5}{-500} = 0.001$ (in % per \$).

Answer to Exercise 7.
(a) $(g^{-1})'(1.5) = \frac{1}{g'(2)} = \frac{1}{-0.5} = -2$.
(b) $\frac{d}{dx} g^{-1}(f(x))\big|_{x=3} = (g^{-1})'(f(3))f'(3) = -2(g^{-1})'(2) = \frac{-2}{g'(1)} = \frac{-2}{-1} = 2$.

Answer to Exercise 8. If $b = \sin(a)$ then
\[ \arcsin'(b) = \frac{1}{\sin'(a)} = \frac{1}{\cos(a)} = \frac{1}{\sqrt{1 - \sin^2(a)}} = \frac{1}{\sqrt{1 - b^2}}. \]

2.5 First applications: old and new

We have learned how to calculate the derivatives of many functions, and now we will begin to apply this skill.
We will start with two topics that are already familiar to us on a conceptual level, tangent lines and shapes of graphs, and we will combine our conceptual understanding with explicit calculations of derivatives. After that we will take a look at a new topic, implicit differentiation.

Tangent line, again. We know the formula
\[ y = f(a) + f'(a)(x - a) \]
for the tangent line to $y = f(x)$ at $x = a$. In addition to being called the tangent line, this linear function is sometimes also called:
• local linearization of $f(x)$ near $x = a$;
• best linear approximation to $f(x)$ near $x = a$.
The names reflect that $f(x)$ is well approximated by its tangent line locally near $x = a$, i.e.
\[ f(x) \approx f(a) + f'(a)(x - a) \text{ near } x = a. \]
It is a good idea to remember a few special cases:
\[ e^x \approx 1 + x, \qquad \sin(x) \approx x, \qquad \ln(1 + x) \approx x, \qquad (1 + x)^\kappa \approx 1 + \kappa x, \]
all near $x = 0$. In the last one, $\kappa$ is any constant power.

Example 1. Check that $e^x \approx 1 + x$ and $\sin(x) \approx x$ near $x = 0$.

Solution: Because $(e^x)' = e^x$ and $e^0 = 1$, the local linearization of $e^x$ near $x = 0$ is $1 + 1 \cdot (x - 0) = 1 + x$, so $e^x \approx 1 + x$ there. Similarly, because $(\sin x)' = \cos x$ and $\sin 0 = 0$, $\cos 0 = 1$, the local linearization of $\sin x$ near $x = 0$ is $0 + 1 \cdot (x - 0) = x$, and $\sin x \approx x$ there.

Exercise 1. Check that $\ln(1 + x) \approx x$ and $(1 + x)^\kappa \approx 1 + \kappa x$ near $x = 0$.

Looking at the figure we see that:
• If the function is concave up then the tangent line is below the graph, so using the tangent line to approximate the function will underestimate. We can see that the function is concave up by looking at its graph, or by checking that $f''(x) > 0$.
• If the function is concave down then the tangent line is above the graph, so using the tangent line to approximate the function will overestimate. We can see that the function is concave down by looking at its graph, or by checking that $f''(x) < 0$.
• The error of approximation at a point $x$ is the absolute value of the difference between $f(x)$ and its tangent line approximation $y = f(a) + f'(a)(x - a)$.

Example 2.
Without using a calculator, decide which number is bigger, $e^{0.1}$ or $1.1$?

Solution: $1.1 = 1 + 0.1$ is the tangent line $y = 1 + x$ to the exponential function $y = e^x$ at $x = 0$ evaluated at $x = 0.1$, so to compare $1 + 0.1$ with $e^{0.1}$ we need to determine whether $1 + x$ is smaller or bigger than $e^x$. We know that the graph of $y = e^x$ is concave up, but now we can also check it using formulas, because $(e^x)'' = e^x > 0$. This implies that the tangent line underestimates, so $1.1 < e^{0.1}$. The error of approximation is $|e^{0.1} - 1.1| = 0.00517\ldots$.

Exercise 2. (a) Without using a calculator, decide which number is bigger, $\sqrt{1.1}$ or $1.05$?
(b) The speed of sound in dry (0% humidity) air at temperature $T$ °C can be calculated as
\[ 331.3\sqrt{1 + \frac{T}{273.15}} \text{ m/s}. \]
Simplify this assuming that the temperature is not far from 0°C. What is the error of approximation at 25°C?

Example 3. The equation $\ln(1 + x) + x = \frac{1}{4}$ has a solution near $x = 0$. Find an approximate solution by local linearization near $x = 0$.

Solution: We know that the local linearization of $\ln(1 + x)$ near $0$ is $x$, so we can say that $\frac{1}{4} = \ln(1 + x) + x \approx x + x = 2x$. This means that $\frac{1}{4} \approx 2x$, or $x \approx \frac{1}{8} = 0.125$. Using a calculator we can find that the actual solution is $x = 0.1288\ldots$, so our approximation $0.125$ is pretty close.

Exercise 3. The equation $\sqrt{1 + x + x^3} - \frac{x^3}{2} = 1.1$ has a solution near $x = 0$. Find an approximate solution by local linearization $\sqrt{1 + a} \approx 1 + \frac{a}{2}$ near $a = 0$.

Shapes of graphs. We know that the sign of $f'(x)$ determines whether the function $y = f(x)$ is increasing or decreasing, and the sign of $f''(x)$ determines whether the function is concave up or concave down. Let us now combine this information with explicit calculations of derivatives.

Example 4. Find where the function $y = 2x^3 - 3x^2 - 12x + 1$ is increasing, decreasing, concave up, and concave down. Sketch its graph.

Solution: The first derivative is $y'(x) = 6x^2 - 6x - 12 = 6(x^2 - x - 2)$. We can check that $x^2 - x - 2 = 0$ when $x = -1$ and $x = 2$.
Between $-1$ and $2$ (for example at $x = 0$) the expression $x^2 - x - 2$ is negative, and it is positive outside of $[-1, 2]$. This means that the function $2x^3 - 3x^2 - 12x + 1$ is decreasing on $(-1, 2)$ and increasing on $(-\infty, -1)$ and $(2, \infty)$. The second derivative is $y''(x) = 6(2x - 1)$; it is equal to zero at $x = 0.5$, negative to the left and positive to the right of $0.5$. So the function is concave down on $(-\infty, 0.5)$ and concave up on $(0.5, \infty)$. In particular, $x = 0.5$ is an inflection point. With this information, we can sketch the general shape of this function as in the figure.

Exercise 4. Find where the function $\ln(1 + x^2)$ is increasing, decreasing, concave up, and concave down. Sketch its graph.

Implicit differentiation. Sometimes the relationship between the variables $x$ and $y$ is not given by a simple formula $y = f(x)$ but, instead, by some complicated equation in terms of $x$ and $y$ that we cannot solve for $y$ explicitly. For example, consider the equation
\[ \sin(x + y) - \cos(xy) + 1 = 0. \]
The blue curves in the figure show all the points $(x, y)$ that satisfy this equation in a part of the $x$-$y$ plane, and we can see that for a given $x$ there could be many possible $y$. More importantly, we cannot solve this equation for $y$ explicitly. Suppose that we are interested in a particular point on this curve, for example, $A = (2, 2.383\ldots)$ in the figure. A piece of the curve near this point is given by some function $y = y(x)$. It is called an implicit function because we do not know it explicitly and only know that it satisfies the equation
\[ \sin(x + y(x)) - \cos(xy(x)) + 1 = 0. \]
Can we find the derivative $y'(2)$, which is the slope of the tangent line in the figure, without knowing this function? The answer is yes, using implicit differentiation.

Example 5. Compute the derivative $y'(2)$ at the point $A = (2, 2.383\ldots)$ in the figure.
Solution: What implicit differentiation means is that we differentiate the above equation $\sin(x + y(x)) - \cos(xy(x)) + 1 = 0$, pretending that we know $y(x)$ and using the chain rule, and then solve for $y'(x)$ at the end. If this equation is true then
\[ \big(\sin(x + y(x)) - \cos(xy(x)) + 1\big)' = (0)' = 0 \]
is also true. First, we use the chain rule,
\[ \cos(x + y(x)) \cdot (x + y(x))' + \sin(xy(x)) \cdot (xy(x))' = 0. \]
Then we use whatever rule is necessary for the remaining derivatives, in this case the sum rule and the product rule,
\[ \cos(x + y(x)) \cdot (1 + y'(x)) + \sin(xy(x)) \cdot (y(x) + xy'(x)) = 0. \]
Notice that, at this step, we simply write $y'(x)$ for the derivative of $y(x)$ formally, without knowing what it is. However, the good news is that we can now solve the above equation for $y'(x)$ by multiplying out, collecting all the terms with $y'(x)$ together, and moving all the other terms to the other side of the equation,
\[ \big(\cos(x + y(x)) + \sin(xy(x))x\big)\,y'(x) = -\cos(x + y(x)) - \sin(xy(x))y(x). \]
Now we can divide by $\cos(x + y(x)) + \sin(xy(x))x$ to get
\[ y'(x) = -\frac{\cos(x + y(x)) + \sin(xy(x))y(x)}{\cos(x + y(x)) + \sin(xy(x))x}. \]
Finally, since we are interested in the point $A = (2, 2.383\ldots)$, this means that $x = 2$ and $y(2) = 2.383\ldots$, so we can plug these values into the formula,
\[ y'(2) = -\frac{\cos(2 + 2.383) + \sin(2 \cdot 2.383) \cdot 2.383}{\cos(2 + 2.383) + \sin(2 \cdot 2.383) \cdot 2} = -1.1648\ldots. \]
The calculation we just did will be a bit cleaner if we write $y$ instead of $y(x)$ and $y'$ instead of $y'(x)$, keeping in mind that $y$ and $y'$ depend on $x$:
\[
\begin{aligned}
\big(\sin(x + y) - \cos(xy) + 1\big)' &= (0)' = 0 \\
\cos(x + y) \cdot (x + y)' + \sin(xy) \cdot (xy)' &= 0 \\
\cos(x + y) \cdot (1 + y') + \sin(xy) \cdot (y + xy') &= 0 \\
\big(\cos(x + y) + \sin(xy)x\big)\,y' &= -\cos(x + y) - \sin(xy)y \\
y' &= -\frac{\cos(x + y) + \sin(xy)y}{\cos(x + y) + \sin(xy)x},
\end{aligned}
\]
and then plug in $x = 2$ and $y = 2.383$. Also, we could plug in the values $x = 2$ and $y = 2.383$ before solving for $y'$, which would actually make solving for $y'$ much easier.
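The slope found in Example 5 can be cross-checked without ever solving the equation for $y$ by hand. The sketch below is an illustration under stated assumptions, not the method of the text: it uses Newton's method (the hypothetical helper `solve_for_y`, started from the guess $2.383$) to locate the point of the curve above a given $x$, estimates the slope by a central difference with an assumed step `h`, and compares the result with the implicit-differentiation formula.

```python
import math

def F(x, y):
    """Left-hand side of the curve equation sin(x + y) - cos(xy) + 1 = 0."""
    return math.sin(x + y) - math.cos(x * y) + 1

def solve_for_y(x, y_guess, steps=50):
    """Newton's method in y for a fixed x, started from y_guess (assumed close to the curve)."""
    y = y_guess
    for _ in range(steps):
        dF_dy = math.cos(x + y) + x * math.sin(x * y)  # partial derivative of F in y
        y -= F(x, y) / dF_dy
    return y

def slope_formula(x, y):
    """y' obtained by implicit differentiation in Example 5."""
    top = math.cos(x + y) + math.sin(x * y) * y
    bottom = math.cos(x + y) + math.sin(x * y) * x
    return -top / bottom

# Point A: refine the y-coordinate starting from the value quoted in the text.
y_at_2 = solve_for_y(2.0, 2.383)

# Slope of the implicit function estimated directly from nearby points on the curve.
h = 1e-5
slope_numeric = (solve_for_y(2.0 + h, y_at_2) - solve_for_y(2.0 - h, y_at_2)) / (2 * h)
slope_implicit = slope_formula(2.0, y_at_2)

print(y_at_2, slope_implicit, slope_numeric)
```

The two slopes should agree to many digits and both should be close to the value $-1.1648\ldots$ computed in the example. The Newton step relies on the denominator $\cos(x + y) + x\sin(xy)$ being nonzero near $A$, which is the same quantity we divide by when solving for $y'$.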
Make sure to take advantage of this shortcut (plugging in the values before solving for $y'$) in the next two exercises.

Exercise 5. A bagel with an inner radius $r$ and an outer radius $R$ has volume $V = \frac{1}{4}\pi^2(R + r)(R - r)^2$. Given a fixed amount of dough, we can change the shape of the bagel. If the volume of the bagel is $V = 4\pi^2$, find the derivative $\frac{dR}{dr}$ when $r = 1$ and $R = 3$.

Exercise 6. Find the tangent line to the curve $x^2 + 2xy - y^3 = 7$ at $(2, 1)$.

Answer to Exercise 1. Because $\frac{d}{dx}\ln(1 + x)\big|_{x=0} = \frac{1}{1+x}\big|_{x=0} = 1$ and $\ln(1 + 0) = 0$, the best linear approximation of $\ln(1 + x)$ near $x = 0$ is $0 + 1 \cdot (x - 0) = x$, so $\ln(1 + x) \approx x$ there. Because $\frac{d}{dx}(1 + x)^\kappa\big|_{x=0} = \kappa(1 + x)^{\kappa - 1}\big|_{x=0} = \kappa$ and $(1 + 0)^\kappa = 1$, the best linear approximation of $(1 + x)^\kappa$ near $x = 0$ is $1 + \kappa \cdot (x - 0) = 1 + \kappa x$, and $(1 + x)^\kappa \approx 1 + \kappa x$ there.

Answer to Exercise 2. (a) We need to use the formula $(1 + x)^\kappa \approx 1 + \kappa x$ near $x = 0$ with $\kappa = \frac{1}{2}$, so $\sqrt{1 + x} \approx 1 + \frac{x}{2}$ near $x = 0$. In other words, $1 + \frac{x}{2}$ is the tangent line to $\sqrt{1 + x}$ at $x = 0$. We know that the graph of $\sqrt{x}$ is concave down, and $\sqrt{1 + x}$ is $\sqrt{x}$ shifted to the left by $1$, so it is also concave down. We can also check this using formulas: $(\sqrt{1 + x})' = \frac{1}{2}(1 + x)^{-1/2}$ and, therefore, $(\sqrt{1 + x})'' = -\frac{1}{4}(1 + x)^{-3/2} < 0$. For a concave down function the tangent line overestimates, so
\[ \sqrt{1.1} = \sqrt{1 + 0.1} < 1 + \frac{0.1}{2} = 1.05. \]
The error of approximation is $|\sqrt{1.1} - 1.05| = 0.00119\ldots$.
(b) Since $\sqrt{1 + x} \approx 1 + \frac{x}{2}$ near $0$,
\[ 331.3\sqrt{1 + \frac{T}{273.15}} \approx 331.3\Big(1 + \frac{T}{2 \cdot 273.15}\Big) = 331.3 + 0.606T. \]
At $T = 25$, the original formula gives $346.1292$, while the tangent line gives $346.461$, so the error of approximation is $0.3318$ m/s.

Answer to Exercise 3. We can think of $a = x + x^3$ inside $\sqrt{1 + x + x^3}$ as one number and use the local linearization $\sqrt{1 + a} \approx 1 + \frac{a}{2}$ to write
\[ 1.1 = \sqrt{1 + x + x^3} - \frac{x^3}{2} \approx 1 + \frac{x + x^3}{2} - \frac{x^3}{2} = 1 + \frac{x}{2}. \]
This gives that $1.1 \approx 1 + \frac{x}{2}$ and, solving for $x$, we get $x \approx 0.2$. Using a calculator we can find that the actual solution is $x = 0.210\ldots$, so our approximation $0.2$ is pretty close.

Answer to Exercise 4. The first derivative is $y'(x) = \frac{2x}{1 + x^2}$. It is equal to $0$ at $x = 0$, negative to the left and positive to the right of $x = 0$. This means that the function $\ln(1 + x^2)$ is decreasing on $(-\infty, 0)$ and increasing on $(0, \infty)$. The second derivative is $y''(x) = \frac{2(1 - x^2)}{(1 + x^2)^2}$; it is equal to zero at $-1$ and $1$, positive in between on the interval $(-1, 1)$ and negative outside. The function is concave up on $(-1, 1)$ and concave down on $(-\infty, -1)$ and $(1, \infty)$. In particular, $x = -1$ and $x = 1$ are inflection points. With this information, we can sketch the general shape of this function as in the figure.

Answer to Exercise 5. If $\frac{1}{4}\pi^2(R + r)(R - r)^2 = 4\pi^2$ then $(R + r)(R - r)^2 = 16$. Keeping in mind that $R = R(r)$ is a function of $r$ and $R'$ means $R'(r)$, if we differentiate the equation, $((R + r)(R - r)^2)' = (16)' = 0$, we get
\[ (R' + 1)(R - r)^2 + (R + r) \cdot 2(R - r)(R' - 1) = 0. \]
If we plug in $r = 1$, $R = 3$ before solving for $R'$, we get $4(R' + 1) + 16(R' - 1) = 0$, or $20R' - 12 = 0$. Solving for $R'$, we get $R' = 0.6$.

Answer to Exercise 6. Differentiating $(x^2 + 2xy - y^3)' = (7)' = 0$ we get $2x + 2y + 2xy' - 3y^2y' = 0$. Plugging in $x = 2$ and $y = 1$ we get $4 + 2 + 4y' - 3y' = 0$, so $y' = -6$. The tangent line is $y = 1 - 6(x - 2)$.

2.6 Critical points

In the next section we will focus on solving optimization problems, which is one of the most important applications of derivatives, but first we need to learn about critical points and how to use them to find local minima and local maxima. A point $x$ in the domain of a function $y = f(x)$ is called a critical point if one of the following two things happens:
• the derivative $f'(x) = 0$,
• the derivative $f'(x)$ is undefined.
A quick summary about critical points can be found in the footnote link (footnote 15) and the figure on the right.
Points $B$, $C$ and $D$ in the figure are where $f'(x) = 0$ (so the tangent line is horizontal), and points $A$ and $E$ are where the derivative is undefined. Point $A$ is a corner, where the function has different slopes to the right and to the left, and point $E$ has a vertical tangent line, whose slope is infinite and therefore undefined.

15 https://youtu.be/URm5AQwOkLQ

• A point $x$ in the domain of a function $y = f(x)$ is called a local maximum or local minimum if, among the points near $x$, the maximum or minimum value is reached at this point $x$. For example, the points $A$ and $D$ in the figure are local minima, point $B$ is a local maximum, and points $C$ and $E$ are neither.
• To find local maxima and minima we usually need to find critical points first, because local minima or maxima must be critical points when they are inside the domain (not the endpoints). This is why critical points are so important. We will discuss what happens at the endpoints in the examples below and in the next section.

When we find a critical point, how do we decide if it is a local minimum, a local maximum, or neither? We can use the first derivative test (FDT) or the second derivative test (SDT). We have already seen them implicitly when we looked at graphs of functions, but now we will spell them out explicitly. We will always assume that a function is at least continuous at the critical point $x = a$.

• (Local max) A critical point $x = a$ is a local maximum if
(FDT) $f'(x) > 0$ to the left of $a$ and $f'(x) < 0$ to the right of $a$, or
(SDT) $f'(a) = 0$ and $f''(a) < 0$.
• (Local min) A critical point $x = a$ is a local minimum if
(FDT) $f'(x) < 0$ to the left of $a$ and $f'(x) > 0$ to the right of $a$, or
(SDT) $f'(a) = 0$ and $f''(a) > 0$.

In the case of SDT, the second derivative $f''(a)$ is defined only if the first derivative is defined, so the critical point must be of the first type, $f'(a) = 0$. If the derivative is undefined, we cannot use the SDT. The meaning of FDT is simple: if the function is increasing up to $x = a$ and decreasing after it, then the point is a local maximum (like point $B$ in the above figure). In the case of SDT, if $f''(a) < 0$ then $f'(x)$ is decreasing at $x = a$ and, because $f'(a) = 0$, the derivative changes from positive to negative, so the point again must be a local maximum. If $f''(a) = 0$ then the SDT is inconclusive and we need to use the FDT. Similar reasoning works for a local minimum.

Example 1. Find all critical points and determine which ones are local minima or maxima for (a) $f(x) = 3x^5 - 5x^3$, (b) $f(x) = e^{-x^2}$.

Solution: (a) Since $f'(x) = 15x^4 - 15x^2$, it is defined everywhere. To find critical points we need to solve $15x^4 - 15x^2 = 0$, or $x^2(x^2 - 1) = x^2(x - 1)(x + 1) = 0$. The solutions are $x = -1, 0, 1$, so the function has three critical points where the tangent line is horizontal. To decide which ones are local minima or maxima, let us start with the first derivative test. By checking the sign of the derivative on each interval,
\[
\begin{array}{c|cccc}
x & (-\infty, -1) & (-1, 0) & (0, 1) & (1, \infty) \\ \hline
f'(x) & + & - & - & +
\end{array}
\]
we see that $-1$ is a local maximum and $+1$ is a local minimum. The point $x = 0$ is neither a local maximum nor a local minimum, because the function is decreasing before and after this point.

We can also try the second derivative test. Since $f''(x) = 15(4x^3 - 2x) = 30(2x^3 - x)$, we can see that $f''(-1) < 0$, $f''(+1) > 0$ and $f''(0) = 0$. So the second derivative test tells us that $-1$ is a local maximum, $+1$ is a local minimum, and it is inconclusive (does not tell us anything) at $x = 0$. So at $x = 0$ we have to use the first derivative test to see what is happening.

(b) Since $f'(x) = e^{-x^2}(-2x) = -2xe^{-x^2}$, it is defined everywhere. To find critical points we need to solve $-2xe^{-x^2} = 0$. There is only one solution, $x = 0$, so the function has one critical point. The derivative is positive on $(-\infty, 0)$ and negative on $(0, \infty)$, so $x = 0$ is a local maximum.
We can also use the second derivative test: $f''(x) = -2e^{-x^2} + 4x^2e^{-x^2}$ and $f''(0) = -2 < 0$, so again we see that $x = 0$ is a local maximum.

Exercise 1. Find all critical points and determine which ones are local minima or maxima for (a) $f(x) = x - 3\ln(x)$, (b) $f(x) = x^4 - 18x^2 + 1$.

Example 2. If $a > 0$ and $b > 0$ are positive constants, find all critical points of $f(x) = ax + b\sqrt{x}$ and determine which ones are local minima or maxima.

Solution: Since $f'(x) = a + \frac{b}{2\sqrt{x}}$ and $a$ and $b$ are positive, the derivative is always positive and cannot equal 0. However, it is undefined at $x = 0$, where we divide by zero, so $x = 0$ is the only critical point. To determine whether it is a local maximum or minimum, here we cannot use the first or second derivative test, because $x = 0$ is not inside the domain. The domain of the function $ax + b\sqrt{x}$ is $x \ge 0$, so $x = 0$ is the left endpoint. Because the function is increasing to the right of it, $x = 0$ is a local minimum. As we will discuss in the next section, endpoints are often local or global minima or maxima even if they are not critical points, so they require special attention.

Exercise 2. If $a > 0$ and $b > 0$ are positive constants, find all critical points of $f(t) = ae^t + be^{-t}$ and determine which ones are local minima or maxima.

In the next four problems we will refer to the following figures.

Example 3. In the figure above on the left, we see a graph of the derivative $f'(x)$ of some continuous function $f(x)$. List all critical points of $f(x)$ and determine which ones are local minima or maxima. Also identify all inflection points of $f(x)$.

Solution: Critical points are where the derivative is zero, $x = -6, -1, 4$ and $9$, and where it is undefined, $x = -2$. By the first derivative test, local maxima are where the derivative changes from positive to negative, which happens at $x = -2$ and $9$. Local minima are where the derivative changes from negative to positive, which happens at $x = -6$ and $-1$.
Critical point $x = 4$ is neither a local minimum nor a local maximum, because the derivative is positive on both sides, so the function $f(x)$ increases before and after $x = 4$. Notice that the local maximum $x = -2$ is a corner (kink) of $f(x)$ because the slope $f'(x)$ jumps suddenly from $+2$ to $-2$. Finally, inflection points of $f(x)$ are where the derivative $f'(x)$ changes direction from increasing to decreasing or vice versa, so $x = -4.4, -2, 0.6, 4, 7.6$ (some values are approximate because it is hard to see exactly from the figure).

Exercise 3. In the figure above on the right, we see a graph of the derivative $f'(x)$ of some continuous function $f(x)$. List all critical points of $f(x)$ and determine which ones are local minima or maxima. Also identify all inflection points of $f(x)$.

Example 4. In the figure above on the left, we see a graph of the derivative $f'(x)$ of some continuous function $f(x)$. On the interval $[0, 8]$, where does the function $f(x)$ grow most rapidly and decay most rapidly?

Solution: The function $f(x)$ grows most rapidly where its derivative is as large as possible on the interval $[0, 8]$, which happens at about $x = 7.6$. The function $f(x)$ decays most rapidly where its derivative is as small as possible on the interval $[0, 8]$, which happens at $x = 4$. Notice that these points are inflection points of the original function, because $f'(x)$ changes direction there.

Exercise 4. In the figure above on the right, we see a graph of the derivative $f'(x)$ of some continuous function $f(x)$. On the interval $[-2, 8]$, where does the function $f(x)$ grow most rapidly and decay most rapidly?

Example 5. Suppose that $f(x)$ has a continuous derivative and we know its values in the following table:

x        0     1      2       3     4    5     6     7
f′(x)    1    −0.5   −0.25   0.5    1   1.5   0.5   −1

Estimate the coordinates of the critical points of $f(x)$ on the interval $[0, 7]$ and determine which ones are local minima or maxima.
Solution: We see that the derivative changes sign (so crosses 0) somewhere between 0 and 1, between 2 and 3, and between 6 and 7. If we connect two neighbouring points in the table by a line, between 0 and 1 the slope is $\frac{-0.5 - 1}{1 - 0} = -1.5$, so the line is $y = 1 - 1.5(x - 0)$. We want to know where it crosses zero, so we solve the equation $1 - 1.5(x - 0) = 0$ and get $x = \frac{1}{1.5} = \frac{2}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative, so the function changes from increasing to decreasing. Between 2 and 3 the slope is $0.75$, so the line is $y = -0.25 + 0.75(x - 2)$. Solving $-0.25 + 0.75(x - 2) = 0$ we get $x = 2\frac{1}{3}$. This point is a local minimum, because $f'(x)$ changes from negative to positive. Finally, between 6 and 7 the slope is $-1.5$, so the line is $y = 0.5 - 1.5(x - 6)$. Solving $0.5 - 1.5(x - 6) = 0$ we get $x = 6\frac{1}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative. Of course, all critical points are only estimates, because we do not know the function exactly.

Exercise 5. Suppose that $f(x)$ has a continuous derivative and we know its values in the following table:

x        0    0.5    1    1.5    2    2.5    3    3.5
f′(x)    1    0.5   −1    −2    −3    −1     2     1

Estimate the coordinates of the critical points of $f(x)$ on the interval $[0, 3.5]$ and determine which ones are local minima or maxima.

Example 6. Given the graphs of functions $f(x)$ (solid green curve) and $g(x)$ (dashed blue curve) below, find the critical points of $f(g(x))$.

Solution: First of all, using the chain rule, $(f(g(x)))' = f'(g(x))g'(x)$. This derivative is equal to zero if either $g'(x) = 0$ or $f'(g(x)) = 0$. From the graph of $y = g(x)$ we see that $g'(x) = 0$ at $x = 0$, where its slope is zero. From the graph of $y = f(x)$ we see that $f'(x) = 0$ at $x = -2$ and $x = 2$. This means that $f'(g(x)) = 0$ when $g(x) = -2$ or $g(x) = 2$. We see that $g(x)$ is never $-2$, but $g(x) = 2$ at $x = -3$ and $x = 3$. So $(f(g(x)))' = 0$ at $x = -3, 0, 3$.

Bonus.
If we were asked to determine local maxima and minima, we could use the second derivative test. The second derivative is equal to

$$(f(g(x)))'' = \big(f'(g(x))g'(x)\big)' = f''(g(x))(g'(x))^2 + f'(g(x))g''(x).$$

Plugging in $x = 0$, where $g'(0) = 0$,

$$f''(g(0))(g'(0))^2 + f'(g(0))g''(0) = f'(0)g''(0) < 0,$$

because $f'(0) < 0$ and $g''(0) > 0$. So $x = 0$ is a local maximum. Plugging in $x = -3$, where $g(-3) = 2$ and $f'(2) = 0$,

$$f''(g(-3))(g'(-3))^2 + f'(g(-3))g''(-3) = f''(2)(g'(-3))^2 + f'(2)g''(-3) = f''(2)(g'(-3))^2 > 0,$$

because $f''(2) > 0$ and $(g'(-3))^2 > 0$. So $x = -3$ is a local minimum. Similarly, we can check that $x = 3$ is also a local minimum.

Exercise 6. Given the graphs of functions $f(x)$ and $g(x)$ in the above example, find the critical points of $g(f(x))$.

Example 7. After a dog jumps into a pool, a beach ball starts floating up and down on the waves and its height is given by $h(t) = e^{-t}(\cos(t) + \sin(t))$. Find the critical points for $t \ge 0$ and explain what happens at those points.

Solution: Using the product rule, the derivative $h'(t)$ is

$$-e^{-t}(\cos(t) + \sin(t)) + e^{-t}(-\sin(t) + \cos(t)) = -2e^{-t}\sin(t).$$

It is equal to zero when $\sin(t) = 0$, so when $t = 0, \pi, 2\pi, 3\pi, \ldots$. At these times the ball will be at the top or bottom of the wave. For example, at $t = \pi$, $\sin(t)$ changes sign from positive to negative, so $-2e^{-t}\sin(t)$ changes sign from negative to positive, meaning that $t = \pi$ is a local minimum. We can check other points similarly.

Exercise 7. Suppose that constants $a$ and $b$ are positive and not equal to each other, $a \ne b$. Show that $y = e^{-ax} - e^{-bx}$ has a unique critical point. Is this critical point positive or negative?

Answer to Exercise 1. (a) $x = 3$ is the only critical point, and it is a local minimum. (b) The critical points are $-3, 0, 3$; $-3$ and $3$ are local minima, and $0$ is a local maximum.

Answer to Exercise 2. $f'(t) = ae^t - be^{-t} = 0$ when $e^{2t} = \frac{b}{a}$, or $t = \frac{1}{2}\ln\frac{b}{a}$.
In this case, it is easier to use the second derivative test, because $f''(t) = ae^t + be^{-t} > 0$, so the critical point $t = \frac{1}{2}\ln\frac{b}{a}$ is a local minimum. We could also use the first derivative test. When $t \to -\infty$, the term $ae^t$ gets small and the term $be^{-t}$ gets large, so $f'(t) = ae^t - be^{-t}$ gets large and negative. When $t \to +\infty$, the term $ae^t$ gets large and the term $be^{-t}$ gets small, so $f'(t) = ae^t - be^{-t}$ gets large and positive. The derivative changes from negative to positive, so the critical point must be a local minimum.

Answer to Exercise 3. Critical points are $x = -6, -3, -2, 4, 8$. Local maxima are $x = -6, -2$, local minima are $x = -3, 4$. Point $x = 8$ is neither. Inflection points are $x = -4.5, -3, 0.5, 5.5, 8$.

Answer to Exercise 4. The function $f(x)$ grows most rapidly where its derivative is as large as possible on the interval $[-2, 8]$, which happens at about $x = 5.5$. The function $f(x)$ decays most rapidly where its derivative is as small as possible on the interval $[-2, 8]$, which happens at $x = 0.5$.

Answer to Exercise 5. Between 0.5 and 1 the slope is $\frac{-1 - 0.5}{1 - 0.5} = -3$, so the line is $y = 0.5 - 3(x - 0.5)$. Solving $0.5 - 3(x - 0.5) = 0$ we get $x = \frac{2}{3}$. This point is a local maximum, because $f'(x)$ changes from positive to negative. Between 2.5 and 3 the slope is $\frac{2 + 1}{3 - 2.5} = 6$, so the line is $y = -1 + 6(x - 2.5)$. Solving $-1 + 6(x - 2.5) = 0$ we get $x = 2\frac{2}{3}$. This point is a local minimum, because $f'(x)$ changes from negative to positive.

Answer to Exercise 6. Using the chain rule, $(g(f(x)))' = g'(f(x))f'(x)$. This derivative is equal to zero if either $f'(x) = 0$ or $g'(f(x)) = 0$. From the graph of $y = f(x)$ we see that $f'(x) = 0$ at $x = -2$ and $2$, where its slope is zero. From the graph of $y = g(x)$ we see that $g'(x) = 0$ at $x = 0$. This means that $g'(f(x)) = 0$ when $f(x) = 0$, which happens at $x = -3.5, 0, 3.5$. So $(g(f(x)))' = 0$ at $x = -3.5, -2, 0, 2, 3.5$.

Answer to Exercise 7.
$y'(x) = -ae^{-ax} + be^{-bx} = 0$ when $e^{(a-b)x} = \frac{a}{b}$, or $x = \frac{1}{a-b}\ln\frac{a}{b}$. If $a < b$ then $a - b$ is negative, and $\ln\frac{a}{b}$ is also negative because $\frac{a}{b} < 1$, so the critical point is positive. If $a > b$ then $a - b$ is positive, and $\ln\frac{a}{b}$ is also positive because $\frac{a}{b} > 1$, so the critical point is positive. No matter what $a$ and $b$ are (as long as they are positive and not equal), the critical point is positive.

2.7 Optimization problems

In this section we will study optimization problems where the goal is to find the maximum or minimum value of some function $y = f(x)$ on some given domain. In many problems we will first have to translate a word problem into a formula $f(x)$. The domain of $f(x)$ will depend on the problem, but the most common case will be when the domain is a closed interval $[a, b]$, in which case we need to

• find all critical points on the interval $[a, b]$,
• compare the values of $y = f(x)$ at the critical points and at the endpoints $a$ and $b$.

We need to check the endpoints because they can be local or global maxima or minima even if they are not critical points.

Example 1. Find global minima and maxima of $f(x) = \frac{1}{2} + \frac{(x-1)^4}{8} - (x-1)^2$ on the interval $[-2, 2]$.

Solution: First we find critical points:

$$f'(x) = \frac{1}{2}(x-1)^3 - 2(x-1) = 0$$

when $x - 1 = 0$, i.e. $x = 1$, or $\frac{(x-1)^2}{2} - 2 = 0$. We can rewrite this as $(x-1)^2 = 4$, or $x - 1 = \pm 2$, so $x = -1$ and $x = 3$. The point $x = 3$ is outside of the interval $[-2, 2]$, so $x = -1$ and $x = 1$ are the critical points inside $(-2, 2)$. We could check whether they are local minima or maxima, but this is not necessary because we are looking for the global min and max. We can simply plug these critical points and the endpoints into $f(x)$ and compare the values:

$$f(-1) = -1.5, \quad f(1) = 0.5, \quad f(-2) = 1.625, \quad f(2) = -0.375.$$

This means that $x = -1$ is the global minimum and $x = -2$ is the global maximum on the interval $[-2, 2]$. This matches the above figure.
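The two-step recipe for a closed interval can be sketched in a few lines of code. This is only an illustration: the helper name `closed_interval_extrema` is hypothetical, and the function is taken to be $f(x) = \frac{1}{2} + \frac{(x-1)^4}{8} - (x-1)^2$, which is an assumption consistent with the values listed in the solution of Example 1.

```python
def f(x):
    # Assumed form of the function in Example 1 (matches the listed values).
    return 0.5 + (x - 1) ** 4 / 8 - (x - 1) ** 2


def closed_interval_extrema(func, critical_points, a, b):
    """Compare func at the endpoints a, b and at critical points inside (a, b)."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = {x: func(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])


# Critical points found by hand from f'(x) = 0; x = 3 is discarded automatically.
global_min, global_max = closed_interval_extrema(f, [-1, 1, 3], -2, 2)
print("global min:", global_min)
print("global max:", global_max)
```

The code does not find the critical points itself; they still come from solving $f'(x) = 0$ by hand, and the sketch only automates the final comparison.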
Notice that the endpoint $x = -2$ is the global maximum on $[-2, 2]$, even though it is not a critical point.

Exercise 1. Find global minima and maxima of $g(x) = -\frac{1}{2} + 2(x-1)e^{-\frac{(x-1)^2}{2}}$ on $[-2, 2]$.

Example 2. A restaurant sells pizza of diameter $d$ between $10''$ and $20''$ for the price of $P(d) = \frac{\pi}{40}\big(d^2 - 100\ln(d/10)\big)$ dollars. Which diameter pizza is the best deal in terms of price per square inch?

Solution: The area of the pizza is $A(d) = \pi r^2 = \frac{1}{4}\pi d^2$, so dividing the price $P(d)$ by the area $A(d)$, we get that the price per square inch is

$$\frac{P(d)}{A(d)} = \frac{d^2 - 100\ln(d/10)}{10d^2} = \frac{1}{10} - 10\,\frac{\ln(d/10)}{d^2}.$$

Since $\ln(d/10) = \ln(d) - \ln(10)$, its derivative is $(\ln(d/10))' = \frac{1}{d}$, so using the quotient rule

$$\left(\frac{P(d)}{A(d)}\right)' = 0 - 10\cdot\frac{\frac{1}{d}\cdot d^2 - \ln(d/10)\cdot 2d}{d^4} = -10\cdot\frac{1 - 2\ln(d/10)}{d^3}.$$

The derivative is undefined at $d = 0$, because we divide by 0, but this is outside the domain $[10, 20]$. The derivative is equal to zero when

$$1 - 2\ln\frac{d}{10} = 0 \;\Rightarrow\; \ln\frac{d}{10} = \frac{1}{2} \;\Rightarrow\; \frac{d}{10} = e^{1/2} = \sqrt{e} \;\Rightarrow\; d = 10\sqrt{e} \approx 16.487,$$

which is inside the domain. Plugging into $\frac{P(d)}{A(d)}$, we get that the price per square inch is about $0.0816$ when $d = 16.487$, $0.1$ when $d = 10$, and $0.0827$ when $d = 20$. So the best deal is the $16.487''$ pizza.

Exercise 2. Suppose that the distribution of grades in Calculus I follows a special case of the so-called Beta distribution with the shape $y = 168x^5(1-x)^2$ on the interval $[0, 1]$, shown in the figure. The grade is measured as a proportion out of 100, so that, e.g., $x = 0.81$ represents grade 81. What grade is the most likely, or most common, one?

In the next few problems we will have two variables, but they will be related through some given constraint and, as a result, we will be able to eliminate one variable by expressing it in terms of the other and then optimize as usual. The next two problems will refer to the following figures.

Example 3. In the figure above on the left, a region consisting of an $x \times y$ rectangle and two semicircles is enclosed by 100 meters of fence.
What shape will maximize the area of this region?

Solution: First of all, recall that the perimeter of a circle of radius $r$ is $2\pi r$, and its area is $\pi r^2$. The radius of each semicircle in the figure is $\frac{y}{2}$, so the perimeter $P$ and the area $A$ of the region are

$$P = 2x + 2\pi\,\frac{y}{2} = 2x + \pi y, \qquad A = xy + \pi\left(\frac{y}{2}\right)^2 = xy + \frac{\pi y^2}{4}.$$

The perimeter is 100 meters, so $2x + \pi y = 100$. This means that one of the variables is completely determined by the other, for example, $x = 50 - \frac{\pi y}{2}$, and we can write the area in terms of $y$ only,

$$A = xy + \frac{\pi y^2}{4} = \left(50 - \frac{\pi y}{2}\right)y + \frac{\pi y^2}{4} = 50y - \frac{\pi y^2}{4}.$$

Now we need to maximize this, keeping in mind that $2x + \pi y = 100$ means that $\pi y$ must be between 0 and 100, so $y \in [0, \frac{100}{\pi}]$. Since $A'(y) = 50 - \frac{\pi y}{2} = 0$ when $y = \frac{100}{\pi}$, the only critical point is the endpoint of our domain. This means that we need to compare the area at the endpoints, $A(0) = 0$, and

$$A\left(\frac{100}{\pi}\right) = 50\cdot\frac{100}{\pi} - \frac{\pi}{4}\left(\frac{100}{\pi}\right)^2 = 795.7747\ldots.$$

The shape that maximizes the area corresponds to $y = \frac{100}{\pi}$ and $x = 50 - \frac{\pi}{2}\cdot\frac{100}{\pi} = 0$, which is a circle of diameter $y = \frac{100}{\pi}$. Actually, among all shapes with the same perimeter, a circle will always have the largest area, because it is always advantageous to “stretch the perimeter in all directions”. Here, a circle was one allowed shape and we saw that, indeed, it maximized the area.

Exercise 3. In the figure above on the right, a region consisting of an $x \times y$ rectangle and one semicircle is enclosed by 100 meters of fence. What shape will maximize the area of this region? Notice that a circle is not an option here.

Example 4. Inside a hemisphere of radius 1 we want to place a cylinder vertically, as in the figure, with the largest possible volume. What is the height $h$ and the radius of the base $r$ of this cylinder?

Solution: The top of the cylinder should touch the sphere, so the hypotenuse of the right triangle in the figure is the radius of the sphere, which is 1. This means that $r^2 + h^2 = 1$.
On the other hand, the volume of the cylinder is equal to $V = \pi r^2 h$, which is the area of the base $\pi r^2$ times the height $h$. From $r^2 + h^2 = 1$ we get that $r^2 = 1 - h^2$ and we can rewrite the volume as

$$V = \pi r^2 h = \pi(1 - h^2)h = \pi(h - h^3).$$

The domain is $[0, 1]$ because the height $h$ cannot be bigger than 1, so we want to maximize $\pi(h - h^3)$ on the interval $[0, 1]$. Since $V'(h) = \pi(1 - 3h^2) = 0$ when $h^2 = \frac{1}{3}$, or $h = \frac{1}{\sqrt{3}}$, this is the only critical point in the domain. We can see that the volume $V$ is zero at the endpoints $h = 0$ and $h = 1$, and $V \approx 1.2092$ when $h = \frac{1}{\sqrt{3}}$, so the cylinder with the largest volume has height $h = \frac{1}{\sqrt{3}}$ and radius $r = \sqrt{\frac{2}{3}}$.

Exercise 4. We want to make one round enclosure and one square enclosure using $\ell$ meters of fence in total. What part of the fence should we spend on each enclosure to maximize the total area? What if we wanted to minimize the total area?

In the next example, we will encounter a situation when the domain is not a finite closed interval $[a, b]$, and so we have to argue a bit more carefully instead of just comparing critical points and the endpoints.

Example 5. Suppose that in a family of similar drugs, the price of a drug with a half-life of $h$ hours in a patient's body is $h^2$ dollars per mg. If we have \$100 to spend, what drug should we buy if our goal is to maximize the amount of drug remaining in a patient's body 2 hours after administering it?

Solution: \$100 is the price of $\frac{100}{h^2}$ mg of the drug with half-life $h$ hours. Half-life $h$ means that the amount of drug in a patient's body is decaying exponentially and the amount remaining after $t$ hours is proportional to $2^{-t/h}$. In our case, the amount left will be $\frac{100}{h^2}2^{-t/h}$ after $t$ hours, and after 2 hours it will be

$$a = \frac{100}{h^2}\,2^{-\frac{2}{h}}.$$

We want to maximize this over $h > 0$. We can of course take the derivative to find critical points, but there is one useful trick that can simplify the calculations.
We see that we divide by $h$ everywhere, so if we rename $\frac{1}{h}$ as $x$ then $a = 100x^2 2^{-2x}$. Let us maximize this function over $x > 0$ and then find the optimal $h = \frac{1}{x}$. First,

$$a'(x) = 100(2x)2^{-2x} + 100x^2\ln(2)\,2^{-2x}(-2) = 2^{-2x}\,200x\big(1 - \ln(2)x\big) = 0$$

when $x = 0$ or $x = \frac{1}{\ln(2)} \approx 1.44$. If we plug these points into $a(x) = 100x^2 2^{-2x}$, we get $a(0) = 0$ and $a(1.44) \approx 28.17$. Does this mean that the global maximum is at $x = \frac{1}{\ln(2)}$? Since our domain here is all $x \in (0, \infty)$, which is not a finite closed interval, what do we do about the endpoints? We already checked what happens to $a(x)$ at $x = 0$, $a(0) = 0$, but what about $x = \infty$? Of course, we can graph the function and see that $x = \frac{1}{\ln(2)}$ is the global maximum. However, without a graphing calculator, there are several ways we can proceed.

One way to decide if $x = 1.44$ is a global maximum is to look at the derivative $a'(x)$. We see that $a'(x) > 0$ when $0 < x < \frac{1}{\ln(2)}$ and $a'(x) < 0$ when $x > \frac{1}{\ln(2)}$, because the sign of $a'(x)$ is the sign of $1 - \ln(2)x$. So the function $a(x)$ is increasing before this critical point and decreasing after it, which shows that indeed $x = \frac{1}{\ln(2)}$ is a global maximum.

Another way would be to notice that

$$a(x) = 100x^2 2^{-2x} = \frac{100x^2}{2^{2x}} \to 0$$

when $x \to \infty$, because exponential growth $2^{2x}$ dominates the power function $x^2$ at infinity. This also shows that $x = \frac{1}{\ln(2)}$ is a global maximum. Finally, we recall that the original problem was in terms of the half-life $h$, and the optimal choice is $h = \frac{1}{x} = \ln(2)$.

Exercise 5. A swimmer is 20 meters from the closest point A on the beach and her beach umbrella is at the point B, which is 60 meters from point A. The swimmer swims at 1 m/s and runs on the beach at 4 m/s. Toward which point C between A and B should the swimmer aim to minimize the total time to reach her umbrella?

Example 6. Suppose we have a $10'' \times 10''$ piece of cardboard, and we cut off the four corners of size $x \times x$ and then fold the sides to make a box. What is the maximum volume of the box $V = V(x)$?
Solution: The dimensions of the box will be $(10 - 2x) \times (10 - 2x) \times x$, and the volume is $V(x) = (10 - 2x)^2 x$. The domain is $0 \le x \le 5$. Let us find critical points:

$$V'(x) = 2(10 - 2x)(-2)x + (10 - 2x)^2 = (10 - 2x)\big(-4x + (10 - 2x)\big) = (10 - 2x)(10 - 6x) = 0$$

when $x = 5$ or $x = \frac{10}{6}$. Since $V(0) = V(5) = 0$, the maximum volume is $V(\frac{10}{6}) \approx 74$.

Exercise 6. The Statue of Liberty is 46 meters high and it stands on a pedestal which is also 46 meters high. At what distance $d$ from the base is the angle of view of the statue (angle $\theta$ in the figure) as large as possible?

If the domain is an infinite or open interval, for example $[0, \infty)$ or $(0, 1)$, then a function might not have a global maximum or minimum, as we will see in the next two problems.

Example 7. Give an example of a continuous function that (a) does not have a global minimum on $[0, \infty)$, (b) does not have a global maximum on $(-2, 2)$.

Solution: (a) For example, the exponential decay function $y = e^{-x}$ does not have a global minimum on $[0, \infty)$, because it is decreasing and approaching 0 as $x \to \infty$, but it never actually reaches 0, so there is no point $x$ where $e^{-x}$ takes the smallest value.

(b) For example, $y = x^2$ does not have a global maximum on $(-2, 2)$, because it is approaching 4 as $x$ approaches $-2$ or $2$, but it never actually reaches 4 because $-2$ and $2$ are not in the domain, so there is no point $x$ on $(-2, 2)$ where $x^2$ takes the largest value.

Exercise 7. Sketch a graph of a differentiable function $y = f(x)$ on the open interval $(-4, 4)$ such that

• $f(x)$ has at least one local min on $(-4, 4)$
• $f(x)$ has at least one local max on $(-4, 4)$
• $f(x)$ has a global max on $(-4, 4)$
• $f(x)$ does not have a global min on $(-4, 4)$
• $f(x)$ has a critical point at $x = 3$ which is not a local max or min
• $f(x)$ has an inflection point at $x = -2$

The next two problems will refer to the following two figures.

Example 8.
In the figure above on the left we are given the graph of the second derivative $f''(x)$ of some function $y = f(x)$ on the interval $[-4, 4]$. If $f'(-2) = 0$, where is the global maximum and minimum of $f(x)$ on this interval?

Solution: First, we will use that $f''(x)$ is the derivative of $f'(x)$. Because $f''(x)$ in the figure is negative for $x < -2$, $f'(x)$ is decreasing for $x < -2$, and because $f''(x)$ is positive for $x > -2$, $f'(x)$ is increasing for $x > -2$. This means that $x = -2$ is the global minimum of $f'(x)$. We are given that $f'(-2) = 0$, so $f'(x)$ is never negative, as in the figure on the right. Once we know that $f'(x) \ge 0$, this means that $f(x)$ is increasing, so its global maximum on the interval $[-4, 4]$ is at $x = 4$, and its global minimum is at $x = -4$, as in the figure.

Exercise 8. In the figure above on the right we are given the graph of the second derivative $f''(x)$ of some function $y = f(x)$ on the interval $[-4, 4]$. If $f'(1) = 0$, where is the global maximum and minimum of $f(x)$ on this interval?

Answer to Exercise 1. First we find critical points:

$$g'(x) = 2e^{-\frac{(x-1)^2}{2}} - 2(x-1)^2 e^{-\frac{(x-1)^2}{2}} = 0$$

when $2 - 2(x-1)^2 = 0$, or $(x-1)^2 = 1$, or $x - 1 = \pm 1$, so $x = 0$ and $x = 2$. The point $x = 0$ is inside the interval and $x = 2$ is its right endpoint, so it remains to compare the values: $g(0) \approx -1.713$, $g(2) \approx 0.713$, $g(-2) \approx -0.567$. This means that $x = 0$ is the global minimum and $x = 2$ is the global maximum on the interval $[-2, 2]$. This matches the above figure.

Answer to Exercise 2. Let us find critical points first:

$$y'(x) = 168\cdot 5x^4(1-x)^2 - 168\cdot 2x^5(1-x) = 168x^4(1-x)\big(5(1-x) - 2x\big) = 0$$

when $x = 0$, $x = 1$, or when $5(1-x) - 2x = 0$, i.e. $x = \frac{5}{7} \approx 0.714$. At the endpoints $y(0) = y(1) = 0$, while $y(0.714) \approx 2.55$, so the most likely grade is $x = 0.714$, or about 71. By the way, if you were wondering, the constant 168 was chosen in such a way that the area under the curve is equal to 1, representing 100% of all students.

Answer to Exercise 3.
The perimeter $P$ and the area $A$ of the region are

$$P = 2x + y + \frac{\pi}{2}y = 2x + \frac{2+\pi}{2}y, \qquad A = xy + \frac{1}{2}\pi\left(\frac{y}{2}\right)^2 = xy + \frac{\pi y^2}{8}.$$

The perimeter is 100 meters, so $2x + \frac{2+\pi}{2}y = 100$, and $x = 50 - \frac{2+\pi}{4}y$, so we can write the area in terms of $y$ as

$$A = \left(50 - \frac{2+\pi}{4}y\right)y + \frac{\pi y^2}{8} = 50y - \frac{4+\pi}{8}y^2.$$

Since $A'(y) = 50 - \frac{4+\pi}{4}y = 0$ when $y = \frac{200}{4+\pi} \approx 28.005$, this is the only critical point. Since $2x + \frac{2+\pi}{2}y = 100$, $\frac{2+\pi}{2}y$ must be between 0 and 100, so $y \in [0, \frac{200}{2+\pi}] \approx [0, 38.898]$, and the critical point $28.005$ is inside this domain. It remains to compare the values: $A = 0$ at $y = 0$, $A \approx 700.124$ at $y = 28.005$, and $A \approx 594.189$ at $y = 38.898$. The maximal area is $A \approx 700.124$ when $y \approx 28.005$ and $x \approx 14.002$.

Answer to Exercise 4. If we spend $x$ meters on a round enclosure, we will have $\ell - x$ meters left for a square enclosure. If the radius of the round enclosure is $r$ then $2\pi r = x$, so $r = \frac{x}{2\pi}$. Then its area is $\pi r^2 = \frac{x^2}{4\pi}$. Similarly, if the side of the square enclosure is $s$ then $4s = \ell - x$ and $s = \frac{\ell - x}{4}$. Then its area is $s^2 = \frac{(\ell - x)^2}{16}$. This means that the total area will be

$$A = \frac{x^2}{4\pi} + \frac{(\ell - x)^2}{16}.$$

The domain of this function is $[0, \ell]$ because we can spend anywhere between 0 and $\ell$ meters on the round enclosure. Let us find critical points:

$$A'(x) = \frac{x}{2\pi} - \frac{\ell - x}{8} = 0 \;\Rightarrow\; \frac{\ell}{8} = \frac{x}{2\pi} + \frac{x}{8} = \frac{4+\pi}{8\pi}x \;\Rightarrow\; x = \frac{\pi}{4+\pi}\ell \approx 0.44\ell.$$

If we plug this into $A$, we will get about $0.035\ell^2$. At the endpoints, $A(0) = 0.0625\ell^2$ and $A(\ell) \approx 0.0796\ell^2$. This means that the largest total area is when we spend all of the fence $\ell$ on one round enclosure, and the smallest total area is when we spend $\approx 0.44\ell$ on the round enclosure and $\approx 0.56\ell$ on the square enclosure. Notice that, as in Example 3, the maximum area is achieved when we make one big circular enclosure, which was a possible option in this problem.

Answer to Exercise 5. If the distance between A and C is $x$ then the swimming distance between the swimmer and C is $\sqrt{20^2 + x^2}$ and the running distance between C and B is $60 - x$.
Since the swimming speed is 1 m/s and the running speed is 4 m/s, the total time to reach her umbrella is

$$T = \frac{\sqrt{20^2 + x^2}}{1} + \frac{60 - x}{4} = \sqrt{400 + x^2} + \frac{60 - x}{4}.$$

We want to minimize this for $x$ between 0 and 60. Let us find critical points:

$$T'(x) = \frac{2x}{2\sqrt{400 + x^2}} - \frac{1}{4} = \frac{x}{\sqrt{400 + x^2}} - \frac{1}{4} = 0$$

when $4x = \sqrt{400 + x^2}$ and, squaring both sides,

$$16x^2 = 400 + x^2 \;\Longrightarrow\; x^2 = \frac{400}{15} \;\Longrightarrow\; x = \sqrt{\frac{400}{15}} \approx 5.16.$$

This critical value is in the domain $[0, 60]$ and plugging it in, we get $T \approx 34.36$ seconds. We then check the endpoints, $T(0) = 35$ and $T(60) \approx 63.25$, and we see that the optimal point C is at the distance 5.16 meters from A.

Answer to Exercise 6. Let $\alpha$ be the angle between the horizontal line and the line of sight to the bottom of the Statue of Liberty, and let $\beta$ be the angle between the horizontal line and the line of sight to the top of the Statue of Liberty, as in the figure. Then $\theta = \beta - \alpha$. From the right triangles, we see that

$$\tan(\alpha) = \frac{46}{d} \quad\text{and}\quad \tan(\beta) = \frac{92}{d},$$

so $\alpha = \arctan(\frac{46}{d})$, $\beta = \arctan(\frac{92}{d})$, and $\theta = \arctan(\frac{92}{d}) - \arctan(\frac{46}{d})$. We want to maximize this angle $\theta$ over all distances $d > 0$. First, let us find the critical points. Recall that $(\arctan(x))' = \frac{1}{1+x^2}$. Then

$$\theta'(d) = \frac{1}{1 + (\frac{92}{d})^2}\left(-\frac{92}{d^2}\right) - \frac{1}{1 + (\frac{46}{d})^2}\left(-\frac{46}{d^2}\right) = \frac{46}{d^2 + 46^2} - \frac{92}{d^2 + 92^2}.$$

If we set this equal to zero and solve for $d$, we get $d^2 = 92 \cdot 46$, so $d \approx 65.05$. To check that this is the global maximum, we can notice that $\theta(d)$ approaches 0 both when $d \to 0$ and when $d \to \infty$. This can be seen from the figure, or checked using that $\arctan(0) = 0$ and $\arctan(x) \to \frac{\pi}{2}$ as $x \to \infty$. The best viewing angle of the Statue of Liberty is at the distance of $\approx 65$ meters.

Answer to Exercise 7. The graph is mostly self-explanatory. The reason there is no global minimum is that the smallest values of the function are near $x = 4$, where $f(x)$ is decreasing but never reaches the smallest value because $x = 4$ is not in the domain.

Answer to Exercise 8.
Because $f''(x)$ in the figure is positive for $x < 1$, $f'(x)$ is increasing for $x < 1$, and because $f''(x)$ is negative for $x > 1$, $f'(x)$ is decreasing for $x > 1$. This means that $x = 1$ is the global maximum of $f'(x)$. We are given that $f'(1) = 0$, so $f'(x)$ is never positive. Once we know that $f'(x) \le 0$, this means that $f(x)$ is decreasing, so its global maximum on the interval $[-4, 4]$ is at $x = -4$, and its global minimum is at $x = 4$.

2.8 Parametric families of functions

The goal of this section is twofold. On the one hand, we will introduce several families of functions that appear in applications and can be used, for example, for modelling various growth and decay processes. On the other hand, understanding the shape of these functions will give us an opportunity to use derivatives, in addition to reviewing scaling of functions. As a motivation, let us keep in mind the following question, which will be answered after we introduce these families of functions.

Motivating Question. Match the following families of functions

1. bell curve $y = ce^{-\frac{(t-a)^2}{2b^2}}$;
2. logistic curve $y = \frac{c}{1 + e^{-b(t-a)}}$;
3. exponential with a limit curve $y = c(1 - e^{-bt})$;
4. surge function $y = cte^{-bt}$;
5. quadratic polynomial $y = -at^2 + bt + c$;

with the following behaviour of bacterial growth and decay,

(a) the number of bacteria in a Petri dish grows quickly from the beginning, but then runs out of food and dies out quickly;
(b) the number of bacteria in a Petri dish grows faster and faster initially and then stabilizes;
(c) the number of bacteria in a Petri dish grows quickly from the beginning until it stabilizes;
(d) the number of bacteria in a Petri dish grows faster and faster, but then runs out of food and dies out over time;
(e) the number of bacteria in a Petri dish grows quickly from the beginning, but then runs out of food and dies out over time.

Bell curve.
First, let us introduce the so-called bell curve given by

$$y = ce^{-\frac{(x-a)^2}{2b^2}}$$

where $a \in \mathbb{R}$ is any real number, and $b > 0$, $c > 0$ are any positive numbers. Various features of these curves are summarized in the figure below, and we will check some of them in the next two problems. This family of curves is most famously used to describe (or model) the distribution of many quantities occurring in nature, physical experiments, finance, etc. [16] Here, we are simply interested in its shape and basic properties. In the first example we will see how we can obtain all bell curves by rescaling one of them.

[16] https://en.wikipedia.org/wiki/Normal_distribution#Occurrence_and_applications

Example 1. Given a standard bell curve $y = e^{-\frac{x^2}{2}}$ with parameters $a = 0$, $b = 1$ and $c = 1$, if we stretch its graph horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, what is the function corresponding to the resulting curve?

Solution: Recall that stretching a graph of $f(x)$ horizontally $b$ times corresponds to $f(\frac{x}{b})$, then stretching the result vertically $c$ times corresponds to $cf(\frac{x}{b})$ and, finally, shifting to the right by $a$ corresponds to $cf(\frac{x-a}{b})$. When $f(x) = e^{-x^2/2}$, we will get $y = ce^{-(x-a)^2/(2b^2)}$, which is the general bell curve above.

Exercise 1. Show that $y = e^{-\frac{x^2}{2}}$ has the global maximum $y = 1$ at $x = 0$ and two inflection points at $x = -1$, $x = +1$. Explain how this, together with the previous example, implies the location of the maximum ($y = c$ at $x = a$) and inflection points ($x = a \pm b$) for the general bell curve above.

Logistic curve. Next, let us consider the so-called logistic function given by

$$y = \frac{c}{1 + e^{-b(x-a)}} = \frac{c}{1 + \kappa e^{-bx}}$$

where parameter $a$ is any real number, $b > 0$, $c > 0$, $\kappa > 0$ are any positive numbers, and where parameters $a$ and $\kappa$ are interchangeable and related by $\kappa = e^{ba}$ or $a = \frac{1}{b}\ln(\kappa)$. Logistic curves have many applications, for example, in modelling various growth processes. [17] The two formulas above are just slightly different representations of the same function because

$$e^{-b(x-a)} = e^{-bx+ba} = e^{ba}e^{-bx} = \kappa e^{-bx}$$

if we set $\kappa = e^{ba}$. In other words, parameter $a$ can be replaced by $\kappa$ or vice versa. Various features of these curves are summarized in the figure. For example, $e^{-bx}$ goes to 0 as $x \to +\infty$, so the function approaches $\frac{c}{1+0} = c$ as in the figure, and $e^{-bx}$ goes to $\infty$ as $x \to -\infty$, so the function approaches 0. In the next example we will see how we can obtain all logistic curves by rescaling one of them.

[17] https://en.wikipedia.org/wiki/Logistic_function#Applications

Example 2. Given a standard logistic curve $y = \frac{1}{1 + e^{-x}}$ with parameters $a = 0$, $b = 1$ and $c = 1$, if we shrink its graph horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, what is the function corresponding to the resulting curve?

Solution: Shrinking a graph of $f(x)$ horizontally $b$ times corresponds to $f(bx)$, then stretching the result vertically $c$ times corresponds to $cf(bx)$ and, finally, shifting to the right by $a$ corresponds to $cf(b(x-a))$. When $f(x) = \frac{1}{1 + e^{-x}}$, we will get $y = \frac{c}{1 + e^{-b(x-a)}}$, which is the general logistic curve above.

Exercise 2. Show that $y = \frac{1}{1 + e^{-x}}$ is increasing and has one inflection point at $x = 0$, where $y(0) = \frac{1}{2}$. Discuss how this, together with the previous example, explains the shape of the general logistic curve above.

Exponential with a limit. Next, we will introduce the exponential with a limit function given by

$$y = c(1 - e^{-bx}) \quad\text{for } x \ge 0,$$

where $b > 0$ and $c > 0$ are any positive numbers. Notice that here our domain starts at $x = 0$ and we do not shift the function horizontally, so we do not have a parameter $a$ as in the above two families. We think of $x = 0$ as the starting point of the process, although we could, of course, introduce a horizontal shift if we wanted to.
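To get a feel for this family, one can tabulate a few values. The sketch below is only an illustration: the function name `exp_with_limit` and the parameter values $b = 0.5$, $c = 3$ are assumptions chosen for the example, not values from the text.

```python
import math


def exp_with_limit(x, b, c):
    """Exponential with a limit: y = c * (1 - exp(-b * x)) for x >= 0."""
    return c * (1.0 - math.exp(-b * x))


# Illustrative parameters only (assumed, not from the text).
b, c = 0.5, 3.0
xs = (0, 1, 2, 5, 10, 20)
ys = [exp_with_limit(x, b, c) for x in xs]

for x, y in zip(xs, ys):
    print(f"x = {x:>2}: y = {y:.4f}")
```

The printed values start at 0 for $x = 0$ and level off just below $c$ for large $x$.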
The function $y = c(1 - e^{-bx})$ is a familiar exponential decay function $e^{-bx}$ flipped around the $x$-axis by a minus sign, $-e^{-bx}$, then shifted up by one, $1 - e^{-bx}$, and stretched vertically $c$ times. As a result, it is increasing, and we can say that it is approaching a horizontal asymptote $y = c$ exponentially fast, as in the figure. For this reason, these functions can be used to describe processes that approach some limiting value $c$ exponentially fast (see Exercise 3 below for a real life example). As before, we can choose a standard member of the family and obtain all these curves by rescaling it.

Example 3. Given an exponential with a limit $y = 1 - e^{-x}$ with parameters $b = 1$ and $c = 1$, if we shrink its graph horizontally $b$ times and stretch it vertically $c$ times, what is the function corresponding to the resulting curve? Show that, as in the figure above, $y = c(1 - e^{-bx})$ is increasing and concave down if parameters $b > 0$ and $c > 0$ are positive, and the derivative $y'(0) = cb$.

Solution: Shrinking a graph of $f(x)$ horizontally $b$ times corresponds to $f(bx)$, then stretching the result vertically $c$ times corresponds to $c f(bx)$. When $f(x) = 1 - e^{-x}$, we will get $y = c(1 - e^{-bx})$, which is the general exponential with a limit curve. The derivative $y'(x) = cb\,e^{-bx}$ is positive, so the function is increasing, and $y'(0) = cb$. The second derivative $y''(x) = -cb^2 e^{-bx}$ is negative, so the function is concave down.

Exercise 3. A pie at room temperature of 20 °C is put into an oven at temperature 200 °C, after which its temperature $T$ starts to change according to the formula $T = k + c(1 - e^{-bt})$, where time $t$ is measured in minutes. What are $k$ and $c$? If the pie temperature is initially increasing at 18 °C per minute, what is $b$?

Surge function. Next, we will consider the so-called surge function

$$y = cxe^{-bx}$$

for $x \geq 0$, where $b > 0$, $c > 0$ are any positive numbers. Similarly to the exponential with a limit, the domain here starts at zero and the function grows quickly from the beginning, but eventually starts decreasing.
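To get a feel for this rise-then-fall behaviour, here is a small sketch that evaluates a surge function on a grid and locates the largest value. The parameter values and the grid are illustrative assumptions; the exact location of the peak, $x = \frac{1}{b}$, is what Exercise 4 in this section asks us to derive, and the grid search only approximates it.

```python
import math

def surge(x, b, c):
    """Surge function y = c * x * e^{-b x} for x >= 0."""
    return c * x * math.exp(-b * x)

# Illustrative parameters (an assumption, not taken from the text).
b, c = 2.0, 3.0

# Evaluate on a grid over [0, 10] with step 0.001.
xs = [i * 0.001 for i in range(10001)]
ys = [surge(x, b, c) for x in xs]

# Index of the largest grid value and the corresponding x.
i_max = max(range(len(xs)), key=lambda i: ys[i])
x_peak = xs[i_max]

print("approximate peak location:", x_peak, "compare with 1/b =", 1 / b)
print("value at the right end of the grid:", ys[-1])
```

The function starts at zero, reaches its largest grid value near $x = \frac{1}{b}$, and is already very close to zero at the right end of the grid.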
The surge function has a horizontal asymptote $y = 0$ as $x \to +\infty$ because $cxe^{-bx} = \frac{cx}{e^{bx}}$ and exponential growth $e^{bx}$ dominates the power function $x$ at infinity. The constant $c$ here does not quite play the same role as before (stretching vertically by $c$) for the reason explained in the next example.

Example 4. Given a standard surge function $y = xe^{-x}$ with parameters $b = 1$ and $c = 1$, if we shrink its graph horizontally $b$ times and stretch it vertically $c$ times, what is the function corresponding to the resulting curve?

Solution: As before, shrinking a graph of $f(x)$ horizontally $b$ times corresponds to $f(bx)$, then stretching the result vertically $c$ times corresponds to $c f(bx)$. When $f(x) = xe^{-x}$, we will get $y = cbxe^{-bx}$. It does not really make sense to write a constant $cb$ in front of $xe^{-bx}$, because it is just one constant, so we wrote it simply as $c$ in the definition of the general surge function. Of course, given $y = cxe^{-bx}$, if we rewrite it as $y = \left(\frac{c}{b}\right)bxe^{-bx}$ then $\frac{c}{b}$ is the vertical stretch factor; that is why in the figure above the global maximum $\frac{c}{be} = \frac{c}{b}\cdot\frac{1}{e}$ is proportional to $\frac{c}{b}$.

Exercise 4. Show that $y = xe^{-x}$ has global maximum $y = \frac{1}{e}$ at $x = 1$, and one inflection point at $x = 2$. If a surge function $y = cxe^{-bx}$ has a global maximum at $x = 0.75$, where is its inflection point?

Now that we have discussed all the families of functions in the Motivating Question above (except for the quadratic polynomial that we are already familiar with), go back to that question and try to match different shapes with the description of bacterial growth or decay. After that, check your answers with the answers at the end of the section. We will introduce one more family of functions below, called the Gompertz functions, but first let us solve a couple of quick problems about parametric families of functions.

Example 5. Find the global minimum of $y = x + \frac{a^2}{x}$ for $x > 0$, where the parameter $a$ is positive, $a > 0$.
Can we write this family as a horizontal and vertical stretching of $y = x + \frac{1}{x}$?

Solution: We see that $y'(x) = 1 - \frac{a^2}{x^2} = 0$ when $x^2 = a^2$ or $x = \pm a$. Also, the derivative is undefined at $x = 0$, because we divide by zero. However, since our domain is $x > 0$, the only critical point is $x = a$. We can see that the derivative is negative for $x < a$ and positive for $x > a$, so $x = a$ is the global minimum, where the value is $y(a) = 2a$. The function $y = x + \frac{a^2}{x}$ approaches $+\infty$ when $x$ approaches $0$ or $+\infty$. Check that this matches the graphs for $a = 0.5$, $1$ and $2$ in the figure.

If we start with $y = x + \frac{1}{x}$, then stretch it horizontally $b$ times and vertically $c$ times, we will get

$$c\,y\!\left(\frac{x}{b}\right) = c\left(\frac{x}{b} + \frac{b}{x}\right) = \frac{c}{b}\,x + \frac{cb}{x}.$$

Can we choose $b$ and $c$ in such a way as to get $y = x + \frac{a^2}{x}$? In other words, we need that $\frac{c}{b} = 1$ and $bc = a^2$, so $b = c$ and $b^2 = a^2$. Yes, we can take $b = c = a$, so this family of functions is $y = x + \frac{1}{x}$ stretched horizontally $a$ times and vertically $a$ times.

Exercise 5. Find a function of the form $y = ae^{-x} + bx$ with the global minimum at $(1, 2)$.

Gompertz curve. Finally, we consider the Gompertz function

$$y = c\,e^{-e^{-b(x-a)}} = c\,e^{-\kappa e^{-bx}},$$

where $\kappa > 0$, $b > 0$, $c > 0$ are any positive numbers, $a \in \mathbb{R}$ is any real number, and where $\kappa = e^{ba}$, or $a = \frac{1}{b}\ln(\kappa)$. Gompertz functions are used to model growth of tumours, adoption of technology (e.g. cellphones), etc.[18] As in the case of the logistic function, the two formulas above are just different representations of the same function, where the parameter $a$ can be replaced by $\kappa$ or vice versa. All Gompertz functions are rescalings of one of them.

Example 6. Given a standard Gompertz curve $y = e^{-e^{-x}}$ with parameters $a = 0$, $b = 1$ and $c = 1$, if we shrink its graph horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, what is the function corresponding to the resulting curve? What are the horizontal asymptotes of $y = e^{-e^{-x}}$?
Solution: Shrinking a graph of $f(x)$ horizontally $b$ times corresponds to $f(bx)$, then stretching the result vertically $c$ times corresponds to $c f(bx)$ and, finally, shifting to the right by $a$ corresponds to $c f(b(x-a))$. When $f(x) = e^{-e^{-x}}$, we will get $y = c\,e^{-e^{-b(x-a)}}$, which is the general Gompertz curve above. Because $e^{-x} \to 0$ as $x \to \infty$, we see that $e^{-e^{-x}} \to e^{-0} = 1$, and because $e^{-x} \to \infty$ as $x \to -\infty$, we see that $e^{-e^{-x}} \to 0$. So the horizontal asymptotes are $y = 1$ and $y = 0$. This matches the horizontal asymptotes of the general Gompertz function in the figure above after vertical rescaling by $c$.

[18] https://en.wikipedia.org/wiki/Gompertz_function#Example_uses

Exercise 6. Show that $y = e^{-e^{-x}}$ is increasing and has one inflection point at $x = 0$, where $y(0) = \frac{1}{e} \approx 0.37$. Discuss how this, together with the previous example, explains the shape of the general Gompertz curve above.

Bonus: Gompertz decay function. If we change $b$ to $-b$ in the Gompertz curve above, we will get another version of the Gompertz function,

$$y = c\,e^{-e^{b(x-a)}} = c\,e^{-\kappa e^{bx}},$$

where $\kappa > 0$, $b > 0$, $c > 0$ are any positive numbers, $a \in \mathbb{R}$ is any real number, and where $\kappa = e^{-ba}$, or $a = -\frac{1}{b}\ln(\kappa)$. Changing $b$ to $-b$ flips the Gompertz growth functions horizontally and turns them into Gompertz decay functions. Most famously, these functions can be used to model human survival chances with age, based on the empirical Gompertz law of mortality.[19] We will not discuss this further here, but you can learn more about it in the video in the footnote link.[20] Similarly, by changing $b$ to $-b$, logistic growth above can be turned into logistic decay.

Answer to the Motivating Question. 1(d), 2(b), 3(c), 4(e), 5(a).

Answer to Exercise 1. Since $f'(x) = e^{-x^2/2}(-x) = 0$ when $x = 0$, and $f'(x)$ is negative for $x > 0$ and positive for $x < 0$, we see that the critical point $x = 0$ is the global maximum.
Since

$$f''(x) = e^{-x^2/2}(-x)^2 + e^{-x^2/2}(-1) = e^{-x^2/2}(x^2 - 1),$$

we can see that $f''(x) = 0$ when $x = -1$ or $+1$, it is negative in between $x = -1$ and $x = +1$ and positive elsewhere, so the function is concave down between $x = -1$ and $x = +1$ and concave up outside of this interval. This means that $x = -1$ and $x = +1$ are inflection points. If we stretch the graph of $y = e^{-x^2/2}$ horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, the maximum will move to $y = c$ at $x = a$, and the inflection points will move to $a - b$ and $a + b$, just like in the figure of the general bell curve.

[19] https://en.wikipedia.org/wiki/Gompertz-Makeham_law_of_mortality
[20] https://youtu.be/6Lyv53YTPDU

Answer to Exercise 2. Since $f'(x) = \frac{e^{-x}}{(1+e^{-x})^2}$, the derivative is always positive, so the function is always increasing. Since, by the quotient rule,

$$f''(x) = \frac{-e^{-x}(1+e^{-x})^2 - e^{-x}\cdot 2(1+e^{-x})(-e^{-x})}{(1+e^{-x})^4} = \frac{e^{-x}(e^{-x}-1)}{(1+e^{-x})^3},$$

we can see that $f''(x) = 0$ when $e^{-x} = 1$, or $x = 0$, it is positive for $x < 0$ and negative for $x > 0$, so the function is concave up for $x < 0$ and concave down for $x > 0$. This means that $x = 0$ is the only inflection point. If we shrink the graph of $y = \frac{1}{1+e^{-x}}$ horizontally $b$ times, stretch it vertically $c$ times and shift it to the right by $a$, the inflection point will move to $x = a$, where the value will be $y = \frac{c}{2}$, just like in the figure of the general logistic curve.

Answer to Exercise 3. Since $T(0) = k + c(1 - e^{-b\cdot 0}) = k$ and at time $t = 0$ the pie is at room temperature 20 °C, this means that $k = 20$. The limit of $T = k + c(1 - e^{-bt})$ as $t \to \infty$ is $k + c(1 - 0) = k + c$, which should be the oven temperature, so $k + c = 200$ and $c = 200 - k = 200 - 20 = 180$. So $T = 20 + 180(1 - e^{-bt})$. The derivative $T'(0) = 180b$ is exactly the initial rate of increase of the pie temperature, so $180b = 18$ and $b = 0.1$. We finally get that $T = 20 + 180(1 - e^{-0.1t})$.

Answer to Exercise 4.
Since $y'(x) = (xe^{-x})' = e^{-x} - xe^{-x} = (1-x)e^{-x}$, we can see that $y'(x) = 0$ when $x = 1$, it is positive for $x < 1$ and negative for $x > 1$, so $x = 1$ is the global maximum. If we plug in $x = 1$ we get $y = 1\cdot e^{-1} = \frac{1}{e}$. The second derivative

$$y''(x) = ((1-x)e^{-x})' = -e^{-x} - (1-x)e^{-x} = (-2+x)e^{-x}$$

is equal to zero at $x = 2$, where it changes sign from negative to positive, so $x = 2$ is an inflection point; the function is concave down for $x < 2$ and concave up for $x > 2$, as in the figure above. A general surge function $y = cxe^{-bx}$ is the graph of $y = xe^{-x}$ shrunk horizontally $b$ times (and stretched vertically), so its global maximum is at $x = \frac{1}{b}$ and its inflection point is at $x = \frac{2}{b}$. If the global maximum is at $x = 0.75$, it means that $\frac{1}{b} = 0.75$ and, as a result, the inflection point is at $\frac{2}{b} = 2 \times 0.75 = 1.5$.

Answer to Exercise 5. Since the global minimum is at $(1, 2)$, $x = 1$ must be a critical point. Since $y'(x) = -ae^{-x} + b$, we must have $y'(1) = -ae^{-1} + b = 0$, or $a = eb$. Also, $y(1) = 2$, so $ae^{-1} + b = 2$. Because $a = eb$, we get $ae^{-1} + b = b + b = 2$, so $b = 1$ and, finally, $a = e$.

Answer to Exercise 6. Since $y'(x) = e^{-e^{-x}}e^{-x} = e^{-e^{-x}-x} > 0$, the function is always increasing. Since $y''(x) = e^{-e^{-x}-x}(e^{-x} - 1) = 0$ when $e^{-x} = 1$ or $x = 0$, and $y''(x)$ is positive for $x < 0$ and negative for $x > 0$, the function is concave up for $x < 0$, concave down for $x > 0$ and has an inflection point at $x = 0$, where $y(0) = \frac{1}{e}$. After rescaling and horizontal shift as in the previous example, we get exactly the shape as in the figure.

2.9 Related rates

In this section we will take a look at problems where
• several quantities are related through some geometric or physical constraint,
• these quantities are changing at the same time.

If we have information about the rate of change of one (or more) of them, we can find the rate of change of the other by using the relationship between these quantities. For example, if variables $x = x(t)$ and $y = y(t)$ are related through some equation and we know $y'(t)$, we can find $x'(t)$ by using this equation.
We will see that, typically, we can solve the equation for $x$ and then take the derivative, but sometimes we can use implicit differentiation by taking the derivative of the equation and then solving for $x'(t)$.

Example 1. During takeoff an airplane is climbing (gaining altitude) at 900 meters per minute, and the temperature outside is dropping at 7 °C per 1000 meters. How fast is the temperature outside the plane changing?

Solution: Let us start by writing all the variables, the given information about the variables, and the quantity we want to find.
• Variables mentioned in the problem are: altitude $y$ (in meters), temperature $T$ (in °C), and time $t$ (in minutes).
• We are given the following information:
$$y'(t) = \frac{dy}{dt} = 900 \text{ m/min} \quad\text{and}\quad T'(y) = \frac{dT}{dy} = -7\ {}^\circ\text{C}/1000\text{ m} = -0.007\ {}^\circ\text{C/m}.$$
Notice that, because we were given the rate of change of temperature in degrees per 1000 meters, it means that at this moment we view temperature as a function of altitude, $T = T(y)$.
• The question "How fast is the temperature outside the plane changing?" means that we want to find the rate of change $T'(t) = \frac{dT}{dt}$ of temperature with respect to time. In other words, we are interested in temperature as a function of time, $T = T(t)$.

To summarize, we have three variables $y$, $T$ and $t$, we have some information about $y(t)$ and $T(y)$, and we want to learn something about $T(t)$. For this, we need to find the relationship between all the functions involved, $y(t)$, $T(y)$ and $T(t)$. In this case, the key equation that relates all the functions is

$$T(t) = T(y(t)).$$

In other words, we plug in $y(t)$ into $T(y)$ to get $T(y(t))$, the temperature as a function of time. Sometimes the relationship is given simply by composition of functions. Since we want to find $T'(t)$, we differentiate the key equation, in this case using the chain rule,

$$T'(t) = (T(y(t)))' = T'(y(t))\,y'(t) = -0.007\ {}^\circ\text{C/m} \times 900 \text{ m/min} = -6.3\ {}^\circ\text{C/min}.$$

Exercise 1.
A colony of bacteria in a Petri dish is growing in a circular shape. When the radius is 5 mm, it is growing at the rate of 1 mm per day. How fast does the area change at that moment?

Example 2. The plane is climbing at 400 meters per minute, and the temperature outside the plane is dropping at 2 °C per minute. At that moment, how fast is the temperature changing with altitude?

Solution: This example is almost the same as the first example above, and the solution is the same up to the derivative of the key equation:

$$T'(t) = T'(y(t))\,y'(t).$$

This relationship allows us to find one of the rates $T'(t)$, $T'(y)$, $y'(t)$ given the other two. For example, right now we are given that $y'(t) = 400$ m/min and $T'(t) = -2$ °C/min, and we can find

$$T'(y(t)) = \frac{T'(t)}{y'(t)} = -\frac{2}{400}\ {}^\circ\text{C/m} = -5\ {}^\circ\text{C/km}.$$

If, instead, we knew how fast the temperature outside is changing, $T'(t)$, and how temperature changes with altitude, $T'(y)$, we could find how fast the airplane is climbing at that moment, solving for $y'(t) = \frac{T'(t)}{T'(y(t))}$. This is an example of implicit differentiation, which will also be useful in the next exercise.

Exercise 2. Bread dough is rising in the oven during the first few minutes of baking. Its shape is (roughly) hemispherical and, at the moment when the radius is $r = 10$ cm, its volume is increasing at the rate of 200 cm³/min. How fast is the radius changing at the same moment?

Example 3. A 1.3 meter broom leaning against the wall starts sliding away from the wall. When the broom head is 1 meter from the wall (i.e. $x = 1$ in the figure) and it is sliding at a rate of 0.5 m/s, how fast is the top of the handle sliding along the wall?

Solution: The variables mentioned in the problem are $x$ and $y$ depicted in the figure, and time $t$. We are given that $x'(t) = 0.5$ m/s when $x(t) = 1$ m, and we want to find $y'(t)$. What is the relationship between $x$ and $y$?
We are also given that the broom is 1.3 meters long, which we can view geometrically as the hypotenuse of the right triangle with the other two sides $x$ and $y$, so the key equation is $x^2 + y^2 = 1.3^2$, or $x(t)^2 + y(t)^2 = 1.69$. Since we want to find $y'(t)$, we can solve for $y(t) = \sqrt{1.69 - x(t)^2}$ and take the derivative,

$$y'(t) = \frac{1}{2\sqrt{1.69 - x(t)^2}}\,(-2x(t))\,x'(t) = \frac{1}{2\sqrt{1.69 - 1^2}}\,(-2\cdot 1)(0.5) \approx -0.6 \text{ m/s}.$$

Another way is to use implicit differentiation. Taking the derivative of the equation $x(t)^2 + y(t)^2 = 1.69$ we get $2x(t)x'(t) + 2y(t)y'(t) = 0$, and then solving for $y'(t)$,

$$y'(t) = -\frac{x(t)\,x'(t)}{y(t)}.$$

We know that $x(t) = 1$ and $x'(t) = 0.5$, and we can find $y(t)$ from the key equation again, $y(t) = \sqrt{1.69 - x(t)^2} = \sqrt{1.69 - 1^2} \approx 0.83$, so $y'(t) = -\frac{0.5}{0.83} \approx -0.6$ m/s.

Exercise 3. A person standing on the Cherry beach in Toronto is watching an airplane take off from the Billy Bishop airport. When the airplane's altitude is $y = 550$ meters and the person's angle of view is $\theta = \frac{\pi}{6}$, the plane's altitude is increasing at 400 meters per minute and the angle of view is decreasing at 1.8 radians per minute. How fast is the horizontal distance $x$ to the plane changing at that moment, in km/h?

Exercise 4. The Statue of Liberty is 46 meters high and it stands on a pedestal which is also 46 meters high. When a cruise boat is at the distance $d = 100$ meters and is approaching the statue with the speed 5 m/s, how fast does the angle of view of the statue $\theta$ change?

To summarize, the first steps in solving related rates problems are:
• Start by writing all the variables, the given information about the variables, and the quantity we want to find.
• Find the key equation relating all the quantities involved.

After that we can use different strategies:
• If possible, solve the key equation for the variable of interest and then take its derivative.
• Use implicit differentiation: take the derivative of the key equation first and then solve for the derivative we want to find.

Answer to Exercise 1. Variables are: radius $r$, area $A$ and time $t$. We are given that $r'(t) = 1$ mm/day when $r(t) = 5$ mm. We want to know $A'(t)$. The key equation relating the radius and area is $A = \pi r^2$, so $A(t) = \pi(r(t))^2$. Differentiating this equation we get that

$$A'(t) = 2\pi r(t)\cdot r'(t) = 2\pi\cdot 5\cdot 1 = 10\pi \text{ mm}^2/\text{day}.$$

Answer to Exercise 2. Variables are: radius $r$, volume $V$ and time $t$. We are given that $V'(t) = 200$ cm³/min when $r(t) = 10$ cm. We want to know $r'(t)$. The key equation relating the radius and volume of half the sphere is $V = \frac{2\pi}{3}r^3$, so $V(t) = \frac{2\pi}{3}(r(t))^3$. We want to know $r'(t)$, so we can solve for $r(t)$ first and then take the derivative, or we can use implicit differentiation. Let us use implicit differentiation. Differentiating the equation $V(t) = \frac{2\pi}{3}(r(t))^3$ we get that $V'(t) = 2\pi(r(t))^2\cdot r'(t)$. Solving for $r'(t)$ we get

$$r'(t) = \frac{V'(t)}{2\pi(r(t))^2} = \frac{200}{2\pi(10)^2} \approx 0.3183 \text{ cm/min}.$$

Answer to Exercise 3. The variables are $x$, $y$, $\theta$ and time $t$. We are given that $y'(t) = 400$ m/min and $\theta'(t) = -1.8$ rad/min when $y = 550$ and $\theta = \frac{\pi}{6}$. We want to find $x'(t)$ at the same moment. The relationship between variables from the right triangle is $\tan(\theta) = \frac{y}{x}$. Solving for $x$ we get $x(t) = y(t)\cot(\theta(t))$, and taking the derivative (also recall or check that $(\cot(x))' = -\csc^2(x) = -\frac{1}{\sin^2(x)}$),

$$x'(t) = y'(t)\cot(\theta(t)) + y(t)\left(-\csc^2(\theta(t))\right)\theta'(t) = 400\cot(\pi/6) + 550\left(-\csc^2(\pi/6)\right)(-1.8) \approx 4652.82 \text{ m/min},$$

which is about 279 km/h.

Answer to Exercise 4. We are given that $d'(t) = -5$ m/s (minus sign because the boat is moving toward the statue, so the distance is decreasing) when $d(t) = 100$ m, and we want to know $\theta'(t)$, so we need to find the equation relating $d$ and $\theta$.
Let $\alpha$ be the angle between the horizontal line and line of sight to the bottom of the Statue of Liberty, and let $\beta$ be the angle between the horizontal line and line of sight to the top of the Statue of Liberty, as in the figure. Then $\theta = \beta - \alpha$. From the right triangles, we see that

$$\tan(\alpha) = \frac{46}{d} \quad\text{and}\quad \tan(\beta) = \frac{92}{d},$$

so $\alpha = \arctan\left(\frac{46}{d}\right)$, $\beta = \arctan\left(\frac{92}{d}\right)$, and $\theta(t) = \arctan\left(\frac{92}{d(t)}\right) - \arctan\left(\frac{46}{d(t)}\right)$. This is our key equation. Differentiating it, we get

$$\begin{aligned}
\theta'(t) &= \frac{1}{1 + \left(\frac{92}{d(t)}\right)^2}\left(-\frac{92}{d(t)^2}\right)d'(t) - \frac{1}{1 + \left(\frac{46}{d(t)}\right)^2}\left(-\frac{46}{d(t)^2}\right)d'(t) \\
&= \left(\frac{46}{d(t)^2 + 46^2} - \frac{92}{d(t)^2 + 92^2}\right)d'(t) \\
&= \left(\frac{46}{100^2 + 46^2} - \frac{92}{100^2 + 92^2}\right)(-5) \approx 0.0059 \text{ rad/s}.
\end{aligned}$$

Chapter 3 Integrals

3.1 Definite integrals: the case of velocity

The main question that we will study in this chapter is the following. If we know the rate of change of some quantity, how do we calculate how much the quantity changes on some interval? To spell out this question more precisely:
• If we know the rate of change $f(x)$ of some quantity $F(x)$ on some interval $[a, b]$, which means that $f(x)$ is the derivative of this quantity, $f(x) = F'(x)$, how do we calculate how much the quantity changes between $a$ and $b$? In other words, how do we calculate $F(b) - F(a)$?

To make this question more concrete, consider a car driving on a mountain road, and suppose that at time $t = a$ the car is at point $A$ and at time $t = b$ it is at point $B$. At time $t$, let $D(t)$ be the distance of the car along this road from the highway exit $O$. Notice that this distance $D(t)$ can increase or decrease depending on the direction in which the car is moving. What is $D'(t)$? It is the velocity $v(t)$ at time $t$, which is the speed of the car with the plus or minus sign depending on the direction. For example, when the car is moving towards the highway exit, $D(t)$ is decreasing and its derivative will be negative, i.e. minus the speed.
In this concrete setup, the question above becomes:
• If we know the velocity $v(t)$ on some time interval $[a, b]$, how do we calculate how much the position $D$ changes between time $a$ and time $b$? In other words, how do we calculate the distance $D(b) - D(a)$ between points $A$ and $B$ along this road depicted in the figure above?

Notice also that the change in position $D(b) - D(a)$ would be negative (minus the distance) if $B$ was closer to the exit $O$ than $A$, in which case $D(b) < D(a)$.

Main formula. Before we give the answer to the above question, let us look at the graph of the velocity on the interval $a \leq t \leq b$. The graph in the figure on the right is not a very realistic description of a car moving on a mountain road, but as an illustration it has the right features. For example, up to a certain point in time the car is moving away from the highway exit (towards the peak of the mountain), because the velocity $v(t)$ is positive and its graph is above the $x$-axis. Then the car turns around and starts moving back towards the highway exit, so the velocity is negative and its graph is below the $x$-axis. At time $t = b$ the car is passing point $B$.

Let $A_1$ be the area below the graph of velocity $y = v(t)$ and above the $x$-axis, when the car is moving in the 'positive direction' and $D(t)$ is increasing. Let $A_2$ be the area above the graph of velocity $y = v(t)$ and below the $x$-axis, when the car is moving in the 'negative direction' and its position $D(t)$ is decreasing. Then the answer to the above question is:
• Given velocity $v(t) = D'(t)$, the change of the position between time $a$ and $b$ is
$$D(b) - D(a) = A_1 - A_2.$$

A more detailed answer is:
• The car will cover distance $A_1$ in the positive direction and distance $A_2$ in the negative direction. So the total distance travelled by the car will be $A_1 + A_2$ but, taking into account direction, the change in the car's position on the road will be $A_1 - A_2$.
The minus sign in the second term in $A_1 - A_2$ simply takes into account the direction, or the fact that $D(t)$ can increase and decrease, and our main goal now is to understand why the area between the graph of velocity $v(t)$ and the $x$-axis represents the distance travelled. Actually, the reason for this is quite simple and will become clear through several examples.

Example 1. A car drives in a 'positive direction' on some road with the speed of 20 mph between 2 p.m. and 5 p.m., then turns around and drives in the opposite direction with the speed of 40 mph between 5 p.m. and 9 p.m., and then turns around again and drives in the original direction with the speed of 50 mph between 9 p.m. and 11 p.m. What is the total distance travelled by the car, and what is the change in its position along this road? Draw the graph of the velocity and check that the calculation matches the above formula in terms of areas.

Solution: Using the formula

Distance = Speed × Time

we can calculate that the distance travelled between 2 p.m. and 5 p.m. is 20 mph × 3 h = 60 miles, the distance travelled between 5 p.m. and 9 p.m. is 40 × 4 = 160 miles, and the distance travelled between 9 p.m. and 11 p.m. is 50 × 2 = 100 miles. The total distance travelled is 60 + 160 + 100 = 320 miles, and the position change taking into account the direction is 60 − 160 + 100 = 0 miles.

If we take a look at the graph of velocity between 2 p.m. and 11 p.m., the areas we need to compute correspond to three rectangles, because velocity is constant during each time period and the sides of each rectangle are exactly Speed (height) and Time (width), so each Area = Speed × Time, which is the same as distance travelled during that time period. In the case when velocity is constant on each interval, this explains why we use the areas in the formula $A_1 - A_2$ above.
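The bookkeeping in Example 1 can also be written as a short computation. The sketch below stores each leg of the trip as a (velocity, duration) pair, with a negative velocity for the leg driven in the opposite direction; the variable names are chosen here only for illustration.

```python
# Each leg of the trip from Example 1: (velocity in mph, duration in hours).
# Velocity is negative when the car drives in the opposite direction.
legs = [(20, 3), (-40, 4), (50, 2)]

# Signed rectangle areas add up to the change in position, A1 - A2.
position_change = sum(v * dt for v, dt in legs)

# Unsigned rectangle areas add up to the total distance travelled, A1 + A2.
total_distance = sum(abs(v) * dt for v, dt in legs)

print("position change (miles):", position_change)
print("total distance (miles):", total_distance)
```

This reproduces the values found in the solution: a position change of 0 miles and a total distance of 320 miles.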
Notice that $A_1$ here consists of two disjoint rectangles, so it is okay that the region above or below the $x$-axis consists of several pieces.

Exercise 1. The table below gives the velocity of the car driving along some road during different time intervals. Compute the total distance travelled and the change in position.

t      1 to 3 p.m.   3 to 5 p.m.   5 to 7 p.m.   7 to 9 p.m.
v(t)   60 mph        −55 mph       50 mph        −60 mph

Sketch the graph of the velocity and check that various distances are exactly the areas in your figure.

For the future, it is convenient to modify the formula Distance = Speed × Time slightly and rewrite it as

Position Change = Velocity × Time

where we simply take into account the positive or negative direction. When the position $D(t)$ is decreasing, Position Change = −Distance and Velocity = −Speed, so it is still the same formula just with a minus sign. It is more convenient because we do not need to mention the direction explicitly, since it is reflected by the plus or minus sign.

Riemann sum approximation. In the two problems above, the time interval $[a, b]$ was divided into several subintervals where the velocity was constant. If the velocity is not constant, we can still use the same calculation to approximate the distance travelled by the car and its change in position. To do that, we can divide the time interval $[a, b]$ into many small subintervals and, because the velocity cannot change much over a very short period of time, the velocity is almost constant on each subinterval. This means that if we measure the velocity at any particular time on a small subinterval, we will get an approximation

$$\Delta D \approx \text{Velocity} \times \Delta t$$

where $\Delta D$ is the increment of position and $\Delta t$ is the increment of time.
• If we sum up the increments of position $\Delta D$ over all subintervals we will get the total change of position $D(b) - D(a)$.
• Because Velocity × $\Delta t$ = ±Area of Rectangle (see figure below), their sum will approximate $A_1 - A_2$.
This explains the formula $D(b) - D(a) = A_1 - A_2$ in the general case, because when subintervals get smaller and smaller, the approximation of the areas $A_1$ and $A_2$ by rectangles gets better and better. In the figures above we divided the interval $[a, b]$ into 20 subintervals and then made two possible choices of velocity on each subinterval: at the left endpoint or at the right endpoint. The sum corresponding to the left endpoints is called the left Riemann sum, and the sum corresponding to the right endpoints is called the right Riemann sum.

Example 2. The following table shows the velocity $v(t)$ of the bowling ball (in meters per second) at time $t$ (in seconds), between the moment it was released until it hit the pins.

t      0     0.5   1      1.5   2     2.5
v(t)   8.6   7.7   7.15   6.8   6.6   6.5

Estimate the length of the bowling lane from above and below.

Solution: Because of friction, the bowling ball is slowing down, which is also reflected in the above table, because the values $v(t)$ are decreasing. As a result, the speed is the highest at the beginning of each interval (left endpoint), and it is lowest at the end of each interval (right endpoint). This means that the left Riemann sum, in this case corresponding to 5 intervals of length $\Delta t = 0.5$ each,

8.6 × 0.5 + 7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 = 18.425,

overestimates the total distance travelled by the ball. The right Riemann sum

7.7 × 0.5 + 7.15 × 0.5 + 6.8 × 0.5 + 6.6 × 0.5 + 6.5 × 0.5 = 17.375

underestimates the total distance travelled by the ball. We conclude that the length of the lane is between 17.375 and 18.425 meters. Notice how in the figure above, because the function is decreasing, the green rectangles with height given by the left endpoints are above the graph of $y = v(t)$ and the blue rectangles with height given by the right endpoints are below the graph, so the area below the graph is, indeed, in between the right and left Riemann sums.
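The two sums in Example 2 can be computed directly from the table. The following is a minimal sketch; the helper names `left_riemann` and `right_riemann` are chosen here for illustration, and the subinterval widths are read off from consecutive time values.

```python
# Table from Example 2: times (s) and velocities (m/s) of the bowling ball.
t = [0, 0.5, 1, 1.5, 2, 2.5]
v = [8.6, 7.7, 7.15, 6.8, 6.6, 6.5]

def left_riemann(t, v):
    """Sum of v at the left endpoint times the width of each subinterval."""
    return sum(v[i] * (t[i + 1] - t[i]) for i in range(len(t) - 1))

def right_riemann(t, v):
    """Sum of v at the right endpoint times the width of each subinterval."""
    return sum(v[i + 1] * (t[i + 1] - t[i]) for i in range(len(t) - 1))

upper = left_riemann(t, v)   # overestimate, since v is decreasing
lower = right_riemann(t, v)  # underestimate, since v is decreasing
print("lane length is between", lower, "and", upper, "meters")
```

Up to floating-point rounding, the two values are 17.375 and 18.425, as in the solution. Each sum uses five of the six table values, one per subinterval.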
What we have learned in the last problem is that
• If the velocity function is decreasing then the left Riemann sums overestimate and right Riemann sums underestimate the change in position $D(b) - D(a)$.

This is true even when velocity can become negative (check it!). Similarly, we can see that
• If the velocity function is increasing then the left Riemann sums underestimate and right Riemann sums overestimate the change in position $D(b) - D(a)$.

Exercise 2. The following table shows the speed $v(t)$ of the bowling ball (in meters per second) at time $t$ (in seconds), between the moment it was dropped from the roof of a building until it hit the ground.

t      0   0.25   0.5   0.75   1     1.25    1.5    1.75    2
v(t)   0   2.45   4.9   7.35   9.8   12.25   14.7   17.15   19.6

(a) Estimate the height of the building from above and below using four equal subintervals.
(b) Estimate the height of the building from above and below using eight equal subintervals.
(c) If the speed is $v(t) = 9.8t$, plot its graph and deduce what the exact height of the building is using the area formula.

Example 3. Given the following table of velocity $v(t)$ (in meters per second) at time $t$ (in seconds) between $t = 0$ and $t = 3$,

t      0     0.5   1      1.5    2      2.5    3
v(t)   1.6   0.7   0.15   −0.2   −0.4   −0.5   −0.55

which of the following expressions is not a Riemann sum estimate of the position change $D(3) - D(0)$, and why?
(a) (1.6 + 0.15 − 0.4) × 1
(b) (0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(c) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5 − 0.55) × 0.5
(d) (1.6 + 0.7 + 0.15 − 0.2 − 0.4 − 0.5) × 0.5

Solution: (a) Notice that here we multiply the velocity values by 1, not 0.5, so the increment of time is $\Delta t = 1$, which means that this sum could correspond to a Riemann sum with 3 subintervals on the interval $[0, 3]$. This means that we need to look at the values of $v(t)$ at $t = 0, 1, 2$ and $3$, and we can recognize that (1.6 + 0.15 − 0.4) × 1 is a left Riemann sum $v(0) \times \Delta t + v(1) \times \Delta t + v(2) \times \Delta t$.
(b) This is a right Riemann sum with $\Delta t = 0.5$ and 6 subintervals.
The number of terms matches the number of subintervals.
(d) This is a left Riemann sum with $\Delta t = 0.5$ and 6 subintervals.
(c) Here, it looks like $\Delta t = 0.5$, but this is not a Riemann sum with $\Delta t = 0.5$ and 6 subintervals, because the number of terms is 7 and it does not match the number of subintervals. By using all the values in the table we are overcounting, or more precisely, we are counting one of the intervals twice, at the left and right endpoints.

Exercise 3. Given velocity $v(t) = e^{-t}$ between $t = 0$ and $t = 4$, which of the following is not a Riemann sum estimate of the position change, and why?
(a) $(e^{-2} + e^{-4}) \times 2$
(b) $(e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$
(c) $(1 + e^{-0.5} + e^{-1} + e^{-1.5} + e^{-2} + e^{-2.5} + e^{-3} + e^{-3.5}) \times 0.5$
(d) $(e^{0} + e^{-1} + e^{-2} + e^{-3} + e^{-4}) \times 1$

Constant acceleration. Let us now consider several examples with constant acceleration, which makes the areas particularly easy to compute.

Example 4. An object is moving along a straight line with initial velocity 4 m/s and acceleration −1 m/s². What is the total distance travelled and position change at time $t = 9$ s?

Solution: Acceleration is the derivative of velocity, so constant acceleration means that velocity $v(t)$ is a linear function with the slope equal to acceleration. In our case, the slope is $-1$ and initial velocity is $v(0) = 4$, so the function is $v(t) = 4 - t$. Because velocity $v(9) = -5$ at time $t = 9$ is negative, the object changes direction somewhere before that, and we need to find where: $4 - t = 0$, or $t = 4$. Looking at the figure, we can now use the areas of the triangles to compute the distances. The area above the $x$-axis is $\frac{1}{2}\cdot 4\cdot 4 = 8$, which is the distance travelled in the positive direction, and the area below the $x$-axis is $\frac{1}{2}\cdot 5\cdot 5 = 12.5$, which is the distance travelled in the negative direction. So the change in position is $8 - 12.5 = -4.5$ meters and the total distance travelled is $8 + 12.5 = 20.5$ meters.
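The triangle computation in Example 4 can be packaged as a small sketch. It assumes, as in the example, that the velocity graph is a straight line that crosses the axis once inside the time interval; the helper `signed_area` and the variable names are chosen here for illustration.

```python
def signed_area(v_start, v_end, duration):
    """Signed area under a straight-line velocity graph on one interval.

    For a line, this is the average of the endpoint velocities times the
    duration (a triangle when one of the endpoint velocities is zero).
    """
    return 0.5 * (v_start + v_end) * duration

# Data from Example 4: v(t) = v0 + accel * t on the interval [0, T].
v0, accel, T = 4.0, -1.0, 9.0

t_turn = -v0 / accel                 # time when v(t) = 0
v_end = v0 + accel * T               # velocity at time T

a1 = signed_area(v0, 0.0, t_turn)            # area above the axis
a2 = -signed_area(0.0, v_end, T - t_turn)    # area below the axis, as a positive number

print("turning time:", t_turn)
print("change in position:", a1 - a2)
print("total distance:", a1 + a2)
```

With the data of Example 4 this gives a turning time of 4 s, areas 8 and 12.5, a change in position of −4.5 meters and a total distance of 20.5 meters.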
Notice that multiplying the units m/s × s gives meters.

Exercise 4. A baseball is thrown from height 0 directly upwards with speed 29.4 m/s. What is the height of the baseball at time t = 5 seconds?

Example 5. A baseball is thrown from height 0 directly upwards and it reaches its peak at 78.4 meters. What is its initial velocity $v_0$ and the time $t_0$ it takes to reach the peak?

Solution: Acceleration due to gravity is −9.8 m/s², which is the derivative of velocity, so velocity v(t) is a linear function with the slope −9.8. In our case, the initial velocity $v_0$ and the time $t_0$ to reach the peak are unknown, but we know that the velocity at the peak should be zero, $v(t_0) = 0$, as in the figure. In addition to the slope, we also know the area A1 under the graph, because it is exactly the distance to the peak, A1 = 78.4. In terms of $v_0$ and $t_0$ the area is $\frac{1}{2} v_0 t_0$. Also, since the line passes through the points $(0, v_0)$ and $(t_0, 0)$, its slope is $\frac{0 - v_0}{t_0 - 0} = -\frac{v_0}{t_0}$. This gives us two equations,

$$\text{Slope} = -\frac{v_0}{t_0} = -9.8, \qquad \text{Area} = \frac{1}{2} v_0 t_0 = 78.4.$$

From the first one, we get $v_0 = 9.8\,t_0$ and, plugging this into the second equation, we get $\frac{1}{2} \cdot 9.8\, t_0^2 = 78.4$, or $t_0^2 = 16$, or $t_0 = 4$. Then $v_0 = 9.8\, t_0 = 9.8 \times 4 = 39.2$ m/s.

Exercise 5. After spotting a police officer in a school zone with a 10 m/s speed limit, a surprised driver slams on the brakes and comes to a complete stop. The police officer was not equipped with a speed radar to see how fast the driver was going. However, by looking at the tire skid marks the police officer determined that
(a) It took the driver 35 m to come to a complete stop.
(b) The driver was braking (decelerating) at 7 m/s².
Does the officer have enough information to issue a speeding ticket? Hint: Sketch the graph of velocity and express the unknown initial velocity $v_0$ and time to stop $t_0$ in terms of the given information.

Notation and terminology.
The change of position D(b) − D(a) is often called total displacement, which is A1 − A2, as opposed to the total distance travelled A1 + A2. Now that we understand that the total displacement D(b) − D(a) can be computed from velocity v(t) = D′(t) using the areas of the Riemann sum approximations, it is time to introduce important notation and terminology.

First of all, the difference of areas A1 − A2 above and below the x-axis on the interval [a, b] is called the definite integral of v(t) on the interval [a, b], and it is denoted

$$\int_a^b v(t)\,dt.$$

This notation will become clearer below. We have seen in the examples above that we can approximate the difference A1 − A2 by the sum of areas of rectangles (of course, also with a plus or minus sign), which is called a Riemann sum. For example, the figure on the right depicts rectangles corresponding to the left Riemann sum. When the interval is divided into n subintervals, the notation for a Riemann sum of the function v(t) is

$$\sum_{i=1}^{n} v(t_i^*)\,\Delta t$$

where v(t) can be replaced by a specific formula, and where the meaning of each symbol is as follows:

• The symbol Σ (sigma) represents the sum.
• If we divide the interval [a, b] into n small subintervals, the letter i represents the index enumerating these subintervals from 1 to n. For example, i = 3 means that we are looking at the third subinterval starting from t = a. Other letters can be used instead of i, for example, k, ℓ, m, etc. Below Σ we write i = 1 to indicate that the first interval index is 1, and above Σ we write n to indicate that the last interval index is n. Sometimes 1 and n can change depending on the context.
• The factor ∆t represents the increment of time t, which is also the width of the subintervals.
• The point $t_i^*$ represents some specific choice of a point inside the subinterval #i. For example, it could be the left endpoint or the right endpoint in the case of the left or right Riemann sums.
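To make the roles of n, i, ∆t and the sample point concrete, here is a minimal Python sketch (an illustration under stated assumptions, not the author's code). The function name `riemann_sum` and its `rule` parameter are hypothetical; the velocity v(t) = 9.8t on [0, 2] is the one from Exercise 2(c).

```python
def riemann_sum(v, a, b, n, rule="left"):
    """Add up v(t_i*) * dt for i = 1..n on [a, b] with n equal subintervals.

    rule selects the sample point t_i* inside subinterval #i:
    "left" endpoint, "midpoint", or "right" endpoint.
    """
    dt = (b - a) / n                      # width of each subinterval
    shift = {"left": 0.0, "midpoint": 0.5, "right": 1.0}[rule]
    total = 0.0
    for i in range(1, n + 1):             # index i runs from 1 to n
        t_star = a + (i - 1 + shift) * dt  # chosen point in subinterval #i
        total += v(t_star) * dt           # signed area of rectangle #i
    return total

def v(t):
    # Speed of the falling ball from Exercise 2(c), in m/s.
    return 9.8 * t

left = riemann_sum(v, 0, 2, 4, "left")
mid = riemann_sum(v, 0, 2, 4, "midpoint")
right = riemann_sum(v, 0, 2, 4, "right")
print(left, mid, right)
```

Since this velocity is increasing, the left sum should come out below the right sum, with values close to 14.7 and 24.5; the midpoint sum lands close to 19.6 because the function is linear.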
Because $v(t_i^*)$ is the ±height of rectangle #i and ∆t is its width, $v(t_i^*)\Delta t$ is exactly the ±area of this rectangle, and the notation $\sum_{i=1}^{n}$ means that we are adding up these terms, just like we did in the above problems.

Example 6. Given a function v(t) = cos(t):
(a) How do we denote its definite integral on the interval [0, π]?
(b) What is the definition of the definite integral on the interval [0, π]? Using the definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [0, π]?
(d) How can we compute the definite integral on the interval [0, π]?
(e) How do we denote a general Riemann sum for v(t) on the interval [0, π]?
(f) Write down the right Riemann sum on the interval [0, π] with four subintervals.

Solution: (a) The definite integral of y = cos(t) on the interval [0, π] is denoted $\int_0^{\pi} \cos(t)\,dt$.
(b) Its definition is the difference A1 − A2 of areas above and below the x-axis on this interval, as in the figure. In the case of cosine, these two areas are the same so they cancel out and $\int_0^{\pi} \cos(t)\,dt = 0$.
(c) Another meaning of this definite integral is the total displacement D(π) − D(0) of an object moving along a straight line with this velocity v(t) = cos(t). In other words, if D′(t) = cos(t) then $D(\pi) - D(0) = \int_0^{\pi} \cos(t)\,dt = 0$.
(d) Above, we noticed that this definite integral is zero, because the areas A1 and A2 cancel out, but we can also compute this integral by approximating it with Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^{n} \cos(t_i^*)\,\Delta t$. We can be a bit more specific and replace ∆t with $\frac{\pi}{n}$, because the interval has length π and we divide it into n equal subintervals: $\sum_{i=1}^{n} \cos(t_i^*)\,\frac{\pi}{n}$.
(f) If we divide [0, π] into four subintervals then the right endpoints will be $t_1^* = \frac{\pi}{4}$, $t_2^* = \frac{\pi}{2}$, $t_3^* = \frac{3\pi}{4}$ and $t_4^* = \pi$, so the right Riemann sum will be

$$\left( \cos\frac{\pi}{4} + \cos\frac{\pi}{2} + \cos\frac{3\pi}{4} + \cos(\pi) \right) \frac{\pi}{4} = -\frac{\pi}{4} \approx -0.7854.$$

Exercise 6.
Given a function v(t) = 1 − |t|:
(a) How do we denote its definite integral on the interval [−1, 1]?
(b) What is the definition of the definite integral on the interval [−1, 1]? Using the definition, what is this integral?
(c) What is another meaning of the definite integral on the interval [−1, 1]?
(d) How can we compute the definite integral on the interval [−1, 1]?
(e) How do we denote a general Riemann sum for v(t) on the interval [−1, 1]?
(f) Write down the left Riemann sum on the interval [−1, 1] with four subintervals.

Example 7. A cat is chasing a mouse, both running along a straight wall. The mouse never stops running but can reverse direction. At time t ≥ 0, the velocity of the mouse is v(t) and the distance travelled by the mouse is d(t). What is the relationship between v(t) and d(t)?

Solution: The total distance travelled is the sum of areas A1 + A2. When velocity is negative, if we flip it around the x-axis then it will become speed |v(t)| but the area A2 will stay the same, so the area below the speed function |v(t)| is exactly A1 + A2. This means that we can express the total distance travelled at time t as

$$d(t) = \int_0^t |v(s)|\,ds.$$

Notice how we used a different name for the variable of integration, s instead of t, because t was already reserved for the time t up to which we integrate. It is a standard convention not to use the same name for more than one variable or object.

Exercise 7. A cat is chasing a mouse, both running along a straight wall. The mouse never stops running but can reverse direction. At time t, the velocity of the mouse is $v_m(t)$ and the velocity of the cat is $v_c(t)$. Also, at time t, the distance travelled by the mouse is $d_m(t)$ and the distance travelled by the cat is $d_c(t)$. Write down the formulas for the total displacement and total distance travelled by the cat in terms of the total distance x travelled by the mouse.

Taking a limit.
We saw that as we increase the number of subintervals n and the width of the rectangles gets smaller and smaller, Riemann sums get closer and closer to the difference of areas A1 − A2, which we called the definite integral and denoted $\int_a^b v(t)\,dt$. This statement can be written using the limit n → ∞ notation:

$$\lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt.$$

In particular, this explains the notation $\int_a^b v(t)\,dt$, which resembles the Riemann sum, with the sum Σ now replaced by the integral sign ∫ and the interval indices replaced by the endpoints a and b. Since the definite integral is also the change of position D(b) − D(a) on the interval [a, b], a more complete summary of this section is the following formula: if v(t) = D′(t) then

$$\lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt = D(b) - D(a).$$

Answer to Exercise 1. Change in position is 60 × 2 − 55 × 2 + 50 × 2 − 60 × 2 = −10 miles. Total distance travelled is 60 × 2 + 55 × 2 + 50 × 2 + 60 × 2 = 450 miles.

Answer to Exercise 2. The speed is increasing so the left Riemann sums will underestimate and the right Riemann sums will overestimate the distance travelled by the bowling ball, i.e. the height of the building.
(a) The left Riemann sum is (0 + 4.9 + 9.8 + 14.7) × 0.5 = 14.7 and the right Riemann sum is (4.9 + 9.8 + 14.7 + 19.6) × 0.5 = 24.5, so the height is between 14.7 and 24.5 meters.
(b) The left Riemann sum is (0 + 2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 + 17.15) × 0.25 = 17.15 and the right Riemann sum is (2.45 + 4.9 + 7.35 + 9.8 + 12.25 + 14.7 + 17.15 + 19.6) × 0.25 = 22.05, so the height is between 17.15 and 22.05 meters. Notice how the estimates improved compared to part (a) when we increased the number of intervals. Compare the figures below with 4 and 8 intervals.
(c) If v(t) = 9.8t is linear, the region under its graph is a triangle with base 2 and height 19.6, so its area is $\frac{1}{2} \cdot 2 \cdot 19.6 = 19.6$. This means that the exact height of the building is 19.6 meters.

Answer to Exercise 3.
(a) is a right Riemann sum with ∆t = 2, (b) is a right Riemann sum with ∆t = 1, (c) is a left Riemann sum with ∆t = 0.5, and (d) is not a Riemann sum because ∆t = 1 and we are overcounting one interval.

Answer to Exercise 4. Acceleration due to gravity is −9.8 m/s², which is the derivative of velocity, so velocity v(t) is a linear function with the slope −9.8. In our case, the initial velocity is v(0) = 29.4, so the function is v(t) = 29.4 − 9.8t.

At time t = 5, the velocity v(5) = 29.4 − 9.8 · 5 = −19.6 is negative, so the baseball is on its way down. It reaches the peak and changes direction when 29.4 − 9.8t = 0, or $t = \frac{29.4}{9.8} = 3$. Looking at the figure, we can now use the areas of the triangles to compute the distances on the way up and on the way down. The area above the x-axis is $A_1 = \frac{1}{2} \cdot 3 \cdot 29.4 = 44.1$, which is the distance on the way up, and the area below the x-axis is $\frac{1}{2} \cdot 2 \cdot 19.6 = 19.6$, which is the distance on the way down. So the change in position is 44.1 − 19.6 = 24.5 meters, which is the height of the baseball at time t = 5.

Answer to Exercise 5. Decelerating at 7 m/s² means that acceleration is −7 m/s², i.e. the slope is −7. Of course, this agrees with the fact that the speed is decreasing during braking. The total braking distance is the area under this graph, and it is given to us as 35 m. In terms of $v_0$ and $t_0$ the area is $35 = \frac{1}{2} v_0 t_0$. Since the line passes through the points $(0, v_0)$ and $(t_0, 0)$, its slope is $-7 = \frac{0 - v_0}{t_0 - 0} = -\frac{v_0}{t_0}$. This gives us two equations,

$$\frac{v_0}{t_0} = 7, \qquad v_0 t_0 = 70.$$

From the first one, we get $v_0 = 7 t_0$ and, plugging this into the second equation, we get $(7 t_0) t_0 = 70$, or $t_0^2 = 10$, or $t_0 = \sqrt{10}$. Then $v_0 = 7 t_0 = 7\sqrt{10} \approx 22.14$ m/s, which is above the speed limit.

Answer to Exercise 6. (a) The definite integral of y = 1 − |t| on the interval [−1, 1] is denoted $\int_{-1}^{1} (1 - |t|)\,dt$.
(b) Its definition is the difference A1 − A2 of areas above and below the x-axis on this interval, as in the figure.
In the case of 1 − |t|, $A_1 = \frac{1}{2} \cdot 2 \cdot 1 = 1$ and $A_2 = 0$, so $\int_{-1}^{1} (1 - |t|)\,dt = 1$.
(c) Another meaning of this definite integral is the total displacement D(1) − D(−1) of an object moving along a straight line with this velocity v(t) = 1 − |t|. In other words, if D′(t) = 1 − |t| then $D(1) - D(-1) = \int_{-1}^{1} (1 - |t|)\,dt = 1$.
(d) Above, we noticed that this definite integral is 1, but we can also compute this integral by approximating it with Riemann sums.
(e) A general Riemann sum in this case is $\sum_{i=1}^{n} (1 - |t_i^*|)\,\Delta t$. We can be more specific and replace ∆t with $\frac{2}{n}$, because the interval has length 2 and we divide it into n equal subintervals: $\sum_{i=1}^{n} (1 - |t_i^*|)\,\frac{2}{n}$.
(f) If we divide [−1, 1] into four subintervals then the left endpoints will be $t_1^* = -1$, $t_2^* = -0.5$, $t_3^* = 0$ and $t_4^* = 0.5$, so the left Riemann sum will be

$$\Big( (1 - |-1|) + (1 - |-0.5|) + (1 - |0|) + (1 - |0.5|) \Big) \frac{2}{4} = 1.$$

Answer to Exercise 7. Since the distance x travelled by the mouse is $d_m(t)$, we can solve $x = d_m(t)$ for t as $t = d_m^{-1}(x)$, where we assume that the function is invertible because the mouse never stops running. Then the distance travelled by the cat at that time is $d_c(t) = d_c(d_m^{-1}(x))$. The total displacement at time t is

$$\int_0^t v_c(s)\,ds = \int_0^{d_m^{-1}(x)} v_c(s)\,ds.$$

3.2 Definite integrals: general case

Summary of last section. In the last section we considered a position D(t) of an object changing with time t, typically moving along some road or along a straight line. The rate of change of position is velocity v(t) = D′(t), and the content of the last section can be summarized in the following formula:

$$D(b) - D(a) = \lim_{n\to\infty} \sum_{i=1}^{n} v(t_i^*)\,\Delta t = \int_a^b v(t)\,dt.$$
In a few words, the formula says that if we break the change of position D(b) − D(a) into small steps, each step ∆D is approximately $v(t_i^*)\Delta t$ and, if we interpret this as an area of a rectangle, in the limit we get the difference of areas A1 − A2, which was denoted $\int_a^b v(t)\,dt$ and called the definite integral. This means that we can compute D(b) − D(a) by finding the areas A1 and A2 if possible, or by using Riemann sums as an approximation. Of course, if we can use a calculator we can make this approximation very precise by dividing into many rectangles, see for example the link in the footnotes.¹

Switching the point of view. Everything we discussed about a position D(t) and velocity v(t) = D′(t) as a function of time t can be applied to any quantity F(x) and its derivative f(x) = F′(x) as a function of any input variable x. By looking at the following table

Variable   Function   Rate of change   Small step                               Riemann sum                          Definite integral
t          D(t)       v(t) = D′(t)     $\Delta D \approx v(t_i^*)\Delta t$      $\sum_{i=1}^{n} v(t_i^*)\Delta t$    $\int_a^b v(t)\,dt$
x          F(x)       f(x) = F′(x)     $\Delta F \approx f(x_i^*)\Delta x$      $\sum_{i=1}^{n} f(x_i^*)\Delta x$    $\int_a^b f(x)\,dx$

we can see that the analogue of the above formula in the general case will be

$$F(b) - F(a) = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x = \int_a^b f(x)\,dx.$$

Of course, the key step is $\Delta F \approx f(x_i^*)\Delta x$, which is true because the rate of change on a small interval should be almost constant, so the rate of change $f(x_i^*)$ at any point $x_i^*$ on this interval can be approximated by the average rate of change $\frac{\Delta F}{\Delta x}$. Let us add that when we write a definite integral $\int_a^b f(x)\,dx$, the endpoints a and b are also called the limits of integration and the function f(x) is called the integrand.

¹ https://youtu.be/DcgiBBhGreY

Example 1. The concentration of medication in a patient's blood is changing at a rate of r(t) mg/L per hour at time t (measured in hours). If $\int_2^3 r(t)\,dt = -5$, what are the units of 2, 3 and −5? What is the meaning of $\int_2^3 r(t)\,dt$?
Solution: Since the variable of integration t is time, both limits 2 and 3 represent time, so their units are hours. If C(t) is the concentration of medication at time t measured in mg/L, then the rate r(t) is its derivative C′(t), and the meaning of the definite integral $\int_2^3 r(t)\,dt$ is the change of concentration C(3) − C(2) between 2 and 3 hours. It means that the units of −5 are mg/L, so the concentration decreased between 2 and 3 hours by 5 mg/L.

Exercise 1. Average daily profit of a restaurant is increasing at a rate of f(x) dollars per customer. If $\int_{40.5}^{60.5} f(x)\,dx = 150$, what are the units of 40.5, 60.5 and 150? Do the limits 40.5 and 60.5 make sense? What is the meaning of $\int_{40.5}^{60.5} f(x)\,dx$?

Example 2. The following figure shows the velocity y = v(t) (in m/s) of two cars driving on the same road between t = 1 and t = 7 minutes. At time t = 1 min, the first car is ahead by 10 meters. Express how far ahead or behind the first car is at time t = 7 min in terms of the areas A1, A2, A3 and A4 in the figure.

Solution: Let us start by comparing the distances travelled by the two cars between 1 and 7 minutes. Let us do this in two different ways. First, Car 1 travelled the distance given by the area under its graph, which is A2 + A3 + A4, and Car 2 travelled the distance given by the area under its graph, which is A1 + A3 + A4. If we subtract the two, we get that

(A2 + A3 + A4) − (A1 + A3 + A4) = A2 − A1

is how much more Car 1 travelled compared to Car 2. If this is negative, it means that Car 1 travelled less. Notice that we do not need to know A3 and A4.

Another way to get the same answer is to divide the interval at the point in the middle where the two velocities are equal, near 3.8. Up to that point, the velocity of Car 2 is bigger and the extra distance it travels compared to Car 1 is exactly the area A1 in between the two graphs.
If this is not clear, notice that on this interval Car 1 travels A3 and Car 2 travels A1 + A3, so it travels extra distance A1. Similarly, in the second half the velocity of Car 1 is bigger and it travels extra distance A2. Combining the two intervals, we get that Car 1 travels A2 − A1 extra distance compared to Car 2. Again, if this is negative, it means that Car 1 travelled less.

Since Car 1 was 10 meters ahead at time t = 1 min and it travelled extra A2 − A1 between 1 and 7 min, does this mean that it is ahead by 10 + A2 − A1? The answer is no! We need to pay attention to the units of the variables and ‘area’ in the figure. Time on the x-axis has minutes as its units, while the velocity on the y-axis has m/s as its units, so the units of area or distance v(t)∆t are m/s × min = m/s × 60 s = 60 m. So one unit of ‘area’ in this figure is equal to 60 meters, which means that the extra distance travelled by Car 1 is (A2 − A1) × 60 m once we take units into account. At time t = 7 minutes Car 1 is 10 + 60(A2 − A1) meters ahead. If this number is negative, ahead by a negative number means that Car 1 is actually behind.

Exercise 2. The following figure shows the growth rates (in cm/year) of boys and girls in the United States between the ages of 6 and 17. The average height of boys at age 6 is 116 cm, and the average height of girls at age 6 is 115 cm. Express the difference between the average height of boys and girls at age 17 in terms of the areas A1 and A2 in the figure. Estimate the difference between the average height of boys and girls at age 18 using the data in the following table.

Age     6      7      8      9      10     11     12     13     14     15     16     17
Boys    6.70   6.18   5.90   5.60   5.51   5.68   6.54   7.64   6.73   4.46   2.58   1.10
Girls   6.70   6.27   6.00   5.98   6.33   6.68   6.04   4.29   2.42   1.13   0.64   0.33

Some properties of definite integrals. When solving the above problems, we implicitly used some basic properties of areas.
For example, if we divide some interval [a, b] into two subintervals [a, c] and [c, b] then adding the areas on these subintervals we get the total area on the entire interval. This means that

$$\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx.$$

Example 3. If a function f(x) is even, and we know that $\int_{-3}^{7} f(x)\,dx = 5$ and $\int_{3}^{7} f(x)\,dx = 1$, what is the integral $\int_{0}^{7} f(x)\,dx$?

Solution: By the above property,

$$\int_{-3}^{7} f(x)\,dx = \int_{-3}^{3} f(x)\,dx + \int_{3}^{7} f(x)\,dx,$$

so $\int_{-3}^{3} f(x)\,dx = \int_{-3}^{7} f(x)\,dx - \int_{3}^{7} f(x)\,dx = 5 - 1 = 4$. Because f(x) is even, as we can see in the figure, the integral $\int_{-3}^{3} f(x)\,dx$ consists of two equal parts on the subintervals [−3, 0] and [0, 3], so $\int_{0}^{3} f(x)\,dx = \frac{4}{2} = 2$. Using the above property again, $\int_{0}^{7} f(x)\,dx = \int_{0}^{3} f(x)\,dx + \int_{3}^{7} f(x)\,dx = 2 + 1 = 3$.

Exercise 3. If a function f(x) is odd, and we know that $\int_{-3}^{7} f(x)\,dx = 5$ and $\int_{0}^{7} f(x)\,dx = 7$, what is the integral $\int_{0}^{3} f(x)\,dx$?

Another important property of definite integrals is the following convention. We will agree that

$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

In other words, swapping the limits of integration a and b changes the sign of the integral. For example, $\int_{1}^{-3} f(x)\,dx = -\int_{-3}^{1} f(x)\,dx$. Let us explain why this convention is very convenient.

• First, for convenience, we want the formula $\int_a^b f(x)\,dx = F(b) - F(a)$ to be true no matter which number is bigger, a or b. This way we don’t need to worry which number is bigger when we use this formula. But if we swap a and b, the right hand side will become F(a) − F(b) = −(F(b) − F(a)), so only the sign will change. That is why we agree that $\int_a^b f(x)\,dx$ will only change the sign if we swap a and b.
• Another reason is that we want the formula $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$ to be true no matter what the numbers a, b and c are. If we take b equal to a then this formula becomes

$$\int_a^c f(x)\,dx + \int_c^a f(x)\,dx = \int_a^a f(x)\,dx = 0.$$
The integral on the right hand side is equal to 0, because the area from a to a is zero, so this formula will be true if we agree that $\int_c^a f(x)\,dx = -\int_a^c f(x)\,dx$.

Example 4. If $\int_{-1}^{3} f(x)\,dx = 4$ and $\int_{2}^{-1} f(x)\,dx = 2$, what is $\int_{2}^{3} f(x)\,dx$?

Solution: Let us start with the usual property $\int_{-1}^{3} f(x)\,dx = \int_{-1}^{2} f(x)\,dx + \int_{2}^{3} f(x)\,dx$. We know that $\int_{-1}^{3} f(x)\,dx = 4$ and we want to find $\int_{2}^{3} f(x)\,dx$, so we need $\int_{-1}^{2} f(x)\,dx$. But, by our convention, $\int_{-1}^{2} f(x)\,dx = -\int_{2}^{-1} f(x)\,dx = -2$, so we can write the above property as $4 = -2 + \int_{2}^{3} f(x)\,dx$, and $\int_{2}^{3} f(x)\,dx = 4 + 2 = 6$.

Exercise 4. If $\int_{-2}^{-1} f(x)\,dx = 4$, $\int_{-2}^{2} f(x)\,dx = 6$ and $\int_{2}^{1} f(x)\,dx = -2$, what is $\int_{-1}^{1} f(x)\,dx$?

Two more properties of definite integrals are that the integral of the sum is equal to the sum of integrals,

$$\int_a^b \big( f(x) + g(x) \big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,$$

and

$$\int_a^b \kappa f(x)\,dx = \kappa \int_a^b f(x)\,dx,$$

i.e. we can take a constant factor κ outside of the integral. The first property is true because we can add the areas vertically, and the second property is true because stretching a function κ times vertically multiplies the area by κ. The two properties together are called the linearity of integral.

Example 5. If $\int_1^3 f(x)\,dx = -2$ and $\int_1^3 g(x)\,dx = -1$, what is $\int_1^3 (5f(x) - 7g(x))\,dx$?

Solution: By the linearity of integral,

$$\int_1^3 (5f(x) - 7g(x))\,dx = 5\int_1^3 f(x)\,dx - 7\int_1^3 g(x)\,dx = 5(-2) - 7(-1) = -3.$$

Exercise 5. If $\int_a^b \big( f(x) + g(x) \big)\,dx = -2$ and $\int_a^b \big( f(x) - g(x) \big)\,dx = 4$, what is $\int_a^b g(x)\,dx$?

Another property of definite integrals is that if f(x) ≤ g(x) then

$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

If a function is bigger then its definite integral is also bigger, which is obvious by looking at the areas. This is called monotonicity of integral. Notice that this is only true if a ≤ b! Unlike other properties, if we flip a and b the minus sign will also flip the inequality.
For example, if $2 = \int_1^2 f(x)\,dx \le \int_1^2 g(x)\,dx = 7$ then flipping the limits we get the opposite inequality $-2 = \int_2^1 f(x)\,dx \ge \int_2^1 g(x)\,dx = -7$. So the order of limits is important for this property.

Example 6. If f(x) ≤ −2 then which number is bigger, $\int_6^1 f(x)\,dx$ or 10?

Solution: If we use the limits in the increasing order, 1 to 6, then the monotonicity of integral tells us that f(x) ≤ −2 implies that

$$\int_1^6 f(x)\,dx \le \int_1^6 (-2)\,dx = (-2)(6 - 1) = -10.$$

Flipping the limits changes the sign, so $\int_6^1 f(x)\,dx \ge 10$.

Exercise 6. Order the following definite integrals from the smallest to the largest without any calculations first. After that find their values.

(a) $\int_{-1}^{1} 1\,dx$   (b) $\int_{-1}^{1} (1 - |x|)\,dx$   (c) $\int_{-1}^{1} \sqrt{1 - x^2}\,dx$.

Example 7. Which number is bigger,

$$\int_0^{\pi/2} \cos(x)\,dx \quad\text{or}\quad \frac{\pi}{8}\left[ \cos\left(0.25\cdot\frac{\pi}{2}\right) + \cos\left(0.5\cdot\frac{\pi}{2}\right) + \cos\left(0.75\cdot\frac{\pi}{2}\right) + \cos\left(\frac{\pi}{2}\right) \right]?$$

Solution: We should recognize that the second number is the right Riemann sum of $\int_0^{\pi/2} \cos(x)\,dx$ on the interval $[0, \frac{\pi}{2}]$ with n = 4 subintervals, each of length $\frac{\pi}{8}$. Since cos(x) is decreasing on this interval, we already discussed in the last section that the right Riemann sum is underestimating the integral, so it is smaller. Of course, this is the same idea as monotonicity, because the rectangles with height at the right endpoints will be below the function when it is decreasing.

Exercise 7. Which number is bigger,

$$\int_0^{\pi/2} \sin(x)\,dx \quad\text{or}\quad \frac{\pi}{8}\left[ \sin\left(0.25\cdot\frac{\pi}{2}\right) + \sin\left(0.5\cdot\frac{\pi}{2}\right) + \sin\left(0.75\cdot\frac{\pi}{2}\right) + \sin\left(\frac{\pi}{2}\right) \right]?$$

The next two problems refer to the following two figures.

Example 8. The definite integral $\int_{-1}^{1} f(x)\,dx$ corresponding to the left figure above is approximately equal to which of the following?

(a) 1.12   (b) 1.58   (c) 2.02   (d) 2.57

Solution: It looks like the curve is closely tracing half a circle $y = \sqrt{1 - x^2}$, whose definite integral is $\frac{\pi}{2} \approx 1.57$, so the best guess would be (b) 1.58.

Exercise 8.
The definite integral $\int_{-1}^{1} f(x)\,dx$ corresponding to the right figure above is approximately equal to which of the following?

(a) 1   (b) 1.5   (c) 2   (d) 2.5

Answer to Exercise 1. Since the quantity we are talking about is the average daily profit (in dollars) and the rate f(x) is given in dollars per customer, the variable x must be the number of customers, so the unit of 40.5 and 60.5 is the number of customers. It might look strange to consider half a customer but, because we are talking about the average daily profit over a long period of time, x can also be thought of as the average daily number of customers, so the limits 40.5 and 60.5 make sense. If P(x) is the average daily profit when x is the average number of customers, the rate f(x) is its derivative P′(x), and the meaning of the definite integral $\int_{40.5}^{60.5} f(x)\,dx$ is the extra average daily profit P(60.5) − P(40.5) when the average number of customers increases from 40.5 to 60.5. It means that the units of 150 are dollars.

Answer to Exercise 2. The difference between the average height of boys and girls at age 17 is 1 + A2 − A1, similarly to Example 2 and because the units of area are cm/year × year = cm. Using the values in the table,

(6.70 + 6.27 + 6.00 + 5.98 + 6.33 + 6.68 + 6.04 + 4.29 + 2.42 + 1.13 + 0.64 + 0.33) × 1 = 52.81

is the left Riemann sum for the growth of girls on the interval [6, 18], and

(6.70 + 6.18 + 5.90 + 5.60 + 5.51 + 5.68 + 6.54 + 7.64 + 6.73 + 4.46 + 2.58 + 1.10) × 1 = 64.62

is the left Riemann sum for the growth of boys on the interval [6, 18]. Although age 18 is not in the table, using the twelve values from age 6 to 17 estimates growth over the twelve year period from 6 to 18. The estimate of the difference of average heights at age 18 is 1 + 64.62 − 52.81 = 12.81 cm.

Answer to Exercise 3. By the property of integrals,

$$\int_{-3}^{7} f(x)\,dx = \int_{-3}^{0} f(x)\,dx + \int_{0}^{7} f(x)\,dx,$$

so $\int_{-3}^{0} f(x)\,dx = \int_{-3}^{7} f(x)\,dx - \int_{0}^{7} f(x)\,dx = 5 - 7 = -2$.
Because f(x) is odd, as we can see in the figure, the integral $\int_{0}^{3} f(x)\,dx$ differs from $\int_{-3}^{0} f(x)\,dx$ only by a sign, so it is equal to 2.

Answer to Exercise 4. Here we divide the interval [−2, 2] into [−2, −1], [−1, 1] and [1, 2], so

$$\int_{-2}^{2} f(x)\,dx = \int_{-2}^{-1} f(x)\,dx + \int_{-1}^{1} f(x)\,dx + \int_{1}^{2} f(x)\,dx.$$

We know the first two integrals, and $\int_{2}^{1} f(x)\,dx = -2$ gives that $\int_{1}^{2} f(x)\,dx = 2$, so $6 = 4 + \int_{-1}^{1} f(x)\,dx + 2$ and $\int_{-1}^{1} f(x)\,dx = 0$.

Answer to Exercise 5. If we denote $p = \int_a^b f(x)\,dx$ and $q = \int_a^b g(x)\,dx$, we know that

$$p + q = \int_a^b \big( f(x) + g(x) \big)\,dx = -2, \qquad p - q = \int_a^b \big( f(x) - g(x) \big)\,dx = 4.$$

Solving these two equations for p and q, we get that p = 1 and q = −3, so $\int_a^b g(x)\,dx = -3$.

Answer to Exercise 6. The key here is to remember that $y = \sqrt{1 - x^2}$ is the top half of the circle of radius 1 centered at the origin, because squaring both sides we get $y^2 = 1 - x^2$, or $x^2 + y^2 = 1$. The function y = 1 − |x| has a corner shape consisting of two lines: 1 − x when x is positive and 1 + x when x is negative. Sketching all three functions we can see that

$$1 - |x| \le \sqrt{1 - x^2} \le 1$$

so, by monotonicity, the integrals are arranged in the same order, (b) ≤ (c) ≤ (a). Using geometry, their values are: (a) 2 × 1 = 2, (b) $\frac{1}{2} \times 2 \times 1 = 1$ and (c) $\frac{\pi}{2}$.

Answer to Exercise 7. The sum is the right Riemann sum of $\int_0^{\pi/2} \sin(x)\,dx$ on the interval $[0, \frac{\pi}{2}]$ with n = 4 subintervals. Since sin(x) is increasing on this interval, the right Riemann sum is overestimating the integral, so it is bigger.

Answer to Exercise 8. It looks like the curve is closely tracing a linear function 1 + x on the interval [−1, 0] and a constant function 1 on the interval [0, 1], so the best guess would be $\frac{1}{2} + 1 = 1.5$, i.e. (b).

3.3 Fundamental Theorem of Calculus

In the previous sections we learned that: if f(x) = F′(x) then

$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a).$$

This statement is called the Fundamental Theorem of Calculus (FTC for short).
The notation in the middle, $F(x)\big|_a^b$, is another way to write F(b) − F(a), which will be very convenient when using this formula. So far we have mostly used this formula to find the change F(b) − F(a) of some quantity F(x) by either computing the definite integral $\int_a^b f(x)\,dx$ using areas (for simple enough graphs of f(x)) or approximating it by Riemann sums. Below we will start discussing a different way to use the FTC, by systematically guessing what the function F(x) could be if we know its derivative f(x). But first let us solve a couple of simple problems using the FTC and emphasizing the relationship f(x) = F′(x).

Example 1. If f(x) = F′(x) and

$$F(4.5) = -3, \qquad F(7) = 1, \qquad \int_{1.5}^{4.5} f(x)\,dx = 3.2,$$

what is F(1.5) and $\int_{1.5}^{7} f(x)\,dx$?

Solution: Using that $\int_{1.5}^{4.5} f(x)\,dx = F(4.5) - F(1.5)$ and the information given, we get that 3.2 = −3 − F(1.5) and so F(1.5) = −6.2. As for the integral $\int_{1.5}^{7} f(x)\,dx$, by the FTC it is equal to F(7) − F(1.5) = 1 − (−6.2) = 7.2.

If we had not already computed F(1.5), we could also compute $\int_{1.5}^{7} f(x)\,dx$ by breaking it into two integrals:

$$\int_{1.5}^{7} f(x)\,dx = \int_{1.5}^{4.5} f(x)\,dx + \int_{4.5}^{7} f(x)\,dx.$$

The first integral is given to us, 3.2, and the second can be computed using the FTC and the given values: $\int_{4.5}^{7} f(x)\,dx = F(7) - F(4.5) = 1 - (-3) = 4$, so $\int_{1.5}^{7} f(x)\,dx = 3.2 + 4 = 7.2$.

Exercise 1. Suppose f(x) = F′(x). Given the graph of y = F(x) in the figure, describe the geometric meaning of:

(a) $f(a)$   (b) $\int_a^b f(x)\,dx$   (c) $\frac{1}{b-a}\int_a^b f(x)\,dx$

Average value. The quantity that appeared in the last exercise,

$$\frac{1}{b-a}\int_a^b f(x)\,dx,$$

is called the average of f(x) on the interval [a, b]. If we look at the graph of f(x) in the figure, the rectangle with the height equal to the average $\frac{1}{b-a}\int_a^b f(x)\,dx$ will have the same area as the area under f(x). Also, when f(x) is the rate of change of F(x) then $\frac{1}{b-a}\int_a^b f(x)\,dx = \frac{F(b)-F(a)}{b-a}$ is the slope of the secant line of F(x), which was called the average rate of change of F(x) on [a, b]. In the next two problems, we will compute averages of two functions f(x) while at the same time trying to guess F(x) such that f(x) = F′(x).

Example 2. The Gateway Arch in St. Louis is 630 ft wide and 630 ft tall. Its height can be well approximated by the function

$$y = 704 - 37e^{0.0093x} - 37e^{-0.0093x}$$

for x between −315 and 315 feet, shown by the yellow dashed line in the figure.² (Its shape is an example of a weighted catenary curve.) What is the average height of the Gateway Arch?

² Background photo by Sam Valadi, https://www.flickr.com/photos/132084522N05/17275578342/.

Solution: First, by linearity of integral, the definite integral of the height y(x) on [−315, 315] can be broken into three parts:

$$\int_{-315}^{315} 704\,dx - 37\int_{-315}^{315} e^{0.0093x}\,dx - 37\int_{-315}^{315} e^{-0.0093x}\,dx.$$

The first term is just 704 × 630 = 443520, the area of a rectangle. To compute the second and third integral, we would like to use the FTC and write them as F(315) − F(−315). Let us start with the second term. Can we guess what function F(x) has the derivative $F'(x) = e^{0.0093x}$? We know that $(e^x)' = e^x$, so we can try $e^{0.0093x}$. Its derivative $(e^{0.0093x})' = 0.0093\,e^{0.0093x}$ is almost what we want, but it has an extra factor 0.0093 because of the chain rule. This is not a big problem, because we can divide by 0.0093 and take $F(x) = \frac{1}{0.0093} e^{0.0093x}$. Now we can see that $F'(x) = e^{0.0093x}$ as we wanted. Therefore, by the FTC,

$$\int_{-315}^{315} e^{0.0093x}\,dx = \frac{e^{0.0093x}}{0.0093}\bigg|_{-315}^{315} = \frac{e^{0.0093\times 315}}{0.0093} - \frac{e^{-0.0093\times 315}}{0.0093} = 2006.97.$$

Similarly, we can guess that $e^{-0.0093x}$ is the derivative of $-\frac{1}{0.0093} e^{-0.0093x}$ and

$$\int_{-315}^{315} e^{-0.0093x}\,dx = -\frac{e^{-0.0093x}}{0.0093}\bigg|_{-315}^{315} = -\frac{e^{-0.0093\times 315}}{0.0093} - \left( -\frac{e^{0.0093\times 315}}{0.0093} \right),$$

which is also equal to 2006.97.
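As a cross-check on the value 2006.97, the guessed antiderivative can be compared with a direct Riemann sum approximation. The following is a minimal Python sketch under stated assumptions: the helper `midpoint_sum`, the constant name `K` and the choice of 100000 subintervals are illustrative and do not come from the text.

```python
import math

K = 0.0093  # coefficient in the exponent of the height formula

def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Riemann sum approximation of the integral of e^(Kx) on [-315, 315].
numeric = midpoint_sum(lambda x: math.exp(K * x), -315, 315, 100_000)

# Value from the FTC with the antiderivative F(x) = e^(Kx) / K.
exact = (math.exp(K * 315) - math.exp(-K * 315)) / K

print(round(numeric, 2), round(exact, 2))
```

If the guess for F(x) is right, the two printed numbers should agree to the displayed precision, which is the kind of agreement the FTC predicts.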
Combining all three integrals, we get $443520 - 37 \times 2006.97 - 37 \times 2006.97 = 295004.22$. To get the average of $y(x)$ we need to divide this by the length 630 of the interval $[-315, 315]$, and we get that the average height is $\approx 468$ feet.

Exercise 2. A pizza pie at a room temperature of $20^\circ$C is put into a brick oven, after which the temperature $T$ of the pizza top starts to increase according to the formula $T = 100 - 80e^{-t}$, where time $t$ is in minutes. What is the average temperature during the first two minutes?

Antiderivatives, a.k.a. indefinite integrals. If $f(x)$ is the derivative of $F(x)$ then $F(x)$ is called an antiderivative of $f(x)$. When we were guessing $F(x)$ in the problems above, we were looking for an antiderivative. Another name for an antiderivative is indefinite integral. The notation for an antiderivative/indefinite integral of $f(x)$ is $\int f(x)\,dx$, which is similar to a definite integral only without the limits $a$ and $b$. In other words:
$$\text{If } f(x) = F'(x) \text{ then } \int f(x)\,dx = F(x) + C,$$
where $C$ is any constant. The reason we need to add a constant $+C$ is that, if $F'(x) = f(x)$ then $(F(x) + C)' = f(x) + 0 = f(x)$, so given one antiderivative $F(x)$ any shift $F(x) + C$ will also be an antiderivative. The constant $C$ will be important below when we look for an antiderivative passing through a specific point.

Since we know derivatives of many basic functions, we can reverse the direction to make a list of a few basic indefinite integrals:
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \quad \text{if } n \neq -1,$$
$$\int \frac{1}{x}\,dx = \ln|x| + C,$$
$$\int e^x\,dx = e^x + C,$$
$$\int \cos(x)\,dx = \sin(x) + C,$$
$$\int \sin(x)\,dx = -\cos(x) + C.$$
The reason we put $|x|$ inside the logarithm in the second integral is that $\frac{1}{x}$ is defined for negative $x$ while $\ln(x)$ is defined only for positive $x$ and, for negative $x$, an antiderivative of $\frac{1}{x}$ will be $\ln(-x)$, which is the same as $\ln|x|$. All the other antiderivatives can be checked by taking derivatives.
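The remark that each antiderivative in the list can be checked by taking derivatives can also be tried numerically. The following is a minimal sketch, not part of the original notes: the helper names `central_difference` and `max_error`, the exponent `n = 3`, and the sample points are all illustrative assumptions. It approximates $F'(x)$ by a symmetric difference quotient and compares it with $f(x)$, including at negative $x$, where $\ln|x|$ rather than $\ln(x)$ is needed.

```python
import math

def central_difference(F, x, h=1e-6):
    """Approximate F'(x) with a symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Each entry is (f, F), where F is claimed to be an antiderivative of f.
n = 3  # illustrative exponent; the power rule needs n != -1
pairs = {
    "x^n": (lambda x: x**n, lambda x: x**(n + 1) / (n + 1)),
    "1/x": (lambda x: 1 / x, lambda x: math.log(abs(x))),
    "e^x": (math.exp, math.exp),
    "cos(x)": (math.cos, math.sin),
    "sin(x)": (math.sin, lambda x: -math.cos(x)),
}

def max_error(f, F, points):
    """Largest gap between the numerical derivative of F and f."""
    return max(abs(central_difference(F, x) - f(x)) for x in points)

# Sample points on both sides of zero (zero itself is avoided because of 1/x).
sample_points = [-2.0, -0.5, 0.7, 1.5]
errors = {name: max_error(f, F, sample_points) for name, (f, F) in pairs.items()}

for name, err in errors.items():
    print(f"{name}: largest |F'(x) - f(x)| = {err:.2e}")
```

All reported gaps should be tiny (limited only by floating-point rounding), which is consistent with the list but of course does not replace the exact check by differentiation.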
We can also combine these examples with the following simple rule:
$$\text{If } \int f(x)\,dx = F(x) + C \text{ then } \int f(b + mx)\,dx = \frac{1}{m}F(b + mx) + C.$$
We have already used this in the above example when we guessed that an antiderivative of $e^{mx}$ is $\frac{1}{m}e^{mx}$. Here we can similarly check that, if $F'(x) = f(x)$ then
$$\left(\frac{1}{m}F(b + mx)\right)' = \frac{1}{m}\big(F(b + mx)\big)' = \frac{1}{m}F'(b + mx)\,m = f(b + mx),$$
by using the chain rule in the middle, so indeed $\frac{1}{m}F(b + mx)$ is an antiderivative of $f(b + mx)$. This rule is the simplest case of the so-called integration by substitution that we will study later. Another obvious rule is the linearity of indefinite integrals, saying that an antiderivative of the sum is the sum of antiderivatives.

Example 3. Compute the following indefinite integrals:
(a) $\int \left(3 + 4t + \frac{1}{t}\right) dt$
(b) $\int \left(6e^{-z/2} - e^{3z}\right) dz$
(c) $\int \left(\sin(2x + 1) - \frac{4}{x^2}\right) dx$.

Solution: (a) By linearity and the above list:
$$\int \left(3 + 4t + \frac{1}{t}\right) dt = \int 3\,dt + 4\int t\,dt + \int \frac{1}{t}\,dt = 3t + 4\,\frac{t^2}{2} + \ln|t| + C.$$
Notice that we do not need to add $+C$ to each indefinite integral separately, because all those unknown constants can be combined into one constant. Of course, we can simplify the answer as $3t + 2t^2 + \ln|t| + C$.

(b) Using linearity and the above $(b + mx)$ rule:
$$\int \left(6e^{-z/2} - e^{3z}\right) dz = 6\int e^{-z/2}\,dz - \int e^{3z}\,dz = \frac{6e^{-z/2}}{-1/2} - \frac{e^{3z}}{3} + C,$$
which can be simplified as $-12e^{-z/2} - \frac{e^{3z}}{3} + C$.

(c) By linearity, using the above list, and using the $(b + mx)$ rule:
$$\int \left(\sin(2x + 1) - \frac{4}{x^2}\right) dx = \int \sin(2x + 1)\,dx - 4\int x^{-2}\,dx = -\frac{\cos(2x + 1)}{2} - \frac{4x^{-2+1}}{-2+1} + C,$$
which can be simplified to $-\frac{\cos(2x+1)}{2} + \frac{4}{x} + C$.

Exercise 3. Compute the following indefinite integrals:
(a) $\int \left(\sqrt{3u + 1} - \frac{2}{2u+1}\right) du$
(b) $\int \left(1 - 6e^{-s/3}\right) ds$
(c) $\int 2\cos\left(\frac{r}{2} - 1\right) dr$.

Example 4. Find the total area between the graph of $y = \cos(x)$ and the $x$-axis on the interval $[0, 2.5]$.

Solution: We know that cosine is positive on $[0, \frac{\pi}{2}]$ and negative on $[\frac{\pi}{2}, \pi]$, and the right endpoint 2.5 is somewhere in the second interval.
We want to compute the sum of areas $A_1 + A_2$ in the figure. The first area is just the definite integral, which can be computed using the FTC:
$$A_1 = \int_0^{\pi/2} \cos(x)\,dx = \sin(x)\Big|_0^{\pi/2} = \sin(\pi/2) - \sin(0) = 1 - 0 = 1.$$
Notice that when using the FTC we did not write $+C$ in $(\sin(x) + C)$, because the constant will cancel out anyway: $(F(b) + C) - (F(a) + C) = F(b) - F(a)$. To find $A_2$, we know that $-A_2$ is the definite integral on $[\frac{\pi}{2}, 2.5]$, so
$$-A_2 = \int_{\pi/2}^{2.5} \cos(x)\,dx = \sin(x)\Big|_{\pi/2}^{2.5} = \sin(2.5) - \sin(\pi/2) \approx -0.4015.$$
This means that $A_2 \approx 0.4015$ and the total area is $A_1 + A_2 \approx 1.4015$.

Exercise 4. Find the total area between the graph of $y = \frac{1}{2}x(x - 1)(x - 3)$ and the $x$-axis on the interval $[0, 2]$.

Initial Value Problem. Given the rate of change $f(x)$ of some quantity $F(x)$, sometimes we want to find $F(x)$ given the additional information that $F(x_0) = y_0$, which is also called the initial value problem. If we can find any antiderivative $G(x)$ of $f(x)$ then $G(x) + C$ is also an antiderivative so, if we want an antiderivative to pass through a point $(x_0, y_0)$, we can plug it in, $y_0 = G(x_0) + C$, and find the constant $C = y_0 - G(x_0)$.

Example 5. The velocity of a ball oscillating on a vertical spring is $v(t) = \cos(\pi t)$. If at time $t = 0$ the height of the ball is 1, what is its height $h(t)$ at time $t$?

Solution: Since $h'(t) = v(t)$, the height $h(t)$ is an antiderivative of $\cos(\pi t)$ such that $h(0) = 1$. We can take a general antiderivative $\frac{1}{\pi}\sin(\pi t) + C$ of the velocity and make sure that $\frac{1}{\pi}\sin(0) + C = 1$, so $C = 1 - \frac{1}{\pi}\sin(0) = 1$. It means that the height at time $t$ is $h(t) = \frac{1}{\pi}\sin(\pi t) + 1$.

Exercise 5. Blood alcohol content (BAC) is measured in grams per 100 ml, so 1 BAC = 0.01 g/ml. An average person metabolizes alcohol at a rate of 0.015 BAC/hour. If a person stops drinking at 2 a.m. and at 8 a.m. their BAC level is 0.09, what is their BAC level at time $t$ after 2 a.m. measured in hours?[3]

FTC-2: Reconstruction Theorem.
Another way to solve the initial value problem $F'(x) = f(x)$, $F(x_0) = y_0$, is to take the statement $\int_a^b f(x)\,dx = F(b) - F(a)$ of the FTC and replace $a$ by $x_0$ and $b$ by $x$:
$$\int_{x_0}^{x} f(t)\,dt = F(x) - F(x_0).$$
Notice that we also replaced the variable of integration by $t$ in $f(t)\,dt$ because $x$ is now reserved for the upper limit $b = x$. Of course, instead of $t$ we can use any other name that is not reserved, such as $s$, $u$, etc. If we want $F(x_0)$ to be equal to $y_0$, we can replace $F(x_0)$ by $y_0$ and rewrite this formula as
$$F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt.$$
The FTC rewritten in this form is known as the second form of the FTC or the Reconstruction Theorem. In this statement, we are thinking of the upper limit $b$ as an independent variable $x$, and the right hand side $y_0 + \int_{x_0}^{x} f(t)\,dt$ gives us a specific antiderivative with the initial value $y_0$ at $x = x_0$.

Example 6. Blood alcohol content (BAC) is measured in grams per 100 ml, so 1 BAC = 0.01 g/ml. An average person metabolizes alcohol at a rate of 0.015 BAC/hour. If a person stops drinking at 2 a.m. and at 8 a.m. their BAC level is 0.09, what is their BAC level at time $t$ after 2 a.m. measured in hours?

Solution: If $A(t)$ is the BAC level at time $t$, then its derivative is $-0.015$ (minus sign because the level is decreasing) and $A(6) = 0.09$. By the Reconstruction theorem,
$$A(t) = 0.09 + \int_6^t (-0.015)\,ds = 0.09 + (-0.015)(t - 6) = 0.18 - 0.015t.$$
Of course, this formula only works until $A(t)$ reaches zero, $0.18 - 0.015t = 0$, i.e. until $t = 12$ hours, or 2 p.m.

Exercise 6. A patient is given some medication in a 50 mg immediate release pill and a 100 mg delayed-release pill which is released at a rate $1600te^{-4t}$ mg/h, where time $t$ is measured in hours. What amount of medication has been delivered by time $t$? The answer can be in an integral form.

Example 7. Given a graph of the derivative $y = f'(x)$ in the left figure below of some continuous function $f(x)$ on the interval $[-4, 4]$, sketch the graph of $y = f(x)$ if $f(1) = 4$.
[3] Illustration from www.houstondwiattorney.net.

Solution: First of all, notice that here we departed from the convention to call the derivative $f(x)$ and the antiderivative $F(x)$. Instead, here $f'(x)$ is the derivative of $f(x)$ and $f(x)$ is the antiderivative of $f'(x)$ such that $f(1) = 4$. We see that $f'(x) = -2$ on the interval $[-4, -2)$, which means that $f(x)$ has slope $-2$ there, $f'(x) = 3$ on the interval $(-2, 1)$ where the slope of $f(x)$ is 3, and $f'(x) = -1$ on the interval $(1, 4)$ where the slope of $f(x)$ is $-1$. In the middle figure above we sketch a continuous function $f(x)$ with such slopes on these three intervals. We specified that $f(x)$ is continuous, so it does not have jumps at $x = -2$ and $x = 1$, but the derivative is not defined at those points. We labelled this graph by $y = f(x) + C$ in the middle figure, because it can be shifted vertically by any constant $C$ if we only know its derivative $f'(x)$. However, because we are given that $f(1) = 4$, we can now fix the position of the function $f(x)$ as depicted in the right figure above. In this case, each piece of the function is linear so we could easily find its formula if we wanted to: $f(x) = -x + 5$ on $[1, 4]$, $f(x) = 3x + 1$ on $[-2, 1]$, and $f(x) = -2x - 9$ on $[-4, -2]$.

Exercise 7. Given $f(x)$ in the figure on the right, which function below could be its antiderivative such that $F(0) = 1$? (a) (b) (c) (d)

The FTC tells us that, if $f(x) = F'(x)$ then $\int_a^x f(t)\,dt = F(x) - F(a)$. As above, we replaced the upper limit $b$ by a variable $x$, because, for example, we want to compute this definite integral for any upper limit $x$. If we take the derivative of both sides with respect to $x$, we get $\frac{d}{dx}\int_a^x f(t)\,dt = (F(x) - F(a))' = F'(x) = f(x)$. This gives us yet another consequence of the FTC:
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$
This formula is very useful, because even if we do not know what the integral $\int_a^x f(t)\,dt$ is, we know that its derivative is $f(x)$, so we can deduce some of the properties of this integral $\int_a^x f(t)\,dt$. The next two problems will refer to the following figures.

Example 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with $f(t)$ in the left figure above.
(a) What is $h(0)$ and $h'(1)$?
(b) On what interval is $h(x)$ concave up?
(c) Where is the global minimum and maximum of $h(x)$ on $[0, 2]$?

Solution: (a) $h(0) = \int_0^0 f(t)\,dt = 0$, because the interval $[0, 0]$ has length zero. By the above formula, $h'(x) = \left(\int_0^x f(t)\,dt\right)' = f(x)$ so $h'(1) = f(1) = 0$.

(b) Since $h'(x) = f(x)$ and $h(x)$ is concave up on the interval where its derivative is increasing, this happens on the interval $[0, 0.5]$ where $f(x)$ is increasing.

(c) Since $h'(x) = f(x)$ is positive on $[0, 1]$ and negative on $[1, 2]$ in the figure, $h(x)$ is increasing on $[0, 1]$ and decreasing on $[1, 2]$. It means that the maximum is at $x = 1$ and the minimum is at one of the endpoints $x = 0$ or $x = 2$. By the FTC, $h(x)$ increases on $[0, 1]$ by the area above the $x$-axis in the figure and decreases on $[1, 2]$ by the area below the $x$-axis in the figure. Since the area below the $x$-axis is bigger, $h(x)$ will decrease more on $[1, 2]$ than it will increase on $[0, 1]$. This means that the minimum will be at $x = 2$.

Exercise 8. Suppose that $h(x) = \int_0^x f(t)\,dt$ with $f(t)$ in the right figure above.
(a) What is $h(0)$ and $h'(-1)$?
(b) On what interval is $h(x)$ concave up?
(c) Where is the global minimum and maximum of $h(x)$ on $[-1, 1]$?

There is a more general formula:
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = f\big(b(x)\big)\,b'(x) - f\big(a(x)\big)\,a'(x).$$
First of all, in this formula we replaced both the lower limit $a$ and the upper limit $b$ by some functions $a(x)$ and $b(x)$. It might look intimidating, but all we did was apply the chain rule.
Indeed, the FTC tells us that $\int_{a(x)}^{b(x)} f(t)\,dt = F(b(x)) - F(a(x))$ and when we differentiate this equation, we simply apply the chain rule twice,
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = \big(F(b(x)) - F(a(x))\big)' = F'\big(b(x)\big)\,b'(x) - F'\big(a(x)\big)\,a'(x) = f\big(b(x)\big)\,b'(x) - f\big(a(x)\big)\,a'(x).$$

Example 9. Compute the derivative of $\int_{2x}^{x^2} e^{-t^4}\,dt$.

Solution: By the above formula,
$$\frac{d}{dx}\int_{2x}^{x^2} e^{-t^4}\,dt = e^{-(x^2)^4}(x^2)' - e^{-(2x)^4}(2x)' = 2xe^{-x^8} - 2e^{-16x^4}.$$

Exercise 9. Compute the derivative of $\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt$.

A playlist with an overview of the FTC can be found in the footnote link.[4] You can also play around with the following Geogebra example to make sure you understand how the integral changes as a function of the upper limit.[5]

[4] https://www.youtube.com/playlist?list=PLYxPH73Uem-QmJI2fdsCtRww-oYMIXUOx.
[5] https://www.geogebra.org/m/fa2w8qjy

Answer to Exercise 1. (a) Because $f(a) = F'(a)$ it is the slope of $y = F(x)$ at $x = a$. (b) Because
$$\int_a^b f(x)\,dx = F(b) - F(a),$$
it is the distance between $F(a)$ and $F(b)$ on the $y$-axis. (c) Because
$$\frac{1}{b-a}\int_a^b f(x)\,dx = \frac{F(b) - F(a)}{b-a},$$
it is the slope of the secant line connecting the points $(a, F(a))$ and $(b, F(b))$.

Answer to Exercise 2. $e^{-t}$ is the derivative of $-e^{-t}$ so, by the FTC,
$$\int_0^2 (100 - 80e^{-t})\,dt = \int_0^2 100\,dt - 80\int_0^2 e^{-t}\,dt = 100 \times 2 - 80\big(-e^{-2} - (-e^0)\big),$$
which is approximately $130.83\,^\circ$C$\times$min (notice that the unit of the integral is $^\circ$C$\times$min). The average should be divided by 2 minutes, so $65.41\,^\circ$C.

Answer to Exercise 3.
(a) $\int \left(\sqrt{3u + 1} - \frac{2}{2u+1}\right) du = \frac{2}{9}(3u + 1)^{3/2} - \ln|2u + 1| + C$.
(b) $\int \left(1 - 6e^{-s/3}\right) ds = s + 18e^{-s/3} + C$.
(c) $\int 2\cos\left(\frac{r}{2} - 1\right) dr = 4\sin\left(\frac{r}{2} - 1\right) + C$.

Answer to Exercise 4. We know that the polynomial $\frac{1}{2}x(x - 1)(x - 3)$ has zeros at $x = 0$, $1$ and $3$ and we can easily check that it is positive on $[0, 1]$ and negative on $[1, 3]$, so the graph looks like in the figure. To find the definite integrals on $[0, 1]$ and $[1, 2]$, we first multiply out
$$\frac{1}{2}x(x - 1)(x - 3) = \frac{x^3}{2} - 2x^2 + \frac{3x}{2}.$$
Its indefinite integral is
$$\int \left(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\right) dx = \frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4} + C$$
so
$$A_1 = \int_0^1 \left(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\right) dx = \left(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\right)\bigg|_0^1 = \frac{1}{8} - \frac{2}{3} + \frac{3}{4} = \frac{5}{24},$$
$$-A_2 = \int_1^2 \left(\frac{x^3}{2} - 2x^2 + \frac{3x}{2}\right) dx = \left(\frac{x^4}{8} - \frac{2x^3}{3} + \frac{3x^2}{4}\right)\bigg|_1^2 = -\frac{13}{24},$$
and the total area is $A_1 + A_2 = \frac{5}{24} + \frac{13}{24} = \frac{3}{4} = 0.75$.

Answer to Exercise 5. The rate of change of the BAC level is $-0.015$ because it is decreasing, so a general antiderivative is $-0.015t + C$. At $t = 6$ hours after 2 a.m. the BAC level is $-0.015 \times 6 + C = 0.09$, so $C = 0.09 + 0.015 \times 6 = 0.18$. The BAC level at time $t$ after 2 a.m. is $0.18 - 0.015t$ until it reaches 0, i.e. when $0.18 - 0.015t = 0$ or $t = 12$, which is 2 p.m. So between 2 a.m. and 2 p.m. the BAC level is $0.18 - 0.015t$.

Answer to Exercise 6. If $A(t)$ is the total amount of medication by time $t$, then the rate $1600te^{-4t}$ is its derivative and the initial amount is $A(0) = 50$. By the Reconstruction theorem, $A(t) = 50 + \int_0^t 1600se^{-4s}\,ds$. The units are: mg/h $\times$ h = mg. This integral can be computed using integration by parts that we will learn later on, but for any specific $t$ we could use Riemann sums to approximate this amount if we needed to.

Answer to Exercise 7. First of all, we can eliminate (b), because the derivative $F'(0) = f(0)$ is defined at $x = 0$ so the function cannot have a jump there. Next, we can eliminate (d), because the function is not equal to 1 at $x = 0$, but we must have $F(0) = 1$. Between (a) and (c), the difference is the slope between $x = -4$ and $x = -2$. Since $f(x) = -2$ on that interval, the slope should be $-2$, which means that (a) could be the answer. Why did we say "could be"? Because in this problem we did not ask for $F(x)$ to be continuous and the derivative $f(x)$ is not defined at $x = -2$, so if $F(x)$ could have a jump then the linear piece on the interval $[-4, -2]$ could also be shifted vertically. If we asked for a continuous antiderivative then this is the answer.

Answer to Exercise 8. (a) $h(0) = 0$ and $h'(-1) = f(-1) = 0$.
(b) $h'(x) = f(x)$ is increasing on $[-0.5, 1]$, so $h(x)$ is concave up there. (c) $h'(x) = f(x)$ is negative on $[-1, 0]$ and positive on $[0, 1]$, so $h(x)$ is decreasing on $[-1, 0]$ and increasing on $[0, 1]$. The minimum is at $x = 0$, and the maximum is at $x = 1$ because the area above the $x$-axis is bigger, so $h(x)$ will increase more than it will decrease.

Answer to Exercise 9.
$$\frac{d}{dx}\int_{\cos(x)}^{\sin(x)} \ln(1 + t^2)\,dt = \ln(1 + \sin^2(x))\cos(x) - \ln(1 + \cos^2(x))(-\sin(x)) = \ln(1 + \sin^2(x))\cos(x) + \ln(1 + \cos^2(x))\sin(x).$$

3.4 Application of FTC: differential equations

In this section we will continue using the Fundamental Theorem of Calculus in the form of the Reconstruction theorem, which tells us that the antiderivative $F(x)$ of $f(x)$ with the initial value $F(x_0) = y_0$ can be written as $F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt$. What will be new is a slightly different point of view and some terminology that will connect us to a more general topic of differential equations. In the future, we will discuss in some detail the differential equations of the form
$$\frac{dy}{dx} = f(x, y).$$
A differential equation with additional information about some initial value
$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0$$
is called an initial value problem (IVP). This equation is like a puzzle where an unknown function $y = y(x)$ could appear on both sides of the equation. On the left hand side its derivative $\frac{dy}{dx} = y'(x)$ appears, and on the right hand side $y(x)$ appears inside some formula $f(x, y) = f(x, y(x))$ that can also depend on $x$. Here are some examples of such equations:
$$\frac{dy}{dx} = xy, \qquad \frac{dy}{dx} = \frac{x}{y}, \qquad \frac{dy}{dx} = x^2 + y^2, \qquad \frac{dy}{dx} = 1 + y^2, \qquad \frac{dy}{dx} = \cos(x).$$
To solve such an equation means to find possible functions $y = y(x)$ that satisfy it, meaning that if we plug such a function in on both sides of the equation we will get equality.

Example 1. Show that $y = \tan(x)$ on the interval $-\frac{\pi}{2} < x < \frac{\pi}{2}$ is the solution to the initial value problem $\frac{dy}{dx} = 1 + y^2$, $y(0) = 0$.

Solution: First of all, $y(0) = \tan(0) = 0$, so the initial value matches. Next, let us plug $y = \tan(x)$ into the differential equation and check that the two sides are equal. On the left hand side, the derivative is $\frac{dy}{dx} = (\tan(x))' = \sec^2(x)$. On the right hand side, $1 + y^2 = 1 + \tan^2(x) = \sec^2(x)$, a famous trigonometric identity. The two sides are equal, so $y = \tan(x)$ is indeed the solution of this initial value problem. The reason the interval was limited to $-\frac{\pi}{2} < x < \frac{\pi}{2}$ is that $\tan(x)$ has vertical asymptotes at $x = -\frac{\pi}{2}$ and $\frac{\pi}{2}$, so the equation does not make sense at those points.

Exercise 1. Show that $x = \sin(t)$ on the interval $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$ is the solution to the initial value problem $\frac{dx}{dt} = -\sqrt{1 - x^2}$, $x(\pi) = 0$.

In general, equations $\frac{dy}{dx} = f(x, y)$ are difficult to solve, but there is one equation in the above list which is much easier than the others. In the last equation $\frac{dy}{dx} = \cos(x)$, the right hand side does not have $y$ in it, so the equation simply tells us that the derivative of $y(x)$ is equal to $\cos(x)$ or, in other words, $y(x)$ is an antiderivative of $\cos(x)$, so $y(x) = \sin(x) + C$. Given a differential equation of this easier form
$$\frac{dy}{dx} = f(x),$$
if we can find one particular antiderivative $F(x)$ of $f(x)$ then $y = F(x) + C$ is called a general solution of this equation. Such an equation with additional information about some initial value
$$\frac{dy}{dx} = f(x), \qquad y(x_0) = y_0$$
is also called an initial value problem (IVP), and the function $F(x) = y_0 + \int_{x_0}^{x} f(t)\,dt$ is called the solution of this initial value problem.

Example 2. A patient is given some medication in a 50 mg immediate release pill and a 100 mg delayed-release pill which is released at a rate $1600te^{-4t}$ mg/h, where time $t$ is measured in hours. What is the initial value problem that the amount of medication $A(t)$ delivered by time $t$ satisfies? Show that $A(t) = 150 - 100(1 + 4t)e^{-4t}$ is the solution to this initial value problem.
Solution: The initial value $A(0) = 50$ comes from the immediate release pill, and the rate of change after that is due to the delayed-release pill, so the amount of medication delivered by time $t$ satisfies the following initial value problem:
$$\frac{dA}{dt} = 1600te^{-4t}, \qquad A(0) = 50.$$
Let us show that $A(t) = 150 - 100(1 + 4t)e^{-4t}$ is the solution of this IVP. The initial value matches: $A(0) = 150 - 100(1 + 0)e^0 = 50$. By the product rule,
$$\frac{dA}{dt} = 0 - 100(4)e^{-4t} - 100(1 + 4t)e^{-4t}(-4) = 1600te^{-4t},$$
so the differential equation is also satisfied. Of course, this solution comes from the Reconstruction theorem, $A(t) = 50 + 1600\int_0^t se^{-4s}\,ds$, and we will later learn how to find this integral using the integration by parts technique, so we will learn how to solve this problem without somebody giving us the answer to check.

Exercise 2. Find the solution to the following initial value problems.
(a) $\frac{dy}{dx} = x^2 + 4x^3$, $y(0) = 5$.
(b) $\frac{dx}{dt} = -10t + 3$, $x(0) = 25$.

Motion with constant acceleration. In the first section in this chapter we have solved several problems about a linear motion with constant acceleration, in which case the velocity $v(t)$ was a linear function of time. We have used the area under the graph of velocity to determine the position change, or displacement, but now we can also find a general formula for the position $y(t)$ using the FTC.

Example 3. A coin is tossed straight up into the air from the height of 1 meter with the speed of 5 m/s. Suppose that gravity $g = 10$ m/s$^2$.
(a) Find the formula for the velocity and height of the coin at time $t$ seconds.
(b) When will the coin reach the highest point? What is the maximum height?
(c) When will the coin hit the ground? What is its speed at that moment?

Solution: (a) The acceleration is due to gravity, so it is $-g = -10$ m/s$^2$. The minus sign is because we consider upward as the positive direction.
Because acceleration is the derivative of velocity, $\frac{dv}{dt} = -g$, and because the initial velocity is $v(0) = 5$, we can solve this initial value problem by using the FTC,
$$v(t) = 5 + \int_0^t (-g)\,ds = 5 - gt = 5 - 10t.$$
Then, because vertical velocity is the derivative of height, $\frac{dy}{dt} = 5 - gt$, and the initial height is $y(0) = 1$, we can again solve this initial value problem using the FTC-2,
$$y(t) = 1 + \int_0^t (5 - gs)\,ds = 1 + 5t - g\,\frac{t^2}{2} = 1 + 5t - 5t^2.$$
Notice how in both integrals above we used $s$ as a variable of integration, because $t$ was reserved for the upper limit.

(b) At the highest point the velocity of the coin will be 0, so $v(t) = 5 - 10t = 0$. Solving for $t$ we get that $t = 0.5$ seconds. At that moment, the height is $y(0.5) = 1 + 5(0.5) - 5(0.5)^2 = 2.25$ meters.

(c) The coin will hit the ground when the height is 0, so $y(t) = 1 + 5t - 5t^2 = 0$. Solving for $t$ we get $t = \frac{5 + \sqrt{45}}{10} \approx 1.1708$. At that moment the velocity is equal to $v(1.1708) \approx -6.7082$, so the speed is $\approx 6.7$ m/s.

Exercise 3. A coin is tossed straight up into the air from the height of 1 meter and it reaches the maximum height of 3 meters. What was the original velocity $v_0$? Suppose that gravity $g = 10$ m/s$^2$.

Exercise 4. The lift-off speed of Airbus A380[6] is approximately 90 m/s and it takes about 65 seconds to reach that speed. How long should the runway be? Assume constant acceleration. What if the lift-off speed was given as 200 mph?

Answer to Exercise 1. First, the initial value $\sin(\pi) = 0$ matches. Plugging $x = \sin(t)$ into the equation, the left hand side is $(\sin(t))' = \cos(t)$, and the right hand side is also
$$-\sqrt{1 - \sin^2(t)} = -\sqrt{\cos^2(t)} = -|\cos(t)| = \cos(t).$$
The last equality $-|\cos(t)| = \cos(t)$ is true because we limit ourselves to the interval $\frac{\pi}{2} \le t \le \frac{3\pi}{2}$ where $\cos(t)$ is negative (or zero at the endpoints).

Answer to Exercise 2. (a) The notation $\frac{dy}{dx}$ indicates that $y = y(x)$ is a function of $x$, so $x$ is the independent variable.
This means that the differential equation $\frac{dy}{dx} = x^2 + 4x^3$ tells us that the derivative $y'(x)$ is precisely $x^2 + 4x^3$, so $y(x)$ must be an antiderivative of $x^2 + 4x^3$. A general antiderivative of $x^2 + 4x^3$ is $y(x) = \frac{x^3}{3} + x^4 + C$, which can also be called the general solution of the above differential equation. Since we are given the initial value $y(0) = 5$, we can plug it in, $y(0) = \frac{0^3}{3} + 0^4 + C = C = 5$, so $C = 5$ and $y(x) = \frac{x^3}{3} + x^4 + 5$.

(b) Here, $t$ is the independent variable, and $x = x(t)$ is an antiderivative of $-10t + 3$. A general antiderivative is $-5t^2 + 3t + C$, and $x(0) = 25 = C$, so $C = 25$ and $x(t) = -5t^2 + 3t + 25$.

Answer to Exercise 3. As in the previous problem,
$$v(t) = v_0 - 10t, \qquad y(t) = 1 + v_0 t - 5t^2.$$
The coin reaches the maximum height when $v(t) = v_0 - 10t = 0$, so $t = \frac{v_0}{10}$. At that time the height must be $y(t) = 3$ meters, so plugging $t = \frac{v_0}{10}$ into the equation $y(t) = 1 + v_0 t - 5t^2$, we get
$$3 = 1 + v_0\,\frac{v_0}{10} - 5\left(\frac{v_0}{10}\right)^2 = 1 + \frac{v_0^2}{10} - \frac{v_0^2}{20} = 1 + \frac{v_0^2}{20}.$$
Solving for $v_0$, we get that $v_0^2 = 40$, or $v_0 = \sqrt{40} \approx 6.32$ m/s.

[6] Image by Bill Abbott, https://www.flickr.com/photos/wbaiv/51673118672/

Answer to Exercise 4. Since $\frac{dv}{dt} = a$, where $a$ is the unknown acceleration, and the starting velocity is $v(0) = 0$, the velocity at time $t$ is $v(t) = at$. We know that at time $t = 65$ s the velocity should be 90 m/s, so $v(65) = 65a = 90$ and $a = \frac{90}{65}$ m/s$^2$. Since $\frac{dx}{dt} = v(t) = at$ and the starting position is $x(0) = 0$, the distance at time $t$ is
$$x(t) = a\,\frac{t^2}{2} = \frac{90}{65} \cdot \frac{t^2}{2} = \frac{9t^2}{13}.$$
Then at lift-off time $t = 65$ the distance is $x(65) = \frac{9(65)^2}{13} = 2925$ meters, so the runway should be at least this long. If the lift-off speed was given as 200 mph, the units of speed (miles per hour) and time (seconds) would not match, so we would have to first convert 200 mph to distance per second, for example, 89.4 m/s or 293.3 ft/s.
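The coin toss in Example 3 of this section is a convenient case for checking the Reconstruction theorem numerically. The sketch below is an illustration rather than part of the original notes: the helper `left_riemann`, the number of subintervals, and the variable names are assumptions. It evaluates the closed-form velocity and height, and separately rebuilds the maximum height as $y_0 + \int_0^{t} v(s)\,ds$ using a left Riemann sum.

```python
import math

g = 10.0   # m/s^2, the value of gravity assumed in Example 3
v0 = 5.0   # initial upward velocity, m/s
y0 = 1.0   # initial height, m

def velocity(t):
    """v(t) = v0 - g t, from dv/dt = -g and v(0) = v0."""
    return v0 - g * t

def height(t):
    """y(t) = y0 + v0 t - g t^2 / 2, from dy/dt = v(t) and y(0) = y0."""
    return y0 + v0 * t - g * t**2 / 2

def left_riemann(f, a, b, n=100_000):
    """Left Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

t_top = v0 / g                                        # velocity is zero here
t_ground = (v0 + math.sqrt(v0**2 + 2 * g * y0)) / g   # positive root of y(t) = 0

# Reconstruction theorem: y(t_top) = y0 + integral of v from 0 to t_top.
y_numeric = y0 + left_riemann(velocity, 0.0, t_top)

print("time of maximum height:", t_top, "s, height:", height(t_top), "m")
print("time of landing:", t_ground, "s, speed:", abs(velocity(t_ground)), "m/s")
print("maximum height rebuilt from the velocity:", y_numeric, "m")
```

The Riemann-sum value should agree with the closed-form height of 2.25 m up to a small discretization error, which shrinks as the number of subintervals grows.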
3.5 Techniques of integration: substitution rule

In this section we will learn about integration by substitution, which is the reverse of the chain rule of differentiation. If $F'(x) = f(x)$ then the chain rule tells us that
$$\frac{d}{dx}F(g(x)) = F'(g(x))\,g'(x) = f(g(x))\,g'(x).$$
This can be rephrased in the language of antiderivatives: if $F'(x) = f(x)$ then
$$\int f(g(x))\,g'(x)\,dx = F(g(x)) + C.$$
In the case of definite integrals, this formula looks like this: if $F'(x) = f(x)$ then
$$\int_a^b f(g(x))\,g'(x)\,dx = F(g(x))\Big|_{x=a}^{x=b} = F(g(b)) - F(g(a)).$$
How do we know that these formulas are applicable in a particular problem? The key is to recognize the presence of some function $g(x)$ and its derivative $g'(x)$ inside the integral, which takes some practice, but luckily there are several typical patterns.

Power substitution. Our first example will be when the function $g(x)$ is of the following type:
$$g(x) = mx^p + k$$
for some constants $p$, $m$ and $k$. Let us illustrate on a concrete example how the substitution rule works in practice.

Example 1. Compute the following indefinite and definite integral.
(a) $\int \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx$
(b) $\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx$

Solution: Once we choose the right substitution, the solution is usually relatively short. However, in this first example let us explain how things work step by step.

• First, we need to guess what the function $g(x)$ could be and call it by a new name, for example, $u$, $w$, $y$, etc. We write $u = g(x)$, which is called making a substitution, because this new variable $u$ substitutes for $g(x)$. In this case, it is a good idea to try the substitution $u = 2\sqrt{x} + 1$, because we are looking for a function $g(x)$ inside another function $f(x)$, i.e. $f(g(x))$, and here we have $\cos(2\sqrt{x} + 1)$.

• Second, we compute the derivative $\frac{du}{dx} = g'(x)$ and rewrite this formally as $du = g'(x)\,dx$. In this problem, $\frac{du}{dx} = (2\sqrt{x} + 1)' = \frac{1}{\sqrt{x}}$ so $du = \frac{1}{\sqrt{x}}\,dx$.
We can see the presence of $\frac{1}{\sqrt{x}}$ in the integral, which is an indication that we are on the right track.

• We replace all the appearances of $g(x)$ by $u$ and replace $g'(x)\,dx$ by $du$:
$$\int f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int f(u)\,du.$$
In this particular problem,
$$\int \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int \frac{1}{3}\cos\big(\underbrace{2\sqrt{x} + 1}_{u}\big) \cdot \underbrace{\frac{1}{\sqrt{x}}\,dx}_{du} = \int \frac{1}{3}\cos(u)\,du.$$
In the case of the definite integral, we also replace the limits: $x = a$ becomes $u = g(a)$ and $x = b$ becomes $u = g(b)$:
$$\int_a^b f\big(\underbrace{g(x)}_{u}\big)\,\underbrace{g'(x)\,dx}_{du} = \int_{g(a)}^{g(b)} f(u)\,du.$$
Here, $x = 1$ becomes $u = 2\sqrt{1} + 1 = 3$ and $x = 4$ becomes $u = 2\sqrt{4} + 1 = 5$,
$$\int_1^4 \frac{1}{3}\cos\big(\underbrace{2\sqrt{x} + 1}_{u}\big) \cdot \underbrace{\frac{1}{\sqrt{x}}\,dx}_{du} = \int_3^5 \frac{1}{3}\cos(u)\,du.$$
It is important that, after we made this substitution, there is no more $x$ left in the integral. Everything is now in terms of the new variable $u$.

• Now that the integral has been simplified, hopefully at this step we can find the antiderivative $F(u)$ of $f(u)$:
$$\int f(u)\,du = F(u) + C.$$
In the case of the definite integral, we can also apply the FTC:
$$\int_{g(a)}^{g(b)} f(u)\,du = F(u)\Big|_{u=g(a)}^{u=g(b)} = F(g(b)) - F(g(a)).$$
In this particular problem,
$$\int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C$$
and, in the case of the definite integral,
$$\int_3^5 \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u)\Big|_{u=3}^{u=5} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3).$$

• The definite integral has already been computed in the last step, but in the case of the indefinite integral it is very important to substitute $u = g(x)$ back:
$$F(u) + C = F(g(x)) + C,$$
because our original integral $\int f(g(x))\,g'(x)\,dx$ was in terms of the $x$ variable, so the answer should also be in terms of $x$. In this problem,
$$\frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x} + 1) + C.$$

• Now that we have solved this problem step by step, let us show what a concise version of the solution would look like. First, we make a substitution
$$u = 2\sqrt{x} + 1, \quad \text{so that} \quad \frac{du}{dx} = (2\sqrt{x} + 1)' = \frac{1}{\sqrt{x}} \quad \text{and} \quad du = \frac{1}{\sqrt{x}}\,dx.$$
With this substitution, we solve part (a) as follows:
$$\int \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x} + 1) + C.$$
In part (b), we compute the definite integral using this substitution as follows:
$$\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int_3^5 \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u)\Big|_{u=3}^{u=5} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3).$$

Comment. In the definite integral, a common mistake is not to change the limits
$$\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int_1^4 \frac{1}{3}\cos(u)\,du = \ldots$$
or to stop writing the limits in the intermediate steps
$$\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int_{?}^{?} \frac{1}{3}\cos(u)\,du = \ldots.$$
On the other hand, it is totally okay to solve the indefinite integral by substitution first without writing any limits,
$$\int \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \int \frac{1}{3}\cos(u)\,du = \frac{1}{3}\sin(u) + C = \frac{1}{3}\sin(2\sqrt{x} + 1) + C,$$
and then apply the FTC in the definite integral directly
$$\int_1^4 \frac{\cos(2\sqrt{x} + 1)}{3\sqrt{x}}\,dx = \frac{1}{3}\sin(2\sqrt{x} + 1)\Big|_{x=1}^{x=4} = \frac{1}{3}\sin(5) - \frac{1}{3}\sin(3),$$
skipping all the intermediate substitution steps. $\square$

Exercise 1. Compute the following indefinite and definite integral.
(a) $\int xe^{-2x^2+3}\,dx$
(b) $\int_0^1 xe^{-2x^2+3}\,dx$

Example 2. Given the graph of a function $y = f(x)$ in the figure, estimate the integral $\int_0^1 f(2x^2 - 1)\,x\,dx$.

Solution: First, we need to notice that the integral $\int_0^1 f(2x^2 - 1)\,x\,dx$ can be simplified if we make a substitution $u = 2x^2 - 1$. Then $\frac{du}{dx} = 4x$ and $du = 4x\,dx$, which can also be written as $x\,dx = \frac{1}{4}\,du$. With this substitution, we can rewrite the integral as
$$\int_0^1 f(2x^2 - 1)\,x\,dx = \frac{1}{4}\int_{-1}^{1} f(u)\,du.$$
It remains to estimate the integral $\int_{-1}^{1} f(u)\,du$, which is the area under its graph on the interval $[-1, 1]$. We do not have all the information to compute this area exactly but, looking at the graph, we see that $f(x)$ approximately follows a straight line $y = 1 + x$ between $x = -1$ and $x = 0$, and it is approximately constant $y = 1$ between $x = 0$ and $x = 1$. This means that the area is approximately $\frac{1}{2} + 1 = \frac{3}{2}$ and the original integral is $\int_0^1 f(2x^2 - 1)\,x\,dx \approx \frac{1}{4} \times \frac{3}{2} = \frac{3}{8}$.

Exercise 2.
Estimate the integral $\int_0^1 x f(3x^2 - 1)\,dx$ given the following table:
$$\begin{array}{c|cccccc} x & -1 & -0.5 & 0 & 0.5 & 1 & 1.5 \\ \hline f(x) & 0 & -1.75 & 1 & 3.75 & 5 & 3.75 \end{array}$$

Trigonometric substitution. Our next substitution will be of the type
$$g(x) = \sin(x) \quad \text{or} \quad g(x) = \cos(x).$$

Example 3. Compute the integral $\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx$.

Solution: Here we see $\sin(x)$ squared (i.e. it is inside the square function) and we see its derivative $\cos(x)$, so it is a good idea to make a substitution $u = \sin(x)$. Then $\frac{du}{dx} = \cos(x)$ and $du = \cos(x)\,dx$. With this substitution,
$$\int \sin^2(x)\cos(x)\,dx = \int u^2\,du = \frac{u^3}{3} + C = \frac{\sin^3(x)}{3} + C.$$
Since we already found the indefinite integral, for the definite integral we can skip the intermediate substitution steps and use the FTC directly:
$$\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx = \frac{\sin^3(x)}{3}\bigg|_{x=0}^{x=\pi/2} = \frac{1}{3} - \frac{0}{3} = \frac{1}{3}.$$
We could also use the substitution for the definite integral if we also replaced the limits by $u = \sin(0) = 0$ and $u = \sin(\pi/2) = 1$, so that
$$\int_0^{\pi/2} \sin^2(x)\cos(x)\,dx = \int_0^1 u^2\,du = \frac{u^3}{3}\bigg|_{u=0}^{u=1} = \frac{1}{3} - 0 = \frac{1}{3}.$$

Exercise 3. Compute the integral $\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx$.

Exponential substitution. Next, we will consider a couple of examples when
$$g(x) = \text{some combination of exponentials } e^{\kappa x}.$$

Example 4. Compute the indefinite integral $\int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx$.

Solution: If we make a substitution $u = e^x + e^{-x}$ then $\frac{du}{dx} = (e^x + e^{-x})' = e^x - e^{-x}$ and $du = (e^x - e^{-x})\,dx$, so
$$\int \frac{e^x - e^{-x}}{e^x + e^{-x}}\,dx = \int \frac{1}{u}\,du = \ln|u| + C = \ln(e^x + e^{-x}) + C.$$
We do not need to write $|e^x + e^{-x}|$ because $e^x + e^{-x}$ is already positive.

Exercise 4. Compute the indefinite integral $\int \frac{e^{2x}}{4 + e^{2x}}\,dx$.

Exercise 5. Compute the integral $\int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx$ given the table:
$$\begin{array}{c|cccccc} x & -1 & -0.5 & 0 & 0.5 & 1 & 1.5 \\ \hline h(x) & 0 & -1.75 & \ln\frac{\pi}{4} & 3.75 & \ln\frac{\pi}{2} & 3.75 \end{array}$$

Exercise 6. What differentiation technique does the substitution rule come from?
(a) power rule (b) product rule (c) chain rule (d) quotient rule

Answer to Exercise 1. We make a substitution $u = -2x^2 + 3$ so that $\frac{du}{dx} = -4x$ and $du = -4x\,dx$.
We can also write $x\,dx = -\frac{1}{4}\,du$. With this substitution, we solve part (a) as follows:

$$\int x e^{-2x^2+3}\,dx = -\frac{1}{4}\int e^u\,du = -\frac{1}{4}e^u + C = -\frac{1}{4}e^{-2x^2+3} + C.$$

In part (b), we compute the definite integral using this substitution as follows:

$$\int_0^1 x e^{-2x^2+3}\,dx = -\frac{1}{4}\int_3^1 e^u\,du = -\frac{e^u}{4}\,\Big|_{u=3}^{u=1} = -\frac{e}{4} + \frac{e^3}{4}.$$

Answer to Exercise 2. First, we make a substitution $u = 3x^2 - 1$. Then $\frac{du}{dx} = 6x$ and $du = 6x\,dx$, which can also be written as $x\,dx = \frac{1}{6}\,du$. With this substitution,

$$\int_0^1 f(3x^2-1)\,x\,dx = \frac{1}{6}\int_{-1}^{2} f(u)\,du.$$

It remains to estimate the integral $\int_{-1}^{2} f(u)\,du$. Given the table, we can use the left Riemann sum with $n = 6$ subintervals on $[-1, 2]$ of length $0.5$ each:

$$\int_{-1}^{2} f(u)\,du \approx (0 - 1.75 + 1 + 3.75 + 5 + 3.75) \times 0.5 = 5.875.$$

The original integral is $\approx \frac{1}{6} \times 5.875 = 0.97916\ldots$.

Answer to Exercise 3. We make a substitution $u = \cos(x)$, so that $\frac{du}{dx} = -\sin(x)$ and $du = -\sin(x)\,dx$:

$$\int \frac{\sin(x)}{\cos^2(x)}\,dx = -\int \frac{1}{u^2}\,du = \frac{1}{u} + C = \frac{1}{\cos(x)} + C.$$

Using the FTC:

$$\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx = \frac{1}{\cos(x)}\,\Big|_{x=0}^{x=\pi/4} = \sqrt{2} - 1.$$

Answer to Exercise 4. If we make a substitution $u = 4 + e^{2x}$ then $\frac{du}{dx} = 2e^{2x}$ and $e^{2x}\,dx = \frac{1}{2}\,du$, so

$$\int \frac{e^{2x}}{4 + e^{2x}}\,dx = \frac{1}{2}\int \frac{1}{u}\,du = \frac{1}{2}\ln|u| + C = \frac{1}{2}\ln(4 + e^{2x}) + C.$$

Answer to Exercise 5. We can substitute $u = h(x)$ first, but this will require us to do another substitution later on (try it). Instead, if we make a substitution $u = e^{h(x)}$ then $\frac{du}{dx} = e^{h(x)}h'(x)$ and $du = e^{h(x)}h'(x)\,dx$. Also, from the table, $x = 0$ will be replaced by $u = e^{h(0)} = e^{\ln\frac{\pi}{4}} = \frac{\pi}{4}$, and $x = 1$ will be replaced by $u = e^{h(1)} = e^{\ln\frac{\pi}{2}} = \frac{\pi}{2}$, so

$$\int_0^1 \cos\big(e^{h(x)}\big)\,e^{h(x)}\,h'(x)\,dx = \int_{\pi/4}^{\pi/2} \cos(u)\,du = \sin(u)\,\Big|_{u=\pi/4}^{u=\pi/2} = \sin(\pi/2) - \sin(\pi/4) = 1 - \frac{1}{\sqrt{2}}.$$

Answer to Exercise 6. The substitution rule is the reverse of the chain rule, so (c).

3.6 Techniques of integration: integration by parts

In this section we will learn about integration by parts, which is the reverse of the product rule of differentiation.
The product rule tells us that

$$\big(u(x)v(x)\big)' = u'(x)v(x) + u(x)v'(x).$$

Integrating both sides, we can write

$$\int \big(u(x)v(x)\big)'\,dx = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx.$$

The left hand side is equal to $u(x)v(x)$ because the antiderivative of a derivative is the function itself. If we replace the left hand side by $u(x)v(x)$ and move the integral $\int u'(x)v(x)\,dx$ to the other side of the equation, we get

$$\int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx.$$

This formula is called integration by parts, and its definite integral version is

$$\int_a^b u(x)v'(x)\,dx = u(x)v(x)\,\Big|_{x=a}^{x=b} - \int_a^b u'(x)v(x)\,dx.$$

This formula relates one integral $\int_a^b u(x)v'(x)\,dx$ to another integral $\int_a^b u'(x)v(x)\,dx$, and the idea is that in some cases the second integral is much easier to compute than the first one. How do we know that these formulas are applicable in a given problem? As usual, it takes some practice, but we will only focus on integrals of a few common functions of the form (possibly with some constants)

$$x^n\sin(x), \quad x^n\cos(x), \quad x^n e^x, \quad \ln(x), \quad x^n\ln(x).$$

The choice of $u(x)$ and $v(x)$ is summarized in the following table:

$$\begin{array}{c|c|c|c} u & v' & u' & v \\ \hline x^n & \sin(x),\ \cos(x),\ e^x & n x^{n-1} & -\cos(x),\ \sin(x),\ e^x \\ \ln(x) & 1,\ x^n & \frac{1}{x} & x,\ \frac{x^{n+1}}{n+1} \end{array}$$

and we will see that in all these cases $u'(x)v(x)$ will be simpler than $u(x)v'(x)$.

Example 1. Compute the following indefinite and definite integrals.

(a) $\int x\cos(2x)\,dx$ (b) $\int_0^{\pi/4} x\cos(2x)\,dx$

Solution: According to the above table, we take $u(x) = x$ and $v'(x) = \cos(2x)$. To use the integration by parts formula, we need to find $u'(x)$ and $v(x)$. First, $u'(x) = 1$. To find $v(x)$ we need to find an antiderivative of $v'(x) = \cos(2x)$. Let us recall the following substitution formula that will help us many times below:

if an antiderivative of $f(x)$ is $F(x)$ then an antiderivative of $f(kx)$ is $\frac{1}{k}F(kx)$.

Since an antiderivative of $\cos(x)$ is $\sin(x)$, an antiderivative of $\cos(2x)$ is $\frac{1}{2}\sin(2x)$, so $v(x) = \frac{1}{2}\sin(2x)$.
Here we do not need to write $+C$, because integration by parts will work with any specific antiderivative. Using the integration by parts formula with $u(x) = x$, $v'(x) = \cos(2x)$, $u'(x) = 1$ and $v(x) = \frac{1}{2}\sin(2x)$:

$$\int x\cos(2x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx = \frac{x}{2}\sin(2x) - \int \frac{1}{2}\sin(2x)\,dx.$$

Notice how the integral simplified and we know how to integrate

$$-\int \frac{1}{2}\sin(2x)\,dx = -\frac{1}{2}\Big(-\frac{1}{2}\cos(2x)\Big) + C = \frac{1}{4}\cos(2x) + C.$$

Do not forget $+C$ at this step! Plugging this into the above formula,

$$\int x\cos(2x)\,dx = \frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x) + C.$$

For the definite integral, we can use the FTC after we have computed the indefinite integral:

$$\int_0^{\pi/4} x\cos(2x)\,dx = \Big(\frac{x}{2}\sin(2x) + \frac{1}{4}\cos(2x)\Big)\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}.$$

Alternatively, we could also carry the limits from the beginning:

$$\int_0^{\pi/4} x\cos(2x)\,dx = \frac{x}{2}\sin(2x)\,\Big|_{x=0}^{x=\pi/4} - \int_0^{\pi/4} \frac{1}{2}\sin(2x)\,dx = \frac{\pi}{8} - \frac{1}{2}\int_0^{\pi/4} \sin(2x)\,dx = \frac{\pi}{8} + \frac{1}{4}\cos(2x)\,\Big|_{x=0}^{x=\pi/4} = \frac{\pi}{8} - \frac{1}{4}.$$

When using integration by parts, it is often easier to find the indefinite integral first and plug in the limits at the very end.

Exercise 1. Compute the following indefinite and definite integrals.

(a) $\int x\sin(3x)\,dx$ (b) $\int_0^{\pi/3} x\sin(3x)\,dx$

Example 2. Compute the following indefinite and definite integrals.

(a) $\int t e^{2t}\,dt$ (b) $\int_0^1 t e^{2t}\,dt$

Solution: According to the above table, we take $u(t) = t$ and $v'(t) = e^{2t}$. Then $u'(t) = 1$ and $v(t) = \frac{1}{2}e^{2t}$ and, using integration by parts,

$$\int t e^{2t}\,dt = \frac{t}{2}e^{2t} - \int \frac{1}{2}e^{2t}\,dt = \frac{t}{2}e^{2t} - \frac{1}{4}e^{2t} + C.$$

For the definite integral, we can use the FTC after we have computed the indefinite integral:

$$\int_0^1 t e^{2t}\,dt = \Big(\frac{t}{2}e^{2t} - \frac{1}{4}e^{2t}\Big)\Big|_{t=0}^{t=1} = \frac{e^2}{4} + \frac{1}{4} = 2.0973\ldots$$

Exercise 2. Compute the following indefinite and definite integrals.

(a) $\int t e^{-t/4}\,dt$ (b) $\int_0^4 t e^{-t/4}\,dt$

Example 3. Compute the following indefinite and definite integral.
(a) $\int \ln(x)\,dx$ (b) $\int_1^2 \ln(x)\,dx$

Solution: In this integral we do not have a product of two functions as in typical integration by parts examples, but we can nevertheless use integration by parts. If we take $u(x) = \ln(x)$ then what is $v'(x)$? In this case it is simply $v'(x) = 1$, so $u(x)v'(x) = \ln(x) \times 1 = \ln(x)$. Then $u'(x) = \frac{1}{x}$, $v(x) = x$ and the integration by parts formula gives

$$\int \ln(x)\,dx = \ln(x)\,x - \int \frac{1}{x}\,x\,dx = x\ln(x) - \int 1\,dx = x\ln(x) - x + C.$$

The definite integral is

$$\int_1^2 \ln(x)\,dx = \big(x\ln(x) - x\big)\Big|_{x=1}^{x=2} = (2\ln 2 - 2) - (1\ln 1 - 1) = 2\ln 2 - 1.$$

Exercise 3. Compute the following indefinite and definite integrals.

(a) $\int x^2\ln(x)\,dx$ (b) $\int_1^2 x^2\ln(x)\,dx$

Comment. We did not include any constant inside the logarithm $\ln(x)$ in the above problems because $\ln(kx) = \ln(k) + \ln(x)$, so we can split off the constant factor $k$ into a separate term. □

Example 4. Find $\int_0^2 f(x)g'(x)\,dx$ given that $\int_0^2 f'(x)g(x)\,dx = -4$ and

$$\begin{array}{c|cccc} x & f(x) & f'(x) & g(x) & g'(x) \\ \hline 0 & 1 & -1.5 & 0.5 & 3.5 \\ 2 & 2 & 1 & -1 & 1 \end{array}$$

Solution: Using the integration by parts formula,

$$\int_0^2 f(x)g'(x)\,dx = f(x)g(x)\,\Big|_{x=0}^{x=2} - \int_0^2 f'(x)g(x)\,dx = f(2)g(2) - f(0)g(0) - \int_0^2 f'(x)g(x)\,dx.$$

Using the table and the given integral, this is equal to $2(-1) - 1(0.5) - (-4) = 1.5$.

Exercise 4. Find $\int_1^3 x f''(x)\,dx$ given the following table:

$$\begin{array}{c|cc} x & f(x) & f'(x) \\ \hline 1 & 1 & -1.5 \\ 3 & 2 & 1 \end{array}$$

Example 5. Compute $\int \sin(x)\cos(x)\,dx$ using: (a) substitution (b) integration by parts (c) trig identity

Solution: (a) Let us take $u = \sin(x)$. Then $\frac{du}{dx} = \cos(x)$, $du = \cos(x)\,dx$, and

$$\int \sin(x)\cos(x)\,dx = \int u\,du = \frac{u^2}{2} + C = \frac{\sin^2(x)}{2} + C.$$

(b) If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$ then the integration by parts formula gives

$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx.$$

We see the same indefinite integral $\int \sin(x)\cos(x)\,dx$ on both sides of the equation, so solving for it we get $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x)$.
We found one antiderivative, and the general antiderivative is $\int \sin(x)\cos(x)\,dx = \frac{1}{2}\sin^2(x) + C$ as in (a).

(c) Using the trigonometric identity $2\sin(x)\cos(x) = \sin(2x)$,

$$\int \sin(x)\cos(x)\,dx = \frac{1}{2}\int \sin(2x)\,dx = -\frac{1}{4}\cos(2x) + C.$$

Why does this answer look different from (a) and (b)? Because the two answers are related by another trigonometric identity, $\cos(2x) = 1 - 2\sin^2(x)$, so we can rewrite this answer as $-\frac{1}{4} + \frac{1}{2}\sin^2(x) + C = \frac{1}{2}\sin^2(x) + C'$, where $C' = C - \frac{1}{4}$ can be any constant, because $C$ can be any constant. Now the answers are the same.

Exercise 5. Find the mistake in the following argument that shows that $0 = 1$.

Proof: If we take $u(x) = \sin(x)$, $v'(x) = \cos(x)$, $u'(x) = \cos(x)$, $v(x) = \sin(x)$ then the integration by parts formula gives

$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \int \cos(x)\sin(x)\,dx.$$

Next, let us integrate the last integral $\int \cos(x)\sin(x)\,dx$ by parts taking $u(x) = \cos(x)$, $v'(x) = \sin(x)$, $u'(x) = -\sin(x)$, $v(x) = -\cos(x)$. Integration by parts gives

$$\int \cos(x)\sin(x)\,dx = -\cos^2(x) - \int \sin(x)\cos(x)\,dx.$$

If we plug this into the first integration by parts step above, we get

$$\int \sin(x)\cos(x)\,dx = \sin^2(x) - \Big(-\cos^2(x) - \int \sin(x)\cos(x)\,dx\Big),$$

$$\int \sin(x)\cos(x)\,dx = \underbrace{\sin^2(x) + \cos^2(x)}_{1} + \int \sin(x)\cos(x)\,dx.$$

If we cancel the integrals $\int \sin(x)\cos(x)\,dx$ on the left and right hand side, we get $0 = 1$. Where is the mistake?

Exercise 6. What differentiation technique does the integration by parts formula come from?

(a) power rule (b) product rule (c) chain rule (d) quotient rule

Answer to Exercise 1. Take $u(x) = x$ and $v'(x) = \sin(3x)$, so that $u'(x) = 1$ and $v(x) = -\frac{1}{3}\cos(3x)$. Integration by parts gives

$$\int x\sin(3x)\,dx = -\frac{x}{3}\cos(3x) + \int \frac{1}{3}\cos(3x)\,dx = -\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x) + C.$$

The definite integral is

$$\int_0^{\pi/3} x\sin(3x)\,dx = \Big(-\frac{x}{3}\cos(3x) + \frac{1}{9}\sin(3x)\Big)\Big|_{x=0}^{x=\pi/3} = \frac{\pi}{9}.$$

Answer to Exercise 2. According to the above table, we take $u(t) = t$ and $v'(t) = e^{-t/4}$.
Then $u'(t) = 1$ and $v(t) = -4e^{-t/4}$ and, using integration by parts,

$$\int t e^{-t/4}\,dt = -4te^{-t/4} + \int 4e^{-t/4}\,dt = -4te^{-t/4} - 16e^{-t/4} + C.$$

For the definite integral, we can use the FTC after we have computed the indefinite integral:

$$\int_0^4 t e^{-t/4}\,dt = \big(-4te^{-t/4} - 16e^{-t/4}\big)\Big|_{t=0}^{t=4} = -\frac{32}{e} + 16 = 4.2279\ldots$$

Answer to Exercise 3. If we take $u(x) = \ln(x)$ and $v'(x) = x^2$ then $u'(x) = \frac{1}{x}$ and $v(x) = \frac{x^3}{3}$. The integration by parts formula gives

$$\int x^2\ln(x)\,dx = \frac{x^3}{3}\ln(x) - \int \frac{1}{x}\,\frac{x^3}{3}\,dx = \frac{x^3}{3}\ln(x) - \int \frac{x^2}{3}\,dx = \frac{x^3}{3}\ln(x) - \frac{x^3}{9} + C.$$

The definite integral is

$$\int_1^2 x^2\ln(x)\,dx = \Big(\frac{x^3}{3}\ln(x) - \frac{x^3}{9}\Big)\Big|_{x=1}^{x=2} = \frac{8}{3}\ln(2) - \frac{7}{9}.$$

Answer to Exercise 4. Take $u(x) = x$ and $v'(x) = f''(x)$. Then $u'(x) = 1$ and $v(x) = f'(x)$. Using the integration by parts formula,

$$\int x f''(x)\,dx = x f'(x) - \int f'(x)\,dx = x f'(x) - f(x) + C.$$

In the second equality we used that an antiderivative of $f'(x)$ is $f(x)$. Then the definite integral is

$$\int_1^3 x f''(x)\,dx = \big(x f'(x) - f(x)\big)\Big|_{x=1}^{x=3} = \big(3f'(3) - f(3)\big) - \big(f'(1) - f(1)\big).$$

Using the table, this is equal to $(3(1) - 2) - (-1.5 - 1) = 3.5$.

Answer to Exercise 5. After two integration by parts steps we arrived at

$$\int \sin(x)\cos(x)\,dx = \sin^2(x) + \cos^2(x) + \int \sin(x)\cos(x)\,dx,$$

which is the same as

$$\int \sin(x)\cos(x)\,dx = 1 + \int \sin(x)\cos(x)\,dx.$$

The mistake was in cancelling the two integrals. Recall that indefinite integrals, a.k.a. antiderivatives, are defined up to a constant $+C$. The above equation simply says that adding $+1$ to any antiderivative of $\sin(x)\cos(x)$ gives another antiderivative of $\sin(x)\cos(x)$. When we say that an indefinite integral $\int f(x)\,dx$ is equal to something, there is always a hidden $+C$ in the statement.

Answer to Exercise 6. Integration by parts is based on the product rule, so (b).
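The closed-form answers obtained by integration by parts in this section can be sanity-checked numerically. The following is a minimal Python sketch (an illustration, not part of the original notes) that compares a midpoint Riemann sum for $\int_0^{\pi/4} x\cos(2x)\,dx$ from Example 1 with the value $\frac{\pi}{8} - \frac{1}{4}$ found above. The helper name `midpoint_sum` and the choice of 10,000 subintervals are assumptions made for this example.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f on [a, b] using n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Example 1(b): the integral of x*cos(2x) on [0, pi/4].
numeric = midpoint_sum(lambda x: x * math.cos(2 * x), 0.0, math.pi / 4)

# Value obtained by integration by parts: pi/8 - 1/4.
exact = math.pi / 8 - 0.25

print(numeric, exact)
```

The two printed numbers should agree to several decimal places; a noticeable gap would suggest a sign or coefficient error in the hand computation.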
3.7 Approximating integrals using Taylor polynomials

In this section we will learn another way to approximate a function $y = f(x)$ by simple functions, called Taylor polynomials, and then use them to approximate integrals. To introduce these polynomials, let us start by recalling a formula for the tangent line to $y = f(x)$ at $x = a$:

$$P_1(x) = f(a) + f'(a)(x - a).$$

The tangent line is also called the Taylor polynomial of degree $n = 1$ (centered) at $x = a$ and is often denoted $P_1(x)$. What do we know about the tangent line?

• The tangent line has the same value at $x = a$ as our function: $P_1(a) = f(a)$.
• The tangent line has the same derivative (velocity) at $x = a$: $P_1'(a) = f'(a)$.

As a result, the tangent line $P_1(x)$ approximates $f(x)$ near $x = a$.

Taylor polynomials of degree 2. What if we also want our approximation of $y = f(x)$ to have the same second derivative (acceleration) at $x = a$? It turns out that we can do that if instead of a line we use a parabola

$$P_2(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2,$$

which is called the Taylor polynomial of degree $n = 2$ (centered) at $x = a$ and is denoted $P_2(x)$. Similarly to the Taylor polynomial of degree 1:

• $P_2(x)$ has the same value at $x = a$ as our function: $P_2(a) = f(a)$.
• $P_2(x)$ has the same derivative (velocity) at $x = a$: $P_2'(a) = f'(a)$.
• In addition, $P_2(x)$ has the same second derivative (acceleration) at $x = a$: $P_2''(a) = f''(a)$.

As a result, $P_2(x)$ also approximates $f(x)$ near $x = a$, often better than $P_1(x)$ and in a bigger neighbourhood of the point $x = a$, as we can see in the figure above.

Example 1. Explain why we divide by 2 in the coefficient $\frac{f''(a)}{2}$ of the last term $\frac{f''(a)}{2}(x - a)^2$ in the definition of the Taylor polynomial $P_2(x)$.

Solution: We want the first derivative to be the same, $P_2'(a) = f'(a)$, and the second derivative to be the same, $P_2''(a) = f''(a)$, at $x = a$.
If we compute the first two derivatives of the above parabola $P_2(x)$:

$$P_2'(x) = f'(a) + \frac{f''(a)}{2}\,2(x - a) = f'(a) + f''(a)(x - a), \qquad P_2''(x) = f''(a),$$

we can see that $P_2'(a) = f'(a) + f''(a)(a - a) = f'(a)$ and $P_2''(a) = f''(a)$. Notice how the 2 in the denominator cancelled the 2 that came from the derivative of $(x - a)^2$, which explains why we divided by 2 in the last term of $P_2(x)$. The derivatives would not match otherwise.

Exercise 1. Suppose that a function $f(x)$ is approximated near $x = 0$ by a Taylor polynomial of degree 2 given by $P_2(x) = -3 + 2x - x^2$. Find $f(0)$, $f'(0)$ and $f''(0)$.

Example 2. Compute and graph the first and second degree Taylor polynomials centered at $x = 0$ for the functions $y = e^x$ and $y = \cos(x)$.

Solution: If $f(x) = e^x$ then $f'(x) = e^x$ and $f''(x) = e^x$, so $f(0) = f'(0) = f''(0) = 1$. By definition, $P_1(x) = 1 + x$ and $P_2(x) = 1 + x + \frac{x^2}{2}$. The graph is in the left figure below. If $f(x) = \cos(x)$ then $f'(x) = -\sin(x)$ and $f''(x) = -\cos(x)$, so $f(0) = 1$, $f'(0) = 0$ and $f''(0) = -1$. By definition, $P_1(x) = 1$ and $P_2(x) = 1 - \frac{x^2}{2}$. The graph is in the right figure below.

Notice that, if a function is concave up at $x = a$ then its Taylor polynomial $P_2(x)$ of degree 2 is also concave up, and if a function is concave down at $x = a$ then its Taylor polynomial $P_2(x)$ of degree 2 is also concave down. This is because the coefficient that determines if the parabola opens upward or downward is $f''(a)/2$: if $f''(a) > 0$ then $f(x)$ is concave up and the parabola opens upward, and if $f''(a) < 0$ then $f(x)$ is concave down and the parabola opens downward.

Exercise 2. Suppose that $P_2(x) = p + qx + rx^2$ is the Taylor polynomial of degree 2 centered at $x = 0$ for the function $f(x)$. For each of the functions below determine the sign of $p$, $q$ and $r$.

(a) (b) (c) (d)

Example 3.
Estimate the coefficients of the second degree Taylor polynomial $P_2(x) = p + q(x - 0.5) + r(x - 0.5)^2$ centered at $x = 0.5$ for the function $f(x)$ given the values in the table:

$$\begin{array}{c|ccccc} x & 0 & 0.25 & 0.5 & 0.75 & 1 \\ \hline f(x) & 3 & 1.75 & 1 & 0.75 & 1 \end{array}$$

Solution: By definition, $p = f(0.5)$, $q = f'(0.5)$, and $r = \frac{f''(0.5)}{2}$, so $p = f(0.5) = 1$ and we need to estimate $f'(0.5)$ and $f''(0.5)$. We can estimate the first derivative $f'(0.5) \approx \frac{\Delta y}{\Delta x}$ in several ways:

• By considering the increments $\Delta y$ and $\Delta x$ between $x = 0.5$ and $x = 0.75$: $f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{0.75 - 1}{0.75 - 0.5} = -1$.
• By considering the increments $\Delta y$ and $\Delta x$ between $x = 0.25$ and $x = 0.5$: $f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{1 - 1.75}{0.5 - 0.25} = -3$.
• By taking the average of the above two approximations: $f'(0.5) \approx \frac{-1 - 3}{2} = -2$. This is the same as considering the increments $\Delta y$ and $\Delta x$ between $x = 0.25$ and $x = 0.75$: $f'(0.5) \approx \frac{\Delta y}{\Delta x} = \frac{0.75 - 1.75}{0.75 - 0.25} = -2$.

We have seen when estimating derivatives that taking the average is often more accurate, so let us take the estimate $q = f'(0.5) \approx -2$. Finally, we need to estimate the second derivative $f''(0.5)$. Let us use the following estimate:

$$f''(x) \approx \frac{\frac{f(x+h) - f(x)}{h} - \frac{f(x) - f(x-h)}{h}}{h} = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2},$$

where $h$ is the increment $\Delta x$, in this case $h = 0.25$. Before we explain this formula, let us use it in this problem:

$$f''(0.5) \approx \frac{f(0.75) - 2f(0.5) + f(0.25)}{0.25^2} = \frac{0.75 - 2 \cdot 1 + 1.75}{0.25^2} = 8,$$

so our estimate of the last coefficient is $r = \frac{f''(0.5)}{2} \approx \frac{8}{2} = 4$. We estimate that the Taylor polynomial is $P_2(x) \approx 1 - 2(x - 0.5) + 4(x - 0.5)^2$.
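The difference quotients used in this example are easy to reproduce on a computer. Below is a small Python sketch (an illustration, not part of the original text) that stores the table of values in a dictionary and evaluates the centered estimates of $f'(0.5)$ and $f''(0.5)$. The names `table`, `first_derivative` and `second_derivative` are hypothetical, chosen only for this example.

```python
# Table of values from Example 3: x -> f(x).
table = {0.0: 3.0, 0.25: 1.75, 0.5: 1.0, 0.75: 0.75, 1.0: 1.0}

def first_derivative(values, x, h):
    """Centered difference: (f(x+h) - f(x-h)) / (2h)."""
    return (values[x + h] - values[x - h]) / (2 * h)

def second_derivative(values, x, h):
    """Second difference: (f(x+h) - 2 f(x) + f(x-h)) / h^2."""
    return (values[x + h] - 2 * values[x] + values[x - h]) / h**2

a, h = 0.5, 0.25
p = table[a]                              # p = f(0.5)
q = first_derivative(table, a, h)         # q is an estimate of f'(0.5)
r = second_derivative(table, a, h) / 2    # r is an estimate of f''(0.5) / 2

# Coefficients of P2(x) = p + q(x - 0.5) + r(x - 0.5)^2.
print(p, q, r)
```

The dictionary lookup works here because the grid points 0.25, 0.5 and 0.75 are exactly representable as floating-point numbers; for a general grid one would index the table by position instead.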
Finally, the reason behind the above estimate of the second derivative $f''(x)$ is that $\frac{f(x+h) - f(x)}{h}$ can be viewed as an estimate of $f'(x + \frac{h}{2})$, $\frac{f(x) - f(x-h)}{h}$ can be viewed as an estimate of $f'(x - \frac{h}{2})$, and then the above formula becomes

$$f''(x) \approx \frac{f'(x + \frac{h}{2}) - f'(x - \frac{h}{2})}{h},$$

which is exactly how we would estimate the second derivative $f''(x)$ if we knew the values of the first derivative.

Exercise 3. Estimate the coefficients of the second degree Taylor polynomial $P_2(x) = p + q(x - 2) + r(x - 2)^2$ centered at $x = 2$ for the function $f(x)$ given the values in the table:

$$\begin{array}{c|ccccc} x & 1.5 & 1.75 & 2 & 2.25 & 2.5 \\ \hline f(x) & 1.75 & 2 & 1.75 & 1 & -0.25 \end{array}$$

Taylor polynomials of general degree. Can we generalize the Taylor polynomials of degree 1 and 2 to match not only the first and second derivatives at $x = a$, but also the third and fourth derivatives, and so on? The answer is yes, we can match the first $n$ derivatives if we use a polynomial of degree $n$:

$$P_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n,$$

which is called the Taylor polynomial of degree $n$ (centered) at $x = a$ and is denoted $P_n(x)$. Here $n! = 1 \times 2 \times \ldots \times n$ is called $n$-factorial. All the coefficients are chosen in such a way that the first $n$ derivatives match:

$$P_n(a) = f(a), \quad P_n'(a) = f'(a), \quad P_n''(a) = f''(a), \quad \ldots, \quad P_n^{(n)}(a) = f^{(n)}(a).$$

The following figures show several Taylor polynomials centered at $x = 0$ for three functions: $e^x$, $\cos(x)$, and $\sin(x)$. We can see that the approximations get better and better on wider and wider intervals as the degree $n$ increases.

Let us now consider several classic examples of Taylor polynomials. First, let us take a look at the exponential function $f(x) = e^x$. Because all the derivatives of $e^x$ are equal to $e^x$, we get that $f^{(n)}(x) = e^x$ and $f^{(n)}(0) = e^0 = 1$, and the Taylor polynomial of $e^x$ of degree $n$ at $x = 0$ is

$$P_n(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!}.$$

Next, let us compute Taylor polynomials for $\cos(x)$ and $\sin(x)$.
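As a quick numerical illustration of the formula for $e^x$ (a sketch, not taken from the original notes), the Python function below evaluates $P_n(x) = 1 + x + \frac{x^2}{2!} + \ldots + \frac{x^n}{n!}$ directly from the definition. The name `taylor_exp` is an assumption made for this example.

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x centered at 0, evaluated at x."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# Compare P_n(1) with e for a few degrees n.
for n in (1, 2, 4, 10):
    approx = taylor_exp(1.0, n)
    print(n, approx, abs(approx - math.e))
```

The printed errors shrink as $n$ grows, which matches the observation that higher degree Taylor polynomials approximate the function better near the center.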
Example 4. Compute Taylor polynomials of all degrees at $x = 0$ for $f(x) = \cos(x)$.

Solution: When we write consecutive derivatives of $f(x) = \cos(x)$,

$$f = \cos(x), \quad f' = -\sin(x), \quad f'' = -\cos(x), \quad f''' = \sin(x), \quad f^{(4)} = \cos(x), \ \ldots$$

we see that the fourth derivative is equal to the original function $\cos(x)$, so after that the same pattern of $\cos(x)$, $-\sin(x)$, $-\cos(x)$, $\sin(x)$ will keep repeating. When we plug in $x = 0$, we see that $\cos(0) = 1$, $-\sin(0) = 0$, $-\cos(0) = -1$, $\sin(0) = 0$, so the derivatives at $x = 0$ will follow a repeating pattern of $1, 0, -1, 0$, etc. That means that the Taylor polynomials of $\cos(x)$ at $x = 0$ will have a pattern

$$1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \frac{x^{10}}{10!} + \ldots$$

Notice that all odd powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$. We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$:

$$P_1(x) = 1, \quad P_2(x) = 1 - \frac{x^2}{2!}, \quad P_3(x) = 1 - \frac{x^2}{2!}, \quad P_4(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!}, \ \ldots$$

Notice how $P_3(x)$ is equal to $P_2(x)$. Again, that is because the coefficient in front of $x^3$ is zero. That is why in the above figure we plotted only even degree Taylor polynomials $P_2(x)$, $P_4(x)$ and $P_6(x)$.

Exercise 4. Show that the Taylor polynomials of $f(x) = \sin(x)$ at $x = 0$ follow a pattern

$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \ldots.$$

What are $P_5(x)$ and $P_6(x)$?

Example 5. Write down and simplify the Taylor polynomial of degree 5 centered at $x = 0$ for $y = f(x)$ given that

$$\begin{array}{cccccc} f(0) & f'(0) & f''(0) & f'''(0) & f^{(4)}(0) & f^{(5)}(0) \\ \hline -2 & 2 & -1 & 2 & -3 & 12 \end{array}$$

Solution: By definition,

$$P_5(x) = -2 + 2x - \frac{1}{2!}x^2 + \frac{2}{3!}x^3 - \frac{3}{4!}x^4 + \frac{12}{5!}x^5.$$

We can simplify the last three coefficients,

$$\frac{2}{3!} = \frac{2}{1 \cdot 2 \cdot 3} = \frac{1}{3}; \qquad -\frac{3}{4!} = -\frac{3}{1 \cdot 2 \cdot 3 \cdot 4} = -\frac{1}{8}; \qquad \frac{12}{5!} = \frac{3 \cdot 4}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} = \frac{1}{10},$$

so the simplified form of the Taylor polynomial is

$$P_5(x) = -2 + 2x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{8} + \frac{x^5}{10}.$$

Exercise 5.
Find $f^{(5)}(-1)$ and $f^{(7)}(-1)$ given the Taylor polynomial of $f(x)$ of degree 7 centered at $x = -1$: $P_7(x) = -2 + (x+1) - 3(x+1)^3 + (x+1)^4 - \frac{(x+1)^7}{6!}$.

Approximating integrals using Taylor polynomials. Next, we will use Taylor polynomials to approximate definite integrals. As we will see, Taylor polynomials are very useful because they are very easy to integrate.

Example 6. Write down a Taylor polynomial of $\cos(2x^2)$ at $x = 0$ with three non-zero terms and use it to approximate $\int_{-1}^{1} \cos(2x^2)\,dx$.

Solution: In problems of this type, instead of using the definition of a Taylor polynomial and computing the derivatives of $\cos(2x^2)$, what we can do is take a Taylor polynomial of $\cos(x)$ and then replace $x$ by $2x^2$. This will automatically give us the Taylor polynomial we want without doing extra calculations. We already know that

$$\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}$$

near $x = 0$, where we wrote the polynomial of degree 4 because we were asked to use a polynomial with three non-zero terms. If we now replace $x$ by $2x^2$, we get

$$\cos(2x^2) \approx 1 - \frac{(2x^2)^2}{2!} + \frac{(2x^2)^4}{4!} = 1 - 2x^4 + \frac{2}{3}x^8.$$

Actually, this approximation is very good on the interval $[-1, 1]$ as we can see in the figure above, where $\cos(2x^2)$ is the black solid line and $1 - 2x^4 + \frac{2}{3}x^8$ is the blue dashed line. Integrating this approximation gives

$$\int_{-1}^{1} \cos(2x^2)\,dx \approx \int_{-1}^{1} \Big(1 - 2x^4 + \frac{2}{3}x^8\Big)\,dx = \Big(x - \frac{2x^5}{5} + \frac{2x^9}{27}\Big)\Big|_{x=-1}^{x=1} = \Big(1 - \frac{2}{5} + \frac{2}{27}\Big) - \Big(-1 + \frac{2}{5} - \frac{2}{27}\Big) = 1.3481\ldots$$

Actually, one can check using a computer or a graphing calculator that the original integral is $\int_{-1}^{1} \cos(2x^2)\,dx = 1.3351\ldots$, so the approximation we obtained is pretty good. To make it even better we could have used a Taylor polynomial with a few more terms, which would give a better approximation of our function.

Exercise 6. Write down a Taylor polynomial of $e^{-x^2}$ at $x = 0$ with four non-zero terms and use it to approximate $\int_0^1 e^{-x^2}\,dx$.

Example 7.
Use a Taylor polynomial of $\sin(x)$ at $x = 0$ with two non-zero terms to approximate $\int_0^1 \frac{\sin(x)}{x}\,dx$.

Solution: As the problem suggests, we start with a Taylor approximation

$$\sin(x) \approx x - \frac{x^3}{3!} = x - \frac{x^3}{6}.$$

When we divide both sides by $x$, we get

$$\frac{\sin(x)}{x} \approx 1 - \frac{x^2}{6}.$$

In the figure above, $\frac{\sin(x)}{x}$ is the black solid line and $1 - \frac{x^2}{6}$ is the blue dashed line, and the approximation looks pretty good. Integrating this approximation gives

$$\int_0^1 \frac{\sin(x)}{x}\,dx \approx \int_0^1 \Big(1 - \frac{x^2}{6}\Big)\,dx = \Big(x - \frac{x^3}{18}\Big)\Big|_{x=0}^{x=1} = 1 - \frac{1}{18} - 0 = \frac{17}{18} = 0.9444\ldots$$

One can check using a computer or a graphing calculator that the original integral is $\int_0^1 \frac{\sin(x)}{x}\,dx = 0.9460\ldots$, so the approximation is pretty good. There is one subtle point in this problem: $\frac{\sin(x)}{x}$ is not defined at $x = 0$ because we divide by 0. However, according to the above figure, $\frac{\sin(x)}{x}$ approaches 1 as $x$ approaches 0, so we implicitly assumed that the function we integrate is equal to 1 at $x = 0$.

Exercise 7. Use a Taylor polynomial of $\cos(x)$ at $x = 0$ with three non-zero terms to approximate $\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx$.

Exercise 8. A bacterial colony starts growing in a Petri dish at time $t = 0$ (time is measured in days). Every hour between 22 and 26 hours you measure the area of the colony (in cm$^2$) and from the data you estimate $f(1)$, $f'(1)$ and $f''(1)$, where $f(t)$ is the growth rate (in cm$^2$/day) of the area occupied by the colony. However, the next day you realize that you did not save the area data and only know that $f(1) = 2$, $f'(1) = -0.6$, and $f''(1) = 0.18$. How can you estimate the area at $t = 1$ given this information?

Answer to Exercise 1. There are two ways we can solve this problem. Since we know that $P_2(x)$ and $f(x)$ have the same value and first two derivatives at $x = 0$ (so in this case $a = 0$), we can just compute $P_2'(x) = 2 - 2x$ and $P_2''(x) = -2$ and plug in $x = 0$ to get $f(0) = P_2(0) = -3$, $f'(0) = P_2'(0) = 2$, and $f''(0) = P_2''(0) = -2$.
A better way to solve this problem without any calculations is to compare the definition of $P_2(x) = f(0) + f'(0)x + \frac{f''(0)}{2}x^2$ centered at $x = 0$ with the formula given to us, $P_2(x) = -3 + 2x - 1x^2$, and make sure that all the coefficients match:

$$f(0) = -3, \quad f'(0) = 2, \quad \frac{f''(0)}{2} = -1.$$

This immediately gives us $f(0) = -3$, $f'(0) = 2$, and $f''(0) = -2$.

Answer to Exercise 2. (a) $p > 0$ because $p = f(0) > 0$, $q > 0$ because $q = f'(0) > 0$ since the slope is positive, $r < 0$ because $r = f''(0)/2 < 0$ since the function is concave down at $x = 0$. (b) $p > 0$, $q < 0$, $r < 0$. (c) $p > 0$, $q > 0$, $r > 0$. (d) $p > 0$, $q < 0$, $r > 0$.

Answer to Exercise 3. $p = f(2) = 1.75$, $q = f'(2) \approx \frac{f(2.25) - f(1.75)}{0.5} = \frac{1 - 2}{0.5} = -2$, and $r = \frac{f''(2)}{2}$, where

$$f''(2) \approx \frac{f(2.25) - 2f(2) + f(1.75)}{0.25^2} = \frac{1 - 2 \cdot 1.75 + 2}{0.25^2} = -8,$$

so $r \approx -\frac{8}{2} = -4$. The Taylor polynomial is $P_2(x) \approx 1.75 - 2(x - 2) - 4(x - 2)^2$.

Answer to Exercise 4. When we write consecutive derivatives of $f(x) = \sin(x)$,

$$f = \sin(x), \quad f' = \cos(x), \quad f'' = -\sin(x), \quad f''' = -\cos(x), \quad f^{(4)} = \sin(x), \ \ldots$$

we see that the fourth derivative is equal to the original function $\sin(x)$, so after that the same pattern of $\sin(x)$, $\cos(x)$, $-\sin(x)$, $-\cos(x)$ will keep repeating. When we plug in $x = 0$, we see that $\sin(0) = 0$, $\cos(0) = 1$, $-\sin(0) = 0$, $-\cos(0) = -1$, so the derivatives at $x = 0$ will follow a repeating pattern of $0, 1, 0, -1$, etc. That means that the Taylor polynomials will have a pattern

$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \ldots$$

Notice that all even powers of $x$ are missing because of the coefficients $\pm\sin(0) = 0$. We can stop at any degree to get the Taylor polynomial $P_n(x)$ of that degree $n$. For example,

$$P_5(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}, \qquad P_6(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}.$$

Notice how $P_6(x)$ is equal to $P_5(x)$. Again, that is because the coefficient in front of $x^6$ is zero.

Answer to Exercise 5.
Because we do not have the term with $(x - (-1))^5 = (x + 1)^5$, it means that the coefficient $\frac{f^{(5)}(-1)}{5!}$ is 0, so $f^{(5)}(-1) = 0$. The coefficient in front of $(x - (-1))^7 = (x + 1)^7$ is $-\frac{1}{6!}$, which by definition should be $\frac{f^{(7)}(-1)}{7!}$, so

$$\frac{f^{(7)}(-1)}{7!} = -\frac{1}{6!} \implies f^{(7)}(-1) = -\frac{7!}{6!} = -7.$$

Answer to Exercise 6. We start with the Taylor polynomial for $e^x$ with four terms,

$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!},$$

and replace $x$ by $-x^2$,

$$e^{-x^2} \approx 1 + (-x^2) + \frac{(-x^2)^2}{2!} + \frac{(-x^2)^3}{3!} = 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}.$$

In the figure above, $e^{-x^2}$ is the black solid line and $1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}$ is the blue dashed line. Integrating this approximation gives

$$\int_0^1 e^{-x^2}\,dx \approx \int_0^1 \Big(1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6}\Big)\,dx = \Big(x - \frac{x^3}{3} + \frac{x^5}{10} - \frac{x^7}{42}\Big)\Big|_{x=0}^{x=1} = 1 - \frac{1}{3} + \frac{1}{10} - \frac{1}{42} - 0 = 0.7428\ldots.$$

One can check using a computer or a graphing calculator that the original integral is $\int_0^1 e^{-x^2}\,dx = 0.7468\ldots$, so the approximation we obtained is pretty good.

Answer to Exercise 7. As the problem suggests, we start with a Taylor approximation

$$\cos(x) \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} = 1 - \frac{x^2}{2} + \frac{x^4}{24}.$$

When we subtract both sides from 1 and divide by $x^2$, we get

$$\frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}.$$

In the figure above, $\frac{1 - \cos(x)}{x^2}$ is the black solid line and $\frac{1}{2} - \frac{x^2}{24}$ is the blue dashed line. Integrating this approximation gives

$$\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx \approx \int_0^1 \Big(\frac{1}{2} - \frac{x^2}{24}\Big)\,dx = \Big(\frac{x}{2} - \frac{x^3}{72}\Big)\Big|_{x=0}^{x=1} = \frac{1}{2} - \frac{1}{72} - 0 = 0.4861\ldots$$

One can check using a computer or a graphing calculator that the original integral is $\int_0^1 \frac{1 - \cos(x)}{x^2}\,dx = 0.4863\ldots$. Again, there is one subtle point in this problem: $\frac{1 - \cos(x)}{x^2}$ is not defined at $x = 0$ because we divide by 0. However, according to the approximation $\frac{1 - \cos(x)}{x^2} \approx \frac{1}{2} - \frac{x^2}{24}$, this function approaches $\frac{1}{2}$ as $x$ approaches 0, so we can implicitly assume that the function we integrate is equal to $\frac{1}{2}$ at $x = 0$.

Answer to Exercise 8.
If $A(t)$ is the area at time $t$ then $f(t) = A'(t)$ and, by the FTC,

$$A(1) = A(0) + \int_0^1 f(t)\,dt = \int_0^1 f(t)\,dt,$$

because the area was 0 at $t = 0$. Given $f(1) = 2$, $f'(1) = -0.6$, and $f''(1) = 0.18$, we can estimate $f(t)$ using the second degree Taylor polynomial centered at $t = 1$,

$$f(t) \approx P_2(t) = f(1) + f'(1)(t - 1) + \frac{f''(1)}{2}(t - 1)^2 = 2 - 0.6(t - 1) + 0.09(t - 1)^2,$$

and use it to estimate the integral

$$A(1) = \int_0^1 f(t)\,dt \approx \int_0^1 \big(2 - 0.6(t - 1) + 0.09(t - 1)^2\big)\,dt = \Big(2t - 0.6\,\frac{(t - 1)^2}{2} + 0.09\,\frac{(t - 1)^3}{3}\Big)\Big|_{t=0}^{t=1} = 2 - \Big(-\frac{0.6}{2} - \frac{0.09}{3}\Big) = 2.33 \text{ cm}^2.$$

3.8 CAS: computer algebra systems

When it comes to computing indefinite and definite integrals, there are many computer algebra systems (advanced online calculators) available, such as Wolfram Alpha, Symbolab, GeoGebra, Desmos, etc. In this section we will go over several examples of computing indefinite and definite integrals, as well as Taylor polynomials, using Wolfram Alpha. It is quite flexible in terms of interpreting queries in natural language, so in this sense it is very convenient.

Indefinite integrals. To find an indefinite integral $\int f(x)\,dx$ of some function $f(x)$, one can simply enter “integral of f(x)” into the input bar. Depending on the function $f(x)$, the output may vary, as we will see in the examples below. In the example on the right, the function $f(x)$ is $e^x\cos(x)$, and the first line of the output is the answer

$$\int e^x\cos(x)\,dx = \frac{e^x}{2}\big(\cos(x) + \sin(x)\big) + C.$$

In this particular case, below some plots of $\frac{e^x}{2}(\cos(x) + \sin(x))$, it also gives alternative forms of the integral, for example,

$$\frac{e^x}{\sqrt{2}}\sin\Big(x + \frac{\pi}{4}\Big) + C,$$

which is just another way to rewrite this function using trigonometric identities. Below that, the Taylor polynomial of degree 4 is given,

$$\frac{1}{2} + x + \frac{x^2}{2} - \frac{x^4}{12}.$$

It says “series expansion of the integral at x = 0” because Taylor polynomials give rise to the so-called Taylor series that will be discussed in a later chapter.
As we can see, the output contains a lot of useful information without us even asking for it. If we are simply looking for an antiderivative, it is useful to take a look at alternative forms of the integral.

Exercise 1. Using www.wolframalpha.com, find the indefinite integral

$$\int \frac{\sin(x)}{\cos^2(x)}\,dx.$$

What is an alternative form of the integral?

Definite integrals. To find a definite integral $\int_a^b f(x)\,dx$ of some function $f(x)$, we can enter “integral of f(x) from a to b” into the input bar. In the example on the right, we entered $\int_0^1 e^x\cos(x)\,dx$ and the output gives

$$\frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big) \approx 1.3780.$$

One can click on the answer to see a more accurate decimal approximation $1.378024613547\ldots$. The exact answer $\frac{1}{2}(e(\sin(1) + \cos(1)) - 1)$ actually comes from an application of the FTC using the indefinite integral found in the example above:

$$\frac{e^x}{2}\big(\cos(x) + \sin(x)\big)\Big|_{x=0}^{x=1} = \frac{1}{2}\big(e(\sin(1) + \cos(1)) - 1\big).$$

This indefinite integral is also given in the output, if you scroll down.

Exercise 2. Using www.wolframalpha.com, find the definite integral

$$\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx.$$

Give the answer up to ten digits.

The good thing about definite integrals is that they can be estimated using Riemann sums (and other techniques of numerical integration) even if the antiderivative cannot be found explicitly, so that we cannot apply the FTC. In the example on the right, Wolfram Alpha was unable to find the indefinite integral of $\sqrt{x + 1}\,e^{-x^2}$ in terms of standard mathematical functions, but it had no problem computing the definite integral from 0 to 1 with high accuracy. We will discuss below how to use specific numerical methods, such as the familiar left and right Riemann sums.

Exercise 3. Find the definite integral $\int_0^2 \cos(x^4)\sin(x^3)\,dx$. What about the indefinite integral? Does this function have an antiderivative?

Unfamiliar outputs. Often when we try to find an indefinite integral, the output might look unfamiliar.
For example, in the example on the right we see that
$$\int \frac{1}{\sqrt{1+x^2}}\,dx = \sinh^{-1}(x) + C.$$
We can see if there is an alternative form of the integral that might look more familiar. For example, in this case an alternative form is
$$\ln\big(\sqrt{x^2+1} + x\big).$$
Actually, the answer says log instead of ln, but if you look in the corner under it (this part was cut off in this figure), it says “log(x) is the natural logarithm”. In Wolfram Alpha log(x) denotes the natural logarithm ln(x), which is a common convention in mathematics in general. In any case, this alternative form is more familiar in this particular case. You can also look up what “sinh” means and you will find that the function sinh(x) is the so-called hyperbolic sine defined by $\sinh(x) = \frac{e^x - e^{-x}}{2}$, and $\sinh^{-1}(x)$ is its inverse. We could use either form to find a definite integral, for example,
$$\int_0^1 \frac{1}{\sqrt{1+x^2}}\,dx = \sinh^{-1}(1) - \sinh^{-1}(0) = \ln\big(\sqrt{2}+1\big) \approx 0.88137\ldots.$$
Sometimes the output might look unfamiliar, and there is no alternative form given. For example, in the example on the right
$$\int e^{-x^2}\,dx = \frac{1}{2}\sqrt{\pi}\,\mathrm{erf}(x) + C.$$
In this case, you can look up what erf(x) means, and you will find that it is the so-called error function defined by
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt.$$
In other words, it is a specific antiderivative of $\frac{2}{\sqrt{\pi}}e^{-x^2}$ that is equal to 0 at $x = 0$.

Exercise 4. Find an indefinite integral $\int \frac{\sin(x)}{x}\,dx$. Investigate what the output means. Find the definite integral $\int_0^1 \frac{\sin(x)}{x}\,dx$.

Riemann sums. We can find a left Riemann sum with n subintervals on the interval [a, b] by writing “integral of f(x) from a to b using left endpoint rule with n intervals”. We can replace “integral” by “Riemann sum”, and “left” by “right” if we want the right Riemann sum.
The output gives the result of the Riemann sum, its symbolic representation using the ∑ notation, the graph of the function illustrating how rectangles approximate the actual function, and the exact result for the definite integral $\int_a^b f(x)\,dx$ so we can compare how well the Riemann sum approximates the actual integral. If we scroll all the way down, the output also gives the method comparison for various numerical methods. We have only discussed the left and right Riemann sums but, as we can see, there are many other methods, some of which are much more accurate than simple left or right Riemann sums. Absolute error means how far the approximation is from the actual integral. Relative error means the absolute error divided by the actual integral (ignoring the ± sign) or, in other words, the error measured as a proportion of the actual answer.

Exercise 5. Compute the definite integral $\int_1^2 \frac{\sin(x)}{x}\,dx$ using the right Riemann sum with 100 intervals. What is the absolute error of this approximation?

Taylor polynomials. We can find the Taylor polynomial $P_n(x)$ of $f(x)$ of degree n centered at $x = a$ by writing “Taylor polynomial of degree n of f(x) at a”. We can replace “Taylor polynomial” by “series”. The answer is given under “series expansion at x = a”.

Exercise 6. Find the Taylor polynomial of sin(x) of degree 3 centered at $\frac{\pi}{4}$.

Answer to Exercise 1. $\int \frac{\sin(x)}{\cos^2(x)}\,dx = \sec(x) + C$. An alternative form of the answer is $\frac{1}{\cos(x)} + C$, which in this case is just the definition of sec(x).

Answer to Exercise 2. $\int_0^{\pi/4} \frac{\sin(x)}{\cos^2(x)}\,dx = \sqrt{2} - 1 \approx 0.41421\ldots$. The ten digit answer is 0.4142135623....

Answer to Exercise 3. $\int_0^2 \cos(x^4)\sin(x^3)\,dx \approx 0.04745\ldots$. The indefinite integral cannot be written in terms of standard mathematical functions. Nevertheless, the function $\cos(x^4)\sin(x^3)$ is a nice continuous function so it does have an antiderivative.
For example, by the FTC-2, Reconstruction theorem, we know that $\int_0^x \cos(t^4)\sin(t^3)\,dt$ is one such antiderivative. Simply, there is no way to write this antiderivative using functions that have already been defined somewhere and given some standard name.

Answer to Exercise 4. Wolfram Alpha outputs $\int \frac{\sin(x)}{x}\,dx = \mathrm{Si}(x) + C$. Looking up the function Si(x) shows that it is called the sine integral and is defined by $\mathrm{Si}(x) = \int_0^x \frac{\sin(t)}{t}\,dt$. In other words, it is a specific antiderivative of $\frac{\sin(x)}{x}$. The definite integral is $\int_0^1 \frac{\sin(x)}{x}\,dx = \mathrm{Si}(1) - \mathrm{Si}(0) \approx 0.946083\ldots$.

Answer to Exercise 5. The Riemann sum equals 0.657395. Absolute error is 0.00193523.

Answer to Exercise 6.
$$P_3(x) = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\Big(x - \frac{\pi}{4}\Big) - \frac{1}{2\sqrt{2}}\Big(x - \frac{\pi}{4}\Big)^2 - \frac{1}{6\sqrt{2}}\Big(x - \frac{\pi}{4}\Big)^3.$$
The cubic term is negative because the third derivative of sin(x) is −cos(x), which equals $-\frac{1}{\sqrt{2}}$ at $\frac{\pi}{4}$.

3.9 Improper integrals

Improper integrals are definite integrals $\int_a^b f(x)\,dx$ where
• either the interval [a, b] is infinite,
• or f(x) has some vertical asymptotes on the interval [a, b],
• or a combination of both.
One example is the integral $\int_{-\infty}^{\infty} f(x)\,dx$ in the left figure below, where the interval is infinite, −∞ < x < ∞, and the function has a vertical asymptote at x = 3 where it blows up to infinity from both the left and the right side. In this case the area between y = f(x) and the x-axis could be infinite, or it could be finite. The question is, how do we decide if this area is finite or infinite, and how do we calculate it?

The procedure is illustrated in the above two figures.
• First, we slice the area into several pieces, where each piece corresponds to one potential issue that can cause the area to be infinite. For example, pieces $A_1$ and $A_4$ correspond to the interval stretching to infinity in one direction, and pieces $A_2$ and $A_3$ correspond to the function blowing up to infinity from one side of the vertical asymptote or the other.
The specific choice of the points x = 2 and x = 4 that divide the regions is not important, and they could be replaced by any other points to the left and right of the vertical asymptote.
• Next, we need to decide if these four regions have finite or infinite areas. We will agree to calculate their areas using the following method, illustrated in the right figure above. In the case of $A_1$ and $A_4$, we will integrate up to some finite points a or d first, and then let those points move off to −∞ or +∞. Using the limit notation, we can express this mathematically as
$$A_1 = \lim_{a\to-\infty}\int_a^2 f(x)\,dx, \qquad A_4 = \lim_{d\to+\infty}\int_4^d f(x)\,dx.$$
In the case of $A_2$ and $A_3$ that border the vertical asymptote, we will integrate up to some point b before the vertical asymptote or from some point c after the vertical asymptote and then let those points b and c get closer and closer to this vertical asymptote from the corresponding side:
$$A_2 = \lim_{b\to 3^-}\int_2^b f(x)\,dx, \qquad A_3 = \lim_{c\to 3^+}\int_c^4 f(x)\,dx.$$
Notice how we wrote $b \to 3^-$ and $c \to 3^+$, which is the notation for the left and right limits. This notation is very important because it indicates that we approach the vertical asymptote from a specific side without crossing it.
• If at least one of these pieces is infinite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$ diverges. If all of these pieces are finite, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$ converges and is equal to $A_1 + A_2 + A_3 + A_4$. Of course, if a function was negative on some interval, the corresponding piece could have a minus sign.

Before we consider more concrete examples, let us practice the above definition first.

Example 1. If a function f(x) has a vertical asymptote on both sides of x = 1 and is continuous everywhere else, how do we define $\int_0^\infty f(x)\,dx$?

Solution: We divide the interval [0, ∞) into three “problematic” regions, [0, 1], [1, 2] and [2, ∞), where the choice of 2 can be replaced by any other point bigger than 1.
Then the integral on each piece is defined as
$$\lim_{a\to 1^-}\int_0^a f(x)\,dx, \qquad \lim_{b\to 1^+}\int_b^2 f(x)\,dx, \qquad \lim_{c\to+\infty}\int_2^c f(x)\,dx.$$
The integral $\int_0^\infty f(x)\,dx$ diverges if at least one of these limits is not finite. If all three are well defined and finite then we just add them up,
$$\int_0^\infty f(x)\,dx = \lim_{a\to 1^-}\int_0^a f(x)\,dx + \lim_{b\to 1^+}\int_b^2 f(x)\,dx + \lim_{c\to+\infty}\int_2^c f(x)\,dx.$$

Exercise 1. If a function f(x) has vertical asymptotes on both sides of x = 1 and x = 2 and is continuous everywhere else, how do we define $\int_0^3 f(x)\,dx$?

Simple power functions. Our main concrete family of examples will be the power functions
$$f(x) = \frac{1}{x^p} \quad\text{where } p > 0.$$
These functions have a vertical asymptote at x = 0 and a horizontal asymptote y = 0 as x → +∞ (see figures below), so we will consider separately the case of the vertical asymptote on a finite interval [0, 1] and the case of the infinite interval [1, ∞).

Example 2. Consider the improper integral $\int_0^1 \frac{1}{x^p}\,dx$. Determine for which p > 0 it converges and for which p > 0 it diverges.

Solution: By the definition above, we need to find the limit
$$\int_0^1 \frac{1}{x^p}\,dx = \lim_{a\to 0^+}\int_a^1 \frac{1}{x^p}\,dx$$
and determine when this limit is finite and when it is infinite. Let us start with the case p = 1:
$$\int_a^1 \frac{1}{x}\,dx = \ln(x)\Big|_{x=a}^{x=1} = \ln(1) - \ln(a) = -\ln(a).$$
We know that ln(a) → −∞ when a approaches 0 from the right, so −ln(a) → +∞. The area is infinite and the integral diverges when p = 1. When p is not equal to 1, we can use the power rule:
$$\int_a^1 \frac{1}{x^p}\,dx = \int_a^1 x^{-p}\,dx = \frac{x^{-p+1}}{-p+1}\Big|_{x=a}^{x=1} = \frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1}.$$
To see what happens when a → 0+, we have to separate into two cases: p > 1 and p < 1. When p > 1 then p − 1 is positive and
$$\frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{-(p-1)} - \frac{a^{-(p-1)}}{-(p-1)} = -\frac{1}{p-1} + \frac{1}{(p-1)\,a^{p-1}} \to +\infty$$
as a → 0+, because $a^{p-1} \to 0$ in the denominator. This means that the integral diverges when p > 1.
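To get a feeling for these limits, the formulas just derived for the area between x = a and x = 1 can be evaluated for smaller and smaller a. This is only an illustrative sketch: the function name `area_from_a_to_1` and the sample values of a are our own choices, not part of the example.

```python
import math

def area_from_a_to_1(p, a):
    """Integral of 1/x^p from a to 1 (0 < a <= 1), using the formulas above."""
    if p == 1:
        return -math.log(a)                       # case p = 1
    return 1 / (1 - p) - a ** (1 - p) / (1 - p)   # power rule, p != 1

# Let a approach 0 from the right in the two cases treated so far.
for p in (1, 2):
    for a in (1e-1, 1e-3, 1e-6):
        print(f"p = {p}, a = {a:g}: area = {area_from_a_to_1(p, a):.4g}")
```

For both p = 1 and p = 2 the printed areas keep growing as a shrinks, in line with the divergence found above, and the growth is much faster for p = 2.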
Actually, we can see that the integral diverges for p > 1 without integrating, because the function $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ between x = 0 and x = 1 (blue line is above the green line in the left figure) so the area will be bigger. We already computed that the area is infinite when p = 1, so the area must also be infinite when p > 1. Finally, when p < 1 then 1 − p is positive and
$$\frac{1}{-p+1} - \frac{a^{-p+1}}{-p+1} = \frac{1}{1-p} - \frac{a^{1-p}}{1-p} \to \frac{1}{1-p} - 0 = \frac{1}{1-p}$$
as a → 0+, because $a^{1-p} \to 0$ in the numerator. So the area is finite in this case and is equal to $\frac{1}{1-p}$. To summarize:
• The integral $\int_0^1 \frac{1}{x^p}\,dx$ diverges when p = 1 or p > 1.
• The integral $\int_0^1 \frac{1}{x^p}\,dx = \frac{1}{1-p}$ when p < 1.
This is easy to remember if we keep in mind that the case p = 1 is in the middle and it diverges. Then we only need to remember that the case of p > 1 is above, so it also diverges. To remember this, take e.g. p = 2 and x = 0.1, and notice that $x^2 = 0.1^2 = 0.01 < 0.1$, so when we divide we get $\frac{1}{x} = \frac{1}{0.1} < \frac{1}{0.01} = \frac{1}{x^2}$. Notice how the order of the functions reverses in the right figure above when x > 1. That is because if we take x = 2 then $x^2 = 4$ and $\frac{1}{x^2} = \frac{1}{4} < \frac{1}{2} = \frac{1}{x}$. The answer will also reverse in this case.

Exercise 2. Consider the improper integral $\int_1^\infty \frac{1}{x^p}\,dx$. Check that:
• The integral $\int_1^\infty \frac{1}{x^p}\,dx$ diverges when p = 1 or p < 1.
• The integral $\int_1^\infty \frac{1}{x^p}\,dx = \frac{1}{p-1}$ when p > 1.

Comparison of integrals. In the above example, we could conclude without any calculations that the integral $\int_0^1 \frac{1}{x^p}\,dx$ diverges (or is infinite) when p > 1 because, in this case, $\frac{1}{x^p}$ is bigger than $\frac{1}{x}$ on this interval, so the area will also be bigger. If we want to know if an integral converges or diverges, we can often compare to simpler integrals, or integrals that we have already computed.
• If area $A_1$ is finite and $A_2$ is smaller then $A_2$ is also finite.
• If area $A_1$ is finite and $A_2$ is bigger then we cannot tell if $A_2$ is finite or infinite.
• If area $A_1$ is infinite and $A_2$ is smaller then we cannot tell if $A_2$ is finite or infinite.
• If area $A_1$ is infinite and $A_2$ is bigger then $A_2$ is also infinite.
Below we will take a look at several concrete examples of comparison with the p-integrals $\int_1^\infty \frac{1}{x^p}\,dx$ or $\int_0^1 \frac{1}{x^p}\,dx$ that we computed above, but first let us take a look at some comparisons by looking at the graphs of functions.

Example 3. Suppose that the functions in the figure do not intersect anywhere besides the two points shown. If we know that $\int_{-\infty}^{\infty} f(x)\,dx < \infty$, $\int_0^1 k(x)\,dx = \infty$ and $\int_1^\infty k(x)\,dx = \infty$, do the following integrals converge, diverge, or can we not tell?
(a) $\int_5^\infty h(x)\,dx$.
(b) $\int_0^5 h(x)\,dx$.
(c) $\int_{-\infty}^0 g(x)\,dx$.
(d) $\int_{-\infty}^{\infty} g(x)\,dx$.

Solution: Let us start with a very important observation that will be useful in this and other similar problems. If a function is continuous on some interval [a, b], as in the figure, then the area on this interval is finite and moving the starting point from a to b does not change whether the improper integral is finite or infinite. In other words, if an integral $\int_a^\infty$ is finite then the integral $\int_b^\infty$ is also finite, and if an integral $\int_a^\infty$ is infinite then the integral $\int_b^\infty$ is also infinite.
(a) We see from the figure that h(x) is bigger than k(x) after the point where they intersect. If we call this intersection point x = a then, by comparison, $\int_a^\infty h(x)\,dx$ is bigger than $\int_a^\infty k(x)\,dx$. But we know that $\int_1^\infty k(x)\,dx = \infty$, so $\int_a^\infty k(x)\,dx$ is also ∞ because of the above comment. Since $\int_a^\infty h(x)\,dx$ is bigger, it is also infinite. Finally, by the above comment that the starting point of the integral does not affect whether it is finite or infinite, we get that $\int_5^\infty h(x)\,dx = \infty$.
(b) Near the vertical asymptote x = 0, the function h(x) is below k(x), so its area is smaller. However, because $\int_0^1 k(x)\,dx = \infty$ and the area below k(x) is infinite, this does not give us any information.
Area less than infinity could be finite or infinite, so we cannot tell whether $\int_0^5 h(x)\,dx$ converges or diverges. As in part (a), it does not matter if the upper limit 5 is different from the intersection point x = a.
(c) The answer is $\int_{-\infty}^0 g(x)\,dx < \infty$ and the integral converges. That is because g(x) is below f(x) as they approach −∞, and the area below f(x) is finite. The region between the point where they intersect and x = 0 does not matter because the areas there are finite.
(d) The answer is that we cannot tell whether $\int_{-\infty}^{\infty} g(x)\,dx$ is finite or infinite. We know from part (c) that the part between −∞ and 0 is finite, so we only need to decide whether $\int_0^\infty g(x)\,dx$ is finite or infinite. Here, g(x) is above f(x) and below k(x) but, unfortunately, it does not give us any useful information, because the integral is bigger than a finite number and smaller than infinity, so it can be finite or infinite.

Exercise 3. Which of the following statements are true if the functions in the figure do not intersect?
(a) If $\int_5^\infty h(x)\,dx$ converges then $\int_3^\infty f(x)\,dx$ converges.
(b) If $\int_6^\infty h(x)\,dx$ diverges then $\int_3^\infty f(x)\,dx$ diverges.
(c) If $\int_7^\infty g(x)\,dx$ diverges then $\int_3^\infty f(x)\,dx$ diverges.
(d) If $\int_8^\infty g(x)\,dx$ converges then $\int_3^\infty f(x)\,dx$ converges.

Comparisons with p-integrals. Next, we will consider several examples of comparison with p-integrals of the form $\int_0^1 \frac{1}{x^p}\,dx$ or $\int_1^\infty \frac{1}{x^p}\,dx$ for p > 0 that we computed above.

Example 4. Does the improper integral $\int_0^5 \frac{3 - 2\cos(x)}{x^2}\,dx$ converge or diverge?

Solution: The function $\frac{3 - 2\cos(x)}{x^2}$ has a vertical asymptote at x = 0 where we divide by 0 in the denominator, so this is indeed an improper integral. To get an idea of what to do we should notice that the numerator 3 − 2cos(x) ‘behaves like a constant’, because cos(x) is always in between −1 and +1, so 3 − 2cos(x) is always in between 3 − 2 = 1 and 3 + 2 = 5.
If the numerator was truly a constant, we know that the integral $\int_0^5 \frac{1}{x^2}\,dx$ diverges because p = 2 > 1 (the finite piece on [1, 5] does not change this), so we should be aiming to conclude that our integral also diverges. For this purpose, logically we need to compare from below by an infinite area, so we use that
$$\frac{3 - 2\cos(x)}{x^2} \ge \frac{1}{x^2}.$$
Because $\int_0^5 \frac{1}{x^2}\,dx$ diverges, $\int_0^5 \frac{3 - 2\cos(x)}{x^2}\,dx$ also diverges.

Exercise 4. Does the improper integral $\int_0^2 \frac{4 + 2\sin(t)}{\sqrt{t}}\,dt$ converge or diverge?

Example 5. Does the improper integral $\int_3^\infty \frac{3}{x(x+1)}\,dx$ converge or diverge?

Solution: The function $\frac{3}{x(x+1)}$ has vertical asymptotes at x = 0 and x = −1 where we divide by 0, but these points are outside of the interval [3, ∞), so the only issue here is the infinite interval. Heuristically, we can think of the denominator $x(x+1) = x^2 + x$ as roughly $x^2$, because the $x^2$ term dominates the x term when x is large. If we ignore the x term then we know that $\int_3^\infty \frac{3}{x^2}\,dx$ converges, because it is the p-integral with p > 1. So we should be aiming to show that our integral converges, which means that we want to compare the function from above:
$$\frac{3}{x(x+1)} \le \frac{3}{x^2}.$$
This is true because we decreased the denominator from $x^2 + x$ to $x^2$. Since $\int_3^\infty \frac{3}{x^2}\,dx$ converges, $\int_3^\infty \frac{3}{x(x+1)}\,dx$ also converges, by comparison.

Exercise 5. Does the improper integral $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x}-1)}\,dx$ converge or diverge?

Answer to Exercise 1. This improper integral is defined as
$$\lim_{a\to 1^-}\int_0^a f(x)\,dx + \lim_{b\to 1^+}\int_b^{1.5} f(x)\,dx + \lim_{c\to 2^-}\int_{1.5}^c f(x)\,dx + \lim_{d\to 2^+}\int_d^3 f(x)\,dx$$
if all the limits exist and are finite. Here 1.5 can be replaced by any point in between 1 and 2 (the endpoints 0 and 3 are fixed by the integral itself). If even one of these four limits is not finite then the integral diverges.

Answer to Exercise 2. By the definition above, we need to find the limit
$$\int_1^\infty \frac{1}{x^p}\,dx = \lim_{a\to\infty}\int_1^a \frac{1}{x^p}\,dx.$$
Let us start with the case p = 1:
$$\int_1^a \frac{1}{x}\,dx = \ln(x)\Big|_{x=1}^{x=a} = \ln(a) - \ln(1) = \ln(a).$$
We know that ln(a) → ∞ when a → ∞, so the area is again infinite and the integral diverges when p = 1. When p is not equal to 1, we can again use the power rule:
$$\int_1^a \frac{1}{x^p}\,dx = \int_1^a x^{-p}\,dx = \frac{x^{-p+1}}{-p+1}\Big|_{x=1}^{x=a} = \frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1}.$$
When p > 1 then p − 1 is positive and
$$\frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{-(p-1)}}{-(p-1)} - \frac{1}{-(p-1)} = -\frac{1}{(p-1)\,a^{p-1}} + \frac{1}{p-1} \to \frac{1}{p-1}$$
as a → ∞, because $a^{p-1} \to \infty$ in the denominator. This means that the integral converges to $\frac{1}{p-1}$ when p > 1. Finally, when p < 1 then 1 − p is positive and
$$\frac{a^{-p+1}}{-p+1} - \frac{1}{-p+1} = \frac{a^{1-p}}{1-p} - \frac{1}{1-p} \to \infty$$
as a → ∞, because $a^{1-p} \to \infty$ in the numerator.

Answer to Exercise 3. (a) False, (b) True, (c) False, (d) True. Since the functions are below the x-axis, all integrals will have a negative sign, but we still compare the areas between these functions and the x-axis.

Answer to Exercise 4. Because 4 + 2sin(t) is always in between 4 − 2 = 2 and 4 + 2 = 6, we can use that
$$\frac{4 + 2\sin(t)}{\sqrt{t}} \le \frac{6}{\sqrt{t}}.$$
We know that the integral $\int_0^2 \frac{6}{\sqrt{t}}\,dt$ converges because it is a p-integral with p = 0.5 < 1, so we conclude that $\int_0^2 \frac{4 + 2\sin(t)}{\sqrt{t}}\,dt$ also converges.

Answer to Exercise 5. The denominator $\sqrt{x}(\sqrt{x} - 1) = x - \sqrt{x}$ is dominated by x, which grows faster than $\sqrt{x}$, so our function $\frac{2}{\sqrt{x}(\sqrt{x}-1)} = \frac{2}{x - \sqrt{x}}$ behaves like $\frac{2}{x}$. We know that $\int_{10}^\infty \frac{2}{x}\,dx$ diverges as the p-integral with p = 1, so if we aim to show that the original integral diverges, we need to bound it from below. Since
$$\frac{2}{\sqrt{x}(\sqrt{x}-1)} = \frac{2}{x - \sqrt{x}} \ge \frac{2}{x},$$
this gives us the comparison we want, so $\int_{10}^\infty \frac{2}{\sqrt{x}(\sqrt{x}-1)}\,dx$ also diverges.

3.10 Slicing problems: geometry

In this section we will focus on computing volumes, areas, and lengths. In the next section we will consider quantities that may be distributed unevenly throughout some geometric region, and computing their total amount will involve one more computational step.
Nevertheless, the general approach will be very similar, so the purely geometric problems of this section will serve as a foundation for further applications. We have already computed areas between graphs of functions by approximating them with rectangles, so here we will focus on volumes and lengths.

Computing volumes by slicing. Let us consider the problem of computing the volume of a loaf of bread, depicted in the above figure. Suppose that the loaf is aligned along the x-axis, it starts at x = a and ends at x = b. Let A(x) be the area of the slice at a point x.

Step 1. Let us slice the bread vertically into n thin slices along the length of the loaf. In other problems, we will also slice an object horizontally, because it will make the calculation easier.

Step 2. Suppose that the slice number i is between points $x_i$ and $x_{i+1}$, as shown in the figure, and the width of one slice is $\Delta x$. If the slice is thin enough then its cross-section does not change much between the two cuts $x_i$ and $x_{i+1}$ and the area of the slice can be evaluated at any point $x_i^*$ in between $x_i$ and $x_{i+1}$. In other words, the area of the slice is approximately equal to $A(x_i^*)$.

Step 3. As a result, the volume $\Delta V_i$ of the slice number i is approximately equal to
$$\Delta V_i \approx A(x_i^*)\,\Delta x.$$
We denote the volume of one slice by $\Delta V$ to emphasize that this volume represents a small increment of volume when we add another slice of small width $\Delta x$. We can remember this formula more informally as $\Delta V \approx A(x)\,\Delta x$.

Step 4. The total volume is the sum of volumes of n slices, so
$$V = \sum_{i=1}^n \Delta V_i \approx \sum_{i=1}^n A(x_i^*)\,\Delta x.$$

Step 5. The approximation will get better and better when our slices get thinner and thinner or, in other words, when the number of slices n gets bigger and bigger. Using the language of limits,
$$V = \lim_{n\to\infty}\sum_{i=1}^n A(x_i^*)\,\Delta x = \int_a^b A(x)\,dx.$$
Where did the last integral come from?
It appears because we recognized that the sum in the middle is the Riemann sum corresponding to the function A(x) on the interval [a, b]. This is how integrals appear in applications where the total Quantity (in this case Volume) can be approximated by a sum of small pieces that looks like a Riemann sum of some integral.

Comment. In the problems where we will use the above formula $V = \int_a^b A(x)\,dx$, it will be easy to compute A(x) because the cross-sections will be simple, such as circles, or rectangles, or triangles. However, it is very important to write down all 5 steps each time, because in applications in the next section the formula will not be applicable directly. Instead, Volume will be replaced by some other Quantity and the crucial Step 3 above will be replaced by a different calculation specific to each problem. Of course, the formula $\Delta V \approx A(x)\,\Delta x$ will typically be used as a building block in the calculations involving volumes.

Example 1. Compute the volume of a so-called solid of revolution obtained by rotating the graph of $y = 3e^{-x/2}$ on the interval [0, 5] around the x-axis, as shown in the figure.

Solution: We want to compute the volume enclosed by this surface, and we notice that vertical slices along the x-axis look like circles, so their areas can be computed using the formula $\pi r^2$. The radius of a cross-section at the position x is $3e^{-x/2}$, so the area of the cross-section is $A(x) = \pi(3e^{-x/2})^2 = 9\pi e^{-x}$.

Step 1. We slice the region vertically into n thin slices along the x-axis on the interval [0, 5].

Step 2. If the slice number i between points $x_i$ and $x_{i+1}$ is thin enough then its cross-section does not change much between the two cuts and the area of the slice is approximately equal to $A(x_i^*) = 9\pi e^{-x_i^*}$, for any point $x_i^*$ between $x_i$ and $x_{i+1}$.

Step 3. As a result, the volume of the slice number i is approximately equal to $\Delta V_i \approx 9\pi e^{-x_i^*}\,\Delta x$, where $\Delta x$ is the width of one slice.

Step 4.
The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^n 9\pi e^{-x_i^*}\,\Delta x$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$V = \lim_{n\to\infty}\sum_{i=1}^n 9\pi e^{-x_i^*}\,\Delta x = \int_0^5 9\pi e^{-x}\,dx.$$
The last integral appears because the sum in the middle is the Riemann sum corresponding to the function $9\pi e^{-x}$ on the interval [0, 5]. The interval [0, 5] is implicit in the notation of the sum, but we should remember that in the first step we were slicing this interval [0, 5], so the Riemann sum is defined on this interval. In this particular case we can compute the integral using the FTC,
$$\int_0^5 9\pi e^{-x}\,dx = -9\pi e^{-x}\Big|_{x=0}^{x=5} = -9\pi e^{-5} - (-9\pi e^{-0}) = -9\pi e^{-5} + 9\pi,$$
so $V = -9\pi e^{-5} + 9\pi \approx 28.0838\ldots$.

Exercise 1. Compute the volume of a solid of revolution obtained by rotating the graph of $y = 3 - \frac{x}{2}$ on the interval [0, 6] around the x-axis, as shown in the figure.

Example 2. A vase of height H has radius r(h) at height h. How much water will fit inside the vase? (Image from rawpixel.com.)

Solution: We will assume that the walls of the vase are thin, or that the radius r(h) refers to the inner radius of the vase. We notice that horizontal slices along the h-axis look like circles, so their areas can be computed using the formula $\pi r(h)^2$.

Step 1. We slice the vase horizontally into n thin slices along the h-axis on the interval [0, H].

Step 2. If the slice number i between heights $h_i$ and $h_{i+1}$ is thin enough then its cross-section does not change much between the two cuts and the area of the slice is approximately equal to $\pi r(h_i^*)^2$, for any height $h_i^*$ between $h_i$ and $h_{i+1}$.

Step 3. As a result, the volume of the slice number i is approximately equal to $\Delta V_i \approx \pi r(h_i^*)^2\,\Delta h$, where $\Delta h$ is the thickness of one slice.

Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^n \pi r(h_i^*)^2\,\Delta h$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$V = \lim_{n\to\infty}\sum_{i=1}^n \pi r(h_i^*)^2\,\Delta h = \int_0^H \pi r(h)^2\,dh.$$
The last integral appears because the sum in the middle is the Riemann sum corresponding to the function $\pi r(h)^2$ on the interval [0, H].

Exercise 2. A vase of height H has inner radius r(h) and outer radius R(h) at height h. Set up an integral for the volume of the sidewall of the vase.

Computing lengths by slicing. Next, we will derive an integral formula for the length of the graph of a function y = f(x) on the interval [a, b], again using the slicing method. The only real difference will be that, instead of the increment $\Delta V$ of the volume, we will now have to compute (or more precisely, approximate) the increment $\Delta L$ of the length of the curve in terms of the function y = f(x).

Step 1. Let us slice the curve vertically into n small arcs along the x-axis, as in the figure.

Step 2. Suppose that the arc number i is between points $x_i$ and $x_{i+1}$, and the width of one slice is $\Delta x$. If $\Delta x$ is small then the slope does not change much between $x_i$ and $x_{i+1}$ and the length $\Delta L_i$ of the arc is approximately equal to the length of the secant (hypotenuse of the right triangle in the figure):
$$\Delta L_i \approx \sqrt{\Delta x^2 + \Delta y^2} = \sqrt{\Big(1 + \frac{\Delta y^2}{\Delta x^2}\Big)\Delta x^2} = \sqrt{1 + \Big(\frac{\Delta y}{\Delta x}\Big)^2}\,\Delta x.$$
The ratio $\frac{\Delta y}{\Delta x}$ approximates the slope $f'(x)$ of y = f(x) between $x_i$ and $x_{i+1}$ and, because the slope does not change much, it can be evaluated at any point $x_i^*$ in between.

Step 3. As a result, the length $\Delta L_i$ of the arc is approximately equal to
$$\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x.$$
We can remember this formula more informally as $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$.

Step 4. The total length is the sum of lengths of n small arcs, so
$$L = \sum_{i=1}^n \Delta L_i \approx \sum_{i=1}^n \sqrt{1 + f'(x_i^*)^2}\,\Delta x.$$

Step 5. The approximation will get better and better when our arcs get smaller and smaller or, in other words, when the number of slices n gets bigger and bigger. Using the language of limits,
$$L = \lim_{n\to\infty}\sum_{i=1}^n \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \int_a^b \sqrt{1 + f'(x)^2}\,dx.$$
Again, the last integral appears because the sum in the middle is the Riemann sum corresponding to the function $\sqrt{1 + f'(x)^2}$ on the interval [a, b]. The key step that should be memorized and does not need to be derived every time is the element of length formula: $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$.

Example 3. The main span of the Brooklyn Bridge is approximately 480 meters long. If we place the origin in the center of the bridge, the shape of the main cable can be approximated by the graph of $y = 0.0008x^2$, as shown in the figure. Derive the formula for the length of this cable using the slicing method and evaluate it in Wolfram Alpha.

Solution:
Step 1. We slice the cable vertically into n small arcs along the x-axis on the interval [−240, 240]. Since we placed the origin in the middle of the bridge, the main span of the bridge is between the coordinates −240 and 240.

Step 2. Arc number i is between points $x_i$ and $x_{i+1}$, and the width of one slice is $\Delta x$. If $\Delta x$ is small then the slope does not change much between $x_i$ and $x_{i+1}$.

Step 3. Because $f'(x) = 0.0016x$, from the right triangle calculation, the length $\Delta L_i$ of the arc is approximately equal to
$$\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x.$$

Step 4. The total length is the sum of lengths of n small arcs, so
$$L = \sum_{i=1}^n \Delta L_i \approx \sum_{i=1}^n \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x.$$

Step 5. The approximation will get better when the number of slices gets bigger, so
$$L = \lim_{n\to\infty}\sum_{i=1}^n \sqrt{1 + (0.0016x_i^*)^2}\,\Delta x = \int_{-240}^{240} \sqrt{1 + (0.0016x)^2}\,dx.$$
Evaluating this integral in Wolfram Alpha gives L ≈ 491.5 meters. One can actually find the antiderivative and apply the FTC using a special substitution, but this is more advanced material, beyond what we have studied so far.

Exercise 3. Compute the length of the graph of $y = x^{3/2}$ on the interval [0, 1] by first setting up the integral using the slicing method and then evaluating it.

Answer to Exercise 1.
A cross-section at the position x is a circle of radius $3 - \frac{x}{2}$, so its area is $A(x) = \pi\big(3 - \frac{x}{2}\big)^2$.

Step 1. We slice the region vertically into n thin slices along the x-axis on the interval [0, 6].

Step 2. If the slice number i between points $x_i$ and $x_{i+1}$ is thin enough then its cross-section does not change much between the two cuts and the area of the slice is approximately equal to $A(x_i^*) = \pi\big(3 - \frac{x_i^*}{2}\big)^2$ for any point $x_i^*$ between $x_i$ and $x_{i+1}$.

Step 3. As a result, the volume of the slice number i is approximately equal to $\Delta V_i \approx \pi\big(3 - \frac{x_i^*}{2}\big)^2\,\Delta x$, where $\Delta x$ is the width of one slice.

Step 4. The total volume is the sum of $\Delta V_i$, so $V \approx \sum_{i=1}^n \pi\big(3 - \frac{x_i^*}{2}\big)^2\,\Delta x$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$V = \lim_{n\to\infty}\sum_{i=1}^n \pi\Big(3 - \frac{x_i^*}{2}\Big)^2\,\Delta x = \int_0^6 \pi\Big(3 - \frac{x}{2}\Big)^2\,dx.$$
We can compute the integral using the FTC,
$$\int_0^6 \pi\Big(3 - \frac{x}{2}\Big)^2\,dx = \frac{\pi}{(-1/2)} \times \frac{1}{3}\Big(3 - \frac{x}{2}\Big)^3\Big|_{x=0}^{x=6} = 0 - \frac{\pi}{(-1/2)} \times \frac{1}{3} \times 3^3 = 18\pi.$$

Answer to Exercise 2. We could subtract the inner volume from the outer volume to get the volume of the vase itself:
$$V = \int_0^H \pi R(h)^2\,dh - \int_0^H \pi r(h)^2\,dh = \int_0^H \pi\big(R(h)^2 - r(h)^2\big)\,dh.$$
However, to practice the slicing method it is better to follow the usual slicing steps as in the previous example. The only difference here is that the slice is a ring in between two circles of radius R(h) and r(h), so the area of one slice is $A(h) = \pi R(h)^2 - \pi r(h)^2 = \pi\big(R(h)^2 - r(h)^2\big)$.

Answer to Exercise 3.
Step 1. We slice the graph vertically into n small arcs along the x-axis on the interval [0, 1].

Step 2. Arc number i is between points $x_i$ and $x_{i+1}$, and the width of one slice is $\Delta x$. If $\Delta x$ is small then the slope does not change much between $x_i$ and $x_{i+1}$.

Step 3. Because $f'(x) = (3/2)x^{1/2}$, from the right triangle calculation, the length $\Delta L_i$ of the arc is approximately equal to
$$\Delta L_i \approx \sqrt{1 + f'(x_i^*)^2}\,\Delta x = \sqrt{1 + (9/4)x_i^*}\,\Delta x.$$

Step 4. The total length is the sum of lengths of n small arcs, so
$$L = \sum_{i=1}^n \Delta L_i \approx \sum_{i=1}^n \sqrt{1 + (9/4)x_i^*}\,\Delta x.$$
Step 5. The approximation will get better when the number of slices gets bigger, so
$$L = \lim_{n\to\infty}\sum_{i=1}^n \sqrt{1 + (9/4)x_i^*}\,\Delta x = \int_0^1 \sqrt{1 + (9/4)x}\,dx.$$
This integral can be easily computed,
$$L = \int_0^1 \sqrt{1 + (9/4)x}\,dx = \frac{1}{9/4} \times \frac{1}{3/2}\big(1 + (9/4)x\big)^{3/2}\Big|_{x=0}^{x=1} = 1.4397\ldots.$$

3.11 Slicing problems: densities

We have learned how to compute areas, volumes and lengths of some geometric regions, and now we will learn how to compute various quantities that may be distributed over those regions. These quantities will typically be distributed unevenly, so we will first need to compute the quantity in a small region (slice) and then add these up to get the total. In other words, we will be using the slicing method again, only with some extra calculations.

Linear densities. If Q denotes some Quantity of interest then the general idea will be to use formulas of the form:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Length}} \times \text{Length} \qquad\text{or}\qquad \Delta Q = \frac{\Delta Q}{\Delta L} \times \Delta L.$$
The ratio $\frac{\text{Quantity}}{\text{Length}}$ (or $\frac{\Delta Q}{\Delta L}$) is often called a linear density, because it tells us how densely the quantity is distributed per unit of length. Examples of the units of linear density are kg/meter, $/mile, #/km etc., where the numerator has units of our quantity of interest, and the denominator has units of length. When we use the slicing method:
• if the quantity is distributed over a straight line, for example the x-axis, then the increment of length in the slicing method will simply be $\Delta L = \Delta x$;
• if the quantity is distributed over some curve described by the graph of a function y = f(x) then the increment of length in the slicing method will be $\Delta L \approx \sqrt{1 + f'(x)^2}\,\Delta x$.
The density $\frac{\Delta Q}{\Delta L}$ will be given to us either explicitly or implicitly, which might require some calculation of the quantity per unit of length locally on a given slice.

Example 1.
Suppose that the traffic on a 1.4 km stretch of Queen St between Spadina and Yonge is moving at the speed of v(x) km/h, where x (in km) is the distance from the Spadina and Queen intersection. Derive a formula for the time it takes to drive from Spadina to Yonge in this traffic. Is the integral formula obtained a proper or improper integral?

Solution: Recall that speed = distance/time. If the speed was constant, v km/h, to find the time we could simply divide the distance of 1.4 km by the speed, $T = \frac{1.4}{v}$ hours. However, because the speed is changing depending on the location x, we need to use the slicing method and apply the formula time = distance/speed locally.

Step 1. We slice the given stretch of Queen St into n small subintervals of length $\Delta x$ each along the interval [0, 1.4] km.

Step 2. If the interval number i between points $x_i$ and $x_{i+1}$ is small enough then the speed is almost constant on this interval and is approximately equal to $v(x_i^*)$, for any point $x_i^*$ between $x_i$ and $x_{i+1}$.

Step 3. Since the speed is almost constant, we can apply the above formula to write the increment of time between points $x_i$ and $x_{i+1}$ as
$$\Delta T_i \approx \frac{\Delta x}{v(x_i^*)}.$$

Step 4. The total time is the sum of $\Delta T_i$, so $T \approx \sum_{i=1}^{n} \frac{\Delta x}{v(x_i^*)}$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$T = \lim_{n\to\infty} \sum_{i=1}^{n} \frac{\Delta x}{v(x_i^*)} = \int_0^{1.4} \frac{dx}{v(x)}$$
because the sum in the middle is the Riemann sum corresponding to the function $\frac{1}{v(x)}$ on the interval [0, 1.4].

Since the car can stop at a traffic light, the speed v(x) can approach 0 and $\frac{1}{v(x)}$ can approach a vertical asymptote at that point. As a result, the answer could be an improper integral. It is a subtle point, but in the above calculation we should avoid points where $v(x_i^*)$ is equal to zero. Obviously, this improper integral would be convergent because the time cannot be infinite. In a problem like this, for simplicity, in a Calculus class it would probably be assumed that v(x) is always positive.
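As a numerical sanity check of the formula $T = \int_0^{1.4} dx/v(x)$, the Riemann sum from Steps 4 and 5 can be sketched in Python. The speed profile v(x) = 30 + 10x km/h below is a made-up assumption for illustration (it is not part of the problem), chosen because it is always positive and the integral has a closed form to compare with. The function name travel_time is also hypothetical.

```python
import math

def travel_time(v, length, n=10_000):
    """Midpoint Riemann sum for T = integral of dx / v(x) over [0, length]."""
    dx = length / n
    total = 0.0
    for i in range(n):
        x_star = (i + 0.5) * dx   # sample point x_i^* inside slice number i
        total += dx / v(x_star)   # time = distance / speed on one slice
    return total

# Assumed speed profile (not from the text): traffic speeds up linearly
# from 30 km/h at Spadina (x = 0) to 44 km/h at Yonge (x = 1.4).
def v(x):
    return 30.0 + 10.0 * x

T = travel_time(v, 1.4)
exact = 0.1 * math.log(44.0 / 30.0)  # closed form of the same integral
print(f"Riemann sum: {T:.6f} h, closed form: {exact:.6f} h")
```

With a constant speed the same function reduces to distance divided by speed, which matches the simple formula T = 1.4/v from the start of the solution.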
Exercise 1. A mountain goat is walking straight east on a mountain path tracing an altitude y = f(x) (in km), where x is the horizontal coordinate (longitude, also in km). Suppose that the goat's speed is always positive and depends only on the altitude y, i.e. v = v(y) km/h. Find the time it takes to walk from longitude x = a to x = b.

Example 2. In the setting of the previous problem, the density of plants is changing with altitude y, so the amount of food the goat consumes along the way is A(y) kg/km. What is the total amount of food the goat eats between x = a and x = b?

Solution: Although the word "density" was not mentioned explicitly in the problem, notice how the units kg/km tell us that A(y) is actually the density of the amount of food per unit of distance. If this density was constant, we could simply multiply it by the length of the path to get the total amount of food. However, since A(y) changes with altitude, we need to use the slicing method.

Step 1. We slice the path into n small subintervals of width $\Delta x$ along the x-axis on the interval [a, b] km.

Step 2. If the interval number i between points $x_i$ and $x_{i+1}$ is small then the altitude does not change much and so the density of food is almost constant on this interval and is approximately equal to $A(y) = A(f(x_i^*))$ for any point $x_i^*$ between $x_i$ and $x_{i+1}$. Although A(y) depends on the altitude y, this altitude should be evaluated at a point $x_i^*$ and the quantity should be expressed in terms of x when we slice along the x-axis.

Step 3. We need to multiply the density of food in kg/km by the distance in km to get the amount of food in kg. As a result, the amount of food between points $x_i$ and $x_{i+1}$ is
$$\Delta F_i \approx A(f(x_i^*))\, \Delta L_i \approx A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\, \Delta x.$$
Here the distance $\Delta L$ is not the horizontal increment $\Delta x$, but the increment along the mountain path y = f(x): $\Delta L \approx \sqrt{1 + f'(x)^2}\, \Delta x$.

Step 4.
The total amount of food is the sum of $\Delta F_i$, so
$$F = \sum_{i=1}^{n} \Delta F_i \approx \sum_{i=1}^{n} A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\, \Delta x.$$

Step 5. The approximation will get better as the number of slices gets bigger, so
$$F = \lim_{n\to\infty} \sum_{i=1}^{n} A(f(x_i^*)) \sqrt{1 + f'(x_i^*)^2}\, \Delta x = \int_a^b A(f(x)) \sqrt{1 + f'(x)^2}\, dx$$
because the sum in the middle is the Riemann sum corresponding to the function $A(f(x))\sqrt{1 + f'(x)^2}$ on the interval [a, b].

Exercise 2. A hiker is walking straight east on a mountain path which is tracing an altitude y = f(x) (in km), where x (in km) is the horizontal coordinate (longitude). Suppose that the hiker's speed is always positive and depends only on the longitude, v = v(x) km/h. The hiker is breathing air at a rate r = 15 + 100|slope| litres per minute, where slope means the slope of the graph of y = f(x). Find the total volume of air the hiker breathes between x = a and x = b. Pay attention to units!

Volume densities. Next, we will consider quantities distributed over some volume, so we will be using:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Volume}} \times \text{Volume} \quad \text{or} \quad \Delta Q = \frac{\Delta Q}{\Delta V} \times \Delta V.$$
The ratio $\frac{\text{Quantity}}{\text{Volume}}$ (or $\frac{\Delta Q}{\Delta V}$) is called a volume density, because it tells us how densely the quantity is distributed per unit of volume.

Example 3. Trees "lift" water from roots to shoots by a mechanism of a decrease in hydrostatic (water) pressure created by transpiration, which is the evaporation of water from leaves. The work done by a tree against the gravity force to lift m kg of water to height h meters is given by the formula W = Force × Distance = mg × h (measured in J (joule) = Newton × meter = kg × m²/s²). A palm tree trunk contains 75% water by volume. If a palm tree trunk is H meters high and has radius r(h) at height h meters, how much work does the tree do to "refill" its trunk with water? Denote by ρ the density of water.
Solution: When using the formula W = mg × h, we will compute the mass of water as ρ × (Volume of water) = ρ × 0.75 × (Volume of tree trunk), because water is 75% of the tree trunk by volume. However, in Step 3 of the slicing method below we will have to apply this formula locally at height h because the work done changes with height.

Step 1. We slice the tree horizontally into n narrow disks of width $\Delta h$ along the h-axis (y-axis) on the interval [0, H], i.e. between height 0 and height H.

Step 2. If the interval number i between height $h_i$ and $h_{i+1}$ is small enough then the radius of the cross-section r(h) and height h do not change much and are approximately equal to $r(h_i^*)$ and $h_i^*$ for any point $h_i^*$ between $h_i$ and $h_{i+1}$.

Step 3. A palm tree looks like a body of revolution so we can think of its cross-section as a circle with the area $A \approx \pi r(h_i^*)^2$ and, as a result, the volume of one slice is $\Delta V_i \approx \pi r(h_i^*)^2\, \Delta h$. Because 75% of the tree trunk is water, the mass of water in this slice is
$$0.75\rho\, \Delta V_i \approx 0.75\rho\pi r(h_i^*)^2\, \Delta h.$$
Using the formula for work stated in the problem, the work that the tree does to lift this much water to height $h_i^*$ is
$$\Delta W_i \approx 0.75\rho\pi r(h_i^*)^2\, \Delta h\, g \times h_i^* = 0.75 g\pi\rho\, r(h_i^*)^2 h_i^*\, \Delta h.$$
The units will be J (joule), because all the units were consistent.

Step 4. The total work is the sum of $\Delta W_i$, so
$$W = \sum_{i=1}^{n} \Delta W_i \approx \sum_{i=1}^{n} 0.75 g\pi\rho\, r(h_i^*)^2 h_i^*\, \Delta h.$$

Step 5. The approximation will get better as the number of slices gets bigger, so
$$W = \lim_{n\to\infty} \sum_{i=1}^{n} 0.75 g\pi\rho\, r(h_i^*)^2 h_i^*\, \Delta h = \int_0^H 0.75 g\pi\rho\, r(h)^2 h\, dh \quad \text{(in J)}$$
because the sum in the middle is the Riemann sum corresponding to the function $0.75 g\pi\rho\, r(h)^2 h$ on the interval [0, H].

Exercise 3. The amount of heat H stored in a piece of material is H = cρTV where c is a constant called the specific heat in J/(kg · K) that depends on the material, ρ is the mass density in kg/m³, V is the volume in m³, and T is the temperature in K.
Consider a bar of length L meters made of aluminium with constant density $\rho_A$ and specific heat $c_A$, and varying radius r(x) (in meters) and temperature T(x) (in K). What is the total heat stored in the bar?

Example 4. A glass bowl depicted in the figure has semi-circular sides of radius R cm, and it has length L cm. When it is filled with potato chips they tend to break and smaller pieces accumulate towards the bottom, so the density of chips changes with depth according to the function C = C(d) g/cm³, where depth d is 0 at the top and R at the bottom. Find the total mass of potato chips in the bowl.

Solution: The main formula we want to use is, of course, mass = density × volume, but we need to combine it with the slicing method to make sure that the density is constant or almost constant. Since the density changes with depth:

Step 1. We slice the bowl horizontally into n slices of height $\Delta d$ along the depth axis (y-axis) on the interval [0, R], i.e. between depth 0 and R.

Step 2. When $\Delta d$ is small, the density of chips is almost constant on any given slice and is approximately equal to $C(d_i^*)$ for any point $d_i^*$ between $d_i$ and $d_{i+1}$. Also we can see by looking at the side view that the width of the slice at depth d is $2\sqrt{R^2 - d^2}$, so the volume of the rectangular slice is approximately $\Delta V_i \approx 2\sqrt{R^2 - (d_i^*)^2} \times L \times \Delta d$.

Step 3. Using the formula mass = density × volume when the density is constant:
$$\Delta m_i \approx C(d_i^*)\, 2\sqrt{R^2 - (d_i^*)^2}\, L\, \Delta d.$$
The units will be grams, because all the dimensions are given in cm and the density is given in g/cm³.

Step 4. The total mass is the sum of $\Delta m_i$, so
$$m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} (2L)\, C(d_i^*) \sqrt{R^2 - (d_i^*)^2}\, \Delta d.$$

Step 5. The approximation will get better as the number of slices gets bigger, so
$$m = \lim_{n\to\infty} \sum_{i=1}^{n} (2L)\, C(d_i^*) \sqrt{R^2 - (d_i^*)^2}\, \Delta d = \int_0^R (2L)\, C(x) \sqrt{R^2 - x^2}\, dx \ \text{grams}$$
because the sum in the middle is the Riemann sum corresponding to the function $(2L)\, C(x) \sqrt{R^2 - x^2}$ on the interval [0, R].
We replaced the depth variable d by the variable x in the integral because dd would look confusing in place of dx.

Exercise 4. Average oxygen concentration in the Atlantic ocean at depth d km is given by the function C = C(d) mg/L (depicted in the figure for depths between 0 and 5 km).⁸ According to the ocean depth chart, the area of the ocean at depth d is given by A = A(d) (in millions of km²). How much oxygen is stored in the Atlantic ocean up to the depth of 5 km?

⁸ https://bit.ly/3QFCMNA

Area densities. Finally, we will consider quantities distributed over some area, so we will be using:
$$\text{Quantity} = \frac{\text{Quantity}}{\text{Area}} \times \text{Area} \quad \text{or} \quad \Delta Q = \frac{\Delta Q}{\Delta A} \times \Delta A.$$
The ratio $\frac{\text{Quantity}}{\text{Area}}$ (or $\frac{\Delta Q}{\Delta A}$) is called an area density, because it tells us how densely the quantity is distributed per unit of area.

Example 5. A circular city of Ecbatana has radius R and population density of d(r) people per km² at distance r km from the center. What is its total population?

Solution:

Step 1. We slice the city into n circular strips of width $\Delta r$ between radius 0 and R. One such slice is shown in the figure.

Step 2. When $\Delta r$ is small, the radius does not change much on one strip so the population density is almost constant and is approximately equal to $d(r_i^*)$, evaluated at any radius $r_i^*$ between $r_i$ and $r_{i+1}$, on the strip number i. The perimeter of the strip is approximately $2\pi r_i^*$ and the width is $\Delta r$, so its area is approximately $\Delta A_i \approx 2\pi r_i^*\, \Delta r$.

Step 3. This means that the population on the strip number i is
$$\Delta P_i = \text{density} \times \text{area} \approx d(r_i^*)\, 2\pi r_i^*\, \Delta r.$$

Step 4. The total population is the sum of $\Delta P_i$, so $P \approx \sum_{i=1}^{n} d(r_i^*)\, 2\pi r_i^*\, \Delta r$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$P = \lim_{n\to\infty} \sum_{i=1}^{n} d(r_i^*)\, 2\pi r_i^*\, \Delta r = \int_0^R 2\pi d(r)\, r\, dr \ \text{people}$$
because the sum in the middle is the Riemann sum corresponding to the function $2\pi d(r)\, r$ on the interval [0, R].

Exercise 5.
Trees transform carbon dioxide (CO₂) and water into glucose and oxygen using sunlight. This process is called photosynthesis. According to a paper⁹, during 1 hour the leaves of the plant Plantago Asiatica produce glucose and oxygen at a rate
$$r(T) = 0.36\left(11 + 0.95(T - 10) - 0.025(T - 10)^2\right)$$
measured in µmol/cm², where T is the temperature of the leaf in °C. The leaf in the figure (where x and y are in cm) is in partial shade and the temperature of different parts of the leaf varies according to the formula $T(x) = 10 + 2\sin\left(\frac{x^2}{2}\right)$ °C. What is the total amount of glucose and oxygen produced by this leaf during 1 hour?

⁹ https://doi.org/10.1093/jxb/erj049

Answer to Exercise 1.

Step 1. We slice the path into n small subintervals of width $\Delta x$ along the x-axis on the interval [a, b] km.

Step 2. If the interval number i between points $x_i$ and $x_{i+1}$ is small enough then the altitude does not change much and so the speed is almost constant on this interval and is approximately equal to $v(y) = v(f(x_i^*))$ for any point $x_i^*$ between $x_i$ and $x_{i+1}$. Although the speed v(y) depends on the altitude y, this altitude should be evaluated at a point $x_i^*$ and the quantity should be expressed in terms of x when we slice along the x-axis.

Step 3. Since the speed is almost constant, we can write the increment of time between points $x_i$ and $x_{i+1}$ as
$$\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(f(x_i^*))} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(f(x_i^*))}.$$
Notice that, compared to the previous example, here the distance $\Delta L$ is not the horizontal increment $\Delta x$, but the increment along the mountain path y = f(x), which we found in the last section: $\Delta L \approx \sqrt{1 + f'(x)^2}\, \Delta x$. This calculation is the main step and the biggest difference from the previous example.

Step 4. The total time is the sum of $\Delta T_i$, so $T \approx \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(f(x_i^*))}$.

Step 5.
The approximation will get better as the number of slices gets bigger, so
$$T = \lim_{n\to\infty} \sum_{i=1}^{n} \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(f(x_i^*))} = \int_a^b \frac{\sqrt{1 + f'(x)^2}}{v(f(x))}\, dx$$
because the sum in the middle is the Riemann sum corresponding to the function $\frac{\sqrt{1 + f'(x)^2}}{v(f(x))}$ on the interval [a, b].

Answer to Exercise 2.

Step 1. We slice the path into n small subintervals of width $\Delta x$ along the x-axis on the interval [a, b] km.

Step 2. If the interval number i between points $x_i$ and $x_{i+1}$ is small enough then the speed does not change much and is approximately equal to $v(x_i^*)$ km/h for any point $x_i^*$ between $x_i$ and $x_{i+1}$. The slope also does not change much, so the breathing rate is approximately equal to $15 + 100|\text{slope}| = 15 + 100|f'(x_i^*)|$ L/min, because the slope of y = f(x) is f′(x).

Step 3. How many litres of air does the hiker breathe between points $x_i$ and $x_{i+1}$? Here the units can help us. To get litres we need to multiply L/min by min, so we need to multiply the rate $15 + 100|f'(x_i^*)|$ L/min by the time $\Delta T$ in minutes. We can compute the time as in the previous problems as
$$\Delta T_i = \frac{\text{distance}}{\text{speed}} \approx \frac{\Delta L_i}{v(x_i^*)} \approx \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(x_i^*)}.$$
The only subtle issue is that this time is in hours, because the distance was given in km and the speed was given in km/h, so when we multiply the breathing rate by time,
$$\left(15 + 100|f'(x_i^*)|\right) \times \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(x_i^*)},$$
the units are L/min × hour = L/min × 60 min = 60 L. So the amount of air is:
$$\Delta A_i \approx \left(15 + 100|f'(x_i^*)|\right) \times \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(x_i^*)} \times 60 \ \text{litres}.$$

Step 4. The total amount of air is the sum of $\Delta A_i$, so
$$A \approx \sum_{i=1}^{n} 60\left(15 + 100|f'(x_i^*)|\right) \times \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(x_i^*)}.$$

Step 5.
The approximation will get better as the number of slices gets bigger, so
$$A = \lim_{n\to\infty} \sum_{i=1}^{n} 60\left(15 + 100|f'(x_i^*)|\right) \times \frac{\sqrt{1 + f'(x_i^*)^2}\, \Delta x}{v(x_i^*)} = \int_a^b 60\left(15 + 100|f'(x)|\right) \times \frac{\sqrt{1 + f'(x)^2}}{v(x)}\, dx \ \text{litres}$$
because the sum in the middle is the Riemann sum corresponding to the function $60\left(15 + 100|f'(x)|\right)\frac{\sqrt{1 + f'(x)^2}}{v(x)}$ on the interval [a, b].

Answer to Exercise 3.

Step 1. We slice the bar vertically into n narrow disks of width $\Delta x$ along the x-axis on the interval [0, L].

Step 2. If the interval number i between points $x_i$ and $x_{i+1}$ is small enough then the radius of the cross-section and the temperature do not change much and are approximately equal to $r(x_i^*)$ and $T(x_i^*)$ for any point $x_i^*$ between $x_i$ and $x_{i+1}$.

Step 3. As we saw in the last section, for a body of revolution the cross-section is a circle so its area is $A \approx \pi r(x_i^*)^2$ and, as a result, the volume is $\Delta V_i \approx \pi r(x_i^*)^2\, \Delta x$. Using the formula stated in the problem, the heat stored in one slice is
$$\Delta H_i \approx c_A \rho_A T(x_i^*)\, \Delta V_i = c_A \rho_A T(x_i^*)\, \pi r(x_i^*)^2\, \Delta x.$$
The units will be J (joule), because all the units were consistent.

Step 4. The total heat is the sum of $\Delta H_i$, so $H \approx \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*)\, r(x_i^*)^2\, \Delta x$.

Step 5. The approximation will get better as the number of slices gets bigger, so
$$H = \lim_{n\to\infty} \sum_{i=1}^{n} \pi c_A \rho_A T(x_i^*)\, r(x_i^*)^2\, \Delta x = \int_0^L \pi c_A \rho_A T(x)\, r(x)^2\, dx \ \text{J}$$
because the sum in the middle is the Riemann sum corresponding to the function $\pi c_A \rho_A T(x)\, r(x)^2$ on the interval [0, L].

Answer to Exercise 4.

Step 1. We slice the Atlantic ocean into n slices of depth $\Delta d$ between depth 0 and 5 km.

Step 2. When $\Delta d$ is small, oxygen concentration is almost constant on any given slice and is approximately equal to $C(d_i^*)$ for any point $d_i^*$ on the interval number i between $d_i$ and $d_{i+1}$. The area of the slice is approximately $A(d_i^*)$, so its volume is $\Delta V_i \approx A(d_i^*)\, \Delta d$.

Step 3. Using the formula Mass = Concentration × Volume when concentration is constant, we get that $\Delta m_i \approx C(d_i^*)\, A(d_i^*)\, \Delta d$.
The area was given in millions of km², depth in km, and concentration in mg/L. There are $10^{12}$ litres in a cubic kilometer, so when we multiply the units,
$$\text{mg/L} \times 10^6\, \text{km}^2 \times \text{km} = 10^{18}\, \text{mg} = 10^{12}\, \text{kg},$$
we get $10^{12}$ kg. We could also convert $10^{12}$ kg to tonnes: since one tonne is $10^3$ kg, this is $10^9$ tonnes, so we can express the units as billions of tonnes.

Step 4. The total mass is the sum of $\Delta m_i$, so
$$m = \sum_{i=1}^{n} \Delta m_i \approx \sum_{i=1}^{n} C(d_i^*)\, A(d_i^*)\, \Delta d.$$

Step 5. The approximation will get better as the number of slices gets bigger, so
$$m = \lim_{n\to\infty} \sum_{i=1}^{n} C(d_i^*)\, A(d_i^*)\, \Delta d = \int_0^5 C(x)\, A(x)\, dx \ \text{billions of tonnes}$$
because the sum in the middle is the Riemann sum corresponding to the function C(x)A(x) on the interval [0, 5] km.

Answer to Exercise 5. The rate of photosynthesis r(T) will depend on the coordinate x on the leaf, because the temperature T = T(x) varies with x. Let us record that
$$r(T(x)) = 0.36\left(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\right).$$

Step 1. We slice the leaf into n slices of width $\Delta x$ between x = 0 and x = 10 cm.

Step 2. When $\Delta x$ is small, the rate of photosynthesis is almost constant on one slice and is approximately equal to $r(T(x_i^*))$ for any point $x_i^*$ on the interval number i between $x_i$ and $x_{i+1}$. The height of the slice is approximately $f(x_i^*) - g(x_i^*)$, so its area is approximately $(f(x_i^*) - g(x_i^*))\, \Delta x$.

Step 3. Notice that the rate of photosynthesis was given per unit of area, so we can use that Amount = Rate × Area when the rate is constant. As a result, the amount of glucose and oxygen produced during 1 hour in one slice is
$$\Delta A_i \approx r(T(x_i^*))\left(f(x_i^*) - g(x_i^*)\right) \Delta x.$$
The area in the figure is in cm², and the rate is in µmol/cm², so the amount here is in µmol.

Step 4. The total amount is the sum of $\Delta A_i$, so
$$A = \sum_{i=1}^{n} \Delta A_i \approx \sum_{i=1}^{n} r(T(x_i^*))\left(f(x_i^*) - g(x_i^*)\right) \Delta x.$$

Step 5.
The approximation will get better as the number of slices gets bigger, so
$$A = \lim_{n\to\infty} \sum_{i=1}^{n} r(T(x_i^*))\left(f(x_i^*) - g(x_i^*)\right) \Delta x = \int_0^{10} r(T(x))\left(f(x) - g(x)\right) dx \ \mu\text{mol}$$
because the sum in the middle is the Riemann sum corresponding to the function $r(T(x))(f(x) - g(x))$ on the interval [0, 10] cm. At this stage we can replace r(T(x)) by the specific formula above:
$$A = \int_0^{10} 0.36\left(11 + 1.9\sin(x^2/2) - 0.1\sin^2(x^2/2)\right)\left(f(x) - g(x)\right) dx \ \mu\text{mol}.$$

Chapter 4
Differential equations

4.1 Differential equations: qualitative analysis

In this chapter we will study differential equations of the form
$$y' = f(x, y).$$
In this equation, x is an independent variable and y is a function y = y(x). In other words, the equation can be written more precisely as
$$y'(x) = f\left(x, y(x)\right)$$
but we will often write it simply as $y' = f(x, y)$, keeping in mind that y here actually means a function y(x). This function is usually unknown to us and we want to solve this equation to find y(x). A function y(x) is a solution of the equation $y' = f(x, y)$ if it satisfies this equation, which means that if we plug it into the two sides of the equation we get equality. As we will see in the examples, an equation like this will have many solutions depending on the starting point $y(x_0) = y_0$, which is called the initial condition.

• In this section we will focus on understanding some qualitative behaviour of solutions of some equations.
• In the next section we will see how we can approximate solutions using Taylor polynomials and the so-called Euler's method.
• In the section after that we will learn how to find solutions for the so-called separable equations.
• After that, we will do some modelling using differential equations and consider examples of systems of equations.

Let us introduce some additional definitions in the context of an applied example.

Example 1.
By Newton's law of cooling, the temperature of a cup of coffee in a room with temperature 20°C can be modelled by the differential equation
$$y' = \kappa(20 - y)$$
for some positive constant κ > 0, where time t is an independent variable and y = y(t) is the temperature changing with time. We will take κ = 0.02 in this example, as in the figure.

(a) The figure shows several solutions of this equation. What distinguishes these solutions?
(b) One of the solutions is y(t) = 20. Verify that it is indeed a solution.
(c) Why is it sensible to call y(t) = 20 an equilibrium solution?
(d) Using common sense, would you call y(t) = 20 a stable or unstable equilibrium? Why?

Solution: (a) One of the things that distinguishes these solutions is the initial temperature y(0) at time t = 0 or, in other words, the initial condition. We see that the solutions corresponding to temperatures 30, 50 and 70°C are decreasing over time to 20°C, while solutions corresponding to temperatures 10 and −10°C are increasing to 20°C.

(b) To verify that y(t) = 20 is a solution of $y' = \kappa(20 - y)$, we check that $y'(t) = (20)' = 0$ and $\kappa(20 - y(t)) = \kappa(20 - 20) = 0$, so the two sides are indeed equal.

(c) It is sensible to call y(t) = 20 an equilibrium solution, because it reflects that coffee at room temperature will stay at room temperature, so its temperature is in a state of rest or balance.

(d) The equilibrium solution y(t) = 20 is a stable equilibrium because solutions that start close to 20°C do not move away and always stay close, in fact, getting closer and closer to 20°C over time. We will see examples below when a solution starting close to an equilibrium moves away from it over time. Such an equilibrium will be called an unstable equilibrium.

If we are only given the equation $y' = \kappa(20 - y)$ and do not see the figure, how can we check that y(t) = 20 is a stable equilibrium?
We can use the following diagram:

[Diagram for y′ = 0.02(20 − y): the y-axis with the equilibrium 20 marked; the sign of y′ is + to the left of 20 (arrow pointing right) and − to the right of 20 (arrow pointing left).]

Since the derivative y′ is 0.02(20 − y), it depends only on y, so we draw an arrow representing the y-axis and we mark 20 on this axis, which is our equilibrium point. Then we check the sign of the derivative to the left and right of 20. For example, if we plug in y = 10, we get 0.02(20 − 10) = 0.2 which is positive and, if we plug in y = 30, we get 0.02(20 − 30) = −0.2 which is negative. To the left of 20 we draw an arrow in the positive direction to indicate that solutions increase there, because y′(t) > 0, and to the right of 20 we draw an arrow in the negative direction to indicate that solutions decrease there. The diagram shows that the solutions starting near 20 will move towards 20, so it must be a stable equilibrium.

Comment. In the above example, the differential equation was of the type
$$y' = f(y).$$
In other words, the right hand side depends only on y and does not depend on the independent variable x or t explicitly. Such equations are called autonomous equations. Otherwise, the equation is non-autonomous. For example, y′ = y + t is non-autonomous, because of the term +t. We will discuss autonomous vs non-autonomous equations in the examples below.

To find equilibrium solutions of an autonomous equation, we simply find all the points where f(y) = 0. For example, in the above example 0.02(20 − y) = 0 when y = 20, so this was the only equilibrium solution.

Exercise 1. Consider a differential equation y′ = 0.3(y − 2).
(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d) Roughly sketch some examples of solutions.

Example 2. Consider a differential equation y′ = y(1 − y).
(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d)
Roughly sketch some examples of solutions.

Solution: (a) This is an autonomous equation, because the right hand side y(1 − y) depends only on y.

(b) The right hand side y(1 − y) is equal to 0 when y = 0 or y = 1, so there are two equilibrium solutions. To check that y(t) = 1 is a solution, we plug it into y′ = y(1 − y) and we get (1)′ = 1(1 − 1), which holds because both sides are zero. Similarly, y(t) = 0 is a solution because (0)′ = 0(1 − 0).

(c) To decide if these equilibria are stable or unstable, we draw a diagram:

[Diagram for y′ = y(1 − y): the y-axis with the equilibria 0 and 1 marked; the sign of y′ is − to the left of 0 (arrow pointing left), + between 0 and 1 (arrow pointing right), and − to the right of 1 (arrow pointing left).]

From this diagram we see that y = 0 is an unstable equilibrium because solutions that start nearby are moving away from it, and y = 1 is a stable equilibrium because solutions are moving towards it.

(d) In the figure on the right we sketched two equilibrium solutions y = 0 and y = 1, and three more solutions above, below, and in between these points. According to the above diagram, the solutions above and below are decreasing because y′ < 0, and the solution in between is increasing because y′ > 0 there.

This figure contains a new feature called the slope field, which is a collection of arrows that describe in which direction a solution would flow at various points on the plane. Before we jump to the next problem, let us explain how slope fields work exactly.

Slope fields. The figure to the right explains how the slope field is constructed. Given a differential equation y′ = f(x, y):

• We take some regular grid on the xy plane. In the figure, a 3 × 3 grid is illustrated by red dots.
• At each point (a, b) on the grid, we draw a small segment of a line with the slope f(a, b). For example, if f(x, y) = y(1 − y) as in the previous example then, at the point (1.5, 0.5) we draw a segment of the line with the slope f(1.5, 0.5) = 0.5(1 − 0.5) = 0.25.

Sometimes arrows are added like in the above example, but this is not necessary.
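The two construction steps above can be sketched in Python as a computation of the segment endpoints, without any plotting. This is only an illustration: the helper name slope_field, the half-length 0.1 of each segment, and the particular 3 × 3 grid coordinates are assumptions (the grid in the figure may be different); only f(x, y) = y(1 − y) and the point (1.5, 0.5) come from the text.

```python
def slope_field(f, xs, ys, half_length=0.1):
    """Return one short segment per grid point (a, b), with slope f(a, b).

    Each segment is a pair of endpoints ((x1, y1), (x2, y2)) centred at (a, b).
    """
    segments = []
    for a in xs:
        for b in ys:
            m = f(a, b)  # slope that a solution through (a, b) must have
            # Scale the direction vector (1, m) to the chosen half-length.
            scale = half_length / (1.0 + m * m) ** 0.5
            dx, dy = scale, scale * m
            segments.append(((a - dx, b - dy), (a + dx, b + dy)))
    return segments

def f(x, y):
    return y * (1 - y)

# Assumed 3 x 3 grid, chosen so that it contains the point (1.5, 0.5).
grid_x = [0.5, 1.5, 2.5]
grid_y = [-0.5, 0.5, 1.5]
segments = slope_field(f, grid_x, grid_y)

print(f(1.5, 0.5))  # slope of the segment drawn at (1.5, 0.5)
```

Because f does not depend on x here, the segments computed for the same y and different x are parallel, which is the numerical counterpart of an autonomous slope field looking the same along horizontal lines.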
This construction of the slope field is very natural, because a solution y(x) of the equation y′ = f(x, y) passing through a point (a, b) must have the slope f(a, b). As a result, each piece of the slope field represents a segment of the tangent line of a solution passing through that point, so the slope field helps us visualize how the solutions would move at various points of the plane.

We can draw a slope field in Geogebra using the command

SlopeField( f(x, y), n, a, Min x, Min y, Max x, Max y ).

Here Min x, Min y, Max x, Max y are the sides of the rectangle where we want to draw the slope field, the number n is the number of points in the grid in both the horizontal and vertical directions, and the length multiplier a controls the length of each segment, which can be adjusted for visual impact. The slope field in the above example was constructed using

SlopeField( y(1 − y), 60, 0.75, 0, −5, 10, 5 ).

We can also draw a solution starting at a point $(x_0, y_0)$ using the commands

SolveODE( f(x, y), (x0, y0) ) or SolveODE( f(x, y), x0, y0, End x, Step ).

Here End x means up to which point we want the solution to be drawn, and Step means that the solution is drawn in steps of this size. The smaller the step, the smoother the solution looks, although any small enough step like 0.1 will look perfectly smooth.

Exercise 2. Consider a differential equation y′ = y(y − 1).
(a) Is this an autonomous equation?
(b) Find all equilibrium solutions. Verify that they are indeed solutions.
(c) Are they stable or unstable?
(d) Roughly sketch some examples of solutions. Draw the slope field along the line y = 2. Draw the slope field in Geogebra.

Non-autonomous equations. In the next two problems we will take a look at a couple of non-autonomous equations y′ = f(t, y) where the right hand side depends on the independent variable t.
In fact, the equations will be of the special form y′ = g(t)h(y), which are called separable equations, and which we will learn how to solve later on. In this case we can still find equilibrium solutions by solving h(y) = 0.

The next two problems will refer to the following figures. Notice how the slope field changes along horizontal lines, compared to the examples above where the slope field stayed the same along any horizontal line. That is because the slope y′ now changes with t even if y stays constant, because the equation is non-autonomous.

Example 3. Consider the slope field in the left figure above corresponding to the differential equation y′ = ty(1 − y).
(a) By looking at the slope field, find all equilibrium solutions. Check that they are indeed solutions by plugging them into the equation.
(b) Sketch some solutions following along the slope field.
(c) If the initial condition y(0) > 0 is positive, what is the limit $\lim_{t\to\infty} y(t)$?

Solution: (a) By looking at the slope field, we can see that some solutions are flowing towards y = 1 and some solutions are flowing away from y = 0, so it looks like y(t) = 1 and y(t) = 0 could be equilibrium solutions. If we plug y(t) = 1 into the equation y′ = ty(1 − y), we see that (1)′ = t × 1(1 − 1) holds because both sides are equal to zero. Similarly, (0)′ = t × 0(1 − 0) holds because both sides are zero, so y(t) = 0 is also an equilibrium solution. Without looking at the slope field, we could have found these solutions by setting y′ = ty(1 − y) = 0, so y(1 − y) = 0 and y = 0 or y = 1.

(b) We sketched some solutions flowing along the slope field in the above figure.

(c) From the slope field we see that any solution starting above the horizontal axis, when y(0) > 0 is positive, flows towards the equilibrium y = 1, so $\lim_{t\to\infty} y(t) = 1$ in this case.

Exercise 3. Consider the slope field in the right figure above corresponding to the differential equation y′ = sin(πt) sin(πy).
(a) By looking at the slope field, find all equilibrium solutions. Check that they are indeed solutions by plugging them into the equation.
(b) Sketch some solutions following along the slope field.
(c) If the initial condition is y(0) = 0.5, does the solution have a limit $\lim_{t\to\infty} y(t)$?

[Figure: four slope fields, labelled 1, 2, 3 and 4.]

The next two problems will refer to the above figures.

Example 4. Which of the slope fields above correspond to autonomous differential equations and why?

Solution: Slope fields 2 and 3 correspond to autonomous differential equations of the form y′ = f(y), because the slope looks the same along any horizontal line y = c. This is exactly what should happen, because on a horizontal line y = c the slope is also constant, y′ = f(y) = f(c), so the segments should all look the same in the horizontal direction. Notice that slope fields 1 and 4 change along horizontal lines, which is an indication that the right hand side of the equation y′ = f(t, y) depends on t.

Exercise 4. Match the slope fields in the above figure with the following four differential equations:
(a) $y' = (2y + 1)^2$
(b) $y' = (2t + 1)^2$
(c) $y' = \frac{1}{2y + 1}$
(d) $y' = y - t$

Exercise 5. A cup of coffee is sitting in a room with temperature which is initially 20°C and then starts decreasing after the AC has been turned on. By looking at the slope field in the figure which describes the temperature of a cup of coffee, at what time was the AC turned on? The units on the x-axis are minutes. Sketch some solutions flowing along this slope field.

Answer to Exercise 1. (a) This is an autonomous equation, because the right hand side 0.3(y − 2) depends only on y.

(b) 0.3(y − 2) = 0 when y = 2. To check that y(t) = 2 is a solution, we plug it into y′ = 0.3(y − 2) and we get (2)′ = 0.3(2 − 2), which holds because both sides are zero.
(c) We draw a sign diagram for y′ = 0.3(y − 2) along the y-axis: y′ is negative (−) for y < 2 and positive (+) for y > 2. From this diagram we see that y = 2 is an unstable equilibrium, because solutions that start nearby are moving away from it.
(d) In the figure we sketched an equilibrium solution y = 2 and two solutions above and below moving away from it. The solution above y = 2 is increasing, because we saw in the diagram that the derivative is positive there, and the solution below y = 2 is decreasing because y′ < 0. Any increasing and decreasing shapes would be fine here, but in the next section we will learn how to check if the solution is concave up or concave down. As a quick preview: y′ = 0.3(y − 2) implies that

$$y'' = (y')' = (0.3(y - 2))' = 0.3y' = 0.3 \times 0.3(y - 2) = 0.09(y - 2).$$

We see that y′′ > 0 when y > 2, where solutions must be concave up, and y′′ < 0 when y < 2, where solutions must be concave down.

Answer to Exercise 2. (a) This is an autonomous equation, because the right hand side y(y − 1) depends only on y.
(b) The right hand side y(y − 1) is equal to 0 when y = 0 or y = 1, so there are two equilibrium solutions. To check that y(t) = 1 is a solution, we plug it into y′ = y(y − 1) and we get (1)′ = 1(1 − 1), which holds because both sides are zero. Similarly, y(t) = 0 is a solution because (0)′ = 0(0 − 1).
(c) To decide if these equilibria are stable or unstable, we draw a sign diagram for y′ = y(y − 1) along the y-axis: y′ is positive (+) for y < 0, negative (−) for 0 < y < 1, and positive (+) for y > 1. From this diagram we see that y = 1 is an unstable equilibrium because solutions that start nearby are moving away from it, and y = 0 is a stable equilibrium because solutions are moving towards it.
(d) In the figure on the right we sketched two equilibrium solutions y = 0 and y = 1, and three more solutions above, below, and in between these points. According to the above diagram, the solutions above and below are increasing because y′ > 0, and the solution in between is decreasing because y′ < 0 there.
We drew the slope field in Geogebra, but the slope field around level y = 2 could be drawn by hand by computing y′ = y(y − 1) = 2(2 − 1) = 2 and drawing a bunch of line segments with the slope 2 along the horizontal line y = 2. In the figure, the region near y = 2 is emphasized by the red dashed lines.

Answer to Exercise 3. (a) By looking at the slope field, we can see that the slope is horizontal at y = −2, −1, 0, 1, 2 and 3, so these look like equilibrium solutions. For example, if we plug y(t) = 1 into the equation y′ = sin(πt) sin(πy), we see that (1)′ = sin(πt) sin(π) holds because both sides are equal to zero. Without looking at the slope field, we see that y′ = sin(πt) sin(πy) = 0 when sin(πy) = 0, which holds when πy is of the form 0, ±π, ±2π, etc. In other words, when y is any integer 0, ±1, ±2, etc.
(b) We sketched the equilibrium solutions and one non-equilibrium solution flowing along the slope field in the figure. We see that solutions can now fluctuate by decreasing and increasing.
(c) For the same reason, it looks like the limit $\lim_{t\to\infty} y(t)$ does not exist, unless we start exactly at one of the equilibrium solutions.

Answer to Exercise 4. From the previous example we know that 2 and 3 are autonomous equations, so they must be (a) or (c). We can see from the figure that in 2 the slope is decreasing as y increases and in 3 the slope is increasing as y increases, so we can match 2(c) and 3(a). We could also match by noticing that 3 has a horizontal slope line, i.e. an equilibrium solution, below the x-axis, while the right hand side 1/(2y + 1) in (c) can never be equal to zero. Next, we can notice that the slope field 4 is the same along vertical lines. This is an indication that it corresponds to an equation of the form y′ = f(t), because on a vertical line t is a constant, t = c. This allows us to match 4(b), because (b) y′ = (2t + 1)² is the only equation where the right hand side depends only on t. This leaves 1(d).
Another way we could match 4(b) and 1(d) is to check when the slope is zero in (b) and (d). In (b), the slope is zero when y′ = (2t + 1)² = 0, so t = −0.5. In (d), the slope is zero when y′ = y − t = 0, so y = t. We can see that in slope field 4 the slopes are zero around t = −0.5, and in slope field 1 the slopes are zero around the diagonal line y = t.

Answer to Exercise 5. The AC was turned on at t = 30 minutes, because that is when the slope field starts changing along the horizontal lines. A couple of solutions are sketched in the figure.

4.2 Differential equations: approximations

In this section we will learn how to approximate the solution of the initial value problem

$$y' = f(x, y), \quad y(a) = b.$$

Here, y(a) = b is the initial condition.

Euler’s method. First, we will use Euler’s method to approximate the solution of the above initial value problem. The idea is to move along the tangent line in small steps ∆x = h as follows. Since we want to approximate the solution y = y(x) of the above differential equation, let us recall that

$$T(x) = y(a) + y'(a)(x - a)$$

is its tangent line at x = a. We already know y(a) because we are given the initial value y(a) = b. Because y(x) is the solution of the equation y′ = f(x, y), we can use it to compute the slope y′(a) = f(a, y(a)) = f(a, b) at x = a. So the tangent line at x = a is:

$$T(x) = b + f(a, b)(x - a).$$

As the above figure illustrates, if we move a small step h from x = a to x = a + h along this tangent line then y = b will become y = b + f(a, b)h. If the step h is small then the tangent line should be a good approximation of the actual solution, and our new point (a + h, b + f(a, b)h) is close to the solution at the new point x = a + h. Then we can think of this point as our new initial condition and repeat the process, i.e. again move along another tangent line at that point.
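The repeated tangent-line step described above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the course material: the function name `euler`, its arguments, and the sample equation y′ = 0.02(20 − y) with y(0) = 70 and h = 20 are choices made here for the sake of the example.

```python
def euler(f, a, b, h, n):
    """Take n Euler steps of size h for y' = f(x, y) starting from y(a) = b.

    Returns the list of points (x_k, y_k), including the starting point.
    """
    x, y = a, b
    points = [(x, y)]
    for _ in range(n):
        y = y + f(x, y) * h  # move along the tangent line at the current point
        x = x + h            # the new point becomes the next initial condition
        points.append((x, y))
    return points


# Illustrative right hand side (an assumption made for this sketch):
# a cooling equation y' = 0.02(20 - y).
def cooling(t, y):
    return 0.02 * (20 - y)


steps = euler(cooling, 0, 70, 20, 4)
for t, y in steps:
    print(t, round(y, 4))
```

Calling `euler` with a smaller `h` and proportionally more steps `n` covers the same interval with shorter tangent segments, which is the usual way to trade more computation for a closer approximation.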
In the figure above, we started at some point on the y-axis and then made 4 steps, each time updating the next slope according to our position. Let us show how this is done on a specific problem.

Example 1. The above figure comes from the example of a cup of coffee cooling down at room temperature. The starting temperature is y(0) = 70 °C and the differential equation is y′ = 0.02(20 − y). Describe the first four steps of Euler’s method with the step h = 20 minutes. If the actual solution at time t = 80 is equal to 30.09 °C, does Euler’s method underestimate or overestimate it?

Solution: In the above figure, Euler’s method steps are given by the solid lines and the actual solution is the dashed curve. In this problem, f(t, y) = 0.02(20 − y).

Step 1. Our starting point is (0, 70) and the slope at this point is f(0, 70) = 0.02(20 − 70) = −1. This means that on the first interval from t = 0 to t = 20, the tangent line is y = 70 − t. If we move along this line, at time t = 20 we end up at y = 70 − 20 = 50.

Step 2. After the first step we are at the point (20, 50). The slope at this point is f(20, 50) = 0.02(20 − 50) = −0.6. This means that on the second interval from t = 20 to t = 40, the tangent line is y = 50 − 0.6(t − 20). If we move along this line, at time t = 40 we end up at y = 50 − 0.6(40 − 20) = 38.

Step 3. After the second step we are at the point (40, 38). The slope at this point is f(40, 38) = 0.02(20 − 38) = −0.36. This means that on the third interval from t = 40 to t = 60, the tangent line is y = 38 − 0.36(t − 40). If we move along this line, at time t = 60 we end up at y = 38 − 0.36(60 − 40) = 30.8.

Step 4. After the third step we are at the point (60, 30.8). The slope at this point is f(60, 30.8) = 0.02(20 − 30.8) = −0.216. This means that on the fourth interval from t = 60 to t = 80, the tangent line is y = 30.8 − 0.216(t − 60).
If we move along this line, at time t = 80 we end up at y = 30.8 − 0.216(80 − 60) = 26.48. Since the actual solution at time t = 80 is equal to 30.09, Euler’s method underestimated it, which agrees with the figure above.

Comment. The Euler’s method approximation above at t = 80 was not very accurate, but the steps of h = 20 we took along the tangent lines were quite large. If we took steps of size h = 1, our Euler’s method approximation would give us 29.93, which is much better even though the step h = 1 is not so small. With the step size h = 0.1, Euler’s method would produce 30.07 at time t = 80, so the approximation gets better and better as the step size gets smaller. In Wolfram Alpha, one can use the following command to see the answer 30.07 at time t = 80 with the step size h = 0.1:

Euler’s method y′ = 0.02(20 − y), y(0) = 70, from 0 to 80, stepsize = 0.1

We can, of course, change the equation, the interval and the step size.

Exercise 1. A cup of iced coffee is sitting at room temperature. The starting temperature is y(0) = 0 °C and the differential equation is y′ = 0.02(20 − y). Describe the first four steps of Euler’s method with the step h = 5 minutes. If the actual solution at time t = 20 is equal to 6.59 °C, does Euler’s method underestimate or overestimate it? Use Wolfram Alpha to find the approximation at time t = 20 with the step size h = 0.1.

Comment. In the above two problems we saw that Euler’s method underestimated or overestimated the actual solution. The reason was quite simple.
• In the region where solutions are concave up, the tangent lines are below, so moving along the tangent lines will underestimate the actual solutions.
• In the region where solutions are concave down, the tangent lines are above, so moving along the tangent lines will overestimate the actual solutions.

Example 2. In the figure on the right, will Euler’s method overestimate or underestimate the solution starting from y(0) = 4.5?
Solution: That solution seems to be concave down for a while and then becomes concave up. The slope field in the region below the equilibrium solution y = 6 also seems to be concave down or concave up in different parts. As a result, we cannot say for sure whether Euler’s method will overestimate or underestimate this solution.

Exercise 2. In the figure above, will Euler’s method overestimate or underestimate the solution starting from y(0) = 7 in the visible region?

Taylor polynomial approximation. Next, we will use Taylor polynomials to approximate the solution of the initial value problem near the starting point x = a. In this section, we will only use Taylor polynomials of degree n = 2, but the same procedure can be iterated to find higher degree Taylor polynomial approximations if necessary. Since we want to approximate the solution y = y(x) of the above differential equation, let us recall that

$$P_2(x) = y(a) + y'(a)(x - a) + \frac{y''(a)}{2}(x - a)^2$$

is its Taylor polynomial of degree n = 2 centered at x = a. We already know y(a) because we are given the initial value y(a) = b. Because y(x) is the solution of the equation y′ = f(x, y), we can use it to compute y′(a) = f(a, y(a)) = f(a, b). So automatically we know that y(a) = b, y′(a) = f(a, b) and it only remains to compute y′′(a). Again, because y′(x) = f(x, y(x)), we can differentiate this equation to find y′′(x) = (y′(x))′. Let us illustrate this on a specific example.

Example 3. If y(t) is the solution of

$$y' = \sin(\pi(t + y)), \quad y\left(\tfrac{1}{4}\right) = \tfrac{1}{2},$$

find its Taylor polynomial of degree 2 centred at $\tfrac{1}{4}$.

Solution: We are given that $y(\tfrac{1}{4}) = \tfrac{1}{2}$ and, using the differential equation, we can compute the first derivative

$$y'\left(\tfrac{1}{4}\right) = \sin\left(\pi\left(\tfrac{1}{4} + y\left(\tfrac{1}{4}\right)\right)\right) = \sin\left(\pi\left(\tfrac{1}{4} + \tfrac{1}{2}\right)\right) = \sin\left(\tfrac{3\pi}{4}\right) = \tfrac{1}{\sqrt{2}}.$$

To find $y''(\tfrac{1}{4})$, we differentiate the equation using the chain rule and keeping in mind that y is a function of t:

$$y''(t) = \left(y'(t)\right)' = \left(\sin(\pi(t + y))\right)' = \cos(\pi(t + y))\left(\pi(t + y)\right)' = \cos(\pi(t + y))\,\pi(1 + y').$$

We now plug in $t = \tfrac{1}{4}$ and use that $y = \tfrac{1}{2}$ and $y' = \tfrac{1}{\sqrt{2}}$ at $t = \tfrac{1}{4}$:

$$y''\left(\tfrac{1}{4}\right) = \cos\left(\pi\left(\tfrac{1}{4} + \tfrac{1}{2}\right)\right)\pi\left(1 + \tfrac{1}{\sqrt{2}}\right) = \cos\left(\tfrac{3\pi}{4}\right)\pi\left(1 + \tfrac{1}{\sqrt{2}}\right) = -\tfrac{1}{\sqrt{2}}\,\pi\left(1 + \tfrac{1}{\sqrt{2}}\right) = -\pi\left(\tfrac{1}{\sqrt{2}} + \tfrac{1}{2}\right).$$

As a result, the Taylor polynomial of degree 2 centred at $t = \tfrac{1}{4}$ is

$$P_2(t) = \tfrac{1}{2} + \tfrac{1}{\sqrt{2}}\left(t - \tfrac{1}{4}\right) - \tfrac{\pi}{2}\left(\tfrac{1}{\sqrt{2}} + \tfrac{1}{2}\right)\left(t - \tfrac{1}{4}\right)^2.$$

In the above figure, this Taylor polynomial is graphed as a dashed green curve, while the actual solution is graphed as a solid blue line, both for $t \ge \tfrac{1}{4}$. We can see that the approximation works well only if we are not far from the starting time $t = \tfrac{1}{4}$.

Comment. Here we only compute second degree Taylor polynomials but, once we have computed the second derivative y′′(x), we can differentiate it once again to find y′′′(x) and the third degree Taylor polynomial. We can repeat this process to find Taylor polynomials of any degree. In another direction, we could use the Taylor polynomial of degree 2 only on a small interval of step h, which would give us a more accurate version of Euler’s method.

Exercise 3. If y(x) is the solution of the differential equation y′ = sin(x²y) with the initial condition y(1) = π, find its Taylor polynomial of degree 2 centred at x = 1.

Exercise 4. Consider a differential equation y′ = f(y) where the plot of f(y) is given in the figure. Notice that the horizontal axis corresponds to the variable y. If y(5) = 3, is the solution y(t) concave up or down at time t = 5?

Answer to Exercise 1. Step 1. Our starting point is (0, 0) and the slope at this point is f(0, 0) = 0.02(20 − 0) = 0.4. This means that on the first interval from t = 0 to t = 5, the tangent line is y = 0.4t. If we move along this line, at time t = 5 we end up at y = 2.

Step 2. After the first step we are at the point (5, 2).
The slope at this point is f(5, 2) = 0.02(20 − 2) = 0.36. This means that on the second interval from t = 5 to t = 10, the tangent line is y = 2 + 0.36(t − 5). If we move along this line, at time t = 10 we end up at y = 2 + 0.36(10 − 5) = 3.8.

Step 3. After the second step we are at the point (10, 3.8). The slope at this point is f(10, 3.8) = 0.02(20 − 3.8) = 0.324. This means that on the third interval from t = 10 to t = 15, the tangent line is y = 3.8 + 0.324(t − 10). If we move along this line, at time t = 15 we end up at y = 3.8 + 0.324(15 − 10) = 5.42.

Step 4. After the third step we are at the point (15, 5.42). The slope at this point is f(15, 5.42) = 0.02(20 − 5.42) = 0.2916. This means that on the fourth interval from t = 15 to t = 20, the tangent line is y = 5.42 + 0.2916(t − 15). If we move along this line, at time t = 20 we end up at y = 5.42 + 0.2916(20 − 15) = 6.878.

Since the actual solution at time t = 20 is 6.5935, Euler’s method overestimated it. Using the following command in Wolfram Alpha

Euler’s method y′ = 0.02(20 − y), y(0) = 0, from 0 to 20, stepsize = 0.1

we get that Euler’s method with the step size h = 0.1 gives 6.5989, which is a much better approximation.

Answer to Exercise 2. The slope field in the region above the equilibrium solution y = 6 seems to be concave up everywhere, and the solution looks concave up. As a result, Euler’s method will underestimate this solution in this region.

Answer to Exercise 3. We are given that y(1) = π and, using the differential equation, we can compute y′(1) = sin(1² · π) = sin(π) = 0. To find y′′(1), we differentiate the equation using the chain rule and keeping in mind that y is a function of x:

$$y''(x) = \left(y'(x)\right)' = \left(\sin(x^2 y)\right)' = \cos(x^2 y)\left(x^2 y\right)' = \cos(x^2 y)\left(2xy + x^2 y'\right).$$

We now plug in x = 1 and use that y = π and y′ = 0 at x = 1:

$$y''(1) = \cos(1^2 \cdot \pi)\left(2 \cdot 1 \cdot \pi + 1^2 \cdot 0\right) = -2\pi.$$
As a result, the Taylor polynomial of degree 2 centred at x = 1 is

$$P_2(x) = \pi + 0(x - 1) - \frac{2\pi}{2}(x - 1)^2 = \pi - \pi(x - 1)^2.$$

In the figure, this Taylor polynomial is graphed as a dashed green curve, while the actual solution is graphed as a solid blue line, both for x ≥ 1. We can see that this approximation works well if we are not far from x = 1.

Answer to Exercise 4. To decide if y(t) is concave up or down at time t = 5, we need to compute the second derivative y′′(t). Since y′ = f(y), differentiating this equation gives

$$y'' = (f(y))' = f'(y)\,y' = f'(y) f(y),$$

where in the last step we replaced y′ by f(y). At time t = 5,

$$y''(5) = f'(y(5)) f(y(5)) = f'(3) f(3),$$

because we are given that y(5) = 3. Looking at the figure, we see that f(3) is negative, but its slope f′(3) is positive, so y′′(5) < 0 and y(t) is concave down at t = 5.

4.3 Separable differential equations

In this section we will learn how to solve separable equations of the form

$$y' = f(x)g(y) \quad\text{or}\quad y' = \frac{f(x)}{g(y)} \quad\text{or}\quad y' = \frac{g(y)}{f(x)}.$$

In other words, on the right hand side the variables x and y are separated into a product or ratio of two functions, where one of the functions depends only on x and the other one depends only on y. Examples of such differential equations are

$$y' = 0.5y, \quad y' = xy, \quad y' = \frac{x}{1+y}, \quad y' = e^{x+y} = e^x \cdot e^y, \quad y' = \frac{x(1-y)}{y(1+x)} = \frac{x}{1+x} \cdot \frac{1-y}{y}.$$

Let us illustrate on a specific example how we can solve such equations.

Example 1. Find the solution of the separable differential equation y′ = xy with the initial condition y(0) = −2. What if the initial condition is not given?

Solution: First, let us show how to solve such an equation formally, and after that we will explain why these steps make sense even though they might look a bit strange at first.

Step 1. We write the derivative as $y' = \frac{dy}{dx}$ and rewrite the equation as

$$\frac{dy}{dx} = xy.$$

Step 2.
We move all the terms containing the variable y to one side and all the terms containing the variable x to the other side:

$$\frac{dy}{y} = x\,dx.$$

Here we treat dy and dx formally as numbers, so we can move them around using algebra rules.

Step 3. We now integrate both sides formally and find antiderivatives:

$$\int \frac{dy}{y} = \int x\,dx \implies \ln|y| = \frac{x^2}{2} + C.$$

We do not need to write +C on both sides, because those indeterminate constants can be combined into one constant anyway. At this step, it is very important to remember that $\int \frac{dy}{y} = \ln|y| + C$ and not ln(y) + C, i.e. we should not forget the absolute value |y|. For example, the initial condition is y(0) = −2, so if we forgot the absolute value and wrote ln(y) + C, ln(−2) would be undefined because we cannot plug negative values into the logarithm.

Step 4 (with initial condition). First, we plug in the initial condition y(0) = −2:

$$\ln|-2| = \frac{0^2}{2} + C.$$

This gives us that the constant C = ln 2, so the equation is

$$\ln|y| = \frac{x^2}{2} + \ln 2.$$

Then we solve for y if possible. In some cases, the equation may be too complicated to solve for y, so we can leave it as is and think of y = y(x) as an implicit solution of this equation. In this particular problem, we can solve for y:

$$\ln|y| = \frac{x^2}{2} + \ln 2 \implies |y| = e^{x^2/2} e^{\ln 2} = 2e^{x^2/2}.$$

Now, we recall that |y| = y if y is positive and |y| = −y if y is negative. In this case, we start with the negative initial condition y(0) = −2 which tells us that y is negative, so we can finally write

$$-y = e^{x^2/2} e^{\ln 2} = 2e^{x^2/2} \implies y = -2e^{x^2/2}.$$

We found the solution $y = -2e^{x^2/2}$.

Step 4 (without initial condition). If we do not have the initial condition then we leave the constant C in Step 3 indeterminate. Again, we can try to solve for y if possible, or leave it as is and think of y = y(x) as an implicit solution. In this particular problem, we can solve for y:

$$\ln|y| = \frac{x^2}{2} + C \implies |y| = e^C e^{x^2/2} \implies y = (\pm e^C) e^{x^2/2} \implies y = Be^{x^2/2}.$$
In the last step we renamed $\pm e^C$ as another indeterminate constant B to keep the expression simple. If we have the initial condition then we can determine B at this step. For example, we know that y(0) = −2, so $-2 = Be^0 = B$, so B = −2 and $y = -2e^{x^2/2}$ again.

Comment. The first three steps above are easy to follow in practice, but they hide an implicit integration by substitution. To make sense of these steps, we could have rewritten the original equation y′ = xy as

$$y'(x) = x\,y(x) \quad\text{or}\quad \frac{y'(x)}{y(x)} = x$$

and, since both sides are functions of x, their indefinite integrals are the same:

$$\int \frac{y'(x)}{y(x)}\,dx = \int x\,dx.$$

On the left hand side, if we make the substitution y = y(x), we can rewrite it as $\int \frac{dy}{y}$, so we arrive at Step 3 above in a way that makes more sense. Of course, in practice we can skip this substitution step and follow the above formal steps.

Exercise 1. A cup of coffee is sitting at room temperature of 20 °C. The coffee temperature satisfies the equation y′ = 0.1(20 − y) and its initial temperature is y(0) = 90 °C. What is the temperature y(t) at time t?

Example 2. A common model of tumour growth is described by the differential equation

$$y' = bce^{-ct}y$$

for some positive constants b > 0 and c > 0. Find the solutions of this equation. Write down the solution corresponding to b = c = 1 and the initial condition $y(0) = \frac{1}{e}$.

Solution: This is a separable equation, so we go through the standard steps:

$$\frac{dy}{dt} = bce^{-ct}y \implies \frac{dy}{y} = bce^{-ct}\,dt \implies \int \frac{dy}{y} = \int bce^{-ct}\,dt \implies \ln|y| = \frac{bc}{-c}e^{-ct} + C = -be^{-ct} + C.$$

Exponentiating both sides, we get $|y| = e^C e^{-be^{-ct}}$. Since the size of the tumour y is positive, |y| = y and so $y = e^C e^{-be^{-ct}}$. We can also rename $e^C$ as a and write the general solution as $y = ae^{-be^{-ct}}$. This family of functions is called the Gompertz growth functions. The constant a can be determined from the initial condition. If b = c = 1 then the general solution is $y = ae^{-e^{-t}}$.
Plugging in the initial condition $y(0) = \frac{1}{e}$ we get

$$\frac{1}{e} = ae^{-e^{-0}} = ae^{-1} = \frac{a}{e},$$

so a = 1 and the solution is $y = e^{-e^{-t}}$.

Exercise 2. When a crow sitting on a tree drops a piece of cheese, the angle θ of the fox’s line of sight to the cheese is changing according to the equation

$$\frac{d\theta}{dt} = -\frac{gt}{d}\cos^2(\theta),$$

where g = −9.8 m/s² and d is the distance from the fox to where the cheese will land. Find θ(t) if $\theta(0) = \frac{\pi}{4}$. Hint: (tan(θ))′ = sec²(θ).

Exercise 3. A rumour¹ is spreading among 100 people at a party starting from one person at time t = 0. The rumour spreads at a rate proportional to the number y of people who already know the rumour and proportional to the number 100 − y of people who do not yet know the rumour, namely:

$$\frac{dy}{dt} = \kappa y(100 - y),$$

where κ > 0 is some constant, and where time t is measured in minutes.
(a) Find y(t). Hint: use that $\frac{1}{y(100-y)} = \frac{1}{100}\left(\frac{1}{y} + \frac{1}{100-y}\right)$.
(b) If 3 people know the rumour after 1 minute, find κ.
(c) At what time will 95 people know the rumour?

¹ Image from freepik.com

Exercise 4. Water is flowing through a hole at the bottom of a cylindrical container, as shown in the figure. According to Torricelli’s law, when the water level is at height h, the speed v of flow of water through the hole is given by the formula

$$v = \sqrt{2gh}.$$

Suppose that the hole has area a and the area of the horizontal cross-section of the container is A. Let us derive and solve the differential equation for the water level h(t) at time t.
(a) If between time t and t + ∆t the water level changed by ∆h, using that the cross-section area of the container is A, what is the volume ∆V of water that escaped during this time? Warning: the water level is decreasing, so the change ∆h is negative!
(b) By Torricelli’s law, at time t the water was flowing out at the speed $v = \sqrt{2gh(t)}$.
Using that the water hole has area a and the speed does not change much over a very short period ∆t, what is the approximate volume ∆V of the water jet that escaped during this time?
(c) Set the volumes ∆V in (a) and (b) equal to each other and derive a differential equation for h(t) by taking the limit ∆t → 0.
(d) Solve this equation with the initial condition h(0) = H. How long will it take for the container to empty?

Answer to Exercise 1. We start with the usual steps:

$$\frac{dy}{dt} = 0.1(20 - y) \implies \frac{dy}{20 - y} = 0.1\,dt \implies \int \frac{dy}{20 - y} = \int 0.1\,dt \implies -\ln|20 - y| = 0.1t + C.$$

The minus sign in front of −ln|20 − y| is because of our usual rule $\int f(m + \kappa x)\,dx = \frac{1}{\kappa}F(m + \kappa x) + C$ if F is an antiderivative of f. We plug in the initial value y(0) = 90 to find C: −ln|20 − 90| = 0.1(0) + C, so C = −ln 70 and the equation becomes −ln|20 − y| = 0.1t − ln 70 or ln|20 − y| = −0.1t + ln 70. Exponentiating both sides: $|20 - y| = 70e^{-0.1t}$. Finally, we need to decide if |20 − y| = 20 − y or |20 − y| = −(20 − y) = y − 20. Since the initial temperature is 90, the correct choice is |20 − y| = y − 20, so $y - 20 = 70e^{-0.1t}$ or $y = 20 + 70e^{-0.1t}$.

Answer to Exercise 2. We start with the usual steps (and use $\frac{1}{\cos^2(\theta)} = \sec^2(\theta)$):

$$\frac{d\theta}{dt} = -\frac{gt}{d}\cos^2(\theta) \implies \frac{d\theta}{\cos^2(\theta)} = \sec^2(\theta)\,d\theta = -\frac{gt}{d}\,dt \implies \int \sec^2(\theta)\,d\theta = -\int \frac{gt}{d}\,dt \implies \tan(\theta) = -\frac{gt^2}{2d} + C.$$

Plugging in the initial condition $\theta(0) = \frac{\pi}{4}$, we get $\tan(\frac{\pi}{4}) = -\frac{g(0)^2}{2d} + C$, so 1 = C and $\tan(\theta) = 1 - \frac{gt^2}{2d}$. We can solve this for θ by taking the inverse tangent: $\theta = \tan^{-1}\left(1 - \frac{gt^2}{2d}\right)$.

Answer to Exercise 3. (a) We start with the usual steps:

$$\frac{dy}{dt} = \kappa y(100 - y) \implies \frac{dy}{y(100 - y)} = \kappa\,dt \implies \int \frac{dy}{y(100 - y)} = \int \kappa\,dt.$$

Using the hint $\frac{1}{y(100-y)} = \frac{1}{100}\left(\frac{1}{y} + \frac{1}{100-y}\right)$, we can easily find the integral on the left hand side, so

$$\frac{1}{100}\left(\ln|y| - \ln|100 - y|\right) = \kappa t + C \quad\text{or}\quad \ln|y| - \ln|100 - y| = 100\kappa t + 100C.$$
Plugging in the initial condition y(0) = 1, we get that 100C = −ln 99, so the equation can be rewritten as

$$\ln|y| - \ln|100 - y| = \ln\frac{|y|}{|100 - y|} = 100\kappa t - \ln 99.$$

Exponentiating both sides, we get

$$\frac{|y|}{|100 - y|} = \frac{1}{99}e^{100\kappa t}.$$

Because the number of people who know the rumour is always between 0 and 100, both y and 100 − y are positive, so we can forget about the absolute values and write the equation as $\frac{y}{100 - y} = \frac{1}{99}e^{100\kappa t}$. It remains to solve it for y:

$$y = \frac{1}{99}e^{100\kappa t}(100 - y) = \frac{100}{99}e^{100\kappa t} - \frac{1}{99}e^{100\kappa t}y \implies y + \frac{1}{99}e^{100\kappa t}y = \frac{100}{99}e^{100\kappa t} \implies y\left(1 + \frac{1}{99}e^{100\kappa t}\right) = \frac{100}{99}e^{100\kappa t} \implies y = \frac{100e^{100\kappa t}}{99 + e^{100\kappa t}}.$$

This solution is an example of the so-called logistic function.

(b) If y(1) = 3 then plugging this into our solution above gives $3 = \frac{100e^{100\kappa}}{99 + e^{100\kappa}}$. Solving this for κ:

$$3(99 + e^{100\kappa}) = 100e^{100\kappa} \implies 297 + 3e^{100\kappa} = 100e^{100\kappa} \implies e^{100\kappa} = \frac{297}{97} \implies \kappa = \frac{1}{100}\ln\frac{297}{97} = 0.0111\ldots$$

(c) 95 people will know the rumour when $\frac{100e^{100\kappa t}}{99 + e^{100\kappa t}} = 95$, so

$$95(99 + e^{100\kappa t}) = 100e^{100\kappa t} \implies 9405 + 95e^{100\kappa t} = 100e^{100\kappa t} \implies e^{100\kappa t} = \frac{9405}{5} = 1881 \implies t = \frac{\ln(1881)}{100\kappa} \approx 6.7 \text{ min}.$$

Answer to Exercise 4. (a) Since ∆h is negative, −∆h is the height by which the water level decreased. Since the cross-section area is A, the volume of water that escaped is Area × Height = A × (−∆h) = −A∆h.

(b) If we imagine the water jet flowing from the hole as a small cylinder, its cross-section is the area of the hole a. During the short time period ∆t, the speed is almost constant, v(t), so the jet moved by

$$\text{Height} = \text{Speed} \times \text{Time} \approx v(t)\Delta t = \sqrt{2gh(t)}\,\Delta t,$$

where in the last step we used Torricelli’s law. Multiplying this by the area a gives us the volume of the water jet, $\Delta V \approx a\sqrt{2gh(t)}\,\Delta t$.

(c) The volumes we found in (a) and (b) should be equal, so $-A\Delta h \approx a\sqrt{2gh(t)}\,\Delta t$. Dividing both sides by ∆t we get $-A\frac{\Delta h}{\Delta t} \approx a\sqrt{2gh(t)}$.
When we take the limit ∆t → 0, the approximation will get better and better and $\frac{\Delta h}{\Delta t}$ will become h′(t), so we finally obtain the differential equation

$$-Ah'(t) = a\sqrt{2gh(t)} = a\sqrt{2g}\sqrt{h(t)}.$$

(d) To solve this equation, we follow the usual steps:

$$\frac{dh}{dt} = -\frac{a\sqrt{2g}}{A}\sqrt{h} \implies \frac{dh}{\sqrt{h}} = -\frac{a\sqrt{2g}}{A}\,dt \implies \int \frac{dh}{\sqrt{h}} = -\int \frac{a\sqrt{2g}}{A}\,dt \implies 2\sqrt{h} = -\frac{a\sqrt{2g}}{A}t + C.$$

Using the initial condition h(0) = H, we find the constant $C = 2\sqrt{H}$, so

$$2\sqrt{h} = -\frac{a\sqrt{2g}}{A}t + 2\sqrt{H} \implies h = \left(-\frac{a\sqrt{2g}}{2A}t + \sqrt{H}\right)^2.$$

The container will become empty when h = 0, so when $-\frac{a\sqrt{2g}}{2A}t + \sqrt{H} = 0$, or

$$t = \frac{2A}{a\sqrt{2g}}\sqrt{H}.$$

4.4 Lotka-Volterra predator-prey model

In this section we will investigate the famous Lotka-Volterra predator-prey model. We imagine a simplified situation when:
• one predator species (for example, foxes) shares the same territory with one prey species (for example, rabbits);
• without rabbits, foxes would not have enough food (berries, other prey, etc.) and would generally die out;
• rabbits have unlimited resources (food) and, in the absence of foxes, would multiply exponentially.

Let us denote by x(t) the population of rabbits and by y(t) the population of foxes at time t. We should remember that the units can be in hundreds, or thousands, etc., so x(t) = 0.5 does not mean half a rabbit. Without encounters between foxes and rabbits (in other words, if foxes did not eat rabbits), these populations would satisfy the equations:

$$x'(t) = ax, \quad y'(t) = -cy$$

for some constants a > 0 and c > 0. The population of rabbits would grow at a rate proportional to the number of rabbits, and the population of foxes would die out at a rate proportional to the number of foxes. We know that these correspond to exponential growth and exponential decay.

Since foxes and rabbits share the same territory, it is reasonable to assume that the rate of encounters is proportional to the product of populations xy. Indeed, imagine that a rabbit encounters 1 fox per month.
If the population of foxes doubles then it is reasonable that the number of encounters per month would also double to 2. So the number of encounters of one rabbit with foxes per month is proportional to y, and the total number of encounters is proportional to xy. The Lotka-Volterra model modifies the above equations to account for these encounters:

$$x'(t) = ax - bxy, \quad y'(t) = -cy + dxy$$

for some positive constants a, b, c and d. The second term −bxy in the rate x′(t) of the rabbit population change is there because some proportion of the encounters will lead to rabbits being killed and eaten. The second term +dxy in the rate y′(t) of the fox population change is there because those encounters yield food for foxes, so they allow them to survive and procreate. Simply put, encounters are bad for rabbits and good for foxes.

In the rest of this section we will analyze this model and, for simplicity, will take the coefficients to be all equal to 1, a = b = c = d = 1, so

$$x'(t) = x - xy, \quad y'(t) = -y + xy.$$

We will break the analysis of the model into two exercises with relatively simple steps. In the exercises, we always assume that the derivatives x′(t) and y′(t) satisfy the above Lotka-Volterra equations. In the first exercise, we will draw the pair of populations x(t) and y(t) as a point (x(t), y(t)) on the xy-plane, and will try to imagine what kind of trajectory this point (x(t), y(t)) will follow over time. Before we begin, let us give a simple example.

Example 1. Suppose that an ant is walking on a flat surface and its coordinates at time t are given by x(t) = cos(t) and y(t) = sin(t).
(a) Describe the trajectory of the ant on the plane.
(b) Notice that x′(t) = −y and y′(t) = x. If we only knew these equations, what could we say about the trajectory of the ant?
Solution: (a) We know that cos(θ) and sin(θ) are the coordinates of a point on the unit circle corresponding to the angle θ, so if we think of time t as an angle, we know that the ant is moving along this unit circle. Another way to see this is to notice that x²(t) + y²(t) = cos²(t) + sin²(t) = 1, so the position (x(t), y(t)) of the ant at time t is on the unit circle x² + y² = 1.

(b) Now imagine that we only know that x′(t) = −y and y′(t) = x. In the first quadrant where x > 0 and y > 0, the signs of the derivatives are x′(t) = −y < 0 and y′(t) = x > 0, so the coordinate x(t) is decreasing and y(t) is increasing with time. This means that the ant is moving in the north-west direction, as indicated by the arrow in the first quadrant. Similarly, we can check that in the second quadrant it is moving in the south-west direction, in the third quadrant in the south-east direction, and in the fourth quadrant in the north-east direction. The equations do not immediately tell us that the ant is walking on a circle, but we get a general idea that the ant is moving counterclockwise around the origin.

We will now apply a similar analysis to the fox and rabbit populations using the Lotka-Volterra equations. After that we will see how we could figure out that the ant is moving along a circle using only the equations x′(t) = −y and y′(t) = x.

Exercise 1. (a) Find all the points (x, y) on the xy-plane such that both

$$x - xy = 0 \quad\text{and}\quad -y + xy = 0.$$

What happens if the initial populations (x(0), y(0)) at time t = 0 are at one of those points?
(b) In each of the four regions in the figure (separated by the lines x = 1 and y = 1), determine the signs of the derivatives x′(t) and y′(t) using the Lotka-Volterra equations. As time t increases, in which direction on the plane will the pair (x(t), y(t)) be moving: north-east, north-west, south-east, south-west? Draw an arrow indicating the direction in each region.
What do these arrows tell us about how the populations are changing over time?

Next, we will consider a different question. Both populations $x(t)$ and $y(t)$ depend on time $t$, but how does the population of foxes $y$ depend on the population of rabbits $x$? In other words, can we eliminate time $t$ and find $y = y(x)$ as a function of $x$? Before we address this question, let us make a useful observation.

Comment. If we plug in $x = x(t)$ and $y = y(t)$ into $y = y(x)$ we get that $y(t) = y(x(t))$. Taking the derivative of both sides, by the chain rule, $y'(t) = y'(x)\,x'(t)$, so

$$y'(x) = \frac{y'(t)}{x'(t)}.$$

This formula makes sense, because $y'(t) \approx \frac{\Delta y}{\Delta t}$ and $x'(t) \approx \frac{\Delta x}{\Delta t}$, so

$$\frac{y'(t)}{x'(t)} \approx \frac{\Delta y / \Delta t}{\Delta x / \Delta t} = \frac{\Delta y}{\Delta x} \approx y'(x).$$

If we know that the coordinate $x(t)$ moves at a rate $x'(t)$ and the coordinate $y(t)$ moves at a rate $y'(t)$, then the pair moves along the curve with the slope $y'(x) = \frac{y'(t)}{x'(t)}$. This is how the slope field was graphed in the previous exercise, using the formula for $y'(x)$ that will be computed in the next exercise.

Example 2. In the setting of the ant example above, if we know that $x'(t) = -y$ and $y'(t) = x$, find an equation for $y'(x)$ and solve it to show that the ant is moving along a circle.

Solution: Using the chain rule formula in the comment above,

$$y'(x) = \frac{y'(t)}{x'(t)} = -\frac{x}{y},$$

where in the last step we used the given equations $x'(t) = -y$ and $y'(t) = x$. This is a separable equation so we can solve it following standard steps:

$$\frac{dy}{dx} = -\frac{x}{y} \implies y\,dy = -x\,dx \implies \int y\,dy = -\int x\,dx \implies \frac{y^2}{2} = -\frac{x^2}{2} + C.$$

If we rewrite this as $x^2 + y^2 = 2C$, we can see that this is an equation of the circle centred at the origin. The constant $C$ can be found if we are given the initial condition $(x(0), y(0))$ or any point on the trajectory.

Exercise 2. (a) Recall the Lotka-Volterra equations describing the populations of foxes and rabbits: $x'(t) = x - xy$, $y'(t) = -y + xy$. Find a differential equation for $y'(x)$ of the form $y'(x) = f(x, y)$.
(b) Solve the equation you found in (a). The answer can be written as an implicit function.

Exercise 3. Match the following systems of equations

(a) $x'(t) = -x + 0.5xy$, $\quad y'(t) = 0.5y - 0.01xy$
(b) $x'(t) = 0.1x - 0.5xy$, $\quad y'(t) = 0.2y - 0.1xy$
(c) $x'(t) = 0.05x$, $\quad y'(t) = -0.5y + 2xy$
(d) $x'(t) = -x + 0.5xy$, $\quad y'(t) = -0.5y + 0.01xy$

with the following pairs of species. Which species is $x$ and which one is $y$?

1. Humans and gut bacteria; both need each other to survive.
2. Owls and trees; owls need trees for nesting and food.
3. Polar bears and seals.
4. Ducks and geese; in competition for food.

Answer to Exercise 1. (a) First, $x - xy = x(1 - y) = 0$ when either $x = 0$ or $y = 1$. Similarly, $-y + xy = y(-1 + x) = 0$ when either $x = 1$ or $y = 0$. This means that both will be equal to zero when $x = 0$ and $y = 0$, or $x = 1$ and $y = 1$. In other words, at the points $(0, 0)$ and $(1, 1)$. At these points $x'(t) = x - xy = 0$ and $y'(t) = -y + xy = 0$, so these are equilibrium solutions of this system of equations, and populations starting out at one of these points will always stay there, i.e. the populations will be constant. Of course, the case $(0, 0)$ makes sense, but the case of $(1, 1)$ is a feature of the Lotka-Volterra model. Recall that 1 here could be in arbitrary units, so it does not mean one fox and one rabbit.

(b) The directions of how the pair of populations $(x(t), y(t))$ changes over time are described by the arrows in the left figure above. Indeed, $x'(t) = x - xy = x(1 - y)$ is positive when $y < 1$ and negative when $y > 1$, so $x(t)$ is increasing (moving right, or east) below the line $y = 1$, and decreasing above this line (moving left, or west). This makes sense, because this tells us that the population of rabbits increases when there are not too many foxes around, and it decreases when the population of foxes exceeds a certain threshold.
Similarly, $y'(t) = -y + xy = y(-1 + x)$ is positive when $x > 1$ and negative when $x < 1$, so $y(t)$ is increasing (moving up, or north) to the right of the line $x = 1$ and decreasing to the left of this line (moving down, or south). Over time, the point $(x(t), y(t))$ will be moving around the equilibrium point $(1, 1)$. Again, this makes sense, because this tells us that the population of foxes decreases when there are not enough rabbits around, and it increases when the population of rabbits exceeds a certain threshold.

In the figure above on the right, we show the slope field $y'(x)$ and one trajectory starting at a point $(0.5, 0.5)$. We will discuss the slope field in the next example, and the trajectory can be graphed in Geogebra using the command:

SolveODE(-y + xy, x - xy, 0.5, 0.5, 10, 0.01).

Here $-y + xy$ and $x - xy$ are the formulas for $y'(t)$ and $x'(t)$ (note the order: the formula for $y'(t)$ comes first), $(0.5, 0.5)$ is the initial position at time $t = 0$, 10 is the time $t$ until which the equation is solved numerically (you can try other values of $t$), and 0.01 is the step size. We will find the formula for this trajectory in the next exercise.

Answer to Exercise 2. Using the chain rule as in the previous example,

$$y'(x) = \frac{y'(t)}{x'(t)} = \frac{-y + xy}{x - xy} = \frac{y(-1 + x)}{x(1 - y)},$$

where in the middle step we applied the Lotka-Volterra equations. This gives us a separable differential equation

$$y'(x) = \frac{x - 1}{x} \cdot \frac{y}{1 - y}.$$

We can solve it using the standard steps:

$$\frac{dy}{dx} = \frac{x - 1}{x} \cdot \frac{y}{1 - y} \implies \frac{1 - y}{y}\,dy = \frac{x - 1}{x}\,dx \implies \int \left(\frac{1}{y} - 1\right) dy = \int \left(1 - \frac{1}{x}\right) dx \implies \ln|y| - y = x - \ln|x| + C.$$

Since the populations are positive, we can forget about the absolute values and write the last equation as

$$\ln(y) - y = x - \ln(x) + C.$$

We cannot solve this for $y$, so we leave it as an implicit solution. The constant $C$ can be found if we are given the initial condition $(x(0), y(0))$ or any point on the trajectory.

Answer to Exercise 3.
1(d) – cannot tell which one is which.
2(c) – x is trees and y is owls.
3(a) – x is polar bears and y is seals.
4(b) – cannot tell which one is which.

4.5 The SIR model

In this section we will take a look at a simple deterministic model of the spread of disease in a population, for example a seasonal viral infection such as flu. At any time $t$, the population will be divided into three groups.
• Let $S(t)$ be the proportion of susceptible individuals at time $t$, meaning that they have not yet been infected and do not have immunity;
• Let $I(t)$ be the proportion of currently infected individuals at time $t$;
• Let $R(t)$ be the proportion of recovered individuals at time $t$. We will assume that recovered individuals have full immunity and cannot be reinfected.

Proportion here means the proportion of the entire population, so it is a number between 0 and 1. The spread of disease is driven by encounters between infected and susceptible individuals and, as with foxes and rabbits, we will assume that the rate of encounters between infected and susceptible individuals is proportional to $S(t)I(t)$. Then we can model the rates of change of the three groups by the equations:

$$S'(t) = -aSI, \qquad I'(t) = aSI - bI, \qquad R'(t) = bI.$$

• Constant $a > 0$ is the rate of infection (more precisely, the rate of potential transmissions by an infected individual), defined as the average number of contacts of one person per unit of time, multiplied by the probability of disease transmission in a contact between a susceptible and an infectious subject.
• Constant $b > 0$ is the rate of recovery, which is defined as $b = \frac{1}{D}$ where $D$ is the average time period an individual is infectious.

The term $aSI$ represents new infections (per unit of time) resulting from interactions between susceptible and infectious individuals, so it is added to the infected group and subtracted from the susceptible group. The term $bI$ represents newly recovered individuals (per unit of time), so it is subtracted from the infected and added to the recovered group.
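To see how the three equations above interact, here is a minimal Python sketch that steps the SIR system forward in time with Euler's method. The function name `simulate_sir`, the parameter values $a = 0.5$, $b = 0.25$, the initial proportions and the step size are all illustrative assumptions chosen for this sketch, not values prescribed by the model.

```python
def simulate_sir(a, b, s0, i0, r0, t_end, dt):
    """Euler's method for S' = -a*S*I, I' = a*S*I - b*I, R' = b*I.

    Returns a list of tuples (t, S, I, R). All names are hypothetical.
    """
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        new_infections = a * s * i  # the term aSI
        recoveries = b * i          # the term bI
        # aSI leaves S and enters I; bI leaves I and enters R.
        s, i, r = (s - dt * new_infections,
                   i + dt * (new_infections - recoveries),
                   r + dt * recoveries)
        history.append((k * dt, s, i, r))
    return history


# Assumed illustrative values: a = 0.5, b = 0.25, 5% initially infected.
history = simulate_sir(a=0.5, b=0.25, s0=0.95, i0=0.05, r0=0.0,
                       t_end=60, dt=0.1)
for t, s, i, r in history[::100]:
    print(f"t={t:5.1f}  S={s:.3f}  I={i:.3f}  R={r:.3f}")
```

Because every amount subtracted from one group is added to another, the sum $S + I + R$ stays equal to 1 at every step, which is a quick sanity check on the bookkeeping of the terms $aSI$ and $bI$.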
Since the term $R(t)$ does not appear in the first two equations, we can for now forget about the last equation $R'(t) = bI$ and first focus on analyzing and solving the system of the first two equations:

$$S'(t) = -aSI, \qquad I'(t) = aSI - bI.$$

In the next problem there will appear a constant

$$R_0 = \frac{a}{b} = aD,$$

which is called the basic reproduction number. Since it is the rate $a$ of potential transmissions by an infectious individual per unit of time multiplied by the number of days $D$ an individual is typically infected, $R_0$ represents the average number of new cases generated by one infected individual, assuming that everyone else is susceptible. When the proportion of susceptible individuals in the population is $S$, the actual average number of new cases generated by one infected individual is $R_0 S$.

Notice that $S'(t) = -aSI$ is negative, so the number of susceptible individuals is always decreasing (obviously). In the first exercise we will analyze when the infectious subpopulation is increasing and decreasing.

Exercise 1. In the first quadrant $S > 0$, $I > 0$ on the $SI$-plane (with $S$ on the $x$-axis and $I$ on the $y$-axis) find the regions where $I'(t)$ is positive, negative, or zero. Express the regions in terms of the basic reproduction number $R_0$. In each region, sketch in which direction the pair $(S(t), I(t))$ is moving as time $t$ increases. Explain the behaviour in terms of the basic reproduction number $R_0$.

Exercise 2. Suppose that an individual is typically infected for $D = 4$ days, and the basic reproduction number is $R_0 = 2$.
(a) Write down the SIR model corresponding to these parameters.
(b) Find a differential equation for $I'(S)$.
(c) Solve this equation with the initial conditions $S(0) = 0.95$ and $I(0) = 0.5$.

Spread of disease in time. If we want to see how the disease spreads over time, we could try to find the function $S = S(t)$, which tells us how the susceptible population decreased as a function of time.
Then $1 - S(t)$ will give us the proportion of the population infected up to time $t$. For example, in the setting of the previous exercise, if we take the solution $I = 0.5 \ln S - S + 1.475$ we found in part (c) and plug it into the equation $S'(t) = -0.5SI$ found in part (a), we get

$$\frac{dS}{dt} = -0.5S(0.5 \ln S - S + 1.475).$$

This is a separable equation, but we cannot solve it explicitly because we cannot integrate

$$\int \frac{dS}{S(0.5 \ln S - S + 1.475)}$$

explicitly. However, we can solve it numerically, for example, by using the following command in Geogebra:

SolveODE(-0.5y(0.5 ln(y) - y + 1.475), 0, 0.95, 30, 0.1).

This produced the graph in the above figure. Here the initial condition is $S(0) = 0.95$, and we solve the equation up to time $t = 30$.

Answer to Exercise 1. The derivative $I'(t)$ is zero,

$$I'(t) = aSI - bI = (aS - b)I = 0,$$

when $aS - b = 0$ (assuming that $I > 0$), or when $S = \frac{b}{a} = \frac{1}{R_0}$. The derivative is positive when $S > \frac{b}{a} = \frac{1}{R_0}$ and it is negative when $S < \frac{b}{a} = \frac{1}{R_0}$, so the regions where the proportion $I(t)$ of infected individuals is increasing or decreasing are separated by the vertical line $S = \frac{1}{R_0}$. Since $S(t)$ is always decreasing as time $t$ increases, the pair $(S(t), I(t))$ is moving in the north-west direction when $S > \frac{1}{R_0}$, and it is moving in the south-west direction when $S < \frac{1}{R_0}$. This is depicted in the left figure below.

Recall that $R_0 S$ represents the average number of new cases generated by one infected individual when the proportion of susceptible individuals in the population is $S$. If we rewrite $S > \frac{1}{R_0}$ as $R_0 S > 1$, and $S < \frac{1}{R_0}$ as $R_0 S < 1$, we see that the proportion of infected individuals $I(t)$ is increasing when this average number of new cases is bigger than 1, and is decreasing when it is smaller than 1. This makes perfect sense.

In the figure below on the right, we show the slope field $I'(S)$ and one trajectory for the pair $(S(t), I(t))$.
We will discuss the slope field in the next exercise, and the trajectory can be graphed in Geogebra using the command:

SolveODE(axy - by, -axy, S(0), I(0), t, stepsize).

We have to write $x$ instead of $S$ and $y$ instead of $I$ in $axy - by$, $-axy$, for Geogebra to understand what we want.

Answer to Exercise 2. (a) $b = \frac{1}{D} = 0.25$ and $a = \frac{R_0}{D} = 0.5$, so the equations are

$$S'(t) = -0.5SI, \qquad I'(t) = 0.5SI - 0.25I.$$

(b) Using the chain rule as in the last section,

$$I'(S) = \frac{I'(t)}{S'(t)} = \frac{0.5SI - 0.25I}{-0.5SI} = -1 + \frac{0.5}{S},$$

where in the middle step we applied the SIR equations. This is the slope field that was graphed in the right figure above.

(c) The right hand side depends only on the independent variable $S$, so we can simply integrate this to find $I(S)$:

$$I(S) = \int \left(\frac{0.5}{S} - 1\right) dS = 0.5 \ln S - S + C.$$

The constant $C$ can be found from the initial condition $(0.95, 0.5)$, so $0.5 = 0.5 \ln 0.95 - 0.95 + C$ and $C = 1.475\ldots$. We get $I(S) = 0.5 \ln S - S + 1.475$.

4.6 Approximating solutions by Taylor polynomials

We have already seen how to approximate solutions of differential equations by Taylor polynomials of degree 1 (Euler's method), and Taylor polynomials of degree 2, by differentiating the equation. We also mentioned that the same procedure can be iterated to find higher order derivatives of solutions $y(x)$ and, as a result, higher order Taylor polynomial approximations. In this section, we will try a different procedure to find higher order Taylor polynomial approximations, called the method of undetermined coefficients. Let us illustrate this method with an example.

Example 1. Consider the initial value problem:

$$y'(x) + 4y(x) = 8, \qquad y(0) = -1.$$

Find the Taylor polynomial of degree 4 for $y(x)$ centred at $x = 0$ using the method of undetermined coefficients.

Solution: Let us write the unknown solution as

$$y(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \ldots$$
The terms $c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4$ on the right hand side represent the Taylor polynomial we are looking for, so the constants $c_0$, $c_1$, $c_2$, $c_3$ and $c_4$ are unknown and are called undetermined coefficients. Since the Taylor polynomial is only an approximation of $y(x)$, the dots $\ldots$ represent the missing difference, or error. We can also think of $\ldots$ as a placeholder: if we continue finding higher order derivatives, we can keep writing more and more terms in the Taylor polynomial approximation, and this open ended process is expressed by the dots. We will discuss this more in the next chapter on Taylor series.

First of all, we know that the constant $c_0$ of the Taylor polynomial should be the value $y(0)$, so using the initial condition $y(0) = -1$ gives us that $c_0 = -1$ and

$$y(x) = -1 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \ldots$$

Next, we want to plug this expression into the equation $y'(x) + 4y(x) = 8$, which means that we need to take the derivative first:

$$y'(x) = c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \ldots$$

If we add this to $4y(x) = -4 + 4c_1 x + 4c_2 x^2 + 4c_3 x^3 + 4c_4 x^4 + \ldots$, we get

$$\begin{aligned}
y'(x) + 4y(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \ldots \\
&\quad - 4 + 4c_1 x + 4c_2 x^2 + 4c_3 x^3 + 4c_4 x^4 + \ldots \\
&= (c_1 - 4) + (2c_2 + 4c_1)x + (3c_3 + 4c_2)x^2 + (4c_4 + 4c_3)x^3 + \ldots
\end{aligned}$$

Notice how the last term $4c_4 x^4$ in the second line disappeared. That is because it had nothing to be matched with, so it was absorbed by the dots $\ldots$. We wrote the left hand side of the equation $y'(x) + 4y(x) = 8$ as a polynomial of degree 3 plus some dots.

Next, we need to write the right hand side as a polynomial of degree 3 plus some dots. In the next example, this will require a little bit of work, but in this example the right hand side is very simple, just a constant 8, so we can formally write $8 = 8 + 0x + 0x^2 + 0x^3$. Because we want the two sides to be equal, we need to make sure that

$$(c_1 - 4) + (2c_2 + 4c_1)x + (3c_3 + 4c_2)x^2 + (4c_4 + 4c_3)x^3 = 8 + 0x + 0x^2 + 0x^3.$$
This means that the coefficients in front of each power of $x$ must be equal, so

$$c_1 - 4 = 8, \qquad 2c_2 + 4c_1 = 0, \qquad 3c_3 + 4c_2 = 0, \qquad 4c_4 + 4c_3 = 0.$$

The first equation gives us $c_1 = 12$. Plugging it into the second equation gives us that $2c_2 + 4(12) = 0$, so $c_2 = -24$. Plugging it into the third equation gives us that $3c_3 + 4(-24) = 0$, so $c_3 = 32$. Plugging it into the fourth equation gives us that $4c_4 + 4(32) = 0$, so $c_4 = -32$. We found the Taylor polynomial approximation of degree 4 for the solution $y(x)$ of this equation:

$$-1 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 = -1 + 12x - 24x^2 + 32x^3 - 32x^4.$$

Comment. The above equation can be rewritten as $y' = 4(2 - y)$. It is a separable equation and can be solved following the usual steps:

$$\frac{dy}{dx} = 4(2 - y) \implies \frac{dy}{2 - y} = 4\,dx \implies \int \frac{dy}{2 - y} = \int 4\,dx \implies -\ln|2 - y| = 4x + C.$$

From the initial condition $y(0) = -1$ we get $-\ln|2 - (-1)| = 4(0) + C$, so $C = -\ln 3$ and the equation is $-\ln|2 - y| = 4x - \ln 3$, or $\ln|2 - y| = -4x + \ln 3$. Exponentiating both sides, we get $|2 - y| = 3e^{-4x}$. Because $2 - y$ is positive at $x = 0$, we can drop the absolute value and write $2 - y = 3e^{-4x}$, or

$$y = 2 - 3e^{-4x}.$$

Since we found the exact equation for the solution, we can check the Taylor polynomial approximation above. Recall the pattern of the Taylor polynomials for $e^x$:

$$e^x \approx 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!}.$$

Replacing $x$ with $-4x$:

$$e^{-4x} \approx 1 + (-4x) + \frac{(-4x)^2}{2!} + \frac{(-4x)^3}{3!} + \frac{(-4x)^4}{4!} = 1 - 4x + 8x^2 - \frac{32}{3}x^3 + \frac{32}{3}x^4.$$

Finally,

$$y(x) = 2 - 3e^{-4x} \approx -1 + 12x - 24x^2 + 32x^3 - 32x^4,$$

which matches the answer above.

Exercise 1. Consider the initial value problem:

$$y'(x) - 3y(x) = 10, \qquad y(0) = 2.$$

Find the Taylor polynomial of degree 4 for $y(x)$ centred at $x = 0$ using the method of undetermined coefficients.

Example 2. Consider the initial value problem:

$$y'(x) - 2y(x) = \frac{\sin(x)}{x}, \qquad y(0) = 0.$$

Find the Taylor polynomial of degree 4 for $y(x)$ centred at $x = 0$ using the method of undetermined coefficients.
Solution: We handle the left hand side exactly the same way as in the first example. The initial condition gives us $c_0 = y(0) = 0$, so

$$\begin{aligned}
y(x) &= c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \ldots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \ldots \\
-2y(x) &= -2c_1 x - 2c_2 x^2 - 2c_3 x^3 - 2c_4 x^4 + \ldots \\
y'(x) - 2y(x) &= c_1 + (2c_2 - 2c_1)x + (3c_3 - 2c_2)x^2 + (4c_4 - 2c_3)x^3 + \ldots
\end{aligned}$$

What is different in this example is that the right hand side $\frac{\sin(x)}{x}$ is not a simple constant anymore and, to match the coefficients, we first need to find its Taylor polynomial. For this, we need to recall the pattern of Taylor polynomials for $\sin(x)$:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots$$

Dividing both sides by $x$, we get

$$\frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \ldots = 1 + 0x - \frac{x^2}{3!} + 0x^3 + \frac{x^4}{5!} - \ldots$$

Equating the coefficients of $y'(x) - 2y(x)$ above with the coefficients of $\frac{\sin(x)}{x}$:

$$c_1 = 1, \qquad 2c_2 - 2c_1 = 0, \qquad 3c_3 - 2c_2 = -\frac{1}{3!} = -\frac{1}{6}, \qquad 4c_4 - 2c_3 = 0.$$

Solving them sequentially we get $c_1 = 1$, $c_2 = 1$, $c_3 = \frac{11}{18}$, and $c_4 = \frac{11}{36}$. We found the Taylor polynomial approximation of degree 4:

$$x + x^2 + \frac{11}{18}x^3 + \frac{11}{36}x^4.$$

Exercise 2. Consider the initial value problem:

$$y'(x) + y(x) = x\cos(x), \qquad y(0) = 0.$$

Find the Taylor polynomial of degree 4 for $y(x)$ centred at $x = 0$ using the method of undetermined coefficients.

Answer to Exercise 1. The initial condition gives us $c_0 = y(0) = 2$, so

$$\begin{aligned}
y(x) &= 2 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \ldots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \ldots \\
-3y(x) &= -6 - 3c_1 x - 3c_2 x^2 - 3c_3 x^3 - 3c_4 x^4 + \ldots \\
y'(x) - 3y(x) &= (c_1 - 6) + (2c_2 - 3c_1)x + (3c_3 - 3c_2)x^2 + (4c_4 - 3c_3)x^3 + \ldots
\end{aligned}$$

This must be equal to $10 + 0x + 0x^2 + 0x^3$, so equating the coefficients in front of the same powers:

$$c_1 - 6 = 10, \qquad 2c_2 - 3c_1 = 0, \qquad 3c_3 - 3c_2 = 0, \qquad 4c_4 - 3c_3 = 0.$$

Solving these equations sequentially, we get $c_1 = 16$, $c_2 = 24$, $c_3 = 24$, $c_4 = 18$, so the Taylor polynomial approximation of degree 4 is

$$2 + 16x + 24x^2 + 24x^3 + 18x^4.$$

Answer to Exercise 2.
The initial condition gives us $c_0 = y(0) = 0$, so

$$\begin{aligned}
y(x) &= c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4 + \ldots \\
y'(x) &= c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \ldots \\
y'(x) + y(x) &= c_1 + (2c_2 + c_1)x + (3c_3 + c_2)x^2 + (4c_4 + c_3)x^3 + \ldots
\end{aligned}$$

To handle the right hand side, recall the pattern of Taylor polynomials for $\cos(x)$:

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots$$

Multiplying both sides by $x$, we get

$$x\cos(x) = x - \frac{x^3}{2!} + \frac{x^5}{4!} - \ldots = 0 + x + 0 \cdot x^2 - \frac{x^3}{2!} + 0 \cdot x^4 + \frac{x^5}{4!} - \ldots$$

Equating the coefficients of $y'(x) + y(x)$ with the coefficients of $x\cos(x)$:

$$c_1 = 0, \qquad 2c_2 + c_1 = 1, \qquad 3c_3 + c_2 = 0, \qquad 4c_4 + c_3 = -\frac{1}{2}.$$

Solving these equations sequentially, we get $c_1 = 0$, $c_2 = \frac{1}{2}$, $c_3 = -\frac{1}{6}$, $c_4 = -\frac{1}{12}$, so the Taylor polynomial approximation of degree 4 is

$$\frac{1}{2}x^2 - \frac{1}{6}x^3 - \frac{1}{12}x^4.$$

Chapter 5 Taylor polynomials and series

5.1 From Taylor polynomials to Taylor series

$$\begin{aligned}
e^x &= 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots &&= \sum_{n=0}^{\infty} \frac{x^n}{n!} && (R = \infty) \\
\cos(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots &&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} && (R = \infty) \\
\sin(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots &&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} && (R = \infty) \\
\frac{1}{1-x} &= 1 + x + x^2 + x^3 + \ldots &&= \sum_{n=0}^{\infty} x^n && (R = 1) \\
\ln(1-x) &= -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots &&= -\sum_{n=1}^{\infty} \frac{x^n}{n} && (R = 1) \\
\ln(1+x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots &&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n && (R = 1)
\end{aligned}$$

The above table contains a list of several classic examples of Taylor series, as well as the radius of convergence $R$ for each of them. We will discuss the meaning of what is written in the table in this section and subsequent sections.

First of all, let us recall that in Section 3.7 we introduced and discussed Taylor polynomials of degree $n$ centered at $x = a$, which can be used to approximate a function $y = f(x)$ near a point $x = a$:

$$P_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \ldots + \frac{f^{(n)}(a)}{n!}(x - a)^n.$$

In this section, we will push this definition of a Taylor polynomial to the limit, where it will become the Taylor series.

Example 1.
Let us discuss the meaning of the Taylor series for $e^x$ centred at $a = 0$:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots = \sum_{n=0}^{\infty} \frac{x^n}{n!} \qquad (R = \infty)$$

Solution: First of all, let us recall the Taylor polynomials of $f(x) = e^x$ centred at $a = 0$. Because all the derivatives of $e^x$ are equal to $e^x$ itself, we get that $f^{(n)}(x) = e^x$ and $f^{(n)}(0) = e^0 = 1$. For example, the Taylor polynomial of degree 7 at $a = 0$ is

$$P_7(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \frac{x^6}{6!} + \frac{x^7}{7!}$$

and we see in the figure that it approximates $e^x$ very well on the interval between $-2$ and $2$.

What happens if we do not stop at the degree 7 and keep adding more and more terms? It turns out that the Taylor polynomials will get closer and closer to our function $e^x$. Depending on the function $f(x)$, this approximation will work on some interval of the form $a - R < x < a + R$:

[Figure: number line along the $x$-axis marking the points $a - R$, $a$ and $a + R$]

or, in other words, for $x$ in between $a - R$ and $a + R$ for some number $R$ which is called the radius of convergence. (Sometimes things are a bit more subtle than this, but we will not encounter such unusual examples.)

For example, the table above says that, in the case of the exponential function $e^x$, the radius of convergence is $R = \infty$. This means that for any $-\infty < x < \infty$, the Taylor polynomials $P_n(x)$ will eventually get closer and closer to $e^x$ if we keep increasing the degree $n$. In mathematical language, we can say that $P_n(x)$ converges to $e^x$ for all $x$ and write

$$e^x = \lim_{n \to \infty} P_n(x).$$

Another common way to express the same thing is to write

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

In the middle, the dots $\ldots$ express that if we continue adding more and more terms to the Taylor polynomials we will get closer and closer to $e^x$. On the right hand side, a more sophisticated notation $\sum_{n=0}^{\infty}$ also expresses that we keep adding the terms of degree $n$ indefinitely, up to infinite degree $n = \infty$. This infinite sum is called the Taylor series of $e^x$ centred at $a = 0$.
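The convergence described above can also be observed numerically. The following minimal Python sketch evaluates the Taylor polynomial $P_n(x)$ of $e^x$ for a few degrees $n$ and prints how far it is from $e^x$. The function name `exp_taylor_poly`, the test point $x = 2$ and the chosen degrees are assumptions made only for this illustration.

```python
import math


def exp_taylor_poly(x, n):
    """Value of the degree-n Taylor polynomial of e^x centred at 0."""
    total = 0.0
    term = 1.0  # the term x^0 / 0!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)  # turns x^k / k! into x^(k+1) / (k+1)!
    return total


x = 2.0  # assumed test point; any real x can be tried since R is infinite
for n in (1, 3, 7, 15, 30):
    approx = exp_taylor_poly(x, n)
    print(n, approx, abs(approx - math.exp(x)))
```

Trying a point further from the centre, such as $x = 10$, shows the same behaviour, except that more terms are needed before the error becomes small.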
The notation $\sum_{n=0}^{\infty}$ is very important, and it is called the Sigma notation. The most important thing about this notation is that we are able to find the formula $\frac{x^n}{n!}$ that expresses the term number $n$ of the Taylor series. In many applications we can simply write out a few terms at the beginning, but sometimes we will need the formula for the $n$th term, so it is important to remember those. Let us check that this formula indeed encodes correctly the terms of the series $1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots$. We need to recall the convention that $0! = 1$. Then,

$$n = 0 \implies \frac{x^0}{0!} = 1, \qquad n = 1 \implies \frac{x^1}{1!} = x, \qquad n = 2 \implies \frac{x^2}{2!}, \qquad \text{etc.}$$

We see that the formula $\frac{x^n}{n!}$ matches the pattern of the series as the degree $n$ changes from 0, 1, 2, etc.

Exercise 1. Discuss the meaning of the Taylor series for $\cos(x)$ and $\sin(x)$ centred at $a = 0$:

$$\begin{aligned}
\cos(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} && (R = \infty) \\
\sin(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} && (R = \infty)
\end{aligned}$$

What is the meaning of $R = \infty$? Check that the $\Sigma$-notation matches the pattern in each case. Does the index $n$ represent the degree of the Taylor polynomial in these formulas?

Geometric series. Next, we will discuss the so-called geometric series:

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \ldots = \sum_{n=0}^{\infty} x^n \qquad (R = 1)$$

Example 2. Check that the series written above is the Taylor series centred at $a = 0$ of the function $f(x) = \frac{1}{1-x}$.

Solution: Let us take a few derivatives of $f(x)$:

$$f'(x) = \frac{1}{(1-x)^2}, \quad f''(x) = \frac{1 \cdot 2}{(1-x)^3}, \quad f'''(x) = \frac{1 \cdot 2 \cdot 3}{(1-x)^4}, \quad f^{(4)}(x) = \frac{1 \cdot 2 \cdot 3 \cdot 4}{(1-x)^5}, \quad \text{etc.}$$

We notice the pattern: $f^{(n)}(x) = \frac{n!}{(1-x)^{n+1}}$, so $f^{(n)}(0) = n!$. By definition, the Taylor series will be

$$f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \ldots = 1 + x + \frac{2!}{2!}x^2 + \frac{3!}{3!}x^3 + \ldots$$

which is exactly $1 + x + x^2 + x^3 + \ldots = \sum_{n=0}^{\infty} x^n$.

In the next exercise, we will check that the geometric series $1 + x + x^2 + x^3 + \ldots = \sum_{n=0}^{\infty} x^n$ indeed converges to $\frac{1}{1-x}$ when $-1 < x < 1$, and it does not converge outside of this interval. This is precisely what it means for the radius of convergence of the geometric series to be $R = 1$.

Exercise 2. (a) If $x$ is outside of the interval $-1 < x < 1$, i.e. if $x \le -1$ or $x \ge 1$, does $x^n$ get smaller and smaller when $n$ gets bigger? What can we conclude about the series $1 + x + x^2 + x^3 + \ldots$ and why?
(b) Using simple algebra, check that the difference between the function $\frac{1}{1-x}$ and its Taylor polynomial $1 + x + x^2 + x^3 + \ldots + x^n$ of degree $n$ can be simplified as

$$\frac{1}{1-x} - (1 + x + x^2 + x^3 + \ldots + x^n) = \frac{x^{n+1}}{1-x}.$$

(c) What happens to the difference in part (b) when $-1 < x < 1$ and $n$ goes to infinity?

Comment. It is a famous feature of Taylor series that they approximate a function $f(x)$ on a symmetric interval $a - R < x < a + R$ around the centre $a$. For example, if we look at the function $\frac{1}{1-x}$ (black solid curve in the figure), it has a vertical asymptote at $x = 1$ where the denominator becomes 0, so the Taylor series cannot possibly approximate it to the right of $a = 0$ any further than $x = 1$. On the left side of $x = 0$, the function $\frac{1}{1-x}$ is well defined all the way to $-\infty$, but the Taylor series fails to approximate it beyond $x = -1$. We see how Taylor polynomials $P_{20}(x)$ and $P_{21}(x)$ in the figure start growing fast around $x = -1$. Because we have some obstacle for convergence on the right of the centre, it automatically limits us to the left of the centre as well.

Taylor series for the logarithm. Next, we will discuss the series:

$$\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots = -\sum_{n=1}^{\infty} \frac{x^n}{n} \qquad (R = 1)$$

Example 2. Show that $\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots$ for any $-1 < x < 1$.

Solution: Let us take the formula from part (b) in the previous exercise:

$$\frac{1}{1-t} - (1 + t + t^2 + t^3 + \ldots + t^n) = \frac{t^{n+1}}{1-t}$$

and integrate it between 0 and $x$:

$$\int_0^x \frac{1}{1-t}\,dt - \int_0^x (1 + t + t^2 + t^3 + \ldots + t^n)\,dt = \int_0^x \frac{t^{n+1}}{1-t}\,dt.$$
The left hand side is easy to integrate and we get

$$-\ln(1-x) - \left(x + \frac{x^2}{2} + \frac{x^3}{3} + \ldots + \frac{x^{n+1}}{n+1}\right) = \int_0^x \frac{t^{n+1}}{1-t}\,dt.$$

Our goal is to show that, for $-1 < x < 1$, the two terms on the left hand side are close to each other,

$$-\ln(1-x) \approx x + \frac{x^2}{2} + \frac{x^3}{3} + \ldots + \frac{x^{n+1}}{n+1},$$

so all we need to show is that the integral $\int_0^x \frac{t^{n+1}}{1-t}\,dt$ is small when $n$ is large. To show this, for concreteness, take $x = 0.9$. Then, for $t$ between 0 and 0.9, the numerator $t^{n+1} < 0.9^n$ and the denominator $1 - t \ge 0.1$ is not too small (we do not divide by something close to zero), so the function we integrate is pretty small: $\frac{t^{n+1}}{1-t} \le \frac{0.9^n}{0.1} \to 0$ as $n \to \infty$. As a result, the integral will be small, so the series will indeed approximate the logarithmic function $\ln(1-x)$.

Exercise 3. Show that, for any $-1 < x < 1$,

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n.$$

Hint: use the series from the previous example.

Answer to Exercise 1. In both cases, the series notation expresses that as we add more and more terms, the sum will get closer and closer to our function, $\cos(x)$ or $\sin(x)$. The fact that the radius of convergence $R$ is equal to $\infty$ means that this approximation will work for all $x$, $-\infty < x < \infty$. Of course, the further $x$ is from the centre $a = 0$, the more terms we might have to add before this approximation gets good.

Let us check the $\Sigma$-notation. In the case of $\cos(x)$, the general term is $\frac{(-1)^n}{(2n)!} x^{2n}$, so:

$$\begin{aligned}
n = 0 &\implies \frac{(-1)^0}{(0)!} x^0 = 1, & n = 1 &\implies \frac{(-1)^1}{(2)!} x^2 = -\frac{x^2}{2!}, \\
n = 2 &\implies \frac{(-1)^2}{(4)!} x^4 = \frac{x^4}{4!}, & n = 3 &\implies \frac{(-1)^3}{(6)!} x^6 = -\frac{x^6}{6!}, \quad \text{etc.}
\end{aligned}$$

We see that the formula matches the pattern correctly. In the case of $\sin(x)$, the general term is $\frac{(-1)^n}{(2n+1)!} x^{2n+1}$, so:

$$\begin{aligned}
n = 0 &\implies \frac{(-1)^0}{(1)!} x^1 = x, & n = 1 &\implies \frac{(-1)^1}{(3)!} x^3 = -\frac{x^3}{3!}, \\
n = 2 &\implies \frac{(-1)^2}{(5)!} x^5 = \frac{x^5}{5!}, & n = 3 &\implies \frac{(-1)^3}{(7)!} x^7 = -\frac{x^7}{7!}, \quad \text{etc.}
\end{aligned}$$

Again, we see that the formula matches the pattern correctly. In these two cases, the index $n$ does not represent the degree of the polynomial.
It represents the term number, and the degree is either $2n$ in the case of cosine, or $2n + 1$ in the case of sine. Notice how the degree increases by 2 in both cases, which explains why we needed to multiply $n$ by 2 in these formulas.

Answer to Exercise 2. (a) If $|x| \ge 1$ then $|x^n| \ge 1$, so $x^n$ does not get small when $n$ gets large. When we keep adding numbers $x^n$ which do not get smaller and smaller, we cannot approach any limit, so Taylor polynomials $P_n(x) = 1 + x + \ldots + x^n$ will not converge to anything as the degree $n$ gets bigger. This explains why the geometric series does not converge outside of the interval $-1 < x < 1$.

(b) Writing the difference with the common denominator,

$$\frac{1}{1-x} - (1 + x + x^2 + x^3 + \ldots + x^n) = \frac{1 - (1-x)(1 + x + x^2 + x^3 + \ldots + x^n)}{1-x}.$$

Let us multiply out the second term $(1-x)(1 + x + x^2 + x^3 + \ldots + x^n)$ in the numerator:

$$\begin{aligned}
&(1 + x + x^2 + x^3 + \ldots + x^n) - x(1 + x + x^2 + x^3 + \ldots + x^n) \\
&= (1 + x + x^2 + x^3 + \ldots + x^n) - (x + x^2 + x^3 + x^4 + \ldots + x^n + x^{n+1}) \\
&= 1 + (x + x^2 + x^3 + \ldots + x^n) - (x + x^2 + x^3 + x^4 + \ldots + x^n) - x^{n+1},
\end{aligned}$$

where the two sums in parentheses cancel each other, so this is $1 - x^{n+1}$. The numerator is $1 - (1 - x^{n+1}) = x^{n+1}$ as promised.

(c) If $-1 < x < 1$, which means that the absolute value $|x|$ is smaller than 1, then $|x|^n \to 0$ as $n \to \infty$, because $a^n$ is a geometric decay function when the base $a < 1$. For example, $0.5^{10} = 0.0009765625$. This means that the difference we found in part (b),

$$\frac{x^{n+1}}{1-x} \to 0,$$

will get smaller and smaller as $n$ gets bigger, when $x$ is between $-1$ and $1$. This shows that the geometric series $1 + x + x^2 + x^3 + \ldots$ converges to $\frac{1}{1-x}$ for $-1 < x < 1$, so the radius of convergence is $R = 1$, as promised.

Answer to Exercise 3. If we take the series from the previous example,

$$\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots = -\sum_{n=1}^{\infty} \frac{x^n}{n}$$

and replace $x$ by $-x$, we get

$$\ln(1-(-x)) = -(-x) - \frac{(-x)^2}{2} - \frac{(-x)^3}{3} - \ldots = -\sum_{n=1}^{\infty} \frac{(-x)^n}{n}.$$

Since $-(-x)^n = (-1)((-1)(x))^n = (-1)(-1)^n x^n = (-1)^{n+1} x^n$, the above can be simplified as

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n,$$

which is what we wanted. Notice that if $x$ is between $-1$ and $1$ then $-x$ is also between $-1$ and $1$, so this series converges on the interval $-1 < x < 1$.

5.2 Transformations of Taylor series

In this section we will learn how to combine the classic Taylor series from the last section with some simple algebraic manipulations to obtain Taylor series for more complicated functions. This is a very useful skill, because in many cases it makes finding Taylor polynomials and series much easier than using the definition directly by computing derivatives. At the end of the section we will give one application and find Padé approximations to functions $e^x$ and $\cos(x)$ near $x = 0$.

Example 1. Find the Taylor series of $f(x) = \frac{\sin(x^2)}{x}$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

Solution: Step 1. Because sine appears in the function $f(x)$, we start by recalling its Taylor series. Using the $\Sigma$-notation:

$$\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}.$$

Step 2. We want to think of the variable $x$ in the above series as a placeholder that can be replaced by any other expression. To emphasize this, let us replace this placeholder by a banana:

$$\sin(\text{🍌}) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \text{🍌}^{2n+1}.$$

In this problem, we will replace 🍌 by $x^2$ and then simplify using algebra:

$$\sin(x^2) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} (x^2)^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{4n+2}.$$

Step 3. Finally, we divide both sides by $x$ and simplify. In the Taylor series, we can divide term by term, just like a regular sum:

$$\frac{\sin(x^2)}{x} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{x^{4n+2}}{x} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{4n+1}.$$

This is the answer in the $\Sigma$-notation. Writing out the first few terms:

$$\frac{(-1)^0}{1!} x^1 + \frac{(-1)^1}{3!} x^5 + \frac{(-1)^2}{5!} x^9 + \frac{(-1)^3}{7!} x^{13} + \ldots = x - \frac{x^5}{3!} + \frac{x^9}{5!} - \frac{x^{13}}{7!} + \ldots$$

How can we decide where this series converges? We know from the last section that the original series for $\sin(x)$ converges everywhere, because its radius of convergence is $R = \infty$. In other words, $-\infty < x < \infty$ or $-\infty < \text{🍌} < \infty$, so we can replace 🍌 by any number we want, which means that the new series also converges for all $x$, so its radius of convergence is $R = \infty$.

Warning. The reason we could divide each term by $x$ is because the series for $\sin(x^2)$ did not have a constant term $c_0$, and all the terms had at least one power of $x$ that could be cancelled out. If, for example, the series started with $1 + \ldots$, dividing by $x$ would give $\frac{1}{x} + \ldots$, which would not be a Taylor series. Also, in Step 2 above, 🍌 should be something that gives us a Taylor series again at the end. As we will see from examples below, it does not always have to be a power of $x$, but it should be simple enough.

Exercise 1. Find the Taylor series of $f(x) = xe^{-x^2}$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

Example 2. Find the Taylor series of $f(x) = \frac{1}{1+2x^2}$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

Solution: The function $\frac{1}{1+2x^2}$ looks similar to $\frac{1}{1-x}$, so we should start with the geometric series:

$$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n \qquad \text{or} \qquad \frac{1}{1-\text{🍌}} = \sum_{n=0}^{\infty} (\text{🍌})^n.$$

In the denominator, our function $f(x)$ has $1 + 2x^2$, but we want to see something like $1 - \text{🍌}$. In this case, we simply write $1 + 2x^2 = 1 - (-2x^2)$, which means that we should replace 🍌 with $-2x^2$:

$$\frac{1}{1+2x^2} = \frac{1}{1-(-2x^2)} = \sum_{n=0}^{\infty} (-2x^2)^n = \sum_{n=0}^{\infty} (-1)^n 2^n x^{2n}.$$

In this case we do not multiply by anything, so we can simply write out a few terms at the beginning of the series:

$$\frac{1}{1+2x^2} = \sum_{n=0}^{\infty} (-1)^n 2^n x^{2n} = 1 - 2x^2 + 4x^4 - 8x^6 + 16x^8 - \ldots$$

How can we decide where this series converges if we know that the original geometric series converges when $-1 < x < 1$, or $-1 < \text{🍌} < 1$? Since we replaced 🍌 by $-2x^2$, the new series converges when $-1 < -2x^2 < 1$, or $-1 < 2x^2 < 1$. Solving this for $x$:

$$2x^2 < 1 \implies x^2 < \frac{1}{2} \implies |x| < \frac{1}{\sqrt{2}} \implies -\frac{1}{\sqrt{2}} < x < \frac{1}{\sqrt{2}}.$$

This means that the radius of convergence of the new series is $R = \frac{1}{\sqrt{2}}$.

Exercise 2. Find the Taylor series for $f(x) = \frac{1}{1+3(x-1)^2}$ centered around $x = 1$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

Example 3. Find the Taylor series of $f(x) = \ln(2-x)$ centered at $x = 0$. Write the answer using the $\Sigma$-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

Solution: The function $\ln(2-x)$ looks similar to $\ln(1-x)$, so we should start with the series:

$$\ln(1-x) = -\sum_{n=1}^{\infty} \frac{x^n}{n} \qquad \text{or} \qquad \ln(1-\text{🍌}) = -\sum_{n=1}^{\infty} \frac{(\text{🍌})^n}{n}.$$

To make $2 - x$ look like $1 - \text{🍌}$, we could rewrite it as $2 - x = 1 + 1 - x = 1 - (x-1)$ and replace 🍌 by $x - 1$. However, this will give us the series with powers $(x-1)^n$ which is centered at $x = 1$, while we want the series centered at $x = 0$. We need to do something a bit different. The trick is to write $2 - x = 2\left(1 - \frac{x}{2}\right)$, so that

$$\ln(2-x) = \ln\left(2\left(1 - \frac{x}{2}\right)\right) = \ln 2 + \ln\left(1 - \frac{x}{2}\right).$$

Now, we can replace 🍌 above by $\frac{x}{2}$, so

$$\ln(2-x) = \ln 2 + \ln\left(1 - \frac{x}{2}\right) = \ln 2 - \sum_{n=1}^{\infty} \frac{(x/2)^n}{n} = \ln 2 - \sum_{n=1}^{\infty} \frac{x^n}{n 2^n}.$$

The first few terms at the beginning of the series are

$$\ln(2) - \frac{x}{2} - \frac{x^2}{8} - \frac{x^3}{24} - \ldots$$

From the last section we know that the original series for $\ln(1-x)$ converges when $-1 < x < 1$, or $-1 < \text{🍌} < 1$. Since we replaced 🍌 by $\frac{x}{2}$, the new series converges when $-1 < \frac{x}{2} < 1$, or $-2 < x < 2$. This means that the radius of convergence is $R = 2$.

Exercise 3. Find the Taylor series of $f(x) = \ln(10 + x^2)$ centered at $x = 0$.
Write the answer using the Σ-notation and by writing out the first few terms. Where does this series converge? What is its radius of convergence $R$?

The next two problems will be slightly more tricky, because they require both a shift and a rescaling of the argument.

Example 4. Find the Taylor series of $f(x) = \frac{1}{x}$ centered at $x = 5$. Write the answer using the Σ-notation. Where does this series converge? What is its radius of convergence $R$?

Solution: The function $\frac{1}{x}$ looks similar to $\frac{1}{1-x}$, so we should start with the geometric series:
\[
\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n \quad\text{or}\quad \frac{1}{1-\square} = \sum_{n=0}^{\infty} \square^n.
\]
We want the series to be centered at 5, so at the end we want powers of $x - 5$. That means that we should add and subtract 5 and rewrite our function as
\[
\frac{1}{x} = \frac{1}{5 + (x-5)}.
\]
This does not quite look like $\frac{1}{1-\square}$ yet, but we can factor out 5:
\[
\frac{1}{x} = \frac{1}{5 + (x-5)} = \frac{1}{5\left(1 + \frac{x-5}{5}\right)} = \frac{1}{5} \cdot \frac{1}{1 + \frac{x-5}{5}} = \frac{1}{5} \cdot \frac{1}{1 - \left(-\frac{x-5}{5}\right)}.
\]
Now it looks like what we want, and if we replace $\square$ by $-\frac{x-5}{5}$ in the geometric series above, we get
\[
\frac{1}{x} = \frac{1}{5} \cdot \frac{1}{1 - \left(-\frac{x-5}{5}\right)} = \frac{1}{5} \sum_{n=0}^{\infty} \left(-\frac{x-5}{5}\right)^n = \sum_{n=0}^{\infty} \frac{(-1)^n}{5^{n+1}} (x-5)^n.
\]
We can see that this series is centered at $a = 5$ because all the powers are of the form $(x-5)^n$. This series converges when $-1 < -\frac{x-5}{5} < 1$, or $0 < x < 10$, which means that the radius of convergence is $R = 5$.

Exercise 4. Find the Taylor series of $f(x) = \ln(x)$ centered at $x = 10$. Write the answer using the Σ-notation. Where does this series converge? What is its radius of convergence $R$?

Multiplying two series. If we want to multiply two Taylor series,
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \quad\text{and}\quad g(x) = \sum_{n=0}^{\infty} b_n x^n,
\]
we can multiply them out term by term, just like regular sums, and then collect the terms with the same powers. In fact, there is a general formula for multiplying two Taylor series using the Σ-notation, but it is a bit too complicated for our purposes, so we will stick with simpler examples where we only want to find a few terms of the product $f(x)g(x)$.
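The term-by-term multiplication can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `multiply_truncated` and the sample coefficients are hypothetical, and each series is represented by the list of its known coefficients, with everything beyond the list treated as the unknown "$+ \dots$" part.

```python
def multiply_truncated(f_coeffs, g_coeffs):
    """Multiply two truncated power series given as coefficient lists.

    f_coeffs[k] is the coefficient of x**k. Only powers up to the smaller
    known degree are returned, because higher powers would also depend on
    the unknown "+ ..." terms of the two series.
    """
    max_degree = min(len(f_coeffs), len(g_coeffs)) - 1
    product = [0] * (max_degree + 1)
    for i, a in enumerate(f_coeffs):
        for j, b in enumerate(g_coeffs):
            if i + j <= max_degree:      # ignore powers hidden in "+ ..."
                product[i + j] += a * b
    return product


# Hypothetical series: f(x) = 1 + x + 2x^2 + ..., g(x) = 2 - x + 3x^2 + ...
print(multiply_truncated([1, 1, 2], [2, -1, 3]))   # coefficients of 1, x, x^2
```

The condition `i + j <= max_degree` is the code version of collecting only those powers that cannot be changed by the missing terms.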
Let us illustrate this with an example.

Example 5. Find the first few terms of the Taylor series of $f(x)g(x)$ centered at $x = 0$ if
\[
f(x) = 1 - 2x + x^2 + 7x^3 + \dots \quad\text{and}\quad g(x) = 3 + x + x^2 - 4x^3 + \dots
\]
Solution: One subtle point to remember when multiplying out the product of two series
\[
f(x)g(x) = (1 - 2x + x^2 + 7x^3 + \dots) \times (3 + x + x^2 - 4x^3 + \dots)
\]
is that the terms $+ \dots$ could contain powers of $x$ starting from $x^4$, $x^5$, etc., and in the problem we are not told exactly what those terms are. This means the following. Suppose we make the multiplication table for $f(x)g(x)$, writing all the terms of $f(x)$ in the first row, all the terms of $g(x)$ in the first column, and their products in the other entries of the table:
\[
\begin{array}{c|ccccc}
 & 1 & -2x & x^2 & 7x^3 & \cdots \\ \hline
3 & 3 & -6x & 3x^2 & 21x^3 & \cdots \\
x & x & -2x^2 & x^3 & 7x^4 & \cdots \\
x^2 & x^2 & -2x^3 & x^4 & 7x^5 & \cdots \\
-4x^3 & -4x^3 & 8x^4 & -4x^5 & -28x^6 & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots
\end{array}
\]
In this table the terms $+ \dots$ could contain powers starting from $x^4$ and, as we said, we do not know what they are. This means that the terms in the lower right corner of the table, which contain the powers $x^4$, $x^5$ and $x^6$, should not be collected, because they could potentially be modified by the missing terms $+ \dots$. In other words, when we multiply out two series, we should completely ignore the terms of the same degree as $+ \dots$, and our multiplication table could look like this:
\[
\begin{array}{c|cccc}
 & 1 & -2x & x^2 & 7x^3 \\ \hline
3 & 3 & -6x & 3x^2 & 21x^3 \\
x & x & -2x^2 & x^3 & \\
x^2 & x^2 & -2x^3 & & \\
-4x^3 & -4x^3 & & &
\end{array}
\]
Then we collect the terms with the same powers,
\[
f(x)g(x) = 3 + (x - 6x) + (x^2 - 2x^2 + 3x^2) + (-4x^3 - 2x^3 + x^3 + 21x^3) + \dots,
\]
and, after simplifying, we see that the first few terms of the product are
\[
f(x)g(x) = 3 - 5x + 2x^2 + 16x^3 + \dots
\]
Exercise 5. Find the first few terms of the Taylor series of $f(x)g(x)$ centered at $x = 0$ if
\[
f(x) = 7 - x^2 + 2x^4 + x^6 + \dots \quad\text{and}\quad g(x) = 1 + 2x^2 + 4x^4 + 8x^6 + \dots
\]
Padé approximation.
Next, we will give one application of what we have learned so far in this section to find some novel approximations of $e^x$ and $\cos(x)$ near $x = 0$: the so-called Padé approximation.

Example 6. In this problem we are going to find an approximation for $e^x$ near $x = 0$ by a simple function of the form $\frac{1+ax}{1+bx}$ (the red dashed curve in the figure). The goal of this question is to find the best parameters $a$ and $b$.
(a) Find the first three terms of the Taylor series for $\frac{1}{1+bx}$.
(b) Find the first three terms of the Taylor series for the product
\[
\frac{1+ax}{1+bx} = (1+ax) \cdot \frac{1}{1+bx}.
\]
(c) Make sure that the terms in part (b) match the first three terms of the Taylor series for $e^x$ to find $a$ and $b$.

Solution: (a) Using the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \dots$, we get that
\[
\frac{1}{1+bx} = \frac{1}{1-(-bx)} = 1 + (-bx) + (-bx)^2 + \dots = 1 - bx + b^2x^2 + \dots
\]
(b) Using what we found in part (a), we want to multiply out
\[
(1+ax) \cdot \frac{1}{1+bx} = (1+ax) \times (1 - bx + b^2x^2 + \dots) = 1 - bx + b^2x^2 + ax - abx^2 + ab^2x^3 + \dots
\]
However, we should remember that the $+ \dots$ term contains powers starting with $x^3$, so we must ignore the term $ab^2x^3$. The correct multiplication is actually:
\begin{align*}
(1+ax) \cdot \frac{1}{1+bx} &= (1+ax) \times (1 - bx + b^2x^2 + \dots) \\
&= 1 - bx + b^2x^2 + ax - abx^2 + \dots \\
&= 1 + (a-b)x + (b^2 - ab)x^2 + \dots
\end{align*}
Luckily, forgetting to ignore the term $ab^2x^3$ would not affect our next step, but it was worth emphasizing this point once again.
(c) We want our series in part (b) to be a good approximation of $e^x$ near $x = 0$, which has the Taylor series $e^x = 1 + x + \frac{x^2}{2} + \dots$. For this purpose, we want the coefficients in part (b) to match $1 + x + \frac{x^2}{2} + \dots$, so
\[
a - b = 1 \quad\text{and}\quad b^2 - ab = \frac{1}{2}.
\]
The first equation gives $a = b + 1$, and plugging it into the second equation gives $b^2 - (b+1)b = \frac{1}{2}$, or $-b = \frac{1}{2}$, or $b = -\frac{1}{2}$. Then $a = b + 1 = -\frac{1}{2} + 1 = \frac{1}{2}$. The approximation we were looking for is
\[
e^x \approx \frac{1+ax}{1+bx} = \frac{1 + 0.5x}{1 - 0.5x}.
\]
This is the function graphed by the red dashed curve in the figure above.

Exercise 6. In this problem we are going to find an approximation for $\cos(x)$ near $x = 0$ by a simple function of the form $\frac{1+ax^2}{1+bx^2}$ (the red dashed curve in the figure). The goal of this question is to find the best parameters $a$ and $b$.
(a) Find the first three terms of the Taylor series for $\frac{1}{1+bx^2}$.
(b) Find the first three terms of the Taylor series for the product
\[
\frac{1+ax^2}{1+bx^2} = (1+ax^2) \cdot \frac{1}{1+bx^2}.
\]
(c) Make sure that the terms in part (b) match the first three terms of the Taylor series for $\cos(x)$ to find $a$ and $b$.

Answer to Exercise 1. Because $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, replacing $x$ with $-x^2$ gives
\[
e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{((-1)x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{n!}.
\]
Multiplying this by $x$ term by term, like a regular sum:
\[
x e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n} x}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{n!} = x - x^3 + \frac{x^5}{2!} - \frac{x^7}{3!} + \dots
\]
The original series for $e^x$ converges everywhere, so $x$ can be replaced by any number. This means that the new series also converges for all $x$ and its radius of convergence is $R = \infty$.

Answer to Exercise 2. As in the previous example, we write $1 + 3(x-1)^2 = 1 - (-3(x-1)^2)$ and use the geometric series:
\[
\frac{1}{1+3(x-1)^2} = \frac{1}{1-(-3(x-1)^2)} = \sum_{n=0}^{\infty} (-3(x-1)^2)^n = \sum_{n=0}^{\infty} (-1)^n 3^n (x-1)^{2n}.
\]
This series is centered at $a = 1$, because all the terms are powers of $(x-1)$. A few terms at the beginning of the series are
\[
1 - 3(x-1)^2 + 9(x-1)^4 - 27(x-1)^6 + 81(x-1)^8 - \dots
\]
Since we replaced $x$ by $-3(x-1)^2$, the new series converges when $-1 < -3(x-1)^2 < 1$, or $-1 < 3(x-1)^2 < 1$. Solving this for $x$:
\[
3(x-1)^2 < 1 \implies (x-1)^2 < \frac{1}{3} \implies |x-1| < \frac{1}{\sqrt{3}} \implies -\frac{1}{\sqrt{3}} < x - 1 < \frac{1}{\sqrt{3}}.
\]
We can also write this as $1 - \frac{1}{\sqrt{3}} < x < 1 + \frac{1}{\sqrt{3}}$, so the radius of convergence is $R = \frac{1}{\sqrt{3}}$.

Answer to Exercise 3. We write $10 + x^2 = 10\left(1 + \frac{x^2}{10}\right) = 10\left(1 - \left(-\frac{x^2}{10}\right)\right)$, so that
\[
\ln(10 + x^2) = \ln\left(10\left(1 - \left(-\frac{x^2}{10}\right)\right)\right) = \ln 10 + \ln\left(1 - \left(-\frac{x^2}{10}\right)\right).
\]
Using the series for $\ln(1-x)$:
\[
\ln 10 + \ln\left(1 - \left(-\frac{x^2}{10}\right)\right) = \ln 10 - \sum_{n=1}^{\infty} \frac{(-x^2/10)^n}{n} = \ln 10 + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n 10^n} x^{2n}.
\]
The first few terms at the beginning of the series are
\[
\ln 10 + \frac{x^2}{10} - \frac{x^4}{200} + \frac{x^6}{3000} - \dots
\]
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x^2}{10} < 1$. Solving for $x$, we get $-\sqrt{10} < x < \sqrt{10}$, so the radius of convergence is $R = \sqrt{10}$.

Answer to Exercise 4. We begin by writing
\[
\ln(x) = \ln(10 + x - 10) = \ln\left(10\left(1 + \frac{x-10}{10}\right)\right) = \ln 10 + \ln\left(1 - \left(-\frac{x-10}{10}\right)\right).
\]
Using the series for $\ln(1-x)$, this equals
\[
\ln 10 - \sum_{n=1}^{\infty} \frac{\left(-\frac{x-10}{10}\right)^n}{n} = \ln 10 + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n 10^n} (x-10)^n.
\]
The original series for $\ln(1-x)$ converges when $-1 < x < 1$, so the new series converges when $-1 < -\frac{x-10}{10} < 1$. Solving for $x$, we get $0 < x < 20$, so the series is centered at $a = 10$ and the radius of convergence is $R = 10$.

Answer to Exercise 5. Because the $+ \dots$ term could contain powers $x^8$ or above, we should ignore those powers when multiplying things out:
\[
\begin{array}{c|cccc}
 & 7 & -x^2 & 2x^4 & x^6 \\ \hline
1 & 7 & -x^2 & 2x^4 & x^6 \\
2x^2 & 14x^2 & -2x^4 & 4x^6 & \\
4x^4 & 28x^4 & -4x^6 & & \\
8x^6 & 56x^6 & & &
\end{array}
\]
Collecting the terms with the same powers, we see that
\[
f(x)g(x) = 7 + 13x^2 + 28x^4 + 57x^6 + \dots
\]
Answer to Exercise 6. (a) Using the geometric series $\frac{1}{1-x} = 1 + x + x^2 + \dots$, we get that
\[
\frac{1}{1+bx^2} = \frac{1}{1-(-bx^2)} = 1 + (-bx^2) + (-bx^2)^2 + \dots = 1 - bx^2 + b^2x^4 + \dots
\]
(b) Using what we found in part (a), we want to multiply out
\begin{align*}
(1+ax^2) \cdot \frac{1}{1+bx^2} &= (1+ax^2) \times (1 - bx^2 + b^2x^4 + \dots) \\
&= 1 - bx^2 + b^2x^4 + ax^2 - abx^4 + \dots \\
&= 1 + (a-b)x^2 + (b^2 - ab)x^4 + \dots
\end{align*}
We did not write the term $+ab^2x^6$ because it is absorbed by the dots $+ \dots$.
(c) We want our series in part (b) to be a good approximation of $\cos(x)$ near $x = 0$, which has the Taylor series $\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$. For this purpose, we want the coefficients in part (b) to match $1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$, so
\[
a - b = -\frac{1}{2} \quad\text{and}\quad b^2 - ab = \frac{1}{4!} = \frac{1}{24}.
\]
The first equation gives $a = b - \frac{1}{2}$, and plugging it into the second equation gives $b^2 - \left(b - \frac{1}{2}\right)b = \frac{1}{24}$, or $\frac{b}{2} = \frac{1}{24}$, or $b = \frac{1}{12}$. Then $a = \frac{1}{12} - \frac{1}{2} = -\frac{5}{12}$. The approximation we were looking for is
\[
\cos(x) \approx \frac{1+ax^2}{1+bx^2} = \frac{1 - \frac{5}{12}x^2}{1 + \frac{1}{12}x^2}.
\]
This is the function graphed by the red dashed curve in the figure in the statement of the problem.

5.3 Ratio test and the radius of convergence

In this section we will discuss the radius of convergence $R$ of Taylor series in more detail, and our main tool will be the so-called Ratio Test. To state this test, let us first consider an arbitrary series
\[
\sum_{n=0}^{\infty} a_n = a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + \dots
\]
that consists of adding a sequence of numbers $a_0, a_1, a_2, a_3, a_4, a_5, \dots$ indefinitely. We want to know whether this addition process gets closer and closer to some limiting number as we keep adding more and more terms, and the Ratio Test gives us one very useful criterion, as follows.
• Compute the limit of the ratio of two consecutive numbers in the sequence,
\[
\rho := \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|},
\]
where we ignore their signs by taking the absolute values $|a_{n+1}|$ and $|a_n|$.
• Then the Ratio Test tells us that:
  If $\rho > 1$ then the series diverges.
  If $\rho < 1$ then the series converges.
  If $\rho = 1$ then the Ratio Test is inconclusive.
The reason why the Ratio Test works is quite simple.
• If $\rho > 1$, this indicates that, after a certain step $n$, the next number $|a_{n+1}|$ gets bigger than the previous number $|a_n|$, in absolute value. If we keep adding bigger and bigger numbers, we cannot hope to get closer and closer to some limiting number, so the series must diverge.
• On the other hand, if $\rho < 1$, this indicates that, after a certain step $n$, the next number $|a_{n+1}|$ gets smaller than the previous number $|a_n|$, and it gets smaller by about a factor of $\rho < 1$.
For example, if $\rho = 0.5$ and $a_n = 1$ then (ignoring possible $\pm$ signs) the next number is about 0.5, the one after that is about 0.25, the next is about 0.125, etc. These numbers are getting small quickly, so when we add them up we do get closer and closer to some limiting number.
• If $\rho = 1$ then the numbers $a_n$ might decrease, but they do not decrease fast enough for us to be able to conclusively tell whether their sum converges to something. Usually, a more careful analysis is needed in this case, but for the purpose of studying the radius of convergence, the Ratio Test will be enough.

We recall that the Taylor series of a function $f(x)$ centred at $a$ converges on some interval of the form $a - R < x < a + R$:

[Figure: number line in $x$ marking the points $a - R$, $a$ and $a + R$.]

or, in other words, for $x$ in between $a - R$ and $a + R$ for some number $R$ which is called the radius of convergence. In the examples below we will see that such behaviour is, indeed, a consequence of the Ratio Test.

We have stated in Section 5.1 that the radius of convergence of the Taylor series of the exponential function $e^x$ is $R = \infty$. In our first example, we will check this using the Ratio Test.

Example 1. Find the radius of convergence of the Taylor series:
\[
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.
\]
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^n}{n!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{n+1}}{(n+1)!},
\]
(do not forget the absolute values!) and then compute their ratio,
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{n+1}}{(n+1)!}\;}{\frac{|x|^n}{n!}} = \frac{|x|^{n+1}}{(n+1)!} \cdot \frac{n!}{|x|^n} = \frac{|x|^{n+1}}{|x|^n} \cdot \frac{n!}{(n+1)!} = |x| \frac{n!}{(n+1)!}.
\]
Notice how dividing $|x|^{n+1}$ by $|x|^n$ cancels $|x|^n$, so we are left with one power of $|x|$. This will be a typical feature when applying the Ratio Test to Taylor series. Notice that we can also simplify the ratio of factorials,
\[
\frac{n!}{(n+1)!} = \frac{1 \cdot 2 \cdots (n-1) \cdot n}{1 \cdot 2 \cdots (n-1) \cdot n \cdot (n+1)} = \frac{1}{n+1},
\]
because we could cancel out $1 \cdot 2 \cdots (n-1) \cdot n$ in the numerator and denominator.
As a result, the ratio is $\frac{|x|}{n+1}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|}{n+1} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges. Notice that the limit was 0 no matter what $x$ was, so this conclusion works for all $x$, which means that the radius of convergence is $R = \infty$.

Exercise 1. Find the radius of convergence of the Taylor series:
\[
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots = -\sum_{n=1}^{\infty} \frac{x^n}{n}.
\]
Example 2. Find the radius of convergence of the Taylor series:
\[
\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}.
\]
Solution: To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^{2n}}{(2n)!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{2(n+1)}}{(2(n+1))!} = \frac{|x|^{2n+2}}{(2n+2)!}.
\]
Notice how $2n$ does not become $2n + 1$, but $2(n+1) = 2n + 2$. That is because we have to replace each appearance of $n$ by $n + 1$. Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{2n+2}}{(2n+2)!}\;}{\frac{|x|^{2n}}{(2n)!}} = \frac{|x|^{2n+2}}{(2n+2)!} \cdot \frac{(2n)!}{|x|^{2n}} = \frac{|x|^{2n+2}}{|x|^{2n}} \cdot \frac{(2n)!}{(2n+2)!} = |x|^2 \frac{(2n)!}{(2n+2)!}.
\]
Notice how dividing $|x|^{2n+2}$ by $|x|^{2n}$ cancels $|x|^{2n}$, so we are left with $|x|^2$, not $|x|$. The factorials here also simplify differently:
\[
\frac{(2n)!}{(2n+2)!} = \frac{1 \cdot 2 \cdots (2n-1) \cdot 2n}{1 \cdot 2 \cdots (2n-1) \cdot 2n \cdot (2n+1)(2n+2)} = \frac{1}{(2n+1)(2n+2)},
\]
because we could cancel out $1 \cdot 2 \cdots (2n-1) \cdot 2n$ in the numerator and denominator. Because $2n$ increased by 2, there are two extra factors left in this case. As a result, the ratio is $\frac{|x|^2}{(2n+1)(2n+2)}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|^2}{(2n+1)(2n+2)} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.

Exercise 2. Find the radius of convergence of the Taylor series:
\[
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}.
\]
In the examples above, we checked that the radius of convergence is what was claimed in the table at the beginning of Section 5.1. Now, let us try some new series.

Example 3. Find the radius of convergence of the Taylor series:
\[
\sum_{n=1}^{\infty} \frac{(-2)^n}{n^2} (x+5)^{2n}.
\]
Solution: We consider two consecutive terms in the series,
\[
|a_n| = \frac{2^n |x+5|^{2n}}{n^2} \quad\text{and}\quad |a_{n+1}| = \frac{2^{n+1} |x+5|^{2(n+1)}}{(n+1)^2}.
\]
Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{2^{n+1} |x+5|^{2(n+1)}}{(n+1)^2}\;}{\frac{2^n |x+5|^{2n}}{n^2}} = 2|x+5|^2 \frac{n^2}{(n+1)^2}.
\]
To compute the limit, we divide the numerator and denominator by the highest power of $n$, which is $n^2$ in this case, so
\begin{align*}
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} &= \lim_{n\to\infty} 2|x+5|^2 \frac{n^2}{(n+1)^2} = 2|x+5|^2 \lim_{n\to\infty} \frac{\frac{n^2}{n^2}}{\left(\frac{n+1}{n}\right)^2} \\
&= 2|x+5|^2 \lim_{n\to\infty} \frac{1}{\left(1 + \frac{1}{n}\right)^2} = 2|x+5|^2 \frac{1}{(1+0)^2} = 2|x+5|^2.
\end{align*}
By the Ratio Test, the series converges when $2|x+5|^2 < 1$. Solving this for $x$,
\[
|x+5|^2 < \frac{1}{2} \implies |x+5| < \frac{1}{\sqrt{2}} \implies -\frac{1}{\sqrt{2}} < x + 5 < \frac{1}{\sqrt{2}} \implies -5 - \frac{1}{\sqrt{2}} < x < -5 + \frac{1}{\sqrt{2}}.
\]
This means that the centre is $a = -5$, as it should be because the series is written in terms of powers of $(x+5) = (x-(-5))$, which looks like $(x-a)$ with $a = -5$. The radius of convergence is $R = \frac{1}{\sqrt{2}}$.

Comment. In Exercise 1 and Example 3, we had some polynomial $P(n)$ in the denominator, namely, $n$ or $n^2$. In both cases we saw that, if we ignore other factors, the ratio of the polynomial factors converged to 1:
\[
\lim_{n\to\infty} \frac{P(n+1)}{P(n)} = 1.
\]
This is always true for any polynomial, because when we divide the numerator and denominator by the highest power of $n$, it makes all the lower degree terms disappear. If we are looking for a radius of convergence, for example, on a multiple choice question on the exam where we do not need to show our work, we can simply erase any polynomial factors from the beginning.
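The claim that $P(n+1)/P(n) \to 1$ for any polynomial can be checked numerically. The following is a minimal sketch; the helper name `poly_ratio` and the particular polynomial are chosen only for illustration.

```python
def poly_ratio(P, n):
    """Return P(n + 1) / P(n) for a polynomial given as a Python function."""
    return P(n + 1) / P(n)


def P(n):
    # Illustrative polynomial; any other polynomial behaves the same way.
    return n**4 - 7 * n + 3


for n in (10, 100, 1000, 10000):
    # The printed ratios get closer and closer to 1 as n grows.
    print(n, poly_ratio(P, n))
```

For a polynomial of degree $d$ the ratio behaves roughly like $1 + d/n$ for large $n$, which is why it has no effect on the limit in the Ratio Test.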
For example, the series
\[
\sum_{n=1}^{\infty} (-2)^n \frac{n^4 - 7n + 3}{n^2 + 3n + 1} (x+5)^{2n} \quad\text{and}\quad \sum_{n=1}^{\infty} (-2)^n (x+5)^{2n}
\]
have the same interval and radius of convergence, because the factors $n^4 - 7n + 3$ and $n^2 + 3n + 1$ in the numerator and denominator of the first series will not affect the limit. The second one is much easier to work with. Of course, other factors involving factorials like $n!$ and exponentials like $5^n$ cannot be ignored.

Exercise 3. Find the radius of convergence of the Taylor series:
\[
\sum_{n=0}^{\infty} \frac{n^3 + n}{5^n} (x-4)^n.
\]
Pretend that this is a multiple choice question on the exam.

Example 4. (a) If the series $\sum_{n=0}^{\infty} c_n \left(\frac{x-1}{4}\right)^n$ has the interval of convergence $(-7, 9)$, where does the series $\sum_{n=0}^{\infty} c_n \left(\frac{2x+1}{5}\right)^n$ converge? (b) Is it ever possible that the series $\sum_{n=0}^{\infty} c_n \left(\frac{2x+1}{5}\right)^n$ converges on the interval $(0, 2)$?

Solution: (a) The key step here is to use the given information to determine where the series $\sum_{n=0}^{\infty} c_n \square^n$ converges. The original series $\sum_{n=0}^{\infty} c_n \left(\frac{x-1}{4}\right)^n$ converges when $-7 < x < 9$ and $\square = \frac{x-1}{4}$, so we need to rewrite $-7 < x < 9$ in terms of $\frac{x-1}{4}$. First, we subtract 1, so $-7 < x < 9$ becomes $-8 < x - 1 < 8$. Then we divide by 4 and get $-2 < \frac{x-1}{4} < 2$. As a result, the given information can be rephrased like this: $\sum_{n=0}^{\infty} c_n \square^n$ converges when $-2 < \square < 2$.

In the second series, $\square = \frac{2x+1}{5}$, so the series $\sum_{n=0}^{\infty} c_n \left(\frac{2x+1}{5}\right)^n$ converges when $-2 < \frac{2x+1}{5} < 2$ and, solving this for $x$, we get $-10 < 2x + 1 < 10$, or $-\frac{11}{2} < x < \frac{9}{2}$. The center of this series is in the middle of $-\frac{11}{2}$ and $\frac{9}{2}$, which is $a = \frac{1}{2}\left(-\frac{11}{2} + \frac{9}{2}\right) = -\frac{1}{2}$. The radius of convergence is $R = \frac{9}{2} - \left(-\frac{1}{2}\right) = 5$.

(b) The series $\sum_{n=0}^{\infty} c_n \left(\frac{2x+1}{5}\right)^n$ is written in terms of the powers of $\frac{2x+1}{5}$, so the centre must always be where $\frac{2x+1}{5} = 0$, or $x = -\frac{1}{2}$. It is not possible that this series has $(0, 2)$ as its interval of convergence, because its center would then be at 1 and not $-\frac{1}{2}$.

Exercise 4. (a) If the series $\sum_{n=0}^{\infty} c_n \left(\frac{x-1}{2}\right)^n$ has the radius of convergence $R = 2$, where does the series $\sum_{n=0}^{\infty} c_n \left(\frac{x+2}{4}\right)^n$ converge? (b) Is it ever possible that the series $\sum_{n=0}^{\infty} c_n \left(\frac{x+2}{4}\right)^n$ converges on the interval $(-4, 0)$?

Using symmetry. In the next two problems we will use the fact that the interval of convergence must be symmetric around the center:

[Figure: number line in $x$ marking the points $a - R$, $a$ and $a + R$.]

Convergence at the endpoints $a - R$ and $a + R$ of the interval cannot be determined from the Ratio Test, because this is exactly where it is inconclusive. So we cannot say anything about the endpoints and, usually, they need to be handled separately. Sometimes the series might converge at one or both endpoints, but sometimes it might diverge at both endpoints.

Example 5. Suppose we know that the series $\sum_{n=0}^{\infty} c_n (x-2)^n$ converges at $x = 3$ but diverges at $x = -1$. What can we tell about the convergence of the series at $x = 0$, $x = 1$, $x = 1.5$, $x = 5$ and $x = 7$?

Solution: The center of the series is where $x - 2 = 0$, or at $x = 2$. The statement about convergence or divergence must be symmetric around the center. Because we know that the series converges at $x = 3$ but diverges at $x = -1$, this leads to the following diagram:

[Figure: number line in $x$ with the points $-1$, $1$, $2$, $3$, $5$ marked; "diverges" to the left of $-1$, "???" between $-1$ and $1$, "converges" between $1$ and $3$, "???" between $3$ and $5$, "diverges" to the right of $5$.]

Because the series converges at $x = 3$, it definitely converges in between the center 2 and 3 and, by symmetry, it definitely converges between 1 and 2. Because it diverges at $x = -1$, it must also diverge to the left of $-1$ and, by symmetry, it must diverge to the right of $x = 5$ ($-1$ and $5$ are at the same distance from the center). This leaves the intervals in between $-1$ and $1$, and in between $3$ and $5$. Here we have no information to tell whether the series converges or diverges. As a result, we can tell that at $x = 1.5$ the series converges, at $x = 7$ the series diverges, and at $x = 0$ we cannot tell whether it converges or diverges given what we know.
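The symmetry argument can be written as a small decision rule based on distances from the center. This is a sketch under stated assumptions: the function name `classify` and its arguments are hypothetical, and the only inputs are the center, one point where the series is known to converge, and one point where it is known to diverge.

```python
def classify(x, center, converges_at, diverges_at):
    """Decide what the known information says about convergence at x."""
    if x == converges_at:
        return "converges"            # given directly
    if x == diverges_at:
        return "diverges"             # given directly
    distance = abs(x - center)
    r_at_least = abs(converges_at - center)   # the radius R is at least this
    r_at_most = abs(diverges_at - center)     # the radius R is at most this
    if distance < r_at_least:
        return "converges"            # strictly inside the known interval
    if distance > r_at_most:
        return "diverges"             # strictly outside the largest possible interval
    return "cannot tell"              # in between, or a possible endpoint


# Series centered at 2 that converges at x = 3 and diverges at x = -1.
for x in (0, 1.5, 7):
    print(x, classify(x, center=2, converges_at=3, diverges_at=-1))
```

The strict inequalities matter: a point whose distance from the center equals one of the two known distances could be an endpoint of the interval of convergence, so the rule returns "cannot tell" there.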
We also cannot tell what happens at $x = 1$ and $x = 5$, because they could be the endpoints of the interval of convergence.

Exercise 5. Suppose we know that the Taylor series $\sum_{n=0}^{\infty} \frac{f^{(n)}(-2)}{n!} (x+2)^n$ converges at $x = 2$ but diverges at $x = 4$. What can we tell about the convergence of the series at $x = -9$, $x = -6$, $x = -5$, $x = -4$, $x = 1$ and $x = 3$?

Answer to Exercise 1. Two consecutive terms, with absolute values, are
\[
|a_n| = \frac{|x|^n}{n} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{n+1}}{n+1},
\]
and their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{n+1}}{n+1}\;}{\frac{|x|^n}{n}} = |x| \frac{n}{n+1}.
\]
To compute the limit, we divide the numerator and denominator of $\frac{n}{n+1}$ by the highest power of $n$, in this case $n$ itself, and we get
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} |x| \frac{n}{n+1} = |x| \lim_{n\to\infty} \frac{\frac{n}{n}}{\frac{n+1}{n}} = |x| \lim_{n\to\infty} \frac{1}{1 + \frac{1}{n}} = |x| \frac{1}{1+0} = |x|.
\]
The Ratio Test tells us that the series converges when this limit is smaller than 1, i.e. $|x| < 1$, or $-1 < x < 1$. This means that the center is $a = 0$ and the radius of convergence is $R = 1$.

Answer to Exercise 2. To use the Ratio Test, we consider two consecutive terms in the series,
\[
|a_n| = \frac{|x|^{2n+1}}{(2n+1)!} \quad\text{and}\quad |a_{n+1}| = \frac{|x|^{2(n+1)+1}}{(2(n+1)+1)!} = \frac{|x|^{2n+3}}{(2n+3)!}.
\]
Their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\;\frac{|x|^{2n+3}}{(2n+3)!}\;}{\frac{|x|^{2n+1}}{(2n+1)!}} = \frac{|x|^{2n+3}}{(2n+3)!} \cdot \frac{(2n+1)!}{|x|^{2n+1}} = \frac{|x|^{2n+3}}{|x|^{2n+1}} \cdot \frac{(2n+1)!}{(2n+3)!} = |x|^2 \frac{(2n+1)!}{(2n+3)!}.
\]
The factorials simplify to
\[
\frac{(2n+1)!}{(2n+3)!} = \frac{1 \cdot 2 \cdots 2n \cdot (2n+1)}{1 \cdot 2 \cdots 2n \cdot (2n+1) \cdot (2n+2)(2n+3)} = \frac{1}{(2n+2)(2n+3)}.
\]
As a result, the ratio is $\frac{|x|^2}{(2n+2)(2n+3)}$ and its limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{|x|^2}{(2n+2)(2n+3)} = 0.
\]
Since this limit is less than 1, the Ratio Test tells us that the series converges for all $x$, which means that the radius of convergence is $R = \infty$.

Answer to Exercise 3. On a multiple choice question, we can pretend that the polynomial factor $n^3 + n$ is not there and that our series is
\[
\sum_{n=0}^{\infty} \frac{1}{5^n} (x-4)^n = \sum_{n=0}^{\infty} \left(\frac{x-4}{5}\right)^n.
\]
From here, we can proceed in two ways.
The fastest way is to remember that the geometric series $\sum_{n=0}^{\infty} \square^n$ converges when $-1 < \square < 1$. In this case $\square$ is $\frac{x-4}{5}$, so the above series converges when $-1 < \frac{x-4}{5} < 1$, or $-5 < x - 4 < 5$, or $-1 < x < 9$. So the center is the middle point $a = 4$ and the radius of convergence is $R = 5$.

Another way is to use the Ratio Test. Two consecutive terms are
\[
|a_n| = \left|\frac{x-4}{5}\right|^n \quad\text{and}\quad |a_{n+1}| = \left|\frac{x-4}{5}\right|^{n+1},
\]
and their ratio is
\[
\frac{|a_{n+1}|}{|a_n|} = \frac{\left|\frac{x-4}{5}\right|^{n+1}}{\left|\frac{x-4}{5}\right|^n} = \left|\frac{x-4}{5}\right|.
\]
This does not depend on $n$, so the limit is
\[
\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \left|\frac{x-4}{5}\right|.
\]
By the Ratio Test, the series converges when $\left|\frac{x-4}{5}\right| < 1$ and, solving it for $x$, we get $-5 < x - 4 < 5$, or $-1 < x < 9$, as above.

Answer to Exercise 4. (a) The series $\sum_{n=0}^{\infty} c_n \left(\frac{x-1}{2}\right)^n$ has the centre where $\frac{x-1}{2} = 0$, or $x = 1$. Since the radius of convergence is $R = 2$, it converges when $1 - 2 < x < 1 + 2$. We can rewrite this in terms of $\square = \frac{x-1}{2}$ as $-1 < \square < 1$. This means that the series $\sum_{n=0}^{\infty} c_n \left(\frac{x+2}{4}\right)^n$ converges when $-1 < \frac{x+2}{4} < 1$, or $-4 < x + 2 < 4$, or $-6 < x < 2$. The center is $a = -2$ and the radius of convergence is $R = 4$.

(b) The center of the interval $(-4, 0)$ is $-2$, so it is possible that it could be the interval of convergence of the series $\sum_{n=0}^{\infty} c_n \left(\frac{x+2}{4}\right)^n$, which must have its center where $\frac{x+2}{4} = 0$, or $x = -2$.

Answer to Exercise 5. The diagram here will be

[Figure: number line in $x$ with the points $-8$, $-6$, $-2$, $2$, $4$ marked; "diverges" to the left of $-8$, "???" between $-8$ and $-6$, "converges" between $-6$ and $2$, "???" between $2$ and $4$, "diverges" to the right of $4$.]

At $x = -5$, $x = -4$ and $x = 1$ the series converges. At $x = -9$ it diverges. At $x = 3$, we cannot tell if it converges or diverges. The point $x = -6$ could be an endpoint of the interval of convergence, so we cannot tell either if the series converges or diverges there.

5.4 Applications of Taylor series

In this section we will go over some applications of Taylor polynomials and series. We have already seen applications of Taylor polynomials to approximating integrals, and approximating solutions of some differential equations.
In this section we will consider some new applications to computing derivatives, integrals, limits, and comparing functions near some point. We will also quickly review some old applications.

The first two problems will emphasize the connection between the coefficients of the Taylor series $\sum_{n=0}^{\infty} c_n (x-a)^n$ centered at $x = a$ of some function $f(x)$ and the derivatives of this function, namely,
\[
c_n = \frac{f^{(n)}(a)}{n!} \quad\text{or}\quad f^{(n)}(a) = n!\, c_n.
\]
Of course, this is how the coefficients $c_n$ of the Taylor series are defined, but if we can compute the Taylor series first, we can use the coefficients $c_n$ to compute the derivatives.

Example 1. Suppose that
\[
f(x) = \sum_{n=0}^{\infty} c_n (x-2)^n \quad\text{and}\quad f^{(n)}(2) = \begin{cases} \dfrac{n!}{2^n} & \text{if $n$ is even,} \\[1ex] \dfrac{1}{n!} & \text{if $n$ is odd.} \end{cases}
\]
Find the formula for the coefficient $c_n$.

Solution: Using the formula above,
\[
c_n = \frac{f^{(n)}(2)}{n!} = \begin{cases} \dfrac{1}{2^n} & \text{if $n$ is even,} \\[1ex] \dfrac{1}{(n!)^2} & \text{if $n$ is odd.} \end{cases}
\]
Here, we simply applied the definition of the coefficients of the Taylor series. For example, $c_{99} = \frac{1}{(99!)^2}$, because 99 is odd.

Exercise 1. Consider the function
\[
f(x) = \sum_{n=20}^{\infty} \frac{5^n}{\sqrt{n+3}} (x+4)^n.
\]
Which of the following are true?
(a) $f^{(19)}(-4) = 0$
(b) $f^{(22)}(-4) = \frac{5^{22}}{5}$
(c) $f^{(22)}(-4) = 5^{21} \cdot 22!$
(d) $f^{(25)}(-4) = 5^{25}$

Computing derivatives. In the next two problems we will have to compute the series first using some basic transformations of classic series. We will also use a convenient notation for the derivative $f^{(n)}(a)$, namely,
\[
\frac{d^n}{dx^n} f(x) \Big|_{x=a}.
\]
The notation expresses that we compute the $n$th derivative $\frac{d^n}{dx^n}$ of the function $f(x)$ and then evaluate it at $x = a$.

Example 2. Compute
\[
\frac{d^{20}}{dx^{20}} \left(x e^{-x^2}\right) \Big|_{x=0} \quad\text{and}\quad \frac{d^{21}}{dx^{21}} \left(x e^{-x^2}\right) \Big|_{x=0}.
\]
Solution: First, we need to find the Taylor series for $x e^{-x^2}$ centered at 0. Starting with the exponential series $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, we replace $x$ by $-x^2$,
\[
e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n},
\]
and then multiply it by $x$,
\[
x e^{-x^2} = x \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n+1}.
\]
To find the derivative $\frac{d^{20}}{dx^{20}} \left(x e^{-x^2}\right) \big|_{x=0}$, we need to find the coefficient in this series in front of $x^{20}$ or, in other words, the term where the power $2n + 1 = 20$, or $n = \frac{19}{2} = 9.5$. This is not an integer, so there is no term in the series with the power $x^{20}$. Another way to see that $2n + 1$ cannot be equal to 20 is that it is always odd. Since the power $x^{20}$ is not in the series, the coefficient $c_{20} = 0$ and, as a result,
\[
\frac{d^{20}}{dx^{20}} \left(x e^{-x^2}\right) \Big|_{x=0} = 0.
\]
To find the derivative $\frac{d^{21}}{dx^{21}} \left(x e^{-x^2}\right) \big|_{x=0}$, we need to find the coefficient $c_{21}$ in this series in front of $x^{21}$ or, in other words, the term where the power $2n + 1 = 21$, or $n = 10$. The coefficient $c_{21}$ in front of $x^{21}$ is $\frac{(-1)^{10}}{10!} = \frac{1}{10!}$, so the derivative is
\[
\frac{d^{21}}{dx^{21}} \left(x e^{-x^2}\right) \Big|_{x=0} = 21!\, c_{21} = 21! \cdot \frac{1}{10!} = \frac{21!}{10!} = 14079294028800.
\]
Exercise 2. Compute
\[
\frac{d^{11}}{dx^{11}} \left(x \sin(x)\right) \Big|_{x=0} \quad\text{and}\quad \frac{d^{12}}{dx^{12}} \left(x \sin(x)\right) \Big|_{x=0}.
\]
Computing limits. In the next two problems, we will apply Taylor series to compute some limits.

Example 3. Compute the limit
\[
\lim_{x\to 0} \frac{e^{-x^2} - 1 + x^2}{x^4}.
\]
Solution: We cannot just plug in $x = 0$, because we will get $\frac{0}{0}$. Instead, we will need to simplify first using Taylor series. As in the previous example, starting with the exponential series $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, we replace $x$ by $-x^2$,
\[
e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} x^{2n} = 1 - x^2 + \frac{x^4}{2} + \dots
\]
If we move $1 - x^2$ to the left hand side, we get
\[
e^{-x^2} - 1 + x^2 = \frac{x^4}{2} + \dots
\]
The dots $\dots$ have powers at least $x^5$ (actually, in this case, at least $x^6$), so after we divide both sides by $x^4$, we get
\[
\frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2} + \dots
\]
where the dots $\dots$ now have at least one power of $x$, because $\frac{x^5}{x^4} = x$. When we let $x$ go to zero, all those $\dots$ terms will disappear and so
\[
\lim_{x\to 0} \frac{e^{-x^2} - 1 + x^2}{x^4} = \frac{1}{2}.
\]
Exercise 3. Compute the limit
\[
\lim_{x\to 0} \frac{\sin(x^3) - x^3}{x^9}.
\]
Comparing functions near a point. In the last two problems we used that the $\dots$
terms in the Taylor series disappeared in the limit x → 0 as long they they had at least one power of x. Next, we will use a similar idea to compare two functions near x = 0. Next two problems will refer to the following figures. Example 4. In the left figure above, which graph corresponds to: (b) 1 − sin(x2 ) (a) cos(x) 2 4 3 5 Solution: Recall that cos(x) = 1 − x2! + x4! − . . . and sin(x) = x − x3! + x5! − . . . and, replacing x by x2 in sin(x), sin(x2 ) = x2 − x6 x10 + −... 3! 5! and 1 − sin(x2 ) = 1 − x2 + x6 x10 − +... . 3! 5! Comparing cos(x) and 1 − sin(x2 ) is equivalent to comparing 1− x2 x4 + −... 2! 4! and 1 − x2 + x6 x10 − +... . 3! 5! First, we can cancel 1 on both sides, so we need to compare − x2 x4 + −... 2! 4! and − x2 + x6 x10 − +... . 3! 5! Then we can divide both sides by x2 and compare − 1 x2 + −... 2! 4! and −1+ x4 x8 − +... . 3! 5! Near x = 0, the terms that have at least one power of x will get smaller and smaller, so near 0 the main contribution is − 12 on the left hand side and −1 on the right hand side. Since − 21 > −1, the left hand side is bigger near x = 0. As a result, we conclude that cos(x) > 1 − sin(x2 ) near x = 0, so the blue solid graph corresponds to cos(x) and red dashed graph corresponds to 1 − sin(x2 ). 5.4 Applications of Taylor series 289 Exercise 4. In the right figure above, which graph corresponds to: √ (a) 1 − cos(x) 1 + x2 − 1 (b) √ 1 + x centered at x = 0 first. Hint: find three terms of the Taylor series for Computing integrals. In Chapter 3 we have used Taylor polynomials to approximate integrals. Here we will compute some integrals exactly by representing them as a series. The main fact to remember: we can integrate Taylor series term by term inside the interval of convergence. n+1 (−1) Example 5. Using the series ln(1 + x) = ∑∞ n=1 n xn , compute the integral Z 1.5 ln(x) dx. 1 Solution: The integral can actually be computed using integration by parts, but here we will try to use Taylor series. 
The function we integrate is $\ln(x)$, while the series is for $\ln(1 + x)$. We can change variables either in the integral or in the series, so let us make the substitution $x = 1 + t$, $dx = dt$ in the integral and rewrite it as

$$\int_1^{1.5} \ln(x)\,dx = \int_0^{0.5} \ln(1 + t)\,dt.$$

Recall that the radius of convergence of the above series is $R = 1$ and, in the variable $t$, the center is $t = 0$ (which corresponds to $x = 1$), so the interval of integration $[0, 0.5]$ is inside the interval of convergence and we can integrate term by term:

$$\begin{aligned}
\int_0^{0.5} \ln(1 + t)\,dt &= \int_0^{0.5} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, t^n\,dt \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \int_0^{0.5} t^n\,dt \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \cdot \frac{t^{n+1}}{n+1}\Big|_{t=0}^{t=0.5} \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \cdot \frac{0.5^{n+1}}{n+1} \\
&= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}\, 0.5^{n+1}}{n(n+1)} \\
&= \frac{0.5^2}{2} - \frac{0.5^3}{6} + \frac{0.5^4}{12} - \dots.
\end{aligned}$$

For example, if we sum the first three terms written above, we get 0.109375, while the actual integral is approximately 0.108198.

Exercise 5. Compute the following integral by representing it as a series:

$$\int_0^x \frac{\sin(t)}{t}\,dt.$$

Shape of graphs. Let us recall how the coefficients in front of the powers $(x - a)$ and $(x - a)^2$ in the Taylor series correspond to the properties of the graph of a function $y = f(x)$.

Exercise 6. Which of the four graphs (a), (b), (c), (d) has the Taylor series

$$f(x) = (x - 4) + \frac{1}{2}(x - 4)^2 + \dots\,?$$

[Figure: four graphs labeled (a), (b), (c), (d); the images are not reproduced here.]

Answer to Exercise 1. The center of the series is $a = -4$, so its coefficients allow us to compute derivatives at $x = -4$ using the formula $f^{(n)}(-4) = n!\,c_n$.

(a) True. Notice that the series starts with index $n = 20$. In other words, the lowest power is $(x + 4)^{20}$. Since there is no term $(x + 4)^{19}$, the coefficient in front of it is zero, $c_{19} = 0$, so the derivative $f^{(19)}(-4) = 0$.

(b) False. Using the above formula,

$$f^{(22)}(-4) = 22!\,c_{22} = 22!\,\frac{5^{22}}{\sqrt{22 + 3}} = 22!\,\frac{5^{22}}{5} = 22!\,5^{21}.$$

(c) True. It matches what we computed in part (b).

(d) False.

$$f^{(25)}(-4) = 25!\,c_{25} = 25!\,\frac{5^{25}}{\sqrt{25 + 3}} = 25!\,\frac{5^{25}}{\sqrt{28}} \neq 5^{25}.$$

Answer to Exercise 2. Multiplying the series for $\sin(x)$ by $x$,

$$x\sin(x) = x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+2}.$$

Because the powers $2n + 2$ are always even, there is no term with $x^{11}$, so

$$\frac{d^{11}}{dx^{11}}\, x\sin(x)\Big|_{x=0} = 0.$$

To find $\frac{d^{12}}{dx^{12}}\, x\sin(x)\big|_{x=0}$, we need to find the coefficient $c_{12}$ in front of $x^{12}$. This happens when $2n + 2 = 12$, or $n = 5$, so the coefficient is $c_{12} = \frac{(-1)^5}{(2\cdot 5+1)!} = -\frac{1}{11!}$. Then the derivative is

$$\frac{d^{12}}{dx^{12}}\, x\sin(x)\Big|_{x=0} = 12!\,c_{12} = 12!\cdot\Big(-\frac{1}{11!}\Big) = -12.$$

Answer to Exercise 3. Plugging $x^3$ into the series for $\sin(x)$:

$$\sin(x^3) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, (x^3)^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{6n+3} = x^3 - \frac{x^9}{3!} + \dots$$

where the dots $\dots$ have at least 10 (actually 15) powers of $x$. Subtracting $x^3$ and dividing by $x^9$, we get

$$\frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} + \dots$$

where the dots have at least one power of $x$. When $x$ goes to zero, those terms disappear and we get

$$\lim_{x\to 0} \frac{\sin(x^3) - x^3}{x^9} = -\frac{1}{3!} = -\frac{1}{6}.$$

Answer to Exercise 4. From the Taylor series for $\cos(x)$, we know that

$$1 - \cos(x) = \frac{x^2}{2!} - \frac{x^4}{4!} + \dots = \frac{x^2}{2} - \frac{x^4}{24} + \dots.$$

Next, let us find the first three terms of the series for $f(x) = \sqrt{1 + x} = (1 + x)^{1/2}$. We compute

$$f'(x) = \frac{1}{2}(1 + x)^{-1/2} = \frac{1}{2\sqrt{1 + x}}, \qquad f''(x) = \frac{1}{2}\Big(-\frac{1}{2}\Big)(1 + x)^{-3/2} = -\frac{1}{4(1 + x)^{3/2}},$$

so $f(0) = 1$, $f'(0) = \frac{1}{2}$ and $f''(0) = -\frac{1}{4}$. This gives the first three terms of the Taylor series:

$$\sqrt{1 + x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \dots.$$

Plugging in $x^2$ and then subtracting 1 gives:

$$\sqrt{1 + x^2} - 1 = 1 + \frac{x^2}{2} - \frac{x^4}{8} + \dots - 1 = \frac{x^2}{2} - \frac{x^4}{8} + \dots.$$

So, comparing $1 - \cos(x)$ and $\sqrt{1 + x^2} - 1$ is equivalent to comparing

$$\frac{x^2}{2} - \frac{x^4}{24} + \dots \quad\text{and}\quad \frac{x^2}{2} - \frac{x^4}{8} + \dots.$$

Cancelling $\frac{x^2}{2}$ on both sides and then dividing by $x^4$ leads to comparing

$$-\frac{1}{24} + \dots \quad\text{and}\quad -\frac{1}{8} + \dots.$$

On both sides the dots $\dots$ contain at least one power of $x$, so they become negligible near $x = 0$, and we compare $-\frac{1}{24} > -\frac{1}{8}$. This means that $1 - \cos(x) > \sqrt{1 + x^2} - 1$ near $x = 0$, so the blue solid graph corresponds to $1 - \cos(x)$ and the red dashed graph corresponds to $\sqrt{1 + x^2} - 1$.

Answer to Exercise 5.
Since all the terms in the Taylor series for $\sin(t)$ have at least one power of $t$, we can divide the series by $t$ to represent

$$\frac{\sin(t)}{t} = \frac{1}{t} \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, t^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, t^{2n}.$$

The series for $\sin(t)$ has the radius of convergence $R = \infty$, and so does the series above, so we are allowed to integrate it over any interval, term by term. Then

$$\begin{aligned}
\int_0^x \frac{\sin(t)}{t}\,dt &= \int_0^x \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, t^{2n}\,dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \int_0^x t^{2n}\,dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{t^{2n+1}}{2n+1}\Big|_{t=0}^{t=x} \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} \cdot \frac{x^{2n+1}}{2n+1} \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n\, x^{2n+1}}{(2n+1)!\,(2n+1)}.
\end{aligned}$$

In this case, the integral cannot be computed by finding an antiderivative in terms of elementary functions, so using a series representation is a great alternative.

Answer to Exercise 6. The coefficient 1 in front of $(x - 4)$ tells us that $f'(4) = 1$, so the slope is 1, and the coefficient $\frac{1}{2}$ in front of $(x - 4)^2$ tells us that $f''(4) = 2!\cdot\frac{1}{2} = 1$, so the function is concave up. Since there is no free constant $c_0$, we also know that $f(4) = 0$. We can eliminate (b) because it is concave down at $x = 4$. We can eliminate (d) because it is not equal to 0 at $x = 4$. Finally, to choose between (a) and (c), we observe that the slope at $x = 4$ in (c) looks closer to 2 than 1, so the answer is (a).
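Some of the numbers in this section can be cross-checked numerically. The following Python sketch is not part of the course material, and its function names are made up for illustration. It recomputes the derivatives from Example 2 and Exercise 2 with the formula f^(n)(a) = n! c_n, using exact fractions, and compares the three-term partial sum from Example 5 with the value 1.5 ln(1.5) − 0.5 obtained by integration by parts.

```python
from fractions import Fraction
from math import factorial, log


def derivative_from_coefficient(c_n, n):
    """Return f^(n)(a) = n! * c_n for a Taylor coefficient c_n."""
    return factorial(n) * c_n


def coeff_x_exp_neg_x2(power):
    """Coefficient of x^power in x*e^(-x^2) = sum_k (-1)^k / k! * x^(2k+1)."""
    if power < 1 or power % 2 == 0:  # only odd powers 2k+1 appear
        return Fraction(0)
    k = (power - 1) // 2
    return Fraction((-1) ** k, factorial(k))


def coeff_x_sin(power):
    """Coefficient of x^power in x*sin(x) = sum_k (-1)^k / (2k+1)! * x^(2k+2)."""
    if power < 2 or power % 2 == 1:  # only even powers 2k+2 appear
        return Fraction(0)
    k = (power - 2) // 2
    return Fraction((-1) ** k, factorial(2 * k + 1))


def log_integral_partial_sum(terms, upper=0.5):
    """Partial sum of sum_n (-1)^(n+1) * upper^(n+1) / (n*(n+1)), n = 1..terms."""
    return sum(
        (-1) ** (n + 1) * upper ** (n + 1) / (n * (n + 1))
        for n in range(1, terms + 1)
    )


# Example 2: 20th and 21st derivatives of x*e^(-x^2) at x = 0.
d20 = derivative_from_coefficient(coeff_x_exp_neg_x2(20), 20)
d21 = derivative_from_coefficient(coeff_x_exp_neg_x2(21), 21)

# Exercise 2: 11th and 12th derivatives of x*sin(x) at x = 0.
d11 = derivative_from_coefficient(coeff_x_sin(11), 11)
d12 = derivative_from_coefficient(coeff_x_sin(12), 12)

# Example 5: three-term partial sum versus the integration-by-parts value.
three_terms = log_integral_partial_sum(3)
exact_value = 1.5 * log(1.5) - 0.5

print(d20, d21)                # 0 14079294028800
print(d11, d12)                # 0 -12
print(f"{three_terms:.6f}")    # 0.109375
print(f"{exact_value:.6f}")    # 0.108198
```

Exact fractions are used for the coefficients so that products such as 21!/10! come out as integers without rounding. Ordinary floating point is enough for the integral, where only a few decimal places are compared.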