The SAIS Math Companion
Aaron Roth, MA ’06
2005

Contents

Contents
List of Figures
List of Tables
1 Functions and graphs
  1.1 What is a function?
  1.2 Functions in mathematics
  1.3 Functions of several variables
  1.4 From functions to graphs
  1.5 Level curves
  1.6 Inverses
  1.7 Composing functions
2 Some special functions
  2.1 Concave and convex functions
  2.2 Exponents
  2.3 Exponential and logarithmic functions
3 Linear equations
  3.1 Linear functions
  3.2 Solving simultaneous equations
4 Differentiation
  4.1 What is a derivative?
  4.2 Rules for taking derivatives
  4.3 The second derivative
  4.4 Partial derivatives
5 Maximums and minimums
  5.1 Using derivatives to find maximums and minimums
  5.2 The Lagrangian
Index

List of Figures

1.1 A function maps elements of a domain to elements of a range.
1.2 A function cannot map one element to several...
1.3 ...but a function can map several elements to one.
1.4 A graph is a way of representing a function.
1.5 Using the graph to find function values.
1.6 Some level curves of \( f(x, y) = x^2 + y^2 \).
1.7 The inverse of a function is the function “run backwards.”
2.1 A convex function “hangs down.”
2.2 A concave function “arches up.”
2.3 Diminishing marginal utility.
3.1 The graph of \( y = \frac{1}{2}x - 1 \).
3.2 The graph of \( y = -3x + 2 \).
3.3 The intersection of \( y = 2x + 1 \) and \( y = 3x - 1 \).
4.1 As x moves through the domain, y follows along through the range.
4.2 Near cities, d follows along with t more slowly.
4.3 The tangent line rests neatly against a curve.
4.4 The slope of tangent lines decreases along the graph of a concave function.
5.1 The derivative – the slope of the tangent line – is 0 at a maximum.
5.2 The graph of \( y = x^3 \).
5.3 The graph of \( y = x^3 - 27x \).

List of Tables

2.1 Rules of Exponents
2.2 Rules of Logarithms
4.1 Rules of Differentiation

Chapter 1
Functions and graphs

1.1 What is a function?

A function is a way of associating things in one set with things in another, a way, for each thing in the first set, of finding a corresponding thing in the second. We say that a function maps one set to another. We call the first set the domain and the second the range.

[Figure 1.1: A function maps elements of a domain to elements of a range.]

The white pages, for instance, are a kind of function: for each name, the phone book gives a corresponding number.
It maps names to numbers.

The concept of a function is very general – but not completely open-ended. A function cannot be ambiguous. For each element in the domain, it must give a unique answer to the question, what is the corresponding element in the range? It cannot map one element to more than one. The index of a book on World War II is not a function: under the name Adolf Hitler, the index lists many page numbers.

Note, on the other hand, that there’s nothing wrong with a function mapping several different elements in the domain to the same element in the range. The white pages give the same number for the name George Bush and for the name Laura Bush.

[Figure 1.2: A function cannot map one element to several... (Hitler, Adolf mapped to p. 54, p. 145, and p. 603.)]

[Figure 1.3: ...but a function can map several elements to one. (Bush, George and Bush, Laura both mapped to (202) 555-1234.)]

1.2 Functions in mathematics

There are many ways to specify how a function works, how it takes an element of the domain to the corresponding element of the range. The phone book works by listing an element of the domain – a person’s name – side by side with the corresponding element of the range – the person’s number. Starting with a name, finding the number is straightforward.

In mathematics, functions commonly map numbers to numbers, in a way specified by a formula. The formula tells us how to start with one number as input, manipulate it, and find another number as output. For example, a formula might specify that we take a number, square it, and add three to produce the result. If x is the number we start with and f is the name of our function, then in symbols we write this formula

\[ f(x) = x^2 + 3. \]

In particular, if we run the number 2 through our formula, we come out with 7. Mathematically, we write this

\[ f(2) = 7 \]

and say that f maps the number 2 to the number 7.

We often want to name the result produced by a function.
It’s customary to call it y and write

\[ y = f(x). \]

This is simply shorthand for “y is the result of applying function f to the number x,” or to put it another way, for “y is the number in the range of f corresponding to the number x in its domain.” In the particular example above, x is 2, y is 7, and f is the method, “take x, square it, and add three.” Note in passing that f generally maps two different numbers in the domain to the same number in the range. \( f(-2) \), for example, is also 7.

Minor confusion sometimes arises when the same symbol is used to name both a function and its result. In economics, for instance, we use the word “demand” both in expressions like, “at $30, demand for widgets is 500 units,” and in expressions like, “the demand curve slopes downward.” In the first case, we are thinking of demand as a number; in the second, as a function. This state of affairs occurs because we describe demand as a function of price by the equation

\[ D = D(P). \]

Here, P represents the price of a unit; the first D in the equation represents a number, the number of units demanded; and the second D represents a function, the demand function. It’s often convenient to give the same name to two different kinds of things, but as long as the dual role of a particular symbol is kept in mind, this need not cause confusion.

1.3 Functions of several variables

Imagine a bath with two taps, one for cold water and one for hot, feeding into a single spout. The temperature of the water coming out of the common spout depends on how far open each of the taps is. It depends on two inputs. In other words, the temperature is a function of two variables. We can represent this state of affairs in an equation,

\[ T = f(C, H) \]

where T is the temperature, C is the amount of water coming from the cold tap, and H the amount coming from the hot tap.

A function need not be limited to two variables.
It can take any number of variables as inputs, combining them to produce a single output. Gross national product, for instance, depends on four things: consumption, investment, government spending, and net exports. In fact, it is simply the sum of these four inputs:

\[ Y = f(C, I, G, NX) = C + I + G + NX \]

where, using the conventional abbreviations, Y represents gross national product, C is consumption, I investment, G government spending, and NX net exports.

1.4 From functions to graphs

This figure shows the function \( f(x) = x^2 + 3 \) mapping certain numbers in its domain to the corresponding numbers in its range, which, being the result of f, we give the name y.

[Figure: f maps domain values to range values.]

x:   0   1   2   3   4
y:   3   4   7  12  19

The points that represent elements of the range have been specially placed and joined with an imaginary curve segment drawn with a dashed line. If we compact this diagram by turning it on its side, shrinking the domain “bean” to a horizontal line segment and the range bean to a vertical one, and shortening the arrows until each is exactly as long as the number it points to, we arrive at a leaner representation of the function f – its graph – shown in Figure 1.4.

We call the horizontal line segment that the domain bean has shrunk to the x-axis and the vertical line segment that the range bean has become the y-axis. The point where the axes cross is called the origin. The flat, two-dimensional space on which the graph is drawn is called the coordinate plane or xy-plane or sometimes simply the plane.

A graph is just another way of looking at a function, much like a red line traced on a map is another way of looking at a route originally given as a series of directions, “go straight for half a mile on Massachusetts Avenue; at the light, turn left onto 16th Street; continue five miles...” Our function, too, was originally given as a series of directions, “take a number, square it, and add three,” which we then abbreviated to \( f(x) = x^2 + 3 \).
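
The directions “take a number, square it, and add three” can also be written as a short program. The sketch below is only an illustration: the names f, xs, and points are ours, not part of the original text. It reproduces the table of x and y values above, and each (x, y) pair it builds is one point on the graph of f.

```python
def f(x):
    """Take a number, square it, and add three."""
    return x**2 + 3

# The domain values used in the table above (an illustrative choice).
xs = [0, 1, 2, 3, 4]

# Each (x, y) pair with y = f(x) is one point on the graph of f.
points = [(x, f(x)) for x in xs]

for x, y in points:
    print(f"x = {x}, y = {y}")
```

Feeding in other numbers, such as 5 or −2, extends the table beyond the values listed here.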
Although a graph is another way of specifying a function, because we can’t draw past the edge of a sheet of paper, a graph can only show a portion of a function. It is a “window” at a certain place on a function that may extend infinitely beyond the window’s frame. Our graph of \( f(x) = x^2 + 3 \) only shows the portion of it between 0 and 4, even though it would be easy enough to compute and draw its values at 5, where \( y = f(5) = 28 \), or at 83.78, or at −1, −2, −29, and so on.

[Figure 1.4: A graph is a way of representing a function. Horizontal axis: x (domain); vertical axis: y (range).]

If we hadn’t been given the formula for the function, the graph would still tell us how f works – or at least a certain portion of f. Just as we can navigate a route without a list of written directions by following the marks on a map, starting with a number in the domain, we can find our way to the corresponding number in the range without an explicit formula for the function. From our starting number in the domain, that is, on the x-axis, we simply travel due north until we hit the line of the graph. The distance we’ve traveled is exactly the value of the function, and the ruler by which we measure that distance is just the y-axis.

[Figure 1.5: Using the graph to find function values. (1) Start with a number on the x-axis and travel north... (2) ...until we hit the graph... (3) ...then look to the y-axis to see how far we traveled.]

There is another very useful way to think of a graph. If we turn to the index of a street map, under the name of any long road will be a list of map coordinates: E-2, E-3, F-3, G-4, etc. These coordinates tell us where on the map to find the road. In fact, if we colored in the squares on the map designated by this series of letter-number pairs, we would roughly trace out the route of the road. Conversely, the road as it wends its way picks out a particular series of coordinate pairs.
In short, we can identify a road with a list of coordinates. Likewise, a graph wends its way through the coordinate plane, picking out a set of coordinate pairs, and we can identify the graph with the set of coordinates it passes through.

Instead of a letter-number pair, coordinates in the xy-plane are designated by a pair of numbers: the x-coordinate that tells how far “east” or “west” to go along the x-axis from the origin, and the y-coordinate that tells us how far north or south to go along the y-axis. A point, P, in the coordinate plane is thus written

\[ P = (x, y). \]

To locate P, we travel x units to the east (or if x is negative, to the west) and y units to the north (or south).

Which points does our old function, \( f(x) = x^2 + 3 \), pass through? Exactly those points \( (x, y) \) such that \( y = f(x) \). That is, f establishes a particular relationship between x’s and y’s. Only certain \( (x, y) \)-pairs satisfy this relationship, exactly those where \( y = x^2 + 3 \), such as the pairs \( (-2, 7) \) and \( (7, 52) \) – but not the pairs \( (0, 0) \) and \( (2, -7) \). A point in the plane is either on the graph – if its coordinates satisfy the relationship \( y = f(x) \) – or the point is not on the graph – if \( y \neq f(x) \) for the x- and y-coordinates of that point. A function establishes a kind of “in-or-out” test for every point in the plane. Its graph is exactly the set of points it rules “in.”

This way of conceiving of a graph, as a set of coordinate pairs that satisfy a relationship established by a function, as a set of points specially picked out of the plane, will be a very useful one to have in mind as you read the later section on simultaneous equations and the following one on level curves.

1.5 Level curves

It’s possible for two variables to be in a relationship that’s impossible to describe by a function. The circle of radius one, for example, is the set of points \( (x, y) \) such that \( x^2 + y^2 = 1 \).
The point \( \left(\frac{1}{2}, \frac{\sqrt{3}}{2}\right) \) satisfies the relationship \( x^2 + y^2 = 1 \), since \( \left(\frac{1}{2}\right)^2 + \left(\frac{\sqrt{3}}{2}\right)^2 = \frac{1}{4} + \frac{3}{4} = 1 \). But the point \( \left(\frac{1}{2}, -\frac{\sqrt{3}}{2}\right) \) also satisfies it. If the relationship could be described by a function, \( y = f(x) \), then f would have to map \( \frac{1}{2} \) to both \( \frac{\sqrt{3}}{2} \) and \( -\frac{\sqrt{3}}{2} \). It would have to map one number to several, something a function cannot by definition do. Therefore f cannot exist. On the other hand, nothing stops us from graphing the circle: it is, after all, just the set of points \( (x, y) \) in the plane that satisfy the relationship \( x^2 + y^2 = 1 \). There are more graphs in the world than can be described by functions alone.

[Figure: the circle \( x^2 + y^2 = 1 \), with axes x and y.]

Whether it can ultimately be described by a function or not, any relationship between x and y can, as with the points of a circle, be put in the form,

\[ \text{expression involving only } x \text{ and } y = \text{some constant}. \tag{1.1} \]

And out of the expression involving only x and y, we can make a function of two variables, in the case of circles, for example, the function

\[ f(x, y) = x^2 + y^2. \]

By letting the constant in Equation 1.1 range over a variety of values and setting f equal to each, we generate a whole family of relationships between x and y:

\[ x^2 + y^2 = 1 \]
\[ x^2 + y^2 = 4 \]
\[ x^2 + y^2 = 9 \]
\[ \vdots \]

As it happens, the constants we’ve set f equal to represent the radii of various circles, or rather, the squares of their radii. We can depict all these circles simultaneously in a single picture.

[Figure 1.6: Some level curves of \( f(x, y) = x^2 + y^2 \). Concentric circles labeled r = 1, r = 2, and radius = 3; axes x and y.]

These circles form a family: they resemble each other in shape and differ only in size. This is because they are all the offspring, so to speak, of the same function, f. We call this family the level curves of f.

1.6 Inverses

Assume for the moment that each person has a single, unique phone number. The phone book tells us how to find that number given a name.
Caller ID is a kind of reverse phone book: it tells us the name given the number. If the phone book is a function that maps names to numbers, then we say that caller ID, which maps a number back to the name it belongs to, is the inverse of this function.

[Figure 1.7: The inverse of a function is the function “run backwards.” (The inverse function maps (202) 555-1234 in the range back to Bush, George in the domain.)]

Think of the inverse of a function as the function run backwards. The inverse of a function, if it exists, is itself a function: it maps the range of the original function back to the original function’s domain. That is, the domain of the inverse function is the range of the original function, and vice versa. The domain of the caller ID function, the elements it maps from, is a set of numbers, the very set of numbers the phone book function maps to.

Sometimes the inverse of a function does not exist. This happens when the original function maps more than one element in its domain to the same element in its range, as in Figure 1.3, where each person does not have a unique phone number. If the inverse function existed, it would have to map the number (202) 555-1234 back to two different names, George Bush and Laura Bush. This is like the situation depicted in Figure 1.2, where one element is mapped to several, something a function is forbidden to do. It follows that whenever a function maps several elements to one, it has no inverse.

For example, our function \( f(x) = x^2 + 3 \) has no inverse. It maps two different numbers, 2 and −2, to the same number, 7. The inverse function, if it existed, would have to map 7 back to both 2 and −2, and this would disqualify it from being a function.

How can we find the inverse of a mathematical function in general? The inverse of a function is the function run backwards.
If a function specifies, “take nine-fifths of a number and add thirty-two,” then, reading these directions backwards, the inverse will specify, “take a number, subtract thirty-two, and multiply by five-ninths.” (Incidentally, these are the formulas for converting from degrees Celsius to degrees Fahrenheit and back.) We abbreviate the function in symbols

\[ f(C) = \frac{9}{5}C + 32. \]

A function takes an input, manipulates it, and produces a result. The inverse takes that result, runs the function’s manipulations backwards, and returns the original input. In this example, C is the input, f does the manipulating, and the result we’ll call big F, so that we can write

\[ F = f(C) = \frac{9}{5}C + 32. \]

This equation gives us the result, F, in terms of the input, C. The inverse will give us the original input, C, in terms of the original result, F. In effect, we need to solve the equation, \( F = \frac{9}{5}C + 32 \), for C in terms of F:

\[ F = \frac{9}{5}C + 32 \tag{1.2} \]
\[ F - 32 = \frac{9}{5}C \]
\[ 5(F - 32) = 9C \]
\[ \frac{5}{9}(F - 32) = C \tag{1.3} \]

The last equation simply says in symbols what we’ve already said in words: “take a number, F, subtract thirty-two, and multiply by five-ninths to produce the result, C.” The equation, which gives C in terms of F, specifies the inverse of the function little f, which gave us F in terms of C. We use the notation \( f^{-1} \) (read “f inverse”) to denote the inverse of a function, and we write

\[ C = f^{-1}(F) = \frac{5}{9}(F - 32). \tag{1.4} \]

Where f gave us the formula for converting from degrees Celsius to degrees Fahrenheit, \( f^{-1} \) gives us the formula for converting degrees Fahrenheit back to degrees Celsius.

This is how we find the inverse of a mathematical function in general. We start with some equation, \( y = f(x) \), identical in form to Equation 1.2, giving y in terms of x.
Through a similar series of steps, which ends with x isolated on one side of an equation, as C is in Equation 1.3, we arrive at an expression for x in terms of y, of the form \( x = f^{-1}(y) \), as in Equation 1.4.

1.7 Composing functions

Often when we’re learning to perform a complex action, we break it down into a series of simpler steps, taking one, pausing, and then taking the next. After practicing the individual steps for a while, we combine them into one seamless motion.

Composing mathematical functions is nothing more than taking the steps specified by one function, then performing the steps specified by a second function on the result. The first function might specify, “take a number and square it,” and the second function, “take a number and add three.” In symbols, we represent the first function

\[ h(x) = x^2 \]

and the second

\[ g(x) = x + 3. \]

Composing these two functions means applying one after the other: “Take a number and square it. Then take the result and add three.” Or, after removing the superfluous pauses: “Take a number, square it, and add three.” And this, of course, is the specification for our old friend, \( f(x) = x^2 + 3 \). Note that the composition of two functions is itself a function.

We can arrive at this composed function more directly by a mathematical route. We simply express the composition, “apply h, then apply g,” in symbols:

\[ \text{composition of } h \text{ and } g = g\big(h(x)\big). \]

Plugging \( h(x) \) into g, we get

\[ g\big(h(x)\big) = h(x) + 3. \]

But we know that \( h(x) = x^2 \), so

\[ g\big(h(x)\big) = h(x) + 3 \tag{1.5} \]
\[ = x^2 + 3 \tag{1.6} \]
\[ = f(x) \]

This is the general procedure for composing two functions. We plug the first, “inner,” function into the second, “outer,” function, as in Equation 1.5. Then we expand the inner function, as in Equation 1.6. The result is a function, which here we’ve called f, that is the composition of the two original ones.
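
The same procedure can be tried out in code. The following is a minimal sketch, not part of the original text: the helper name compose is our own invention for illustration, while h, g, and f follow the symbols used above. It builds the composition “apply h, then apply g” and lets us check that it agrees with \( x^2 + 3 \).

```python
def h(x):
    """Inner function: take a number and square it."""
    return x**2

def g(x):
    """Outer function: take a number and add three."""
    return x + 3

def compose(outer, inner):
    """Return the function that applies `inner` first, then `outer`.

    `compose` is a hypothetical helper introduced only for this example.
    """
    return lambda x: outer(inner(x))

# f is "apply h, then apply g", that is, g(h(x)).
f = compose(g, h)

for x in [-2, 0, 2, 5]:
    print(x, f(x), x**2 + 3)
```

Swapping the order gives a different function: compose(h, g) computes \( h(g(x)) = (x + 3)^2 \), so the order in which the steps are taken matters.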
Chapter 2
Some special functions

2.1 Concave and convex functions

The graph of a convex function looks like a sagging wire. On the portion of a convex function shown by a graph, it “hangs down” between endpoints.

[Figure 2.1: A convex function “hangs down.”]

The graph of a concave function, by contrast, looks like an arcing rainbow. On the portion of a concave function shown by a graph, it “arches up” between endpoints.

[Figure 2.2: A concave function “arches up.”]

Imagine now that you could cross over the rainbow from one end to the other as if crossing a bridge. The most arduous part of your journey is at the outset, where the way up is steepest and you progress more by climbing than by walking. As you continue, the way up becomes more and more gradual until, when you are very close to the top, it’s as if you were walking on level ground. Your descent then begins, gradually at first, but becoming steeper and steeper as you near the end.

This is the essential characteristic of a concave function: as you “walk” along its graph from left to right, it goes “uphill” less and less steeply or alternatively, goes “downhill” more and more steeply. To put it another way, as you walk from left to right, each successive step forward carries you less distance upward – or carries you farther downward – than the one before.

Concave functions are everywhere in economics. Whenever you hear the words, “diminishing marginal something-or-other,” you are dealing with a concave function. The “marginal” in these kinds of expressions refers to what’s happening at the leading edge – or margin – of your progress as you trace your way left to right along the graph of the function. Does each step forward at the margin carry you higher or lower than the one before? How many units of vertical progress are you making for each one of horizontal progress?
The “diminishing” part of such expressions tells us that in fact, with each step of horizontal progress, our vertical progress diminishes, in other words, that we are traveling along the graph of a concave function.

It often happens that the longer we do a particular activity, the less pleasure we get from doing it. When we’re hungry, the first bite of food is heavenly, the second excellent, the third pretty good, but, as our hunger is satisfied, we get less and less pleasure from each successive bite until, after twenty-nine bites, past full, each additional bite grows more and more distasteful. Far from giving pleasure, at this point eating is taking pleasure away.

[Figure 2.3: Diminishing marginal utility. Horizontal axis: bite; vertical axis: pleasure (utility).]

Economists call pleasure “utility.” To express the fact that the first bite gives the most pleasure and that each one after that brings less than the one before, economists say that marginal utility is diminishing. Moving from left to right along the graph of the utility function in Figure 2.3, with each step of horizontal progress (a bite), vertical progress (pleasure) diminishes. In other words, the utility function is concave, as we could already tell from the shape of its graph.

In Chapter 4, which concerns derivatives, we’ll see a more technical and mathematically usable definition of “concave.” For the moment, it will be instructive to translate our verbal and graphical description of concavity into mathematical symbols. In fact, there’s almost no skill more worth acquiring than an ability to move easily back and forth between pictorial and symbolic representations, between geometry and algebra.

Consider the example above. The utility derived from eating is a function of the number of bites taken, a fact we state in symbols

\[ U = f(b). \]

How, then, do we translate the notion of diminishing marginal utility into symbols?
Suppose that so far we’ve taken m bites, in other words, that m is the leading edge, the margin, of our eating progress. Our next bite will be the (m + 1)th, and the bite before the current one was the (m − 1)th. The gain in utility from taking the next bite will be

\[ U_{\text{next}} - U_{\text{current}} = f(m + 1) - f(m) \]

as the gain in utility from having taken the current bite was

\[ U_{\text{current}} - U_{\text{last}} = f(m) - f(m - 1). \]

To say that marginal utility is diminishing is to say that we’ll gain less utility from taking our next bite than we gained from taking our last one, a fact we state in symbols

\[ U_{\text{next}} - U_{\text{current}} < U_{\text{current}} - U_{\text{last}} \]

or, substituting for the U’s,

\[ f(m + 1) - f(m) < f(m) - f(m - 1). \tag{2.1} \]

Remembering that \( f(m) \) is the height of the graph of f above the x-axis at m (see Figure 1.4), we can read Inequality 2.1 as saying the vertical progress, \( f(m + 1) - f(m) \), made with the horizontal step from m to m + 1 is less than the vertical progress, \( f(m) - f(m - 1) \), made with the horizontal step from m − 1 to m. And this is simply a translation from symbols back to our original, verbal description of concavity.

We have only been switching back and forth between different representations – verbal, symbolic, graphical – of the same underlying idea. Learn to do this with ease. There’s no skill more valuable.

2.2 Exponents

Somewhere on a remote island in the Pacific, a single breeding pair of rabbits is released. Each spring, they give birth to another breeding pair. In fact, every breeding pair gives birth to another breeding pair each spring. Initially, there are only the original two rabbits on the island. After the first spring, there are four: the two original rabbits and their two offspring. Each of these pairs then breeds, producing two more pairs of rabbits the following spring, for a total of eight rabbits – or four breeding pairs. It’s easy to see that the rabbit population is doubling every year.

Years after release    Pairs of rabbits
0                      1
1                      2
2                      4
3                      8
...                    ...

To figure out how many pairs of rabbits there will be at the end of the current spring, we simply double the number of the pairs there were after the last spring:

\[ P_{\text{current}} = 2 \cdot P_{\text{last}}. \tag{2.2} \]

But the number of pairs after last spring was equal to twice the number of pairs from the spring before that:

\[ P_{\text{last}} = 2 \cdot P_{\text{year before last}}. \]

We can then substitute this expression for \( P_{\text{last}} \) in Equation 2.2 to obtain

\[ P_{\text{current}} = 2 \cdot \underbrace{(2 \cdot P_{\text{year before last}})}_{P_{\text{last}}}. \]

It’s not hard to see now that we can continue this expansion...

\[ P_{\text{current}} = 2 \cdot 2 \cdot 2 \cdot P_{\text{three years ago}} \]

...until we work our way back to year 0, when there was only the one original breeding pair. That is, \( P_0 = 1 \). At this point, we’ll have a long string of 2’s (multiplied, finally, and to no effect, by the original 1):

\[ P_{\text{current}} = \underbrace{2 \cdot 2 \cdot \ldots \cdot 2}_{n \text{ times}} \cdot 1 \]

where n is the number of springs that have passed since the original pair was released.

It’s tiresome to have to write \( 2 \cdot 2 \cdot 2 \cdot 2 \cdot \ldots \) for more than a couple of 2’s, so we use an abbreviation, \( 2^n \). Read “two to the n” or “two to the nth” or “two to the power of n,” \( 2^n \) is simply shorthand for “multiply 2 by itself n times.” We call 2 the base and n the exponent. The number of pairs of rabbits is a function of the number of years that have passed:

\[ P = f(n) = 2^n. \]

That is, the number of rabbit pairs on the island in year n is 2 to the power of n. We call this kind of a function, where the variable is in the exponent and the base is fixed, an exponential function.

Let’s consider some examples. \( 3^4 \) – 3 times itself four times – is 81, and \( 5^1 \) – 5 times itself one time – is just 5. Exponents can even be negative. We interpret \( b^{-n} \) as \( \frac{1}{b^n} \). For instance, \( 2^{-1} = \frac{1}{2} \). Any nonzero number to the 0th power is 1.
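
The doubling rule of Equation 2.2 and the shorthand \( 2^n \) can be compared directly. The sketch below is an illustration only; the function name pairs_after is our own and does not appear in the original text. It applies “double last spring’s count” n times, starting from the single original pair, and prints the result alongside Python’s built-in power operator.

```python
def pairs_after(n):
    """Pairs of rabbits after n springs, found by repeated doubling."""
    pairs = 1  # P_0: the one original breeding pair
    for _ in range(n):
        pairs = 2 * pairs  # Equation 2.2: P_current = 2 * P_last
    return pairs

# Compare repeated doubling with the shorthand 2**n.
for n in range(4):
    print(n, pairs_after(n), 2**n)

# The other examples from the text, using the power operator.
print(3**4)   # 3 to the 4th power
print(2**-1)  # a negative exponent: 1 / 2**1
print(7**0)   # a nonzero number to the 0th power
```

Repeated doubling and the exponent notation agree for every n, which is all the abbreviation \( 2^n \) claims.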
If 2, say, is multiplied by itself m times, and the result is then multiplied by 2 another n times, 2 will have been multiplied by itself a total of m + n times. In symbols, \( 2^m \cdot 2^n = 2^{m+n} \). All these observations and conventions, along with several others, are summarized in the following rules of exponents:

Table 2.1: Rules of Exponents

Rule                                                   Example
\( b^0 = 1 \)                                          \( 4^0 = 1 \)
\( b^1 = b \)                                          \( 4^1 = 4 \)
\( b^{-m} = \frac{1}{b^m} \)                           \( 4^{-2} = \frac{1}{4^2} = \frac{1}{16} \)
\( b^{m/n} = \sqrt[n]{b^m} \)                          \( 4^{3/2} = \sqrt{4^3} = \sqrt{64} = 8 \)
\( b^m b^n = b^{m+n} \)                                \( 4^2 \, 4^1 = (4 \cdot 4) \cdot 4 = 4^3 = 64 \)
\( \frac{b^m}{b^n} = b^{m-n} \)                        \( \frac{4^3}{4^2} = \frac{4 \cdot 4 \cdot 4}{4 \cdot 4} = 4^1 = 4 \)
\( (b^m)^n = b^{mn} \)                                 \( (4^2)^3 = (4 \cdot 4)(4 \cdot 4)(4 \cdot 4) = 4^6 = 4096 \)
\( b^m c^m = (bc)^m \)                                 \( 4^2 \, 3^2 = (4 \cdot 4)(3 \cdot 3) = 12^2 = 144 \)
\( \frac{b^m}{c^m} = \left(\frac{b}{c}\right)^m \)     \( \frac{4^3}{2^3} = \frac{4 \cdot 4 \cdot 4}{2 \cdot 2 \cdot 2} = \left(\frac{4}{2}\right)^3 = 2^3 = 8 \)

2.3 Exponential and logarithmic functions

We’ve already seen an example of an exponential function in the previous section, one that gives us an expression for the growth of a population of rabbits. Any process of growth in which the size of something in one period is a multiple of its size in the previous period is described by an exponential function. These kinds of growth processes are everywhere. An economy is said to grow by such-and-such a percentage over its size the previous year. A savings account grows by interest compounded daily. The common characteristic of all these processes is that they “feed on themselves,” the output of one stage becoming the input of the next.

For a practical application, let’s look at stock prices. How, for example, could we figure out the current price of a share of Microsoft stock if we are told the price has grown 12% a year since 1990, when a share cost $5? In 1991, the stock is worth 12% more than it was the year before:

\[ P_{1991} = P_{1990} + 0.12\,P_{1990} = 1.12\,P_{1990}. \]

The next year, 1992, the stock will again rise by 12% over the previous year’s price:

\[ P_{1992} = P_{1991} + 0.12\,P_{1991} = 1.12\,P_{1991} = 1.12 \cdot \underbrace{(1.12\,P_{1990})}_{P_{1991}} = 1.12^2\,P_{1990}. \]
It’s not hard to see that by 2005, the price of a share of Microsoft stock will be

\[ P_{2005} = 1.12^{15}\,P_{1990}. \tag{2.3} \]

But we were given the price of a share in 1990: $5. Thus the price in 2005 is

\[ P_{2005} = 1.12^{15} \cdot 5 \approx 5.47 \cdot 5 = 27.35. \]

Suppose now that instead of being asked to find the price of a share in 2005, we had been asked how many years it would take, starting at $5 in 1990, for the price of a share to reach $50, assuming that annual growth continues indefinitely at 12%. Not knowing how many years this will be, we call the length of time x. The price of a share x years after 1990 is, by assumption, $50, but we know that this must equal \( 1.12^x\,P_{1990} \). That is,

\[ 1.12^x\,P_{1990} = 50 \]
\[ 1.12^x \cdot 5 = 50 \]
\[ 1.12^x = 10 \tag{2.4} \]

But now we are at an impasse. We don’t have any way at the moment of getting the x down from its perch in the position of exponent, so that we could reduce Equation 2.4 to an expression of the form x = ... telling us exactly what x was.

Let’s set this problem aside for a moment and return to the one in the previous section involving rabbits. There, we came up with a function, \( P = f(n) = 2^n \), that told us how many pairs of rabbits there were after n years. We might, though, be interested in the opposite question. Namely, how many years will it take for there to be P pairs of rabbits? How many years, for instance, will it take for there to be 64 pairs of rabbits? Given that after n years there are \( 2^n \) pairs, when there are 64 pairs, \( 2^n = 64 \). But we already happen to know that 64 is \( 2^6 \), that is, 2 multiplied by itself six times. We can see then that n must be 6. It will take six years for there to be 64 pairs of rabbits on the island.

Thinking back for a moment to Section 1.6 on inverses, we can describe what we have just done in working backwards from the number of rabbits to the number of years – where before f had taken us from years to rabbits – as an application of the inverse function, \( f^{-1} \).
Where f gave us rabbit pairs, P, in terms of years, n, by the formula \(P = 2^n\), the inverse of f will give us an expression, \(n = f^{-1}(P)\), for n in terms of P. It tells us the number of years it takes for the rabbit population to reach a size of P pairs. Where f answered the question, “what is 2 raised to the nth power?,” \(f^{-1}\) answers the question, “given a number, to what power must 2 be raised to get that number?”

Remember that f, with its variable in the exponent, is called an exponential function. The inverses of exponential functions are important enough in their own right to merit a special name. They are called logarithmic functions. The function \(f(n) = 2^n\) is an exponential function with a base of 2. Its inverse, \(f^{-1}\), will be a logarithmic function with the same base. We use a special abbreviation for the logarithmic function with a base of 2, writing

\[ f^{-1}(P) = \log_2(P). \]

The righthand side of this equation is read, “log base two of P” and is simply shorthand for, “start with a number, P, and return the exponent to which we must raise 2 to get P.” Compare this description of the inverse with a description of the original function, “start with a number, n, and return the result of raising 2 to this number.”

There’s no reason we need to stick to a base of 2. We could, for instance, take logarithms base 5:

\[
\begin{aligned}
\log_5 1 &= 0 \\
\log_5 5 &= 1 \\
\log_5 25 &= 2 \\
\log_5 125 &= 3 \\
&\;\;\vdots
\end{aligned}
\]

Perhaps the most intuitive way to think of an expression like “\(\log_b x\)” is to read it as, “to what power must we raise b to get x?”

Now we can revisit the Microsoft share price problem we set aside when we reached the impasse of Equation 2.4. If we apply a logarithm base 1.12 to each side of this equation, we’ll almost be done.
\[
\begin{aligned}
1.12^x &= 10 \\
\log_{1.12} 1.12^x &= \log_{1.12} 10
\end{aligned}
\]

Simplifying the lefthand side of this last equation is straightforward: we can read it as, “to what power must we raise 1.12 to get \(1.12^x\)?” Well, x, since 1.12 raised to the power of x is \(1.12^x\). That is, the lefthand side of the equation simplifies to x:

\[ \log_{1.12} 1.12^x = x = \log_{1.12} 10. \]

But now we’re at another impasse. We don’t know how to compute \(\log_{1.12} 10\). How do you figure out what power to raise 1.12 to in order to get 10? We have to set the problem aside again to make another useful digression.

We saw that logarithms can be taken using any base: 2, 5, whatever. It turns out that there’s a “natural” choice of base, although its value, an infinite decimal beginning 2.718281..., at first makes it seem anything but natural. The reasons that make this base natural are deep but arcane. Suffice it to say that one of the consequences of its naturalness is the ease with which logarithmic and exponential functions that use it as a base can be computed, although this, too, is not at all obvious. In any case, a calculator will do it for you.

This number, 2.718281..., is so fundamental, cropping up in so many different places, that it’s designated by its own letter, e. The function \(\exp(x) = e^x\) is called the exponential function. The inverse, a logarithmic function, is called the natural logarithm and denoted “ln,” as in \(\exp^{-1}(x) = \ln x\). In other words, \(\ln x = \log_e x\).

Just as there are rules of exponents, presented in the previous section, there are rules of logarithms. Compare these, laid out in Table 2.2, to the rules of exponents in Table 2.1.

Table 2.2: Rules of Logarithms

Rule                                              Example
\(\log_b(xy) = \log_b x + \log_b y\)              \(\log_2(4 \cdot 16) = \log_2 4 + \log_2 16 = 2 + 4 = 6\)
\(\log_b \frac{x}{y} = \log_b x - \log_b y\)      \(\log_5 \frac{125}{5} = \log_5 125 - \log_5 5 = 3 - 1 = 2\)
\(\log_b x^m = m \log_b x\)                       \(\log_3 9^4 = 4 \log_3 9 = 4 \cdot 2 = 8\)
\(\log_b b^x = x\)                                \(\ln e^x = x\)
\(b^{\log_b x} = x\)                              \(e^{\ln 1776} = 1776\)

Finally we’re in a position to finish off the share price problem we’ve twice set aside. Starting again from Equation 2.4, now instead of taking the logarithm base 1.12, we’ll take it base e:

\[ \ln 1.12^x = \ln 10. \]

A rule in Table 2.2 tells us how to simplify the lefthand side of this equation:

\[ \ln 1.12^x = x \ln 1.12. \]

The complete calculation now looks like this:

\[
\begin{aligned}
1.12^x\,P_{1990} &= 50 \\
1.12^x \cdot 5 &= 50 \\
1.12^x &= 10 \\
\ln 1.12^x &= \ln 10 \\
x \ln 1.12 &= \ln 10 \\
x &= \frac{\ln 10}{\ln 1.12}
\end{aligned}
\]

At this point, we just plug numbers into our calculator and find that

\[ x = \frac{\ln 10}{\ln 1.12} \approx 20.32. \]

In other words, if the price of a share of Microsoft stock grows at 12% a year starting in 1990, it will reach $50 in about 20 years, 4 months, sometime in the first half of 2010. We check that our work is correct by calculating \(1.12^{20.32} \cdot P_{1990}\), the formula for the price of a share after 20.32 years. In fact,

\[ 1.12^{20.32} \cdot P_{1990} = 1.12^{20.32} \cdot 5 \approx 50, \]

as expected.

Chapter 3 Linear equations

3.1 Linear functions

The two simplest things that can be done to a number are multiplying it by another number and adding another number to it. Any series of multiplications and additions can be consolidated into a single multiplication and a single addition: multiplying in three steps, first by 2, then by 5, then by 3, is the same as multiplying once by 30. After consolidating, then, any transformation of one number into another by multiplications and additions reduces to a multiplication by a single number, call it m, followed by the addition of a single number, call it b. Such a transformation, which maps one number to another, is a function, represented in symbols

\[ f(x) = mx + b. \]

Functions of this form are called linear.
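The claim that any series of multiplications and additions collapses into a single m and a single b can be tried out directly. The sketch below is a minimal illustration under our own conventions, not part of the original text: each step is written as a pair (m, b) standing for "multiply by m, then add b," and the helper names are hypothetical.

```python
# Illustrative sketch: a step (m, b) stands for the map x -> m*x + b.
# The helper names are hypothetical, chosen only for this example.

def compose_linear(first, second):
    """Return the single (m, b) equal to applying `first`, then `second`."""
    m1, b1 = first
    m2, b2 = second
    # second(first(x)) = m2*(m1*x + b1) + b2 = (m2*m1)*x + (m2*b1 + b2)
    return (m2 * m1, m2 * b1 + b2)


def consolidate(steps):
    """Collapse a list of steps, applied left to right, into one (m, b)."""
    result = (1, 0)  # the do-nothing step: multiply by 1, add 0
    for step in steps:
        result = compose_linear(result, step)
    return result


# The text's example: multiply by 2, then by 5, then by 3.
times_2_5_3 = consolidate([(2, 0), (5, 0), (3, 0)])

# A mixed example of our own: double, then add 4, then triple.
mixed = consolidate([(2, 0), (1, 4), (3, 0)])

print(times_2_5_3, mixed)
```

The first result is a single multiplication by 30 with nothing added, as in the text. The second works out to multiplying by 6 and adding 12, which agrees with expanding 3(2x + 4) by hand.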
Linear functions turn up everywhere, and much of mathematics is devoted to reducing intractable questions about more complicated functions to simpler questions about linear functions. This is the case, as we will see in the next chapter, with derivatives.

For now, consider some examples of linear functions. A bookie takes a variety of bets. On a wager with three-to-one odds, for instance, he pays winning bets three times what was originally staked, minus a standard $10 commission. If s represents the original stake, then the money cleared by a winning gambler is given by the linear function,

\[ W = f(s) = 3s - 10. \]

Here, m is 3 and b is −10.

Another example comes from cooking. A recipe, as written, serves four. You, on the other hand, only need to serve two, so you halve the quantity of each ingredient. If two cups of flour are called for, you use one. If \(q_0\) is the original quantity needed of some ingredient, then the new quantity, \(q_1\), is given by the linear function,

\[ q_1 = f(q_0) = \tfrac{1}{2} q_0. \]

Here, m is \(\tfrac{1}{2}\) and b is 0.

Linear functions get their name from their graphs, which are straight lines. Figure 3.1, for instance, shows the graph of the function, \(y = f(x) = \tfrac{1}{2}x - 1\). And Figure 3.2 shows the graph of \(y = f(x) = -3x + 2\). There are three things to notice about these graphs:

• The larger m is in absolute value (\(\tfrac{1}{2}\) in Figure 3.1, −3 in Figure 3.2), the “steeper” the line.

• If m is positive, as in Figure 3.1, the line slopes up to the right. If m is negative, as in Figure 3.2, the line slopes down to the right.

• The line crosses the y-axis at b.

[Figure 3.1: The graph of \(y = \tfrac{1}{2}x - 1\).]

[Figure 3.2: The graph of \(y = -3x + 2\).]

Two parameters, m and b, are enough to describe a linear function. In graphical terms, this amounts to saying that a line is completely characterized by its steepness and by where it crosses the y-axis. Both of these traits have a special name. The steepness of a line is called its slope. The place where it crosses the y-axis is called its y-intercept. In Figure 3.1, the slope of the line is \(\tfrac{1}{2}\) and its y-intercept is −1. In Figure 3.2, the slope is −3 and the y-intercept 2.

Recall our discussion of concavity in Section 2.1. There, we characterized a concave function by imagining what it would be like to walk along its graph from left to right: an increasingly gradual climb, leveling out towards the top, followed by an increasingly steep descent. We can perform the same imaginative exercise on lines. Walking along the graph of a linear function, that is, along a straight line, requires the same amount of effort wherever we are on it. It’s just as steep where we finish as where we start – and at all points in between – like an endless ramp. The steepness of a line, its slope, is the same everywhere. Constancy of slope is the essence of a line.

What is the translation of this last, somewhat mystical statement into mathematical terms? Consider any two distinct points on a line with slope m, call them \(P_0 = (x_0, y_0)\) and \(P_1 = (x_1, y_1)\). To say that a line has the same steepness everywhere is the same as saying, as we imagine ourselves walking along it, that we make the same vertical progress with every step of horizontal progress. Every step forward carries us the same fixed distance up or down.

[Figure: a line through \((x_0, y_0)\) and \((x_1, y_1)\), with one “step forward” along the x-axis matched by a fixed “distance up” along the y-axis.]

Constancy of slope means that every bit of horizontal progress must be matched by a proportionate amount of vertical progress, no matter where on the line we’re walking. For instance, if one step forward carries us three steps upward, then two steps forward must carry us six steps upward. The ratio of vertical steps to horizontal steps is fixed.
Between \(P_0\) and \(P_1\), there are \(x_1 - x_0\) horizontal steps and \(y_1 - y_0\) vertical steps, thus the ratio of \(y_1 - y_0\) to \(x_1 - x_0\) is fixed and is equal to the slope of the line, no matter which points we’ve chosen for \(P_0\) and \(P_1\):

\[ \frac{y_1 - y_0}{x_1 - x_0} = m \tag{3.1} \]

3.2 Solving simultaneous equations

At the end of Section 1.4, we discussed how a graph can be identified with the set of points in the plane that it passes through, just as a road on a map can be identified with the set of grid coordinates it passes through. Graphs, like roads, can intersect each other. If the index of a map lists the coordinates D-3, D-4, D-5 for Massachusetts Avenue and the coordinates C-4, D-4, E-4 for Florida Avenue, we can tell without looking at the map itself that these two roads intersect in square D-4.

The graph of a function f is the set of points in the plane whose x- and y-coordinates satisfy the relationship y = f(x). The graph of a second function, g, is the set of points whose coordinates satisfy y = g(x). How can we tell where these two graphs intersect? In other words, how can we tell which points are in both the set of coordinates identified with f and the one identified with g?

In the case of roads, we simply looked up which coordinate the index listed for both Massachusetts Avenue and Florida Avenue. This procedure is intuitive enough, but it’s worth examining in detail. What’s actually going on when we scan the index is a matching up of letter parts of the grid coordinates between those listed for Massachusetts Avenue and those listed for Florida Avenue. Whenever we encounter two coordinates, one on each list, with matching letter parts, we check whether the number parts match as well. If both the letter and the number parts match, we’re done: we’ve found the place the roads intersect.
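The index-scanning procedure just described can be written out step by step. The following is a small sketch rather than anything from the original text: the two lists hold the grid squares quoted above, while the function name and the way a square is split into letter and number are our own assumptions.

```python
# Grid squares as listed in the text's map index.
massachusetts_ave = ["D-3", "D-4", "D-5"]
florida_ave = ["C-4", "D-4", "E-4"]


def shared_squares(road_a, road_b):
    """Return the squares on both roads (hypothetical helper).

    Mirrors the procedure in the text: first match the letter parts,
    then check whether the number parts match as well.
    """
    matches = []
    for square_a in road_a:
        letter_a, number_a = square_a.split("-")
        for square_b in road_b:
            letter_b, number_b = square_b.split("-")
            if letter_a == letter_b and number_a == number_b:
                matches.append(square_a)
    return matches


intersection = shared_squares(massachusetts_ave, florida_ave)
print(intersection)
```

Only one square survives both checks, the one the text identifies as the place where the two roads cross.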
Functions are a little trickier, since the list of coordinates they pass through is more than likely infinite, but the procedure for finding the intersection of their graphs is essentially the same. Instead of matching up the letter parts of coordinates, we match up y-coordinates. Once y-coordinates are matched, we determine which x-coordinates match, just as in the case of roads, once we had matched letter parts (the map’s “y-coordinates”), we matched number parts (the map’s “x-coordinates”).

The y-coordinates of points on the graph of f are given by y = f(x) and for g by y = g(x). Matching up y-coordinates means setting the two functions equal, since f(x) = y = g(x). But this equation,

\[ f(x) = g(x), \]

is one that will have only x’s on the left and only x’s on the right – it will be an ordinary equation in one variable and with any luck, one not too hard to solve. Solving it is effectively “matching up” x-coordinates, and once solved, we’ll know all the x-coordinates of the points where the graphs of f and g intersect. To find the y-coordinates, we simply plug these x’s back into either the equation y = f(x) or the equation y = g(x), whichever seems easier to compute.

Now a concrete example. Suppose we have two linear functions,

\[ f(x) = 2x + 1 \quad\text{and}\quad g(x) = 3x - 1. \]

The graphs of these two functions are lines, which intersect at exactly one point.

[Figure 3.3: The intersection of y = 2x + 1 and y = 3x − 1, at the point (2, 5).]

To find the x-coordinate of this point, we set these two functions equal:

\[
\begin{aligned}
f(x) &= g(x) \\
2x + 1 &= 3x - 1
\end{aligned}
\]

Here is our equation with only x’s on both its right and its left sides, and it is not hard to solve:

\[
\begin{aligned}
2x + 1 &= 3x - 1 \\
1 &= x - 1 \\
2 &= x
\end{aligned}
\]

In the final step, we plug this x-value back into one of the original functions.
If we choose f, our computation looks like this:

\[
\begin{aligned}
y &= f(2) \\
  &= 2 \cdot 2 + 1 \\
  &= 5
\end{aligned}
\]

If instead we choose g, our computation runs thus:

\[
\begin{aligned}
y &= g(2) \\
  &= 3 \cdot 2 - 1 \\
  &= 5
\end{aligned}
\]

Either computation results in the same y-value, which makes sense, since each computation started from the assumption that the y-coordinates match. The x- and y-coordinates of the point P = (2, 5) simultaneously satisfy the relationship y = f(x) and the relationship y = g(x). That is, P is simultaneously on the graph of f and the graph of g – it is the point where they intersect.

Often, the graphs of lines are not given as functions y = f(x) and y = g(x), but as a system of linear equations. A system of two linear equations is just a pair of equations in the form

\[
\begin{aligned}
ax + by &= s \\
cx + dy &= t.
\end{aligned}
\]

In the example above, functions f and g define two linear equations:

\[
\begin{aligned}
y &= 2x + 1 \\
y &= 3x - 1.
\end{aligned}
\]

These can be rewritten in our new form as

\[
\begin{aligned}
-2x + y &= 1 \\
-3x + y &= -1.
\end{aligned}
\]

In this case, a = −2, b = 1, and s = 1; c = −3, d = 1, and t = −1.

It’s only a little less convenient to find where two lines intersect when they are described by a pair of linear equations than when they are described by a pair of linear functions. We simply transform the given pair of equations into a more convenient pair of functions and proceed as before, by setting the functions equal. For instance, we might be given the system of equations,

\[
\begin{aligned}
-10x + 2y &= 6 \\
9x + 3y &= -15.
\end{aligned}
\]

We solve each equation above for y:

\[
\begin{aligned}
y &= f(x) = 5x + 3 \\
y &= g(x) = -3x - 5.
\end{aligned}
\]

And now we set y’s equal, as before:

\[ 5x + 3 = -3x - 5. \]

Solving, we find x = −1. Finally, we plug this value of x back into one of the equations above to compute y:

\[ y = 5 \cdot (-1) + 3 = -2. \]

If we want to check our work, we can calculate y using the other equation and see whether we get the same result:

\[ y = (-3) \cdot (-1) - 5 = -2. \]

Chapter 4 Differentiation

4.1 What is a derivative?
[Figure 4.1: As x moves through the domain, y follows along through the range.]

Our first representation of a function, at the beginning of Chapter 1, was a pictorial one. Figure 1.1 represents a function as an arrow mapping an element of the domain, generically called x, to an element of the range, generically called y. Which element of the range depends on which element of the domain: choose x and f determines what y must be. When x rambles around the domain, y gets dragged around the range, as if the arrow were a leash tethering it to x. As x moves, y follows along.

The derivative is nothing more than the answer to the question, how does y follow along as x moves? As x increases, does y increase with it? Or does it decrease? Or does it increase at certain values of x and decrease at others? And how sensitive is y to changes in x? Does a little tug on x cause y to swing dramatically? Or is y a sluggard, barely stirring no matter how wildly x moves around?

Speed is probably the example of a derivative where our intuitive grasp is strongest. Suppose we travel by car from Washington to New York. Starting from SAIS, after thirty minutes of driving we reach the Beltway. After an hour and a half, we reach Baltimore; after three hours, Philadelphia; and so on. Distance traveled is a function of time spent traveling, which we express by

\[ d = f(t). \]

Thus f(0:30) = 13 mi, f(1:30) = 40 mi, f(3:00) = 136 mi, and so on. As time elapses, distance increases. Or, to put it another way, as t marches forward, minute by minute, through the domain, d follows along, mile by mile, through the range. Sometimes, as when traveling through city traffic, d follows along with t slowly. At other times, as when traveling on highways, d follows along with t quickly. In short, how d follows along with t is just our speed at any given time. We say that speed is the derivative of distance with respect to time.
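To get a rough feel for how d follows along with t on this trip, we can compare the checkpoints quoted above. The sketch below is an illustration only: it computes the average speed over each leg, which is a coarse stand-in for the speed at any given moment, and the variable and function names are our own.

```python
# (hours elapsed, miles traveled) at the Beltway, Baltimore, and Philadelphia,
# taken from the values of f quoted in the text.
checkpoints = [(0.5, 13.0), (1.5, 40.0), (3.0, 136.0)]


def average_speeds(points):
    """Miles gained divided by hours elapsed, for each consecutive pair."""
    speeds = []
    for (t0, d0), (t1, d1) in zip(points, points[1:]):
        speeds.append((d1 - d0) / (t1 - t0))
    return speeds


speeds = average_speeds(checkpoints)
print(speeds)
```

The first leg, from the Beltway to Baltimore, averages 27 miles per hour, while the second, from Baltimore to Philadelphia, averages 64. The same hour of driving buys very different amounts of distance on different parts of the trip.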
Suppose now that we travel at a constant speed, say, 60 miles per hour. Our distance function in this case will be particularly simple:

\[ d = f(t) = 60t. \]

And for this particular function we already know, having started with it, the answer to the question, how does d follow along with t? (At 60 miles per hour.) But let’s approach it from another direction.

To begin with, f is a linear function whose graph is a line with a slope of 60. In Section 3.1, we developed the idea of a line’s defining property being its constancy of slope, a property we tried to get a feel for by imagining ourselves walking along the line. For every one step forward, we took the same number of steps up or down. If one step forward carried us two steps down, then two steps forward would have to carry us four steps down. Every bit of horizontal progress is accompanied by the same amount of vertical progress, in a fixed ratio of vertical to horizontal – 60 miles to 1 hour in the present example. In fact, “horizontal progress” in this case means specifically “the movement of time forward” and “vertical progress” specifically “the movement of distance upward.” As t moves forward, d follows along upward. How does d follow along? However much t changes, d changes 60 times more. To make a long story short, the derivative of f is, as we already knew, just 60.

And what’s true of this particular linear function is true of all linear functions. For any function of the form f(x) = mx + b, the derivative is just m. Equation 3.1 confirms this. It states in symbols exactly that the ratio of vertical progress to horizontal progress is fixed, at m, everywhere on the line, the graph of f:

\[ \frac{\text{vertical progress}}{\text{horizontal progress}} = \frac{y_1 - y_0}{x_1 - x_0} = m. \]

In other words, the derivative of a linear function is the slope of the line that is its graph. Slope – the steepness of a line – is precisely the measure of how fast y rises or falls off as x moves forward.
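The fixed ratio in Equation 3.1 is easy to check numerically for the constant-speed trip. In the sketch below, the function f(t) = 60t comes from the text, while the helper name and the particular pairs of times are arbitrary choices of our own.

```python
def f(t):
    """Distance in miles after t hours at a constant 60 miles per hour."""
    return 60 * t


def slope_between(func, x0, x1):
    """Vertical progress divided by horizontal progress between two inputs."""
    return (func(x1) - func(x0)) / (x1 - x0)


# Arbitrary pairs of times, in hours; any two distinct times would do.
pairs = [(0.0, 1.0), (0.5, 2.0), (1.25, 3.0)]
slopes = [slope_between(f, x0, x1) for x0, x1 in pairs]
print(slopes)
```

Whichever pair of times we pick, the ratio comes out the same, 60, which is the sense in which the derivative of a linear function can be read straight off its slope.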
To elaborate, recall the derivative is nothing more than the answer to the question, how does y follow along as x moves? In the case of a linear function, y is given by

\[ y = f(x) = mx + b. \]

For instance, if f(x) = −2x + 5, then as x goes from 1 to 2 to 3 to 4, y goes from 3 to 1 to −1 to −3. As x moves, y follows along, and the derivative, −2, tells us how. The fact that it’s negative tells us that as x increases, y decreases. The fact that it’s −2 tells us that as x increases, y decreases twice as fast: a little change in x causes double the change in y. For example, as x goes from 1 to 4, y goes from 3 to −3: x changes by 3, y by −6, and the ratio of vertical progress to horizontal progress is given by

\[
\begin{aligned}
\frac{\text{vertical progress}}{\text{horizontal progress}} &= \frac{-3 - 3}{4 - 1} \\
&= -2 \\
&= m
\end{aligned}
\]

Linear functions are the simplest kind of function, and their derivatives are correspondingly simple. More complicated functions have more complicated derivatives. In the case of linear functions, we can simply read off the derivative from the function as written. For more complicated functions, finding the derivative becomes a two-step process: first, the complicated function is approximated by a linear one; then, once the problem is reduced to a question about linear functions, we simply read off the derivative as before.

Suppose, for instance, we had not made the simplifying assumption about our journey to New York that our speed was constant. Suppose instead that our speed varies, in other words, that the derivative (of distance with respect to time) is not constant. Sometimes d follows along with t slowly and at other times quickly, but as before, distance traveled is still a function of time spent traveling:

\[ d = f(t). \]

In our first definition of the distance function, when we assumed speed was constant, f was linear. Now, when speed varies, f is a more complicated kind of function. In the first definition, the graph of f was a line. Now its graph is a curve.
[Figure 4.2: Near cities, d follows along with t more slowly. The horizontal axis is t; the vertical axis is d, marked at the Beltway, Baltimore, Philadelphia, and New York.]

The trick to finding the derivative is to approximate the complicated function f by a simpler linear function, the curving graph by a straight line. How can we do this? As it happens, we’ve done it often before, though perhaps not in a consciously mathematical context. Imagine, for instance, watching the sun set over the ocean. Gazing out at the horizon, we see a line, and we could be forgiven for thinking the earth was flat. But the earth is round, after all, and its edge, the horizon, is not a line but a circle. It’s just that a human is so small compared to the earth that when we stare out at the very, very short stretch of horizon our eyes can take in at sea level, we hardly perceive it curving down at the far edges of our vision. In short, if we blow up a curve big enough or, equivalently, shrink down the observer small enough, a curve becomes indistinguishable from a line.

This, in essence, is how we find the derivative of a complicated function, whose graph is a curve. We choose a point on the graph where we want to find the derivative. We blow up the stretch of curve immediately to either side of this point – ten-fold, a thousand-fold, a million-fold – until the stretch in view is about the same proportion of the whole curve as the stretch of horizon taken in by a human standing on the surface of the earth is of the earth’s whole circumference. At this level of magnification, the curve looks like a line. This line is the best approximation of the curve at the chosen point, and the linear function the line represents is the best approximation of the complicated function the curve represents.

If we zoom back out, the curve again looks like a curve, and the line only approximates it well very close to the point where we took the derivative.
At that precise point, the line and the curve rest neatly against each other, in the same way a coin balanced on its edge rests neatly against a tabletop. We say that the line is tangent to the curve.

[Figure 4.3: The tangent line rests neatly against a curve. The axes are t and d; the tangent line, with slope = 65 mph, touches the curve at the point of tangency.]

If we imagine an ant walking along the graph in the immediate neighborhood of the point of tangency, it would be very hard for the minute creature to tell whether it was walking along the curve itself or along the line that approximates it there, just as it would be hard for us to tell, from the deck of a ship, that we were not sailing across a flat expanse of ocean but a curving one.

Walking along a short stretch of curve near the point of tangency is much the same as walking along the tangent line. Both paths feel identically steep in this tiny neighborhood. Whether we choose to walk along the curve or the line, each bit of horizontal progress is accompanied by the same amount of vertical progress. As before, “horizontal progress” in this case means specifically “the movement of time forward” and “vertical progress” specifically “the movement of distance upward.” To know how much vertical progress accompanies a bit of horizontal progress is to know how d follows along with t. But our imaginary walk made clear that as t steps a little distance away from the point of tangency, d follows along as if moving on the tangent line. For the immediate neighborhood around the point of tangency, this answers the question, how does d follow along as t moves? The question, in other words, what is the derivative of f at the point of tangency? It is just the slope of the tangent line.

4.2 Rules for taking derivatives

The previous section aimed at giving us a good intuitive grasp of what a derivative is. This section, on the other hand, is concerned only with laying out practical mathematical rules to mechanically find derivatives.
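Before turning to those rules, the magnifying-glass picture from the previous section can be tried out numerically. The sketch below is only an illustration under stated assumptions: the text gives no formula for the trip's distance function, so we substitute a simple curve of our own, f(t) = t squared, and the helper name and step sizes are likewise our own choices. It measures the slope of the straight line joining two nearby points on the curve as the second point is moved closer and closer to the first.

```python
def f(t):
    """An assumed, purely illustrative curve (not the trip's distance function)."""
    return t ** 2


def secant_slope(func, t, h):
    """Slope of the straight line through the points at t and t + h."""
    return (func(t + h) - func(t)) / h


point = 3.0
steps = [1.0, 0.1, 0.01, 0.001]  # "zooming in": ever shorter stretches of curve
slopes = [secant_slope(f, point, h) for h in steps]

for h, slope in zip(steps, slopes):
    print(h, slope)
```

As the stretch of curve under examination shrinks, the slopes settle toward 6, the slope of the tangent line to this particular curve at t = 3. Over a short enough stretch, the curve and its tangent line are nearly indistinguishable.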
In the last section, we used the example of speed to motivate the idea of a derivative. We wrote distance as a function of time and showed how speed can be seen as the derivative of this function. At first we assumed that speed was constant. Then we assumed that it varied, that it was different at different times, in other words, that the derivative of the distance function depended on time. In short, the derivative of the distance function, like the distance function itself, is also a function of time. We use the “prime” symbol, ′, to denote a derivative, so that if

\[ d = f(t) \]

then

\[ \text{speed} = \text{derivative of } f = f'(t). \]

\(f'\) is an altogether new function of t. When we assumed a constant speed of 60 miles per hour, the distance function was given by

\[ f(t) = 60t. \]

In this case, the derivative, as we know, is just 60, no matter the time at which we choose to find it, so

\[ f'(t) = 60. \tag{4.1} \]

When we assume speed varies, f is more complicated and so is \(f'\). The righthand side of Equation 4.1 will no longer be just a number but a complicated expression involving t.

Taking a derivative, then, amounts to transforming one function, f, into a second function, \(f'\), which depends on the same variable. The process of transformation is called differentiation. When we take the derivative of a function, we say that we are differentiating it. This section gives, without justifying them, the rules by which differentiation can be done mechanically. Table 4.1 summarizes these rules.

Before we dive in, a word on notation. While using the “prime” symbol to denote a derivative is often convenient, in many circumstances other symbols work better. To repeat the definition made many times already, a function is a way of mapping an element of the domain, generically called x, to an element of the range, generically called y, a state of affairs we describe by

\[ y = f(x). \]
The derivative of f, as we saw in the previous section, is really an expression of the relationship of y to x. To capture this fact, we use the notation \(\frac{dy}{dx}\) to denote the derivative of y with respect to x, as speed, for example, is the derivative of distance with respect to time. Loosely speaking, the dy represents the change in y when x changes by dx. Putting them in a ratio reminds us that the derivative is the expression which tells us how y changes as x does.

The following ways of denoting a derivative are all equivalent:

\[ f'(x) = y' = \frac{dy}{dx} = \frac{df(x)}{dx} = \frac{d}{dx} f(x). \]

Of these symbolic conventions, perhaps the last best captures the idea that taking a derivative transforms one function into another. Removing the “f(x)” part of the expression, we’re left with

\[ \frac{d}{dx}. \]

The \(\frac{d}{dx}\) is like an idling machine, sitting and waiting for a function to be dropped into the slot, whereupon \(\frac{d}{dx}\) will transform it into a new function, the derivative.

Constant functions

The derivative of a constant function is 0. If f(x) = c, for some constant c, then \(f'(x) = 0\). For example, if f(x) = 29, then \(f'(x) = 0\).

Linear functions

The derivative of a linear function is the slope of the line that is its graph. If the linear function is given by f(x) = mx + b, then its derivative is given by \(f'(x) = m\). For example, if f(x) = 29x, then \(f'(x) = 29\). If f(x) = 29x − 13, then \(f'(x) = 29\).

Power functions

A power function is one in which the variable is raised to the power of a fixed exponent, that is, a function of the form \(f(x) = x^n\). For a function of this form, \(f'(x) = n x^{n-1}\). Some examples:

• If \(f(x) = x^{29}\), then \(f'(x) = 29x^{28}\).

• If \(f(x) = x^{-1} = \frac{1}{x}\), then

\[
\begin{aligned}
f'(x) &= (-1)x^{-1-1} \\
      &= -x^{-2} \\
      &= -\frac{1}{x^2}.
\end{aligned}
\]

• If \(f(x) = \sqrt{x} = x^{1/2}\), then

\[
\begin{aligned}
f'(x) &= \tfrac{1}{2} x^{\frac{1}{2} - 1} \\
      &= \tfrac{1}{2} x^{-\frac{1}{2}} \\
      &= \frac{1}{2x^{1/2}} \\
      &= \frac{1}{2\sqrt{x}}.
\end{aligned}
\]

The exponential function

The exponential function is its own derivative.
(Incidentally, this is the reason e is “natural.”) If \(f(x) = e^x\), then \(f'(x) = e^x\). Alternatively, we can write \(\frac{d}{dx} e^x = e^x\).

The natural logarithm

If \(f(x) = \ln x\), then \(f'(x) = \frac{1}{x}\). Alternatively, we can write \(\frac{d}{dx} \ln x = \frac{1}{x}\).

The sum of two functions

The derivative of the sum of two functions is the sum of the derivatives of the two individual functions. If f(x) = g(x) + h(x), then \(f'(x) = g'(x) + h'(x)\). Alternatively, we can write \(\frac{d}{dx} f(x) = \frac{d}{dx}[g(x) + h(x)] = \frac{d}{dx} g(x) + \frac{d}{dx} h(x)\). Some examples:

• If \(f(x) = x^2 + x\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx}\left[x^2 + x\right] \\
      &= \frac{d}{dx} x^2 + \frac{d}{dx} x \\
      &= 2x + 1.
\end{aligned}
\]

• If \(f(x) = x^3 + \ln x\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx}\left[x^3 + \ln x\right] \\
      &= \frac{d}{dx} x^3 + \frac{d}{dx} \ln x \\
      &= 3x^2 + \frac{1}{x}
\end{aligned}
\]

The product of two functions

If \(f(x) = g(x) \cdot h(x)\), then \(f'(x) = g'(x) \cdot h(x) + h'(x) \cdot g(x)\). Some examples:

• If \(f(x) = x^3 \ln x\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx}\left[x^3 \ln x\right] \\
      &= \frac{d}{dx} x^3 \cdot \ln x + \frac{d}{dx} \ln x \cdot x^3 \\
      &= 3x^2 \ln x + \frac{1}{x} \cdot x^3 \\
      &= 3x^2 \ln x + x^2
\end{aligned}
\]

• If \(f(x) = \frac{e^x}{x^2}\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx} \frac{e^x}{x^2} \\
      &= \frac{d}{dx}\left[e^x x^{-2}\right] \\
      &= \frac{d}{dx} e^x \cdot x^{-2} + \frac{d}{dx} x^{-2} \cdot e^x \\
      &= e^x \cdot x^{-2} + (-2)x^{-3} \cdot e^x \\
      &= \frac{e^x}{x^2} - \frac{2e^x}{x^3}
\end{aligned}
\]

The quotient of two functions

If \(f(x) = \frac{g(x)}{h(x)}\), then \(f'(x) = \frac{g'(x) \cdot h(x) - h'(x) \cdot g(x)}{[h(x)]^2}\). An example:

• If \(f(x) = \frac{e^x}{x^2}\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx} \frac{e^x}{x^2} \\
      &= \frac{\frac{d}{dx} e^x \cdot x^2 - \frac{d}{dx} x^2 \cdot e^x}{[x^2]^2} \\
      &= \frac{e^x x^2 - 2x e^x}{x^4} \\
      &= \frac{e^x}{x^2} - \frac{2e^x}{x^3}
\end{aligned}
\]

The composition of two functions: the chain rule

If \(f(x) = g(h(x))\), then \(f'(x) = g'(h(x)) \cdot h'(x)\). This is a particularly important rule of differentiation. Some examples:

• If \(f(x) = e^{x^2}\), and if we let \(h(x) = x^2\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx} e^{h(x)} \\
      &= e^{h(x)} \cdot h'(x) \\
      &= e^{x^2} \cdot \frac{d}{dx} x^2 \\
      &= 2x e^{x^2}
\end{aligned}
\]
• If \(f(x) = \ln(x^3 + 3)\), and if we let \(h(x) = x^3 + 3\), then

\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
      &= \frac{d}{dx}\left[\ln h(x)\right] \\
      &= \frac{1}{h(x)} \cdot h'(x) \\
      &= \frac{1}{x^3 + 3} \cdot 3x^2 \\
      &= \frac{3x^2}{x^3 + 3}
\end{aligned}
\]

Table 4.1: Rules of Differentiation

Type of function            Function                            Derivative
Constant                    \(f(x) = c\)                        \(f'(x) = 0\)
Linear                      \(f(x) = mx + b\)                   \(f'(x) = m\)
Power                       \(f(x) = x^n\)                      \(f'(x) = n x^{n-1}\)
Exponential                 \(f(x) = e^x\)                      \(f'(x) = e^x\)
Logarithmic                 \(f(x) = \ln x\)                    \(f'(x) = \frac{1}{x}\)
Sum                         \(f(x) = g(x) + h(x)\)              \(f'(x) = g'(x) + h'(x)\)
Product                     \(f(x) = g(x) \cdot h(x)\)          \(f'(x) = g'(x) \cdot h(x) + h'(x) \cdot g(x)\)
Quotient                    \(f(x) = \frac{g(x)}{h(x)}\)        \(f'(x) = \frac{g'(x) \cdot h(x) - h'(x) \cdot g(x)}{[h(x)]^2}\)
Composition (Chain Rule)    \(f(x) = g(h(x))\)                  \(f'(x) = g'(h(x)) \cdot h'(x)\)

4.3 The second derivative

In the introduction of the previous section, we saw that taking a derivative is a special way of transforming one function into another. There’s no reason we can’t repeat the process, differentiating the derivative to obtain yet another function. This last function is called the second derivative, since it is the original function differentiated twice over:

\[
\begin{aligned}
\text{first derivative} &= \frac{d}{dx} f(x) \\
                        &= f'(x) \\
\text{second derivative} &= \frac{d}{dx}\left[\text{first derivative}\right] \\
                         &= \frac{d}{dx} f'(x) \\
                         &= f''(x)
\end{aligned}
\tag{4.2}
\]

As with the (first) derivative, sometimes the “double-prime” notation of Equation 4.2 is convenient to denote the second derivative, but sometimes other symbols work better. If, as usual, we set y = f(x), then the second derivative can be written in any of the following ways:

\[ f''(x) = y'' = \frac{d^2 y}{dx^2} = \frac{d^2 f(x)}{dx^2} = \frac{d^2}{dx^2} f(x). \]

The last expression captures the idea that differentiating twice over transforms one function into another, just as differentiating once does. Removing the “f(x)” part of the expression, we’re left with

\[ \frac{d^2}{dx^2} \]

which is like a machine waiting for a function to be fed in the slot, whereupon it will spit out that function’s second derivative.

Here are some examples of the second derivative:
• If $f(x) = x^3$, then
\[
\begin{aligned}
f''(x) &= \frac{d^2}{dx^2} f(x) \\
&= \frac{d}{dx}\left[\frac{d}{dx} f(x)\right] \\
&= \frac{d}{dx}\left[\frac{d}{dx} x^3\right] \\
&= \frac{d}{dx}\, 3x^2 \\
&= 6x.
\end{aligned}
\]

• If $f(x) = x \ln x$, then
\[
\begin{aligned}
f''(x) &= \frac{d^2}{dx^2} f(x) \\
&= \frac{d}{dx}\left[\frac{d}{dx} f(x)\right] \\
&= \frac{d}{dx}\left[\frac{d}{dx}(x \ln x)\right] \\
&= \frac{d}{dx}\left[1 \cdot \ln x + x \cdot \frac{1}{x}\right] \\
&= \frac{d}{dx}[\ln x + 1] \\
&= \frac{1}{x}.
\end{aligned}
\]

What good is the second derivative? One use relates to concave functions. Earlier, we characterized a concave function by imagining what it would be like to walk along its graph: a decreasingly steep climb, leveling out towards the top, followed by an increasingly steep descent. But the measure of a curve’s steepness at any point is just the slope of the tangent line there: if, for instance, the slope is a large, positive number, the tangent line rises steeply, and the curve, too, can be said to be rising steeply at the point of tangency; if the slope is a small, negative number, the tangent line – and thus the curve at the point of tangency – is descending gently. As we walk along the graph of a concave function, the slope of the tangent line at every successive point of our journey keeps falling, from large positive numbers to small positive ones to small negative ones to large negative ones.

Figure 4.4: The slope of tangent lines decreases along the graph of a concave function. (Tangent slopes marked on the figure: $m = 1$, $m = 0.4$, $m = -0.5$.)

But we know that the slope of the tangent line at a given point along the curve is just the derivative at that point. If the value of the slope keeps falling, the value of the derivative must also be falling. Suppose that $y$ is this value, namely, that $y = f'(x)$. To say that the value of $f'$ is falling is to say that as $x$ increases, $y$ decreases. And this is the answer to the question, how does $y$ follow along as $x$ moves, which is the very definition of a derivative. What do we know about this derivative? We know it tells us that as $x$ goes up, $y$ goes down. We know, in short, that it is negative:
\[
\text{derivative of } f' < 0.
\]
But
\[
\begin{aligned}
\text{derivative of } f' &= \text{derivative of (the derivative of } f) \\
&= \text{second derivative of } f \\
&= f''.
\end{aligned}
\]
We conclude that a function $f$ is concave whenever $f''(x) < 0$ for all $x$.

This fact gives us a quick way of testing whether a function is concave. For instance, if we weren’t familiar with the shape of the graph of the natural logarithm function, we could instead determine whether it was concave by taking its second derivative:
\[
\begin{aligned}
\frac{d^2}{dx^2} \ln x &= \frac{d}{dx}\left[\frac{d}{dx} \ln x\right] \\
&= \frac{d}{dx}\, \frac{1}{x} \\
&= -\frac{1}{x^2}.
\end{aligned}
\]
Since $x^2$ is always positive, $-\frac{1}{x^2}$ must always be negative, and $f''(x) < 0$ for all $x$ in the domain of the logarithm (that is, for all $x > 0$). The natural logarithm function is indeed concave.

4.4 Partial derivatives

For a function of a single variable, $y = f(x)$, we defined the derivative as the answer to the question, how does $y$ follow along as $x$ moves? Here, we develop a similar concept for functions of several variables.

In Section 1.3, we described a bathtub spout fed by a cold water tap and a hot water tap. The temperature of the water coming out of the shared spout depended on two inputs: how much cold water was fed in and how much hot. The temperature, we summarized, was a function of two variables, $T = f(C, H)$. We might initially try extending the definition of the derivative to cover this function by positing that in this case, the derivative is the answer to the question, how does $T$ change as $C$ and $H$ do?

But there is a problem with this approach. If, for example, $T$ were to rise, how would we know whether it was because the hot water tap had been opened farther or because less cold water was feeding in? And if $T$ fell, was it because there was more cold water or because there was less hot? In short, if we let both the amount of cold water and the amount of hot water vary at the same time, it’s impossible to separate out the effect on $T$ of either $C$ or $H$ individually.
What we can do is leave one of the taps fixed while we twiddle the other, say, leave the hot water tap halfway open while we vary only the amount of cold water. In this case, it’s clear that any effect on the temperature is due entirely to a change in the amount of cold water feeding in. That is, temperature now depends on only one variable: $T = g(C)$. And we already know very well how to find the derivative of a function of a single variable. This derivative is called a partial derivative – partial because it only tells part of the story about $T$’s relationship to its inputs, $C$ and $H$: the question of its relationship to $H$ was bracketed in order to isolate the effects of $C$.

Suppose now that we have an explicit formula for the temperature, something like
\[
T = f(C, H) = -20C + 80H + 75.
\]
For instance, if the cold water tap is fully open ($C = 1$) and the hot water tap fully closed ($H = 0$), then $T = f(1, 0) = -20 \cdot 1 + 80 \cdot 0 + 75 = 55$. If, on the other hand, the hot tap is fully open and the cold tap fully closed, $T = f(0, 1) = 155$. Finally, if the hot water tap is fixed at halfway open ($H = \frac{1}{2}$) but $C$ remains variable, then
\[
\begin{aligned}
T &= f\left(C, \tfrac{1}{2}\right) \\
&= -20C + 80 \cdot \tfrac{1}{2} + 75 \\
&= -20C + 115.
\end{aligned}
\]
So $g(C) = -20C + 115$, and its derivative is
\[
\begin{aligned}
g'(C) &= \frac{d}{dC} g(C) \\
&= \frac{d}{dC}[-20C + 115] \\
&= \frac{d}{dC}(-20C) + \frac{d}{dC} 115 \\
&= -20.
\end{aligned}
\]

Now instead of fixing $H$ at $\frac{1}{2}$, we could have fixed it at any value between 0 (fully closed) and 1 (fully open) – at $H_0$, say. In this case, $T = f(C, H_0) = -20C + 80H_0 + 75$. Although this looks like a function of two variables, it’s actually a function of only one: it depends only on $C$; the other variable, $H$, was fixed at the constant value, $H_0$. As before, we can express $T$ as a function only of $C$:
\[
T = g(C) = f(C, H_0) = -20C + 80H_0 + 75.
\]
And as before, we can differentiate $g$:
\[
\begin{aligned}
g'(C) &= \frac{d}{dC} g(C) \\
&= \frac{d}{dC}[-20C + 80H_0 + 75] \\
&= \frac{d}{dC}(-20C) + \frac{d}{dC}(80H_0) + \frac{d}{dC} 75.
\end{aligned}
\]
But $H_0$ is a constant and thus $80H_0$ is as well. The derivative of a constant is always 0, so, continuing the calculation above, we have
\[
\begin{aligned}
\frac{d}{dC}(-20C) + \frac{d}{dC}(80H_0) + \frac{d}{dC} 75 &= -20 + 0 + 0 \\
&= -20.
\end{aligned}
\]
To conclude, the derivative of $g$ is $-20$ no matter what value, $H_0$, we fix $H$ at. This result has a straightforward interpretation: it means that for every fraction the cold water tap is opened, the temperature at the spout will fall by the same fraction of 20 degrees, no matter what the hot water setting.

Why, we can now ask, bother appending the little subscript “0” to $H$ at all? In the calculations above, the notation served as nothing more than a crutch for our imaginations, there to remind us that $H_0$ should be understood as a constant, not a variable. Why not just imagine $H$ as a constant in the first place, with or without the subscript? Our calculations above would not have been any different. In fact, when we formally take the partial derivative of a function of several variables, we do exactly this: we fix all but one of the variables, not with subscripts or by plugging in an actual number but only in our imaginations.

To formally signify the partial derivative, we use a special notation, $\frac{\partial}{\partial x}$, which operates on functions according to the very same rules, summarized in Table 4.1, that $\frac{d}{dx}$ does. The set of symbols,
\[
\frac{\partial}{\partial x} f(x, y)
\]
means, “Imagine $y$ is fixed, so that $f$ becomes in effect a function of one variable, $x$. Then take the derivative of this one-variable function in the usual way.” We say that $\frac{\partial}{\partial x} f(x, y)$ is the partial derivative of $f$ with respect to $x$. We could also of course imagine fixing $x$ and taking the partial derivative of $f$ with respect to $y$, in which case we would write
\[
\frac{\partial}{\partial y} f(x, y).
\]

Here are some examples of partial derivatives:

• If $f(x, y) = -20x + 80y + 115$, then
\[
\begin{aligned}
\frac{\partial}{\partial x} f(x, y) &= \frac{\partial}{\partial x}[-20x + 80y + 115] \\
&= \frac{\partial}{\partial x}(-20x) + \frac{\partial}{\partial x}(80y) + \frac{\partial}{\partial x} 115.
\end{aligned}
\]
Now as far as $\frac{\partial}{\partial x}$ is concerned, $80y$ is nothing but a constant. (Remember, we’re imagining $y$ is fixed.)
The partial derivative of a constant is 0, just as it is for the ordinary derivative. Continuing, then, we have
\[
\begin{aligned}
\frac{\partial}{\partial x}(-20x) + \frac{\partial}{\partial x}(80y) + \frac{\partial}{\partial x} 115 &= -20 + 0 + 0 \\
&= -20.
\end{aligned}
\]
Likewise, we can take the partial derivative of $f$ with respect to $y$:
\[
\begin{aligned}
\frac{\partial}{\partial y} f(x, y) &= \frac{\partial}{\partial y}[-20x + 80y + 115] \\
&= \frac{\partial}{\partial y}(-20x) + \frac{\partial}{\partial y}(80y) + \frac{\partial}{\partial y} 115 \\
&= 0 + 80 + 0 \\
&= 80.
\end{aligned}
\]

• If $f(x, y, z) = x^3 + yz + x^2 yz + e^{xyz}$, then
\[
\begin{aligned}
\frac{\partial}{\partial x} f(x, y, z) &= \frac{\partial}{\partial x}[x^3 + yz + x^2 yz + e^{xyz}] \\
&= \frac{\partial}{\partial x}(x^3) + \frac{\partial}{\partial x}(yz) + \frac{\partial}{\partial x}(x^2 yz) + \frac{\partial}{\partial x}(e^{xyz}) \\
&= 3x^2 + 0 + 2xyz + e^{xyz} \cdot \frac{\partial}{\partial x}(xyz) \\
&= 3x^2 + 2xyz + yz\, e^{xyz}.
\end{aligned}
\]

Chapter 5 Maximums and minimums

5.1 Using derivatives to find maximums and minimums

One of the primary uses of the derivative – and nowhere more so than in economics – is to find maximums and minimums. A maximum, we might say, is where something peaks. But what, exactly, does it mean to peak? If we walk up a hill, over the top, and down the other side, the peak is precisely the point where our journey goes from being uphill to being downhill.

Like a hill, a function, $y = f(x)$, can also peak. As $x$ steadily increases, $y$ may rise and fall. The precise point where it goes from rising to falling is a maximum. Where $y$ is rising, the derivative of $f$ is positive. Where it is falling, the derivative is negative. At the very point where it goes from rising to falling, the derivative, passing precisely there from positive values to negative ones, must be 0. Figure 5.1 describes this phenomenon in an intuitive way.

A minimum of a function is the opposite of a maximum: it is a point where the function “bottoms out.” Whereas at a maximum, the function goes from rising to falling, at a minimum, it goes from falling to rising. At the point where the minimum occurs, the derivative, now passing from negative to positive, is also 0.

Figure 5.1: The derivative – the slope of the tangent line – is 0 at a maximum. (Tangent slopes marked on the figure: $m > 0$, $m = 0$, $m < 0$.)
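The sign pattern just described (positive slope on the way up, zero at the top, negative on the way down) can be checked numerically. The sketch below is only an illustration under stated assumptions: the function `hill` is a made-up example with its peak at x = 2, not one taken from the text, and `slope` approximates the derivative by a central difference rather than computing it exactly.

```python
def slope(f, x, h=1e-5):
    """Approximate f'(x) with a central difference.

    This is a numerical stand-in for the derivative, assumed good enough
    for smooth functions like the one below; it is not an exact formula.
    """
    return (f(x + h) - f(x - h)) / (2 * h)


def hill(x):
    """Hypothetical example function with a single peak at x = 2."""
    return -(x - 2) ** 2 + 3


# Sample the slope before the peak, at the peak, and after the peak.
for x in (1.0, 2.0, 3.0):
    print(f"x = {x}: slope is approximately {slope(hill, x):.4f}")
```

The printed slopes should come out close to 2, 0, and −2 respectively, matching the exact derivative $-2(x - 2)$ of this particular example.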
To find the maximums and minimums of a function, then, we first find where its derivative is 0, that is, we solve the equation $f'(x) = 0$. Solving it may yield one value for $x$, several, or none. Each of these values becomes a candidate for a point where a maximum or a minimum occurs, but until we inspect them further, we can’t say whether each marks a maximum, a minimum, or neither.

The problem is, at these values of $x$, we don’t know if the function is going from rising to falling, from falling to rising, or whether it has simply “flattened out” between two periods of rising or two of falling. So we need a test that tells us if the derivative itself is rising or falling or neither, just as the derivative in its time told us whether the original function was rising or falling or, as at maximums and minimums, neither.

The value of the second derivative is this test. If the value is negative, the derivative itself is trending down, as in Figure 5.1, which represents a maximum. If the value is positive, the derivative is trending up, and we must have found a minimum. If the value is 0, the test by itself is inconclusive; in the cases we meet here, we’ve found neither a maximum nor a minimum but a point, called an inflection point, where the original function flattens out between two bouts of rising or two of falling. The test itself is called the second-order condition for a maximum or a minimum.

Consider two examples. From its graph, we can tell the function $f(x) = x^3$ has no maximum or minimum.

Figure 5.2: The graph of $y = x^3$.

If we nevertheless go ahead with our procedure for finding maximums and minimums, we start by setting
\[
f'(x) = 3x^2 = 0.
\]
This equation has only one solution, namely, $x = 0$. The second derivative of $f(x) = x^3$ is
\[
f''(x) = 6x
\]
and when $x = 0$, $f''(x) = 0$ as well.
Since the second derivative is neither positive – as it would be had we found a minimum – nor negative – as it would be had we found a maximum – we conclude, as the graph confirms, that $f$ has nothing but an inflection point at $x = 0$, where it merely flattens out between two periods of rising.

Now consider the function $f(x) = x^3 - 27x$.

Figure 5.3: The graph of $y = x^3 - 27x$. (Tangent lines with slope $m = 0$ are marked at two points.)

We start as usual by setting
\[
f'(x) = 3x^2 - 27 = 0.
\]
This equation has two solutions, $x = 3$ and $x = -3$. The second derivative of $f$ is
\[
f''(x) = 6x.
\]
At $x = 3$, $f''(3) = 18$, which is positive, so there we have a minimum. At $x = -3$, $f''(-3) = -18$, which is negative, and there we find a maximum.

A last word of caution: the procedure we’ve just run through is guaranteed only to find peaks and valleys, technically called local maximums and local minimums. It may happen, as in the example above, that the peak it finds is not the greatest value the function assumes anywhere. ($x^3 - 27x$ grows arbitrarily large as $x$ does.) Likewise, the valley it finds may not be the least value the function assumes. Finally, there may be many peaks, some higher than others, and many valleys, some lower than others. The case typically isn’t settled in a single step but by a process of sifting and examining details.

5.2 The Lagrangian

The previous section’s method for finding maximums and minimums is very useful – but limited. In particular, it only deals with functions of one variable. Many problems of maximization and minimization arise in economics that require a more powerful method to solve. The Lagrangian furnishes this method. Here, without making the least attempt to explain why it works, we present it as a purely mechanical procedure.
Step 1: Set up

A Lagrangian problem always involves two components: 1) a function of several variables, whose value we wish to maximize (or minimize), and 2) a constraint on the values those variables can assume.

For instance, we might have a function $f(x, y) = xy$ of two variables. If we don’t constrain the possible values for $x$ and $y$, we can make $f(x, y)$ grow arbitrarily large by letting either $x$ or $y$ run freely out to infinity. In this case, $f$ will have no finite maximum. If, on the other hand, we weren’t so permissive, we could stipulate up front that any combination of $x$ and $y$ on which we allow $f$ to be evaluated come from a narrower set, say, the set of $x$-$y$ combinations that satisfy the relationship $x^2 + y^2 = 1$.

In a move straight out of Section 1.5 on level curves, we can cast this relationship in terms of a function. The $x$-$y$ combinations that satisfy the relationship $x^2 + y^2 = 1$ are identical to the $x$-$y$ combinations that make the function $g(x, y) = x^2 + y^2 - 1$ equal to 0. ($x^2 + y^2 - 1 = 0$ is the same as $x^2 + y^2 = 1$.) That $g(x, y)$ must equal 0 is the constraint on $x$ and $y$.

Together with one new variable, we now combine our two functions, $f$ and $g$, in a special way to form a third function, the Lagrangian itself:
\[
L(x, y, \lambda) = f(x, y) - \lambda g(x, y).
\]
The new variable, $\lambda$, is called the Lagrangian multiplier. In our particular example,
\[
L(x, y, \lambda) = xy - \lambda(x^2 + y^2 - 1).
\]

Step 2: Differentiate the Lagrangian

$L$ is a function of several variables, in the present case, of $x$, $y$, and $\lambda$. This means that it has three partial derivatives, one with respect to each variable. We take these three partial derivatives and set them all equal to 0:
\[
\begin{aligned}
\frac{\partial}{\partial x} L(x, y, \lambda) &= 0 \\
\frac{\partial}{\partial y} L(x, y, \lambda) &= 0 \\
\frac{\partial}{\partial \lambda} L(x, y, \lambda) &= 0.
\end{aligned}
\]
If $f$ and $g$ had been functions of more than two variables, $L$ would be a function of more than three variables, and the above list of equations would be longer.
But returning to our particular example,
\[
\begin{aligned}
\frac{\partial}{\partial x} L(x, y, \lambda) &= y - 2\lambda x \\
\frac{\partial}{\partial y} L(x, y, \lambda) &= x - 2\lambda y \\
\frac{\partial}{\partial \lambda} L(x, y, \lambda) &= -(x^2 + y^2 - 1).
\end{aligned}
\]
All of these expressions in $x$, $y$, and $\lambda$ are set equal to 0 (multiplying the last one through by $-1$ does not change where it equals 0), resulting in a set of simultaneous equations:
\[
\begin{aligned}
y - 2\lambda x &= 0 \qquad (5.1) \\
x - 2\lambda y &= 0 \qquad (5.2) \\
x^2 + y^2 - 1 &= 0. \qquad (5.3)
\end{aligned}
\]

Step 3: Solve the simultaneous system of equations

This is typically the hairiest step of the Lagrangian method. We have to solve the set of simultaneous equations we obtained at the end of Step 2, and this can get messy. Generally, we proceed by solving one of the equations for one variable in terms of the remaining variables, aiming for something of the form,

isolated variable = expression involving remaining variables.

For instance, we can solve Equation 5.1 for $\lambda$ to obtain
\[
\lambda = \frac{y}{2x}. \qquad (5.4)
\]
(Crucially, we must assume at this point that $x$ is nonzero; otherwise we can’t divide by it.) Now, we substitute this expression for the isolated variable into the remaining equations, thereby reducing the number of variables and the number of equations by one. We then repeat this cycle of isolation and substitution until we get down to one equation in one variable, which, with any luck, will be straightforward to solve.

So, substituting $\lambda = \frac{y}{2x}$ in Equations 5.2 and 5.3, we have
\[
\begin{aligned}
x - 2 \cdot \frac{y}{2x} \cdot y &= 0 \qquad (5.5) \\
x^2 + y^2 - 1 &= 0. \qquad (5.6)
\end{aligned}
\]
Multiplying through by $x$, Equation 5.5 simplifies to
\[
x^2 - y^2 = 0. \qquad (5.7)
\]
The slightly clever thing to notice now is that Equation 5.7 tells us $x^2 = y^2$, so without taking any square roots, we can substitute directly for $y^2$ in Equation 5.6, obtaining
\[
\begin{aligned}
x^2 + x^2 - 1 &= 0 \\
2x^2 &= 1 \\
x^2 &= \frac{1}{2} \\
x &= \pm\frac{1}{\sqrt{2}}.
\end{aligned}
\]

As if unpacking matryoshka dolls, we went from three equations to two equations to one equation, which we’ve just solved for $x$. Now we have to pack the dolls back up, solving for $y$ by substituting $x$ into either of the two equations, and then for $\lambda$ by substituting $x$ and $y$ into any one of the three equations.
So, plugging either $x = \frac{1}{\sqrt{2}}$ or $x = -\frac{1}{\sqrt{2}}$ into Equation 5.7, we have
\[
\begin{aligned}
y^2 &= \frac{1}{2} \\
y &= \pm\frac{1}{\sqrt{2}}.
\end{aligned}
\]
Finally, substituting various combinations of positive and negative for $x$ and $y$ in Equation 5.4, we find that
\[
\lambda = \pm\frac{1}{2}.
\]

In outline, Step 3 proceeds by digging down, through cycles of isolation and substitution, until we strike a hard number, that is, until we’ve solved for one of the variables. We then work our way back up, filling in variables with hard numbers. The result is combinations of values where $f$ possibly assumes a maximum (or minimum):
\[
\begin{aligned}
x &= \pm\frac{1}{\sqrt{2}} \\
y &= \pm\frac{1}{\sqrt{2}} \\
\lambda &= \pm\frac{1}{2}.
\end{aligned}
\]

Step 4: Check the solutions

Step 3 suggested several combinations of $x$ and $y$ that might maximize or minimize $f(x, y)$. To know which combinations maximize and which minimize, we need to plug them into $f$. If $x$ and $y$ are both negative or both positive, e.g. $x = -\frac{1}{\sqrt{2}}$ and $y = -\frac{1}{\sqrt{2}}$, then
\[
\begin{aligned}
f(x, y) &= xy \\
&= \left(\pm\frac{1}{\sqrt{2}}\right) \cdot \left(\pm\frac{1}{\sqrt{2}}\right) \\
&= \frac{1}{2}.
\end{aligned}
\]
On the other hand, if $x$ and $y$ are of opposite sign, e.g. $x = \frac{1}{\sqrt{2}}$ and $y = -\frac{1}{\sqrt{2}}$, then
\[
\begin{aligned}
f(x, y) &= xy \\
&= \left(\pm\frac{1}{\sqrt{2}}\right) \cdot \left(\mp\frac{1}{\sqrt{2}}\right) \\
&= -\frac{1}{2}.
\end{aligned}
\]
Thus $f$ has a maximum of $\frac{1}{2}$ at $(x, y) = (\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}})$ and a minimum of $-\frac{1}{2}$ at $(x, y) = (\pm\frac{1}{\sqrt{2}}, \mp\frac{1}{\sqrt{2}})$.

It also never hurts to plug in zero values for variables we assumed at some point in Step 3 were nonzero, just to make sure the assumption was not unwarranted. We did in fact assume that $x$, for one, was nonzero. But if $x = 0$, then $f(x, y) = xy = 0$ as well, and since 0 lies strictly between $-\frac{1}{2}$ and $\frac{1}{2}$, $f$ attains neither a maximum nor a minimum when $x = 0$. Our assumption was warranted after all.
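As a sanity check on the worked example, the result can be confirmed by brute force rather than by the Lagrangian itself. The sketch below rests on one assumption not in the text: every point satisfying $x^2 + y^2 = 1$ can be written as $(\cos t, \sin t)$, so stepping $t$ around the circle visits only allowed combinations. The helper names `scan_constraint` and `first_order_residuals` are illustrative, not part of any library.

```python
import math


def f(x, y):
    """Objective from the worked example: f(x, y) = xy."""
    return x * y


def scan_constraint(n=3600):
    """Evaluate f at n evenly spaced points on the circle x^2 + y^2 = 1.

    Each point is written as (cos t, sin t), which satisfies the
    constraint automatically (an assumption of this sketch).
    """
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        x, y = math.cos(t), math.sin(t)
        points.append((f(x, y), x, y))
    return points


def first_order_residuals(x, y, lam):
    """Left-hand sides of Equations 5.1-5.3; all are 0 at a solution."""
    return (y - 2 * lam * x, x - 2 * lam * y, x ** 2 + y ** 2 - 1)


points = scan_constraint()
best = max(points)    # tuples compare on the value of f first
worst = min(points)

print("largest f on the circle: ", best)
print("smallest f on the circle:", worst)
print("residuals at (1/sqrt(2), 1/sqrt(2), 1/2):",
      first_order_residuals(1 / math.sqrt(2), 1 / math.sqrt(2), 0.5))
```

The scan should report a largest value near 1/2 at a point where x and y share a sign, and a smallest value near −1/2 where they differ in sign, in line with Step 4. A grid scan like this only approximates the extremes; it works here because the constraint set is a simple closed curve.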