The SAIS Math Companion
Aaron Roth, MA ’06
2005
Contents

List of Figures
List of Tables
1 Functions and graphs
1.1 What is a function?
1.2 Functions in mathematics
1.3 Functions of several variables
1.4 From functions to graphs
1.5 Level curves
1.6 Inverses
1.7 Composing functions
2 Some special functions
2.1 Concave and convex functions
2.2 Exponents
2.3 Exponential and logarithmic functions
3 Linear equations
3.1 Linear functions
3.2 Solving simultaneous equations
4 Differentiation
4.1 What is a derivative?
4.2 Rules for taking derivatives
4.3 The second derivative
4.4 Partial derivatives
5 Maximums and minimums
5.1 Using derivatives to find maximums and minimums
5.2 The Lagrangian
Index
List of Figures

1.1 A function maps elements of a domain to elements of a range.
1.2 A function cannot map one element to several...
1.3 ...but a function can map several elements to one.
1.4 A graph is a way of representing a function.
1.5 Using the graph to find function values.
1.6 Some level curves of \(f(x, y) = x^2 + y^2\).
1.7 The inverse of a function is the function “run backwards.”
2.1 A convex function “hangs down.”
2.2 A concave function “arches up.”
2.3 Diminishing marginal utility.
3.1 The graph of \(y = \frac{1}{2}x - 1\).
3.2 The graph of \(y = -3x + 2\).
3.3 The intersection of \(y = 2x + 1\) and \(y = 3x - 1\).
4.1 As x moves through the domain, y follows along through the range.
4.2 Near cities, d follows along with t more slowly.
4.3 The tangent line rests neatly against a curve.
4.4 The slope of tangent lines decreases along the graph of a concave function.
5.1 The derivative – the slope of the tangent line – is 0 at a maximum.
5.2 The graph of \(y = x^3\).
5.3 The graph of \(y = x^3 - 27x\).
List of Tables

2.1 Rules of Exponents
2.2 Rules of Logarithms
4.1 Rules of Differentiation

Chapter 1
Functions and graphs
1.1 What is a function?
A function is a way of associating things in one set with things in another,
a way, for each thing in the first set, of finding a corresponding thing in the
second. We say that a function maps one set to another. We call the first
set the domain and the second the range.
Figure 1.1: A function maps elements of a domain to elements of a range.
The white pages, for instance, are a kind of function: for each name, the
phone book gives a corresponding number. It maps names to numbers.
The concept of a function is very general – but not completely open-ended. A function cannot be ambiguous. For each element in the domain,
it must give a unique answer to the question, what is the corresponding
element in the range? It cannot map one element to more than one. The
index of a book on World War II is not a function: under the name Adolf
Hitler, the index lists many page numbers.
Note, on the other hand, that there’s nothing wrong with a function
mapping several different elements in the domain to the same element in
the range. The white pages give the same number for the name George
Bush and for the name Laura Bush.
Figure 1.2: A function cannot map one element to several... (The index entry “Hitler, Adolf” points to p. 54, p. 145, and p. 603.)
Figure 1.3: ...but a function can map several elements to one. (“Bush, George” and “Bush, Laura” both point to (202) 555-1234.)

1.2 Functions in mathematics
There are many ways to specify how a function works, how it takes an
element of the domain to the corresponding element of the range. The phone
book works by listing an element of the domain – a person’s name – side
by side with the corresponding element of the range – the person’s number.
Starting with a name, finding the number is straightforward.
In mathematics, functions commonly map numbers to numbers, in a way
specified by a formula. The formula tells us how to start with one number
as input, manipulate it, and find another number as output. For example,
a formula might specify that we take a number, square it, and add three to
produce the result. If x is the number we start with and f is the name of
our function, then in symbols we write this formula
\[ f(x) = x^2 + 3. \]
In particular, if we run the number 2 through our formula, we come out
with 7. Mathematically, we write this f (2) = 7 and say that f maps the
number 2 to the number 7.
We often want to name the result produced by a function. It’s customary
to call it y and write
\[ y = f(x). \]
This is simply shorthand for “y is the result of applying function f to the
number x,” or to put it another way, for “y is the number in the range of
f corresponding to the number x in its domain.” In the particular example
above, x is 2, y is 7, and f is the method, “take x, square it, and add three.”
Note in passing that f generally maps two different numbers in the domain
to the same number in the range. f (−2), for example, is also 7.
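
As a minimal sketch, the directions “take x, square it, and add three” can be written as a short Python function. The choice of Python, and of reusing the name f for the Python function, is an illustrative assumption and not part of the original text.

```python
def f(x):
    """Take a number, square it, and add three."""
    return x**2 + 3


# Running 2 through the formula gives 7, and so does -2:
# two different inputs can share the same output.
y = f(2)
print(y, f(-2))
```

Here y plays the role described above: it names the result of applying f to the input 2.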
Minor confusion sometimes arises when the same symbol is used to name
both a function and its result. In economics, for instance, we use the word
“demand” both in expressions like, “at $30, demand for widgets is 500
units,” and in expressions like, “the demand curve slopes downward.” In
the first case, we are thinking of demand as a number; in the second, as
a function. This state of affairs occurs because we describe demand as a
function of price by the equation
\[ D = D(P). \]
Here, P represents the price of a unit; the first D in the equation represents
a number, the number of units demanded; and the second D represents a
function, the demand function. It’s often convenient to give the same name
to two different kinds of things, and as long as the dual role of a particular
symbol is kept in mind, this need not cause confusion.
1.3 Functions of several variables
Imagine a bath with two taps, one for cold water and one for hot, feeding
into a single spout. The temperature of the water coming out of the common
spout depends on how far open each of the taps is. It depends on two
inputs. In other words, the temperature is a function of two variables. We
can represent this state of affairs in an equation,
\[ T = f(C, H) \]
where T is the temperature, C is the amount of water coming from the cold
tap, and H the amount coming from the hot tap.
A function need not be limited to two variables. It can take any number
of variables as inputs, combining them to produce a single output. Gross
national product, for instance, depends on four things: consumption, investment, government spending, and net exports. In fact, it is simply the sum
of these four inputs:
\[ Y = f(C, I, G, NX) = C + I + G + NX \]
where, using the conventional abbreviations, Y represents gross national
product, C is consumption, I investment, G government spending, and
NX net exports.
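
A function of several variables can be sketched in Python as a function with several parameters. The function name gnp and the input figures below are hypothetical, chosen only to illustrate the sum; they are not data from the text.

```python
def gnp(c, i, g, nx):
    """Combine four inputs into one output by summing them.

    c: consumption, i: investment, g: government spending, nx: net exports.
    """
    return c + i + g + nx


# Hypothetical figures (say, in billions of dollars), for illustration only.
y = gnp(c=700, i=200, g=250, nx=-50)
print(y)
```

Four inputs go in and a single number comes out, which is all that “a function of four variables” means.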
1.4 From functions to graphs
This figure shows the function \(f(x) = x^2 + 3\) mapping certain numbers in its
domain to the corresponding numbers in its range, which, being the result
of f , we give the name y.
x:   0   1   2   3   4
y:   3   4   7  12  19
The points that represent elements of the range have been specially placed
and joined with an imaginary curve segment drawn with a dashed line. If
we compact this diagram by turning it on its side, shrinking the domain
“bean” to a horizontal line segment and the range bean to a vertical one,
and shortening the arrows until each is exactly as long as the number it
points to, we arrive at a leaner representation of the function f – its graph
– shown in Figure 1.4.
We call the horizontal line segment that the domain bean has shrunk
to the x-axis and the vertical line segment that the range bean has become
the y-axis. The point where the axes cross is called the origin. The flat,
two-dimensional space on which the graph is drawn is called the coordinate plane or xy-plane, or sometimes simply the plane.
A graph is just another way of looking at a function, much like a red
line traced on a map is another way of looking at a route originally given as
a series of directions, “go straight for half a mile on Massachusetts Avenue;
at the light, turn left onto 16th Street; continue five miles...” Our function,
too, was originally given as a series of directions, “take a number, square
it, and add three,” which we then abbreviated to \(f(x) = x^2 + 3\).
Although a graph is another way of specifying a function, because we
can’t draw past the edge of a sheet of paper, a graph can only show a portion
of a function. It is a “window” at a certain place on a function that may
extend infinitely beyond the window’s frame. Our graph of \(f(x) = x^2 + 3\)
only shows the portion of it between 0 and 4, even though it would be easy
enough to compute and draw its values at 5, where \(y = f(5) = 28\), or at
83.78, or at −1, −2, −29, and so on.

Figure 1.4: A graph is a way of representing a function. (Vertical axis: y, the range; horizontal axis: x, the domain.)

If we hadn’t been given the formula for the function, the graph would
still tell us how f works – or at least a certain portion of f . Just as we can
navigate a route without a list of written directions by following the marks
on a map, starting with a number in the domain, we can find our way to
the corresponding number in the range without an explicit formula for the
function.
From our starting number in the domain, that is, on the x-axis, we
simply travel due north until we hit the line of the graph. The distance
we’ve traveled is exactly the value of the function, and the ruler by which
we measure that distance is just the y-axis.

Figure 1.5: Using the graph to find function values. (1) Start with a number on the x-axis and travel north... (2) ...until we hit the graph... (3) ...then look to the y-axis to see how far we traveled.

There is another very useful way to think of a graph. If we turn to the
index of a street map, under the name of any long road will be a list of
map coordinates: E-2, E-3, F-3, G-4, etc. These coordinates tell us where
on the map to find the road. In fact, if we colored in the squares on the map
designated by this series of letter-number pairs, we would roughly trace out
the route of the road. Conversely, the road as it wends its way picks out a
particular series of coordinate pairs. In short, we can identify a road with
a list of coordinates.
Likewise, a graph wends its way through the coordinate plane, picking
out a set of coordinate pairs, and we can identify the graph with the set of
coordinates it passes through. Instead of a letter-number pair, coordinates
in the xy-plane are designated by a pair of numbers: the x-coordinate that
tells how far “east” or “west” to go along the x-axis from the origin, and
the y-coordinate that tells us how far north or south to go along the y-axis.
A point, P , in the coordinate plane is thus written
\[ P = (x, y). \]
To locate P , we travel x units to the east (or if x is negative, to the west)
and y units to the north (or south).
Which points does the graph of our old function, \(f(x) = x^2 + 3\), pass through? Exactly
those points (x, y) such that \(y = f(x)\). That is, f establishes a particular
relationship between x’s and y’s. Only certain (x, y)-pairs satisfy this relationship, exactly those where \(y = x^2 + 3\), such as the pairs (−2, 7) and
(7, 52) – but not the pairs (0, 0) and (2, −7). A point in the plane is either
on the graph – if its coordinates satisfy the relationship \(y = f(x)\) – or the
point is not on the graph – if \(y \neq f(x)\) for the x- and y-coordinates of that
point. A function establishes a kind of “in-or-out” test for every point in
the plane. Its graph is exactly the set of points it rules “in.”
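
The “in-or-out” test can be sketched in a few lines of Python. The helper name on_graph is a hypothetical choice for illustration; the four points are the ones mentioned in the text.

```python
def f(x):
    """Take a number, square it, and add three."""
    return x**2 + 3


def on_graph(x, y):
    """Return True exactly when the point (x, y) satisfies y = f(x)."""
    return y == f(x)


points = [(-2, 7), (7, 52), (0, 0), (2, -7)]
# Keep only the points that the function rules "in."
ruled_in = [p for p in points if on_graph(*p)]
print(ruled_in)
```

Exact equality is safe here because the coordinates are whole numbers; with decimal coordinates, a small tolerance would be the more cautious comparison.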
This way of conceiving of a graph, as a set of coordinate pairs that
satisfy a relationship established by a function, as a set of points specially
picked out of the plane, will be a very useful one to have in mind as you
read the later section on simultaneous equations and the following one on
level curves.
1.5 Level curves
It’s possible for two variables to be in a relationship that’s impossible to
describe by a function. The circle of radius one, for example, is the set of
points (x, y) such that \(x^2 + y^2 = 1\). The point \((\frac{1}{2}, \frac{\sqrt{3}}{2})\) satisfies the relationship
\(x^2 + y^2 = 1\), since \((\frac{1}{2})^2 + (\frac{\sqrt{3}}{2})^2 = \frac{1}{4} + \frac{3}{4} = 1\). But the point \((\frac{1}{2}, -\frac{\sqrt{3}}{2})\) also
satisfies it. If the relationship could be described by a function, \(y = f(x)\),
then f would have to map \(\frac{1}{2}\) to both \(\frac{\sqrt{3}}{2}\) and \(-\frac{\sqrt{3}}{2}\). It would have to map one
number to several, something a function cannot by definition do. Therefore
f cannot exist. On the other hand, nothing stops us from graphing the
circle: it is, after all, just the set of points (x, y) in the plane that satisfy
the relationship \(x^2 + y^2 = 1\). There are more graphs in the world than can
be described by functions alone.
Whether it can ultimately be described by a function or not, any relationship between x and y can, as with the points of a circle, be put in the
form,
\[ \text{expression involving only } x \text{ and } y = \text{some constant}. \tag{1.1} \]
And out of the expression involving only x and y, we can make a function
of two variables, in the case of circles, for example, the function
\[ f(x, y) = x^2 + y^2. \]
By letting the constant in Equation 1.1 range over a variety of values and
setting f equal to each in turn, we generate a whole family of relationships between x and y:
\begin{align*}
x^2 + y^2 &= 1 \\
x^2 + y^2 &= 4 \\
x^2 + y^2 &= 9 \\
&\;\;\vdots
\end{align*}
As it happens, the constants we’ve set f equal to represent the radii of
various circles, or rather, the squares of their radii. We can depict all these
circles simultaneously in a single picture.
Figure 1.6: Some level curves of \(f(x, y) = x^2 + y^2\) (circles of radius 1, 2, and 3).
These circles form a family: they resemble each other in shape and differ
only in size. This is because they are all the offspring, so to speak, of the
same function, f . We call this family the level curves of f .
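
As a rough sketch, membership in a level curve can be tested by evaluating f at a point and comparing the result with the chosen constant. The helper name on_level_curve and the tolerance value are assumptions made for illustration; the tolerance is there because square roots are only approximate in floating-point arithmetic.

```python
import math


def f(x, y):
    """The two-variable function whose level curves are circles."""
    return x**2 + y**2


def on_level_curve(x, y, c, tol=1e-9):
    """Return True when f(x, y) equals the constant c, up to a small tolerance."""
    return math.isclose(f(x, y), c, abs_tol=tol)


# The two points from the text that share x = 1/2 on the circle of radius one.
upper = (0.5, math.sqrt(3) / 2)
lower = (0.5, -math.sqrt(3) / 2)
print(on_level_curve(*upper, 1), on_level_curve(*lower, 1))

# Points on the level curves for the constants 4 and 9 (radii 2 and 3).
print(on_level_curve(0, 2, 4), on_level_curve(3, 0, 9))
```

Note that the constant passed in is the square of the radius, as the text points out.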
1.6 Inverses
Assume for the moment that each person has a single, unique phone number.
The phone book tells us how to find that number given a name. Caller ID
is a kind of reverse phone book: it tells us the name given the number. If
the phone book is a function that maps names to numbers, then we say
that caller ID, which maps a number back to the name it belongs to, is the
inverse of this function.
Figure 1.7: The inverse of a function is the function “run backwards.”
Think of the inverse of a function as the function run backwards.
The inverse of a function, if it exists, is itself a function: it maps the
range of the original function back to the original function’s domain. That
is, the domain of the inverse function is the range of the original function,
and vice versa. The domain of the caller ID function, the elements it maps
from, is a set of numbers, the very set of numbers the phone book function
maps to.
Sometimes the inverse of a function does not exist. This happens when
the original function maps more than one element in its domain to the
same element in its range, as in Figure 1.3, where each person does not
have a unique phone number. If the inverse function existed, it would have
to map the number (202) 555-1234 back to two different names, George
Bush and Laura Bush. This is like the situation depicted in Figure 1.2,
where one element is mapped to several, something a function is forbidden
to do. It follows that whenever a function maps several elements to one, it has
no inverse.
For example, our function \(f(x) = x^2 + 3\) has no inverse. It maps two
different numbers, 2 and −2, to the same number, 7. The inverse function,
if it existed, would have to map 7 back to both 2 and −2, and this would
disqualify it from being a function.
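
The obstacle to an inverse can be checked mechanically on a finite sample of inputs. This is only a sketch: the helper name maps_several_to_one is hypothetical, and a finite check can reveal a collision but cannot prove that none exists elsewhere in the domain.

```python
def f(x):
    """Take a number, square it, and add three."""
    return x**2 + 3


def maps_several_to_one(func, inputs):
    """Return True if two different inputs in `inputs` share an output."""
    seen = {}  # output -> the first input that produced it
    for x in inputs:
        y = func(x)
        if y in seen and seen[y] != x:
            return True
        seen[y] = x
    return False


# Over -3..3 the inputs 2 and -2 collide at 7, so no inverse can exist there.
print(maps_several_to_one(f, range(-3, 4)))
# Over 0..3 alone no two sampled inputs collide.
print(maps_several_to_one(f, range(0, 4)))
```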
How can we find the inverse of a mathematical function in general? The
inverse of a function is the function run backwards. If a function specifies, “take nine-fifths of a number and add thirty-two,” then, reading these
directions backwards, the inverse will specify, “take a number, subtract
thirty-two, and multiply by five-ninths.” (Incidentally, these are the formulas for converting from degrees Celsius to degrees Fahrenheit and back.) We
abbreviate the function in symbols
\[ f(C) = \frac{9}{5}C + 32. \]
A function takes an input, manipulates it, and produces a result. The
inverse takes that result, runs the function’s manipulations backwards, and
returns the original input. In this example, C is the input, f does the
manipulating, and the result we’ll call big F , so that we can write
\[ F = f(C) = \frac{9}{5}C + 32. \]
This equation gives us the result, F , in terms of the input, C. The inverse
will give us the original input, C, in terms of the original result, F . In effect,
we need to solve the equation, \(F = \frac{9}{5}C + 32\), for C in terms of F:
\begin{align*}
F &= \frac{9}{5}C + 32 \tag{1.2} \\
F - 32 &= \frac{9}{5}C \\
5(F - 32) &= 9C \\
\frac{5}{9}(F - 32) &= C \tag{1.3}
\end{align*}
The last equation simply says in symbols what we’ve already said in words:
“take a number, F , subtract thirty-two, and multiply by five-ninths to produce the result, C.” The equation, which gives C in terms of F , specifies
the inverse of the function little f , which gave us F in terms of C. We use
the notation \(f^{-1}\) (read “f inverse”) to denote the inverse of a function, and
we write
\[ C = f^{-1}(F) = \frac{5}{9}(F - 32). \tag{1.4} \]
Where f gave us the formula for converting from degrees Celsius to degrees
Fahrenheit, \(f^{-1}\) gives us the formula for converting degrees Fahrenheit back
to degrees Celsius.
This is how we find the inverse of a mathematical function in general.
We start with some equation, y = f (x), identical in form to Equation 1.2,
giving y in terms of x. Through a similar series of steps, which ends with x
isolated on one side of an equation, as C is in Equation 1.3, we arrive at an
expression for x in terms of y, of the form \(x = f^{-1}(y)\), as in Equation 1.4.
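
A short Python sketch makes the “run backwards” idea concrete: applying the inverse to the result of the function should return the original input. The function names f and f_inverse and the sample temperatures are illustrative assumptions.

```python
import math


def f(c):
    """Celsius to Fahrenheit: take nine-fifths of a number and add thirty-two."""
    return 9 * c / 5 + 32


def f_inverse(fahrenheit):
    """Fahrenheit to Celsius: subtract thirty-two, then multiply by five-ninths."""
    return 5 * (fahrenheit - 32) / 9


# Running the function and then its inverse should bring us back to the start.
for c in [-40, 0, 37, 100]:
    round_trip = f_inverse(f(c))
    print(c, f(c), round_trip)
```

The comparison in practice should allow for floating-point rounding, which is why a tolerance-based check such as math.isclose is the safer way to confirm the round trip.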
1.7 Composing functions
Often when we’re learning to perform a complex action, we break it down
into a series of simpler steps, taking one, pausing, and then taking the next.
After practicing the individual steps for a while, we combine them into one
seamless motion.
Composing mathematical functions is nothing more than taking the
steps specified by one function, then performing the steps specified by a
second function on the result. The first function might specify, “take a
number and square it,” and the second function, “take a number and add
three.” In symbols, we represent the first function
\[ h(x) = x^2 \]
and the second
\[ g(x) = x + 3. \]
Composing these two functions means applying one after the other: “Take
a number and square it. Then take the result and add three.” Or, after
removing the superfluous pauses: “Take a number, square it, and add three.”
And this, of course, is the specification for our old friend, \(f(x) = x^2 + 3\).
Note that the composition of two functions is itself a function.
We can arrive at this composed function more directly by a mathematical route. We simply express the composition, “apply h, then apply g,” in
symbols:
\[ \text{composition of } h \text{ and } g = g\bigl(h(x)\bigr). \]
Plugging \(h(x)\) into g, we get
\[ g\bigl(h(x)\bigr) = h(x) + 3. \]
But we know that \(h(x) = x^2\), so
\begin{align*}
g\bigl(h(x)\bigr) &= h(x) + 3 \tag{1.5} \\
&= x^2 + 3 \tag{1.6} \\
&= f(x)
\end{align*}
This is the general procedure for composing two functions. We plug the
first, “inner,” function into the second, “outer,” function, as in Equation 1.5.
Then we expand the inner function, as in Equation 1.6. The result is a
function, which here we’ve called f , that is the composition of the two
original ones.
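
The procedure can be sketched in Python by writing h and g as functions and building their composition. The helper name compose is a hypothetical convenience introduced here, not notation from the text.

```python
def h(x):
    """Inner function: take a number and square it."""
    return x**2


def g(x):
    """Outer function: take a number and add three."""
    return x + 3


def compose(outer, inner):
    """Return the function that applies `inner` first, then `outer`."""
    return lambda x: outer(inner(x))


# "Apply h, then apply g" reproduces f(x) = x^2 + 3.
f = compose(g, h)
print(f(2), f(-2))

# The order matters: applying g first and then h gives (x + 3)^2 instead.
print(compose(h, g)(2))
```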
Chapter 2
Some special functions
2.1 Concave and convex functions
The graph of a convex function looks like a sagging wire. On the portion of
a convex function shown by a graph, it “hangs down” between endpoints.
Figure 2.1: A convex function “hangs down.”
The graph of a concave function, by contrast, looks like an arcing rainbow. On the portion of a concave function shown by a graph, it “arches up”
between endpoints.
Figure 2.2: A concave function “arches up.”
Imagine now that you could cross over the rainbow from one end to
the other as if crossing a bridge. The most arduous part of your journey
is at the outset, where the way up is steepest and you progress more by
climbing than by walking. As you continue, the way up becomes more and
more gradual until, when you are very close to the top, it’s as if you were
walking on level ground. Your descent then begins, gradually at first, but
becoming steeper and steeper as you near the end.
This is the essential characteristic of a concave function: as you “walk”
along its graph from left to right, it goes “uphill” less and less steeply or,
alternatively, goes “downhill” more and more steeply. To put it another way,
as you walk from left to right, each successive step forward carries you less
distance upward – or carries you farther downward – than the one before.
Concave functions are everywhere in economics. Whenever you hear the
words, “diminishing marginal something-or-other,” you are dealing with
a concave function. The “marginal” in these kinds of expressions refers to
what’s happening at the leading edge – or margin – of your progress as you
trace your way left to right along the graph of the function. Does each step
forward at the margin carry you higher or lower than the one before? How
many units of vertical progress are you making for each one of horizontal
progress? The “diminishing” part of such expressions tells us that in fact,
with each step of horizontal progress, our vertical progress diminishes, in
other words, that we are traveling along the graph of a concave function.
It often happens that the longer we do a particular activity, the less
pleasure we get from doing it. When we’re hungry, the first bite of food is
heavenly, the second excellent, the third pretty good, but, as our hunger
is satisfied, we get less and less pleasure from each successive bite until,
after twenty-nine bites, past full, each additional bite grows more and more
distasteful. Far from giving pleasure, at this point eating is taking pleasure
away.
Figure 2.3: Diminishing marginal utility. (Vertical axis: pleasure, or utility; horizontal axis: bites.)
Economists call pleasure “utility.” To express the fact that the first bite
gives the most pleasure and that each one after that brings less than the one
before, economists say that marginal utility is diminishing. Moving from left
to right along the graph of the utility function in Figure 2.3, with each step
of horizontal progress (a bite), vertical progress (pleasure) diminishes. In
other words, the utility function is concave, as we could already tell from
the shape of its graph.
In Chapter 4, which concerns derivatives, we’ll see a more technical and
mathematically usable definition of “concave.” For the moment, it will be
instructive to translate our verbal and graphical description of concavity
into mathematical symbols. In fact, there’s almost no skill more worth acquiring than an ability to move easily back and forth between pictorial and
symbolic representations, between geometry and algebra.
Consider the example above. The utility derived from eating is a function
of the number of bites taken, a fact we state in symbols
\[ U = f(b). \]
How, then, do we translate the notion of diminishing marginal utility into
symbols? Suppose that so far we’ve taken m bites, in other words, that m
is the leading edge, the margin, of our eating progress. Our next bite will
be the (m + 1)th, and the bite before the current one was the (m − 1)th.
The gain in utility from taking the next bite will be
\[ U_{\text{next}} - U_{\text{current}} = f(m + 1) - f(m), \]
just as the gain in utility from having taken the current bite was
\[ U_{\text{current}} - U_{\text{last}} = f(m) - f(m - 1). \]
To say that marginal utility is diminishing is to say that we’ll gain less
utility from taking our next bite than we gained from taking our last one,
a fact we state in symbols
\[ U_{\text{next}} - U_{\text{current}} < U_{\text{current}} - U_{\text{last}} \]
or, substituting for the U’s,
\[ f(m + 1) - f(m) < f(m) - f(m - 1). \tag{2.1} \]
Remembering that f (m) is the height of the graph of f above the x-axis at
m (see Figure 1.4), we can read Inequality 2.1 as saying the vertical progress,
f (m + 1) − f (m), made with the horizontal step from m to m + 1 is less
than the vertical progress, f (m) − f (m − 1), made with the horizontal step
from m − 1 to m. And this is simply a translation from symbols back to our
original, verbal description of concavity. We have only been switching back
and forth between different representations – verbal, symbolic, graphical –
of the same underlying idea. Learn to do this with ease. There’s no skill
more valuable.
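
Inequality 2.1 can also be checked numerically. The sketch below assumes a made-up utility function, \(f(b) = 58b - b^2\), chosen only because it is concave and peaks at 29 bites, echoing the eating example; the text itself gives no formula for utility, and the helper names are hypothetical.

```python
def f(b):
    """Hypothetical utility from b bites; an assumption made for illustration."""
    return 58 * b - b**2


def marginal_gain(func, m):
    """Gain from the next step: func(m + 1) - func(m)."""
    return func(m + 1) - func(m)


def diminishing_at(func, m):
    """Check Inequality 2.1 at the margin m."""
    return func(m + 1) - func(m) < func(m) - func(m - 1)


# The inequality holds at every margin we test, so marginal utility diminishes.
print(all(diminishing_at(f, m) for m in range(1, 40)))

# The 29th bite still adds a little utility; the 30th takes some away.
print(marginal_gain(f, 28), marginal_gain(f, 29))
```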
2.2 Exponents
Somewhere on a remote island in the Pacific, a single breeding pair of rabbits is released. Each spring, they give birth to another breeding pair. In
fact, every breeding pair gives birth to another breeding pair each spring.
Initially, there are only the original two rabbits on the island. After the
first spring, there are four: the two original rabbits and their two offspring.
Each of these pairs then breeds, producing two more pairs of rabbits the
following spring, for a total of eight rabbits – or four breeding pairs.
It’s easy to see that the rabbit population is doubling every year.
Years after release    Pairs of rabbits
0                      1
1                      2
2                      4
3                      8
...                    ...
To figure out how many pairs of rabbits there will be at the end of the
current spring, we simply double the number of the pairs there were after
the last spring:
\[ P_{\text{current}} = 2 \cdot P_{\text{last}}. \tag{2.2} \]
But the number of pairs after last spring was equal to twice the number of
pairs from the spring before that:
\[ P_{\text{last}} = 2 \cdot P_{\text{year before last}}. \]
We can then substitute this expression for \(P_{\text{last}}\) in Equation 2.2 to obtain
\[ P_{\text{current}} = 2 \cdot \underbrace{(2 \cdot P_{\text{year before last}})}_{P_{\text{last}}}. \]
It’s not hard to see now that we can continue this expansion...
\[ P_{\text{current}} = 2 \cdot 2 \cdot 2 \cdot P_{\text{three years ago}} \]
...until we work our way back to year 0, when there was only the one original
breeding pair. That is, \(P_0 = 1\). At this point, we’ll have a long string of 2’s
(multiplied, finally, and to no effect, by the original 1):
\[ P_{\text{current}} = \underbrace{2 \cdot 2 \cdot \ldots \cdot 2}_{n \text{ times}} \cdot 1 \]
where n is the number of springs that have passed since the original pair
was released.
It’s tiresome to have to write \(2 \cdot 2 \cdot 2 \cdot 2 \cdot \ldots\) for more than a couple of
2’s, so we use an abbreviation, \(2^n\). Read “two to the n” or “two to the nth”
or “two to the power of n,” \(2^n\) is simply shorthand for “multiply n copies of 2
together.” We call 2 the base and n the exponent. The number of pairs of
rabbits is a function of the number of years that have passed:
\[ P = f(n) = 2^n. \]
That is, the number of rabbit pairs on the island in year n is 2 to the power
of n. We call this kind of a function, where the variable is in the exponent
and the base is fixed, an exponential function.
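
The doubling argument can be sketched as a loop in Python and compared against the closed form. The function name pairs_after is a hypothetical label for this illustration.

```python
def pairs_after(n):
    """Number of rabbit pairs n springs after release, by repeated doubling."""
    pairs = 1  # year 0: the one original breeding pair
    for _ in range(n):
        pairs = 2 * pairs  # each spring, every pair produces another pair
    return pairs


# The first few values match the table of years and pairs.
print([pairs_after(n) for n in range(4)])

# Repeated doubling agrees with the abbreviation "two to the n."
print(all(pairs_after(n) == 2**n for n in range(20)))
```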
Let’s consider some examples. \(3^4\) – four 3’s multiplied together – is 81, and
\(5^1\) – a single 5 – is just 5. Exponents can even be negative. We
interpret \(b^{-n}\) as \(\frac{1}{b^n}\). For instance, \(2^{-1} = \frac{1}{2}\). Any nonzero number to the 0th power is
1. If m copies of 2 are multiplied together, and the result is then multiplied
by another n copies of 2, a total of m + n copies of 2 will have been multiplied
together. In symbols, \(2^m \cdot 2^n = 2^{m+n}\).
All these observations and conventions, along with several others, are
summarized in the following rules of exponents:
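Table 2.1: Rules of Exponents

Rule                                        Example
\(b^0 = 1\)                                 \(4^0 = 1\)
\(b^1 = b\)                                 \(4^1 = 4\)
\(b^{-m} = \frac{1}{b^m}\)                  \(4^{-2} = \frac{1}{4^2} = \frac{1}{16}\)
\(b^{m/n} = \sqrt[n]{b^m}\)                 \(4^{3/2} = \sqrt{4^3} = \sqrt{64} = 8\)
\(b^m b^n = b^{m+n}\)                       \(4^2 \, 4^1 = (4 \cdot 4) \cdot 4 = 4^3 = 64\)
\(\frac{b^m}{b^n} = b^{m-n}\)               \(\frac{4^3}{4^2} = \frac{4 \cdot 4 \cdot 4}{4 \cdot 4} = 4^1 = 4\)
\((b^m)^n = b^{mn}\)                        \((4^2)^3 = (4 \cdot 4)(4 \cdot 4)(4 \cdot 4) = 4^6 = 4096\)
\(b^m c^m = (bc)^m\)                        \(4^2 \, 3^2 = (4 \cdot 4)(3 \cdot 3) = 12^2 = 144\)
\(\frac{b^m}{c^m} = (\frac{b}{c})^m\)       \(\frac{4^3}{2^3} = \frac{4 \cdot 4 \cdot 4}{2 \cdot 2 \cdot 2} = (\frac{4}{2})^3 = 2^3 = 8\)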
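
Each rule in Table 2.1 can be spot-checked for particular numbers. The sketch below is an illustration under stated assumptions: the sample values b = 4, c = 3, m = 2, n = 3 are arbitrary, exact fractions are used so that negative exponents and quotients are not blurred by rounding, and passing these checks is evidence for the rules, not a proof of them.

```python
from fractions import Fraction
import math

# Arbitrary sample values, chosen only for illustration.
b, c, m, n = 4, 3, 2, 3

checks = {
    "b^0 = 1": b**0 == 1,
    "b^1 = b": b**1 == b,
    "b^-m = 1/b^m": Fraction(b) ** -m == Fraction(1, b**m),
    # Fractional exponents produce floats, so compare with a tolerance.
    "b^(3/2) = sqrt(b^3)": math.isclose(b ** (3 / 2), math.sqrt(b**3)),
    "b^m b^n = b^(m+n)": b**m * b**n == b ** (m + n),
    "b^m / b^n = b^(m-n)": Fraction(b**m, b**n) == Fraction(b) ** (m - n),
    "(b^m)^n = b^(mn)": (b**m) ** n == b ** (m * n),
    "b^m c^m = (bc)^m": b**m * c**m == (b * c) ** m,
    "b^m / c^m = (b/c)^m": Fraction(b**m, c**m) == Fraction(b, c) ** m,
}

for rule, holds in checks.items():
    print(rule, holds)
```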
2.3 Exponential and logarithmic functions
We’ve already seen an example of an exponential function in the previous
section, one that gives us an expression for the growth of a population of
rabbits. Any process of growth in which the size of something in one period
is a multiple of its size in the previous period is described by an exponential
function. These kinds of growth processes are everywhere. An economy
is said to grow by such-and-such a percentage over its size the previous
year. A savings account grows by interest compounded daily. The common
characteristic of all these processes is that they “feed on themselves,” the
output of one stage becoming the input of the next.
For a practical application, let’s look at stock prices. How, for example,
could we figure out the current price of a share of Microsoft stock if we are
told the price has grown 12% a year since 1990, when a share cost $5? In
1991, the stock is worth 12% more than it was the year before:
\[ P_{1991} = P_{1990} + 0.12 P_{1990} = 1.12 P_{1990}. \]
The next year, 1992, the stock will again rise by 12% over the previous
year’s price:
\begin{align*}
P_{1992} &= P_{1991} + 0.12 P_{1991} \\
&= 1.12 P_{1991} \\
&= 1.12 \cdot \underbrace{(1.12 P_{1990})}_{P_{1991}} \\
&= 1.12^2 P_{1990}.
\end{align*}
It’s not hard to see that by 2005, the price of a share of Microsoft stock will
be
\[ P_{2005} = 1.12^{15} P_{1990}. \tag{2.3} \]
But we were given the price of a share in 1990: $5. Thus the price in 2005
is
\[ P_{2005} = 1.12^{15} \cdot 5 \approx 5.47 \cdot 5 = 27.35. \]
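
The same calculation can be sketched in Python. The function name price_after and its default arguments are assumptions made for illustration, taken from the figures in the example ($5 in 1990, 12% annual growth). Carrying full precision gives roughly $27.37; the $27.35 above comes from rounding \(1.12^{15}\) to 5.47 before multiplying.

```python
def price_after(years, start_price=5.0, growth=0.12):
    """Price after compounding `growth` once a year for `years` years."""
    return start_price * (1 + growth) ** years


# Fifteen years of 12% growth, from 1990 to 2005.
p_2005 = price_after(2005 - 1990)
print(round(p_2005, 2))

# Year-by-year multiplication gives the same figure as the exponent form.
price = 5.0
for _ in range(15):
    price = 1.12 * price
print(round(price, 2))
```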
Suppose now that instead of being asked to find the price of a share
in 2005, we had been asked how many years it would take, starting at $5
in 1990, for the price of a share to reach $50, assuming that annual growth
continues indefinitely at 12%. Not knowing how many years this will be,
we call the length of time x. The price of a share x years after 1990 is, by
assumption, $50, but we know that this must equal \(1.12^x P_{1990}\). That is,
\begin{align*}
1.12^x P_{1990} &= 50 \\
1.12^x \cdot 5 &= 50 \\
1.12^x &= 10 \tag{2.4}
\end{align*}
But now we are at an impasse. We don’t have any way at the moment of
getting the x down from its perch in the position of exponent, so that we
could reduce Equation 2.4 to an expression of the form x = . . . telling us
exactly what x was.
Let’s set this problem aside for a moment and return to the one in
the previous section involving rabbits. There, we came up with a function,
\(P = f(n) = 2^{n}\), that told us how many pairs of rabbits there were after n
years. We might, though, be interested in the opposite question. Namely,
how many years will it take for there to be P pairs of rabbits? How many
years, for instance, will it take for there to be 64 pairs of rabbits? Given
that after n years there are \(2^{n}\) pairs, when there are 64 pairs, \(2^{n} = 64\). But
we already happen to know that 64 is \(2^{6}\), that is, 2 multiplied by itself six
times. We can see then that n must be 6. It will take six years for there to
be 64 pairs of rabbits on the island.
Thinking back for a moment to Section 1.6 on inverses, we can describe
what we have just done in working backwards from the number of rabbits to
the number of years – where before f had taken us from years to rabbits –
as an application of the inverse function, \(f^{-1}\). Where f gave us rabbit pairs,
P, in terms of years, n, by the formula \(P = 2^{n}\), the inverse of f will give us
an expression, \(n = f^{-1}(P)\), for n in terms of P. It tells us the number of
years it takes for the rabbit population to reach a size of P pairs. Where f
answered the question, “what is 2 raised to the nth power?,” \(f^{-1}\) answers
the question, “given a number, to what power must 2 be raised to get that
number?”
Remember that f, with its variable in the exponent, is called an exponential function. The inverses of exponential functions are important enough in
their own right to merit a special name. They are called logarithmic functions. The function \(f(n) = 2^{n}\) is an exponential function with a base of 2.
Its inverse, \(f^{-1}\), will be a logarithmic function with the same base. We use
a special abbreviation for the logarithmic function with a base of 2, writing
\[ f^{-1}(P) = \log_2(P). \]
The righthand side of this equation is read, “log base two of P ” and is
simply shorthand for, “start with a number, P , and return the exponent to
which we must raise 2 to get P .” Compare this description of the inverse
with a description of the original function, “start with a number, n, and
return the result of raising 2 to this number.”
There’s no reason we need to stick to a base of 2. We could, for instance,
take logarithms base 5:
\[
\begin{aligned}
\log_5 1 &= 0 \\
\log_5 5 &= 1 \\
\log_5 25 &= 2 \\
\log_5 125 &= 3 \\
&\;\;\vdots
\end{aligned}
\]
Perhaps the most intuitive way to think of an expression like “\(\log_b x\)” is to
read it as, “to what power must we raise b to get x?”
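The reading “to what power must we raise b to get x?” can be turned directly into a small search. The sketch below is illustrative only: `log_by_search` is a made-up helper that works just for whole-number powers, and it is compared against the standard library call `math.log(x, base)`.

```python
import math


def log_by_search(base, x):
    """Return the whole number n with base**n == x, by repeated multiplication.

    Works only when base > 1 and x is an exact whole-number power of base.
    """
    if base <= 1:
        raise ValueError("base must be greater than 1")
    n, value = 0, 1            # base**0 is 1
    while value < x:
        value *= base          # raise the power by one
        n += 1
    if value != x:
        raise ValueError("x is not a whole-number power of base")
    return n


for x in (1, 5, 25, 125):
    # The library result is a float and may differ from the whole number
    # in its last digits because of rounding.
    print(x, log_by_search(5, x), math.log(x, 5))

print(log_by_search(2, 64))    # 6, the rabbit example
```

The search is the definition made literal; a calculator or `math.log` handles the general case where the answer is not a whole number.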
Now we can revisit the Microsoft share price problem we set aside when
we reached the impasse of Equation 2.4. If we apply a logarithm base 1.12
to each side of this equation, we’ll almost be done.
\[
\begin{aligned}
1.12^{x} &= 10 \\
\log_{1.12} 1.12^{x} &= \log_{1.12} 10
\end{aligned}
\]
Simplifying the lefthand side of this last equation is straightforward: we can read
it as, “to what power must we raise 1.12 to get \(1.12^{x}\)?” Well, x, since 1.12
raised to the power of x is \(1.12^{x}\). That is, the lefthand side of the equation
simplifies to x:
\[ \log_{1.12} 1.12^{x} = x = \log_{1.12} 10. \]
But now we’re at another impasse. We don’t know how to compute \(\log_{1.12} 10\).
How do you figure out what power to raise 1.12 to in order to get 10?
We have to set the problem aside again to make another useful digression. We saw that logarithms can be taken using any base: 2, 5, whatever.
It turns out that there’s a “natural” choice of base, although its value, an
infinite decimal beginning 2.718281..., at first makes it seem anything but
natural. The reasons that make this base natural are deep but arcane. Suffice it to say that one of the consequences of its naturalness is the ease with
which logarithmic and exponential functions that use it as a base can be
computed, although this, too, is not at all obvious. In any case, a calculator
will do it for you.
This number, 2.718281..., is so fundamental, cropping up in so many different places, that it’s designated by its own letter, e. The function \(\exp(x) = e^{x}\) is called the exponential function. The inverse, a logarithmic function,
is called the natural logarithm and denoted “ln,” as in \(\exp^{-1}(x) = \ln x\). In
other words, \(\ln x = \log_e x\). Just as there are rules of exponents, presented
in the previous section, there are rules of logarithms. Compare these, laid
out in Table 2.2, to the rules of exponents in Table 2.1.
Table 2.2: Rules of Logarithms
Rule                                          Example
\(\log_b (xy) = \log_b x + \log_b y\)         \(\log_2 (4 \cdot 16) = \log_2 4 + \log_2 16 = 2 + 4 = 6\)
\(\log_b \frac{x}{y} = \log_b x - \log_b y\)  \(\log_5 \frac{125}{5} = \log_5 125 - \log_5 5 = 3 - 1 = 2\)
\(\log_b x^{m} = m \log_b x\)                 \(\log_3 9^{4} = 4 \log_3 9 = 4 \cdot 2 = 8\)
\(\log_b b^{x} = x\)                          \(\ln e^{x} = x\)
\(b^{\log_b x} = x\)                          \(e^{\ln 1776} = 1776\)
Finally we’re in a position to finish off the share price problem we’ve
twice set aside. Starting again from Equation 2.4, now instead of taking the
logarithm base 1.12, we’ll take it base e:
\[ \ln 1.12^{x} = \ln 10. \]
A rule in Table 2.2 tells us how to simplify the lefthand side of this equation:
\[ \ln 1.12^{x} = x \ln 1.12. \]
The complete calculation now looks like this:
\[
\begin{aligned}
1.12^{x}\,P_{1990} &= 50 \\
1.12^{x} \cdot 5 &= 50 \\
1.12^{x} &= 10 \\
\ln 1.12^{x} &= \ln 10 \\
x \ln 1.12 &= \ln 10 \\
x &= \frac{\ln 10}{\ln 1.12}
\end{aligned}
\]
At this point, we just plug numbers into our calculator and find that
\[ x = \frac{\ln 10}{\ln 1.12} \approx 20.32. \]
In other words, if the price of a share of Microsoft stock grows at 12% a year
starting in 1990, it will reach $50 in about 20 years, 4 months, sometime
in the first half of 2010. We check that our work is correct by calculating
\(1.12^{20.32} \cdot P_{1990}\), the formula for the price of a share after 20.32 years. In fact,
\(1.12^{20.32} \cdot P_{1990} = 1.12^{20.32} \cdot 5 \approx 50\), as expected.
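The calculator step can also be done in Python, whose `math.log` is the natural logarithm when called with one argument. The variable names below are our own choices for this sketch.

```python
import math

start_price = 5.0      # price of a share in 1990
target_price = 50.0    # price we want to reach
growth_factor = 1.12   # 12% growth per year

# Solve growth_factor**x * start_price == target_price for x:
#     x = ln(target_price / start_price) / ln(growth_factor)
x = math.log(target_price / start_price) / math.log(growth_factor)

years = int(x)                  # whole years
months = (x - years) * 12       # leftover fraction of a year, in months

print(round(x, 2))                                  # 20.32
print(years, round(months, 1))                      # 20 3.8
print(round(growth_factor ** x * start_price, 2))   # 50.0, the check
```

The last line repeats the check from the text, plugging the unrounded x back into the growth formula.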
Chapter 3
Linear equations
3.1 Linear functions
The two simplest things that can be done to a number are multiplying it
by another number and adding another number to it. Any series of multiplications and additions can be consolidated into a single multiplication
and a single addition: multiplying in three steps, first by 2, then by 5, then
by 3, is the same as multiplying once by 30. After consolidating, then, any
transformation of one number into another by multiplications and additions
reduces to a multiplication by a single number, call it m, followed by the
addition of a single number, call it b. Such a transformation, which maps
one number to another, is a function, represented in symbols
f (x) = mx + b.
Functions of this form are called linear. They turn up everywhere, and
much of mathematics is devoted to reducing intractable questions about
more complicated functions to simpler questions about linear functions.
This is the case, as we will see in the next chapter, with derivatives.
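To see the consolidation of multiplications and additions at work, here is a short Python sketch. The function name `consolidate` and the `("mul", k)` / `("add", k)` step format are inventions for this illustration, not notation from the text.

```python
def consolidate(steps):
    """Collapse a list of steps into one multiplication and one addition.

    Each step is ("mul", k) or ("add", k). Returns (m, b) such that
    applying all the steps to x, in order, gives m * x + b.
    """
    m, b = 1, 0                      # start from the do-nothing function 1*x + 0
    for kind, k in steps:
        if kind == "mul":
            m, b = m * k, b * k      # k * (m*x + b) = (k*m)*x + (k*b)
        elif kind == "add":
            b = b + k                # (m*x + b) + k = m*x + (b + k)
        else:
            raise ValueError("unknown step: " + repr(kind))
    return m, b


print(consolidate([("mul", 2), ("mul", 5), ("mul", 3)]))    # (30, 0)
print(consolidate([("add", 4), ("mul", 2), ("add", -1)]))   # (2, 7)
```

The first call reproduces the example of multiplying by 2, then 5, then 3. The second shows that mixing additions in still leaves a single m and a single b.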
For now, consider some examples of linear functions. A bookie takes
a variety of bets. On a wager with three-to-one odds, for instance, he pays
winning bets three times what was originally staked, minus a standard $10
commission. If s represents the original stake, then the money cleared by a
winning gambler is given by the linear function,
W = f (s) = 3s − 10.
Here, m is 3 and b is −10.
Another example comes from cooking. A recipe, as written, serves four.
You, on the other hand, only need to serve two, so you halve the quantity
of each ingredient. If two cups of flour are called for, you use one. If \(q_0\) is
the original quantity needed of some ingredient, then the new quantity, \(q_1\),
is given by the linear function,
\[ q_1 = f(q_0) = \tfrac{1}{2}\,q_0. \]
Here, m is \(\tfrac{1}{2}\) and b is 0.
Linear functions get their name from their graphs, which are straight
lines. Figure 3.1, for instance, shows the graph of the function, \(y = f(x) = \tfrac{1}{2}x - 1\). And Figure 3.2 shows the graph of \(y = f(x) = -3x + 2\). There are
three things to notice about these graphs:
• The larger m is in absolute value (\(\tfrac{1}{2}\) in Figure 3.1, 3 in Figure 3.2, where m is −3), the “steeper” the
line.
• If m is positive, as in Figure 3.1, the line slopes up to the right. If m
is negative, as in Figure 3.2, the line slopes down to the right.
• The line crosses the y-axis at b.
Two parameters, m and b, are enough to describe a linear function. In
graphical terms, this amounts to saying that a line is completely characterized by its steepness and by where it crosses the y-axis. Both of these traits
have a special name. The steepness of a line is called its slope. The place
where it crosses the y-axis is called its y-intercept. In Figure 3.1, the slope
of the line is \(\tfrac{1}{2}\) and its y-intercept is −1. In Figure 3.2, the slope is −3 and
the y-intercept 2.
Figure 3.1: The graph of \(y = \tfrac{1}{2}x - 1\).
Figure 3.2: The graph of \(y = -3x + 2\).
Recall our discussion of concavity in Section 2.1. There, we characterized
a concave function by imagining what it would be like to walk along its
graph from left to right: an increasingly gradual climb, leveling out towards
the top, followed by an increasingly steep descent. We can perform the same
imaginative exercise on lines. Walking along the graph of a linear function,
that is, along a straight line, requires the same amount of effort wherever
we are on it. It’s just as steep where we finish as where we start – and at all
points in between – like an endless ramp. The steepness of a line, its slope,
is the same everywhere. Constancy of slope is the essence of a line.
What is the translation of this last, somewhat mystical statement into
mathematical terms? Consider any two distinct points on a line with slope
m, call them \(P_0 = (x_0, y_0)\) and \(P_1 = (x_1, y_1)\). To say that a line has the
same steepness everywhere is the same as saying, as we imagine ourselves
walking along it, that we make the same vertical progress with every step of
horizontal progress. Every step forward carries us the same fixed distance
up or down.
[Figure: a line through \((x_0, y_0)\) and \((x_1, y_1)\), with one “step forward” and the corresponding “distance up” marked.]
Constancy of slope means that every bit of horizontal progress must be
matched by a proportionate amount of vertical progress, no matter where
on the line we’re walking. For instance, if one step forward carries us three
steps upward, then two steps forward must carry us six steps upward. The
ratio of vertical steps to horizontal steps is fixed. Between \(P_0\) and \(P_1\), there
are \(x_1 - x_0\) horizontal steps and \(y_1 - y_0\) vertical steps, thus the ratio of
\(y_1 - y_0\) to \(x_1 - x_0\) is fixed and is equal to the slope of the line, no matter
which points we’ve chosen for \(P_0\) and \(P_1\):
\[ \frac{y_1 - y_0}{x_1 - x_0} = m \tag{3.1} \]
3.2 Solving simultaneous equations
At the end of Section 1.4, we discussed how a graph can be identified with
the set of points in the plane that it passes through, just as a road on a
map can be identified with the set of grid coordinates it passes through.
Graphs, like roads, can intersect each other. If the index of a map lists the
coordinates D-3, D-4, D-5 for Massachusetts Avenue and the coordinates
C-4, D-4, E-4 for Florida Avenue, we can tell without looking at the map
itself that these two roads intersect in square D-4.
The graph of a function f is the set of points in the plane whose x- and y-coordinates satisfy the relationship y = f (x). The graph of a second
function, g, is the set of points whose coordinates satisfy y = g(x). How
can we tell where these two graphs intersect? In other words, how can we
tell which points are in both the set of coordinates identified with f and
the one identified with g?
In the case of roads, we simply looked up which coordinate the index
listed for both Massachusetts Avenue and Florida Avenue. This procedure is
intuitive enough, but it’s worth examining in detail. What’s actually going
on when we scan the index is a matching up of letter parts of the grid
coordinates between those listed for Massachusetts Avenue and those listed
for Florida Avenue. Whenever we encounter two coordinates, one on each
list, with matching letter parts, we check whether the number parts match
as well. If both the letter and the number parts match, we’re done: we’ve
found the place the roads intersect.
Functions are a little trickier, since the list of coordinates they pass
through is more than likely infinite, but the procedure for finding the intersection of their graphs is essentially the same. Instead of matching up the
letter parts of coordinates, we match up y-coordinates. Once y-coordinates
are matched, we determine which x-coordinates match, just as in the case
of roads, once we had matched letter parts (the map’s “y-coordinates”), we
matched number parts (the map’s “x-coordinates”).
The y-coordinates of points on the graph of f are given by y = f (x)
and for g by y = g(x). Matching up y-coordinates means setting the two
functions equal, since
f (x) = y = g(x).
But this equation,
f (x) = g(x)
is one that will have only x’s on the left and only x’s on the right – it
will be an ordinary equation in one variable and with any luck, one not
too hard to solve. Solving it is effectively “matching up” x-coordinates, and
once solved, we’ll know all the x-coordinates of the points where the graphs
of f and g intersect. To find the y-coordinates, we simply plug these x’s
back into either the equation y = f (x) or the equation y = g(x), whichever
seems easier to compute.
Now a concrete example. Suppose we have two linear functions,
f (x) = 2x + 1
and
g(x) = 3x − 1.
The graphs of these two functions are lines, which intersect at exactly one
point.
Figure 3.3: The intersection of y = 2x + 1 and y = 3x − 1. (The marked point of intersection is (2, 5).)
To find the x-coordinate of this point, we set these two functions equal:
f (x) = g(x)
2x + 1 = 3x − 1
Here is our equation with only x’s on both its right and its left sides, and
it is not hard to solve:
\[
\begin{aligned}
2x + 1 &= 3x - 1 \\
1 &= x - 1 \\
2 &= x
\end{aligned}
\]
In the final step, we plug this x-value back into one of the original
functions. If we choose f , our computation looks like this:
\[
\begin{aligned}
y &= f(2) \\
&= 2 \cdot 2 + 1 \\
&= 5
\end{aligned}
\]
If instead we choose g, our computation runs thus:
\[
\begin{aligned}
y &= g(2) \\
&= 3 \cdot 2 - 1 \\
&= 5
\end{aligned}
\]
Either computation results in the same y-value, which makes sense, since
each computation started from the assumption that the y-coordinates match.
The x- and y-coordinates of the point P = (2, 5) simultaneously satisfy the
relationship y = f (x) and the relationship y = g(x). That is, P is simultaneously on the graph of f and the graph of g – it is the point where they
intersect.
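The procedure of setting the two functions equal, solving for x, and plugging back in can be sketched in Python for any pair of lines. The function name `intersect` and its parameters are hypothetical, chosen for this example. Unlike the pair in the text, two lines with the same slope do not meet at exactly one point, so the sketch treats that case as an error.

```python
def intersect(m1, b1, m2, b2):
    """Intersection of the lines y = m1*x + b1 and y = m2*x + b2."""
    if m1 == m2:
        raise ValueError("lines with equal slopes do not cross at a single point")
    # Setting m1*x + b1 = m2*x + b2 and solving for x:
    x = (b2 - b1) / (m1 - m2)
    # Plug x back into either function; here we use the first.
    y = m1 * x + b1
    return x, y


print(intersect(2, 1, 3, -1))   # (2.0, 5.0)
```

Plugging x into the second function instead would give the same y, for the reason given in the text: the x was found by assuming the two y-values match.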
Often, the graphs of lines are not given as functions y = f (x) and
y = g(x), but as a system of linear equations. A system of two linear
equations is just a pair of equations in the form
ax + by = s
cx + dy = t.
In the example above, functions f and g define two linear equations:
y = 2x + 1
y = 3x − 1.
These can be rewritten in our new form as
−2x + y = 1
−3x + y = −1.
In this case, a = −2, b = 1, and s = 1; c = −3, d = 1, and t = −1.
It’s only a little less convenient to find where two lines intersect when
they are described by a pair of linear equations than when they are described
by a pair of linear functions. We simply transform the given pair of equations
into a more convenient pair of functions and proceed as before, by setting the
functions equal. For instance, we might be given the system of equations,
−10x + 2y = 6
9x + 3y = −15.
We solve each equation above for y:
y = f (x) = 5x + 3
y = g(x) = −3x − 5.
And now we set y’s equal, as before:
5x + 3 = −3x − 5.
Solving, we find x = −1. Finally, we plug this value of x back into one of
the equations above to compute y:
y = 5 · (−1) + 3
= −2.
If we want to check our work, we can calculate y using the other equation
and see whether we get the same result:
y = (−3) · (−1) − 5
= −2.
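The same steps work for any system written as ax + by = s, cx + dy = t, provided each equation can be solved for y. A minimal Python sketch follows; `solve_system` is a made-up name, and the sketch assumes that b and d are both nonzero and that the two lines are not parallel.

```python
def solve_system(a, b, s, c, d, t):
    """Solve  a*x + b*y = s,  c*x + d*y = t  by the method in the text.

    Assumes b != 0 and d != 0, so each equation can be solved for y,
    and that the resulting lines have different slopes.
    """
    # Solve each equation for y:  y = (-a/b) x + s/b  and  y = (-c/d) x + t/d
    m1, k1 = -a / b, s / b
    m2, k2 = -c / d, t / d
    if m1 == m2:
        raise ValueError("the two lines are parallel")
    # Set the two expressions for y equal and solve for x.
    x = (k2 - k1) / (m1 - m2)
    # Plug x back into the first equation to get y.
    y = m1 * x + k1
    return x, y


print(solve_system(-10, 2, 6, 9, 3, -15))   # (-1.0, -2.0)
print(solve_system(-2, 1, 1, -3, 1, -1))    # (2.0, 5.0), the earlier example
```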
Chapter 4
Differentiation
4.1 What is a derivative?
Figure 4.1: As x moves through the domain, y follows along through the range.
Our first representation of a function, at the beginning of Chapter 1, was
a pictorial one. Figure 1.1 represents a function as an arrow mapping an
element of the domain, generically called x, to an element of the range,
generically called y. Which element of the range depends on which element
of the domain: choose x and f determines what y must be. When x rambles
around the domain, y gets dragged around the range, as if the arrow were
a leash tethering it to x. As x moves, y follows along.
The derivative is nothing more than the answer to the question, how
does y follow along as x moves? As x increases, does y increase with it? Or
does it decrease? Or does it increase at certain values of x and decrease at
others? And how sensitive is y to changes in x? Does a little tug on x cause
y to swing dramatically? Or is y a sluggard, barely stirring no matter how
wildly x moves around?
Speed is probably the example of a derivative where our intuitive grasp is
strongest. Suppose we travel by car from Washington to New York. Starting
from SAIS, after thirty minutes of driving we reach the Beltway. After an
hour and a half, we reach Baltimore; after three hours, Philadelphia; and so
on. Distance traveled is a function of time spent traveling, which we express
by
d = f (t).
Thus f (0:30) = 13 mi, f (1:30) = 40 mi, f (3:00) = 136 mi, and so on.
As time elapses, distance increases. Or, to put it another way, as t
marches forward, minute by minute, through the domain, d follows along,
mile by mile, through the range. Sometimes, as when traveling through city
traffic, d follows along with t slowly. At other times, as when traveling on
highways, d follows along with t quickly. In short, how d follows along with
t is just our speed at any given time. We say that speed is the derivative of
distance with respect to time.
Suppose now that we travel at a constant speed, say, 60 miles per hour.
Our distance function in this case will be particularly simple:
d = f (t) = 60t.
And for this particular function we already know, having started with it,
the answer to the question, how does d follow along with t? (At 60 miles
per hour.) But let’s approach it from another direction.
To begin with, f is a linear function whose graph is a line with a slope of
60. In Section 3.1, we developed the idea of a line’s defining property being
its constancy of slope, a property we tried to get a feel for by imagining
ourselves walking along the line. For every one step forward, we took the
same number of steps up or down. If one step forward carried us two steps
down, then two steps forward would have to carry us four steps down. Every
bit of horizontal progress is accompanied by the same amount of vertical
progress, in a fixed ratio of vertical to horizontal – 60 miles to 1 hour in the
present example.
In fact, “horizontal progress” in this case means specifically “the movement of time forward” and “vertical progress” specifically “the movement
of distance upward.” As t moves forward, d follows along upward. How does
d follow along? However much t changes, d changes 60 times more. To make
a long story short, the derivative of f is, as we already knew, just 60.
And what’s true of this particular linear function is true of all linear
functions. For any function of the form f (x) = mx + b, the derivative is just
m. Equation 3.1 confirms this. It states in symbols exactly that the ratio
of vertical progress to horizontal progress is fixed, at m, everywhere on the
line, the graph of f :
\[ \frac{\text{vertical progress}}{\text{horizontal progress}} = \frac{y_1 - y_0}{x_1 - x_0} = m. \]
In other words, the derivative of a linear function is the slope of the line
that is its graph. Slope – the steepness of a line – is precisely the measure
of how fast y rises or falls off as x moves forward.
To elaborate, recall the derivative is nothing more than the answer to
the question, how does y follow along as x moves? In the case of a linear
function, y is given by
y = f (x) = mx + b.
For instance, if f (x) = −2x + 5, then as x goes from 1 to 2 to 3 to 4, y goes
from 3 to 1 to −1 to −3. As x moves, y follows along, and the derivative,
−2, tells us how. The fact that it’s negative tells us that as x increases, y
decreases. The fact that it’s −2 tells us that as x increases, y decreases twice
as fast: a little change in x causes double the change in y. For example, as
x goes from 1 to 4, y goes from 3 to −3: x changes by 3, y by −6, and the
ratio of vertical progress to horizontal progress is given by
\[
\begin{aligned}
\frac{\text{vertical progress}}{\text{horizontal progress}} &= \frac{-3 - 3}{4 - 1} \\
&= -2 \\
&= m
\end{aligned}
\]
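The claim that this ratio is the same wherever we measure it is easy to test. In the Python sketch below, `slope_between` is a hypothetical helper name and the pairs of x-values are arbitrary.

```python
def f(x):
    return -2 * x + 5


def slope_between(f, x0, x1):
    """Ratio of vertical progress to horizontal progress between x0 and x1."""
    return (f(x1) - f(x0)) / (x1 - x0)


for x0, x1 in [(1, 4), (1, 2), (0, 10), (-3, 7)]:
    print(x0, x1, slope_between(f, x0, x1))   # the ratio is -2.0 every time
```

For a linear function the choice of points does not matter. For a curve it does, which is why more work is needed to find the derivative.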
Linear functions are the simplest kind of function, and their derivatives
are correspondingly simple. More complicated functions have more complicated derivatives. In the case of linear functions, we can simply read off the
derivative from the function as written. For more complicated functions,
finding the derivative becomes a two-step process: first, the complicated
function is approximated by a linear one; then, once the problem is reduced
to a question about linear functions, we simply read off the derivative as
before.
Suppose, for instance, we had not made the simplifying assumption
about our journey to New York that our speed was constant. Suppose instead that our speed varies, in other words, that the derivative (of distance
with respect to time) is not constant. Sometimes d follows along with t
slowly and at other times quickly, but as before, distance traveled is still a
function of time spent traveling:
d = f (t).
In our first definition of the distance function, when we assumed speed was
constant, f was linear. Now, when speed varies, f is a more complicated
kind of function. In the first definition, the graph of f was a line. Now its
graph is a curve.
Figure 4.2: Near cities, d follows along with t more slowly. (Axes: time t horizontal, distance d vertical, with the Beltway, Baltimore, Philadelphia, and New York marked.)
The trick to finding the derivative is to approximate the complicated
function f by a simpler linear function, the curving graph by a straight
line. How can we do this? As it happens, we’ve done it often before, though
perhaps not in a consciously mathematical context. Imagine, for instance,
watching the sun set over the ocean. Gazing out at the horizon, we see a
line, and we could be forgiven for thinking the earth was flat. But the earth
is round, after all, and its edge, the horizon, is not a line but a circle. It’s
just that a human is so small compared to the earth, that when we stare
out at the very, very short stretch of horizon our eyes can take in at sea
level, we hardly perceive it curving down at the far edges of our vision. In
short, if we blow up a curve big enough or, equivalently, shrink down the
observer small enough, a curve becomes indistinguishable from a line.
This, in essence, is how we find the derivative of a complicated function,
whose graph is a curve. We choose a point on the graph where we want to
find the derivative. We blow up the stretch of curve immediately to either
side of this point – ten-fold, a thousand-fold, a million-fold – until the stretch
in view is as small a proportion of the whole curve as the stretch of horizon
taken in by a human standing on the surface of the earth is of the earth’s
whole circumference. At this level of magnification, the curve looks like a
line. This line is the best approximation of the curve at the chosen point,
and the linear function the line represents is the best approximation of the
complicated function the curve represents.
If we zoom back out, the curve again looks like a curve, and the line only
approximates it well very close to the point where we took the derivative.
At that precise point, the line and the curve rest neatly against each other,
in the same way a coin balanced on its edge rests neatly against a tabletop.
We say that the line is tangent to the curve.
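The zooming-in idea can be imitated numerically. The sketch below uses a made-up curved function, f(t) = t², purely for illustration (the text gives no formula for the distance function), and measures the slope of the straight line through two points of the curve that sit closer and closer together. The helper name `secant_slope` is also our own.

```python
def f(t):
    # Hypothetical curved function, chosen only for illustration.
    return t ** 2


def secant_slope(f, t, h):
    """Slope of the straight line through the curve at t and at t + h."""
    return (f(t + h) - f(t)) / h


# Shrinking h plays the role of zooming in on the point t = 1.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(f, 1.0, h))
```

As h shrinks, the printed slopes settle toward 2, the slope of the tangent line to this particular curve at t = 1.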
Figure 4.3: The tangent line rests neatly against a curve. (The figure marks the point of tangency and a tangent line with slope = 65 mph.)
If we imagine an ant walking along the graph in the immediate neighborhood of the point of tangency, it would be very hard for the minute creature
to tell whether it was walking along the curve itself or along the line that
approximates it there, just as it would be hard for us to tell, from the deck
of a ship, that we were not sailing across a flat expanse of ocean but a curving one. Walking along a short stretch of curve near the point of tangency
is much the same as walking along the tangent line. Both paths feel identically steep in this tiny neighborhood. Whether we choose to walk along the
curve or the line, each bit of horizontal progress is accompanied by the
same amount of vertical progress.
As before, “horizontal progress” in this case means specifically “the
movement of time forward” and “vertical progress” specifically “the movement of distance upward.” To know how much vertical progress accompanies a bit of horizontal progress is to know how d follows along with t. But
our imaginary walk made clear that as t steps a little distance away from
the point of tangency, d follows along as if moving on the tangent line. For
the immediate neighborhood around the point of tangency, this answers the
question, how does d follow along as t moves? The question, in other words,
what is the derivative of f at the point of tangency? It is just the slope of
the tangent line.
4.2 Rules for taking derivatives
The previous section aimed at giving us a good intuitive grasp of what a
derivative is. This section, on the other hand, is concerned only with laying
out practical mathematical rules to mechanically find derivatives.
In the last section, we used the example of speed to motivate the idea
of a derivative. We wrote distance as a function of time and showed how
speed can be seen as the derivative of this function. At first we assumed that
speed was constant. Then we assumed that it varied, that it was different at
different times, in other words, that the derivative of the distance function
depended on time. In short, the derivative of the distance function, like
the distance function itself, is also a function of time. We use the “prime”
symbol, ′, to denote a derivative, so that if
\[ d = f(t) \]
then
\[ \text{speed} = \text{derivative of } f = f'(t). \]
\(f'\) is an altogether new function of t. When we assumed a constant speed
of 60 miles per hour, the distance function was given by
\[ f(t) = 60t. \]
In this case, the derivative, as we know, is just 60, no matter the time at
which we choose to find it, so
\[ f'(t) = 60. \tag{4.1} \]
When we assume speed varies, f is more complicated and so is \(f'\). The righthand side of Equation 4.1 will no longer be just a number but a complicated
expression involving t.
Taking a derivative, then, amounts to transforming one function, f , into
a second function, \(f'\), which depends on the same variable. The process of
transformation is called differentiation. When we take the derivative of a
function, we say that we are differentiating it. This section gives, without
justifying them, the rules by which differentiation can be done mechanically.
Table 4.1 summarizes these rules.
Before we dive in, a word on notation. While using the “prime” symbol
to denote a derivative is often convenient, in many circumstances other
symbols work better. To repeat the definition made many times already, a
function is a way of mapping an element of the domain, generically called x,
to an element of the range, generically called y, a state of affairs we describe
by
\[ y = f(x). \]
The derivative of f, as we saw in the previous section, is really an expression
of the relationship of y to x. To capture this fact, we use the notation \(\frac{dy}{dx}\)
to denote the derivative of y with respect to x, as speed, for example, is
the derivative of distance with respect to time. Loosely speaking, the dy
represents the change in y when x changes by dx. Putting them in a ratio
reminds us that the derivative is the expression which tells us how y changes
as x does. The following ways of denoting a derivative are all equivalent:
\[ f'(x) = y' = \frac{dy}{dx} = \frac{df(x)}{dx} = \frac{d}{dx} f(x). \]
Of these symbolic conventions, perhaps the last best captures the idea
that taking a derivative transforms one function into another. Removing
the “f(x)” part of the expression, we’re left with
\[ \frac{d}{dx}. \]
The \(\frac{d}{dx}\) is like an idling machine, sitting and waiting for a function to be
dropped into the slot, whereupon \(\frac{d}{dx}\) will transform it into a new function,
the derivative.
Constant functions
The derivative of a constant function is 0. If f(x) = c, for some constant c,
then \(f'(x) = 0\). For example, if f(x) = 29, then \(f'(x) = 0\).
Linear functions
The derivative of a linear function is the slope of the line that is its graph. If
the linear function is given by f(x) = mx + b, then its derivative is given by
\(f'(x) = m\). For example, if f(x) = 29x, then \(f'(x) = 29\). If f(x) = 29x − 13,
then \(f'(x) = 29\).
Power functions
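A power function is one in which the variable is raised to the power of a
fixed exponent, that is, a function of the form \(f(x) = x^{n}\). For a function of
this form, \(f'(x) = n x^{n-1}\). Some examples:
• If \(f(x) = x^{29}\), then \(f'(x) = 29x^{28}\).
• If \(f(x) = x^{-1} = \frac{1}{x}\), then
\[
\begin{aligned}
f'(x) &= (-1)\,x^{-1-1} \\
&= -x^{-2} \\
&= -\frac{1}{x^{2}}.
\end{aligned}
\]
• If \(f(x) = \sqrt{x} = x^{\frac{1}{2}}\), then
\[
\begin{aligned}
f'(x) &= \frac{1}{2}\,x^{\frac{1}{2}-1} \\
&= \frac{1}{2}\,x^{-\frac{1}{2}} \\
&= \frac{1}{2x^{\frac{1}{2}}} \\
&= \frac{1}{2\sqrt{x}}.
\end{aligned}
\]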
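Although this section gives the rules without justifying them, they can at least be spot-checked numerically. The Python sketch below compares each power-rule answer with a slope measured over a very short stretch of the graph. The helper `numerical_derivative`, the step size h, and the test point x = 1.5 are all assumptions of this illustration.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) by the slope between x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.5   # an arbitrary positive test point
checks = [
    ("x^29",    lambda x: x ** 29, lambda x: 29 * x ** 28),
    ("1/x",     lambda x: 1 / x,   lambda x: -1 / x ** 2),
    ("sqrt(x)", math.sqrt,         lambda x: 1 / (2 * math.sqrt(x))),
]

for name, f, rule in checks:
    # The two numbers on each line should agree to several digits.
    print(name, numerical_derivative(f, x), rule(x))
```

Agreement at one point is not a proof, but a mismatch would reveal a mistake in applying the rule.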
The exponential function
The exponential function is its own derivative. (Incidentally, this is the
reason e is “natural.”) If \(f(x) = e^{x}\), then \(f'(x) = e^{x}\). Alternatively, we can
write \(\frac{d}{dx} e^{x} = e^{x}\).
The natural logarithm
If \(f(x) = \ln x\), then \(f'(x) = \frac{1}{x}\). Alternatively, we can write \(\frac{d}{dx} \ln x = \frac{1}{x}\).
The sum of two functions
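The derivative of the sum of two functions is the sum of the derivatives of the
two individual functions. If \(f(x) = g(x) + h(x)\), then \(f'(x) = g'(x) + h'(x)\).
Alternatively, we can write \(\frac{d}{dx} f(x) = \frac{d}{dx}\left[g(x) + h(x)\right] = \frac{d}{dx} g(x) + \frac{d}{dx} h(x)\).
Some examples:
• If \(f(x) = x^{2} + x\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[x^{2} + x\right] \\
&= \frac{d}{dx} x^{2} + \frac{d}{dx} x \\
&= 2x + 1.
\end{aligned}
\]
• If \(f(x) = x^{3} + \ln x\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[x^{3} + \ln x\right] \\
&= \frac{d}{dx} x^{3} + \frac{d}{dx} \ln x \\
&= 3x^{2} + \frac{1}{x}
\end{aligned}
\]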
The product of two functions
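If \(f(x) = g(x) \cdot h(x)\), then \(f'(x) = g'(x) \cdot h(x) + h'(x) \cdot g(x)\). Some examples:
• If \(f(x) = x^{3} \ln x\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[x^{3} \ln x\right] \\
&= \frac{d}{dx}\left[x^{3}\right] \cdot \ln x + \frac{d}{dx}\left[\ln x\right] \cdot x^{3} \\
&= 3x^{2} \ln x + \frac{1}{x} \cdot x^{3} \\
&= 3x^{2} \ln x + x^{2}
\end{aligned}
\]
• If \(f(x) = \frac{e^{x}}{x^{2}}\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[\frac{e^{x}}{x^{2}}\right] \\
&= \frac{d}{dx}\left[e^{x} x^{-2}\right] \\
&= \frac{d}{dx}\left[e^{x}\right] \cdot x^{-2} + \frac{d}{dx}\left[x^{-2}\right] \cdot e^{x} \\
&= e^{x} \cdot x^{-2} + (-2)\,x^{-3} \cdot e^{x} \\
&= \frac{e^{x}}{x^{2}} - \frac{2e^{x}}{x^{3}}
\end{aligned}
\]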
The quotient of two functions
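If \(f(x) = \frac{g(x)}{h(x)}\), then \(f'(x) = \frac{g'(x) \cdot h(x) - h'(x) \cdot g(x)}{[h(x)]^{2}}\). An example:
• If \(f(x) = \frac{e^{x}}{x^{2}}\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[\frac{e^{x}}{x^{2}}\right] \\
&= \frac{\frac{d}{dx}\left[e^{x}\right] \cdot x^{2} - \frac{d}{dx}\left[x^{2}\right] \cdot e^{x}}{[x^{2}]^{2}} \\
&= \frac{e^{x} x^{2} - 2x e^{x}}{x^{4}} \\
&= \frac{e^{x}}{x^{2}} - \frac{2e^{x}}{x^{3}}
\end{aligned}
\]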
The composition of two functions: the chain rule
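If \(f(x) = g(h(x))\), then \(f'(x) = g'(h(x)) \cdot h'(x)\). This is a particularly
important rule of differentiation. Some examples:
• If \(f(x) = e^{x^{2}}\), and if we let \(h(x) = x^{2}\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\, e^{h(x)} \\
&= e^{h(x)} \cdot h'(x) \\
&= e^{x^{2}} \cdot \frac{d}{dx} x^{2} \\
&= 2x e^{x^{2}}
\end{aligned}
\]
• If \(f(x) = \ln(x^{3} + 3)\), and if we let \(h(x) = x^{3} + 3\), then
\[
\begin{aligned}
f'(x) &= \frac{d}{dx} f(x) \\
&= \frac{d}{dx}\left[\ln h(x)\right] \\
&= \frac{1}{h(x)} \cdot h'(x) \\
&= \frac{1}{x^{3} + 3} \cdot 3x^{2} \\
&= \frac{3x^{2}}{x^{3} + 3}
\end{aligned}
\]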
Table 4.1: Rules of Differentiation
Type of function            Function                            Derivative
Constant                    \(f(x) = c\)                        \(f'(x) = 0\)
Linear                      \(f(x) = mx + b\)                   \(f'(x) = m\)
Power                       \(f(x) = x^{n}\)                    \(f'(x) = n x^{n-1}\)
Exponential                 \(f(x) = e^{x}\)                    \(f'(x) = e^{x}\)
Logarithmic                 \(f(x) = \ln x\)                    \(f'(x) = \frac{1}{x}\)
Sum                         \(f(x) = g(x) + h(x)\)              \(f'(x) = g'(x) + h'(x)\)
Product                     \(f(x) = g(x) \cdot h(x)\)          \(f'(x) = g'(x) \cdot h(x) + h'(x) \cdot g(x)\)
Quotient                    \(f(x) = \frac{g(x)}{h(x)}\)        \(f'(x) = \frac{g'(x) \cdot h(x) - h'(x) \cdot g(x)}{[h(x)]^{2}}\)
Composition (Chain Rule)    \(f(x) = g(h(x))\)                  \(f'(x) = g'(h(x)) \cdot h'(x)\)
4.3 The second derivative
In the introduction of the previous section, we saw that taking a derivative is
a special way of transforming one function into another. There’s no reason
we can’t repeat the process, differentiating the derivative to obtain yet
another function. This last function is called the second derivative, since it
is the original function differentiated twice over:
$$\begin{aligned}
\text{first derivative} &= \frac{d}{dx} f(x) = f'(x) \\
\text{second derivative} &= \frac{d}{dx}\left[\text{first derivative}\right] = \frac{d}{dx} f'(x) = f''(x)
\end{aligned} \tag{4.2}$$
As with the (first) derivative, sometimes the “double-prime” notation of
Equation 4.2 is convenient to denote the second derivative, but sometimes
other symbols work better. If, as usual, we set y = f (x), then the second
derivative can be written in any of the following ways:
$$f''(x) = y'' = \frac{d^2 y}{dx^2} = \frac{d^2 f(x)}{dx^2} = \frac{d^2}{dx^2} f(x).$$
The last expression captures the idea that differentiating twice over transforms one function into another, just as differentiating once does. Removing
the “$f(x)$” part of the expression, we’re left with
$$\frac{d^2}{dx^2}$$
which is like a machine waiting for a function to be fed into its slot, whereupon it will spit out that function’s second derivative.
Here are some examples of the second derivative:
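• If $f(x) = x^3$, then
$$\begin{aligned}
f''(x) &= \frac{d^2}{dx^2} f(x) \\
&= \frac{d}{dx}\left[\frac{d}{dx} f(x)\right] \\
&= \frac{d}{dx}\left[\frac{d}{dx} x^3\right] \\
&= \frac{d}{dx}\left[3x^2\right] \\
&= 6x
\end{aligned}$$
• If $f(x) = x \ln x$, then
$$\begin{aligned}
f''(x) &= \frac{d^2}{dx^2} f(x) \\
&= \frac{d}{dx}\left[\frac{d}{dx} f(x)\right] \\
&= \frac{d}{dx}\left[\frac{d}{dx}(x \ln x)\right] \\
&= \frac{d}{dx}\left[1 \cdot \ln x + x \cdot \frac{1}{x}\right] \\
&= \frac{d}{dx}\left[\ln x + 1\right] \\
&= \frac{1}{x}
\end{aligned}$$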
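The two results above can be spot-checked numerically. The following is a minimal sketch under stated assumptions: the helper `second_derivative`, its step size `h`, and the test point `x = 2` are illustrative choices and do not come from the text.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central second-difference estimate of f''(x); h is an assumed step size."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


x = 2.0  # arbitrary test point with x > 0

# f(x) = x^3 should give roughly 6x
cubic_estimate = second_derivative(lambda t: t**3, x)
# f(x) = x ln x should give roughly 1/x
x_log_x_estimate = second_derivative(lambda t: t * math.log(t), x)

print(cubic_estimate, 6 * x)
print(x_log_x_estimate, 1 / x)
```

Each printed pair should agree to several decimal places; the small remaining gap comes from the finite step size and floating-point rounding.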
What good is the second derivative? One use relates to concave functions. Earlier, we characterized a concave function by imagining what it
would be like to walk along its graph: a decreasingly steep climb, leveling
out towards the top, followed by an increasingly steep descent. But the
measure of a curve’s steepness at any point is just the slope of the tangent
line there: if, for instance, the slope is a large, positive number, the tangent
line rises steeply, and the curve, too, can be said to be rising steeply at the
point of tangency; if the slope is a small, negative number, the tangent line – and thus the curve at the point of tangency – is descending gently. As
we walk along the graph of a concave function, the slope of the tangent line
at every successive point of our journey keeps falling, from large positive
numbers to small positive ones to small negative ones to large negative ones.
[Figure 4.4: The slope of tangent lines decreases along the graph of a concave function. Axes: x and y; tangent lines labeled m = 1, m = 0.4, and m = −0.5.]
But we know that the slope of the tangent line at a given point along
the curve is just the derivative at that point. If the value of the slope keeps
falling, the value of the derivative must also be falling. Suppose that y is
this value, namely, that
$$y = f'(x).$$
To say that the value of $f'$ is falling is to say that as $x$ increases, $y$ decreases.
And this is the answer to the question, how does y follow along as x moves,
which is the very definition of a derivative. What do we know about this
derivative? We know it tells us that as x goes up, y goes down. We know,
in short, that it is negative:
$$\text{derivative of } f' < 0.$$
But
$$\begin{aligned}
\text{derivative of } f' &= \text{derivative of (the derivative of } f) \\
&= \text{second derivative of } f \\
&= f''
\end{aligned}$$
We conclude that a function, $f$, is concave if $f''(x) < 0$ for all $x$ in its domain. This fact gives us a quick way of testing whether a function is
concave. For instance, if we weren’t familiar with the shape of the graph of
the natural logarithm function, we could instead determine whether it was
concave by taking its second derivative:
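$$\begin{aligned}
\frac{d^2}{dx^2} \ln x &= \frac{d}{dx}\left[\frac{d}{dx} \ln x\right] \\
&= \frac{d}{dx}\,\frac{1}{x} \\
&= -\frac{1}{x^2}.
\end{aligned}$$
Since $x^2$ is always positive, $-\frac{1}{x^2}$ must always be negative, and
$$f''(x) < 0 \text{ for all } x \text{ in the domain of the logarithm, } x > 0.$$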
The natural logarithm function is indeed concave.
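The same test can be run numerically by sampling the second derivative at many points. This is only a sketch: the function names, the step size, and the grid of sample points are assumptions made for illustration, and a sampled check gives evidence of concavity rather than a proof.

```python
import math


def second_derivative(f, x, h=1e-4):
    """Central second-difference estimate of f''(x); h is an assumed step size."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def looks_concave(f, points):
    """True if the estimated second derivative is negative at every sample point."""
    return all(second_derivative(f, x) < 0 for x in points)


# Sample points 0.5, 1.0, ..., 10.0 (all positive, so ln is defined there)
sample_points = [0.5 + 0.5 * k for k in range(20)]

log_is_concave = looks_concave(math.log, sample_points)
exp_is_concave = looks_concave(math.exp, sample_points)

print("ln x looks concave:", log_is_concave)
print("e^x looks concave:", exp_is_concave)
```

The exponential function is included as a contrast: its second derivative is $e^x$, which is positive everywhere, so it fails the test.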
4.4 Partial derivatives
For a function of a single variable, y = f (x), we defined the derivative as
the answer to the question, how does y follow along as x moves? Here, we
develop a similar concept for functions of several variables.
In Section 1.3, we described a bathtub spout fed by a cold water tap
and a hot water tap. The temperature of the water coming out of the
shared spout depended on two inputs: how much cold water was fed in and
how much hot. The temperature, we summarized, was a function of two
variables,
T = f (C, H).
We might initially try extending the definition of the derivative to cover
this function by positing that in this case, the derivative is the answer to the
question, how does T change as C and H do? But there is a problem with
this approach. If, for example, T were to rise, how would we know whether
it was because the hot water tap had been opened farther or because less
cold water was feeding in? And if T fell, was it because there was more cold
water or because there was less hot? In short, if we let both the amount
of cold water and the amount of hot water vary at the same time, it’s
impossible to separate out the effect on T of either C or H individually.
What we can do is leave one of the taps fixed while we twiddle the other,
say, leave the hot water tap halfway open while we vary only the amount
of cold water. In this case, it’s clear that any effect on the temperature is
due entirely to a change in the amount of cold water feeding in. That is,
temperature now depends on only one variable:
T = g(C).
And we already know very well how to find the derivative of a function
of a single variable. This derivative is called a partial derivative – partial
because it only tells part of the story about T ’s relationship to its inputs,
C and H: the question of its relationship to H was bracketed in order to
isolate the effects of C.
Suppose now that we have an explicit formula for the temperature, something like
T = f (C, H) = −20C + 80H + 75.
For instance, if the cold water tap is fully open (C = 1) and the hot water
tap fully closed (H = 0), then
T = f (1, 0)
= −20 · 1 + 80 · 0 + 75
= 55.
If, on the other hand, the hot tap is fully open and the cold tap fully closed,
T = f (0, 1) = 155. Finally, if the hot water tap is fixed at halfway open
($H = \frac{1}{2}$) but $C$ remains variable, then
$$\begin{aligned}
T &= f\left(C, \tfrac{1}{2}\right) \\
&= -20C + 80 \cdot \tfrac{1}{2} + 75 \\
&= -20C + 115
\end{aligned}$$
So $g(C) = -20C + 115$, and its derivative is
$$\begin{aligned}
g'(C) &= \frac{d}{dC} g(C) \\
&= \frac{d}{dC}\left[-20C + 115\right] \\
&= \frac{d}{dC}(-20C) + \frac{d}{dC} 115 \\
&= -20
\end{aligned}$$
Now instead of fixing $H$ at $\frac{1}{2}$, we could have fixed it at any value between
0 (fully closed) and 1 (fully open) – at $H_0$, say. In this case, $T = f(C, H_0) = -20C + 80H_0 + 75$. Although this looks like a function of two variables, it’s
actually a function of only one: it depends only on C; the other variable,
$H$, was fixed at the constant value, $H_0$. As before, we can express $T$ as a
function only of $C$:
$$\begin{aligned}
T &= g(C) \\
&= f(C, H_0) \\
&= -20C + 80H_0 + 75
\end{aligned}$$
And as before, we can differentiate g:
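$$\begin{aligned}
g'(C) &= \frac{d}{dC} g(C) \\
&= \frac{d}{dC}\left[-20C + 80H_0 + 75\right] \\
&= \frac{d}{dC}(-20C) + \frac{d}{dC}(80H_0) + \frac{d}{dC} 75
\end{aligned}$$
But $H_0$ is a constant and thus $80H_0$ is as well. The derivative of a constant
is always 0, so, continuing the calculation above, we have
$$\begin{aligned}
\frac{d}{dC}(-20C) + \frac{d}{dC}(80H_0) + \frac{d}{dC} 75 &= -20 + 0 + 0 \\
&= -20.
\end{aligned}$$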
To conclude, the derivative of g is −20 no matter what value, H0 , we fix H
at. This result has a straightforward interpretation: it means that for every
fraction the cold water tap is opened, the temperature at the spout will fall
by the same fraction of 20 degrees, no matter what the hot water setting.
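The claim that the slope is −20 whatever the hot water setting can be illustrated by holding $H$ at several different values and nudging only $C$. The sketch below assumes the text's example formula for $T$; the helper name `partial_wrt_cold`, the step size, and the particular tap settings are hypothetical choices.

```python
def temperature(c, h):
    """The text's example formula T = f(C, H) = -20C + 80H + 75."""
    return -20 * c + 80 * h + 75


def partial_wrt_cold(f, c, h_fixed, step=1e-6):
    """Estimate how T responds to C while H stays fixed at h_fixed."""
    return (f(c + step, h_fixed) - f(c - step, h_fixed)) / (2 * step)


hot_settings = (0.0, 0.25, 0.5, 1.0)  # assumed sample values of H_0
slopes = [partial_wrt_cold(temperature, 0.3, h0) for h0 in hot_settings]

for h0, slope in zip(hot_settings, slopes):
    print(f"H fixed at {h0}: slope with respect to C is about {slope:.4f}")
```

Every slope comes out at about −20, matching the calculation above.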
Why, we can now ask, bother appending the little subscript “0” to H
at all? In the calculations above, the notation served as nothing more than
a crutch for our imaginations, there to remind us that H0 should be understood as a constant, not a variable. Why not just imagine H as a constant
in the first place, with or without the subscript? Our calculations above
would not have been any different.
In fact, when we formally take the partial derivative of a function of
several variables, we do exactly this: we fix all but one of the variables,
not with subscripts or by plugging in an actual number but only in our
imaginations. To formally signify the partial derivative, we use a special notation, $\frac{\partial}{\partial x}$, which operates on functions according to the very same rules, summarized in Table 4.1, that $\frac{d}{dx}$ does. The set of symbols,
$$\frac{\partial}{\partial x} f(x, y)$$
means, “Imagine $y$ is fixed, so that $f$ becomes in effect a function of one variable, $x$. Then take the derivative of this one-variable function in the usual way.” We say that $\frac{\partial}{\partial x} f(x, y)$ is the partial derivative of $f$ with respect to $x$. We could also of course imagine fixing $x$ and taking the partial derivative of $f$ with respect to $y$, in which case we would write
$$\frac{\partial}{\partial y} f(x, y).$$
Here are some examples of partial derivatives:
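• If $f(x, y) = -20x + 80y + 115$, then
$$\begin{aligned}
\frac{\partial}{\partial x} f(x, y) &= \frac{\partial}{\partial x}\left[-20x + 80y + 115\right] \\
&= \frac{\partial}{\partial x}(-20x) + \frac{\partial}{\partial x}(80y) + \frac{\partial}{\partial x} 115.
\end{aligned}$$
Now as far as $\frac{\partial}{\partial x}$ is concerned, $80y$ is nothing but a constant. (Remember, we’re imagining $y$ is fixed.) The partial derivative of a constant is 0, just as it is for the ordinary derivative. Continuing, then, we have
$$\begin{aligned}
\frac{\partial}{\partial x}(-20x) + \frac{\partial}{\partial x}(80y) + \frac{\partial}{\partial x} 115 &= -20 + 0 + 0 \\
&= -20.
\end{aligned}$$
Likewise, we can take the partial derivative of $f$ with respect to $y$:
$$\begin{aligned}
\frac{\partial}{\partial y} f(x, y) &= \frac{\partial}{\partial y}\left[-20x + 80y + 115\right] \\
&= \frac{\partial}{\partial y}(-20x) + \frac{\partial}{\partial y}(80y) + \frac{\partial}{\partial y} 115 \\
&= 0 + 80 + 0 \\
&= 80.
\end{aligned}$$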
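• If $f(x, y, z) = x^3 + yz + x^2 yz + e^{xyz}$, then
$$\begin{aligned}
\frac{\partial}{\partial x} f(x, y, z) &= \frac{\partial}{\partial x}\left[x^3 + yz + x^2 yz + e^{xyz}\right] \\
&= \frac{\partial}{\partial x}(x^3) + \frac{\partial}{\partial x}(yz) + \frac{\partial}{\partial x}(x^2 yz) + \frac{\partial}{\partial x}(e^{xyz}) \\
&= 3x^2 + 0 + 2xyz + e^{xyz} \cdot \frac{\partial}{\partial x}(xyz) \\
&= 3x^2 + 2xyz + yz\,e^{xyz}.
\end{aligned}$$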
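As a check on the last example, we can fix $y$ and $z$ at particular numbers, nudge only $x$, and compare the resulting slope with the formula. This is a sketch under assumptions: the helper `partial`, the step size, and the test point are illustrative and not taken from the text.

```python
import math


def f(x, y, z):
    """The example function x^3 + yz + x^2 yz + e^(xyz)."""
    return x**3 + y * z + x**2 * y * z + math.exp(x * y * z)


def partial_x_formula(x, y, z):
    """The partial derivative with respect to x derived above."""
    return 3 * x**2 + 2 * x * y * z + y * z * math.exp(x * y * z)


def partial(func, args, index, step=1e-6):
    """Central-difference partial derivative with respect to args[index]."""
    up = list(args)
    down = list(args)
    up[index] += step    # nudge one variable up...
    down[index] -= step  # ...and down, leaving the others fixed
    return (func(*up) - func(*down)) / (2 * step)


point = (0.8, -0.5, 1.2)  # arbitrary test values for (x, y, z)
numeric = partial(f, point, 0)
formula = partial_x_formula(*point)
print(numeric, formula)
```

Passing `index=1` or `index=2` to `partial` would estimate the partial derivatives with respect to $y$ or $z$ in the same way.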
Chapter 5
Maximums and minimums
5.1 Using derivatives to find maximums and minimums
One of the primary uses of the derivative – and nowhere more so than in
economics – is to find maximums and minimums. A maximum, we might
say, is where something peaks. But what, exactly, does it mean to peak? If
we walk up a hill, over the top, and down the other side, the peak is precisely
the point where our journey goes from being uphill to being downhill.
Like a hill, a function, y = f (x), can also peak. As x steadily increases,
y may rise and fall. The precise point where it goes from rising to falling
is a maximum. Where y is rising, the derivative of f is positive. Where it
is falling, the derivative is negative. At the very point where it goes from
rising to falling, the derivative, passing precisely there from positive values
to negative ones, must be 0. Figure 5.1 describes this phenomenon in an
intuitive way.
A minimum of a function is the opposite of a maximum: it is a point
where the function “bottoms out.” Whereas at a maximum, the function
goes from rising to falling, at a minimum, it goes from falling to rising.
At the point where the minimum occurs, the derivative, now passing from negative to positive, is also 0.

[Figure 5.1: The derivative – the slope of the tangent line – is 0 at a maximum. Axes: x and y; tangent lines labeled m > 0, m = 0, and m < 0.]
To find the maximums and minimums of a function, then, we first find
where its derivative is 0, that is, we solve the equation
$$f'(x) = 0.$$
Solving it may yield one value for $x$, several, or none. Each of these values
is a candidate for a point where a maximum or a minimum occurs,
but until we inspect it further, we can’t say whether it marks a maximum, a
minimum, or neither.
The problem is, at these values of x, we don’t know if the function is
going from rising to falling, from falling to rising, or whether it has simply
“flattened out” between two periods of rising or two of falling. So we need
a test that tells us if the derivative itself is rising or falling or neither, just
as the derivative in its time told us whether the original function was rising
or falling or, as at maximums and minimums, neither.
The value of the second derivative is this test. If the value is negative,
the derivative itself is trending down, as in Figure 5.1, which represents a
maximum. If the value is positive, the derivative is trending up, and we must
have found a minimum. If the value is 0, the test by itself is inconclusive: often we’ve found neither a maximum
nor a minimum but a point, called an inflection point, where the original
function flattens out between two bouts of rising or two of falling, though a function such as $f(x) = x^4$ has a minimum at $x = 0$ even though its second derivative is 0 there. The test
itself is called the second-order condition for a maximum or a minimum.
Consider two examples. From its graph, we can tell the function $f(x) = x^3$ has no maximum or minimum.

[Figure 5.2: The graph of $y = x^3$.]
If we nevertheless go ahead with our procedure for finding maximums and minimums, we start by setting
$$f'(x) = 3x^2 = 0.$$
This equation has only one solution, namely, $x = 0$. The second derivative of $f(x) = x^3$ is
$$f''(x) = 6x$$
and when $x = 0$, $f''(x) = 0$ as well. Since the second derivative is neither positive – as it would be had we found a minimum – nor negative – as it would be had we found a maximum – the second-order condition identifies neither. Because $f'(x) = 3x^2$ is positive on both sides of $x = 0$, we conclude $f$ has nothing but an inflection point at $x = 0$, where, the graph shows, it merely flattens out between two periods of rising.
Now consider the function $f(x) = x^3 - 27x$.

[Figure 5.3: The graph of $y = x^3 - 27x$. Axes: x and y; two tangent lines labeled m = 0.]

We start as usual by setting
$$f'(x) = 3x^2 - 27 = 0.$$
This equation has two solutions, $x = 3$ and $x = -3$. The second derivative of $f$ is
$$f''(x) = 6x.$$
At $x = 3$, $f''(3) = 18$, which is positive, so there we have a minimum. At $x = -3$, $f''(-3) = -18$, which is negative, and there we find a maximum.
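The procedure just carried out by hand can be sketched in a few lines. The code below is an illustration only: it hard-codes the derivatives computed above, and the function names and the `classify` helper are assumptions rather than anything from the text.

```python
import math


def f(x):
    return x**3 - 27 * x


def f_double_prime(x):
    return 6 * x


# Solve f'(x) = 3x^2 - 27 = 0, that is, x^2 = 9
root = math.sqrt(27 / 3)
candidates = [root, -root]


def classify(x):
    """Apply the second-order condition at a candidate point."""
    curvature = f_double_prime(x)
    if curvature > 0:
        return "local minimum"
    if curvature < 0:
        return "local maximum"
    return "inconclusive"


for x in candidates:
    print(f"x = {x}: f(x) = {f(x)}, {classify(x)}")
```

Evaluating `f(100)` gives a value far above the peak value `f(-3)`, which illustrates why the peak found here is described as a local rather than a global maximum.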
A last word of caution: the procedure we’ve just run through is guaranteed only to find peaks and valleys, technically called local maximums and
local minimums. It may happen, as in the example above, that the peak
it finds is not the greatest value the function assumes anywhere. ($x^3 - 27x$
grows arbitrarily large as $x$ does.) Likewise, the valley it finds may not be
the least value the function assumes. Finally, there may be many peaks,
some higher than others, and many valleys, some lower than others. The
case typically isn’t settled in a single step but by a process of sifting and
examining details.
5.2 The Lagrangian
The previous section’s method for finding maximums and minimums is
very useful – but limited. In particular, it only deals with functions of
one variable. Many problems of maximization and minimization arise in
economics that require a more powerful method to solve. The Lagrangian
furnishes this method. Here, without making the least attempt to explain
why it works, we present it as a purely mechanical procedure.
Step 1: Set up
A Lagrangian problem always involves two components: 1) a function of
several variables, whose value we wish to maximize (or minimize), and 2) a
constraint on the values those variables can assume. For instance, we might
have a function f (x, y) = xy of two variables. If we don’t constrain the
possible values for x and y, we can make f (x, y) grow arbitrarily large by
letting either x or y run freely out to infinity. In this case, f will have no
finite maximum.
If, on the other hand, we weren’t so permissive, we could stipulate up
front that any combination of x and y on which we allow f to be evaluated
come from a narrower set, say, the set of x-y combinations that satisfy the
relationship
$$x^2 + y^2 = 1.$$
In a move straight out of Section 1.5 on level curves, we can cast this
relationship in terms of a function. The x-y combinations that satisfy the
relationship $x^2 + y^2 = 1$ are identical to the x-y combinations that make
the function
$$g(x, y) = x^2 + y^2 - 1$$
equal to 0. ($x^2 + y^2 - 1 = 0$ is the same as $x^2 + y^2 = 1$.) That $g(x, y)$ must
equal 0 is the constraint on $x$ and $y$.
Together with one new variable, we now combine our two functions, f
and g, in a special way to form a third function, the Lagrangian itself:
$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y).$$
The new variable, $\lambda$, is called the Lagrangian multiplier. In our particular
example,
$$L(x, y, \lambda) = xy - \lambda(x^2 + y^2 - 1).$$
Step 2: Differentiate the Lagrangian
L is a function of several variables, in the present case, of x, y, and λ.
This means that it has three partial derivatives, one with respect to each
variable. We take these three partial derivatives and set them all equal to
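0:
$$\begin{aligned}
\frac{\partial}{\partial x} L(x, y, \lambda) &= 0 \\
\frac{\partial}{\partial y} L(x, y, \lambda) &= 0 \\
\frac{\partial}{\partial \lambda} L(x, y, \lambda) &= 0.
\end{aligned}$$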
If f and g had been functions of more than two variables, L would be a
function of more than three variables, and the above list of equations would
be longer. But returning to our particular example,
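$$\begin{aligned}
\frac{\partial}{\partial x} L(x, y, \lambda) &= y - 2\lambda x \\
\frac{\partial}{\partial y} L(x, y, \lambda) &= x - 2\lambda y \\
\frac{\partial}{\partial \lambda} L(x, y, \lambda) &= -(x^2 + y^2 - 1).
\end{aligned}$$
All of these expressions in $x$, $y$, and $\lambda$ are set equal to 0, resulting (after multiplying the last one through by −1) in a
set of simultaneous equations:
$$y - 2\lambda x = 0 \tag{5.1}$$
$$x - 2\lambda y = 0 \tag{5.2}$$
$$x^2 + y^2 - 1 = 0. \tag{5.3}$$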
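Before moving on to solve the system, it can be reassuring to confirm the three partial derivatives of $L$ numerically. The sketch below is an assumption-laden illustration: the helper `partial`, the step size, and the test point are hypothetical, and the expected values are the partial derivatives of $L(x, y, \lambda) = xy - \lambda(x^2 + y^2 - 1)$ worked out by the usual rules. Note that differentiating with respect to $\lambda$ gives $-(x^2 + y^2 - 1)$, which is 0 exactly when the constraint holds.

```python
def lagrangian(x, y, lam):
    """L(x, y, lambda) = xy - lambda * (x^2 + y^2 - 1) for the text's example."""
    return x * y - lam * (x**2 + y**2 - 1)


def partial(func, args, index, step=1e-6):
    """Central-difference partial derivative with respect to args[index]."""
    up = list(args)
    down = list(args)
    up[index] += step
    down[index] -= step
    return (func(*up) - func(*down)) / (2 * step)


point = (0.4, -1.3, 0.7)  # arbitrary test values for (x, y, lambda)
x, y, lam = point

# Partial derivatives with respect to x, y, and lambda, in that order
expected = (y - 2 * lam * x, x - 2 * lam * y, -(x**2 + y**2 - 1))
numeric = tuple(partial(lagrangian, point, i) for i in range(3))

for name, est, exact in zip(("x", "y", "lambda"), numeric, expected):
    print(f"dL/d{name}: numeric {est:.6f}, formula {exact:.6f}")
```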
Step 3: Solve the simultaneous system of equations
This is typically the hairiest step of the Lagrangian method. We have to
solve the set of simultaneous equations we obtained at the end of Step 2, and
this can get messy. Generally, we proceed by solving one of the equations
for one variable in terms of the remaining variables, aiming for something
of the form,
isolated variable = expression involving remaining variables.
For instance, we can solve Equation 5.1 for $\lambda$ to obtain
$$\lambda = \frac{y}{2x}. \tag{5.4}$$
(Crucially, we must assume at this point that x is nonzero, otherwise we
can’t divide by it.)
Now, we substitute this expression for the isolated variable into the
remaining equations, thereby reducing the number of variables and the
number of equations by one. We then repeat this cycle of isolation and
substitution until we get down to one equation in one variable, which, with
any luck, will be straightforward to solve.
So, substituting $\lambda = \frac{y}{2x}$ in Equations 5.2 and 5.3, we have
$$x - 2 \cdot \frac{y}{2x} \cdot y = 0 \tag{5.5}$$
$$x^2 + y^2 - 1 = 0. \tag{5.6}$$
Equation 5.5 simplifies to
$$x^2 - y^2 = 0. \tag{5.7}$$
The slightly clever thing to notice now is that Equation 5.7 tells us $x^2 = y^2$,
so without taking any square roots, we can substitute directly for $y^2$ in
Equation 5.6, obtaining
$$\begin{aligned}
x^2 + x^2 - 1 &= 0 \\
2x^2 &= 1 \\
x^2 &= \frac{1}{2} \\
x &= \pm\frac{1}{\sqrt{2}}.
\end{aligned}$$
As if unpacking matryoshka dolls, we went from three equations to two
equations to one equation, which we’ve just solved for x. Now we have to
pack the dolls back up, solving for y by substituting x into any one of the
two equations, and then for λ by substituting x and y into any one of the
three equations. So, plugging either $x = \frac{1}{\sqrt{2}}$ or $x = -\frac{1}{\sqrt{2}}$ into Equation 5.7,
we have
$$\begin{aligned}
y^2 &= \frac{1}{2} \\
y &= \pm\frac{1}{\sqrt{2}}.
\end{aligned}$$
Finally, substituting various combinations of positive and negative for
$x$ and $y$ in Equation 5.4, we find that
$$\lambda = \pm\frac{1}{2}.$$
In outline, Step 3 proceeds by digging down, through cycles of isolation
and substitution, until we strike a hard number, that is, until we’ve solved
for one of the variables. We then work our way back up, filling in variables
with hard numbers. The result is combinations of values where $f$ possibly
assumes a maximum (or minimum):
$$\begin{aligned}
x &= \pm\frac{1}{\sqrt{2}} \\
y &= \pm\frac{1}{\sqrt{2}} \\
\lambda &= \pm\frac{1}{2}.
\end{aligned}$$
Step 4: Check the solutions
Step 3 suggested several combinations of $x$ and $y$ that might maximize or
minimize $f(x, y)$. To know which combinations maximize $f$ and which minimize it, we need to
plug them into $f$.
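If $x$ and $y$ are both negative or both positive, e.g. $x = -\frac{1}{\sqrt{2}}$ and $y = -\frac{1}{\sqrt{2}}$,
then
$$\begin{aligned}
f(x, y) &= xy \\
&= \left(\pm\frac{1}{\sqrt{2}}\right) \cdot \left(\pm\frac{1}{\sqrt{2}}\right) \\
&= \frac{1}{2}.
\end{aligned}$$
On the other hand, if $x$ and $y$ are of opposite sign, e.g. $x = \frac{1}{\sqrt{2}}$ and $y = -\frac{1}{\sqrt{2}}$,
then
$$\begin{aligned}
f(x, y) &= xy \\
&= \left(\pm\frac{1}{\sqrt{2}}\right) \cdot \left(\mp\frac{1}{\sqrt{2}}\right) \\
&= -\frac{1}{2}.
\end{aligned}$$
Thus $f$ has a maximum of $\frac{1}{2}$ at $(x, y) = \left(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}}\right)$ and a minimum of $-\frac{1}{2}$
at $(x, y) = \left(\pm\frac{1}{\sqrt{2}}, \mp\frac{1}{\sqrt{2}}\right)$.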
It also never hurts to plug in zero values for variables we assumed at
some point in Step 3 were nonzero, just to make sure the assumption was
not unwarranted. We did in fact assume that x, for one, was nonzero. But
if $x = 0$, then $f(x, y) = xy = 0$ as well, which lies strictly between $-\frac{1}{2}$ and $\frac{1}{2}$, so $f$ attains neither a maximum
nor a minimum when $x = 0$. Our assumption was warranted after all.
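The four steps can be double-checked with a short script. What follows is a sketch, not the method of the text: it plugs the four candidate combinations back into Equations 5.1 to 5.3, and then, as an independent and purely brute-force check, walks around the circle $x^2 + y^2 = 1$ in small steps to see how large and how small $xy$ gets. The function names, the parametrization of the circle by an angle, and the number of sample points are all assumptions.

```python
import math

r = 1 / math.sqrt(2)


def f(x, y):
    return x * y


def residuals(x, y, lam):
    """Left-hand sides of Equations 5.1, 5.2, and 5.3; all should be 0."""
    return (y - 2 * lam * x, x - 2 * lam * y, x**2 + y**2 - 1)


# (x, y, lambda) combinations found in Step 3, with lambda = y / (2x)
candidates = [(r, r, 0.5), (-r, -r, 0.5), (r, -r, -0.5), (-r, r, -0.5)]

worst_residual = max(abs(v) for c in candidates for v in residuals(*c))
values = [f(x, y) for x, y, _ in candidates]
print("largest equation residual:", worst_residual)
print("f at the candidates:", values)

# Brute force: sample f at 3600 points around the unit circle
angles = [2 * math.pi * k / 3600 for k in range(3600)]
samples = [f(math.cos(t), math.sin(t)) for t in angles]
print("largest sampled value:", max(samples))
print("smallest sampled value:", min(samples))
```

The sampled extremes should agree with the maximum of $\frac{1}{2}$ and the minimum of $-\frac{1}{2}$ found above, up to rounding.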