MATH 251-02: Multivariable Calculus I (83305)
JB-387, TuTh 4 PM - 5:50 PM
SYLLABUS Fall 2015
John Sarli
JB-326
jsarli@csusb.edu
909-537-5374
Office Hours: TuTh 11:30 AM - 1 PM and 3:30 - 4 PM, or by appt.
Text: Marsden/Tromba (W.H. Freeman and Company)
Vector Calculus (sixth edition)
Prerequisites: MATH 212 with a grade of "C" or better.
This is a first course in multivariable calculus, covering the differential calculus of
vector functions. (A second course in multivariable calculus covers the classical
theorems of integration, which generalize the fundamental theorem of calculus.)
We will develop some basic theory but the main goal is to become proficient in
computations with functions of two and three variables. In doing so we develop an
understanding of the geometry of space so that applications to various branches
of science and mathematics become accessible. We will cover the first half of
the text, Chapters 1 through 4, excluding material that explicitly depends on
certain topics from MATH 213.
Grading will be based on two midterm exams, a cumulative final exam, and four
graded assignments (one from each chapter), weighted as follows: First Midterm
(15%), Second Midterm (25%), Final Exam (40%), Graded Assignments (20%).
To reinforce written communication skills the Graded Assignment solutions
should be clearly presented in a typed format, either printed or sent to me as a
PDF (do not scan in handwritten work).
Although there is no attendance requirement for this class, you must complete the
CSU/UC Mathematics Diagnostic Testing Project CR test within the first two
weeks of the course (by October 8). Go to mdtp.ucsd.edu and scroll down to
MDTP Web Based Tests. Select the CR test. The items will appear one at
a time. You can either print the results or send them to me electronically. The
purpose of this requirement is for you to assess any flaws in your fundamental
skills that can hinder your conceptual understanding of this higher material. The
test results do not affect your course grade in any way. In fact, I will not look
at your individual item results unless you ask me to review them with you.
A list of Suggested Exercises will be provided for each chapter and the exams will be written at the level of these exercises. I will list exercises that are
representative of a particular technique or concept, but you should attempt as
many similar exercises in the text as needed for understanding. In this way we
can avoid "practice exams" and other routines that use the time we need to cover
this material. It is your responsibility to bring questions to class that arise as you
work through problems.
After computing your total scores weighted according to the percentages above,
course grades will be assigned as follows:
Grade   Total score
A       91 and above
A-      86-90
B+      81-85
B       76-80
B-      71-75
C+      66-70
C       61-65
C-      51-60
D       45-50
F       below 45
Success in this course requires a balance of three activities:
1) Read the text and work the exercises regularly. Keep notes of your solutions.
If you have organized them efficiently, bring them to the in-class exams.
2) Follow the lecture notes on my website: www.math.csusb.edu/faculty/sarli/
which also has the syllabus. Bring questions on these notes to class as they occur
to you.
3) Participate in the class sessions as actively as you can. Lectures are more
useful to you if you use them to clarify ideas as we develop them.
Notes
1) Mid-term exam dates are subject to change. Due dates for the graded exercises
will be set as we approach the end of each chapter of the text.
2) Learning Outcomes: Upon successful completion of this course, students
will be able to:
1.1 demonstrate an understanding and apply fundamental concepts, operations and relations;
2.1 correctly apply mathematical theorems, properties and definitions;
3.3 explain and justify solutions using a variety of representations.
3) Please refer to the Academic Regulations and Policies section of your current
bulletin for information regarding add/drop procedures. Instances of academic
dishonesty will not be tolerated. Cheating on exams or plagiarism (presenting the
work of another as your own, or the use of another person’s ideas without giving
proper credit) will result in a failing grade and sanctions by the University. For
this class, all assignments are to be completed by the individual student unless
otherwise specified.
4) If you are in need of an accommodation for a disability in order to participate
in this class, please let me know ASAP and also contact Services
to Students with Disabilities at UH-183, (909)537-5238.
Some important dates:
2015-09-24: First day of class
2015-09-30: Last day to add open classes w/o permission
2015-10-14: Census Deadline
2015-10-15: First Exam
2015-11-12: Second Exam
2015-11-26: Campus closed
2015-12-03: Last day of class
2015-12-08: Final Exam (Tuesday 4-5:50)
Approximate course schedule:
Overview of vectors in 2 and 3 dimensions; Algebra and geometry of vectors
Elementary linear algebra in R3 ; Conversions between coordinate systems; linear
algebra in Rn
10/15: First exam
Functions from Rn to R; Continuity of functions from Rn to Rm
Derivatives of real-valued functions; First-order approximation; Derivatives of
functions from Rn to Rm
Paths and Curves; functions from R to Rn ; Chain Rule; Directional derivatives
11/12: Second Exam
Higher derivatives and Second-order approximation; Extrema of real-valued functions
Constrained extrema; Acceleration and Force
Arc length; Vector fields and differential operators
12/08: Final Exam
Suggested Exercises for Chapter 1
The following exercises are not to be handed in. They represent skills required for
basic mastery.
1.1 (pages 18-19):
8; 9
17; 21; 23; 24; 27
1.2 (pages 29-31):
4
11; 12; 13; 14; 16; 17; 20
25; 26
1.3 (pages 49-51):
2; 4; 6; 7
11; 15; 16; 20
29; 33; 39
1.4 (pages 58-59):
3; 6; 7
10; 21
1.5 (pages 69-70):
3; 5; 9; 11; 12
21; 22; 24
First Graded Assignment
Due: October 15 (if turned in by October 13 will be graded and returned by
October 15)
To reinforce written communication skills the Graded Assignment solutions
should be clearly presented in a printed or PDF format. Late papers will not
be graded.
First Graded Assignment. Do any one of the following:
Page 30: 36
Page 31: 38
Page 59: 18
Page 70: 24
Page 72: 40
Vectors
Representation.
A vector is a quantity that is characterized by magnitude and direction. Vectors
are defined in any dimension. Consider two points $P$ and $Q$. The segment $PQ$
has a length $|PQ|$ determined by the distance formula, but no direction. If $P \neq Q$
we can specify a direction, for example, from $P$ to $Q$. The directed line segment
$\overrightarrow{PQ}$ represents a vector. (If $P = Q$ the segment reduces to a single point and
represents the zero vector, whose magnitude is 0 and whose direction is undefined.)
Any directed line segment with length $|PQ|$ that points in the same direction as
$\overrightarrow{PQ}$ represents the same vector. For example, in 2-dimensional space we can
represent points by ordered pairs in $\mathbb{R}^2$. If $P = (1, 4)$, $Q = (4, 8)$ and $A = (0, 0)$,
$B = (3, 4)$ then $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ represent the same vector. Both segments have
length 5 and point in the same direction. Given any vector we can change its
magnitude without changing its direction by a process called scaling (also called
dilation in geometry) that corresponds to multiplication by a positive number.
Multiplication by a negative number reverses the direction of the vector. For
example, multiplying $\overrightarrow{PQ}$ by $-1$ produces the vector $\overrightarrow{QP}$. Multiplying any vector
by 0 produces the zero vector.
A point $P$ in $\mathbb{R}^n$ corresponds to an $n$-tuple $(x_1, x_2, \ldots, x_n)$ which we usually
interpret as its location relative to designated axes that intersect at the origin
$O = (0, 0, \ldots, 0)$. The directed segment $\overrightarrow{OP}$ identifies this point with a unique
vector, sometimes called the position representation for the vector whose
magnitude is $|OP|$ and whose direction points from $O$ to $P$. Conversely, any vector has
a unique position representation: If we imagine a directed segment from $A$ to $B$
in $n$-dimensional space and choose a coordinate system with origin $O$, then the
position representation of $\overrightarrow{AB}$ is obtained by translating $A$ to $O$ whereby $P$ is the
translated image of $B$. For example, if $A = (1, -1, 2)$ and $B = (4, 3, 14)$ then $\overrightarrow{AB}$
has length $\sqrt{3^2 + 4^2 + 12^2} = 13$ but the direction of this vector may be difficult to visualize.
However, the position representation of this vector is $\overrightarrow{OP}$ where $P = (3, 4, 12)$, so
we can imagine a directed segment from the origin to the point $(3, 4, 12)$ in order
to assess the direction of this vector.
Unit vectors.
In $\mathbb{R}^n$ it will be useful to have a standard vector for each possible direction. In
the abstract, it is common to notate vectors with bold letters, such as $\mathbf{v}$. With
this generic notation it is important to distinguish the zero vector $\mathbf{0}$ from the
real number 0. If $\mathbf{v} \neq \mathbf{0}$ then its magnitude $|\mathbf{v}|$ is a positive number and we
can multiply $\mathbf{v}$ by $\frac{1}{|\mathbf{v}|}$ to produce a vector of magnitude 1, called a unit vector.
We can organize all unit vectors by their position representations. Thus, in $\mathbb{R}^1$
any unit vector is represented either by the directed segment from 0 to 1 on the
number line, or by the directed segment from 0 to $-1$. In $\mathbb{R}^2$, however, the position
representatives for unit vectors are in one-to-one correspondence with the points
on the unit circle; in $\mathbb{R}^3$ they are in one-to-one correspondence with the points on
the unit sphere.
Vector addition.
Vectors were developed historically in two related contexts: physics and geometry.
These two contexts are naturally related by the need to represent translation
through a distance and physical quantities that depend on translations, such as
velocity and force. For example, to describe the velocity of an object it is necessary
to represent both its speed (distance/time) and its direction of motion. Even
before the invention of calculus it was discovered that translations, forces and
velocities add according to vector rules: If $\mathbf{v}$ and $\mathbf{w}$ are represented by $\mathbf{v} = \overrightarrow{PQ}$
and $\mathbf{w} = \overrightarrow{PR}$ then $\mathbf{v} + \mathbf{w}$ is represented by $\overrightarrow{PS}$, where $S$ is the fourth vertex of
the parallelogram $PQRS$. This was known as the parallelogram law of addition.
For the position representation in $\mathbb{R}^n$ this law is equivalent to the coordinate
addition law:
If $\mathbf{v} = \overrightarrow{OA}$ and $\mathbf{w} = \overrightarrow{OB}$ then $\mathbf{v} + \mathbf{w} = \overrightarrow{OC}$, where $A = (a_1, \ldots, a_n)$,
$B = (b_1, \ldots, b_n)$ and $C = (a_1 + b_1, \ldots, a_n + b_n)$.
The equivalence of these addition laws is easily established using congruent
triangles (see Page 5 of the text for the proof in $\mathbb{R}^2$, which is no loss of generality since
any two directed segments with a common endpoint determine a parallelogram in
a plane).
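We will need both the parallelogram law (synthetic approach) and the coordinate
addition law (analytic approach) when working with vectors. To understand
the synthetic approach it is important to draw figures that interpret the vector
operations. The following is a particularly useful exercise:
As above, let $\mathbf{v} = \overrightarrow{PQ}$ and $\mathbf{w} = \overrightarrow{PR}$ so that $\mathbf{v} + \mathbf{w}$ is represented
by $\overrightarrow{PS}$, where $S$ is the fourth vertex of the parallelogram $PQRS$.
Then $\mathbf{w} = \overrightarrow{QS}$ and so $\overrightarrow{PS} = \overrightarrow{PQ} + \overrightarrow{QS}$. It follows that for any triangle
$PQS$ we have
\[ \overrightarrow{PQ} + \overrightarrow{QS} + \overrightarrow{SP} = \mathbf{0} \]
Similarly, $\overrightarrow{PR} + \overrightarrow{RQ} = \overrightarrow{PQ}$ and so
\[ \mathbf{v} - \mathbf{w} = \overrightarrow{PQ} - \overrightarrow{PR} = \overrightarrow{RQ} \]
Geometrically, then, $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are the two diagonals of the
parallelogram with directions as indicated.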
Synthetic vector notation was found to provide an efficient description of
geometric facts. The following is a typical example of how vectors were used to prove
theorems about triangles. Sketch the vectors as you read through this construction
and make sure your figures accurately represent the equations.
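Let $OPQ$ be a triangle with $M$ the midpoint of $PQ$ and $N$ the
midpoint of $OQ$. Let $G = OM \cap PN$. Then
\[ \overrightarrow{OG} + \overrightarrow{GN} = \overrightarrow{ON}, \qquad \overrightarrow{PG} + \overrightarrow{GM} = \overrightarrow{PM} \]
Now $\overrightarrow{OG}$ and $\overrightarrow{GM}$ have the same direction, as do $\overrightarrow{PG}$ and $\overrightarrow{GN}$, so
there are numbers $\lambda$ and $\mu$ such that
\[ \overrightarrow{OG} - \overrightarrow{GM} = \lambda\,\overrightarrow{GM}, \qquad \overrightarrow{PG} - \overrightarrow{GN} = \mu\,\overrightarrow{GN} \]
from which we obtain
\[ \lambda\,\overrightarrow{GM} - \mu\,\overrightarrow{GN} = \overrightarrow{ON} - \overrightarrow{PM} = \tfrac{1}{2}\left(\overrightarrow{OQ} - \overrightarrow{PQ}\right) = \tfrac{1}{2}\left(\overrightarrow{OQ} + \overrightarrow{QP}\right) = \tfrac{1}{2}\,\overrightarrow{OP} = \overrightarrow{OL} \]
where $L$ is the midpoint of $OP$. However, since $M$ and $N$ are
midpoints we have $NM \parallel OP$ and $|NM| = \tfrac{1}{2}|OP|$, so $\overrightarrow{NM} = \overrightarrow{OL}$. Note
also that $\overrightarrow{NM} = \overrightarrow{GM} - \overrightarrow{GN}$, so we have arrived at the equation
\[ \lambda\,\overrightarrow{GM} - \mu\,\overrightarrow{GN} = \overrightarrow{OL} = \overrightarrow{GM} - \overrightarrow{GN} \]
\[ (\lambda - 1)\,\overrightarrow{GM} = (\mu - 1)\,\overrightarrow{GN} \]
and, since $\overrightarrow{GM}$ and $\overrightarrow{GN}$ are not parallel, both sides must be $\mathbf{0}$. We
conclude that $\lambda = \mu = 1$.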
From this vector calculation we conclude:
1) If $G$ is the intersection of two medians of a triangle then the segment from
a vertex to $G$ is twice as long as the segment from $G$ to the opposite midpoint
($\overrightarrow{OG} = 2\,\overrightarrow{GM}$, etc.).
2) Since 1) holds for any two medians, all three medians must be concurrent at
G, the centroid of the triangle.
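The median construction can be checked numerically. The following is a minimal sketch, assuming a hypothetical triangle with O = (0, 0), P = (6, 0), Q = (0, 9); the helper names `midpoint`, `intersect` and `vec` are illustrative and not from the text. Exact rational arithmetic is used so the 2:1 ratio can be tested with equality.

```python
from fractions import Fraction

def midpoint(a, b):
    """Midpoint of the segment joining points a and b."""
    return tuple((x + y) / 2 for x, y in zip(a, b))

def vec(a, b):
    """Coordinates of the vector from point a to point b."""
    return tuple(y - x for x, y in zip(a, b))

def intersect(p, d, q, e):
    """Point where the planar lines p + s*d and q + u*e meet.

    Assumes d and e are not parallel (nonzero determinant).
    """
    r = vec(p, q)
    det = e[0] * d[1] - d[0] * e[1]
    s = (e[0] * r[1] - r[0] * e[1]) / det
    return (p[0] + s * d[0], p[1] + s * d[1])

# Hypothetical triangle OPQ with exact rational coordinates.
O = (Fraction(0), Fraction(0))
P = (Fraction(6), Fraction(0))
Q = (Fraction(0), Fraction(9))

M = midpoint(P, Q)  # midpoint of PQ
N = midpoint(O, Q)  # midpoint of OQ
G = intersect(O, vec(O, M), P, vec(P, N))  # intersection of medians OM and PN

OG, GM = vec(O, G), vec(G, M)
PG, GN = vec(P, G), vec(G, N)
print(G, OG, GM, PG, GN)
```

For this triangle both medians are divided in the ratio 2:1 at G, which is also the average of the three vertices.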
Lines in Rn .
In $\mathbb{R}^2$ a linear equation of the form $ax_1 + bx_2 = c$ describes a line consisting
of points $(x_1, x_2)$ that satisfy this equation. In $\mathbb{R}^n$ with $n > 2$ a single linear
equation does not describe a line. For example, in $\mathbb{R}^3$ we will see that an equation
of the form $ax_1 + bx_2 + cx_3 = d$ describes a plane. To describe a line in $\mathbb{R}^n$ we
use vectors. Since most of our work will take place in $\mathbb{R}^2$ and $\mathbb{R}^3$ we can avoid
unnecessary subscripts by using $(x, y)$ and $(x, y, z)$, respectively, to denote points
in these spaces. Recall that the equation for a line in $\mathbb{R}^2$ was obtained from two
pieces of information, its slope and a known point on the line. The slope gives
information about the direction from one point to another on the line, and the
known point distinguishes the line from all others with the same slope. This is
the idea that generalizes to $\mathbb{R}^n$: Identify a known point $P$ on the line and identify
a vector $\mathbf{v}$ whose direction is parallel to the line; then any point $Q$ on the line will
correspond to the position representation
\[ \overrightarrow{OP} + t\mathbf{v} \]
for some numerical value of $t$. For example, in $\mathbb{R}^2$ consider the line with slope
$-2$ that passes through $P = (1, 3)$. If we set $\mathbf{v} = \overrightarrow{OV}$ where $V = (1, -2)$ then $\mathbf{v}$
is parallel to the line. We say the points on the line are parameterized by $t$ and
write the equation of the line as a vector function of $t$:
\[ \mathbf{l}(t) = \overrightarrow{OP} + t\mathbf{v} = (1, 3) + t\,(1, -2) = (1 + t,\ 3 - 2t) \]
Thus the line consists of all $(x, y)$ given by the parametric equations
\[ x = 1 + t, \qquad y = 3 - 2t \]
Note that $t = x - 1 = -\tfrac{1}{2}(y - 3)$, and so
\[ y = -2x + 5 \]
is the familiar Cartesian equation for the line.
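As a quick check of this parameterization, here is a minimal sketch that evaluates the vector function for the line through (1, 3) with direction (1, -2) and confirms that sampled points satisfy y = -2x + 5. The function name `line_point` is an illustrative choice, not notation from the text; it works for points and directions of any common dimension.

```python
def line_point(p, v, t):
    """Point l(t) = OP + t*v on the line through p with direction vector v."""
    return tuple(pc + t * vc for pc, vc in zip(p, v))

P = (1, 3)   # known point on the line
v = (1, -2)  # direction vector for slope -2

samples = [line_point(P, v, t) for t in (-2, -1, 0, 1, 2)]

# Every sampled point should satisfy the Cartesian equation y = -2x + 5.
all_on_line = all(y == -2 * x + 5 for x, y in samples)
print(samples, all_on_line)
```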
It is important to interpret the vector representation of a line in terms of the
parallelogram law of vector addition:
We can represent the direction vector $\mathbf{v}$ by a directed line segment
that starts at the known point $P$. When we add $\overrightarrow{OP}$ to any multiple
of the direction vector we obtain a vector $\overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PQ}$ where the
point $Q$ on the line is determined by the scalar parameter $t$.
Using this approach we can easily find the vector equation of a line in $\mathbb{R}^n$. All we
need is a direction vector and a known point on the line. In fact, since there is
a unique line through any two distinct points we can find the equation from two
known points on the line. For example, the line in $\mathbb{R}^3$ determined by the points
$P = (p_1, p_2, p_3)$ and $Q = (q_1, q_2, q_3)$ has
\[ \mathbf{v} = \overrightarrow{PQ} \]
as a direction vector, so a vector equation of the line is
\[ \mathbf{l}(t) = (p_1, p_2, p_3) + t\,(q_1 - p_1,\ q_2 - p_2,\ q_3 - p_3) \]
which yields the parametric equations
\[ x = tq_1 + (1 - t)\,p_1, \qquad y = tq_2 + (1 - t)\,p_2, \qquad z = tq_3 + (1 - t)\,p_3 \]
As before, we could solve for $t$ in each equation to produce three symmetric equations
\[ \frac{x - p_1}{q_1 - p_1} = \frac{y - p_2}{q_2 - p_2} = \frac{z - p_3}{q_3 - p_3} \]
provided $q_j \neq p_j$ for each $j = 1, 2, 3$, but we cannot reduce these to a single
Cartesian equation in $x, y, z$. Returning to the vector equation, note that $\mathbf{l}(0) =
(p_1, p_2, p_3)$ and $\mathbf{l}(1) = (q_1, q_2, q_3)$ so the values of $t$ in the interval $[0, 1]$ describe
the segment $PQ$.
Intersecting lines.
In $\mathbb{R}^2$, a pair of lines with distinct slopes will intersect at some point. In higher
dimensions it is not obvious when two lines intersect but we can use their vector
equations to obtain the point of intersection, if it exists, from the parametric
equations. For example, in $\mathbb{R}^3$ consider the two lines
\[ \mathbf{l}_1(t) = (2, 6, -3) + t\,(1, 0, 4) \]
\[ \mathbf{l}_2(t) = (1, -1, 1) + t\,(3, 4, 5) \]
For $\mathbf{l}_1$ we have the parametric equations
\[ x = 2 + t_1, \qquad y = 6, \qquad z = 4t_1 - 3 \]
For $\mathbf{l}_2$ we have the parametric equations
\[ x = 1 + 3t_2, \qquad y = 4t_2 - 1, \qquad z = 1 + 5t_2 \]
If there is a point of intersection it might correspond to different values of the
parameter in each description, which is why we write $t_1$ and $t_2$ instead of just $t$ in
each case. Can we find compatible values of $t_1$ and $t_2$ for these lines? For every
point on the first line we have $y = 6$ so we can solve for $t_2$ by setting
\[ 4t_2 - 1 = 6, \qquad t_2 = \frac{7}{4} \]
Thus, if there is a point of intersection it must be the point
\[ \mathbf{l}_2\!\left(\tfrac{7}{4}\right) = \left(\tfrac{25}{4},\ 6,\ \tfrac{39}{4}\right) \]
This point is on $\mathbf{l}_1$ provided there is a value $t_1$ such that
\[ 2 + t_1 = \frac{25}{4}, \qquad 4t_1 - 3 = \frac{39}{4} \]
simultaneously. The first equation gives $t_1 = \tfrac{17}{4}$, for which $4t_1 - 3 = 14 \neq \tfrac{39}{4}$.
Since the two equations are not compatible, the two lines do not intersect.
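The same test can be automated. The sketch below is one possible approach under stated assumptions, not the method of the text: it solves two of the three coordinate equations for the parameters and then checks whether both lines give the same point. The function name `intersect_lines` is hypothetical, the direction vectors are assumed not to be parallel, and the first example takes the two lines as (2, 6, -3) + t(1, 0, 4) and (1, -1, 1) + t(3, 4, 5).

```python
from fractions import Fraction
from itertools import combinations

def intersect_lines(p, v, q, w):
    """Common point of the lines p + t1*v and q + t2*w in R^3, or None.

    Assumes the direction vectors v and w are not parallel.
    """
    p, v, q, w = ([Fraction(c) for c in x] for x in (p, v, q, w))
    for i, j in combinations(range(3), 2):
        # Solve t1*v - t2*w = q - p using coordinates i and j (Cramer's rule).
        det = w[i] * v[j] - v[i] * w[j]
        if det != 0:
            ri, rj = q[i] - p[i], q[j] - p[j]
            t1 = (w[i] * rj - ri * w[j]) / det
            t2 = (v[i] * rj - v[j] * ri) / det
            point1 = tuple(pc + t1 * vc for pc, vc in zip(p, v))
            point2 = tuple(qc + t2 * wc for qc, wc in zip(q, w))
            # The remaining coordinate must agree as well.
            return point1 if point1 == point2 else None
    return None

# The two lines discussed above: compatible in x and y, but not in z.
no_point = intersect_lines((2, 6, -3), (1, 0, 4), (1, -1, 1), (3, 4, 5))

# A hypothetical pair of lines built to meet at (1, 2, 3).
meeting = intersect_lines((0, 0, 0), (1, 2, 3), (1, 2, 0), (0, 0, 1))
print(no_point, meeting)
```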
Let $\mathbf{l}_1$ be the line through the point $(4, -1, 6)$ with direction vector
$\mathbf{i} - \mathbf{j} + \mathbf{k}$ and let $\mathbf{l}_2$ be the line through the point $(0, 1, 4)$ with direction
vector $\mathbf{i} + \mathbf{j} - \mathbf{k}$. Find the point where these lines intersect.
Lines and Planes through the origin.
A line that passes through the origin with direction vector $\mathbf{v}$ has
\[ \mathbf{l}(t) = t\mathbf{v} \]
as a vector equation. We say that the line is the (one-dimensional) span of the
vector $\mathbf{v}$. Suppose in $\mathbb{R}^3$ that $\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$. Then the parametric equations for
the line are
\[ x = at, \qquad y = bt, \qquad z = ct \]
that is, the coordinates are given as constant multiples of the parameter $t$, so the
origin is produced when $t = 0$.
Now suppose $\mathbf{v}$ and $\mathbf{w}$ are non-zero vectors such that one is not a scalar multiple
of the other. We define the span of $\mathbf{v}$ and $\mathbf{w}$ to be all vector sums
\[ s\mathbf{v} + t\mathbf{w} \]
where the parameters $s$ and $t$ vary over all real numbers. This span contains the
individual spans of each vector because we can set one parameter equal to 0 and
let the other vary, so geometrically the span of $\mathbf{v}$ and $\mathbf{w}$ contains two distinct
lines that intersect at the origin. Any vector in the span must lie in the plane
determined by these two lines, so the span is a plane that passes through the
origin. We will see that in $\mathbb{R}^3$ this plane can be described by a single linear
equation in $x, y, z$. For example, if $\mathbf{v} = \mathbf{i}$ and $\mathbf{w} = \mathbf{j}$ then the plane is the $xy$-plane
which is described by the single equation $z = 0$. Similarly, the single equation
$y = 0$ describes the span of $\mathbf{v} = \mathbf{i}$ and $\mathbf{w} = \mathbf{k}$ (the $xz$-plane), and $x = 0$ describes
the span of $\mathbf{v} = \mathbf{j}$ and $\mathbf{w} = \mathbf{k}$ (the $yz$-plane).
Inner product, Angles, and Law of Cosines
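For any two vectors $\mathbf{v}$ and $\mathbf{w}$ we define their inner product $\mathbf{v} \cdot \mathbf{w}$ (also called scalar
product or dot product) to be
\[ \mathbf{v} \cdot \mathbf{w} = |\mathbf{v}|\,|\mathbf{w}| \cos\theta \]
where $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{w}$, measured by representing the vectors as
directed segments $\overrightarrow{PQ}$ and $\overrightarrow{PR}$. Since $\cos\theta = \cos(2\pi - \theta)$ it does not matter
which of the two possible angles we use. From the definition, the dot product is
commutative
\[ \mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v} \]
This definition makes sense in $\mathbb{R}^n$ in general but we will focus on $\mathbb{R}^3$. Let $\mathbf{v} =
v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ and $\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$. Then
\[ |\mathbf{v} - \mathbf{w}|^2 = (v_1 - w_1)^2 + (v_2 - w_2)^2 + (v_3 - w_3)^2 \]
\[ = v_1^2 + v_2^2 + v_3^2 + w_1^2 + w_2^2 + w_3^2 - 2\,(v_1w_1 + v_2w_2 + v_3w_3) \]
\[ = |\mathbf{v}|^2 + |\mathbf{w}|^2 - 2\,(v_1w_1 + v_2w_2 + v_3w_3) \]
By the Law of Cosines, since $|\mathbf{v} - \mathbf{w}|$ is the length of side $RQ$ in the triangle $PRQ$,
we have
\[ |\mathbf{v} - \mathbf{w}|^2 = |\mathbf{v}|^2 + |\mathbf{w}|^2 - 2\,|\mathbf{v}|\,|\mathbf{w}| \cos\theta \]
and so
\[ |\mathbf{v}|\,|\mathbf{w}| \cos\theta = v_1w_1 + v_2w_2 + v_3w_3 \]
Therefore, when $\mathbf{v}$ and $\mathbf{w}$ are written in component form,
\[ \mathbf{v} \cdot \mathbf{w} = v_1w_1 + v_2w_2 + v_3w_3 \]
from which it is easy to see that
\[ (c\mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (c\mathbf{w}) = c\,(\mathbf{v} \cdot \mathbf{w}) \]
and
\[ \mathbf{v} \cdot (\mathbf{w}_1 + \mathbf{w}_2) = \mathbf{v} \cdot \mathbf{w}_1 + \mathbf{v} \cdot \mathbf{w}_2 \]
Clearly, the inner product of any vector and the zero vector is the real number 0.
If neither $\mathbf{v}$ nor $\mathbf{w}$ is the zero vector the angle between them is found by
\[ \cos\theta = \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}, \qquad \theta = \arccos\!\left(\frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}\right) \]
Since the range of the inverse cosine function is $[0, \pi]$ this defines, as a matter of
convention, the angle between two vectors so that $0 \le \theta \le \pi$.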
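The component formula and the angle convention translate directly into code. This is a minimal sketch with illustrative function names (`dot`, `norm`, `angle`); the clamp inside `angle` is an added safeguard against floating-point round-off pushing the cosine slightly outside [-1, 1], and is not part of the mathematics.

```python
import math

def dot(v, w):
    """Inner product computed from components."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Magnitude |v| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in [0, pi] between two non-zero vectors."""
    c = dot(v, w) / (norm(v) * norm(w))
    return math.acos(max(-1.0, min(1.0, c)))

print(dot((1, 2, 3), (4, -5, 6)))
print(angle((1, 0, 0), (0, 1, 0)), angle((1, 0, 0), (1, 1, 0)))
```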
The inner product is useful in describing the geometry of vectors. For example,
note that
\[ |\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \]
and we can express the distance between points $P$ and $Q$ as $|\overrightarrow{PQ}| = |\overrightarrow{QP}|$.
Whenever $\mathbf{v} \cdot \mathbf{w} = 0$ we say that the two vectors are orthogonal. If neither is the zero
vector then they are orthogonal precisely when the angle between them is $\frac{\pi}{2}$. If
$\theta = 0$ or $\theta = \pi$ we say the vectors are parallel; these two cases maximize and
minimize the inner product, respectively, so if $0 < \theta < \pi$ we have the strict
Cauchy-Schwarz Inequality
\[ |\mathbf{v} \cdot \mathbf{w}| < |\mathbf{v}|\,|\mathbf{w}| \]
The Triangle Inequality
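Note that $|\mathbf{v} \cdot \mathbf{w}| = |\mathbf{v}|\,|\mathbf{w}|$ if either of the vectors is the zero vector, and, more
generally, if $\mathbf{v} = \lambda\mathbf{w}$ for some real number $\lambda$. The general Cauchy-Schwarz
Inequality
\[ |\mathbf{v} \cdot \mathbf{w}| \le |\mathbf{v}|\,|\mathbf{w}| \]
implies the Triangle Inequality
\[ |\mathbf{v} + \mathbf{w}| \le |\mathbf{v}| + |\mathbf{w}| \]
for any vectors in $\mathbb{R}^n$. To see this, note that
\[ \mathbf{v} \cdot \mathbf{w} \le |\mathbf{v} \cdot \mathbf{w}| \le |\mathbf{v}|\,|\mathbf{w}| \]
Consequently,
\[ |\mathbf{v} + \mathbf{w}|^2 = (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w}) = |\mathbf{v}|^2 + |\mathbf{w}|^2 + 2\,(\mathbf{v} \cdot \mathbf{w}) \le |\mathbf{v}|^2 + |\mathbf{w}|^2 + 2\,|\mathbf{v}|\,|\mathbf{w}| = (|\mathbf{v}| + |\mathbf{w}|)^2 \]
Since $|\mathbf{v} + \mathbf{w}| \ge 0$ and $|\mathbf{v}| + |\mathbf{w}| \ge 0$ the Triangle Inequality follows.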
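Projection and Reflection
Basic trigonometry suggests we define the orthogonal projection of $\mathbf{v}$ on $\mathbf{w}$ to be
\[ \operatorname{proj}_{\mathbf{w}} \mathbf{v} = \frac{\mathbf{v} \cdot \mathbf{w}}{\mathbf{w} \cdot \mathbf{w}}\,\mathbf{w} = \left(\mathbf{v} \cdot \frac{\mathbf{w}}{|\mathbf{w}|}\right) \frac{\mathbf{w}}{|\mathbf{w}|} \]
where the second expression indicates that we are multiplying the unit vector $\frac{\mathbf{w}}{|\mathbf{w}|}$
by the inner product of $\mathbf{v}$ with this unit vector. The scalar $\mathbf{v} \cdot \frac{\mathbf{w}}{|\mathbf{w}|}$ is sometimes
called the component of $\mathbf{v}$ in the direction of $\mathbf{w}$. Since the unit vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$
are orthogonal to each other, the equation $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ expresses the
decomposition of $\mathbf{v}$ into the sum of its orthogonal projections on these vectors.
Another important geometric application of the inner product is the operation of
reflection. If $\mathbf{u}$ is a unit vector we define the reflection of $\mathbf{v}$ relative to $\mathbf{u}$ by
\[ \operatorname{ref}_{\mathbf{u}} \mathbf{v} = \mathbf{v} - 2\operatorname{proj}_{\mathbf{u}} \mathbf{v} = \mathbf{v} - 2\,(\mathbf{v} \cdot \mathbf{u})\,\mathbf{u} \]
By taking the inner product of $\mathbf{v} - 2\,(\mathbf{v} \cdot \mathbf{u})\,\mathbf{u}$ with itself we see that the vector
$\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ has the same magnitude as $\mathbf{v}$. If we imagine all vectors in $\mathbb{R}^3$ that are
orthogonal to $\mathbf{u}$ we see that they comprise a plane through the origin. The vector
$\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ is in the half-space opposite to that of $\mathbf{v}$ relative to this plane, so $\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ is
the reflection of $\mathbf{v}$ in the plane determined by $\mathbf{u}$. Like the inner product itself, the
reflection operation makes sense in $\mathbb{R}^n$ in general.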
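Both operations are short to implement from the inner product. The sketch below uses illustrative function names (`proj`, `reflect`) and a hypothetical vector v = (3, 4, 12) with u = k; it also confirms that reflection preserves magnitude.

```python
def dot(v, w):
    """Inner product computed from components."""
    return sum(a * b for a, b in zip(v, w))

def proj(v, w):
    """Orthogonal projection of v on the non-zero vector w."""
    c = dot(v, w) / dot(w, w)
    return tuple(c * wc for wc in w)

def reflect(v, u):
    """Reflection of v relative to the unit vector u: v - 2 (v . u) u."""
    c = 2 * dot(v, u)
    return tuple(vc - c * uc for vc, uc in zip(v, u))

v = (3, 4, 12)
u = (0, 0, 1)  # unit vector k, normal to the xy-plane

p = proj(v, u)
r = reflect(v, u)
print(p, r, dot(r, r) == dot(v, v))
```

Here the reflection flips only the component of v along u, which matches the picture of reflecting in the plane orthogonal to u.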
Force and Displacement.
Even without calculus some basic applications can be described with vectors.
One of the earliest discoveries described the resultant force acting on an object
as the sum of multiple constant forces acting on that object. Each of these forces
is represented by a vector whose magnitude measures the amount of force and
whose direction specifies the direction in which the force is applied. The resultant
force is just the vector sum of the individual forces. Force is defined by Newton's
Second Law in terms of mass, distance, and time:
\[ \text{Force} = \text{Mass} \times \text{Acceleration} \]
Since mass is a scalar quantity, acceleration, like force, is a vector quantity. If
mass is in kilograms, distance in meters, and time in seconds then the magnitude
of force is in units of
\[ \mathrm{N} = \mathrm{kg \cdot m / s^2} \]
where the symbol N stands for Newtons.
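A force of 3 N acts on a body in the direction of $\mathbf{i} + \mathbf{j} + \mathbf{k}$. Simultaneously,
a second force of 5 N acts on the body in the direction
of $\frac{3}{5}\mathbf{i} - \frac{4}{5}\mathbf{j}$. What is the resultant force? If no other forces act on the
body, in what direction will it move?
The 3 N force is represented by the vector $\sqrt{3}\,(\mathbf{i} + \mathbf{j} + \mathbf{k})$ because
the unit direction vector is $\frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$. Similarly, the 5 N force is
represented by $3\mathbf{i} - 4\mathbf{j}$. The resultant force is
\[ \mathbf{F} = \sqrt{3}\,(\mathbf{i} + \mathbf{j} + \mathbf{k}) + (3\mathbf{i} - 4\mathbf{j}) = \left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k} \]
The magnitude of this force is $|\mathbf{F}| = \sqrt{\mathbf{F} \cdot \mathbf{F}} = \sqrt{34 - 2\sqrt{3}}$ N, which
is approximately 5.53 N. The direction the body will move is the direction of the
resultant force. The unit vector in this direction is
\[ \frac{\mathbf{F}}{|\mathbf{F}|} = \frac{1}{\sqrt{34 - 2\sqrt{3}}} \left[ \left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k} \right] \]
Think of the unit sphere as a three-dimensional compass and imagine
a directed segment from the origin to the point on the unit sphere
approximated by $(0.856, -0.410, 0.313)$.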
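The arithmetic in this example can be reproduced numerically. The following is a minimal sketch, assuming the two forces are 3 N along i + j + k and 5 N along (3/5)i - (4/5)j; the helper names `unit` and `force` are illustrative.

```python
import math

def unit(v):
    """Unit vector in the direction of the non-zero vector v."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def force(magnitude, direction):
    """Force vector with the given magnitude (in N) along the given direction."""
    return tuple(magnitude * c for c in unit(direction))

F1 = force(3, (1, 1, 1))
F2 = force(5, (3 / 5, -4 / 5, 0))

# Resultant force: the vector sum of the individual forces.
F = tuple(a + b for a, b in zip(F1, F2))
magnitude = math.sqrt(sum(c * c for c in F))
direction = unit(F)
print(F, magnitude, direction)
```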
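The inner product has a natural interpretation as the work done by a force acting
on a body. If a force $\mathbf{F}$ acts on a body so as to displace it through a given distance
in a given direction then
\[ W = \mathbf{F} \cdot \mathbf{D} \]
where $\mathbf{D}$ is the vector representing the displacement. Thus, if the force
$\mathbf{F} = \left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k}$ in the above example acts on a body that is
constrained to move in a straight line from the origin to the point
$P = \left(3 - \sqrt{3},\ 4 + \sqrt{3},\ \sqrt{3}\right)$ the work done is
\[ W = \left(3 + \sqrt{3}\right)\left(3 - \sqrt{3}\right) + \left(\sqrt{3} - 4\right)\left(4 + \sqrt{3}\right) + \sqrt{3} \cdot \sqrt{3} = -4 \]
The total distance the body moves is $|\mathbf{D}| = \sqrt{\overrightarrow{OP} \cdot \overrightarrow{OP}} = \sqrt{2\sqrt{3} + 34}$ m and the
work done by the force is $-4$ J, where the symbol J stands for Joules. Note that
in this case $W$ is a negative number. This means that the angle between the force
vector and the displacement vector is obtuse, about $97^\circ$ in this case since
\[ \arccos\!\left(\frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{F}|\,|\mathbf{D}|}\right) = \arccos\!\left(-\frac{1}{143}\sqrt{286}\right) \approx 1.6893 \text{ rad} \]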
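A short numerical check of the work and the angle, as a sketch under the assumption that the force is F = (3 + √3, √3 - 4, √3) and the displacement runs from the origin to (3 - √3, 4 + √3, √3):

```python
import math

def dot(v, w):
    """Inner product computed from components."""
    return sum(a * b for a, b in zip(v, w))

s3 = math.sqrt(3)
F = (3 + s3, s3 - 4, s3)  # force in newtons
D = (3 - s3, 4 + s3, s3)  # displacement in meters

W = dot(F, D)  # work in joules
theta = math.acos(W / (math.sqrt(dot(F, F)) * math.sqrt(dot(D, D))))
print(W, theta, math.degrees(theta))
```

A negative value of W goes together with an angle larger than a right angle, since the sign of the inner product is the sign of the cosine.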
Velocity also has both magnitude and direction. The magnitude of a velocity is
the speed. Velocities add as vectors, which is why aircraft need to adjust velocity
for the velocity of the wind so that the sum of the two results in the intended
velocity.
Matrices and Vectors
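We used the angle $\theta$ between two vectors to define their inner product, in any
dimension, by
\[ \mathbf{v} \cdot \mathbf{w} = |\mathbf{v}|\,|\mathbf{w}| \cos\theta \]
which suggests there may also be a geometric interpretation of the scalar
quantity
\[ A = |\mathbf{v}|\,|\mathbf{w}| \sin\theta \]
Basic trigonometry shows that $A$ is the area of the parallelogram determined
by $\mathbf{v}$ and $\mathbf{w}$. Notice that
\[ A^2 = |\mathbf{v}|^2\,|\mathbf{w}|^2 - (\mathbf{v} \cdot \mathbf{w})^2 \]
In $\mathbb{R}^2$ we find that
\[ A^2 = \left(v_1^2 + v_2^2\right)\left(w_1^2 + w_2^2\right) - (v_1w_1 + v_2w_2)^2 = (v_1w_2 - v_2w_1)^2 \]
so
\[ A = |v_1w_2 - v_2w_1| \]
because $\sin\theta \ge 0$. But in $\mathbb{R}^3$ we have
\[ A^2 = \left(v_1^2 + v_2^2 + v_3^2\right)\left(w_1^2 + w_2^2 + w_3^2\right) - (v_1w_1 + v_2w_2 + v_3w_3)^2 \]
\[ = (v_2w_3 - v_3w_2)^2 + (v_3w_1 - v_1w_3)^2 + (v_1w_2 - v_2w_1)^2 \]
and so $A$ can be interpreted as the magnitude of the vector
\[ (v_2w_3 - v_3w_2)\,\mathbf{i} + (v_3w_1 - v_1w_3)\,\mathbf{j} + (v_1w_2 - v_2w_1)\,\mathbf{k} \]
which is the determinant of the matrix
\[ \begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix} \]
We call this vector the cross product and denote it by $\mathbf{v} \times \mathbf{w}$. Its properties are
discussed below, but first we introduce the basics of matrix algebra and computation
of determinants.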
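The identity behind this interpretation can be verified with integers. The sketch below uses hypothetical vectors v = (1, 2, 2) and w = (2, 0, 1) and illustrative function names; it compares the squared area computed from the inner product with the squared magnitude of the cross product.

```python
def dot(v, w):
    """Inner product computed from components."""
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    """Cross product of two vectors in R^3, from the component formula."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

v = (1, 2, 2)
w = (2, 0, 1)

area_sq_from_dot = dot(v, v) * dot(w, w) - dot(v, w) ** 2  # |v|^2 |w|^2 - (v . w)^2
n = cross(v, w)
area_sq_from_cross = dot(n, n)                             # |v x w|^2
print(n, area_sq_from_dot, area_sq_from_cross)
```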
A matrix is an $m \times n$ rectangular array of numbers, where $m$ is the number of
rows and $n$ is the number of columns. We operate with matrices algebraically in
a variety of contexts, denoting them by letters such as
\[ A = (a_{ij}) \]
indicating that the entries in the matrix are the numbers $a_{ij}$, the entry in the $i$th
row and $j$th column. The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$
obtained by interchanging its rows and columns. Thus,
\[ A^T = (a_{ji}) \]
A matrix with $m = n$ is called a square matrix. A square matrix with $a_{ij} = 1$ if
$i = j$ and $a_{ij} = 0$ if $i \neq j$ is usually denoted $I_n$ and is called the $n \times n$ identity
matrix. Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ with the same $m \times n$ shape can be
added entry-wise to obtain
\[ A + B = (a_{ij} + b_{ij}) \]
Any matrix $A$ can be multiplied by a scalar $\lambda$ to obtain
\[ \lambda A = (\lambda a_{ij}) \]
Matrices, then, share many algebraic properties with vectors but also have
additional algebraic properties that make them useful in vector calculus.
Matrix multiplication.
If the number of columns of $A$ is equal to the number of rows of $B$ then the
product $AB$ is defined as follows.
If $A$ has shape $m \times n$ and $B$ has shape $n \times p$ then $AB = (c_{ij})$ is
the matrix with shape $m \times p$ where $c_{ij}$ is the inner product of the $i$th
row of $A$ with the $j$th column of $B$.
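For example:
\[ \begin{pmatrix} 2 & -1 & 0 \\ 4 & 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 3 & 1 & -2 \\ -1 & 1 & 0 & 6 \\ 0 & 3 & 2 & 5 \end{pmatrix} = \begin{pmatrix} 3 & 5 & 2 & -10 \\ 3 & 22 & 10 & 13 \end{pmatrix} \]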
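Note that if $p = m$ then the products $AB$ and $BA$ are both defined, but $AB$ has
shape $m \times m$ and $BA$ has shape $n \times n$. For example,
\[ \begin{pmatrix} 2 & -1 & 0 \\ 4 & 1 & 3 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 1 & 0 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 2 \\ 22 & 10 \end{pmatrix} \]
\[ \begin{pmatrix} 3 & 1 \\ 1 & 0 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 2 & -1 & 0 \\ 4 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 10 & -2 & 3 \\ 2 & -1 & 0 \\ 14 & -1 & 6 \end{pmatrix} \]
Even if $m = n$ it is generally the case that $AB \neq BA$. If $AB = BA$ we say that
the two matrices commute.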
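The row-by-column rule is easy to express in code. This is a minimal sketch with an illustrative function name `matmul`, applied to a 2 x 3 matrix A with rows (2, -1, 0), (4, 1, 3) and a 3 x 2 matrix B with rows (3, 1), (1, 0), (3, 2), so that both products are defined but have different shapes.

```python
def matmul(A, B):
    """Product AB, where entry (i, j) is the inner product of row i of A with column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[2, -1, 0],
     [4, 1, 3]]
B = [[3, 1],
     [1, 0],
     [3, 2]]

AB = matmul(A, B)  # shape 2 x 2
BA = matmul(B, A)  # shape 3 x 3
print(AB)
print(BA)
```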
An important special case is when $B$ has shape $n \times 1$, which is sometimes called
the column representation of the vector corresponding to the point $(b_1, \ldots, b_n)$ in
$\mathbb{R}^n$. The product $AB$ has shape $m \times 1$ and we say that the matrix $A$ represents
a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. For example, since
\[ \begin{pmatrix} 2 & -1 & 0 \\ 4 & 1 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 10 \end{pmatrix} \]
the matrix $A = \begin{pmatrix} 2 & -1 & 0 \\ 4 & 1 & 3 \end{pmatrix}$ represents a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$.
We will see that this matrix operation allows us to describe the derivative of a
multivariable function.
Invariants and Inverses.
For any square matrix we can compute a set of fundamental numbers called
invariants of the matrix. These invariants are derived from the coefficients of the
characteristic polynomial of the matrix. To find this polynomial we first introduce
the determinant of a square matrix, which will turn out to be one of its invariants.
There are many ways to define the determinant but for our purposes the inductive
definition works best. We define the determinant of a $2 \times 2$ matrix $A$ to be
\[ \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \]
The characteristic polynomial of $A$ is defined by
\[ P(\lambda) = |A - \lambda I_2| = \lambda^2 - (a_{11} + a_{22})\,\lambda + (a_{11}a_{22} - a_{12}a_{21}) \]
The constant term of $P(\lambda)$ is the determinant of $A$. The sum of the diagonal
entries of $A$, $a_{11} + a_{22}$, is called the trace of $A$, denoted $\operatorname{tr} A$. The trace and
determinant are the two principal invariants of a $2 \times 2$ square matrix.
The determinant of an $n \times n$ matrix is computed by finding its cofactors. The
cofactor $C_{ij}$ is the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by
removing the $i$th row and the $j$th column of $A$ and then multiplying this determinant by
$(-1)^{i+j}$. For $n = 3$ we have
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \]
and its cofactor matrix is
\[ (C_{ij}) = \begin{pmatrix} a_{22}a_{33} - a_{23}a_{32} & -a_{21}a_{33} + a_{31}a_{23} & a_{21}a_{32} - a_{22}a_{31} \\ -a_{12}a_{33} + a_{13}a_{32} & a_{11}a_{33} - a_{13}a_{31} & -a_{11}a_{32} + a_{12}a_{31} \\ a_{12}a_{23} - a_{13}a_{22} & -a_{11}a_{23} + a_{21}a_{13} & a_{11}a_{22} - a_{12}a_{21} \end{pmatrix} \]
The determinant of $A$ is computed by forming the sum of the $n$ cofactors for
any particular row or column multiplied by the respective entries in that row or
column. For example, if we choose the first row of $A$, above, we find
\[ \det A = a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{31}a_{23} - a_{21}a_{33}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \]
The same value for $\det A$ will be obtained regardless of which row or column we
choose for this computation.
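The inductive definition can be written as a short recursive function. The sketch below (function names `minor`, `det`, `det_along_column` are illustrative, and the 3 x 3 matrix is hypothetical) expands along the first row and then along a column to illustrate that the value does not depend on that choice.

```python
def minor(A, i, j):
    """Matrix obtained from A by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def det_along_column(A, j):
    """Determinant by cofactor expansion along column j (0-based)."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j)) for i in range(len(A)))

# Hypothetical 3 x 3 matrix.
A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

print(det(A), [det_along_column(A, j) for j in range(3)])
```

Cofactor expansion is convenient for small matrices by hand; its cost grows very quickly with n, so it is shown here only to mirror the definition.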
We can now find the characteristic polynomial of a $3 \times 3$ matrix:
\[ P(\lambda) = |A - \lambda I_3| = -\lambda^3 + (\operatorname{tr} A)\,\lambda^2 - \sigma_2(A)\,\lambda + \det A \]
where $\operatorname{tr} A = a_{11} + a_{22} + a_{33}$ and
$\sigma_2(A) = -(a_{12}a_{21} - a_{11}a_{22} - a_{11}a_{33} + a_{13}a_{31} - a_{22}a_{33} + a_{23}a_{32})$.
The invariant $\sigma_2$ generally is not given a specific name. It has important
interpretations in mechanics but we will not need it specifically for differential calculus.
With this definition of $P(\lambda)$ the determinant of an $n \times n$ matrix will always be
the constant term, and the trace will always be $(-1)^{n-1}$ times the coefficient of
$\lambda^{n-1}$. Thus, for $n = 4$ we would have
\[ P(\lambda) = |A - \lambda I_4| = \lambda^4 - (\operatorname{tr} A)\,\lambda^3 + \sigma_2(A)\,\lambda^2 - \sigma_3(A)\,\lambda + \det A \]
where the invariants $\sigma_2$ and $\sigma_3$ are expressions of degree 2 and 3, respectively, in
the entries $a_{ij}$. The trace will always be the sum of the diagonal entries of $A$ and
the determinant will always be an expression of degree $n$. For this reason $\operatorname{tr} A$ and
$\det A$ are generically denoted by $\sigma_1$ and $\sigma_n$, respectively, though we will not use
this notation since these are the only two principal invariants that will appear in
our calculations.
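If $A$ is an $n \times n$ matrix with $\det A \neq 0$ then there is a matrix $A^{-1}$, called the
inverse of $A$, such that
\[ AA^{-1} = A^{-1}A = I_n \]
If $A$ represents a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ then $A^{-1}$ represents the
inverse transformation. A basic theorem from linear algebra shows how to obtain
$A^{-1}$ from the cofactor matrix of $A$.
Let $C$ be the cofactor matrix of $A$, where $\det A \neq 0$. Then
\[ A^{-1} = \frac{1}{\det A}\,C^T \]
For example, if
\[ A = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & 1 \\ 0 & 0 & -3 \end{pmatrix} \]
then $\det A = -6$ and
\[ C = \begin{pmatrix} -6 & 0 & 0 \\ 3 & -3 & 0 \\ 3 & -1 & 2 \end{pmatrix} \]
Then
\[ A^{-1} = -\frac{1}{6} \begin{pmatrix} -6 & 3 & 3 \\ 0 & -3 & -1 \\ 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & \frac{1}{2} & \frac{1}{6} \\ 0 & 0 & -\frac{1}{3} \end{pmatrix} \]
from which it is easy to check that $AA^{-1} = I_3$. The matrix $C^T$ is sometimes called
the adjugate of $A$.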
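The cofactor formula for the inverse can be checked with exact fractions. This is a minimal sketch with illustrative function names, taking A to be the matrix with rows (1, 1, -1), (0, 2, 1), (0, 0, -3); integer entries are assumed so that `Fraction` can be built directly from the cofactors and the determinant.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix obtained from A by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor_matrix(A):
    """Matrix of cofactors C_ij = (-1)^(i+j) * det(minor)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]

def inverse(A):
    """Inverse as (1 / det A) times the transpose of the cofactor matrix."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    C = cofactor_matrix(A)
    n = len(A)
    return [[Fraction(C[j][i], d) for j in range(n)] for i in range(n)]

A = [[1, 1, -1],
     [0, 2, 1],
     [0, 0, -3]]

C = cofactor_matrix(A)
A_inv = inverse(A)

# Multiply A by its inverse to confirm the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(det(A), C, A_inv, product)
```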
Cross Product and Geometry of Planes
As discussed above, for vectors in R3 it is possible to form the product of two
vectors that produces a vector orthogonal to each of them. For vectors
v = v1 i + v2 j + v3 k
w = w1 i + w2 j + w3 k
this so-called cross product is computed by evaluating the formal determinant
v
i
j k
w = v1 v2 v3
w1 w2 w3
= (v2 w3
It follows that v w = w
It also follows directly that
v3 w2 ) i
v and so v
(v
(v
(v1 w3
v3 w1 ) j + (v1 w2
kv2 w1 ) k
w = 0 if w is a scalar multiple of v.
w) v = 0
w) w = 0
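These properties allow us to find the Cartesian equation of any plane in $\mathbb{R}^3$. Let $\mathbf{n} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ be a normal vector for the plane that contains the point $P_0 = (x_0, y_0, z_0)$. If $P = (x, y, z)$ is any point on this plane then
$$\mathbf{n} \cdot \overrightarrow{P_0P} = 0$$
$$A(x - x_0) + B(y - y_0) + C(z - z_0) = 0$$
and so the equation for the plane is
$$Ax + By + Cz = D, \qquad D = Ax_0 + By_0 + Cz_0$$
In particular, if the plane contains the origin then $D = 0$ since we can use $P_0 = (0, 0, 0)$. If this 2-space is the span of $\mathbf{v}$ and $\mathbf{w}$ then we can use $\mathbf{v} \times \mathbf{w}$ for the normal vector $\mathbf{n}$. For example, the span of $\mathbf{v} = 3\mathbf{i} + 2\mathbf{j} - \mathbf{k}$ and $\mathbf{w} = \mathbf{i} - \mathbf{j} + 2\mathbf{k}$ is the plane
$$3x - 7y - 5z = 0$$
because $\mathbf{v} \times \mathbf{w} = 3\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$.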
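The cross product and the resulting plane equation are easy to verify numerically. Here is a small sketch using plain tuples; the function names `cross`, `dot` and `plane_through` are hypothetical helpers introduced only for this illustration.

```python
def cross(v, w):
    """Cross product of two vectors in R^3, expanded from the formal determinant."""
    return (v[1] * w[2] - v[2] * w[1],
            -(v[0] * w[2] - v[2] * w[0]),
            v[0] * w[1] - v[1] * w[0])

def dot(v, w):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(v, w))

def plane_through(p0, n):
    """Coefficients (A, B, C, D) of the plane Ax + By + Cz = D with normal n through p0."""
    return (*n, dot(n, p0))

v, w = (3, 2, -1), (1, -1, 2)
n = cross(v, w)
print(n)                                # normal vector of the span of v and w
print(dot(n, v), dot(n, w))             # both should vanish
print(plane_through((0, 0, 0), n))      # plane through the origin
```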
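Application. Let $Q = (x_1, y_1, z_1)$. Find the shortest distance $d$ from $Q$ to the plane $Ax + By + Cz = D$ and find the point on the plane closest to $Q$.
Let $P_0 = (x_0, y_0, z_0)$ be the point on the plane closest to $Q$. Then
$$Ax_0 + By_0 + Cz_0 = D$$
and since $\overrightarrow{P_0Q}$ is parallel to $\mathbf{n} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ we also have
$$x_1 - x_0 = \lambda A, \qquad y_1 - y_0 = \lambda B, \qquad z_1 - z_0 = \lambda C$$
Solving for $x_0, y_0, z_0$ and substituting into the equation of the plane, it follows that
$$\lambda = \frac{Ax_1 + By_1 + Cz_1 - D}{A^2 + B^2 + C^2}$$
But $d^2 = \left|\overrightarrow{P_0Q}\right|^2 = \lambda^2 (A^2 + B^2 + C^2)$ and so
$$d = \frac{|Ax_1 + By_1 + Cz_1 - D|}{\sqrt{A^2 + B^2 + C^2}}$$
Finally,
$$x_0 = x_1 - \lambda A, \qquad y_0 = y_1 - \lambda B, \qquad z_0 = z_1 - \lambda C$$
are the coordinates of $P_0$.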
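The distance formula and the closest point can be computed together, since both come from the same scalar multiple of the normal vector. This is a sketch only; the function name `closest_point_and_distance` and the sample planes are assumptions made for illustration.

```python
import math

def closest_point_and_distance(plane, q):
    """Closest point on the plane Ax + By + Cz = D to q, and the distance to it.

    plane is the tuple (A, B, C, D); q is the point (x1, y1, z1).
    """
    A, B, C, D = plane
    norm_sq = A * A + B * B + C * C
    if norm_sq == 0:
        raise ValueError("the normal vector must be non-zero")
    # Scalar multiple of the normal that carries the closest point P0 to Q.
    lam = (A * q[0] + B * q[1] + C * q[2] - D) / norm_sq
    p0 = (q[0] - lam * A, q[1] - lam * B, q[2] - lam * C)
    return p0, abs(lam) * math.sqrt(norm_sq)

print(closest_point_and_distance((0, 0, 1, 2), (1, 2, 5)))   # horizontal plane z = 2
print(closest_point_and_distance((1, 2, 2, 3), (4, 5, 6)))
```

For the horizontal plane the answer can be read off by inspection, which makes it a convenient sanity check.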
Intersection of Lines and Planes.
We used the vector/parametric representation to find the possible intersection of two lines in $\mathbb{R}^3$. To find the intersection of two planes, or the intersection of a line with a plane, we use both the vector/parametric and Cartesian forms.
Given a line
$$\mathbf{l}(t) = (x(t), y(t), z(t))$$
and a plane
$$Ax + By + Cz = D$$
their intersection would be found by solving for $t$ upon substitution.
For example, if $\mathbf{l}(t) = (3 - t,\ 2t,\ 5 + 4t)$ and $x + 3y + z = 2$ we substitute to obtain
$$(3 - t) + 6t + (5 + 4t) = 2$$
$$t = -\frac{2}{3}$$
so the line intersects the plane at the unique point $\left(\frac{11}{3}, -\frac{4}{3}, \frac{7}{3}\right)$.
There will be a unique solution if the line and plane intersect in exactly one point
and no solution if the line and plane are parallel. What happens if the line is
contained in the plane?
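The substitution step can be sketched in code. The function below is hypothetical (its name and its return convention are assumptions for this illustration) and uses exact fractions with integer inputs; it also reports what the substitution looks like when no single value of $t$ is determined.

```python
from fractions import Fraction

def line_plane_intersection(point, direction, plane):
    """Intersect l(t) = point + t*direction with the plane Ax + By + Cz = D.

    Returns ("point", t, p) for a unique intersection, ("none", None, None) if
    the line misses the plane, and ("line", None, None) if every t is a solution.
    Inputs are assumed to be integers so that Fraction gives exact answers.
    """
    A, B, C, D = plane
    n = (A, B, C)
    n_dot_dir = sum(a * b for a, b in zip(n, direction))
    n_dot_pt = sum(a * b for a, b in zip(n, point))
    if n_dot_dir == 0:
        # Substitution leaves no t: the equation is either always or never true.
        return ("line" if n_dot_pt == D else "none", None, None)
    t = Fraction(D - n_dot_pt, n_dot_dir)
    p = tuple(p0 + t * d for p0, d in zip(point, direction))
    return ("point", t, p)

# l(t) = (3 - t, 2t, 5 + 4t) and the plane x + 3y + z = 2
print(line_plane_intersection((3, 0, 5), (-1, 2, 4), (1, 3, 1, 2)))
```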
Two planes will either be parallel or intersect in a line. They will be parallel if their normal vectors are scalar multiples of each other. Otherwise the cross product of the normal vectors is a non-zero vector orthogonal to both of them and can be used as a direction vector for the line of intersection.
For example, $3x + 2y - z = 4$ has normal vector $3\mathbf{i} + 2\mathbf{j} - \mathbf{k}$ and $x - y + 2z = 0$ has normal vector $\mathbf{i} - \mathbf{j} + 2\mathbf{k}$. The cross product is
$$3\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$$
and to find the line of intersection we need a known point on the line with this direction vector. Since neither plane is parallel to the xy-plane the line of intersection must contain a point with $z = 0$. Solving $3x + 2y = 4$ and $x - y = 0$ simultaneously we find that $\left(\frac{4}{5}, \frac{4}{5}, 0\right)$ is on both planes, so their line of intersection is
$$\mathbf{l}(t) = \left(\frac{4}{5} + 3t,\ \frac{4}{5} - 7t,\ -5t\right)$$
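The same recipe (cross the normals for a direction, then set $z = 0$ to find a point) can be written as a short routine. This is a sketch under stated assumptions: the names `cross` and `intersection_line` are hypothetical, the inputs are integers, and only the case in which the line actually meets the plane $z = 0$ is handled.

```python
from fractions import Fraction

def cross(v, w):
    """Cross product of two vectors in R^3."""
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def intersection_line(plane1, plane2):
    """Point and direction of the line common to two planes (A, B, C, D).

    Returns None for parallel planes. Assumes the line meets the plane z = 0.
    """
    A1, B1, C1, D1 = plane1
    A2, B2, C2, D2 = plane2
    direction = cross((A1, B1, C1), (A2, B2, C2))
    if direction == (0, 0, 0):
        return None  # normal vectors are scalar multiples of each other
    det = direction[2]  # equals A1*B2 - A2*B1, the determinant of the z = 0 system
    if det == 0:
        raise ValueError("line is parallel to the xy-plane; fix x or y instead of z")
    # Cramer's rule for A1 x + B1 y = D1, A2 x + B2 y = D2.
    x = Fraction(D1 * B2 - D2 * B1, det)
    y = Fraction(A1 * D2 - A2 * D1, det)
    return (x, y, Fraction(0)), direction

print(intersection_line((3, 2, -1, 4), (1, -1, 2, 0)))
```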
Study Guide for First Exam
Vector representation.
Directed line segment $\overrightarrow{PQ}$
Magnitude and direction
Parallelogram Law of addition
Position representation $\overrightarrow{OP}$ in $\mathbb{R}^2$ and $\mathbb{R}^3$
Component representation $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$
Unit vectors and direction.
In $\mathbb{R}^1$ there are only two unit vectors and they are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $0$ and $P$ is either at $1$ or $-1$.
In $\mathbb{R}^2$ the unit vectors are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $(0, 0)$ and $P$ is at $(\cos\theta, \sin\theta)$, $0 \le \theta < 2\pi$.
In $\mathbb{R}^3$ the unit vectors are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $(0, 0, 0)$ and $P$ is at $(\cos\theta\sin\phi, \sin\theta\sin\phi, \cos\phi)$, $0 \le \theta < 2\pi$, $0 \le \phi \le \pi$.
Vector algebra.
Inner Product $\mathbf{v} \cdot \mathbf{w}$
Angle between vectors
Cross Product $\mathbf{v} \times \mathbf{w}$
Area of parallelogram
Projection: $\operatorname{proj}_{\mathbf{w}} \mathbf{v}$
Geometry in $\mathbb{R}^2$ and $\mathbb{R}^3$.
Vector/parametric description of lines: $\mathbf{l}(t) = \overrightarrow{OP} + t\mathbf{v} = (x(t), y(t), z(t))$
Intersection of lines in $\mathbb{R}^3$
Cartesian equation of plane containing $P_0$ with normal vector $\mathbf{n}$
Distance from a given point to a given plane
Point on a given plane closest to a given point
Intersection of two planes, or of a line and a plane, in $\mathbb{R}^3$
Matrices.
Transpose
Addition and Multiplication of compatible matrices
Trace and Determinant of $2 \times 2$ and $3 \times 3$ matrices
Inverse of a $2 \times 2$ or $3 \times 3$ matrix
Alternative Coordinate Systems
In $\mathbb{R}^2$ there are two systems of coordinates that are used most often: rectangular and polar. Rectangular coordinates are often called Cartesian coordinates; any point is uniquely represented by an ordered pair of real numbers. Polar coordinates are not unique; they represent a point in terms of its distance from the origin and its angular position relative to a reference ray, usually the positive horizontal axis. The relation between the rectangular description $(x, y)$ and the polar description $(r, \theta)$ is given by
$$x = r\cos\theta$$
$$y = r\sin\theta$$
so there are infinitely many choices of $\theta$ for any given point once we determine $r$ for that point. Further, we sometimes want to consider $r$ to be the signed distance from the origin; for example, $\left(2, \frac{\pi}{3}\right)$ and $\left(-2, \frac{4\pi}{3}\right)$ are both polar coordinate descriptions of the point whose rectangular coordinates are $\left(1, \sqrt{3}\right)$. For any given polar coordinate description $(r, \theta)$ we unambiguously obtain the rectangular description $(x, y)$ from the above equations. But to go in the other direction requires some conventional choices to obtain a unique polar description. These conventions vary by context, but the default is to take $r \ge 0$ and $\theta \in [0, 2\pi)$. Then we can compute
$$r = \sqrt{x^2 + y^2}$$
and use basic trigonometry to find $\theta$ in terms of $\arctan\frac{y}{x}$. Even though $r$ is now uniquely determined, we need to be careful in computing $\theta$ because the range of the inverse tangent function is $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, which only covers half of the plane. In order to have $\theta \in [0, 2\pi)$ we set
$$\theta = \arctan\frac{y}{x}, \quad x > 0,\ y \ge 0$$
$$\theta = 2\pi + \arctan\frac{y}{x}, \quad x > 0,\ y < 0$$
$$\theta = \pi + \arctan\frac{y}{x}, \quad x < 0$$
$$\theta = \frac{\pi}{2}, \quad x = 0,\ y > 0$$
$$\theta = \frac{3\pi}{2}, \quad x = 0,\ y < 0$$
The only point not covered by these conventions is the origin. Note that the polar description of a point associates the point with a position vector, whereby $r$ is its magnitude and $\theta$ is its direction. The origin would correspond to the zero vector, which has magnitude 0 but no direction. Therefore, $\theta$ is not defined for the origin; we simply write $r = 0$ as the polar description of the point whose rectangular coordinates are $(0, 0)$.
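The case analysis for the polar angle translates directly into code. The sketch below follows the default convention $r \ge 0$, $\theta \in [0, 2\pi)$; the function name `polar` is hypothetical, and returning `None` for the angle at the origin is a choice made here to mirror the remark that the angle is undefined there.

```python
import math

def polar(x, y):
    """Polar description (r, theta) with r >= 0 and theta in [0, 2*pi).

    At the origin the angle is undefined, so (0.0, None) is returned.
    """
    r = math.hypot(x, y)
    if r == 0:
        return 0.0, None
    if x > 0 and y >= 0:
        theta = math.atan(y / x)                  # first quadrant and positive x-axis
    elif x > 0:
        theta = 2 * math.pi + math.atan(y / x)    # fourth quadrant
    elif x < 0:
        theta = math.pi + math.atan(y / x)        # second and third quadrants
    elif y > 0:
        theta = math.pi / 2                       # positive y-axis
    else:
        theta = 3 * math.pi / 2                   # negative y-axis
    return r, theta

print(polar(1, math.sqrt(3)))   # approximately (2, pi/3)
print(polar(0, 0))
```

Python's built-in `math.atan2(y, x)` performs the same quadrant bookkeeping but returns angles in $(-\pi, \pi]$, so reducing it modulo $2\pi$ gives a convenient cross-check.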
Polar coordinates are easily extended to $\mathbb{R}^3$ if we want to emphasize radial symmetry about the vertical axis:
$$x = r\cos\theta$$
$$y = r\sin\theta$$
$$z = z$$
The coordinates $(r, \theta, z)$ are called cylindrical coordinates because the set of points described by the equation $r = a$ is a cylinder of radius $a$ whose axis of symmetry is the z-axis. Note also that the equation $\theta = \alpha$ is a half-plane bounded by the z-axis if we only allow non-negative values for $r$, whereas it is a plane through the z-axis if $r$ can be a signed distance from the z-axis. The equation $z = a$ is a plane parallel to the xy-plane.
A third coordinate system commonly used in $\mathbb{R}^3$, spherical coordinates, describes points relative to their distance $\rho = \sqrt{x^2 + y^2 + z^2}$ from the origin. Spherical coordinates are the mathematical formalization of latitude and longitude on the globe:
$$x = \rho\sin\phi\cos\theta$$
$$y = \rho\sin\phi\sin\theta$$
$$z = \rho\cos\phi$$
Here $\theta \in [0, 2\pi)$ has the same interpretation as in polar and cylindrical coordinates. The angular coordinate $\phi$ is usually measured off of the positive vertical axis, so $\phi \in [0, \pi]$. If the point with rectangular coordinates $(x, y, z)$ is projected orthogonally into the xy-plane, note that its polar coordinates would be $(r, \theta)$ where
$$r = \rho\sin\phi$$
The equation $\rho = a$ describes a sphere of radius $a$ centered at the origin; the equation $\theta = \alpha$ is a half-plane bounded by the z-axis because there is generally no advantage to allowing $\rho$ to be negative; the equation $\phi = \beta$ is the branch of a cone with vertex at the origin consisting of the points $P$ such that the angle between $\mathbf{k}$ and $\overrightarrow{OP}$ is $\beta$. Since $\left|\overrightarrow{OP}\right| = \rho$ we have
$$\phi = \arccos\left(\frac{\mathbf{k} \cdot \overrightarrow{OP}}{\rho}\right)$$
because the range of the inverse cosine function is $[0, \pi]$.
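A round trip between rectangular and spherical coordinates is a quick way to test one's understanding of the conventions ($\theta \in [0, 2\pi)$ measured in the xy-plane, $\phi \in [0, \pi]$ measured from the positive z-axis). The function names below are hypothetical, and the handling of the origin is a choice made for this sketch.

```python
import math

def to_spherical(x, y, z):
    """Return (rho, theta, phi); the angles are None at the origin."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0:
        return 0.0, None, None
    # k . OP = z, so phi = arccos(z / rho); clamp to guard against rounding.
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    theta = math.atan2(y, x) % (2 * math.pi)
    return rho, theta, phi

def to_rectangular(rho, theta, phi):
    """Inverse map: spherical (rho, theta, phi) to rectangular (x, y, z)."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

rho, theta, phi = to_spherical(1.0, 2.0, 3.0)
print(rho, theta, phi)
print(to_rectangular(rho, theta, phi))   # should reproduce (1, 2, 3) up to rounding
print(rho * math.sin(phi), math.hypot(1.0, 2.0))  # polar r of the projection, two ways
```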
Cylindrical and spherical coordinates will be useful in describing surfaces and solids in $\mathbb{R}^3$, so it is important to understand both the geometry and algebra of these systems. For example, the set of points that satisfy
$$\rho = 4\csc\phi\sec\theta$$
might be more familiar if expressed in rectangular coordinates, which we can obtain from the above relations, as follows. Since $\sin\phi = \frac{r}{\rho}$ and $\cos\theta = \frac{x}{r}$,
$$\rho = 4 \cdot \frac{1}{\sin\phi} \cdot \frac{1}{\cos\theta} = 4 \cdot \frac{\rho}{r} \cdot \frac{r}{x} = \frac{4\rho}{x}$$
$$x = 4$$
Thus, $\rho = 4\csc\phi\sec\theta$ is the spherical coordinate description of the plane $x = 4$.
Suggested Exercises for Chapter 2
The following exercises are not to be handed in. They represent skills required for
basic mastery.
2.1 (pages 85-87):
3; 4; 5; 6
24; 35
2.2 (pages 103-105):
3; 6; 9
21; 26
2.3 (pages 115-116):
1; 3; 5; 9; 13
19; 25
2.4 (pages 123-124):
3; 5; 7; 13; 17
21; 23
2.5 (pages 132-134):
3; 7; 9; 11; 15
2.6 (pages 142-143):
1; 3; 7; 9; 11; 17
Second Graded Assignment
Due: No later than November 10
To reinforce written communication skills the Graded Assignment solutions
should be clearly presented in a "bluebook" or provided in PDF format. Late
papers will not be graded.
Second Graded Assignment. Do any one of the following:
Page 87: 42
Page 124: 24
Page 143: 28
Page 146: 44
Real-Valued Functions
If $f$ is a function whose domain is a subset $A$ of $\mathbb{R}^n$ we say $f$ is real-valued (scalar-valued) provided $f : A \to \mathbb{R}$. For example, if $n = 2$ then $f$ assigns a numerical output to any point in the plane that belongs to $A$. We typically determine the domain $A$ from the properties of $f$, but we can also restrict the domain to a proper subset $U \subset A$. Consider
$$f(x, y) = \frac{1}{xy}$$
The largest possible domain for $f$ is
$$A = \{(x, y) : x \neq 0,\ y \neq 0\}$$
which is $\mathbb{R}^2$ with the two coordinate axes removed. In context, this domain could be restricted, for example, by taking $U$ to be the open first quadrant.
The range of a real-valued function is the set of real numbers that are possible outputs of $f$. In the example above, the range is all non-zero numbers: $(-\infty, 0) \cup (0, \infty)$. For
$$f(x, y) = \sqrt{x^2 + y^2}$$
$A = \mathbb{R}^2$ and the range of $f$ is $[0, \infty)$; this function takes a point in the plane and gives its distance from the origin.
If $n = 1$ we are in the realm of single-variable calculus. Recall that the graph of such a function on a domain $U$ was defined to be
$$\{(x, f(x)) : x \in U\}$$
so the graph is a subset of $\mathbb{R}^2$. Similarly for $n > 1$, the graph of $f$ on $U$ is
$$\{(\mathbf{x}, f(\mathbf{x})) : \mathbf{x} \in U\}$$
where $\mathbf{x} = (x_1, \ldots, x_n)$. In general, then, the graph of a real-valued function is a subset of $\mathbb{R}^{n+1}$. If $n = 2$ then we can represent the graph as a surface in $\mathbb{R}^3$, in the same way that we drew the graph of a single-variable function as a curve in $\mathbb{R}^2$. The figure below shows the graph of $f(x, y) = x^2 - y^2$ on the restricted domain $U = \{(x, y) : -4 \le x \le 4,\ -4 \le y \le 4\}$.
[Figure: graph of $f(x, y) = x^2 - y^2$]
One of our goals is to understand how such functions change in the vicinity of a particular domain point $\mathbf{x}$. The geometry of the graph will be an important tool. To simplify notation we often replace subscripts with familiar variable names. Thus, for $n = 2$,
$$x_1 = x, \qquad x_2 = y, \qquad z = f(x, y)$$
and for $n = 3$,
$$x_1 = x, \qquad x_2 = y, \qquad x_3 = z, \qquad w = f(x, y, z)$$
In applications, the names for the variables may be adapted to certain measurable quantities, such as $t$ for time. For $n > 3$ we use subscripts generically, but most of our work will not require higher dimensions explicitly.
Sections of Graphs
If $n > 2$ then we cannot represent the graph of $f$ by a geometric figure that we can readily perceive. The vector techniques that we have developed will nonetheless allow us to analyze the function. These techniques will exploit basic Cartesian geometry. For example, if we restrict the graph of $f$ by holding one of the input coordinates constant we obtain a section of the graph. Consider the graph of the saddle-shaped surface in the above example, and set $x = 2$. Then $f(2, y) = 4 - y^2$. The curve $z = 4 - y^2$ is a parabola in the plane $x = 2$, which is parallel to the yz-plane, so the curve can be described in terms of the $y$ and $z$ variables. Similarly, if we set $y = 3$ we obtain the parabola $z = x^2 - 9$ in this plane parallel to the xz-plane. The shape of the surface shows why the first of these parabolas "opens downward" whereas the second one "opens upward". A section obtained by setting an input coordinate equal to 0 is sometimes called a trace section. A trace section, then, is just the intersection of the surface with a coordinate plane for the input coordinates. What are the trace sections for $f(x, y) = x^2 - y^2$?
While sections can be formed for any real-valued function we will primarily use them when $n = 2$ in order to understand the geometry of surfaces in $\mathbb{R}^3$. Here is an example, however, of how to find sections when $n = 3$. Consider
$$f(x, y, z) = x^2 - y^2 + z^2$$
The trace section $z = 0$ is described by the Cartesian relation
$$w = x^2 - y^2$$
which is the saddle-shaped surface again, but here we are imagining it in the xyw-space in $\mathbb{R}^4$ obtained by setting $z = 0$. Functions $f : \mathbb{R}^3 \to \mathbb{R}$ are used extensively in physical applications and they are frequently studied by constructing sections.
Level Sets
If instead of holding an input coordinate constant we set the output variable equal to a given value $c$ then we obtain the level set of value $c$
$$\{\mathbf{x} \in U : f(\mathbf{x}) = c\}$$
For $n = 2$ this will give us a curve in the domain plane, sometimes called a level curve (or a contour, as it is referred to on a topographical map). For $f(x, y) = x^2 - y^2$ the level curve $z = c$ is a hyperbola. Note that the hyperbola opens along the x-axis if $c > 0$ but opens along the y-axis if $c < 0$. What happens if $c = 0$? The level set is a subset of the domain but we can also represent it as a slice of the graph.
[Figure: graph of $f(x, y) = x^2 - y^2$ with the intersection corresponding to $z = -9$]
The actual level curve is the hyperbola $x^2 - y^2 = -9$ in $\mathbb{R}^2$. We obtain an image of this curve in the plane $z = -9$ parallel to (and below since $c < 0$) the xy-plane.
Level sets are particularly useful when $n = 3$ because we can use them to understand subsets of the domain that produce a desired output value. These subsets will be level surfaces in $\mathbb{R}^3$, which in principle we can render graphically. We will find it particularly useful to represent a general surface in $\mathbb{R}^3$ as the level surface
$$f(x, y, z) = 0$$
for some function $f$. In general, these can be difficult to render but certain examples, often called quadric surfaces because they involve quadratic expressions in $x, y, z$, are straightforward to analyze. They can be grouped by the degree and sign of the various domain variables:
$w = f(x, y, z) = x^2 + y^2 + z^2 - c$: In this case $w = 0$ is a sphere of radius $\sqrt{c}$ if $c > 0$, the single point $(0, 0, 0)$ if $c = 0$, and empty if $c < 0$. For example, there are no points $(x, y, z)$ in the domain such that $f(x, y, z) = x^2 + y^2 + z^2 + 1 = 0$.
$w = f(x, y, z) = x^2 + y^2 - z^2 - c$: If $c = 0$ then the surface $w = 0$ is a cone with vertex at the origin. If $c > 0$ then the surface is a single-sheeted hyperboloid whose intersection with the xy-plane is a circle of radius $\sqrt{c}$. If $c < 0$ then the surface is a double-sheeted hyperboloid that does not intersect the xy-plane. (See Figure 2.1.13 on page 83.) What happens when we intersect these surfaces with planes perpendicular to the xy-plane?
$w = f(x, y, z) = x^2 + y^2 - z - c$: The surfaces $w = 0$ are paraboloids, bowl-shaped surfaces that open either up or down depending on the coefficient of $z$. For example, if $f(x, y, z) = x^2 + y^2 - z + 4$ then the paraboloid $w = 0$ is described explicitly by
$$z = x^2 + y^2 + 4$$
The plane $z = 4$ in the domain space intersects this surface only at $(0, 0, 4)$. A plane $z = a$ intersects it in a circle if $a > 4$ but will not intersect it if $a < 4$.
[Figure: the paraboloid $z = x^2 + y^2 + 4$]
$w = f(x, y, z) = x^2 - y^2 - z - c$: The surfaces $w = 0$ are hyperbolic paraboloids. For example, $x^2 - y^2 - z = 0$ is the saddle-shaped surface above.
Cylinders: These surfaces consist of all translations of a plane curve along a line and occur when the dependence of $f$ on one or more of the input variables can be removed by a linear transformation. For example, the cylinder
[Figure: a cylinder whose equation survives only in part, as $x^2 + y \ldots z = 0$]
is congruent to
[Figure: the cylinder $x^2 - 2z = 0$]
which is the translation along the y-axis of the curve $z = \frac{1}{2}x^2$ in the xz-plane. The first cylinder is just a rotation in $\mathbb{R}^3$ of the second one.
Every quadric surface is equivalent to one of the above types. For example, $x^2 + 2y^2 + z^2 - 4 = 0$ is an ellipsoid, obtained from a sphere by a linear transformation of the variables.
[Figure: the ellipsoid $x^2 + 2y^2 + z^2 = 4$]
Continuity
Recall that a single-variable function $y = f(x)$ is continuous at $x = a$ provided
$$\lim_{x \to a} f(x) = f(a)$$
This definition of continuity states that $a$ is in the domain of $f$ and that the value of the function at $a$ is equal to the limit of values as $x$ approaches $a$. In order to adapt this definition to functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ we need to revisit the concept of limit in terms of vector quantities. We will first study real-valued functions $f : \mathbb{R}^n \to \mathbb{R}$ because each component function in the general case $f : \mathbb{R}^n \to \mathbb{R}^m$ is real valued. Recall from single-variable calculus for $y = f(x)$ with domain $A$ we say
$$\lim_{x \to x_0} f(x) = b$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $x \in A$ with $0 < |x - x_0| < \delta$, we have $|f(x) - b| < \varepsilon$. It is important to remember in this definition that $x_0$ need not be in the domain $A$, and even if $x_0 \in A$ it need not be the case that $f(x_0) = b$ for the limit to be $b$. For $f$ to be continuous at $x_0$, however, we must have $x_0 \in A$ and $f(x_0) = b$.
If we think of the limit definition in terms of distances in the range as related to distances in the domain then we have a direct generalization to functions from $\mathbb{R}^n$ to $\mathbb{R}$:
For $y = f(\mathbf{x}) \in \mathbb{R}$ with domain $A \subset \mathbb{R}^n$ we say
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = b$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $\mathbf{x} \in A$ with $0 < |\mathbf{x} - \mathbf{x}_0| < \delta$, we have $|f(\mathbf{x}) - b| < \varepsilon$.
The generalization to $f : \mathbb{R}^n \to \mathbb{R}^m$ is now natural:
For $\mathbf{y} = f(\mathbf{x}) \in \mathbb{R}^m$ with domain $A \subset \mathbb{R}^n$ we say
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = \mathbf{b}$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $\mathbf{x} \in A$ with $0 < |\mathbf{x} - \mathbf{x}_0| < \delta$, we have $|f(\mathbf{x}) - \mathbf{b}| < \varepsilon$.
We just need to remember that $|\mathbf{v}|$ is the magnitude of the vector $\mathbf{v}$, so $|\mathbf{x} - \mathbf{x}_0|$ for example is the magnitude of the difference of the vectors $\mathbf{x}$ and $\mathbf{x}_0$, which is just the distance between the points that represent these vectors. The existence of the limit is the idea that we can make the output of $f$ as close to the point $\mathbf{b}$ as we like just by making the input $\mathbf{x}$ close enough to the point $\mathbf{x}_0$. The definition of continuity carries over as well:
The function $f$ is continuous at $\mathbf{x}_0$ in its domain provided
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = f(\mathbf{x}_0)$$
For a single-variable function the existence of a limit can be determined by analyzing what happens as $x$ approaches the target value from the left and from the right. The limit can exist from one direction but not from the other, or from neither. The limit itself exists when the limit from each direction exists and these limiting values agree. When we try to carry this idea over to multivariable functions we realize that there are infinitely many directions of approach as well as infinitely many paths of approach to the target point. Since we cannot check all of these modes of approach individually it is necessary to introduce techniques from analysis to determine limits. We do not need very much analysis at this introductory level, but it is helpful to understand a few common terms.
Definition. Let $r > 0$ and let $\mathbf{x}_0 \in \mathbb{R}^n$. The set $D_r(\mathbf{x}_0)$ is the collection of points $\mathbf{x}$ in $\mathbb{R}^n$ such that $|\mathbf{x} - \mathbf{x}_0| < r$. A subset $U \subset \mathbb{R}^n$ is called an open set if there is an $r$ for every $\mathbf{x}_0 \in U$ such that $D_r(\mathbf{x}_0)$ is contained in $U$.
These so-called 'disks' $D_r(\mathbf{x}_0)$ generalize the open intervals from single-variable calculus, where an open set $U \subset \mathbb{R}$ has the property that an open interval centered at each of its points can be found that is entirely contained in $U$. Limits in single-variable calculus were often most important when approaching certain boundary points in the domain.
Definition. A point $\mathbf{x}$ in $\mathbb{R}^n$ is a boundary point of a set $A \subset \mathbb{R}^n$ if every disk centered at $\mathbf{x}$ contains at least one point in $A$ and at least one point not in $A$.
For example, if $A = \{(x, y) : x^2 + y^2 \le 1\}$ in $\mathbb{R}^2$ then the boundary points of $A$ are precisely the points of the unit circle. If $B = \{(x, y) : x^2 + y^2 < 1\}$ in $\mathbb{R}^2$ then the boundary points of $B$ are also the points of the unit circle. Thus, boundary points may or may not belong to the set itself. Sometimes we use $\partial A$ to denote the collection of all boundary points of $A$, and call $\partial A$ the boundary of $A$.
Real-Valued Polynomials and Rational Functions.
Let $A$ be the natural (unrestricted) domain of the function $f$. If $f(\mathbf{x}) = P(\mathbf{x})$ where $P(x_1, \ldots, x_n)$ is a polynomial in the individual variables then $A = \mathbb{R}^n$ and $f$ is continuous at any $\mathbf{x}_0$. If $f(\mathbf{x}) = \frac{P(\mathbf{x})}{Q(\mathbf{x})}$ where $P, Q$ are polynomials then $A$ consists of all points $\mathbf{x}$ such that $Q(\mathbf{x}) \neq 0$. Even if $Q(\mathbf{x}_0) = 0$ the limit as $\mathbf{x} \to \mathbf{x}_0$ may exist, in which case we can define $f(\mathbf{x}_0)$ to be this limit and thus extend $f$ to a function continuous at $\mathbf{x}_0$. Consider
$$f(x, y) = \frac{x^2 y^2}{x^2 + y^2}$$
For this function $A = \mathbb{R}^2 \setminus \{(0, 0)\}$ but the limit as $\mathbf{x} \to \mathbf{x}_0 = (0, 0)$ exists. One way to see this is to argue that, no matter what path of approach is taken by $\mathbf{x}$ to get to the origin, the value $f(\mathbf{x})$ can be expressed in terms of $|\mathbf{x}|$. This is commonly done by converting to polar coordinates:
$$f(x, y) = f(r, \theta) = \frac{r^4\cos^2\theta\sin^2\theta}{r^2}$$
As long as $\mathbf{x} \neq \mathbf{x}_0$ we have $r \neq 0$ in which case
$$f(r, \theta) = r^2\cos^2\theta\sin^2\theta$$
By taking $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$ we can make $r$ arbitrarily small. The value of $\cos^2\theta\sin^2\theta$ may vary considerably depending on the path of approach, but this value remains bounded between 0 and 1, so $f(r, \theta)$ can also be made arbitrarily small. We conclude
$$\lim_{(x, y) \to (0, 0)} f(x, y) = 0$$
which allows us to extend $f$ to a function continuous on all of $\mathbb{R}^2$:
$$f(x, y) = \frac{x^2 y^2}{x^2 + y^2}, \qquad f(0, 0) = 0$$
This explains why a computer graph of $z = \frac{x^2 y^2}{x^2 + y^2}$ looks unbroken at the origin:
[Figure: graph of $z = \frac{x^2 y^2}{x^2 + y^2}$]
By contrast, consider
$$f(x, y) = \frac{xy}{x^2 + y^2}$$
which has the same natural domain $A = \mathbb{R}^2 \setminus \{(0, 0)\}$, but this time the limit as $(x, y) \to (0, 0)$ does not exist. Converting to polar coordinates shows that a straight line path to the origin produces different limits depending on the angle that the line makes with the x-axis; for example, approaching along either axis produces a limit of 0 but the limit is $\frac{1}{2}$ if we approach along the line $y = x$. A computer graph of $z = \frac{xy}{x^2 + y^2}$ will attempt to display this "rupture" in the graph at the origin by shading, but the limitations of the rendering are easily revealed. Here are two planes parallel to the domain plane. The curves of intersection with the surface representing the graph are actually pairs of intersecting lines. If we approach $(0, 0)$ along the level curves corresponding to these lines the function approaches the values given by the heights of the planes.
[Figure: graph of $z = \frac{xy}{x^2 + y^2}$ with the planes $z = \frac{2}{5}$ and $z = \frac{1}{3}$]
Exercise. Show that the level curves of value $c$ are
$$y = \frac{1}{2c}\left(1 \pm \sqrt{1 - 4c^2}\right) x$$
Thus, the range of $f$ is $\left[-\frac{1}{2}, \frac{1}{2}\right]$. What happens when $c = \pm\frac{1}{2}$?
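A numerical experiment makes the contrast between the two rational functions concrete. The sketch below is illustrative only: the names `f`, `g`, `along_ray` and `level_slopes` are hypothetical, and sampling a few rays suggests, but does not prove, the limit statements above.

```python
import math

def f(x, y):
    """xy / (x^2 + y^2): no limit at the origin."""
    return x * y / (x * x + y * y)

def g(x, y):
    """x^2 y^2 / (x^2 + y^2): limit 0 at the origin."""
    return x * x * y * y / (x * x + y * y)

def along_ray(func, theta, r):
    """Value of func at polar position (r, theta)."""
    return func(r * math.cos(theta), r * math.sin(theta))

def level_slopes(c):
    """Slopes m with f(x, m x) = c, the roots of c m^2 - m + c = 0 (0 < |c| <= 1/2)."""
    root = math.sqrt(1 - 4 * c * c)
    return ((1 + root) / (2 * c), (1 - root) / (2 * c))

for r in (1e-1, 1e-3, 1e-6):
    # f stays at 1/2 on the line y = x and at 0 on the x-axis; g shrinks like r^2.
    print(r, along_ray(f, math.pi / 4, r), along_ray(f, 0.0, r), along_ray(g, math.pi / 4, r))

print(level_slopes(0.4))  # slopes of the two lines in the level set f = 2/5
```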
Similar techniques can be used to analyze functions of more than two variables. By converting to spherical coordinates
$$x = \rho\cos\theta\sin\phi$$
$$y = \rho\sin\theta\sin\phi$$
$$z = \rho\cos\phi$$
show that
$$\lim_{(x, y, z) \to (0, 0, 0)} f(x, y, z) = 0$$
for the function $f(x, y, z) = \frac{xyz}{x^2 + y^2 + z^2}$. Show that the limit as $(x, y, z) \to (0, 0, 0)$ does not exist for the function $f(x, y, z) = \frac{xy + yz + xz}{x^2 + y^2 + z^2}$. For the first function, $f(\rho, \theta, \phi) = \rho\cos\theta\sin\theta\sin^2\phi\cos\phi$ which goes to zero as the domain point approaches the origin since $\rho \to 0$ and $\cos\theta\sin\theta\sin^2\phi\cos\phi$ is bounded. For the second function, $f(\rho, \theta, \phi) = \left((\cos\theta + \sin\theta)\cos\phi + \cos\theta\sin\theta\sin\phi\right)\sin\phi$ which does not even depend on $\rho$ and approaches different values for various directions of approach to the origin in $\mathbb{R}^3$.
Properties of continuous functions.
Most of the continuity results for single-variable functions generalize to multivariable functions (see page 98). If the range of $f$ is in $\mathbb{R}^m$ then we have
$$f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x}))$$
and it can be shown that $f$ is continuous at $\mathbf{x}_0$ if and only if each of the real-valued coordinate functions $f_j$ is continuous at $\mathbf{x}_0$. Using the concept of open sets, the following theorem about continuity of the composition of continuous functions is usually proved in an analysis course:
Theorem. Let $g : A \subset \mathbb{R}^n \to \mathbb{R}^m$ and let $f : B \subset \mathbb{R}^m \to \mathbb{R}^p$, and suppose $g(A) \subset B$. Then $f \circ g$ is defined on $A$, and if $g$ is continuous at $\mathbf{x}_0$ and $f$ is continuous at $g(\mathbf{x}_0)$ then $f \circ g$ is continuous at $\mathbf{x}_0$.
Example. The natural domain of
$$h(x, y, z) = \frac{\sin(xyz)}{xyz}$$
is $\mathbb{R}^3$ with the coordinate planes removed (the eight open octants). Any point $\mathbf{x}_0$ on a coordinate plane is a boundary point of the domain because every open ball of positive radius centered at $\mathbf{x}_0$ contains points in the domain and points not in the domain. In particular, the origin $\mathbf{x}_0 = (0, 0, 0)$ is a boundary point. Does $\lim_{\mathbf{x} \to \mathbf{x}_0} h(\mathbf{x})$ exist? Changing coordinate systems is not much help here, but the above theorem allows us to find the limit easily.
Let $g(x, y, z) = xyz$ and let $f(t) = \frac{\sin t}{t}$. Then $g : \mathbb{R}^3 \to \mathbb{R}$ and $f : B \to \mathbb{R}$ where $B = \mathbb{R} \setminus \{0\}$. However,
$$\lim_{t \to 0} \frac{\sin t}{t} = 1$$
so we can extend $f$ to a continuous function on all of $\mathbb{R}$ by defining
$$f(0) = 1:$$
[Figure: graph of $f(t) = \frac{\sin t}{t}$, $f(0) = 1$]
Then $h = f \circ g$ has now been extended to include the origin in its domain. Since $g$ is a polynomial it is continuous at every point in $\mathbb{R}^3$. But $g(0, 0, 0) = 0$ and $f$ is continuous at 0, so by the theorem we have $f \circ g$ is continuous at $(0, 0, 0)$. In particular, $\lim_{\mathbf{x} \to \mathbf{x}_0} h(\mathbf{x}) = 1$.
Note. The above example is exercise 11b) on page 103. The answer in the appendix is a typo.
Partial Derivatives
The derivative of a multivariable function will be defined in a way that generalizes the slope of the tangent line to the graph of a single-variable function. The first step toward this generalization is to define the tangent plane to the graph of a real-valued function of two variables. That definition will require the idea of a partial derivative. Recall that a set $U \subset \mathbb{R}^n$ is open when for every $\mathbf{x}_0$ in $U$ there exists $r > 0$ such that $D_r(\mathbf{x}_0) \subset U$, where $D_r(\mathbf{x}_0) = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{x}_0| < r\}$. Note that if $n = 1$ then $D_r(x_0)$ is just the open interval of radius $r$ centered at $x_0$. When we denote a set by $U$, the assumption is that the set is open.
Let $f : U \subset \mathbb{R}^n \to \mathbb{R}$, $\mathbf{x} = (x_1, \ldots, x_n)$. The partial derivative of $f$ with respect to $x_j$ is the real-valued function, denoted $\frac{\partial f}{\partial x_j}$, where
$$\frac{\partial f}{\partial x_j}(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{e}_j) - f(\mathbf{x})}{h}$$
Thus, the domain of $\frac{\partial f}{\partial x_j}$ is the set of points in $\mathbb{R}^n$ for which the limit exists.
For most elementary functions it is not necessary to compute these limits explicitly. The ordinary rules of differentiation apply by treating the input variables other than $x_j$ as constants. For example, if $f(x, y, z) = \sin(xyz)$ then
$$\frac{\partial f}{\partial x_2}(\mathbf{x}) = \frac{\partial f}{\partial y}(x, y, z) = xz\cos(xyz)$$
As with single-variable functions, however, it is sometimes necessary to use the limit calculation. Consider the function
$$f(x, y) = \frac{xy^2}{x^2 + y^4}$$
for which $A = \mathbb{R}^2 \setminus \{\mathbf{0}\}$. We cannot define $f$ to be continuous at $(0, 0)$ because if we approach along any straight line the limit is 0, but if we approach along the parabola $x = ay^2$ the limit is $\frac{a}{a^2 + 1}$. Still, we can extend the domain to all of $\mathbb{R}^2$ by defining $f(0, 0)$ to be anything we like, say $f(0, 0) = 0$. Now we can try to
compute the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at the origin:
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{h \to 0} \frac{f(0 + h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$
$$\frac{\partial f}{\partial y}(0, 0) = \lim_{h \to 0} \frac{f(0, 0 + h) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$
This example shows that the partial derivatives of a function can exist at a point even if the function is not continuous there.
In this case, however, the partial derivative functions are not continuous at the origin. At any point other than $(0, 0)$ the usual rules of differentiation show
$$\frac{\partial f}{\partial x}(x, y) = -y^2\,\frac{x^2 - y^4}{(x^2 + y^4)^2}$$
$$\frac{\partial f}{\partial y}(x, y) = 2xy\,\frac{x^2 - y^4}{(x^2 + y^4)^2}$$
Exercise. Show that
$$\lim_{x \to 0} \frac{\partial f}{\partial x}(x, mx) = -m^2$$
$$\lim_{x \to 0} \frac{\partial f}{\partial y}(x, mx) = 2m$$
so for each partial derivative function the limit as $(x, y) \to (0, 0)$ does not exist. For each of these functions, the limiting value depends on the path of approach to the origin.
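The claims about $f(x, y) = \frac{xy^2}{x^2 + y^4}$ can be spot-checked numerically. The following sketch is an illustration rather than a proof; the function names `f`, `fx` and `fy` are hypothetical, the extension $f(0, 0) = 0$ is the one chosen in the text, and the sample values of $a$, $m$ and the step sizes are arbitrary.

```python
def f(x, y):
    """x y^2 / (x^2 + y^4), extended by f(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y**2 / (x**2 + y**4)

def fx(x, y):
    """Partial derivative with respect to x, valid away from the origin."""
    return -(y**2) * (x**2 - y**4) / (x**2 + y**4) ** 2

def fy(x, y):
    """Partial derivative with respect to y, valid away from the origin."""
    return 2 * x * y * (x**2 - y**4) / (x**2 + y**4) ** 2

h = 1e-8
print((f(h, 0) - f(0, 0)) / h, (f(0, h) - f(0, 0)) / h)  # difference quotients at the origin

a, y = 3.0, 1e-3
print(f(a * y**2, y), a / (a**2 + 1))   # value along the parabola x = a y^2

m, x = 2.0, 1e-5
print(fx(x, m * x), -m**2)              # approaches -m^2 along y = m x
print(fy(x, m * x), 2 * m)              # approaches 2m along y = m x
```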
Tangent Planes and Affine Approximation
For a single-variable function $y = f(x)$ that is differentiable at $x_0$ the equation of the tangent line to the graph at $(x_0, f(x_0))$ is
$$y = l(x) = f(x_0) + f'(x_0)(x - x_0)$$
The reason that $f'(x_0)$ is the slope of this tangent line is because
$$f(x) - l(x) = f(x) - f(x_0) - f'(x_0)(x - x_0)$$
and so
$$\lim_{x \to x_0} \frac{f(x) - l(x)}{x - x_0} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) = 0$$
by the definition of the derivative. In other words, the affine linear function $y = l(x)$ is a good approximation of $f$ near $x_0$. Thinking of $x$ as the parameter $t$ we can express every point on this line as
$$\mathbf{l}(t) = (x_0, f(x_0)) + t\left(\mathbf{i} + f'(x_0)\mathbf{j}\right)$$
that is,
$$x(t) = x_0 + t$$
$$y(t) = f(x_0) + f'(x_0)\,t$$
because $l(x_0 + t) = f(x_0) + f'(x_0)\,t$. In other words, the tangent line is the line through the point $(x_0, f(x_0))$ with direction vector $\mathbf{i} + f'(x_0)\mathbf{j}$. Note, for future reference, that $f'(x_0)\mathbf{i} - \mathbf{j}$ is a vector orthogonal to the direction vector for the tangent line.
Now consider a function of two variables $z = f(x, y)$ with partial derivatives defined at $(x_0, y_0)$, and look at the plane that contains the point $(x_0, y_0, f(x_0, y_0))$ with normal vector
$$\frac{\partial f}{\partial x}(x_0, y_0)\,\mathbf{i} + \frac{\partial f}{\partial y}(x_0, y_0)\,\mathbf{j} - \mathbf{k}$$
As we have seen, the Cartesian equation for this plane is
$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) - (z - f(x_0, y_0)) = 0$$
which is the graph of the affine linear function
$$z = l(x, y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0)$$
Definition. We call this plane the tangent plane to the graph of $f$ at $(x_0, y_0)$ provided
$$\lim_{(x, y) \to (x_0, y_0)} \frac{f(x, y) - l(x, y)}{|(x, y) - (x_0, y_0)|} = 0$$
In this case, the sections $x = x_0$ and $y = y_0$ for the function $l$ are lines tangent to the section curves for $f$ and we say that $f$ is differentiable at $(x_0, y_0)$.
Though the test for differentiability is technical many of the same results apply as for single-variable functions. For example, rational functions are differentiable at all points in the natural domain $A$. Consider
$$f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}$$
What is the tangent plane at $\left(-1, 3, \frac{12}{5}\right)$? Here, $f(-1, 3) = \frac{12}{5}$ and on $A = \mathbb{R}^2 \setminus \{\mathbf{0}\}$ we have
$$\frac{\partial f}{\partial x}(x, y) = y\,\frac{4x^2 y^2 + x^4 - y^4}{(x^2 + y^2)^2}$$
$$\frac{\partial f}{\partial y}(x, y) = -x\,\frac{4x^2 y^2 - x^4 + y^4}{(x^2 + y^2)^2}$$
Thus
$$\frac{\partial f}{\partial x}(-1, 3) = -\frac{33}{25}$$
$$\frac{\partial f}{\partial y}(-1, 3) = \frac{29}{25}$$
and so the tangent plane at $(-1, 3, f(-1, 3))$ is the graph of the function
$$l(x, y) = \frac{12}{5} - \frac{33}{25}(x + 1) + \frac{29}{25}(y - 3) = -\frac{1}{25}(33x - 29y + 60)$$
[Figure: tangent plane at $\left(-1, 3, \frac{12}{5}\right)$]
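The arithmetic in this example can be reproduced exactly with rational numbers. This is a minimal sketch: the names `f`, `fx`, `fy` and `tangent_plane` are hypothetical helpers, and the partial derivative formulas are simply those displayed above.

```python
from fractions import Fraction

def f(x, y):
    x, y = Fraction(x), Fraction(y)
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y):
    x, y = Fraction(x), Fraction(y)
    return y * (4 * x**2 * y**2 + x**4 - y**4) / (x**2 + y**2) ** 2

def fy(x, y):
    x, y = Fraction(x), Fraction(y)
    return -x * (4 * x**2 * y**2 - x**4 + y**4) / (x**2 + y**2) ** 2

def tangent_plane(x0, y0):
    """Return the affine function l(x, y) approximating f near (x0, y0)."""
    z0, a, b = f(x0, y0), fx(x0, y0), fy(x0, y0)
    return lambda x, y: z0 + a * (x - x0) + b * (y - y0)

l = tangent_plane(-1, 3)
print(f(-1, 3), fx(-1, 3), fy(-1, 3))
print(l(2, 5), -Fraction(1, 25) * (33 * 2 - 29 * 5 + 60))  # the two forms of l agree
```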
This graph looks "smooth" at the origin because the limit of $f$ is 0 there as can be seen by converting to polar coordinates. If we define $f(0, 0) = 0$ then $f$ becomes continuous at the origin. Will the function be differentiable there? We see from the above calculations that both partial derivatives exist at the origin:
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{h \to 0} \frac{f(0 + h, 0) - f(0, 0)}{h} = 0$$
$$\frac{\partial f}{\partial y}(0, 0) = \lim_{h \to 0} \frac{f(0, 0 + h) - f(0, 0)}{h} = 0$$
and so
$$l(x, y) = 0$$
is the approximation function at $(x_0, y_0) = (0, 0)$. Then
$$\lim_{(x, y) \to (0, 0)} \frac{f(x, y) - l(x, y)}{|(x, y) - (0, 0)|} = \lim_{(x, y) \to (0, 0)} \frac{f(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x, y) \to (0, 0)} \frac{xy(x^2 - y^2)}{(x^2 + y^2)^{3/2}} = \lim_{r \to 0} \frac{1}{4}\,r\sin 4\theta = 0$$
Thus the xy-plane is the tangent plane at the origin. The function $f$ is differentiable there.
This is the idea that generalizes to a definition of the derivative for functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. Some notation will simplify the statement of this definition. Since $f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x}))$ we can compute partial derivative functions for each of the component scalar functions
$$t_{ij} = \frac{\partial f_i}{\partial x_j}$$
and arrange them in a matrix $T = (t_{ij})$.
Since the output of $f$ is described by $m$ coordinate functions, each of which depends on $n$ inputs, the shape of the matrix $T$ is $m \times n$.
For any point $\mathbf{x}_0$ in $\mathbb{R}^n$ such that these partial derivatives all exist, the matrix $T$ represents a linear transformation
$$Df(\mathbf{x}_0) : \mathbb{R}^n \to \mathbb{R}^m$$
and we have the affine linear function
$$l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
Since $T$ is an $m \times n$ matrix we can evaluate $Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$ by expressing $\mathbf{x} - \mathbf{x}_0$ in the form of a matrix with shape $n \times 1$. The product is now an $m \times 1$ matrix, so we also express $f(\mathbf{x}_0)$ as an $m \times 1$ matrix for purposes of computation.
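We say that $f$ is differentiable at $\mathbf{x}_0 \in U$ provided
\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \frac{f(\mathbf{x}) - l(\mathbf{x})}{|\mathbf{x} - \mathbf{x}_0|} = 0 \]
The derivative of $f$ at the point $\mathbf{x}_0$ is the linear transformation $Df(\mathbf{x}_0)$, but since it is represented by the matrix $T$ we usually say $T = Df(\mathbf{x}_0)$ for purposes of computation. Let's find the derivative of $f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}$ at $\mathbf{x}_0 = (-1, 3)$. We have $\frac{\partial f}{\partial x}(-1, 3) = -\frac{33}{25}$ and $\frac{\partial f}{\partial y}(-1, 3) = \frac{29}{25}$ so the derivative is
\[ T = Df(\mathbf{x}_0) = \begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix} \]
Note that $l(x, y) = f(-1, 3) + Df(-1, 3)(x + 1, y - 3)$ and to evaluate the linear transformation at $(x + 1, y - 3)$ we write
\[ \begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix} \begin{bmatrix} x + 1 \\ y - 3 \end{bmatrix} \]
Thus,
\[ l(x, y) = \frac{12}{5} + \begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix} \begin{bmatrix} x + 1 \\ y - 3 \end{bmatrix} = -\frac{1}{25}(33x - 29y + 60), \]
as above.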
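When $m = 1$, as in our example, the linear function $Df$ is naturally associated with the vector
\[ \nabla f = \frac{\partial f}{\partial x_1}\mathbf{e}_1 + \cdots + \frac{\partial f}{\partial x_n}\mathbf{e}_n \]
called the gradient of $f$, whereby
\[ Df(\mathbf{x}_0)(\mathbf{h}) = \nabla f(\mathbf{x}_0) \cdot \mathbf{h} \]
for any $\mathbf{h} \in \mathbb{R}^n$. That is, we usually use the dot-product notation when $f$ is a scalar-valued function.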
As with single-variable functions, if $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{x}_0$ then it is continuous at $\mathbf{x}_0$. We have seen, however, that the partial derivatives can exist at a point without the function even being continuous at that point, let alone differentiable. Existence of the partial derivatives along with continuity of the function is also not enough:
\[ f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}, \qquad f(0, 0) = 0 \]
is continuous at $(0, 0)$, and $\frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0$, but the function $l(x, y) = 0$ does not satisfy the limit criterion since
\[ \lim_{(x,y) \to (0,0)} \frac{f(x, y) - l(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x,y) \to (0,0)} \frac{xy}{x^2 + y^2} \neq 0 \]
because this limit does not exist. In particular, the plane $z = l(x, y) = 0$, that is, the xy-plane, cannot be called the tangent plane at the origin. The following theorem bridges the gap:
Theorem. If all partial derivatives $\frac{\partial f_i}{\partial x_j}$ exist in an open set containing $\mathbf{x}_0$ and are continuous at $\mathbf{x}_0$, then $f$ is differentiable at $\mathbf{x}_0$.
Such functions are said to be of class $C^1$. The function in the example above is not $C^1$. The differentiable functions we encounter will be class $C^1$, but the converse of the theorem is not true: There exist differentiable functions with discontinuous partial derivatives, that is, the resulting affine linear function $l(\mathbf{x})$ may satisfy the limit criterion even though the partial derivatives are not continuous at the given point. As an example we need only look to a single-variable function, such as
\[ f(x) = x^2 \sin\frac{1}{x} \]
which can be made continuous at the origin by setting
\[ f(0) = \lim_{x \to 0} x^2 \sin\frac{1}{x} = 0 \]
Then
\[ f'(0) = \lim_{h \to 0} \frac{h^2 \sin\frac{1}{h} - 0}{h} = 0 \]
and if $x \neq 0$
\[ f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x} \]
However, $\lim_{x \to 0} \left( 2x \sin\frac{1}{x} - \cos\frac{1}{x} \right)$ does not exist, so the derivative exists at the origin (the function $l(x) = 0$ is its linear approximation) but the derivative is not continuous at the origin.
To summarize: The partial derivatives can exist at a point without the function being differentiable there. If the partial derivatives exist and are continuous at a point then the function will be differentiable there, but the function can be differentiable at a point without the partial derivatives being continuous there.
Functions with polynomial components are particularly good examples for practice in computing derivatives. Consider
\[ f(x, y) = (x^2 + y^2,\; xy,\; x^2 - y^2) \]
and let $\mathbf{x}_0 = (3, 4)$. Then $Df(\mathbf{x})$ is represented by
\[ T = \begin{bmatrix} 2x & 2y \\ y & x \\ 2x & -2y \end{bmatrix} \]
At $\mathbf{x}_0 = (3, 4)$ we have
\[ T = Df(\mathbf{x}_0) = \begin{bmatrix} 6 & 8 \\ 4 & 3 \\ 6 & -8 \end{bmatrix} \]
and so the affine linear approximation to $f$ near $(3, 4)$ is
\[ l(x, y) = \begin{bmatrix} 25 \\ 12 \\ -7 \end{bmatrix} + \begin{bmatrix} 6 & 8 \\ 4 & 3 \\ 6 & -8 \end{bmatrix} \begin{bmatrix} x - 3 \\ y - 4 \end{bmatrix} = \begin{bmatrix} 6x + 8y - 25 \\ 4x + 3y - 12 \\ 6x - 8y + 7 \end{bmatrix} \]
which represents the function $l(x, y) = (6x + 8y - 25,\; 4x + 3y - 12,\; 6x - 8y + 7)$.
The point $(3.01, 3.98)$ is "close" to $\mathbf{x}_0$ in the domain. We have
\[ l(3.01, 3.98) = (24.9,\; 11.98,\; -6.78) \]
\[ f(3.01, 3.98) = (24.901,\; 11.980,\; -6.7803) \]
The approximation of $f$ by $l$ will get better as the point in the domain gets closer to $\mathbf{x}_0$, but will get worse as the point gets farther away because $f$ itself has quadratic components.
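The comparison between $f$ and its affine approximation can be reproduced with a few lines of code. This is a minimal sketch; the function name `error` is an illustrative choice, and the matrix entries are the values of $Df(3, 4)$ computed in the example.

```python
def f(x, y):
    """The polynomial map f(x, y) = (x^2 + y^2, xy, x^2 - y^2)."""
    return (x * x + y * y, x * y, x * x - y * y)

def l(x, y):
    """Affine approximation of f at x0 = (3, 4): f(x0) + Df(x0)(x - x0)."""
    fx0 = (25.0, 12.0, -7.0)
    T = ((6.0, 8.0), (4.0, 3.0), (6.0, -8.0))
    h = (x - 3.0, y - 4.0)
    return tuple(fx0[i] + T[i][0] * h[0] + T[i][1] * h[1] for i in range(3))

def error(x, y):
    """Largest componentwise gap between f and l at (x, y)."""
    return max(abs(a - b) for a, b in zip(f(x, y), l(x, y)))

print(l(3.01, 3.98), f(3.01, 3.98))
print(error(3.01, 3.98), error(3.1, 3.8))
```

Moving ten times farther from $(3, 4)$ multiplies the error by about one hundred, which is the quadratic behavior described above.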
Optional Topic
If we suspect that a function is differentiable at a point where it can be made continuous, sometimes we can "guess" the affine linear function $l$ and test it directly. Consider $f(x, y) = \frac{\sin(xy)}{xy}$, which becomes continuous at $(0, 0)$ if we let $f(0, 0) = 1$. The graph looks very smooth at the origin and the symmetry leads us to believe that $z = 1$ is the tangent plane to the graph at $(0, 0, 1)$.
[Figure: graph of $z = \frac{\sin(xy)}{xy}$.]
We suspect, then, that $l(x, y) = 1$ will work:
\[ \lim_{(x,y) \to (0,0)} \frac{f(x, y) - l(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x,y) \to (0,0)} \frac{\frac{\sin(xy)}{xy} - 1}{\sqrt{x^2 + y^2}} = \lim_{r \to 0} \frac{\sin(r^2 \cos\theta \sin\theta) - r^2 \cos\theta \sin\theta}{r^3 \sin\theta \cos\theta} \]
If you are familiar with L'Hôpital's Rule, use it twice to obtain
\[ \lim_{r \to 0} \left( -\tfrac{2}{3}\, r \sin(2\theta) \sin(r^2 \cos\theta \sin\theta) \right) = 0 \]
Thus, $z = 1$ is, in fact, the tangent plane, and $Df(0, 0) = \begin{bmatrix} 0 & 0 \end{bmatrix}$. This implies
\[ \frac{\partial f}{\partial x}(0, 0) = 0, \qquad \frac{\partial f}{\partial y}(0, 0) = 0 \]
In fact, we can show that $z = 1$ is the tangent plane for any point where either $x = 0$ or $y = 0$ because we obtain a limit of 0 for arbitrary $y$ after setting up the difference quotient to calculate $\frac{\partial f}{\partial x}$ near $x = 0$. Since $f$ is symmetric in $x, y$ the same result holds for $\frac{\partial f}{\partial y}$. It follows that $f$ is differentiable at any point in $\mathbb{R}^2$ if we define $f(x, 0) = f(0, y) = 1$.
Paths and Curves in Rn
The vector/parametric description of a line is an example of a path whose image
is the line we want to describe. Just as many parametric representations describe
the same line, we can describe a curve by many paths.
A path in $\mathbb{R}^n$ is a function $c : A \subset \mathbb{R} \to \mathbb{R}^n$. The image of $A$ in $\mathbb{R}^n$ is the curve $C$ parameterized by the path $c$.
If $A$ is the closed interval $[a, b]$ we call $c(a)$ and $c(b)$ the endpoints of the curve. As with lines, we typically use $t$ to denote the domain variable (parameter)
\[ c(t) = (x_1(t), \ldots, x_n(t)) \]
because many applications require locating a point on a curve as a function of time. A familiar path is $c(t) = (\cos t, \sin t)$ for which the curve $C$ is the unit circle in $\mathbb{R}^2$. Note that the unit circle is also described by the path $c(t) = (\cos \omega t, \sin \omega t)$ for any non-zero constant $\omega$. (In applications, $\omega$ is sometimes called the angular speed of the path.) There are infinitely many paths for any given curve $C$. For example, $c(t) = (\cos t, \sin t)$ for $t \in U = \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$ describes the portion of the unit circle in the open half-plane $x > 0$. This curve has no endpoints since $U$ is an open interval. The same curve is described by the path
\[ c(t) = \left( \frac{2}{\sqrt{4 + t^2}},\; \frac{t}{\sqrt{4 + t^2}} \right), \qquad t \in \mathbb{R} \]
It is useful to think of $C$ being traced out by the vector whose position representation is $\overrightarrow{OP}$ where $P = c(t)$.
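[Figure: the curve traced by $c(t) = \left( \frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}} \right)$, with the point $c(2) = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right)$ marked.]
Exercise. For $c(t) = (x(t), y(t)) = \left( \frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}} \right)$ show that
\[ c'(t) = \tfrac{1}{2}\, x(t)^2 \left[ -y(t)\,\mathbf{i} + x(t)\,\mathbf{j} \right] \]
What is the maximum speed of the path?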
When $n > 2$ the path description of curves is essential because such a curve cannot be described by a single Cartesian equation. Consider the path
\[ c(t) = (\sin 2t,\; 2\sin^2 t,\; 2\cos t), \qquad t \in [0, 2\pi] \]
This is a closed curve (the endpoints exist and are the same) since $c(0) = c(2\pi)$. Note that
\[ x(t)^2 + y(t)^2 + z(t)^2 = 4 \]
so the curve $C$ is on the sphere of radius 2 centered at the origin. The path starts and ends at the "north pole", passing through each other point of $C$ exactly once except for $(0, 2, 0)$ which it reaches at $t = \frac{\pi}{2}$ and $t = \frac{3\pi}{2}$. For what value of $t$ does the path reach the "south pole"?
[Figure: the curve $C$ on the sphere of radius 2, with the directed line segment from the origin to $c\left(\frac{\pi}{3}\right)$.]
$c\left(\frac{\pi}{3}\right) = \left( \frac{\sqrt{3}}{2}, \frac{3}{2}, 1 \right)$ is the tip of the directed line segment from the origin as shown.
Tangents and Velocity
If the path $c$ is differentiable at $t$ we usually write
\[ c'(t) = (x_1'(t), \ldots, x_n'(t)) \]
for the derivative
\[ Dc(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix} \]
The derivative represents the tangent vector at $c(t)$, which is the velocity at this point if $t$ is the time parameter. Thus, the magnitude $|c'(t)|$ is the speed of the path at this point as it traces out the curve $C$. At certain points the speed might be 0, in which case the tangent vector is the zero vector $\mathbf{0}$. The cycloid curve described by the path
\[ c(t) = (t - \sin t,\; 1 - \cos t) \]
is differentiable at every point:
\[ c'(t) = (1 - \cos t,\; \sin t) = (1 - \cos t)\,\mathbf{i} + (\sin t)\,\mathbf{j} \]
but the tangent vector vanishes whenever $t = 2k\pi$. If you look at Figure 2.4.6 on page 119 it appears that $C$ has cusps at these points, but remember that $c'(t)$ is the velocity at time $t$, not the rate of change of $y$ with respect to $x$. In fact, as we will see by the Chain Rule,
\[ y'(x) = \cot\frac{t}{2} \]
which is not defined for even multiples of $\pi$. What is the maximum speed of this path? We have
\[ |c'(t)| = \sqrt{(1 - \cos t)^2 + (\sin t)^2} = \sqrt{2}\,\sqrt{1 - \cos t} \]
which has a maximum value of 2 when $t$ is an odd multiple of $\pi$.
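The speed and slope formulas for the cycloid can be spot-checked numerically. The sketch below uses illustrative function names (`speed`, `slope`) that are not part of the notes; it evaluates $|c'(t)|$ directly from the components of $c'(t)$ and compares $y'(t)/x'(t)$ with $\cot(t/2)$.

```python
import math

def speed(t):
    """|c'(t)| for the cycloid path c(t) = (t - sin t, 1 - cos t)."""
    return math.hypot(1.0 - math.cos(t), math.sin(t))

def slope(t):
    """dy/dx = y'(t) / x'(t); only defined where x'(t) = 1 - cos t is nonzero."""
    return math.sin(t) / (1.0 - math.cos(t))

# Sample one arch of the cycloid: the speed is 0 at t = 0 and peaks at t = pi.
samples = [2 * math.pi * k / 1000 for k in range(1001)]
print(max(speed(t) for t in samples))   # close to 2
print(slope(1.0), 1 / math.tan(0.5))    # both equal cot(1/2)
```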
Using the vector/parametric representation of a line we can easily find the tangent line to a path at $c(t_0)$, provided $c'(t_0) \neq \mathbf{0}$:
\[ l(t) = c(t_0) + (t - t_0)\, c'(t_0) \]
The direction vector for the line is $c'(t_0)$ and, of course, $l(t_0) = c(t_0)$. For the path $c(t) = (\sin 2t,\; 2\sin^2 t,\; 2\cos t)$ we have
\[ c'(t) = (2\cos 2t,\; 4\cos t \sin t,\; -2\sin t) \]
so, when $t_0 = \frac{\pi}{3}$ the tangent line is $l(t) = (x(t), y(t), z(t))$ with
\[ x(t) = \frac{\sqrt{3}}{2} + \frac{\pi}{3} - t \]
\[ y(t) = \frac{3}{2} - \frac{\sqrt{3}}{3}\pi + \sqrt{3}\,t \]
\[ z(t) = 1 + \frac{\sqrt{3}}{3}\pi - \sqrt{3}\,t \]
[Figure: the curve $C$ with its tangent line at $c\left(\frac{\pi}{3}\right)$.]
For this path, the tangent line can be constructed at any point since $c'(t) \neq \mathbf{0}$ for any value of $t$.
Chain Rule
We have defined the derivative of $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ to be a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ that gives us an affine-linear approximation to $f$ near any point $\mathbf{x}_0 \in \mathbb{R}^n$ where $f$ is differentiable. Consequently the usual linearity rules apply as for single-variable functions; see Theorem 10, (i) and (ii), page 125. Similar considerations show that the product and quotient rules for derivatives generalize as expected for real-valued functions; Theorem 10, (iii) and (iv). Our main goal now is to generalize the Chain Rule so that we can compute the derivative of the composition of functions. This is remarkably easy to develop once we look back at the single-variable case:
Let $g : U \subset \mathbb{R} \to \mathbb{R}$ and $f : V \subset \mathbb{R} \to \mathbb{R}$ with $g(U) \subset V$. If $g$ is differentiable at $x_0$ and $f$ is differentiable at $y_0 = g(x_0)$ then $f \circ g$ is differentiable at $x_0$ and
\[ (f \circ g)'(x_0) = f'(y_0)\, g'(x_0) \]
If we use the new notation we developed for derivatives this statement becomes
\[ D(f \circ g)(x_0) = Df(y_0)\, Dg(x_0) \]
which says that $T_{f \circ g}$ at $x_0$ is the product of $T_g$ at $x_0$ and $T_f$ at $g(x_0)$. The three linear transformations $T_{f \circ g}$, $T_f$ and $T_g$ at the respective points are very simple in this case because any linear transformation of $\mathbb{R}$ to itself is of the form $x \mapsto mx$ for some constant $m$. In all three cases the constants $m$ are just the slopes of the tangent lines to the respective graphs.
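Consider the single-variable example $g(x) = \tan x$ on $U = \left( 0, \frac{\pi}{2} \right)$ and let $f(x) = x^2$ on $V = \mathbb{R}$. Then $g(U) \subset V$. Since $(f \circ g)(x) = \tan^2 x$, at $x_0 = \frac{\pi}{3}$ we have
\[ (f \circ g)'\left(\tfrac{\pi}{3}\right) = f'\left(g\left(\tfrac{\pi}{3}\right)\right) g'\left(\tfrac{\pi}{3}\right) = 2\tan\tfrac{\pi}{3} \cdot \sec^2\tfrac{\pi}{3} = 2\sqrt{3} \cdot 4 = 8\sqrt{3} \]
which verifies the single-variable Chain Rule. We note now that $T_g(x_0)$ is the linear transformation $x \mapsto 4x$, $T_f(g(x_0))$ is the linear transformation $x \mapsto 2\sqrt{3}\,x$ (since $g\left(\frac{\pi}{3}\right) = \sqrt{3}$ and $f'(x) = 2x$), and $T_{f \circ g}(x_0)$ is the linear transformation $x \mapsto 8\sqrt{3}\,x$ (since $(f \circ g)'(x) = \frac{d \tan^2 x}{dx} = 2\tan x \sec^2 x$). We are therefore accustomed to seeing the Chain Rule as the product of real numbers
\[ 8\sqrt{3} = 2\sqrt{3}\,(4) \]
which, for single-variable functions, is just a product of slopes. But now we want to view it as the composition of linear functions
\[ T_{f \circ g}\left(\tfrac{\pi}{3}\right) = T_f\left(\sqrt{3}\right) \circ T_g\left(\tfrac{\pi}{3}\right) \]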
This is the insight that carries over to the general Chain Rule:
When $n = m = 1$ the composition of the linear maps on the right side is obtained as the product of two numbers. But in general the composition will be the product of two matrices, so order of multiplication is crucially important. The Chain Rule for general $n$, $m$ and $p$ becomes
\[ T_{f \circ g}(\mathbf{x}_0) = T_f(\mathbf{y}_0)\, T_g(\mathbf{x}_0) \]
equivalently
\[ D(f \circ g)(\mathbf{x}_0) = Df(\mathbf{y}_0)\, Dg(\mathbf{x}_0) \]
where $Dg(\mathbf{x}_0)$ is represented by an $m \times n$ matrix, $Df(\mathbf{y}_0)$ by a $p \times m$ matrix, and therefore $D(f \circ g)(\mathbf{x}_0)$ by a $p \times n$ matrix.
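Again, working with polynomial examples is easy because we do not need to worry about compatible domains. Suppose
\[ g(x, y) = (x^3 + y,\; xy - 1,\; x^4 + xy^2 + 2) \]
\[ f(x, y, z) = (3x^2 - z,\; x^3 + y^3 z) \]
Here, $n = 2$, $m = 3$, $p = 2$. Then
\[ Dg(\mathbf{x}) = \begin{bmatrix} 3x^2 & 1 \\ y & x \\ 4x^3 + y^2 & 2xy \end{bmatrix} \quad \text{and} \quad Df(\mathbf{x}) = \begin{bmatrix} 6x & 0 & -1 \\ 3x^2 & 3y^2 z & y^3 \end{bmatrix} \]
Let $\mathbf{x}_0 = (1, \sqrt{2}) \in \mathbb{R}^2$, so $g(\mathbf{x}_0) = (\sqrt{2} + 1,\; \sqrt{2} - 1,\; 5) = \mathbf{y}_0 \in \mathbb{R}^3$,
\[ Dg(\mathbf{x}_0) = \begin{bmatrix} 3 & 1 \\ \sqrt{2} & 1 \\ 6 & 2\sqrt{2} \end{bmatrix} \quad \text{and} \quad Df(\mathbf{y}_0) = \begin{bmatrix} 6\sqrt{2} + 6 & 0 & -1 \\ 6\sqrt{2} + 9 & -30\sqrt{2} + 45 & 5\sqrt{2} - 7 \end{bmatrix} \]
By the Chain Rule
\[ D(f \circ g)(\mathbf{x}_0) = \begin{bmatrix} 6\sqrt{2} + 6 & 0 & -1 \\ 6\sqrt{2} + 9 & -30\sqrt{2} + 45 & 5\sqrt{2} - 7 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ \sqrt{2} & 1 \\ 6 & 2\sqrt{2} \end{bmatrix} = \begin{bmatrix} 18\sqrt{2} + 12 & 4\sqrt{2} + 6 \\ 93\sqrt{2} - 75 & -38\sqrt{2} + 74 \end{bmatrix} \]
Exercise. Find $h = (f \circ g)(x, y)$ explicitly and calculate the derivative of $h$ at $(1, \sqrt{2})$ without using the Chain Rule.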
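The matrix product in this example can be cross-checked by differentiating the composition numerically. The following is a rough sketch rather than a substitute for the exercise: the helper names `matmul`, `chain_rule` and `finite_difference` are illustrative, and the central-difference step `eps` is an arbitrary choice.

```python
import math

def g(x, y):
    return (x**3 + y, x * y - 1.0, x**4 + x * y**2 + 2.0)

def f(x, y, z):
    return (3.0 * x**2 - z, x**3 + y**3 * z)

def Dg(x, y):
    """3 x 2 matrix of partial derivatives of g."""
    return [[3.0 * x**2, 1.0], [y, x], [4.0 * x**3 + y**2, 2.0 * x * y]]

def Df(x, y, z):
    """2 x 3 matrix of partial derivatives of f."""
    return [[6.0 * x, 0.0, -1.0], [3.0 * x**2, 3.0 * y**2 * z, y**3]]

def matmul(A, B):
    """Product of matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def chain_rule(x, y):
    """D(f o g)(x0) = Df(g(x0)) Dg(x0); the order of the factors matters."""
    return matmul(Df(*g(x, y)), Dg(x, y))

def finite_difference(x, y, eps=1e-6):
    """Central-difference estimate of the 2 x 2 matrix D(f o g)(x, y)."""
    def h(a, b):
        return f(*g(a, b))
    col_x = [(p - m) / (2 * eps) for p, m in zip(h(x + eps, y), h(x - eps, y))]
    col_y = [(p - m) / (2 * eps) for p, m in zip(h(x, y + eps), h(x, y - eps))]
    return [[col_x[i], col_y[i]] for i in range(2)]

s = math.sqrt(2.0)
print(chain_rule(1.0, s))
print(finite_difference(1.0, s))
```

The two printed matrices should agree to several decimal places with the exact entries $18\sqrt{2} + 12$, $4\sqrt{2} + 6$, $93\sqrt{2} - 75$ and $-38\sqrt{2} + 74$.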
Applications
Certain dimensions m; n; p are particularly important in applications:
1) Suppose $g$ is a path in $\mathbb{R}^3$ and $f$ gives temperature as a function of position in $\mathbb{R}^3$. Here, $n = 1$, $m = 3$, $p = 1$ so $f \circ g$ is a single-variable function of the parameter $t$, which could represent time. The derivative $(f \circ g)'(t)$ is the instantaneous rate of temperature change at the point on the curve corresponding to the time $t$. The Chain Rule takes a familiar form in this case. Consider again the path
\[ c(t) = (\sin 2t,\; 2\sin^2 t,\; 2\cos t) \]
and suppose the temperature in space is given by
\[ f(x, y, z) = e^{k - x - y - z} \]
for some constant $k$. Let $h(t) = f(c(t))$. Then
\[ \frac{dh}{dt} = h'(t) = Df(c(t))\, Dc(t) \]
but $Dc(t)$ is a $3 \times 1$ matrix and $Df(c(t))$ is $1 \times 3$, so this matrix product is more commonly written as
\[ h'(t) = \nabla f(c(t)) \cdot c'(t) \]
In this example, $\nabla f(x, y, z) = \left( -e^{k - x - y - z},\; -e^{k - x - y - z},\; -e^{k - x - y - z} \right)$.
What is the rate of temperature change along the path at time $t_0 = \frac{\pi}{3}$? Is the temperature increasing or decreasing at this time?
We are at the point $\left( \frac{\sqrt{3}}{2}, \frac{3}{2}, 1 \right)$ on the curve, so
\[ \nabla f(c(t_0)) = \left( -e^{k - \frac{5 + \sqrt{3}}{2}},\; -e^{k - \frac{5 + \sqrt{3}}{2}},\; -e^{k - \frac{5 + \sqrt{3}}{2}} \right) \]
We saw that $c'(t_0) = \left( -1, \sqrt{3}, -\sqrt{3} \right)$ so
\[ h'\left(\tfrac{\pi}{3}\right) = e^{k - \frac{5 + \sqrt{3}}{2}} \]
and since $e^{k - \frac{5 + \sqrt{3}}{2}} > 0$ the temperature is increasing as time increases.
2) A function $g : \mathbb{R}^n \to \mathbb{R}^n$ is sometimes called a vector field on $\mathbb{R}^n$. If $n = 3$ it is common to write the component functions of $g$ as
\[ g(x, y, z) = (u, v, w) \]
Then for $f : \mathbb{R}^3 \to \mathbb{R}$ and $h = f \circ g$ we have $h(x, y, z) = f(u, v, w)$, and $Dh$ is a $1 \times 3$ matrix. Consider the vector field $g$ with
\[ u(x, y, z) = xy, \qquad v(x, y, z) = yz, \qquad w(x, y, z) = xz \]
and suppose $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$. Then the function $h$ gives the magnitude of the vector $u\mathbf{i} + v\mathbf{j} + w\mathbf{k}$ assigned by $g$ to the point $(x, y, z)$ and $Dh(x, y, z)$ tells us how this magnitude is changing with respect to $x$, $y$ and $z$. Here,
\[ Dg(x, y, z) = \begin{bmatrix} y & x & 0 \\ 0 & z & y \\ z & 0 & x \end{bmatrix} \]
\[ Df(u, v, w) = \begin{bmatrix} \dfrac{u}{\sqrt{u^2 + v^2 + w^2}} & \dfrac{v}{\sqrt{u^2 + v^2 + w^2}} & \dfrac{w}{\sqrt{u^2 + v^2 + w^2}} \end{bmatrix} \]
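Now $f$ is differentiable provided $u^2 + v^2 + w^2 \neq 0$ and so the Chain Rule will give the derivative of $h$ at any point in $\mathbb{R}^3$ that is not on a coordinate axis:
\[ Df(u, v, w)\, Dg(x, y, z) = \begin{bmatrix} \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y} & \dfrac{\partial h}{\partial z} \end{bmatrix} = \frac{1}{\sqrt{x^2 y^2 + x^2 z^2 + y^2 z^2}} \begin{bmatrix} x(y^2 + z^2) & y(x^2 + z^2) & z(x^2 + y^2) \end{bmatrix} \]
For example, at $\mathbf{x}_0 = (3, -4, 5)$ the real-valued function $h$ is increasing in the $x$ and $z$ directions and decreasing in the $y$ direction because
\[ \frac{\partial h}{\partial x}(\mathbf{x}_0) = \frac{123}{\sqrt{769}}, \qquad \frac{\partial h}{\partial y}(\mathbf{x}_0) = -\frac{136}{\sqrt{769}}, \qquad \frac{\partial h}{\partial z}(\mathbf{x}_0) = \frac{125}{\sqrt{769}} \]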
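The three partial derivatives at $(3, -4, 5)$ can be reproduced from the formula for $Df\,Dg$. This is a small sketch with illustrative names (`h`, `Dh`); it assumes the point is off the coordinate axes so that the magnitude is nonzero.

```python
import math

def h(x, y, z):
    """Magnitude of the vector field g(x, y, z) = (xy, yz, xz)."""
    return math.sqrt((x * y)**2 + (y * z)**2 + (x * z)**2)

def Dh(x, y, z):
    """The 1 x 3 matrix Df(u, v, w) Dg(x, y, z), valid where h(x, y, z) != 0."""
    n = h(x, y, z)
    return (x * (y * y + z * z) / n,
            y * (x * x + z * z) / n,
            z * (x * x + y * y) / n)

# Signs show where the magnitude increases (+) or decreases (-).
print(Dh(3.0, -4.0, 5.0))
print(123 / math.sqrt(769), -136 / math.sqrt(769), 125 / math.sqrt(769))
```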
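3) As a third example suppose $n = 1$ and $p = m$. Then $g$ is a path $c$ and $f$ is a vector field on $\mathbb{R}^m$. The composition $f \circ g$ is another path $p(t) = f(c(t))$ in $\mathbb{R}^m$. By the Chain Rule
\[ p'(t) = Df(c(t))\, c'(t) \]
so the tangent vector of $c$ for a given $t_0$ is mapped to the tangent vector of $p$ for that value by the linear transformation whose matrix is $Df(c(t_0))$. For the example $c(t) = (\sin 2t,\; 2\sin^2 t,\; 2\cos t)$ we found $c'\left(\frac{\pi}{3}\right) = \left( -1, \sqrt{3}, -\sqrt{3} \right)$. Let $f(x, y, z) = (xy, yz, xz)$. Then
\[ Df(c(t)) = \begin{bmatrix} 2\sin^2 t & \sin 2t & 0 \\ 0 & 2\cos t & 2\sin^2 t \\ 2\cos t & 0 & \sin 2t \end{bmatrix} \]
so
\[ p'\left(\tfrac{\pi}{3}\right) = \begin{bmatrix} \frac{3}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 1 & \frac{3}{2} \\ 1 & 0 & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} -1 \\ \sqrt{3} \\ -\sqrt{3} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{3}}{2} \\ -\frac{5}{2} \end{bmatrix} \]
Thus, the tangent vector for $p$ is
\[ -\frac{\sqrt{3}}{2}\,\mathbf{j} - \frac{5}{2}\,\mathbf{k} \]
at the point $p\left(\frac{\pi}{3}\right) = \left( \frac{3\sqrt{3}}{4}, \frac{3}{2}, \frac{\sqrt{3}}{2} \right)$.
[Figure: the path $p(t)$ and its tangent line at $p\left(\frac{\pi}{3}\right)$.]
What is the direction of the velocity vector?
The path $p$ starts and ends at the origin, which it passes through four times, once for every time the path $c$ crosses a coordinate axis.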
Directional Derivatives
The definition of a partial derivative for a real-valued function $f$ is based on the idea of measuring the change in $f$ in the direction of a coordinate vector:
\[ \frac{\partial f}{\partial x_j}(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{e}_j) - f(\mathbf{x})}{h} \]
The vector $\mathbf{e}_j$ is just a unit vector in the direction of the $j$th coordinate axis. Similarly, we could ask for a derivative in the direction of any unit vector $\mathbf{v}$. This derivative may or may not exist, but if it does it should be possible, using the Chain Rule, to compute it in terms of a single parameter $t$. First we would compute
\[ \frac{d}{dt} f(\mathbf{x} + t\mathbf{v}) \]
and then evaluate this ordinary derivative at $t = 0$. We would call this number the directional derivative of $f$ at $\mathbf{x}$ in the direction $\mathbf{v}$. This is the formula we expect if we compute the limit
\[ \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h} \]
where the coordinate vector $\mathbf{e}_j$ has been replaced by any unit vector $\mathbf{v}$. Note that $\mathbf{x} + t\mathbf{v}$ is just the vector/parametric description of the line in $\mathbb{R}^n$ containing $\mathbf{x}$ with direction vector $\mathbf{v}$. Let $c(t) = \mathbf{x} + t\mathbf{v}$ so that $f(\mathbf{x} + t\mathbf{v}) = f(c(t))$. By the Chain Rule,
\[ \frac{d}{dt} f(c(t)) = \nabla f(c(t)) \cdot c'(t) \]
But $c'(t) = \mathbf{v}$ and $c(0) = \mathbf{x}$, so the evaluation of $\frac{d}{dt} f(\mathbf{x} + t\mathbf{v})$ at $t = 0$ is
\[ \nabla f(\mathbf{x}) \cdot \mathbf{v} \]
Thus, when $f$ is differentiable at $\mathbf{x}$ (so that the Chain Rule applies), the directional derivative is easily computed by taking the dot product of the direction vector $\mathbf{v}$ with the gradient of the function at $\mathbf{x}$.
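This has an immediate geometric consequence: If $\theta$ is the angle between $\nabla f(\mathbf{x})$ and $\mathbf{v}$ then
\[ \nabla f(\mathbf{x}) \cdot \mathbf{v} = |\nabla f(\mathbf{x})|\, |\mathbf{v}| \cos\theta = |\nabla f(\mathbf{x})| \cos\theta \]
so the directional derivative is maximized when $\nabla f(\mathbf{x})$ and $\mathbf{v}$ point in the same direction ($\theta = 0$) and is minimized when $\nabla f(\mathbf{x})$ and $\mathbf{v}$ point in the opposite direction ($\theta = \pi$). A simple example will illustrate this principle. Let $f(x, y) = x^2 - y^2$. Then
\[ \nabla f(x, y) = (2x, -2y) \]
The only point in $\mathbb{R}^2$ where $\nabla f = \mathbf{0}$ is $(0, 0)$. (Later we will refer to such a point as a critical point.) Thus the directional derivative at $\mathbf{x} = (0, 0)$ is
\[ \mathbf{0} \cdot \mathbf{v} = 0 \]
for any unit vector $\mathbf{v}$. Now consider a more typical point on the graph of $f$, such as $(3, 1, 8)$. If $\mathbf{v}$ is a unit vector in $\mathbb{R}^2$ then
\[ \mathbf{v} = (\cos\theta)\,\mathbf{i} + (\sin\theta)\,\mathbf{j} \]
for some $\theta$. Since $\nabla f(3, 1) = (6, -2)$ the directional derivative at $(3, 1)$ is
\[ 6\cos\theta - 2\sin\theta \]
so, among directions with $\cos\theta > 0$, the function $f$ is increasing where $\tan\theta < 3$ and decreasing where $\tan\theta > 3$ (the inequalities reverse when $\cos\theta < 0$).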
[Figure: $\nabla f(3, 1) \cdot \mathbf{v}$ as a function of $\theta$.]
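For the two directions (opposite each other) such that $\tan\theta = 3$ the directional derivative is 0 because $\mathbf{v}$ is orthogonal to $\nabla f(3, 1)$. The maximum directional derivative occurs for
\[ \mathbf{v} = \frac{1}{\sqrt{10}}(3\mathbf{i} - \mathbf{j}), \qquad \nabla f(3, 1) \cdot \mathbf{v} = 2\sqrt{10} \]
and the minimum directional derivative occurs for
\[ \mathbf{v} = \frac{1}{\sqrt{10}}(-3\mathbf{i} + \mathbf{j}), \qquad \nabla f(3, 1) \cdot \mathbf{v} = -2\sqrt{10} \]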
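The extreme and zero directions for this example are easy to confirm numerically. The sketch below uses illustrative function names (`grad_f`, `directional_derivative`) and evaluates $\nabla f(3, 1) \cdot \mathbf{v}$ for unit vectors $\mathbf{v} = (\cos\theta, \sin\theta)$.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 - y^2."""
    return (2.0 * x, -2.0 * y)

def directional_derivative(x, y, theta):
    """grad f(x, y) . v for the unit vector v = (cos theta, sin theta)."""
    gx, gy = grad_f(x, y)
    return gx * math.cos(theta) + gy * math.sin(theta)

theta_max = math.atan2(-2.0, 6.0)   # direction of grad f(3, 1) = (6, -2)
theta_zero = math.atan(3.0)         # a direction orthogonal to the gradient

print(directional_derivative(3.0, 1.0, theta_max), 2 * math.sqrt(10))
print(directional_derivative(3.0, 1.0, theta_max + math.pi))
print(directional_derivative(3.0, 1.0, theta_zero))
```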
One of the most important facts about the derivative of a real-valued function is that its gradient representation provides a vector that is orthogonal to level sets of the function. If $f : \mathbb{R}^3 \to \mathbb{R}$ then the level sets are surfaces in $\mathbb{R}^3$ and the gradient allows us to find the tangent plane at a point on the surface:
Theorem. If $f : \mathbb{R}^3 \to \mathbb{R}$ is $C^1$ and $\mathbf{x}_0 = (x_0, y_0, z_0)$ is on the level surface $S$ then $\nabla f(\mathbf{x}_0) \cdot c'(0) = 0$ for any smooth path in $S$ such that $c(0) = \mathbf{x}_0$.
This is because $f(c(t))$ is constant for all $t$ since $S$ is a level set, so its derivative with respect to $t$ is 0. In particular, when we evaluate this derivative using the Chain Rule at $t = 0$ we get
\[ \frac{d f(c(t))}{dt} = 0 = \nabla f(c(0)) \cdot c'(0) \]
Corollary. If $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$ then the tangent plane to $S$ at $\mathbf{x}_0$ can be defined by
\[ \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) = 0 \]
This theorem allows us to find tangent planes to surfaces that are not themselves the graphs of real-valued functions on $\mathbb{R}^2$, but which may be described as level sets of real-valued functions on $\mathbb{R}^3$. The ellipsoid $x^2 + 2y^2 + 3z^2 = 3$ is not the graph of a function because we cannot solve for $z$ without choosing square roots, but it is the level surface of value 3 for the function
\[ f(x, y, z) = x^2 + 2y^2 + 3z^2 \]
whose gradient is $\nabla f(x, y, z) = (2x, 4y, 6z)$. The point $\left( 1, \frac{\sqrt{2}}{2}, \frac{\sqrt{3}}{3} \right)$ is on the surface and the tangent plane there is
\[ \left( 2, 2\sqrt{2}, 2\sqrt{3} \right) \cdot \left( x - 1,\; y - \frac{\sqrt{2}}{2},\; z - \frac{\sqrt{3}}{3} \right) = 0 \]
\[ x + \sqrt{2}\,y + \sqrt{3}\,z = 3 \]
[Figure: the ellipsoid with its tangent plane at $\left( 1, \frac{\sqrt{2}}{2}, \frac{\sqrt{3}}{3} \right)$.]
Study Guide for Second Exam
Level Sets and Sections of Real-Valued Functions.
Sketch level curves for functions on $\mathbb{R}^2$.
Find section curves for functions on $\mathbb{R}^2$.
Determine level surfaces for functions on $\mathbb{R}^3$ of quadratic type (identify type of quadric surface).
Limits and Continuity.
Find limit (by changing to polar or spherical coordinates) of a real-valued function at a point in order to define $f$ to be continuous at that point.
Show that the limit of $f$ at a point does not exist by considering different paths of approach.
Derivatives.
Find the matrix representing $Df(\mathbf{x}_0)$ for $\mathbf{x}_0$ in $\mathbb{R}^n$ and $f(\mathbf{x}_0)$ in $\mathbb{R}^m$.
Find the affine-linear approximation at a point $\mathbf{x}_0$, $l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$.
Find the tangent plane at $(x_0, y_0, f(x_0, y_0))$ given by the affine approximation when $n = 2$, $m = 1$.
Find the tangent plane at $(x_0, y_0, z_0)$ on a level surface of $f : \mathbb{R}^3 \to \mathbb{R}$.
Apply the Chain Rule to find $D(f \circ g)(\mathbf{x}_0)$.
Use the gradient $\nabla f$ of a real-valued function to compute its directional derivative at a given point.
Paths and Curves.
Find the point on a curve $C$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ described by a path $c(t)$ at a particular value of $t$.
Find the tangent (velocity) vector for a path $c(t)$ for a particular value of $t$.
Apply the Chain Rule to $f \circ c$, using the gradient interpretation of $Df$ when $f$ is a real-valued function.
Iterated Partial Derivatives
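As we have discussed, if $f : U \subset \mathbb{R}^n \to \mathbb{R}$ then $Df$ is the function $\begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}$ whose evaluation at $\mathbf{x}_0$ in $\mathbb{R}^n$ is a $1 \times n$ matrix of numbers that represents a linear map from $\mathbb{R}^n$ to $\mathbb{R}$ by the formula
\[ (x_1, \ldots, x_n) \mapsto \begin{bmatrix} \frac{\partial f}{\partial x_1}(\mathbf{x}_0) & \cdots & \frac{\partial f}{\partial x_n}(\mathbf{x}_0) \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \]
Assume that $f$ is of class $C^1$. Then $Df(\mathbf{x}_0)$ is the derivative of $f$ at $\mathbf{x}_0$. In order to analyze the behavior of $f$ near $\mathbf{x}_0$ in greater detail we must at the very least look at the second partial derivatives, that is, the partial derivatives of the functions $\frac{\partial f}{\partial x_j}$. Then we can generalize results from single-variable calculus that required the second derivative. We will mostly be concerned with $n = 2$ or 3, but many applications require $n = 4$ in which case, to avoid unnecessary subscripts, we usually write
\[ x_1 = x, \qquad x_2 = y, \qquad x_3 = z, \qquad x_4 = t \]
It helps to think of $\frac{\partial f}{\partial x_j}$ as a function produced from $f$ by applying the differential operator $\frac{\partial}{\partial x_j}$. Then, with $n = 2$, the second-order partial derivative functions are
\[ \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial x^2} = f_{xx} \]
\[ \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial x} \right) = \frac{\partial^2 f}{\partial y \partial x} = f_{xy} \]
\[ \frac{\partial}{\partial x}\left( \frac{\partial f}{\partial y} \right) = \frac{\partial^2 f}{\partial x \partial y} = f_{yx} \]
\[ \frac{\partial}{\partial y}\left( \frac{\partial f}{\partial y} \right) = \frac{\partial^2 f}{\partial y^2} = f_{yy} \]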
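Apparently, if there are $n$ input variables then there are $n^2$ second-order partial derivative functions, so it is convenient to collect them all into an $n \times n$ matrix
\[ (s_{ij}) = \left( \frac{\partial^2 f}{\partial x_j \partial x_i} \right) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_1} \\ \vdots & & \vdots \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \]
When the derivatives are evaluated at $\mathbf{x}_0$ in $\mathbb{R}^n$ we obtain an $n \times n$ matrix of numbers called the Hessian matrix $Hf(\mathbf{x}_0)$ of $f$ at $\mathbf{x}_0$.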
Consider the function $f(x, y, z) = e^x \sin(y - z)$, for which
\[ \left( \frac{\partial^2 f}{\partial x_j \partial x_i} \right) = \begin{bmatrix} e^x \sin(y - z) & e^x \cos(y - z) & -e^x \cos(y - z) \\ e^x \cos(y - z) & -e^x \sin(y - z) & e^x \sin(y - z) \\ -e^x \cos(y - z) & e^x \sin(y - z) & -e^x \sin(y - z) \end{bmatrix} \]
The entries in this matrix are continuous everywhere and we can evaluate to obtain $Hf(\mathbf{x}_0)$. In this case the Hessian matrix at the origin is
\[ Hf(0, 0, 0) = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix} \]
Notice that $Hf(\mathbf{x}_0)$ in this example is symmetric (equal to its transpose) for any point $\mathbf{x}_0$. This will always be the case when $f$ is of class $C^2$, meaning that its second-order partial derivatives are continuous.
Theorem. If $f$ is of class $C^2$ then $\frac{\partial^2 f}{\partial x_j \partial x_i} = \frac{\partial^2 f}{\partial x_i \partial x_j}$ for all pairs of indices $i, j$.
Since the theorem works with pairs of indices it suffices to prove it for $n = 2$. As the proof on page 151 shows, the conclusion $\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial^2 f}{\partial x \partial y}$ follows from applying the ordinary Mean Value Theorem (MVT) from single-variable calculus twice.
MVT. Let $g : \mathbb{R} \to \mathbb{R}$ be differentiable on an open interval containing $x_0$ and $x_0 + \Delta x$. Then $g(x_0 + \Delta x) - g(x_0) = g'(x_1)\,\Delta x$ for some $x_1$ between $x_0$ and $x_0 + \Delta x$.
The conditions of the MVT apply because $f$ is of class $C^2$. This so-called equality of mixed partials can fail if $f$ is not of class $C^2$. In our discussion above on affine approximation we saw that
\[ f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}, \qquad f(0, 0) = 0 \]
is differentiable on all of $\mathbb{R}^2$. In fact, $f$ is of class $C^1$, even at the origin where the tangent plane to the graph is the xy-plane:
\[ f_x(x, y) = \frac{y\,(x^4 - y^4 + 4x^2 y^2)}{(x^2 + y^2)^2} \]
\[ f_y(x, y) = \frac{x\,(x^4 - y^4 - 4x^2 y^2)}{(x^2 + y^2)^2} \]
\[ f_x(0, 0) = f_y(0, 0) = 0 \]
Here $f_x = \frac{\partial f}{\partial x}$ and $f_y = \frac{\partial f}{\partial y}$. Away from the origin we have
\[ \frac{\partial^2 f}{\partial y \partial x}(x, y) = \frac{\partial^2 f}{\partial x \partial y}(x, y) = \frac{(x^4 + y^4 + 10x^2 y^2)(x^2 - y^2)}{(x^2 + y^2)^3} \]
But $f$ is not of class $C^2$ at the origin. In fact,
\[ \frac{\partial^2 f}{\partial y \partial x}(0, 0) = \lim_{h \to 0} \frac{f_x(0, h) - f_x(0, 0)}{h} = \lim_{h \to 0} \frac{-h - 0}{h} = -1 \]
\[ \frac{\partial^2 f}{\partial x \partial y}(0, 0) = \lim_{h \to 0} \frac{f_y(h, 0) - f_y(0, 0)}{h} = \lim_{h \to 0} \frac{h - 0}{h} = 1 \]
We sometimes say that $f$ is smooth at the origin to a first order approximation but not smooth at the second order. We will focus on $C^2$ functions in the study of local extrema where we will use the Hessian matrix to define a quadratic scalar function, the Hessian quadratic form.
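The failure of equality of mixed partials at the origin shows up directly in the difference quotients. The sketch below is illustrative only: the function names are hypothetical, and `fx` and `fy` implement the closed-form first partials given above, with the value 0 assigned at the origin.

```python
def fx(x, y):
    """First partial f_x of f(x, y) = xy(x^2 - y^2)/(x^2 + y^2), with f_x(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return y * (x**4 - y**4 + 4 * x**2 * y**2) / (x**2 + y**2)**2

def fy(x, y):
    """First partial f_y, with f_y(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * (x**4 - y**4 - 4 * x**2 * y**2) / (x**2 + y**2)**2

def d_dy_of_fx_at_origin(h):
    """Difference quotient for the y-derivative of f_x at (0, 0)."""
    return (fx(0.0, h) - fx(0.0, 0.0)) / h

def d_dx_of_fy_at_origin(h):
    """Difference quotient for the x-derivative of f_y at (0, 0)."""
    return (fy(h, 0.0) - fy(0.0, 0.0)) / h

for h in (1e-1, 1e-3, 1e-5):
    print(h, d_dy_of_fx_at_origin(h), d_dx_of_fy_at_origin(h))
```

The two quotients stay near $-1$ and $1$ for every step size, so the two mixed partials at the origin cannot agree.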
Suggested Exercises for Chapter 3
The following exercises are not to be handed in. They represent skills required for
basic mastery.
3.1 (pages 156-158):
1; 5; 7
11; 19
26; 28
3.2 (pages 165-166):
5; 9
3.3 (pages 182-185):
1; 5; 15; 17
25; 27; 29
35; 41
3.4 (pages 201-203):
1; 3; 13; 17
19; 21; 23; 37
Third Graded Assignment
Due: No later than December 3
To reinforce written communication skills the Graded Assignment solutions
should be complete and clearly presented in a "bluebook" or provided in PDF
format. Late papers will not be graded.
Third Graded Assignment. Do any one of the following:
Page 158: 30
Page 184: 44
Page 202: 24
Page 214: 40
Second-Order Approximation
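For a real-valued function $f$ differentiable at $\mathbf{x}_0$ in an open set $U \subset \mathbb{R}^n$ we defined the linear affine approximation of $f$ to be
\[ l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) \]
The claim that $f$ is differentiable at $\mathbf{x}_0$ means that
\[ \lim_{\mathbf{x} \to \mathbf{x}_0} \frac{f(\mathbf{x}) - l(\mathbf{x})}{|\mathbf{x} - \mathbf{x}_0|} = 0 \]
We say that $l(\mathbf{x})$ provides a first-order approximation of $f$ for $\mathbf{x}$ near $\mathbf{x}_0$. It is helpful to rewrite this formulation in terms of the difference
\[ \mathbf{h} = \mathbf{x} - \mathbf{x}_0, \qquad \mathbf{x} = \mathbf{x}_0 + \mathbf{h} \]
Then, $f(\mathbf{x}) - l(\mathbf{x}) = f(\mathbf{x}_0 + \mathbf{h}) - l(\mathbf{x}_0 + \mathbf{h}) = R_1(\mathbf{x}_0, \mathbf{h})$ where
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{R_1(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|} = 0 \]
We call $R_1(\mathbf{x}_0, \mathbf{h})$ the first-order remainder function near $\mathbf{x}_0$. It measures the difference between $f$ and its first-order approximation as a function of the difference vector $\mathbf{h} = \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ in the domain:
\[ f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix} + R_1(\mathbf{x}_0, \mathbf{h}) \]
Sometimes $Df(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ is written in gradient form $\nabla f(\mathbf{x}_0) \cdot \mathbf{h}$.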
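Now suppose $f$ is of class $C^2$ near $\mathbf{x}_0$. Then the Hessian matrix $H$ for $f$ consists of continuous second-order partial derivatives. When these are evaluated at $\mathbf{x}_0$ the resulting numerical matrix defines a quadratic function of $\mathbf{h}$:
\[ Hf(\mathbf{x}_0)(\mathbf{h}) = \frac{1}{2} \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix} Hf(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix} \]
$Hf(\mathbf{x}_0)(\mathbf{h})$ is called the Hessian (form) of $f$ at $\mathbf{x}_0$, and it is used to define the second-order approximation of $f$ for $\mathbf{x}$ near $\mathbf{x}_0$:
\[ q(\mathbf{x}_0 + \mathbf{h}) = l(\mathbf{x}_0 + \mathbf{h}) + Hf(\mathbf{x}_0)(\mathbf{h}) \]
\[ f(\mathbf{x}_0 + \mathbf{h}) - q(\mathbf{x}_0 + \mathbf{h}) = R_2(\mathbf{x}_0, \mathbf{h}) \]
and since $f$ is of class $C^2$ it follows that
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{R_2(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|^2} = 0 \]
that is, the second-order remainder divided by the square of the magnitude of the difference vector goes to 0 as $\mathbf{h}$ gets small.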
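As an example, look at $f(x, y, z) = e^z \cos x \sin y$ near $\mathbf{x}_0 = \left( 0, \frac{\pi}{2}, \ln 2 \right)$. We have $f(\mathbf{x}_0) = 2$ and
\[ Df(\mathbf{x}) = \begin{bmatrix} -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix} \]
\[ Hf(\mathbf{x}) = \begin{bmatrix} -e^z \cos x \sin y & -e^z \cos y \sin x & -e^z \sin x \sin y \\ -e^z \cos y \sin x & -e^z \cos x \sin y & e^z \cos x \cos y \\ -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix} \]
Then $f(\mathbf{x}_0 + \mathbf{h}) = q(\mathbf{x}_0 + \mathbf{h}) + R_2(\mathbf{x}_0, \mathbf{h})$ where
\[ q(\mathbf{x}_0 + \mathbf{h}) = q\left( h_1,\; \tfrac{\pi}{2} + h_2,\; \ln 2 + h_3 \right) = 2 + \begin{bmatrix} 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} = 2 + 2h_3 - h_1^2 - h_2^2 + h_3^2 \]
This polynomial in $h_1, h_2, h_3$ is the second-order approximation of $f(\mathbf{x}_0 + \mathbf{h})$, which is
\[ 2 e^{h_3} \cos h_1 \cos h_2 \]
For a small difference, such as $\mathbf{h} = (0.1, 0.1, 0.05)$, we find
\[ R_2(\mathbf{x}_0, \mathbf{h}) \approx -0.00091324 \]
so the quadratic polynomial is slightly greater than $f$ for this particular $\mathbf{h}$. A formula due to Lagrange expresses this remainder in terms of an integral so that its size can be estimated. Lagrange's formula applies to remainders of all orders, which are obtained from approximations using partial derivatives of corresponding orders.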
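The remainder quoted in this example can be reproduced in a few lines. This is a minimal sketch with illustrative function names (`q`, `remainder`); it evaluates $f$ and the quadratic polynomial at $\mathbf{x}_0 + \mathbf{h}$ and also shows the ratio $R_2 / |\mathbf{h}|^2$ shrinking as $\mathbf{h}$ is scaled down.

```python
import math

def f(x, y, z):
    return math.exp(z) * math.cos(x) * math.sin(y)

def q(h1, h2, h3):
    """Second-order approximation of f at x0 = (0, pi/2, ln 2), as a function of h."""
    return 2.0 + 2.0 * h3 - h1**2 - h2**2 + h3**2

def remainder(h1, h2, h3):
    """R2(x0, h) = f(x0 + h) - q(x0 + h)."""
    return f(h1, math.pi / 2 + h2, math.log(2.0) + h3) - q(h1, h2, h3)

print(remainder(0.1, 0.1, 0.05))

# Scale h toward 0: the ratio R2 / |h|^2 should tend to 0.
for s in (1.0, 0.1, 0.01):
    h = (0.1 * s, 0.1 * s, 0.05 * s)
    print(s, remainder(*h) / sum(c * c for c in h))
```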
Critical Points
The second-order approximation is used to analyze the extreme behavior of $f : U \subset \mathbb{R}^n \to \mathbb{R}$ at a point in $U$ where such behavior is likely to occur:
Definition. A point $\mathbf{x}_0 \in U$ is a critical point if either $f$ is not differentiable at $\mathbf{x}_0$ or if it is differentiable but $Df(\mathbf{x}_0) = \mathbf{0}$.
Suppose that $f$ is differentiable at $\mathbf{x}_0$ and that $f(\mathbf{x}) \le f(\mathbf{x}_0)$ for all $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$. Then, for any $\mathbf{h} \in \mathbb{R}^n$
\[ g(t) = f(\mathbf{x}_0 + t\mathbf{h}) \]
has a local maximum at $t = 0$, so by the Chain Rule
\[ g'(0) = Df(\mathbf{x}_0)\,\mathbf{h} = 0 \]
Since $\mathbf{h}$ is arbitrary we must have $Df(\mathbf{x}_0) = \mathbf{0}$. The same conclusion holds if $f(\mathbf{x}) \ge f(\mathbf{x}_0)$ for all $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$. We say in either case that $\mathbf{x}_0$ is a local extremum for $f$. Unless stated otherwise, we will assume $f$ is differentiable at $\mathbf{x}_0$ for purposes of analyzing extreme behavior. We now have:
Theorem. A necessary condition for $\mathbf{x}_0$ to be a local extremum is that $Df(\mathbf{x}_0) = \mathbf{0}$.
An example where $n = 2$: Let $f(x, y) = xy$. Then $Df(x, y) = \begin{bmatrix} y & x \end{bmatrix}$ (equivalently, $\nabla f(x, y) = y\mathbf{i} + x\mathbf{j}$). The only critical point is $\mathbf{x}_0 = (0, 0)$.
An example where $n = 3$: Let $f(x, y, z) = e^z \cos x \sin y$. Then
\[ Df(x, y, z) = \begin{bmatrix} -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix} \]
For $\mathbf{x}_0$ to be a critical point we must have
\[ \sin x \sin y = \cos x \cos y = \cos x \sin y = 0 \]
which happens only when
\[ \cos x = 0 \quad \text{and} \quad \sin y = 0 \]
Thus, there are infinitely many critical points, given for arbitrary integers $k, m$ by
\[ \left( (2k + 1)\frac{\pi}{2},\; m\pi,\; z \right) \]
Another example: Let $f(x, y, z) = e^z \sin(x + y)$, for which
\[ Df(x, y, z) = \begin{bmatrix} e^z \cos(x + y) & e^z \cos(x + y) & e^z \sin(x + y) \end{bmatrix} \]
This function does not have any critical points because $\cos(x + y)$ and $\sin(x + y)$ cannot simultaneously be zero.
Hessian Test for Local Extrema
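The idea of the Hessian test is to look at the second-order approximation of the function near $\mathbf{x}_0$. This approximation is a polynomial with terms of degree 1 and 2. The extrema of such functions are well-understood, in particular, the quadratic terms usually, but not always, determine the nature of the extrema of $f$. Recall that if $f$ is $C^2$ near $\mathbf{x}_0$ then
\[ q(\mathbf{x}_0 + \mathbf{h}) = l(\mathbf{x}_0 + \mathbf{h}) + Hf(\mathbf{x}_0)(\mathbf{h}) \]
\[ f(\mathbf{x}_0 + \mathbf{h}) - q(\mathbf{x}_0 + \mathbf{h}) = R_2(\mathbf{x}_0, \mathbf{h}) \]
\[ \lim_{\mathbf{h} \to \mathbf{0}} \frac{R_2(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|^2} = 0 \]
where $Hf(\mathbf{x}_0)(\mathbf{h}) = \frac{1}{2} \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix} Hf(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ is the Hessian quadratic form whose terms are all degree 2; in particular, we always have
\[ Hf(\mathbf{x}_0)(\mathbf{0}) = 0 \]
Basic linear algebra classifies quadratic forms as follows:
We say the form is definite provided $Hf(\mathbf{x}_0)(\mathbf{h}) = 0$ only when $\mathbf{h} = \mathbf{0}$: positive-definite if $Hf(\mathbf{x}_0)(\mathbf{h}) > 0$ for $\mathbf{h} \neq \mathbf{0}$, and negative-definite if $Hf(\mathbf{x}_0)(\mathbf{h}) < 0$ for $\mathbf{h} \neq \mathbf{0}$. We say the form is indefinite provided $Hf(\mathbf{x}_0)(\mathbf{h})$ can be either positive or negative depending on $\mathbf{h}$.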
The following examples illustrate the differences:
1) $f(x, y) = x^2 + y^2 - 3$. Since $Hf(\mathbf{x}) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ at any point $\mathbf{x}$ we have
\[ Hf(\mathbf{x})(\mathbf{h}) = h_1^2 + h_2^2 \]
which is non-negative for all $\mathbf{h}$ and equals zero only when $(h_1, h_2) = (0, 0)$. The form is positive-definite.
2) $f(x, y) = xy + 1$. Here $Hf(\mathbf{x}) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ at any point $\mathbf{x}$ and so
\[ Hf(\mathbf{x})(\mathbf{h}) = h_1 h_2 \]
which is positive if $h_1, h_2$ have the same sign but negative if they have opposite signs. The form is indefinite.
3) $f(x, y) = x^2 - 2xy + y^2 + x - y$. The Hessian is $\begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}$ at any point $\mathbf{x}$ and so
\[ Hf(\mathbf{x})(\mathbf{h}) = h_1^2 - 2h_1 h_2 + h_2^2 = (h_1 - h_2)^2 \]
This form is neither definite nor indefinite because it is non-negative for all $\mathbf{h}$ but equals zero whenever $h_1 = h_2$. Such a form is called semi-definite, but for purposes of analyzing critical points we will say that the form is degenerate.
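The three cases can be explored numerically by evaluating each Hessian form on unit vectors. The sketch below is only a sampling heuristic under stated assumptions, not a proof and not the determinant test: the names `hessian_form` and `classify`, the number of samples, and the tolerance `tol` are all illustrative choices. A quadratic form is homogeneous of degree 2, so its sign pattern on the unit circle determines its sign pattern everywhere.

```python
import math

def hessian_form(H, h):
    """Evaluate (1/2) h^T H h for a 2 x 2 matrix H (nested lists) and h = (h1, h2)."""
    h1, h2 = h
    return 0.5 * (H[0][0] * h1 * h1 + (H[0][1] + H[1][0]) * h1 * h2 + H[1][1] * h2 * h2)

def classify(H, samples=720, tol=1e-9):
    """Label the form from its values on sampled unit vectors (a rough numerical check)."""
    values = [hessian_form(H, (math.cos(2 * math.pi * k / samples),
                               math.sin(2 * math.pi * k / samples)))
              for k in range(samples)]
    lo, hi = min(values), max(values)
    if lo > tol:
        return "positive-definite"
    if hi < -tol:
        return "negative-definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    return "degenerate"

print(classify([[2, 0], [0, 2]]))     # Example 1
print(classify([[0, 1], [1, 0]]))     # Example 2
print(classify([[2, -2], [-2, 2]]))   # Example 3
```

Because of the tolerance, a form that is definite but extremely flat in some direction could be reported as degenerate; the sketch is meant for small hand-checkable examples like the ones above.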
86
The graphs $z = f(x, y)$ in the above examples are all quadric surfaces. Example 1 is a paraboloid with minimum $z$-value equal to $-3$. Example 2 is a hyperbolic paraboloid (a saddle-shaped surface). Example 3 is a parabolic cylinder with the infinitely many critical points $(x, \tfrac{1}{2} + x)$.
[Figure: the surface $z = x^2 - 2xy + y^2 + x - y$ with tangent plane $z = -\tfrac{1}{4}$]
In these examples we expect the Hessian form at any point to predict the local behavior of $f$ because the second-order approximation $q$ is equal to the function itself. However, for any function $f$ that is $C^3$ the remainder $R_2(x_0, h)$ satisfies the second-order limit test and can be estimated by Lagrange's formula. This provides the Hessian test:
Theorem. Let $x_0$ be a critical point of $f$. If $Hf(x_0)$ is positive-definite then $x_0$ is a relative minimum of $f$. If $Hf(x_0)$ is negative-definite then $x_0$ is a relative maximum of $f$.
This is our second-derivative test for local extrema. If $Hf(x_0)$ is indefinite we say that $x_0$ is of saddle type. If $Hf(x_0)$ is semi-definite the test itself cannot determine the nature of the critical point; often it can be determined by inspection. For example, the parabolic cylinder in Example 3 "bottoms out" on the line of critical points $y_0 = \tfrac{1}{2} + x_0$ where $f(x_0, y_0) = -\tfrac{1}{4}$.
Determinant Test for Quadratic Forms
Basic linear algebra also tells us how to classify the quadratic form in any dimension provided by $Hf(x_0)$. The matrix $Hf(x_0)$ is $n \times n$ and so we can compute its determinant. But we can also compute the determinants of all the upper-left square sub-matrices
$$[a_{11}], \quad \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \ldots, \quad Hf(x_0)$$
We will not have occasion to consider examples for $n > 3$, but these sub-determinants allow us to generalize the second-derivative test for any dimension $n$. Consider again the examples above:
1a) $f(x, y) = x^2 + y^2 - 3$. The only critical point is $x_0 = (0, 0)$. The fact that the Hessian form is positive-definite is determined by the two sub-determinants
$$\det [2] = 2, \qquad \det \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 4$$
both positive numbers: All sub-determinants positive means $Hf(x_0)$ is a positive-definite form. The point $(0, 0)$ is a local minimum.
1b) $f(x, y) = 3 - x^2 - y^2$. The only critical point is $x_0 = (0, 0)$. The sub-determinants are
$$\det [-2] = -2, \qquad \det \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} = 4$$
both non-zero but alternating in sign, beginning with a negative number. This signifies a negative-definite form. The point $(0, 0)$ is a local maximum.
2a) $f(x, y) = xy + 1$. The only critical point is $x_0 = (0, 0)$. The sub-determinants are
$$\det [0] = 0, \qquad \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1$$
These numbers fit into neither pattern 1a nor 1b, but $\det Hf(x_0) \neq 0$. This signifies an indefinite form. The point $(0, 0)$ is of saddle type.
2b) $f(x, y) = x^2 - y^2$. The only critical point is $x_0 = (0, 0)$. The sub-determinants are
$$\det [2] = 2, \qquad \det \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} = -4$$
The signs alternate but not in the manner of a negative-definite form. This form is also indefinite. The point $(0, 0)$ is of saddle type.
3) $f(x, y) = x^2 - 2xy + y^2 + x - y$. There is an entire line of critical points but at any of them
$$\det Hf(x_0) = 0$$
which means the form is degenerate despite the values of the smaller sub-determinants. The sub-determinant test is inconclusive, though we have seen that each of these critical points, in this example, is a minimum.
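The sub-determinant patterns in examples 1a through 3 can be checked mechanically. The following is a minimal sketch, not part of the original notes: the function names `det`, `leading_minors` and `classify` are hypothetical, and the determinant uses cofactor expansion, which is only sensible for the small matrices considered here.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def leading_minors(h):
    """Determinants of the upper-left k x k sub-matrices, k = 1..n."""
    return [det([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]


def classify(h):
    """Apply the sub-determinant test to a symmetric matrix h."""
    minors = leading_minors(h)
    if all(d > 0 for d in minors):
        return "positive-definite"
    # Negative-definite: signs alternate, starting with a negative number.
    if all((d < 0) if k % 2 == 0 else (d > 0) for k, d in enumerate(minors)):
        return "negative-definite"
    if minors[-1] != 0:
        return "indefinite"
    return "degenerate"


for name, h in [("1a", [[2, 0], [0, 2]]), ("1b", [[-2, 0], [0, -2]]),
                ("2a", [[0, 1], [1, 0]]), ("2b", [[2, 0], [0, -2]]),
                ("3", [[2, -2], [-2, 2]])]:
    print(name, leading_minors(h), classify(h))
```

The last branch reports "degenerate" whenever the full determinant vanishes, matching the convention of these notes that the test is then inconclusive.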
Examples with $n = 3$ will further demonstrate the sub-determinant analysis:
4) For $f(x, y, z) = \ln \dfrac{z^2 + 1}{x^2 + y^2 + 1}$,
$$Df(x, y, z) = \begin{bmatrix} \dfrac{-2x}{x^2 + y^2 + 1} & \dfrac{-2y}{x^2 + y^2 + 1} & \dfrac{2z}{z^2 + 1} \end{bmatrix}$$
so the only critical point is $(0, 0, 0)$. We suspect this is of saddle type because $f(0, 0, 0) = 0$ and any change in $x$ or $y$ produces a negative value whereas a change in $z$ produces a positive value. In fact,
$$Hf(x) = \begin{bmatrix} -2(x^2 + y^2 + 1)^{-2}(y^2 - x^2 + 1) & 4(x^2 + y^2 + 1)^{-2} yx & 0 \\ 4(x^2 + y^2 + 1)^{-2} yx & 2(x^2 + y^2 + 1)^{-2}(y^2 - x^2 - 1) & 0 \\ 0 & 0 & -2(z^2 + 1)^{-2}(z + 1)(z - 1) \end{bmatrix}$$
which becomes
$$\begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$
at the origin. The sub-determinants are $-2, 4, 8$ and so the origin is of saddle type.
5) Consider again $f(x, y, z) = \dfrac{\sin z}{1 + x^2 + y^2}$, for which
$$Df(x) = \begin{bmatrix} -2x \dfrac{\sin z}{(x^2 + y^2 + 1)^2} & -2y \dfrac{\sin z}{(x^2 + y^2 + 1)^2} & \dfrac{\cos z}{x^2 + y^2 + 1} \end{bmatrix}$$
Writing $D = x^2 + y^2 + 1$ to save space,
$$Hf(x) = \begin{bmatrix} -2D^{-3}(y^2 - 3x^2 + 1)\sin z & 8D^{-3}(\sin z)\, yx & -2D^{-2}(\cos z)\, x \\ 8D^{-3}(\sin z)\, yx & 2D^{-3}(3y^2 - x^2 - 1)\sin z & -2D^{-2}(\cos z)\, y \\ -2D^{-2}(\cos z)\, x & -2D^{-2}(\cos z)\, y & -D^{-1}\sin z \end{bmatrix}$$
The critical points are
$$x_0 = \left(0, 0, (2k + 1)\frac{\pi}{2}\right)$$
where we have
$$Hf(x_0) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad k \text{ even (local maximum)}$$
$$Hf(x_0) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad k \text{ odd (local minimum)}$$
Note that $f\left(0, 0, \tfrac{\pi}{2}\right) = 1$ and $f\left(0, 0, \tfrac{3\pi}{2}\right) = -1$, and in fact the range of $f$ is $[-1, 1]$.
Global and Constrained Extrema
We saw that the function $f(x, y, z) = \ln \dfrac{z^2 + 1}{x^2 + y^2 + 1}$ has a saddle point at the origin and no other critical points. Suppose we were only interested in the behavior of $f$ on the unit ball $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$. There can be no extreme behavior in the open ball, but we can ask for the maximum and minimum behavior on the bounding unit sphere $x^2 + y^2 + z^2 = 1$. For these points
$$f(x, y, z) = \ln \frac{2 - x^2 - y^2}{x^2 + y^2 + 1}$$
Since $\rho = 1$ on the unit sphere we can express $f$ in the form
$$f(\rho, \theta, \phi) = \ln \frac{2 - \sin^2 \phi}{1 + \sin^2 \phi}$$
Now we look for the extreme behavior of $\ln \dfrac{2 - \sin^2 \phi}{1 + \sin^2 \phi}$ on the closed interval $[0, \pi]$, which is the domain of $\phi$. At the endpoints we have
$$f(\rho, \theta, 0) = \ln 2 = f(\rho, \theta, \pi)$$
For the open interval $(0, \pi)$ we differentiate with respect to $\phi$ and find that possible extreme behavior occurs when
$$\sin 2\phi = 0$$
that is, $\phi = \tfrac{\pi}{2}$, and $f(\rho, \theta, \tfrac{\pi}{2}) = -\ln 2$. We conclude that the maximum value of $f$ on the unit sphere is $\ln 2$ and occurs at the "poles"
$$(x, y, z) = (0, 0, \pm 1)$$
whereas for every point on the "equator"
$$(x, y, z) = (\cos \theta, \sin \theta, 0)$$
$f$ achieves its minimum value of $-\ln 2$.
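The conclusion can be sanity-checked by sampling the restricted function over the polar angle. This is only a rough numerical sketch under the assumption that a uniform grid on $[0, \pi]$ is fine enough; the name `f_on_sphere` and the grid size are illustrative choices, not part of the notes.

```python
import math


def f_on_sphere(phi):
    """f restricted to the unit sphere, as a function of the polar angle phi."""
    s2 = math.sin(phi) ** 2
    return math.log((2 - s2) / (1 + s2))


n = 1000  # number of subintervals of [0, pi]; an arbitrary choice
samples = [f_on_sphere(math.pi * k / n) for k in range(n + 1)]
f_max, f_min = max(samples), min(samples)

print(f_max, math.log(2))    # maximum, attained at the poles
print(f_min, -math.log(2))   # minimum, attained on the equator
```

Sampling cannot prove the result, but it agrees with the derivative computation: the extreme values occur at $\phi = 0, \pi$ and $\phi = \pi/2$.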
This example is similar to the determination of global extrema of a single-variable function on a closed interval where it is necessary to check the values at the endpoints. However, when $n > 1$ the boundary set, such as the unit sphere in the above example, will typically consist of a curve, surface, or collection of such sets. If a subset of the domain is "compact" (closed and bounded) the following theorem from basic analysis applies:
If $D$ is a compact subset of $R^n$ and $f : D \to R$ is continuous then $f$ assumes its absolute maximum and minimum values at some points of $D$.
Finding extreme points on a boundary set is one type of constrained extrema problem. These problems can be very difficult and so techniques have been developed to deal with them in great generality. Most of these are based on methods due to Euler and Lagrange.
Method of Lagrange Multipliers
If $g : U \subset R^n \to R$ is $C^1$ and $x_0$ belongs to a level set $S$ then we can define the tangent space to $S$ at $x_0$ in the same way that we defined the tangent plane when $n = 3$. If $\nabla g(x_0) \neq 0$ this tangent space consists of all $x$ such that
$$\nabla g(x_0) \cdot (x - x_0) = 0$$
The geometry is the same: If $c(t)$ is a path in $S$ with $c(0) = x_0$ then $c'(0)$ is orthogonal to $\nabla g(x_0)$. Now suppose $f$ is another $C^1$ function on $U$ and we look at its behavior restricted to the level set $S$, which we can denote $f|S$. If $f|S$ has a local extremum at $x_0$ then $f(c(t))$ has an extremum at $t = 0$, so by the Chain Rule
$$\nabla f(x_0) \cdot c'(0) = 0$$
It follows that $\nabla f(x_0)$ is a scalar multiple of $\nabla g(x_0)$: For some real number $\lambda$
$$\nabla f(x_0) = \lambda \nabla g(x_0)$$
(Note that $\lambda = 0$ if $x_0$ happens to be a critical point for $f$ because $\nabla g(x_0) \neq 0$ but $\nabla f(x_0) = 0$ in this case, so it is a good idea to first find the critical points for $f$.) The extrema for $f|S$ are called constrained extrema of $f$. We can apply this reasoning to the example above with $f(x, y, z) = \ln \dfrac{z^2 + 1}{x^2 + y^2 + 1}$ on the unit sphere, which is the level set $S$ of value 1 for the function $g(x, y, z) = x^2 + y^2 + z^2$:
$$\nabla f(x, y, z) = -\frac{2x}{x^2 + y^2 + 1} i - \frac{2y}{x^2 + y^2 + 1} j + \frac{2z}{z^2 + 1} k$$
$$\nabla g(x, y, z) = 2x i + 2y j + 2z k$$
To find points $x_0$ such that $\nabla f(x_0) = \lambda \nabla g(x_0)$ we look at the three equations
$$-\frac{x}{x^2 + y^2 + 1} = \lambda x$$
$$-\frac{y}{x^2 + y^2 + 1} = \lambda y$$
$$\frac{z}{z^2 + 1} = \lambda z$$
First, suppose we can satisfy these equations with $z \neq 0$. Then $\lambda = \dfrac{1}{z^2 + 1}$ for such a point, so if either $x \neq 0$ or $y \neq 0$ we would have
$$\frac{1}{z^2 + 1} = -\frac{1}{x^2 + y^2 + 1}$$
which implies $x^2 + y^2 + z^2 + 2 = 0$. Since this is not possible we conclude that the only such points are $(0, 0, \pm 1)$ since these are the only two points on $S$ with $x = y = 0$. (From the third equation we see that $\lambda = \tfrac{1}{2}$ at these points.) Next, suppose $z = 0$. Then we have $x^2 + y^2 = 1$ and so $x$ and $y$ cannot both be zero. For all such points, which comprise the equator of the unit sphere, apparently $\lambda = -\tfrac{1}{2}$. Now that we have identified the possible constrained extrema of $f$ the values of $f$ at these points can be calculated and compared. Since the only critical point of $f$ on $R^3$ is $(0, 0, 0)$ and $f(0, 0, 0) = 0$ we conclude that the global extrema of $f$ on the closed unit ball occur on the bounding sphere where the maximum is $f(0, 0, \pm 1) = \ln 2$ and the minimum is $f(\cos \theta, \sin \theta, 0) = -\ln 2$. In this example, the closed unit ball is the union of the open unit ball and its boundary, which can be written as $U \cup \partial U$ where $\partial U$ is the level set $S$. (Since $g$ is a $C^1$ function and $\nabla g$ is never the zero vector on $S$ we say that the boundary $\partial U$ is smooth.)
This method of Lagrange multipliers can be used for more general constraint problems where it can identify possible candidates for extrema but not verify their existence. Typical applications arise in geometry when we need to identify configurations that maximize or minimize a certain property. For example, suppose we want to find the points on a quadric surface $ax^2 + by^2 + cz^2 = 1$ with $abc \neq 0$ that are closest to the origin in $R^3$. The type of quadric surface depends on the coefficients $a, b, c$ but we can pose the problem very generally. Let $g(x, y, z) = ax^2 + by^2 + cz^2$ so that the surface is the level set $S$ of value 1. The function to be minimized is given by the formula $\sqrt{x^2 + y^2 + z^2}$ but since it suffices to minimize the square of the distance it is easier to work with $f(x, y, z) = x^2 + y^2 + z^2$. Thus, we look for points on $S$ that satisfy
$$x = \lambda a x$$
$$y = \lambda b y$$
$$z = \lambda c z$$
for some $\lambda$. We know that the only critical point of $f$ is $(0, 0, 0)$ which is not on $S$, so we know that $\lambda \neq 0$. Also, unless $a = b = c$ (i.e., $S$ is a sphere and all points are equally close to the origin) we cannot have $xyz \neq 0$ for then the three equations would produce different values for $\lambda$ at such a point. Thus we can set each coordinate equal to zero and examine each case. For example, if $x = 0$ but $yz \neq 0$ then $b = c$ and each point $(0, y, z)$ with $y^2 + z^2 = \tfrac{1}{b}$ is a candidate (no such point will exist if $b < 0$). If $x = y = 0$ then the points $\left(0, 0, \pm \tfrac{1}{\sqrt{c}}\right)$ are possibilities, and no such point exists if $c < 0$. Similar analysis applies if $y = 0$ or $z = 0$, respectively. For example, the candidates on the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$ are $(\pm 1, 0, 0)$, $\left(0, \pm \tfrac{1}{\sqrt{2}}, 0\right)$ and $\left(0, 0, \pm \tfrac{1}{\sqrt{3}}\right)$; comparing distances, the closest of these are $\left(0, 0, \pm \tfrac{1}{\sqrt{3}}\right)$. The vectors from the origin to these points are parallel to the normal vectors to the tangent planes at these points.
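The case analysis for the ellipsoid can be mirrored in a few lines. This is a small sketch that assumes the coefficients $a, b, c$ are pairwise distinct, so that the only Lagrange candidates lie on the coordinate axes; the helper names `axis_candidates` and `dist` are hypothetical.

```python
import math


def axis_candidates(a, b, c):
    """Lagrange candidates on the coordinate axes for a*x^2 + b*y^2 + c*z^2 = 1."""
    points = []
    for axis, coeff in enumerate((a, b, c)):
        if coeff > 0:  # a negative coefficient gives no real point on that axis
            for sign in (1.0, -1.0):
                p = [0.0, 0.0, 0.0]
                p[axis] = sign / math.sqrt(coeff)
                points.append(tuple(p))
    return points


def dist(p):
    """Euclidean distance from the origin."""
    return math.sqrt(sum(t * t for t in p))


candidates = axis_candidates(1, 2, 3)   # the ellipsoid x^2 + 2y^2 + 3z^2 = 1
closest = min(dist(p) for p in candidates)
nearest_points = [p for p in candidates if abs(dist(p) - closest) < 1e-12]
print(nearest_points, closest)
```

For a hyperboloid such as `axis_candidates(1, 1, -1)` two coefficients coincide, so the distinctness assumption fails and a whole circle of candidates is missed; that case needs the separate argument given in the text.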
Exercise. Find the points closest to the origin on the surface $x^2 + y^2 - z^2 = 1$. Compare this to the result for the surface $x^2 + y^2 - z^2 = -1$.
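The examples above could all be done by inspection of the particular quadric surface. The Lagrange method is useful for finding constrained extrema that are not obvious. Consider the graph of $z = x^2 + \tfrac{1}{3}\sqrt{8}\, y^3 + 1$, which is the level set $S$ of value $-3$ for $g(x, y, z) = 3x^2 + \sqrt{8}\, y^3 - 3z$. To find the points closest to the origin we analyze the Lagrange equations (with constant factors absorbed into $\lambda$)
$$x = 2\lambda x$$
$$y = \sqrt{8}\, \lambda y^2$$
$$z = -\lambda$$
Since $(0, 0, 0)$ is not on $S$ we cannot have $\lambda = 0$, which this time gives us the important information that the closest points are not in the $xy$-plane. Since
$$z = -\lambda$$
we can rewrite the first two equations
$$x(1 + 2z) = 0$$
$$y\left(1 + \sqrt{8}\, yz\right) = 0$$
and examine the various cases:
1) $x = 0, y = 0$: Then $z = 1$ and so $(0, 0, 1)$ is a candidate.
2) $x = 0, y \neq 0$: Then we must have
$$1 + \sqrt{8}\, yz = 0$$
$$z = \tfrac{1}{3}\sqrt{8}\, y^3 + 1$$
which leads to
$$y^4 + \frac{3\sqrt{2}}{4} y + \frac{3}{8} = 0$$
This equation has two real solutions
$$y = -\frac{1}{4}\sqrt{6} \pm \frac{1}{4}\sqrt{4\sqrt{3} - 6}$$
and so we have two more candidates
$$\left(0,\ -\frac{1}{4}\left(\sqrt{6} + \sqrt{4\sqrt{3} - 6}\right),\ \frac{1}{4}\left(1 + \sqrt{3} - \left(1 + \frac{\sqrt{3}}{3}\right)\sqrt{2\sqrt{3} - 3}\right)\right)$$
$$\left(0,\ -\frac{1}{4}\left(\sqrt{6} - \sqrt{4\sqrt{3} - 6}\right),\ \frac{1}{4}\left(1 + \sqrt{3} + \left(1 + \frac{\sqrt{3}}{3}\right)\sqrt{2\sqrt{3} - 3}\right)\right)$$
3) $x \neq 0, y = 0$: Then
$$1 + 2z = 0$$
$$z = x^2 + 1$$
which has no solution.
4) $x \neq 0, y \neq 0$: Then
$$1 + 2z = 0$$
$$1 + \sqrt{8}\, yz = 0$$
which has the unique solution $y = \tfrac{1}{\sqrt{2}}$, $z = -\tfrac{1}{2}$ but no corresponding value for $x$ that will give a point on $S$.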
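Finally, we check the value of $f$ for the three points on $S$ that the Lagrange method has found:
$$f(0, 0, 1) = 1$$
$$f\left(0,\ -\frac{1}{4}\left(\sqrt{6} + \sqrt{4\sqrt{3} - 6}\right),\ \frac{1}{4}\left(1 + \sqrt{3} - \left(1 + \frac{\sqrt{3}}{3}\right)\sqrt{2\sqrt{3} - 3}\right)\right) = \frac{1}{12}\left(3 + 5\sqrt{3} - \left(3 - \sqrt{3}\right)\sqrt{2\sqrt{3} - 3}\right)$$
$$f\left(0,\ -\frac{1}{4}\left(\sqrt{6} - \sqrt{4\sqrt{3} - 6}\right),\ \frac{1}{4}\left(1 + \sqrt{3} + \left(1 + \frac{\sqrt{3}}{3}\right)\sqrt{2\sqrt{3} - 3}\right)\right) = \frac{1}{12}\left(3 + 5\sqrt{3} + \left(3 - \sqrt{3}\right)\sqrt{2\sqrt{3} - 3}\right)$$
The third value is larger than the second since $3 > \sqrt{3}$. The second value is approximately $0.89971 < 1$ and so there is a unique point on the graph closest to the origin. Its distance is
$$\sqrt{\frac{1}{12}\left(3 + 5\sqrt{3} - \left(3 - \sqrt{3}\right)\sqrt{2\sqrt{3} - 3}\right)} \approx 0.94853$$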
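The closed-form values above are easy to mistype, so a numerical cross-check is worthwhile. The sketch below is not from the notes: it finds the two real roots of the quartic in case 2 by bisection, with brackets chosen by inspecting sign changes, and evaluates the squared distance at each. The names `quartic`, `bisect` and `squared_distance` are illustrative.

```python
import math

SQRT8 = math.sqrt(8.0)


def quartic(y):
    """Left side of y^4 + (3*sqrt(2)/4)*y + 3/8 = 0 from case 2."""
    return y ** 4 + (3 * math.sqrt(2) / 4) * y + 3 / 8


def bisect(g, lo, hi, steps=200):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def squared_distance(y):
    """f(0, y, z) for the point of the surface above (0, y)."""
    z = SQRT8 * y ** 3 / 3 + 1
    return y * y + z * z


# Brackets [-1, -0.6] and [-0.6, -0.2] each contain one sign change.
roots = [bisect(quartic, -1.0, -0.6), bisect(quartic, -0.6, -0.2)]
values = [squared_distance(y) for y in roots]
print(roots, values, math.sqrt(min(values)))
```

The smaller value should agree with $0.89971$ and its square root with $0.94853$, while the other root gives a value larger than $f(0, 0, 1) = 1$.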
[Figure: the surface $z = x^2 + \tfrac{1}{3}\sqrt{8}\, y^3 + 1$]
The intersection of this surface with the $xy$-plane is the curve $x^2 + \tfrac{1}{3}\sqrt{8}\, y^3 + 1 = 0$ whose closest point to the origin is $\left(0, -\dfrac{\sqrt[3]{3}}{\sqrt{2}}\right)$, which can be found by setting $\dfrac{dy}{dx} = 0$. Note that $\dfrac{\sqrt[3]{3}}{\sqrt{2}} \approx 1.0198 > 0.94853$.
[Figure: the curve $x^2 + \tfrac{1}{3}\sqrt{8}\, y^3 + 1 = 0$ in the $xy$-plane]
Application: Snell’s Law of Refraction
Refer to figure 3.4.7 on page 202. The speed of light in a vacuum is $3 \times 10^{10}$ centimeters per second, but in another medium, such as water or glass, this speed can be slower. This explains the phenomenon of refraction as light passes from one medium where the speed is $v_1$ to another medium where the speed is $v_2$. As in exercise 28, suppose light is emitted at point $A$ and received at point $B$, with $d$ the distance between these two points. By Snell's Law, the light will not travel in a straight line from $A$ to $B$ unless its speed is the same in both media. Rather, the path taken will consist of two line segments so as to minimize the total travel time $T$. Since the distance from $A$ to the boundary between the two media along the first segment of the path is
$$d_1 = v_1 t_1$$
where $t_1$ is the time required to reach the boundary, and the length of the second segment is $d_2 = v_2 t_2$, where $t_2$ is the time required to travel from the boundary to point $B$, we can find possible paths by minimizing the function
$$T(t_1, t_2) = t_1 + t_2$$
after expressing $d$ in terms of $d_1$ and $d_2$. As in the figure, we can represent this path in terms of the angles $\theta_1$ and $\theta_2$. The angle between the two segments of the path is then
$$\theta = \pi - \theta_1 + \theta_2$$
whereby
$$\cos \theta = -\cos(\theta_1 - \theta_2)$$
By the Law of Cosines,
$$d^2 = d_1^2 + d_2^2 - 2 d_1 d_2 \cos \theta = (v_1 t_1)^2 + (v_2 t_2)^2 + 2(v_1 t_1)(v_2 t_2)\cos(\theta_1 - \theta_2)$$
Let $a$ and $b$ be the vertical distances from the boundary to $A$ and $B$, respectively. Then
$$\cos(\theta_1 - \theta_2) = \cos \theta_1 \cos \theta_2 + \sin \theta_1 \sin \theta_2 = \frac{a}{v_1 t_1} \frac{b}{v_2 t_2} + \sqrt{1 - \left(\frac{a}{v_1 t_1}\right)^2} \sqrt{1 - \left(\frac{b}{v_2 t_2}\right)^2}$$
Then
$$d^2 = t_1^2 v_1^2 + t_2^2 v_2^2 + 2ab + 2\sqrt{t_1^2 v_1^2 - a^2}\sqrt{t_2^2 v_2^2 - b^2} = g(t_1, t_2)$$
defines a level curve of value $d^2$ for this function $g$. We can use the method of Lagrange to minimize $T$ on this level set:
$$\nabla T(t_1, t_2) = i + j$$
$$\nabla g(t_1, t_2) = 2 t_1 v_1^2\, \frac{\sqrt{t_1^2 v_1^2 - a^2} + \sqrt{t_2^2 v_2^2 - b^2}}{\sqrt{t_1^2 v_1^2 - a^2}}\, i + 2 t_2 v_2^2\, \frac{\sqrt{t_1^2 v_1^2 - a^2} + \sqrt{t_2^2 v_2^2 - b^2}}{\sqrt{t_2^2 v_2^2 - b^2}}\, j$$
For these gradient vectors to be parallel we must have
$$\frac{\partial g}{\partial t_1} = \frac{\partial g}{\partial t_2}$$
and so
$$\frac{t_1 v_1^2}{\sqrt{t_1^2 v_1^2 - a^2}} = \frac{t_2 v_2^2}{\sqrt{t_2^2 v_2^2 - b^2}}$$
$$\frac{v_1}{\sqrt{1 - \left(\frac{a}{v_1 t_1}\right)^2}} = \frac{v_2}{\sqrt{1 - \left(\frac{b}{v_2 t_2}\right)^2}}$$
$$\frac{v_1}{v_2} = \frac{\sin \theta_1}{\sin \theta_2}$$
For a given ratio $\tfrac{v_1}{v_2}$, as determined by the media of transmission, this law identifies the relation between the angles required to minimize the travel time. Note, however, that $\theta_2$ is a function of $\theta_1$ and so both angles can be found from the given value $\tfrac{v_1}{v_2}$. To see this, let the origin $O$ be the intersection of the $x$-axis (boundary) with the segment of length $d$ from $A$ to $B$. Assume $A$ is in the second quadrant and $B$ in the fourth and let $r = |\overrightarrow{OA}|$ and $s = |\overrightarrow{OB}|$ so that $r + s = d$. Then, for some $t \in \left(\tfrac{\pi}{2}, \pi\right)$ we have $A = (r \cos t, r \sin t)$ and $B = (-s \cos t, -s \sin t)$. Assume the light ray crosses the boundary somewhere along the positive $x$-axis, so that $\tan \theta_1 > -\cot t$. It is now straightforward to show
$$\tan \theta_2 = -\frac{1}{s}\left(d \cot t + r \tan \theta_1\right)$$
and so
$$\sin^2 \theta_2 = \frac{(r \tan \theta_1 + d \cot t)^2}{(r \tan \theta_1 + d \cot t)^2 + s^2}$$
Since $r, s, d$ and $t$ are given, we can in principle solve for $\theta_1$ from the equation
$$\frac{v_1}{v_2} = \frac{\sqrt{s^2 + (r \tan \theta_1 + d \cot t)^2}\, \sin \theta_1}{|r \tan \theta_1 + d \cot t|}$$
For example, if $r = s$ then
$$\frac{v_1}{v_2} = \frac{\sqrt{1 + (\tan \theta_1 + 2 \cot t)^2}\, \sin \theta_1}{|\tan \theta_1 + 2 \cot t|}$$
and for $t = \tfrac{3\pi}{4}$ a ratio of $\tfrac{v_1}{v_2} = 1.3877$ yields
$$\theta_1 = 52.5^\circ$$
$$\theta_2 = \arctan\left(2 - \tan \tfrac{7\pi}{24}\right) \approx 34.9^\circ$$
in the above domain.
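The phrase "in principle solve" can be made concrete with a bisection search. The sketch below assumes $r = s = 1$ and $t = 3\pi/4$ as in the example, and assumes that $\sin\theta_1 / \sin\theta_2$ is increasing in $\theta_1$ on the interval $(\pi/4, \arctan 2)$ where both angles are positive; the function names are hypothetical.

```python
import math


def theta2(theta1, t, r=1.0, s=1.0):
    """Angle theta2 from tan(theta2) = -(d*cot(t) + r*tan(theta1)) / s, d = r + s."""
    d = r + s
    return math.atan(-(d / math.tan(t) + r * math.tan(theta1)) / s)


def ratio(theta1, t):
    """sin(theta1) / sin(theta2), which Snell's law equates to v1 / v2."""
    return math.sin(theta1) / math.sin(theta2(theta1, t))


def solve_theta1(target, t, lo, hi, steps=200):
    """Bisection, assuming ratio(., t) is increasing on (lo, hi)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if ratio(mid, t) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t = 3 * math.pi / 4
th1 = solve_theta1(1.3877, t, math.pi / 4, math.atan(2.0))
th2 = theta2(th1, t)
print(math.degrees(th1), math.degrees(th2))
```

The endpoints of the search interval correspond to a ray crossing at the origin (ratio 1) and a ray that leaves the boundary vertically (ratio unbounded), so every ratio greater than 1 has a solution in between.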
Acceleration and Force
For a path $c(t)$ that describes a trajectory as a differentiable function of time $t$ the tangent vector $c'(t)$ represents the velocity $v(t)$. At any time $t$ the magnitude $|v(t)|$ is called the speed. The second derivative $c''(t) = v'(t)$ is the acceleration $a(t)$. Many kinematical properties can be deduced from properties of the derivative, especially the Chain Rule. For example, suppose that the speed $s$ is constant but that the velocity is not necessarily constant due to changing direction. Then $|c'(t)| = s$ for all $t$ and so
$$s^2 = c'(t) \cdot c'(t)$$
Differentiating both sides:
$$0 = 2 a(t) \cdot v(t)$$
we find that the acceleration vector for the path is always orthogonal to the velocity vector. Similarly, it is easy to show that if $|c(t)|$ is constant then $v(t)$ is always orthogonal to $c(t)$, though the speed may vary; contrast the two examples
$$c(t) = (\cos t, \sin t), \quad t \in \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right)$$
and
$$c(t) = \left(\frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}}\right), \quad t \in (-\infty, \infty)$$
In both cases we have $|c(t)| = 1$ and both paths describe the same curve. In the first example the speed is constant. In the second example
$$s(t) = |v(t)| = \frac{2}{4 + t^2}$$
whereby $s(0) = \tfrac{1}{2}$ is the maximum speed.
The general rule for differentiating the dot product of two paths is
$$\frac{d}{dt}\left[p(t) \cdot c(t)\right] = p'(t) \cdot c(t) + p(t) \cdot c'(t)$$
which follows, along with similar rules for other products (see page 218), by direct computation.
In applications, it often happens that $v(t) = 0$ for certain times $t$. The resulting curve $C$ typically, but not always, has cusps at these points because the tangent vector is 0. If $v(t) \neq 0$ the path is regular at this point. A regular path is one for which the tangent vector never vanishes.
Newton's Second Law says that a mass $m$ is subject to a force $F$ that satisfies
$$F(c(t)) = m a(t)$$
Finding the actual path $c$ from a given variable force is therefore a matter of solving a second-order differential equation. This problem can be complicated but in certain cases we can obtain simple but powerful results. Newton realized that a mass moving on a circular path with constant speed $s$ must be subject to a force of constant magnitude directed toward the center of the circle. If the radius of the circle is $r_0$ this path turns out to be
$$c(t) = (r_0 \cos \omega t)\, i + (r_0 \sin \omega t)\, j$$
if we represent the circle in $R^2$, where $\omega = \tfrac{s}{r_0}$, the frequency of the motion. Differentiating, we verify that $v(t) = \omega r_0 \left(-(\sin \omega t)\, i + (\cos \omega t)\, j\right)$ and so $s = \omega r_0$. Differentiating again,
$$a(t) = -\omega^2 c(t)$$
and so the force
$$-m \omega^2 c(t)$$
has magnitude $m \omega^2 r_0 = \tfrac{m}{r_0} s^2$ (proportional to the square of the speed) and is directed toward the center of the circle (centripetal force). This deduction was turned into a law of gravitation by Newton: The trajectory of a mass $m$ in the vicinity of a mass $M$ is determined by
$$m a(t) = -\frac{GmM}{|c(t)|^3} c(t)$$
where $G$ is a universal constant. In the case of a satellite orbiting a massive body this trajectory will be very closely circular, so we can equate
$$-\frac{GmM}{r_0^3} c(t) = -m \omega^2 c(t)$$
and equating the magnitudes we obtain
$$s^2 = \frac{GM}{r_0}$$
The period $T$ of the motion is the time required to complete one orbit, so
$$sT = 2\pi r_0$$
so
$$\frac{GM}{r_0} = \left(\frac{2\pi r_0}{T}\right)^2$$
or
$$T^2 = \frac{4\pi^2}{GM} r_0^3$$
This is one of Kepler's Laws, which Newton validated mathematically: The square of the period is proportional to the cube of the radius.
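As a numerical illustration of the period formula, one can plug in rough values for the Earth. The constants below are approximate values supplied here as assumptions (they do not come from the notes), and the radius is the commonly quoted geostationary orbit radius, for which the period should be close to one sidereal day.

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2, approximate gravitational constant
M_EARTH = 5.972e24   # kg, approximate mass of the Earth


def circular_speed(r0, big_m=M_EARTH):
    """Speed s on a circular orbit of radius r0, from s^2 = G*M / r0."""
    return math.sqrt(G * big_m / r0)


def period(r0, big_m=M_EARTH):
    """Period T from T^2 = (4*pi^2 / (G*M)) * r0^3."""
    return 2 * math.pi * math.sqrt(r0 ** 3 / (G * big_m))


r_geo = 4.2164e7     # m, approximate geostationary orbit radius (assumed value)
T = period(r_geo)
print(T, T / 3600)   # seconds and hours
print(circular_speed(r_geo) * T, 2 * math.pi * r_geo)  # s*T versus 2*pi*r0
```

Quadrupling the radius multiplies the period by $4^{3/2} = 8$, which is the square-cube proportionality in Kepler's law.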
This point of view allows us to generalize the problem of finding the path of an object when the force varies continuously with time. For example, suppose an object is acted on by a force due to wind pressure that produces acceleration
$$a(t) = \left(30 \cos t,\ 20 \sin t,\ \tfrac{1}{2} t\right)$$
Suppose the initial position and velocity of the object are
$$c(0) = (x_0, y_0, z_0)$$
$$v(0) = 0$$
Then
$$v(t) = \left(30 \sin t,\ 20 - 20 \cos t,\ \tfrac{1}{4} t^2\right)$$
and so
$$c(t) = \left(x_0 + 30(1 - \cos t),\ y_0 + 20(t - \sin t),\ z_0 + \tfrac{1}{12} t^3\right)$$
If the mass starts at the origin then its trajectory for $0 \le t \le 15$ will be
[Figure: the trajectory $c(t) = \left(30 - 30 \cos t,\ 20t - 20 \sin t,\ \tfrac{1}{12} t^3\right)$]
This curve stays in the first octant but touches the $yz$-plane whenever $t = 2k\pi$. Note that the tangent vector is vertical when $t = 2k\pi$. The object is swirling upward in the wind.
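The two integrations can be checked without redoing them by hand: a central second difference of the claimed path should reproduce the given acceleration. The following is a sketch for the case where the object starts at the origin; the step size `h` and the sample times are arbitrary choices.

```python
import math


def a(t):
    """Given acceleration."""
    return (30 * math.cos(t), 20 * math.sin(t), t / 2)


def c(t):
    """Claimed trajectory starting at the origin with zero initial velocity."""
    return (30 * (1 - math.cos(t)), 20 * (t - math.sin(t)), t ** 3 / 12)


def second_difference(path, t, h=1e-4):
    """Central-difference estimate of path''(t), component by component."""
    return tuple((p1 - 2 * p0 + p2) / h ** 2
                 for p1, p0, p2 in zip(path(t + h), path(t), path(t - h)))


errors = [max(abs(x - y) for x, y in zip(second_difference(c, t), a(t)))
          for t in (0.5, 2.0, 7.0, 14.0)]
print(errors)  # each entry should be small compared with the size of a(t)
```

A finite-difference check of this kind only confirms the formula up to rounding and truncation error, so small nonzero discrepancies are expected.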
Suggested Exercises for Chapter 4
The following exercises are not to be handed in. They represent skills required
for basic mastery.
4.1 (pages 227-228): 1, 3, 6, 7, 9, 11, 13, 15
4.2 (pages 234-236): 1, 8
4.3 (pages 243-244): 9, 10, 17, 21, 27
4.4 (pages 258-260): 1, 10, 13, 14, 18, 20, 37, 41
Fourth Graded Assignment
Due: No later than December 8.
To reinforce written communication skills the Graded Assignment solutions
should be clearly presented in a "bluebook" or provided in PDF format. Late
papers will not be graded.
Fourth Graded Assignment. Do any one of the following:
Page 235: 16
Page 244: 20
Page 259: 22
Page 261: 34
Arc Length
Let $c : R \to R^n$ be a $C^1$ path. Imagine this path as the trajectory of a point whose position is given as a function of time $t$. Then $|c'(t)|$ is the speed of the point at time $t$ and the distance travelled by the point over the time interval $[a, b]$ is given by
$$\int_a^b |c'(t)|\, dt$$
If $c$ is one-to-one on $[a, b]$ then this distance is equal to the length of the curve between $c(a)$ and $c(b)$. For this reason the integral is said to define the arc length of this path on the interval, even if the trajectory passes through the same point more than once.
As an example, let $c(t) = (1 - \cosh t, \sinh t, t)$ on the interval $[0, \ln 8]$.
[Figure: the path $c(t) = (1 - \cosh t, \sinh t, t)$]
Then $c'(t) = (-\sinh t, \cosh t, 1)$ so
$$|c'(t)| = \sqrt{\sinh^2 t + \cosh^2 t + 1} = \sqrt{2} \cosh t$$
The arc length of this path on $[0, \ln 8]$ is
$$\sqrt{2} \int_0^{\ln 8} \cosh t\, dt = \sqrt{2}\,(\sinh \ln 8 - \sinh 0) = \frac{63}{16}\sqrt{2} \approx 5.5685$$
The differential
$$d\boldsymbol{\sigma} = dx_1 e_1 + \cdots + dx_n e_n$$
is sometimes called the infinitesimal displacement of a point moving along the path, and
$$d\sigma = \sqrt{x_1'(t)^2 + \cdots + x_n'(t)^2}\, dt$$
is called the differential of arc length. In the above example,
$$d\sigma = \sqrt{2} \cosh t\, dt$$
Note: The lower-case Greek letter sigma is sometimes used so that arc length is not confused with speed $s$. Then the formula above example 6 on page 232 would be written
$$\sigma'(t) = |c'(t)|$$
so as not to conflict with the definition $s(t) = |c'(t)|$.
The arc length integral can be used to find the geometric distance along the graph of a function $y = f(x)$ by noting that this curve is described by the path
$$c(t) = (t, f(t))$$
$$c'(t) = (1, f'(t))$$
$$d\sigma = \sqrt{1 + f'(t)^2}\, dt$$
For example, if $y = x^2$ then the length of the parabola between $x = a$ and $x = b$ is
$$\int_a^b \sqrt{1 + 4t^2}\, dt = \frac{1}{2} b \sqrt{4b^2 + 1} - \frac{1}{2} a \sqrt{4a^2 + 1} + \frac{1}{4} \ln \frac{2b + \sqrt{4b^2 + 1}}{2a + \sqrt{4a^2 + 1}}$$
For $a = 0$ this reduces to $\tfrac{1}{2} b \sqrt{4b^2 + 1} + \tfrac{1}{4} \ln\left(2b + \sqrt{4b^2 + 1}\right)$, so the arc of the parabola from $(0, 0)$ to $(1, 1)$ measures
$$\frac{1}{2}\sqrt{5} + \frac{1}{4} \ln\left(\sqrt{5} + 2\right) \approx 1.4789$$
which is greater than $\sqrt{2}$ as expected.
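Both arc lengths can be confirmed by numerical integration of the speed. The sketch below uses a composite Simpson rule written from scratch; the function names and the number of subintervals are illustrative choices rather than anything prescribed by the notes.

```python
import math


def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3


def speed_hyperbolic(t):
    """|c'(t)| for c(t) = (1 - cosh t, sinh t, t)."""
    return math.sqrt(math.sinh(t) ** 2 + math.cosh(t) ** 2 + 1)


def speed_parabola(t):
    """|c'(t)| for c(t) = (t, t^2)."""
    return math.sqrt(1 + 4 * t * t)


length_path = simpson(speed_hyperbolic, 0.0, math.log(8))
length_parabola = simpson(speed_parabola, 0.0, 1.0)
print(length_path, 63 * math.sqrt(2) / 16)
print(length_parabola, math.sqrt(5) / 2 + math.log(math.sqrt(5) + 2) / 4)
```

The same `simpson` helper works for any path whose speed can be evaluated pointwise, which is useful when the arc length integral has no elementary antiderivative.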
Vector Fields and Flow Lines
Functions $f : R^n \to R^n$ are called vector fields when the emphasis is on the vector properties of the output, i.e., the input point $x_0$ is associated with a vector represented by a directed segment with tail at $x_0$. This is useful when $f$ represents a vector quantity such as velocity or force. For $n = 3$ the usual notation is
$$F(x, y, z) = (F_1, F_2, F_3)$$
where the $F_j$ are the scalar coordinate functions of the output. The same notation is used for planar vector fields, that is, when $n = 2$. This point of view suggests a close connection with paths. In particular, it is often possible to find paths whose tangent vectors are produced by a given vector field on a region of space containing the path. In other words, if
$$c'(t) = F(c(t))$$
the path $c$ is called a flow line for the vector field $F$. The generic mathematical term for flow line is integral curve, since the definition makes sense in any dimension with or without physical context. As an example, let
$$F(x, y, z) = \left(\frac{1}{z}(x - yz),\ \frac{1}{z}(y + xz),\ \frac{1}{z}\sqrt{x^2 + y^2}\right), \quad z > 0$$
Then the path $c(t) = (t \cos t, t \sin t, t)$ for $t > 0$ is a flow line for $F$ since
$$F(t \cos t, t \sin t, t) = (\cos t - t \sin t,\ \sin t + t \cos t,\ 1) = c'(t)$$
If $F$ represented velocity of a fluid then, for any $t_0 > 0$, a particle placed at $(t_0 \cos t_0, t_0 \sin t_0, t_0)$ would follow this path for $t \ge t_0$.
[Figure: the flow line $c(t) = (t \cos t, t \sin t, t)$]
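The flow-line condition $c'(t) = F(c(t))$ is simple to test numerically at sample times. This is a minimal sketch with illustrative names; it evaluates the field along the path and compares with the derivative computed by hand above.

```python
import math


def F(x, y, z):
    """The vector field of the example, defined for z > 0."""
    rho = math.sqrt(x * x + y * y)
    return ((x - y * z) / z, (y + x * z) / z, rho / z)


def c(t):
    """Candidate flow line."""
    return (t * math.cos(t), t * math.sin(t), t)


def c_prime(t):
    """Derivative of c, computed by hand with the product rule."""
    return (math.cos(t) - t * math.sin(t), math.sin(t) + t * math.cos(t), 1.0)


def max_mismatch(times):
    """Largest componentwise difference between F(c(t)) and c'(t)."""
    return max(abs(u - v) for t in times for u, v in zip(F(*c(t)), c_prime(t)))


mismatch = max_mismatch([0.1 * k for k in range(1, 80)])
print(mismatch)  # rounding error only
```

Agreement at finitely many times is evidence, not proof; the algebraic identity in the text is what establishes the flow line.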
Divergence and Curl of Vector Fields on R3
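The gradient of a scalar-valued function on $R^3$ produces a vector field on $R^3$
$$\nabla f(x, y, z) = \left(\frac{\partial f}{\partial x}(x, y, z),\ \frac{\partial f}{\partial y}(x, y, z),\ \frac{\partial f}{\partial z}(x, y, z)\right)$$
Vector fields that are gradients of scalar functions are called conservative because they describe conservation laws in physics. Consider two examples
$$F(x, y, z) = (xy, yz, xz)$$
$$G(x, y, z) = (yz, xz, xy)$$
Then $G(x, y, z) = \nabla g(x, y, z)$, where $g(x, y, z) = xyz$. However, if $F(x, y, z) = \nabla f(x, y, z)$ for some scalar function $f$ then
$$\frac{\partial f}{\partial x}(x, y, z) = xy$$
$$\frac{\partial f}{\partial y}(x, y, z) = yz$$
$$\frac{\partial f}{\partial z}(x, y, z) = xz$$
and so
$$f(x, y, z) = \tfrac{1}{2} x^2 y + \phi_1(y, z) = \tfrac{1}{2} y^2 z + \phi_2(x, z) = \tfrac{1}{2} x z^2 + \phi_3(x, y)$$
But then $\phi_3 = \tfrac{1}{2} x^2 y + \phi_1 - \tfrac{1}{2} x z^2$, which is not possible since $\phi_1$ is a function only of $y$ and $z$ and so cannot cancel the term $\tfrac{1}{2} x z^2$ to leave a function of $x$ and $y$ alone. Thus $F$ is not a gradient vector field.
Theorem. Let $F$ be a vector field on $R^3$. If $F = \nabla f$ for some $C^2$ scalar function $f$ then
$$\left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right) i + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right) j + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) k = 0$$
Proof. Since $F_j = \dfrac{\partial f}{\partial x_j}$, this follows immediately by the equality of mixed partial derivatives for the $C^2$ function $f$.
The vector field $\left(\dfrac{\partial F_3}{\partial y} - \dfrac{\partial F_2}{\partial z},\ \dfrac{\partial F_1}{\partial z} - \dfrac{\partial F_3}{\partial x},\ \dfrac{\partial F_2}{\partial x} - \dfrac{\partial F_1}{\partial y}\right)$ is called the curl (or rotational) of $F$. It is denoted
$$\nabla \times F$$
because it is the cross-product of the gradient operator with the vector field $F$. For $G(x, y, z) = (yz, xz, xy)$ we have $\nabla \times G = 0$, but for $F(x, y, z) = (xy, yz, xz)$
$$\nabla \times F = -y\, i - z\, j - x\, k$$
The curl operator measures the tendency of a vector field to circulate around a given point. For $F(x, y, z) = \left(\tfrac{1}{z}(x - yz),\ \tfrac{1}{z}(y + xz),\ \tfrac{1}{z}\sqrt{x^2 + y^2}\right)$
$$\nabla \times F = \frac{y\left(z + \sqrt{x^2 + y^2}\right)}{z^2 \sqrt{x^2 + y^2}}\, i - \frac{x\left(z + \sqrt{x^2 + y^2}\right)}{z^2 \sqrt{x^2 + y^2}}\, j + 2k$$
The magnitude of this curl becomes very large as $(x, y, z) \to (0, 0, 0)$.
Vector fields are also analyzed by their tendency to expand or contract in the vicinity of a given point. The first derivatives of the coordinate functions allow us to define the divergence of $F$ as the scalar function
$$\nabla \cdot F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}$$
which is formally the dot product of the gradient operator with the vector field $F$. For $F(x, y, z) = (xy, yz, xz)$ the divergence is $\nabla \cdot F(x, y, z) = x + y + z$. For $G(x, y, z) = (yz, xz, xy)$ the divergence is $\nabla \cdot G(x, y, z) = 0$. Note that $G$ is the curl of the vector field
$$A(x, y, z) = \frac{1}{4}\left(x\left(z^2 - y^2\right),\ y\left(x^2 - z^2\right),\ z\left(y^2 - x^2\right)\right)$$
Theorem. Let $G$ be a vector field on $R^3$. If $G = \nabla \times A$ for some vector field $A$ with $C^2$ coordinate functions then
$$\nabla \cdot G = 0$$
Proof. This is also immediate from equality of the mixed partial derivatives of the coordinate functions for $A$.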
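The curl and divergence computations in this section can be spot-checked with central differences. The sketch below is an illustration only: the helper names `partial`, `curl` and `div`, the step size, and the test point are arbitrary choices, and finite differences give approximations rather than exact derivatives.

```python
def partial(field, i, j, p, h=1e-5):
    """Central-difference estimate of d(field_i)/d(x_j) at the point p."""
    hi = list(p)
    lo = list(p)
    hi[j] += h
    lo[j] -= h
    return (field(*hi)[i] - field(*lo)[i]) / (2 * h)


def curl(field, p):
    return (partial(field, 2, 1, p) - partial(field, 1, 2, p),
            partial(field, 0, 2, p) - partial(field, 2, 0, p),
            partial(field, 1, 0, p) - partial(field, 0, 1, p))


def div(field, p):
    return sum(partial(field, i, i, p) for i in range(3))


def F(x, y, z):
    return (x * y, y * z, x * z)


def G(x, y, z):
    return (y * z, x * z, x * y)


def A(x, y, z):
    return (x * (z * z - y * y) / 4, y * (x * x - z * z) / 4, z * (y * y - x * x) / 4)


p = (1.0, 2.0, 3.0)
print(curl(F, p))          # approximately (-y, -z, -x) = (-2, -3, -1)
print(curl(G, p))          # approximately (0, 0, 0)
print(div(F, p))           # approximately x + y + z = 6
print(curl(A, p), G(*p))   # curl A should reproduce G
```

Checking that the divergence of `curl(A, .)` vanishes would require second differences, so here it is verified indirectly through $\nabla \times A = G$ and $\nabla \cdot G = 0$.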