MATH 251-02: Multivariable Calculus I (83305)
JB-387, TuTh 4 PM - 5:50 PM
SYLLABUS
Fall 2015

John Sarli
JB-326
jsarli@csusb.edu
909-537-5374
Office Hours: TuTh 11:30 AM - 1 PM and 3:30 - 4 PM, or by appt.

Text: Marsden/Tromba (W.H. Freeman and Company), Vector Calculus (sixth edition)

Prerequisites: MATH 212 with a grade of "C" or better.

This is a first course in multivariable calculus, covering the differential calculus of vector functions. (A second course in multivariable calculus covers the classical theorems of integration, which generalize the fundamental theorem of calculus.) We will develop some basic theory but the main goal is to become proficient in computations with functions of two and three variables. In doing so we develop an understanding of the geometry of space so that applications to various branches of science and mathematics become accessible. We will cover the first half of the text, Chapters 1 through 4, excluding material that explicitly depends on certain topics from MATH 213.

Grading will be based on two midterm exams, a cumulative final exam, and four graded assignments (one from each chapter), weighted as follows: First Midterm (15%), Second Midterm (25%), Final Exam (40%), Graded Assignments (20%). To reinforce written communication skills the Graded Assignment solutions should be clearly presented in a typed format, either printed or sent to me as a PDF (do not scan in handwritten work).

Although there is no attendance requirement for this class, you must complete the CSU/UC Mathematics Diagnostic Testing Project CR test within the first two weeks of the course (by October 8). Go to mdtp.ucsd.edu and scroll down to MDTP Web Based Tests. Select the CR test. The items will appear one at a time. You can either print the results or send them to me electronically. The purpose of this requirement is for you to assess any flaws in your fundamental skills that can hinder your conceptual understanding of this higher material.
The test results do not affect your course grade in any way; in fact, I will not look at your individual item results unless you ask me to review them with you.

A list of Suggested Exercises will be provided for each chapter and the exams will be written at the level of these exercises. I will list exercises that are representative of a particular technique or concept, but you should attempt as many similar exercises in the text as needed for understanding. In this way we can avoid "practice exams" and other routines that use the time we need to cover this material. It is your responsibility to bring questions to class that arise as you work through problems.

After computing your total scores weighted according to the percentages above, course grades will be assigned as follows:

A     ≥ 91
A-    86-90
B+    81-85
B     76-80
B-    71-75
C+    66-70
C     61-65
C-    51-60
D     45-50
F     < 45

Success in this course requires a balance of three activities:
1) Read the text and work the exercises regularly. Keep notes of your solutions. If you have organized them efficiently, bring them to the in-class exams.
2) Follow the lecture notes on my website: www.math.csusb.edu/faculty/sarli/ which also has the syllabus. Bring questions on these notes to class as they occur to you.
3) Participate in the class sessions as actively as you can. Lectures are more useful to you if you use them to clarify ideas as we develop them.

Notes
1) Mid-term exam dates are subject to change. Due dates for the graded exercises will be set as we approach the end of each chapter of the text.
2) Learning Outcomes: Upon successful completion of this course, students will be able to:
1.1 demonstrate an understanding of, and apply, fundamental concepts, operations and relations;
2.1 correctly apply mathematical theorems, properties and definitions;
3.3 explain and justify solutions using a variety of representations.
3) Please refer to the Academic Regulations and Policies section of your current bulletin for information regarding add/drop procedures. Instances of academic dishonesty will not be tolerated. Cheating on exams or plagiarism (presenting the work of another as your own, or the use of another person's ideas without giving proper credit) will result in a failing grade and sanctions by the University. For this class, all assignments are to be completed by the individual student unless otherwise specified.
4) If you are in need of an accommodation for a disability in order to participate in this class, please let me know ASAP and also contact Services to Students with Disabilities at UH-183, (909)537-5238.

Some important dates:
2015-09-24: First day of class
2015-09-30: Last day to add open classes w/o permission
2015-10-14: Census Deadline
2015-10-15: First Exam
2015-11-12: Second Exam
2015-11-26: Campus closed
2015-12-03: Last day of class
2015-12-08: Final Exam (Tuesday 4-5:50)

Approximate course schedule:
Overview of vectors in 2 and 3 dimensions; Algebra and geometry of vectors
Elementary linear algebra in $\mathbb{R}^3$; Conversions between coordinate systems; linear algebra in $\mathbb{R}^n$
10/15: First Exam
Functions from $\mathbb{R}^n$ to $\mathbb{R}$; Continuity of functions from $\mathbb{R}^n$ to $\mathbb{R}^m$
Derivatives of real-valued functions; First-order approximation; Derivatives of functions from $\mathbb{R}^n$ to $\mathbb{R}^m$
Paths and Curves; functions from $\mathbb{R}$ to $\mathbb{R}^n$; Chain Rule; Directional derivatives
11/12: Second Exam
Higher derivatives and Second-order approximation; Extrema of real-valued functions
Constrained extrema; Acceleration and Force
Arc length; Vector fields and differential operators
12/08: Final Exam

Suggested Exercises for Chapter 1

The following exercises are not to be handed in. They represent skills required for basic mastery.
1.1 (pages 18-19): 8, 9-17, 21, 23, 24, 27
1.2 (pages 29-31): 4-11, 12, 13, 14, 16, 17, 20-25, 26
1.3 (pages 49-51): 2, 4, 6, 7-11, 15, 16, 20-29, 33, 39
1.4 (pages 58-59): 3, 6, 7-10, 21
1.5 (pages 69-70): 3, 5, 9, 11, 12-21, 22, 24

First Graded Assignment

Due: October 15 (if turned in by October 13 will be graded and returned by October 15)

To reinforce written communication skills the Graded Assignment solutions should be clearly presented in a printed or PDF format. Late papers will not be graded.

First Graded Assignment. Do any one of the following:
Page 30: 36
Page 31: 38
Page 59: 18
Page 70: 24
Page 72: 40

Vectors

Representation. A vector is a quantity that is characterized by magnitude and direction. Vectors are defined in any dimension. Consider two points $P$ and $Q$. The segment $\overline{PQ}$ has a length $PQ$ determined by the distance formula, but no direction. If $P \neq Q$ we can specify a direction, for example, from $P$ to $Q$. The directed line segment $\overrightarrow{PQ}$ represents a vector. (If $P = Q$ the segment reduces to a single point and represents the zero vector, whose magnitude is 0 and whose direction is undefined.) Any directed line segment with length $PQ$ that points in the same direction as $\overrightarrow{PQ}$ represents the same vector. For example, in 2-dimensional space we can represent points by ordered pairs in $\mathbb{R}^2$. If $P = (1, 4)$, $Q = (4, 8)$ and $A = (0, 0)$, $B = (3, 4)$ then $\overrightarrow{PQ}$ and $\overrightarrow{AB}$ represent the same vector. Both segments have length 5 and point in the same direction.

Given any vector we can change its magnitude without changing its direction by a process called scaling (also called dilation in geometry) that corresponds to multiplication by a positive number. Multiplication by a negative number reverses the direction of the vector. For example, multiplying $\overrightarrow{PQ}$ by $-1$ produces the vector $\overrightarrow{QP}$. Multiplying any vector by 0 produces the zero vector.
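The example above can be checked numerically. The following is a minimal sketch in Python, assuming vectors are stored as tuples of components; the helper names `vector`, `scale`, and `magnitude` are illustrative choices and do not come from the text.

```python
import math

def vector(p, q):
    """Components of the vector represented by the directed segment from p to q."""
    return tuple(b - a for a, b in zip(p, q))

def scale(c, v):
    """Multiply the vector v by the scalar c."""
    return tuple(c * x for x in v)

def magnitude(v):
    """Length of v, computed with the distance formula."""
    return math.sqrt(sum(x * x for x in v))

# Points from the example in the notes.
P, Q = (1, 4), (4, 8)
A, B = (0, 0), (3, 4)

pq = vector(P, Q)
ab = vector(A, B)

print(pq == ab)                        # True when both segments represent the same vector
print(magnitude(pq))                   # common length of the two segments
print(scale(-1, pq) == vector(Q, P))   # multiplying by -1 reverses the direction
print(scale(0, pq))                    # multiplying by 0 gives the zero vector
```

Comparing component tuples is one way to express "same magnitude and same direction": two directed segments represent the same vector exactly when their component differences agree.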
A point $P$ in $\mathbb{R}^n$ corresponds to an $n$-tuple $(x_1, x_2, \ldots, x_n)$ which we usually interpret as its location relative to designated axes that intersect at the origin $O = (0, 0, \ldots, 0)$. The directed segment $\overrightarrow{OP}$ identifies this point with a unique vector, sometimes called the position representation for the vector whose magnitude is $OP$ and whose direction points from $O$ to $P$. Conversely, any vector has a unique position representation: If we imagine a directed segment from $A$ to $B$ in $n$-dimensional space and choose a coordinate system with origin $O$, then the position representation of $\overrightarrow{AB}$ is obtained by translating $A$ to $O$ whereby $P$ is the translated image of $B$. For example, if $A = (1, -1, 2)$ and $B = (4, 3, 14)$ then $\overrightarrow{AB}$ clearly has length 13 but the direction of this vector may be difficult to visualize. However, the position representation of this vector is $\overrightarrow{OP}$ where $P = (3, 4, 12)$, so we can imagine a directed segment from the origin to the point $(3, 4, 12)$ in order to assess the direction of this vector.

Unit vectors. In $\mathbb{R}^n$ it will be useful to have a standard vector for each possible direction. In the abstract, it is common to notate vectors with bold letters, such as $\mathbf{v}$. With this generic notation it is important to distinguish the zero vector $\mathbf{0}$ from the real number 0. If $\mathbf{v} \neq \mathbf{0}$ then its magnitude $|\mathbf{v}|$ is a positive number and we can multiply $\mathbf{v}$ by $\frac{1}{|\mathbf{v}|}$ to produce a vector of magnitude 1, called a unit vector. We can organize all unit vectors by their position representations. Thus, in $\mathbb{R}^1$ any unit vector is represented either by the directed segment from 0 to 1 on the number line, or by the directed segment from 0 to $-1$. In $\mathbb{R}^2$, however, the position representatives for unit vectors are in one-to-one correspondence with the points on the unit circle; in $\mathbb{R}^3$ they are in one-to-one correspondence with the points on the unit sphere.

Vector addition. Vectors were developed historically in two related contexts: physics and geometry.
These two contexts are naturally related by the need to represent translation through a distance and physical quantities that depend on translations, such as velocity and force. For example, to describe the velocity of an object it is necessary to represent both its speed (distance/time) and its direction of motion. Even before the invention of calculus it was discovered that translations, forces and velocities add according to vector rules: If $\mathbf{v}$ and $\mathbf{w}$ are represented by $\mathbf{v} = \overrightarrow{PQ}$ and $\mathbf{w} = \overrightarrow{PR}$ then $\mathbf{v} + \mathbf{w}$ is represented by $\overrightarrow{PS}$, where $S$ is the fourth vertex of the parallelogram $PQSR$. This was known as the parallelogram law of addition. For the position representation in $\mathbb{R}^n$ this law is equivalent to the coordinate addition law:

If $\mathbf{v} = \overrightarrow{OA}$ and $\mathbf{w} = \overrightarrow{OB}$ then $\mathbf{v} + \mathbf{w} = \overrightarrow{OC}$, where $A = (a_1, \ldots, a_n)$, $B = (b_1, \ldots, b_n)$ and $C = (a_1 + b_1, \ldots, a_n + b_n)$.

The equivalence of these addition laws is easily established using congruent triangles (see Page 5 of the text for the proof in $\mathbb{R}^2$, which is no loss of generality since any two directed segments with a common endpoint determine a parallelogram in a plane).

We will need both the parallelogram law (synthetic approach) and the coordinate addition law (analytic approach) when working with vectors. To understand the synthetic approach it is important to draw figures that interpret the vector operations. The following is a particularly useful exercise:

As above, let $\mathbf{v} = \overrightarrow{PQ}$ and $\mathbf{w} = \overrightarrow{PR}$ so that $\mathbf{v} + \mathbf{w}$ is represented by $\overrightarrow{PS}$, where $S$ is the fourth vertex of the parallelogram $PQSR$. Then $\mathbf{w} = \overrightarrow{QS}$ and so $\overrightarrow{PS} = \overrightarrow{PQ} + \overrightarrow{QS}$. It follows that for any triangle $PQS$ we have
$$\overrightarrow{PQ} + \overrightarrow{QS} + \overrightarrow{SP} = \mathbf{0}$$
Similarly, $\overrightarrow{PR} + \overrightarrow{RQ} = \overrightarrow{PQ}$ and so
$$\mathbf{v} - \mathbf{w} = \overrightarrow{PQ} - \overrightarrow{PR} = \overrightarrow{RQ}$$
Geometrically, then, $\mathbf{v} + \mathbf{w}$ and $\mathbf{v} - \mathbf{w}$ are the two diagonals of the parallelogram with directions as indicated.

Synthetic vector notation was found to provide an efficient description of geometric facts.
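The following is a typical example of how vectors were used to prove theorems about triangles. Sketch the vectors as you read through this construction and make sure your figures accurately represent the equations.

Let $OPQ$ be a triangle with $M$ the midpoint of $PQ$ and $N$ the midpoint of $OQ$. Let $G = OM \cap PN$. Then
$$\overrightarrow{OG} + \overrightarrow{GN} = \overrightarrow{ON}$$
$$\overrightarrow{PG} + \overrightarrow{GM} = \overrightarrow{PM}$$
Now $\overrightarrow{OG}$ and $\overrightarrow{GM}$ have the same direction, as do $\overrightarrow{PG}$ and $\overrightarrow{GN}$, so there are numbers $\lambda$ and $\mu$ such that
$$\overrightarrow{OG} - \overrightarrow{GM} = \lambda\,\overrightarrow{GM}$$
$$\overrightarrow{PG} - \overrightarrow{GN} = \mu\,\overrightarrow{GN}$$
from which we obtain
$$\begin{aligned}
\lambda\,\overrightarrow{GM} - \mu\,\overrightarrow{GN} &= \overrightarrow{ON} - \overrightarrow{PM} \\
&= \tfrac{1}{2}\left(\overrightarrow{OQ} - \overrightarrow{PQ}\right) \\
&= \tfrac{1}{2}\left(\overrightarrow{OQ} + \overrightarrow{QP}\right) \\
&= \tfrac{1}{2}\,\overrightarrow{OP} = \overrightarrow{OL}
\end{aligned}$$
where $L$ is the midpoint of $OP$. However, since $M$ and $N$ are midpoints we have $NM \parallel OP$ and $NM = \tfrac{1}{2} OP$, so $\overrightarrow{NM} = \overrightarrow{OL}$. Note also that $\overrightarrow{NM} = \overrightarrow{GM} - \overrightarrow{GN}$, so we have arrived at the equation
$$\lambda\,\overrightarrow{GM} - \mu\,\overrightarrow{GN} = \overrightarrow{OL} = \overrightarrow{GM} - \overrightarrow{GN}$$
$$(\lambda - 1)\,\overrightarrow{GM} = (\mu - 1)\,\overrightarrow{GN}$$
and, since $\overrightarrow{GM}$ and $\overrightarrow{GN}$ are not parallel, both sides must be $\mathbf{0}$. We conclude that $\lambda = \mu = 1$.

From this vector calculation we conclude:
1) If $G$ is the intersection of two medians of a triangle then the segment from a vertex to $G$ is twice as long as the segment from $G$ to the opposite midpoint ($\overrightarrow{OG} = 2\,\overrightarrow{GM}$, etc.).
2) Since 1) holds for any two medians, all three medians must be concurrent at $G$, the centroid of the triangle.

Lines in $\mathbb{R}^n$. In $\mathbb{R}^2$ a linear equation of the form $ax_1 + bx_2 = c$ describes a line consisting of points $(x_1, x_2)$ that satisfy this equation. In $\mathbb{R}^n$ with $n > 2$ a single linear equation does not describe a line. For example, in $\mathbb{R}^3$ we will see that an equation of the form $ax_1 + bx_2 + cx_3 = d$ describes a plane. To describe a line in $\mathbb{R}^n$ we use vectors. Since most of our work will take place in $\mathbb{R}^2$ and $\mathbb{R}^3$ we can avoid unnecessary subscripts by using $(x, y)$ and $(x, y, z)$, respectively, to denote points in these spaces. Recall that the equation for a line in $\mathbb{R}^2$ was obtained from two pieces of information, its slope and a known point on the line.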
The slope gives information about the direction from one point to another on the line, and the known point distinguishes the line from all others with the same slope. This is the idea that generalizes to $\mathbb{R}^n$: Identify a known point $P$ on the line and identify a vector $\mathbf{v}$ whose direction is parallel to the line; then any point $Q$ on the line will correspond to the position representation $\overrightarrow{OP} + t\mathbf{v}$ for some numerical value of $t$.

For example, in $\mathbb{R}^2$ consider the line with slope $-2$ that passes through $P = (1, 3)$. If we set $\mathbf{v} = \overrightarrow{OV}$ where $V = (1, -2)$ then $\mathbf{v}$ is parallel to the line. We say the points on the line are parameterized by $t$ and write the equation of the line as a vector function of $t$:
$$\mathbf{l}(t) = \overrightarrow{OP} + t\mathbf{v} = (1, 3) + t\,(1, -2) = (1 + t,\ 3 - 2t)$$
Thus the line consists of all $(x, y)$ given by the parametric equations
$$x = 1 + t$$
$$y = 3 - 2t$$
Note that $t = x - 1 = -\tfrac{1}{2}(y - 3)$, and so
$$y = -2x + 5$$
is the familiar Cartesian equation for the line.

It is important to interpret the vector representation of a line in terms of the parallelogram law of vector addition: We can represent the direction vector $\mathbf{v}$ by a directed line segment that starts at the known point $P$. When we add $\overrightarrow{OP}$ to any multiple of the direction vector we obtain a vector $\overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PQ}$ where the point $Q$ on the line is determined by the scalar parameter $t$.

Using this approach we can easily find the vector equation of a line in $\mathbb{R}^n$. All we need is a direction vector and a known point on the line. In fact, since there is a unique line through any two distinct points we can find the equation from two known points on the line. For example, the line in $\mathbb{R}^3$ determined by the points $P = (p_1, p_2, p_3)$ and $Q = (q_1, q_2, q_3)$ has
$\mathbf{v} = \overrightarrow{PQ}$ as a direction vector, so a vector equation of the line is
$$\mathbf{l}(t) = (p_1, p_2, p_3) + t\,(q_1 - p_1,\ q_2 - p_2,\ q_3 - p_3)$$
which yields the parametric equations
$$x = tq_1 + (1 - t)\,p_1$$
$$y = tq_2 + (1 - t)\,p_2$$
$$z = tq_3 + (1 - t)\,p_3$$
As before, we could solve for $t$ in each equation to produce three symmetric equations
$$\frac{x - p_1}{q_1 - p_1} = \frac{y - p_2}{q_2 - p_2} = \frac{z - p_3}{q_3 - p_3}$$
provided $q_j \neq p_j$ for each $j = 1, 2, 3$, but we cannot reduce these to a single Cartesian equation in $x, y, z$. Returning to the vector equation, note that $\mathbf{l}(0) = (p_1, p_2, p_3)$ and $\mathbf{l}(1) = (q_1, q_2, q_3)$ so the values of $t$ in the interval $[0, 1]$ describe the segment $PQ$.

Intersecting lines. In $\mathbb{R}^2$, a pair of lines with distinct slopes will intersect at some point. In higher dimensions it is not obvious when two lines intersect but we can use their vector equations to obtain the point of intersection, if it exists, from the parametric equations. For example, in $\mathbb{R}^3$ consider the two lines
$$\mathbf{l}_1(t) = (2, 6, -3) + t\,(1, 0, 4)$$
$$\mathbf{l}_2(t) = (1, -1, 1) + t\,(3, 4, 5)$$
For $\mathbf{l}_1$ we have the parametric equations
$$x = 2 + t_1, \qquad y = 6, \qquad z = 4t_1 - 3$$
For $\mathbf{l}_2$ we have the parametric equations
$$x = 1 + 3t_2, \qquad y = 4t_2 - 1, \qquad z = 1 + 5t_2$$
If there is a point of intersection it might correspond to different values of the parameter in each description, which is why we write $t_1$ and $t_2$ instead of just $t$ in each case. Can we find compatible values of $t_1$ and $t_2$ for these lines? For every point on the first line we have $y = 6$ so we can solve for $t_2$ by setting
$$4t_2 - 1 = 6, \qquad t_2 = \frac{7}{4}$$
Thus, if there is a point of intersection it must be the point
$$\mathbf{l}_2\!\left(\tfrac{7}{4}\right) = \left(\tfrac{25}{4},\ 6,\ \tfrac{39}{4}\right)$$
This point is on $\mathbf{l}_1$ provided there is a value $t_1$ such that
$$2 + t_1 = \frac{25}{4}, \qquad 4t_1 - 3 = \frac{39}{4}$$
simultaneously. The first equation requires $t_1 = \tfrac{17}{4}$, which gives $4t_1 - 3 = 14 \neq \tfrac{39}{4}$. Since the two equations cannot hold simultaneously, the two lines do not intersect.

Let $\mathbf{l}_1$ be the line through the point $(4, -1, 6)$ with direction vector $\mathbf{i} - \mathbf{j} + \mathbf{k}$ and let $\mathbf{l}_2$ be the line through the point $(0, 1, 4)$ with direction vector $\mathbf{i} + \mathbf{j} - \mathbf{k}$. Find the point where these lines intersect.

Lines and Planes through the origin.
A line that passes through the origin with direction vector $\mathbf{v}$ has
$$\mathbf{l}(t) = t\mathbf{v}$$
as a vector equation. We say that the line is the (one-dimensional) span of the vector $\mathbf{v}$. Suppose in $\mathbb{R}^3$ that $\mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$. Then the parametric equations for the line are
$$x = at, \qquad y = bt, \qquad z = ct$$
that is, the coordinates are given as constant multiples of the parameter $t$, so the origin is produced when $t = 0$.

Now suppose $\mathbf{v}$ and $\mathbf{w}$ are non-zero vectors such that one is not a scalar multiple of the other. We define the span of $\mathbf{v}$ and $\mathbf{w}$ to be all vector sums
$$s\mathbf{v} + t\mathbf{w}$$
where the parameters $s$ and $t$ vary over all real numbers. This span contains the individual spans of each vector because we can set one parameter equal to 0 and let the other vary, so geometrically the span of $\mathbf{v}$ and $\mathbf{w}$ contains two distinct lines that intersect at the origin. Any vector in the span must lie in the plane determined by these two lines, so the span is a plane that passes through the origin. We will see that in $\mathbb{R}^3$ this plane can be described by a single linear equation in $x, y, z$. For example, if $\mathbf{v} = \mathbf{i}$ and $\mathbf{w} = \mathbf{j}$ then the plane is the $xy$-plane which is described by the single equation $z = 0$. Similarly, the single equation $y = 0$ describes the span of $\mathbf{v} = \mathbf{i}$ and $\mathbf{w} = \mathbf{k}$ (the $xz$-plane), and $x = 0$ describes the span of $\mathbf{v} = \mathbf{j}$ and $\mathbf{w} = \mathbf{k}$ (the $yz$-plane).

Inner product, Angles, and Law of Cosines

For any two vectors $\mathbf{v}$ and $\mathbf{w}$ we define their inner product $\mathbf{v} \cdot \mathbf{w}$ (also called scalar product or dot product) to be
$$\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}|\,|\mathbf{w}| \cos\theta$$
where $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{w}$, measured by representing the vectors as directed segments $\overrightarrow{PQ}$ and $\overrightarrow{PR}$. Since $\cos\theta = \cos(2\pi - \theta)$ it does not matter which of the two possible angles we use. From the definition, the dot product is commutative
$$\mathbf{v} \cdot \mathbf{w} = \mathbf{w} \cdot \mathbf{v}$$
This definition makes sense in $\mathbb{R}^n$ in general but we will focus on $\mathbb{R}^3$. Let $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ and $\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$.
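Then
$$\begin{aligned}
|\mathbf{v} - \mathbf{w}|^2 &= (v_1 - w_1)^2 + (v_2 - w_2)^2 + (v_3 - w_3)^2 \\
&= v_1^2 + v_2^2 + v_3^2 + w_1^2 + w_2^2 + w_3^2 - 2\,(v_1w_1 + v_2w_2 + v_3w_3) \\
&= |\mathbf{v}|^2 + |\mathbf{w}|^2 - 2\,(v_1w_1 + v_2w_2 + v_3w_3)
\end{aligned}$$
By the Law of Cosines, since $|\mathbf{v} - \mathbf{w}|$ is the length of side $RQ$ in the triangle $PRQ$, we have
$$|\mathbf{v} - \mathbf{w}|^2 = |\mathbf{v}|^2 + |\mathbf{w}|^2 - 2\,|\mathbf{v}|\,|\mathbf{w}| \cos\theta$$
and so
$$|\mathbf{v}|\,|\mathbf{w}| \cos\theta = v_1w_1 + v_2w_2 + v_3w_3$$
Therefore, when $\mathbf{v}$ and $\mathbf{w}$ are written in component form,
$$\mathbf{v} \cdot \mathbf{w} = v_1w_1 + v_2w_2 + v_3w_3$$
from which it is easy to see that
$$(c\mathbf{v}) \cdot \mathbf{w} = \mathbf{v} \cdot (c\mathbf{w}) = c\,(\mathbf{v} \cdot \mathbf{w})$$
and
$$\mathbf{v} \cdot (\mathbf{w}_1 + \mathbf{w}_2) = \mathbf{v} \cdot \mathbf{w}_1 + \mathbf{v} \cdot \mathbf{w}_2$$
Clearly, the inner product of any vector and the zero vector is the real number 0. If neither $\mathbf{v}$ nor $\mathbf{w}$ is the zero vector the angle between them is found by
$$\cos\theta = \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}, \qquad \theta = \arccos\frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}$$
Since the range of the inverse cosine function is $[0, \pi]$ this defines, as a matter of convention, the angle between two vectors so that $0 \le \theta \le \pi$.

The inner product is useful in describing the geometry of vectors. For example, note that
$$|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$$
and we can express the distance between points $P$ and $Q$ as $|\overrightarrow{PQ}| = |\overrightarrow{QP}|$. Whenever $\mathbf{v} \cdot \mathbf{w} = 0$ we say that the two vectors are orthogonal. If neither is the zero vector then they are orthogonal precisely when the angle between them is $\tfrac{\pi}{2}$. If $\theta = 0$ or $\theta = \pi$ we say the vectors are parallel; these two cases maximize and minimize the inner product, respectively, so if $0 < \theta < \pi$ we have the strict Cauchy-Schwarz Inequality
$$|\mathbf{v} \cdot \mathbf{w}| < |\mathbf{v}|\,|\mathbf{w}|$$

The Triangle Inequality

Note that $|\mathbf{v} \cdot \mathbf{w}| = |\mathbf{v}|\,|\mathbf{w}|$ if either of the vectors is the zero vector, and, more generally, if $\mathbf{v} = \lambda\mathbf{w}$ for some real number $\lambda$. The general Cauchy-Schwarz Inequality
$$|\mathbf{v} \cdot \mathbf{w}| \le |\mathbf{v}|\,|\mathbf{w}|$$
implies the Triangle Inequality
$$|\mathbf{v} + \mathbf{w}| \le |\mathbf{v}| + |\mathbf{w}|$$
for any vectors in $\mathbb{R}^n$. To see this, note that
$$\mathbf{v} \cdot \mathbf{w} \le |\mathbf{v} \cdot \mathbf{w}| \le |\mathbf{v}|\,|\mathbf{w}|$$
Consequently,
$$\begin{aligned}
|\mathbf{v} + \mathbf{w}|^2 &= (\mathbf{v} + \mathbf{w}) \cdot (\mathbf{v} + \mathbf{w}) \\
&= |\mathbf{v}|^2 + |\mathbf{w}|^2 + 2\,(\mathbf{v} \cdot \mathbf{w}) \\
&\le |\mathbf{v}|^2 + |\mathbf{w}|^2 + 2\,|\mathbf{v}|\,|\mathbf{w}| \\
&= (|\mathbf{v}| + |\mathbf{w}|)^2
\end{aligned}$$
Since $|\mathbf{v} + \mathbf{w}| \ge 0$ and $|\mathbf{v}| + |\mathbf{w}| \ge 0$ the Triangle Inequality follows.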
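The component formula for the inner product, the Law of Cosines identity, and the two inequalities above can be spot-checked numerically. The sketch below is in Python and assumes vectors are tuples of floats; the sample vectors and the helper names `dot`, `norm`, and `angle` are illustrative choices, not part of the text.

```python
import math

def dot(v, w):
    """Inner product in component form: v1*w1 + v2*w2 + ..."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Magnitude |v| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def angle(v, w):
    """Angle in [0, pi] between two non-zero vectors."""
    cos_theta = dot(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Sample vectors chosen for illustration.
v = (3.0, 2.0, -1.0)
w = (1.0, -1.0, 2.0)
diff = tuple(a - b for a, b in zip(v, w))
total = tuple(a + b for a, b in zip(v, w))

# Law of Cosines in component form: |v - w|^2 = |v|^2 + |w|^2 - 2 (v . w)
law_of_cosines_gap = dot(diff, diff) - (dot(v, v) + dot(w, w) - 2 * dot(v, w))

cauchy_schwarz_holds = abs(dot(v, w)) <= norm(v) * norm(w)
triangle_holds = norm(total) <= norm(v) + norm(w)
theta = angle(v, w)

print(law_of_cosines_gap, cauchy_schwarz_holds, triangle_holds, theta)
```

Because the inner product of these sample vectors is negative, the computed angle is obtuse, consistent with the convention that $\theta$ lies in $[0, \pi]$.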
Projection and Reflection

Basic trigonometry suggests we define the orthogonal projection of $\mathbf{v}$ on $\mathbf{w}$ to be
$$\operatorname{proj}_{\mathbf{w}} \mathbf{v} = \frac{\mathbf{v} \cdot \mathbf{w}}{\mathbf{w} \cdot \mathbf{w}}\,\mathbf{w} = \left(\mathbf{v} \cdot \frac{\mathbf{w}}{|\mathbf{w}|}\right) \frac{\mathbf{w}}{|\mathbf{w}|}$$
where the second expression indicates that we are multiplying the unit vector $\frac{\mathbf{w}}{|\mathbf{w}|}$ by the inner product of $\mathbf{v}$ with this unit vector. The scalar $\mathbf{v} \cdot \frac{\mathbf{w}}{|\mathbf{w}|}$ is sometimes called the component of $\mathbf{v}$ in the direction of $\mathbf{w}$. Since the unit vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are orthogonal to each other, the equation
$$\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$$
expresses the decomposition of $\mathbf{v}$ into the sum of its orthogonal projections on these vectors.

Another important geometric application of the inner product is the operation of reflection. If $\mathbf{u}$ is a unit vector we define the reflection of $\mathbf{v}$ relative to $\mathbf{u}$ by
$$\operatorname{ref}_{\mathbf{u}} \mathbf{v} = \mathbf{v} - 2\operatorname{proj}_{\mathbf{u}} \mathbf{v} = \mathbf{v} - 2\,(\mathbf{v} \cdot \mathbf{u})\,\mathbf{u}$$
By taking the inner product of $\mathbf{v} - 2\,(\mathbf{v} \cdot \mathbf{u})\,\mathbf{u}$ with itself we see that the vector $\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ has the same magnitude as $\mathbf{v}$. If we imagine all vectors in $\mathbb{R}^3$ that are orthogonal to $\mathbf{u}$ we see that they comprise a plane through the origin. The vector $\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ is in the half-space opposite to that of $\mathbf{v}$ relative to this plane, so $\operatorname{ref}_{\mathbf{u}} \mathbf{v}$ is the reflection of $\mathbf{v}$ in the plane determined by $\mathbf{u}$. Like the inner product itself, the reflection operation makes sense in $\mathbb{R}^n$ in general.

Force and Displacement. Even without calculus some basic applications can be described with vectors. One of the earliest discoveries described the resultant force acting on an object as the sum of multiple constant forces acting on that object. Each of these forces is represented by a vector whose magnitude measures the amount of force and whose direction specifies the direction in which the force is applied. The resultant force is just the vector sum of the individual forces. Force is defined by Newton's Second Law in terms of mass, distance, and time:
$$\text{Force} = \text{Mass} \times \text{Acceleration}$$
Since mass is a scalar quantity, acceleration, like force, is a vector quantity.
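If mass is in kilograms, distance in meters, and time in seconds then the magnitude of force is in units of
$$\mathrm{N} = \mathrm{kg \cdot m / s^2}$$
where the symbol N stands for Newtons.

A force of 3 N acts on a body in the direction of $\mathbf{i} + \mathbf{j} + \mathbf{k}$. Simultaneously, a second force of 5 N acts on the body in the direction of $\tfrac{3}{5}\mathbf{i} - \tfrac{4}{5}\mathbf{j}$. What is the resultant force? If no other forces act on the body, in what direction will it move?

The 3 N force is represented by the vector $\sqrt{3}\,(\mathbf{i} + \mathbf{j} + \mathbf{k})$ because the unit direction vector is $\tfrac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$. Similarly, the 5 N force is represented by $3\mathbf{i} - 4\mathbf{j}$. The resultant force is
$$\begin{aligned}
\mathbf{F} &= \sqrt{3}\,(\mathbf{i} + \mathbf{j} + \mathbf{k}) + (3\mathbf{i} - 4\mathbf{j}) \\
&= \left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k}
\end{aligned}$$
The magnitude of this force is $|\mathbf{F}| = \sqrt{\mathbf{F} \cdot \mathbf{F}} = \sqrt{34 - 2\sqrt{3}}$ N, which is approximately 5.53 N. The direction the body will move is the direction of the resultant force. The unit vector in this direction is
$$\frac{\mathbf{F}}{|\mathbf{F}|} = \frac{1}{\sqrt{34 - 2\sqrt{3}}}\left(\left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k}\right)$$
Think of the unit sphere as a three-dimensional compass and imagine a directed segment from the origin to the point on the unit sphere approximated by $(0.856, -0.410, 0.313)$.

The inner product has a natural interpretation as the work done by a force acting on a body. If a force $\mathbf{F}$ acts on a body so as to displace it through a given distance in a given direction then
$$W = \mathbf{F} \cdot \mathbf{D}$$
where $\mathbf{D}$ is the vector representing the displacement. Thus, if the force $\mathbf{F} = \left(3 + \sqrt{3}\right)\mathbf{i} + \left(\sqrt{3} - 4\right)\mathbf{j} + \sqrt{3}\,\mathbf{k}$ in the above example acts on a body that is constrained to move in a straight line from the origin to the point $P = \left(3 - \sqrt{3},\ 4 + \sqrt{3},\ \sqrt{3}\right)$ the work done is
$$W = \left(3 + \sqrt{3}\right)\left(3 - \sqrt{3}\right) + \left(\sqrt{3} - 4\right)\left(4 + \sqrt{3}\right) + \sqrt{3}\,\sqrt{3} = -4$$
The total distance the body moves is $|\mathbf{D}| = \sqrt{\overrightarrow{OP} \cdot \overrightarrow{OP}} = \sqrt{2\sqrt{3} + 34}$ m and the work done by the force is $-4$ J, where the symbol J stands for Joules. Note that in this case $W$ is a negative number. This means that the angle between the force vector and the displacement vector is obtuse, approximately $97^\circ$ in this case since
$$\arccos\frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{F}|\,|\mathbf{D}|} = \arccos\left(-\frac{1}{143}\sqrt{286}\right) \approx 1.6893 \text{ rad}$$

Velocity also has both magnitude and direction.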
The magnitude of a velocity is the speed. Velocities add as vectors, which is why aircraft need to adjust velocity for the velocity of the wind so that the sum of the two results in the intended velocity.

Matrices and Vectors

We used the angle $\theta$ between two vectors to define their inner product, in any dimension, by
$$\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}|\,|\mathbf{w}| \cos\theta$$
which suggests there may also be a geometric interpretation of the scalar quantity
$$A = |\mathbf{v}|\,|\mathbf{w}| \sin\theta$$
Basic trigonometry shows that $A$ is the area of the parallelogram determined by $\mathbf{v}$ and $\mathbf{w}$. Notice that
$$A^2 = |\mathbf{v}|^2\,|\mathbf{w}|^2 - (\mathbf{v} \cdot \mathbf{w})^2$$
In $\mathbb{R}^2$ we find that
$$\begin{aligned}
A^2 &= \left(v_1^2 + v_2^2\right)\left(w_1^2 + w_2^2\right) - (v_1w_1 + v_2w_2)^2 \\
&= (v_1w_2 - v_2w_1)^2
\end{aligned}$$
so $A = |v_1w_2 - v_2w_1|$ because $\sin\theta \ge 0$. But in $\mathbb{R}^3$ we have
$$\begin{aligned}
A^2 &= \left(v_1^2 + v_2^2 + v_3^2\right)\left(w_1^2 + w_2^2 + w_3^2\right) - (v_1w_1 + v_2w_2 + v_3w_3)^2 \\
&= (v_2w_3 - v_3w_2)^2 + (v_3w_1 - v_1w_3)^2 + (v_1w_2 - v_2w_1)^2
\end{aligned}$$
and so $A$ can be interpreted as the magnitude of the vector
$$(v_2w_3 - v_3w_2)\,\mathbf{i} + (v_3w_1 - v_1w_3)\,\mathbf{j} + (v_1w_2 - v_2w_1)\,\mathbf{k}$$
which is the determinant of the matrix
$$\begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{pmatrix}$$
We call this vector the cross product and denote it by $\mathbf{v} \times \mathbf{w}$. Its properties are discussed below, but first we introduce the basics of matrix algebra and computation of determinants.

A matrix is an $m \times n$ rectangular array of numbers, where $m$ is the number of rows and $n$ is the number of columns. We operate with matrices algebraically in a variety of contexts, denoting them by letters such as
$$A = (a_{ij})$$
indicating that the entries in the matrix are the numbers $a_{ij}$, the entry in the $i$th row and $j$th column. The transpose of an $m \times n$ matrix $A$ is the $n \times m$ matrix $A^T$ obtained by interchanging its rows and columns. Thus,
$$A^T = (a_{ji})$$
A matrix with $m = n$ is called a square matrix. A square matrix with $a_{ij} = 1$ if $i = j$ and $a_{ij} = 0$ if $i \neq j$ is usually denoted $I_n$ and is called the $n \times n$ identity matrix.
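The area identity $A^2 = |\mathbf{v}|^2|\mathbf{w}|^2 - (\mathbf{v} \cdot \mathbf{w})^2$ and the component formula for the cross product given earlier in this section can be checked on a concrete pair of vectors. The following Python sketch uses integer components so the arithmetic is exact; the sample vectors and the helper names `dot` and `cross` are assumptions made for illustration.

```python
def dot(v, w):
    """Inner product in component form."""
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    """Cross product of two vectors in R^3, written out by components."""
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (v2 * w3 - v3 * w2,
            v3 * w1 - v1 * w3,
            v1 * w2 - v2 * w1)

# Sample vectors in R^3 (illustrative choice).
v = (3, 2, -1)
w = (1, -1, 2)

n = cross(v, w)
area_squared = dot(v, v) * dot(w, w) - dot(v, w) ** 2

print(n)                         # components of v x w
print(area_squared, dot(n, n))   # the two numbers should agree

# The planar case: A^2 = (a1*b2 - a2*b1)^2 for vectors in R^2.
a = (2, 1)
b = (1, 3)
planar_area_squared = dot(a, a) * dot(b, b) - dot(a, b) ** 2
print(planar_area_squared, (a[0] * b[1] - a[1] * b[0]) ** 2)
```

The second printed line compares the squared area computed from the inner product with the squared magnitude of the cross product, which is the content of the $\mathbb{R}^3$ identity.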
Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ with the same $m \times n$ shape can be added entry-wise to obtain
$$A + B = (a_{ij} + b_{ij})$$
Any matrix $A$ can be multiplied by a scalar $\lambda$ to obtain
$$\lambda A = (\lambda a_{ij})$$
Matrices, then, share many algebraic properties with vectors but also have additional algebraic properties that make them useful in vector calculus.

Matrix multiplication. If the number of columns of $A$ is equal to the number of rows of $B$ then the product $AB$ is defined as follows. If $A$ has shape $m \times n$ and $B$ has shape $n \times p$ then $AB = (c_{ij})$ is the matrix with shape $m \times p$ where $c_{ij}$ is the inner product of the $i$th row of $A$ with the $j$th column of $B$. For example:

2 4 1 0 1 3 0 1 3 1 @ 1 1 0 0 3 2 1 2 6 A= 5 21 3 5 2 3 22 10 10 13

Note that if $p = m$ then the products $AB$ and $BA$ are both defined, but $AB$ has shape $m \times m$ and $BA$ has shape $n \times n$. For example,
$$\begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 3 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ -1 & 0 \\ 3 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 2 \\ 22 & 10 \end{pmatrix}$$
$$\begin{pmatrix} 3 & 1 \\ -1 & 0 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 3 \end{pmatrix} = \begin{pmatrix} 10 & 2 & 3 \\ -2 & -1 & 0 \\ 14 & 1 & 6 \end{pmatrix}$$
Even if $m = n$ it is generally the case that $AB \neq BA$. If $AB = BA$ we say that the two matrices commute.

An important special case is when $B$ has shape $n \times 1$, which is sometimes called the column representation of the vector corresponding to the point $(b_1, \ldots, b_n)$ in $\mathbb{R}^n$. The product $AB$ has shape $m \times 1$ and we say that the matrix $A$ represents a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. For example, since
$$\begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 10 \end{pmatrix}$$
the matrix $A = \begin{pmatrix} 2 & 1 & 0 \\ 4 & -1 & 3 \end{pmatrix}$ represents a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$. We will see that this matrix operation allows us to describe the derivative of a multivariable function.

Invariants and Inverses. For any square matrix we can compute a set of fundamental numbers called invariants of the matrix. These invariants are derived from the coefficients of the characteristic polynomial of the matrix. To find this polynomial we first introduce the determinant of a square matrix, which will turn out to be one of its invariants.
There are many ways to define the determinant but for our purposes the inductive definition works best. We define the determinant of a $2 \times 2$ matrix $A$ to be
$$\det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$
The characteristic polynomial of $A$ is defined by
$$\begin{aligned}
P(\lambda) &= |A - \lambda I_2| \\
&= \lambda^2 - (a_{11} + a_{22})\,\lambda + (a_{11}a_{22} - a_{12}a_{21})
\end{aligned}$$
The constant term of $P(\lambda)$ is the determinant of $A$. The sum of the diagonal entries of $A$, $a_{11} + a_{22}$, is called the trace of $A$, denoted $\operatorname{tr} A$. The trace and determinant are the two principal invariants of a $2 \times 2$ square matrix.

The determinant of an $n \times n$ matrix is computed by finding its cofactors. The cofactor $C_{ij}$ is the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by removing the $i$th row and the $j$th column of $A$ and then multiplying this determinant by $(-1)^{i+j}$. For $n = 3$ we have
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
and its cofactor matrix is
$$(C_{ij}) = \begin{pmatrix}
a_{22}a_{33} - a_{23}a_{32} & -a_{21}a_{33} + a_{31}a_{23} & a_{21}a_{32} - a_{22}a_{31} \\
-a_{12}a_{33} + a_{13}a_{32} & a_{11}a_{33} - a_{13}a_{31} & -a_{11}a_{32} + a_{12}a_{31} \\
a_{12}a_{23} - a_{13}a_{22} & -a_{11}a_{23} + a_{21}a_{13} & a_{11}a_{22} - a_{12}a_{21}
\end{pmatrix}$$
The determinant of $A$ is computed by forming the sum of the $n$ cofactors for any particular row or column multiplied by the respective entries in that row or column. For example, if we choose the first row of $A$, above, we find
$$\det A = a_{11}\,(a_{22}a_{33} - a_{23}a_{32}) + a_{12}\,(a_{31}a_{23} - a_{21}a_{33}) + a_{13}\,(a_{21}a_{32} - a_{22}a_{31})$$
The same value for $\det A$ will be obtained regardless of which row or column we choose for this computation.

We can now find the characteristic polynomial of a $3 \times 3$ matrix:
$$\begin{aligned}
P(\lambda) &= |A - \lambda I_3| \\
&= -\lambda^3 + (\operatorname{tr} A)\,\lambda^2 - \sigma_2(A)\,\lambda + \det A
\end{aligned}$$
where $\operatorname{tr} A = a_{11} + a_{22} + a_{33}$ and $\sigma_2(A) = -(a_{12}a_{21} - a_{11}a_{22} - a_{11}a_{33} + a_{13}a_{31} - a_{22}a_{33} + a_{23}a_{32})$. The invariant $\sigma_2$ generally is not given a specific name. It has important interpretations in mechanics but we will not need it specifically for differential calculus.

With this definition of $P(\lambda)$ the determinant of an $n \times n$ matrix will always be the constant term, and the trace will always be $(-1)^{n-1}$ times the coefficient of $\lambda^{n-1}$.
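The inductive definition of the determinant and the form of the $3 \times 3$ characteristic polynomial can be sketched in code. The Python below is a minimal illustration that assumes matrices are stored as lists of rows with integer entries; the sample matrix `M` and the names `minor`, `cofactor`, `det`, `expand_along_row`, `trace`, `sigma2`, and `char_poly_at` are hypothetical and chosen only for this example.

```python
def minor(m, i, j):
    """Matrix obtained from m by removing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(len(m)))

def cofactor(m, i, j):
    """(-1)^(i+j) times the determinant of the minor for entry (i, j)."""
    return (-1) ** (i + j) * det(minor(m, i, j))

def expand_along_row(m, i):
    """Cofactor expansion along row i; should agree with det(m)."""
    return sum(m[i][j] * cofactor(m, i, j) for j in range(len(m)))

def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))

def sigma2(m):
    """Second invariant of a 3x3 matrix: sum of its principal 2x2 minors."""
    return sum(cofactor(m, i, i) for i in range(3))

def char_poly_at(m, lam):
    """Evaluate P(lam) = det(m - lam * I) directly from the definition."""
    n = len(m)
    shifted = [[m[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return det(shifted)

# Sample 3x3 matrix (illustrative choice).
M = [[2, 1, 0],
     [4, -1, 3],
     [1, 0, 2]]

print(det(M), [expand_along_row(M, i) for i in range(3)])
print(trace(M), sigma2(M))

# Compare det(M - lam*I) with -lam^3 + (tr M) lam^2 - sigma2(M) lam + det M.
for lam in range(-3, 4):
    formula = -lam ** 3 + trace(M) * lam ** 2 - sigma2(M) * lam + det(M)
    print(lam, char_poly_at(M, lam), formula)
```

Recursive cofactor expansion is convenient for small matrices like these but grows very quickly in cost with $n$; it is used here only because it mirrors the inductive definition.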
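Thus, for $n = 4$ we would have
$$\begin{aligned}
P(\lambda) &= |A - \lambda I_4| \\
&= \lambda^4 - (\operatorname{tr} A)\,\lambda^3 + \sigma_2(A)\,\lambda^2 - \sigma_3(A)\,\lambda + \det A
\end{aligned}$$
where the invariants $\sigma_2$ and $\sigma_3$ are expressions of degree 2 and 3, respectively, in the entries $a_{ij}$. The trace will always be the sum of the diagonal entries of $A$ and the determinant will always be an expression of degree $n$. For this reason $\operatorname{tr} A$ and $\det A$ are generically denoted by $\sigma_1$ and $\sigma_n$, respectively, though we will not use this notation since these are the only two principal invariants that will appear in our calculations.

If $A$ is an $n \times n$ matrix with $\det A \neq 0$ then there is a matrix $A^{-1}$, called the inverse of $A$, such that
$$AA^{-1} = A^{-1}A = I_n$$
If $A$ represents a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ then $A^{-1}$ represents the inverse transformation. A basic theorem from linear algebra shows how to obtain $A^{-1}$ from the cofactor matrix of $A$.

Let $C$ be the cofactor matrix of $A$, where $\det A \neq 0$. Then
$$A^{-1} = \frac{1}{\det A}\,C^T$$
For example, if $A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{pmatrix}$ then $\det A = 6$ and $C = \begin{pmatrix} 6 & 0 & 0 \\ -3 & 3 & 0 \\ -1 & -1 & 2 \end{pmatrix}$. Then
$$A^{-1} = \frac{1}{6} \begin{pmatrix} 6 & -3 & -1 \\ 0 & 3 & -1 \\ 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -\tfrac{1}{2} & -\tfrac{1}{6} \\ 0 & \tfrac{1}{2} & -\tfrac{1}{6} \\ 0 & 0 & \tfrac{1}{3} \end{pmatrix}$$
from which it is easy to check that $AA^{-1} = I_3$. The matrix $C^T$ is sometimes called the adjugate of $A$.

Cross Product and Geometry of Planes

As discussed above, for vectors in $\mathbb{R}^3$ it is possible to form the product of two vectors that produces a vector orthogonal to each of them. For vectors
$$\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$$
$$\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$$
this so-called cross product is computed by evaluating the formal determinant
$$\begin{aligned}
\mathbf{v} \times \mathbf{w} &= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \\
&= (v_2w_3 - v_3w_2)\,\mathbf{i} - (v_1w_3 - v_3w_1)\,\mathbf{j} + (v_1w_2 - v_2w_1)\,\mathbf{k}
\end{aligned}$$
It follows that $\mathbf{v} \times \mathbf{w} = -\mathbf{w} \times \mathbf{v}$ and so $\mathbf{v} \times \mathbf{w} = \mathbf{0}$ if $\mathbf{w}$ is a scalar multiple of $\mathbf{v}$. It also follows directly that
$$(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{v} = 0$$
$$(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{w} = 0$$
These properties allow us to find the Cartesian equation of any plane in $\mathbb{R}^3$. Let $\mathbf{n} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ be a normal vector for the plane that contains the point $P_0 = (x_0, y_0, z_0)$. If $P = (x, y, z)$ is any point on this plane then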
$$\mathbf{n} \cdot \overrightarrow{P_0P} = 0$$
$$A(x - x_0) + B(y - y_0) + C(z - z_0) = 0$$
and so the equation for the plane is
$$Ax + By + Cz = D, \qquad D = Ax_0 + By_0 + Cz_0$$
In particular, if the plane contains the origin then $D = 0$ since we can use $P_0 = (0, 0, 0)$. If this 2-space is the span of $\mathbf{v}$ and $\mathbf{w}$ then we can use $\mathbf{v} \times \mathbf{w}$ for the normal vector $\mathbf{n}$. For example, the span of $\mathbf{v} = 3\mathbf{i} + 2\mathbf{j} - \mathbf{k}$ and $\mathbf{w} = \mathbf{i} - \mathbf{j} + 2\mathbf{k}$ is the plane $3x - 7y - 5z = 0$ because $\mathbf{v} \times \mathbf{w} = 3\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$.

Application. Let $Q = (x_1, y_1, z_1)$. Find the shortest distance $d$ from $Q$ to the plane $Ax + By + Cz = D$ and find the point on the plane closest to $Q$.

Let $P_0 = (x_0, y_0, z_0)$ be the point on the plane closest to $Q$. Then
$$Ax_0 + By_0 + Cz_0 = D$$
and since $\overrightarrow{P_0Q}$ is parallel to $\mathbf{n} = A\mathbf{i} + B\mathbf{j} + C\mathbf{k}$ we also have
$$\frac{x_1 - x_0}{A} = \frac{y_1 - y_0}{B} = \frac{z_1 - z_0}{C} = \lambda$$
Solving for $x_0, y_0, z_0$ it follows that
$$\lambda = \frac{Ax_1 + By_1 + Cz_1 - D}{A^2 + B^2 + C^2}$$
But $d^2 = \left|\overrightarrow{P_0Q}\right|^2 = \lambda^2 (A^2 + B^2 + C^2)$ and so
$$d = \frac{|Ax_1 + By_1 + Cz_1 - D|}{\sqrt{A^2 + B^2 + C^2}}$$
Finally,
$$x_0 = x_1 - \lambda A, \qquad y_0 = y_1 - \lambda B, \qquad z_0 = z_1 - \lambda C$$
are the coordinates of $P_0$.

Intersection of Lines and Planes. We used the vector/parametric representation to find the possible intersection of two lines in $\mathbb{R}^3$. To find the intersection of two planes, or the intersection of a line with a plane, we use both the vector/parametric and Cartesian forms. Given a line $l(t) = (x(t), y(t), z(t))$ and a plane $Ax + By + Cz = D$, their intersection is found by substituting the components of $l(t)$ into the plane equation and solving for $t$. For example, if $l(t) = (3 - t, 2t, 5 + 4t)$ and $x + 3y + z = 2$ we substitute to obtain
$$(3 - t) + 6t + (5 + 4t) = 2, \qquad t = -\frac{2}{3}$$
so the line intersects the plane at the unique point $\left(\frac{11}{3}, -\frac{4}{3}, \frac{7}{3}\right)$. There will be a unique solution if the line and plane intersect in exactly one point and no solution if the line and plane are parallel. What happens if the line is contained in the plane?

Two planes will either be parallel or intersect in a line. They will be parallel if their normal vectors are scalar multiples of each other.
Otherwise the cross product of their normal vectors is a non-zero vector orthogonal to both of them and can be used as a direction vector for the line of intersection. For example, $3x + 2y - z = 4$ has normal vector $3\mathbf{i} + 2\mathbf{j} - \mathbf{k}$ and $x - y + 2z = 0$ has normal vector $\mathbf{i} - \mathbf{j} + 2\mathbf{k}$. The cross product is $3\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$ and to find the line of intersection we need a known point on the line with this direction vector. Since this direction vector has a non-zero $\mathbf{k}$ component, the line is not parallel to the $xy$-plane and so must contain a point with $z = 0$. Solving $3x + 2y = 4$ and $x - y = 0$ simultaneously we find that
$$\left(\frac{4}{5}, \frac{4}{5}, 0\right)$$
is on both planes, so their line of intersection is
$$l(t) = \left(\frac{4}{5} + 3t,\ \frac{4}{5} - 7t,\ -5t\right)$$

Study Guide for First Exam

Vector representation.
  Directed line segment $\overrightarrow{PQ}$
  Magnitude and direction
  Parallelogram Law of addition
  Position representation $\overrightarrow{OP}$ in $\mathbb{R}^2$ and $\mathbb{R}^3$
  Component representation $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$

Unit vectors and direction.
  In $\mathbb{R}^1$ there are only two unit vectors and they are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $0$ and $P$ is either at $1$ or $-1$.
  In $\mathbb{R}^2$ the unit vectors are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $(0, 0)$ and $P$ is at $(\cos\theta, \sin\theta)$, $0 \le \theta < 2\pi$.
  In $\mathbb{R}^3$ the unit vectors are represented by the directed line segments $\overrightarrow{OP}$ where $O$ is located at $(0, 0, 0)$ and $P$ is at $(\cos\theta\sin\phi, \sin\theta\sin\phi, \cos\phi)$, $0 \le \theta < 2\pi$, $0 \le \phi \le \pi$.

Vector algebra.
  Inner Product $\mathbf{v} \cdot \mathbf{w}$
  Angle between vectors
  Cross Product $\mathbf{v} \times \mathbf{w}$
  Area of parallelogram
  Projection: $\operatorname{proj}_{\mathbf{w}} \mathbf{v}$

Geometry in $\mathbb{R}^2$ and $\mathbb{R}^3$.
  Vector/parametric description of lines: $l(t) = \overrightarrow{OP} + t\mathbf{v} = (x(t), y(t), z(t))$
  Intersection of lines in $\mathbb{R}^3$
  Cartesian equation of plane containing $P_0$ with normal vector $\mathbf{n}$
  Distance from a given point to a given plane
  Point on a given plane closest to a given point
  Intersection of two planes, or of a line and a plane, in $\mathbb{R}^3$

Matrices.
  Transpose
  Addition and Multiplication of compatible matrices
  Trace and Determinant of $2 \times 2$ and $3 \times 3$ matrices
  Inverse of a $2 \times 2$ or $3 \times 3$ matrix

Alternative Coordinate Systems

In $\mathbb{R}^2$ there are two systems of coordinates that are used most often: rectangular and polar. Rectangular coordinates are often called Cartesian coordinates; any point is uniquely represented by an ordered pair of real numbers. Polar coordinates are not unique; they represent a point in terms of its distance from the origin and its angular position relative to a reference ray, usually the positive horizontal axis. The relation between the rectangular description $(x, y)$ and the polar description $(r, \theta)$ is given by
$$x = r\cos\theta, \qquad y = r\sin\theta$$
so there are infinitely many choices of $\theta$ for any given point once we determine $r$ for that point. Further, we sometimes want to consider $r$ to be the signed distance from the origin; for example, $\left(2, \frac{\pi}{3}\right)$ and $\left(-2, \frac{4\pi}{3}\right)$ are both polar coordinate descriptions of the point whose rectangular coordinates are $\left(1, \sqrt{3}\right)$.

For any given polar coordinate description $(r, \theta)$ we unambiguously obtain the rectangular description $(x, y)$ from the above equations. But to go in the other direction requires some conventional choices to obtain a unique polar description. These conventions vary by context, but the default is to take $r \ge 0$ and $\theta \in [0, 2\pi)$. Then we can compute
$$r = \sqrt{x^2 + y^2}$$
and use basic trigonometry to find $\theta$ in terms of $\arctan\frac{y}{x}$. Even though $r$ is now uniquely determined, we need to be careful in computing $\theta$ because the range of the inverse tangent function is $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$, which only covers half of the plane. In order to have $\theta \in [0, 2\pi)$ we set
$$\theta = \begin{cases} \arctan\dfrac{y}{x}, & x > 0,\ y \ge 0 \\[4pt] 2\pi + \arctan\dfrac{y}{x}, & x > 0,\ y < 0 \\[4pt] \pi + \arctan\dfrac{y}{x}, & x < 0 \\[4pt] \dfrac{\pi}{2}, & x = 0,\ y > 0 \\[4pt] \dfrac{3\pi}{2}, & x = 0,\ y < 0 \end{cases}$$
The only point not covered by these conventions is the origin. Note that the polar description of a point associates the point with a position vector, whereby $r$ is its magnitude and $\theta$ is its direction.
The origin would correspond to the zero vector, which has magnitude 0 but no direction. Therefore, $\theta$ is not defined for the origin; we simply write $r = 0$ as the polar description of the point whose rectangular coordinates are $(0, 0)$.

Polar coordinates are easily extended to $\mathbb{R}^3$ if we want to emphasize radial symmetry about the vertical axis:
$$x = r\cos\theta, \qquad y = r\sin\theta, \qquad z = z$$
The coordinates $(r, \theta, z)$ are called cylindrical coordinates because the set of points described by the equation $r = a$ is a cylinder of radius $a$ whose axis of symmetry is the $z$-axis. Note also that the equation $\theta = \alpha$ is a half-plane bounded by the $z$-axis if we only allow non-negative values for $r$, whereas it is a plane through the $z$-axis if $r$ can be a signed distance from the $z$-axis. The equation $z = a$ is a plane parallel to the $xy$-plane.

A third coordinate system commonly used in $\mathbb{R}^3$, spherical coordinates, describes points relative to their distance $\rho = \sqrt{x^2 + y^2 + z^2}$ from the origin. Spherical coordinates are the mathematical formalization of latitude and longitude on the globe:
$$x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi$$
Here $\theta \in [0, 2\pi)$ has the same interpretation as in polar and cylindrical coordinates. The angular coordinate $\phi$ is usually measured off of the positive vertical axis, so $\phi \in [0, \pi]$. If the point with rectangular coordinates $(x, y, z)$ is projected orthogonally into the $xy$-plane, note that its polar coordinates would be $(r, \theta)$ where
$$r = \rho\sin\phi$$
The equation $\rho = a$ describes a sphere of radius $a$ centered at the origin; the equation $\theta = \alpha$ is a half-plane bounded by the $z$-axis because there is generally no advantage to allowing $\rho$ to be negative; the equation $\phi = \beta$ is the branch of a cone with vertex at the origin consisting of the points $P$ such that the angle between $\mathbf{k}$ and $\overrightarrow{OP}$ is $\beta$. Since $\left|\overrightarrow{OP}\right| = \rho$ we have
$$\phi = \arccos\left(\frac{\mathbf{k} \cdot \overrightarrow{OP}}{\rho}\right)$$
because the range of the inverse cosine function is $[0, \pi]$.
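The conversion conventions above translate directly into a short routine. The sketch below is an illustration rather than part of the course material: the function names are assumptions, and returning `None` for an undefined angle is just one possible way to handle the origin and the $z$-axis.

```python
import math

def to_polar(x, y):
    """Return (r, theta) with r >= 0 and theta in [0, 2*pi), up to rounding.

    theta is None at the origin, where the direction is undefined.
    """
    r = math.hypot(x, y)
    if x > 0 and y >= 0:
        theta = math.atan(y / x)
    elif x > 0 and y < 0:
        theta = 2 * math.pi + math.atan(y / x)
    elif x < 0:
        theta = math.pi + math.atan(y / x)
    elif y > 0:            # remaining cases all have x == 0
        theta = math.pi / 2
    elif y < 0:
        theta = 3 * math.pi / 2
    else:
        theta = None
    return r, theta

def to_spherical(x, y, z):
    """Return (rho, theta, phi), with phi in [0, pi] measured from the positive z-axis.

    theta is None for points on the z-axis; the origin is rejected.
    """
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0:
        raise ValueError("angles are undefined at the origin")
    _, theta = to_polar(x, y)
    # Clamp guards against z / rho drifting just outside [-1, 1] by rounding.
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    return rho, theta, phi

def from_spherical(rho, theta, phi):
    """Inverse map: x = rho sin(phi) cos(theta), y = rho sin(phi) sin(theta), z = rho cos(phi)."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

print(to_polar(1, math.sqrt(3)))    # r = 2, theta = pi/3, up to rounding
print(to_polar(1, -1))              # theta lands in the fourth quadrant
print(from_spherical(*to_spherical(1.0, 2.0, 3.0)))  # recovers (1, 2, 3) up to rounding
```

Python's `math.atan2` would shorten the case analysis, but the explicit branches follow the five cases listed in the text.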
Cylindrical and spherical coordinates will be useful in describing surfaces and solids in $\mathbb{R}^3$, so it is important to understand both the geometry and algebra of these systems. For example, the set of points that satisfy
$$\rho = 4\csc\phi\sec\theta$$
might be more familiar if expressed in rectangular coordinates, which we can obtain from the above relations, as follows:
$$\rho = 4 \cdot \frac{1}{\sin\phi} \cdot \frac{1}{\cos\theta}$$
$$\rho\sin\phi = \frac{4}{\cos\theta}$$
$$r = 4 \cdot \frac{r}{x}$$
$$x = 4$$
Thus, $\rho = 4\csc\phi\sec\theta$ is the spherical coordinate description of the plane $x = 4$.

Suggested Exercises for Chapter 2

The following exercises are not to be handed in. They represent skills required for basic mastery.

2.1 (pages 85-87): 3, 4, 5, 6-24, 35
2.2 (pages 103-105): 3, 6, 9-21, 26
2.3 (pages 115-116): 1, 3, 5, 9, 13-19, 25
2.4 (pages 123-124): 3, 5, 7, 13, 17-21, 23
2.5 (pages 132-134): 3, 7, 9, 11, 15
2.6 (pages 142-143): 1, 3, 7, 9, 11, 17

Second Graded Assignment

Due: No later than November 10

To reinforce written communication skills the Graded Assignment solutions should be clearly presented in a "bluebook" or provided in PDF format. Late papers will not be graded.

Second Graded Assignment. Do any one of the following:
  Page 87: 42
  Page 124: 24
  Page 143: 28
  Page 146: 44

Real-Valued Functions

If $f$ is a function whose domain is a subset $A$ of $\mathbb{R}^n$ we say $f$ is real-valued (scalar-valued) provided $f : A \to \mathbb{R}$. For example, if $n = 2$ then $f$ assigns a numerical output to any point in the plane that belongs to $A$. We typically determine the domain $A$ from the properties of $f$, but we can also restrict the domain to a proper subset $U \subset A$. Consider
$$f(x, y) = \frac{1}{xy}$$
The largest possible domain for $f$ is
$$A = \{(x, y) : x \neq 0,\ y \neq 0\}$$
which is $\mathbb{R}^2$ with the two coordinate axes removed. In context, this domain could be restricted, for example, by taking $U$ to be the open first quadrant.

The range of a real-valued function is the set of real numbers that are possible outputs of $f$. In the example above, the range is all non-zero numbers: $(-\infty, 0) \cup (0, \infty)$.
For
$$f(x, y) = \sqrt{x^2 + y^2}$$
$A = \mathbb{R}^2$ and the range of $f$ is $[0, \infty)$; this function takes a point in the plane and gives its distance from the origin.

If $n = 1$ we are in the realm of single-variable calculus. Recall that the graph of such a function on a domain $U$ was defined to be
$$\{(x, f(x)) : x \in U\}$$
so the graph is a subset of $\mathbb{R}^2$. Similarly for $n > 1$, the graph of $f$ on $U$ is
$$\{(\mathbf{x}, f(\mathbf{x})) : \mathbf{x} \in U\}$$
where $\mathbf{x} = (x_1, \ldots, x_n)$. In general, then, the graph of a real-valued function is a subset of $\mathbb{R}^{n+1}$. If $n = 2$ then we can represent the graph as a surface in $\mathbb{R}^3$, in the same way that we drew the graph of a single-variable function as a curve in $\mathbb{R}^2$. The figure below shows the graph of $f(x, y) = x^2 - y^2$ on the restricted domain
$$U = \{(x, y) : -4 \le x \le 4,\ -4 \le y \le 4\}$$

[Figure: graph of $f(x, y) = x^2 - y^2$]

One of our goals is to understand how such functions change in the vicinity of a particular domain point $\mathbf{x}$. The geometry of the graph will be an important tool. To simplify notation we often replace subscripts with familiar variable names. Thus, for $n = 2$,
$$x_1 = x, \qquad x_2 = y, \qquad z = f(x, y)$$
and for $n = 3$,
$$x_1 = x, \qquad x_2 = y, \qquad x_3 = z, \qquad w = f(x, y, z)$$
In applications, the names for the variables may be adapted to certain measurable quantities, such as $t$ for time. For $n > 3$ we use subscripts generically, but most of our work will not require higher dimensions explicitly.

Sections of Graphs

If $n > 2$ then we cannot represent the graph of $f$ by a geometric figure that we can readily perceive. The vector techniques that we have developed will nonetheless allow us to analyze the function. These techniques will exploit basic Cartesian geometry. For example, if we restrict the graph of $f$ by holding one of the input coordinates constant we obtain a section of the graph. Consider the graph of the saddle-shaped surface in the above example, and set $x = 2$. Then $f(2, y) = 4 - y^2$.
The curve $z = 4 - y^2$ is a parabola in the plane $x = 2$, which is parallel to the $yz$-plane, so the curve can be described in terms of the $y$ and $z$ variables. Similarly, if we set $y = 3$ we obtain the parabola $z = x^2 - 9$ in this plane parallel to the $xz$-plane. The shape of the surface shows why the first of these parabolas "opens downward" whereas the second one "opens upward".

A section obtained by setting an input coordinate equal to 0 is sometimes called a trace section. A trace section, then, is just the intersection of the surface with a coordinate plane for the input coordinates. What are the trace sections for $f(x, y) = x^2 - y^2$?

While sections can be formed for any real-valued function we will primarily use them when $n = 2$ in order to understand the geometry of surfaces in $\mathbb{R}^3$. Here is an example, however, of how to find sections when $n = 3$. Consider
$$f(x, y, z) = x^2 - y^2 + z^2$$
The trace section $z = 0$ is described by the Cartesian relation
$$w = x^2 - y^2$$
which is the saddle-shaped surface again, but here we are imagining it in the $xyw$-space in $\mathbb{R}^4$ obtained by setting $z = 0$. Functions $f : \mathbb{R}^3 \to \mathbb{R}$ are used extensively in physical applications and they are frequently studied by constructing sections.

Level Sets

If instead of holding an input coordinate constant we set the output variable equal to a given value $c$ then we obtain the level set of value $c$
$$\{\mathbf{x} \in U : f(\mathbf{x}) = c\}$$
For $n = 2$ this will give us a curve in the domain plane, sometimes called a level curve (or a contour, as it is referred to on a topographical map). For $f(x, y) = x^2 - y^2$ the level curve $z = c$ is a hyperbola. Note that the hyperbola opens along the $x$-axis if $c > 0$ but opens along the $y$-axis if $c < 0$. What happens if $c = 0$?

The level set is a subset of the domain but we can also represent it as a slice of the graph.

[Figure: the intersection of the graph with the plane corresponding to $z = -9$]

The actual level curve is the hyperbola $x^2 - y^2 = -9$ in $\mathbb{R}^2$.
We obtain an image of this curve in the plane $z = -9$ parallel to (and below since $c < 0$) the $xy$-plane.

Level sets are particularly useful when $n = 3$ because we can use them to understand subsets of the domain that produce a desired output value. These subsets will be level surfaces in $\mathbb{R}^3$, which in principle we can render graphically. We will find it particularly useful to represent a general surface in $\mathbb{R}^3$ as the level surface $f(x, y, z) = 0$ for some function $f$. In general, these can be difficult to render but certain examples, often called quadric surfaces because they involve quadratic expressions in $x, y, z$, are straightforward to analyze. They can be grouped by the degree and sign of the various domain variables:

$w = f(x, y, z) = x^2 + y^2 + z^2 - c$ : In this case $w = 0$ is a sphere of radius $\sqrt{c}$ if $c > 0$, the single point $(0, 0, 0)$ if $c = 0$, and empty if $c < 0$. For example, there are no points $(x, y, z)$ in the domain such that $f(x, y, z) = x^2 + y^2 + z^2 + 1 = 0$.

$w = f(x, y, z) = x^2 + y^2 - z^2 - c$ : If $c = 0$ then the surface $w = 0$ is a cone with vertex at the origin. If $c > 0$ then the surface is a single-sheeted hyperboloid whose intersection with the $xy$-plane is a circle of radius $\sqrt{c}$. If $c < 0$ then the surface is a double-sheeted hyperboloid that does not intersect the $xy$-plane. (See Figure 2.1.13 on page 83.) What happens when we intersect these surfaces with planes perpendicular to the $xy$-plane?

$w = f(x, y, z) = x^2 + y^2 \pm z - c$ : The surfaces $w = 0$ are paraboloids, bowl-shaped surfaces that open either up or down depending on the coefficient of $z$. For example, if $f(x, y, z) = x^2 + y^2 - z + 4$ then the paraboloid $w = 0$ is described explicitly by
$$z = x^2 + y^2 + 4$$
The plane $z = 4$ in the domain space intersects this surface only at $(0, 0, 4)$. A plane $z = a$ intersects it in a circle if $a > 4$ but will not intersect it if $a < 4$.
[Figure: the paraboloid $z = x^2 + y^2 + 4$]

$w = f(x, y, z) = x^2 - y^2 - z - c$ : The surfaces $w = 0$ are hyperbolic paraboloids. For example, $x^2 - y^2 - z = 0$ is the saddle-shaped surface above.

Cylinders: These surfaces consist of all translations of a plane curve along a line and occur when the dependence of $f$ on one or more of the input variables can be removed by a linear transformation. For example, the cylinder

[Figure: the cylinder $x^2 + y - z = 0$]

is congruent to

[Figure: the cylinder $x^2 - 2z = 0$]

which is the translation along the $y$-axis of the curve $z = \frac{1}{2}x^2$ in the $xz$-plane. The first cylinder is just a rotation in $\mathbb{R}^3$ of the second one.

Every quadric surface is equivalent to one of the above types. For example, $x^2 + 2y^2 + z^2 - 4 = 0$ is an ellipsoid, obtained from a sphere by a linear transformation of the variables.

[Figure: the ellipsoid $x^2 + 2y^2 + z^2 = 4$]

Continuity

Recall that a single-variable function $y = f(x)$ is continuous at $x = a$ provided
$$\lim_{x \to a} f(x) = f(a)$$
This definition of continuity states that $a$ is in the domain of $f$ and that the value of the function at $a$ is equal to the limit of values as $x$ approaches $a$. In order to adapt this definition to functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ we need to revisit the concept of limit in terms of vector quantities. We will first study real-valued functions $f : \mathbb{R}^n \to \mathbb{R}$ because each component function in the general case $f : \mathbb{R}^n \to \mathbb{R}^m$ is real valued.

Recall from single-variable calculus: for $y = f(x)$ with domain $A$ we say
$$\lim_{x \to x_0} f(x) = b$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $x \in A$ with $0 < |x - x_0| < \delta$, we have $|f(x) - b| < \varepsilon$.

It is important to remember in this definition that $x_0$ need not be in the domain $A$, and even if $x_0 \in A$ it need not be the case that $f(x_0) = b$ for the limit to be $b$. For $f$ to be continuous at $x_0$, however, we must have $x_0 \in A$ and $f(x_0) = b$.
If we think of the limit definition in terms of distances in the range as related to distances in the domain then we have a direct generalization to functions from $\mathbb{R}^n$ to $\mathbb{R}$:

For $y = f(\mathbf{x}) \in \mathbb{R}$ with domain $A \subset \mathbb{R}^n$ we say
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = b$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $\mathbf{x} \in A$ with $0 < |\mathbf{x} - \mathbf{x}_0| < \delta$, we have $|f(\mathbf{x}) - b| < \varepsilon$.

The generalization to $f : \mathbb{R}^n \to \mathbb{R}^m$ is now natural:

For $\mathbf{y} = f(\mathbf{x}) \in \mathbb{R}^m$ with domain $A \subset \mathbb{R}^n$ we say
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = \mathbf{b}$$
if and only if for every number $\varepsilon > 0$ there is a $\delta > 0$ such that, for any $\mathbf{x} \in A$ with $0 < |\mathbf{x} - \mathbf{x}_0| < \delta$, we have $|f(\mathbf{x}) - \mathbf{b}| < \varepsilon$.

We just need to remember that $|\mathbf{v}|$ is the magnitude of the vector $\mathbf{v}$, so $|\mathbf{x} - \mathbf{x}_0|$ for example is the magnitude of the difference of the vectors $\mathbf{x}$ and $\mathbf{x}_0$, which is just the distance between the points that represent these vectors. The existence of the limit is the idea that we can make the output of $f$ as close to the point $\mathbf{b}$ as we like just by making the input $\mathbf{x}$ close enough to the point $\mathbf{x}_0$.

The definition of continuity carries over as well: The function $f$ is continuous at $\mathbf{x}_0$ in its domain provided
$$\lim_{\mathbf{x} \to \mathbf{x}_0} f(\mathbf{x}) = f(\mathbf{x}_0)$$
For a single-variable function the existence of a limit can be determined by analyzing what happens as $x$ approaches the target value from the left and from the right. The limit can exist from one direction but not from the other, or from neither. The limit itself exists when the limit from each direction exists and these limiting values agree. When we try to carry this idea over to multivariable functions we realize that there are infinitely many directions of approach as well as infinitely many paths of approach to the target point. Since we cannot check all of these modes of approach individually it is necessary to introduce techniques from analysis to determine limits. We do not need very much analysis at this introductory level, but it is helpful to understand a few common terms.

Definition. Let $r > 0$ and let $\mathbf{x}_0 \in \mathbb{R}^n$.
The set $D_r(\mathbf{x}_0)$ is the collection of points $\mathbf{x}$ in $\mathbb{R}^n$ such that $|\mathbf{x} - \mathbf{x}_0| < r$. A subset $U \subset \mathbb{R}^n$ is called an open set if for every $\mathbf{x}_0 \in U$ there is an $r > 0$ such that $D_r(\mathbf{x}_0)$ is contained in $U$.

These so-called 'disks' $D_r(\mathbf{x}_0)$ generalize the open intervals from single-variable calculus, where an open set $U \subset \mathbb{R}$ has the property that an open interval centered at each of its points can be found that is entirely contained in $U$. Limits in single-variable calculus were often most important when approaching certain boundary points in the domain.

Definition. A point $\mathbf{x}$ in $\mathbb{R}^n$ is a boundary point of a set $A \subset \mathbb{R}^n$ if every disk centered at $\mathbf{x}$ contains at least one point in $A$ and at least one point not in $A$.

For example, if $A = \{(x, y) : x^2 + y^2 \le 1\}$ in $\mathbb{R}^2$ then the boundary points of $A$ are precisely the points of the unit circle. If $B = \{(x, y) : x^2 + y^2 < 1\}$ in $\mathbb{R}^2$ then the boundary points of $B$ are also the points of the unit circle. Thus, boundary points may or may not belong to the set itself. Sometimes we use $\partial A$ to denote the collection of all boundary points of $A$, and call $\partial A$ the boundary of $A$.

Real-Valued Polynomials and Rational Functions. Let $A$ be the natural (unrestricted) domain of the function $f$. If $f(\mathbf{x}) = P(\mathbf{x})$ where $P(x_1, \ldots, x_n)$ is a polynomial in the individual variables then $A = \mathbb{R}^n$ and $f$ is continuous at any $\mathbf{x}_0$. If $f(\mathbf{x}) = \frac{P(\mathbf{x})}{Q(\mathbf{x})}$ where $P, Q$ are polynomials then $A$ consists of all points $\mathbf{x}$ such that $Q(\mathbf{x}) \neq 0$. Even if $Q(\mathbf{x}_0) = 0$ the limit as $\mathbf{x} \to \mathbf{x}_0$ may exist, in which case we can define $f(\mathbf{x}_0)$ to be this limit and thus extend $f$ to a function continuous at $\mathbf{x}_0$. Consider
$$f(x, y) = \frac{x^2y^2}{x^2 + y^2}$$
For this function $A = \mathbb{R}^2 \setminus \{(0, 0)\}$ but the limit as $\mathbf{x} \to \mathbf{x}_0 = (0, 0)$ exists. One way to see this is to argue that, no matter what path of approach is taken by $\mathbf{x}$ to get to the origin, the value $f(\mathbf{x})$ can be expressed in terms of $|\mathbf{x}|$.
This is commonly done by converting to polar coordinates:
$$f(x, y) = f(r, \theta) = \frac{r^4\cos^2\theta\sin^2\theta}{r^2}$$
As long as $\mathbf{x} \neq \mathbf{x}_0$ we have $r \neq 0$ in which case
$$f(r, \theta) = r^2\cos^2\theta\sin^2\theta$$
By taking $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$ we can make $r$ arbitrarily small. The value of $\cos^2\theta\sin^2\theta$ may vary considerably depending on the path of approach, but this value remains bounded between 0 and 1, so $f(r, \theta)$ can also be made arbitrarily small. We conclude
$$\lim_{(x, y) \to (0, 0)} f(x, y) = 0$$
which allows us to extend $f$ to a function continuous on all of $\mathbb{R}^2$:
$$f(x, y) = \frac{x^2y^2}{x^2 + y^2}, \qquad f(0, 0) = 0$$
This explains why a computer graph of $z = \frac{x^2y^2}{x^2 + y^2}$ looks unbroken at the origin:

[Figure: graph of $z = \frac{x^2y^2}{x^2 + y^2}$]

By contrast, consider
$$f(x, y) = \frac{xy}{x^2 + y^2}$$
which has the same natural domain $A = \mathbb{R}^2 \setminus \{(0, 0)\}$, but this time the limit as $(x, y) \to (0, 0)$ does not exist. Converting to polar coordinates shows that a straight line path to the origin produces different limits depending on the angle that the line makes with the $x$-axis; for example, approaching along either axis produces a limit of 0 but the limit is $\frac{1}{2}$ if we approach along the line $y = x$. A computer graph of $z = \frac{xy}{x^2 + y^2}$ will attempt to display this "rupture" in the graph at the origin by shading, but the limitations of the rendering are easily revealed. Here are two planes parallel to the domain plane. The curves of intersection with the surface representing the graph are actually pairs of intersecting lines. If we approach $(0, 0)$ along the level curves corresponding to these lines the function approaches the values given by the heights of the planes.

[Figure: graph of $z = \frac{xy}{x^2 + y^2}$ with the planes $z = \frac{2}{5}$ and $z = \frac{1}{3}$]

Exercise. Show that the level curves of value $c$ are
$$y = \frac{1 \pm \sqrt{1 - 4c^2}}{2c}\, x$$
Thus, the range of $f$ is $\left[-\frac{1}{2}, \frac{1}{2}\right]$. What happens when $c = \frac{1}{2}$?

Similar techniques can be used to analyze functions of more than two variables.
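The contrast between the two rational functions above is easy to observe numerically. The following sketch is only an illustration (the function names `f` and `g` and the sample slopes are choices made here, not part of the notes): it evaluates $\frac{xy}{x^2+y^2}$ along lines $y = mx$, where the value stays at $\frac{m}{1+m^2}$ however close we get to the origin, and $\frac{x^2y^2}{x^2+y^2}$ along two lines, where the values shrink to 0.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2), defined away from the origin."""
    return x * y / (x * x + y * y)

def g(x, y):
    """g(x, y) = x^2 y^2 / (x^2 + y^2), defined away from the origin."""
    return (x * x) * (y * y) / (x * x + y * y)

# Along y = m x the value of f depends only on the slope m, not on the distance
# to the origin, so different lines give different limiting values.
for m in (0.0, 0.5, 1.0, 2.0):
    values = [f(t, m * t) for t in (1e-1, 1e-3, 1e-6)]
    print(m, values, m / (1 + m * m))

# By contrast, g is squeezed to 0: in polar form g = r^2 cos^2(theta) sin^2(theta).
for t in (1e-1, 1e-3, 1e-6):
    print(t, g(t, t), g(t, 2 * t))
```

Sampling a few paths can show that a limit fails to exist, as it does for `f`, but it cannot prove that a limit exists; for `g` the polar-coordinate bound in the text is what settles the question.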
For example, by converting to spherical coordinates
$$x = \rho\cos\theta\sin\phi, \qquad y = \rho\sin\theta\sin\phi, \qquad z = \rho\cos\phi$$
show that
$$\lim_{(x, y, z) \to (0, 0, 0)} f(x, y, z) = 0$$
for the function $f(x, y, z) = \frac{xyz}{x^2 + y^2 + z^2}$. Show that the limit as $(x, y, z) \to (0, 0, 0)$ does not exist for the function $f(x, y, z) = \frac{xy + yz + xz}{x^2 + y^2 + z^2}$. For the first function,
$$f(\rho, \theta, \phi) = \rho\cos\theta\sin\theta\sin^2\phi\cos\phi$$
which goes to zero as the domain point approaches the origin since $\rho \to 0$ and $\cos\theta\sin\theta\sin^2\phi\cos\phi$ is bounded. For the second function,
$$f(\rho, \theta, \phi) = \big((\cos\theta + \sin\theta)\cos\phi + \cos\theta\sin\theta\sin\phi\big)\sin\phi$$
which does not even depend on $\rho$ and approaches different values for various directions of approach to the origin in $\mathbb{R}^3$.

Properties of continuous functions. Most of the continuity results for single-variable functions generalize to multivariable functions (see page 98). If the range of $f$ is in $\mathbb{R}^m$ then we have
$$f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x}))$$
and it can be shown that $f$ is continuous at $\mathbf{x}_0$ if and only if each of the real-valued coordinate functions $f_j$ is continuous at $\mathbf{x}_0$. Using the concept of open sets, the following theorem about continuity of the composition of continuous functions is usually proved in an analysis course:

Theorem. Let $g : A \subset \mathbb{R}^n \to \mathbb{R}^m$ and let $f : B \subset \mathbb{R}^m \to \mathbb{R}^p$, and suppose $g(A) \subset B$. Then $f \circ g$ is defined on $A$, and if $g$ is continuous at $\mathbf{x}_0$ and $f$ is continuous at $g(\mathbf{x}_0)$ then $f \circ g$ is continuous at $\mathbf{x}_0$.

Example. The natural domain of
$$h(x, y, z) = \frac{\sin(xyz)}{xyz}$$
is $\mathbb{R}^3$ with the coordinate planes removed (the eight open octants). Any point $\mathbf{x}_0$ on a coordinate plane is a boundary point of the domain because every open ball of positive radius centered at $\mathbf{x}_0$ contains points in the domain and points not in the domain. In particular, the origin $\mathbf{x}_0 = (0, 0, 0)$ is a boundary point. Does $\lim_{\mathbf{x} \to \mathbf{x}_0} h(\mathbf{x})$ exist? Changing coordinate systems is not much help here, but the above theorem allows us to find the limit easily. Let $g(x, y, z) = xyz$ and let
$$f(t) = \frac{\sin t}{t}$$
Then $g : \mathbb{R}^3 \to \mathbb{R}$ and $f : B \to \mathbb{R}$ where $B = \mathbb{R} \setminus \{0\}$. However,
$$\lim_{t \to 0} \frac{\sin t}{t} = 1$$
so we can extend $f$ to a continuous function on all of $\mathbb{R}$ by defining $f(0) = 1$:

[Figure: graph of $f(t) = \frac{\sin t}{t}$, $f(0) = 1$]

Then $h = f \circ g$ has now been extended to include the origin in its domain. Since $g$ is a polynomial it is continuous at every point in $\mathbb{R}^3$. But $g(0, 0, 0) = 0$ and $f$ is continuous at 0, so by the theorem we have $f \circ g$ is continuous at $(0, 0, 0)$. In particular, $\lim_{\mathbf{x} \to \mathbf{x}_0} h(\mathbf{x}) = 1$.

Note. The above example is exercise 11b) on page 103. The answer in the appendix is a typo.

Partial Derivatives

The derivative of a multivariable function will be defined in a way that generalizes the slope of the tangent line to the graph of a single-variable function. The first step toward this generalization is to define the tangent plane to the graph of a real-valued function of two variables. That definition will require the idea of a partial derivative.

Recall that a set $U \subset \mathbb{R}^n$ is open when for every $\mathbf{x}_0$ in $U$ there exists $r > 0$ such that $D_r(\mathbf{x}_0) \subset U$, where $D_r(\mathbf{x}_0) = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{x}_0| < r\}$. Note that if $n = 1$ then $D_r(x_0)$ is just the open interval of radius $r$ centered at $x_0$. When we denote a set by $U$, the assumption is that the set is open.

Let $f : U \subset \mathbb{R}^n \to \mathbb{R}$, $\mathbf{x} = (x_1, \ldots, x_n)$. The partial derivative of $f$ with respect to $x_j$ is the real-valued function, denoted $\frac{\partial f}{\partial x_j}$, where
$$\frac{\partial f}{\partial x_j}(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{e}_j) - f(\mathbf{x})}{h}$$
Thus, the domain of $\frac{\partial f}{\partial x_j}$ is the set of points in $\mathbb{R}^n$ for which the limit exists.

For most elementary functions it is not necessary to compute these limits explicitly. The ordinary rules of differentiation apply by treating the input variables other than $x_j$ as constants. For example, if $f(x, y, z) = \sin(xyz)$ then
$$\frac{\partial f}{\partial x_2}(\mathbf{x}) = \frac{\partial f}{\partial y}(x, y, z) = xz\cos(xyz)$$
As with single-variable functions, however, it is sometimes necessary to use the limit calculation. Consider the function
$$f(x, y) = \frac{xy^2}{x^2 + y^4}$$
for which $A = \mathbb{R}^2 \setminus \{\mathbf{0}\}$.
We cannot define $f$ to be continuous at $(0, 0)$ because if we approach along any straight line the limit is 0, but if we approach along the parabola $x = ay^2$ the limit is $\frac{a}{a^2 + 1}$. Still, we can extend the domain to all of $\mathbb{R}^2$ by defining $f(0, 0)$ to be anything we like, say $f(0, 0) = 0$. Now we can try to compute the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at the origin:
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{h \to 0} \frac{f(0 + h, 0) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$
$$\frac{\partial f}{\partial y}(0, 0) = \lim_{h \to 0} \frac{f(0, 0 + h) - f(0, 0)}{h} = \lim_{h \to 0} \frac{0 - 0}{h} = 0$$
This example shows that the partial derivatives of a function can exist at a point even if the function is not continuous there. In this case, however, the partial derivative functions are not continuous at the origin. At any point other than $(0, 0)$ the usual rules of differentiation show
$$\frac{\partial f}{\partial x}(x, y) = -y^2\,\frac{x^2 - y^4}{(x^2 + y^4)^2}$$
$$\frac{\partial f}{\partial y}(x, y) = 2xy\,\frac{x^2 - y^4}{(x^2 + y^4)^2}$$
Exercise. Show that
$$\lim_{x \to 0} \frac{\partial f}{\partial x}(x, mx) = -m^2$$
$$\lim_{x \to 0} \frac{\partial f}{\partial y}(x, mx) = 2m$$
so for each partial derivative function the limit as $(x, y) \to (0, 0)$ does not exist. For each of these functions, the limiting value depends on the path of approach to the origin.

Tangent Planes and Affine Approximation

For a single-variable function $y = f(x)$ that is differentiable at $x_0$ the equation of the tangent line to the graph at $(x_0, f(x_0))$ is
$$y = l(x) = f(x_0) + f'(x_0)(x - x_0)$$
The reason that $f'(x_0)$ is the slope of this tangent line is because
$$f(x) - l(x) = f(x) - f(x_0) - f'(x_0)(x - x_0)$$
and so
$$\lim_{x \to x_0} \frac{f(x) - l(x)}{x - x_0} = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) = 0$$
by the definition of the derivative. In other words, the affine linear function $y = l(x)$ is a good approximation of $f$ near $x_0$. Thinking of $x$ as the parameter $t$ we can express every point on this line as
$$\mathbf{l}(t) = (x_0, f(x_0)) + t\,(\mathbf{i} + f'(x_0)\mathbf{j})$$
that is,
$$x(t) = x_0 + t, \qquad y(t) = f(x_0) + f'(x_0)\,t$$
because $l(x_0 + t) = f(x_0) + f'(x_0)\,t$. In other words, the tangent line is the line through the point $(x_0, f(x_0))$ with direction vector $\mathbf{i} + f'(x_0)\mathbf{j}$.
Note, for future reference, that $f'(x_0)\mathbf{i} - \mathbf{j}$ is a vector orthogonal to the direction vector for the tangent line.

Now consider a function of two variables $z = f(x, y)$ with partial derivatives defined at $(x_0, y_0)$, and look at the plane that contains the point $(x_0, y_0, f(x_0, y_0))$ with normal vector
$$\frac{\partial f}{\partial x}(x_0, y_0)\,\mathbf{i} + \frac{\partial f}{\partial y}(x_0, y_0)\,\mathbf{j} - \mathbf{k}$$
As we have seen, the Cartesian equation for this plane is
$$\frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) - (z - f(x_0, y_0)) = 0$$
which is the graph of the affine linear function
$$z = l(x, y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0)$$
Definition. We call this plane the tangent plane to the graph of $f$ at $(x_0, y_0)$ provided
$$\lim_{(x, y) \to (x_0, y_0)} \frac{f(x, y) - l(x, y)}{|(x, y) - (x_0, y_0)|} = 0$$
In this case, the sections $x = x_0$ and $y = y_0$ for the function $l$ are lines tangent to the section curves for $f$ and we say that $f$ is differentiable at $(x_0, y_0)$.

Though the test for differentiability is technical, many of the same results apply as for single-variable functions. For example, rational functions are differentiable at all points in the natural domain $A$. Consider
$$f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}$$
What is the tangent plane at $\left(-1, 3, \frac{12}{5}\right)$? Here, $f(-1, 3) = \frac{12}{5}$ and on $A = \mathbb{R}^2 \setminus \{\mathbf{0}\}$ we have
$$\frac{\partial f}{\partial x}(x, y) = y\,\frac{4x^2y^2 + x^4 - y^4}{(x^2 + y^2)^2}$$
$$\frac{\partial f}{\partial y}(x, y) = -x\,\frac{4x^2y^2 - x^4 + y^4}{(x^2 + y^2)^2}$$
Thus
$$\frac{\partial f}{\partial x}(-1, 3) = -\frac{33}{25}, \qquad \frac{\partial f}{\partial y}(-1, 3) = \frac{29}{25}$$
and so the tangent plane at $(-1, 3, f(-1, 3))$ is the graph of the function
$$l(x, y) = \frac{12}{5} - \frac{33}{25}(x + 1) + \frac{29}{25}(y - 3) = -\frac{1}{25}(33x - 29y + 60)$$

[Figure: tangent plane at $\left(-1, 3, \frac{12}{5}\right)$]

This graph looks "smooth" at the origin because the limit of $f$ is 0 there, as can be seen by converting to polar coordinates. If we define $f(0, 0) = 0$ then $f$ becomes continuous at the origin. Will the function be differentiable there?
We see from the above calculations that both partial derivatives exist at the origin:
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{h \to 0} \frac{f(0 + h, 0) - f(0, 0)}{h} = 0$$
$$\frac{\partial f}{\partial y}(0, 0) = \lim_{h \to 0} \frac{f(0, 0 + h) - f(0, 0)}{h} = 0$$
and so
$$l(x, y) = 0$$
is the approximation function at $(x_0, y_0) = (0, 0)$. Then
$$\lim_{(x, y) \to (0, 0)} \frac{f(x, y) - l(x, y)}{|(x, y) - (0, 0)|} = \lim_{(x, y) \to (0, 0)} \frac{f(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x, y) \to (0, 0)} \frac{xy\,(x^2 - y^2)}{(x^2 + y^2)^{3/2}} = \lim_{r \to 0} \frac{1}{4}\, r\sin 4\theta = 0$$
Thus the $xy$-plane is the tangent plane at the origin. The function $f$ is differentiable there.

This is the idea that generalizes to a definition of the derivative for functions from $\mathbb{R}^n$ to $\mathbb{R}^m$. Some notation will simplify the statement of this definition. Since
$$f(\mathbf{x}) = (f_1(\mathbf{x}), \ldots, f_m(\mathbf{x}))$$
we can compute partial derivative functions for each of the component scalar functions
$$t_{ij} = \frac{\partial f_i}{\partial x_j}$$
and arrange them in a matrix $T = (t_{ij})$. Since the output of $f$ is described by $m$ coordinate functions, each of which depends on $n$ inputs, the shape of the matrix $T$ is $m \times n$. For any point $\mathbf{x}_0$ in $\mathbb{R}^n$ such that these partial derivatives all exist, the matrix $T$ represents a linear transformation
$$Df(\mathbf{x}_0) : \mathbb{R}^n \to \mathbb{R}^m$$
and we have the affine linear function
$$l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
Since $T$ is an $m \times n$ matrix we can evaluate $Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$ by expressing $\mathbf{x} - \mathbf{x}_0$ in the form of a matrix with shape $n \times 1$. The product is now an $m \times 1$ matrix, so we also express $f(\mathbf{x}_0)$ as an $m \times 1$ matrix for purposes of computation.

We say that $f$ is differentiable at $\mathbf{x}_0 \in U$ provided
$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{f(\mathbf{x}) - l(\mathbf{x})}{|\mathbf{x} - \mathbf{x}_0|} = \mathbf{0}$$
The derivative of $f$ at the point $\mathbf{x}_0$ is the linear transformation $Df(\mathbf{x}_0)$, but since it is represented by the matrix $T$ we usually say $T = Df(\mathbf{x}_0)$ for purposes of computation. Let's find the derivative of $f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}$ at $\mathbf{x}_0 = (-1, 3)$.
We have $\dfrac{\partial f}{\partial x}(-1, 3) = -\dfrac{33}{25}$ and $\dfrac{\partial f}{\partial y}(-1, 3) = \dfrac{29}{25}$ so the derivative is

$$T = Df(\mathbf{x}_0) = \begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix}$$

Note that $l(x, y) = f(-1, 3) + Df(-1, 3)\,(x + 1, y - 3)$ and to evaluate the linear transformation at $(x + 1, y - 3)$ we write

$$\begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix} \begin{bmatrix} x + 1 \\ y - 3 \end{bmatrix}$$

Thus, $l(x, y) = \dfrac{12}{5} + \begin{bmatrix} -\frac{33}{25} & \frac{29}{25} \end{bmatrix} \begin{bmatrix} x + 1 \\ y - 3 \end{bmatrix} = -\dfrac{1}{25}\,(33x - 29y + 60)$, as above.

When $m = 1$, as in our example, the linear function $Df$ is naturally associated with the vector

$$\nabla f = \frac{\partial f}{\partial x_1}\,\mathbf{e}_1 + \cdots + \frac{\partial f}{\partial x_n}\,\mathbf{e}_n$$

called the gradient of $f$, whereby $Df(\mathbf{x}_0)(\mathbf{h}) = \nabla f(\mathbf{x}_0) \cdot \mathbf{h}$ for any $\mathbf{h} \in \mathbb{R}^n$. That is, we usually use the dot-product notation when $f$ is a scalar-valued function.

As with single-variable functions, if $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $\mathbf{x}_0$ then it is continuous at $\mathbf{x}_0$. We have seen, however, that the partial derivatives can exist at a point without the function even being continuous at that point, let alone differentiable. Existence of the partial derivatives along with continuity of the function is also not enough:

$$f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}, \qquad f(0, 0) = 0$$

is continuous at $(0, 0)$, and $\dfrac{\partial f}{\partial x}(0, 0) = \dfrac{\partial f}{\partial y}(0, 0) = 0$, but the function $l(x, y) = 0$ does not satisfy the limit criterion since

$$\lim_{(x,y)\to(0,0)} \frac{f(x, y) - l(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{xy}{x^2 + y^2} \neq 0$$

because this limit does not exist. In particular, $z = l(x, y) = 0$, the $xy$-plane, cannot be called the tangent plane at the origin. The following theorem bridges the gap:

Theorem. If all partial derivatives $\dfrac{\partial f_i}{\partial x_j}$ exist in an open set containing $\mathbf{x}_0$ and are continuous at $\mathbf{x}_0$, then $f$ is differentiable at $\mathbf{x}_0$.

Such functions are said to be of class $C^1$. The function in the example above is not $C^1$. The differentiable functions we encounter will be class $C^1$, but the converse of the theorem is not true: There exist differentiable functions with discontinuous partial derivatives, that is, the resulting affine linear function $l(\mathbf{x})$ may satisfy the limit criterion even though the partial derivatives are not continuous at the given point.
As an example we need only look to a single-variable function, such as

$$f(x) = x^2 \sin\frac{1}{x}$$

which can be made continuous at the origin by setting

$$f(0) = \lim_{x\to 0} x^2 \sin\frac{1}{x} = 0$$

Then

$$f'(0) = \lim_{h\to 0} \frac{h^2 \sin\frac{1}{h} - 0}{h} = 0$$

and if $x \neq 0$

$$f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}$$

However, $\lim_{x\to 0}\left(2x \sin\frac{1}{x} - \cos\frac{1}{x}\right)$ does not exist, so the derivative exists at the origin (the function $l(x) = 0$ is its linear approximation) but the derivative is not continuous at the origin.

To summarize: The partial derivatives can exist at a point without the function being differentiable there. If the partial derivatives exist and are continuous at a point then the function will be differentiable there, but the function can be differentiable at a point without the partial derivatives being continuous there.

Functions with polynomial components are particularly good examples for practice in computing derivatives. Consider

$$f(x, y) = \left(x^2 + y^2,\; xy,\; x^2 - y^2\right)$$

and let $\mathbf{x}_0 = (3, 4)$. Then $Df(\mathbf{x})$ is represented by

$$T = \begin{bmatrix} 2x & 2y \\ y & x \\ 2x & -2y \end{bmatrix}$$

At $\mathbf{x}_0 = (3, 4)$ we have $T = Df(\mathbf{x}_0) = \begin{bmatrix} 6 & 8 \\ 4 & 3 \\ 6 & -8 \end{bmatrix}$ and so the affine linear approximation to $f$ near $(3, 4)$ is

$$l(x, y) = \begin{bmatrix} 25 \\ 12 \\ -7 \end{bmatrix} + \begin{bmatrix} 6 & 8 \\ 4 & 3 \\ 6 & -8 \end{bmatrix} \begin{bmatrix} x - 3 \\ y - 4 \end{bmatrix} = \begin{bmatrix} 6x + 8y - 25 \\ 4x + 3y - 12 \\ 6x - 8y + 7 \end{bmatrix}$$

which represents the function $l(x, y) = (6x + 8y - 25,\; 4x + 3y - 12,\; 6x - 8y + 7)$. The point $(3.01, 3.98)$ is "close" to $\mathbf{x}_0$ in the domain. We have

$$l(3.01, 3.98) = (24.9,\; 11.98,\; -6.78)$$
$$f(3.01, 3.98) = (24.901,\; 11.980,\; -6.7803)$$

The approximation of $f$ by $l$ will get better as the point in the domain gets closer to $\mathbf{x}_0$, but will get worse as the point gets farther away because $f$ itself has quadratic components.

Optional Topic

If we suspect that a function is differentiable at a point where it can be made continuous, sometimes we can "guess" the affine linear function $l$ and test it directly. Consider $f(x, y) = \dfrac{\sin(xy)}{xy}$, which becomes continuous at $(0, 0)$ if we let $f(0, 0) = 1$.
The graph looks very smooth at the origin and the symmetry leads us to believe that $z = 1$ is the tangent plane to the graph at $(0, 0, 1)$.

[Figure: $z = \dfrac{\sin(xy)}{xy}$]

We suspect, then, that $l(x, y) = 1$ will work:

$$\lim_{(x,y)\to(0,0)} \frac{f(x, y) - l(x, y)}{\sqrt{x^2 + y^2}} = \lim_{(x,y)\to(0,0)} \frac{\frac{\sin(xy)}{xy} - 1}{\sqrt{x^2 + y^2}} = \lim_{r\to 0} \frac{\sin\left(r^2 \cos\theta \sin\theta\right) - r^2 \cos\theta \sin\theta}{r^3 \sin\theta \cos\theta}$$

If you are familiar with L'Hôpital's Rule, use it twice to obtain

$$\lim_{r\to 0}\; -\frac{2}{3}\, r \sin(2\theta) \sin\left(r^2 \cos\theta \sin\theta\right) = 0$$

Thus, $z = 1$ is, in fact, the tangent plane, and $Df(0, 0) = \begin{bmatrix} 0 & 0 \end{bmatrix}$. This implies

$$\frac{\partial f}{\partial x}(0, 0) = 0 \qquad \frac{\partial f}{\partial y}(0, 0) = 0$$

In fact, we can show that $z = 1$ is the tangent plane for any point where either $x = 0$ or $y = 0$ because we obtain a limit of 0 for arbitrary $y$ after setting up the difference quotient to calculate $\dfrac{\partial f}{\partial x}$ near $x = 0$. Since $f$ is symmetric in $x, y$ the same result holds for $\dfrac{\partial f}{\partial y}$. It follows that $f$ is differentiable at any point in $\mathbb{R}^2$ if we define $f(x, 0) = f(0, y) = 1$.

Paths and Curves in $\mathbb{R}^n$

The vector/parametric description of a line is an example of a path whose image is the line we want to describe. Just as many parametric representations describe the same line, we can describe a curve by many paths.

A path in $\mathbb{R}^n$ is a function $c : A \subset \mathbb{R} \to \mathbb{R}^n$. The image of $A$ in $\mathbb{R}^n$ is the curve $C$ parameterized by the path $c$. If $A$ is the closed interval $[a, b]$ we call $c(a)$ and $c(b)$ the endpoints of the curve.

As with lines, we typically use $t$ to denote the domain variable (parameter)

$$c(t) = (x_1(t), \ldots, x_n(t))$$

because many applications require locating a point on a curve as a function of time. A familiar path is $c(t) = (\cos t, \sin t)$, for which the curve $C$ is the unit circle in $\mathbb{R}^2$. Note that the unit circle is also described by the path $c(t) = (\cos \omega t, \sin \omega t)$ for any non-zero constant $\omega$. (In applications, $\omega$ is sometimes called the angular speed of the path.)

There are infinitely many paths for any given curve $C$. For example, $c(t) = (\cos t, \sin t)$ for $t \in U = \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ describes the portion of the unit circle in the open half-plane $x > 0$. This curve has no endpoints since $U$ is an open interval. The same curve is described by the path

$$c(t) = \left(\frac{2}{\sqrt{4 + t^2}},\; \frac{t}{\sqrt{4 + t^2}}\right), \qquad t \in \mathbb{R}$$

It is useful to think of $C$ being traced out by the vector whose position representation is $\overrightarrow{OP}$ where $P = c(t)$.

[Figure: $c(t) = \left(\frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}}\right)$, with $c(2) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$]

Exercise. For $c(t) = (x(t), y(t)) = \left(\frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}}\right)$ show that

$$c'(t) = \frac{1}{2}\,x(t)^2\,\bigl[-y(t)\,\mathbf{i} + x(t)\,\mathbf{j}\bigr]$$

What is the maximum speed of the path?

When $n > 2$ the path description of curves is essential because such a curve cannot be described by a single Cartesian equation. Consider the path

$$c(t) = \left(\sin 2t,\; 2\sin^2 t,\; 2\cos t\right), \qquad t \in [0, 2\pi]$$

This is a closed curve (the endpoints exist and are the same) since $c(0) = c(2\pi)$. Note that

$$x(t)^2 + y(t)^2 + z(t)^2 = 4$$

so the curve $C$ is on the sphere of radius 2 centered at the origin. The path starts and ends at the "north pole", passing through each other point of $C$ exactly once except for $(0, 2, 0)$, which it reaches at $t = \frac{\pi}{2}$ and $t = \frac{3\pi}{2}$. For what value of $t$ does the path reach the "south pole"?

[Figure: the curve $C$ on the sphere; $c\left(\frac{\pi}{3}\right) = \left(\frac{\sqrt{3}}{2}, \frac{3}{2}, 1\right)$ is the tip of the directed line segment from the origin as shown.]

Tangents and Velocity

If the path $c$ is differentiable at $t$ we usually write

$$c'(t) = (x_1'(t), \ldots, x_n'(t))$$

for the derivative

$$Dc(t) = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}$$

The derivative represents the tangent vector at $c(t)$, which is the velocity at this point if $t$ is the time parameter. Thus, the magnitude $|c'(t)|$ is the speed of the path at this point as it traces out the curve $C$. At certain points the speed might be 0, in which case the tangent vector is the zero vector $\mathbf{0}$.
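As a small illustration of velocity and speed, the sketch below estimates $c'(t)$ by central differences for the circle path $c(t) = (\cos \omega t, \sin \omega t)$ mentioned earlier. The value of `OMEGA`, the step size, and the helper names are assumptions made only for this example.

```python
import math

def velocity(path, t, eps=1e-6):
    """Central-difference estimate of c'(t) for a path t -> tuple of coordinates."""
    ahead, behind = path(t + eps), path(t - eps)
    return tuple((a - b) / (2 * eps) for a, b in zip(ahead, behind))

def speed(path, t):
    """Magnitude |c'(t)| of the estimated velocity vector."""
    return math.sqrt(sum(v * v for v in velocity(path, t)))

OMEGA = 3.0  # hypothetical angular speed, chosen only for illustration

def circle(t):
    """The path c(t) = (cos(OMEGA t), sin(OMEGA t)) on the unit circle."""
    return (math.cos(OMEGA * t), math.sin(OMEGA * t))

for t in (0.0, 0.7, 2.0):
    print(t, velocity(circle, t), speed(circle, t))
```

For this path the estimated speed should stay close to $|\omega|$ at every $t$, which matches the description of $\omega$ as the angular speed: different values of $\omega$ trace the same curve at different speeds.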
The cycloid curve described by the path $c(t) = (t - \sin t,\; 1 - \cos t)$ is differentiable at every point:

$$c'(t) = (1 - \cos t,\; \sin t) = (1 - \cos t)\,\mathbf{i} + (\sin t)\,\mathbf{j}$$

but the tangent vector vanishes whenever $t = 2k\pi$. If you look at Figure 2.4.6 on page 119 it appears that $C$ has cusps at these points, but remember that $c'(t)$ is the velocity at time $t$, not the rate of change of $y$ with respect to $x$. In fact, as we will see by the Chain Rule,

$$y'(x) = \cot\frac{t}{2}$$

which is not defined for even multiples of $\pi$. What is the maximum speed of this path? We have

$$|c'(t)| = \sqrt{(1 - \cos t)^2 + (\sin t)^2} = \sqrt{2}\,\sqrt{1 - \cos t}$$

which has a maximum value of 2 when $t$ is an odd multiple of $\pi$.

Using the vector/parametric representation of a line we can easily find the tangent line to a path at $c(t_0)$, provided $c'(t_0) \neq \mathbf{0}$:

$$l(t) = c(t_0) + (t - t_0)\,c'(t_0)$$

The direction vector for the line is $c'(t_0)$ and, of course, $l(t_0) = c(t_0)$. For the path $c(t) = \left(\sin 2t,\; 2\sin^2 t,\; 2\cos t\right)$ we have

$$c'(t) = (2\cos 2t,\; 4\cos t \sin t,\; -2\sin t)$$

so, when $t_0 = \frac{\pi}{3}$ the tangent line is $l(t) = (x(t), y(t), z(t))$ with

$$x(t) = -t + \frac{\sqrt{3}}{2} + \frac{\pi}{3}$$
$$y(t) = \frac{3}{2} + \sqrt{3}\,t - \frac{\sqrt{3}}{3}\pi$$
$$z(t) = 1 + \frac{\sqrt{3}}{3}\pi - \sqrt{3}\,t$$

[Figure: Tangent line at $c\left(\frac{\pi}{3}\right)$]

For this path, the tangent line can be constructed at any point since $c'(t) \neq \mathbf{0}$ for any value of $t$.

Chain Rule

We have defined the derivative of $f : U \subset \mathbb{R}^n \to \mathbb{R}^m$ to be a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ that gives us an affine-linear approximation to $f$ near any point $\mathbf{x}_0 \in \mathbb{R}^n$ where $f$ is differentiable. Consequently the usual linearity rules apply as for single-variable functions; see Theorem 10, (i) and (ii), page 125. Similar considerations show that the product and quotient rules for derivatives generalize as expected for real-valued functions; Theorem 10, (iii) and (iv).

Our main goal now is to generalize the Chain Rule so that we can compute the derivative of the composition of functions. This is remarkably easy to develop once we look back at the single-variable case:

Let $g : U \subset \mathbb{R} \to \mathbb{R}$ and $f : V \subset \mathbb{R} \to \mathbb{R}$ with $g(U) \subset V$. If $g$ is differentiable at $x_0$ and $f$ is differentiable at $y_0 = g(x_0)$ then $f \circ g$ is differentiable at $x_0$ and

$$(f \circ g)'(x_0) = f'(y_0)\,g'(x_0)$$

If we use the new notation we developed for derivatives this statement becomes

$$D(f \circ g)(x_0) = Df(y_0)\,Dg(x_0)$$

which says that $T_{f \circ g}$ at $x_0$ is the product of $T_g$ at $x_0$ and $T_f$ at $g(x_0)$. The three linear transformations $T_{f \circ g}$, $T_f$ and $T_g$ at the respective points are very simple in this case because any linear transformation of $\mathbb{R}$ to itself is of the form $x \mapsto mx$ for some constant $m$. In all three cases the constants $m$ are just the slopes of the tangent lines to the respective graphs.

Consider the single-variable example $g(x) = \tan x$ on $U = \left(0, \frac{\pi}{2}\right)$ and let $f(x) = x^2$ on $V = \mathbb{R}$. Then $g(U) \subset V$. Since $(f \circ g)(x) = \tan^2 x$, at $x_0 = \frac{\pi}{3}$ we have

$$(f \circ g)'\left(\tfrac{\pi}{3}\right) = 2\tan\tfrac{\pi}{3}\,\sec^2\tfrac{\pi}{3} = 8\sqrt{3}$$
$$g'\left(\tfrac{\pi}{3}\right) = \sec^2\tfrac{\pi}{3} = 4$$
$$f'\left(g\left(\tfrac{\pi}{3}\right)\right) = 2\sqrt{3}$$

which verifies the single-variable Chain Rule. We note now that $T_g(x_0)$ is the linear transformation $x \mapsto 4x$, $T_f(g(x_0))$ is the linear transformation $x \mapsto 2\sqrt{3}\,x$ (since $g\left(\frac{\pi}{3}\right) = \sqrt{3}$ and $f'(x) = 2x$), and $T_{f \circ g}(x_0)$ is the linear transformation $x \mapsto 8\sqrt{3}\,x$ (since $(f \circ g)'(x) = \frac{d}{dx}\tan^2 x = 2\tan x \sec^2 x$). We are therefore accustomed to seeing the Chain Rule as the product of real numbers

$$8\sqrt{3} = 2\sqrt{3}\,(4)$$

which, for single-variable functions, is just a product of slopes. But now we want to view it as the composition of linear functions

$$T_{f \circ g}\left(\tfrac{\pi}{3}\right) = T_f\left(\sqrt{3}\right) \circ T_g\left(\tfrac{\pi}{3}\right)$$

This is the insight that carries over to the general Chain Rule: When $n = m = 1$ the composition of the linear maps on the right side is obtained as the product of two numbers. But in general the composition will be the product of two matrices, so order of multiplication is crucially important. The Chain Rule for general $n$, $m$ and $p$ becomes

$$T_{f \circ g}(\mathbf{x}_0) = T_f(\mathbf{y}_0)\,T_g(\mathbf{x}_0)$$

equivalently

$$D(f \circ g)(\mathbf{x}_0) = Df(\mathbf{y}_0)\,Dg(\mathbf{x}_0)$$

where $Dg(\mathbf{x}_0)$ is represented by an $m \times n$ matrix, $Df(\mathbf{y}_0)$ by a $p \times m$ matrix, and therefore $D(f \circ g)(\mathbf{x}_0)$ by a $p \times n$ matrix.
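The matrix form of the Chain Rule can be checked numerically. The sketch below uses two small polynomial maps chosen only for illustration (they are hypothetical and do not come from the text), with $n = 2$, $m = 3$, $p = 2$. It multiplies the $2 \times 3$ matrix $Df(\mathbf{y}_0)$ by the $3 \times 2$ matrix $Dg(\mathbf{x}_0)$ and compares the result with a finite-difference estimate of the derivative of $f \circ g$.

```python
def matmul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def g(x, y):
    """Hypothetical map R^2 -> R^3."""
    return (x + y, x * y, x * x)

def f(u, v, w):
    """Hypothetical map R^3 -> R^2."""
    return (u * v, w - u)

def Dg(x, y):
    """3 x 2 matrix of partial derivatives of g."""
    return [[1.0, 1.0], [y, x], [2 * x, 0.0]]

def Df(u, v, w):
    """2 x 3 matrix of partial derivatives of f."""
    return [[v, u, 0.0], [-1.0, 0.0, 1.0]]

def numeric_jacobian(func, point, eps=1e-6):
    """Central-difference matrix of partial derivatives (rows = outputs)."""
    columns = []
    for j in range(len(point)):
        up, down = list(point), list(point)
        up[j] += eps
        down[j] -= eps
        columns.append([(a - b) / (2 * eps)
                        for a, b in zip(func(*up), func(*down))])
    return [list(row) for row in zip(*columns)]

x0 = (1.0, 2.0)
y0 = g(*x0)                               # the point where Df is evaluated
chain = matmul(Df(*y0), Dg(*x0))          # Df(y0) Dg(x0), a 2 x 2 matrix
direct = numeric_jacobian(lambda x, y: f(*g(x, y)), x0)
print(chain)
print(direct)
```

Reversing the factors would give `matmul(Dg(*x0), Df(*y0))`, a $3 \times 3$ matrix, which cannot represent the derivative of a map from $\mathbb{R}^2$ to $\mathbb{R}^2$; the shapes themselves enforce the order of multiplication.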
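Again, working with polynomial examples is easy because we do not need to worry about compatible domains. Suppose

$$g(x, y) = \left(x^3 + y,\; xy - 1,\; x^4 + xy^2 + 2\right)$$
$$f(x, y, z) = \left(3x^2 - z,\; x^3 + y^3 z\right)$$

Here, $n = 2$, $m = 3$, $p = 2$. Then

$$Dg(\mathbf{x}) = \begin{bmatrix} 3x^2 & 1 \\ y & x \\ 4x^3 + y^2 & 2xy \end{bmatrix} \qquad Df(\mathbf{x}) = \begin{bmatrix} 6x & 0 & -1 \\ 3x^2 & 3y^2 z & y^3 \end{bmatrix}$$

Let $\mathbf{x}_0 = \left(1, \sqrt{2}\right) \in \mathbb{R}^2$, so $g(\mathbf{x}_0) = \left(\sqrt{2} + 1,\; \sqrt{2} - 1,\; 5\right) = \mathbf{y}_0 \in \mathbb{R}^3$,

$$Dg(\mathbf{x}_0) = \begin{bmatrix} 3 & 1 \\ \sqrt{2} & 1 \\ 6 & 2\sqrt{2} \end{bmatrix} \qquad Df(\mathbf{y}_0) = \begin{bmatrix} 6\sqrt{2} + 6 & 0 & -1 \\ 6\sqrt{2} + 9 & -30\sqrt{2} + 45 & 5\sqrt{2} - 7 \end{bmatrix}$$

By the Chain Rule

$$D(f \circ g)(\mathbf{x}_0) = \begin{bmatrix} 6\sqrt{2} + 6 & 0 & -1 \\ 6\sqrt{2} + 9 & -30\sqrt{2} + 45 & 5\sqrt{2} - 7 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ \sqrt{2} & 1 \\ 6 & 2\sqrt{2} \end{bmatrix} = \begin{bmatrix} 18\sqrt{2} + 12 & 4\sqrt{2} + 6 \\ 93\sqrt{2} - 75 & -38\sqrt{2} + 74 \end{bmatrix}$$

Exercise. Find $h = (f \circ g)(x, y)$ explicitly and calculate the derivative of $h$ at $\left(1, \sqrt{2}\right)$ without using the Chain Rule.

Applications

Certain dimensions $m, n, p$ are particularly important in applications:

1) Suppose $g$ is a path in $\mathbb{R}^3$ and $f$ gives temperature as a function of position in $\mathbb{R}^3$. Here, $n = 1$, $m = 3$, $p = 1$ so $f \circ g$ is a single-variable function of the parameter $t$, which could represent time. The derivative $(f \circ g)'(t)$ is the instantaneous rate of temperature change at the point on the curve corresponding to the time $t$. The Chain Rule takes a familiar form in this case. Consider again the path $c(t) = \left(\sin 2t,\; 2\sin^2 t,\; 2\cos t\right)$ and suppose the temperature in space is given by $f(x, y, z) = e^{k - x - y - z}$ for some constant $k$. Let $h(t) = f(c(t))$. Then

$$\frac{dh}{dt} = h'(t) = Df(c(t))\,Dc(t)$$

but $Dc(t)$ is a $3 \times 1$ matrix and $Df(c(t))$ is $1 \times 3$, so this matrix product is more commonly written as

$$h'(t) = \nabla f(c(t)) \cdot c'(t)$$

In this example, $\nabla f(x, y, z) = \left(-e^{k - x - y - z},\; -e^{k - x - y - z},\; -e^{k - x - y - z}\right)$. What is the rate of temperature change along the path at time $t_0 = \frac{\pi}{3}$? Is the temperature increasing or decreasing at this time? We are at the point

$$\left(\frac{\sqrt{3}}{2},\; \frac{3}{2},\; 1\right)$$

on the curve, so

$$\nabla f(c(t_0)) = \left(-e^{k - \frac{5 + \sqrt{3}}{2}},\; -e^{k - \frac{5 + \sqrt{3}}{2}},\; -e^{k - \frac{5 + \sqrt{3}}{2}}\right)$$

We saw that $c'(t_0) = \left(-1, \sqrt{3}, -\sqrt{3}\right)$ so

$$h'\left(\tfrac{\pi}{3}\right) = e^{k - \frac{5 + \sqrt{3}}{2}}$$

and since $e^{k - \frac{5 + \sqrt{3}}{2}} > 0$ the temperature is increasing as time increases.

2) A function $g : \mathbb{R}^n \to \mathbb{R}^n$ is sometimes called a vector field on $\mathbb{R}^n$. If $n = 3$ it is common to write the component functions of $g$ as

$$g(x, y, z) = (u, v, w)$$

Then for $f : \mathbb{R}^3 \to \mathbb{R}$ and $h = f \circ g$ we have $h(x, y, z) = f(u, v, w)$, and $Dh$ is a $1 \times 3$ matrix. Consider the vector field $g$ with

$$u(x, y, z) = xy \qquad v(x, y, z) = yz \qquad w(x, y, z) = xz$$

and suppose $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$. Then the function $h$ gives the magnitude of the vector $u\mathbf{i} + v\mathbf{j} + w\mathbf{k}$ assigned by $g$ to the point $(x, y, z)$ and $Dh(x, y, z)$ tells us how this magnitude is changing with respect to $x$, $y$ and $z$. Here,

$$Dg(x, y, z) = \begin{bmatrix} y & x & 0 \\ 0 & z & y \\ z & 0 & x \end{bmatrix}$$

$$Df(u, v, w) = \begin{bmatrix} \dfrac{u}{\sqrt{u^2 + v^2 + w^2}} & \dfrac{v}{\sqrt{u^2 + v^2 + w^2}} & \dfrac{w}{\sqrt{u^2 + v^2 + w^2}} \end{bmatrix}$$

Now $f$ is differentiable provided $u^2 + v^2 + w^2 \neq 0$ and so the Chain Rule will give the derivative of $h$ at any point in $\mathbb{R}^3$ that is not on a coordinate axis:

$$Df(u, v, w)\,Dg(x, y, z) = \begin{bmatrix} \dfrac{\partial h}{\partial x} & \dfrac{\partial h}{\partial y} & \dfrac{\partial h}{\partial z} \end{bmatrix} = \frac{1}{\sqrt{x^2y^2 + x^2z^2 + y^2z^2}} \begin{bmatrix} x\,(y^2 + z^2) & y\,(x^2 + z^2) & z\,(x^2 + y^2) \end{bmatrix}$$

For example, at $\mathbf{x}_0 = (3, -4, 5)$ the real-valued function $h$ is increasing in the $x$ and $z$ directions and decreasing in the $y$ direction because

$$\frac{\partial h}{\partial x}(\mathbf{x}_0) = \frac{123}{\sqrt{769}} \qquad \frac{\partial h}{\partial y}(\mathbf{x}_0) = -\frac{136}{\sqrt{769}} \qquad \frac{\partial h}{\partial z}(\mathbf{x}_0) = \frac{125}{\sqrt{769}}$$

3) As a third example suppose $n = 1$ and $p = m$. Then $g$ is a path $c$ and $f$ is a vector field on $\mathbb{R}^m$. The composition $f \circ g$ is another path $p(t) = f(c(t))$ in $\mathbb{R}^m$. By the Chain Rule

$$p'(t) = Df(c(t))\,c'(t)$$

so the tangent vector of $c$ for a given $t_0$ is mapped to the tangent vector of $p$ for that value by the linear transformation whose matrix is $Df(c(t_0))$. For the example $c(t) = \left(\sin 2t,\; 2\sin^2 t,\; 2\cos t\right)$ we found $c'\left(\frac{\pi}{3}\right) = \left(-1, \sqrt{3}, -\sqrt{3}\right)$. Let $f(x, y, z) = (xy, yz, xz)$. Then

$$Df(c(t)) = \begin{bmatrix} 2\sin^2 t & \sin 2t & 0 \\ 0 & 2\cos t & 2\sin^2 t \\ 2\cos t & 0 & \sin 2t \end{bmatrix}$$

so

$$p'\left(\tfrac{\pi}{3}\right) = \begin{bmatrix} \frac{3}{2} & \frac{\sqrt{3}}{2} & 0 \\ 0 & 1 & \frac{3}{2} \\ 1 & 0 & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} -1 \\ \sqrt{3} \\ -\sqrt{3} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{3}}{2} \\ -\frac{5}{2} \end{bmatrix}$$

Thus, the tangent vector for $p$ is

$$-\frac{\sqrt{3}}{2}\,\mathbf{j} - \frac{5}{2}\,\mathbf{k}$$

at the point $p\left(\frac{\pi}{3}\right) = \left(\frac{3\sqrt{3}}{4},\; \frac{3}{2},\; \frac{\sqrt{3}}{2}\right)$.

[Figure: $p(t)$ and tangent line at $p\left(\frac{\pi}{3}\right)$]

What is the direction of the velocity vector?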
The path $p$ starts and ends at the origin, which it passes through four times, once for every time the path $c$ crosses a coordinate axis.

Directional Derivatives

The definition of a partial derivative for a real-valued function $f$ is based on the idea of measuring the change in $f$ in the direction of a coordinate vector:

$$\frac{\partial f}{\partial x_j}(\mathbf{x}) = \lim_{h\to 0} \frac{f(\mathbf{x} + h\mathbf{e}_j) - f(\mathbf{x})}{h}$$

The vector $\mathbf{e}_j$ is just a unit vector in the direction of the $j$th coordinate axis. Similarly, we could ask for a derivative in the direction of any unit vector $\mathbf{v}$. This derivative may or may not exist, but if it does it should be possible, using the Chain Rule, to compute it in terms of a single parameter $t$. First we would compute

$$\frac{d}{dt} f(\mathbf{x} + t\mathbf{v})$$

and then evaluate this ordinary derivative at $t = 0$. We would call this number the directional derivative of $f$ at $\mathbf{x}$ in the direction $\mathbf{v}$. This is the formula we expect if we compute the limit

$$\lim_{h\to 0} \frac{f(\mathbf{x} + h\mathbf{v}) - f(\mathbf{x})}{h}$$

where the coordinate vector $\mathbf{e}_j$ has been replaced by any unit vector $\mathbf{v}$.

Note that $\mathbf{x} + t\mathbf{v}$ is just the vector/parametric description of the line in $\mathbb{R}^n$ containing $\mathbf{x}$ with direction vector $\mathbf{v}$. Let $c(t) = \mathbf{x} + t\mathbf{v}$ so that $f(\mathbf{x} + t\mathbf{v}) = f(c(t))$. By the Chain Rule,

$$\frac{d}{dt} f(c(t)) = \nabla f(c(t)) \cdot c'(t)$$

But $c'(t) = \mathbf{v}$ and $c(0) = \mathbf{x}$, so the evaluation of $\frac{d}{dt} f(\mathbf{x} + t\mathbf{v})$ at $t = 0$ is

$$\nabla f(\mathbf{x}) \cdot \mathbf{v}$$

Thus, if $f$ is differentiable at $\mathbf{x}$ (so that the Chain Rule applies), the directional derivative is easily computed by taking the dot product of the direction vector $\mathbf{v}$ with the gradient of the function at $\mathbf{x}$.

This has an immediate geometric consequence: If $\theta$ is the angle between $\nabla f(\mathbf{x})$ and $\mathbf{v}$ then

$$\nabla f(\mathbf{x}) \cdot \mathbf{v} = |\nabla f(\mathbf{x})|\,|\mathbf{v}| \cos\theta = |\nabla f(\mathbf{x})| \cos\theta$$

so the directional derivative is maximized when $\nabla f(\mathbf{x})$ and $\mathbf{v}$ point in the same direction ($\theta = 0$) and is minimized when $\nabla f(\mathbf{x})$ and $\mathbf{v}$ point in the opposite direction ($\theta = \pi$).

A simple example will illustrate this principle. Let $f(x, y) = x^2 - y^2$. Then

$$\nabla f(x, y) = (2x, -2y)$$

The only point in $\mathbb{R}^2$ where $\nabla f = \mathbf{0}$ is $(0, 0)$. (Later we will refer to such a point as a critical point.)
Thus the directional derivative at $\mathbf{x} = (0, 0)$ is

$$\mathbf{0} \cdot \mathbf{v} = 0$$

for any unit vector $\mathbf{v}$. Now consider a more typical point on the graph of $f$, such as $(3, 1, 8)$. If $\mathbf{v}$ is a unit vector in $\mathbb{R}^2$ then $\mathbf{v} = (\cos\theta)\,\mathbf{i} + (\sin\theta)\,\mathbf{j}$ for some $\theta$. Since $\nabla f(3, 1) = (6, -2)$ the directional derivative at $(3, 1)$ is

$$6\cos\theta - 2\sin\theta$$

so, for directions with $\cos\theta > 0$, the function $f$ is increasing where $\tan\theta < 3$ and decreasing where $\tan\theta > 3$ (the inequalities are reversed for directions with $\cos\theta < 0$).

[Figure: $\nabla f(3, 1) \cdot \mathbf{v}$ as a function of $\theta$]

For the two directions (opposite each other) such that $\tan\theta = 3$ the directional derivative is 0 because $\mathbf{v}$ is orthogonal to $\nabla f(3, 1)$. The maximum directional derivative occurs for

$$\mathbf{v} = \frac{1}{\sqrt{10}}\,(3\mathbf{i} - \mathbf{j}), \qquad \nabla f(3, 1) \cdot \mathbf{v} = 2\sqrt{10}$$

and the minimum directional derivative occurs for

$$\mathbf{v} = \frac{1}{\sqrt{10}}\,(-3\mathbf{i} + \mathbf{j}), \qquad \nabla f(3, 1) \cdot \mathbf{v} = -2\sqrt{10}$$

One of the most important facts about the derivative of a real-valued function is that its gradient representation provides a vector that is orthogonal to level sets of the function. If $f : \mathbb{R}^3 \to \mathbb{R}$ then the level sets are surfaces in $\mathbb{R}^3$ and the gradient allows us to find the tangent plane at a point on the surface:

Theorem. If $f : \mathbb{R}^3 \to \mathbb{R}$ is $C^1$ and $\mathbf{x}_0 = (x_0, y_0, z_0)$ is on the level surface $S$ then

$$\nabla f(\mathbf{x}_0) \cdot c'(0) = 0$$

for any smooth path $c$ in $S$ such that $c(0) = \mathbf{x}_0$.

This is because $f(c(t))$ is constant for all $t$ since $S$ is a level set, so its derivative with respect to $t$ is 0. In particular, when we evaluate this derivative using the Chain Rule at $t = 0$ we get

$$\frac{d}{dt} f(c(t)) = 0 = \nabla f(c(0)) \cdot c'(0)$$

Corollary. If $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$ then the tangent plane to $S$ at $\mathbf{x}_0$ can be defined by

$$\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) = 0$$

This theorem allows us to find tangent planes to surfaces that are not themselves the graphs of real-valued functions on $\mathbb{R}^2$, but which may be described as level sets of real-valued functions on $\mathbb{R}^3$.
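The orthogonality of the gradient to level sets can be illustrated with the path $c(t) = (\sin 2t, 2\sin^2 t, 2\cos t)$ studied earlier, which lies on the sphere $x^2 + y^2 + z^2 = 4$, a level surface of $f(x, y, z) = x^2 + y^2 + z^2$. The sketch below is an illustration only; the function names and the sample values of $t$ are arbitrary choices. It evaluates $\nabla f(c(t)) \cdot c'(t)$ at a few times.

```python
import math

def c(t):
    """The path c(t) = (sin 2t, 2 sin^2 t, 2 cos t)."""
    return (math.sin(2 * t), 2 * math.sin(t) ** 2, 2 * math.cos(t))

def c_prime(t):
    """Its velocity c'(t) = (2 cos 2t, 4 cos t sin t, -2 sin t)."""
    return (2 * math.cos(2 * t), 4 * math.cos(t) * math.sin(t), -2 * math.sin(t))

def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

samples = [0.0, 0.5, math.pi / 3, 2.0, 4.5]
levels = [sum(x * x for x in c(t)) for t in samples]        # f(c(t)) for each t
dots = [dot(grad_f(*c(t)), c_prime(t)) for t in samples]    # gradient . velocity
print(levels)
print(dots)
```

Up to rounding, every entry of `levels` should equal 4 and every entry of `dots` should vanish, as the theorem predicts for a path that stays in a level surface.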
The ellipsoid $x^2 + 2y^2 + 3z^2 = 3$ is not the graph of a function because we cannot solve for $z$ without choosing square roots, but it is the level surface of value 3 for the function

$$f(x, y, z) = x^2 + 2y^2 + 3z^2$$

whose gradient is $\nabla f(x, y, z) = (2x, 4y, 6z)$. The point $\left(1, \frac{\sqrt{2}}{2}, \frac{\sqrt{3}}{3}\right)$ is on the surface and the tangent plane there is

$$\left(2,\; 2\sqrt{2},\; 2\sqrt{3}\right) \cdot \left(x - 1,\; y - \frac{\sqrt{2}}{2},\; z - \frac{\sqrt{3}}{3}\right) = 0$$
$$x + \sqrt{2}\,y + \sqrt{3}\,z = 3$$

[Figure: Tangent plane at $\left(1, \frac{\sqrt{2}}{2}, \frac{\sqrt{3}}{3}\right)$]

Study Guide for Second Exam

Level Sets and Sections of Real-Valued Functions. Sketch level curves for functions on $\mathbb{R}^2$. Find section curves for functions on $\mathbb{R}^2$. Determine level surfaces for functions on $\mathbb{R}^3$ of quadratic type (identify type of quadric surface).

Limits and Continuity. Find the limit (by changing to polar or spherical coordinates) of a real-valued function at a point in order to define $f$ to be continuous at that point. Show that the limit of $f$ at a point does not exist by considering different paths of approach.

Derivatives. Find the matrix representing $Df(\mathbf{x}_0)$ for $\mathbf{x}_0$ in $\mathbb{R}^n$ and $f(\mathbf{x}_0)$ in $\mathbb{R}^m$. Find the affine-linear approximation at a point $\mathbf{x}_0$, $l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$. Find the tangent plane at $(x_0, y_0, f(x_0, y_0))$ given by the affine approximation when $n = 2$, $m = 1$. Find the tangent plane at $(x_0, y_0, z_0)$ on a level surface of $f : \mathbb{R}^3 \to \mathbb{R}$. Apply the Chain Rule to find $D(f \circ g)(\mathbf{x}_0)$. Use the gradient $\nabla f$ of a real-valued function to compute its directional derivative at a given point.

Paths and Curves. Find the point on a curve $C$ in $\mathbb{R}^2$ or $\mathbb{R}^3$ described by a path $c(t)$ at a particular value of $t$. Find the tangent (velocity) vector for a path $c(t)$ for a particular value of $t$. Apply the Chain Rule to $f \circ c$, using the gradient interpretation of $Df$ when $f$ is a real-valued function.

Iterated Partial Derivatives

As we have discussed, if $f : U \subset \mathbb{R}^n \to \mathbb{R}$ then $Df$ is the function $\begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}$ whose evaluation at $\mathbf{x}_0$ in $\mathbb{R}^n$ is a $1 \times n$ matrix of numbers that represents a linear map from $\mathbb{R}^n$ to $\mathbb{R}$ by the formula

$$(x_1, \ldots, x_n) \mapsto \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(\mathbf{x}_0) & \cdots & \dfrac{\partial f}{\partial x_n}(\mathbf{x}_0) \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Assume that $f$ is of class $C^1$. Then $Df(\mathbf{x}_0)$ is the derivative of $f$ at $\mathbf{x}_0$. In order to analyze the behavior of $f$ near $\mathbf{x}_0$ in greater detail we must at the very least look at the second partial derivatives, that is, the partial derivatives of the functions $\dfrac{\partial f}{\partial x_j}$. Then we can generalize results from single-variable calculus that required the second derivative. We will mostly be concerned with $n = 2$ or 3, but many applications require $n = 4$ in which case, to avoid unnecessary subscripts, we usually write

$$x_1 = x \qquad x_2 = y \qquad x_3 = z \qquad x_4 = t$$

It helps to think of $\dfrac{\partial f}{\partial x_j}$ as a function produced from $f$ by applying the differential operator $\dfrac{\partial}{\partial x_j}$. Then, with $n = 2$, the second-order partial derivative functions are

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2} = f_{xx}$$
$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} = f_{xy}$$
$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = f_{yx}$$
$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2} = f_{yy}$$

Apparently, if there are $n$ input variables then there are $n^2$ second-order partial derivative functions, so it is convenient to collect them all into an $n \times n$ matrix

$$(s_{ij}) = \left(\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

When the derivatives are evaluated at $\mathbf{x}_0$ in $\mathbb{R}^n$ we obtain an $n \times n$ matrix of numbers called the Hessian matrix $Hf(\mathbf{x}_0)$ of $f$ at $\mathbf{x}_0$.

Consider the function $f(x, y, z) = e^x \sin(y - z)$, for which

$$\left(\frac{\partial^2 f}{\partial x_j\,\partial x_i}\right) = \begin{bmatrix} e^x \sin(y - z) & e^x \cos(y - z) & -e^x \cos(y - z) \\ e^x \cos(y - z) & -e^x \sin(y - z) & e^x \sin(y - z) \\ -e^x \cos(y - z) & e^x \sin(y - z) & -e^x \sin(y - z) \end{bmatrix}$$

The entries in this matrix are continuous everywhere and we can evaluate to obtain $Hf(\mathbf{x}_0)$. In this case the Hessian matrix at the origin is

$$Hf(0, 0, 0) = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}$$

Notice that $Hf(\mathbf{x}_0)$ in this example is symmetric (equal to its transpose) for any point $\mathbf{x}_0$.
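A finite-difference estimate gives a quick sanity check of a Hessian computed by hand. The sketch below is only an illustration (the helper `hessian` and the step size `h` are assumptions, not a method from the notes). It approximates all second-order partial derivatives of $f(x, y, z) = e^x \sin(y - z)$ at the origin.

```python
import math

def f(x, y, z):
    return math.exp(x) * math.sin(y - z)

def hessian(func, point, h=1e-4):
    """Central-difference estimate of the matrix of second partial derivatives."""
    n = len(point)

    def shifted(i, si, j, sj):
        # Evaluate func after moving coordinate i by si*h and coordinate j by sj*h.
        p = list(point)
        p[i] += si * h
        p[j] += sj * h
        return func(*p)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]

H = hessian(f, (0.0, 0.0, 0.0))
for row in H:
    print([round(entry, 6) for entry in row])
```

The printed rows should be close to $(0, 1, -1)$, $(1, 0, 0)$ and $(-1, 0, 0)$, and the estimated matrix agrees with its transpose up to rounding.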
This will always be the case when $f$ is of class $C^2$, meaning that its second-order partial derivatives are continuous.

Theorem. If $f$ is of class $C^2$ then $\dfrac{\partial^2 f}{\partial x_j\,\partial x_i} = \dfrac{\partial^2 f}{\partial x_i\,\partial x_j}$ for all pairs of indices $i, j$.

Since the theorem works with pairs of indices it suffices to prove it for $n = 2$. As the proof on page 151 shows, the conclusion $\dfrac{\partial^2 f}{\partial y\,\partial x} = \dfrac{\partial^2 f}{\partial x\,\partial y}$ follows from applying the ordinary Mean Value Theorem (MVT) from single-variable calculus twice.

MVT. Let $g : \mathbb{R} \to \mathbb{R}$ be differentiable on an open interval containing $x_0$ and $x_0 + \Delta x$. Then $g(x_0 + \Delta x) - g(x_0) = g'(x_1)\,\Delta x$ for some $x_1$ between $x_0$ and $x_0 + \Delta x$.

The conditions of the MVT apply because $f$ is of class $C^2$. This so-called equality of mixed partials can fail if $f$ is not of class $C^2$. In our discussion above on affine approximation we saw that

$$f(x, y) = xy\,\frac{x^2 - y^2}{x^2 + y^2}, \qquad f(0, 0) = 0$$

is differentiable on all of $\mathbb{R}^2$. In fact, $f$ is of class $C^1$, even at the origin where the tangent plane to the graph is the $xy$-plane:

$$f_x(x, y) = \frac{y\,(x^4 - y^4 + 4x^2y^2)}{(x^2 + y^2)^2}$$
$$f_y(x, y) = \frac{x\,(x^4 - y^4 - 4x^2y^2)}{(x^2 + y^2)^2}$$
$$f_x(0, 0) = f_y(0, 0) = 0$$

Here $f_x = \dfrac{\partial f}{\partial x}$ and $f_y = \dfrac{\partial f}{\partial y}$. Away from the origin we have

$$\frac{\partial^2 f}{\partial y\,\partial x}(x, y) = \frac{\partial^2 f}{\partial x\,\partial y}(x, y) = \frac{(x^4 + y^4 + 10x^2y^2)\,(x^2 - y^2)}{(x^2 + y^2)^3}$$

But $f$ is not of class $C^2$ at the origin. In fact,

$$\frac{\partial^2 f}{\partial y\,\partial x}(0, 0) = \lim_{h\to 0} \frac{f_x(0, h) - f_x(0, 0)}{h} = \lim_{h\to 0} \frac{-h - 0}{h} = -1$$
$$\frac{\partial^2 f}{\partial x\,\partial y}(0, 0) = \lim_{h\to 0} \frac{f_y(h, 0) - f_y(0, 0)}{h} = \lim_{h\to 0} \frac{h - 0}{h} = 1$$

We sometimes say that $f$ is smooth at the origin to a first order approximation but not smooth at the second order.

We will focus on $C^2$ functions in the study of local extrema where we will use the Hessian matrix to define a quadratic scalar function, the Hessian quadratic form.

Suggested Exercises for Chapter 3

The following exercises are not to be handed in. They represent skills required for basic mastery.
3.1 (pages 156-158): 1, 5, 7-11, 19-26, 28
3.2 (pages 165-166): 5, 9
3.3 (pages 182-185): 1, 5, 15, 17-25, 27, 29-35, 41
3.4 (pages 201-203): 1, 3, 13, 17-19, 21, 23, 37

Third Graded Assignment

Due: No later than December 3

To reinforce written communication skills the Graded Assignment solutions should be complete and clearly presented in a "bluebook" or provided in PDF format. Late papers will not be graded.

Do any one of the following:

Page 158: 30
Page 184: 44
Page 202: 24
Page 214: 40

Second-Order Approximation

For a real-valued function $f$ differentiable at $\mathbf{x}_0$ in an open set $U \subset \mathbb{R}^n$ we defined the linear affine approximation of $f$ to be

$$l(\mathbf{x}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)\,(\mathbf{x} - \mathbf{x}_0)$$

The claim that $f$ is differentiable at $\mathbf{x}_0$ means that

$$\lim_{\mathbf{x}\to\mathbf{x}_0} \frac{f(\mathbf{x}) - l(\mathbf{x})}{\left|\mathbf{x} - \mathbf{x}_0\right|} = 0$$

We say that $l(\mathbf{x})$ provides a first-order approximation of $f$ for $\mathbf{x}$ near $\mathbf{x}_0$. It is helpful to rewrite this formulation in terms of the difference

$$\mathbf{h} = \mathbf{x} - \mathbf{x}_0, \qquad \mathbf{x} = \mathbf{x}_0 + \mathbf{h}$$

Then,

$$f(\mathbf{x}) - l(\mathbf{x}) = f(\mathbf{x}_0 + \mathbf{h}) - l(\mathbf{x}_0 + \mathbf{h}) = R_1(\mathbf{x}_0, \mathbf{h})$$

where

$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{R_1(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|} = 0$$

We call $R_1(\mathbf{x}_0, \mathbf{h})$ the first-order remainder function near $\mathbf{x}_0$. It measures the difference between $f$ and its first-order approximation as a function of the difference vector $\mathbf{h} = \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ in the domain:

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix} + R_1(\mathbf{x}_0, \mathbf{h})$$

Sometimes $Df(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ is written in gradient form $\nabla f(\mathbf{x}_0) \cdot \mathbf{h}$.

Now suppose $f$ is of class $C^2$ near $\mathbf{x}_0$. Then the Hessian matrix $H$ for $f$ consists of continuous second-order partial derivatives. When these are evaluated at $\mathbf{x}_0$ the resulting numerical matrix defines a quadratic function of $\mathbf{h}$:

$$Hf(\mathbf{x}_0)(\mathbf{h}) = \frac{1}{2} \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix} Hf(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$$

$Hf(\mathbf{x}_0)(\mathbf{h})$ is called the Hessian (form) of $f$ at $\mathbf{x}_0$, and it is used to define the second-order approximation of $f$ for $\mathbf{x}$ near $\mathbf{x}_0$:

$$q(\mathbf{x}_0 + \mathbf{h}) = l(\mathbf{x}_0 + \mathbf{h}) + Hf(\mathbf{x}_0)(\mathbf{h})$$
$$f(\mathbf{x}_0 + \mathbf{h}) - q(\mathbf{x}_0 + \mathbf{h}) = R_2(\mathbf{x}_0, \mathbf{h})$$

and since $f$ is of class $C^2$ it follows that

$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{R_2(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|^2} = 0$$

that is, the second-order remainder divided by the square of the magnitude of the difference vector goes to 0 as $\mathbf{h}$ gets small.

As an example, look at $f(x, y, z) = e^z \cos x \sin y$ near $\mathbf{x}_0 = \left(0, \frac{\pi}{2}, \ln 2\right)$. We have $f(\mathbf{x}_0) = 2$ and

$$Df(\mathbf{x}) = \begin{bmatrix} -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix}$$

$$Hf(\mathbf{x}) = \begin{bmatrix} -e^z \cos x \sin y & -e^z \cos y \sin x & -e^z \sin x \sin y \\ -e^z \cos y \sin x & -e^z \cos x \sin y & e^z \cos x \cos y \\ -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix}$$

Then $f(\mathbf{x}_0 + \mathbf{h}) = q(\mathbf{x}_0 + \mathbf{h}) + R_2(\mathbf{x}_0, \mathbf{h})$ where

$$q(\mathbf{x}_0 + \mathbf{h}) = q\left(h_1,\; \tfrac{\pi}{2} + h_2,\; \ln 2 + h_3\right) = 2 + \begin{bmatrix} 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}$$
$$= 2 + 2h_3 - h_1^2 - h_2^2 + h_3^2$$

This polynomial in $h_1, h_2, h_3$ is the second-order approximation of $f(\mathbf{x}_0 + \mathbf{h})$, which is

$$2e^{h_3} \cos h_1 \cos h_2$$

For a small difference, such as $\mathbf{h} = (0.1, 0.1, 0.05)$, we find

$$R_2(\mathbf{x}_0, \mathbf{h}) \approx -0.00091324$$

so the quadratic polynomial is slightly greater than $f$ for this particular $\mathbf{h}$. A formula due to Lagrange expresses this remainder in terms of an integral so that its size can be estimated. Lagrange's formula applies to remainders of all orders, which are obtained from approximations using partial derivatives of corresponding orders.

Critical Points

The second-order approximation is used to analyze the extreme behavior of $f : U \subset \mathbb{R}^n \to \mathbb{R}$ at a point in $U$ where such behavior is likely to occur:

Definition. A point $\mathbf{x}_0 \in U$ is a critical point if either $f$ is not differentiable at $\mathbf{x}_0$ or if it is differentiable but $Df(\mathbf{x}_0) = \mathbf{0}$.

Suppose that $f$ is differentiable at $\mathbf{x}_0$ and that $f(\mathbf{x}) \le f(\mathbf{x}_0)$ for all $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$. Then, for any $\mathbf{h} \in \mathbb{R}^n$

$$g(t) = f(\mathbf{x}_0 + t\mathbf{h})$$

has a local maximum at $t = 0$, so by the Chain Rule

$$g'(0) = Df(\mathbf{x}_0)\,\mathbf{h} = 0$$

Since $\mathbf{h}$ is arbitrary we must have $Df(\mathbf{x}_0) = \mathbf{0}$. The same conclusion holds if $f(\mathbf{x}) \ge f(\mathbf{x}_0)$ for all $\mathbf{x}$ sufficiently close to $\mathbf{x}_0$. We say in either case that $\mathbf{x}_0$ is a local extremum for $f$. Unless stated otherwise, we will assume $f$ is differentiable at $\mathbf{x}_0$ for purposes of analyzing extreme behavior. We now have:

Theorem. A necessary condition for $\mathbf{x}_0$ to be a local extremum is that $Df(\mathbf{x}_0) = \mathbf{0}$.

An example where $n = 2$: Let $f(x, y) = xy$. Then

$$Df(x, y) = \begin{bmatrix} y & x \end{bmatrix}$$

(equivalently, $\nabla f(x, y) = y\mathbf{i} + x\mathbf{j}$). The only critical point is $\mathbf{x}_0 = (0, 0)$.

An example where $n = 3$: Let $f(x, y, z) = e^z \cos x \sin y$. Then

$$Df(x, y, z) = \begin{bmatrix} -e^z \sin x \sin y & e^z \cos x \cos y & e^z \cos x \sin y \end{bmatrix}$$

For $\mathbf{x}_0$ to be a critical point we must have

$$\sin x \sin y = \cos x \cos y = \cos x \sin y = 0$$

which happens only when

$$\cos x = 0 \quad \text{and} \quad \sin y = 0$$

Thus, there are infinitely many critical points, given for arbitrary integers $k, m$ by

$$\left((2k + 1)\frac{\pi}{2},\; m\pi,\; z\right)$$

Another example: Let $f(x, y, z) = e^z \sin(x + y)$, for which

$$Df(x, y, z) = \begin{bmatrix} e^z \cos(x + y) & e^z \cos(x + y) & e^z \sin(x + y) \end{bmatrix}$$

This function does not have any critical points because $\cos(x + y)$ and $\sin(x + y)$ cannot simultaneously be zero.

Hessian Test for Local Extrema

The idea of the Hessian test is to look at the second-order approximation of the function near $\mathbf{x}_0$. This approximation is a polynomial with terms of degree 1 and 2. The extrema of such functions are well understood; in particular, the quadratic terms usually, but not always, determine the nature of the extrema of $f$. Recall that if $f$ is $C^2$ near $\mathbf{x}_0$ then

$$q(\mathbf{x}_0 + \mathbf{h}) = l(\mathbf{x}_0 + \mathbf{h}) + Hf(\mathbf{x}_0)(\mathbf{h})$$
$$f(\mathbf{x}_0 + \mathbf{h}) - q(\mathbf{x}_0 + \mathbf{h}) = R_2(\mathbf{x}_0, \mathbf{h})$$
$$\lim_{\mathbf{h}\to\mathbf{0}} \frac{R_2(\mathbf{x}_0, \mathbf{h})}{|\mathbf{h}|^2} = 0$$

where $Hf(\mathbf{x}_0)(\mathbf{h}) = \dfrac{1}{2} \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix} Hf(\mathbf{x}_0) \begin{bmatrix} h_1 \\ \vdots \\ h_n \end{bmatrix}$ is the Hessian quadratic form whose terms are all degree 2; in particular, we always have

$$Hf(\mathbf{x}_0)(\mathbf{0}) = 0$$

Basic linear algebra classifies quadratic forms as follows:

We say the form is definite provided $Hf(\mathbf{x}_0)(\mathbf{h}) = 0$ only when $\mathbf{h} = \mathbf{0}$: positive-definite if $Hf(\mathbf{x}_0)(\mathbf{h}) > 0$ for $\mathbf{h} \neq \mathbf{0}$, and negative-definite if $Hf(\mathbf{x}_0)(\mathbf{h}) < 0$ for $\mathbf{h} \neq \mathbf{0}$.

We say the form is indefinite provided $Hf(\mathbf{x}_0)(\mathbf{h})$ can be either positive or negative depending on $\mathbf{h}$.

The following examples illustrate the differences:

1) $f(x, y) = x^2 + y^2 - 3$. Since

$$Hf(\mathbf{x}) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$

at any point $\mathbf{x}$ we have

$$Hf(\mathbf{x})(\mathbf{h}) = h_1^2 + h_2^2$$

which is non-negative for all $\mathbf{h}$ and equals zero only when $(h_1, h_2) = (0, 0)$. The form is positive-definite.

2) $f(x, y) = xy + 1$. Here

$$Hf(\mathbf{x}) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

at any point $\mathbf{x}$ and so

$$Hf(\mathbf{x})(\mathbf{h}) = h_1 h_2$$

which is positive if $h_1, h_2$ have the same sign but negative if they have opposite signs. The form is indefinite.

3) $f(x, y) = x^2 - 2xy + y^2 + x - y$. The Hessian is

$$\begin{bmatrix} 2 & -2 \\ -2 & 2 \end{bmatrix}$$

at any point $\mathbf{x}$ and so

$$Hf(\mathbf{x})(\mathbf{h}) = h_1^2 - 2h_1h_2 + h_2^2 = (h_1 - h_2)^2$$

This form is neither definite nor indefinite because it is non-negative for all $\mathbf{h}$ but equals zero whenever $h_1 = h_2$. Such a form is called semi-definite, but for purposes of analyzing critical points we will say that the form is degenerate.

The graphs $z = f(x, y)$ in the above examples are all quadric surfaces. Example 1 is a paraboloid with minimum $z$-value equal to $-3$. Example 2 is a hyperbolic paraboloid (a saddle-shaped surface). Example 3 is a parabolic cylinder with the infinitely many critical points $\left(x, \frac{1}{2} + x\right)$:

[Figure: $z = x^2 - 2xy + y^2 + x - y$ with tangent plane $z = -\frac{1}{4}$]

In these examples we expect the Hessian form at any point to predict the local behavior of $f$ because the second-order approximation $q$ is equal to the function itself. However, for any function $f$ that is $C^3$ the remainder $R_2(\mathbf{x}_0, \mathbf{h})$ satisfies the second-order limit test and can be estimated by Lagrange's formula.
The remainder estimate provides the Hessian test:

Theorem. Let $x_0$ be a critical point of $f$. If $Hf(x_0)$ is positive-definite then $x_0$ is a relative minimum of $f$. If $Hf(x_0)$ is negative-definite then $x_0$ is a relative maximum of $f$.

This is our second-derivative test for local extrema. If $Hf(x_0)$ is indefinite we say that $x_0$ is of saddle type. If $Hf(x_0)$ is semi-definite the test itself cannot determine the nature of the critical point; often it can be determined by inspection. For example, the parabolic cylinder in Example 3 "bottoms out" on the line of critical points $y_0 = \tfrac{1}{2} + x_0$, where $f(x_0, y_0) = -\tfrac{1}{4}$.

Determinant Test for Quadratic Forms

Basic linear algebra also tells us how to classify the quadratic form in any dimension provided by $Hf(x_0)$. The matrix $Hf(x_0)$ is $n \times n$ and so we can compute its determinant. But we can also compute the determinants of all the upper-left square sub-matrices
\[ \begin{bmatrix} a_{11} \end{bmatrix}, \quad \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \ldots, \quad Hf(x_0). \]
We will not have occasion to consider examples for $n > 3$, but these sub-determinants allow us to generalize the second-derivative test for any dimension $n$. Consider again the examples above:

1a) $f(x,y) = x^2 + y^2 - 3$. The only critical point is $x_0 = (0,0)$. The fact that the Hessian form is positive-definite is determined by the two sub-determinants
\[ \det \begin{bmatrix} 2 \end{bmatrix} = 2, \qquad \det \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = 4, \]
both positive numbers: All sub-determinants positive means $Hf(x_0)$ is a positive-definite form. The point $(0,0)$ is a local minimum.

1b) $f(x,y) = 3 - x^2 - y^2$. The only critical point is $x_0 = (0,0)$. The sub-determinants are
\[ \det \begin{bmatrix} -2 \end{bmatrix} = -2, \qquad \det \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} = 4, \]
both non-zero but alternating in sign, beginning with a negative number. This signifies a negative-definite form. The point $(0,0)$ is a local maximum.

2a) $f(x,y) = xy + 1$. The only critical point is $x_0 = (0,0)$. The sub-determinants are
\[ \det \begin{bmatrix} 0 \end{bmatrix} = 0, \qquad \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1. \]
These numbers fit into neither pattern 1a nor 1b, but $\det Hf(x_0) \ne 0$. This signifies an indefinite form.
The point $(0,0)$ is of saddle type.

2b) $f(x,y) = x^2 - y^2$. The only critical point is $x_0 = (0,0)$. The sub-determinants are
\[ \det \begin{bmatrix} 2 \end{bmatrix} = 2, \qquad \det \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} = -4. \]
The signs alternate but not in the manner of a negative-definite form. This form is also indefinite. The point $(0,0)$ is of saddle type.

3) $f(x,y) = x^2 - 2xy + y^2 + x - y$. There is an entire line of critical points but at any of them
\[ \det Hf(x_0) = 0, \]
which means the form is degenerate despite the values of the smaller sub-determinants. The sub-determinant test is inconclusive, though we have seen that each of these critical points, in this example, is a minimum.

Examples with $n = 3$ will further demonstrate the sub-determinant analysis:

4) For $f(x,y,z) = \ln \dfrac{z^2 + 1}{x^2 + y^2 + 1}$,
\[ Df(x,y,z) = \begin{bmatrix} -\dfrac{2x}{x^2+y^2+1} & -\dfrac{2y}{x^2+y^2+1} & \dfrac{2z}{z^2+1} \end{bmatrix}, \]
so the only critical point is $(0,0,0)$. We suspect this is of saddle type because $f(0,0,0) = 0$ and any change in $x$ or $y$ produces a negative value whereas a change in $z$ produces a positive value. In fact,
\[ Hf(x) = \begin{bmatrix} \dfrac{-2\,(y^2 - x^2 + 1)}{(x^2+y^2+1)^2} & \dfrac{4xy}{(x^2+y^2+1)^2} & 0 \\[2ex] \dfrac{4xy}{(x^2+y^2+1)^2} & \dfrac{2\,(y^2 - x^2 - 1)}{(x^2+y^2+1)^2} & 0 \\[2ex] 0 & 0 & \dfrac{-2\,(z+1)(z-1)}{(z^2+1)^2} \end{bmatrix}, \]
which becomes
\[ \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]
at the origin. The sub-determinants are $-2, 4, 8$, which fit neither the positive-definite nor the negative-definite pattern while $\det Hf(x_0) \ne 0$, and so the origin is of saddle type.

5) Consider again $f(x,y,z) = \dfrac{\sin z}{1 + x^2 + y^2}$, for which
\[ Df(x) = \begin{bmatrix} -\dfrac{2x \sin z}{(x^2+y^2+1)^2} & -\dfrac{2y \sin z}{(x^2+y^2+1)^2} & \dfrac{\cos z}{x^2+y^2+1} \end{bmatrix}, \]
\[ Hf(x) = \begin{bmatrix} \dfrac{-2\,(y^2 - 3x^2 + 1) \sin z}{(x^2+y^2+1)^3} & \dfrac{8xy \sin z}{(x^2+y^2+1)^3} & \dfrac{-2x \cos z}{(x^2+y^2+1)^2} \\[2ex] \dfrac{8xy \sin z}{(x^2+y^2+1)^3} & \dfrac{2\,(3y^2 - x^2 - 1) \sin z}{(x^2+y^2+1)^3} & \dfrac{-2y \cos z}{(x^2+y^2+1)^2} \\[2ex] \dfrac{-2x \cos z}{(x^2+y^2+1)^2} & \dfrac{-2y \cos z}{(x^2+y^2+1)^2} & \dfrac{-\sin z}{x^2+y^2+1} \end{bmatrix}. \]
The critical points are $x_0 = \left(0, 0, (2k+1)\tfrac{\pi}{2}\right)$, where we have
\[ Hf(x_0) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad k \text{ even (local maximum)}, \]
\[ Hf(x_0) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad k \text{ odd (local minimum)}. \]
Note that $f\left(0,0,\tfrac{\pi}{2}\right) = 1$ and $f\left(0,0,\tfrac{3\pi}{2}\right) = -1$, and in fact the range of $f$ is $[-1, 1]$.

Global and Constrained Extrema

We saw that the function $f(x,y,z) = \ln \dfrac{z^2+1}{x^2+y^2+1}$ has a saddle point at the origin and no other critical points. Suppose we were only interested in the behavior of $f$ on the unit ball $\{(x,y,z) : x^2 + y^2 + z^2 \le 1\}$. There can be no extreme behavior in the open ball, but we can ask for the maximum and minimum behavior on the bounding unit sphere $x^2 + y^2 + z^2 = 1$. For these points
\[ f(x,y,z) = \ln \frac{2 - x^2 - y^2}{x^2 + y^2 + 1}. \]
Since $\rho = 1$ on the unit sphere, so that $x^2 + y^2 = \sin^2\phi$ in spherical coordinates, we can express $f$ in the form
\[ f(\rho, \theta, \phi) = \ln \frac{2 - \sin^2\phi}{1 + \sin^2\phi}. \]
Now we look for the extreme behavior of $\ln \dfrac{2 - \sin^2\phi}{1 + \sin^2\phi}$ on the closed interval $[0, \pi]$, which is the domain of $\phi$. At the endpoints we have
\[ f(\rho, \theta, 0) = \ln 2 = f(\rho, \theta, \pi). \]
For the open interval $(0, \pi)$ we differentiate with respect to $\phi$ and find that possible extreme behavior occurs when
\[ \sin 2\phi = 0, \]
that is, $\phi = \tfrac{\pi}{2}$, and $f\left(\rho, \theta, \tfrac{\pi}{2}\right) = -\ln 2$. We conclude that the maximum value of $f$ on the unit sphere is $\ln 2$ and occurs at the "poles" $(x,y,z) = (0,0,\pm 1)$, whereas for every point on the "equator" $(x,y,z) = (\cos\theta, \sin\theta, 0)$ $f$ achieves its minimum value of $-\ln 2$.

This example is similar to the determination of global extrema of a single-variable function on a closed interval, where it is necessary to check the values at the endpoints. However, when $n > 1$ the boundary set, such as the unit sphere in the above example, will typically consist of a curve, surface, or collection of such sets. If a subset of the domain is "compact" (closed and bounded) the following theorem from basic analysis applies:

If $D$ is a compact subset of $\mathbb{R}^n$ and $f : D \to \mathbb{R}$ is continuous then $f$ assumes its absolute maximum and minimum values at some points of $D$.

Finding extreme points on a boundary set is one type of constrained extrema problem.
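The boundary analysis of $f$ on the unit sphere can be cross-checked numerically. The following is a minimal sketch, assuming a uniform grid of polar angles $\phi$ on $[0, \pi]$ (the grid size is an arbitrary choice): it evaluates $\ln\dfrac{2 - \sin^2\phi}{1 + \sin^2\phi}$ at each grid point and reports the largest and smallest values found.

```python
import math

def f_on_sphere(phi):
    """f restricted to the unit sphere, as a function of the polar angle phi."""
    s2 = math.sin(phi) ** 2
    return math.log((2.0 - s2) / (1.0 + s2))

n = 1000  # number of subintervals of [0, pi] (an arbitrary choice)
phis = [math.pi * k / n for k in range(n + 1)]
values = [f_on_sphere(p) for p in phis]

f_max, f_min = max(values), min(values)
phi_min = phis[values.index(f_min)]

print("max:", f_max, "vs ln 2 =", math.log(2))
print("min:", f_min, "vs -ln 2 =", -math.log(2))
print("min attained near phi =", phi_min, "vs pi/2 =", math.pi / 2)
```

A grid scan only samples the function, so it supports the calculus argument rather than replacing it.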
Constrained extrema problems can be very difficult and so techniques have been developed to deal with them in great generality. Most of these are based on methods due to Euler and Lagrange.

Method of Lagrange Multipliers

If $g : U \subset \mathbb{R}^n \to \mathbb{R}$ is $C^1$ and $x_0$ belongs to a level set $S$ then we can define the tangent space to $S$ at $x_0$ in the same way that we defined the tangent plane when $n = 3$. If $\nabla g(x_0) \ne 0$ this tangent space consists of all $x$ such that
\[ \nabla g(x_0) \cdot (x - x_0) = 0. \]
The geometry is the same: If $c(t)$ is a path in $S$ with $c(0) = x_0$ then $c'(0)$ is orthogonal to $\nabla g(x_0)$. Now suppose $f$ is another $C^1$ function on $U$ and we look at its behavior restricted to the level set $S$, which we can denote $f|S$. If $f|S$ has a local extremum at $x_0$ then $f(c(t))$ has an extremum at $t = 0$, so by the Chain Rule
\[ \nabla f(x_0) \cdot c'(0) = 0. \]
Since this holds for every such path, $\nabla f(x_0)$ is orthogonal to the tangent space at $x_0$. It follows that $\nabla f(x_0)$ is a scalar multiple of $\nabla g(x_0)$: For some real number $\lambda$,
\[ \nabla f(x_0) = \lambda \nabla g(x_0). \]
(Note that $\lambda = 0$ if $x_0$ happens to be a critical point for $f$ because $\nabla g(x_0) \ne 0$ but $\nabla f(x_0) = 0$ in this case, so it is a good idea to first find the critical points for $f$.) The extrema for $f|S$ are called constrained extrema of $f$.

We can apply this reasoning to the example above with $f(x,y,z) = \ln \dfrac{z^2+1}{x^2+y^2+1}$ on the unit sphere, which is the level set $S$ of value 1 for the function $g(x,y,z) = x^2 + y^2 + z^2$:
\[ \nabla f(x,y,z) = -\frac{2x}{x^2+y^2+1}\,\mathbf{i} - \frac{2y}{x^2+y^2+1}\,\mathbf{j} + \frac{2z}{z^2+1}\,\mathbf{k}, \]
\[ \nabla g(x,y,z) = 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k}. \]
To find points $x_0$ such that $\nabla f(x_0) = \lambda \nabla g(x_0)$ we look at the three equations
\[ -\frac{x}{x^2+y^2+1} = \lambda x, \qquad -\frac{y}{x^2+y^2+1} = \lambda y, \qquad \frac{z}{z^2+1} = \lambda z. \]
First, suppose we can satisfy these equations with $z \ne 0$. Then $\lambda = \dfrac{1}{z^2+1}$ for such a point, so if either $x \ne 0$ or $y \ne 0$ we would have
\[ \frac{1}{z^2+1} = -\frac{1}{x^2+y^2+1}, \]
which implies $x^2 + y^2 + z^2 + 2 = 0$. Since this is not possible we conclude that the only such points are $(0,0,\pm 1)$, since these are the only two points on $S$ with $x = y = 0$. (From the third equation we see that $\lambda = \tfrac{1}{2}$ at these points.) Next, suppose $z = 0$.
Then we have $x^2 + y^2 = 1$ and so $x$ and $y$ cannot both be zero. For all such points, which comprise the equator of the unit sphere, the first two equations give $\lambda = -\tfrac{1}{2}$. Now that we have identified the possible constrained extrema of $f$, the values of $f$ at these points can be calculated and compared. Since the only critical point of $f$ on $\mathbb{R}^3$ is $(0,0,0)$ and $f(0,0,0) = 0$ we conclude that the global extrema of $f$ on the closed unit ball occur on the bounding sphere, where the maximum is $f(0,0,\pm 1) = \ln 2$ and the minimum is $f(\cos\theta, \sin\theta, 0) = -\ln 2$.

In this example, the closed unit ball is the union of the open unit ball and its boundary, which can be written as $U \cup \partial U$ where $\partial U$ is the level set $S$. (Since $g$ is a $C^1$ function and $\nabla g$ is never the zero vector on $S$ we say that the boundary $\partial U$ is smooth.)

This method of Lagrange multipliers can be used for more general constraint problems, where it can identify possible candidates for extrema but not verify their existence. Typical applications arise in geometry when we need to identify configurations that maximize or minimize a certain property. For example, suppose we want to find the points on a quadric surface $ax^2 + by^2 + cz^2 = 1$ with $abc \ne 0$ that are closest to the origin in $\mathbb{R}^3$. The type of quadric surface depends on the coefficients $a, b, c$ but we can pose the problem very generally. Let $g(x,y,z) = ax^2 + by^2 + cz^2$ so that the surface is the level set $S$ of value 1. The function to be minimized is given by the formula $\sqrt{x^2 + y^2 + z^2}$, but since it suffices to minimize the square of the distance it is easier to work with $f(x,y,z) = x^2 + y^2 + z^2$. Thus, we look for points on $S$ that satisfy
\[ x = \lambda a x, \qquad y = \lambda b y, \qquad z = \lambda c z \]
for some $\lambda$. We know that the only critical point of $f$ is $(0,0,0)$, which is not on $S$, so we know that $\lambda \ne 0$. Also, unless $a = b = c$ (i.e., $S$ is a sphere and all points are equally close to the origin) we cannot have $xyz \ne 0$, for then the three equations would produce different values for $\lambda$ at such a point.
Thus we can set each coordinate equal to zero and examine each case. For example, if $x = 0$ but $yz \ne 0$ then $b = c$ and each point $(0, y, z)$ with $y^2 + z^2 = \tfrac{1}{b}$ is a candidate (no such point will exist if $b < 0$). If $x = y = 0$ then the points $\left(0, 0, \pm\tfrac{1}{\sqrt{c}}\right)$ are possibilities, and no such point exists if $c < 0$. Similar analysis applies if $y = 0$ or $z = 0$, respectively.

For example, the candidate points on the ellipsoid $x^2 + 2y^2 + 3z^2 = 1$ are $(\pm 1, 0, 0)$, $\left(0, \pm\tfrac{1}{\sqrt{2}}, 0\right)$ and $\left(0, 0, \pm\tfrac{1}{\sqrt{3}}\right)$, at distances $1$, $\tfrac{1}{\sqrt{2}}$ and $\tfrac{1}{\sqrt{3}}$ from the origin, so the closest points are $\left(0, 0, \pm\tfrac{1}{\sqrt{3}}\right)$. The vectors from the origin to these candidate points are parallel to the normal vectors to the tangent planes at these points.

Exercise. Find the points closest to the origin on each of the surfaces $x^2 + y^2 - z^2 = 1$ and $x^2 + y^2 - z^2 = -1$, and compare the results.

The examples above could all be done by inspection of the particular quadric surface. The Lagrange method is useful for finding constrained extrema that are not obvious. Consider the graph of $z = x^2 + \tfrac{1}{3}\sqrt{8}\,y^3 + 1$, which is the level set $S$ of value $-3$ for $g(x,y,z) = 3x^2 + \sqrt{8}\,y^3 - 3z$. To find the points closest to the origin we analyze the Lagrange equations, which after absorbing constant factors into $\lambda$ read
\[ x = 2\lambda x, \qquad y = \sqrt{8}\,\lambda y^2, \qquad z = -\lambda. \]
Since $(0,0,0)$ is not on $S$ we cannot have $\lambda = 0$, which this time gives us the important information that the closest points are not in the $xy$-plane. Since $z = -\lambda$ we can rewrite the first two equations
\[ x(1 + 2z) = 0, \qquad y\left(1 + \sqrt{8}\,yz\right) = 0 \]
and examine the various cases:

1) $x = 0$, $y = 0$: Then $z = 1$ and so $(0,0,1)$ is a candidate.

2) $x = 0$, $y \ne 0$: Then we must have
\[ 1 + \sqrt{8}\,yz = 0, \qquad z = \tfrac{1}{3}\sqrt{8}\,y^3 + 1, \]
which leads to
\[ y^4 + \frac{3\sqrt{2}}{4}\,y + \frac{3}{8} = 0. \]
This equation has two real solutions
\[ y = \frac{1}{4}\left(-\sqrt{6} \pm \sqrt{4\sqrt{3} - 6}\right) \]
and so we have two more candidates
\[ \left(0,\; \frac{1}{4}\left(-\sqrt{6} \pm \sqrt{4\sqrt{3} - 6}\right),\; \frac{1}{4}\left(1 + \sqrt{3}\right)\left(1 \pm \frac{\sqrt{3}}{3}\sqrt{2\sqrt{3} - 3}\right)\right), \]
where the same sign is taken in both coordinates.

3) $x \ne 0$, $y = 0$: Then
\[ 1 + 2z = 0, \qquad z = x^2 + 1, \]
which has no solution.

4) $x \ne 0$, $y \ne 0$: Then
\[ 1 + 2z = 0, \qquad 1 + \sqrt{8}\,yz = 0, \]
which has the unique solution $y = \tfrac{1}{\sqrt{2}}$, $z = -\tfrac{1}{2}$ but no corresponding value for $x$ that will give a point on $S$.

Finally, we check the value of $f$ for the three points on $S$ that the Lagrange method has found:
\[ f(0,0,1) = 1, \]
\[ f\left(0,\; \tfrac{1}{4}\left(-\sqrt{6} \pm \sqrt{4\sqrt{3} - 6}\right),\; \tfrac{1}{4}\left(1 + \sqrt{3}\right)\left(1 \pm \tfrac{\sqrt{3}}{3}\sqrt{2\sqrt{3} - 3}\right)\right) = \frac{1}{12}\left(3 + 5\sqrt{3} \pm \left(3 - \sqrt{3}\right)\sqrt{2\sqrt{3} - 3}\right). \]
The value with the plus sign is larger than the value with the minus sign since $3 > \sqrt{3}$. The value with the minus sign is approximately $0.89971 < 1$ and so there is a unique point on the graph closest to the origin. Its distance is
\[ \sqrt{\frac{1}{12}\left(3 + 5\sqrt{3} - \left(3 - \sqrt{3}\right)\sqrt{2\sqrt{3} - 3}\right)} \approx 0.94853. \]

[Figure: the surface $z = x^2 + \tfrac{1}{3}\sqrt{8}\,y^3 + 1$.]

The intersection of this surface with the $xy$-plane is the curve $x^2 + \tfrac{1}{3}\sqrt{8}\,y^3 + 1 = 0$, whose closest point to the origin is $\left(0, -\tfrac{\sqrt[3]{3}}{\sqrt{2}}\right)$, which can be found by setting $\tfrac{dy}{dx} = 0$. Note that $\tfrac{\sqrt[3]{3}}{\sqrt{2}} \approx 1.0198 > 0.94853$.

[Figure: the curve $x^2 + \tfrac{1}{3}\sqrt{8}\,y^3 + 1 = 0$ in the $xy$-plane.]

Application: Snell's Law of Refraction

Refer to figure 3.4.7 on page 202. The speed of light in a vacuum is approximately $3 \times 10^{10}$ centimeters per second, but in another medium, such as water or glass, the speed is slower. This explains the phenomenon of refraction as light passes from one medium where the speed is $v_1$ to another medium where the speed is $v_2$. As in exercise 28, suppose light is emitted at point $A$ and received at point $B$, with $d$ the distance between these two points. By Snell's Law, the light will not travel in a straight line from $A$ to $B$ unless its speed is the same in both media. Rather, the path taken will consist of two line segments so as to minimize the total travel time $T$.
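The minimization principle just described can be explored numerically. The following is a minimal sketch, not the derivation used in these notes: it places $A = (0, a)$ above a horizontal boundary and $B = (L, -b)$ below it (hypothetical coordinates and values chosen only for illustration), minimizes the travel time over the crossing point $(p, 0)$ by ternary search, and compares the ratio of the sines of the angles that the two segments make with the normal to the ratio of the speeds.

```python
import math

def travel_time(p, a, b, L, v1, v2):
    """Time from A = (0, a) to B = (L, -b) through the crossing point (p, 0)."""
    return math.hypot(p, a) / v1 + math.hypot(L - p, b) / v2

def best_crossing(a, b, L, v1, v2, iterations=200):
    """Minimize the travel time over p in [0, L] by ternary search.

    The travel time is a convex function of p, so the search converges
    to the unique minimizer.
    """
    lo, hi = 0.0, L
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if travel_time(m1, a, b, L, v1, v2) < travel_time(m2, a, b, L, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

# Hypothetical data: heights a, b, horizontal separation L, speeds v1, v2.
a, b, L, v1, v2 = 1.0, 1.0, 2.0, 3.0, 2.0
p = best_crossing(a, b, L, v1, v2)
sin1 = p / math.hypot(p, a)            # sine of the angle with the normal in medium 1
sin2 = (L - p) / math.hypot(L - p, b)  # sine of the angle with the normal in medium 2

print("crossing point p =", p)
print("sin(theta1)/sin(theta2) =", sin1 / sin2, " v1/v2 =", v1 / v2)
```

With these sample values the two printed ratios should agree to several decimal places, which is the relation between the angles and the speeds known as Snell's Law.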
Since the distance from $A$ to the boundary between the two media along the first segment of the path is $d_1 = v_1 t_1$, where $t_1$ is the time required to reach the boundary, and the length of the second segment is $d_2 = v_2 t_2$, where $t_2$ is the time required to travel from the boundary to point $B$, we can find possible paths by minimizing the function
\[ T(t_1, t_2) = t_1 + t_2 \]
after expressing $d$ in terms of $d_1$ and $d_2$. As in the figure, we can represent this path in terms of the angles $\theta_1$ and $\theta_2$. The angle between the two segments of the path is then $\phi = \pi - \theta_1 + \theta_2$, whereby
\[ \cos\phi = -\cos(\theta_1 - \theta_2). \]
By the Law of Cosines,
\[ d^2 = d_1^2 + d_2^2 - 2 d_1 d_2 \cos\phi = (v_1 t_1)^2 + (v_2 t_2)^2 + 2 (v_1 t_1)(v_2 t_2) \cos(\theta_1 - \theta_2). \]
Let $a$ and $b$ be the vertical distances from the boundary to $A$ and $B$, respectively. Then
\[ \cos(\theta_1 - \theta_2) = \cos\theta_1 \cos\theta_2 + \sin\theta_1 \sin\theta_2 = \frac{a}{v_1 t_1} \cdot \frac{b}{v_2 t_2} + \sqrt{1 - \left(\frac{a}{v_1 t_1}\right)^2} \sqrt{1 - \left(\frac{b}{v_2 t_2}\right)^2}. \]
Then
\[ d^2 = t_1^2 v_1^2 + t_2^2 v_2^2 + 2ab + 2\sqrt{t_1^2 v_1^2 - a^2}\,\sqrt{t_2^2 v_2^2 - b^2} = g(t_1, t_2) \]
defines a level curve of value $d^2$ for this function $g$. We can use the method of Lagrange to minimize $T$ on this level set:
\[ \nabla T(t_1, t_2) = \mathbf{i} + \mathbf{j}, \]
\[ \nabla g(t_1, t_2) = 2 t_1 v_1^2\, \frac{\sqrt{t_1^2 v_1^2 - a^2} + \sqrt{t_2^2 v_2^2 - b^2}}{\sqrt{t_1^2 v_1^2 - a^2}}\,\mathbf{i} + 2 t_2 v_2^2\, \frac{\sqrt{t_1^2 v_1^2 - a^2} + \sqrt{t_2^2 v_2^2 - b^2}}{\sqrt{t_2^2 v_2^2 - b^2}}\,\mathbf{j}. \]
For these gradient vectors to be parallel we must have
\[ \frac{\partial g}{\partial t_1} = \frac{\partial g}{\partial t_2} \]
and so
\[ \frac{t_1 v_1^2}{\sqrt{t_1^2 v_1^2 - a^2}} = \frac{t_2 v_2^2}{\sqrt{t_2^2 v_2^2 - b^2}}, \qquad \frac{v_1}{\sqrt{1 - \left(\frac{a}{v_1 t_1}\right)^2}} = \frac{v_2}{\sqrt{1 - \left(\frac{b}{v_2 t_2}\right)^2}}, \qquad \frac{v_1}{v_2} = \frac{\sin\theta_1}{\sin\theta_2}. \]
For a given ratio $\tfrac{v_1}{v_2}$ as determined by the media of transmission, this law identifies the relation between angles required to minimize the travel time. Note, however, that $\theta_2$ is a function of $\theta_1$ and so both angles can be found from the given value $\tfrac{v_1}{v_2}$. To see this, let the origin $O$ be the intersection of the $x$-axis (boundary) with the segment of length $d$ from $A$ to $B$. Assume $A$ is in the second quadrant and $B$ in the fourth and let $r_1 = |\overrightarrow{OA}|$ and $r_2 = |\overrightarrow{OB}|$ so that $r_1 + r_2 = d$. Then, for some $t \in \left(\tfrac{\pi}{2}, \pi\right)$ we have $A = (r_1 \cos t, r_1 \sin t)$ and $B = (-r_2 \cos t, -r_2 \sin t)$.
Assume the light ray crosses the boundary somewhere along the positive $x$-axis, so that $\theta_1 \in \left[t - \tfrac{\pi}{2}, \tfrac{\pi}{2}\right)$, whereby $-\cot t \le \tan\theta_1 < \infty$. Writing $r = r_1$ and $s = r_2$, it is now straightforward to show
\[ \tan\theta_2 = -\frac{1}{s}\left(d \cot t + r \tan\theta_1\right) \]
and so
\[ \sin^2\theta_2 = \frac{(r \tan\theta_1 + d \cot t)^2}{(r \tan\theta_1 + d \cot t)^2 + s^2}. \]
Since $r, s, d$ and $t$ are given, we can in principle solve for $\theta_1$ from the equation
\[ \frac{v_1}{v_2} = \frac{\sqrt{s^2 + (r \tan\theta_1 + d \cot t)^2}\,\sin\theta_1}{|r \tan\theta_1 + d \cot t|}. \]
For example, if $r = s$ then
\[ \frac{v_1}{v_2} = \frac{\sqrt{1 + (\tan\theta_1 + 2\cot t)^2}\,\sin\theta_1}{|\tan\theta_1 + 2\cot t|} \]
and for $t = \tfrac{3\pi}{4}$ a ratio of $\tfrac{v_1}{v_2} = 1.3877$ yields, in the above domain,
\[ \theta_1 = \frac{7\pi}{24} = 52.5^\circ, \qquad \theta_2 = \arctan\left(2 - \tan\frac{7\pi}{24}\right) \approx 34.9^\circ. \]

Acceleration and Force

For a path $c(t)$ that describes a trajectory as a differentiable function of time $t$, the tangent vector $c'(t)$ represents the velocity $v(t)$. At any time $t$ the magnitude $|v(t)|$ is called the speed. The second derivative $c''(t) = v'(t)$ is the acceleration $a(t)$. Many kinematical properties can be deduced from properties of the derivative, especially the Chain Rule. For example, suppose that the speed $s$ is constant but that the velocity is not necessarily constant due to changing direction. Then $|c'(t)| = s$ for all $t$ and so
\[ s^2 = c'(t) \cdot c'(t). \]
Differentiating both sides,
\[ 0 = 2\,a(t) \cdot v(t), \]
we find that the acceleration vector for the path is always orthogonal to the velocity vector. Similarly, it is easy to show that if $|c(t)|$ is constant then $v(t)$ is always orthogonal to $c(t)$, though the speed may vary; contrast the two examples
\[ c(t) = (\cos t, \sin t), \quad t \in \left(-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right) \]
and
\[ c(t) = \left(\frac{2}{\sqrt{4 + t^2}}, \frac{t}{\sqrt{4 + t^2}}\right), \quad t \in (-\infty, \infty). \]
In both cases we have $|c(t)| = 1$ and both paths describe the same curve. In the first example the speed is constant. In the second example
\[ s(t) = |v(t)| = \frac{2}{4 + t^2}, \]
whereby $s(0) = \tfrac{1}{2}$ is the maximum speed.

The general rule for differentiating the dot product of two paths is
\[ \frac{d}{dt}\left[p(t) \cdot c(t)\right] = p'(t) \cdot c(t) + p(t) \cdot c'(t), \]
which follows, along with similar rules for other products (see page 218), by direct computation.
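The claims about the second path above can be checked numerically. The sketch below is an illustration only: it approximates $c'(t)$ for $c(t) = \left(\tfrac{2}{\sqrt{4+t^2}}, \tfrac{t}{\sqrt{4+t^2}}\right)$ by central differences (the step size and the sample times are arbitrary choices) and compares the resulting speed with $\tfrac{2}{4+t^2}$, and the dot product $c(t) \cdot v(t)$ with zero.

```python
import math

def c(t):
    """The path c(t) = (2, t) / sqrt(4 + t^2), which lies on the unit circle."""
    r = math.sqrt(4.0 + t * t)
    return (2.0 / r, t / r)

def velocity(t, h=1e-6):
    """Central-difference approximation of c'(t) with step h."""
    (x1, y1), (x0, y0) = c(t + h), c(t - h)
    return ((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

for t in (-3.0, 0.0, 0.5, 2.0):
    pos, vel = c(t), velocity(t)
    speed = math.hypot(*vel)
    dot = pos[0] * vel[0] + pos[1] * vel[1]
    print(f"t = {t:5.1f}  speed = {speed:.6f}  2/(4+t^2) = {2.0 / (4.0 + t * t):.6f}  c.v = {dot:.1e}")
```

The printed speeds vary with $t$ even though $|c(t)| = 1$ throughout, and the dot products are zero up to discretization error.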
In applications, it often happens that $v(t) = 0$ for certain times $t$. The resulting curve $C$ typically, but not always, has cusps at these points because the tangent vector is $0$. If $v(t) \ne 0$ the path is regular at this point. A regular path is one for which the tangent vector never vanishes.

Newton's Second Law says that a mass $m$ is subject to a force $F$ that satisfies
\[ F(c(t)) = m\,a(t). \]
Finding the actual path $c$ from a given variable force is therefore a matter of solving a second-order differential equation. This problem can be complicated but in certain cases we can obtain simple but powerful results. Newton realized that a mass moving on a circular path with constant speed $s$ must be subject to a force of constant magnitude directed toward the center of the circle. If the radius of the circle is $r_0$ this path turns out to be
\[ c(t) = (r_0 \cos\omega t)\,\mathbf{i} + (r_0 \sin\omega t)\,\mathbf{j} \]
if we represent the circle in $\mathbb{R}^2$, where $\omega = \tfrac{s}{r_0}$ is the frequency of the motion. Differentiating, we verify that
\[ v(t) = \omega r_0 \left(-(\sin\omega t)\,\mathbf{i} + (\cos\omega t)\,\mathbf{j}\right) \]
and so $s = \omega r_0$. Differentiating again,
\[ a(t) = -\omega^2 c(t), \]
and so the force $-m\omega^2 c(t)$ has magnitude $m\omega^2 r_0 = \tfrac{m}{r_0} s^2$ (proportional to the square of the speed) and is directed toward the center of the circle (centripetal force). This deduction was turned into a law of gravitation by Newton: The trajectory of a mass $m$ in the vicinity of a mass $M$ is determined by
\[ m\,a(t) = -\frac{GmM}{|c(t)|^3}\,c(t), \]
where $G$ is a universal constant. In the case of a satellite orbiting a massive body this trajectory will be very closely circular, so we can equate
\[ -\frac{GmM}{r_0^3}\,c(t) = -m\omega^2 c(t), \]
and equating the magnitudes we obtain
\[ s^2 = \frac{GM}{r_0}. \]
The period $T$ of the motion is the time required to complete one orbit, so $sT = 2\pi r_0$, so
\[ \frac{GM}{r_0} = \left(\frac{2\pi r_0}{T}\right)^2 \]
or
\[ T^2 = \frac{4\pi^2}{GM}\,r_0^3. \]
This is one of Kepler's Laws, which Newton validated mathematically: The square of the period is proportional to the cube of the radius.

This point of view allows us to generalize the problem of finding the path of an object when the force varies continuously with time. For example, suppose an object is acted on by a force due to wind pressure that produces acceleration
\[ a(t) = \left(30\cos t,\; 20\sin t,\; \tfrac{1}{2}t\right). \]
Suppose the initial position and velocity of the object are
\[ c(0) = (x_0, y_0, z_0), \qquad v(0) = 0. \]
Then
\[ v(t) = \left(30\sin t,\; 20 - 20\cos t,\; \tfrac{1}{4}t^2\right) \]
and so
\[ c(t) = \left(x_0 + 30(1 - \cos t),\; y_0 + 20(t - \sin t),\; z_0 + \tfrac{1}{12}t^3\right). \]
If the mass starts at the origin then its trajectory for $0 \le t \le 15$ will be
\[ c(t) = \left(30 - 30\cos t,\; 20t - 20\sin t,\; \tfrac{1}{12}t^3\right). \]

[Figure: the trajectory $c(t) = \left(30 - 30\cos t,\; 20t - 20\sin t,\; \tfrac{1}{12}t^3\right)$.]

This curve stays in the first octant but touches the $yz$-plane whenever $t = 2k\pi$. Note that the tangent vector is vertical when $t = 2k\pi$. The object is swirling upward in the wind.

Suggested Exercises for Chapter 4

The following exercises are not to be handed in. They represent skills required for basic mastery.

4.1 (pages 227-228): 1, 3, 6, 7, 9, 11, 13, 15
4.2 (pages 234-236): 1, 8
4.3 (pages 243-244): 9, 10, 17, 21, 27
4.4 (pages 258-260): 1, 10, 13, 14, 18, 20, 37, 41

Fourth Graded Assignment

Due: No later than December 8.

To reinforce written communication skills the Graded Assignment solutions should be clearly presented in a "bluebook" or provided in PDF format. Late papers will not be graded.

Fourth Graded Assignment. Do any one of the following:

Page 235: 16
Page 244: 20
Page 259: 22
Page 261: 34

Arc Length

Let $c : \mathbb{R} \to \mathbb{R}^n$ be a $C^1$ path. Imagine this path as the trajectory of a point whose position is given as a function of time $t$. Then $|c'(t)|$ is the speed of the point at time $t$ and the distance travelled by the point over the time interval $[a, b]$ is given by
\[ \int_a^b |c'(t)|\,dt. \]
If $c$ is one-to-one on $[a, b]$ then this distance is equal to the length of the curve between $c(a)$ and $c(b)$. For this reason the integral is said to define the arc length of this path on the interval, even if the trajectory passes through the same point more than once.
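The distinction between the arc length of a path and the length of the underlying curve can be seen in a small computation. The sketch below is an illustration under stated assumptions: it approximates $\int_a^b |c'(t)|\,dt$ with Simpson's rule (a numerical method not used in these notes) for the unit circle path $c(t) = (\cos t, \sin t)$, once over $[0, 2\pi]$ and once over $[0, 4\pi]$, where the path traverses the circle twice. The helper name `arc_length` and the number of subintervals are choices made for this example.

```python
import math

def arc_length(speed, a, b, n=2000):
    """Approximate the integral of speed(t) over [a, b] by Simpson's rule.

    n is the number of subintervals and must be even.
    """
    h = (b - a) / n
    total = speed(a) + speed(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(a + k * h)
    return total * h / 3.0

def circle_speed(t):
    """|c'(t)| for c(t) = (cos t, sin t), where c'(t) = (-sin t, cos t)."""
    return math.hypot(-math.sin(t), math.cos(t))

once = arc_length(circle_speed, 0.0, 2 * math.pi)
twice = arc_length(circle_speed, 0.0, 4 * math.pi)

print("one traversal: ", once, "vs 2*pi =", 2 * math.pi)
print("two traversals:", twice, "vs 4*pi =", 4 * math.pi)
```

The second value is the arc length of the path on $[0, 4\pi]$ even though the circle itself has circumference $2\pi$, because the path is not one-to-one on that interval.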
As an example, let $c(t) = (1 - \cosh t, \sinh t, t)$ on the interval $[0, \ln 8]$.

[Figure: the path $c(t) = (1 - \cosh t, \sinh t, t)$.]

Then $c'(t) = (-\sinh t, \cosh t, 1)$ so
\[ |c'(t)| = \sqrt{\sinh^2 t + \cosh^2 t + 1} = \sqrt{2}\,\cosh t. \]
The arc length of this path on $[0, \ln 8]$ is
\[ \int_0^{\ln 8} \sqrt{2}\,\cosh t\,dt = \sqrt{2}\,(\sinh \ln 8 - \sinh 0) = \frac{63}{16}\sqrt{2} \approx 5.5685. \]
The differential
\[ d\boldsymbol{\sigma} = dx_1\,e_1 + \cdots + dx_n\,e_n \]
is sometimes called the infinitesimal displacement of a point moving along the path, and
\[ d\sigma = \sqrt{x_1'(t)^2 + \cdots + x_n'(t)^2}\;dt \]
is called the differential of arc length. In the above example,
\[ d\sigma = \sqrt{2}\,\cosh t\,dt. \]
Note: The lower-case Greek letter sigma is sometimes used so that arc length is not confused with speed $s$. Then the formula above example 6 on page 232 would be written $\sigma'(t) = |c'(t)|$ so as not to conflict with the definition $s(t) = |c'(t)|$.

The arc length integral can be used to find the geometric distance along the graph of a function $y = f(x)$ by noting that this curve is described by the path
\[ c(t) = (t, f(t)), \qquad c'(t) = (1, f'(t)), \qquad d\sigma = \sqrt{1 + f'(t)^2}\;dt. \]
For example, if $y = x^2$ then the length of the parabola between $x = a$ and $x = b$ is
\[ \int_a^b \sqrt{1 + 4t^2}\,dt = \frac{1}{2}\,b\sqrt{4b^2 + 1} - \frac{1}{2}\,a\sqrt{4a^2 + 1} + \frac{1}{4}\ln\frac{2b + \sqrt{4b^2 + 1}}{2a + \sqrt{4a^2 + 1}}. \]
For $a = 0$ this reduces to $\tfrac{1}{2}\,b\sqrt{4b^2 + 1} + \tfrac{1}{4}\ln\left(2b + \sqrt{4b^2 + 1}\right)$, so the arc of the parabola from $(0,0)$ to $(1,1)$ measures
\[ \frac{1}{2}\sqrt{5} + \frac{1}{4}\ln\left(\sqrt{5} + 2\right) \approx 1.4789, \]
which is greater than $\sqrt{2}$ as expected.

Vector Fields and Flow Lines

Functions $f : \mathbb{R}^n \to \mathbb{R}^n$ are called vector fields when the emphasis is on the vector properties of the output, i.e., the input point $x_0$ is associated with a vector represented by a directed segment with tail at $x_0$. This is useful when $f$ represents a vector quantity such as velocity or force. For $n = 3$ the usual notation is
\[ \mathbf{F}(x,y,z) = (F_1, F_2, F_3), \]
where the $F_j$ are the scalar coordinate functions of the output. The same notation is used for planar vector fields, that is, when $n = 2$. This point of view suggests a close connection with paths.
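In particular, it is often possible to find paths whose tangent vectors are produced by a given vector field on a region of space containing the path. In other words, if
\[ c'(t) = \mathbf{F}(c(t)) \]
then the path $c$ is called a flow line for the vector field $\mathbf{F}$. The generic mathematical term for flow line is integral curve, since the definition makes sense in any dimension with or without physical context.

As an example, let
\[ \mathbf{F}(x,y,z) = \left(\frac{1}{z}(x - yz),\; \frac{1}{z}(y + xz),\; \frac{1}{z}\sqrt{x^2 + y^2}\right), \quad z > 0. \]
Then the path $c(t) = (t\cos t, t\sin t, t)$ for $t > 0$ is a flow line for $\mathbf{F}$ since
\[ \mathbf{F}(t\cos t, t\sin t, t) = (\cos t - t\sin t,\; \sin t + t\cos t,\; 1) = c'(t). \]
If $\mathbf{F}$ represented the velocity of a fluid then, for any $t_0 > 0$, a particle placed at $(t_0\cos t_0, t_0\sin t_0, t_0)$ would follow this path for $t \ge t_0$.

[Figure: the flow line $c(t) = (t\cos t, t\sin t, t)$.]

Divergence and Curl of Vector Fields on $\mathbb{R}^3$

The gradient of a scalar-valued function on $\mathbb{R}^3$ produces a vector field on $\mathbb{R}^3$:
\[ \nabla f(x,y,z) = \left(\frac{\partial f}{\partial x}(x,y,z),\; \frac{\partial f}{\partial y}(x,y,z),\; \frac{\partial f}{\partial z}(x,y,z)\right). \]
Vector fields that are gradients of scalar functions are called conservative because they describe conservation laws in physics. Consider two examples
\[ \mathbf{F}(x,y,z) = (xy, yz, xz), \qquad \mathbf{G}(x,y,z) = (yz, xz, xy). \]
Then $\mathbf{G}(x,y,z) = \nabla g(x,y,z)$, where $g(x,y,z) = xyz$. However, if $\mathbf{F}(x,y,z) = \nabla f(x,y,z)$ for some scalar function $f$ then
\[ \frac{\partial f}{\partial x}(x,y,z) = xy, \qquad \frac{\partial f}{\partial y}(x,y,z) = yz, \qquad \frac{\partial f}{\partial z}(x,y,z) = xz \]
and so
\[ f(x,y,z) = \frac{1}{2}x^2 y + \phi_1(y,z) = \frac{1}{2}y^2 z + \phi_2(x,z) = \frac{1}{2}x z^2 + \phi_3(x,y). \]
But then $\phi_3(x,y) = \tfrac{1}{2}x^2 y + \phi_1(y,z) - \tfrac{1}{2}x z^2$, which is not possible since $\phi_1$ is a function only of $y$ and $z$. Thus $\mathbf{F}$ is not a gradient vector field.

Theorem. Let $\mathbf{F}$ be a vector field on $\mathbb{R}^3$. If $\mathbf{F} = \nabla f$ for some $C^2$ scalar function $f$ then
\[ \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k} = 0. \]
Proof. Since $F_j = \dfrac{\partial f}{\partial x_j}$, this follows immediately by the equality of mixed partial derivatives for the $C^2$ function $f$.

The vector field
\[ \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\; \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\; \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) \]
is called the curl (or rotational) of $\mathbf{F}$. It is denoted
\[ \nabla \times \mathbf{F} \]
because it is the cross-product of the gradient operator with the vector field $\mathbf{F}$. For $\mathbf{G}(x,y,z) = (yz, xz, xy)$ we have $\nabla \times \mathbf{G} = 0$, but for $\mathbf{F}(x,y,z) = (xy, yz, xz)$
\[ \nabla \times \mathbf{F} = -y\,\mathbf{i} - z\,\mathbf{j} - x\,\mathbf{k}. \]
The curl operator measures the tendency of a vector field to circulate around a given point. For $\mathbf{F}(x,y,z) = \left(\tfrac{1}{z}(x - yz),\; \tfrac{1}{z}(y + xz),\; \tfrac{1}{z}\sqrt{x^2 + y^2}\right)$
\[ \nabla \times \mathbf{F} = \frac{y\left(z + \sqrt{x^2 + y^2}\right)}{z^2\sqrt{x^2 + y^2}}\,\mathbf{i} - \frac{x\left(z + \sqrt{x^2 + y^2}\right)}{z^2\sqrt{x^2 + y^2}}\,\mathbf{j} + 2\,\mathbf{k}. \]
The magnitude of this curl becomes very large as $(x,y,z) \to (0,0,0)$.

Vector fields are also analyzed by their tendency to expand or contract in the vicinity of a given point. The first derivatives of the coordinate functions allow us to define the divergence of $\mathbf{F}$ as the scalar function
\[ \nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}, \]
which is formally the dot product of the gradient operator with the vector field $\mathbf{F}$. For $\mathbf{F}(x,y,z) = (xy, yz, xz)$ the divergence is $\nabla \cdot \mathbf{F}(x,y,z) = x + y + z$. For $\mathbf{G}(x,y,z) = (yz, xz, xy)$ the divergence is $\nabla \cdot \mathbf{G}(x,y,z) = 0$. Note that $\mathbf{G}$ is the curl of the vector field
\[ \mathbf{A}(x,y,z) = \frac{1}{4}\left(x\left(z^2 - y^2\right),\; y\left(x^2 - z^2\right),\; z\left(y^2 - x^2\right)\right). \]

Theorem. Let $\mathbf{G}$ be a vector field on $\mathbb{R}^3$. If $\mathbf{G} = \nabla \times \mathbf{A}$ for some vector field $\mathbf{A}$ with $C^2$ coordinate functions then
\[ \nabla \cdot \mathbf{G} = 0. \]
Proof. This is also immediate from equality of the mixed partial derivatives of the coordinate functions for $\mathbf{A}$.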
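The curl and divergence computations for $\mathbf{F}$, $\mathbf{G}$ and $\mathbf{A}$ can be checked numerically. The following is a minimal sketch, assuming central-difference approximations of the partial derivatives (the step size and the sample point are arbitrary choices, and the helper names are made up for this example): it evaluates the curl and divergence at one point and prints them next to the closed-form values stated above.

```python
def partial(field, i, j, p, h=1e-5):
    """Central-difference estimate of d(field_i)/d(x_j) at the point p."""
    plus, minus = list(p), list(p)
    plus[j] += h
    minus[j] -= h
    return (field(*plus)[i] - field(*minus)[i]) / (2 * h)

def curl(field, p):
    """Approximate curl of a vector field on R^3 at the point p."""
    return (
        partial(field, 2, 1, p) - partial(field, 1, 2, p),
        partial(field, 0, 2, p) - partial(field, 2, 0, p),
        partial(field, 1, 0, p) - partial(field, 0, 1, p),
    )

def divergence(field, p):
    """Approximate divergence of a vector field on R^3 at the point p."""
    return sum(partial(field, i, i, p) for i in range(3))

def F(x, y, z):
    return (x * y, y * z, x * z)

def G(x, y, z):
    return (y * z, x * z, x * y)

def A(x, y, z):
    return (x * (z * z - y * y) / 4, y * (x * x - z * z) / 4, z * (y * y - x * x) / 4)

p = (0.7, -1.2, 2.0)  # an arbitrary sample point
x, y, z = p
print("curl F:", curl(F, p), "expected:", (-y, -z, -x))
print("curl G:", curl(G, p), "expected: (0, 0, 0)")
print("curl A:", curl(A, p), "G at p:", G(*p))
print("div F:", divergence(F, p), "expected:", x + y + z)
print("div G:", divergence(G, p), "expected: 0")
```

Agreement at a single point does not prove the identities, but a mismatch would flag a sign or index error in a hand computation.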