Slide 1: Honors Mathematics III — Linear Algebra and Functions of Multiple Variables. Horst Hohberger, University of Michigan - Shanghai Jiao Tong University Joint Institute, Summer Term 2022.

Slide 2: Welcome to MATH285!
▶ Please read the Course Description, which has been uploaded to the Files section on the Canvas course site.
▶ My office is Room 441c in the Longbin Building. Feel free to drop in during my office hours (announced on Canvas) or just whenever you find me there.
▶ You can also make an appointment by email or write to me with any questions. My email is horst@sjtu.edu.cn.
▶ The Teaching Assistants for this course will provide recitation classes, office hours, and help with grading.

Slide 3: Remote Video Client: Zhumu
After Zoom decided to discontinue direct activities in China, it licensed its software to several local companies. One of these companies is Zhumu. Please download an "international" Zhumu client here: https://zhumu.com/download-intl (Note that this is a different client from the one offered by default on the main page.) Please create an account using your SJTU email address and make sure that your alias is visible in roman transliteration. Links for joining our classes by video will be published on Canvas. You are required to keep these links confidential and not to share them with any other JI student or anyone else. Our course will not use Feishu.

Slide 4: Office Hours / Piazza
In addition to being available in office hours, the TAs and I will answer course-related questions on Piazza. Please create an account there as well, such that your name in pinyin is visible. It is possible to send private messages on Piazza, but most messages should be public so that everyone can see them and the responses, or respond themselves. Feel free to answer other students' questions! Please do not post anonymously unless you have a good reason. Don't be shy! Please post messages in English only. Here is the sign-up link: piazza.com/sjtu.org/summer2022/math285

Slide 5: Mathematica
JI has obtained an unlimited student license for the computer algebra software Mathematica, developed by Wolfram Research. You will be required to make use of this or similar software in your homework assignments and examinations, so you should obtain a copy. Please see the Course Description for details on the download procedure.

Slide 6: Course Outcomes
The Course Description defines a set of "Course Outcomes." These are a sampling of minimal skills that you should acquire in the process of taking this course. The list is of course not exhaustive (you should actually learn much more than what is given there); nevertheless, it indicates whether the course successfully conveyed a selection of concepts. Whether the outcomes are attained is evaluated in two ways:
▶ Subjectively: In the Course Evaluation at the end of the term, you will be asked how well you think you have mastered each given outcome.
▶ Objectively: The course will include a set of online quizzes on Canvas that you can take in your own time, without the extreme pressure found in exams. Each quiz will evaluate one of the course outcomes. The quizzes will contribute to the course grade.

Slide 7: Term Projects
You will be asked to complete a Term Project within the scope of the present course. These projects will be assigned to randomly determined teams of 4-5 students each.
The goal of the term projects is to conduct a deeper investigation into specific topics and applications in physics, mathematics, and engineering. More details will be announced on Canvas. In general, the teams for both projects will be the same; however, on request, certain teams may be re-arranged for the second project. In principle, all members of a given team will receive a joint grade. However, there may be an opportunity to evaluate the individual work performed by the team members, and the grade may be adjusted based on this evaluation. More details will be given in the project descriptions.

Slide 8: Coursework
There will be weekly coursework throughout the term. You will be randomly assigned to assignment groups of three students; you are expected to collaborate within each group and hand in a single, common solution paper for each coursework. Each student must achieve 60% of the total coursework points by the end of the term in order to obtain a passing grade for the course. However, the assignment points have no effect on the course grade. Each member of an assignment group will receive the same number of points for each submission. However, there will be an opportunity for team members to anonymously evaluate each other's contributions to the assignments. In cases where one or more group members consistently do not contribute a commensurate share of the work, individual group members may lose some or all of their marks.

Slide 9: Course Topics and Examinations
Our course will be split, broadly, into three parts:
1. Linear Algebra
2. Differential and Integral Calculus in $\mathbb{R}^n$
3. Vector Fields and Higher-Order Derivatives
There will be two midterms and a final exam, correspondingly aligned with these topics. There will be frequent references to results and theorems from the previous term; to allow cross-referencing, a current version of last term's lecture slides will be placed on the Canvas site. All theorems referenced from last term's course are prefixed by "186"; e.g., 186 Theorem 1.2.1 refers to Theorem 1.2.1 in last term's lecture.

Slide 10: Grading Policy
Please find the grade components in the Course Description on Canvas. The course will be graded on a letter scale, with a certain number of points corresponding to a letter grade. The grading scale will usually be based on the top approximately 6-12% of students receiving a grade of A+, with the following grades determined by (mostly) fixed point increments. Apart from this normalization, the grade distribution is up to you! If (for example) all students obtain many points in the exams, I am happy to see everyone receive a grade of A. Students are primarily evaluated with respect to a fixed point scale, not with respect to each other.

Slide 11: More Info: Syllabus (a.k.a. Course Description)

Slide 12: Course Topics: Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants

Slide 13: Course Topics: Differential and Integral Calculus
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Slide 14: Course Topics: Vector Fields and Higher-Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Slide 15: Part 1: Linear Algebra

Slide 16: Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants

Slides 17-18: 1. Systems of Linear Equations

Slide 19: Linear Systems of Equations
Throughout this course, we use the letter $V$ to denote a (real or complex) vector space. Whenever necessary, we use the letter $\mathbb{F}$ to stand for either $\mathbb{R}$ or $\mathbb{C}$, depending on the context ($\mathbb{F} = \mathbb{R}$ in the context of a real vector space, $\mathbb{F} = \mathbb{C}$ in the context of a complex vector space). A linear system of $m$ (algebraic) equations in $n$ unknowns $x_1, \dots, x_n \in V$ is a set of equations
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2$$
$$\vdots \tag{1.1}$$
$$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m$$
where $b_1, \dots, b_m \in V$ and $a_{ij} \in \mathbb{F}$, $i = 1, \dots, m$, $j = 1, \dots, n$. If $b_1 = b_2 = \dots = b_m = 0$, then (1.1) is called a homogeneous system. Otherwise, it is called an inhomogeneous system.

Slide 20: Linear Systems of Equations
1.1. Examples.
1. This is an inhomogeneous system of equations in $\mathbb{R}$:
$$x_1 + 3x_2 - x_3 = 1,\qquad x_1 - 2x_2 = 2,\qquad 10x_2 + x_3 = 1$$
2. This is a homogeneous system of equations in $\mathbb{R}$:
$$x_1 + 3x_2 - x_3 = 0,\qquad x_1 - 2x_2 = 0,\qquad 4x_1 + 7x_2 - 3x_3 = 0$$
3. This is an inhomogeneous system of equations in $\mathbb{R}^2$:
$$2x_1 + x_2 = \begin{pmatrix} 2 \\ 1 \end{pmatrix},\qquad x_1 - x_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

Slide 21: Linear Systems of Equations
In these examples, the number $m$ of equations is equal to the number $n$ of variables. This is of course not always the case. If $m < n$ we say that the system is underdetermined; if $m > n$ it is called overdetermined. A solution of a linear system of equations (1.1) is a tuple of elements $(y_1, \dots, y_n) \in V^n$ such that the predicate (1.1) becomes a true statement. We will prove later that an inhomogeneous system of equations may have either
▶ a unique solution or
▶ no solution or
▶ an infinite number of solutions.
A homogeneous system evidently always has the trivial solution $x_1 = x_2 = \dots = x_n = 0$. It further has either
▶ no non-trivial solution or
▶ an infinite number of non-trivial solutions.

Slide 22: Solving Linear Systems
We will later discuss the theory of existence and uniqueness of solutions for linear systems of equations more extensively. For now, we want to discuss a practical method for actually finding solutions. In school, you have probably learned some basic strategies for solving systems of equations:
▶ solving one of the equations for a variable and then substituting into the other equations, thereby reducing the number of variables;
▶ manipulating two equations until they have identical expressions on one side, then setting them equal;
▶ adding and subtracting multiples of one equation to another equation.
Perhaps you have encountered other strategies, but we will look at the last of the three given here.
We want to develop a method for systematically solving systems of equations. If we employ a good strategy of adding equations to each other, we will be able to determine the unknowns efficiently and systematically.

Slide 23: Solving Linear Systems
1.2. Example. Consider the system
$$x_1 + 3x_2 - x_3 = 1,\qquad x_1 - 2x_2 = 2,\qquad 10x_2 + x_3 = 1.$$
Let us subtract the first equation from the second equation:
$$x_1 + 3x_2 - x_3 = 1,\qquad -5x_2 + x_3 = 1,\qquad 10x_2 + x_3 = 1.$$
Next, we add twice the second equation to the third equation:
$$x_1 + 3x_2 - x_3 = 1,\qquad -5x_2 + x_3 = 1,\qquad 3x_3 = 3.$$

Slide 24: Solving Linear Systems
We read off from this triangular system (starting from the last equation and proceeding upwards) that $x_3 = 1$, $x_2 = 0$ and $x_1 = 2$. Instead of reading off the solution, we could have proceeded more systematically: we divide the last equation by $3$ and the second equation by $-5$:
$$x_1 + 3x_2 - x_3 = 1,\qquad x_2 - \tfrac{1}{5}x_3 = -\tfrac{1}{5},\qquad x_3 = 1.$$

Slide 25: Solving Linear Systems
We then add $1/5$ times the last equation to the second equation, and the last equation itself to the first equation:
$$x_1 + 3x_2 = 2,\qquad x_2 = 0,\qquad x_3 = 1.$$
Lastly, we subtract three times the second equation from the first equation:
$$x_1 = 2,\qquad x_2 = 0,\qquad x_3 = 1.$$
This gives us the solution directly.

Slide 26: Equivalence of Linear Systems
By adding and subtracting one equation from another, we have effectively been changing the system of equations. Formally, it is useful to justify the validity of this procedure using the notion of equivalence. We say that two systems of linear equations are equivalent if any solution of the first system is also a solution of the second system and vice versa. Thus the systems
$$x_1 + 3x_2 - x_3 = 1,\qquad -5x_2 + x_3 = 1,\qquad 10x_2 + x_3 = 1$$
and
$$x_1 = 2,\qquad x_2 = 0,\qquad x_3 = 1$$
are equivalent.

Slide 27: Simplifying Notation
Listing the variables and the equality sign is essentially a waste of space. Instead of saying that we transform one system to another by adding twice the second equation to the third equation, it is more efficient to record only the coefficients in an augmented array and to annotate the row operation:
$$\left(\begin{array}{ccc|c} 1 & 3 & -1 & 1 \\ 0 & -5 & 1 & 1 \\ 0 & 10 & 1 & 1 \end{array}\right) \begin{matrix} \\ \cdot\,2 \\ \leftarrow + \end{matrix} \;\sim\; \left(\begin{array}{ccc|c} 1 & 3 & -1 & 1 \\ 0 & -5 & 1 & 1 \\ 0 & 0 & 3 & 3 \end{array}\right)$$
We will use this notation forthwith.

Slide 28: The Gauß-Jordan Algorithm
The goal of the Gauß-Jordan algorithm (also called Gaußian elimination) is to transform a system
$$\left(\begin{array}{ccc|c} * & * & * & \diamond \\ * & * & * & \diamond \\ * & * & * & \diamond \end{array}\right),\qquad * \in \mathbb{R} \text{ or } \mathbb{C},\quad \diamond \in V,$$
first into the form
$$\left(\begin{array}{ccc|c} 1 & * & * & \diamond \\ 0 & 1 & * & \diamond \\ 0 & 0 & 1 & \diamond \end{array}\right) \tag{1.2}$$
and subsequently into
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \diamond \\ 0 & 1 & 0 & \diamond \\ 0 & 0 & 1 & \diamond \end{array}\right). \tag{1.3}$$
(Ideally; it may not always be possible to achieve the form (1.2).)

Slide 29: The Gauß-Jordan Algorithm
We are allowed to achieve this using elementary row manipulations. These are:
1. swapping (interchanging) two rows,
2. multiplying each element in a row by a non-zero number,
3. adding a multiple of one row to another row.
Of course, each "row" represents an equation, so we are simply manipulating equations. It is clear that these manipulations transform a system into an equivalent system. A system in the form (1.2) is said to be in upper triangular form; a system in the form (1.3) is said to be in diagonal form. The procedure for transforming a system into upper triangular form is called forward elimination; the subsequent procedure for achieving diagonal form is called backward substitution.
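As an aside (not part of the lecture text), the three elementary row manipulations are easy to express in Mathematica. The following is a minimal sketch; the helper names swapRows, scaleRow and addRow are our own, not built-in commands. It reproduces the two steps of Example 1.2:

swapRows[m_, i_, j_] := Permute[m, Cycles[{{i, j}}]]
scaleRow[m_, i_, c_] := ReplacePart[m, i -> c m[[i]]]
addRow[m_, i_, j_, c_] := ReplacePart[m, j -> m[[j]] + c m[[i]]]  (* add c times row i to row j *)

a = {{1, 3, -1, 1}, {1, -2, 0, 2}, {0, 10, 1, 1}};  (* augmented array of Example 1.2 *)
a1 = addRow[a, 1, 2, -1];  (* subtract the first row from the second *)
a2 = addRow[a1, 2, 3, 2]   (* add twice the second row to the third *)
{{1, 3, -1, 1}, {0, -5, 1, 1}, {0, 0, 3, 3}}

The result is exactly the upper triangular form obtained above.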
Slide 30: The Gauß-Jordan Algorithm
1.3. Example. Consider the system
$$2x_1 + x_2 + x_3 = \begin{pmatrix}2\\1\end{pmatrix},\qquad x_1 - x_2 = \begin{pmatrix}0\\1\end{pmatrix},\qquad x_1 + x_3 = \begin{pmatrix}1\\1\end{pmatrix}.$$
We rewrite this as
$$\left(\begin{array}{ccc|c} 2 & 1 & 1 & \binom{2}{1} \\ 1 & -1 & 0 & \binom{0}{1} \\ 1 & 0 & 1 & \binom{1}{1} \end{array}\right) \tag{1.4}$$
We now proceed with forward elimination to achieve upper triangular form.

Slide 31: The Gauß-Jordan Algorithm (Forward Elimination)
Step 1a: Ensure that the top left-hand element is equal to 1; here we swap the first and second rows:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 2 & 1 & 1 & \binom{2}{1} \\ 1 & 0 & 1 & \binom{1}{1} \end{array}\right)$$
Step 1b: Eliminate (transform to zero) all lower entries in the first column, adding $(-2)$ times the first row to the second row and $(-1)$ times the first row to the third row:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 0 & 3 & 1 & \binom{2}{-1} \\ 0 & 1 & 1 & \binom{1}{0} \end{array}\right)$$

Slide 32: The Gauß-Jordan Algorithm (Forward Elimination)
Step 2a: Ensure that the entry in the second row and second column is equal to 1; we swap the second and third rows:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 0 & 1 & 1 & \binom{1}{0} \\ 0 & 3 & 1 & \binom{2}{-1} \end{array}\right)$$
Step 2b: Eliminate all entries in the second column below the second row, adding $(-3)$ times the second row to the third row:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 0 & 1 & 1 & \binom{1}{0} \\ 0 & 0 & -2 & \binom{-1}{-1} \end{array}\right)$$

Slide 33: The Gauß-Jordan Algorithm (Forward Elimination)
Step 3: Ensure that the entry in the third row and third column is equal to 1; we divide the third row by $(-2)$:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 0 & 1 & 1 & \binom{1}{0} \\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
The system now has upper triangular form. We next commence the backward substitution.

Slide 34: The Gauß-Jordan Algorithm (Backward Substitution)
Step 1: Eliminate all entries in the third column above the third row, adding $(-1)$ times the third row to the second row:
$$\sim \left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1} \\ 0 & 1 & 0 & \binom{1/2}{-1/2} \\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
Step 2: Eliminate all entries in the second column above the second row, adding the second row to the first row:
$$\sim \left(\begin{array}{ccc|c} 1 & 0 & 0 & \binom{1/2}{1/2} \\ 0 & 1 & 0 & \binom{1/2}{-1/2} \\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
Our system now has diagonal form, and we may directly read off the solution.

Slide 35: The Gauß-Jordan Algorithm
We see that the system of Example 1.3 is solved by
$$x_1 = \begin{pmatrix}1/2\\1/2\end{pmatrix},\qquad x_2 = \begin{pmatrix}1/2\\-1/2\end{pmatrix},\qquad x_3 = \begin{pmatrix}1/2\\1/2\end{pmatrix}.$$
We notice that instead of solving a single system in $\mathbb{R}^2$, we could have solved two systems in $\mathbb{R}$, determining the components of $x_1, x_2, x_3$ separately from
$$2x_{11} + x_{21} + x_{31} = 2,\qquad x_{11} - x_{21} = 0,\qquad x_{11} + x_{31} = 1$$
and
$$2x_{12} + x_{22} + x_{32} = 1,\qquad x_{12} - x_{22} = 1,\qquad x_{12} + x_{32} = 1.$$

Slide 36: Existence and Uniqueness of Solutions
1.4. Remark. A system of $m$ equations in $n$ unknowns has a unique solution if and only if it is diagonalizable, i.e., if it can be transformed into diagonal form (1.3). Since backward substitution always works, we see that a unique solution exists if and only if the system can be transformed into an upper triangular form, such as
$$\left(\begin{array}{ccc|c} 1 & * & * & \diamond \\ 0 & 1 & * & \diamond \\ 0 & 0 & 1 & \diamond \end{array}\right) \ (m = n = 3) \qquad\text{or}\qquad \left(\begin{array}{ccc|c} 1 & * & * & \diamond \\ 0 & 1 & * & \diamond \\ 0 & 0 & 1 & \diamond \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \ (m = 5,\ n = 3)$$

Slide 37: Existence and Uniqueness of Solutions
Thus $m \ge n$ is a necessary condition for the existence of a unique solution. A system has no solution if one of the rows has the form
$$\begin{array}{ccccc|c} 0 & \dots & 0 & 0 & & \diamond \end{array},\qquad \diamond \ne 0,$$
which represents the false statement $0 = \diamond$.
If a system has more than one solution, it can be transformed into a so-called echelon form, e.g.,
$$\left(\begin{array}{ccccc|c} 1 & * & * & * & * & \diamond \\ 0 & 1 & * & * & * & \diamond \\ 0 & 0 & 0 & 1 & * & \diamond \\ 0 & 0 & 0 & 0 & 1 & \diamond \end{array}\right) \tag{1.5}$$
In this case, one of the unknowns acts as a parameter.

Slide 38: The Solution Set
1.5. Definition. The solution set $S$ of a system of equations (1.1) is the set of all $n$-tuples of numbers $(x_1, \dots, x_n)$ that satisfy (1.1).
▶ If a linear system has a unique solution, the set $S$ contains a single point.
▶ If there is no solution, $S = \emptyset$.
▶ If there is more than one solution, $S$ is an infinite set.
1.6. Example. Consider the real system given by
$$x_1 + 2x_2 + 3x_3 = 0,\qquad 4x_1 + 5x_2 + 6x_3 = 0,\qquad 7x_1 + 8x_2 + 9x_3 = 0.$$

Slide 39: A Homogeneous System
Applying our algorithm (subtracting 4 times and 7 times the first row from the second and third rows, dividing the second row by $-3$, adding 6 times the new second row to the third row, and finally subtracting twice the second row from the first),
$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 0 & -3 & -6 & 0 \\ 0 & -6 & -12 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$
Writing this system out explicitly,
$$x_1 = x_3,\qquad x_2 = -2x_3,$$
where $x_3 \in \mathbb{R}$ is arbitrary. It is often convenient to introduce a parameter:
$$x_1 = \alpha,\quad x_2 = -2\alpha,\quad x_3 = \alpha,\qquad \alpha \in \mathbb{R}.$$

Slide 40: A Homogeneous System
In vector notation, the solution is
$$x = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}\alpha\\-2\alpha\\\alpha\end{pmatrix} = \alpha\begin{pmatrix}1\\-2\\1\end{pmatrix},\qquad \alpha \in \mathbb{R}.$$
The solution set is
$$S = \left\{ x = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \in \mathbb{R}^3 : x = \alpha \cdot \begin{pmatrix}1\\-2\\1\end{pmatrix},\ \alpha \in \mathbb{R} \right\}.$$
Geometrically, $S$ corresponds to a straight line through the origin. We will return to the geometric properties of solution sets of systems of equations (they turn out to be affine spaces) later.

Slide 41: Gauß-Jordan with Mathematica
1.7. Example. Consider the system of equations
$$x_1 - 2x_2 + 3x_3 + 4x_4 = 2,\qquad x_1 - 2x_2 + 5x_3 + 5x_4 = 3,\qquad -x_1 + 2x_2 - x_3 - 4x_4 = 2.$$
In our array notation, this is
$$\left(\begin{array}{cccc|c} 1 & -2 & 3 & 4 & 2 \\ 1 & -2 & 5 & 5 & 3 \\ -1 & 2 & -1 & -4 & 2 \end{array}\right)$$

Slide 42: Gauß-Jordan with Mathematica
To enter a table/array/matrix in Mathematica (these are all represented in the same way), use the following command structure:

A = {{1, -2, 3, 4, 2}, {1, -2, 5, 5, 3}, {-1, 2, -1, -4, 2}}
{{1, -2, 3, 4, 2}, {1, -2, 5, 5, 3}, {-1, 2, -1, -4, 2}}

For convenience, we have here given our array a name, "A". We can retrieve the array by referring to A.

Slide 43: Gauß-Jordan with Mathematica
A more easily readable form is obtained using the TableForm command:

TableForm[A]
1   -2   3   4   2
1   -2   5   5   3
-1   2  -1  -4   2

The RowReduce command implements the Gauß-Jordan algorithm, returning the echelon form:

TableForm[RowReduce[A]]
1  -2  0  0   8
0   0  1  0   2
0   0  0  1  -3

Slide 44: Fundamental Lemma for Homogeneous Equations
We will discuss the general theory of uniqueness and existence of solutions to linear equations after we have studied vector spaces a little more closely. However, the following fundamental lemma requires no additional theory:
1.8. Lemma. The homogeneous system
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = 0$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = 0$$
of $m$ equations in $n$ real or complex unknowns $x_1, \dots, x_n$ has a non-trivial solution if $n > m$.
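Before turning to the proof, the lemma can be illustrated numerically (a sketch of our own, not from the lecture): for a homogeneous system of $m = 2$ equations in $n = 3$ unknowns, Mathematica's built-in NullSpace command returns a basis of the solution space, which is non-trivial, just as the lemma predicts.

NullSpace[{{1, 2, 3}, {4, 5, 6}}]  (* 2 equations, 3 unknowns *)
{{1, -2, 1}}  (* every multiple of (1, -2, 1) is a non-trivial solution *)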
Slide 45: Fundamental Lemma for Homogeneous Equations
Proof. We proceed by induction on $m$, the number of equations. This means that for any $m \in \mathbb{N} \setminus \{0\}$ we will establish that the system has a non-trivial solution if $n > m$. We first prove the statement of the lemma for $m = 1$, i.e., we show that
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = 0,\qquad a_{1k} \ne 0,\quad k = 1, \dots, n, \tag{1.6}$$
has a non-trivial solution whenever $n > 1$. Proof by induction: for $n = 2$,
$$a_{11}x_1 + a_{12}x_2 = 0$$
has the solution $x_2 = 1$, $x_1 = -a_{12}/a_{11}$. If (1.6) has a non-trivial solution $(x_1, \dots, x_n)$, then
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n + a_{1(n+1)}x_{n+1} = 0$$
has the non-trivial solution $(x_1, \dots, x_n, 0)$.

Slide 46: Fundamental Lemma for Homogeneous Equations
Proof (continued). We assume that in the system
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = 0,\quad \dots,\quad a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = 0$$
at least one $a_{ij} \ne 0$ (otherwise the theorem is trivially true). By reordering the equations and renumbering the indices, we can ensure that $a_{11} \ne 0$. We write this system as
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0 \\ a_{21} & a_{22} & \dots & a_{2n} & 0 \\ a_{31} & a_{32} & \dots & a_{3n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} & 0 \end{array}\right)$$

Slide 47: Fundamental Lemma for Homogeneous Equations
Proof (continued). Adding $-a_{21}/a_{11}$ times the first row to the second row, $-a_{31}/a_{11}$ times the first row to the third row, and so on up to $-a_{m1}/a_{11}$ times the first row added to the last row, we obtain
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0 \\ 0 & a_{22} - \frac{a_{21}a_{12}}{a_{11}} & \dots & a_{2n} - \frac{a_{21}a_{1n}}{a_{11}} & 0 \\ 0 & a_{32} - \frac{a_{31}a_{12}}{a_{11}} & \dots & a_{3n} - \frac{a_{31}a_{1n}}{a_{11}} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & a_{m2} - \frac{a_{m1}a_{12}}{a_{11}} & \dots & a_{mn} - \frac{a_{m1}a_{1n}}{a_{11}} & 0 \end{array}\right) \tag{1.7}$$

Slide 48: Fundamental Lemma for Homogeneous Equations
Proof (continued). The boxed area (rows 2 through $m$, columns 2 through $n$) represents a homogeneous system of $m - 1$ equations in the $n - 1$ unknowns $x_2, \dots, x_n$. We continue with our proof by induction. The case $m = 1$ has been established. Now assume that for $m - 1$ equations there exists a non-trivial solution whenever the number of unknowns is greater than $m - 1$. A system with $m$ equations and $n > m$ unknowns may be transformed into the form (1.7). The subsystem indicated by the boxed area in (1.7) by assumption has a non-trivial solution $(x_2, \dots, x_n)$. Then the system of $m$ equations in $n$ unknowns has the solution
$$x = \Bigl( -\frac{1}{a_{11}}(a_{12}x_2 + \dots + a_{1n}x_n),\ x_2, \dots, x_n \Bigr),$$
which is also non-trivial. ∎

Slides 49-50: 2. Finite-Dimensional Vector Spaces

Slide 51: Linear Independence
We assume throughout that $V$ is a real or complex vector space. As usual, we will use the letter $\mathbb{F}$ to denote either $\mathbb{R}$ (for real vector spaces) or $\mathbb{C}$ (for complex vector spaces). We want to distinguish elements of vector spaces that are not simply multiples of each other. For example, the vectors $u, v \in \mathbb{R}^2$,
$$u = \begin{pmatrix}1\\2\end{pmatrix},\qquad v = \begin{pmatrix}-2\\-4\end{pmatrix},$$
are multiples of each other, because $v = -2u$. In general, we say that $u, v \in V$ are multiples of each other if
$$\exists_{\lambda \in \mathbb{F}}: u = \lambda v \qquad\text{or}\qquad \exists_{\substack{\lambda_1, \lambda_2 \in \mathbb{F} \\ |\lambda_1| + |\lambda_2| \ne 0}}: \lambda_1 u + \lambda_2 v = 0.$$

Slide 52: Linear Independence
If $u$ and $v$ are not multiples of each other, we say that they are (linearly) independent. This means that
$$\neg\Bigl(\exists_{\substack{\lambda_1, \lambda_2 \in \mathbb{F} \\ |\lambda_1| + |\lambda_2| \ne 0}}: \lambda_1 u + \lambda_2 v = 0\Bigr) \qquad\text{or}\qquad \forall_{\lambda_1, \lambda_2 \in \mathbb{F}}: \bigl(\lambda_1 u + \lambda_2 v = 0 \Rightarrow \lambda_1 = \lambda_2 = 0\bigr).$$
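In concrete spaces such as $\mathbb{R}^n$, this condition can be checked mechanically. A small sketch of our own (MatrixRank is built-in; reading the rank as the number of independent rows is our gloss here, anticipating Chapter 6):

MatrixRank[{{1, 2}, {-2, -4}}]
1  (* v = -2u: the vectors are multiples of each other *)
MatrixRank[{{1, 0}, {0, 2}}]
2  (* the vectors are independent *)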
2.1. Definition. Let $V$ be a real or complex vector space and $v_1, \dots, v_n \in V$. Then the vectors $v_1, \dots, v_n$ are said to be independent if for all $\lambda_1, \dots, \lambda_n \in \mathbb{F}$
$$\sum_{k=1}^n \lambda_k v_k = 0 \quad\Rightarrow\quad \lambda_1 = \lambda_2 = \dots = \lambda_n = 0.$$
A finite set $M \subset V$ is called an independent set if its elements are independent.

Slide 53: Linear Independence
2.2. Example. The vectors
$$v_1 = \begin{pmatrix}1\\0\end{pmatrix},\qquad v_2 = \begin{pmatrix}0\\2\end{pmatrix}$$
are independent (and $M = \{v_1, v_2\}$ is an independent set), because
$$\begin{pmatrix}0\\0\end{pmatrix} = 0 = \lambda_1 v_1 + \lambda_2 v_2 = \begin{pmatrix}\lambda_1\\2\lambda_2\end{pmatrix}$$
is equivalent to the system of equations $0 = \lambda_1$, $0 = 2\lambda_2$, which has the unique solution $\lambda_1 = 0$ and $\lambda_2 = 0$.

Slide 54: Linear Independence
2.3. Example. The vectors
$$v_1 = \begin{pmatrix}1\\4\\7\end{pmatrix},\qquad v_2 = \begin{pmatrix}2\\5\\8\end{pmatrix},\qquad v_3 = \begin{pmatrix}3\\6\\9\end{pmatrix}$$
are not independent, because
$$\begin{pmatrix}0\\0\\0\end{pmatrix} = 0 = \lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 = \begin{pmatrix}\lambda_1 + 2\lambda_2 + 3\lambda_3 \\ 4\lambda_1 + 5\lambda_2 + 6\lambda_3 \\ 7\lambda_1 + 8\lambda_2 + 9\lambda_3\end{pmatrix}$$
has a non-trivial solution, as we have seen in Example 1.6. For example, we can take $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 1$. Hence,
$$\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 = 0 \quad\not\Rightarrow\quad \lambda_1 = \lambda_2 = \lambda_3 = 0,$$
and the vectors are not independent.

Slide 55: Linear Combinations and Span
2.4. Definition. Let $v_1, \dots, v_n \in V$ and $\lambda_1, \dots, \lambda_n \in \mathbb{F}$. Then the expression
$$\sum_{k=1}^n \lambda_k v_k = \lambda_1 v_1 + \dots + \lambda_n v_n$$
is called a linear combination of the vectors $v_1, \dots, v_n$. The set
$$\operatorname{span}\{v_1, \dots, v_n\} := \Bigl\{ y \in V : y = \sum_{k=1}^n \lambda_k v_k,\ \lambda_1, \dots, \lambda_n \in \mathbb{F} \Bigr\}$$
is called the (linear) span or the linear hull of the vectors $v_1, \dots, v_n$.

Slide 56: Linear Combinations and Span
2.5. Example. $\operatorname{span}\bigl\{\binom{1}{0}, \binom{2}{1}\bigr\} = \mathbb{R}^2$. We need to show that every $x \in \mathbb{R}^2$ can be written as
$$x = \lambda_1 \binom{1}{0} + \lambda_2 \binom{2}{1}$$
for some $\lambda_1, \lambda_2 \in \mathbb{R}$. This means we need to solve
$$x = \binom{x_1}{x_2} = \binom{\lambda_1 + 2\lambda_2}{\lambda_2}.$$
This is easily done, and we obtain $\lambda_2 = x_2$ and $\lambda_1 = x_1 - 2x_2$. Thus for any $x \in \mathbb{R}^2$ we have $x \in \operatorname{span}\bigl\{\binom{1}{0}, \binom{2}{1}\bigr\}$. Since $\operatorname{span}\bigl\{\binom{1}{0}, \binom{2}{1}\bigr\} \subset \mathbb{R}^2$ by definition, we are finished.

Slide 57: Linear Combinations and Span
2.6. Lemma. The vectors $v_1, \dots, v_n \in V$ are independent if and only if none of them is contained in the span of all the others.
Proof. If $v_k = 0$ for some $k = 1, \dots, n$, then the statement is trivially true. We prove the contraposition of the statement, assuming that all vectors are non-zero:
$$\exists_{k \in \{1,\dots,n\}}: v_k \in \operatorname{span}\{v_1, \dots, v_{k-1}, v_{k+1}, \dots, v_n\} \quad\Leftrightarrow\quad \exists_{k \in \{1,\dots,n\}}\ \exists_{\substack{\lambda_i \in \mathbb{F},\ i \in \{1,\dots,n\}\setminus\{k\} \\ \sum_i |\lambda_i| \ne 0}}: v_k = \sum_{i \ne k} \lambda_i v_i \quad\Leftrightarrow\quad \exists_{\substack{\lambda_i \in \mathbb{F},\ i \in \{1,\dots,n\} \\ \sum_i |\lambda_i| \ne 0}}: \sum_i \lambda_i v_i = 0. \ ∎$$

Slide 58: Span of Subsets
More generally, if $V$ is a vector space and $M$ is some subset of $V$, then we can define the span of $M$ as the set containing all (finite) linear combinations of elements of $M$, i.e.,
$$\operatorname{span} M := \Bigl\{ v \in V : \exists_{n \in \mathbb{N}}\ \exists_{\lambda_1, \dots, \lambda_n \in \mathbb{F}}\ \exists_{m_1, \dots, m_n \in M}: v = \sum_{i=1}^n \lambda_i m_i \Bigr\}.$$
Note that this definition does not presume that $M$ is a subspace, just an arbitrary subset of $V$. Furthermore, although only finite linear combinations are considered, the set $M$ may well be infinite in size. Moreover, even though $M$ is just any set, $\operatorname{span} M$ will be a subspace of $V$.
2.7. Example. Let $M = \{f \in C(\mathbb{R}) : f(x) = x^n,\ n \in \mathbb{N}\}$ denote the set of all monomials in the space of continuous functions on $\mathbb{R}$. Then $P(\mathbb{R}) := \operatorname{span} M$ is the space of all polynomials (of any degree) in $C(\mathbb{R})$.
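Membership in a span amounts to solving a linear system, so it too can be checked by machine. The following sketch (our own, using the built-in Solve command) reproduces the computation of Example 2.5 symbolically:

Solve[l1 {1, 0} + l2 {2, 1} == {x1, x2}, {l1, l2}]
{{l1 -> x1 - 2 x2, l2 -> x2}}  (* a solution exists for every x, matching Example 2.5 *)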
Slide 59: Basis
2.8. Definition. Let $V$ be a real or complex vector space. An $n$-tuple $B = (b_1, \dots, b_n) \in V^n$ is called an (ordered and finite) basis of $V$ if every vector $v$ has a unique representation
$$v = \sum_{i=1}^n \lambda_i b_i,\qquad \lambda_i \in \mathbb{F}.$$
The numbers $\lambda_i$ are called the coordinates of $v$ with respect to $B$.
2.9. Example. The tuple of vectors $(e_1, \dots, e_n)$, $e_i \in \mathbb{R}^n$,
$$e_i = (0, \dots, 0, \underbrace{1}_{i\text{th entry}}, 0, \dots, 0),\qquad i = 1, \dots, n,$$
is called the standard basis or canonical basis of $\mathbb{R}^n$.

Slide 60: Characterization of Bases
Sometimes we are not interested in the order of the elements of a basis and write $B = \{b_1, \dots, b_n\}$, replacing the tuple by a set. This is known as an unordered basis.
2.10. Theorem. Let $V$ be a real or complex vector space. An $n$-tuple $B = (b_1, \dots, b_n) \in V^n$ is a basis of $V$ if and only if
(i) the vectors $b_1, \dots, b_n$ are linearly independent, i.e., $B$ is an independent set, and
(ii) $V = \operatorname{span} B$.

Slide 61: Characterization of Bases
Proof. (⇒) Suppose that $B$ is a basis of $V$. Then every $v \in V$ can be expressed as
$$v = \sum_{i=1}^n \lambda_i b_i$$
for some coefficients $\lambda_i \in \mathbb{F}$. Hence, $V \subset \operatorname{span} B$. From $B \subset V$ it is clear that $\operatorname{span} B \subset V$, so we deduce $V = \operatorname{span} B$. The zero vector $0 \in V$ has the representation
$$0 = 0 \cdot b_1 + \dots + 0 \cdot b_n.$$
Since $B$ is a basis, this representation is unique, i.e.,
$$\sum_{i=1}^n \lambda_i b_i = 0 \quad\Rightarrow\quad \lambda_1 = \dots = \lambda_n = 0.$$

Slide 62: Characterization of Bases
Proof (continued). It follows that $B$ is an independent set.
(⇐) Suppose that $B \subset V$ is an independent set satisfying $\operatorname{span} B = V$. Then every $v \in V$ is an element of the span of $B$, so
$$v = \sum_{i=1}^n \lambda_i b_i$$
for some coefficients $\lambda_i \in \mathbb{F}$. It remains to show that this representation is unique. Suppose that
$$v = \sum_{i=1}^n \lambda_i b_i = \sum_{i=1}^n \mu_i b_i,\qquad \lambda_i, \mu_i \in \mathbb{F}.$$

Slide 63: Characterization of Bases
Proof (continued). Then
$$0 = \sum_{i=1}^n (\lambda_i - \mu_i) b_i.$$
Since the $b_i$ are all independent, this implies $\lambda_i - \mu_i = 0$, $i = 1, \dots, n$, so the representation is unique. ∎

Slide 64: Finite- and Infinite-Dimensional Spaces
2.11. Definition. Let $V$ be a real or complex vector space. Then $V$ is called finite-dimensional if either
▶ $V = \{0\}$ or
▶ $V$ possesses a finite basis.
If $V$ is not finite-dimensional, we say that it is infinite-dimensional.
2.12. Example.
1. The space of polynomials of degree at most $n$,
$$P_n = \Bigl\{ f \in C(\mathbb{R}) : f(x) = \sum_{k=0}^n a_k x^k,\ a_0, a_1, \dots, a_n \in \mathbb{R} \Bigr\},$$
is finite-dimensional, because it has the basis $B = (1, x, x^2, \dots, x^n)$.
2. The space of real polynomials of any degree, $P(\mathbb{R})$, is infinite-dimensional. (See Example 2.7.)

Slide 65: Length of Bases
2.13. Theorem. Let $V$ be a real or complex finite-dimensional vector space, $V \ne \{0\}$. Then any basis of $V$ has the same length (number of elements) $n$.
Proof. Let $A = (a_1, \dots, a_n)$ be a basis of $V$. We will show that no tuple $B = (b_1, \dots, b_m)$ with $m > n$ can be a basis of $V$. (We do not need to consider the case $m < n$, because we could just switch the roles of $A$ and $B$.) Thus, suppose that $A$ and $B$ given as above are both bases. Then for every $j = 1, \dots, m$ there exist uniquely determined numbers $c_{ij} \in \mathbb{F}$, $i = 1, \dots, n$, such that
$$b_j = \sum_{i=1}^n c_{ij} a_i.$$

Slide 66: Length of Bases
Proof (continued). Now let $\lambda_1, \dots, \lambda_m \in \mathbb{F}$ and consider the linear combination
$$\sum_{j=1}^m \lambda_j b_j = \sum_{j=1}^m \sum_{i=1}^n c_{ij} \lambda_j a_i = \sum_{i=1}^n \underbrace{\Bigl(\sum_{j=1}^m c_{ij}\lambda_j\Bigr)}_{=:\mu_i} a_i.$$
Now, since $A$ and $B$ are bases and as we know that $0 \in V$ has a unique representation in terms of basis vectors, we have
$$\lambda_1 = \lambda_2 = \dots = \lambda_m = 0 \quad\Leftrightarrow\quad \sum_{j=1}^m \lambda_j b_j = 0 \quad\Leftrightarrow\quad \sum_{i=1}^n \mu_i a_i = 0 \quad\Leftrightarrow\quad \mu_1 = \mu_2 = \dots = \mu_n = 0 \quad\Leftrightarrow\quad \forall_{i=1,\dots,n}: \sum_{j=1}^m c_{ij}\lambda_j = 0.$$

Slide 67: Length of Bases
Proof (continued). This means that the homogeneous system of equations
$$c_{11}\lambda_1 + c_{12}\lambda_2 + \dots + c_{1m}\lambda_m = 0$$
$$\vdots \tag{2.1}$$
$$c_{n1}\lambda_1 + c_{n2}\lambda_2 + \dots + c_{nm}\lambda_m = 0$$
has only the trivial solution
$$\lambda_1 = \lambda_2 = \dots = \lambda_m = 0.$$
However, we have assumed that $m > n$, i.e., there are more unknowns than equations. By the Fundamental Lemma 1.8, there must exist a non-trivial solution. Thus we have a contradiction, so $B$ cannot be a basis if $A$ is. ∎

Slide 68: Dimension
2.14. Definition. Let $V$ be a finite-dimensional real or complex vector space. We define the dimension of $V$, denoted $\dim V$, as follows:
(i) If $V = \{0\}$, $\dim V = 0$.
(ii) If $V \ne \{0\}$, $\dim V = n$, where $n$ is the length of any basis of $V$.
If $V$ is an infinite-dimensional vector space, we write $\dim V = \infty$.
2.15. Examples.
(i) $\dim \mathbb{R}^n = n$
(ii) $\dim P_n = n + 1$
(iii) $\dim C(\mathbb{R}) = \infty$
(iv) $\dim\{(x_1, x_2) \in \mathbb{R}^2 : x_2 = 3x_1\} = 1$

Slide 69: Characterization of Bases
2.16. Remark. In an $n$-dimensional vector space $V$, a basis is an independent set with $n$ elements that spans $V$. A few questions arise naturally:
1. Is any independent set with $n$ elements in an $n$-dimensional space a basis?
2. If not, is it possible to find independent sets with more than $n$ elements in an $n$-dimensional space?

Slide 70: Maximal Subsets
In order to answer these and similar questions, we will work towards a fundamentally important result called the basis extension theorem. First, we need a lemma:
2.17. Lemma. Let $a_1, \dots, a_{n+1} \in V$ and assume that $a_1, \dots, a_n$ are independent and that $a_1, \dots, a_{n+1}$ are dependent. Then $a_{n+1}$ is a linear combination of (some of) $a_1, \dots, a_n$.
The proof is quite easy and left to you as an exercise.
2.18. Definition. Let $V$ be a real or complex vector space and $A \subset V$ a finite set. An independent subset $F \subset A$ is called maximal if every $x \in A$ is a linear combination of elements of $F$.
If $A$ is finite and $F \subset A$ is maximal, then $\operatorname{span} F = \operatorname{span} A$. A maximal subset is of course not uniquely defined.

Slide 71: Maximal Subsets
2.19. Example. Let $V = \mathbb{R}^3$ and
$$A = \left\{ \begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix} \right\}.$$
Then
$$F_1 = \left\{ \begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix} \right\},\qquad F_2 = \left\{ \begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix} \right\},\qquad F_3 = \left\{ \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix} \right\}$$
are all maximal independent subsets of $A$. Furthermore,
$$\operatorname{span} A = \operatorname{span} F_1 = \operatorname{span} F_2 = \operatorname{span} F_3.$$

Slide 72: Maximal Subsets
2.20. Theorem. Let $V$ be a vector space and $A \subset V$ a finite set. Then every independent subset $A' \subset A$ lies in some maximal subset $F \subset A$.
Proof. We proceed algorithmically. We ask: does there exist a vector $x \in A \setminus A'$ such that $x \notin \operatorname{span} A'$?
▶ If no, we are finished, because $A'$ is maximal.
▶ If yes, we take this $x$ and define $A'' = A' \cup \{x\}$. By Lemma 2.17, $A''$ is independent (otherwise $x \in \operatorname{span} A'$ and we have a contradiction), and we can repeat the procedure, substituting $A''$ for $A'$.
Since $A$ is finite, the loop will terminate at some point, and we obtain a maximal independent subset of $A$. ∎

Slide 73: Basis Extension Theorem
2.21. Basis Extension Theorem. Let $V$ be a finite-dimensional vector space and $A' \subset V$ an independent set. Then there exists a basis of $V$ containing $A'$.
Proof. Write $A' = \{a_1, \dots, a_m\}$ and choose a basis $\tilde{A} = \{a_{m+1}, \dots, a_{m+n}\}$ of $V$, $\dim V = n$. We now define
$$A = \{a_1, \dots, a_{m+n}\} \supset A'.$$
By Theorem 2.20 there exists a maximal independent subset $F$ of $A$ containing $A'$. Since $\tilde{A}$ is a basis, $V = \operatorname{span} \tilde{A} = \operatorname{span} A$. Furthermore, $\operatorname{span} F = \operatorname{span} A$, so $\operatorname{span} F = V$. Thus $F$ is a basis. ∎

Slide 74: Basis Extension Theorem
2.22. Corollary. Let $V$ be an $n$-dimensional vector space, $n \in \mathbb{N}$. Then any independent set $A$ with $n$ elements is a basis of $V$.
Proof. By the basis extension theorem there is a basis containing $A$. Since this basis must have $n$ elements, $A$ itself is this basis. ∎
2.23. Corollary. Let $V$ be an $n$-dimensional vector space, $n \in \mathbb{N}$. Then an independent set $A$ may have at most $n$ elements.
Proof. By the basis extension theorem there is a basis containing $A$. Since this basis has $n$ elements, $A$ may not have more elements than this. ∎

Slide 75: Sums of Vector Spaces
2.24. Definition. Let $V$ be a real or complex vector space and $U, W$ sets in $V$.
(i) We define the sum of $U$ and $W$ by
$$U + W := \{ v \in V : \exists_{u \in U}\ \exists_{w \in W}: v = u + w \}.$$
(ii) If $U$ and $W$ are subspaces of $V$ with $U \cap W = \{0\}$, the sum $U + W$ is called direct, and we denote it by $U \oplus W$.
2.25. Remark. It is easy to see that if $U, W$ are subspaces of $V$, then $U + W$ (or $U \oplus W$) is a subspace of $V$; check this for yourself!

Slide 76: Sums of Vector Spaces
2.26. Examples.
(i) Let $U, W \subset \mathbb{R}^2$ be given by
$$U = \operatorname{span}\Bigl\{\binom{1}{0}\Bigr\},\qquad W = \operatorname{span}\Bigl\{\binom{2}{1}\Bigr\}.$$
Then every $x \in \mathbb{R}^2$ has a representation in the form $x = u + w$, where $u \in U$ and $w \in W$. (Why? See also Example 2.5.) Therefore, $\mathbb{R}^2 = U + W$. Furthermore, $U \cap W = \{0\}$, since $\binom{1}{0}$ and $\binom{2}{1}$ are independent (see Lemma 2.6). Hence, we can write
$$\mathbb{R}^2 = U \oplus W.$$

Slide 77: Sums of Vector Spaces
(ii) Let $U, W \subset \mathbb{R}^3$ be given by
$$U = \operatorname{span}\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}\right\},\qquad W = \operatorname{span}\left\{\begin{pmatrix}0\\0\\1\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}\right\}.$$
Then $U + W = \mathbb{R}^3$, but the sum is not direct, because $(0, 1, 0) \in U \cap W$.
(iii) We could write $V = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}$ as
$$V = \operatorname{span}\left\{\begin{pmatrix}1\\1\\0\end{pmatrix}\right\} \oplus \operatorname{span}\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}\right\}.$$

Slide 78: Sums of Vector Spaces
2.27. Lemma. The sum $U + W$ of vector spaces $U, W$ is direct if and only if every $x \in U + W$, $x \ne 0$, has a unique representation $x = u + w$, $u \in U$, $w \in W$.
Proof. (⇒) We show the contraposition: if the representation is not unique for all $x \in U + W$, then the sum is not direct. Let $x = u + w = u' + w'$ with $u, u' \in U$, $w, w' \in W$ and $(u, w) \ne (u', w')$. Then $u - u' = w' - w \ne 0$, so $u - u' \in U$ and $u - u' \in W$. Thus $U \cap W \ne \{0\}$.

Slide 79: Sums of Vector Spaces
Proof (continued). (⇐) We again show the contraposition: if the sum is not direct, then there exists some $x \in U + W$ with a non-unique representation. This is obvious, because if $0 \ne x \in U \cap W$, then we may write
$$x = \underbrace{x}_{\in U} + \underbrace{0}_{\in W} = \underbrace{\tfrac{1}{2}x}_{\in U} + \underbrace{\tfrac{1}{2}x}_{\in W},$$
so this $x$ has more than one representation. ∎

Slide 80: Sums of Vector Spaces
2.28. Theorem.
Let $V$ be a vector space and $U, W \subset V$ finite-dimensional subspaces of $V$. Then
$$\dim(U + W) + \dim(U \cap W) = \dim U + \dim W.$$
The proof will be discussed in recitation class.

Slides 81-82: 3. Inner Product Spaces

Slide 83: Inner Product Spaces
3.1. Definition. Let $V$ be a real or complex vector space. Then a map $\langle\,\cdot\,,\,\cdot\,\rangle : V \times V \to \mathbb{F}$ is called a scalar product or inner product if for all $u, v, w \in V$ and all $\lambda \in \mathbb{F}$:
(i) $\langle v, v \rangle \ge 0$, and $\langle v, v \rangle = 0$ if and only if $v = 0$,
(ii) $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$,
(iii) $\langle u, \lambda v \rangle = \lambda \langle u, v \rangle$,
(iv) $\langle u, v \rangle = \overline{\langle v, u \rangle}$.
The pair $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ is called an inner product space.
3.2. Remark. Properties (iii) and (iv) imply that
$$\langle \lambda u, v \rangle = \overline{\langle v, \lambda u \rangle} = \overline{\lambda \langle v, u \rangle} = \overline{\lambda}\,\langle u, v \rangle.$$
We say that the inner product is linear in the second component and anti-linear in the first component.

Slide 84: The Induced Norm
3.3. Examples.
▶ In $\mathbb{R}^n$ we define the canonical or standard scalar product
$$\langle x, y \rangle := \sum_{i=1}^n x_i y_i,\qquad x, y \in \mathbb{R}^n. \tag{3.1}$$
▶ In $\mathbb{C}^n$ we can define the inner product
$$\langle x, y \rangle := \sum_{i=1}^n \overline{x_i}\, y_i,\qquad x, y \in \mathbb{C}^n.$$
▶ In $C([a,b])$, the space of complex-valued, continuous functions on the interval $[a,b]$, we can define an inner product by
$$\langle f, g \rangle := \int_a^b \overline{f(x)}\, g(x)\,dx,\qquad f, g \in C([a,b]).$$

Slide 85: The Induced Norm
3.4. Definition. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space. The map
$$\|\cdot\| : V \to \mathbb{R},\qquad \|v\| = \sqrt{\langle v, v \rangle},$$
is called the induced norm on $V$.
3.5. Examples.
▶ The induced norm on $\mathbb{R}^n$ and $\mathbb{C}^n$ is given by
$$\|x\| = \sqrt{\langle x, x \rangle} = \Bigl(\sum_{i=1}^n |x_i|^2\Bigr)^{1/2} = \|x\|_2,$$
which is the usual euclidean norm.
▶ The induced norm on $C([a,b])$ is
$$\|f\| = \sqrt{\langle f, f \rangle} = \Bigl(\int_a^b |f(x)|^2\,dx\Bigr)^{1/2} = \|f\|_2, \tag{3.2}$$
which is just the 2-norm.

Slide 86: The Induced Norm
3.6. Cauchy-Schwarz Inequality. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space. Then
$$|\langle u, v \rangle| \le \|u\| \cdot \|v\| \qquad\text{for all } u, v \in V,$$
where $\|\cdot\|$ is the induced norm.
Proof. The inequality is clear for $v = 0$, so assume $v \ne 0$ and let $e := v / \|v\|$. Then $\langle e, e \rangle = \langle v, v \rangle / \|v\|^2 = 1$ and
$$0 \le \|u - \langle e, u \rangle e\|^2 = \langle u - \langle e, u \rangle e,\ u - \langle e, u \rangle e \rangle = \|u\|^2 - |\langle e, u \rangle|^2.$$
It follows that
$$|\langle u, v \rangle|^2 = \|v\|^2 \cdot |\langle u, e \rangle|^2 \le \|u\|^2 \cdot \|v\|^2. \ ∎$$

Slide 87: The Induced Norm
3.7. Corollary. The induced norm is actually a norm, i.e., it satisfies
(i) $\|v\| \ge 0$, and $\|v\| = 0 \Leftrightarrow v = 0$,
(ii) $\|\lambda v\| = |\lambda| \cdot \|v\|$,
(iii) $\|u + v\| \le \|u\| + \|v\|$
for all $u, v \in V$ and $\lambda \in \mathbb{F}$.
Proof. All properties except for the triangle inequality are easily checked. By the Cauchy-Schwarz inequality, we have
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2\operatorname{Re}\langle u, v \rangle \le \|u\|^2 + \|v\|^2 + 2|\langle u, v \rangle| \le \|u\|^2 + \|v\|^2 + 2\|u\|\|v\| = (\|u\| + \|v\|)^2. \ ∎$$

Slide 88: Angle Between Vectors
3.8. Remark. Every inner product space is also a normed vector space and, by extension, a metric space.
3.9. Definition. Let $V$ be a real inner product space and $u, v \in V \setminus \{0\}$. We define the angle $\alpha(u, v) \in [0, \pi]$ between $u$ and $v$ by
$$\cos \alpha(u, v) = \frac{\langle u, v \rangle}{\|u\|\,\|v\|}. \tag{3.3}$$
This definition makes sense, since by the Cauchy-Schwarz inequality
$$\left| \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \right| = \frac{|\langle u, v \rangle|}{\|u\|\,\|v\|} \le 1.$$
In $\mathbb{R}^2$ and $\mathbb{R}^3$, the expression (3.3) of course corresponds to our geometric notion of the (cosine of the) angle between two vectors.
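As a small illustration (not from the lecture), formula (3.3) can be evaluated directly in Mathematica; the built-in VectorAngle command performs the same computation for real vectors:

u = {1, 0}; v = {1, 1};
ArcCos[u.v / (Norm[u] Norm[v])]  (* formula (3.3) *)
Pi/4
VectorAngle[u, v]  (* the built-in equivalent *)
Pi/4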
Slide 89: Angle Between Vectors
3.10. Example. For $x, y \in \mathbb{R}^2$ we have $\angle(x, y) = \alpha(x, y)$. We may assume that $\|x\| = \|y\| = 1$, and we consider the case
$$x = \begin{pmatrix}\cos\varphi_1\\\sin\varphi_1\end{pmatrix},\qquad y = \begin{pmatrix}\cos\varphi_2\\\sin\varphi_2\end{pmatrix},\qquad 0 < \varphi_1 < \varphi_2 < \pi.$$
(Cf. the section on polar coordinates in last term's lecture.) Then $\angle(x, y) = \varphi_2 - \varphi_1$ and
$$\cos\angle(x, y) = \cos(\varphi_2 - \varphi_1) = \cos\varphi_2\cos\varphi_1 + \sin\varphi_2\sin\varphi_1 = \langle x, y \rangle = \cos\alpha(x, y).$$
In a similar manner, one can prove that $\angle(x, y) = \alpha(x, y)$ for $x, y \in \mathbb{R}^3$.

Slide 90: Vectors, Norms and Inner Products
We can use a Table command to create a general vector:

X = Table[Subscript[x, i], {i, 3}]
{x1, x2, x3}

The standard inner product (3.1) is implemented by a simple dot:

Y = Table[Subscript[y, i], {i, 3}];
X.Y
x1 y1 + x2 y2 + x3 y3

The induced norm (3.2) is given by the Norm command:

Norm[X]
Sqrt[Abs[x1]^2 + Abs[x2]^2 + Abs[x3]^2]

Slide 91: Orthogonality
3.11. Definition. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space.
(i) Two vectors $u, v \in V$ are called orthogonal or perpendicular if $\langle u, v \rangle = 0$. We then write $u \perp v$.
(ii) We call
$$M^\perp := \{ v \in V : \forall_{m \in M}: \langle m, v \rangle = 0 \}$$
the orthogonal complement of a set $M \subset V$.
For short, we sometimes write $v \perp M$ instead of $v \in M^\perp$, or $v \perp m$ for all $m \in M$.
3.12. Lemma. The orthogonal complement $M^\perp$ is a subspace of $V$.
Proof. If $v_1, v_2 \in M^\perp$, then $\langle m, v_1 + v_2 \rangle = \langle m, v_1 \rangle + \langle m, v_2 \rangle = 0 + 0 = 0$ for all $m \in M$, so $v_1 + v_2 \in M^\perp$. Similarly, if $v \in M^\perp$ and $\lambda \in \mathbb{F}$, then $\langle m, \lambda v \rangle = \lambda \langle m, v \rangle = 0$, so $\lambda v \in M^\perp$. Thus $M^\perp$ is a subspace of $V$. ∎

Slide 92: Orthogonality
3.13. Pythagoras's Theorem. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space and $M$ some subset of $V$. Let $z = x + y$, where $x \in M$ and $y \in M^\perp$. Then
$$\|z\|^2 = \|x\|^2 + \|y\|^2.$$
Proof. We see directly that
$$\|z\|^2 = \langle z, z \rangle = \langle x + y, x + y \rangle = \langle x, x \rangle + \underbrace{\langle x, y \rangle}_{=0} + \underbrace{\langle y, x \rangle}_{=0} + \langle y, y \rangle = \|x\|^2 + \|y\|^2. \ ∎$$

Slide 93: Orthonormal Systems
3.14. Definition. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space. A tuple of vectors $(v_1, \dots, v_r) \in V^r$ is called a (finite) orthonormal system if
$$\langle v_j, v_k \rangle = \delta_{jk} := \begin{cases} 1 & \text{for } j = k, \\ 0 & \text{for } j \ne k, \end{cases} \qquad j, k = 1, \dots, r,$$
i.e., if $\|v_k\| = 1$ and $v_j \perp v_k$ for $j \ne k$.
3.15. Example. The standard basis vectors in $\mathbb{R}^3$,
$$e_1 = \begin{pmatrix}1\\0\\0\end{pmatrix},\qquad e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix},\qquad e_3 = \begin{pmatrix}0\\0\\1\end{pmatrix},$$
form an orthonormal system $(e_1, e_2, e_3)$ with respect to the standard scalar product.

Slide 94: Orthonormal Systems
3.16. Lemma. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space and $F = (v_1, \dots, v_r)$ an orthonormal system in $V$. Then the elements of $F$ are linearly independent.
Proof. We want to prove that for any $\lambda_1, \dots, \lambda_r \in \mathbb{F}$,
$$\sum_{i=1}^r \lambda_i v_i = 0 \tag{3.4}$$
implies $\lambda_1 = \dots = \lambda_r = 0$. We take the scalar product of (3.4) with $v_1$:
$$0 = \langle v_1, 0 \rangle = \langle v_1, \lambda_1 v_1 + \dots + \lambda_r v_r \rangle = \lambda_1 \underbrace{\langle v_1, v_1 \rangle}_{=1} + \lambda_2 \underbrace{\langle v_1, v_2 \rangle}_{=0} + \dots + \lambda_r \underbrace{\langle v_1, v_r \rangle}_{=0} = \lambda_1,$$
so (3.4) implies $\lambda_1 = 0$. Similarly, we obtain $\lambda_2 = \dots = \lambda_r = 0$. ∎

Slide 95: Orthonormal Bases
3.17. Definition. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be a finite-dimensional inner product space and $B = (e_1, \dots, e_n)$ a basis of $V$. If $B$ is also an orthonormal system, we say that $B$ is an orthonormal basis (ONB).
3.18. Theorem. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be a finite-dimensional inner product space and $B = (e_1, \dots, e_n)$ an orthonormal basis of $V$. Then every $v \in V$ has the basis representation
$$v = \sum_{j=1}^n \langle e_j, v \rangle e_j.$$
3.19. Definition. The numbers $\langle e_j, v \rangle$ are called the Fourier coefficients of $v$ with respect to the basis $B$. The vector $\pi_{e_i} v := \langle e_i, v \rangle e_i$ is called the projection of $v$ onto $e_i$.

Slide 96: Orthonormal Bases
Proof of Theorem 3.18. Since $B$ is a basis, for every $v \in V$ there exist coefficients $\lambda_1, \dots, \lambda_n \in \mathbb{F}$ such that
$$v = \sum_{j=1}^n \lambda_j e_j.$$
Now for any $k = 1, \dots, n$ we have
$$\langle e_k, v \rangle = \sum_{j=1}^n \lambda_j \langle e_k, e_j \rangle = \sum_{j=1}^n \lambda_j \delta_{kj} = \lambda_k,$$
so it follows that
$$v = \sum_{j=1}^n \lambda_j e_j = \sum_{j=1}^n \langle e_j, v \rangle e_j. \ ∎$$

Slide 97: Orthonormal Bases
The following result, which follows directly from Theorem 3.18, generalizes Pythagoras's Theorem 3.13:
3.20. Parseval's Theorem. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be a finite-dimensional inner product space and $B = \{e_1, \dots, e_n\}$ an orthonormal basis of $V$. Then
$$\|v\|^2 = \sum_{i=1}^n |\langle e_i, v \rangle|^2$$
for any $v \in V$.
We have generalized the concepts of angle and orthogonality to vector spaces and thereby obtained Pythagoras's Theorem and now Parseval's Theorem. For understanding the geometry of vector spaces (and thereby extending the "elementary" geometry of $\mathbb{R}^3$), the projection of a vector onto subspaces is of fundamental importance. The following theorem develops this concept a little further.

Slide 98: The Projection Theorem
3.21. Projection Theorem. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be a (possibly infinite-dimensional) inner product space and $(e_1, \dots, e_r)$, $r \in \mathbb{N}$, an orthonormal system in $V$. Denote $U := \operatorname{span}\{e_1, \dots, e_r\}$. Then every $v \in V$ has a unique representation
$$v = u + w,\qquad \text{where } u \in U \text{ and } w \in U^\perp,$$
with $u = \sum_{i=1}^r \langle e_i, v \rangle e_i$ and $w := v - u$.
3.22. Definition. The vector
$$\pi_U v := \sum_{i=1}^r \langle e_i, v \rangle e_i$$
is called the orthogonal projection of $v$ onto $U$.
The projection theorem essentially states that $\pi_U v$ always exists and is independent of the choice of the orthonormal system (it depends only on the span $U$ of the system).

Slide 99: The Projection Theorem
3.23. Example. Consider the subspace $U = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}$ of $\mathbb{R}^3$. An orthonormal basis of $U$ is given by $B = \{e_1, e_2\}$, where
$$e_1 = \begin{pmatrix}1\\0\\0\end{pmatrix},\qquad e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix}.$$
Then the projection of a vector $y = (y_1, y_2, y_3)$ onto $U$ is given by
$$\pi_U y = \langle e_1, y \rangle e_1 + \langle e_2, y \rangle e_2 = y_1 \begin{pmatrix}1\\0\\0\end{pmatrix} + y_2 \begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}y_1\\y_2\\0\end{pmatrix}.$$

Slide 100: The Projection Theorem
Proof of the Projection Theorem. We first show the uniqueness of the decomposition: assume $v = u + w = u' + w'$. Then, by Pythagoras's theorem,
$$0 = \|u - u' + (w - w')\|^2 = \|u - u'\|^2 + \|w - w'\|^2,$$
so $\|u - u'\| = \|w - w'\| = 0$. Thus $u = u'$ and $w = w'$. Regarding the existence of such a decomposition, it is clear that
$$u = \sum_{i=1}^r \langle e_i, v \rangle e_i$$
lies in $U$. We need to show that $w = v - u \in U^\perp$, i.e., $u \perp v - u$.

Slide 101: The Projection Theorem
Proof of the Projection Theorem (continued). Note first that
$$\|u\|^2 = \langle u, u \rangle = \Bigl\langle \sum_{i=1}^r \langle e_i, v \rangle e_i,\ \sum_{j=1}^r \langle e_j, v \rangle e_j \Bigr\rangle = \sum_{i=1}^r \sum_{j=1}^r \overline{\langle e_i, v \rangle}\,\langle e_j, v \rangle \underbrace{\langle e_i, e_j \rangle}_{=\delta_{ij}} = \sum_{i=1}^r |\langle e_i, v \rangle|^2.$$
It then follows that
$$\langle v - u, u \rangle = \langle v, u \rangle - \|u\|^2 = \Bigl\langle v, \sum_{i=1}^r \langle e_i, v \rangle e_i \Bigr\rangle - \|u\|^2 = \sum_{i=1}^r \langle e_i, v \rangle \overline{\langle e_i, v \rangle} - \sum_{i=1}^r |\langle e_i, v \rangle|^2 = 0. \ ∎$$

Slide 102: Orthogonal Subspaces
An immediate consequence of the Projection Theorem is the following:
3.24. Corollary. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be a (possibly infinite-dimensional) inner product space and let $U \subset V$ be a finite-dimensional subspace. Then
$$V = U \oplus U^\perp.$$
If $V$ is finite-dimensional, then
$$\dim V = \dim U + \dim U^\perp.$$
This follows directly from the Projection Theorem together with Lemma 2.27 and Theorem 2.28.
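Definition 3.22 translates almost directly into code. The following sketch (projectOnto is our own helper name, not a built-in command; real vectors are assumed, so no complex conjugation is needed) reproduces Example 3.23 symbolically:

projectOnto[es_List, v_] := Total[(#.v) # & /@ es]  (* sum of <e_i, v> e_i over the orthonormal system *)
projectOnto[{{1, 0, 0}, {0, 1, 0}}, {y1, y2, y3}]
{y1, y2, 0}  (* as in Example 3.23 *)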
Slide 103: Bessel's Inequality
As a consequence of the Projection Theorem 3.21 and Pythagoras's Theorem 3.13, we obtain the following important result:
3.25. Bessel's Inequality. Let $(V, \langle\,\cdot\,,\,\cdot\,\rangle)$ be an inner product space and $(e_1, \dots, e_n)$ an orthonormal system in $V$. Then, for any $v \in V$ and any $r \le n$,
$$\sum_{k=1}^r |\langle e_k, v \rangle|^2 \le \|v\|^2. \tag{3.5}$$
Proof. Write $v = u + w$ with $u = \sum_{i=1}^r \langle e_i, v \rangle e_i$ and $w = v - u$, as in the Projection Theorem. By Pythagoras's Theorem 3.13 we then have $\|v - u\|^2 + \|u\|^2 = \|v\|^2$, so
$$0 \le \|v - u\|^2 = \|v\|^2 - \|u\|^2 = \|v\|^2 - \sum_{i=1}^r |\langle e_i, v \rangle|^2. \ ∎$$

Slide 104: Best Approximation
Now suppose that we want to approximate an element $v \in V$ by a linear combination of the first $r$ elements of an orthonormal system,
$$v \approx \sum_{i=1}^r \lambda_i e_i,\qquad \lambda_1, \dots, \lambda_r \in \mathbb{F}. \tag{3.6}$$
The question is how to choose the coefficients $\lambda_1, \dots, \lambda_r$ to make the approximation "as good as possible". We note that
$$\Bigl\|v - \sum_{i=1}^r \lambda_i e_i\Bigr\|^2 = \|v\|^2 + \sum_{i=1}^r |\lambda_i|^2 - \sum_{i=1}^r \overline{\lambda_i}\langle e_i, v \rangle - \sum_{i=1}^r \lambda_i \overline{\langle e_i, v \rangle} = \|v\|^2 + \sum_{i=1}^r \bigl|\langle e_i, v \rangle - \lambda_i\bigr|^2 - \sum_{i=1}^r |\langle e_i, v \rangle|^2. \tag{3.7}$$
It is clear that (3.7) is minimal if $\lambda_i = \langle e_i, v \rangle$, i.e., if the coefficients in (3.6) are just the Fourier coefficients.

Slide 105: Best Approximation
From (3.7) we can also see that
$$\Bigl\|v - \sum_{i=1}^{r'} \langle e_i, v \rangle e_i\Bigr\| \le \Bigl\|v - \sum_{i=1}^{r} \langle e_i, v \rangle e_i\Bigr\| \qquad\text{for any } r' \ge r, \tag{3.8}$$
so the approximation can only improve when we add further elements of the orthonormal system to the approximation. Clearly, orthonormal systems and bases are extremely useful. We next discuss how to obtain an orthonormal system from an arbitrary system of vectors.

Slide 106: Gram-Schmidt Orthonormalization
Assume that we have a system of vectors (perhaps a basis) $(v_1, \dots, v_n)$ in an inner product space $V$. We wish to construct from it an orthonormal system $(w_1, \dots, w_n)$. We start with $v_1$ and normalize it, defining
$$w_1 := \frac{v_1}{\|v_1\|}.$$
Next, we want to obtain from $v_2$ a vector $w_2$ such that $w_1 \perp w_2$. By Theorem 3.21, $v_2$ has a unique representation as a sum $v_2 = x + y$, where $x \in \operatorname{span}\{w_1\}$ and $y \in (\operatorname{span}\{w_1\})^\perp$. Now $x = \langle w_1, v_2 \rangle w_1$, so
$$y = v_2 - \langle w_1, v_2 \rangle w_1 \in (\operatorname{span}\{w_1\})^\perp.$$
(Of course, $y$ is independent of and even orthogonal to $w_1$.) It just remains to normalize $y$, and we define
$$w_2 := \frac{v_2 - \langle w_1, v_2 \rangle w_1}{\|v_2 - \langle w_1, v_2 \rangle w_1\|}.$$

Slide 107: Gram-Schmidt Orthonormalization
Now we can write
$$v_3 = \langle w_1, v_3 \rangle w_1 + \langle w_2, v_3 \rangle w_2 + y,$$
where $y \in (\operatorname{span}\{w_1, w_2\})^\perp$. Thus
$$w_3 := \frac{v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1}{\|v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1\|}$$
will be normalized and orthogonal to $w_1$ and $w_2$. Proceeding in this way, we set
$$w_1 := \frac{v_1}{\|v_1\|},\qquad w_k := \frac{v_k - \sum_{j=1}^{k-1} \langle w_j, v_k \rangle w_j}{\bigl\|v_k - \sum_{j=1}^{k-1} \langle w_j, v_k \rangle w_j\bigr\|},\quad k = 2, \dots, n,$$
and hence obtain an orthonormal system as desired.
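The recursion above can be transcribed into Mathematica almost verbatim. This is a sketch of our own (gramSchmidt is not a built-in name; the built-in equivalent, Orthogonalize, appears on Slide 111), assuming real, independent input vectors:

gramSchmidt[vs_List] := Module[{ws = {}, y},
  Do[
    y = v - Total[(#.v) # & /@ ws];  (* subtract the projections onto w_1, ..., w_(k-1) *)
    AppendTo[ws, y/Norm[y]],         (* normalize and append w_k *)
    {v, vs}];
  ws]
gramSchmidt[{{1, 0, 1}, {1, 1, 1}, {1, 2, 0}}]
{{1/Sqrt[2], 0, 1/Sqrt[2]}, {0, 1, 0}, {1/Sqrt[2], 0, -1/Sqrt[2]}}

The output matches Example 3.26, which is worked by hand below.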
Slide 108: Gram-Schmidt Orthonormalization
3.26. Example. Suppose we are given
$$v_1 = \begin{pmatrix}1\\0\\1\end{pmatrix},\qquad v_2 = \begin{pmatrix}1\\1\\1\end{pmatrix},\qquad v_3 = \begin{pmatrix}1\\2\\0\end{pmatrix}.$$
Then $\|v_1\| = \sqrt{2}$, so
$$w_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix}.$$
Next, we calculate the projection of $v_2$ onto $w_1$ and subtract it from $v_2$:
$$v_2 - \langle w_1, v_2 \rangle w_1 = \begin{pmatrix}1\\1\\1\end{pmatrix} - \Bigl(\frac{1}{\sqrt{2}} \cdot 1 + \frac{1}{\sqrt{2}} \cdot 1\Bigr)\frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}.$$

Slide 109: Gram-Schmidt Orthonormalization
Since the norm of this vector is already one, we have
$$w_2 = \begin{pmatrix}0\\1\\0\end{pmatrix}.$$
Next, we calculate
$$v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1 = \begin{pmatrix}1\\2\\0\end{pmatrix} - 2\begin{pmatrix}0\\1\\0\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\0\\1\end{pmatrix} = \begin{pmatrix}1/2\\0\\-1/2\end{pmatrix}.$$
Normalizing, we obtain
$$w_3 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\-1\end{pmatrix}.$$

Slide 110: Projections and Gram-Schmidt
We can use the Normalize command to create a normalized vector:

v1 = {1, 0, 1};
w1 = Normalize[v1]
{1/Sqrt[2], 0, 1/Sqrt[2]}

The projection of v2 onto w1 can be calculated through the Projection command:

v2 = {1, 1, 1};
w2 = Normalize[v2 - Projection[v2, w1]]
{0, 1, 0}

Note that addition of vectors and multiplication with numbers work naturally.

Slide 111: Projections and Gram-Schmidt
Mathematica has a built-in command for the Gram-Schmidt procedure, Orthogonalize:

v3 = {1, 2, 0};
Orthogonalize[{v1, v2, v3}]
{{1/Sqrt[2], 0, 1/Sqrt[2]}, {0, 1, 0}, {1/Sqrt[2], 0, -1/Sqrt[2]}}

Slides 112-113: 4. Linear Maps

Slide 114: Linear Maps on Vector Spaces
In calculus, physics, and engineering applications, a fundamental role is played by functions between vector spaces that are linear:
4.1. Definition. Let $(U, \oplus, \odot)$ and $(V, \boxplus, \boxdot)$ be vector spaces that are either both real or both complex. Then a map $L : U \to V$ is said to be linear if it is both homogeneous, i.e.,
$$L(\lambda \odot u) = \lambda \boxdot L(u), \tag{4.1a}$$
and additive, i.e.,
$$L(u \oplus u') = L(u) \boxplus L(u'), \tag{4.1b}$$
for all $u, u' \in U$ and $\lambda \in \mathbb{F}$. The set of all linear maps $L : U \to V$ is denoted by $L(U, V)$.
4.2. Remark. A linear map $L : U \to V$ satisfies $L(0) = 0$, where we use the same symbol $0$ for the zero vector in $U$ and in $V$.

Slide 115: Linear Maps on Vector Spaces
4.3. Examples.
(i) All linear maps $\mathbb{R} \to \mathbb{R}$ are of the form $x \mapsto \alpha x$ for some $\alpha \in \mathbb{R}$.
(ii) For an interval $I \subset \mathbb{R}$, the map $\frac{d}{dx} : f \mapsto f'$ is a linear map $C^1(I) \to C(I)$.
(iii) The map $(a_n) \mapsto a_0$ is a linear map from the space of all sequences to $\mathbb{C}$.
(iv) The map $(a_n) \mapsto \lim_{n\to\infty} a_n$ is a linear map from the space of all convergent sequences to $\mathbb{C}$.
(v) If $\mathbb{C}$ is regarded as a real vector space, the map $z \mapsto \overline{z}$ is linear $\mathbb{C} \to \mathbb{C}$. It is not linear if $\mathbb{C}$ is regarded as a complex vector space.
(vi) For any real or complex vector space $V$, the constant map $V \ni x \mapsto c \in \mathbb{F}$ ($\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$) is linear if and only if $c = 0$.
For linear maps, we often write simply $Lu$ instead of $L(u)$.
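Conditions (4.1a) and (4.1b) can be verified symbolically for concrete maps. A quick sketch of our own (the map chosen here is the one that reappears in Example 4.18.2 below):

L[{x1_, x2_}] := {2 x2, -x1}
Simplify[L[a {x1, x2} + {y1, y2}] == a L[{x1, x2}] + L[{y1, y2}]]
True  (* additivity and homogeneity verified in a single check *)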
Slide 116: Linear Maps are Structure-Preserving
A linear map $L : U \to V$ between vector spaces $(U, \oplus, \odot)$ and $(V, \boxplus, \boxdot)$ is a structure-preserving map. What does this mean? Suppose (for the moment) that $L : U \to V$ is also bijective. Consider the scalar multiplication of a vector $x \in U$ by a number $\lambda$. There are now two ways of doing this: either we calculate $\lambda \odot x$ directly, or we use the map $L$ to form $Lx \in V$, then multiply by $\lambda$, then use the inverse map $L^{-1} : V \to U$ to regain an element of $U$. Both routes give the same result, as the following commutative diagram expresses:
$$\begin{array}{ccc} U & \xrightarrow{\ L\ } & V \\ {\scriptstyle \lambda\odot}\downarrow & & \downarrow{\scriptstyle \lambda\boxdot} \\ U & \xleftarrow[\ L^{-1}\ ]{} & V \end{array} \tag{4.2}$$

Slide 117: Homomorphisms
The validity of (4.2) follows from
$$L^{-1}(\lambda \boxdot Lx) = L^{-1}\bigl(L(\lambda \odot x)\bigr) = \lambda \odot x.$$
From now on, we will use $\cdot$ instead of symbols like $\odot$ or $\boxdot$, and $+$ instead of $\oplus$ or $\boxplus$. It is up to the reader (you!) to determine which operation in which space is indicated. Since linear maps have this important property of structure preservation, they deserve an appropriately fancy name: they are also known as (vector space) homomorphisms (the Greek prefix homo means "same", while morphos means "shape"). Thus, "homomorphism" and "linear map" both denote the same thing. In fact, linear maps are so intertwined with the linear structure of vector spaces that in the finite-dimensional case it suffices to know how a linear map acts on basis vectors to determine it completely.

Slide 118: Homomorphisms and Finite-Dimensional Spaces
4.4. Theorem. Let $U, V$ be real or complex vector spaces and $(b_1, \dots, b_n)$ a basis of $U$ (in particular, it is assumed that $\dim U = n < \infty$). Then for every $n$-tuple $(v_1, \dots, v_n) \in V^n$ there exists a unique linear map $L : U \to V$ such that $Lb_k = v_k$, $k = 1, \dots, n$.
Proof. We first show the uniqueness of $L$: assume there exists a second homomorphism $M \in L(U, V)$ with $Mb_k = v_k$. For any $u \in U$ we have numbers $\lambda_1, \dots, \lambda_n$ such that $u = \sum \lambda_k b_k$. Then
$$Lu = L(\lambda_1 b_1 + \dots + \lambda_n b_n) = \sum_{k=1}^n \lambda_k L(b_k) = \sum_{k=1}^n \lambda_k v_k = \sum_{k=1}^n \lambda_k M(b_k) = M(\lambda_1 b_1 + \dots + \lambda_n b_n) = Mu.$$
Since this is true for any $u \in U$, we have $L = M$.

Slide 119: Homomorphisms and Finite-Dimensional Spaces
Proof (continued). We now prove the existence of such a linear map, i.e., given the tuple $(v_1, \dots, v_n)$, we want to show how to define $L$. We define $L$ by defining it for each $u \in U$. Every $u \in U$ has a unique basis decomposition $u = \sum \lambda_k b_k$ with numbers $\lambda_1, \dots, \lambda_n \in \mathbb{F}$. We hence define $Lu$ in the obvious way,
$$Lu := \sum_{k=1}^n \lambda_k v_k.$$
It remains to check that $L$ is linear: if $u, u' \in U$ have coordinates $(\lambda_k)_{k=1}^n$ and $(\lambda'_k)_{k=1}^n$, respectively, we have
$$L(u + u') = \sum_{k=1}^n (\lambda_k + \lambda'_k) v_k = \sum_{k=1}^n \lambda_k v_k + \sum_{k=1}^n \lambda'_k v_k = Lu + Lu'.$$
The homogeneity of $L$ can be shown similarly. ∎

Slide 120: Coordinate Map
4.5. Remarks.
(i) The identity map $\operatorname{id} : V \to V$, $\operatorname{id}(v) = v$, is linear.
(ii) The set $L(U, V)$ is again a vector space when endowed with pointwise addition and scalar multiplication.
(iii) If $L_1 \in L(U, V)$ and $L_2 \in L(V, W)$, then $L_2 \circ L_1 \in L(U, W)$. (The composition of linear maps is linear.)
4.6. Examples.
(i) If $V$ is a real or complex vector space and $(b_1, \dots, b_n)$ a basis of $V$, then the coordinate map
$$\varphi : V \to \mathbb{F}^n,\qquad v = \sum_{k=1}^n \lambda_k b_k \mapsto \begin{pmatrix}\lambda_1\\\vdots\\\lambda_n\end{pmatrix}$$
is linear (and bijective).

Slide 121: Dual Space
4.7. Examples.
(ii) Let $V$ be a real or complex vector space. Then $L(V, \mathbb{F})$ is known as the dual space of $V$ and denoted by $V^*$. The dual space of $V$ is of course itself a vector space. Let $\dim V = n < \infty$ and $B = (b_1, \dots, b_n)$ be a basis of $V$. Then for every $k = 1, \dots, n$ there exists a unique linear map
$$b_k^* : V \to \mathbb{F},\qquad b_k^*(b_j) = \delta_{jk} = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases}$$
It turns out (see exercises) that the tuple of maps $B^* = (b_1^*, \dots, b_n^*)$ is a basis of $V^* = L(V, \mathbb{F})$ (called the dual basis of $B$), and thus $\dim V^* = \dim V = n$.
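The construction in the proof of Theorem 4.4 can also be made concrete: expand $u$ in the basis, then map the coordinates onto the prescribed images. A sketch under our own choice of example data (LinearSolve is built-in; the basis and target vectors are hypothetical):

bs = {{1, 0}, {2, 1}};        (* a basis of R^2, cf. Example 2.5 *)
vs = {{1, 1, 0}, {0, 0, 1}};  (* prescribed images v1, v2 in R^3 *)
L[u_] := LinearSolve[Transpose[bs], u].vs  (* coordinates of u, then the sum of lambda_k v_k *)
L[{1, 0}] == {1, 1, 0} && L[{2, 1}] == {0, 0, 1}
True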
Linear Maps Slide 122 Range and Kernel
4.8. Definition. Let U, V be real or complex vector spaces and L ∈ L(U, V ). Then we define the range of L by

  ran L := {v ∈ V : there exists u ∈ U with v = Lu}

and the kernel of L by

  ker L := {u ∈ U : Lu = 0}.

It is easy to see that ran L ⊂ V and ker L ⊂ U are subspaces.

4.9. Remark. It is not difficult to see that L ∈ L(U, V ) is injective if and only if ker L = {0}.

Linear Maps Slide 123 Nomenclature
According to their properties, there are several fancy names for linear maps. A homomorphism L ∈ L(U, V ) is said to be
▶ an isomorphism if L is bijective;
▶ an endomorphism if U = V;
▶ an automorphism if U = V and L is bijective;
▶ an epimorphism if L is surjective;
▶ a monomorphism if L is injective.

4.10. Remark. If L is an isomorphism, then its inverse L⁻¹ is also linear and hence also an isomorphism.

Linear Maps Slide 124 Isomorphisms
4.11. Theorem. Let U, V be finite-dimensional vector spaces and L ∈ L(U, V ). Then L is an isomorphism if and only if for every basis (b1, …, bn) of U the tuple (Lb1, …, Lbn) is a basis of V.
Proof. (⇒) Assume that L is bijective. Then for y ∈ V the pre-image x = L⁻¹y is uniquely determined. Let x = Σ λk bk be the representation of x in the basis B = (b1, …, bn). Now

  y = L(Σ_{k=1}^n λk bk) = Σ_{k=1}^n λk · Lbk,

where the λk are uniquely determined by x, which is uniquely determined by y. Thus for any y we can find a representation in terms of (Lb1, …, Lbn) by considering the pre-image x = L⁻¹y.

Linear Maps Slide 125 Isomorphisms
Proof (continued). We still need to show that this representation is unique, i.e., if y = Σ μk · Lbk, then μk = λk. Applying L⁻¹, we see that

  L⁻¹y = x = Σ_{k=1}^n λk bk,   L⁻¹y = L⁻¹ Σ_{k=1}^n μk · Lbk = Σ_{k=1}^n μk bk,

and because (b1, …, bn) is a basis we see that μk = λk.
(⇐) We need to show that L is injective and surjective. Since any y ∈ V may be written as y = Σ λk · Lbk, y is obviously the image of x = Σ λk bk ∈ U. Thus L is surjective.
To see that L is injective, we show that ker L = {0} (see Remark 4.9). Now Lx = 0 for x = Σ λk bk implies Σ λk · Lbk = 0. Since (Lb1, …, Lbn) is a basis, this means that λ1 = ⋯ = λn = 0, so x = 0.

Linear Maps Slide 126 Isomorphisms
4.12. Definition. Two vector spaces U and V are called isomorphic, written U ≅ V, if there exists an isomorphism φ : U → V.

4.13. Lemma. Two finite-dimensional vector spaces U and V are isomorphic if and only if they have the same dimension:

  U ≅ V  ⇔  dim U = dim V.

Proof. (⇒) Let φ : U → V be an isomorphism and (b1, …, bn) a basis of U (dim U = n). Then (φ(b1), …, φ(bn)) is a basis of V and thus dim V = n = dim U.
(⇐) If (a1, …, an) and (b1, …, bn) are bases of U and V, respectively, define an isomorphism φ by φ(ak) = bk, k = 1, …, n.
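Remark 4.9 can also be tested computationally for matrix maps, as an aside using the NullSpace command (discussed further in Chapter 6): the kernel is trivial exactly when the map is injective.

  NullSpace[{{1, 0}, {0, 2}, {1, 1}}]   (* {}: trivial kernel, so this map R² → R³ is injective *)
  NullSpace[{{1, 2}, {2, 4}}]           (* {{-2, 1}}: non-trivial kernel, not injective *)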
Linear Maps Slide 127 The Dimension Formula
We can now prove a deep and fundamental result on linear maps:

4.14. Dimension Formula. Let U, V be real or complex vector spaces, dim U < ∞, and let L ∈ L(U, V ). Then

  dim ran L + dim ker L = dim U.   (4.3)

Proof. Let dim U =: n < ∞. Since ker L ⊂ U, we have dim ker L =: r ≤ n. We choose a basis (a1, …, ar) of the kernel and use the Basis Completion Theorem 2.21 to construct a basis (a1, …, ar, a_{r+1}, …, an) of U. Then for any x = Σ λk ak ∈ U,

  Lx = L(λ1 a1 + ⋯ + λn an) = λ_{r+1} La_{r+1} + ⋯ + λn Lan =: λ_{r+1} b1 + ⋯ + λn b_{n−r}.

Thus ran L = span{b1, …, b_{n−r}}.

Linear Maps Slide 128 The Dimension Formula
Proof (continued). We now claim that the vectors b1, …, b_{n−r} are independent; in that case they form a basis of ran L and dim ran L = n − r, proving (4.3). Consider the equality

  0 = μ1 b1 + ⋯ + μ_{n−r} b_{n−r} = L(μ1 a_{r+1} + ⋯ + μ_{n−r} an).   (4.4)

If (4.4) holds, then μ1 a_{r+1} + ⋯ + μ_{n−r} an ∈ ker L = span{a1, …, ar}. Thus, there exist λ1, …, λr such that

  μ1 a_{r+1} + ⋯ + μ_{n−r} an − (λ1 a1 + ⋯ + λr ar) = 0.

Since (a1, …, an) is a basis of U, we thence obtain

  μ1 = ⋯ = μ_{n−r} = 0,   λ1 = ⋯ = λr = 0.   (4.5)

Thus (4.4) implies (4.5) and b1, …, b_{n−r} are independent.

Linear Maps Slide 129 The Dimension Formula
4.15. Corollary. Let U, V be real or complex finite-dimensional vector spaces with dim U = dim V. Then a linear map L ∈ L(U, V ) is injective if and only if it is surjective.
Proof.

  L injective ⇔ ker L = {0} ⇔ dim ker L = 0 ⇔ dim ran L = dim U = dim V ⇔ ran L = V ⇔ L surjective.

Linear Maps Slide 130 Normed Vector Spaces and Bounded Linear Maps
4.16. Definition. Let (U, ∥·∥U) and (V, ∥·∥V) be normed vector spaces. Then a linear map L : U → V is said to be bounded if there exists some constant c > 0 (called a bound for L) such that

  ∥Lu∥V ≤ c · ∥u∥U   for all u ∈ U.   (4.6)

4.17. Remark. It can be shown that if U is a finite-dimensional vector space, then any linear map is bounded.

4.18. Examples. 1. The map Lα : R → R, x ↦ αx, is bounded with c = |α|.

Linear Maps Slide 131 Bounded Linear Maps
2. The map

  L : R² → R²,   (x1, x2)^T ↦ (2x2, −x1)^T,

is linear and bounded. If we take ∥x∥2 = √(x1² + x2²), we can see that c = 2 is a bound for L.
3. Take the space C¹([0, 1]) of the continuously differentiable functions on the interval [0, 1] and imbue it with the norm given by ∥f∥∞ = sup_{x∈[0,1]} |f(x)|. Then the map

  d/dx : C¹([0, 1]) → C([0, 1]),   f ↦ f′,

is not bounded. To see this, consider the functions f(x) = e^{−nx} for n ∈ N. Clearly, ∥f∥∞ = 1 but ∥f′∥∞ = n. Since we can choose n as large as we like, there can exist no c > 0 such that ∥f′∥∞ ≤ c · ∥f∥∞.

Linear Maps Slide 132 The Operator Norm
By (4.6), for every bounded linear map there exists an upper bound c > 0 such that

  ∥Lu∥V / ∥u∥U ≤ c   for u ≠ 0.

We are now interested in the least upper bound c.

4.19. Definition and Theorem. Let U, V be normed vector spaces. Then the set of bounded linear maps L(U, V ) is also a vector space and

  ∥L∥ := sup_{u∈U, u≠0} ∥Lu∥V / ∥u∥U = sup_{u∈U, ∥u∥U = 1} ∥Lu∥V   (4.7)

defines a norm, the so-called operator norm or induced norm on L(U, V ). The proof of the norm properties is left to the reader. The operator norm also has the additional, very useful, property that

  ∥L2 L1∥ ≤ ∥L2∥ · ∥L1∥,   L1 ∈ L(U, V ), L2 ∈ L(V, W ).
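For a matrix, Mathematica's Norm command returns precisely the operator norm (4.7) induced by the Euclidean vector norm (the largest singular value). For the map of Example 4.18.2:

  L = {{0, 2}, {-1, 0}};   (* (x1, x2) ↦ (2 x2, -x1) *)
  Norm[L]
  2

so c = 2 is in fact the least bound for L.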
Matrices Slide 133
5. Matrices

Matrices Slide 135 A Calculus of Linear Maps
We have seen in Lemma 4.13 that two vector spaces are isomorphic if and only if their dimensions are equal. In particular:
▶ Every real n-dimensional vector space is isomorphic to R^n.
▶ Every complex n-dimensional vector space is isomorphic to C^n ≅ R^{2n}.
This means that if we can find a calculus for linear maps R^n → R^m, we can automatically treat maps from an n-dimensional space U to an m-dimensional space V:

  U --L--> V
  φ1 ↓      ↓ φ2
  R^n --A--> R^m

Here L ∈ L(U, V ), φ1, φ2 are isomorphisms and A ∈ L(R^n, R^m). If L = φ2⁻¹ ∘ A ∘ φ1, we obtain all relevant properties of L (range, kernel) by analyzing A.

Matrices Slide 136 A Calculus of Linear Maps
The word calculus means a "scheme of calculating" that transforms a procedure that otherwise needs to be performed individually into an algorithm that can be (easily) applied in general. For example, the revolutionary aspect of Newton/Leibniz's calculus was the fact that areas under curves, which earlier had been calculated by hand for each individual type of curve, could suddenly easily be computed through inverse differentiation (by finding the primitive of a function). This was exemplified by the fundamental theorem of calculus. As an example, compare Exercises 4 and 19 of Chapter 14 of Spivak's book with the simplicity of applying the Fundamental Theorem of Calculus.
In the following, we will establish an analogous calculus for linear maps, where we are able, due to Lemma 4.13, to concentrate on those in L(R^n, R^m).

Matrices Slide 137 Matrices
5.1. Definition. An m × n matrix over the complex numbers is a map

  a : {1, …, m} × {1, …, n} → C,   (i, j) ↦ aij.

We represent the graph of a through

  A := ( a11 a12 ⋯ a1n )
       ( a21 a22 ⋯ a2n )
       (  ⋮   ⋮  ⋱  ⋮  )
       ( am1 am2 ⋯ amn )   = (aij)_{1≤i≤m, 1≤j≤n}.

We denote the set of all m × n matrices over C by Mat(m × n; C).

5.2. Remarks. 1. With the usual pointwise addition and scalar multiplication of maps, Mat(m × n; C) becomes a complex vector space.
2. Matrices over R instead of C are defined in the same way. Occasionally, we may also replace C by a real or complex vector space.

Matrices Slide 138 Matrices as Linear Maps
Matrices turn out to be important tools in the analysis of linear maps: every linear map between finite-dimensional vector spaces may be expressed as a matrix, and every matrix corresponds (in a certain way) to some such linear map. We first restrict ourselves to the case of R^n.

5.3. Theorem. Each matrix A ∈ Mat(m × n; R) uniquely determines a linear map j(A) ∈ L(R^n, R^m) such that the columns a·k are the images of the standard basis vectors ek ∈ R^n; in particular, j : Mat(m × n; R) → L(R^n, R^m) is an isomorphism, Mat(m × n; R) ≅ L(R^n, R^m), so every map L ∈ L(R^n, R^m) corresponds to a matrix j⁻¹(L) whose columns a·k are the images of the standard basis vectors ek ∈ R^n.

Matrices Slide 139 Matrices as Linear Maps
Proof. Given a matrix A with columns a·k, k = 1, …, n, we simply define j(A) by

  j(A) : R^n → R^m,   ek ↦ a·k,   k = 1, …, n.

Given a map L ∈ L(R^n, R^m) we define j⁻¹(L) ∈ Mat(m × n; R) by

  j⁻¹(L) = (a·1, …, a·n),   a·k = L(ek),   k = 1, …, n.

Obviously, j⁻¹ is actually the inverse of j; hence j is bijective. It remains to show that j is linear. Let A = (aik), B = (bik). Then

  j(A + B) ek = (a + b)·k = a·k + b·k = j(A) ek + j(B) ek,

so j is additive. The homogeneity can be shown analogously.

Matrices Slide 140 Matrices as Linear Maps
We have thus established that every matrix A = (aik) represents a linear map j(A). In particular,

  j(A) ek = (a1k, …, amk)^T = Σ_{i=1}^m aik ei,   k = 1, …, n.

We also note that we can represent x ∈ R^n as

  x = (x1, …, xn)^T = x1 (1, 0, …, 0)^T + ⋯ + xn (0, …, 0, 1)^T = Σ_{k=1}^n xk ek.

Matrices Slide 141 Matrices as Linear Maps
Then j(A) ∈ L(R^n, R^m) acts on a general x ∈ R^n as follows:

  j(A) x = j(A) (Σ_{k=1}^n xk ek) = Σ_{k=1}^n xk j(A) ek = Σ_{k=1}^n xk (a1k, …, amk)^T
         = (x1 a11 + ⋯ + xn a1n, …, x1 am1 + ⋯ + xn amn)^T.
Matrices Slide 142 Matrices as Linear Maps
From a practical point of view, we start with A ∈ Mat(m × n; R) and some x ∈ R^n and obtain

  (x1 a11 + ⋯ + xn a1n, …, x1 am1 + ⋯ + xn amn)^T ∈ R^m.

It seems unnecessary to include the map j : Mat(m × n; R) → L(R^n, R^m) in this. In fact, we might directly interpret the matrix A as a linear map without mentioning j! The isomorphism j can be simply left out; mathematicians routinely consider sets of objects that are isomorphic as being actually identical. In this way, A has a double meaning: it is on the one hand a matrix, and on the other hand a linear map. This avoids always mentioning a superfluous isomorphism and greatly simplifies the formulation of statements.

Matrices Slide 143 Matrices as Linear Maps
We therefore write Ax instead of j(A)x; in particular,

  Ax = ( a11 ⋯ a1n ) (x1)   ( x1 a11 + ⋯ + xn a1n )
       (  ⋮  ⋱  ⋮  ) ( ⋮ ) = (         ⋮          )   (5.1)
       ( am1 ⋯ amn ) (xn)   ( x1 am1 + ⋯ + xn amn )

We can interpret (5.1) as the action of a matrix A ∈ Mat(m × n; R) on a vector x ∈ R^n, yielding a vector Ax ∈ R^m. This is the beginning of our calculus of linear maps. We now need to develop this further to deal with (e.g.) compositions and inverses of linear maps.

Matrices Slide 144 Compositions
Let R^n --j(A)--> R^m --j(B)--> R^l be linear maps and consider their composition j(B) ∘ j(A). We want to find a matrix C such that j(B) ∘ j(A) = j(C). Now

  j(B) ∘ j(A) ek = j(B) Σ_{s=1}^m ask es = Σ_{s=1}^m ask j(B) es = Σ_{s=1}^m ask Σ_{t=1}^l bts et = Σ_{t=1}^l ( Σ_{s=1}^m bts ask ) et,

where we set ctk := Σ_{s=1}^m bts ask and C = (ctk) ∈ Mat(l × n; R). We thus introduce C as the matrix product of B and A.

Matrices Slide 145 Matrix Product
5.4. Definition. Let A ∈ Mat(l × m; C) and B ∈ Mat(m × n; C). Then we define the product of A = (aik) and B = (bkj) by

  AB ∈ Mat(l × n; C),   AB := ( Σ_{k=1}^m aik bkj )_{i=1,…,l; j=1,…,n}.

We have seen that the matrix product satisfies j(A) ∘ j(B) = j(AB). Furthermore, the product is associative, i.e.,

  A(BC) = j⁻¹(j(A) ∘ j(BC)) = j⁻¹(j(A) ∘ (j(B) ∘ j(C))) = j⁻¹((j(A) ∘ j(B)) ∘ j(C)) = j⁻¹(j(AB) ∘ j(C)) = (AB)C.

If A, B ∈ Mat(n × n; C), both products AB and BA exist; however, in general AB ≠ BA, so the matrix product is not commutative.

Matrices Slide 146 Matrix Product
The matrix product is easily memorized through "row-by-column multiplication," as seen in the following examples:

5.5. Examples.
1. A = (1 2; 3 4), B = (5 6 7; 1 0 2),

  AB = (1·5+2·1  1·6+2·0  1·7+2·2; 3·5+4·1  3·6+4·0  3·7+4·2) = (7 6 11; 19 18 29).

2. A = (1 1; 2 2), B = (2 1; 1 0), AB = (3 1; 6 2), BA = (4 4; 1 1).
3. A = (1 0; 0 1), B = (2 1; 1 0), AB = (2 1; 1 0) = BA.
4. A = (0 α; 0 0), A² = AA = (0 0; 0 0) for all α ∈ C.

Matrices Slide 147 Matrix Transpose
For A = (aij) ∈ Mat(m × n; F) we define the transpose of A by

  A^T ∈ Mat(n × m; F),   A^T = (aji).

For example,

  (5 6 7; 1 0 2)^T = (5 1; 6 0; 7 2).

We also define the adjoint

  A* ∈ Mat(n × m; F),   A* = (A̅)^T = (a̅ji),

where in addition to the transpose the complex conjugate of each entry is taken. It is easy to see (in the assignments) that for A ∈ Mat(m × n; F), x ∈ F^m, y ∈ F^n,

  ⟨x, Ay⟩ = ⟨A* x, y⟩.

Matrices Slide 148 Matrices
In Mathematica, a matrix is defined as follows:

  A = Table[Subscript[a, i, j], {i, 4}, {j, 3}]
  {{a1,1, a1,2, a1,3}, {a2,1, a2,2, a2,3}, {a3,1, a3,2, a3,3}, {a4,1, a4,2, a4,3}}

The MatrixForm command can be used for nicer formatting.
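We can check the adjoint identity ⟨x, Ay⟩ = ⟨A* x, y⟩ numerically; ConjugateTranspose gives A*, and the complex test data below are arbitrary.

  A = {{1 + I, 2}, {0, 3 - I}}; x = {1, I}; y = {2, 1};
  Conjugate[x].(A.y) == Conjugate[ConjugateTranspose[A].x].y
  True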
Matrices Slide 149 Matrix Multiplication
Matrix multiplication works using the same dot as for the inner product:

  A = {{1, 1}, {2, 2}};
  B = {{2, 1}, {1, 0}};
  MatrixForm[A.B]
  (3 1; 6 2)

The Transpose command gives the transpose.

Matrices Slide 150 Matrix Multiplication
There are two very useful facts to keep in mind:
(i) When a vector x ∈ R^n is multiplied by a matrix A ∈ Mat(m × n), the result is a linear combination of the column vectors of A. For illustration, in the case of n = 3 and m = 2, we have

  (a11 a12 a13; a21 a22 a23) (x1, x2, x3)^T = (a11 x1 + a12 x2 + a13 x3, a21 x1 + a22 x2 + a23 x3)^T
    = x1 (a11, a21)^T + x2 (a12, a22)^T + x3 (a13, a23)^T.

(ii) When a matrix B is multiplied by a matrix A, the result is a matrix whose columns are the products of the columns of B multiplied with A. Again, for illustration, we give a simple example.

Matrices Slide 151 Matrix Multiplication
Write

  A = (a11 a12; a21 a22),   B = (b11 b12 b13; b21 b22 b23) = (b1, b2, b3),

where b1 = (b11, b21)^T, b2 = (b12, b22)^T, b3 = (b13, b23)^T. Then

  AB = ( a11 b11 + a12 b21   a11 b12 + a12 b22   a11 b13 + a12 b23 )
       ( a21 b11 + a22 b21   a21 b12 + a22 b22   a21 b13 + a22 b23 )  = (Ab1, Ab2, Ab3).

Matrices Slide 152 Matrix of a Linear Map
We are now able to properly define the matrix of a linear map between two finite-dimensional vector spaces. Let U, V be finite-dimensional real or complex vector spaces with bases

  A = (a1, …, an) ⊂ U   and   B = (b1, …, bm) ⊂ V.

Define the isomorphisms

  φA : U → R^n,   φA(aj) = ej,   j = 1, …, n;
  φB : V → R^m,   φB(bj) = ej,   j = 1, …, m.

Then any linear map L ∈ L(U, V ) induces a matrix A = Φ_A^B(L) ∈ Mat(m × n; R) through

  U --L--> V
  φA ↓      ↓ φB        Φ_A^B(L) = A = φB ∘ L ∘ φA⁻¹
  R^n --A--> R^m

Matrices Slide 153 Matrix of Complex Conjugation
5.6. Example. Consider C as a real two-dimensional vector space with basis B = (1, i). The complex conjugation L : C → C, z ↦ z̄, is then a linear map. We want to determine the matrix of this map with respect to the basis B. The isomorphism is

  φB : C → R²,   1 ↦ (1, 0)^T,   i ↦ (0, 1)^T.

Thus φB(a + bi) = (a, b)^T. The most convenient way to determine A = Φ_B^B(L) is to calculate

  φB(a + bi) = (a, b)^T,   φB(L(a + bi)) = φB(a − bi) = (a, −b)^T,

and then find A ∈ Mat(2 × 2; R) such that A (a, b)^T = (a, −b)^T. It is easily seen that

  A = Φ_B^B(L) = (1 0; 0 −1).

Matrices Slide 154 Matrix of Complex Conjugation
5.7. Example. If we change the basis we used in the previous example, we get a different matrix. Let us take the basis A = (1 + i, 1 − i) for C. Then the isomorphism is

  φA : C → R²,   1 + i ↦ (1, 0)^T,   1 − i ↦ (0, 1)^T.

Thus

  φA(a + bi) = φA( ½(a + b)(1 + i) + ½(a − b)(1 − i) ) = ((a + b)/2, (a − b)/2)^T.

Hence we need to find A ∈ Mat(2 × 2; R) such that A ((a+b)/2, (a−b)/2)^T = ((a−b)/2, (a+b)/2)^T, i.e.,

  A = Φ_A^A(L) = (0 1; 1 0).

Matrices Slide 155 Systems of Equations
Before we proceed, we take a step back to the beginning of the course. Recall that a system of linear equations was given by

  a11 x1 + ⋯ + a1n xn = b1
  ⋮                                 (5.2)
  am1 x1 + ⋯ + amn xn = bm

We can express (5.2) using vectors and matrices by writing Ax = b, where

  A = (a11 … a1n; ⋮ ⋱ ⋮; am1 … amn),   x = (x1, …, xn)^T,   b = (b1, …, bm)^T.
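As an illustrative aside, a system Ax = b can be solved directly with the LinearSolve command; the data below are an arbitrary example.

  A = {{2, 3}, {2, 1}}; b = {5, 3};
  x = LinearSolve[A, b]
  {1, 1}
  A.x == b
  True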
Matrices Slide 156 Elementary Matrix Manipulations
The Gauß-Jordan algorithm introduced elementary row manipulations, which we now reformulate in the context of matrices:

5.8. Elementary Matrix Manipulations. An elementary row manipulation of a matrix is one of the following:
(i) Swapping (interchanging) of two rows,
(ii) Multiplication of a row with a non-zero number,
(iii) Addition of a multiple of one row to another row.
The additions and multiplications are performed componentwise in each row. If the word "row" is replaced by "column", these operations are termed elementary column operations.
The Gauß-Jordan algorithm uses only row operations. We seek to find matrices that implement these row manipulations through multiplication of Ax = b from the left.

Matrices Slide 157 Elementary Matrix Manipulations
For illustration, we consider the case n = 4, m = 3. Consider the most trivial operation possible: we do nothing. This would be represented by multiplying with the unit matrix,

  (1 0 0; 0 1 0; 0 0 1) (b1, b2, b3)^T = (b1, b2, b3)^T

and

  (1 0 0; 0 1 0; 0 0 1) (a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34) = (a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34).

Note that we do not need to mention x at all in these equations! This is the true underlying philosophy of the notational scheme used in the Gauß-Jordan algorithm.

Matrices Slide 158 Elementary Matrix Manipulations
Now how would we swap the first and second row? We can see that, with S12 := (0 1 0; 1 0 0; 0 0 1),

  S12 (b1, b2, b3)^T = (b2, b1, b3)^T

and

  S12 (a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34) = (a21 a22 a23 a24; a11 a12 a13 a14; a31 a32 a33 a34).

Note that we have swapped the first and second row of the unit matrix to obtain S12!

Matrices Slide 159 Elementary Matrix Manipulations
Furthermore, in order to add 3 times the second row to the third row, we would use

  (1 0 0; 0 1 0; 0 3 1) (b1, b2, b3)^T = (b1, b2, b3 + 3b2)^T

and

  (1 0 0; 0 1 0; 0 3 1) (a11 a12 a13 a14; a21 a22 a23 a24; a31 a32 a33 a34)
    = (a11 a12 a13 a14; a21 a22 a23 a24; a31+3a21 a32+3a22 a33+3a23 a34+3a24).

Again we have performed the elementary row operation on the unit matrix to obtain the matrix that implements this operation.

Matrices Slide 160 Elementary Matrix Manipulations
In conclusion, we remark that
(i) An elementary row operation on a system of equations may be simply considered as a multiplication of Ax = b from the left with a suitable matrix, a so-called elementary matrix.
(ii) An elementary matrix is obtained by applying the desired elementary row operation to the unit matrix. [Why must this be the case?]
(iii) If we apply two elementary operations, the product of their respective matrices gives the matrix corresponding to these two operations, in order.
This means that in solving a system Ax = b, the composition of all row operations in forward elimination and backward substitution may be represented by a single matrix S ∈ Mat(m × m; R). We thus have

  SAx = Sb.

If m = n, the system Ax = b may have a unique solution; in that case, SA is a diagonal matrix of the form (1.3), i.e., SA = id.

Matrices Slide 161 Inverse of a Matrix
Let us now return to the question of finding the inverse of a linear map A : R^n → R^n (of course, we must assume that A is an isomorphism, so m = n). Of course, we say that a matrix is invertible if the corresponding linear map is invertible, and the inverse of a matrix is just the matrix of the inverse map. However, it may be useful to clarify this.

5.9. Definition. A matrix A ∈ Mat(n × n; R) is called invertible if there exists some B ∈ Mat(n × n; R) such that

  AB = BA = id,   (5.3)

where id denotes the unit matrix, with entries 1 on the diagonal and 0 elsewhere.
We then write B = A⁻¹ and say that A⁻¹ is the inverse of A.

5.10. Remark. The inverse is of course unique; if B and B̃ both satisfy (5.3) for some A, then

  B̃ = B̃(AB) = (B̃A)B = B.

Matrices Slide 162 Inverse of a Matrix
5.11. Remark. It is obvious that the matrix S corresponding to a series of elementary row manipulations will be invertible, because the operations themselves are invertible. Thus the matrix S : R^m → R^m represents an isomorphism.

5.12. Remark. Given a matrix A ∈ Mat(n × n; R) (identified with a linear map L ∈ L(R^n, R^n)) and a putative inverse matrix B = A⁻¹ ∈ Mat(n × n; R), it is sufficient to verify that

  BA = id.

In this case, B corresponds to a linear map M such that M ∘ L is the identity map. Thus dim ran M = n, so M is bijective. Hence M is invertible and L = M⁻¹ is bijective. Then L ∘ M = M ∘ L = id, so we have AB = BA = id.

Matrices Slide 163 Inverse of a Matrix
5.13. Lemma. Let A ∈ L(R^n, R^n). Then A is invertible if and only if there exists an elementary matrix S corresponding to elementary row operations that transform A into the unit matrix, SA = id.
Proof. (⇒) If A is bijective, for every y ∈ R^n there exists a unique solution x to Ax = y. Thus there exists a matrix S corresponding to row operations such that

  SAx = x = Sy.

For every x there exists a unique y such that y = Ax. Thus SAx = x for every x ∈ R^n, and so SA = id.
(⇐) By Remark 5.12, SA = AS = id, and by Remark 5.10, S = A⁻¹, so A is invertible.

Matrices Slide 164 Finding the Inverse
Lemma 5.13 tells us how to actually find the inverse of a matrix A: it is simply the elementary matrix that transforms A into the unit matrix. If this transformation is not possible, A is not invertible.

5.14. Example. Consider the matrix

  A = (2 3; 2 1).

In order to find the inverse, we transform A into the unit matrix through a sequence of elementary row operations S, keeping track of the elementary matrix that implements these operations.

Matrices Slide 165 Finding the Inverse
We track the pair (SA | S), starting from (A | id):

  ( 2  3   | 1 0 )      divide the first row by 2, then add (−2) times it to the second row
  ( 2  1   | 0 1 )

  ( 1  3/2 | 1/2  0 )   divide the second row by −2
  ( 0  −2  | −1   1 )

  ( 1  3/2 | 1/2  0  )  add (−3/2) times the second row to the first
  ( 0  1   | 1/2 −1/2 )

  ( 1  0   | −1/4  3/4 )
  ( 0  1   | 1/2  −1/2 )

so A⁻¹ = (−1/4 3/4; 1/2 −1/2). We may immediately check that

  A⁻¹A = (−1/4 3/4; 1/2 −1/2)(2 3; 2 1) = (1 0; 0 1),   AA⁻¹ = (2 3; 2 1)(−1/4 3/4; 1/2 −1/2) = (1 0; 0 1).

Matrices Slide 166 Matrix Inverse
Mathematica has a command for finding the inverse:

  MatrixForm[Inverse[{{2, 3}, {2, 1}}]]
  (−1/4 3/4; 1/2 −1/2)
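Lemma 5.13 can also be carried out mechanically: row-reducing the augmented block (A | id) yields (id | A⁻¹). Here is a sketch of the idea using RowReduce; Join with level 2 glues the identity columns onto A.

  A = {{2, 3}, {2, 1}};
  RowReduce[Join[A, IdentityMatrix[2], 2]]
  {{1, 0, -1/4, 3/4}, {0, 1, 1/2, -1/2}}

The right-hand 2 × 2 block is exactly the inverse found above.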
Matrices Slide 167 Inverse Maps
5.15. Remark. We note that if A, B ∈ Mat(n × n; R) are invertible, then so is their product AB ∈ Mat(n × n; R), and (AB)⁻¹ = B⁻¹A⁻¹.
We can use this procedure to find the inverse of any vector space isomorphism L:

  U --L--> V
  φA ↓      ↓ φB        L⁻¹ = φA⁻¹ ∘ A⁻¹ ∘ φB
  R^n --A--> R^m

5.16. Example. Let P2 be the space of polynomials of degree not more than 2. Consider the linear map

  L : P2 → P2,   ax² + bx + c ↦ ((a + b + c)/3) x² + ((a + b)/2) x + (a − c)/2.

Matrices Slide 168 Inverse Maps
We choose a basis (any will do) of P2: B = (x², x, 1). Then

  φB(ax² + bx + c) = (a, b, c)^T,   φB(L(ax² + bx + c)) = ((a+b+c)/3, (a+b)/2, (a−c)/2)^T.

We can read off that

  A = ( 1/3 1/3  1/3 )
      ( 1/2 1/2   0  )
      ( 1/2  0  −1/2 )

and (with a little bit of work) calculate

  A⁻¹ = (  3 −2  2 )
        ( −3  4 −2 )
        (  3 −2  0 )

Matrices Slide 169 Inverse Maps
Now we are able to calculate the inverse of L:

  L⁻¹(ax² + bx + c) = φB⁻¹ ∘ A⁻¹ ∘ φB (ax² + bx + c)
    = φB⁻¹ ( A⁻¹ (a, b, c)^T )
    = φB⁻¹ (3a − 2b + 2c, −3a + 4b − 2c, 3a − 2b)^T
    = (3a − 2b + 2c) x² + (−3a + 4b − 2c) x + 3a − 2b.

Matrices Slide 170 Changes of Basis
Suppose that in R^n we are given an initial basis B = {e1, …, en}, e.g., the standard basis. A vector x ∈ R^n has the representation

  x = Σ_{i=1}^n xi ei.

We now wish to represent x in terms of a new basis, B′ = {e1′, …, en′}. Let us suppose that T is the linear map such that T ei = ei′, i = 1, …, n. Then T is uniquely defined and invertible. If (e1, …, en) is the standard basis, then T may be represented as

  T = (e1′, …, en′).

Matrices Slide 171 Changes of Basis
5.17. Example. Consider a rotation by 45° in the clockwise direction,

  T : R² → R²,   (1, 0)^T ↦ (1/√2)(1, −1)^T,   (0, 1)^T ↦ (1/√2)(1, 1)^T,

so

  T = (1/√2) (1 1; −1 1).

[Figure: the vector x with coordinates x1·e1, x2·e2 in the standard basis and x1·e1′, x2·e2′ in the rotated basis.]

Matrices Slide 172 Changes of Basis
Suppose that

  x = Σ_{i=1}^n xi ei = Σ_{i=1}^n xi′ ei′ = Σ_{i=1}^n xi′ T ei.

Then

  T⁻¹ x = Σ_{i=1}^n xi′ ei,

and we can find the coordinates xi′, i = 1, …, n, of x with respect to B′ simply by applying T⁻¹.
[Figure: x with its coordinates in the basis (e1′, e2′), and T⁻¹x with the same coordinates read off in the standard basis.]

Matrices Slide 173 Active and Passive Points of View
We can therefore implement the passive change of basis T for x by the active action of T⁻¹ on x. In the passive point of view, we are not doing anything: x stays exactly the same; it is simply re-written in terms of another basis. In the active point of view, something is happening: we are applying a linear map to x. Both points of view are equally valid. The active point of view sometimes appears easier, while the passive point of view is often more elegant.

Matrices Slide 174 Reflection in R²
5.18. Example. Consider the reflection of vectors in R² by the x1 axis, i.e., the map

  A : R² → R²,   A (x1, x2)^T = (x1, −x2)^T.

The matrix representation of A is simply

  A = (1 0; 0 −1).   (5.4)

Now we want to consider the reflection in R² about the line through the vector y = (1, 2)^T. Of course, the correct matrix for this reflection can be found geometrically. However, here we want to illustrate how a change of basis can help us determine this matrix algebraically.

Matrices Slide 175 A Suitable Basis for the Reflection
Denote by L the reflection about the line through y = e1′ = (1, 2)^T. A vector perpendicular to y is e2′ = (−2, 1)^T, and so we can choose A = (e1′, e2′) as a basis. These vectors have the property that

  Le1′ = e1′,   Le2′ = −e2′,

and so in this basis the action of L is known and simple. Now e1′ = T e1 and e2′ = T e2, where

  T = (1 −2; 2 1),   and we note that   T⁻¹ = (1/5) (1 2; −2 1).

The strategy for calculating the action of the reflection L is now as follows:
1. Change to the basis (e1′, e2′);
2. Execute the reflection in this basis. It is given by the matrix A of (5.4);
3. Change back to the basis (e1, e2).

Matrices Slide 176 Active Point of View
The basis change can be implemented actively by applying T⁻¹; we then apply A, and then change back (actively) by applying T to x:

  L = T A T⁻¹ = (1 −2; 2 1)(1 0; 0 −1) · (1/5)(1 2; −2 1) = (1/5)(−3 4; 4 3).
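The computation of L = T A T⁻¹ can be reproduced in Mathematica, together with a check that e1′ is fixed by the reflection:

  T = {{1, -2}, {2, 1}}; A = {{1, 0}, {0, -1}};
  L = T.A.Inverse[T]
  {{-3/5, 4/5}, {4/5, 3/5}}
  L.{1, 2} == {1, 2}
  True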
It is easily verified that Le1′ = e1′ and Le2′ = −e2′, as expected.

Matrices Slide 177 Passive Point of View
In the passive point of view we regard L : V → V, where V = R² is imbued with the basis B′ = {e1′, e2′}. We then find the representing matrix A for L with respect to this basis:

  R² --L--> R²
  φB′ ↓      ↓ φB′
  R² --A--> R²

Of course, φB′ = T⁻¹ maps (e1′, e2′) into (e1, e2). Then it is easy to see that

  A = (1 0; 0 −1)

and therefore L = T A T⁻¹ as above.

Theory of Systems of Linear Equations Slide 178
6. Theory of Systems of Linear Equations

Theory of Systems of Linear Equations Slide 180 The Solution Set of Systems of Equations
We briefly return to the theory of solvability of linear systems of equations Ax = b. We define the solution set

  Sol(A, b) = {x ∈ R^n : Ax = b}.

If x0 ∈ R^n satisfies Ax0 = b, we say that x0 is a particular solution of Ax = b. The associated homogeneous solution set is

  Sol(A, 0) = {x ∈ R^n : Ax = 0} = ker A.

A very important, fundamental result states: The solution set of Ax = b is the sum of the homogeneous solution set and a particular solution.

Theory of Systems of Linear Equations Slide 181 Structure of the Solution Set
6.1. Lemma. Let x0 ∈ R^n be a particular solution of Ax = b. Then

  Sol(A, b) = {x0} + ker A = {y ∈ R^n : y = x0 + x, x ∈ ker A},

where the sum of sets is understood as in Definition 2.24.
Proof. (i) Sol(A, b) ⊃ {x0} + ker A: let x ∈ ker A. Then

  A(x0 + x) = Ax0 + Ax = Ax0 = b,

so x0 + x ∈ Sol(A, b).
(ii) Sol(A, b) ⊂ {x0} + ker A: let v ∈ Sol(A, b). Then

  A(v − x0) = Av − Ax0 = b − b = 0,

so v − x0 ∈ ker A, implying v ∈ {x0} + ker A.

Theory of Systems of Linear Equations Slide 182 Solvability of Systems of Equations
The following results follow immediately:

6.2. Corollary. If x0 is a solution of Ax = b and {v1, …, vr} a basis of ker A, then

  Sol(A, b) = {x ∈ R^n : x = x0 + λ1 v1 + ⋯ + λr vr, λ1, …, λr ∈ R}.

Here r = dim ker A.

6.3. Corollary. Suppose that the linear system of equations Ax = b has a solution. Then the solution is unique if and only if ker A = {0}.

Theory of Systems of Linear Equations Slide 183 Solvability of Systems of Equations
This gives rise to a further, fundamentally important result:

6.4. Fredholm Alternative. Let A be an n × n matrix. Then
▶ either Ax = b has a unique solution for any b ∈ R^n
▶ or Ax = 0 has a non-trivial solution.
Proof. Either ker A = {0} (in which case Ax = b has the solution x = A⁻¹b for any b ∈ R^n) or any x0 ∈ ker A, x0 ≠ 0, is a non-trivial solution of Ax = 0.
The Fredholm alternative occurs in many more complicated contexts. This is the most basic case.

Theory of Systems of Linear Equations Slide 184 Matrix Rank
6.5. Definition. Let A ∈ Mat(m × n; F) be a matrix with columns a·j ∈ F^m, 1 ≤ j ≤ n, and rows ai· ∈ F^n, 1 ≤ i ≤ m. Then we define
▶ the column rank of A to be column rank A := dim span{a·1, …, a·n}
▶ and the row rank of A to be row rank A := dim span{a1·, …, am·}.

6.6. Remarks.
▶ The column rank is the greatest number of independent column vectors a·j that can be selected from all columns. This is analogously true for the row rank.
▶ column rank A = row rank A^T.
▶ column rank A = dim ran A.
Theory of Systems of Linear Equations Slide 185 Matrix Rank
6.7. Definition and Theorem. Let A ∈ Mat(m × n; F). Then the column rank is equal to the row rank, and we define the rank of A by

  rank A := column rank A = row rank A.

Proof. In the assignments it will be shown that

  ran A* = (ker A)^⊥.

Then, using Corollary 3.24 and the dimension formula (4.3),

  row rank A = column rank A^T = column rank A* = dim ran A* = dim (ker A)^⊥ = n − dim ker A = dim ran A = column rank A.

Here we have used that complex conjugation is a linear, bijective map C → C if C is regarded as a real vector space, so that conjugating all entries of A^T does not change the dimension of the span of its columns.

Theory of Systems of Linear Equations Slide 186 Existence of Solutions
The fundamental theorem on the existence of solutions to a linear system of equations is the following:

6.8. Theorem. There exists a solution x for Ax = b if and only if rank A = rank(A | b), where

  (A | b) = ( a11 … a1n b1; ⋮ ⋱ ⋮ ⋮; am1 … amn bm ) ∈ Mat(m × (n + 1); F).

Theory of Systems of Linear Equations Slide 187 Solvability of Systems of Equations
Proof. We write A = (a·1, …, a·n), where the a·k ∈ R^m are the column vectors of A. Then we use that the range of a matrix is the span of its column vectors and the rank is the dimension of the range, so

  Ax = b has a solution x ∈ R^n
  ⇔ b ∈ ran A
  ⇔ b ∈ span{a·1, …, a·n}
  ⇔ b is linearly dependent on a·1, …, a·n
  ⇔ dim span{a·1, …, a·n} = dim span{a·1, …, a·n, b}
  ⇔ dim ran A = dim ran(A | b)
  ⇔ rank A = rank(A | b).

Theory of Systems of Linear Equations Slide 188 Manipulating Matrices
A matrix is just a list of lists. The command Append is used to add elements to a list:

  Append[{a, b, c, d}, x]
  {a, b, c, d, x}

We want to use this to check rank A = rank(A | b). Define a matrix A and a vector b as follows:

  A = Table[Subscript[a, i, j], {i, 2}, {j, 3}];
  B = Table[Subscript[b, i], {i, 2}];
  Print["A = ", MatrixForm[A], ", b = ", MatrixForm[B]]
  A = (a1,1 a1,2 a1,3; a2,1 a2,2 a2,3), b = (b1, b2)^T

Theory of Systems of Linear Equations Slide 189 Manipulating Matrices
Since a matrix is a list of row vectors, it is easy to add a row:

  Append[A, {x, y, z}] // MatrixForm
  (a1,1 a1,2 a1,3; a2,1 a2,2 a2,3; x y z)

To add a column, we could transpose, add a row, and transpose again:

  Transpose[Append[Transpose[A], B]] // MatrixForm
  (a1,1 a1,2 a1,3 b1; a2,1 a2,2 a2,3 b2)

However, the repeated transposition is inefficient and may cost significant computing resources for large matrices.

Theory of Systems of Linear Equations Slide 190 Manipulating Matrices
There exists a specialized command to achieve the same result without transposition:

  MapThread[Append, {A, B}] // MatrixForm
  (a1,1 a1,2 a1,3 b1; a2,1 a2,2 a2,3 b2)

The rank of a matrix is found through the MatrixRank command.

Theory of Systems of Linear Equations Slide 191 Manipulating Matrices
6.9. Example.

  A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
  MatrixRank[A]
  2
  b = {3, 4, 5};
  MatrixRank[MapThread[Append, {A, b}]]
  2
  b = {3, 4, 6};
  MatrixRank[MapThread[Append, {A, b}]]
  3

Theory of Systems of Linear Equations Slide 192 Manipulating Matrices
The kernel of a matrix is obtained from the NullSpace command:

  NullSpace[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]
  {{1, -2, 1}}

The output is a list of basis vectors of the kernel of the matrix.
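We can also illustrate the structure of the solution set (Lemma 6.1) with these commands: a particular solution from LinearSolve plus any kernel element again solves the system. The data are those of Example 6.9.

  A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}; b = {3, 4, 5};
  x0 = LinearSolve[A, b];    (* one particular solution *)
  v = First[NullSpace[A]];   (* basis vector of ker A *)
  A.(x0 + 7 v) == b          (* True for any multiple of v *)
  True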
Determinants Slide 193
7. Determinants

Determinants Slide 195 Parallelograms
We will motivate determinants geometrically (as areas of parallelograms/volumes of parallelepipeds) rather than algebraically (via solutions of systems of linear equations). Consider a parallelogram P(a, b) spanned by two non-collinear vectors a, b ∈ R².
[Figure: the parallelogram spanned by a and b, with the perpendicular vector a⊥ and the angle θ between a⊥ and b.]
We are interested in the area A(a, b) of the parallelogram, which is equal to the area of the rectangle with width |a| and height given by |b||cos θ|. Let a = (a1, a2), a⊥ = (−a2, a1). Then a ⊥ a⊥, i.e., ⟨a, a⊥⟩ = 0 and |a⊥| = |a|. From (3.3) it follows that

  |b| cos θ = ⟨a⊥/|a⊥|, b⟩.

Determinants Slide 196 The Determinant in R²
We obtain

  A(a, b) = |a| |⟨a⊥/|a⊥|, b⟩| = (|a|/|a⊥|) |⟨a⊥, b⟩| = |⟨a⊥, b⟩| = |a1 b2 − a2 b1|.

We remark that

  A(a, b) = |⟨a⊥, b⟩| = |a||b| sin ∠(a, b).

We define the determinant as a map

  det : R² × R² → R,   det(a, b) = det( (a1, a2)^T, (b1, b2)^T ) = a1 b2 − a2 b1,   (7.1)

so that A(a, b) = |det(a, b)|. The determinant is an oriented area. Equivalently, the determinant may be regarded as a map

  det : Mat(2 × 2; R) → R,   det (a1 b1; a2 b2) = a1 b2 − a2 b1.   (7.2)

Determinants Slide 197 Properties of the Determinant
Both interpretations of the determinant will be used frequently.

7.1. Remark. We note the following properties of the determinant:
1. det is normed, i.e.,

  det(e1, e2) = det (1 0; 0 1) = 1.

2. det is bilinear:

  det(λa, b) = λ det(a, b) = det(a, λb),
  det(a + b, c) = det(a, c) + det(b, c),
  det(a, b + c) = det(a, b) + det(a, c).

This can be easily seen geometrically by considering the volumes of the parallelograms.

Determinants Slide 198 Properties of the Determinant
3. det is alternating, i.e., det(a, a) = a1 a2 − a2 a1 = 0. Note that this implies that det(a, b) = −det(b, a), since (using the bilinearity)

  0 = det(a + b, b + a) = det(a, a) + det(b, b) + det(a, b) + det(b, a) = det(a, b) + det(b, a).

(In the case of two variables, an alternating map is often called antisymmetric.)

Determinants Slide 199 Vector Product in R³
We now introduce the "vector product" a × b of two vectors a, b ∈ R³. The vector a × b ∈ R³ is determined by
1. its length: we set |a × b| = A(a, b), the area of the parallelogram spanned by a and b (if a and b are linearly dependent, we set a × b = 0);
2. its direction: we want a × b to be orthogonal to a and b, in other words, a × b ⊥ span{a, b};
3. its orientation: (a, b, a × b) should form a "right-hand system" (defined using the thumb, index finger and middle finger of the right hand).
This is sufficient to define a unique vector a × b for a, b ∈ R³, i.e., we have a map × : R³ × R³ → R³.

7.2. Remark. In contradistinction to the scalar product, which can be defined on R^n for n = 1, 2, …, the vector product is only defined on R³.

Determinants Slide 200 The Right Hand Rule
[Figure: the right-hand rule. Image: Rechte Hand Regel [modified]. Wikimedia Commons. Wikimedia Foundation. Web. 9 May 2012.]

Determinants Slide 201 Properties of the Vector Product in R³
Note that the vector product is
1. bilinear: the homogeneity (λa) × b = λ(a × b) = a × (λb) follows from the definition of the cross product; the additivity a × (b + c) = a × b + a × c, (a + b) × c = a × c + b × c is easy to see geometrically when a, b, c are coplanar and slightly more difficult to show when they are not.
2. antisymmetric: a × a = 0, or a × b = −b × a.
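In Mathematica, the vector product is provided by the Cross command; as a brief check of the properties just listed, with arbitrary test vectors:

  Cross[{1, 0, 0}, {0, 1, 0}]     (* {0, 0, 1}, i.e. e1 × e2 = e3 *)
  a = {1, 2, 3}; b = {4, 5, 6};
  Cross[a, b] == -Cross[b, a]     (* True: antisymmetry *)
  {a.Cross[a, b], b.Cross[a, b]}  (* {0, 0}: a × b is orthogonal to a and b *)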
We can compute the vector product of the standard basis vectors with each other:

  e1 × e2 = e3 = −e2 × e1,
  e2 × e3 = e1 = −e3 × e2,
  e3 × e1 = e2 = −e1 × e3,      (7.3)
  e1 × e1 = e2 × e2 = e3 × e3 = 0.

Determinants Slide 202 Calculating the Vector Product in R³
Using the bilinearity and (7.3), we can now calculate a × b for arbitrary a, b ∈ R³:

  a × b = (a1 e1 + a2 e2 + a3 e3) × (b1 e1 + b2 e2 + b3 e3) = Σ_{i,j=1}^3 ai bj (ei × ej)
        = (a2 b3 − a3 b2) e1 + (a3 b1 − a1 b3) e2 + (a1 b2 − a2 b1) e3
        = ( +det (a2 b2; a3 b3), −det (a1 b1; a3 b3), +det (a1 b1; a2 b2) )^T.   (7.4)

Determinants Slide 203 Parallelepipeds
We now consider the problem of finding the volume of a parallelepiped spanned by three vectors a, b, c ∈ R³. The volume is given by the base area (the area of the parallelogram spanned by a, b) multiplied with the height, |c||cos θ|. Using the fact that a × b is orthogonal to a and b, we have

  |c| cos θ = ⟨(a × b)/|a × b|, c⟩,

so the volume is given by

  V(a, b, c) = |a × b| · |⟨(a × b)/|a × b|, c⟩| = |⟨a × b, c⟩|.

Determinants Slide 204 The Determinant in R³
We therefore define the determinant as an oriented volume,

  det : R³ × R³ × R³ → R,   det(a, b, c) = ⟨a × b, c⟩.   (7.5)

(Again, we note that it can be equivalently defined Mat(3 × 3; R) → R.) Note that

  det(a, b, c) > 0 if (a, b, c) form a right-hand system,   (7.6)
  det(a, b, c) < 0 if (a, b, c) form a left-hand system,   (7.7)
  det(a, b, c) = 0 if a = λb or a = λc or b = λc for any λ ∈ R.   (7.8)

The last property follows from the properties of the vector and scalar products: if a = λb then a × b = 0, and since a × b is orthogonal to both a and b, the scalar product ⟨a × b, c⟩ will vanish if a = λc or b = λc.

Determinants Slide 205 Cyclic Permutations
Let (x1, …, xn) be an ordered list of elements. Define a relation ≺ by

  x1 ≺ x2 ≺ x3 ≺ ⋯ ≺ x_{n−1} ≺ xn ≺ x1

("x1 precedes x2 precedes x3 etc."). Let π : {x1, …, xn} → {x1, …, xn} be a bijective map (such a map is called a permutation). Then the list (π(x1), …, π(xn)) is called a cyclic permutation of (x1, …, xn) if π(x1) ≺ π(x2) ≺ ⋯ ≺ π(xn) ≺ π(x1).
Furthermore, if (a, b, c) form a right-hand system, then V(a, b, c) = det(a, b, c). Since the volume is independent of the designation of the vectors, we observe that a cyclic permutation of (a, b, c) preserves the right-handedness and

  det(a, b, c) = det(c, a, b) = det(b, c, a),   or   ⟨a × b, c⟩ = ⟨c × a, b⟩ = ⟨b × c, a⟩.

Determinants Slide 206 Calculating Determinants in R³
Note that by (7.4),

  det(a, b, c) = ⟨b × c, a⟩ = Σ_{i=1}^3 ai (b × c)i
    = a1 det (b2 c2; b3 c3) − a2 det (b1 c1; b3 c3) + a3 det (b1 c1; b2 c2)
    = det (a1 b1 c1; a2 b2 c2; a3 b3 c3).   (7.9)

We may therefore calculate a 3 × 3 determinant det A by calculating 2 × 2 subdeterminants. Denoting by A_{kj} the 2 × 2 matrix obtained from A by deleting the kth row and the jth column, (7.9) can be written as

  det A = det (a11 a12 a13; a21 a22 a23; a31 a32 a33) = Σ_{k=1}^3 (−1)^{k+1} a_{k1} det A_{k1}.   (7.10)

Determinants Slide 207 Calculating Determinants in R³
We will prove later that in fact

  det (a1 b1 c1; a2 b2 c2; a3 b3 c3) = det (a1 a2 a3; b1 b2 b3; c1 c2 c3).

This (together with (7.9)) motivates the mnemonic

  a × b = det (e1 e2 e3; a1 a2 a3; b1 b2 b3),

where e1, e2, e3 are the standard unit basis vectors and a = (a1, a2, a3), b = (b1, b2, b3).
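The relation (7.5)/(7.9) between the determinant and the vector product is easy to verify for sample vectors; Transpose is used so that a, b, c appear as columns.

  a = {1, 2, 3}; b = {4, 5, 6}; c = {1, 0, 1};
  Det[Transpose[{a, b, c}]] == Cross[a, b].c
  True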
Determinants Slide 208 Properties of the Determinant in R³
We once more note the following properties of the determinant:
1. det is normed, i.e.,

  det(e1, e2, e3) = det (1 0 0; 0 1 0; 0 0 1) = 1.

2. det is trilinear:

  det(λa, b, c) = λ det(a, b, c) = det(a, λb, c) = det(a, b, λc),
  det(a + b, c, d) = det(a, c, d) + det(b, c, d),
  det(a, b + c, d) = det(a, b, d) + det(a, c, d),
  det(a, b, c + d) = det(a, b, c) + det(a, b, d).

These properties follow from the corresponding properties of the scalar and vector products.
3. det is alternating (see (7.8)).

Determinants Slide 209 Preview of Determinants in R^n
Our goal is now to find a generalization of the determinant that has the three main properties of being
▶ multilinear,
▶ alternating and
▶ normed.
It will turn out that these three properties are actually sufficient to define the determinant uniquely; there is only one map Mat(n × n; R) → R with these properties, and for n = 2, 3 it is given by (7.2) and (7.5), respectively. In the case of n = 1, Mat(1 × 1; R) is equivalent to R and we define det(a) = a. This definition trivially has the properties of being normed and linear, while it makes no sense to define what alternating means.

Determinants Slide 210 Preview of Determinants in R^n
We will further see that one possible formula for determinants det : Mat(n × n) → R can be constructed recursively, similar to (7.10). In fact, if A ∈ Mat(n × n; R), we define A_{kj} ∈ Mat((n − 1) × (n − 1); R) as the matrix obtained from A by deleting the kth row and the jth column. Then for any j = 1, …, n we will obtain the recursion formula

  det A = Σ_{k=1}^n (−1)^{k+j} a_{kj} det A_{kj}.   (7.11)

In order to understand the extension of determinants to R^n better, we need to formalize the concept of permutations.

Determinants Slide 211 Groups
7.3. Definition. A group is a pair (G, ∘) consisting of a set G and a group operation ∘ : G × G → G such that
1. a ∘ (b ∘ c) = (a ∘ b) ∘ c for all a, b, c ∈ G (associativity),
2. there exists an element e ∈ G such that a ∘ e = e ∘ a = a for all a ∈ G (existence of a unit element),
3. for every a ∈ G there exists an element a⁻¹ ∈ G such that a ∘ a⁻¹ = a⁻¹ ∘ a = e (existence of an inverse).
A group is called commutative if in addition to the above properties
4. a ∘ b = b ∘ a for all a, b ∈ G (commutativity).

Determinants Slide 212 Groups and Permutations
7.4. Examples.
1. Any vector space (V, +, ·) may be regarded as a commutative group (V, +) with the additional operation of scalar multiplication.
2. The set of invertible matrices,

  GL(n, R) := {A ∈ Mat(n × n; R) : A is invertible},

is a group with the group operation given by matrix multiplication (composition of maps).

7.5. Definition. The set of all permutations of n elements,

  Sn = {π : {x1, …, xn} → {x1, …, xn} : π bijective},

together with the group operation "composition of maps", π1 ∘ π2 (x) = π1(π2(x)), is called the symmetric group.

Determinants Slide 213 Permutations
It is easy to check that (Sn, ∘) in fact has properties 1–3, but not property 4. We will often denote a group by G instead of (G, ∘) if no confusion arises therefrom.
A permutation of n elements is a finite map; recall that a function f is defined by pairs of the form (x, f(x)), where x is the independent variable. A permutation is defined on a set of n elements; instead of {x1, …, xn} we can also simply write {1, …, n}, replacing the permutation of elements with a permutation of indices. Then we might define a permutation π through a set of pairs {(1, π(1)), …, (n, π(n))}. In fact, we do represent permutations in this way, but use a different notation, writing

  π = ( 1    2    …   n
        π(1) π(2) … π(n) ).
Determinants Slide 214 Transpositions
For example, if n = 2, there are only two permutations π1, π2 ∈ S2:

  π1 : 1 ↦ 1, 2 ↦ 2,   π1 = (1 2; 1 2);
  π2 : 1 ↦ 2, 2 ↦ 1,   π2 = (1 2; 2 1).   (7.12)

7.6. Definition. A permutation in Sn that leaves exactly n − 2 elements invariant is called a transposition. A transposition τ ∈ Sn has the form

  τ(k) = { i if k = j;  j if k = i;  k otherwise }   (7.13)

for some i, j ∈ {1, …, n}, i ≠ j.

Determinants Slide 215 Permutations as Transpositions
7.7. Lemma. Every permutation π ∈ Sn, n ≥ 2, is a composition of transpositions, π = τ1 ∘ ⋯ ∘ τk. Note that the transpositions τj and the number k are not uniquely defined.
Proof. We proceed by induction. For n = 2 there are only two permutations π1 and π2 (see (7.12)); π2 is a transposition, and π1 = π2 ∘ π2. We now assume that any permutation in Sn can be written as a composition of transpositions and prove that this is also true for any permutation in S_{n+1}. Let π ∈ Sn. Then we can consider

  π̃ = ( 1    …   n   n+1
        π(1)  … π(n)  n+1 )   (7.14)

as an element of S_{n+1}. Also, every element π̃ ∈ S_{n+1} of the form (7.14) can be regarded as an element π ∈ Sn.

Determinants Slide 216 Permutations as Transpositions
Proof (continued). Now let σ ∈ S_{n+1} and let τ be the transposition that exchanges n + 1 and σ⁻¹(n + 1). Then

  σ ∘ τ : n + 1 ↦ σ⁻¹(n + 1) ↦ n + 1,

where the first arrow is τ and the second is σ, so that

  σ ∘ τ = ( 1    …   n   n+1
            π(1)  … π(n)  n+1 )

for some values π(1), …, π(n), π ∈ Sn. It follows that σ ∘ τ can be written as a composition of transpositions τ1 ∘ ⋯ ∘ τk,

  σ ∘ τ = τ1 ∘ ⋯ ∘ τk,

so σ = τ1 ∘ ⋯ ∘ τk ∘ τ⁻¹, which proves the assertion (note that τ⁻¹ = τ is itself a transposition).

Determinants Slide 217 Sign of a Permutation
While the number of transpositions that make up a permutation is not unique, we do have the following:

7.8. Definition and Theorem. Let π ∈ Sn be represented as a composition of k transpositions, π = τ1 ∘ ⋯ ∘ τk. Then the sign of π,

  sgn π := (−1)^k,

does not depend on the representation chosen.
In order to prove this, we need an additional concept from group theory, which we introduce on the following slide. In advance, we note that the sign is "well-behaved":

  sgn(π1 ∘ π2) = sgn π1 · sgn π2   for any π1, π2 ∈ Sn.

Determinants Slide 218 Group Actions
7.9. Definition. Let (G, ∘) be a group and X a set. Then an action (or operation) of G on X from the left is a map

  Φ : G × X → X,   (g, x) ↦ Φ(g, x) = Φg x = gx,

with the properties
1. ex = x (e ∈ G is the unit element),
2. (a ∘ b)x = a(bx) for a, b ∈ G, x ∈ X.
We say that G acts (operates) on X.

7.10. Proposition. Let X be the set of all maps f : R^n → R. Then Sn acts on X via

  (πf)(x1, …, xn) = f(x_{π(1)}, …, x_{π(n)}),   π ∈ Sn.

Determinants Slide 219 Group Actions
Proof. We need to show properties 1 and 2 of Definition 7.9. The unit element of Sn is

  πe = (1 … n; 1 … n),

so trivially πe f = f, since

  (πe f)(x1, …, xn) = f(x_{πe(1)}, …, x_{πe(n)}) = f(x1, …, xn).

Furthermore, let σ, π ∈ Sn. Then

  [σ(πf)](x1, …, xn) = (σf)(x_{π(1)}, …, x_{π(n)}) = f(x_{σ(π(1))}, …, x_{σ(π(n))}) = f(x_{(σ∘π)(1)}, …, x_{(σ∘π)(n)}) = [(σ ∘ π)f](x1, …, xn),

so σ(πf) = (σ ∘ π)f.
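As an aside, the sign of a permutation (Definition 7.8) is available in Mathematica as Signature, applied to the list (π(1), …, π(n)):

  Signature[{2, 1, 3}]   (* -1: a single transposition *)
  Signature[{2, 3, 1}]   (* 1: a composition of two transpositions *)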
Determinants Slide 220 Group Actions
7.11. Lemma. Denote by ∆ : R^n → R the function

  ∆(x1, …, xn) = Π_{i<j} (xj − xi).   (7.15)

Then τ∆ = −∆ for any transposition τ ∈ Sn.
Proof. Let r, s ∈ {1, …, n}, r < s, and τ the transposition exchanging r and s,

  τ = ( 1 … r−1  r  r+1 … s−1  s  s+1 … n
        1 … r−1  s  r+1 … s−1  r  s+1 … n ).

Determinants Slide 221 Group Actions
Proof (continued). Note that

  τ∆(x1, …, xn) = Π_{i<j} τ(xj − xi).

Then

  τ(xs − xr) = −(xs − xr).   (7.16)

All other factors in (7.15) either do not contain xr or xs (and are left unchanged by τ) or occur in one of the following pairings:
▶ j < r: (xr − xj)(xs − xj),
▶ r < j < s: (xs − xj)(xj − xr),
▶ s < j: (xj − xs)(xj − xr).
Each of these pairs is left invariant by τ, so the sign change in (7.16) is the only effect of τ on ∆.

Determinants Slide 222 Sign of a Permutation
7.12. Corollary. For every permutation π = τ1 ∘ ⋯ ∘ τk ∈ Sn,

  π∆ = (τ1 ∘ ⋯ ∘ τk)∆ = (−1)^k ∆.

In particular, sgn π = (−1)^k does not depend on the decomposition of π into transpositions and is therefore well-defined.
Proof. Let π ∈ Sn and assume that there are transpositions τ1, …, τk and τ̃1, …, τ̃l such that

  π = τ1 ∘ ⋯ ∘ τk = τ̃1 ∘ ⋯ ∘ τ̃l.

Then π∆(x1, …, xn) = (−1)^k ∆(x1, …, xn) = (−1)^l ∆(x1, …, xn). Choosing some x1, …, xn such that ∆(x1, …, xn) ≠ 0, we obtain (−1)^k = (−1)^l.

Determinants Slide 223 p-Multilinear Maps
7.13. Definition. A function f : R^n × ⋯ × R^n → R (p factors) is said to be a p-multilinear map (or p-multilinear form) if f is linear in each entry, i.e.,

  f(λa1, a2, …, ap) = λ f(a1, a2, …, ap)

and

  f(a1 + b, a2, …, ap) = f(a1, a2, …, ap) + f(b, a2, …, ap)

for b, a1, …, ap ∈ R^n and λ ∈ R, and analogous equations hold for the other entries. The form is said to be alternating if f(a1, …, ap) = 0 whenever aj = ak for any j ≠ k. An n-multilinear form is said to be normed if f(e1, …, en) = 1, where e1, …, en are the standard basis vectors in R^n.

Determinants Slide 224 Characterization of Alternating Forms
We will prove that the properties of being multilinear, alternating and normed are sufficient to uniquely define the determinant in R^n. First, however, we give a useful result:

7.14. Lemma. Let f : R^n × ⋯ × R^n → R (p factors) be a p-multilinear map. Then the following are equivalent:
(i) f is alternating;
(ii) f(a1, …, a_{j−1}, aj, a_{j+1}, …, a_{k−1}, ak, a_{k+1}, …, ap) = −f(a1, …, a_{j−1}, ak, a_{j+1}, …, a_{k−1}, aj, a_{k+1}, …, ap);
(iii) f(a1, …, ap) = 0 if a1, …, ap are linearly dependent.
The proof is not difficult and left as an exercise!

Determinants Slide 225 Determinants in R^n
We will now define the determinant as an alternating, normed, n-multilinear function for column vectors in R^n and corresponding square matrices whose columns consist of these vectors, using the notation

  aj = (a1j, …, anj)^T (j = 1, …, n),   A = (a1, …, an) = (a11 a12 … a1n; ⋮ ⋮ ⋱ ⋮; an1 an2 … ann).

7.15. Theorem. For every n ∈ N, n > 1, there exists a unique, normed, alternating n-multilinear form det : R^n × ⋯ × R^n ≅ Mat(n × n; R) → R. Furthermore,

  det(a1, …, an) = det A = Σ_{π∈Sn} sgn π · a_{π(1)1} ⋯ a_{π(n)n}.   (7.17)

Determinants Slide 226 Determinants in R^n
Proof. We will first show that the determinant defined in (7.17) in fact has the required properties.
1. (det is multilinear) Let a1, …, an, b ∈ R^n.
Then we show the additivity in the first entry (the proof for all other entries is completely analogous):

  det(a1 + b, a2, …, an) = Σ_{π∈Sn} sgn π (a_{π(1)1} + b_{π(1)}) a_{π(2)2} ⋯ a_{π(n)n}
    = Σ_{π∈Sn} sgn π a_{π(1)1} ⋯ a_{π(n)n} + Σ_{π∈Sn} sgn π b_{π(1)} a_{π(2)2} ⋯ a_{π(n)n}
    = det(a1, a2, …, an) + det(b, a2, …, an).

The homogeneity is shown analogously.

Determinants Slide 227 Determinants in R^n
Proof (continued).
2. (det is normed) Let

  ej = (δ1j, …, δnj)^T (j = 1, …, n),   δij = { 1, i = j;  0, i ≠ j }.

Then for any permutation π ∈ Sn,

  δ_{π(1)1} ⋯ δ_{π(n)n} = { 1 if π(k) = k for k = 1, …, n;  0 otherwise }.

Determinants Slide 228 Determinants in R^n
Proof (continued). 2. It follows that in the summation over the permutations only the summand with

  π = (1 2 … n−1 n; 1 2 … n−1 n),   sgn π = 1,

survives. Thus

  det(e1, …, en) = Σ_{π∈Sn} sgn π δ_{π(1)1} ⋯ δ_{π(n)n} = 1.

Determinants Slide 229 Determinants in R^n
Proof (continued).
3. (det is alternating) We will show that

  det(a1, a2, …, a_{n−1}, an) = −det(an, a2, …, a_{n−1}, a1)

(again, the proof is similar when any other entries are exchanged). Let

  τ = (1 2 … n−1 n; n 2 … n−1 1) ∈ Sn   (7.18)

be the transposition exchanging 1 and n. We will use that sgn τ = −1 and that summing over all permutations π ∈ Sn is the same as summing over all π ∘ τ ∈ Sn, when τ is fixed by (7.18).

Determinants Slide 230 Determinants in R^n
Proof (continued). 3. Then

  det(an, a2, …, a_{n−1}, a1) = Σ_{π∈Sn} sgn π a_{π(1)n} a_{π(2)2} ⋯ a_{π(n−1)(n−1)} a_{π(n)1}
    = Σ_{π∈Sn} sgn π a_{π(n)1} a_{π(2)2} ⋯ a_{π(n−1)(n−1)} a_{π(1)n}
    = −Σ_{π∈Sn} sgn(π ∘ τ) a_{π(τ(1))1} a_{π(τ(2))2} ⋯ a_{π(τ(n−1))(n−1)} a_{π(τ(n))n}
    = −Σ_{π∘τ∈Sn} sgn(π ∘ τ) a_{π(τ(1))1} a_{π(τ(2))2} ⋯ a_{π(τ(n−1))(n−1)} a_{π(τ(n))n}
    = −det(a1, a2, …, a_{n−1}, an).

Determinants Slide 231 Determinants in R^n
Proof (continued). We next show that the properties of the determinant imply the formula (7.17). By multilinearity we have

  det(a1, …, an) = det( Σ_{j1=1}^n a_{j1 1} e_{j1}, …, Σ_{jn=1}^n a_{jn n} e_{jn} ) = Σ_{j1,…,jn=1}^n a_{j1 1} ⋯ a_{jn n} det(e_{j1}, …, e_{jn}).

Since det is supposed to be alternating, all summands vanish where any jk occurs more than once. We therefore sum only over permutations of {1, …, n},

  det(a1, …, an) = Σ_{π∈Sn} a_{π(1)1} ⋯ a_{π(n)n} det(e_{π(1)}, …, e_{π(n)}).

Determinants Slide 232 Determinants in R^n
Proof (continued). Again, because det is alternating and assuming each π is composed of k transpositions,

  det(e_{π(1)}, …, e_{π(n)}) = (−1)^k det(e1, …, en) = sgn π det(e1, …, en).

Since det is normed, det(e1, …, en) = 1, so we finally have

  det(a1, …, an) = Σ_{π∈Sn} a_{π(1)1} ⋯ a_{π(n)n} sgn π.

Determinants Slide 233 Determinants and Elementary Column Operations
Since the determinant is alternating and multilinear, we see that the elementary column operations 5.8 affect the determinant as follows:
▶ The determinant of a matrix A changes sign if two columns of A are interchanged, e.g.,

  det(a2, a1, …, an) = −det(a1, a2, …, an).

▶ Multiplying all the entries in a column with a number λ leads to the determinant being multiplied by this constant:

  det(a1, …, λaj, …, an) = λ det(a1, …, aj, …, an).

▶ Adding a multiple of a column to another column does not change the value of the determinant:

  det(a1, …, aj, …, ak + λaj, …, an) = det(a1, …, aj, …, ak, …, an).
Determinants Slide 234 Determinants of Transposed Matrices
7.16. Lemma. Let A ∈ Mat(n × n; R). Then

  det A = det A^T.

Proof. We first note that for every π ∈ Sn, sgn π = sgn π⁻¹, and the sum over all π is equal to the sum over all π⁻¹. Then we can reorder the factors in each summand, so that

  det A = Σ_{π∈Sn} sgn π a_{π(1)1} ⋯ a_{π(n)n} = Σ_{π∈Sn} sgn π a_{1π⁻¹(1)} ⋯ a_{nπ⁻¹(n)}
        = Σ_{π⁻¹∈Sn} sgn π⁻¹ a_{1π⁻¹(1)} ⋯ a_{nπ⁻¹(n)} = Σ_{π∈Sn} sgn π a_{1π(1)} ⋯ a_{nπ(n)} = det A^T.

Determinants Slide 235 Determinants and Elementary Row Operations
As a corollary, we can rewrite (7.17) in a more commonly seen form:

7.17. Leibniz Formula.

  det A = Σ_{π∈Sn} sgn π a_{1π(1)} ⋯ a_{nπ(n)}.   (7.19)

7.18. Corollary. Elementary row manipulations of a matrix A affect the determinant of A in the same way as the corresponding elementary column manipulations.
Proof. Using det A = det A^T,

  det A   --row manipulation-->     det B
    ∥                                 ∥
  det A^T --column manipulation--> det B^T

Determinants Slide 236 Triangular Determinants
7.19. Proposition. Let A ∈ Mat(n × n) have upper triangular form, i.e.,

  A = ( λ1     ∗  )
      (    ⋱      )
      ( 0      λn )

for diagonal elements λ1, …, λn ∈ R and arbitrary values (denoted by ∗) above the diagonal. Then

  det A = λ1 ⋯ λn.

Determinants Slide 237 Triangular Determinants
Proof. By multilinearity,

  det A = ( Π_{k=1}^n λk ) det ( 1   ∗ )
                               (   ⋱   )
                               ( 0   1 )

(if some λk = 0, the columns of A are linearly dependent and both sides vanish). The matrix in the determinant on the right can be transformed into the unit matrix through elementary row manipulations that do not change the value of the determinant. Therefore its determinant is 1, proving the result.
Proposition 7.19 can be applied to calculate determinants of matrices A ∈ Mat(n × n) when A is first transformed to upper triangular form using elementary matrix manipulations. This is of practical use for n ≥ 4.

Determinants Slide 238 Determinants and Invertibility of Matrices
The following result is of fundamental importance for many applications:

7.20. Proposition. A matrix A ∈ Mat(n × n) is invertible if and only if det A ≠ 0.
Proof. We first show that if A is not invertible, then det A = 0. The linear map A : R^n → R^n is invertible if and only if ran A = R^n. Since ran A is the span of the column vectors, A is invertible if and only if the column vectors are independent. But if the column vectors are not independent, then det A vanishes.
Now let A = (a1, …, an) be invertible. By Lemma 5.13, A can be transformed into the unit matrix by elementary row operations. These only change the value of the determinant by a non-zero factor. Since the determinant of the unit matrix is 1, it follows that det A ≠ 0.

Determinants Slide 239 Determinants and Systems of Equations
The determinant can be used to give another formulation of Fredholm's Alternative 6.4:

7.21. Fredholm Alternative. Let A ∈ Mat(n × n). Then
▶ either det A = 0, in which case Ax = 0 has a non-zero solution x ∈ ker A,
▶ or det A ≠ 0, in which case Ax = b has a unique solution x = A⁻¹b for any b ∈ R^n.
The proof is a straightforward application of the definitions and left to the reader.

7.22. Cramer's Rule. Let A = (a1, …, an) ∈ Mat(n × n), a1, …, an ∈ R^n, be invertible. Then the system Ax = b, b ∈ R^n, has the solution

  xi = (1/det A) det(a1, …, a_{i−1}, b, a_{i+1}, …, an),   i = 1, …, n.   (7.20)
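Cramer's rule is easy to try out; the helper replaceCol below (an ad hoc name) replaces the ith column of a matrix by b, and we recover the solution {1, 1} for the matrix of Example 5.14 with b = (5, 3)^T.

  A = {{2, 3}, {2, 1}}; b = {5, 3};
  replaceCol[m_, i_, v_] := Transpose[ReplacePart[Transpose[m], i -> v]];
  Table[Det[replaceCol[A, i, b]]/Det[A], {i, 2}]
  {1, 1}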
Determinants Slide 238
Determinants and Invertibility of Matrices

The following result is of fundamental importance for many applications:

7.20. Proposition. A matrix A ∈ Mat(n × n) is invertible if and only if det A ≠ 0.

Proof.
We first show that if A is not invertible, then det A = 0. The linear map A : Rⁿ → Rⁿ is invertible if and only if ran A = Rⁿ. Since ran A is the span of the column vectors, A is invertible if and only if the column vectors are independent. But if the column vectors are not independent, then one column is a linear combination of the others, and multilinearity together with the alternating property gives det A = 0.

Now let A = (a_1, …, a_n) be invertible. By Lemma 5.13, A can be transformed into the unit matrix by elementary row operations. These only change the value of the determinant by a non-zero factor. Since the determinant of the unit matrix is 1, it follows that det A ≠ 0.

Determinants Slide 239
Determinants and Systems of Equations

The determinant can be used to give another formulation of Fredholm's Alternative 6.4:

7.21. Fredholm Alternative. Let A ∈ Mat(n × n). Then either
▶ det A = 0, in which case Ax = 0 has a non-zero solution x ∈ ker A, or
▶ det A ≠ 0, in which case Ax = b has a unique solution x = A⁻¹b for any b ∈ Rⁿ.

The proof is a straightforward application of the definitions and left to the reader.

7.22. Cramer's Rule. Let A = (a_1, …, a_n) ∈ Mat(n × n), a_1, …, a_n ∈ Rⁿ, be invertible. Then the system Ax = b, b ∈ Rⁿ, has the solution

x_i = (1/det A) det(a_1, …, a_{i−1}, b, a_{i+1}, …, a_n),   i = 1, …, n.   (7.20)

Determinants Slide 240
Determinants and Systems of Equations

Proof.
We note that Ax = Σ_{k=1}^n x_k a_k for A = (a_1, …, a_n) ∈ Mat(n × n). Therefore,

det(a_1, …, a_{i−1}, b, a_{i+1}, …, a_n) = det(a_1, …, a_{i−1}, Ax, a_{i+1}, …, a_n)
= det(a_1, …, a_{i−1}, Σ_{k=1}^n x_k a_k, a_{i+1}, …, a_n)
= Σ_{k=1}^n x_k det(a_1, …, a_{i−1}, a_k, a_{i+1}, …, a_n)
= x_i det(a_1, …, a_{i−1}, a_i, a_{i+1}, …, a_n) + 0
= x_i det A.

Determinants Slide 241
Minors and Cofactors

7.23. Definition. Let A = (a_ij) ∈ Mat(n × n). Denote by A_ij the (n−1) × (n−1) matrix obtained from A by deleting the ith row and jth column,

A_ij = (a_kl)_{1≤k,l≤n, k≠i, l≠j}.

Then m_ij := det A_ij is called the (i,j)th minor of A. The number

c_ij := (−1)^{i+j} m_ij = (−1)^{i+j} det A_ij

is called the (i,j)th cofactor of A, and the matrix Cof A := (c_ij)_{1≤i,j≤n} is called the cofactor matrix of A.

Determinants Slide 242
Determinants and Inversion of Matrices

7.24. Definition. Let A = (a_ij) ∈ Mat(n × n). The transpose of the cofactor matrix of A is called the adjugate of A, denoted by

A♯ := (Cof A)ᵀ.

7.25. Theorem. Let A = (a_ij) ∈ Mat(n × n) be invertible. Then

A⁻¹ = (1/det A) A♯.

The proof is based on a useful lemma, which we first establish.

Determinants Slide 243
Determinants and Inversion of Matrices

7.26. Lemma. Let A = (a_1, …, a_n) ∈ Mat(n × n) and let e_i be the ith standard basis vector in Rⁿ. Then

det(a_1, …, a_{j−1}, e_i, a_{j+1}, …, a_n) = (−1)^{i+j} det A_ij = c_ij,

where c_ij is the (i,j)th cofactor of A.

Proof.
Since the determinant is alternating, we have

det(a_1, …, a_{j−1}, e_i, a_{j+1}, …, a_n) = −det(a_1, …, a_{j−1}, a_{j+1}, e_i, a_{j+2}, …, a_n)
= ⋯ = (−1)^{n−j} det(a_1, …, a_{j−1}, a_{j+1}, …, a_n, e_i).

Determinants Slide 244
Determinants and Inversion of Matrices

Proof (continued).
Swapping the ith and the (i+1)st row, etc., we obtain

det(a_1, …, a_{j−1}, e_i, a_{j+1}, …, a_n) = (−1)^{n−j+n−i} det( A_ij 0 ; ∗ 1 ) = (−1)^{i+j} det B,

where B := ( A_ij 0 ; ∗ 1 ) and the entries in ∗ represent the elements of the ith row of A (with the jth entry deleted). Now from the definition (7.17),

det B = Σ_{σ∈S_n} sgn σ b_{σ(1)1} ⋯ b_{σ(n)n}.

Determinants Slide 245
Determinants and Inversion of Matrices

Proof (continued).
Since b_{σ(n)n} = δ_{nσ(n)}, only permutations with σ(n) = n contribute, so we can write

det B = Σ_{σ∈S_{n−1}} sgn σ b_{σ(1)1} ⋯ b_{σ(n−1)(n−1)} b_nn = det A_ij,

using b_nn = 1 and completing the proof.

Proof of Theorem 7.25.
Let A⁻¹ = (x_1, …, x_n) = (x_ij) be a matrix of column vectors x_1, …, x_n. The inverse of A satisfies AA⁻¹ = id, so we need to find columns x_j of A⁻¹ satisfying Ax_j = e_j, j = 1, …, n. By Cramer's rule and Lemma 7.26,

x_ij = (1/det A) det(a_1, …, a_{i−1}, e_j, a_{i+1}, …, a_n) = (1/det A) (−1)^{i+j} det A_ji = (1/det A) c_ji,

and c_ji = (A♯)_ij, which proves the theorem.

Determinants Slide 246
Laplace Expansion

Another application of Lemma 7.26 is the expansion of det A in terms of the minors of A:

7.27. Laplace Expansion. For A ∈ Mat(n × n) and any j = 1, …, n the recursion formula

det A = Σ_{i=1}^n (−1)^{i+j} a_ij det A_ij   (7.21)

holds.

Note that when using this expansion to calculate the determinant of an n × n matrix, n determinants of (n−1) × (n−1) matrices need to be evaluated; iterating the recursion leads to on the order of n! computational steps. This is far more than the roughly n³ steps needed when Proposition 7.19 is used.

Determinants Slide 247
Laplace Expansion

Proof.
Let A = (a_1, …, a_n), a_k ∈ Rⁿ, k = 1, …, n. Then the jth column has the representation a_j = Σ_{i=1}^n a_ij e_i and

det A = det(a_1, …, a_{j−1}, Σ_{i=1}^n a_ij e_i, a_{j+1}, …, a_n)
= Σ_{i=1}^n a_ij det(a_1, …, a_{j−1}, e_i, a_{j+1}, …, a_n)
= Σ_{i=1}^n a_ij (−1)^{i+j} det A_ij,

where the last equality follows from Lemma 7.26.
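For illustration, the recursion (7.21) with j = 1 can be implemented directly in a few lines; the helper laplaceDet is our own and is hopelessly slow compared to Det for large n. The test matrix is the one used on the next slide:

laplaceDet[{{a_}}] := a;
laplaceDet[m_] := Sum[(-1)^(i + 1) m[[i, 1]] laplaceDet[Drop[m, {i}, {1}]],
   {i, Length[m]}];  (* Drop[m, {i}, {1}] deletes row i and column 1 *)
laplaceDet[{{1, 2, 3}, {4, 5, 6}, {7, 8, 8}}]  (* 3 *)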
Determinants Slide 248
Determinants and Minors

We can obtain the determinant of a matrix as follows:

A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 8}};
MatrixForm[A]
  1 2 3
  4 5 6
  7 8 8
Det[A]
  3

The Mathematica command Minors gives the matrix of minors of A. However, the (i,j)th entry it returns is the determinant of the submatrix found by deleting the (n−i+1)th row and (n−j+1)th column. To conform to our definition, the command needs to be modified slightly.

Determinants Slide 249
Determinants and Minors

MatrixForm[Map[Reverse, Minors[A], {0, 1}]]
  -8 -10 -3
  -8 -13 -6
  -3  -6 -3

The adjugate matrix can be defined as follows:

adj[m_] := Map[Reverse, Minors[Transpose[m], Length[m] - 1], {0, 1}] *
   Table[(-1)^(i + j), {i, Length[m]}, {j, Length[m]}]

MatrixForm[adj[A]]
  -8   8 -3
  10 -13  6
  -3   6 -3

Determinants Slide 250
Product Rule for Determinants

7.28. Proposition. Let A, B ∈ Mat(n × n). Then det(AB) = det A det B.

Proof.
If A = (a_ik), B = (b_kj), then AB = C = (c_ij) with column vectors c_j,

c_j = (c_1j; …; c_nj),   c_ij = Σ_{k=1}^n a_ik b_kj.

Let b_j denote the columns of B; then c_j = Ab_j. We can assume that A is bijective (otherwise det(AB) = 0 = det A det B).

Determinants Slide 251
Product Rule for Determinants

Proof (continued).
Hence we can write

(1/det A) det AB = (1/det A) det(c_1, …, c_n) = (1/det A) det(Ab_1, …, Ab_n) =: f(b_1, …, b_n).

The function f so defined is clearly multilinear, because A is linear and det is multilinear. It is also alternating, because Ab_k = Ab_j if b_k = b_j and det is alternating. Finally,

f(e_1, …, e_n) = (1/det A) det(Ae_1, …, Ae_n) = (1/det A) det A = 1.

The function f is multilinear, normed and alternating. Therefore, by the uniqueness of the determinant, it must be the determinant.

Determinants Slide 252
Product Rule for Determinants

Proof (continued).
That means that

f(B) = det B  ⇔  (1/det A) det AB = det B.

7.29. Corollary. Let A ∈ Mat(n × n) be invertible. Then

det A⁻¹ = 1/det A.
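A quick numeric sanity check of Proposition 7.28 and Corollary 7.29 (our own illustration; the matrices are arbitrary, with A chosen invertible):

A = {{2, 1}, {1, 1}}; B = {{0, 1}, {1, 3}};
Det[A . B] == Det[A] Det[B]   (* True *)
Det[Inverse[A]] == 1/Det[A]   (* True; here Det[A] = 1 *)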
Slide 253
Part 2: Continuity, Differentiability, Integrability

Slide 254
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Sets and Equivalence of Norms Slide 255
8. Sets and Equivalence of Norms

Sets and Equivalence of Norms Slide 256
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Sets and Equivalence of Norms Slide 257
Finite-Dimensional Vector Spaces

For the rest of the term, we will focus on functions of several variables, e.g., functions

f : Rⁿ → Rᵐ.

Our previously developed knowledge of linear algebra is essential for this since, for example, the derivative of such a function at a point x turns out to be a matrix. More precisely, the derivative is a map

Df : Rⁿ → Mat(m × n; R).

It is not sufficient to restrict ourselves to functions defined on Rⁿ with values in Rᵐ. On the one hand, the second derivative of f is then the derivative of Df, a matrix-valued function. Another aspect occurs in the study of ordinary differential equations, when we need to differentiate functions such as the determinant of a matrix. Therefore, we need to define concepts such as continuity for arbitrary vector spaces.

Sets and Equivalence of Norms Slide 258
Open Balls

The basic ingredient in our discussion is open balls:

8.1. Definition. Let (V, ∥·∥) be a normed vector space. Then

B_ε(a) := {x ∈ V : ∥x − a∥ < ε},   a ∈ V, ε > 0,   (8.1)

is called an open ball of radius ε about a.

Of course, the "shape" of an open ball depends on the vector space V and the norm ∥·∥. For instance, the open balls in R² with norms

∥x∥₁ = |x₁| + |x₂|,   ∥x∥₂ = √(|x₁|² + |x₂|²),   ∥x∥∞ = max{|x₁|, |x₂|}   (8.2)

all have quite different shapes. Furthermore, if V = P_n, for example, open balls do not have an obvious "shape" at all.

Sets and Equivalence of Norms Slide 259
Open Sets

8.2. Definition. Let (V, ∥·∥) be a normed vector space. A set U ⊂ V is called open if for every a ∈ U there exists an ε > 0 such that B_ε(a) ⊂ U.

8.3. Examples.
(i) Any open ball B_ε(a), ε > 0, a ∈ V, is an open set. (For any b ∈ B_ε(a) take 0 < δ < ε − ∥a − b∥. Then B_δ(b) ⊂ B_ε(a).)
(ii) The empty set ∅ ⊂ V is open. (Since there is no a ∈ ∅ for which we need to check that B_ε(a) ⊂ ∅, this is an example of a vacuously true statement.)
(iii) The entire space V is an open set in V.

Sets and Equivalence of Norms Slide 260
Open Sets

We will see that open sets are fundamental for understanding properties of continuous functions, convergence in vector spaces and much more. Therefore, it becomes important to answer a basic question: If a set is open in a vector space (V, ∥·∥), is it also open if ∥·∥ is replaced by some other norm?

8.4. Example. If a set Ω ⊂ R² is open with respect to any one of the norms given in (8.2), it is also open with respect to any of the other norms in (8.2). Why?

Sets and Equivalence of Norms Slide 261
Equivalent Norms

8.5. Definition. Let V be a vector space on which we may define two norms ∥·∥₁ and ∥·∥₂. Then the two norms are called equivalent if there exist two constants C₁, C₂ > 0 such that

C₁∥x∥₁ ≤ ∥x∥₂ ≤ C₂∥x∥₁  for all x ∈ V.   (8.3)

8.6. Example. In Rⁿ we have (amongst others) the following two possible choices of norms:

∥x∥₂ := (Σ_{i=1}^n |x_i|²)^{1/2},   ∥x∥∞ := max_{1≤i≤n} |x_i|.

It is easily verified that for all x ∈ Rⁿ,

(1/√n)∥x∥₂ ≤ ∥x∥∞ ≤ ∥x∥₂,   (8.4)

so the two norms are equivalent.
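A numeric spot-check of (8.4) in Mathematica (our own illustration on a single random vector, so evidence rather than a proof):

n = 5; x = RandomReal[{-1, 1}, n];
Norm[x]/Sqrt[n] <= Norm[x, Infinity] <= Norm[x]  (* True; Norm[x] is the 2-norm *)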
Sets and Equivalence of Norms Slide 262
Convergence of Sequences

8.7. Remark. It is obvious from the definition that if two norms on a vector space are equivalent, then any set that is open with respect to the first norm is also open with respect to the second norm.

We recall the following from Vv186:

8.8. Definition. Let (V, ∥·∥) be a normed vector space and (v_n) a sequence in V. Then (v_n) converges to a (unique) limit v ∈ V,

v_n → v as n → ∞,  if and only if  ∥v_n − v∥ → 0 as n → ∞.

For later use, we note:

8.9. Remark. If a sequence (v_n) in (V, ∥·∥) converges to v ∈ V, then ∥v_n∥ → ∥v∥. This follows from

|∥v_n∥ − ∥v∥| ≤ ∥v − v_n∥ → 0.

Sets and Equivalence of Norms Slide 263
Equivalence of All Norms

8.10. Remark. It is again easy to see from the definition that if two norms on a vector space are equivalent, then a sequence that converges to a limit with respect to the first norm also converges to the same limit with respect to the second norm.

Therefore, the following theorem is of fundamental importance:

8.11. Theorem. In a finite-dimensional vector space, all norms are equivalent.

A major consequence of Theorem 8.11 is that if we have several norms at our disposal in a finite-dimensional space, then we can freely choose a convenient one in order to show openness of sets, convergence of sequences, etc. The proof of Theorem 8.11 requires some preliminary work.

Sets and Equivalence of Norms Slide 264
The Theorem of Bolzano-Weierstraß

We recall two basic facts from the theory of sequences of real numbers:
(i) Every bounded and monotonic sequence of real numbers converges.
(ii) Every sequence of real numbers has a monotonic subsequence.

Together, these yield the following fundamental result (cf. 186 Theorem 2.2.35):

8.12. Theorem of Bolzano-Weierstraß. Every bounded sequence of real numbers has a convergent subsequence.

We remark that the Theorem of Bolzano-Weierstraß easily implies that every Cauchy sequence of real numbers converges, because every Cauchy sequence that has a convergent subsequence must itself converge. Thus the basic ingredient in proving that the real numbers (with the usual metric) are complete is the fact that a bounded, monotonic sequence converges.

Sets and Equivalence of Norms Slide 265
The Theorem of Bolzano-Weierstraß in Rⁿ

8.13. Theorem of Bolzano-Weierstraß in Rⁿ. Let (x^(m))_{m∈N} be a sequence of vectors in Rⁿ, i.e., x^(m) = (x₁^(m), …, x_n^(m)). Suppose that there exists a constant C > 0 such that |x_k^(m)| < C for all m ∈ N and each k = 1, …, n. Then there exists a subsequence (x^(m_j))_{j∈N} that converges to a vector y ∈ Rⁿ in the sense that

x_k^(m_j) → y_k as j → ∞,  for k = 1, …, n.

Proof.
Consider the real coordinate sequence (x₁^(m))_{m∈N}. By assumption, this sequence is bounded, so by the Theorem of Bolzano-Weierstraß 8.12 there exists a convergent subsequence (x₁^(m_{j₁})) with some limit, say y₁ ∈ R. The second coordinate sequence (x₂^(m)) is also bounded and has a convergent subsequence, but this subsequence does not need to have the same indices as that for (x₁^(m)).

Sets and Equivalence of Norms Slide 266
The Theorem of Bolzano-Weierstraß in Rⁿ

Proof (continued).
We therefore employ a trick: the subsequence (x₂^(m_{j₁})) that uses the indices from our above subsequence for the first coordinate is of course also bounded and hence has a sub-subsequence (x₂^(m_{j₂})) that converges, say to y₂ ∈ R. Taking the corresponding sub-subsequence for the first coordinate, (x₁^(m_{j₂})) still converges to y₁. Similarly, a sub-sub-subsequence of the third coordinate will converge to some y₃ ∈ R while the corresponding sub-sub-subsequences of the first two coordinates will still converge to y₁ and y₂, respectively. Repeating the procedure n times, the n-fold subsequence (x_k^(m_{jₙ})) converges to some y_k ∈ R for each k = 1, …, n. Hence, the subsequence (x^(m_{jₙ})) converges to some y ∈ Rⁿ.

Sets and Equivalence of Norms Slide 267
A Basic Norm Inequality

8.14. Lemma. Let (V, ∥·∥) be a finite- or infinite-dimensional normed vector space and {v₁, …, v_n} an independent set in V. Then there exists a C > 0 such that for any λ₁, …, λ_n ∈ F

∥λ₁v₁ + ⋯ + λ_n v_n∥ ≥ C (|λ₁| + ⋯ + |λ_n|).   (8.5)

Proof.
Let s := |λ₁| + ⋯ + |λ_n|. If s = 0, then all λ_k = 0 and the inequality (8.5) holds trivially for any C, so we can assume s > 0. Dividing by s, (8.5) becomes

∥μ₁v₁ + ⋯ + μ_n v_n∥ ≥ C,   Σ_{k=1}^n |μ_k| = 1,   (8.6)

with μ_k = λ_k/s.
Sets and Equivalence of Norms Slide 268
A Basic Norm Inequality

Proof (continued).
Hence, we need to show

∃ C>0 ∀ μ₁,…,μ_n ∈ F with |μ₁| + ⋯ + |μ_n| = 1:  ∥μ₁v₁ + ⋯ + μ_n v_n∥ ≥ C.

Suppose that this is false, i.e.,

∀ C>0 ∃ μ₁,…,μ_n ∈ F with |μ₁| + ⋯ + |μ_n| = 1:  ∥μ₁v₁ + ⋯ + μ_n v_n∥ < C.

In particular, choosing C = 1/m, m = 1, 2, 3, …, we can find a sequence of vectors

u^(m) := μ₁^(m) v₁ + ⋯ + μ_n^(m) v_n

such that ∥u^(m)∥ → 0 as m → ∞ and |μ₁^(m)| + ⋯ + |μ_n^(m)| = 1 for all m.

Sets and Equivalence of Norms Slide 269
A Basic Norm Inequality

Proof (continued).
Hence, for each k = 1, …, n, |μ_k^(m)| ≤ 1, so each coefficient sequence (μ_k^(m)) is bounded. Write

μ^(m) := (μ₁^(m), …, μ_n^(m)).

By the Theorem of Bolzano-Weierstraß in Rⁿ, there exists a subsequence of vectors (μ^(m_j))_{j∈N} that converges to some α = (α₁, …, α_n) ∈ Rⁿ. This corresponds to a subsequence (u^(m_j)) of (u^(m)) such that

u^(m_j) → α₁v₁ + ⋯ + α_n v_n =: u as j → ∞,

with |α₁| + ⋯ + |α_n| = 1. Since the vectors v₁, …, v_n are independent and not all α_k vanish, it follows that u ≠ 0.

Sets and Equivalence of Norms Slide 270
A Basic Norm Inequality

Proof (continued).
Remark 8.9 then implies

∥u^(m_j)∥ → ∥u∥ ≠ 0 as j → ∞.

But by our construction, ∥u^(m)∥ → 0 as m → ∞, so the subsequence (∥u^(m_j)∥) must also converge to zero. This gives a contradiction.

We can now proceed to prove Theorem 8.11.

Sets and Equivalence of Norms Slide 271
Equivalence of Norms

Proof of Theorem 8.11.
Let V be a finite-dimensional vector space, ∥·∥ any norm on V and {v₁, …, v_n} a basis of V. Let v ∈ V have the representation v = λ₁v₁ + ⋯ + λ_n v_n with λ₁, …, λ_n ∈ F. By the triangle inequality,

∥v∥ = ∥λ₁v₁ + ⋯ + λ_n v_n∥ ≤ Σ_{i=1}^n |λ_i|∥v_i∥ ≤ C Σ_{i=1}^n |λ_i|,

where C := max_{1≤i≤n} ∥v_i∥ depends only on the basis and not on v. We hence see that for any norm there are constants C₁, C₂ > 0 such that

C₁ Σ_{i=1}^n |λ_i| ≤ ∥v∥ ≤ C₂ Σ_{i=1}^n |λ_i|,   (8.7)

where the first inequality is just (8.5). Given two norms ∥·∥₁ and ∥·∥₂, it follows from their respective inequalities (8.7) that (8.3) holds.

Sets and Equivalence of Norms Slide 272
Equivalence of Norms

It is essential that Theorem 8.11 assumes that V is a finite-dimensional vector space. In an infinite-dimensional vector space, it is possible to define non-equivalent norms.

8.15. Example. Consider the space of continuous functions on [0, 1], C([0, 1]). We can define the two norms

∥f∥∞ = sup_{x∈[0,1]} |f(x)|,   ∥f∥₁ = ∫₀¹ |f(x)| dx.

You will show in the assignments that these two norms are not equivalent.
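As a numeric teaser (the actual proof remains an assignment problem), one may compare the two norms on the functions x ↦ x^k and watch the ratio grow without bound, which is incompatible with an estimate of the form (8.3):

supNorm[k_] := MaxValue[{x^k, 0 <= x <= 1}, x];
oneNorm[k_] := Integrate[x^k, {x, 0, 1}];  (* x^k >= 0 on [0,1], so |x^k| = x^k *)
Table[supNorm[k]/oneNorm[k], {k, {1, 5, 25}}]  (* {2, 6, 26} *)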
Sets and Equivalence of Norms Slide 273
Interior, Exterior and Boundary Points

8.16. Definition. Let (V, ∥·∥) be a normed vector space and M ⊂ V.
(i) A point x ∈ M is called an interior point of M if there exists an ε > 0 such that B_ε(x) ⊂ M.
(ii) The set of interior points of M is denoted by int M.
(iii) A point x ∈ V is called a boundary point of M if for every ε > 0

B_ε(x) ∩ M ≠ ∅  and  B_ε(x) ∩ (V ∖ M) ≠ ∅.

(iv) The set of boundary points of M is denoted by ∂M.
(v) A point that is neither a boundary nor an interior point of M is called an exterior point of M.

8.17. Remarks.
(i) An exterior point of M is an interior point of V ∖ M. (Check this!)
(ii) For given M, any point of V is either an interior, boundary or exterior point of M.

Sets and Equivalence of Norms Slide 274
Closed Sets

8.18. Definition. Let (V, ∥·∥) be a normed vector space and M ⊂ V. Then M is said to be closed if its complement V ∖ M is open.

8.19. Remark. Of course, a set M does not need to be either open or closed. Some sets are open and closed at the same time.

8.20. Examples.
(i) A set consisting of a single point, M = {a} ⊂ V, is a closed set.
(ii) The empty set ∅ ⊂ V is closed.
(iii) The entire space V is a closed set in V.

Sets and Equivalence of Norms Slide 275
Closed Sets

8.21. Lemma. Let (V, ∥·∥) be a normed vector space and M ⊂ V.
(i) The set M is open if and only if M = int M.
(ii) The set M is closed if and only if ∂M ⊂ M.

Proof.
(i) This is just a restatement of the definition of an open set.
(ii) Suppose that M is closed. Then V ∖ M is open. An open set cannot contain a boundary point of M, since all of its points are interior points of V ∖ M. Hence, ∂M ∩ (V ∖ M) = ∅ and so ∂M ⊂ M.
Suppose that ∂M ⊂ M. Then V ∖ M contains only exterior points of M. But an exterior point of M is an interior point of V ∖ M, so V ∖ M is open. Hence, M is closed.

Sets and Equivalence of Norms Slide 276
The Closure

8.22. Definition. Let (V, ∥·∥) be a normed vector space and M ⊂ V. Then M̄ := M ∪ ∂M is called the closure of M.

8.23. Remark. It is not hard to show that the closure of a set M is a closed set. In fact, it is the smallest set that both contains M and is closed.

The closure of a set may also be characterized in terms of sequences:

8.24. Lemma. Let (V, ∥·∥) be a normed vector space and M ⊂ V. Then

M̄ = {x ∈ V : there exists a sequence (x_n)_{n∈N} with x_n ∈ M and x_n → x}.   (8.8)

Sets and Equivalence of Norms Slide 277
The Closure

Proof.
(i) Suppose that x ∈ V is such that there exists a sequence (x_n) with x_n ∈ M and x_n → x. Then for every ε > 0, B_ε(x) contains at least one x_n. Hence, B_ε(x) ∩ M ≠ ∅ and so x cannot be an exterior point. This implies x ∈ M ∪ ∂M.
(ii) Suppose x ∈ M ∪ ∂M. Then for every ε > 0, B_ε(x) ∩ M ≠ ∅. Choose ε = 1/n for n ∈ N ∖ {0} to find a sequence of points x_n ∈ B_{1/n}(x) ∩ M. This sequence converges to x, so x is in the set on the right-hand side of (8.8).

Continuity and Convergence Slide 278
9. Continuity and Convergence

Continuity and Convergence Slide 279
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Continuity and Convergence Slide 280
Continuous Functions

Recall the following definition of continuity in normed vector spaces:

9.1. Definition. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous at a ∈ X if

∀ ε>0 ∃ δ>0 ∀ x∈X:  ∥x − a∥_X < δ ⇒ ∥f(x) − f(a)∥_V < ε.   (9.1)

Of course, we can prove as usual the following:

9.2. Theorem. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous at a ∈ X if and only if for every sequence (x_n)_{n∈N} in X,

x_n → a ⇒ f(x_n) → f(a).   (9.2)
Continuity and Convergence Slide 281
Image and Pre-Image of Sets

Suppose that f : M → N, where M, N are any sets. Let A ⊂ M. Then we define the image of A by

f(A) := {y ∈ N : y = f(x) for some x ∈ A}.

In particular, we can write ran f = f(M). Similarly, for B ⊂ N we define the pre-image of B by

f⁻¹(B) := {x ∈ M : f(x) ∈ B}.

9.3. Examples.
(i) Let f : R → R, f(x) = sin x. Then f([0, π]) = [0, 1].
(ii) Let f : R² → R, f(x, y) = x² + y². Then

f⁻¹({1}) = {(x, y) ∈ R² : x² + y² = 1}   (9.3)

(this is the unit circle in R²).

Continuity and Convergence Slide 282
Continuous Functions

It is often useful to characterize continuous maps by using open sets:

9.4. Theorem. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous if and only if the pre-image f⁻¹(Ω) of every open set Ω ⊂ V is open.

Proof.
(⇒) Let f be continuous and Ω ⊂ V open. We will show that f⁻¹(Ω) is open. Let a ∈ f⁻¹(Ω). Then f(a) ∈ Ω, and since Ω is open we can find ε > 0 such that B_ε(f(a)) ⊂ Ω. By the continuity of f we can choose δ > 0 small enough to ensure that f(B_δ(a)) ⊂ B_ε(f(a)). But then B_δ(a) ⊂ f⁻¹(Ω). Since this is true for any a ∈ f⁻¹(Ω), it follows that f⁻¹(Ω) is open.

Continuity and Convergence Slide 283
Continuous Functions

Proof (continued).
(⇐) Let f : X → V be such that the pre-image f⁻¹(Ω) of every open set Ω ⊂ V is open. We will show that f is continuous. Let a ∈ X be arbitrary and fix ε > 0. We want to show that there exists a δ > 0 such that

x ∈ B_δ(a) ⇒ f(x) ∈ B_ε(f(a)).   (9.4)

The set B_ε(f(a)) is open, and by assumption f⁻¹(B_ε(f(a))) ∋ a is also open. Thus, we can find δ > 0 such that B_δ(a) ⊂ f⁻¹(B_ε(f(a))). But then (9.4) holds and we are finished.

Continuity and Convergence Slide 284
Continuous Functions

9.5. Example. We show that the function

det : Mat(n × n; C) → C,   det A = Σ_{σ∈S_n} sgn σ a_{σ(1)1} ⋯ a_{σ(n)n},

is continuous. Since all norms on the finite-dimensional space Mat(n × n; C) are equivalent, we may choose the norm ∥A∥ = max_{i,j} |a_ij|. Fix A = (a_ij) ∈ Mat(n × n; C) and suppose that (A_m) is a sequence converging to A. Our choice of norm implies that all coefficients converge, a_ij^(m) → a_ij. Since det A is a polynomial in the coefficients a_ij, det A_m → det A, and therefore det is continuous at A ∈ Mat(n × n; C).

Note that the pre-image of the set of non-zero complex numbers is

det⁻¹(C ∖ {0}) = GL(n; C),

the general linear group of invertible matrices. Since C ∖ {0} is an open set, Theorem 9.4 implies that GL(n; C) is an open set in Mat(n × n; C).
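A small numeric illustration of Example 9.5 (our own): perturbing the entries of a matrix slightly perturbs its determinant slightly, as expected of a polynomial in the entries.

A = {{1., 2.}, {3., 4.}};
Abs[Det[A + 10^-8 RandomReal[{-1, 1}, {2, 2}]] - Det[A]] < 10^-6  (* True *)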
Continuity and Convergence Slide 285
Compact Sets

We are now interested in generalizing the results of Vv186 that apply to continuous functions on closed intervals to vector spaces. Note that a closed interval in R is always bounded in the following sense:

9.6. Definition. Let (V, ∥·∥_V) be a normed vector space and M ⊂ V. Then M is said to be bounded if there exists some R > 0 such that M ⊂ B_R(0).

It turns out that the natural generalization of a closed interval is a little more complicated than just requiring a set to be closed and bounded.

9.7. Definition. Let (V, ∥·∥_V) be a normed vector space and K ⊂ V. Then K is said to be compact if every sequence in K has a convergent subsequence with limit contained in K.

Continuity and Convergence Slide 286
Compact Sets are Closed and Bounded

9.8. Theorem. Let (V, ∥·∥_V) be a (possibly infinite-dimensional) normed vector space and K ⊂ V be compact. Then K is closed and bounded.

Proof.
We first show that K is closed by establishing K = K̄. Let x ∈ K̄. Then there exists a sequence (x_n) in K converging to x. Since K is compact, (x_n) has a subsequence (x_{n_k}) that converges to some x′ ∈ K. Since (x_n) converges to x, x = x′ ∈ K, so K̄ = K and K is closed.

Now suppose that K is unbounded. Then for any n ∈ N there exists an x_n ∈ K such that ∥x_n∥_V > n. This gives rise to an unbounded sequence (x_n). Furthermore, any subsequence of (x_n) is unbounded. Since a convergent sequence is bounded, we conclude that (x_n) cannot have a convergent subsequence. This implies that K is not compact. By contraposition, if K is compact, then K must be bounded.

Continuity and Convergence Slide 287
Closed and Bounded Sets are Sometimes Compact

9.9. Theorem. Let (V, ∥·∥_V) be a finite-dimensional vector space and let K ⊂ V be closed and bounded. Then K is compact.

Proof.
Let (b₁, …, b_n) be a basis of V and let K be closed and bounded. Let (v_m) be a sequence in K. Then each sequence term has the representation

v_m = λ₁^(m) b₁ + ⋯ + λ_n^(m) b_n,   λ₁^(m), …, λ_n^(m) ∈ F,   m ∈ N.

By Lemma 8.14 and the boundedness of K, there exist constants C₁, C₂ > 0 such that

C₁ ≥ ∥v_m∥_V ≥ C₂ Σ_{k=1}^n |λ_k^(m)|.

Continuity and Convergence Slide 288
Closed and Bounded Sets are Sometimes Compact

Proof (continued).
It follows that for each k, the sequence (λ_k^(m)) is bounded. Write

λ^(m) = (λ₁^(m), …, λ_n^(m)).

By the Theorem of Bolzano-Weierstraß in Rⁿ, (λ^(m)) has a convergent subsequence (λ^(m_j)), so that (v_{m_j}) converges to some element v ∈ V. Since K is closed, v ∈ K. This implies that K is compact.

Continuity and Convergence Slide 289
Closed and Bounded Sets are Sometimes Compact

Theorem 9.9 is in general false in infinite-dimensional spaces:

9.10. Example. Consider the vector space of summable complex sequences,

ℓ¹ := {(a_n) : N → C : Σ_{n=0}^∞ |a_n| < ∞}.

The natural norm is given by

∥(a_n)∥₁ := Σ_{n=0}^∞ |a_n|.

Then the closed unit ball

B̄₁(0) = {(a_n) ∈ ℓ¹ : Σ_{n=0}^∞ |a_n| ≤ 1}

is closed and bounded, but not compact.

Continuity and Convergence Slide 290
Compact Sets and Continuity

Why are we so interested in compact sets? Well, it turns out that compact sets are natural extensions of closed intervals in R for the purpose of generalizing some major theorems on continuous functions.

9.11. Proposition. Let (X, ∥·∥_X), (V, ∥·∥_V) be normed vector spaces and K ⊂ X compact. Let f : K → V be continuous. Then ran f = f(K) is compact in V.

Proof.
Let (y_n) be a sequence in f(K). Then there exists a sequence (x_n) in K with y_n = f(x_n). Since K is compact, a subsequence (x_{n_k}) of (x_n) converges to some a ∈ K. But because f is continuous, the subsequence (f(x_{n_k})) of (y_n) converges to f(a) ∈ f(K). Hence, (y_n) has a convergent subsequence and f(K) is compact.

Continuity and Convergence Slide 291
Extrema of Continuous Functions on Compact Sets

9.12. Theorem. Let (X, ∥·∥_X) be a normed vector space and K ⊂ X compact. Let f : K → R be continuous. Then f has a maximum in K, i.e., there exists an x ∈ K such that f(y) ≤ f(x) for all y ∈ K.

Proof.
The range ran f = f(K) is compact by Proposition 9.11, so it is closed and bounded by Theorem 9.8. The least upper bound b = sup f(K) exists because f(K) is bounded. Since b is the least upper bound, b cannot be an exterior point of f(K), so b ∈ f(K)‾. Since f(K) is closed, f(K)‾ = f(K) and b ∈ f(K). Hence, there exists an x ∈ K with f(x) = b and f(y) ≤ b for all y ∈ K.
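Numerically, Theorem 9.12 is what makes maximization over closed bounded sets in Rⁿ well posed. A quick illustration (our own) on the closed unit disk:

NMaximize[{x^2 + y^2 + x y, x^2 + y^2 <= 1}, {x, y}]
(* {1.5, {x -> 0.707107, y -> 0.707107}}; (-x, -y) is a maximizer as well *)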
Continuity and Convergence Slide 292
Uniform Continuity on Compact Sets

Recall the definition of uniform continuity for functions in vector spaces:

9.13. Definition. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces, Ω ⊂ X and f : Ω → V a function. Then f is uniformly continuous in Ω if

∀ ε>0 ∃ δ>0 ∀ x,y∈Ω:  ∥x − y∥_X < δ ⇒ ∥f(x) − f(y)∥_V < ε.   (9.5)

(Compare with Definition 9.1.)

9.14. Theorem. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces, K ⊂ X a compact set and f : K → V continuous on K. Then f is uniformly continuous on K.

Continuity and Convergence Slide 293
Uniform Continuity on Compact Sets

Proof.
Suppose that f is continuous but not uniformly continuous on K. Then

∃ ε>0 ∀ δ>0 ∃ x,y∈K:  ∥x − y∥_X < δ ∧ ∥f(x) − f(y)∥_V ≥ ε.

Denote this ε by ε₀. Then for each δ = 1/n there exist vectors x_n, y_n ∈ K such that

∥x_n − y_n∥_X < 1/n ∧ ∥f(x_n) − f(y_n)∥_V ≥ ε₀.

Since K is compact, there exist subsequences (x_{n_k}) and (y_{n_k}) that converge, say to ξ and η, respectively. Since ∥x_{n_k} − y_{n_k}∥_X < 1/n_k, we see that ξ = η. However, then

x_{n_k} → ξ ∧ y_{n_k} → ξ ∧ ∥f(x_{n_k}) − f(y_{n_k})∥_V ≥ ε₀ ↛ 0,

which contradicts the continuity of f at ξ.

Continuity and Convergence Slide 294
Continuity and Convergence

We now present two lemmas that will be very useful for our future discussion. Often, we will discuss convergence of functions that also depend on parameters. The issue is then how varying these parameters affects the convergence.

9.15. Lemma. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function such that

lim_{x→0} ∥f(x)∥_V = 0.   (9.6)

Then

lim_{x→0} sup_{t∈[0,1]} ∥f(t·x)∥_V = 0.

Continuity and Convergence Slide 295
Continuity and Convergence

Proof.
We need to show that for any ε > 0 there exists a δ > 0 such that for all x ∈ X the following is true: if ∥x∥_X < δ, then sup_{t∈[0,1]} ∥f(t·x)∥_V < ε.

Fix ε > 0. Choose a δ > 0 such that whenever ∥y∥_X < δ for y ∈ X, then ∥f(y)∥_V < ε/2. (This is possible by the assumption (9.6).) Then ∥x∥_X < δ implies ∥t·x∥_X < δ for all t ∈ [0, 1] and hence ∥f(t·x)∥_V < ε/2 for all t ∈ [0, 1]. But then

sup_{t∈[0,1]} ∥f(t·x)∥_V ≤ ε/2 < ε

and the proof is complete.

Continuity and Convergence Slide 296
Continuity and Convergence

The following lemma clearly shows how uniform continuity is leveraged to provide convergence uniformly with respect to a parameter. First, we remark that if (X, ∥·∥_X) and (Y, ∥·∥_Y) are normed vector spaces, then so is the set of pairs X × Y with norm

∥(x, y)∥_{X×Y} := ∥x∥_X + ∥y∥_Y.   (9.7)

(Check that this actually defines a norm on X × Y!) Then if K₁ ⊂ X is compact and K₂ ⊂ Y is compact, it is easy to see (check this!) that K₁ × K₂ is compact in X × Y.

Continuity and Convergence Slide 297
Continuity and Convergence

9.16. Lemma. Let (X, ∥·∥_X), (Y, ∥·∥_Y) and (V, ∥·∥_V) be normed vector spaces, Ω ⊂ X an open set containing 0 and K ⊂ Y a compact set. Suppose that f : Ω × K → V is continuous and that

lim_{x→0} ∥f(x, y)∥_V = 0 for every y ∈ K.   (9.8)

Then

lim_{x→0} sup_{y∈K} ∥f(x, y)∥_V = 0.

9.17. Remark. The compactness of K is essential. For example,

lim_{x→0} (1 − e^{−x·y}) = 0 for every y ∈ [0, ∞),  but  lim_{x→0} sup_{y∈[0,∞)} (1 − e^{−x·y}) = 1.
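Both limits in Remark 9.17 can be confirmed symbolically (our own check; x → 0 is taken through positive values):

Limit[1 - Exp[-x y], x -> 0]                               (* 0 for each fixed y *)
Limit[1 - Exp[-x y], y -> Infinity, Assumptions -> x > 0]  (* 1, the sup over y *)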
Continuity and Convergence Slide 298
Continuity and Convergence

Proof.
Since we are considering the limit as x → 0, we may restrict the x-domain of f to a compact neighborhood K_x of zero (e.g., K_x = B̄_r(0) for some suitable r > 0). Then f is considered to be defined on the compact set K_x × K and, since f is assumed to be continuous, by Theorem 9.14 it is also uniformly continuous on K_x × K. That means that for any ε > 0 there exists a δ > 0 such that

∀ (x,y), (ξ,η) ∈ K_x × K:  ∥(x, y) − (ξ, η)∥_{X×Y} < δ ⇒ ∥f(x, y) − f(ξ, η)∥_V < ε.

Now let ε > 0 be fixed and choose a corresponding δ > 0 so that the above implication holds.

Continuity and Convergence Slide 299
Continuity and Convergence

Proof (continued).
Choosing η = y and ξ = 0, we have

∥(x, y) − (0, y)∥_{X×Y} = ∥x − 0∥_X = ∥x∥_X

by (9.7). Furthermore, for every y ∈ K we have f(0, y) = 0 by (9.8) and the continuity of f. Then for our given choice of ε, δ we have

∀ x∈K_x ∀ y∈K:  ∥x∥_X < δ ⇒ ∥f(x, y)∥_V < ε.

This implies

∀ x∈K_x:  ∥x∥_X < δ ⇒ sup_{y∈K} ∥f(x, y)∥_V ≤ ε,

which proves the assertion.

The First Derivative Slide 300
10. The First Derivative

The First Derivative Slide 301
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

The First Derivative Slide 302
Calculus on Vector Spaces

In the rest of this term we will develop calculus for "functions of multiple variables". This generally means functions defined on (a subset of) Rⁿ, but it is not any more difficult to treat functions defined on finite-dimensional vector spaces. Throughout the following discussion, we assume that V and X denote finite-dimensional, normed vector spaces. The concrete norm will be irrelevant, as all norms are equivalent (see Theorem 8.11). We will consider first the derivative of a function f : X → V.

10.1. Definition. Let f : X → V₁, g : X → V₂ and x₀ ∈ X. We say that

f(x) = o(g(x)) as x → x₀  ⇔  lim_{x→x₀} ∥f(x)∥_{V₁}/∥g(x)∥_{V₂} = 0.

The First Derivative Slide 303
The Derivative of a Function

10.2. Definition. Let X, V be finite-dimensional vector spaces and Ω ⊂ X an open set. Then a map f : Ω → V is called differentiable at x ∈ Ω if there exists a linear map L_x ∈ L(X, V) such that

f(x + h) = f(x) + L_x h + o(h) as h → 0.   (10.1)

In this case we call L_x the derivative of f at x and write

L_x = Df|_x = df|_x.

We say that f is differentiable on Ω if it is differentiable at every x ∈ Ω.

10.3. Remarks.
▶ Just as in the proof of 186 Lemma 3.1.2 we can show that the derivative is uniquely defined by (10.1).
▶ We may also copy the proof of 186 Lemma 3.1.8 to see that every differentiable function is continuous.

The First Derivative Slide 304
The Derivative of a Function

If f is differentiable on Ω, we may regard Df as a map

Df : Ω → L(X, V),   x ↦ Df|_x.

10.4. Definition. We define

C(Ω, V) := {f : Ω → V : f is continuous},
C¹(Ω, V) := {f : Ω → V : f is differentiable and Df is continuous}.

We may thus regard the derivative D as a (linear) map

D : C¹(Ω, V) → C(Ω, L(X, V)),   f ↦ Df.

The First Derivative Slide 305
The Derivative of a Function

10.5. Example. Let X, V be finite-dimensional vector spaces and L ∈ L(X, V) a linear map. Then

L(x + h) = Lx + Lh = Lx + DL|_x h + o(h)   (h → 0),

so the derivative of L at any x ∈ X is DL|_x = L.

10.6. Examples. Explicit instances of Example 10.5 are, e.g.,
▶ Let X = V = C be regarded as real vector spaces and f : z ↦ z̄ be the (then linear) complex conjugation. Then for z, h ∈ C

(z + h)‾ = z̄ + h̄,

so Df|_z(h) = h̄.

The First Derivative Slide 306
The Derivative of a Function

▶ Regard A ∈ Mat(2 × 2; R) as a linear map R² → R². Then for x, h ∈ R²

A(x + h) = Ax + Ah,

so DA|_x(h) = Ah.
▶ Let tr : Mat(n × n; C) → C be the trace of a square matrix, i.e.,

tr A = tr (a_ij)_{1≤i,j≤n} = Σ_{i=1}^n a_ii.

Then the trace is linear, and for A, H ∈ Mat(n × n; C)

D tr|_A H = tr H.

The First Derivative Slide 307
The Derivative of a Function

10.7. Example. Some examples of derivatives of non-linear maps are as follows:
▶ Let X = V = C be regarded as real vector spaces and f : z ↦ z². Then for z, h ∈ C

(z + h)² = z² + 2zh + h²,

so Df|_z(h) = 2zh.
▶ Let f : R² → R be given by

f(x) = f(x₁, x₂) = x₁ + 2x₂x₁ + x₂².

Then, for h = (h₁, h₂) ∈ R² and x ∈ R²,

f(x + h) = f(x₁ + h₁, x₂ + h₂) = x₁ + h₁ + 2(x₂ + h₂)(x₁ + h₁) + (x₂ + h₂)²
= f(x) + h₁ + 2(h₂x₁ + h₁x₂ + h₂x₂) + 2h₁h₂ + h₂².

The First Derivative Slide 308
The Derivative of a Function as a Matrix

In

f(x + h) = f(x) + h₁ + 2(h₂x₁ + h₁x₂ + h₂x₂) + 2h₁h₂ + h₂²,

the term L_{(x₁,x₂)}h := h₁ + 2(h₂x₁ + h₁x₂ + h₂x₂) is clearly linear in h, while

lim_{h→0} ∥2h₁h₂∥_R/∥h∥_{R²} = lim_{h₁,h₂→0} |2h₁h₂|/√(h₁² + h₂²) = 2 lim_{h₁,h₂→0} |h₂|/√(1 + (h₂/h₁)²).

Since |h₂| → 0 as h₂ → 0 and 1/√(1 + (h₂/h₁)²) is bounded, we see that

lim_{h→0} ∥2h₁h₂∥_R/∥h∥_{R²} = 0,

and so 2h₁h₂ = o(h) as h → 0. Similarly, we show that h₂² = o(h), so we conclude

Df|_x h = (1 + 2x₂)h₁ + 2(x₁ + x₂)h₂.

The First Derivative Slide 309
The Derivative of a Function as a Matrix

Notice that we may express the derivative as a 1 × 2 matrix,

Df|_x h = (1 + 2x₂, 2(x₁ + x₂)) (h₁; h₂).

This is of course not surprising; if X = Rⁿ and V = Rᵐ, i.e., we are considering a function

f : Rⁿ ⊃ Ω → Rᵐ,   f(x₁, …, x_n) = (f₁(x₁, …, x_n); …; f_m(x₁, …, x_n)),

then its derivative at x ∈ Ω (if it exists) is Df|_x ∈ L(Rⁿ, Rᵐ) ≃ Mat(m × n; R). How to obtain this matrix? Denote by e_j the jth standard basis vector in Rⁿ or Rᵐ. We now consider the columns of Df|_x, which are given by Df|_x e_j, j = 1, …, n.

The First Derivative Slide 310
The Derivative of a Function as a Matrix

Assuming that f is differentiable, for any h ∈ R, x ∈ Rⁿ and j = 1, …, n we have

f(x + he_j) = f(x) + Df|_x(he_j) + o(h),

which we may rewrite as

Df|_x e_j = (1/h)(f(x + he_j) − f(x)) + o(1) = (1/h) Σ_{k=1}^m (f_k(x + he_j) − f_k(x)) e_k + o(1).

The (i,j)th element of Df|_x is given by ⟨e_i, Df|_x e_j⟩, so

(Df|_x)_ij = ⟨e_i, Df|_x e_j⟩ = (1/h)(f_i(x + he_j) − f_i(x)) + o(1).

We now take the limit h → 0 to obtain

(Df|_x)_ij = ⟨e_i, Df|_x e_j⟩ = lim_{h→0} (f_i(x + he_j) − f_i(x))/h.

The First Derivative Slide 311
Partial Derivatives

10.8. Definition. Let Ω ⊂ Rⁿ and let f : Ω → R be differentiable on Ω. We then define the partial derivative with respect to x_j at x ∈ Ω by

∂f/∂x_j|_x := lim_{h→0} (f(x + he_j) − f(x))/h
= lim_{h→0} (f(x₁, …, x_{j−1}, x_j + h, x_{j+1}, …, x_n) − f(x))/h.

In this notation, (Df|_x)_ij = ∂f_i/∂x_j, or rather

Df|_x = ( ∂f₁/∂x₁ ⋯ ∂f₁/∂x_n ; ⋮ ⋱ ⋮ ; ∂f_m/∂x₁ ⋯ ∂f_m/∂x_n )|_x.

The First Derivative Slide 312
Partial Derivatives

There are several notations for the partial derivatives of a function. If f : Rⁿ → R, we may use any of the following:

∂f/∂x_j = ∂_{x_j} f = ∂_j f = f_{x_j} = f_j

to denote differentiation w.r.t. the variable x_j. In practice, we calculate the partial derivative w.r.t. x_j by holding all other variables constant and simply differentiating f as a function of x_j.

10.9. Example. Let f(x₁, x₂, x₃) = x₁ sin(x₁x₂x₃) + 3x₂²x₁. Then

∂f/∂x₁ = sin(x₁x₂x₃) + x₁x₂x₃ cos(x₁x₂x₃) + 3x₂²,
∂f/∂x₂ = x₁²x₃ cos(x₁x₂x₃) + 6x₂x₁,
∂f/∂x₃ = x₁²x₂ cos(x₁x₂x₃).
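Example 10.9 is easily confirmed in Mathematica (our own check); D differentiates with respect to one variable while holding the others fixed, which is exactly the partial derivative:

f = x1 Sin[x1 x2 x3] + 3 x2^2 x1;
{D[f, x1], D[f, x2], D[f, x3]}
(* {3 x2^2 + Sin[x1 x2 x3] + x1 x2 x3 Cos[x1 x2 x3],
    6 x1 x2 + x1^2 x3 Cos[x1 x2 x3],
    x1^2 x2 Cos[x1 x2 x3]} *)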
The First Derivative Slide 313
The Jacobian

Of course, if Df|_x exists, we may write it as a matrix of partial derivatives. However, it is not clear whether the existence of all partial derivatives implies the existence of the derivative Df|_x. Thus it is useful to consider the matrix of partial derivatives on its own; in fact, it deserves a special designation.

10.10. Definition. Let Ω ⊂ Rⁿ and f : Ω → Rᵐ. Assume that all partial derivatives ∂f_i/∂x_j of f exist at x ∈ Ω. The matrix

J_f(x) := ( ∂f₁/∂x₁ ⋯ ∂f₁/∂x_n ; ⋮ ⋱ ⋮ ; ∂f_m/∂x₁ ⋯ ∂f_m/∂x_n )|_x

is called the Jacobian of f. If the derivative Df|_x ∈ L(Rⁿ, Rᵐ) exists, J_f(x) ∈ Mat(m × n; R) is the representing matrix of Df|_x w.r.t. the standard bases in Rⁿ and Rᵐ.

The First Derivative Slide 314
The Jacobian

10.11. Example. Let f : R² → R² be given by f(x₁, x₂) = (x₁² + x₂², x₂ − x₁). Then the partial derivatives are

∂f₁/∂x₁ = ∂(x₁² + x₂²)/∂x₁ = 2x₁,   ∂f₁/∂x₂ = ∂(x₁² + x₂²)/∂x₂ = 2x₂,
∂f₂/∂x₁ = ∂(x₂ − x₁)/∂x₁ = −1,    ∂f₂/∂x₂ = ∂(x₂ − x₁)/∂x₂ = 1.

The Jacobian is given by

J_f(x₁, x₂) = ( 2x₁ 2x₂ ; −1 1 ).

The natural question that arises is, "Does the existence of J_f(x) imply the differentiability of f at x?"

The First Derivative Slide 315
The Jacobian

Regrettably, the answer to that question is negative, as the following example shows:

10.12. Example. Let g : R² → R be given by

g(x₁, x₂) = x₁x₂/(x₁² + x₂²) for (x₁, x₂) ≠ (0, 0),   g(0, 0) = 0.

Then all partial derivatives of g exist at x = 0, since

∂g/∂x₁|_{x=0} = lim_{h→0} (g(0 + h, 0) − g(0))/h = 0,
∂g/∂x₂|_{x=0} = lim_{h→0} (g(0, 0 + h) − g(0))/h = 0.

Thus both partial derivatives exist at x = 0 and in fact vanish.

The First Derivative Slide 316
The Jacobian

However, g is not even continuous at 0, since

lim_{h→0} g(h, h) = lim_{h→0} h²/(h² + h²) = 1/2,   lim_{h→0} g(−h, h) = lim_{h→0} −h²/((−h)² + h²) = −1/2.

Thus g cannot be differentiable at x = 0.
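The discontinuity in Example 10.12 is visible symbolically (our own sketch): along the line x₂ = m x₁ the limiting value depends on the slope m.

g[x1_, x2_] := x1 x2/(x1^2 + x2^2);
Limit[g[h, m h], h -> 0]  (* m/(1 + m^2): 1/2 for m = 1, -1/2 for m = -1 *)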
The First Derivative Slide 317
The Jacobian

Thus the existence of the partial derivatives of a function f : Rⁿ → Rᵐ is not even enough to guarantee the continuity of f. However, we have the following result:

10.13. Theorem. Let Ω ⊂ Rⁿ be an open set and f : Ω → Rᵐ such that all partial derivatives ∂_{x_j} f_i exist on Ω.
(i) If all partial derivatives are bounded (there exists a constant M > 0 such that |∂_{x_j} f_i| ≤ M on Ω), then f is continuous, i.e., f ∈ C(Ω, Rᵐ).
(ii) If all partial derivatives are continuous on Ω, then f is continuously differentiable on Ω, i.e., f ∈ C¹(Ω, Rᵐ). In particular,

Df|_x = J_f(x)

for all x ∈ Ω.

The First Derivative Slide 318
The Jacobian

Proof.
Let f : Rⁿ → Rᵐ, f(x) = (f₁(x₁, …, x_n); …; f_m(x₁, …, x_n)). For both statements of the theorem, we need to consider f(x + h) − f(x). To illustrate, let us first look at the case n = 2. Then

f_i(x + h) − f_i(x) = f_i(x₁ + h₁, x₂ + h₂) − f_i(x₁, x₂)
= [f_i(x₁ + h₁, x₂ + h₂) − f_i(x₁ + h₁, x₂)] + [f_i(x₁ + h₁, x₂) − f_i(x₁, x₂)].

The First Derivative Slide 319
The Jacobian

Proof (continued).
For fixed h₁, the first difference can be treated by the Mean Value Theorem 3.2.7 of Vv186: define

g : R → R,   g(y) = f_i(x₁ + h₁, y).

Then there exists a θ₂ between x₂ and x₂ + h₂ such that

f_i(x₁ + h₁, x₂ + h₂) − f_i(x₁ + h₁, x₂) = g(x₂ + h₂) − g(x₂) = h₂ · g′(θ₂) = h₂ ∂₂f_i(x₁ + h₁, x₂ + τ₂h₂),

where we have chosen τ₂ ∈ (0, 1) such that θ₂ = x₂ + τ₂h₂.

The First Derivative Slide 320
The Jacobian

Proof (continued).
Similarly, we find that

f_i(x₁ + h₁, x₂) − f_i(x₁, x₂) = h₁ ∂f_i/∂x₁(x₁ + τ₁h₁, x₂)

for some τ₁ ∈ (0, 1). Generalizing to n ≥ 2, we have constants τ₁, …, τ_n ∈ (0, 1) such that

f_i(x + h) − f_i(x) = f_i(x₁ + h₁, x₂ + h₂, …, x_n + h_n) − f_i(x₁, x₂ + h₂, …, x_n + h_n)
+ f_i(x₁, x₂ + h₂, …, x_n + h_n) − f_i(x₁, x₂, x₃ + h₃, …, x_n + h_n)
+ ⋯ + f_i(x₁, x₂, …, x_{n−1}, x_n + h_n) − f_i(x₁, x₂, …, x_n)
= h₁ ∂₁f_i(x₁ + τ₁h₁, x₂ + h₂, …, x_n + h_n)
+ h₂ ∂₂f_i(x₁, x₂ + τ₂h₂, x₃ + h₃, …, x_n + h_n)
+ ⋯ + h_n ∂_nf_i(x₁, x₂, …, x_n + τ_nh_n).

The First Derivative Slide 321
The Jacobian

Proof (continued).
We proceed with the proof of the theorem.
(i) Suppose that the partial derivatives are bounded. We want to prove that f is continuous at x ∈ Ω, i.e., lim_{h→0} f(x + h) = f(x), where we are free to choose arbitrary norms in Rⁿ and Rᵐ for the convergence. In both spaces we choose the maximum norm ∥·∥∞ (see (8.4)):

∥f(x + h) − f(x)∥∞ = max_{i=1,…,m} |f_i(x + h) − f_i(x)|
≤ n · max_{j=1,…,n} |h_j| · max_{i,j} sup_{x∈Ω} |∂f_i/∂x_j(x)|
≤ n · M · ∥h∥∞ → 0 as h → 0.

The First Derivative Slide 322
The Jacobian

Proof (continued).
(ii) Write

L = (∂f_i/∂x_j) = (L_ij)_{i=1,…,m; j=1,…,n}

for the Jacobian. We want to show that f(x + h) − f(x) − Lh = o(h) as h → 0. We again choose the maximum norm ∥·∥∞ to establish the convergence and write

u_j = (x₁, …, x_{j−1}, x_j + τ_jh_j, x_{j+1} + h_{j+1}, …, x_n + h_n)

for j = 1, …, n. We have the following estimate:

The First Derivative Slide 323
The Jacobian

Proof (continued).

∥f(x + h) − f(x) − Lh∥∞ = max_{i=1,…,m} |f_i(x + h) − f_i(x) − Σ_{j=1}^n L_ij h_j|
= max_{i=1,…,m} |Σ_{j=1}^n h_j (∂_jf_i(u_j) − ∂_jf_i(x))|
≤ ∥h∥∞ Σ_{j=1}^n max_{i=1,…,m} |∂_jf_i(u_j) − ∂_jf_i(x)|
= o(h) as h → 0,

since the sum converges to 0 as h → 0. Observe that we use the assumption that ∂_jf_i is continuous at x. This proves that f is differentiable, L = Df|_x, and Df|_x depends continuously on x.

The First Derivative Slide 324
The Jacobian

10.14. Remark. Let Ω ⊂ Rⁿ be an open set. Then

C¹(Ω, Rᵐ) = {f : Ω → Rᵐ : ∂_jf_i is continuous for j = 1, …, n and i = 1, …, m}.

If m = 1, we write C¹(Ω) := C¹(Ω, R) for short.

We will next establish the product and chain rules for differentiation.

The First Derivative Slide 325
Generalized Products

To avoid having to re-prove the product rule for the various types of products that we will encounter, we first define a generalized product through precisely those properties that we shall need.

10.15. Definition. Let X₁, X₂, V be normed vector spaces. A map ⊙ : X₁ × X₂ → V is called a (generalized) product if
1. ⊙ is bilinear, i.e., linear in each entry, and
2. ∥u ⊙ v∥_V ≤ ∥u∥_{X₁}∥v∥_{X₂} for all u ∈ X₁, v ∈ X₂.

10.16. Examples.
1. The scalar product in Rⁿ;
2. The cross product × : R³ × R³ → R³;
3. For a compact non-empty set K ⊂ Rⁿ and f, g ∈ C(K, R), the pointwise product f · g ∈ C(K, R), defined by (f · g)(x) = f(x)g(x).
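For the first two products in Examples 10.16, the defining estimate ∥u ⊙ v∥ ≤ ∥u∥∥v∥ is the Cauchy-Schwarz inequality and, respectively, the identity ∥u × v∥ = ∥u∥∥v∥|sin θ| ≤ ∥u∥∥v∥. A random spot-check (our own):

u = RandomReal[{-1, 1}, 3]; v = RandomReal[{-1, 1}, 3];
Abs[u . v] <= Norm[u] Norm[v]          (* True *)
Norm[Cross[u, v]] <= Norm[u] Norm[v]   (* True *)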
The First Derivative Slide 326
The Product Rule

10.17. Product Rule. Let U, X₁, X₂, V be finite-dimensional vector spaces and Ω ⊂ U an open set. Let f : Ω → X₁ and g : Ω → X₂ be differentiable maps and ⊙ : X₁ × X₂ → V a generalized product. Then f ⊙ g : Ω → V is also differentiable and

D(f ⊙ g) = (Df) ⊙ g + f ⊙ (Dg).   (10.2)

At x ∈ Ω the right-hand side is interpreted as a linear map U → V,

u ↦ D(f ⊙ g)|_x u = (Df|_x u) ⊙ g(x) + f(x) ⊙ (Dg|_x u).   (10.3)

The First Derivative Slide 327
The Product Rule

Proof.
The proof is similar to that for the product rule for functions of one variable. We telescope the difference,

f(x + h) ⊙ g(x + h) − f(x) ⊙ g(x)
= f(x + h) ⊙ (g(x + h) − g(x)) + (f(x + h) − f(x)) ⊙ g(x)
= (f(x) + O(h)) ⊙ (Dg|_x h + o(h)) + (Df|_x h + o(h)) ⊙ g(x)

as h → 0. Extending the relevant limit theorems from the pointwise product to the generalized product, we have

f(x + h) ⊙ g(x + h) − f(x) ⊙ g(x)
= f(x) ⊙ (Dg|_x h) + O(∥h∥²) + o(h) + (Df|_x h) ⊙ g(x) + o(h)
= f(x) ⊙ (Dg|_x h) + (Df|_x h) ⊙ g(x) + o(h).

The First Derivative Slide 328
The Chain Rule

10.18. Chain Rule. Let U, X, V be finite-dimensional vector spaces and Ω ⊂ U, Ω′ ⊂ X open sets. Let g : Ω → Ω′ and f : Ω′ → V be differentiable maps. Then the composition f ∘ g : Ω → V is also differentiable, and for all x ∈ Ω

D(f ∘ g)|_x = Df|_{g(x)} ∘ Dg|_x,   (10.4)

where the right-hand side is a composition of linear maps.

The proof is basically identical to that of 186 Theorem 3.1.12, the chain rule for functions of one real variable. You are encouraged to revisit that proof and apply it to the general chain rule here.

10.19. Example. Consider the polar coordinates (r, φ) ∈ (0, ∞) × [0, 2π), defined through the map

Φ(r, φ) = (r cos φ; r sin φ).

The First Derivative Slide 329
The Chain Rule

Then

DΦ|_(r,φ) = ( ∂Φ₁/∂r ∂Φ₁/∂φ ; ∂Φ₂/∂r ∂Φ₂/∂φ ) = ( cos φ −r sin φ ; sin φ r cos φ ).

Next, consider the map U : R² → R, (x₁, x₂) ↦ x₁² + x₂². The derivative is

DU|_x = (∂U/∂x₁, ∂U/∂x₂) = (2x₁, 2x₂).

Now (U ∘ Φ)(r, φ) = (r cos φ)² + (r sin φ)² = r². Clearly, D(U ∘ Φ)|_(r,φ) = (2r, 0). We can also apply the chain rule:

D(U ∘ Φ)|_(r,φ) = DU|_{(r cos φ, r sin φ)} DΦ|_(r,φ)
= (2r cos φ, 2r sin φ) ( cos φ −r sin φ ; sin φ r cos φ )
= (2r cos²φ + 2r sin²φ, −2r² cos φ sin φ + 2r² sin φ cos φ)
= (2r, 0).

The First Derivative Slide 330
The Mean Value Theorem

One of the most important properties of differentiable, single-variable functions that we encountered in Vv186 was the mean value theorem: If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there exists a number ξ ∈ (a, b) such that

(f(b) − f(a))/(b − a) = f′(ξ).

We may ask how this might be generalized to functions of several variables: if f : Rⁿ → Rᵐ is differentiable, will it still be true that

f(y) − f(x) = Df|_ξ (y − x)

for some ξ ∈ Rⁿ, perhaps lying on a straight line between x and y?

The First Derivative Slide 331
The Mean Value Theorem

In the case of a scalar function f : X → R, the theorem will still hold, since we can set

γ(t) = tx + (1 − t)y,   t ∈ [0, 1],

and simply consider f ∘ γ : [0, 1] → R. Problems will occur, however, if f is vector-valued:

10.20. Example. The function

f : [0, 2π] → R²,   f(x) = (cos x; sin x),

satisfies f(0) = f(2π) = (1; 0), but Df|_ξ ≠ (0; 0) for all ξ ∈ (0, 2π).

However, we may save a version of the mean value theorem by considering integrals.
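Example 10.20 can be checked symbolically (our own sketch):

f[t_] := {Cos[t], Sin[t]};
f[0] == f[2 Pi]                             (* True *)
Reduce[f'[t] == {0, 0} && 0 < t < 2 Pi, t]  (* False: the derivative never vanishes *)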
The Regulated Integral for Vector-Valued Functions Slide 332
11. The Regulated Integral for Vector-Valued Functions

The Regulated Integral for Vector-Valued Functions Slide 333
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

The Regulated Integral for Vector-Valued Functions Slide 334
Integrals of Vector-Space-Valued Functions

The following important result will require the integral of a function of a single variable, albeit with values in a vector space V. In other words, we need to assign a meaning to

∫_a^b f(x) dx,  where f : [a, b] → V.

Fortunately, the procedure is completely analogous to that for functions f : R → R, at least for the regulated integral: we define step functions on [a, b] with respect to a partition P by setting them constant on sub-intervals of the partition.

11.1. Definition. Let V be a real or complex vector space. A function f : [a, b] → V is called a step function with respect to a partition P = (a₀, …, a_n) if there exist elements y_i ∈ V such that f(t) = y_i whenever a_{i−1} < t < a_i, i = 1, …, n. We denote the set of all step functions by Step([a, b]; V).

The Regulated Integral for Vector-Valued Functions Slide 335
Step Functions

11.2. Example. The map f : [0, 1] → R² with

f(x) = (0; 1/2) for 0 ≤ x < 1/2,   f(x) = (1; 1) for x = 1/2,   f(x) = (2; 0) for 1/2 < x ≤ 1

is a step function.

We then follow through with analogous definitions to the ones for real functions, replacing the modulus in R by the norm in a vector space:

11.3. Definition. Let I ⊂ R be an interval and (V, ∥·∥_V) a normed vector space. We say that a function f : I → V is bounded if

∥f∥∞ := sup_{x∈I} ∥f(x)∥_V < ∞.   (11.1)

The set of all bounded functions f : I → V is denoted L∞(I; V).

The Regulated Integral for Vector-Valued Functions Slide 336
Bounded Functions

11.4. Example. The map

f : R → R²,   f(t) = (sin t; e^{−t²}),

is a bounded map. To see this, we endow R² with the norm ∥x∥₁ := |x₁| + |x₂|. (Since all norms in Rⁿ are equivalent, it doesn't matter which norm we take.) Then

∥f∥∞ := sup_{t∈R} ∥f(t)∥₁ = sup_{t∈R} (|sin t| + |e^{−t²}|) ≤ sup_{t∈R} |sin t| + sup_{t∈R} |e^{−t²}| = 2 < ∞.

The Regulated Integral for Vector-Valued Functions Slide 337
Integrals of Step Functions

We then define the integral of a step function as before:

11.5. Theorem. Let f : [a, b] → V be a step function with respect to some partition P. Then

I_P(f) := (a₁ − a₀)y₁ + ⋯ + (a_n − a_{n−1})y_n ∈ V

is independent of the choice of the partition P and is called the integral of f.

Note that if f : [a, b] → V, then ∫_a^b f(x) dx ∈ V; the integral of f is an element of the vector space V. (This makes it impossible to define the Riemann integral for functions f : I → V, because the Riemann integral relies on comparing the sizes of upper and lower step functions, and a general vector space carries no order relation.)

The Regulated Integral for Vector-Valued Functions Slide 338
Integrals of Step Functions

The main ingredient is again uniform convergence, where we now say that a sequence of functions (f_n), f_n : I → V, I ⊂ R, converges uniformly to f : I → V in a normed vector space (V, ∥·∥_V) if

∥f_n − f∥∞ := sup_{x∈I} ∥f_n(x) − f(x)∥_V → 0 as n → ∞.

A function f is then said to be regulated if it is the uniform limit of a sequence of step functions. We can then define the integral of f as the limit of the integrals of these step functions. You are invited to check that everything in fact works just as in the regulated integral for scalar real functions!
The Regulated Integral for Vector-Valued Functions Slide 339
Integrals of Step Functions

The upshot is the following: if f : [a, b] → Rⁿ is piecewise continuous, then f is regulated and

∫_a^b f(x) dx = ∫_a^b (f₁(x); …; f_n(x)) dx = (∫_a^b f₁(x) dx; …; ∫_a^b f_n(x) dx).

(This follows because a sequence of step functions converging uniformly to f will converge uniformly in each component; the individual components are then equal to the "usual" regulated integrals of real-valued functions.) Furthermore, we have the standard estimate

∥∫_a^b f(x) dx∥_V ≤ ∫_a^b ∥f(x)∥_V dx ≤ |b − a| · sup_{x∈[a,b]} ∥f(x)∥_V.

The Regulated Integral for Vector-Valued Functions Slide 340
The Mean Value Theorem

We may now write down an "integral version" of the mean value theorem (186 Theorem 3.2.7):

11.6. Mean Value Theorem. Let X, V be finite-dimensional vector spaces, Ω ⊂ X open and f ∈ C¹(Ω, V). Let x, y ∈ Ω and assume that the line segment x + ty, 0 ≤ t ≤ 1, is wholly contained in Ω. Then

f(x + y) − f(x) = ∫₀¹ Df|_{x+ty} y dt = (∫₀¹ Df|_{x+ty} dt) y.   (11.2)

11.7. Remark. The integrals in (11.2) are integrals of elements of V (the integrand Df|_{x+ty} y) and of L(X, V) (the integrand Df|_{x+ty}). Hence the second equality is not trivial but needs to be proved.

The Regulated Integral for Vector-Valued Functions Slide 341
The Mean Value Theorem

The Mean Value Theorem 11.6 can also be understood as a generalization of the fundamental theorem of calculus: for single-variable functions, the fundamental theorem of calculus can be expressed as

f(x + y) − f(x) = ∫_x^{x+y} f′(ξ) dξ.

Substituting t = (ξ − x)/y in the integral, we have the equivalent identity

f(x + y) − f(x) = ∫₀¹ f′(x + yt) y dt,

a special case of (11.2).

The Regulated Integral for Vector-Valued Functions Slide 342
The Mean Value Theorem

Proof of Theorem 11.6.
Define the auxiliary function g ∈ C¹([0, 1]; V) by g(t) := f(x + ty). Thus (by 186 Lemma 4.2.3) we have

f(x + y) − f(x) = g(1) − g(0) = ∫₀¹ g′(t) dt.

For γ(t) = x + ty we have γ′(t) = y. Applying the chain rule,

g′(t) = D(f ∘ γ)|_t = Df|_{γ(t)} Dγ|_t = Df|_{x+ty} y.

Thus we obtain

f(x + y) − f(x) = ∫₀¹ Df|_{x+ty} y dt,

proving the first equality.

The Regulated Integral for Vector-Valued Functions Slide 343
The Mean Value Theorem

Proof of Theorem 11.6 (continued).
We now prove that y may be "taken out" of the integral. Let us abbreviate L(t) = Df|_{x+ty}. For z ∈ (0, 1) we have

d/dz ∫₀^z L(t)y dt = L(z)y = d/dz {(∫₀^z L(t) dt) y}.

Furthermore, setting z = 0 we have

∫₀⁰ L(t)y dt = 0 = (∫₀⁰ L(t) dt) y.

Therefore,

∫₀^z L(t)y dt = (∫₀^z L(t) dt) y

for all z ∈ [0, 1], in particular also for z = 1.

The Regulated Integral for Vector-Valued Functions Slide 344
The Mean Value Theorem

11.8. Corollary. From the standard estimate

∥∫_a^b f(t) dt∥_V ≤ |b − a| · sup_{t∈[a,b]} ∥f(t)∥_V,

Theorem 11.6 yields

∥f(x + y) − f(x)∥_V ≤ ∥y∥_X · sup_{0≤t≤1} ∥Df|_{x+ty}∥,

where ∥Df|_{x+ty}∥ denotes the operator norm of Df|_{x+ty} ∈ L(X, V).
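A numeric spot-check of (11.2) for a concrete C¹ map (our own example; NIntegrate handles the vector-valued integrand componentwise):

f[{a_, b_}] := {a^2 b, Sin[b]};
jac[{a_, b_}] = D[f[{a, b}], {{a, b}}];   (* symbolic Jacobian, i.e. Df *)
x = {0.2, 0.3}; y = {0.5, -0.1};
Chop[f[x + y] - f[x] - NIntegrate[jac[x + t y] . y, {t, 0, 1}], 10^-6]
(* {0, 0} *)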
The Regulated Integral for Vector-Valued Functions Slide 345
Differentiating Under an Integral

We close this section with a useful result concerning the interchanging of differentiation and integration.

11.9. Theorem. Let X, V be finite-dimensional, normed vector spaces, I = [a, b] ⊂ R an interval and Ω ⊂ X an open set. Let f : I × Ω → V be a continuous function such that Df(t, ·)|_x exists and is continuous at every (t, x) ∈ I × Ω. Then

g(x) = ∫_a^b f(t, x) dt

is differentiable in Ω and

Dg(x) = ∫_a^b Df(t, ·)|_x dt.

The Regulated Integral for Vector-Valued Functions Slide 346
Differentiating Under an Integral

Proof.
Fix x ∈ Ω and choose h small enough such that x + h ∈ Ω. In any case, we assume ∥h∥_X < 1. We need to prove

g(x + h) − g(x) − (∫_a^b Df(t, ·)|_x dt) h = o(h),

where we abbreviate L := ∫_a^b Df(t, ·)|_x dt. By the Mean Value Theorem, the left-hand side equals

∫_a^b (f(t, x + h) − f(t, x) − Df(t, ·)|_x h) dt
= ∫_a^b ((∫₀¹ Df(t, ·)|_{x+sh} ds) h − Df(t, ·)|_x h) dt
= ∫_a^b (∫₀¹ (Df(t, ·)|_{x+sh} − Df(t, ·)|_x) ds) h dt.   (11.3)

The Regulated Integral for Vector-Valued Functions Slide 347
Differentiating Under an Integral

Proof (continued).
Taking the norm, we have

∥g(x + h) − g(x) − Lh∥_V
≤ (b − a) sup_{t∈[a,b]} ∥∫₀¹ (Df(t, ·)|_{x+sh} − Df(t, ·)|_x) ds∥ · ∥h∥_X
≤ (b − a) sup_{t∈[a,b]} sup_{s∈[0,1]} ∥Df(t, ·)|_{x+sh} − Df(t, ·)|_x∥ · ∥h∥_X.

Here ∥·∥ denotes the operator norm (4.7). We now need to show that

sup_{t∈[a,b]} sup_{s∈[0,1]} ∥Df(t, ·)|_{x+sh} − Df(t, ·)|_x∥

vanishes as h → 0. However, this requires careful reasoning, because s and t are free to vary independently of h.

The Regulated Integral for Vector-Valued Functions Slide 348
Differentiating Under an Integral

Proof (continued).
Define the function Ψ : [a, b] × Ω₀ → L(X, V), where Ω₀ := {h ∈ X : x + h ∈ Ω},

Ψ(t, h) = Df(t, ·)|_{x+h} − Df(t, ·)|_x.

Since f is assumed to be continuously differentiable, Ψ is continuous and lim_{h→0} ∥Ψ(t, h)∥ = 0 for any t ∈ [a, b]. By Lemma 9.16 we see that

lim_{h→0} sup_{t∈[a,b]} ∥Ψ(t, h)∥ = 0.

Furthermore, by Lemma 9.15,

lim_{h→0} sup_{s∈[0,1]} sup_{t∈[a,b]} ∥Ψ(t, sh)∥ = 0,

which is what we wanted to show.

The Regulated Integral for Vector-Valued Functions Slide 349
Differentiating Under an Integral

Differentiating an integral with respect to a parameter can be very useful for calculating integrals that are otherwise difficult to evaluate directly. For example, by differentiating

g(x) := ∫₀^∞ (sin t)/t · e^{−xt} dt

with respect to x, you will show in the assignments that

g(0) = ∫₀^∞ (sin t)/t dt = π/2.

(Compare with the discussion of the Dirichlet integral last term, see 186 Example 4.2.14.)

Curves, Orientation, and Tangent Vectors Slide 350
12. Curves, Orientation, and Tangent Vectors

Curves, Orientation, and Tangent Vectors Slide 351
Continuity, Differentiability, Integrability

8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Curves, Orientation, and Tangent Vectors Slide 352
Curves

As an important application, we consider vector-space-valued functions of a single variable, γ : R → V. These play an important role in the parametrization of curves. In many applications V = Rⁿ, but our results are applicable to curves in general normed vector spaces (V, ∥·∥).

12.1. Definition. Let (V, ∥·∥) be a normed vector space and I ⊂ R an interval.
▶ A set C ⊂ V for which there exists a continuous, surjective and locally injective map γ : I → C is called a curve.
▶ The map γ is called a parametrization of C.
▶ A curve C together with a parametrization γ, i.e., the pair (C, γ), is called a parametrized curve.

12.2. Remark. Here locally injective means that in some neighborhood B_ε(x) ∩ I of any point x ∈ I the parametrization is injective.
Curves, Orientation, and Tangent Vectors Slide 354
Curves

12.5. Example. The set

$$S^1 := \{ (x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1 \}$$

is a curve in $\mathbb{R}^2$ because we can find a parametrization, e.g.,

$$\gamma : [0, 2\pi] \to S^1, \qquad \gamma(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}.$$

It is clear that $\gamma$ is continuous. Furthermore, $\operatorname{ran} \gamma \subset S^1$ since $\cos^2 t + \sin^2 t = 1$ for all $t \in [0, 2\pi]$. The map $\gamma$ is not injective, since $\gamma(0) = \gamma(2\pi) = (1, 0)$, but it is injective on $(0, 2\pi)$: if $\gamma(t_1) = \gamma(t_2) \neq (1,0)$, then $\cos t_1 = \cos t_2$ and $\sin t_1 = \sin t_2$. The second equation implies $t_1, t_2 \in (0, \pi]$ or $t_1, t_2 \in (\pi, 2\pi)$. However, since the cosine function is injective on $(0, \pi]$ and on $(\pi, 2\pi)$, we obtain $t_1 = t_2$. Hence $\gamma$ is locally injective.

Curves, Orientation, and Tangent Vectors Slide 355
Curves

Now suppose $(x_1, x_2) \in S^1$ is given. Then taking

$$t_0 = \begin{cases} \arctan(x_2/x_1) & \text{if } x_2 > 0,\ x_1 \neq 0, \\ \pi/2 & \text{if } (x_1, x_2) = (0, 1), \\ \pi + \arctan(x_2/x_1) & \text{if } x_2 < 0,\ x_1 \neq 0, \\ 3\pi/2 & \text{if } (x_1, x_2) = (0, -1) \end{cases} \tag{12.1}$$

(using a suitable branch of the inverse tangent) gives $\gamma(t_0) = (x_1, x_2)$ for some $t_0 \in [0, 2\pi]$. Thus, $\gamma$ is surjective and therefore a parametrization. Another parametrization is

$$\tilde\gamma : [0, 1] \to S^1, \qquad \tilde\gamma(t) = \begin{pmatrix} \cos(2\pi t) \\ -\sin(2\pi t) \end{pmatrix}.$$

Both $(C, \gamma)$ and $(C, \tilde\gamma)$ are parametrized curves.

Curves, Orientation, and Tangent Vectors Slide 356
Parametrizations of Curves

It is clear that a curve will have an infinite number of parametrizations. Physically, a curve $C$ might be considered to be the path of a particle, while the parametrization $\gamma$ gives the position of the particle at each time $t$. Hence, in Example 12.5, $\gamma$ describes the counter-clockwise movement of a particle around the unit circle, while $\tilde\gamma$ describes a clockwise movement around the same path. The parametrization $\gamma$ corresponds to completing the path in time $2\pi$, while $\tilde\gamma$ corresponds to completing the path in time 1. Hence, the parametrization $\tilde\gamma$ can be said to correspond to a greater velocity of the particle.

Curves, Orientation, and Tangent Vectors Slide 357
Simple, Open and Closed Curves

12.6. Definition. Let $C \subset V$ be a curve possessing a parametrization $\gamma : I \to C$ with $\operatorname{int} I = (a,b)$ for $-\infty \le a < b \le \infty$.
(i) If $\gamma$ is a (globally) injective parametrization, we say that $C$ is a simple curve.
(ii) If
$$\lim_{t \to a} \gamma(t) = \lim_{t \to b} \gamma(t),$$
the curve $C$ is said to be closed.
(iii) If a curve is not closed, it is said to be open.

The points

$$x := \lim_{t \to a} \gamma(t) \qquad \text{and} \qquad y := \lim_{t \to b} \gamma(t)$$

are called the initial point and the final point of the parametrized curve $(C, \gamma)$. The open curve is said to join $x$ and $y$.

Curves, Orientation, and Tangent Vectors Slide 358
Sketches of Simple, Open and Closed Curves

Curves, Orientation, and Tangent Vectors Slide 359
Initial and Final Points of Open Curves

12.7. Remark. Whether a point is an initial or final point of an open curve depends on the parametrization. We will explore this a little further.

12.8. Example. The simple open curve

$$C = \{ (x_1, x_2) \in \mathbb{R}^2 : 0 \le x_1 \le 1,\ x_2 = x_1^2 \}$$

joins the points $x = (0,0)$ and $y = (1,1)$. Either may be considered the initial point or the final point. Possible parametrizations are

$$\gamma(t) = \begin{pmatrix} t \\ t^2 \end{pmatrix}, \qquad \tilde\gamma(t) = \begin{pmatrix} 1-t \\ (1-t)^2 \end{pmatrix},$$

where both $\gamma, \tilde\gamma : [0,1] \to C$.
Curves, Orientation, and Tangent Vectors Slide 360
Reparametrization of Curves

12.9. Definition. Let $C \subset V$ be a curve with parametrization $\gamma : I \to C$.
(i) Let $J \subset \mathbb{R}$ be an interval. A continuous, bijective map $r : J \to I$ is called a reparametrization of the parametrized curve $(C, \gamma)$.
(ii) If $r$ is increasing, the reparametrization is said to be orientation-preserving.
(iii) If $r$ is decreasing, the reparametrization is said to be orientation-reversing.

12.10. Remarks.
(i) Given any two parametrizations $\gamma, \tilde\gamma$ of an open curve, one can always find a reparametrization by setting $r = \gamma^{-1} \circ \tilde\gamma$ (the continuity and local injectivity is enough for this definition to make sense).
(ii) Since every continuous bijective map in $\mathbb{R}$ is either decreasing or increasing (see 186 Theorem 2.5.20), it follows that a reparametrization is either orientation-preserving or orientation-reversing.

Curves, Orientation, and Tangent Vectors Slide 361
Reparametrization of Curves

12.11. Example. Consider the unit circle $S^1$ of Example 12.5 with parametrizations

$$\gamma : [0, 2\pi] \to S^1, \quad \gamma(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}, \qquad \tilde\gamma : [0,1] \to S^1, \quad \tilde\gamma(t) = \begin{pmatrix} \cos(2\pi t) \\ -\sin(2\pi t) \end{pmatrix}.$$

Then $r : [0,1] \to [0, 2\pi]$, $r(t) = 2\pi(1-t)$, is a reparametrization of the parametrized curve $(C, \gamma)$. In fact,

$$\tilde\gamma = \gamma \circ r.$$

The reparametrization is not orientation-preserving, since $r'(t) = -2\pi < 0$.

Curves, Orientation, and Tangent Vectors Slide 362
Orientation of Curves

A reparametrization of a parametrized curve $(C, \gamma)$ yields a new parametrized curve $(C, \tilde\gamma)$, where $\tilde\gamma = \gamma \circ r$. It is easy to see that an orientation-preserving reparametrization of an open curve $(C, \gamma)$ yields an open parametrized curve $(C, \tilde\gamma)$ with the same initial and final points.

12.12. Definition. Let $(C, \gamma)$ be a parametrized curve and $r$ a reparametrization of $(C, \gamma)$. The curve $(C, \tilde\gamma)$ with $\tilde\gamma = \gamma \circ r$ is said to have the same orientation as $(C, \gamma)$ if $r$ is orientation-preserving. Otherwise, it is said to have reverse orientation.

12.13. Remark. The orientation of an open curve can be fixed by selecting the initial and final points. The orientation of a closed curve can be fixed by splitting the curve into two disjoint simple curves and fixing appropriate orientations for them.

Curves, Orientation, and Tangent Vectors Slide 363
Orientation of Curves

Hence a curve can have two possible orientations. If we want to fix a curve $C$ together with an orientation (but not necessarily a concrete parametrization), we denote it by $C^*$ and if necessary give a single parametrization $\gamma$ so that $(C, \gamma)$ has the desired orientation. The same curve with opposite orientation is denoted by $-C^*$. (This will be quite important when we discuss integration later.)

There is in general no natural way to select a "proper" or positive orientation of a curve; rather, both possible orientations have equal validity. There is a single exception, however:

12.14. Definition. Let $(C, \gamma)$ be a parametrized, simple, closed curve in $\mathbb{R}^2$. Then $C$ is said to have positive orientation if $\gamma$ traverses $C$ in a counter-clockwise direction.
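Example 12.11 lends itself to a quick numerical sanity check. The following sketch is our own illustration (NumPy and the sample points are our choices): it verifies $\tilde\gamma = \gamma \circ r$ pointwise, with $r'(t) = -2\pi < 0$ confirming that the reparametrization reverses orientation.

```python
import numpy as np

def gamma(t):          # gamma : [0, 2*pi] -> S^1, counter-clockwise
    return np.array([np.cos(t), np.sin(t)])

def gamma_tilde(t):    # gamma~ : [0, 1] -> S^1, clockwise
    return np.array([np.cos(2 * np.pi * t), -np.sin(2 * np.pi * t)])

def r(t):              # r : [0, 1] -> [0, 2*pi], with r'(t) = -2*pi < 0
    return 2 * np.pi * (1 - t)

ts = np.linspace(0.0, 1.0, 11)
assert all(np.allclose(gamma_tilde(t), gamma(r(t))) for t in ts)
```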
Curves, Orientation, and Tangent Vectors Slide 364
Curves in Polar Coordinates

When we previously introduced polar coordinates in $\mathbb{C}$, we remarked that there is a one-to-one correspondence

$$\mathbb{C} \setminus \{0\} \ni x + iy \leftrightarrow (r, \varphi) \in \mathbb{R}^+ \times [0, 2\pi).$$

We want to adapt this to $\mathbb{R}^2$ instead of $\mathbb{C}$, i.e., associate an angle $\varphi$ and a distance $r$ to every point $(x_1, x_2) \in \mathbb{R}^2$. One of the main difficulties stems from the fact that we cannot associate an angle $\varphi$ to $x = 0$. However, if we do not focus on associating an angle $\varphi$ to every point $x \in \mathbb{R}^2$, but only on finding a Cartesian point $(x_1, x_2) \in \mathbb{R}^2$ given $(r, \varphi)$, we can be a bit more flexible.

Curves, Orientation, and Tangent Vectors Slide 365
Curves in Polar Coordinates

We will allow $(r, \varphi) \in \mathbb{R}^2$, and associate to them a point $x \in \mathbb{R}^2$ as follows:

$$x = \begin{pmatrix} r \cos\varphi \\ r \sin\varphi \end{pmatrix}.$$

Of course this association is not injective, but this will not matter for our present purposes. We consider a particular type of curve, defined through the map

$$\gamma(t) = \begin{pmatrix} f(t) \cos t \\ f(t) \sin t \end{pmatrix}, \tag{12.2}$$

where $f : \mathbb{R} \to \mathbb{R}$ is some function. For short, such a curve is sometimes written as

$$r = f(t). \tag{12.3}$$

A curve given by (12.3) is known as a curve in polar coordinates. The equation (12.3) is to be interpreted in the sense of (12.2).

Curves, Orientation, and Tangent Vectors Slide 366
Curves in Polar Coordinates

12.15. Example. The cardioid is the plane curve given in polar coordinates by $r = 1 - \sin t$.

[Figure: the cardioid $r = 1 - \sin t$ in the $x$-$y$-plane.]

Curves, Orientation, and Tangent Vectors Slide 367
Smooth Curves

12.16. Definition. A curve $C \subset V$ is said to be smooth if there exists a parametrization $\gamma : I \to C$ such that
(i) $\gamma$ is continuously differentiable on $\operatorname{int} I$ and
(ii) $\gamma'(t) \neq 0$ for all $t \in \operatorname{int} I$.

A smooth reparametrization is a reparametrization that is continuously differentiable with non-vanishing derivative in the interior of its domain.

The Jacobian $J_\gamma = \gamma'$ of a smooth curve $\gamma : I \to \mathbb{R}^n$ is given by

$$\gamma'(t) = \begin{pmatrix} \gamma_1'(t) \\ \vdots \\ \gamma_n'(t) \end{pmatrix}, \qquad t \in \operatorname{int} I.$$

Curves, Orientation, and Tangent Vectors Slide 368
Graphs of Functions as Curves

Let us consider the case of the graph $\Gamma$ of a function $f : I \to \mathbb{R}$, $I \subset \mathbb{R}$ an interval: it is defined as the set

$$\Gamma = \{ (x, y) \in \mathbb{R}^2 : x \in I,\ y = f(x) \}.$$

This set can be regarded as a curve with parametrization

$$\gamma : I \to \mathbb{R}^2, \qquad t \mapsto \begin{pmatrix} t \\ f(t) \end{pmatrix}.$$

12.17. Example. Consider the function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^2$. Its graph is just the curve parametrized by

$$\gamma(t) = \begin{pmatrix} \gamma_1(t) \\ \gamma_2(t) \end{pmatrix} = \begin{pmatrix} t \\ t^2 \end{pmatrix}.$$

Curves, Orientation, and Tangent Vectors Slide 369
Curves and Graphs of Functions

[Figure: the curve $\gamma(t) = (t, t^2)$, with the points at $t = -1.5$, $t = 0$ and $t = 1$ marked.]

Curves, Orientation, and Tangent Vectors Slide 370
Curves and Graphs of Functions

By our previous considerations, $\gamma$ is differentiable and

$$\gamma'(t) = \begin{pmatrix} \gamma_1'(t) \\ \gamma_2'(t) \end{pmatrix} = \begin{pmatrix} 1 \\ 2t \end{pmatrix}.$$

The graph of $\gamma'$ is quite unspectacular.

[Figure: the graph of $\gamma'(t) = (1, 2t)$, with the points at $t = 1/2$ and $t = -4/5$ marked.]

Curves, Orientation, and Tangent Vectors Slide 371
Tangent Lines of Curves

So what is the interpretation of $\gamma'$? If $\gamma = (t, f(t))$ parametrizes the graph of some function $f$, then $\gamma'(t) = (1, f'(t))$. Recall that the derivative satisfies

$$\gamma(t_0 + t) = \gamma(t_0) + \gamma'(t_0)\, t + o(t),$$

so $\gamma(t_0) + \gamma'(t_0)\, t$ is a linear approximation to the parametrization $\gamma$ at a point $t_0$. In fact, if $C \subset V$ is a curve and $p = \gamma(t_0) \in C$, then

$$T_p C = \{ x \in V : x = \gamma(t_0) + \gamma'(t_0)\, t,\ t \in \mathbb{R} \}$$

gives the tangent line to $C$ at $p$.

Curves, Orientation, and Tangent Vectors Slide 372
Tangent Lines of Curves

Continuing from Example 12.17, we have the following tangent line $T_p \Gamma$ for $p = (1,1)$:

[Figure: $\Gamma = \{x : x = \gamma(t)\}$ together with the tangent line $T_p \Gamma$ at $p = \gamma(1) = (1,1)$.]

Curves, Orientation, and Tangent Vectors Slide 373
Tangent Lines of Curves

12.18. Example. Consider the curve parametrized by

$$\gamma : [0, 8\pi] \to \mathbb{R}^3, \qquad \gamma(t) = \begin{pmatrix} t \\ \cos t \\ \sin t \end{pmatrix}.$$

This curve is called a helix.

[Figure: the helix in $\mathbb{R}^3$.]
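The linear-approximation property of $\gamma'$ can also be observed numerically. The following sketch is our own illustration for the helix of Example 12.18 (NumPy and the test values of $t$ are our choices); it shows that the error of $\gamma(t_0) + \gamma'(t_0)\, t$ is $o(t)$.

```python
import numpy as np

def gamma(t):
    return np.array([t, np.cos(t), np.sin(t)])

def dgamma(t):
    return np.array([1.0, -np.sin(t), np.cos(t)])

t0 = 1.0
for t in (1e-1, 1e-2, 1e-3):
    err = np.linalg.norm(gamma(t0 + t) - (gamma(t0) + dgamma(t0) * t))
    print(t, err / t)   # err / t -> 0, i.e. the error is o(t)
```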
Curves, Orientation, and Tangent Vectors Slide 374
Tangent Lines of Curves

This is the graph of

$$\gamma'(t) = \begin{pmatrix} 1 \\ -\sin t \\ \cos t \end{pmatrix}.$$

[Figure: the derivative curve $\gamma'(t)$ of the helix.]

Curves, Orientation, and Tangent Vectors Slide 375
Tangent Lines of Curves

The tangent line makes sense as a linear approximation:

$$\gamma(t_0 + t) = \begin{pmatrix} t_0 \\ \cos t_0 \\ \sin t_0 \end{pmatrix} + \begin{pmatrix} 1 \\ -\sin t_0 \\ \cos t_0 \end{pmatrix} t + o(t).$$

[Figure: the helix with a tangent line attached.]

Curves, Orientation, and Tangent Vectors Slide 376
The Tangent Vector to a Curve

12.19. Definition. Let $C^* \subset V$ be an oriented smooth curve in $(V, \|\cdot\|)$ and $p \in C^*$. Let $\gamma : I \to V$ be a parametrization of $C^*$. Then we define the unit tangent vector to $C^*$ at $p = \gamma(t)$ by

$$T \circ \gamma(t) := \frac{\gamma'(t)}{\|\gamma'(t)\|}, \qquad t \in \operatorname{int} I. \tag{12.4}$$

This defines the tangent vector field $T : C^* \to V$ on $C$: every point of $C$ is associated to a vector in $V$. We will show that (12.4) does not depend on which parametrization $\gamma$ is used to calculate $T$, as long as $\gamma$ corresponds to the orientation of $C^*$.

Curves, Orientation, and Tangent Vectors Slide 377
The Tangent Vector to a Curve

In fact, suppose $\gamma : I \to C$, $\tilde\gamma : J \to C$ are two smooth parametrizations connected by a reparametrization $r : J \to I$ so that $\tilde\gamma = \gamma \circ r$. Let $p \in C$ satisfy $p = \gamma(t) = \tilde\gamma(\tau)$, $t = r(\tau)$. Then

$$\tilde\gamma'(\tau) = \frac{d}{d\tau}\, \gamma(r(\tau)) = \gamma'(r(\tau))\, r'(\tau) = \gamma'(t)\, r'(\tau).$$

Hence,

$$T \circ \tilde\gamma(\tau) = \frac{\tilde\gamma'(\tau)}{\|\tilde\gamma'(\tau)\|} = \frac{\gamma'(t)}{\|\gamma'(t)\|} \cdot \frac{r'(\tau)}{|r'(\tau)|} = \frac{r'(\tau)}{|r'(\tau)|}\, T \circ \gamma(t). \tag{12.5}$$

If $r$ is orientation-preserving, then $r'(t) > 0$ and the tangent vector is the same when calculated using $\gamma$ as when using $\tilde\gamma$. If $r$ is orientation-reversing, the tangent vector reverses direction. Thus (12.4) defines a unique unit tangent vector for an oriented curve.

Curves, Orientation, and Tangent Vectors Slide 378
The Tangent Vector to a Curve

12.20. Example. Consider the circle of radius $R$ in $\mathbb{R}^2$,

$$C := \{ x \in \mathbb{R}^2 : x_1^2 + x_2^2 = R^2 \}.$$

By choosing the parametrization

$$\gamma : [0, 2\pi) \to C, \qquad \gamma(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}$$

we endow $C$ with a positive (counter-clockwise) orientation. The unit tangent vector at $\gamma(t)$ is then given by

$$T \circ \gamma(t) = \frac{\gamma'(t)}{\|\gamma'(t)\|} = \frac{1}{R} \begin{pmatrix} -R\sin t \\ R\cos t \end{pmatrix} = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix}.$$

Thus, if $p = (p_1, p_2) \in C$, then

$$T(p) = \frac{1}{R} \begin{pmatrix} -p_2 \\ p_1 \end{pmatrix}.$$

Hence $T(p) \perp p$.

Curve Length, Normal Vectors, and Curvature Slide 379
13. Curve Length, Normal Vectors, and Curvature

Curve Length, Normal Vectors, and Curvature Slide 380
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Curve Length, Normal Vectors, and Curvature Slide 381
Curve Length

Consider a simple curve $C \subset V$ parametrized by $\gamma : [a,b] \to V$, where $(V, \|\cdot\|)$ is a normed vector space. Then a natural approximation to the length of $\gamma$, which will depend on the norm used, is found by taking a partition $P = (a_0, \dots, a_n)$ of $[a,b]$ and considering the lengths of the straight line segments joining $\gamma(a_{i-1})$ to $\gamma(a_i)$, $i = 1, \dots, n$. The sum of the lengths of these line segments is given by

$$\ell_P(C) = \sum_{i=1}^n \|\gamma(a_i) - \gamma(a_{i-1})\|.$$

We will say that a curve has a length if there exists an upper bound to these lengths. Note that $\ell_P(C)$ is of course independent of the parametrization $\gamma$, since only the actual points $\gamma(a_i) \in C$ are used in this definition.
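These polygonal sums are easy to compute. The sketch below is our own illustration (using NumPy, a circle of radius $R = 2$ and uniform partitions, all our choices): the sums $\ell_P(C)$ increase towards $2\pi R$, the value that the integral formula of the next slide will deliver.

```python
import numpy as np

R = 2.0

def gamma(t):
    return np.array([R * np.cos(t), R * np.sin(t)])

for n in (4, 16, 64, 256):
    ts = np.linspace(0.0, 2 * np.pi, n + 1)        # partition of [0, 2*pi]
    pts = np.array([gamma(t) for t in ts])
    l_P = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    print(n, l_P)    # increases towards 2*pi*R = 12.566...
```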
Curve Length, Normal Vectors, and Curvature Slide 382
Curve Length

13.1. Definition. Let $(V, \|\cdot\|)$ be a normed vector space and $C \subset V$ an open curve. Then we say that $C$ is rectifiable if

$$\ell(C) := \sup_{\text{partitions } P} \ell_P(C)$$

exists. We then call $\ell(C)$ the curve length or arc length of $C$.

13.2. Theorem. Let $C \subset V$ be a smooth and open curve with parametrization $\gamma : [a,b] \to C$. Then $C$ is rectifiable if and only if

$$\int_a^b \|\gamma'(t)\|\, dt < \infty.$$

Furthermore, if $C$ is rectifiable,

$$\ell(C) = \int_a^b \|\gamma'(t)\|\, dt, \tag{13.1}$$

where the right-hand side is independent of $\gamma$.

Curve Length, Normal Vectors, and Curvature Slide 383
Curve Length

13.3. Example. Consider the helix segment $C$ given by the graph of

$$\gamma : [0, 2\pi] \to \mathbb{R}^3, \qquad \gamma(t) = \begin{pmatrix} \alpha t \\ R\cos t \\ R\sin t \end{pmatrix}, \qquad \alpha, R > 0.$$

The length of $C = \gamma([0, 2\pi])$ is given by

$$\ell(C) = \int_0^{2\pi} \|\gamma'(t)\|\, dt = \int_0^{2\pi} \sqrt{\alpha^2 + (-R\sin t)^2 + R^2\cos^2 t}\, dt = 2\pi\sqrt{\alpha^2 + R^2}.$$

13.4. Remark. Definition 13.1 and Theorem 13.2 refer to open curves. To find the length of a closed curve, we express it as the disjoint union of two simple curves and find their lengths separately.

Curve Length, Normal Vectors, and Curvature Slide 384
Curve Length

Proof of Theorem 13.2. We first show that the value of the integral

$$\int_a^b \|\gamma'(t)\|\, dt$$

does not depend on the parametrization $\gamma$. Let $C \subset V$ be a smooth curve and $\gamma : [a,b] \to C$ a parametrization of $C$. Let $\tilde\gamma : [\alpha, \beta] \to C$ be some other parametrization and let $r : [\alpha, \beta] \to [a,b]$ be a smooth reparametrization so that $\tilde\gamma(\tau) = \gamma(r(\tau))$. Suppose that $r$ is orientation-preserving. Then $r(\alpha) = a$ and $r(\beta) = b$.

Curve Length, Normal Vectors, and Curvature Slide 385
Curve Length

Proof (continued). Let $t = r(\tau)$, so $dt = r'(\tau)\, d\tau$. Furthermore,

$$\tilde\gamma'(\tau) = (\gamma \circ r)'(\tau) = \gamma'(r(\tau))\, r'(\tau) \qquad \Leftrightarrow \qquad \gamma'(r(\tau)) = \frac{\tilde\gamma'(\tau)}{r'(\tau)}.$$

Thus, substituting $t = r(\tau)$,

$$\int_a^b \|\gamma'(t)\|\, dt = \int_\alpha^\beta \|\gamma'(r(\tau))\|\, r'(\tau)\, d\tau = \int_\alpha^\beta \Big\| \frac{\tilde\gamma'(\tau)}{r'(\tau)} \Big\|\, r'(\tau)\, d\tau = \int_\alpha^\beta \frac{\|\tilde\gamma'(\tau)\|}{|r'(\tau)|}\, r'(\tau)\, d\tau = \int_\alpha^\beta \|\tilde\gamma'(\tau)\|\, d\tau,$$

where we have used that $r$ is increasing, i.e., $r'(\tau) > 0$. This proves that $\int_a^b \|\gamma'(t)\|\, dt$ is independent of the parametrization $\gamma$.

Curve Length, Normal Vectors, and Curvature Slide 386
Curve Length

Proof (continued). Now, for any partition $P$ of $[a,b]$ and any parametrization $\gamma$ we have

$$\ell_P(C) = \sum_{i=1}^n \|\gamma(a_i) - \gamma(a_{i-1})\| = \sum_{i=1}^n \Big\| \int_{a_{i-1}}^{a_i} \gamma'(t)\, dt \Big\| \le \sum_{i=1}^n \int_{a_{i-1}}^{a_i} \|\gamma'(t)\|\, dt = \int_a^b \|\gamma'(t)\|\, dt.$$

Hence, $\ell(C) \le \int_a^b \|\gamma'(t)\|\, dt$.

Curve Length, Normal Vectors, and Curvature Slide 387
Curve Length

Proof (continued). Proving the converse inequality is slightly more difficult. We first establish three preliminary estimates. We will use the fact that since $C$ is smooth, $\gamma' : [a,b] \to V$ is continuous and hence uniformly continuous on $[a,b]$ (see Theorem 9.14). Fix $\varepsilon > 0$.

(i) Choose a $\delta > 0$ such that

$$|t - \tau| < \delta \qquad \Rightarrow \qquad \|\gamma'(t) - \gamma'(\tau)\| < \frac{\varepsilon}{2(b-a)}$$

for all $t, \tau \in [a,b]$.

Curve Length, Normal Vectors, and Curvature Slide 388
Curve Length

Proof (continued).

(ii) Consider the function $f : [a,b] \to \mathbb{R}$, $f(t) = \|\gamma'(t)\|$. Since $\gamma$ is smooth, $f$ is continuous and we can find a step function that uniformly approximates $f$. In fact, there is a $0 < \delta_1 < \delta$ and a partition $P = (a_0, \dots, a_n)$ of $[a,b]$ with $a_i - a_{i-1} < \delta_1$, $i = 1, \dots, n$, such that

$$\Big| \int_a^b \|\gamma'(t)\|\, dt - \sum_{i=1}^n (a_i - a_{i-1}) \|\gamma'(a_{i-1})\| \Big| < \varepsilon/2.$$
Curve Length, Normal Vectors, and Curvature Slide 389
Curve Length

Proof (continued).

(iii) For any $t \in (a,b)$ and $h < \delta$ with $t + h \in [a,b]$,

$$\begin{aligned}
\Big\| \frac{\gamma(t+h) - \gamma(t)}{h} - \gamma'(t) \Big\|
&= \Big\| \frac{1}{h} \int_t^{t+h} \gamma'(\tau)\, d\tau - \gamma'(t) \Big\|
= \Big\| \frac{1}{h} \int_t^{t+h} \big( \gamma'(\tau) - \gamma'(t) \big)\, d\tau \Big\| \\
&\le \frac{1}{h} \int_t^{t+h} \|\gamma'(\tau) - \gamma'(t)\|\, d\tau
\le \sup_{\tau \in [t, t+h]} \|\gamma'(\tau) - \gamma'(t)\| < \frac{\varepsilon}{2(b-a)}.
\end{aligned}$$

This implies

$$\|\gamma'(t)\| \le \Big\| \frac{\gamma(t+h) - \gamma(t)}{h} \Big\| + \frac{\varepsilon}{2(b-a)}.$$

Curve Length, Normal Vectors, and Curvature Slide 390
Curve Length

Proof (continued). We then have

$$\begin{aligned}
\int_a^b \|\gamma'(t)\|\, dt
&\le \sum_{i=1}^n (a_i - a_{i-1}) \|\gamma'(a_{i-1})\| + \frac{\varepsilon}{2} \\
&\le \sum_{i=1}^n (a_i - a_{i-1}) \Big( \Big\| \frac{\gamma(a_i) - \gamma(a_{i-1})}{a_i - a_{i-1}} \Big\| + \frac{\varepsilon}{2(b-a)} \Big) + \frac{\varepsilon}{2} \\
&= \sum_{i=1}^n \|\gamma(a_i) - \gamma(a_{i-1})\| + \frac{\varepsilon}{2(b-a)}(b-a) + \frac{\varepsilon}{2} \\
&= \ell_P(C) + \varepsilon \le \ell(C) + \varepsilon.
\end{aligned}$$

Letting $\varepsilon \to 0$, we obtain the desired inequality. □

Curve Length, Normal Vectors, and Curvature Slide 391
Curve Length

We can now express the total curve length by

$$\ell(C) = \int_a^b \|\gamma'(t)\|\, dt.$$

More generally, we can define a length function

$$(\ell \circ \gamma)(t) = \int_a^t \|\gamma'(\tau)\|\, d\tau \tag{13.2}$$

so that $(\ell \circ \gamma)(b) = \ell(C)$. The function $\ell \circ \gamma : [a,b] \to [0, \infty)$ associates to each value of $t$ the length of the curve up to $\gamma(t)$. Since the integrand in (13.2) is strictly positive, $\ell \circ \gamma$ is strictly increasing and hence bijective onto $[0, \ell(C)]$.

Curve Length, Normal Vectors, and Curvature Slide 392
Curve Length

13.5. Example. We return to Example 12.20 and study the circle of radius $R$ in $\mathbb{R}^2$ with parametrization

$$\gamma : [0, 2\pi) \to C, \qquad \gamma(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$

The curve length is given by

$$(\ell \circ \gamma)(t) = \int_0^t \|\gamma'(\tau)\|\, d\tau = \int_0^t R\, d\tau = Rt.$$

Thus, for $p = (p_1, p_2) \in C$, we can read off

$$\ell(p) = R \cdot \arctan \frac{p_2}{p_1},$$

where the arctangent is understood in the sense of (12.1), i.e., the appropriate branch is chosen depending on the signs of $p_1$ and $p_2$.

Curve Length, Normal Vectors, and Curvature Slide 393
Parametrization Through Curve Length

The map $\ell : C^* \to [0, \infty)$ is bijective, since its inverse is given by

$$\ell^{-1} = \gamma \circ (\ell \circ \gamma)^{-1}.$$

If $s = \ell(p)$ is the curve length at $p \in C^*$, then $p = \ell^{-1}(s)$ is the unique point in $C$ associated to this curve length. In other words, once we fix an orientation of a simple curve $C$, the curve length determines all other points of $C^*$ uniquely. This means that we can use the curve length as a natural parametrization of $C$, i.e., we can parametrize $C$ using

$$\gamma = \ell^{-1} : I \to C, \qquad \operatorname{int} I = (0, \ell(C)).$$

If we want to parametrize closed curves through curve length, we must fix an "initial point" in some fashion.

Curve Length, Normal Vectors, and Curvature Slide 394
Line Integrals

We may extend the concept of the curve length integral (13.1) for a parametrized curve $(C, \gamma)$,

$$\int_a^b \|\gamma'(t)\|\, dt,$$

to integrals of the form

$$\int_a^b f(\gamma(t)) \cdot \|\gamma'(t)\|\, dt,$$

where $f$ is a real-valued function defined on $C$. Such an integral is called a line integral of the scalar function $f$.

Curve Length, Normal Vectors, and Curvature Slide 395
The Line Integral of a Scalar Function

Suppose that we are given a simple, open, oriented curve $C^* \subset \mathbb{R}^2$ and a scalar function $f : \mathbb{R}^2 \to \mathbb{R}$. In the sketch below, the red curve $C^*$ joins the points $p$ and $q$ in the $x_1$-$x_2$-plane, and the function $f$ is given by $f(x_1, x_2) = 4/5 + x_1^2 \sin x_2$.

Curve Length, Normal Vectors, and Curvature Slide 396
The Line Integral of a Scalar Function

Suppose that $C^*$ is parametrized by a function $\gamma : [a,b] \to C$ such that $\gamma(a) = p$ and $\gamma(b) = q$. Then the blue curve below shows the values of $f \circ \gamma : [a,b] \to \mathbb{R}$.
Curve Length, Normal Vectors, and Curvature Slide 397
The Line Integral of a Scalar Function

We now want to integrate the values of $f$ along the red curve, i.e., we will determine the area of the green surface.

Curve Length, Normal Vectors, and Curvature Slide 398
The Line Integral of a Scalar Function

For clarity, the graph of $f$ has been removed in the sketch below.

Curve Length, Normal Vectors, and Curvature Slide 399
The Line Integral of a Scalar Function

By considering the composition $f \circ \gamma$, we are effectively "straightening out" the red curve to the interval $[a,b]$.

Curve Length, Normal Vectors, and Curvature Slide 400
The Line Integral of a Scalar Function

13.6. Definition. Let $C^*$ be a smooth, oriented curve in a normed vector space $V$ and $f : C \to \mathbb{R}$ a continuous function. We then define the line integral of $f$ along $C^*$ by

$$\int_{C^*} f\, d\ell := \int_I (f \circ \gamma)(t) \cdot \|\gamma'(t)\|\, dt, \tag{13.3}$$

where $\gamma : I \to C$ is a (any) parametrization of $C^*$.

13.7. Remarks.
• Using the chain rule it can easily be seen that this integral is independent of the parametrization of $C^*$.
• The line integral along a piecewise-smooth curve is defined as the sum of the integrals over the individual smooth segments.

Curve Length, Normal Vectors, and Curvature Slide 401
The Scalar Line Element

The symbol "$d\ell$" in (13.3) is, strictly speaking, unnecessary decoration. However, it can be interpreted geometrically as a scalar line element, i.e., an infinitesimal length. Inspired by (13.2),

$$\ell(\gamma(t)) = \int_a^t \|\gamma'(\tau)\|\, d\tau,$$

one sometimes thinks of this "line element" as

$$d\ell = \|\gamma'(t)\|\, dt,$$

but this equation should not be interpreted in a strict mathematical sense.

Curve Length, Normal Vectors, and Curvature Slide 402
A Physical Wire

13.8. Example. The mass of a physical wire (interpreted as a curve, i.e., having no thickness) can be obtained by integrating its density along its path. If a wire $C$ is taken to have variable density $\varrho$, its mass is given by

$$m = \Big| \int_C \varrho\, d\ell \Big|.$$

As an example, consider a semi-circular wire

$$C = \{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1,\ y \ge 0 \}$$

with density $\varrho(x,y) = k(1-y)$, where $k > 0$ is a constant. (Thus the wire is denser at its base and lighter at the top. We might alternatively interpret the varying density as varying thickness of a uniformly dense wire.)

Curve Length, Normal Vectors, and Curvature Slide 403
Total Mass of a Wire

We choose the parametrization $\gamma(t) = (\cos t, \sin t)$, $I = [0, \pi]$. We have

$$\int_C \varrho\, d\ell = \int_0^\pi \varrho \circ \gamma(t) \cdot \|\gamma'(t)\|\, dt = \int_0^\pi k(1 - \sin t) \cdot 1\, dt = k(\pi - 2),$$

so $m = |k(\pi - 2)| = k(\pi - 2)$.

Curve Length, Normal Vectors, and Curvature Slide 404
Center of Mass of a Wire

The center of mass of the wire is given by $(x_c, y_c)$, where

$$x_c := \frac{1}{m} \int_C x \cdot \varrho\, d\ell, \qquad y_c := \frac{1}{m} \int_C y \cdot \varrho\, d\ell$$

(of course, an analogous formula holds for objects represented as one-dimensional curves in $\mathbb{R}^n$). In our example,

$$x_c = \frac{1}{m} \int_0^\pi (x \cdot \varrho) \circ \gamma(t)\, dt = \frac{1}{m} \int_0^\pi \cos t \cdot k(1 - \sin t)\, dt = 0,$$

$$y_c = \frac{1}{m} \int_0^\pi \sin t \cdot k(1 - \sin t)\, dt = \frac{4 - \pi}{2(\pi - 2)}.$$
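The wire computations above can be cross-checked numerically. The following sketch is our own illustration (SciPy's quad and the normalization $k = 1$ are our choices); since $\|\gamma'(t)\| = 1$ for this parametrization, the line integrals reduce to ordinary integrals over $[0, \pi]$.

```python
import math
from scipy.integrate import quad

k = 1.0
rho = lambda t: k * (1 - math.sin(t))   # density along gamma(t) = (cos t, sin t)

m = quad(rho, 0.0, math.pi)[0]
xc = quad(lambda t: math.cos(t) * rho(t), 0.0, math.pi)[0] / m
yc = quad(lambda t: math.sin(t) * rho(t), 0.0, math.pi)[0] / m

print(m, k * (math.pi - 2))                          # both ~1.1416
print(xc, yc, (4 - math.pi) / (2 * (math.pi - 2)))   # xc ~ 0, yc ~ 0.3759
```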
Curve Length, Normal Vectors, and Curvature Slide 405
Rate of Change of the Tangent Vector

In order to gain more insight into the geometric properties of a curve, we will study the rate of change of the tangent vector as it "travels" along a curve. We assume from now on that
• $V$ is a real inner product space and
• $C \subset V$ has a parametrization $\gamma$ such that $\gamma''$ exists and $\gamma'' \neq 0$.

We call such a $C$ a smooth $C^2$-curve and $\gamma$ a $C^2$-parametrization. We are interested in

$$\frac{d}{dt} \big( T \circ \gamma(t) \big). \tag{13.4}$$

Now $T \circ \gamma$ itself parametrizes a curve $\mathcal{T}$, so (13.4) gives the tangent vector of $\mathcal{T}$ at $T \circ \gamma(t)$. Moreover, since $\|T\| = 1$ we see that

$$\mathcal{T} \subset S = \{ x \in V : \|x\| = 1 \}.$$

Curve Length, Normal Vectors, and Curvature Slide 406
The Normal Vector of a Curve

Just as in Example 12.20, this implies that (13.4) is perpendicular to $T \circ \gamma(t)$:

$$1 = \|T \circ \gamma(t)\|^2 = \langle T \circ \gamma(t), T \circ \gamma(t) \rangle \quad \Rightarrow \quad 0 = \frac{d}{dt} \langle T \circ \gamma(t), T \circ \gamma(t) \rangle = 2 \langle (T \circ \gamma)'(t), T \circ \gamma(t) \rangle. \tag{13.5}$$

13.9. Definition. Let $C \subset V$ be a smooth $C^2$-curve. Let $\gamma : I \to V$ be a smooth $C^2$-parametrization of $C$. Then the unit normal vector $N : C \to V$ is defined by

$$N \circ \gamma(t) := \frac{(T \circ \gamma)'(t)}{\|(T \circ \gamma)'(t)\|}, \qquad t \in \operatorname{int} I. \tag{13.6}$$

Curve Length, Normal Vectors, and Curvature Slide 407
The Normal Vector of a Curve

The unit normal vector does not depend on $\gamma$, not even up to orientation: suppose $\gamma : I \to C$, $\tilde\gamma : J \to C$ are two $C^2$-parametrizations connected by a $C^2$-reparametrization $r : J \to I$ so that $\tilde\gamma = \gamma \circ r$. Let $p \in C$ satisfy $p = \gamma(t) = \tilde\gamma(\tau)$, $t = r(\tau)$. By (12.5),

$$T \circ \tilde\gamma(\tau) = \frac{r'(\tau)}{|r'(\tau)|}\, T \circ \gamma(t).$$

Suppose that $r$ is orientation-reversing. Then

$$\frac{d}{d\tau}\, T \circ \tilde\gamma(\tau) = -\frac{d}{d\tau}\, T \circ \gamma(r(\tau)) = -(T \circ \gamma)'(r(\tau))\, r'(\tau)$$

and

$$N \circ \tilde\gamma(\tau) = \frac{-(T \circ \gamma)'(t)\, r'(\tau)}{\|(T \circ \gamma)'(t)\|\, |r'(\tau)|} = \frac{(T \circ \gamma)'(t)}{\|(T \circ \gamma)'(t)\|}.$$

Of course, if $r$ is orientation-preserving, the proof is similar.

Curve Length, Normal Vectors, and Curvature Slide 408
The Normal Vector of a Curve

13.10. Example. We return to Example 12.20 and study the circle of radius $R$ in $\mathbb{R}^2$ with parametrization

$$\gamma : [0, 2\pi) \to C, \qquad \gamma(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$

The unit tangent vector at $\gamma(t)$ is given by

$$T \circ \gamma(t) = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix} \quad \Rightarrow \quad (T \circ \gamma)'(t) = \begin{pmatrix} -\cos t \\ -\sin t \end{pmatrix}$$

and $\|(T \circ \gamma)'(t)\| = 1$. Then

$$N \circ \gamma(t) = \frac{(T \circ \gamma)'(t)}{\|(T \circ \gamma)'(t)\|} = \begin{pmatrix} -\cos t \\ -\sin t \end{pmatrix}.$$

Thus, if $p = (p_1, p_2) \in C$, then

$$N(p) = -\frac{1}{R}\, p.$$

Curve Length, Normal Vectors, and Curvature Slide 409
Curvature

We are interested in the rate of change of the direction of the tangent vector to a curve. However, while $T$ does not depend on $\gamma$, if we simply differentiate $T \circ \gamma(t)$, the derivative will depend on $\gamma$. In order to obtain a purely geometric measure for the rate of change of $T$, we need to settle on a "canonical" parametrization. Luckily, we have just developed one: parametrization using curve length. This parametrization takes into account only the specific geometric properties of the curve.

13.11. Definition. The curvature of a smooth $C^2$-curve $C \subset V$ is

$$\kappa : C \to \mathbb{R}, \qquad \kappa \circ \ell^{-1}(s) := \Big\| \frac{d}{ds} \big( T \circ \ell^{-1}(s) \big) \Big\|,$$

where $T$ is the unit tangent vector and $\ell^{-1} : I \to C$ is the curve length parametrization of $C$. Note that, like the unit normal vector $N$, $\kappa$ also does not depend on the orientation of $C$.

Curve Length, Normal Vectors, and Curvature Slide 410
Curvature in Arbitrary Parametrization

Given a parametrization $\gamma : I \to C$ of $C$ (which is not necessarily the curve length), by the chain rule

$$\frac{d(T \circ \gamma)}{dt} \Big|_t = \frac{d(T \circ \ell^{-1})}{ds} \Big|_{s = \ell \circ \gamma(t)} \cdot \frac{d(\ell \circ \gamma)}{dt} \Big|_t.$$

Using (13.2),

$$\frac{d(T \circ \ell^{-1})}{ds} \Big|_{s = \ell \circ \gamma(t)} = \frac{1}{\|\gamma'(t)\|}\, \frac{d(T \circ \gamma)}{dt} \Big|_t,$$

and so we obtain for the curvature at $p = \gamma(t) = \ell^{-1}(s)$

$$\kappa \circ \gamma(t) = \kappa \circ \ell^{-1}(s) \big|_{s = \ell \circ \gamma(t)} = \frac{\|(T \circ \gamma)'(t)\|}{\|\gamma'(t)\|}. \tag{13.7}$$

Curve Length, Normal Vectors, and Curvature Slide 411
Curvature

13.12. Example. We return to Example 12.20 and study the circle of radius $R$ in $\mathbb{R}^2$ with parametrization

$$\gamma : [0, 2\pi) \to C, \qquad \gamma(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$

We have

$$T \circ \gamma(t) = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix},$$

so by (13.7)

$$\kappa \circ \gamma(t) = \frac{\|(T \circ \gamma)'(t)\|}{\|\gamma'(t)\|} = \frac{1}{R} \Big\| \begin{pmatrix} -\cos t \\ -\sin t \end{pmatrix} \Big\| = \frac{1}{R}.$$

Thus, the curvature of a circle is constant and equal to the inverse of its radius.
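Formula (13.7) also lends itself to direct numerical evaluation. The sketch below is our own illustration (NumPy, a finite-difference approximation of $(T \circ \gamma)'$ and the radius $R = 2$ are our choices); for the circle it reproduces $\kappa = 1/R$.

```python
import numpy as np

R = 2.0
gamma = lambda t: np.array([R * np.cos(t), R * np.sin(t)])
dgamma = lambda t: np.array([-R * np.sin(t), R * np.cos(t)])

def T(t):
    return dgamma(t) / np.linalg.norm(dgamma(t))

def kappa(t, h=1e-5):
    dT = (T(t + h) - T(t - h)) / (2 * h)   # (T o gamma)'(t), numerically
    return np.linalg.norm(dT) / np.linalg.norm(dgamma(t))

print(kappa(0.3), 1 / R)   # both ~0.5
```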
The Riemann Integral for Scalar-Valued Functions Slide 412
14. The Riemann Integral for Scalar-Valued Functions

The Riemann Integral for Scalar-Valued Functions Slide 413
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

The Riemann Integral for Scalar-Valued Functions Slide 414
Integration in Rn

Contrary to the general approach we took in differential calculus, we will now focus on functions defined on $\mathbb{R}^n$, even restricting ourselves to $\mathbb{R}^2$ and $\mathbb{R}^3$ in some cases. The reason for this is that the geometry of objects in general vector spaces (even $\mathbb{R}^n$) is quite complex, and in order to understand even the finite-dimensional case, we would need to introduce a variety of abstract concepts (manifolds, tangent and cotangent spaces etc.). This is generally done in courses on vector analysis; regrettably, we do not have time to pursue these things here.

When integrating a function defined on a subset $\Omega \subset \mathbb{R}^2$, there are some difficulties that do not occur for functions of a single variable. In particular, while integrating across an interval $[a,b]$ is straightforward, the shape of $\Omega$ now plays a significant role. We will discuss the concept of volume of sets and introduce integrals of step functions on cuboids before attempting to extend integration to continuous functions defined on more general sets.

The Riemann Integral for Scalar-Valued Functions Slide 415
Cuboids

We wish to assign a volume to general sets in $\mathbb{R}^n$.

14.1. Definition. Let $a_k, b_k$, $k = 1, \dots, n$, be pairs of numbers with $a_k < b_k$. Then the set $Q \subset \mathbb{R}^n$ given by

$$Q = [a_1, b_1] \times \cdots \times [a_n, b_n] = \{ x \in \mathbb{R}^n : x_k \in [a_k, b_k],\ k = 1, \dots, n \}$$

is called an n-cuboid. We define the volume of $Q$ to be

$$|Q| := \prod_{k=1}^n (b_k - a_k).$$

We will denote the set of all n-cuboids by $\mathcal{Q}_n$.

14.2. Remark. Clearly, an n-cuboid is a compact set in $\mathbb{R}^n$.

The Riemann Integral for Scalar-Valued Functions Slide 416
Upper and Lower Volumes of Sets

The idea for assigning volume to a subset $\Omega \subset \mathbb{R}^n$ is similar to that for the Riemann integral: consider volumes of enclosing and enclosed n-cuboids; if their infimum and supremum (respectively) are equal, assign this number as the volume of $\Omega$.

14.3. Definition. Let $\Omega \subset \mathbb{R}^n$ be a bounded non-empty set. We define the outer and inner volume of $\Omega$ by

$$\overline{V}(\Omega) := \inf \Big\{ \sum_{k=0}^r |Q_k| : r \in \mathbb{N},\ Q_0, \dots, Q_r \in \mathcal{Q}_n,\ \Omega \subset \bigcup_{k=0}^r Q_k \Big\},$$

$$\underline{V}(\Omega) := \sup \Big\{ \sum_{k=1}^r |Q_k| : r \in \mathbb{N},\ Q_1, \dots, Q_r \in \mathcal{Q}_n,\ \bigcup_{k=1}^r Q_k \subset \Omega,\ Q_j \cap Q_k = \emptyset \text{ for } j \neq k \Big\}.$$

It is easy to see that $0 \le \underline{V}(\Omega) \le \overline{V}(\Omega)$.

The Riemann Integral for Scalar-Valued Functions Slide 417
Measurable Sets

Sets for which we can define a volume are called measurable. The volume is referred to as the measure of a set.

14.4. Definition. Let $\Omega \subset \mathbb{R}^n$ be a bounded set. Then $\Omega$ is said to be (Jordan) measurable if either
(i) $\overline{V}(\Omega) = 0$ or
(ii) $\underline{V}(\Omega) = \overline{V}(\Omega)$.

In the first case, we say that $\Omega$ has (Jordan) measure zero; in the second case we say that $|\Omega| := \underline{V}(\Omega) = \overline{V}(\Omega)$ is the Jordan measure of $\Omega$.

The Riemann Integral for Scalar-Valued Functions Slide 418
Sets of Measure Zero

For a set $\Omega \subset \mathbb{R}^n$ to have measure zero, $\underline{V}(\Omega)$ does not need to exist (possibly because there is no n-cuboid that can be a subset of $\Omega$).

14.5. Examples.
(i) A set $\{x\}$ consisting of a single point $x \in \mathbb{R}^n$ is a set of measure zero.
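To make Definition 14.3 concrete, here is a small numerical sketch (our own illustration; the grid of squares of side $1/n$ and the unit disk as test set are our choices). Squares that meet the disk contribute to an outer cover, squares contained in it to an inner packing; as the grid is refined, both sums approach $\pi$, the Jordan measure of the disk.

```python
import numpy as np

def volumes(n):
    # Approximate outer/inner volume of the unit disk with squares of side 1/n.
    h = 1.0 / n
    outer = inner = 0.0
    grid = np.arange(-n - 1, n + 1) * h          # cells cover [-1-h, 1+h]
    for x0 in grid:
        for y0 in grid:
            # nearest and farthest point of the cell [x0,x0+h]x[y0,y0+h] to 0
            nx = min(abs(x0), abs(x0 + h)) if x0 * (x0 + h) > 0 else 0.0
            ny = min(abs(y0), abs(y0 + h)) if y0 * (y0 + h) > 0 else 0.0
            fx = max(abs(x0), abs(x0 + h))
            fy = max(abs(y0), abs(y0 + h))
            if nx * nx + ny * ny <= 1.0:
                outer += h * h                    # cell meets the disk
            if fx * fx + fy * fy <= 1.0:
                inner += h * h                    # cell contained in the disk
    return inner, outer

for n in (4, 16, 64):
    print(n, volumes(n))   # inner < pi < outer, both approach 3.14159...
```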
(ii) A subset of $\mathbb{R}^n$ consisting of a finite number of single points is a set of measure zero.
(iii) A curve of finite length $C \subset \mathbb{R}^n$, $n \ge 2$, is a set of measure zero.
(iv) A bounded section of a plane in $\mathbb{R}^3$ is a set of measure zero.
(v) The set of rational numbers in the interval $[0,1]$ is not (Jordan) measurable. The outer volume is $\overline{V}(\mathbb{Q} \cap [0,1]) = 1$, but the inner volume does not exist.

The Riemann Integral for Scalar-Valued Functions Slide 419
"Almost Everywhere" Properties

14.6. Definition. A function $f$ on $\mathbb{R}^n$ that has a property for all $x \in \mathbb{R}^n \setminus \Omega$, where $\Omega$ is a set of measure zero, is said to have this property almost everywhere (often abbreviated by "a.e.").

14.7. Example.
(i) The function $f : \mathbb{R} \to \mathbb{R}$, $f(x) = |x|$, is differentiable almost everywhere.
(ii) The function $f : [0,1] \times [0,1] \to \mathbb{R}$,

$$f(x,y) = \begin{cases} \dfrac{1}{x-y}, & x \neq y, \\ 0, & \text{otherwise}, \end{cases}$$

is continuous almost everywhere.

The Riemann Integral for Scalar-Valued Functions Slide 420
Partitions of Cuboids

Let $Q \subset \mathbb{R}^n$ be an n-cuboid. We can then define step functions on $Q$ just as we did for intervals. First, recall from 186 Definition 4.1.1 that a partition $P$ of an interval $[a,b]$ is a finite sequence of numbers $P = (a_0, \dots, a_m)$ with

$$a = a_0 < a_1 < \dots < a_m = b.$$

14.8. Definition. A partition $P$ of an n-cuboid $Q = [a_1, b_1] \times \cdots \times [a_n, b_n]$ is a tuple $P = (P_1, \dots, P_n)$ such that $P_k = (a_{k0}, \dots, a_{k m_k})$ is a partition of the interval $[a_k, b_k]$.

The partition $P$ of $Q$ induces cuboids of the form

$$Q_{j_1 j_2 \dots j_n} := [a_{1(j_1-1)}, a_{1 j_1}] \times \cdots \times [a_{n(j_n-1)}, a_{n j_n}] \subset Q.$$

The Riemann Integral for Scalar-Valued Functions Slide 421
Step Functions on Cuboids

The intersection of the cuboids $Q_{j_1 j_2 \dots j_n}$ is a subset of

$$\Sigma = \{ x \in Q : x_k = a_{k i_k} \text{ for some } k = 1, \dots, n \},$$

which is a set of measure zero. We say that a union of sets whose pairwise intersections are sets of measure zero is almost disjoint. Thus,

$$Q = \bigcup_{\substack{1 \le j_1 \le m_1 \\ \vdots \\ 1 \le j_n \le m_n}} Q_{j_1 j_2 \dots j_n}$$

is the union of almost disjoint cuboids induced by a partition $P$ of $Q$.

The Riemann Integral for Scalar-Valued Functions Slide 422
Step Functions on Cuboids

14.9. Definition. Let $Q \subset \mathbb{R}^n$ be an n-cuboid. A function $f : Q \to \mathbb{R}$ is called a step function with respect to a partition $P$ if there exist numbers $y_{j_1 j_2 \dots j_n} \in \mathbb{R}$ such that

$$f(x) = y_{j_1 j_2 \dots j_n} \qquad \text{whenever } x \in \operatorname{int} Q_{j_1 j_2 \dots j_n},\ j_k = 1, \dots, m_k,\ k = 1, \dots, n.$$

14.10. Remarks.
(i) It doesn't matter how the step function is defined on the set

$$\Sigma = \{ x \in Q : x_k = a_{k i_k} \text{ for some } k = 1, \dots, n \},$$

which is a set of measure zero.
(ii) We call $f$ simply a step function on $Q$ if there exists some partition $P$ of $Q$ with respect to which it is a step function.
(iii) The set of step functions on $Q$ is a vector space (the sum of two step functions is a step function w.r.t. a common subpartition of the two partitions). This vector space is a subspace of the space of bounded functions on $Q$.

The Riemann Integral for Scalar-Valued Functions Slide 423
Integration of Step Functions

14.11. Theorem. Let $Q \subset \mathbb{R}^n$ be a cuboid and $f : Q \to \mathbb{R}$ a step function with respect to some partition $P$ of $Q$. Then

$$I_P(f) := \sum_{\substack{j_1 = 1, \dots, m_1 \\ \vdots \\ j_n = 1, \dots, m_n}} |Q_{j_1 \dots j_n}| \cdot y_{j_1 \dots j_n}$$

is independent of the choice of the partition $P$ and is called the integral of $f$. We thus define

$$\int_Q f := I_P(f)$$

for any partition $P$ to be the integral of a step function over a cuboid $Q$.
The Riemann Integral for Scalar-Valued Functions Slide 424
The Regulated Integral

Recall that the regulated integral was defined through the following procedure:
• We defined the set of step functions on an interval $[a,b] \subset \mathbb{R}$.
• We defined the integral of a step function.
• Those functions that were uniform limits of sequences of step functions were termed regulated functions. Their integral was defined as the limit of the integrals of a corresponding sequence of step functions.
• We showed that the continuous functions are regulated. The same is true for piecewise-continuous functions.

We cannot extend this strategy to sets in $\mathbb{R}^n$; the reason it breaks down is that functions $f : \Omega \to \mathbb{R}$ on general domains $\Omega \subset \mathbb{R}^n$ cannot be approximated uniformly by step functions.

The Riemann Integral for Scalar-Valued Functions Slide 425
The Riemann / Darboux Integral

However, we have an alternative approach at hand in the form of the Riemann integral. This was defined for a function $f : [a,b] \to \mathbb{R}$ as follows:
• We defined the set of step functions on an interval $[a,b] \subset \mathbb{R}$.
• We defined the set of lower step functions w.r.t. $f$, i.e., those whose values are less than the values of $f$.
• We defined the set of upper step functions w.r.t. $f$, i.e., those whose values are greater than the values of $f$.
• If the greatest lower bound for the values of the integrals of upper step functions coincides with the least upper bound for the integrals of lower step functions, this must be the integral of $f$. This integral is called the Darboux integral and is equivalent to the Riemann integral.
• The Riemann integral coincides with the integral for regulated functions, but can be applied even to functions that are not regulated.

The Riemann Integral for Scalar-Valued Functions Slide 426
Integration over Cuboids

We will now formulate the definition of the Riemann integral for functions of several variables with real values.

14.12. Definition. Let $Q \subset \mathbb{R}^n$ be an n-cuboid and $f$ a bounded real function on $Q$. Let $U_f$ denote the set of all step functions $u$ on $Q$ such that $u \ge f$, and $L_f$ the set of all step functions $v$ on $Q$ such that $v \le f$. The function $f$ is then said to be (Darboux) integrable if

$$\sup_{v \in L_f} \int_Q v = \inf_{u \in U_f} \int_Q u.$$

In this case, the (Darboux) integral $\int_Q f$ of $f$ over $Q$ is defined to be this common value.

14.13. Theorem. A bounded function $f : Q \to \mathbb{R}$ is Darboux-integrable if and only if for every $\varepsilon > 0$ there exist step functions $u_\varepsilon$ and $v_\varepsilon$ such that $v_\varepsilon \le f \le u_\varepsilon$ and

$$\int_Q u_\varepsilon - \int_Q v_\varepsilon \le \varepsilon.$$

The Riemann Integral for Scalar-Valued Functions Slide 427
Integration over Cuboids

14.14. Proposition. Let $Q \subset \mathbb{R}^n$ be an n-cuboid and $f : Q \to \mathbb{R}$ be bounded and continuous almost everywhere. Then $f$ is Darboux-integrable.

Proof. Since $f$ is continuous almost everywhere, we can find a partition of $Q$ such that the set of points where $f$ is discontinuous is contained in cuboids of arbitrarily small total measure. Furthermore, since $f$ is bounded, we can find some $C > 0$ such that

$$-C/2 < f(x) < C/2.$$

For any partition $P$ of $Q$ we denote by $Q'$ the union of the induced cuboids on which $f$ is discontinuous. Fix $\varepsilon > 0$ and choose a partition of $Q$ such that

$$|Q'| < \frac{\varepsilon}{2C}.$$

The Riemann Integral for Scalar-Valued Functions Slide 428
Integration over Cuboids

Proof (continued). Let $Q'' := Q \setminus Q'$ be the union of the cuboids where $f$ is continuous. Choose the partition $P$ in such a way that

$$\sup_{x, y \in Q_{j_1 \dots j_n}} |f(x) - f(y)| \le \frac{\varepsilon}{2|Q|}$$

for any $Q_{j_1 \dots j_n} \subset Q''$. Define a step function $u$ as follows:

$$u(x) = \begin{cases} C/2, & x \in Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} \subset Q', \\ \sup_{\xi \in Q_{j_1 \dots j_n}} f(\xi), & x \in Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} \subset Q'', \\ f(x), & x \in \partial Q_{j_1 \dots j_n}. \end{cases}$$

Clearly, $f \le u$ on $Q$.

The Riemann Integral for Scalar-Valued Functions Slide 429
Integration over Cuboids

Proof (continued).
Similarly, set

$$v(x) = \begin{cases} -C/2, & x \in Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} \subset Q', \\ \inf_{\xi \in Q_{j_1 \dots j_n}} f(\xi), & x \in Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} \subset Q'', \\ f(x), & x \in \partial Q_{j_1 \dots j_n}, \end{cases}$$

so $v \le f$ on $Q$.

The Riemann Integral for Scalar-Valued Functions Slide 430
Integration over Cuboids

Proof (continued). Then

$$\begin{aligned}
\Big| \int_Q u - \int_Q v \Big|
&\le \Big| \int_{Q'} (u - v) \Big| + \Big| \int_{Q''} (u - v) \Big| \\
&\le |Q'| \cdot \sup_{x \in Q'} |u(x) - v(x)| + |Q''| \cdot \sup_{x \in Q''} |u(x) - v(x)| \\
&\le \frac{\varepsilon}{2C} \cdot C + |Q''| \cdot \frac{\varepsilon}{2|Q|} < \varepsilon.
\end{aligned}$$

By Theorem 14.13, $f$ is integrable. □

The Riemann Integral for Scalar-Valued Functions Slide 431
Integration over Jordan-Measurable Sets

When integrating a function $f : U \to \mathbb{R}$ defined on some set $U \subset \mathbb{R}^n$, we automatically consider the domain of $f$ to be extended to all of $\mathbb{R}^n$ by setting $f(x) = 0$ for $x \in \mathbb{R}^n \setminus U$. At the same time, we define the indicator function for a set $\Omega \subset \mathbb{R}^n$:

$$\mathbb{1}_\Omega(x) = \begin{cases} 1, & x \in \Omega, \\ 0, & \text{otherwise}. \end{cases}$$

Then

$$\mathbb{1}_\Omega(x) f(x) = \begin{cases} f(x), & x \in \Omega, \\ 0, & x \notin \Omega. \end{cases}$$

The Riemann Integral for Scalar-Valued Functions Slide 432
Integration over Jordan-Measurable Sets

14.15. Definition. Let $U \subset \mathbb{R}^n$, $f : U \to \mathbb{R}$ and let $\Omega \subset \mathbb{R}^n$ be a bounded Jordan-measurable set. Then $f$ is said to be integrable on $\Omega$ if for every n-cuboid $Q \subset \mathbb{R}^n$ such that $\Omega \subset Q$ the function $\mathbb{1}_\Omega f : Q \to \mathbb{R}$ is integrable. We then write

$$\int_\Omega f := \int_Q f \cdot \mathbb{1}_\Omega$$

for any n-cuboid $Q \supset \Omega$.

We omit the proof of the following result:

14.16. Lemma. Let $\Omega \subset \mathbb{R}^n$ be a bounded set. Then $\Omega$ is Jordan-measurable if and only if its boundary $\partial\Omega$ has Jordan measure zero.

Proposition 14.14 and Lemma 14.16 immediately yield:

14.17. Corollary. Let $\Omega \subset \mathbb{R}^n$ be a bounded Jordan-measurable set and let $f : \Omega \to \mathbb{R}$ be continuous a.e. Then $f$ is integrable on $\Omega$.

The Riemann Integral for Scalar-Valued Functions Slide 433
Basic Properties of the Integral

From the definition of the integral and measurability of sets, we have the following result:

14.18. Lemma.
(i) Let $\Omega \subset \mathbb{R}^n$ be a measurable set. Then

$$|\Omega| = \int_\Omega 1.$$

(ii) Let $\Omega \subset \mathbb{R}^n$ be a set of measure zero and $f : \Omega \to \mathbb{R}$ some function that is integrable on $\Omega$. Then $\int_\Omega f = 0$.
(iii) Let $\Omega \subset \mathbb{R}^n$ and $\Omega' \subset \Omega$ be measurable sets and $f : \mathbb{R}^n \to \mathbb{R}$ integrable on $\Omega$. Then $f$ is also integrable on $\Omega'$.
(iv) Let $\Omega, \Omega' \subset \mathbb{R}^n$ be measurable sets and $f : \mathbb{R}^n \to \mathbb{R}$ integrable on both of them. Then $f$ is integrable on $\Omega \cup \Omega'$ and

$$\int_{\Omega \cup \Omega'} f = \int_\Omega f + \int_{\Omega'} f - \int_{\Omega \cap \Omega'} f.$$

Integration in Practice Slide 434
15. Integration in Practice

Integration in Practice Slide 435
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Integration in Practice Slide 436
Practical Integration over Cuboids

The following result lets us reduce integrals over n-cuboids to separate integrals over $n_1$- and $n_2$-cuboids, where $n_1 + n_2 = n$. Since we know how to integrate over 1-cuboids (intervals!), this is a powerful tool for evaluating general integrals.

15.1. Fubini's Theorem. Let $Q_1$ be an $n_1$-cuboid and $Q_2$ an $n_2$-cuboid, so that $Q := Q_1 \times Q_2 \subset \mathbb{R}^{n_1 + n_2}$ is an $(n_1 + n_2)$-cuboid. Assume that $f : Q \to \mathbb{R}$ is integrable on $Q$ and that for every $x \in Q_1$ the integral

$$g(x) = \int_{Q_2} f(x, \cdot\,)$$

exists. Then

$$\int_Q f = \int_{Q_1 \times Q_2} f = \int_{Q_1} g = \int_{Q_1} \Big( \int_{Q_2} f \Big).$$

We omit the proof of this theorem; the statement can be shown as in the previous results by considering step functions.

Integration in Practice Slide 437
Practical Integration over Cuboids

15.2. Example. Consider the 2-cuboid $Q = [0,1] \times [0,2]$ and the function $f : Q \to \mathbb{R}$, $f(x,y) = x^2 y + 2y^2$. The integral

$$g(x) = \int_{[0,2]} f(x, \cdot\,) = \int_0^2 f(x,y)\, dy = \Big[ \frac{x^2}{2} y^2 + \frac{2}{3} y^3 \Big]_{y=0}^2 = 2x^2 + \frac{16}{3}$$

exists for every $x \in [0,1]$, so we can apply Fubini's Theorem to yield

$$\int_Q f = \int_{[0,1]} g = \int_0^1 g(x)\, dx = \int_0^1 \Big( \int_0^2 f(x,y)\, dy \Big) dx = \frac{2}{3} \big[ x^3 + 8x \big]_0^1 = \frac{18}{3} = 6.$$

We can thus use Fubini's theorem to iteratively reduce integrals over n-cuboids to integrals over intervals, which we know well how to calculate.
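Example 15.2 can be cross-checked in a single line with SciPy's iterated-integration routine (a sketch of our own; the library call is an assumption about the available tooling, not part of the original slides):

```python
from scipy.integrate import dblquad

# dblquad integrates func(y, x) with x in [a, b] and y between the two
# bound functions; here x in [0, 1] and y in [0, 2].
f = lambda y, x: x**2 * y + 2 * y**2
val, err = dblquad(f, 0.0, 1.0, lambda x: 0.0, lambda x: 2.0)
print(val)   # ~6.0 = 18/3, matching the computation above
```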
Integration in Practice Slide 438
Practical Integration over Cuboids

We will often omit the parentheses when evaluating multiple integrals, e.g., we will write

$$\int_0^1 \int_0^2 f(x,y)\, dy\, dx = \int_0^1 \Big( \int_0^2 f(x,y)\, dy \Big) dx.$$

If the hypotheses of Fubini's Theorem are satisfied, we can therefore write

$$\int_Q f = \int_{a_n}^{b_n} \cdots \int_{a_2}^{b_2} \int_{a_1}^{b_1} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \dots dx_n.$$

We will often abbreviate $dx := dx_1\, dx_2 \dots dx_n$, writing

$$\int_Q f = \int_Q f(x)\, dx.$$

Integration in Practice Slide 439
Ordinate and Simple Regions in R2

Of course, we do not just want to integrate over cuboids but also over more complicated domains. For most purposes it is sufficient to consider regions whose boundaries can be expressed by the graphs of functions.

15.3. Definition. A set $D \subset \mathbb{R}^2$ is called an ordinate region with respect to $x_2$ if there exists an interval $I \subset \mathbb{R}$ and continuous, almost everywhere differentiable functions $\varphi_1, \varphi_2 : I \to \mathbb{R}$ such that

$$D = \{ (x_1, x_2) \in \mathbb{R}^2 : x_1 \in I,\ \varphi_1(x_1) \le x_2 \le \varphi_2(x_1) \}.$$

If the roles of $x_1$ and $x_2$ above are interchanged, we say that $D$ is an ordinate region with respect to $x_1$. If $D \subset \mathbb{R}^2$ is an ordinate region both with respect to $x_1$ and $x_2$, we say that $D$ is a simple region.

Integration in Practice Slide 440
Ordinate and Simple Regions in R2

15.4. Example. The half-disk region

$$R_1 = \{ (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge 0,\ x_1^2 + x_2^2 \le 1 \}$$

is a simple region, because we can write

$$R_1 = \Big\{ (x_1, x_2) \in \mathbb{R}^2 : x_1 \in [-1,1],\ 0 \le x_2 \le \sqrt{1 - x_1^2} \Big\} = \Big\{ (x_1, x_2) \in \mathbb{R}^2 : x_2 \in [0,1],\ -\sqrt{1 - x_2^2} \le x_1 \le \sqrt{1 - x_2^2} \Big\}.$$

[Figure: the half-disk $R_1$ in the $x_1$-$x_2$-plane.]

Integration in Practice Slide 441
Ordinate and Simple Regions in R2

15.5. Example. The upper half-annulus

$$R_2 = \{ (x_1, x_2) \in \mathbb{R}^2 : x_2 \ge 0,\ 1/4 \le x_1^2 + x_2^2 \le 1 \}$$

is an ordinate region with respect to $x_2$ but not with respect to $x_1$. We can write

$$R_2 = \Big\{ (x_1, x_2) \in \mathbb{R}^2 : x_1 \in [-1,1],\ f(x_1) \le x_2 \le \sqrt{1 - x_1^2} \Big\},$$

where $f(x) = \sqrt{1/4 - x^2}$ if $|x| < 1/2$ and $f(x) = 0$ otherwise.

[Figure: the upper half-annulus $R_2$.]

Integration in Practice Slide 442
Ordinate and Simple Regions in R2

15.6. Example. The annulus

$$R_3 = \{ (x_1, x_2) \in \mathbb{R}^2 : 1/4 \le x_1^2 + x_2^2 \le 1 \}$$

is not an ordinate region (but can be expressed as the union of two ordinate regions).

[Figure: the annulus $R_3$.]

Integration in Practice Slide 443
Ordinate Regions in Rn

We now generalize ordinate regions to $\mathbb{R}^n$. For $x \in \mathbb{R}^n$ we define

$$\hat{x}^{(k)} := (x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n) \in \mathbb{R}^{n-1}$$

as the vector $x$ with the kth component omitted.

15.7. Definition. A subset $U \subset \mathbb{R}^n$ is said to be an ordinate region (with respect to $x_k$) if there exists a measurable set $\Omega \subset \mathbb{R}^{n-1}$ and continuous, almost everywhere differentiable functions $\varphi_1, \varphi_2 : \Omega \to \mathbb{R}$ such that

$$U = \{ x \in \mathbb{R}^n : \hat{x}^{(k)} \in \Omega,\ \varphi_1(\hat{x}^{(k)}) \le x_k \le \varphi_2(\hat{x}^{(k)}) \}.$$

If $U$ is an ordinate region with respect to each $x_k$, $k = 1, \dots, n$, it is said to be a simple region.

15.8. Lemma. Any ordinate region is measurable.
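Describing a region as an ordinate region immediately yields computable integrals. As a small sketch of our own (SciPy's quad is an assumed tool): the area of the half-disk $R_1$ of Example 15.4, read off from its ordinate-region description with respect to $x_2$, is the single-variable integral of the upper boundary function.

```python
import math
from scipy.integrate import quad

# |R1| = \int_{-1}^{1} ( sqrt(1 - x1^2) - 0 ) dx1 = pi/2
area, _ = quad(lambda x1: math.sqrt(1.0 - x1 * x1), -1.0, 1.0)
print(area, math.pi / 2)   # both ~1.5708
```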
Integration in Practice Slide 444
Ordinate Regions

15.9. Example. The unit ball in $\mathbb{R}^3$,

$$B^3 := \{ x \in \mathbb{R}^3 : \|x\| \le 1 \},$$

is an ordinate region, since we can write

$$B^3 = \Big\{ x \in \mathbb{R}^3 : (x_1, x_2) \in B^2,\ -\sqrt{1 - x_1^2 - x_2^2} \le x_3 \le \sqrt{1 - x_1^2 - x_2^2} \Big\},$$

where $B^2 := \{ x \in \mathbb{R}^2 : \|x\| \le 1 \}$. Of course, we still need to check that $B^2$ is measurable. However,

$$B^2 = \Big\{ x \in \mathbb{R}^2 : x_1 \in [-1,1],\ -\sqrt{1 - x_1^2} \le x_2 \le \sqrt{1 - x_1^2} \Big\}$$

is itself an ordinate region, and since the interval $[-1,1]$ is measurable, so is $B^2$.

Integration in Practice Slide 445
Integrals on Ordinate Regions

For an ordinate region $U \subset \mathbb{R}^n$ with respect to $x_k$ over a measurable set $\Omega$, the indicator function $\mathbb{1}_U$ takes the form

$$\mathbb{1}_U(x) = \mathbb{1}_\Omega(\hat{x}^{(k)}) \cdot \mathbb{1}_{[\varphi_1(\hat{x}^{(k)}),\, \varphi_2(\hat{x}^{(k)})]}(x_k).$$

It then follows that

$$\int_U f(x)\, dx_1 \dots dx_n = \int_\Omega \Big( \int_{\varphi_1(\hat{x}^{(k)})}^{\varphi_2(\hat{x}^{(k)})} f(x)\, dx_k \Big)\, d\hat{x}^{(k)}$$

if $\int_{\varphi_1(\hat{x}^{(k)})}^{\varphi_2(\hat{x}^{(k)})} f(x)\, dx_k$ exists for every $\hat{x}^{(k)} \in \Omega$.

Integration in Practice Slide 446
Integrals on Ordinate Regions

15.10. Example. The volume of a Jordan-measurable set $\Omega \subset \mathbb{R}^n$ is given by

$$|\Omega| = \int_\Omega 1.$$

As an example, we calculate the volume of the three-dimensional unit ball $B^3$. Writing $B^3$ in terms of ordinate regions, we have

$$|B^3| = \int_{B^3} 1 = \int_{B^2} \int_{-\sqrt{1 - x_1^2 - x_2^2}}^{\sqrt{1 - x_1^2 - x_2^2}} 1\, dx_3\, d(x_1, x_2) = 2 \int_{B^2} \sqrt{1 - x_1^2 - x_2^2}\, d(x_1, x_2) = 2 \int_{-1}^1 \Big( \int_{-\sqrt{1 - x_1^2}}^{\sqrt{1 - x_1^2}} \sqrt{1 - x_1^2 - x_2^2}\, dx_2 \Big) dx_1.$$

We now substitute $y_2 = x_2 / \sqrt{1 - x_1^2}$ in the inner integral.

Integration in Practice Slide 447
Integrals on Ordinate Regions

$$|B^3| = 2 \int_{-1}^1 \Big( (1 - x_1^2) \int_{-1}^1 \sqrt{1 - y_2^2}\, dy_2 \Big) dx_1 = 2 \int_{-1}^1 (1 - x_1^2)\, dx_1 \cdot \int_{-1}^1 \sqrt{1 - y_2^2}\, dy_2 = \frac{16}{3} \int_0^1 \sqrt{1 - y_2^2}\, dy_2.$$

Substituting $y_2 = \sin\theta$, we obtain

$$|B^3| = \frac{16}{3} \int_0^{\pi/2} \cos^2\theta\, d\theta = \frac{4}{3}\pi,$$

as expected.

Integration in Practice Slide 448
Bodies, Moments and Center of Mass

A rigid body is a set $B \subset \mathbb{R}^n$ (in physics, $n = 2, 3$) with a mass distribution $\varrho : B \to \mathbb{R}$. The mass of the body is given by

$$M(B) = \int_B \varrho.$$

We define the (first) moments of $B$ by

$$m_k(B) = \int_B x_k \varrho, \qquad k = 1, \dots, n.$$

Then the center of mass is given by

$$x_c(B) = \frac{1}{M(B)} \begin{pmatrix} m_1(B) \\ \vdots \\ m_n(B) \end{pmatrix}.$$

If $\varrho = 1$ on $B$, then $x_c(B)$ represents the geometric center and $M(B) = |B|$ the volume of $B$.

Integration in Practice Slide 449
Bodies, Moments and Center of Mass

15.11. Example. Let $B \subset \mathbb{R}^2$ be given by

$$B = \{ (x,y) \in \mathbb{R}^2 : 0 \le x \le 1,\ 0 \le y \le x^2 \}$$

with $\varrho(x,y) = x + y$. Then

$$M(B) = \int_B \varrho = \int_0^1 \int_0^{x^2} (x+y)\, dy\, dx = \int_0^1 \big[ xy + y^2/2 \big]_0^{x^2}\, dx = \int_0^1 (x^3 + x^4/2)\, dx = \frac{1}{4} + \frac{1}{10} = \frac{7}{20}.$$

The moments of $B$ are

$$m_1(B) = \int_B x\varrho = \int_0^1 \int_0^{x^2} x(x+y)\, dy\, dx = \int_0^1 (x^4 + x^5/2)\, dx = \frac{1}{5} + \frac{1}{12} = \frac{17}{60},$$

Integration in Practice Slide 450
Bodies, Moments and Center of Mass

and

$$m_2(B) = \int_B y\varrho = \int_0^1 \int_0^{x^2} y(x+y)\, dy\, dx = \int_0^1 (x^5/2 + x^6/3)\, dx = \frac{1}{12} + \frac{1}{21} = \frac{11}{84}.$$

Hence the center of mass is given by

$$x_c(B) = \frac{1}{M(B)} \begin{pmatrix} m_1(B) \\ m_2(B) \end{pmatrix} = \frac{20}{7} \begin{pmatrix} 17/60 \\ 11/84 \end{pmatrix} = \begin{pmatrix} 17/21 \\ 55/147 \end{pmatrix}.$$

Integration in Practice Slide 451
The Substitution Rule

A powerful tool in evaluating integrals is the substitution rule, which takes on a form analogous to that for functions of one variable. We will merely state, and not prove, this result.

15.12. Substitution Rule. Let $\Omega \subset \mathbb{R}^n$ be open and $g : \Omega \to \mathbb{R}^n$ injective and continuously differentiable. Suppose that $\det J_g(y) \neq 0$ for all $y \in \Omega$. Let $K$ be a compact measurable subset of $\Omega$. Then $g(K)$ is compact and measurable, and if $f : g(K) \to \mathbb{R}$ is integrable, then

$$\int_{g(K)} f(x)\, dx = \int_K f(g(y)) \cdot |\det J_g(y)|\, dy.$$
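Before the standard coordinate transformations, here is a small numerical sketch of the Substitution Rule (our own illustration; the test function $f(x,y) = x^2 + y^2$, the unit disk and SciPy's dblquad are our choices). The same integral is evaluated once directly and once in polar coordinates, where $|\det J| = r$; both values are $\pi/2$.

```python
import math
from scipy.integrate import dblquad

f = lambda y, x: x**2 + y**2

# Directly over the unit disk, viewed as an ordinate region w.r.t. y:
direct, _ = dblquad(f, -1.0, 1.0,
                    lambda x: -math.sqrt(1.0 - x * x),
                    lambda x: math.sqrt(1.0 - x * x))

# In polar coordinates: f(g(r, phi)) * |det J| = r**2 * r,
# over (r, phi) in [0, 1] x [0, 2*pi].
polar, _ = dblquad(lambda r, phi: r**3,
                   0.0, 2 * math.pi, lambda phi: 0.0, lambda phi: 1.0)

print(direct, polar, math.pi / 2)   # all three ~1.5708
```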
Integration in Practice Slide 452
Polar Coordinates

15.13. Examples. The most important substitutions are transformations to cylindrical or spherical/polar coordinates.

(i) Polar coordinates in $\mathbb{R}^2$ are defined by a map

$$\Psi : (0, \infty) \times [0, 2\pi) \to \mathbb{R}^2 \setminus \{0\}, \qquad (r, \phi) \mapsto (x, y),$$

where

$$x = r\cos\phi, \qquad y = r\sin\phi.$$

Note that this map is bijective and even $C^\infty$ in the interior of its domain. An alternative (but rarely used) version of polar coordinates would map $x = r\sin\phi$, $y = r\cos\phi$. This simply corresponds to a different geometrical interpretation of the angle $\phi$. In any case,

$$|\det J_\Psi(r, \phi)| = \Big| \det \begin{pmatrix} \cos\phi & -r\sin\phi \\ \sin\phi & r\cos\phi \end{pmatrix} \Big| = r.$$

Integration in Practice Slide 453
Cylindrical Coordinates

(ii) Cylindrical coordinates in $\mathbb{R}^3$ are given through a map

$$\Psi : (0, \infty) \times [0, 2\pi) \times \mathbb{R} \to \mathbb{R}^3 \setminus \{0\}, \qquad (r, \phi, \zeta) \mapsto (x, y, z),$$

defined by

$$x = r\cos\phi, \qquad y = r\sin\phi, \qquad z = \zeta.$$

In this case,

$$|\det J_\Psi(r, \phi, \zeta)| = \Big| \det \begin{pmatrix} \cos\phi & -r\sin\phi & 0 \\ \sin\phi & r\cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix} \Big| = r.$$

Integration in Practice Slide 454
Spherical Coordinates in R3

(iii) Spherical coordinates in $\mathbb{R}^3$ are often defined through a map

$$\Psi : (0, \infty) \times [0, 2\pi) \times (0, \pi) \to \mathbb{R}^3 \setminus \{0\}, \qquad (r, \phi, \theta) \mapsto (x, y, z),$$

$$x = r\cos\phi\sin\theta, \qquad y = r\sin\phi\sin\theta, \qquad z = r\cos\theta.$$

Of course, there is a certain freedom in defining $\theta$ and $\phi$, so there are alternative formulations. The modulus of the determinant of the Jacobian is given by

$$|\det J_\Psi(r, \phi, \theta)| = \Big| \det \begin{pmatrix} \cos\phi\sin\theta & -r\sin\phi\sin\theta & r\cos\phi\cos\theta \\ \sin\phi\sin\theta & r\cos\phi\sin\theta & r\sin\phi\cos\theta \\ \cos\theta & 0 & -r\sin\theta \end{pmatrix} \Big| = r^2\sin\theta.$$

Integration in Practice Slide 455
Spherical Coordinates in Rn

(iv) In $\mathbb{R}^n$, we can define spherical coordinates by

$$\begin{aligned}
x_1 &= r\cos\theta_1, \\
x_2 &= r\sin\theta_1\cos\theta_2, \\
x_3 &= r\sin\theta_1\sin\theta_2\cos\theta_3, \\
&\ \;\vdots \\
x_{n-1} &= r\sin\theta_1\sin\theta_2 \cdots \sin\theta_{n-2}\cos\theta_{n-1}, \\
x_n &= r\sin\theta_1\sin\theta_2 \cdots \sin\theta_{n-2}\sin\theta_{n-1}
\end{aligned}$$

with $r > 0$ and $0 < \theta_k < \pi$, $k = 1, \dots, n-2$, and $0 < \theta_{n-1} < 2\pi$. Here, the determinant of the Jacobian can be shown to satisfy

$$|\det J_\Psi(r, \theta_1, \dots, \theta_{n-1})| = r^{n-1} \sin^{n-2}\theta_1 \sin^{n-3}\theta_2 \cdots \sin\theta_{n-2}.$$

Integration in Practice Slide 456
The Substitution Rule in Practice

Using spherical coordinates in $\mathbb{R}^3$ as an example, we write

$$\int_\Omega f = \int_\Omega f(x)\, dx = \int_{\Psi^{-1}(\Omega)} f \circ \Psi(r, \theta, \phi) \cdot |\det J_\Psi(r, \theta, \phi)|\, dr\, d\theta\, d\phi.$$

The terms $dx$ and $|\det J_\Psi(r, \theta, \phi)|\, dr\, d\theta\, d\phi$ are often referred to as volume elements, and one sometimes writes

$$dx = |\det J_\Psi(r, \theta, \phi)|\, dr\, d\theta\, d\phi.$$

Physicists like to interpret $dx$ as an "infinitesimally small volume" whose volume is changed when transforming by $\Psi^{-1}$ to $dr\, d\theta\, d\phi$. Thus $|\det J_\Psi(r, \theta, \phi)|$ (which can be interpreted as the size of the parallelepiped spanned by the tangent vectors $\frac{\partial x}{\partial r}, \frac{\partial x}{\partial \theta}, \frac{\partial x}{\partial \phi}$ at $x$) corrects this change in volume. These ideas can be made rigorous, but we will not pursue them further.

Integration in Practice Slide 457
The Substitution Rule in Practice

15.14. Example. We can again calculate the volume of the unit ball $B^3 \subset \mathbb{R}^3$. Using spherical coordinates,

$$|B^3| = \int_{B^3} 1 = \int_0^{2\pi} \int_0^\pi \int_0^1 r^2\sin\theta\, dr\, d\theta\, d\phi = 2\pi \int_0^1 r^2\, dr \cdot \int_0^\pi \sin\theta\, d\theta = \frac{4\pi}{3}.$$

Note that $B^3$ is not given by

$$\{ (x,y,z) \in \mathbb{R}^3 : x = r\cos\phi\sin\theta,\ y = r\sin\phi\sin\theta,\ z = r\cos\theta,\ 0 \le \phi < 2\pi,\ 0 < \theta < \pi,\ 0 < r \le 1 \},$$

because this set does not include the set $\{ (0,0,x) : x \in [-1,1] \}$. Since this set is of measure zero (as is the boundary $S^2$ of $B^3$), our calculation remains correct.
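The same computation can be carried out numerically as a triple iterated integral (a sketch of our own, assuming SciPy; the ordering of the variables matches tplquad's calling convention):

```python
import math
from scipy.integrate import tplquad

# tplquad integrates func(z, y, x); we take x = phi, y = theta, z = r,
# so the integrand is the spherical volume element r**2 * sin(theta).
vol, _ = tplquad(lambda r, theta, phi: r**2 * math.sin(theta),
                 0.0, 2 * math.pi,                          # phi
                 lambda phi: 0.0, lambda phi: math.pi,      # theta
                 lambda phi, theta: 0.0, lambda phi, theta: 1.0)  # r
print(vol, 4 * math.pi / 3)   # both ~4.18879
```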
Integration in Practice Slide 458
Gravitational Potential

15.15. Example. We want to calculate the gravitational potential of a homogeneous solid ball in $\mathbb{R}^3$ of mass $M$ and radius $R$ at a point $p \in \mathbb{R}^3$ with distance $r > R$ from the center of the sphere. This potential is given by

$$U(p) = -G \int_{B^3} \frac{\varrho(\,\cdot\,)}{\operatorname{dist}(p, \cdot\,)},$$

where $\varrho$ is the mass density of the sphere. In our case,

$$\varrho = \frac{M}{|B^3|} \cdot \mathbb{1}_{B^3} = \frac{3M}{4\pi R^3} \cdot \mathbb{1}_{B^3}.$$

Due to the symmetry of the problem, we may choose coordinates such that $p = (0, 0, r)$ and introduce polar coordinates

$$x_1 = \rho\cos\phi\sin\theta, \qquad x_2 = \rho\sin\phi\sin\theta, \qquad x_3 = \rho\cos\theta$$

with $0 \le \phi < 2\pi$, $0 \le \theta < \pi$, $0 < \rho \le R$.

Integration in Practice Slide 459
Gravitational Potential

Then

$$\operatorname{dist}\big( p, x(\rho, \phi, \theta) \big) = \sqrt{(\rho\cos\phi\sin\theta)^2 + (\rho\sin\phi\sin\theta)^2 + (\rho\cos\theta - r)^2} = \sqrt{\rho^2 + r^2 - 2r\rho\cos\theta}$$

and

$$\begin{aligned}
U(p) &= -\frac{3MG}{4\pi R^3} \int_{B^3} \frac{dx}{\operatorname{dist}(p, x)} \\
&= -\frac{3MG}{4\pi R^3} \int_0^R \int_0^{2\pi} \int_0^\pi \frac{\rho^2\sin\theta}{\sqrt{\rho^2 + r^2 - 2r\rho\cos\theta}}\, d\theta\, d\phi\, d\rho \\
&= -\frac{3MG}{4\pi R^3} \int_0^R \int_0^{2\pi} \Big[ \frac{\rho}{r} \sqrt{\rho^2 + r^2 - 2r\rho\cos\theta}\, \Big]_{\theta=0}^{\pi}\, d\phi\, d\rho \\
&= -\frac{3MG}{2R^3 r} \int_0^R \rho \Big( \sqrt{\rho^2 + r^2 + 2r\rho} - \sqrt{\rho^2 + r^2 - 2r\rho} \Big)\, d\rho.
\end{aligned}$$

Integration in Practice Slide 460
Gravitational Potential

Continuing,

$$U(p) = -\frac{3MG}{2R^3 r} \int_0^R \rho \Big( \sqrt{(\rho + r)^2} - \sqrt{(\rho - r)^2} \Big)\, d\rho = -\frac{3MG}{2R^3 r} \int_0^R \rho \big( \rho + r - |\rho - r| \big)\, d\rho.$$

Since $r > R > \rho$, we have

$$U(p) = -\frac{3MG}{2R^3 r} \int_0^R \rho \big( \rho + r - (r - \rho) \big)\, d\rho = -\frac{3MG}{R^3 r} \int_0^R \rho^2\, d\rho = -\frac{MG}{r}.$$

Thus the potential induced by a sphere of mass $M$ and radius $R$ at a point with distance $r > R$ from the center of the sphere is the same as that induced by a point mass with mass $M$ situated at the center of the sphere.

Integration in Practice Slide 461
Gravitational Potential

15.16. Remarks. In the physical literature, this is part of what is called the shell theorem. You will study the other parts of this theorem in the assignments. An analogous formula holds for the electrostatic potential induced by a body with charge density $\varrho$.

If the mass/charge distribution of the sphere is not uniform, then the integral becomes much more difficult to solve. One then expands the integrand using the generating function of the Legendre polynomials $P_l$, $l \in \mathbb{N}$,

$$\frac{1}{\sqrt{1 - 2xt + t^2}} = \sum_{l=0}^\infty P_l(x)\, t^l,$$

which for every $x \in [-1, 1]$ has radius of convergence 1. The same expansion can be used when summing over several discrete point charges/masses, where it is then called a multi-pole expansion.

Integration in Practice Slide 462
Improper Integrals

Just as for integrals of a single variable, we can treat improper Riemann integrals of functions $f : \mathbb{R}^n \to \mathbb{R}$ over measurable sets $\Omega \subset \mathbb{R}^n$. These occur if either
1. $f$ is unbounded or
2. $\Omega$ is unbounded.

In either case, one considers the improper integral as a suitable limit of "proper" integrals; if the limit exists, so does the improper integral.

15.17. Example. Our aim is to prove that the Gauß integral

$$\int_{-\infty}^\infty e^{-x^2/2}\, dx$$

exists and equals $\sqrt{2\pi}$. First, consider the integral

$$I(a) = \int_{-a}^a e^{-x^2/2}\, dx.$$

Integration in Practice Slide 463
The Gauß Integral

Since the integrand is positive and continuous, $I(a)$ exists and is increasing. For $a > 1$,

$$\begin{aligned}
I(a) &< \int_{-a}^{-1} (-x)\, e^{-x^2/2}\, dx + \int_{-1}^1 e^{-x^2/2}\, dx + \int_1^a x\, e^{-x^2/2}\, dx \\
&= 2e^{-1/2} - 2e^{-a^2/2} + \int_{-1}^1 e^{-x^2/2}\, dx \xrightarrow{a \to \infty} 2e^{-1/2} + \int_{-1}^1 e^{-x^2/2}\, dx < \infty,
\end{aligned}$$

so $I(a)$ is bounded. It follows that $\lim_{a \to \infty} I(a) =: \int_{-\infty}^\infty e^{-x^2/2}\, dx$ exists. We now consider

$$I(a)^2 = \Big( \int_{-a}^a e^{-x^2/2}\, dx \Big) \Big( \int_{-a}^a e^{-y^2/2}\, dy \Big).$$

Integration in Practice Slide 464
The Gauß Integral

By Fubini's theorem, we can write

$$I(a)^2 = \int_{Q_a} e^{-(x^2+y^2)/2}\, dx\, dy,$$

where $Q_a = [-a,a] \times [-a,a]$.
Now $B_a(0) \subset Q_a \subset B_{2a}(0)$, where $B_r(0) = \{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 < r^2 \}$, so

$$\int_{B_a(0)} e^{-(x^2+y^2)/2}\, dx\, dy \le I(a)^2 \le \int_{B_{2a}(0)} e^{-(x^2+y^2)/2}\, dx\, dy.$$

Using polar coordinates, we calculate

$$\int_{B_R(0)} e^{-(x^2+y^2)/2}\, dx\, dy = \int_0^{2\pi} \int_0^R e^{-r^2/2}\, r\, dr\, d\phi = 2\pi \big( 1 - e^{-R^2/2} \big) \xrightarrow{R \to \infty} 2\pi.$$

This implies $\lim_{a \to \infty} I(a)^2 = 2\pi$ and hence $\int_{-\infty}^\infty e^{-x^2/2}\, dx = \sqrt{2\pi}$.

Parametrized Surfaces and Surface Integrals Slide 465
16. Parametrized Surfaces and Surface Integrals

Parametrized Surfaces and Surface Integrals Slide 466
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals

Parametrized Surfaces and Surface Integrals Slide 467
Parametrized Surfaces

We will now introduce surfaces in $\mathbb{R}^n$. While it is possible to discuss surfaces without reference to a parametrization and then consider different parametrizations and reparametrizations as we did for curves, this requires more mathematical background than we have developed at this point. Therefore, we restrict ourselves to parametrized surfaces, i.e., surfaces that are accompanied by a fixed parametrization.

16.1. Definition. A (smooth, parametrized) m-surface in $\mathbb{R}^n$ is a subset $S \subset \mathbb{R}^n$ together with a locally bijective, continuously differentiable map (parametrization)

$$\varphi : \Omega \to S, \qquad \Omega \subset \mathbb{R}^m,$$

such that $\operatorname{rank} D\varphi|_x = m$ for almost every $x \in \Omega$. If $m = n - 1$, then $(S, \varphi)$ is said to be a hypersurface.

Parametrized Surfaces and Surface Integrals Slide 468
Parametrized Surfaces

16.2. Example. The unit sphere in $\mathbb{R}^3$,

$$S^2 := \{ (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 1 \},$$

is a two-surface with parametrization

$$\varphi : [0, 2\pi] \times [0, \pi] \to S^2, \qquad \varphi(\phi, \theta) = \begin{pmatrix} \cos\phi\sin\theta \\ \sin\phi\sin\theta \\ \cos\theta \end{pmatrix}.$$

We note that

$$\operatorname{rank} D\varphi|_{(\phi,\theta)} = \operatorname{rank} \begin{pmatrix} -\sin\phi\sin\theta & \cos\phi\cos\theta \\ \cos\phi\sin\theta & \sin\phi\cos\theta \\ 0 & -\sin\theta \end{pmatrix} = \operatorname{rank} \begin{pmatrix} -\sin\phi & \cos\phi\cos\theta \\ \cos\phi & \sin\phi\cos\theta \\ 0 & -\sin\theta \end{pmatrix} = 2$$

when $\sin\theta \neq 0$. Hence $\operatorname{rank} D\varphi = 2$ almost everywhere on $[0, 2\pi] \times [0, \pi]$.

Parametrized Surfaces and Surface Integrals Slide 469
Parametrized Surfaces

16.3. Example. Consider the graph of a scalar function $f : \Omega \to \mathbb{R}$, $\Omega \subset \mathbb{R}^n$,

$$\Gamma(f) = \{ (x_1, \dots, x_n, x_{n+1}) \in \mathbb{R}^{n+1} : x = (x_1, \dots, x_n) \in \Omega,\ x_{n+1} = f(x) \}.$$

This is a hypersurface in $\mathbb{R}^{n+1}$ with parametrization

$$\varphi : \Omega \to \Gamma(f), \qquad \varphi(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_n \\ f(x_1, \dots, x_n) \end{pmatrix}.$$

The rank of the Jacobian is

$$\operatorname{rank} D\varphi|_x = \operatorname{rank} \begin{pmatrix} \mathbb{1} \\ Df|_x \end{pmatrix} = n,$$

written in block matrix form, where $\mathbb{1} \in \operatorname{Mat}(n \times n; \mathbb{R})$ is the $n \times n$ unit matrix.

Parametrized Surfaces and Surface Integrals Slide 470
Tangent Spaces of Surfaces

We now want to define the tangent space of a parametrized m-surface $S \subset \mathbb{R}^n$. The parametrization $\varphi : \Omega \to S$ satisfies

$$\varphi(x_0 + h) = \varphi(x_0) + D\varphi|_{x_0} h + o(h) \qquad \text{as } h \to 0.$$

Hence we consider the map $h \mapsto D\varphi|_{x_0} h$ to be the linear approximation to $\varphi$ near $x_0$. The range of this map is given by

$$\{ x \in \mathbb{R}^n : x = D\varphi|_{x_0} h,\ h \in \mathbb{R}^m \}$$

and is equal to the span of the column vectors of $D\varphi|_{x_0}$. The elements of the range give good approximations to $S$ at points near $x_0$.
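Both the rank condition of Definition 16.1 and this linear approximation can be observed numerically for the sphere of Example 16.2. The sketch below is our own illustration (NumPy and the evaluation points are our choices; the Jacobian is written out by hand from the parametrization):

```python
import numpy as np

def phi(a, t):      # sphere parametrization; a = azimuth phi, t = theta
    return np.array([np.cos(a) * np.sin(t), np.sin(a) * np.sin(t), np.cos(t)])

def Dphi(a, t):     # 3x2 Jacobian; columns are the partial derivatives
    return np.array([[-np.sin(a) * np.sin(t), np.cos(a) * np.cos(t)],
                     [ np.cos(a) * np.sin(t), np.sin(a) * np.cos(t)],
                     [ 0.0,                  -np.sin(t)]])

a0, t0 = 0.3, 1.1
print(np.linalg.matrix_rank(Dphi(a0, t0)))   # 2, since sin(t0) != 0

for eps in (1e-1, 1e-2, 1e-3):
    h = np.array([eps, -eps])
    err = np.linalg.norm(phi(a0 + h[0], t0 + h[1])
                         - (phi(a0, t0) + Dphi(a0, t0) @ h))
    print(eps, err / np.linalg.norm(h))       # tends to 0: the error is o(h)
```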
Parametrized Surfaces and Surface Integrals Slide 470 Tangent Spaces of Surfaces

We now want to define the tangent space of a parametrized m-surface $S \subset \mathbb{R}^n$. The parametrization $\varphi: \Omega \to S$ satisfies

$\varphi(x_0 + h) = \varphi(x_0) + D\varphi|_{x_0} h + o(h) \qquad \text{as } h \to 0.$

Hence we consider the map $h \mapsto D\varphi|_{x_0} h$ to be the linear approximation to $\varphi$ near $x_0$. The range of this map is given by

$\{y \in \mathbb{R}^n : y = D\varphi|_{x_0} h,\ h \in \mathbb{R}^m\}$

and is equal to the span of the column vectors of $D\varphi|_{x_0}$. The elements of the range give good approximations to $S$ at points near $\varphi(x_0)$.

Parametrized Surfaces and Surface Integrals Slide 471 Tangent Spaces of Surfaces

Hence, it is natural to make the following definition.

16.4. Definition. Let $S \subset \mathbb{R}^n$ be a smooth, parametrized m-surface with parametrization $\varphi: \Omega \to S$. Then

$t_k(p) = \frac{\partial}{\partial x_k}\big(\varphi_1(x),\dots,\varphi_n(x)\big)^T\Big|_{x = \varphi^{-1}(p)}, \qquad k = 1,\dots,m,$

is called the kth tangent vector of $S$ at $p \in S$ and

$T_p S := \operatorname{ran} D\varphi|_x = \operatorname{span}\{t_1(p),\dots,t_m(p)\}$

is called the tangent space to $S$ at $p$. The vector field

$t_k: S \to \mathbb{R}^n, \qquad p \mapsto t_k(p),$

is called the kth tangent vector field on $S$.

Parametrized Surfaces and Surface Integrals Slide 472 Tangent Vectors to the Unit Sphere

16.5. Example. For the unit sphere $S^2 \subset \mathbb{R}^3$ parametrized with

$\varphi(\phi,\theta) = (\cos\phi\sin\theta,\ \sin\phi\sin\theta,\ \cos\theta)^T$

we have the tangent vectors at $p \in S^2$ given by

$t_\phi(p) = \frac{\partial\varphi}{\partial\phi} = (-\sin\phi\sin\theta,\ \cos\phi\sin\theta,\ 0)^T, \qquad t_\theta(p) = \frac{\partial\varphi}{\partial\theta} = (\cos\phi\cos\theta,\ \sin\phi\cos\theta,\ -\sin\theta)^T,$

taken at $(\phi,\theta) = \varphi^{-1}(p)$.

Parametrized Surfaces and Surface Integrals Slide 473 Tangent Vectors to the Unit Sphere

At $p = (1/\sqrt{2},\ 0,\ 1/\sqrt{2}) = \varphi(0, \pi/4)$ the tangent vectors are

$t_\phi(p) = (0,\ \sqrt{2}/2,\ 0)^T, \qquad t_\theta(p) = (\sqrt{2}/2,\ 0,\ -\sqrt{2}/2)^T$

and the tangent space is

$T_p S^2 = \operatorname{span}\{t_\phi(p), t_\theta(p)\} = \Big\{x \in \mathbb{R}^3 : x = \alpha\,(0,1,0)^T + \beta\,(1,0,-1)^T,\ \alpha,\beta \in \mathbb{R}\Big\}.$

Parametrized Surfaces and Surface Integrals Slide 474 The Normal Vector to Hypersurfaces

The tangent space of an m-surface in $\mathbb{R}^n$ is an m-dimensional subspace of $\mathbb{R}^n$. If $S$ is a hypersurface, i.e., an $(n-1)$-surface in $\mathbb{R}^n$, then $(T_p S)^\perp$ is a 1-dimensional subspace of $\mathbb{R}^n$ and there exists a unit basis vector of this space. This basis vector is uniquely defined up to its sign.

16.6. Definition. Let $S \subset \mathbb{R}^n$ be a hypersurface. Then a unit vector that is orthogonal to all tangent vectors to $S$ at $p$ is called a unit normal vector to $S$ at $p$ and denoted by $N(p)$. The mapping

$N: S \to \mathbb{R}^n, \qquad p \mapsto N(p),$

is called the normal vector field on $S$.

Parametrized Surfaces and Surface Integrals Slide 475 Orientation of Hypersurfaces

16.7. Example. Returning to Example 16.5, the unit normal vector at $p = (1/\sqrt{2},\ 0,\ 1/\sqrt{2})$ is orthogonal to both $t_\phi(p)$ and $t_\theta(p)$ and given by

$N(p) = \pm(\sqrt{2}/2,\ 0,\ \sqrt{2}/2)^T = \pm p,$

where we are free to choose a sign arbitrarily. Since the unit normal vector is uniquely determined up to its sign, there are two possible choices for a normal vector at each $p \in S$. Usually, one chooses the direction of the normal vector at a single point of $S$ and attempts to choose the normal vector at all other points of $S$ in such a way that the normal vector field is continuous on $S$.

Parametrized Surfaces and Surface Integrals Slide 476 Orientation of Hypersurfaces

16.8. Definition.
(i) A hypersurface $S \subset \mathbb{R}^n$ that admits a continuous normal vector field is said to be orientable.
(ii) A choice of direction for the normal vector field is called an orientation of $S$.
(iii) A hypersurface that is the boundary of a measurable set $\Omega \subset \mathbb{R}^n$ with non-zero measure is said to be a closed surface.
(iv) A closed hypersurface is said to have positive orientation if the normal vector field is chosen so that the normal vectors point outwards from $\Omega$.

Later, we will give a more accessible way of distinguishing between a closed surface and the alternative, a surface with boundary.
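The tangent vectors of Example 16.5 and the normal vector of Example 16.7 can be reproduced in a few lines of Mathematica (again with f and t standing for $\phi$ and $\theta$):

phi = {Cos[f] Sin[t], Sin[f] Sin[t], Cos[t]};
tf = D[phi, f] /. {f -> 0, t -> Pi/4}   (* {0, 1/Sqrt[2], 0} *)
tt = D[phi, t] /. {f -> 0, t -> Pi/4}   (* {1/Sqrt[2], 0, -1/Sqrt[2]} *)
Normalize[Cross[tf, tt]]                (* {-1/Sqrt[2], 0, -1/Sqrt[2]}, i.e. -p, as in Example 16.7 *)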
Parametrized Surfaces and Surface Integrals Slide 477 Orientation of Surfaces

16.9. Example. The classic example of a 2-surface that is not orientable is the Möbius strip in $\mathbb{R}^3$. A parametrization is given by

$\varphi: [-w, w] \times [0, 2\pi) \to \mathbb{R}^3, \qquad \varphi(s,t) = \begin{pmatrix} (R + s\cos(t/2))\cos t \\ (R + s\cos(t/2))\sin t \\ s\sin(t/2) \end{pmatrix}.$

The above parametrization gives a Möbius strip of width $2w > 0$ lying in the $x_1$-$x_2$ plane. Suppose a normal vector is chosen at some point $p$. Moving the normal vector around the strip back to its initial position $p$, it then points in the other direction. Hence, the normal vector field is not continuous.

Parametrized Surfaces and Surface Integrals Slide 478 Normal Vectors for Curves in R²

In $\mathbb{R}^2$, a curve is a hypersurface and we have already introduced the concept of a normal vector in Definition 13.9. There, the normal vector always points into the direction of change of the tangent vector of the curve. This can cause the normal vector to "jump" as the curve winds, in which case we do not obtain a continuous normal vector field. Therefore, whenever we regard a curve in $\mathbb{R}^2$ as being a surface, we will use the normal vector convention for surfaces described here rather than Definition 13.9.
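Returning to Example 16.9, the sign flip of the normal can be made concrete numerically: the normal obtained from the parametrization points in opposite directions at $t = 0$ and as $t \to 2\pi$, even though both parameter values describe the same point on the strip. A minimal sketch (R0 is a sample value for the radius $R$):

R0 = 2;
phi = {(R0 + s Cos[t/2]) Cos[t], (R0 + s Cos[t/2]) Sin[t], s Sin[t/2]};
n[s0_, t0_] := Normalize[Cross[D[phi, s], D[phi, t]] /. {s -> s0, t -> t0}]
n[0, 0]               (* {0, 0, 1} *)
n[0, 2. Pi - 10^-6]   (* approximately {0, 0, -1}: same point, opposite normal *)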
Parametrized Surfaces and Surface Integrals Slide 479 Infinitesimal Surface Elements of Hypersurfaces

Our goal now is to define the area of surfaces. Consider a parametrized 2-surface $S$ in $\mathbb{R}^3$. At any point $p \in S$ there exist two tangent vectors $t_1(p)$ and $t_2(p)$. Suppose $\varphi = \varphi(x_1,x_2)$ is a parametrization of $S$ and $p = \varphi(x)$. We would like to define an "infinitesimal surface element"

$dA = \big(\text{area of the parallelogram spanned by } t_1 \text{ and } t_2 \text{ at } \varphi(x_1,x_2)\big)\,dx_1\,dx_2 = \|t_1 \times t_2\| \circ \varphi(x_1,x_2)\,dx_1\,dx_2.$

However, this expression would not generalize well to higher dimensions. Another approach would be to make use of the unit normal vector $N$ at $p = \varphi(x)$ (because $S$ is a hypersurface). We can hence replace the area of the parallelogram spanned by $t_1$ and $t_2$ by the volume of the parallelepiped spanned by $t_1$, $t_2$ and $N$.

Parametrized Surfaces and Surface Integrals Slide 480 Volume (Area) of Hypersurfaces

We define the scalar surface element of a hypersurface in $\mathbb{R}^3$ by

$dA = |\det(t_1, t_2, N) \circ \varphi(x_1,x_2)|\,dx_1\,dx_2.$

This may be generalized to hypersurfaces in $\mathbb{R}^n$ by setting

$dA = |\det(t_1, t_2, \dots, t_{n-1}, N) \circ \varphi(x_1,\dots,x_{n-1})|\,dx_1\,dx_2 \cdots dx_{n-1}.$

16.10. Definition. Let $S \subset \mathbb{R}^n$ be a hypersurface with parametrization $\varphi \in C^1(\Omega; \mathbb{R}^n)$, $\Omega \subset \mathbb{R}^{n-1}$. Let $t_j = D\varphi\,e_j$, $j = 1,\dots,n-1$, be the tangent vector fields on $S$. Let $N$ be a chosen normal vector field on $S$ (so that $S$ is oriented). Then the volume or area of $S$ is defined as

$|S| := \int_\Omega |\det(t_1,\dots,t_{n-1},N) \circ \varphi(x)|\,dx_1\,dx_2 \cdots dx_{n-1}.$

Parametrized Surfaces and Surface Integrals Slide 481 Area of the Unit Sphere in R³

16.11. Example. In Example 16.5 we have seen that $S^2$ with parametrization

$\varphi(\phi,\theta) = (\cos\phi\sin\theta,\ \sin\phi\sin\theta,\ \cos\theta)^T$

has tangent vectors

$t_\phi \circ \varphi(\phi,\theta) = (-\sin\phi\sin\theta,\ \cos\phi\sin\theta,\ 0)^T, \qquad t_\theta \circ \varphi(\phi,\theta) = (\cos\phi\cos\theta,\ \sin\phi\cos\theta,\ -\sin\theta)^T.$

To calculate the normal vector, we can simply take

$(t_\phi \times t_\theta) \circ \varphi(\phi,\theta) = (-\cos\phi\sin^2\theta,\ -\sin\phi\sin^2\theta,\ -\cos\theta\sin\theta)^T.$

Parametrized Surfaces and Surface Integrals Slide 482 Area of the Unit Sphere in R³

Taking account of $\|t_\phi \times t_\theta\| = \sin\theta$, we have

$N \circ \varphi(\phi,\theta) = -(\cos\phi\sin\theta,\ \sin\phi\sin\theta,\ \cos\theta)^T.$

Then the area of the unit sphere is given by

$|S^2| = \int_0^{2\pi}\!\int_0^\pi \Bigg|\det\begin{pmatrix} -\sin\phi\sin\theta & \cos\phi\cos\theta & -\cos\phi\sin\theta \\ \cos\phi\sin\theta & \sin\phi\cos\theta & -\sin\phi\sin\theta \\ 0 & -\sin\theta & -\cos\theta \end{pmatrix}\Bigg|\,d\theta\,d\phi$

$= \int_0^{2\pi}\!\int_0^\pi \sin\theta\,\Bigg|\det\begin{pmatrix} -\sin\phi & \cos\phi\cos\theta & \cos\phi\sin\theta \\ \cos\phi & \sin\phi\cos\theta & \sin\phi\sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}\Bigg|\,d\theta\,d\phi = 2\pi\int_0^\pi \sin\theta\,d\theta = 4\pi,$

where we have factored $\sin\theta$ out of the first column and $-1$ out of the third column; the remaining determinant equals $-1$.

Parametrized Surfaces and Surface Integrals Slide 483 Infinitesimal Surface Elements of Arbitrary Surfaces

We would like to generalize the concept of area and infinitesimal surface elements from hypersurfaces to arbitrary surfaces in $\mathbb{R}^n$. From the beginning, the introduction of the normal vector to calculate the surface area by means of the volume was undertaken by necessity rather than through any other considerations. We note that, in block matrix notation,

$\det(t_1,\dots,t_{n-1},N)^2 = \det\big((t_1,\dots,t_{n-1},N)^T\big)\cdot\det(t_1,\dots,t_{n-1},N) = \det\big((t_1,\dots,t_{n-1},N)^T \cdot (t_1,\dots,t_{n-1},N)\big)$

$= \det\Bigg(\begin{pmatrix} t_1^T \\ \vdots \\ t_{n-1}^T \\ N^T \end{pmatrix}\cdot(t_1,\dots,t_{n-1},N)\Bigg).$

Parametrized Surfaces and Surface Integrals Slide 484 Infinitesimal Surface Elements of Arbitrary Surfaces

Performing the row-by-column matrix multiplication, we see that

$\det(t_1,\dots,t_{n-1},N)^2 = \det\begin{pmatrix} \langle t_1,t_1\rangle & \cdots & \langle t_1,t_{n-1}\rangle & \langle t_1,N\rangle \\ \vdots & \ddots & \vdots & \vdots \\ \langle t_{n-1},t_1\rangle & \cdots & \langle t_{n-1},t_{n-1}\rangle & \langle t_{n-1},N\rangle \\ \langle N,t_1\rangle & \cdots & \langle N,t_{n-1}\rangle & \langle N,N\rangle \end{pmatrix} = \det\begin{pmatrix} \langle t_1,t_1\rangle & \cdots & \langle t_1,t_{n-1}\rangle & 0 \\ \vdots & \ddots & \vdots & \vdots \\ \langle t_{n-1},t_1\rangle & \cdots & \langle t_{n-1},t_{n-1}\rangle & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}$

$= \det\begin{pmatrix} \langle t_1,t_1\rangle & \cdots & \langle t_1,t_{n-1}\rangle \\ \vdots & \ddots & \vdots \\ \langle t_{n-1},t_1\rangle & \cdots & \langle t_{n-1},t_{n-1}\rangle \end{pmatrix},$

where we have used that the normal vector is orthogonal to all tangent vectors and has unit length.

Parametrized Surfaces and Surface Integrals Slide 485 The Metric Tensor

16.12. Definition. Let $S \subset \mathbb{R}^n$ be an m-surface with parametrization $\varphi$ and tangent vector fields $t_1,\dots,t_m$. Then $G \in \operatorname{Mat}(m \times m; \mathbb{R})$ given by

$G := \begin{pmatrix} \langle t_1,t_1\rangle & \cdots & \langle t_1,t_m\rangle \\ \vdots & \ddots & \vdots \\ \langle t_m,t_1\rangle & \cdots & \langle t_m,t_m\rangle \end{pmatrix}$

is said to be the metric tensor on $S$ with respect to $\varphi$. The coefficients

$g_{ij} := \langle t_i, t_j\rangle, \qquad i,j = 1,\dots,m,$

are called the metric coefficients of $G$. We often write $g(x) = \det G(\varphi(x))$ for short.

Parametrized Surfaces and Surface Integrals Slide 486 The Metric Tensor

16.13. Remarks.
(i) We have proved that if $S$ is a hypersurface in $\mathbb{R}^n$, then

$|\det(t_1,\dots,t_{n-1},N)| = \sqrt{\det G}.$

This will allow us to extend the definition of area/volume to general surfaces.
(ii) In the case $n = 3$, $m = 2$, we have

$g = \det\begin{pmatrix} \|t_1\|^2 & \langle t_1,t_2\rangle \\ \langle t_2,t_1\rangle & \|t_2\|^2 \end{pmatrix} = \|t_1\|^2\|t_2\|^2 - \langle t_1,t_2\rangle^2 = \|t_1\|^2\|t_2\|^2\big(1 - \cos^2\angle(t_1,t_2)\big) = \|t_1\|^2\|t_2\|^2\sin^2\angle(t_1,t_2) = \|t_1 \times t_2\|^2,$

so that $dA = \|t_1 \times t_2\| \circ \varphi(x)\,dx_1\,dx_2$.
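For the sphere, the metric tensor can be computed mechanically; the following Mathematica lines reproduce $\sqrt{g} = \sin\theta$ and the area $|S^2| = 4\pi$ of Example 16.11 (f and t again stand for $\phi$ and $\theta$):

phi = {Cos[f] Sin[t], Sin[f] Sin[t], Cos[t]};
tf = D[phi, f]; tt = D[phi, t];
G = Simplify[{{tf.tf, tf.tt}, {tt.tf, tt.tt}}]  (* {{Sin[t]^2, 0}, {0, 1}} *)
Simplify[Sqrt[Det[G]], 0 < t < Pi]              (* Sin[t] *)
Integrate[Sin[t], {t, 0, Pi}, {f, 0, 2 Pi}]     (* 4 Pi *)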
Parametrized Surfaces and Surface Integrals Slide 487 Scalar Surface Integrals

16.14. Definition. Let $S$ be a parametrized m-surface with parametrization $\varphi: \Omega \to S$, $\Omega \subset \mathbb{R}^m$. Then

$|S| := \int_\Omega \sqrt{g(x)}\,dx$

defines the volume or area of $S$. Let $f: S \to \mathbb{R}$ be a potential function. Then the (scalar) surface integral of $f$ over $S$ is defined as

$\int_S f\,dA := \int_\Omega f \circ \varphi(x)\,\sqrt{g(x)}\,dx.$

16.15. Remark. As usual, $dA := \sqrt{g(x)}\,dx$ is called the scalar surface element of $S$.

Parametrized Surfaces and Surface Integrals Slide 488 Electrostatic Potential of a Surface Charge

16.16. Example. The electrostatic potential $V(p)$ at a point $p \in \mathbb{R}^3$ induced by a charged surface $S$ is given by

$V(p) = \frac{1}{4\pi\varepsilon_0}\int_S \frac{\varrho(\,\cdot\,)}{\operatorname{dist}(p,\,\cdot\,)}\,dA,$

where $\varrho$ is the charge density of the surface. Let

$S = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 = z^2,\ 0 \le z \le 1\}$

and assume that $\varrho$ is constant on $S$. We calculate the potential at the point $p = (0,0,1)$. Introducing polar coordinates, we have

$S = \{(x,y,z) \in \mathbb{R}^3 : x = r\cos\theta,\ y = r\sin\theta,\ z = r,\ 0 \le \theta \le 2\pi,\ 0 \le r \le 1\}.$

Parametrized Surfaces and Surface Integrals Slide 489 Electrostatic Potential of a Surface Charge

We can read off that a parametrization of $S$ is given by

$\varphi: [0,2\pi] \times [0,1] \to S, \qquad \varphi(\theta,r) = (r\cos\theta,\ r\sin\theta,\ r)^T.$

The tangent vectors are

$t_\theta \circ \varphi(\theta,r) = (-r\sin\theta,\ r\cos\theta,\ 0)^T, \qquad t_r \circ \varphi(\theta,r) = (\cos\theta,\ \sin\theta,\ 1)^T.$

Hence,

$g(\theta,r) = \det\begin{pmatrix} \langle t_\theta,t_\theta\rangle & \langle t_\theta,t_r\rangle \\ \langle t_r,t_\theta\rangle & \langle t_r,t_r\rangle \end{pmatrix}\Bigg|_{\varphi(\theta,r)} = \det\begin{pmatrix} r^2 & 0 \\ 0 & 2 \end{pmatrix} = 2r^2.$

Parametrized Surfaces and Surface Integrals Slide 490 Electrostatic Potential of a Surface Charge

It follows that the volume element is given by $dA = \sqrt{2}\,r\,dr\,d\theta$. We then have

$V(p) = \frac{\varrho}{4\pi\varepsilon_0}\int_S \frac{dA}{\|p - (\,\cdot\,)\|} = \frac{\varrho}{4\pi\varepsilon_0}\int_0^{2\pi}\!\int_0^1 \frac{\sqrt{2}\,r\,dr\,d\theta}{\sqrt{r^2\cos^2\theta + r^2\sin^2\theta + (1-r)^2}}$

$= \frac{\varrho}{\sqrt{2}\,\varepsilon_0}\int_0^1 \frac{r\,dr}{\sqrt{2r^2 - 2r + 1}} = \frac{\varrho\,\ln(3 + 2\sqrt{2})}{4\varepsilon_0}.$
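The final one-dimensional integral can be delegated to Mathematica; a quick check that it has the stated value (the exact form returned may vary by version):

Integrate[r/Sqrt[2 r^2 - 2 r + 1], {r, 0, 1}]
(* an equivalent closed form, e.g. ArcSinh[1]/Sqrt[2] *)
FullSimplify[% - Log[3 + 2 Sqrt[2]]/(2 Sqrt[2])]   (* 0 *)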
Slide 491

Part 3: Vector Fields and Higher Order Derivatives

Slide 492 Vector Fields and Higher Order Derivatives

17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Potential Functions and the Gradient Slide 493

17. Potential Functions and the Gradient

Potential Functions and the Gradient Slide 494 Vector Fields and Higher Order Derivatives

17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Potential Functions and the Gradient Slide 495 Potentials

A map $f: \Omega \to \mathbb{R}$, where $\Omega \subset \mathbb{R}^n$, is called a scalar function or a potential. Physically, if $n = 3$, a potential assigns to each point in space a scalar value. Examples include temperature, pressure or height. While such functions have appeared in the previous sections, here we will take a closer look at some basic aspects of potentials. The first question is how to visualize them.

Potential Functions and the Gradient Slide 496 Visualizing Functions f: R² → R

Suppose we have a function $f: \mathbb{R}^2 \to \mathbb{R}$, i.e., a real function of two variables. One method of graphing such a function is using a three-dimensional graph showing the $(x_1,x_2,z)$-axes and plotting $z = f(x_1,x_2)$. For example, the graph below shows the function

$f: [0,4\pi] \times [0,4\pi] \to \mathbb{R}, \qquad f(x_1,x_2) = \cos x_1 + \cos x_2.$

[Figure: 3D plot of $f(x_1,x_2) = \cos x_1 + \cos x_2$.]

Potential Functions and the Gradient Slide 497 3D Plots with Mathematica

17.1. Example. The following Mathematica command creates the three-dimensional plot on the previous slide:

Plot3D[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
 Mesh -> False,
 Ticks -> {{0, Pi, 2 Pi, 3 Pi}, {0, Pi, 2 Pi, 3 Pi}, {-2, 0, 2}},
 AxesLabel -> {x1, x2, f[x1, x2]},
 BaseStyle -> {FontSize -> 12, FontFamily -> "CMU Sans Serif"}]

Potential Functions and the Gradient Slide 498 Contour Plots

Another representation for functions $f: \mathbb{R}^2 \to \mathbb{R}$ is the so-called contour plot. In this two-dimensional graph we plot curves $C_\alpha = f^{-1}(\{\alpha\})$ for several values of $\alpha$. These are the pre-image sets (see (9.3)) of $\{\alpha\}$.

Potential Functions and the Gradient Slide 499 Contour Plots

To illustrate, we successively show contours of

$f: [0,4\pi] \times [0,4\pi] \to \mathbb{R}, \qquad f(x_1,x_2) = \cos x_1 + \cos x_2.$

[Figure: contour plot of $f$ with contours labeled by the values $\pm 1.9$, $\pm 1.5$, $\pm 1$, $\pm 0.5$ and $0$.]

Potential Functions and the Gradient Slide 500 Contour Plots

Instead of labeling, Mathematica can also color-code the contours according to their values. Here, dark colors represent smaller values, light colors larger values.

[Figure: color-coded contour plot of the same function.]

Potential Functions and the Gradient Slide 501 Contour Plots with Mathematica

17.2. Example. The following Mathematica commands create the contour plots on the previous slides:

ContourPlot[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
 FrameLabel -> {x1, x2}, RotateLabel -> False,
 FrameTicks -> {{0, Pi, 2 Pi, 3 Pi, 4 Pi}, {0, Pi, 2 Pi, 3 Pi, 4 Pi}, {}, {}},
 ContourStyle -> {{RGBColor[0, 1, 0.5], Thickness[0.004]}},
 BaseStyle -> {FontSize -> 14, FontFamily -> "CMU Sans Serif"},
 PlotPoints -> 50,
 ContourLabels -> (Text[Framed[#3, FrameStyle -> White, FrameMargins -> 0.2],
    {#1, #2}, Background -> White, BaseStyle -> {FontSize -> 10}] &),
 Contours -> {0, 0.5, -0.5, -1, -1.5, 1, 1.5, 1.9, -1.9},
 ContourShading -> None]

ContourPlot[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
 FrameLabel -> {x1, x2}, RotateLabel -> False,
 FrameTicks -> {{0, Pi, 2 Pi, 3 Pi, 4 Pi}, {0, Pi, 2 Pi, 3 Pi, 4 Pi}, {}, {}},
 BaseStyle -> {FontSize -> 16, FontFamily -> "CMU Sans Serif"},
 PlotPoints -> 50, Contours -> 10]

Potential Functions and the Gradient Slide 502 Phase Curves

In the Hamiltonian formulation of analytical mechanics, one defines a so-called Hamilton function $H$ for a mechanical system. This function is the sum of the kinetic energy ($T$) and the potential energy ($V$). It represents the total energy of the system, and remains constant if the system satisfies the law of energy conservation (there are not, for example, any frictional forces). We will assume this for our present discussion. In this approach, the essential variables of a system are the position $x$ and the momentum $p$. The variables are tracked in so-called phase space $\mathbb{R}^n_x \times \mathbb{R}^n_p = \mathbb{R}^{2n}_{(x,p)}$, where, typically, $n = 1, 2$ or $3$. The time-evolution of the system is represented through phase curves in $\mathbb{R}^{2n}$, which are given by the contour lines of $H$, regarded as a function $\mathbb{R}^{2n} \to \mathbb{R}$. In other words, a phase curve is the set $H^{-1}(E)$, where $E$ is the conserved energy of the system.

Potential Functions and the Gradient Slide 503 Phase Curves

17.3. Example. For the simple harmonic oscillator, the kinetic energy is given by $T = \frac{1}{2}mv^2 = p^2/(2m)$ and the potential energy is given by $V = \frac{k}{2}x^2$, so

$H(x,p) = \frac{1}{2m}p^2 + \frac{k}{2}x^2.$

The phase curves of the system are ellipses in $\mathbb{R}^2_{x,p}$, with each ellipse describing the behavior of a harmonic oscillator at a fixed energy $E$.
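Since phase curves are just contour lines of $H$, ContourPlot draws a phase portrait directly; a minimal sketch for Example 17.3, with $m$ and $k$ set to 1 for plotting:

ContourPlot[p^2/2 + x^2/2, {x, -2, 2}, {p, -2, 2},
 Contours -> 10, ContourShading -> None,
 FrameLabel -> {"x", "p"}]

Replacing the potential term by -m g l Cos[x] (with sample values for m, g and l) produces the phase portrait requested in Example 17.4 below.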
Potential Functions and the Gradient Slide 504 Phase Curves

17.4. Example. For a mathematical pendulum of length $l$ with mass $m$, $V = -mgl\cos\theta$, so

$H(\theta,p) = \frac{1}{2m}p^2 - mgl\cos\theta.$

Sketch the phase curves of the pendulum for different energies and interpret them physically!

Potential Functions and the Gradient Slide 505 Derivatives of Potential Functions

The Jacobian of a differentiable potential is given by

$Df|_x = \Big(\frac{\partial f}{\partial x_1}\Big|_x\ \ \cdots\ \ \frac{\partial f}{\partial x_n}\Big|_x\Big).$

The row vector $Df|_x$ may be regarded as a linear map $Df|_x: \mathbb{R}^n \to \mathbb{R}$,

$Df|_x\,y = \Big(\frac{\partial f}{\partial x_1}\Big|_x\ \cdots\ \frac{\partial f}{\partial x_n}\Big|_x\Big)\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\Big|_x\,y_i.$

Thus $Df|_x \in (\mathbb{R}^n)^*$, the dual space of $\mathbb{R}^n$ (see Examples 4.6 ii)).

Potential Functions and the Gradient Slide 506 Coordinate Maps

Classically, we considered $x_j$ ($1 \le j \le n$) as a coordinate of the vector $x = (x_1,\dots,x_n) \in \mathbb{R}^n$. We now introduce a different interpretation. Define the map

$\mathbb{R}^n \to \mathbb{R}, \qquad x = (x_1,\dots,x_n)^T \mapsto x_j.$

This map is clearly linear; it is the coordinate map that assigns to $x \in \mathbb{R}^n$ its coordinate $x_j$. We denote this map by $x_j$ also; hence $x_j: x \mapsto x_j$ or $x_j(x) = x_j$. The dual meaning of $x_j$ as a map and the value of this map is very convenient. In fact, the entire discipline of differential geometry hinges on exploiting this ambiguity. The derivative of the map $x_j$ is given by

$dx_j = (0,\dots,0,1,0,\dots,0), \qquad (17.1)$

with the entry 1 in the $j$th position.

Potential Functions and the Gradient Slide 507 The Differential and the Gradient

Note that we have written $dx_j$ instead of $Dx_j$; this is a traditional notation for potential functions. The derivative is also called a differential and written $df$ instead of $Df$. Note also that $dx_j|_x$ does not depend on $x$. Therefore, we have

$df|_x = \Big(\frac{\partial f}{\partial x_1}\Big|_x\ \cdots\ \frac{\partial f}{\partial x_n}\Big|_x\Big) = \frac{\partial f}{\partial x_1}\Big|_x\,dx_1 + \dots + \frac{\partial f}{\partial x_n}\Big|_x\,dx_n.$

Each differential $dx_j = e_j^*$ is the dual basis vector to the standard basis vector $e_j$ with respect to the euclidean scalar product. The transpose of the Jacobian is the gradient,

$\nabla f(x) := (Jf(x))^T = \Big(\frac{\partial f}{\partial x_1}\Big|_x, \dots, \frac{\partial f}{\partial x_n}\Big|_x\Big)^T.$

The triangle symbol $\nabla$ is called nabla.

Potential Functions and the Gradient Slide 508 Etymology of Nabla

The term "nabla" derives from the Greek word for a Phoenician harp, whose shape the nabla triangle $\nabla$ is supposed to resemble. It was used by the physicists James Maxwell and Peter Tait (the latter developed much of the modern mathematics of the nabla operator) in their private correspondence. There is evidence that this was a private joke between them and that Maxwell did not use the term in serious publications. However, it became popular nevertheless, being used by William Thomson (Lord Kelvin) at the end of the 19th century. Another proposal has been to call the symbol "del", but it seems that "nabla" is the most common term today.

[Figure: Harps from 1911 Webster's Dictionary. Wikimedia Commons. Wikimedia Foundation. Web. 18 July 2018]

Potential Functions and the Gradient Slide 509 The Directional Derivative

17.5. Definition. Let $\Omega \subset \mathbb{R}^n$ be an open set, $f: \Omega \to \mathbb{R}$ continuous and $h \in \mathbb{R}^n$, $\|h\| = 1$, a unit vector. Then the directional derivative $D_h f$ in the direction $h$ is defined by

$D_h f|_x := \frac{d}{dt}f(x + th)\Big|_{t=0} \qquad (17.2)$

if the right-hand side exists.

17.6. Remarks.
(i) It is essential that $\|h\| = 1$, otherwise the slope will not be scaled correctly.
(ii) The directional derivative is a number, in contradistinction to the derivative. Thus it should perhaps be more properly known as the "directional slope."
Potential Functions and the Gradient Slide 510 Interpretation of the Directional Derivative

The directional derivative has the following interpretation: if $\gamma(t) = x + th$, $t \in [0,1]$, parametrizes the straight line segment joining $x$ and $x + h$, then $D_h f$ is simply the derivative of $f \circ \gamma$. Hence,

The directional derivative $D_h f|_x$ is the derivative of $f$ at $x$ along the line segment joining $x$ and $x + h$.

Another way of stating this is

The directional derivative $D_h f|_x$ gives the slope of the tangent line of $f$ at $x$ in the direction of $h$.

Potential Functions and the Gradient Slide 511 Visualization of the Directional Derivative

[Figure: graph of $f(x_1,x_2)$ showing the points $x$ and $x+h$ and the tangent line of $f$ at $x$ in the direction of $h$.]

Potential Functions and the Gradient Slide 512 The Directional Derivative

We note that the tangent line of $f: \mathbb{R}^n \to \mathbb{R}$ at $x$ in the direction $h$ is given by

$t_{f,x,h}(s) = \begin{pmatrix} x + sh \\ f(x) + D_h f|_x\,s \end{pmatrix}, \qquad s \in \mathbb{R}, \qquad (17.3)$

where $h \in \mathbb{R}^n$ (so the above vector is a "block vector" with $n + 1$ entries). For functions $f: \mathbb{R}^2 \to \mathbb{R}$ the directional derivative is sometimes specified through the angle $\theta$. This is understood to mean that

$h = (\cos\theta,\ \sin\theta)^T.$

Potential Functions and the Gradient Slide 513 The Directional Derivative

17.7. Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$, $f(x_1,x_2) = x_1^2 - 4x_2$. Then the directional derivative of $f$ at $x$ in the direction $h$ is

$D_h f|_x = \frac{d}{dt}f(x + th)\Big|_{t=0} = \frac{d}{dt}\big((x_1 + th_1)^2 - 4(x_2 + th_2)\big)\Big|_{t=0} = \big(2h_1(x_1 + th_1) - 4h_2\big)\big|_{t=0} = 2h_1 x_1 - 4h_2.$

For $h = (1/\sqrt{2},\ 1/\sqrt{2})$ (or $\theta = \pi/4$) we would have

$D_h f|_x = \sqrt{2}\,x_1 - 2\sqrt{2}.$

At $x = (0,0)$, the directional derivative in direction $h$ is

$D_h f|_{x=0} = -2\sqrt{2}.$

Potential Functions and the Gradient Slide 514 The Directional Derivative for Smooth Functions

Suppose that $f$ is differentiable. If $\gamma(t) = x + th$, $t \in [0,1]$, parametrizes the straight line segment joining $x$ and $x + h$, then by the chain rule

$D_h f|_x = \frac{d}{dt}f(x + th)\Big|_{t=0} = Df|_{x+th}\,h\Big|_{t=0} = Df|_x\,h,$

so

$D_h f|_x = Df|_x\,h = \langle\nabla f(x), h\rangle. \qquad (17.4)$

This is a useful expression for calculating the directional derivative, but it supposes that $f$ is differentiable. In practice, (17.4) will be valid if the partial derivatives of $f$ exist and are continuous at $x$.

Potential Functions and the Gradient Slide 515 The Directional Derivative for Smooth Functions

17.8. Example. Returning to Example 17.7, we have

$\nabla f(x) = (2x_1,\ -4)^T.$

Since the partial derivatives are continuous,

$D_h f|_x = \langle\nabla f(x), h\rangle = \Big\langle\begin{pmatrix} 2x_1 \\ -4 \end{pmatrix}, \begin{pmatrix} h_1 \\ h_2 \end{pmatrix}\Big\rangle = 2x_1 h_1 - 4h_2.$

This coincides with the result obtained previously.
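Both routes to the directional derivative translate directly into Mathematica; a small check for Examples 17.7 and 17.8:

f[x1_, x2_] := x1^2 - 4 x2;
h = {1, 1}/Sqrt[2];
D[f[x1 + s h[[1]], x2 + s h[[2]]], s] /. s -> 0  (* limit definition (17.2): Sqrt[2] x1 - 2 Sqrt[2] *)
Grad[f[x1, x2], {x1, x2}].h                      (* gradient formula (17.4): the same *)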
Potential Functions and the Gradient Slide 516 The Normal Derivative

An important special case of the directional derivative is the normal derivative.

17.9. Definition. Let $\Omega \subset \mathbb{R}^n$ be an open set, $f: \Omega \to \mathbb{R}$ and $S^*$ a smooth, oriented, parametrized hypersurface in $\Omega$. Let $p \in S$ and let $N(p)$ denote the normal vector at $p$. Then

$\frac{\partial f}{\partial n}\Big|_p := D_{N(p)} f|_p$

is called the normal derivative of $f$ at $p$ (with respect to the oriented surface $S^*$).

17.10. Example. Let $f: \mathbb{R}^2 \to \mathbb{R}$, $f(x_1,x_2) = x_1^2 - 4x_2$, and

$C = \{(x_1,x_2) \in \mathbb{R}^2 : x_2 = x_1^2,\ x_1 \in \mathbb{R}\}.$

Then $C$ is parametrized by $\gamma(t) = (t, t^2)$, $t \in \mathbb{R}$, and

$T \circ \gamma(t) = \frac{1}{\sqrt{1 + 4t^2}}\begin{pmatrix} 1 \\ 2t \end{pmatrix}.$

Potential Functions and the Gradient Slide 517 The Normal Derivative in R²

A normal vector is then found from

$(T \circ \gamma)'(t) = \frac{-4t}{(1 + 4t^2)^{3/2}}\begin{pmatrix} 1 \\ 2t \end{pmatrix} + \frac{1}{\sqrt{1 + 4t^2}}\begin{pmatrix} 0 \\ 2 \end{pmatrix} = \frac{2}{(1 + 4t^2)^{3/2}}\begin{pmatrix} -2t \\ 1 \end{pmatrix}.$

The unit normal vector is found by normalizing $(T \circ \gamma)'$, so we have

$N \circ \gamma(t) = \frac{1}{\sqrt{1 + 4t^2}}\begin{pmatrix} -2t \\ 1 \end{pmatrix}.$

Potential Functions and the Gradient Slide 518 The Normal Derivative in R²

At a point $p = \gamma(t)$ on $C$ the normal derivative is hence

$\frac{\partial f}{\partial n}\Big|_{\gamma(t)} = \langle\nabla f(\gamma(t)),\ N \circ \gamma(t)\rangle = \frac{1}{\sqrt{1 + 4t^2}}\Big\langle\begin{pmatrix} 2t \\ -4 \end{pmatrix}, \begin{pmatrix} -2t \\ 1 \end{pmatrix}\Big\rangle = -\frac{4(t^2 + 1)}{\sqrt{1 + 4t^2}}.$

Potential Functions and the Gradient Slide 519 Properties of the Gradient

The gradient vector of $f$ at $x$, $\nabla f(x)$, has some interesting properties:

▶ $\nabla f(x)$ points in the direction of the greatest directional derivative of $f$ at $x$. This follows from

$D_h f(x) = \langle\nabla f(x), h\rangle = \|\nabla f(x)\|\cos\angle(\nabla f(x), h),$

which becomes maximal if $\angle(\nabla f(x), h) = 0$.

▶ $\nabla f(x)$ is perpendicular to the contour line of $f$ at $x$. More precisely, it is perpendicular to the tangent line of the contour line at $x$. This is due to the fact that the tangent line to the contour is parallel to the direction $h_0$ in which $D_{h_0} f(x) = 0$, so

$\langle\nabla f(x), h_0\rangle = 0.$

Vector Fields and Integrals Slide 520

18. Vector Fields and Integrals

Vector Fields and Integrals Slide 521 Vector Fields and Higher Order Derivatives

17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Vector Fields and Integrals Slide 522 Vector Fields

We now turn to a very important type of map, the vector field. Vector fields play an extremely important role in physics and mathematics. Examples include the flow field of a fluid or the electromagnetic field induced by a charge.

18.1. Definition. Let $\Omega \subset \mathbb{R}^n$. Then a function $F: \Omega \to \mathbb{R}^n$,

$F(x) = (F_1(x),\dots,F_n(x))^T,$

is called a vector field on $\Omega$.

18.2. Example. Let $f: \mathbb{R}^n \to \mathbb{R}$ be a potential function. Then the gradient field of $f$ given by

$F: \mathbb{R}^n \to \mathbb{R}^n, \qquad F(x) = \nabla f(x),$

associates to every $x \in \mathbb{R}^n$ the direction of largest slope of $f$.

Vector Fields and Integrals Slide 523 Force Fields

18.3. Example. A mass $M$ situated at the origin of a coordinate system exerts an attractive force on another mass $m$ at position $x \in \mathbb{R}^3 \setminus \{0\}$. This force field is given by

$F: \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}^3, \qquad F(x) = -G\,\frac{m \cdot M}{|x|^2}\,\frac{x}{|x|}, \qquad (18.1)$

where $G$ is Newton's gravitational constant. Any vector field that associates to each $x \in \mathbb{R}^n$ a physical force vector is said to be a force field. (This term of course has only physical, not mathematical, significance.) In physics, the concept of work arises from the integration of the forces acting along a particle's trajectory (curve), where forces that are orthogonal to the trajectory do not contribute to the work. In particular, the work is obtained by integrating only the tangential components of the force field.

Vector Fields and Integrals Slide 524 Gravitational Force Field

The plot below shows the gravitational force field (18.1) by attaching a vector representing $F(x)$ to each $x \in \mathbb{R}^3 \setminus \{0\}$.

[Figure: 3D vector plot of the gravitational force field.]

Vector Fields and Integrals Slide 525 Gravitational Force Field

For future examples we will use the two-dimensional version,

$F: \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2, \qquad F(x) = -G\,\frac{m \cdot M}{|x|^2}\,\frac{x}{|x|}. \qquad (18.2)$

[Figure: 2D vector plot of (18.2) in the $(x_1,x_2)$-plane.]

Vector Fields and Integrals Slide 526 Streamlines of Fluid Flow

18.4. Example. Consider a fluid flow in $\mathbb{R}^2$ where the fluid rotates about the origin in a counter-clockwise manner. The streamlines show the paths of a "fluid particle":

[Figure: circular streamlines about the origin in the $(x_1,x_2)$-plane.]

Vector Fields and Integrals Slide 527 Direction Field of Fluid Flow

The streamlines are circles and the unit tangent vector field (the direction field) of the circles is given by

$F: \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2, \qquad F(x_1,x_2) = \frac{1}{\sqrt{x_1^2 + x_2^2}}\begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}. \qquad (18.3)$

[Figure: the direction field (18.3).]
Vector Fields and Integrals Slide 528 Velocity Field of Fluid Flow

The velocity at a distance $r = \sqrt{x_1^2 + x_2^2}$ from the origin is $r \cdot \omega$, where $\omega > 0$ is the rotational velocity. Hence, the velocity vector field is given by

$v: \mathbb{R}^2 \to \mathbb{R}^2, \qquad v(x_1,x_2) = r\,\omega\,F(x_1,x_2) = \omega\begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}. \qquad (18.4)$

[Figure: the velocity field (18.4).]

Vector Fields and Integrals Slide 529 The Line Integral of a Vector Field

18.5. Definition. Let $\Omega \subset \mathbb{R}^n$, $F: \Omega \to \mathbb{R}^n$ be a continuous vector field and $C^* \subset \Omega$ an oriented open, smooth curve in $\mathbb{R}^n$. We then define the line integral of the vector field $F$ along $C^*$ by

$\int_{C^*} F\,d\vec{\ell} := \int_{C^*} \langle F, T\rangle\,d\ell. \qquad (18.5)$

18.6. Remarks. (i) We have defined the line integral of the vector field $F$ as the line integral of the scalar product $\langle F, T\rangle$ on $C^*$. Since $T$ does not depend on the parametrization of $C^*$ and the line integral of a scalar function doesn't either, the line integral of a vector field is independent of the parametrization of $C^*$.

Vector Fields and Integrals Slide 530 The Line Integral of a Vector Field

(ii) The symbol "$d\vec{\ell}$" can be interpreted geometrically as a vectorial line element and one often writes $d\vec{\ell} = \gamma'(t)\,dt$. In the same spirit, one sometimes writes

$\int_{C^*} F\,d\vec{\ell} = \int_{C^*} \langle F, d\vec{\ell}\rangle.$

(iii) Integrals along closed curves are sometimes emphasized by writing

$\oint_{C^*} f\,d\ell \qquad \text{or} \qquad \oint_{C^*} F\,d\vec{\ell}$

if the curve $C$ is closed.

Vector Fields and Integrals Slide 531 Integrals of Vector Fields

If we calculate the line integral using a concrete parametrization $\gamma: I \to C$, we obtain

$\int_{C^*} F\,d\vec{\ell} = \int_{C^*} \langle F, T\rangle\,d\ell = \int_I \langle F \circ \gamma(t), T \circ \gamma(t)\rangle\,\|\gamma'(t)\|\,dt = \int_I \Big\langle F \circ \gamma(t), \frac{\gamma'(t)}{\|\gamma'(t)\|}\Big\rangle\,\|\gamma'(t)\|\,dt = \int_I \langle F \circ \gamma(t), \gamma'(t)\rangle\,dt. \qquad (18.6)$

18.7. Example. Calculate the work performed when traveling in a force field $F(x,y) = (x,y)$ along the parabola $y = x^2$ in $\mathbb{R}^2$ from $(0,0)$ to $(1,1)$. With the parametrization $\gamma(t) = (t, t^2)$, $t \in [0,1]$,

$W = \int_{C^*} F\,d\vec{\ell} = \int_0^1 \langle F \circ \gamma(t), \gamma'(t)\rangle\,dt = \int_0^1 \Big\langle\begin{pmatrix} t \\ t^2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2t \end{pmatrix}\Big\rangle\,dt = \int_0^1 (t + 2t^3)\,dt = \frac{1}{2} + \frac{1}{2} = 1.$

Vector Fields and Integrals Slide 532 Integrals of Vector Fields

[Figure: the field $F(x,y) = (x,y)$ and the parabola from $(0,0)$ to $(1,1)$.]
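The computation of Example 18.7 in Mathematica, using (18.6) directly:

F[{x_, y_}] := {x, y};
g[t_] := {t, t^2};                        (* parametrization of the parabola *)
Integrate[F[g[t]].D[g[t], t], {t, 0, 1}]  (* 1 *)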
Vector Fields and Integrals Slide 533 Potential Fields

18.8. Definition. Let $\Omega \subset \mathbb{R}^n$ be an open set. A vector field $F: \Omega \to \mathbb{R}^n$ is said to be a potential field if there exists a differentiable potential function $U: \Omega \to \mathbb{R}$ such that

$F(x) = \nabla U(x)$ (convention in mathematics) or $F(x) = -\nabla U(x)$ (convention in physics).

18.9. Example. The gravitational force field (18.1) introduced in Example 18.3 is a potential field, because $F = -\nabla U$ for

$U: \mathbb{R}^3 \setminus \{0\} \to \mathbb{R}, \qquad U(x) = -G\,\frac{m \cdot M}{|x|}, \qquad (18.7)$

as is easily checked. (Note that this is consistent with the potential $-MG/r$ computed on Slide 460.)

Vector Fields and Integrals Slide 534 Integrals of Potential Fields

Potential fields are very useful, as the integral along an oriented open curve $C^*$ depends only on the initial and the final point of the curve. This can be seen from

$\int_I \langle F \circ \gamma(t), \gamma'(t)\rangle\,dt = \int_I \langle\nabla U \circ \gamma(t), \gamma'(t)\rangle\,dt = \int_I DU|_{\gamma(t)}(\gamma'(t))\,dt = \int_I (U \circ \gamma)'(t)\,dt,$

where we have used the chain rule. Supposing that the initial point of the curve is $p_{\mathrm{initial}}$ and the final point is $p_{\mathrm{final}}$, we have from the fundamental theorem of calculus

$\int_{C^*} F\,d\vec{\ell} = \int_I (U \circ \gamma)'(t)\,dt = U(p_{\mathrm{final}}) - U(p_{\mathrm{initial}}). \qquad (18.8)$

We see that for a potential field, the line integral along a simple open curve $C^*$ depends only on the initial and final points of $C$; the shape of the curve is irrelevant. The potential function $U$ plays the role of a primitive for $F$.

Vector Fields and Integrals Slide 535 Conservative Fields

Integrals along closed curves can be easily realized by splitting a closed curve into two open curves. The final point of one curve is the initial point of the other curve.

18.10. Lemma. Let $\Omega \subset \mathbb{R}^n$ be open, $F: \Omega \to \mathbb{R}^n$ a potential field and $C \subset \Omega$ a closed curve. Then

$\oint_C F\,d\vec{\ell} = 0.$

The proof is obvious from the preceding discussion.

18.11. Definition. Let $\Omega \subset \mathbb{R}^n$ be open and $F: \Omega \to \mathbb{R}^n$ a vector field. If the integral along any open curve $C^*$ depends only on the initial and final points or, equivalently,

$\oint_C F\,d\vec{\ell} = 0 \qquad \text{for any closed curve } C,$

then $F$ is called conservative.

Vector Fields and Integrals Slide 536 Potential Fields are Conservative

In physical terms, a conservative force field has the property that the work required to move a particle from one point to another does not depend on the path taken. Therefore, energy is conserved.

18.12. Remark. We note explicitly that every potential field is a conservative field. In fact, under certain conditions a conservative field is also a potential field.

18.13. Definition. Let $\Omega \subset \mathbb{R}^n$. Then $\Omega$ is said to be (pathwise) connected if for any two points in $\Omega$ there exists an open curve within $\Omega$ joining the two points.

Vector Fields and Integrals Slide 537 Conservative Fields are Potential Fields

18.14. Theorem. Let $\Omega \subset \mathbb{R}^n$ be a connected open set and suppose that $F: \Omega \to \mathbb{R}^n$ is a continuous, conservative field. Then $F$ is a potential field.

Proof. We need to show that there exists a function $U$ such that $F = \nabla U$ on $\Omega$. In fact, we fix an arbitrary point $x_0 \in \Omega$ and define

$U(x) := \int_{C^*} F\,d\vec{\ell}$

for any path $C^*$ joining $x_0$ and $x$. (The path exists because $\Omega$ is connected; the integral does not depend on which path is chosen since $F$ is conservative.) We will show that

$\frac{\partial U}{\partial x_i} = F_i, \qquad i = 1,\dots,n. \qquad (18.9)$

Vector Fields and Integrals Slide 538 Conservative Fields are Potential Fields

Proof (continued). Let $e_i$ be the $i$th unit vector and $h$ small enough to ensure that $x + he_i \in \Omega$. A path joining $x_0$ to $x + he_i$ can be found by taking a path $C^*$ joining $x_0$ and $x$ and a straight line segment $C_h^*$ parametrized by $\gamma(t) = x + the_i$, $0 \le t \le 1$. We then have

$U(x + he_i) = \int_{x_0}^{x+he_i} F\,d\vec{\ell} = \int_{C^*} F\,d\vec{\ell} + \int_{C_h^*} F\,d\vec{\ell} = U(x) + \int_0^1 \langle F(x + the_i), he_i\rangle\,dt$

$= U(x) + h\int_0^1 F_i(x + the_i)\,dt = U(x) + hF_i(x) + h\int_0^1 \big(F_i(x + the_i) - F_i(x)\big)\,dt.$

Vector Fields and Integrals Slide 539 Conservative Fields are Potential Fields

Proof (continued). The proof is complete if we can show that

$\lim_{h\to 0} \int_0^1 \big(F_i(x + the_i) - F_i(x)\big)\,dt = 0.$

Since $F_i$ is continuous, we know that for every fixed $t \in [0,1]$

$\lim_{h\to 0} |F_i(x + the_i) - F_i(x)| = 0.$

Then by Lemma 9.16 we have

$\lim_{h\to 0}\ \sup_{t\in[0,1]} |F_i(x + the_i) - F_i(x)| = 0,$

and, since

$\Big|\int_0^1 \big(F_i(x + the_i) - F_i(x)\big)\,dt\Big| \le \sup_{t\in[0,1]} |F_i(x + the_i) - F_i(x)|,$

we are finished.

Vector Fields and Integrals Slide 540 Criteria for Potential Fields

18.15. Lemma. Let $\Omega \subset \mathbb{R}^n$ be a connected open set and suppose that $F: \Omega \to \mathbb{R}^n$ is continuously differentiable. If $F$ is a potential field, then the relations

$\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i} \qquad (18.10)$

hold for all $i,j = 1,\dots,n$.

The proof, which is based on an analysis of the second derivative of the potential, will be deferred to a later section.

Vector Fields and Integrals Slide 541 Criteria for Potential Fields

18.16. Example. The velocity field (18.4) introduced in Example 18.4 is not a potential field, since $F(x_1,x_2) = \omega(-x_2, x_1)$ and

$\frac{\partial F_1}{\partial x_2} = -\omega \neq \omega = \frac{\partial F_2}{\partial x_1}.$

Note that (18.10) is necessary, but not sufficient, for a field to be a potential field.

18.17. Example. The field

$F: \mathbb{R}^2 \setminus \{0\} \to \mathbb{R}^2, \qquad F(x_1,x_2) = \frac{1}{x_1^2 + x_2^2}\begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}$

satisfies

$\frac{\partial F_1}{\partial x_2} = \frac{\partial F_2}{\partial x_1}$

but $\oint_{S^1} F\,d\vec{\ell} \neq 0$, so $F$ is not a potential field in $\mathbb{R}^2 \setminus \{0\}$. Details are left to the assignments.
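Returning to Example 18.9, the relation $F = -\nabla U$ is easily verified in Mathematica (G0 stands for the gravitational constant $G$):

U = -G0 m M/Sqrt[x^2 + y^2 + z^2];
F = -Grad[U, {x, y, z}];
Simplify[F + G0 m M {x, y, z}/(x^2 + y^2 + z^2)^(3/2)]
(* {0, 0, 0}, i.e. F agrees with (18.1) *)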
Vector Fields and Integrals Slide 542 Criteria for Potential Fields

On certain "nice" sets, however, we do have a converse theorem:

18.18. Theorem. Let $\Omega \subset \mathbb{R}^n$ be a simply connected open set and suppose that $F: \Omega \to \mathbb{R}^n$ is continuously differentiable. If for all $i,j = 1,\dots,n$

$\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i},$

then $F$ is a potential field.

We will not have time to prove this result here. However, we do need to explain what a "simply connected" set is. Loosely speaking, a set $\Omega \subset \mathbb{R}^n$ is said to be simply connected if (i) $\Omega$ is pathwise connected and (ii) every closed curve in $\Omega$ can be contracted to a single point within $\Omega$.

Vector Fields and Integrals Slide 543 Simply Connected Sets

[Figure: Salix alba. A homotopy of a circle around a sphere can be reduced to a single point. 2006. Wikipedia. Wikimedia Foundation. Web. 12 July 2012]

For example, the unit sphere $S^2 = \{(x_1,x_2,x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 1\}$ is simply connected, because any closed curve can be "continuously contracted", staying on the sphere the entire time, until it becomes a single point. Intuitively, a closed curve can be imagined as a stretched rubber band. If any rubber band can be contracted to a single point within a set, then the set is simply connected.

Vector Fields and Integrals Slide 544 Simply Connected Sets

18.19. Examples.
(i) $\mathbb{R}^2 \setminus \{(x_1,x_2) : x_1^2 + x_2^2 \le 1\}$ is not simply connected.
(ii) $\mathbb{R}^2 \setminus \{0\}$ is not simply connected.
(iii) $\mathbb{R}^3 \setminus \{0\}$ is simply connected.
(iv) A torus is not simply connected.

A closed curve $C$ in a set $\Omega$ can be thought of as the image of a continuous function $g: S^1 \to C$, where $S^1 = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$. Let us write $D = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 \le 1\}$.

18.20. Definition. Let $\Omega \subset \mathbb{R}^n$ be an open set.
(i) A closed curve $C \subset \Omega$ given as the image of a map $g: S^1 \to C$ is said to be contractible to a point if there exists a continuous function $G: D \to \Omega$ such that $G|_{S^1} = g$.
(ii) The set $\Omega$ is said to be simply connected if it is connected and every closed curve in $\Omega$ is contractible to a point.

Vector Fields and Integrals Slide 545 Determining Potentials

We will develop a practical way of obtaining a potential function for a vector field. The idea is simply to integrate the components of the field and compare the results, then try to find a compatible potential. This is best demonstrated by an example.

18.21. Example. Consider the field $F(x_1,x_2) = (x_1^2 + x_2^2,\ 2x_2x_1 + x_2^2)$. Since $F$ is defined on the simply connected set $\mathbb{R}^2$ and

$\frac{\partial F_1}{\partial x_2} = 2x_2 = \frac{\partial F_2}{\partial x_1},$

the field is a potential field, i.e., $F_1 = \frac{\partial U}{\partial x_1}$, $F_2 = \frac{\partial U}{\partial x_2}$ for some $U: \mathbb{R}^2 \to \mathbb{R}$. We integrate the components to find $U$:

$U(x_1,x_2) = \int F_1(x_1,x_2)\,dx_1 = \frac{1}{3}x_1^3 + x_2^2 x_1 + C_1(x_2), \qquad (18.11)$

where the integration constant $C_1$ may be a function of $x_2$.

Vector Fields and Integrals Slide 546 Determining Potentials

We repeat this for the second component:

$U(x_1,x_2) = \int F_2(x_1,x_2)\,dx_2 = \frac{1}{3}x_2^3 + x_2^2 x_1 + C_2(x_1), \qquad (18.12)$

where the integration constant $C_2$ is allowed to depend on $x_1$. Comparing (18.11) with (18.12), we see that

$U(x_1,x_2) = \frac{1}{3}(x_1^3 + x_2^3) + x_2^2 x_1$

is a potential function for $F$ (of course, we can add any constant to $U$ if we like). This procedure works analogously for vector fields in $\mathbb{R}^n$.
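The comparison procedure of Example 18.21 can also be scripted; a minimal Mathematica sketch:

F = {x1^2 + x2^2, 2 x1 x2 + x2^2};
D[F[[1]], x2] == D[F[[2]], x1]              (* True: (18.10) holds on R^2 *)
U1 = Integrate[F[[1]], x1];                 (* x1^3/3 + x1 x2^2, up to C1(x2) *)
U = U1 + Integrate[F[[2]] - D[U1, x2], x2]  (* x1^3/3 + x1 x2^2 + x2^3/3 *)
Simplify[Grad[U, {x1, x2}] == F]            (* True *)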
Vector Fields and Integrals Slide 547 Differential Forms

We conclude this section by discussing an alternative notation for integration of vector fields. More properly, these concepts belong to a formal discussion of curves and surfaces in the field of vector analysis. We mention them here only because they might be encountered in certain old-fashioned textbooks. The transpose of a vector field is called a differential form:

$F(x)^T = \big(F_1(x),\dots,F_n(x)\big) = F_1(x)\,dx_1 + \dots + F_n(x)\,dx_n,$

where the differentials $dx_j$, $j = 1,\dots,n$, are simply the standard basis row vectors, as defined in (17.1).

18.22. Definition. Let $F_1,\dots,F_n: \mathbb{R}^n \to \mathbb{R}$ be scalar functions. Then

$\alpha = F_1\,dx_1 + \dots + F_n\,dx_n$

is said to be a differential one-form.

Vector Fields and Integrals Slide 548 Integral of a Differential Form

We then simply define

$\int_{C^*} \alpha := \int_{C^*} F\,d\vec{\ell},$

where $F = (F_1,\dots,F_n)^T$ is the transpose of the differential form $\alpha$.

18.23. Example. We integrate the form $4y\,dx + 2x^2y\,dy$ in counter-clockwise direction along the unit circle $S^1 \subset \mathbb{R}^2$. We parametrize the circle by $\gamma(\theta) = (\cos\theta, \sin\theta)$, $0 \le \theta < 2\pi$:

$\oint_{S^1} 4y\,dx + 2x^2y\,dy = \oint_{S^1} \begin{pmatrix} 4y \\ 2x^2y \end{pmatrix} d\vec{\ell} = \int_0^{2\pi} \big(4\sin\theta\cdot(-\sin\theta) + 2\cos^2\theta\sin\theta\cdot\cos\theta\big)\,d\theta = -4\pi.$
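A quick Mathematica evaluation of the parametrized integral in Example 18.23:

g = {Cos[th], Sin[th]};
F[{x_, y_}] := {4 y, 2 x^2 y};
Integrate[F[g].D[g, th], {th, 0, 2 Pi}]   (* -4 Pi *)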
Flux and Circulation Slide 549

19. Flux and Circulation

Flux and Circulation Slide 550 Vector Fields and Higher Order Derivatives

17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Flux and Circulation Slide 551 Vector Fields of Fluids

In the previous section we have primarily motivated line integrals of vector fields through the concept of work in a force field. Another physical approach is to motivate vector fields through velocity fields of fluids. This turns out to yield further useful concepts in field theory.

[Figure: a rotating vector field and a closed curve bounding a region.]

We will consider fluid flows in $\mathbb{R}^2$ to introduce general concepts. Observe the vector field illustrated at left, interpreted as the direction field of a fluid flow, and the closed curve, interpreted as the boundary of a region. We can decompose the vector field at the boundary into a tangential component and a normal component.

Flux and Circulation Slide 552 Circulation and Flux

We interpret the normal component of the vector field as the part that flows through the boundary of the region, i.e., into or out of the region. This is called the flux of the vector field through the boundary. The tangential component is the part of the vector field that flows around the boundary, called the circulation of the field.

19.1. Example. Let $S^1 = \{(x_1,x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ be the unit circle, bounding the unit disc. Consider the two vector fields

$F, G: \mathbb{R}^2 \to \mathbb{R}^2, \qquad F(x_1,x_2) = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad G(x_1,x_2) = \begin{pmatrix} -x_2 \\ x_1 \end{pmatrix}.$

A unit tangent vector field to $S^1$ at $(x_1,x_2)$ is given by $T(x) = (-x_2, x_1)$, so

$\langle T, F\rangle\big|_{S^1} = -x_2x_1 + x_1x_2 = 0, \qquad \langle T, G\rangle\big|_{S^1} = -x_2(-x_2) + x_1x_1 = 1.$

Flux and Circulation Slide 553 Circulation and Flux

[Figure: the fields $F$ and $G$ along the unit circle.]

A unit normal vector field at $x \in S^1$ is given by $N(x) = -x$, so

$\langle N, F\rangle\big|_{S^1} = -x_1x_1 - x_2x_2 = -1, \qquad \langle N, G\rangle\big|_{S^1} = -x_1(-x_2) - x_2x_1 = 0.$

Flux and Circulation Slide 554 Circulation and Flux

19.2. Definition. Let $\Omega \subset \mathbb{R}^n$ be an open set, $F: \Omega \to \mathbb{R}^n$ a continuously differentiable vector field. Let $C^*$ be a positively oriented closed curve in $\mathbb{R}^n$. Then

$\int_{C^*} \langle F, T\rangle\,d\ell \qquad (19.1)$

is called the (total) circulation of $F$ along $C^*$. Let $S^* \subset \Omega$ be an oriented hypersurface. Then

$\int_{S^*} \langle F, N\rangle\,dA \qquad (19.2)$

is called the flux of $F$ through $S$.

Flux and Circulation Slide 555 Circulation and Flux

19.3. Remarks.

▶ The integral (19.1) coincides with the line integral we defined in (13.3) and hence also gives the amount of work needed to move a particle along the closed curve $C$. In a non-rotating fluid, this work should be zero.

▶ In $\mathbb{R}^2$, a hypersurface is just a curve and (19.2) becomes a line integral. The normal vector is of course taken according to the convention used for surfaces.

Flux and Circulation Slide 556 Flux Through Hypersurfaces

19.4. Remark. We also sometimes write

$\int_S \langle F, d\vec{A}\rangle$

for the flux integral. The term

$d\vec{A} := N(\varphi(x))\cdot\sqrt{g(x)}\,dx$

is called the vectorial surface element of a hypersurface $S$. For a hypersurface in $\mathbb{R}^3$ we have

$N = \frac{t_1 \times t_2}{\|t_1 \times t_2\|},$

so that

$d\vec{A} = t_1(\varphi(x)) \times t_2(\varphi(x))\,dx_1\,dx_2.$

Flux and Circulation Slide 557 Flux of an Electrostatic Field

19.5. Example. A point charge $Q$ at the origin induces a field

$E(p) = \frac{1}{4\pi\varepsilon_0}\,\frac{Q}{\|p\|^3}\,p$

at any point $p \in \mathbb{R}^3 \setminus \{0\}$. The flux of this field through the unit sphere $S^2$ is given by

$\int_{S^2} \langle E, d\vec{A}\rangle.$

As in Example 16.11, we can parametrize $S^2$ by

$\varphi(\phi,\theta) = (\cos\phi\sin\theta,\ \sin\phi\sin\theta,\ \cos\theta)^T, \qquad t_\theta \times t_\phi = (\cos\phi\sin^2\theta,\ \sin\phi\sin^2\theta,\ \cos\theta\sin\theta)^T,$

where $0 < \phi < 2\pi$ and $0 < \theta < \pi$. Here we have chosen the outward-pointing (positively oriented) normal vector.

Flux and Circulation Slide 558 Flux of an Electrostatic Field

It follows that

$\int_{S^2} \langle E, d\vec{A}\rangle = \frac{Q}{4\pi\varepsilon_0}\int_0^{2\pi}\!\int_0^\pi \underbrace{\frac{1}{\|\varphi(\phi,\theta)\|^3}}_{=1}\,\big\langle\varphi(\phi,\theta),\ (t_\theta \times t_\phi)(\phi,\theta)\big\rangle\,d\theta\,d\phi$

$= \frac{Q}{4\pi\varepsilon_0}\int_0^{2\pi}\!\int_0^\pi \big\langle(\cos\phi\sin\theta,\ \sin\phi\sin\theta,\ \cos\theta)^T,\ (\cos\phi\sin^2\theta,\ \sin\phi\sin^2\theta,\ \cos\theta\sin\theta)^T\big\rangle\,d\theta\,d\phi = \frac{Q}{4\pi\varepsilon_0}\int_0^{2\pi}\!\int_0^\pi \sin\theta\,d\theta\,d\phi = \frac{Q}{\varepsilon_0}.$

The fact that this result is actually true for any closed surface (not just $S^2$) that contains the charge at the origin is known as Gauß's law in electrostatics.
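The flux computation of Example 19.5 in Mathematica (q and e0 stand for $Q$ and $\varepsilon_0$; f and t for $\phi$ and $\theta$):

phi = {Cos[f] Sin[t], Sin[f] Sin[t], Cos[t]};
nvec = Simplify[Cross[D[phi, t], D[phi, f]]];    (* outward vectorial surface element *)
field = q phi/(4 Pi e0);                         (* E∘phi; on the unit sphere ||phi|| = 1 *)
Integrate[field.nvec, {t, 0, Pi}, {f, 0, 2 Pi}]  (* q/e0 *)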
Flux and Circulation Slide 559 Circulation and Flux in R²

19.6. Example. For the vector field $F(x_1,x_2) = (1,0)$ and the square pictured below, both the circulation and the flux are zero.

[Figure: the constant field $F = (1,0)$ crossing a square.]

Flux and Circulation Slide 560 Total and Infinitesimal Flux

The previous example shows clearly that the total flux of a vector field through a boundary is the difference between "influx" and "efflux" of the field. In the context of fluid flow, zero (total) flux through a boundary means "what flows in also flows out" or "there are no fluid sources or sinks within the boundary". We now want to characterize vector fields where the flux through any boundary is zero. In fluid flow, these correspond to fluid fields where the fluid volume is preserved (incompressible fluids with no external influx or efflux). This approach will lead to infinitesimal flux, i.e., the flux of the field through a given point (instead of across a surface).

Flux and Circulation Slide 561 The Flux Through a Square

Let $\Omega \subset \mathbb{R}^2$ be open and consider the flux of a continuously differentiable vector field $F: \Omega \to \mathbb{R}^2$ through a square of side length $2h$, $h > 0$, centered at a point $x \in \Omega$. In particular, the square is given by

$S_h = [x_1 - h,\ x_1 + h] \times [x_2 - h,\ x_2 + h]$

and the boundary consists of four lines,

$\partial S_h = l_1 \cup l_2 \cup l_3 \cup l_4.$

[Figure: the square $S_h$ with edges $l_1$ (bottom), $l_2$ (right), $l_3$ (top), $l_4$ (left).]

We find the flux through the boundary by integrating

$\int_{\partial S_h} \langle F, N\rangle\,ds = \sum_{k=1}^4 \int_{l_k} \langle F, N_k\rangle\,ds.$

Flux and Circulation Slide 562 The Flux Through a Square

We have the following parametrizations and normal vectors:

$l_1$: $N_1 = (0,-1)^T$, $\gamma_1(t) = (x_1,\ x_2 - h)^T + t\,(1,0)^T$, $-h \le t \le h$,
$l_2$: $N_2 = (1,0)^T$, $\gamma_2(t) = (x_1 + h,\ x_2)^T + t\,(0,1)^T$, $-h \le t \le h$,
$l_3$: $N_3 = (0,1)^T$, $\gamma_3(t) = (x_1,\ x_2 + h)^T - t\,(1,0)^T$, $-h \le t \le h$,
$l_4$: $N_4 = (-1,0)^T$, $\gamma_4(t) = (x_1 - h,\ x_2)^T - t\,(0,1)^T$, $-h \le t \le h$.

Hence,

$\int_{\partial S_h} \langle F, N\rangle\,ds = -\int_{-h}^h F_2(x_1 + t, x_2 - h)\,dt + \int_{-h}^h F_1(x_1 + h, x_2 + t)\,dt + \int_{-h}^h F_2(x_1 - t, x_2 + h)\,dt - \int_{-h}^h F_1(x_1 - h, x_2 - t)\,dt.$

In the last two integrals, we substitute $\tau = -t$ and then rename the variable $t$.

Flux and Circulation Slide 563 The Flux Through a Square

This gives

$\int_{\partial S_h} \langle F, N\rangle\,ds = \int_{-h}^h \big(F_2(x_1 + t, x_2 + h) - F_2(x_1 + t, x_2 - h)\big)\,dt + \int_{-h}^h \big(F_1(x_1 + h, x_2 + t) - F_1(x_1 - h, x_2 + t)\big)\,dt.$

We note that

$F_2(x_1 + t, x_2 \pm h) = F_2(x_1 + t, x_2) \pm \frac{\partial F_2(x_1 + t, x_2)}{\partial x_2}\,h + o(h),$

$F_1(x_1 \pm h, x_2 + t) = F_1(x_1, x_2 + t) \pm \frac{\partial F_1(x_1, x_2 + t)}{\partial x_1}\,h + o(h)$

as $h \to 0$, where the small $o(h)$ symbols represent continuous functions of $h$ and $t$.

Flux and Circulation Slide 564 The Flux Through a Square

Inserting this expansion and substituting in the integrals,

$\int_{\partial S_h} \langle F, N\rangle\,ds = \int_{-h}^h 2h\Big(\frac{\partial F_1(x_1, x_2 + t)}{\partial x_1} + o(1)\Big)\,dt + \int_{-h}^h 2h\Big(\frac{\partial F_2(x_1 + t, x_2)}{\partial x_2} + o(1)\Big)\,dt$

$= (2h)^2 \int_{-1/2}^{1/2} \Big(\frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1} + o(1)\Big)\,dt + (2h)^2 \int_{-1/2}^{1/2} \Big(\frac{\partial F_2(x_1 + 2ht, x_2)}{\partial x_2} + o(1)\Big)\,dt.$

Flux and Circulation Slide 565 The Flux Through a Square

Since the functions $o(1)$ are continuous functions of $h$ and $t$, we can apply Lemma 9.16 to see that

$\Big|\int_{-1/2}^{1/2} o(1)\,dt\Big| \le \sup_{t\in[-1/2,1/2]} |o(1)| \xrightarrow[h\to 0]{} 0.$

Furthermore, we write

$\int_{-1/2}^{1/2} \frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1}\,dt = \int_{-1/2}^{1/2} \frac{\partial F_1(x_1, x_2)}{\partial x_1}\,dt + \int_{-1/2}^{1/2} \Big(\frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1} - \frac{\partial F_1(x_1, x_2)}{\partial x_1}\Big)\,dt$

and do the same for the term involving $\frac{\partial F_2}{\partial x_2}$.

Flux and Circulation Slide 566 The Flux Through a Square

Since $F$ is continuously differentiable,

$\lim_{h\to 0}\Big(\frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1} - \frac{\partial F_1(x_1, x_2)}{\partial x_1}\Big) = 0$

for all $t \in [-1/2, 1/2]$. By Lemma 9.16, we then conclude

$\Big|\int_{-1/2}^{1/2}\Big(\frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1} - \frac{\partial F_1(x_1, x_2)}{\partial x_1}\Big)\,dt\Big| \le \sup_{t\in[-1/2,1/2]}\Big|\frac{\partial F_1(x_1, x_2 + 2ht)}{\partial x_1} - \frac{\partial F_1(x_1, x_2)}{\partial x_1}\Big| \xrightarrow[h\to 0]{} 0.$

This implies that

$\lim_{h\to 0}\frac{1}{(2h)^2}\int_{\partial S_h}\langle F, N\rangle\,ds = \frac{\partial F_1}{\partial x_1}\Big|_x + \frac{\partial F_2}{\partial x_2}\Big|_x.$

Flux and Circulation Slide 567 Flux Density and the Divergence

We have shown that if $S_h$ is a square of side length $2h$ centered at $x$,

$\frac{\text{flux through } \partial S_h}{\text{Area}(S_h)} \xrightarrow[h\to 0]{} \frac{\partial F_1}{\partial x_1}\Big|_x + \frac{\partial F_2}{\partial x_2}\Big|_x.$

Hence, the limit on the right corresponds to a flux density. There is a special term for this flux density:

19.7. Definition. Let $\Omega \subset \mathbb{R}^n$ and $F: \Omega \to \mathbb{R}^n$ be a continuously differentiable vector field. Then

$\operatorname{div} F := \frac{\partial F_1}{\partial x_1} + \dots + \frac{\partial F_n}{\partial x_n}$

is called the divergence of $F$.

The flux density at a point $x$ is given by the divergence of the field at $x$. Although we have only proven this in the case of fields in $\mathbb{R}^2$, this holds in any dimension $n \ge 2$ (we will prove this using surface integrals in $\mathbb{R}^3$ later).

Flux and Circulation Slide 568 Circulation Around a Parallelogram

We now turn to the circulation of a vector field. Again, our goal is to find an expression for the infinitesimal circulation around a point $x \in \mathbb{R}^n$.
In contrast to the flux, where we had to restrict ourselves to $\mathbb{R}^2$, we now consider a line integral in $\mathbb{R}^n$. Let $\Omega \subset \mathbb{R}^n$ and $F: \Omega \to \mathbb{R}^n$ a continuously differentiable vector field. Let $x \in \Omega$ and denote by $P(u,v)$ a parallelogram spanned by vectors $u$ and $v$ and centered at $x$. We want to calculate the circulation of $F$ around the boundary of the parallelogram, given by

$\int_{\partial P(u,v)} F\,d\vec{s}.$

Our goal is to analyze the integral when $\|u\| + \|v\| \to 0$.

Flux and Circulation Slide 569 Circulation Around a Parallelogram

The boundary of the parallelogram is the union of four straight line segments parametrized by

$\gamma_1(t) = x - \frac{u+v}{2} + tu, \qquad \gamma_2(t) = x + \frac{u-v}{2} + tv, \qquad \gamma_3(t) = x + \frac{u+v}{2} - tu, \qquad \gamma_4(t) = x + \frac{v-u}{2} - tv$

for $t \in [0,1]$.

[Figure: the parallelogram $P(u,v)$ spanned by $u$ and $v$, centered at $x$.]

We will use that for every point $\gamma_j(t)$ on the parallelogram, we can write

$F(\gamma_j(t)) = F(x) + DF|_x(\gamma_j(t) - x) + o(\|u\| + \|v\|) \qquad \text{as } \|u\| + \|v\| \to 0.$

Flux and Circulation Slide 570 Circulation Around a Parallelogram

The line integral is then

$\int_{\partial P(u,v)} F\,d\vec{s} = \sum_{j=1}^4 \int_0^1 \langle F \circ \gamma_j(t), \gamma_j'(t)\rangle\,dt$

$= \int_0^1 \langle F \circ \gamma_1(t), u\rangle\,dt + \int_0^1 \langle F \circ \gamma_2(t), v\rangle\,dt - \int_0^1 \langle F \circ \gamma_3(t), u\rangle\,dt - \int_0^1 \langle F \circ \gamma_4(t), v\rangle\,dt$

$= \int_0^1 \langle DF|_x\,\gamma_1(t), u\rangle\,dt - \int_0^1 \langle DF|_x\,\gamma_3(t), u\rangle\,dt + \int_0^1 \langle DF|_x\,\gamma_2(t), v\rangle\,dt - \int_0^1 \langle DF|_x\,\gamma_4(t), v\rangle\,dt + o\big((\|u\|+\|v\|)^2\big).$

Flux and Circulation Slide 571 Circulation Around a Parallelogram

From the linearity of the integral, the inner product and $DF|_x$ we can write

$\int_{\partial P(u,v)} F\,d\vec{s} = \int_0^1 \langle DF|_x(\gamma_1(t) - \gamma_3(t)), u\rangle\,dt + \int_0^1 \langle DF|_x(\gamma_2(t) - \gamma_4(t)), v\rangle\,dt + o\big((\|u\|+\|v\|)^2\big)$

$= \int_0^1 -\langle DF|_x(u + v - 2tu), u\rangle\,dt + \int_0^1 \langle DF|_x(u - v + 2tv), v\rangle\,dt + o\big((\|u\|+\|v\|)^2\big)$

$= \langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle + o\big((\|u\|+\|v\|)^2\big).$

In leading order, the circulation is thus $\langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle$.
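In $\mathbb{R}^2$ this leading-order term turns out to be a multiple of $\det(u,v)$; the identity behind Theorem 19.9 below can be confirmed symbolically in Mathematica:

F = {f1[x1, x2], f2[x1, x2]};
DF = D[F, {{x1, x2}}];            (* Jacobian of F *)
u = {u1, u2}; v = {v1, v2};
Simplify[(DF.u).v - (DF.v).u - (D[f2[x1, x2], x1] - D[f1[x1, x2], x2]) Det[{u, v}]]
(* 0 *)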
Flux and Circulation Slide 572 The Circulation Density – Rotation / Curl

The expression $\langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle$ is clearly anti-symmetric (it changes sign when $u$ and $v$ are interchanged) and bilinear. As we will see, it is the main term describing the circulation density in the plane spanned by $u$ and $v$. It therefore deserves a special mention.

19.8. Definition. Let $\Omega \subset \mathbb{R}^n$ be open and $F: \Omega \to \mathbb{R}^n$ a continuously differentiable vector field. Then the anti-symmetric, bilinear form

$\operatorname{rot} F|_x: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, \qquad \operatorname{rot} F|_x(u,v) := \langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle \qquad (19.3)$

is called the rotation (in mainland Europe) or curl (in Anglo-Saxon countries) of the vector field $F$ at $x \in \mathbb{R}^n$. We will study this bilinear form in more detail for the case of fields in $\mathbb{R}^2$ and $\mathbb{R}^3$.

Flux and Circulation Slide 573 The Rotation in R²

In $\mathbb{R}^2$, the area of the parallelogram is given by $|\det(u,v)|$. The circulation density (circulation per unit area) is then

$\frac{1}{|\det(u,v)|}\int_{\partial P(u,v)} F\,d\vec{s} = \frac{\langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle + o(\|u\| + \|v\|)}{\det(u,v)}.$

For instance, if we set $u = h \cdot u_0$ and $v = h \cdot v_0$, where $h > 0$ and $\|u_0\|, \|v_0\| > 0$ are fixed, we have

$\frac{1}{|\det(u,v)|}\int_{\partial P(u,v)} F\,d\vec{s} \xrightarrow[h\to 0]{} \frac{\langle DF|_x\,u_0, v_0\rangle - \langle DF|_x\,v_0, u_0\rangle}{|\det(u_0,v_0)|}.$

Therefore,

$\frac{\operatorname{rot} F|_x(u,v)}{|\det(u,v)|} = \frac{\langle DF|_x\,u, v\rangle - \langle DF|_x\,v, u\rangle}{\text{Area}(P(u,v))} \qquad (19.4)$

represents the infinitesimal circulation around a parallelogram centered at a point $x \in \mathbb{R}^2$.

Flux and Circulation Slide 574 The Rotation in R²

19.9. Theorem. Let $\Omega \subset \mathbb{R}^2$ be open and $F: \Omega \to \mathbb{R}^2$ a continuously differentiable vector field. Then there exists a uniquely defined continuous potential function $\operatorname{rot} F: \Omega \to \mathbb{R}$ such that

$\operatorname{rot} F|_x(u,v) = \operatorname{rot} F(x)\cdot\det(u,v). \qquad (19.5)$

Proof. The determinant is the unique alternating, normed, bilinear form in $\mathbb{R}^2$. Since $\operatorname{rot} F|_x$ is alternating and bilinear (but not normed) it must be a multiple of the determinant. In fact, we have

$\operatorname{rot} F(x) = \operatorname{rot} F(x)\cdot\det\Big(\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\Big) = \operatorname{rot} F|_x\Big(\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\Big) = \langle DF|_x\,e_1, e_2\rangle - \langle DF|_x\,e_2, e_1\rangle = \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}.$

Flux and Circulation Slide 575 The Rotation in R²

19.10. Remark. Comparing (19.5) with the circulation density (19.4), it is clear that the function $\operatorname{rot} F$ gives this circulation density. We note:

The circulation density of a vector field $F$ in $\mathbb{R}^2$ is represented by a scalar function, $\operatorname{rot} F$. This scalar function is given by

$\operatorname{rot} F = \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}. \qquad (19.6)$

Flux and Circulation Slide 576 The Rotation in R³

19.11. Theorem. Let $\Omega \subset \mathbb{R}^3$ be open and $F: \Omega \to \mathbb{R}^3$ a continuously differentiable vector field. Then there exists a uniquely defined continuous vector field $\operatorname{rot} F: \Omega \to \mathbb{R}^3$ such that

$\operatorname{rot} F|_x(u,v) = \det(\operatorname{rot} F(x), u, v) = \langle\operatorname{rot} F(x),\ u \times v\rangle. \qquad (19.7)$

Proof. Suppose that the vector field $\operatorname{rot} F$ exists for $F = (F_1, F_2, F_3)$; then we can use (19.7) to calculate the first component of $\operatorname{rot} F$:

$(\operatorname{rot} F(x))_1 = \langle\operatorname{rot} F(x), e_1\rangle = \langle\operatorname{rot} F(x), e_2 \times e_3\rangle = \operatorname{rot} F|_x(e_2, e_3) = \langle DF|_x\,e_2, e_3\rangle - \langle DF|_x\,e_3, e_2\rangle = \frac{\partial F_3}{\partial x_2} - \frac{\partial F_2}{\partial x_3}.$

Flux and Circulation Slide 577 The Rotation in R³

Proof (continued). Similarly, we calculate the other components of $\operatorname{rot} F$ to obtain

$\operatorname{rot} F(x) = \begin{pmatrix} \partial F_3/\partial x_2 - \partial F_2/\partial x_3 \\ \partial F_1/\partial x_3 - \partial F_3/\partial x_1 \\ \partial F_2/\partial x_1 - \partial F_1/\partial x_2 \end{pmatrix}.$

Hence, the vector $\operatorname{rot} F$ is uniquely determined from $F$ by (19.7). The existence of a vector $y \in \mathbb{R}^3$ such that $\operatorname{rot} F|_x(u,v) = \det(y, u, v)$ will be proven in the assignments. Since $F$ is continuously differentiable as a function of $x$, so is the vector $y = y(x)$, and the vector field $\operatorname{rot} F(x) = y(x)$ exists.

Flux and Circulation Slide 578 The Rotation in R³

19.12. Remark. Comparing (19.7) with the circulation density (19.4), the circulation density in the plane spanned by $u$ and $v$ at $x$ is given by

$\Big\langle\operatorname{rot} F|_x,\ \frac{u \times v}{\|u \times v\|}\Big\rangle.$

We note:

The circulation density of a vector field $F$ in $\mathbb{R}^3$ is represented by a vector field, $\operatorname{rot} F$. This vector field is given by

$\operatorname{rot} F = \begin{pmatrix} \partial F_3/\partial x_2 - \partial F_2/\partial x_3 \\ \partial F_1/\partial x_3 - \partial F_3/\partial x_1 \\ \partial F_2/\partial x_1 - \partial F_1/\partial x_2 \end{pmatrix}. \qquad (19.8)$

Flux and Circulation Slide 579 The Rotation in R³

19.13. Remark. We can consider $\mathbb{R}^2$ as being a subspace of $\mathbb{R}^3$ by identifying points $(x_1,x_2) \in \mathbb{R}^2$ with $(x_1,x_2,0) \in \mathbb{R}^3$. Similarly, a vector field in $\mathbb{R}^2$ can be considered as a field in $\mathbb{R}^3$ of the form

$F(x_1,x_2,x_3) = \big(F_1(x_1,x_2),\ F_2(x_1,x_2),\ 0\big)^T.$

We then obtain

$\operatorname{rot} F(x) = \Big(0,\ 0,\ \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\Big)^T,$

effectively regaining (19.6) from (19.8).

Flux and Circulation Slide 580 The Rotation in Higher Dimensions

We have shown that for $x \in \mathbb{R}^n$,

$\operatorname{rot} F|_x(u,v) = \langle u, A(x)\,v\rangle, \qquad (19.9)$

where

$A(x) = (DF|_x)^T - DF|_x.$

Note that $A(x)^T = -A(x)$, so at any point $x$ the rotation is represented by an antisymmetric matrix.

Flux and Circulation Slide 581 The Rotation in Higher Dimensions

The space $\{A \in \operatorname{Mat}(n \times n; \mathbb{R}) : A^T = -A\}$ has

▶ Dimension 1 if $n = 2$, so $\operatorname{rot} F|_x$ can be represented by a scalar.
▶ Dimension 3 if $n = 3$, so $\operatorname{rot} F|_x$ can be represented by a vector in $\mathbb{R}^3$.
▶ Dimension 6 if $n = 4$, so $\operatorname{rot} F|_x$ cannot easily be represented by a vector.
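The identity (19.7) can be checked symbolically in Mathematica using the built-in Curl operator:

F = {f1[x, y, z], f2[x, y, z], f3[x, y, z]};
DF = D[F, {{x, y, z}}];
u = {u1, u2, u3}; v = {v1, v2, v3};
Simplify[(DF.u).v - (DF.v).u - Curl[F, {x, y, z}].Cross[u, v]]   (* 0 *)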
Flux and Circulation Slide 582 Irrotational Fields

A continuously differentiable field $F: \Omega \to \mathbb{R}^n$ such that $\operatorname{rot} F|_x = 0$ for all $x \in \Omega$ is said to be irrotational. Writing an irrotational field in the form (19.9), we see that $A(x) = 0$ for all $x$. This implies $(DF|_x)^T = DF|_x$, which means that

$\frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i}, \qquad i,j = 1,\dots,n.$

Hence, a potential field is irrotational. Conversely, if $F: \Omega \to \mathbb{R}^n$ is defined on a simply connected domain, we may apply Theorem 18.18 to deduce that $F$ is a potential field.

Flux and Circulation Slide 583 Fluid Statics

Fluid statics is the study of time-independent flows. In particular, the streamlines are assumed to be given by a direction field $F$ in $\mathbb{R}^n$ (most often, $n = 2$ or $3$) that does not depend on time. Irrotational fluid flow is often modeled by a potential field, i.e., one assumes $F = \nabla U$ for some potential $U$. (The resulting flow is known as potential flow.) If there are no sources or sinks and the fluid is incompressible, one additionally has

$\operatorname{div} F = 0.$

Combining these two equations, one obtains

$\operatorname{div}(\nabla U) = \operatorname{div}\Big(\frac{\partial U}{\partial x_1},\dots,\frac{\partial U}{\partial x_n}\Big)^T = \frac{\partial^2 U}{\partial x_1^2} + \dots + \frac{\partial^2 U}{\partial x_n^2} = \Delta U = 0.$

Flux and Circulation Slide 584 The Laplace Equation

The equation $\Delta U = 0$ is a partial differential equation known as the Laplace equation. Together with boundary conditions for the flow it can (in principle) be solved to yield the streamlines of an irrotational, incompressible fluid in any physical situation. However, practical solutions must often rely on numerical or approximate methods, as solving the equation explicitly is possible only in the simplest situations (such as fluid flow around a sphere). Finding solutions to this (and other) partial differential equations is one of the main research problems in applied mathematics and engineering. Solutions of the Laplace equation play a minor role in Vv286 (Honors Math II), a major role in Vv454 (Partial Differential Equations and Boundary Value Problems) and are a principal topic of Vv557 (Methods of Applied Mathematics II).

Flux and Circulation Slide 585 Triangle Calculus

It is convenient to introduce the formal "vector"

$\nabla := \Big(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\Big)^T.$

Then the gradient of a potential function $f$ is just

$\nabla f = \Big(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\Big)^T f = \Big(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\Big)^T.$

The divergence of a vector field $F$ can be expressed as

$\operatorname{div} F = \langle\nabla, F\rangle = \Big\langle\Big(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\Big)^T,\ (F_1,\dots,F_n)^T\Big\rangle = \frac{\partial F_1}{\partial x_1} + \dots + \frac{\partial F_n}{\partial x_n}.$

Flux and Circulation Slide 586 Triangle Calculus

The rotation of a vector field $F$ can be formally written as

$\operatorname{rot} F = \nabla \times F(x) = \det\begin{pmatrix} e_1 & e_2 & e_3 \\ \partial/\partial x_1 & \partial/\partial x_2 & \partial/\partial x_3 \\ F_1 & F_2 & F_3 \end{pmatrix},$

where $e_1, e_2, e_3$ are the standard basis vectors in $\mathbb{R}^3$. Finally, we can formally write

$\langle\nabla, \nabla\rangle = \Big(\frac{\partial}{\partial x_1}\Big)^2 + \dots + \Big(\frac{\partial}{\partial x_n}\Big)^2 = \frac{\partial^2}{\partial x_1^2} + \dots + \frac{\partial^2}{\partial x_n^2} = \Delta,$

so it is common for physicists to write $\nabla^2$ instead of $\Delta$.
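These formal operations all exist as built-in Mathematica operators, and the identity $\operatorname{div}(\nabla U) = \Delta U$ used above can be confirmed directly:

Div[Grad[U[x, y, z], {x, y, z}], {x, y, z}] - Laplacian[U[x, y, z], {x, y, z}]
(* 0 *)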
It turns out that suitable generalizations of this principle hold in higher dimensions. These fundamental theorems of vector analysis are among the most important theorems for engineering applications that we will study this term.

Fundamental Theorems of Vector Calculus Slide 590
Green's Theorem
20.1. Green's Theorem. Let R ⊂ R2 be a bounded, simple region and Ω ⊃ R an open set containing R. Let F : Ω → R2 be a continuously differentiable vector field. Then

    ∫_{∂R*} F dℓ⃗ = ∫_R ( ∂F2/∂x1 − ∂F1/∂x2 ) dx,    (20.1)

where ∂R* denotes the boundary curve of R with positive (counter-clockwise) orientation.
20.2. Remark. In fact, Green's Theorem is valid not only for simple regions, but for more general regions, as we shall describe below.

Fundamental Theorems of Vector Calculus Slide 591
Green's Theorem
Proof. Consider R as an ordinate region with respect to x2, let Ω be an open set containing R and suppose F1 : Ω → R is continuously differentiable. The boundary ∂R of R is the union of the curves

    ℓ1 = {(x1, x2) ∈ R2 : x1 ∈ [a, b], x2 = φ1(x1)},
    ℓ2 = {(x1, x2) ∈ R2 : x1 ∈ [a, b], x2 = φ2(x1)},
    ℓ3 = {(x1, x2) ∈ R2 : x1 = b, x2 ∈ [φ1(b), φ2(b)]},
    ℓ4 = {(x1, x2) ∈ R2 : x1 = a, x2 ∈ [φ1(a), φ2(a)]}.

[Sketch: the region R between the graphs of φ1 and φ2 over [a, b], with the four boundary curves and their directions indicated by arrows.]

We will later imbue the boundary with positive orientation, so we discuss integrals along the curves ℓ1, ℓ2, ℓ3, ℓ4 in the directions indicated by the arrows.

Fundamental Theorems of Vector Calculus Slide 592
Green's Theorem
Proof (continued). Consider the integral

    ∫_R ∂F1/∂x2 dx = ∫_a^b ∫_{φ1(x1)}^{φ2(x1)} ∂F1/∂x2 dx2 dx1
                   = ∫_a^b F1(x1, φ2(x1)) dx1 − ∫_a^b F1(x1, φ1(x1)) dx1.    (20.2)

We now integrate the vector field

    F̃ : Ω → R2,  F̃(x) = (F1(x), 0)^T,

along ℓ1 and −ℓ2, using the respective parametrizations

    γ(1)(t) = (t, φ1(t)),  γ(2)(t) = (t, φ2(t)),  t ∈ [a, b].

Fundamental Theorems of Vector Calculus Slide 593
Green's Theorem
Proof (continued). A quick calculation yields

    ∫_{ℓ1} (F1, 0)^T dℓ⃗ = ∫_a^b F1 ∘ γ(1)(t) dt = ∫_a^b F1(t, φ1(t)) dt,
    ∫_{−ℓ2} (F1, 0)^T dℓ⃗ = −∫_a^b F1 ∘ γ(2)(t) dt = −∫_a^b F1(t, φ2(t)) dt.

It is also easy to see that

    ∫_{ℓ3} (F1, 0)^T dℓ⃗ = ∫_{ℓ4} (F1, 0)^T dℓ⃗ = 0,

so we find

    ∮_{∂R} (F1, 0)^T dℓ⃗ = ∫_a^b F1(t, φ1(t)) dt − ∫_a^b F1(t, φ2(t)) dt.    (20.3)

Fundamental Theorems of Vector Calculus Slide 594
Green's Theorem
Proof (continued). Putting (20.2) and (20.3) together,

    ∫_{∂R} (F1, 0)^T dℓ⃗ = −∫_R ∂F1/∂x2 dx.    (20.4)

Repeating a similar argument with the scalar function F2 : Ω → R and representing R as an ordinate region with respect to x1 yields

    ∫_{∂R} (0, F2)^T dℓ⃗ = ∫_R ∂F2/∂x1 dx.    (20.5)

Adding (20.4) and (20.5) then gives (20.1).

Fundamental Theorems of Vector Calculus Slide 595
Green's Theorem
20.3. Example. We wish to calculate the integral ∫_ℓ x⁴ dx + xy dy, where ℓ ⊂ R2 is the positively oriented boundary of the triangle with vertices (0, 0), (0, 1) and (1, 0). We note that the vector field F(x, y) = (x⁴, xy) is defined on all of R2 and the region R bounded by the triangle is a simple region. Instead of evaluating three separate line integrals (one for each edge of the triangle) we can apply Green's Theorem. We note that

    R = {(x, y) ∈ R2 : x ∈ [0, 1], 0 ≤ y ≤ 1 − x}.

Then

    ∫_ℓ x⁴ dx + xy dy = ∫_R (y − 0) d(x, y) = ∫_0^1 ∫_0^{1−x} y dy dx = 1/6.

[Sketch: the triangle R with positively oriented boundary.]

Fundamental Theorems of Vector Calculus Slide 596
Measurement of Area – the Planimeter
For F(x1, x2) = (−x2, x1) we obtain

    |R| = ∫_R 1 dx = (1/2) ∫_{∂R} (−x2, x1)^T dℓ⃗.

Hence the integral of a vector field around the boundary of a region can be used to determine the area of that region.
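For a polygonal boundary this integral can be evaluated edge by edge in closed form: on the edge from (x_i, y_i) to (x_{i+1}, y_{i+1}) it contributes (1/2)(x_i y_{i+1} − x_{i+1} y_i). A minimal NumPy sketch (function name illustrative) implements the resulting "shoelace" formula and reproduces the area of the triangle from Example 20.3:

```python
import numpy as np

def area_via_boundary(vertices):
    """Area from (1/2) * integral over the boundary of (-y, x) . dl,
    vertices listed counter-clockwise (positive orientation)."""
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

print(area_via_boundary([(0, 0), (1, 0), (1, 1), (0, 1)]))  # unit square: 1.0
print(area_via_boundary([(0, 0), (1, 0), (0, 1)]))          # triangle: 0.5
```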
Several measurement instruments, known as planimeters, have been developed to implement this. The most successful version is the polar planimeter, invented by the Swiss mathematician Jakob Amsler in 1854. Previous planimeters (the first was constructed in 1814) were not as accurate as the polar planimeter, and it remains the most common form today. A description of the basic functioning of the polar planimeter is quoted on the next slide. The source is an article on military surveys and earthwork constructions by GlobalSecurity.org.

Fundamental Theorems of Vector Calculus Slide 597
Functioning of a Polar Planimeter
"The planimeter [...] touches the paper at three points: the anchor point, P; the tracing point, T; and the roller, R. The adjustable arm, A, is graduated to permit adjustment to the scale of the plot. This adjustment provides a direct ratio between the area traced by the tracing point and the revolutions of the roller. As the tracing point is moved over the paper, the drum, D, and the disk, F, revolve. The disk records the revolutions of the roller in units of tenths; the drum, in hundredths; and the vernier, V, in thousandths."
http://www.globalsecurity.org/military/library/policy/army/fm/5-430-00-1/CH3.htm
[Photograph: Polar Planimeter in Use. GlobalSecurity.org. Web. 22 July 2012]

Fundamental Theorems of Vector Calculus Slide 598
Principle of a Polar Planimeter
To understand the principle of the polar planimeter, consider the sketch at right. Assume for simplicity that both arms of the planimeter have equal length r. When tracing the boundary curve of the pictured shape, the position of the pivot (p, q) changes as a function of (x, y) (the point (p, q) is unique if we require that the angle between the two arms is less than π).

[Sketch: two arms of length r joining the anchor at the origin, the pivot (p, q), and the tracing point (x, y), with the planimeter vector field v(x, y) at the tracing point.]

The planimeter vector field v is given by

    v(x, y) = (1/r) ( −(y − q(x, y)), x − p(x, y) )^T,

where r² = (y − q(x, y))² + (x − p(x, y))². The disk, roller and vernier of the planimeter record the integral of v(x, y) along the boundary.

Fundamental Theorems of Vector Calculus Slide 599
Principle of a Polar Planimeter
The point (p(x, y), q(x, y)) is given by the intersection of two circles of radius r about the origin and about (x, y). From the two equations

    (p − x)² + (q − y)² = r²,
    p² + q² = r²,

we obtain

    x² + y² − 2px − 2qy = 0.

The implicit equations for p and q,

    x² + y² − 2px − 2√(r² − p²) y = 0,
    x² + y² − 2qy − 2√(r² − q²) x = 0,

yield (see assignments)

    ∂p/∂x + ∂q/∂y = 1.

Fundamental Theorems of Vector Calculus Slide 600
Principle of a Polar Planimeter
We then have

    planimeter reading = ∫_{∂R*} v dℓ⃗
                       = ∫_R ( ∂v2/∂x − ∂v1/∂y ) dx dy
                       = ∫_R (1/r) dx dy = |R|/r.

Hence, with r known, the area of the enclosed domain R has been found.

Fundamental Theorems of Vector Calculus Slide 601
Finding Areas by Green's Theorem
20.4. Example. Let E = {(x, y) ∈ R2 : (x/a)² + (y/b)² ≤ 1} be the ellipse centered at the origin with half-axes of length a > 0 and b > 0. We can find the area of the ellipse from

    |E| = (1/2) ∫_{∂E} (−y, x)^T dℓ⃗.

First, ∂E is parametrized by

    γ(t) = (a cos t, b sin t),  t ∈ [0, 2π].

We obtain

    |E| = (1/2) ∫_0^{2π} ⟨ (−b sin t, a cos t), (−a sin t, b cos t) ⟩ dt
        = (1/2) ∫_0^{2π} ab (cos² t + sin² t) dt = πab.
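The same computation is easy to check numerically. The sketch below (half-axes chosen arbitrarily) approximates the boundary integral by a Riemann sum over the parametrization and compares it with πab:

```python
import numpy as np

a, b = 3.0, 1.5                       # illustrative half-axes
t = np.linspace(0.0, 2*np.pi, 2001)
x, y   = a*np.cos(t), b*np.sin(t)     # gamma(t)
dx, dy = -a*np.sin(t), b*np.cos(t)    # gamma'(t)

dt = t[1] - t[0]
area = 0.5 * np.sum((-y*dx + x*dy)[:-1]) * dt   # left Riemann sum
print(area, np.pi*a*b)                # both ~ 14.1372
```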
Fundamental Theorems of Vector Calculus Slide 602
Physical Interpretation of Green's Theorem
Green's theorem can be interpreted both in terms of the rotation and the divergence of a vector field. Let Ω ⊂ R2 be open, F : Ω → R2 and R ⊂ Ω a simple region with boundary ∂R. Then the circulation (see (19.1) and the following discussion) around R is given by

    ∫_{∂R*} F dℓ⃗,

where ∂R* is oriented positively. Furthermore, the rotation of F is given by

    rot F = ∂F2/∂x1 − ∂F1/∂x2

(see (19.6)).

Fundamental Theorems of Vector Calculus Slide 603
Physical Interpretation of Green's Theorem
Green's Theorem then states that

    circulation along ∂R = ∫_{∂R*} F dℓ⃗
                         = ∫_R ( ∂F2/∂x1 − ∂F1/∂x2 ) dx
                         = ∫_R rot F dx
                         = integral of circulation density over R.

Fundamental Theorems of Vector Calculus Slide 604
Physical Interpretation of Green's Theorem
In a similar manner, we can show that Green's Theorem relates the total flux through ∂R to the divergence of F. Define

    F̃(x) := ( −F2(x), F1(x) )^T.

Then ⟨F, N⟩ = ⟨F̃, T⟩ for a tangent vector T of ∂R (positively oriented) and the outward-pointing normal vector N. This yields

    flux through ∂R = ∫_{∂R*} ⟨F, N⟩ dℓ = ∫_{∂R*} F̃ dℓ⃗
                    = ∫_R ( ∂F̃2/∂x1 − ∂F̃1/∂x2 ) dx
                    = ∫_R div F dx
                    = integral of flux density over R.
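The flux form is also easy to verify by hand on a square. A SymPy sketch (the field is an illustrative choice) compares the outward flux through the boundary of the unit square with the integral of div F over the square:

```python
import sympy as sp

x, y = sp.symbols('x y')
F1, F2 = x*y, y**2                   # illustrative field

div_int = sp.integrate(sp.diff(F1, x) + sp.diff(F2, y), (x, 0, 1), (y, 0, 1))

# Outward flux <F, N>, edge by edge; outward normals are
# (0,-1), (1,0), (0,1), (-1,0):
flux  = sp.integrate(-F2.subs(y, 0), (x, 0, 1))   # bottom
flux += sp.integrate( F1.subs(x, 1), (y, 0, 1))   # right
flux += sp.integrate( F2.subs(y, 1), (x, 0, 1))   # top
flux += sp.integrate(-F1.subs(x, 0), (y, 0, 1))   # left

print(sp.simplify(div_int - flux))   # 0 (both sides equal 3/2 here)
```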
Fundamental Theorems of Vector Calculus Slide 605
Ordinate Regions in Rn
We will now develop generalizations of Green's Theorem 20.1 to higher dimensions. In R2, we have seen that Green's Theorem provides macroscopic equations for both the divergence and the rotation, summarized as

    flux through ∂R = integral of flux density over R,
    circulation along ∂R = integral of circulation density over R,

for a suitable region R ⊂ R2. It turns out that in R3 these physical statements lead to two separate theorems.

Fundamental Theorems of Vector Calculus Slide 606
Admissible Regions
We have proven Green's theorem for simple regions in R2, but we have not precisely specified for which regions it is valid. Let us now do this:
20.5. Definition. (i) A subset R ⊂ Rn is called a region if it is open and (pathwise) connected.
(ii) A region R ⊂ Rn is said to be admissible if it is bounded and its boundary is the union of a finite number of parametrized hypersurfaces whose normal vectors point outwards from R.
20.6. Theorem. Green's Theorem is valid for any admissible region in R2.

Fundamental Theorems of Vector Calculus Slide 607
Admissible Regions
Admissible regions may have edges and corners: [figure]

Fundamental Theorems of Vector Calculus Slide 608
Admissible Regions
The boundary may not behave "too wildly". This region is not admissible: [figure]

Fundamental Theorems of Vector Calculus Slide 609
Admissible Regions
Removing an interior point means the boundary is not everywhere a hypersurface. This region is not admissible: [figure]

Fundamental Theorems of Vector Calculus Slide 610
Admissible Regions
Removing part of the center line means that it is impossible to find outward-pointing normal vectors. This region is not admissible: [figure]

Fundamental Theorems of Vector Calculus Slide 611
Admissible Hypersurfaces in R3
20.7. Definition. A hypersurface S ⊂ R3 with parametrization φ : R → S is said to be admissible if
(i) the interior int R is an admissible region in R2 with an oriented boundary curve ∂R* and
(ii) R is closed, i.e., R = cl R.
In particular, the boundary of the region R consists of a finite number of parametrized hypersurfaces in R2, i.e., smooth curves. Let us write

    ∂R = C1 ∪ C2 ∪ · · · ∪ Ck.

This boundary of R is of course mapped by φ into a set of points of S. We would like to formulate a criterion for determining whether φ(∂R) (or part of this set) constitutes an "actual boundary" of the surface S or not. Since ∂R* is oriented, we define the chain of curves C1* ∪ C2* ∪ · · · ∪ Ck*, where the individual curves are traversed in the "correct" orientation determined by ∂R*.

Fundamental Theorems of Vector Calculus Slide 612
Closed Hypersurfaces in R3 and those with Boundary
20.8. Definition. Let S ⊂ R3 be an admissible hypersurface with parametrization φ : R → S. Let ∂R* = C1* ∪ C2* ∪ · · · ∪ Ck*, where each Ci* is an oriented smooth curve in R2 and all Ci are pairwise disjoint.
(i) We say that φ annihilates a chain of curves Ci1 ∪ · · · ∪ Cij if

    ∫_{φ(Ci1 ∪ ··· ∪ Cij)} 1 dℓ = 0.

(ii) If φ annihilates ∂R, S is said to be a closed surface.
(iii) Denote by C′ ⊂ ∂R the largest chain of curves that is annihilated by φ. If C′ ≠ ∂R we say that S is a surface with boundary and define

    ∂S := φ(∂R \ C′).    (20.6)

Fundamental Theorems of Vector Calculus Slide 613
Admissible Hypersurfaces in R3
20.9. Example. The unit sphere S² ⊂ R3 is parametrized by

    φ : [0, 2π] × [0, π] → S²,  φ(ϕ, θ) = ( cos ϕ sin θ, sin ϕ sin θ, cos θ )^T.

The interior of [0, 2π] × [0, π] is clearly an admissible region in R2: the rectangle is closed, its boundary consists of four lines, which are hypersurfaces in R2, and the normal vectors can be taken to point outward. It is easily seen that φ annihilates the boundary of the rectangle, so S² is a closed, admissible surface.

Fundamental Theorems of Vector Calculus Slide 614
Admissible Hypersurfaces in R3
20.10. Example. The map

    φ : [0, 2π] × [0, π/2] → S,  φ(ϕ, θ) = ( cos ϕ sin θ, sin ϕ sin θ, cos θ )^T,

parametrizes a unit hemisphere S. The boundary is not annihilated, so the hemisphere is a surface with boundary, given by

    ∂S = {x ∈ R3 : x3 = 0, x1² + x2² = 1}.

Fundamental Theorems of Vector Calculus Slide 615
Stokes's Theorem in R3
There is a theorem that states that an oriented hypersurface in R3 is closed if and only if it divides R3 into two disjoint, connected components. Hence, an oriented hypersurface is closed if and only if it is the boundary of a region in R3. After these preparations we can finally formulate one generalization of Green's theorem to R3:
20.11. Stokes's Theorem. Let Ω ⊂ R3 be an open set, S ⊂ Ω a parametrized, admissible surface in R3 with boundary ∂S and let F : Ω → R3 be a continuously differentiable vector field. Then

    ∫_{∂S*} F dℓ⃗ = ∫_{S*} rot F dA⃗,

where the orientations of the boundary curve ∂S* and the surface S* are chosen so that the normal vector to S* points in the direction of the thumb of the right hand if the four fingers point in the direction of the tangent vector to ∂S*.

Fundamental Theorems of Vector Calculus Slide 616
Orientation for Stokes's Theorem in R3
[Figures: Orientation for Stokes's Theorem. James Stewart, Calculus, 4th Ed., Brooks Cole. Right-hand Grip Rule. Wikimedia Commons. Wikimedia Foundation. Web. 28 July 2012]

Fundamental Theorems of Vector Calculus Slide 617
Physical Interpretation of Stokes's Theorem
The physical interpretation of Stokes's theorem is the same as for Green's theorem: in the integral of infinitesimal circulations (the rotation) across a surface, the individual circulations cancel everywhere except at the boundary, so the total integral is just the circulation along the boundary.
[Figure: Illustration of Stokes's Theorem. Wikimedia Commons. Wikimedia Foundation. Web. 28 July 2012]

Fundamental Theorems of Vector Calculus Slide 618
Physical Interpretation of Stokes's Theorem
Furthermore, it does not matter how the surface is deformed if the boundary remains the same; the "infinitesimal circulations" continue to cancel each other.
The integral of the circulation across the hemisphere below left is not affected by deformations (middle) and is even equal to the circulation integrated over the disk with the same boundary (right).
[Figures: a hemisphere, a deformed surface, and a flat disk, all sharing the same boundary circle.]

Fundamental Theorems of Vector Calculus Slide 619
Proof of Stokes's Theorem
Proof of Stokes's Theorem 20.11. Stokes's Theorem in R3 can be reduced to Green's Theorem 20.1. Let φ : R → S be the parametrization of the surface. Then

    ∫_{S*} rot F dA⃗ = ∫_{S*} ⟨rot F, N⟩ dA
                    = ∫_R ⟨ rot F(φ(x1, x2)), t1 × t2|_{φ(x1,x2)} ⟩ dx1 dx2.

By (19.7) and (19.3) we have

    ⟨ rot F(φ(x)), t1 × t2|_{φ(x)} ⟩ = rot F|_{φ(x)} ( t1(φ(x)), t2(φ(x)) )
        = rot F|_{φ(x)} ( ∂φ/∂x1, ∂φ/∂x2 )
        = ⟨ DF|_{φ(x)} ∂φ/∂x1, ∂φ/∂x2 ⟩ − ⟨ DF|_{φ(x)} ∂φ/∂x2, ∂φ/∂x1 ⟩.

Fundamental Theorems of Vector Calculus Slide 620
Proof of Stokes's Theorem
Proof of Stokes's Theorem 20.11 (continued). By the chain rule,

    ∂/∂x1 ⟨ F|_{φ(x)}, ∂φ/∂x2 ⟩ = ⟨ DF|_{φ(x)} ∂φ/∂x1, ∂φ/∂x2 ⟩ + ⟨ F|_{φ(x)}, ∂²φ/∂x1∂x2 ⟩.

Using that ∂²φ/∂x1∂x2 = ∂²φ/∂x2∂x1 (as we will prove later), we obtain

    ⟨ rot F(φ(x)), t1 × t2|_{φ(x)} ⟩ = ∂/∂x1 ⟨ F|_{φ(x)}, ∂φ/∂x2 ⟩ − ∂/∂x2 ⟨ F|_{φ(x)}, ∂φ/∂x1 ⟩
                                    = rot F̃(x),

where

    F̃ : R → R2,  F̃(x) = ( ⟨F|_{φ(x)}, ∂φ/∂x1⟩, ⟨F|_{φ(x)}, ∂φ/∂x2⟩ )^T.

Fundamental Theorems of Vector Calculus Slide 621
Proof of Stokes's Theorem
Proof of Stokes's Theorem 20.11 (continued). We can now apply Green's theorem in the admissible region R. Then

    ∫_{S*} rot F dA⃗ = ∫_R ⟨ rot F(φ(x1, x2)), t1 × t2|_{φ(x1,x2)} ⟩ dx1 dx2
                    = ∫_R rot F̃(x) dx1 dx2
                    = ∫_{∂R} F̃ dℓ⃗.

Let us suppose (for simplicity) that ∂R is given by a single parametrization γ : I → ∂R. Then

    ∫_{S*} rot F dA⃗ = ∫_{∂R} F̃ dℓ⃗ = ∫_I ⟨ F̃(γ(t)), γ′(t) ⟩ dt.

Fundamental Theorems of Vector Calculus Slide 622
Proof of Stokes's Theorem
Proof of Stokes's Theorem 20.11 (continued). Inserting the definition of F̃,

    ∫_{S*} rot F dA⃗ = ∫_I ( ⟨ F|_{φ(x)}, ∂φ/∂x1 ⟩|_{x=γ(t)} γ1′(t) + ⟨ F|_{φ(x)}, ∂φ/∂x2 ⟩|_{x=γ(t)} γ2′(t) ) dt
                    = ∫_I ⟨ F|_{φ(γ(t))}, (d/dt) φ(γ(t)) ⟩ dt
                    = ∫_{∂S*} F dℓ⃗,

where we have used the chain rule and that φ(γ(t)) parametrizes ∂S. We have assumed a single parametrization for the boundary; in general, this calculation can be applied to each boundary segment.
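A small numerical sketch shows the theorem in action (the field and surface are chosen for convenience, not taken from the lecture): for F(x1, x2, x3) = (x2, x3, x1) one computes rot F = (−1, −1, −1), so over the flat unit disk in the plane x3 = 0 with normal e3 the surface integral is simply −1 · π. The circulation around the positively oriented unit circle gives the same value:

```python
import numpy as np

# F(x) = (x2, x3, x1), rot F = (-1, -1, -1) (constant)
t = np.linspace(0.0, 2*np.pi, 4001)
gamma  = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)])   # boundary circle
dgamma = np.stack([-np.sin(t), np.cos(t), np.zeros_like(t)])  # gamma'(t)
Fg = np.stack([gamma[1], gamma[2], gamma[0]])                 # F along gamma

dt = t[1] - t[0]
circulation = np.sum(np.sum(Fg * dgamma, axis=0)[:-1]) * dt

surface = -1.0 * np.pi        # <rot F, e3> * area of the unit disk
print(circulation, surface)   # both ~ -3.14159
```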
Fundamental Theorems of Vector Calculus Slide 623
Stokes's Theorem in Rn
20.12. Remark. We have formulated and proved Stokes's theorem in R3, since we have a good idea of the structure of the rotation (as a three-dimensional vector) in R3. To generalize Stokes's theorem to n dimensions would require working with the rotation as a bilinear form and would entail a fair amount of abstract algebra and geometry, including a closer study of differential forms. This is beyond the scope of our course; if you are interested in this, search for books on vector analysis. For example, Michael Spivak's book Calculus on Manifolds is a good place to start.

Fundamental Theorems of Vector Calculus Slide 624
Gauß's Theorem
The other aspect of Green's theorem is based on the physical idea of flux. This has a straightforward generalization to n dimensions:
20.13. Gauß's Theorem. Let R ⊂ Rn be an admissible region and F : R → Rn a continuously differentiable vector field. Then

    ∫_R div F dx = ∫_{∂R*} F dA⃗.

The integrals make sense, as the boundary of an admissible region is a union of hypersurfaces. Recall that the surfaces are oriented in such a way that the normal vector points outward. We will prove Gauß's theorem only for the case of simple regions, whose definition we now recall.

Fundamental Theorems of Vector Calculus Slide 625
Ordinate Regions in Rn
For x ∈ Rn we define

    x̂(k) := (x1, …, x_{k−1}, x_{k+1}, …, xn) ∈ R^{n−1}

as the vector x with the kth component omitted.
20.14. Definition. A subset R ⊂ Rn is said to be an ordinate region (with respect to xk) if there exists a measurable set Ω ⊂ R^{n−1} and continuous, almost everywhere differentiable functions φ1, φ2 : Ω → R such that

    R = { x ∈ Rn : x̂(k) ∈ Ω, φ1(x̂(k)) ≤ xk ≤ φ2(x̂(k)) }.

If R is an ordinate region with respect to each xk, k = 1, …, n, it is said to be a simple region.
20.15. Theorem. A simple region is admissible.

Fundamental Theorems of Vector Calculus Slide 626
Proof of Gauß's Theorem for Simple Regions
We will prove Gauß's theorem only for simple regions.
Proof of Gauß's Theorem 20.13. Suppose that F = (F1, …, Fn). Then we have to prove

    ∫_R div F dx = ∫_{∂R*} F dA⃗,

where ∂R* is oriented by the normal vector pointing outwards. Since

    ∫_R div F dx = Σ_{k=1}^n ∫_R ∂Fk/∂xk dx,    ∫_{∂R*} ⟨F, N⟩ dA = Σ_{k=1}^n ∫_{∂R} Fk ⟨ek, N⟩ dA,

it is sufficient to show that

    ∫_R ∂Fk/∂xk dx = ∫_{∂R*} Fk ⟨ek, N⟩ dA    (20.7)

for k = 1, …, n.

Fundamental Theorems of Vector Calculus Slide 627
Proof of Gauß's Theorem for Simple Regions
Proof of Gauß's Theorem 20.13 (continued). We fix some k and use that R is a simple region, so we can write

    R = { x ∈ Rn : x̂(k) ∈ Ω, φ1(x̂(k)) ≤ xk ≤ φ2(x̂(k)) }

for some Ω ⊂ R^{n−1}. The boundary of R is given by

    ∂R* = { x ∈ Rn : x̂(k) ∈ Ω, xk = φ1(x̂(k)) }    (=: S1)
        ∪ { x ∈ Rn : x̂(k) ∈ Ω, xk = φ2(x̂(k)) }    (=: S2)
        ∪ { x ∈ Rn : x̂(k) ∈ ∂Ω, φ1(x̂(k)) ≤ xk ≤ φ2(x̂(k)) }    (=: S3).

Fundamental Theorems of Vector Calculus Slide 628
Proof of Gauß's Theorem for Simple Regions
Proof of Gauß's Theorem 20.13 (continued). It is left as an exercise to show that the normal vector N to the "mantle" S3 is orthogonal to ek (by writing down a parametrization and showing that ek is in fact a tangent vector). Then the surface integral in (20.7) becomes

    ∫_{∂R*} Fk ⟨ek, N⟩ dA = ∫_{S1*} Fk ⟨ek, N1⟩ dA + ∫_{S2*} Fk ⟨ek, N2⟩ dA.    (20.8)

To evaluate the surface integrals, we need to find the unit normal vectors N1 and N2. The surface S1 is parametrized by

    Φ1(x̂(k)) = ( x1, …, x_{k−1}, φ1(x̂(k)), x_{k+1}, …, xn )^T.

Fundamental Theorems of Vector Calculus Slide 629
Proof of Gauß's Theorem for Simple Regions
Proof of Gauß's Theorem 20.13 (continued). Then the jth tangent vector is given by

    tj = ∂Φ1(x̂(k))/∂xj = ( 0, …, 0, 1, 0, …, 0, ∂φ1/∂xj, 0, …, 0 )^T,

with the entry 1 in the jth place and ∂φ1/∂xj in the kth place, for j ∈ {1, …, n} \ {k}. The normal vector will be orthogonal to all n − 1 tangent vectors, so

    N1 = C · ( ∂φ1/∂x1, …, ∂φ1/∂x_{k−1}, −1, ∂φ1/∂x_{k+1}, …, ∂φ1/∂xn )^T,

where C ∈ R is a normalization constant.

Fundamental Theorems of Vector Calculus Slide 630
Proof of Gauß's Theorem for Simple Regions
Proof of Gauß's Theorem 20.13 (continued). Since N1 is to have unit length,

    1/C = ( 1 + Σ_{j≠k} (∂φ1/∂xj)² )^{1/2} = √(1 + |Dφ1|²).

We note that, after re-arranging rows (which contributes a factor (−1)^{n−k}, irrelevant for the absolute value) and then subtracting suitable multiples of the first n − 1 rows from the last row,

    |det(t1, …, t_{n−1}, N1)| = C · | det ( 1_{n−1}  (Dφ1)^T ; Dφ1  −1 ) |
                              = C · | det ( 1_{n−1}  (Dφ1)^T ; 0  −1 − Σ_{j≠k} (∂φ1/∂xj)² ) |
                              = C · ( 1 + |Dφ1|² ) = 1/C.
We see that ⟨ek, N1⟩ = −C and hence

    ∫_{S1*} Fk ⟨ek, N1⟩ dA = −C ∫_Ω Fk(Φ1(x̂(k))) |det(t1, …, t_{n−1}, N1)| dx̂(k)
                           = −∫_Ω Fk(x1, …, x_{k−1}, φ1(x̂(k)), x_{k+1}, …, xn) dx̂(k),

since the determinant equals 1/C. In the same way we can show that

    ∫_{S2*} Fk ⟨ek, N2⟩ dA = ∫_Ω Fk(x1, …, x_{k−1}, φ2(x̂(k)), x_{k+1}, …, xn) dx̂(k).

Fundamental Theorems of Vector Calculus Slide 632
Proof of Gauß's Theorem for Simple Regions
Proof of Gauß's Theorem 20.13 (continued). Finally, we obtain

    ∫_{∂R*} Fk ⟨ek, N⟩ dA = ∫_{S2*} Fk ⟨ek, N2⟩ dA + ∫_{S1*} Fk ⟨ek, N1⟩ dA
        = ∫_Ω Fk(x1, …, φ2(x̂(k)), …, xn) dx̂(k) − ∫_Ω Fk(x1, …, φ1(x̂(k)), …, xn) dx̂(k)
        = ∫_Ω ∫_{φ1(x̂(k))}^{φ2(x̂(k))} ∂Fk(x)/∂xk dxk dx̂(k)
        = ∫_R ∂Fk/∂xk dx,

which shows (20.7).

Fundamental Theorems of Vector Calculus Slide 633
Application to Electromagnetics
We have shown in Example 19.5 that the flux through the unit sphere S² of the electric field

    E(p) = ( Q / (4πε0) ) · p/∥p∥³,

induced by a point charge at the origin, is

    ∫_{S²} ⟨E, dA⃗⟩ = Q/ε0.

This calculation can easily be modified so that it holds for any sphere ∂B_r(0) of radius r > 0. Now let R ⊂ R3 be an admissible region containing the origin. Then the flux through ∂R is given by

    ∫_{∂R} ⟨E, dA⃗⟩ = ∫_{∂B_r(0)} ⟨E, dA⃗⟩ + ∫_{∂(R \ B_r(0))} ⟨E, dA⃗⟩
                   = Q/ε0 + ∫_{R \ B_r(0)} div E dx.

Fundamental Theorems of Vector Calculus Slide 634
Application to Electromagnetics
Now

    ∂Ei/∂xi = ( Q/(4πε0) ) ∂/∂xi [ xi / (x1² + x2² + x3²)^{3/2} ]
            = ( Q/(4πε0) ) · ( (x1² + x2² + x3²)^{3/2} − 3(x1² + x2² + x3²)^{1/2} xi² ) / (x1² + x2² + x3²)³.

Since

    Σ_{i=1}^3 ( (x1² + x2² + x3²)^{3/2} − 3(x1² + x2² + x3²)^{1/2} xi² ) = 0,

we see that div E(x) = 0 for x ≠ 0. This implies that

    ∫_{∂R} ⟨E, dA⃗⟩ = Q/ε0

for any admissible region R ⊂ R3 containing the origin.
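The vanishing of div E away from the origin is also quickly confirmed symbolically; a minimal SymPy sketch:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
Q, eps0 = sp.symbols('Q epsilon_0', positive=True)

r = sp.sqrt(x1**2 + x2**2 + x3**2)
E = [Q/(4*sp.pi*eps0) * xi / r**3 for xi in (x1, x2, x3)]

divE = sum(sp.diff(Ei, xi) for Ei, xi in zip(E, (x1, x2, x3)))
print(sp.simplify(divE))   # 0 (valid away from the origin)
```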
Fundamental Theorems of Vector Calculus Slide 635
Green's Identities
20.16. Green's Identities. Let R ⊂ Rn be an admissible region and u, v : R → R twice continuously differentiable potential functions. Then

    ∫_R ⟨∇u, ∇v⟩ dx = −∫_R u ∆v dx + ∫_{∂R*} u ∂v/∂n dA    (20.9)

and

    ∫_R ( u ∆v − v ∆u ) dx = ∫_{∂R*} ( u ∂v/∂n − v ∂u/∂n ) dA.    (20.10)

Here we have used the normal derivative (see Definition 17.9) in the integrals on the boundary. The relation (20.9) is commonly called Green's first identity and (20.10) is known as Green's second identity. Green's identities can be regarded as "integration by parts for ∇ and ∆".

Fundamental Theorems of Vector Calculus Slide 636
Green's Identities
Proof. We make use of the relation

    div(u∇v) = u div(∇v) + ⟨∇u, ∇v⟩ = u ∆v + ⟨∇u, ∇v⟩.

Applying Gauß's theorem,

    ∫_R ⟨∇u, ∇v⟩ dx = ∫_R ( div(u∇v) − u ∆v ) dx
                    = ∫_{∂R*} u∇v dA⃗ − ∫_R u ∆v dx
                    = ∫_{∂R*} u ⟨∇v, N⟩ dA − ∫_R u ∆v dx
                    = ∫_{∂R*} u ∂v/∂n dA − ∫_R u ∆v dx.

Fundamental Theorems of Vector Calculus Slide 637
Green's Identities
Proof (continued). This proves the first identity, (20.9). The second identity, (20.10), follows by subtracting the two equations

    ∫_R ⟨∇u, ∇v⟩ dx = −∫_R u ∆v dx + ∫_{∂R*} u ∂v/∂n dA,
    ∫_R ⟨∇v, ∇u⟩ dx = −∫_R v ∆u dx + ∫_{∂R*} v ∂u/∂n dA

from each other.

The Second Derivative Slide 638
21. The Second Derivative

The Second Derivative Slide 639
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

The Second Derivative Slide 640
The Second Derivative
In the next section, we wish to discuss maxima and minima of potential functions on Rn. This requires us to analyze the concept of the second derivative of a function a little more closely.
21.1. Definition. Let X, V be finite-dimensional normed vector spaces and Ω ⊂ X an open set. A function f : Ω → V is said to be twice differentiable at x ∈ Ω if
I f is differentiable in an open ball Bε(x) around x and
I the function Df : Bε(x) → L(X, V) is differentiable at x.
We say that f is twice differentiable on Ω if f is twice differentiable at every x ∈ Ω. The derivative of Df (if it exists) is a map

    D(Df) =: D²f : Ω → L(X, L(X, V)).    (21.1)

We call (21.1) the second derivative of f. If the map x ↦ D²f|x is continuous on Ω, we say that f ∈ C²(Ω, V).

The Second Derivative Slide 641
The Second Derivative for Potential Functions
21.2. Example. Consider a differentiable potential function f : Rn → R. Then the derivative is given by the Jacobian

    Df|x = ( ∂f/∂x1|x, …, ∂f/∂xn|x ).

The second derivative is the derivative of the map Df : Rn → L(Rn, R), x ↦ Df|x. The map Df is of course in general non-linear. The derivative is found by taking

    Df|_{x+h} = Df|x + D²f|x h + o(h)  as h → 0.

Here Df|x and Df|_{x+h} ∈ L(Rn, R) are linear maps from Rn to R, while D²f|x ∈ L(Rn, L(Rn, R)), so that D²f|x h ∈ L(Rn, R).

The Second Derivative Slide 642
The Second Derivative for Potential Functions
Now what shape does D²f|x take? A "column vector" h ∈ Rn is transformed into a linear map in L(Rn, R) ≃ Mat(1 × n, R), which we can regard as the space of "row vectors". Recall that in this case Df|x = (∇f(x))^T and that transposition is a linear map. Then we can write

    Df : x ↦ (∇f(x))^T = Df|x.

Hence, Df = ( · )^T ∘ ∇f and we can differentiate Df by the chain rule. The derivative of the map

    ∇f : Rn → Rn,  ∇f(x) = ( ∂f/∂x1|x, …, ∂f/∂xn|x )^T

can be easily calculated: it is just the Jacobian of ∇f.

The Second Derivative Slide 643
The Hessian
We hence have

    D(∇f)|x = ( ∂²f/∂x1∂x1|x  ···  ∂²f/∂xn∂x1|x ; … ; ∂²f/∂x1∂xn|x  ···  ∂²f/∂xn∂xn|x ) ∈ Mat(n × n, R),    (21.2)

where

    ∂²f/∂xi∂xj := (∂/∂xi)(∂f/∂xj)

is the second partial derivative of f with respect to xj (first) and xi (second). The matrix in (21.2) is important enough to warrant a special name: it is called the Hessian of f and denoted by Hess f(x).

The Second Derivative Slide 644
The Hessian as a Bilinear Map
Recall that transposition is a linear map, so its derivative is again the transposition (see Example 10.5). Hence,

    D²f|x h = D( ( · )^T ∘ ∇f )|x h = D( · )^T|_{∇f(x)} ∘ D(∇f)|x h = ( Hess f(x) h )^T.    (21.3)

As required, D²f|x h is a "row vector", i.e., an element of L(Rn, R). We see that if h̃ ∈ Rn is some other vector, D²f|x h acts on h̃ via

    (D²f|x h) h̃ = ( Hess f(x) h )^T h̃ = ⟨Hess f(x) h, h̃⟩ ∈ R.

Note that the expression ⟨Hess f(x) h, h̃⟩ is linear in both h and h̃; hence we can also regard the second derivative as a bilinear map

    D²f|x : Rn × Rn → R,  (h, h̃) ↦ ⟨Hess f(x) h, h̃⟩.
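A short SymPy sketch (the function f is an arbitrary illustrative choice) makes this concrete: it builds the Hessian matrix (21.2) and lets D²f|x h act on a second vector, exactly as in the display above:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 * x2 + sp.exp(x1*x2)        # illustrative potential function

H = sp.hessian(f, (x1, x2))           # Hess f(x), as in (21.2)
h, ht = sp.Matrix([1, 2]), sp.Matrix([-1, 1])

row = (H * h).T                       # D^2 f|_x h, a "row vector" in L(R^2, R)
print(sp.expand((row * ht)[0, 0]))    # (D^2 f|_x h) h~ = <Hess f(x) h, h~>
```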
The Second Derivative Slide 645
The Second Derivative for General Functions
The preceding extended example already includes all relevant ideas for the general case, which we now discuss. Let X, V be normed vector spaces, Ω ⊂ X open and f : Ω → V a differentiable function. Then the derivative of f is a map

    Df : Ω → L(X, V),  x ↦ Df|x.    (21.4)

The derivative of Df (if it exists) is a map

    D(Df) =: D²f : Ω → L(X, L(X, V)).

We will investigate the space L(X, L(X, V)) a little more closely. Let x1, x2 ∈ X and L ∈ L(X, L(X, V)). Then

    L x1 ∈ L(X, V)  and  (L x1)(x2) ∈ V.

The Second Derivative Slide 646
The Second Derivative as a Bilinear Map
To L ∈ L(X, L(X, V)) we can associate a map L̃ : X × X → V given by

    L̃(x1, x2) := (L x1)(x2).    (21.5)

Obviously, for x1, x2, x2′ ∈ X and λ ∈ F,

    L̃(x1, x2 + x2′) = (L x1)(x2 + x2′) = (L x1)(x2) + (L x1)(x2′) = L̃(x1, x2) + L̃(x1, x2′),
    L̃(x1, λx2) = (L x1)(λx2) = λ (L x1)(x2) = λ L̃(x1, x2),

because L x1 ∈ L(X, V) is linear. Furthermore, since L ∈ L(X, L(X, V)),

    L̃(x1 + x1′, x2) = (L(x1 + x1′))(x2) = (L x1 + L x1′)(x2) = (L x1)(x2) + (L x1′)(x2) = L̃(x1, x2) + L̃(x1′, x2),
    L̃(λx1, x2) = (λ L x1)(x2) = λ (L x1)(x2) = λ L̃(x1, x2).

We thus see that L̃ is a bilinear map, i.e., linear in both components.

The Second Derivative Slide 647
Multilinear Maps
21.3. Definition. Let X, V be finite-dimensional normed vector spaces. The set of multilinear maps from X to V is denoted by

    L^(n)(X, V) := { L : X × · · · × X → V : L linear in each component }.

In the special case V = R an element of L^(n)(X, V) is called a multilinear form.
From the previous discussion we see that there is a canonical isomorphism L(X, L(X, V)) ≅ L^(2)(X, V) given by (21.5). From now on, we will make no distinction between these two spaces, and in fact drop the tilde in (21.5), treating L either as a bilinear map X × X → V or as a linear map X → L(X, V).

The Second Derivative Slide 648
Bilinear Forms on Rn
21.4. Example. Let X = Rn and V = R. Then we have seen that

    L^(2)(Rn × Rn, R) ≅ L(Rn, L(Rn, R)).

We know that L(Rn, R) = (Rn)* ≅ Rn, so we have

    L^(2)(Rn × Rn, R) ≅ L(Rn, Rn) ≅ Mat(n × n, R).    (21.6)

Thus the space of bilinear maps on Rn is isomorphic to the set of square n × n matrices. How does this work in practice? Every linear map L ∈ (Rn)* has the form L = ⟨z, · ⟩ for some z ∈ Rn.

The Second Derivative Slide 649
Bilinear Forms on Rn
We thus interpret an element A ∈ L(Rn, L(Rn, R)) as a linear map that associates

    A : y ↦ L_y := ⟨z_y, · ⟩.    (21.7)

Equivalently, we associate to every y some z_y = A(y); this is realized through a matrix which we also call A:

    A : y ↦ z_y.    (21.8)

Hence, for every y ∈ Rn we obtain a linear map ⟨Ay, · ⟩ ∈ L(Rn, R). Letting this linear map act on some x ∈ Rn we get

    L_y x = ⟨Ay, x⟩ = L(x, y).

Often, one prefers to write ⟨x, Ay⟩ instead of ⟨Ay, x⟩. We thus see that

    Mat(n × n, R) ≅ L^(2)(Rn × Rn, R)  via  A ↔ ⟨ · , A( · )⟩.

The Second Derivative Slide 650
Schwarz's Theorem
If f : Rn → R is twice differentiable, we can represent D²f in the form of a square n × n matrix; this is just the Hessian we introduced in (21.2). However, in general we can not represent the second derivative of a function Rn → Rm as a matrix; furthermore, even in the case of potential functions (m = 1), higher order derivatives can not be expressed as matrices. We now introduce a fundamental result governing the second derivative:
21.5. Schwarz's Theorem. Let X, V be normed vector spaces and Ω ⊂ X an open set. Let f ∈ C²(Ω, V). Then D²f|x ∈ L^(2)(X × X, V) is symmetric for all x ∈ Ω, i.e.,

    D²f|x (u, v) = D²f|x (v, u)  for all u, v ∈ X.

The Second Derivative Slide 651
Schwarz's Theorem
Proof.
Fix x ∈ Ω and choose r > 0 sufficiently small that B_r(x) ⊂ Ω. Choose u, v ∈ X such that ∥u∥, ∥v∥ < r/2. Set g(x) := f(x + v) − f(x). Then by the Mean Value Theorem 11.6 we have

    f(x + v + u) − f(x + u) − f(x + v) + f(x) = g(x + u) − g(x)
        = ∫_0^1 Dg|_{x+tu} u dt
        = ∫_0^1 ( Df|_{x+tu+v} − Df|_{x+tu} ) u dt
        = ∫_0^1 ( ∫_0^1 D²f|_{x+tu+sv} v ds ) u dt.

The Second Derivative Slide 652
Schwarz's Theorem
Proof (continued). Now the continuity of D²f implies that

    D²f|_{x+sv+tu} − D²f|x = o(1)  as ∥u∥ + ∥v∥ → 0

for any 0 ≤ s, t ≤ 1. By Lemma 9.16 (applied with y = (s, t) in the compact set [0, 1] × [0, 1] ⊂ R2) the convergence is even uniform in s and t, i.e.,

    sup_{0≤s,t≤1} ∥D²f|_{x+sv+tu} − D²f|x∥ = o(1)  as ∥u∥ + ∥v∥ → 0,

where we use the operator norm. This implies that

    ∫_0^1 ( ∫_0^1 D²f|_{x+sv+tu} v ds ) u dt = ∫_0^1 ( ∫_0^1 D²f|x v ds ) u dt + ∥u∥∥v∥ o(1)

as ∥u∥ + ∥v∥ → 0.

The Second Derivative Slide 653
Schwarz's Theorem
Proof (continued). Again from the Mean Value Theorem 11.6 we have

    ∫_0^1 ( ∫_0^1 D²f|x v ds ) u dt = ∫_0^1 ∫_0^1 ( D²f|x v ) u ds dt.

If we regard D²f|x as a bilinear map, we have, as ∥u∥ + ∥v∥ → 0,

    g(x + u) − g(x) = ∫_0^1 ∫_0^1 D²f|x (v, u) ds dt + ∥u∥∥v∥ o(1)
                    = D²f|x (v, u) + ∥u∥∥v∥ o(1),    (21.9)

since the integrand does not depend on s or t.

The Second Derivative Slide 654
Schwarz's Theorem
Proof (continued). We may repeat this entire calculation, using g̃(x) := f(x + u) − f(x) instead of g. We then obtain the same result, but with u and v interchanged:

    g̃(x + v) − g̃(x) = D²f|x (u, v) + ∥u∥∥v∥ o(1)    (21.10)

as ∥u∥ + ∥v∥ → 0. Now both (21.9) and (21.10) are equal to f(x + v + u) − f(x + u) − f(x + v) + f(x), so taking the difference yields

    D²f|x (v, u) − D²f|x (u, v) = ∥u∥∥v∥ o(1)

as ∥u∥ + ∥v∥ → 0. Furthermore, we can now use a scaling argument to see that the right-hand side is actually zero. For this, note that the left-hand side may be regarded as a bilinear map L ∈ L^(2)(X × X, V).

The Second Derivative Slide 655
Schwarz's Theorem
Proof (continued). We will show that if L ∈ L^(2)(X × X, V) and L(u, v) = ∥u∥∥v∥ o(1) as ∥u∥ + ∥v∥ → 0, then L = 0. Choose s, t ∈ R \ {0}. Then

    L(u, v) = (1/st) L(su, tv).

So for |s| + |t| → 0 we have

    ∥L(u, v)∥ = (1/|st|) ∥L(su, tv)∥ = o(1) · (1/|st|) ∥su∥∥tv∥ = o(1) ∥u∥∥v∥

as |s| + |t| → 0. Since the left-hand side does not depend on s or t, we see that L(u, v) = 0.

The Second Derivative Slide 656
Symmetry of the Hessian
In the case of potential functions (X = Rn, V = R), Theorem 21.5 implies that

    ⟨Hess f(x) y, z⟩ = ⟨Hess f(x) z, y⟩,  x, y, z ∈ Rn,

which means Hess f(x) = (Hess f(x))^T, i.e., the Hessian of f at x is a symmetric matrix. Writing out the components of Hess f(x), this means that

    ∂²f/∂xi∂xj = ∂²f/∂xj∂xi.

In other words, if f is twice continuously differentiable, the order of differentiation in the second-order partial derivatives does not matter. (This will be the case if all second-order partial derivatives are continuous. Why?) This immediately proves Lemma 18.15.
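This equality of mixed partials is easy to spot-check symbolically; a minimal sketch (the test function is an arbitrary C² choice):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.atan(x*y) + x**2 * sp.sin(y)   # any C^2 function will do

mixed1 = sp.diff(f, x, y)             # differentiate by x first, then y
mixed2 = sp.diff(f, y, x)             # differentiate by y first, then x
print(sp.simplify(mixed1 - mixed2))   # 0
```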
Free Extrema Slide 657
22. Free Extrema

Free Extrema Slide 658
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Free Extrema Slide 659
Extrema of Real Functions
In this section we will focus on extrema of functions. Recall that a twice continuously differentiable real function f : R → R satisfies

    f(x + h) = f(x) + f′(x) h + (f″(x)/2) h² + o(h²)  as h → 0.    (22.1)

The stationary points of f are given by f′(x) = 0 and their nature is determined by the sign of f″(x). We now aim to develop an analogous theory for functions f : Rn → R. Our first goal will be to extend the formula

    f(x + h) = f(x) + Df|x h + o(h)  as h → 0

into an expression analogous to (22.1).

Free Extrema Slide 660
Quadratic Approximation of Potential Functions
22.1. Lemma. Let Ω ⊂ Rn be an open set and f ∈ C²(Ω, R). Then for any h ∈ Rn small enough that x + h ∈ Ω,

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + ∫_0^1 (1 − t) ⟨Hess f(x + th) h, h⟩ dt.    (22.2)

Proof. By the Mean Value Theorem 11.6,

    f(x + h) − f(x) = ∫_0^1 Df|_{x+th} h dt = ∫_0^1 1 · Df|_{x+th} h dt.

We now want to integrate by parts, differentiating Df|_{x+th} h and integrating 1 in the integrand. As a primitive for 1 we can take t + c for any c ∈ R; we choose t − 1.

Free Extrema Slide 661
Quadratic Approximation of Potential Functions
Proof (continued). By the chain rule and (21.3),

    (d/dt) Df|_{x+th} = D²f|_{x+th} (d/dt)(x + th) = ( Hess f(x + th) h )^T.

Hence

    (d/dt) ( Df|_{x+th} h ) = ⟨Hess f(x + th) h, h⟩.

Then

    f(x + h) − f(x) = (t − 1) Df|_{x+th} h |_{t=0}^{1} − ∫_0^1 (t − 1) ⟨Hess f(x + th) h, h⟩ dt
                    = Df|x h + ∫_0^1 (1 − t) ⟨Hess f(x + th) h, h⟩ dt
                    = ⟨∇f(x), h⟩ + ∫_0^1 (1 − t) ⟨Hess f(x + th) h, h⟩ dt.

Free Extrema Slide 662
Quadratic Approximation of Potential Functions
22.2. Corollary. Let Ω ⊂ Rn be an open set and f ∈ C²(Ω, R). Then, as h → 0,

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + (1/2) ⟨Hess f(x) h, h⟩ + o(∥h∥²).    (22.3)

Proof. In view of (22.2), we just need to show that

    ∫_0^1 (1 − t) ⟨Hess f(x + th) h, h⟩ dt = (1/2) ⟨Hess f(x) h, h⟩ + o(∥h∥²)

as h → 0.

Free Extrema Slide 663
Quadratic Approximation of Potential Functions
Proof (continued). We have

    ∫_0^1 (1 − t) ⟨Hess f(x + th) h, h⟩ dt
        = ∫_0^1 (1 − t) ⟨( Hess f(x + th) − Hess f(x) ) h, h⟩ dt + ∫_0^1 (1 − t) ⟨Hess f(x) h, h⟩ dt
        = (1/2) ⟨Hess f(x) h, h⟩ + ∫_0^1 (1 − t) ⟨( Hess f(x + th) − Hess f(x) ) h, h⟩ dt,

so it remains to show

    ∫_0^1 (1 − t) ⟨( Hess f(x + th) − Hess f(x) ) h, h⟩ dt = o(∥h∥²).

Free Extrema Slide 664
Quadratic Approximation of Potential Functions
Proof (continued). We have

    | ∫_0^1 (1 − t) ⟨( Hess f(x + th) − Hess f(x) ) h, h⟩ dt |
        ≤ 1 · sup_{t∈[0,1]} |1 − t| · sup_{t∈[0,1]} |⟨( Hess f(x + th) − Hess f(x) ) h, h⟩|
        ≤ sup_{t∈[0,1]} ∥Hess f(x + th) − Hess f(x)∥ · ∥h∥²,

where we have used the operator norm for the Hessian. In order to show the desired estimate, we need to establish

    lim_{h→0} sup_{t∈[0,1]} ∥Hess f(x + th) − Hess f(x)∥ = 0.

However, this follows from the continuity of the second derivative of f together with Lemma 9.15.

Free Extrema Slide 665
Quadratic Forms
22.3. Definition. Let A ∈ Mat(n × n, R). Then the quadratic form induced by A is defined as the map

    Q_A := ⟨ · , A( · )⟩,  x ↦ ⟨x, Ax⟩ = Σ_{j,k=1}^n a_{jk} x_j x_k,  x ∈ Rn.

Clearly, Q_A(λx) = λ² Q_A(x) for any λ ∈ R. Note also that Q_A is continuous, because it is a polynomial in x1, …, xn.

Free Extrema Slide 666
Quadratic Forms
22.4. Definition. A quadratic form Q_A induced by a matrix A ∈ Mat(n × n, R) is called
I positive definite if Q_A(x) > 0 for all x ≠ 0,
I negative definite if Q_A(x) < 0 for all x ≠ 0,
I indefinite if Q_A(x0) > 0 for some x0 ∈ Rn and Q_A(y0) < 0 for some y0 ∈ Rn.
A matrix A is said to be negative definite / positive definite / indefinite if the induced quadratic form Q_A has the corresponding property.
22.5. Remarks.
I It is easy to see that not all quadratic forms fall into one of the above three categories.
I If A is positive definite, then −A is negative definite.

Free Extrema Slide 667
Quadratic Forms
22.6. Example. The matrix

    A = ( 1  −2 ; 1  1 )

is positive definite, since

    Q_A(x) = ⟨ (x1, x2), A(x1, x2)^T ⟩ = x1(x1 − 2x2) + x2(x1 + x2)
           = x1² + x2² − x1x2 = (1/2) ( x1² + x2² + (x1 − x2)² ).

This expression is strictly positive when x ≠ 0.

Free Extrema Slide 668
Criteria for Definiteness
We will be mainly interested in the case n = 2, and here we can give an explicit criterion:
22.7. Lemma. Let A ∈ Mat(2 × 2, R) be symmetric, i.e.,

    A = ( a  b ; b  c ).

Let ∆ = det A. Then
(i) A positive definite ⇔ a > 0 and ∆ > 0,
(ii) A negative definite ⇔ a < 0 and ∆ > 0,
(iii) A indefinite ⇔ ∆ < 0.
The proof will be part of the assignments.

Free Extrema Slide 669
Criteria for Positive Definiteness
For our analysis of the extrema of a function, the following criteria are essential:
22.8. Lemma. A matrix A ∈ Mat(n × n, R) is

    positive definite ⇔ there exists α > 0 such that Q_A(x) ≥ α∥x∥² for all x ∈ Rn,
    negative definite ⇔ there exists α > 0 such that Q_A(x) ≤ −α∥x∥² for all x ∈ Rn.

Proof. If there exists some α > 0 such that Q_A(x) ≥ α∥x∥² for all x ∈ Rn, it is clear that Q_A(x) > 0 if x ≠ 0, so A is positive definite. Now let A be positive definite. In particular, Q_A(x) > 0 for x ∈ S^{n−1} = {x ∈ Rn : ∥x∥ = 1}. Since S^{n−1} is closed and bounded, it is compact by Theorem 9.9.

Free Extrema Slide 670
Criteria for Definiteness
Proof (continued). By Theorem 9.12, the minimum

    α := min_{x∈S^{n−1}} Q_A(x)

exists and is strictly positive. Thus, for x ≠ 0,

    (1/∥x∥²) Q_A(x) = Q_A( x/∥x∥ ) ≥ α,

and so Q_A(x) ≥ α∥x∥². This is also trivially true for x = 0, so we have proven the statement for positive definite matrices. The matrix A will be negative definite if and only if −A is positive definite. Since Q_{−A}(x) = −Q_A(x), the statement for negative definite matrices follows.

Free Extrema Slide 671
Extrema of Real Functions
We state the obvious definitions:
22.9. Definition. Let Ω ⊂ Rn and f : Ω → R.
(i) f is said to have a (global) maximum at ξ ∈ Ω if x ∈ Ω ⇒ f(x) ≤ f(ξ).
(ii) f is said to have a strict (global) maximum at ξ ∈ Ω if x ∈ Ω \ {ξ} ⇒ f(x) < f(ξ).
The function f is said to have a (strict) global minimum at ξ if −f has a (strict) global maximum at ξ.

Free Extrema Slide 672
Extrema of Real Functions
22.10. Definition. Let Ω ⊂ Rn and f : Ω → R.
(i) f is said to have a local maximum at ξ ∈ Ω if there exists a δ > 0 such that x ∈ Ω ∩ B_δ(ξ) ⇒ f(x) ≤ f(ξ).
(ii) f is said to have a strict local maximum at ξ ∈ Ω if there exists a δ > 0 such that x ∈ Ω ∩ B_δ(ξ) \ {ξ} ⇒ f(x) < f(ξ).
The function f is said to have a (strict) local minimum at ξ if −f has a (strict) local maximum at ξ.
As usual, we will be able to deal with extrema at interior points of Ω using methods based on differentiation, while the boundary points of Ω must be considered separately.

Free Extrema Slide 673
Extrema of Real Functions
22.11. Theorem. Let Ω ⊂ Rn, f : Ω → R and ξ ∈ int Ω. Assume that all partial derivatives of f exist at ξ and that f has a local extremum (maximum or minimum) at ξ. Then

    ∇f(ξ) = 0.

If f is differentiable at ξ, this implies Df|ξ = 0.
Proof. Let ξ = (ξ1, …, ξn). Define g1(x1) := f(x1, ξ2, …, ξn). Then g1 has a local extremum at x1 = ξ1 and

    0 = dg1/dx1 |_{x1=ξ1} = ∂f(x1, ξ2, …, ξn)/∂x1 |_{x1=ξ1} = ∂f/∂x1 |_{x=ξ}.

In the same way we see that all partial derivatives of f vanish at ξ.
Free Extrema Slide 674
Extrema of Real Functions
22.12. Theorem. Let Ω ⊂ Rn be open, f ∈ C²(Ω) and ξ ∈ Ω. Let ∇f(ξ) = 0 (i.e., Df|ξ = 0).
(i) If Hess f|ξ is positive definite, f has a strict local minimum at ξ.
(ii) If Hess f|ξ is negative definite, f has a strict local maximum at ξ.
(iii) If Hess f|ξ is indefinite, f has no extremum at ξ.
Proof. Denote by Q the quadratic form induced by Hess f|ξ. Since Df|ξ = 0, we see from (22.3) that

    ( f(ξ + h) − f(ξ) ) / ∥h∥² = (1/2) Q( h/∥h∥ ) + ρ(h)  with  lim_{h→0} ρ(h) = 0.    (22.4)

Now let Hess f|ξ be positive definite. By Lemma 22.8 we can find α > 0 such that Q(h/∥h∥) ≥ α for all h ≠ 0.

Free Extrema Slide 675
Extrema of Real Functions
Proof (continued). For this α, we can find a δ > 0 such that
1. |ρ(h)| < α/2 whenever ∥h∥ < δ and
2. B_δ(ξ) ⊂ Ω.
Now every x ∈ B_δ(ξ) \ {ξ} can be expressed as x = ξ + h with h := x − ξ ≠ 0, so we have

    f(x) − f(ξ) = f(ξ + h) − f(ξ) = ∥h∥² ( (1/2) Q(h/∥h∥) + ρ(h) ) ≥ ∥h∥² ( α/2 − |ρ(h)| ) > 0

for all x ∈ B_δ(ξ) \ {ξ}. Thus f(ξ) is a strict local minimum. In the same way one sees that f(ξ) is a strict local maximum if Hess f|ξ is negative definite.

Free Extrema Slide 676
Extrema of Real Functions
Proof (continued). Now assume that Hess f|ξ is indefinite. Then there exist h0, k0 ∈ Rn such that

    Q(h0) > 0  and  Q(k0) < 0.

For all λ ≠ 0 we then have

    Q( λh0 / ∥λh0∥ ) = ( λ² / (λ² ∥h0∥²) ) Q(h0) = Q(h0)/∥h0∥² =: ε1 > 0,

and similarly

    Q( λk0 / ∥λk0∥ ) = Q(k0)/∥k0∥² =: −ε2 < 0.

Then we see from (22.4) that for sufficiently small λ ≠ 0 we have f(ξ + λh0) > f(ξ) and f(ξ + λk0) < f(ξ), so f does not have a local extremum at ξ.

Free Extrema Slide 677
Extrema of Real Functions
Applying Lemma 22.7, we have the following result:
22.13. Corollary. Let Ω ⊂ R² be open, f ∈ C²(Ω) and ξ ∈ Ω with ∇f(ξ) = 0. Set

    ∆ := det Hess f|ξ = ( ∂²f/∂x1²|ξ )( ∂²f/∂x2²|ξ ) − ( ∂²f/∂x1∂x2|ξ )².

Then f(ξ) is
I a local minimum if ∂²f/∂x1²|ξ > 0 and ∆ > 0,
I a local maximum if ∂²f/∂x1²|ξ < 0 and ∆ > 0,
I no extremum if ∆ < 0.
Note that if ∆ = 0, Corollary 22.13 yields no information.

Free Extrema Slide 678
Finding Extrema
In searching for extrema of functions f ∈ C²(Ω, R), we follow a four-step process:
1. Check for critical points ξ ∈ int Ω, i.e., those where Df|ξ = 0.
2. Use Theorem 22.12 or Corollary 22.13 to check which of the critical points is an extremum.
3. Check the boundary ∂Ω separately for local extrema.
4. Identify the global extrema. Any finite global extremum must also be a local extremum, so it will be included among those found above.
22.14. Example. Let f(x, y) = x³ + y³ − 3xy on R². Then ∇f = 0 gives

    ∂f/∂x = 3x² − 3y = 0,  ∂f/∂y = 3y² − 3x = 0,

or x² = y and y² = x. The only two solutions to these equations are (0, 0) and (1, 1).

Free Extrema Slide 679
Finding Extrema
We have

    ∆(x, y) = f_xx f_yy − f_xy² = (6x)(6y) − (−3)² = 36xy − 9.

At (0, 0), ∆ < 0, so there is no extremum at this point. At (1, 1), ∆ > 0 and f_xx = 6 > 0, so this point corresponds to a local minimum. Since Ω = R² is open, there are no other extrema.
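The first two steps of this process are easy to automate symbolically. A SymPy sketch for Example 22.14 (solver usage as in standard SymPy; the real-solution filter is a precaution):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 + y**3 - 3*x*y

grad = [sp.diff(f, v) for v in (x, y)]
critical = [s for s in sp.solve(grad, (x, y), dict=True)
            if all(c.is_real for c in s.values())]   # keep (0,0) and (1,1)

H = sp.hessian(f, (x, y))
for pt in critical:
    Hp = H.subs(pt)
    print(pt, Hp.det(), Hp[0, 0])
# (0,0): det = -9 < 0            -> no extremum (saddle)
# (1,1): det = 27 > 0, f_xx = 6  -> strict local minimum
```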
Taylor Series Slide 680
23. Taylor Series

Taylor Series Slide 681
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series

Taylor Series Slide 682
Higher-Order Derivatives
In this section we will suppose (X, ∥·∥_X) and (V, ∥·∥_V) to be normed vector spaces, Ω ⊂ X an open set, and we will consider functions f : Ω → V. We may extend the strategy of Definition 21.1 to define derivatives of higher than second order inductively by setting

    D^k f|x = D(D^{k−1} f)|x ∈ L^(k)(X × · · · × X, V)    (k factors of X)

for k = 2, 3, 4, …. Here, we again identify

    L^(k)(X × · · · × X, V) ≅ L(X, L^(k−1)(X × · · · × X, V)).

We denote by C^k(Ω, V) the set of those functions whose kth derivative is continuous and by C^∞(Ω, V) the intersection of all sets C^k(Ω, V), k ∈ N.

Taylor Series Slide 683
Multi-Index Notation
For maps f ∈ C^k(Rn, R) the following multi-index notation for partial derivatives has been developed. This notation depends essentially on the interchangeability of partial derivatives. The tuple α = (α1, …, αn) ∈ N^n is called a multi-index of degree |α| = α1 + · · · + αn. We also define

    α! := α1! α2! · · · αn!.

For f ∈ C^k(Rn, R) we define

    ∂^α f := ∂^|α| f / ( ∂x1^{α1} ∂x2^{α2} · · · ∂xn^{αn} ).

For x = (x1, …, xn) ∈ Rn we define

    x^α := x1^{α1} x2^{α2} · · · xn^{αn}.

Taylor Series Slide 684
Multi-Index Notation
In particular, if we have a potential function f ∈ C^k(Rn, R) and we let the kth derivative act on k copies of a vector u ∈ Rn, then

    D^k f|x (u, …, u) = Σ_{α∈N^n, |α|=k} (k!/α!) ∂^α f(x) u^α.    (23.1)

(Prove this by induction!) This relation will be important in the formulation of Taylor's Theorem for potential functions. For simplicity, we define the notation

    u^(k) := (u, …, u)  (k times)    (23.2)

for u ∈ X.

Taylor Series Slide 685
Taylor's Theorem
23.1. Taylor's Theorem. Suppose that f ∈ C^k(Ω, V). Let x ∈ Ω and h ∈ X be such that the line γ(t) = x + th, 0 ≤ t ≤ 1, is wholly contained within Ω. Denote by h^(k) a k-tuple as in (23.2). Then for all p ≤ k,

    f(x + h) = f(x) + (1/1!) Df|x h + · · · + (1/(p−1)!) D^{p−1} f|x h^{(p−1)} + R_p(x, h)    (23.3)

with the remainder term

    R_p(x, h) = ∫_0^1 ( (1 − t)^{p−1} / (p − 1)! ) D^p f|_{x+th} h^(p) dt.

Taylor Series Slide 686
Taylor's Theorem
Proof. We prove the theorem by induction in p. For p = 1 the theorem is just the Mean Value Theorem 11.6, so there is nothing to prove. In order to prove that the formula for p implies the corresponding formula for p + 1, we need to show

    R_p(x, h) = (1/p!) D^p f|x h^(p) + R_{p+1}(x, h).

Integrating by parts, we have

    R_p(x, h) = ∫_0^1 ( (1 − t)^{p−1} / (p − 1)! ) D^p f|_{x+th} h^(p) dt
              = −( (1 − t)^p / p! ) D^p f|_{x+th} h^(p) |_{t=0}^{1} + ∫_0^1 ( (1 − t)^p / p! ) (d/dt) D^p f|_{x+th} h^(p) dt.

Taylor Series Slide 687
Taylor's Theorem
Proof (continued). By the chain rule, we have

    (d/dt) D^p f|_{x+th} h^(p) = D( D^p f|z h^(p) )|_{z=x+th} h = D^{p+1} f|_{x+th} h^{(p+1)},

so

    R_p(x, h) = −( (1 − t)^p / p! ) D^p f|_{x+th} h^(p) |_{t=0}^{1} + ∫_0^1 ( (1 − t)^p / p! ) D^{p+1} f|_{x+th} h^{(p+1)} dt
              = (1/p!) D^p f|x h^(p) + R_{p+1}(x, h).

Taylor Series Slide 688
Taylor's Theorem
We will call

    t_{f,x,p}(h) := f(x + h) − R_p(x, h)
                  = f(x) + (1/1!) Df|x h + · · · + (1/(p−1)!) D^{p−1} f|x h^{(p−1)}    (23.4)

the Taylor polynomial of degree p − 1 of f at x in the variable h. Frequently we will set x = x0 and h = x.
23.2. Remark. If X = Rn, V = R, we see from (23.1) and (23.4) that

    f(x0 + x) − R_p(x0, x) = Σ_{k=0}^{p−1} (1/k!) D^k f|_{x0} x^(k)
                           = Σ_{k=0}^{p−1} (1/k!) Σ_{α∈N^n, |α|=k} (k!/α!) ∂^α f(x0) x^α
                           = Σ_{|α|=0}^{p−1} (1/α!) ∂^α f(x0) x^α.

Taylor Series Slide 689
Taylor's Theorem
Another way of expressing the Taylor expansion in Rn is to write

    f(x0 + x) = Σ_{|α|=0}^{p−1} (1/α!) ∂^α f(x0) x^α + O(x^p),    (23.5)

where O(x^p) is understood to refer to any combination of monomials x^α with |α| = p.
23.3. Example.
The Taylor polynomial of degree 2 of the function f(x1, x2) = cos(x1 + 2x2) at x0 ∈ R² is given by

    t_{f,x0,3}(x1, x2) := f(x0 + x) − R3
        = (1/(0,0)!) ∂^{(0,0)} f(x0) x^{(0,0)} + (1/(1,0)!) ∂^{(1,0)} f(x0) x^{(1,0)} + (1/(0,1)!) ∂^{(0,1)} f(x0) x^{(0,1)}
        + (1/(2,0)!) ∂^{(2,0)} f(x0) x^{(2,0)} + (1/(1,1)!) ∂^{(1,1)} f(x0) x^{(1,1)} + (1/(0,2)!) ∂^{(0,2)} f(x0) x^{(0,2)}.

Taylor Series Slide 690
Taylor's Theorem
To find the Taylor polynomial at x0 = 0 we have

    t_{f,0,3}(x1, x2)
        = (1/(0!0!)) f(0) x1⁰x2⁰ + (1/(1!0!)) ∂_{x1} f|_{x0=0} x1¹x2⁰ + (1/(0!1!)) ∂_{x2} f|_{x0=0} x1⁰x2¹
        + (1/(2!0!)) ∂²_{x1} f|_{x0=0} x1²x2⁰ + (1/(1!1!)) ∂²_{x1x2} f|_{x0=0} x1¹x2¹ + (1/(0!2!)) ∂²_{x2} f|_{x0=0} x1⁰x2²
        = f(0, 0) + ∂_{x1} f|_0 x1 + ∂_{x2} f|_0 x2 + (1/2) ∂²_{x1} f|_0 x1² + ∂²_{x1x2} f|_0 x1x2 + (1/2) ∂²_{x2} f|_0 x2²
        = 1 − (1/2) x1² − 2 x1x2 − 2 x2²
        = cos(x1 + 2x2) − R3.

Taylor Series Slide 691
Taylor's Theorem
We could also have obtained this result in an easier way by using the Taylor formula for functions of a single variable:

    cos(x1 + 2x2) = 1 − (1/2)(x1 + 2x2)² + O(x⁴) = 1 − (1/2) x1² − 2 x1x2 − 2 x2² + O(x⁴).

In cases where this (quick) strategy will not easily work, the full formula (23.5) needs to be evaluated.
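The result is also easy to check by machine. A SymPy sketch (the auxiliary scaling variable t is a standard trick for collecting all terms of total degree ≤ 2, not part of the lecture notation):

```python
import sympy as sp

x1, x2, t = sp.symbols('x1 x2 t')
f = sp.cos(x1 + 2*x2)

# Expand g(t) = f(t*x) to second order in t, then set t = 1: this collects
# exactly the terms of (23.5) with |alpha| <= 2.
g = f.subs({x1: t*x1, x2: t*x2})
taylor2 = sp.series(g, t, 0, 3).removeO().subs(t, 1)
print(sp.expand(taylor2))   # 1 - x1**2/2 - 2*x1*x2 - 2*x2**2
```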