Slide 1
Honors Mathematics III
Linear Algebra and Functions of Multiple Variables
Horst Hohberger
University of Michigan - Shanghai Jiao Tong University
Joint Institute
Summer Term 2022
Overview
Slide 2
Welcome to MATH285!
▶ Please read the Course Description, which has been uploaded to the Files section on the Canvas course site.
▶ My office is Room 441c in the Longbin Building. Feel free to drop in during my office hours (announced on Canvas) or just whenever you find me there.
▶ You can also make an appointment by email or write to me with any questions. My email is horst@sjtu.edu.cn.
▶ The Teaching Assistants for this course will provide recitation classes, office hours, and help with grading.
Overview
Slide 3
Remote Video Client: Zhumu
After Zoom decided to discontinue direct activities in China, it licensed its
software to several local companies. One of these companies is Zhumu.
Please download an “international” Zhumu client here:
https://zhumu.com/download-intl
(Note that this is a different client from the one that is offered by default
on the main page.)
Please create an account using your SJTU email address and make sure that your alias is visible in Roman transliteration.
Links for joining our classes by video will be published on Canvas. You are required to keep these links confidential and not to share them with other JI students or anyone else.
Our course will not use Feishu.
Overview
Slide 4
Office Hours / Piazza
In addition to being available in office hours, the TAs and I will be answering course-related questions on Piazza. Please create an account there as well, such that your name in pinyin is visible.
It is possible to send private messages on Piazza, but most messages should be public so that everyone can see the questions and the responses, or reply themselves. Feel free to answer other students' questions!
Please do not post anonymously unless you have a good reason. Don’t be
shy!
Please post messages in English only.
Here is the sign-up link:
piazza.com/sjtu.org/summer2022/math285
Overview
Slide 5
Mathematica
JI has obtained an unlimited student license for the computer algebra software Mathematica, developed by Wolfram Research.
You will be required to make use of this or similar software in your homework assignments and examinations, so you should obtain a copy.
Please see the Course Description for details on the download procedure.
Overview
Slide 6
Course Outcomes
The Course Description defines a set of “Course Outcomes.” These are a
sampling of minimal skills that you should obtain in the process of taking
this course.
The list is of course not exhaustive (you should actually learn much more
than what is given there). Nevertheless, it represents an indication of
whether the course successfully conveyed a selection of concepts.
Whether the outcomes are attained is evaluated in two ways:
▶ Subjectively: You will be asked your opinion on how well you think you have mastered each given outcome in the Course Evaluation at the end of the term.
▶ Objectively: The course will include a set of online quizzes on Canvas that you can take in your own time without the extreme pressure found in exams. Each quiz will evaluate one of the course outcomes. The quizzes will contribute to the course grade.
Overview
Slide 7
Term Projects
You will be asked to complete a Term Project within the scope of the
present course. These projects will be assigned to randomly determined
teams of 4-5 students each.
The goal of the term projects is to conduct a deeper investigation into
specific topics and applications in physics, mathematics, and engineering.
More details will be announced on Canvas.
In general, the teams for both projects will be the same. However, on
request, certain teams may be re-arranged for the second project.
In principle, all members of a given team will receive a joint grade.
However, there may be an opportunity to evaluate the individual work
performed by the team members and the grade may be adjusted based on
this evaluation. More details will be given in the project descriptions.
Overview
Slide 8
Coursework
There will be weekly coursework throughout the term.
You will be randomly assigned into assignment groups of three students;
you are expected to collaborate within each group and hand in a single,
common solution paper to each coursework.
Each student must achieve 60% of the total coursework points by the end of the term in order to obtain a passing grade for the course. Beyond this requirement, however, the assignment points have no effect on the course grade.
Each member of an assignment group will receive the same number of
points for each submission. However, there will be an opportunity for team members to anonymously evaluate each other's contributions to the assignments. In cases where one or more group members consistently do not contribute a commensurate share of the work, individual group members may lose some or all of their marks.
Overview
Slide 9
Course Topics and Examinations
Our course will be split, broadly, into three parts:
1. Linear Algebra
2. Differential and Integral Calculus in $\mathbb{R}^n$
3. Vector Fields and Higher-Order Derivatives
There will be two midterm exams and a final exam, aligned with these three parts.
There will be frequent references to results and theorems from the previous
term; to allow cross-referencing, a current version of last term’s lecture
slides will be placed on the Canvas site. Theorems referenced from last term's lectures will be prefixed by "186," e.g., 186 Theorem 1.2.1 refers to Theorem 1.2.1 in last term's lecture slides.
Overview
Slide 10
Grading Policy
Please find the grade components in the Course Description on Canvas.
The course will be graded on a letter scale, with a certain number of
points corresponding to a letter grade.
The grading scale will usually be based on the top approximately 6-12% of
students receiving a grade of A+, with the following grades determined by
(mostly) fixed point increments.
Apart from this normalization, the grade distribution is up to you! If (for
example) all students obtain many points in the exams, I am happy to see
everyone receive a grade of A. Students are primarily evaluated with
respect to a fixed point scale, not with respect to each other.
Overview
Slide 11
More Info: Syllabus (a.k.a. Course Description)
Overview
Course Topics: Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 12
Overview
Slide 13
Course Topics: Differential and Integral Calculus
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Overview
Slide 14
Course Topics: Vector Fields and Higher-Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Slide 15
Part 1: Linear Algebra
Slide 16
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Systems of Linear Equations
Slide 17
1. Systems of Linear Equations
Systems of Linear Equations
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 18
Systems of Linear Equations
Slide 19
Linear Systems of Equations
Throughout this course, we use the letter V to denote a (real or complex)
vector space. Whenever necessary, we use the letter F to stand for either R
or C, depending on the context. (F = R in the context of a real vector
space, F = C in the context of a complex vector space.)
A linear system of m (algebraic) equations in n unknowns $x_1, \dots, x_n \in V$ is a set of equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \tag{1.1}$$
where $b_1, \dots, b_m \in V$ and $a_{ij} \in \mathbb{F}$, $i = 1, \dots, m$, $j = 1, \dots, n$.
If $b_1 = b_2 = \cdots = b_m = 0$, then (1.1) is called a homogeneous system. Otherwise, it is called an inhomogeneous system.
Systems of Linear Equations
Slide 20
Linear Systems of Equations
1.1. Examples.
1. This is an inhomogeneous system of equations in R:
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ x_1 - 2x_2 &= 2\\ 10x_2 + x_3 &= 1 \end{aligned}$$
2. This is a homogeneous system of equations in R:
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 0\\ x_1 - 2x_2 &= 0\\ 4x_1 + 7x_2 - 3x_3 &= 0 \end{aligned}$$
3. This is an inhomogeneous system of equations in $\mathbb{R}^2$:
$$2x_1 + x_2 = \begin{pmatrix} 2\\ 1 \end{pmatrix}, \qquad x_1 - x_2 = \begin{pmatrix} 0\\ 1 \end{pmatrix}$$
Systems of Linear Equations
Slide 21
Linear Systems of Equations
In these examples, the number m of equations is equal to the number n of variables. This is of course not always the case. If m < n we say that the system is underdetermined; if m > n it is called overdetermined.
A solution of a linear system of equations (1.1) is a tuple of elements $(y_1, \dots, y_n) \in V^n$ such that the predicate (1.1) becomes a true statement.
We will prove later that an inhomogeneous system of equations may have either
▶ a unique solution or
▶ no solution or
▶ an infinite number of solutions.
A homogeneous system evidently always has the trivial solution $x_1 = x_2 = \cdots = x_n = 0$. It further either has
▶ no non-trivial solution or
▶ an infinite number of non-trivial solutions.
Systems of Linear Equations
Slide 22
Solving Linear Systems
We will later discuss the theory of existence and uniqueness of solutions for
linear systems of equations more extensively. For now, we want to discuss a
practical method for actually finding solutions.
In school, you have probably learned that there are some basic strategies
for solving systems of equations:
▶ solving one of the equations for a variable, and then substituting into the other equations, thereby reducing the number of variables;
▶ manipulating two equations until they have identical expressions on one side, then setting them equal;
▶ adding and subtracting multiples of one equation to another equation.
Perhaps you have encountered other strategies, but we will look at the last of the three given here. We want to develop a method of systematically solving systems of equations. If we employ a good strategy of adding equations to each other, we will be able to determine the unknowns efficiently and systematically.
Systems of Linear Equations
Slide 23
Solving Linear Systems
1.2. Example. Consider the system
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ x_1 - 2x_2 &= 2\\ 10x_2 + x_3 &= 1 \end{aligned}$$
Let us subtract the first equation from the second equation:
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 10x_2 + x_3 &= 1 \end{aligned}$$
Next, we add twice the second equation to the third equation:
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 3x_3 &= 3 \end{aligned}$$
Systems of Linear Equations
Slide 24
Solving Linear Systems
We read off from
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 3x_3 &= 3 \end{aligned}$$
(starting from the last equation and proceeding upwards) that $x_3 = 1$, $x_2 = 0$ and $x_1 = 2$.
Instead of reading off the solution, we could have proceeded more systematically: we divide the last equation by three and the second equation by −5:
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ x_2 - \tfrac{1}{5}x_3 &= -\tfrac{1}{5}\\ x_3 &= 1 \end{aligned}$$
Systems of Linear Equations
Slide 25
Solving Linear Systems
We then add 1/5 times the last equation to the second equation, and the simple last equation to the first equation:
$$\begin{aligned} x_1 + 3x_2 &= 2\\ x_2 &= 0\\ x_3 &= 1 \end{aligned}$$
Lastly, we subtract thrice the second equation from the first equation:
$$\begin{aligned} x_1 &= 2\\ x_2 &= 0\\ x_3 &= 1 \end{aligned}$$
This gives us the solution directly.
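As a quick check (using Mathematica, which is introduced later in this chapter), the built-in RowReduce command applied to the augmented array of Example 1.2 reproduces this solution directly:
A = {{1, 3, -1, 1}, {1, -2, 0, 2}, {0, 10, 1, 1}};
RowReduce[A]
{{1, 0, 0, 2}, {0, 1, 0, 0}, {0, 0, 1, 1}}
The last column gives x1 = 2, x2 = 0, x3 = 1.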
Systems of Linear Equations
Slide 26
Equivalence of Linear Systems
By adding and subtracting one equation from another, we have been
effectively changing the system of equations. Formally, it may be useful to
understand the validity of this procedure using the notion of equivalence.
We say that two systems of linear equations are equivalent if any solution
of the first system is also a solution of the second system and vice-versa.
Thus the systems
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 10x_2 + x_3 &= 1 \end{aligned}
\qquad\text{and}\qquad
\begin{aligned} x_1 &= 2\\ x_2 &= 0\\ x_3 &= 1 \end{aligned}$$
are equivalent.
Systems of Linear Equations
Slide 27
Simplifying Notation
Listing the variables and the equality sign is essentially a waste of space. Instead of saying that we transform
$$\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 10x_2 + x_3 &= 1 \end{aligned}
\qquad\text{to}\qquad
\begin{aligned} x_1 + 3x_2 - x_3 &= 1\\ -5x_2 + x_3 &= 1\\ 3x_3 &= 3 \end{aligned}$$
by adding twice the second equation to the third equation, it would be more efficient to write
$$\left(\begin{array}{ccc|c} 1 & 3 & -1 & 1\\ 0 & -5 & 1 & 1\\ 0 & 10 & 1 & 1 \end{array}\right)
\begin{matrix} \\ \cdot\, 2\\ \leftarrow + \end{matrix}
\;\sim\;
\left(\begin{array}{ccc|c} 1 & 3 & -1 & 1\\ 0 & -5 & 1 & 1\\ 0 & 0 & 3 & 3 \end{array}\right)$$
We will use this notation forthwith.
Systems of Linear Equations
Slide 28
The Gauß – Jordan Algorithm
The goal of the Gauß-Jordan algorithm (also called Gaußian elimination) is to transform a system
$$\left(\begin{array}{ccc|c} * & * & * & \diamond\\ * & * & * & \diamond\\ * & * & * & \diamond \end{array}\right), \qquad * \in \mathbb{R} \text{ or } \mathbb{C}, \quad \diamond \in V,$$
first into the form
$$\left(\begin{array}{ccc|c} 1 & * & * & \diamond\\ 0 & 1 & * & \diamond\\ 0 & 0 & 1 & \diamond \end{array}\right) \tag{1.2}$$
and subsequently into
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \diamond\\ 0 & 1 & 0 & \diamond\\ 0 & 0 & 1 & \diamond \end{array}\right). \tag{1.3}$$
(Ideally; it may not always be possible to achieve the form (1.2).)
Systems of Linear Equations
Slide 29
The Gauß – Jordan Algorithm
We are allowed to achieve this using elementary row manipulations. These are
1. swapping (interchanging) two rows,
2. multiplying each element in a row by a (nonzero) number,
3. adding a multiple of one row to another row.
Of course, each "row" represents an equation, so we are simply manipulating equations. It is obvious that these manipulations will transform a system into an equivalent system.
A system in the form (1.2) is said to be in upper triangular form; a system in the form (1.3) is said to be in diagonal form.
The procedure for transforming a system into upper triangular form is called forward elimination; the subsequent procedure for achieving diagonal form is called backward substitution.
Systems of Linear Equations
Slide 30
The Gauß – Jordan Algorithm
1.3. Example. Consider the system
$$2x_1 + x_2 + x_3 = \begin{pmatrix} 2\\ 1 \end{pmatrix}, \qquad
x_1 - x_2 = \begin{pmatrix} 0\\ 1 \end{pmatrix}, \qquad
x_1 + x_3 = \begin{pmatrix} 1\\ 1 \end{pmatrix}.$$
We rewrite this as
$$\left(\begin{array}{ccc|c} 2 & 1 & 1 & \binom{2}{1}\\ 1 & -1 & 0 & \binom{0}{1}\\ 1 & 0 & 1 & \binom{1}{1} \end{array}\right) \tag{1.4}$$
We now proceed with forward elimination to achieve upper triangular form.
Systems of Linear Equations
Slide 31
The Gauß – Jordan Algorithm (Forward Elimination)
Step 1a: Ensure that the top left-hand element is equal to 1, here by swapping the first two rows:
$$\left(\begin{array}{ccc|c} 2 & 1 & 1 & \binom{2}{1}\\ 1 & -1 & 0 & \binom{0}{1}\\ 1 & 0 & 1 & \binom{1}{1} \end{array}\right)
\;\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 2 & 1 & 1 & \binom{2}{1}\\ 1 & 0 & 1 & \binom{1}{1} \end{array}\right)$$
Step 1b: Eliminate (transform to zero) all lower entries in the first column, subtracting twice the first row from the second row and the first row from the third row:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 0 & 3 & 1 & \binom{2}{-1}\\ 0 & 1 & 1 & \binom{1}{0} \end{array}\right)$$
Systems of Linear Equations
Slide 32
The Gauß – Jordan Algorithm (Forward Elimination)
Step 2a: Ensure that the entry in the second row and second column is equal to 1, here by swapping the second and third rows:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 0 & 1 & 1 & \binom{1}{0}\\ 0 & 3 & 1 & \binom{2}{-1} \end{array}\right)$$
Step 2b: Eliminate (transform to zero) all entries in the second column below the second row, subtracting three times the second row from the third row:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 0 & 1 & 1 & \binom{1}{0}\\ 0 & 0 & -2 & \binom{-1}{-1} \end{array}\right)$$
Systems of Linear Equations
Slide 33
The Gauß – Jordan Algorithm (Forward Elimination)
Step 3: Ensure that the entry in the third row and third column is equal to 1, dividing the third row by −2:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 0 & 1 & 1 & \binom{1}{0}\\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
The system now has upper triangular form. We next commence the backward substitution.
Systems of Linear Equations
Slide 34
The Gauß – Jordan Algorithm (Backward Substitution)
Step 1: Eliminate all entries in the third column above the third row, subtracting the third row from the second row:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & -1 & 0 & \binom{0}{1}\\ 0 & 1 & 0 & \binom{1/2}{-1/2}\\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
Step 2: Eliminate all entries in the second column above the second row, adding the second row to the first row:
$$\sim\;
\left(\begin{array}{ccc|c} 1 & 0 & 0 & \binom{1/2}{1/2}\\ 0 & 1 & 0 & \binom{1/2}{-1/2}\\ 0 & 0 & 1 & \binom{1/2}{1/2} \end{array}\right)$$
Our system now has diagonal form, and we may directly read off the solution.
Systems of Linear Equations
Slide 35
The Gauß – Jordan Algorithm
We see that the system
$$2x_1 + x_2 + x_3 = \begin{pmatrix} 2\\ 1 \end{pmatrix}, \qquad
x_1 - x_2 = \begin{pmatrix} 0\\ 1 \end{pmatrix}, \qquad
x_1 + x_3 = \begin{pmatrix} 1\\ 1 \end{pmatrix}$$
is solved by
$$x_1 = \begin{pmatrix} 1/2\\ 1/2 \end{pmatrix}, \qquad
x_2 = \begin{pmatrix} 1/2\\ -1/2 \end{pmatrix}, \qquad
x_3 = \begin{pmatrix} 1/2\\ 1/2 \end{pmatrix}.$$
We notice that instead of solving a single system in $\mathbb{R}^2$, we could have solved two systems in R, determining the components of $x_1, x_2, x_3$ separately from
$$2x_{11} + x_{21} + x_{31} = 2, \qquad x_{11} - x_{21} = 0, \qquad x_{11} + x_{31} = 1$$
and
$$2x_{12} + x_{22} + x_{32} = 1, \qquad x_{12} - x_{22} = 1, \qquad x_{12} + x_{32} = 1.$$
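Each of the two scalar systems can be checked in Mathematica (introduced on the following slides); LinearSolve is a built-in command, and the two right-hand sides below are those of the two component systems:
M = {{2, 1, 1}, {1, -1, 0}, {1, 0, 1}};
LinearSolve[M, {2, 0, 1}]
{1/2, 1/2, 1/2}
LinearSolve[M, {1, 1, 1}]
{1/2, -1/2, 1/2}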
Systems of Linear Equations
Slide 36
Existence and Uniqueness of Solutions
1.4. Remark. A system of m equations with n unknowns will have a unique solution if and only if it is diagonalizable, i.e., if it can be transformed into diagonal form (1.3). Since backward substitution will always work, we see that a unique solution exists if and only if the system can be transformed into an upper triangular form, such as
$$\left(\begin{array}{ccc|c} 1 & * & * & \diamond\\ 0 & 1 & * & \diamond\\ 0 & 0 & 1 & \diamond \end{array}\right) \quad (m = n = 3)
\qquad\text{or}\qquad
\left(\begin{array}{ccc|c} 1 & * & * & \diamond\\ 0 & 1 & * & \diamond\\ 0 & 0 & 1 & \diamond\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{array}\right) \quad (m = 5,\; n = 3)$$
Systems of Linear Equations
Slide 37
Existence and Uniqueness of Solutions
Thus m ≥ n is a necessary condition for the existence of a unique solution.
A system has no solution if one of the rows has the form
$$\left(\begin{array}{cccc|c} 0 & \dots & 0 & 0 & \diamond \end{array}\right), \qquad \diamond \neq 0,$$
which represents the false statement $0 = \diamond$.
If a system has more than one solution, it can be transformed into a so-called echelon form, e.g.,
$$\left(\begin{array}{ccccc|c} 1 & * & * & * & * & \diamond\\ 0 & 1 & * & * & * & \diamond\\ 0 & 0 & 0 & 1 & * & \diamond\\ 0 & 0 & 0 & 0 & 1 & \diamond \end{array}\right) \tag{1.5}$$
In this case, one of the unknowns acts as a parameter.
Systems of Linear Equations
Slide 38
The Solution Set
1.5. Definition. The solution set S of a system of equations (1.1) is the set of all n-tuples of numbers $(x_1, \dots, x_n)$ that satisfy (1.1).
▶ If a linear system has a unique solution, the set S contains a single point.
▶ If there is no solution, S = ∅.
▶ If there is more than one solution, S is an infinite set.
1.6. Example. Consider the real system given by
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 0,\\ 4x_1 + 5x_2 + 6x_3 &= 0,\\ 7x_1 + 8x_2 + 9x_3 &= 0. \end{aligned}$$
Systems of Linear Equations
Slide 39
A Homogeneous System
Applying our algorithm, we subtract 4 times the first row from the second row and 7 times the first row from the third row:
$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 0\\ 4 & 5 & 6 & 0\\ 7 & 8 & 9 & 0 \end{array}\right)
\;\sim\;
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 0\\ 0 & -3 & -6 & 0\\ 0 & -6 & -12 & 0 \end{array}\right)$$
Dividing the second row by −3, adding 6 times the new second row to the third row, and finally subtracting twice the second row from the first row, we obtain
$$\sim\;
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 0\\ 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 0 \end{array}\right)
\;\sim\;
\left(\begin{array}{ccc|c} 1 & 0 & -1 & 0\\ 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 0 \end{array}\right)$$
Writing this system out explicitly,
$$x_1 = x_3, \qquad x_2 = -2x_3,$$
where $x_3 \in \mathbb{R}$ is arbitrary. It is often convenient to introduce a parameter:
$$x_1 = \alpha, \qquad x_2 = -2\alpha, \qquad x_3 = \alpha, \qquad \alpha \in \mathbb{R}.$$
Systems of Linear Equations
Slide 40
A Homogeneous System
In vector notation, the solution is
$$x = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} \alpha\\ -2\alpha\\ \alpha \end{pmatrix} = \alpha \begin{pmatrix} 1\\ -2\\ 1 \end{pmatrix}, \qquad \alpha \in \mathbb{R}.$$
The solution set is
$$S = \left\{ x = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} \in \mathbb{R}^3 : x = \alpha \cdot \begin{pmatrix} 1\\ -2\\ 1 \end{pmatrix},\; \alpha \in \mathbb{R} \right\}.$$
Geometrically, S corresponds to a straight line through the origin. We will return to discuss the geometric properties of solutions to systems of equations (they turn out to be affine spaces) later.
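In Mathematica, the built-in NullSpace command returns vectors spanning the solution set of a homogeneous system; for this example it reproduces the line found above:
NullSpace[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]
{{1, -2, 1}}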
Systems of Linear Equations
Slide 41
Gauß – Jordan with Mathematica
1.7. Example. Consider the system of equations
$$\begin{aligned} x_1 - 2x_2 + 3x_3 + 4x_4 &= 2,\\ x_1 - 2x_2 + 5x_3 + 5x_4 &= 3,\\ -x_1 + 2x_2 - x_3 - 4x_4 &= 2. \end{aligned}$$
In our array notation, this is
$$\left(\begin{array}{cccc|c} 1 & -2 & 3 & 4 & 2\\ 1 & -2 & 5 & 5 & 3\\ -1 & 2 & -1 & -4 & 2 \end{array}\right)$$
Systems of Linear Equations
Slide 42
Gauß – Jordan with Mathematica
To enter a table/array/matrix in Mathematica (these are all represented in the same way), use the following command structure:
A = {{1, -2, 3, 4, 2}, {1, -2, 5, 5, 3}, {-1, 2, -1, -4, 2}}
{{1, -2, 3, 4, 2}, {1, -2, 5, 5, 3}, {-1, 2, -1, -4, 2}}
For convenience, we have here given our array a name, "A". We can retrieve the array by referring to A:
A
{{1, -2, 3, 4, 2}, {1, -2, 5, 5, 3}, {-1, 2, -1, -4, 2}}
Systems of Linear Equations
Slide 43
Gauß – Jordan with Mathematica
A more easily readable form is obtained using the TableForm command:
TableForm[A]
1    -2    3    4    2
1    -2    5    5    3
-1    2   -1   -4    2
The RowReduce command implements the Gauß-Jordan algorithm, returning the echelon form:
TableForm[RowReduce[A]]
1    -2    0    0    8
0     0    1    0    2
0     0    0    1   -3
Systems of Linear Equations
Slide 44
Fundamental Lemma for Homogeneous Equations
We will discuss the general theory of uniqueness and existence of solutions to linear equations after we have studied vector spaces a little more closely. However, the following fundamental lemma requires no additional theory:
1.8. Lemma. The homogeneous system
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$
of m equations in n real or complex unknowns $x_1, \dots, x_n$ has a non-trivial solution if n > m.
Systems of Linear Equations
Slide 45
Fundamental Lemma for Homogeneous Equations
Proof.
We proceed by induction in m, the number of equations. This means that for any $m \in \mathbb{N} \setminus \{0\}$ we will establish that the system has a non-trivial solution if n > m.
We first prove the statement of the lemma for m = 1, i.e., we show that
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0, \qquad a_{1k} \neq 0,\; k = 1, \dots, n, \tag{1.6}$$
has a non-trivial solution whenever n > 1.
Proof by induction: For n = 2, $a_{11}x_1 + a_{12}x_2 = 0$ has the solution $x_2 = 1$, $x_1 = -a_{12}/a_{11}$. If (1.6) has a non-trivial solution $(x_1, \dots, x_n)$, then
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n + a_{1(n+1)}x_{n+1} = 0$$
has the non-trivial solution $(x_1, \dots, x_n, 0)$.
Systems of Linear Equations
Slide 46
Fundamental Lemma for Homogeneous Equations
Proof (continued).
We assume that in the system
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$
at least one $a_{ij} \neq 0$ (otherwise the theorem is trivially true). By reordering the equations and renumbering the indices, we can ensure that $a_{11} \neq 0$. We write this system as
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0\\ a_{21} & a_{22} & \dots & a_{2n} & 0\\ a_{31} & a_{32} & \dots & a_{3n} & 0\\ \vdots & \vdots & & \vdots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} & 0 \end{array}\right)$$
Systems of Linear Equations
Slide 47
Fundamental Lemma for Homogeneous Equations
Proof (continued).
Then, subtracting $a_{21}/a_{11}$ times the first row from the second row, $a_{31}/a_{11}$ times the first row from the third row, and so on up to $a_{m1}/a_{11}$ times the first row from the m-th row,
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0\\ a_{21} & a_{22} & \dots & a_{2n} & 0\\ \vdots & \vdots & & \vdots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} & 0 \end{array}\right)
\;\sim\;
\left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & 0\\ 0 & a_{22} - \frac{a_{21}a_{12}}{a_{11}} & \dots & a_{2n} - \frac{a_{21}a_{1n}}{a_{11}} & 0\\ \vdots & \vdots & & \vdots & \vdots\\ 0 & a_{m2} - \frac{a_{m1}a_{12}}{a_{11}} & \dots & a_{mn} - \frac{a_{m1}a_{1n}}{a_{11}} & 0 \end{array}\right) \tag{1.7}$$
Systems of Linear Equations
Slide 48
Fundamental Lemma for Homogeneous Equations
Proof (continued).
The boxed area in (1.7) — the rows and columns strictly below and to the right of the first row and column — represents a homogeneous system of m − 1 equations in the n − 1 unknowns $x_2, \dots, x_n$.
We continue with our proof by induction. The case m = 1 has been established. Now assume that for m − 1 there exists a non-trivial solution whenever the number of unknowns is greater than m − 1. A system with m equations and n > m unknowns may be transformed into the form (1.7). The subsystem indicated in (1.7) by assumption has a non-trivial solution $x_2, \dots, x_n$. Then the system of m equations in n unknowns has the solution
$$x = \Bigl( -\frac{1}{a_{11}}(a_{12}x_2 + \cdots + a_{1n}x_n),\; x_2, \dots, x_n \Bigr),$$
which is also non-trivial.
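A small numerical illustration of the lemma (a sketch, not part of the proof): for m = 2 equations in n = 3 unknowns, Mathematica's NullSpace returns a non-trivial solution, as guaranteed:
NullSpace[{{1, 2, 3}, {4, 5, 6}}]
{{1, -2, 1}}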
Finite-Dimensional Vector Spaces
Slide 49
2. Finite-Dimensional Vector Spaces
Finite-Dimensional Vector Spaces
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 50
Finite-Dimensional Vector Spaces
Slide 51
Linear Independence
We assume throughout that V is a real or complex vector space. As usual, we will use the letter F to denote either R (for real vector spaces) or C (for complex vector spaces).
We want to distinguish elements of vector spaces that are not simply multiples of each other. For example, the vectors $u, v \in \mathbb{R}^2$,
$$u = \begin{pmatrix} 1\\ 2 \end{pmatrix}, \qquad v = \begin{pmatrix} -2\\ -4 \end{pmatrix},$$
are multiples of each other, because v = −2u. In general, we say that $u, v \in V$ are multiples of each other if
$$\exists_{\lambda \in \mathbb{F}} : u = \lambda v \qquad\text{or}\qquad \exists_{\substack{\lambda_1, \lambda_2 \in \mathbb{F}\\ |\lambda_1| + |\lambda_2| \neq 0}} : \lambda_1 u + \lambda_2 v = 0$$
Finite-Dimensional Vector Spaces
Slide 52
Linear Independence
If u and v are not multiples of each other, we say that they are (linearly) independent. This means that
$$\neg\Bigl( \exists_{\substack{\lambda_1, \lambda_2 \in \mathbb{F}\\ |\lambda_1| + |\lambda_2| \neq 0}} : \lambda_1 u + \lambda_2 v = 0 \Bigr)
\qquad\text{or}\qquad
\forall_{\lambda_1, \lambda_2 \in \mathbb{F}} : \lambda_1 u + \lambda_2 v = 0 \;\Rightarrow\; \lambda_1 = \lambda_2 = 0.$$
2.1. Definition. Let V be a real or complex vector space and $v_1, \dots, v_n \in V$. Then the vectors $v_1, \dots, v_n$ are said to be independent if for all $\lambda_1, \dots, \lambda_n \in \mathbb{F}$
$$\sum_{k=1}^n \lambda_k v_k = 0 \qquad\Rightarrow\qquad \lambda_1 = \lambda_2 = \cdots = \lambda_n = 0.$$
A finite set M ⊂ V is called an independent set if its elements are independent.
Finite-Dimensional Vector Spaces
Slide 53
Linear Independence
2.2. Example. The vectors
$$v_1 = \begin{pmatrix} 1\\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0\\ 2 \end{pmatrix}$$
are independent (and $M = \{v_1, v_2\}$ is an independent set), because
$$\begin{pmatrix} 0\\ 0 \end{pmatrix} = 0 = \lambda_1 v_1 + \lambda_2 v_2 = \begin{pmatrix} \lambda_1\\ 2\lambda_2 \end{pmatrix}$$
is equivalent to the system of equations
$$0 = \lambda_1, \qquad 0 = 2\lambda_2,$$
which has the unique solution $\lambda_1 = 0$ and $\lambda_2 = 0$.
Finite-Dimensional Vector Spaces
Slide 54
Linear Independence
2.3. Example. The vectors
$$v_1 = \begin{pmatrix} 1\\ 4\\ 7 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 2\\ 5\\ 8 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 3\\ 6\\ 9 \end{pmatrix}$$
are not independent, because
$$\begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix} = 0 = \lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 = \begin{pmatrix} \lambda_1 + 2\lambda_2 + 3\lambda_3\\ 4\lambda_1 + 5\lambda_2 + 6\lambda_3\\ 7\lambda_1 + 8\lambda_2 + 9\lambda_3 \end{pmatrix}$$
has a non-trivial solution, as we have seen in Example 1.6. For example, we can take $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 1$. Hence,
$$\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3 = 0 \qquad\not\Rightarrow\qquad \lambda_1 = \lambda_2 = \lambda_3 = 0,$$
and the vectors are not independent.
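This can be checked in Mathematica: arranging v1, v2, v3 as the columns of a matrix, a non-empty NullSpace confirms the dependence and reproduces the coefficients above:
NullSpace[Transpose[{{1, 4, 7}, {2, 5, 8}, {3, 6, 9}}]]
{{1, -2, 1}}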
Finite-Dimensional Vector Spaces
Slide 55
Linear Combinations and Span
2.4. Definition. Let $v_1, \dots, v_n \in V$ and $\lambda_1, \dots, \lambda_n \in \mathbb{F}$. Then the expression
$$\sum_{k=1}^n \lambda_k v_k = \lambda_1 v_1 + \cdots + \lambda_n v_n$$
is called a linear combination of the vectors $v_1, \dots, v_n$.
The set
$$\operatorname{span}\{v_1, \dots, v_n\} := \Bigl\{ y \in V : y = \sum_{k=1}^n \lambda_k v_k,\; \lambda_1, \dots, \lambda_n \in \mathbb{F} \Bigr\}$$
is called the (linear) span or the linear hull of the vectors $v_1, \dots, v_n$.
Finite-Dimensional Vector Spaces
Slide 56
Linear Combinations and Span
2.5. Example. $\operatorname{span}\Bigl\{ \begin{pmatrix} 1\\ 0 \end{pmatrix}, \begin{pmatrix} 2\\ 1 \end{pmatrix} \Bigr\} = \mathbb{R}^2$.
We need to show that every $x \in \mathbb{R}^2$ can be written as
$$x = \lambda_1 \begin{pmatrix} 1\\ 0 \end{pmatrix} + \lambda_2 \begin{pmatrix} 2\\ 1 \end{pmatrix}$$
for some $\lambda_1, \lambda_2 \in \mathbb{R}$. This means we need to solve
$$x = \begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 + 2\lambda_2\\ \lambda_2 \end{pmatrix}.$$
This is easily done, and we obtain $\lambda_2 = x_2$ and $\lambda_1 = x_1 - 2x_2$. Thus for any $x \in \mathbb{R}^2$ we have $x \in \operatorname{span}\bigl\{ \binom{1}{0}, \binom{2}{1} \bigr\}$. Since $\operatorname{span}\bigl\{ \binom{1}{0}, \binom{2}{1} \bigr\} \subset \mathbb{R}^2$ by definition, we are finished.
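The coefficients can also be found with Mathematica's LinearSolve, applied to the matrix whose columns are the spanning vectors (x1, x2 are symbolic here):
LinearSolve[{{1, 2}, {0, 1}}, {x1, x2}]
{x1 - 2 x2, x2}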
Finite-Dimensional Vector Spaces
Slide 57
Linear Combinations and Span
2.6. Lemma. The vectors $v_1, \dots, v_n \in V$ are independent if and only if none of them is contained in the span of all the others.
Proof.
If $v_k = 0$ for some $k = 1, \dots, n$, then the statement is trivially true. We prove the contraposition of the statement, assuming that all vectors are non-zero:
$$\exists_{k \in \{1, \dots, n\}} : v_k \in \operatorname{span}\{v_1, \dots, v_{k-1}, v_{k+1}, \dots, v_n\}
\;\Leftrightarrow\;
\exists_{k \in \{1, \dots, n\}}\; \exists_{\substack{\lambda_i \in \mathbb{F},\; i \in \{1, \dots, n\} \setminus \{k\}\\ \sum_i |\lambda_i| \neq 0}} : v_k = \sum_i \lambda_i v_i
\;\Leftrightarrow\;
\exists_{\substack{\lambda_i \in \mathbb{F},\; i \in \{1, \dots, n\}\\ \sum_i |\lambda_i| \neq 0}} : \sum_i \lambda_i v_i = 0.$$
Finite-Dimensional Vector Spaces
Slide 58
Span of Subsets
More generally, if V is a vector space and M is some subset of V , then we can define the span of M as the set containing all (finite) linear combinations of elements of M, i.e.,
$$\operatorname{span} M := \Bigl\{ v \in V : \exists_{n \in \mathbb{N}}\; \exists_{\lambda_1, \dots, \lambda_n \in \mathbb{F}}\; \exists_{m_1, \dots, m_n \in M} : v = \sum_{i=1}^n \lambda_i m_i \Bigr\}.$$
Note that this definition does not presume that M is a subspace, just an arbitrary subset of V . Furthermore, although only finite linear combinations are considered, the set M may well be infinite in size. Moreover, even though M is just any set, span M will be a subspace of V .
2.7. Example. Let $M = \{f \in C(\mathbb{R}) : f(x) = x^n,\; n \in \mathbb{N}\}$ denote the set of all monomials in the space of continuous functions on R. Then $P(\mathbb{R}) := \operatorname{span} M$ is the space of all polynomials (of any degree) in $C(\mathbb{R})$.
Finite-Dimensional Vector Spaces
Slide 59
Basis
2.8. Definition. Let V be a real or complex vector space. An n-tuple $B = (b_1, \dots, b_n) \in V^n$ is called an (ordered and finite) basis of V if every vector v has a unique representation
$$v = \sum_{i=1}^n \lambda_i b_i, \qquad \lambda_i \in \mathbb{F}.$$
The numbers $\lambda_i$ are called the coordinates of v with respect to B.
2.9. Example. The tuple of vectors $(e_1, \dots, e_n)$, $e_i \in \mathbb{R}^n$,
$$e_i = (0, \dots, 0, \underbrace{1}_{i\text{th entry}}, 0, \dots, 0), \qquad i = 1, \dots, n,$$
is called the standard basis or canonical basis of $\mathbb{R}^n$.
Finite-Dimensional Vector Spaces
Slide 60
Characterization of Bases
Sometimes we are not interested in the order of the elements of a basis, and write $B = \{b_1, \dots, b_n\}$, replacing the tuple by a set. This is known as an unordered basis.
2.10. Theorem. Let V be a real or complex vector space. An n-tuple $B = (b_1, \dots, b_n) \in V^n$ is a basis of V if and only if
(i) the vectors $b_1, \dots, b_n$ are linearly independent, i.e., B is an independent set, and
(ii) $V = \operatorname{span} B$.
Finite-Dimensional Vector Spaces
Slide 61
Characterization of Bases
Proof.
(⇒) Suppose that B is a basis of V . Then every v ∈ V can be expressed as
$$v = \sum_{i=1}^n \lambda_i b_i$$
for some coefficients $\lambda_i \in \mathbb{F}$. Hence, V ⊂ span B. From B ⊂ V it is clear that span B ⊂ V , so we deduce V = span B. The zero vector 0 ∈ V has the representation
$$0 = 0 \cdot b_1 + \cdots + 0 \cdot b_n.$$
Since B is a basis, this representation is unique, i.e.,
$$\sum_{i=1}^n \lambda_i b_i = 0 \qquad\Rightarrow\qquad \lambda_1 = \cdots = \lambda_n = 0.$$
Finite-Dimensional Vector Spaces
Slide 62
Characterization of Bases
Proof (continued).
It follows that B is an independent set.
(⇐) Suppose that B ⊂ V satisfies span B = V . Then every v ∈ V is an element of the span of B, so
$$v = \sum_{i=1}^n \lambda_i b_i$$
for some coefficients $\lambda_i \in \mathbb{F}$. It remains to show that this representation is unique. Suppose that
$$v = \sum_{i=1}^n \lambda_i b_i = \sum_{i=1}^n \mu_i b_i, \qquad \lambda_i, \mu_i \in \mathbb{F}.$$
Finite-Dimensional Vector Spaces
Slide 63
Characterization of Bases
Proof (continued).
Then
$$0 = \sum_{i=1}^n (\lambda_i - \mu_i) b_i.$$
Since the $b_i$ are all independent, this implies
$$\lambda_i - \mu_i = 0, \qquad i = 1, \dots, n,$$
so the representation is unique.
Finite-Dimensional Vector Spaces
Slide 64
Finite- and Infinite-Dimensional Spaces
2.11. Definition. Let V be a real or complex vector space. Then V is called finite-dimensional if either
▶ V = {0} or
▶ V possesses a finite basis.
If V is not finite-dimensional, we say that it is infinite-dimensional.
2.12. Example.
1. The space of polynomials of degree at most n,
$$P_n = \Bigl\{ f \in C(\mathbb{R}) : f(x) = \sum_{k=0}^n a_k x^k,\; a_0, a_1, \dots, a_n \in \mathbb{R} \Bigr\},$$
is finite-dimensional, because it has the basis $B = (1, x, x^2, \dots, x^n)$.
2. The space of real polynomials of any degree, $P(\mathbb{R})$, is infinite-dimensional. (See Example 2.7.)
Finite-Dimensional Vector Spaces
Slide 65
Length of Bases
2.13. Theorem. Let V be a real or complex finite-dimensional vector space, V ≠ {0}. Then any basis of V has the same length (number of elements) n.
Proof.
Let $A = (a_1, \dots, a_n)$ be a basis of V . We will show that no tuple $B = (b_1, \dots, b_m)$ with m > n can be a basis of V . (We do not need to consider the case m < n, because we could just switch the roles of A and B.) Thus, suppose that A and B given as above are both bases. Then for every $j = 1, \dots, m$ there exist uniquely determined numbers $c_{ij} \in \mathbb{F}$, $i = 1, \dots, n$, such that
$$b_j = \sum_{i=1}^n c_{ij} a_i.$$
Finite-Dimensional Vector Spaces
Slide 66
Length of Bases
Proof (continued).
Now let $\lambda_1, \dots, \lambda_m \in \mathbb{F}$ and consider the linear combination
$$\sum_{j=1}^m \lambda_j b_j = \sum_{j=1}^m \sum_{i=1}^n c_{ij} \lambda_j a_i = \sum_{i=1}^n \underbrace{\Bigl( \sum_{j=1}^m c_{ij} \lambda_j \Bigr)}_{=:\mu_i} a_i.$$
Now, since A and B are bases and as we know that 0 ∈ V has a unique representation in terms of basis vectors, we have
$$\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0
\;\Leftrightarrow\; \sum_{j=1}^m \lambda_j b_j = 0
\;\Leftrightarrow\; \sum_{i=1}^n \mu_i a_i = 0
\;\Leftrightarrow\; \mu_1 = \mu_2 = \cdots = \mu_n = 0
\;\Leftrightarrow\; \forall_{i = 1, \dots, n} : \sum_{j=1}^m c_{ij} \lambda_j = 0.$$
Finite-Dimensional Vector Spaces
Slide 67
Length of Bases
Proof (continued).
This means that the homogeneous system of equations
$$\begin{aligned} c_{11}\lambda_1 + c_{12}\lambda_2 + \cdots + c_{1m}\lambda_m &= 0\\ &\;\;\vdots\\ c_{n1}\lambda_1 + c_{n2}\lambda_2 + \cdots + c_{nm}\lambda_m &= 0 \end{aligned} \tag{2.1}$$
has only the trivial solution
$$\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0.$$
However, we have assumed that m > n, i.e., there are more unknowns than equations. By the Fundamental Lemma 1.8, there must exist a non-trivial solution. Thus we have a contradiction, so B cannot be a basis if A is.
Finite-Dimensional Vector Spaces
Slide 68
Dimension
2.14. Definition. Let V be a finite-dimensional real or complex vector
space. We define the dimension of V , denoted dim V , as follows:
(i) If V = {0}, dim V = 0.
(ii) If V ̸= {0}, dim V = n, where n is the length of any basis of V .
If V is an infinite-dimensional vector space we write dim V = ∞.
2.15. Examples.
(i) dim $\mathbb{R}^n$ = n
(ii) dim $P_n$ = n + 1
(iii) dim $C(\mathbb{R})$ = ∞
(iv) dim$\{(x_1, x_2) \in \mathbb{R}^2 : x_2 = 3x_1\}$ = 1
Finite-Dimensional Vector Spaces
Slide 69
Characterization of Bases
2.16. Remark. In an n-dimensional vector space V a basis is an independent
set with n elements that spans V . A few questions arise naturally:
1. Is any independent set with n elements in an n-dimensional space a
basis?
2. If not, is it possible to find independent sets with more than n
elements in an n-dimensional space?
Finite-Dimensional Vector Spaces
Slide 70
Maximal Subsets
In order to answer these and similar questions, we will work towards a
fundamentally important result called the basis extension theorem. First,
we need a lemma:
2.17. Lemma. Let $a_1, \dots, a_{n+1} \in V$ and assume that $a_1, \dots, a_n$ are independent and that $a_1, \dots, a_{n+1}$ are dependent. Then $a_{n+1}$ is a linear combination of (some of) $a_1, \dots, a_n$.
The proof is quite easy and left to you as an exercise.
2.18. Definition. Let V be a real or complex vector space and A ⊂ V a
finite set. An independent subset F ⊂ A is called maximal if every x ∈ A
is a linear combination of elements of F .
If A is finite and F ⊂ A is maximal, then span F = span A. A maximal
subset is of course not defined uniquely.
Finite-Dimensional Vector Spaces
Slide 71
Maximal Subsets
2.19. Example. Let $V = \mathbb{R}^3$ and
$$A = \left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix} \right\}.$$
Then
$$F_1 = \left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix} \right\}, \qquad
F_2 = \left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix} \right\}, \qquad
F_3 = \left\{ \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix} \right\}$$
are all maximal independent subsets of A. Furthermore,
$$\operatorname{span} A = \operatorname{span} F_1 = \operatorname{span} F_2 = \operatorname{span} F_3.$$
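A quick Mathematica check: the rank of the matrix with the elements of A as rows is 2, so no independent subset of A can contain more than two vectors, while each Fk contains exactly two:
MatrixRank[{{1, 1, 0}, {0, 1, 1}, {1, 0, -1}}]
2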
Finite-Dimensional Vector Spaces
Slide 72
Maximal Subsets
2.20. Theorem. Let V be a vector space and A ⊂ V a finite set. Then
every independent subset A′ ⊂ A lies in some maximal subset F ⊂ A.
Proof.
We proceed algorithmically. We ask: Does there exist a vector $x \in A \setminus A'$ such that $x \notin \operatorname{span} A'$?
▶ If no, we are finished, because A′ is maximal.
▶ If yes, we take this x and define A′′ = A′ ∪ {x}.
By Lemma 2.17, A′′ is independent (otherwise x ∈ span A′ and we have a contradiction) and we can repeat the procedure, substituting A′′ for A′. Since A is finite, the loop will terminate at some point and we obtain a maximal independent subset of A.
Finite-Dimensional Vector Spaces
Slide 73
Basis Extension Theorem
2.21. Basis Extension Theorem. Let V be a finite-dimensional vector space
and A′ ⊂ V an independent set. Then there exists a basis of V containing
A′ .
Proof.
Write $A' = \{a_1, \dots, a_m\}$ and choose a basis $\{a_{m+1}, \dots, a_{m+n}\}$ of V , where n = dim V . We now define
$$A := \{a_1, \dots, a_{m+n}\} \supset A'.$$
By Theorem 2.20 there exists a maximal independent subset F of A containing A′.
Since $\{a_{m+1}, \dots, a_{m+n}\}$ is a basis, V = span A. Furthermore, span F = span A, so span F = V . Thus F is a basis.
Finite-Dimensional Vector Spaces
Slide 74
Basis Extension Theorem
2.22. Corollary. Let V be an n-dimensional vector space, n ∈ N. Then any
independent set A with n elements is a basis of V .
Proof.
By the basis extension theorem there is a basis containing A. Since this
basis will have n elements, A itself is this basis.
2.23. Corollary. Let V be an n-dimensional vector space, n ∈ N. Then an
independent set A may have at most n elements.
Proof.
By the basis extension theorem there is a basis containing A. Since this
basis will have n elements, A may not have more elements than this.
Finite-Dimensional Vector Spaces
Slide 75
Sums of Vector Spaces
2.24. Definition. Let V be a real or complex vector space and U, W be sets in V .
(i) We define the sum of U and W by
$$U + W := \{ v \in V : \exists_{u \in U}\, \exists_{w \in W} : v = u + w \}.$$
(ii) If U and W are subspaces of V with U ∩ W = {0}, the sum U + W is called direct, and we denote it by U ⊕ W .
2.25. Remark. It is easy to see that if U, W are subspaces of V , then U + W (or U ⊕ W ) is a subspace of V ; check this for yourself!
Finite-Dimensional Vector Spaces
Slide 76
Sums of Vector Spaces
2.26. Examples.
(i) Let $U, W \subset \mathbb{R}^2$ be given by
$$U = \operatorname{span}\left\{ \begin{pmatrix} 1\\ 0 \end{pmatrix} \right\}, \qquad W = \operatorname{span}\left\{ \begin{pmatrix} 2\\ 1 \end{pmatrix} \right\}.$$
Then every $x \in \mathbb{R}^2$ has a representation in the form x = u + w , where u ∈ U and w ∈ W . (Why? See also Example 2.5.) Therefore, $\mathbb{R}^2 = U + W$.
Furthermore, U ∩ W = {0} since $\binom{1}{0}$ and $\binom{2}{1}$ are independent. (See Lemma 2.6.) Hence, we can write
$$\mathbb{R}^2 = U \oplus W.$$
Finite-Dimensional Vector Spaces
Slide 77
Sums of Vector Spaces
(ii) Let $U, W \subset \mathbb{R}^3$ be given by
$$U = \operatorname{span}\left\{ \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix} \right\}, \qquad
W = \operatorname{span}\left\{ \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} \right\}.$$
Then $U + W = \mathbb{R}^3$, but the sum is not direct, because
$$(0, 1, 0) \in U \cap W.$$
(iii) We could write $V = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}$ as
$$V = \operatorname{span}\left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix} \right\} \oplus \operatorname{span}\left\{ \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} \right\}.$$
Finite-Dimensional Vector Spaces
Slide 78
Sums of Vector Spaces
2.27. Lemma. The sum U + W of vector spaces U, W is direct if and only if every x ∈ U + W , x ≠ 0, has a unique representation x = u + w , u ∈ U, w ∈ W .
Proof.
(⇒) We show the contraposition: if the representation is not unique for all x ∈ U + W , then the sum is not direct. Let x = u + w = u′ + w′ with u, u′ ∈ U, w , w′ ∈ W and u ≠ u′. Then 0 ≠ u − u′ = w′ − w , so u − u′ ∈ U and u − u′ ∈ W . Thus U ∩ W ≠ {0}.
Finite-Dimensional Vector Spaces
Slide 79
Sums of Vector Spaces
Proof (continued).
(⇐) We again show the contraposition: if the sum is not direct, then there exists some x ∈ U + W with a non-unique representation. This is obvious, because if 0 ≠ x ∈ U ∩ W , then we may write
$$x = \underbrace{x}_{\in U} + \underbrace{0}_{\in W} = \underbrace{\tfrac{1}{2}x}_{\in U} + \underbrace{\tfrac{1}{2}x}_{\in W},$$
so this x has more than one representation.
Finite-Dimensional Vector Spaces
Slide 80
Sums of Vector Spaces
2.28. Theorem. Let V be a vector space and U, W ⊂ V be finite-dimensional subspaces of V . Then
$$\dim(U + W) + \dim(U \cap W) = \dim U + \dim W.$$
The proof will be discussed in recitation class.
Inner Product Spaces
Slide 81
3. Inner Product Spaces
Inner Product Spaces
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 82
Inner Product Spaces
Slide 83
Inner Product Spaces
3.1. Definition. Let V be a real or complex vector space. Then a map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{F}$ is called a scalar product or inner product if for all u, v , w ∈ V and all λ ∈ F
(i) ⟨v , v ⟩ ≥ 0 and ⟨v , v ⟩ = 0 if and only if v = 0,
(ii) ⟨u, v + w ⟩ = ⟨u, v ⟩ + ⟨u, w ⟩,
(iii) ⟨u, λv ⟩ = λ⟨u, v ⟩,
(iv) $\langle u, v \rangle = \overline{\langle v, u \rangle}$.
The pair (V, ⟨ · , · ⟩) is called an inner product space.
3.2. Remark. Properties (iii) and (iv) imply that
$$\langle \lambda u, v \rangle = \overline{\langle v, \lambda u \rangle} = \overline{\lambda \langle v, u \rangle} = \overline{\lambda}\,\langle u, v \rangle.$$
We say that the inner product is linear in the second component and anti-linear in the first component.
Inner Product Spaces
Slide 84
The Induced Norm
3.3. Examples.
▶ In $\mathbb{R}^n$ we define the canonical or standard scalar product
$$\langle x, y \rangle := \sum_{i=1}^n x_i y_i, \qquad x, y \in \mathbb{R}^n. \tag{3.1}$$
▶ In $\mathbb{C}^n$ we can define the inner product
$$\langle x, y \rangle := \sum_{i=1}^n \overline{x_i}\, y_i, \qquad x, y \in \mathbb{C}^n.$$
▶ In C([a, b]), the space of complex-valued, continuous functions on the interval [a, b], we can define an inner product by
$$\langle f, g \rangle := \int_a^b \overline{f(x)}\, g(x)\, dx, \qquad f, g \in C([a, b]).$$
Inner Product Spaces
Slide 85
The Induced Norm
3.4. Definition. Let (V, ⟨·, ·⟩) be an inner product space. The map
$$\| \cdot \| : V \to \mathbb{R}, \qquad \|v\| = \sqrt{\langle v, v \rangle}$$
is called the induced norm on V .
3.5. Examples.
▶ The induced norm in $\mathbb{R}^n$ and $\mathbb{C}^n$ is given by
$$\|x\| = \sqrt{\langle x, x \rangle} = \sqrt{\sum_{i=1}^n |x_i|^2} = \|x\|_2,$$
which is the usual euclidean norm.
▶ The induced norm on C([a, b]) is
$$\|f\| = \sqrt{\langle f, f \rangle} = \sqrt{\int_a^b |f(x)|^2\, dx} = \|f\|_2, \tag{3.2}$$
which is just the 2-norm.
Inner Product Spaces
Slide 86
The Induced Norm
3.6. Cauchy-Schwarz Inequality. Let (V, ⟨ · , · ⟩) be an inner product vector space. Then
$$|\langle u, v \rangle| \le \|u\| \cdot \|v\| \qquad \text{for all } u, v \in V,$$
where ∥ · ∥ is the induced norm.
Proof.
The inequality is clear for v = 0, so assume v ≠ 0 and let e := v/∥v∥. Then ⟨e, e⟩ = ⟨v , v ⟩/∥v∥² = 1 and
$$0 \le \|u - \langle e, u \rangle e\|^2 = \langle u - \langle e, u \rangle e,\; u - \langle e, u \rangle e \rangle = \|u\|^2 - |\langle e, u \rangle|^2.$$
It follows that
$$|\langle u, v \rangle|^2 = \|v\|^2 \cdot |\langle u, e \rangle|^2 \le \|u\|^2 \cdot \|v\|^2.$$
Inner Product Spaces
Slide 87
The Induced Norm
3.7. Corollary. The induced norm is actually a norm, i.e., it satisfies
(i) ∥v∥ ≥ 0 and ∥v∥ = 0 ⇔ v = 0,
(ii) ∥λv∥ = |λ| · ∥v∥,
(iii) ∥u + v∥ ≤ ∥u∥ + ∥v∥
for all u, v ∈ V and λ ∈ F.
Proof.
All properties except for the triangle inequality are easily checked. By the Cauchy-Schwarz inequality, we have
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2\operatorname{Re}\langle u, v \rangle
\le \|u\|^2 + \|v\|^2 + 2|\langle u, v \rangle|
\le \|u\|^2 + \|v\|^2 + 2\|u\|\|v\| = (\|u\| + \|v\|)^2.$$
Inner Product Spaces
Slide 88
Angle Between Vectors
3.8. Remark. Every inner product space is also a normed vector space and
by extension a metric space.
3.9. Definition. Let V be a real inner product space and u, v ∈ V \ {0}. We define the angle α(u, v ) ∈ [0, π] between u and v by
$$\cos \alpha(u, v) = \frac{\langle u, v \rangle}{\|u\| \|v\|}. \tag{3.3}$$
This definition makes sense, since by the Cauchy-Schwarz inequality
$$\left| \frac{\langle u, v \rangle}{\|u\| \|v\|} \right| = \frac{|\langle u, v \rangle|}{\|u\| \|v\|} \le 1.$$
In $\mathbb{R}^2$ and $\mathbb{R}^3$ the expression (3.3) of course corresponds to our geometric notion of the (cosine of the) angle between two vectors.
Inner Product Spaces
Slide 89
Angle Between Vectors
3.10. Example. For $x, y \in \mathbb{R}^2$ we have $\angle(x, y) = \alpha(x, y)$.
We may assume that ∥x∥ = ∥y∥ = 1 and we consider the case
$$x = \begin{pmatrix} \cos\varphi_1\\ \sin\varphi_1 \end{pmatrix}, \qquad y = \begin{pmatrix} \cos\varphi_2\\ \sin\varphi_2 \end{pmatrix}, \qquad 0 < \varphi_1 < \varphi_2 < \pi.$$
(Cf. the section on polar coordinates in last term's lecture.) Then $\angle(x, y) = \varphi_2 - \varphi_1$ and
$$\cos \angle(x, y) = \cos(\varphi_2 - \varphi_1) = \cos\varphi_2 \cos\varphi_1 + \sin\varphi_2 \sin\varphi_1 = \langle x, y \rangle = \cos \alpha(x, y).$$
In a similar manner, one can prove that $\angle(x, y) = \alpha(x, y)$ for $x, y \in \mathbb{R}^3$.
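In Mathematica, the built-in VectorAngle command computes the angle (3.3) directly; for example,
VectorAngle[{1, 0}, {1, 1}]
Pi/4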
Inner Product Spaces
Slide 90
Vectors, Norms and Inner Products
We can use a Table command to create a general vector:
X = Table[Subscript[x, i], {i, 3}]
{x1, x2, x3}
The standard inner product (3.1) is implemented by a simple dot:
Y = Table[Subscript[y, i], {i, 3}];
X.Y
x1 y1 + x2 y2 + x3 y3
The induced norm (3.2) is given by the Norm command:
Norm[X]
Sqrt[Abs[x1]^2 + Abs[x2]^2 + Abs[x3]^2]
Inner Product Spaces
Slide 91
Orthogonality
3.11. Definition. Let (V, ⟨ · , · ⟩) be an inner product vector space.
(i) Two vectors u, v ∈ V are called orthogonal or perpendicular if ⟨u, v ⟩ = 0. We then write u ⊥ v .
(ii) We call
$$M^\perp := \{ v \in V : \forall_{m \in M}\; \langle m, v \rangle = 0 \}$$
the orthogonal complement of a set M ⊂ V .
For short, we sometimes write v ⊥ M instead of $v \in M^\perp$ or v ⊥ m for all m ∈ M.
3.12. Lemma. The orthogonal complement $M^\perp$ is a subspace of V .
Proof.
If $v_1, v_2 \in M^\perp$, then $\langle m, v_1 + v_2 \rangle = \langle m, v_1 \rangle + \langle m, v_2 \rangle = 0 + 0 = 0$ for all m ∈ M, so $v_1 + v_2 \in M^\perp$. Similarly, if $v \in M^\perp$ and λ ∈ F, then $\langle m, \lambda v \rangle = \lambda \langle m, v \rangle = 0$, so $\lambda v \in M^\perp$. Thus $M^\perp$ is a subspace of V .
Inner Product Spaces
Slide 92
Orthogonality
3.13. Pythagoras's Theorem. Let (V, ⟨ · , · ⟩) be an inner product space and M some subset of V . Let z = x + y , where x ∈ M and $y \in M^\perp$. Then
$$\|z\|^2 = \|x\|^2 + \|y\|^2.$$
Proof.
We see directly that
$$\|z\|^2 = \langle z, z \rangle = \langle x + y, x + y \rangle = \langle x, x \rangle + \underbrace{\langle x, y \rangle}_{=0} + \underbrace{\langle y, x \rangle}_{=0} + \langle y, y \rangle = \|x\|^2 + \|y\|^2.$$
Inner Product Spaces
Slide 93
Orthonormal Systems
3.14. Definition. Let (V, ⟨ · , · ⟩) be an inner product vector space. A tuple of vectors $(v_1, \dots, v_r) \subset V$ is called a (finite) orthonormal system if
$$\langle v_j, v_k \rangle = \delta_{jk} := \begin{cases} 1 & \text{for } j = k,\\ 0 & \text{for } j \neq k, \end{cases} \qquad j, k = 1, \dots, r,$$
i.e., if $\|v_k\| = 1$ and $v_j \perp v_k$ for j ≠ k.
3.15. Example. The standard basis vectors in $\mathbb{R}^3$,
$$e_1 = \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}, \qquad e_3 = \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix},$$
form an orthonormal system $(e_1, e_2, e_3)$ with respect to the standard scalar product.
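A one-line Mathematica check (a sketch using Outer at level 1 to collect all pairwise dot products): the result is the identity matrix, i.e., exactly the pattern δjk:
Outer[Dot, IdentityMatrix[3], IdentityMatrix[3], 1]
{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}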
Inner Product Spaces
Slide 94
Orthonormal Systems
3.16. Lemma. Let (V, ⟨ · , · ⟩) be an inner product vector space and $F = (v_1, \dots, v_r) \subset V$ an orthonormal system. Then the elements of F are linearly independent.
Proof.
We want to prove that for any $\lambda_1, \dots, \lambda_r \in \mathbb{F}$
$$\sum_{i=1}^r \lambda_i v_i = 0 \tag{3.4}$$
implies $\lambda_1 = \cdots = \lambda_r = 0$. We take the scalar product of (3.4) with $v_1$:
$$0 = \langle v_1, 0 \rangle = \langle v_1, \lambda_1 v_1 + \cdots + \lambda_r v_r \rangle
= \lambda_1 \underbrace{\langle v_1, v_1 \rangle}_{=1} + \lambda_2 \underbrace{\langle v_1, v_2 \rangle}_{=0} + \cdots + \lambda_r \underbrace{\langle v_1, v_r \rangle}_{=0} = \lambda_1,$$
so (3.4) implies $\lambda_1 = 0$. Similarly, we obtain $\lambda_2 = \cdots = \lambda_r = 0$.
Inner Product Spaces
Slide 95
Orthonormal Bases
3.17. Definition. Let (V, ⟨ · , · ⟩) be a finite-dimensional inner product vector space and $B = (e_1, \dots, e_n)$ a basis of V . If B is also an orthonormal system, we say that B is an orthonormal basis (ONB).
3.18. Theorem. Let (V, ⟨ · , · ⟩) be a finite-dimensional inner product vector space and $B = (e_1, \dots, e_n)$ an orthonormal basis of V . Then every v ∈ V has the basis representation
$$v = \sum_{j=1}^n \langle e_j, v \rangle e_j.$$
3.19. Definition. The numbers $\langle e_j, v \rangle$ are called Fourier coefficients of v with respect to the basis B. The vector
$$\pi_{e_i} v := \langle e_i, v \rangle e_i$$
is called the projection of v onto $e_i$.
Inner Product Spaces
Slide 96
Orthonormal Bases
Proof of Theorem 3.18.
Since B is a basis, for every v ∈ V there exist coefficients $\lambda_1, \dots, \lambda_n \in \mathbb{F}$ such that
$$v = \sum_{j=1}^n \lambda_j e_j.$$
Now for any $k = 1, \dots, n$, we have
$$\langle e_k, v \rangle = \sum_{j=1}^n \lambda_j \langle e_k, e_j \rangle = \sum_{j=1}^n \lambda_j \delta_{kj} = \lambda_k,$$
so it follows that
$$v = \sum_{j=1}^n \lambda_j e_j = \sum_{j=1}^n \langle e_j, v \rangle e_j.$$
Inner Product Spaces
Slide 97
Orthonormal Bases
The following result, which follows directly from Theorem 3.18, generalizes Pythagoras's Theorem 3.13:
3.20. Parseval's Theorem. Let (V, ⟨ · , · ⟩) be a finite-dimensional inner product vector space and $B = \{e_1, \dots, e_n\}$ an orthonormal basis of V . Then
$$\|v\|^2 = \sum_{i=1}^n |\langle v, e_i \rangle|^2$$
for any v ∈ V .
We have generalized the concepts of angle and orthogonality to vector spaces and thereby obtained Pythagoras's Theorem and now Parseval's Theorem. For understanding the geometry of vector spaces (and thereby extending the "elementary" geometry of $\mathbb{R}^3$), the projection of a vector onto subspaces is of fundamental importance. The following theorem develops this concept a little further.
Inner Product Spaces
Slide 98
The Projection Theorem
3.21. Projection Theorem. Let (V, ⟨ · , · ⟩) be a (possibly infinite-dimensional) inner product vector space and $(e_1, \dots, e_r)$, r ∈ N, be an orthonormal system in V . Denote $U := \operatorname{span}\{e_1, \dots, e_r\}$.
Then for every v ∈ V there exists a unique representation
$$v = u + w, \qquad \text{where } u \in U \text{ and } w \in U^\perp,$$
and $u = \sum_{i=1}^r \langle e_i, v \rangle e_i$, $w := v - u$.
3.22. Definition. The vector
$$\pi_U v := \sum_{i=1}^r \langle e_i, v \rangle e_i$$
is called the orthogonal projection of v onto U. The projection theorem essentially states that $\pi_U v$ always exists and is independent of the choice of the orthonormal system (it depends only on the span of the system, U).
Inner Product Spaces
Slide 99
The Projection Theorem
3.23. Example. Consider the subspace $U = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_3 = 0\}$ of $\mathbb{R}^3$. An orthonormal basis of U is given by $B = \{e_1, e_2\}$, where
$$e_1 = \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}.$$
Then the projection of a vector $y = (y_1, y_2, y_3)$ onto U is given by
$$\pi_U y = \langle e_1, y \rangle e_1 + \langle e_2, y \rangle e_2
= \left\langle \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} \right\rangle \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}
+ \left\langle \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} \right\rangle \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}
= y_1 \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} + y_2 \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} = \begin{pmatrix} y_1\\ y_2\\ 0 \end{pmatrix}.$$
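The same computation in Mathematica, using the Projection command discussed at the end of this chapter (y1, y2, y3 are symbolic here):
Projection[{y1, y2, y3}, {1, 0, 0}] + Projection[{y1, y2, y3}, {0, 1, 0}]
{y1, y2, 0}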
Inner Product Spaces
Slide 100
The Projection Theorem
Proof of the Projection Theorem.
We first show the uniqueness of the decomposition: Assume v = u + w = u′ + w′. Then by Pythagoras's theorem,
$$0 = \|u - u' + (w - w')\|^2 = \|u - u'\|^2 + \|w - w'\|^2,$$
so ∥u − u′∥ = ∥w − w′∥ = 0. Thus u = u′ and w = w′.
Regarding the existence of such a decomposition, it is clear that
$$u = \sum_{i=1}^r \langle e_i, v \rangle e_i$$
lies in U. We need to show that $w \in U^\perp$, i.e., u ⊥ v − u.
Inner Product Spaces
Slide 101
The Projection Theorem
Proof of the Projection Theorem (continued).
Note first that
$$\|u\|^2 = \langle u, u \rangle = \Bigl\langle \sum_{i=1}^r \langle e_i, v \rangle e_i,\; \sum_{j=1}^r \langle e_j, v \rangle e_j \Bigr\rangle
= \sum_{i=1}^r \sum_{j=1}^r \overline{\langle e_i, v \rangle}\, \langle e_j, v \rangle \underbrace{\langle e_i, e_j \rangle}_{=\delta_{ij}}
= \sum_{i=1}^r |\langle e_i, v \rangle|^2.$$
It then follows that
$$\langle v - u, u \rangle = \langle v, u \rangle - \|u\|^2 = \Bigl\langle v, \sum_{i=1}^r \langle e_i, v \rangle e_i \Bigr\rangle - \|u\|^2
= \sum_{i=1}^r \langle e_i, v \rangle \overline{\langle e_i, v \rangle} - \sum_{i=1}^r |\langle e_i, v \rangle|^2 = 0.$$
Inner Product Spaces
Slide 102
Orthogonal Subspaces
An immediate consequence of the Projection Theorem is as follows:
3.24. Corollary. Let (V, ⟨ · , · ⟩) be a (possibly infinite-dimensional) inner product vector space and let U ⊂ V be a finite-dimensional subspace. Then
$$V = U \oplus U^\perp.$$
If V is finite-dimensional, then
$$\dim V = \dim U + \dim U^\perp.$$
This follows directly from the Projection Theorem with Lemma 2.27 and Theorem 2.28.
Inner Product Spaces
Slide 103
Bessel’s Inequality
As a consequence of the Projection Theorem 3.21 and Pythagoras's Theorem 3.13 we obtain the following important result:
3.25. Bessel Inequality. Let (V, ⟨ · , · ⟩) be an inner product space and $(e_1, \dots, e_n)$ an orthonormal system in V . Then, for any v ∈ V and any r ≤ n,
$$\sum_{k=1}^r |\langle e_k, v \rangle|^2 \le \|v\|^2. \tag{3.5}$$
Proof.
Set $u := \sum_{i=1}^r \langle e_i, v \rangle e_i$, so that v − u ⊥ u by the Projection Theorem. By Pythagoras's Theorem 3.13 we then have $\|v - u\|^2 + \|u\|^2 = \|v\|^2$, so
$$0 \le \|v - u\|^2 = \|v\|^2 - \|u\|^2 = \|v\|^2 - \sum_{i=1}^r |\langle e_i, v \rangle|^2.$$
Inner Product Spaces
Slide 104
Best Approximation
Now suppose that we want to approximate an element v ∈ V using a linear combination of the first r elements of an orthonormal system,
$$v \approx \sum_{i=1}^r \lambda_i e_i, \qquad \lambda_1, \dots, \lambda_r \in \mathbb{F}. \tag{3.6}$$
The question is how to choose the coefficients $\lambda_1, \dots, \lambda_r$ to make the approximation "as good as possible". We note that
$$\Bigl\| v - \sum_{i=1}^r \lambda_i e_i \Bigr\|^2
= \|v\|^2 + \sum_{i=1}^r |\lambda_i|^2 - \sum_{i=1}^r \lambda_i \langle v, e_i \rangle - \sum_{i=1}^r \overline{\lambda_i}\, \langle e_i, v \rangle
= \|v\|^2 + \sum_{i=1}^r \bigl| \langle e_i, v \rangle - \lambda_i \bigr|^2 - \sum_{i=1}^r |\langle e_i, v \rangle|^2. \tag{3.7}$$
It is clear that (3.7) is minimal if $\lambda_i = \langle e_i, v \rangle$, i.e., the coefficients in (3.6) are just the Fourier coefficients.
Inner Product Spaces
Slide 105
Best Approximation
From (3.7) we can also see that
$$\Bigl\| v - \sum_{i=1}^{r'} \langle e_i, v \rangle e_i \Bigr\| \le \Bigl\| v - \sum_{i=1}^{r} \langle e_i, v \rangle e_i \Bigr\| \qquad \text{for any } r' \ge r, \tag{3.8}$$
so the approximation can only improve when we add further elements of the orthonormal system to the approximation.
Clearly, orthonormal systems and bases are extremely useful. We next discuss how to obtain an orthonormal system from any system of vectors.
Inner Product Spaces
Slide 106
Gram-Schmidt Orthonormalization
Assume that we have a system of vectors (perhaps a basis) $(v_1, \dots, v_n)$ in an inner product vector space V . We wish to construct a new system $(w_1, \dots, w_n)$ that is orthonormal. We start with $v_1$ and normalize it, defining
$$w_1 := \frac{v_1}{\|v_1\|}.$$
Next, we want to obtain from $v_2$ a vector $w_2$ such that $w_1 \perp w_2$. By Theorem 3.21, $v_2$ has a unique representation as a sum $v_2 = x + y$, where $x \in \operatorname{span}\{w_1\}$ and $y \in (\operatorname{span}\{w_1\})^\perp$. Now $x = \langle w_1, v_2 \rangle w_1$, so
$$y = v_2 - \langle w_1, v_2 \rangle w_1 \in (\operatorname{span}\{w_1\})^\perp.$$
(Of course, y is independent of and even orthogonal to $w_1$.) It just remains to normalize y , and we define
$$w_2 := \frac{v_2 - \langle w_1, v_2 \rangle w_1}{\| v_2 - \langle w_1, v_2 \rangle w_1 \|}.$$
Inner Product Spaces
Slide 107
Gram-Schmidt Orthonormalization
Now we can write
$$v_3 = \langle w_1, v_3 \rangle w_1 + \langle w_2, v_3 \rangle w_2 + y,$$
where $y \in (\operatorname{span}\{w_1, w_2\})^\perp$. Thus
$$w_3 := \frac{v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1}{\| v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1 \|}$$
will be normed and orthogonal to $w_1$ and $w_2$. Proceeding in this way, we set
$$w_1 := \frac{v_1}{\|v_1\|}, \qquad
w_k := \frac{v_k - \sum_{j=1}^{k-1} \langle w_j, v_k \rangle w_j}{\bigl\| v_k - \sum_{j=1}^{k-1} \langle w_j, v_k \rangle w_j \bigr\|}, \qquad k = 2, \dots, n,$$
and hence obtain an orthonormal system as desired.
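The procedure translates almost directly into code. Here is a minimal Mathematica sketch (the name gramSchmidt is ours, not built-in; Mathematica's own Orthogonalize is shown on a later slide):
gramSchmidt[vectors_] := Module[{ws = {}, y},
  Do[
   (* subtract from v its projections onto all previously constructed w's *)
   y = v - Total[(Conjugate[#].v) # & /@ ws];
   AppendTo[ws, y/Norm[y]],
   {v, vectors}];
  ws]
gramSchmidt[{{1, 0, 1}, {1, 1, 1}, {1, 2, 0}}]
{{1/Sqrt[2], 0, 1/Sqrt[2]}, {0, 1, 0}, {1/Sqrt[2], 0, -1/Sqrt[2]}}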
Inner Product Spaces
Slide 108
Gram-Schmidt Orthonormalization
3.26. Example. Suppose we are given
$$v_1 = \begin{pmatrix} 1\\ 0\\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1\\ 2\\ 0 \end{pmatrix}.$$
Then $\|v_1\| = \sqrt{2}$, so
$$w_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1\\ 0\\ 1 \end{pmatrix}.$$
Next, we calculate the projection of $v_2$ onto $w_1$ and subtract it from $v_2$:
$$v_2 - \langle w_1, v_2 \rangle w_1 = \begin{pmatrix} 1\\ 1\\ 1 \end{pmatrix} - \Bigl( \frac{1}{\sqrt{2}} \cdot 1 + \frac{1}{\sqrt{2}} \cdot 1 \Bigr) \frac{1}{\sqrt{2}} \begin{pmatrix} 1\\ 0\\ 1 \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}$$
Inner Product Spaces
Slide 109
Gram-Schmidt Orthonormalization
Since the norm of this vector is already one, we have
$$w_2 = \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}.$$
Next, we calculate
$$v_3 - \langle w_2, v_3 \rangle w_2 - \langle w_1, v_3 \rangle w_1 = \begin{pmatrix} 1\\ 2\\ 0 \end{pmatrix} - 2 \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} - \frac{1}{2} \begin{pmatrix} 1\\ 0\\ 1 \end{pmatrix} = \begin{pmatrix} 1/2\\ 0\\ -1/2 \end{pmatrix}$$
Norming,
$$w_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1\\ 0\\ -1 \end{pmatrix}.$$
Inner Product Spaces
Slide 110
Projections and Gram-Schmidt
We can use the Normalize command to create a normed vector:
v1 = {1, 0, 1};
w1 = Normalize[v1]
{1/Sqrt[2], 0, 1/Sqrt[2]}
The projection of v2 onto w1 can be calculated through the Projection command:
v2 = {1, 1, 1};
w2 = (v2 - Projection[v2, w1])/Norm[v2 - Projection[v2, w1]]
{0, 1, 0}
Note that addition of vectors and multiplication with numbers work naturally.
Inner Product Spaces
Slide 111
Projections and Gram-Schmidt
Mathematica has a built-in command for the Gram-Schmidt procedure, Orthogonalize:
v3 = {1, 2, 0};
Orthogonalize[{v1, v2, v3}]
{{1/Sqrt[2], 0, 1/Sqrt[2]}, {0, 1, 0}, {1/Sqrt[2], 0, -1/Sqrt[2]}}
Linear Maps
Slide 112
4. Linear Maps
Linear Maps
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 113
Linear Maps
Slide 114
Linear maps on Vector Spaces
In calculus, physics and engineering applications, a fundamental role is played by functions between vector spaces that are linear:
4.1. Definition. Let (U, ⊕, ⊙) and (V, ⊞, ⊡) be vector spaces that are either both real or both complex. Then a map L : U → V is said to be linear if it is both homogeneous, i.e.,
$$L(\lambda \odot u) = \lambda \boxdot L(u), \tag{4.1a}$$
and additive, i.e.,
$$L(u \oplus u') = L(u) \boxplus L(u'), \tag{4.1b}$$
for all u, u′ ∈ U and λ ∈ F. The set of all linear maps L : U → V is denoted by L(U, V ).
4.2. Remark. A linear map L : U → V satisfies L(0) = 0, where we use the same symbol 0 for the zero in U or in V .
Linear Maps
Slide 115
Linear maps on Vector Spaces
4.3. Examples.
(i) All linear maps R → R are of the form x ↦ αx for some α ∈ R.
(ii) For an interval I ⊂ R, the map $\frac{d}{dx} : f \mapsto f'$ is a linear map $C^1(I) \to C(I)$.
(iii) The map $(a_n) \mapsto a_0$ is a linear map from the space of all sequences to C.
(iv) The map $(a_n) \mapsto \lim_{n \to \infty} a_n$ is a linear map from the space of all convergent sequences to C.
(v) If C is regarded as a real vector space, the map $z \mapsto \bar{z}$ is linear C → C. It is not linear if C is regarded as a complex vector space.
(vi) For any real or complex vector space V , the constant map $V \ni x \mapsto c \in \mathbb{F}$ (F = R or C) is linear if and only if c = 0.
For linear maps, we often write simply Lu instead of L(u).
Linear Maps
Slide 116
Linear Maps are Structure-Preserving
A linear map
$$L : U \to V$$
between vector spaces (U, ⊕, ⊙) and (V, ⊞, ⊡) is a structure-preserving map. What does this mean?
Suppose (for the moment) that L : U → V is also bijective. Consider the scalar multiplication of a vector x ∈ U with a number λ. There are now two ways of doing this: Either we calculate λ ⊙ x directly, or we use the map L to form Lx ∈ V , then multiply by λ, then use the inverse map $L^{-1} : V \to U$ to regain an element of U:
$$\begin{array}{ccc}
U & \xrightarrow{\;L\;} & V\\
{\scriptstyle \lambda\,\odot}\,\big\downarrow & & \big\downarrow\,{\scriptstyle \lambda\,\boxdot}\\
U & \xleftarrow[\;L^{-1}\;]{} & V
\end{array} \tag{4.2}$$
Linear Maps
Slide 117
Homomorphisms
The validity of (4.2) follows from
$$L^{-1}(\lambda \boxdot Lx) = L^{-1}\bigl( L(\lambda \odot x) \bigr) = \lambda \odot x.$$
From now on, we will use · instead of symbols like ⊙ or ⊡ and + instead of ⊕ or ⊞. It is up to the reader (you!) to determine which operation in which space is indicated.
Since linear maps have this important property of structure preservation, they deserve an appropriately fancy name: they are also known as (vector space) homomorphisms (the Greek prefix homo means "same", while morphos means "shape"). Thus, "homomorphism" and "linear map" both denote the same thing.
In fact, linear maps are so intertwined with the linear structure of vector spaces that in the finite-dimensional case it suffices to know how a linear map acts on basis vectors to determine it completely.
Linear Maps
Slide 118
Homomorphisms and Finite-Dimensional Spaces
4.4. Theorem. Let U, V be real or complex vector spaces and $(b_1, \dots, b_n)$ a basis of U (in particular, it is assumed that dim U = n < ∞). Then for every n-tuple $(v_1, \dots, v_n) \in V^n$ there exists a unique linear map L : U → V such that $Lb_k = v_k$, $k = 1, \dots, n$.
Proof.
We first show the uniqueness of L: Assume there exists a second homomorphism M ∈ L(U, V ) with $Mb_k = v_k$. For any u ∈ U we have numbers $\lambda_1, \dots, \lambda_n$ such that $u = \sum \lambda_k b_k$. Then
$$Lu = L(\lambda_1 b_1 + \dots + \lambda_n b_n) = \sum_{k=1}^n \lambda_k L(b_k) = \sum_{k=1}^n \lambda_k v_k = \sum_{k=1}^n \lambda_k M(b_k) = M(\lambda_1 b_1 + \dots + \lambda_n b_n) = Mu.$$
Since this is true for any u ∈ U, we have L = M.
Linear Maps
Slide 119
Homomorphisms and Finite-Dimensional Spaces
Proof (continued).
We now prove the existence of such a linear map, i.e., given the tuple $(v_1, \dots, v_n)$ we want to show how to define L. We define L by defining it for each u ∈ U. Every u ∈ U has a unique basis decomposition $u = \sum \lambda_k b_k$ with numbers $\lambda_1, \dots, \lambda_n \in \mathbb{F}$. We hence define Lu in the obvious way,
$$Lu := \sum_{k=1}^n \lambda_k v_k.$$
It remains to check that L is linear: if u, u′ ∈ U have coordinates $(\lambda_k)_{k=1}^n$ and $(\lambda_k')_{k=1}^n$, respectively, we have
$$L(u + u') = \sum_{k=1}^n (\lambda_k + \lambda_k') v_k = \sum_{k=1}^n \lambda_k v_k + \sum_{k=1}^n \lambda_k' v_k = Lu + Lu'.$$
The homogeneity of L can be shown similarly.
Linear Maps
Slide 120
Coordinate Map
4.5. Remarks.
(i) The identity map id : V → V, id(v) = v, is linear.
(ii) The set L(U, V) is again a vector space when endowed with pointwise addition and scalar multiplication.
(iii) If L1 ∈ L(U, V) and L2 ∈ L(V, W), then L2 ∘ L1 ∈ L(U, W). (The composition of linear maps is linear.)
4.6. Examples.
(i) If V is a real or complex vector space and (b1, …, bn) a basis of V, then the coordinate map

\[ \varphi : V \to F^n, \qquad v = \sum_{k=1}^n \lambda_k b_k \mapsto \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} \]

is linear (and bijective).
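In Mathematica, the coordinates of v with respect to a basis can be computed by solving a linear system whose coefficient matrix has the basis vectors as columns. A minimal sketch with illustrative values (b1, b2 and v are not from the text):

b1 = {1, 1}; b2 = {1, -1}; v = {3, 1};
LinearSolve[Transpose[{b1, b2}], v]    (* {2, 1}: v = 2 b1 + 1 b2 *)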
Linear Maps
Slide 121
Dual Space
4.7. Examples.
(ii) Let V be a real or complex vector space. Then L(V, F) is known as the dual space of V and denoted by V*. The dual space of V is of course itself a vector space.
Let dim V = n < ∞ and B = (b1, …, bn) be a basis of V. Then for every k = 1, …, n there exists a unique linear map

\[ b_k^* : V \to F, \qquad b_k^*(b_j) = \delta_{jk} = \begin{cases} 1, & j = k, \\ 0, & j \neq k. \end{cases} \]

It turns out (see exercises) that the tuple of maps B* = (b1*, …, bn*) is a basis of V* = L(V, F) (called the dual basis of B) and thus dim V* = dim V = n.
Linear Maps
Slide 122
Range and Kernel
4.8. Definition. Let U, V be real or complex vector spaces and L ∈ L(U, V). Then we define the range of L by

\[ \operatorname{ran} L := \{ v \in V : \exists u \in U,\ v = Lu \} \]

and the kernel of L by

\[ \ker L := \{ u \in U : Lu = 0 \}. \]

It is easy to see that ran L ⊂ V and ker L ⊂ U are subspaces.
4.9. Remark. It is not difficult to see that L ∈ L(U, V) is injective if and only if ker L = {0}.
Linear Maps
Slide 123
Nomenclature
According to their properties, there are several fancy names for linear maps. A homomorphism L ∈ L(U, V) is said to be
I an isomorphism if L is bijective;
I an endomorphism if U = V;
I an automorphism if U = V and L is bijective;
I an epimorphism if L is surjective;
I a monomorphism if L is injective.
4.10. Remark. If L is an isomorphism, then its inverse L⁻¹ is also linear and hence also an isomorphism.
Linear Maps
Slide 124
Isomorphisms
4.11. Theorem. Let U, V be finite-dimensional vector spaces and L ∈ L(U, V). Then L is an isomorphism if and only if for every basis (b1, …, bn) of U the tuple (Lb1, …, Lbn) is a basis of V.
Proof.
(⇒) Assume that L is bijective. Then for y ∈ V the pre-image x = L⁻¹y is uniquely determined. Let x = Σ λk bk be the representation of x in the basis B = (b1, …, bn). Now

\[ y = L\Big( \sum_{k=1}^n \lambda_k b_k \Big) = \sum_{k=1}^n \lambda_k \cdot L b_k, \]

where the λk are uniquely determined by x, which is uniquely determined by y. Thus for any y we can find a representation in terms of (Lb1, …, Lbn) by considering the pre-image x = L⁻¹y.
Linear Maps
Slide 125
Isomorphisms
Proof (continued).
We still need to show that this representation is unique, i.e., if y = Σ μk · Lbk, then μk = λk. Applying L⁻¹, we see that

\[ L^{-1} y = x = \sum_{k=1}^n \lambda_k b_k, \qquad L^{-1} y = L^{-1} \sum_{k=1}^n \mu_k \cdot L b_k = \sum_{k=1}^n \mu_k b_k, \]

and because (b1, …, bn) is a basis we see that μk = λk.
(⇐) We need to show that L is injective and surjective. Since any y ∈ V may be written as y = Σ λk · Lbk, y is obviously the image of x = Σ λk bk ∈ U. Thus L is surjective.
To see that L is injective, we show that ker L = {0} (see Remark 4.9). Now Lx = 0 for x = Σ λk bk implies Σ λk · Lbk = 0. Since (Lb1, …, Lbn) is a basis, this means that λ1 = ⋯ = λn = 0, so x = 0.
Linear Maps
Slide 126
Isomorphisms
4.12. Definition. Two vector spaces U and V are called isomorphic, written U ≅ V, if there exists an isomorphism φ : U → V.
4.13. Lemma. Two finite-dimensional vector spaces U and V are isomorphic if and only if they have the same dimension:

U ≅ V  ⇔  dim U = dim V.

Proof.
(⇒) Let φ : U → V be an isomorphism and (b1, …, bn) a basis of U (dim U = n). Then (φ(b1), …, φ(bn)) is a basis of V and thus dim V = n = dim U.
(⇐) If (a1, …, an) and (b1, …, bn) are bases of U and V, respectively, define an isomorphism φ by φ(ak) = bk, k = 1, …, n.
Linear Maps
Slide 127
The Dimension Formula
We can now prove a deep and fundamental result on linear maps:
4.14. Dimension Formula. Let U, V be real or complex vector spaces, dim U < ∞. Let L ∈ L(U, V). Then

dim ran L + dim ker L = dim U.   (4.3)

Proof.
Let dim U =: n < ∞. Since ker L ⊂ U, we have dim ker L =: r ≤ n. We choose a basis (a1, …, ar) of the kernel, and use the Basis Completion Theorem 2.21 to construct a basis (a1, …, ar, ar+1, …, an) of U. Then for any x = Σ λk ak ∈ U,

\[ Lx = L(\lambda_1 a_1 + \cdots + \lambda_n a_n) = \lambda_{r+1} \underbrace{L a_{r+1}}_{=: b_1} + \cdots + \lambda_n \underbrace{L a_n}_{=: b_{n-r}}. \]

Thus ran L = span{b1, …, b_{n−r}}.
Linear Maps
Slide 128
The Dimension Formula
Proof.
We now claim that the vectors b1 ; : : : ; bn−r are independent; in that case
they form a basis of ran L and dim ran L = n − r , proving (4.3). Consider
the equality
0 = —1 b1 + · · · + —n−r bn−r = L(—1 ar +1 + · · · + —n−r an ):
(4.4)
If (4.4) holds, then —1 ar +1 + · · · + —n−r an ∈ ker L = span{a1 ; : : : ; ar }.
Thus, there exist –1 ; : : : ; –r such that
—1 ar +1 + · · · + —n−r an − (–1 a1 + · · · + –r ar ) = 0:
Since (a1 ; : : : ; an ) is a basis of U, we thence obtain
—1 = · · · = —n−r = 0;
–1 = · · · = –r = 0:
Thus (4.4) implies (4.5) and b1 ; : : : ; bn−r are independent.
(4.5)
Linear Maps
Slide 129
The Dimension Formula
4.15. Corollary. Let U; V be real or complex finite-dimensional vector spaces
with dim U = dim V . Then a linear map L ∈ L(U; V ) is injective if and
only if it is surjective.
Proof.

L injective ⇔ ker L = {0} ⇔ dim ker L = 0 ⇔ dim ran L = dim U = dim V ⇔ ran L = V ⇔ L surjective.
Linear Maps
Slide 130
Normed Vector Spaces and Bounded Linear Maps
4.16. Definition. Let (U, ∥·∥U) and (V, ∥·∥V) be normed vector spaces. Then a linear map L : U → V is said to be bounded if there exists some constant c > 0 (called a bound for L) such that

∥Lu∥V ≤ c · ∥u∥U  for all u ∈ U.   (4.6)

4.17. Remark. It can be shown that if U is a finite-dimensional vector space, then any linear map is bounded.
4.18. Examples.
1. The map Lα : R → R, x ↦ αx is bounded with c = |α|.
Linear Maps
Slide 131
Bounded Linear Maps
2. The map

\[ L : \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} 2 x_2 \\ -x_1 \end{pmatrix} \]

is linear and bounded. If we take ∥x∥₂ = √(x1² + x2²), we can see that c = 2 is a bound for L.
3. Take the space C¹([0, 1]) of the continuously differentiable functions on the interval [0, 1] and imbue it with the norm given by ∥f∥∞ = sup_{x∈[0,1]} |f(x)|. Then the map

d/dx : C¹([0, 1]) → C([0, 1]),  f ↦ f′

is not bounded. To see this, consider the functions fn(x) = e^{−nx} for n ∈ N. Clearly, ∥fn∥∞ = 1 but ∥fn′∥∞ = n. Since we can choose n as large as we like, there can exist no c > 0 such that ∥f′∥∞ ≤ c · ∥f∥∞ for all f.
Linear Maps
Slide 132
The Operator Norm
By (4.6), for every bounded linear map there exists an upper bound c > 0 such that

\[ \frac{\|Lu\|_V}{\|u\|_U} \le c \qquad \text{for } u \neq 0. \]

We are now interested in the least upper bound c.
4.19. Definition and Theorem. Let U, V be normed vector spaces. Then the set of bounded linear maps L(U, V) is also a vector space and

\[ \|L\| := \sup_{\substack{u \in U \\ u \neq 0}} \frac{\|Lu\|_V}{\|u\|_U} = \sup_{\substack{u \in U \\ \|u\|_U = 1}} \|Lu\|_V \tag{4.7} \]

defines a norm, the so-called operator norm or induced norm on L(U, V).
The proof of the norm properties is left to the reader. The operator norm also has the additional, very useful, property that

∥L2 L1∥ ≤ ∥L2∥ · ∥L1∥,  L1 ∈ L(U, V), L2 ∈ L(V, W).
Matrices
Slide 133
5. Matrices
Matrices
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 134
Matrices
Slide 135
A Calculus of Linear Maps
We have seen in Lemma 4.13 that two vector spaces are isomorphic if their dimensions are equal. In particular:
I Every real n-dimensional vector space is isomorphic to Rⁿ.
I Every complex n-dimensional vector space is isomorphic to Cⁿ ≅ R²ⁿ.
This means that if we can find a calculus for linear maps Rⁿ → Rᵐ, we can automatically treat maps from an n-dimensional space U to an m-dimensional space V:

           L
     U ─────────→ V
   φ1│            │φ2
     ↓            ↓
     Rⁿ ────────→ Rᵐ
           A

Here L ∈ L(U, V), φ1, φ2 are isomorphisms and A ∈ L(Rⁿ, Rᵐ). If

L = φ2⁻¹ ∘ A ∘ φ1,

we obtain all relevant properties of L (range, kernel) by analyzing A.
Matrices
Slide 136
A Calculus of Linear Maps
The word calculus means a “scheme of calculating” that transforms a
procedure that otherwise needs to be performed individually into an
algorithm that can be (easily) applied in general.
For example, the revolutionary aspect of Newton’s and Leibniz’s calculus was the fact that areas under curves, which earlier had to be calculated by hand for each individual type of curve, could suddenly be computed easily through inverse differentiation (by finding the primitive of a function).
This was exemplified by the fundamental theorem of calculus. As an
example, compare Exercises 4 and 19 of Chapter 14 of Spivak’s book with
the simplicity of applying the Fundamental Theorem of Calculus.
In the following, we will establish an analogous calculus for linear maps,
where we are able, due to Lemma 4.13, to concentrate on those in
L(Rn ; Rm ).
Matrices
Slide 137
Matrices
5.1. Definition. An m × n matrix over the complex numbers is a map

a : {1, …, m} × {1, …, n} → C,  (i, j) ↦ a_{ij}.

We represent the graph of a through

\[ A := \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij})_{\substack{1 \le i \le m \\ 1 \le j \le n}}. \]

We denote the set of all m × n matrices over C by Mat(m × n, C).
5.2. Remarks.
1. With the usual pointwise addition and scalar multiplication of maps, Mat(m × n, C) becomes a complex vector space.
2. Matrices over R instead of C are defined in the same way. Occasionally, we may also replace C by a real or complex vector space.
Matrices
Slide 138
Matrices as Linear Maps
Matrices turn out to be important tools in the analysis of linear maps:
every linear map between finite-dimensional vector spaces may be
expressed as a matrix, and every matrix corresponds (in a certain way) to
some such linear map. We first restrict ourselves to the case of Rn .
5.3. Theorem. Each matrix A ∈ Mat(m × n, R) uniquely determines a linear map j(A) ∈ L(Rⁿ, Rᵐ) such that the columns a·k are the images of the standard basis vectors ek ∈ Rⁿ; in particular,

j : Mat(m × n, R) → L(Rⁿ, Rᵐ)

is an isomorphism, Mat(m × n, R) ≅ L(Rⁿ, Rᵐ), so every map L ∈ L(Rⁿ, Rᵐ) corresponds to a matrix j⁻¹(L) whose columns a·k are the images of the standard basis vectors ek ∈ Rⁿ.
Matrices
Slide 139
Matrices as Linear Maps
Proof.
Given a matrix A with columns a·k, k = 1, …, n, we simply define j(A) by

j(A) : Rⁿ → Rᵐ,  ek ↦ a·k,  k = 1, …, n.

Given a map L ∈ L(Rⁿ, Rᵐ), we define j⁻¹(L) ∈ Mat(m × n, R) by

j⁻¹(L) = (a·1, …, a·n),  a·k = L(ek),  k = 1, …, n.

Obviously, j⁻¹ is actually the inverse of j; hence j is bijective. It remains to show that j is linear. Let A = (aik), B = (bik). Then

j(A + B)ek = (a + b)·k = a·k + b·k = j(A)ek + j(B)ek,

so j is additive. The homogeneity can be shown analogously.
Matrices
Slide 140
Matrices as Linear Maps
We have thus established that every matrix A = (aik) represents a linear map j(A). In particular,

\[ j(A) e_k = \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix} = \sum_{i=1}^m a_{ik} e_i, \qquad k = 1, \dots, n. \]

We also note that we can represent x ∈ Rⁿ as

\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_n \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} = \sum_{k=1}^n x_k e_k. \]
Matrices
Slide 141
Matrices as Linear Maps
Then j(A) ∈ L(Rⁿ, Rᵐ) acts on a general x ∈ Rⁿ as follows,

\[ j(A)x = j(A)\Big( \sum_{k=1}^n x_k e_k \Big) = \sum_{k=1}^n x_k \, j(A) e_k = \sum_{k=1}^n x_k \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix} = \begin{pmatrix} x_1 a_{11} + \cdots + x_n a_{1n} \\ \vdots \\ x_1 a_{m1} + \cdots + x_n a_{mn} \end{pmatrix}. \]
Matrices
Slide 142
Matrices as Linear Maps
From a practical point of view, we start with A ∈ Mat(m × n, R) and some x ∈ Rⁿ and obtain

\[ \begin{pmatrix} x_1 a_{11} + \cdots + x_n a_{1n} \\ \vdots \\ x_1 a_{m1} + \cdots + x_n a_{mn} \end{pmatrix} \in \mathbb{R}^m. \]

It seems unnecessary to include the map j : Mat(m × n, R) → L(Rⁿ, Rᵐ) in this. In fact, we might directly interpret the matrix A as a linear map without mentioning j!
The isomorphism j can be simply left out; mathematicians routinely consider sets of objects that are isomorphic as being actually identical. In this way, A has a double meaning: it is on the one hand a matrix, and on the other hand a linear map. This avoids always mentioning a superfluous isomorphism and greatly simplifies the formulation of statements.
Matrices
Slide 143
Matrices as Linear Maps
We therefore write Ax instead of j(A)x; in particular,

\[ Ax = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 a_{11} + \cdots + x_n a_{1n} \\ \vdots \\ x_1 a_{m1} + \cdots + x_n a_{mn} \end{pmatrix}. \tag{5.1} \]

We can interpret (5.1) as the action of a matrix A ∈ Mat(m × n, R) on a vector x ∈ Rⁿ, yielding a vector Ax ∈ Rᵐ.
This is the beginning of our calculus of linear maps. We now need to develop this further to deal with (e.g.) compositions and inverses of linear maps.
Matrices
Slide 144
Compositions
Let Rⁿ −j(A)→ Rᵐ −j(B)→ Rˡ be linear maps and consider their composition j(B) ∘ j(A). We want to find a matrix C such that j(B) ∘ j(A) = j(C). Now

\[ j(B) \circ j(A)\, e_k = j(B) \sum_{s=1}^m a_{sk} e_s = \sum_{s=1}^m a_{sk}\, j(B) e_s = \sum_{s=1}^m a_{sk} \sum_{t=1}^l b_{ts} e_t = \sum_{t=1}^l \underbrace{\Big( \sum_{s=1}^m b_{ts} a_{sk} \Big)}_{=: c_{tk}} e_t, \]

where C = (c_{tk}) ∈ Mat(l × n, R). We thus introduce C as the matrix product of B and A.
Matrices
Slide 145
Matrix Product
5.4. Definition. Let A ∈ Mat(l × m, C) and B ∈ Mat(m × n, C). Then we define the product of A = (aik) and B = (bkj) by

\[ AB \in \operatorname{Mat}(l \times n, \mathbb{C}), \qquad AB := \Big( \sum_{k=1}^m a_{ik} b_{kj} \Big)_{\substack{i = 1, \dots, l \\ j = 1, \dots, n}}. \]

We have seen that the matrix product satisfies j(A) ∘ j(B) = j(AB). Furthermore, the product is associative, i.e.,

A(BC) = j⁻¹(j(A) ∘ j(BC)) = j⁻¹(j(A) ∘ (j(B) ∘ j(C))) = j⁻¹((j(A) ∘ j(B)) ∘ j(C)) = j⁻¹(j(AB) ∘ j(C)) = (AB)C.

If A, B ∈ Mat(n × n, C), both products AB and BA exist; however, in general

AB ≠ BA,

so the matrix product is not commutative.
Matrices
Slide 146
Matrix Product
The matrix product is easily memorized through “row-by-column multiplication,” as seen in the following examples:
5.5. Examples.

\[ 1.\ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},\ B = \begin{pmatrix} 5 & 6 & 7 \\ 1 & 0 & 2 \end{pmatrix},\quad AB = \begin{pmatrix} 1\cdot5+2\cdot1 & 1\cdot6+2\cdot0 & 1\cdot7+2\cdot2 \\ 3\cdot5+4\cdot1 & 3\cdot6+4\cdot0 & 3\cdot7+4\cdot2 \end{pmatrix} = \begin{pmatrix} 7 & 6 & 11 \\ 19 & 18 & 29 \end{pmatrix} \]

\[ 2.\ A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix},\ B = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix},\quad AB = \begin{pmatrix} 3 & 1 \\ 6 & 2 \end{pmatrix} \neq \begin{pmatrix} 4 & 4 \\ 1 & 1 \end{pmatrix} = BA \]

\[ 3.\ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ B = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix},\quad AB = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix} = BA \]

\[ 4.\ A = \begin{pmatrix} 0 & \alpha \\ 0 & 0 \end{pmatrix},\quad A^2 = AA = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \quad \forall\, \alpha \in \mathbb{C} \]
Matrices
Slide 147
Matrix Transpose
For A = (aij) ∈ Mat(m × n, F) we define the transpose of A by

Aᵀ ∈ Mat(n × m, F),  Aᵀ = (a_{ji}).

For example,

\[ \begin{pmatrix} 5 & 6 & 7 \\ 1 & 0 & 2 \end{pmatrix}^{\!T} = \begin{pmatrix} 5 & 1 \\ 6 & 0 \\ 7 & 2 \end{pmatrix}. \]

We also define the adjoint

A* ∈ Mat(n × m, F),  A* = Āᵀ = (ā_{ji}),

where in addition to the transpose the complex conjugate of each entry is taken.
It is easy to see (in the assignments) that for A ∈ Mat(m × n, F), x ∈ Fᵐ, y ∈ Fⁿ,

⟨x, Ay⟩ = ⟨A*x, y⟩.
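This identity is easy to test in Mathematica, where ConjugateTranspose implements A*; the matrix and vectors below are arbitrary illustrative values (the identity holds whichever slot of ⟨·,·⟩ carries the conjugate):

A = {{1 + I, 2, 0}, {3, I, 1 - I}};     (* A ∈ Mat(2 × 3, C) *)
x = {2 - I, 4}; y = {1, I, 3 + 2 I};    (* x ∈ C², y ∈ C³ *)
Conjugate[x].(A.y) == Conjugate[ConjugateTranspose[A].x].y    (* True *)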
Matrices
Slide 148
Matrices
In Mathematica, a matrix is defined as follows:
In[6]:=
A = TableAai ,j , 8i, 4<, 8j, 3<E
Out[6]=
88a1,1 , a1,2 , a1,3 <, 8a2,1 , a2,2 , a2,3 <, 8a3,1 , a3,2 , a3,3 <, 8a4,1 , a4
The MatrixForm command can be used for nicer formatting.
Matrices
Slide 149
Matrix Multiplication
Matrix multiplication works using the same dot as for the inner product:
In[3]:= A = {{1, 1}, {2, 2}};
        B = {{2, 1}, {1, 0}};
        MatrixForm[A.B]
Out[5]//MatrixForm=
( 3  1 )
( 6  2 )
The Transpose command gives the transpose:
Matrices
Slide 150
Matrix Multiplication
There are two very useful facts to keep in mind:
(i) When a vector x ∈ Rⁿ is multiplied by a matrix A ∈ Mat(m × n), the result is a linear combination of the column vectors of A. For illustration, in the case of n = 3 and m = 2, we have

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \end{pmatrix} = x_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} + x_3 \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}. \]

(ii) When a matrix B is multiplied by a matrix A, the result is a matrix whose columns are the products of the columns of B multiplied with A. Again, for illustration, we give a simple example on the next slide.
Matrices
Slide 151
Matrix Multiplication
Write

\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix} = (b_1, b_2, b_3), \]

where

\[ b_1 = \begin{pmatrix} b_{11} \\ b_{21} \end{pmatrix}, \qquad b_2 = \begin{pmatrix} b_{12} \\ b_{22} \end{pmatrix}, \qquad b_3 = \begin{pmatrix} b_{13} \\ b_{23} \end{pmatrix}. \]

Then

\[ AB = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix} = \begin{pmatrix} a_{11} b_{11} + a_{12} b_{21} & a_{11} b_{12} + a_{12} b_{22} & a_{11} b_{13} + a_{12} b_{23} \\ a_{21} b_{11} + a_{22} b_{21} & a_{21} b_{12} + a_{22} b_{22} & a_{21} b_{13} + a_{22} b_{23} \end{pmatrix} = (A b_1, A b_2, A b_3). \]
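Both facts can be checked symbolically in Mathematica; a quick sketch with generic entries, where A[[All, k]] extracts the kth column:

A = {{a11, a12, a13}, {a21, a22, a23}}; x = {x1, x2, x3};
A.x == x1 A[[All, 1]] + x2 A[[All, 2]] + x3 A[[All, 3]]    (* True *)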
Matrices
Slide 152
Matrix of a Linear Map
We are now able to properly define the matrix of a linear map between two finite-dimensional vector spaces.
Let U, V be finite-dimensional real or complex vector spaces with bases

A = (a1, …, an) ⊂ U  and  B = (b1, …, bm) ⊂ V.

Define the isomorphisms

φ_A : U → Rⁿ,  φ_A(aj) = ej,  j = 1, …, n,
φ_B : V → Rᵐ,  φ_B(bj) = ej,  j = 1, …, m.

Then any linear map L ∈ L(U, V) induces a matrix A = Φ_A^B(L) ∈ Mat(m × n, R) through

           L
     U ─────────→ V
   φA│            │φB
     ↓            ↓
     Rⁿ ────────→ Rᵐ
           A

Φ_A^B(L) = A = φ_B ∘ L ∘ φ_A⁻¹.
Matrices
Slide 153
Matrix of Complex Conjugation
5.6. Example. Consider C as a real two-dimensional vector space with basis B = (1, i). The complex conjugation L : C → C, z ↦ z̄ is then a linear map. We want to determine the matrix of this map with respect to the basis B. The isomorphism is

\[ \varphi_B : \mathbb{C} \to \mathbb{R}^2, \qquad 1 \mapsto \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad i \mapsto \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

Thus φ_B(a + bi) = (a, b)ᵀ. The most convenient way to determine A = Φ_B^B(L) is to calculate

\[ \varphi_B(a + bi) = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \varphi_B(L(a + bi)) = \varphi_B(a - bi) = \begin{pmatrix} a \\ -b \end{pmatrix} \]

and then find A ∈ Mat(2 × 2, R) such that A(a, b)ᵀ = (a, −b)ᵀ. It is easily seen that

\[ A = \Phi_B^B(L) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
Matrices
Slide 154
Matrix of Complex Conjugation
5.7. Example. If we change the basis we used in the previous example, we get a different matrix. Let us take the basis A = (1 + i, 1 − i) for C. Then the isomorphism is

\[ \varphi_A : \mathbb{C} \to \mathbb{R}^2, \qquad 1 + i \mapsto \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad 1 - i \mapsto \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

Thus

\[ \varphi_A(a + bi) = \varphi_A\Big( \tfrac{a+b}{2}(1 + i) + \tfrac{a-b}{2}(1 - i) \Big) = \frac{a+b}{2}\, \varphi_A(1+i) + \frac{a-b}{2}\, \varphi_A(1-i) = \frac{1}{2} \begin{pmatrix} a + b \\ a - b \end{pmatrix}. \]

Since φ_A(L(a + bi)) = φ_A(a − bi) = ½(a − b, a + b)ᵀ, we need to find A ∈ Mat(2 × 2, R) such that A(a + b, a − b)ᵀ = (a − b, a + b)ᵀ, i.e.,

\[ A = \Phi_A^A(L) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]
Matrices
Slide 155
Systems of Equations
Before we proceed, we take a step back to the beginning of the course. Recall that a system of linear equations was given by

\[ \begin{aligned} a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\ &\ \,\vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m \end{aligned} \tag{5.2} \]

We can express (5.2) using vectors and matrices by writing

Ax = b,

where

\[ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}. \]
Matrices
Slide 156
Elementary Matrix Manipulations
The Gauß-Jordan algorithm introduced elementary row manipulations, which we now reformulate in the context of matrices:
5.8. Elementary Matrix Manipulations. An elementary row manipulation of a matrix is one of the following:
(i) Swapping (interchanging) of two rows,
(ii) Multiplication of a row with a non-zero number,
(iii) Addition of a multiple of one row to another row.
The additions and multiplications are performed componentwise in each row.
If the word “row” is replaced by “column,” these operations are termed elementary column operations.
The Gauß-Jordan algorithm uses only row operations. We seek to find matrices that implement these row manipulations through multiplication of Ax = b from the left.
Matrices
Slide 157
Elementary Matrix Manipulations
For illustration, we consider the case n = 4, m = 3.
Consider the most trivial operation possible: we do nothing. This would be represented by multiplying with the unit matrix,

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \]

and

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix}. \]

Note that we do not need to mention x at all in these equations! This is the true underlying philosophy of the notational scheme used in the Gauß-Jordan algorithm.
Matrices
Slide 158
Elementary Matrix Manipulations
Now how would we swap the first and second row? We can see that

\[ \underbrace{\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}}_{=: S_{12}} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} b_2 \\ b_1 \\ b_3 \end{pmatrix} \]

and

\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix} = \begin{pmatrix} a_{21} & a_{22} & a_{23} & a_{24} \\ a_{11} & a_{12} & a_{13} & a_{14} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix}. \]

Note that we have swapped the first and second row of the unit matrix to obtain S12!
Matrices
Slide 159
Elementary Matrix Manipulations
Furthermore, in order to add 3 times the second row to the third row, we would use

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 + 3 b_2 \end{pmatrix} \]

and

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} + 3a_{21} & a_{32} + 3a_{22} & a_{33} + 3a_{23} & a_{34} + 3a_{24} \end{pmatrix}. \]

Again we have performed the elementary row operation on the unit matrix to obtain the matrix that implements this operation.
Matrices
Slide 160
Elementary Matrix Manipulations
In conclusion, we remark that
(i) An elementary row operation on a system of equations may simply be considered as a multiplication of Ax = b from the left with a suitable matrix, a so-called elementary matrix.
(ii) An elementary matrix is obtained by applying the desired elementary row operation to the unit matrix. [Why must this be the case?]
(iii) If we apply two elementary operations, the product of their respective matrices gives the matrix corresponding to these two operations, in order.
This means that in solving a system Ax = b, the combined effect of all row operations in forward elimination and backward substitution may be represented by a single matrix S ∈ Mat(m × m, R). We thus have

SAx = Sb.

If m = n, the system Ax = b may have a unique solution; in that case, SA is a diagonal matrix of the form (1.3), i.e., SA = id.
Matrices
Slide 161
Inverse of a Matrix
Let us now return to the question of finding the inverse of a linear map A : Rⁿ → Rⁿ (of course, we must assume that A is an isomorphism, so m = n). Of course, we say that a matrix is invertible if the corresponding linear map is invertible, and the inverse of a matrix is just the matrix of the inverse map. However, it may be useful to clarify this.
5.9. Definition. A matrix A ∈ Mat(n × n, R) is called invertible if there exists some B ∈ Mat(n × n, R) such that

\[ AB = BA = \operatorname{id} = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}. \tag{5.3} \]

We then write B = A⁻¹ and say that A⁻¹ is the inverse of A.
5.10. Remark. The inverse is of course unique; if B and B̃ both satisfy (5.3) for some A, then

B = (B̃A)B = B̃(AB) = B̃.
Matrices
Slide 162
Inverse of a Matrix
5.11. Remark. It is obvious that the matrix S corresponding to a series of elementary row manipulations will be invertible, because the operations themselves are invertible. Thus the matrix S : Rᵐ → Rᵐ represents an isomorphism.
5.12. Remark. Given a matrix A ∈ Mat(n × n, R) (identified with a linear map L ∈ L(Rⁿ, Rⁿ)) and a putative inverse matrix B = A⁻¹ ∈ Mat(n × n, R), it is sufficient to verify that

BA = id.

In this case, B corresponds to a linear map M such that M ∘ L is the identity map. Thus dim ran M = n, so M is surjective and hence, by Corollary 4.15, bijective. Hence M is invertible and L = M⁻¹ is bijective. Then L ∘ M = M ∘ L = id, so we have AB = BA = id.
Matrices
Slide 163
Inverse of a Matrix
5.13. Lemma. Let A ∈ L(Rⁿ, Rⁿ). Then A is invertible if and only if there exists an elementary matrix S corresponding to elementary row operations that transform A into the unit matrix, SA = id.
Proof.
(⇒) If A is bijective, for every y ∈ Rⁿ there exists a unique solution x to Ax = y. Thus there exists a matrix S corresponding to row operations such that

SAx = x = Sy.

Since for every x there exists a unique y such that y = Ax, we have SAx = x for every x ∈ Rⁿ, and so SA = id.
(⇐) By Remark 5.12, SA = AS = id, and by Remark 5.10, S = A⁻¹, so A is invertible.
Matrices
Slide 164
Finding the Inverse
Lemma 5.13 tells us how to actually find the inverse of a matrix A: it is simply the elementary matrix that transforms A into the unit matrix. If this transformation is not possible, A is not invertible.
5.14. Example. Consider the matrix

\[ A = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}. \]

In order to find the inverse, we transform A into the unit matrix through a sequence of elementary row operations S, keeping track of the elementary matrix that implements these operations.
Matrices
Slide 165
Finding the Inverse
We track the transformed matrix SA and the accumulated elementary matrix S side by side; first we subtract row 1 from row 2 and divide row 1 by 2, then we add 3/4 of row 2 to row 1 and divide row 2 by −2:

\[ \begin{array}{cc} SA & S \\[4pt] \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} & \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \\[10pt] \begin{pmatrix} 1 & 3/2 \\ 0 & -2 \end{pmatrix} & \begin{pmatrix} 1/2 & 0 \\ -1 & 1 \end{pmatrix} \\[10pt] \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \underbrace{\begin{pmatrix} -1/4 & 3/4 \\ 1/2 & -1/2 \end{pmatrix}}_{= A^{-1}} \end{array} \]

We may immediately check that

\[ A^{-1} A = \begin{pmatrix} -1/4 & 3/4 \\ 1/2 & -1/2 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A A^{-1} = \begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} -1/4 & 3/4 \\ 1/2 & -1/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
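The same elimination can be reproduced in Mathematica by row-reducing the augmented matrix (A | id); a minimal sketch:

RowReduce[{{2, 3, 1, 0}, {2, 1, 0, 1}}]
(* {{1, 0, -1/4, 3/4}, {0, 1, 1/2, -1/2}}: the right half is A⁻¹ *)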
Matrices
Slide 166
Matrix Inverse
Mathematica has a command for finding the inverse:

In[7]:= MatrixForm[Inverse[{{2, 3}, {2, 1}}]]
Out[7]//MatrixForm=
( -1/4   3/4 )
(  1/2  -1/2 )
Matrices
Slide 167
Inverse Maps
5.15. Remark. We note that if A, B ∈ Mat(n × n, R) are invertible, then so is their product AB ∈ Mat(n × n, R) and (AB)⁻¹ = B⁻¹A⁻¹.
We can use this procedure to find the inverse of any vector space isomorphism L:

           L
     U ─────────→ V
   φA│            │φB
     ↓            ↓
     Rⁿ ────────→ Rⁿ
           A

L⁻¹ = φ_A⁻¹ ∘ A⁻¹ ∘ φ_B.

5.16. Example. Let P2 be the space of polynomials of degree not more than 2. Consider the linear map

\[ L : P_2 \to P_2, \qquad ax^2 + bx + c \mapsto \frac{a+b+c}{3}\, x^2 + \frac{a+b}{2}\, x + \frac{a-c}{2}. \]
Matrices
Slide 168
Inverse Maps
We choose a basis (any will do) of P2: B = (x², x, 1). Then

\[ \varphi_B(ax^2 + bx + c) = \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \qquad \varphi_B(L(ax^2 + bx + c)) = \begin{pmatrix} \frac{a+b+c}{3} \\[2pt] \frac{a+b}{2} \\[2pt] \frac{a-c}{2} \end{pmatrix}. \]

We can read off that

\[ A = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/2 & 1/2 & 0 \\ 1/2 & 0 & -1/2 \end{pmatrix} \]

and (with a little bit of work) calculate

\[ A^{-1} = \begin{pmatrix} 3 & -2 & 2 \\ -3 & 4 & -2 \\ 3 & -2 & 0 \end{pmatrix}. \]
Matrices
Slide 169
Inverse Maps
Now we are able to calculate the inverse of L:

\[ \begin{aligned} L^{-1}(ax^2 + bx + c) &= \varphi_B^{-1} \circ A^{-1} \circ \varphi_B (ax^2 + bx + c) = \varphi_B^{-1} \circ A^{-1} \begin{pmatrix} a \\ b \\ c \end{pmatrix} \\ &= \varphi_B^{-1} \begin{pmatrix} 3 & -2 & 2 \\ -3 & 4 & -2 \\ 3 & -2 & 0 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \varphi_B^{-1} \begin{pmatrix} 3a - 2b + 2c \\ -3a + 4b - 2c \\ 3a - 2b \end{pmatrix} \\ &= (3a - 2b + 2c)x^2 + (-3a + 4b - 2c)x + 3a - 2b. \end{aligned} \]
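As a check, the matrix inverse can be recomputed in Mathematica and applied to the coordinate vector (a, b, c); a quick sketch:

A = {{1/3, 1/3, 1/3}, {1/2, 1/2, 0}, {1/2, 0, -1/2}};
Inverse[A]              (* {{3, -2, 2}, {-3, 4, -2}, {3, -2, 0}} *)
Inverse[A].{a, b, c}    (* the coordinates of L⁻¹(ax² + bx + c) in the basis B *)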
Matrices
Slide 170
Changes of Basis
Suppose that in Rⁿ we are given an initial basis B = (e1, …, en), e.g., the standard basis. A vector x ∈ Rⁿ has the representation

\[ x = \sum_{i=1}^n x_i e_i. \]

We now wish to represent x in terms of a new basis, B′ = (e1′, …, en′). Let us suppose that T is the linear map such that Tei = ei′, i = 1, …, n. Then T is uniquely defined and invertible. If (e1, …, en) is the standard basis, then T may be represented as

T = (e1′, …, en′).
Matrices
Slide 171
Changes of Basis
5.17. Example. Consider a rotation by 45° in the clockwise direction,

\[ T : \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} 1 \\ 0 \end{pmatrix} \mapsto \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mapsto \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \]

so

\[ T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]

[Figure: the vector x decomposed as x1·e1 + x2·e2 in the standard basis and as x1·e1′ + x2·e2′ in the rotated basis.]
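In Mathematica, the built-in RotationMatrix confirms this; since RotationMatrix rotates counterclockwise, the clockwise rotation by 45° corresponds to the angle −π/4:

T = {{1, 1}, {-1, 1}}/Sqrt[2];
T == RotationMatrix[-Pi/4]    (* True *)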
Matrices
Slide 172
Changes of Basis
Suppose that

\[ x = \sum_{i=1}^n x_i e_i = \sum_{i=1}^n x_i' e_i' = \sum_{i=1}^n x_i' \, T e_i. \]

Then

\[ T^{-1} x = \sum_{i=1}^n x_i' e_i, \]

and we can find the coordinates xi′, i = 1, …, n, of x with respect to B′ simply by applying T⁻¹.

[Figure: the vector x with its coordinates in the old basis, and T⁻¹x carrying the same coordinates with respect to the standard basis.]
Matrices
Slide 173
Active and Passive Points of View
We can therefore implement the passive change of basis T for x by the active action of T⁻¹ on x.
In the passive point of view, we are not doing anything: x stays exactly the same; it is simply re-written in terms of another basis.
In the active point of view, something is happening: we are applying a linear map to x.
Both points of view are equally valid. The active point of view sometimes appears easier, while the passive point of view is often more elegant.
Matrices
Slide 174
Reflection in R2
5.18. Example. Consider the reflection of vectors in R² by the x1 axis, i.e., the map

\[ A : \mathbb{R}^2 \to \mathbb{R}^2, \qquad \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \mapsto \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}. \]

The matrix representation of A is simply

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{5.4} \]

Now we want to consider the reflection in R² about the line through the vector

\[ y = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]

Of course, the correct matrix for this reflection can be found geometrically. However, here we want to illustrate how a change of basis can help us determine this matrix algebraically.
Matrices
Slide 175
A Suitable Basis for the Reflection
Denote by L the reflection about the line through y = e1′ = (1, 2)ᵀ. A vector perpendicular to y is e2′ = (−2, 1)ᵀ, and so we can choose A = (e1′, e2′) as a basis. These vectors have the property that

Le1′ = e1′  and  Le2′ = −e2′,

so in this basis the action of L is known and simple.
Now e1′ = Te1 and e2′ = Te2, where

\[ T = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}, \]

and we note that

\[ T^{-1} = \frac{1}{5} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}. \]

The strategy for calculating the action of the reflection L is now as follows:
1. Change to the basis (e1′, e2′);
2. Execute the reflection in this basis. It is given by the matrix A of (5.4);
3. Change back to the basis (e1, e2).
Matrices
Slide 176
Active Point of View
The basis change can be implemented actively by applying T⁻¹; we then apply A, and then change back (actively) by applying T to x:

\[ L = TAT^{-1} = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \frac{1}{5} \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = \frac{1}{5} \begin{pmatrix} -3 & 4 \\ 4 & 3 \end{pmatrix}. \]

It is easily verified that Le1′ = e1′ and Le2′ = −e2′, as expected.
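The computation is easily reproduced in Mathematica; a sketch of the verification:

T = {{1, -2}, {2, 1}}; A = {{1, 0}, {0, -1}};
L = T.A.Inverse[T]       (* {{-3/5, 4/5}, {4/5, 3/5}} *)
{L.{1, 2}, L.{-2, 1}}    (* {{1, 2}, {2, -1}}: e1′ is fixed, e2′ is reversed *)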
Matrices
Slide 177
Passive Point of View
In the passive point of view we regard L : V → V where V = R² is imbued with the basis B′ = (e1′, e2′). We then find the representing matrix A for L with respect to this basis:

            L
     R² ─────────→ R²
   φB′│            │φB′
      ↓            ↓
     R² ─────────→ R²
            A

Of course, φ_{B′} = T⁻¹ maps (e1′, e2′) to (e1, e2). Then it is easy to see that

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

and therefore L = TAT⁻¹ as above.
Theory of Systems of Linear Equations
Slide 178
6. Theory of Systems of Linear Equations
Theory of Systems of Linear Equations
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 179
Theory of Systems of Linear Equations
Slide 180
The Solution Set of Systems of Equations
We briefly return to the theory of solvability of linear systems of equations Ax = b. We define the solution set

Sol(A, b) = {x ∈ Rⁿ : Ax = b}.

If x0 ∈ Rⁿ satisfies Ax0 = b, we say that x0 is a particular solution of Ax = b. The associated homogeneous solution set is

Sol(A, 0) = {x ∈ Rⁿ : Ax = 0} = ker A.

A very important, fundamental result states:
The solution set of Ax = b is the sum of the homogeneous solution set and a particular solution.
Theory of Systems of Linear Equations
Slide 181
Structure of the Solution Set
6.1. Lemma. Let x0 ∈ Rⁿ be a particular solution of Ax = b. Then

Sol(A, b) = {x0} + ker A = {y ∈ Rⁿ : y = x0 + x, x ∈ ker A},

where the sum of sets is understood as in Definition 2.24.
Proof.
(i) Sol(A, b) ⊃ {x0} + ker A: Let x ∈ ker A. Then

A(x0 + x) = Ax0 + Ax = Ax0 = b,

so x0 + x ∈ Sol(A, b).
(ii) Sol(A, b) ⊂ {x0} + ker A: Let v ∈ Sol(A, b). Then

A(v − x0) = Av − Ax0 = b − b = 0,

so v − x0 ∈ ker A, implying v ∈ {x0} + ker A.
Theory of Systems of Linear Equations
Slide 182
Solvability of Systems of Equations
The following results follow immediately:
6.2. Corollary. If x0 is a solution of Ax = b and {v1, …, vr} a basis of ker A, then

Sol(A, b) = {x ∈ Rⁿ : x = x0 + λ1 v1 + ⋯ + λr vr, λ1, …, λr ∈ R}.

Here r = dim ker A.
6.3. Corollary. Suppose that the linear system of equations Ax = b has a solution. Then the solution is unique if and only if ker A = {0}.
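In Mathematica, LinearSolve returns one particular solution and NullSpace a basis of the kernel, so together they describe Sol(A, b) completely. A sketch with an illustrative underdetermined system:

A = {{1, 2, 3}, {4, 5, 6}}; b = {1, 1};
x0 = LinearSolve[A, b]    (* one particular solution, e.g. {-1, 1, 0} *)
NullSpace[A]              (* {{1, -2, 1}}: Sol(A, b) = x0 + λ {1, -2, 1} *)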
Theory of Systems of Linear Equations
Slide 183
Solvability of Systems of Equations
This gives rise to a further, fundamentally important result:
6.4. Fredholm Alternative. Let A be an n × n matrix. Then
I either Ax = b has a unique solution for any b ∈ Rⁿ,
I or Ax = 0 has a non-trivial solution.
Proof.
Either ker A = {0} (in which case Ax = b has the solution x = A⁻¹b for any b ∈ Rⁿ) or any x0 ∈ ker A \ {0} is a non-trivial solution of Ax = 0.
The Fredholm alternative occurs in many more complicated contexts. This is the most basic case.
Theory of Systems of Linear Equations
Slide 184
Matrix Rank
6.5. Definition. Let A ∈ Mat(m × n, F) be a matrix with columns a·j ∈ Fᵐ, 1 ≤ j ≤ n, and rows ai· ∈ Fⁿ, 1 ≤ i ≤ m. Then we define
I the column rank of A to be

column rank A := dim span{a·1, …, a·n}

I and the row rank of A to be

row rank A := dim span{a1·, …, am·}.

6.6. Remarks.
I The column rank is the greatest number of independent column vectors a·j that can be selected from all columns. This is analogously true for the row rank.
I column rank A = row rank Aᵀ.
I column rank A = dim ran A.
Theory of Systems of Linear Equations
Slide 185
Matrix Rank
6.7. Definition and Theorem. Let A ∈ Mat(m × n, F). Then the column rank is equal to the row rank, and we define the rank of A by

rank A := column rank A = row rank A.

Proof.
In the assignments it will be shown that

ran A* = (ker A)^⊥,  where A* = Āᵀ.

Then, using Corollary 3.24 and the dimension formula (4.3),

row rank A = column rank Aᵀ = column rank Āᵀ = dim ran A* = dim(ker A)^⊥ = n − dim ker A = dim ran A = column rank A.

Here we have used that complex conjugation is a linear, bijective map C → C if C is regarded as a real vector space, so conjugating a matrix does not change its rank.
Theory of Systems of Linear Equations
Slide 186
Existence of Solutions
The fundamental theorem on the existence of solutions to a linear system of equations is the following:
6.8. Theorem. There exists a solution x for Ax = b if and only if rank A = rank(A | b), where

\[ (A \mid b) = \begin{pmatrix} a_{11} & \dots & a_{1n} & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ a_{m1} & \dots & a_{mn} & b_m \end{pmatrix} \in \operatorname{Mat}(m \times (n+1)). \]
Theory of Systems of Linear Equations
Slide 187
Solvability of Systems of Equations
Proof.
We write A = (a·1, …, a·n), where the a·k ∈ Rᵐ are the column vectors of A. Then we use that the range of a matrix is the span of its column vectors and the rank is the dimension of the range, so

Ax = b has a solution x ∈ Rⁿ
⇔ b ∈ ran A
⇔ b ∈ span{a·1, …, a·n}
⇔ b is not independent of a·1, …, a·n
⇔ dim span{a·1, …, a·n} = dim span{a·1, …, a·n, b}
⇔ dim ran A = dim ran(A | b)
⇔ rank A = rank(A | b).
Theory of Systems of Linear Equations
Slide 188
Manipulating Matrices
A matrix is just a list of lists. The command Append is used to add elements to a list:

Append[{a, b, c, d}, x]
{a, b, c, d, x}

We want to use this to check rank A = rank(A | b). Define a matrix A and a vector b as follows:

A = Table[Subscript[a, i, j], {i, 2}, {j, 3}];
B = Table[Subscript[b, i], {i, 2}];
Print["A = ", MatrixForm[A], ", b = ", MatrixForm[B]]

A = (a1,1 a1,2 a1,3; a2,1 a2,2 a2,3), b = (b1; b2)
Theory of Systems of Linear Equations
Slide 189
Manipulating Matrices
Since a matrix is a list of row vectors, it is easy to add a row:

Append[A, {x, y, z}] // MatrixForm
(a1,1 a1,2 a1,3; a2,1 a2,2 a2,3; x y z)

To add a column, we could transpose, add a row, and transpose again:

Transpose[Append[Transpose[A], B]] // MatrixForm
(a1,1 a1,2 a1,3 b1; a2,1 a2,2 a2,3 b2)

However, the repeated transposition is inefficient and may cost significant computing resources for large matrices.
Theory of Systems of Linear Equations
Slide 190
Manipulating Matrices
There exists a specialized command to achieve the same result without transposition:

MapThread[Append, {A, B}] // MatrixForm
(a1,1 a1,2 a1,3 b1; a2,1 a2,2 a2,3 b2)

The rank of a matrix is found through the MatrixRank command.
Theory of Systems of Linear Equations
Slide 191
Manipulating Matrices
6.9. Example.

A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
MatrixRank[A]
2

b = {3, 4, 5};
MatrixRank[MapThread[Append, {A, b}]]
2

b = {3, 4, 6};
MatrixRank[MapThread[Append, {A, b}]]
3
Theory of Systems of Linear Equations
Slide 192
Manipulating Matrices
The kernel of a matrix is obtained from the NullSpace command:

NullSpace[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]
{{1, -2, 1}}

The output is a list of basis vectors of the kernel of the matrix.
Determinants
Slide 193
7. Determinants
Determinants
Linear Algebra
1. Systems of Linear Equations
2. Finite-Dimensional Vector Spaces
3. Inner Product Spaces
4. Linear Maps
5. Matrices
6. Theory of Systems of Linear Equations
7. Determinants
Slide 194
Determinants
Slide 195
Parallelograms
We will motivate determinants geometrically (as areas of parallelograms/volumes of parallelepipeds) rather than algebraically (via solutions of systems of linear equations).
Consider a parallelogram P(a, b) spanned by two non-collinear vectors a, b ∈ R².

[Figure: the parallelogram spanned by a and b, with the perpendicular vector a⊥ and the angle θ between a⊥ and b.]

We are interested in the area A(a, b) of the parallelogram, which is equal to the area of the rectangle with width |a| and height given by |b||cos θ|. Let a = (a1, a2), a⊥ = (−a2, a1). Then a ⊥ a⊥, i.e., ⟨a, a⊥⟩ = 0, and |a⊥| = |a|.
From (3.3) it follows that

\[ |b| \cos\theta = \Big\langle \frac{a^\perp}{|a^\perp|},\, b \Big\rangle. \]
Determinants
Slide 196
The Determinant in R2
We obtain

\[ A(a, b) = |a| \left| \Big\langle \frac{a^\perp}{|a^\perp|},\, b \Big\rangle \right| = \frac{|a|}{|a^\perp|}\, |\langle a^\perp, b \rangle| = |\langle a^\perp, b \rangle| = |a_1 b_2 - a_2 b_1|. \]

We remark that

A(a, b) = |⟨a⊥, b⟩| = |a||b| sin ∠(a, b).

We define the determinant as a map

\[ \det : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}, \qquad \det\!\left( \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}\!, \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \right) = a_1 b_2 - a_2 b_1, \tag{7.1} \]

so that A(a, b) = |det(a, b)|. The determinant is an oriented area.
Equivalently, the determinant may be regarded as a map

\[ \det : \operatorname{Mat}(2 \times 2, \mathbb{R}) \to \mathbb{R}, \qquad \det \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = a_1 b_2 - a_2 b_1. \tag{7.2} \]
Determinants
Slide 197
Properties of the Determinant
Both interpretations of the determinant will be used frequently.
7.1. Remark. We note the following properties of the determinant:
1. det is normed, i.e.,

\[ \det(e_1, e_2) = \det \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = 1. \]

2. det is bilinear:

det(λa, b) = λ det(a, b) = det(a, λb),
det(a + b, c) = det(a, c) + det(b, c),
det(a, b + c) = det(a, b) + det(a, c).

This can be easily seen geometrically by considering the areas of the parallelograms.
Determinants
Slide 198
Properties of the Determinant
3. det is alternating, i.e., det(a, a) = a1a2 − a2a1 = 0. Note that this implies that det(a, b) = −det(b, a), since (using the bilinearity)

0 = det(a + b, b + a) = det(a, a) + det(b, b) + det(a, b) + det(b, a) = det(a, b) + det(b, a).

(In the case of two variables, an alternating map is often called antisymmetric.)
Determinants
Slide 199
Vector Product in R3
We now introduce the “vector product” a × b of two vectors a, b ∈ R³. The vector a × b ∈ R³ is determined by
1. its length: we set |a × b| = A(a, b), the area of the parallelogram spanned by a and b (if a and b are linearly dependent, we set a × b = 0);
2. its direction: we want a × b to be orthogonal to a and b; in other words, a × b ⊥ span{a, b};
3. its orientation: (a, b, a × b) should form a “right-hand system” (defined using the thumb, index finger and middle finger of the right hand).
This is sufficient to define a unique vector a × b for a, b ∈ R³, i.e., we have a map × : R³ × R³ → R³.
7.2. Remark. In contradistinction to the scalar product, which can be defined on Rⁿ for n = 1, 2, …, the vector product is only defined on R³.
Rechte Hand Regel [modified]. Wikimedia Commons. Wikimedia Foundation. Web. 9 May 2012
Determinants
Slide 200
The Right Hand Rule
Determinants
Slide 201
Properties of the Vector Product in R3
Note that the vector product is
1. bilinear: the homogeneity (λa) × b = λ(a × b) = a × (λb) follows from the definition of the cross product; the additivity a × (b + c) = a × b + a × c, (a + b) × c = a × c + b × c is easy to see geometrically when a, b, c are coplanar and slightly more difficult to show when they are not.
2. antisymmetric: a × a = 0, or a × b = −b × a.
We can compute the vector product of the standard basis vectors with each other:

e1 × e2 = e3 = −e2 × e1,
e2 × e3 = e1 = −e3 × e2,
e3 × e1 = e2 = −e1 × e3,
e1 × e1 = 0 = e2 × e2 = e3 × e3.   (7.3)
Determinants
Slide 202
Calculating the Vector Product in R3
Using the bilinearity and (7.3), we can now calculate a × b for arbitrary a, b ∈ R³:

\[ \begin{aligned} a \times b &= (a_1 e_1 + a_2 e_2 + a_3 e_3) \times (b_1 e_1 + b_2 e_2 + b_3 e_3) = \sum_{i,j=1}^3 a_i b_j \,(e_i \times e_j) \\ &= (a_2 b_3 - a_3 b_2) e_1 + (a_3 b_1 - a_1 b_3) e_2 + (a_1 b_2 - a_2 b_1) e_3 = \begin{pmatrix} +\det\begin{pmatrix} a_2 & b_2 \\ a_3 & b_3 \end{pmatrix} \\[4pt] -\det\begin{pmatrix} a_1 & b_1 \\ a_3 & b_3 \end{pmatrix} \\[4pt] +\det\begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \end{pmatrix}. \end{aligned} \tag{7.4} \]
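Mathematica’s built-in Cross implements exactly this formula; a quick symbolic sketch:

Cross[{a1, a2, a3}, {b1, b2, b3}]
(* {a2 b3 - a3 b2, a3 b1 - a1 b3, a1 b2 - a2 b1}, as in (7.4) *)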
Determinants
Slide 203
Parallelepipeds
We now consider the problem of finding the volume of a parallelepiped spanned by three vectors a, b, c ∈ R³.
The volume is given by the base area (the area of the parallelogram spanned by a, b) multiplied with the height, |c||cos θ|. Using the fact that a × b is orthogonal to a and b, we have

\[ |c| \cos\theta = \Big\langle \frac{a \times b}{|a \times b|},\, c \Big\rangle, \]

so the volume is given by

\[ V(a, b, c) = |a \times b| \left| \Big\langle \frac{a \times b}{|a \times b|},\, c \Big\rangle \right| = |\langle a \times b, c \rangle|. \]
Determinants
Slide 204
The Determinant in R3
We therefore define the determinant as an oriented volume,

det : R³ × R³ × R³ → R,  det(a, b, c) = ⟨a × b, c⟩.   (7.5)

(Again, we note that it can be equivalently defined Mat(3 × 3, R) → R.) Note that

det(a, b, c) > 0  if (a, b, c) form a right-hand system,   (7.6)
det(a, b, c) < 0  if (a, b, c) form a left-hand system,   (7.7)
det(a, b, c) = 0  if a = λb or a = λc or b = λc for any λ ∈ R.   (7.8)

The last property follows from the properties of the vector and scalar products: if a = λb then a × b = 0, and since a × b is orthogonal to both a and b, the scalar product ⟨a × b, c⟩ will vanish if a = λc or b = λc.
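Definition (7.5) can be checked symbolically against the matrix interpretation; in the sketch below the vectors a, b, c become the columns of a 3 × 3 matrix:

a = {a1, a2, a3}; b = {b1, b2, b3}; c = {c1, c2, c3};
Simplify[Det[Transpose[{a, b, c}]] == Cross[a, b].c]    (* True *)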
Determinants
Slide 205
Cyclic Permutations
Let (x1, …, xn) be an ordered list of elements. Define a relation ≺ by x1 ≺ x2 ≺ x3 ≺ ⋯ ≺ xn−1 ≺ xn ≺ x1 (“x1 precedes x2 precedes x3 etc.”). Let π : {x1, …, xn} → {x1, …, xn} be a bijective map (such a map is called a permutation). Then a list (π(x1), …, π(xn)) is called a cyclic permutation of (x1, …, xn) if π(x1) ≺ π(x2) ≺ ⋯ ≺ π(xn) ≺ π(x1).
Furthermore, if (a, b, c) form a right-hand system, then V(a, b, c) = det(a, b, c). Since the volume is independent of the designation of the vectors, we observe that a cyclic permutation of (a, b, c) preserves the right-handedness and

det(a, b, c) = det(c, a, b) = det(b, c, a),  or  ⟨a × b, c⟩ = ⟨c × a, b⟩ = ⟨b × c, a⟩.
Determinants
Slide 206
Calculating Determinants in R3
Note that by (7.4),

\[ \begin{aligned} \det(a, b, c) = \langle b \times c, a \rangle &= \sum_{i=1}^3 a_i (b \times c)_i \\ &= a_1 \det\begin{pmatrix} b_2 & c_2 \\ b_3 & c_3 \end{pmatrix} - a_2 \det\begin{pmatrix} b_1 & c_1 \\ b_3 & c_3 \end{pmatrix} + a_3 \det\begin{pmatrix} b_1 & c_1 \\ b_2 & c_2 \end{pmatrix} = \det \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}. \end{aligned} \tag{7.9} \]

We may therefore calculate a 3 × 3 determinant det A by calculating 2 × 2 subdeterminants. Denoting by A_{kj} the 2 × 2 matrix obtained from A by deleting the kth row and the jth column, (7.9) can be written as

\[ \det A = \det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \sum_{k=1}^3 (-1)^{k+1} a_{k1} \det A_{k1}. \tag{7.10} \]
Determinants
Slide 207
Calculating Determinants in R3
We will prove later that in fact

\[ \det \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \det \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}. \]

This (together with (7.9)) motivates the mnemonic

\[ a \times b = \det \begin{pmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}, \]

where e1, e2, e3 are the standard unit basis vectors and a = (a1, a2, a3), b = (b1, b2, b3).
Determinants
Slide 208
Properties of the Determinant in R3
We once more note the following properties of the determinant:
1. det is normed, i.e.,

\[ \det(e_1, e_2, e_3) = \det \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 1. \]

2. det is trilinear:

det(λa, b, c) = λ det(a, b, c) = det(a, λb, c) = det(a, b, λc),
det(a + b, c, d) = det(a, c, d) + det(b, c, d),
det(a, b + c, d) = det(a, b, d) + det(a, c, d),
det(a, b, c + d) = det(a, b, c) + det(a, b, d).

These properties follow from the corresponding properties of the scalar and vector products.
3. det is alternating (see (7.8)).
Determinants
Slide 209
Preview of Determinants in Rn
Our goal is now to find a generalization of the determinant that has the three main properties of being
I multilinear,
I alternating and
I normed.
It will turn out that these three properties are actually sufficient to define the determinant uniquely; there is only one map Mat(n × n, R) → R with these properties, and for n = 2, 3 it is given by (7.2) and (7.5), respectively.
In the case of n = 1, Mat(1 × 1, R) is equivalent to R and we define det(a) = a. This definition trivially has the properties of being normed and linear, while the alternating property is vacuous for a single entry.
Determinants
Slide 210
Preview of Determinants in Rn
We will further see that one possible formula for determinants det : Mat(n × n) → R can be constructed recursively, similarly to (7.10). In fact, if A ∈ Mat(n × n, R), we define A_{kj} ∈ Mat((n − 1) × (n − 1), R) as the matrix obtained from A by deleting the kth row and the jth column. Then for any j = 1, …, n we will obtain the recursion formula

\[ \det A = \sum_{k=1}^n (-1)^{k+j} a_{kj} \det A_{kj}. \tag{7.11} \]
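A direct transcription of (7.11) with j = 1 into Mathematica, where cofDet is a hypothetical helper name and Drop[a, {k}, {1}] deletes row k and column 1:

cofDet[{{x_}}] := x;
cofDet[a_] := Sum[(-1)^(k + 1) a[[k, 1]] cofDet[Drop[a, {k}, {1}]], {k, Length[a]}];
cofDet[{{1, 2}, {3, 4}}]    (* -2, in agreement with (7.2) *)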
In order to understand the extension of determinants to Rn better, we need
to formalize the concept of permutations.
Determinants
Slide 211
Groups
7.3. Definition. A group is a pair (G; ◦) consisting of a set G and a group
operation ◦ : G × G → G such that
1. a ◦ (b ◦ c) = (a ◦ b) ◦ c for all a; b; c ∈ G (associativity),
2. there exists an element e ∈ G such that a ◦ e = e ◦ a = a for all a ∈ G
(existence of a unit element),
3. for every a ∈ G there exists an element a−1 ∈ G such that
a ◦ a−1 = a−1 ◦ a = e (existence of an inverse).
A group is called commutative if in addition to the above properties
4. a ◦ b = b ◦ a for all a; b ∈ G (commutativity).
Determinants
Slide 212
Groups and Permutations
7.4. Examples.
1. Any vector space (V, +, ·) may be regarded as a commutative group (V, +) with the additional operation of scalar multiplication.
2. The set of invertible matrices,

GL(n, R) := {A ∈ Mat(n × n, R) : A is invertible},

is a group with the group operation given by matrix multiplication (composition of maps).
7.5. Definition. The set of all permutations of n elements,

Sn = {π : {x1, …, xn} → {x1, …, xn} : π bijective},

together with the group operation “composition of maps,” π1 ∘ π2(x) = π1(π2(x)), is called the symmetric group.
Determinants
Slide 213
Permutations
It is easy to check that (Sn, ∘) in fact has properties i) – iii), but not property iv). We will often denote a group by G instead of (G, ∘) if no confusion arises therefrom.
A permutation of n elements is a finite map; recall that a function f is defined by pairs of the form (x, f(x)), where x is the independent variable. A permutation is defined on a set of n elements; instead of {x1, …, xn} we can also simply write {1, …, n}, replacing the permutation of elements with a permutation of indices. Then we might define a permutation π through a set of pairs {(1, π(1)), …, (n, π(n))}. In fact, we do represent permutations in this way, but use a different notation, writing

\[ \pi = \begin{pmatrix} 1 & 2 & \dots & n \\ \pi(1) & \pi(2) & \dots & \pi(n) \end{pmatrix}. \]
Determinants
Slide 214
Transpositions
For example, if n = 2, there are only two permutations π1, π2 ∈ S2,

\[ \pi_1 : 1 \mapsto 1,\ 2 \mapsto 2, \quad \pi_1 = \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}; \qquad \pi_2 : 1 \mapsto 2,\ 2 \mapsto 1, \quad \pi_2 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}. \tag{7.12} \]

7.6. Definition. A permutation in Sn that leaves exactly n − 2 elements invariant is called a transposition.
A transposition τ ∈ Sn has the form

\[ \tau(k) = \begin{cases} i & \text{if } k = j, \\ j & \text{if } k = i, \\ k & \text{otherwise} \end{cases} \tag{7.13} \]

for some i, j ∈ {1, …, n}, i ≠ j.
Determinants
Slide 215
Permutations as Transpositions
7.7. Lemma. Every permutation π ∈ Sn, n ≥ 2, is a composition of transpositions, π = τ1 ∘ ⋯ ∘ τk.
Note that the transpositions τj and the number k are not uniquely defined.
Proof.
We proceed by induction. For n = 2 there are only two permutations π1 and π2 (see (7.12)); π2 is a transposition, and π1 = π2 ∘ π2. We now assume that any permutation in Sn can be written as a composition of transpositions and prove that this is also true for any permutation in Sn+1.
Let π ∈ Sn. Then we can consider

\[ \widetilde{\pi} = \begin{pmatrix} 1 & \dots & n & n+1 \\ \pi(1) & \dots & \pi(n) & n+1 \end{pmatrix} \tag{7.14} \]

as an element of Sn+1. Also, every element π̃ ∈ Sn+1 of the form (7.14) can be regarded as an element π ∈ Sn.
Determinants
Slide 216
Permutations as Transpositions
Proof (continued).
Now let σ ∈ Sn+1 and let τ be the transposition that exchanges n + 1 and σ⁻¹(n + 1). Then

σ ∘ τ : n + 1 ↦(τ) σ⁻¹(n + 1) ↦(σ) n + 1,

so that

\[ \sigma \circ \tau = \begin{pmatrix} 1 & \dots & n & n+1 \\ \pi(1) & \dots & \pi(n) & n+1 \end{pmatrix} \]

for some values π(1), …, π(n), π ∈ Sn. It follows that σ ∘ τ can be written as a composition of transpositions τ1 ∘ ⋯ ∘ τk,

σ ∘ τ = τ1 ∘ ⋯ ∘ τk,

so σ = τ1 ∘ ⋯ ∘ τk ∘ τ⁻¹, which proves the assertion.
Determinants
Slide 217
Sign of a Permutation
While the number of transpositions that make up a permutation is not unique, we do have the following:
7.8. Definition and Theorem. Let π ∈ Sn be represented as a composition of k transpositions, π = τ1 ∘ ⋯ ∘ τk. Then the sign of π,

sgn π := (−1)ᵏ,

does not depend on the representation chosen.
In order to prove this, we need an additional concept from group theory, which we introduce on the following slide.
In advance, we note that the sign is “well-behaved”:

sgn(π1 ∘ π2) = sgn π1 sgn π2  for any π1, π2 ∈ Sn.
Determinants
Slide 218
Group Actions
7.9. Definition. Let (G, ∘) be a group and X a set. Then an action (or operation) of G on X from the left is a map

Φ : G × X → X,  (g, x) ↦ Φ(g, x) = Φ_g x = gx

with the properties
1. ex = x (e ∈ G is the unit element),
2. (a ∘ b)x = a(bx) for a, b ∈ G, x ∈ X.
We say that G acts (operates) on X.
7.10. Proposition. Let X be the set of all maps f : Rⁿ → R. Then Sn acts on X via

(πf)(x1, …, xn) = f(x_{π(1)}, …, x_{π(n)}),  π ∈ Sn.
Determinants
Slide 219
Group Actions
Proof.
We need to show properties i) and ii) of Definition 7.9. The unit element of Sn is

\[ \pi_e = \begin{pmatrix} 1 & \dots & n \\ 1 & \dots & n \end{pmatrix}, \]

so trivially πₑf = f, since

(πₑf)(x1, …, xn) = f(x_{πₑ(1)}, …, x_{πₑ(n)}) = f(x1, …, xn).

Furthermore, let σ, π ∈ Sn. Then

[σ(πf)](x1, …, xn) = (πf)(x_{σ(1)}, …, x_{σ(n)}) = f(x_{σ(π(1))}, …, x_{σ(π(n))}) = f(x_{(σ∘π)(1)}, …, x_{(σ∘π)(n)}) = [(σ ∘ π)f](x1, …, xn),

so σ(πf) = (σ ∘ π)f.
Determinants
Slide 220
Group Actions
7.11. Lemma. Denote by ∆ : Rⁿ → R the function

\[ \Delta(x_1, \dots, x_n) = \prod_{i < j} (x_j - x_i). \tag{7.15} \]

Then

τ∆ = −∆  for any transposition τ ∈ Sn.

Proof.
Let r, s ∈ {1, …, n}, r < s, and τ the transposition exchanging r and s,

\[ \tau = \begin{pmatrix} 1 & \dots & r-1 & r & r+1 & \dots & s-1 & s & s+1 & \dots & n \\ 1 & \dots & r-1 & s & r+1 & \dots & s-1 & r & s+1 & \dots & n \end{pmatrix}. \]
Determinants
Slide 221
Group Actions
Proof (continued).
Note that

\[ \tau\Delta(x_1, \dots, x_n) = \prod_{i < j} \tau(x_j - x_i). \]

Then

τ(xs − xr) = −(xs − xr).   (7.16)

All other factors in (7.15) either do not contain xr or xs (and are left unchanged by τ) or occur in one of the following pairings:
I j < r: (xr − xj)(xs − xj)
I r < j < s: (xs − xj)(xj − xr)
I s < j: (xj − xs)(xj − xr)
Each of these pairs is left invariant by τ, so the sign change in (7.16) is the only effect of τ on ∆.
Determinants
Slide 222
Sign of a Permutation
7.12. Corollary. For every permutation π = τ1 ∘ ⋯ ∘ τk ∈ Sn,

π∆ = (τ1 ∘ ⋯ ∘ τk)∆ = (−1)ᵏ∆.

In particular,

sgn π = (−1)ᵏ

does not depend on the decomposition of π into transpositions and is therefore well-defined.
Proof.
Let π ∈ Sn and assume that there are transpositions τ1, …, τk, τ̃1, …, τ̃l such that

π = τ1 ∘ ⋯ ∘ τk = τ̃1 ∘ ⋯ ∘ τ̃l.

Then π∆(x1, …, xn) = (−1)ᵏ∆(x1, …, xn) = (−1)ˡ∆(x1, …, xn). Choosing some x1, …, xn such that ∆(x1, …, xn) ≠ 0, we obtain (−1)ᵏ = (−1)ˡ.
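Mathematica encodes a permutation π as the list {π(1), …, π(n)}, and the built-in Signature computes sgn π; for example:

Signature[{2, 1, 3}]    (* -1: a single transposition *)
Signature[{2, 3, 1}]    (* +1: a 3-cycle is a composition of two transpositions *)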
Determinants
Slide 223
p-Multilinear Maps
7.13. Definition. A function f : Rⁿ × ⋯ × Rⁿ → R (p factors) is said to be a p-multilinear map (or p-multilinear form) if f is linear in each entry, i.e.,

f(λa1, a2, …, ap) = λf(a1, a2, …, ap)

and

f(a1 + b, a2, …, ap) = f(a1, a2, …, ap) + f(b, a2, …, ap)

for b, a1, …, ap ∈ Rⁿ and λ ∈ R, and analogous equations hold for the other entries.
The form is said to be alternating if f(a1, …, ap) = 0 whenever aj = ak for any j ≠ k.
An n-multilinear form is said to be normed if f(e1, …, en) = 1, where e1, …, en are the standard basis vectors in Rⁿ.
Determinants
Slide 224
Characterization of Alternating Forms
We will prove that the properties of being multilinear, alternating and normed are sufficient to uniquely define the determinant in Rⁿ. First, however, we give a useful result:
7.14. Lemma. Let f : Rⁿ × ⋯ × Rⁿ → R (p factors) be a p-multilinear map. Then the following are equivalent:
(i) f is alternating;
(ii) f(a1, …, aj−1, aj, aj+1, …, ak−1, ak, ak+1, …, ap) = −f(a1, …, aj−1, ak, aj+1, …, ak−1, aj, ak+1, …, ap);
(iii) f(a1, …, ap) = 0 if a1, …, ap are linearly dependent.
The proof is not difficult and left as an exercise!
Determinants
Slide 225
Determinants in Rn
We will now define the determinant as an alternating, normed, n-multilinear function for column vectors in Rⁿ and corresponding square matrices whose columns consist of these vectors, using the notation

\[ a_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix} \quad (j = 1, \dots, n), \qquad A = (a_1, \dots, a_n) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}. \]

7.15. Theorem. For every n ∈ N, n > 1, there exists a unique, normed, alternating n-multilinear form det : Rⁿ × ⋯ × Rⁿ ≅ Mat(n × n, R) → R. Furthermore,

\[ \det(a_1, \dots, a_n) = \det A = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{\pi(1)1} \cdots a_{\pi(n)n}. \tag{7.17} \]
Determinants
Slide 226
Determinants in Rn
Proof.
We will first show that the determinant defined in (7.17) in fact has the required properties.
1. (det is multilinear) Let a1, …, an, b ∈ Rⁿ. We show the additivity in the first entry (the proof for all other entries is completely analogous):

\[ \begin{aligned} \det(a_1 + b, a_2, \dots, a_n) &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \, (a_{\pi(1)1} + b_{\pi(1)}) a_{\pi(2)2} \cdots a_{\pi(n)n} \\ &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \, a_{\pi(1)1} \cdots a_{\pi(n)n} + \sum_{\pi \in S_n} \operatorname{sgn} \pi \, b_{\pi(1)} a_{\pi(2)2} \cdots a_{\pi(n)n} \\ &= \det(a_1, a_2, \dots, a_n) + \det(b, a_2, \dots, a_n). \end{aligned} \]

The homogeneity is shown analogously.
Determinants
Slide 227
Determinants in Rn
Proof (continued).
2. (det is normed) Let

\[ e_j = \begin{pmatrix} \delta_{1j} \\ \vdots \\ \delta_{nj} \end{pmatrix} \quad (j = 1, \dots, n), \qquad \delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \neq j. \end{cases} \]

Then for any permutation π ∈ Sn,

\[ \delta_{\pi(1)1} \cdots \delta_{\pi(n)n} = \begin{cases} 1 & \pi(k) = k,\ k = 1, \dots, n, \\ 0 & \text{otherwise}. \end{cases} \]
Determinants
Slide 228
Determinants in Rn
Proof (continued).
2. It follows that in the summation over the permutations only the summand with

\[ \pi = \begin{pmatrix} 1 & 2 & \dots & n-1 & n \\ 1 & 2 & \dots & n-1 & n \end{pmatrix}, \qquad \operatorname{sgn} \pi = 1, \]

survives. Thus

\[ \det(e_1, \dots, e_n) = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; \delta_{\pi(1)1} \cdots \delta_{\pi(n)n} = 1. \]
Determinants
Slide 229
Determinants in Rn
Proof (continued).
3. (det is alternating) We will show that det(a1, a2, …, an−1, an) = −det(an, a2, …, an−1, a1) (again, the proof is similar when any other entries are exchanged). Let

\[ \tau = \begin{pmatrix} 1 & 2 & \dots & n-1 & n \\ n & 2 & \dots & n-1 & 1 \end{pmatrix} \in S_n \tag{7.18} \]

be the transposition exchanging 1 and n. We will use that sgn τ = −1 and that summing over all permutations π ∈ Sn is the same as summing over all π ∘ τ ∈ Sn, when τ is fixed by (7.18).
Determinants
Slide 230
Determinants in Rn
Proof (continued).
3. Then

\[ \begin{aligned} \det(a_n, a_2, \dots, a_{n-1}, a_1) &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{\pi(1)n} a_{\pi(2)2} \cdots a_{\pi(n-1)(n-1)} a_{\pi(n)1} \\ &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{\pi(\tau(1))1} a_{\pi(\tau(2))2} \cdots a_{\pi(\tau(n-1))(n-1)} a_{\pi(\tau(n))n} \\ &= -\sum_{\pi \in S_n} \operatorname{sgn}(\pi \circ \tau) \; a_{(\pi \circ \tau)(1)1} a_{(\pi \circ \tau)(2)2} \cdots a_{(\pi \circ \tau)(n)n} \\ &= -\sum_{\pi \circ \tau \in S_n} \operatorname{sgn}(\pi \circ \tau) \; a_{(\pi \circ \tau)(1)1} \cdots a_{(\pi \circ \tau)(n)n} \\ &= -\det(a_1, a_2, \dots, a_{n-1}, a_n). \end{aligned} \]
Determinants
Slide 231
Determinants in Rn
Proof (continued).
We next show that the properties of the determinant imply the formula (7.17). By multilinearity we have

\[ \det(a_1, \dots, a_n) = \det\Big( \sum_{j_1=1}^n a_{j_1 1} e_{j_1}, \dots, \sum_{j_n=1}^n a_{j_n n} e_{j_n} \Big) = \sum_{j_1, \dots, j_n = 1}^n a_{j_1 1} \cdots a_{j_n n} \det(e_{j_1}, \dots, e_{j_n}). \]

Since det is supposed to be alternating, all summands vanish where any jk occurs more than once. We therefore sum only over permutations of {1, …, n},

\[ \det(a_1, \dots, a_n) = \sum_{\pi \in S_n} a_{\pi(1)1} \cdots a_{\pi(n)n} \det(e_{\pi(1)}, \dots, e_{\pi(n)}). \]
Determinants
Slide 232
Determinants in Rn
Proof (continued).
Again, because det is alternating and assuming each π is composed of k transpositions,

det(e_{π(1)}, …, e_{π(n)}) = (−1)ᵏ det(e1, …, en) = sgn π det(e1, …, en).

Since det is normed, det(e1, …, en) = 1, so we finally have

\[ \det(a_1, \dots, a_n) = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{\pi(1)1} \cdots a_{\pi(n)n}. \]
Determinants
Slide 233
Determinants and Elementary Column Operations
Since the determinant is alternating and multilinear, we see that the Elementary Column Operations 5.8 affect the determinant as follows:
I The determinant of a matrix A changes sign if two columns of A are interchanged, e.g.,

det(a2, a1, …, an) = −det(a1, a2, …, an).

I Multiplying all the entries in a column with a number λ leads to the determinant being multiplied by this constant:

det(a1, …, λaj, …, an) = λ det(a1, …, aj, …, an).

I Adding a multiple of a column to another column does not change the value of the determinant:

det(a1, …, aj, …, ak + λaj, …, an) = det(a1, …, aj, …, ak, …, an).
Determinants
Slide 234
Determinants of Transposed Matrices
7.16. Lemma. Let A ∈ Mat(n × n, R). Then

det A = det Aᵀ.

Proof.
We first note that for every π ∈ Sn, sgn π = sgn π⁻¹, and the sum over all π is equal to the sum over all π⁻¹. Then we can reorder the factors in each summand, so that

\[ \begin{aligned} \det A &= \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{\pi(1)1} \cdots a_{\pi(n)n} = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{1\pi^{-1}(1)} \cdots a_{n\pi^{-1}(n)} \\ &= \sum_{\pi^{-1} \in S_n} \operatorname{sgn} \pi^{-1} \; a_{1\pi^{-1}(1)} \cdots a_{n\pi^{-1}(n)} = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{1\pi(1)} \cdots a_{n\pi(n)} = \det A^T. \end{aligned} \]
Determinants
Slide 235
Determinants and Elementary Row Operations
As a corollary, we can rewrite (7.17) in a more commonly seen form:
7.17. Leibniz Formula.

\[ \det A = \sum_{\pi \in S_n} \operatorname{sgn} \pi \; a_{1\pi(1)} \cdots a_{n\pi(n)}. \tag{7.19} \]
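Formula (7.19) can be transcribed into Mathematica almost literally, using the built-ins Permutations and Signature; leibnizDet is a hypothetical helper name for this sketch:

leibnizDet[a_] := With[{n = Length[a]},
  Total[Signature[#] Product[a[[k, #[[k]]]], {k, n}] & /@ Permutations[Range[n]]]];
leibnizDet[{{2, 3}, {2, 1}}] == Det[{{2, 3}, {2, 1}}]    (* True; both give -4 *)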
7.18. Corollary. Elementary row manipulations of a matrix A affect the determinant of A in the same way as the corresponding elementary column manipulations.
Proof.

  det A  ──── row manipulation ────→  det B
    ∥                                   ∥
  det Aᵀ ─── column manipulation ───→  det Bᵀ
Determinants
Slide 236
Triangular Determinants
7.19. Proposition. Let A ∈ Mat(n × n) have upper triangular form, i.e.,
A = ( λ₁      ∗
          ⋱
      0       λₙ )
for diagonal elements λ₁, …, λₙ ∈ R and arbitrary values (denoted by ∗) above the diagonal. Then
det A = λ₁ ⋯ λₙ.
Determinants
Slide 237
Triangular Determinants
Proof.
If some λₖ vanishes, the first k columns all lie in span(e₁, …, eₖ₋₁) and are hence linearly dependent, so det A = 0 = λ₁ ⋯ λₙ and the formula holds trivially. We may thus assume all λₖ ≠ 0. By multilinearity,
det A = ( ∏_{k=1}^n λₖ ) · det ( 1      ∗
                                    ⋱
                                 0      1 ).
The matrix in the determinant on the right can be transformed into the unit matrix through elementary row manipulations that do not change the value of the determinant. Therefore its determinant is 1, proving the result.
Proposition 7.19 can be applied to calculate determinants of matrices
A ∈ Mat(n × n) when A is first transformed to upper triangular form using
elementary matrix manipulations. This is of practical use for n ≥ 4.
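A sketch of this procedure in Mathematica, using the built-in LUDecomposition for the row reduction (p is the row-pivot permutation, whose sign compensates for the row exchanges):

a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 8}};
{lu, p, cond} = LUDecomposition[a];
Signature[p] Times @@ Diagonal[lu]   (* 3, matching Det[a] *)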
Determinants
Slide 238
Determinants and Invertibility of Matrices
The following result is of fundamental importance for many applications:
7.20. Proposition. A matrix A ∈ Mat(n × n) is invertible if and only if
det A ̸= 0.
Proof.
We first show that if A is not invertible, then det A = 0. The linear map
A : Rn → Rn is invertible if and only if ran A = Rn . Since ran A is the span
of the column vectors, A is invertible if and only if the column vectors are
independent. But if the column vectors are not independent, then det A
vanishes.
Now let A = (a1 ; : : : ; an ) be invertible. By Lemma 5.13 A can be
transformed into the unit matrix by elementary row operations. These only
change the value of the determinant by a non-zero factor. Since the
determinant of the unit matrix is 1, it follows that det A ̸= 0.
Determinants
Slide 239
Determinants and Systems of Equations
The determinant can be used to give another formulation of Fredholm’s
Alternative 6.4:
7.21. Fredholm Alternative. Let A ∈ Mat(n × n). Then either
I det A = 0, in which case Ax = 0 has a non-zero solution x ∈ ker A, or
I det A ≠ 0, in which case Ax = b has a unique solution x = A⁻¹b for any b ∈ Rⁿ.
The proof is a straightforward application of the definitions and is left to the reader.
7.22. Cramer’s Rule. Let A = (a₁, …, aₙ) ∈ Mat(n × n), a₁, …, aₙ ∈ Rⁿ, be invertible. Then the system Ax = b, b ∈ Rⁿ, has the solution
xᵢ = (1/det A) · det(a₁, …, aᵢ₋₁, b, aᵢ₊₁, …, aₙ),  i = 1, …, n.   (7.20)
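A direct Mathematica transcription of (7.20) (cramer is a helper name chosen here; ReplacePart on Transpose[a] swaps b in as the ith column):

cramer[a_?MatrixQ, b_?VectorQ] :=
  Table[Det[Transpose[ReplacePart[Transpose[a], i -> b]]]/Det[a], {i, Length[a]}]
a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 8}}; b = {1, 0, 2};
cramer[a, b] == LinearSolve[a, b]   (* True *)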
Determinants
Slide 240
Determinants and Systems of Equations
Proof.
We note that Ax = ∑_{k=1}^n xₖaₖ for A = (a₁, …, aₙ) ∈ Mat(n × n). Therefore,
det(a₁, …, aᵢ₋₁, b, aᵢ₊₁, …, aₙ)
  = det(a₁, …, aᵢ₋₁, Ax, aᵢ₊₁, …, aₙ)
  = det( a₁, …, aᵢ₋₁, ∑_{k=1}^n xₖaₖ, aᵢ₊₁, …, aₙ )
  = ∑_{k=1}^n xₖ det( a₁, …, aᵢ₋₁, aₖ, aᵢ₊₁, …, aₙ )
  = xᵢ det( a₁, …, aᵢ₋₁, aᵢ, aᵢ₊₁, …, aₙ ) + 0
  = xᵢ det A.
Determinants
Slide 241
Minors and Cofactors
7.23. Definition. Let A = (aᵢⱼ) ∈ Mat(n × n). Denote by
Aᵢⱼ := (aₖₗ)_{1≤k,l≤n, k≠i, l≠j}
the (n − 1) × (n − 1) matrix obtained from A by deleting the ith row and jth column. Then
mᵢⱼ := det Aᵢⱼ
is called the (i, j)th minor of A. The number
cᵢⱼ := (−1)^{i+j} mᵢⱼ = (−1)^{i+j} det Aᵢⱼ
is called the (i, j)th cofactor of A, and the matrix
Cof A := (cᵢⱼ)_{1≤i,j≤n}
is called the cofactor matrix of A.
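Definition 7.23 transcribes directly into Mathematica (minor, cofactor and cofactorMatrix are names chosen here; Drop deletes the ith row and jth column):

minor[a_, {i_, j_}] := Det[Drop[a, {i}, {j}]]
cofactor[a_, {i_, j_}] := (-1)^(i + j) minor[a, {i, j}]
cofactorMatrix[a_] := Table[cofactor[a, {i, j}], {i, Length[a]}, {j, Length[a]}]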
Determinants
Slide 242
Determinants and Inversion of Matrices
7.24. Definition. Let A = (aᵢⱼ) ∈ Mat(n × n). The transpose of the cofactor matrix of A is called the adjugate of A, denoted by
A♯ := (Cof A)ᵀ.
7.25. Theorem. Let A = (aᵢⱼ) ∈ Mat(n × n) be invertible. Then
A⁻¹ = (1/det A) · A♯.
The proof is based on a useful lemma, which we first establish.
Determinants
Slide 243
Determinants and Inversion of Matrices
7.26. Lemma. Let A = (a₁, …, aₙ) ∈ Mat(n × n) and let eᵢ be the ith standard basis vector in Rⁿ. Then
det(a₁, …, aⱼ₋₁, eᵢ, aⱼ₊₁, …, aₙ) = (−1)^{i+j} det Aᵢⱼ = cᵢⱼ,
where cᵢⱼ is the (i, j)th cofactor of A.
Proof.
Since the determinant is alternating, we may successively swap the column eᵢ with its right neighbor:
det(a₁, …, aⱼ₋₁, eᵢ, aⱼ₊₁, …, aₙ)
  = −det(a₁, …, aⱼ₋₁, aⱼ₊₁, eᵢ, aⱼ₊₂, …, aₙ)
  = (−1)^{n−j} det(a₁, …, aⱼ₋₁, aⱼ₊₁, …, aₙ, eᵢ).
Determinants
Slide 244
Determinants and Inversion of Matrices
Proof (continued).
Swapping the ith and the (i + 1)st row, etc., we obtain
det(a₁, …, aⱼ₋₁, eᵢ, aⱼ₊₁, …, aₙ) = (−1)^{n−j+n−i} det B = (−1)^{i+j} det B,
B := ( Aᵢⱼ  0
       ∗    1 ),
where the entries in ∗ represent the elements of the ith row of A (with the jth entry deleted). Now from the definition (7.17),
det B = ∑_{π∈Sₙ} sgn π · b_{π(1)1} ⋯ b_{π(n)n}.
Determinants
Slide 245
Determinants and Inversion of Matrices
Proof (continued).
Since b_{π(n)n} = δ_{nπ(n)}, only permutations with π(n) = n contribute, and we can write
det B = ∑_{π∈Sₙ₋₁} sgn π · b_{π(1)1} ⋯ b_{π(n−1)(n−1)} · b_{nn} = det Aᵢⱼ   (using b_{nn} = 1),
completing the proof.
Proof of Theorem 7.25.
Let A⁻¹ = (x₁, …, xₙ) = (xᵢⱼ) be a matrix of column vectors x₁, …, xₙ. The inverse of A satisfies AA⁻¹ = id, so we need to find columns xⱼ of A⁻¹ satisfying Axⱼ = eⱼ, j = 1, …, n.
By Cramer’s rule and Lemma 7.26,
xᵢⱼ = (1/det A) · det(a₁, …, aᵢ₋₁, eⱼ, aᵢ₊₁, …, aₙ) = (1/det A) · (−1)^{i+j} det Aⱼᵢ = cⱼᵢ / det A,
i.e., A⁻¹ = (Cof A)ᵀ / det A = A♯ / det A.
Determinants
Slide 246
Laplace Expansion
Another application of Lemma 7.26 is the expansion of det A in terms of
the minors of A:
7.27. Laplace Expansion. For A ∈ Mat(n × n) and any j = 1, …, n the recursion formula
det A = ∑_{i=1}^n (−1)^{i+j} aᵢⱼ det Aᵢⱼ   (7.21)
holds.
Note that when using this expansion to calculate the determinant of an n × n matrix, n determinants of (n − 1) × (n − 1) matrices need to be evaluated; the number of computational steps therefore grows like n!, much larger than for the row-reduction approach of Proposition 7.19.
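A minimal Mathematica sketch of the recursion, expanding along the first column (laplaceDet is a name chosen here):

laplaceDet[{{x_}}] := x
laplaceDet[a_?MatrixQ] :=
  Sum[(-1)^(i + 1) a[[i, 1]] laplaceDet[Drop[a, {i}, {1}]], {i, Length[a]}]
laplaceDet[{{1, 2, 3}, {4, 5, 6}, {7, 8, 8}}]   (* 3 *)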
Determinants
Slide 247
Laplace Expansion
Proof.
Let A = (a₁, …, aₙ), aₖ ∈ Rⁿ, k = 1, …, n. Then the jth column has the representation aⱼ = ∑_{i=1}^n aᵢⱼ eᵢ and
det A = det( a₁, …, aⱼ₋₁, ∑_{i=1}^n aᵢⱼ eᵢ, aⱼ₊₁, …, aₙ )
      = ∑_{i=1}^n aᵢⱼ det( a₁, …, aⱼ₋₁, eᵢ, aⱼ₊₁, …, aₙ )
      = ∑_{i=1}^n aᵢⱼ (−1)^{i+j} det Aᵢⱼ,
where the last equality follows from Lemma 7.26.
Determinants
Slide 248
Determinants and Minors
We can obtain the determinant of a matrix as follows:

A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 8}};
MatrixForm[A]

1 2 3
4 5 6
7 8 8

Det[A]

3

The Mathematica command Minors gives the matrix of minors of A. However, the command returns the determinants of the submatrices found by deleting the (n − i + 1)th row and (n − j + 1)th column. To conform to our definition, the command needs to be modified slightly.
Determinants
Slide 249
Determinants and Minors
MatrixForm[Map[Reverse, Minors[A], {0, 1}]]

-8 -10 -3
-8 -13 -6
-3 -6 -3

The adjugate matrix can be defined as follows:

adj[m_] := Map[Reverse, Minors[Transpose[m], Length[m] - 1], {0, 1}] *
  Table[(-1)^(i + j), {i, Length[m]}, {j, Length[m]}]

MatrixForm[adj[A]]

-8 8 -3
10 -13 6
-3 6 -3
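As a consistency check, this adjugate reproduces the inverse from Theorem 7.25:

adj[A]/Det[A] == Inverse[A]   (* True *)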
Determinants
Slide 250
Product Rule for Determinants
7.28. Proposition. Let A, B ∈ Mat(n × n). Then det(AB) = det A · det B.
Proof.
If A = (aᵢₖ), B = (bₖⱼ), then AB = C = (cᵢⱼ) with column vectors cⱼ = (c₁ⱼ, …, cₙⱼ)ᵀ, where
cᵢⱼ = ∑_{k=1}^n aᵢₖ bₖⱼ.
Let bⱼ denote the columns of B; then cⱼ = Abⱼ.
We can assume that A is bijective (otherwise det(AB) = 0 = det A · det B).
Determinants
Slide 251
Product Rule for Determinants
Proof (continued).
Hence we can write
(1/det A) det AB = (1/det A) det(c₁, …, cₙ) = (1/det A) det(Ab₁, …, Abₙ)
                 = (1/det A) det(A(·), …, A(·))[b₁, …, bₙ] =: f(b₁, …, bₙ).
The function f so defined is clearly multilinear, because A is linear and det is multilinear. It is also alternating, because Abₖ = Abⱼ if bₖ = bⱼ and det is alternating. Finally,
f(e₁, …, eₙ) = (1/det A) det(Ae₁, …, Aeₙ) = 1.
The function f is multilinear, normed and alternating. Therefore, by the uniqueness of the determinant, it must be the determinant.
Determinants
Slide 252
Product Rule for Determinants
Proof (continued).
That means that
f(B) = det B  ⇔  (1/det A) det AB = det B,
i.e., det AB = det A · det B.

7.29. Corollary. Let A ∈ Mat(n × n) be invertible. Then
det A⁻¹ = 1/det A.
Slide 253
Part 2: Continuity, Differentiability, Integrability
Slide 254
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Sets and Equivalence of Norms
Slide 255
8. Sets and Equivalence of Norms
Sets and Equivalence of Norms
Slide 256
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Sets and Equivalence of Norms
Slide 257
Finite-Dimensional Vector Spaces
For the rest of the term, we will focus on functions of several variables,
e.g., functions
f : Rn → Rm :
Our previously developed knowledge of linear algebra is essential for this,
since, for example, the derivative of such a function at a point x turns out
to be a matrix. More precisely, the derivative is a map
Df : Rn → Mat(m × n; R):
It is not sufficient to restrict ourselves to functions defined on Rn with
values in Rm . On the one hand, the second derivative of f is then the
derivative of Df , a matrix-valued function. Another aspect occurs in the
study of ordinary differential equations, when we need to differentiate
functions such as the determinant of a matrix. Therefore, we need to
define concepts such as continuity for arbitrary vector spaces.
Sets and Equivalence of Norms
Slide 258
Open Balls
The basic ingredient in our discussion is open balls:
8.1. Definition. Let (V, ∥·∥) be a normed vector space. Then
B_ε(a) := {x ∈ V : ∥x − a∥ < ε},  a ∈ V, ε > 0,   (8.1)
is called an open ball of radius ε about a.
Of course, the “shape” of an open ball depends on the vector space V and the norm ∥·∥. For instance, the open balls in R² with norms
∥x∥₁ = |x₁| + |x₂|,  ∥x∥₂ = √(|x₁|² + |x₂|²),  ∥x∥∞ = max{|x₁|, |x₂|}   (8.2)
all have quite different shapes.
Furthermore, if V = Pn , for example, open balls do not have an obvious
“shape” at all.
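The first three shapes are easy to visualize, e.g. in Mathematica:

RegionPlot[{Abs[x] + Abs[y] < 1, Sqrt[x^2 + y^2] < 1, Max[Abs[x], Abs[y]] < 1},
  {x, -1.2, 1.2}, {y, -1.2, 1.2}]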
Sets and Equivalence of Norms
Slide 259
Open Sets
8.2. Definition. Let (V, ∥·∥) be a normed vector space. A set U ⊂ V is called open if for every a ∈ U there exists an ε > 0 such that B_ε(a) ⊂ U.
8.3. Examples.
(i) Any open ball B_ε(a), ε > 0, a ∈ V, is an open set.
(For any b ∈ B_ε(a) take δ < ε − ∥a − b∥. Then B_δ(b) ⊂ B_ε(a).)
(ii) The empty set ∅ ⊂ V is open.
(Since there is no a ∈ ∅ for which we need to check that B_ε(a) ⊂ ∅, this is an example of a vacuously true statement.)
(iii) The entire space V is an open set in V.
Sets and Equivalence of Norms
Slide 260
Open Sets
We will see that open sets are fundamental for understanding properties of
continuous functions, convergence in vector spaces and much more.
Therefore, it becomes important to answer a basic question:
If a set is open in a vector space (V, ∥·∥), is it also open if ∥·∥ is replaced by some other norm?
8.4. Example. If a set Ω ⊂ R² is open with respect to any one of the norms (8.2), it is also open with respect to either of the other two norms given in (8.2). Why?
Sets and Equivalence of Norms
Slide 261
Equivalent Norms
8.5. Definition. Let V be a vector space on which we may define two norms ∥·∥₁ and ∥·∥₂. Then the two norms are called equivalent if there exist two constants C₁, C₂ > 0 such that
C₁∥x∥₁ ≤ ∥x∥₂ ≤ C₂∥x∥₁  for all x ∈ V.   (8.3)
8.6. Example. In Rⁿ we have (amongst others) the following two possible choices of norms:
∥x∥₂ := ( ∑_{i=1}^n |xᵢ|² )^{1/2},  ∥x∥∞ := max_{1≤i≤n} |xᵢ|.
It is easily verified that for all x ∈ Rⁿ,
(1/√n) ∥x∥₂ ≤ ∥x∥∞ ≤ ∥x∥₂,   (8.4)
so the two norms are equivalent.
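A quick numerical spot-check of (8.4) in Mathematica (a random vector; this illustrates, but of course does not prove, the inequality):

n = 5; x = RandomReal[{-1, 1}, n];
Norm[x, 2]/Sqrt[n] <= Norm[x, Infinity] <= Norm[x, 2]   (* True *)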
Sets and Equivalence of Norms
Slide 262
Convergence of Sequences
8.7. Remark. It is obvious from the definition that if two norms on a vector
space are equivalent, then any set that is open with respect to the first
norm is also open with respect to the second norm.
We recall the following from Vv186:
8.8. Definition. Let (V, ∥·∥) be a normed vector space and (vₙ) a sequence in V. Then (vₙ) converges to a (unique) limit v ∈ V,
vₙ → v as n → ∞,
if and only if
∥vₙ − v∥ → 0 as n → ∞.
For later use, we note:
8.9. Remark. If a sequence (vₙ) in (V, ∥·∥) converges to v ∈ V, then ∥vₙ∥ → ∥v∥. This follows from
| ∥vₙ∥ − ∥v∥ | ≤ ∥v − vₙ∥ → 0.
Sets and Equivalence of Norms
Slide 263
Equivalence of All Norms
8.10. Remark. It is again easy to see from the definition that if two norms
on a vector space are equivalent, then a sequence that converges to a limit
with respect to the first norm also converges to the same limit with respect
to the second norm.
Therefore, the following theorem is of fundamental importance:
8.11. Theorem. In a finite-dimensional vector space, all norms are
equivalent.
A major consequence of Theorem 8.11 is that if we have several norms at
our disposal in a finite-dimensional space, then we can freely choose a
convenient one in order to show openness of sets, convergence of
sequences, etc.
The proof of Theorem 8.11 requires some preliminary work.
Sets and Equivalence of Norms
Slide 264
The Theorem of Bolzano-Weierstraß
We recall two basic facts from the theory of sequences of real numbers:
(i) Every bounded and monotonic sequence of real numbers converges.
(ii) Every sequence of real numbers has a monotonic subsequence.
Together, these yield the following fundamental result (cf.
186 Theorem 2.2.35):
8.12. Theorem of Bolzano-Weierstraß. Every bounded sequence of real
numbers has a convergent subsequence.
We remark that the Theorem of Bolzano-Weierstraß easily implies that
every Cauchy sequence of real numbers converges, because every Cauchy
sequence that has a convergent subsequence must itself converge. Thus the
basic ingredient in proving that the real numbers (with the usual metric)
are complete is the fact that a bounded, monotonic sequence converges.
Sets and Equivalence of Norms
Slide 265
The Theorem of Bolzano-Weierstraß in Rn
8.13. Theorem of Bolzano–Weierstraß in Rⁿ. Let (x^(m))_{m∈N} be a sequence of vectors in Rⁿ, i.e., x^(m) = (x_1^(m), …, x_n^(m)). Suppose that there exists a constant C > 0 such that |x_k^(m)| < C for all m ∈ N and each k = 1, …, n. Then there exists a subsequence (x^(m_j))_{j∈N} that converges to a vector y ∈ Rⁿ in the sense that
x_k^(m_j) → y_k as j → ∞, for k = 1, …, n.
Proof.
Consider the real coordinate sequence (x_1^(m))_{m∈N}. By assumption, this sequence is bounded, so by the Theorem of Bolzano-Weierstraß 8.12 there exists a convergent subsequence (x_1^(m_{j₁})) with some limit, say y₁ ∈ R.
The second coordinate sequence (x_2^(m)) is also bounded and has a convergent subsequence, but this subsequence does not need to have the same indices as that for (x_1^(m)).
Sets and Equivalence of Norms
Slide 266
The Theorem of Bolzano-Weierstraß in Rn
Proof (continued).
We therefore employ a trick: the subsequence (x_2^(m_{j₁})) that uses the indices from our above subsequence for the first coordinate is of course also bounded and hence has a sub-subsequence (x_2^(m_{j₂})) that converges, say to y₂ ∈ R. Taking the corresponding sub-subsequence for the first coordinate, (x_1^(m_{j₂})) still converges to y₁.
Similarly, a sub-sub-subsequence of the third coordinate will converge to some y₃ ∈ R, while the corresponding sub-sub-subsequences of the first two coordinates will still converge to y₁ and y₂, respectively. Repeating the procedure n times, the n-fold subsequence (x_k^(m_{jₙ})) converges to some y_k ∈ R for each k = 1, …, n. Hence, the subsequence (x^(m_{jₙ})) converges to some y ∈ Rⁿ.
Sets and Equivalence of Norms
Slide 267
A Basic Norm inequality
8.14. Lemma. Let (V, ∥·∥) be a finite- or infinite-dimensional normed vector space and {v₁, …, vₙ} an independent set in V. Then there exists a C > 0 such that for any λ₁, …, λₙ ∈ F
∥λ₁v₁ + ⋯ + λₙvₙ∥ ≥ C ( |λ₁| + ⋯ + |λₙ| ).   (8.5)
Proof.
Let s := |λ₁| + ⋯ + |λₙ|. If s = 0, then all λₖ = 0 and the inequality (8.5) holds trivially for any C, so we can assume s > 0. Dividing by s, (8.5) becomes
∥μ₁v₁ + ⋯ + μₙvₙ∥ ≥ C,  ∑_{k=1}^n |μₖ| = 1,   (8.6)
with μₖ = λₖ/s.
Sets and Equivalence of Norms
Slide 268
A Basic Norm inequality
Proof (continued).
Hence, we need to show
∃ C>0 ∀ μ₁,…,μₙ∈F with |μ₁|+⋯+|μₙ|=1 : ∥μ₁v₁ + ⋯ + μₙvₙ∥ ≥ C.
Suppose that this is false, i.e.,
∀ C>0 ∃ μ₁,…,μₙ∈F with |μ₁|+⋯+|μₙ|=1 : ∥μ₁v₁ + ⋯ + μₙvₙ∥ < C.
In particular, choosing C = 1/m, m = 1, 2, 3, …, we can find a sequence of vectors
u^(m) := μ_1^(m)v₁ + ⋯ + μ_n^(m)vₙ
such that ∥u^(m)∥ → 0 as m → ∞ and |μ_1^(m)| + ⋯ + |μ_n^(m)| = 1 for all m.
Sets and Equivalence of Norms
Slide 269
A Basic Norm inequality
Proof (continued).
Hence, for each k = 1, …, n, |μ_k^(m)| ≤ 1, so each coefficient sequence (μ_k^(m)) is bounded. Write
μ^(m) := (μ_1^(m), …, μ_n^(m)).
By the Theorem of Bolzano-Weierstraß in Rⁿ, there exists a subsequence of vectors (μ^(m_j))_{j∈N} that converges to some α = (α₁, …, αₙ) ∈ Rⁿ. This corresponds to a subsequence (u^(m_j)) of (u^(m)) such that
u^(m_j) → α₁v₁ + ⋯ + αₙvₙ =: u  as j → ∞,
with |α₁| + ⋯ + |αₙ| = 1.
Since the vectors v₁, …, vₙ are independent and not all αₖ vanish, it follows that u ≠ 0.
Sets and Equivalence of Norms
Slide 270
A Basic Norm inequality
Proof (continued).
Remark 8.9 then implies
∥u^(m_j)∥ → ∥u∥ ≠ 0  as j → ∞.
But by our construction, ∥u^(m)∥ → 0 as m → ∞, so the subsequence (∥u^(m_j)∥) must also converge to zero. This gives a contradiction.
We can now proceed to prove Theorem 8.11.
Sets and Equivalence of Norms
Slide 271
Equivalence of Norms
Proof of Theorem 8.11.
Let V be a finite-dimensional vector space, ∥·∥ any norm on V and {v₁, …, vₙ} a basis of V. Let v ∈ V have the representation v = λ₁v₁ + ⋯ + λₙvₙ with λ₁, …, λₙ ∈ F. By the triangle inequality,
∥v∥ = ∥λ₁v₁ + ⋯ + λₙvₙ∥ ≤ ∑_{i=1}^n |λᵢ| ∥vᵢ∥ ≤ C ∑_{i=1}^n |λᵢ|,
where C := max_{1≤i≤n} ∥vᵢ∥ depends only on the basis and not on v. We hence see that for any norm there are constants C₁, C₂ > 0 such that
C₁ ∑_{i=1}^n |λᵢ| ≤ ∥v∥ ≤ C₂ ∑_{i=1}^n |λᵢ|,   (8.7)
where the first inequality is just (8.5). Given two norms ∥·∥₁ and ∥·∥₂, it follows from their respective inequalities (8.7) that (8.3) holds.
Sets and Equivalence of Norms
Slide 272
Equivalence of Norms
It is essential that Theorem 8.11 assumes that V is a finite-dimensional
vector space. In an infinite-dimensional vector space, it is possible to define
non-equivalent norms.
8.15. Example. Consider the space C([0, 1]) of continuous functions on [0, 1]. We can define the two norms
∥f∥∞ = sup_{x∈[0,1]} |f(x)|,  ∥f∥₁ = ∫₀¹ |f(x)| dx.
You will show in the assignments that these two norms are not equivalent.
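One family worth experimenting with is fₙ(x) = xⁿ (nonnegative on [0, 1], so the absolute values may be dropped):

With[{n = 10},
 {Integrate[x^n, {x, 0, 1}],            (* ∥f_n∥₁ = 1/(n + 1) → 0 *)
  MaxValue[{x^n, 0 <= x <= 1}, x]}]     (* ∥f_n∥∞ = 1 for every n *)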
Sets and Equivalence of Norms
Slide 273
Interior, Exterior and Boundary Points
8.16. Definition. Let (V, ∥·∥) be a normed vector space and M ⊂ V.
(i) A point x ∈ M is called an interior point of M if there exists an ε > 0 such that B_ε(x) ⊂ M.
(ii) The set of interior points of M is denoted by int M.
(iii) A point x ∈ V is called a boundary point of M if for every ε > 0,
B_ε(x) ∩ M ≠ ∅ and B_ε(x) ∩ (V ∖ M) ≠ ∅.
(iv) The set of boundary points of M is denoted by ∂M.
(v) A point that is neither a boundary nor an interior point of M is called an exterior point of M.
8.17. Remarks.
(i) An exterior point of M is an interior point of V \ M. (Check this!)
(ii) For given M, any point of V is either an interior, boundary or exterior
point of M.
Sets and Equivalence of Norms
Slide 274
Closed Sets
8.18. Definition. Let (V; ∥ · ∥) be a normed vector space and M ⊂ V . Then
M is said to be closed if its complement V \ M is open.
8.19. Remark. Of course, a set M does not need to be either open or
closed. Some sets are open and closed at the same time.
8.20. Examples.
(i) A set consisting of a single point, M = {a} ⊂ V , is a closed set.
(ii) The empty set ∅ ⊂ V is closed.
(iii) The entire space V is a closed set in V .
Sets and Equivalence of Norms
Slide 275
Closed Sets
8.21. Lemma. Let (V, ∥·∥) be a normed vector space and M ⊂ V.
(i) The set M is open if and only if M = int M.
(ii) The set M is closed if and only if ∂M ⊂ M.
Proof.
(i) This is just a restatement of the definition of an open set.
(ii) Suppose that M is closed. Then V ∖ M is open. The open set V ∖ M can not contain a boundary point of M: each of its points has a ball around it contained in V ∖ M, which therefore does not meet M. Hence, ∂M ∩ (V ∖ M) = ∅ and so ∂M ⊂ M.
Suppose that ∂M ⊂ M. Then V ∖ M contains only exterior points of M. But an exterior point of M is an interior point of V ∖ M, so V ∖ M is open. Hence, M is closed.
Sets and Equivalence of Norms
Slide 276
The Closure
8.22. Definition. Let (V; ∥ · ∥) be a normed vector space and M ⊂ V . Then
M̄ := M ∪ ∂M
is called the closure of M.
8.23. Remark. It is not hard to show that the closure of a set M is a closed
set. In fact, it is the smallest set that both contains M and is closed.
The closure of a set may also be characterized in terms of sequences:
8.24. Lemma. Let (V, ∥·∥) be a normed vector space and M ⊂ V. Then
M̄ = { x ∈ V : there exists a sequence (xₙ)_{n∈N} with xₙ ∈ M and xₙ → x }.   (8.8)
Sets and Equivalence of Norms
Slide 277
The Closure
Proof.
(i) Suppose that x ∈ V is such that there exists a sequence (xₙ) with xₙ ∈ M and xₙ → x. Then for every ε > 0, B_ε(x) contains at least one xₙ. Hence, B_ε(x) ∩ M ≠ ∅ and so x can not be an exterior point. This implies x ∈ M ∪ ∂M.
(ii) Suppose x ∈ M ∪ ∂M. Then for every ε > 0, B_ε(x) ∩ M ≠ ∅. Choose ε = 1/n for n ∈ N ∖ {0} to find a sequence of points xₙ ∈ B_{1/n}(x) ∩ M. This sequence converges to x, so x is in the set on the right-hand side of (8.8).
Continuity and Convergence
Slide 278
9. Continuity and Convergence
Continuity and Convergence
Slide 279
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Continuity and Convergence
Slide 280
Continuous Functions
Recall the following definition of continuity in normed vector spaces:
9.1. Definition. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous at a ∈ X if
∀ε>0 ∃δ>0 ∀x∈X : ∥x − a∥_X < δ ⇒ ∥f(x) − f(a)∥_V < ε.   (9.1)
Of course, we can prove as usual the following:
9.2. Theorem. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous at a ∈ X if and only if for every sequence (xₙ)_{n∈N} in X,
xₙ → a ⇒ f(xₙ) → f(a).   (9.2)
Continuity and Convergence
Slide 281
Image and Pre-Image of Sets
Suppose that f : M → N, where M, N are any sets. Let A ⊂ M. Then we define the image of A by
f(A) := {y ∈ N : y = f(x) for some x ∈ A}.
In particular, we can write ran f = f(M). Similarly, for B ⊂ N we define the pre-image of B by
f⁻¹(B) := {x ∈ M : f(x) = y for some y ∈ B}.
9.3. Examples.
(i) Let f : R → R, f(x) = sin x. Then f([0, π]) = [0, 1].
(ii) Let f : R² → R, f(x, y) = x² + y². Then
f⁻¹({1}) = {(x, y) ∈ R² : x² + y² = 1}.   (9.3)
(This is the unit circle in R².)
Continuity and Convergence
Slide 282
Continuous Functions
It is often useful to characterize continuous maps by using open sets:
9.4. Theorem. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function. Then f is continuous if and only if the pre-image f⁻¹(Ω) of every open set Ω ⊂ V is open.
Proof.
(⇒) Let f be continuous and Ω ⊂ V open. We will show that f⁻¹(Ω) is open. Let a ∈ f⁻¹(Ω). Then f(a) ∈ Ω, and since Ω is open we can find ε > 0 such that B_ε(f(a)) ⊂ Ω.
By the continuity of f we can now choose δ > 0 small enough to ensure that f(B_δ(a)) ⊂ B_ε(f(a)). But then B_δ(a) ⊂ f⁻¹(Ω). Since this is true for any a ∈ f⁻¹(Ω), it follows that f⁻¹(Ω) is open.
Continuity and Convergence
Slide 283
Continuous Functions
Proof (continued).
(⇐) Let f : X → V be such that the pre-image f⁻¹(Ω) of every open set Ω ⊂ V is open. We will show that f is continuous. Let a ∈ X be arbitrary and fix ε > 0. We want to show that there exists a δ > 0 such that
x ∈ B_δ(a) ⇒ f(x) ∈ B_ε(f(a)).   (9.4)
The set B_ε(f(a)) is open, and by assumption f⁻¹(B_ε(f(a))) ∋ a is also open. Thus, we can find δ > 0 such that B_δ(a) ⊂ f⁻¹(B_ε(f(a))). But then (9.4) holds and we are finished.
Continuity and Convergence
Slide 284
Continuous Functions
9.5. Example. We show that the function
det : Mat(n × n; C) → C,  det A = ∑_{π∈Sₙ} sgn π · a_{π(1)1} ⋯ a_{π(n)n},
is continuous.
In particular, we can choose to use the norm ∥A∥ = max_{i,j} |aᵢⱼ|. Then fix A = (aᵢⱼ) ∈ Mat(n × n; C) and suppose that (Aₘ) is a sequence converging to A. Our choice of norm implies that all coefficients converge, a_{ij}^(m) → aᵢⱼ. Since det A is a polynomial in the coefficients aᵢⱼ, det Aₘ → det A, and therefore det is continuous at A ∈ Mat(n × n; C).
Note that the pre-image of the set of non-zero complex numbers is
det⁻¹(C ∖ {0}) = GL(n; C),
the general linear group of invertible matrices. Since C ∖ {0} is an open set, Theorem 9.4 implies that GL(n; C) is an open set in Mat(n × n; C).
Continuity and Convergence
Slide 285
Compact Sets
We are now interested in generalizing the results of Vv186 that apply to
continuous functions on closed intervals to vector spaces. Note that a
closed interval in R is always bounded in the following sense:
9.6. Definition. Let (V; ∥ · ∥V ) be a normed vector space and M ⊂ V . Then
M is said to be bounded if there exists some R > 0 such that M ⊂ BR (0).
It turns out that the natural generalization of a closed interval is a little
more complicated than just requiring a set to be closed and bounded.
9.7. Definition. Let (V; ∥ · ∥V ) be a normed vector space and K ⊂ V . Then
K is said to be compact if every sequence in K has a convergent
subsequence with limit contained in K.
Continuity and Convergence
Slide 286
Compact Sets are Closed and Bounded
9.8. Theorem. Let (V; ∥ · ∥V ) be a (possibly infinite-dimensional) normed
vector space and K ⊂ V be compact. Then K is closed and bounded.
Proof.
We first show that K is closed by establishing K̄ = K. Let x ∈ K̄. Then there exists a sequence (xₙ) in K converging to x. Since K is compact, (xₙ) has a subsequence (xₙₖ) that converges to some x′ ∈ K. Since (xₙ) converges to x, x = x′ ∈ K, so K̄ = K and K is closed.
Now suppose that K is unbounded. Then for any n ∈ N there exists an
xn ∈ K such that ∥xn ∥V > n. This gives rise to an unbounded sequence
(xn ). Furthermore, any subsequence of (xn ) is unbounded. Since a
convergent sequence is bounded, we conclude that (xn ) can not have a
convergent subsequence. This implies that K is not compact. By
contraposition, if K is compact, then K must be bounded.
Continuity and Convergence
Slide 287
Closed and Bounded Sets are Sometimes Compact
9.9. Theorem. Let (V; ∥ · ∥V ) be a finite-dimensional vector space and let
K ⊂ V be closed and bounded. Then K is compact.
Proof.
Let (b₁, …, bₙ) be a basis of V and let K be closed and bounded. Let (vₘ) be a sequence in K. Then each sequence term has the representation
vₘ = λ_1^(m) b₁ + ⋯ + λ_n^(m) bₙ,  λ_1^(m), …, λ_n^(m) ∈ F,  m ∈ N.
By Lemma 8.14 and the boundedness of K, there exist constants C₁, C₂ > 0 such that
C₁ ≥ ∥vₘ∥_V ≥ C₂ ∑_{k=1}^n |λ_k^(m)|.
Continuity and Convergence
Slide 288
Closed and Bounded Sets are Sometimes Compact
Proof (continued).
It follows that for each k, the sequence (λ_k^(m)) is bounded. Write
λ^(m) := (λ_1^(m), …, λ_n^(m)).
By the Theorem of Bolzano-Weierstraß in Rⁿ, (λ^(m)) has a convergent subsequence (λ^(m_j)), so that (v_{m_j}) converges to some element v ∈ V. Since K is closed, v ∈ K. This implies that K is compact.
Continuity and Convergence
Slide 289
Closed and Bounded Sets are Sometimes Compact
Theorem 9.9 is in general false in infinite-dimensional spaces:
9.10. Example. Consider the vector space of summable complex sequences,
ℓ¹ := { (aₙ) : N → C : ∑_{n=0}^∞ |aₙ| < ∞ }.
The natural norm is given by
∥(aₙ)∥₁ := ∑_{n=0}^∞ |aₙ|.
Then the closed unit ball
B̄₁(0) = { (aₙ) ∈ ℓ¹ : ∑_{n=0}^∞ |aₙ| ≤ 1 }
is closed and bounded, but not compact. (For instance, the unit sequences e^(k) = (0, …, 0, 1, 0, …) satisfy ∥e^(j) − e^(k)∥₁ = 2 for j ≠ k, so (e^(k)) has no convergent subsequence.)
Continuity and Convergence
Slide 290
Compact Sets and Continuity
Why are we so interested in compact sets? Well, it turns out that compact
sets are natural extensions of closed intervals in R for the purpose of
generalizing some major theorems on continuous functions.
9.11. Proposition. Let (X; ∥ · ∥X ), (V; ∥ · ∥V ) be normed vector spaces and
K ⊂ X compact. Let f : K → V be continuous. Then ran f = f (K) is
compact in V .
Proof.
Let (yn ) be a sequence in f (K). Then there exists a sequence (xn ) in K
with yn = f (xn ). Since K is compact, a subsequence (xnk ) of (xn )
converges to some a ∈ K. But because f is continuous the subsequence
(f (xnk )) of (yn ) converges to f (a) ∈ f (K). Hence, (yn ) has a convergent
subsequence and f (K) is compact.
Continuity and Convergence
Slide 291
Extrema of Continuous Functions on Compact Sets
9.12. Theorem. Let (X; ∥ · ∥X ) be a normed vector space and K ⊂ X
compact. Let f : K → R be continuous. Then f has a maximum in K, i.e.,
there exists an x ∈ K such that f (y ) ≤ f (x) for all y ∈ K.
Proof.
The range ran f = f (K) is compact by Proposition 9.11, so it is closed and
bounded by Theorem 9.8. The least upper bound b = sup f (K) exists
because f (K) is bounded.
Since b is the least upper bound, b can not be an exterior point of f(K), so b belongs to the closure of f(K). Since f(K) is closed, this closure equals f(K), and so b ∈ f(K).
Hence, there exists an x ∈ K with f (x) = b and f (y ) ≤ b for all
y ∈ K.
Continuity and Convergence
Slide 292
Uniform Continuity on Compact Sets
Recall the definition of uniform continuity for functions in vector spaces:
9.13. Definition. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces, Ω ⊂ X and f : Ω → V a function. Then f is uniformly continuous in Ω if
∀ε>0 ∃δ>0 ∀x,y∈Ω : ∥x − y∥_X < δ ⇒ ∥f(x) − f(y)∥_V < ε.   (9.5)
(Compare with Definition 9.1.)
9.14. Theorem. Let (X; ∥ · ∥X ) and (V; ∥ · ∥V ) be normed vector spaces,
K ⊂ X a compact set and f : K → V continuous on K. Then f is
uniformly continuous on K.
Continuity and Convergence
Slide 293
Uniform Continuity on Compact Sets
Proof.
Suppose that f is continuous but not uniformly continuous on K. Then
∃ε>0 ∀δ>0 ∃x,y∈K : ∥x − y∥_X < δ ∧ ∥f(x) − f(y)∥_V ≥ ε.
Denote this ε by ε₀. Then for each δ = 1/n there exist vectors xₙ, yₙ ∈ K such that
∥xₙ − yₙ∥_X < 1/n  ∧  ∥f(xₙ) − f(yₙ)∥_V ≥ ε₀.
Since K is compact, there exist subsequences (xₙₖ) and (yₙₖ) that converge, say to ξ and η, respectively. Since ∥xₙₖ − yₙₖ∥_X < 1/nₖ, we see that ξ = η. However, then
xₙₖ → ξ  ∧  yₙₖ → ξ  ∧  ∥f(xₙₖ) − f(yₙₖ)∥_V ≥ ε₀ ↛ 0,
which contradicts the continuity of f at ξ.
Continuity and Convergence
Slide 294
Continuity and Convergence
We now present two lemmas that will be very useful for our future
discussion. Often, we will discuss convergence of functions that also
depend on parameters. The issue is then how varying these parameters
affects the convergence.
9.15. Lemma. Let (X, ∥·∥_X) and (V, ∥·∥_V) be normed vector spaces and f : X → V a function such that
lim_{x→0} ∥f(x)∥_V = 0.   (9.6)
Then
lim_{x→0} sup_{t∈[0,1]} ∥f(t·x)∥_V = 0.
Continuity and Convergence
Slide 295
Continuity and Convergence
Proof.
We need to show that for any ε > 0 there exists a δ > 0 such that for all x ∈ X the following is true: if ∥x∥_X < δ, then
sup_{t∈[0,1]} ∥f(t·x)∥_V < ε.
Fix ε > 0. Choose a δ > 0 such that whenever ∥y∥_X < δ for y ∈ X, then ∥f(y)∥_V < ε/2. (This is possible by the assumption (9.6).)
Then ∥x∥_X < δ implies ∥t·x∥_X < δ for all t ∈ [0, 1] and hence
∥f(t·x)∥_V < ε/2 for all t ∈ [0, 1].
But then
sup_{t∈[0,1]} ∥f(t·x)∥_V ≤ ε/2 < ε
and the proof is complete.
Continuity and Convergence
Slide 296
Continuity and Convergence
The following lemma clearly shows how uniform continuity is leveraged to provide convergence uniformly with respect to a parameter.
First, we remark that if (X, ∥·∥_X) and (Y, ∥·∥_Y) are normed vector spaces, then so is the set of pairs X × Y with norm
∥(x, y)∥_{X×Y} := ∥x∥_X + ∥y∥_Y.   (9.7)
(Check that this actually defines a norm on X × Y!)
Then if K₁ ⊂ X is compact and K₂ ⊂ Y is compact, it is easy to see (check this!) that K₁ × K₂ is compact in X × Y.
Continuity and Convergence
Slide 297
Continuity and Convergence
9.16. Lemma. Let (X, ∥·∥_X), (Y, ∥·∥_Y) and (V, ∥·∥_V) be normed vector spaces, Ω ⊂ X an open set and K ⊂ Y a compact set. Suppose that f : Ω × K → V is continuous and that
lim_{x→0} ∥f(x, y)∥_V = 0 for every y ∈ K.   (9.8)
Then
lim_{x→0} sup_{y∈K} ∥f(x, y)∥_V = 0.
9.17. Remark. The compactness of K is essential. For example,
lim_{x→0} (1 − e^{−x·y}) = 0 for every y ∈ [0, ∞),
but
lim_{x→0} sup_{y∈[0,∞)} (1 − e^{−x·y}) = 1.
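Both claims are quickly checked in Mathematica:

Limit[1 - E^(-x y), x -> 0]                                  (* 0 for each fixed y *)
Limit[1 - E^(-x y), y -> Infinity, Assumptions -> x > 0]     (* 1 for each fixed x > 0 *)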
Continuity and Convergence
Slide 298
Continuity and Convergence
Proof.
Since we are considering the limit as x → 0, we may restrict the x-domain of f to a compact neighborhood Kₓ of zero (e.g., the closed ball Kₓ = B̄ᵣ(0) for some suitable r > 0).
Then f is considered to be defined on the compact set Kₓ × K and, since f is assumed to be continuous, by Theorem 9.14 it is also uniformly continuous on Kₓ × K.
That means that for any ε > 0 there exists a δ > 0 such that
∀(x,y),(ξ,η)∈Kₓ×K : ∥(x, y) − (ξ, η)∥_{X×Y} < δ ⇒ ∥f(x, y) − f(ξ, η)∥_V < ε.
Now let ε > 0 be fixed and choose a corresponding δ > 0 so that the above implication holds.
Continuity and Convergence
Slide 299
Continuity and Convergence
Proof (continued).
Choosing η = y and ξ = 0 we then have
∥(x, y) − (0, y)∥_{X×Y} = ∥x∥_X
by (9.7). Furthermore,
f(0, y) = 0 for every y ∈ K
by (9.8) and the continuity of f.
Then for our given choice of ε and δ we have
∀x∈Kₓ ∀y∈K : ∥x∥_X < δ ⇒ ∥f(x, y)∥_V < ε.
This implies
∀x∈Kₓ : ∥x∥_X < δ ⇒ sup_{y∈K} ∥f(x, y)∥_V ≤ ε,
which proves the assertion.
The First Derivative
Slide 300
10. The First Derivative
The First Derivative
Slide 301
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
The First Derivative
Slide 302
Calculus on Vector Spaces
In the rest of this term we will develop calculus for “functions of multiple
variables”. This generally means functions defined on (a subset of) Rn , but
it is not any more difficult to treat functions defined on finite-dimensional
vector spaces.
Throughout the following discussion, we assume that V and X denote
finite-dimensional, normed vector spaces. The concrete norm will be
irrelevant, as all norms are equivalent (see Theorem 8.11). We will
consider first the derivative of a function
f : X → V:
10.1. Definition. Let f : X → V₁, g : X → V₂ and x₀ ∈ X. We say that
f(x) = o(g(x)) as x → x₀  ⇔  lim_{x→x₀} ∥f(x)∥_{V₁} / ∥g(x)∥_{V₂} = 0.
The First Derivative
Slide 303
The Derivative of a Function
10.2. Definition. Let X, V be finite-dimensional vector spaces and Ω ⊂ X an open set. Then a map f : Ω → V is called differentiable at x ∈ Ω if there exists a linear map Lₓ ∈ L(X, V) such that
f(x + h) = f(x) + Lₓh + o(h) as h → 0.   (10.1)
In this case we call Lₓ the derivative of f at x and write
Lₓ = Df|ₓ = df|ₓ.
We say that f is differentiable on Ω if it is differentiable for every x ∈ Ω.
10.3. Remarks.
I Just as in the proof of 186 Lemma 3.1.2, we can show that the derivative is uniquely defined by (10.1).
I We may also copy the proof of 186 Lemma 3.1.8 to see that every differentiable function is continuous.
The First Derivative
Slide 304
The Derivative of a Function
If f is differentiable on Ω, we may regard Df as a map
Df : Ω → L(X, V),  x ↦ Df|ₓ.
10.4. Definition. We define
C(Ω, V) := {f : Ω → V : f is continuous},
C¹(Ω, V) := {f : Ω → V : f is differentiable and Df is continuous}.
We may thus regard the derivative D as a (linear) map
D : C¹(Ω, V) → C(Ω, L(X, V)),  f ↦ Df.
The First Derivative
Slide 305
The Derivative of a Function
10.5. Example. Let X, V be finite-dimensional vector spaces and L ∈ L(X, V) a linear map. Then
L(x + h) = Lx + Lh = Lx + DL|ₓh + o(h)  (h → 0),
so the derivative of L at any x ∈ X is DL|ₓ = L.
10.6. Examples. Explicit instances of Example 10.5 are, e.g.,
I Let X = V = C be regarded as real vector spaces and let f : z ↦ z̄ be the (then linear) complex conjugation. Then for z, h ∈ C
f(z + h) = z̄ + h̄,
so Df|_z(h) = h̄.
The First Derivative
Slide 306
The Derivative of a Function
I Regard A ∈ Mat(2 × 2; R) as a linear map R² → R². Then for x, h ∈ R²
A(x + h) = Ax + Ah,
so DA|ₓ(h) = Ah.
I Let tr : Mat(n × n; C) → C be the trace of a square matrix, i.e.,
tr A = tr (aᵢⱼ)_{1≤i,j≤n} = ∑_{i=1}^n aᵢᵢ.
Then the trace is linear and for A, H ∈ Mat(n × n; C)
D tr|_A H = tr H.
The First Derivative
Slide 307
The Derivative of a Function
10.7. Example. Some examples of derivatives of non-linear maps are as follows:
I Let X = V = C be regarded as real vector spaces and f : z ↦ z². Then for z, h ∈ C
(z + h)² = z² + 2zh + h²,
so Df|_z(h) = 2zh.
I Let f : R² → R be given by
f(x) = f(x₁, x₂) = x₁ + 2x₂x₁ + x₂².
Then, for h = (h₁, h₂) ∈ R² and x ∈ R²,
f(x + h) = f(x₁ + h₁, x₂ + h₂)
  = x₁ + h₁ + 2(x₂ + h₂)(x₁ + h₁) + (x₂ + h₂)²
  = f(x) + h₁ + 2(h₂x₁ + h₁x₂ + h₂x₂) + 2h₁h₂ + h₂².
The First Derivative
Slide 308
The Derivative of a Function as a Matrix
In
f(x + h) = f(x) + [ h₁ + 2(h₂x₁ + h₁x₂ + h₂x₂) ] + 2h₁h₂ + h₂²,
the bracketed term, =: L_{(x₁,x₂)}h, is clearly linear in h, while
lim_{h→0} |2h₁h₂| / ∥h∥_{R²} = lim_{h₁,h₂→0} |2h₁h₂| / √(h₁² + h₂²) = 2 lim_{h₁,h₂→0} |h₂| / √(1 + (h₂/h₁)²).
Since |h₂| → 0 as h₂ → 0 and 1/√(1 + (h₂/h₁)²) is bounded, we see that
lim_{h→0} |2h₁h₂| / ∥h∥_{R²} = 0,
and so 2h₁h₂ = o(h) as h → 0. Similarly, we show that h₂² = o(h), so we conclude
Df|ₓh = (1 + 2x₂)h₁ + 2(x₁ + x₂)h₂.
The First Derivative
Slide 309
The Derivative of a Function as a Matrix
Notice that we may express the derivative as a 1 × 2 matrix,
Df|ₓh = ( 1 + 2x₂, 2(x₁ + x₂) ) ( h₁
                                  h₂ ).
This is of course not surprising; if X = Rⁿ and V = Rᵐ, i.e., we are considering a function
f : Rⁿ ⊃ Ω → Rᵐ,  f(x₁, …, xₙ) = ( f₁(x₁, …, xₙ), …, fₘ(x₁, …, xₙ) )ᵀ,
then its derivative at x ∈ Ω (if it exists) is
Df|ₓ ∈ L(Rⁿ, Rᵐ) ≃ Mat(m × n; R).
How do we obtain this matrix? Denote by eⱼ the jth standard basis vector in Rⁿ or Rᵐ. We now consider the columns of Df|ₓ, which are given by Df|ₓeⱼ, j = 1, …, n.
The First Derivative
Slide 310
The Derivative of a Function as a Matrix
Assuming that f is differentiable, for any h ∈ R, x ∈ Rⁿ and j = 1, …, n we have
f(x + heⱼ) = f(x) + Df|ₓ(heⱼ) + o(h),
which we may rewrite as
Df|ₓeⱼ = (1/h)( f(x + heⱼ) − f(x) ) + o(1) = (1/h) ∑_{k=1}^m ( fₖ(x + heⱼ) − fₖ(x) ) eₖ + o(1).
The (i, j)th element of Df|ₓ is given by ⟨eᵢ, Df|ₓeⱼ⟩, so
(Df|ₓ)ᵢⱼ = ⟨eᵢ, Df|ₓeⱼ⟩ = (1/h)( fᵢ(x + heⱼ) − fᵢ(x) ) + o(1).
We now take the limit h → 0 to obtain
(Df|ₓ)ᵢⱼ = ⟨eᵢ, Df|ₓeⱼ⟩ = lim_{h→0} ( fᵢ(x + heⱼ) − fᵢ(x) ) / h.
The First Derivative
Slide 311
Partial Derivatives
10.8. Definition. Let Ω ⊂ Rⁿ and f : Ω → R be differentiable on Ω. We then define the partial derivative with respect to xⱼ at x ∈ Ω by
∂f/∂xⱼ |ₓ := lim_{h→0} ( f(x + heⱼ) − f(x) ) / h
          = lim_{h→0} ( f(x₁, …, xⱼ₋₁, xⱼ + h, xⱼ₊₁, …, xₙ) − f(x) ) / h.
In this notation,
(Df|ₓ)ᵢⱼ = ∂fᵢ/∂xⱼ,
or rather
Df|ₓ = ( ∂f₁/∂x₁ ⋯ ∂f₁/∂xₙ
           ⋮           ⋮
         ∂fₘ/∂x₁ ⋯ ∂fₘ/∂xₙ ) |ₓ.
The First Derivative
Slide 312
Partial Derivatives
There are several notations for the partial derivatives of a function. If f : Rⁿ → R, we may use any of the following:
∂f/∂xⱼ = ∂_{xⱼ}f = ∂ⱼf = f_{xⱼ} = fⱼ
to denote differentiation w.r.t. the variable xⱼ. In practice, we calculate the partial derivative w.r.t. xⱼ by holding all other variables constant and simply differentiating f as a function of xⱼ.
10.9. Example. Let f(x₁, x₂, x₃) = x₁ sin(x₁x₂x₃) + 3x₂²x₁. Then
∂f/∂x₁ = sin(x₁x₂x₃) + x₁x₂x₃ cos(x₁x₂x₃) + 3x₂²,
∂f/∂x₂ = x₁²x₃ cos(x₁x₂x₃) + 6x₂x₁,
∂f/∂x₃ = x₁²x₂ cos(x₁x₂x₃).
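The same derivatives can be obtained with Mathematica’s D:

f = x1 Sin[x1 x2 x3] + 3 x2^2 x1;
{D[f, x1], D[f, x2], D[f, x3]}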
The First Derivative
Slide 313
The Jacobian
Of course, if Df|ₓ exists, we may write it as a matrix of partial derivatives. However, it is not clear whether the existence of all partial derivatives implies the existence of the derivative Df|ₓ. Thus it is useful to consider the matrix of partial derivatives on its own; in fact, it deserves a special designation.
10.10. Definition. Let Ω ⊂ Rⁿ and f : Ω → Rᵐ. Assume that all partial derivatives ∂fᵢ/∂xⱼ of f exist at x ∈ Ω. The matrix
Jf(x) := ( ∂f₁/∂x₁ ⋯ ∂f₁/∂xₙ
             ⋮           ⋮
           ∂fₘ/∂x₁ ⋯ ∂fₘ/∂xₙ ) |ₓ
is called the Jacobian of f.
If the derivative Df|ₓ ∈ L(Rⁿ, Rᵐ) exists, Jf(x) ∈ Mat(m × n; R) is the representing matrix of Df|ₓ w.r.t. the standard bases in Rⁿ and Rᵐ.
The First Derivative
Slide 314
The Jacobian
10.11. Example. Let f : R² → R² be given by f(x₁, x₂) = (x₁² + x₂², x₂ − x₁). Then the partial derivatives are
∂f₁/∂x₁ = ∂/∂x₁ (x₁² + x₂²) = 2x₁,   ∂f₁/∂x₂ = ∂/∂x₂ (x₁² + x₂²) = 2x₂,
∂f₂/∂x₁ = ∂/∂x₁ (x₂ − x₁) = −1,     ∂f₂/∂x₂ = ∂/∂x₂ (x₂ − x₁) = 1.
The Jacobian is given by
Jf(x₁, x₂) = ( 2x₁  2x₂
               −1    1 ).
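Mathematica computes this Jacobian directly (D with the variable list {{x1, x2}}):

f = {x1^2 + x2^2, x2 - x1};
D[f, {{x1, x2}}]   (* {{2 x1, 2 x2}, {-1, 1}} *)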
The natural question that arises is, “Does the existence of Jf (x) imply the
differentiability of f at x?”
The First Derivative
Slide 315
The Jacobian
Regrettably, the answer to that question is negative, as the following example shows:
10.12. Example. Let g : R² → R be given by
g(x₁, x₂) = x₁x₂ / (x₁² + x₂²) for (x₁, x₂) ≠ (0, 0),  g(0, 0) = 0.
Then all partial derivatives of g exist at x = 0, since
∂g/∂x₁ |_{x=0} = lim_{h→0} ( g(0 + h, 0) − g(0) ) / h = 0,
∂g/∂x₂ |_{x=0} = lim_{h→0} ( g(0, 0 + h) − g(0) ) / h = 0.
Thus both partial derivatives exist at x = 0 and in fact vanish.
The First Derivative
Slide 316
The Jacobian
However, g is not even continuous at 0, since for h ≠ 0
g(h, h) = h² / (h² + h²) = 1/2,   g(−h, h) = −h² / ((−h)² + h²) = −1/2,
so lim_{h→0} g(h, h) = 1/2 ≠ −1/2 = lim_{h→0} g(−h, h).
Thus g can not be differentiable at x = 0.
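The two directional limits are easily confirmed in Mathematica:

g[x1_, x2_] := x1 x2/(x1^2 + x2^2)
{Limit[g[h, h], h -> 0], Limit[g[-h, h], h -> 0]}   (* {1/2, -1/2} *)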
The First Derivative
Slide 317
The Jacobian
Thus the existence of the partial derivatives of a function f : Rⁿ → Rᵐ is not even enough to guarantee the continuity of f. However, we have the following result:
10.13. Theorem. Let Ω ⊂ Rⁿ be an open set and f : Ω → Rᵐ such that all partial derivatives ∂ₓⱼfᵢ exist on Ω.
(i) If all partial derivatives are bounded (there exists a constant M > 0 such that |∂ₓⱼfᵢ| ≤ M on Ω), then f is continuous, i.e., f ∈ C(Ω, Rᵐ).
(ii) If all partial derivatives are continuous on Ω, then f is continuously differentiable on Ω, i.e., f ∈ C¹(Ω, Rᵐ). In particular,
Df|ₓ = Jf(x) = ( ∂f₁/∂x₁ ⋯ ∂f₁/∂xₙ
                   ⋮           ⋮
                 ∂fₘ/∂x₁ ⋯ ∂fₘ/∂xₙ ) |ₓ
for all x ∈ Ω.
The First Derivative
Slide 318
The Jacobian
Proof.
Let
f : Rⁿ → Rᵐ,  f(x) = ( f₁(x₁, …, xₙ), …, fₘ(x₁, …, xₙ) )ᵀ.
For both statements of the theorem, we need to consider f(x + h) − f(x). To illustrate, let us first look at the case n = 2. Then
fᵢ(x + h) − fᵢ(x) = fᵢ(x₁ + h₁, x₂ + h₂) − fᵢ(x₁, x₂)
  = [ fᵢ(x₁ + h₁, x₂ + h₂) − fᵢ(x₁ + h₁, x₂) ] + [ fᵢ(x₁ + h₁, x₂) − fᵢ(x₁, x₂) ].
The First Derivative
Slide 319
The Jacobian
Proof (continued).
For fixed h₁, the first difference can be treated by the Mean Value Theorem 3.2.7 of Vv186: define
g : R → R,  g(y) = fᵢ(x₁ + h₁, y).
Then there exists a θ₂ ∈ (x₂, x₂ + h₂) such that
fᵢ(x₁ + h₁, x₂ + h₂) − fᵢ(x₁ + h₁, x₂) = g(x₂ + h₂) − g(x₂) = h₂ · g′(θ₂) = h₂ ∂₂fᵢ(x₁ + h₁, x₂ + τ₂h₂),
where we have chosen τ₂ ∈ (0, 1) such that θ₂ = x₂ + τ₂h₂.
The First Derivative
Slide 320
The Jacobian
Proof (continued).
Similarly, we find that
fᵢ(x₁ + h₁, x₂) − fᵢ(x₁, x₂) = h₁ ∂fᵢ/∂x₁ (x₁ + τ₁h₁, x₂)
for some τ₁ ∈ (0, 1). Generalizing to n ≥ 2, we have constants τ₁, …, τₙ ∈ (0, 1) such that
fᵢ(x + h) − fᵢ(x)
  = fᵢ(x₁ + h₁, x₂ + h₂, …, xₙ + hₙ) − fᵢ(x₁, x₂ + h₂, …, xₙ + hₙ)
  + fᵢ(x₁, x₂ + h₂, …, xₙ + hₙ) − fᵢ(x₁, x₂, x₃ + h₃, …, xₙ + hₙ)
  + ⋯ + fᵢ(x₁, x₂, …, xₙ₋₁, xₙ + hₙ) − fᵢ(x₁, x₂, …, xₙ)
  = h₁ ∂₁fᵢ(x₁ + τ₁h₁, x₂ + h₂, …, xₙ + hₙ)
  + h₂ ∂₂fᵢ(x₁, x₂ + τ₂h₂, x₃ + h₃, …, xₙ + hₙ)
  + ⋯ + hₙ ∂ₙfᵢ(x₁, x₂, …, xₙ + τₙhₙ).
The First Derivative
Slide 321
The Jacobian
Proof (continued).
We proceed with the proof of the theorem.
(i) Suppose that the partial derivatives are bounded. We want to prove that f is continuous at x ∈ Ω, i.e.,
lim_{h→0} f(x + h) = f(x),
where we are free to choose arbitrary norms in Rⁿ and Rᵐ for the convergence. In both spaces we choose the maximum norm ∥·∥∞ (see (8.4)):
∥f(x + h) − f(x)∥∞ = max_{i=1,…,m} |fᵢ(x + h) − fᵢ(x)|
  ≤ n · max_{j=1,…,n} |hⱼ| · max_{i,j} sup_{x∈Ω} |∂fᵢ/∂xⱼ(x)|
  ≤ n · M · ∥h∥∞ → 0 as h → 0.
The First Derivative
Slide 322
The Jacobian
Proof (continued).
(ii) Write
L = ( ∂fᵢ/∂xⱼ )_{i=1,…,m; j=1,…,n} = (Lᵢⱼ)
for the Jacobian. We want to show that
f(x + h) − f(x) − Lh = o(h) as h → 0.
We again choose the maximum norm ∥·∥∞ to establish the convergence and write
uʲ := (x₁, …, xⱼ₋₁, xⱼ + τⱼhⱼ, xⱼ₊₁ + hⱼ₊₁, …, xₙ + hₙ)
for j = 1, …, n. We have the following estimate:
The First Derivative
Slide 323
The Jacobian
Proof (continued).
∥f(x + h) − f(x) − Lh∥∞ = max_{i=1,…,m} | fᵢ(x + h) − fᵢ(x) − ∑_{j=1}^n Lᵢⱼhⱼ |
  = max_{i=1,…,m} | ∑_{j=1}^n hⱼ ( ∂ⱼfᵢ(uʲ) − ∂ⱼfᵢ(x) ) |
  ≤ ∥h∥∞ ∑_{j=1}^n max_{i=1,…,m} | ∂ⱼfᵢ(uʲ) − ∂ⱼfᵢ(x) |
  = o(h) as h → 0,
since each term in the last sum tends to 0 as h → 0. Observe that we use the assumption that ∂ⱼfᵢ is continuous at x. This proves that f is differentiable, L = Df|ₓ, and Df|ₓ depends continuously on x.
The First Derivative
Slide 324
The Jacobian
10.14. Remark. Let Ω ⊂ Rⁿ be an open set. Then
C¹(Ω, Rᵐ) = {f : Ω → Rᵐ : ∂ⱼfᵢ is continuous for j = 1, …, n and i = 1, …, m}.
If m = 1, we write C¹(Ω) := C¹(Ω, R) for short.
We will next establish the product and chain rules for differentiation.
The First Derivative
Slide 325
Generalized Products
To avoid having to re-prove the product rule for various types of products
that we will encounter, we first define a generalized product through
precisely those properties that we shall need.
10.15. Definition. Let X1 ; X2 ; V be normed vector spaces. A map
⊙ : X1 × X2 → V is called a (generalized) product if
1. ⊙ is bilinear, i.e., linear in each entry, and
2. ∥u ⊙ v∥_V ≤ ∥u∥_{X₁} ∥v∥_{X₂} for all u ∈ X₁, v ∈ X₂.
10.16. Examples.
1. The scalar product in Rⁿ;
2. The cross product × : R³ × R³ → R³;
3. For a compact non-empty set K ⊂ Rⁿ and f, g ∈ C(K, R) the pointwise product f·g ∈ C(K, R), defined by
(f·g)(x) = f(x)g(x).
The First Derivative
Slide 326
The Product Rule
10.17. Product Rule. Let U, X₁, X₂, V be finite-dimensional vector spaces and Ω ⊂ U an open set. Let f : Ω → X₁ and g : Ω → X₂ be differentiable maps and ⊙ : X₁ × X₂ → V a generalized product. Then f ⊙ g : Ω → V is also differentiable and
D(f ⊙ g) = (Df) ⊙ g + f ⊙ (Dg).   (10.2)
At x ∈ Ω the right-hand side is interpreted as a linear map U → V,
u ↦ D(f ⊙ g)|ₓu = (Df|ₓu) ⊙ g(x) + f(x) ⊙ (Dg|ₓu).   (10.3)
The First Derivative
Slide 327
The Product Rule
Proof.
The proof is similar to that for the product rule for functions of one variable. We telescope the difference,
f(x + h) ⊙ g(x + h) − f(x) ⊙ g(x)
  = f(x + h) ⊙ ( g(x + h) − g(x) ) + ( f(x + h) − f(x) ) ⊙ g(x)
  = ( f(x) + O(h) ) ⊙ ( Dg|ₓh + o(h) ) + ( Df|ₓh + o(h) ) ⊙ g(x)
as h → 0. Extending the relevant limit theorems from the pointwise product to the generalized product, we have
f(x + h) ⊙ g(x + h) − f(x) ⊙ g(x)
  = f(x) ⊙ (Dg|ₓh) + O(∥h∥²) + o(h) + (Df|ₓh) ⊙ g(x) + o(h)
  = f(x) ⊙ (Dg|ₓh) + (Df|ₓh) ⊙ g(x) + o(h).
The First Derivative
Slide 328
The Chain Rule
10.18. Chain Rule. Let U, X, V be finite-dimensional vector spaces and Ω ⊂ U, Ξ ⊂ X open sets. Let g : Ω → Ξ and f : Ξ → V be differentiable maps. Then the composition f ∘ g : Ω → V is also differentiable, and for all x ∈ Ω
D(f ∘ g)|ₓ = Df|_{g(x)} ∘ Dg|ₓ,   (10.4)
where the right-hand side is a composition of linear maps.
The proof is basically identical to that of 186 Theorem 3.1.12, the chain rule for functions of one real variable. You are encouraged to revisit that proof and apply it to the general chain rule here.
10.19. Example. Consider the polar coordinates (r, φ) ∈ (0, ∞) × [0, 2π), defined through the map
Φ(r, φ) = ( r cos φ
            r sin φ ).
The First Derivative
Slide 329
The Chain Rule
Then
DΦ|_{(r,φ)} = ( ∂Φ₁/∂r  ∂Φ₁/∂φ
                ∂Φ₂/∂r  ∂Φ₂/∂φ ) = ( cos φ  −r sin φ
                                     sin φ   r cos φ ).
Next, consider the map U : R² → R, (x₁, x₂) ↦ x₁² + x₂². The derivative is
DU|ₓ = ( ∂U/∂x₁, ∂U/∂x₂ ) = ( 2x₁, 2x₂ ).
Now (U ∘ Φ)(r, φ) = (r cos φ)² + (r sin φ)² = r². Clearly, D(U ∘ Φ)|_{(r,φ)} = (2r, 0).
We can also apply the chain rule:
D(U ∘ Φ)|_{(r,φ)} = DU|_{(r cos φ, r sin φ)} DΦ|_{(r,φ)}
  = ( 2r cos φ, 2r sin φ ) ( cos φ  −r sin φ
                             sin φ   r cos φ )
  = ( 2r cos²φ + 2r sin²φ, −2r² cos φ sin φ + 2r² sin φ cos φ )
  = (2r, 0).
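The same computation in Mathematica (Phi and u are names chosen here):

Phi = {r Cos[phi], r Sin[phi]};
u = x1^2 + x2^2;
jPhi = D[Phi, {{r, phi}}];                                      (* DΦ *)
gradU = D[u, {{x1, x2}}] /. {x1 -> Phi[[1]], x2 -> Phi[[2]]};   (* DU at Φ(r, φ) *)
Simplify[gradU . jPhi]                                          (* {2 r, 0} *)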
The First Derivative
Slide 330
The Mean Value Theorem
One of the most important properties of differentiable, single-variable functions that we encountered in Vv186 was the mean value theorem: if f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there exists a number ξ ∈ (a, b) such that
( f(b) − f(a) ) / (b − a) = f′(ξ).
We may ask how this might be generalized to functions of several variables: if f : Rⁿ → Rᵐ is differentiable, will it still be true that
f(y) − f(x) = Df|_ξ (y − x)
for some ξ ∈ Rⁿ, perhaps lying on a straight line between x and y?
The First Derivative
Slide 331
The Mean Value Theorem
In the case of a scalar function f : X → R, the theorem will still hold, since we can set
γ(t) = tx + (1 − t)y,  t ∈ [0, 1],
and simply consider f ∘ γ : [0, 1] → R.
Problems will occur, however, if f is vector-valued:
10.20. Example. The function
f : [0, 2π] → R²,  f(x) = ( cos x
                            sin x ),
satisfies f(0) = f(2π) = (1, 0)ᵀ, but Df|_ξ ≠ 0 for all ξ ∈ (0, 2π).
However, we may save a version of the mean value theorem by considering integrals.
The Regulated Integral for Vector-Valued Functions
Slide 332
11. The Regulated Integral for Vector-Valued
Functions
The Regulated Integral for Vector-Valued Functions
Slide 333
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
The Regulated Integral for Vector-Valued Functions
Slide 334
Integrals of Vector-Space-Valued Functions
The following important result will require the integral of a function of a single variable, albeit with values in a vector space V. In other words, we need to assign a meaning to
∫ₐᵇ f(x) dx,  where f : [a, b] → V.
Fortunately, the procedure is completely analogous to that for functions f : R → R, at least for the regulated integral: we define step functions on [a, b] with respect to a partition P by setting them constant on sub-intervals of the partition:
11.1. Definition. Let V be a real or complex vector space. A function f : [a, b] → V is called a step function with respect to a partition P = (a₀, …, aₙ) if there exist elements yᵢ ∈ V such that f(t) = yᵢ whenever aᵢ₋₁ < t < aᵢ, i = 1, …, n. We denote the set of all step functions by Step([a, b], V).
The Regulated Integral for Vector-Valued Functions
Slide 335
Step Functions
11.2. Example. The map
f : [0, 1] → R²,  f(x) = (0, 1/2)ᵀ for 0 ≤ x < 1/2,  (1, 1)ᵀ for x = 1/2,  (2, 0)ᵀ for 1/2 < x ≤ 1,
is a step function.
We then follow through with analogous definitions to the ones for real
functions, replacing the modulus in R by the norm in a vector space:
11.3. Definition. Let I ⊂ R be an interval and (V, ∥·∥_V) a normed vector space. We say that a function f : I → V is bounded if
∥f∥∞ := sup_{x∈I} ∥f(x)∥_V < ∞.   (11.1)
The set of all bounded functions f : I → V is denoted L∞(I, V).
The Regulated Integral for Vector-Valued Functions
Slide 336
Bounded Functions
11.4. Example. The map
f : R → R²,  f(t) = ( sin t
                      e^{−t²} )
is a bounded map. To see this, we endow R² with the norm ∥x∥₁ := |x₁| + |x₂|. (Since all norms in Rⁿ are equivalent, it doesn’t matter which norm we take.) Then
∥f∥∞ := sup_{t∈R} ∥f(t)∥₁ = sup_{t∈R} ( |sin t| + |e^{−t²}| ) ≤ sup_{t∈R} |sin t| + sup_{t∈R} |e^{−t²}| = 2 < ∞.
The Regulated Integral for Vector-Valued Functions
Slide 337
Integrals of Step Functions
We then define the integral of a step function as before:
11.5. Theorem. Let f : [a, b] → V be a step function with respect to some partition P. Then
I_P(f) := (a₁ − a₀)y₁ + ⋯ + (aₙ − aₙ₋₁)yₙ ∈ V
is independent of the choice of the partition P and is called the integral of f.
Note that if f : [a, b] → V, then ∫ₐᵇ f(x) dx ∈ V, i.e., the integral of f is an element of the vector space V.
(This makes it impossible to define the Riemann integral for functions f : I → V, because it relies on comparing the size of upper and lower step functions.)
The Regulated Integral for Vector-Valued Functions
Slide 338
Integrals of Step Functions
The main ingredient is again uniform convergence, where we now say that a sequence of functions (fₙ), fₙ : I → V, I ⊂ R, converges uniformly to f : I → V in a normed vector space (V, ∥·∥_V) if
∥fₙ − f∥∞ := sup_{x∈I} ∥fₙ(x) − f(x)∥_V → 0 as n → ∞.
A function f is then said to be regulated if it is the uniform limit of a sequence of step functions. We can then define the integral of f as the limit of the integrals of these step functions.
You are invited to check that everything in fact works just as in the regulated integral for scalar real functions!
The Regulated Integral for Vector-Valued Functions
Slide 339
Integrals of Step Functions
The upshot is the following: if f : [a, b] → Rⁿ is piecewise continuous, then f is regulated and
∫ₐᵇ f(x) dx = ∫ₐᵇ ( f₁(x), …, fₙ(x) )ᵀ dx = ( ∫ₐᵇ f₁(x) dx, …, ∫ₐᵇ fₙ(x) dx )ᵀ.
(This follows because a sequence of step functions converging uniformly to f will converge uniformly in each component; the individual components are then equal to the “usual” regulated integrals of real-valued functions.)
Furthermore, we have the standard estimate
∥ ∫ₐᵇ f(x) dx ∥_V ≤ ∫ₐᵇ ∥f(x)∥_V dx ≤ |b − a| · sup_{x∈[a,b]} ∥f(x)∥_V.
V
The Regulated Integral for Vector-Valued Functions
Slide 340
The Mean Value Theorem
We may now write down an “integral version” of the mean value theorem
(186 Theorem 3.2.7):
11.6. Mean Value Theorem. Let X, V be finite-dimensional vector spaces, Ω ⊂ X open and f ∈ C¹(Ω, V). Let x, y ∈ Ω and assume that the line segment x + ty, 0 ≤ t ≤ 1, is wholly contained in Ω. Then
f(x + y) − f(x) = ∫₀¹ Df|_{x+ty} y dt = ( ∫₀¹ Df|_{x+ty} dt ) y.   (11.2)
11.7. Remark. The integrals in (11.2) are integrals of elements of V (the integrand Df|_{x+ty} y) and of L(X, V) (the integrand Df|_{x+ty}). Hence the second equality is not trivial but needs to be proved.
The Regulated Integral for Vector-Valued Functions
Slide 341
The Mean Value Theorem
The Mean Value Theorem 11.6 can also be understood as a generalization of the fundamental theorem of calculus: for single-variable functions, the fundamental theorem of calculus can be expressed as
f(x + y) − f(x) = ∫ₓ^{x+y} f′(ξ) dξ.
Substituting t = (ξ − x)/y in the integral, we have the equivalent identity
f(x + y) − f(x) = ∫₀¹ f′(x + ty) y dt,
a special case of (11.2).
The Regulated Integral for Vector-Valued Functions
Slide 342
The Mean Value Theorem
Proof of Theorem 11.6.
Define the auxiliary function g ∈ C¹([0, 1], V) by g(t) := f(x + ty). Then (by 186 Lemma 4.2.3) we have
f(x + y) − f(x) = g(1) − g(0) = ∫₀¹ g′(t) dt.
For γ(t) = x + ty we have γ′(t) = y. Applying the chain rule,
g′(t) = D(f ∘ γ)|ₜ = Df|_{γ(t)} Dγ|ₜ = Df|_{x+ty} y.
Thus we obtain
f(x + y) − f(x) = ∫₀¹ Df|_{x+ty} y dt,
proving the first equality.
The Regulated Integral for Vector-Valued Functions
Slide 343
The Mean Value Theorem
Proof of Theorem 11.6 (continued).
We now prove that y may be “taken out” of the integral. Let us abbreviate L(t) = Df|_{x+ty}. For z ∈ (0, 1) we have
d/dz ∫₀ᶻ L(t)y dt = L(z)y = d/dz { ( ∫₀ᶻ L(t) dt ) y }.
Furthermore, setting z = 0 we have
∫₀⁰ L(t)y dt = 0 = ( ∫₀⁰ L(t) dt ) y.
Therefore,
∫₀ᶻ L(t)y dt = ( ∫₀ᶻ L(t) dt ) y
for all z ∈ [0, 1], in particular also for z = 1.
The Regulated Integral for Vector-Valued Functions
Slide 344
The Mean Value Theorem
11.8. Corollary. From the standard estimate
$$\Big\| \int_a^b f(t)\,dt \Big\|_V ≤ |b − a| \cdot \sup_{t∈[a,b]} \|f(t)\|_V$$
Theorem 11.6 yields
$$\|f(x + y) − f(x)\|_V ≤ \|y\|_X \cdot \sup_{0≤t≤1} \|Df|_{x+ty}\|,$$
where ∥Df|_{x+ty}∥ denotes the operator norm of Df|_{x+ty} ∈ L(X, V).
The Regulated Integral for Vector-Valued Functions
Slide 345
Differentiating Under an Integral
We close this section with a useful result concerning the interchanging of
differentiation and integration.
11.9. Theorem. Let X, V be finite-dimensional, normed vector spaces, I = [a, b] ⊂ R an interval and Ω ⊂ X an open set. Let f : I × Ω → V be a continuous function such that Df(t, ·)|_x exists and is continuous at every (t, x) ∈ I × Ω. Then
$$g(x) = \int_a^b f(t, x)\,dt$$
is differentiable in Ω and
$$Dg(x) = \int_a^b Df(t, \cdot\,)|_x\,dt.$$
The Regulated Integral for Vector-Valued Functions
Slide 346
Differentiating Under an Integral
Proof.
Fix x ∈ Ω and choose h small enough such that x + h ∈ Ω. In any case, we assume ∥h∥_X < 1. We need to prove
$$g(x + h) − g(x) − \underbrace{\Big( \int_a^b Df(t, \cdot\,)|_x\,dt \Big)}_{=:L}\,h = o(h).$$
By the Mean Value Theorem, the left-hand side equals
$$\int_a^b \big( f(t, x + h) − f(t, x) − Df(t, \cdot\,)|_x\,h \big)\,dt = \int_a^b \Big( \int_0^1 Df(t, \cdot\,)|_{x+sh}\,ds\,h − Df(t, \cdot\,)|_x\,h \Big)\,dt = \int_a^b \Big( \int_0^1 \big( Df(t, \cdot\,)|_{x+sh} − Df(t, \cdot\,)|_x \big)\,ds \Big)\,h\,dt. \qquad (11.3)$$
The Regulated Integral for Vector-Valued Functions
Slide 347
Differentiating Under an Integral
Proof (continued).
Taking the norm, we have
$$\|g(x + h) − g(x) − Lh\|_V ≤ (b − a) \sup_{t∈[a,b]} \Big\| \int_0^1 \big( Df(t, \cdot\,)|_{x+sh} − Df(t, \cdot\,)|_x \big)\,ds \Big\| \cdot \|h\|_X ≤ (b − a) \sup_{t∈[a,b]} \sup_{s∈[0,1]} \big\| Df(t, \cdot\,)|_{x+sh} − Df(t, \cdot\,)|_x \big\| \cdot \|h\|_X.$$
Here ∥·∥ denotes the operator norm (4.7). We now need to show that
$$\sup_{t∈[a,b]} \sup_{s∈[0,1]} \big\| Df(t, \cdot\,)|_{x+sh} − Df(t, \cdot\,)|_x \big\|$$
vanishes as h → 0. However, this requires careful reasoning because s and t are free to vary independently of h.
The Regulated Integral for Vector-Valued Functions
Slide 348
Differentiating Under an Integral
Proof (continued).
Define the function Δ : [a, b] × Ω → L(X, V),
$$Δ(t, h) = Df(t, \cdot\,)|_{x+h} − Df(t, \cdot\,)|_x.$$
Since f is assumed to be continuously differentiable, Δ is continuous and
$$\lim_{h→0} \|Δ(t, h)\| = 0 \qquad\text{for any } t ∈ [a, b].$$
By Lemma 9.16 we see that
$$\lim_{h→0} \sup_{t∈[a,b]} \|Δ(t, h)\| = 0.$$
Furthermore, by Lemma 9.15,
$$\lim_{h→0} \sup_{s∈[0,1]} \sup_{t∈[a,b]} \|Δ(t, sh)\| = 0,$$
which is what we wanted to show. □
The Regulated Integral for Vector-Valued Functions
Slide 349
Differentiating Under an Integral
Differentiating an integral with respect to a parameter can be very useful for calculating integrals that are otherwise difficult to evaluate directly. For example, by differentiating
$$g(x) := \int_0^∞ \frac{\sin t}{t}\,e^{−xt}\,dt$$
with respect to x you will show in the assignments that
$$g(0) = \int_0^∞ \frac{\sin t}{t}\,dt = \frac{π}{2}.$$
(Compare with the discussion of the Dirichlet integral last term, see 186 Example 4.2.14.)
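The identity can also be checked numerically. The following sketch (assuming SciPy; it uses the closed form g(x) = π/2 − arctan x that the differentiation yields, which is not stated on the slides) compares the integral with that expression at x = 1, where the factor e^{−xt} makes the integral converge quickly:

```python
import numpy as np
from scipy.integrate import quad

def g(x, T=50.0):
    # integrand sin(t)/t · e^{−xt}; np.sinc(t/π) = sin(t)/t
    val, _ = quad(lambda t: np.sinc(t / np.pi) * np.exp(-x * t), 0.0, T)
    return val

x = 1.0
print(g(x), np.pi / 2 - np.arctan(x))  # both ≈ 0.785398
```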
Curves, Orientation, and Tangent Vectors
Slide 350
12. Curves, Orientation, and Tangent Vectors
Curves, Orientation, and Tangent Vectors
Slide 351
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Curves, Orientation, and Tangent Vectors
Slide 352
Curves
As an important application, we consider vector-space-valued functions of a single variable, γ : R → V. These play an important role in the parametrization of curves. In many applications V = Rⁿ, but our results are applicable for curves in general normed vector spaces (V, ∥·∥).
12.1. Definition. Let (V, ∥·∥) be a normed vector space and I ⊂ R an interval.
I A set C ⊂ V for which there exists a continuous, surjective and locally injective map γ : I → C is called a curve.
I The map γ is called a parametrization of C.
I A curve C together with a parametrization γ, i.e., the pair (C, γ), is called a parametrized curve.
12.2. Remark. Here locally injective means that in the neighborhood B_ε(x) ∩ I of any point x ∈ I the parametrization is injective.
Curves, Orientation, and Tangent Vectors
Slide 353
Local Properties
More generally, we say that a property holds locally at a point p ∈ V if this property holds in some ε-neighborhood B_ε(p).
12.3. Examples.
I If f ∈ C(R) and f(0) > 0, then f is locally positive at 0.
I If f ∈ C¹(R) and f′(0) > 0, then f is locally increasing at 0.
I If f ∈ C¹(R) and f′(0) ≠ 0, then f is locally invertible at 0.
We simply say that a property holds locally if this property holds locally at every point p ∈ V.
12.4. Example. The sequence of functions (f_n) given by f_n(x) = (x − 1/n)² converges to f(x) = x² locally uniformly, because at every point p ∈ R there is an ε-neighborhood such that the convergence is uniform in this neighborhood.
Curves, Orientation, and Tangent Vectors
Slide 354
Curves
12.5. Example. The set
$$S^1 := \{(x_1, x_2) ∈ R^2 : x_1^2 + x_2^2 = 1\}$$
is a curve in R² because we can find a parametrization, e.g.,
$$γ : [0, 2π] → S^1, \qquad γ(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}.$$
It is clear that γ is continuous. Furthermore, ran γ ⊂ S¹ since
$$\cos^2 t + \sin^2 t = 1$$
for all t ∈ [0, 2π].
The map γ is not injective, since γ(0) = γ(2π) = (1, 0), but it is injective on (0, 2π): if γ(t₁) = γ(t₂) ≠ (1, 0), then cos t₁ = cos t₂ and sin t₁ = sin t₂. The second equation implies t₁, t₂ ∈ (0, π] or t₁, t₂ ∈ (π, 2π). However, since the cosine function is injective on (0, π] and on (π, 2π), we obtain t₁ = t₂. Hence γ is locally injective.
Curves, Orientation, and Tangent Vectors
Slide 355
Curves
Now suppose (x₁, x₂) ∈ S¹ is given. Then taking
$$t_0 = \begin{cases} \arctan\frac{x_2}{x_1} & \text{if } x_2 > 0,\ x_1 ≠ 0, \\ π/2 & \text{if } (x_1, x_2) = (0, 1), \\ π + \arctan\frac{x_2}{x_1} & \text{if } x_2 < 0,\ x_1 ≠ 0, \\ 3π/2 & \text{if } (x_1, x_2) = (0, −1) \end{cases} \qquad (12.1)$$
(using a suitable branch of the inverse tangent) gives γ(t₀) = (x₁, x₂) for some t₀ ∈ [0, 2π]. Thus, γ is surjective and therefore a parametrization.
Another parametrization is
$$\tilde γ : [0, 1] → S^1, \qquad \tilde γ(t) = \begin{pmatrix} \cos(2πt) \\ −\sin(2πt) \end{pmatrix}.$$
Both (C, γ) and (C, γ̃) are parametrized curves.
Curves, Orientation, and Tangent Vectors
Slide 356
Parametrizations of Curves
It is clear that a curve will have an infinite number of parametrizations.
Physically, a curve C might be considered to be the path of a particle, while the parametrization γ gives the position of the particle at each time t.
Hence, in Example 12.5, γ describes the counter-clockwise movement of a particle around the unit circle, while γ̃ describes a clockwise movement around the same path. The parametrization γ corresponds to completing the path in time 2π, while γ̃ corresponds to completing the path in time 1. Hence, the parametrization γ̃ can be said to correspond to a greater velocity of the particle.
Curves, Orientation, and Tangent Vectors
Slide 357
Simple, Open and Closed Curves
12.6. Definition. Let C ⊂ V be a curve possessing a parametrization γ : I → C with int I = (a, b) for −∞ ≤ a < b ≤ ∞.
(i) If γ is a (globally) injective parametrization, we say that C is a simple curve.
(ii) If
$$\lim_{t→a} γ(t) = \lim_{t→b} γ(t),$$
the curve C is said to be closed.
(iii) If a curve is not closed, it is said to be open. The points
$$x := \lim_{t→a} γ(t) \qquad\text{and}\qquad y := \lim_{t→b} γ(t)$$
are called the initial point and the final point of the parametrized curve (C, γ). The open curve is said to join x and y.
Curves, Orientation, and Tangent Vectors
Slide 358
Sketches of Simple, Open and Closed Curves
Curves, Orientation, and Tangent Vectors
Slide 359
Initial and Final Points of Open Curves
12.7. Remark. Whether a point is an initial or final point of an open curve
depends on the parametrization. We will explore this a little further.
12.8. Example. The simple open curve
$$C = \{(x_1, x_2) ∈ R^2 : 0 ≤ x_1 ≤ 1,\ x_2 = x_1^2\}$$
joins the points x = (0, 0) and y = (1, 1). Either may be considered the initial point or the final point. Possible parametrizations are
$$γ(t) = \begin{pmatrix} t \\ t^2 \end{pmatrix}, \qquad \tilde γ(t) = \begin{pmatrix} 1 − t \\ (1 − t)^2 \end{pmatrix},$$
where both γ, γ̃ : [0, 1] → C.
Curves, Orientation, and Tangent Vectors
Slide 360
Reparametrization of Curves
12.9. Definition. Let C ⊂ V be a curve with parametrization γ : I → C.
(i) Let J ⊂ R be an interval. A continuous, bijective map r : J → I is called a reparametrization of the parametrized curve (C, γ).
(ii) If r is increasing, the reparametrization is said to be orientation-preserving.
(iii) If r is decreasing, the reparametrization is said to be orientation-reversing.
12.10. Remarks.
(i) Given any two parametrizations γ, γ̃ of an open curve, one can always find a reparametrization by setting r = γ^{−1} ∘ γ̃ (the continuity and local injectivity are enough for this definition to make sense).
(ii) Since every continuous bijective map between intervals in R is either decreasing or increasing (see 186 Theorem 2.5.20), it follows that a reparametrization is either orientation-preserving or orientation-reversing.
Curves, Orientation, and Tangent Vectors
Slide 361
Reparametrization of Curves
12.11. Example. Consider the unit circle S¹ of Example 12.5 with parametrizations
$$γ : [0, 2π] → S^1, \quad γ(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}, \qquad \tilde γ : [0, 1] → S^1, \quad \tilde γ(t) = \begin{pmatrix} \cos(2πt) \\ −\sin(2πt) \end{pmatrix}.$$
Then r : [0, 1] → [0, 2π], r(t) = 2π(1 − t), is a reparametrization of the parametrized curve (C, γ). In fact,
$$\tilde γ = γ ∘ r.$$
The reparametrization is not orientation-preserving since r′(t) = −2π < 0.
Curves, Orientation, and Tangent Vectors
Slide 362
Orientation of Curves
A reparametrization of a parametrized curve (C, γ) yields a new parametrized curve (C, γ̃), where γ̃ = γ ∘ r.
It is easy to see that an orientation-preserving reparametrization of an open curve (C, γ) yields an open parametrized curve (C, γ̃) with the same initial and final points.
12.12. Definition. Let (C, γ) be a parametrized curve and r a reparametrization of (C, γ).
The curve (C, γ̃) with γ̃ = γ ∘ r is said to have the same orientation as (C, γ) if r is orientation-preserving. Otherwise it is said to have reverse orientation.
12.13. Remark. The orientation of an open curve can be fixed by selecting
the initial and final points. The orientation of a closed curve can be fixed
by splitting the curve into two disjoint simple curves and fixing appropriate
orientations for them.
Curves, Orientation, and Tangent Vectors
Slide 363
Orientation of Curves
Hence a curve can have two possible orientations. If we want to fix a curve C together with an orientation (but not necessarily a concrete parametrization), we denote it by C* and if necessary give a single parametrization γ so that (C, γ) has the desired orientation. The same curve with opposite orientation is denoted by −C*. (This will be quite important when we discuss integration later.)
There is in general no natural way to select a “proper” or positive orientation of a curve; rather, both possible orientations have equal validity. There is a single exception, however:
12.14. Definition. Let (C, γ) be a parametrized, simple, closed curve in R². Then C is said to have positive orientation if γ traverses C in a counter-clockwise direction.
Curves, Orientation, and Tangent Vectors
Slide 364
Curves in Polar Coordinates
When we previously introduced polar coordinates in C, we remarked that there is a one-to-one correspondence
$$\mathbb C \setminus \{0\} ∋ x + iy \;↔\; (r, φ) ∈ \mathbb R^+ × [0, 2π).$$
We want to adapt this to R² instead of C, i.e., associate an angle φ and a distance r to every point (x₁, x₂) ∈ R².
One of the main difficulties stems from the fact that we cannot associate an angle φ to x = 0. However, if we do not focus on associating an angle φ to every point x ∈ R², but only on finding a Cartesian point (x₁, x₂) ∈ R² given (r, φ), we can be a bit more flexible.
Curves, Orientation, and Tangent Vectors
Slide 365
Curves in Polar Coordinates
We will allow (r, φ) ∈ R², and associate to them a point x ∈ R² as follows:
$$x = \begin{pmatrix} r\cos φ \\ r\sin φ \end{pmatrix}.$$
Of course this association is not injective, but this will not matter for our present purposes. We consider a particular type of curve, defined through the map
$$γ(t) = \begin{pmatrix} f(t)\cos t \\ f(t)\sin t \end{pmatrix}, \qquad (12.2)$$
where f : R → R is some function. For short, such a curve is sometimes written as
$$r = f(t). \qquad (12.3)$$
A curve given by (12.3) is known as a curve in polar coordinates. The equation (12.3) is to be interpreted in the sense of (12.2).
Curves, Orientation, and Tangent Vectors
Slide 366
Curves in Polar Coordinates
12.15. Example. The cardioid is the plane curve given in polar coordinates by r = 1 − sin t.
(Figure: the cardioid in the x-y-plane.)
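For concreteness (a small sketch assuming NumPy; not from the slides), points on the cardioid can be generated directly from (12.2):

```python
import numpy as np

# sample the cardioid r = 1 − sin t via the map (12.2)
t = np.linspace(0.0, 2.0 * np.pi, 9)
r = 1.0 - np.sin(t)
pts = np.column_stack((r * np.cos(t), r * np.sin(t)))
print(np.round(pts, 2))  # points (x₁, x₂) on the curve
```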
Curves, Orientation, and Tangent Vectors
Slide 367
Smooth Curves
12.16. Definition. A curve C ⊂ V is said to be smooth if there exists a parametrization γ : I → C such that
(i) γ is continuously differentiable on int I and
(ii) γ′(t) ≠ 0 for all t ∈ int I.
A smooth reparametrization is a reparametrization that is continuously differentiable with non-vanishing derivative in the interior of its domain.
The Jacobian J_γ = γ′ of a smooth curve γ : I → Rⁿ is given by
$$γ′(t) = \begin{pmatrix} γ_1′(t) \\ \vdots \\ γ_n′(t) \end{pmatrix}, \qquad t ∈ \operatorname{int} I.$$
Curves, Orientation, and Tangent Vectors
Slide 368
Graphs of Functions as Curves
Let us consider the case of the graph Γ of a function f : I → R, I ⊂ R an interval: it is defined as the set
$$Γ = \{(x, y) ∈ R^2 : x ∈ I,\ y = f(x)\}.$$
This set can be regarded as a curve with parametrization
$$γ : I → R^2, \qquad t ↦ \begin{pmatrix} t \\ f(t) \end{pmatrix}.$$
12.17. Example. Consider the function f : R → R, f(x) = x². Its graph is just the curve parametrized by
$$γ(t) = \begin{pmatrix} γ_1(t) \\ γ_2(t) \end{pmatrix} = \begin{pmatrix} t \\ t^2 \end{pmatrix}.$$
Curves, Orientation, and Tangent Vectors
Slide 369
Curves and Graphs of Functions
(Figure: the graph curve Γ(t) = (t, t²), with the points at t = −1.5, t = 0 and t = 1 marked.)
Curves, Orientation, and Tangent Vectors
Slide 370
Curves and Graphs of Functions
By our previous considerations, γ is differentiable and
$$γ′(t) = \begin{pmatrix} γ_1′(t) \\ γ_2′(t) \end{pmatrix} = \begin{pmatrix} 1 \\ 2t \end{pmatrix}.$$
The graph of γ′ is quite unspectacular.
(Figure: the graph of γ′(t) = (1, 2t).)
Curves, Orientation, and Tangent Vectors
Slide 371
Tangent Lines of Curves
So what is the interpretation of γ′? If γ(t) = (t, f(t)) parametrizes the graph of some function f, then γ′(t) = (1, f′(t)). Recall that the derivative satisfies
$$γ(t_0 + t) = γ(t_0) + γ′(t_0)\,t + o(t),$$
so γ(t₀) + γ′(t₀)t is a linear approximation to the parametrization γ at a point t₀.
In fact, if C ⊂ V is a curve and p = γ(t₀) ∈ C, then
$$T_p C = \{x ∈ V : x = γ(t_0) + γ′(t_0)\,t,\ t ∈ R\}$$
gives the tangent line to C at p.
Curves, Orientation, and Tangent Vectors
Slide 372
Tangent Lines of Curves
Continuing from Example 12.17, we have the following tangent line T_pΓ for p = (1, 1):
(Figure: the parabola Γ = {x : x = γ(t)} together with the tangent line T_pΓ at p = γ(1) = (1, 1).)
Curves, Orientation, and Tangent Vectors
Slide 373
Tangent Lines of Curves
12.18. Example. Consider the curve parametrized by
$$γ : [0, 8π] → R^3, \qquad γ(t) = \begin{pmatrix} t \\ \cos t \\ \sin t \end{pmatrix}.$$
This curve is called a helix.
(Figure: the helix in R³.)
Curves, Orientation, and Tangent Vectors
Slide 374
Tangent Lines of Curves
This is the graph of
$$γ′(t) = \begin{pmatrix} 1 \\ −\sin t \\ \cos t \end{pmatrix}.$$
(Figure: the graph of γ′ in R³.)
Curves, Orientation, and Tangent Vectors
Slide 375
Tangent Lines of Curves
The tangent line makes sense as a linear approximation
$$γ(t_0 + t) = \begin{pmatrix} t_0 \\ \cos t_0 \\ \sin t_0 \end{pmatrix} + \begin{pmatrix} 1 \\ −\sin t_0 \\ \cos t_0 \end{pmatrix} t + o(t).$$
(Figure: the helix with a tangent line.)
Curves, Orientation, and Tangent Vectors
Slide 376
The Tangent Vector to a Curve
12.19. Definition. Let C* ⊂ V be an oriented smooth curve in (V, ∥·∥) and p ∈ C*. Let γ : I → V be a parametrization of C*. Then we define the unit tangent vector to C* at p = γ(t) by
$$T ∘ γ(t) := \frac{γ′(t)}{\|γ′(t)\|}, \qquad t ∈ \operatorname{int} I. \qquad (12.4)$$
This defines the tangent vector field T : C* → V on C: every point of C is associated to a vector in V.
We will show that (12.4) does not depend on which parametrization γ is used to calculate T, as long as γ corresponds to the orientation of C*.
Curves, Orientation, and Tangent Vectors
Slide 377
The Tangent Vector to a Curve
In fact, suppose γ : I → C, γ̃ : J → C are two smooth parametrizations connected by a reparametrization r : J → I so that γ̃ = γ ∘ r. Let p ∈ C satisfy p = γ(t) = γ̃(τ), t = r(τ).
Then
$$\tilde γ′(τ) = \frac{d}{dτ}\,γ(r(τ)) = γ′(r(τ))\,r′(τ) = γ′(t)\,r′(τ).$$
Hence,
$$T ∘ \tilde γ(τ) = \frac{\tilde γ′(τ)}{\|\tilde γ′(τ)\|} = \frac{r′(τ)}{|r′(τ)|}\,\frac{γ′(t)}{\|γ′(t)\|} = \frac{r′(τ)}{|r′(τ)|}\,T ∘ γ(t). \qquad (12.5)$$
If r is orientation-preserving, then r′(τ) > 0 and the tangent vector is the same when calculated using γ as when using γ̃.
If r is orientation-reversing, the tangent vector reverses direction.
Thus (12.4) defines a unique unit tangent vector for an oriented curve.
Curves, Orientation, and Tangent Vectors
Slide 378
The Tangent Vector to a Curve
12.20. Example. Consider the circle of radius R in R²,
$$C := \{x ∈ R^2 : x_1^2 + x_2^2 = R^2\}.$$
By choosing the parametrization
$$γ : [0, 2π) → C, \qquad γ(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}$$
we endow C with a positive (counter-clockwise) orientation. The unit tangent vector at γ(t) is then given by
$$T ∘ γ(t) = \frac{γ′(t)}{\|γ′(t)\|} = \frac{1}{R} \begin{pmatrix} −R\sin t \\ R\cos t \end{pmatrix} = \begin{pmatrix} −\sin t \\ \cos t \end{pmatrix}.$$
Thus, if p = (p₁, p₂) ∈ C, then
$$T(p) = \frac{1}{R} \begin{pmatrix} −p_2 \\ p_1 \end{pmatrix}.$$
Hence T(p) ⊥ p.
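A quick numerical illustration (assuming NumPy; not from the slides): approximating γ′ by central differences reproduces the unit tangent and confirms T(p) ⊥ p:

```python
import numpy as np

R = 2.0
gamma = lambda t: np.array([R * np.cos(t), R * np.sin(t)])

def unit_tangent(t, h=1e-6):
    # central-difference approximation of γ'(t), then normalize
    d = (gamma(t + h) - gamma(t - h)) / (2.0 * h)
    return d / np.linalg.norm(d)

t = 0.7
T, p = unit_tangent(t), gamma(t)
print(T, np.array([-np.sin(t), np.cos(t)]))  # agree
print(np.dot(T, p))                          # ≈ 0, i.e. T(p) ⊥ p
```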
Curve Length, Normal Vectors, and Curvature
Slide 379
13. Curve Length, Normal Vectors, and Curvature
Curve Length, Normal Vectors, and Curvature
Slide 380
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Curve Length, Normal Vectors, and Curvature
Slide 381
Curve Length
Consider a simple curve C ⊂ V parametrized by γ : [a, b] → V, where (V, ∥·∥) is a normed vector space. Then a natural approximation to the length of γ, which will depend on the norm used, is found by taking a partition P = (a₀, …, a_n) of [a, b] and considering the lengths of the straight line segments joining γ(a_{i−1}) to γ(a_i), i = 1, …, n.
The sum of the lengths of these line segments is given by
$$ℓ_P(C) = \sum_{i=1}^n \|γ(a_i) − γ(a_{i−1})\|.$$
We will say that a curve has a length if there exists an upper bound to these lengths. Note that ℓ_P(C) is of course independent of the parametrization γ, since only the actual points γ(a_i) ∈ C are used in this definition.
Curve Length, Normal Vectors, and Curvature
Slide 382
Curve Length
13.1. Definition. Let (V, ∥·∥) be a normed vector space and C ⊂ V an open curve. Then we say that C is rectifiable if
$$ℓ(C) := \sup_{\text{partitions } P} ℓ_P(C)$$
exists. We then call ℓ(C) the curve length or arc length of C.
13.2. Theorem. Let C ⊂ V be a smooth and open curve with parametrization γ : [a, b] → C. Then C is rectifiable if and only if
$$\int_a^b \|γ′(t)\|\,dt < ∞.$$
Furthermore, if C is rectifiable,
$$ℓ(C) = \int_a^b \|γ′(t)\|\,dt, \qquad (13.1)$$
where the right-hand side is independent of γ.
Curve Length, Normal Vectors, and Curvature
Slide 383
Curve Length
13.3. Example. Consider the helix segment C given by the graph of
$$γ : [0, 2π] → R^3, \qquad γ(t) = \begin{pmatrix} αt \\ R\cos t \\ R\sin t \end{pmatrix}, \qquad α, R > 0.$$
The length of C = γ([0, 2π]) is given by
$$ℓ(C) = \int_0^{2π} \|γ′(t)\|\,dt = \int_0^{2π} \sqrt{α^2 + (−R\sin t)^2 + R^2\cos^2 t}\,dt = 2π\sqrt{α^2 + R^2}.$$
13.4. Remark. Definition 13.1 and Theorem 13.2 refer to open curves. To find the length of a closed curve, we express it as the disjoint union of two simple curves and find their lengths separately.
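Definition 13.1 and Theorem 13.2 can be illustrated numerically (a sketch assuming NumPy; not from the slides): the polygonal lengths ℓ_P(C) over fine partitions approach the integral of ∥γ′∥:

```python
import numpy as np

alpha, R = 0.5, 2.0
gamma = lambda t: np.array([alpha * t, R * np.cos(t), R * np.sin(t)])

# polygonal length ℓ_P(C) over a uniform partition of [0, 2π]
t = np.linspace(0.0, 2.0 * np.pi, 1000)
pts = np.array([gamma(s) for s in t])
ell_P = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

print(ell_P, 2.0 * np.pi * np.sqrt(alpha**2 + R**2))  # ℓ_P(C) ≈ ℓ(C)
```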
Curve Length, Normal Vectors, and Curvature
Slide 384
Curve Length
Proof of Theorem 13.2.
We first show that the value of the integral
$$\int_a^b \|γ′(t)\|\,dt$$
does not depend on the parametrization γ. Let C ⊂ V be a smooth curve and γ : [a, b] → C a parametrization of C. Let γ̃ : [α, β] → C be some other parametrization and let r : [α, β] → [a, b] be a smooth reparametrization so that
$$\tilde γ(τ) = γ(r(τ)).$$
Suppose that r is orientation-preserving. Then r(α) = a and r(β) = b.
Curve Length, Normal Vectors, and Curvature
Slide 385
Curve Length
Proof (continued).
Let t = r(τ), so dt = r′(τ) dτ. Furthermore,
$$\tilde γ′(τ) = (γ ∘ r)′(τ) = γ′(r(τ))\,r′(τ) \quad⇔\quad γ′(r(τ)) = \frac{\tilde γ′(τ)}{r′(τ)}.$$
Thus, substituting t = r(τ),
$$\int_a^b \|γ′(t)\|\,dt = \int_α^β \|γ′(r(τ))\|\,r′(τ)\,dτ = \int_α^β \Big\| \frac{\tilde γ′(τ)}{r′(τ)} \Big\|\,r′(τ)\,dτ = \int_α^β \frac{\|\tilde γ′(τ)\|}{|r′(τ)|}\,r′(τ)\,dτ = \int_α^β \|\tilde γ′(τ)\|\,dτ,$$
where we have used that r is increasing, i.e., r′(τ) > 0. This proves that $\int_a^b \|γ′(t)\|\,dt$ is independent of the parametrization γ.
Curve Length, Normal Vectors, and Curvature
Slide 386
Curve Length
Proof (continued).
Now, for any partition P of [a, b] and any parametrization γ we have
$$ℓ_P(C) = \sum_{i=1}^n \|γ(a_i) − γ(a_{i−1})\| = \sum_{i=1}^n \Big\| \int_{a_{i−1}}^{a_i} γ′(t)\,dt \Big\| ≤ \sum_{i=1}^n \int_{a_{i−1}}^{a_i} \|γ′(t)\|\,dt = \int_a^b \|γ′(t)\|\,dt.$$
Hence, $ℓ(C) ≤ \int_a^b \|γ′(t)\|\,dt$.
Curve Length, Normal Vectors, and Curvature
Slide 387
Curve Length
Proof (continued).
Proving the converse inequality is slightly more difficult. We first establish three preliminary estimates. We will use the fact that since C is smooth, γ′ : [a, b] → V is continuous and hence uniformly continuous on [a, b] (see Theorem 9.14). Fix ε > 0.
(i) Choose a δ > 0 such that
$$|t − τ| < δ \quad⇒\quad \|γ′(t) − γ′(τ)\| < \frac{ε}{2(b − a)}$$
for all t, τ ∈ [a, b].
Curve Length, Normal Vectors, and Curvature
Slide 388
Curve Length
Proof (continued).
(ii) Consider the function f : [a, b] → R, f(t) = ∥γ′(t)∥. Since γ is smooth, f is continuous and we can find a step function that uniformly approximates f. In fact, there are a 0 < δ₁ < δ and a partition P = (a₀, …, a_n) of [a, b] with a_i − a_{i−1} < δ₁, i = 1, …, n, such that
$$\Big| \int_a^b \|γ′(t)\|\,dt − \sum_{i=1}^n (a_i − a_{i−1})\,\|γ′(a_{i−1})\| \Big| < ε/2.$$
Curve Length, Normal Vectors, and Curvature
Slide 389
Curve Length
Proof (continued).
(iii) For any t ∈ (a, b) and h < δ with t + h ∈ [a, b],
$$\Big\| \frac{γ(t+h) − γ(t)}{h} − γ′(t) \Big\| = \Big\| \frac{1}{h} \int_t^{t+h} γ′(τ)\,dτ − γ′(t) \Big\| = \Big\| \frac{1}{h} \int_t^{t+h} \big( γ′(τ) − γ′(t) \big)\,dτ \Big\| ≤ \frac{1}{h} \int_t^{t+h} \|γ′(τ) − γ′(t)\|\,dτ ≤ \frac{h}{h} \sup_{τ∈[t,t+h]} \|γ′(τ) − γ′(t)\| < \frac{ε}{2(b − a)}.$$
This implies
$$\|γ′(t)\| ≤ \Big\| \frac{γ(t+h) − γ(t)}{h} \Big\| + \frac{ε}{2(b − a)}.$$
Curve Length, Normal Vectors, and Curvature
Slide 390
Curve Length
Proof (continued).
We then have
$$\begin{aligned} \int_a^b \|γ′(t)\|\,dt &≤ \sum_{i=1}^n (a_i − a_{i−1})\,\|γ′(a_{i−1})\| + \frac{ε}{2} \\ &≤ \sum_{i=1}^n (a_i − a_{i−1})\,\Big\| \frac{γ(a_i) − γ(a_{i−1})}{a_i − a_{i−1}} \Big\| + \sum_{i=1}^n (a_i − a_{i−1})\,\frac{ε}{2(b − a)} + \frac{ε}{2} \\ &= \sum_{i=1}^n \|γ(a_i) − γ(a_{i−1})\| + (b − a)\,\frac{ε}{2(b − a)} + \frac{ε}{2} \\ &= ℓ_P(C) + ε ≤ ℓ(C) + ε. \end{aligned}$$
Letting ε → 0, we obtain the desired inequality. □
Curve Length, Normal Vectors, and Curvature
Slide 391
Curve Length
We can now express the total curve length by
$$ℓ(C) = \int_a^b \|γ′(t)\|\,dt.$$
More generally, we can define a length function
$$(ℓ ∘ γ)(t) = \int_a^t \|γ′(τ)\|\,dτ \qquad (13.2)$$
so that (ℓ ∘ γ)(b) = ℓ(C).
The function
$$ℓ ∘ γ : [a, b] → [0, ∞)$$
associates to each value of t the length of the curve up to the point γ(t). Since the integrand in (13.2) is strictly positive, ℓ ∘ γ is strictly increasing and hence bijective onto its range [0, ℓ(C)].
Curve Length, Normal Vectors, and Curvature
Slide 392
Curve Length
13.5. Example. We return to Example 12.20 and study the circle of radius R in R² with parametrization
$$γ : [0, 2π) → C, \qquad γ(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$
The curve length is given by
$$(ℓ ∘ γ)(t) = \int_0^t \|γ′(τ)\|\,dτ = \int_0^t R\,dτ = Rt.$$
Thus, for p = (p₁, p₂) ∈ C, we can read off
$$ℓ(p) = R \cdot \arctan\frac{p_2}{p_1},$$
where the arctangent is understood in the sense of (12.1); i.e., the appropriate branch is chosen depending on the signs of p₁ and p₂.
Curve Length, Normal Vectors, and Curvature
Slide 393
Parametrization Through Curve Length
The map ℓ : C* → [0, ∞) is bijective, since its inverse is given by
$$ℓ^{−1} = γ ∘ (ℓ ∘ γ)^{−1}.$$
If s = ℓ(p) is the curve length at p ∈ C*, then p = ℓ^{−1}(s) is the unique point in C associated to this curve length.
In other words, once we fix an orientation of a simple curve C, the curve length determines all other points of C* uniquely. This means that we can use the curve length as a natural parametrization of C, i.e., we can parametrize C using
$$γ = ℓ^{−1} : I → C, \qquad \operatorname{int} I = (0, ℓ(C)).$$
If we want to parametrize closed curves through curve length, we must fix an “initial point” in some fashion.
Curve Length, Normal Vectors, and Curvature
Slide 394
Line Integrals
We may extend the concept of the curve length integral (13.1) for a parametrized curve (C, γ),
$$\int_a^b \|γ′(t)\|\,dt,$$
to integrals of the form
$$\int_a^b f(γ(t)) \cdot \|γ′(t)\|\,dt,$$
where f is a real-valued function defined on C. Such an integral is called a line integral of the scalar function f.
Curve Length, Normal Vectors, and Curvature
The Line Integral of a Scalar Function
Slide 395
Suppose that we are given a simple, open, oriented curve C* ⊂ R² and a scalar function f : R² → R. In the sketch below, the red curve C* joins the points p and q in the x₁-x₂-plane, and the function f is given by f(x₁, x₂) = 4/5 + x₁² sin x₂.
Curve Length, Normal Vectors, and Curvature
Slide 396
The Line Integral of a Scalar Function
Suppose that C* is parametrized by a function γ : [a, b] → C such that γ(a) = p and γ(b) = q. Then the blue curve below shows the values of f ∘ γ : [a, b] → R.
Curve Length, Normal Vectors, and Curvature
Slide 397
The Line Integral of a Scalar Function
We now want to integrate the values of f along the red curve, i.e., we will
determine the area of the green surface.
Curve Length, Normal Vectors, and Curvature
Slide 398
The Line Integral of a Scalar Function
For clarity, the graph of f has been removed in the sketch below.
Curve Length, Normal Vectors, and Curvature
Slide 399
The Line Integral of a Scalar Function
By considering the composition f ◦ ‚, we are effectively “straightening out”
the red curve to the interval [a; b].
Curve Length, Normal Vectors, and Curvature
Slide 400
The Line Integral of a Scalar Function
13.6. Definition. Let C* be a smooth, oriented curve in a normed vector space V and f : C → R a continuous function.
We then define the line integral of f along C* by
$$\int_{C^*} f\,dℓ := \int_I (f ∘ γ)(t) \cdot \|γ′(t)\|\,dt, \qquad (13.3)$$
where γ : I → C is a (any) parametrization of C*.
13.7. Remarks.
I Using the chain rule it can easily be seen that this integral is independent of the parametrization of C*.
I The line integral along a piecewise-smooth curve is defined as the sum of the integrals over the individual smooth segments.
Curve Length, Normal Vectors, and Curvature
Slide 401
The Scalar Line Element
The symbol “dℓ” in (13.3) is, strictly speaking, unnecessary decoration. However, it can be interpreted geometrically as a scalar line element, i.e., an infinitesimal length.
Inspired by (13.2),
$$ℓ(γ(t)) = \int_a^t \|γ′(τ)\|\,dτ,$$
one sometimes thinks of this “line element” as
$$dℓ = \|γ′(t)\|\,dt,$$
but this equation should not be interpreted in a strict mathematical sense.
Curve Length, Normal Vectors, and Curvature
Slide 402
A Physical Wire
13.8. Example. The mass of a physical wire (interpreted as a curve, i.e., having no thickness) can be obtained by integrating its density along its path. If a wire C is taken to have variable density ρ, its mass is given by
$$m = \Big| \int_C ρ\,dℓ \Big|.$$
As an example, consider a semi-circular wire
$$C = \{(x, y) ∈ R^2 : x^2 + y^2 = 1,\ y ≥ 0\}$$
with density ρ(x, y) = k(1 − y), where k > 0 is a constant. (Thus the wire is denser at its base and lighter at the top. We might alternatively interpret the varying density as varying thickness of a uniformly dense wire.)
Curve Length, Normal Vectors, and Curvature
Slide 403
Total Mass of a Wire
We choose the parametrization γ(t) = (cos t, sin t), I = [0, π]. We have
$$\int_C ρ\,dℓ = \int_0^π ρ ∘ γ(t) \cdot \|γ′(t)\|\,dt = \int_0^π k(1 − \sin t) \cdot 1\,dt = k(π − 2),$$
so m = |k(π − 2)| = k(π − 2).
Curve Length, Normal Vectors, and Curvature
Slide 404
Center of Mass of a Wire
The center of mass of the wire is given by (x_c, y_c), where
$$x_c := \frac{1}{m} \int_C x\,ρ\,dℓ, \qquad y_c := \frac{1}{m} \int_C y\,ρ\,dℓ$$
(of course, an analogous formula holds for objects represented as one-dimensional curves in Rⁿ).
In our example,
$$x_c = \frac{1}{m} \int_0^π (x \cdot ρ) ∘ γ(t)\,dt = \frac{1}{m} \int_0^π \cos t \cdot k(1 − \sin t)\,dt = 0,$$
$$y_c = \frac{1}{m} \int_0^π \sin t \cdot k(1 − \sin t)\,dt = \frac{4 − π}{2(π − 2)}.$$
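These integrals are easy to confirm with one-dimensional quadrature, since ∥γ′(t)∥ = 1 here (a sketch assuming SciPy; not from the slides):

```python
import numpy as np
from scipy.integrate import quad

k = 1.0
rho = lambda t: k * (1.0 - np.sin(t))       # density along γ(t) = (cos t, sin t)

m,  _ = quad(rho, 0.0, np.pi)               # ‖γ'(t)‖ = 1, so dℓ = dt
yc, _ = quad(lambda t: np.sin(t) * rho(t), 0.0, np.pi)

print(m, k * (np.pi - 2.0))                           # mass
print(yc / m, (4.0 - np.pi) / (2.0 * (np.pi - 2.0)))  # y_c
```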
Curve Length, Normal Vectors, and Curvature
Slide 405
Rate of Change of the Tangent Vector
In order to gain more insight into the geometric properties of a curve, we will study the rate of change of the tangent vector as it “travels” along a curve. We assume from now on that
I V is a real inner product space and
I C ⊂ V has a parametrization γ such that γ″ exists and γ″ ≠ 0.
We call such a C a smooth C²-curve and γ a C²-parametrization.
We are interested in
$$\frac{d}{dt}\big( T ∘ γ(t) \big). \qquad (13.4)$$
Now T ∘ γ itself parametrizes a curve T, so (13.4) gives the tangent vector of T at T ∘ γ(t). Moreover, since ∥T∥ = 1 we see that
$$T ⊂ S = \{x ∈ V : \|x\| = 1\}.$$
Curve Length, Normal Vectors, and Curvature
Slide 406
The Normal Vector of a Curve
Just as in Example 12.20, this implies that (13.4) is perpendicular to T ∘ γ(t):
$$1 = \|T ∘ γ(t)\|^2 = ⟨T ∘ γ(t), T ∘ γ(t)⟩ \quad⇒\quad 0 = \frac{d}{dt} ⟨T ∘ γ(t), T ∘ γ(t)⟩ = 2\,⟨(T ∘ γ)′(t), T ∘ γ(t)⟩. \qquad (13.5)$$
13.9. Definition. Let C ⊂ V be a smooth C²-curve. Let γ : I → V be a smooth C²-parametrization of C. Then the unit normal vector N : C → V is defined by
$$N ∘ γ(t) := \frac{(T ∘ γ)′(t)}{\|(T ∘ γ)′(t)\|}, \qquad t ∈ \operatorname{int} I. \qquad (13.6)$$
Curve Length, Normal Vectors, and Curvature
Slide 407
The Normal Vector of a Curve
The unit normal vector does not depend on γ, not even up to orientation: suppose γ : I → C, γ̃ : J → C are two C²-parametrizations connected by a C²-reparametrization r : J → I so that γ̃ = γ ∘ r. Let p ∈ C satisfy p = γ(t) = γ̃(τ), t = r(τ).
By (12.5),
$$T ∘ \tilde γ(τ) = \frac{r′(τ)}{|r′(τ)|}\,T ∘ γ(t).$$
Suppose that r is orientation-reversing. Then
$$\frac{d}{dτ}\,T ∘ \tilde γ(τ) = −\frac{d}{dτ}\,T ∘ γ(r(τ)) = −(T ∘ γ)′(r(τ))\,r′(τ)$$
and
$$N ∘ \tilde γ(τ) = −\frac{r′(τ)}{|r′(τ)|}\,\frac{(T ∘ γ)′(t)}{\|(T ∘ γ)′(t)\|} = \frac{(T ∘ γ)′(t)}{\|(T ∘ γ)′(t)\|}.$$
Of course, if r is orientation-preserving, the proof is similar.
Curve Length, Normal Vectors, and Curvature
Slide 408
The Normal Vector of a Curve
13.10. Example. We return to Example 12.20 and study the circle of radius R in R² with parametrization
$$γ : [0, 2π) → C, \qquad γ(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$
The unit tangent vector at γ(t) is given by
$$T ∘ γ(t) = \begin{pmatrix} −\sin t \\ \cos t \end{pmatrix} \quad⇒\quad (T ∘ γ)′(t) = \begin{pmatrix} −\cos t \\ −\sin t \end{pmatrix}$$
and ∥(T ∘ γ)′(t)∥ = 1. Then
$$N ∘ γ(t) = \frac{(T ∘ γ)′(t)}{\|(T ∘ γ)′(t)\|} = \begin{pmatrix} −\cos t \\ −\sin t \end{pmatrix}.$$
Thus, if p = (p₁, p₂) ∈ C, then
$$N(p) = −\frac{1}{R} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}.$$
Curve Length, Normal Vectors, and Curvature
Slide 409
Curvature
We are interested in the rate of change of the direction of the tangent vector to a curve. However, while T does not depend on γ, if we simply differentiate T ∘ γ(t), the derivative will depend on γ. In order to obtain a purely geometric measure for the rate of change of T, we need to settle on a “canonical” parametrization. Luckily, we have just developed one: parametrization using curve length. This parametrization takes into account only the specific geometric properties of the curve.
13.11. Definition. The curvature of a smooth C²-curve C ⊂ V is
$$κ : C → R, \qquad κ ∘ ℓ^{−1}(s) := \Big\| \frac{d}{ds}\big( T ∘ ℓ^{−1}(s) \big) \Big\|,$$
where T is the unit tangent vector and ℓ^{−1} : I → C is the curve length parametrization of C.
Note that, like the unit normal vector N, κ also does not depend on the orientation of C.
Curve Length, Normal Vectors, and Curvature
Slide 410
Curvature in Arbitrary Parametrization
Given a parametrization γ : I → C of C (which is not necessarily the curve length), by the chain rule
$$\frac{d(T ∘ γ)}{dt}\Big|_t = \frac{d(T ∘ ℓ^{−1})}{ds}\Big|_{s=ℓ∘γ(t)} \cdot \frac{d(ℓ ∘ γ)}{dt}\Big|_t.$$
Using (13.2),
$$\frac{d(T ∘ ℓ^{−1})}{ds}\Big|_{s=ℓ∘γ(t)} = \frac{1}{\|γ′(t)\|}\,\frac{d(T ∘ γ)}{dt}\Big|_t,$$
and so we obtain for the curvature at p = γ(t) = ℓ^{−1}(s)
$$κ ∘ γ(t) = κ ∘ ℓ^{−1}(s)\big|_{s=ℓ∘γ(t)} = \frac{\|(T ∘ γ)′(t)\|}{\|γ′(t)\|}. \qquad (13.7)$$
Curve Length, Normal Vectors, and Curvature
Slide 411
Curvature
13.12. Example. We return to Example 12.20 and study the circle of radius R in R² with parametrization
$$γ : [0, 2π) → C, \qquad γ(t) = \begin{pmatrix} R\cos t \\ R\sin t \end{pmatrix}.$$
We have
$$T ∘ γ(t) = \begin{pmatrix} −\sin t \\ \cos t \end{pmatrix},$$
so by (13.7)
$$κ ∘ γ(t) = \frac{\|(T ∘ γ)′(t)\|}{\|γ′(t)\|} = \frac{1}{R} \Big\| \begin{pmatrix} −\cos t \\ −\sin t \end{pmatrix} \Big\| = \frac{1}{R}.$$
Thus, the curvature of a circle is constant and equal to the inverse of its radius.
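Formula (13.7) is convenient for numerical work, since it only needs an arbitrary parametrization. A small sketch (assuming NumPy; not from the slides) recovering κ = 1/R for the circle via central differences:

```python
import numpy as np

R = 3.0
gamma = lambda t: np.array([R * np.cos(t), R * np.sin(t)])

def diff(f, t, h=1e-5):
    # central difference of a vector-valued function
    return (f(t + h) - f(t - h)) / (2.0 * h)

def curvature(t):
    # κ ∘ γ(t) = ‖(T ∘ γ)'(t)‖ / ‖γ'(t)‖, see (13.7)
    T = lambda s: diff(gamma, s) / np.linalg.norm(diff(gamma, s))
    return np.linalg.norm(diff(T, t)) / np.linalg.norm(diff(gamma, t))

print(curvature(1.2), 1.0 / R)  # both ≈ 1/3
```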
The Riemann Integral for Scalar-Valued Functions
Slide 412
14. The Riemann Integral for Scalar-Valued
Functions
The Riemann Integral for Scalar-Valued Functions
Slide 413
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
The Riemann Integral for Scalar-Valued Functions
Slide 414
Integration in Rn
Contrary to the general approach we took in differential calculus, we will
now focus on functions defined on Rn , even restricting ourselves to R2 and
R3 in some cases. The reason for this is that the geometry of objects in
general vector spaces (even Rn ) is quite complex, and in order to
understand even the finite-dimensional case, we would need to introduce a
variety of abstract concepts (manifolds, tangent and cotangent spaces
etc.). This is generally done in courses on vector analysis; regrettably, we
do not have time to pursue these things here.
When integrating a function defined on a subset Ω ⊂ R², there are some difficulties that do not occur for functions of a single variable. In particular, while integrating across an interval [a, b] is straightforward, the shape of Ω now plays a significant role.
We will discuss the concept of volume of sets and introduce integrals of
step functions on cuboids before attempting to extend integration to
continuous functions defined on more general sets.
The Riemann Integral for Scalar-Valued Functions
Slide 415
Cuboids
We wish to assign a volume to general sets in Rn .
14.1. Definition. Let a_k, b_k, k = 1, …, n, be pairs of numbers with a_k < b_k. Then the set Q ⊂ Rⁿ given by
$$Q = [a_1, b_1] × \dots × [a_n, b_n] = \{x ∈ R^n : x_k ∈ [a_k, b_k],\ k = 1, …, n\}$$
is called an n-cuboid. We define the volume of Q to be
$$|Q| := \prod_{k=1}^n (b_k − a_k).$$
We will denote the set of all n-cuboids by 𝒬ⁿ.
14.2. Remark. Clearly, an n-cuboid is a compact set in Rⁿ.
The Riemann Integral for Scalar-Valued Functions
Slide 416
Upper and Lower Volumes of Sets
The idea for assigning volume to a subset Ω ⊂ Rⁿ is similar to that for the Riemann integral: consider volumes of enclosing and enclosed n-cuboids; if their infimum and supremum (respectively) are equal, assign this number as the volume of Ω.
14.3. Definition. Let Ω ⊂ Rⁿ be a bounded non-empty set. We define the outer and inner volume of Ω by
$$\overline V(Ω) := \inf\Big\{ \sum_{k=0}^{r} |Q_k| : r ∈ \mathbb N,\ Q_0, …, Q_r ∈ 𝒬^n,\ Ω ⊂ \bigcup_{k=0}^{r} Q_k \Big\},$$
$$\underline V(Ω) := \sup\Big\{ \sum_{k=1}^{r} |Q_k| : r ∈ \mathbb N,\ Q_1, …, Q_r ∈ 𝒬^n,\ Ω ⊃ \bigcup_{k=1}^{r} Q_k,\ \operatorname{int} Q_j ∩ \operatorname{int} Q_k = ∅ \text{ for } j ≠ k \Big\}.$$
It is easy to see that $0 ≤ \underline V(Ω) ≤ \overline V(Ω)$.
The Riemann Integral for Scalar-Valued Functions
Slide 417
Measurable Sets
Sets for which we can define a volume are called measurable. The volume is referred to as the measure of a set.
14.4. Definition. Let Ω ⊂ Rⁿ be a bounded set. Then Ω is said to be (Jordan) measurable if either
(i) $\overline V(Ω) = 0$ or
(ii) $\underline V(Ω) = \overline V(Ω)$.
In the first case, we say that Ω has (Jordan) measure zero; in the second case we say that
$$|Ω| := \underline V(Ω) = \overline V(Ω)$$
is the Jordan measure of Ω.
The Riemann Integral for Scalar-Valued Functions
Slide 418
Sets of Measure Zero
For a set Ω ⊂ Rⁿ to have measure zero, $\underline V(Ω)$ does not need to exist (possibly because there is no n-cuboid that can be a subset of Ω).
14.5. Examples.
(i) A set {x} consisting of a single point x ∈ Rⁿ is a set of measure zero.
(ii) A subset of Rⁿ consisting of a finite number of single points is a set of measure zero.
(iii) A curve of finite length C ⊂ Rⁿ, n ≥ 2, is a set of measure zero.
(iv) A bounded section of a plane in R³ is a set of measure zero.
(v) The set of rational numbers in the interval [0, 1] is not (Jordan) measurable. The outer volume is $\overline V(\mathbb Q ∩ [0, 1]) = 1$, but the inner volume does not exist.
The Riemann Integral for Scalar-Valued Functions
Slide 419
“Almost Everywhere” Properties
14.6. Definition. A function f on Rⁿ that has a property for all x ∈ Rⁿ \ Ω, where Ω is a set of measure zero, is said to have this property almost everywhere (often abbreviated by “a.e.”).
14.7. Examples.
(i) The function f : R → R, f(x) = |x|, is differentiable almost everywhere.
(ii) The function f : [0, 1] × [0, 1] → R,
$$f(x, y) = \begin{cases} \dfrac{1}{x − y} & x ≠ y, \\ 0 & \text{otherwise}, \end{cases}$$
is continuous almost everywhere.
The Riemann Integral for Scalar-Valued Functions
Slide 420
Partitions of Cuboids
Let Q ⊂ Rⁿ be an n-cuboid. We can then define step functions on Q just as we did for intervals. First, recall from 186 Definition 4.1.1 that a partition P of an interval [a, b] is a finite sequence of numbers P = (a₀, …, a_m) with
$$a = a_0 < a_1 < \dots < a_m = b.$$
14.8. Definition. A partition P of an n-cuboid Q = [a₁, b₁] × ⋯ × [a_n, b_n] is a tuple P = (P₁, …, P_n) such that P_k = (a_{k0}, …, a_{k m_k}) is a partition of the interval [a_k, b_k].
The partition P of Q induces cuboids of the form
$$Q_{j_1 j_2 \dots j_n} := [a_{1(j_1−1)}, a_{1 j_1}] × \dots × [a_{n(j_n−1)}, a_{n j_n}] ⊂ Q.$$
The Riemann Integral for Scalar-Valued Functions
Slide 421
Step Functions on Cuboids
The intersections of the cuboids Q_{j₁j₂…j_n} are subsets of
$$Ω = \{x ∈ Q : x_k = a_{k i_k} \text{ for some } k = 1, …, n \text{ and some } i_k\},$$
which is a set of measure zero. We say that a union of sets whose pairwise intersections are sets of measure zero is almost disjoint. Thus,
$$Q = \bigcup_{\substack{1 ≤ j_1 ≤ m_1 \\ \dots \\ 1 ≤ j_n ≤ m_n}} Q_{j_1 j_2 \dots j_n}$$
is a union of almost disjoint cuboids induced by a partition P of Q.
The Riemann Integral for Scalar-Valued Functions
Slide 422
Step Functions on Cuboids
14.9. Definition. Let Q ⊂ Rⁿ be an n-cuboid. A function f : Q → R is called a step function with respect to a partition P if there exist numbers y_{j₁j₂…j_n} ∈ R such that f(x) = y_{j₁j₂…j_n} whenever x ∈ int Q_{j₁j₂…j_n}, j_k = 1, …, m_k, k = 1, …, n.
14.10. Remarks.
(i) It doesn’t matter how the step function is defined on the set
$$Ω = \{x ∈ Q : x_k = a_{k i_k} \text{ for some } k = 1, …, n\},$$
which is a set of measure zero.
(ii) We call f simply a step function on Q if there exists some partition P of Q with respect to which it is a step function.
(iii) The set of step functions on Q is a vector space (the sum of two step functions is a step function w.r.t. a common subpartition of the two partitions). This vector space is a subspace of the space of bounded functions on Q.
The Riemann Integral for Scalar-Valued Functions
Slide 423
Integration of Step Functions
14.11. Theorem. Let Q ⊂ Rⁿ be a cuboid and f : Q → R a step function with respect to some partition P of Q. Then
$$I_P(f) := \sum_{\substack{j_1 = 1, \dots, m_1 \\ \dots \\ j_n = 1, \dots, m_n}} |Q_{j_1 \dots j_n}| \cdot y_{j_1 \dots j_n}$$
is independent of the choice of the partition P and is called the integral of f.
We thus define
$$\int_Q f := I_P(f)$$
for any partition P to be the integral of a step function over a cuboid Q.
The Riemann Integral for Scalar-Valued Functions
Slide 424
The Regulated Integral
Recall that the regulated integral was defined through the following procedure:
I We defined the set of step functions on an interval [a, b] ⊂ R.
I We defined the integral of a step function.
I Those functions that were uniform limits of sequences of step functions were termed regulated functions. Their integral was defined as the limit of the integrals of a corresponding sequence of step functions.
I We showed that the continuous functions are regulated. The same is true for piecewise-continuous functions.
We cannot extend this strategy to sets in Rⁿ; the reason it breaks down is that functions f : Ω → R on general domains Ω ⊂ Rⁿ cannot be approximated uniformly by step functions.
The Riemann Integral for Scalar-Valued Functions
Slide 425
The Riemann / Darboux Integral
However, we have an alternative approach at hand in the form of the Riemann integral. This was defined for a function f : [a, b] → R as follows:
I We defined the set of step functions on an interval [a, b] ⊂ R.
I We defined the set of lower step functions w.r.t. f, i.e., those whose values are less than the values of f.
I We defined the set of upper step functions w.r.t. f, i.e., those whose values are greater than the values of f.
I If the greatest lower bound for the values of the integrals of upper step functions coincides with the least upper bound for the integrals of lower step functions, this must be the integral of f. This integral is called the Darboux integral and is equivalent to the Riemann integral.
I The Riemann integral coincides with the integral for regulated functions, but can be applied even to functions that are not regulated.
The Riemann Integral for Scalar-Valued Functions
Slide 426
Integration over Cuboids
We will now formulate the definition of the Riemann integral for functions of several variables with real values.
14.12. Definition. Let Q ⊂ Rⁿ be an n-cuboid and f a bounded real function on Q. Let U_f denote the set of all step functions u on Q such that u ≥ f and L_f the set of all step functions v on Q such that v ≤ f. The function f is then said to be (Darboux-)integrable if
$$\sup_{v∈L_f} \int_Q v = \inf_{u∈U_f} \int_Q u.$$
In this case, the (Darboux) integral of f over Q, $\int_Q f$, is defined to be this common value.
14.13. Theorem. A bounded function f : Q → R is Darboux-integrable if and only if for every ε > 0 there exist step functions u_ε and v_ε such that v_ε ≤ f ≤ u_ε and
$$\int_Q u_ε − \int_Q v_ε ≤ ε.$$
The Riemann Integral for Scalar-Valued Functions
Slide 427
Integration over Cuboids
14.14. Proposition. Let Q ⊂ Rⁿ be an n-cuboid and f : Q → R be bounded and continuous almost everywhere. Then f is Darboux-integrable.
Proof.
Since f is continuous almost everywhere, we can find a partition of Q such that the set of points where f is discontinuous is contained in cuboids of arbitrarily small measure. Furthermore, since f is bounded, we can find some C > 0 such that
$$−C/2 < f(x) < C/2.$$
For any partition P of Q we denote by Q′ the union of the induced cuboids on which f is discontinuous. Fix ε > 0 and choose a partition of Q such that
$$|Q′| < \frac{ε}{2C}.$$
The Riemann Integral for Scalar-Valued Functions
Slide 428
Integration over Cuboids
Proof (continued).
Let Q″ := Q \ Q′ be the union of the cuboids where f is continuous. Choose the partition P in such a way that
$$\sup_{x, y ∈ Q_{j_1 \dots j_n}} |f(x) − f(y)| ≤ \frac{ε}{2|Q|}$$
for any Q_{j₁…j_n} ⊂ Q″.
Define a step function u as follows:
$$u(x) = \begin{cases} C/2 & x ∈ Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} ⊂ Q′, \\ \sup_{x∈Q_{j_1 \dots j_n}} f(x) & x ∈ Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} ⊂ Q″, \\ f(x) & x ∈ ∂Q_{j_1 \dots j_n}. \end{cases}$$
Clearly, f ≤ u on Q.
The Riemann Integral for Scalar-Valued Functions
Slide 429
Integration over Cuboids
Proof (continued).
Similarly, set
$$v(x) = \begin{cases} −C/2 & x ∈ Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} ⊂ Q′, \\ \inf_{x∈Q_{j_1 \dots j_n}} f(x) & x ∈ Q_{j_1 \dots j_n},\ Q_{j_1 \dots j_n} ⊂ Q″, \\ f(x) & x ∈ ∂Q_{j_1 \dots j_n}, \end{cases}$$
so v ≤ f on Q.
The Riemann Integral for Scalar-Valued Functions
Slide 430
Integration over Cuboids
Proof (continued).
Then
$$\Big| \int_Q u − \int_Q v \Big| ≤ \Big| \int_{Q′} (u − v) \Big| + \Big| \int_{Q″} (u − v) \Big| ≤ |Q′| \cdot \sup_{x∈Q′} |u(x) − v(x)| + |Q″| \cdot \sup_{x∈Q″} |u(x) − v(x)| ≤ \frac{ε}{2C} \cdot C + |Q″| \cdot \frac{ε}{2|Q|} < ε.$$
By Theorem 14.13, f is integrable. □
The Riemann Integral for Scalar-Valued Functions
Slide 431
Integration over Jordan-Measurable Sets
When integrating a function f : U → R defined on some set U ⊂ Rⁿ, we automatically consider the domain of f to be extended to all of Rⁿ by setting f(x) = 0 for x ∈ Rⁿ \ U. At the same time, we define the indicator function for a set Ω ⊂ Rⁿ:
$$\mathbb 1_Ω(x) = \begin{cases} 1 & x ∈ Ω, \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$\mathbb 1_Ω(x)\,f(x) = \begin{cases} f(x) & x ∈ Ω, \\ 0 & x ∉ Ω. \end{cases}$$
The Riemann Integral for Scalar-Valued Functions
Slide 432
Integration over Jordan-Measurable Sets
14.15. Definition. Let U ⊂ Rⁿ, f : U → R and Ω ⊂ Rⁿ be a bounded Jordan-measurable set. Then f is said to be integrable on Ω if for every n-cuboid Q ⊂ Rⁿ such that Ω ⊂ Q the function 1_Ω f : Q → R is integrable. We then write
$$\int_Ω f := \int_Q f \cdot \mathbb 1_Ω$$
for any n-cuboid Q ⊃ Ω.
We omit the proof of the following result:
14.16. Lemma. Let Ω ⊂ Rⁿ be a bounded set. Then Ω is Jordan-measurable if and only if its boundary ∂Ω has Jordan measure zero.
Proposition 14.14 and Lemma 14.16 immediately yield:
14.17. Corollary. Let Ω ⊂ Rⁿ be a bounded Jordan-measurable set and let f : Ω → R be continuous a.e. Then f is integrable on Ω.
The Riemann Integral for Scalar-Valued Functions
Slide 433
Basic Properties of the Integral
From the definition of the integral and measurability of sets, we have the following result:
14.18. Lemma.
(i) Let Ω ⊂ Rⁿ be a measurable set. Then
$$|Ω| = \int_Ω 1.$$
(ii) Let Ω ⊂ Rⁿ be a set of measure zero and f : Ω → R some function that is integrable on Ω. Then $\int_Ω f = 0$.
(iii) Let Ω ⊂ Rⁿ and Ω′ ⊂ Ω be measurable sets and f : Rⁿ → R integrable on Ω. Then f is also integrable on Ω′.
(iv) Let Ω, Ω′ ⊂ Rⁿ be measurable sets and f : Rⁿ → R integrable on both of them. Then f is integrable on Ω ∪ Ω′ and
$$\int_{Ω∪Ω′} f = \int_Ω f + \int_{Ω′} f − \int_{Ω∩Ω′} f.$$
Integration in Practice
Slide 434
15. Integration in Practice
Integration in Practice
Slide 435
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Integration in Practice
Slide 436
Practical Integration over Cuboids
The following result lets us reduce integrals over n-cuboids to separate integrals over n₁- and n₂-cuboids, where n₁ + n₂ = n. Since we know how to integrate over 1-cuboids (intervals!), this is a powerful tool for evaluating general integrals.
15.1. Fubini’s Theorem. Let Q₁ be an n₁-cuboid and Q₂ an n₂-cuboid, so that Q := Q₁ × Q₂ ⊂ R^{n₁+n₂} is an (n₁ + n₂)-cuboid. Assume that f : Q → R is integrable on Q and that for every x ∈ Q₁ the integral
$$g(x) = \int_{Q_2} f(x, \cdot\,)$$
exists. Then
$$\int_Q f = \int_{Q_1×Q_2} f = \int_{Q_1} g = \int_{Q_1} \Big( \int_{Q_2} f \Big).$$
We omit the proof of this theorem; the statement can be shown as in the previous results by considering step functions.
Integration in Practice
Slide 437
Practical Integration over Cuboids
15.2. Example. Consider the 2-cuboid Q = [0, 1] × [0, 2] and the function f : Q → R, f(x, y) = yx² + 2y². The integral
$$g(x) = \int_{[0,2]} f(x, \cdot\,) = \int_0^2 f(x, y)\,dy = \Big[ \frac{x^2}{2} y^2 + \frac{2}{3} y^3 \Big]_{y=0}^{2} = 2x^2 + \frac{16}{3}$$
exists for every x ∈ [0, 1], so we can apply Fubini’s Theorem to yield
$$\int_Q f = \int_{[0,1]} g = \int_0^1 g(x)\,dx = \int_0^1 \Big( \int_0^2 f(x, y)\,dy \Big) dx = \frac{2}{3} \big[ x^3 + 8x \big]_0^1 = \frac{18}{3} = 6.$$
We can thus use Fubini’s theorem to iteratively reduce integrals over n-cuboids to integrals over intervals, which we know well how to calculate.
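Iterated integration in this spirit is what scipy.integrate.dblquad implements; a quick check of the example (a sketch assuming SciPy; note that dblquad integrates its first argument innermost):

```python
from scipy.integrate import dblquad

f = lambda y, x: y * x**2 + 2.0 * y**2   # dblquad expects f(y, x)
val, _ = dblquad(f, 0.0, 1.0, 0.0, 2.0)  # outer: x ∈ [0,1]; inner: y ∈ [0,2]
print(val)  # 6.0 = 18/3
```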
Integration in Practice
Slide 438
Practical Integration over Cuboids
We will often omit the parentheses when evaluating multiple integrals, e.g., we will write
$$\int_0^1 \int_0^2 f(x, y)\,dy\,dx = \int_0^1 \Big( \int_0^2 f(x, y)\,dy \Big) dx.$$
If the hypotheses of Fubini’s Theorem are satisfied, we can therefore write
$$\int_Q f = \int_{a_n}^{b_n} \dots \int_{a_2}^{b_2} \int_{a_1}^{b_1} f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \dots dx_n.$$
We will often abbreviate dx := dx₁ dx₂ … dx_n, writing
$$\int_Q f = \int_Q f(x)\,dx.$$
Integration in Practice
Slide 439
Ordinate and Simple Regions in R2
Of course, we do not just want to integrate over cuboids but also over more complicated domains. For most purposes it is sufficient to consider regions whose boundaries can be expressed by the graphs of functions.
15.3. Definition. A set D ⊂ R² is called an ordinate region with respect to x₂ if there exist an interval I ⊂ R and continuous, almost everywhere differentiable functions φ₁, φ₂ : I → R such that
$$D = \{(x_1, x_2) ∈ R^2 : x_1 ∈ I,\ φ_1(x_1) ≤ x_2 ≤ φ_2(x_1)\}.$$
If the roles of x₁ and x₂ above are interchanged, we say that D is an ordinate region with respect to x₁.
If D ⊂ R² is an ordinate region both with respect to x₁ and x₂, we say that D is a simple region.
Integration in Practice
Slide 440
Ordinate and Simple Regions in R2
15.4. Example. The half-disk region
$$R_1 = \{(x_1, x_2) ∈ R^2 : x_2 ≥ 0,\ x_1^2 + x_2^2 ≤ 1\}$$
is a simple region, because we can write
$$R_1 = \Big\{(x_1, x_2) ∈ R^2 : x_1 ∈ [−1, 1],\ 0 ≤ x_2 ≤ \sqrt{1 − x_1^2}\Big\} = \Big\{(x_1, x_2) ∈ R^2 : x_2 ∈ [0, 1],\ −\sqrt{1 − x_2^2} ≤ x_1 ≤ \sqrt{1 − x_2^2}\Big\}.$$
(Figure: the half-disk R₁ in the x₁-x₂-plane.)
Integration in Practice
Slide 441
Ordinate and Simple Regions in R2
15.5. Example. The upper half-annulus
$$R_2 = \{(x_1, x_2) ∈ R^2 : x_2 ≥ 0,\ 1/4 ≤ x_1^2 + x_2^2 ≤ 1\}$$
is an ordinate region with respect to x₂ but not with respect to x₁. We can write
$$R_2 = \Big\{(x_1, x_2) ∈ R^2 : x_1 ∈ [−1, 1],\ f(x_1) ≤ x_2 ≤ \sqrt{1 − x_1^2}\Big\},$$
where
$$f(x) = \begin{cases} \sqrt{1/4 − x^2} & \text{if } |x| < 1/2, \\ 0 & \text{otherwise.} \end{cases}$$
(Figure: the upper half-annulus R₂ in the x₁-x₂-plane.)
Integration in Practice
Slide 442
Ordinate and Simple Regions in R2
15.6. Example. The annulus
$$R_3 = \{(x_1, x_2) ∈ R^2 : 1/4 ≤ x_1^2 + x_2^2 ≤ 1\}$$
is not an ordinate region (but can be expressed as the union of two ordinate regions).
(Figure: the annulus R₃ in the x₁-x₂-plane.)
Integration in Practice
Slide 443
Ordinate Regions in Rn
We now generalize ordinate regions to Rⁿ. For x ∈ Rⁿ we define
$$\hat x^{(k)} := (x_1, \dots, x_{k−1}, x_{k+1}, \dots, x_n) ∈ R^{n−1}$$
as the vector x with the kth component omitted.
15.7. Definition. A subset U ⊂ Rⁿ is said to be an ordinate region (with respect to x_k) if there exist a measurable set Ω ⊂ R^{n−1} and continuous, almost everywhere differentiable functions φ₁, φ₂ : Ω → R such that
$$U = \{x ∈ R^n : \hat x^{(k)} ∈ Ω,\ φ_1(\hat x^{(k)}) ≤ x_k ≤ φ_2(\hat x^{(k)})\}.$$
If U is an ordinate region with respect to each x_k, k = 1, …, n, it is said to be a simple region.
15.8. Lemma. Any ordinate region is measurable.
Integration in Practice
Slide 444
Ordinate Regions
15.9. Example. The unit ball in R³,
$$B^3 := \{x ∈ R^3 : \|x\| ≤ 1\},$$
is an ordinate region, since we can write
$$B^3 = \Big\{ x ∈ R^3 : (x_1, x_2) ∈ B^2,\ −\sqrt{1 − x_1^2 − x_2^2} ≤ x_3 ≤ \sqrt{1 − x_1^2 − x_2^2} \Big\},$$
where B² := {x ∈ R² : ∥x∥ ≤ 1}. Of course, we still need to check that B² is measurable. However,
$$B^2 = \Big\{ x ∈ R^2 : x_1 ∈ [−1, 1],\ −\sqrt{1 − x_1^2} ≤ x_2 ≤ \sqrt{1 − x_1^2} \Big\}$$
is itself an ordinate region, and since the interval [−1, 1] is measurable, so is B².
Integration in Practice
Slide 445
Integrals on Ordinate Regions
For an ordinate region U ⊂ Rⁿ with respect to x_k over a measurable set Ω, the indicator function 1_U takes the form
$$\mathbb 1_U(x) = \mathbb 1_Ω(\hat x^{(k)}) \cdot \mathbb 1_{[φ_1(\hat x^{(k)}),\,φ_2(\hat x^{(k)})]}(x_k).$$
It then follows that
$$\int_U f(x)\,dx_1 \dots dx_n = \int_Ω \Big( \int_{φ_1(\hat x^{(k)})}^{φ_2(\hat x^{(k)})} f(x)\,dx_k \Big)\,d\hat x^{(k)}$$
if $\int_{φ_1(\hat x^{(k)})}^{φ_2(\hat x^{(k)})} f(x)\,dx_k$ exists for every x̂^(k) ∈ Ω.
Integration in Practice
Slide 446
Integrals on Ordinate Regions
15.10. Example. The volume of a Jordan-measurable set Ω ⊂ Rⁿ is given by
$$|Ω| = \int_Ω 1.$$
As an example, we calculate the volume of the three-dimensional unit ball B³. Writing B³ as an ordinate region, we have
$$|B^3| = \int_{B^3} 1 = \int_{B^2} \int_{−\sqrt{1−x_1^2−x_2^2}}^{\sqrt{1−x_1^2−x_2^2}} 1\,dx_3\,d(x_1, x_2) = 2 \int_{B^2} \sqrt{1 − x_1^2 − x_2^2}\,d(x_1, x_2) = 2 \int_{−1}^{1} \Big( \int_{−\sqrt{1−x_1^2}}^{\sqrt{1−x_1^2}} \sqrt{1 − x_1^2 − x_2^2}\,dx_2 \Big)\,dx_1.$$
We now substitute $y_2 = x_2 / \sqrt{1 − x_1^2}$ in the inner integral.
Integration in Practice
Slide 447
Integrals on Ordinate Regions
$$|B^3| = 2 \int_{−1}^{1} \Big( (1 − x_1^2) \int_{−1}^{1} \sqrt{1 − y_2^2}\,dy_2 \Big)\,dx_1 = 2 \int_{−1}^{1} (1 − x_1^2)\,dx_1 \cdot \int_{−1}^{1} \sqrt{1 − y_2^2}\,dy_2 = 8 \int_0^1 (1 − x_1^2)\,dx_1 \cdot \int_0^1 \sqrt{1 − y_2^2}\,dy_2 = \frac{16}{3} \int_0^1 \sqrt{1 − y_2^2}\,dy_2.$$
Substituting y₂ = sin θ, we obtain
$$|B^3| = \frac{16}{3} \int_0^{π/2} \cos^2 θ\,dθ = \frac{4}{3} π,$$
as expected.
Integration in Practice
Slide 448
Bodies, Moments and Center of Mass
A rigid body is a set B ⊂ Rⁿ (in physics, n = 2, 3) with a mass distribution ρ : B → R. The mass of the body is given by
$$M(B) = \int_B ρ.$$
We define the (first) moments of B by
$$m_k(B) = \int_B x_k\,ρ, \qquad k = 1, …, n.$$
Then the center of mass is given by
$$x_c(B) = \frac{1}{M(B)} \begin{pmatrix} m_1(B) \\ \vdots \\ m_n(B) \end{pmatrix}.$$
If ρ = 1 on B, then x_c(B) represents the geometric center and M(B) = |B| the volume of B.
Integration in Practice
Slide 449
Bodies, Moments and Center of Mass
15.11. Example. Let B ⊂ R² be given by
$$B = \{(x, y) ∈ R^2 : 0 ≤ x ≤ 1,\ 0 ≤ y ≤ x^2\}$$
with ρ(x, y) = x + y. Then
$$M(B) = \int_B ρ = \int_0^1 \int_0^{x^2} (x + y)\,dy\,dx = \int_0^1 (x^3 + x^4/2)\,dx = 1/4 + 1/10 = 7/20.$$
The moments of B are
$$m_1(B) = \int_B x\,ρ = \int_0^1 \int_0^{x^2} x(x + y)\,dy\,dx = \int_0^1 x\,\big[ xy + y^2/2 \big]_{y=0}^{x^2}\,dx = \int_0^1 (x^4 + x^5/2)\,dx = 1/5 + 1/12 = 17/60
Integration in Practice
Slide 450
Bodies, Moments and Center of Mass
and
$$m_2(B) = \int_B y\,ρ = \int_0^1 \int_0^{x^2} y(x + y)\,dy\,dx = \int_0^1 (x^5/2 + x^6/3)\,dx = 1/12 + 1/21 = 11/84.$$
Hence the center of mass is given by
$$x_c(B) = \frac{1}{M(B)} \begin{pmatrix} m_1(B) \\ m_2(B) \end{pmatrix} = \frac{20}{7} \begin{pmatrix} 17/60 \\ 11/84 \end{pmatrix} = \begin{pmatrix} 17/21 \\ 55/147 \end{pmatrix}.$$
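The same numbers can be reproduced with iterated quadrature over the ordinate region 0 ≤ y ≤ x² (a sketch assuming SciPy; not from the slides):

```python
from scipy.integrate import dblquad

rho = lambda y, x: x + y
top = lambda x: x**2                     # upper boundary y = x²

M,  _ = dblquad(rho, 0.0, 1.0, 0.0, top)
m1, _ = dblquad(lambda y, x: x * rho(y, x), 0.0, 1.0, 0.0, top)
m2, _ = dblquad(lambda y, x: y * rho(y, x), 0.0, 1.0, 0.0, top)

print(M, m1 / M, m2 / M)  # 0.35 = 7/20, 17/21 ≈ 0.8095, 55/147 ≈ 0.3741
```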
Integration in Practice
Slide 451
The Substitution Rule
A powerful tool in evaluating integrals is the substitution rule, which takes on an analogous form to that for functions of one variable. We will merely state, and not prove, this result.
15.12. Substitution Rule. Let Ω ⊂ Rⁿ be open and g : Ω → Rⁿ injective and continuously differentiable. Suppose that det J_g(y) ≠ 0 for all y ∈ Ω. Let K be a compact measurable subset of Ω. Then g(K) is compact and measurable, and if f : g(K) → R is integrable, then
$$\int_{g(K)} f(x)\,dx = \int_K f(g(y)) \cdot |\det J_g(y)|\,dy.$$
Integration in Practice
Slide 452
Polar Coordinates
15.13. Examples. The most important substitutions are transformations to cylindrical or spherical/polar coordinates.
(i) Polar coordinates in R² are defined by a map
$$Φ : (0, ∞) × [0, 2π) → R^2 \setminus \{0\}, \qquad (r, φ) ↦ (x, y),$$
where
$$x = r\cos φ, \qquad y = r\sin φ.$$
Note that this map is bijective and even C^∞ in the interior of its domain. An alternative (but rarely used) version of polar coordinates would map x = r sin φ, y = r cos φ. This simply corresponds to a different geometrical interpretation of the angle φ. In any case,
$$|\det J_Φ(r, φ)| = \Big| \det \begin{pmatrix} \cos φ & −r\sin φ \\ \sin φ & r\cos φ \end{pmatrix} \Big| = r.$$
Integration in Practice
Slide 453
Cylindrical Coordinates
(ii) Cylindrical coordinates in R³ are given through a map
$$Φ : (0, ∞) × [0, 2π) × R → R^3, \qquad (r, φ, ζ) ↦ (x, y, z),$$
defined by
$$x = r\cos φ, \qquad y = r\sin φ, \qquad z = ζ.$$
In this case,
$$|\det J_Φ(r, φ, ζ)| = \Big| \det \begin{pmatrix} \cos φ & −r\sin φ & 0 \\ \sin φ & r\cos φ & 0 \\ 0 & 0 & 1 \end{pmatrix} \Big| = r.$$
Integration in Practice
Slide 454
Spherical Coordinates in R3
(iii) Spherical coordinates in R³ are often defined through a map
$$Φ : (0, ∞) × [0, 2π) × (0, π) → R^3 \setminus \{0\}, \qquad (r, φ, θ) ↦ (x, y, z),$$
$$x = r\cos φ\sin θ, \qquad y = r\sin φ\sin θ, \qquad z = r\cos θ.$$
Of course, there is a certain freedom in defining θ and φ, so there are alternative formulations. The modulus of the determinant of the Jacobian is given by
$$|\det J_Φ(r, φ, θ)| = \Big| \det \begin{pmatrix} \cos φ\sin θ & −r\sin φ\sin θ & r\cos φ\cos θ \\ \sin φ\sin θ & r\cos φ\sin θ & r\sin φ\cos θ \\ \cos θ & 0 & −r\sin θ \end{pmatrix} \Big| = r^2 \sin θ.$$
Integration in Practice
Slide 455
Spherical Coordinates in Rn
(iv) In Rⁿ, we can define spherical coordinates by
$$\begin{aligned} x_1 &= r\cos θ_1, \\ x_2 &= r\sin θ_1 \cos θ_2, \\ x_3 &= r\sin θ_1 \sin θ_2 \cos θ_3, \\ &\ \,\vdots \\ x_{n−1} &= r\sin θ_1 \sin θ_2 \dots \sin θ_{n−2} \cos θ_{n−1}, \\ x_n &= r\sin θ_1 \sin θ_2 \dots \sin θ_{n−2} \sin θ_{n−1} \end{aligned}$$
with r > 0 and 0 < θ_k < π, k = 1, …, n − 2, and 0 < θ_{n−1} < 2π. Here, the determinant of the Jacobian can be shown to satisfy
$$|\det J_Φ(r, θ_1, …, θ_{n−1})| = r^{n−1} \sin^{n−2} θ_1\, \sin^{n−3} θ_2 \dots \sin θ_{n−2}.$$
Integration in Practice
Slide 456
The Substitution Rule in Practice
Using spherical coordinates in R³ as an example, we write
$$\int_Ω f = \int_Ω f(x)\,dx = \int_{Φ^{−1}(Ω)} f ∘ Φ(r, θ, φ) \cdot |\det J_Φ(r, θ, φ)|\,dr\,dθ\,dφ.$$
The terms dx and |det J_Φ(r, θ, φ)| dr dθ dφ are often referred to as volume elements, and one sometimes writes
$$dx = |\det J_Φ(r, θ, φ)|\,dr\,dθ\,dφ.$$
Physicists like to interpret dx as an “infinitesimally small volume” whose volume is changed when transforming by Φ^{−1} to dr dθ dφ. Thus |det J_Φ(r, θ, φ)| (which can be interpreted as the size of the parallelepiped spanned by the tangent vectors ∂x/∂r, ∂x/∂θ, ∂x/∂φ at x) corrects this change in volume. These ideas can be made rigorous, but we will not pursue them further.
Integration in Practice
Slide 457
The Substitution Rule in Practice
15.14. Example. We can again calculate the volume of the unit ball
B³ ⊂ R³. Using spherical coordinates,

    |B³| = ∫_{B³} 1 = ∫₀^{2π} ∫₀^π ∫₀^1 r² sin θ dr dθ dφ
         = 2π ∫₀^π sin θ dθ · ∫₀^1 r² dr = 4π/3.

Note that B³ is not given by

    {(x, y, z) ∈ R³ : x = r cos φ sin θ, y = r sin φ sin θ, z = r cos θ,
                      0 ≤ φ < 2π, 0 < θ < π, 0 < r ≤ 1}

because this set does not include the set {(0, 0, x) : x ∈ [−1, 1]}. Since this
set is of measure zero (as is the boundary S² of B³), our calculation
remains correct.
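The iterated integral can be confirmed symbolically in Mathematica
(variable names are ours):

    Integrate[r^2 Sin[th], {r, 0, 1}, {th, 0, Pi}, {phi, 0, 2 Pi}]   (* (4 Pi)/3 *)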
Integration in Practice
Slide 458
Gravitational Potential
15.15. Example. We want to calculate the gravitational potential of a
homogeneous solid ball in R³ of mass M and radius R at a point p ∈ R³
with distance r > R from the center of the sphere. This potential is given
by

    U(p) = −G ∫_{B³} ϱ( · ) / dist(p, · ),

where ϱ is the mass density of the sphere. In our case,

    ϱ = M/|B³| = 3M/(4πR³) · 1_{B³}.

Due to the symmetry of the problem, we may choose coordinates such that
p = (0, 0, r) and introduce polar coordinates

    x₁ = ρ cos φ sin θ,    x₂ = ρ sin φ sin θ,    x₃ = ρ cos θ

with 0 ≤ φ < 2π, 0 ≤ θ < π, 0 < ρ ≤ R.
Integration in Practice
Slide 459
Gravitational Potential
Then

    dist(p, x(ρ, φ, θ)) = √((ρ cos φ sin θ)² + (ρ sin φ sin θ)² + (ρ cos θ − r)²)
                        = √(ρ² + r² − 2rρ cos θ)

and

    U(p) = −(3MG)/(4πR³) ∫_{B³} dx / dist(p, x)
         = −(3MG)/(4πR³) ∫₀^R ∫₀^{2π} ∫₀^π ρ² sin θ / √(ρ² + r² − 2rρ cos θ) dθ dφ dρ
         = −(3MG)/(4πR³) ∫₀^R ∫₀^{2π} [ (ρ/r) √(ρ² + r² − 2rρ cos θ) ]₀^π dφ dρ
         = −(3MG)/(2R³r) ∫₀^R ρ ( √(ρ² + r² + 2rρ) − √(ρ² + r² − 2rρ) ) dρ.
Integration in Practice
Slide 460
Gravitational Potential
Continuing,

    U(p) = −(3MG)/(2R³r) ∫₀^R ρ ( √((ρ + r)²) − √((ρ − r)²) ) dρ
         = −(3MG)/(2R³r) ∫₀^R ρ ( ρ + r − |ρ − r| ) dρ.

Since r > R > ρ, we have

    U(p) = −(3MG)/(2R³r) ∫₀^R ρ ( ρ + r − (r − ρ) ) dρ
         = −(3MG)/(R³r) ∫₀^R ρ² dρ = −MG/r.

Thus the potential induced by a sphere of mass M and radius R at a point
with distance r > R from the center of the sphere is the same as that
induced by a point mass with mass M situated at the center of the sphere.
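A numerical sanity check in Mathematica, with the illustrative values
M = G = R = 1 and r = 2 (so that −GM/r = −0.5):

    With[{M = 1, G = 1, R = 1, r = 2},
     -G (3 M/(4 Pi R^3)) NIntegrate[
        rho^2 Sin[th]/Sqrt[rho^2 + r^2 - 2 r rho Cos[th]],
        {rho, 0, R}, {th, 0, Pi}, {phi, 0, 2 Pi}]]   (* ≈ -0.5 *)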
Integration in Practice
Slide 461
Gravitational Potential
15.16. Remarks. In the physical literature, this is part of what is called the
shell theorem. You will study the other parts of this theorem in the
assignments.
An analogous formula holds for the electrostatic potential induced by a
body with charge density ϱ.
If the mass/charge distribution of the sphere is not uniform, then the
integral becomes much more difficult to solve. One then expands the
integrand using the generating function of the Legendre polynomials
Pₗ, l ∈ N,

    1/√(1 − 2xt + t²) = Σ_{l=0}^∞ Pₗ(x) t^l,

which for every x ∈ [−1, 1] has radius of convergence 1. The same
expansion can be used when summing over several discrete point
charges/masses, where it is then called a multi-pole expansion.
Integration in Practice
Slide 462
Improper Integrals
Just as for integrals of a single variable, we can treat improper Riemann
integrals of functions f : Rⁿ → R over measurable sets Ω ⊂ Rⁿ. These
occur if either
1. f is unbounded or
2. Ω is unbounded.
In either case, one considers the improper integral as a suitable limit of
“proper” integrals; if the limit exists, so does the improper integral.
15.17. Example. Our aim is to prove that the Gauß integral

    ∫_{−∞}^∞ e^{−x²/2} dx

exists and equals √(2π). First, consider the integral

    I(a) = ∫_{−a}^a e^{−x²/2} dx.
Integration in Practice
Slide 463
The Gauß Integral
Since the integrand is positive and continuous, I(a) exists and is
increasing. For a > 1,

    I(a) < ∫_{−a}^{−1} (−x) e^{−x²/2} dx + ∫_{−1}^1 e^{−x²/2} dx + ∫₁^a x e^{−x²/2} dx
         = 2e^{−1/2} − 2e^{−a²/2} + ∫_{−1}^1 e^{−x²/2} dx
         ⟶ 2e^{−1/2} + ∫_{−1}^1 e^{−x²/2} dx < ∞    as a → ∞,

so I(a) is bounded. It follows that lim_{a→∞} I(a) =: ∫_{−∞}^∞ e^{−x²/2} dx exists.
We now consider

    I(a)² = ( ∫_{−a}^a e^{−x²/2} dx ) ( ∫_{−a}^a e^{−y²/2} dy ).
Integration in Practice
Slide 464
The Gauß Integral
By Fubini’s theorem, we can write

    I(a)² = ∫_{Q_a} e^{−(x²+y²)/2} dx dy,

where Q_a = [−a, a] × [−a, a].
Now B_a(0) ⊂ Q_a ⊂ B_{2a}(0), where B_r(0) = {(x, y) ∈ R² : x² + y² < r²}, so

    ∫_{B_a(0)} e^{−(x²+y²)/2} dx dy ≤ I(a)² ≤ ∫_{B_{2a}(0)} e^{−(x²+y²)/2} dx dy.

Using polar coordinates, we calculate

    ∫_{B_R(0)} e^{−(x²+y²)/2} dx dy = ∫₀^{2π} ∫₀^R e^{−r²/2} r dr dφ
                                    = 2π(1 − e^{−R²/2}) ⟶ 2π    as R → ∞.

This implies lim_{a→∞} I(a)² = 2π and hence ∫_{−∞}^∞ e^{−x²/2} dx = √(2π).
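Mathematica evaluates the Gauß integral directly, which may serve as a
check on the result (though not, of course, as a replacement for the proof):

    Integrate[E^(-x^2/2), {x, -Infinity, Infinity}]   (* Sqrt[2 Pi] *)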
Parametrized Surfaces and Surface Integrals
Slide 465
16. Parametrized Surfaces and Surface Integrals
Parametrized Surfaces and Surface Integrals
Slide 466
Continuity, Differentiability, Integrability
8. Sets and Equivalence of Norms
9. Continuity and Convergence
10. The First Derivative
11. The Regulated Integral for Vector-Valued Functions
12. Curves, Orientation, and Tangent Vectors
13. Curve Length, Normal Vectors, and Curvature
14. The Riemann Integral for Scalar-Valued Functions
15. Integration in Practice
16. Parametrized Surfaces and Surface Integrals
Parametrized Surfaces and Surface Integrals
Slide 467
Parametrized Surfaces
We will now introduce surfaces in Rⁿ. While it is possible to discuss
surfaces without reference to a parametrization and then consider
different parametrizations and reparametrizations as we did for curves,
this requires more mathematical background than we have developed at
this point. Therefore, we restrict ourselves to parametrized surfaces, i.e.,
surfaces that are accompanied by a fixed parametrization.
16.1. Definition. A (smooth, parametrized) m-surface in Rⁿ is a subset
S ⊂ Rⁿ together with a locally bijective, continuously differentiable map
(parametrization)

    ϕ : Ω → S,    Ω ⊂ Rᵐ,

such that

    rank Dϕ|ₓ = m

for almost every x ∈ Ω.
If m = n − 1, then (S, ϕ) is said to be a hypersurface.
Parametrized Surfaces and Surface Integrals
Slide 468
Parametrized Surfaces
16.2. Example. The unit sphere in R³,

    S² := {(x₁, x₂, x₃) ∈ R³ : x₁² + x₂² + x₃² = 1},

is a two-surface with parametrization

    ϕ : [0, 2π] × [0, π] → S²,    ϕ(φ, θ) = ( cos φ sin θ, sin φ sin θ, cos θ )ᵀ.

We note that

    rank Dϕ|₍φ,θ₎ = rank ( −sin φ sin θ   cos φ cos θ
                            cos φ sin θ   sin φ cos θ
                            0             −sin θ ) = 2

when sin θ ≠ 0. Hence rank Dϕ = 2 almost everywhere on [0, 2π] × [0, π].
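The generic rank can be checked in Mathematica (symbols phi, th are ours;
MatrixRank treats symbolic entries generically, i.e., away from sin θ = 0):

    MatrixRank[D[{Cos[phi] Sin[th], Sin[phi] Sin[th], Cos[th]}, {{phi, th}}]]   (* 2 *)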
Parametrized Surfaces and Surface Integrals
Slide 469
Parametrized Surfaces
16.3. Example. Consider the graph of a scalar function f : Ω → R,
Ω ⊂ Rⁿ,

    Γ(f) = {(x₁, …, xₙ, x_{n+1}) ∈ R^{n+1} : x = (x₁, …, xₙ) ∈ Ω, x_{n+1} = f(x)}.

This is a hypersurface in R^{n+1} with parametrization

    ϕ : Ω → Γ(f),    ϕ(x) = ( x₁, …, xₙ, f(x₁, …, xₙ) )ᵀ.

The rank of the Jacobian is

    rank Dϕ|ₓ = rank ( 𝟙
                       Df|ₓ ) = n,

written in block matrix form, where 𝟙 ∈ Mat(n × n; R) is the n × n unit
matrix.
Parametrized Surfaces and Surface Integrals
Slide 470
Tangent Spaces of Surfaces
We now want to define the tangent space of a parametrized m-surface
S ⊂ Rⁿ. The parametrization ϕ : Ω → S satisfies

    ϕ(x₀ + h) = ϕ(x₀) + Dϕ|ₓ₀ h + o(h)    as h → 0.

Hence we consider the map

    h ↦ Dϕ|ₓ₀ h

to be the linear approximation to ϕ near x₀. The range of this map is
given by

    {x ∈ Rⁿ : x = Dϕ|ₓ₀ h, h ∈ Rᵐ}

and is equal to the span of the column vectors of Dϕ|ₓ₀. The elements of
the range give good approximations to S at points near x₀.
Parametrized Surfaces and Surface Integrals
Slide 471
Tangent Spaces of Surfaces
Hence, it is natural to make the following definition.
16.4. Definition. Let S ⊂ Rⁿ be a smooth, parametrized m-surface with
parametrization ϕ : Ω → S. Then

    tₖ(p) = ∂/∂xₖ ( ϕ₁(x), …, ϕₙ(x) )ᵀ |_{x=ϕ⁻¹(p)},    k = 1, …, m,

is called the kth tangent vector of S at p ∈ S and

    T_p S := ran Dϕ|ₓ = span{t₁(p), …, tₘ(p)}

is called the tangent space to S at p. The vector field

    tₖ : S → Rⁿ,    p ↦ tₖ(p),

is called the kth tangent vector field on S.
Parametrized Surfaces and Surface Integrals
Slide 472
Tangent Vectors to the Unit Sphere
16.5. Example. For the unit sphere S² ⊂ R³ parametrized with

    ϕ(φ, θ) = ( cos φ sin θ, sin φ sin θ, cos θ )ᵀ

we have the tangent vectors at p ∈ S² given by

    t_φ(p) = ∂ϕ/∂φ = ( −sin φ sin θ, cos φ sin θ, 0 )ᵀ,
    t_θ(p) = ∂ϕ/∂θ = ( cos φ cos θ, sin φ cos θ, −sin θ )ᵀ,

taken at (φ, θ) = ϕ⁻¹(p).
Parametrized Surfaces and Surface Integrals
Slide 473
Tangent Vectors to the Unit Sphere
At p = (1/√2, 0, 1/√2) = ϕ(0, π/4) the tangent vectors are

    t_φ(p) = ( 0, √2/2, 0 )ᵀ,    t_θ(p) = ( √2/2, 0, −√2/2 )ᵀ

and the tangent space is

    T_p S² = span{t_φ(p), t_θ(p)}
           = { x ∈ R³ : x = α (0, 1, 0)ᵀ + β (1, 0, −1)ᵀ,  α, β ∈ R }.
Parametrized Surfaces and Surface Integrals
Slide 474
The Normal Vector to Hypersurfaces
The tangent space of an m-surface in Rⁿ is an m-dimensional subspace of
Rⁿ. If S is a hypersurface, i.e., an (n − 1)-surface in Rⁿ, then (T_p S)^⊥ is a
1-dimensional subspace of Rⁿ and there exists a unit basis vector of this
space. This basis vector is uniquely defined up to its sign.
16.6. Definition. Let S ⊂ Rⁿ be a hypersurface. Then a unit vector that is
orthogonal to all tangent vectors to S at p is called a unit normal vector
to S at p and denoted by N(p).
The mapping

    N : S → Rⁿ,    p ↦ N(p),

is called the normal vector field on S.
Parametrized Surfaces and Surface Integrals
Slide 475
Orientation of Hypersurfaces
16.7. Example. Returning to Example 16.5, the unit normal vector at
p = (1/√2, 0, 1/√2) is orthogonal to both t_φ(p) and t_θ(p) and given by

    N(p) = ± ( √2/2, 0, √2/2 )ᵀ = ±p,

where we are free to choose a sign arbitrarily.
Since the unit normal vector is uniquely determined up to its sign, there
are two possible choices for a normal vector at each p ∈ S. Usually, one
chooses the direction of the normal vector at a single point of S and
attempts to choose the normal vector at all other points of S in such a way
that the normal vector field is continuous on S.
Parametrized Surfaces and Surface Integrals
Slide 476
Orientation of Hypersurfaces
16.8. Definition.
(i) A hypersurface S ⊂ Rⁿ that admits a continuous normal vector field
is said to be orientable.
(ii) A choice of direction for the normal vector field is called an
orientation of S.
(iii) A hypersurface that is the boundary of a measurable set Ω ⊂ Rⁿ with
non-zero measure is said to be a closed surface.
(iv) A closed hypersurface is said to have positive orientation if the
normal vector field is chosen so that the normal vectors point
outwards from Ω.
Later, we will give a more accessible way of distinguishing between a closed
surface and the alternative, a surface with boundary.
Parametrized Surfaces and Surface Integrals
Slide 477
Orientation of Surfaces
16.9. Example. The classic example of a 2-surface that is not orientable is
the Möbius strip in R³. A parametrization is given by

    ϕ : [−w, w] × [0, 2π) → R³,
    ϕ(s, t) = ( (R + s cos(t/2)) cos t,
                (R + s cos(t/2)) sin t,
                s sin(t/2) ).

The above parametrization gives a Möbius strip lying in the x₁-x₂ plane of
width 2w > 0.
Suppose a normal vector is chosen at some point p. Moving the normal
vector around the strip back to its initial position p, it then points in the
other direction. Hence, the normal vector field is not continuous.
Parametrized Surfaces and Surface Integrals
Slide 478
Normal Vectors for Curves in R2
In R², a curve is a hypersurface, and we have already introduced the
concept of a normal vector in Definition 13.9. There, the normal vector
always points in the direction of change of the tangent vector of the
curve. This can cause the normal vector to “jump” as the curve winds, in
which case we do not obtain a continuous normal vector field.
Therefore, whenever we regard a curve in R2 as being a surface, we will
use the normal vector convention for surfaces described here rather than
Definition 13.9.
Parametrized Surfaces and Surface Integrals
Slide 479
Infinitesimal Surface Elements of Hypersurfaces
Our goal now is to define the area of surfaces. Consider a parametrized
2-surface S in R³. At any point p ∈ S there exist two tangent vectors
t₁(p) and t₂(p). Suppose ϕ = ϕ(x₁, x₂) is a parametrization of S and
p = ϕ(x). We would like to define an “infinitesimal surface element”

    dA = ( area of the parallelogram spanned by t₁ and t₂ at ϕ(x₁, x₂) ) dx₁ dx₂
       = ∥t₁ × t₂∥ ∘ ϕ(x₁, x₂) dx₁ dx₂.

However, this expression would not generalize well to higher dimensions.
Another approach would be to make use of the unit normal vector N at
p = ϕ(x) (because S is a hypersurface). We can hence replace the area of
the parallelogram spanned by t₁ and t₂ by the volume of the parallelepiped
spanned by t₁, t₂ and N.
Parametrized Surfaces and Surface Integrals
Slide 480
Volume (Area) of Hypersurfaces
We define the scalar surface element of a hypersurface in R³ by

    dA = |det(t₁, t₂, N) ∘ ϕ(x₁, x₂)| dx₁ dx₂.

This may be generalized to hypersurfaces in Rⁿ by setting

    dA = |det(t₁, t₂, …, t_{n−1}, N) ∘ ϕ(x₁, …, x_{n−1})| dx₁ dx₂ ⋯ dx_{n−1}.

16.10. Definition. Let S ⊂ Rⁿ be a hypersurface with parametrization
ϕ ∈ C¹(Ω; Rⁿ), Ω ⊂ R^{n−1}. Let t_j = Dϕ e_j, j = 1, …, n − 1, be the
tangent vector fields on S. Let N be a chosen normal vector field on S (so
that S is oriented). Then the volume or area of S is defined as

    |S| := ∫_Ω |det(t₁, …, t_{n−1}, N) ∘ ϕ(x)| dx₁ dx₂ ⋯ dx_{n−1}.
Parametrized Surfaces and Surface Integrals
Slide 481
Area of the Unit Sphere in R3
16.11. Example. In Example 16.5 we have seen that S² with parametrization

    ϕ(φ, θ) = ( cos φ sin θ, sin φ sin θ, cos θ )ᵀ

has tangent vectors

    t_φ ∘ ϕ(φ, θ) = ( −sin φ sin θ, cos φ sin θ, 0 )ᵀ,
    t_θ ∘ ϕ(φ, θ) = ( cos φ cos θ, sin φ cos θ, −sin θ )ᵀ.

To calculate the normal vector, we can simply take

    (t_φ × t_θ) ∘ ϕ(φ, θ) = ( −cos φ sin² θ, −sin φ sin² θ, −cos θ sin θ )ᵀ.
Parametrized Surfaces and Surface Integrals
Slide 482
Area of the Unit Sphere in R3
Taking account of ∥t_φ × t_θ∥ = sin θ, we have

    N ∘ ϕ(φ, θ) = − ( cos φ sin θ, sin φ sin θ, cos θ )ᵀ.

Then the area of the unit sphere is given by

    |S²| = ∫₀^{2π} ∫₀^π |det ( −sin φ sin θ   cos φ cos θ   −cos φ sin θ
                               cos φ sin θ    sin φ cos θ   −sin φ sin θ
                               0              −sin θ        −cos θ )| dθ dφ

         = ∫₀^{2π} ∫₀^π sin θ |det ( −sin φ   cos φ cos θ   cos φ sin θ
                                      cos φ   sin φ cos θ   sin φ sin θ
                                      0       −sin θ        cos θ )| dθ dφ

         = 2π ∫₀^π sin θ dθ = 4π.
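A sketch of the same computation in Mathematica, using the cross product
of the tangent vectors (symbol names are ours):

    s = {Cos[phi] Sin[th], Sin[phi] Sin[th], Cos[th]};
    n = Cross[D[s, phi], D[s, th]];    (* t_phi × t_theta *)
    Integrate[Simplify[Sqrt[n.n], 0 < th < Pi], {th, 0, Pi}, {phi, 0, 2 Pi}]   (* 4 Pi *)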
Parametrized Surfaces and Surface Integrals
Slide 483
Infinitesimal Surface Elements of Arbitrary Surfaces
We would like to generalize the concept of area and infinitesimal surface
elements from hypersurfaces to arbitrary surfaces in Rⁿ. From the
beginning, the introduction of the normal vector to calculate the surface
area by means of the volume was undertaken by necessity rather than
through any other considerations.
We note that, in block matrix notation,

    det(t₁, …, t_{n−1}, N)² = det( (t₁, …, t_{n−1}, N)ᵀ ) · det(t₁, …, t_{n−1}, N)
                            = det( (t₁, …, t_{n−1}, N)ᵀ · (t₁, …, t_{n−1}, N) ).
Parametrized Surfaces and Surface Integrals
Slide 484
Infinitesimal Surface Elements of Arbitrary Surfaces
Performing the row-by-column matrix multiplication, we see that

    det(t₁, …, t_{n−1}, N)² = det ( ⟨t₁, t₁⟩       ⋯  ⟨t₁, t_{n−1}⟩       ⟨t₁, N⟩
                                    ⋮                  ⋱                  ⋮
                                    ⟨t_{n−1}, t₁⟩  ⋯  ⟨t_{n−1}, t_{n−1}⟩  ⟨t_{n−1}, N⟩
                                    ⟨N, t₁⟩        ⋯  ⟨N, t_{n−1}⟩        ⟨N, N⟩ )

                            = det ( ⟨t₁, t₁⟩       ⋯  ⟨t₁, t_{n−1}⟩       0
                                    ⋮                  ⋱                  ⋮
                                    ⟨t_{n−1}, t₁⟩  ⋯  ⟨t_{n−1}, t_{n−1}⟩  0
                                    0              ⋯  0                   1 )

                            = det ( ⟨t₁, t₁⟩       ⋯  ⟨t₁, t_{n−1}⟩
                                    ⋮                  ⋱
                                    ⟨t_{n−1}, t₁⟩  ⋯  ⟨t_{n−1}, t_{n−1}⟩ ),

where we have used that the normal vector is orthogonal to all tangent
vectors and has unit length.
Parametrized Surfaces and Surface Integrals
Slide 485
The Metric Tensor
16.12. Definition. Let S ⊂ Rⁿ be an m-surface with parametrization ϕ and
tangent vector fields t₁, …, tₘ. Then G ∈ Mat(m × m; R) given by

    G := ( ⟨t₁, t₁⟩  ⋯  ⟨t₁, tₘ⟩
           ⋮             ⋱
           ⟨tₘ, t₁⟩  ⋯  ⟨tₘ, tₘ⟩ )

is said to be the metric tensor on S with respect to ϕ. The coefficients

    gᵢⱼ := ⟨tᵢ, tⱼ⟩,    i, j = 1, …, m,

are called the metric coefficients of G. We often write

    g(x) = det G(ϕ(x))

for short.
Parametrized Surfaces and Surface Integrals
Slide 486
The Metric Tensor
16.13. Remarks.
(i) We have proved that if S is a hypersurface in Rⁿ, then

        |det(t₁, …, t_{n−1}, N)| = √(det G).

    This will allow us to extend the definition of area/volume to general
    surfaces.
(ii) In the case n = 3, m = 2, we have

        g = det ( ∥t₁∥²     ⟨t₁, t₂⟩
                  ⟨t₂, t₁⟩  ∥t₂∥² )
          = ∥t₁∥² ∥t₂∥² − ⟨t₁, t₂⟩²
          = ∥t₁∥² ∥t₂∥² (1 − cos² ∠(t₁, t₂))
          = ∥t₁∥² ∥t₂∥² sin² ∠(t₁, t₂)
          = ∥t₁ × t₂∥²,

    so that

        dA = ∥t₁ × t₂∥ ∘ ϕ(x) dx₁ dx₂.
Parametrized Surfaces and Surface Integrals
Slide 487
Scalar Surface Integrals
16.14. Definition. Let S be a parametrized m-surface with parametrization
ϕ : Ω → S, Ω ⊂ Rᵐ. Then

    |S| := ∫_Ω √(g(x)) dx

defines the volume or area of S.
Let f : S → R be a potential function. Then the (scalar) surface integral
of f over S is defined as

    ∫_S f dA := ∫_Ω f ∘ ϕ(x) √(g(x)) dx.

16.15. Remark. As usual,

    dA := √(g(x)) dx

is called the scalar surface element of S.
Parametrized Surfaces and Surface Integrals
Slide 488
Electrostatic Potential of a Surface Charge
16.16. Example. The electrostatic potential V(p) at a point p ∈ R³
induced by a charged surface S is given by

    V(p) = 1/(4πε₀) ∫_S ϱ( · ) / dist(p, · ) dA,

where ϱ is the charge density of the surface. Let

    S = {(x, y, z) ∈ R³ : x² + y² = z², 0 ≤ z ≤ 1}

and assume that ϱ is constant on S. We calculate the potential at the
point p = (0, 0, 1). Introducing polar coordinates, we have

    S = {(x, y, z) ∈ R³ : x = r cos θ, y = r sin θ, z = r,
                          0 ≤ θ ≤ 2π, 0 ≤ r ≤ 1}.
Parametrized Surfaces and Surface Integrals
Slide 489
Electrostatic Potential of a Surface Charge
We can read off that a parametrization of S is given by

    ϕ : [0, 2π] × [0, 1] → S,    ϕ(θ, r) = ( r cos θ, r sin θ, r )ᵀ.

The tangent vectors are

    t_θ ∘ ϕ(θ, r) = ( −r sin θ, r cos θ, 0 )ᵀ,
    t_r ∘ ϕ(θ, r) = ( cos θ, sin θ, 1 )ᵀ.

Hence,

    g(θ, r) = det ( ⟨t_θ, t_θ⟩  ⟨t_θ, t_r⟩
                    ⟨t_r, t_θ⟩  ⟨t_r, t_r⟩ )|_{ϕ(θ,r)} = det ( r²  0
                                                               0   2 ) = 2r².
Parametrized Surfaces and Surface Integrals
Slide 490
Electrostatic Potential of a Surface Charge
It follows that the volume element is given by

    dA = √2 r dr dθ.

We then have

    V(p) = ϱ/(4πε₀) ∫_S dA / ∥p − ( · )∥
         = ϱ/(4πε₀) ∫₀^{2π} ∫₀^1 √2 r dr dθ / √(r² cos² θ + r² sin² θ + (1 − r)²)
         = ϱ/(√2 ε₀) ∫₀^1 r dr / √(2r² − 2r + 1)
         = ϱ/(4ε₀) ln(3 + 2√2).
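The final one-dimensional integral can be checked symbolically; note that
Mathematica returns an equivalent rewriting, since ln(3 + 2√2) = 2 ln(1 + √2):

    Integrate[r/Sqrt[2 r^2 - 2 r + 1], {r, 0, 1}]   (* Log[1 + Sqrt[2]]/Sqrt[2] *)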
Slide 491
Part 3: Vector Fields and Higher Order Derivatives
Slide 492
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Potential Functions and the Gradient
Slide 493
17. Potential Functions and the Gradient
Potential Functions and the Gradient
Slide 494
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Potential Functions and the Gradient
Slide 495
Potentials
A map f : Ω → R, where Ω ⊂ Rⁿ, is called a scalar function or a
potential. Physically, if n = 3, a potential assigns to each point in space a
scalar value. Examples include temperature, pressure or height.
While such functions have appeared in the previous sections, here we will
take a closer look at some basic aspects of potentials. The first question is
how to visualize them.
Potential Functions and the Gradient
Slide 496
Visualizing Functions f : R2 → R
Suppose we have a function f : R² → R, i.e., a real function of two
variables. One method of graphing such a function is using a
three-dimensional graph showing the (x₁, x₂, z)-axes and plotting
z = f(x₁, x₂). For example, the graph below shows the function

    f : [0, 4π] × [0, 4π] → R,    f(x₁, x₂) = cos x₁ + cos x₂.
Potential Functions and the Gradient
Slide 497
3D Plots with Mathematica
17.1. Example. The following Mathematica command creates the
three-dimensional plot on the previous slide:
Plot3D[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
  Mesh -> False,
  Ticks -> {{0, Pi, 2 Pi, 3 Pi}, {0, Pi, 2 Pi, 3 Pi}, {-2, 0, 2}},
  AxesLabel -> {x1, x2, f[x1, x2]},
  BaseStyle -> {FontSize -> 12, FontFamily -> "CMU Sans Serif"}]
Potential Functions and the Gradient
Slide 498
Contour Plots
Another representation for functions f : R² → R is the so-called contour
plot. In this two-dimensional graph we plot the curves

    C_α = f⁻¹({α})

for several values of α. These are the pre-image sets (see (9.3)) of {α}.
Potential Functions and the Gradient
Slide 499
Contour Plots
To illustrate, we successively show contours of

    f : [0, 4π] × [0, 4π] → R,    f(x₁, x₂) = cos x₁ + cos x₂.

[Contour plots of f in the (x₁, x₂)-plane with level sets labeled by their
values 0, ±0.5, ±1, ±1.5, ±1.9.]
Potential Functions and the Gradient
Slide 500
Contour Plots
Instead of labeling, Mathematica can also color-code the contours
according to their values. Here, dark colors represent smaller values, light
colors larger values.
Potential Functions and the Gradient
Slide 501
Contour Plots with Mathematica
17.2. Example. The following Mathematica commands create the contour
plots on the previous slides:

ContourPlot[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
  FrameLabel -> {x1, x2}, RotateLabel -> False,
  FrameTicks -> {{0, Pi, 2 Pi, 3 Pi, 4 Pi}, {0, Pi, 2 Pi, 3 Pi, 4 Pi}, {}, {}},
  ContourStyle -> {{RGBColor[0, 1, 0.5], Thickness[0.004]}},
  BaseStyle -> {FontSize -> 14, FontFamily -> "CMU Sans Serif"},
  PlotPoints -> 50, ContourLabels ->
    (Text[Framed[#3, FrameStyle -> White, FrameMargins -> 0.2],
      {#1, #2}, Background -> White, BaseStyle -> {FontSize -> 10}] &),
  Contours -> {0, 0.5, -0.5, -1, -1.5, 1, 1.5, 1.9, -1.9},
  ContourShading -> None]

ContourPlot[Cos[x] + Cos[y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
  FrameLabel -> {x1, x2}, RotateLabel -> False,
  FrameTicks -> {{0, Pi, 2 Pi, 3 Pi, 4 Pi}, {0, Pi, 2 Pi, 3 Pi, 4 Pi}, {}, {}},
  BaseStyle -> {FontSize -> 16, FontFamily -> "CMU Sans Serif"},
  PlotPoints -> 50, Contours -> 10]
Potential Functions and the Gradient
Slide 502
Phase Curves
In the Hamiltonian formulation of analytical mechanics, one defines a
so-called Hamilton function H for a mechanical system. This function is
the sum of the kinetic energy (T) and the potential energy (V). It represents
the total energy of the system, and remains constant if the system satisfies
the law of energy conservation (there are not, for example, any frictional
forces). We will assume this for our present discussion.
In this approach, the essential variables of a system are the position x and
the momentum p. The variables are tracked in so-called phase space
Rⁿ_x × Rⁿ_p = R^{2n}_{(x,p)}, where, typically, n = 1, 2 or 3.
The time-evolution of the system is represented through phase curves in
R^{2n}, which are given by the contour lines of H, which is regarded as a
function R^{2n} → R. In other words, a phase curve is the set H⁻¹(E), where
E is the conserved energy of the system.
Potential Functions and the Gradient
Slide 503
Phase Curves
17.3. Example. For the simple harmonic oscillator, the kinetic energy is
given by T = ½mv² = p²/(2m) and the potential energy is given by
V = (k/2)x², so

    H(x, p) = p²/(2m) + (k/2)x².

The phase curves of the system are ellipses in R²_{(x,p)}, with each ellipse
describing the behavior of a harmonic oscillator at a fixed energy E.
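These phase curves can be drawn as a contour plot of H; a minimal sketch
with the illustrative values m = k = 1:

    ContourPlot[p^2/2 + x^2/2, {x, -2, 2}, {p, -2, 2},
      Contours -> 10, ContourShading -> None]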
Potential Functions and the Gradient
Slide 504
Phase Curves
17.4. Example. For a mathematical pendulum of length l with mass m,
V = −mgl cos θ, so

    H(θ, p) = p²/(2m) − mgl cos θ.

Sketch the phase curves of the pendulum for different energies and
interpret them physically!
Potential Functions and the Gradient
Slide 505
Derivatives of Potential Functions
The Jacobian of a differentiable potential is given by

    Df|ₓ = ( ∂f/∂x₁|ₓ, …, ∂f/∂xₙ|ₓ ).

The row vector Df|ₓ may be regarded as a linear map Df|ₓ : Rⁿ → R,

    Df|ₓ y = ( ∂f/∂x₁|ₓ, …, ∂f/∂xₙ|ₓ ) (y₁, …, yₙ)ᵀ = Σ_{i=1}^n ∂f/∂xᵢ|ₓ · yᵢ.

Thus Df|ₓ ∈ (Rⁿ)*, the dual space of Rⁿ (see Examples 4.6 ii)).
Potential Functions and the Gradient
Slide 506
Coordinate Maps
Classically, we considered xⱼ (1 ≤ j ≤ n) as a coordinate of the vector
x = (x₁, …, xₙ) ∈ Rⁿ. We now introduce a different interpretation. Define
the map

    Rⁿ → R,    x = (x₁, …, xₙ)ᵀ ↦ xⱼ.

This map is clearly linear; it is the coordinate map that assigns to x ∈ Rⁿ
its coordinate xⱼ. We denote this map by xⱼ also; hence xⱼ : x ↦ xⱼ or
xⱼ(x) = xⱼ. The dual meaning of xⱼ as a map and the value of this map is
very convenient. In fact, the entire discipline of differential geometry
hinges on exploiting this ambiguity.
The derivative of the map xⱼ is given by

    dxⱼ = (0, …, 0, 1, 0, …, 0),    (17.1)

with the 1 in the jth position.
Potential Functions and the Gradient
Slide 507
The Differential and the Gradient
Note that we have written dxⱼ instead of Dxⱼ; this is a traditional notation
for potential functions. The derivative is also called a differential and
written df instead of Df. Note also that dxⱼ|ₓ does not depend on x.
Therefore, we have

    df|ₓ = ( ∂f/∂x₁|ₓ, …, ∂f/∂xₙ|ₓ ) = ∂f/∂x₁|ₓ dx₁ + ⋯ + ∂f/∂xₙ|ₓ dxₙ.

Each differential dxⱼ = eⱼ* is the dual basis vector to the standard basis
vector eⱼ with respect to the euclidean scalar product.
The transpose of the Jacobian is the gradient,

    ∇f(x) := (J_f(x))ᵀ = ( ∂f/∂x₁|ₓ, …, ∂f/∂xₙ|ₓ )ᵀ.

The triangle symbol ∇ is called nabla.
Potential Functions and the Gradient
Slide 508
Etymology of Nabla
The term “nabla” derives from the Greek word for a Phoenician harp,
whose shape the nabla triangle ∇ is supposed to resemble.
It was used by the physicists James Maxwell and Peter Tait (the latter
developed much of the modern mathematics of the nabla operator) in
their private correspondence. There is evidence that this was a private
joke between them and that Maxwell did not use the term in serious
publications. However, it became popular nevertheless, being used by
William Thomson (Lord Kelvin) at the end of the 19th century.
Another proposal has been to call the symbol “del”, but it seems that
“nabla” is the most common term today.
Harps from 1911 Webster’s Dictionary. Wikimedia Commons.
Wikimedia Foundation. Web. 18 July 2018
Potential Functions and the Gradient
Slide 509
The Directional Derivative
17.5. Definition. Let Ω ⊂ Rⁿ be an open set, f : Ω → R continuous and
h ∈ Rⁿ, ∥h∥ = 1, a unit vector. Then the directional derivative Dₕf
in the direction h is defined by

    Dₕf|ₓ := d/dt f(x + th)|_{t=0}    (17.2)

if the right-hand side exists.
17.6. Remarks.
(i) It is essential that ∥h∥ = 1; otherwise the slope will not be scaled
correctly.
(ii) The directional derivative is a number, in contradistinction to the
derivative. Thus it should perhaps be more properly known as the
“directional slope.”
Potential Functions and the Gradient
Slide 510
Interpretation of the Directional Derivative
The directional derivative has the following interpretation: if
γ(t) = x + th, t ∈ [0, 1], parametrizes the straight line segment joining x
and x + h, then Dₕf is simply the derivative of f ∘ γ.
Hence,
The directional derivative Dh f |x is the derivative of f at x along
the line segment joining x and x + h.
Another way of stating this is
The directional derivative Dh f |x gives the slope of the tangent
line of f at x in the direction of h.
Potential Functions and the Gradient
Slide 511
Visualization of the Directional Derivative
[Surface plot of f(x₁, x₂) showing the points x and x + h in the plane and
the tangent line of f at x in the direction h.]
Potential Functions and the Gradient
Slide 512
The Directional Derivative
We note that the tangent line of f : Rⁿ → R at x in the direction h is
given by

    t_{f,x,h}(s) = ( x + sh
                     f(x) + Dₕf|ₓ s ),    s ∈ R,    (17.3)

where h ∈ Rⁿ (so the above vector is a “block vector” with n + 1 entries).
For functions f : R² → R the directional derivative is sometimes specified
through the angle θ. This is understood to mean that

    h = ( cos θ, sin θ )ᵀ.
Potential Functions and the Gradient
Slide 513
The Directional Derivative
17.7. Example. Let f : R² → R, f(x₁, x₂) = x₁² − 4x₂. Then the directional
derivative of f at x in the direction h is

    Dₕf|ₓ = d/dt f(x + th)|_{t=0} = d/dt ( (x₁ + th₁)² − 4(x₂ + th₂) )|_{t=0}
          = ( 2h₁(x₁ + th₁) − 4h₂ )|_{t=0}
          = 2h₁x₁ − 4h₂.

For h = (1/√2, 1/√2) (or θ = π/4) we would have

    Dₕf|ₓ = √2 x₁ − 2√2.

At x = (0, 0), the directional derivative in direction h is

    Dₕf|ₓ₌₀ = −2√2.
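The limit definition (17.2) can be reproduced symbolically in Mathematica
(symbols h1, h2 are ours):

    f[x1_, x2_] := x1^2 - 4 x2;
    D[f[x1 + t h1, x2 + t h2], t] /. t -> 0   (* 2 h1 x1 - 4 h2 *)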
Potential Functions and the Gradient
Slide 514
The Directional Derivative for Smooth Functions
Suppose that f is differentiable. If γ(t) = x + th, t ∈ [0, 1], parametrizes
the straight line segment joining x and x + h, then by the chain rule

    Dₕf|ₓ = d/dt f(x + th)|_{t=0} = Df|_{x+th} h|_{t=0} = Df|ₓ h,

so

    Dₕf|ₓ = Df|ₓ h = ⟨∇f(x), h⟩.    (17.4)

This is a useful expression for calculating the directional derivative, but it
supposes that f is differentiable. In practice, (17.4) will be valid if the
partial derivatives of f exist and are continuous at x.
Potential Functions and the Gradient
Slide 515
The Directional Derivative for Smooth Functions
17.8. Example. Returning to Example 17.7, we have

    ∇f(x) = ( 2x₁, −4 )ᵀ.

Since the partial derivatives are continuous,

    Dₕf|ₓ = ⟨∇f(x), h⟩ = ⟨( 2x₁, −4 )ᵀ, ( h₁, h₂ )ᵀ⟩ = 2x₁h₁ − 4h₂.

This coincides with the result obtained previously.
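The same check via the gradient formula (17.4), as a one-line sketch:

    Grad[x1^2 - 4 x2, {x1, x2}].{h1, h2}   (* 2 h1 x1 - 4 h2 *)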
Potential Functions and the Gradient
Slide 516
The Normal Derivative
An important special case of the directional derivative is the normal
derivative.
17.9. Definition. Let Ω ⊂ Rⁿ be an open set, f : Ω → R and S* a smooth,
oriented, parametrized hypersurface in Ω. Let p ∈ S and let N(p) denote
the normal vector at p. Then

    ∂f/∂n|ₚ := D_{N(p)}f|ₚ

is called the normal derivative of f at p (with respect to the oriented
surface S*).
17.10. Example. Let f : R² → R, f(x₁, x₂) = x₁² − 4x₂, and

    C = {(x₁, x₂) ∈ R² : x₂ = x₁², x₁ ∈ R}.

Then C is parametrized by γ(t) = (t, t²), t ∈ R, and

    T ∘ γ(t) = 1/√(1 + 4t²) ( 1, 2t )ᵀ.
Potential Functions and the Gradient
Slide 517
The Normal Derivative in R2
A normal vector is then found from

    (T ∘ γ)′(t) = −4t/(1 + 4t²)^{3/2} ( 1, 2t )ᵀ + 1/√(1 + 4t²) ( 0, 2 )ᵀ
                = 2/(1 + 4t²)^{3/2} ( −2t, 1 )ᵀ.

The unit normal vector is found by normalizing (T ∘ γ)′, so we have

    N ∘ γ(t) = 1/√(1 + 4t²) ( −2t, 1 )ᵀ.
Potential Functions and the Gradient
Slide 518
The Normal Derivative in R2
At a point p = γ(t) on C the normal derivative is hence

    ∂f/∂n|_{γ(t)} = ⟨∇f(γ(t)), N ∘ γ(t)⟩
                  = 1/√(1 + 4t²) ⟨( 2t, −4 )ᵀ, ( −2t, 1 )ᵀ⟩
                  = −4(t² + 1)/√(1 + 4t²).
Potential Functions and the Gradient
Slide 519
Properties of the Gradient
The gradient vector of f at x, ∇f(x), has some interesting properties:
▸ ∇f(x) points in the direction of the greatest directional derivative
  of f at x.
  This follows from

      Dₕf(x) = ⟨∇f(x), h⟩ = |∇f(x)| cos ∠(∇f(x), h),

  which becomes maximal if ∠(∇f(x), h) = 0.
▸ ∇f(x) is perpendicular to the contour line of f at x.
  More precisely, it is perpendicular to the tangent line of the contour line
  at x. This is due to the fact that the tangent line to the contour is
  parallel to the direction h₀ in which D_{h₀}f(x) = 0, so

      ⟨∇f(x), h₀⟩ = 0.
Vector Fields and Integrals
Slide 520
18. Vector Fields and Integrals
Vector Fields and Integrals
Slide 521
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Vector Fields and Integrals
Slide 522
Vector Fields
We now turn to a very important type of map, the vector field. Vector
fields play an extremely important role in physics and mathematics.
Examples include the flow field of a fluid or the electromagnetic field
induced by a charge.
18.1. Definition. Let Ω ⊂ Rⁿ. Then a function F : Ω → Rⁿ,

    F(x) = ( F₁(x), …, Fₙ(x) )ᵀ,

is called a vector field on Ω.
18.2. Example. Let f : Rⁿ → R be a potential function. Then the gradient
field of f given by

    F : Rⁿ → Rⁿ,    F(x) = ∇f(x),

associates to every x ∈ Rⁿ the direction of largest slope of f.
Vector Fields and Integrals
Slide 523
Force Fields
18.3. Example. A mass M situated at the origin of a coordinate system
exerts an attractive force on another mass m at position x ∈ R³ \ {0}.
This force field is given by

    F : R³ \ {0} → R³,    F(x) = −G (m·M)/|x|² · x/|x|,    (18.1)

where G is Newton’s gravitational constant.
Any vector field that associates to each x ∈ Rⁿ a physical force vector is
said to be a force field. (This term of course has only physical, not
mathematical, significance.)
In physics, the concept of work arises from the integration of the forces
acting along a particle’s trajectory (curve), where forces that are orthogonal
to the trajectory do not contribute to the work. In particular, the work is
obtained by integrating only the tangential components of the force field.
Vector Fields and Integrals
Slide 524
Gravitational Force Field
The plot below shows the gravitational force field (18.1) by attaching a
vector representing F(x) to each x ∈ R³ \ {0}.
Vector Fields and Integrals
Slide 525
Gravitational Force Field
For future examples we will use the two-dimensional version,

    F : R² \ {0} → R²,    F(x) = −G (m·M)/|x|² · x/|x|.    (18.2)
Vector Fields and Integrals
Slide 526
Streamlines of Fluid Flow
18.4. Example. Consider a fluid flow in R² where the fluid rotates about
the origin in a counter-clockwise manner. The streamlines show the paths
of a “fluid particle”.
Vector Fields and Integrals
Slide 527
Direction Field of Fluid Flow
The streamlines are circles and the unit tangent vector field (the direction
field) of the circles is given by

    F : R² \ {0} → R²,    F(x₁, x₂) = 1/√(x₁² + x₂²) ( −x₂, x₁ )ᵀ.    (18.3)
Vector Fields and Integrals
Slide 528
Velocity Field of Fluid Flow
The velocity at a distance r = √(x₁² + x₂²) from the origin is r·ω, where
ω > 0 is the rotational velocity. Hence, the velocity vector field is given by

    v : R² → R²,    v(x₁, x₂) = rω F(x₁, x₂) = ω ( −x₂, x₁ )ᵀ.    (18.4)
Vector Fields and Integrals
Slide 529
The Line Integral of a Vector Field
18.5. Definition. Let Ω ⊂ Rⁿ, F : Ω → Rⁿ be a continuous vector field and
C* ⊂ Ω an oriented open, smooth curve in Rⁿ. We then define the line
integral of the vector field F along C* by

    ∫_{C*} F dℓ⃗ := ∫_{C*} ⟨F, T⟩ dℓ.    (18.5)

18.6. Remarks.
(i) We have defined the line integral of the vector field F as the line
integral of the scalar product ⟨F, T⟩ on C*. Since T does not depend
on the parametrization of C* and the line integral of a scalar function
doesn’t either, the line integral of a vector field is independent of the
parametrization of C*.
Vector Fields and Integrals
Slide 530
The Line Integral of a Vector Field
(ii) The symbol “dℓ⃗” can be interpreted geometrically as a vectorial line
element and one often writes

        dℓ⃗ = γ′(t) dt.

In the same spirit, one sometimes writes

        ∫_{C*} F dℓ⃗ = ∫_{C*} ⟨F, dℓ⃗⟩.

(iii) Integrals along closed curves are sometimes emphasized by writing

        ∮_{C*} f dℓ    or    ∮_{C*} F dℓ⃗

if the curve C is closed.
Vector Fields and Integrals
Slide 531
Integrals of Vector Fields
If we calculate the line integral using a concrete parametrization γ : I → C,
we obtain

    ∫_{C*} F dℓ⃗ = ∫_{C*} ⟨F, T⟩ dℓ = ∫_I ⟨F ∘ γ(t), T ∘ γ(t)⟩ ∥γ′(t)∥ dt
               = ∫_I ⟨F ∘ γ(t), γ′(t)/∥γ′(t)∥⟩ ∥γ′(t)∥ dt
               = ∫_I ⟨F ∘ γ(t), γ′(t)⟩ dt.    (18.6)

18.7. Example. Calculate the work performed when traveling in a force
field F(x, y) = (x, y) along the parabola y = x² in R² from (0, 0) to (1, 1).
With γ(t) = (t, t²), t ∈ [0, 1],

    W = ∫_{C*} F dℓ⃗ = ∫₀^1 ⟨F ∘ γ(t), γ′(t)⟩ dt = ∫₀^1 ⟨( t, t² )ᵀ, ( 1, 2t )ᵀ⟩ dt
      = ∫₀^1 (t + 2t³) dt = 1/2 + 1/2 = 1.
Vector Fields and Integrals
Slide 532
Integrals of Vector Fields
[Plot of the force field F(x, y) = (x, y) and the path from (0, 0) to (1, 1)
along the parabola y = x².]
Vector Fields and Integrals
Slide 533
Potential Fields
18.8. Definition. Let Ω ⊂ Rⁿ be an open set. A vector field F : Ω → Rⁿ is
said to be a potential field if there exists a differentiable potential
function U : Ω → R such that

    F(x) = ∇U(x)    (convention in mathematics)

or

    F(x) = −∇U(x)    (convention in physics).

18.9. Example. The gravitational force field (18.1) introduced in Example
18.3 is a potential field, because F = −∇U for

    U : R³ \ {0} → R,    U(x) = G (m·M)/|x|,    (18.7)

as is easily checked.
Vector Fields and Integrals
Slide 534
Integrals of Potential Fields
Potential fields are very useful, as the integral along an oriented open curve
C* depends only on the initial and the final point of the curve. This can be
seen from

    ∫_I ⟨F ∘ γ(t), γ′(t)⟩ dt = ∫_I ⟨∇U ∘ γ(t), γ′(t)⟩ dt = ∫_I DU|_{γ(t)}(γ′(t)) dt
                             = ∫_I (U ∘ γ)′(t) dt,

where we have used the chain rule.
Supposing that the initial point of the curve is p_initial and the final point
is p_final, we have from the fundamental theorem of calculus

    ∫_{C*} F dℓ⃗ = ∫_I (U ∘ γ)′(t) dt = U(p_final) − U(p_initial).    (18.8)

We see that for a potential field, the line integral along a simple open curve
C* depends only on the initial and final points of C; the shape of the curve
is irrelevant. The potential function U plays the role of a primitive for F.
Vector Fields and Integrals
Slide 535
Conservative Fields
Integrals along closed curves can be easily realized by splitting a closed
curve into two open curves. The final point of one curve is the initial point
of the other curve.
18.10. Lemma. Let Ω ⊂ Rⁿ be open, F : Ω → Rⁿ a potential field and
C ⊂ Ω a closed curve. Then

    ∮_C F dℓ⃗ = 0.

The proof is obvious from the preceding discussion.
18.11. Definition. Let Ω ⊂ Rⁿ be open and F : Ω → Rⁿ a vector field. If
the integral along any open curve C* depends only on the initial and final
points or, equivalently,

    ∮_C F dℓ⃗ = 0    for any closed curve C,

then F is called conservative.
Vector Fields and Integrals
Slide 536
Potential Fields are Conservative
In physical terms, a conservative force field has the property that the work
required to move a particle from one point to another does not depend on
the path taken. Therefore, energy is conserved.
18.12. Remark. We note explicitly that every potential field is a
conservative field.
In fact, under certain conditions a conservative field is also a potential field.
18.13. Definition. Let Ω ⊂ Rⁿ. Then Ω is said to be (pathwise)
connected if for any two points in Ω there exists an open curve within Ω
joining the two points.
Vector Fields and Integrals
Slide 537
Conservative Fields are Potential Fields
18.14. Theorem. Let Ω ⊂ Rⁿ be a connected open set and suppose that
F : Ω → Rⁿ is a continuous, conservative field. Then F is a potential field.
Proof.
We need to show that there exists a function U such that F = ∇U on Ω.
In fact, we fix an arbitrary point x₀ ∈ Ω and define

    U(x) := ∫_{C*} F dℓ⃗

for any path C* joining x₀ and x. (The path exists because Ω is connected;
the integral does not depend on which path is chosen since F is
conservative.) We will show that

    ∂U/∂xᵢ = Fᵢ,    i = 1, …, n.    (18.9)
Vector Fields and Integrals
Slide 538
Conservative Fields are Potential Fields
Proof (continued).
Let eᵢ be the ith unit vector and h small enough to ensure that
x + heᵢ ∈ Ω. A path joining x₀ to x + heᵢ can be found by taking a path
C* joining x₀ and x and a straight line segment C*_h parametrized by
γ(t) = x + theᵢ, 0 ≤ t ≤ 1. We then have

    U(x + heᵢ) = ∫_{x₀}^{x+heᵢ} F dℓ⃗ = ∫_{C*} F dℓ⃗ + ∫_{C*_h} F dℓ⃗
               = U(x) + ∫₀^1 ⟨F(x + theᵢ), heᵢ⟩ dt
               = U(x) + h ∫₀^1 Fᵢ(x + theᵢ) dt
               = U(x) + hFᵢ(x) + h ∫₀^1 ( Fᵢ(x + theᵢ) − Fᵢ(x) ) dt.
Vector Fields and Integrals
Slide 539
Conservative Fields are Potential Fields
Proof (continued).
The proof is complete if we can show that

    lim_{h→0} ∫₀^1 ( Fᵢ(x + theᵢ) − Fᵢ(x) ) dt = 0.

Since Fᵢ is continuous, we know that for every fixed t ∈ [0, 1]

    lim_{h→0} |Fᵢ(x + theᵢ) − Fᵢ(x)| = 0.

Then by Lemma 9.16 we have

    lim_{h→0} sup_{t∈[0,1]} |Fᵢ(x + theᵢ) − Fᵢ(x)| = 0

and, since

    | ∫₀^1 ( Fᵢ(x + theᵢ) − Fᵢ(x) ) dt | ≤ sup_{t∈[0,1]} |Fᵢ(x + theᵢ) − Fᵢ(x)|,

we are finished.
Vector Fields and Integrals
Slide 540
Criteria for Potential Fields
18.15. Lemma. Let Ω ⊂ Rⁿ be a connected open set and suppose that
F : Ω → Rⁿ is continuously differentiable. If F is a potential field, then the
relations

    ∂Fᵢ/∂xⱼ = ∂Fⱼ/∂xᵢ    (18.10)

hold for all i, j = 1, …, n.
The proof, which is based on an analysis of the second derivative of the
potential, will be deferred to a later section.
Vector Fields and Integrals
Slide 541
Criteria for Potential Fields
18.16. Example. The velocity field (18.4) introduced in Example 18.4 is not
a potential field, since F(x₁, x₂) = ω(−x₂, x₁) and

    ∂F₁/∂x₂ = −ω ≠ ω = ∂F₂/∂x₁.

Note that (18.10) is necessary, but not sufficient, for a field to be a
potential field.
18.17. Example. The field

    F : R² \ {0} → R²,    F(x₁, x₂) = 1/(x₁² + x₂²) ( −x₂, x₁ )ᵀ

satisfies

    ∂F₂/∂x₁ = ∂F₁/∂x₂,

but ∮_{S¹} F dℓ⃗ ≠ 0, so F is not a potential field in R² \ {0}. Details are left
to the assignments.
Vector Fields and Integrals
Slide 542
Criteria for Potential Fields
On certain “nice” sets, however, we do have a converse theorem:
18.18. Theorem. Let Ω ⊂ Rⁿ be a simply connected open set and
suppose that F : Ω → Rⁿ is continuously differentiable. If for all
i, j = 1, …, n

    ∂Fᵢ/∂xⱼ = ∂Fⱼ/∂xᵢ,

then F is a potential field.
We will not have time to prove this result here. However, we do need to
explain what a “simply connected” set is.
Loosely speaking, a set Ω ⊂ Rⁿ is said to be simply connected if
(i) Ω is pathwise connected and
(ii) every closed curve in Ω can be contracted to a single point within Ω.
Vector Fields and Integrals
Slide 543
Simply Connected Sets
Salix alba. A homotopy of a circle around a sphere can be reduced to single point.. 2006. Wikipedia. Wikimedia Foundation. Web. 12 July 2012
For example, the unit sphere S² = {(x₁, x₂, x₃) ∈ R³ : x₁² + x₂² + x₃² = 1} is
simply connected, because any closed curve can be “continuously
contracted”, staying on the sphere the entire time, until it becomes a
single point.
Intuitively, a closed curve can be imagined as a stretched rubber band. If
any rubber band can be contracted to a single point within a set, then the
set is simply connected.
Vector Fields and Integrals
Slide 544
Simply Connected Sets
18.19. Examples.
(i) R² \ {(x₁, x₂) : x₁² + x₂² ≤ 1} is not simply connected.
(ii) R² \ {0} is not simply connected.
(iii) R³ \ {0} is simply connected.
(iv) A torus is not simply connected.
A closed curve C in a set Ω can be thought of as the image of a
continuous function g : S¹ → C, where S¹ = {(x₁, x₂) ∈ R² : x₁² + x₂² = 1}.
Let us write D = {(x₁, x₂) ∈ R² : x₁² + x₂² ≤ 1}.
18.20. Definition. Let Ω ⊂ Rⁿ be an open set.
(i) A closed curve C ⊂ Ω given as the image of a map g : S¹ → C is said
to be contractible to a point if there exists a continuous function
G : D → Ω such that G|_{S¹} = g.
(ii) The set Ω is said to be simply connected if it is connected and
every closed curve in Ω is contractible to a point.
Vector Fields and Integrals
Slide 545
Determining Potentials
We will develop a practical way of obtaining a potential function for a
vector field. The idea is simply to integrate the components of the field
and compare the results, then try to find a compatible potential. This is
best demonstrated by an example.
18.21. Example. Consider the field F(x₁, x₂) = (x₁² + x₂², 2x₂x₁ + x₂²). Since
F is defined on the simply connected set R² and

    ∂F₁/∂x₂ = 2x₂ = ∂F₂/∂x₁,

the field is a potential field, i.e., F₁ = ∂U/∂x₁, F₂ = ∂U/∂x₂ for some
U : R² → R.
We integrate the components to find U:

    U(x₁, x₂) = ∫ F₁(x₁, x₂) dx₁ = (1/3)x₁³ + x₂²x₁ + C₁(x₂),    (18.11)

where the integration constant C₁ may be a function of x₂.
Vector Fields and Integrals
Slide 546
Determining Potentials
We repeat this for the second component:

    U(x₁, x₂) = ∫ F₂(x₁, x₂) dx₂ = (1/3)x₂³ + x₂²x₁ + C₂(x₁),    (18.12)

where the integration constant C₂ is allowed to depend on x₁.
Comparing (18.11) with (18.12), we see that

    U(x₁, x₂) = (1/3)(x₁³ + x₂³) + x₂²x₁

is a potential function for F (of course, we can add any constant to U if we
like).
This procedure works analogously for vector fields in Rⁿ.
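The procedure is easy to mirror in Mathematica; a sketch for the field
above:

    Integrate[x1^2 + x2^2, x1]                  (* x1^3/3 + x1 x2^2 *)
    Integrate[2 x1 x2 + x2^2, x2]               (* x1 x2^2 + x2^3/3 *)
    Grad[(x1^3 + x2^3)/3 + x1 x2^2, {x1, x2}]   (* recovers F *)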
Vector Fields and Integrals
Slide 547
Differential Forms
We conclude this section by discussing an alternative notation for the
integration of vector fields. More properly, these concepts belong to a
formal discussion of curves and surfaces in the field of vector analysis. We
mention them here only because they might be encountered in certain
old-fashioned textbooks.
The transpose of a vector field is called a differential form:

    F(x)ᵀ = ( F₁(x), …, Fₙ(x) ) = F₁(x) dx₁ + ⋯ + Fₙ(x) dxₙ,

where the differentials dxⱼ, j = 1, …, n, are simply the standard basis row
vectors, as defined in (17.1).
18.22. Definition. Let F₁, …, Fₙ : Rⁿ → R be scalar functions. Then

    α = F₁ dx₁ + ⋯ + Fₙ dxₙ

is said to be a differential one-form.
Vector Fields and Integrals
Slide 548
Integral of a Differential Form
We then simply define

    ∫_{C*} α := ∫_{C*} F dℓ⃗,

where F = (F₁, …, Fₙ)ᵀ is the transpose of the differential form α.
18.23. Example. We integrate the form 4y dx + 2x²y dy in the
counter-clockwise direction along the unit circle S¹ ⊂ R². We parametrize
the circle by γ(θ) = (cos θ, sin θ), 0 ≤ θ < 2π:

    ∮_{S¹} 4y dx + 2x²y dy = ∮_{S¹} ( 4y, 2x²y )ᵀ dℓ⃗
        = ∫₀^{2π} ( 4 sin θ · (−sin θ) + 2 cos² θ sin θ · cos θ ) dθ
        = −4π.
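The parametrized integral can be checked directly in Mathematica (and
confirms the value −4π):

    Integrate[4 Sin[th] (-Sin[th]) + 2 Cos[th]^2 Sin[th] Cos[th], {th, 0, 2 Pi}]   (* -4 Pi *)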
Flux and Circulation
Slide 549
19. Flux and Circulation
Flux and Circulation
Slide 550
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Flux and Circulation
Slide 551
Vector Fields of Fluids
In the previous section we have primarily motivated line integrals of vector
fields through the concept of work in a force field. Another physical
approach is to motivate vector fields through velocity fields of fluids. This
turns out to yield further useful concepts in field theory.
We will consider fluid flows in R² to introduce general concepts. Observe
the vector field illustrated at left, interpreted as the direction field of a
fluid flow, and the closed curve, interpreted as the boundary of a region.
We can decompose the vector field at the boundary into a tangential
component and a normal component.
Flux and Circulation
Slide 552
Circulation and Flux
We interpret the normal component of the vector field as the part that
flows through the boundary of the region, i.e., into or out of the region.
This is called the flux of the vector field through the boundary.
The tangential component is the part of the vector field that flows around
the boundary, called the circulation of the field.
19.1. Example. Let S¹ = {(x₁, x₂) ∈ R² : x₁² + x₂² = 1} be the unit circle,
bounding the unit disc. Consider the two vector fields

    F, G : R² → R²,    F(x₁, x₂) = ( x₁, x₂ )ᵀ,    G(x₁, x₂) = ( −x₂, x₁ )ᵀ.

A unit tangent vector field to S¹ at (x₁, x₂) is given by T(x) = (−x₂, x₁), so

    ⟨T, F⟩|_{S¹} = −x₂x₁ + x₁x₂ = 0,
    ⟨T, G⟩|_{S¹} = −x₂(−x₂) + x₁x₁ = 1.
Flux and Circulation
Slide 553
Circulation and Flux
A unit normal vector field at x ∈ S¹ is given by N(x) = −x, so

    ⟨N, F⟩|_{S¹} = −x₁x₁ − x₂x₂ = −1,
    ⟨N, G⟩|_{S¹} = x₁x₂ − x₂x₁ = 0.
Flux and Circulation
Slide 554
Circulation and Flux
19.2. Definition. Let Ω ⊂ Rⁿ be an open set and F : Ω → Rⁿ a
continuously differentiable vector field.
Let C* be a positively oriented closed curve in Rⁿ. Then

    ∫_{C*} ⟨F, T⟩ dℓ    (19.1)

is called the (total) circulation of F along C*.
Let S* ⊂ Ω be an oriented hypersurface. Then

    ∫_{S*} ⟨F, N⟩ dA    (19.2)

is called the flux of F through S.
Flux and Circulation
Slide 555
Circulation and Flux
19.3. Remarks.
▸ The integral (19.1) coincides with the line integral we defined in
  (13.3) and hence also gives the amount of work needed to move a
  particle along the closed curve C. In a non-rotating fluid, this work
  should be zero.
▸ In R², a hypersurface is just a curve and (19.2) becomes a line
  integral. The normal vector is of course taken according to the
  convention used for surfaces.
Flux and Circulation
Slide 556
Flux Through Hypersurfaces
19.4. Remark. We also sometimes write

    ∫_S ⟨F, dA⃗⟩

for the flux integral. The term

    dA⃗ := N(ϕ(x)) · √(g(x)) dx

is called the vectorial surface element of a hypersurface S.
For a hypersurface in R³ we have

    N = (t₁ × t₂)/∥t₁ × t₂∥,

so that

    dA⃗ = t₁(ϕ(x)) × t₂(ϕ(x)) dx₁ dx₂.
Flux and Circulation
Slide 557
Flux of an Electrostatic Field
19.5. Example. A point charge Q at the origin induces a field

    E(p) = Q/(4πε₀) · p/∥p∥³

at any point p ∈ R³ \ {0}. The flux of this field through the unit sphere S²
is given by

    ∫_{S²} ⟨E, dA⃗⟩.

As in Example 16.11, we can parametrize S² by

    ϕ(φ, θ) = ( cos φ sin θ, sin φ sin θ, cos θ )ᵀ,
    t_θ × t_φ = ( cos φ sin² θ, sin φ sin² θ, cos θ sin θ )ᵀ,

where 0 < φ < 2π and 0 < θ < π. Here we have chosen the
outward-pointing (positively oriented) normal vector.
Flux and Circulation
Slide 558
Flux of an Electrostatic Field
It follows that

    ∫_{S²} ⟨E, dA⃗⟩ = Q/(4πε₀) ∫₀^{2π} ∫₀^π ⟨ϕ(φ, θ), (t_θ × t_φ)(φ, θ)⟩ / ∥ϕ(φ, θ)∥³ dθ dφ
                  = Q/(4πε₀) ∫₀^{2π} ∫₀^π ⟨( cos φ sin θ, sin φ sin θ, cos θ )ᵀ,
                                           ( cos φ sin² θ, sin φ sin² θ, cos θ sin θ )ᵀ⟩ dθ dφ
                  = Q/(4πε₀) ∫₀^{2π} ∫₀^π sin θ dθ dφ
                  = Q/ε₀,

where we have used that ∥ϕ(φ, θ)∥ = 1.
The fact that this result is actually true for any closed surface (not just S²)
that contains the charge at the origin is known as Gauß’s law in
electrostatics.
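A compact Mathematica check of the flux integrand and its value (symbol
names are ours):

    s = {Cos[phi] Sin[th], Sin[phi] Sin[th], Cos[th]};
    Integrate[s.Cross[D[s, th], D[s, phi]], {th, 0, Pi}, {phi, 0, 2 Pi}]   (* 4 Pi *)

Multiplying by Q/(4πε₀) gives the flux Q/ε₀.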
Flux and Circulation
Slide 559
Circulation and Flux in R2
19.6. Example. For the vector field F(x₁, x₂) = (1, 0) and the square
pictured below, both the circulation and the flux are zero.

[Plot of the constant field F = (1, 0) and a square region in the plane.]
Flux and Circulation
Slide 560
Total and Infinitesimal Flux
The previous example shows clearly that the total flux of a vector field
through a boundary is the difference between “influx” and “efflux” of the
field. In the context of fluid flow, zero (total) flux through a boundary
means

    “what flows in also flows out”

or

    “there are no fluid sources or sinks within the boundary.”

We now want to characterize vector fields where the flux through any
boundary is zero. In fluid flow, these correspond to fluid fields where the
fluid volume is preserved (incompressible fluids with no external influx or
efflux).
This approach will lead to infinitesimal flux, i.e., the flux of the field
through a given point (instead of across a surface).
Flux and Circulation
Slide 561
The Flux Through a Square
Let Ω ⊂ R² be open and consider the flux of a continuously differentiable
vector field F : Ω → R² through a square of side length 2h, h > 0,
centered at a point x ∈ Ω.
In particular, the square is given by

    S_h = [x₁ − h, x₁ + h] × [x₂ − h, x₂ + h]

and the boundary consists of four lines,

    ∂S_h = l₁ ∪ l₂ ∪ l₃ ∪ l₄.

We find the flux through the boundary by integrating

    ∫_{∂S_h} ⟨F, N⟩ ds = Σ_{k=1}^4 ∫_{l_k} ⟨F, N_k⟩ ds.
Flux and Circulation
Slide 562
The Flux Through a Square
We have the following parametrizations and normal vectors:

    l₁: N₁ = (0, −1)ᵀ,  γ₁(t) = (x₁, x₂ − h)ᵀ + t(1, 0)ᵀ,  −h ≤ t ≤ h,
    l₂: N₂ = (1, 0)ᵀ,   γ₂(t) = (x₁ + h, x₂)ᵀ + t(0, 1)ᵀ,  −h ≤ t ≤ h,
    l₃: N₃ = (0, 1)ᵀ,   γ₃(t) = (x₁, x₂ + h)ᵀ − t(1, 0)ᵀ,  −h ≤ t ≤ h,
    l₄: N₄ = (−1, 0)ᵀ,  γ₄(t) = (x₁ − h, x₂)ᵀ − t(0, 1)ᵀ,  −h ≤ t ≤ h.

Hence,

    ∫_{∂S_h} ⟨F, N⟩ ds = − ∫_{−h}^h F₂(x₁ + t, x₂ − h) dt + ∫_{−h}^h F₁(x₁ + h, x₂ + t) dt
                        + ∫_{−h}^h F₂(x₁ − t, x₂ + h) dt − ∫_{−h}^h F₁(x₁ − h, x₂ − t) dt.

In the last two integrals, we substitute τ = −t and then rename the
variable t.
Flux and Circulation
Slide 563
The Flux Through a Square
This gives

    ∫_{∂S_h} ⟨F, N⟩ ds = ∫_{−h}^h ( F₂(x₁ + t, x₂ + h) − F₂(x₁ + t, x₂ − h) ) dt
                        + ∫_{−h}^h ( F₁(x₁ + h, x₂ + t) − F₁(x₁ − h, x₂ + t) ) dt.

We note that

    F₂(x₁ + t, x₂ ± h) = F₂(x₁ + t, x₂) ± ∂F₂(x₁ + t, x₂)/∂x₂ · h + o(h),
    F₁(x₁ ± h, x₂ + t) = F₁(x₁, x₂ + t) ± ∂F₁(x₁, x₂ + t)/∂x₁ · h + o(h)

as h → 0, where the small o(h) symbols represent continuous functions of
h and t.
Flux and Circulation
Slide 564
The Flux Through a Square
Inserting this expansion and substituting in the integrals,

    ∫_{∂S_h} ⟨F, N⟩ ds = 2h ∫_{−h}^h ( ∂F₁(x₁, x₂ + t)/∂x₁ + o(1) ) dt
                        + 2h ∫_{−h}^h ( ∂F₂(x₁ + t, x₂)/∂x₂ + o(1) ) dt
                       = (2h)² ∫_{−1/2}^{1/2} ( ∂F₁(x₁, x₂ + 2ht)/∂x₁ + o(1) ) dt
                        + (2h)² ∫_{−1/2}^{1/2} ( ∂F₂(x₁ + 2ht, x₂)/∂x₂ + o(1) ) dt.
Flux and Circulation
Slide 565
The Flux Through a Square
Since the functions o(1) are continuous functions of h and t, we can apply
Lemma 9.16 to see that

    | ∫_{−1/2}^{1/2} o(1) dt | ≤ sup_{t∈[−1/2,1/2]} |o(1)| ⟶ 0    as h → 0.

Furthermore, we write

    ∫_{−1/2}^{1/2} ∂F₁(x₁, x₂ + 2ht)/∂x₁ dt
        = ∫_{−1/2}^{1/2} ∂F₁(x₁, x₂)/∂x₁ dt
          + ∫_{−1/2}^{1/2} ( ∂F₁(x₁, x₂ + 2ht)/∂x₁ − ∂F₁(x₁, x₂)/∂x₁ ) dt

and do the same for the term involving ∂F₂/∂x₂.
Flux and Circulation
Slide 566
The Flux Through a Square
Since F is continuously differentiable,

    lim_{h→0} ( ∂F₁(x₁, x₂ + 2ht)/∂x₁ − ∂F₁(x₁, x₂)/∂x₁ ) = 0

for all t ∈ [−1/2, 1/2]. By Lemma 9.16, we then conclude

    | ∫_{−1/2}^{1/2} ( ∂F₁(x₁, x₂ + 2ht)/∂x₁ − ∂F₁(x₁, x₂)/∂x₁ ) dt |
        ≤ sup_{t∈[−1/2,1/2]} | ∂F₁(x₁, x₂ + 2ht)/∂x₁ − ∂F₁(x₁, x₂)/∂x₁ | ⟶ 0

as h → 0. This implies that

    lim_{h→0} 1/(2h)² ∫_{∂S_h} ⟨F, N⟩ ds = ∂F₁/∂x₁|ₓ + ∂F₂/∂x₂|ₓ.
Flux and Circulation
Slide 567
Flux Density and the Divergence
We have shown that if S_h is a square of side length 2h centered at x,

    (Flux through S_h) / Area(S_h) ⟶ ∂F₁/∂x₁|ₓ + ∂F₂/∂x₂|ₓ    as h → 0.

Hence, the limit on the right corresponds to a flux density. There is a
special term for this flux density:
19.7. Definition. Let Ω ⊂ Rⁿ and F : Ω → Rⁿ be a continuously
differentiable vector field. Then

    div F := ∂F₁/∂x₁ + ⋯ + ∂Fₙ/∂xₙ

is called the divergence of F.
The flux density at a point x is given by the divergence of the field at x.
Although we have only proven this in the case of fields in R², this holds in
any dimension n ≥ 2 (we will prove this using surface integrals in R³ later).
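Mathematica’s built-in Div implements exactly this sum of partial
derivatives; for instance, for the radial field F(x) = x and for the rotating
flow of Example 18.4:

    Div[{x1, x2}, {x1, x2}]    (* 2 *)
    Div[{-x2, x1}, {x1, x2}]   (* 0 — the rotating flow is source-free *)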
Flux and Circulation
Slide 568
Circulation Around a Parallelogram
We now turn to the circulation of a vector field. Again, our goal is to find
an expression for the infinitesimal circulation around a point x ∈ Rⁿ. In
contrast to the flux, where we had to restrict ourselves to R², we now
consider a line integral in Rⁿ.
Let Ω ⊂ Rⁿ and F : Ω → Rⁿ a continuously differentiable vector field. Let
x ∈ Ω and denote by P(u, v) a parallelogram spanned by vectors u and v
and centered at x. We want to calculate the circulation of F around the
boundary of the parallelogram, given by

    ∫_{∂P(u,v)} F ds⃗.

Our goal is to analyze the integral when ∥u∥ + ∥v∥ → 0.
Flux and Circulation
Slide 569
Circulation Around a Parallelogram
The parallelogram is the union of four straight line segments
parametrized by

    γ₁(t) = x − (u + v)/2 + tu,
    γ₂(t) = x + (u − v)/2 + tv,
    γ₃(t) = x + (u + v)/2 − tu,
    γ₄(t) = x + (v − u)/2 − tv

for t ∈ [0, 1].
We will use that for every point γⱼ(t) on the parallelogram, we can write

    F(γⱼ(t)) = F(x) + DF|ₓ(γⱼ(t) − x) + o(∥u∥ + ∥v∥)

as ∥u∥ + ∥v∥ → 0.
Flux and Circulation
Slide 570
Circulation Around a Parallelogram
The line integral is then

    ∫_{∂P(u,v)} F ds⃗ = Σ_{j=1}^4 ∫₀^1 ⟨F ∘ γⱼ(t), γⱼ′(t)⟩ dt
        = ∫₀^1 ⟨F ∘ γ₁(t), u⟩ dt + ∫₀^1 ⟨F ∘ γ₂(t), v⟩ dt
          − ∫₀^1 ⟨F ∘ γ₃(t), u⟩ dt − ∫₀^1 ⟨F ∘ γ₄(t), v⟩ dt
        = ∫₀^1 ⟨DF|ₓ γ₁(t), u⟩ dt + ∫₀^1 ⟨DF|ₓ γ₂(t), v⟩ dt
          − ∫₀^1 ⟨DF|ₓ γ₃(t), u⟩ dt − ∫₀^1 ⟨DF|ₓ γ₄(t), v⟩ dt
          + o((∥u∥ + ∥v∥)²).
Flux and Circulation
Slide 571
Circulation Around a Parallelogram
From the linearity of the integral, the inner product and DF|ₓ we can write

    ∫_{∂P(u,v)} F ds⃗ = ∫₀^1 ⟨DF|ₓ(γ₁(t) − γ₃(t)), u⟩ dt
                      + ∫₀^1 ⟨DF|ₓ(γ₂(t) − γ₄(t)), v⟩ dt + o((∥u∥ + ∥v∥)²)
                    = ∫₀^1 −⟨DF|ₓ(u + v − 2tu), u⟩ dt
                      + ∫₀^1 ⟨DF|ₓ(u − v + 2tv), v⟩ dt + o((∥u∥ + ∥v∥)²)
                    = ⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩ + o((∥u∥ + ∥v∥)²).

In leading order, the circulation is thus ⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩.
Flux and Circulation
Slide 572
The Circulation Density - Rotation / Curl
The expression ⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩ is clearly anti-symmetric (it
changes sign when u and v are interchanged) and bilinear. As we will see,
it is the main term describing the circulation density in the plane spanned
by u and v. It therefore deserves a special mention.
19.8. Definition. Let Ω ⊂ Rⁿ be open and F : Ω → Rⁿ a continuously
differentiable vector field. Then the anti-symmetric, bilinear form

    rot F|ₓ : Rⁿ × Rⁿ → R,    rot F|ₓ(u, v) := ⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩    (19.3)

is called the rotation (in mainland Europe) or curl (in anglo-saxon
countries) of the vector field F at x ∈ Rⁿ.
We will study this bilinear form in more detail for the case of fields in R²
and R³.
Flux and Circulation
Slide 573
The Rotation in R2
In R², the area of the parallelogram is given by |det(u, v)|. The circulation
density (circulation per unit area) is then

    1/|det(u, v)| ∫_{∂P(u,v)} F · ds⃗ = (⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩)/det(u, v) + o(∥u∥ + ∥v∥).

For instance, if we set u = h·u₀ and v = h·v₀ where h > 0 and
∥u₀∥, ∥v₀∥ > 0 are fixed, we have

    1/|det(u, v)| ∫_{∂P(u,v)} F · ds⃗ → (⟨DF|ₓ u₀, v₀⟩ − ⟨DF|ₓ v₀, u₀⟩)/|det(u₀, v₀)|    as h → 0.

Therefore,

    rot F|ₓ(u, v)/Area(P(u, v)) = (⟨DF|ₓ u, v⟩ − ⟨DF|ₓ v, u⟩)/|det(u, v)|    (19.4)

represents the infinitesimal circulation around a parallelogram centered at a
point x ∈ R².
Flux and Circulation
Slide 574
The Rotation in R2
19.9. Theorem. Let Ω ⊂ R² be open and F : Ω → R² a continuously
differentiable vector field. Then there exists a uniquely defined continuous
potential function rot F : Ω → R such that

    rot F|ₓ(u, v) = rot F(x) · det(u, v).    (19.5)

Proof.
The determinant is the unique alternating, normed, bilinear form in R².
Since rot F|ₓ is alternating and bilinear (but not normed) it must be a
multiple of the determinant. In fact, we have

    rot F(x) = rot F(x) · det(e₁, e₂) = rot F|ₓ(e₁, e₂)
             = ⟨DF|ₓ e₁, e₂⟩ − ⟨DF|ₓ e₂, e₁⟩
             = ∂F₂/∂x₁ − ∂F₁/∂x₂.
Flux and Circulation
Slide 575
The Rotation in R2
19.10. Remark. Comparing (19.5) with the circulation density (19.4), it is
clear that the function rot F gives this circulation density. We note:
The circulation density of a vector field F in R2 is represented by
a scalar function, rot F .
This scalar function is given by

    rot F = ∂F₂/∂x₁ − ∂F₁/∂x₂.    (19.6)
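As a quick illustration (a sketch only; the sample field is an arbitrary choice and sympy is assumed available), one can verify both (19.6) and the identity (19.5) symbolically:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    u1, u2, v1, v2 = sp.symbols('u1 u2 v1 v2')
    F1, F2 = x1**2 * x2, sp.sin(x1) + x2**2     # arbitrary C^1 planar field

    rot_F = sp.diff(F2, x1) - sp.diff(F1, x2)   # (19.6)

    # (19.5): <DF u, v> - <DF v, u> = rot F * det(u, v)
    DF = sp.Matrix([[sp.diff(F1, x1), sp.diff(F1, x2)],
                    [sp.diff(F2, x1), sp.diff(F2, x2)]])
    u, v = sp.Matrix([u1, u2]), sp.Matrix([v1, v2])
    lhs = (DF*u).dot(v) - (DF*v).dot(u)
    print(sp.simplify(lhs - rot_F*(u1*v2 - u2*v1)))   # 0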
Flux and Circulation
Slide 576
The Rotation in R3
19.11. Theorem. Let Ω ⊂ R³ be open and F : Ω → R³ a continuously
differentiable vector field. Then there exists a uniquely defined continuous
vector field rot F : Ω → R³ such that

    rot F|ₓ(u, v) = det(rot F(x), u, v) = ⟨rot F(x), u × v⟩.    (19.7)

Proof.
Suppose that the vector field rot F exists for F = (F₁, F₂, F₃); then we can
use (19.7) to calculate the first component of rot F:

    (rot F(x))₁ = ⟨rot F(x), e₁⟩ = ⟨rot F(x), e₂ × e₃⟩ = rot F|ₓ(e₂, e₃)
                = ⟨DF|ₓ e₂, e₃⟩ − ⟨DF|ₓ e₃, e₂⟩
                = ∂F₃/∂x₂ − ∂F₂/∂x₃.
Flux and Circulation
Slide 577
The Rotation in R3
Proof (continued).
Similarly, we calculate the other components of rot F to obtain

    rot F(x) = ( ∂F₃/∂x₂ − ∂F₂/∂x₃ ,  ∂F₁/∂x₃ − ∂F₃/∂x₁ ,  ∂F₂/∂x₁ − ∂F₁/∂x₂ )ᵀ.

Hence, the vector rot F is uniquely determined from F by (19.7).

The existence of a vector y ∈ R³ such that

    rot F|ₓ(u, v) = det(y, u, v)

will be proven in the assignments. Since F is continuously differentiable as
a function of x, so is the vector y = y(x) and the vector field
rot F(x) = y(x) exists.
Flux and Circulation
Slide 578
The Rotation in R3
19.12. Remark. Comparing (19.7) with the circulation density (19.4), the
circulation density in the plane spanned by u and v at x is given by

    ⟨ rot F(x), (u × v)/∥u × v∥ ⟩.

We note:

The circulation density of a vector field F in R³ is represented by
a vector field, rot F.

This vector field is given by

    rot F = ( ∂F₃/∂x₂ − ∂F₂/∂x₃ ,  ∂F₁/∂x₃ − ∂F₃/∂x₁ ,  ∂F₂/∂x₁ − ∂F₁/∂x₂ )ᵀ.    (19.8)
Flux and Circulation
Slide 579
The Rotation in R3
19.13. Remark. We can consider R2 as being a subspace of R3 by
identifying points (x1 ; x2 ) ∈ R2 with (x1 ; x2 ; 0) ∈ R3 . Similarly, a vector
field in R2 can be considered as a field in R3 of the form
    F(x₁, x₂, x₃) = ( F₁(x₁, x₂) ,  F₂(x₁, x₂) ,  0 )ᵀ.

We then obtain

    rot F(x) = ( 0 ,  0 ,  ∂F₂/∂x₁ − ∂F₁/∂x₂ )ᵀ,

effectively regaining (19.6) from (19.8).
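A short symbolic sketch of this remark (the planar components are arbitrary choices; sympy assumed available):

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    F = sp.Matrix([x1*x2**2, sp.exp(x1) - x2, 0])   # embedded planar field

    rot = sp.Matrix([sp.diff(F[2], x2) - sp.diff(F[1], x3),
                     sp.diff(F[0], x3) - sp.diff(F[2], x1),
                     sp.diff(F[1], x1) - sp.diff(F[0], x2)])   # (19.8)
    print(rot.T)   # [0, 0, exp(x1) - 2*x1*x2]: only dF2/dx1 - dF1/dx2 survives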
Flux and Circulation
Slide 580
The Rotation in Higher Dimensions
We have shown that for x ∈ Rn ,
    rot F|ₓ(u, v) = ⟨u, A(x)v⟩    (19.9)

where

    A(x) = (DF|ₓ)ᵀ − DF|ₓ.

Note that

    A(x)ᵀ = −A(x),

so at any point x the rotation is represented by an antisymmetric matrix.
Flux and Circulation
Slide 581
The Rotation in Higher Dimensions
The space

    {A ∈ Mat(n × n; R) : Aᵀ = −A}

has
I dimension 1 if n = 2, so rot F|ₓ can be represented by a scalar.
I dimension 3 if n = 3, so rot F|ₓ can be represented by a vector in R³.
I dimension 6 if n = 4, so rot F|ₓ can not easily be represented by a vector.
Flux and Circulation
Slide 582
Irrotational Fields
A continuously differentiable field F : Ω → Rⁿ such that rot F|ₓ = 0 for all
x ∈ Ω is said to be irrotational. Writing an irrotational field in the form
(19.9), we see that A(x) = 0 for all x. This implies

    (DF|ₓ)ᵀ = DF|ₓ,

which means that

    ∂Fᵢ/∂xⱼ = ∂Fⱼ/∂xᵢ,    i, j = 1, …, n.

Hence, a potential field is irrotational. Conversely, if an irrotational field
F : Ω → Rⁿ is defined on a simply connected domain, we may apply
Theorem 18.18 to deduce that F is a potential field.
Flux and Circulation
Slide 583
Fluid Statics
Fluid statics is the study of time-independent flows. In particular, the
streamlines are assumed to be given by a direction field F in Rn (most
often, n = 2 or 3) that does not depend on time. Irrotational fluid flow is
often modeled by a potential field, i.e., one assumes
F = ∇U
for some potential U. (The resulting flow is known as potential flow.) If
there are no sources or sinks and the fluid is incompressible, one
additionally has
    div F = 0.

Combining these two equations, one obtains

    div(∇U) = div( ∂U/∂x₁ , … , ∂U/∂xₙ )ᵀ = ∂²U/∂x₁² + ··· + ∂²U/∂xₙ² = ∆U = 0.
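As a small sketch of these relations (the harmonic potential below is a standard textbook choice; sympy assumed available): for U = x² − y², the flow F = ∇U is both irrotational and incompressible.

    import sympy as sp

    x, y = sp.symbols('x y')
    U = x**2 - y**2                                  # harmonic potential
    F = sp.Matrix([sp.diff(U, x), sp.diff(U, y)])    # potential flow F = grad U

    print(sp.diff(U, x, 2) + sp.diff(U, y, 2))       # Laplace equation: 0
    print(sp.diff(F[0], x) + sp.diff(F[1], y))       # div F = 0 (incompressible)
    print(sp.diff(F[1], x) - sp.diff(F[0], y))       # rot F = 0 (irrotational)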
Flux and Circulation
Slide 584
The Laplace Equation
The equation
∆U = 0
is a partial differential equation and known as the Laplace equation.
Together with boundary conditions for the flow it can (in principle) be
solved to yield the streamlines of an irrotational, incompressible fluid in any
physical situation.
However, practical solutions must often rely on numerical or approximate
methods, as solving the equation explicitly is possible only in the simplest
situations (such as fluid flow around a sphere). Finding solutions to this
(and other) partial differential equations is one of the main research
problems in applied mathematics and engineering.
Solutions of the Laplace equation play a minor role in Vv286 (Honors Math
II), a major role in Vv454 (Partial Differential Equations and Boundary
Value Problems) and are a principal topic of Vv557 (Methods of Applied
Mathematics II).
Flux and Circulation
Slide 585
Triangle Calculus
It is convenient to introduce the formal “vector”

    ∇ := ( ∂/∂x₁ , … , ∂/∂xₙ )ᵀ.

Then the gradient of a potential function f is just

    ∇f = ( ∂/∂x₁ , … , ∂/∂xₙ )ᵀ f = ( ∂f/∂x₁ , … , ∂f/∂xₙ )ᵀ.

The divergence of a vector field F can be expressed as

    div F = ⟨∇, F⟩ = ⟨ ( ∂/∂x₁ , … , ∂/∂xₙ )ᵀ , ( F₁ , … , Fₙ )ᵀ ⟩ = ∂F₁/∂x₁ + ··· + ∂Fₙ/∂xₙ.
Flux and Circulation
Slide 586
Triangle Calculus
The rotation of a vector field F can be formally written as

    rot F = ∇ × F = det ( e₁  e₂  e₃ ;  ∂/∂x₁  ∂/∂x₂  ∂/∂x₃ ;  F₁  F₂  F₃ )

(the formal determinant expanded along the first row), where e₁, e₂, e₃ are
the standard basis vectors in R³. Finally, we can formally write

    ⟨∇, ∇⟩ = (∂/∂x₁)² + ··· + (∂/∂xₙ)² = ∂²/∂x₁² + ··· + ∂²/∂xₙ² = ∆,

so it is common for physicists to write ∇² instead of ∆.
Fundamental Theorems of Vector Calculus
Slide 587
20. Fundamental Theorems of Vector Calculus
Fundamental Theorems of Vector Calculus
Slide 588
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Fundamental Theorems of Vector Calculus
Slide 589
Fundamental Theorems of Integration
In our previous integrals, we have often been able to formulate a
“fundamental theorem” of the type
“integral of f over domain” = “values of primitive of f on the boundary”
For example, for a real integrable function defined on an interval,

    ∫ₐᵇ f(x) dx = F(b) − F(a).

A similar formula was found to hold for a potential field integrated along a
curve, where the potential of the vector field played the role of the
primitive (see (18.8)).
It turns out that suitable generalizations of this principle hold in higher
dimensions. These fundamental theorems of vector analysis are among
the most important theorems for engineering applications that we will
study this term.
Fundamental Theorems of Vector Calculus
Slide 590
Green’s Theorem
20.1. Green’s Theorem. Let R ⊂ R² be a bounded, simple region and
Ω ⊃ R an open set containing R. Let F : Ω → R² be a continuously
differentiable vector field. Then

    ∫_{∂R*} F · dℓ⃗ = ∫_R ( ∂F₂/∂x₁ − ∂F₁/∂x₂ ) dx    (20.1)

where ∂R* denotes the boundary curve of R with positive
(counter-clockwise) orientation.
20.2. Remark. In fact, Green’s Theorem is valid not only for simple regions,
but for more general regions, as we shall describe below.
Fundamental Theorems of Vector Calculus
Slide 591
Green’s Theorem
Proof.
Consider R as an ordinate region with respect to x₂, let Ω be an open set
containing R and suppose F₁ : Ω → R is continuously differentiable.
The boundary ∂R of R is the union of the curves

    Γ₁ = {(x₁, x₂) ∈ R² : x₁ ∈ [a, b], x₂ = φ₁(x₁)},
    Γ₂ = {(x₁, x₂) ∈ R² : x₁ ∈ [a, b], x₂ = φ₂(x₁)},
    Γ₃ = {(x₁, x₂) ∈ R² : x₁ = b, x₂ ∈ [φ₁(b), φ₂(b)]},
    Γ₄ = {(x₁, x₂) ∈ R² : x₁ = a, x₂ ∈ [φ₁(a), φ₂(a)]}.

[Figure: the region R over [a, b], bounded below by Γ₁, above by Γ₂ and at
the sides by Γ₃ and Γ₄.]

We will later imbue the boundary with positive orientation, so we discuss
integrals along the curves Γ₁, Γ₂, Γ₃, Γ₄ in the directions indicated by the
arrows.
Fundamental Theorems of Vector Calculus
Slide 592
Green’s Theorem
Proof (continued).
Consider the integral

    ∫_R ∂F₁/∂x₂ dx = ∫ₐᵇ ∫_{φ₁(x₁)}^{φ₂(x₁)} ∂F₁/∂x₂ dx₂ dx₁
                   = ∫ₐᵇ F₁(x₁, φ₂(x₁)) dx₁ − ∫ₐᵇ F₁(x₁, φ₁(x₁)) dx₁.

We now integrate the vector field

    F̃ : Ω → R²,    F̃(x) = ( F₁(x) , 0 )ᵀ,

along Γ₁ and −Γ₂, using the respective parametrizations

    γ⁽¹⁾(t) = ( t , φ₁(t) )ᵀ,    γ⁽²⁾(t) = ( t , φ₂(t) )ᵀ,    t ∈ [a, b].    (20.2)
Fundamental Theorems of Vector Calculus
Slide 593
Green’s Theorem
Proof (continued).
A quick calculation yields

    ∫_{Γ₁} ( F₁ , 0 )ᵀ · dℓ⃗ = ∫ₐᵇ F₁∘γ⁽¹⁾(t) dt = ∫ₐᵇ F₁(t, φ₁(t)) dt,
    ∫_{−Γ₂} ( F₁ , 0 )ᵀ · dℓ⃗ = −∫ₐᵇ F₁∘γ⁽²⁾(t) dt = −∫ₐᵇ F₁(t, φ₂(t)) dt.

It is also easy to see that

    ∫_{Γ₃} ( F₁ , 0 )ᵀ · dℓ⃗ = ∫_{Γ₄} ( F₁ , 0 )ᵀ · dℓ⃗ = 0,

so we find

    ∮_{∂R} ( F₁ , 0 )ᵀ · dℓ⃗ = ∫ₐᵇ F₁(t, φ₁(t)) dt − ∫ₐᵇ F₁(t, φ₂(t)) dt.    (20.3)
Fundamental Theorems of Vector Calculus
Slide 594
Green’s Theorem
Proof (continued).
Putting (20.2) and (20.3) together,

    ∫_{∂R} ( F₁ , 0 )ᵀ · dℓ⃗ = −∫_R ∂F₁/∂x₂ dx.    (20.4)

Repeating a similar argument with the scalar function F₂ : Ω → R and
representing R as an ordinate region with respect to x₁ yields

    ∫_{∂R} ( 0 , F₂ )ᵀ · dℓ⃗ = ∫_R ∂F₂/∂x₁ dx.    (20.5)

Adding (20.4) and (20.5) then gives (20.1).
Fundamental Theorems of Vector Calculus
Slide 595
Green’s Theorem
20.3. Example. We wish to calculate the integral ∫_Γ x⁴ dx + xy dy, where
Γ ⊂ R² is the triangle with vertices (0, 0), (0, 1) and (1, 0), oriented
positively.

We note that the vector field F(x, y) = (x⁴, xy) is defined on all of R² and
the region R bounded by the triangle is a simple region. Instead of
evaluating three separate line integrals (one for each edge of the triangle)
we can apply Green’s Theorem. We note that

    R = {(x, y) ∈ R² : x ∈ [0, 1], 0 ≤ y ≤ 1 − x}.

[Figure: the triangle R with positively oriented boundary Γ.]

Then

    ∫_Γ x⁴ dx + xy dy = ∫_R (y − 0) d(x, y) = ∫₀¹ ∫₀^{1−x} y dy dx = 1/6.
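Both sides of this computation can also be checked numerically (a sketch, assuming scipy is available):

    import numpy as np
    from scipy.integrate import quad, dblquad

    # Right-hand side of (20.1): integral of dF2/dx - dF1/dy = y over R.
    rhs = dblquad(lambda y, x: y, 0, 1, lambda x: 0.0, lambda x: 1 - x)[0]

    # Left-hand side: line integral of x^4 dx + x*y dy over the three edges.
    def edge(gamma, dgamma):
        f = lambda t: gamma(t)[0]**4 * dgamma(t)[0] \
                      + gamma(t)[0]*gamma(t)[1] * dgamma(t)[1]
        return quad(f, 0, 1)[0]

    lhs = (edge(lambda t: (t, 0.0),     lambda t: (1.0, 0.0))    # (0,0) -> (1,0)
         + edge(lambda t: (1 - t, t),   lambda t: (-1.0, 1.0))   # (1,0) -> (0,1)
         + edge(lambda t: (0.0, 1 - t), lambda t: (0.0, -1.0)))  # (0,1) -> (0,0)

    print(lhs, rhs)   # both approximately 1/6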
Fundamental Theorems of Vector Calculus
Slide 596
Measurement of Area – the Planimeter
For F(x₁, x₂) = (−x₂, x₁) we obtain

    |R| = ∫_R 1 dx = 1/2 ∫_{∂R} ( −x₂ , x₁ )ᵀ · dℓ⃗.
Hence the integral of a vector field around the boundary of a region can be
used to determine the area of that region.
Several measurement instruments, known as planimeters, have been
developed to implement this. The most successful version is the polar
planimeter, invented by the Swiss mathematician Jakob Amsler in 1854.
Previous planimeters (the first was constructed in 1814) were not as
accurate as the polar planimeter, and this remains the most common form
today.
A description of the basic functioning of the polar planimeter is quoted on
the next slide. The source is an article on military surveys and earthwork
constructions by GlobalSecurity.org.
Fundamental Theorems of Vector Calculus
Slide 597
Functioning of a Polar Planimeter
“The planimeter [...] touches the paper at three points: the anchor point, P; the tracing
point, T; and the roller, R. The adjustable arm, A, is graduated to permit adjustment to
the scale of the plot. This adjustment provides a direct ratio between the area traced by
the tracing point and the revolutions of the roller. As the tracing point is moved over the
paper, the drum, D, and the disk, F, revolve. The disk records the revolutions of the
roller in units of tenths; the drum, in hundredths; and the vernier, V, in thousandths.”
http://www.globalsecurity.org/military/library/policy/army/fm/5-430-00-1/CH3.htm
Polar Planimeter in Use. GlobalSecurity.org. Web. 22 July 2012
Fundamental Theorems of Vector Calculus
Slide 598
Principle of a Polar Planimeter
To understand the principle of the polar planimeter, consider the sketch:

[Figure: both planimeter arms of length r, one from the origin to the pivot
(p, q), one from the pivot to the tracing point (x, y); the field vector
v(x, y) is attached at (x, y).]

Assume for simplicity that both arms of the planimeter have equal length r.
When tracing the boundary curve of the pictured shape, the position of the
pivot (p, q) changes as a function of (x, y) (the point (p, q) is unique if we
require that the angle between the two arms is less than π).

The planimeter vector field v is given by

    v(x, y) = 1/r ( −(y − q(x, y)) , x − p(x, y) )ᵀ

where r² = (y − q(x, y))² + (x − p(x, y))². The disk, roller and vernier of
the planimeter record the integral of v(x, y) along the boundary.
Fundamental Theorems of Vector Calculus
Slide 599
Principle of a Polar Planimeter
The point (p(x, y), q(x, y)) is given by the intersection of two circles of
radius r about the origin and about (x, y). From the two equations

    (p − x)² + (q − y)² = r²,    p² + q² = r²,

we obtain

    x² + y² − 2px − 2qy = 0.

The implicit equations for p and q,

    x² + y² − 2px − 2√(r² − p²) y = 0,
    x² + y² − 2qy − 2√(r² − q²) x = 0,

yield (see assignments)

    ∂p/∂x + ∂q/∂y = 1.
Fundamental Theorems of Vector Calculus
Slide 600
Principle of a Polar Planimeter
We then have

    planimeter reading = ∫_{∂R*} v · dℓ⃗
                       = ∫_R ( ∂v₂/∂x − ∂v₁/∂y ) dx dy
                       = 1/r ∫_R 1 dx dy = |R|/r.

Hence, with r known, the area of the enclosed domain R has been found.
Fundamental Theorems of Vector Calculus
Slide 601
Finding Areas by Green’s Theorem
20.4. Example. Let E = {(x, y) ∈ R² : (x/a)² + (y/b)² ≤ 1} be the elliptical
region centered at the origin with half-axes of length a > 0 and b > 0. We
can find the area of the ellipse from

    |E| = 1/2 ∫_{∂E} ( −y , x )ᵀ · dℓ⃗.

First, ∂E is parametrized by

    γ(t) = ( a cos t , b sin t )ᵀ,    t ∈ [0, 2π].

We obtain

    |E| = 1/2 ∫₀^{2π} ⟨ ( −b sin t , a cos t )ᵀ , ( −a sin t , b cos t )ᵀ ⟩ dt
        = 1/2 ∫₀^{2π} ab (cos² t + sin² t) dt = πab.
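A numerical version of this example (the half-axes a, b are arbitrary choices; scipy assumed available):

    import numpy as np
    from scipy.integrate import quad

    a, b = 3.0, 1.5

    def integrand(t):
        # (1/2) * < (-y, x), gamma'(t) > along gamma(t) = (a cos t, b sin t)
        x, y = a*np.cos(t), b*np.sin(t)
        dx, dy = -a*np.sin(t), b*np.cos(t)
        return 0.5 * (-y*dx + x*dy)

    area = quad(integrand, 0, 2*np.pi)[0]
    print(area, np.pi*a*b)   # both 14.137...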
Fundamental Theorems of Vector Calculus
Slide 602
Physical Interpretation of Green’s Theorem
Green’s theorem can be interpreted both in terms of the rotation and the
divergence of a vector field.
Let Ω ⊂ R² be open, F : Ω → R² and R ⊂ Ω a simple region with
boundary ∂R. Then the circulation (see (19.1) and the following
discussion) around R is given by

    ∫_{∂R*} F · dℓ⃗

where ∂R* is oriented positively. Furthermore, the rotation of F is given by

    rot F = ∂F₂/∂x₁ − ∂F₁/∂x₂

(see (19.6)).
Fundamental Theorems of Vector Calculus
Slide 603
Physical Interpretation of Green’s Theorem
Green’s Theorem then states that

    circulation along ∂R = ∫_{∂R*} F · dℓ⃗
                         = ∫_R ( ∂F₂/∂x₁ − ∂F₁/∂x₂ ) dx
                         = ∫_R rot F dx
                         = integral of circulation density over R.
Fundamental Theorems of Vector Calculus
Slide 604
Physical Interpretation of Green’s Theorem
In a similar manner, we can show that Green’s Theorem relates the total
flux through ∂R with the divergence of F. Define

    F̃(x) := ( −F₂(x) , F₁(x) )ᵀ.

Then ⟨F, N⟩ = ⟨F̃, T⟩ for a tangent vector T of ∂R (positively oriented) and
the outward-pointing normal vector N. This yields

    flux through ∂R = ∫_{∂R*} ⟨F, N⟩ dℓ = ∫_{∂R*} F̃ · dℓ⃗
                    = ∫_R ( ∂F̃₂/∂x₁ − ∂F̃₁/∂x₂ ) dx
                    = ∫_R div F dx
                    = integral of flux density over R.
Fundamental Theorems of Vector Calculus
Slide 605
Ordinate Regions in Rn
We will now develop generalizations of Green’s Theorem 20.1 to higher
dimensions. In R2 , we have seen that Green’s Theorem provides
macroscopic equations for both the divergence and the rotation,
summarized as
    flux through ∂R = integral of flux density over R,
    circulation along ∂R = integral of circulation density over R,
for a suitable region R ⊂ R2 . It turns out that in R3 these physical
statements lead to two separate theorems.
Fundamental Theorems of Vector Calculus
Slide 606
Admissible Regions
We have proven Green’s theorem for simple regions in R2 , but we have not
precisely specified for which regions it is valid. Let us now do this:
20.5. Definition.
(i) A subset R ⊂ Rn is called a region if it is open and (pathwise)
connected.
(ii) A region R ⊂ Rn is said to be admissible if it is bounded and its
boundary is the union of a finite number of parametrized
hypersurfaces whose normal vectors point outwards from R.
20.6. Theorem. Green’s Theorem is valid for any admissible region in R2 .
Fundamental Theorems of Vector Calculus
Slide 607
Admissible Regions
Admissible regions may have edges and corners:
Fundamental Theorems of Vector Calculus
Slide 608
Admissible Regions
The boundary may not behave “too wildly”. This region is not admissible:
Fundamental Theorems of Vector Calculus
Slide 609
Admissible Regions
Removing an interior point means the boundary is not everywhere a
hypersurface. This region is not admissible:
Fundamental Theorems of Vector Calculus
Slide 610
Admissible Regions
Removing part of the center line means that it is impossible to find
outward-pointing normal vectors. This region is not admissible:
Fundamental Theorems of Vector Calculus
Slide 611
Admissible Hypersurfaces in R3
20.7. Definition. A hypersurface S ⊂ R³ with parametrization φ : R → S is
said to be admissible if
(i) the interior int R is an admissible region in R² with an oriented
boundary curve ∂R* and
(ii) R is closed, i.e., R̄ = R.
In particular, the boundary of the region R consists of a finite number
of parametrized hypersurfaces in R², i.e., smooth curves. Let us write

    ∂R = C₁ ∪ C₂ ∪ ··· ∪ C_k.

This boundary of R is of course mapped by φ into a set of points of S. We
would like to formulate a criterion for determining whether φ(∂R) (or part
of this set) constitutes an “actual boundary” of the surface S or not.
Since ∂R* is oriented, we define the chain of curves C₁* ∪ C₂* ∪ ··· ∪ C_k*
where the individual curves are traversed in the “correct” orientation
determined by ∂R*.
Fundamental Theorems of Vector Calculus
Slide 612
Closed Hypersurfaces in R3 and those with Boundary
20.8. Definition. Let S ⊂ R³ be an admissible hypersurface with
parametrization φ : R → S. Let ∂R* = C₁* ∪ C₂* ∪ ··· ∪ C_k*, where each Cᵢ* is
an oriented smooth curve in R² and all Cᵢ are pairwise disjoint.
(i) We say that φ annihilates a chain of curves C_{i₁} ∪ ··· ∪ C_{i_j} if

    ∫_{φ(C_{i₁} ∪ ··· ∪ C_{i_j})} 1 dℓ = 0.

(ii) If φ annihilates ∂R, S is said to be a closed surface.
(iii) Denote by C′ ⊂ ∂R the largest chain of curves that is annihilated by
φ. If C′ ≠ ∂R we say that S is a surface with boundary and define

    ∂S := φ(∂R \ C′).    (20.6)
Fundamental Theorems of Vector Calculus
Slide 613
Admissible Hypersurfaces in R3
20.9. Example. The unit sphere S² ⊂ R³ is parametrized by

    φ : [0, 2π] × [0, π] → S²,    φ(ϕ, θ) = ( cos ϕ sin θ , sin ϕ sin θ , cos θ )ᵀ.

The interior of [0, 2π] × [0, π] is clearly an admissible region in R²: the
rectangle is closed and its boundary consists of four line segments, which
are hypersurfaces in R² whose normal vectors can be taken to point
outward. It is easily seen that φ annihilates the boundary of the rectangle,
so S² is a closed, admissible surface.
Fundamental Theorems of Vector Calculus
Slide 614
Admissible Hypersurfaces in R3
20.10. Example. The map

    φ : [0, 2π] × [0, π/2] → S,    φ(ϕ, θ) = ( cos ϕ sin θ , sin ϕ sin θ , cos θ )ᵀ

parametrizes a unit hemisphere S. The boundary is not entirely annihilated,
so the hemisphere is a surface with boundary, given by

    ∂S = {x ∈ R³ : x₃ = 0, x₁² + x₂² = 1}.
Fundamental Theorems of Vector Calculus
Slide 615
Stokes’s Theorem in R3
There is a theorem that states that an oriented hypersurface in R3 is
closed if and only if it divides R3 into two disjoint, connected components.
Hence, an oriented hypersurface is closed if and only if it is the boundary
of a region in R3 .
After these preparations we can finally formulate one generalization of
Green’s theorem to R3 :
20.11. Stokes’s Theorem. Let Ω ⊂ R³ be an open set, S ⊂ Ω a
parametrized, admissible surface in R³ with boundary ∂S and let
F : Ω → R³ be a continuously differentiable vector field. Then

    ∫_{∂S*} F · dℓ⃗ = ∫_{S*} rot F · dA⃗

where the orientations of the boundary curve ∂S* and the surface S* are
chosen so that the normal vector to S* points in the direction of the
thumb of the right hand if the four fingers point in the direction of the
tangent vector to ∂S*.
Fundamental Theorems of Vector Calculus
Slide 616
Orientation for Stokes’s Theorem in R3
Orientation for Stokes’s Theorem. James Stewart, Calculus, 4th Ed., Brooks Cole
Right-hand Grip Rule. Wikimedia Commons. Wikimedia
Foundation. Web. 28 July 2012
Fundamental Theorems of Vector Calculus
Slide 617
Physical Interpretation of Stokes’s Theorem
The physical interpretation of Stokes’s theorem is the same as for Green’s
theorem: in the integral of infinitesimal circulations (the rotation) across a
surface, the individual circulations cancel everywhere except at the
boundary, so the total integral is just the circulation along the boundary:
Illustration of Stokes’s Theorem. Wikimedia Commons. Wikimedia Foundation. Web. 28 July
2012
Fundamental Theorems of Vector Calculus
Slide 618
Physical Interpretation of Stokes’s Theorem
Furthermore, it does not matter how the surface is deformed if the
boundary remains the same; the “infinitesimal circulations” continue to
cancel each other. The integral of the circulation across the hemisphere
below left is not affected by deformations (middle) and even equal to the
circulation integrated over the disk with the same boundary (right).
Fundamental Theorems of Vector Calculus
Slide 619
Proof of Stokes’s Theorem
Proof of Stokes’s Theorem 20.11.
Stokes’s Theorem in R³ can be reduced to Green’s Theorem 20.1. Let
φ : R → S be the parametrization of the surface. Then

    ∫_{S*} rot F · dA⃗ = ∫_{S*} ⟨rot F, N⟩ dA
                      = ∫_R ⟨ rot F(φ(x₁, x₂)) , t₁ × t₂|_{φ(x₁,x₂)} ⟩ dx₁ dx₂.

By (19.7) and (19.3) we have

    ⟨ rot F(φ(x)) , t₁ × t₂|_{φ(x)} ⟩ = rot F|_{φ(x)}( t₁(φ(x)) , t₂(φ(x)) )
        = rot F|_{φ(x)}( ∂φ/∂x₁ , ∂φ/∂x₂ )
        = ⟨ DF|_{φ(x)} ∂φ/∂x₁ , ∂φ/∂x₂ ⟩ − ⟨ ∂φ/∂x₁ , DF|_{φ(x)} ∂φ/∂x₂ ⟩
Fundamental Theorems of Vector Calculus
Slide 620
Proof of Stokes’s Theorem
Proof of Stokes’s Theorem 20.11 (continued).
By the chain rule,

    ∂/∂x₁ ⟨ F|_{φ(x)} , ∂φ/∂x₂ ⟩ = ⟨ DF|_{φ(x)} ∂φ/∂x₁ , ∂φ/∂x₂ ⟩ + ⟨ F|_{φ(x)} , ∂²φ/∂x₁∂x₂ ⟩.

Using that ∂²φ/∂x₁∂x₂ = ∂²φ/∂x₂∂x₁ (as we will prove later), we obtain

    ⟨ rot F(φ(x)) , t₁ × t₂|_{φ(x)} ⟩ = ∂/∂x₁ ⟨ F|_{φ(x)} , ∂φ/∂x₂ ⟩ − ∂/∂x₂ ⟨ F|_{φ(x)} , ∂φ/∂x₁ ⟩
                                      = rot F̃(x),

where

    F̃ : R → R²,    F̃(x) = ( ⟨ F|_{φ(x)} , ∂φ/∂x₁ ⟩ , ⟨ F|_{φ(x)} , ∂φ/∂x₂ ⟩ )ᵀ.
Fundamental Theorems of Vector Calculus
Slide 621
Proof of Stokes’s Theorem
Proof of Stokes’s Theorem 20.11 (continued).
We can now apply Green’s theorem in the admissible region R. Then

    ∫_{S*} rot F · dA⃗ = ∫_R ⟨ rot F(φ(x₁, x₂)) , t₁ × t₂|_{φ(x₁,x₂)} ⟩ dx₁ dx₂
                      = ∫_R rot F̃(x) dx₁ dx₂
                      = ∫_{∂R} F̃ · dℓ⃗.

Let us suppose (for simplicity) that ∂R is given by a single parametrization
γ : I → ∂R. Then

    ∫_{S*} rot F · dA⃗ = ∫_{∂R} F̃ · dℓ⃗ = ∫_I ⟨ F̃(γ(t)) , γ′(t) ⟩ dt.
Fundamental Theorems of Vector Calculus
Slide 622
Proof of Stokes’s Theorem
Proof of Stokes’s Theorem 20.11 (continued).
Inserting the definition of F̃,

    ∫_{S*} rot F · dA⃗ = ∫_I ( ⟨ F|_{φ(x)} , ∂φ/∂x₁ ⟩|_{x=γ(t)} γ₁′(t)
                              + ⟨ F|_{φ(x)} , ∂φ/∂x₂ ⟩|_{x=γ(t)} γ₂′(t) ) dt
                      = ∫_I ⟨ F|_{φ(γ(t))} , d/dt φ(γ(t)) ⟩ dt
                      = ∫_{∂S*} F · dℓ⃗,

where we have used the chain rule and that φ(γ(t)) parametrizes ∂S. We
have assumed a single parametrization for the boundary; in general, this
calculation can be applied to each boundary segment.
Fundamental Theorems of Vector Calculus
Slide 623
Stokes’s Theorem in Rn
20.12. Remark. We have formulated and proved Stokes’s theorem in R3 ,
since we have a good idea of the structure of the rotation (as a
three-dimensional vector) in R3 .
To generalize Stokes’s theorem to n dimensions would require working with
the rotation as a bilinear form and would entail a fair amount of abstract
algebra and geometry, including a closer study of differential forms. This is
beyond the scope of our course; if you are interested in this, search for
books on vector analysis.
For example, Michael Spivak’s book Calculus on Manifolds is a good
place to start.
Fundamental Theorems of Vector Calculus
Slide 624
Gauß’s Theorem
The other aspect of Green’s theorem is based on the physical idea of flux.
This has a straightforward generalization to n dimensions:
20.13. Gauß’s Theorem. Let R ⊂ Rⁿ be an admissible region and
F : R → Rⁿ a continuously differentiable vector field. Then

    ∫_R div F dx = ∫_{∂R*} F · dA⃗.
The integrals make sense, as the boundary of an admissible region is a
union of hypersurfaces. Recall that the surfaces are oriented in such a way
that the normal vector points outward.
We will prove Gauß’s theorem only for the case of simple regions, whose
definition we now recall.
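Before turning to the proof, here is a numerical sketch of the theorem on the unit cube [0, 1]³ (the field is an arbitrary choice; scipy assumed available):

    import numpy as np
    from scipy.integrate import tplquad, dblquad

    F = lambda x, y, z: np.array([x*y, y*z, z*x])

    # Volume integral of div F = y + z + x over the cube.
    vol = tplquad(lambda z, y, x: x + y + z,
                  0, 1, lambda x: 0, lambda x: 1,
                  lambda x, y: 0, lambda x, y: 1)[0]

    # Surface integral: opposite faces of the cube normal to each axis k,
    # with outward normals -e_k (at x_k = 0) and +e_k (at x_k = 1).
    flux = 0.0
    for k in range(3):
        lo = dblquad(lambda v, u: -F(*np.insert(np.array([u, v]), k, 0.0))[k],
                     0, 1, lambda u: 0, lambda u: 1)[0]
        hi = dblquad(lambda v, u:  F(*np.insert(np.array([u, v]), k, 1.0))[k],
                     0, 1, lambda u: 0, lambda u: 1)[0]
        flux += lo + hi

    print(vol, flux)   # both 1.5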
Fundamental Theorems of Vector Calculus
Slide 625
Ordinate Regions in Rn
For x ∈ Rⁿ we define

    x̂⁽ᵏ⁾ := (x₁, …, x_{k−1}, x_{k+1}, …, xₙ) ∈ Rⁿ⁻¹

as the vector x with the kth component omitted.

20.14. Definition. A subset R ⊂ Rⁿ is said to be an ordinate region (with
respect to x_k) if there exists a measurable set Ω ⊂ Rⁿ⁻¹ and continuous,
almost everywhere differentiable functions φ₁, φ₂ : Ω → R, such that

    R = { x ∈ Rⁿ : x̂⁽ᵏ⁾ ∈ Ω, φ₁(x̂⁽ᵏ⁾) ≤ x_k ≤ φ₂(x̂⁽ᵏ⁾) }.

If R is an ordinate region with respect to each x_k, k = 1, …, n, it is said
to be a simple region.
20.15. Theorem. A simple region is admissible.
Fundamental Theorems of Vector Calculus
Slide 626
Proof of Gauß’s Theorem for Simple Regions
We will prove Gauß’s theorem only for simple regions.
Proof of Gauß’s Theorem 20.13.
Suppose that F = (F₁, …, Fₙ). Then we have to prove

    ∫_R div F dx = ∫_{∂R*} F · dA⃗,

where ∂R* is oriented by the normal vector pointing outwards. Since

    ∫_R div F dx = Σ_{k=1}^{n} ∫_R ∂F_k/∂x_k dx,
    ∫_{∂R*} ⟨F, N⟩ dA = Σ_{k=1}^{n} ∫_{∂R} F_k ⟨e_k, N⟩ dA,

it is sufficient to show that

    ∫_R ∂F_k/∂x_k dx = ∫_{∂R*} F_k ⟨e_k, N⟩ dA    (20.7)

for k = 1, …, n.
Fundamental Theorems of Vector Calculus
Slide 627
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
We fix some k and use that R is a simple region, so we can write

    R = { x ∈ Rⁿ : x̂⁽ᵏ⁾ ∈ Ω, φ₁(x̂⁽ᵏ⁾) ≤ x_k ≤ φ₂(x̂⁽ᵏ⁾) }

for some Ω ⊂ Rⁿ⁻¹. The boundary of R is given by

    ∂R* = { x ∈ Rⁿ : x̂⁽ᵏ⁾ ∈ Ω, x_k = φ₁(x̂⁽ᵏ⁾) }    (=: S₁)
        ∪ { x ∈ Rⁿ : x̂⁽ᵏ⁾ ∈ Ω, x_k = φ₂(x̂⁽ᵏ⁾) }    (=: S₂)
        ∪ { x ∈ Rⁿ : x̂⁽ᵏ⁾ ∈ ∂Ω, φ₁(x̂⁽ᵏ⁾) ≤ x_k ≤ φ₂(x̂⁽ᵏ⁾) }    (=: S₃).
Fundamental Theorems of Vector Calculus
Slide 628
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
It is left as an exercise to show that the normal vector N to the “mantle”
S₃ is orthogonal to e_k (by writing down a parametrization and showing that
e_k is in fact a tangent vector). Then the surface integral in (20.7) becomes

    ∫_{∂R*} F_k ⟨e_k, N⟩ dA = ∫_{S₁*} F_k ⟨e_k, N₁⟩ dA + ∫_{S₂*} F_k ⟨e_k, N₂⟩ dA.    (20.8)

To evaluate the surface integrals, we need to find the unit normal vectors
N₁ and N₂. The surface S₁ is parametrized by

    Φ₁(x̂⁽ᵏ⁾) = ( x₁, …, x_{k−1}, φ₁(x̂⁽ᵏ⁾), x_{k+1}, …, xₙ )ᵀ.
Fundamental Theorems of Vector Calculus
Slide 629
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
Then the jth tangent vector is given by

    t_j = ∂Φ₁(x̂⁽ᵏ⁾)/∂x_j = ( 0, …, 0, 1, 0, …, 0, ∂φ₁/∂x_j, 0, …, 0 )ᵀ

(with the 1 in position j and ∂φ₁/∂x_j in position k) for j ∈ {1, …, n} \ {k}.
The normal vector will be orthogonal to all n − 1 tangent vectors, so

    N₁ = C · ( ∂φ₁/∂x₁, …, ∂φ₁/∂x_{k−1}, −1, ∂φ₁/∂x_{k+1}, …, ∂φ₁/∂xₙ )ᵀ

where C ∈ R is a normalization constant.
Fundamental Theorems of Vector Calculus
Slide 630
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
Since N₁ is to have unit length,

    1/C = ( 1 + Σ_{j≠k} (∂φ₁/∂x_j)² )^{1/2} = √(1 + |Dφ₁|²).

We note that, after re-arranging rows,

    |det(t₁, …, t_{n−1}, N₁)| = C · | (−1)^{n−k} det( 1_{n−1}  (Dφ₁)ᵀ ;  Dφ₁  −1 ) |
        = C · | det( 1_{n−1}  0 ;  Dφ₁  −1 − Σ_{j≠k} (∂φ₁/∂x_j)² ) |
        = 1/C.
Fundamental Theorems of Vector Calculus
Slide 631
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
We see that ⟨ek ; N1 ⟩ = −C and hence
Z
@ S1∗
Z
Fk ⟨ek ; N1 ⟩ dA = −C
=−
˙
Z
Fk (˘1 (x̂ (k) )) |det(t1 ; : : : ; tn−1 ; N1 )| d x̂ (k)
|
{z
=1=C
}
Fk (x1 ; : : : ; xk−1 ; ’1 (x̂ (k) ); xk+1 ; : : : ; xn ) d x̂ (k)
˙
In the same way we can show that
Z
@ S2∗
Fk ⟨ek ; N2 ⟩ dA =
Z
˙
Fk (x1 ; : : : ; xk−1 ; ’2 (x̂ (k) ); xk+1 ; : : : ; xn ) d x̂ (k)
Fundamental Theorems of Vector Calculus
Slide 632
Proof of Gauß’s Theorem for Simple Regions
Proof of Gauß’s Theorem 20.13 (continued).
Finally, we obtain

    ∫_{∂R*} F_k ⟨e_k, N⟩ dA = ∫_{S₂*} F_k ⟨e_k, N₂⟩ dA + ∫_{S₁*} F_k ⟨e_k, N₁⟩ dA
        = ∫_Ω F_k(x₁, …, x_{k−1}, φ₂(x̂⁽ᵏ⁾), x_{k+1}, …, xₙ) dx̂⁽ᵏ⁾
          − ∫_Ω F_k(x₁, …, x_{k−1}, φ₁(x̂⁽ᵏ⁾), x_{k+1}, …, xₙ) dx̂⁽ᵏ⁾
        = ∫_Ω ∫_{φ₁(x̂⁽ᵏ⁾)}^{φ₂(x̂⁽ᵏ⁾)} ∂F_k/∂x_k (x) dx_k dx̂⁽ᵏ⁾
        = ∫_R ∂F_k/∂x_k dx,

which shows (20.7).
Fundamental Theorems of Vector Calculus
Slide 633
Application to Electromagnetics
We have shown in Example 19.5 that the flux through the unit sphere S²
of the electric field

    E(p) = Q/(4πε₀) · p/∥p∥³

induced by a point charge at the origin is

    ∫_{S²} ⟨E, dA⃗⟩ = Q/ε₀.

This calculation can easily be modified so that it holds for any sphere
∂B_r(0) of radius r > 0. Now let R ⊂ R³ be an admissible region
containing the origin and choose r > 0 small enough that B̄_r(0) ⊂ R.
Then the flux through ∂R is given by

    ∫_{∂R} ⟨E, dA⃗⟩ = ∫_{∂B_r(0)} ⟨E, dA⃗⟩ + ∫_{∂(R \ B_r(0))} ⟨E, dA⃗⟩
                   = Q/ε₀ + ∫_{R \ B_r(0)} div E dx.
Fundamental Theorems of Vector Calculus
Slide 634
Application to Electromagnetics
Now

    ∂Eᵢ/∂xᵢ = Q/(4πε₀) ∂/∂xᵢ [ xᵢ / (x₁² + x₂² + x₃²)^{3/2} ]
            = Q/(4πε₀) · 1/(x₁² + x₂² + x₃²)³ · ( (x₁² + x₂² + x₃²)^{3/2} − 3(x₁² + x₂² + x₃²)^{1/2} xᵢ² ).

Since

    Σ_{i=1}^{3} ( (x₁² + x₂² + x₃²)^{3/2} − 3(x₁² + x₂² + x₃²)^{1/2} xᵢ² ) = 0,

we see that div E(x) = 0 for x ≠ 0. This implies that

    ∫_{∂R} ⟨E, dA⃗⟩ = Q/ε₀

for any admissible region R ⊂ R³ containing the origin.
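A symbolic sketch of the divergence computation above (the constant Q/(4πε₀) is suppressed; sympy assumed available):

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)
    r3 = (x1**2 + x2**2 + x3**2)**sp.Rational(3, 2)
    E = sp.Matrix([x1, x2, x3]) / r3                 # E ~ p / ||p||^3

    div_E = sp.diff(E[0], x1) + sp.diff(E[1], x2) + sp.diff(E[2], x3)
    print(sp.simplify(div_E))                        # 0, valid for x != 0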
Fundamental Theorems of Vector Calculus
Slide 635
Green’s Identities
20.16. Green’s Identities. Let R ⊂ Rⁿ be an admissible region and
u, v : R → R be twice continuously differentiable potential functions. Then

    ∫_R ⟨∇u, ∇v⟩ dx = −∫_R u · ∆v dx + ∫_{∂R*} u ∂v/∂n dA    (20.9)

and

    ∫_R ( u · ∆v − v · ∆u ) dx = ∫_{∂R*} ( u ∂v/∂n − v ∂u/∂n ) dA.    (20.10)
Here we have used the normal derivative (see Definition 17.9) in the
integrals on the boundary.
The relation (20.9) is commonly called Green’s first identity and (20.10)
is known as Green’s second identity.
Green’s identities can be regarded as “integration by parts for ∇ and ∆”.
Fundamental Theorems of Vector Calculus
Slide 636
Green’s Identities
Proof.
We make use of the relation

    div(u∇v) = u div(∇v) + ⟨∇u, ∇v⟩ = u ∆v + ⟨∇u, ∇v⟩.

Applying Gauß’s theorem,

    ∫_R ⟨∇u, ∇v⟩ dx = ∫_R ( div(u∇v) − u ∆v ) dx
                    = ∫_{∂R*} u∇v · dA⃗ − ∫_R u ∆v dx
                    = ∫_{∂R*} u ⟨∇v, N⟩ dA − ∫_R u ∆v dx
                    = ∫_{∂R*} u ∂v/∂n dA − ∫_R u ∆v dx.
Fundamental Theorems of Vector Calculus
Slide 637
Green’s Identities
Proof (continued).
This proves the first identity, (20.9). The second identity, (20.10), follows
by subtracting the two equations

    ∫_R ⟨∇u, ∇v⟩ dx = −∫_R u · ∆v dx + ∫_{∂R*} u ∂v/∂n dA,
    ∫_R ⟨∇v, ∇u⟩ dx = −∫_R v · ∆u dx + ∫_{∂R*} v ∂u/∂n dA

from each other.
u
The Second Derivative
Slide 638
21. The Second Derivative
The Second Derivative
Slide 639
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
The Second Derivative
Slide 640
The Second Derivative
In the next section, we wish to discuss maxima and minima of potential
functions on Rn . This requires us to analyze the concept of the second
derivative of a function a little more closely.
21.1. Definition. Let X, V be finite-dimensional normed vector spaces and
Ω ⊂ X an open set. A function f : Ω → V is said to be twice
differentiable at x ∈ Ω if
I f is differentiable in an open ball B_ε(x) around x and
I the function Df : B_ε(x) → L(X, V) is differentiable at x.
We say that f is twice differentiable on Ω if f is twice differentiable at
every x ∈ Ω.
The derivative of Df (if it exists) is a map

    D(Df) =: D²f : Ω → L(X, L(X, V)).    (21.1)

We call (21.1) the second derivative of f. If the map x ↦ D²f|ₓ is
continuous on Ω we say that f ∈ C²(Ω, V).
The Second Derivative
Slide 641
The Second Derivative for Potential Functions
21.2. Example. Consider a differentiable potential function f : Rⁿ → R.
Then the derivative is given by the Jacobian

    Df|ₓ = ( ∂f/∂x₁|ₓ  …  ∂f/∂xₙ|ₓ ).

The second derivative is the derivative of the map Df : Rⁿ → L(Rⁿ, R),

    Df : x ↦ Df|ₓ = ( ∂f/∂x₁|ₓ  …  ∂f/∂xₙ|ₓ ).

The map Df is of course in general non-linear. The derivative is found by
taking

    Df|_{x+h} = Df|ₓ + D²f|ₓ h + o(h)

as h → 0. Here Df|ₓ and Df|_{x+h} ∈ L(Rⁿ, R) are linear maps from Rⁿ to R, while

    D²f|ₓ ∈ L(Rⁿ, L(Rⁿ, R))

so that D²f|ₓ h ∈ L(Rⁿ, R).
The Second Derivative
Slide 642
The Second Derivative for Potential Functions
Now what shape does D²f|ₓ take? A “column vector” h ∈ Rⁿ is
transformed into a linear map in L(Rⁿ, R) ≃ Mat(1 × n; R), which we can
regard as the space of “row vectors”.
Recall that in this case Df|ₓ = (∇f(x))ᵀ and that the transposition is a
linear map. Then we can write

    Df : x ↦ (∇f(x))ᵀ = Df|ₓ.

Hence, Df = ( · )ᵀ ∘ ∇f and we can differentiate Df by the chain rule.
The derivative of the map

    ∇f : Rⁿ → Rⁿ,    ∇f(x) = ( ∂f/∂x₁|ₓ , … , ∂f/∂xₙ|ₓ )ᵀ

can be easily calculated: it is just the Jacobian of ∇f.
The Second Derivative
Slide 643
The Hessian
We hence have

    D(∇f)|ₓ = ( ∂²f/∂x₁∂x₁|ₓ  ···  ∂²f/∂xₙ∂x₁|ₓ ;
                  ⋮                     ⋮
                ∂²f/∂x₁∂xₙ|ₓ  ···  ∂²f/∂xₙ∂xₙ|ₓ ) ∈ Mat(n × n; R)    (21.2)

where

    ∂²f/∂xᵢ∂xⱼ := ∂/∂xᵢ ( ∂f/∂xⱼ )

is the second partial derivative of f with respect to xⱼ (first) and xᵢ
(second). The matrix in (21.2) is important enough to warrant a special
name: It is called the Hessian of f and denoted by Hess f(x).
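The Hessian is easy to compute symbolically (the sample potential function is an arbitrary choice; sympy assumed available); its symmetry anticipates Schwarz’s Theorem 21.5 below.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = x1**3 * x2 + sp.exp(x1*x2)

    H = sp.hessian(f, (x1, x2))       # the matrix (21.2)
    print(H)
    print(sp.simplify(H - H.T))       # zero matrix: the mixed partials coincide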
The Second Derivative
Slide 644
The Hessian as a Bilinear Map
Recall that the transposition is a linear map, so its derivative is again the
transposition (see Example 10.5). Hence,

    D²f|ₓ h = D(( · )ᵀ ∘ ∇f)(x) h = D( · )ᵀ|_{∇f(x)} ∘ D(∇f)|ₓ h
            = ( · )ᵀ ∘ D(∇f)|ₓ h = (Hess f(x) h)ᵀ.    (21.3)

As required, D²f|ₓ h is a “row vector”, i.e., an element of L(Rⁿ, R). We
see that if h̃ ∈ Rⁿ is some other vector, D²f|ₓ h acts on h̃ via

    (D²f|ₓ h) h̃ = (Hess f(x) h)ᵀ h̃ = ⟨Hess f(x) h, h̃⟩ ∈ R.

Note that the expression ⟨Hess f(x) h, h̃⟩ is linear in both h and h̃; hence
we can also regard the second derivative as a bilinear map

    D²f|ₓ : Rⁿ × Rⁿ → R,    (h, h̃) ↦ ⟨Hess f(x) h, h̃⟩.
The Second Derivative
Slide 645
The Second Derivative for General Functions
The preceding extended example already includes all relevant ideas for the
general case, which we now discuss.
Let X, V be normed vector spaces, Ω ⊂ X open and f : Ω → V a
differentiable function. Then the derivative of f is a map

    Df : Ω → L(X, V),    x ↦ Df|ₓ.    (21.4)

The derivative of Df (if it exists) is a map

    D(Df) =: D²f : Ω → L(X, L(X, V)).

We will investigate the space L(X, L(X, V)) a little more closely. Let
x₁, x₂ ∈ X and L ∈ L(X, L(X, V)). Then Lx₁ ∈ L(X, V) and
(Lx₁)(x₂) ∈ V.
The Second Derivative
Slide 646
The Second Derivative as a Bilinear Map
To L ∈ L(X, L(X, V)) we can associate a map L̃ : X × X → V given by

    L̃(x₁, x₂) := (Lx₁)(x₂).    (21.5)

Obviously, for x₁, x₂, x₂′ ∈ X and λ ∈ F,

    L̃(x₁, x₂ + x₂′) = (Lx₁)(x₂ + x₂′) = (Lx₁)(x₂) + (Lx₁)(x₂′) = L̃(x₁, x₂) + L̃(x₁, x₂′),
    L̃(x₁, λx₂) = (Lx₁)(λx₂) = λ(Lx₁)(x₂) = λL̃(x₁, x₂),

because Lx₁ ∈ L(X, V) is linear. Furthermore, since L ∈ L(X, L(X, V)),

    L̃(x₁ + x₁′, x₂) = (L(x₁ + x₁′))(x₂) = (Lx₁ + Lx₁′)(x₂) = (Lx₁)(x₂) + (Lx₁′)(x₂) = L̃(x₁, x₂) + L̃(x₁′, x₂),
    L̃(λx₁, x₂) = (λLx₁)(x₂) = λ(Lx₁)(x₂) = λL̃(x₁, x₂).

We thus see that L̃ is a bilinear map, i.e., linear in both components.
The Second Derivative
Slide 647
Multilinear Maps
21.3. Definition. Let X, V be finite-dimensional normed vector spaces. The
set of multilinear maps from X to V is denoted by

    L⁽ⁿ⁾(X, V) := { L : X × ··· × X → V : L linear in each component }.

In the special case V = R an element of L⁽ⁿ⁾(X, V) is called a multilinear
form.
From the previous discussion we see that there is a canonical isomorphism

    L(X, L(X, V)) ≅ L⁽²⁾(X, V)

given by (21.5).
From now on, we will make no difference between these two spaces, and in
fact drop the tilde in (21.5), treating L either as a bilinear map
X × X → V or as a linear map X → L(X, V).
The Second Derivative
Slide 648
Bilinear Forms on Rn
21.4. Example. Let X = Rⁿ and V = R. Then we have seen that

    L⁽²⁾(Rⁿ × Rⁿ, R) ≅ L(Rⁿ, L(Rⁿ, R)).

We know that L(Rⁿ, R) = (Rⁿ)* ≅ Rⁿ, so we have

    L⁽²⁾(Rⁿ × Rⁿ, R) ≅ L(Rⁿ, Rⁿ) ≅ Mat(n × n; R).    (21.6)

Thus the space of bilinear maps on Rⁿ is isomorphic to the set of square
n × n matrices. How does this work in practice? Every linear map
L ∈ (Rⁿ)* has the form

    L = ⟨z, · ⟩

for some z ∈ Rⁿ.
The Second Derivative
Slide 649
Bilinear Forms on Rn
We thus interpret an element A ∈ L(Rⁿ, L(Rⁿ, R)) as a linear map that
associates

    A : y ↦ L_y := ⟨z_y, · ⟩.    (21.7)

Equivalently, we associate to every y some z_y = A(y); this is realized
through a matrix which we also call A:

    A : y ↦ z_y.    (21.8)

Hence, for every y ∈ Rⁿ we obtain a linear map ⟨Ay, · ⟩ ∈ L(Rⁿ, R).
Letting this linear map act on some x ∈ Rⁿ we get

    L_y x = ⟨Ay, x⟩ = L(x, y).

Often, one prefers to write ⟨x, Ay⟩ instead of ⟨Ay, x⟩. We thus see that

    Mat(n × n; R) ≅ L⁽²⁾(Rⁿ × Rⁿ, R)

via A ↔ ⟨ · , A( · )⟩.
The Second Derivative
Slide 650
Schwarz’s Theorem
If f : Rⁿ → R is twice differentiable, we can represent D²f in the form of a
square n × n matrix; this is just the Hessian we have introduced in (21.2).
However, in general we cannot represent the second derivative of a
function Rⁿ → Rᵐ as a matrix; furthermore, even in the case of potential
functions (m = 1) higher order derivatives cannot be expressed as
matrices.
We now introduce a fundamental result governing the second derivative:

21.5. Schwarz’s Theorem. Let X, V be normed vector spaces and Ω ⊂ X
an open set. Let f ∈ C²(Ω, V). Then D²f|ₓ ∈ L⁽²⁾(X × X, V) is
symmetric for all x ∈ Ω, i.e.,

    D²f|ₓ(u, v) = D²f|ₓ(v, u)

for all u, v ∈ X.
The Second Derivative
Slide 651
Schwarz’s Theorem
Proof.
Fix x ∈ Ω and choose r > 0 sufficiently small that B_r(x) ⊂ Ω. Choose
u, v ∈ X such that ∥u∥, ∥v∥ < r/2. Set g(x) := f(x + v) − f(x). Then by
the Mean Value Theorem 11.6 we have

    f(x + v + u) − f(x + u) − f(x + v) + f(x) = g(x + u) − g(x)
        = ∫₀¹ Dg|_{x+tu} u dt
        = ∫₀¹ ( Df|_{x+tu+v} − Df|_{x+tu} ) u dt
        = ∫₀¹ ( ∫₀¹ D²f|_{x+tu+sv} v ds ) u dt
The Second Derivative
Slide 652
Schwarz’s Theorem
Proof (continued).
Now the continuity of D²f implies that

    D²f|_{x+sv+tu} − D²f|ₓ = o(1)    as ∥u∥ + ∥v∥ → 0

for any 0 ≤ s, t ≤ 1. By Lemma 9.16 (applied with y = (s, t) in the
compact set [0, 1] × [0, 1] ⊂ R²) the convergence is even uniform in s and
t, i.e.,

    sup_{0≤s,t≤1} ∥D²f|_{x+sv+tu} − D²f|ₓ∥ = o(1)    as ∥u∥ + ∥v∥ → 0,

where we use the operator norm. This implies that

    ∫₀¹ ( ∫₀¹ D²f|_{x+sv+tu} v ds ) u dt = ∫₀¹ ( ∫₀¹ D²f|ₓ v ds ) u dt + ∥u∥∥v∥ o(1)

as ∥u∥ + ∥v∥ → 0.
The Second Derivative
Slide 653
Schwarz’s Theorem
Proof (continued).
Again from the Mean Value Theorem 11.6 we have

    ∫₀¹ ( ∫₀¹ D²f|ₓ v ds ) u dt = ∫₀¹ ∫₀¹ ( D²f|ₓ v ) u ds dt.

If we regard D²f|ₓ as a bilinear map, we have, as ∥u∥ + ∥v∥ → 0,

    g(x + u) − g(x) = ∫₀¹ ∫₀¹ D²f|ₓ(v, u) ds dt + ∥u∥∥v∥ o(1)
                    = D²f|ₓ(v, u) + ∥u∥∥v∥ o(1)    (21.9)

since the integrand does not depend on s or t.
The Second Derivative
Slide 654
Schwarz’s Theorem
Proof (continued).
We may repeat this entire calculation, using g̃(x) := f(x + u) − f(x)
instead of g. We then obtain the same result, but with u and v
interchanged:

    g̃(x + v) − g̃(x) = D²f|ₓ(u, v) + ∥u∥∥v∥ o(1)    (21.10)

as ∥u∥ + ∥v∥ → 0. Now both (21.9) and (21.10) are equal to
f(x + v + u) − f(x + u) − f(x + v) + f(x), so taking the difference yields

    D²f|ₓ(v, u) − D²f|ₓ(u, v) = ∥u∥∥v∥ o(1)

as ∥u∥ + ∥v∥ → 0. Furthermore, we can now use a scaling argument to see
that the left-hand side is actually zero. For this, note that the left-hand
side may be regarded as a bilinear map L ∈ L(X × X, V).
The Second Derivative
Slide 655
Schwarz’s Theorem
Proof (continued).
We will show that if L ∈ L(X × X, V) and

    L(u, v) = ∥u∥∥v∥ o(1)    as ∥u∥ + ∥v∥ → 0,

then L = 0. Choose s, t ∈ R \ {0}. Then

    L(u, v) = 1/(st) L(su, tv).

So for |s| + |t| → 0 we have

    ∥L(u, v)∥ = 1/|st| ∥L(su, tv)∥ = o(1) · 1/|st| ∥su∥∥tv∥ = o(1) ∥u∥∥v∥

as |s| + |t| → 0. Since the left-hand side does not depend on s or t, we see
that L(u, v) = 0.
The Second Derivative
Slide 656
Symmetry of the Hessian
In the case of potential functions (X = Rⁿ, V = R), Theorem 21.5 implies
that

    ⟨Hess f(x) y, z⟩ = ⟨Hess f(x) z, y⟩,    y, z ∈ Rⁿ,

which means Hess f(x) = (Hess f(x))ᵀ, i.e., the Hessian of f at x is a
symmetric matrix.
Writing out the components of Hess f(x), this means that

    ∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ.

In other words, if f is twice continuously differentiable, the order of
differentiation in the second-order partial derivatives does not matter. (This
will be the case if all second-order partial derivatives are continuous. Why?)
This immediately proves Lemma 18.15.
Free Extrema
Slide 657
22. Free Extrema
Free Extrema
Slide 658
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Free Extrema
Slide 659
Extrema of Real Functions
In this section we will focus on extrema of functions. Recall that a twice
continuously differentiable real function f : R → R satisfies

    f(x + h) = f(x) + f′(x) h + (f″(x)/2) h² + o(h²)    as h → 0.    (22.1)

The stationary points of f are given by f′(x) = 0 and their nature is
determined by the sign of f″(x). We now aim to develop an analogous
theory for functions f : Rⁿ → R. Our first goal will be to extend the
formula

    f(x + h) = f(x) + Df|ₓ h + o(h)    as h → 0

into an expression analogous to (22.1).
Free Extrema
Slide 660
Quadratic Approximation of Potential Functions
22.1. Lemma. Let Ω ⊂ Rⁿ be an open set and f ∈ C²(Ω, R). Then for
any h ∈ Rⁿ small enough that x + th ∈ Ω for all t ∈ [0, 1],

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + ∫₀¹ (1 − t) ⟨Hess f(x + th) h, h⟩ dt.    (22.2)

Proof.
By the Mean Value Theorem 11.6,

    f(x + h) − f(x) = ∫₀¹ Df|_{x+th} h dt = ∫₀¹ 1 · Df|_{x+th} h dt.

We now want to integrate by parts, differentiating Df|_{x+th} h and
integrating 1 in the integrand. As a primitive for 1 we can take t + c for
any c ∈ R; we choose t − 1.
Free Extrema
Slide 661
Quadratic Approximation of Potential Functions
Proof (continued).
By the chain rule and (21.3),

    d/dt Df|_{x+th} = D²f|_{x+th} d/dt (x + th) = (Hess f(x + th) h)ᵀ.

Hence

    d/dt ( Df|_{x+th} h ) = ⟨Hess f(x + th) h, h⟩.

Then

    f(x + h) − f(x) = (t − 1) Df|_{x+th} h |₀¹ − ∫₀¹ (t − 1) ⟨Hess f(x + th) h, h⟩ dt
                    = Df|ₓ h + ∫₀¹ (1 − t) ⟨Hess f(x + th) h, h⟩ dt
                    = ⟨∇f(x), h⟩ + ∫₀¹ (1 − t) ⟨Hess f(x + th) h, h⟩ dt.
Free Extrema
Slide 662
Quadratic Approximation of Potential Functions
22.2. Corollary. Let Ω ⊂ Rⁿ be an open set and f ∈ C²(Ω, R). Then, as
h → 0,

    f(x + h) = f(x) + ⟨∇f(x), h⟩ + 1/2 ⟨Hess f(x) h, h⟩ + o(∥h∥²).    (22.3)

Proof.
In view of (22.2), we just need to show that

    ∫₀¹ (1 − t) ⟨Hess f(x + th) h, h⟩ dt = 1/2 ⟨Hess f(x) h, h⟩ + o(∥h∥²)

as h → 0.
Free Extrema
Slide 663
Quadratic Approximation of Potential Functions
Proof (continued).
We have

    ∫₀¹ (1 − t) ⟨Hess f(x + th) h, h⟩ dt
        = ∫₀¹ (1 − t) ⟨(Hess f(x + th) − Hess f(x)) h, h⟩ dt
          + ∫₀¹ (1 − t) ⟨Hess f(x) h, h⟩ dt
        = 1/2 ⟨Hess f(x) h, h⟩ + ∫₀¹ (1 − t) ⟨(Hess f(x + th) − Hess f(x)) h, h⟩ dt,

so it remains to show

    ∫₀¹ (1 − t) ⟨(Hess f(x + th) − Hess f(x)) h, h⟩ dt = o(∥h∥²).
Free Extrema
Slide 664
Quadratic Approximation of Potential Functions
Proof (continued).
We have

    | ∫₀¹ (1 − t) ⟨(Hess f(x + th) − Hess f(x)) h, h⟩ dt |
        ≤ 1 · sup_{t∈[0,1]} |1 − t| · sup_{t∈[0,1]} |⟨(Hess f(x + th) − Hess f(x)) h, h⟩|
        ≤ sup_{t∈[0,1]} ∥Hess f(x + th) − Hess f(x)∥ · ∥h∥²,

where we have used the operator norm for the Hessian. In order to show
the desired estimate, we need to establish

    lim_{h→0} sup_{t∈[0,1]} ∥Hess f(x + th) − Hess f(x)∥ = 0.

However, this follows from the continuity of the second derivative of f
together with Lemma 9.15.
Free Extrema
Slide 665
Quadratic Forms
22.3. Definition. Let A ∈ Mat(n × n; R). Then the quadratic form
induced by A is defined as the map

    Q_A := ⟨ · , A( · )⟩,    x ↦ ⟨x, Ax⟩ = Σ_{j,k=1}^{n} a_{jk} x_j x_k,    x ∈ Rⁿ.

Clearly, Q_A(λx) = λ² Q_A(x) for any λ ∈ R. Note also that Q_A is
continuous, because it is a polynomial in x₁, …, xₙ.
Free Extrema
Slide 666
Quadratic Forms
22.4. Definition. A quadratic form Q_A induced by a matrix
A ∈ Mat(n × n; R) is called
I positive definite if Q_A(x) > 0 for all x ≠ 0,
I negative definite if Q_A(x) < 0 for all x ≠ 0,
I indefinite if Q_A(x₀) > 0 for some x₀ ∈ Rⁿ and Q_A(y₀) < 0 for some
y₀ ∈ Rⁿ.
A matrix A is said to be negative definite / positive definite / indefinite if
the induced quadratic form Q_A has the corresponding property.

22.5. Remarks.
I It is easy to see that not all quadratic forms fall into one of the above
three categories.
I If A is positive definite, then −A is negative definite.
Free Extrema
Slide 667
Quadratic Forms
22.6. Example. The matrix

    A = ( 1  −2 ;  1  1 )

is positive definite, since

    Q_A(x) = ⟨ ( x₁ , x₂ )ᵀ , ( 1  −2 ;  1  1 ) ( x₁ , x₂ )ᵀ ⟩
           = x₁(x₁ − 2x₂) + x₂(x₁ + x₂)
           = x₁² + x₂² − x₁x₂ = 1/2 ( x₁² + x₂² + (x₁ − x₂)² ).

This expression is strictly positive when x ≠ 0.
Free Extrema
Slide 668
Criteria for Definiteness
We will be mainly interested in the case n = 2, and here we can give an
explicit criterion:
22.7. Lemma. Let A ∈ Mat(2 × 2; R) be symmetric, i.e.,

    A = ( a  b ;  b  c ).

Let ∆ = det A. Then
(i) A positive definite ⇔ a > 0 and ∆ > 0,
(ii) A negative definite ⇔ a < 0 and ∆ > 0,
(iii) A indefinite ⇔ ∆ < 0.
The proof will be part of the assignments.
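The criterion is easy to test against the eigenvalue characterization of definiteness (a sketch, assuming numpy is available; the sample matrices are arbitrary choices):

    import numpy as np

    def classify(a, b, c):
        # Lemma 22.7 for the symmetric matrix A = [[a, b], [b, c]].
        det = a*c - b*b
        if a > 0 and det > 0:
            verdict = "positive definite"
        elif a < 0 and det > 0:
            verdict = "negative definite"
        elif det < 0:
            verdict = "indefinite"
        else:
            verdict = "borderline (det = 0): no classification"
        # Cross-check: Q_A(x) > 0 for all x != 0 iff all eigenvalues are > 0.
        eig = np.linalg.eigvalsh(np.array([[a, b], [b, c]], dtype=float))
        return verdict, eig

    for a, b, c in [(2, 1, 3), (-2, 1, -3), (1, 2, 1)]:
        print((a, b, c), *classify(a, b, c))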
Free Extrema
Slide 669
Criteria for Positive Definiteness
For our analysis of the extrema of a function, the following criteria are
essential:
22.8. Lemma. The matrix A ∈ Mat(n × n; R) is

    positive definite ⇔ ∃ α > 0 ∀ x ∈ Rⁿ : Q_A(x) ≥ α∥x∥²,
    negative definite ⇔ ∃ α > 0 ∀ x ∈ Rⁿ : Q_A(x) ≤ −α∥x∥².

Proof.
If there exists some α > 0 such that Q_A(x) ≥ α∥x∥² for all x ∈ Rⁿ, it is
clear that Q_A(x) > 0 if x ≠ 0, so A is positive definite.
Now let A be positive definite. In particular, Q_A(x) > 0 for
x ∈ Sⁿ⁻¹ = {x ∈ Rⁿ : ∥x∥ = 1}. Since Sⁿ⁻¹ is closed and bounded, it is
compact by Theorem 9.9.
Free Extrema
Slide 670
Criteria for Definiteness
Proof (continued).
By Theorem 9.12, the minimum

    α := min_{x ∈ Sⁿ⁻¹} Q_A(x)

exists and is strictly positive. Thus, for x ≠ 0,

    1/∥x∥² Q_A(x) = Q_A( x/∥x∥ ) ≥ α

and so Q_A(x) ≥ α∥x∥². This is also trivially true for x = 0, so we have
proven the statement for positive definite matrices.
The matrix A will be negative definite if and only if −A is positive definite.
Since Q_{−A}(x) = −Q_A(x), the statement for negative definite matrices
follows.
Free Extrema
Slide 671
Extrema of Real Functions
We state the obvious definitions:
22.9. Definition. Let Ω ⊂ Rⁿ and f : Ω → R.
(i) f is said to have a (global) maximum at ξ ∈ Ω if

    x ∈ Ω  ⇒  f(x) ≤ f(ξ).

(ii) f is said to have a strict (global) maximum at ξ ∈ Ω if

    x ∈ Ω \ {ξ}  ⇒  f(x) < f(ξ).

The function f is said to have a (strict) global minimum at ξ if −f has a
(strict) global maximum at ξ.
Free Extrema
Slide 672
Extrema of Real Functions
22.10. Definition. Let Ω ⊂ Rⁿ and f : Ω → R.
(i) f is said to have a local maximum at ξ ∈ Ω if there exists a δ > 0
such that

    x ∈ Ω ∩ B_δ(ξ)  ⇒  f(x) ≤ f(ξ).

(ii) f is said to have a strict local maximum at ξ ∈ Ω if there exists a
δ > 0 such that

    x ∈ Ω ∩ B_δ(ξ) \ {ξ}  ⇒  f(x) < f(ξ).

The function f is said to have a (strict) local minimum at ξ if −f has a
(strict) local maximum at ξ.
As usual, we will be able to deal with extrema at interior points of Ω using
methods based on differentiation, while the boundary points of Ω must be
considered separately.
Free Extrema
Slide 673
Extrema of Real Functions
22.11. Theorem. Let Ω ⊂ Rⁿ, f : Ω → R and ξ ∈ int Ω. Assume that all
partial derivatives of f exist at ξ and that f has a local extremum
(maximum or minimum) at ξ. Then

    ∇f(ξ) = 0.

If f is differentiable at ξ, this implies Df|_ξ = 0.

Proof.
Let ξ = (ξ₁, …, ξₙ). Define

    g₁(x₁) := f(x₁, ξ₂, …, ξₙ).

Then g₁ has a local extremum at x₁ = ξ₁ and

    0 = dg₁/dx₁ |_{x₁=ξ₁} = ∂f(x₁, ξ₂, …, ξₙ)/∂x₁ |_{x₁=ξ₁} = ∂f/∂x₁ |_{x=ξ}.

In the same way we see that all partial derivatives of f vanish at ξ.
Free Extrema
Slide 674
Extrema of Real Functions
22.12. Theorem. Let Ω ⊂ Rⁿ be open, f ∈ C²(Ω) and ξ ∈ Ω. Let
∇f(ξ) = 0 (i.e., Df|_ξ = 0).
(i) If Hess f|_ξ is positive definite, f has a strict local minimum at ξ.
(ii) If Hess f|_ξ is negative definite, f has a strict local maximum at ξ.
(iii) If Hess f|_ξ is indefinite, f has no extremum at ξ.

Proof.
Denote by Q the quadratic form induced by Hess f|_ξ. Since Df|_ξ = 0, we
see from (22.3) that

    ( f(ξ + h) − f(ξ) ) / ∥h∥² = 1/2 Q( h/∥h∥ ) + ρ(h)    with lim_{h→0} ρ(h) = 0.    (22.4)

Now let Hess f|_ξ be positive definite. By Lemma 22.8 we can find α > 0
such that Q(h/∥h∥) ≥ α for all h ≠ 0.
Free Extrema
Slide 675
Extrema of Real Functions
Proof (continued).
For this α, we can find a δ > 0 such that
1. |ρ(h)| < α/2 whenever ∥h∥ < δ and
2. B_δ(ξ) ⊂ Ω.
Now every x ∈ B_δ(ξ) \ {ξ} can be expressed as x = ξ + h with
h := x − ξ ≠ 0, so we have

    f(x) − f(ξ) = ∥h∥² ( f(ξ + h) − f(ξ) ) / ∥h∥²
                = ∥h∥² ( 1/2 Q( h/∥h∥ ) + ρ(h) )
                ≥ ∥h∥² ( α/2 − |ρ(h)| ) > 0

for all x ∈ B_δ(ξ) \ {ξ}. Thus f(ξ) is a strict local minimum.
In the same way one sees that f(ξ) is a strict local maximum if Hess f|_ξ is
negative definite.
Free Extrema
Slide 676
Extrema of Real Functions
Proof (continued).
Now assume that Hess f|_ξ is indefinite. Then there exist h₀, k₀ ∈ Rⁿ such
that

    Q(h₀) > 0    and    Q(k₀) < 0.

For all λ ≠ 0 we then have

    Q( λh₀/∥λh₀∥ ) = (λ²/|λ|²) Q( h₀/∥h₀∥ ) = 1/∥h₀∥² Q(h₀) =: ε₁ > 0

and similarly

    Q( λk₀/∥λk₀∥ ) = 1/∥k₀∥² Q(k₀) =: −ε₂ < 0.

Then we see from (22.4) that for sufficiently small λ ≠ 0 we have
f(ξ + λh₀) > f(ξ) and f(ξ + λk₀) < f(ξ), so f does not have a local
extremum at ξ.
Free Extrema
Slide 677
Extrema of Real Functions
Applying Lemma 22.7, we have the following result:
22.13. Corollary. Let Ω ⊂ R² be open, f ∈ C²(Ω) and ξ ∈ Ω with
∇f(ξ) = 0. Set

    ∆ := det Hess f|_ξ = ( ∂²f/∂x₁²|_ξ )( ∂²f/∂x₂²|_ξ ) − ( ∂²f/∂x₁∂x₂|_ξ )².

Then f(ξ) is
I a local minimum if ∂²f/∂x₁²|_ξ > 0 and ∆ > 0,
I a local maximum if ∂²f/∂x₁²|_ξ < 0 and ∆ > 0,
I no extremum if ∆ < 0.
Note that if ∆ = 0, Corollary 22.13 yields no information.
Free Extrema
Slide 678
Finding Extrema
In searching for extrema of functions f ∈ C²(Ω, R), we follow a four-step
process:
1. Check for critical points ξ ∈ int Ω, i.e., those where Df|_ξ = 0.
2. Use Theorem 22.12 or Corollary 22.13 to check which of the critical
points is an extremum.
3. Check the boundary ∂Ω separately for local extrema.
4. Identify the global extrema. Any finite global extremum must also be
a local extremum, so it will be included among those found above.

22.14. Example. Let f(x, y) = x³ + y³ − 3xy on R². Then ∇f = 0 gives

    ∂f/∂x = 3x² − 3y = 0,    ∂f/∂y = 3y² − 3x = 0,

or x² = y and y² = x. The only two real solutions to these equations are
(0, 0) and (1, 1).
Free Extrema
Slide 679
Finding Extrema
We have

    ∆(x, y) = f_{xx} f_{yy} − f_{xy}² = (6x)(6y) − (−3)² = 36xy − 9.

At (0, 0), ∆ < 0 so there is no extremum at this point. At (1, 1), ∆ > 0
and f_{xx} = 6 > 0, so this point corresponds to a local minimum. Since
Ω = R² is open, there is no boundary to check, so there are no other
extrema.
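The whole search can be automated for this example (a sketch, assuming sympy is available):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = x**3 + y**3 - 3*x*y

    # Step 1: critical points (real solutions of grad f = 0).
    crit = [s for s in sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
            if all(v.is_real for v in s.values())]

    # Step 2: classify each critical point via Corollary 22.13.
    H = sp.hessian(f, (x, y))
    for pt in crit:
        Hp = H.subs(pt)
        delta, fxx = Hp.det(), Hp[0, 0]
        kind = ("local minimum" if delta > 0 and fxx > 0 else
                "local maximum" if delta > 0 and fxx < 0 else
                "no extremum" if delta < 0 else "inconclusive")
        print(pt, kind)
    # {x: 0, y: 0} no extremum ; {x: 1, y: 1} local minimum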
Taylor Series
Slide 680
23. Taylor Series
Taylor Series
Slide 681
Vector Fields and Higher Order Derivatives
17. Potential Functions and the Gradient
18. Vector Fields and Integrals
19. Flux and Circulation
20. Fundamental Theorems of Vector Calculus
21. The Second Derivative
22. Free Extrema
23. Taylor Series
Taylor Series
Slide 682
Higher-Order Derivatives
In this section we will suppose (X, ∥·∥_X) and (V, ∥·∥_V) to be normed
vector spaces, Ω ⊂ X an open set and we will consider functions
f : Ω → V.
We may extend the strategy of Definition 21.1 to define derivatives of
higher than second order inductively by setting

    Dᵏf|ₓ = D(Dᵏ⁻¹f)|ₓ ∈ L⁽ᵏ⁾(X × ··· × X; V)    (k copies of X)

for k = 2, 3, 4, …. Here, we again identify

    L⁽ᵏ⁾(X × ··· × X; V) ≅ L(X, L⁽ᵏ⁻¹⁾(X × ··· × X; V)).

We denote by Cᵏ(Ω, V) the set of those functions whose kth derivative is
continuous and by C^∞(Ω, V) the intersection of all sets Cᵏ(Ω, V), k ∈ N.
Taylor Series
Slide 683
Multi-Index Notation
For maps f ∈ Cᵏ(Rⁿ, R) the following multi-index notation for partial
derivatives has been developed. This notation depends essentially on the
interchangeability of partial derivatives.
The tuple α = (α₁, …, αₙ) ∈ Nⁿ is called a multi-index of degree
|α| = α₁ + ··· + αₙ. We also define

    α! := α₁! α₂! ··· αₙ!.

For f ∈ Cᵏ(Rⁿ, R) we define

    ∂^α f := ∂^α f/∂x^α := ∂^{|α|} f / ( ∂x₁^{α₁} ∂x₂^{α₂} ··· ∂xₙ^{αₙ} ).

For x = (x₁, …, xₙ) ∈ Rⁿ we define

    x^α := x₁^{α₁} x₂^{α₂} ··· xₙ^{αₙ}.
Taylor Series
Slide 684
Multi-Index Notation
In particular, if we have a potential function f ∈ Cᵏ(Rⁿ, R) and we let the
kth derivative act on k copies of a vector u ∈ Rⁿ, then

    Dᵏf|ₓ(u, …, u) = Σ_{α∈Nⁿ, |α|=k} (k!/α!) ∂^α f(x) u^α.    (23.1)

(Prove this by induction!) This relation will be important in the
formulation of Taylor’s Theorem for potential functions. For simplicity, we
define the notation

    u⁽ᵏ⁾ := (u, …, u)    (k copies)    (23.2)

for u ∈ X.
Taylor Series
Slide 685
Taylor’s Theorem
23.1. Taylor’s Theorem. Suppose that f ∈ Cᵏ(Ω, V). Let x ∈ Ω and h ∈ X
such that the line γ(t) = x + th, 0 ≤ t ≤ 1, is wholly contained within Ω.
Denote by h⁽ᵏ⁾ a k-tuple as in (23.2). Then for all p ≤ k,

    f(x + h) = f(x) + 1/1! Df|ₓ h + ··· + 1/(p−1)! D^{p−1}f|ₓ h⁽ᵖ⁻¹⁾ + R_p(x, h)    (23.3)

with the remainder term

    R_p(x, h) = ∫₀¹ (1 − t)^{p−1}/(p−1)! · Dᵖf|_{x+th} h⁽ᵖ⁾ dt.
Taylor Series
Slide 686
Taylor’s Theorem
Proof.
We prove the theorem by induction on p. For p = 1 the theorem is just the
Mean Value Theorem 11.6, so there is nothing to prove. In order to prove
that the formula for p implies the corresponding formula for p + 1, we need
to show

    R_p(x, h) = 1/p! Dᵖf|ₓ h⁽ᵖ⁾ + R_{p+1}(x, h).

Integrating by parts, we have

    R_p(x, h) = ∫₀¹ (1 − t)^{p−1}/(p−1)! · Dᵖf|_{x+th} h⁽ᵖ⁾ dt
              = − (1 − t)^p/p! · Dᵖf|_{x+th} h⁽ᵖ⁾ |_{t=0}^{1}
                + ∫₀¹ (1 − t)^p/p! · d/dt Dᵖf|_{x+th} h⁽ᵖ⁾ dt
Taylor Series
Slide 687
Taylor’s Theorem
Proof (continued).
By the chain rule, we have

    d/dt Dᵖf|_{x+th} h⁽ᵖ⁾ = D( Dᵖf|_z h⁽ᵖ⁾ )|_{z=x+th} h = D^{p+1}f|_{x+th} h⁽ᵖ⁺¹⁾,

so

    R_p(x, h) = − (1 − t)^p/p! · Dᵖf|_{x+th} h⁽ᵖ⁾ |_{t=0}^{1}
                + ∫₀¹ (1 − t)^p/p! · d/dt Dᵖf|_{x+th} h⁽ᵖ⁾ dt
              = 1/p! Dᵖf|ₓ h⁽ᵖ⁾ + ∫₀¹ (1 − t)^p/p! · D^{p+1}f|_{x+th} h⁽ᵖ⁺¹⁾ dt
              = 1/p! Dᵖf|ₓ h⁽ᵖ⁾ + R_{p+1}(x, h).
Taylor Series
Slide 688
Taylor’s Theorem
We will call

    t_{f,x,p}(h) := f(x + h) − R_p(x, h)
                  = f(x) + 1/1! Df|ₓ h + ··· + 1/(p−1)! D^{p−1}f|ₓ h⁽ᵖ⁻¹⁾    (23.4)

the Taylor polynomial of degree p − 1 of f at x in the variable h.
Frequently we will set x = x₀ and h = x.

23.2. Remark. If X = Rⁿ, V = R, we see from (23.1) and (23.4) that

    f(x₀ + x) − R_p(x₀, x) = Σ_{k=0}^{p−1} 1/k! Dᵏf|_{x₀} x⁽ᵏ⁾
                           = Σ_{k=0}^{p−1} 1/k! Σ_{α∈Nⁿ, |α|=k} (k!/α!) ∂^α f(x₀) x^α
                           = Σ_{|α|=0}^{p−1} 1/α! ∂^α f(x₀) x^α.
Taylor Series
Slide 689
Taylor’s Theorem
Another way of expressing the Taylor expansion in Rⁿ is to write

    f(x₀ + x) = Σ_{|α|=0}^{p−1} 1/α! ∂^α f(x₀) x^α + O(xᵖ)    (23.5)

where O(xᵖ) is understood to refer to any combination of monomials x^α
with |α| = p.

23.3. Example. The Taylor polynomial of degree 2 of the function
f(x₁, x₂) = cos(x₁ + 2x₂) at x₀ ∈ R² is given by

    t_{f,x₀,3}(x₁, x₂) := f(x₀ + x) − R₃
        = 1/(0,0)! ∂^{(0,0)}f(x₀) x^{(0,0)} + 1/(1,0)! ∂^{(1,0)}f(x₀) x^{(1,0)} + 1/(0,1)! ∂^{(0,1)}f(x₀) x^{(0,1)}
          + 1/(2,0)! ∂^{(2,0)}f(x₀) x^{(2,0)} + 1/(1,1)! ∂^{(1,1)}f(x₀) x^{(1,1)} + 1/(0,2)! ∂^{(0,2)}f(x₀) x^{(0,2)}.
Taylor Series
Slide 690
Taylor’s Theorem
To find the Taylor polynomial at x₀ = 0 we have

    t_{f,0,3}(x₁, x₂)
        = 1/(0!0!) f(0) x₁⁰x₂⁰ + 1/(1!0!) ∂_{x₁}f|₀ x₁¹x₂⁰ + 1/(0!1!) ∂_{x₂}f|₀ x₁⁰x₂¹
          + 1/(2!0!) ∂²_{x₁}f|₀ x₁²x₂⁰ + 1/(1!1!) ∂_{x₁}∂_{x₂}f|₀ x₁¹x₂¹ + 1/(0!2!) ∂²_{x₂}f|₀ x₁⁰x₂²
        = f(0, 0) + ∂_{x₁}f|₀ x₁ + ∂_{x₂}f|₀ x₂ + 1/2 ∂²_{x₁}f|₀ x₁²
          + ∂_{x₁}∂_{x₂}f|₀ x₁x₂ + 1/2 ∂²_{x₂}f|₀ x₂²
        = 1 − 1/2 x₁² − 2x₁x₂ − 2x₂²
        = cos(x₁ + 2x₂) − R₃.
Taylor Series
Slide 691
Taylor’s Theorem
We could also have obtained this result more easily by using the Taylor
formula for functions of a single variable:

    cos(x₁ + 2x₂) = 1 − 1/2 (x₁ + 2x₂)² + O(x⁴)
                  = 1 − 1/2 x₁² − 2x₁x₂ − 2x₂² + O(x⁴).

In cases where this (quick) strategy does not easily work, the full formula
(23.5) needs to be evaluated.
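Formula (23.5) can also be evaluated mechanically; the sketch below reproduces Example 23.3 (sympy assumed available):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = sp.cos(x1 + 2*x2)

    # Sum over all multi-indices alpha = (i, j) with |alpha| <= 2 of
    # (1/alpha!) * d^alpha f(0) * x^alpha.
    poly = sum(sp.diff(sp.diff(f, x1, i), x2, j).subs({x1: 0, x2: 0})
               * x1**i * x2**j / (sp.factorial(i)*sp.factorial(j))
               for i in range(3) for j in range(3) if i + j <= 2)
    print(sp.expand(poly))   # 1 - x1**2/2 - 2*x1*x2 - 2*x2**2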