Calculus1 Textbook

Yong-Jung Kim
25 Lectures for
Undergraduate Calculus I
February 20, 2024
KAIST Department of Mathematical Sciences
To the ones who question
Foreword
For a long time, there has been a need for a revision of the first-year calculus curriculum. The primary reasons for the necessity of curriculum revision are not so
much the changes in calculus itself but the failure of the current curriculum to reflect
the changes in the high school education that incoming students have experienced.
Additionally, the recent advancements in technology have altered the demands of
members from other departments who utilize mathematics.
Calculus is a crucial subject that students encounter for the first time upon entering university. The emphasis has been placed on raising interest and enthusiasm
for academic pursuits. To achieve this, instead of merely listing mathematical facts,
a new structure has been adopted, focusing on achieving core objectives through
the process of acquiring mathematical facts. To this end, the first part of Calculus 1 places the understanding of Kepler’s laws as a core objective. In fact, Newton invented calculus for this purpose. Through this process, students familiarize
themselves with the basic concepts of calculus and learn about vectors in three-dimensional space, including velocity, acceleration, and gravity. In particular, learning the basic principles behind the orbits of satellites and planets has become a crucial
educational topic for scientists in South Korea, especially after the successful third launch
of the Nuri rocket (Nuriho).
The latter part of Calculus 1 focuses on the development and understanding of
approximation techniques. After learning the mathematical core techniques of integration and differentiation in Part III, the study of sequences and series is approached from the perspective of approximation techniques in Part IV. In particular,
students learn the mathematical understanding of approximation techniques, which
is essential for engineers.
Practicing the achievement of scientific goals with a long-term perspective may
feel more challenging, as it is an experience not typically encountered in middle
and high school curricula. However, the practice of applying and developing various
mathematical facts to achieve scientific goals is expected to be a valuable experience
and will aid in the research life of a scientist.
Daejeon, August 2023,
Calculus Curriculum Revision Committee
Preface
There is a fundamental difference between academic textbooks on university subjects and lecture notes used for teaching. Academic textbooks strive for completeness and accurate explanations of the essential parts, even if they cannot cover
all related content. They also emphasize accessibility: a reader should be able to consult the section they need
independently of the course progression, because information is hard to access if one has to revisit everything that precedes a
specific part.
In contrast, lecture notes are created for the purpose of teaching. They are designed with the consideration of students studying the entire course together. Therefore, the key difference from academic textbooks lies in the approach of guiding
students to follow the entire process. Effective communication, resembling a conversation between the lecturer and students in the classroom, is essential. Proper
questions and motivation that allow students to think can enhance the effectiveness
of learning. Sometimes, motivating students with appropriate hints may be more effective than providing detailed explanations, encouraging students to find answers
on their own and stimulating critical thinking and creativity.
"25 Lectures for Undergraduate Calculus I" adheres to the characteristics of lecture notes, structured to facilitate communication between students and instructors.
Efforts have been made to construct it in a format where achievements can be made
through appropriate questions and the process of self-understanding. Additionally,
considering holidays and other factors, the notes are structured for 25 lectures, even
though many universities conduct a semester course consisting of a maximum of 28
lectures, each lasting 75 minutes.
Questions serve as the driving force for learning and the starting point for creative
thinking. This aligns with the QAIST education philosophy that emphasizes the importance of questions. The structure of these lecture notes aims to replace summaries
and proofs with questions and solutions. Only the essential summaries remain in the
form of a structured presentation. It is encouraged to visualize problems and take
time to answer them before making an effort to understand the explanations. Continuous questioning is promoted. Attempting to answer these questions leads to a
deeper understanding of the core concepts and encourages individuals to formulate
their own questions. Asking questions is the beginning of creating something new.
Yong Jung Kim
To KAIST Students Attending the Course
A semester consists of approximately 25 lectures, and this lecture note is also
organized into 25 lectures. It is helpful to read the lecture content before attending
the class and come prepared with questions. Even if time is limited, verify the goals
of the class before entering. For the problems constituting each lecture, try to answer them yourself before looking at the solutions. Subsequently, actively seek to
understand the solutions. Each lecture contains several questions, so take some time
to ponder them. It is advisable to maintain a slightly slower pace while engaging
in mathematical activities. Reflecting with leisure can yield more effective results.
At the end of each of the 25 lectures, exercise problems are provided. Although
not numerous, they serve as a means to confirm and deepen your understanding
of the material. If you find the practice problems insufficient, consider attempting
problems from other general calculus books.
This lecture note was initially created in Korean and then translated into English
with the assistance of ChatGPT. While the English version is the official one, you
are welcome to use the Korean version.
Contents

Part I Differentiation: Mathematical Description of Motion
1 Limit and continuity #1
  1.1 Common Language Definitions
  1.2 Quality control and ε-δ arguments
2 Limit and continuity #2
  2.1 Rigorous definitions using ε-δ
  2.2 Examples
  2.3 Limits as x → ∞ and f(x) → ∞
3 Differentiation
  3.1 Rate of increase
  3.2 Differentiation Rules
  3.3 Intermediate and Mean Value Theorem
  3.4 Derivative of Trigonometric Functions
  3.5 Velocity and Acceleration
4 Chain rule and implicit differentiation
  4.1 Chain rule
  4.2 Implicit Differentiation
5 Integration & fundamental theorem of calculus
  5.1 Antiderivative
  5.2 Integral as the area bounded by a graph
  5.3 Riemann sum and area
6 Inverse functions and their derivatives
  6.1 Bijection (one-to-one and onto function)
  6.2 Derivative of inverse functions
  6.3 Natural logarithm
  6.4 Exponential function

Part II Kepler and Newton’s Laws of Motion
7 Rectangular coordinate system and curves in R^3
  7.1 Coordinate system
  7.2 Projection
  7.3 Moving particle and trajectory curves in space
  7.4 Cross product & inner product
8 Polar coordinates in R^2
  8.1 Variable change with polar coordinates
  8.2 Motion in polar coordinates
  8.3 Ellipses in polar coordinates
  8.4 Curves in polar coordinates
9 Differential Equations
  9.1 First order differential equations
  9.2 Separation of variables
  9.3 Integrating factor
  9.4 Second Order Differential Equations
  9.5 Equation for two-body problem
10 Newton’s law on Earth
  10.1 Newton’s law of motion and gravitation
  10.2 Work and energy
  10.3 Gravity force and potential energy
  10.4 Projectile motion
11 Newton’s law in space: Two-body problem
  11.1 Kepler’s laws
  11.2 Two-body problem
  11.3 Center of mass
  11.4 Displacement vector
  11.5 Kepler problem
12 Kepler’s law and the energy of planets
  12.1 Energy of circular orbits
  12.2 Energy of elliptical orbits
  12.3 Circular orbit of satellites
  12.4 Elliptical orbits of satellites
  12.5 Interstellar and solar system object

Part III The Arts of Calculus
13 Curves and particle trajectories in R^3
  13.1 Arc length as a variable
  13.2 Parametrization with arc length
  13.3 TNB coordinate system
  13.4 Computation formulas
  13.5 Exercises
14 Linearization and differentials
  14.1 Linearization
  14.2 Differentials
  14.3 Differentials for linear approximation
15 Inverse trigonometric and hyperbolic functions
  15.0.1 Inverse trigonometric functions
  15.1 Hyperbolic functions
16 L’Hopital’s rule, big-oh, and little-oh
  16.1 L’Hopital’s rule
  16.2 Big-oh and Little-oh
17 Integration Techniques #1
  17.1 Substitution
  17.2 Integration by parts
18 Integration Techniques #2
  18.1 Trigonometric substitution
  18.2 Integration of rational functions
19 Integration Techniques #3
  19.1 Improper integrals
  19.2 Integration with software

Part IV Approximation Techniques and Series
20 Numerical Integration
  20.1 Numerical integration and Riemann sum
  20.2 Convergence order
  20.3 Numerical integrals and Gaussian quadrature
21 Sequences and series
  21.1 Sequence of real numbers
  21.2 Series of real numbers
  21.3 Power series
22 Tests for absolute convergence
  22.1 Integral test
  22.2 Comparison test
  22.3 Ratio test
  22.4 Root test
23 Power series
  23.1 Convergence of a power series
  23.2 Radius of convergence
  23.3 Alternating series
  23.4 Rearrangement and conditional convergence
24 Taylor Series
  24.1 Taylor series
  24.2 Applications

A Second Order Differential Equations
  A.1 Second-order homogeneous linear equation
  A.2 Second order inhomogeneous linear equation
  A.3 Equation for two-body problem
B Elliptical orbits
  B.1 Eccentricity and focus of an ellipse
  B.2 Directrices and ellipses
  B.3 Polar equations of an ellipse
C Numerical experiments for Taylor series
Index
Part I
Differentiation: Mathematical Description of Motion
During the time of Newton (1643-1727), the most intriguing scientific topic was
the motion of celestial bodies. The geocentric theory had collapsed due to Galileo
(1564-1642), and the heliocentric theory had started to be accepted, thanks to Kepler
(1571-1630), who described the orbits and motions of celestial bodies. However, these explanations
were based on observational data and could not address the underlying causes.
Moreover, there were no mathematical tools suitable for describing such dynamic
celestial motions. Earlier tools, such as Pythagoras’s theorem, describe static situations.
Newton introduced calculus as a tool for describing dynamic phenomena
and used it to explain the motion of celestial bodies.
In Part I, our goal is to develop the concept of differentiation, the means to represent such dynamic motion. Studying it from the perspective of Newton, who developed
calculus for this purpose, may help us better understand its value. In the following chapters of Part
II, we will use differentiation to describe the orbits of
planets and satellites mathematically and to derive their properties. We will then use this knowledge to explain celestial motion,
including Kepler’s laws. In this process, we will grasp the concept of differentiation
and experience how to apply it in specific situations.
Of course, differentiation is used not only in celestial motion but also in various
other areas. While studying celestial motion is fascinating, we should not limit ourselves to that perspective. We will learn the fundamental aspects of calculus from
various viewpoints. In particular, we will rigorously handle concepts such as convergence, continuity, and differentiability using the ε-δ technique. This technique,
developed after Newton, allows us to deal with differentiation more rigorously than Newton did.
Lecture 1
Limit and continuity #1
In calculus, limits are frequently used. Not only do we define differentiation and
integration through limits, but we also understand fundamental concepts like convergence and continuity through limits. In practical applications, obtaining an approximate value is often more common than an exact true value. In such cases, a
crucial question arises: "Does the approximate value converge to the true value as
accuracy increases, or is there a limit beyond which it cannot approach?"
Limits or convergence are not exclusive to calculus; they are core concepts recurring in various fields. Therefore, a clear understanding of these concepts is crucial.
In this lecture, we learn about limits, convergence, and continuity using everyday
language.
Use of Symbols
Although mathematics is considered the study of numbers, it often expresses many
things in terms of symbols rather than actual numbers. Notations like f , g are used to
represent functions, x, y, z for variables, and a, b, c for constants. However, using only
alphabetical characters is not sufficient, and Greek letters are frequently employed.
Here are some commonly used Greek lowercase letters:
1. α alpha, β beta, γ gamma, ω omega, σ sigma, θ theta, ρ rho, φ phi,
2. ε epsilon, δ delta, λ lambda, τ tau
And commonly used Greek uppercase letters:
1. Ω Omega, Σ Sigma, ∆ Delta
1.1 Common Language Definitions
Let’s understand the properties of limits and continuity using everyday language.
Consider a function f with real values defined on the interval [a, b] ⊂ R. In notation,
we write this as:
f : [a, b] → R.
We refer to [a, b] as a closed interval, including both a and b and all real numbers
in between. It is denoted as [a, b] = {x ∈ R : a ≤ x ≤ b}. An open interval, denoted
as (a, b), excludes the endpoints. If a variable x ∈ [a, b] approaches a specific value
c ∈ (a, b), and the function value f(x) approaches a certain value L, we say that the
limit of f(x) as x approaches c is L, denoted as:
\[ \lim_{x \to c} f(x) = L. \]
We also say that, as x approaches c, f(x) converges to L. This convergence statement
holds regardless of the direction from which x approaches c. In one-dimensional
space R, there are only two directions: left and right. The right-hand limit is
denoted as:
\[ \lim_{x \to c^+} f(x) = L. \]
Here, c+ means that x approaches c only from the right, that is, through values
greater than c. Similarly, the left-hand limit is denoted as:
\[ \lim_{x \to c^-} f(x) = L. \]
In summary, the limit $\lim_{x \to c} f(x) = L$ means that both the left and right limits exist and equal L. That is,
\[ \lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x) = L. \]
When discussing limits and convergence, the specific value f(c) has no relevance.
The focus is on the behavior of the function as x approaches c.
On the other hand, the continuity of a function f at c is related to the function
value f(c). If, as x approaches c, the limit exists, and the limit value L is equal to
f(c), we say that f is continuous at c. In summary, saying f is continuous at c means
that the following four quantities exist and are all equal:
\[ \lim_{x \to c^+} f(x) = \lim_{x \to c^-} f(x) = \lim_{x \to c} f(x) = f(c). \]
Now, let’s consider the first problem in this lecture:
Problem 1.1. Find the left and right limits of the given functions at the specified
points c and determine whether the functions are continuous at those points.
(1) $\sin x$ at c = 0.
(2) $\cos x$ at c = 0.
(3) $\dfrac{x^2 - 1}{x - 1}$ at c = 1.
(4) $\dfrac{1}{x - 1}$ at c = 1.
(5) f(x) at c = 0, where f(x) is the Heaviside function given by
\[ f(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1, & \text{if } x \ge 0. \end{cases} \]
After attempting to answer the questions, it’s beneficial to review the solutions to
enhance mathematical understanding.
Solution 1.1 Knowing the graphs of the functions helps determine the one-sided
limits easily.
(1) $\lim_{x \to 0} \sin x = 0$.
(2) $\lim_{x \to 0} \cos x = 1$. (Both limits in these examples equal the function value at c = 0.)
(3) $\lim_{x \to 1} \dfrac{x^2 - 1}{x - 1} = 2$. (Note that $\dfrac{x^2 - 1}{x - 1}$ is undefined at x = 1 due to division by zero, but the limit exists.)
(4) The limit $\lim_{x \to 1} \dfrac{1}{x - 1}$ does not exist. (As x → 1, the function value diverges. Therefore, there is no specific
number L corresponding to the limit.)
(5) The limit $\lim_{x \to 0} f(x)$ does not exist. The
function f(x) converges to 1 as x approaches 0 from the right and converges to 0 as
x approaches 0 from the left. Since the right-hand limit and the left-hand limit are
different, it is said that the limit does not exist. □
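The one-sided behavior described in Solution 1.1 can also be probed numerically. The following is a minimal sketch added for illustration, not part of the original notes: the helper name `one_sided_values` and the step sizes are assumptions, and a list of sample values only suggests a limit, it does not prove one.

```python
import math

def one_sided_values(f, c, side, steps=(1e-1, 1e-3, 1e-5)):
    """Evaluate f at points approaching c from the right (side=+1) or left (side=-1).

    Hypothetical helper for illustration only; f is never evaluated at c itself.
    """
    return [f(c + side * h) for h in steps]

def heaviside(x):
    # Heaviside function from Problem 1.1 (5): 0 for x < 0, 1 for x >= 0.
    return 0.0 if x < 0 else 1.0

examples = {
    "sin x at 0": (math.sin, 0.0),
    "cos x at 0": (math.cos, 0.0),
    "(x^2-1)/(x-1) at 1": (lambda x: (x * x - 1) / (x - 1), 1.0),
    "1/(x-1) at 1": (lambda x: 1 / (x - 1), 1.0),
    "Heaviside at 0": (heaviside, 0.0),
}

for name, (f, c) in examples.items():
    left = one_sided_values(f, c, -1)
    right = one_sided_values(f, c, +1)
    # Print the value closest to c on each side.
    print(f"{name}: left {left[-1]:.6g}, right {right[-1]:.6g}")
```

For cases (1) to (3) the two printed values nearly agree, for case (4) they grow in magnitude with opposite signs, and for case (5) they stay at 0 and 1.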
Mathematical principles and laws consist of conditions and results. This is true
not only in mathematics but also in most sciences. However, at times, even without
explicitly stating what the conditions are, the meaning of the conclusion can be clear
based on what it implies. Let’s practice reading mathematical laws by looking at the
following principles.
Problem 1.2 (Rules of limits). Below are several laws related to limits written in
the form of expressions. To understand the meaning of these expressions, one must
be able to read what the conditions are and what the conclusions are. Explain
the meaning of each law by distinguishing and specifying its conditions and conclusions.
(1) $\lim_{x \to c} (f(x) + g(x)) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x)$.
(2) $\lim_{x \to c} (f(x) - g(x)) = \lim_{x \to c} f(x) - \lim_{x \to c} g(x)$.
(3) $\lim_{x \to c} (k f(x)) = k \bigl( \lim_{x \to c} f(x) \bigr)$, where k is a constant number.
(4) $\lim_{x \to c} (f(x) g(x)) = \bigl( \lim_{x \to c} f(x) \bigr) \bigl( \lim_{x \to c} g(x) \bigr)$.
Solution 1.2 Let’s consider the obvious conditions and conclusions for these expressions to have meaning. For the first expression (1), the condition for it to have
meaning is simply "both limits $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist." The conclusion
it wants to convey is "the limit of the function f + g also exists, and that limit is the
sum of the two limits, $\lim_{x \to c} f(x) + \lim_{x \to c} g(x)$." The rule could be read differently,
but such a reading is not very natural, and arguing for something unnatural is not helpful. Now, let’s
similarly explain the conditions and results for the remaining three cases. □
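A short worked example shows why "both limits exist" is the natural condition. It is added here as an illustration, not taken from the original notes, and the functions are chosen only for convenience. Take c = 0 with f(x) = 1/x and g(x) = −1/x:

\[
f(x) + g(x) = \frac{1}{x} - \frac{1}{x} = 0 \quad (x \neq 0),
\qquad\text{so}\qquad
\lim_{x \to 0} \bigl( f(x) + g(x) \bigr) = 0,
\]
\[
\text{but neither } \lim_{x \to 0} f(x) \text{ nor } \lim_{x \to 0} g(x) \text{ exists.}
\]

Here the left-hand side of rule (1) has a value while the right-hand side has no meaning. The rule is therefore read with the existence of the two limits on the right as its hypothesis, and the existence of the limit on the left as part of its conclusion.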
The following is a question. Unlike problems, questions are not intended to have
mathematical answers, but rather to think about principles, provide motivation, or
sometimes pose slightly philosophical questions. Some answers are provided below,
while others are not.
Question 1.1. If you have proven the four laws above, what is the difference between proving and explaining?
Expressing mathematical facts in everyday language and understanding them in
your own terms is crucial. It is a core part of the process of understanding mathematics. However, even though explanations have been given in everyday language,
calling it a proof may feel somewhat lacking. The next lecture will introduce the
ε-δ method for a more rigorous approach.
Problem 1.3 (Limit of quotient). The quotient rule related to limits is as follows:
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}. \]
For this rule to have meaning, conditions are needed. State the necessary conditions
and explain the conclusion.
Solution 1.3 This case differs from the four cases in Problem 1.2, so it is
treated separately. Arguing in a similar way, the condition is that both limits
$\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist. The conclusion is that the limit of the quotient
$\dfrac{f(x)}{g(x)}$ also exists, and that limit is $\dfrac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}$. However, there is an issue here.
In the case of a fraction, the denominator should not be 0. Therefore, an additional
condition needs to be added, namely $\lim_{x \to c} g(x) \neq 0$.
Additionally, $g(x) \neq 0$ must hold as x approaches c for the left-hand side
to have meaning. These two conditions are required. □
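The need for the extra condition can be checked on two simple cases. These are illustrative examples added here, with functions chosen for convenience, and are not part of the original notes. Take c = 0 in both.

\[
f(x) = x,\ g(x) = x: \qquad
\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} 1 = 1,
\qquad\text{while}\qquad
\frac{\lim_{x \to 0} f(x)}{\lim_{x \to 0} g(x)} = \frac{0}{0} \ \text{is undefined.}
\]
\[
f(x) = 1,\ g(x) = x: \qquad
\lim_{x \to 0} \frac{1}{x} \ \text{does not exist, and}\ \frac{1}{0} \ \text{is undefined.}
\]

In both cases the individual limits of f and g exist, yet the rule cannot be applied because the limit of g is 0. In the first case the quotient still has a limit, and in the second it does not, so nothing can be concluded from the rule alone.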
One of the most commonly used and important functions is the power function.
Given a real number α as the power, the function is defined as $f(x) = x^{\alpha}$. The
composite function $F(x) = (f(x))^{\alpha}$ also frequently appears. When dealing with the
limits of these functions, certain precautions need to be taken. In contrast, exponential functions have a constant base and a variable power, such as $f(x) = 2^x$ or,
more generally, $f(x) = a^x$.
Problem 1.4 (Limit of power functions). Consider the following power rule:
\[ \lim_{x \to c} (f(x))^{\alpha} = \Bigl( \lim_{x \to c} f(x) \Bigr)^{\alpha}. \]
Explain the meaning of this rule and state the necessary conditions for the rule to
hold.
Solution 1.4 When given a power function as in the problem, additional conditions
depend on the value of α.
(1) If α is a positive integer, the necessary condition is that the limit $\lim_{x \to c} f(x)$
exists.
(2) If α is a negative integer, the power is a fraction, so the limit should not be 0,
and f(x) in the vicinity of c should not be 0.
(3) If α is a positive real number, in addition to the condition that the limit
$\lim_{x \to c} f(x)$ exists, f(x) must be either 0 or positive in order to avoid issues with
the square root of a negative real number.
(4) If α is a negative real number, in addition to the condition that the limit
$\lim_{x \to c} f(x)$ exists, f(x) must be positive, and its limit $\lim_{x \to c} f(x)$ should also be
positive to avoid division by 0. □
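The restriction in case (3) is also visible in floating-point arithmetic. The following sketch is an illustration added here, not part of the original notes; the base −8 and the exponent 1/3 are arbitrary choices. Python's `math.pow` works over the real numbers and rejects a negative base with a non-integer exponent, while the built-in `**` operator leaves the real numbers and returns a complex value.

```python
import math

base, alpha = -8.0, 1.0 / 3.0  # illustrative values: negative base, non-integer power

# math.pow stays within the reals and raises ValueError for this input.
try:
    math.pow(base, alpha)
    real_power_defined = True
except ValueError:
    real_power_defined = False

# The ** operator switches to complex numbers instead of failing.
complex_result = base ** alpha

# With a non-negative base the real power is defined without any issue.
safe_result = math.pow(8.0, alpha)

print(real_power_defined)                   # False
print(isinstance(complex_result, complex))  # True
print(safe_result)
```

Either behavior signals the same point as the solution: for a non-integer power, the real-valued expression only makes sense when the base is not negative.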
Knowing whether a function is continuous or not is crucial. The laws mentioned
above for limits are directly used to determine the continuity of the following six
functions.
Problem 1.5 (Rules of continuity). Assume two functions f and g are continuous
at c and k ∈ R is a real number. Then, show that the following six functions are continuous at c. However, in some cases, additional conditions may be needed. Specify
which cases require additional conditions and what those conditions are.
(1) $f + g$
(2) $f - g$
(3) $k f$
(4) $f g$
(5) $f / g$
(6) $f^k$
Solution 1.5 (1-4) The first four cases follow the same conditions as the four cases
in Problem 1.2 without the need for additional conditions. (5) For the fractional
function $f/g$, the additional condition is that $g(c) \neq 0$ to avoid division by zero.
(6) For the power function $f^k$, additional conditions depend on k. If k is a positive
integer, no additional conditions are needed. If k is a negative integer, an additional condition
is that $f(c) \neq 0$. If k is a non-integer real number, an additional condition is that f should not be
negative near c (and $f(c) > 0$ when k is negative), to avoid issues with the square root. □
Question 1.2. In the answers for (6) and (5), the condition that f should not be zero
near c was not added. While it was added when explaining limits, why is it not
necessary when explaining continuity?
Problem 1.6 (Continuity of a composition). Let g : R → R be continuous at c ∈ R
and f : R → R be continuous at g(c) ∈ R. Then, the composite function $(f \circ g)(x) := f(g(x))$
is continuous at c. Does this statement seem correct? Explain in your own
words.
Solution 1.6 Since g is continuous at c, as x approaches c, g(x) approaches g(c).
Also, since f is continuous at g(c), when g(x) approaches g(c), f(g(x)) approaches
f(g(c)). Therefore, the composite function f(g(x)) is continuous at c. □
1.2 Quality control and ε-δ arguments
This section aims to assist in understanding the ε-δ argument. The ε-δ argument
is a very natural concept that anyone can become familiar with. For example, when
a factory produces goods or adjusts a situation to generate the appropriate output,
there must be proper input. In such cases, maintaining the quality of the input is
essential to uphold the quality of the output.
Problem 1.7 (Quality control). Suppose a factory produces products, and when
there is no impurity in the input material, the defect rate is 0%. If x µg of impurities
are mixed into each 1 g of input material, the defect rate is $\sqrt{x}$%. (1) The factory manager
received an instruction to reduce the defect rate to 4% (ε = 4%) or less. How much
should the impurity be kept below? (2) If an instruction is given to reduce the defect
rate to 0.4% or less, how much should the impurity be kept below?
Solution 1.7 (1) The upper limit condition for the defect rate, $\sqrt{x} < 4$, gives x < 16.
Therefore, the impurity should be kept below 16 µg (= δ) per 1 g. (2) Similarly, $\sqrt{x} < 0.4$
yields x < 0.16. Therefore, the impurity should be kept below 0.16 µg (= δ) per
1 g. In the given problem situation, to reduce the defect rate by a factor of 10, the
impurity must be reduced by a factor of 100. □
When a directive to reduce the defect rate is given, the factory manager must
know how much the impurity should be reduced for this purpose. Such situations are
common in our surroundings. The desired error range of the output is traditionally
denoted by ε > 0, and the adjustment range of the input needed to achieve it is denoted by
δ > 0. In many cases, reducing the amount of impurity to zero is impossible. What
can be done is to keep the impurity within a sufficiently small adjustment range.
Problem 1.8 (Quality control with continuity). In a vinegar factory using traditional fermentation methods, the ideal acidity is pH 3. To achieve this, the ingredients need to be fermented appropriately, and in the manufacturing environment
of this factory, when the fermentation efficiency is 36%, the pH becomes 3. If we
denote the acidity pH by y and the fermentation efficiency percentage by x, we can
express this relationship as the function y = f (x). The relationship between them in
this factory is stated as $y = \sqrt{x}/2$. The manager instructed the factory manager to adjust
the fermentation efficiency so that the acidity stays within 0.1 above and below pH
3. In that case, in what range should the fermentation efficiency be adjusted by the
factory manager?
Solution 1.8 First, the upper limit of fermentation efficiency is given by the relation
$y = \sqrt{x}/2 < 3.1$, and therefore, the upper limit of fermentation efficiency is
$x < (6.2)^2 = 38.44$. The lower limit is given by the relation $y = \sqrt{x}/2 > 2.9$, and
therefore, the lower limit of fermentation efficiency is $x > (5.8)^2 = 33.64$. In other
words, it is allowed for the fermentation efficiency to be 2.44% larger or 2.36%
smaller than the optimal fermentation efficiency of 36%, but it should not exceed
that range. □
In the above example, the difference between the optimal fermentation efficiency
of 36% and the upper limit is different from the difference with the lower limit. In
most cases, this is true. Choosing the smaller one, let’s set δ = 2.36 and choose
the tolerance range of acidity from the manager’s office as ε = 0.1. Then, we can
express it as follows.
| f (x) − 3| < ε if |x − 36| < δ. (1.1)
If the boundaries of the upper and lower limits are different, it may seem like losing
information to choose the smaller one, but sometimes it is more convenient or there
is no choice but to do so. We use such relationships to define many things, and these
are collectively referred to as the ε-δ method.
Question 1.3. In the given problem, even if the manager provides a very small tolerance range ε > 0, can we determine the adjustment range δ > 0 that satisfies (1.1)?
How can this be proven?
Suppose that, instead of a specific number, an arbitrary ε (0 < ε < 3) is given. Similar
calculations are performed to determine δ. Let’s try it. The output y should be within the
range of 3 − ε < y < 3 + ε, so $y = \sqrt{x}/2$ is rearranged as follows:
$$3 - \varepsilon < \sqrt{x}/2 < 3 + \varepsilon \;\Rightarrow\; 36 - (24\varepsilon - 4\varepsilon^2) < x < 36 + (24\varepsilon + 4\varepsilon^2).$$
Therefore, δ can be set as $24\varepsilon - 4\varepsilon^2$ or smaller. As shown here, usually, when the
limit error ε is given, the adjustment error δ is determined based on ε.
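The choice of δ derived above can be checked numerically. The following is a minimal sketch, assuming the relation y = √x/2 from Problem 1.8. The function names `f`, `delta_for`, and `tolerance_holds` are illustrative and not part of the text, and sampling finitely many points is only a sanity check, not a proof.

```python
import math

def f(x):
    """Acidity as a function of fermentation efficiency: y = sqrt(x) / 2."""
    return math.sqrt(x) / 2

def delta_for(eps):
    """Adjustment range from the text: delta = 24*eps - 4*eps**2 (for 0 < eps < 3)."""
    if not 0 < eps < 3:
        raise ValueError("eps must satisfy 0 < eps < 3")
    return 24 * eps - 4 * eps ** 2

def tolerance_holds(eps, samples=1000):
    """Sample x strictly inside (36 - delta, 36 + delta) and test |f(x) - 3| < eps."""
    delta = delta_for(eps)
    for i in range(1, samples):
        x = 36 - delta + 2 * delta * i / samples
        if abs(f(x) - 3) >= eps:
            return False
    return True

for eps in (0.1, 0.01, 0.001):
    print(eps, delta_for(eps), tolerance_holds(eps))
```

For ε = 0.1 the formula returns approximately 2.36, the value chosen for (1.1).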
Question 1.4. In the given problem, the reason why it is possible to choose δ > 0
for every ε > 0 is that $f(x) = \sqrt{x}/2$ is continuous at c = 36. Can you see why?
The following problem deals with a situation where defective products are produced based on different relationships depending on whether there are impurities or
not.
Problem 1.9 (Quality control with discontinuity). Suppose that when impurities
are mixed in 1 g of input material at a rate of x µg, the probability of defective products
is given by $(1 + \sqrt{x})$%. If there are no impurities at all, the probability of defective
products is 0 (refer to the figure). In this case, let’s reconsider parts (1) and (2) of
Problem 1.7.
Solution 1.9 (1) The upper limit condition for the defect rate, $1 + \sqrt{x} < 4$, gives
x < 9. Therefore, the amount of impurities should be kept below 9 µg/g (= δ). (2)
Similarly, $1 + \sqrt{x} < 0.4$ yields $\sqrt{x} < -0.6$. However, $\sqrt{x}$ cannot be negative, so
there is no control range. In other words, there is no corresponding δ > 0. □
In the above problem, when the error limit for the product was set to ε = 4, it
was possible to determine the control limit δ . However, when ε = 0.4, it was not
possible to determine the corresponding δ . This is because the defect rate function
f (x) is discontinuous at x = 0. If it were continuous, such issues would not arise.
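The contrast between ε = 4 and ε = 0.4 can be illustrated with a small search. This is a heuristic sketch, assuming the defect rate of Problem 1.9. The names `defect_rate` and `find_delta` and the list of candidate values are illustrative choices, and the search only samples finitely many points.

```python
import math

def defect_rate(x):
    """Defect rate (in %) from Problem 1.9: 0 when x == 0, otherwise 1 + sqrt(x)."""
    return 0.0 if x == 0 else 1 + math.sqrt(x)

def find_delta(eps, candidates):
    """Return the first candidate delta such that every sampled x with
    0 < x < delta has defect_rate(x) < eps, or None if no candidate works."""
    for delta in candidates:
        xs = [delta * i / 100 for i in range(1, 100)]
        if all(defect_rate(x) < eps for x in xs):
            return delta
    return None

# Candidate control ranges, from the value 9 found in the solution down to 1e-11.
candidates = [9.0] + [10.0 ** (-k) for k in range(0, 12)]

print(find_delta(4, candidates))    # a control range exists
print(find_delta(0.4, candidates))  # no candidate works
```

Shrinking the candidates further cannot help in the second case, because the defect rate is at least 1 for every x > 0.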
Problem 1.10. In the case of Problem 1.8, if the function f (x) is discontinuous at c,
the manager can assign an impossible task to the factory manager (refer to the figure). The manager can specify ε > 0 so that it is impossible for the factory manager
to determine δ > 0 satisfying (1.1). How small can ε > 0 be?
Solution 1.10 If the function f (x) is discontinuous at c, then either the right-hand
limit or the left-hand limit is different from the function value f (c). If ε is chosen
to be smaller than the difference between these values and the function value, then
no matter how small δ > 0 is chosen, the error cannot be maintained to be less than
ε. □
Exercises
1. Find the following limits.
(1) $\lim_{x\to c} \frac{x^2 - c^2}{x - c}$ (2) $\lim_{x\to 0} x^{\pi}$ (3) $\lim_{x\to 0} \pi^{x}$
2. Let the input amount be x, and the output amount be $f(x) = x^2$. If the desired
output amount is 100 and the error limit is ε = 1, what is the maximum range of
adjustment for the input amount around x = 10? (Find δ > 0 such that | f (x) −
f (10)| < ε whenever |x − 10| < δ .)
3. If $\lim_{x\to c} f(x) = f(c)$, explain that there exists a control range δ > 0 even if a very
small error limit ε > 0 is given.
4. Prove that if the function f (x) is continuous at x = c, and a very small error range
ε > 0 is given, there exists a range of adjustment |x − c| < δ for x to ensure that
f (x) is within the error range of f (c).
Lecture 2
Limit and continuity #2
To prove rather than explain, a proper definition is needed. In this lecture, we define
the concepts of limit, convergence, and continuity using the ε-δ method. The ε-δ
method defines whether it is possible to control the range of errors in the output
f (x) within ε by adjusting the range of input within δ . It is not very different from
everyday language. It may feel awkward at first, but it can become familiar with a
little effort.
2.1 Rigorous definitions using ε-δ
Now, using the ε-δ method, we aim to define limit, convergence, and continuity
more rigorously. The concept is defined by determining whether it is possible to find
an adjustment range δ > 0 that satisfies the error range ε in the output f (x). This
definition is not much different from the one in everyday language. It is necessary
to confirm and become familiar with the fact that such definitions using everyday
language are not very different from the rigorous ones we are introducing.
Continuity is a fundamental concept that appears everywhere. One classical definition of continuity of the function f (x) at the point c is, "small changes in x near
the point c produce only small changes in the function values f (x) near f (c)." However, this is still an explanatory definition and is not sufficient for rigorous proofs.
The following rigorous definition is given using the ε-δ method.
Definition 2.1 (Limit and continuity). Let a function f : R → R and a point c ∈ R
be given. We say that the limit of f (x) as x → c is the number L if, for any
given ε > 0, there exists a corresponding number δ > 0 such that
| f (x) − L| < ε whenever 0 < |x − c| < δ. (2.1)
We denote the limit as $\lim_{x\to c} f(x) = L$. We also say that f (x) is continuous at c if
$\lim_{x\to c} f(x) = f(c)$. We can also directly define continuity without using the limit, i.e.,
f is continuous at c if, for any given ε > 0, there exists δ > 0 such that
| f (x) − f (c)| < ε whenever |x − c| < δ. (2.2)
If f is continuous at all c ∈ (a, b), we say f is continuous in (a, b).
Read and understand Definition 2.1, and the next step is to perform simple calculations using this definition. The purpose of the following problems is not only
to provide obvious solutions but to make you feel that the above definition really
means what seems obvious. It requires effort to become familiar with it.
Problem 2.1. For the function f (x) = 2x + 1, show the following:
(1) $\lim_{x\to 2} f(x) = 5$.
(2) $\lim_{x\to 1} f(x) \ne 2$.
(3) f is continuous at any point c ∈ R.
Solution 2.1 Drawing the graph of the function f (x) = 2x + 1 and answering the
above questions using everyday language is straightforward. Here, we use the ε-δ
argument to explain. It helps to draw the graph. Let’s go through them one by one.
(1) To show the limit $\lim_{x\to 2} f(x) = 5$, we need to find a suitable δ > 0 for any
given ε > 0. Since f (2) = 5, we have
| f (x) − f (2)| = |2x + 1 − 5| = |2(x − 2)| = 2|x − 2| < ε
if and only if
|x − 2| < ε/2.
Therefore, we can choose δ to be ε/2. We haven’t been explicitly told to choose δ
as large as possible, so choosing it smaller than this is acceptable.
(2) To show that $\lim_{x\to 1} f(x) \ne 2$, we need to provide an ε > 0 for which
there is no δ > 0. Since f (1) = 3 differs from 2 by 1, let’s choose ε to be 1 or smaller. Let
ε = 0.5. Then,
| f (x) − 2| = |2x + 1 − 2| = |2(x − 1) + 1| = 2|x − 1| + 1 > 0.5
for any x > 1. Therefore, no matter how small we choose δ > 0, there will always
be x satisfying 0 < |x − 1| < δ such that | f (x) − 2| > ε. Thus, $\lim_{x\to 1} f(x) \ne 2$.
(3) Let’s take an arbitrary c ∈ R. We want to show that the function f (x) = 2x + 1
is continuous at this point. Assume ε > 0 is given. Then, we choose δ = ε/2. (For
first-degree functions like this one, we can choose the same δ for all points. In most
cases, we need to choose different δ depending on the location.) Now, assume
|x − c| < δ . Then,
| f (x) − f (c)| = |2x + 1 − 2c − 1| = 2|x − c| < 2 · (ε/2) = ε.
Hence, for any given error limit ε > 0, we can find an adjusting limit δ , making f
continuous at c.
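The choices made in this solution can be tried out numerically. The sketch below assumes f(x) = 2x + 1 and δ = ε/2 as in the solution. The helper names are illustrative, and a finite sample is a sanity check rather than a proof.

```python
def f(x):
    """The function from Problem 2.1."""
    return 2 * x + 1

def delta_for(eps):
    """The choice made in the solution: delta = eps / 2, independent of c."""
    return eps / 2

def continuity_check(c, eps, samples=1000):
    """Sample x with |x - c| < delta and confirm |f(x) - f(c)| < eps."""
    delta = delta_for(eps)
    xs = (c - delta + 2 * delta * i / samples for i in range(1, samples))
    return all(abs(f(x) - f(c)) < eps for x in xs)

def witness_against_limit_2(delta):
    """For part (2): a point x with 0 < |x - 1| < delta but |f(x) - 2| > 0.5."""
    return 1 + delta / 2

print(all(continuity_check(c, eps)
          for c in (-3.0, 0.0, 2.0)
          for eps in (1.0, 0.1, 0.001)))

for delta in (1.0, 0.01, 1e-6):
    x = witness_against_limit_2(delta)
    print(delta, abs(f(x) - 2) > 0.5)
```

The second loop mirrors the game described in part (2): whatever δ is proposed, a point inside the range violates the error limit ε = 0.5.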
Question 2.1. Does the continuity defined in Definition 2.1 align with our general
concept of continuity?
The ε-δ method is a dynamic expression. It’s like playing a game. If you give me
an error limit ε > 0, I can always find an adjusting range δ > 0 that satisfies (2.1)
or (2.2). Therefore, proving means finding and demonstrating such δ > 0. In other
words, proof is not about explaining well in verbal language but about the skill of
finding something. Conversely, showing that for some ε > 0 there is no such δ > 0
is equivalent to proving that the function is not continuous or does not converge to
the proposed limit.
If you give a large ε > 0, it is usually easier to find δ > 0. However, if the function is
continuous or converges, you can find δ > 0 even if you give a very small ε > 0.
So, we start by assuming an arbitrary ε > 0, and if we can find the corresponding δ > 0,
it means that L is the limit of f (x), or f is continuous at c. If L is
not a limit, or f is not continuous, it means we cannot find such δ > 0 when ε is
sufficiently small.
Problem 2.2. In the previous definition of the limit, we assumed that the function
f is defined for the entire real numbers R. However, upon closer inspection, the
definition shows that even if f is defined only in a small interval, the limit at all c in
that domain is well-defined. Explain the reason for this.
Solution 2.2 Looking at the relation (2.1), we can see that if f is defined only in
an open interval (a, b) and c ∈ (a, b), the definition is still valid. The reason is that
if δ is small enough, the points satisfying 0 < |x − c| < δ will lie inside (a, b). As
long as f is defined there, the definition holds. However, if the domain is a closed
interval [a, b] and c is one of the endpoints, then we need to consider left or right
limits and redefine the limit. In this case, for the limit to be well-defined, one-sided
limits need to be considered.
Now, it is possible to define precisely the right and left limits that were described in everyday
language. Let’s create a definition. Creating definitions can be a useful exercise.
Problem 2.3. Use the ε-δ method to define the right limit and left limit.
Solution 2.3 The definitions can vary slightly, but the essence should be included.
(i) Left limit: Let f : (a, b) → R and c ∈ (a, b). We say L is the left limit of f (x)
as x → c− and write
$$\lim_{x\to c^-} f(x) = L,$$
if, for any ε > 0, there exists δ > 0 such that
| f (x) − L| < ε whenever 0 < c − x < δ.
(ii) Right limit: Let f : (c, b) → R for some c < b. We say L is the right limit of
f (x) as x → c+ and write
$$\lim_{x\to c^+} f(x) = L,$$
if, for any ε > 0, there exists δ > 0 such that
| f (x) − L| < ε whenever 0 < x − c < δ.
Check if these definitions accurately reflect the meaning of the right limit and left
limit.
Problem 2.4 (Left continuity and right continuity). Define left continuity and
right continuity using the ε-δ method.
Solution 2.4 Using the previously defined left limit and right limit, we can define
right continuity and left continuity as follows:
Definition A: Let f : (a, b) → R and c ∈ (a, b). We say that f (x) is left-continuous
at c if $\lim_{x\to c^-} f(x) = f(c)$. We say f (x) is left-continuous on (a, b)
if f is left-continuous at all c ∈ (a, b). The right-continuity is defined similarly.
Without using right and left limits explicitly, we can define left continuity and
right continuity using the ε-δ method by allowing the case x = c, that is, by including 0 in the inequality.
Definition B: Let f : (a, b) → R and c ∈ (a, b). We say that f (x) is left continuous
at c if, for any given ε > 0, there exists δ > 0 such that
| f (x) − f (c)| < ε
whenever
0 ≤ c−x < δ.
We say that f (x) is right continuous at c if, for any given ε > 0, there exists δ > 0
such that
| f (x) − f (c)| < ε whenever 0 ≤ x − c < δ .
We say f is left (right) continuous in (a, b), if it is left (right) continuous at all
c ∈ (a, b). The difference between the two definitions lies in substituting L with f (c)
and including the case of 0 = x − c by using 0 ≤ x − c < δ instead of 0 < x − c < δ .
□
2.2 Examples
Using the ε-δ method, we can prove the limits and continuity laws for Problems 1.2
and 1.5. However, to do this, some techniques are required. Problem 4 is challenging, while other cases are relatively straightforward.
Problem 2.5 (Sum law). (1) Show the following relationship between limits:
$$\lim_{x\to c} \bigl(f(x) + g(x)\bigr) = \lim_{x\to c} f(x) + \lim_{x\to c} g(x).$$
(2) Prove that if functions f and g are continuous at c ∈ R, then f + g is also continuous at c ∈ R.
Solution 2.5 Proving (1) and (2) is almost the same problem. Let ε > 0 be
given. The goal is to find δ > 0 determined by ε. Assume $\lim_{x\to c} f(x) = L$ and
$\lim_{x\to c} g(x) = M$. According to the definitions, there exist δ1 > 0 and δ2 > 0 such
that:
| f (x) − L| < 0.5ε whenever 0 < |x − c| < δ1 ,
|g(x) − M| < 0.5ε whenever 0 < |x − c| < δ2 .
Note that we found δ corresponding to 0.5ε instead of ε in the definitions. Since
δ may differ in the two cases, we denote them as δ1 and δ2 . We use the smaller
δ = min(δ1 , δ2 ). Then, if 0 < |x − c| < δ , we have:
| f (x) + g(x) − (L + M)| ≤ | f (x) − L| + |g(x) − M| < 0.5ε + 0.5ε = ε.
Thus, by the definition of the limit, L + M is the limit of the function f (x) + g(x),
and therefore,
$$\lim_{x\to c} \bigl(f(x) + g(x)\bigr) = L + M = \lim_{x\to c} f(x) + \lim_{x\to c} g(x).$$
If f and g are continuous at c, then L = f (c) and M = g(c). Hence,
$$\lim_{x\to c} \bigl(f(x) + g(x)\bigr) = L + M = f(c) + g(c).$$
Therefore, f + g is continuous at c ∈ R. □
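The recipe in this proof (ask each function for a tolerance of 0.5ε, then take the smaller δ) can be written out for a concrete pair of functions. This is a sketch under assumptions: the functions f(x) = 2x + 1 and g(x) = 3x, and all helper names, are illustrative choices and do not come from the text.

```python
def f(x):
    return 2 * x + 1

def g(x):
    return 3 * x

def delta_f(eps):
    """For f(x) = 2x + 1: |f(x) - f(c)| = 2|x - c| < eps when |x - c| < eps / 2."""
    return eps / 2

def delta_g(eps):
    """For g(x) = 3x: |g(x) - g(c)| = 3|x - c| < eps when |x - c| < eps / 3."""
    return eps / 3

def delta_sum(eps):
    """Follow the proof: request tolerance eps/2 from each function, keep the smaller delta."""
    return min(delta_f(eps / 2), delta_g(eps / 2))

def sum_check(c, eps, samples=1000):
    """Sample 0 < |x - c| < delta and confirm |f(x) + g(x) - (L + M)| < eps."""
    delta = delta_sum(eps)
    L, M = f(c), g(c)
    for i in range(1, samples):
        x = c - delta + 2 * delta * i / samples
        if x == c:
            continue  # the limit definition excludes x = c
        if abs(f(x) + g(x) - (L + M)) >= eps:
            return False
    return True

print(sum_check(1.0, 0.1), sum_check(-2.0, 0.001))
```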
The above proof regarding convergence implies that when two functions f and g
are continuous at a point, their sum f + g is also continuous at that point. Now, let’s
prove the continuity of a composite function.
Problem 2.6 (Continuity of a composition). Let f : R → R be continuous at c ∈
R, and let g : R → R be continuous at f (c) ∈ R. Then, the composition function
(g ◦ f )(x) := g( f (x)) is continuous at c.
Solution 2.6 Let ε > 0 be given. We need to find δ > 0 such that (2.2) holds. Since
g is continuous at f (c), there exists δ1 > 0 such that
|g( f (x)) − g( f (c))| < ε
whenever
| f (x) − f (c)| < δ1 .
At this point, it is crucial that we did not use |g(x) − g( f (c))| < ε. In the next step,
the key is to use δ1 as the ε for the continuity of the function f . Since f is continuous
at c, there exists δ2 > 0 such that
| f (x) − f (c)| < δ1
whenever |x − c| < δ2 .
Now, set δ = δ2 , and the proof is complete: if |x − c| < δ2 , then | f (x) − f (c)| < δ1 ,
and hence |g( f (x)) − g( f (c))| < ε. (This proof, though straightforward
after careful consideration, is more precise than the everyday-language argument for the
continuity of composite functions attempted in Lecture
1. It illustrates the cleverness of the ε-δ method.) □
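The key step of this proof, using δ1 as the ε for the inner function, can be made concrete. The sketch below assumes the illustrative pair f(x) = 2x + 1 and g(y) = y², which are not taken from the text. The bound used in `delta_g` is one standard choice among many, and the helper names are hypothetical.

```python
def f(x):
    return 2 * x + 1

def g(y):
    return y * y

def delta_g(p, eps):
    """delta1 for g(y) = y^2 at the point p.
    If |y - p| < delta1 <= 1, then |y + p| <= 2|p| + 1, so
    |y^2 - p^2| = |y - p| * |y + p| < delta1 * (2|p| + 1) <= eps."""
    return min(1.0, eps / (2 * abs(p) + 1))

def delta_f(eps):
    """delta for f(x) = 2x + 1: |f(x) - f(c)| = 2|x - c| < eps when |x - c| < eps / 2."""
    return eps / 2

def delta_composition(c, eps):
    """Chain the two choices as in the proof: delta1 plays the role of eps for f."""
    delta1 = delta_g(f(c), eps)
    delta2 = delta_f(delta1)
    return delta2

def composition_check(c, eps, samples=1000):
    """Sample |x - c| < delta and confirm |g(f(x)) - g(f(c))| < eps."""
    delta = delta_composition(c, eps)
    for i in range(1, samples):
        x = c - delta + 2 * delta * i / samples
        if abs(g(f(x)) - g(f(c))) >= eps:
            return False
    return True

print(composition_check(1.0, 0.1), composition_check(-4.0, 0.001))
```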
Problem 2.7. Let
$$f(x) = \begin{cases} x, & \text{if } x < 1, \\ 2x, & \text{if } x \ge 1. \end{cases}$$
Show that (1) $\lim_{x\to 1^+} f(x) = 2$ and (2) $\lim_{x\to 1^-} f(x) = 1$.
Problem 2.8 (One-sided limits). Show that
$$\lim_{x\to c} f(x) = L$$
if and only if
$$\lim_{x\to c^-} f(x) = L = \lim_{x\to c^+} f(x).$$
Problem 2.9 (Sandwich Theorem). Let f , g, h : (a, b) → R, g(x) ≤ f (x) ≤ h(x) on
(a, b), and c ∈ (a, b). Show $\lim_{x\to c} f(x) = L$ if
$$\lim_{x\to c} g(x) = \lim_{x\to c} h(x) = L.$$
All three problems above require choosing δ > 0 given ε > 0. In Problem 2.7, it
is necessary to carefully consider the function to determine the appropriate δ , while
in Problems 2.8 and 2.9, conditions must be used to establish δ > 0.
2.3 Limits as x → ∞ and f (x) → ∞
Sometimes, as x → ∞ or x → −∞, a function may either converge or diverge to
infinity. Let’s think about the definitions in such cases. Try creating definitions for
the situations listed below and then compare them with the given examples.
Problem 2.10. For a function f : R → R, create the definitions for the following
cases using the ε-δ method:
(1) $\lim_{x\to\infty} f(x) = L$. (2) $\lim_{x\to-\infty} f(x) = L$. (3) $\lim_{x\to c} f(x) = \infty$. (4) $\lim_{x\to c} f(x) = -\infty$.
(5) $\lim_{x\to c^+} f(x) = \infty$. (6) $\lim_{x\to c^-} f(x) = \infty$. (7) $\lim_{x\to c^+} f(x) = -\infty$. (8) $\lim_{x\to c^-} f(x) = -\infty$.
Solution 2.10 First, try creating definitions and then compare them with the definitions given below. Think about whether the provided definitions capture the intended
situations.
1. We say $\lim_{x\to\infty} f(x) = L$ if for any ε > 0, there exists N ∈ R such that | f (x) − L| < ε
whenever x > N.
2. We say $\lim_{x\to-\infty} f(x) = L$ if for any ε > 0, there exists N ∈ R such that | f (x) − L| < ε
whenever x < N.
3. We say $\lim_{x\to c} f(x) = \infty$ if for any N ∈ R, there exists δ > 0 such that f (x) > N
whenever 0 < |x − c| < δ .
4. We say $\lim_{x\to c} f(x) = -\infty$ if for any N ∈ R, there exists δ > 0 such that f (x) < N
whenever 0 < |x − c| < δ .
5. We say $\lim_{x\to c^+} f(x) = \infty$ if for any N ∈ R, there exists δ > 0 such that f (x) > N
whenever 0 < x − c < δ .
6. We say $\lim_{x\to c^-} f(x) = \infty$ if for any N ∈ R, there exists δ > 0 such that f (x) > N
whenever 0 < c − x < δ .
7. We say $\lim_{x\to c^+} f(x) = -\infty$ if for any N ∈ R, there exists δ > 0 such that f (x) < N
whenever 0 < x − c < δ .
8. We say $\lim_{x\to c^-} f(x) = -\infty$ if for any N ∈ R, there exists δ > 0 such that f (x) < N
whenever 0 < c − x < δ .
□
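Definition 5 above can be exercised on the example f(x) = 1/x as x → 0+, where the roles are reversed: the challenger names a bound N, and the response is a δ. This is a minimal sketch; the choice δ = 1/N and the helper names are illustrative assumptions, and the finite sampling is a sanity check only.

```python
def f(x):
    return 1 / x

def delta_for(N):
    """A delta for the claim lim_{x -> 0+} 1/x = infinity.
    If N <= 0, any delta works because 1/x > 0 for x > 0; otherwise
    0 < x < 1/N implies 1/x > N."""
    return 1.0 if N <= 0 else 1.0 / N

def exceeds_bound(N, samples=1000):
    """Sample 0 < x < delta and confirm f(x) > N."""
    delta = delta_for(N)
    xs = (delta * i / samples for i in range(1, samples))
    return all(f(x) > N for x in xs)

for N in (-5, 10, 1e6):
    print(N, delta_for(N), exceeds_bound(N))
```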
Exercises
1. Prove the following limits using the definitions:
(1) $\lim_{x\to\infty} x^{-1} = 0$ (2) $\lim_{x\to 0^+} x^{-1} = \infty$ (3) $\lim_{x\to 0} x^3 = 0$
2. Find the limits if they exist:
(1) $\lim_{x\to 2} \frac{1}{x^2 - 2}$ (2) $\lim_{x\to 1} \frac{x^3 - 1}{x - 1}$ (3) $\lim_{x\to 2} \frac{x^2 - 4}{x - 2}$
3. Show whether the following functions are continuous at the given point or not:
(1) $f(x) = \frac{1}{x}$ at c ≠ 0 (2) $f(x) = \frac{x^2 - 4}{x - 2}$ at c = 2 (3) $f(x) = \sqrt{x}$ at c = 4
4. Determine whether the following functions are continuous at x = 0 or not:
(1) f (x) = x sin(1/x) with f (0) = 0 (2) $f(x) = x^2 \sin(1/x)$ with f (0) = 0
(3) $f(x) = \sqrt{|x|}$
5. (1) Draw the graph of the following function:
$$f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \cos(x^{-1}) & \text{if } x > 0. \end{cases}$$
(2) Show that this function is continuous at points c ≠ 0 and discontinuous at c =
0. (Use the fact that cos x is continuous, and the composition of two continuous
functions is continuous.)
Lecture 3
Differentiation
The statement that differentiation is a mathematical tool for describing motion
means that it can express laws related to motion. Moving objects have velocity. Velocity indicates how much and in which direction the position changes at each moment. However, if an object moves instantaneously, velocity cannot be considered.
The motion we want to represent mathematically is continuous motion where the
position changes continuously. We have already learned the very important mathematical concept of continuity. Now, we learn differentiation, which can represent
quantities related to motion, such as velocity or acceleration.
3.1 Rate of increase
Let the function f (x) be given as follows:
f : R → R,
x ∈ R.
Here, the variable is represented by x and takes real values (x ∈ R). The function
is denoted by f and also takes real values ( f ∈ R). This function can represent the
position of an object moving on a one-dimensional line, or it can represent more
general quantities. Depending on the situation, a different symbol may be used instead of x to represent the variable. x can be used to represent time, but very often
time is represented by t ∈ R. If f (t) represents the position of a runner after t seconds since the start, it is related to motion. However, it can also represent various
things, such as the production quantity of a specific product during t hours. We deal
with a wide variety of situations. The input variable can represent a quantity other
than time, and in that case, t can be considered as a quantity other than time, or
a different symbol like x ∈ R can be used, but it doesn’t make a difference. Using
notation that does not confuse the meaning is good.
Let x increase from a to b, and suppose that the value of the function changes
from f (a) to f (b). Let’s denote the increments as △x = b − a and △ f = f (b) − f (a).
The ratio of these increments, i.e.,
$$\frac{f(b) - f(a)}{b - a} \equiv \frac{\triangle f}{\triangle x} \qquad \text{(mean growth rate)},$$
is called the mean growth rate. For example, saying the mean growth rate is 10
means that when x increases by △x from a, the function value increases by 10△x
from f (a). Of course, if x decreases, the function value decreases accordingly. When
this ratio has a positive value, it means that when the variable x increases, the value
of the function f increases, and when x decreases, the value of the function f decreases. Saying the mean growth rate is −10 means that when x increases, f decreases
by ten times the increment of x, and when x decreases, f increases by ten times that amount. In the definition above, if we
replace a with c and b with c + h, the mean growth rate can also be written as
$$\frac{f(c + h) - f(c)}{h} \qquad \text{(mean growth rate)}.$$
Here, h can be positive or negative. This expression is also widely used and useful.
If the limit of the mean growth rate exists as h approaches 0, we denote that limit
as f ′(c). That is,
$$f'(c) = \lim_{h\to 0} \frac{f(c + h) - f(c)}{h} \qquad \text{(derivative, instantaneous growth rate)}.$$
This limit is the instantaneous growth rate of the function f
at c and is also called the derivative of f at c. Geometrically, it is the slope of
the tangent line at the point (c, f (c)) on the graph.
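How the mean growth rate settles toward the derivative can be observed numerically. The sketch below uses the illustrative function x² at c = 1, whose derivative there is 2 (this example is treated in Problem 3.2). The name `mean_growth_rate` is a hypothetical helper, and floating-point rounding limits how small h can usefully be.

```python
def mean_growth_rate(f, c, h):
    """(f(c + h) - f(c)) / h for h != 0; h may be positive or negative."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(c + h) - f(c)) / h

def square(x):
    return x * x

# Approach h = 0 from both sides and watch the ratio approach 2.
for k in range(1, 7):
    h = 10.0 ** (-k)
    print(h, mean_growth_rate(square, 1.0, h), mean_growth_rate(square, 1.0, -h))
```

For this function the ratio equals 2 + h exactly in real arithmetic, so the printed values approach 2 from above for h > 0 and from below for h < 0.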
Problem 3.1 (Tangent line formula). Suppose the function f is differentiable at
x = c. Find the equation of the tangent line that touches the graph at x = c.
Solution 3.1 Using the fact that the slope is a and the line passes through (x0 , y0 ),
we use the equation of a line y − y0 = a(x − x0 ). For the tangent line with slope f ′ (c)
and passing through (c, f (c)), the equation is
y − f (c) = f ′(c)(x − c) or y = f ′(c)(x − c) + f (c) (tangent line formula).
Instead of the limit as h → 0, the derivative can also be expressed as follows:
$$f'(c) = \lim_{b\to c} \frac{f(b) - f(c)}{b - c} \qquad \text{(derivative, instantaneous growth rate)}.$$
Sometimes, when we want to express the differentiation of the function f with respect to the variable x more clearly, we write it as
$$\frac{df}{dx}(c) = f'(c).$$
This notation is called Leibniz notation, and although it may seem complex and
inconvenient, it is very convenient when combined with various properties of derivatives that we will learn in the future.
Remark 3.1. Many people refer to differentiation as the rate of change, but differentiation is not the rate of change. The term ”change” does not specify whether it
increased or decreased, so saying the rate of change is 3 doesn’t reveal if it’s increasing or decreasing. However, saying the derivative is 3 means it is increasing. Also,
saying the rate of change is -3 is an awkward expression. Does it mean it changed
less than when the rate of change is 0? It’s an inappropriate expression. Probably,
people use the term ”rate of change” but interpret it as the growth rate.
Not every function has this limit at every point. For the instantaneous growth
rate to exist at c, both the left-hand limit and the right-hand limit of the mean growth
rate must exist, and the two values must be equal. If such a limit exists, we say that
the function f is differentiable at c. Therefore, it is necessary to distinguish whether
a function is differentiable or not.
Problem 3.2 (Examples). (1) Show that the function f (x) = |x| is not differentiable
at x = 0. (2) Show that the function $f(x) = x^2$ is differentiable at x = 1. (3) Show
that differentiation is not possible at points where the function f is discontinuous.
Solution 3.2 (1) Let’s show that the left-hand limit and the right-hand limit of the
mean growth rate are different. The right-hand and left-hand limits are as follows:
$$\lim_{h\to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h\to 0^+} \frac{|h|}{h} = \lim_{h\to 0^+} \frac{h}{h} = \lim_{h\to 0^+} 1 = 1,$$
$$\lim_{h\to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h\to 0^-} \frac{|h|}{h} = \lim_{h\to 0^-} \frac{-h}{h} = \lim_{h\to 0^-} (-1) = -1.$$
Since they are different, the limit does not exist, and therefore, it is not differentiable.
(2) The limit is given as follows:
$$\lim_{h\to 0} \frac{f(1 + h) - f(1)}{h} = \lim_{h\to 0} \frac{(1 + h)^2 - 1}{h} = \lim_{h\to 0} \frac{h^2 + 2h}{h} = \lim_{h\to 0} (h + 2) = 2.$$
The limit is 2, so it is differentiable, and the derivative value is 2.
(3) Suppose that the function f is discontinuous at c. Then there exists ε > 0 for
which no δ > 0 satisfies (2.2). In particular, for every natural number n > 0, there
exists $h_n$ with $0 < |h_n| < \frac{1}{n}$ such that
$$|f(c + h_n) - f(c)| \ge \varepsilon.$$
Therefore, for any M > 0, if $n > \frac{M}{\varepsilon}$, then
$$\frac{|f(c + h_n) - f(c)|}{|h_n|} \ge \frac{\varepsilon}{|h_n|} \ge \varepsilon n \ge M.$$
So, even though $h_n$ approaches 0, the mean growth rate can be arbitrarily large in absolute value,
and therefore, the limit does not exist.
Problem 3.3 (Continuity of a differentiable function). If the function f (x) is differentiable at x = c, then show that f (x) is continuous at x = c.
Solution 3.3 There are various ways to prove this, and Problem 3.2(3) is one of
them. Let’s consider at least one more way.
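One possible second argument is sketched here; it is a standard proof and not necessarily the one the author has in mind. It assumes only the product law for limits and writes the increment of f as the mean growth rate times h:

```latex
\begin{align*}
\lim_{h\to 0}\bigl(f(c+h)-f(c)\bigr)
  &= \lim_{h\to 0}\left(\frac{f(c+h)-f(c)}{h}\cdot h\right) \\
  &= \left(\lim_{h\to 0}\frac{f(c+h)-f(c)}{h}\right)\cdot\left(\lim_{h\to 0} h\right) \\
  &= f'(c)\cdot 0 = 0 .
\end{align*}
```

Writing x = c + h, this says that f (x) approaches f (c) as x → c, which is the definition of continuity at c. The second equality is valid because both limits on the right exist, the first one by the assumption that f is differentiable at c.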
3.2 Differentiation Rules
There are many cases where differentiation of functions is necessary. Therefore, it is
crucial to understand and memorize some cases well to differentiate without making
mistakes. In this section, we will learn methods of differentiation by categorizing
them into eight cases.
Question 3.1 (Is a number a function?). If a real number x is substituted into the
function f : R → R, it produces another real number f (x). If a constant function
always provides the same real number 3 for all x ∈ R, the best way to denote it is to
write just 3 instead of f . Numbers are just numbers. However, they can also be used
as a notation for representing functions. Using a single notation with a dual meaning
is very convenient, and we will use such notation in various cases in the future.
Problem 3.4 (Rules of Differentiation). Prove the following rules of differentiation. Also, specify the conditions required for these differentiation rules to hold.
1. The derivative of a constant function is 0.
2. $f(x) = x \Rightarrow f'(x) = 1$.
3. $f(x) = x^n$ for n ∈ N $\Rightarrow f'(x) = n x^{n-1}$.
4. $f(x) = \frac{1}{x} \Rightarrow f'(x) = -x^{-2}$.
5. Sum Rule: $(f + g)'(x) = f'(x) + g'(x)$.
6. Product Rule: $(f g)'(x) = (f' g + f g')(x)$.
7. Quotient Rule: $\left(\frac{f}{g}\right)'(x) = \frac{f' g - f g'}{g^2}(x)$.
8. Power Rule: $f(x) = x^{\alpha}$ for α ∈ R $\Rightarrow f'(x) = \alpha x^{\alpha-1}$.
Solution 3.4 The above 8 Differentiation Rules are the most fundamental rules.
Let’s examine the conditions under which these rules can be applied. The first three
Differentiation Rules do not require any conditions. For (4), the condition x ≠ 0 is
necessary. The condition that the denominator is not zero is necessary in all cases.
(5,6,7) are meaningful only under the condition that both functions f and g are
differentiable. However, (7) requires an additional condition that the denominator
g(x) should not be zero. Power Rule (8) requires careful attention. If α is not a
positive integer, it holds only for x where $x^{\alpha}$ and $x^{\alpha-1}$ are well defined.
Once α − 1 is negative, the condition x ≠ 0 is necessary, and if α is not an integer,
the condition x ≥ 0 is necessary.
Now, let’s prove them. The first four rules are special cases of the Power Rule (8).
(1) If f is a constant function, then f (x) = f (x + h). Therefore,
$$f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \frac{0}{h} = \lim_{h\to 0} 0 = 0.$$
(2) If f (x) = x, then f (x + h) − f (x) = h. Therefore,
$$f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} 1 = 1.$$
(3) If $f(x) = x^n$, then the binomial expansion gives
$f(x + h) - f(x) = (x + h)^n - x^n = n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n$. Therefore,
$$f'(x) = \lim_{h\to 0} \frac{n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n}{h} = n x^{n-1}.$$
(4) If $f(x) = \frac{1}{x}$ and x ≠ 0, then
$$f'(x) = \lim_{h\to 0} \frac{1}{h}\left(\frac{1}{x + h} - \frac{1}{x}\right) = \lim_{h\to 0} \frac{1}{h} \cdot \frac{x - (x + h)}{x(x + h)} = -\frac{1}{x^2}.$$
(5) The differentiation of the sum can be easily shown. If f and g are both differentiable,
$$(f + g)'(x) = \lim_{h\to 0} \frac{f(x + h) + g(x + h) - f(x) - g(x)}{h} = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h\to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x).$$
(6) The differentiation of the product is a bit more challenging but essential. The
technique used in this proof is to add and subtract the same term. Assuming f and g
are both differentiable,
$$\begin{aligned}
\lim_{h\to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h}
&= \lim_{h\to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x) + f(x + h)g(x) - f(x)g(x)}{h} \\
&= \lim_{h\to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x)}{h} + \lim_{h\to 0} \frac{f(x + h)g(x) - f(x)g(x)}{h} \\
&= \lim_{h\to 0} f(x + h)\,\frac{g(x + h) - g(x)}{h} + \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}\, g(x) \\
&= f(x)g'(x) + f'(x)g(x).
\end{aligned}$$
In the last step, $\lim_{h\to 0} f(x + h) = f(x)$ because a differentiable function is continuous (Problem 3.3).
Therefore, the limit exists, and ( f g)′(x) = f (x)g′(x) + f ′(x)g(x) holds.
(7) The differentiation of the fraction can be shown similarly to the product rule, but
a little more attention is required for the fractional form. The condition is that f and
g must each be differentiable, and it holds only for x where the denominator g(x) is
nonzero. You can think of $\frac{f}{g} = f \cdot \frac{1}{g}$. Therefore, it’s fine to calculate the derivative of
$\frac{1}{g}$ first and then use the product rule (the calculation is omitted).
(8) The Power Rule is the most commonly used differentiation rule. If α is not
a positive integer, the proof uses the logarithmic function that will be learned in
Chapter 6. Assuming you already know it, I’ll write it down below. After learning
the logarithmic function, reviewing this part should enhance your understanding.
Let $y = x^{\alpha}$, and take the logarithm of both sides:
$$\ln y = \ln x^{\alpha} = \alpha \ln x.$$
By differentiating both sides with respect to x using the derivative of the logarithmic
function:
$$\frac{y'}{y} = \frac{\alpha}{x} \;\Rightarrow\; y' = \alpha x^{-1} y = \alpha x^{\alpha-1}.$$
□
Problem 3.5. Find the derivatives of the following functions.
(1) $f(x) = 3x^\alpha$.
Solution 3.5 (1) Understand it as the product of the constant function $3$ and the power function $x^\alpha$. Using the product rule:
\[ (3x^\alpha)' = 0 \cdot x^\alpha + 3\alpha x^{\alpha-1} = 3\alpha x^{\alpha-1}. \]
In other words, since the derivative of a constant is 0, you can set the constant coefficient aside, differentiate the remaining function, and then multiply by the constant coefficient. □
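The rules above can be sanity-checked numerically. The following is a minimal sketch, not part of the original text: it compares a central difference quotient with the formulas from the power rule and the product rule. The helper `derivative`, the exponent `alpha`, and the sample point `x0` are hypothetical choices made only for illustration.

```python
def derivative(f, x, h=1e-6):
    """Central difference quotient: a numerical approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

alpha = 2.5  # hypothetical exponent, not a positive integer
x0 = 1.7     # hypothetical sample point with x0 > 0

# Power rule with a constant coefficient: (3 x^alpha)' = 3 alpha x^(alpha - 1)
numeric_power = derivative(lambda x: 3 * x**alpha, x0)
exact_power = 3 * alpha * x0 ** (alpha - 1)

# Product rule (f g)' = f g' + f' g with f(x) = x^2 and g(x) = 1/x
f = lambda x: x**2
g = lambda x: 1 / x
numeric_product = derivative(lambda x: f(x) * g(x), x0)
exact_product = f(x0) * (-1 / x0**2) + (2 * x0) * g(x0)

print(numeric_power, exact_power)
print(numeric_product, exact_product)
```

A numerical agreement of this kind does not prove the rules; it only helps catch sign or coefficient mistakes when applying them.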
3.3 Intermediate and Mean Value Theorem
Theorem 3.1 (Intermediate Value Theorem). Let f : [a, b] → R be a continuous
function, and let m ∈ R be a constant between f (a) and f (b). Then, there exists c in
the open interval (a, b) such that
f (c) = m.
Theorem 3.2 (Mean Value Theorem). Let $f : [a, b] \to \mathbb{R}$ be a continuous function, and assume that $f$ is differentiable for all $x \in (a, b)$. Then, there exists $c \in (a, b)$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
Proof. The proof is carried out using the Intermediate Value Theorem 3.1.
Theorem 3.3 (Cauchy’s Mean Value Theorem). Let $f, g : [a, b] \to \mathbb{R}$ be continuous functions, and assume that both $f$ and $g$ are differentiable for all $x \in (a, b)$. Also, suppose that $g'(x) \neq 0$ for all $x \in (a, b)$. Then, there exists $c \in (a, b)$ such that
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]
Proof. The proof is carried out using the Mean Value Theorem 3.2.
The mean value is one of the intermediate values, so the Mean Value Theorem sounds similar to the Intermediate Value Theorem. However, the Mean Value Theorem above is, in fact, about the mean rate of increase, not the average value of the function. For this reason, it would have been better to call it the Mean Growth Rate Theorem, but the name Mean Value Theorem is already firmly established.
3.4 Derivative of Trigonometric Functions
Trigonometric functions are widely used and will continue to appear frequently.
Remember them well.
Problem 3.6. Prove the following.
(1) $\sin' x = \cos x$ (2) $\cos' x = -\sin x$ (3) $\tan' x = \sec^2 x$
(4) $\cot' x = -\csc^2 x$ (5) $\sec' x = \sec x \tan x$ (6) $\csc' x = -\csc x \cot x$
Solution 3.6 (1) Using the angle addition formula for the sine function, we have:
\[ \frac{\sin(x+h) - \sin x}{h} = \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x \,\frac{\cos h - 1}{h} + \cos x \,\frac{\sin h}{h}. \]
Taking the limit as $h \to 0$, and using $\lim_{h \to 0} \frac{\cos h - 1}{h} = \cos' 0 = 0$ and $\lim_{h \to 0} \frac{\sin h}{h} = \sin' 0 = 1$, we obtain $\sin' x = \cos x$. (2) can be done similarly using the angle addition formula for $\cos x$. The rest are obtained using the quotient rule for differentiation. It is good to memorize these formulas if possible. □
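The two limits used in Solution 3.6 can be observed numerically. The sketch below is only an illustration: the step sizes and the sample point `x0` are arbitrary choices, and `diff_quotient` is a hypothetical helper name.

```python
import math

def diff_quotient(f, x, h):
    """Forward difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

# The two limits used in the proof: (cos h - 1)/h -> 0 and sin h / h -> 1
for h in (1e-1, 1e-3, 1e-5):
    print(h, (math.cos(h) - 1) / h, math.sin(h) / h)

x0 = 0.8  # hypothetical sample point
approx_sin_prime = diff_quotient(math.sin, x0, 1e-6)
print(approx_sin_prime, math.cos(x0))
```

As $h$ shrinks, the first printed column of values approaches 0 and the second approaches 1, which is what the limit argument requires.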
3.5 Velocity and Acceleration
Newton developed calculus as a mathematical tool to explain the motion of planets.
Let’s consider the relationship between position, velocity, and acceleration in one-dimensional space. Let $x : \mathbb{R} \to \mathbb{R}$ be the position function. Here, $x(t)$ represents the position, or $x$-coordinate, of an object moving along a straight line (the $x$-axis) at time $t$. We are using the symbol $x$ with dual meanings: the $x$-coordinate of a point and the function representing the position.
Then, $x(t+h) - x(t)$ is the difference in position over time $h$, the ratio
\[ \frac{x(t+h) - x(t)}{h} \]
is the average velocity, and its limit, the derivative
\[ x'(t) = \lim_{h \to 0} \frac{x(t+h) - x(t)}{h}, \]
is the instantaneous velocity at time $t$. The derivative with respect to time is sometimes denoted as $\dot{x}$ instead of $x'$. If $v$ is used to represent velocity, then $v = \dot{x}$. The instantaneous rate of increase of velocity is acceleration, denoted as $a = \dot{v} = \ddot{x}$.
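As a small illustration of average versus instantaneous velocity, here is a sketch under stated assumptions: the position function $x(t) = 5t - 4.9t^2$ is a hypothetical example (it is not taken from the text), and the function names are chosen only for this snippet.

```python
def position(t):
    """Hypothetical position along a line: x(t) = 5 t - 4.9 t^2."""
    return 5.0 * t - 4.9 * t**2

def average_velocity(t, h):
    """Difference in position divided by the elapsed time h."""
    return (position(t + h) - position(t)) / h

def velocity(t, h=1e-5):
    """Approximate instantaneous velocity by a central difference."""
    return (position(t + h) - position(t - h)) / (2 * h)

def acceleration(t, h=1e-4):
    """Approximate second derivative by a second-order difference."""
    return (position(t + h) - 2 * position(t) + position(t - h)) / h**2

t0 = 0.3
for h in (0.1, 0.01, 0.001):
    print(h, average_velocity(t0, h))  # approaches the instantaneous velocity
print(velocity(t0), acceleration(t0))
```

For this particular $x(t)$, differentiating by hand gives $v(t) = 5 - 9.8t$ and $a(t) = -9.8$, so the printed values can be compared with those formulas.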
Remark 3.2 (Preview of Part II). Space in the universe is three-dimensional. To express the position of a planet, three coordinates $x, y, z$ are needed, each a real-valued function of time: $x(t), y(t), z(t)$. The position can then be expressed as $\mathbf{r}(t) = (x(t), y(t), z(t))$. Velocity and acceleration are the first and second derivatives, respectively, of this position vector function. That is, $\mathbf{v}(t) = \dot{\mathbf{r}}(t) = (\dot{x}(t), \dot{y}(t), \dot{z}(t))$ and $\mathbf{a}(t) = \dot{\mathbf{v}}(t) = \ddot{\mathbf{r}}(t) = (\ddot{x}(t), \ddot{y}(t), \ddot{z}(t))$. For an object with mass $m$ and a force $\mathbf{F}$ acting on it, Newton’s second law of motion is expressed as
\[ \mathbf{F} = m\mathbf{a} = m\dot{\mathbf{v}} = m\ddot{\mathbf{r}}. \qquad \text{(Newton’s Second Law of Motion)} \]
Exercises
1. Find the tangent line at the given point on the graph of the following
functions.
√
(1) y = 4 + x2 at (1, 5) (2) y = x−2 at√(2, 0.25) (3) y = x3 at (4, 8)
x
(4) f (t) = t 3 −t 2 at t = 2 (5) f (x) = t 2 + 1 at x = 2 (6) f (x) = x−2
at x = 1
1
x−2
3
0.5
(7) f (s) = s − s at s = 1 (8) f (t) = t−1 at t = 2 (9) f (x) = x+1 at x = 2
2. Check if the following functions are differentiable at $x = 0$ when $f(0) = 0$.
(1) $f(x) = x \sin(1/x)$ (2) $f(x) = x^2 \sin(1/x)$ (3) $f(x) = \sqrt{|x|}$
3. Calculate the instantaneous rates of increase for the following functions at $x = r$, where $r$ is the radius.
(1) Perimeter of a circle: $2\pi x$ (2) Area of a circle: $\pi x^2$ (3) Surface area of a sphere: $4\pi x^2$ (4) Volume of a sphere: $\frac{4}{3}\pi x^3$
4. Find the intervals where the following functions are defined and points where they are not differentiable.
(1) $\sqrt{|x|^2 + 1}$ (2) $\sqrt{|x| + 1}$ (3) $\frac{x^2 + 1}{x - 1}$ (4) $\frac{1}{\sqrt{x^2 - 1}}$
Lecture 4
Chain rule and implicit differentiation
The chain rule is the highlight of the differentiation rules. It is useful and powerful. The true power of the chain rule can be seen when studying functions with multiple variables and vector-valued functions in Calculus 2. In Calculus 1, we consider the simplest one-dimensional functions as follows:
g : (a2 , b2 ) → R, f : (a1 , b1 ) → R, c ∈ (a2 , b2 ), g(c) ∈ (a1 , b1 ).
What the chain rule states is that if the function g is differentiable at the point c
and the function f is differentiable at g(c), then the composite function f ◦ g is
differentiable at c, and its derivative is given by the following formula:
\[ (f \circ g)'(c) = f'(g(c))\,g'(c). \tag{4.1} \]
Question 4.1. Can you interpret the meaning of the mathematical relationship in
(4.1) in everyday language?
There are various languages in the world, and sometimes mathematics is considered the language of science. For example, the equation (4.1) is expressing something in the language of mathematics. Explaining its meaning in your own language
is the first step in understanding this mathematical expression. If the meaning seems
obvious, it is called intuition. For example, saying g′ (c) = 10 means that when x
increases slightly around c, the function value g increases 10 times more. Similarly,
saying f ′ (g(c)) = 10 means that when x increases slightly around g(c), f increases
10 times more. Therefore, the composite function f (g(x)) increases 100 times more
when x increases slightly around c, and this is the meaning of the chain rule (4.1).
Explaining mathematical language in everyday language is very useful.
4.1 Chain rule
The derivative of a function $f$ is defined as follows:
\[ f'(c) = \lim_{z \to c} \frac{f(z) - f(c)}{z - c}. \]
The right side is the limit of the ratio $\frac{f(z) - f(c)}{z - c}$. If the variable $x$ increases by $\Delta x$ around the point $c$, the function $f(x)$ increases by approximately $\Delta x$ times $f'(c)$. The geometric meaning of the derivative $f'(c)$ is the slope of the tangent line to the graph $y = f(x)$ at the point $(c, f(c))$. Using Leibniz notation, it can be written as:
\[ f' = \frac{df}{dx}. \]
This notation clearly shows that the derivative is the rate of increase of the function $f$ with respect to the increase in the variable $x$.
The Chain Rule is a rule for differentiating composite functions, so let’s consider composite functions first. For two functions
\[ g : (a_1, b_1) \to \mathbb{R} \quad \text{and} \quad f : (a_2, b_2) \to \mathbb{R}, \]
we can define the composite function as follows:
\[ (f \circ g)(x) = f(g(x)), \quad x \in \Omega. \tag{4.2} \]
Problem 4.1. Given the composite function f ◦ g in (4.2), it is generally not defined
for all x ∈ (a1 , b1 ). Why is that? What is the maximum domain Ω that this composite
function can have?
Solution 4.1 If $g(x) \notin (a_2, b_2)$, then $f(g(x))$ is not defined. Therefore, the maximum possible domain of the composite function is $\Omega := \{x \in (a_1, b_1) : g(x) \in (a_2, b_2)\}$. □
For convenience, let’s consider two functions $f$ and $g$ given as follows:
\[ g : (a_1, b_1) \to (a_2, b_2), \quad f : (a_2, b_2) \to \mathbb{R}. \tag{4.3} \]
Then, the composite function $f \circ g : (a_1, b_1) \to \mathbb{R}$ is defined without worrying about the domain.
Problem 4.2. Consider the functions f and g given in (4.3). Assume that g is differentiable at c ∈ (a1 , b1 ) and f is differentiable at g(c) ∈ (a2 , b2 ). In this case, explain
how much the composite function f ◦ g increases when the variable x increases by
△x using the derivatives of f and g.
Solution 4.2 When the variable $x$ changes slightly around $c$, the function $g$ magnifies that change by $g'(c)$. The function $f$, in turn, magnifies the resulting change around $g(c)$ by $f'(g(c))$. Therefore, the composite function $f \circ g$ magnifies the change around $c$ by $f'(g(c))\,g'(c)$. □
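The "magnification" picture in Solution 4.2 can be tried out with numbers. The following sketch assumes a hypothetical pair $g(x) = x^2 + 1$ and $f(u) = \sin u$ and a small step `dx`; none of these choices come from the text.

```python
import math

def g(x):
    return x**2 + 1      # hypothetical inner function

def f(u):
    return math.sin(u)   # hypothetical outer function

c = 0.5
dx = 1e-6                # a small change in x around c

dg = g(c + dx) - g(c)                           # resulting change in g
magnification_g = dg / dx                       # close to g'(c) = 2c
magnification_f = (f(g(c) + dg) - f(g(c))) / dg # close to f'(g(c)) = cos(g(c))
total = (f(g(c + dx)) - f(g(c))) / dx           # magnification of f o g

print(magnification_g, magnification_f, total)
print(math.cos(g(c)) * 2 * c)                   # chain rule prediction
```

The total magnification is, up to rounding, the product of the two individual magnifications, which is exactly what the chain rule formalizes.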
Theorem 4.1 (Chain Rule). Let g : (a1 , b1 ) → (a2 , b2 ) and f : (a2 , b2 ) → R be given
functions. Assume that g is differentiable at c ∈ (a1 , b1 ) and f is differentiable at
g(c) ∈ (a2 , b2 ). Then, the composite function f ◦ g is differentiable at c, and its
derivative is given by the following formula:
\[ (f \circ g)'(c) = f'(g(c))\,g'(c). \tag{4.4} \]
Proof. Even if you understand the meaning of the chain rule intuitively, proving it requires a separate skill, apart from intuition. The skill we will use is called "cancel and multiply"; that is, we multiply and divide by $g(z) - g(c)$:
\[ \frac{f(g(z)) - f(g(c))}{z - c} = \frac{f(g(z)) - f(g(c))}{g(z) - g(c)} \cdot \frac{g(z) - g(c)}{z - c}. \]
(Here we assume $g(z) \neq g(c)$ for $z \neq c$ near $c$, so that the division makes sense; the general case needs a small extra argument.) Since $g$ is differentiable at $c$, $g$ is continuous at $c$. Therefore,
\[ g(z) \to g(c) \quad \text{as} \quad z \to c. \]
Also, since $f$ is differentiable at $g(c)$,
\[ \lim_{z \to c} \frac{f(g(z)) - f(g(c))}{g(z) - g(c)} = \lim_{g(z) \to g(c)} \frac{f(g(z)) - f(g(c))}{g(z) - g(c)} = f'(g(c)). \]
Thus, using the product of limits, we have
\[ \lim_{z \to c} \frac{f(g(z)) - f(g(c))}{z - c} = \lim_{z \to c} \frac{f(g(z)) - f(g(c))}{g(z) - g(c)} \cdot \lim_{z \to c} \frac{g(z) - g(c)}{z - c} = f'(g(c))\,g'(c), \]
and the proof of the chain rule is complete. □
Problem 4.3. For the following cases of $f$ and $g$, calculate the derivative of the composite function $(f \circ g)(x)$ using the chain rule, and compare it with the direct differentiation of the composite function.
(1) $f(x) = x^2$, $g(x) = 2x + 1$ (2) $f(y) = y^2$, $g(x) = 2x + 1$ (3) $f(g) = g^2$, $g(x) = 2x + 1$
(4) $f(x) = x^4$, $g(x) = x^3$ (5) $f(x) = x^{10}$, $g(x) = x^2 + 1$ (6) $f(x) = x^2$, $g(y) = \cos y$
Solution 4.3 (1), (2), and (3) are essentially the same problem. The variable $x$ of $f(x)$ and the variable $x$ of the composite function $(f \circ g)(x)$ are different. Instead, $g(x)$ in $(f \circ g)(x)$ corresponds to the $x$ of $f(x)$. For (5), the chain rule applies directly, whereas expanding the composite function $(x^2 + 1)^{10}$ and differentiating it term by term is impractical. For (6), the chain rule must be used. □
Equation (4.4) is the chain rule written in Newton’s notation. Rewriting it in Leibniz’s notation, it becomes:
\[ \frac{df}{dx} = \frac{df}{dg}\,\frac{dg}{dx}. \tag{4.5} \]
This equation involves a different kind of duality. Previously, $x$ was used as both a variable and a function, but in the notation above, $f$ is used both as a function of $x$ and as a function of $g$. On the left side of (4.5), $\frac{df}{dx}$ means differentiating $f$ as a function of $x$, while on the right side, $\frac{df}{dg}$ means considering $f$ as a function of $g$, and $\frac{dg}{dx}$ means differentiating $g$ as a function of $x$. That is, on the left side, $f$ is actually the composite function of $f$ and $g$, and on the right side, $f$ is treated as a single function with $g$ as its variable. Becoming familiar with this kind of notational duality requires practice. On the other hand, Equation (4.4) does not have such duality.
However, if you write it as (4.5), the Chain Rule looks like canceling fractions. In other words,
\[ \frac{df}{dg}\,\frac{dg}{dx} = \frac{df}{dx}. \]
In this notation, $dg$ is not a number that can be canceled, but the Chain Rule seems to eliminate $dg$ as if canceling a number. The process of canceling is the process of using the Chain Rule.
Question 4.2 (Butterfly effect and chain-reaction). The phenomenon where small
changes lead to significant consequences is commonly referred to as the butterfly
effect. Assuming a phenomenon is given by the composition of n functions, it can
be understood as the following composite function:
H(x) = ( f1 ◦ f2 ◦ · · · ◦ fn )(x) = f1 ( f2 (· · · ( fn (x)) · · · )).
Under what circumstances does the butterfly effect occur? Can the chain rule
explain it?
The butterfly effect occurs when $H'(c)$ is very large. In such cases, a slight change in the variable $x$ around $c$ causes the output $H(x)$ to change significantly. When does such an event happen? By repeatedly applying the chain rule, we have
\[ H'(c) = f_1'(f_2(\cdots(f_n(c))\cdots)) \times f_2'(\cdots(f_n(c))\cdots) \times \cdots \times f_n'(c). \]
Therefore, if there exists a point $c$ where each derivative, $f_n'(c)$, $f_{n-1}'(f_n(c))$, $\cdots$, $f_1'(f_2(\cdots(f_n(c))\cdots))$, takes on large values, then at that point, the butterfly effect is maximized. For example, if $c$ is such that $f_n(c)$ is a point where $f_{n-1}$ has a large derivative, and $f_{n-1}(f_n(c))$ is again a point where $f_{n-2}$ has a large derivative, and this situation continues in a chain reaction, $H'(c)$ can become very large, and that is when the butterfly effect occurs.
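The product formula for $H'(c)$ can be explored with a short computation. The sketch below is an illustration under assumptions not made in the text: every $f_i$ is taken to be the same function, the logistic map $f(x) = 4x(1-x)$, and the starting point `c = 0.3` is arbitrary.

```python
def f(x):
    """Logistic map, used here as a hypothetical choice for every f_i."""
    return 4.0 * x * (1.0 - x)

def f_prime(x):
    return 4.0 - 8.0 * x

def compose(c, n):
    """H(c), where H is f composed with itself n times."""
    value = c
    for _ in range(n):
        value = f(value)
    return value

def compose_derivative(c, n):
    """H'(c) as the chain-rule product of f' evaluated at the inner values."""
    value, product = c, 1.0
    for _ in range(n):
        product *= f_prime(value)  # one factor per layer of the composition
        value = f(value)
    return product

c = 0.3
for n in (1, 5, 10, 20):
    print(n, compose_derivative(c, n))
```

Printing the product for increasing $n$ shows how quickly the factors can accumulate when most of them have absolute value greater than 1.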
Understanding the chain rule and its proof is not enough. You also need to know
how to choose f and g when the situation is given.
Problem 4.4. (1) Find $h'(x)$ when $h(x) = (3x^2 + 1)^2$. (2) Find $h'(x)$ when $h(x) = (3x^2 + 1)^6$. (3) Find $x'(t)$ when $x(t) = \cos(t^2 + 1)$.
Solution 4.4 (1) First, let’s perform this task without using the chain rule. Expanding, we get $h(x) = 9x^4 + 6x^2 + 1$. Then, $h'(x) = 36x^3 + 12x$. Now, let’s use the chain rule. Set the outer function as $f(g) = g^2$ and the inner function as $g(x) = 3x^2 + 1$. Then, $h'(x) = (f \circ g)'(x) = f'(g(x))\,g'(x) = 2(3x^2 + 1) \cdot 6x = 36x^3 + 12x$. The two results match. In this case, using the chain rule seems slightly more complex than directly expanding, but that is true only for simple cases; in most cases, the chain rule is far easier.
(2) Expanding $(3x^2 + 1)^6$ is too much work, so let’s use the chain rule instead. Now, let $f(g) = g^6$ and $g(x) = 3x^2 + 1$. Then, $f'(g) = 6g^5$ and $g'(x) = 6x$. Thus, $h'(x) = 6(3x^2 + 1)^5 \cdot 6x = 36x(3x^2 + 1)^5$.
(3) Let $f(g) = \cos(g)$ and $g(t) = t^2 + 1$. Then, $f'(g) = -\sin(g)$ and $g'(t) = 2t$. Therefore, $x'(t) = f'(g(t))\,g'(t) = -2t \sin(t^2 + 1)$. □
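The three answers of Solution 4.4 can be checked against a difference quotient. This is a minimal sketch: the helper `derivative`, the list name `cases`, and the sample point `x0` are hypothetical and serve only the comparison.

```python
import math

def derivative(f, x, h=1e-6):
    """Central difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# (function, derivative obtained with the chain rule)
cases = [
    (lambda x: (3 * x**2 + 1) ** 2, lambda x: 36 * x**3 + 12 * x),
    (lambda x: (3 * x**2 + 1) ** 6, lambda x: 36 * x * (3 * x**2 + 1) ** 5),
    (lambda t: math.cos(t**2 + 1), lambda t: -2 * t * math.sin(t**2 + 1)),
]

x0 = 0.7  # hypothetical sample point
relative_errors = []
for func, chain_rule_result in cases:
    exact = chain_rule_result(x0)
    approx = derivative(func, x0)
    relative_errors.append(abs(approx - exact) / max(1.0, abs(exact)))

print(relative_errors)
```

Small relative errors indicate that the chosen outer and inner functions were combined correctly; a large error usually points to a forgotten inner derivative $g'(x)$.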
Problem 4.5. (1) Find $h'(x)$ when $h(x) = \sin(x^2 + x)$. (2) Find $h'(t)$ when $h(t) = \tan(5 - \sin(2t))$.
Solution 4.5 (1) In this case, let $f(g) = \sin(g)$ and $g(x) = x^2 + x$. Then, $h'(x) = (f \circ g)'(x) = f'(g(x))\,g'(x) = (2x + 1)\cos(x^2 + x)$.
(2) In this case, let $f(g) = \tan(g)$ and $g(t) = 5 - \sin(2t)$. Then, $g'(t) = -2\cos(2t)$ and $f'(g) = \sec^2(g)$. Therefore, $h'(t) = -2\sec^2(5 - \sin(2t))\cos(2t)$. □
4.2 Implicit Differentiation
Implicit differentiation is one of the most crucial applications of the chain rule. For
example, when there is an equation involving two variables, implicit differentiation
involves considering one variable as a function of the other and finding its derivative.
Although the entire equation may not be viewable as a function, parts of it can be
considered as functions. This method is essential and powerful, encompassing the
core principles of the chain rule. Let’s illustrate this with an example.
Suppose we have two variables, $x$ and $y$, satisfying the following equation:
\[ x^2 + y^2 - 25 = 0. \]
Then, we can treat one of the variables as the independent variable and consider the other as a function of it. For instance, if we choose $x$ as the independent variable, the given equation yields:
\[ y^2 = 25 - x^2 \quad \Rightarrow \quad y = \sqrt{25 - x^2} \quad \text{or} \quad y = -\sqrt{25 - x^2}. \]
Now, the derivative can be calculated for each of the two branches, respectively, as follows:
\[ \frac{dy}{dx} = -\frac{x}{\sqrt{25 - x^2}} \quad \text{or} \quad \frac{dy}{dx} = \frac{x}{\sqrt{25 - x^2}} \quad \text{for } -5 < x < 5. \]
If we want $x$ as a function of $y$, a similar process gives $x = \pm\sqrt{25 - y^2}$.
However, expressing $y$ explicitly as a function of $x$ and then calculating $\frac{dy}{dx}$, as shown above, can be inconvenient. Moreover, in many cases, expressing $y$ as a function of $x$ is not easy or even impossible. We can use a much simpler and more powerful technique called implicit differentiation. Consider, for example,
\[ x^3 + 2y^3 - 9xy = 1. \tag{4.6} \]
In this case, expressing $y$ as a function of $x$ is not straightforward. However, by mentally considering $y$ as a function $y = y(x)$ of $x$ (an implicit function), we can view the equation as:
\[ x^3 + 2(y(x))^3 - 9x\,y(x) = 1. \]
Terms like $2(y(x))^3$ are treated as compositions of two functions. Now, by differentiating both sides with respect to $x$, we obtain:
\[ \frac{d}{dx}\left(x^3 + 2y^3 - 9xy\right) = 3x^2 + 6y^2\frac{dy}{dx} - 9y - 9x\frac{dy}{dx} = 0. \]
In this calculation, we used the chain rule and the product rule for differentiation. Solving this relation for $\frac{dy}{dx}$, we get:
\[ \frac{dy}{dx} = \frac{9y - 3x^2}{6y^2 - 9x}. \tag{4.7} \]
Problem 4.6. Given that $x$ and $y$ satisfy (4.6), calculate $\frac{dy}{dx}$ at the point $x = 1$.
Solution 4.6 To substitute $x = 1$ into the derived equation (4.7), we need to determine what to substitute for $y$. Substituting $x = 1$ into (4.6), we obtain $2y^3 - 9y = 0$. Solving this cubic equation, we find $y = 0$ and $y = \pm\frac{3}{\sqrt{2}}$. Substituting these three possible values of $y$ into (4.7), we get three potential derivative values. (Take a moment to think: a function should have only one value for each $x$, so having three values indicates that $y$ is not globally a function of $x$. It is, however, a function locally near each of these points, and implicit differentiation finds all three derivative values.) □
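The three derivative values mentioned in Solution 4.6 can be computed directly from formula (4.7). The sketch below only evaluates that formula; the function names `curve` and `slope` are hypothetical.

```python
import math

def curve(x, y):
    """Left side minus right side of (4.6); zero on the curve."""
    return x**3 + 2 * y**3 - 9 * x * y - 1

def slope(x, y):
    """dy/dx from formula (4.7)."""
    return (9 * y - 3 * x**2) / (6 * y**2 - 9 * x)

x0 = 1.0
y_values = [0.0, 3 / math.sqrt(2), -3 / math.sqrt(2)]  # roots of 2y^3 - 9y = 0

residuals = [curve(x0, y) for y in y_values]   # should all be (nearly) zero
slopes = [slope(x0, y) for y in y_values]

for y, s in zip(y_values, slopes):
    print(y, s)
```

Each printed pair is a point $(1, y)$ on the curve together with the slope of the tangent line there, one value per local branch.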
Problem 4.7. Let $x$ and $y$ satisfy $x^2 + y^2 = 25$. Using implicit differentiation, find $\frac{dx}{dy}$. What is the value at the point $(5, 0)$?
Solution 4.7 Consider $x$ as a function of $y$ this time. Then, $\frac{d}{dy}(x^2 + y^2) = 2x\frac{dx}{dy} + 2y = 0$. Therefore,
\[ \frac{dx}{dy} = -\frac{y}{x} = -\frac{y}{\pm\sqrt{25 - y^2}}. \]
Substituting $(5, 0)$, we get
\[ \left.\frac{dx}{dy}\right|_{x=5,\,y=0} = 0. \]
In the given equation, $x$ and $y$ play symmetric roles. (Even if we swap them, the equation remains the same.) Therefore, $\frac{dx}{dy}$ has the same form as $\frac{dy}{dx}$ with $x$ and $y$ interchanged. □
Problem 4.8. There is a circle passing through the point $(3, -4)$ with the origin as the center. Find the slope of the line tangent to the circle at this point.
Solution 4.8 The radius of the circle is $\sqrt{3^2 + (-4)^2} = 5$, and the circle satisfies $x^2 + y^2 = 25$. Performing implicit differentiation with respect to $x$, we get
\[ 2x + 2y\frac{dy}{dx} = 0. \]
Therefore, the slope of the tangent at the point $(3, -4)$ is $\frac{dy}{dx} = -\frac{x}{y} = -\frac{3}{-4} = \frac{3}{4}$. □
Problem 4.9. There is a circle centered at the origin passing through the point $(5, 0)$. Find the slope of the line tangent to the circle at this point.
Solution 4.9 The radius of the circle is 5. Therefore, it satisfies $x^2 + y^2 = 25$. Performing implicit differentiation with respect to $x$, we get:
\[ 2x + 2y\frac{dy}{dx} = 0. \]
Substituting the values at $(5, 0)$, we get $\frac{dy}{dx} = -\frac{x}{y} = -\frac{5}{0}$. However, something is wrong: there is no numerical output. In this case, we cannot view $y$ as a function of $x$ near the given point $(5, 0)$, and we cannot calculate $\frac{dy}{dx}$. Geometrically, this corresponds to a vertical tangent line. Instead of saying the slope is infinite, it is more accurate to say that $y$ is not differentiable at this point. □
Question 4.3. We considered the case of one equation with two variables. Then, (1) What happens if there is one equation with three variables? (2) What happens if there are two equations with three variables?
In Calculus 2, we will explore these cases. However, in case (1), it is impossible to consider two variables as functions of the third variable. Why? In case (2), it is possible to consider two variables as functions of the third variable. Why? Here, we are talking about general possibilities, and it does not mean it is always possible. There are cases, as in Problem 4.9, where differentiation is not possible, or where $y$ cannot be viewed as a function of $x$. Try to answer these questions on your own.
Exercises
1. Use the chain rule to find
composite function f ◦ g.
√ the derivative of the√
(2) f (x) = x2 − 1, g(x) = sin x
(1) f (u) = u2 , g(x) = x + 1
(3) f (u) = u12 , g(t) = t 3 − t
(4) f (u) = sec u, g(t) = cost
√
(5) f (x) = x, g(u) = cos u
(6) f (s) = 2s2 , g(u) = 5u − 1
2. Use the chain rule to find
√ the derivative of the3following functions.
(3) cos x
(1) (2x + 1)3
(2) t 3 − 2t + 1
1
−3
(4) 3(cos x)
(5) sin(3πx) + cos(2x2 )
(6)
cost + sint
3. Use the chain rule to find the
derivative
of
the
following
functions.
q
p
√
4
2
(3) sin3 (cos2 t)
(1) 1 + tan (t )
(2) 2t + 1 + 1 − t
4. Use implicit differentiation to find $\frac{dy}{dx}$.
(1) $x^2 + y^2 = 4$ (2) $x + y^2 = 1$ (3) $\sin x + y^2 = 1$
(4) $xy^2 + x^2 = 3$ (5) $xy = \sin(xy)$ (6) $(2xy + y^2)^2 = x^2 - y^2$
dy
5. Use implicit differentiation to find dx
and dx
dy at the given points.
2
2
2
(1) x + y = 4 at x = 1
(2) y − x + x = 4 at x = 1
(3) xy + x2 + y2 = 1
2
at x = 1
(4) x + y − 2y = 4 at x = 1
6. Use the results from the above calculations to find the product of $\frac{dy}{dx}$ and $\frac{dx}{dy}$ at the same points.
Lecture 5
Integration & fundamental theorem of calculus
In the third lecture, we studied differentiation, and in the fourth lecture, we explored
differentiation techniques. Now, we delve into integration. Integration involves the
process of going back to the function before differentiation. To master integration,
practice is essential. The techniques of integration are covered in more detail in Part
III.
5.1 Antiderivative
For a given function f (x), an antiderivative refers to a function that, when differentiated, results in f (x). In other words,
F ′ (x) = f (x)
defines a function F(x) as the antiderivative of f (x). The process of finding antiderivatives involves reversing the steps of differentiation. However, finding actual
antiderivatives can be challenging. Problem 3.3 provides rules of differentiation,
which, when applied in reverse, can help find antiderivatives in certain cases.
It is crucial to note that an antiderivative is not unique; any constant can be added to it. Since the derivative of any constant is zero, all constant functions are antiderivatives of 0; that is, $F(x) = C$ is an antiderivative of 0 for any constant $C \in \mathbb{R}$. Here, $C$ is called a generic constant. Because it stands for an arbitrary constant, an expression like $3C$ can be rewritten as $C$ for simplicity.
Let’s consider the following problems.
Problem 5.1. Find the antiderivatives of the following functions.
(1) f (x) = 0.
(2) f (x) = 1.
(3) $f(x) = x^\alpha$, where $\alpha \neq -1$.
(4) $f(x) = x^{-1}$.
Solution 5.1 We interpret the instruction as asking for all antiderivatives.
(1) Since the derivative of a constant is 0, all constant functions are antiderivatives of 0. Thus, $F(x) = C$ for any constant $C \in \mathbb{R}$. Here $C$ is a generic constant, so an expression such as $3C$ can be rewritten as $C$.
(2) Let $F(x) = x$; then, $F'(x) = 1$. Therefore, $F(x) = x$ is an antiderivative of $f(x) = 1$. However, since the derivative of any constant is 0, all functions of the form $F(x) = x + C$, where $C$ is any constant, are antiderivatives of $f(x) = 1$.
(3) The derivative of $x^\alpha$ reduces the power by one ($(x^\alpha)' = \alpha x^{\alpha-1}$). So, the antiderivative should increase the power by one, adjust the coefficient, and add a constant. The antiderivative is $F(x) = \frac{1}{\alpha+1}x^{\alpha+1} + C$, but it’s valid only when $\alpha \neq -1$.
(4) For $\alpha = -1$, what is the antiderivative of $f(x) = x^{-1}$? This case is special and holds significant importance. The antiderivative and its inverse function might be among the most crucial functions in mathematics.
Apart from the cases mentioned above, practicing finding antiderivatives of various functions is necessary.
The antiderivatives obtained in Problem 5.1 always include an additional constant
term C. This constant is referred to as a generic constant and automatically appears
to encompass all possible antiderivatives.
Question 5.1 (Antiderivative of 0). Are there really no antiderivatives other than those found above? In other words, by adding the generic constant $C$, do we obtain all possible antiderivatives?
This question is equivalent to asking whether every antiderivative of 0 is a constant function. The reason is that if $F$ and $G$ are different antiderivatives of the function $f$, then $(F - G)' = F' - G' = f - f = 0$. This means the difference $F - G$ is an antiderivative of 0; hence, if constant functions are the only antiderivatives of 0, then $F$ and $G$ differ only by a constant.
So, is there a non-constant function whose derivative is zero everywhere? This question sounds somewhat philosophical and mysterious, but it can be settled using the Mean Value Theorem. First of all, an antiderivative is a differentiable function, which allows us to apply the theorem. Suppose a differentiable function $F$ satisfies $F'(x) = 0$ for all $x$ in an interval. If $F$ is not a constant function, then there exist two distinct points $x \neq y$ such that $F(x) \neq F(y)$, and hence, their average rate of change is $\frac{F(x) - F(y)}{x - y} \neq 0$. Therefore, by the Mean Value Theorem, there exists a point $c$ between $x$ and $y$ with this average rate of change, i.e., $F'(c) = \frac{F(x) - F(y)}{x - y}$. This contradicts the fact that $F$ is an antiderivative of zero, since $F'(c)$ should be zero. Therefore, the only antiderivatives of zero are constants.
Problem 5.2.
Solution 5.2
5.2 Integral as the area bounded by a graph
Let $f : \mathbb{R} \to \mathbb{R}$ be a given function, and define $F(t)$ as the area between the $x$-axis and the graph of $f$ over the interval $[a, t]$, where $a$ is a fixed constant and $t > a$. If the area lies below the $x$-axis (where $f < 0$), count it with a negative sign. Treat $a$ as a fixed constant and $t$ as a variable. This area function is typically represented using the integral symbol as
\[ F(t) = \int_a^t f(x)\,dx. \]
Now, let’s examine the derivative of this area function. The derivative is given by
\[ F'(t) = \lim_{h \to 0} \frac{F(t+h) - F(t)}{h}. \]
Problem 5.3. If $F(t)$ is the area function corresponding to the given function $f(t)$, prove the following:
\[ F'(t) = f(t). \tag{5.1} \]
However, this equation is not always satisfied. Provide the conditions on the function $f$ for which (5.1) holds.
Solution 5.3 Consider F(t + h) − F(t) as the area represented by the pink region in
the graph. Dividing it by the width h, as h approaches 0, the height approaches the
value f (t). To show convergence, the function f needs to be continuous at t. This
proof uses everyday language; in Problem 5.4, the ε-δ method is used for a more
formal proof.
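The statement $F'(t) = f(t)$ can also be observed numerically. In the sketch below, the area is approximated by a midpoint sum, which is an assumption of this snippet rather than the definition used in the text, and $f = \cos$, $a = 0$, $t_0 = 1$ are hypothetical choices.

```python
import math

def area(f, a, t, n=2000):
    """Signed area under f over [a, t], approximated by a midpoint sum."""
    width = (t - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width

f = math.cos   # hypothetical continuous function
a = 0.0
t0 = 1.0
h = 1e-3

# Thin strip F(t0 + h) - F(t0), divided by its width h
quotient = (area(f, a, t0 + h) - area(f, a, t0)) / h
print(quotient, f(t0))
```

The quotient is the average height of the thin strip between $t_0$ and $t_0 + h$, so for a continuous $f$ it stays close to $f(t_0)$ when $h$ is small.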
Question 5.2 (Definition of the area function). To proceed with the discussion, a
natural question arises. Given a function f (x), is the area function well-defined?
First, you need to decide how to calculate the area, and based on that method,
it will determine whether the area function of a certain function is well-defined.
This is a question of how to perform integration. We will determine the area using
the method of Riemann sum for definite integrals. However, regardless of how you
define the area, some functions cannot have a well-defined area function. There are
two main reasons for this: when the area diverges to infinity, or when the function
is too irregular and impossible to calculate. In introductory calculus courses, only
functions that are continuous except for a finite number of points and have finite
values on all finite intervals are considered. Under these conditions, the function is
bounded, and the area does not diverge to infinity. Due to continuity, the function is
not too irregular, allowing for the calculation of the area.
Problem 5.4 (Derivative of the area function). Assuming the area function is well-defined and $f(x)$ is continuous, use the ε-δ method to prove (5.1).
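Solution 5.4 Given an arbitrary $\varepsilon > 0$, since $f$ is continuous at $x = t$, there exists a $\delta > 0$ such that $|f(t) - f(t+h)| < \varepsilon$ for all $|h| < \delta$. Therefore, for $0 < h < \delta$, the area $F(t+h) - F(t)$ lies between $(f(t) - \varepsilon)h$ and $(f(t) + \varepsilon)h$, so
\[ \frac{F(t+h) - F(t)}{h} \le \frac{(f(t) + \varepsilon)h}{h} = f(t) + \varepsilon \]
and
\[ \frac{F(t+h) - F(t)}{h} \ge \frac{(f(t) - \varepsilon)h}{h} = f(t) - \varepsilon. \]
The same bounds hold for $-\delta < h < 0$. Hence, for $0 < |h| < \delta$, we have
\[ \left| \frac{F(t+h) - F(t)}{h} - f(t) \right| \le \varepsilon. \]
Thus, $F$ is differentiable at $t$, and $F'(t) = f(t)$.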
5.3 Riemann sum and area
In this section, we define the Riemann integral using the method of partition sums and use it to define the integral $\int_a^b f(x)\,dx$. Let $f : [a, b] \to \mathbb{R}$ be a continuous function defined on the closed interval $[a, b]$. First, consider a set of points,
\[ \pi = \{x_0, x_1, \cdots, x_p\}, \]
called a partition of the closed interval $[a, b]$, satisfying the following conditions:
\[ a = x_0 < x_1 < x_2 < \cdots < x_p = b. \]
Although the symbol $\pi$ is commonly used for the mathematical constant pi, here it is used to represent a partition. A partition can consist of many points, and another partition may have fewer points, but they all must start at $x_0 = a$ and end at $x_p = b$. Furthermore, they must be correctly ordered. The size of the $k$-th subinterval is represented as follows:
\[ \Delta x_k = x_k - x_{k-1}, \quad k = 1, \cdots, p. \]
The gauge of the partition $\pi$ is defined as the maximum size of the subintervals and is denoted as:
\[ \|\pi\| := \max_{1 \le k \le p} \Delta x_k. \]
After that, for each subinterval, choose a point $c_k \in [x_{k-1}, x_k]$, and define the Riemann sum as follows:
\[ \sum_{k=1}^{p} f(c_k)\,\Delta x_k, \quad c_k \in [x_{k-1}, x_k]. \qquad \text{(Riemann Sum)} \]
Let’s understand the Riemann sum through some examples.
Problem 5.5. Calculate the Riemann sum for the given functions and partitions. Choose the points $c_k$ in each subinterval for ease of calculation.
1. For the constant function $f(x) = c$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area.
2. For the function $f(x) = x$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area. Also, find the limit of the Riemann sum as $n \to \infty$.
3. For the function $f(x) = x^2$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area.
Solution 5.5
⊔
⊓
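The Riemann sums of Problem 5.5 can be computed with a few lines of code. The following is a sketch, not the author's solution: it uses the uniform partition $\{i/n\}$ from the problem and assumes the right endpoint (or, optionally, the left endpoint) as the sample point $c_k$; the function name and the `choose` parameter are hypothetical.

```python
def riemann_sum(f, n, choose="right"):
    """Riemann sum of f on [0, 1] for the partition {i/n : i = 0, ..., n}."""
    total = 0.0
    for k in range(1, n + 1):
        left, right = (k - 1) / n, k / n
        c_k = right if choose == "right" else left  # sample point in [x_{k-1}, x_k]
        total += f(c_k) * (right - left)            # f(c_k) * delta x_k
    return total

for n in (10, 100, 1000):
    print(n,
          riemann_sum(lambda x: 3.0, n),    # constant function with c = 3
          riemann_sum(lambda x: x, n),      # compare with the area 1/2
          riemann_sum(lambda x: x**2, n))   # compare with the area 1/3
```

With right endpoints, the sums for $f(x) = x$ and $f(x) = x^2$ equal $\frac{n+1}{2n}$ and $\frac{(n+1)(2n+1)}{6n^2}$, which can be used to check the printed values by hand.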
Definition 5.1. (1) A number $I$ is called the integral of $f$ over $[a, b]$ if
\[ \lim_{\|\pi\| \to 0} \sum_{k=1}^{p} f(c_k)\,\Delta x_k = I \]
for any choice of $c_k \in [x_{k-1}, x_k]$. We denote $I = \int_a^b f(x)\,dx$.
(2) If the limit exists, $f$ is called integrable.
(3) We denote $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.
Theorem 5.1. (1) If f : [a, b] → R is continuous, then it is integrable. (2) If f :
[a, b] → R is continuous except for a finite number of points and is bounded, then it
is integrable.
Some regularity condition of this kind is needed for integrability: a bounded function that is discontinuous everywhere may fail to be integrable. The following problem illustrates this point.
Problem 5.6. Show that the function $f : \mathbb{R} \to \mathbb{R}$ defined as follows is not integrable on any finite interval $[a, b]$.
\[ f(x) = \begin{cases} 1, & x \text{ is a rational number}, \\ 0, & x \text{ is an irrational number}. \end{cases} \]
Solution 5.6 Every subinterval $[x_{k-1}, x_k]$ contains both rational and irrational numbers. Therefore, choosing every $c_k$ to be a rational number makes the Riemann sum equal to $\sum_{k=1}^{p} \Delta x_k = b - a$, and choosing every $c_k$ to be an irrational number makes the Riemann sum equal to 0. According to the definition, the Riemann sum should converge to the same number $I$ regardless of how $c_k$ is chosen, which is not the case. □
Problem 5.7. Show the following properties of integrals for functions $f$ and $g$ defined on the interval $[a, b]$:
1. $\int_a^a f(x)\,dx = 0$.
2. $\int_a^b k f(x)\,dx = k \int_a^b f(x)\,dx$ for any constant $k$.
3. $\int_a^b (f + g)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
4. $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$ for any $c \in [a, b]$.
5. If $f(x) \le g(x)$ for all $x$ in $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
Solution 5.7 □
Now let’s introduce the fundamental theorem of calculus in two forms.
Theorem 5.2 (Fundamental Theorem of Calculus). Let $f$ be a continuous function on the closed interval $[a, b]$, and let $F(x) = \int_a^x f(t)\,dt$. Then, $F(x)$ is differentiable, and $F'(x) = f(x)$.
Proof. To prove this theorem, we need to make good use of the properties of integrals, which we accept without proof. □
The Fundamental Theorem of Calculus states that if a function is continuous, then
it is integrable, and if we integrate it and then differentiate the result, we get back to
the original function.
Theorem 5.3 (Cauchy’s Fundamental Theorem of Calculus). Let f be a continuous function on the closed interval [a, b], and let F′(x) = f(x). Then, $\int_a^b f(x)\,dx = F(b) - F(a)$.
Proof. Let’s prove this using the first version of the Fundamental Theorem. Let
$G(x) = \int_a^x f(t)\,dt$. Then, G is also an antiderivative of f. Therefore, (F − G)′(x) =
F′(x) − G′(x) = f(x) − f(x) = 0. Since F − G is an antiderivative of 0, it is a constant. Thus, there exists a constant C such that F − G = C. Consequently, F = G + C
and, since G(a) = 0,
\[ F(b) - F(a) = G(b) - G(a) = \int_a^b f(x)\,dx. \]
⊔
⊓
Cauchy’s version shows a way to calculate integrals or areas. Once we find an
antiderivative F of f, we can calculate F(b) − F(a) to obtain the integral value
$\int_a^b f(x)\,dx$. If finding an antiderivative is easy, then using it is preferable. However,
in some cases, finding an antiderivative is not straightforward, and in such cases, it
is much easier to use Riemann sums (partition integrals).
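The two routes to an integral can be compared numerically. The sketch below is a hedged illustration, not part of the original text: the helper names `midpoint_sum` and `ftc_value` and the example $f(x) = x^2$ on [0, 2] with antiderivative $F(x) = x^3/3$ are assumptions chosen here. It evaluates F(b) − F(a) and a midpoint Riemann sum side by side.

```python
def midpoint_sum(f, a, b, p=10_000):
    """Riemann sum with c_k chosen as the midpoint of each of p equal subintervals."""
    dx = (b - a) / p
    return sum(f(a + (k + 0.5) * dx) for k in range(p)) * dx

def ftc_value(F, a, b):
    """Integral value F(b) - F(a) from an antiderivative F (Theorem 5.3)."""
    return F(b) - F(a)

def f(x):
    return x ** 2

def F(x):
    # An antiderivative of f, since F'(x) = x^2.
    return x ** 3 / 3

exact = ftc_value(F, 0.0, 2.0)
approx = midpoint_sum(f, 0.0, 2.0)
print(exact, approx)
```

When no antiderivative is available in closed form, only the second route remains, which is the situation the paragraph above describes.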
Problem 5.8. Use the Fundamental Theorem of Calculus (FTC) to calculate the
following integrals.
(1) $\int_0^a x^2\,dx$
(2) $\int_0^a x^{\alpha}\,dx$
(3) $\int_0^a (x^2 + x^4)\,dx$
(4) $\int_1^2 x^{-1}\,dx$
Solution 5.8 ⊔
⊓
Exercises
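1. For the partition π of the interval [a, b] and $c_k \in [x_{k-1}, x_k]$, find the following
limits and determine for which intervals [a, b] they converge.
(1) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} 2c_k^2\,\triangle_k$
(2) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} (3c_k + c_k^2)\,\triangle_k$
(3) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} \sin c_k\,\triangle_k$
(4) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} \tan c_k\,\triangle_k$
(5) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} c_k^{-1}\,\triangle_k$
(6) $\lim_{\|\pi\|\to 0} \sum_{k=1}^{p} c_k^{0.5}\,\triangle_k$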
2. Find the antiderivatives of the following functions. (The variable used doesn’t
make a difference.)
(1) $f(x) = x^2$
(2) $f(y) = y^{-2}$
(3) f (t) = −|t|
(4) f (t) = 3t − 1
(5) f (y) = sin y
(6) $f(x) = x^{-1}$
3. Evaluate the following integrals.
(1) $\int_0^2 (x^2 - x - 1)\,dx$
(2) $\int_1^2 x^{100}\,dx$
(3) $\int_{-2}^{3} |t|\,dt$
(4) $\int_1^3 \frac{y^2 - y^4}{y^2}\,dy$
(5) $\int_0^{2\pi} \frac{\sin 2x}{\sin x}\,dx$
(6) $\int_0^{2\pi} (\cos x + |\cos x|)\,dx$
4. Simplify the following.
(1) $\frac{d}{dx} \int_0^x t^2\,dt$
(2) $\frac{d}{dt} \int_2^{t^2} f(x)\,dx$
(3) $\frac{d}{dx} \int_{\sqrt{x}}^{1} \sin t\,dt$
(4) $\frac{d}{dt} \int_1^{\sqrt{t}} x^{-2/3}\,dx$
5. Find the area between $y = x^2$ and $y = 1$.
6. Find the area between $y = \sqrt{x}$, $x = 1$, and $y = 0$.
Lecture 6
Inverse functions and their derivatives
In this lecture, we will learn about the logarithmic function, which is a very important function. We will also learn about its inverse function, the exponential function.
We will cover basic mathematical concepts such as function definitions, when inverse functions exist, and use the chain rule to calculate the derivatives of inverse
functions.
6.1 Bijection (one-to-one and onto function)
If a function f takes set A as its domain and set B as its co-domain, it is denoted
as f : A → B. The set of all values taken by f, which is the subset of B defined as
R(f) := {y ∈ B | there exists x ∈ A such that y = f(x)}, is called the range.
A function must have exactly one value for each element in its domain. It cannot
have more than one or zero values for a given element.
Definition 6.1. The mapping f : A → B is called a function if, for each x ∈ A, there
exists a unique y ∈ B such that f(x) = y.
Defining the inverse function is straightforward – it is defined as the reverse.
However, it is important to note that the inverse function must also be a function,
which imposes certain conditions.
Definition 6.2. A function f : A → B is called one-to-one (or an injection) if any value
y ∈ B is taken at most once. In other words, f(x1) ≠ f(x2) if x1 ≠ x2. A function
f : A → B is called onto (or a surjection) if any value y ∈ B is taken at least once, i.e.,
R(f) = B. The function f is called a bijection if it is one-to-one and onto.
For one-to-one and onto functions, we can define the inverse function.
Definition 6.3. If f : A → B is one-to-one and onto, we may define a function g :
B → A such that g(y) is the preimage of f for y, i.e., f (g(y)) = y. This function g is
called the inverse function of f . We denote the inverse function of f as f −1 .
Problem 6.1. Prove that if the function f : A → B is a bijection, then the inverse
function f −1 is defined.
Solution 6.1 What does it mean to prove that the inverse function is defined? It
means showing that the inverse relation is indeed a function. To demonstrate that
the inverse g is a function, we need to show that, for every y ∈ B, the value g(y)
exists and is unique. Since f is onto, for every y ∈ B, there exists an x ∈ A such that
f(x) = y, so g(y) has at least one value. If g(y) had two distinct values x1 ≠ x2, then
f(x1) = f(x2) = y, contradicting that f is one-to-one. Therefore, g(y) has a unique value
for every y ∈ B, and thus, it is a function. ⊔
⊓
Question 6.1. If f (x) is a bijection, how are the graphs of f (x) and its inverse function f −1 (x) related? They are symmetric with respect to the line y = x. Can you
easily explain why this symmetry exists?
The graph of the function f depicts points satisfying y = f(x) on the coordinate
plane. The same points satisfy $x = f^{-1}(y)$. Therefore, the two graphs are the same set of points.
However, since the variable is typically named x, we swap x and y. This swap reflects the graph across the line y = x and gives the third figure.
How do we determine the inverse function for a non-bijective function? We select appropriate subsets of A and B to make the function one-to-one and onto. This
process is called choosing a branch. Different branches yield different inverse functions. When selecting a branch, it is customary to make the domain as large as possible.
Problem 6.2. Find sets A, B ⊂ R to make the following function f : A → B a bijection. Find its inverse function.
(1) $f(x) = \frac{1}{2}x + 1$
(2) $f(x) = x^2$
Solution 6.2 (1) This function is one-to-one and onto when we set A = R and
B = R. To find the inverse function, we set f(x) = y and then rearrange
the equation to solve for x as x = g(y):
\[ \frac{1}{2}x + 1 = y \;\Rightarrow\; x = 2(y - 1) \;\Rightarrow\; g(y) = 2(y - 1). \]
(2) f is not one-to-one on R. To obtain a one-to-one function, we can choose
A = [0, ∞) or A = (−∞, 0]. Most people would choose A = [0, ∞), and we
will also adopt that. Setting B = [0, ∞) as well and considering f as a function
restricted to the branch f : A → B, an inverse function exists. In this case,
\[ y = x^2 \;\Rightarrow\; x = \sqrt{y} \;\Rightarrow\; g(y) = \sqrt{y}, \quad y \ge 0. \]
Thus, the inverse function is $g(y) = \sqrt{y}$. ⊔
⊓
6.2 Derivative of inverse functions
The inverse function of a function f : A → B is typically denoted as $f^{-1}$. Thus,
$f^{-1} : B \to A$. If we denote the variable of the domain by x ∈ A and the variable
of the co-domain by y ∈ B, then we write f(x) and $f^{-1}(y)$. This distinction helps
reduce confusion.
Calculating the derivative of the inverse function is best approached using the
chain rule, which states that if f is differentiable at c and g is differentiable at f (c),
then the composition g ◦ f is differentiable at c, and its derivative is
(g ◦ f )′ (c) = g′ ( f (c)) f ′ (c).
Let’s rewrite this using Leibniz notation, distinguishing the variables x and y: If we
set y = f (x) and g = g(y) = g( f (x)), we may denote
\[ \frac{dg}{dx} = \frac{dg}{dy}\,\frac{dy}{dx} = g'(y)\,f'(x) = g'(f(x))\,f'(x). \]
If g is the inverse function of f , then g( f (x)) = x. Thus,
(g ◦ f )(x) = x ⇒ (g ◦ f )′ (x) = 1.
Applying the chain rule, we get
\[ g'(f(x))\,f'(x) = 1 \;\Rightarrow\; g'(f(x)) = \frac{1}{f'(x)}. \]
As mentioned earlier, this rule does not hold when the denominator is zero. Therefore, we obtain the following derivative of the inverse function.
Theorem 6.1 (Derivative of Inverse Functions). If f : A → B is a bijection, differentiable at c ∈ A, and f′(c) ≠ 0, then the inverse function $f^{-1}$ is differentiable at
f(c), and its derivative is given by
\[ (f^{-1})'(f(c)) = \frac{1}{f'(c)}. \]
Proof. The previous application of the chain rule to calculate the derivative of the inverse function
was not a complete proof. We assumed the differentiability of the inverse function and used the chain rule accordingly. To complete the proof, we must demonstrate that the inverse function is differentiable. Two approaches are possible. Firstly,
knowing what the derivative should be, we show that the difference quotient converges to it. Secondly, using
the graph, we observe that the inverse function’s graph is obtained by reflecting the original function’s graph across the line y = x. The slope
becomes $1/f'(c)$ after the reflection. ⊔
⊓
To calculate the derivative of the inverse function, one can either find the inverse
function explicitly and then differentiate it or use the formula provided above. Let’s
try both approaches in the following problem and compare the results.
Problem 6.3. Find the inverse function and its derivative, then compare with the
derivative formula.
(1) $f(x) = x^2$
(2) $f(x) = x^3 - 2$ for finding $(f^{-1})'(6)$.
Solution 6.3 (1) To find the inverse function, start with $y = x^2$. Since x, y ≥ 0, we
have $x = \sqrt{y}$. Therefore, the inverse function is $f^{-1}(y) = \sqrt{y} = y^{1/2}$. Differentiating,
we get
\[ (f^{-1})'(y) = \frac{1}{2}\,y^{-1/2} = \frac{1}{2\sqrt{y}}. \]
Now, using the derivative formula for inverse functions,
\[ (f^{-1})'(y) = (f^{-1})'(f(x)) = \frac{1}{f'(x)} = \frac{1}{2x}. \]
Do these two results match? Indeed, since $x = \sqrt{y}$, we can substitute to verify.
(2) Let’s repeat the process. Set $y = x^3 - 2$, and rewriting gives $x = (y + 2)^{1/3}$.
Hence, $f^{-1}(y) = (y + 2)^{1/3}$, and its derivative is $(f^{-1})'(y) = \frac{1}{3}(y + 2)^{-2/3}$. Using
the derivative formula for inverse functions,
\[ (f^{-1})'(y) = (f^{-1})'(f(x)) = \frac{1}{f'(x)} = \frac{1}{3x^2}. \]
These two results match. Now, evaluate $(f^{-1})'(6)$. In the first case, substituting
y = 6 gives $(f^{-1})'(6) = \frac{1}{3}\,8^{-2/3} = \frac{1}{12}$. In the second case, we first confirm that x = 2
when y = 6, and then evaluate,
\[ (f^{-1})'(6) = \frac{1}{f'(2)} = \frac{1}{3 \times 2^2} = \frac{1}{12}. \]
The results match. ⊔
⊓
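The value $(f^{-1})'(6) = 1/12$ can also be checked numerically. The following is a minimal sketch under stated assumptions: the function names and the central-difference step `h` are hypothetical choices made here, and the cube-root expression is only used where y + 2 ≥ 0. It compares the formula of Theorem 6.1 with a finite-difference slope of the explicit inverse.

```python
def f(x):
    return x ** 3 - 2

def f_prime(x):
    return 3 * x ** 2

def f_inverse(y):
    # Explicit inverse x = (y + 2)^(1/3); used here only for y + 2 >= 0.
    return (y + 2) ** (1.0 / 3.0)

def inverse_derivative_formula(y):
    """(f^{-1})'(y) = 1 / f'(x) with x = f^{-1}(y), as in Theorem 6.1."""
    x = f_inverse(y)
    return 1.0 / f_prime(x)

def central_difference(g, y, h=1e-5):
    """Symmetric difference quotient approximating g'(y)."""
    return (g(y + h) - g(y - h)) / (2 * h)

print(inverse_derivative_formula(6.0))
print(central_difference(f_inverse, 6.0))
```

Both printed numbers should be close to 1/12; the second one carries a small discretization error from the finite step.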
In many cases, finding the inverse function explicitly requires extensive computation and may not be possible. Even if it is possible, it is often more desirable to
understand and use the derivative of the inverse function effectively.
6.3 Natural logarithm
In the previous lecture, we were able to find the antiderivative of the power function
$x^{\alpha}$ for all cases except when α = −1. So, what about the antiderivative of $x^{-1}$? We
define it as the area function of the function $f(x) = x^{-1}$.
We call this area function the natural logarithm and define
it through the integral:
\[ \ln x = \int_1^x \frac{1}{t}\,dt. \tag{6.1} \]
Question 6.2. Why do we define ln x by integrating from 1 to x? Wouldn’t it be
more natural to integrate from 0 to x? Or even from x to ∞? Is there a specific reason
for integrating from 1 to x?
Perhaps the most natural choice would be to integrate from 0. However, the function $t^{-1}$ cannot be integrated from 0 because the integral diverges to infinity:
\[ \int_0^x \frac{1}{t}\,dt = \infty. \]
The next logical choice might be to integrate from x to ∞, but this is also impossible
as the integral diverges:
\[ \int_x^{\infty} \frac{1}{t}\,dt = \infty. \]
Therefore, the only feasible option is to integrate from a positive value other than
0. Historically, people chose to integrate from 1, leading to the definition of ln x
through (6.1).
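Definition (6.1) can be tried out directly. The sketch below is an illustration only: the name `ln_by_integral`, the midpoint rule, and the number of subintervals `p` are assumptions made here. It approximates the integral of 1/t from 1 to x and compares the result with the library logarithm `math.log`.

```python
import math

def ln_by_integral(x, p=100_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x, for x > 0."""
    if x <= 0:
        raise ValueError("ln x is defined only for x > 0")
    dt = (x - 1.0) / p  # negative when x < 1, so the result is negative there
    return sum(1.0 / (1.0 + (k + 0.5) * dt) for k in range(p)) * dt

for x in (0.5, 1.0, 2.0, 10.0):
    print(x, ln_by_integral(x), math.log(x))
```

For x < 1 the integral runs backwards from 1 to x, which is why the value is negative, in line with Definition 5.1(3).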
We already have antiderivatives for all power functions $x^{\alpha}$ with α ≠ −1.
With the definition above, the antiderivative for α = −1 is the natural logarithm ln x. This
allows us to perform differentiations related to natural logarithms.
Here are some fundamental properties derived from the definition of the natural
logarithm:
Theorem 6.2. The natural logarithm ln x satisfies the following properties.
1. ln x is defined only for x > 0.
2. ln |x| is defined for all non-zero x ∈ R − {0}.
3. $\frac{d}{dx} \ln x = \frac{1}{x}$ for all x > 0 by the fundamental theorem of calculus.
4. $\frac{d}{dx} \ln |x| = \frac{1}{x}$ for all x ≠ 0.
Proof. Properties 1, 2, and 3 stem from the definition of the natural logarithm. Property 4 is evident when considering the graph of y = ln |x|, and it can be checked with the chain rule. For x > 0 it is Property 3. For x < 0 we have
ln |x| = ln(−x), so
\[ \frac{d}{dx} \ln |x| = \frac{d}{dx} \ln(-x) = \frac{1}{-x} \times (-1) = \frac{1}{x}. \]
Therefore, for all x ≠ 0, we have $\frac{d}{dx} \ln |x| = \frac{1}{x}$. ⊔
⊓
Next, we present some useful laws derived from the definition of natural logarithms.
Theorem 6.3 (Properties of Natural Logarithms). The natural logarithm ln x has
the following properties.
1. ln(xy) = ln x + ln y for x, y > 0.
2. $\ln x^k = k \ln x$ for x > 0 and k ∈ R.
3. ln(x/y) = ln x − ln y for x, y > 0.
4. $\frac{d}{dx} \ln |f(x)| = \frac{f'(x)}{f(x)}$ for all x such that f(x) ≠ 0.
Proof. The proof of these rules is significantly facilitated by the technique of differentiation. To demonstrate (1), differentiate both sides.
Treating y as a constant and differentiating with respect to x, we use the chain rule
to obtain:
\[ \frac{d}{dx} \ln(xy) = \frac{1}{xy}\,y = \frac{1}{x}, \qquad \frac{d}{dx} (\ln x + \ln y) = \frac{1}{x}. \]
The equality of the derivatives implies that the two sides differ by a constant. Thus,
\[ \ln(xy) = \ln x + \ln y + C. \]
Setting x = 1 and using ln 1 = 0 gives C = 0.
To demonstrate (2), differentiate both sides to obtain:
\[ \frac{d}{dx} \ln x^k = \frac{1}{x^k}\,k x^{k-1} = \frac{k}{x}, \qquad \frac{d}{dx}\, k \ln x = \frac{k}{x}. \]
Again, the equality of the derivatives implies that the two sides differ by a constant.
Thus,
\[ \ln x^k = k \ln x + C. \]
Setting x = 1 to find the constant gives C = 0.
For (3), we use the properties obtained above:
\[ \ln(x/y) = \ln(x y^{-1}) = \ln x + \ln y^{-1} = \ln x - \ln y. \]
For (4), we apply the chain rule to Theorem 6.2(4) with u = f(x):
\[ \frac{d}{dx} \ln |f(x)| = \left. \frac{d}{du} \ln |u| \right|_{u = f(x)} \times f'(x) = \frac{1}{f(x)} \times f'(x) = \frac{f'(x)}{f(x)}. \]
⊔
⊓
Problem 6.4. Let a > 0. Using Theorem 6.3, prove the following:
\[ \frac{d}{dx}\, a^x = a^x \ln a. \]
Solution 6.4 The proof differentiates $\ln a^x$ in two ways. Using Theorem 6.3(2), we have:
\[ \frac{d}{dx} \ln(a^x) = \frac{d}{dx} (x \ln a) = \ln a. \]
On the other hand, using Theorem 6.3(4), we have:
\[ \frac{d}{dx} \ln(a^x) = \frac{(a^x)'}{a^x}. \]
Both expressions must be equal, leading to $(a^x)' = a^x \ln a$. ⊔
⊓
Using natural logarithms makes it easier to differentiate seemingly complex fractional functions, as shown in the following problem.
Problem 6.5. Find the derivative of the fraction function
\[ \frac{(x^2 + 1)(x + 3)^{1/2}}{x - 1}. \]
Solution 6.5 Applying the product and chain rules to the given function becomes
complicated. However, taking the natural logarithm of this function simplifies the
product and division, making differentiation easier. Let
\[ y = \frac{(x^2 + 1)(x + 3)^{1/2}}{x - 1}. \]
Then, using the properties of natural logarithms, we have:
\[ \ln y = \ln(x^2 + 1) + \frac{1}{2} \ln(x + 3) - \ln(x - 1). \]
Differentiating both sides with respect to x:
\[ \frac{y'}{y} = \frac{2x}{x^2 + 1} + \frac{1}{2(x + 3)} - \frac{1}{x - 1}. \]
Simplifying, we get:
\[ y' = \left( \frac{2x}{x^2 + 1} + \frac{1}{2(x + 3)} - \frac{1}{x - 1} \right) \frac{(x^2 + 1)(x + 3)^{1/2}}{x - 1}. \]
For more complicated fraction functions, using natural logarithms can simplify the
differentiation process. ⊔
⊓
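The result of Solution 6.5 can be sanity-checked numerically. This is a minimal sketch, not part of the original solution: the function names, the sample point x = 2, and the step `h` are assumptions made here, and the check is restricted to x > 1, where y > 0 so that ln y is defined. It compares the formula obtained by logarithmic differentiation with a symmetric difference quotient.

```python
import math

def y(x):
    """The fraction function of Problem 6.5."""
    return (x ** 2 + 1) * math.sqrt(x + 3) / (x - 1)

def y_prime_log(x):
    """Derivative obtained from y'/y = 2x/(x^2+1) + 1/(2(x+3)) - 1/(x-1)."""
    return y(x) * (2 * x / (x ** 2 + 1) + 1 / (2 * (x + 3)) - 1 / (x - 1))

def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

x0 = 2.0
print(y_prime_log(x0), central_difference(y, x0))
```

At x = 2 the bracket equals 4/5 + 1/10 − 1 = −1/10, so both numbers should be close to −y(2)/10.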
Problem 6.6. Evaluate the following indefinite integrals.
(1) $\int \cot x\,dx$
(2) $\int \tan x\,dx$
(3) $\int \frac{2\cos x}{3 + 2\sin x}\,dx$
(4) $\int \sec x\,dx$
(5) $\int \csc x\,dx$
Solution 6.6 Generally, finding antiderivatives is not easy. The given problems
involve finding antiderivatives using Theorem 6.3(4). If you recognize the given
function as
\[ F(x) = \frac{f'(x)}{f(x)}, \]
you can directly use Theorem 6.3(4) to find:
\[ \int F(x)\,dx = \ln |f(x)| + C. \]
Don’t forget to include the absolute value. ⊔
⊓
6.4 Exponential function
Euler’s number, also known as the natural constant and denoted by e, is the special
number that satisfies:
\[ \int_1^e \frac{1}{t}\,dt = 1. \]
In other words, ln e = 1. This number, like π, is irrational. Now let’s consider the
derivative of the exponential function $f(x) = e^x$ with respect to x. According to
Problem 6.4, we have:
\[ \frac{d}{dx}\, e^x = e^x \ln e = e^x. \]
In other words, the derivative and the original function are the same, making it a
very special function.
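The defining property of e, that the area under 1/t from 1 to e equals 1, can be used to approximate e. The sketch below is an illustration under assumptions made here: the names `area_under_reciprocal` and `solve_for_e`, the midpoint rule, the bracket [2, 3], and the tolerances are all hypothetical choices. It uses bisection, which works because the area increases with x.

```python
import math

def area_under_reciprocal(x, p=20_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x (x >= 1)."""
    dt = (x - 1.0) / p
    return sum(1.0 / (1.0 + (k + 0.5) * dt) for k in range(p)) * dt

def solve_for_e(lo=2.0, hi=3.0, tol=1e-10):
    """Bisection for the x in [lo, hi] whose area from 1 to x equals 1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if area_under_reciprocal(mid) < 1.0:
            lo = mid  # area too small: e lies to the right of mid
        else:
            hi = mid  # area too large: e lies to the left of mid
    return 0.5 * (lo + hi)

print(solve_for_e(), math.e)
```

The accuracy is limited by the numerical integration rather than by the bisection tolerance.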
Now, let’s consider the inverse function of the natural logarithm. The inverse
function of the natural logarithm ln x is called the exponential function and is denoted as exp x. When expressing the natural logarithm function as ln : A → B, where
the domain is A = (0, ∞) and the range is B = (−∞, ∞) = R, the exponential function
is defined as exp : R → (0, ∞).
Now, let’s compute the derivative of exp x. Let y = exp(x); then x = ln(y). Using
the inverse function’s derivative rule, we get
\[ \exp'(x) = \frac{1}{\ln'(y)} = \frac{1}{1/y} = y = \exp(x). \]
In other words, the function exp x, like the exponential function $e^x$, has the
property that its derivative returns itself. To show that these two functions are
the same, what do we need to demonstrate? Let’s consider the following problem
for now.
Problem 6.7. Two functions f (x) and g(x) are both positive and satisfy the following conditions:
f ′ (x) = f (x),
g′ (x) = g(x),
f (c) = g(c) for some c.
Prove that these two functions are the same.
Solution 6.7 Since g(x) is positive, it never becomes 0, so the fraction function
$f(x)/g(x)$ is well-defined. By the quotient rule, its derivative is
\[ \left( \frac{f}{g} \right)' = \frac{f'g - fg'}{g^2} = \frac{fg - fg}{g^2} = 0. \]
Therefore, the fraction function is a
constant, and since f and g have the same value at x = c, this constant must be 1.
Consequently, for all x, we have $f(x)/g(x) = 1$, implying f(x) = g(x). ⊔
⊓
Since $e^x$ and exp x are both positive, have derivatives equal to themselves, and are
equal at x = 0 (both take the value 1 there), we can conclude that
\[ \exp x = e^x, \quad x \in \mathbb{R}. \]
Both notations are used, but $e^x$ is more common. Now, let’s summarize some properties of the exponential function.
Theorem 6.4 (Exponential Function). The exponential function $e^x$ satisfies the following properties.
(1) $\frac{d}{dx}\, e^x = e^x$.
(2) $\frac{d}{dx}\, e^{f(x)} = f'(x)\, e^{f(x)}$.
(3) $e^x e^y = e^{x+y}$.
Proof. (1) has already been proven. (2) follows from (1) using the chain rule. (3)
can be proven by taking the natural logarithm of both sides:
\[ \ln(e^x e^y) = \ln e^x + \ln e^y = x + y = (x + y) \ln e = \ln(e^{x+y}). \]
Since the natural logarithm is a one-to-one function, we can conclude that $e^x e^y = e^{x+y}$. ⊔
⊓
Exercises
1. Find the inverse functions of the following functions along with their domains.
(1) f (x) = x5
(2) f (x) = 2x + 1
(3) f (x) = (1 − x)3
2. Compute the derivative of the inverse functions ( f −1 )′ (2) at y = 2.
x
(1) f (x) = x5
(2) f (x) = 2√
+1
(3) f (x) = x3 + 1
x
x+3
(5) f (x) = √
(6) f (x) = (x2 − 1)0.3
(4) f (x) =
x−1
x−3
3. Compute the derivatives of the following functions.
2
2
(1) y = ln(x3 + x)
(2) y = ln(ln
(3) y = ex +3
r x )
1 1+t
1+t
(4) y =
(5) y =
(6) y = ln
1−t
1−t
t(1 + t)
4. Compute the following integrals.
Z −1
Z π
Z 2
1
cos x
(1)
dx
(2)
dx
(3)
ex dx
−2 x
0 10 + sin x
0
Z 2
Z
Z
7t
dx
(4)
(5) 10 tan 2x dx
(6)
dt
2
t − 10
1 x ln x
5. Differentiate
the following using the properties of natural logarithms.
s
√
(x + 1)(x − 1)2
2+t
(1)
(2)
(3)
tan
θ
2θ − 1
(x2 + 1)x2
t sint
dy
when x and y satisfy the given relations.
dx
(1) y = yx
(2) ex = x2y+2x
(3) ln y = x3 y
x
2x
(4) y = x
(5) y = (tan x)
(6) ysin x = xcos y
6. Find
7. Evaluate the following integrals.
Z
Z
2 ln(10x)
dx
(1)
dx
(2)
x
−
1
x
Z
Z ln x
(4)
3x
√
2
dx
(5)
x
√
2−1
(3)
dx
Z ln x
1
Z1 0 t
dt
5−x dx
(6)
−2
Part II
Kepler and Newton’s Laws of Motion
Astronomer Johannes Kepler, in the early 17th century, analyzed the observations of
Danish astronomer Tycho Brahe and explained the orbits of planets around the sun
with three laws between 1609 and 1619. These laws modified the circular orbit
theory of Nicolaus Copernicus to elliptical orbits and explained how the speed of
planets changes. The three laws are as follows:
1. The orbit of a planet is an ellipse with the sun at one of the two foci.
2. The line segment connecting the planet and the sun sweeps equal areas in equal
time intervals.
3. The square of the period of the planet’s orbit is proportional to the cube of the
semi-major axis length of the orbit.
Isaac Newton, in 1687, demonstrated that Kepler’s laws result from his laws of
motion and universal gravitation. Newton’s laws of motion consist of three parts:
1. Law of Inertia: An object at rest stays at rest, and an object in motion stays in
motion with the same speed and direction unless acted upon by an external force.
2. Force Law: Force is the product of mass and acceleration (F = ma).
3. Action-Reaction Law: For every action, there is an equal and opposite reaction.
Newton’s law of gravitational force states that the gravitational force between two
objects is inversely proportional to the square of the distance and directly proportional to the product of their masses. If the masses of the two objects are $m_1$ and $m_2$,
and their positions are $\mathbf{x}_1$ and $\mathbf{x}_2$, then the gravitational force acting on object $m_1$ is
given by:
\[ \mathbf{F}_{m_1} = -G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r}. \]
In the above equation, G is the gravitational constant ($6.674 \times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$), and
the vector $\mathbf{r}$ and its length r are defined as follows:
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|. \]
The force acting on object $m_2$ is simply the opposite:
\[ \mathbf{F}_{m_2} = G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r} = -\mathbf{F}_{m_1}. \]
Thus, it satisfies the action-reaction law.
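The force formula translates directly into a short computation. The following sketch is an illustration only: the function name `gravitational_force_on_m1`, the tuple representation of positions, and the sample masses and positions in the usage lines are assumptions introduced here. It evaluates the force on $m_1$ and shows that swapping the roles of the two objects gives the opposite force.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2

def gravitational_force_on_m1(m1, m2, x1, x2):
    """Force on the object of mass m1 at position x1 due to m2 at x2.

    Positions are 3-tuples in meters; the result is a 3-tuple in newtons.
    """
    r_vec = tuple(a - b for a, b in zip(x1, x2))  # r = x1 - x2
    r = math.sqrt(sum(c * c for c in r_vec))      # r = ||r||
    if r == 0:
        raise ValueError("the two objects must be at different positions")
    scale = -G * m1 * m2 / r ** 3                 # -G m1 m2 / r^2 times 1/r
    return tuple(scale * c for c in r_vec)

# Hypothetical sample values, chosen only to exercise the formula.
xa, xb = (1.0, 2.0, 3.0), (-1.0, 0.0, 5.0)
force_on_a = gravitational_force_on_m1(3.0, 2.0, xa, xb)
force_on_b = gravitational_force_on_m1(2.0, 3.0, xb, xa)
print(force_on_a, force_on_b)
```

The minus sign makes the force on $m_1$ point from $\mathbf{x}_1$ toward $\mathbf{x}_2$, that is, the force is attractive.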
In Part II, the first goal is to explain Kepler’s laws using Newton’s laws, and in
the process, the second goal is to learn various useful mathematics.
Lecture 7
Rectangular coordinate system and curves in R3
Space: the final frontier. These are the voyages of the Starship Enterprise. Its
five-year mission: to explore strange new worlds. To seek out new life and new
civilizations. To boldly go where no man has gone before! (From Star Trek)
Now, let’s take the perspective of Newton and try to explain the motion of celestial
bodies using mathematics. To represent the motion of celestial bodies in space with
equations, we first need to establish a coordinate system in space. However, this task
is not as simple as it might seem. While the Earth has served as a reference for us
living on it, there is no such absolute reference in space. The reference frame needs
to be chosen by us.
In the movie Star Trek, the spacecraft Enterprise often moved at high speeds
and then came to a stop. However, distinguishing between a spacecraft moving at a
constant speed and a stationary one is not meaningful. Therefore, stating whether an
object is moving quickly or at rest is not meaningful. If we want to reach a certain
planet, it is more accurate to say that we match the velocity of the spacecraft to the
velocity of that planet. Velocity is relative, and kinetic energy is also relative. Only
acceleration has meaning.
7.1 Coordinate system
Let there be a particle in space. It is represented as r in the figure. Before choosing
a reference frame, we cannot say whether this particle is moving or not. What we
can have as a reference is an object that is not accelerating or a position. We call it
the origin, denoted by 0. If the particle r moves with the same velocity as the origin
0, we say the particle is not moving. In other words, the velocity of the origin is the
zero velocity, denoted by v = 0. We use the same notation 0 for the position of the
origin and the velocity of the origin, distinguishing them by context. A point that
does not move in space, i.e., has a zero velocity, is called a position.
To express the position of the particle numerically, we need a coordinate system.
A coordinate system in space implies three positions satisfying certain conditions
with respect to the origin. First, we define the unit of distance. Next, we need a position i, which is one unit of distance away from the origin 0 in the x-axis direction.
Then, we choose a line perpendicular to the line connecting the origin and i in the
y-axis direction. On this line, we choose a point one unit of distance away from the
origin and name it j. A line perpendicular to the plane passing through the origin, i,
and j is chosen, and a point at a unit distance from the origin on this line is selected
and named k. This completes the coordinate system.
Problem 7.1. In the explanation above, positively oriented coordinate systems and
negatively oriented coordinate systems are distinguished based on the choice of k.
What are these cases?
Solution 7.1 (i) Right-hand rule: Curl the fingers of your right hand in the plane
containing the origin, i, and j, turning from i toward j. If k points in the direction
of the thumb, the coordinate system is
positively oriented. Otherwise, it is negatively oriented.
(ii) Cross product test: If k = i × j, the coordinate system is positively oriented. (In
any case, cross product can be explained using the right-hand rule.) ⊔
⊓
We choose the positively oriented coordinate system, as is the tradition.
Remark 7.1. In this lecture, we consider 3-dimensional space, but for spaces with
dimensions two or higher, there exist both positive and negative coordinate systems,
and they can be distinguished. However, two positively oriented coordinate systems
cannot be distinguished from each other. They coincide upon rotation. The choice
of the coordinate system order determines the orientation. In 1-dimensional space,
there is only one coordinate system. When an object moves to the right, increasing
x is considered positive, and when it moves to the left, decreasing x is considered
positive. However, with a rotation, left and right are swapped, and they become
indistinguishable. The actual choice of coordinates determines the orientation.
Problem 7.2. Two particles move with different velocities without acceleration.
Prove that there exists a plane containing the motion of these two particles in space.
Solution 7.2 As discussed, let’s take one of the particles as the origin. Since neither
particle accelerates, the second particle then moves along a straight line with constant
velocity. Any plane containing this line and the origin contains the motion of both
particles. If this line passes through the origin, there are many
such planes, and if it does not pass through the origin, there is a unique plane. ⊔
⊓
After looking at the solution to the above problem, you may feel a bit deceived, but I
want to emphasize that this is not a trick. Of course, in a coordinate system
with a third party as the origin, there may be no such plane. Problem 7.2 illustrates that
the coordinate system should be chosen according to the purpose.
7.2 Projection
Let r denote the position of a particle in space. Given a coordinate system, we can
represent the position of r with three numbers using that coordinate system. Let’s
examine the meaning and method in detail. First, we project r onto the line x-axis,
which is the line connecting the origin 0 and the unit vector i. When projecting onto
the line, the point where the line, passing through the position r and perpendicular
to the x-axis, intersects the x-axis is the projection point of r onto the x-axis. The
distance from the origin to the projection point is the x coordinate of r. If the projection point is on the opposite side of i, we assign a negative sign. Similarly, we
can perform this process for j and k to find the y and z coordinates. These are the
coordinates of the point r. Consider the projection onto the xy-plane. Draw a line
perpendicular to the xy-plane, passing through r, and find the point where it intersects the xy-plane. This point is the projection. The coordinates of this point on the
xy-plane are (x, y).
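We represent r as a column vector:
\[ \mathbf{r} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
The coordinates for i, j, k, and 0 are as follows:
\[ \mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]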
The point r can be expressed using i, j, and k as:
r = xi + yj + zk.
Vectors are denoted in bold, and scalars are denoted in regular font. The magnitude
or norm of the position vector r is defined and represented as:
\[ \|\mathbf{r}\| = \sqrt{x^2 + y^2 + z^2}. \]
This represents the distance between r and the origin 0 (Pythagorean theorem). Different coordinate systems can be chosen as needed. In such cases, the essential position of r remains unchanged, but its representation changes.
Question 7.1. Most calculus textbooks do not distinguish whether vectors are column vectors or row vectors. However, we fix r as a column vector. What is the
advantage of choosing column vectors over row vectors?
Distinguishing between column vectors and row vectors reduces confusion. One
reason for representing the position vector r as a column vector is matrix multiplication. If A is a 3 × 3 matrix and x is a vector, we typically write the matrix-vector
multiplication as Ax. In this case, x must be a column vector.
However, using column vectors has its drawbacks, as it consumes more space.
Therefore, sometimes, we may write r = (1, 3, 2), saving space horizontally. But
remember to keep in mind that, depending on the context, this may still represent a
column vector.
7.3 Moving particle and trajectory curves in space
Let’s consider a planet moving in space. Let time be represented by t ∈ R, and let
r(t) denote the position of the planet or object at time t. Then, we can write:
\[ \mathbf{r}(t) = f(t)\,\mathbf{i} + g(t)\,\mathbf{j} + h(t)\,\mathbf{k} = \begin{pmatrix} f(t) \\ g(t) \\ h(t) \end{pmatrix}. \]
Alternatively, we can express it as:
x = f (t),
y = g(t),
z = h(t).
Both representations are equivalent, and the meaning is clear. However, reconsidering, what is the reason for introducing the new expressions f (t), g(t), and h(t)?
They represent functions of x, y, and z coordinates of the planet, respectively. But
later, one might forget whether f (t) represented the x or y coordinate. So, it is better
to write:
\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}. \]
The trajectory of the planet, denoted as {r(t) : t ∈ R}, is a curve in 3D space. Thus,
we can consider it as a vector-valued function with time variable t ∈ R. Using either
of the two expressions mentioned earlier, the norm of the position vector r(t) can be
represented as follows:
\[ \|\mathbf{r}(t)\| = \sqrt{f^2(t) + g^2(t) + h^2(t)} \quad \text{or} \quad \|\mathbf{r}(t)\| = \sqrt{x^2(t) + y^2(t) + z^2(t)}. \]
The second notation makes it clear that this is the distance between the position
vector r(t) and the origin. This abuse of notation clarifies the meaning.
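A vector-valued function of time and its norm can be sketched in a few lines. The example below is hypothetical: the particular trajectory (cos t, sin t, t) and the names `r` and `norm` are choices made here for illustration. It samples the position at a few times and prints the distance from the origin.

```python
import math

def r(t):
    """Position (x(t), y(t), z(t)) at time t; a sample trajectory for illustration."""
    return (math.cos(t), math.sin(t), t)

def norm(v):
    """Euclidean norm sqrt(x^2 + y^2 + z^2) of a 3-tuple."""
    return math.sqrt(sum(c * c for c in v))

for t in (0.0, 1.0, 2.0):
    print(t, r(t), norm(r(t)))
```

For this sample trajectory, $x^2(t) + y^2(t) = 1$, so the norm reduces to $\sqrt{1 + t^2}$.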
Remark 7.2. In this notation, x(t) is a function with t as the variable representing
the x coordinate of the moving particle’s position at time t. We refer to this kind of
expression as notation abuse. Using the same symbol x for both the x coordinate in
the coordinate system and the function representing the position at time t is more
convenient than introducing a new function f (t) as x = f (t). This kind of notation
abuse, where the same symbol is used for two different entities, is widespread and
has been used in calculus, including the chain rule.
Question 7.2. What is the difference between a vector and a scalar?
We commonly say that a scalar is a quantity with only magnitude, and a vector is a
quantity with both magnitude and direction. However, that statement is not entirely
accurate. A scalar value x ∈ R also has one of two directions, either to the right
or to the left, with a magnitude of |x|. A more precise distinction is that a scalar
is a quantity that arises in a number system like real or complex numbers, while a
vector can be considered as composed of multiple scalars, including the case of a
single-component vector. In other words, a scalar can be called a single-component
vector.
Problem 7.3. Draw the trajectory of the vector function r(t) = cos t i + sin t j given by r : (0, 2π) → R2. In which direction does it move?
Problem 7.4. Draw the trajectory of the vector function r(t) = cos t i + sin t j + t k given by r : (0, 2π) → R3.
Problem 7.5. Construct a function r : (0, 2π) → R3 whose trajectory is a coil that winds around the z-axis 10 times and whose projection onto the xy-plane is a circle of radius 2.
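For Problem 7.4, a plot can help before drawing by hand. The following is a minimal MATLAB sketch; the number of sample points and the variable names are illustrative choices, not taken from the text.

```matlab
% Sketch of the curve in Problem 7.4: r(t) = cos t i + sin t j + t k on (0, 2*pi).
t = linspace(0, 2*pi, 200);        % 200 sample points (arbitrary choice)
x = cos(t);  y = sin(t);  z = t;   % components of r(t)
plot3(x, y, z);
xlabel('x'); ylabel('y'); zlabel('z');
grid on; axis equal;
% Distance of each point from the z-axis; it should equal 1,
% since the projection onto the xy-plane is the unit circle of Problem 7.3.
radius = sqrt(x.^2 + y.^2);
```

Changing the line that defines z, or the factor multiplying t inside cos and sin, is a natural starting point for experimenting with Problem 7.5.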
Vector sums and subtractions
The product of a scalar c and a vector r = (x, y, z) is given by cr = (cx, cy, cz). The sum and difference of two vectors are defined by adding and subtracting the corresponding components, respectively. That is,
\[
\mathbf{r}_1 + \mathbf{r}_2 = (x_1 + x_2,\ y_1 + y_2,\ z_1 + z_2), \qquad
\mathbf{r}_1 - \mathbf{r}_2 = (x_1 - x_2,\ y_1 - y_2,\ z_1 - z_2).
\]
The geometric interpretation of vector addition is explained using parallelograms. The vector difference r2 − r1 is understood as the vector with r1 as the initial point and r2 as the terminal point (refer to the figure above).
7.4 Cross product & inner product
For two vectors
\[
\mathbf{r}_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}, \qquad
\mathbf{r}_2 = \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix},
\]
the cross product is denoted and defined as follows:
r1 × r2 = (y1 z2 − z1 y2 )i − (x1 z2 − z1 x2 )j + (x1 y2 − y1 x2 )k.
It is also called the vector product. To make it easier to remember the above formula,
we use the determinant of a 3 × 3 matrix:
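\[
\mathbf{r}_1 \times \mathbf{r}_2 =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{vmatrix}
= \begin{vmatrix} y_1 & z_1 \\ y_2 & z_2 \end{vmatrix}\mathbf{i}
- \begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix}\mathbf{j}
+ \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}\mathbf{k}.
\]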
The cross product is defined only for 3-dimensional vectors. Geometrically, the
cross product r1 × r2 is a vector perpendicular to the plane containing the two vectors r1 and r2 , with a magnitude given by
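\[
\|\mathbf{r}_1 \times \mathbf{r}_2\| = \|\mathbf{r}_1\|\,\|\mathbf{r}_2\| \sin\theta \tag{7.1}
\]
where θ is the angle between them. There are two such vectors, pointing in opposite directions; the cross product is the one that satisfies the right-hand rule. If the two vectors are parallel, i.e., if the angle is θ = 0 or θ = π, then r1 × r2 = 0.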
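The component formula can be checked numerically. The MATLAB sketch below uses two sample vectors chosen only for illustration (they do not come from the text) and compares the formula with MATLAB's built-in cross function.

```matlab
% Assumed sample vectors (illustrative only, not from the text).
r1 = [1 2 3];
r2 = [4 5 6];
% Component formula for r1 x r2, written out term by term.
c = [ r1(2)*r2(3) - r1(3)*r2(2), ...
    -(r1(1)*r2(3) - r1(3)*r2(1)), ...
      r1(1)*r2(2) - r1(2)*r2(1) ];
% Built-in cross product, for comparison.
cBuiltin = cross(r1, r2);
disp(c);
disp(cBuiltin);
```

Both displayed vectors should agree; swapping r1 and r2 reverses the sign of every component.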
Problem 7.6. Let r1 (t) and r2 (t) denote the vectors representing the positions of
two objects at time t. Show that the cross product satisfies the following product
rule:
(r1 (t) × r2 (t))′ = r′1 (t) × r2 (t) + r1 (t) × r′2 (t).
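Solution 7.6 We can use the product rule for derivatives as follows:
\[
\begin{aligned}
(\mathbf{r}_1(t) \times \mathbf{r}_2(t))'
&= (y_1 z_2 - z_1 y_2)'\,\mathbf{i} - (x_1 z_2 - z_1 x_2)'\,\mathbf{j} + (x_1 y_2 - y_1 x_2)'\,\mathbf{k} \\
&= (y_1' z_2 - z_1' y_2)\,\mathbf{i} + (y_1 z_2' - z_1 y_2')\,\mathbf{i} + (\cdots)\,\mathbf{j} + (\cdots)\,\mathbf{k} \\
&= \mathbf{r}_1'(t) \times \mathbf{r}_2(t) + \mathbf{r}_1(t) \times \mathbf{r}_2'(t).
\end{aligned}
\]
Thus, the product rule is satisfied. (Not all terms are written out explicitly; please verify the j and k components.) □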
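The regrouping in the middle step can be checked one component at a time. As a sketch, the i component works out as follows; the j and k components are handled in the same way.

```latex
\[
\begin{aligned}
(y_1 z_2 - z_1 y_2)' &= y_1' z_2 + y_1 z_2' - z_1' y_2 - z_1 y_2' \\
&= \underbrace{(y_1' z_2 - z_1' y_2)}_{\mathbf{i}\text{ component of } \mathbf{r}_1' \times \mathbf{r}_2}
 + \underbrace{(y_1 z_2' - z_1 y_2')}_{\mathbf{i}\text{ component of } \mathbf{r}_1 \times \mathbf{r}_2'} .
\end{aligned}
\]
```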
Question 7.3. Is there a way to determine if two vectors r1 and r2 are perpendicular?
Is there an easy way to find the angle between them?
Using (7.1), we can find the angle between two vectors. However, an easier way to determine the angle is through the inner product, also known as the dot product. The inner product is written in two ways and is defined by
\[
\mathbf{r}_1 \cdot \mathbf{r}_2 = \langle \mathbf{r}_1, \mathbf{r}_2 \rangle = x_1 x_2 + y_1 y_2 + z_1 z_2.
\]
The inner product of two vectors yields a single scalar value.
Problem 7.7. If θ is the angle between two vectors r1 and r2, show that
\[
\cos\theta = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\|\mathbf{r}_1\|\,\|\mathbf{r}_2\|}. \tag{7.2}
\]
Solution 7.7 Assuming the two vectors meet at the origin, we can consider them lying in the xy-plane. Therefore, let’s assume all z components are zero. Then the relationship (7.2) corresponds to basic trigonometry learned in high school. Though not explicitly shown here, (7.2) should be remembered. □
The relationship (7.2) is very important. If the inner product of two nonzero vectors is 0, the vectors are perpendicular. If the angle is 0, i.e., if the vectors are parallel and point in the same direction, then cos 0 = 1, and the inner product of the two vectors equals the product of their lengths.
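As a quick illustration of (7.2), here are two small computations. The vectors are chosen only for this example and do not come from the text.

```latex
\[
\mathbf{r}_1 = (1,0,0),\ \mathbf{r}_2 = (1,1,0):\qquad
\cos\theta = \frac{1\cdot 1 + 0\cdot 1 + 0\cdot 0}{1\cdot\sqrt{2}} = \frac{1}{\sqrt{2}}
\ \Rightarrow\ \theta = \frac{\pi}{4}.
\]
\[
\mathbf{r}_1 = (1,2,2),\ \mathbf{r}_2 = (2,1,-2):\qquad
\mathbf{r}_1 \cdot \mathbf{r}_2 = 2 + 2 - 4 = 0
\ \Rightarrow\ \theta = \frac{\pi}{2}.
\]
```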
Problem 7.8 (Equation of a plane). Find the equation of a plane perpendicular to vector v = 3j − 2k passing through the point r = (1, 2, −1).
Solution 7.8 (Refer to the figure above) Let x = (x, y, z) represent a point on the plane. Then, the vector x − r = (x − 1, y − 2, z + 1) is perpendicular to v = (0, 3, −2). Therefore,
\[
(\mathbf{x} - \mathbf{r}) \cdot \mathbf{v} = 0(x - 1) + 3(y - 2) - 2(z + 1) = 3y - 2z - 8 = 0.
\]
Thus, the equation of the plane is 3y − 2z − 8 = 0. Alternatively, it can be written as 3y − 2z = 8. □
The inner product can be defined not only for 3-dimensional vectors but also
for vectors of any dimension. However, the notation used previously is not suitable
for expressing the inner product of n-dimensional vectors. Let’s represent two n-dimensional vectors slightly differently:
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}.
\]
The inner product of these two vectors is defined as follows.
\[
\mathbf{x} \cdot \mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i. \tag{7.3}
\]
The inner product of two functions f and g can also be defined by integration.
\[
\langle f, g \rangle = \int f(x)\,g(x)\,dx. \tag{7.4}
\]
What is the angle between two vectors in n-dimensional space? What about the
angle between two functions f and g? Although their meanings are different, (7.2)
can be used as a definition for angles.
Question 7.4. What commonality exists between the inner products (7.3) and (7.4),
even though they seem different?
Problem 7.9. Let x(t) and y(t) denote vectors representing the positions of two
objects at time t. Show that the derivative of their inner product also satisfies the
following product rule:
(x(t) · y(t))′ = x′ (t) · y(t) + x(t) · y′ (t).
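Solution 7.9 We can use the product rule for derivatives as follows:
\[
\begin{aligned}
(\mathbf{x}(t) \cdot \mathbf{y}(t))'
&= \Big( \sum_{i=1}^{n} x_i(t)\,y_i(t) \Big)' = \sum_{i=1}^{n} \big( x_i(t)\,y_i(t) \big)' \\
&= \sum_{i=1}^{n} \big( x_i'(t)\,y_i(t) + x_i(t)\,y_i'(t) \big)
 = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t).
\end{aligned}
\]
Thus, the product rule is satisfied. □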
Exercises
1. Find all vectors perpendicular to each of the following vectors.
(1) r = i + 2j + k (2) r = 2i − 3j + 4k (3) r = i − j + k
(4) r = 2i + j − 3k
2. Find unit vectors perpendicular to the following pairs of vectors.
(1) r1 = 3j + k, r2 = 2i + j − k (2) r1 = i + 2j + k, r2 = 2i − j
3. Find the equation of a plane perpendicular to the vector v = 2i + 3j + k passing
through the point r = (1, 2, −1).
4. Find the equation of a plane perpendicular to the vector v = 2i + 3j + k passing
through the point r = (1, 2, −1).
5. Find a vector perpendicular to the plane with the equation 2x + 3y − z = 2.
6. Find the equation of a plane parallel to the xy-plane passing through the point
r = (2, 1, 4).
7. Find the equation of a plane parallel to the xz-plane passing through the point
r = (2, 1, 4).
8. Find the intersection of the planes 2x + 3y − z = 2 and 3x + y − 2z = 0.
9. Find the equation that represents all points equidistant from the points r1 = (1, 2, 1)
and r2 = (3, 2, −1).
Lecture 8
Polar coordinates in R2
The planets in the solar system orbit in elliptical paths close to circles around the
sun. Artificial satellites orbiting around the Earth are mainly designed to orbit in
circular paths, but they can also orbit in elliptical paths. Each orbit can be described
in two-dimensional space coordinates on a plane. Particularly, polar coordinates are
useful for representing circular or elliptical orbits. In this lecture, we will discuss
polar coordinates, which have many practical applications.
8.1 Variable change with polar coordinates
The polar coordinate system in two-dimensional space consists of two numbers:
the length r and the angle θ . The orthogonal coordinate system in two dimensions
consists of two numbers: the x-coordinate and the y-coordinate. The length of the
line segment connecting the origin and the point (x, y) is r, and this line segment
makes an angle θ with the x-axis. Given polar coordinates (r, θ ), we can calculate
orthogonal coordinates using sin θ and cos θ . That is,
\[
x = r\cos\theta, \qquad y = r\sin\theta. \tag{8.1}
\]
Of course, given orthogonal coordinates (x, y), we can find the corresponding polar coordinates (r, θ). However, it is important to determine the ranges of r and θ. We have
\[
(r, \theta) \in [0, \infty) \times [0, 2\pi),
\]
so the length r is given by
\[
r = \sqrt{x^2 + y^2}.
\]
However, explicitly expressing the angle θ as θ = f(x, y) is difficult, and it is given implicitly as
\[
\cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}, \qquad r = \sqrt{x^2 + y^2}. \tag{8.2}
\]
If r ≠ 0, then there exists a unique θ satisfying (8.2) in the interval 0 ≤ θ < 2π.
Let’s agree to write r before θ , similar to writing x before y.
Depending on the purpose and convenience, one can choose either of the two coordinate systems, and one should understand their relationship well in order to change variables freely. The relations (8.1) and (8.2) between polar coordinates (r, θ) and orthogonal coordinates (x, y) are perhaps the most important example of a multidimensional change of variables. Understanding this first example clearly is essential for understanding Newton’s planetary theory.
The orthogonal coordinate system is not only a coordinate system but also the actual world where the motion of planets occurs. On the other hand, the polar coordinate system is a convenient coordinate system for representing elliptical orbits in the orthogonal coordinate system. To easily use the polar coordinate system in the orthogonal coordinate system, we introduce new basis vectors instead of the standard basis vectors i and j of the orthogonal coordinate system. These are as follows:
\[
\mathbf{e}_r(\theta) = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad
\mathbf{e}_\theta(\theta) = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. \tag{8.3}
\]
Although these two vectors are unit vectors, unlike i and j, they are not constant vectors. Both vectors depend only on θ and are independent of r. In the polar coordinate plane, er corresponds to (1, 0) and eθ corresponds to (0, 1). The reason is as follows: as seen in the figure, the vector er points in the direction in which r increases while θ is fixed, so it corresponds to (1, 0) in the polar coordinate system, and the vector eθ points in the direction in which θ increases while r is fixed, so it corresponds to (0, 1) in the polar coordinate system. Let’s examine which point in the orthogonal coordinates corresponds to given coordinates (r, θ) in the polar coordinate plane. Once the angle θ is given, we consider the direction vector er corresponding to the angle θ. Since the direction vector is a unit vector, the corresponding position vector of length r is
\[
\mathbf{r} = r\,\mathbf{e}_r(\theta).
\]
This equation is nothing more than rewriting the relationship (8.1) as a vector equation. If er is the first coordinate axis and eθ is the second coordinate axis, then the new coordinate system also has a positive orientation.
Now, if the point r = (x, y) on the xy plane is given, let’s find the corresponding
polar coordinates (r, θ ). There is a point to be careful about: since the correspondence (8.1) is not one-to-one, it is not uniquely determined. To establish an inverse
correspondence, we must choose a branch as in defining inverse functions. In polar
coordinates, we choose r ≥ 0 and 0 ≤ θ < 2π as branches. Within this range, we
choose r and θ that satisfy (8.2).
Question 8.1. What if we express θ as tan−1 (y/x), θ as sin−1 (y/r), or θ as
cos−1 (x/r) instead of (8.2)?
Indeed, many calculus books use such relationships. However, if we define θ = tan−1(y/x) or θ = sin−1(y/r), these two inverse functions only give angles in the range −π/2 ≤ θ ≤ π/2 according to their definitions (for tan−1 the endpoints are excluded). If we use θ = cos−1(x/r), it gives angles only in the range 0 ≤ θ ≤ π according to the definition of cos−1. Therefore, these expressions do not cover all angles. Let’s simply use the solution of (8.2) as a new function θ(x, y). Then, we can cover the range 0 ≤ θ < 2π handled in polar coordinates.
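In numerical work, the function θ(x, y) defined by (8.2) is usually computed with a two-argument arctangent. The MATLAB sketch below assumes the built-in atan2, which returns an angle in (−π, π], and shifts the result into [0, 2π) with mod. The sample point is an arbitrary choice for illustration.

```matlab
% Polar coordinates (r, theta) of a point (x, y), with 0 <= theta < 2*pi.
x = -1;  y = -1;                  % sample point in the third quadrant (assumed)
r = sqrt(x^2 + y^2);
theta = mod(atan2(y, x), 2*pi);   % atan2 gives (-pi, pi]; mod shifts it to [0, 2*pi)
% Check against (8.2): cos(theta) = x/r and sin(theta) = y/r.
err = max(abs([cos(theta) - x/r, sin(theta) - y/r]));
fprintf('r = %.4f, theta = %.4f, err = %.1e\n', r, theta, err);
```

For this point neither tan−1(y/x) nor cos−1(x/r) alone returns the third-quadrant angle, which is the issue raised in Question 8.1.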
8.2 Motion in polar coordinates
This section is essential for deriving the orbit formulas of planets. It requires mathematical thinking for physical understanding. Assuming that no forces other than their mutual gravity act on two celestial bodies (such as the Sun and the Earth), their motion stays in a single plane (this will be confirmed later). Introducing a polar coordinate system on this plane allows us to represent the position of an object or a planet using polar coordinates:
\[
\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} = r\cos\theta\,\mathbf{i} + r\sin\theta\,\mathbf{j} = r\,\mathbf{e}_r(\theta).
\]
We denote the position vector by the bold symbol $\mathbf{r}$. Its relationship with the polar coordinate r is
\[
\|\mathbf{r}\| = r.
\]
The basis vectors i and j of the orthogonal coordinate system form a fixed pair of perpendicular axes regardless of the position. However, er(θ) and eθ(θ) form a pair of perpendicular axes that varies with the position. They are determined by the angle θ of a given position in the orthogonal coordinate system and are independent of r.
Problem 8.1. Prove the following derivatives.
\[
\frac{d\mathbf{e}_r}{d\theta} = \mathbf{e}_\theta, \qquad \frac{d\mathbf{e}_\theta}{d\theta} = -\mathbf{e}_r.
\]
Solution 8.1 These relations can be easily proven using the derivatives of trigonometric functions. Remembering them is more important. □
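For completeness, the computation behind Solution 8.1 can be sketched by differentiating the components in (8.3) directly:

```latex
\[
\frac{d\mathbf{e}_r}{d\theta}
= \frac{d}{d\theta}\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}
= \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} = \mathbf{e}_\theta,
\qquad
\frac{d\mathbf{e}_\theta}{d\theta}
= \frac{d}{d\theta}\begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}
= \begin{pmatrix} -\cos\theta \\ -\sin\theta \end{pmatrix} = -\mathbf{e}_r.
\]
```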
With the new coordinate system, the position of the object is represented as
r = rer (θ ). This notation hides the time variable. As the object moves, the polar
coordinates r and θ representing the position of the object become functions of the
time variable t. The right side of the following figure shows the trajectory of a particle moving on the xy plane. Then, the corresponding polar coordinate position is
represented as r̃(t) = (r(t), θ (t)). The space where Newton’s laws apply is not the
polar coordinate space but the orthogonal coordinate space. In other words, Newton’s gravitational law and laws of motion must be applied to the trajectory where
the point r = r er(θ) on the right side of the figure moves. Therefore, the basis vectors er and eθ become functions of the time variable t through the angle θ(t), and the position of the particle can be written as follows:
\[
\mathbf{r}(t) = r(t)\,\mathbf{e}_r(\theta(t)).
\]
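Problem 8.2. Prove the following.
\[
\dot{\mathbf{e}}_r = \mathbf{e}_\theta\,\dot\theta, \qquad \dot{\mathbf{e}}_\theta = -\mathbf{e}_r\,\dot\theta. \tag{8.4}
\]
Solution 8.2 To calculate the derivatives with respect to time $\dot{\mathbf{e}}_r$ and $\dot{\mathbf{e}}_\theta$, consider the angle as a function of time θ = θ(t). Using the chain rule and Problem 8.1, we get
\[
\dot{\mathbf{e}}_r
= \frac{d}{dt}\begin{pmatrix} \cos\theta(t) \\ \sin\theta(t) \end{pmatrix}
= \begin{pmatrix} \cos'\theta(t)\,\dot\theta \\ \sin'\theta(t)\,\dot\theta \end{pmatrix}
= \begin{pmatrix} -\sin\theta(t)\,\dot\theta \\ \cos\theta(t)\,\dot\theta \end{pmatrix}
= \frac{d\mathbf{e}_r}{d\theta}\,\dot\theta = \mathbf{e}_\theta\,\dot\theta
\]
and similarly
\[
\dot{\mathbf{e}}_\theta = \frac{d\mathbf{e}_\theta}{d\theta}\,\dot\theta = -\mathbf{e}_r\,\dot\theta.
\]
□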
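Problem 8.3 (Position, velocity, acceleration using polar coordinates). Show that the position, velocity, and acceleration of an object are given as follows.
\[
\mathbf{r} = r\,\mathbf{e}_r \tag{8.5}
\]
\[
\mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta \tag{8.6}
\]
\[
\mathbf{a} = (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta \tag{8.7}
\]
Solution 8.3 The position vector (8.5) has already been explained. Its derivatives, using the product rule and (8.4), are as follows:
\[
\begin{aligned}
\mathbf{v} &= \dot{\mathbf{r}} = \dot r\,\mathbf{e}_r + r\,\dot{\mathbf{e}}_r = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta, \\
\mathbf{a} &= \dot{\mathbf{v}} = \ddot r\,\mathbf{e}_r + 2\dot r\dot\theta\,\mathbf{e}_\theta + r\ddot\theta\,\mathbf{e}_\theta - r\dot\theta^2\,\mathbf{e}_r
= (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta.
\end{aligned}
\]
□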
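A simple special case may help to read (8.6) and (8.7). Assume, only for illustration, uniform circular motion with constant radius R and constant angular speed ω (these symbols are not used in the text):

```latex
\[
r(t) = R,\quad \theta(t) = \omega t
\ \Rightarrow\ \dot r = \ddot r = 0,\quad \dot\theta = \omega,\quad \ddot\theta = 0,
\]
\[
\mathbf{v} = R\omega\,\mathbf{e}_\theta, \qquad \mathbf{a} = -R\omega^2\,\mathbf{e}_r .
\]
```

The velocity is tangent to the circle with speed Rω, and the acceleration points toward the center with magnitude Rω².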
Remark 8.1. Remember that, when using polar coordinates r and θ, it is convenient to use
er and eθ as basis vectors instead of i and j.
8.3 Ellipses in polar coordinates
The equation of an ellipse with its center at the origin and major and minor axes
along the x-axis and y-axis, respectively, is given by:
\[
\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.
\]
An overview of the graph is given on the left side of the figure. The values ±a are the x-intercepts, and ±b are the y-intercepts. If a = b, then the above ellipse becomes a circle. For convenience, we consider the case where a ≥ b, so the x-axis becomes the major axis. The ellipse has two foci, both on the major axis. The distance between the center and each focus is $c = \sqrt{a^2 - b^2}$, i.e., the foci are at (±c, 0). The eccentricity of the ellipse, which indicates how far it deviates from a circle, is given by:
\[
e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \tag{8.8}
\]
If e = 0, the shape is a circle. If e = 1, then b = 0, and it is no longer an ellipse. The eccentricity of an ellipse satisfies 0 ≤ e < 1.
Let’s represent the ellipse using polar coordinates. Take a line perpendicular to the x-axis, x = k with k > 0, and use this line as the directrix for constructing the curve. Let P(x, y) have polar coordinates (r, θ), and denote the foot of the perpendicular from P to the directrix as D. For a fixed e > 0, consider all points P(x, y) that satisfy
\[
r = e\,\overline{PD}. \tag{8.9}
\]
Since the length of segment PD is k − x for points to the left of the directrix, we have:
\[
r = e\,\overline{PD} \ \Rightarrow\ \sqrt{x^2 + y^2} = e(k - x) \ \Rightarrow\ x^2 + y^2 = e^2(k^2 - 2kx + x^2).
\]
In simplified form, this becomes:
\[
(1 - e^2)x^2 + 2ke^2 x + y^2 = e^2 k^2.
\]
If e ≠ 1, we can rewrite this equation as follows:
\[
\left( x + \frac{ke^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{(1 - e^2)^2}. \tag{8.10}
\]
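The step leading to (8.10) is a completion of the square in x. A sketch of the intermediate algebra, dividing by 1 − e² first:

```latex
\[
x^2 + \frac{2ke^2}{1-e^2}\,x + \frac{y^2}{1-e^2} = \frac{e^2k^2}{1-e^2}
\]
\[
\left(x + \frac{ke^2}{1-e^2}\right)^2 + \frac{y^2}{1-e^2}
= \frac{e^2k^2}{1-e^2} + \frac{k^2e^4}{(1-e^2)^2}
= \frac{e^2k^2(1-e^2) + k^2e^4}{(1-e^2)^2}
= \frac{e^2k^2}{(1-e^2)^2}.
\]
```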
Problem 8.4. If 0 < e < 1, show that (8.10) represents an ellipse with one of its foci
at the origin, where e represents the eccentricity of the ellipse.
Solution 8.4 If 0 < e < 1, then 1 − e² > 0, and we can define:
\[
a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2(1 - e^2) = \frac{e^2 k^2}{1 - e^2}.
\]
Dividing (8.10) by a², we get:
\[
\frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad c = \frac{ke^2}{1 - e^2} > 0,
\]
which represents an ellipse. The center of the ellipse is (−c, 0). The eccentricity of the ellipse is defined as $\sqrt{(a^2 - b^2)/a^2}$. Calculating,
\[
\frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2(1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \tag{8.11}
\]
Thus, the coefficient e in the relationship r = ePD is indeed the eccentricity of the ellipse, so it was reasonable to denote the coefficient by e from the beginning. The distance from the center to the focus of the ellipse is $\sqrt{a^2 - b^2}$, and using (8.11), we can compute:
\[
\sqrt{a^2 - b^2} = \sqrt{e^2 a^2} = \sqrt{\frac{k^2 e^4}{(1 - e^2)^2}} = c.
\]
Therefore, since the ellipse is shifted by c units to the left, the origin is a focus. □
We have shown that points satisfying (8.9) form an ellipse with eccentricity e and one focus at the origin. The length of segment PD is k − r cos θ, so the polar representation of this ellipse becomes r = e(k − r cos θ). Solving for r, we get:
\[
r = \frac{L}{1 + e\cos\theta}, \qquad L = ek.
\]
This equation represents an ellipse with eccentricity e for 0 < e < 1. For e = 1 it represents a parabola, and for e > 1 a hyperbola (see Appendix B).
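The agreement between the polar form and the shifted ellipse of Problem 8.4 can be checked numerically. The MATLAB sketch below uses sample values e = 0.5 and k = 3, which are assumptions made only for illustration.

```matlab
% Assumed sample values (not from the text): eccentricity e and directrix x = k.
e = 0.5;  k = 3;
L = e*k;                                 % constant L = ek in the polar form
theta = linspace(0, 2*pi, 400);
r = L ./ (1 + e*cos(theta));             % polar form r = L/(1 + e cos(theta))
x = r.*cos(theta);  y = r.*sin(theta);   % convert with (8.1)

% Parameters of the shifted ellipse (x + c)^2/a^2 + y^2/b^2 = 1 from Solution 8.4.
a = e*k/(1 - e^2);
b = a*sqrt(1 - e^2);
c = k*e^2/(1 - e^2);
residual = max(abs((x + c).^2/a^2 + y.^2/b^2 - 1));
fprintf('max residual = %.2e\n', residual);

plot(x, y, 'b-', 0, 0, 'r*');            % ellipse and the focus at the origin
axis equal; grid on;
```

The residual should be at the level of rounding error, and the marked origin should sit at a focus rather than at the center of the ellipse.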
8.4 Curves in polar coordinates
When polar coordinates (r, θ) are used in correspondence with Cartesian coordinates (x, y), it’s common to restrict the range to r ≥ 0 and 0 ≤ θ < 2π. However, when polar coordinates are simply used to represent curves, they can be used without such restrictions. In this section, we consider equations of curves given in polar coordinates, the corresponding curves in Cartesian coordinates, and their meanings.
Problem 8.5. Convert the following equations given in polar coordinates to Cartesian coordinates and draw their corresponding graphs.
(1) r = 1 (2) r = cos θ (3) r = cos(2θ) (4) $r = \dfrac{2}{\sin\theta - \cos\theta}$
Solution 8.5 It’s important to distinguish between the graphs in the polar coordinate plane and the corresponding graphs in Cartesian coordinates; the two are related by the transformation (8.1). The overview of the graphs is given in the figure.
(1) The equation r = 1 in Cartesian coordinates becomes $\sqrt{x^2 + y^2} = 1$, that is, $x^2 + y^2 = 1$. We know this represents a circle with its center at the origin and radius 1. Even without knowing this, if we plot r = 1 for various values of θ from 0 to 2π, we would observe a circle with radius 1.
(2) Since cos θ can take negative values, we need to consider the possibility of r being negative when writing r = cos θ. Multiplying both sides by r, we get $r^2 = r\cos\theta$, which, in Cartesian coordinates, becomes $x^2 + y^2 = x$. Rewriting this, we get $(x - 0.5)^2 + y^2 = 0.5^2$. This represents a circle centered at (0.5, 0) with radius 0.5. In the polar coordinate plane, this graph is the cosine function, which has period 2π. As θ runs over the interval [0, 2π], the circle is traced twice. It’s worth understanding why the circle is already completed once as θ moves from 0 to π.
(3) Using the double angle formula, we get $r = \cos^2\theta - \sin^2\theta$, and in Cartesian coordinates, this becomes $(x^2 + y^2)^{3/2} = x^2 - y^2$. Squaring both sides and expanding, we get $x^6 + 3x^4y^2 + 3x^2y^4 + y^6 = x^4 - 2x^2y^2 + y^4$. It’s not immediately clear what curve this equation represents. However, in the polar coordinate plane, the graph is simply a cosine function, and from that graph we obtain a four-leaf clover pattern, since the leaves do not overlap.
(4) In this case, the graph in polar coordinates might seem more complicated, but when rewritten in Cartesian coordinates, we get y = x + 2, which represents a straight line. □
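Part (3) can be explored numerically. The following MATLAB sketch plots r = cos(2θ) through the transformation (8.1), allowing negative r, and checks the squared Cartesian equation at the sampled points. The number of sample points is an arbitrary choice.

```matlab
% Rose curve r = cos(2*theta), with negative values of r allowed.
theta = linspace(0, 2*pi, 721);          % sample angles (arbitrary resolution)
r = cos(2*theta);
x = r.*cos(theta);  y = r.*sin(theta);   % transformation (8.1)
% Residual of the squared Cartesian equation (x^2 + y^2)^3 = (x^2 - y^2)^2.
residual = max(abs((x.^2 + y.^2).^3 - (x.^2 - y.^2).^2));
fprintf('max residual = %.2e\n', residual);
plot(x, y); axis equal; grid on;
title('r = cos(2\theta)');
```

The plot should show four leaves, two of which come from the angles where r is negative.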
Exercises
1. Convert the following points given in Cartesian coordinates to polar coordinates.
(1) r = (1, 1) (2) $\mathbf{r} = (-1, \sqrt{3})$ (3) $\mathbf{r} = (-2\sqrt{3}, -2)$ (4) r = (0, −2)
2. Convert the following points given in polar coordinates to Cartesian coordinates.
(1) $\tilde{\mathbf{r}} = (2, \frac{\pi}{2})$ (2) $\tilde{\mathbf{r}} = (4, \pi)$ (3) $\tilde{\mathbf{r}} = (2\sqrt{3}, \frac{\pi}{6})$ (4) $\tilde{\mathbf{r}} = (0, \frac{\pi}{4})$
3. Sketch the curves represented by the following polar equations.
(1) r = 1 − cos θ (2) r = 1 − sin θ (3) $r^2 = \sin\theta$ (4) $r^2 = 4\cos\theta$
4. Given below are equations of ellipses. Compute the center, foci, and eccentricity.
(1) $16x^2 + 25y^2 = 400$ (2) $9x^2 - 18x + 10y^2 = 44$ (3) $6x^2 + 9y^2 - 18y = 45$
5. Represent the above ellipses in polar coordinates.
6. Convert the equations of the curves given in polar coordinates to Cartesian coordinates and sketch them.
(1) $r = \dfrac{1}{1 + \cos\theta}$ (2) $r = \dfrac{20}{10 - 5\cos\theta}$ (3) $r = \dfrac{5}{1 + 2\sin\theta}$ (4) $r = \dfrac{5}{1 - 0.5\cos\theta}$
7. Find the equation of the ellipse with a directrix at x = 5, eccentricity e = 0.5, and
the focus at the origin.
8. Use equation (8.6) to find the distance from the center of an artificial satellite with
an orbital period of 24 hours to the center of the Earth. (Refer to the necessary
data such as the gravity formula from the internet, etc.)
Lecture 9
Differential Equations
Many physical quantities are expressed through derivatives, and physical laws are given as equations relating them. For this reason, differential equations explaining important phenomena appear frequently. The task of finding solutions to simple differential equations is the topic of this lecture.
9.1 First order differential equations
The independent variable can be either t or x. We often use time as the independent
variable, but there are many cases where we don’t. To emphasize this, let’s first
consider x as the independent variable. Let the dependent variable be denoted as f
and the solution to the differential equation be denoted as f (x), but in the theory of
differential equations, we often write the dependent variable that needs to be found
as y. Then y becomes implicitly a function of x, i.e., y = y(x). In this notation, x and
y are just general variables, not coordinates. The most general first-order differential
equation can be written as follows:
\[
y' = f(x, y), \qquad y(x_0) = y_0. \tag{9.1}
\]
(We use the symbol f here.) In this notation, the first equation y′ = f (x, y) is the
differential equation. Since only first-order differentials are involved, it is called a
first-order differential equation. The second equation y(x0 ) = y0 is the initial condition. In this case, x0 is considered the initial moment, and y0 is the value that
the function y has at the initial moment. Using Leibniz notation, we can write it as
follows:
\[
\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0. \tag{9.2}
\]
This notation is a bit friendlier. It explicitly states that y is a function of x, and we are differentiating y with respect to the variable x. If f is a function of x only, i.e., y′ = f(x), then y is an antiderivative of f(x). The equation can also be solved easily if f = f(y). If both x and y appear on the right side, more work is needed to solve it.
When solving first-order differential equations, one general constant appears,
which is determined by the initial conditions. Let’s verify this through a simple
example.
Problem 9.1 (Easy example). Find the solution to the following first-order differential equation:
\[
y' = 3x + 2, \qquad y(1) = 1.
\]
Solution 9.1 Since f is a function of x only, we integrate:
\[
y = \int y'\,dx = \int (3x + 2)\,dx = \frac{3}{2}x^2 + 2x + C.
\]
Considering the initial condition $y(1) = \frac{3}{2} + 2 + C = \frac{7}{2} + C = 1$, we find $C = -\frac{5}{2}$. Therefore, the solution is $y = \frac{3}{2}x^2 + 2x - \frac{5}{2}$. □
The above problem is a simple case, and generally, solving differential equations
is more challenging. However, verifying if a given function is a solution or not is
easier.
Problem 9.2 (Verifying Solutions). (1) Show that for all constants C, the function $y = \frac{C}{x} + 2$ is a solution to the differential equation $\frac{dy}{dx} = \frac{1}{x}(2 - y)$. (2) Show that $y = (1 + x) - \frac{1}{3}e^x$ is a solution to the differential equation $y' = y - x$, $y(0) = \frac{2}{3}$.
Solution 9.2 (1) Since no initial value is given, there are many solutions, and they are described by a general constant C. Let’s start with $y = Cx^{-1} + 2$. Taking the derivative, we have $y' = -Cx^{-2}$. Substituting $y = Cx^{-1} + 2$ into the right-hand side of the equation, we get:
\[
\frac{1}{x}(2 - y) = \frac{1}{x}\left(-\frac{C}{x}\right) = -\frac{C}{x^2}.
\]
The left-hand side and right-hand side are the same, so it is a solution.
(2) Differentiating $y = (1 + x) - \frac{1}{3}e^x$ gives $y' = 1 - \frac{1}{3}e^x$, and computing the right-hand side, $y - x = 1 - \frac{1}{3}e^x$. Thus, it satisfies the differential equation. Moreover, it satisfies the initial condition: $y(0) = 1 - \frac{1}{3}e^0 = \frac{2}{3}$. Therefore, it is a solution to the initial value problem. □
The first-order differential equation (9.1) or (9.2) is written in a very general form
and in many cases cannot be explicitly solved. However, we can understand what is
happening by creating a slope field on the xy-plane. The principle is simple: draw
a small line segment with slope f (x, y) at the point (x, y). Then, if the graph of
the solution y(x) passes through the point (x, y), the graph of the solution will be
tangent to this small line segment. The collection of these line segments is called a
slope field.
Problem 9.3 (Slope field). Draw the slope field on the domain [−2, 2] × [−2, 2] for the differential equations y′ = f(x, y) given by the following functions:
(1) $f(x, y) = \dfrac{2xy}{1 + x^2}$.
(2) $f(x, y) = y - x^2$.
Solution 9.3 Let’s use MATLAB to draw the slope field of the given functions f(x, y). Below is the code and the corresponding figure. Practicing writing such small codes is helpful. □
- MATLAB CODE -
%% parameters
L=2.1;                 % half-width of the plotting grid
dx=0.2;dy=0.2;         % grid spacing
%% variables
[x,y] = meshgrid(-L:dx:L,-L:dy:L);
[NX,NY]=size(x);
%% computation
Y=(2*x.*y./(1+x.*x));  % slope f(x,y) at each grid point
X=ones(NX,NY);         % direction vector is (1, f(x,y))
NRM=(Y.^2+X.^2).^0.5;  % length of the direction vector
X=dx*X./NRM;           % normalize to the size of dx
Y=dx*Y./NRM;
quiver(x,y,X,Y);
axis([-2 2 -2 2]);
title('f(x,y)=2xy/(1+x^2)');
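For part (2) of Problem 9.3, only the slope function changes. The following MATLAB sketch repeats the steps of the code above with f(x, y) = y − x² written as a function handle; the names f, S, U, and V are illustrative choices.

```matlab
% Slope field for y' = f(x,y) with f(x,y) = y - x^2 (part (2) of Problem 9.3).
f = @(x, y) y - x.^2;            % right-hand side as a function handle
L = 2.1;  dx = 0.2;  dy = 0.2;   % same grid parameters as in the code above
[x, y] = meshgrid(-L:dx:L, -L:dy:L);
S = f(x, y);                     % slope at each grid point
NRM = sqrt(1 + S.^2);            % length of the direction vector (1, S)
U = dx ./ NRM;                   % horizontal component, scaled to length dx
V = dx * S ./ NRM;               % vertical component, scaled to length dx
quiver(x, y, U, V);
axis([-2 2 -2 2]);
title('f(x,y)=y-x^2');
```

Replacing the handle f with another function of x and y gives the slope field of a different equation without touching the rest of the code.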
Problem 9.4. Sketch the solution graphs for the cases in Problem 9.3 with initial condition y(0) = 1 on the above slope field plots.
Solution 9.4 The initial condition y(x0) = y0 implies that the graph of the solution passes through the point (x0, y0). Therefore, starting from this point, we sketch the curves tangent to each line segment. □
Let’s try to solve the following differential equation:
\[
y' = ky, \qquad y(0) = y_0.
\]
This problem is for the case where f = f(y). One method is to rely on memory. The function whose derivative is itself is $e^x$. It is a little harder to recall that the function whose derivative is k times itself is $e^{kx}$. (Here k multiplies x instead of being added to it.) Once we find the general solution $y = Ce^{kx}$, using the initial condition, C = y0, we get:
\[
y = y_0 e^{kx}.
\]
However, solving problems empirically like this is too restrictive. We need a systematic way to solve problems. Although we can’t solve all differential equations,
we can solve certain types of them. And we need to remember which types of differential equations can be solved.
9.2 Separation of variables
Let’s use the technique called separation of variables to find the solution when f(x, y) = 2kxy. Let’s start by writing it in Leibniz notation:
\[
\frac{dy}{dx} = 2kxy.
\]
Now let’s separate x and y. We put dx in the x group and dy in the y group. Then we get:
\[
\frac{dy}{y} = 2kx\,dx.
\]
Integrating both sides, we get:
\[
\int \frac{dy}{y} = \int 2kx\,dx \ \Rightarrow\ \ln|y| + C_1 = kx^2 + C_2 \ \Rightarrow\ \ln|y| = kx^2 + C.
\]
Here, C2 − C1 can be regarded as a single general constant, so we replaced it with C. Taking the exponential function, which is the inverse function of the natural logarithm, on both sides, we get:
\[
e^{\ln|y|} = |y|, \qquad e^{kx^2 + C} = e^{C} e^{kx^2}.
\]
Therefore,
\[
|y| = e^{C} e^{kx^2} \ \Rightarrow\ y = C e^{kx^2}. \tag{9.3}
\]
Using the initial condition y(0) = y0 here yields $y = y_0 e^{kx^2}$. In (9.3), note that $e^C$ becomes C, and |y| becomes y simultaneously. Even when the general constant C is negative, $e^C$ is positive. Therefore, $|y| = e^{C} e^{kx^2}$ is a correct expression. When a new general constant C is used instead of $e^C$, |y| is replaced by y, and the sign is absorbed into the new constant.
Question 9.1. However, is it permissible to solve it in this manner? The derivative $\frac{dy}{dx}$ instructs us to differentiate y with respect to x, but is it acceptable to multiply both sides by dx, attach the integral symbol $\int$, integrate the left side with respect to y, and integrate the right side with respect to x?
The background of the separation of variables technique involves the chain rule.
Let’s explain this. If f (x, y) can be divided as follows:
\[
y' = f(x, y) = g(x)h(y),
\]
we can express it as:
\[
\frac{1}{h(y)}\,y' = g(x).
\]
Now, if we find functions G and H of x and y, respectively, such that $G'(x) = g(x)$ and $H'(y) = \frac{1}{h(y)}$, then by the chain rule, we have $\frac{d}{dx}H(y) = H'(y)\,y'$. Applying the Fundamental Theorem of Calculus and integrating both sides with respect to x, we obtain:
\[
\int \frac{1}{h(y)}\,y'\,dx = \int g(x)\,dx \ \Rightarrow\ H(y) = G(x) + C.
\]
The solution y is implicitly given by the above equation. If the inverse function of
H exists, then y = H −1 (G(x) +C) is obtained.
Problem 9.5. Find the general solution of the following differential equations.
(1) $y' = (1 + y)e^x$
(2) $y(x + 1)y' = x(y^2 + 1)$
Solution 9.5 Since there are no initial conditions given, we find solutions that include a general constant C. □
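One way to carry out the separation of variables for Problem 9.5 is sketched below. The constant A is introduced here only to absorb the exponential of C; it is not notation from the text.

```latex
\[
\text{(1)}\quad \frac{dy}{1+y} = e^x\,dx
\ \Rightarrow\ \ln|1+y| = e^x + C
\ \Rightarrow\ y = -1 + A\,e^{e^x}, \qquad A = \pm e^{C}
\]
\[
\text{(2)}\quad \frac{y\,dy}{y^2+1} = \frac{x}{x+1}\,dx = \left(1 - \frac{1}{x+1}\right)dx
\ \Rightarrow\ \frac{1}{2}\ln(y^2+1) = x - \ln|x+1| + C
\ \Rightarrow\ y^2 + 1 = \frac{A\,e^{2x}}{(x+1)^2}, \qquad A = e^{2C} > 0
\]
```

In (1), the constant function y = −1 is also a solution, which corresponds to A = 0. In (2), the solution is left in implicit form and is valid on intervals that do not contain x = −1.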
9.3 Integrating factor
A first-order linear equation is of the form a(x)y′ + b(x)y = c(x). It can be transformed into the following form by dividing by a(x) on intervals where a(x) ̸= 0:
y′ + P(x)y = Q(x).
(9.4)
Here, P(x) is the coefficient of the zeroth-order term, and Q(x) is the inhomogeneous
term. If Q(x) = 0, (9.4) is called a homogeneous problem. Although it is permissible
to write Q(x) on the left and 0 on the right, it is more common to write it on the right.
First-order linear equations can be solved using the integrating factor technique.
If P(x) is integrable, the integrating factor of the first-order linear equation (9.4) is as follows:
\[
I(x) = e^{\int P(x)\,dx}.
\]
It is important to remember that the usefulness of the integrating factor can be understood from its derivative:
\[
I'(x) = P(x)\,e^{\int P(x)\,dx} = P(x)I(x).
\]
Now, multiplying equation (9.4) by the integrating factor, something good happens:
\[
Iy' + IPy = IQ \ \Rightarrow\ Iy' + I'y = IQ \ \Rightarrow\ (Iy)' = IQ \ \Rightarrow\ Iy = \int IQ\,dx.
\]
Therefore, if IQ is integrable, the solution is as follows:
\[
y = \frac{1}{I(x)} \int I(x)Q(x)\,dx = e^{-\int P(x)\,dx} \int e^{\int P(x)\,dx}\,Q(x)\,dx.
\]
For instance, if P and Q are continuous functions, integration is possible, and the
solutions to the first-order linear equation (9.4) are provided by the aforementioned
integral expressions.
Problem 9.6. Determine the solutions and the intervals of existence for the following differential equations.
(1) $xy' = x^3 + 3y$.
(2) y′ = x − 32 y.
Solution 9.6 (1) Rewriting in the form of (9.4), we have $y' - \frac{3}{x}y = x^2$. This is the case where $P(x) = -\frac{3}{x}$. If x = 0 is included, P is not integrable. The domain of the solution is therefore split into x > 0 and x < 0. Let’s only find solutions for x > 0. Then the integrating factor is
\[
I = e^{\int -\frac{3}{x}\,dx} = e^{-3\ln x} = x^{-3}.
\]
(The integrating factor does not include a general constant C. One integrating factor is sufficient for helping in integration.) Therefore,
\[
y = x^3 \int x^{-3} x^2\,dx = x^3(\ln x + C).
\]
(2) is done similarly. □
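As in Problem 9.2, the result of part (1) can be verified by substitution. A short check for x > 0:

```latex
\[
y = x^3(\ln x + C)
\ \Rightarrow\ y' = 3x^2(\ln x + C) + x^2
\ \Rightarrow\ xy' = 3x^3(\ln x + C) + x^3 = 3y + x^3 .
\]
```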
Problem 9.7. A circular water tank with a diameter of 10 meters has water flowing
into it at a rate of 50 liters per second as shown in the figure. Water leaks from the
bottom of the tank at a rate of 10×y liters per second, where y meters is the height
of the water. Find the first-order differential equation for the height of the water and
its solution. The water inflow starts at t = 0 and there is no water in the tank at that
time.
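Solution 9.7 The base of the tank has area $\pi \cdot 5^2 = 25\pi$ m², and 1 m³ equals 1000 liters, so the volume of water is $V = 25000\pi\,y$ liters. The balance of inflow and leakage, $\frac{dV}{dt} = 50 - 10y$, gives
\[
\dot y + \frac{1}{2500\pi}\,y = \frac{1}{500\pi}, \qquad y(0) = 0.
\]
Solving with the integrating factor $e^{t/(2500\pi)}$ yields $y(t) = 5\left(1 - e^{-t/(2500\pi)}\right)$, with t in seconds and y in meters. □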
9.4 Second Order Differential Equations
Now, we solve for the solutions of second-order linear equations. A second-order
linear equation can be written as follows:
y′′ + a(x)y′ + b(x)y = Q(x).
Solving a second-order equation corresponds to integrating twice, so two general constants appear. To determine these, two conditions are necessary:
y(x0) = y0, y′(x0) = y1.
Second-order equations are harder to solve than first-order equations, and explicit solutions exist only in special cases. In this lecture, we find the solution when a, b, and Q are all constants. The equation for the orbit of a celestial body has this form.
9.5 Equation for two-body problem
To determine the orbit of two celestial bodies, such as the Sun and Earth, we need
to solve the following differential equation:
\[ u'' + u = K. \tag{9.5} \]
Obtaining this equation is the main goal of Lecture 11. The K on the right side is a constant given by
\[ K = \frac{(m_1 + m_2)G}{L^2}. \]
Here, m1 and m2 are the masses of the two
celestial bodies, G is the gravitational constant, and L is the angular momentum,
all of which are constants. If x1 (t) and x2 (t) are the positions of the two celestial
bodies at time t, then u is the reciprocal of the distance between the two bodies,
r = ∥x1 − x2 ∥. However, the differentiation in (9.5) is not with respect to the time
variable t but with respect to the angle variable θ in polar coordinates. The method
for solving a second-order linear differential equation with constant coefficients is
described in detail in Appendix A.
The general solution of the second-order differential equation (9.5) contains two constants, which are fixed by two initial conditions:
\[ u = K\bigl(1 + e\cos(\theta - \theta_0)\bigr), \qquad K = \frac{(m_1 + m_2)G}{L^2}. \]
Here, θ0 is the initial angle, and e is the eccentricity. (It is a tradition to use the
same symbol e for eccentricity as the natural constant e, distinguishing them from
the context.) These two are determined by the two initial conditions.
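The claim that u = K(1 + e cos(θ − θ0)) solves (9.5) can be checked numerically. The sketch below approximates u'' with a second-order central difference and evaluates u'' + u − K at a few angles. The parameter values K = 1.5, e = 0.3, θ0 = 0.7 and the step size are arbitrary illustrative choices.

```python
import math

def u(theta, K, e, theta0):
    """Proposed solution of u'' + u = K."""
    return K * (1.0 + e * math.cos(theta - theta0))

def ode_residual(theta, K, e, theta0, h=1e-4):
    """Return u'' + u - K, with u'' from a central second difference."""
    d2u = (u(theta + h, K, e, theta0)
           - 2.0 * u(theta, K, e, theta0)
           + u(theta - h, K, e, theta0)) / h**2
    return d2u + u(theta, K, e, theta0) - K

# Arbitrary sample parameters; the residual should be near zero at every angle.
worst = max(abs(ode_residual(th, 1.5, 0.3, 0.7)) for th in (0.0, 1.0, 2.0, 4.0, 6.0))
print(worst)
```

Changing e or θ0 does not change the outcome, which is why these two constants are free and must be fixed by initial conditions.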
Remark 9.1. The natural initial conditions for determining the orbit of a planet are the initial positions and velocities of the two bodies. However, u is a function of the angle θ, so you need to know the initial angle θ0 at the initial moment to find the initial conditions, and finding the initial angle is only possible after finding the solution. Once the shape of the solution is known, it is determined by the initial conditions, but finding the initial angle requires a way to express the solution using the conserved energy. This will be done in Lecture 12.
Exercises
1. Find the general solutions for the following differential equations.
(1) xy′ = y^2 x^2.
(2) x^{-1} y′ = y sin x.
(3) xy′ − 2y = x^3 sin x cos x.
2. Determine the general solutions and intervals of existence for the following differential equations.
(1) y′ = x^{-1} e^x − xy.
(2) x^2 y′ = xy − e^x.
(3) x^3 y′ + dx^2 y = cos x.
3. Find the general solutions for the following differential equations.
(1) \( \dfrac{dy}{dt} = y^2 - 2t. \)  (2) \( \dfrac{d^2y}{dt^2} + 2\dfrac{dy}{dt} + y = 0. \)  (3) \( \dfrac{dy}{dt} = y^2 e^{-t}. \)
Lecture 10
Newton’s law on Earth
10.1 Newton’s law of motion and gravitation
Newton’s three laws of motion are as follows.
1. Law of inertia: An object moves at a constant velocity if no external forces act
on it.
2. Law of force: Force is equal to the product of mass and acceleration (F = ma).
3. Law of action-reaction: For every action, there is an equal and opposite reaction.
The first law is Galileo’s law of inertia, which describes motion at a constant velocity. It is the special case a = 0 of the second law.
According to Newton’s law of universal gravitation, the gravitational force between two objects is inversely proportional to the square of the distance between
them and directly proportional to the product of their masses. Let m1 and m2 be the
masses of two objects, and x1 and x2 be their position vectors. Then, the gravitational force acting on object m1 is given by
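\[ \mathbf{F}_{m_1} = -G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r} = -G\,\frac{m_1 m_2}{r^2}\,\mathbf{e}_r, \tag{10.1} \]
where G is the gravitational constant with a value of $G \cong 6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$, r is the distance between the two objects, $\mathbf{r}$ is the position difference vector, and $\mathbf{e}_r$ is the unit vector in the direction of $\mathbf{r}$. That is,
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|, \qquad \mathbf{e}_r = \frac{\mathbf{r}}{r}. \tag{10.2} \]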
Here, $\mathbf{r}$ is the position vector pointing from object m2 to object m1. Hence, the motion of m1 as observed by an observer at m2 is $\mathbf{r}(t)$. We will consider m2 as a large object like the Sun and m1 as a small object like the Earth. In Equation (10.2), the trajectory $\mathbf{r}(t)$, t ∈ R, is the path of m1 viewed with m2 placed at the origin. In the next chapter, we will see that this trajectory is an elliptical orbit. However, in reality, m2 also moves slightly, so the actual path of m1 deviates from the ellipse by the amount m2 moves. What remains stationary (or moves at a constant velocity) is not the Sun but the center of mass of the two celestial bodies, the Sun and the Earth.
The vector $\mathbf{r}/r$ is a unit vector pointing from x2 to x1. This corresponds to the unit vector er in Lecture 8 when x2 is taken as the origin in polar coordinates.
10.2 Work and energy
When an object receives a force F and moves a distance ℓ, the magnitude of work
W is given as follows:
W = f ℓ (work = component of force in the direction of motion × displacement).
(10.3)
Here, f refers to the component of force F in the direction of motion. If the force F
is perpendicular to the direction of motion, then f = 0, and no work is done by the
force. For instance, if a planet or satellite orbits in a circular orbit, the force acting,
i.e., gravity, is perpendicular to the direction of motion, and thus, the work done is
W = 0.
Question 10.1. Why is work defined as the product of force and displacement?
Is Equation (10.3) a definition of work? If work is energy, then Equation (10.3)
should be a formula for calculating energy, not a definition of work. Some books
refer to Equation (10.3) as the definition of work. If that is the case, then one must
separately demonstrate that work and energy are the same. In any case, what needs
to be explained is that using Equation (10.3) for calculation yields the correct energy. And by "correct," it is meant that the calculated energy does not contradict
existing energy concepts. In fact, Equation (10.3) can be understood as a formula
for calculating potential energy.
Let’s see through an example how energy and work are connected. Suppose a mass m at rest in one-dimensional space receives a force f = ma for a time t. Then the velocity attained is v = at. Therefore, the kinetic energy at that moment is $E_k = \frac{1}{2}ma^2t^2$. So what is the distance traveled? The distance traveled is obtained by integrating the velocity. That is,
\[ \ell = \int_0^t as\,ds = \frac{1}{2}as^2\Big|_0^t = \frac{1}{2}at^2. \]
Therefore, using Equation (10.3) to calculate work, $W = f\ell = ma \times \frac{1}{2}at^2 = \frac{1}{2}ma^2t^2$, which is equal to the kinetic energy. In other words, energy can also be calculated using Equation (10.3). In reality, Equation (10.3) is a formula for calculating potential energy when the parameter for energy calculation is changed from time to distance (arc-length).
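The agreement between work and kinetic energy in this example can be reproduced with a few lines of code. In the sketch below the distance is obtained by integrating the velocity v(s) = as numerically (midpoint rule) rather than by the closed form, so that the comparison is not circular. The sample values m = 2, a = 3, t = 5 and the function names are illustrative assumptions.

```python
def kinetic_energy(m, a, t):
    """E_k = (1/2) m v^2 with v = a t (mass starts at rest, constant force)."""
    v = a * t
    return 0.5 * m * v**2

def work_done(m, a, t, steps=100_000):
    """W = f * l, where l is the midpoint-rule integral of v(s) = a*s over [0, t]."""
    ds = t / steps
    distance = sum(a * (i + 0.5) * ds for i in range(steps)) * ds
    return (m * a) * distance

# Arbitrary sample values in SI units.
E_k = kinetic_energy(2.0, 3.0, 5.0)
W = work_done(2.0, 3.0, 5.0)
print(E_k, W)
```

Both quantities are computed from different ingredients (final velocity versus force times distance), yet they agree, which is the point of the example.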
What if the force is not constant but a function? If it is a function of time, then the acceleration varies with time, and the velocity becomes the integral of the acceleration, i.e., $v(t) = v(0) + \int_0^t a(s)\,ds$. Therefore, kinetic energy can be easily obtained. If the force is a function of position, then integration using Equation (10.3) is necessary. The actual gravity (10.1) is a function of position or distance, and in this case, Equation (10.3) is more useful than the formula for kinetic energy. For example, if an object moves along the x-axis and the force component in the x-direction is a function of x, i.e., f = f(x), then the work done by the force f(x) between x = a and x = b is given by
\[ W = \int_a^b f(x)\,dx. \]
It is called a definite integral because it calculates the accumulated work done by the force f(x) from the beginning to the end. That is, the definite integral gives the signed area under the graph of f(x) from x = a to x = b.
10.3 Gravity force and potential energy
The energy of a planet’s motion is exchanged between potential and kinetic energy as the planet alternately accelerates and decelerates. When an object with mass m1 moves with velocity $\mathbf{v}$, the kinetic energy is given by:
\[ E_k = \frac{1}{2}\,m_1\|\mathbf{v}\|^2. \]
The following problem demonstrates that the potential energy due to gravity on the
surface of Earth can also be expressed as a product of gravity and distance.
Problem 10.1 (Gravity on the Earth’s surface). The gravitational force exerted on an object with mass m1 at the Earth’s surface is $-m_1 g\hat{k}$. Here, g = 9.8 m/sec^2 is the gravitational acceleration, and k̂ is the unit vector in the vertical direction on the Earth’s surface. If this object is placed at a height h > 0 above the surface, the object’s potential energy is
\[ E_p = m_1 g h. \tag{10.4} \]
Show the following:
(1) Confirm the magnitude of the gravitational acceleration g using Equation (10.1).
(2) Explain the concept of potential energy (10.4) using the work concept.
(3) Explain the significance of potential energy (10.4).
Solution 10.1 (1) Comparing $-m_1 g\hat{k}$ with Equation (10.1), the mass m corresponds to m1, and the vector k̂ corresponds to $\mathbf{r}/r$. Therefore, the remaining part corresponds to the constant g:
\[ g = \frac{Gm_2}{R^2} \approx 9.8\ \mathrm{m/sec^2}. \]
Here, m2 is the mass of Earth, and R is the radius of Earth. The value of g can be verified by looking up these values on the internet.
(2) Work is a method of calculating potential energy. Lifting the object requires a force that balances gravity, whose vertical component is the constant $f_z = m_1 g$. If the force in the direction of motion is constant, then the work is given by $f_z h$, where h is the (vertical) displacement. Therefore, the potential energy is E_p = m1 gh.
(3) The energy required to push the object from the Earth’s surface to its current position is the potential energy. Alternatively, it is the amount of work gravity does on the object as it falls from that position to the Earth’s surface.
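Part (1) can be confirmed with a short calculation. The sketch below evaluates g = Gm2/R^2 using the value of G from Equation (10.1) and the Earth radius of 6371 km quoted later in this lecture. The Earth mass of 5.972 × 10^24 kg is an assumed reference value that does not appear in the text.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of Earth in kg (assumed reference value, not from the text)
R_EARTH = 6.371e6    # radius of Earth in m (6371 km)

# Surface gravitational acceleration from Newton's law of gravitation.
g = G * M_EARTH / R_EARTH**2
print(f"g = {g:.3f} m/s^2")
```

The result is close to the familiar 9.8 m/sec^2; the small remaining difference depends on which reference values are used for the mass and radius.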
Problem 10.2. A mass of 2 kg is thrown vertically upward from the ground with a force of twice the gravity for t seconds. Calculate the kinetic and potential energies at that moment.
Solution 10.2 If the force is twice the gravity, then it is 2mg = 4g kg. The net acceleration is g since we subtract gravity. Therefore, the velocity after t seconds is $\int_0^t g\,ds = gt$, and the kinetic energy is $\frac{1}{2}mv^2 = g^2t^2$ kg. The distance traveled is $\int_0^t gs\,ds = \frac{1}{2}gt^2$, so the potential energy is $E_p = mgh = (2\,\mathrm{kg})\,g \cdot \frac{1}{2}gt^2 = g^2t^2$ kg. The total energy is $g^2t^2$ kg + $g^2t^2$ kg = $2g^2t^2$ kg. Alternatively, the total energy can be calculated using Equation (10.3): $4g\,\mathrm{kg} \times \frac{1}{2}gt^2 = 2g^2t^2$ kg. If the total energy after 100 seconds is expressed in units, since g = 9.8 m/sec^2, the total energy is as follows:
\[ E_{\text{total}} = 2(9.8)^2\ \mathrm{m^2/sec^4} \times (100)^2\ \mathrm{sec^2\,kg} = 1.9208 \times 10^6\ \mathrm{m^2\,kg/sec^2}. \]
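The bookkeeping in Solution 10.2 can be retraced step by step. The sketch below uses the data of the problem (m = 2 kg, applied force 2mg, t = 100 s) and computes the kinetic energy, the potential energy, and the work of the applied force separately; the variable names are illustrative.

```python
g = 9.8      # gravitational acceleration, m/s^2
m = 2.0      # mass, kg
t = 100.0    # duration of the push, s

applied_force = 2 * m * g                   # twice the weight
net_acceleration = applied_force / m - g    # gravity is subtracted, leaving g
v = net_acceleration * t                    # velocity after t seconds
h = 0.5 * net_acceleration * t**2           # height reached after t seconds

kinetic = 0.5 * m * v**2
potential = m * g * h
work = applied_force * h                    # Equation (10.3): force times distance

print(kinetic, potential, work)
```

The work done by the applied force equals the sum of the kinetic and potential energies, as the solution states.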
Calculating the potential energy or gravity between planets or between a planet
and a star requires a different approach. In these cases, gravity cannot be treated as a
constant. Gravity becomes a function of distance, requiring integration to calculate
energy. However, there is another fundamental problem. Potential energy on the
Earth’s surface is defined to be 0, with the Earth’s surface as the reference point.
What should be the reference point for potential energy between planets?
Problem 10.3 (Potential energy with Earth’s surface as reference). Gravity is a
function of distance r between two objects given by Newton’s law of gravitation
(10.1). Let’s denote the mass of Earth as m2 . For an object with mass m1 located at
a distance r > 0 from the center of Earth (not on the Earth’s surface), the potential
energy is given by
\[ E_p = Gm_1m_2\left(R^{-1} - r^{-1}\right), \tag{10.5} \]
where R is the radius of Earth and r is the distance between the object’s center and
the Earth’s center.
Solution 10.3 First, assume that the object moves up and down along a line through the center of the Earth. The k̂ component of gravity is given by $f = -Gm_1m_2s^{-2}$. Here, s is the distance to the center of the Earth. Pushing the object away from the Earth’s surface requires a force of the same magnitude in the opposite direction. Integrating this force from R to r > R yields:
\[ \int_R^r Gm_1m_2s^{-2}\,ds = -Gm_1m_2s^{-1}\Big|_R^r = Gm_1m_2\left(R^{-1} - r^{-1}\right). \]
This matches (10.5). Let h denote the distance from the surface. Then, r = R + h. Therefore, the potential energy is:
\[ E_p = Gm_1m_2\left(\frac{1}{R} - \frac{1}{R+h}\right) = Gm_1m_2\,\frac{R+h-R}{R(R+h)} = Gm_1m_2\,\frac{h}{R^2}\,\frac{R^2}{R^2+Rh}. \]
If h is much smaller than R, $\frac{R^2}{R^2+Rh} \approx 1$. The potential energy can then be written as:
\[ E_p \approx Gm_1m_2\,\frac{h}{R^2} = m_1\,\frac{Gm_2}{R^2}\,h, \]
which is the surface formula (10.4) with g = Gm2/R^2 and a valid approximation for the potential energy (10.5). (The radius of Earth is R = 6371 km. If h = 10 km, then $\frac{R^2}{R^2+Rh} \approx 0.9984$, a difference of about 0.16%.)
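The quality of the approximation can be checked directly by comparing the exact potential energy (10.5) with the linear formula at h = 10 km. In the sketch below the Earth mass is an assumed reference value (it cancels in the ratio), and the function names are illustrative.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of Earth in kg (assumed reference value; cancels in the ratio)
R_EARTH = 6.371e6    # radius of Earth in m

def potential_exact(m1, h):
    """Equation (10.5) with r = R + h: G m1 m2 (1/R - 1/(R + h))."""
    return G * m1 * M_EARTH * (1.0 / R_EARTH - 1.0 / (R_EARTH + h))

def potential_linear(m1, h):
    """Surface approximation m1 * (G m2 / R^2) * h."""
    return m1 * (G * M_EARTH / R_EARTH**2) * h

# Ratio of exact to approximate energy for a 1 kg object at h = 10 km.
ratio = potential_exact(1.0, 10_000.0) / potential_linear(1.0, 10_000.0)
print(f"{ratio:.4f}")
```

The ratio equals R^2/(R^2 + Rh) = R/(R + h), so it moves further from 1 as h grows and the linear formula gradually loses accuracy.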
Remark 10.1 (A brief note). Since h is much smaller than R, we can say $\frac{h}{R(R+h)} \approx \frac{h}{R^2}$. However, we left the h in the numerator. We shouldn’t delete everything just because it’s small. What can be dropped and what must be kept depends on what we want to see and on the terms around it.
Question 10.2. The potential energy (10.5) becomes 0 on the Earth’s surface. This
definition represents potential energy with respect to the Earth’s surface. What happens if we calculate potential energy with respect to the center of the Earth?
When calculating the potential energy from the center of the Earth, it corresponds to the case where R = 0. In this scenario, the potential energy given by (10.5) diverges, meaning:
\[ \lim_{R \to 0} Gm_1m_2\left(R^{-1} - r^{-1}\right) = \infty. \]
This implies that the potential energy becomes infinite when measured from the center of the Earth. Essentially, if the Earth’s whole mass were concentrated at its center, so that (10.1) held all the way down to r = 0, an infinite amount of energy would be required to move away from the center. In other words, objects located at the center could not escape. (Even if an object has a small mass, if it can be compressed sufficiently, nothing can escape from within. Such objects are known as micro black holes.)
If potential energy cannot be measured from the center of the Earth, the next natural choice is to measure it from ∞. Letting R → ∞ in (10.5), the potential energy is given by:
\[ E_p = -\frac{Gm_1m_2}{r}. \tag{Potential Energy} \]
In this case, the drawback is that the potential energy is negative. When measured from infinity, the potential energy is 0 at ∞ and becomes increasingly negative as the object approaches the Earth’s center. But among the available choices, this is the best one. When considering the movement between planets, the reference point for potential energy is r = ∞, and the potential energy is negative and becomes 0 at r = ∞. When considering movement due to gravity near the Earth’s surface, the reference point is the surface of the Earth, and the potential energy is positive, reaching a minimum of 0 at h = 0.
10.4 Projectile motion
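Let’s examine the trajectory of a projectile launched from the ground at an angle φ ∈ (0, π/2) with an initial speed v0 > 0. The objective is to find the projectile’s trajectory before it touches the ground again, the maximum height reached before it falls, the distance traveled, and the time it stays in the air. Air resistance is ignored.
Assuming the projectile moves in the xz-plane, let’s find the trajectory $\mathbf{r}(t)$. Let the starting point be the origin, $\mathbf{r}(0) = \mathbf{0}$, and the initial velocity be $\mathbf{v}(0) = (v_0\cos\varphi, v_0\sin\varphi)$. The acceleration $\mathbf{a}$ is given by gravity, so $\mathbf{a}(t) = (0, -g)$. The velocity vector $\mathbf{v}(t)$ at time t is obtained by integrating the acceleration and applying the initial condition:
\[ \mathbf{v}(t) = \int \mathbf{a}(t)\,dt = \begin{pmatrix} c_1 \\ -gt + c_2 \end{pmatrix}, \qquad \mathbf{v}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} v_0\cos\varphi \\ v_0\sin\varphi \end{pmatrix}. \]
Thus, $\mathbf{v}(t) = (v_0\cos\varphi, -gt + v_0\sin\varphi)$. Integrating once more to calculate the position vector:
\[ \mathbf{r}(t) = \int \mathbf{v}(t)\,dt = \begin{pmatrix} v_0\cos\varphi\,t + c_1 \\ -\frac{1}{2}gt^2 + v_0\sin\varphi\,t + c_2 \end{pmatrix} \;\Rightarrow\; \mathbf{r}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Therefore, the projectile’s trajectory is:
\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} v_0\cos\varphi\,t \\ -\frac{1}{2}gt^2 + v_0\sin\varphi\,t \end{pmatrix}. \]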
The condition z(t) = 0 marks the moments when the projectile is on the ground. Therefore, solving $-\frac{1}{2}gt^2 + v_0\sin\varphi\,t = 0$ gives us the moments when it touches the ground. One solution is the initial time, t = 0. The other is the time of flight,
\[ T = \frac{2v_0\sin\varphi}{g}, \tag{Time of flight} \]
when it touches the ground again. The x-component x(T) at time T is the distance traveled:
\[ R = x(T) = \frac{2v_0^2\sin\varphi\cos\varphi}{g}. \tag{Range} \]
The projectile’s maximum height occurs at half of the total time of flight, so:
\[ H = z(T/2) = \frac{v_0^2\sin^2\varphi}{2g}. \tag{Maximum height} \]
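The three formulas can be cross-checked against the trajectory itself. The sketch below evaluates the trajectory at t = T and t = T/2 and compares the results with the closed-form range and maximum height. The function names and the sample launch data (20 m/s at 30 degrees) are illustrative assumptions.

```python
import math

def trajectory(t, v0, phi, g=9.8):
    """Position (x, z) at time t for launch speed v0 and launch angle phi (radians)."""
    x = v0 * math.cos(phi) * t
    z = -0.5 * g * t**2 + v0 * math.sin(phi) * t
    return x, z

def flight_summary(v0, phi, g=9.8):
    """Time of flight T, range R, and maximum height H from the closed-form formulas."""
    T = 2 * v0 * math.sin(phi) / g
    R = 2 * v0**2 * math.sin(phi) * math.cos(phi) / g
    H = v0**2 * math.sin(phi)**2 / (2 * g)
    return T, R, H

# Sample launch: 20 m/s at 30 degrees.
v0, phi = 20.0, math.radians(30)
T, R, H = flight_summary(v0, phi)
x_end, z_end = trajectory(T, v0, phi)        # should land at (R, 0)
_, z_mid = trajectory(T / 2, v0, phi)        # should reach height H
print(T, R, H)
```

Evaluating the trajectory at T and T/2 reproduces R and H, confirming that the formulas are consistent with the position vector derived above.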
Problem 10.4. Explain how the projectile trajectory changes if there is a crosswind
blowing at a speed of v1 .
Solution 10.4 If we ignore air resistance, no matter how strong the crosswind is, it
doesn’t affect the projectile’s trajectory. When considering air resistance, the method
used above is not sufficient.
Problem 10.5. Given a fixed launch velocity, how can you maximize the distance
the projectile travels?
Solution 10.5 If the launch angle φ is fixed, the range and the maximum height are proportional to the square of the launch speed, v0^2. The time of flight is proportional to v0. If the speed is fixed, you can choose the angle φ. The range is maximized when sin φ cos φ reaches its maximum value. To find the maximum, differentiate it and set the derivative equal to 0:
\[ (\sin\varphi\cos\varphi)' = \cos^2\varphi - \sin^2\varphi = 2\cos^2\varphi - 1 = 0. \]
Thus, the only critical point in (0, π/2) is where $\cos\varphi = \frac{1}{\sqrt{2}}$, so φ = π/4.
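The same conclusion can be reached by brute force. The sketch below evaluates the range formula for every whole-degree launch angle between 1 and 89 degrees and picks the best one. The launch speed of 30 m/s is an arbitrary illustrative value; it does not affect which angle wins.

```python
import math

def projectile_range(v0, phi, g=9.8):
    """Range R = 2 v0^2 sin(phi) cos(phi) / g for launch angle phi in radians."""
    return 2 * v0**2 * math.sin(phi) * math.cos(phi) / g

# Scan whole-degree launch angles at a fixed, arbitrary launch speed.
angles_deg = range(1, 90)
best_deg = max(angles_deg, key=lambda d: projectile_range(30.0, math.radians(d)))
print(best_deg)
```

A scan like this only samples whole degrees, so it supports the calculus argument rather than replacing it.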
Question 10.3. The following text is from a baseball magazine: “We were taught in school that the angle at which a ball can be thrown the farthest is 45 degrees. But in actual baseball, the optimal launch angle is close to 30 degrees.” Why is this different? (The optimal angle for a golf ball is about 17 degrees.)
The reason is air resistance and the spin of the ball. The ball’s spin comes from the bat striking the lower part of the ball. If the launch angle is 45 degrees and the ball has such spin, the actual trajectory rises much higher than the optimal trajectory. The spin of a golf ball is also caused by hitting the bottom of the ball; it is more pronounced than that of a baseball and has a greater effect because of the surface of the ball. Of course, without air resistance, 45 degrees is always the optimal launch angle.
Exercises
1. Calculate the potential energy of a 10 kg object on the Earth’s surface. (Consider
R = ∞ as the reference point.)
2. Calculate the gravitational force between the Earth and the Sun. Compare it with
the gravitational force between Mars and the Sun. (Necessary data can be found
on the internet, such as the masses of Earth and Mars, and the distances from the
Sun.)
3. Let the mass of Jupiter be 1.899 × 10^27 kg and its diameter be 140,000 km. Calculate the magnitude of the gravitational force on Jupiter’s surface and compare it with the gravitational force on the Earth’s surface.
4. A 10 kg piece of iron falls into water with a depth of 10 meters. How much work
does gravity do? (Necessary data can be found on the internet, such as the density
of iron.)
5. Assume the speed of sound is 340 m/s. Calculate the maximum distance traveled
when the projectile’s velocity is equal to the speed of sound. Also, determine the
time of flight and maximum height reached.
6. It is said that the maximum range of a K9 howitzer is 53 km. What is the launch
velocity?
Lecture 11
Newton’s law in space: Two-body problem
The purpose of this lecture is to determine the trajectories of two celestial bodies moving under their mutual gravity using Newton’s laws of motion and differentiation, and to confirm Kepler’s laws through the two-body problem. We consider differentiation with respect to the time variable t in problems involving motion. To distinguish differentiation with respect to the time variable t from differentiation with respect to spatial variables, we use the following notation:
\[ \dot{F} := \frac{d}{dt}F = F'(t). \]
11.1 Kepler’s laws
Astronomer Johannes Kepler described the orbits of planets around the Sun with
three laws between 1609 and 1619. These laws modify Copernicus’ theory of circular orbits centered on the Sun and explain how planetary velocities change. Kepler’s
three laws of planetary motion are as follows:
1. The orbit of a planet is an ellipse with one of the two foci at the Sun.
2. The line connecting the planet and the Sun sweeps out equal areas in equal intervals of time.
3. The square of the orbital period of a planet is proportional to the cube of the
semi-major axis.
Isaac Newton showed in 1687 that Kepler’s three laws follow from Newton’s laws of motion and the law of gravity presented in Section 10.1. We aim to understand this process in this lecture.
In this section, we follow the notation from Section 10.1. For example, if the masses of two celestial bodies are m1 and m2, and their positions are x1 and x2, respectively, then the position difference r = x1 − x2 is the position of m1 as viewed from m2 (if m2 is assumed to be at the origin, i.e., x2 = 0, then r = x1).
Problem 11.1 (Plane motion). (1) Show that there exists a constant vector c satisfying the following equation:
r × ṙ = c.
(11.1)
(2) Explain the meaning of this relationship.
To solve this problem, one must remember the properties of the cross product:
1. The cross product v1 × v2 is defined between two 3-dimensional vectors.
2. The cross product v1 × v2 is a vector perpendicular to both v1 and v2 .
3. If v1 and v2 are parallel or if one of them is zero, then v1 × v2 = 0.
4. The derivative of a cross product satisfies the following product rule:
\[ \frac{d}{dt}(\mathbf{v}_1 \times \mathbf{v}_2) = \dot{\mathbf{v}}_1 \times \mathbf{v}_2 + \mathbf{v}_1 \times \dot{\mathbf{v}}_2. \]
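Solution 11.1 (1) The gravitational force acting on object m1 is given by $\mathbf{F}_{m_1} = -\frac{Gm_1m_2}{r^2}\,\mathbf{e}_r$, where the direction is towards m2. Then, since $m_1\ddot{\mathbf{x}}_1 = \mathbf{F}_{m_1}$ and $m_2\ddot{\mathbf{x}}_2 = -\mathbf{F}_{m_1}$, both $\ddot{\mathbf{x}}_1$ and $\ddot{\mathbf{x}}_2$ are in the direction of $\mathbf{r}$ or $-\mathbf{r}$, and so is $\ddot{\mathbf{r}} = \ddot{\mathbf{x}}_1 - \ddot{\mathbf{x}}_2$. Therefore, by the properties of the cross product listed above,
\[ \frac{d}{dt}(\mathbf{r} \times \dot{\mathbf{r}}) = \dot{\mathbf{r}} \times \dot{\mathbf{r}} + \mathbf{r} \times \ddot{\mathbf{r}} = \mathbf{0} + \mathbf{0} = \mathbf{0}. \]
Thus, there exists a constant vector $\mathbf{c}$ such that $\mathbf{r} \times \dot{\mathbf{r}} = \mathbf{c}$.
(2) Consider the plane through the origin perpendicular to the vector $\mathbf{c}$. Equation (11.1) implies that the vector $\mathbf{r}$ is perpendicular to $\mathbf{c}$, meaning it lies on this plane. Thus, object m1 moves on this plane. □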
11.2 Two-body problem
Consider two objects moving in space. Let m1 and m2 be their masses, and x1 (t)
and x2 (t) be their positions at time t. Assuming no external forces other than gravity act between them, they satisfy two second-order differential equations given by
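Newton’s law of force, i.e.,
\[ \begin{cases} m_1\ddot{\mathbf{x}}_1 = -G\,\dfrac{m_1m_2}{r^2}\,\mathbf{e}_r, \\[4pt] m_2\ddot{\mathbf{x}}_2 = G\,\dfrac{m_1m_2}{r^2}\,\mathbf{e}_r. \end{cases} \tag{11.2} \]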
This problem is called the two-body problem. Let’s solve this problem to see if the
solutions indeed satisfy Kepler’s laws.
Note that Newton’s third law of motion, the law of action and reaction, is already
embedded in the differential equations (11.2). The force exerted on m1 by m2 is
pulling m1 , and conversely, the force exerted on m2 by m1 is pulling m2 , so their
sum is
m1 ẍ1 + m2 ẍ2 = 0.
These forces are equal in magnitude but opposite in direction.
While (11.2) can be seen as second-order vector differential equations in three dimensions, knowing that the motion lies in a plane, we treat them as second-order vector differential equations in two dimensions. So, in fact, we need to solve a total of four scalar equations. A first-order differential equation needs one initial condition, and a second-order differential equation requires two. To solve two second-order vector differential equations, we therefore need four initial vectors. That is, suppose the following are given:
x1(0), ẋ1(0), x2(0), and ẋ2(0). (11.3)
Since each of them is a vector in the plane, there are a total of eight degrees of freedom in the choice of initial values.
11.3 Center of mass
To find the solutions x1 (t) and x2 (t) of the two second-order vector differential
equations (11.2), we proceed with simplification. First, we find the center of mass,
which is the position where the total mass is balanced, defined as the weighted
average of positions with respect to mass. It is as follows:
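\[ \mathbf{R} = \frac{m_1}{m_1+m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1+m_2}\,\mathbf{x}_2 = \frac{m_1\mathbf{x}_1 + m_2\mathbf{x}_2}{m_1+m_2}. \tag{Center of Mass} \]
Taking the second derivative of the center of mass with respect to time, we get
\[ \ddot{\mathbf{R}} = \frac{m_1\ddot{\mathbf{x}}_1 + m_2\ddot{\mathbf{x}}_2}{m_1+m_2} = \mathbf{0}. \]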
In other words, the center of mass moves at a constant velocity. Therefore, as discussed in Lecture 7, we can adopt a coordinate system where the center of mass is at
the origin. This coordinate system is called the center of mass frame. In this frame,
the following holds:
R = Ṙ = R̈ = 0.
Under this coordinate system, the initial values in (11.3) must satisfy the following two conditions:
\[ m_1\mathbf{x}_1(0) + m_2\mathbf{x}_2(0) = \mathbf{0}, \qquad m_1\dot{\mathbf{x}}_1(0) + m_2\dot{\mathbf{x}}_2(0) = \mathbf{0}. \]
Here, four degrees of freedom have been used, leaving four remaining.
11.4 Displacement vector
What we are calculating is the position difference vector r = x1 − x2. In other words, we calculate the trajectory of m1 with x2 placed at the origin. When a heavy celestial body like the Sun is regarded as sitting at the origin, x2 = 0, r can be considered as the position of the Earth. However, this is only an approximation, because what is fixed is the center of mass, not x2. Therefore, even if r traces an elliptical orbit, the actual trajectory of m1 is slightly off the ellipse. By how much? By the distance between the center of mass and x2.
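Now let’s consider the movement of $\mathbf{r}$. Since $\mathbf{F}_{m_2} = -\mathbf{F}_{m_1}$, the difference in acceleration is as follows:
\[ \ddot{\mathbf{r}} = \ddot{\mathbf{x}}_1 - \ddot{\mathbf{x}}_2 = \frac{\mathbf{F}_{m_1}}{m_1} - \frac{\mathbf{F}_{m_2}}{m_2} = \left(\frac{1}{m_1} + \frac{1}{m_2}\right)\mathbf{F}_{m_1} = \frac{m_1+m_2}{m_1m_2}\,\mathbf{F}_{m_1}. \]
Rewriting $\mathbf{F}_{m_1}$ using Newton’s law of gravity, we get:
\[ \ddot{\mathbf{r}} = -\frac{m_1+m_2}{m_1m_2}\,G\,\frac{m_1m_2}{r^2}\,\mathbf{e}_r = -G(m_1+m_2)\,\frac{1}{r^2}\,\mathbf{e}_r. \]
If we set k = G(m1 + m2), this equation can be written as:
\[ \ddot{\mathbf{r}} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1+m_2)G. \tag{11.4} \]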
This type of problem is called the Kepler problem. The same problem arises not only with gravity but also in the case of electric fields. Once the displacement vector r(t) is obtained, we can determine the two trajectories x1 and x2 using the center of mass R(t) and the vector difference r(t).
Problem 11.2. Show the following:
\[ \mathbf{x}_1(t) = \mathbf{R}(t) + \frac{m_2}{m_1+m_2}\,\mathbf{r}(t), \qquad \mathbf{x}_2(t) = \mathbf{R}(t) - \frac{m_1}{m_1+m_2}\,\mathbf{r}(t). \tag{11.5} \]
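Solution 11.2 It’s a simple calculation. $\mathbf{R}$ and $\mathbf{r}$ are given by:
\[ \mathbf{R} = \frac{m_1}{m_1+m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1+m_2}\,\mathbf{x}_2, \qquad \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2. \]
To compute $\mathbf{x}_1$, we can eliminate $\mathbf{x}_2$. Substituting $\mathbf{x}_2 = \mathbf{x}_1 - \mathbf{r}$, we have:
\[ \mathbf{R} = \frac{m_1}{m_1+m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1+m_2}\,(\mathbf{x}_1 - \mathbf{r}) = \mathbf{x}_1 - \frac{m_2}{m_1+m_2}\,\mathbf{r}. \]
Solving for $\mathbf{x}_1$ gives the first formula in (11.5). We obtain $\mathbf{x}_2$ similarly. □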
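The decomposition (11.5) is easy to test on concrete numbers. The sketch below computes the center of mass and the displacement vector from two sample positions and then reconstructs the positions with (11.5). The masses, the positions, and the function names are arbitrary illustrative choices, and vectors are represented as plain tuples.

```python
def center_of_mass(m1, x1, m2, x2):
    """Mass-weighted average of the two positions."""
    total = m1 + m2
    return tuple((m1 * a + m2 * b) / total for a, b in zip(x1, x2))

def positions_from(R, r, m1, m2):
    """Equation (11.5): recover x1 and x2 from the center of mass R and r = x1 - x2."""
    total = m1 + m2
    x1 = tuple(Rc + m2 / total * rc for Rc, rc in zip(R, r))
    x2 = tuple(Rc - m1 / total * rc for Rc, rc in zip(R, r))
    return x1, x2

# Arbitrary sample data in the plane.
m1, m2 = 3.0, 5.0
x1, x2 = (1.0, 2.0), (-4.0, 0.5)
R = center_of_mass(m1, x1, m2, x2)
r = tuple(a - b for a, b in zip(x1, x2))
x1_back, x2_back = positions_from(R, r, m1, m2)
print(x1_back, x2_back)
```

The reconstructed positions agree with the originals up to rounding, so the pair (R, r) carries exactly the same information as the pair (x1, x2).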
Remark 11.1. When m1 is significantly smaller than m2, we can ignore m1 in equation (11.4) and use k = m2 G. Considering the relationship between the Earth and its satellite, the mass of the satellite is much smaller than the Earth’s mass, so it is reasonable to set m1 = 0 and use k = m2 G instead of k = (m1 + m2)G. In the case of the Earth and the Sun, the uncertainty in the mass of the Sun may be larger than the mass of the Earth, so it may again be reasonable to use k = m2 G. In practice, this is how it is done. However, in (11.5), keeping m1 makes a difference when ∥r∥ is large. For satellites orbiting the Earth, the distance is small, but for planets far away, the distance is significant and the term cannot be ignored.
11.5 Kepler problem
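Let’s solve the Kepler problem (11.4). Restating the problem, we have:
\[ \ddot{\mathbf{r}} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1+m_2)G. \]
Rewriting in terms of the acceleration $\mathbf{a} = \ddot{\mathbf{r}}$, we have:
\[ \mathbf{a} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1+m_2)G. \tag{11.6} \]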
Problem 11.3. Using the relations from Equations (8.5)–(8.7), show that the vector equation (11.6) can be written as the following two scalar equations:
\[ \ddot{r} - r\dot{\theta}^2 = -\frac{(m_1+m_2)G}{r^2}, \tag{11.7} \]
\[ r\ddot{\theta} + 2\dot{r}\dot{\theta} = 0. \tag{11.8} \]
Solution 11.3 Using Equation (8.7), we have:
\[ \mathbf{a} = (\ddot{r} - r\dot{\theta}^2)\,\mathbf{e}_r + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\mathbf{e}_\theta = -\frac{k}{r^2}\,\mathbf{e}_r. \]
Comparing the coefficients of $\mathbf{e}_r$ and $\mathbf{e}_\theta$, we obtain the two equations above. □
These two equations play a crucial role. Equation (11.8) implies the law of conservation of angular momentum, which is then used to derive the orbit equation from Equation (11.7).
Problem 11.4 (Kepler’s Second Law and Conservation of Angular Momentum). (1) Show that the angular momentum $r^2\dot{\theta}$ is constant. (2) Explain how this relates to Kepler’s Second Law. (i.e., verify that the rate of change of the area swept out by the line connecting the Sun and the Earth when r = r(t) and θ = θ(t) is half the angular momentum.)
Solution 11.4 (1) First, let’s differentiate the angular momentum $r^2\dot{\theta}$. Using Equation (11.8), we obtain:
\[ \frac{d}{dt}(r^2\dot{\theta}) = 2r\dot{r}\dot{\theta} + r^2\ddot{\theta} = r(2\dot{r}\dot{\theta} + r\ddot{\theta}) = 0. \]
Thus, the angular momentum $r^2\dot{\theta}$ is constant.
(2) From t to t + h, the area swept out by the line connecting the two celestial bodies is similar to the area of a sector. However, since the radius r(t) is not constant, we use the mean value theorem: there exists some radius between the minimum and maximum radii, attained at a time t*, such that the area is given by:
\[ \frac{1}{2}\bigl(\theta(t+h) - \theta(t)\bigr)\,r^2(t^*), \qquad t < t^* < t+h. \]
Since t* lies between t and t + h, it converges to t as h → 0. Therefore, the instantaneous rate of change is given by:
\[ \lim_{h \to 0} \frac{1}{2}\,\frac{\theta(t+h) - \theta(t)}{h}\,r^2(t^*) = \frac{1}{2}\,\dot{\theta}(t)\,r^2(t). \]
Thus, the rate of change of the area swept out by the line is half the angular momentum $r^2\dot{\theta}$, and since the angular momentum is constant, this verifies Kepler’s Second Law. □
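The conservation of r^2 θ̇ can also be observed in a simulation of the Kepler problem (11.4). The sketch below is written under several assumptions that are not in the text: units are chosen so that k = 1, the initial data are hypothetical, the motion is integrated in Cartesian coordinates with the velocity Verlet scheme, and the identity r^2 θ̇ = xẏ − yẋ is used to evaluate the angular momentum.

```python
import math

def gravity_accel(x, y, k):
    """Acceleration -(k / r^2) e_r of equation (11.4), in Cartesian components."""
    r3 = (x * x + y * y) ** 1.5
    return -k * x / r3, -k * y / r3

def angular_momentum_samples(x, y, vx, vy, k=1.0, dt=1e-3, steps=20_000):
    """Integrate with velocity Verlet and record r^2 * thetadot = x*vy - y*vx each step."""
    samples = [x * vy - y * vx]
    ax, ay = gravity_accel(x, y, k)
    for _ in range(steps):
        vx += 0.5 * dt * ax      # half kick
        vy += 0.5 * dt * ay
        x += dt * vx             # drift
        y += dt * vy
        ax, ay = gravity_accel(x, y, k)
        vx += 0.5 * dt * ax      # half kick
        vy += 0.5 * dt * ay
        samples.append(x * vy - y * vx)
    return samples

# Hypothetical initial data (k = 1): a bound, non-circular orbit.
samples = angular_momentum_samples(x=1.0, y=0.0, vx=0.0, vy=1.2)
spread = max(samples) - min(samples)
print(f"variation of r^2*thetadot over the run: {spread:.2e}")
```

In this scheme each velocity update is parallel to the position vector and each position update is parallel to the velocity, so xẏ − yẋ is preserved up to rounding. This mirrors the argument of Solution 11.4, where the central direction of the force is what makes the derivative vanish.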
Now let’s derive the differential equation satisfied by the planet’s orbit. Equations (11.7) and (11.8) are two equations with r and θ as dependent variables and time t as the independent variable. Now, we want to eliminate time t and express r as a function of θ, treating θ as the independent variable. To do this, we use the fact that the angular momentum is constant, denoted as L:
\[ L := r^2\dot{\theta}. \]
Then, assuming L > 0, $\dot{\theta} = L/r^2 > 0$. In other words, θ is a monotonically increasing function of time. Therefore, instead of the time variable t, we can use θ as the independent variable. Using the chain rule, time derivatives can be expressed in terms of derivatives with respect to θ:
\[ \frac{d}{dt} = \frac{d\theta}{dt}\,\frac{d}{d\theta} = \dot{\theta}\,\frac{d}{d\theta} = \frac{L}{r^2}\,\frac{d}{d\theta}. \]
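We then switch to the reciprocal of r, denoted as u = r^{-1}. Then,
\[ \frac{du}{d\theta} = \frac{d(r^{-1})}{d\theta} = -r^{-2}\,\frac{dr}{d\theta} \]
is satisfied. Now, let’s express each term of Equation (11.7) in terms of u(θ). First, for the term $\ddot{r}$:
\[ \ddot{r} = \frac{d}{dt}\left(\frac{dr}{dt}\right) = \frac{L}{r^2}\,\frac{d}{d\theta}\left(\frac{L}{r^2}\,\frac{dr}{d\theta}\right) = \frac{L}{r^2}\,\frac{d}{d\theta}\left(-L\,\frac{du}{d\theta}\right) = -L^2u^2\,\frac{d^2u}{d\theta^2}. \]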
Then, for the second term of Equation (11.7):
\[ r\dot{\theta}^2 = \frac{(r^2\dot{\theta})^2}{r^3} = \frac{L^2}{r^3} = u^3L^2. \]
And for the third term:
\[ \frac{(m_1+m_2)G}{r^2} = u^2(m_1+m_2)G. \]
Substituting these into Equation (11.7) and dividing by $-L^2u^2$, we get the final equation in terms of u:
\[ \frac{d^2u}{d\theta^2} + u = \frac{(m_1+m_2)G}{L^2}. \tag{11.9} \]
This equation is the inhomogeneous second-order differential equation introduced in Equation (9.5). Its solution is as follows:
\[ u = \frac{(m_1+m_2)G}{L^2}\bigl(1 + e\cos(\theta - \theta_0)\bigr). \tag{11.10} \]
Since u = r^{-1}, we can express the distance r between the two bodies as follows:
\[ r = \frac{L^2}{G(m_1+m_2)\bigl(1 + e\cos(\theta - \theta_0)\bigr)}. \tag{11.11} \]
We call e the eccentricity and θ0 the phase offset, determined by initial conditions. If e = 0, the orbit is a circle with radius $r = \frac{L^2}{G(m_1+m_2)}$. If 0 < e < 1, the orbit is an ellipse. If e > 1, the orbit is a hyperbola. The boundary case e = 1 is a parabola. The relationship between eccentricity and orbit is explained in detail in Appendix B.
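For 0 < e < 1 the ellipse claim can be tested numerically with the defining property of an ellipse: the sum of the distances to the two foci is constant. The sketch below takes θ0 = 0 and uses two standard facts about conics that are assumed here rather than taken from the text: the semi-major axis is a = p/(1 − e^2) with p = L^2/k, and the second focus lies at (−2ae, 0) when the first focus is at the origin. The sample values of L, k, and e are arbitrary.

```python
import math

def orbit_radius(theta, L, k, e, theta0=0.0):
    """Equation (11.11) with k = G(m1 + m2)."""
    return L**2 / (k * (1.0 + e * math.cos(theta - theta0)))

def focal_distance_sum(theta, L, k, e):
    """Return (sum of distances from the orbit point to both foci, 2a), for theta0 = 0."""
    p = L**2 / k                  # orbit radius at theta = pi/2
    a = p / (1.0 - e**2)          # semi-major axis (standard conic-section fact)
    r = orbit_radius(theta, L, k, e)
    x, y = r * math.cos(theta), r * math.sin(theta)
    second_focus_x = -2.0 * a * e
    return r + math.hypot(x - second_focus_x, y), 2.0 * a

# Arbitrary sample orbit: L = 1.2, k = 1, e = 0.4.
deviations = []
for theta in (0.0, 0.5, 1.5, 3.0, 4.5, 6.0):
    total, two_a = focal_distance_sum(theta, 1.2, 1.0, 0.4)
    deviations.append(abs(total - two_a))
print(max(deviations))
```

The sum stays equal to 2a at every sampled angle, which is consistent with the orbit being an ellipse with one focus at the origin.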
Question 11.1. What do we mean by saying that the orbit is a hyperbola or a
parabola?
It means that the body does not circle the Sun but rather passes by once and continues on without returning.
Once the center of mass is fixed, there are 4 degrees of freedom remaining. To
specify the planet’s orbit, four elements are needed, and four degrees of freedom
determine them. The four elements specifying the orbit are as follows:
1. Angular momentum L: Determines the size of the orbit.
2. Eccentricity e: Determines the shape of the orbit.
3. Phase offset θ0 : Determines the position of the planet on the orbit.
4. Angle of the major axis: Determines the angle between the orbit’s major axis and
the x-axis.
We are primarily interested in L and e. In particular, these two are related to
the total energy of the planet, and we will learn about this relationship in the next
lecture.
Question 11.2. Kepler’s laws provide a remarkably accurate description of planetary orbits, derived from solving Newton’s two-body problem. It’s remarkable that
Kepler derived them from observational data. The only thing missing from Kepler’s
laws is the fact that the center of mass does not move. How far away is the center of
mass from the Sun? In the case of Jupiter, how far away is the center of mass from
the Sun?
Exercises
1.
2.
Lecture 12
Kepler’s law and the energy of planets
Curiosity: the final frontier for intelligence. These are voyages of college students. Its four-year mission: to explore strange ideas. To seek out new ideas and
methodology. To boldly think what no man has thought about before!
The solar system has eight planets orbiting around the sun. Each planet revolves
in an elliptical orbit, with eccentricity close to or below 0.1, resembling circles.
Only Mercury has an eccentricity of about 0.2. Although the planes of revolution
for the eight planets differ slightly, they appear to lie on a single plane. All eight
planets rotate in the same direction. In contrast, comets like Halley’s Comet or
HD20782b, discovered in 2006, orbit in highly eccentric orbits, with eccentricities
around 0.97 and 0.9999, respectively. Why do they orbit in such different trajectories? Although it is believed that these eight planets and the sun were formed
simultaneously, what evidence supports this? Were comets with eccentricities close
to 1 also formed around the same time as the solar system?
12.1 Energy of circular orbits
To obtain the equation for planetary orbits (11.10), one had to derive equation (11.9),
but most properties of planetary motion can be derived from the conservation of angular velocity obtained from equation (11.8) (i.e., the fact that $L = r^2\dot\theta$ is a constant
with respect to time) and equation (11.7). Rewriting equation (11.7), we get:
\[ \ddot r - r\dot\theta^2 = -kr^{-2}, \qquad k = G(m_1+m_2). \tag{12.1} \]
The gravitational potential energy between two objects with masses m1 and m2 at a
distance r from each other is often expressed as:
\[ E_p = -\frac{Gm_1m_2}{r}. \tag{12.2} \]
Problem 12.1 (Escape speed and escape energy). Let the radius of the Earth be R.
Find the minimum speed required to launch an object with mass m from the surface
of the Earth to escape its gravitational pull.
Solution 12.1 The potential energy at the surface of the Earth is $E_p = -\frac{GmM}{R}$, where
M is the mass of the Earth. For the object to escape Earth’s gravity, the total energy
must be at least 0. Thus, the kinetic energy must be at least $\frac{GmM}{R}$, which is the
escape energy. The minimum speed v therefore satisfies:
\[ \frac{1}{2}mv^2 = \frac{GmM}{R}. \]
Solving this equation yields $v = \sqrt{2GM/R}$, which is the escape speed. □
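As a quick numerical illustration of the escape speed formula, the sketch below evaluates $v = \sqrt{2GM/R}$ for the Earth. The Earth radius R = 6.371 × 10⁶ m is an assumed value that does not appear in the text; G and the Earth mass are the values quoted later in Solution 12.6. The function name `escape_speed` is hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant in m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of the Earth in kg
R_EARTH = 6.371e6    # mean radius of the Earth in m (assumed value, not from the text)

def escape_speed(M, R):
    """Minimum launch speed v = sqrt(2*G*M/R) from a body of mass M and radius R."""
    return math.sqrt(2.0 * G * M / R)

v = escape_speed(M_EARTH, R_EARTH)
print(f"escape speed: {v / 1000:.1f} km/s")
```

With these inputs the result is roughly 11.2 km/s. Note that the mass m of the launched object cancels out, so the escape speed is the same for any object.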
Problem 12.2 (Total energy of circular orbits). Show that the total energy of an
object orbiting in a circular orbit with radius r > 0 is given by:
\[ E_{total} = \frac{1}{2}\frac{Gm_1(m_1-m_2)}{r} < 0. \tag{12.3} \]
Solution 12.2 Since the object is in a circular orbit, r̈ = 0. Therefore, using (12.1)
to compute the kinetic energy, we get:
\[ E_k = \frac{1}{2}m_1r^2\dot\theta^2 = \frac{1}{2}\frac{Gm_1(m_1+m_2)}{r}. \]
The potential energy is the same as (12.2), so the total energy is the sum of these
two, as given in (12.3). □
When m1 is significantly smaller than m2 , as in the case of artificial satellites
orbiting the Earth in circular orbits, the kinetic energy, potential energy, and total
energy can be expressed as follows:
\[ E_k = \frac{1}{2}\frac{Gm_1m_2}{r}, \qquad E_p = -\frac{Gm_1m_2}{r}, \qquad E_{total} = -\frac{1}{2}\frac{Gm_1m_2}{r}. \]
In other words, as an object descends from infinite distance to its current position,
half of the decrease in potential energy is converted to kinetic energy. Therefore, the
total energy and kinetic energy have opposite signs.
Problem 12.3 (Formation of Inner Planets and Total Energy). Explain why the
8 planets in the solar system were not formed from outside but were instead formed
together with the Sun.
Solution 12.3 The 8 planets in the solar system orbit in nearly circular orbits, and
as shown in problem 12.2, the total energy of planets orbiting in such orbits is much
lower compared to objects coming from outside. Therefore, the possibility of planets
coming from outside is very low. ⊔
⊓
Satellites in higher orbits have lower speeds, so their kinetic energy
decreases. However, their potential energy increases by twice the decrease in kinetic
energy. Consequently, the total energy increases with higher orbits, and raising a
satellite to a higher orbit requires more energy.
Problem 12.4 (Minimum energy required to raise a satellite to orbit). Consider
a satellite with mass m1 at rest on the surface of the Earth, and the same satellite in a
circular orbit at a height h above the ground. Find the difference in total energy
between the two states. (Ignore the rotation of the Earth.)
Solution 12.4 Let the radius of the Earth be R. When the satellite is on the surface
of the Earth, its kinetic energy is zero, and the potential energy is $E_p = -\frac{Gm_1m_2}{R}$.
Using (12.3) with r = R + h for the energy in orbit, the energy difference is
\[ \frac{1}{2}\frac{Gm_1(m_1-m_2)}{R+h} - \frac{-Gm_1m_2}{R} = G\frac{(R+2h)m_1m_2 + Rm_1^2}{2R(R+h)} \cong G\frac{(R+2h)m_1m_2}{2R(R+h)}. \]
This represents the minimum energy required to raise a satellite of mass m1 to an
orbit with height h above the ground. (Of course, in reality, much more energy is
needed. Taking into account fuel weight, rocket weight, etc., much more energy
would be needed than this minimum energy.) □
12.2 Energy of elliptical orbits
Setting the phase offset to θ0 = 0, the distance r given in (11.11) can be expressed
as:
\[ r = \frac{L^2}{G(m_1+m_2)(1 + e\cos\theta)}. \tag{12.4} \]
Problem 12.5 (Total energy and eccentricity). Show that the total energy of an
object orbiting in an elliptical orbit with eccentricity e and angular velocity L is
given by:
\[ E_{total} = \frac{G^2m_1(m_1+m_2)(1+e)\bigl[(m_1+m_2)(1+e) - 2m_2\bigr]}{2L^2}. \tag{12.5} \]
Solution 12.5 Energy is conserved, so one can compute the kinetic and potential
energies at a specific moment. Using the formula for the velocity v given in equation
(8.6), the kinetic energy of an object orbiting in an elliptical orbit can be calculated
as:
\[ E_k = \frac{1}{2}m_1 v\cdot v = \frac{1}{2}m_1(\dot r e_r + r\dot\theta e_\theta)\cdot(\dot r e_r + r\dot\theta e_\theta) = \frac{1}{2}m_1(\dot r^2 + r^2\dot\theta^2). \]
At the moment of minimum distance, when θ = 0, the derivative is ṙ = 0, so the
kinetic energy becomes:
\[ E_k = \frac{1}{2}m_1r^2\dot\theta^2 = \frac{1}{2}m_1\frac{L^2}{r^2} = \frac{m_1G^2(m_1+m_2)^2(1+e)^2}{2L^2}. \]
The potential energy at this moment is:
\[ E_p = -\frac{Gm_1m_2}{r} = -\frac{G^2m_1m_2(m_1+m_2)(1+e)}{L^2}. \]
Adding these energies together yields the total energy given in (12.5). □
Rewriting the expression for total energy in equation (12.5), we get:
\[ (m_1+m_2)e^2 + 2m_1e + m_1 - m_2 - \frac{2L^2E_{total}}{G^2m_1(m_1+m_2)} = 0. \]
Solving the quadratic equation for eccentricity e yields one positive root and one
negative root. Taking the positive root gives us the eccentricity. Hence,
\[ e = -\frac{m_1}{m_1+m_2} \pm \sqrt{\frac{m_1^2}{(m_1+m_2)^2} - \frac{m_1-m_2}{m_1+m_2} + \frac{2L^2E_{total}}{G^2m_1(m_1+m_2)^2}}. \]
In the case where m1 is much smaller than m2 , this can be simplified to:
\[ e = \sqrt{1 + \frac{2E_{total}L^2}{m_1G^2m_2^2}}. \tag{12.6} \]
From this formula, it is evident that $E_{total} < 0$ corresponds to an ellipse (all
closed orbits are ellipses), $E_{total} = 0$ corresponds to a parabola, and
$E_{total} > 0$ corresponds to a hyperbola. In particular, $E_{total} = -\frac{m_1G^2m_2^2}{2L^2} \cong -\frac{m_1k^2}{2L^2}$
corresponds to a perfectly circular orbit. Given the total energy and angular velocity,
the eccentricity is determined. Conversely, given the eccentricity and total energy,
the angular velocity is determined.
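The relationship between total energy and orbit shape in (12.6) can be sketched in a few lines. This is an illustrative example under the same assumption as (12.6), namely that m1 is much smaller than m2; the function names and the unit choice G = 1 in the usage lines are hypothetical and only serve to keep the numbers simple.

```python
import math

def eccentricity(E_total, L, m1, m2, G=6.674e-11):
    """Eccentricity from (12.6); assumes m1 is much smaller than m2."""
    value = 1.0 + 2.0 * E_total * L**2 / (m1 * G**2 * m2**2)
    # Energies below the circular-orbit energy are not physical; clamp rounding noise.
    return math.sqrt(max(value, 0.0))

def orbit_from_energy(E_total):
    """Orbit type determined by the sign of the total energy."""
    if E_total < 0:
        return "ellipse"
    if E_total == 0:
        return "parabola"
    return "hyperbola"

# Illustrative units: G = 1, m1 = 1, m2 = 1000, L = 2.
E_circular = -1.0 * 1.0**2 * 1000.0**2 / (2.0 * 2.0**2)  # -m1 G^2 m2^2 / (2 L^2)
print(eccentricity(E_circular, L=2.0, m1=1.0, m2=1000.0, G=1.0))  # circular orbit
print(eccentricity(0.0, L=2.0, m1=1.0, m2=1000.0, G=1.0))         # parabolic orbit
```

The first call uses the circular-orbit energy and returns eccentricity 0; the second uses zero total energy and returns eccentricity 1.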
12.3 Circular orbit of satellites
Consider an artificial satellite orbiting the Earth in a circular orbit with radius r. The
corresponding orbit equation is
\[ r^3\dot\theta^2 = G(m_1+m_2), \tag{12.7} \]
where m2 is the mass of the Earth and m1 is the mass of the satellite. Let T be the
time taken for one revolution, i.e., the orbital period. Since the radius and angular
velocity are constant, θ̇ is constant, and we have
T θ̇ = 2π.
Substituting this into the equation above, we obtain Kepler’s 3rd law:
\[ \frac{T^2}{r^3} = \frac{4\pi^2}{G(m_1+m_2)}. \tag{12.8} \]
Problem 12.6 (Kepler’s third law).
(1) What happens to the period when the radius of the satellite is doubled?
(2) What is the radius of the orbit if the satellite revolves around the Earth once a
day?
Solution 12.6 (1) Use (12.7) or (12.8). Expressing the period using (12.8), we have
\[ T = \left(\frac{4\pi^2r^3}{G(m_1+m_2)}\right)^{1/2}. \]
Therefore, if we replace r with 2r, T is multiplied by $2^{3/2}$.
(2) Now, let’s use (12.7) in the form $r = \dot\theta^{-2/3}(G(m_1+m_2))^{1/3}$. If the satellite revolves once
a day, then $\dot\theta = 2\pi/86400\ \mathrm{s}^{-1}$. Ignoring the mass m1 of the artificial satellite and
substituting the mass of the Earth $m_2 = 5.972\times10^{24}$ kg and the gravitational constant
$G \cong 6.674\times10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}}$, we get the distance from the Earth’s center to the
geostationary orbit as $r = 4.2240\times10^7$ m. This is about 6.63 times the Earth’s radius. □
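The arithmetic in part (2) can be reproduced with a short script. This is a sketch using the constants quoted in the solution; the function name `circular_orbit_radius` is hypothetical, and the satellite mass is neglected as in the text.

```python
import math

G = 6.674e-11        # gravitational constant in m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of the Earth in kg (satellite mass neglected)
DAY = 86400.0        # orbital period of one day in seconds

def circular_orbit_radius(T, M):
    """Radius from Kepler's 3rd law (12.8): r^3 = G*M*T^2 / (4*pi^2)."""
    return (G * M * T**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)

r_geo = circular_orbit_radius(DAY, M_EARTH)
print(f"geostationary radius: {r_geo:.4e} m")

# Part (1): doubling the radius multiplies the period by 2^(3/2).
period_ratio = math.sqrt((2.0 * r_geo) ** 3 / r_geo**3)
print(f"period ratio for doubled radius: {period_ratio:.3f}")
```

The computed radius agrees with the value of about 4.224 × 10⁷ m stated in the solution.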
12.4 Elliptical orbits of satellites
Elliptical orbits of satellites are also commonly used. The moment when an artificial
satellite approaches the Earth the closest occurs when the angle θ is 0 as represented
by (12.4). The distance at that moment is as follows:
\[ r_0 = \frac{L^2}{G(m_1+m_2)(1+e)}. \]
The speed at this moment is denoted as v0 and is given as follows:
v0 = ∥v∥ = ∥ṙer + rθ̇ eθ ∥ = r0 θ̇ .
Thus, $r_0v_0 = r_0^2\dot\theta$ is the angular velocity L. Rewriting (12.4) using this minimum
distance, we get:
\[ r = \frac{(1+e)r_0}{1 + e\cos\theta}. \]
On the other hand, when it is farthest away with cos θ = −1, the distance is as
follows:
\[ r_1 = \frac{1+e}{1-e}r_0. \]
The semi-major axis a of the ellipse is half of the major axis:
\[ a = \frac{r_0+r_1}{2} = \frac{r_0}{1-e} = \frac{L^2}{G(m_1+m_2)(1-e^2)}. \]
The semi-minor axis $b = a\sqrt{1-e^2}$ is obtained using the eccentricity (8.8).
Problem 12.7 (Kepler’s 3rd law). Let T > 0 be the period in which a planet orbits
and a > 0 be the semi-major axis of the orbit. Show the following:
\[ \frac{T^2}{a^3} = \frac{4\pi^2}{G(m_1+m_2)}. \tag{12.9} \]
Solution 12.7 The area of the ellipse is $\pi ab = \pi a^2\sqrt{1-e^2}$. Using Kepler’s 2nd law
to find the area, we have:
\[ \int_0^T dA = \int_0^T \frac{1}{2}L\,dt = \frac{1}{2}LT. \]
Therefore,
\[ T = \frac{2\pi a^2\sqrt{1-e^2}}{L} \]
is satisfied. Squaring both sides and dividing by $a^3$, we get:
\[ \frac{T^2}{a^3} = \frac{4\pi^2a(1-e^2)}{L^2} = \frac{4\pi^2}{G(m_1+m_2)}, \]
which gives Kepler’s 3rd law. □
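The two routes to the period in this solution can be compared numerically. The sketch below is only a consistency check under the relations derived above; the function names are hypothetical, `k` stands for G(m1 + m2), and the sample values of a, e and k are arbitrary.

```python
import math

def period_from_area_law(a, e, k):
    """T = 2*pi*a^2*sqrt(1 - e^2)/L, with L^2 = k*a*(1 - e^2) and k = G*(m1 + m2)."""
    L = math.sqrt(k * a * (1.0 - e**2))
    return 2.0 * math.pi * a**2 * math.sqrt(1.0 - e**2) / L

def period_from_kepler(a, k):
    """T from Kepler's 3rd law (12.9): T^2 / a^3 = 4*pi^2 / k."""
    return 2.0 * math.pi * math.sqrt(a**3 / k)

# Arbitrary sample values, chosen only for illustration.
T_area = period_from_area_law(a=2.5, e=0.6, k=3.0)
T_kepler = period_from_kepler(a=2.5, k=3.0)
print(T_area, T_kepler)
```

The two values agree for any eccentricity 0 ≤ e < 1, which reflects that the period depends only on the semi-major axis and not on the shape of the ellipse.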
Question 12.1. Between an artificial satellite orbiting in a circular orbit with radius
r and an artificial satellite orbiting in an elliptical orbit with a semi-major axis equal
to r but an eccentricity e, which one has a longer orbital period? Which orbit requires
more energy to place the satellite in? If the mass of the satellite doubles, how does
the period change?
Question 12.2. A pendulum with length ℓ and mass m1 is known to have a period
$T = 2\pi\sqrt{\ell/g}$. Is there any relationship between this and Kepler’s 3rd law (12.9)?
Problem 12.8. The Enterprise is flying towards an unknown planet. Just before entering the gravitational influence of the planet, it is moving at a speed of v0 . Instead
of landing the Enterprise on the planet, Spock and McCoy decide to beam down
using the transporter. The maximum beaming distance is r > 0, so it is decided to
orbit the planet on a circular orbit with radius r. How much energy needs to be lost?
How can we save energy?
Solution 12.8 Let the mass of the spacecraft Enterprise be m1. Then the kinetic
energy at infinity is $E_k = \frac{1}{2}m_1\|v_0\|^2$, and this is the total energy as seen from the
planet. If the spacecraft orbits on a circular orbit with radius r > 0, the total energy
changes to $E_{total} = \frac{Gm_1(m_1-m_2)}{2r}$. Therefore, the energy difference is
\[ \frac{1}{2}m_1\|v_0\|^2 - \frac{Gm_1(m_1-m_2)}{2r}, \]
which is the energy that must be lost to stay on the circular orbit. To save energy, it would be
better to orbit on an elliptical orbit with a minimum distance of r. □
Problem 12.9. In the case of the above problem, find out how much energy can be
saved if the Enterprise is orbiting an elliptical orbit with eccentricity e and minimum
distance r.
Solution 12.9 Using (12.4), the minimum distance of an orbit with eccentricity e
and angular velocity L occurs when cos θ = 1. In that case,
\[ L^2 = rG(m_1+m_2)(1+e) \]
holds. Substituting this into (12.5), the total energy is given by
\[ E_{total} = \frac{Gm_1\bigl[(m_1+m_2)(1+e) - 2m_2\bigr]}{2r} = \frac{Gm_1(m_1-m_2)}{2r} + \frac{Gm_1(m_1+m_2)e}{2r}. \]
The difference in energy is $\frac{Gm_1(m_1+m_2)e}{2r}$, and this is the amount of energy that
can be saved. □
12.5 Interstellar and solar system object
Let’s assume there is an object moving in space with velocity v and it happens
to approach the Sun. What could happen? The object could collide head-on with
the Sun and be absorbed, or it could pass by the Sun and move elsewhere, or it
could collide partially and some parts could be absorbed by the Sun while the rest
separates. Could it then be trapped by the gravity of the Sun and orbit around the
Sun? Could celestial bodies currently orbiting the Sun have formed in this way? Our
idea to verify this is simple. The energy of a planet on an orbit should be conserved.
Compare this energy with the energy of the extraterrestrial planet.
Before the object enters the region affected by the Sun’s gravity, let’s assume it
is moving with velocity v. If the mass of the object is m1 , then the kinetic energy is
$E_k = \frac{1}{2}m_1\|v\|^2$. The potential energy at infinity is $E_p = 0$, so the total energy is the
kinetic energy at infinity. That is, if there is no energy loss due to collision or other
factors,
\[ E_{total} = E_k + E_p = \frac{1}{2}m_1\|v\|^2 \]
should hold. If the object is dragged by the gravity of the Sun and orbits around it,
then the total energy will be a positive value. Of course, there may be changes in
total energy due to too close approaches causing collisions or some parts being torn
off and separated.
Problem 12.10 (Orbiting by extraterrestrial objects). Investigate whether a planet
or object orbiting on an elliptical orbit with an eccentricity close to 1 can have a positive total energy.
Solution 12.10 In Problem 12.5, we calculated that the total energy of a celestial
body orbiting on an ellipse with an angular velocity L and eccentricity e is
\[ E_{total} = \frac{G^2m_1(m_1+m_2)(1+e)\bigl[(m_1+m_2)(1+e) - 2m_2\bigr]}{2L^2}. \]
To have a positive value, we need (m1 +m2 )(1+e)−2m2 > 0. When the eccentricity
e is close to 1, it is possible. However, looking at the data, it is very difficult for the
total energy to be positive even when the eccentricity is close to 1 due to the large
mass difference between the planet and the Sun. If it were an object from outer
space, it might have had a change in total energy. ⊔
⊓
Exercises
1. Calculate the minimum energy required to place a satellite with a mass of $10^3$ kg
in a geostationary orbit.
2. Determine the total energy of a satellite with a mass of $10^3$ kg orbiting on an
elliptical orbit with a period of 1 day and an eccentricity e.
3. Halley’s comet, discovered by E. Halley, has an orbital eccentricity of 0.9673 and
a period of 76.03 years. Find the maximum and minimum speeds of this comet.
Calculate the total energy.
Part III
The Arts of Calculus
Differential and integral calculus have now become essential tools for understanding the world’s problems, and with them, much can be accomplished. Understanding
and applying differentiation and integration effectively is useful, and in Part III, we
will understand and learn the core techniques for this.
Lecture 13
Curves and particle trajectories in R3
When the variable t represents time and r(t) represents the position of a moving
particle at time t, then the derivative r′ (t) = v(t) represents the velocity of the particle at time t. Even if it is not necessarily the trajectory of a particle, r(t) represents
a curve in space and can represent one-dimensional objects such as wires or bent
bars. By using arc length s as a variable instead of time t, we can characterize the
properties of the curve itself rather than the trajectory of the object. In this lecture,
we study the properties of curves in three-dimensional space by alternately using the
time variable t and the arc length variable s. In particular, we use a lot of notation
abuse in this lecture, and you should get used to it.
13.1 Arc length as a variable
Let’s assume driving from home to work. The distance between two locations is
the length of the straight line connecting the two locations. On the other hand, the
traveled distance is the length of the trajectory the car travels, and it is obtained by
integrating the velocity with respect to time from the departure time to the arrival
time.
Question 13.1. Why does integrating velocity give the length of the trajectory?
What do you get if you integrate speed instead of velocity?
Let r : [0, T0 ] → R3 represent the position of a particle moving in space. The
variable t ∈ [0, T0 ] represents time. Velocity is the derivative of position with respect
to time. Representing position and velocity as
\[ r(t) = \begin{pmatrix} x(t)\\ y(t)\\ z(t) \end{pmatrix} \quad\text{and}\quad v(t) = r'(t) = \begin{pmatrix} x'(t)\\ y'(t)\\ z'(t) \end{pmatrix}, \]
respectively, the speed is the magnitude of velocity
\[ \|v(t)\| = \sqrt{(x'(t))^2 + (y'(t))^2 + (z'(t))^2}. \]
The distance traveled, called arc length, is calculated as follows for a given time t:
\[ s(t) = \int_0^t \sqrt{(x'(\tau))^2 + (y'(\tau))^2 + (z'(\tau))^2}\,d\tau = \int_0^t \|v(\tau)\|\,d\tau. \]
Problem 13.1. For the curve in three-dimensional space $r(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$
with the variable range t ∈ [0, 2π], find the arc length.
Solution 13.1 The velocity is $v(t) = -\sin t\,\mathbf{i} + \cos t\,\mathbf{j} + \mathbf{k}$ and the speed is
\[ \|v\| = \sqrt{\sin^2 t + \cos^2 t + 1} = \sqrt{2}. \]
Therefore, the arc length is $s(t) = \int_0^t \sqrt{2}\,d\tau = \sqrt{2}\,t$. Hence, the arc length for the
entire variable range [0, 2π] is $2\sqrt{2}\pi$. □
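The arc length integral can also be approximated numerically, which is useful when the speed has no simple antiderivative. The following sketch integrates the speed of the helix with the trapezoidal rule and compares the result with the exact value $2\sqrt{2}\pi$; the function names and the choice of 1000 subintervals are illustrative assumptions.

```python
import math

def helix_speed(t):
    """Speed ||v(t)|| of r(t) = cos t i + sin t j + t k."""
    vx, vy, vz = -math.sin(t), math.cos(t), 1.0
    return math.sqrt(vx * vx + vy * vy + vz * vz)

def arc_length(speed, t0, t1, n=1000):
    """Approximate the integral of speed over [t0, t1] with the trapezoidal rule."""
    h = (t1 - t0) / n
    total = 0.5 * (speed(t0) + speed(t1))
    for i in range(1, n):
        total += speed(t0 + i * h)
    return total * h

length = arc_length(helix_speed, 0.0, 2.0 * math.pi)
exact = 2.0 * math.sqrt(2.0) * math.pi
print(length, exact)
```

Because the speed of this helix is constant, the trapezoidal rule reproduces the exact value up to rounding error.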
13.2 Parametrization with arc length
Let $r : [0, T_0] \to \mathbb{R}^3$ be a vector function that satisfies $\|r'(t)\| \neq 0$ for all t. Let s(t)
represent the length of the trajectory traveled from the starting time 0 to t > 0. The
length of the trajectory is given by
\[ s(t) = \int_0^t \|v(\tau)\|\,d\tau = \int_0^t \sqrt{(x'(\tau))^2 + (y'(\tau))^2 + (z'(\tau))^2}\,d\tau. \]
Thus, s(t) is a function that maps the time interval $[0, T_0]$ onto the arc length
interval [0, L], where $L = s(T_0)$ is the total length of the trajectory. Since the speed is
positive, s(t) is an increasing function, so it has an inverse function denoted by t(s). Here, we are
abusing notation by using both the arc length s and the time parameter t as variables
and as functions. The inverse relationships s(t(s)) = s and t(s(t)) = t
hold. By using the derivative rule for the inverse function, we obtain
\[ \left.\frac{ds}{dt}\right|_{t=t_0} = \frac{1}{\left.\dfrac{dt}{ds}\right|_{s=s_0}}. \]
Alternatively,
\[ s'(t_0) = \frac{1}{t'(s_0)}, \qquad t'(s_0) = \frac{1}{s'(t_0)} \]
can be obtained. Here, $s_0$ and $t_0$ are corresponding values of the arc length and the time parameter,
that is, $s_0 = s(t_0)$.
Let r̃(s) = r(t(s)) represent the composite function of r(t) and t(s), which is
defined on the interval representing arc length s ∈ [0, L] and takes values in R3 .
When considering arc length s as the variable instead of time t, what will be the
magnitude of the derivative? From a variable s perspective, the magnitude of the
derivative is 1. Is this intuitively clear? Let’s calculate it:
\[ \tilde r'(s) = \frac{d}{ds}r(t(s)) = r'(t)\,t'(s) = \frac{v(t(s))}{\|v(t(s))\|}. \tag{13.1} \]
Thus, $\|\tilde r'(s)\| = 1$.
Let’s abuse notation. If we simply write r(s) without the tilde notation for r̃(s),
will it be confusing with r(t)? We can distinguish them sufficiently. When writing
r(s), we know that r is a function of the arc length variable s, and when writing
r(t), we know that it is a function of time t. And when we write r′(s), it means to
differentiate the position r with respect to s. That is, $r'(s) = \frac{d}{ds}r$. Then, is v(s) = r′(s)?
Not at all. v represents velocity, and therefore $v = \frac{d}{dt}r$. Thus, v(s) denotes $\frac{d}{dt}r$
evaluated at t = t(s), that is, the velocity at the point corresponding to arc length s. You should
get used to such notation abuse in this lecture, including the definitions of T, N, B,
κ, τ, etc., considering both s and t as variables.
Remark 13.1. When it is clear what the variable of a function f is, writing the derivative as f′ does not cause confusion. However, when abusing notation, it is necessary
to clearly state what the variable is. Therefore, either write f′(s) or f′(t) to indicate
clearly what the variable is. In that sense, the use of Leibniz notation $\frac{df}{ds}$ or $\frac{df}{dt}$
prevents confusion.
Question 13.2. Once write r(s) = r(t(s)) and r(t) = r(s(t)). Which one corresponds to r̃ defined in (13.1)? (First and fourth)
Problem 13.2. Draw a graph of the curve in three-dimensional space $r(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$
with $t \in [0, T_0]$. Compute the arc length s(t), its inverse function t(s), and
r′(s).
Solution 13.2 As computed in Problem 13.1, $\|v(t)\| = \sqrt{2}$. Therefore,
\[ s(t) = \int_0^t \sqrt{2}\,d\tau = \sqrt{2}\,t \]
is a constant multiple of t, and its inverse function is $t(s) = \frac{s}{\sqrt{2}}$. Therefore, upon composition,
\[ r(s) = \cos\frac{s}{\sqrt{2}}\,\mathbf{i} + \sin\frac{s}{\sqrt{2}}\,\mathbf{j} + \frac{s}{\sqrt{2}}\,\mathbf{k}, \qquad r'(s) = \frac{1}{\sqrt{2}}\left(-\sin\frac{s}{\sqrt{2}}\,\mathbf{i} + \cos\frac{s}{\sqrt{2}}\,\mathbf{j} + \mathbf{k}\right) \]
are obtained. In this case, it can be easily verified that $\|r'(s)\| = 1$. □
13.3 TNB coordinate system
Unit tangent vector
The unit tangent vector T is the derivative of the position vector r with respect to
the arclength s:
T (s) = r′ (s).
We have already seen from equation (13.1) that T (s) is a unit vector. When we write
T (t), it does not mean T (t) = r′ (t). We are applying notation abuse to the already
defined T . Therefore, T (t) = T (s(t)).
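Problem 13.3. Compute the unit tangent vector for the curve $r(t) = (1 + 3\cos t)\,\mathbf{i} + (3\sin t)\,\mathbf{j} + t^2\,\mathbf{k}$.
Solution 13.3 We compute r′(t) as usual and then divide by its magnitude to obtain
T(t). Here $r'(t) = -3\sin t\,\mathbf{i} + 3\cos t\,\mathbf{j} + 2t\,\mathbf{k}$ and $\|r'(t)\| = \sqrt{9 + 4t^2}$, so
\[ T(t) = \frac{-3\sin t\,\mathbf{i} + 3\cos t\,\mathbf{j} + 2t\,\mathbf{k}}{\sqrt{9 + 4t^2}}. \]
□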
Problem 13.4. (1) Given the curve $r(t) = \cos(t)\,\mathbf{i} + \sin(t)\,\mathbf{j}$ defined over the interval
t ∈ (0, 2π), find the velocity v(t) and acceleration a(t) vectors of the curve, and
explain the perpendicular relationship among the three vectors (r, v, a).
(2) Given the curve $r(t) = \cos(t^2)\,\mathbf{i} + \sin(t^2)\,\mathbf{j}$ defined over the interval t ∈ (0, 2π),
find the velocity v(t) and acceleration a(t) vectors of the curve, and explain the
relationship among the three vectors (r, v, a).
(3) Given the curve $r(t) = \sin(t)\cos(t)\,\mathbf{i} + \sin^2(t)\,\mathbf{j} + \cos(t)\,\mathbf{k}$ defined over the interval
t ∈ (0, 2π), find the velocity v(t) and acceleration a(t) vectors of the curve, and
explain the relationship among the three vectors (r, v, a).
Solution 13.4 (1) The velocity v is perpendicular to both the position vector r and the
acceleration a; moreover, a = −r, so the acceleration points toward the origin. (2) The position vector r is perpendicular to the velocity v. (3) The
position vector r is perpendicular to the velocity v. □
Problem 13.5. (1) Prove that if the speed is constant, the acceleration a(t) and velocity v(t) are perpendicular to each other. (2) If the distance between a particle and
the origin is constant, what is perpendicular to each other?
Solution 13.5 (1) If the speed is constant, the square of the speed is also constant.
The square of the speed is v(t) · v(t). Taking the derivative, we have
\[ 0 = \frac{d}{dt}\bigl(v(t)\cdot v(t)\bigr) = a(t)\cdot v(t) + v(t)\cdot a(t) = 2v(t)\cdot a(t). \]
Therefore, v(t) and a(t) are perpendicular. (2) Similarly, if the distance between a
particle and the origin is constant, its square is also constant. That is, r(t) · r(t) is
constant. Taking the derivative, we conclude that r(t) and v(t) are perpendicular to
each other. ⊔
⊓
Curvature κ and principal unit normal vector N
Differentiating the unit tangent vector T(s) gives the curvature vector T′(s),
and its magnitude is called the scalar curvature. When we say “curvature,” we usually mean the scalar curvature, denoted by the Greek letter κ (kappa), defined as:
\[ \kappa(s) = \|T'(s)\| = \|r''(s)\|. \]
Problem 13.6 (Principal unit normal vector). Let κ(s) ≠ 0. (1) Define $N(s) = \frac{1}{\kappa(s)}T'(s)$
as the principal unit normal vector. Show that it is perpendicular to the
unit tangent vector T. (2) Show that it satisfies the following:
\[ N(t) = \frac{T'(t)}{\|T'(t)\|}. \]
Solution 13.6 (1) Since T is always a unit vector, we have T · T = 1, which is
constant. Taking the derivative, we get
(T (s) · T (s))′ = 2T (s) · T ′ (s) = 2κ(s)T (s) · N(s) = 0.
Therefore, they are perpendicular. (2) Since T ′ (s) and T ′ (t) have the same direction
and N is a unit vector, this equation holds. ⊔
⊓
The curvature κ(s) indicates how sharply the curve bends at the point r(s) along
the curve.
Problem 13.7. Compute the curvature of the straight line r(t) = c + tv. Here, c and
v are given constant vectors.
Solution 13.7 Here r′(t) = v is a constant vector, so $T(t) = v/\|v\|$ is constant and
T′(s) = 0. Equivalently, the acceleration is a = 0, so
\[ \|v \times a\| = \|v\|^3\kappa = 0. \]
Therefore, the curvature is κ = 0. □
Problem 13.8. Compute the curvature of the circle $r(t) = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j}$ with radius a > 0.
Solution 13.8 We compute r′(t) and its magnitude to obtain T(t). Then, we find
T′(t) and compute its magnitude to obtain κ(t). Here $\|r'(t)\| = a$, $T(t) = -\sin t\,\mathbf{i} + \cos t\,\mathbf{j}$,
and $\|T'(t)\| = 1$, so we have
\[ \kappa(t) = \|T'(t)\|/a = 1/a. \]
□
Problem 13.9. The example in Problem 13.8 illustrates a method for determining
the curvature of a curve in the plane. Describe the method.
Solution 13.9 Let r(t) be a point on the curve. Then, we find the circle that touches
the curve at that point. If a > 0 is the maximum radius of such a tangent circle, then
the curvature is 1/a. If there is no restriction on the radius of the tangent circle, we
say a = ∞, and the curvature is 0. ⊔
⊓
Torsion τ and binormal vector B
The binormal vector is denoted by B and defined as follows:
\[ B = T \times N. \]
Then T, N, and B form a positively oriented orthonormal coordinate system. Differentiating,
and using $\frac{d}{ds}T = \kappa N$ so that $\frac{d}{ds}T \times N = 0$, we have
\[ \frac{d}{ds}B = \frac{d}{ds}(T \times N) = 0 + T \times \frac{d}{ds}N = T \times \frac{d}{ds}N. \]
Since N is a unit vector, $\frac{d}{ds}N$ is perpendicular to N, so it is a combination of T and B,
and therefore $T \times \frac{d}{ds}N$ has only
the N direction. The torsion τ of the curve r(s) is defined as the negative of the N
component of $\frac{d}{ds}B$:
\[ \tau = -\frac{d}{ds}B \cdot N. \]
It represents how much the curve twists in the direction perpendicular to the direction of progression T.
13.4 Computation formulas
The quantities defined in the previous section are geometric properties of curves
in space, and all definitions were made using the arclength parameter. However, it
is cumbersome to convert variables each time we calculate. The calculations can also be
done with a general parameter. In this section, we introduce the method of calculation
when the time variable t is given. We use the velocity v and acceleration a
given by the derivatives with respect to time for these calculations.
The unit tangent vector T (t) has the same direction as v(t) but with magnitude 1:
\[ T(t) = \frac{v(t)}{\|v(t)\|}. \]
T′(s) and T′(t) are different vectors, but they have the same direction. We have
\[ T'(s) = \frac{d}{ds}T(t(s)) = T'(t)\frac{dt}{ds} = T'(t)\frac{1}{\|v(t)\|}. \]
Therefore, the principal normal vector N(t) has the same direction as T′(t) but with
magnitude 1:
\[ N(t) = \frac{T'(t)}{\|T'(t)\|}. \]
The binormal vector B is computed using the cross product:
B = T × N.
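Problem 13.10. Show that the acceleration a is perpendicular to B and is given as follows:
\[ a = C_T T + C_N N, \qquad C_T = \frac{d}{dt}\|v\|, \quad C_N = \kappa\|v\|^2. \]
Solution 13.10 Since $v = \frac{ds}{dt}T$, we can compute a in the TNB system as follows:
\[ a = \frac{d}{dt}v(t) = \frac{d}{dt}\left(T(t)\frac{ds}{dt}\right) = \frac{d^2s}{dt^2}T + \frac{ds}{dt}\frac{d}{dt}T = \frac{d^2s}{dt^2}T + \left(\frac{ds}{dt}\right)^2\frac{d}{ds}T(s) = \frac{d^2s}{dt^2}T + \left(\frac{ds}{dt}\right)^2\kappa N. \]
The acceleration has no component in the direction of the binormal vector B,
meaning a is perpendicular to B. Since $\frac{ds}{dt} = \|v\|$, rewriting the coefficients gives the stated expressions. □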
The above equation states that the acceleration a splits into components along
T and N, and only the component along T contributes to the change in speed
$\frac{d}{dt}\|v\|$. The component along N depends on the curvature and is responsible for changing
direction; it is proportional to the square of the speed, but it does not contribute to changes
in speed.
Problem 13.11. The curvature κ and torsion τ are given by:
\[ \kappa = \frac{\|v \times a\|}{\|v\|^3}, \qquad \tau = \frac{\begin{vmatrix} \dot x & \dot y & \dot z\\ \ddot x & \ddot y & \ddot z\\ \dddot x & \dddot y & \dddot z \end{vmatrix}}{\|v \times a\|^2}. \]
Solution 13.11 For κ, we use the expression for a from Problem 13.10. Since
$v = \|v\|T$ and $T \times T = 0$, we have
\[ \|v \times a\| = \bigl\|\,\|v\|T \times a\bigr\| = \|v\|^3\kappa\|T \times N\| = \|v\|^3\kappa. \]
Therefore, κ satisfies the given expression. The expression for τ can be directly
memorized. ⊔
⊓
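The two formulas of Problem 13.11 are straightforward to evaluate once v, a, and the third derivative of r are known. The sketch below applies them to the helix of Problem 13.1; the helper names and the sample time t = 0.7 are illustrative assumptions. The determinant with rows v, a, and the third derivative is computed as the triple product (v × a) · r‴, which is the same quantity.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def curvature_and_torsion(v, a, jerk):
    """kappa = ||v x a|| / ||v||^3 and tau = (v x a) . jerk / ||v x a||^2."""
    c = cross(v, a)
    c2 = dot(c, c)
    kappa = math.sqrt(c2) / dot(v, v) ** 1.5
    tau = dot(c, jerk) / c2
    return kappa, tau

# Helix r(t) = cos t i + sin t j + t k at an arbitrary sample time.
t = 0.7
v = (-math.sin(t), math.cos(t), 1.0)      # first derivative
a = (-math.cos(t), -math.sin(t), 0.0)     # second derivative
jerk = (math.sin(t), -math.cos(t), 0.0)   # third derivative
kappa, tau = curvature_and_torsion(v, a, jerk)
print(kappa, tau)
```

For this helix both the curvature and the torsion equal 1/2 at every t, so the choice of sample time does not matter.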
13.5 Exercises
1. Rewrite the following curves using the arclength parameter s.
(1) $r(t) = \sin t\,\mathbf{i} + \cos t\,\mathbf{j} + t\,\mathbf{k}$, 0 ≤ t ≤ 1
(2) $r(t) = t\,\mathbf{i} + 3\,\mathbf{j} - t^2\,\mathbf{k}$, 0 < t < 1
2. Using the basis vectors from (8.3), find the velocity and acceleration of the curves
in polar coordinates.
(1) r = θ, θ = 3t
(2) r = sin θ, $\theta = t^2$
(3) rθ = 1, r = t
3. Compute T, N, B, κ, and τ for the following curves.
(1) $r(t) = \sin t\,\mathbf{i} + \cos t\,\mathbf{j} + 2t\,\mathbf{k}$
(2) $r(t) = \sin^2 t\,\mathbf{i} + \cos^2 t\,\mathbf{j} - 3\,\mathbf{k}$
Lecture 14
Linearization and differentials
The equation of the line tangent to the graph of the function y = f(x) at
the point (c, f(c)) is y − f(c) = f′(c)(x − c). As the tangent
line and the graph are magnified near the point of tangency, they become increasingly similar. For this reason, the tangent line shares many properties of the graph
near the tangent point and can be considered an approximation of the
graph. Away from the tangent point the two become increasingly different, but close to it the tangent line is an excellent
approximation of the function. Approximations using not only first derivatives but
also higher-order derivatives are learned in Part IV. In this lecture, we assume throughout that
the function f : R → R is differentiable at c ∈ R.
14.1 Linearization
The equation of the line passing through the origin with slope a ∈ R is y = ax.
Although we can express it using function notation as f (x) = ax, it is necessary to
become accustomed to simply thinking of y as a function of x. Then, the slope of this
line is $y' = \frac{dy}{dx} = a$. If the line passes through a point $(x_0, y_0)$ instead of the origin,
the equation of the line is as follows:
y − y0 = a(x − x0 ).
(14.1)
The point (c, f (c)) is a point on the graph of y = f (x). How can we find the equation
of the line tangent to the graph at this point? Using the derivative, we know that the
slope is f ′ (c), and since the point (c, f (c)) must lie on this line, the equation is
y − f (c) = f ′ (c)(x − c). Rewriting this, we have:
y = f (c) + f ′ (c)(x − c).
(14.2)
For convenience, we denote the right-hand side as
L(x) := f (c) + f ′ (c)(x − c),
which represents a function having the tangent line as its graph. In this case, we call
L(x) the linearization or linear approximation of the function f (x) at the point c.
Problem 14.1. Let $f(x) = \sqrt{1 + x}$. Find the linearization function L(x) of f at x = 0.
Solution 14.1 Differentiating, we get $f'(x) = \frac{1}{2}(1 + x)^{-1/2}$, f′(0) = 0.5, and f(0) =
1. Therefore, L(x) = f(0) + f′(0)(x − 0) = 1 + 0.5x.
Problem 14.2. Find the linearization function L(x) of the function f(x) = cos x at
the point $x = \frac{\pi}{2}$.
Solution 14.2 Differentiating, we get f′(x) = −sin x, $f'(\frac{\pi}{2}) = -1$, and $f(\frac{\pi}{2}) = 0$.
Therefore, $L(x) = f(\frac{\pi}{2}) + f'(\frac{\pi}{2})(x - \frac{\pi}{2}) = -(x - \frac{\pi}{2}) = \frac{\pi}{2} - x$.
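To see how good the linearization is near the tangent point, one can compare L(x) with f(x) numerically. The following is a minimal sketch for the function of Problem 14.1; the helper name `linearize` and the sample points are illustrative assumptions.

```python
import math

def linearize(f, df, c):
    """Return L(x) = f(c) + f'(c) * (x - c), the linearization of f at c."""
    fc, dfc = f(c), df(c)
    return lambda x: fc + dfc * (x - c)

f = lambda x: math.sqrt(1.0 + x)          # function from Problem 14.1
df = lambda x: 0.5 / math.sqrt(1.0 + x)   # its derivative
L = linearize(f, df, 0.0)                 # L(x) = 1 + 0.5 x

for x in (0.01, 0.1, 0.5):
    print(x, f(x), L(x), abs(f(x) - L(x)))
```

The printed error shrinks rapidly as x approaches the tangent point 0, which is the local behavior described in the text.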
Local property
A property that is determined by the behavior in a small region is called a local
property. The linearization explained in this section indicates that the linear function L(x) is locally equivalent to the original function f(x)
near the tangent point x = c. This implies that many local properties are captured by
differentiation, which is one of the reasons why differentiation is useful. It is used
for local maxima, local minima, rates of change of length and volume, etc., as
learned in Calculus 1 and 2.
14.2 Differentials
The Linearization L(x) = f (c) + f ′ (c)(x − c) utilizes the relationship (14.2) to approximate f (x) when x is close to c. Rather than approximating f (x) directly, it is
more convenient to approximate only its difference. In other words, we can use the
following relationship corresponding to (14.1):
dy = f ′ (x)dx.
(14.3)
Here, dx and dy are called differentials, where dx is the independent variable and dy
is the dependent variable determined by x and dx. This relationship is meaningful
when dx is sufficiently small.
Question 14.1 (Notation abuse). Are the dx and dy in the notation $\frac{dy}{dx}$ for differentiation the same as the differentials dx and dy in (14.3)?
Solution 14.1 No, they are not. They mean different things but use the same notation. Representing different things with the same notation is called notation abuse. Sometimes, we abuse notation for convenience. One of the most striking abuses of notation is differentiation, where dx and dy are used. We denote the derivative of the dependent variable y with respect to the independent variable x as $\frac{dy}{dx}$. Thus, when both are used together, the symbol indicates differentiation. On the other hand, dy and dx used as differentials are treated as small numbers. In (14.2), dy corresponds to f(x) − f(c) and dx to x − c. However, (14.3) can also be written as follows:
$$\frac{dy}{dx} = f'(x).$$
□
In this case, the differential and differentiation in Leibniz notation represent the same thing. Therefore, the two notations mean different things depending on the perspective, yet they express the same relationship. It is necessary to become accustomed to the difference between them and to use them interchangeably.
Problem 14.3. Given x = 1 and dx = 0.2, find the differential dy of the variable $y = x^4 + 7x$.
Solution 14.3 Differentiating, we have $f'(x) = 4x^3 + 7$ and $f'(1) = 11$. Therefore, $dy = 11\,dx = 11 \times 0.2 = 2.2$.
The above differential dy pertains to the function f (x). Instead of introducing y
directly, we can represent it as
d f = f ′ (x)dx.
Problem 14.4. Find the differential df of the function $f(x) = 3x^2 - 6$.
Solution 14.4 $df = f'(x)\,dx = 6x\,dx$.
The calculation of differentials for sums, products, and composite functions follows the rules of differentiation.
1. d(u + v) = du + dv.
2. d(uv) = udv + vdu.
3. d( f (u)) = f ′ (u)du.
Question 14.2. Using the chain rule for differentiation, we obtain:
d(sin(sin(u))) = cos(sin(u))d(sin(u)) = cos(sin(u)) cos(u)du.
What is the reason for the first equation?
Solution 14.2 First, by substituting sin(u) as v, we obtain:
d(sin(v)) = cos(v)dv = cos(sin(u))d(sin(u)).
Repeating this calculation, we obtain the second equation.
14.3 Differentials for linear approximation
Approximation using differentiation refers to approximation using linear functions.
Now, let’s compare linear approximation with errors in detail. Let ∆ x = x − c denote the difference between the independent variable x and the comparison point c.
For such an independent variable, we set the differential dx to be the same as the
difference in variables ∆ x. That is,
dx = ∆ x = x − c.
If dx is sufficiently small, then by the definition of the derivative, a differentiable function f(x) satisfies
$$f'(c) \cong \frac{f(c + dx) - f(c)}{dx}.$$
Therefore, at x = c + dx, we approximate the function value by a linear function:
$$f(x) = f(c + dx) \cong f(c) + f'(c)\,dx.$$
Now let’s compare the difference in function values
$$\Delta y = f(x) - f(c) = f(c + dx) - f(c)$$
and the differential
$$dy = f'(c)\,dx.$$
With this notation, where Δy is the difference in function values and dy is the product of dx and the derivative f′(c), we compare them as follows:
$$\Delta y = f(c + dx) - f(c) = \frac{f(c + dx) - f(c)}{dx}\,dx \cong f'(c)\,dx = dy.$$
The gap between the actual difference Δy and the differential dy is
$$\Delta y - dy = \left(\frac{f(c + dx) - f(c)}{dx} - f'(c)\right) dx.$$
Therefore, the difference between Δy and dy decreases as dx becomes smaller. More importantly, since the ratio $\frac{f(c+dx) - f(c)}{dx}$ converges to the derivative f′(c) as dx → 0, the gap Δy − dy converges to 0 faster than dx does. That is,
$$\frac{\Delta y - dy}{dx} \to 0 \quad \text{as } dx \to 0.$$
Expressed with the little-oh notation of Lecture 16, this reads
$$\Delta y - dy = o(dx) \quad \text{as } dx \to 0.$$
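To see the statement Δy − dy = o(dx) in numbers, here is a small Python sketch. It is only an illustration under stated assumptions: the test function A(r) = πr² at c = 10 and the helper names are choices made for this example, not something fixed by the text. For this function the ratio (Δy − dy)/dx equals π·dx, so it visibly tends to 0.

```python
import math

def delta_and_differential(f, df, c, dx):
    """Return (Delta y, dy) for f at c with increment dx."""
    delta_y = f(c + dx) - f(c)   # actual change in the function value
    dy = df(c) * dx              # change predicted by the differential
    return delta_y, dy

# Illustrative test function (an assumption of this sketch): disk area.
area = lambda r: math.pi * r * r
d_area = lambda r: 2.0 * math.pi * r

def error_ratio(dx, c=10.0):
    """Compute (Delta y - dy) / dx, which should tend to 0 with dx."""
    delta_y, dy = delta_and_differential(area, d_area, c, dx)
    return (delta_y - dy) / dx

if __name__ == "__main__":
    for dx in (0.1, 0.01, 0.001):
        print(dx, error_ratio(dx))
```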
Problem 14.5. Given a circular disk with a radius of 10 cm, find the exact increase in area when the radius increases by 1% (that is, by 0.1 cm), and compare it with the estimate given by the differential.
Solution 14.5 The area of a disk with radius r is $A(r) = \pi r^2$. Then, with dx = 0.1,
$$\Delta y = A(10.1) - A(10) = 2.01\pi.$$
Since $A'(r) = 2\pi r$ and $A'(10) = 20\pi$,
$$dy = A'(10)\,dx = 20\pi \times 0.1 = 2\pi.$$
Thus, the difference is $\Delta y - dy = 0.01\pi$.
Problem 14.6. Approximate the value of $(7.97)^{1/3}$ using c = 8, and predict the error using differentials. Compare it with the actual error.
Solution 14.6 Let’s use the function $f(x) = x^{1/3}$. Using c = 8 as the base point, we have $f(8) = 8^{1/3} = 2$. Here, dx = 7.97 − 8 = −0.03, and $f'(8) = \frac{1}{3}\,8^{-2/3} = \frac{1}{12}$. Therefore, the differential dy is
$$dy = f'(c)\,dx = -\frac{1}{12} \times 0.03 = -0.0025.$$
Now, using differentials for approximation, we have:
$$(7.97)^{1/3} = f(7.97) \cong f(8) + f'(8)\,dx = 2 + \frac{1}{12}(-0.03) = 2 - 0.0025 = 1.9975.$$
The actual error of this approximation is about 0.000003, which is not too bad.
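The size of the error in Problem 14.6 can be confirmed with a few lines of Python. This is a sketch for checking the arithmetic only; the function name `cube_root_estimate` is hypothetical, and the "exact" value is the floating-point value of 7.97 ** (1/3).

```python
def cube_root_estimate(x, c=8.0):
    """Linear approximation of x**(1/3) around the base point c."""
    fc = c ** (1.0 / 3.0)                    # f(c)
    slope = (1.0 / 3.0) * c ** (-2.0 / 3.0)  # f'(c) = (1/3) c^(-2/3)
    return fc + slope * (x - c)

approx = cube_root_estimate(7.97)   # value from the differential
exact = 7.97 ** (1.0 / 3.0)         # floating-point reference value

if __name__ == "__main__":
    print("approximation:", approx)
    print("reference    :", exact)
    print("error        :", approx - exact)
```

The error is positive because the cube root is concave, so its tangent line lies above the graph.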
Exercises
1.
Lecture 15
Inverse trigonometric and hyperbolic functions
In this lecture, we will learn about inverse trigonometric functions and hyperbolic
functions. However, functions like sin x and cos x are not one-to-one functions, so
their inverse functions do not exist. What we are seeking is not the inverse functions
of sin x and cos x, but rather the inverse functions for their branches.
15.0.1 Inverse trigonometric functions
First, let’s review the basic properties of inverse functions learned in Lecture 6. If a
function f : A → B is one-to-one and onto, then an inverse function g : B → A exists,
satisfying g(f(x)) = x and f(g(y)) = y. If f is differentiable and f′(x) ≠ 0, then
$g'(y) = \frac{1}{f'(x)}$ holds, where y = f(x). We denote the inverse function g as $f^{-1}$.
Sine function
The function sin θ has a domain that spans the entire real line R and a co-domain of
[−1, 1]. Since it is not one-to-one, the inverse function does not exist. Even if there
is no inverse function for the sine function, let’s remember that we are considering
the inverse function for the chosen branch of sin x within its domain.
Question 15.1. If we want to create a one-to-one function by taking some part of
the domain of the function sin θ , what interval would be the best choice?
It is essential to include the most critical angles, which are from 0 degrees to
90 degrees or from 0 to π/2. Considering the shape of the sin function, to include
cases where sin takes negative values, it’s reasonable to include the interval from
−90 degrees to 0 degrees as well. Therefore, the branch we should consider is as
follows:
sin |[−π/2,π/2] : [−π/2, π/2] → [−1, 1].
The inverse function is denoted by arcsin or $\sin^{-1}$. Even if we use the $\sin^{-1}$ notation, let’s not forget that it represents the inverse function of the branch $\sin|_{[-\pi/2,\pi/2]}$, the branch chosen not only by us but by everyone. Therefore, the inverse function is:
arcsin : [−1, 1] → [−π/2, π/2].
If someone asks for arcsin(−0.5), the correct answer is an angle between −π/2 and
π/2 that satisfies sin θ = −0.5. Providing a different angle would be an incorrect
answer.
When we write the sine function, we sometimes use sin x or sin θ . Especially
when using θ , it clearly indicates that it represents an angle. Of course, even when
expressed as sin x, x represents an angle. However, writing arcsin θ is very misleading. This is because arcsin does not take angles as variables but rather takes values
of sin between -1 and 1 as variables.
Problem 15.1. Find the following values:
(1) arcsin(0.5).
(2) arcsin(−0.5).
(3) arcsin(1).
(4) arcsin(−1).
Solution 15.1 (1) arcsin(0.5) is an angle θ such that sin(θ ) = 0.5 within the interval
[−π/2, π/2]. Thus, the answer is 30 degrees or π/6. ⊔
⊓
Now let’s find the derivative of arcsin. Let θ = arcsin x, so that x = sin θ. Using the derivative of the inverse function, we have:
$$\frac{d}{dx}\arcsin x = \frac{1}{\sin'\theta} = \frac{1}{\cos\theta}.$$
Of course, we shouldn’t stop here. Since we differentiated with respect to x, we should end with a function of x to be useful. Because cos θ ≥ 0 on [−π/2, π/2],
$$\cos\theta = \sqrt{1 - \sin^2\theta} = \sqrt{1 - \sin^2(\arcsin x)} = \sqrt{1 - x^2}.$$
Consequently, the derivative of arcsin is:
$$\frac{d}{dx}\arcsin x = \frac{1}{\sqrt{1 - x^2}}, \qquad -1 < x < 1.$$
This formula is essential and should be remembered.
Cosine function
Now let’s consider the inverse function of the cosine function. Its domain spans the
entire real line R, and its co-domain is [−1, 1]. Similarly, we choose a branch of
the cosine function and consider its inverse. When creating a one-to-one function
by taking some part of the domain of the function cos θ , similarly, it is essential to
include the most crucial angles from 0 to 90 degrees. Thinking about the shape of
cos, if we want to include cases where cos takes negative values, it’s reasonable to
include the interval from 0 degrees to 180 degrees as well. Therefore, the branch we
should consider is:
cos |[0,π] : [0, π] → [−1, 1].
The inverse function is denoted by arccos or $\cos^{-1}$. Thus, the inverse function is:
arccos : [−1, 1] → [0, π].
If someone asks for arccos(−0.5), the correct answer is an angle between 0 and π
that satisfies cos θ = −0.5.
Problem 15.2. Find the following values:
(1) arccos(0.5).
(2) arccos(−0.5).
(3) arccos(1).
(4) arccos(−1).
Solution 15.2 (1) ⊔
⊓
Now, let’s find the derivative of arccos. Let arccos x = θ. Since sin θ ≥ 0 on [0, π],
$$\frac{d}{dx}\arccos x = \frac{1}{\cos'\theta} = \frac{-1}{\sin\theta} = \frac{-1}{\sqrt{1 - \cos^2(\arccos x)}} = \frac{-1}{\sqrt{1 - x^2}}.$$
Thus, the derivative of arccos is
$$\frac{d}{dx}\arccos x = \frac{-1}{\sqrt{1 - x^2}}, \qquad -1 < x < 1.$$
This is exactly the negative of the derivative of arcsin x. Looking at the graphs makes the reason clearer.
Tangent function
Let’s consider the inverse function of the function tan θ. First of all, the tangent function diverges at θ = ±π/2, so the branch is defined on the open interval (−π/2, π/2):
$$\tan|_{(-\pi/2,\pi/2)} : (-\pi/2, \pi/2) \to \mathbb{R}.$$
Its inverse function is denoted by arctan,
$$\arctan : \mathbb{R} \to (-\pi/2, \pi/2).$$
Let’s find the derivative of arctan. First, let arctan x = θ; then
$$\frac{d}{dx}\arctan x = \frac{1}{\tan'\theta} = \cos^2\theta.$$
To express this in terms of x, we use arctan x = θ:
$$x = \tan\theta \;\Rightarrow\; 1 + x^2 = 1 + \frac{\sin^2\theta}{\cos^2\theta} = \frac{\cos^2\theta + \sin^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta}.$$
Therefore,
$$\frac{d}{dx}\arctan x = \frac{1}{1 + x^2}.$$
Remember this. It appears frequently.
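The three derivative formulas for arcsin, arccos, and arctan can be sanity-checked with a central difference quotient. The Python sketch below is illustrative only: the step size h = 1e-6, the sample points, and the helper names are assumptions of this example. It compares the numerical slope of `math.asin`, `math.acos`, and `math.atan` with the closed forms.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Each entry pairs a function with the derivative formula from the text.
FORMULAS = {
    "arcsin": (math.asin, lambda x: 1.0 / math.sqrt(1.0 - x * x)),
    "arccos": (math.acos, lambda x: -1.0 / math.sqrt(1.0 - x * x)),
    "arctan": (math.atan, lambda x: 1.0 / (1.0 + x * x)),
}

def max_discrepancy(name, points=(-0.9, -0.5, 0.0, 0.3, 0.8)):
    """Largest gap between the numerical slope and the formula."""
    f, df = FORMULAS[name]
    return max(abs(central_difference(f, x) - df(x)) for x in points)

if __name__ == "__main__":
    for name in FORMULAS:
        print(name, max_discrepancy(name))
```

The sample points stay strictly inside (−1, 1), since the derivatives of arcsin and arccos are not defined at x = ±1.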
Other functions
We can also consider the inverse functions of the remaining three of the six trigonometric functions. Of course, we need to choose branches. The remaining three trigonometric functions are cotangent, secant, and cosecant. Their definitions are as follows.
$$\cot\theta = \frac{1}{\tan\theta}, \qquad \sec\theta = \frac{1}{\cos\theta}, \qquad \csc\theta = \frac{1}{\sin\theta}.$$
The four functions other than sine and cosine are unbounded. To select the inverse function, we choose the interval of definition as [0, π] or [−π/2, π/2], excluding the points where the function is not defined. The inverse functions of all six trigonometric functions are summarized as follows.
$$\begin{cases}
\arcsin : [-1, 1] \to [-\pi/2, \pi/2] \\
\arctan : \mathbb{R} \to (-\pi/2, \pi/2) \\
\operatorname{arcsec} : \mathbb{R} \setminus (-1, 1) \to [0, \pi] \setminus \{\pi/2\} \\
\arccos : [-1, 1] \to [0, \pi] \\
\operatorname{arccot} : \mathbb{R} \to (0, \pi) \\
\operatorname{arccsc} : \mathbb{R} \setminus (-1, 1) \to [-\pi/2, \pi/2] \setminus \{0\}
\end{cases} \tag{15.1}$$
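The branches in (15.1) can be realized in code. Python’s standard `math` module provides `asin`, `acos`, and `atan` with exactly the ranges listed above, but it has no built-in arcsec, arccsc, or arccot. The sketch below defines them through the identities arcsec x = arccos(1/x), arccsc x = arcsin(1/x), and arccot x = π/2 − arctan x; these three helper functions are assumptions of this example, chosen so that their ranges agree with (15.1).

```python
import math

def arcsec(x):
    """Inverse secant with range [0, pi] minus {pi/2}; needs |x| >= 1."""
    return math.acos(1.0 / x)

def arccsc(x):
    """Inverse cosecant with range [-pi/2, pi/2] minus {0}; needs |x| >= 1."""
    return math.asin(1.0 / x)

def arccot(x):
    """Inverse cotangent with range (0, pi), defined for every real x."""
    return math.pi / 2.0 - math.atan(x)

if __name__ == "__main__":
    print(math.asin(-0.5), -math.pi / 6)    # arcsin(-0.5) lies in [-pi/2, pi/2]
    print(arcsec(-2.0), 2 * math.pi / 3)    # arcsec(-2) lies in [0, pi]
    print(arccot(-1.0), 3 * math.pi / 4)    # arccot(-1) lies in (0, pi)
```

Other conventions exist, in particular for arccot; the choice here is only the one that matches the table above.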
Problem 15.3. (1) Suppose that, when choosing a branch to create the inverse of a trigonometric function, we require that the angles between 0 and 90 degrees be included and that the domain be a connected interval on which the function takes all of its values. Show that the branch is uniquely determined only for the sine, cosine, tangent, and cotangent functions. (2) Confirm that for the secant and cosecant functions, there is no connected interval on which the function is one-to-one and takes all of its values. (3) Confirm that formula (15.1) simply aligns the range of arcsecant with that of arccosine and the range of arccosecant with that of arcsine.
Solution 15.3 This problem can be confirmed by drawing graphs. ⊔
⊓
15.1 Hyperbolic functions
Let’s first consider the derivatives of the sine and cosine functions:
$$\sin' x = \cos x, \quad \cos' x = -\sin x, \qquad \sin'' x = -\sin x, \quad \cos'' x = -\cos x.$$
Differentiating the sine function once gives the cosine function, and differentiating it twice gives the sine function again, but with a minus sign. Hyperbolic functions have similar properties, but without the minus sign. Because of these properties of differentiation, these functions have names similar to the sine and cosine functions, but their structures are very different.
Hyperbolic sine and hyperbolic cosine functions are defined as follows:
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.$$
When computing their derivatives, we have
$$\sinh' x = \frac{e^x + e^{-x}}{2} = \cosh x, \qquad \cosh' x = \frac{e^x - e^{-x}}{2} = \sinh x.$$
That is, they become each other, so the second derivative is itself:
$$\sinh'' x = \sinh x, \qquad \cosh'' x = \cosh x.$$
Remembering the properties related to differentiation can be helpful in solving differential equations.
Problem 15.4. Plot the graphs of the functions $e^x$ and $e^{-x}$, and use them to draw the graphs of sinh x and cosh x.
Solution 15.4
⊔
⊓
Problem 15.5. Find functions that become (1) themselves and (2) their negative
counterparts when differentiated once.
This problem is to find the solutions of the first-order linear differential equations
y′ = y or y′ = −y.
Solution 15.5 (1) $y = e^x$. (2) $y = e^{-x}$. (It’s also acceptable to answer with $y = 3e^x$ or $y = 4e^{-x}$ by multiplying constants, but it looks a bit strange. It would be better to answer with $y = Ce^x$ and $y = Ce^{-x}$.) ⊔
⊓
The hyperbolic sine function is not related to the sine function. It is not a periodic function. As seen from the graphs in Problem 15.4,
sinh : R → R is a one-to-one and onto function,
cosh : R → [1, ∞) is not a one-to-one function.
Other hyperbolic functions are defined as follows.
$$\tanh x = \frac{\sinh x}{\cosh x}, \quad \coth x = \frac{\cosh x}{\sinh x}, \quad \operatorname{sech} x = \frac{1}{\cosh x}, \quad \operatorname{csch} x = \frac{1}{\sinh x}.$$
The formulas of hyperbolic functions corresponding to some properties of the sine and cosine functions are as follows.
$$\cosh^2 x - \sinh^2 x = 1, \qquad \sinh 2x = 2\sinh x\cosh x,$$
$$\tanh' x = \operatorname{sech}^2 x, \qquad \coth' x = -\operatorname{csch}^2 x, \ \cdots.$$
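These identities are easy to test numerically. The following Python sketch builds sinh and cosh directly from the exponential definitions and checks cosh² x − sinh² x = 1, sinh 2x = 2 sinh x cosh x, d/dx tanh x = sech² x, and d/dx coth x = −csch² x with a central difference. The sample points and step size are arbitrary choices made for this illustration.

```python
import math

def sinh(x):
    return (math.exp(x) - math.exp(-x)) / 2.0

def cosh(x):
    return (math.exp(x) + math.exp(-x)) / 2.0

def tanh(x):
    return sinh(x) / cosh(x)

def coth(x):
    return cosh(x) / sinh(x)   # undefined at x = 0

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

if __name__ == "__main__":
    x = 0.7
    print(cosh(x) ** 2 - sinh(x) ** 2)                        # close to 1
    print(sinh(2 * x) - 2 * sinh(x) * cosh(x))                # close to 0
    print(central_difference(tanh, x), 1.0 / cosh(x) ** 2)    # sech^2 x
    print(central_difference(coth, x), -1.0 / sinh(x) ** 2)   # -csch^2 x
```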
Problem 15.6. Find functions that become (1) themselves and (2) their negative
counterparts when differentiated twice.
This problem is to find the solutions of the second-order linear differential equations y″ = y or y″ = −y. Since each is a second-order equation, we need to find two functions for each case.
Solution 15.6 (1) y = sinh x and y = cosh x. (Or $e^x$ and $e^{-x}$ can be used as answers.) (2) y = sin x and y = cos x. (Or $e^{ix}$ and $e^{-ix}$ can be used as answers.) □
From the answers above, we can see that sinh x and cosh x are in the same family as $e^x$ and $e^{-x}$, while sin x and cos x are in the same family as $e^{ix}$ and $e^{-ix}$.
Problem 15.7. Find functions that become (1) themselves and (2) their negative
counterparts when differentiated three times.
Solution 15.7 We need to find three functions each, but we know one, and the others
are not general functions. ⊔
⊓
Problem 15.8. Find functions that become (1) themselves and (2) their negative
counterparts when differentiated four times.
This problem is to find the solutions of the fourth-order linear differential equations $y^{(4)} = y$ or $y^{(4)} = -y$. Since each is a fourth-order equation, we need to find four functions for each case.
Solution 15.8 (1) sin x, cos x, sinh x, cosh x. There are four functions. Expressed as exponential functions, they are $e^x$, $e^{-x}$, $e^{ix}$, $e^{-ix}$. (2) It is not a general function. □
Exercises
1. Find the angles.
(1) arctan(1) (2) arcsin(−0.5) (3) $\arccos(\sqrt{3}/2)$ (4) $\arcsin(1/\sqrt{2})$
(5) arctan(−1) (6) arctan(0.5) (7) $\arcsin(-\sqrt{3}/2)$ (8) arccos(0.5)
2. Find the derivatives.
(1) $\sqrt{\arctan(x^2)}$ (2) arccos(1 − x) (3) arcsin(cos θ) (4) arctan(ln x)
(5) |arctan x| (6) ln(arccos x) (7) arcsec(cos θ) (8) arccsc(sin θ)
3. Find the integrals.
(1) $\int \frac{1}{\sqrt{1 - x^2}}\,dx$ (2) $\int \frac{1}{1 + x^2}\,dx$ (3) $\int \frac{1}{\sqrt{9 - x^2}}\,dx$
(4) $\int \frac{1}{9 + x^2}\,dx$ (5) $\int \frac{-1}{\sqrt{1 - x^2}}\,dx$ (6) $\int \frac{1}{\sqrt{1 - (x + 1)^2}}\,dx$
Lecture 16
L’Hopital’s rule, big-oh, and little-oh
16.1 L’Hopital’s rule
If the limits $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$ exist and $\lim_{x\to a} g(x) \neq 0$, then the limit $\lim_{x\to a} \frac{f(x)}{g(x)}$ exists, and
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$$
holds. If this condition is not satisfied, the right-hand side does not have any meaning. However, the limit of the left-hand side may still exist. There are also cases where we can easily determine that the left-hand side diverges. For instance, it diverges in the following cases:
$$\lim_{x\to a} g(x) = 0 \quad\text{and}\quad \lim_{x\to a} f(x) = c \neq 0 \ \text{ or } \ \lim_{x\to a} f(x) = \pm\infty.$$
Problem 16.1. Explain the limit of the quotient for the cases where convergence or divergence can be easily determined.
Solution 16.1 Let’s consider the case when $(\lim_{x\to a} f(x), \lim_{x\to a} g(x)) = (c, 0)$. If c is a nonzero real number or ±∞, then the quotient diverges to ±∞, with the sign depending on the signs of c and of g(x) near a. Next, consider the case $(\lim_{x\to a} f(x), \lim_{x\to a} g(x)) = (c, \pm\infty)$. If the limit c is a real number, the quotient converges to 0. □
However, if both limits converge to 0 or both diverge, that is,
$$\lim_{x\to a}\,(f(x), g(x)) = (0, 0) \ \text{ or } \ (\pm\infty, \pm\infty),$$
we cannot determine the limit by inspection. In this case, a convenient method for finding the limit is L’Hopital’s rule.
Here, since we are dealing with limits, we are not concerned with the function values f(a) and g(a), but rather with the limits. When the function is continuous, the limit and function value are the same. The above limits may be one-sided limits, and a may not be a real number but ∞ or −∞. Of course, the proof should be adapted accordingly. Although proofs are not provided for all cases, L’Hopital’s rule is proved for two representative cases.
Theorem 16.1 (L’Hopital’s rule for 0/0 and a ∈ R). Suppose that $\lim_{x\to a}\,(f(x), g(x)) = (0, 0)$. If f and g are differentiable in (a − δ, a + δ) for some δ > 0, and g′(x) ≠ 0 for x ≠ a in the interval, then
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(x)}{g'(x)}. \tag{16.1}$$
Before proving, there are a few things to confirm. Equation (16.1) does not assume that the right-hand side converges; it also holds when the right-hand side diverges to ±∞. A single application of the rule does not always settle the limit, but as long as the conditions of L’Hopital’s rule are satisfied, it can be applied multiple times.
Proof. Let x ∈ (a − δ, a + δ) with x ≠ a. Since f and g are differentiable in (a − δ, a + δ) and g′(t) ≠ 0 for all t ∈ (a − δ, a) ∪ (a, a + δ), by Cauchy’s Mean Value Theorem 3.3,
$$\frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}$$
holds for some c between x and a. Since f and g are differentiable functions, they are continuous at the point a, and therefore f(a) = g(a) = 0. Thus, the above equation becomes
$$\frac{f(x)}{g(x)} = \frac{f'(c)}{g'(c)}.$$
Here, c can be seen as a function of x and is between a and x. When x converges to a, c also converges to a by the sandwich theorem. Therefore,
$$\lim_{x\to a}\frac{f(x)}{g(x)} = \lim_{x\to a}\frac{f'(c(x))}{g'(c(x))} = \lim_{x\to a}\frac{f'(x)}{g'(x)}.$$
The above proof does not assume that the limit converges and includes cases of divergence. □
Problem 16.2. Find the following limits.
(1) $\lim_{x\to 0} \frac{3x - \cos x}{x}$ (2) $\lim_{x\to 0} \frac{3x - \sin x}{x}$ (3) $\lim_{x\to 0} \frac{\sqrt{1 + x} - 1}{x}$
(4) $\lim_{x\to 0} \frac{x - \sin x}{x^3}$ (5) $\lim_{x\to 0} \frac{\sin x}{x^2}$
Solution 16.2 For (1), since the denominator converges to 0 and the numerator converges to −1, the quotient diverges (the right-hand limit is −∞ and the left-hand limit is ∞). For (2), since both the numerator and the denominator converge to 0, we can use L’Hopital’s rule:
$$\lim_{x\to 0}\frac{3x - \sin x}{x} = \lim_{x\to 0}\frac{3 - \cos x}{1} = 2.$$
For (3), since both the numerator and the denominator converge to 0, we may use L’Hopital’s rule and get:
$$\lim_{x\to 0}\frac{\sqrt{1 + x} - 1}{x} = \lim_{x\to 0}\frac{0.5\,(1 + x)^{-0.5}}{1} = 0.5.$$
For (4), as both the numerator and the denominator converge to 0, L’Hopital’s rule can be applied:
$$\lim_{x\to 0}\frac{x - \sin x}{x^3} = \lim_{x\to 0}\frac{1 - \cos x}{3x^2}.$$
However, the obtained expression is again of the form 0/0, so we cannot conclude yet. Applying L’Hopital’s rule repeatedly, we get:
$$\lim_{x\to 0}\frac{x - \sin x}{x^3} = \lim_{x\to 0}\frac{1 - \cos x}{3x^2} = \lim_{x\to 0}\frac{\sin x}{6x} = \lim_{x\to 0}\frac{\cos x}{6} = \frac{1}{6}.$$
(We applied L’Hopital’s rule from Theorem 16.1 three times consecutively.) For (5), we have:
$$\lim_{x\to 0}\frac{\sin x}{x^2} = \lim_{x\to 0}\frac{\cos x}{2x},$$
and it diverges. □
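A quick numerical experiment supports the limits found for (2), (3), and (4): evaluating each quotient at small values of x should give numbers close to 2, 0.5, and 1/6. The Python sketch below is only a plausibility check, not a proof; the function names and the sample values of x are assumptions of this example.

```python
import math

def quotient_2(x):
    """(3x - sin x) / x, expected to approach 2 as x -> 0."""
    return (3.0 * x - math.sin(x)) / x

def quotient_3(x):
    """(sqrt(1 + x) - 1) / x, expected to approach 0.5 as x -> 0."""
    return (math.sqrt(1.0 + x) - 1.0) / x

def quotient_4(x):
    """(x - sin x) / x^3, expected to approach 1/6 as x -> 0."""
    return (x - math.sin(x)) / x ** 3

if __name__ == "__main__":
    for x in (1e-1, 1e-2, 1e-3):
        print(x, quotient_2(x), quotient_3(x), quotient_4(x))
```

Very small x (say below 1e-6) should be avoided in `quotient_4`, because the subtraction x − sin x then loses most of its significant digits in floating-point arithmetic.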
Theorem 16.2 (L’Hopital’s rule for ∞/∞ and a ∈ R). Suppose that $\lim_{x\to a^+}\,(f(x), g(x)) = (\pm\infty, \pm\infty)$. If f and g are differentiable in (a, b) and g′(x) ≠ 0 for x ∈ (a, b), then
$$\lim_{x\to a^+}\frac{f(x)}{g(x)} = \lim_{x\to a^+}\frac{f'(x)}{g'(x)}.$$
This theorem discusses right-hand limits. Corresponding facts hold for left-hand limits under the corresponding conditions.
Proof. Since f(x) and g(x) tend to ±∞ as x → a+, we can assume that they do not take the value 0 in (a, b). Otherwise, we can redefine b to be closer to a. Now let $F(x) = \frac{1}{f(x)}$ and $G(x) = \frac{1}{g(x)}$. Define F(a) = G(a) = 0, so F and G are right continuous at a. Applying Cauchy’s Mean Value Theorem 3.3 to F and G as in the proof above, for all x ∈ (a, b),
$$\frac{F(x)}{G(x)} = \frac{F'(c)}{G'(c)} \;\Rightarrow\; \frac{g(x)}{f(x)} = \frac{f'(c)\,g^2(c)}{g'(c)\,f^2(c)}$$
holds for some c between x and a. Here, c can be seen as a function of x and is between a and x. When x converges to a, c also converges to a by the sandwich theorem. Therefore,
$$\lim_{x\to a^+}\frac{g(x)}{f(x)} = \lim_{x\to a^+}\frac{f'(x)}{g'(x)}\,\lim_{x\to a^+}\frac{g^2(x)}{f^2(x)}$$
is obtained. Rewriting gives the relationship of the theorem. □
Problem 16.3 (L’Hopital’s rule for 0/0 and ∞/∞ when a = ±∞). The proofs of Theorems 16.1 and 16.2 were shown for the cases when the forms are 0/0 and ±∞/±∞, with the limit point a being a real number. Describe and prove the theorem corresponding to the case where the limit point is ∞.
Solution 16.3 The basic approach to proving Theorem 16.2 was to modify the situation to correspond to Theorem 16.1. Recall that this was done by handling f and g indirectly and considering F = 1/f and G = 1/g instead. Now, it’s not a problem of function values but of variables. Can it be resolved by a change of variables? □
Problem 16.4. Compute the following limits.
(1) $\lim_{x\to\pi/2} \frac{\sec x}{1 + \tan x}$. (2) $\lim_{x\to\infty} \frac{\ln x}{\sqrt{x}}$.
Solution 16.4 □
Computing the limit $\lim_{x\to a} f(x)$ directly may be difficult, so we introduce an indirect method. Let φ be a continuous invertible function, and let $\varphi^{-1}$ be its inverse. Although it may be hard to compute $\lim_{x\to a} f(x)$ directly, there are cases where $\lim_{x\to a} \varphi(f(x))$ is easier to compute. Denote this limit by A. Since φ is continuous, we have
$$A = \lim_{x\to a}\varphi(f(x)) = \varphi\Big(\lim_{x\to a} f(x)\Big).$$
Thus, by applying the inverse function, we obtain
$$\lim_{x\to a} f(x) = \varphi^{-1}(A).$$
Using this logic along with L’Hopital’s rule, we can compute the following.
Problem 16.5. Compute the following limits.
(1) $\lim_{x\to 0}\,(1 + x)^{1/x}$. (2) $\lim_{x\to\infty} x^{1/x}$.
Solution 16.5 (1) Taking the natural logarithm simplifies the calculation.
$$\lim_{x\to 0}\ln(1 + x)^{1/x} = \lim_{x\to 0}\frac{\ln(1 + x)}{x} = \lim_{x\to 0}\frac{1/(x + 1)}{1} = 1.$$
Therefore, $\lim_{x\to 0}\,(1 + x)^{1/x} = e^1 = e$.
(2) Taking the natural logarithm also simplifies the calculation.
$$\lim_{x\to\infty}\ln x^{1/x} = \lim_{x\to\infty}\frac{\ln x}{x} = \lim_{x\to\infty}\frac{1/x}{1} = 0.$$
Therefore, $\lim_{x\to\infty} x^{1/x} = e^0 = 1$. □
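The logarithm trick can be mirrored in a short numerical check: compute the logarithm of the expression, watch it settle near its limit A, and then apply exp. The Python sketch below does this for (1 + x)^{1/x} as x → 0 and for x^{1/x} as x → ∞; the function names and the sample values of x are choices made for this illustration only.

```python
import math

def log_of_first(x):
    """ln((1 + x)^(1/x)) = ln(1 + x) / x, which tends to 1 as x -> 0."""
    return math.log(1.0 + x) / x

def log_of_second(x):
    """ln(x^(1/x)) = ln(x) / x, which tends to 0 as x -> infinity."""
    return math.log(x) / x

def first_limit_value(x):
    return math.exp(log_of_first(x))     # approaches e

def second_limit_value(x):
    return math.exp(log_of_second(x))    # approaches 1

if __name__ == "__main__":
    for x in (1e-2, 1e-4, 1e-6):
        print(x, first_limit_value(x))
    for x in (1e2, 1e4, 1e6):
        print(x, second_limit_value(x))
```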
16.2 Big-oh and Little-oh
This section discusses mathematical language for comparing the sizes of limits, and
having a clear understanding of these concepts is helpful. Since it’s about comparing
sizes, we compare two positive-valued functions.
Consider two positive functions f (x) and g(x).
Definition 16.1. We say f(x) = o(g(x)) (little-oh) as x → a for a ∈ R or a = ±∞ if
$$\lim_{x\to a}\frac{f(x)}{g(x)} = 0.$$
We say f(x) = O(g(x)) (big-oh) as x → a for a ∈ R or a = ±∞ if there exists an upper bound M > 0 such that
$$\frac{f(x)}{g(x)} \le M$$
for x close enough to a.
In other words, saying that the function f is little-oh o(g) as x → a means that f is much smaller than g near a. The meaning of being much smaller is that the ratio tends to 0. Also, saying that f is big-oh O(g) as x → a means that as x approaches a, f is at most a constant multiple of g; the ratio f/g may even tend to 0. Therefore, f can be much smaller than g, or the two functions can be of similar sizes up to a constant multiple. Note that, with this definition, the set of functions that are little-oh o(g) is a subset of the functions that are big-oh O(g).
Problem 16.6. (1) Compare the sizes of the functions $f(x) = e^x$ and $g(x) = x^2$ as x → ∞. (2) Compare the sizes of the functions $f(x) = x^2$ and $g(x) = |x|^3$ as x → 0.
Solution 16.6 (1) Using L’Hopital’s rule,
$$\lim_{x\to\infty}\frac{f(x)}{g(x)} = \lim_{x\to\infty}\frac{e^x}{x^2} = \infty.$$
Therefore, we can say g = o(f) (or $x^2 = o(e^x)$) as x → ∞.
For (2),
$$\lim_{x\to 0}\frac{f(x)}{g(x)} = \lim_{x\to 0}\frac{x^2}{|x|^3} = \infty,$$
so $|x|^3 = o(x^2)$ as x → 0. One might write $x^3 = o(x^2)$, but since $x^3$ is negative for negative x, it is better to say $|x|^3 = o(x^2)$ as x → 0. □
Problem 16.7. Verify whether the following asymptotic comparisons are correct or
incorrect.
(1) ln x = o(x) as x → ∞ (2) $x^2 = o(x^3 + 1)$ as x → ∞ (3) $x = o(e^x)$ as x → ∞
(4) $e^x = o(x)$ as x → ∞ (5) $x^3 = o(x^2)$ as x → 0+ (6) $x = o(x^x)$ as x → 0+
Solution 16.7
⊔
⊓
Exercises
1. Find the limit using L’Hopital’s rule.
(1) $\lim_{x\to\infty} \frac{x^2 + 3x + 1}{2x^2 + 5x - 3}$ (2) $\lim_{x\to 0} \frac{\sin(x)}{x}$ (3) $\lim_{x\to\pi/2} \frac{\cos(x)}{\sin(x)}$
Lecture 17
Integration Techniques # 1
In this and the next lecture, we will learn several integration techniques. While differentiation can often be done easily using rules like the chain rule and the product
rule, integration requires specific techniques for different types of functions, which
must be learned and practiced. Despite these techniques, there are still many functions that cannot be integrated by hand. In such cases, numerical methods such as
numerical integration or the use of computer software can be employed.
Derivatives of some functions
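First, it is important to remember the derivatives of several special functions. Refer to the following list:
$$\begin{cases}
\dfrac{d}{dx} e^x = e^x \\[4pt]
\dfrac{d}{dx} a^x = a^x \ln a \\[4pt]
\dfrac{d}{dx} \ln(|x|) = \dfrac{1}{x} \\[4pt]
\dfrac{d}{dx} \arcsin x = \dfrac{1}{\sqrt{1 - x^2}} \\[4pt]
\dfrac{d}{dx} \arctan x = \dfrac{1}{1 + x^2}
\end{cases} \tag{17.1}$$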
If you want to find the antiderivative of the functions on the right-hand side of equation (17.1), you simply add a general constant C to the left-hand side function. This
is not a process of finding the function on the left-hand side starting from the function on the right-hand side; it’s simply a matter of memorization. Therefore, memory is important in integration. By combining these memorized functions with some
integration techniques, you can integrate functions in quite a variety of cases.
17.1 Substitution
The substitution technique involves using the chain rule in reverse. If we substitute a function u(x) for the variable x in a function f(x) on the left side of equation (17.1) to create the composite function f(u(x)), then by the chain rule its derivative is
$$\frac{d}{dx} f(u(x)) = f'(u(x))\,u'(x).$$
Therefore, the indefinite integral of f′(u(x))u′(x) is f(u(x)) + C. This principle is written symbolically as
$$u'\,dx = du. \tag{17.2}$$
Following the reverse chain rule as explained above:
$$\int f'(u(x))\,u'(x)\,dx = \int f'(u)\,du = f(u) + C = f(u(x)) + C.$$
The second integral indicates integrating with respect to u instead of x. Equation (17.2) is more than a mere technique for substitution; it was used as the definition of the differential in Lecture 14. All of this is made possible by the chain rule.
To actually use it, the key is to identify what becomes u and what becomes u′ (the derivative of u). For example, the integral of $\frac{1}{\sqrt{1 - x^2}}$ is arcsin(x) + C. Instead, if we integrate $\frac{u'(x)}{\sqrt{1 - u(x)^2}}$, the answer becomes arcsin(u(x)) + C. Integrating $\frac{1}{\sqrt{1 - u(x)^2}}$ directly might be more challenging, but integrating the product with u′(x) becomes easier thanks to the substitution technique. Now, how do we integrate $\frac{x}{\sqrt{1 - x^4}}$? It’s important to recognize here that we can use $u = x^2$.
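For the last example, the substitution u = x² gives du = 2x dx, so the integral of x/√(1 − x⁴) becomes ½ ∫ du/√(1 − u²) = ½ arcsin(x²) + C. The Python sketch below checks this numerically by comparing a midpoint-rule approximation of the definite integral over [0, 0.8] with the difference of the antiderivative at the endpoints. The interval, the number of subintervals, and the helper names are assumptions of this example.

```python
import math

def integrand(x):
    return x / math.sqrt(1.0 - x ** 4)

def antiderivative(x):
    # Obtained from u = x^2, du = 2x dx: (1/2) arcsin(u) = (1/2) arcsin(x^2).
    return 0.5 * math.asin(x * x)

def midpoint_rule(f, a, b, n=10000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    a, b = 0.0, 0.8
    print(midpoint_rule(integrand, a, b))
    print(antiderivative(b) - antiderivative(a))
```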
Let’s look at some examples of substitution. Through this process, you’ll gain a
clearer understanding.
Problem 17.1. Compute the following five indefinite integrals.
(1) $\int \frac{1}{x^2 + A}\,dx$ (2) $\int \frac{1}{\sqrt{a^2 - x^2}}\,dx$ (3) $\int \frac{1}{8x - x^2}\,dx$ (4) $\int \frac{2x - 3}{x^2 - 3x + 1}\,dx$ (5) $\int \frac{1}{1 - \sin x}\,dx$
Solution 17.1 (1) For A > 0,
$$\int \frac{1}{x^2 + A}\,dx = \int \frac{1/A}{(x/\sqrt{A})^2 + 1}\,dx = \frac{1}{\sqrt{A}} \int \frac{1/\sqrt{A}}{(x/\sqrt{A})^2 + 1}\,dx = \arctan(x/\sqrt{A})/\sqrt{A} + C.$$
□
Problem 17.2. Compute the following four indefinite integrals.
Z
Z
Z
Z
1
3x2 − 7x
3x + 2
√ dx (4) x3 cos xdx
dx (3)
(1)
dx (2) √
3x + 2
(1 + x)3
1 − x2
Solution 17.2
⊔
⊓
Let’s consider another example of substitution. Since the derivative of sine is cosine and the derivative of cosine is negative sine, we can use this relationship effectively to integrate products of sine and cosine. For instance, to compute $\int \cos^k x \sin x\,dx$, we substitute u = cos x. Then, du becomes −sin x dx, so
$$\int \cos^k x \sin x\,dx = -\int u^k\,du = -\frac{1}{k+1}u^{k+1} + C = -\frac{1}{k+1}\cos^{k+1} x + C.$$
This way, if there is only one sin x and the rest are all cos x or vice versa, it’s convenient to make the substitution. Even if that’s not the case, when sin x is raised to an odd power, integration becomes straightforward. Let’s practice substitution with trigonometric functions in the following problems.
Problem 17.3. Compute the following integrals.
(i) $\int \sin^3 x \cos x\,dx$, (ii) $\int \sin^3 x \cos^2 x\,dx$, (iii) $\int \cos^5 x\,dx$, (iv) $\int \sin^2 x \cos^4 x\,dx$.
Solution 17.3 The most difficult integral is (iv). The others are actually relatively easy. In (iv), since both sin x and cos x appear to even powers, integration seems difficult. But we can use the double angle formulas.
$$\sin^2 x = \frac{1 - \cos 2x}{2}, \qquad \cos^2 x = \frac{1 + \cos 2x}{2}.$$
Rewriting the problem,
$$\int \sin^2 x \cos^4 x\,dx = \int \frac{1 - \cos 2x}{2} \times \frac{1 + 2\cos 2x + \cos^2 2x}{4}\,dx.$$
Simplify, and integrate each cosine power separately. □
17.2 Integration by parts
Perhaps the most commonly used and important integration technique is integration
by parts. This technique involves using the product rule of differentiation in reverse.
Recalling the product rule of differentiation,
(uv)′ = u′ v + uv′
⇒
u′ v = (uv)′ − uv′ .
Integrating both sides, we get the following indefinite integral formula:
$$\int u'v\,dx = uv - \int uv'\,dx.$$
But the integration is not complete; there is another integral on the right-hand side. Is this an improvement? It depends on the problem. The integral on the right side should be simpler than the integral on the left. How do we achieve this?
First, we need to view the integrand as the product of two functions. Then, to use integration by parts to integrate $\int f(x)g(x)\,dx$, we need to decide which of f and g will be u′ and which will be v. We decide based on which one is easier to integrate and which becomes simpler upon differentiation. For example, if f is easy to integrate and g becomes simpler when differentiated, then we set u′ = f and v = g. After applying integration by parts, uv′ is simpler to integrate than u′v. Integration by parts essentially moves the derivative from one factor to the other.
Let’s look at some examples of integration by parts. Through these examples, you’ll get a clearer understanding.
Problem 17.4. Use integration by parts to evaluate the following integrals.
(1) $\int x\cos x\,dx$ (2) $\int \ln x\,dx$ (3) $\int x^2 e^x\,dx$ (4) $\int e^x \sin x\,dx$
Solution 17.4 (1) Both x and cos x can be integrated or differentiated, but differentiating x simplifies it, so let v = x and integrate u′ = cos x. Applying integration by
parts, u = sin x and v′ = 1, so
Z
x cos xdx = x sin x −
Z
sin xdx = x sin x + cos x +C.
(2) In the second problem, only $\ln x$ is present, and its integral is not yet known. Differentiating it gives $\frac{1}{x}$. Since we can differentiate it, we write $\int \ln x\,dx = \int 1\cdot\ln x\,dx$, so that $\ln x$ and $1$ form a pair for integration by parts. Let $v = \ln x$ and $u' = 1$; then $v' = \frac{1}{x}$ and $u = x$, so
$$\int \ln x\,dx = \int 1\cdot\ln x\,dx = x\ln x - \int x\cdot\frac{1}{x}\,dx = x\ln x - x + C.$$
(3) $e^x$ can be integrated or differentiated without changing, but $x^2$ becomes simpler upon differentiation. Specifically, its second derivative is the constant 2. Therefore, setting $u' = e^x$ and $v = x^2$ and applying integration by parts twice, we get
$$\int x^2 e^x\,dx = x^2 e^x - \int 2xe^x\,dx = x^2 e^x - 2xe^x + \int 2e^x\,dx = x^2 e^x - 2xe^x + 2e^x + C.$$
(4) Both $e^x$ and $\sin x$ are easy to integrate or differentiate, but neither becomes simpler. However, $e^x$ remains unchanged upon differentiation and integration, while $\sin x$ becomes $-\sin x$ when differentiated twice. So, we can integrate this by parts twice, taking $u' = e^x$ each time. We get
$$\int e^x\sin x\,dx = e^x\sin x - \int e^x\cos x\,dx = e^x\sin x - e^x\cos x - \int e^x\sin x\,dx.$$
Solving for $\int e^x\sin x\,dx$,
$$\int e^x\sin x\,dx = \frac{1}{2}\left(e^x\sin x - e^x\cos x\right) + C. \qquad\square$$
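The four antiderivatives in Solution 17.4 can be checked numerically. The following is a minimal MATLAB sketch, not part of the original lecture: it compares $F(b) - F(a)$ with a numerical value of the definite integral from the built-in `integral` function. The test interval $[1, 2]$ and the handle names `F1`, `f1`, and so on are choices made only for this illustration.

```matlab
% Antiderivatives from Solution 17.4 (constant C omitted), as function handles.
F1 = @(x) x.*sin(x) + cos(x);               % (1) integrand x cos x
F2 = @(x) x.*log(x) - x;                    % (2) integrand ln x
F3 = @(x) (x.^2 - 2*x + 2).*exp(x);         % (3) integrand x^2 e^x
F4 = @(x) 0.5*exp(x).*(sin(x) - cos(x));    % (4) integrand e^x sin x

% Matching integrands.
f1 = @(x) x.*cos(x);
f2 = @(x) log(x);
f3 = @(x) x.^2.*exp(x);
f4 = @(x) exp(x).*sin(x);

a = 1; b = 2;   % test interval, chosen for illustration only

% Difference between numerical quadrature and F(b) - F(a) for each case.
err = [integral(f1, a, b) - (F1(b) - F1(a)), ...
       integral(f2, a, b) - (F2(b) - F2(a)), ...
       integral(f3, a, b) - (F3(b) - F3(a)), ...
       integral(f4, a, b) - (F4(b) - F4(a))];
disp(max(abs(err)))   % expected to be tiny if the antiderivatives are right
```

A check like this does not prove the formulas, but a sign error or a missing factor would show up as a large difference.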
Integration by parts is very useful. Here’s another way to use it.
Problem 17.5 (Reduction). Compute the integral $\int \cos^n x\,dx$.
Solution 17.5 If $n = 0$, the indefinite integral is $x + C$, and if $n = 1$, it is $\sin x + C$. For $n \ge 2$, first write
$$\int \cos^n x\,dx = \int \cos^{n-1}x\,\cos x\,dx$$
and then use integration by parts. Let $u' = \cos x$ and $v = \cos^{n-1}x$.
Then $u = \sin x$ and $v' = -(n-1)\cos^{n-2}x\,\sin x$. Therefore,
$$\begin{aligned}
\int \cos^n x\,dx &= \cos^{n-1}x\,\sin x + (n-1)\int \cos^{n-2}x\,\sin^2 x\,dx\\
&= \cos^{n-1}x\,\sin x + (n-1)\int \cos^{n-2}x\,(1-\cos^2 x)\,dx\\
&= \cos^{n-1}x\,\sin x + (n-1)\int \cos^{n-2}x\,dx - (n-1)\int \cos^n x\,dx.
\end{aligned}$$
Moving the last term to the left-hand side, we get
$$n\int \cos^n x\,dx = \cos^{n-1}x\,\sin x + (n-1)\int \cos^{n-2}x\,dx.$$
So, for $n \ge 2$,
$$\int \cos^n x\,dx = \frac{1}{n}\cos^{n-1}x\,\sin x + \frac{n-1}{n}\int \cos^{n-2}x\,dx,$$
which gives a reduction formula. After computing the integral for $n = 0$ and $n = 1$,
subsequent integrals can be obtained using the previous integral values. $\square$
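A reduction formula is naturally evaluated with a loop. The sketch below is an illustration added here, not the author's code. It uses the standard cosine reduction $\int\cos^n x\,dx = \frac{1}{n}\cos^{n-1}x\sin x + \frac{n-1}{n}\int\cos^{n-2}x\,dx$ on the interval $[0,\pi/2]$, where the boundary term vanishes, and compares the result with MATLAB's `integral`. The interval, the array `I`, and `nmax` are assumptions made for this example.

```matlab
nmax = 6;
I = zeros(1, nmax + 1);   % I(n+1) stores the integral of cos^n over [0, pi/2]
I(1) = pi/2;              % n = 0: integral of 1
I(2) = 1;                 % n = 1: sin(pi/2) - sin(0)
for n = 2:nmax
    % cos^(n-1)(x)*sin(x) is zero at both x = 0 and x = pi/2,
    % so only the (n-1)/n factor of the reduction formula remains.
    I(n + 1) = (n - 1)/n * I(n - 1);
end

% Compare with direct numerical quadrature for each n.
I_num = arrayfun(@(n) integral(@(x) cos(x).^n, 0, pi/2), 0:nmax);
disp(max(abs(I - I_num)))
```

Each value costs one multiplication once the two starting values are known, which is the practical point of a reduction formula.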
Lecture 18
Integration Techniques # 2
18.1 Trigonometric substitution
Trigonometric substitution is a method to integrate functions involving $\sqrt{a^2+x^2}$, $\sqrt{a^2-x^2}$, and $\sqrt{x^2-a^2}$, where $a > 0$. Let’s consider each one with reference to the figure. To handle $\sqrt{a^2+x^2}$, we utilize the following relationships provided by the first triangle in the figure:
$$\begin{cases}
x = a\tan\theta\\
dx = a\sec^2\theta\,d\theta\\
\theta = \arctan(x/a),\quad -\frac{\pi}{2} < \theta < \frac{\pi}{2}\\
\sqrt{a^2+x^2} = a\sec\theta
\end{cases}$$
Problem 18.1. Find the integral $\int \frac{1}{\sqrt{4+x^2}}\,dx$.
Solution 18.1 Substituting $a = 2$, $x = 2\tan\theta$, $dx = 2\sec^2\theta\,d\theta$, and $\sqrt{4+x^2} = 2\sec\theta$, we get
$$\int \frac{1}{\sqrt{4+x^2}}\,dx = \int \frac{2\sec^2\theta}{2\sec\theta}\,d\theta = \int \sec\theta\,d\theta,$$
which is an integral with respect to $\theta$ instead of $x$. Now using the integral of the secant function, we get
$$\int \sec\theta\,d\theta = \ln|\sec\theta + \tan\theta| + C = \ln\left|\frac{\sqrt{4+x^2}}{2} + \frac{x}{2}\right| + C. \qquad\square$$
To handle $\sqrt{a^2-x^2}$, we use the relationships provided by the second triangle in the figure:
$$\begin{cases}
x = a\sin\theta\\
dx = a\cos\theta\,d\theta\\
\theta = \arcsin(x/a),\quad -\frac{\pi}{2} \le \theta \le \frac{\pi}{2}\\
\sqrt{a^2-x^2} = a\cos\theta.
\end{cases}$$
Problem 18.2. Compute the integral $\int \frac{1}{\sqrt{4-x^2}}\,dx$.
Solution 18.2 Substituting $a = 2$, $x = 2\sin\theta$, $dx = 2\cos\theta\,d\theta$, and $\sqrt{4-x^2} = 2\cos\theta$, we get
$$\int \frac{1}{\sqrt{4-x^2}}\,dx = \int \frac{2\cos\theta}{2\cos\theta}\,d\theta = \theta + C = \arcsin(x/2) + C. \qquad\square$$
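The results of Solutions 18.1 and 18.2 can be sanity-checked numerically. This is a minimal MATLAB sketch under the assumption that the interval $[0, 1]$ is a reasonable test interval; the handle names are hypothetical and only serve the illustration.

```matlab
% Antiderivatives obtained by trigonometric substitution (constant omitted).
F_tan = @(x) log(sqrt(4 + x.^2)/2 + x/2);   % from x = 2 tan(theta)
F_sin = @(x) asin(x/2);                     % from x = 2 sin(theta)

% Corresponding integrands.
f_tan = @(x) 1./sqrt(4 + x.^2);
f_sin = @(x) 1./sqrt(4 - x.^2);

a = 0; b = 1;   % test interval; it stays inside |x| < 2 for the second case

err_tan = integral(f_tan, a, b) - (F_tan(b) - F_tan(a));
err_sin = integral(f_sin, a, b) - (F_sin(b) - F_sin(a));
fprintf('errors: %.2e  %.2e\n', err_tan, err_sin);
```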
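To handle $\sqrt{x^2-a^2}$, we use the relationships provided by the third triangle in the figure:
$$\begin{cases}
x = a\sec\theta\\
dx = a\sec\theta\tan\theta\,d\theta\\
\theta = \operatorname{arcsec}(x/a),\quad 0 \le \theta \le \pi,\ \theta \ne \frac{\pi}{2}\\
\sqrt{x^2-a^2} = a|\tan\theta|.
\end{cases}$$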
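Problem 18.3. $\int \frac{1}{\sqrt{x^2-4}}\,dx = ?$
Solution 18.3 Substituting $a = 2$, $x = 2\sec\theta$, $dx = 2\sec\theta\tan\theta\,d\theta$, and $\sqrt{x^2-4} = 2|\tan\theta|$, we get
$$\int \frac{1}{\sqrt{x^2-4}}\,dx = \int \frac{2\sec\theta\tan\theta}{2|\tan\theta|}\,d\theta.$$
For $x > 2$ we have $0 < \theta < \frac{\pi}{2}$ and $\tan\theta > 0$, so the integrand reduces to $\sec\theta$ and
$$\int \frac{1}{\sqrt{x^2-4}}\,dx = \int \sec\theta\,d\theta = \ln|\sec\theta + \tan\theta| + C = \ln\left|\frac{x}{2} + \frac{\sqrt{x^2-4}}{2}\right| + C.$$
The case $x < -2$ leads to the same formula. $\square$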
18.2 Integration of rational functions
A rational function is a function with polynomial numerator and denominator. For
example,
$$f(x) = \frac{x^4 + 2x^3 + x^2 + x + 1}{x^3 + 1}$$
is a rational function. In this section, we find integrals of such functions. First, if the
degree of the numerator is greater than or equal to the degree of the denominator, we can divide
and write as follows:
$$\frac{x^4 + 2x^3 + x^2 + x + 1}{x^3 + 1} = x + 2 + \frac{x^2 - 1}{x^3 + 1}.$$
Thus, every rational function can be expressed as a sum of a polynomial and a
rational function whose numerator’s degree is less than the denominator’s degree.
Since we know how to integrate polynomials well, we only need to find integrals of
rational functions whose numerator’s degree is less than the denominator’s degree.
That is, we want to find the integral of
$$f(x) = \frac{q(x)}{p(x)},\qquad \deg(q) < \deg(p).$$
We need to use a very important theorem in algebra.
Theorem 18.1 (Fundamental Theorem of Algebra). A polynomial $p(x)$ with real coefficients and leading coefficient 1 can be factored as follows:
$$p(x) = p_1(x)\cdots p_k(x).$$
Here, $p_i(x)$ are irreducible polynomials with real coefficients, which are either linear or quadratic, and
each has a leading coefficient of 1.
A polynomial that cannot be factored further is called an irreducible polynomial.
According to this theorem, polynomials of degree 3 or higher can always be factored, and
among quadratic polynomials, some can be factored while others cannot.
Problem 18.4. Prove that if $a^2 - 4b < 0$, then the quadratic $x^2 + ax + b$ is irreducible.
Solution 18.4 A quadratic that is not irreducible factors into two real linear factors, $x^2 + ax + b = (x-\alpha)(x-\beta)$, and then $\alpha$ and $\beta$ are real roots of $x^2 + ax + b = 0$. If the discriminant $a^2 - 4b$ is negative, the equation has no real roots, so no such factorization exists and the quadratic is irreducible. $\square$
Rewriting a quadratic in square form, we get
$$x^2 + ax + b = \left(x + \frac{a}{2}\right)^2 + A,\qquad A = -\frac{a^2 - 4b}{4}.$$
So if the quadratic is irreducible, then the value $A$ at the vertex is positive, and the
graph does not intersect the $x$-axis and has no real roots.
Now, let’s find the integrals of rational functions with irreducible denominators.
Problem 18.5 (Irreducible denominators). Verify the following integral formulas. In the
second equation, assume that the denominator is irreducible, i.e., $a^2 < 4b$.
$$\int \frac{b}{x+a}\,dx = b\ln|x+a| + C,$$
$$\int \frac{cx+d}{x^2+ax+b}\,dx = \frac{c}{2}\ln|x^2+ax+b| + \left(d - \frac{ac}{2}\right)\frac{1}{\sqrt{A}}\tan^{-1}\!\left(\frac{x+a/2}{\sqrt{A}}\right) + C,$$
where $A = \frac{4b-a^2}{4} > 0$.
Solution 18.5 The first equation is obtained by using the natural logarithm:
$$\int \frac{b}{x+a}\,dx = b\int \frac{1}{x+a}\,dx = b\ln|x+a| + C.$$
Let’s derive the second equation. Considering that the derivative of the denominator
is $(x^2+ax+b)' = 2x+a$, we rewrite the numerator as follows:
$$\begin{aligned}
\int \frac{cx+d}{x^2+ax+b}\,dx &= \int \frac{\frac{c}{2}(2x+a) + d - \frac{ac}{2}}{x^2+ax+b}\,dx\\
&= \frac{c}{2}\int \frac{2x+a}{x^2+ax+b}\,dx + \left(d - \frac{ac}{2}\right)\int \frac{1}{x^2+ax+b}\,dx.
\end{aligned}$$
The first term is obtained using substitution:
$$\frac{c}{2}\int \frac{2x+a}{x^2+ax+b}\,dx = \frac{c}{2}\ln|x^2+ax+b|.$$
For the second term, rewriting the denominator in square form:
$$\int \frac{1}{x^2+ax+b}\,dx = \int \frac{1}{(x+a/2)^2 + A}\,dx = \frac{1}{\sqrt{A}}\arctan\!\left(\frac{x+a/2}{\sqrt{A}}\right),$$
where $A = -\frac{a^2-4b}{4} > 0$. Adding these, we get the second equation in the problem. $\square$
Partial fraction
So far, we have found integrals for the cases where the denominator is linear
and where it is an irreducible quadratic. Now, we express
any rational function as a sum of these two cases using partial fractions. Then, we
can integrate all rational functions.
Partial fraction decomposition expresses a rational function as a sum of rational
functions with denominators of degree 1 or 2. Theorem 18.1 guarantees that any
rational function can be written as follows:
$$\frac{q(x)}{p(x)} = \frac{q(x)}{p_1(x)\cdots p_k(x)}.$$
Using this fact, we can perform partial fraction decomposition. First, arrange the factors
by degree so that $p_1,\dots,p_\ell$ have degree 1 and $p_{\ell+1},\dots,p_k$ have degree 2:
$$\frac{q(x)}{p_1(x)\cdots p_k(x)} = \frac{a_1}{p_1(x)} + \cdots + \frac{a_\ell}{p_\ell(x)} + \frac{a_{\ell+1}x + b_{\ell+1}}{p_{\ell+1}(x)} + \cdots + \frac{a_k x + b_k}{p_k(x)}. \tag{18.1}$$
We can find $a_i$ and $b_i$ that satisfy this equation when the factors $p_i$ are all distinct. The core of partial fractions is expressing any rational function as a sum of rational functions with denominators of
degree 1 or 2, each of which we already know how to integrate. Repeated factors, that is, $p_i = p_j$ for some
$i \ne j$, need to be treated differently. Below, we discuss how to perform partial fraction decomposition through
examples.
Problem 18.6. Decompose the following rational function into partial fractions:
$$\frac{5x+1}{x^2-2x-3}.$$
Solution 18.6 Although the denominator is a quadratic, it is not irreducible. We can
factor it as $x^2 - 2x - 3 = (x-3)(x+1)$. Then,
$$\frac{5x+1}{(x-3)(x+1)} = \frac{A}{x-3} + \frac{B}{x+1} = \frac{Ax + A + Bx - 3B}{(x-3)(x+1)} = \frac{(A+B)x + (A-3B)}{(x-3)(x+1)}.$$
Comparing coefficients gives $A + B = 5$ and $A - 3B = 1$, so $A = 4$ and $B = 1$, and thus
$$\frac{5x+1}{x^2-2x-3} = \frac{4}{x-3} + \frac{1}{x+1}. \qquad\square$$
Problem 18.7. Integrate the rational function $\frac{5x+3}{(x+1)^2}$.
Solution 18.7 First, let’s perform partial fraction decomposition. It’s easy to see
that it cannot be written in the form of (18.1). If we attempt to do so, we get
$$\frac{5x+3}{(x+1)^2} = \frac{A}{x+1} + \frac{B}{x+1} = \frac{A+B}{x+1},$$
which cannot be solved for $A$ and $B$. So, we take a different approach:
$$\frac{5x+3}{(x+1)^2} = \frac{A}{x+1} + \frac{B}{(x+1)^2} = \frac{Ax + A + B}{(x+1)^2}.$$
Comparing coefficients gives $A = 5$ and $A + B = 3$. So, $A = 5$ and $B = -2$, and thus
$$\frac{5x+3}{(x+1)^2} = \frac{5}{x+1} - \frac{2}{(x+1)^2}.$$
Now, we integrate the two terms:
$$\int \frac{5}{x+1}\,dx = 5\ln|x+1| + C,\qquad -\int \frac{2}{(x+1)^2}\,dx = \frac{2}{x+1} + C.$$
Adding them up gives the final result. $\square$
Problem 18.8. Integrate the following:
$$\frac{3x^3 + 2x^2 + 2x + 1}{(x^2+1)(x+1)^2}.$$
Solution 18.8 We need to perform partial fraction decomposition first:
$$\frac{3x^3 + 2x^2 + 2x + 1}{(x^2+1)(x+1)^2} = \frac{Ax+B}{x^2+1} + \frac{C}{x+1} + \frac{D}{(x+1)^2}.$$
We have to find $A$, $B$, $C$, and $D$ by comparing coefficients. Comparing, we get:
$$A = \frac{1}{2},\quad B = -\frac{1}{2},\quad C = \frac{5}{2},\quad D = -1.$$
Now, we integrate the three terms:
$$\int \frac{C}{x+1}\,dx = \frac{5}{2}\ln|x+1|,\qquad \int \frac{D}{(x+1)^2}\,dx = \frac{1}{x+1},$$
$$\int \frac{Ax+B}{x^2+1}\,dx = \frac{1}{4}\ln|x^2+1| - \frac{1}{2}\tan^{-1}(x).$$
Adding these up gives the final result. $\square$
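Comparing coefficients in Solution 18.8 amounts to solving a small linear system. The following MATLAB sketch is an illustration added here. Multiplying the decomposition by $(x^2+1)(x+1)^2$ and matching the coefficients of $x^3$, $x^2$, $x$, and $1$ gives four equations in $A, B, C, D$; the matrix `M` below encodes them, and its name and layout are choices made for this example.

```matlab
% (Ax+B)(x+1)^2 + C(x+1)(x^2+1) + D(x^2+1) = 3x^3 + 2x^2 + 2x + 1
% Unknowns ordered as [A; B; C; D]. One row per power of x.
M = [1 0 1 0;    % x^3:  A      + C      = 3
     2 1 1 1;    % x^2: 2A + B  + C + D  = 2
     1 2 1 0;    % x^1:  A + 2B + C      = 2
     0 1 1 1];   % x^0:       B + C + D  = 1
rhs = [3; 2; 2; 1];

coef = M \ rhs;   % solve the linear system
disp(coef.')      % A, B, C, D
```

The same approach scales to larger decompositions, where comparing coefficients by hand becomes error-prone.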
Lecture 19
Integration Techniques #3
19.1 Improper integrals
An ordinary integral is defined when the function $f$ is bounded and the integration
interval is a finite interval $[a, b]$. In this case, the integral is denoted as $\int_a^b f(x)\,dx$
and is called a proper integral. Improper integrals, on the other hand, occur in two
cases: when the function $f$ is unbounded or when the integration interval is not a
finite interval, i.e., $[a, \infty)$, $(-\infty, a]$, or $(-\infty, \infty)$.
Improper integral of type #1
Let’s consider the improper integral when the size of the integration interval
is infinite. If the function $f$ is a continuous function defined on $\mathbb{R}$, then $\int_a^b f(x)\,dx$ is
well-defined. However, when the integration interval has infinite size, the integral
is not immediately defined. In mathematics, we do not "add infinitely" or anything
like that. We only consider limits. The improper integral is given by the following
limits:
$$\begin{aligned}
\int_a^{\infty} f(x)\,dx &= \lim_{b\to\infty}\int_a^b f(x)\,dx,\\
\int_{-\infty}^{b} f(x)\,dx &= \lim_{a\to-\infty}\int_a^b f(x)\,dx,\\
\int_{-\infty}^{\infty} f(x)\,dx &= \lim_{a\to-\infty}\int_a^0 f(x)\,dx + \lim_{b\to\infty}\int_0^b f(x)\,dx.
\end{aligned}$$
Question 19.1. Is the size of the universe infinite? Some say it’s finite and expanding. If we refer to everything beyond as the universe, it wouldn’t be incorrect to call
it infinite. But what does it mean when the integration interval is infinite? Should
we be concerned about it?
Problem 19.1. Find the following improper integrals.
(1) $\int_1^{\infty} \frac{\ln x}{x^2}\,dx$   (2) $\int_{-\infty}^{\infty} \frac{dx}{x^2+1}$   (3) $\int_1^{\infty} x^p\,dx$
Solution 19.1 (1) Integration by parts is appropriate for this problem, since the derivative of $\ln x$ is the simpler function $\frac{1}{x}$.
Therefore, let’s choose $v = \ln x$ for differentiation and $u' = x^{-2}$ for integration. Then,
since $v' = x^{-1}$ and $u = -x^{-1}$, we have
$$\int_1^b u'v\,dx = -x^{-1}\ln x\,\Big|_1^b + \int_1^b x^{-2}\,dx = \left(-x^{-1}\ln x - x^{-1}\right)\Big|_1^b = -\frac{\ln b}{b} - \frac{1}{b} + 1.$$
Taking the limit as $b \to \infty$ yields 1.
(2) This problem is straightforward if one remembers the derivative of the arctangent function. It seems obvious that the answer is $\pi$ even without computation.
Let’s do it anyway.
$$\int_0^b \frac{dx}{x^2+1} = \arctan x\,\Big|_0^b = \arctan b - 0 \to \frac{\pi}{2}\quad\text{as } b\to\infty,$$
$$\int_b^0 \frac{dx}{x^2+1} = \arctan x\,\Big|_b^0 = 0 - \arctan b \to \frac{\pi}{2}\quad\text{as } b\to-\infty.$$
Adding them together yields the answer $\pi$.
(3) This problem may seem easy, but its result is important and should be
remembered. Let’s consider the integrability of the function $x^p$ at infinity. If $p \ne -1$,
then
$$\int_1^b x^p\,dx = \frac{1}{p+1}x^{p+1}\,\Big|_1^b = \frac{1}{p+1}\left(b^{p+1} - 1\right).$$
Taking the limit as $b \to \infty$, if $p > -1$, it diverges to infinity, and if $p < -1$, it
converges to $-\frac{1}{p+1}$. For $p = -1$, the integral is $\ln b$, which also diverges to infinity. $\square$
The last case of this problem is crucial and is summarized below.
$$\int_1^{\infty} x^p\,dx = \begin{cases}\infty, & p \ge -1\\[2pt] -\dfrac{1}{p+1}, & p < -1.\end{cases} \tag{19.1}$$
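The limit in (19.1) can be watched numerically by evaluating the truncated integral $\int_1^b x^p\,dx = \frac{1}{p+1}(b^{p+1}-1)$ for growing $b$. The MATLAB sketch below is an illustration only; the exponents $p = -2$ and $p = -1/2$ and the cutoffs in `b_list` are sample values chosen here, not taken from the text.

```matlab
p_list = [-2, -0.5];        % one convergent case, one divergent case
b_list = [10, 100, 1000];   % growing upper limits

V = zeros(numel(p_list), numel(b_list));
for i = 1:numel(p_list)
    p = p_list(i);
    for j = 1:numel(b_list)
        b = b_list(j);
        V(i, j) = (b^(p + 1) - 1)/(p + 1);   % truncated integral over [1, b]
    end
end
disp(V)
% Row 1 (p = -2) approaches -1/(p+1) = 1; row 2 (p = -0.5) keeps growing.
```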
As we saw when introducing the natural logarithm ln x, the boundary is p = −1.
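Problem 19.2 (Comparison). Compare the magnitudes of $\int_1^{\infty}\frac{1}{1+x^2}\,dx$ and $\int_1^{\infty}\frac{1}{x^2}\,dx$.
Solution 19.2 Consider the following comparison.
$$0 \le f(x) \le g(x)\ \text{in } [a, b] \quad\Rightarrow\quad 0 \le \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$
Therefore, $\int_1^{\infty}\frac{1}{1+x^2}\,dx \le \int_1^{\infty}\frac{1}{x^2}\,dx$. Also, $\int_1^{\infty} x^{-2}\,dx = 1$, and
$$\int_1^b \frac{dx}{x^2+1} = \arctan x\,\Big|_1^b = \arctan b - \frac{\pi}{4} \to \frac{\pi}{4}\quad\text{as } b\to\infty.$$
Since the inequality between the integrands is strict, we obtain $\frac{\pi}{4} < 1$, implying $\pi < 4$. (Not bad.) $\square$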
Improper integral of type #2
Let’s consider the improper integral when the function’s magnitude is infinite. This
case requires a bit more attention. Suppose the function $f$ is defined on the finite
interval $[a, b]$ except possibly at a point $x_0 \in (a, b)$, and becomes unbounded as $x$ approaches $x_0$. If
$f$ is bounded on $[a, c]$ for every $c < x_0$ and on $[c, b]$ for every $c > x_0$,
then the improper integral of $f$ on $[a, b]$ is defined as follows:
$$\int_a^b f(x)\,dx = \lim_{c\to x_0-}\int_a^c f(x)\,dx + \lim_{c\to x_0+}\int_c^b f(x)\,dx.$$
If both limits exist, then the improper integral exists; otherwise, it does not.
Problem 19.3. Show the following:
$$\int_0^1 x^p\,dx = \begin{cases}\dfrac{1}{p+1}, & p > -1\\[2pt] \infty, & p \le -1.\end{cases} \tag{19.2}$$
Solution 19.3 If $p \ne -1$, then for $0 < b < 1$,
$$\int_b^1 x^p\,dx = \frac{1}{p+1}x^{p+1}\,\Big|_b^1 = \frac{1}{p+1}\left(1 - b^{p+1}\right).$$
Taking the limit as $b \to 0+$, if $p < -1$, it diverges to infinity, and if $p > -1$, it converges to $\frac{1}{p+1}$. For $p = -1$, the integral equals $-\ln b$, which tends to infinity as $b \to 0+$, as we have learned from
the natural logarithm. $\square$
Like in the case of (19.1), p = −1 forms the boundary in (19.2). However, the
cases where it diverges to infinity are reversed. Only in the case of p = −1 do both
cases diverge.
Problem 19.4. Compute the following.
(1) $\int_0^2 \frac{1}{|x-1|^{1/2}}\,dx$   (2) $\int_0^2 \frac{1}{x-1}\,dx$
Solution 19.4 Both are improper integrals since the integrands diverge near $x = 1$. (1) Let’s use a change of variables. Let $z = x - 1$; then
$dx = dz$, and
$$\int_0^2 \frac{1}{|x-1|^{1/2}}\,dx = \int_{-1}^{1} |z|^{-1/2}\,dz = 2\int_0^1 z^{-1/2}\,dz = 4.$$
(2) Similarly for this case,
$$\int_0^2 \frac{1}{x-1}\,dx = \int_0^1 \frac{1}{z}\,dz + \int_{-1}^{0} \frac{1}{z}\,dz.$$
Using the variable transformation $y = -z$ for the last integral, where $dy = -dz$,
$$\int_{-1}^{0} \frac{1}{z}\,dz = -\int_1^0 \frac{1}{-y}\,dy = -\int_0^1 \frac{1}{y}\,dy.$$
Now substituting back,
$$\int_0^2 \frac{1}{x-1}\,dx = \int_0^1 \frac{1}{z}\,dz - \int_0^1 \frac{1}{y}\,dy = 0.$$
However, this should not be the answer. Although $\int_0^1 \frac{1}{z}\,dz$ and $\int_0^1 \frac{1}{y}\,dy$ are the same,
both diverge, and subtracting one from the other amounts to subtracting infinity from
infinity, which is not defined. The correct statement is that the improper integral diverges. $\square$
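The warning about "infinity minus infinity" can be made concrete. In the MATLAB sketch below (an illustration, with the cutoff `ep` chosen arbitrarily), the singular point $x = 1$ is cut out in two different ways. Since $\int_0^{1-\varepsilon}\frac{dx}{x-1} = \ln\varepsilon$ and $\int_{1+\delta}^{2}\frac{dx}{x-1} = -\ln\delta$, the sum is $\ln(\varepsilon/\delta)$, which depends on how the two cutoffs are related.

```matlab
f  = @(x) 1./(x - 1);
ep = 1e-3;   % cutoff distance from the singular point x = 1 (sample value)

% Symmetric cutoffs: delta = ep, so the sum is ln(ep/ep) = 0.
sym_cut  = integral(f, 0, 1 - ep) + integral(f, 1 + ep, 2);

% Asymmetric cutoffs: delta = 2*ep, so the sum is ln(1/2) = -ln 2.
asym_cut = integral(f, 0, 1 - ep) + integral(f, 1 + 2*ep, 2);

fprintf('symmetric: %.4f, asymmetric: %.4f\n', sym_cut, asym_cut);
```

Because the two left and right limits in the definition are taken independently, no single value can be assigned to the integral.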
19.2 Integration with software
Besides the integration methods we have learned, various other integration methods
are available in computer software for use. In this section, we will explore how to
use them through some examples.
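As a minimal sketch of what such software use might look like, assuming MATLAB and its built-in numerical quadrature function `integral` (which accepts infinite limits), the following evaluates two integrals whose exact values are known. The Gaussian integral is an example chosen here for illustration; it is not one of the problems of this lecture.

```matlab
% Improper integral over the whole real line; exact value is sqrt(pi).
I1 = integral(@(x) exp(-x.^2), -Inf, Inf);

% Quarter of the unit disk; exact value is pi/4.
I2 = integral(@(x) sqrt(1 - x.^2), 0, 1);

fprintf('I1 - sqrt(pi) = %.2e\n', I1 - sqrt(pi));
fprintf('I2 - pi/4     = %.2e\n', I2 - pi/4);
```

The integrand is passed as a function handle that must accept vector input, which is why the element-wise operator `.^` is used.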
Problem 19.5.
Solution 19.5
⊔
⊓
Problem 19.6.
Solution 19.6
⊔
⊓
Part IV
Approximation Techniques and Series
When dealing with real problems, most of the time, it’s necessary to work with
approximate values because handling true values is often not possible. For instance,
since π is an irrational number, computers cannot handle its true value. However,
for activities like flying planes and putting satellites into orbit, extremely precise
approximate values are required. When using approximate values, there are two
important factors to consider. What are they?
One is that the precision should be high for a good approximation, and the other
is that you need to know the maximum error range between the approximate value
and the true value. Let’s say there’s an approximation method with excellent convergence but the error range is unknown, and another method with poor convergence
but the error range is known. Which one would most bosses choose? Most bosses
would choose the method where the error range is known. That’s how important
knowing the error range is.
Sequences and series might seem algebraic, so why are they dealt with in calculus? It’s because calculus and analysis deal with approximation mathematics. Taylor
expansion, in particular, is a method of approximating functions using differentiation. Typically, when approximating a function, a specific function series φi is given
first, and the target function f (x) to be approximated is represented as a linear combination of the function series φi . In other words,
$$f(x) \cong \sum_{i=0}^{\infty} a_i\varphi_i(x)$$
is the form. The goal is to find the sequence ai corresponding to the given function
f (x) and to find the convergence of this constructed function series and the interval
of convergence for x, and if possible, to find the maximum error range. In Taylor
expansion, φi (x) = xi or φi (x) = (x − x0 )i is given, and the coefficients ai are found
using differentiation. Then, the error range is calculated. There are various other
approximation methods, but how the function series φi is constructed is crucial.
If a specific number is substituted for x, the right-hand side of the above equation
becomes a series. For this reason, we first understand the properties of fundamental
sequences and series before considering function series. If approximating values using data and neural networks is AI, then finding a function for approximation using
differentiation is Taylor expansion. There’s something missing in AI
compared to Taylor expansion: error estimation. If AI could provide the maximum error range for
its approximations, we could use it confidently. However, it seems impossible to theorize
about the error range of AI approximations.
Lecture 20
Numerical Integration
In the previous three lectures, various techniques of integration were learned.
Nevertheless, depending on the task at hand, one may encounter more cases where
integration is not possible using these methods than cases where it is. However,
understanding the principles and the relationship with differentiation is important.
Even if we cannot obtain exact integrals by hand in practice, we can use computers
to compute the integral values. In this process, our understanding of the integrals
and differentials we have already learned will guide us.
20.1 Numerical integration and Riemann sum
Integration by partition, that is, computing Riemann sums, is too time-consuming and tedious for humans
to do by hand, but it is well suited to numerical calculation on
computers. In this section, we will learn about these techniques. Let’s start by taking
the test function $f(x) = \sqrt{1-x^2}$ and the integration interval $[a, b] = [0, 1]$.
Then, as shown in the figure, the graph is the part in the first quadrant of the circle of radius 1
centered at the origin, and its integral is $\frac{1}{4}$ of the area of the
circle. In other words,
$$\int_0^1 \sqrt{1-x^2}\,dx = \frac{\pi}{4}.$$
Let’s compare how well the numerical techniques perform against this exact value. First, let’s decide on a partition. Let’s simply
set
$$x_0 = \frac{0}{n},\ x_1 = \frac{1}{n},\ x_2 = \frac{2}{n},\ \cdots,\ x_{n-1} = \frac{n-1}{n},\ x_n = \frac{n}{n}.$$
We divided the interval into a total of $n$ subintervals. Next, for the $i$-th subinterval $[x_{i-1}, x_i]$, we choose a point $s_i \in [x_{i-1}, x_i]$ and
compute the Riemann sum
$$R_n = \sum_{i=1}^{n} f(s_i)\triangle x_i = \sum_{i=1}^{n} \frac{1}{n}\sqrt{1-s_i^2},$$
which is a numerical integration. As $n$ increases, this value gradually
approaches the integral value. Below is the MATLAB code to calculate this:
%% parameters
n = 10;          % number of subintervals
L = 1;           % integration domain is [0, L]
%% partition
dx = L/n;        % mesh size
x = 0:dx:L;      % partition of [0, L]; the i-th subinterval is [x(i), x(i+1)]
%% Riemann sum
R = 0;
for i = 1:n
    s = x(i);                    % s is the left endpoint of the i-th subinterval
    R = R + sqrt(1 - s^2)*dx;    % add the area of the i-th rectangle
end
E = R - pi/4;    % approximation error
In many numerical computation codes like MATLAB, the index starts from 1
instead of 0, so the partition created above is $x_1, \cdots, x_{n+1}$. Here, the $i$-th subinterval
is $[x_i, x_{i+1}]$ and $x_i$ is the left endpoint.
Problem 20.1. When calculating the Riemann sum numerically, how should $s_i$ be
chosen among the points in the interval $[x_i, x_{i+1}]$?
Solution 20.1 Three methods can be considered: the left point $s_i = x_i$, the midpoint $s_i = \frac{x_i + x_{i+1}}{2}$, and the right point $s_i = x_{i+1}$. We computed the approximation error for various
numbers of subintervals and created Table 20.1. From
this table, it can be seen that the case of using the midpoint has the smallest error.
Then, can we say that using the midpoint is the best method? If the function is
not $f(x) = \sqrt{1-x^2}$ but another function, will using the midpoint still minimize the
error? What does it mean for a method of integration to be good? $\square$
From Table 20.1, the midpoint gives the smallest error, followed
by the trapezoid rule, and the left point gives the largest. The questions raised in Solution 20.1 remain:
does this ranking hold for functions other than $f(x) = \sqrt{1-x^2}$, and what does it mean
for a method of integration to be good?
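The three error columns of Table 20.1 can be reproduced with a few vectorized lines. This is a sketch in the spirit of the MATLAB listing above rather than the author's own code; the variable names `xl`, `xr`, `E_left`, `E_mid`, and `E_trap` are introduced here, and `linspace` is used so that the last node is exactly 1.

```matlab
f  = @(x) sqrt(1 - x.^2);
n  = 10;                      % number of subintervals
L  = 1;                       % integration domain is [0, L]
dx = L/n;
x  = linspace(0, L, n + 1);   % nodes x(1), ..., x(n+1)
xl = x(1:n);                  % left endpoints
xr = x(2:n+1);                % right endpoints

E_left = sum(f(xl))*dx            - pi/4;   % left-point Riemann sum
E_mid  = sum(f((xl + xr)/2))*dx   - pi/4;   % midpoint Riemann sum
E_trap = sum((f(xl) + f(xr))/2)*dx - pi/4;  % trapezoid rule

fprintf('%8.5f %8.5f %8.5f\n', E_left, E_mid, E_trap);
```

Changing `n` gives the other rows; the sign of each error shows whether the method overestimates or underestimates the area.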
Number of subintervals   Left point $s_i = x_i$   Midpoint $s_i = \frac{x_i+x_{i+1}}{2}$   Trapezoid rule
n = 5                    0.07386                  0.00760                                  -0.02614
n = 10                   0.04073                  0.00270                                  -0.00927
n = 15                   0.02828                  0.00147                                  -0.00505
n = 20                   0.02172                  0.00096                                  -0.00328
n = 25                   0.01765                  0.00069                                  -0.00235
n = 30                   0.01488                  0.00052                                  -0.00179
n = 35                   0.01286                  0.00042                                  -0.00142
n = 40                   0.01134                  0.00034                                  -0.00116
Table 20.1 Integration approximation error for $\int_0^1 \sqrt{1-x^2}\,dx$.
20.2 Convergence order
When discussing what constitutes a good numerical method, one commonly used
criterion is the convergence order. If a function is continuous, the Riemann sum converges
to the integral value as the mesh size $\triangle x$ tends to zero.
However, how quickly it converges depends on the method used, and the convergence order
measures this rate. It is typically expressed using big-oh
notation. A convergence order $k$ means that
$$\text{Approximation Error} = O(\triangle x^k)\quad\text{as } \triangle x \to 0.$$
A larger $k$ indicates faster convergence.
So, how can we compute the convergence order based on numerical results? The
best way is to take the logarithm of the approximation error. Suppose the error is a power function $F = Cy^k$ with a constant $C$, and we want
to find $k$. Taking the natural logarithm of both sides gives $\ln F = \ln C + k\ln y$. Then, if $y = y_1$ corresponds to
$F = F_1$ and $y = y_2$ corresponds to $F = F_2$:
$$\ln F_1 - \ln F_2 = k\ln y_1 - k\ln y_2 \quad\Rightarrow\quad k = \frac{\ln F_1 - \ln F_2}{\ln y_1 - \ln y_2}.$$
Therefore, if the mesh sizes are $\triangle x_1$ and $\triangle x_2$, and the errors are $E_1$ and
$E_2$ respectively, the convergence order $k$ is given by:
$$k = \frac{\ln|E_1| - \ln|E_2|}{\ln\triangle x_1 - \ln\triangle x_2}. \tag{20.1}$$
Of course, the error is not precisely given by a power function, so we should understand this as merely showing such a convergence order as $\triangle x$ decreases.
Problem 20.2. Compute the convergence order for each method using Table 20.1.
Solution 20.2 To compute the convergence order using equation (20.1) with the
given data, it is better to write a short program than to do the calculation by hand for each case.
The resulting table would be similar to Table 20.2. From this calculation, it seems
that when using the left point, the convergence order approaches 1, while
for the midpoint and the trapezoid rule, it approaches about 1.5. $\square$
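A short program of the kind suggested in Solution 20.2 might look as follows. It is a sketch, not the author's code: it applies the two-point formula $k = (\ln|E_1| - \ln|E_2|)/(\ln\triangle x_1 - \ln\triangle x_2)$ to the left-point column of Table 20.1, and the variable names are chosen here. Because the table entries are rounded to five decimals, the computed orders agree with Table 20.2 only to about three digits.

```matlab
% Left-point errors copied from Table 20.1.
n      = [5 10 15 20 25 30 35 40];
E_left = [0.07386 0.04073 0.02828 0.02172 0.01765 0.01488 0.01286 0.01134];

dx = 1 ./ n;   % mesh sizes on [0, 1]

% Convergence order between consecutive rows; abs() allows negative errors.
k_left = diff(log(abs(E_left))) ./ diff(log(dx));
disp(k_left)
```

Replacing `E_left` with the midpoint or trapezoid column gives the other columns of Table 20.2.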
Number of subintervals   Left point $s_i = x_i$   Midpoint $s_i = \frac{x_i+x_{i+1}}{2}$   Trapezoid rule
n = 5 → 10               0.8587                   1.4903                                   1.4956
n = 10 → 15              0.8996                   1.4945                                   1.4975
n = 15 → 20              0.9181                   1.4961                                   1.4982
n = 20 → 25              0.9293                   1.4970                                   1.4986
n = 25 → 30              0.9369                   1.4975                                   1.4989
n = 30 → 35              0.9426                   1.4979                                   1.4991
n = 35 → 40              0.9469                   1.4982                                   1.4992
n = 40 → 45              0.9505                   1.4984                                   1.4993
Table 20.2 Convergence order for $\int_0^1 \sqrt{1-x^2}\,dx$.
Question 20.1. When using the left point, the convergence order appears to converge to 1, which is the known value. However, the midpoint and the trapezoid rule are known to have order 2, so why does the computed order approach about 1.5?
The test function we chose, $f(x) = \sqrt{1-x^2}$, has a derivative that diverges at
$x = 1$. Graphically, the tangent line to the graph is vertical at $x = 1$. Note that even if
good integration methods are used, the convergence order is not as high as theoretically expected if the derivative of the function to be integrated is unbounded. So let’s
now integrate a function whose derivatives are all bounded by 1 in absolute value, $f(x) = \sin x$, on the
interval $[a, b] = [0, \pi/2]$. Then, the true value is as follows:
$$\int_0^{\pi/2} \sin x\,dx = -\cos x\,\Big|_0^{\pi/2} = -\cos(\pi/2) + \cos(0) = 1.$$
Table 20.3 provides the convergence order. When using the left point, it appears to
be close to 1, while when using the midpoint and the trapezoid rule, it appears to
be close to 2. We obtained results close to the known convergence order. Remember
that theoretical convergence orders hold only for functions that are sufficiently
differentiable.
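The midpoint orders for $\sin x$ can be reproduced directly. The MATLAB sketch below is an illustration with variable names chosen here: it computes the midpoint error for a few values of $n$ and then the order between consecutive values, using $k = (\ln|E_1| - \ln|E_2|)/(\ln\triangle x_1 - \ln\triangle x_2)$.

```matlab
f = @(x) sin(x);
a = 0; b = pi/2;          % exact integral is 1
n_list = [5 10 20];       % numbers of subintervals to compare

E = zeros(size(n_list));
for j = 1:numel(n_list)
    n  = n_list(j);
    dx = (b - a)/n;
    s  = a + ((1:n) - 0.5)*dx;     % midpoints of the subintervals
    E(j) = sum(f(s))*dx - 1;       % midpoint sum minus exact value
end

k = diff(log(abs(E))) ./ diff(log((b - a)./n_list));
fprintf('errors: %s\norders: %s\n', mat2str(E, 5), mat2str(k, 5));
```

The first order, between $n = 5$ and $n = 10$, should match the first midpoint entry of Table 20.3.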
Number of subintervals   Left point $s_i = x_i$   Midpoint $s_i = \frac{x_i+x_{i+1}}{2}$   Trapezoid rule
n = 5 → 10               1.0364                   2.0031                                   2.0018
n = 10 → 15              1.0211                   2.0010                                   2.0006
n = 15 → 20              1.0149                   2.0005                                   2.0003
n = 20 → 25              1.0116                   2.0003                                   2.0002
n = 25 → 30              1.0095                   2.0002                                   2.0001
n = 30 → 35              1.0080                   2.0001                                   2.0001
n = 35 → 40              1.0070                   2.0001                                   2.0001
n = 40 → 45              1.0061                   2.0001                                   2.0000
Table 20.3 Convergence order for $\int_0^{\pi/2} \sin x\,dx$.
Trapezoid Rule
Integration involves calculating the areas between the $x$-axis and the graph of a
function over intervals. In the Riemann sum, this is approximated by dividing the
intervals into smaller parts and summing up the areas corresponding to each part.
Another approximation method involves using trapezoids to approximate the area.
In essence, it averages the areas obtained from considering the left and right points
of each interval. Thus, it can be expressed as:
$$\int_a^b f(x)\,dx \cong \sum_{i=1}^{n} \frac{f(x_{i-1}) + f(x_i)}{2}\triangle x = \frac{1}{2}\left(\sum_{i=1}^{n} f(x_{i-1})\triangle x + \sum_{i=1}^{n} f(x_i)\triangle x\right).$$
So, strictly speaking, the trapezoid rule is not a type of Riemann sum. However,
it is interesting to note that it is the average of two Riemann sums, one using the
left point and the other using the right point. While the trapezoid rule averages two
Riemann sums with a convergence order of 1, its convergence order becomes 2. It
shares this convergence order with the midpoint method.
One of the advantages of the trapezoid rule is that it comes with an error estimate,
which we present without proof here.
Theorem 20.1 (Error Estimate for Trapezoid Rule). Let $f : [a, b] \to \mathbb{R}$ be twice
differentiable, and $T(\triangle x)$ be the estimate of the integral $\int_a^b f(x)\,dx$ using the trapezoid rule with uniform mesh size $\triangle x$. Then, there exists $\xi \in [a, b]$ such that
$$\int_a^b f(x)\,dx - T(\triangle x) = -\frac{(b-a)}{12}f''(\xi)|\triangle x|^2.$$
Therefore, if the function $f$ is twice differentiable and its second derivative is
bounded, the approximation error is $O(|\triangle x|^2)$ as $\triangle x \to 0$, that is, the convergence order is 2.
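The error estimate can be tried on the test integral $\int_0^{\pi/2}\sin x\,dx = 1$. The sketch below is an illustration, not part of the original text. It uses the standard composite trapezoid bound $\left|\int_a^b f\,dx - T\right| \le \frac{b-a}{12}\triangle x^2 \max|f''|$, with $\max|f''| = 1$ for $f = \sin$, and the built-in `trapz` function; `n = 10` is a sample choice.

```matlab
f = @(x) sin(x);
a = 0; b = pi/2; n = 10;
x  = linspace(a, b, n + 1);   % uniform nodes
dx = (b - a)/n;

T     = trapz(x, f(x));             % composite trapezoid estimate
err   = 1 - T;                      % exact integral minus estimate
bound = (b - a)/12 * dx^2 * 1;      % (b-a)/12 * dx^2 * max|f''|, max|sin| = 1

fprintf('error = %.6f, bound = %.6f\n', err, bound);
```

Here the error is positive because $f'' = -\sin x \le 0$ on the interval, so the trapezoids lie below the concave graph.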
20.3 Numerical integrals and Gaussian quadrature
Is choosing the midpoint in the previous Riemann sums the best way to
reduce numerical errors? Yes. However, in a Riemann sum, only one point $s_i$
is chosen in the $i$-th subinterval, but when performing numerical calculations, it is
possible to compute the area corresponding to the interval $[x_i, x_{i+1}]$ using more than
one point. The trapezoid rule can be considered a case where the two endpoints are
selected. If more than one point is chosen, what points should be chosen?
In the trapezoid rule, endpoints are used, but there is another method that achieves
higher convergence rates. It is Gauss-Legendre quadrature,
or simply Gaussian quadrature. When one point is chosen, this method chooses the midpoint.
When multiple points are chosen, the points and
coefficients, also known as weights, are traditionally given for the reference interval
$[-1, 1]$. The sum of the weights is 2, which is the size of the interval.
Table 20.4 provides these reference points and weights. You can understand how
these values are obtained by studying Legendre polynomials.
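# of points   Points used                                                                                                   Weights
1             $s_1 = 0$                                                                                                     $w_1 = 2$
2             $s_1 = -\frac{1}{\sqrt{3}},\ s_2 = \frac{1}{\sqrt{3}}$                                                        $w_1 = 1,\ w_2 = 1$
3             $s_1 = -\sqrt{\frac{3}{5}},\ s_2 = 0,\ s_3 = \sqrt{\frac{3}{5}}$                                              $w_1 = \frac{5}{9},\ w_2 = \frac{8}{9},\ w_3 = \frac{5}{9}$
4             $s_{2,3} = \pm\sqrt{\frac{3}{7} - \frac{2}{7}\sqrt{\frac{6}{5}}},\ s_{1,4} = \pm\sqrt{\frac{3}{7} + \frac{2}{7}\sqrt{\frac{6}{5}}}$   $w_{2,3} = \frac{18+\sqrt{30}}{36},\ w_{1,4} = \frac{18-\sqrt{30}}{36}$
Table 20.4 Gaussian-Legendre quadrature in interval $[-1, 1]$.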
When the integration interval changes from $[-1, 1]$ to $[x_i, x_{i+1}]$, the points and
weights above need to be shifted and rescaled. Since the interval size has
changed from 2 to $\Delta x$, the weight $w_j$ simply needs to be multiplied by $\frac{\Delta x}{2}$. The
position $s_j$ is multiplied by $\frac{\Delta x}{2}$ and then moved to the right by $\frac{x_i + x_{i+1}}{2}$. Then we
obtain Table 20.5. The number of points used can be increased, and the higher the
number of points used, the higher the convergence order.
Number of points   Points used                                                                                     Weights
1                  $s_1 = \frac{x_i+x_{i+1}}{2}$                                                                   $w_1 = \Delta x$
2                  $s_{1,2} = \frac{x_i+x_{i+1}}{2} \pm \frac{1}{\sqrt{3}}\frac{\Delta x}{2}$                       $w_{1,2} = \frac{\Delta x}{2}$
3                  $s_{1,3} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac{3}{5}}\frac{\Delta x}{2},\ s_2 = \frac{x_i+x_{i+1}}{2}$   $w_{1,3} = \frac{5}{9}\frac{\Delta x}{2},\ w_2 = \frac{8}{9}\frac{\Delta x}{2}$
4                  $s_{2,3} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac{3}{7} - \frac{2}{7}\sqrt{\frac{6}{5}}}\,\frac{\Delta x}{2}$   $w_{2,3} = \frac{18+\sqrt{30}}{36}\frac{\Delta x}{2}$
                   $s_{1,4} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac{3}{7} + \frac{2}{7}\sqrt{\frac{6}{5}}}\,\frac{\Delta x}{2}$   $w_{1,4} = \frac{18-\sqrt{30}}{36}\frac{\Delta x}{2}$
Table 20.5 Gaussian-Legendre quadrature in interval $[x_i, x_{i+1}]$.
Problem 20.3. Using the Gaussian quadrature approximation given in Table 20.5,
calculate the convergence order when the number of points used is 1, 2, and 3.
Solution 20.3 The calculated convergence order is given in Table 20.6. The 1-point
approximation is the midpoint rule, and the
2- and 3-point cases are new calculations. It can be seen that the
convergence orders are even numbers, 4 and 6, respectively. However, with the
3-point approximation, the computed order fluctuates around 6. Why is that? The magnitude
of the approximation error is compared in Table 20.7. With 3 points, the error is
already close to the round-off limit of MATLAB's double-precision arithmetic, so the
computed orders are contaminated by round-off. Increasing the number of significant
digits reduces this phenomenon. Using a higher-order approximation may be more
effective than increasing the number of intervals for integration. $\square$
Number of subintervals   1 point approx.   2 points approx.   3 points approx.
n = 5 → 10               2.0031            4.0034             6.0036
n = 10 → 15              2.0010            4.0011             6.0013
n = 15 → 20              2.0005            4.0005             6.0033
n = 20 → 25              2.0003            4.0003             5.9793
n = 25 → 30              2.0002            4.0002             6.0257
n = 30 → 35              2.0001            4.0001             6.4575
n = 35 → 40              2.0001            4.0001             5.6449
Table 20.6 Convergence order for $\int_0^{\pi/2} \sin x\,dx$.
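The 2-point rule can be implemented in a few lines. The following MATLAB sketch is an illustration, not the author's code: it applies the 2-point Gauss-Legendre rule on each subinterval, with nodes at the subinterval midpoint $\pm\frac{1}{\sqrt{3}}\frac{\Delta x}{2}$ and both weights equal to $\frac{\Delta x}{2}$, to $\int_0^{\pi/2}\sin x\,dx = 1$ with $n = 5$. The names `mid`, `off`, `G2`, and `E2` are introduced here.

```matlab
f = @(x) sin(x);
a = 0; b = pi/2; n = 5;     % exact integral is 1
dx  = (b - a)/n;
mid = a + ((1:n) - 0.5)*dx; % midpoint of each subinterval
off = (dx/2)/sqrt(3);       % node offset: (1/sqrt(3)) * (dx/2)

% Two nodes per subinterval, each with weight dx/2.
G2 = sum(f(mid - off) + f(mid + off)) * dx/2;
E2 = G2 - 1;                % approximation error

fprintf('2-point Gauss error for n = %d: %.4e\n', n, E2);
```

With only two function evaluations per subinterval, the error is already about three orders of magnitude smaller than the midpoint error for the same $n$.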
Number of subintervals   1 point approx.   2 points approx.   3 points approx.
n = 5                    4.1242e-03        -2.2619e-06        4.7849e-10
n = 10                   1.0288e-03        -1.4104e-07        7.4576e-12
n = 15                   4.5705e-04        -2.7847e-08        6.5437e-13
n = 20                   2.5707e-04        -8.8097e-09        1.1635e-13
n = 25                   1.6451e-04        -3.6082e-09        3.0642e-14
n = 30                   1.1424e-04        -9.3919e-10        1.0214e-14
n = 35                   8.3930e-05        -5.5053e-10        3.7748e-15
Table 20.7 Integration approximation error for $\int_0^{\pi/2} \sin x\,dx$.
Problem 20.4. When increasing the number of points used, the convergence order
changes, but the computational complexity also increases. Instead of increasing the
number of points, would increasing the number of subintervals be more effective?
How can we compare what is more effective? Interested students may find it beneficial to practice conducting convergence tests in cases other than those mentioned
in this lecture.
Problem 20.5. Is there a reason why the convergence order for $\int_0^1 \sqrt{1-x^2}\,dx$ is 1.5?
Will the Gaussian quadrature technique with 2 or 3 points also yield approximately
1.5? Why is it 1.5?
Lecture 21
Sequences and series
Sequences and series are fundamentally the same. Given a sequence $a_n$, we can
create its partial sum $s_n = \sum_{i=1}^{n} a_i$, which is another sequence. We call $s_n$ the sequence
of partial sums of the sequence $a_n$, or simply the series of $a_n$. Likewise,
the sequence $a_n$ can also be a series of some other sequence. If we define
$$b_1 = a_1,\ b_2 = a_2 - a_1,\ b_3 = a_3 - a_2,\ \cdots,\ b_n = a_n - a_{n-1},\ \cdots,$$
then $a_n$ becomes the series of the sequence $b_n$. Given a series, the purpose of this
lecture is to find the sequence that generates the series and use this information to
find limits of the series. (Sometimes partial sums $s_n$ are also referred to as series.)
21.1 Sequence of real numbers
A collection of ordered numbers is called a sequence. We consider sequences composed of real numbers. The order is indicated by attaching an index. For example,
a1 , a2 , a3 , a4 , · · · .
Usually, we start indexing from 0 or 1, but it is not necessary. Depending on the
situation, we can choose whichever is convenient. Unless stated otherwise, indices
are considered as natural numbers $i \in \mathbb{N} = \{0, 1, 2, \ldots\}$.¹ If a sequence $a_n$ is given
by a general formula like $a_n = (-1)^{n-1}$ for $n = 1, 2, \cdots$, we can list the sequence as
$\{1, -1, 1, -1, \cdots\}$ in order.
Problem 21.1. Given the sequence $\{a_n\} = \{1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \cdots\}$, find the general formula for $a_n$.
¹ Some include 0 in the natural numbers, and some don’t. It’s a matter of convenience; we include it
here.
Solution 21.1 The general formula for this sequence is $a_n = \dfrac{(-1)^{n+1}}{n}$. □
Convergence and limits of sequences are handled more simply and easily compared to functions. When discussing limits, it’s important to remember that the values of the initial terms of a sequence are essentially irrelevant; it’s the terms that
come later that matter. The limit of a sequence is defined as follows.
Definition 21.1. (1) We say a sequence $a_n$ converges to a number $L \in \mathbb{R}$ as $n \to \infty$ if
for any given $\varepsilon > 0$, there exists $N$, which may depend on $\varepsilon$, such that $|a_n - L| < \varepsilon$
whenever $n \ge N$. We call $L$ the limit of $a_n$ as $n \to \infty$ and denote $\lim_{n\to\infty} a_n = L$. (2) If
there is no such number $L$, we say $a_n$ diverges as $n \to \infty$. (3) If for any $M \in \mathbb{R}$, there
exists $N > 0$ such that $a_n > M$ whenever $n \ge N$, we say $a_n$ diverges to infinity. (4)
If for any $M \in \mathbb{R}$, there exists $N > 0$ such that $a_n < M$ whenever $n \ge N$, we say $a_n$
diverges to negative infinity.
Saying a sequence diverges means only that it does not converge: it may grow to infinity,
fall to negative infinity, or do neither, as an oscillating sequence does. When defining the
continuity of a function, we used the ε-δ method. The definition here is essentially the same.
We’ve just replaced δ with N. Therefore, while it may seem somewhat familiar, it’s worth
reconsidering its meaning.
Problem 21.2. Prove the following.
(1) $\lim_{n\to\infty} \frac{1}{n} = 0$. (2) $\lim_{n\to\infty} k = k$. (3) $a_n = (-1)^n$ diverges.
Solution 21.2 To prove these, one first needs to understand the situation; then one needs
practice in writing clearly. Through the process of writing, thoughts
become clearer and more organized. Also, through efforts to write clearly, new ways
of expression can be discovered. Thus, mathematics has been used as a means to
learn logical expression.
(1) Let $\varepsilon > 0$ be given (or assume it is given). Let $N$ be an integer greater than $1/\varepsilon$.
Then, for all $n > N$, the following holds:
\[ \left| \frac{1}{n} - 0 \right| = \frac{1}{n} < \frac{1}{N} < \varepsilon. \]
We have found N satisfying the properties required by the definition, so the proof is
complete.
(2) This problem refers to the case where the sequence is given as an = k. Since
it is given regardless of the index n, k simply represents a constant. Therefore, for
any given ε > 0, we can choose N to be any integer greater than or equal to 1. Then,
|an − k| = |k − k| = 0 < ε for any n > N. We have found N satisfying the properties
required by the definition, so the proof is complete.
(3) Saying that it diverges means there is no limit $L$ to which it converges. Generally, it is
more difficult to show that something does not exist than to show that it does. Let’s
assume there is some $L$ it converges to. Then, for $\varepsilon = 0.5$, there exists $N$ such that
for all $n > N$, $|a_n - L| < 0.5$ must hold. However, no matter how large $N$ is chosen,
there are $n > N$ such that $a_n = 1$ and $m > N$ such that $a_m = -1$. Thus, we can create
the following contradiction:
\[ 2 = |1 - (-1)| = |1 - L + L - (-1)| \le |a_n - L| + |L - a_m| < 0.5 + 0.5 = 1. \]
A contradiction arises with $2 < 1$, which was derived from the assumption that there
exists a limit $L$. Therefore, the given sequence does not converge, i.e., it diverges. □
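The choice of $N$ in part (1) can be tried out numerically. The sketch below is only an illustration, not a proof: it checks finitely many indices, and the helper names `find_N` and `check` are hypothetical.

```python
import math

def find_N(eps):
    """Return an integer N greater than 1/eps, as chosen in the proof of lim 1/n = 0."""
    return math.floor(1 / eps) + 1

def check(eps, how_many=1000):
    """Check |1/n - 0| < eps for the first `how_many` indices n > N."""
    N = find_N(eps)
    return all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1 + how_many))

if __name__ == "__main__":
    for eps in (0.5, 0.1, 1e-3):
        print(eps, find_N(eps), check(eps))
```

A smaller $\varepsilon$ forces a larger $N$, which is what "N may depend on ε" means in Definition 21.1.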
Problem 21.3. Let $\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$. Prove the following.
(1) $\lim_{n\to\infty} (a_n + b_n) = A + B$.
(2) $\lim_{n\to\infty} (a_n - b_n) = A - B$.
(3) $\lim_{n\to\infty} (k a_n) = kA$.
(4) $\lim_{n\to\infty} (a_n b_n) = AB$.
(5) $\lim_{n\to\infty} (a_n / b_n) = A/B$ if $B \ne 0$.
Solution 21.3 The proofs for the limits of sequences and functions are essentially
the same.
(1) Let $\varepsilon > 0$ be given. Then, since $\lim_{n\to\infty} a_n = A$, there exists $N_1 > 0$ such that
$|a_n - A| < \varepsilon/2$ whenever $n > N_1$. Similarly, since $\lim_{n\to\infty} b_n = B$, there exists $N_2 > 0$
such that $|b_n - B| < \varepsilon/2$ whenever $n > N_2$. Let $N = \max(N_1, N_2)$. Then,
\[ |(a_n + b_n) - (A + B)| = |a_n - A + b_n - B| \le |a_n - A| + |b_n - B| < 0.5\varepsilon + 0.5\varepsilon = \varepsilon \]
whenever $n > N$. Therefore, $\lim_{n\to\infty} (a_n + b_n) = A + B$.
(5) In this case, the condition $B \ne 0$ is additionally required. However, if one tries
too hard to prove this, it may become analysis rather than calculus. Nevertheless,
students interested in mathematics are encouraged to try. □
Problem 21.4 (Sandwich Theorem). Let $a_n \le b_n \le c_n$ and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = L$.
Show that $\lim_{n\to\infty} b_n = L$.
If an and cn tend to the same limit L, then any term bn squeezed between them
also tends to L.
Problem 21.5 (Continuity). Let $\lim_{n\to\infty} a_n = L$ and $f(x)$ be continuous at $L$. Show that
$\lim_{n\to\infty} f(a_n) = f(L)$.
It’s good to remember that if a function is continuous at the limit of a sequence, then the
limit can be moved inside the function.
Problem 21.6 (Useful limits to remember).
(1) $\lim_{n\to\infty} x^{1/n} = 1$ if $x > 0$.
(2) $\lim_{n\to\infty} n^{1/n} = 1$.
(3) $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x$.
(4) $\lim_{n\to\infty} \frac{x^n}{n!} = 0$.
(5) $\lim_{n\to\infty} \left(\frac{n+1}{n-1}\right)^n = e^2$.
Solution 21.6 Let’s compute these limits. We skip (1) since it can be done similarly
to (2).
(2) Take the natural logarithm and then exponentiate. (Using $e^{\ln(n^{1/n})} = n^{1/n}$.) We
have
\[ \ln n^{1/n} = \frac{\ln n}{n} \to 0 \quad \text{as } n \to \infty. \]
Thus, using the continuity of $e^x$, we get
\[ \lim_{n\to\infty} n^{1/n} = \lim_{n\to\infty} e^{\ln n^{1/n}} = e^{\left(\lim_{n\to\infty} \ln n^{1/n}\right)} = e^0 = 1. \]
(Something invisible becomes visible when you take logarithm and then exponentiate. Why does this happen? What is going on? Have we gained something profound?)
(3) Similarly, take logarithm, compute the limit, and then take the exponential.
Let $h = 1/n$, then
\[ \lim_{h\to 0} \ln (1 + hx)^{1/h} = \lim_{h\to 0} \frac{1}{h} \ln(1 + hx) = \lim_{h\to 0} \frac{\ln(1 + hx)}{h} = \lim_{h\to 0} \frac{\frac{x}{1+hx}}{1} = x. \]
We used L’Hôpital’s rule in the last step. Then, exponentiate.
(4) Intuitively clear. The denominator grows much faster. For a proof, let’s take
natural logarithm. Then,
\[ \ln \frac{x^n}{n!} = n \ln x - \ln(n!). \]
This approach doesn’t work well. Divide numerator and denominator by $x^n$ and take
the limit. Then,
\[ \lim_{n\to\infty} \frac{x^n}{n!} = \lim_{n\to\infty} \frac{1}{\frac{1}{x} \cdot \frac{2}{x} \cdots \frac{n-1}{x} \cdot \frac{n}{x}} = 0, \]
because the absolute value of the denominator tends to infinity.
(5) Rewrite the expression so that (3) can be used:
\[ \lim_{n\to\infty} \left(\frac{n+1}{n-1}\right)^n = \lim_{n\to\infty} \left(1 + \frac{2}{n-1}\right)^{n-1} \left(1 + \frac{2}{n-1}\right) = e^2 \cdot 1. \]
□
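The five limits of Problem 21.6 can also be observed numerically. The following Python sketch evaluates each expression at one large index; the function name `sample` and the choice $x = 2$ are assumptions made only for this illustration, and evaluating at a single large $n$ suggests the limit but does not prove it.

```python
import math

def sample(n, x=2.0):
    """Evaluate the five expressions of Problem 21.6 at index n (requires n >= 2, x > 0)."""
    return {
        "x^(1/n)": x ** (1 / n),
        "n^(1/n)": n ** (1 / n),
        "(1+x/n)^n": (1 + x / n) ** n,
        # x^n / n! computed through logarithms to avoid overflow; lgamma(n + 1) = ln(n!)
        "x^n/n!": math.exp(n * math.log(x) - math.lgamma(n + 1)),
        "((n+1)/(n-1))^n": ((n + 1) / (n - 1)) ** n,
    }

if __name__ == "__main__":
    for name, value in sample(10**6).items():
        print(f"{name:>16}: {value:.8f}")
```

Note that expression (4) is computed through its logarithm, the same trick of "take the logarithm, then exponentiate" used in the solution.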
Now, let’s familiarize ourselves with the concepts of upper bound, supremum,
and limit supremum.
Definition 21.2. (1) A sequence $a_n$ is called bounded above if there exists $M$ such
that $a_n \le M$ for all $n$. Such $M$ is called an upper bound of $a_n$. (2) The smallest
upper bound of $a_n$ is called the supremum of $a_n$ and denoted by $M = \sup a_n$. (3)
Let $M_k$ be the supremum of $\{a_n : n \ge k\}$. Then, the limit $\lim_{k\to\infty} M_k$ is called the limit
supremum of $a_n$ and is denoted by $M = \limsup a_n$.
We often focus on limits, and when examining the limit of a sequence, a finite
number of initial values does not matter. Similarly, when considering supremum,
we might want to discard a finite number of initial values and consider the limit
supremum.
Through the following problem, we compare the maximum of a sequence with its supremum.
Problem 21.7. The largest value among the values of a sequence $a_n$ is called the
maximum, denoted by $\max a_n$. The smallest value is called the minimum, denoted
by $\min a_n$. (1) Find the maximum of the sequence $a_n = 9.8n - n^2$. (2) Find the maximum of the sequence $a_n = -\frac{1}{n}$.
Solution 21.7 (1) Consider the sequence $a_n$ as a function, let’s say $f(x) = 9.8x - x^2$.
Taking the derivative, we get $f'(x) = 9.8 - 2x$. Thus, the maximum of $f$ occurs at $x = 4.9$.
But $n$ needs to be an integer, and since the graph of $f$ is a parabola symmetric about
$x = 4.9$, the sequence $a_n$ achieves its maximum value of 24 at the nearest integer, $n = 5$.
(2) As $n$ increases, $a_n$ increases. It approaches zero but never attains zero. Therefore,
there is no maximum value for $a_n$. While there is no maximum value, 0
acts as an upper bound, and it is the smallest one. Hence, the supremum is 0. A maximum may
or may not exist, but the supremum of a sequence that is bounded above always exists.
This is why we define the supremum instead of the maximum. □
Now, let’s understand the concepts of lower bound, infimum, and limit infimum.
Definition 21.3. (1) A sequence $a_n$ is called bounded below if there exists $M$ such
that $a_n \ge M$ for all $n$. Such $M$ is called a lower bound of $a_n$. (2) The largest lower
bound of $a_n$ is called the infimum of $a_n$ and denoted by $M = \inf a_n$. (3) Let $M_k$ be the
infimum of $\{a_n : n \ge k\}$. Then, the limit $\lim_{k\to\infty} M_k$ is called the limit infimum of
$a_n$ and is denoted by $M = \liminf a_n$.
Among sequences, monotonic sequences are the most manageable. They either
continuously increase or decrease.
Definition 21.4. Let an be a sequence of real numbers. (1) It is called an increasing
sequence if an ≤ an+1 for all n. (2) It is called a decreasing sequence if an ≥ an+1
for all n. (3) It is called monotone if it is one of the two cases.
A sequence that is bounded above and increasing always converges.
Problem 21.8 (Bounded monotone sequence). If an is bounded above and increasing, it converges.
Solution 21.8 The first step is to guess the limit. Since $a_n$ is bounded above, its supremum
exists; let $L = \sup a_n$ and show that the sequence converges to $L$. Given $\varepsilon > 0$, we need
to find $N$. Firstly, note that $a_n \le L$ for all indices $n$; otherwise $L$ would not be an upper
bound. Since $L - \varepsilon$ is not an upper bound, there exists a term of the sequence between
$L - \varepsilon$ and $L$. Let’s denote its index by $N$. If $n > N$, since $a_n$ is increasing,
$L - \varepsilon < a_N \le a_n \le L$. Therefore, for all $n > N$, $|a_n - L| < \varepsilon$. □
The proof above is very basic and simple, but it might seem difficult if you’re
not accustomed to the logical progression of such statements. Once you get used to
expressing yourself mathematically or logically, it becomes easy thereafter.
21.2 Series of real numbers
Given a sequence $a_n$, we can create a new sequence using the partial sums of this
sequence:
\[ s_n = \sum_{i=1}^{n} a_i. \]
Such a sequence created by partial sums is called a series, and its limit is represented
as follows:
\[ \lim_{n\to\infty} s_n = \sum_{n=1}^{\infty} a_n. \]
It is important to note that $\sum_{n=1}^{\infty} a_n$ does not mean adding an infinite number of $a_n$.
We never add an infinite number, nor can we. We only create a sequence using
partial sums $s_n$, find the limit of this sequence, and denote it as $\sum_{n=1}^{\infty} a_n$. We refer to
the sequence thus obtained or its limit as a series.
Problem 21.9. Show that if the series $\sum_{k=1}^{\infty} a_k$ converges, then $\lim_{k\to\infty} a_k = 0$.
Solution 21.9 Let’s denote the limit as $\sum_{k=1}^{\infty} a_k = L$. Then, $s_k - s_{k-1} = a_k$, so
\[ \lim_{k\to\infty} a_k = \lim_{k\to\infty} (s_k - s_{k-1}) = \lim_{k\to\infty} s_k - \lim_{k\to\infty} s_{k-1} = L - L = 0. \]
Hence, the sequence $a_k$ converges to 0. □
Logically, the contrapositive of the above result states that
if the sequence $a_k$ does not converge to 0, then the series $\sum_{k=1}^{\infty} a_k$ diverges. There
are two possible cases to consider: either $a_k$ converges to a non-zero value, or it
diverges.
Problem 21.10. Show that the series $\sum_{n=1}^{\infty} \frac{n+1}{n}$ diverges.
Solution 21.10 Since $\lim_{n\to\infty} \frac{n+1}{n} = 1 \ne 0$, the series diverges. □
However, just because $\lim_{k\to\infty} a_k = 0$ does not mean that the series $\sum_{k=1}^{\infty} a_k$ converges.
Problem 21.11. Show that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
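Solution 21.11 In this case, $\lim_{n\to\infty} a_n = \lim_{n\to\infty} \frac{1}{n} = 0$, but the series diverges. Specifically,
\[
\begin{aligned}
& 1 + \tfrac{1}{2} + \left(\tfrac{1}{3} + \tfrac{1}{4}\right) + \left(\tfrac{1}{5} + \tfrac{1}{6} + \tfrac{1}{7} + \tfrac{1}{8}\right) + \left(\tfrac{1}{9} + \tfrac{1}{10} + \cdots + \tfrac{1}{16}\right) + \cdots \\
&\ge 1 + \tfrac{1}{2} + \left(\tfrac{1}{4} + \tfrac{1}{4}\right) + \left(\tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8}\right) + \left(\tfrac{1}{16} + \tfrac{1}{16} + \cdots + \tfrac{1}{16}\right) + \cdots \\
&= 1 + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + \cdots
\end{aligned}
\]
In this calculation, what we are trying to show is that we can insert $\frac{1}{2}$ as many times
as we want. We can have 4 $\frac{1}{8}$s, 8 $\frac{1}{16}$s, 16 $\frac{1}{32}$s, and so on, infinitely. Therefore, it
diverges to infinity. □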
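The grouping argument says that the partial sum up to $n = 2^k$ is at least $1 + k/2$. A minimal Python sketch of this check, using exact rational arithmetic, is given below; the function names `harmonic` and `grouping_bound` are hypothetical.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def grouping_bound(k):
    """Lower bound 1 + k/2 for the partial sum with 2**k terms, from the grouping argument."""
    return 1 + Fraction(k, 2)

if __name__ == "__main__":
    for k in range(0, 11):
        n = 2**k
        print(n, float(harmonic(n)), float(grouping_bound(k)))
```

The partial sums grow without bound but very slowly: doubling the number of terms adds at least $\frac{1}{2}$, so reaching a large value requires exponentially many terms.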
The fact that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges is quite important and will be used frequently in the future. This case is a very important one at the boundary between
convergence and divergence.
Question 21.1. If $\alpha < 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^\alpha}$ diverges. Why is that? On the other
hand, if $\alpha > 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^\alpha}$ converges. Why is that? The former is obvious,
but the latter is not.
21.3 Power series
In the future, we will mainly deal with series called power series, which are given
in the following form:
\[ \sum_{n=0}^{\infty} c_n (x - x_0)^n. \]
Here, $x_0$ is called the center, and $c_n$ is the coefficient of the $n$-th term. Also, if we
consider $x$ as the variable, then a power series becomes a function that resembles
an infinite-degree polynomial in $x$. When calculating partial sums of power series,
we start indexing from 0 instead of 1. This is because it is convenient to denote the
constant term by $n = 0$. By translating along the $x$-axis, it is sufficient to consider
the case $x_0 = 0$. The convergence of the above power series depends on the
coefficients $c_n$ and the magnitude $|x - x_0|$ of the variable. It is of primary interest to
find the value to which the series converges and the region in which it converges.
In particular, if all coefficients are the same, $c_n = c_0$, we call it a geometric series.
Problem 21.12 (Geometric series). A sequence given by $a_n = c_0 x^n$, $n = 0, 1, \cdots$, is
called a geometric sequence. Prove the following:
\[ \sum_{n=0}^{\infty} c_0 x^n = \frac{c_0}{1 - x} \quad \text{if } |x| < 1, \]
and if $|x| \ge 1$ and $c_0 \ne 0$, then the geometric series $\sum_{n=0}^{\infty} c_0 x^n$ diverges.
Solution 21.12 Let’s denote the partial sum as $s_n = \sum_{i=0}^{n} c_0 x^i$. Then,
\[ s_n - x s_n = \sum_{i=0}^{n} c_0 x^i - \sum_{i=0}^{n} c_0 x^{i+1} = \sum_{i=0}^{n} c_0 x^i - \sum_{i=1}^{n+1} c_0 x^i = c_0 - c_0 x^{n+1}. \]
The last equality holds because all intermediate terms cancel out, leaving only the
first and the last term. Therefore, for $|x| < 1$,
\[ \lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{c_0 (1 - x^{n+1})}{1 - x} = \frac{c_0}{1 - x}. \]
And for $|x| \ge 1$ with $c_0 \ne 0$, $a_n = c_0 x^n$ does not converge to 0. Therefore, the series
$\sum_{n=0}^{\infty} c_0 x^n$ diverges. □
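The identity $s_n - x s_n = c_0 - c_0 x^{n+1}$ gives a closed form for every partial sum, not only for the limit. The following Python sketch compares the direct sum with the closed form using exact fractions; the function names `partial_sum` and `closed_form` are hypothetical.

```python
from fractions import Fraction

def partial_sum(c0, x, n):
    """Direct partial sum s_n = c0 + c0*x + ... + c0*x**n."""
    return sum(c0 * x**i for i in range(n + 1))

def closed_form(c0, x, n):
    """Closed form c0 * (1 - x**(n+1)) / (1 - x), valid for x != 1."""
    return c0 * (1 - x**(n + 1)) / (1 - x)

if __name__ == "__main__":
    c0, x = Fraction(3), Fraction(1, 2)
    for n in (1, 5, 10, 20):
        # The gap to the limit c0 / (1 - x) equals c0 * x**(n+1) / (1 - x).
        print(n, partial_sum(c0, x, n), float(c0 / (1 - x) - partial_sum(c0, x, n)))
```

The closed form holds for any $x \ne 1$, including $|x| \ge 1$; only the passage to the limit requires $|x| < 1$.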
Problem 21.13. Find the limit of the following series.
(1) $\frac{1}{9} + \frac{1}{27} + \frac{1}{81} + \cdots$. (2) $\sum_{n=0}^{\infty} \frac{(-1)^n 4}{4^n}$. (3) $5.232323232323\cdots$. (4) $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$.
Solution 21.13 (1) In this case, the initial value is $a_0 = \frac{1}{9}$ and the common ratio is
$r = \frac{1}{3}$. Therefore, the limit of the series is:
\[ \sum_{k=0}^{\infty} a_k = \frac{a_0}{1 - r} = \frac{1}{9\left(1 - \frac{1}{3}\right)} = \frac{1}{9} \cdot \frac{3}{2} = \frac{1}{6}. \]
(2) The initial value is $a_0 = 4$ and the common ratio is $r = -\frac{1}{4}$. Therefore, the limit
of the series is:
\[ \sum_{n=0}^{\infty} \frac{(-1)^n 4}{4^n} = \frac{4}{1 - \left(-\frac{1}{4}\right)} = \frac{16}{5}. \]
(3) We interpret the repeating decimal as a series. Rewriting this repeating decimal
as a series:
\[ 5 + 0.23 + 0.0023 + 0.000023 + \cdots = 5 + 0.23 + 0.23 \times 0.01 + 0.23 \times (0.01)^2 + \cdots. \]
The 5 at the beginning is considered separately, so the initial term is $a_0 = 0.23$ and
the common ratio is $r = 0.01$ in a geometric series. Therefore, its limit is:
\[ 5 + \frac{0.23}{1 - 0.01} = 5 + \frac{0.23}{0.99} = 5 + \frac{23}{99}. \]
(4) This is not a geometric series. We can think of it as a power series with $x = 1$
and $c_n = \frac{1}{n(n+1)}$, but it seems like saying all series can be seen as power series. Let’s
just consider it as a general series starting from $n = 1$. The partial sum is:
\[ s_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left( \frac{1}{k} - \frac{1}{k+1} \right) = 1 - \frac{1}{n+1}. \]
(Everything in the middle cancelled out.) Therefore,
\[ \sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \lim_{n\to\infty} s_n = \lim_{n\to\infty} \left( 1 - \frac{1}{n+1} \right) = 1. \]
□
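The four answers of Problem 21.13 can be double-checked with exact rational arithmetic. The sketch below is one possible check in Python; the helper names `geometric_limit` and `telescoping_partial_sum` are hypothetical.

```python
from fractions import Fraction

def geometric_limit(a0, r):
    """Limit a0 / (1 - r) of a geometric series with first term a0 and ratio |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the formula a0 / (1 - r) requires |r| < 1")
    return a0 / (1 - r)

def telescoping_partial_sum(n):
    """Exact partial sum of 1/(k(k+1)) for k = 1..n."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

ans1 = geometric_limit(Fraction(1, 9), Fraction(1, 3))             # series (1)
ans2 = geometric_limit(Fraction(4), Fraction(-1, 4))               # series (2)
ans3 = 5 + geometric_limit(Fraction(23, 100), Fraction(1, 100))    # decimal (3)

if __name__ == "__main__":
    print(ans1, ans2, ans3)
    print([telescoping_partial_sum(n) for n in (1, 2, 3, 10)])
```

For (4) the code only evaluates partial sums; the limit 1 still comes from the formula $s_n = 1 - \frac{1}{n+1}$.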
We have determined the convergence of geometric series, but we have not yet determined the convergence of general power series. For that, we will need several convergence tests, which are introduced in the next lecture.
Exercises
1.
Lecture 22
Tests for absolute convergence
We learn techniques to determine the convergence of series. An absolute convergence
test checks whether the series $\sum_{n=1}^{\infty} |a_n|$ converges. If it does, then the series
$\sum_{n=1}^{\infty} a_n$, without absolute values, automatically converges.
22.1 Integral test
A sequence of partial sums sn for a sequence with non-negative terms an ≥ 0 forms
an increasing sequence.
Problem 22.1. If $a_n \ge 0$, then the convergence of the series $\sum_{n=1}^{\infty} a_n$ is equivalent to
the existence of the supremum of $s_n$.
Solution 22.1 First, since $s_{n+1} - s_n = a_{n+1} \ge 0$, we have $s_{n+1} \ge s_n$. Thus, $s_n$ is
an increasing sequence. We already know that the convergence of an increasing
sequence is equivalent to the existence of its supremum (bounded above). □
Using integrals to determine the convergence of series is called the integral test.
Problem 22.2 (Integral test). Three conditions are necessary: (1) $a_n \ge 0$, (2) $a_n =
f(n) \ge 0$ for $n = 1, 2, \cdots$, (3) $f(x)$ is a monotonically decreasing function. Then,
the convergence of the series $\sum_{n=1}^{\infty} a_n$ is equivalent to the convergence of the integral
$\int_1^{\infty} f(x)\,dx$.
Let’s consider the meaning of this integral test before proving it. There are three
main conditions. All three conditions are necessary, and we should observe how
they are used in the proof below. Moreover, we can construct counterexamples if
any of these conditions are not satisfied.
Solution 22.2 Let’s mark where the three conditions are used in the following proof.
First,
\[ \int_1^{n+1} f(x)\,dx = \sum_{k=1}^{n} \int_k^{k+1} f(x)\,dx \le \sum_{k=1}^{n} f(k) = \sum_{k=1}^{n} a_k = s_n, \]
\[ \int_1^{n+1} f(x)\,dx = \sum_{k=1}^{n} \int_k^{k+1} f(x)\,dx \ge \sum_{k=1}^{n} f(k+1) = \sum_{k=1}^{n} a_{k+1} = s_{n+1} - a_1. \]
Therefore,
\[ s_{n+1} - a_1 \le \int_1^{n+1} f(x)\,dx \le s_n \]
holds.
At this step, we use that $f$ is a decreasing function: on $[k, k+1]$ we have $f(k+1) \le f(x) \le f(k)$.
If $s_n$ converges, then $\int_1^{n+1} f(x)\,dx$ is bounded. Since $f$ is monotonically decreasing and
$f(n) \ge 0$ for all $n$, $f(x) \ge 0$. Therefore, $\int_1^{n+1} f(x)\,dx$ increases as $n$ increases. Thus,
$\int_1^{n+1} f(x)\,dx$ converges as $n \to \infty$. Conversely, if $\int_1^{\infty} f(x)\,dx$ converges, then $s_{n+1}$ is
less than $\int_1^{\infty} f(x)\,dx + a_1$. Therefore, $s_n$ is a bounded increasing sequence, and thus it converges.
□
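The two-sided estimate $s_{n+1} - a_1 \le \int_1^{n+1} f(x)\,dx \le s_n$ can be checked on a concrete example. The sketch below assumes $f(x) = 1/x^2$, for which $\int_1^{m} f(x)\,dx = 1 - 1/m$; this choice of $f$ and the function names are assumptions made only for illustration.

```python
def s(n):
    """Partial sum of a_k = 1/k**2 for k = 1..n."""
    return sum(1.0 / k**2 for k in range(1, n + 1))

def integral_1_to(m):
    """Exact value of the integral of 1/x**2 from 1 to m, namely 1 - 1/m."""
    return 1.0 - 1.0 / m

def sandwich_holds(n):
    """Check s_{n+1} - a_1 <= integral from 1 to n+1 <= s_n, with a_1 = 1."""
    a1 = 1.0
    return s(n + 1) - a1 <= integral_1_to(n + 1) <= s(n)

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, s(n + 1) - 1.0, integral_1_to(n + 1), s(n), sandwich_holds(n))
```

Because the integral stays below 1, the left inequality shows that every partial sum stays below $1 + a_1 = 2$, which is the boundedness used in the proof.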
The series $\sum_{n=1}^{\infty} \frac{1}{n}$ is known to diverge. So, what about the series $\sum_{n=1}^{\infty} \frac{1}{n^\alpha}$? If $\alpha < 1$,
then each term will be larger than when $\alpha = 1$, so the series will certainly diverge.
If $\alpha > 1$, then each term will be smaller, so there is a possibility that the series
converges. In the following problem, we show that the series converges for $\alpha > 1$.
This means that $\alpha = 1$ serves as the boundary for convergence. We saw a similar
phenomenon with integrals, and indeed, the reason behind it is the same. The proof
also utilizes this fact.
Problem 22.3. (1) Show that for all $\alpha > 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^\alpha}$ converges. (2) Show
that for all $\alpha \le 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^\alpha}$ diverges.
Solution 22.3 (1) Let $f(x) = \frac{1}{x^\alpha} = x^{-\alpha}$. Then $f$ is a positive, decreasing function
for $x \ge 1$, and $f(n) = \frac{1}{n^\alpha}$. Therefore, we can apply the integral test. As $n \to \infty$, we
have
\[ \int_1^n f(x)\,dx = \int_1^n x^{-\alpha}\,dx = \left[ \frac{1}{1-\alpha} x^{1-\alpha} \right]_1^n = \frac{1}{1-\alpha} n^{1-\alpha} + \frac{1}{\alpha - 1} \to \frac{1}{\alpha - 1}. \]
From this, we can conclude that the series converges for all $\alpha > 1$.
(2) We have already shown that the series diverges for $\alpha = 1$. For $\alpha < 1$, since
each term is larger than when $\alpha = 1$, the series diverges even faster (this is known
as the comparison test). □
Problem 22.4. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{1}{n^2 + 1}$. (2) $\sum_{n=1}^{\infty} n e^{-n^2}$. (3) $\sum_{n=1}^{\infty} \frac{1}{2^{\ln n}}$.
Solution 22.4 We use the integral test. First, we check if the three conditions are
satisfied. □
22.2 Comparison test
If we know the convergence of one series, we can often determine the convergence
of a smaller or larger series.
Problem 22.5 (Comparison test). Let $0 \le a_n \le b_n$. If the larger series $\sum_{n=1}^{\infty} b_n$ converges,
then the smaller series $\sum_{n=1}^{\infty} a_n$ also converges. If the smaller series $\sum_{n=1}^{\infty} a_n$
diverges, then the larger series $\sum_{n=1}^{\infty} b_n$ also diverges.
The convergence does not depend on a finite number of terms, no matter how
large or small they are initially. What matters is the behavior as the index increases.
In the comparison test, comparison is only needed for sufficiently large indices.
That is, 0 ≤ an ≤ bn needs to hold only for sufficiently large n.
Solution 22.5 The proof is simple. If the terms are positive, then the partial sums
form an increasing sequence. If the larger partial sum sequence converges, it is
bounded above, and therefore, the smaller partial sum sequence is also bounded
above. Conversely, if the smaller partial sum sequence is unbounded, then the larger
partial sum sequence is also unbounded. This logic allows us to answer rigorously. □
Question 22.1. One thing to be careful of in the above conditions is that the comparison cannot be applied to sequences where the signs change. It only works when
all terms are positive. A similar statement can be made for the case where all terms
are negative, an ≤ bn ≤ 0. Let’s write this out.
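Problem 22.6 (Limit comparison). Let $a_n, b_n \ge 0$ for $n \ge N$. (1) Let $\lim_{n\to\infty} \frac{a_n}{b_n} = 0$.
Then, if $\sum_{n=1}^{\infty} b_n$ converges, $\sum_{n=1}^{\infty} a_n$ converges. If $\sum_{n=1}^{\infty} a_n$ diverges, $\sum_{n=1}^{\infty} b_n$ diverges. (2)
Let $\lim_{n\to\infty} \frac{a_n}{b_n} = C \ne 0$. Then, $\sum_{n=1}^{\infty} a_n$ converges if and only if $\sum_{n=1}^{\infty} b_n$ converges.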
If $\lim_{n\to\infty} \frac{a_n}{b_n} = \infty$, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$, so we can apply (1). The convergence of a series
is determined by its behavior as $n \to \infty$, so it is natural that it is determined by the
limit $\lim_{n\to\infty} \frac{a_n}{b_n}$. However, note that this comparison does not work when the signs
change.
Solution 22.6
⊔
⊓
Problem 22.7. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{5}{5n - 1}$. (2) $\sum_{n=1}^{\infty} \frac{1}{n!}$. (3) $\frac{1}{3} + \frac{1}{2 + \sqrt{2}} + \frac{1}{2 + \sqrt[3]{3}} + \frac{1}{2 + \sqrt[4]{4}} + \cdots$.
Solution 22.7
⊔
⊓
22.3 Ratio test
In the previous discussion, we considered sequences with positive values. Now we
consider sequences and series that can have both positive and negative values. However, we do not perform a detailed test that considers both positive and negative
values separately. Instead, we discuss convergence when taking the absolute values of both positive and negative terms. Let’s start with the definition of absolute
convergence.
Definition 22.1. We say $\sum_{n=1}^{\infty} a_n$ converges absolutely if $\sum_{n=1}^{\infty} |a_n|$ converges.
If the series $\sum_{n=1}^{\infty} a_n$ converges absolutely, then it can be easily shown to converge
in the usual sense.
Problem 22.8. Show that if $\sum_{n=1}^{\infty} |a_n|$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.
The best way to prove convergence when the limit value is unknown is to use the
concept of Cauchy sequences.
Solution 22.8 Let $s_n$ be the partial sum of $a_n$ and $v_n$ be the partial sum of $|a_n|$.
Since $v_n$ converges, for any $\varepsilon > 0$, there exists $N$ such that $|v_n - v_{n'}| < \varepsilon$ whenever
$n, n' > N$. Since
\[ |s_n - s_{n'}| = \left| \sum_{k=n'+1}^{n} a_k \right| \le \sum_{k=n'+1}^{n} |a_k| < \varepsilon, \qquad n > n' > N, \]
$s_n$ is a Cauchy sequence, and hence $s_n$ converges. □
Let’s consider a few simple examples.
Problem 22.9. Show that the following series converge.
(1) $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2}$. (2) $\sum_{n=1}^{\infty} \frac{\sin n}{n^2}$.
Solution 22.9 Both of these examples, when their absolute values are taken, become
series such as $\sum_{n=1}^{\infty} \frac{1}{n^2}$ or smaller, which are known to converge. Therefore, they
converge absolutely. □
Next, we introduce two methods for determining absolute convergence, with the
ratio test being the first. The Greek letter ρ is read as "rho".
Problem 22.10 (Ratio test). Show the following when $\lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \rho$.
(1) If $\rho < 1$, $\sum_{n=1}^{\infty} a_n$ converges absolutely.
(2) If $\rho > 1$, $\sum_{n=1}^{\infty} a_n$ diverges. (3) If $\rho = 1$, no conclusion.
If an is a geometric sequence, then ρ becomes the common ratio. Even if it’s not
a geometric sequence, as n approaches infinity, |an | tends to resemble the form of
a geometric sequence. Therefore, it’s quite natural that if ρ corresponding to the
geometric sequence is greater than 1, it diverges; if it’s less than 1, it converges.
When ρ equals 1, it encompasses both cases of convergence and divergence, making
it indeterminate.
Solution 22.10 First, let’s check (3). We already know that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
Calculating ρ, we find $\rho = \lim_{n\to\infty} \frac{n}{n+1} = 1$. We also know that the series
$\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges. Calculating ρ, we find $\rho = \lim_{n\to\infty} \frac{n^2}{n^2 + 2n + 1} = 1$. Thus, when
$\rho = 1$, both convergent and divergent cases are included, making it inconclusive.
Let’s prove (1). To understand the principle, consider this: If $\rho < 1$, there exists
a common ratio $r$ between ρ and 1 and an index $N$ such that for all $n > N$, $\frac{|a_{n+1}|}{|a_n|} < r$ holds. Then, we
can create a geometric series with ratio $r$ whose terms are larger than $|a_n|$. Therefore, by the
comparison test, the series converges. Students serious about mathematics can take
this logic, create such series abstractly, and complete the proof.
The logic for (2) can be similarly constructed.
□
Let’s practice using the ratio test with a few examples.
Problem 22.11. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{2^n + 5}{3^n}$. (2) $\sum_{n=1}^{\infty} \frac{(2n)!}{n!\,n!}$. (3) $\sum_{n=1}^{\infty} \frac{4^n\, n!\, n!}{(2n)!}$.
Solution 22.11 For (1), the series looks like a geometric series with $r = \frac{2}{3}$ for
sufficiently large $n$. Thus, we might consider applying the ratio test. Then,
\[ \rho = \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{2^{n+1} + 5}{3^{n+1}} \cdot \frac{3^n}{2^n + 5}. \]
Dividing both numerator and denominator by $2^n 3^n$, we get
\[ \rho = \lim_{n\to\infty} \frac{1}{3} \cdot \frac{2 + 5/2^n}{1 + 5/2^n} = \frac{2}{3}. \]
As expected, we find that $\rho = \frac{2}{3}$, so the series converges by the ratio test.
For (2), although it doesn’t look like a geometric series, when we compute the
ratio, many terms cancel out, simplifying the calculation. Let’s see:
\[ \rho = \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{(2n+2)!}{(n+1)!\,(n+1)!} \cdot \frac{n!\,n!}{(2n)!} = \lim_{n\to\infty} \frac{(2n+2)(2n+1)}{(n+1)(n+1)} = 4. \]
Thus, the series diverges by the ratio test.
For (3), the terms are the reciprocals of those in (2) multiplied by $4^n$, so the ratio is
4 times the reciprocal of the ratio in (2), and we get $\rho = 4 \cdot \frac{1}{4} = 1$. Therefore, the ratio
test is inconclusive. □
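The three ratios of Problem 22.11 can be evaluated exactly at a finite index to see where they are heading. The following Python sketch uses `math.comb` for $\frac{(2n)!}{n!\,n!}$; the function names `a`, `b`, `c`, and `ratio` are hypothetical, and a single finite index only suggests the value of the limit.

```python
from fractions import Fraction
from math import comb

def a(n):
    """Term of series (1): (2**n + 5) / 3**n."""
    return Fraction(2**n + 5, 3**n)

def b(n):
    """Term of series (2): (2n)! / (n! n!), the central binomial coefficient."""
    return Fraction(comb(2 * n, n))

def c(n):
    """Term of series (3): 4**n n! n! / (2n)!."""
    return Fraction(4**n, comb(2 * n, n))

def ratio(term, n):
    """Ratio term(n + 1) / term(n), computed exactly and then converted to float."""
    return float(term(n + 1) / term(n))

if __name__ == "__main__":
    print(ratio(a, 50), ratio(b, 1000), ratio(c, 1000))
```

For series (3) the exact ratio is $\frac{2n+2}{2n+1}$, which is larger than 1 for every $n$. The terms therefore increase and cannot tend to 0, so by Problem 21.9 this series diverges, even though the ratio test by itself gives no conclusion.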
22.4 Root test
Now, we introduce the root test, which is sometimes very useful.
Problem 22.12 (Root test). Let $\rho = \lim_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} |a_n|^{1/n}$. Show the following:
(1) If $\rho < 1$, $\sum_{n=1}^{\infty} a_n$ converges absolutely.
(2) If $\rho > 1$, $\sum_{n=1}^{\infty} a_n$ diverges.
(3) If $\rho = 1$, no conclusion.
Solution 22.12 The proof and logic for this are very similar to those for the ratio
test. Let’s first check (3). For the series $\sum_{n=1}^{\infty} \frac{1}{n}$ and $\sum_{n=1}^{\infty} \frac{1}{n^2}$, both have $\rho = 1$. From
Problem 21.6(2), we know that $\lim_{n\to\infty} n^{1/n} = 1$. Also,
\[ \lim_{n\to\infty} (n^2)^{1/n} = \lim_{n\to\infty} (n^{1/n})^2 = \left( \lim_{n\to\infty} n^{1/n} \right)^2 = 1^2 = 1. \]
(The statement about the continuity of the function $f(x) = x^2$ was used. Where?)
Therefore, in the case of $\rho = 1$, it encompasses both scenarios of convergence and
divergence, rendering the determination inconclusive.
Let’s prove (1). To understand the principle, consider this: If $\rho < 1$, there exists a
common ratio $r$ between ρ and 1 and an index $N$ such that for all $n > N$, $|a_n|^{1/n} < r$ holds. When raised
to the power of $n$, we get
\[ |a_n| < r^n. \]
The right side forms a geometric series and converges. Therefore, by the comparison
test, $\sum_{n=1}^{\infty} a_n$ converges absolutely.
The logic for (2) can be similarly constructed.
□
Let’s practice using the root test with a few examples.
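Problem 22.13. Determine if the following series converge.
(1) $\sum_{n=1}^{\infty} \frac{2^n}{n^3}$. (2) $\sum_{n=1}^{\infty} \frac{2^n}{\sqrt{n!}}$. (3) $\sum_{n=1}^{\infty} \left( \frac{1}{n+1} \right)^n$.
(4) $\sum_{n=1}^{\infty} a_n$, where $a_n = \begin{cases} n 2^{-n}, & \text{if } n \text{ is odd} \\ 2^{-n}, & \text{if } n \text{ is even.} \end{cases}$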
Whether to use the ratio test or the root test is something to be learned through
practice. First, make a prediction and then iterate between failure and success.
Solution 22.13 For (1), the ratio test seems appropriate. Then,
\[ \rho = \lim_{n\to\infty} \frac{2^{n+1} n^3}{2^n (n+1)^3} = 2. \]
Thus, the series diverges by the ratio test. For (2), let’s try the ratio test as well.
Then,
\[ \rho = \lim_{n\to\infty} \frac{2^{n+1} \sqrt{n!}}{2^n \sqrt{(n+1)!}} = 2 \lim_{n\to\infty} \sqrt{\frac{1}{n+1}} = 0. \]
Thus, the series converges by the ratio test. For (3), the root test seems appropriate.
Then,
\[ \rho = \lim_{n\to\infty} (a_n)^{1/n} = \lim_{n\to\infty} \frac{1}{n+1} = 0. \]
Thus, the series converges by the root test. (4) is artificially created to demonstrate
a case where the ratio test does not work well, but the root test does. First, let’s try
the ratio test. Then,
\[ \frac{a_{n+1}}{a_n} \to \begin{cases} \lim_{n\to\infty} \dfrac{(n+1) 2^{-n-1}}{2^{-n}} = \lim_{n\to\infty} (n+1) 2^{-1} = \infty & \text{if } n \text{ is even} \\[2ex] \lim_{n\to\infty} \dfrac{2^{-n-1}}{n 2^{-n}} = \lim_{n\to\infty} \dfrac{2^{-1}}{n} = 0 & \text{if } n \text{ is odd} \end{cases} \]
The ratio has no single limit, so the ratio test is not helpful. Let’s try the root test. Then,
\[ \lim_{n\to\infty} (n 2^{-n})^{1/n} = \lim_{n\to\infty} n^{1/n}\, 2^{-1} = 2^{-1}, \qquad \lim_{n\to\infty} (2^{-n})^{1/n} = 2^{-1}. \]
Both converge to 0.5, so $\rho = 2^{-1}$, and the series converges by the root test. □
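The contrast between the two tests in case (4) is easy to see numerically. The sketch below evaluates the ratio and the $n$-th root at a few indices; the function names `a`, `ratio`, and `root` are hypothetical, and floating-point values at finite $n$ only illustrate the limits.

```python
def a(n):
    """Term of series (4): n * 2**(-n) for odd n, 2**(-n) for even n."""
    return n * 2.0 ** (-n) if n % 2 == 1 else 2.0 ** (-n)

def ratio(n):
    """Consecutive ratio a_{n+1} / a_n, which oscillates between large and small values."""
    return a(n + 1) / a(n)

def root(n):
    """n-th root of a_n, which settles near 1/2 for both parities."""
    return a(n) ** (1.0 / n)

if __name__ == "__main__":
    for n in (10, 11, 100, 101, 1000, 1001):
        print(n, ratio(n), root(n))
```

The ratio jumps between roughly $n/2$ and $1/(2n)$, while the root approaches 0.5 along both even and odd indices, in agreement with the computation above.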
Exercises
1.
2.
Lecture 23
Power series
In this lecture, we examine the convergence and the radius of convergence of power series
using the root test, one of the absolute convergence tests we have covered. The convergence
analysis of Taylor series is rooted in this approach. Towards the latter part of the lecture,
we also introduce the conditional convergence test.
23.1 Convergence of a power series
Let’s develop an understanding of the convergence region of power series and its
relationship with the coefficients through the following examples.
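Problem 23.1. Find the convergence regions of the following power series.
(1) $\sum_{n=0}^{\infty} x^n$. (2) $\sum_{n=0}^{\infty} (-1)^n x^n$. (3) $\sum_{n=0}^{\infty} \left( -\frac{1}{2} \right)^n x^n$. (4) $\sum_{n=0}^{\infty} n^2 x^n$. (5) $\sum_{n=1}^{\infty} (-1)^n \frac{x^n}{n}$.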
The convergence test for power series often allows for both the root test and
the ratio test. Of course, the coefficients must be considered. Cases (1), (2), and
(3) are geometric series, so we can easily find their limits. Cases (4) and (5) seem
challenging to find the limits, but we can still determine their convergence.
Solution 23.1 For (1), it’s a geometric series with a common ratio of $x$. Therefore,
it converges for all $|x| < 1$ and diverges for all $|x| \ge 1$.
For (2), it’s also a geometric series with a common ratio of $-x$. The convergence
region is the same as in (1), i.e., $|x| < 1$. Although (2) looks like an alternating
series, it is not always one: if $x$ is positive, (2) alternates, and if $x$ is negative, it is (1)
that alternates.
For (3), it’s a geometric series with a common ratio of $-\frac{x}{2}$. The coefficients decrease geometrically, and the convergence region is $|x| < 2$, which is twice as large as
in (1).
For (4), the coefficients increase as $n^2$. Applying the ratio test, we get
\[ \lim_{n\to\infty} \left| \frac{(n+1)^2 x^{n+1}}{n^2 x^n} \right| = \lim_{n\to\infty} \frac{(n+1)^2}{n^2} |x| = |x|. \]
Therefore, it converges for $|x| < 1$. The boundary cases $x = 1$ and $x = -1$ both
diverge. The convergence interval remains unchanged. In conclusion, even with coefficients growing as $n^2$, it doesn’t affect the convergence region.
For (5), the coefficients decrease, but the convergence interval remains unchanged. Using the ratio test, we find
\[ \lim_{n\to\infty} \left| \frac{n x^{n+1}}{(n+1) x^n} \right| = \lim_{n\to\infty} \frac{n}{n+1} |x| = |x|. \]
Therefore, it converges for $|x| < 1$. Among the boundary cases, $x = 1$ converges.
We can use the Alternating Series Test. For $x = -1$, it becomes the harmonic series and diverges. Since one boundary
point is included in the convergence interval, the interval is $-1 < x \le 1$. □
Coefficients that grow like a finite power of $n$ or decrease like a finite power of $\frac{1}{n}$
don’t affect the convergence region of power series. However, there might be variations at the boundary points. On the other hand, if the coefficients grow or decrease
like geometric series, the convergence region adjusts accordingly. This is natural
since the behavior of coefficients is similar to that of a geometric series. Now let’s
consider cases where the coefficients grow or decrease faster than geometric series.
Problem 23.2. Find the convergence regions of the following power series.
(1) $\sum_{n=0}^{\infty} \frac{x^n}{n!}$. (2) $\sum_{n=0}^{\infty} n!\, x^n$.
Solution 23.2 For (1), the coefficients are $1/n!$, decreasing rapidly. Since they decrease much faster than a geometric series, we expect a large convergence region. Let’s verify this by the ratio test:
\[ \rho = \lim_{n\to\infty} \frac{x^{n+1}\, n!}{(n+1)!\, x^n} = \lim_{n\to\infty} \frac{x}{n+1} = 0 < 1. \]
This means $\rho$ is less than 1 for any $x$, implying the convergence interval is the entire real line. This well-known series converges to $e^x$. Differentiating it term by term yields the same series, consistent with $(e^x)' = e^x$.
For (2), the coefficients grow much faster than a geometric series. We expect a very small convergence region. Let’s confirm this with the ratio test:
\[ \rho = \lim_{n\to\infty} \frac{x^{n+1} (n+1)!}{n!\, x^n} = \lim_{n\to\infty} (n+1)x = \begin{cases} 0 & \text{if } x = 0, \\ \infty & \text{otherwise.} \end{cases} \]
Therefore, the convergence region consists of a single point, $x = 0$. □
Let’s use the root test to determine the convergence of the general power series $\sum_{n=0}^{\infty} c_n x^n$. Then,
\[ \rho = \lim_{n\to\infty} (|c_n||x|^n)^{1/n} = \lim_{n\to\infty} |c_n|^{1/n}\, |x|. \]
According to the root test, if ρ > 1, the series diverges; if ρ < 1, it converges; and
if ρ = 1, it may either diverge or converge. Therefore, using this, we can find the
convergence radius as follows.
23.2 Radius of convergence
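Problem 23.3 (Radius of convergence). The radius of convergence $R$ of a given series $\sum_{n=0}^{\infty} c_n x^n$ is defined as follows:
\[ R = \frac{1}{\rho} = \frac{1}{\lim_{n\to\infty} |c_n|^{1/n}}, \]
where $\rho$ here denotes $\lim_{n\to\infty} |c_n|^{1/n}$, the root-test limit without the factor $|x|$. It satisfies the following:
1. If $|x| < R$, the series $\sum_{n=0}^{\infty} c_n x^n$ converges.
2. If $|x| > R$, the series $\sum_{n=0}^{\infty} c_n x^n$ diverges.
3. If $|x| = R$, the series $\sum_{n=0}^{\infty} c_n x^n$ may converge or diverge.
Solution 23.3 This follows directly from the root test: the root-test limit equals $|x|/R$, which is less than 1 exactly when $|x| < R$ and greater than 1 exactly when $|x| > R$. □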
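As a numerical illustration of the definition above, the following sketch estimates $R \approx 1/|c_n|^{1/n}$ at a few large indices. The three coefficient families and all function names are chosen here for illustration only; working with $\log|c_n|$ is an implementation choice that avoids overflow for $n!$.

```python
import math

def radius_estimate(log_abs_coeff, n):
    """Root-test estimate R ~ 1 / |c_n|**(1/n), computed from log|c_n|."""
    return math.exp(-log_abs_coeff(n) / n)

# log|c_n| for three illustrative coefficient families (assumed examples)
def log_n_squared(n):        # c_n = n^2,    expected R = 1
    return 2.0 * math.log(n)

def log_half_power(n):       # c_n = 2^(-n), expected R = 2
    return -n * math.log(2.0)

def log_inv_factorial(n):    # c_n = 1/n!,   estimates grow without bound
    return -math.lgamma(n + 1)

for name, f in [("n^2", log_n_squared), ("2^-n", log_half_power),
                ("1/n!", log_inv_factorial)]:
    print(name, [round(radius_estimate(f, n), 3) for n in (10, 100, 1000)])
```

The estimates for $n^2$ approach 1 slowly, which matches the earlier observation that polynomial growth of the coefficients does not change the convergence region.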
Problem 23.4 (Differentiability of power series). Let $R$ be the radius of convergence of $\sum_{n=0}^{\infty} c_n x^n$, and let $f(x) = \sum_{n=0}^{\infty} c_n x^n$ for $|x| < R$. Then,
1. For all $|x| < R$, $f$ is differentiable, and $f'(x) = \sum_{n=1}^{\infty} n c_n x^{n-1}$.
2. For all $|x| < R$, $f$ is infinitely differentiable, and for $|x| < R$, it satisfies:
\[ f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, c_n x^{n-k}. \]
Solution 23.4 The derivative of the partial sum $s_n(x) = \sum_{k=0}^{n} c_k x^k$ is given by $s_n'(x) = \sum_{k=0}^{n} k c_k x^{k-1}$. The radius of convergence of $s_n'(x)$ is also $R$. Since the differentiation operation is continuous,
\[ f'(x) = \Big( \lim_{n\to\infty} s_n(x) \Big)' = \lim_{n\to\infty} (s_n(x))' = \sum_{n=0}^{\infty} n c_n x^{n-1}. \tag{23.1} \]
By repeatedly applying this process to the power series $\sum_{n=0}^{\infty} n c_n x^{n-1}$, we obtain all its derivatives. □
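A quick numerical check of term-by-term differentiation, as a sketch only: for the geometric series $\sum x^n = 1/(1-x)$ (an example chosen here, with $c_n = 1$ and $R = 1$), the differentiated series should approach $1/(1-x)^2$. The helper name is hypothetical.

```python
def differentiated_partial_sum(coeffs, x):
    """Evaluate sum_{n>=1} n * c_n * x**(n-1) for a finite coefficient list."""
    return sum(n * c * x ** (n - 1) for n, c in enumerate(coeffs) if n >= 1)

x = 0.5                      # any point with |x| < 1
coeffs = [1.0] * 200         # c_n = 1: partial sum of the geometric series
approx = differentiated_partial_sum(coeffs, x)
exact = 1.0 / (1.0 - x) ** 2
print(approx, exact)
```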
Question 23.1. After claiming that differentiation is continuous, we took the limit
out of the derivative in (23.1). Can you explain the connection between claiming
differentiation is continuous and taking the limit out of the derivative like this? Is
the differentiation operation really continuous? How does it seem?
Just as we can differentiate, we can also integrate. Integrate the partial sums,
verify their convergence radius, and then take the limit. The integration operation is
also continuous.
Problem 23.5 (Integrability of power series). Let $R$ be the radius of convergence of $\sum_{n=0}^{\infty} c_n x^n$, and let $f(x) = \sum_{n=0}^{\infty} c_n x^n$ for $|x| < R$. Then,
\[ F(x) = \int_0^x f(s)\,ds = \sum_{n=0}^{\infty} \frac{c_n}{n+1}\, x^{n+1}, \qquad |x| < R. \]
Solution 23.5 Omitted. □
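The integrated series can be checked numerically in the same spirit. This sketch again assumes the geometric series ($c_n = 1$) as the example, for which $\int_0^x \frac{ds}{1-s} = -\ln(1-x)$; the helper name is hypothetical.

```python
import math

def integrated_partial_sum(coeffs, x):
    """Evaluate sum_n c_n / (n + 1) * x**(n + 1) for a finite coefficient list."""
    return sum(c / (n + 1) * x ** (n + 1) for n, c in enumerate(coeffs))

x = 0.5
approx_integral = integrated_partial_sum([1.0] * 200, x)
exact_integral = -math.log(1.0 - x)
print(approx_integral, exact_integral)
```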
Given a power series $\sum_{n=0}^{\infty} c_n (x - x_0)^n$ with convergence radius $R$, we can differentiate and integrate it within this range as many times as we want, and it will still converge. This means that a function represented by a power series is differentiable and integrable at any point within its convergence radius.
23.3 Alternating series
A sequence $a_n$ is called an alternating sequence if the sign of each element alternates, and the series formed by it is called an alternating series. For example, a sequence whose odd-indexed terms are positive and whose even-indexed terms are negative is an alternating sequence. Such an alternating sequence can be written as follows:
\[ a_n = (-1)^{n-1} b_n, \qquad b_n \ge 0. \]
Let’s study the properties of series composed of such sequences.
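Problem 23.6 (Alternating series test). Let’s assume that $b_n \ge 0$, $b_n \to 0$ as $n \to \infty$, and $b_n$ is monotonically decreasing. Then, prove that the series $\sum_{k=1}^{\infty} (-1)^{k-1} b_k$ converges, and furthermore, for all $n > 0$, show that
\[ s_{2n} \le \sum_{k=1}^{\infty} (-1)^{k-1} b_k \le s_{2n+1} \tag{23.2} \]
holds, where $s_m$ denotes the $m$th partial sum of the series.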
It is necessary for the sequence bn to be a decreasing sequence and converge to
0. If either of these conditions is not satisfied, a counterexample can be constructed
where the series does not converge. Additionally, Equation (23.2) can play a crucial
role as an error estimate for convergence. Knowing not only that a series converges
but also where its limit lies is very important.
Solution 23.6 The partial sums are expressed as follows:
\[ s_{2n+1} = b_1 - b_2 + b_3 - b_4 + b_5 - \cdots - b_{2n} + b_{2n+1} = b_1 - (b_2 - b_3) - (b_4 - b_5) - \cdots - (b_{2n} - b_{2n+1}). \]
Since $b_n$ is a decreasing sequence, $b_2 - b_3 \ge 0$, $b_4 - b_5 \ge 0$, $b_6 - b_7 \ge 0, \cdots$. Therefore, the partial sum $s_{2n+1}$ is a decreasing sequence as $n$ increases. Furthermore, rewriting $s_{2n+1}$ yields
\[ s_{2n+1} = b_1 - b_2 + (b_3 - b_4) + \cdots + (b_{2n-1} - b_{2n}) + b_{2n+1} \ge b_1 - b_2, \]
so it is bounded below. Therefore, $s_{2n+1}$ converges. Let’s denote its limit as $L_1$. Similarly, considering $s_{2n}$, we have:
\[ s_{2n} = (b_1 - b_2) + (b_3 - b_4) + (b_5 - b_6) + \cdots + (b_{2n-1} - b_{2n}). \]
Each term is either 0 or positive, so $s_{2n}$ is an increasing sequence. Also, rewriting $s_{2n}$ gives:
\[ s_{2n} = b_1 - (b_2 - b_3) - (b_4 - b_5) - \cdots - (b_{2n-2} - b_{2n-1}) - b_{2n} \le b_1, \]
so it is bounded above. Hence, $s_{2n}$ converges, and let’s denote its limit as $L_2$. Then, the difference between the two limits is:
\[ L_1 - L_2 = \lim_{n\to\infty} s_{2n+1} - \lim_{n\to\infty} s_{2n} = \lim_{n\to\infty} (s_{2n+1} - s_{2n}) = \lim_{n\to\infty} b_{2n+1} = 0. \]
Therefore, $L_1 = L_2$ and $\sum_{n=1}^{\infty} (-1)^{n-1} b_n$ converges. Since $s_{2n+1}$ is a decreasing sequence and $s_{2n}$ is an increasing sequence, (23.2) is satisfied. □
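The bracketing (23.2) can be observed numerically. The sketch below assumes the example $b_n = 1/n$, whose alternating series is known to sum to $\ln 2$; the function name is hypothetical.

```python
import math

def partial_sum(b, m):
    """s_m = sum_{n=1}^{m} (-1)**(n-1) * b(n)."""
    return sum((-1) ** (n - 1) * b(n) for n in range(1, m + 1))

def b(n):
    return 1.0 / n           # assumed example: alternating harmonic series

limit = math.log(2.0)        # known value of sum (-1)^(n-1)/n
for n in (1, 5, 50):
    print(n, partial_sum(b, 2 * n), limit, partial_sum(b, 2 * n + 1))
```

In each printed row the even partial sum lies below the limit and the odd partial sum lies above it, and the gap between them is $b_{2n+1}$.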
Problem 23.7. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$. (2) $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{10n}{n^2 + 16}$.
Solution 23.7 (1) Since $\frac{1}{n}$ is positive, decreasing, and converges to 0, by the alternating series test, the series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ converges. However, if we attach absolute values, $\sum_{n=1}^{\infty} \left| (-1)^{n-1} \frac{1}{n} \right| = \sum_{n=1}^{\infty} \frac{1}{n}$ does not converge. Such cases, where the series converges without absolute values but diverges once absolute values are attached, are called conditional convergence.
(2) $b_n = \frac{10n}{n^2+16}$ is positive and converges to 0 as $n$ approaches infinity. However, it is not a decreasing sequence. For small $n$, it can increase. But for sufficiently large $n$, it decreases. Furthermore, when applying the Alternating Series Test, the initial few terms do not affect convergence. How do we show that it is a decreasing sequence for large $n$? Let’s denote $f(x) = \frac{10x}{x^2+16}$ and analyze the sign of its derivative. Upon computation, we find
\[ f'(x) = \frac{10(16 - x^2)}{(x^2+16)^2}, \]
so $f'(x) \le 0$ when $x \ge 4$. Therefore, $b_n$ is decreasing for $n \ge 4$, and by the alternating series test, the series converges. □
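The claim that $b_n = \frac{10n}{n^2+16}$ first increases and then decreases can be checked directly. This is only a finite numerical check over an arbitrarily chosen range, not a replacement for the derivative argument.

```python
def b(n):
    """Terms b_n = 10n / (n^2 + 16) from Problem 23.7 (2)."""
    return 10 * n / (n * n + 16)

first_terms = [round(b(n), 4) for n in range(1, 9)]
peak_index = max(range(1, 50), key=b)    # index of the largest term among n = 1..49
print(first_terms, peak_index)
```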
(23.2) suggests that partial sums can estimate the location of the limit. Let’s verify this through the following problem.
Problem 23.8. Estimate the value of $L = \sum_{n=1}^{\infty} (-1)^{n-1} 2^{-n}$ with an error of less than 0.01.
Solution 23.8 Since it is an alternating series and $a_n$ is positive when $n$ is odd, $s_{2n} < s_{2n+1}$. The limit lies between $s_{2n}$ and $s_{2n+1}$ with a gap of $b_{2n+1}$. Therefore, we start by finding $n$ such that $b_{2n+1} < 0.01$. This implies
\[ \ln 2^{-2n-1} < \ln 0.01 \;\Rightarrow\; -2n - 1 < \frac{\ln 0.01}{\ln 2} \;\Rightarrow\; n > 2.8219. \]
Thus, $n = 3$. Then, the estimation interval is $(s_6, s_7)$. Of course, we know the value of $L$:
\[ L = \frac{0.5}{1 - (-0.5)} = \frac{1}{3}. \]
Therefore, it can be confirmed that $\frac{1}{3} \in (s_6, s_7)$. □
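The interval $(s_6, s_7)$ can be computed exactly with rational arithmetic. The sketch below uses Python's `fractions` module; the function name `s` simply mirrors the notation $s_m$.

```python
from fractions import Fraction

def s(m):
    """Exact partial sum s_m = sum_{n=1}^{m} (-1)**(n-1) * 2**(-n)."""
    return sum(Fraction((-1) ** (n - 1), 2 ** n) for n in range(1, m + 1))

s6, s7 = s(6), s(7)
print(s6, s7, float(s6), float(s7))
```

The gap $s_7 - s_6 = b_7 = 1/128$ is below the required tolerance 0.01, and $1/3$ lies strictly between the two partial sums.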
23.4 Rearrangement and conditional convergence
A sequence is a collection of numbers with a specified order. Sometimes, by rearranging the order, we can infer properties of the original sequence. This is called
rearrangement. But how do we define it?
Definition 23.1 (Rearranged series). Let $\mathbb{N}$ be the set of natural numbers and $a_n$ be a given sequence for all $n \in \mathbb{N}$. A sequence $b_n$ is called a rearrangement of $a_n$ if there exists a one-to-one onto mapping $\varphi : \mathbb{N} \to \mathbb{N}$ such that
\[ b_n = a_{\varphi(n)}, \qquad n \in \mathbb{N}. \]
Question 23.2. Can you explain if this definition satisfies the intended purpose?
Problem 23.9. If the series $\sum_{n=1}^{\infty} a_n$ converges absolutely, then all rearrangements also converge absolutely, and their limits do not change.
This problem demonstrates that for absolutely convergent series, rearrangements
do not make significant differences. The proof is relatively simple but requires careful organization.
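Solution 23.9 Let $\sum_{n=1}^{\infty} a_n = L$. Now let’s show that $\sum_{n=1}^{\infty} b_n = L$. Let $s_n$ be the partial sum of $a_n$, and $v_n$ be the partial sum of $b_n$. We need to show that for any given $\varepsilon > 0$, there exists an $N$ such that for all $n > N$, $|v_n - L| < \varepsilon$. Since $\sum_{n=1}^{\infty} a_n = L$, there exists $N_1$ such that for all $n > N_1$, $|s_n - L| < \frac{\varepsilon}{2}$ holds. Moreover, since $\sum_{n=1}^{\infty} |a_n|$ converges, there exists $N_2 > N_1$ such that $\sum_{k=N_2}^{\infty} |a_k| < \frac{\varepsilon}{2}$. Now, we choose $N$ such that:
\[ N = \max\{\varphi^{-1}(n) : n \le N_2\}. \]
Then, for $n > N$, the terms $b_1, \dots, b_n$ include all of $a_1, \dots, a_{N_2}$, so $v_n - s_{N_2}$ is a sum of terms $a_k$ with $k > N_2$. Hence,
\[ |v_n - L| = \Big| \sum_{k=1}^{n} b_k - L \Big| \le \Big| \sum_{k=1}^{N_2} a_k - L \Big| + \sum_{k=N_2}^{\infty} |a_k| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Thus, $\sum_{n=1}^{\infty} b_n = L$. (The core is the first inequality.) □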
Now let’s consider another type of convergence.
Definition 23.2. We say $\sum_{n=1}^{\infty} a_n$ converges conditionally if $\sum_{n=1}^{\infty} a_n$ converges, but $\sum_{n=1}^{\infty} |a_n|$ diverges.
The statement that the series $\sum_{n=1}^{\infty} a_n$ converges, but the series with absolute values attached, $\sum_{n=1}^{\infty} |a_n|$, diverges, means that the series contains both negative and positive terms that cancel each other out, resulting in convergence. However, attaching absolute values leads to divergence.
Problem 23.10. Verify that the following series converge conditionally.
(1) $\sum_{n=1}^{\infty} \frac{(-1)^n}{n^{\alpha}}$ with $0 < \alpha \le 1$. (2) $\sum_{n=1}^{\infty} \frac{(-1)^n n}{n^2 - 2n + 1}$. (3) $\sum_{n=1}^{\infty} \frac{\sin(n)}{n}$.
Solution 23.10 (1) and (2) can both be shown to converge by the alternating series test. It can also be shown that attaching absolute values leads to divergence. Hence, they converge conditionally. (3) is not precisely an alternating series. The sine function changes sign every $\pi$, that is, twice per period $2\pi$, and since $\pi$ is not an integer, consecutive terms $\sin(n)$ occasionally have the same sign. While it seems likely to converge conditionally due to approximately balanced positive and negative terms, there isn’t a straightforward way to prove it. □
The statement of conditional convergence means that adding only positive terms
or only negative terms separately would lead to divergence. However, they converge
when combined appropriately. Would their convergence remain unchanged if we
rearrange their order? Surprisingly, we obtain unexpected results.
Problem 23.11. If a series $\sum_{n=1}^{\infty} a_n$ converges conditionally, then regardless of the given number $L$, we can create a rearranged series $\sum_{n=1}^{\infty} b_n$ that converges to $L$.
Solution 23.11 Even though it may seem odd that rearranging the terms would converge to a specific L, especially when the series contains both positive and negative
terms that separately lead to divergence, this is simply due to our familiarity with
finite worlds and lack of experience with the infinite world.
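Let’s create such a rearranged series $b_n$. This means creating a one-to-one onto mapping $\varphi : \mathbb{N} \to \mathbb{N}$ so that the series converges to $L$. Remembering that the sequence $a_n$ contains infinitely many positive and negative terms, and each sum separately diverges, let’s create a rearrangement. First, let $b_1 = a_1$. If the partial sum $\sum_{n=1}^{k} b_n$ exceeds $L$, we take the next unused negative term as $b_{k+1}$. If it is less than $L$, we take the next unused positive term. Because the positive and negative parts each diverge, the partial sums cross $L$ infinitely often, and because $a_n \to 0$, the overshoot at each crossing shrinks to zero. Continuing this process, we ensure that eventually $\sum_{n=1}^{\infty} b_n = L$. □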
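The greedy construction can be tried on a concrete conditionally convergent series. The sketch below assumes the alternating harmonic series $a_n = (-1)^{n-1}/n$ as the example (positive terms $1, \frac13, \frac15, \dots$, negative terms $-\frac12, -\frac14, \dots$); the function name and the step count are illustrative choices.

```python
def rearranged_partial_sums(target, steps):
    """Partial sums of a greedy rearrangement of sum (-1)**(n-1)/n aiming at `target`."""
    total = 1.0                  # b_1 = a_1 = 1
    next_pos, next_neg = 3, 2    # denominators of the next unused positive / negative term
    sums = [total]
    for _ in range(steps - 1):
        if total > target:
            total -= 1.0 / next_neg   # above the target: take the next negative term
            next_neg += 2
        else:
            total += 1.0 / next_pos   # at or below the target: take the next positive term
            next_pos += 2
        sums.append(total)
    return sums

for target in (1.5, 0.0, -0.5):
    print(target, rearranged_partial_sums(target, 20000)[-1])
```

Every term of the original series is used exactly once in the limit, since both branches are taken infinitely often, so the construction is a genuine rearrangement.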
Exercises
1. Determine the convergence intervals of the following power series.
∞
∞
∞
∞
3n x2n
(2n)! n
x (4) ∑ 2
(1) ∑ nxn (2) ∑ n2 (3x − 1)n (3) ∑
n=1 n
n=1
n=1
n=1 n!
∞
∞
∞
∞
(−1)n (2x − 1)n
(x2 − 1)n
nn x n
n2n xn
(5) ∑
(6) ∑
(7) ∑
(8) ∑
2n + 2
n=1
n=1 2 · 4 · 6 · · · 2n
n=1 n!
n=1 n!
2.
Lecture 24
Taylor Series
In the previous lecture, we studied various properties and convergence of sequences
and series, which served as preliminary work for studying Taylor series. In this
lecture, we introduce Taylor series and study its properties.
24.1 Taylor series
Let’s approximate a function f (x) that is differentiable n times as a power series,
especially around a point x = x0 where we want the approximation to be good. Let’s
initially approximate it as
\[ f(x) \cong \sum_{s=0}^{n} c_s (x - x_0)^s \equiv p_n(x). \tag{24.1} \]
Let’s denote the right-hand side sum simply as pn (x).
Question 24.1. We want to choose the coefficients cs so that the polynomial pn (x)
on the right becomes a good approximation of the function f (x) on the left. How
should we determine the coefficients?
There are various methods to determine the coefficients cs , but in Taylor series,
we choose them such that all derivatives up to order n are equal at one tangent point
x0 .
Problem 24.1. Suppose the function f is differentiable up to order n at x = x0 . Determine the coefficients cs such that the approximation function pn (x) in (24.1) and
the target function f (x) have equal derivatives from order 0 to n at x = x0 .
Solution 24.1 Since there are a total of $n + 1$ coefficients, we can make $n + 1$ derivatives equal from order 0 to $n$. That is,
\[ f^{(k)}(x_0) = p_n^{(k)}(x_0), \qquad k = 0, 1, \cdots, n. \]
Let’s determine the coefficients $c_s$ so that the above equation holds. This equation forms a system of $n + 1$ simultaneous equations with $n + 1$ coefficients as unknowns. Moreover, the right-hand side is already diagonalized. By differentiating $p_n(x)$ $k$ times and substituting $x_0$ for $x$, we obtain
\[ p_n^{(k)}(x_0) = \sum_{s=k}^{n} s(s-1)\cdots(s-k+1)\, c_s (x - x_0)^{s-k} \Big|_{x = x_0} = c_k\, k!. \]
This calculation can be explained in detail. When we differentiate $p_n(x)$ $k$ times, all terms with degrees less than $k$ become 0. Thus, the summation starts from $s = k$. For terms with degrees $s \ge k$, the $k$th derivative of $c_s (x - x_0)^s$ is given as shown above, which has a factor of $(x - x_0)^{s-k}$. In particular, the $k$th derivative of the term $c_k (x - x_0)^k$ becomes the constant $k!\,c_k$. Substituting $x$ with $x_0$ in the above expression yields 0 for all terms except the constant term. Therefore, rewriting $f^{(k)}(x_0) = p_n^{(k)}(x_0)$, we get $f^{(k)}(x_0) = k!\,c_k$. Thus, the coefficients are given by
\[ c_k = \frac{f^{(k)}(x_0)}{k!}. \]
□
The polynomial $p_n(x)$ constructed with these coefficients is called the Taylor polynomial. The $n$th degree Taylor polynomial of function $f(x)$ centered at $x_0$ is as follows:
\[ p_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k. \qquad \text{(Taylor polynomial)} \]
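The formula translates directly into a short evaluator. This is a minimal sketch that assumes the derivative values $f(x_0), f'(x_0), \dots, f^{(n)}(x_0)$ are supplied by the caller as a list; the function name is hypothetical.

```python
import math

def taylor_polynomial(derivs_at_x0, x0, x):
    """p_n(x) = sum_k f^(k)(x0) / k! * (x - x0)**k, given [f(x0), f'(x0), ...]."""
    return sum(d / math.factorial(k) * (x - x0) ** k
               for k, d in enumerate(derivs_at_x0))

# Example: f(x) = e^x at x0 = 0, where every derivative equals 1.
p5 = taylor_polynomial([1.0] * 6, 0.0, 0.5)
print(p5, math.exp(0.5))
```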
Question 24.2. The Taylor polynomial is an approximation function made with derivative information at one point. Therefore, while this approximation function can approximate the function $f(x)$ well near the differentiation point $x_0$, we cannot expect $p_n(x)$ to converge to $f(x)$ if $x$ is far away from $x_0$. However, for many known functions it does converge. For example, the Taylor polynomials of functions like $\sin x$ converge for all $x \in \mathbb{R}$. How is this possible?
Problem 24.2 (Taylor series). If the function $f(x)$ is differentiable an infinite number of times, we can create a series instead of a partial sum:
\[ p(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k. \qquad \text{(Taylor series)} \]
This is called the Taylor series. Of course, it is meaningful only within its convergence interval.
(1) When does this series converge?
(2) If f (x) = sin x, what is the convergence radius?
(3) Can we say that p(x) equals f (x) on the convergence interval of the series?
Solution 24.2 (1) Since there is $k!$ in the denominator of the coefficients, there is a possibility of a large convergence interval. However, if $f^{(k)}(x_0)$ grows very quickly, this effect may be diminished. The ratio test seems useful. (2) If the function is $f(x) = \sin x$, then $|f^{(k)}(x_0)|$ is never larger than 1. Therefore, by applying the ratio test to the dominating series $\sum_k |x - x_0|^k / k!$, we can show that the convergence interval is the entire real line. (3) If the series converges, is $p(x)$ equal to $f(x)$? In fact, there is no reason for it. The approximation function $p(x)$ only has information about the original function $f(x)$ at one point $x_0$. Therefore, although the two may agree near $x_0$, there is no reason for $p(x)$ to equal $f(x)$ far from $x_0$. If they do, it would be quite surprising. However, many functions do so. How can this be possible? How can we prove it? □
The convergence of Taylor series alone does not indicate what its limit is. The
real value of Taylor series lies in error estimation.
Theorem 24.1 (Taylor’s theorem (Lagrange form)). Suppose that $f(x)$ is differentiable $n + 1$ times for all $x \in (a, b) \subset \mathbb{R}$, and $x_0 \in (a, b)$. Then, there exists a point $c$ between $x$ and $x_0$ such that
\[ f(x) = p_n(x) + R_n(x), \]
where
\[ p_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k, \qquad R_n(x) := \frac{f^{(n+1)}(c)}{(n+1)!} (x - x_0)^{n+1}. \tag{24.2} \]
Proof. (The logic used in this proof is also employed in Problem 24.1.) Throughout the proof, $x$ and $x_0$ are fixed constants, and we use $s$ as the variable. Let $M$ be the constant for which
\[ f(x) = p_n(x) + M (x - x_0)^{n+1} \]
is satisfied; that is, simply let $M = \frac{f(x) - p_n(x)}{(x - x_0)^{n+1}}$. Now, we need to show that this constant $M$ is given as in the theorem. Consider the difference between the left-hand side and the right-hand side as a function of $s$:
\[ E(s) = f(s) - p_n(s) - M (s - x_0)^{n+1}. \]
Then $E(s)$ is differentiable $n + 1$ times, and for all $0 \le k \le n$, $E^{(k)}(x_0) = 0$. Now, we intend to use the Mean Value Theorem $n + 1$ times. Since $E(x_0) = E(x) = 0$, by the MVT, there exists $c_1$ between $x$ and $x_0$ such that $E'(c_1) = 0$. Furthermore, $E'(x_0) = E'(c_1) = 0$, so there exists $c_2$ between $c_1$ and $x_0$ such that $E''(c_2) = 0$. Repeating this process $n + 1$ times, we obtain a point $c = c_{n+1}$ between $x_0$ and $c_n$ such that $E^{(n+1)}(c) = 0$. Since $p_n(s)$ is an $n$th-degree polynomial, its $(n+1)$st derivative is 0. Therefore,
\[ E^{(n+1)}(c) = f^{(n+1)}(c) - M (n+1)! = 0, \]
which gives $M = \frac{f^{(n+1)}(c)}{(n+1)!}$ for some $c$ between $x$ and $x_0$. □
Since the function $f(x)$ is differentiable $n + 1$ times, we could also approximate it with the $(n+1)$st degree Taylor polynomial $p_{n+1}(x)$. However, in that case, we do not know how large the error is. Error estimation is the essence of Taylor's theorem. The remainder term $R_n(x)$ represents the approximation error between $f(x)$ and $p_n(x)$, and except that $f^{(n+1)}(c)$ appears in place of $f^{(n+1)}(x_0)$, it looks just like the $(n+1)$st term of the Taylor polynomial. Moreover, $c$ lies between $x$ and $x_0$.
Problem 24.3 (One point decides all). If $f(x) = \sin x$, then for all $x, x_0 \in \mathbb{R}$, prove that
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n \tag{24.3} \]
is true, in other words, prove that $f(x) = p(x)$.
Solution 24.3 With the error estimate available, we can easily solve this problem. Since every derivative of $\sin x$ has absolute value at most 1, the error term $R_n(x)$ in (24.2) satisfies $|R_n(x)| \le \frac{|x - x_0|^{n+1}}{(n+1)!}$, which converges to 0 for all $x, x_0 \in \mathbb{R}$ as $n \to \infty$ by the ratio test. Therefore, the limit of the series $p(x)$ is equal to $f(x)$. It is worth carrying out this computation by hand and explaining the key points verbally. □
The above result is very peculiar. The Taylor series is defined based solely on
the derivative values at one point. However, the above result suggests that all the
derivative information at one point determines the function values at all points.
Let’s rewrite the remainder term $R_n$, now writing $s$ for the intermediate point:
\[ R_n(x) := \frac{f^{(n+1)}(s)}{(n+1)!} (x - x_0)^{n+1}. \]
If there exists a number M > 0 such that regardless of the degree n and the point
s, the numerator of the coefficient is bounded, i.e., | f (n+1) (s)| < M holds, then as n
approaches infinity, Rn (x) converges to 0 for any x. Of course, if x is far from x0 ,
|x − x0 | is large, so for Rn (x) to be sufficiently small, n must be much larger, but
ultimately it becomes sufficiently small.
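The bound can be observed numerically for $f(x) = \sin x$, where $M = 1$ works for every derivative. The sketch below assumes the center $x_0 = 0$ for simplicity and compares the actual error with $|x|^{n+1}/(n+1)!$; both function names are hypothetical.

```python
import math

def sin_maclaurin(x, n):
    """Degree-n Taylor polynomial of sin centered at x0 = 0 (odd powers only)."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range((n + 1) // 2))

def lagrange_bound(x, n):
    """|x|^(n+1) / (n+1)!, the remainder bound from (24.2) with |f^(n+1)| <= 1."""
    return abs(x) ** (n + 1) / math.factorial(n + 1)

x = 2.0
for n in (1, 5, 9, 13):
    print(n, abs(math.sin(x) - sin_maclaurin(x, n)), lagrange_bound(x, n))
```

For a point farther from the center, such as $x = 10$, the bound is large for small $n$ and only becomes small once $n$ is well beyond $|x|$, which is the behavior described above.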
Problem 24.4. For the exponential function $f(x) = e^x$, (1) find the Taylor polynomial $p_n(x)$. (2) Find the interval of convergence. (3) Determine whether the limit $p(x)$ matches with $f(x) = e^x$.
Solution 24.4 (1) To be completed. ⊔
⊓
Problem 24.5 (Maclaurin series). If $x_0 = 0$, show that (24.3) can be written as follows:
\[ \sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, x^{2n+1}, \qquad \cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}\, x^{2n}. \]
Solution 24.5 The even derivatives of $\sin x$ are either $\sin x$ or $-\sin x$, both of which have a value of 0 at $x = 0$. Therefore, $c_{2n} = 0$. The odd derivatives are either $\cos x$ or $-\cos x$, and they have values of $+1$ or $-1$ at $x = 0$. Hence, we obtain the above expressions. We can similarly proceed for $f(x) = \cos x$. □
The above special cases of the Taylor series with the center point x0 set to 0 are
called the Maclaurin series.
24.2 Applications
Theorem 24.2 (Binomial expansion). For all $|x| < 1$ and all $\alpha \in \mathbb{R}$, the following holds:
\[ (1 + x)^{\alpha} = \sum_{k=0}^{\infty} c_k x^k, \qquad c_k = \binom{\alpha}{k} := \frac{\alpha(\alpha - 1)\cdots(\alpha - k + 1)}{k!}. \tag{24.4} \]
Proof. The proof follows from Taylor’s theorem. For the interval where $x > -1$, $1 + x > 0$, so $(1 + x)^{\alpha}$ is well-defined for all $\alpha \in \mathbb{R}$, and its derivatives are also well-defined. Let $f(x) = (1 + x)^{\alpha}$, then $f^{(k)}(x) = \alpha(\alpha - 1)\cdots(\alpha - k + 1)(1 + x)^{\alpha - k}$. Therefore, by Taylor’s theorem with $x_0 = 0$,
\[ (1 + x)^{\alpha} = \sum_{k=0}^{n} c_k x^k + \frac{\alpha(\alpha - 1)\cdots(\alpha - n)(1 + s)^{\alpha - n - 1}}{(n+1)!}\, x^{n+1} \]
holds. Here, $s$ is a number between 0 and $x$. If $|x| < 1$, then $1 + x > 0$, hence the remainder term converges to 0 as $n \to \infty$. Thus, the series converges, and (24.4) holds. □
Expanding the binomial $(1 + x)^n$ for a positive integer $n > 0$ can be seen as simply unfolding it by multiplication, but the result is always in the form of a Taylor series. Therefore, viewing the expanded result as a Taylor polynomial is a good application of Taylor series. In this case, for $k > n$, the numerator is 0, so $c_k = 0$, and for $k \le n$, $c_k = \frac{n!}{k!(n-k)!}$. Thus,
\[ (1 + x)^n = \sum_{k=0}^{n} c_k x^k, \qquad c_k = \binom{n}{k} = \frac{n!}{k!(n-k)!} \]
holds. However, if $\alpha$ is not a positive integer, $c_k$ is not always 0, and therefore it should be understood as a series.
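The generalized coefficient in (24.4) can be computed exactly with rational arithmetic. In the sketch below, `binom` is a hypothetical helper name; it is compared with `math.comb` for a positive integer exponent and used for a partial sum of $\sqrt{1+x}$, with $\alpha = \frac12$ and $x = \frac15$ chosen only as an example.

```python
from fractions import Fraction
from math import comb

def binom(alpha, k):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-k+1) / k!."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(alpha - j) / (j + 1)
    return result

# Positive integer exponent: coefficients vanish for k > n.
print([binom(5, k) for k in range(8)])

# Non-integer exponent: partial sum of the series for sqrt(1 + x) at x = 1/5.
half, x = Fraction(1, 2), Fraction(1, 5)
approx_sqrt = float(sum(binom(half, k) * x ** k for k in range(31)))
print(approx_sqrt, 1.2 ** 0.5)
```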
Problem 24.6. Use the ratio test to prove the convergence of the series (24.4).
Solution 24.6
⊔
⊓
Problem 24.7. Find the Taylor expansions of the following functions and determine their convergence intervals.
(1) $\frac{1}{1+x}$. (2) $\frac{1}{1-x}$. (3) $\frac{1}{1+x^2}$. (4) $\arctan x$.
Solution 24.7
⊔
⊓
Problem 24.8. Find the 0th, 1st, and 2nd terms of the polynomial $(2 + 3x + x^2)^{10}$.
Solution 24.8
⊔
⊓
Problem 24.9. Find the Taylor expansion of $\ln x$.
Solution 24.9 An important point to note is that we cannot find a Taylor expansion centered at 0 because $\ln 0$ is undefined. Therefore, the next option is to center it at 1. Then, $\ln(1) = 0$ and $\ln^{(k)}(x) = (-1)^{k-1} (k-1)!\, x^{-k}$ for $k \ge 1$, so
\[ \ln x = \sum_{k=1}^{\infty} c_k (x - 1)^k, \qquad c_k = \frac{(-1)^{k-1} (k-1)!}{k!} = \frac{(-1)^{k-1}}{k}. \]
It is possible to choose a center other than 1 if necessary. □
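A partial sum of this expansion can be compared with the library logarithm. The sketch below is only a numerical spot check at sample points chosen here ($x = 1.5$, and the slowly converging endpoint $x = 2$); the function name is hypothetical.

```python
import math

def log_series(x, terms):
    """Partial sum of sum_{k>=1} (-1)**(k-1) / k * (x - 1)**k."""
    return sum((-1) ** (k - 1) / k * (x - 1) ** k for k in range(1, terms + 1))

print(log_series(1.5, 60), math.log(1.5))
print(log_series(2.0, 1000), math.log(2.0))
```

At $x = 2$ the series is the alternating harmonic series, so by (23.2) the error after 1000 terms is at most the next term, $1/1001$.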
There are different versions of Taylor’s Theorem, and one of them is as follows.
Theorem 24.3 (Taylor’s theorem (Peano form)). Suppose that $f(x)$ is differentiable $n$ times for all $x \in (a, b) \subset \mathbb{R}$, and $x_0 \in (a, b)$. Then, there exists a function $h_n : (a, b) \to \mathbb{R}$ such that
\[ f(x) = p_n(x) + h_n(x)(x - x_0)^n, \qquad a < x < b, \]
where $h_n(x) \to 0$ as $x \to x_0$.
Proof. Define the function $h_n$ as
\[ h_n(x) = \begin{cases} \dfrac{f(x) - p_n(x)}{(x - x_0)^n} & \text{if } x \ne x_0, \\ 0 & \text{if } x = x_0. \end{cases} \]
This definition satisfies the relationship in the theorem. Applying L’Hopital’s Rule repeatedly shows that its limit is 0. □
Problem 24.10. (1) Explain the meaning of Theorem 24.3 and (2) compare it with
Theorem 24.1.
Solution 24.10 (1) Theorem 24.3 describes the limit as $x \to x_0$. Expressing what the theorem says using little-oh notation, we have:
\[ |f(x) - p_n(x)| = o(|x - x_0|^n) \quad \text{as } x \to x_0. \]
Therefore, increasing the degree helps to improve the convergence speed as $x \to x_0$. However, since the exact error is not given, it’s hard to say how much it helps, as we cannot compare the sizes of the factors $|h_n(x)|$ for different $n$.
(2) Theorem 24.3 does not address the convergence of Taylor series. Even if $f$ is infinitely differentiable and the Taylor series can be constructed, it says nothing about the limit as $n \to \infty$ for a fixed $x \ne x_0$. □
Exercises
1.
Appendix A
Second Order Differential Equations
In this lecture, we find solutions to second-order linear equations. A second-order
linear equation can be written as follows:
y′′ + a(x)y′ + b(x)y = Q(x).
(A.1)
Solving a second-order differential equation yields two general constants, and to
determine them, two conditions are needed. For instance, we can provide two initial
conditions as follows:
y(x0 ) = y0 , y′ (x0 ) = y1 .
Solving a second-order differential equation is much more difficult than solving a
first-order one. It can be solved by hand only in special cases. In this lecture, we find
solutions only when a, b, and Q are all constants. This is particularly important to
Newton because the equations of celestial orbits are given as a special case among
these.
A.1 Second-order homogeneous linear equation
If Q = 0, (A.1) is called a homogeneous differential equation. We specifically find
solutions when it has constant coefficients:
y′′ + ay′ + by = 0.
(A.2)
The first objective is to find two nonzero solutions that are linearly independent of
each other, denoted as y1 and y2 . Of course, y = 0 satisfies the equation, but this
is not helpful. Being linearly independent means that one cannot be expressed as
a constant multiple of the other. In other words, finding y1 and y2 but still having
y2 = Cy1 for some constant C ∈ R means we haven’t found the second solution yet.
Problem A.1. If $y_1$ and $y_2$ are solutions to (A.2), show that any linear combination of them,
\[ y = C_1 y_1 + C_2 y_2, \]
is also a solution for all $C_1, C_2 \in \mathbb{R}$.
Solution A.1 Substituting the linear combination above into the left-hand side of (A.2), we obtain:
\[ (C_1 y_1 + C_2 y_2)'' + a (C_1 y_1 + C_2 y_2)' + b (C_1 y_1 + C_2 y_2) = C_1 (y_1'' + a y_1' + b y_1) + C_2 (y_2'' + a y_2' + b y_2) = 0. \]
Hence, the linear combination is also a solution to (A.2). The same computation works when the coefficients are functions $a(x)$ and $b(x)$, that is, for the homogeneous version of (A.1). □
The key here is that once we find two linearly independent solutions, we can find the general solution containing two arbitrary constants, representing all solutions. So, how do we find these two solutions? When the coefficients $a$ and $b$ are constants, we can find solutions of the form
\[ y = e^{\lambda x}. \]
To find the corresponding $\lambda$, we substitute $e^{\lambda x}$ into (A.2). Utilizing the property of the exponential function:
\[ (e^{\lambda x})' = \lambda e^{\lambda x}, \tag{A.3} \]
we obtain:
\[ \lambda^2 e^{\lambda x} + a \lambda e^{\lambda x} + b e^{\lambda x} = 0. \]
Dividing by $e^{\lambda x}$ (which is nonzero), we get a quadratic equation for $\lambda$:
\[ \lambda^2 + a \lambda + b = 0. \tag{A.4} \]
This important equation is called the characteristic equation, and its solutions are:
\[ \lambda_1 = \frac{-a + \sqrt{a^2 - 4b}}{2}, \qquad \lambda_2 = \frac{-a - \sqrt{a^2 - 4b}}{2}. \]
The nature of the solutions depends on the sign of the discriminant $a^2 - 4b$.
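The root formula and the three cases distinguished by the discriminant can be sketched in a few lines. The function names and the sample coefficients are illustrative; `cmath.sqrt` is used so that a negative discriminant yields complex roots instead of an error.

```python
import cmath

def characteristic_roots(a, b):
    """Roots of lambda^2 + a*lambda + b = 0, returned as complex numbers."""
    root = cmath.sqrt(a * a - 4 * b)
    return (-a + root) / 2, (-a - root) / 2

def classify(a, b):
    """Name the case according to the sign of the discriminant a^2 - 4b."""
    disc = a * a - 4 * b
    if disc > 0:
        return "two distinct real roots"
    if disc == 0:
        return "repeated real root"
    return "complex conjugate roots"

for a, b in [(3, 2), (2, 1), (0, 1)]:
    print((a, b), classify(a, b), characteristic_roots(a, b))
```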
Remark A.1 (Exponential Function). The property used in obtaining the characteristic equation (A.4) is the property of the exponential function $e^{\lambda x}$ given in (A.3). This property is fundamental to the exponential function and is essential for us.
Case 1. $a^2 - 4b > 0$
If the discriminant is positive, the characteristic equation has two real roots $\lambda_1, \lambda_2 \in \mathbb{R}$. Hence, the general solution is:
\[ y = C_1 e^{\lambda_1 x} + C_2 e^{\lambda_2 x}. \tag{A.5} \]
Problem A.2. Describe the asymptotic behavior of the solutions given by (A.5) depending on the signs of the real parts of the roots λ1 and λ2 of the characteristic
equation, as x → ∞.
Solution A.2 If either λ1 or λ2 is positive, the solutions diverge as x tends to infinity. If both are negative, the solutions converge to zero. If one of them is zero,
the behavior depends on the other root. (If x represents time and y represents the
distance between planets and the sun, these solutions do not describe the orbit of a
planet.) ⊔
⊓
Case 2. $a^2 - 4b = 0$
If the discriminant is zero, the characteristic equation has a single real root:
\[ \lambda = -\frac{a}{2}. \]
This root is called a repeated root. Firstly, $y_1 = e^{\lambda x}$ is a solution. We need to find another solution. The second solution is:
\[ y_2 = x y_1. \]
When $\lambda$ is not a repeated root, $x y_1$ is not a solution. However, when it is a repeated root, $x y_1$ becomes a solution. This can be verified as follows. Substituting $x y_1$ into equation (A.2), we get:
\[ (x y_1)'' + a (x y_1)' + b x y_1 = x y_1'' + a x y_1' + b x y_1 + x'' y_1 + 2 x' y_1' + a x' y_1 = 2 \lambda y_1 + a y_1 = 0. \]
Thus, $x y_1$ is a solution. Therefore, the general solution is:
\[ y = C_1 e^{\lambda x} + C_2 x e^{\lambda x}. \tag{A.6} \]
Problem A.3. Describe the asymptotic behavior of the solutions given by (A.6) depending on the sign of the real part of the root λ of the characteristic equation, as
x → ∞.
Solution A.3 If λ is positive, the solutions diverge as x tends to infinity. If λ is
negative, the solutions converge to zero. If λ is zero, the solutions are straight lines.
(If x represents time and y represents the distance between planets and the sun, these
solutions do not describe the orbit of a planet.) ⊔
⊓
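Case 3. $a^2 - 4b < 0$
If the discriminant is negative, the characteristic equation has two complex roots:
\[ \lambda_1 = \alpha + \beta i, \qquad \lambda_2 = \alpha - \beta i. \]
Here, $\alpha = -\frac{a}{2}$ and $\beta = \frac{\sqrt{|a^2 - 4b|}}{2}$. The solutions are:
\[ y_1 = e^{(\alpha + \beta i)x}, \qquad y_2 = e^{(\alpha - \beta i)x}. \]
Thus, the general solution we seek is:
\[ y = C_1 e^{(\alpha + \beta i)x} + C_2 e^{(\alpha - \beta i)x}. \]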
Let’s review the definition of exponential functions with complex powers. Firstly, consider the case where the exponent is purely imaginary:
\[ \begin{cases} e^{\beta i x} = \cos \beta x + i \sin \beta x, \\ e^{-\beta i x} = \cos(-\beta x) + i \sin(-\beta x) = \cos \beta x - i \sin \beta x. \end{cases} \tag{A.7} \]
When there is also a real part:
\[ e^{(\alpha + \beta i)x} = e^{\alpha x} e^{\beta i x} = e^{\alpha x} (\cos \beta x + i \sin \beta x). \]
Question A.1. Is it reasonable to call the function $e^{\beta i x}$ defined by the right-hand side of (A.7) an exponential function? What exactly is an exponential function?
To call the function defined by (A.7) an exponential function, it must satisfy the unique properties of exponential functions. What are they? It’s essential to satisfy the property used to obtain the characteristic equation, (A.3).
Problem A.4. For a real number $\beta \in \mathbb{R}$, prove that the function defined by (A.7) satisfies the unique property of exponential functions (A.3).
Solution A.4 To verify whether the definition in (A.7) makes sense, we need to satisfy the unique property of exponential functions. By direct computation:
\[ (e^{\beta i x})' = (\cos \beta x + i \sin \beta x)' = -\beta \sin \beta x + i \beta \cos \beta x = \beta i \cos \beta x + i^2 \beta \sin \beta x = \beta i (\cos \beta x + i \sin \beta x) = \beta i\, e^{\beta i x}. \]
The second case, $(e^{-\beta i x})' = -\beta i\, e^{-\beta i x}$, can be similarly shown, but it’s not necessary. It naturally follows from the fact that the sine function is an odd function. □
Hence, the two solutions are written as:
\[ y_1 = e^{(\alpha + \beta i)x} = e^{\alpha x} e^{\beta i x} = e^{\alpha x} (\cos \beta x + i \sin \beta x), \]
\[ y_2 = e^{(\alpha - \beta i)x} = e^{\alpha x} e^{-\beta i x} = e^{\alpha x} (\cos \beta x - i \sin \beta x). \]
The inconvenience of using these two lies in dealing with complex-valued functions. One could restrict to real functions. Since a linear combination of the two solutions is also a solution,
\[ e^{\alpha x} \cos \beta x = (y_1 + y_2)/2 \quad \text{and} \quad e^{\alpha x} \sin \beta x = (y_1 - y_2)/(2i) \]
are also linearly independent solutions. Thus, we can use these two solutions:
\[ y_1 = e^{\alpha x} \cos \beta x, \qquad y_2 = e^{\alpha x} \sin \beta x. \]
Using these, we can construct a general real-valued solution as follows:
\[ y = C_1 e^{\alpha x} \cos \beta x + C_2 e^{\alpha x} \sin \beta x. \tag{A.8} \]
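As a sanity check, one can estimate the residual $y'' + ay' + by$ with central finite differences and confirm that it is close to zero. The sketch below assumes sample coefficients chosen here: $a = 2$, $b = 5$ (so $\alpha = -1$, $\beta = 2$) for Case 3, and $a = 2$, $b = 1$ (repeated root $\lambda = -1$) for Case 2. The helper `residual` and the step size `h` are illustrative choices, and the result is only accurate up to discretization and rounding error.

```python
import math

def residual(y, a, b, x, h=1e-4):
    """Central-difference estimate of y'' + a*y' + b*y at the point x."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 + a * d1 + b * y(x)

# Case 3 example: a = 2, b = 5 gives a^2 - 4b = -16, so alpha = -1 and beta = 2.
def y_cos(x):
    return math.exp(-x) * math.cos(2 * x)

def y_sin(x):
    return math.exp(-x) * math.sin(2 * x)

# Case 2 example: a = 2, b = 1 gives the repeated root lambda = -1.
def y_rep(x):
    return x * math.exp(-x)

for x in (0.0, 0.7, 2.0):
    print(x, residual(y_cos, 2, 5, x), residual(y_sin, 2, 5, x), residual(y_rep, 2, 1, x))
```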
Problem A.5. Describe the asymptotic behavior of the solutions given by (A.8) depending on the sign of the real part α of the roots of the characteristic equation, as
x → ∞.
Solution A.5 If the real part α is positive, the solutions diverge as x tends to infinity.
If α is negative, the solutions converge to zero. If α is zero, the solutions are periodic
functions. (If x represents time and y represents the distance between planets and the
sun, these solutions do not describe the orbit of a planet.) ⊔
⊓
A.2 Second order inhomogeneous linear equation
To find all possible solutions of the inhomogeneous problem (A.1), first, we need to find two solutions $y_1$ and $y_2$ of the homogeneous problem with $Q = 0$. The work done in the previous section covers this. Now, we need to find one solution of the inhomogeneous problem (A.1) with $Q(x)$. This solution is called the particular solution and denoted as $y_p$. Then, all solutions of (A.1) are given by:
\[ y = C_1 y_1 + C_2 y_2 + y_p. \tag{A.9} \]
Problem A.6. Show that for all constants $C_1, C_2$, if $y_p$ satisfies (A.1) and $y_1, y_2$ are linearly independent solutions of the homogeneous problem, then $y$ given by (A.9) is a solution of (A.1).
Solution A.6 For linear problems, it is convenient to introduce a linear operator. Defining $\mathcal{L}(y) = y'' + a y' + b y$, we can express (A.1) simply as $\mathcal{L}(y) = Q$. The answer to this problem can also be stated concisely:
\[ \mathcal{L}(y) = \mathcal{L}(C_1 y_1 + C_2 y_2 + y_p) = C_1 \mathcal{L}(y_1) + C_2 \mathcal{L}(y_2) + \mathcal{L}(y_p) = \mathcal{L}(y_p) = Q. \]
Thus, $y = C_1 y_1 + C_2 y_2 + y_p$ is a solution of (A.1). □
The technique for finding particular solutions varies depending on $Q$ and $y_1$, $y_2$. However, when the coefficients $a$, $b$ are constants with $b \neq 0$ and $Q$ is also a constant, the constant function $y_p = Q/b$ is a particular solution. Indeed, since the derivatives of a constant vanish,
\[ y_p'' + a y_p' + b y_p = (Q/b)'' + a (Q/b)' + b (Q/b) = 0 + 0 + Q = Q. \]
Therefore,
\[ y = C_1 y_1 + C_2 y_2 + Q/b \]
is the general solution of (A.1).
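As a numerical sanity check of this formula, the following Python sketch evaluates the residual $y'' + ay' + by - Q$ by central finite differences. The coefficients $a = 2$, $b = 5$, $Q = 10$ (characteristic roots $-1 \pm 2i$) and all function names are assumptions chosen for illustration, and the finite-difference check is only approximate, not a proof.

```python
import math

# Assumed sample data: y'' + A y' + B y = Q with constant A, B, Q.
A, B, Q = 2.0, 5.0, 10.0
# Roots of r^2 + A r + B = 0 are ALPHA +/- BETA i (complex because A^2 < 4B).
ALPHA = -A / 2.0
BETA = math.sqrt(B - A * A / 4.0)

def general_solution(x, c1, c2):
    """y = C1 y1 + C2 y2 + Q/B, with y1, y2 the real solutions from (A.8)."""
    growth = math.exp(ALPHA * x)
    return c1 * growth * math.cos(BETA * x) + c2 * growth * math.sin(BETA * x) + Q / B

def residual(x, c1, c2, h=1e-3):
    """Approximate y'' + A y' + B y - Q using central finite differences."""
    y_minus, y_zero, y_plus = (general_solution(x + s * h, c1, c2) for s in (-1, 0, 1))
    first = (y_plus - y_minus) / (2.0 * h)
    second = (y_plus - 2.0 * y_zero + y_minus) / (h * h)
    return second + A * first + B * y_zero - Q

if __name__ == "__main__":
    for c1, c2 in ((0.0, 0.0), (1.0, -2.0), (3.0, 0.5)):
        worst = max(abs(residual(0.1 * n, c1, c2)) for n in range(50))
        print(f"C1 = {c1}, C2 = {c2}: largest residual {worst:.2e}")
```

The residual stays at the level of the finite-difference error for every choice of $C_1$ and $C_2$, which is what (A.9) predicts.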
A.3 Equation for two-body problem
The differential equation we need to solve to find the orbit of two celestial bodies,
such as the Sun and the Earth, is as follows:
\[ u'' + u = K. \tag{A.10} \]
Deriving this equation is the main goal of Lecture 11. The constant $K$ on the right-hand side is given by
\[ K = \frac{(m_1 + m_2) G}{L^2}. \]
Here, $m_1$ and $m_2$ are the masses of the two celestial bodies, $G$ is the gravitational constant, and $L$ is the angular momentum; all of these are constants. If $x_1$ and $x_2$ are the positions of the two celestial bodies, then $u$ is the reciprocal of the distance between them, that is, $u = 1/r$ with $r = \|x_1 - x_2\|$. Note that the derivatives in (A.10) are taken with respect to the angular variable $\theta$ in polar coordinates, not with respect to the time variable $t$.
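The characteristic equation of (A.10) has the roots $\pm i$, so the homogeneous solutions are $\cos \theta$ and $\sin \theta$, and the constant $K$ is a particular solution. Hence, the general solution of the above problem (A.10) is
\[ u = C_1 \cos \theta + C_2 \sin \theta + K. \]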
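The coefficients $C_1$ and $C_2$ of the trigonometric parts are determined by the initial conditions. Assuming they are not both zero, we factor out $\sqrt{C_1^2 + C_2^2}$ so that the squares of the coefficients of $\cos \theta$ and $\sin \theta$ sum to 1:
\[ u = \left( \frac{C_1}{\sqrt{C_1^2 + C_2^2}} \cos \theta + \frac{C_2}{\sqrt{C_1^2 + C_2^2}} \sin \theta + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]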
Then there exists an angle $\theta_0$, called the phase offset, satisfying the following:
\[ \cos(\theta_0) = \frac{C_1}{\sqrt{C_1^2 + C_2^2}}, \qquad \sin(\theta_0) = \frac{C_2}{\sqrt{C_1^2 + C_2^2}}. \]
Therefore, the above expression can be written as follows:
\[ u = \left( \cos(\theta_0) \cos \theta + \sin(\theta_0) \sin \theta + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]
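Now, using the angle-difference formula for cosine, $\cos(\theta - \theta_0) = \cos \theta_0 \cos \theta + \sin \theta_0 \sin \theta$, we rewrite the solution $u$ as
\[ u = \left( \cos(\theta - \theta_0) + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]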
Simplified,
\[ u = (1 + e \cos(\theta - \theta_0)) K, \qquad e = \frac{\sqrt{C_1^2 + C_2^2}}{K}, \qquad K = \frac{(m_1 + m_2) G}{L^2}. \]
Here, $e$ is the eccentricity of the ellipse. (People often use the letter $e$ for eccentricity, which is merely a tradition; it should be distinguished from the base $e$ of the natural exponential based on context.) Thus, since $u = 1/r$, the distance $r$ between the two celestial bodies is given by:
\[ r = \frac{L^2}{(1 + e \cos(\theta - \theta_0)) (m_1 + m_2) G}. \]
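The passage from $(C_1, C_2, K)$ to $(e, \theta_0)$ can be checked numerically. In the Python sketch below, the function names and the sample values $C_1 = 0.3$, $C_2 = 0.4$, $K = 1$ are assumptions made for illustration; the phase offset is computed with `math.atan2`, which returns an angle whose cosine and sine are $C_1/\sqrt{C_1^2 + C_2^2}$ and $C_2/\sqrt{C_1^2 + C_2^2}$.

```python
import math

def orbit_parameters(c1, c2, k):
    """Return (e, theta0) with C1 cos(t) + C2 sin(t) + K = K (1 + e cos(t - theta0)).

    Assumes K > 0 and that C1, C2 are not both zero.
    """
    amplitude = math.hypot(c1, c2)   # sqrt(C1^2 + C2^2)
    eccentricity = amplitude / k
    theta0 = math.atan2(c2, c1)      # cos(theta0) = C1/amplitude, sin(theta0) = C2/amplitude
    return eccentricity, theta0

def u_general(theta, c1, c2, k):
    """General solution u = C1 cos(theta) + C2 sin(theta) + K of (A.10)."""
    return c1 * math.cos(theta) + c2 * math.sin(theta) + k

def distance(theta, eccentricity, theta0, k):
    """Distance r = 1/u = 1 / (K (1 + e cos(theta - theta0)))."""
    return 1.0 / (k * (1.0 + eccentricity * math.cos(theta - theta0)))

if __name__ == "__main__":
    c1, c2, k = 0.3, 0.4, 1.0        # assumed sample values in arbitrary units
    e, theta0 = orbit_parameters(c1, c2, k)
    print(f"e = {e:.3f}, theta0 = {theta0:.3f} rad")
    for theta in (0.0, 1.0, 2.0, 3.0):
        print(theta, 1.0 / u_general(theta, c1, c2, k), distance(theta, e, theta0, k))
```

The last two printed columns agree, so both forms of the solution give the same distance $r$ at each angle.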
Appendix B
Elliptical orbits
B.1 Eccentricity and focus of an ellipse
The equation of an ellipse with center at the origin and major and minor axes along the x and y axes, respectively, is given by:
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
An overview of the graph is provided in the figure below. If a = b, the above ellipse
becomes a circle. For convenience, we consider the case where a > b, making the
x-axis the major axis. The foci of the ellipse are located on the major axis. The distance between the center and each focus is given by
\[ c = \sqrt{a^2 - b^2}. \]
Thus, the foci are at $(\pm c, 0)$. The eccentricity, which indicates how far the ellipse deviates from a circle, is given by:
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \]
If $e = 0$, then $a = b$, and the ellipse becomes a circle. If $e = 1$, then $b = 0$, and the shape is no longer an ellipse. Therefore, the eccentricity of an ellipse satisfies $0 \le e < 1$.
Problem B.1. Show that if a point P(x, y) lies on the ellipse, then the sum of the
distances between this point and the two foci is always constant.
Solution B.1 □
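The property in Problem B.1 can be observed numerically before proving it. The Python sketch below uses the parametrization $P = (a \cos t, b \sin t)$, which satisfies the ellipse equation for every $t$, and the sample semi-axes $a = 5$, $b = 3$; these values and the function names are assumptions chosen for illustration. This is an illustration, not a proof.

```python
import math

def focal_data(a, b):
    """Return (c, e) for the ellipse x^2/a^2 + y^2/b^2 = 1 with a > b > 0."""
    c = math.sqrt(a * a - b * b)   # distance from the center to each focus
    return c, c / a                # eccentricity e = c/a

def focal_distance_sum(t, a, b):
    """Sum of the distances from P = (a cos t, b sin t) to the foci (+c, 0) and (-c, 0)."""
    c, _ = focal_data(a, b)
    x, y = a * math.cos(t), b * math.sin(t)
    return math.hypot(x - c, y) + math.hypot(x + c, y)

if __name__ == "__main__":
    a, b = 5.0, 3.0   # assumed sample semi-axes
    print("c, e =", focal_data(a, b))
    for t in (0.0, 0.5, 1.5, 3.0):
        print(f"t = {t}: sum of distances = {focal_distance_sum(t, a, b):.12f}")
```

For every sampled point the sum equals $2a$ up to rounding, which suggests the value of the constant in Problem B.1.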
The equation of a hyperbola (a pair of branches) with center at the origin and foci on the x-axis is given by:
\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \]
In this case, because the coefficient of $y^2$ is negative, the foci lie on the x-axis. Refer to the figure for an overview of the graph. The distance between the center and each focus of the hyperbola is given by
\[ c = \sqrt{a^2 + b^2}. \]
Thus, the foci are at $(\pm c, 0)$. The eccentricity of the hyperbola is similarly defined as:
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 + b^2}{a^2}}. \]
The eccentricity of a hyperbola is greater than 1.
Problem B.2. Show that if a point P(x, y) lies on the hyperbola, then the difference
between the distances from this point to the two foci is always constant.
Solution B.2 □
B.2 Directrices and ellipses
Consider the following figure. The line $x = k$ is called the directrix of the trajectory of the point $P(x, y)$ that we are going to obtain in this section. The length of the segment $OP$ is denoted by $r$ and is given by $r = \sqrt{x^2 + y^2}$. The length of $PD$, the horizontal distance from $P$ to the directrix, is $k - x$. For a number $e$, we look for the curve traced by the points $P(x, y)$ that satisfy
\[ r = e\,PD. \tag{B.1} \]
Such a point satisfies
\[ \sqrt{x^2 + y^2} = e(k - x) \quad \Rightarrow \quad x^2 + y^2 = e^2 (k^2 - 2kx + x^2), \]
which can be rewritten as
\[ (1 - e^2) x^2 + 2ke^2 x + y^2 = e^2 k^2. \tag{B.2} \]
Depending on the value of $e$, we obtain three kinds of curves. We will soon see that $e$ is the eccentricity of these curves, which is why we denote it by $e$. We split the problem into three cases.
Problem B.3 (Case 1. e = 1). Show that if $e = 1$, the trajectory described by (B.2) is a parabola.
Solution B.3 If $e = 1$, the $x^2$ term in (B.2) vanishes and the equation becomes $2kx + y^2 = k^2$, that is,
\[ x = \frac{k}{2} - \frac{1}{2k} y^2. \]
This is a parabola. We know that the eccentricity of a parabola is 1. □
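Next, we assume $e \neq 1$. Dividing (B.2) by $1 - e^2$ and completing the square in $x$, we obtain
\[ \left( x + \frac{ke^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{1 - e^2} + \frac{k^2 e^4}{(1 - e^2)^2}. \]
Simplify the right side and obtain
\[ \left( x + \frac{ke^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{(1 - e^2)^2}. \tag{B.3} \]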
Problem B.4 (Case 2. 0 < e < 1). Show that if $0 < e < 1$, the trajectory described by (B.2) is an ellipse, the origin is one of the two foci, and $e$ is the eccentricity of the ellipse.
Solution B.4 Suppose that $0 < e < 1$. Then, since $1 - e^2 > 0$, we may set
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2 (1 - e^2) = \frac{e^2 k^2}{1 - e^2}, \qquad c = \frac{ke^2}{1 - e^2} > 0. \tag{B.4} \]
Divide (B.3) by $a^2$ and obtain
\[ \frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \]
which is an ellipse. The center of the ellipse is $(-c, 0)$.
The eccentricity of the ellipse is defined as $\sqrt{(a^2 - b^2)/a^2}$. We see that
\[ \frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2 (1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \tag{B.5} \]
Hence, the eccentricity of the ellipse is $e$. The distance from the center of an ellipse to a focus is $\sqrt{a^2 - b^2}$. Using (B.5), we obtain
\[ a^2 - b^2 = a^2 - a^2 (1 - e^2) = e^2 a^2 = \frac{k^2 e^4}{(1 - e^2)^2} = c^2. \]
Therefore, $c$ in (B.4) is this distance. Since the center is $(-c, 0)$, the foci are at $(-c \pm c, 0)$, that is, at $(0, 0)$ and $(-2c, 0)$. Hence, the origin is one of the two foci of the ellipse. □
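The formulas in (B.4) can be checked on a concrete case. The Python sketch below assumes the sample values $e = 0.5$ and $k = 3$ (which give $a = 2$, $b^2 = 3$, $c = 1$) and hypothetical function names; it samples points of the ellipse $(x + c)^2/a^2 + y^2/b^2 = 1$ and evaluates $r - e\,PD$, which (B.1) says should vanish.

```python
import math

def ellipse_from_directrix(e, k):
    """Return (a, b, c) as defined in (B.4), assuming 0 < e < 1 and k > 0."""
    a = e * k / (1.0 - e * e)
    b = a * math.sqrt(1.0 - e * e)
    c = k * e * e / (1.0 - e * e)
    return a, b, c

def focus_directrix_gap(t, e, k):
    """Return r - e*PD at the point (-c + a cos t, b sin t) of the ellipse."""
    a, b, c = ellipse_from_directrix(e, k)
    x, y = -c + a * math.cos(t), b * math.sin(t)
    return math.hypot(x, y) - e * (k - x)

if __name__ == "__main__":
    e, k = 0.5, 3.0   # assumed sample values
    a, b, c = ellipse_from_directrix(e, k)
    print(f"a = {a}, b = {b:.6f}, c = {c}")
    print("foci:", (-c + c, 0.0), (-c - c, 0.0))
    print("largest |r - e*PD|:",
          max(abs(focus_directrix_gap(0.1 * n, e, k)) for n in range(63)))
```

The gap is zero up to rounding at every sampled point, and one of the two printed foci is the origin.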
Problem B.5 (Case 3. e > 1). Show that if $e > 1$, the trajectory described by (B.2) is a branch of a hyperbola, the origin is one of the two foci, and $e$ is the eccentricity of the hyperbola.
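Solution B.5 Suppose that $e > 1$. Then, since $1 - e^2 < 0$, the value of $b^2$ in (B.4) would be negative, so we cannot use (B.4). Instead, we take
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2 (e^2 - 1) = \frac{e^2 k^2}{e^2 - 1}, \qquad c = \frac{ke^2}{1 - e^2} < 0. \]
Divide (B.3) by $a^2$ and obtain
\[ \frac{(x + c)^2}{a^2} - \frac{y^2}{b^2} = 1. \]
This is a hyperbola with center $(-c, 0)$. Because (B.1) requires $k - x \ge 0$, only the branch with $x \le k$ is traced. Since $(a^2 + b^2)/a^2 = 1 + (e^2 - 1) = e^2$, $e$ is still the eccentricity of the hyperbola, and $\sqrt{a^2 + b^2} = ea = -c$ is the distance between the center and a focus. Hence, the origin is one of the two foci. □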
B.3 Polar equations of an ellipse
One simple method of representing an elliptical orbit using polar coordinates is
given by (B.1). In this case, PD is equal to k − r cos θ , so using this expression, we
obtain
r = e(k − r cos θ ).
Solving this equation for r, we get:
\[ r = \frac{ek}{1 + e \cos \theta}. \]
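This equation represents an ellipse with eccentricity $e$ when $0 < e < 1$, a parabola when $e = 1$, and a hyperbola when $e > 1$. The position $k$ of the directrix depends on the eccentricity $e$. Comparing with the formula for $r$ obtained in Section A.3 gives $ek = L^2 / (G(m_1 + m_2))$, so when the angular momentum $L$ and the eccentricity are given, $k$ can be expressed as follows:
\[ k = \frac{L^2}{e G (m_1 + m_2)}. \]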
Also, remember that when the total energy $E_{\text{total}}$ and the angular momentum $L$ are given, the eccentricity is given by (12.6). That is,
\[ e = \sqrt{1 + \frac{2 E_{\text{total}} L^2}{m_1 G^2 m_2^2}}. \]
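The polar equation is easy to evaluate directly. The following Python sketch uses hypothetical function names and the assumed sample values $e = 0.5$, $k = 3$; it computes $r(\theta)$, checks it against the defining relation $r = e(k - r \cos \theta)$, and classifies the curve by its eccentricity. For $e \ge 1$ some directions contain no point of the curve, which the sketch reports as an error.

```python
import math

def polar_radius(theta, e, k):
    """r = e k / (1 + e cos(theta)); valid only where the denominator is positive."""
    denominator = 1.0 + e * math.cos(theta)
    if denominator <= 0.0:
        raise ValueError("no point of the curve in this direction")
    return e * k / denominator

def conic_type(e):
    """Classify the curve r = e k / (1 + e cos(theta)) by its eccentricity e > 0."""
    if e < 1.0:
        return "ellipse"
    if e == 1.0:
        return "parabola"
    return "hyperbola"

if __name__ == "__main__":
    e, k = 0.5, 3.0   # assumed sample values
    print(conic_type(e))
    nearest = polar_radius(0.0, e, k)        # point closest to the focus at the origin
    farthest = polar_radius(math.pi, e, k)   # point farthest from the focus
    print("nearest:", nearest, "farthest:", farthest, "major axis:", nearest + farthest)
```

With these sample values the nearest and farthest distances add up to the length $2a = 2ek/(1 - e^2)$ of the major axis from (B.4).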
Appendix C
Numerical experiments for Taylor series
In this final lecture, we observe how Taylor series approximates actual functions
through simple numerical coding. We also compare it with some other approximation methods.
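A minimal sketch of such an experiment is given below in Python. It assumes the function $\sin x$ expanded about 0 and the sample point $x = 2$; these choices and the name `taylor_sin` are illustrative assumptions, not the author's code.

```python
import math

def taylor_sin(x, degree):
    """Taylor polynomial of sin about 0, using all terms of degree <= `degree`."""
    total = 0.0
    for power in range(1, degree + 1, 2):          # odd powers 1, 3, 5, ...
        sign = -1.0 if (power // 2) % 2 else 1.0   # signs alternate: +, -, +, ...
        total += sign * x ** power / math.factorial(power)
    return total

if __name__ == "__main__":
    x = 2.0  # assumed sample point
    for degree in (1, 3, 5, 7, 9, 11):
        approx = taylor_sin(x, degree)
        print(f"degree {degree:2d}: {approx:+.10f}  error {abs(approx - math.sin(x)):.2e}")
```

Each additional term reduces the error at the sample point, and the reduction becomes faster as the factorial in the denominator grows.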
Index
absolute convergence, 180
acceleration in polar coordinates, 71
big-oh, 139
bijection, 45
binomial expansion, 197
Cauchy’s Fundamental Theorem of Calculus, 42
Cauchy’s Mean Value Theorem, 25
center of ellipse, 209
chain rule, 29
co-domain, 45
comparison test, 179
conditional convergence, 190, 191
continuity, 12
decreasing sequence, 171
differential equation, 77
directrix, 72, 210
domain, 45
eccentricity of ellipse, 71, 209
eccentricity of hyperbola, 210
focus of ellipse, 71, 209
function, 45
Fundamental Theorem of Algebra, 149
fundamental theorem of calculus, 42
gauge, 40
hyperbola, 210
implicit differentiation, 34
increasing sequence, 171
infimum, 171
injection, 45
integrability, 41
integral, 41
integral test, 177
Intermediate Value Theorem, 25
inverse function, 45
Kepler problem, 96
L’Hopital’s rule, 135, 136
left continuity, 14
left limit, 13
limit, 11
limit comparison, 179
limit infimum, 171
limit supremum, 171
linearization, 122
little-oh, 139
local property, 122
lower bound, 171
Mean Growth Rate Theorem, 25
Mean Value Theorem, 25
monotonicity, 171
natural logarithm, 49
Newton’s Second Law of Motion, 26
one-to-one function, 45
onto function, 45
partition, 40
Position, velocity, acceleration using polar coordinates, 71
range, 45
ratio test, 181
rearrangement, 191
Riemann sum, 41
right continuity, 14
right limit, 14
root test, 182
rules of continuity, 7
sandwich theorem, 169
separation of variables, 80
slope field, 78
supremum, 171
surjection, 45
Taylor polynomial, 194
Taylor series, 194
upper bound, 171