Yong-Jung Kim

25 Lectures for Undergraduate Calculus I

February 20, 2024

KAIST Department of Mathematical Sciences

To the ones who question

Foreword

For a long time, there has been a need to revise the first-year calculus curriculum. The main reason is not that calculus itself has changed, but that the current curriculum fails to reflect changes in the high school education that incoming students have experienced. In addition, recent advances in technology have altered what other departments that use mathematics expect from the course.

Calculus is a crucial subject that students encounter for the first time upon entering university. The emphasis has been placed on raising interest and enthusiasm for academic pursuits. To achieve this, instead of merely listing mathematical facts, a new structure has been adopted in which students pursue core objectives and acquire the mathematical facts along the way.

To this end, the first part of Calculus 1 places the understanding of Kepler’s laws as a core objective. In fact, Newton invented calculus for this purpose. Through this process, students become familiar with the basic concepts of calculus and learn about vectors in three-dimensional space, including velocity, acceleration, and gravity. In particular, the basic principles governing the orbits of satellites and planets have become a crucial educational topic for scientists in South Korea, especially after the successful third launch of Nuriho.

The latter part of Calculus 1 focuses on the development and understanding of approximation techniques. After the core techniques of integration and differentiation are covered in Part III, sequences and series are studied in Part IV from the perspective of approximation. In particular, students develop a mathematical understanding of approximation techniques, which is essential for engineers.
Pursuing scientific goals with a long-term perspective may feel more challenging, as it is an experience not typically encountered in middle and high school curricula. However, the practice of applying and developing various mathematical facts to achieve scientific goals is expected to be a valuable experience and will aid in the research life of a scientist.

Daejeon, August 2023
Calculus Curriculum Revision Committee

Preface

There is a fundamental difference between academic textbooks on university subjects and lecture notes used for teaching. Academic textbooks strive for completeness and accurate explanations of the essential parts, even if they cannot cover all related content. They also emphasize accessibility: a reader should be able to reach a needed section independently of the course progression, because a book is hard to use if one must reread everything that precedes a section in order to understand it.

In contrast, lecture notes are created for the purpose of teaching. They are designed for students who study the entire course together. The key difference from academic textbooks therefore lies in guiding students through the entire process. Effective communication, resembling a conversation between the lecturer and students in the classroom, is essential. Well-chosen questions and motivation that invite students to think can make learning more effective. Sometimes, motivating students with appropriate hints is more effective than providing detailed explanations, because it encourages them to find answers on their own and stimulates critical thinking and creativity.

“25 Lectures for Undergraduate Calculus I” adheres to the characteristics of lecture notes and is structured to facilitate communication between students and instructors.
Efforts have been made to construct it in a format where progress is made through appropriate questions and the process of self-understanding. Additionally, to allow for holidays and other factors, the notes are structured as 25 lectures, even though many universities run a semester course of up to 28 lectures, each lasting 75 minutes.

Questions serve as the driving force for learning and the starting point for creative thinking. This aligns with the QAIST education philosophy, which emphasizes the importance of questions. The structure of these lecture notes aims to replace summaries and proofs with questions and solutions. Only the essential summaries remain in the form of a structured presentation. Readers are encouraged to visualize each problem and take time to answer it before trying to understand the explanation. Continuous questioning is promoted. Attempting to answer these questions leads to a deeper understanding of the core concepts and encourages readers to formulate their own questions. Asking questions is the beginning of creating something new.

Yong Jung Kim

To KAIST Students Attending the Course

A semester consists of approximately 25 lectures, and this lecture note is also organized into 25 lectures. It is helpful to read the lecture content before attending the class and come prepared with questions. Even if time is limited, check the goals of the class before entering. For the problems in each lecture, try to answer them yourself before looking at the solutions. Then actively seek to understand the solutions. Each lecture contains several questions, so take some time to ponder them. It is advisable to keep a slightly slower pace while doing mathematics; unhurried reflection tends to yield better results. At the end of each of the 25 lectures, exercise problems are provided.
Although not numerous, they serve as a means to confirm and deepen your understanding of the material. If you find the practice problems insufficient, consider attempting problems from other general calculus books.

This lecture note was initially created in Korean and then translated into English with the assistance of ChatGPT. While the English version is the official one, you are welcome to use the Korean version.

Contents

Part I Differentiation: Mathematical Description of Motion
1 Limit and continuity #1
    1.1 Common Language Definitions
    1.2 Quality control and ε-δ arguments
2 Limit and continuity #2
    2.1 Rigorous definitions using ε-δ
    2.2 Examples
    2.3 Limits as x → ∞ and f(x) → ∞
3 Differentiation
    3.1 Rate of increase
    3.2 Differentiation Rules
    3.3 Intermediate and Mean Value Theorem
    3.4 Derivative of Trigonometric Functions
    3.5 Velocity and Acceleration
4 Chain rule and implicit differentiation
    4.1 Chain rule
    4.2 Implicit Differentiation
5 Integration & fundamental theorem of calculus
    5.1 Antiderivative
    5.2 Integral as the area bounded by a graph
    5.3 Riemann sum and area
6 Inverse functions and their derivatives
    6.1 Bijection (one-to-one and onto function)
    6.2 Derivative of inverse functions
    6.3 Natural logarithm
    6.4 Exponential function

Part II Kepler and Newton’s Laws of Motion
7 Rectangular coordinate system and curves in R^3
    7.1 Coordinate system
    7.2 Projection
    7.3 Moving particle and trajectory curves in space
    7.4 Cross product & inner product
8 Polar coordinates in R^2
    8.1 Variable change with polar coordinates
    8.2 Motion in polar coordinates
    8.3 Ellipses in polar coordinates
    8.4 Curves in polar coordinates
9 Differential Equations
    9.1 First order differential equations
    9.2 Separation of variables
    9.3 Integrating factor
    9.4 Second Order Differential Equations
    9.5 Equation for two-body problem
10 Newton’s law on Earth
    10.1 Newton’s law of motion and gravitation
    10.2 Work and energy
    10.3 Gravity force and potential energy
    10.4 Projectile motion
11 Newton’s law in space: Two-body problem
    11.1 Kepler’s laws
    11.2 Two-body problem
    11.3 Center of mass
    11.4 Displacement vector
    11.5 Kepler problem
12 Kepler’s law and the energy of planets
    12.1 Energy of circular orbits
    12.2 Energy of elliptical orbits
    12.3 Circular orbit of satellites
    12.4 Elliptical orbits of satellites
    12.5 Interstellar and solar system object

Part III The Arts of Calculus
13 Curves and particle trajectories in R^3
    13.1 Arc length as a variable
    13.2 Parametrization with arc length
    13.3 TNB coordinate system
    13.4 Computation formulas
    13.5 Exercises
14 Linearization and differentials
    14.1 Linearization
    14.2 Differentials
    14.3 Differentials for linear approximation
15 Inverse trigonometric and hyperbolic functions
    15.0.1 Inverse trigonometric functions
    15.1 Hyperbolic functions
16 L’Hopital’s rule, big-oh, and little-oh
    16.1 L’Hopital’s rule
    16.2 Big-oh and Little-oh
17 Integration Techniques #1
    17.1 Substitution
    17.2 Integration by parts
18 Integration Techniques #2
    18.1 Trigonometric substitution
    18.2 Integration of rational functions
19 Integration Techniques #3
    19.1 Improper integrals
    19.2 Integration with software

Part IV Approximation Techniques and Series
20 Numerical Integration
    20.1 Numerical integration and Riemann sum
    20.2 Convergence order
    20.3 Numerical integrals and Gaussian quadrature
21 Sequences and series
    21.1 Sequence of real numbers
    21.2 Series of real numbers
    21.3 Power series
22 Tests for absolute convergence
    22.1 Integral test
    22.2 Comparison test
    22.3 Ratio test
    22.4 Root test
23 Power series
    23.1 Convergence of a power series
    23.2 Radius of convergence
    23.3 Alternating series
    23.4 Rearrangement and conditional convergence
24 Taylor Series
    24.1 Taylor series
    24.2 Applications

A Second Order Differential Equations
    A.1 Second-order homogeneous linear equation
    A.2 Second order inhomogeneous linear equation
    A.3 Equation for two-body problem
B Elliptical orbits
    B.1 Eccentricity and focus of an ellipse
    B.2 Directrices and ellipses
    B.3 Polar equations of an ellipse
C Numerical experiments for Taylor series
Index

Part I Differentiation: Mathematical Description of Motion

During the time of Newton (1643-1727), the most intriguing scientific topic was the motion of celestial bodies. The geocentric theory had collapsed due to Galileo (1564-1642), and the heliocentric theory had started to be accepted, thanks to Kepler (1571-1630), who explained the orbits and motions of celestial bodies. The study of celestial motion was therefore the hottest topic of that era. However, the explanations were based on observational data and could not address the underlying causes. Moreover, there were no mathematical tools suitable for describing such dynamic celestial motions. Newton was the one who developed such tools: in contrast to static results such as Pythagoras’s theorem, the calculus he introduced is suited to describing dynamic phenomena. Newton used calculus to explain the motion of celestial bodies.

In Part I, our goal is to develop the concept of differentiation, the means to represent such dynamic motion. Studying it from the perspective Newton had when developing calculus may help us better understand its value.
In the following chapters of Part II, we will use differentiation to describe mathematically and derive the orbits of planets and satellites. We will then use this knowledge to explain celestial motion, including Kepler’s laws. In this process, we will grasp the concept of differentiation and experience how to apply it in specific situations.

Of course, differentiation is used not only in celestial motion but also in various other areas. While studying celestial motion is fascinating, we should not limit ourselves to that perspective. We will learn the fundamental aspects of calculus from various viewpoints. In particular, we will rigorously handle concepts such as convergence, continuity, and differentiability using the ε-δ technique. This technique, developed after Newton, allows us to deal with differentiation more rigorously than Newton did.

Lecture 1
Limit and continuity #1

In calculus, limits are frequently used. Not only do we define differentiation and integration through limits, but we also understand fundamental concepts like convergence and continuity through limits. In practical applications, one obtains an approximate value more often than an exact true value. In such cases, a crucial question arises: “Does the approximate value converge to the true value as accuracy increases, or is there a limit beyond which it cannot approach?” Limits and convergence are not exclusive to calculus; they are core concepts recurring in various fields. Therefore, a clear understanding of these concepts is crucial. In this lecture, we learn about limits, convergence, and continuity using everyday language.

Use of Symbols

Although mathematics is considered the study of numbers, it often expresses many things in terms of symbols rather than actual numbers. Notations like $f$, $g$ are used to represent functions, $x$, $y$, $z$ for variables, and $a$, $b$, $c$ for constants.
However, Latin letters alone are not sufficient, and Greek letters are frequently employed. Here are some commonly used Greek lowercase letters:
1. α alpha, β beta, γ gamma, ω omega, σ sigma, θ theta, ρ rho, φ phi,
2. ε epsilon, δ delta, λ lambda, τ tau
And commonly used Greek uppercase letters:
1. Ω Omega, Σ Sigma, ∆ Delta

1.1 Common Language Definitions

Let’s understand the properties of limits and continuity using everyday language. Consider a function $f$ with real values defined on the interval $[a, b] \subset \mathbb{R}$. In notation, we write this as
\[ f : [a, b] \to \mathbb{R}. \]
We refer to $[a, b]$ as a closed interval; it includes both $a$ and $b$ and all real numbers in between. It is denoted as $[a, b] = \{x \in \mathbb{R} : a \le x \le b\}$. An open interval, denoted as $(a, b)$, excludes the endpoints.

If a variable $x \in [a, b]$ approaches a specific value $c \in (a, b)$, and the function value $f(x)$ approaches a certain value $L$, we say that the limit of $f(x)$ as $x$ approaches $c$ is $L$, denoted as
\[ \lim_{x \to c} f(x) = L. \]
We also say that, as $x$ approaches $c$, $f(x)$ converges to $L$. This convergence statement holds regardless of the direction from which $x$ approaches $c$. In one-dimensional space $\mathbb{R}$, there are only two directions: left and right. The right-hand limit is denoted as
\[ \lim_{x \to c^+} f(x) = L. \]
Here, $c^+$ means that $x$ approaches $c$ always from the right, that is, through values greater than $c$. Similarly, the left-hand limit is denoted as
\[ \lim_{x \to c^-} f(x) = L. \]
In summary, the limit $\lim_{x \to c} f(x) = L$ means that both the left-hand and right-hand limits exist and equal $L$. That is,
\[ \lim_{x \to c^-} f(x) = \lim_{x \to c^+} f(x) = L. \]
When discussing limits and convergence, the specific value $f(c)$ has no relevance. The focus is on the behavior of the function as $x$ approaches $c$.

On the other hand, the continuity of a function $f$ at $c$ is related to the function value $f(c)$. If the limit exists as $x$ approaches $c$, and the limit value $L$ is equal to $f(c)$, we say that $f$ is continuous at $c$.
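
The one-sided limits described above can be explored numerically. The following is a minimal Python sketch, not part of the original notes: the helper `approach` and the piecewise function `jump` are hypothetical names introduced only for illustration. Sampling values near $c$ can suggest what a limit might be, but it does not prove anything.

```python
def approach(f, c, side, steps=6):
    """Evaluate f at c + side * 10**(-k) for k = 1..steps.

    side = +1 samples to the right of c, side = -1 samples to the left.
    """
    return [f(c + side * 10.0 ** (-k)) for k in range(1, steps + 1)]


def jump(x):
    """Hypothetical example: equals x to the left of 1 and x + 1 from 1 onward."""
    return x if x < 1 else x + 1


left_values = approach(jump, 1.0, -1)   # samples with x < 1
right_values = approach(jump, 1.0, +1)  # samples with x > 1

print("from the left: ", left_values)
print("from the right:", right_values)
```

In this illustration the samples taken from the left approach 1 while the samples taken from the right approach 2, so the two one-sided limits of this example function at $c = 1$ differ.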
In summary, saying that $f$ is continuous at $c$ means that the following four quantities exist and are all equal:
\[ \lim_{x \to c^+} f(x) = \lim_{x \to c^-} f(x) = \lim_{x \to c} f(x) = f(c). \]
Now, let’s consider the first problem in this lecture:

Problem 1.1. Find the left and right limits of the given functions at the specified points $c$ and determine whether the functions are continuous at those points.
(1) $\sin x$ at $c = 0$. (2) $\cos x$ at $c = 0$. (3) $\frac{x^2 - 1}{x - 1}$ at $c = 1$. (4) $\frac{1}{x - 1}$ at $c = 1$.
(5) $f(x)$ at $c = 0$, where $f(x)$ is the Heaviside function given by
\[ f(x) = \begin{cases} 0, & \text{if } x < 0, \\ 1, & \text{if } x \ge 0. \end{cases} \]

After attempting to answer the questions, it’s beneficial to review the solutions to enhance mathematical understanding.

Solution 1.1 Knowing the graphs of the functions helps determine the one-sided limits easily.
(1) $\lim_{x \to 0} \sin x = 0$.
(2) $\lim_{x \to 0} \cos x = 1$. (Both limits in these examples equal the function value at $c = 0$.)
(3) $\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$, since $\frac{x^2 - 1}{x - 1} = x + 1$ for $x \ne 1$. (Note that $\frac{x^2 - 1}{x - 1}$ is undefined at $x = 1$ due to division by zero, but the limit exists.)
(4) The limit $\lim_{x \to 1} \frac{1}{x - 1}$ does not exist. (As $x \to 1$, the function value diverges. Therefore, there is no specific number $L$ corresponding to the limit.)
(5) The limit $\lim_{x \to 0} f(x)$ does not exist. The function $f(x)$ converges to 1 as $x$ approaches 0 from the right and converges to 0 as $x$ approaches 0 from the left. Since the right-hand limit and the left-hand limit are different, the limit does not exist. □

Mathematical principles and laws consist of conditions and results. This is true not only in mathematics but also in most sciences. However, at times the conditions are not stated explicitly, because they are clear from what the conclusion means. Let’s practice reading mathematical laws by looking at the following principles.

Problem 1.2 (Rules of limits). Below are several laws related to limits written in the form of expressions.
To understand the meaning of these expressions, one must be able to read what the conditions are and what the conclusions are. Explain the meaning of each law by identifying and distinguishing its conditions and conclusions.
(1) $\lim_{x \to c} (f(x) + g(x)) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x)$.
(2) $\lim_{x \to c} (f(x) - g(x)) = \lim_{x \to c} f(x) - \lim_{x \to c} g(x)$.
(3) $\lim_{x \to c} (k f(x)) = k \left( \lim_{x \to c} f(x) \right)$, where $k$ is a constant.
(4) $\lim_{x \to c} (f(x) g(x)) = \left( \lim_{x \to c} f(x) \right) \left( \lim_{x \to c} g(x) \right)$.

Solution 1.2 Let’s consider the obvious conditions and conclusions for these expressions to have meaning. For the first expression (1), the condition for it to have meaning is simply “both limits $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist.” The conclusion it wants to convey is “the limit of the function $f + g$ also exists, and that limit is the sum of the two limits, $\lim_{x \to c} f(x) + \lim_{x \to c} g(x)$.” The expression could be read differently, but other readings are not very natural, and arguing for an unnatural reading is not helpful. Now, explain the conditions and results for the remaining three cases in the same way. □

The following is a question. Unlike problems, questions are not intended to have mathematical answers, but rather to make you think about principles, provide motivation, or sometimes raise slightly philosophical issues. Some answers are provided below, while others are not.

Question 1.1. If you have proven the four laws above, what is the difference between proving and explaining?

Expressing mathematical facts in everyday language and understanding them in your own terms is crucial. It is a core part of the process of understanding mathematics. However, an explanation given in everyday language may feel somewhat lacking as a proof. The next lecture will introduce the ε-δ method for a more rigorous approach.

Problem 1.3 (Limit of quotient). The quotient rule related to limits is as follows:
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}. \]
For this rule to have meaning, conditions are needed. State the necessary conditions and explain the conclusion.

Solution 1.3 This case differs from the four cases of Problem 1.2, so it is treated separately here. If we discuss it in a similar way, the condition is that both limits $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ exist. The conclusion is that the limit of the quotient $\frac{f(x)}{g(x)}$ also exists, and that limit is $\frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}$. However, there is an issue here. In a fraction, the denominator must not be 0. Therefore, an additional condition is needed, namely $\lim_{x \to c} g(x) \ne 0$. Additionally, $g(x) \ne 0$ must hold for $x$ near $c$ so that the left-hand side has meaning. These two conditions are required. □

One of the most commonly used and important functions is the power function. Given a real number $\alpha$ as the power, the function is defined as $f(x) = x^\alpha$. The composite function $F(x) = (f(x))^\alpha$ also frequently appears. When dealing with the limits of these functions, certain precautions need to be taken. In contrast, exponential functions have a constant base and a variable power, such as $f(x) = 2^x$ or, in general, $f(x) = a^x$.

Problem 1.4 (Limit of power functions). Consider the following power rule:
\[ \lim_{x \to c} (f(x))^\alpha = \left( \lim_{x \to c} f(x) \right)^\alpha. \]
Explain the meaning of this rule and state the necessary conditions for the rule to hold.

Solution 1.4 For a power function as in the problem, the additional conditions depend on the value of $\alpha$.
(1) If $\alpha$ is a positive integer, the necessary condition is that the limit $\lim_{x \to c} f(x)$ exists.
(2) If $\alpha$ is a negative integer, the expression is a fraction, so in addition the limit must not be 0 and $f(x)$ must not be 0 in the vicinity of $c$.
(3) If $\alpha$ is a positive real number, in addition to the condition that the limit $\lim_{x \to c} f(x)$ exists, $f(x)$ must be either 0 or positive, to avoid taking a non-integer power of a negative number.
(4) If $\alpha$ is a negative real number, in addition to the condition that the limit $\lim_{x \to c} f(x)$ exists, $f(x)$ must be positive, and its limit $\lim_{x \to c} f(x)$ must also be positive to avoid division by 0. □

Knowing whether a function is continuous or not is crucial. The laws mentioned above for limits are directly used to determine the continuity of the following six functions.

Problem 1.5 (Rules of continuity). Assume two functions $f$ and $g$ are continuous at $c$ and $k \in \mathbb{R}$ is a real number. Then, show that the following six functions are continuous at $c$. In some cases, additional conditions may be needed. Specify which cases require additional conditions and what those conditions are.
(1) $f + g$ (2) $f - g$ (3) $k f$ (4) $f g$ (5) $f / g$ (6) $f^k$

Solution 1.5 (1-4) The first four cases follow from the four cases in Problem 1.2 without the need for additional conditions. (5) For the quotient $f / g$, the additional condition is $g(c) \ne 0$, to avoid division by zero. (6) For the power function $f^k$, the additional conditions depend on $k$. If $k$ is a positive integer, no additional conditions are needed. If $k$ is a negative integer, an additional condition is $f(c) \ne 0$. If $k$ is a non-integer real number, an additional condition is that $f$ should not be negative, to avoid issues such as the square root of a negative number. □

Question 1.2. In the answers for (5) and (6), the condition that the function ($g$ in (5), $f$ in (6)) should not be zero near $c$ was not added. It was added when explaining limits; why is it not necessary when explaining continuity?

Problem 1.6 (Continuity of a composition). Let $g : \mathbb{R} \to \mathbb{R}$ be continuous at $c \in \mathbb{R}$ and $f : \mathbb{R} \to \mathbb{R}$ be continuous at $g(c) \in \mathbb{R}$. Then the composite function $(f \circ g)(x) := f(g(x))$ is continuous at $c$. Does this statement seem correct?
Explain in your own words.

Solution 1.6 Since $g$ is continuous at $c$, as $x$ approaches $c$, $g(x)$ approaches $g(c)$. Also, since $f$ is continuous at $g(c)$, as $g(x)$ approaches $g(c)$, $f(g(x))$ approaches $f(g(c))$. Therefore, the composite function $f(g(x))$ is continuous at $c$. □

1.2 Quality control and ε-δ arguments

This section aims to assist in understanding the ε-δ argument. The ε-δ argument is a very natural concept that anyone can become familiar with. For example, when a factory produces goods, or when a process is adjusted to generate the appropriate output, there must be proper input. In such cases, maintaining the quality of the input is essential to uphold the quality of the output.

Problem 1.7 (Quality control). Suppose a factory produces products, and when there is no impurity in the input material, the defect rate is 0%. If impurities of $x$ µg per 1 g of input material are mixed, the defect rate is $\sqrt{x}$%. (1) The factory manager received an instruction to reduce the defect rate to 4% ($\varepsilon = 4\%$) or less. How much should the impurity be kept below? (2) If an instruction is given to reduce the defect rate to 0.4% or less, how much should the impurity be kept below?

Solution 1.7 (1) The upper limit condition for the defect rate, $\sqrt{x} < 4$, gives $x < 16$. Therefore, the impurity should be kept below 16 µg ($= \delta$) per 1 g. (2) Similarly, $\sqrt{x} < 0.4$ yields $x < 0.16$. Therefore, the impurity should be kept below 0.16 µg ($= \delta$) per 1 g. In the given problem situation, to reduce the defect rate by a factor of 10, the impurity must be reduced by a factor of 100. □

When the directive to reduce the defect rate is given, the factory manager must know how much the impurity should be reduced for this purpose. Such situations are common in our surroundings. The desired error range of the output is traditionally denoted by $\varepsilon > 0$, and the adjustment range of the input to achieve it is denoted by $\delta > 0$. In many cases, reducing the amount of impurity to zero is impossible.
What can be done is to minimize the adjustment range.

Problem 1.8 (Quality control with continuity). In a vinegar factory using traditional fermentation methods, the ideal acidity is pH 3. To achieve this, the ingredients need to be fermented appropriately, and in the manufacturing environment of this factory, when the fermentation efficiency is 36%, the pH becomes 3. If we denote the acidity pH by y and the fermentation efficiency percentage by x, we can express this relationship as the function y = f(x). The relationship between them in this factory is stated as y = √x/2. The manager instructed the factory manager to adjust the fermentation efficiency so that the acidity stays within 0.1 above and below pH 3. In that case, in what range should the fermentation efficiency be adjusted by the factory manager?

Solution 1.8 First, the upper limit of fermentation efficiency is given by the relation y = √x/2 < 3.1, and therefore, the upper limit of fermentation efficiency is x < (6.2)^2 = 38.44. The lower limit is given by the relation y = √x/2 > 2.9, and therefore, the lower limit of fermentation efficiency is x > (5.8)^2 = 33.64. In other words, the fermentation efficiency is allowed to be up to 2.44 percentage points larger or 2.36 percentage points smaller than the optimal fermentation efficiency of 36%, but it should not exceed that range. □

In the above example, the difference between the optimal fermentation efficiency of 36% and the upper limit is different from the difference with the lower limit. In most cases, this is true. Choosing the smaller one, let's set δ = 2.36 and choose the tolerance range of acidity from the manager's office as ε = 0.1. Then, we can express it as follows.

\[ |f(x) - 3| < \varepsilon \quad \text{if} \quad |x - 36| < \delta. \tag{1.1} \]

If the boundaries of the upper and lower limits are different, it may seem like losing information to choose the smaller one, but sometimes it is more convenient or there is no choice but to do so. We use such relationships to define many things, and these are collectively referred to as the ε-δ method.

Question 1.3. In the given problem, even if the manager provides a very small tolerance range ε > 0, can we determine the adjustment range δ > 0 that satisfies (1.1)? How can this be proven?

Assuming that not a specific number but an arbitrary ε (0 < ε < 3) is given, similar calculations are performed to determine δ. Let's try it. The output y should be within the range 3 − ε < y < 3 + ε, so y = √x/2 is rearranged as follows:

\[ 3 - \varepsilon < \frac{\sqrt{x}}{2} < 3 + \varepsilon \;\Longrightarrow\; 36 - (24\varepsilon - 4\varepsilon^2) < x < 36 + (24\varepsilon + 4\varepsilon^2). \]

Therefore, δ can be set as 24ε − 4ε^2 or smaller. As shown here, usually, when the error limit ε is given, the adjustment range δ is determined based on ε.

Question 1.4. In the given problem, the reason why it is possible to choose δ > 0 for every ε > 0 is that f(x) = √x/2 is continuous at c = 36. Can you see why?

The following problem deals with a situation where defective products are produced based on different relationships depending on whether there are impurities or not.

Problem 1.9 (Quality control with discontinuity). Suppose that when impurities are mixed in 1 g of input material at a rate of x µg, the probability of defective products is given by (1 + √x)%. If there are no impurities at all, the probability of defective products is 0 (refer to the figure). In this case, let's reconsider parts (1) and (2) of Problem 1.7.

Solution 1.9 (1) The upper limit condition for the defect rate, 1 + √x < 4, gives x < 9. Therefore, the amount of impurities should be kept below 9 µg/g (= δ). (2) Similarly, 1 + √x < 0.4 yields √x < −0.6. However, √x cannot be negative, so there is no control range. In other words, there is no corresponding δ > 0.
□

In the above problem, when the error limit for the product was set to ε = 4, it was possible to determine the control limit δ. However, when ε = 0.4, it was not possible to determine the corresponding δ. This is because the defect rate function f(x) is discontinuous at x = 0. If it were continuous, such issues would not arise.

Problem 1.10. In the case of Problem 1.8, if the function f(x) is discontinuous at c, the manager can assign an impossible task to the factory manager (refer to the figure). The manager can specify ε > 0 so that it is impossible for the factory manager to determine δ > 0 satisfying (1.1). How small must ε > 0 be for this to happen?

Solution 1.10 If the function f(x) is discontinuous at c, then (when the one-sided limits exist) either the right-hand limit or the left-hand limit is different from the function value f(c). If ε is chosen to be smaller than the difference between such a limit and the function value, then no matter how small δ > 0 is chosen, the error cannot be kept below ε. □

Exercises

1. Find the following limits.
(1) lim_{x→c} (x^2 − c^2)/(x − c)  (2) lim_{x→0} x^π  (3) lim_{x→0} π^x

2. Let the input amount be x, and the output amount be f(x) = x^2. If the desired output amount is 100 and the error limit is ε = 1, what is the maximum range of adjustment for the input amount around x = 10? (Find δ > 0 such that |f(x) − f(10)| < ε whenever |x − 10| < δ.)

3. If lim_{x→c} f(x) = f(c), explain that there exists a control range δ > 0 even if a very small error limit ε > 0 is given.

4. Prove that if the function f(x) is continuous at x = c, and a very small error range ε > 0 is given, there exists a range of adjustment |x − c| < δ for x to ensure that f(x) is within the error range of f(c).

Lecture 2
Limit and continuity #2

To prove rather than explain, a proper definition is needed. In this lecture, we define the concepts of limit, convergence, and continuity using the ε-δ method.
The ε-δ method asks whether it is possible to keep the error in the output f(x) within ε by adjusting the input within a range δ. It is not very different from everyday language. It may feel awkward at first, but it can become familiar with a little effort.

2.1 Rigorous definitions using ε-δ

Now, using the ε-δ method, we aim to define limit, convergence, and continuity more rigorously. The concept is defined by determining whether it is possible to find an adjustment range δ > 0 that satisfies the error range ε in the output f(x). This definition is not much different from the one in everyday language. It is necessary to confirm and become familiar with the fact that the definitions in everyday language are not very different from the rigorous ones we are introducing.

Continuity is a fundamental concept that appears everywhere. One classical definition of continuity of the function f(x) at the point c is, "small changes in x near the point c produce only small changes in the function values f(x) near f(c)." However, this is still an explanatory definition and is not sufficient for rigorous proofs. The following rigorous definition is given using the ε-δ method.

Definition 2.1 (Limit and continuity). Let a function f : R → R and a point c ∈ R be given. We say that the number L is the limit of f(x) as x → c if, for any given ε > 0, there exists a corresponding number δ > 0 such that

\[ |f(x) - L| < \varepsilon \quad \text{whenever} \quad 0 < |x - c| < \delta. \tag{2.1} \]

We denote the limit as lim_{x→c} f(x) = L. We also say that f(x) is continuous at c if lim_{x→c} f(x) = f(c). We can also directly define continuity without using the limit, i.e., for any given ε > 0 there exists δ > 0 such that

\[ |f(x) - f(c)| < \varepsilon \quad \text{whenever} \quad |x - c| < \delta. \tag{2.2} \]

If f is continuous at all c ∈ (a, b), we say f is continuous in (a, b).

Read and understand Definition 2.1, and the next step is to perform simple calculations using this definition.
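Definition 2.1 can also be explored numerically. The following is a minimal sketch, not a proof: it samples points with 0 < |x − c| < δ and tests whether |f(x) − L| < ε holds at each of them. The helper name `delta_works` and the number of samples are illustrative assumptions; the example reuses f(x) = √x/2 at c = 36 from Problem 1.8 together with the choice δ = 24ε − 4ε^2 obtained in Lecture 1.

```python
import math

def delta_works(f, c, L, eps, delta, samples=1000):
    """Sample points with 0 < |x - c| < delta and test |f(x) - L| < eps.

    Passing this check is numerical evidence only, not a proof.
    """
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)      # guarantees 0 < h < delta
        for x in (c - h, c + h):
            if abs(f(x) - L) >= eps:
                return False
    return True

def f(x):
    # Acidity as a function of fermentation efficiency (Problem 1.8).
    return math.sqrt(x) / 2

for eps in (0.1, 0.01, 0.001):
    delta = 24 * eps - 4 * eps**2          # choice derived in Lecture 1
    print(eps, delta, delta_works(f, 36, 3, eps, delta))

# A delta that is too generous fails for eps = 0.1.
print(delta_works(f, 36, 3, 0.1, 3.0))
```

A sampled check like this can only reveal that a proposed δ fails; showing that a δ works for every x still requires the inequality argument.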
The purpose of the following problems is not only to provide obvious solutions but to make you feel that the above definition really means what seems obvious. It requires effort to become familiar with it.

Problem 2.1. For the function f(x) = 2x + 1, show the following:
(1) lim_{x→2} f(x) = 5.
(2) lim_{x→1} f(x) ≠ 2.
(3) f is continuous at any point c ∈ R.

Solution 2.1 Drawing the graph of the function f(x) = 2x + 1 and answering the above questions using everyday language is straightforward. Here, we use the ε-δ argument to explain. It helps to draw the graph. Let's go through them one by one.

(1) To show the limit lim_{x→2} f(x) = 5, we need to find a suitable δ > 0 for any given ε > 0. Since f(2) = 5, we have

\[ |f(x) - f(2)| = |2x + 1 - 5| = |2(x - 2)| = 2|x - 2| < \varepsilon \]

if and only if |x − 2| < ε/2. Therefore, we can choose δ to be ε/2. We haven't been explicitly told to choose δ as large as possible, so choosing it smaller than this is acceptable.

(2) To show that lim_{x→1} f(x) ≠ 2, we need to provide an ε > 0 for which there is no δ > 0. Since f(1) = 3 differs from 2 by 1, let's choose ε to be 1 or smaller. Let ε = 0.5. Then,

\[ |f(x) - 2| = |2x + 1 - 2| = |2(x - 1) + 1| = 2|x - 1| + 1 > 0.5 \]

for any x > 1. Therefore, no matter how small we choose δ > 0, there will always be x satisfying 0 < |x − 1| < δ such that |f(x) − 2| > ε. Thus, lim_{x→1} f(x) ≠ 2.

(3) Let's take an arbitrary c ∈ R. We want to show that the function f(x) = 2x + 1 is continuous at this point. Assume ε > 0 is given. Then, we choose δ = ε/2. (For first-degree functions like this one, we can choose the same δ for all points. In most cases, we need to choose a different δ depending on the location.) Now, assume |x − c| < δ. Then,

\[ |f(x) - f(c)| = |2x + 1 - 2c - 1| = 2|x - c| < 2 \cdot \frac{\varepsilon}{2} = \varepsilon. \]

Hence, for any given error limit ε > 0, we can find an adjusting limit δ, making f continuous at c.

Question 2.1.
Does the continuity defined in Definition 2.1 align with our general concept of continuity?

The ε-δ method is a dynamic expression. It's like playing a game. If you give me an error limit ε > 0, I can always find an adjusting range δ > 0 that satisfies (2.1) or (2.2). Therefore, proving means finding and demonstrating such δ > 0. In other words, proving is the technique of finding something, not just explaining. Showing that, for some ε > 0, there is no such δ > 0 is equivalent to proving that f is not continuous or does not converge to L. Now, proof is not about explaining well in verbal language but about the skill of finding something.

If you give a large ε > 0, it is usually easier to find δ > 0. However, if f is continuous or converges, you can find δ > 0 even if you give a very small ε > 0. So, we start by assuming any ε > 0, and if we can find the corresponding δ > 0, it means that L is the limit of f(x), or that f is continuous at c. If L is not a limit, or f is not continuous, it means we cannot find such δ > 0 when ε is sufficiently small.

Problem 2.2. In the previous definition of the limit, we assumed that the function f is defined for the entire real numbers R. However, upon closer inspection, the definition shows that even if f is defined only in a small interval, the limit at every c in that interval is well-defined. Explain the reason for this.

Solution 2.2 Looking at the relation (2.1), we can see that if f is defined only in an open interval (a, b) and c ∈ (a, b), the definition is still valid. The reason is that if δ is small enough, the points satisfying 0 < |x − c| < δ will lie inside (a, b). As long as f is defined there, the definition holds. However, if the domain is a closed interval [a, b] and c is one of the endpoints, then we need to consider left or right limits and redefine the limit. In this case, for the limit to be well-defined, one-sided limits need to be considered.
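The ε-δ game described above can be played numerically for f(x) = 2x + 1 from Problem 2.1. The sketch below is only an illustration under stated assumptions: the helper `find_violation` is a hypothetical name, and it searches a finite sample of points, so finding no violation is evidence rather than proof, while finding one is a genuine counterexample for that δ.

```python
def f(x):
    return 2 * x + 1

def find_violation(f, c, L, eps, delta, samples=1000):
    """Return some x with 0 < |x - c| < delta and |f(x) - L| >= eps, or None."""
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)      # guarantees 0 < h < delta
        for x in (c - h, c + h):
            if abs(f(x) - L) >= eps:
                return x
    return None

# Problem 2.1 (1): the reply delta = eps / 2 survives every sampled x.
for eps in (1.0, 0.1, 0.001):
    print(eps, find_violation(f, 2, 5, eps, eps / 2))

# Problem 2.1 (2): for the wrong candidate L = 2 and eps = 0.5,
# each delta tried still admits a violating x.
for delta in (1.0, 0.1, 0.001):
    print(delta, find_violation(f, 1, 2, 0.5, delta) is not None)
```

In the first loop the second player wins every round; in the second loop the challenger wins no matter how small δ is made, which is exactly what lim_{x→1} f(x) ≠ 2 means.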
Now, it is possible to define precisely the right and left limits that were described in everyday language. Let's create a definition. Creating definitions can be a useful exercise.

Problem 2.3. Use the ε-δ method to define the right limit and left limit.

Solution 2.3 The definitions can vary slightly, but the essence should be included.

(i) Left limit: Let f : (a, b) → R and c ∈ (a, b). We say L is the left limit of f(x) as x → c− and write

\[ \lim_{x \to c^-} f(x) = L, \]

if, for any ε > 0, there exists δ > 0 such that |f(x) − L| < ε whenever 0 < c − x < δ.

(ii) Right limit: Let f : (c, b) → R for some c < b. We say L is the right limit of f(x) as x → c+ and write

\[ \lim_{x \to c^+} f(x) = L, \]

if, for any ε > 0, there exists δ > 0 such that |f(x) − L| < ε whenever 0 < x − c < δ.

Check if these definitions accurately reflect the meaning of the right limit and left limit.

Problem 2.4 (Left continuity and right continuity). Define left continuity and right continuity using the ε-δ method.

Solution 2.4 Using the previously defined left limit and right limit, we can define left continuity and right continuity as follows:

Definition A: Let f : (a, b) → R and c ∈ (a, b). We say that f(x) is left-continuous at c if lim_{x→c−} f(x) = f(c). We say f(x) is left-continuous on (a, b) if f is left-continuous at all c ∈ (a, b). Right-continuity is defined similarly.

Without using right and left limits explicitly, we can define left continuity and right continuity using the ε-δ method by simply allowing the case x = c.

Definition B: Let f : (a, b) → R and c ∈ (a, b). We say that f(x) is left-continuous at c if, for any given ε > 0, there exists δ > 0 such that |f(x) − f(c)| < ε whenever 0 ≤ c − x < δ. We say that f(x) is right-continuous at c if, for any given ε > 0, there exists δ > 0 such that |f(x) − f(c)| < ε whenever 0 ≤ x − c < δ. We say f is left (right) continuous in (a, b) if it is left (right) continuous at all c ∈ (a, b).
The difference between the two definitions lies in substituting L with f(c) and including the case 0 = x − c by using 0 ≤ x − c < δ instead of 0 < x − c < δ. □

2.2 Examples

Using the ε-δ method, we can prove the limit and continuity laws of Problems 1.2 and 1.5. However, to do this, some techniques are required. Case (4) is challenging, while the other cases are relatively straightforward.

Problem 2.5 (Sum law). (1) Show the following relationship between limits:

\[ \lim_{x \to c} \big( f(x) + g(x) \big) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x). \]

(2) Prove that if functions f and g are continuous at c ∈ R, then f + g is also continuous at c ∈ R.

Solution 2.5 Proving (1) and (2) is almost the same problem. Let ε > 0 be given. The goal is to find δ > 0 determined by ε. Assume lim_{x→c} f(x) = L and lim_{x→c} g(x) = M. According to the definitions, there exist δ1 > 0 and δ2 > 0 such that:

\[ |f(x) - L| < 0.5\varepsilon \quad \text{whenever} \quad 0 < |x - c| < \delta_1, \]
\[ |g(x) - M| < 0.5\varepsilon \quad \text{whenever} \quad 0 < |x - c| < \delta_2. \]

Note that we found δ corresponding to 0.5ε instead of ε in the definitions. Since δ may differ in the two cases, we denote them as δ1 and δ2. We use the smaller one, δ = min(δ1, δ2). Then, if 0 < |x − c| < δ, we have:

\[ |f(x) + g(x) - (L + M)| \le |f(x) - L| + |g(x) - M| < 0.5\varepsilon + 0.5\varepsilon = \varepsilon. \]

Thus, by the definition of the limit, L + M is the limit of the function f(x) + g(x), and therefore,

\[ \lim_{x \to c} \big( f(x) + g(x) \big) = L + M = \lim_{x \to c} f(x) + \lim_{x \to c} g(x). \]

If f and g are continuous at c, then L = f(c) and M = g(c). Hence,

\[ \lim_{x \to c} \big( f(x) + g(x) \big) = L + M = f(c) + g(c). \]

Therefore, f + g is continuous at c ∈ R. □

The above proof regarding convergence implies that when two functions f and g are continuous at a point, their sum f + g is also continuous at that point. Now, let's prove the continuity of a composite function.

Problem 2.6 (Continuity of a composition). Let f : R → R be continuous at c ∈ R, and let g : R → R be continuous at f(c) ∈ R.
Then, the composite function (g ◦ f)(x) := g(f(x)) is continuous at c.

Solution 2.6 Let ε > 0 be given. The goal is to find δ > 0 that satisfies (2.2) for the function g ◦ f. Since g is continuous at f(c), there exists δ1 > 0 such that

\[ |g(f(x)) - g(f(c))| < \varepsilon \quad \text{whenever} \quad |f(x) - f(c)| < \delta_1. \]

At this point, it is crucial that we did not use |g(x) − g(f(c))| < ε. In the next step, the key is to use δ1 as the ε for the continuity of the function f. Since f is continuous at c, there exists δ2 > 0 such that

\[ |f(x) - f(c)| < \delta_1 \quad \text{whenever} \quad |x - c| < \delta_2. \]

Now, set δ = δ2, and the proof is complete. (This proof, though straightforward after careful consideration, goes beyond what was possible when we attempted to prove the continuity of composite functions using everyday language in Lecture 1. It illustrates the cleverness of the ε-δ method.) □

Problem 2.7. Let

\[ f(x) = \begin{cases} x, & \text{if } x < 1, \\ 2x, & \text{if } x \ge 1. \end{cases} \]

Show that (1) lim_{x→1+} f(x) = 2 and (2) lim_{x→1−} f(x) = 1.

Problem 2.8 (One-sided limits). Show that lim_{x→c} f(x) = L if and only if

\[ \lim_{x \to c^-} f(x) = L = \lim_{x \to c^+} f(x). \]

Problem 2.9 (Sandwich Theorem). Let f, g, h : (a, b) → R, g(x) ≤ f(x) ≤ h(x) on (a, b), and c ∈ (a, b). Show that lim_{x→c} f(x) = L if

\[ \lim_{x \to c} g(x) = \lim_{x \to c} h(x) = L. \]

All three problems above require choosing δ > 0 given ε > 0. In Problem 2.7, it is necessary to carefully consider the function to determine the appropriate δ, while in Problems 2.8 and 2.9, the given conditions must be used to establish δ > 0.

2.3 Limits as x → ∞ and f(x) → ∞

Sometimes, as x → ∞ or x → −∞, a function may either converge or diverge to infinity. Let's think about the definitions in such cases. Try creating definitions for the situations listed below and then compare them with the given examples.

Problem 2.10.
For a function f : R → R, create the definitions for the following cases using the ε-δ method:
(1) lim_{x→∞} f(x) = L.  (2) lim_{x→−∞} f(x) = L.  (3) lim_{x→c} f(x) = ∞.  (4) lim_{x→c} f(x) = −∞.
(5) lim_{x→c+} f(x) = ∞.  (6) lim_{x→c−} f(x) = ∞.  (7) lim_{x→c+} f(x) = −∞.  (8) lim_{x→c−} f(x) = −∞.

Solution 2.10 First, try creating definitions and then compare them with the definitions given below. Think about whether the provided definitions capture the intended situations.

1. We say lim_{x→∞} f(x) = L if for any ε > 0, there exists N ∈ R such that |f(x) − L| < ε whenever x > N.
2. We say lim_{x→−∞} f(x) = L if for any ε > 0, there exists N ∈ R such that |f(x) − L| < ε whenever x < N.
3. We say lim_{x→c} f(x) = ∞ if for any N ∈ R, there exists δ > 0 such that f(x) > N whenever 0 < |x − c| < δ.
4. We say lim_{x→c} f(x) = −∞ if for any N ∈ R, there exists δ > 0 such that f(x) < N whenever 0 < |x − c| < δ.
5. We say lim_{x→c+} f(x) = ∞ if for any N ∈ R, there exists δ > 0 such that f(x) > N whenever 0 < x − c < δ.
6. We say lim_{x→c−} f(x) = ∞ if for any N ∈ R, there exists δ > 0 such that f(x) > N whenever 0 < c − x < δ.
7. We say lim_{x→c+} f(x) = −∞ if for any N ∈ R, there exists δ > 0 such that f(x) < N whenever 0 < x − c < δ.
8. We say lim_{x→c−} f(x) = −∞ if for any N ∈ R, there exists δ > 0 such that f(x) < N whenever 0 < c − x < δ. □

Exercises

1. Prove the following limits using the definitions:
(1) lim_{x→∞} x^{-1} = 0  (2) lim_{x→0+} x^{-1} = ∞  (3) lim_{x→0} x^3 = 0

2. Find the limits if they exist:
(1) lim_{x→2} 1/(x − 2)  (2) lim_{x→1} (x^3 − 1)/(x^2 − 1)  (3) lim_{x→2} (x^2 − 4)/(x − 2)

3. Show whether the following functions are continuous at the given point or not:
(1) f(x) = 1/x at c ≠ 0  (2) f(x) = (x^2 − 4)/(x − 2) at c = 2  (3) f(x) = √x at c = 4

4. Determine whether the following functions are continuous at x = 0 or not:
(1) f(x) = x sin(1/x) with f(0) = 0  (2) f(x) = x^2 sin(1/x) with f(0) = 0  (3) f(x) = √|x|

5. (1) Draw the graph of the following function:

\[ f(x) = \begin{cases} 0, & \text{if } x \le 0, \\ \cos(x^{-1}), & \text{if } x > 0. \end{cases} \]

(2) Show that this function is continuous at points c ≠ 0 and discontinuous at c = 0. (Use the fact that cos x is continuous, and the composition of two continuous functions is continuous.)

Lecture 3
Differentiation

The statement that differentiation is a mathematical tool for describing motion means that it can express laws related to motion. Moving objects have velocity. Velocity indicates how much and in which direction the position changes at each moment. However, if an object jumps instantaneously from one position to another, velocity cannot be considered. The motion we want to represent mathematically is continuous motion where the position changes continuously. We have already learned the very important mathematical concept of continuity. Now, we learn differentiation, which can represent quantities related to motion, such as velocity or acceleration.

3.1 Rate of increase

Let the function f(x) be given as follows:

\[ f : \mathbb{R} \to \mathbb{R}, \quad x \in \mathbb{R}. \]

Here, the variable is represented by x and takes real values (x ∈ R). The function is denoted by f and also takes real values (f(x) ∈ R). This function can represent the position of an object moving on a one-dimensional line, or it can represent more general quantities. Depending on the situation, a different symbol may be used instead of x to represent the variable. x can be used to represent time, but very often time is represented by t ∈ R. If f(t) represents the position of a runner t seconds after the start, it is related to motion. However, it can also represent various other things, such as the production quantity of a specific product during t hours. We deal with a wide variety of situations. The input variable can represent a quantity other than time, and in that case, t can be considered as a quantity other than time, or a different symbol like x ∈ R can be used, but it doesn't make a difference. It is good to use notation that does not confuse the meaning.
Let x increase from a to b, and suppose that the value of the function changes from f(a) to f(b). Let's denote the increments as Δx = b − a and Δf = f(b) − f(a). The ratio of these increments,

\[ \frac{f(b) - f(a)}{b - a} \equiv \frac{\Delta f}{\Delta x} \qquad \text{(mean growth rate)}, \]

is called the mean growth rate. For example, saying the mean growth rate is 10 means that when x increases by Δx from a, the function value increases by 10Δx from f(a). Of course, if x decreases, the function value decreases accordingly. When this ratio has a positive value, it means that when the variable x increases, the value of the function f increases, and when x decreases, the value of the function f decreases. Saying the mean growth rate is −10 means that when x increases, f decreases by ten times that amount, and when x decreases, f increases by ten times that amount.

In the definition above, if we replace a with c and b with c + h, the mean growth rate can also be written as

\[ \frac{f(c + h) - f(c)}{h} \qquad \text{(mean growth rate)}. \]

Here, h can be positive or negative. This expression is also widely used and useful. If the limit of the mean growth rate exists as h approaches 0, we denote that limit as f′(c). That is,

\[ f'(c) = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} \qquad \text{(derivative, instantaneous growth rate)}. \]

This limit is the instantaneous growth rate of the function f at c and is also called the derivative of f at c. Geometrically, it is the slope of the tangent line at the point (c, f(c)) on the graph.

Problem 3.1 (Tangent line formula). Suppose the function f is differentiable at x = c. Find the equation of the tangent line that touches the graph at x = c.

Solution 3.1 A line with slope a that passes through (x0, y0) has the equation y − y0 = a(x − x0). For the tangent line with slope f′(c) passing through (c, f(c)), the equation is

\[ y - f(c) = f'(c)(x - c) \quad \text{or} \quad y = f'(c)(x - c) + f(c) \qquad \text{(tangent line formula)}. \]
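The mean growth rate and the tangent line formula can be tried out numerically. The sketch below is an illustration only: the function names `mean_growth_rate` and `tangent_line` are hypothetical, and f(x) = x^2 at c = 1 is an arbitrary example whose derivative f′(1) = 2 is supplied by hand rather than computed by the code.

```python
def mean_growth_rate(f, c, h):
    """Mean growth rate of f between c and c + h (h may be negative, not zero)."""
    return (f(c + h) - f(c)) / h

def tangent_line(f, slope, c):
    """Return the line y = slope * (x - c) + f(c) as a function of x."""
    return lambda x: slope * (x - c) + f(c)

def f(x):
    return x ** 2

# The mean growth rate at c = 1 approaches 2 as h shrinks from either side.
for h in (0.1, -0.1, 0.001, -0.001):
    print(h, mean_growth_rate(f, 1, h))

# Tangent line at (1, f(1)) with slope f'(1) = 2, i.e. y = 2(x - 1) + 1.
line = tangent_line(f, 2, 1)
print(line(1), line(1.5))
```

For this particular f, the mean growth rate equals 2 + h exactly, so the printed values differ from 2 by about h, up to floating-point rounding.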
Instead of the limit as h → 0, the following expression for the derivative can also be used:

\[ f'(c) = \lim_{b \to c} \frac{f(b) - f(c)}{b - c} \qquad \text{(derivative, instantaneous growth rate)}. \]

Sometimes, when we want to express the differentiation of the function f with respect to the variable x more clearly, we write it as

\[ \frac{df}{dx}(c) = f'(c). \]

This notation is called Leibniz notation, and although it may seem complex and inconvenient, it is very convenient when combined with various properties of derivatives that we will learn in the future.

Remark 3.1. Many people refer to the derivative as the rate of change, but the derivative is not the rate of change. The term "change" does not specify whether the quantity increased or decreased, so saying the rate of change is 3 doesn't reveal if it's increasing or decreasing. However, saying the derivative is 3 means it is increasing. Also, saying the rate of change is −3 is an awkward expression. Does it mean it changed less than when the rate of change is 0? It's an inappropriate expression. Probably, people use the term "rate of change" but interpret it as the growth rate.

Not all functions have this limit at every point. For the instantaneous growth rate to exist, both the left-hand limit and the right-hand limit of the mean growth rate must exist, and the two values must be equal. If such a limit exists, we say that the function f is differentiable at c. Therefore, it is necessary to distinguish whether a function is differentiable or not.

Problem 3.2 (Examples). (1) Show that the function f(x) = |x| is not differentiable at x = 0. (2) Show that the function f(x) = x^2 is differentiable at x = 1. (3) Show that differentiation is not possible at points where the function f is discontinuous.

Solution 3.2 (1) Let's show that the left-hand limit and the right-hand limit of the mean growth rate are different. The right-hand and left-hand limits are as follows:

\[ \lim_{h \to 0^+} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^+} \frac{|h|}{h} = \lim_{h \to 0^+} \frac{h}{h} = \lim_{h \to 0^+} 1 = 1, \]
\[ \lim_{h \to 0^-} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0^-} \frac{|h|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = \lim_{h \to 0^-} (-1) = -1. \]

Since they are different, the limit does not exist, and therefore, f is not differentiable at 0.

(2) The limit is given as follows:

\[ \lim_{h \to 0} \frac{f(1 + h) - f(1)}{h} = \lim_{h \to 0} \frac{(1 + h)^2 - 1}{h} = \lim_{h \to 0} \frac{h^2 + 2h}{h} = \lim_{h \to 0} (h + 2) = 2. \]

The limit is 2, so f is differentiable at 1, and the derivative value is 2.

(3) Suppose that the function f is discontinuous at c. Then there exists an ε > 0 for which no δ > 0 satisfies (2.2). In particular, for every natural number n > 0, there is an h_n with 0 < |h_n| < 1/n such that

\[ |f(c + h_n) - f(c)| \ge \varepsilon. \]

Therefore, for any M > 0, if n > M/ε, then

\[ \frac{|f(c + h_n) - f(c)|}{|h_n|} \ge \frac{\varepsilon}{|h_n|} \ge \varepsilon n \ge M. \]

So, even though h_n approaches 0, the mean growth rate can be arbitrarily large in absolute value, and therefore, the limit does not exist.

Problem 3.3 (Continuity of a differentiable function). If the function f(x) is differentiable at x = c, then show that f(x) is continuous at x = c.

Solution 3.3 There are various ways to prove this, and Problem 3.2(3) is one of them. Let's consider at least one more way.

3.2 Differentiation Rules

There are many cases where differentiation of functions is necessary. Therefore, it is crucial to understand and memorize some cases well to differentiate without making mistakes. In this section, we will learn methods of differentiation by categorizing them into eight cases.

Question 3.1 (Is a number a function?). If a real number x is substituted into the function f : R → R, it produces another real number f(x). If a constant function always provides the same real number 3 for all x ∈ R, the best way to denote it is to write just 3 instead of f. Numbers are just numbers. However, they can also be used as a notation for representing functions. Using a single notation with a dual meaning is very convenient, and we will use such notation in various cases in the future.

Problem 3.4 (Rules of Differentiation). Prove the following rules of differentiation.
Also, specify the conditions required for these differentiation rules to hold.

1. The derivative of a constant function is 0.
2. f(x) = x ⇒ f′(x) = 1.
3. f(x) = x^n for n ∈ N ⇒ f′(x) = n x^{n−1}.
4. f(x) = 1/x ⇒ f′(x) = −x^{−2}.
5. (Sum Rule) (f + g)′(x) = f′(x) + g′(x).
6. (Product Rule) (fg)′(x) = (f′g + fg′)(x).
7. (Quotient Rule) (f/g)′(x) = ((f′g − fg′)/g^2)(x).
8. (Power Rule) f(x) = x^α for α ∈ R ⇒ f′(x) = α x^{α−1}.

Solution 3.4 The above eight differentiation rules are the most fundamental rules. Let's examine the conditions under which these rules can be applied. The first three rules do not require any conditions. For (4), the condition x ≠ 0 is necessary. The condition that the denominator is not zero is necessary in all cases. (5), (6), and (7) are meaningful only under the condition that both functions f and g are differentiable. In addition, (7) requires that the denominator g(x) should not be zero. The Power Rule (8) requires careful attention. If α is not a positive integer, it holds only for those x where x^α and x^{α−1} are defined. Once α − 1 is negative, the condition x ≠ 0 is necessary, and if α is not an integer, the condition x ≥ 0 is necessary.

Now, let's prove them. The first four rules are special cases of the Power Rule (8).

(1) If f is a constant function, then f(x) = f(x + h). Therefore,

\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{0}{h} = \lim_{h \to 0} 0 = 0. \]

(2) If f(x) = x, then f(x + h) − f(x) = h. Therefore,

\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} 1 = 1. \]

(3) If f(x) = x^n, then by the binomial expansion f(x + h) − f(x) = (x + h)^n − x^n = n x^{n−1} h + (n(n−1)/2) x^{n−2} h^2 + · · · + h^n. Therefore,

\[ f'(x) = \lim_{h \to 0} \frac{n x^{n-1} h + \frac{n(n-1)}{2} x^{n-2} h^2 + \cdots + h^n}{h} = n x^{n-1}. \]

(4) If f(x) = 1/x and x ≠ 0, then

\[ f'(x) = \lim_{h \to 0} \frac{1}{h} \left( \frac{1}{x + h} - \frac{1}{x} \right) = \lim_{h \to 0} \frac{1}{h} \cdot \frac{x - (x + h)}{x(x + h)} = -\frac{1}{x^2}. \]

(5) The differentiation of the sum can be easily shown.
If f and g are both differentiable,

\[ (f + g)'(x) = \lim_{h \to 0} \frac{f(x + h) + g(x + h) - f(x) - g(x)}{h} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} + \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} = f'(x) + g'(x). \]

(6) The differentiation of the product is a bit more challenging but essential. The technique used in this proof is to subtract and then add back the same term, f(x + h)g(x). Assuming f and g are both differentiable,

\[
\begin{aligned}
\lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h}
&= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x) + f(x + h)g(x) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x)}{h} + \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x)}{h} \\
&= \lim_{h \to 0} f(x + h) \, \frac{g(x + h) - g(x)}{h} + \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \, g(x) \\
&= f(x)g'(x) + f'(x)g(x).
\end{aligned}
\]

In the last step, f(x + h) → f(x) because a differentiable function is continuous (Problem 3.3). Therefore, the limit exists, and (fg)′(x) = f(x)g′(x) + f′(x)g(x) holds.

(7) The differentiation of the fraction can be shown similarly to the product rule, but a little more attention is required for the fractional form. The condition is that f and g must each be differentiable, and it holds only for x where the denominator g(x) is nonzero. You can think of f/g = f · (1/g). Therefore, it's fine to calculate the derivative of 1/g first and then use the product rule (the calculation is omitted).

(8) The Power Rule is the most commonly used differentiation rule. If α is not a positive integer, the proof uses the logarithmic function that will be learned in Chapter 6. Assuming you already know it, I'll write it down below. After learning the logarithmic function, reviewing this part should enhance your understanding. Let y = x^α, and take the logarithm of both sides:

\[ \ln y = \ln x^{\alpha} = \alpha \ln x. \]

Differentiating both sides with respect to x using the derivative of the logarithmic function gives

\[ \frac{y'}{y} = \frac{\alpha}{x} \;\Longrightarrow\; y' = \alpha x^{-1} y = \alpha x^{\alpha - 1}. \]

□

Problem 3.5. Find the derivatives of the following functions. (1) f(x) = 3x^α.

Solution 3.5 (1) Understand it as the product of the constant function 3 and the power function x^α. Using the product rule:

\[ (3x^{\alpha})' = 0 \cdot x^{\alpha} + 3 \alpha x^{\alpha - 1}. \]
In other words, since the derivative of a constant is 0, you can set the constant coefficient aside, differentiate the remaining function part, and then multiply by the constant coefficient. □

3.3 Intermediate and Mean Value Theorem

Theorem 3.1 (Intermediate Value Theorem). Let f : [a, b] → R be a continuous function, and let m ∈ R be a constant between f(a) and f(b). Then, there exists c in the open interval (a, b) such that f(c) = m.

Theorem 3.2 (Mean Value Theorem). Let f : [a, b] → R be a continuous function, and assume that f is differentiable for all x ∈ (a, b). Then, there exists c ∈ (a, b) such that

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Proof. The proof is carried out using the Intermediate Value Theorem 3.1.

Theorem 3.3 (Cauchy's Mean Value Theorem). Let f, g : [a, b] → R be continuous functions, and assume that both f and g are differentiable for all x ∈ (a, b). Also, suppose that g′(x) ≠ 0 for all x ∈ (a, b). Then, there exists c ∈ (a, b) such that

\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Proof. The proof is carried out using the Mean Value Theorem 3.2.

The mean value is one of the intermediate values, and the mean value theorem sounds similar to the intermediate value theorem. However, the above mean value theorem is, in fact, about the mean rate of increase, not the average value. For this reason, it would have been better to call it the Mean Growth Rate Theorem, but the name Mean Value Theorem has already become established.

3.4 Derivative of Trigonometric Functions

Trigonometric functions are widely used and will continue to appear frequently. Remember them well.

Problem 3.6. Prove the following.
(1) sin′ x = cos x  (2) cos′ x = − sin x  (3) tan′ x = sec^2 x
(4) cot′ x = − csc^2 x  (5) sec′ x = sec x tan x  (6) csc′ x = − csc x cot x

Solution 3.6 (1) Using the angle-addition formula for the sine function, we have:

\[ \frac{\sin(x + h) - \sin x}{h} = \frac{\sin x \cos h + \cos x \sin h - \sin x}{h} = \sin x \, \frac{\cos h - 1}{h} + \cos x \, \frac{\sin h}{h}. \]

Taking the limit as h → 0, and using the limits (cos h − 1)/h → 0 and (sin h)/h → 1 (that is, cos′ 0 = 0 and sin′ 0 = 1), we obtain sin′ x = cos x. (2) can be done similarly using the angle-addition formula for cos x. The rest are obtained using the quotient rule for differentiation. It is good to remember these formulas if possible. □

3.5 Velocity and Acceleration

Newton developed calculus as a mathematical tool to explain the motion of planets. Let's consider the relationship between position, velocity, and acceleration in one-dimensional space. Let x : R → R be the position function. Here, x(t) represents the position or x-coordinate of an object moving along a straight line (or x-axis) at time t. We are using the symbol x with dual meanings, representing the x-coordinate of a point and now the function representing the position. Then, x(t + h) − x(t) is the difference in position over time h,

\[ \frac{x(t + h) - x(t)}{h} \]

is the average velocity, and its limit

\[ x'(t) = \lim_{h \to 0} \frac{x(t + h) - x(t)}{h} \]

is the instantaneous velocity at time t. In particular, the derivative with respect to time is sometimes denoted as ẋ instead of x′. If v is used to represent velocity, then v = ẋ. The instantaneous rate of increase of velocity is acceleration, denoted as a = v̇ = ẍ.

Remark 3.2 (Preview of Part II). The space is three-dimensional in the universe. To express the position of a planet, three coordinates x, y, z are needed, each given by a real-valued function of time. If we simply denote these functions as x(t), y(t), z(t), the position can be expressed as r(t) = (x(t), y(t), z(t)). Velocity and acceleration are the first and second derivatives, respectively, of this position vector function. That is,

\[ \mathbf{v}(t) = \dot{\mathbf{r}}(t) = (\dot{x}(t), \dot{y}(t), \dot{z}(t)), \qquad \mathbf{a}(t) = \dot{\mathbf{v}}(t) = \ddot{\mathbf{r}}(t) = (\ddot{x}(t), \ddot{y}(t), \ddot{z}(t)). \]

For an object with mass m and a force acting on it denoted as F, Newton's second law of motion is expressed as

\[ \mathbf{F} = m\mathbf{a} = m\dot{\mathbf{v}} = m\ddot{\mathbf{r}} \qquad \text{(Newton's Second Law of Motion)}. \]

Exercises

1.
Find the tangent line at the given point on the graph of each of the following functions.
(1) $y = 4+x^2$ at $(1, 5)$
(2) $y = x^{-2}$ at $(2, 0.25)$
(3) $y = \sqrt{x^3}$ at $(4, 8)$
(4) $f(t) = t^3-t^2$ at $t = 2$
(5) $f(x) = x^2+1$ at $x = 2$
(6) $f(x) = x^{-2}$ at $x = 1$
(7) $f(s) = s^3-s^{0.5}$ at $s = 1$
(8) $f(t) = \frac{1}{t-1}$ at $t = 2$
(9) $f(x) = \frac{x-2}{x+1}$ at $x = 2$

2. Check if the following functions are differentiable at $x = 0$ when $f(0) = 0$.
(1) $f(x) = x\sin(1/x)$
(2) $f(x) = x^2\sin(1/x)$
(3) $f(x) = \sqrt{|x|}$

3. Calculate the instantaneous rates of increase for the following functions at $x = r$, where $r$ is the radius.
(1) Perimeter of a circle: $2\pi x$
(2) Area of a circle: $\pi x^2$
(3) Surface area of a sphere: $4\pi x^2$
(4) Volume of a sphere: $\frac{4}{3}\pi x^3$

4. Find the intervals where the following functions are defined and the points where they are not differentiable.
(1) $\sqrt{|x|^2+1}$
(2) $\sqrt{|x|+1}$
(3) $\frac{x^2+1}{x-1}$
(4) $\frac{1}{\sqrt{x^2-1}}$

Lecture 4
Chain rule and implicit differentiation

The chain rule is the flower of the differentiation rules. It is useful and powerful. The true power of the chain rule can be seen when studying functions with multiple variables and vector-valued functions in Calculus 2. In Calculus 1, we consider the simplest one-dimensional functions as follows:
\[
g : (a_2, b_2)\to\mathbb{R}, \quad f : (a_1, b_1)\to\mathbb{R}, \quad c\in(a_2, b_2), \quad g(c)\in(a_1, b_1).
\]
What the chain rule states is that if the function $g$ is differentiable at the point $c$ and the function $f$ is differentiable at $g(c)$, then the composite function $f\circ g$ is differentiable at $c$, and its derivative is given by the following formula:
\[
(f\circ g)'(c) = f'(g(c))\,g'(c). \tag{4.1}
\]

Question 4.1. Can you interpret the meaning of the mathematical relationship in (4.1) in everyday language?

There are various languages in the world, and mathematics is sometimes considered the language of science. For example, equation (4.1) expresses something in the language of mathematics.
Explaining its meaning in your own language is the first step in understanding this mathematical expression. If the meaning seems obvious, it is called intuition. For example, saying $g'(c) = 10$ means that when $x$ increases slightly around $c$, the value of $g$ increases about 10 times as much. Similarly, saying $f'(g(c)) = 10$ means that when the input of $f$ increases slightly around $g(c)$, the value of $f$ increases about 10 times as much. Therefore, the composite function $f(g(x))$ increases about 100 times as much when $x$ increases slightly around $c$, and this is the meaning of the chain rule (4.1). Explaining mathematical language in everyday language is very useful.

4.1 Chain rule

The derivative of a function $f$ is defined as follows:
\[
f'(c) = \lim_{z\to c}\frac{f(z)-f(c)}{z-c}.
\]
The right side is the limit of the ratio $\frac{f(z)-f(c)}{z-c}$. If the variable $x$ increases by $\triangle x$ around the point $c$, the function $f(x)$ increases by approximately $f'(c)\,\triangle x$. The geometric meaning of the derivative $f'(c)$ is the slope of the tangent line to the graph $y = f(x)$ at the point $(c, f(c))$. Using Leibniz notation, it can be written as:
\[
f' = \frac{df}{dx}.
\]
This notation clearly shows that the derivative is the rate of increase of the function $f$ with respect to the increase in the variable $x$.

The chain rule is a rule for differentiating composite functions, so let's consider composite functions first. For two functions
\[
g : (a_1, b_1)\to\mathbb{R} \quad\text{and}\quad f : (a_2, b_2)\to\mathbb{R},
\]
we can define the composite function as follows:
\[
(f\circ g)(x) = f(g(x)), \quad x\in\Omega. \tag{4.2}
\]

Problem 4.1. The composite function $f\circ g$ in (4.2) is generally not defined for all $x\in(a_1, b_1)$. Why is that? What is the maximum domain $\Omega$ that this composite function can have?

Solution 4.1 If $g(x)\notin(a_2, b_2)$, then $f(g(x))$ is not defined. Therefore, the maximum possible domain of the composite function is
\[
\Omega := \{x\in(a_1, b_1) : g(x)\in(a_2, b_2)\}.
\]
□

For convenience, let's consider two functions $f$ and $g$ given as follows:
\[
g : (a_1, b_1)\to(a_2, b_2), \quad f : (a_2, b_2)\to\mathbb{R}. \tag{4.3}
\]
Then, the composite function $f\circ g : (a_1, b_1)\to\mathbb{R}$ is defined without worrying about the domain.

Problem 4.2. Consider the functions $f$ and $g$ given in (4.3). Assume that $g$ is differentiable at $c\in(a_1, b_1)$ and $f$ is differentiable at $g(c)\in(a_2, b_2)$. In this case, explain how much the composite function $f\circ g$ increases when the variable $x$ increases by $\triangle x$, using the derivatives of $f$ and $g$.

Solution 4.2 When the variable $x$ changes slightly around $c$, the function $g$ magnifies that change by $g'(c)$. In turn, the function $f$ magnifies a change around $g(c)$ by $f'(g(c))$. Therefore, the composite function $f\circ g$ magnifies the change around $c$ by $f'(g(c))\,g'(c)$. □

Theorem 4.1 (Chain Rule). Let $g : (a_1, b_1)\to(a_2, b_2)$ and $f : (a_2, b_2)\to\mathbb{R}$ be given functions. Assume that $g$ is differentiable at $c\in(a_1, b_1)$ and $f$ is differentiable at $g(c)\in(a_2, b_2)$. Then, the composite function $f\circ g$ is differentiable at $c$, and its derivative is given by the following formula:
\[
(f\circ g)'(c) = f'(g(c))\,g'(c). \tag{4.4}
\]

Proof. Even if you understand the meaning of the chain rule intuitively, proving it requires a separate skill, apart from intuition. The skill we will use is "multiply and divide by the same quantity":
\[
\frac{f(g(z))-f(g(c))}{z-c} = \frac{f(g(z))-f(g(c))}{g(z)-g(c)}\cdot\frac{g(z)-g(c)}{z-c}.
\]
This step requires $g(z)\neq g(c)$ for $z$ near $c$, which we assume here. Since $g$ is differentiable at $c$, $g$ is continuous at $c$. Therefore, $g(z)\to g(c)$ as $z\to c$. Also, since $f$ is differentiable at $g(c)$,
\[
\lim_{z\to c}\frac{f(g(z))-f(g(c))}{g(z)-g(c)} = \lim_{g(z)\to g(c)}\frac{f(g(z))-f(g(c))}{g(z)-g(c)} = f'(g(c)).
\]
Thus, using the product of limits, we have
\[
\lim_{z\to c}\frac{f(g(z))-f(g(c))}{z-c} = \lim_{z\to c}\frac{f(g(z))-f(g(c))}{g(z)-g(c)}\;\lim_{z\to c}\frac{g(z)-g(c)}{z-c} = f'(g(c))\,g'(c),
\]
and the proof of the chain rule is complete. □

Problem 4.3.
For the following cases of $f$ and $g$, calculate the derivative of the composite function $(f\circ g)(x)$ using the chain rule, and compare it with the direct differentiation of the composite function.
(1) $f(x) = x^2$, $g(x) = 2x+1$
(2) $f(y) = y^2$, $g(x) = 2x+1$
(3) $f(g) = g^2$, $g(x) = 2x+1$
(4) $f(x) = x^4$, $g(x) = x^3$
(5) $f(x) = x^{10}$, $g(x) = x^2+1$
(6) $f(x) = x^2$, $g(y) = \cos y$

Solution 4.3 (1), (2), and (3) are essentially the same problem. The variable $x$ of $f(x)$ and the variable $x$ of the composite function $(f\circ g)(x)$ are different. Instead, $g(x)$ of $(f\circ g)(x)$ corresponds to $x$ of $f(x)$. For (5), using the chain rule is straightforward, whereas expanding the composite function and differentiating it directly is impractical. For (6), the chain rule must be used. □

Equation (4.4) is the chain rule written in Newton's notation. Rewriting it in Leibniz's notation, it becomes:
\[
\frac{df}{dx} = \frac{df}{dg}\,\frac{dg}{dx}. \tag{4.5}
\]
This equation involves a different kind of duality. Previously, $x$ was used as both a variable and a function, but in the notation above, $f$ is used as a function of $x$ and as a function of $g$. On the left side of (4.5), $\frac{df}{dx}$ means differentiating $f$ as a function of $x$, while on the right side, $\frac{df}{dg}$ means considering $f$ as a function of $g$, and $\frac{dg}{dx}$ means differentiating $g$ as a function of $x$. That is, on the left side, $f$ is the actual composite function of $f$ and $g$, and on the right side, $f$ is treated as a single function, considering $g$ as a variable. Familiarity with this kind of notational duality requires practice. On the other hand, Equation (4.4) does not have such duality. However, if you write it as (4.5), the chain rule looks like canceling fractions. In other words,
\[
\frac{df}{dg}\,\frac{dg}{dx} = \frac{df}{dx}.
\]
In this notation, $dg$ is not a number that can be canceled, but the chain rule seems to eliminate $dg$ as if canceling a number. The process of canceling is the process of using the chain rule.

Question 4.2 (Butterfly effect and chain-reaction).
The phenomenon where small changes lead to significant consequences is commonly referred to as the butterfly effect. Assuming a phenomenon is given by the composition of $n$ functions, it can be understood as the following composite function:
\[
H(x) = (f_1\circ f_2\circ\cdots\circ f_n)(x) = f_1(f_2(\cdots(f_n(x))\cdots)).
\]
Under what circumstances does the butterfly effect occur? Can the chain rule explain it?

The butterfly effect occurs when $H'(c)$ is very large. In such cases, a slight change in the variable $x$ around $c$ causes the output $H(x)$ to change significantly. When does such an event happen? By repeatedly applying the chain rule, we have
\[
H'(c) = f_1'(f_2(\cdots(f_n(c))\cdots))\times f_2'(\cdots(f_n(c))\cdots)\times\cdots\times f_n'(c).
\]
Therefore, if there exists a point $c$ where each derivative, $f_n'(c)$, $f_{n-1}'(f_n(c))$, $\cdots$, $f_1'(f_2(\cdots(f_n(c))\cdots))$, takes on a large value, then the butterfly effect is strongest at that point. For example, if $f_n$ has a large derivative at $c$, $f_{n-1}$ has a large derivative at $f_n(c)$, $f_{n-2}$ has a large derivative at $f_{n-1}(f_n(c))$, and this situation continues in a chain reaction, then $H'(c)$ can become very large, and that is when the butterfly effect occurs.

Understanding the chain rule and its proof is not enough. You also need to know how to choose $f$ and $g$ in a given situation.

Problem 4.4. (1) Find $h'(x)$ when $h(x) = (3x^2+1)^2$.
(2) Find $h'(x)$ when $h(x) = (3x^2+1)^6$.
(3) Find $x'(t)$ when $x(t) = \cos(t^2+1)$.

Solution 4.4 (1) First, let's perform this task without using the chain rule. Expanding, we get $h(x) = 9x^4+6x^2+1$. Then, $h'(x) = 36x^3+12x$. Now, let's use the chain rule. Set the outer function as $f(g) = g^2$ and the inner function as $g(x) = 3x^2+1$. Then,
\[
h'(x) = (f\circ g)'(x) = f'(g(x))\,g'(x) = 2(3x^2+1)\cdot 6x = 36x^3+12x.
\]
The two results match. However, in this case, using the chain rule seems slightly more complex than directly expanding.
But this is true only for simple cases; in most cases, the chain rule is the easier route.
(2) Expanding $(3x^2+1)^6$ is too much work, so let's use the chain rule instead. Let $f(g) = g^6$ and $g(x) = 3x^2+1$. Then, $f'(g) = 6g^5$ and $g'(x) = 6x$. Thus, $h'(x) = 6(3x^2+1)^5\cdot 6x = 36x(3x^2+1)^5$.
(3) Let $f(g) = \cos(g)$ and $g(t) = t^2+1$. Then, $f'(g) = -\sin(g)$ and $g'(t) = 2t$. Therefore, $x'(t) = f'(g(t))\,g'(t) = -2t\sin(t^2+1)$. □

Problem 4.5. (1) Find $h'(x)$ when $h(x) = \sin(x^2+x)$.
(2) Find $h'(t)$ when $h(t) = \tan(5-\sin(2t))$.

Solution 4.5 (1) In this case, let $f(g) = \sin(g)$ and $g(x) = x^2+x$. Then,
\[
h'(x) = (f\circ g)'(x) = f'(g(x))\,g'(x) = \cos(x^2+x)\,(2x+1).
\]
(2) In this case, let $f(g) = \tan(g)$ and $g(t) = 5-\sin(2t)$. Then, $g'(t) = -2\cos(2t)$ and $f'(g) = \sec^2(g)$. Therefore, $h'(t) = -2\sec^2(5-\sin(2t))\cos(2t)$. □

4.2 Implicit Differentiation

Implicit differentiation is one of the most crucial applications of the chain rule. When there is an equation involving two variables, implicit differentiation considers one variable as a function of the other and finds its derivative. Although the entire equation may not be viewable as a function, parts of it can be considered as functions. This method is essential and powerful, encompassing the core principles of the chain rule. Let's illustrate this with an example.

Suppose we have two variables, $x$ and $y$, satisfying the following equation:
\[
x^2+y^2-25 = 0.
\]
Then, we can treat one of the variables as the independent variable and consider the other as a function of this variable. For instance, if we choose $x$ as the independent variable, from the given equation, we get:
\[
y^2 = 25-x^2 \;\Rightarrow\; y = \sqrt{25-x^2} \quad\text{or}\quad y = -\sqrt{25-x^2}.
\]
Now, the derivative can be calculated for each branch, respectively, as follows:
\[
\frac{dy}{dx} = -\frac{x}{\sqrt{25-x^2}} \quad\text{or}\quad \frac{dy}{dx} = \frac{x}{\sqrt{25-x^2}} \qquad\text{for } -5<x<5.
\]
If we want $x$ as a function of $y$, a similar process gives $x = \pm\sqrt{25-y^2}$.
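The explicit derivative of the upper branch can be sanity-checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names `y_upper`, `dy_dx_formula`, and `central_difference` are illustrative assumptions, and the central-difference quotient is only an approximation of the derivative.

```python
import math

def y_upper(x):
    """Upper branch y = sqrt(25 - x^2) of the circle x^2 + y^2 = 25."""
    return math.sqrt(25 - x * x)

def dy_dx_formula(x):
    """Derivative of the upper branch: -x / sqrt(25 - x^2), valid for -5 < x < 5."""
    return -x / math.sqrt(25 - x * x)

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient (hypothetical helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the numerical slope with the formula at a few sample points.
for x in (-4.0, -1.0, 0.0, 3.0):
    numeric = central_difference(y_upper, x)
    exact = dy_dx_formula(x)
    print(f"x = {x:5.1f}  numeric = {numeric: .6f}  formula = {exact: .6f}")
```

For the lower branch $y = -\sqrt{25-x^2}$ the same check applies with both signs flipped.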
However, expressing $y$ explicitly as a function of $x$ and then calculating $\frac{dy}{dx}$, as shown above, can be inconvenient. Moreover, in many cases, expressing $y$ as a function of $x$ is not easy or even impossible. We can use a much simpler and more powerful technique called implicit differentiation to calculate it easily. Consider, for example,
\[
x^3+2y^3-9xy = 1. \tag{4.6}
\]
In this case, expressing $y$ as a function of $x$ is not straightforward. However, by mentally considering $y$ as a function $y = y(x)$ of $x$ (an implicit function), we can view the equation as:
\[
x^3+2(y(x))^3-9x\,y(x) = 1.
\]
Terms like $2(y(x))^3$ are treated as compositions of two functions. Now, by differentiating both sides with respect to $x$, we obtain:
\[
\frac{d}{dx}\left(x^3+2y^3-9xy\right) = 3x^2+6y^2\frac{dy}{dx}-9y-9x\frac{dy}{dx} = 0.
\]
In this calculation, we used the chain rule and the product rule for differentiation. Solving this relationship for $\frac{dy}{dx}$, we get:
\[
\frac{dy}{dx} = \frac{9y-3x^2}{6y^2-9x}. \tag{4.7}
\]

Problem 4.6. Given that $x$ and $y$ satisfy (4.6), calculate $\frac{dy}{dx}$ at the point $x = 1$.

Solution 4.6 To substitute $x = 1$ into the derived equation (4.7), we need to determine what to substitute for $y$. Substituting $x = 1$ into (4.6), we obtain $2y^3-9y = 0$. Solving this cubic equation, we find $y = 0$ and $y = \pm\sqrt{9/2}$. Substituting these three possible values of $y$, we get three potential derivative values. (Take a moment to think: A function should have only one value for each $x$, but having three values indicates that $y$ is not a function of $x$. Although it is a function locally, it is not a function globally. However, implicit differentiation still finds all three derivative values.) □

Problem 4.7. Let $x$ and $y$ satisfy $x^2+y^2 = 25$. Using implicit differentiation, find $\frac{dx}{dy}$. What is the value at the point $(5, 0)$?

Solution 4.7 Consider $x$ as a function of $y$ this time. Then,
\[
\frac{d}{dy}\left(x^2+y^2\right) = 2x\frac{dx}{dy}+2y = 0.
\]
Therefore,
\[
\frac{dx}{dy} = -\frac{y}{x} = \mp\frac{y}{\sqrt{25-y^2}} \quad\text{for } x = \pm\sqrt{25-y^2}.
\]
Substituting $(5, 0)$, we get
\[
\left.\frac{dx}{dy}\right|_{x=5,\,y=0} = 0.
\]
In the given equation, the roles of $x$ and $y$ are interchangeable. (Even if we swap them, the equation remains the same.) Therefore, $\frac{dx}{dy}$ has the same form as $\frac{dy}{dx}$ with $x$ and $y$ exchanged. □

Problem 4.8. There is a circle centered at the origin that passes through the point $(3, -4)$. Find the slope of the line tangent to the circle at this point.

Solution 4.8 The radius of the circle is $\sqrt{3^2+(-4)^2} = 5$, and the circle satisfies $x^2+y^2 = 25$. Performing implicit differentiation with respect to $x$, we get
\[
2x+2y\frac{dy}{dx} = 0.
\]
Therefore, the slope of the tangent at the point $(3, -4)$ is $\frac{dy}{dx} = -\frac{x}{y} = -\frac{3}{-4} = \frac{3}{4}$. □

Problem 4.9. There is a circle centered at the origin that passes through the point $(5, 0)$. Find the slope of the line tangent to the circle at this point.

Solution 4.9 The radius of the circle is 5. Therefore, it satisfies $x^2+y^2 = 25$. Performing implicit differentiation with respect to $x$, we get:
\[
2x+2y\frac{dy}{dx} = 0.
\]
Substituting the values at $(5, 0)$, we get $\frac{dy}{dx} = -\frac{x}{y} = -\frac{5}{0}$. However, something is wrong. There is no numerical output. Ah, in this case, we cannot view $y$ as a function of $x$ near the given point $(5, 0)$, and we cannot calculate $\frac{dy}{dx}$. Geometrically, this corresponds to a vertical tangent line in the graph. Instead of saying the slope is infinite, it is more accurate to say that $y$ is not differentiable at this point. □

Question 4.3. We considered the case of one equation with two variables. Then,
(1) What happens if there is one equation with three variables?
(2) What happens if there are two equations with three variables?

In Calculus 2, we will explore these cases. However, in case (1), it is impossible to consider two variables as functions of the third variable. Why? In case (2), it is possible to consider two variables as functions of the third variable. Why? Here, we are talking about general possibilities, and it does not mean it is always possible.
There are cases, as in Problem 4.8, where differentiation is not possible, or the function cannot be viewed as a function of x. Let’s try to answer these questions on your own. 36 4 Chain rule and implicit differentiation Exercises 1. Use the chain rule to find composite function f ◦ g. √ the derivative of the√ (2) f (x) = x2 − 1, g(x) = sin x (1) f (u) = u2 , g(x) = x + 1 (3) f (u) = u12 , g(t) = t 3 − t (4) f (u) = sec u, g(t) = cost √ (5) f (x) = x, g(u) = cos u (6) f (s) = 2s2 , g(u) = 5u − 1 2. Use the chain rule to find √ the derivative of the3following functions. (3) cos x (1) (2x + 1)3 (2) t 3 − 2t + 1 1 −3 (4) 3(cos x) (5) sin(3πx) + cos(2x2 ) (6) cost + sint 3. Use the chain rule to find the derivative of the following functions. q p √ 4 2 (3) sin3 (cos2 t) (1) 1 + tan (t ) (2) 2t + 1 + 1 − t dy 4. Use implicit differentiation to find dx . (1) x2 + y2 = 4 (2) x + y2 = 1 (3) sin x + y2 = 1 (4) xy2 + x2 = 3 (5) xy = sin(xy) (6) (2xy + y2 )2 = x2 − y2 dy 5. Use implicit differentiation to find dx and dx dy at the given points. 2 2 2 (1) x + y = 4 at x = 1 (2) y − x + x = 4 at x = 1 (3) xy + x2 + y2 = 1 2 at x = 1 (4) x + y − 2y = 4 at x = 1 6. Use the results from the above calculations to find the product of same points. dy dx and dx dy at the Lecture 5 Integration & fundamental theorem of calculus In the third lecture, we studied differentiation, and in the fourth lecture, we explored differentiation techniques. Now, we delve into integration. Integration involves the process of going back to the function before differentiation. To master integration, practice is essential. The techniques of integration are covered in more detail in Part III. 5.1 Antiderivative For a given function f (x), an antiderivative refers to a function that, when differentiated, results in f (x). In other words, F ′ (x) = f (x) defines a function F(x) as the antiderivative of f (x). The process of finding antiderivatives involves reversing the steps of differentiation. 
However, finding actual antiderivatives can be challenging. Problem 3.3 provides rules of differentiation, which, when applied in reverse, can help find antiderivatives in certain cases. It is crucial to note that an antiderivative is not unique; any constant can be added to it. Since the derivative of any constant is zero, all constant functions are antiderivatives of 0. Thus, $F(x) = C$ for any constant $C\in\mathbb{R}$. Here, $C$ is called a generic constant. If there is a constant like $3C$, it can be rewritten as $C$ for simplicity. Let's consider the following problems.

Problem 5.1. Find the antiderivatives of the following functions.
(1) $f(x) = 0$.
(2) $f(x) = 1$.
(3) $f(x) = x^{\alpha}$, where $\alpha\neq -1$.
(4) $f(x) = x^{-1}$.

Solution 5.1 Interpret the instruction as asking for all antiderivatives.
(1) Since the derivative of a constant is 0, all constant functions are antiderivatives of 0. Thus, $F(x) = C$ for any constant $C\in\mathbb{R}$. Since $C$ is a generic constant, $3C$ can be rewritten as $C$.
(2) Let $F(x) = x$; then, $F'(x) = 1$. Therefore, $F(x) = x$ is an antiderivative of $f(x) = 1$. However, since the derivative of any constant is 0, all functions of the form $F(x) = x+C$, where $C$ is any constant, are antiderivatives of $f(x) = 1$.
(3) The derivative of $x^{\alpha}$ reduces the power by one ($(x^{\alpha})' = \alpha x^{\alpha-1}$). So, the antiderivative should increase the power by one, adjust the coefficient, and add a constant. The antiderivative is $F(x) = \frac{1}{\alpha+1}x^{\alpha+1}+C$, but it is valid only when $\alpha\neq -1$.
(4) For $\alpha = -1$, what is the antiderivative of $f(x) = x^{-1}$? This case is special and holds significant importance. The antiderivative and its inverse function might be among the most crucial functions in mathematics.

Apart from the cases mentioned above, practicing finding antiderivatives of various functions is necessary. The antiderivatives obtained in Problem 5.1 always include an additional constant term $C$.
This constant is referred to as a generic constant and appears automatically so as to encompass all possible antiderivatives.

Question 5.1 (Antiderivative of 0). Is there really nothing more than the antiderivatives found above? In other words, by adding the generic constant $C$, do we obtain all possible antiderivatives?

This question is equivalent to asking whether every antiderivative of 0 is a constant function. The reason is that if $F$ and $G$ are two antiderivatives of the same function $f$, then
\[
(F-G)' = F'-G' = f-f = 0.
\]
This means the difference $F-G$ is an antiderivative of 0. Hence, if the only antiderivatives of 0 are constant functions, then $F$ and $G$ differ only by a constant.

So, is there no non-constant function whose derivative is zero? In other words, does some peculiar non-constant function exist whose derivative is identically zero? This question is somewhat philosophical(?) and mysterious. It can be addressed using the Mean Value Theorem. First of all, an antiderivative is a differentiable function, allowing us to apply the theorem. Suppose a differentiable function $F$ satisfies $F'(x) = 0$ for all $x$. If $F$ is not a constant function, then there exist two distinct points $x\neq y$ such that $F(x)\neq F(y)$, and hence, the average rate of change between them is $\frac{F(x)-F(y)}{x-y}\neq 0$. Therefore, by the Mean Value Theorem, there exists a point $c$ between $x$ and $y$ at which the derivative equals this average rate of change, i.e., $F'(c) = \frac{F(x)-F(y)}{x-y}\neq 0$. This contradicts the fact that $F$ is an antiderivative of zero, as $F'(c)$ should be zero. Therefore, the only antiderivatives of zero are constants.

Problem 5.2.

Solution 5.2

5.2 Integral as the area bounded by a graph

Let $f : \mathbb{R}\to\mathbb{R}$ be a given function, and define $F(t)$ as the area between the $x$-axis and the graph of $f$ over the interval $[a, t]$, where $a$ is a fixed constant and $t > a$. If the area lies below the $x$-axis (where $f < 0$), consider it with a negative sign.
Treat $a$ as a fixed constant and $t$ as a variable. This area function is typically represented using the integral symbol as
\[
F(t) = \int_a^t f(x)\,dx.
\]
Now, let's examine the derivative of this area function. The derivative is given by
\[
F'(t) = \lim_{h\to 0}\frac{F(t+h)-F(t)}{h}.
\]

Problem 5.3. If $F(t)$ is the area function corresponding to the given function $f(t)$, prove the following:
\[
F'(t) = f(t). \tag{5.1}
\]
However, this equation is not always satisfied. Provide the conditions on the function $f$ for which (5.1) holds.

Solution 5.3 Consider $F(t+h)-F(t)$ as the area represented by the pink region in the graph. Dividing it by the width $h$ gives an average height, which approaches the value $f(t)$ as $h$ approaches 0. To show convergence, the function $f$ needs to be continuous at $t$. This proof uses everyday language; in Problem 5.4, the ε-δ method is used for a more formal proof.

Question 5.2 (Definition of the area function). To proceed with the discussion, a natural question arises. Given a function $f(x)$, is the area function well-defined?

First, one must decide how to calculate the area; that method then determines whether the area function of a given function is well-defined. This is a question of how to perform integration. We will determine the area using the method of Riemann sums for definite integrals. However, regardless of how you define the area, some functions cannot have a well-defined area function. There are two main reasons for this: the area diverges to infinity, or the function is too irregular for the area to be calculated. In introductory calculus courses, only functions that are continuous except for a finite number of points and bounded on all finite intervals are considered. Under these conditions, the area over a finite interval does not diverge to infinity.
Due to continuity, the function is not too irregular, allowing for the calculation of the area.

Problem 5.4 (Derivative of the area function). Assuming the area function is well-defined and $f(x)$ is continuous, use the ε-δ method to prove (5.1).

Solution 5.4 Given an arbitrary $\varepsilon > 0$, since $f$ is continuous at $x = t$, there exists a $\delta > 0$ such that for all $|h| < \delta$, $|f(t)-f(t+h)| < \varepsilon$. Therefore, for $0 < h < \delta$, the area $F(t+h)-F(t)$ lies between $(f(t)-\varepsilon)h$ and $(f(t)+\varepsilon)h$, so that
\[
\frac{F(t+h)-F(t)}{h} \le \frac{(f(t)+\varepsilon)h}{h} = f(t)+\varepsilon
\]
and
\[
\frac{F(t+h)-F(t)}{h} \ge \frac{(f(t)-\varepsilon)h}{h} = f(t)-\varepsilon.
\]
The case $-\delta < h < 0$ is handled in the same way. Hence, for $0 < |h| < \delta$, we have
\[
\left|\frac{F(t+h)-F(t)}{h}-f(t)\right| \le \varepsilon.
\]
Thus, $F$ is differentiable at $t$, and $F'(t) = f(t)$.

5.3 Riemann sum and area

In this section, we define the Riemann integral using the method of partition sums and use it to define the integral $\int_a^b f(x)\,dx$. Let $f : [a, b]\to\mathbb{R}$ be a continuous function defined on the closed interval $[a, b]$. First, consider a set of points, $\pi = \{x_0, x_1, \cdots, x_p\}$, called a partition of the closed interval $[a, b]$, satisfying the following conditions:
\[
a = x_0 < x_1 < x_2 < \cdots < x_p = b.
\]
Although the symbol $\pi$ is commonly used for the mathematical constant pi, here it is used to represent a partition. One partition may consist of many points and another of fewer points, but they all must start at $x_0 = a$ and end at $x_p = b$. Furthermore, the points must be listed in increasing order. The size of the $k$-th subinterval is represented as follows:
\[
\triangle x_k = x_k-x_{k-1}, \quad k = 1, \cdots, p.
\]
The gauge of the partition $\pi$ is defined as the maximum size of the subintervals and is denoted as:
\[
\|\pi\| := \max_{1\le k\le p}\triangle x_k.
\]
After that, for each subinterval, choose a point $c_k\in[x_{k-1}, x_k]$, and define the Riemann sum as follows:
\[
\sum_{k=1}^{p} f(c_k)\,\triangle x_k, \quad c_k\in[x_{k-1}, x_k]. \qquad\text{(Riemann Sum)}
\]
Let's understand the Riemann sum through some examples.

Problem 5.5. Calculate the Riemann sum for the given functions and partitions. Choose the points $c_k$ in each subinterval for ease of calculation.
1.
For the constant function $f(x) = c$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area.
2. For the function $f(x) = x$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area. Also, find the limit of the Riemann sum as $n\to\infty$.
3. For the function $f(x) = x^2$, interval $[0, 1]$, and partition $\pi = \{\frac{i}{n} : i = 0, 1, \cdots, n\}$, calculate the Riemann sum and compare it with the area.

Solution 5.5 □

Definition 5.1. (1) A number $I$ is called the integral of $f$ over $[a, b]$ if
\[
\lim_{\|\pi\|\to 0}\sum_{k=1}^{p} f(c_k)\,\triangle x_k = I
\]
for any choice of $c_k\in[x_{k-1}, x_k]$. We denote $I = \int_a^b f(x)\,dx$.
(2) If the limit exists, $f$ is called integrable.
(3) We denote $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.

Theorem 5.1. (1) If $f : [a, b]\to\mathbb{R}$ is continuous, then it is integrable.
(2) If $f : [a, b]\to\mathbb{R}$ is continuous except for a finite number of points and is bounded, then it is integrable.

Some continuity condition of this kind cannot simply be dropped: a function that is discontinuous everywhere may fail to be integrable. The following problem illustrates this point.

Problem 5.6. Show that the function $f : \mathbb{R}\to\mathbb{R}$ defined as follows is not integrable on any finite interval $[a, b]$.
\[
f(x) = \begin{cases} 1, & x \text{ is a rational number},\\ 0, & x \text{ is an irrational number}. \end{cases}
\]

Solution 5.6 Every subinterval $[x_{k-1}, x_k]$ contains both rational and irrational numbers. Therefore, choosing every $c_k$ to be a rational number makes the Riemann sum equal to $b-a$, and choosing every $c_k$ to be an irrational number makes the Riemann sum equal to 0. According to the definition, the Riemann sum should converge to the same number $I$ regardless of how $c_k$ is chosen, which is not the case. □

Problem 5.7. Show the following properties of integrals for functions $f$ and $g$ defined on the interval $[a, b]$:
1. $\int_a^a f(x)\,dx = 0$.
2. $\int_a^b k f(x)\,dx = k\int_a^b f(x)\,dx$ for any constant $k$.
3. $\int_a^b (f+g)\,dx = \int_a^b f(x)\,dx+\int_a^b g(x)\,dx$.
4. $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx+\int_c^b f(x)\,dx$ for any $c\in[a, b]$.
5. If $f(x)\le g(x)$ for all $x$ in $[a, b]$, then $\int_a^b f(x)\,dx\le\int_a^b g(x)\,dx$.

Solution 5.7 □

Now let's introduce the fundamental theorem of calculus in two forms.

Theorem 5.2 (Fundamental Theorem of Calculus). Let $f$ be a continuous function on the closed interval $[a, b]$, and let $F(x) = \int_a^x f(t)\,dt$. Then, $F(x)$ is differentiable, and $F'(x) = f(x)$.

Proof. To prove this theorem, we need to make good use of the properties of integrals, which we accept without proof. □

The Fundamental Theorem of Calculus states that if a function is continuous, then it is integrable, and if we integrate it and then differentiate the result, we get back the original function.

Theorem 5.3 (Cauchy's Fundamental Theorem of Calculus). Let $f$ be a continuous function on the closed interval $[a, b]$, and let $F'(x) = f(x)$. Then, $\int_a^b f(x)\,dx = F(b)-F(a)$.

Proof. Let's prove this using the first version of the Fundamental Theorem. Let $G(x) = \int_a^x f(t)\,dt$. Then, $G(x)$ is also an antiderivative of $f$. Therefore, $(F-G)'(x) = F'(x)-G'(x) = f(x)-f(x) = 0$. Since $F-G$ is an antiderivative of 0, it is a constant. Thus, there exists a constant $C$ such that $F-G = C$. Consequently, $F = G+C$ and, since $G(a) = 0$,
\[
F(b)-F(a) = G(b)-G(a) = \int_a^b f(x)\,dx. \qquad\square
\]

Cauchy's version shows a way to calculate integrals or areas. Once we find an antiderivative $F$ of $f$, we can calculate $F(b)-F(a)$ to obtain the integral value $\int_a^b f(x)\,dx$. If finding an antiderivative is easy, then using it is preferable. However, in some cases, finding an antiderivative is not straightforward, and in such cases, it is much easier to use Riemann sums (partition integrals).

Problem 5.8. Use the Fundamental Theorem of Calculus (FTC) to calculate the following integrals.
(1) $\int_0^a x^2\,dx$
(2) $\int_0^a x^{\alpha}\,dx$
(3) $\int_0^a (x^2+x^4)\,dx$
(4) $\int_1^2 x^{-1}\,dx$

Solution 5.8 □

Exercises

1.
For the partition π of the interval [a, b] and ck ∈ [xk−1 , xk ], find the following limits and determine for which intervals [a, b] they converge. p p ∑ 2c2k △k ∥π∥→0 (1) lim k=1 p (4) lim ∑ tan ck △k ∥π∥→0 k=1 p ∑ (3ck + c2k )△k ∥π∥→0 ∑ sin ck △k ∥π∥→0 (3) lim c−1 k △k (6) lim (2) lim k=1 p (5) lim ∑ ∥π∥→0 k=1 k=1 p ∑ ck0.5 △k ∥π∥→0 k=1 2. Find the antiderivatives of the following functions. (The variable used doesn’t make a difference.) (1) f (x) = x2 (2) f (y) = y−2 (3) f (t) = −|t| (4) f (t) = 3t − 1 (5) f (y) = sin y (6) f (x) = x−1 3. Evaluate the following integrals. Z 2 (1) (4) 2 (x − x − 1)dx Z0 3 2 y − y4 1 y2 Z 2 (2) x Z 12π dy (5) 0 100 Z 3 dx sin 2x dx sin x 4. Simplify the following. Z Z 2 d t d x 2 t dt (2) f (x)dx (1) dx 0 dt 2 d (3) dx (3) Z−2π (6) |t|dt (cos x + | cos x|)dx 0 Z 1 √ sintdt x 5. Find the area between y = x2 and y = 1. √ 6. Find the area between y = x, x = 1, and y = 0. d (4) dt Z √t x 1 −2 3 dx Lecture 6 Inverse functions and their derivatives In this lecture, we will learn about the logarithmic function, which is a very important function. We will also learn about its inverse function, the exponential function. We will cover basic mathematical concepts such as function definitions, when inverse functions exist, and use the chain rule to calculate the derivatives of inverse functions. 6.1 Bijection (one-to-one and onto function) If a function f takes set A as its domain and set B as its co-domain, it is denoted as f : A → B. Set A is called the domain, and set B is called the co-domain. The set of all images of the function f in the subset of set B, defined as R( f ) := {y ∈ B| there exists x ∈ A such that y = f (x)}, is called the range. A function must have exactly one value for each element in its domain. It cannot have more than one or zero values for a given element. Definition 6.1. The mapping f : A → B is called a function if there exists y ∈ B uniquely for each x ∈ A such that f (x) = y. 
Defining the inverse function is straightforward: it is defined as the reverse. However, it is important to note that the inverse function must also be a function, which imposes certain conditions.

Definition 6.2. A function $f : A \to B$ is called one-to-one (or an injection) if any value $y \in B$ is taken at most once. In other words, $f(x_1) \ne f(x_2)$ if $x_1 \ne x_2$. A function $f : A \to B$ is called onto (or a surjection) if any value $y \in B$ is taken at least once, i.e., $R(f) = B$. The function $f$ is called a bijection if it is one-to-one and onto.

For one-to-one and onto functions, we can define the inverse function.

Definition 6.3. If $f : A \to B$ is one-to-one and onto, we may define a function $g : B \to A$ such that $g(y)$ is the preimage of $y$ under $f$, i.e., $f(g(y)) = y$. This function $g$ is called the inverse function of $f$. We denote the inverse function of $f$ as $f^{-1}$.

Problem 6.1. Prove that if the function $f : A \to B$ is a bijection, then the inverse function $f^{-1}$ is defined.

Solution 6.1 What does it mean to prove that the inverse function is defined in this situation? It means to show that the inverse, defined as the inverse relation, is indeed a function. To demonstrate that $g$ is a function, we need to show that, for every $y \in B$, the value $g(y)$ is uniquely defined. Since $f$ is onto, for every $y \in B$ there exists an $x \in A$ such that $f(x) = y$. Hence $g(y)$ has at least one value. If $g(y)$ had two distinct values $x_1 \ne x_2$, then $f(x_1) = f(x_2) = y$, which contradicts the assumption that $f$ is one-to-one. Therefore, $g(y)$ has a unique value for every $y \in B$, and thus $g$ is a function. □

Question 6.1. If $f(x)$ is a bijection, how are the graphs of $f(x)$ and its inverse function $f^{-1}(x)$ related?

They are symmetric with respect to the line $y = x$. Can you easily explain why this symmetry exists? The graph of the function $f$ depicts points satisfying $y = f(x)$ on the coordinate plane.
The same points satisfy $x = f^{-1}(y)$. Therefore, the graphs are the same. However, since the variable is typically denoted by $x$, we swap $x$ and $y$, which reflects the graph about the line $y = x$. After swapping $x$ and $y$, the third figure is obtained.

How do we determine the inverse function for a non-bijective function? We select appropriate subsets of $A$ and $B$ to make the function one-to-one and onto. This process is called choosing a branch. Different branches yield different inverse functions. When selecting a branch, it is desirable to make it as large as possible.

Problem 6.2. Find sets $A, B \subset \mathbb{R}$ to make the following function $f : A \to B$ a bijection. Find its inverse function.
(1) $f(x) = \frac{1}{2}x + 1$  (2) $f(x) = x^2$

Solution 6.2 (1) This function is one-to-one and onto when we set $A = \mathbb{R}$ and $B = \mathbb{R}$. To find the inverse function, we can simply set $f(x) = y$ and then rearrange the equation to solve for $x$ as $x = g(y)$:
\[ \tfrac{1}{2}x + 1 = y \;\Rightarrow\; x = 2(y - 1) \;\Rightarrow\; g(y) = x = 2(y - 1). \]
(2) $f$ is not one-to-one on $\mathbb{R}$. To obtain an interval on which $f$ is one-to-one, we can choose $A = [0, \infty)$ or $A = (-\infty, 0]$. Most people would choose $A = [0, \infty)$, and we will also adopt that. Additionally, setting $B = [0, \infty)$ and considering $f$ as a function restricted to the branch $f : A \to B$, an inverse function exists. In this case,
\[ y = x^2 \;\Rightarrow\; x = \sqrt{y} \;\Rightarrow\; g(y) = \sqrt{y}, \quad y \ge 0. \]
Thus, the inverse function is $g(y) = \sqrt{y}$. □

6.2 Derivative of inverse functions

The inverse function of a function $f : A \to B$ is typically denoted as $f^{-1}$. Thus, $f^{-1} : B \to A$. If we distinguish the variables of the domain as $x \in A$ and the variables of the co-domain as $y \in B$, then we write $f(x)$ and $f^{-1}(y)$. This distinction helps reduce confusion.
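The two inverses found in Solution 6.2 can be sanity-checked numerically. What follows is a minimal Python sketch and not part of the original notes: the names `f1`, `g1`, `f2`, `g2`, and `is_inverse_pair` are illustrative, and the sample points are arbitrary. Following the convention just described, the argument of each function is called `x` and the argument of each inverse is called `y`.

```python
import math

def f1(x):
    """f(x) = x/2 + 1 on A = R."""
    return 0.5 * x + 1.0

def g1(y):
    """Inverse from Solution 6.2(1): g(y) = 2(y - 1)."""
    return 2.0 * (y - 1.0)

def f2(x):
    """f(x) = x^2 restricted to the branch A = [0, inf)."""
    if x < 0:
        raise ValueError("x lies outside the chosen branch [0, inf)")
    return x * x

def g2(y):
    """Inverse from Solution 6.2(2) on B = [0, inf): g(y) = sqrt(y)."""
    return math.sqrt(y)

def is_inverse_pair(f, g, xs, ys):
    """Check g(f(x)) = x on sample x values and f(g(y)) = y on sample y values."""
    left = all(math.isclose(g(f(x)), x, abs_tol=1e-12) for x in xs)
    right = all(math.isclose(f(g(y)), y, abs_tol=1e-12) for y in ys)
    return left and right

print(is_inverse_pair(f1, g1, [-3.0, 0.0, 0.5, 4.0], [-2.0, 1.0, 7.0]))
print(is_inverse_pair(f2, g2, [0.0, 0.5, 3.0], [0.0, 2.0, 9.0]))
```

A check on finitely many points is of course not a proof; it only catches algebra slips. Passing a negative `x` to `f2` raises an error on purpose, as a reminder that the inverse belongs to the chosen branch.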
Calculating the derivative of the inverse function is best approached using the chain rule, which states that if $f$ is differentiable at $c$ and $g$ is differentiable at $f(c)$, then the composition $g \circ f$ is differentiable at $c$, and its derivative is
\[ (g \circ f)'(c) = g'(f(c))\, f'(c). \]
Let's rewrite this using Leibniz notation, distinguishing the variables $x$ and $y$. If we set $y = f(x)$ and $g = g(y) = g(f(x))$, we may write
\[ \frac{dg}{dx} = \frac{dg}{dy}\frac{dy}{dx} = g'(y)\, f'(x) = g'(f(x))\, f'(x). \]
If $g$ is the inverse function of $f$, then $g(f(x)) = x$. Thus,
\[ (g \circ f)(x) = x \;\Rightarrow\; (g \circ f)'(x) = 1. \]
Applying the chain rule, we get
\[ g'(f(x))\, f'(x) = 1 \;\Rightarrow\; g'(f(x)) = \frac{1}{f'(x)}. \]
As mentioned earlier, this rule does not hold when the denominator is zero. Therefore, we obtain the following derivative of the inverse function.

Theorem 6.1 (Derivative of Inverse Functions). If $f : A \to B$ is a bijection, differentiable at $c \in A$, and $f'(c) \ne 0$, then the inverse function $f^{-1}$ is differentiable at $f(c)$, and its derivative is given by
\[ (f^{-1})'(f(c)) = \frac{1}{f'(c)}. \]

Proof. The previous application of the chain rule to calculate the derivative of the inverse function was not a complete proof. We assumed the differentiability of the inverse function and used the chain rule accordingly. To complete the proof, we must demonstrate that the inverse function is differentiable. Two approaches are possible. Firstly, knowing what the derivative should be, we show that it is the limit of the difference quotient. Secondly, using the graph, we observe that the inverse function's graph is obtained by reflecting the original function's graph about the line $y = x$. The slope becomes $\frac{1}{f'(c)}$ after the reflection. □

To calculate the derivative of the inverse function, one can either find the inverse function explicitly and then differentiate it, or use the formula provided above. Let's try both approaches in the following problem and compare the results.

Problem 6.3.
Find the inverse function and its derivative, then compare with the derivative formula.
(1) $f(x) = x^2$  (2) $f(x) = x^3 - 2$, for finding $(f^{-1})'(6)$.

Solution 6.3 (1) To find the inverse function, start with $y = x^2$. Since $x, y \ge 0$, we have $x = \sqrt{y}$. Therefore, the inverse function is $f^{-1}(y) = \sqrt{y} = y^{1/2}$. Differentiating, we get
\[ (f^{-1})'(y) = \frac{1}{2} y^{-1/2} = \frac{1}{2\sqrt{y}}. \]
Now, using the derivative formula for inverse functions,
\[ (f^{-1})'(y) = (f^{-1})'(f(x)) = \frac{1}{f'(x)} = \frac{1}{2x}. \]
Do these two results match? Indeed, since $x = \sqrt{y}$, we can substitute to verify.

(2) Let's repeat the process. Set $y = x^3 - 2$, and rewriting gives $x = (y + 2)^{1/3}$. Hence, $f^{-1}(y) = (y + 2)^{1/3}$, and its derivative is $(f^{-1})'(y) = \frac{1}{3}(y + 2)^{-2/3}$. Using the derivative formula for inverse functions,
\[ (f^{-1})'(y) = (f^{-1})'(f(x)) = \frac{1}{f'(x)} = \frac{1}{3x^2}. \]
These two results match, since $x^2 = (y + 2)^{2/3}$. Now, evaluate $(f^{-1})'(6)$. In the first case, substituting $y = 6$ gives $(f^{-1})'(6) = \frac{1}{3}\, 8^{-2/3} = \frac{1}{12}$. In the second case, we first confirm that $x = 2$ when $y = 6$, and then evaluate
\[ (f^{-1})'(6) = \frac{1}{f'(2)} = \frac{1}{3 \times 2^2} = \frac{1}{12}. \]
The results match. □

In many cases, finding the inverse function explicitly requires extensive computation and may not be possible. Even if it is possible, it is often more desirable to understand and use the derivative of the inverse function effectively.

6.3 Natural logarithm

In the previous lecture, we were able to find the antiderivative of the power function $x^\alpha$ for all cases except when $\alpha = -1$. So, what about the antiderivative of $x^{-1}$? We take it to be the area function of $f(x) = x^{-1}$. We call this area function the natural logarithm and define it through the integral
\[ \ln x = \int_1^x \frac{1}{t}\,dt. \tag{6.1} \]

Question 6.2. Why do we define $\ln x$ by integrating from $1$ to $x$? Wouldn't it be more natural to integrate from $0$ to $x$? Or even from $x$ to $\infty$? Is there a specific reason for integrating from $1$ to $x$?
Perhaps the most natural choice would be to integrate from $0$. However, the function $t^{-1}$ cannot be integrated from $0$ because the integral diverges to infinity:
\[ \int_0^x \frac{1}{t}\,dt = \infty. \]
The next logical choice might be to integrate from $x$ to $\infty$, but this is also impossible as the integral diverges:
\[ \int_x^\infty \frac{1}{t}\,dt = \infty. \]
Therefore, the only feasible option is to integrate from a finite positive value. Historically, people chose to integrate from $1$, leading to the definition of $\ln x$ through (6.1). With this choice, $\ln 1 = 0$.

We already have antiderivatives for all power functions $x^\alpha$ with $\alpha \ne -1$. By the definition (6.1), the antiderivative for $\alpha = -1$ is the natural logarithm $\ln x$. This allows us to perform differentiations related to natural logarithms. Here are some fundamental properties derived from the definition of the natural logarithm.

Theorem 6.2. The natural logarithm $\ln x$ satisfies the following properties.
1. $\ln x$ is defined only for $x > 0$.
2. $\ln |x|$ is defined for all non-zero $x \in \mathbb{R} - \{0\}$.
3. $\frac{d}{dx} \ln x = \frac{1}{x}$ for all $x > 0$, by the fundamental theorem of calculus.
4. $\frac{d}{dx} \ln |x| = \frac{1}{x}$ for all $x \ne 0$.

Proof. Properties 1, 2, and 3 stem from the definition of the natural logarithm. Property 4 is evident when considering the graph of $y = \ln |x|$. It can also be computed: since $\ln |x| = \ln(-x)$ for $x < 0$, the chain rule gives, when $x < 0$,
\[ \frac{d}{dx} \ln |x| = \frac{d}{dx} \ln(-x) = \frac{1}{-x} \times (-1) = \frac{1}{x}. \]
Therefore, for all $x \ne 0$, we have $\frac{d}{dx} \ln |x| = \frac{1}{x}$. □

Next, we present some useful laws derived from the definition of natural logarithms.

Theorem 6.3 (Properties of Natural Logarithms). The natural logarithm $\ln x$ has the following properties.
1. $\ln(xy) = \ln x + \ln y$.
2. $\ln x^k = k \ln x$ for $x > 0$ and $k \in \mathbb{R}$.
3. $\ln(x/y) = \ln x - \ln y$.
4. $\frac{d}{dx} \ln |f(x)| = \frac{f'(x)}{f(x)}$ for all $x$ such that $f(x) \ne 0$.

Proof. The proof of these rules is significantly facilitated by the technique of differentiation. To demonstrate (1), differentiate both sides.
Treating $y$ as a constant and differentiating with respect to $x$, we use the chain rule to obtain
\[ \frac{d}{dx} \ln(xy) = \frac{1}{xy}\, y = \frac{1}{x}, \qquad \frac{d}{dx} (\ln x + \ln y) = \frac{1}{x}. \]
The equality of the derivatives implies that the two sides differ by a constant. Thus, $\ln(xy) = \ln x + \ln y + C$. Setting $x = 1$ to find the constant gives $C = 0$.

To demonstrate (2), differentiate both sides to obtain
\[ \frac{d}{dx} \ln x^k = \frac{1}{x^k}\, k x^{k-1} = \frac{k}{x}, \qquad \frac{d}{dx}\, k \ln x = \frac{k}{x}. \]
Again, the equality of the derivatives implies that the two sides differ by a constant. Thus, $\ln x^k = k \ln x + C$. Setting $x = 1$ to find the constant gives $C = 0$.

For (3), we use the properties obtained above:
\[ \ln(x/y) = \ln(x y^{-1}) = \ln x + \ln y^{-1} = \ln x - \ln y. \]
For (4), we apply the chain rule to Theorem 6.2(4). For example, where $f(x) < 0$,
\[ \frac{d}{dx} \ln |f(x)| = \frac{d}{dx} \ln(-f(x)) = \frac{1}{-f(x)} \times (-f'(x)) = \frac{f'(x)}{f(x)}, \]
and the case $f(x) > 0$ is handled in the same way. □

Problem 6.4. Let $a > 0$. Using Theorem 6.3, prove the following:
\[ \frac{d}{dx} a^x = a^x \ln a. \]

Solution 6.4 The idea is to differentiate $\ln a^x$ in two ways. Using Theorem 6.3(2), we have
\[ \frac{d}{dx} \ln(a^x) = \frac{d}{dx} (x \ln a) = \ln a. \]
On the other hand, using Theorem 6.3(4), we have
\[ \frac{d}{dx} \ln(a^x) = \frac{(a^x)'}{a^x}. \]
Both expressions must be equal, leading to $(a^x)' = a^x \ln a$. □

Using natural logarithms makes it easier to differentiate seemingly complex fractional functions, as shown in the following problem.

Problem 6.5. Find the derivative of the fraction function $\dfrac{(x^2+1)(x+3)^{1/2}}{x-1}$.

Solution 6.5 Applying the product and chain rules directly to the given function becomes complicated. However, taking the natural logarithm of this function turns the product and division into sums and differences, making differentiation easier. Let $y = \dfrac{(x^2+1)(x+3)^{1/2}}{x-1}$. Then, using the properties of natural logarithms (for $x > 1$, where $y > 0$), we have
\[ \ln y = \ln(x^2 + 1) + \frac{1}{2} \ln(x + 3) - \ln(x - 1). \]
Differentiating both sides with respect to $x$:
\[ \frac{y'}{y} = \frac{2x}{x^2 + 1} + \frac{1}{2(x + 3)} - \frac{1}{x - 1}. \]
Multiplying both sides by $y$, we get
\[ y' = \left( \frac{2x}{x^2 + 1} + \frac{1}{2(x + 3)} - \frac{1}{x - 1} \right) \frac{(x^2 + 1)(x + 3)^{1/2}}{x - 1}. \]
For more complicated fraction functions, using natural logarithms can simplify the differentiation process. □

Problem 6.6. Evaluate the following indefinite integrals.
(1) $\int \cot x\,dx$  (2) $\int \tan x\,dx$  (3) $\int \frac{2\cos x}{3 + 2\sin x}\,dx$  (4) $\int \sec x\,dx$  (5) $\int \csc x\,dx$

Solution 6.6 Generally, finding antiderivatives is not easy. The given problems involve finding antiderivatives using Theorem 6.3(4). If you recognize the given function as
\[ F(x) = \frac{f'(x)}{f(x)}, \]
you can directly use Theorem 6.3(4) to find
\[ \int F(x)\,dx = \ln |f(x)| + C. \]
Don't forget to include the absolute value. □

6.4 Exponential function

Euler's number, also known as the natural constant and denoted by $e$, is the special number that satisfies
\[ \int_1^e \frac{1}{t}\,dt = 1. \]
In other words, $\ln e = 1$. This number, like $\pi$, is irrational. Now let's consider the derivative of the exponential function $f(x) = e^x$ with respect to $x$. According to Problem 6.4, we have
\[ \frac{d}{dx} e^x = e^x \ln e = e^x. \]
In other words, the derivative and the original function are the same, making it a very special function.

Now, let's consider the inverse function of the natural logarithm. The inverse function of the natural logarithm $\ln x$ is called the exponential function and is denoted as $\exp x$. When expressing the natural logarithm function as $\ln : A \to B$, where the domain is $A = (0, \infty)$ and the range is $B = (-\infty, \infty) = \mathbb{R}$, the exponential function is defined as $\exp : \mathbb{R} \to (0, \infty)$. Now, let's compute the derivative of $\exp x$. Let $y = \exp(x)$; then $x = \ln(y)$. Using the inverse function's derivative rule, we get
\[ \exp'(x) = \frac{1}{\ln'(y)} = \frac{1}{1/y} = y = \exp(x). \]
In other words, the function $\exp x$, like the exponential function $e^x$, has the property that its derivative returns itself. To show that these two functions are the same, what do we need to demonstrate? Let's consider the following problem for now.

Problem 6.7.
Two functions $f(x)$ and $g(x)$ are both positive and satisfy the following conditions:
\[ f'(x) = f(x), \qquad g'(x) = g(x), \qquad f(c) = g(c) \text{ for some } c. \]
Prove that these two functions are the same.

Solution 6.7 Since $g(x)$ is positive, it never becomes $0$, which ensures that the fraction function $\frac{f(x)}{g(x)}$ is well-defined. It is easy to show that the derivative of this fraction function is $0$:
\[ \left( \frac{f}{g} \right)' = \frac{f' g - f g'}{g^2} = \frac{f g - f g}{g^2} = 0. \]
Therefore, the fraction function is a constant, and since $f$ and $g$ have the same value at $x = c$, this constant must be $1$. Consequently, for all $x$, we have $\frac{f(x)}{g(x)} = 1$, implying $f(x) = g(x)$. □

Since $e^x$ and $\exp x$ are both positive, have derivatives equal to themselves, and are equal at $x = 0$, we can conclude that
\[ \exp x = e^x, \qquad x \in \mathbb{R}. \]
While both notations are used, we typically use $e^x$.

Now, let's summarize some properties of the exponential function.

Theorem 6.4 (Exponential Function). The exponential function $e^x$ satisfies the following properties.
(1) $\frac{d}{dx} e^x = e^x$.  (2) $\frac{d}{dx} e^{f(x)} = f'(x)\, e^{f(x)}$.  (3) $e^x e^y = e^{x+y}$.

Proof. (1) has already been proven. (2) follows from (1) using the chain rule. (3) can be proven by taking the natural logarithm of both sides:
\[ \ln(e^x e^y) = \ln e^x + \ln e^y = x + y = (x + y) \ln e = \ln(e^{x+y}). \]
Since the natural logarithm is a one-to-one function, we can conclude that $e^x e^y = e^{x+y}$. □

Exercises

1. Find the inverse functions of the following functions along with their domains.
(1) $f(x) = x^5$  (2) $f(x) = 2x + 1$  (3) $f(x) = (1 - x)^3$

2. Compute the derivative of the inverse functions $(f^{-1})'(2)$ at $y = 2$.
(1) $f(x) = x^5$  (2) $f(x) = 2^x + 1$  (3) $f(x) = x^3 + 1$
(4) $f(x) = \frac{x}{x - 1}$  (5) $f(x) = \frac{\sqrt{x + 3}}{\sqrt{x - 3}}$  (6) $f(x) = (x^2 - 1)^{0.3}$

3. Compute the derivatives of the following functions.
(1) $y = \ln(x^3 + x)$  (2) $y = \ln(\ln^2 x)$  (3) $y = e^{x^2 + 3}$
(4) $y = \frac{1}{t(1 + t)}$  (5) $y = \frac{1 + t}{1 - t}$  (6) $y = \ln \sqrt{\frac{1 + t}{1 - t}}$

4. Compute the following integrals.
(1) $\int_{-2}^{-1} \frac{1}{x}\,dx$  (2) $\int_0^{\pi} \frac{\cos x}{10 + \sin x}\,dx$  (3) $\int_0^2 e^x\,dx$
(4) $\int_1^2 \frac{dx}{x \ln x}$  (5) $\int 10 \tan 2x\,dx$  (6) $\int \frac{7t}{t^2 - 10}\,dt$

5. Differentiate the following using the properties of natural logarithms.
(1) $\sqrt{\frac{(x + 1)(x - 1)^2}{(x^2 + 1)x^2}}$  (2) $\frac{2 + t}{t \sin t}$  (3) $\tan\theta \sqrt{2\theta - 1}$

6. Find $\frac{dy}{dx}$ when $x$ and $y$ satisfy the given relations.
(1) y = yx  (2) ex = x2y+2x  (3) $\ln y = x^3 y$
(4) $y = x^x$  (5) $y = (\tan x)^{2x}$  (6) $y^{\sin x} = x^{\cos y}$

7. Evaluate the following integrals.
Z Z 2 ln(10x) dx (1) dx (2) x − 1 x Z Z ln x (4) 3x √ 2 dx (5) x √ 2−1 (3) dx Z ln x 1 Z1 0 t dt 5−x dx (6) −2

Part II Kepler and Newton's Laws of Motion

The astronomer Johannes Kepler analyzed the observations of the Danish astronomer Tycho Brahe and, between 1609 and 1619, explained the orbits of the planets around the sun with three laws. These laws replaced the circular orbits of Nicolaus Copernicus's theory with elliptical orbits and explained how the speed of planets changes. The three laws are as follows:
1. The orbit of a planet is an ellipse with the sun at one of the two foci.
2. The line segment connecting the planet and the sun sweeps equal areas in equal time intervals.
3. The square of the period of the planet's orbit is proportional to the cube of the semi-major axis length of the orbit.

Isaac Newton, in 1687, demonstrated that Kepler's laws result from his laws of motion and universal gravitation. Newton's laws of motion consist of three parts:
1. Law of Inertia: An object at rest stays at rest, and an object in motion stays in motion with the same velocity unless acted upon by an external force.
2. Force Law: Force is the product of mass and acceleration ($F = ma$).
3. Action-Reaction Law: For every action, there is an equal and opposite reaction.

Newton's law of gravitational force states that the gravitational force between two objects is inversely proportional to the square of the distance and directly proportional to the product of their masses.
If the masses of the two objects are $m_1$ and $m_2$, and their positions are $\mathbf{x}_1$ and $\mathbf{x}_2$, then the gravitational force acting on object $m_1$ is given by
\[ \mathbf{F}_{m_1} = -G\, \frac{m_1 m_2}{r^2}\, \frac{\mathbf{r}}{r}. \]
In the above equation, $G$ is the gravitational constant ($6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$), and $\mathbf{r}$ and $r$ are defined as follows:
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|. \]
The force acting on object $m_2$ is simply the opposite:
\[ \mathbf{F}_{m_2} = G\, \frac{m_1 m_2}{r^2}\, \frac{\mathbf{r}}{r} = -\mathbf{F}_{m_1}. \]
Thus, it satisfies the action-reaction law.

In Part II, the first goal is to explain Kepler's laws using Newton's laws, and the second goal is to learn various useful mathematics in the process.

Lecture 7 Rectangular coordinate system and curves in R3

Space: the final frontier. These are the voyages of the Starship Enterprise. Its five-year mission: to explore strange new worlds. To seek out new life and new civilizations. To boldly go where no man has gone before! (From Star Trek)

Now, let's take the perspective of Newton and try to explain the motion of celestial bodies using mathematics. To represent the motion of celestial bodies in space with equations, we first need to establish a coordinate system in space. However, this task is not as simple as it might seem. While the Earth has served as a reference for us living on it, there is no such absolute reference in space. The reference frame needs to be chosen by us.

In the movie Star Trek, the spacecraft Enterprise often moved at high speeds and then came to a stop. However, distinguishing between a spacecraft moving at a constant velocity and a stationary one is not meaningful. Therefore, stating whether an object is moving quickly or is at rest is not meaningful by itself. If we want to reach a certain planet, it is more accurate to say that we match the velocity of the spacecraft to the velocity of that planet. Velocity is relative, and kinetic energy is also relative. Only acceleration has absolute meaning.

7.1 Coordinate system

Let there be a particle in space. It is represented as $\mathbf{r}$ in the figure.
Before choosing a reference frame, we cannot say whether this particle is moving or not. What we can have as a reference is an object, or a position, that is not accelerating. We call it the origin, denoted by $\mathbf{0}$. If the particle $\mathbf{r}$ moves with the same velocity as the origin $\mathbf{0}$, we say the particle is not moving. In other words, the velocity of the origin is the zero velocity, denoted by $\mathbf{v} = \mathbf{0}$. We use the same notation $\mathbf{0}$ for the position of the origin and the velocity of the origin, distinguishing them by context. A point that does not move in space, i.e., has zero velocity, is called a position.

To express the position of the particle numerically, we need a coordinate system. A coordinate system in space consists of three positions satisfying certain conditions with respect to the origin. First, we define the unit of distance. Next, we need a position $\mathbf{i}$, which is one unit of distance away from the origin $\mathbf{0}$; the line through $\mathbf{0}$ and $\mathbf{i}$ is the $x$-axis. Then, we choose a line through the origin perpendicular to the $x$-axis; this is the $y$-axis. On this line, we choose a point one unit of distance away from the origin and name it $\mathbf{j}$. Finally, we choose the line through the origin perpendicular to the plane containing the origin, $\mathbf{i}$, and $\mathbf{j}$, select a point at unit distance from the origin on this line, and name it $\mathbf{k}$. This completes the coordinate system.

Problem 7.1. In the explanation above, positively oriented coordinate systems and negatively oriented coordinate systems are distinguished based on the choice of $\mathbf{k}$. What are these cases?

Solution 7.1 (i) Right-hand rule: In the plane containing the origin, $\mathbf{i}$, and $\mathbf{j}$, curl the fingers of your right hand from $\mathbf{i}$ toward $\mathbf{j}$. If $\mathbf{k}$ points in the direction of the thumb, the coordinate system is positively oriented. Otherwise, it is negatively oriented.
(ii) Cross product test: If $\mathbf{k} = \mathbf{i} \times \mathbf{j}$, the coordinate system is positively oriented.
(In any case, the cross product can be explained using the right-hand rule.) □

We choose the positively oriented coordinate system, as is the tradition.

Remark 7.1. In this lecture, we consider 3-dimensional space, but for spaces with dimensions two or higher, there exist both positive and negative coordinate systems, and they can be distinguished. However, two positively oriented coordinate systems cannot be distinguished from each other. They coincide upon rotation. The order in which the coordinate axes are chosen determines the orientation. In 1-dimensional space, there is only one coordinate system. We may take the direction to the right as positive $x$, or the direction to the left as positive $x$. However, a rotation swaps left and right, so the two choices are indistinguishable. The actual choice of coordinates determines the orientation.

Problem 7.2. Two particles move with different velocities without acceleration. Prove that there exists a plane containing the motion of these two particles in space.

Solution 7.2 As discussed, let's take one of the particles as the origin. Relative to this origin, the second particle moves without acceleration, so its trajectory is a straight line. There is a plane containing this line and the origin, and this plane contains the motion of both particles. If the line passes through the origin, there are many such planes, and if it does not pass through the origin, there is a unique plane. □

After looking at the solution to the above problem, if you feel a bit deceived, I want to emphasize that this is not the case. Of course, within the coordinate system with a third party as the origin, there is in general no such plane. Problem 7.2 illustrates that the coordinate system should be chosen according to the purpose.

7.2 Projection

Let $\mathbf{r}$ denote the position of a particle in space. Given a coordinate system, we can represent the position $\mathbf{r}$ with three numbers using that coordinate system. Let's examine the meaning and method in detail.
First, we project $\mathbf{r}$ onto the $x$-axis, which is the line connecting the origin $\mathbf{0}$ and the unit vector $\mathbf{i}$. The projection point of $\mathbf{r}$ onto the $x$-axis is the point where the line passing through the position $\mathbf{r}$ and perpendicular to the $x$-axis intersects the $x$-axis. The distance from the origin to the projection point is the $x$ coordinate of $\mathbf{r}$. If the projection point is on the opposite side of $\mathbf{i}$, we assign a negative sign. Similarly, we can perform this process for $\mathbf{j}$ and $\mathbf{k}$ to find the $y$ and $z$ coordinates. These are the coordinates of the point $\mathbf{r}$.

Consider the projection onto the $xy$-plane. Draw a line perpendicular to the $xy$-plane, passing through $\mathbf{r}$, and find the point where it intersects the $xy$-plane. This point is the projection. The coordinates of this point on the $xy$-plane are $(x, y)$.

We represent $\mathbf{r}$ as a column vector:
\[ \mathbf{r} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
The coordinates for $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, and $\mathbf{0}$ are as follows:
\[ \mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \]
The point $\mathbf{r}$ can be expressed using $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ as
\[ \mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}. \]
Vectors are denoted in bold, and scalars are denoted in regular font. The magnitude or norm of the position vector $\mathbf{r}$ is defined and represented as
\[ \|\mathbf{r}\| = \sqrt{x^2 + y^2 + z^2}. \]
This represents the distance between $\mathbf{r}$ and the origin $\mathbf{0}$ (Pythagorean theorem). Different coordinate systems can be chosen as needed. In such cases, the essential position of $\mathbf{r}$ remains unchanged, but its representation changes.

Question 7.1. Most calculus textbooks do not distinguish whether vectors are column vectors or row vectors. However, we fix $\mathbf{r}$ as a column vector. What is the advantage of choosing column vectors over row vectors?

Distinguishing between column vectors and row vectors reduces confusion. One reason for representing the position vector $\mathbf{r}$ as a column vector is matrix multiplication. If $A$ is a $3 \times 3$ matrix and $\mathbf{x}$ is a vector, we typically write the matrix-vector multiplication as $A\mathbf{x}$.
In this case, $\mathbf{x}$ must be a column vector. However, using column vectors has its drawbacks, as it consumes more space on the page. Therefore, we may sometimes write $\mathbf{r} = (1, 3, 2)$ horizontally to save space. Keep in mind that, depending on the context, this may still represent a column vector.

7.3 Moving particle and trajectory curves in space

Let's consider a planet moving in space. Let time be represented by $t \in \mathbb{R}$, and let $\mathbf{r}(t)$ denote the position of the planet or object at time $t$. Then, we can write
\[ \mathbf{r}(t) = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k} = \begin{pmatrix} f(t) \\ g(t) \\ h(t) \end{pmatrix}. \]
Alternatively, we can express it as
\[ x = f(t), \qquad y = g(t), \qquad z = h(t). \]
Both representations are equivalent, and the meaning is clear. However, on reflection, what is the reason for introducing the new expressions $f(t)$, $g(t)$, and $h(t)$? They represent the $x$, $y$, and $z$ coordinates of the planet as functions of time. But later, one might forget whether $f(t)$ represented the $x$ or the $y$ coordinate. So, it is better to write
\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}. \]
The trajectory of the planet, denoted as $\{\mathbf{r}(t) : t \in \mathbb{R}\}$, is a curve in 3D space. Thus, we can consider $\mathbf{r}$ as a vector-valued function of the time variable $t \in \mathbb{R}$. Using either of the two expressions mentioned earlier, the norm of the position vector $\mathbf{r}(t)$ can be represented as follows:
\[ \|\mathbf{r}(t)\| = \sqrt{f^2(t) + g^2(t) + h^2(t)} \quad \text{or} \quad \|\mathbf{r}(t)\| = \sqrt{x^2(t) + y^2(t) + z^2(t)}. \]
The second notation makes it clear that this is the distance between the position vector $\mathbf{r}(t)$ and the origin. This abuse of notation clarifies the meaning.

Remark 7.2. In this notation, $x(t)$ is a function with $t$ as the variable, representing the $x$ coordinate of the moving particle's position at time $t$. We refer to this kind of expression as notation abuse. Using the same symbol $x$ for both the $x$ coordinate in the coordinate system and the function representing the position at time $t$ is more convenient than introducing a new function $f(t)$ as $x = f(t)$.
This kind of notation abuse, where the same symbol is used for two different entities, is widespread and has been used in calculus, including in the chain rule.

Question 7.2. What is the difference between a vector and a scalar?

We commonly say that a scalar is a quantity with only magnitude, and a vector is a quantity with both magnitude and direction. However, that statement is not entirely accurate. A scalar value $x \in \mathbb{R}$ also has one of two directions, either to the right or to the left, with a magnitude of $|x|$. A more precise distinction is that a scalar is a quantity that arises in a number system like the real or complex numbers, while a vector can be considered as composed of multiple scalars, including the case of a single-component vector. In other words, a scalar can be called a single-component vector.

Problem 7.3. Draw the trajectory of the vector function $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j}$ given by $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^2$. In which direction is it moving?

Problem 7.4. Draw the trajectory of the vector function $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$ given by $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$.

Problem 7.5. Construct a function $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$ that traces a coil winding around the $z$-axis 10 times and whose projection onto the $xy$-plane is a circle of radius 2.

Vector sums and subtractions

Multiplying a vector by a scalar is given by $c\mathbf{r} = (cx, cy, cz)$. The sum and difference of two vectors are defined by adding and subtracting each component of the vectors, respectively. That is,
\[ \mathbf{r}_1 + \mathbf{r}_2 = (x_1 + x_2,\, y_1 + y_2,\, z_1 + z_2), \qquad \mathbf{r}_1 - \mathbf{r}_2 = (x_1 - x_2,\, y_1 - y_2,\, z_1 - z_2). \]
The geometric interpretation of vector addition is explained using parallelograms. The vector difference $\mathbf{r}_2 - \mathbf{r}_1$ is understood as the vector with $\mathbf{r}_2$ as the terminal point and $\mathbf{r}_1$ as the initial point (refer to the figure above).
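The componentwise definitions above translate directly into code. The following is a small Python sketch, not taken from the notes: vectors are stored as plain tuples, and the helper names `scale`, `add`, and `sub` are hypothetical. The last line checks the geometric statement that starting at the first vector and moving along the difference arrives at the second vector.

```python
def scale(c, r):
    """Scalar multiple c*r, computed component by component."""
    return tuple(c * a for a in r)

def add(r1, r2):
    """Sum r1 + r2, computed component by component."""
    return tuple(a + b for a, b in zip(r1, r2))

def sub(r1, r2):
    """Difference r1 - r2, computed component by component."""
    return tuple(a - b for a, b in zip(r1, r2))

r1 = (1, 2, 3)
r2 = (4, 6, 8)

print(scale(2, r1))                # (2, 4, 6)
print(add(r1, r2))                 # (5, 8, 11)
print(sub(r2, r1))                 # (3, 4, 5)
print(add(r1, sub(r2, r1)) == r2)  # True
```

Tuples keep the sketch dependency-free; for larger computations an array library would be the more usual choice.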
7.4 Cross product & inner product

For two vectors,
\[ \mathbf{r}_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}, \qquad \mathbf{r}_2 = \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix}, \]
the cross product is denoted and defined as follows:
\[ \mathbf{r}_1 \times \mathbf{r}_2 = (y_1 z_2 - z_1 y_2)\mathbf{i} - (x_1 z_2 - z_1 x_2)\mathbf{j} + (x_1 y_2 - y_1 x_2)\mathbf{k}. \]
It is also called the vector product. To make it easier to remember the above formula, we use the determinant of a $3 \times 3$ matrix:
\[ \mathbf{r}_1 \times \mathbf{r}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{vmatrix} = \begin{vmatrix} y_1 & z_1 \\ y_2 & z_2 \end{vmatrix} \mathbf{i} - \begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix} \mathbf{j} + \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} \mathbf{k}. \]
The cross product is defined only for 3-dimensional vectors. Geometrically, the cross product $\mathbf{r}_1 \times \mathbf{r}_2$ is a vector perpendicular to the plane containing the two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, with a magnitude given by
\[ \|\mathbf{r}_1 \times \mathbf{r}_2\| = \|\mathbf{r}_1\|\, \|\mathbf{r}_2\| \sin\theta \tag{7.1} \]
where $\theta$ is the angle between them. There are two such vectors, and the cross product is the one satisfying the right-hand rule. If the two vectors are parallel, i.e., if the angle is $\theta = 0$, then $\mathbf{r}_1 \times \mathbf{r}_2 = \mathbf{0}$.

Problem 7.6. Let $\mathbf{r}_1(t)$ and $\mathbf{r}_2(t)$ denote the vectors representing the positions of two objects at time $t$. Show that the cross product satisfies the following product rule:
\[ (\mathbf{r}_1(t) \times \mathbf{r}_2(t))' = \mathbf{r}_1'(t) \times \mathbf{r}_2(t) + \mathbf{r}_1(t) \times \mathbf{r}_2'(t). \]

Solution 7.6 We can use the product rule for derivatives as follows:
\[ \begin{aligned} (\mathbf{r}_1(t) \times \mathbf{r}_2(t))' &= (y_1 z_2 - z_1 y_2)'\,\mathbf{i} - (x_1 z_2 - z_1 x_2)'\,\mathbf{j} + (x_1 y_2 - y_1 x_2)'\,\mathbf{k} \\ &= (y_1' z_2 - z_1' y_2)\,\mathbf{i} + (y_1 z_2' - z_1 y_2')\,\mathbf{i} + (\cdots)\,\mathbf{j} + (\cdots)\,\mathbf{k} \\ &= \mathbf{r}_1'(t) \times \mathbf{r}_2(t) + \mathbf{r}_1(t) \times \mathbf{r}_2'(t). \end{aligned} \]
Thus, the product rule is satisfied. (Not all terms are explicitly written; please verify.) □

Question 7.3. Is there a way to determine if two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ are perpendicular? Is there an easy way to find the angle between them?

Using (7.1), we can find the angle between two vectors. However, an easier way to determine the angle is through the inner product, also known as the dot product. The inner product is written in two ways:
\[ \mathbf{r}_1 \cdot \mathbf{r}_2 = \langle \mathbf{r}_1, \mathbf{r}_2 \rangle = x_1 x_2 + y_1 y_2 + z_1 z_2. \]
The inner product of two vectors yields a single scalar value.

Problem 7.7.
If $\theta$ is the angle between two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, show that
\[ \cos\theta = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\|\mathbf{r}_1\|\, \|\mathbf{r}_2\|}. \tag{7.2} \]

Solution 7.7 Assuming the two vectors meet at the origin, we can consider them as lying in the $xy$-plane. Therefore, let's assume all $z$ components are zero. Then the relationship (7.2) corresponds to basic trigonometry learned in high school. Though not explicitly shown here, (7.2) should be remembered. □

The relationship (7.2) is very important. If the inner product is $0$, the vectors are perpendicular. If the angle is $0$, i.e., if the vectors are parallel and point in the same direction, then $\cos 0 = 1$, and the inner product of the two vectors equals the product of their lengths.

Problem 7.8 (Equation of a plane). Find the equation of a plane perpendicular to the vector $\mathbf{v} = 2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$ passing through the point $\mathbf{r} = (1, 2, -1)$.

Solution 7.8 (Refer to the figure above) Let $\mathbf{x} = (x, y, z)$ represent a point on the plane. Then, the vector $\mathbf{x} - \mathbf{r} = (x - 1, y - 2, z + 1)$ is perpendicular to $\mathbf{v} = (2, 3, 1)$. Therefore,
\[ (\mathbf{x} - \mathbf{r}) \cdot \mathbf{v} = 2(x - 1) + 3(y - 2) + (z + 1) = 2x + 3y + z - 7 = 0. \]
Thus, the equation of the plane is $2x + 3y + z - 7 = 0$. Alternatively, it can be written as $2x + 3y + z = 7$. □

The inner product can be defined not only for 3-dimensional vectors but also for vectors of any dimension. However, the notation used previously is not suitable for expressing the inner product of $n$-dimensional vectors. Let's represent two $n$-dimensional vectors slightly differently:
\[ \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]
The inner product of these two vectors is defined as follows:
\[ \mathbf{x} \cdot \mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i. \tag{7.3} \]
The inner product of two functions $f$ and $g$ can also be defined by integration:
\[ \langle f, g \rangle = \int f(x) g(x)\,dx. \tag{7.4} \]
What is the angle between two vectors in $n$-dimensional space? What about the angle between two functions $f$ and $g$? Although their meanings are different, (7.2) can be used as a definition for angles.

Question 7.4.
What do the inner products (7.3) and (7.4) have in common, even though they look different?

Problem 7.9. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ denote vectors representing the positions of two objects at time $t$. Show that the derivative of their inner product also satisfies the following product rule:
\[ (\mathbf{x}(t) \cdot \mathbf{y}(t))' = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t). \]

Solution 7.9 We apply the product rule for derivatives term by term:
\[ (\mathbf{x}(t) \cdot \mathbf{y}(t))' = \Big( \sum_{i=1}^{n} x_i(t) y_i(t) \Big)' = \sum_{i=1}^{n} \big( x_i(t) y_i(t) \big)' = \sum_{i=1}^{n} \big( x_i'(t) y_i(t) + x_i(t) y_i'(t) \big) = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t). \]
Thus, the product rule is satisfied. □

Exercises

1. Find all vectors perpendicular to each of the following vectors.
(1) $\mathbf{r} = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$ (2) $\mathbf{r} = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$ (3) $\mathbf{r} = \mathbf{i} - \mathbf{j} + \mathbf{k}$ (4) $\mathbf{r} = 2\mathbf{i} + \mathbf{j} - 3\mathbf{k}$
2. Find unit vectors perpendicular to both vectors in each of the following pairs.
(1) $\mathbf{r}_1 = 3\mathbf{j} + \mathbf{k}$, $\mathbf{r}_2 = 2\mathbf{i} + \mathbf{j} - \mathbf{k}$ (2) $\mathbf{r}_1 = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$, $\mathbf{r}_2 = 2\mathbf{i} - \mathbf{j}$
3. Find the equation of the plane perpendicular to the vector $\mathbf{v} = 2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$ and passing through the point $\mathbf{r} = (1, 2, -1)$.
4. Find a vector perpendicular to the plane with the equation $2x + 3y - z = 2$.
5. Find the equation of the plane parallel to the $xy$-plane and passing through the point $\mathbf{r} = (2, 1, 4)$.
6. Find the equation of the plane parallel to the $xz$-plane and passing through the point $\mathbf{r} = (2, 1, 4)$.
7. Find the intersection of the planes $2x + 3y - z = 2$ and $3x + y - 2z = 0$.
8. Find the equation that represents all points equidistant from the points $\mathbf{r}_1 = (1, 2, 1)$ and $\mathbf{r}_2 = (3, 2, -1)$.

Lecture 8 Polar coordinates in R2

The planets in the solar system move around the Sun in elliptical orbits that are close to circles. Artificial satellites orbiting the Earth are mainly designed to follow circular paths, but they can also follow elliptical paths. Each orbit lies in a plane and can therefore be described with two-dimensional coordinates.
In particular, polar coordinates are useful for representing circular or elliptical orbits. In this lecture, we discuss polar coordinates, which have many practical applications.

8.1 Variable change with polar coordinates

The polar coordinate system in two-dimensional space consists of two numbers: the length $r$ and the angle $\theta$. The orthogonal (Cartesian) coordinate system in two dimensions consists of two numbers: the $x$-coordinate and the $y$-coordinate. The line segment connecting the origin and the point $(x, y)$ has length $r$ and makes an angle $\theta$ with the positive $x$-axis. Given polar coordinates $(r, \theta)$, we can calculate the orthogonal coordinates using $\sin\theta$ and $\cos\theta$:
\[ x = r\cos\theta, \qquad y = r\sin\theta. \tag{8.1} \]
Conversely, given orthogonal coordinates $(x, y)$, we can find the corresponding polar coordinates $(r, \theta)$. However, the ranges of $r$ and $\theta$ must be fixed first. We take
\[ (r, \theta) \in [0, \infty) \times [0, 2\pi), \]
so the length $r$ is given by $r = \sqrt{x^2 + y^2}$. Expressing the angle explicitly as $\theta = f(x, y)$ is difficult, so it is given implicitly by
\[ \cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}, \qquad r = \sqrt{x^2 + y^2}. \tag{8.2} \]
If $r \neq 0$, there exists a unique $\theta$ satisfying (8.2) in the interval $0 \le \theta < 2\pi$. Let's agree to write $r$ before $\theta$, just as we write $x$ before $y$.

Depending on the purpose and convenience, one can choose either of the two coordinate systems, and one should understand their relationship well in order to change variables freely. The relations (8.1) and (8.2) between polar coordinates $(r, \theta)$ and orthogonal coordinates $(x, y)$ are perhaps the most important example of a multidimensional change of variables. They serve as the first example for understanding such transformations clearly, which is essential for understanding Newton's planetary theory. The orthogonal coordinate system is not only a coordinate system but also the actual world where the motion of planets occurs.
On the other hand, the polar coordinate system is a convenient coordinate system for representing elliptical orbits that live in the orthogonal coordinate system. To use polar coordinates easily inside the orthogonal coordinate system, we introduce new basis vectors in place of the standard basis vectors $\mathbf{i}$ and $\mathbf{j}$:
\[ \mathbf{e}_r(\theta) = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \qquad \mathbf{e}_\theta(\theta) = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. \tag{8.3} \]
Although these two vectors are unit vectors, unlike $\mathbf{i}$ and $\mathbf{j}$ they are not constant vectors. Both depend only on $\theta$ and are independent of $r$. In the polar coordinate plane, $\mathbf{e}_r$ corresponds to $(1, 0)$ and $\mathbf{e}_\theta$ corresponds to $(0, 1)$. The reason is as follows: as seen in the figure, $\mathbf{e}_r$ points in the direction in which $\theta$ is held fixed and $r$ increases, so it corresponds to $(1, 0)$ in the polar coordinate plane, while $\mathbf{e}_\theta$ points in the direction in which $r$ is held fixed and $\theta$ increases, so it corresponds to $(0, 1)$.

Let's examine which point in the orthogonal coordinate plane corresponds to given coordinates $(r, \theta)$ in the polar coordinate plane. Once the angle $\theta$ is given, we consider the direction vector $\mathbf{e}_r$ corresponding to that angle. Since the direction vector is a unit vector, the point at distance $r$ in that direction is
\[ \mathbf{r} = r\,\mathbf{e}_r(\theta). \]
This equation is nothing more than the relationship (8.1) rewritten as a vector equation. If $\mathbf{e}_r$ is taken as the first coordinate axis and $\mathbf{e}_\theta$ as the second, then the new coordinate system also has a positive orientation.

Now suppose the point $\mathbf{r} = (x, y)$ on the $xy$-plane is given, and let's find the corresponding polar coordinates $(r, \theta)$. One point requires care: since the correspondence (8.1) is not one-to-one, the polar coordinates are not uniquely determined. To establish an inverse correspondence, we must choose a branch, as in defining inverse functions. In polar coordinates, we choose $r \ge 0$ and $0 \le \theta < 2\pi$ as the branch.
Within this range, we choose $r$ and $\theta$ that satisfy (8.2).

Question 8.1. What if we express $\theta$ as $\tan^{-1}(y/x)$, as $\sin^{-1}(y/r)$, or as $\cos^{-1}(x/r)$ instead of using (8.2)?

Indeed, many calculus books use such relationships. However, if we define $\theta = \tan^{-1}(y/x)$ or $\theta = \sin^{-1}(y/r)$, these two inverse functions only give angles between $-\frac{\pi}{2}$ and $\frac{\pi}{2}$ according to their definitions. If we use $\theta = \cos^{-1}(x/r)$, it gives angles only in the range $0 \le \theta \le \pi$ according to the definition of $\cos^{-1}$. Therefore, none of these expressions is an accurate representation. Let's simply take the solution of (8.2) as a new function $\theta(x, y)$. Then we cover the full range $0 \le \theta < 2\pi$ used in polar coordinates.

8.2 Motion in polar coordinates

This section is essential for deriving the orbit formulas of planets. It requires mathematical thinking for physical understanding. If no force other than their mutual gravity acts on two celestial bodies (such as the Sun and the Earth), their motion stays in a single plane (this will be confirmed later). Introducing a polar coordinate system on this plane allows us to represent the position of an object or a planet using polar coordinates:
\[ \mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} = r\cos\theta\,\mathbf{i} + r\sin\theta\,\mathbf{j} = r\,\mathbf{e}_r(\theta). \]
We denote the position vector in bold font, $\mathbf{r}$. Its relationship with the polar coordinate $r$ is $\|\mathbf{r}\| = r$.

The basis vectors $\mathbf{i}$ and $\mathbf{j}$ of the orthogonal coordinate system form a fixed perpendicular frame regardless of the position. In contrast, $\mathbf{e}_r(\theta)$ and $\mathbf{e}_\theta(\theta)$ form a perpendicular frame that varies with the position. They are determined by the angle $\theta$ of the given position in the orthogonal coordinate system and are independent of $r$.

Problem 8.1. Prove the following derivative formulas:
\[ \frac{d\mathbf{e}_r}{d\theta} = \mathbf{e}_\theta, \qquad \frac{d\mathbf{e}_\theta}{d\theta} = -\mathbf{e}_r. \]

Solution 8.1 These relations follow directly from (8.3) and the derivatives of the trigonometric functions. Remembering them is more important.
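□

With the new coordinate system, the position of the object is represented as $\mathbf{r} = r\,\mathbf{e}_r(\theta)$. This notation hides the time variable. As the object moves, the polar coordinates $r$ and $\theta$ representing its position become functions of the time variable $t$. The right side of the following figure shows the trajectory of a particle moving on the $xy$-plane. The corresponding position in polar coordinates is $\tilde{\mathbf{r}}(t) = (r(t), \theta(t))$. The space where Newton's laws apply is not the polar coordinate space but the orthogonal coordinate space. In other words, Newton's law of gravitation and laws of motion must be applied to the trajectory of the point $\mathbf{r} = r\,\mathbf{e}_r(\theta)$ on the right side of the figure. The basis vectors $\mathbf{e}_r$ and $\mathbf{e}_\theta$ therefore become functions of time $t$ through the angle $\theta(t)$, and the position of the particle can be written as follows:
\[ \mathbf{r}(t) = r(t)\,\mathbf{e}_r(\theta(t)). \]

Problem 8.2. Prove the following:
\[ \dot{\mathbf{e}}_r = \mathbf{e}_\theta\,\dot\theta, \qquad \dot{\mathbf{e}}_\theta = -\mathbf{e}_r\,\dot\theta. \tag{8.4} \]

Solution 8.2 To calculate the time derivatives $\dot{\mathbf{e}}_r$ and $\dot{\mathbf{e}}_\theta$, consider the angle as a function of time, $\theta = \theta(t)$. Using the chain rule and Problem 8.1, we get
\[ \dot{\mathbf{e}}_r = \frac{d}{dt}\begin{pmatrix} \cos\theta(t) \\ \sin\theta(t) \end{pmatrix} = \begin{pmatrix} (\cos\theta(t))' \\ (\sin\theta(t))' \end{pmatrix} = \begin{pmatrix} -\sin\theta(t)\,\dot\theta \\ \cos\theta(t)\,\dot\theta \end{pmatrix} = \frac{d\mathbf{e}_r}{d\theta}\,\dot\theta = \mathbf{e}_\theta\,\dot\theta, \]
and similarly
\[ \dot{\mathbf{e}}_\theta = \frac{d\mathbf{e}_\theta}{d\theta}\,\dot\theta = -\mathbf{e}_r\,\dot\theta. \]
□

Problem 8.3 (Position, velocity, acceleration using polar coordinates). Show that the position, velocity, and acceleration of an object are given as follows:
\[ \mathbf{r} = r\,\mathbf{e}_r, \tag{8.5} \]
\[ \mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta, \tag{8.6} \]
\[ \mathbf{a} = (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta. \tag{8.7} \]

Solution 8.3 The position vector (8.5) has already been explained. Differentiating with the product rule and (8.4) gives
\[ \mathbf{v} = \dot{\mathbf{r}} = \dot r\,\mathbf{e}_r + r\,\dot{\mathbf{e}}_r = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta, \]
\[ \mathbf{a} = \dot{\mathbf{v}} = \ddot r\,\mathbf{e}_r + 2\dot r\dot\theta\,\mathbf{e}_\theta + r\ddot\theta\,\mathbf{e}_\theta - r\dot\theta^2\,\mathbf{e}_r = (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta. \]
□

Remark 8.1. Remember that when using polar coordinates $r$ and $\theta$, it is convenient to use $\mathbf{e}_r$ and $\mathbf{e}_\theta$ as basis vectors instead of $\mathbf{i}$ and $\mathbf{j}$.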
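As a quick check of the velocity formula (8.6) and the acceleration formula (8.7), consider uniform circular motion. The radius $R$ and the angular speed $\omega$ below are symbols introduced only for this illustration; they do not appear elsewhere in the lecture.

```latex
% Uniform circular motion (illustrative assumption): r(t) = R constant, \theta(t) = \omega t.
% Hence \dot r = \ddot r = 0, \dot\theta = \omega, \ddot\theta = 0.
\begin{align*}
\mathbf{v} &= \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta
            = R\omega\,\mathbf{e}_\theta, \\
\mathbf{a} &= (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta
            = -R\omega^2\,\mathbf{e}_r .
\end{align*}
```

The velocity is tangent to the circle with speed $R\omega$, and the acceleration points toward the center with magnitude $R\omega^2$, which is the familiar centripetal acceleration.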
8.3 Ellipses in polar coordinates

The equation of an ellipse with its center at the origin and its major and minor axes along the $x$-axis and $y$-axis, respectively, is
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
An overview of the graph is given on the left side of the figure. The values $\pm a$ are the $x$-intercepts, and $\pm b$ are the $y$-intercepts. If $a = b$, the ellipse becomes a circle. For convenience, we consider the case $a \ge b$, so the $x$-axis is the major axis. The ellipse has two foci, both on the major axis. The distance between the center and each focus is $c = \sqrt{a^2 - b^2}$, i.e., the foci are at $(\pm c, 0)$. The eccentricity of the ellipse, which indicates how far it deviates from a circle, is given by
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \tag{8.8} \]
If $e = 0$, the shape is a circle. If $e = 1$, then $b = 0$, and it is no longer an ellipse. The eccentricity of an ellipse satisfies $0 \le e < 1$.

Let's represent the ellipse using polar coordinates. Take a vertical line $x = k$ with $k > 0$ and use it as the directrix of the curve we construct. Let $P(x, y)$ have polar coordinates $(r, \theta)$, and denote the foot of the perpendicular from $P$ to the directrix by $D$. For a fixed $e > 0$, consider all points $P(x, y)$ that satisfy
\[ r = e\,\overline{PD}. \tag{8.9} \]
Since the length of the segment $PD$ is $k - x$ for points to the left of the directrix, we have
\[ r = e\,\overline{PD} \;\Rightarrow\; \sqrt{x^2 + y^2} = e(k - x) \;\Rightarrow\; x^2 + y^2 = e^2(k^2 - 2kx + x^2). \]
In simplified form, this becomes
\[ (1 - e^2)x^2 + 2ke^2 x + y^2 = e^2 k^2. \]
If $e \neq 1$, we can divide by $1 - e^2$ and complete the square in $x$:
\[ \Big( x + \frac{ke^2}{1 - e^2} \Big)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{(1 - e^2)^2}. \tag{8.10} \]

Problem 8.4. If $0 < e < 1$, show that (8.10) represents an ellipse with one of its foci at the origin, where $e$ is the eccentricity of the ellipse.

Solution 8.4 If $0 < e < 1$, then $1 - e^2 > 0$, and we can define
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2(1 - e^2) = \frac{e^2 k^2}{1 - e^2}, \qquad c = \frac{ke^2}{1 - e^2} > 0. \]
Dividing (8.10) by $a^2$, we get
\[ \frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \]
which represents an ellipse. The center of the ellipse is $(-c, 0)$. The eccentricity of the ellipse is defined as $\sqrt{\frac{a^2 - b^2}{a^2}}$. Calculating,
\[ \frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2(1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \tag{8.11} \]
Thus, the coefficient $e$ in the relationship $r = e\,\overline{PD}$ is indeed the eccentricity of the ellipse, which justifies denoting it by $e$ from the beginning. The distance from the center to a focus of the ellipse is $\sqrt{a^2 - b^2}$, and using (8.11) we can compute
\[ \sqrt{a^2 - b^2} = \sqrt{e^2 a^2} = \sqrt{\frac{k^2 e^4}{(1 - e^2)^2}} = c. \]
Since the center is at $(-c, 0)$ and the foci lie at distance $c$ from the center, one focus is at the origin. □

We have shown that the points satisfying (8.9) form an ellipse with eccentricity $e$ and one focus at the origin. The length of the segment $PD$ is $k - r\cos\theta$, so the polar representation of this ellipse becomes $r = e(k - r\cos\theta)$. Solving for $r$, we get
\[ r = \frac{L}{1 + e\cos\theta}, \qquad L = ek. \]
This equation represents an ellipse with eccentricity $e$ for $0 < e < 1$. For $e = 1$ it represents a parabola, and for $e > 1$ a hyperbola (see Appendix B).

8.4 Curves in polar coordinates

When polar coordinates $(r, \theta)$ are put in correspondence with Cartesian coordinates $(x, y)$, it is common to restrict the range to $r \ge 0$ and $0 \le \theta < 2\pi$. However, when polar coordinates are used simply to represent curves, they can be used without such restrictions. In this section, we consider equations of curves given in polar coordinates, the corresponding curves in Cartesian coordinates, and their meanings.

Problem 8.5. Convert the following equations given in polar coordinates to Cartesian coordinates and draw their corresponding graphs.
(1) $r = 1$ (2) $r = \cos\theta$ (3) $r = \cos(2\theta)$ (4) $r = \dfrac{2}{\sin\theta - \cos\theta}$

Solution 8.5 It is important to distinguish between a graph in the polar coordinate plane and the corresponding graph in Cartesian coordinates, and to understand that the two correspond via the transformation (8.1).
The overview of the graphs is given in the figure.

(1) The equation $r = 1$ in Cartesian coordinates becomes $\sqrt{x^2 + y^2} = 1$, that is, $x^2 + y^2 = 1$. We know this is a circle with its center at the origin and radius 1. Even without knowing this, if we plot $r = 1$ for various values of $\theta$ from 0 to $2\pi$, we observe a circle with radius 1.

(2) Since $\cos\theta$ can take negative values, we need to allow $r$ to be negative when writing $r = \cos\theta$. Multiplying both sides by $r$, we get $r^2 = r\cos\theta$, which in Cartesian coordinates becomes $x^2 + y^2 = x$. Rewriting this, we get $(x - 0.5)^2 + y^2 = 0.5^2$. This is a circle centered at $(0.5, 0)$ with radius 0.5. In the polar coordinate plane, the graph is that of the cosine function, which repeats with period $2\pi$. As $\theta$ runs over the interval $[0, 2\pi]$, the circle is traced twice. It is worth understanding why the circle is already complete when $\theta$ moves from 0 to $\pi$.

(3) Using the double angle formula, we get $r = \cos^2\theta - \sin^2\theta$, and multiplying by $r^2$ gives, in Cartesian coordinates, $(x^2 + y^2)^{3/2} = x^2 - y^2$. Squaring both sides and expanding, we get
\[ x^6 + 3x^4y^2 + 3x^2y^4 + y^6 = x^4 - 2x^2y^2 + y^4. \]
It is not immediately clear what curve this equation represents. However, in the polar coordinate plane the graph is simply a cosine function, and reading it off the graph above, we obtain a four-leaf clover pattern whose leaves do not overlap.

(4) In this case, the graph in the polar coordinate plane may look more complicated, but rewritten in Cartesian coordinates the equation becomes $y = x + 2$, which is a straight line. □

Exercises

1. Convert the following points given in Cartesian coordinates to polar coordinates.
(1) $\mathbf{r} = (1, 1)$ (2) $\mathbf{r} = (-1, \sqrt{3})$ (3) $\mathbf{r} = (-2\sqrt{3}, -2)$ (4) $\mathbf{r} = (0, -2)$
2. Convert the following points given in polar coordinates to Cartesian coordinates.
(1) $\tilde{\mathbf{r}} = (2, \frac{\pi}{2})$ (2) $\tilde{\mathbf{r}} = (4, \pi)$ (3) $\tilde{\mathbf{r}} = (2\sqrt{3}, \frac{\pi}{6})$ (4) $\tilde{\mathbf{r}} = (0, \frac{\pi}{4})$
3.
Sketch the curves represented by the following polar equations.
(1) $r = 1 - \cos\theta$ (2) $r = 1 - \sin\theta$ (3) $r^2 = \sin\theta$ (4) $r^2 = 4\cos\theta$
4. The following are equations of ellipses. Compute the center, foci, and eccentricity of each.
(1) $16x^2 + 25y^2 = 400$ (2) $9x^2 - 18x + 10y^2 = 44$ (3) $6x^2 + 9y^2 - 18y = 45$
5. Represent the above ellipses in polar coordinates.
6. Convert the following equations of curves given in polar coordinates to Cartesian coordinates and sketch the curves.
(1) $r = \dfrac{1}{1 + \cos\theta}$ (2) $r = \dfrac{20}{10 - 5\cos\theta}$ (3) $r = \dfrac{5}{1 + 2\sin\theta}$ (4) $r = \dfrac{5}{1 - 0.5\cos\theta}$
7. Find the equation of the ellipse with directrix $x = 5$, eccentricity $e = 0.5$, and a focus at the origin.
8. Use equation (8.6) to find the distance from the center of the Earth to an artificial satellite with an orbital period of 24 hours. (Look up the necessary data, such as the gravity formula, on the internet.)

Lecture 9 Differential Equations

Many physical quantities are expressed through derivatives, and physical laws are given as relations among them. For this reason, differential equations describing important phenomena appear frequently. Finding solutions to simple differential equations is the topic of this lecture.

9.1 First order differential equations

The independent variable can be either $t$ or $x$. We often use time as the independent variable, but there are many cases where we do not. To emphasize this, let's first take $x$ as the independent variable. One could denote the dependent variable by $f$ and the solution of the differential equation by $f(x)$, but in the theory of differential equations the unknown dependent variable is usually written as $y$. Then $y$ is implicitly a function of $x$, i.e., $y = y(x)$. In this notation, $x$ and $y$ are just general variables, not coordinates. The most general first-order differential equation can be written as follows:
\[ y' = f(x, y), \qquad y(x_0) = y_0. \tag{9.1} \]
(Here the symbol $f$ is used for the right-hand side.)
In this notation, the first equation $y' = f(x, y)$ is the differential equation. Since only first-order derivatives are involved, it is called a first-order differential equation. The second equation $y(x_0) = y_0$ is the initial condition. Here $x_0$ is regarded as the initial moment, and $y_0$ is the value of the function $y$ at the initial moment. Using Leibniz notation, we can write it as follows:
\[ \frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0. \tag{9.2} \]
This notation is a bit friendlier. It states explicitly that $y$ is a function of $x$ and that we are differentiating $y$ with respect to the variable $x$. If $f$ is a function of $x$ only, i.e., $y' = f(x)$, then $y$ is an antiderivative of $f(x)$. The equation can also be solved easily if $f = f(y)$. If both $x$ and $y$ appear on the right side, more work is needed. When solving a first-order differential equation, one general constant appears, which is determined by the initial condition. Let's verify this through a simple example.

Problem 9.1 (Easy example). Find the solution to the following first-order differential equation:
\[ y' = 3x + 2, \qquad y(1) = 1. \]

Solution 9.1 Since $f$ is a function of $x$ only, we integrate:
\[ y = \int y'\,dx = \int (3x + 2)\,dx = \frac{3}{2}x^2 + 2x + C. \]
From the initial condition $y(1) = \frac{3}{2} + 2 + C = \frac{7}{2} + C = 1$, we find $C = -\frac{5}{2}$. Therefore, the solution is $y = \frac{3}{2}x^2 + 2x - \frac{5}{2}$. □

The above problem is a simple case; in general, solving differential equations is more challenging. However, verifying whether a given function is a solution is easier.

Problem 9.2 (Verifying Solutions). (1) Show that for all constants $C$, the function $y = \dfrac{C}{x} + 2$ is a solution to the differential equation $\dfrac{dy}{dx} = \dfrac{1}{x}(2 - y)$. (2) Show that $y = (1 + x) - \dfrac{1}{3}e^x$ is a solution to the differential equation $y' = y - x$, $y(0) = \frac{2}{3}$.

Solution 9.2 (1) Since no initial value is given, there are many solutions, and they are described together by a general constant $C$. Let's start with $y = Cx^{-1} + 2$.
Taking the derivative, we have $y' = -Cx^{-2}$. Substituting $y = Cx^{-1} + 2$ into the right-hand side of the equation, we get
\[ \frac{1}{x}(2 - y) = \frac{1}{x}\Big( -\frac{C}{x} \Big) = -\frac{C}{x^2}. \]
The left-hand side and the right-hand side are the same, so it is a solution.
(2) Differentiating $y = (1 + x) - \frac{1}{3}e^x$ gives $y' = 1 - \frac{1}{3}e^x$, and computing the right-hand side gives $y - x = 1 - \frac{1}{3}e^x$. Thus, it satisfies the differential equation. Moreover, it satisfies the initial condition: $y(0) = 1 - \frac{1}{3}e^0 = \frac{2}{3}$. Therefore, it is a solution to the initial value problem. □

The first-order differential equation (9.1) or (9.2) is written in a very general form and in many cases cannot be solved explicitly. However, we can understand what is happening by drawing a slope field on the $xy$-plane. The principle is simple: draw a small line segment with slope $f(x, y)$ at the point $(x, y)$. Then, if the graph of a solution $y(x)$ passes through the point $(x, y)$, the graph is tangent to this small line segment there. The collection of these line segments is called a slope field.

Problem 9.3 (Slope field). Draw the slope field on the domain $[-2, 2] \times [-2, 2]$ for the differential equations $y' = f(x, y)$ given by the following functions:
(1) $f(x, y) = \dfrac{2xy}{1 + x^2}$.
(2) $f(x, y) = y - x^2$.

Solution 9.3 Let's use MATLAB to draw the slope field of the given functions $f(x, y)$. Below is the code for case (1) and the corresponding figure; case (2) only requires changing the line that computes Y. Practicing writing such small codes is helpful. □

MATLAB CODE
%% parameters
L=2.1; dx=0.2; dy=0.2;               % half-width of the grid and grid spacing
%% variables
[x,y] = meshgrid(-L:dx:L,-L:dy:L);   % grid points (x,y)
[NX,NY]=size(x);
%% computation
Y=(2*x.*y./(1+x.*x));                % slope f(x,y) at each grid point
X=ones(NX,NY);                       % direction vector is (1, f(x,y))
NRM=(Y.^2+X.^2).^0.5;                % length of the direction vector
X=dx*X./NRM;                         % normalize to the size of dx
Y=dx*Y./NRM;
quiver(x,y,X,Y);
axis([-2 2 -2 2]);
title('f(x,y)=2xy/(1+x^2)');

Problem 9.4. On the slope field plots above, sketch the solution graphs for the cases in Problem 9.3 with the initial condition $y(0) = 1$.
Solution 9.4 The initial condition $y(x_0) = y_0$ means that the graph of the solution passes through the point $(x_0, y_0)$. Therefore, starting from this point, we sketch the curve that is tangent to each line segment it meets. □

Let's try to solve the following differential equation:
\[ y' = ky, \qquad y(0) = y_0. \]
This is a case where $f = f(y)$. One method is to rely on memory. The function whose derivative is itself is $e^x$. It is a little harder to recall that the function whose derivative is $k$ times itself is $e^{kx}$. The general solution is $y = Ce^{kx}$; note that the general constant enters as a factor rather than as an added term. Using the initial condition gives $C = y_0$, so
\[ y = y_0 e^{kx}. \]
However, solving problems by recollection like this is too restrictive. We need a systematic way to solve them. Although we cannot solve all differential equations, we can solve certain types, and we need to remember which types can be solved.

9.2 Separation of variables

Let's use the technique called separation of variables to find the solution when $f(x, y) = 2kxy$. We start by writing the equation in Leibniz notation:
\[ \frac{dy}{dx} = 2kxy. \]
Now let's separate $x$ and $y$, putting $dx$ with the $x$ terms and $dy$ with the $y$ terms. Then we get
\[ \frac{dy}{y} = 2kx\,dx. \]
Integrating both sides, we get
\[ \int \frac{dy}{y} = \int 2kx\,dx \;\Rightarrow\; \ln|y| + C_1 = kx^2 + C_2 \;\Rightarrow\; \ln|y| = kx^2 + C. \]
Here $C_2 - C_1$ can be regarded as a single general constant, so we replaced it with $C$. Applying the exponential function, which is the inverse of the natural logarithm, to both sides, we get
\[ e^{\ln|y|} = |y|, \qquad e^{kx^2 + C} = e^C e^{kx^2}. \]
Therefore,
\[ |y| = e^C e^{kx^2} \;\Rightarrow\; y = Ce^{kx^2}. \tag{9.3} \]
Using an initial condition $y(0) = y_0$ here yields $y = y_0 e^{kx^2}$. In (9.3), note that $e^C$ becomes $C$ and $|y|$ becomes $y$ simultaneously. Even when the general constant $C$ is negative, $e^C$ is positive. Therefore, $|y| = e^C e^{kx^2}$ is a correct expression.
If a new general constant $C$ is used in place of $e^C$, then $|y|$ is replaced by $y$ at the same time, and the sign of $y$ is absorbed into the new constant.

Question 9.1. However, is it permissible to solve the equation in this manner? The expression $\frac{dy}{dx}$ instructs us to differentiate $y$ with respect to $x$. Is it acceptable to multiply both sides by $dx$, attach the integral symbol $\int$, integrate the left side with respect to $y$, and integrate the right side with respect to $x$?

The justification of the separation of variables technique is the chain rule. Let's explain this. If $f(x, y)$ can be factored as
\[ y' = f(x, y) = g(x)h(y), \]
we can write the equation as
\[ \frac{1}{h(y)}\,y' = g(x). \]
Now take functions $G$ of $x$ and $H$ of $y$ with $G'(x) = g(x)$ and $H'(y) = \frac{1}{h(y)}$. By the chain rule, $\frac{d}{dx}H(y) = H'(y)\,y'$. Applying the Fundamental Theorem of Calculus and integrating both sides with respect to $x$, we obtain
\[ \int \frac{1}{h(y)}\,y'\,dx = \int g(x)\,dx \;\Rightarrow\; H(y) = G(x) + C. \]
The solution $y$ is given implicitly by this equation. If the inverse function of $H$ exists, then $y = H^{-1}(G(x) + C)$.

Problem 9.5. Find the general solution of the following differential equations.
(1) $y' = (1 + y)e^x$ (2) $y(x + 1)y' = x(y^2 + 1)$

Solution 9.5 Since no initial conditions are given, we look for solutions that include a general constant $C$. □

9.3 Integrating factor

A first-order linear equation is of the form
\[ a(x)y' + b(x)y = c(x). \]
Dividing by $a(x)$ on intervals where $a(x) \neq 0$ transforms it into the following form:
\[ y' + P(x)y = Q(x). \tag{9.4} \]
Here, $P(x)$ is the coefficient of the zeroth-order term, and $Q(x)$ is the inhomogeneous term. If $Q(x) = 0$, (9.4) is called a homogeneous problem. Although it is permissible to write $Q(x)$ on the left and 0 on the right, it is more common to write it on the right. First-order linear equations can be solved using the integrating factor technique.
If $P(x)$ is integrable, the integrating factor of the first-order linear equation (9.4) is
\[ I(x) = e^{\int P(x)\,dx}. \]
The usefulness of the integrating factor can be understood from its derivative, which is important to remember:
\[ I'(x) = P(x)e^{\int P(x)\,dx} = P(x)I(x). \]
Now, multiplying equation (9.4) by the integrating factor, something good happens:
\[ Iy' + IPy = IQ \;\Rightarrow\; Iy' + I'y = IQ \;\Rightarrow\; (Iy)' = IQ \;\Rightarrow\; Iy = \int IQ\,dx. \]
Therefore, if $IQ$ is integrable, the solution is
\[ y = \frac{1}{I(x)} \int I(x)Q(x)\,dx = e^{-\int P(x)\,dx} \int e^{\int P(x)\,dx}\,Q(x)\,dx. \]
For instance, if $P$ and $Q$ are continuous functions, the integrals exist, and the solutions of the first-order linear equation (9.4) are given by the integral expressions above.

Problem 9.6. Determine the solutions and the intervals of existence for the following differential equations.
(1) $xy' = x^3 + 3y$. (2) $y' = x - \frac{3}{2}y$.

Solution 9.6 (1) Rewriting in the form of (9.4), we have $y' - \frac{3}{x}y = x^2$. This is the case $P(x) = -\frac{3}{x}$. The integral cannot be taken across $x = 0$, so the domain of the solution splits into $x > 0$ and $x < 0$. Let's only find solutions for $x > 0$. Then the integrating factor is
\[ I = e^{\int -\frac{3}{x}\,dx} = e^{-3\ln x} = x^{-3}. \]
(The integrating factor does not need to include a general constant $C$. One integrating factor is sufficient to carry out the integration.) Therefore,
\[ y = x^3 \int x^{-3}x^2\,dx = x^3(\ln x + C). \]
(2) is done similarly. □

Problem 9.7. A circular water tank with a diameter of 10 meters has water flowing into it at a rate of 50 liters per second, as shown in the figure. Water leaks from the bottom of the tank at a rate of $10 \times y$ liters per second, where $y$ meters is the height of the water. Find the first-order differential equation for the height of the water and its solution. The water inflow starts at $t = 0$, and there is no water in the tank at that time.

Solution 9.7
\[ \dot y + \frac{1}{10000}\,y = \frac{1}{2000}, \qquad y(0) = 0. \]
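One way to complete the solution is the integrating factor technique of this section. The sketch below takes the equation exactly as stated, $\dot y + \frac{1}{10000}y = \frac{1}{2000}$ with $y(0) = 0$, so that $P = \frac{1}{10000}$ and $Q = \frac{1}{2000}$, and it assumes that $t$ is measured in seconds; the coefficients themselves are not re-derived here.

```latex
% Equation as stated: \dot y + \frac{1}{10000} y = \frac{1}{2000}, \quad y(0) = 0.
% Assumption: time t is measured in seconds.
\begin{align*}
I(t) &= e^{\int \frac{1}{10000}\,dt} = e^{t/10000}, \\
\bigl(I y\bigr)' &= \frac{1}{2000}\,e^{t/10000}
  \;\Rightarrow\; I y = 5\,e^{t/10000} + C
  \;\Rightarrow\; y = 5 + C e^{-t/10000}, \\
y(0) &= 0 \;\Rightarrow\; C = -5
  \;\Rightarrow\; y(t) = 5\left(1 - e^{-t/10000}\right).
\end{align*}
```

The height rises toward 5 meters, which is the level at which the leak of $10 \times y$ liters per second balances the inflow of 50 liters per second.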
□

9.4 Second Order Differential Equations

Now we turn to the solutions of second-order linear equations. A second-order linear equation can be written as follows:
\[ y'' + a(x)y' + b(x)y = Q(x). \]
Solving a second-order equation corresponds to integrating twice, which produces two general constants. To determine these, two conditions are necessary:
\[ y(x_0) = y_0, \qquad y'(x_0) = y_1. \]
Solving second-order equations is more difficult than solving first-order equations, and explicit solutions are available only in special cases. In this lecture, we find the solution when $a$, $b$, $Q$ are all constants. The equation for the orbit of a celestial body has this form.

9.5 Equation for two-body problem

To determine the orbit of two celestial bodies, such as the Sun and the Earth, we need to solve the following differential equation:
\[ u'' + u = K. \tag{9.5} \]
Obtaining this equation is the main goal of Lecture 11. The $K$ on the right side is a constant given by $K = \dfrac{(m_1 + m_2)G}{L^2}$. Here, $m_1$ and $m_2$ are the masses of the two celestial bodies, $G$ is the gravitational constant, and $L$ is the angular momentum, all of which are constants. If $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ are the positions of the two celestial bodies at time $t$, then $u$ is the reciprocal of the distance between the two bodies, $r = \|\mathbf{x}_1 - \mathbf{x}_2\|$. However, the differentiation in (9.5) is not with respect to the time variable $t$ but with respect to the angle variable $\theta$ in polar coordinates. The method for solving a second-order linear differential equation with constant coefficients is described in detail in Appendix A. The general solution of the second-order differential equation (9.5) contains two constants, which require two initial conditions:
\[ u = (1 + e\cos(\theta - \theta_0))K, \qquad K = \frac{(m_1 + m_2)G}{L^2}. \]
Here, $\theta_0$ is the initial angle, and $e$ is the eccentricity. (It is a tradition to use the same symbol $e$ for the eccentricity as for the natural constant $e$, distinguishing them from the context.) These two constants are determined by the two initial conditions.

Remark 9.1.
The natural initial conditions for determining the orbit of a planet are the initial positions and velocities of the two bodies. However, $u$ is a function of the angle $\theta$, so one needs to know the initial angle $\theta_0$ at the initial moment in order to state the initial conditions. Finding the initial angle, in turn, is only possible after finding the solution. Once the shape of the solution is known, it is fixed by the initial conditions, but finding the initial angle requires expressing the solution using the conserved energy. This will be done in Lecture 12.

Exercises

1. Find the general solutions for the following differential equations.
(1) $xy' = y^2x^2$. (2) $x^{-1}y' = y\sin x$. (3) $xy' - 2y = x^3\sin x\cos x$.
2. Determine the general solutions and intervals of existence for the following differential equations.
(1) $y' = x^{-1}e^x - xy$. (2) $x^2y' = xy - e^x$. (3) $x^3y' + d\,x^2y = \cos x$.
3. Find the general solutions for the following differential equations.
(1) $\dfrac{dy}{dt} = y^2 - 2t$. (2) $\dfrac{d^2y}{dt^2} + 2\dfrac{dy}{dt} + y = 0$ (3) $\dfrac{dy}{dt} = y^2e^{-t}$

Lecture 10 Newton's law on Earth

10.1 Newton's law of motion and gravitation

Newton's three laws of motion are as follows.
1. Law of inertia: An object moves at a constant velocity if no external forces act on it.
2. Law of force: Force is equal to the product of mass and acceleration ($F = ma$).
3. Law of action-reaction: For every action, there is an equal and opposite reaction.
The first law is Galileo's law of inertia, which describes motion at a constant velocity. It is the special case $a = 0$ of the second law.

According to Newton's law of universal gravitation, the gravitational force between two objects is inversely proportional to the square of the distance between them and directly proportional to the product of their masses. Let $m_1$ and $m_2$ be the masses of two objects, and $\mathbf{x}_1$ and $\mathbf{x}_2$ be their position vectors.
Then, the gravitational force acting on object $m_1$ is given by
\[ \mathbf{F}_{m_1} = -G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r} = -G\,\frac{m_1 m_2}{r^2}\,\mathbf{e}_r, \tag{10.1} \]
where $G$ is the gravitational constant with a value of $G \cong 6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$. Here $r$ is the distance between the two objects, $\mathbf{r}$ is the position difference vector, and $\mathbf{e}_r$ is the unit vector in the direction of $\mathbf{r}$. That is,
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|, \qquad \mathbf{e}_r = \frac{\mathbf{r}}{r}. \tag{10.2} \]
Here, $\mathbf{r}$ is the position vector pointing from object $m_2$ to object $m_1$. Hence, the motion of $m_1$ as seen by an observer at $m_2$ is $\mathbf{r}(t)$. We will consider $m_2$ as a large object like the Sun and $m_1$ as a small object like the Earth. The trajectory $\mathbf{r}(t)$, $t \in \mathbb{R}$, defined by (10.2) is the motion of $m_1$ viewed from an origin placed at $m_2$. In the next chapter, we will see that this trajectory is an elliptical orbit. In reality, however, $m_2$ also moves slightly, so the motion of $m_1$ deviates from this ellipse by the amount that $m_2$ moves. What remains stationary (or moves at a constant velocity) is not the Sun but the center of mass of the two celestial bodies, the Sun and the Earth. The vector $\frac{\mathbf{r}}{r}$ is a unit vector pointing from $\mathbf{x}_2$ to $\mathbf{x}_1$. This corresponds to the unit vector $\mathbf{e}_r$ of Lecture 8 when $\mathbf{x}_2$ is taken as the origin of polar coordinates.

10.2 Work and energy

When an object is subjected to a force $\mathbf{F}$ and moves a distance $\ell$, the work $W$ is given as follows:
\[ W = f\ell \quad (\text{work} = \text{component of force in the direction of motion} \times \text{displacement}). \tag{10.3} \]
Here, $f$ is the component of the force $\mathbf{F}$ in the direction of motion. If the force $\mathbf{F}$ is perpendicular to the direction of motion, then $f = 0$, and no work is done by the force. For instance, if a planet or satellite moves in a circular orbit, the acting force, i.e., gravity, is perpendicular to the direction of motion, and thus the work done is $W = 0$.

Question 10.1. Why is work defined as the product of force and displacement?
Is Equation (10.3) a definition of work? If work is energy, then Equation (10.3) should be a formula for calculating energy, not a definition of work. Some books refer to Equation (10.3) as the definition of work. If that is the case, then one must separately demonstrate that work and energy are the same. In any case, what needs to be explained is that using Equation (10.3) for calculation yields the correct energy. By “correct,” it is meant that the calculated energy does not contradict existing energy concepts. In fact, Equation (10.3) can be understood as a formula for calculating potential energy.

Let’s see through an example how energy and work are connected. Suppose a mass $m$ at rest in one-dimensional space receives a force $f = ma$ for a time $t$. Then the velocity obtained is $v = at$. Therefore, the kinetic energy at that moment is $E_k = \frac{1}{2} m a^2 t^2$. So what is the distance traveled? The distance traveled is obtained by integrating the velocity. That is,
\[
\ell = \int_0^t a s \, ds = \left. \frac{1}{2} a s^2 \right|_0^t = \frac{1}{2} a t^2 .
\]
Therefore, using Equation (10.3) to calculate work, $W = f\ell = ma \times \frac{1}{2} a t^2 = \frac{1}{2} m a^2 t^2$, which is equal to the kinetic energy. In other words, energy can also be calculated using Equation (10.3). In reality, Equation (10.3) is a formula for calculating potential energy when the parameter for the energy calculation is changed from time to distance (arc length).

What if the force is not constant but a function? If it is a function of time, then the acceleration varies with time, and the velocity becomes the integral of the acceleration, i.e., $v(t) = v(0) + \int_0^t a(s)\,ds$. Therefore, the kinetic energy can be easily obtained. If the force is a function of position, then integration using Equation (10.3) is necessary. The actual gravity (10.1) is a function of position or distance, and in this case, Equation (10.3) is more useful than the formula for kinetic energy.
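The remark about position-dependent forces can be tried numerically. The following is only a minimal sketch, not part of the original notes: it splits a vertical path above the Earth's surface into short pieces, applies $W = f\ell$ from (10.3) to each piece with the gravity of (10.1), and adds the results. The function names, the number of pieces, the Earth radius and the sample height of 10 km are assumptions chosen for illustration.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of the Earth, kg
R_EARTH = 6.371e6    # radius of the Earth, m (assumed mean value)

def gravity(m1, s):
    """Magnitude of the force (10.1) on a mass m1 at distance s from the Earth's center."""
    return G * m1 * M_EARTH / s**2

def work_against_gravity(m1, h, pieces=10_000):
    """Apply W = f * l on many short pieces of the path and add the results."""
    dl = h / pieces
    total = 0.0
    for i in range(pieces):
        s_mid = R_EARTH + (i + 0.5) * dl   # midpoint of the i-th piece
        total += gravity(m1, s_mid) * dl
    return total

w = work_against_gravity(m1=1.0, h=10_000.0)
print(w)                     # accumulated work in joules
print(1.0 * 9.8 * 10_000.0)  # constant-gravity estimate m1 * g * h for comparison
```

For heights that are small compared with the Earth's radius the two printed numbers should agree closely, which is the sense in which a constant force is a reasonable approximation near the ground.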
For example, if an object moves along the $x$-axis and the force component in the $x$-direction is a function of $x$, i.e., $f = f(x)$, then the work done by the force $f(x)$ between $x = a$ and $x = b$ is given by
\[
W = \int_a^b f(x)\,dx .
\]
It is called a definite integral because it calculates the accumulated work done by the force $f(x)$ from the beginning to the end. That is, the definite integral is the signed area under the graph of $f(x)$ from $x = a$ to $x = b$.

10.3 Gravity force and potential energy

The energy of a planet is exchanged between potential and kinetic energy as the planet alternates between acceleration and deceleration. When an object with mass $m_1$ moves with velocity $\mathbf{v}$, the kinetic energy is given by
\[
E_k = \frac{1}{2} m_1 \|\mathbf{v}\|^2 .
\]
The following problem demonstrates that the potential energy due to gravity on the surface of the Earth can also be expressed as a product of gravity and distance.

Problem 10.1 (Gravity on the Earth’s surface). The gravitational force exerted on an object with mass $m_1$ at the Earth’s surface is $-m_1 g \hat{\mathbf{k}}$. Here, $g = 9.8\ \mathrm{m/s^2}$ is the gravitational acceleration, and $\hat{\mathbf{k}}$ is the unit vector in the vertical direction on the Earth’s surface. If this object is placed at a height $h > 0$ above the surface, the object’s potential energy is
\[
E_p = m_1 g h. \tag{10.4}
\]
Show the following:
(1) Confirm the magnitude of the gravitational acceleration $g$ using Equation (10.1).
(2) Explain the concept of potential energy (10.4) using the work concept.
(3) Explain the significance of potential energy (10.4).

Solution 10.1 (1) The mass of the object is $m_1$ in (10.1), and the vector $\hat{\mathbf{k}}$ corresponds to $\mathbf{r}/r$. Therefore, the remaining part corresponds to the constant $g$:
\[
g = \frac{G m_2}{R^2} \approx 9.8\ \mathrm{m/s^2}.
\]
Here, $m_2$ is the mass of the Earth, and $R$ is the radius of the Earth. The value of $g$ can be verified by looking up $m_2$ and $R$ on the internet.
(2) Work is a method of calculating potential energy.
If the component $f_z$ of the force in the (vertical) direction of motion is constant, then the work is given by $f_z h$, where $h$ is the vertical displacement. Lifting the object against gravity requires $f_z = m_1 g$. Therefore, the potential energy is $E_p = m_1 g h$.
(3) The potential energy is the energy required to push the object from the Earth’s surface to its current position. Equivalently, it is the amount of work gravity does when the object falls to the Earth’s surface from that position.

Problem 10.2. A mass of 2 kg is thrown vertically upward from the ground with a force of twice the gravity for $t$ seconds. Calculate the kinetic and potential energies at that moment.

Solution 10.2 If the force is twice the gravity, then it equals $2mg = (4\ \mathrm{kg})\,g$. The net acceleration is $g$ since we subtract gravity. Therefore, the velocity after $t$ seconds is $\int_0^t g\,ds = gt$, and the kinetic energy is $\frac{1}{2} m v^2 = g^2 t^2\ \mathrm{kg}$. The distance traveled is $\int_0^t g s\,ds = \frac{1}{2} g t^2$, so the potential energy is $E_p = mgh = (2\ \mathrm{kg})\, g \cdot \frac{1}{2} g t^2 = g^2 t^2\ \mathrm{kg}$. The total energy is $g^2 t^2\ \mathrm{kg} + g^2 t^2\ \mathrm{kg} = 2 g^2 t^2\ \mathrm{kg}$. Alternatively, the total energy can be calculated using Equation (10.3): $(4\ \mathrm{kg})\,g \times \frac{1}{2} g t^2 = 2 g^2 t^2\ \mathrm{kg}$. Expressed in units, since $g = 9.8\ \mathrm{m/s^2}$, the total energy after 100 seconds is
\[
E_{\text{total}} = 2\,(9.8)^2\ \mathrm{m^2/s^4} \times (100)^2\ \mathrm{s^2} \times 1\ \mathrm{kg} = 1.9208 \times 10^6\ \mathrm{kg\,m^2/s^2}.
\]

Calculating the potential energy or gravity between planets or between a planet and a star requires a different approach. In these cases, gravity cannot be treated as a constant. Gravity becomes a function of distance, requiring integration to calculate energy. However, there is another fundamental problem. Potential energy on the Earth’s surface is defined to be 0, with the Earth’s surface as the reference point. What should be the reference point for potential energy between planets?

Problem 10.3 (Potential energy with Earth’s surface as reference). Gravity is a function of the distance $r$ between two objects given by Newton’s law of gravitation (10.1).
Let’s denote the mass of the Earth as $m_2$. For an object with mass $m_1$ located at a distance $r > R$ from the center of the Earth (that is, not on the Earth’s surface), show that the potential energy is given by
\[
E_p = G m_1 m_2 \left( R^{-1} - r^{-1} \right), \tag{10.5}
\]
where $R$ is the radius of the Earth and $r$ is the distance between the object’s center and the Earth’s center.

Solution 10.3 First, assume that the object moves up and down along a line through the center of the Earth. The $\hat{\mathbf{k}}$ component of gravity is given by $f = -G m_1 m_2 s^{-2}$. Here, $s$ is the distance to the center of the Earth. Pushing the object away from the Earth’s surface requires a force of the same magnitude in the opposite direction. Integrating this force for $r > R$ yields
\[
\int_R^r G m_1 m_2 s^{-2}\,ds = \left. -G m_1 m_2 s^{-1} \right|_R^r = G m_1 m_2 \left( R^{-1} - r^{-1} \right).
\]
This matches (10.5).

Let $h$ denote the distance from the surface. Then, $r = R + h$. Therefore, the potential energy is
\[
E_p = G m_1 m_2 \left( \frac{1}{R} - \frac{1}{R+h} \right) = G m_1 m_2 \frac{R + h - R}{R(R+h)} = G m_1 m_2 \frac{h}{R^2}\,\frac{R^2}{R^2 + Rh} .
\]
If $h$ is much smaller than $R$, then $\frac{R^2}{R^2 + Rh} \approx 1$. The potential energy can then be written as
\[
E_p \approx G m_1 m_2 \frac{h}{R^2} = m_1 \frac{G m_2}{R^2}\, h,
\]
which is a valid approximation for the potential energy (10.5). (The radius of the Earth is $R = 6371$ km. If $h = 10$ km, then $\frac{R^2}{R^2 + Rh} \approx 0.9984$, a difference of about 0.16%.)

Remark 10.1 (A brief note). Since $h$ is much smaller than $R$, we can say $\frac{h}{R(R+h)} \approx \frac{h}{R^2}$. However, we kept the $h$ in the numerator. We shouldn’t delete everything just because it is small. Depending on what we want to see and on the terms around it, we distinguish between what can be deleted and what should be kept.

Question 10.2. The potential energy (10.5) becomes 0 on the Earth’s surface. This definition represents potential energy with respect to the Earth’s surface. What happens if we calculate potential energy with respect to the center of the Earth?

Calculating the potential energy from the center of the Earth corresponds to the case where $R = 0$.
In this scenario, the potential energy given by (10.5) diverges, meaning
\[
\lim_{R \to 0} G m_1 m_2 \left( R^{-1} - r^{-1} \right) = \infty .
\]
This implies that the potential energy becomes infinite when measured from the center of the Earth. Essentially, this suggests that an infinite amount of energy is required to move away from the center of the Earth. In other words, objects located at the center of the Earth cannot escape. (Even if an object has a small mass, if it can be compressed sufficiently, nothing can escape from within. Such objects are known as micro black holes.)

If potential energy cannot be measured from the center of the Earth, the next natural choice is to measure it from $\infty$. Then, when $R = \infty$, the potential energy is given by
\[
E_p = -\frac{G m_1 m_2}{r}. \tag{Potential Energy}
\]
In this case, the drawback is that the potential energy is negative. When measured from infinity, the potential energy is 0 at $\infty$ and becomes increasingly negative as the object approaches the Earth’s center. But among the available choices, this is the best one.

When considering the movement between planets, the reference point for potential energy is $r = \infty$, and the potential energy is negative and becomes 0 at $r = \infty$. When considering movement due to gravity on the Earth’s surface, the reference point is the surface of the Earth, and the potential energy is positive, reaching a minimum of 0 at $h = 0$.

10.4 Projectile motion

Let’s examine the trajectory of a projectile launched from the ground at an angle $\varphi \in (0, \frac{\pi}{2})$ with an initial speed $v_0 > 0$. The objective is to find the projectile’s trajectory before it touches the ground again, the maximum height reached before it falls, the distance traveled, and the time it stays in the air. Air resistance is ignored.

Assuming the projectile moves in the $xz$-plane, let’s find the trajectory $\mathbf{r}(t)$. Let the starting point be the origin, $\mathbf{r}(0) = \mathbf{0}$, and the initial velocity be $\mathbf{v}(0) = (v_0 \cos\varphi,\ v_0 \sin\varphi)$. The acceleration $\mathbf{a}$ is given by gravity, so $\mathbf{a}(t) = (0, -g)$.
The velocity vector $\mathbf{v}(t)$ at time $t$ is obtained by integrating the acceleration and applying the initial condition:
\[
\mathbf{v}(t) = \int \mathbf{a}(t)\,dt = \begin{pmatrix} c_1 \\ -gt + c_2 \end{pmatrix}, \qquad \mathbf{v}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} v_0 \cos\varphi \\ v_0 \sin\varphi \end{pmatrix}.
\]
Thus, $\mathbf{v}(t) = (v_0 \cos\varphi,\ -gt + v_0 \sin\varphi)$. Integrating once more gives the position vector (with new constants $c_1$, $c_2$):
\[
\mathbf{r}(t) = \int \mathbf{v}(t)\,dt = \begin{pmatrix} v_0 \cos\varphi\, t + c_1 \\ -\frac{1}{2} g t^2 + v_0 \sin\varphi\, t + c_2 \end{pmatrix} \ \Rightarrow\ \mathbf{r}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
Therefore, the projectile’s trajectory is
\[
\mathbf{r}(t) = \begin{pmatrix} x(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} v_0 \cos\varphi\, t \\ -\frac{1}{2} g t^2 + v_0 \sin\varphi\, t \end{pmatrix}.
\]
The condition $z(t) = 0$ represents the moments when the projectile is on the ground. Therefore, solving $-\frac{1}{2} g t^2 + v_0 \sin\varphi\, t = 0$ gives the moments when it touches the ground. One solution is the initial time, $t = 0$. The other is the time of flight, when it touches the ground again:
\[
T = \frac{2 v_0 \sin\varphi}{g}. \tag{Time of flight}
\]
The $x$-component $x(T)$ at time $T$ is the distance traveled:
\[
R = x(T) = \frac{2 v_0^2 \sin\varphi \cos\varphi}{g}. \tag{Range}
\]
The projectile’s maximum height occurs at half of the total time of flight, so
\[
H = z(T/2) = \frac{v_0^2 \sin^2\varphi}{2g}. \tag{Maximum height}
\]

Problem 10.4. Explain how the projectile trajectory changes if there is a crosswind blowing at a speed of $v_1$.

Solution 10.4 If we ignore air resistance, no matter how strong the crosswind is, it doesn’t affect the projectile’s trajectory. When air resistance is taken into account, the method used above is not sufficient.

Problem 10.5. Given a fixed launch velocity, how can you maximize the distance the projectile travels?

Solution 10.5 If the launch angle $\varphi$ is fixed, the range and the maximum height are proportional to the square of the speed, $v_0^2$. The time of flight is proportional to $v_0$. If the speed is fixed, you can choose the angle $\varphi$. The range is maximized when $\sin\varphi \cos\varphi$ reaches its maximum value. To find the maximum, differentiate and set the derivative equal to 0:
\[
(\sin\varphi \cos\varphi)' = \cos^2\varphi - \sin^2\varphi = 2\cos^2\varphi - 1 = 0 .
\]
Thus, the critical point is $\cos\varphi = \frac{1}{\sqrt{2}}$, so $\varphi = \frac{\pi}{4}$.

Question 10.3.
The following text is from a baseball magazine: “We were taught in school that the angle at which a ball can be thrown the farthest is 45 degrees. But in actual baseball, the optimal launch angle is close to 30 degrees.” Why is this different? (The optimal angle for a golf ball is about 17 degrees.)

The reason is air resistance and the spin of the ball. The ball’s spin comes from the bat hitting the lower part of the ball. If the launch angle is 45 degrees and the ball has such spin, the actual trajectory is much higher than the optimal trajectory. The spin of a golf ball is also caused by hitting the bottom of the ball; it is more pronounced than for a baseball and has a greater effect because of the surface of the ball. Of course, without air resistance, 45 degrees is always the optimal launch angle.

Exercises

1. Calculate the potential energy of a 10 kg object on the Earth’s surface. (Consider $R = \infty$ as the reference point.)
2. Calculate the gravitational force between the Earth and the Sun. Compare it with the gravitational force between Mars and the Sun. (Necessary data, such as the masses of the Earth and Mars and their distances from the Sun, can be found on the internet.)
3. Let the mass of Jupiter be $1.899 \times 10^{27}$ kg and its radius be 70,000 km (a diameter of about 140,000 km). Calculate the magnitude of the gravitational force on Jupiter’s surface and compare it with the gravitational force on the Earth’s surface.
4. A 10 kg piece of iron falls into water with a depth of 10 meters. How much work does gravity do? (Necessary data, such as the density of iron, can be found on the internet.)
5. Assume the speed of sound is 340 m/s. Calculate the maximum distance traveled when the projectile’s launch speed is equal to the speed of sound. Also, determine the time of flight and the maximum height reached.
6. It is said that the maximum range of a K9 howitzer is 53 km. What is the launch velocity?
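As a numerical companion to Section 10.4, the three closed formulas (time of flight, range, maximum height) can be evaluated directly. The sketch below is an illustration under the same assumption as the lecture (no air resistance); the function names `flight` and `best_angle_deg`, the launch speed of 100 m/s and the whole-degree scan are choices made here, not part of the original notes.

```python
import math

def flight(v0, phi, g=9.8):
    """Time of flight T, range R and maximum height H for speed v0 and angle phi (radians)."""
    T = 2 * v0 * math.sin(phi) / g
    R = 2 * v0**2 * math.sin(phi) * math.cos(phi) / g
    H = v0**2 * math.sin(phi)**2 / (2 * g)
    return T, R, H

def best_angle_deg(v0, g=9.8):
    """Scan whole-degree launch angles and return the one with the largest range."""
    return max(range(1, 90), key=lambda d: flight(v0, math.radians(d), g)[1])

T, R, H = flight(v0=100.0, phi=math.radians(30))
print(T, R, H)                 # seconds, meters, meters
print(best_angle_deg(100.0))   # angle in degrees with the largest range
```

Scanning the angles is a brute-force counterpart of Solution 10.5: it does not replace the derivative argument, but it gives a quick check of it and can serve as a starting point for Exercises 5 and 6.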
Lecture 11 Newton’s law in space: Two-body problem

The purpose of this lecture is to find the trajectories of two celestial bodies moving under gravity using Newton’s laws of motion and differentiation, and to confirm Kepler’s laws through the two-body problem. We consider differentiation with respect to the time variable $t$ in problems involving motion. To distinguish differentiation with respect to the time variable $t$ from differentiation with respect to spatial variables, we use the following notation:
\[
\dot{F} := \frac{d}{dt} F = F'(t).
\]

11.1 Kepler’s laws

The astronomer Johannes Kepler described the orbits of planets around the Sun with three laws between 1609 and 1619. These laws modify Copernicus’ theory of circular orbits centered on the Sun and explain how planetary velocities change. Kepler’s three laws of planetary motion are as follows:

1. The orbit of a planet is an ellipse with the Sun at one of the two foci.
2. The line connecting the planet and the Sun sweeps out area at a constant rate, i.e., equal areas in equal times.
3. The square of the orbital period of a planet is proportional to the cube of the semi-major axis.

Isaac Newton showed in 1687 that Kepler’s three laws follow from Newton’s laws of motion and the law of gravity presented in Section 10.1. We aim to understand this process in this lecture.

In this section, we follow the notation from the previous lecture. If the masses of two celestial bodies are $m_1$ and $m_2$, and their positions are $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively, then the position difference is $\mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2$, which is $m_1$ viewed from $m_2$ (if $m_2$ is assumed to be at the origin, i.e., $\mathbf{x}_2 = \mathbf{0}$, then $\mathbf{r} = \mathbf{x}_1$).

Problem 11.1 (Plane motion). (1) Show that there exists a constant vector $\mathbf{c}$ satisfying the following equation:
\[
\mathbf{r} \times \dot{\mathbf{r}} = \mathbf{c}. \tag{11.1}
\]
(2) Explain the meaning of this relationship.

To solve this problem, one must remember the properties of the cross product:

1. The cross product $\mathbf{v}_1 \times \mathbf{v}_2$ is defined between two 3-dimensional vectors.
2. The cross product $\mathbf{v}_1 \times \mathbf{v}_2$ is a vector perpendicular to both $\mathbf{v}_1$ and $\mathbf{v}_2$.
3. If $\mathbf{v}_1$ and $\mathbf{v}_2$ are parallel or if one of them is zero, then $\mathbf{v}_1 \times \mathbf{v}_2 = \mathbf{0}$.
4. The derivative of a cross product satisfies the following product rule:
\[
\frac{d}{dt}(\mathbf{v}_1 \times \mathbf{v}_2) = \dot{\mathbf{v}}_1 \times \mathbf{v}_2 + \mathbf{v}_1 \times \dot{\mathbf{v}}_2 .
\]

Solution 11.1 (1) The gravitational force acting on object $m_1$ is given by $\mathbf{F}_{m_1} = -\frac{G m_1 m_2}{r^2}\mathbf{e}_r$, where the direction is towards $m_2$. Since $m_1 \ddot{\mathbf{x}}_1 = \mathbf{F}_{m_1}$ and $m_2 \ddot{\mathbf{x}}_2 = -\mathbf{F}_{m_1}$, both $\ddot{\mathbf{x}}_1$ and $\ddot{\mathbf{x}}_2$ are in the direction of $\mathbf{r}$ or $-\mathbf{r}$, and hence so is $\ddot{\mathbf{r}} = \ddot{\mathbf{x}}_1 - \ddot{\mathbf{x}}_2$. Therefore,
\[
\frac{d}{dt}(\mathbf{r} \times \dot{\mathbf{r}}) = \dot{\mathbf{r}} \times \dot{\mathbf{r}} + \mathbf{r} \times \ddot{\mathbf{r}} = \mathbf{0} + \mathbf{0} = \mathbf{0}.
\]
Thus, there exists a constant vector $\mathbf{c}$ such that $\mathbf{r} \times \dot{\mathbf{r}} = \mathbf{c}$.
(2) Consider the plane through the origin perpendicular to the vector $\mathbf{c}$. Equation (11.1) implies that the vector $\mathbf{r}$ is perpendicular to $\mathbf{c}$, meaning it lies on this plane. Thus, object $m_1$ moves on this plane. □

11.2 Two-body problem

Consider two objects moving in space. Let $m_1$ and $m_2$ be their masses, and $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ be their positions at time $t$. Assuming that no forces other than the gravity between them act, they satisfy two second-order differential equations given by Newton’s law of force, i.e.,
\[
m_1 \ddot{\mathbf{x}}_1 = -G\,\frac{m_1 m_2}{r^2}\,\mathbf{e}_r, \qquad m_2 \ddot{\mathbf{x}}_2 = G\,\frac{m_1 m_2}{r^2}\,\mathbf{e}_r . \tag{11.2}
\]
This problem is called the two-body problem. Let’s solve this problem to see if the solutions indeed satisfy Kepler’s laws.

Note that Newton’s third law of motion, the law of action and reaction, is already embedded in the differential equations (11.2). The force exerted on $m_1$ by $m_2$ pulls $m_1$, and conversely, the force exerted on $m_2$ by $m_1$ pulls $m_2$. These forces are equal in magnitude but opposite in direction, so their sum is $m_1 \ddot{\mathbf{x}}_1 + m_2 \ddot{\mathbf{x}}_2 = \mathbf{0}$. While (11.2) can be seen as second-order vector differential equations in three dimensions, knowing that the motion lies in a plane, we treat them as second-order vector differential equations in two dimensions. So, in fact, we need to solve a total of four scalar equations.
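Before solving (11.2) by hand, it can be instructive to integrate the four scalar equations numerically and watch the quantities discussed above. The sketch below is an assumption-laden illustration, not the method of these notes: it uses a velocity-Verlet time step, stores planar vectors as complex numbers, and takes rough Sun and Earth values (masses, distance, speed) that are not given in the text. It then compares the $z$-component of $\mathbf{r} \times \dot{\mathbf{r}}$ before and after 30 days.

```python
G = 6.674e-11  # gravitational constant, m^3 / (kg s^2)

def accelerations(m1, m2, x1, x2):
    """Accelerations of both bodies from (11.2); planar vectors are complex numbers."""
    r_vec = x1 - x2
    r = abs(r_vec)
    e_r = r_vec / r
    return -G * m2 / r**2 * e_r, G * m1 / r**2 * e_r

def simulate(m1, m2, x1, v1, x2, v2, dt, steps):
    """Advance the system by `steps` velocity-Verlet steps of size dt (seconds)."""
    a1, a2 = accelerations(m1, m2, x1, x2)
    for _ in range(steps):
        v1 += 0.5 * dt * a1          # half step in velocity
        v2 += 0.5 * dt * a2
        x1 += dt * v1                # full step in position
        x2 += dt * v2
        a1, a2 = accelerations(m1, m2, x1, x2)
        v1 += 0.5 * dt * a1          # second half step in velocity
        v2 += 0.5 * dt * a2
    return x1, v1, x2, v2

def cross_z(a, b):
    """z-component of a x b for planar vectors stored as complex numbers."""
    return (a.conjugate() * b).imag

# Assumed illustrative values: Earth (body 1) and Sun (body 2).
M1, M2 = 5.972e24, 1.989e30
x1, v1, x2, v2 = 1.496e11 + 0j, 29780j, 0j, 0j
c_start = cross_z(x1 - x2, v1 - v2)
x1, v1, x2, v2 = simulate(M1, M2, x1, v1, x2, v2, dt=3600.0, steps=24 * 30)
c_end = cross_z(x1 - x2, v1 - v2)
print(c_start, c_end)
```

The two printed values should agree up to rounding, in line with Problem 11.1. The sum $m_1 \dot{\mathbf{x}}_1 + m_2 \dot{\mathbf{x}}_2$ can be monitored in the same way to see the action-reaction property of (11.2) at work.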
To solve a first-order differential equation, we need one initial condition. A second-order differential equation requires two initial conditions. To solve two second-order differential equations, we need four initial conditions. That is, suppose the following are given:
\[
\mathbf{x}_1(0), \quad \dot{\mathbf{x}}_1(0), \quad \mathbf{x}_2(0), \quad \text{and} \quad \dot{\mathbf{x}}_2(0). \tag{11.3}
\]
Thus, there are four vector-valued initial values to choose, which means that there are many possible solutions. Since each of them is a vector on the plane, there are a total of eight degrees of freedom.

11.3 Center of mass

To find the solutions $\mathbf{x}_1(t)$ and $\mathbf{x}_2(t)$ of the two second-order vector differential equations (11.2), we proceed with simplification. First, we find the center of mass, which is the position where the total mass is balanced, defined as the average of the positions weighted by mass:
\[
\mathbf{R} = \frac{m_1 \mathbf{x}_1 + m_2 \mathbf{x}_2}{m_1 + m_2} = \frac{m_1}{m_1 + m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1 + m_2}\,\mathbf{x}_2 . \tag{Center of Mass}
\]
Taking the second derivative of the center of mass with respect to time, we get
\[
\ddot{\mathbf{R}} = \frac{m_1 \ddot{\mathbf{x}}_1 + m_2 \ddot{\mathbf{x}}_2}{m_1 + m_2} = \mathbf{0}.
\]
In other words, the center of mass moves at a constant velocity. Therefore, as discussed in Lecture 7, we can adopt a coordinate system where the center of mass is at the origin. This coordinate system is called the center of mass frame. In this frame, the following holds:
\[
\mathbf{R} = \dot{\mathbf{R}} = \ddot{\mathbf{R}} = \mathbf{0}.
\]
Under this coordinate system, the initial values in (11.3) must satisfy the following two conditions:
\[
m_1 \mathbf{x}_1(0) + m_2 \mathbf{x}_2(0) = \mathbf{0}, \qquad m_1 \dot{\mathbf{x}}_1(0) + m_2 \dot{\mathbf{x}}_2(0) = \mathbf{0}.
\]
Here, four degrees of freedom have been used, leaving four remaining.

11.4 Displacement vector

What we are calculating is the position difference vector $\mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2$. In other words, it means calculating the trajectory with $\mathbf{x}_2$ at the origin. When a heavy celestial body like the Sun is placed at the origin, i.e., $\mathbf{x}_2 = \mathbf{0}$, $\mathbf{r}$ can be considered as the position of the Earth.
However, this picture is not exact, because what is fixed is the center of mass, not $\mathbf{x}_2$. Therefore, even if $\mathbf{r}$ follows an elliptical orbit, the actual trajectory of $m_1$ is slightly off the ellipse. By how much? By the distance between the center of mass and $\mathbf{x}_2$.

Now let’s consider the movement of $\mathbf{r}$. Since $\mathbf{F}_{m_2} = -\mathbf{F}_{m_1}$, the difference in acceleration is as follows:
\[
\ddot{\mathbf{r}} = \ddot{\mathbf{x}}_1 - \ddot{\mathbf{x}}_2 = \frac{\mathbf{F}_{m_1}}{m_1} - \frac{\mathbf{F}_{m_2}}{m_2} = \left( \frac{1}{m_1} + \frac{1}{m_2} \right) \mathbf{F}_{m_1} = \frac{m_1 + m_2}{m_1 m_2}\,\mathbf{F}_{m_1}.
\]
Rewriting $\mathbf{F}_{m_1}$ using Newton’s law of gravity, we get
\[
\ddot{\mathbf{r}} = -\frac{m_1 + m_2}{m_1 m_2}\, G\,\frac{m_1 m_2}{r^2}\,\mathbf{e}_r = -G(m_1 + m_2)\,\frac{1}{r^2}\,\mathbf{e}_r .
\]
If we denote $k = G(m_1 + m_2)$, this equation can be written as
\[
\ddot{\mathbf{r}} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1 + m_2) G. \tag{11.4}
\]
This type of problem is called the Kepler problem. The same problem arises not only with gravity but also in the case of electric fields. Once the displacement vector $\mathbf{r}(t)$ is obtained, we can determine the two trajectories $\mathbf{x}_1$ and $\mathbf{x}_2$ using the center of mass $\mathbf{R}(t)$ and the difference vector $\mathbf{r}(t)$.

Problem 11.2. Show the following:
\[
\mathbf{x}_1(t) = \mathbf{R}(t) + \frac{m_2}{m_1 + m_2}\,\mathbf{r}(t), \qquad \mathbf{x}_2(t) = \mathbf{R}(t) - \frac{m_1}{m_1 + m_2}\,\mathbf{r}(t). \tag{11.5}
\]

Solution 11.2 It’s a simple calculation. $\mathbf{R}$ and $\mathbf{r}$ are given by
\[
\mathbf{R} = \frac{m_1}{m_1 + m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1 + m_2}\,\mathbf{x}_2, \qquad \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2 .
\]
To compute $\mathbf{x}_1$, we eliminate $\mathbf{x}_2$. Substituting $\mathbf{x}_2 = \mathbf{x}_1 - \mathbf{r}$, we have
\[
\mathbf{R} = \frac{m_1}{m_1 + m_2}\,\mathbf{x}_1 + \frac{m_2}{m_1 + m_2}\,(\mathbf{x}_1 - \mathbf{r}) = \mathbf{x}_1 - \frac{m_2}{m_1 + m_2}\,\mathbf{r}.
\]
The rest is straightforward. We obtain $\mathbf{x}_2$ similarly. □

Remark 11.1. When $m_1$ is significantly smaller than $m_2$, we can ignore $m_1$ in Equation (11.4) and use $k = m_2 G$. Considering the relationship between the Earth and its satellite, the mass of the satellite is much smaller than the Earth’s mass, so it is reasonable to set $m_1 = 0$ and use $k = m_2 G$ instead of $k = (m_1 + m_2) G$. In the case of the relationship between the Earth and the Sun, the error in the mass of the Sun may be larger than the mass of the Earth, so it may also be reasonable to use $k = m_2 G$. In practice, this is how it is done.
However, in (11.5), keeping $m_1$ makes a difference when $\|\mathbf{r}\|$ is large. For satellites orbiting the Earth, the distance is small, but for planets far away, the distance is significant and the term cannot be ignored.

11.5 Kepler problem

Let’s solve the Kepler problem (11.4). Restating the problem, we have
\[
\ddot{\mathbf{r}} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1 + m_2) G.
\]
Rewriting in terms of the acceleration, we have
\[
\mathbf{a} = -\frac{k}{r^2}\,\mathbf{e}_r, \qquad k = (m_1 + m_2) G. \tag{11.6}
\]

Problem 11.3. Using the relations from Equations (8.5)–(8.7), show that the vector equation (11.6) can be written as the following two scalar equations:
\[
\ddot{r} - r\dot{\theta}^2 = -\frac{(m_1 + m_2) G}{r^2}, \tag{11.7}
\]
\[
r\ddot{\theta} + 2\dot{r}\dot{\theta} = 0. \tag{11.8}
\]

Solution 11.3 Using Equation (8.7), we have
\[
\mathbf{a} = (\ddot{r} - r\dot{\theta}^2)\,\mathbf{e}_r + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\mathbf{e}_\theta = -\frac{k}{r^2}\,\mathbf{e}_r .
\]
Comparing the coefficients of $\mathbf{e}_r$ and $\mathbf{e}_\theta$, we obtain the two equations above. □

These two equations play a crucial role. Equation (11.8) implies the law of conservation of angular momentum, which is then used to derive the orbit equation from Equation (11.7).

Problem 11.4 (Kepler’s Second Law and Conservation of Angular Momentum). (1) Show that the angular momentum $r^2\dot{\theta}$ is constant.
(2) Explain how this relates to Kepler’s Second Law. (That is, verify that the rate of change of the area swept out by the line connecting the Sun and the Earth, when $r = r(t)$ and $\theta = \theta(t)$, is determined by the angular momentum.)

Solution 11.4 (1) First, let’s differentiate the angular momentum $r^2\dot{\theta}$. Using Equation (11.8), we obtain
\[
\frac{d}{dt}(r^2\dot{\theta}) = 2r\dot{r}\dot{\theta} + r^2\ddot{\theta} = r(2\dot{r}\dot{\theta} + r\ddot{\theta}) = 0.
\]
Thus, the angular momentum $r^2\dot{\theta}$ is constant.
(2) From $t$ to $t + h$, the area swept out by the line connecting the two celestial bodies is similar to the area of a sector. Since the radius $r(t)$ is not constant, there exists some radius between the maximum and minimum radii such that the area is given by
\[
\frac{1}{2}\bigl(\theta(t + h) - \theta(t)\bigr)\, r^2(t^*), \qquad t < t^* < t + h.
\]
By the mean value theorem, $t^*$ lies between $t$ and $t + h$, and it converges to $t$ as $h \to 0$.
Therefore, the instantaneous rate of change is given by
\[
\lim_{h \to 0} \frac{1}{2}\,\frac{\theta(t + h) - \theta(t)}{h}\, r^2(t^*) = \frac{1}{2}\,\dot{\theta}(t)\, r^2(t).
\]
Thus, the rate of change of the area swept out by the line is half the angular momentum $r^2\dot{\theta}$, and since the angular momentum is constant, this verifies Kepler’s Second Law. □

Now let’s derive the differential equation satisfied by the planet’s orbit. Equations (11.7) and (11.8) are two equations with $r$ and $\theta$ as dependent variables and time $t$ as the independent variable. We want to eliminate time $t$ and express $r$ as a function of $\theta$, treating $\theta$ as the independent variable. To do this, we use the fact that the angular momentum is constant, denoted as $L$:
\[
L := r^2\dot{\theta}.
\]
Then, $\dot{\theta} = L/r^2 > 0$ (taking $L > 0$). In other words, $\theta$ is a monotonically increasing function of time. Therefore, instead of the time variable $t$, we can use $\theta$ as the new variable. Using the chain rule, time derivatives can be expressed in terms of derivatives with respect to $\theta$:
\[
\frac{d}{dt} = \frac{d\theta}{dt}\,\frac{d}{d\theta} = \dot{\theta}\,\frac{d}{d\theta} = \frac{L}{r^2}\,\frac{d}{d\theta}.
\]
We then switch to the reciprocal of $r$, denoted as $u = r^{-1}$. Then,
\[
\frac{du}{d\theta} = \frac{d(r^{-1})}{d\theta} = -r^{-2}\,\frac{dr}{d\theta}
\]
is satisfied. Now, let’s rewrite each term of Equation (11.7) in terms of $u(\theta)$. First, for the term $\ddot{r}$:
\[
\ddot{r} = \frac{d}{dt}\!\left(\frac{dr}{dt}\right) = \frac{L}{r^2}\,\frac{d}{d\theta}\!\left(\frac{L}{r^2}\,\frac{dr}{d\theta}\right) = L u^2\,\frac{d}{d\theta}\!\left(-L\,\frac{du}{d\theta}\right) = -L^2 u^2\,\frac{d^2u}{d\theta^2}.
\]
Then, for the second term of Equation (11.7):
\[
r\dot{\theta}^2 = \frac{(r^2\dot{\theta})^2}{r^3} = \frac{L^2}{r^3} = u^3 L^2 .
\]
And for the third term:
\[
\frac{(m_1 + m_2) G}{r^2} = u^2 (m_1 + m_2) G .
\]
Substituting into Equation (11.7) and dividing by $-L^2 u^2$, we get the final equation in terms of $u$:
\[
\frac{d^2u}{d\theta^2} + u = \frac{(m_1 + m_2) G}{L^2}. \tag{11.9}
\]
This equation is the inhomogeneous second-order differential equation introduced in Equation (9.5). Its solution is as follows:
\[
u = \frac{(m_1 + m_2) G}{L^2}\,\bigl(1 + e\cos(\theta - \theta_0)\bigr). \tag{11.10}
\]
Since $u = r^{-1}$, we can express the distance $r$ between the two bodies as follows:
\[
r = \frac{L^2}{G(m_1 + m_2)\bigl(1 + e\cos(\theta - \theta_0)\bigr)}. \tag{11.11}
\]
We call $e$ the eccentricity and $\theta_0$ the phase offset; both are determined by the initial conditions. If $e = 0$, the orbit is a circle with radius $r = \frac{L^2}{G(m_1 + m_2)}$. If $0 < e < 1$, the orbit is an ellipse. If $e > 1$, the orbit is a hyperbola. The boundary case $e = 1$ is a parabola. The relationship between eccentricity and orbit is explained in detail in Appendix B.

Question 11.1. What do we mean by saying that the orbit is a hyperbola or a parabola?

It means that the object does not orbit around the Sun but rather passes by and continues on.

Once the center of mass is fixed, there are four degrees of freedom remaining. To specify the planet’s orbit, four elements are needed, and the four degrees of freedom determine them. The four elements specifying the orbit are as follows:

1. Angular momentum $L$: Determines the size of the orbit.
2. Eccentricity $e$: Determines the shape of the orbit.
3. Phase offset $\theta_0$: Determines the position of the planet on the orbit.
4. Angle of the major axis: Determines the angle between the orbit’s major axis and the $x$-axis.

We are primarily interested in $L$ and $e$. In particular, these two are related to the total energy of the planet, and we will learn about this relationship in the next lecture.

Question 11.2. Kepler’s laws provide a remarkably accurate description of planetary orbits, and they can be derived by solving Newton’s two-body problem. It is remarkable that Kepler obtained them from observational data. The only thing missing from Kepler’s laws is the fact that it is the center of mass, not the Sun, that does not move. How far away is the center of mass from the Sun? In the case of Jupiter, how far away is the center of mass from the Sun?

Exercises

1.
2.

Lecture 12 Kepler’s law and the energy of planets

Curiosity: the final frontier for intelligence. These are voyages of college students.
Its four-year mission: to explore strange ideas. To seek out new ideas and methodology. To boldly think what no man has thought about before!

The solar system has eight planets orbiting around the Sun. Each planet revolves in an elliptical orbit with eccentricity close to or below 0.1, so the orbits resemble circles. Only Mercury has an eccentricity of about 0.2. Although the planes of revolution of the eight planets differ slightly, they appear to lie on a single plane. All eight planets revolve in the same direction. In contrast, objects such as Halley’s Comet or the exoplanet HD 20782 b, discovered in 2006, move on highly eccentric orbits, with eccentricities of about 0.97. Why do they orbit in such different trajectories? Although it is believed that the eight planets and the Sun were formed simultaneously, what evidence supports this? Were comets with eccentricities close to 1 also formed around the same time as the solar system?

12.1 Energy of circular orbits

To obtain the equation for planetary orbits (11.10), one had to derive Equation (11.9), but most properties of planetary motion can be derived from the conservation of angular momentum obtained from Equation (11.8) (i.e., the fact that $L = r^2\dot{\theta}$ is constant in time) and Equation (11.7). Rewriting Equation (11.7), we get
\[
\ddot{r} - r\dot{\theta}^2 = -k r^{-2}, \qquad k = G(m_1 + m_2). \tag{12.1}
\]
The gravitational potential energy between two objects with masses $m_1$ and $m_2$ at a distance $r$ from each other is expressed as
\[
E_p = -\frac{G m_1 m_2}{r}. \tag{12.2}
\]

Problem 12.1 (Escape speed and escape energy). Let the radius of the Earth be $R$. Find the minimum speed required to launch an object with mass $m$ from the surface of the Earth so that it escapes the Earth’s gravitational pull.

Solution 12.1 The potential energy at the surface of the Earth is $E_p = -\frac{GmM}{R}$, where $M$ is the mass of the Earth. For the object to escape the Earth’s gravity, the total energy must be at least 0.
Thus, the kinetic energy must be at least $\frac{GmM}{R}$, which is the escape energy. The minimum speed $v$ therefore satisfies
\[
\frac{1}{2} m v^2 = \frac{GmM}{R}.
\]
Solving this equation yields $v = \sqrt{\frac{2GM}{R}}$, which is the escape speed. □

Problem 12.2 (Total energy of circular orbits). Show that the total energy of an object orbiting in a circular orbit with radius $r > 0$ is given by
\[
E_{\text{total}} = \frac{1}{2}\,\frac{G m_1 (m_1 - m_2)}{r} < 0. \tag{12.3}
\]

Solution 12.2 Since the object is in a circular orbit, $\ddot{r} = 0$, and (12.1) gives $r^2\dot{\theta}^2 = G(m_1 + m_2)/r$. Using this to compute the kinetic energy, we get
\[
E_k = \frac{1}{2} m_1 r^2\dot{\theta}^2 = \frac{1}{2}\,\frac{G m_1 (m_1 + m_2)}{r}.
\]
The potential energy is given by (12.2), so the total energy is the sum of these two, as given in (12.3). It is negative because $m_1 < m_2$. □

When $m_1$ is significantly smaller than $m_2$, as in the case of artificial satellites orbiting the Earth in circular orbits, the kinetic energy, potential energy, and total energy can be expressed as follows:
\[
E_k = \frac{1}{2}\,\frac{G m_1 m_2}{r}, \qquad E_p = -\frac{G m_1 m_2}{r}, \qquad E_{\text{total}} = -\frac{1}{2}\,\frac{G m_1 m_2}{r}.
\]
In other words, as an object descends from infinite distance to its current position, half of the decrease in potential energy is converted to kinetic energy. Therefore, the total energy and the kinetic energy have the same magnitude and opposite signs.

Problem 12.3 (Formation of Inner Planets and Total Energy). Explain why the 8 planets in the solar system did not come from outside but were instead formed together with the Sun.

Solution 12.3 The 8 planets in the solar system move in nearly circular orbits, and as shown in Problem 12.2, the total energy of planets in such orbits is much lower than that of objects coming from outside. Therefore, the possibility that the planets came from outside is very low. □

Satellites in higher orbits have lower velocities, and thus their kinetic energy is smaller. However, their potential energy increases by twice the decrease in kinetic energy. Consequently, the total energy increases with higher orbits, and raising a satellite to a higher orbit requires more energy.
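The statements about higher orbits can be checked with a few lines of arithmetic. The sketch below uses the formulas for the case $m_1 \ll m_2$ given above; the function name, the 1000 kg satellite mass, the Earth radius and the two sample altitudes (400 km and 35,786 km) are assumptions chosen only for illustration.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of the Earth, kg
R_EARTH = 6.371e6    # radius of the Earth, m (assumed mean value)

def circular_orbit_energy(m1, r, m2=M_EARTH):
    """Kinetic, potential and total energy of a small mass m1 on a circular orbit of radius r."""
    ek = 0.5 * G * m1 * m2 / r
    ep = -G * m1 * m2 / r
    return ek, ep, ek + ep

low = circular_orbit_energy(1000.0, R_EARTH + 400e3)       # assumed low orbit
high = circular_orbit_energy(1000.0, R_EARTH + 35_786e3)   # assumed high orbit

print("kinetic energy lost:   ", low[0] - high[0])
print("potential energy gained:", high[1] - low[1])
print("total energy gained:    ", high[2] - low[2])
```

The potential energy gained should come out as twice the kinetic energy lost, so the total energy of the higher orbit is larger even though the satellite moves more slowly there.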
Problem 12.4 (Minimum energy required to raise a satellite to orbit). Consider a satellite with mass $m_1$ at rest on the surface of the Earth, and the same satellite in a circular orbit at a height $h$ above the ground. Find the difference in total energy between the two states. (Ignore the rotation of the Earth.)

Solution 12.4 Let the radius of the Earth be $R$. When the satellite is on the surface of the Earth, its kinetic energy is zero, and the potential energy is $E_p = -\frac{G m_1 m_2}{R}$. In orbit, the total energy is given by (12.3) with $r = R + h$. Thus, the energy difference is
\[
\frac{1}{2}\,\frac{G m_1 (m_1 - m_2)}{R + h} - \left(-\frac{G m_1 m_2}{R}\right) = G\,\frac{(R + 2h)\, m_1 m_2 + R\, m_1^2}{2R(R + h)} \cong G\,\frac{(R + 2h)\, m_1 m_2}{2R(R + h)}.
\]
This represents the minimum energy required to raise a satellite of mass $m_1$ to an orbit at a height $h$ above the ground. (Of course, in reality, much more energy is needed once the weight of the fuel, the rocket, and so on is taken into account.) □

12.2 Energy of elliptical orbits

Setting the phase offset to $\theta_0 = 0$, the distance $r$ given in (11.11) can be expressed as
\[
r = \frac{L^2}{G(m_1 + m_2)(1 + e\cos\theta)}. \tag{12.4}
\]

Problem 12.5 (Total energy and eccentricity). Show that the total energy of an object orbiting in an elliptical orbit with eccentricity $e$ and angular momentum $L$ is given by
\[
E_{\text{total}} = \frac{G^2 m_1 (m_1 + m_2)(1 + e)\bigl[(m_1 + m_2)(1 + e) - 2 m_2\bigr]}{2L^2}. \tag{12.5}
\]

Solution 12.5 Energy is conserved, so one can compute the kinetic and potential energies at a specific moment. Using the formula for the velocity $\mathbf{v}$ given in Equation (8.6), the kinetic energy of an object in an elliptical orbit can be calculated as
\[
E_k = \frac{1}{2} m_1\, \mathbf{v} \cdot \mathbf{v} = \frac{1}{2} m_1 (\dot{r}\,\mathbf{e}_r + r\dot{\theta}\,\mathbf{e}_\theta) \cdot (\dot{r}\,\mathbf{e}_r + r\dot{\theta}\,\mathbf{e}_\theta) = \frac{1}{2} m_1 (\dot{r}^2 + r^2\dot{\theta}^2).
\]
At the moment of minimum distance, when $\theta = 0$, the derivative is $\dot{r} = 0$, so the kinetic energy becomes
\[
E_k = \frac{1}{2} m_1 r^2\dot{\theta}^2 = \frac{1}{2} m_1 \frac{L^2}{r^2} = \frac{m_1 G^2 (m_1 + m_2)^2 (1 + e)^2}{2L^2}.
\]
The potential energy at this moment is
\[
E_p = -\frac{G m_1 m_2}{r} = -\frac{G^2 m_1 m_2 (m_1 + m_2)(1 + e)}{L^2}.
\]
Adding these energies together yields the total energy given in (12.5).
□

Rewriting the expression for the total energy in equation (12.5), we get
\[ (m_1 + m_2)e^2 + 2m_1 e + m_1 - m_2 - \frac{2L^2 E_{total}}{G^2 m_1 (m_1 + m_2)} = 0. \]
Solving this quadratic equation for the eccentricity $e$ yields one positive root and one negative root, and the positive root is the eccentricity. Hence,
\[ e = -\frac{m_1}{m_1 + m_2} \pm \sqrt{\frac{m_1^2}{(m_1 + m_2)^2} - \frac{m_1 - m_2}{m_1 + m_2} + \frac{2L^2 E_{total}}{G^2 m_1 (m_1 + m_2)^2}}. \]
In the case where $m_1$ is much smaller than $m_2$, this simplifies to
\[ e = \sqrt{1 + \frac{2E_{total} L^2}{m_1 G^2 m_2^2}}. \tag{12.6} \]
From this formula it is evident that $E_{total} < 0$ corresponds to an ellipse (all closed orbits are ellipses), $E_{total} = 0$ corresponds to a parabola, and $E_{total} > 0$ corresponds to a hyperbola. In particular,
\[ E_{total} = -\frac{m_1 G^2 m_2^2}{2L^2} \cong -\frac{m_1 k^2}{2L^2} \]
corresponds to a perfectly circular orbit. Given the total energy and $L$, the eccentricity is determined. Conversely, given the eccentricity and the total energy, $L$ is determined.

12.3 Circular orbit of satellites

Consider an artificial satellite orbiting the Earth in a circular orbit with radius $r$. The corresponding orbit equation is
\[ r^3\dot\theta^2 = G(m_1 + m_2), \tag{12.7} \]
where $m_2$ is the mass of the Earth and $m_1$ is the mass of the satellite. Let $T$ be the time taken for one revolution, i.e., the orbital period. Since the radius is constant, the angular velocity $\dot\theta$ is constant as well, and we have $T\dot\theta = 2\pi$. Substituting this into the equation above, we obtain Kepler's 3rd law:
\[ \frac{T^2}{r^3} = \frac{4\pi^2}{G(m_1 + m_2)}. \tag{12.8} \]

Problem 12.6 (Kepler's third law). (1) What happens to the period when the radius of the satellite's orbit is doubled? (2) What is the radius of the orbit if the satellite revolves around the Earth once a day?

Solution 12.6 (1) Calculate using (12.7) or (12.8). Solving (12.8) for the period, we have
\[ T = \left(\frac{4\pi^2 r^3}{G(m_1 + m_2)}\right)^{1/2}. \]
Therefore, if we replace $r$ with $2r$, the period $T$ is multiplied by $2^{3/2}$.
(2) Now, let's use (12.7), which gives
\[ r = \dot\theta^{-2/3}\,(G(m_1 + m_2))^{1/3}. \]
If the satellite revolves once a day, then $\dot\theta = 2\pi/86400\ \mathrm{s}^{-1}$. Ignoring the mass $m_1$ of the artificial satellite and substituting the mass of the Earth $m_2 = 5.972 \times 10^{24}$ kg and the gravitational constant $G \cong 6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$, we get the distance from the Earth's center to the geostationary orbit as $r = 4.2240 \times 10^{7}$ m. This is about 6.63 times the Earth's radius. □

12.4 Elliptical orbits of satellites

Elliptical orbits of satellites are also commonly used. By (12.4), an artificial satellite is closest to the Earth when the angle $\theta$ is 0. The distance at that moment is
\[ r_0 = \frac{L^2}{G(m_1 + m_2)(1 + e)}. \]
The speed at this moment is denoted by $v_0$ and is given by
\[ v_0 = \|\mathbf{v}\| = \|\dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta\| = r_0\dot\theta. \]
Thus, $r_0 v_0 = r_0^2\dot\theta$ is the angular momentum per unit mass $L$. Rewriting (12.4) using this minimum distance, we get
\[ r = \frac{(1 + e)\,r_0}{1 + e\cos\theta}. \]
On the other hand, when the satellite is farthest away, with $\cos\theta = -1$, the distance is
\[ r_1 = \frac{1 + e}{1 - e}\,r_0. \]
The semi-major axis of the ellipse is half of the major axis:
\[ a = \frac{r_0 + r_1}{2} = \frac{r_0}{1 - e} = \frac{L^2}{G(m_1 + m_2)(1 - e^2)}. \]
The semi-minor axis $b = a\sqrt{1 - e^2}$ is obtained using the eccentricity (8.8).

Problem 12.7 (Kepler's 3rd law). Let $T > 0$ be the orbital period of a planet and $a > 0$ be the semi-major axis of the orbit. Show the following:
\[ \frac{T^2}{a^3} = \frac{4\pi^2}{G(m_1 + m_2)}. \tag{12.9} \]

Solution 12.7 The area of the ellipse is $\pi a b = \pi a^2\sqrt{1 - e^2}$. Using Kepler's 2nd law to find the area, we have
\[ \int_0^T dA = \frac{1}{2}\int_0^T L\,dt = \frac{1}{2} L T. \]
Therefore,
\[ T = \frac{2\pi a^2\sqrt{1 - e^2}}{L}. \]
Squaring both sides and dividing by $a^3$, we get
\[ \frac{T^2}{a^3} = \frac{4\pi^2 a(1 - e^2)}{L^2} = \frac{4\pi^2}{G(m_1 + m_2)}, \]
which is Kepler's 3rd law. □

Question 12.1.
Between an artificial satellite in a circular orbit with radius $r$ and an artificial satellite in an elliptical orbit with semi-major axis equal to $r$ but eccentricity $e$, which one has the longer orbital period? Which orbit requires more energy to place the satellite in? If the mass of the satellite doubles, how does the period change?

Question 12.2. A pendulum with length $\ell$ and mass $m_1$ is known to have period $T = 2\pi\sqrt{\ell/g}$. Is there any relationship between this and Kepler's 3rd law (12.9)?

Problem 12.8. The Enterprise is flying towards an unknown planet. Just before entering the gravitational influence of the planet, it is moving at a speed of $v_0$. Instead of landing the Enterprise on the planet, Spock and McCoy decide to beam down using the transporter. The maximum beaming distance is $r > 0$, so it is decided to orbit the planet on a circular orbit with radius $r$. How much energy needs to be lost? How can we save energy?

Solution 12.8 Let the mass of the spacecraft Enterprise be $m_1$. The kinetic energy at infinity is $E_k = \frac{1}{2} m_1 v_0^2$, and this is the total energy as seen from the planet. If the spacecraft moves on a circular orbit with radius $r > 0$, the total energy changes to $E_{total} = \frac{G m_1 (m_1 - m_2)}{2r}$. Therefore, the energy difference is
\[ \frac{1}{2} m_1 v_0^2 - \frac{G m_1 (m_1 - m_2)}{2r}, \]
which is the energy that must be lost to stay on the circular orbit. To save energy, it would be better to move on an elliptical orbit with minimum distance $r$. □

Problem 12.9. In the situation of the problem above, find out how much energy can be saved if the Enterprise moves on an elliptical orbit with eccentricity $e$ and minimum distance $r$.

Solution 12.9 By (12.4), the minimum distance of an orbit with eccentricity $e$ and angular momentum per unit mass $L$ occurs when $\cos\theta = 1$. In that case, $L^2 = rG(m_1 + m_2)(1 + e)$ holds.
And in this case, the total energy is given by
\[ E_{total} = \frac{G m_1 [(m_1 + m_2)(1 + e) - 2m_2]}{2r} = \frac{G m_1 (m_1 - m_2)}{2r} + \frac{G m_1 (m_1 + m_2)e}{2r}. \]
The difference in energy is $\frac{G m_1 (m_1 + m_2)e}{2r}$, and this is the amount of energy that can be saved. □

12.5 Interstellar and solar system object

Let's assume there is an object moving in space with velocity $\mathbf{v}$ and it happens to approach the Sun. What could happen? The object could collide head-on with the Sun and be absorbed, or it could pass by the Sun and move elsewhere, or it could collide partially, so that some parts are absorbed by the Sun while the rest separates. Could it then be trapped by the gravity of the Sun and orbit around the Sun? Could celestial bodies currently orbiting the Sun have formed in this way?

Our idea to verify this is simple. The energy of a planet on an orbit is conserved, so we compare this energy with the energy of the object arriving from outside. Before the object enters the region affected by the Sun's gravity, let's assume it is moving with velocity $\mathbf{v}$. If the mass of the object is $m_1$, then the kinetic energy is $E_k = \frac{1}{2} m_1\|\mathbf{v}\|^2$. The potential energy at infinity is $E_p = 0$, so the total energy is the kinetic energy at infinity. That is, if there is no energy loss due to collision or other factors,
\[ E_{total} = E_k + E_p = \frac{1}{2} m_1\|\mathbf{v}\|^2 \]
should hold. If the object is captured by the gravity of the Sun and orbits around it, then its total energy would still be this positive value. Of course, the total energy may change if the object approaches so closely that it collides, or if some parts are torn off and separated.

Problem 12.10 (Orbiting by extraterrestrial objects). Investigate whether a planet or object moving on an elliptical orbit with eccentricity close to 1 can have a positive total energy.
Solution 12.10 In Problem 12.5, we calculated that the total energy of a celestial body moving on an ellipse with angular momentum per unit mass $L$ and eccentricity $e$ is
\[ E_{total} = \frac{G^2 m_1 (m_1 + m_2)(1 + e)\,[(m_1 + m_2)(1 + e) - 2m_2]}{2L^2}. \]
For this to be positive, we need $(m_1 + m_2)(1 + e) - 2m_2 > 0$. When the eccentricity $e$ is close to 1, this is possible. However, looking at the data, it is very difficult for the total energy to be positive even when the eccentricity is close to 1, because of the large mass difference between the planet and the Sun. If it were an object from outer space, its total energy might have changed along the way. □

Exercises

1. Calculate the minimum energy required to place a satellite with a mass of $10^3$ kg in a geostationary orbit.
2. Determine the total energy of a satellite with a mass of $10^3$ kg moving on an elliptical orbit with a period of 1 day and eccentricity $e$.
3. Halley's comet, discovered by E. Halley, has an orbital eccentricity of 0.9673 and a period of 76.03 years. Find the maximum and minimum speeds of this comet. Calculate the total energy.

Part III The Arts of Calculus

Differential and integral calculus have become essential tools for understanding the world's problems, and much can be accomplished with them. In Part III, we learn the core techniques needed to understand and apply differentiation and integration effectively.

Lecture 13 Curves and particle trajectories in $\mathbb{R}^3$

When the variable $t$ represents time and $\mathbf{r}(t)$ represents the position of a moving particle at time $t$, the derivative $\mathbf{r}'(t) = \mathbf{v}(t)$ represents the velocity of the particle at time $t$. Even if it is not the trajectory of a particle, $\mathbf{r}(t)$ represents a curve in space and can describe one-dimensional objects such as wires or bent bars.
By using the arc length $s$ as a variable instead of the time $t$, we can characterize the properties of the curve itself rather than the trajectory of the object. In this lecture, we study the properties of curves in three-dimensional space by alternately using the time variable $t$ and the arc length variable $s$. In particular, we abuse notation frequently in this lecture, and you should get used to it.

13.1 Arc length as a variable

Suppose you drive from home to work. The distance between the two locations is the length of the straight line connecting them. On the other hand, the traveled distance is the length of the trajectory the car follows, and it is obtained by integrating the speed with respect to time from the departure time to the arrival time.

Question 13.1. Why does integrating speed give the length of the trajectory? What do you get if you integrate velocity instead of speed?

Let $\mathbf{r} : [0, T_0] \to \mathbb{R}^3$ represent the position of a particle moving in space. The variable $t \in [0, T_0]$ represents time. Velocity is the derivative of position with respect to time. Representing position and velocity as
\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} \quad\text{and}\quad \mathbf{v}(t) = \mathbf{r}'(t) = \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix}, \]
respectively, the speed is the magnitude of the velocity:
\[ \|\mathbf{v}(t)\| = \sqrt{(x'(t))^2 + (y'(t))^2 + (z'(t))^2}. \]
The distance traveled up to a given time $t$, called the arc length, is calculated as follows:
\[ s(t) = \int_0^t \sqrt{(x'(\tau))^2 + (y'(\tau))^2 + (z'(\tau))^2}\,d\tau = \int_0^t \|\mathbf{v}(\tau)\|\,d\tau. \]

Problem 13.1. For the curve $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$ in three-dimensional space with $t \in [0, 2\pi]$, find the arc length.

Solution 13.1 The velocity is $\mathbf{v}(t) = -\sin t\,\mathbf{i} + \cos t\,\mathbf{j} + \mathbf{k}$ and the speed is
\[ \|\mathbf{v}\| = \sqrt{\sin^2 t + \cos^2 t + 1} = \sqrt{2}. \]
Therefore, the arc length is $s(t) = \int_0^t \sqrt{2}\,d\tau = \sqrt{2}\,t$. Hence, the arc length over the entire range $[0, 2\pi]$ is $2\sqrt{2}\,\pi$. □

13.2 Parametrization with arc length

Let $\mathbf{r} : [0, T_0] \to \mathbb{R}^3$ be a vector function that satisfies $\|\mathbf{r}'(t)\| \neq 0$ for all $t$.
Let $s(t)$ represent the length of the trajectory traveled from the starting time 0 to $t > 0$. The length of the trajectory is given by
\[ s(t) = \int_0^t \|\mathbf{v}(\tau)\|\,d\tau = \int_0^t \sqrt{(x'(\tau))^2 + (y'(\tau))^2 + (z'(\tau))^2}\,d\tau. \]
Thus, $s(t)$ maps the time interval $[0, T_0]$ onto the arc length interval $[0, L]$, where $L = s(T_0)$ is the total length of the trajectory. Since $s(t)$ is an increasing function, it has an inverse function, denoted by $t(s)$. Here we are abusing notation by using both the arc length $s$ and the time parameter $t$ as variables and as functions. The inverse function relationships $s(t(s)) = s$ and $t(s(t)) = t$ hold. By the derivative rule for the inverse function, we obtain
\[ \left.\frac{ds}{dt}\right|_{t = t_0} = \frac{1}{\left.\dfrac{dt}{ds}\right|_{s = s_0}}. \]
Alternatively, we can write
\[ s'(t_0) = \frac{1}{t'(s_0)}, \qquad t'(s_0) = \frac{1}{s'(t_0)}. \]
Here, $s_0$ and $t_0$ are the corresponding arc length and time parameter, respectively.

Let $\tilde{\mathbf{r}}(s) = \mathbf{r}(t(s))$ be the composite function of $\mathbf{r}(t)$ and $t(s)$, which is defined for the arc length $s \in [0, L]$ and takes values in $\mathbb{R}^3$. When the arc length $s$ is the variable instead of the time $t$, what is the magnitude of the derivative? From the perspective of the variable $s$, the magnitude of the derivative is 1. Is this intuitively clear? Let's calculate it:
\[ \tilde{\mathbf{r}}'(s) = \frac{d}{ds}\mathbf{r}(t(s)) = \mathbf{r}'(t)\,t'(s) = \frac{\mathbf{v}(t)}{\|\mathbf{v}(t)\|}, \qquad t = t(s). \tag{13.1} \]
Thus, $\|\tilde{\mathbf{r}}'(s)\| = 1$.

Let's abuse notation. If we simply write $\mathbf{r}(s)$ for $\tilde{\mathbf{r}}(s)$, without the tilde, will it be confused with $\mathbf{r}(t)$? We can distinguish them sufficiently. When writing $\mathbf{r}(s)$, we know that $\mathbf{r}$ is a function of the arc length variable $s$, and when writing $\mathbf{r}(t)$, we know that it is a function of the time $t$. And when we write $\mathbf{r}'(s)$, it means that the position $\mathbf{r}$ is differentiated with respect to $s$. That is, $\mathbf{r}'(s) = \frac{d}{ds}\mathbf{r}$. Then, is $\mathbf{v}(s) = \mathbf{r}'(s)$? Not at all. $\mathbf{v}$ represents velocity, and therefore $\mathbf{v} = \frac{d}{dt}\mathbf{r}$.
Thus, $\mathbf{v}(s) = \frac{d}{dt}\mathbf{r}(s)$ represents the velocity at the point corresponding to the arc length $s$. You should get used to such abuse of notation in this lecture, including the definitions of $T$, $N$, $B$, $\kappa$, $\tau$, etc., which treat both $s$ and $t$ as variables.

Remark 13.1. When it is clear what the variable of a function $f$ is, writing the derivative as $f'$ does not cause confusion. However, when abusing notation, it is necessary to state clearly what the variable is. Therefore, write either $f'(s)$ or $f'(t)$ to indicate the variable. In that sense, the Leibniz notation $\frac{df}{ds}$ or $\frac{df}{dt}$ prevents confusion.

Question 13.2. Suppose we write $\mathbf{r}(s) = \mathbf{r}(t(s))$ and $\mathbf{r}(t) = \mathbf{r}(s(t))$. Which of the four occurrences of $\mathbf{r}$ correspond to $\tilde{\mathbf{r}}$ defined in (13.1)? (The first and the fourth.)

Problem 13.2. Draw a graph of the curve $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$ in three-dimensional space with $t \in [0, T_0]$. Compute the arc length $s(t)$, its inverse function $t(s)$, and $\mathbf{r}'(s)$.

Solution 13.2 As shown in Problem 13.1, $\|\mathbf{v}(t)\| = \sqrt{2}$. Therefore,
\[ s(t) = \int_0^t \sqrt{2}\,d\tau = \sqrt{2}\,t \]
is a constant multiple of $t$, and its inverse function is $t(s) = \frac{s}{\sqrt{2}}$. Therefore, upon composition, we obtain
\[ \mathbf{r}(s) = \cos\frac{s}{\sqrt{2}}\,\mathbf{i} + \sin\frac{s}{\sqrt{2}}\,\mathbf{j} + \frac{s}{\sqrt{2}}\,\mathbf{k}, \qquad \mathbf{r}'(s) = \frac{1}{\sqrt{2}}\left(-\sin\frac{s}{\sqrt{2}}\,\mathbf{i} + \cos\frac{s}{\sqrt{2}}\,\mathbf{j} + \mathbf{k}\right). \]
In this case, it can be easily verified that $\|\mathbf{r}'(s)\| = 1$. □

13.3 TNB coordinate system

Unit tangent vector

The unit tangent vector $T$ is the derivative of the position vector $\mathbf{r}$ with respect to the arc length $s$:
\[ T(s) = \mathbf{r}'(s). \]
We have already seen from equation (13.1) that $T(s)$ is a unit vector. When we write $T(t)$, it does not mean $T(t) = \mathbf{r}'(t)$. We are abusing notation for the already defined $T$, so that $T(t) = T(s(t))$.

Problem 13.3. Compute the unit tangent vector for the curve $\mathbf{r}(t) = (1 + 3\cos t)\,\mathbf{i} + (3\sin t)\,\mathbf{j} + t^2\,\mathbf{k}$.

Solution 13.3 We compute $\mathbf{r}'(t) = -3\sin t\,\mathbf{i} + 3\cos t\,\mathbf{j} + 2t\,\mathbf{k}$ as usual and then divide by its magnitude $\sqrt{9 + 4t^2}$ to obtain $T(t)$. □

Problem 13.4.
(1) Given the curve $\mathbf{r}(t) = \cos(t)\,\mathbf{i} + \sin(t)\,\mathbf{j}$ defined over the interval $t \in (0, 2\pi)$, find the velocity $\mathbf{v}(t)$ and acceleration $\mathbf{a}(t)$ of the curve, and explain the perpendicularity relationships among the three vectors $(\mathbf{r}, \mathbf{v}, \mathbf{a})$.
(2) Given the curve $\mathbf{r}(t) = \cos(t^2)\,\mathbf{i} + \sin(t^2)\,\mathbf{j}$ defined over the interval $t \in (0, 2\pi)$, find the velocity $\mathbf{v}(t)$ and acceleration $\mathbf{a}(t)$ of the curve, and explain the relationship among the three vectors $(\mathbf{r}, \mathbf{v}, \mathbf{a})$.
(3) Given the curve $\mathbf{r}(t) = \sin(t)\cos(t)\,\mathbf{i} + \sin^2(t)\,\mathbf{j} + \cos(t)\,\mathbf{k}$ defined over the interval $t \in (0, 2\pi)$, find the velocity $\mathbf{v}(t)$ and acceleration $\mathbf{a}(t)$ of the curve, and explain the relationship among the three vectors $(\mathbf{r}, \mathbf{v}, \mathbf{a})$.

Solution 13.4 (1) The velocity $\mathbf{v}$ is perpendicular to both the position vector $\mathbf{r}$ and the acceleration $\mathbf{a} = -\mathbf{r}$. (2) The position vector $\mathbf{r}$ is perpendicular to the velocity $\mathbf{v}$. (3) The position vector $\mathbf{r}$ is perpendicular to the velocity $\mathbf{v}$. □

Problem 13.5. (1) Prove that if the speed is constant, the acceleration $\mathbf{a}(t)$ and velocity $\mathbf{v}(t)$ are perpendicular to each other. (2) If the distance between a particle and the origin is constant, which vectors are perpendicular to each other?

Solution 13.5 (1) If the speed is constant, the square of the speed is also constant. The square of the speed is $\mathbf{v}(t)\cdot\mathbf{v}(t)$. Taking the derivative, we have
\[ 0 = \frac{d}{dt}\bigl(\mathbf{v}(t)\cdot\mathbf{v}(t)\bigr) = 2\,\mathbf{v}(t)\cdot\mathbf{a}(t). \]
Therefore, $\mathbf{v}(t)$ and $\mathbf{a}(t)$ are perpendicular.
(2) Similarly, if the distance between a particle and the origin is constant, its square is also constant. That is, $\mathbf{r}(t)\cdot\mathbf{r}(t)$ is constant. Taking the derivative, we conclude that $\mathbf{r}(t)$ and $\mathbf{v}(t)$ are perpendicular to each other. □

Curvature $\kappa$ and principal unit normal vector $N$

Differentiating the unit tangent vector $T(s)$ gives the curvature vector $T'(s)$, and its magnitude is called the scalar curvature. When we say "curvature," we usually mean the scalar curvature, denoted by the Greek letter $\kappa$ (kappa) and defined as
\[ \kappa(s) = \|T'(s)\| = \|\mathbf{r}''(s)\|. \]
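The definitions $T(s) = \mathbf{r}'(s)$ and $\kappa(s) = \|\mathbf{r}''(s)\|$ can be tried numerically on the helix of Problem 13.2, whose arc-length parametrization is known in closed form. The following is a minimal sketch using central finite differences. The helper names (`helix`, `first_derivative`, `second_derivative`), the step size `h`, and the sample point `s0` are illustrative choices and not part of the lecture.

```python
import math


def helix(s):
    """Arc-length parametrization r(s) of the helix from Problem 13.2."""
    u = s / math.sqrt(2.0)
    return (math.cos(u), math.sin(u), u)


def norm(v):
    """Euclidean length of a 3-vector given as a tuple."""
    return math.sqrt(sum(c * c for c in v))


def first_derivative(curve, s, h=1e-4):
    """Central-difference estimate of r'(s), i.e. the unit tangent T(s)."""
    p, m = curve(s + h), curve(s - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(p, m))


def second_derivative(curve, s, h=1e-4):
    """Central-difference estimate of r''(s) = T'(s)."""
    p, c, m = curve(s + h), curve(s), curve(s - h)
    return tuple((a - 2.0 * b + d) / (h * h) for a, b, d in zip(p, c, m))


s0 = 1.3  # arbitrary sample point along the curve
speed = norm(first_derivative(helix, s0))    # expected to be close to 1
kappa = norm(second_derivative(helix, s0))   # expected to be close to 1/2
print(speed, kappa)
```

For this helix the exact values are $\|\mathbf{r}'(s)\| = 1$ and $\kappa = 1/2$ at every point, so the printed numbers should agree with these up to the finite-difference error.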
Problem 13.6 (Principal unit normal vector). Let $\kappa(s) \neq 0$. (1) Define $N(s) = \frac{1}{\kappa(s)} T'(s)$ as the principal unit normal vector. Show that it is perpendicular to the unit tangent vector $T$. (2) Show that it satisfies the following:
\[ N(t) = \frac{T'(t)}{\|T'(t)\|}. \]

Solution 13.6 (1) Since $T$ is always a unit vector, we have $T\cdot T = 1$, which is constant. Taking the derivative, we get
\[ (T(s)\cdot T(s))' = 2\,T(s)\cdot T'(s) = 2\kappa(s)\,T(s)\cdot N(s) = 0. \]
Therefore, they are perpendicular. (2) Since $T'(s)$ and $T'(t)$ have the same direction and $N$ is a unit vector, this equation holds. □

The curvature $\kappa(s)$ indicates how sharply the curve bends at the point $\mathbf{r}(s)$.

Problem 13.7. Compute the curvature of the straight line $\mathbf{r}(t) = \mathbf{c} + t\mathbf{v}$. Here, $\mathbf{c}$ and $\mathbf{v} \neq \mathbf{0}$ are given constant vectors.

Solution 13.7 Here $\mathbf{r}'(t) = \mathbf{v}$ is constant, so $T = \mathbf{v}/\|\mathbf{v}\|$ is a constant vector and $T'(s) = \mathbf{0}$. Therefore, the curvature is $\kappa = 0$. □

Problem 13.8. Compute the curvature of the circle $\mathbf{r}(t) = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j}$ with radius $a > 0$.

Solution 13.8 We compute $\mathbf{r}'(t) = -a\sin t\,\mathbf{i} + a\cos t\,\mathbf{j}$, whose magnitude is $a$, to obtain $T(t) = -\sin t\,\mathbf{i} + \cos t\,\mathbf{j}$. Then we find $T'(t) = -\cos t\,\mathbf{i} - \sin t\,\mathbf{j}$, whose magnitude is 1. We have $\kappa(t) = \|T'(t)\|/a = 1/a$. □

Problem 13.9. The example in Problem 13.8 illustrates a method for determining the curvature of a curve in the plane. Describe the method.

Solution 13.9 Let $\mathbf{r}(t)$ be a point on the curve. We find the circles that touch the curve at that point. If $a > 0$ is the maximum radius of such a tangent circle, then the curvature is $1/a$. If there is no restriction on the radius of the tangent circle, we say $a = \infty$, and the curvature is 0. □

Torsion $\tau$ and binormal vector $B$

The binormal vector is denoted by $B$ and defined as follows:
\[ B = T \times N. \]
Then $T$, $N$, and $B$ form a positively oriented orthonormal coordinate system. Differentiating, and using the fact that $\frac{d}{ds}T = \kappa N$ is parallel to $N$, we have
\[ \frac{d}{ds}B = \frac{d}{ds}(T \times N) = 0 + T \times \frac{d}{ds}N = T \times \frac{d}{ds}N. \]
Since $N$ is a unit vector, $\frac{d}{ds}N$ is perpendicular to $N$, and therefore $T \times \frac{d}{ds}N$ has only an $N$ component. The torsion $\tau$ of the curve $\mathbf{r}(s)$ is defined as the negative of the $N$ component of $\frac{d}{ds}B$:
\[ \tau = -\frac{d}{ds}B \cdot N. \]
It represents how much the curve twists in the direction perpendicular to the direction of progression $T$.

13.4 Computation formulas

The quantities defined in the previous section are geometric properties of curves in space, and all definitions were made using the arc length parameter. However, it is cumbersome to convert variables each time a calculation is performed, and the calculations can in fact be done with a general parameter. In this section, we introduce the method of calculation when the time variable $t$ is given. We use the velocity $\mathbf{v}$ and acceleration $\mathbf{a}$, given by the derivatives with respect to time, for these calculations.

The unit tangent vector $T(t)$ has the same direction as $\mathbf{v}(t)$ but with magnitude 1:
\[ T(t) = \frac{\mathbf{v}(t)}{\|\mathbf{v}(t)\|}. \]
$T'(s)$ and $T'(t)$ are different vectors, but they have the same direction. We have
\[ T'(s) = \frac{d}{ds}T(t(s)) = T'(t)\frac{dt}{ds} = T'(t)\frac{1}{\|\mathbf{v}(t)\|}. \]
Therefore, the principal normal vector $N(t)$ has the same direction as $T'(t)$ but with magnitude 1:
\[ N(t) = \frac{T'(t)}{\|T'(t)\|}. \]
The binormal vector $B$ is computed using the cross product: $B = T \times N$.

Problem 13.10. Show that the acceleration $\mathbf{a}$ is perpendicular to $B$ and is given as follows:
\[ \mathbf{a} = C_T\,T + C_N\,N, \qquad C_T = \frac{d}{dt}\|\mathbf{v}\|, \qquad C_N = \kappa\|\mathbf{v}\|^2. \]

Solution 13.10 We can compute $\mathbf{a}$ in the $TNB$ system as follows:
\[ \mathbf{a} = \frac{d}{dt}\mathbf{v}(t) = \frac{d}{dt}\left(\frac{ds}{dt}\,T(t)\right) = \frac{d^2 s}{dt^2}\,T + \frac{ds}{dt}\,\frac{d}{ds}T(s)\,\frac{ds}{dt} = \frac{d^2 s}{dt^2}\,T + \left(\frac{ds}{dt}\right)^2 \kappa N. \]
The acceleration has no component in the direction of the binormal vector $B$, meaning that $\mathbf{a}$ is perpendicular to $B$. Since $\frac{ds}{dt} = \|\mathbf{v}\|$, rewriting the coefficients gives the stated expressions. □

The above equation states that the acceleration $\mathbf{a}$ splits into components along $T$ and $N$, and only the component along $T$ contributes to the change in speed $\frac{d}{dt}\|\mathbf{v}\|$.
The component along $N$ depends on the curvature and is responsible for changing the direction of motion. It is proportional to the square of the speed, but it does not contribute to changes in speed.

Problem 13.11. Show that the curvature $\kappa$ and torsion $\tau$ are given by
\[ \kappa = \frac{\|\mathbf{v}\times\mathbf{a}\|}{\|\mathbf{v}\|^3}, \qquad \tau = \frac{\begin{vmatrix} \dot x & \dot y & \dot z \\ \ddot x & \ddot y & \ddot z \\ \dddot x & \dddot y & \dddot z \end{vmatrix}}{\|\mathbf{v}\times\mathbf{a}\|^2}. \]

Solution 13.11 For $\kappa$, we use the expression for $\mathbf{a}$ from Problem 13.10. Since $\mathbf{v} = \|\mathbf{v}\|T$ and $T \times T = 0$, we have
\[ \|\mathbf{v}\times\mathbf{a}\| = \bigl\|\,\|\mathbf{v}\|T \times \mathbf{a}\,\bigr\| = \|\mathbf{v}\|^3\kappa\,\|T \times N\| = \|\mathbf{v}\|^3\kappa. \]
Therefore, $\kappa$ satisfies the given expression. The expression for $\tau$ is not derived here and can simply be memorized. □

13.5 Exercises

1. Rewrite the following curves using the arc length parameter $s$.
(1) $\mathbf{r}(t) = \sin t\,\mathbf{i} + \cos t\,\mathbf{j} + t\,\mathbf{k}$, $0 \le t \le 1$
(2) $\mathbf{r}(t) = t\,\mathbf{i} + 3\,\mathbf{j} - t^2\,\mathbf{k}$, $0 < t < 1$
2. Using the basis vectors from (8.3), find the velocity and acceleration of the following curves in polar coordinates.
(1) $r = \theta$, $\theta = 3t$
(2) $r = \sin\theta$, $\theta = t^2$
(3) $r\theta = 1$, $r = t$
3. Compute $T$, $N$, $B$, $\kappa$, and $\tau$ for the following curves.
(1) $\mathbf{r}(t) = \sin t\,\mathbf{i} + \cos t\,\mathbf{j} + 2t\,\mathbf{k}$
(2) $\mathbf{r}(t) = \sin^2 t\,\mathbf{i} + \cos^2 t\,\mathbf{j} - 3\,\mathbf{k}$

Lecture 14 Linearization and differentials

The equation of the tangent line to the graph of the function $y = f(x)$ at the point $(c, f(c))$ is $y - f(c) = f'(c)(x - c)$. When the tangent line and the graph are magnified near the tangent point, they look increasingly similar. For this reason, the tangent line shares many properties of the graph near the tangent point and can be considered an approximation of the graph. Away from the tangent point the two differ more and more, but close to the tangent point the tangent line is an excellent approximation of the function. Approximations that use higher-order derivatives in addition to first derivatives are studied in Part IV. Throughout this lecture, we assume that the function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $c \in \mathbb{R}$.
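The claim that the tangent line becomes a better approximation closer to the tangent point can be illustrated with a short computation. This is only a sketch under assumed choices: the function $f = \sin$, the point $c = 1$, the helper name `linearization`, and the sample offsets `h` are all illustrative and do not come from the lecture.

```python
import math


def linearization(f, df, c):
    """Return the tangent-line function x -> f(c) + f'(c) (x - c)."""
    fc, slope = f(c), df(c)
    return lambda x: fc + slope * (x - c)


f = math.sin     # sample function (assumed for illustration)
df = math.cos    # its derivative
c = 1.0          # tangent point
tangent = linearization(f, df, c)

offsets = (0.5, 0.1, 0.01, 0.001)
errors = [abs(f(c + h) - tangent(c + h)) for h in offsets]

for h, err in zip(offsets, errors):
    # The error shrinks, and so does the error relative to the offset h.
    print(f"h={h:<6} error={err:.3e} error/h={err / h:.3e}")
```

In this sketch not only the error but also the ratio error/h decreases as h shrinks, which is the sense in which the tangent line is a good local approximation.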
14.1 Linearization

The equation of the line passing through the origin with slope $a \in \mathbb{R}$ is $y = ax$. Although we can express it using function notation as $f(x) = ax$, it is necessary to become accustomed to simply thinking of $y$ as a function of $x$. Then, the slope of this line is $y' = \frac{dy}{dx} = a$. If the line passes through a point $(x_0, y_0)$ instead of the origin, the equation of the line is as follows:
\[ y - y_0 = a(x - x_0). \tag{14.1} \]
The point $(c, f(c))$ is a point on the graph of $y = f(x)$. How can we find the equation of the line tangent to the graph at this point? Using the derivative, we know that the slope is $f'(c)$, and since the point $(c, f(c))$ must lie on this line, the equation is $y - f(c) = f'(c)(x - c)$. Rewriting this, we have
\[ y = f(c) + f'(c)(x - c). \tag{14.2} \]
For convenience, we denote the right-hand side by
\[ L(x) := f(c) + f'(c)(x - c), \]
which is the function whose graph is the tangent line. We call $L(x)$ the linearization or linear approximation of the function $f(x)$ at the point $c$.

Problem 14.1. Let $f(x) = \sqrt{1 + x}$. Find the linearization $L(x)$ of $f$ at $x = 0$.

Solution 14.1 Differentiating, we get $f'(x) = \frac{1}{2}(1 + x)^{-1/2}$, $f'(0) = 0.5$, and $f(0) = 1$. Therefore, $L(x) = f(0) + f'(0)(x - 0) = 1 + 0.5x$.

Problem 14.2. Find the linearization $L(x)$ of the function $f(x) = \cos x$ at the point $x = \frac{\pi}{2}$.

Solution 14.2 Differentiating, we get $f'(x) = -\sin x$, $f'(\frac{\pi}{2}) = -1$, and $f(\frac{\pi}{2}) = 0$. Therefore, $L(x) = f(\frac{\pi}{2}) + f'(\frac{\pi}{2})(x - \frac{\pi}{2}) = -(x - \frac{\pi}{2}) = \frac{\pi}{2} - x$.

Local property

A property that is determined by the behavior in a small region is called a local property. The linearization explained in this section indicates that the linear function $L(x)$ is locally equivalent to the original function $f(x)$ near the $x$-coordinate $c$ of the tangent point.
This implies that many local properties are determined through differentiation, which is one of the reasons why differentiation is useful. It is used for local maxima, local minima, rates of change of length and volume, etc., as learned in Calculus 1 and 2.

14.2 Differentials

The linearization $L(x) = f(c) + f'(c)(x - c)$ uses the relationship (14.2) to approximate $f(x)$ when $x$ is close to $c$. Rather than approximating $f(x)$ directly, it is more convenient to approximate only its change. In other words, we can use the following relationship corresponding to (14.1):
\[ dy = f'(x)\,dx. \tag{14.3} \]
Here, $dx$ and $dy$ are called differentials, where $dx$ is an independent variable and $dy$ is the dependent variable determined by $x$ and $dx$. This relationship is meaningful when $dx$ is sufficiently small.

Question 14.1 (Notation abuse). Are the $dx$ and $dy$ in the notation $\frac{dy}{dx}$ for differentiation the same as the differentials $dx$ and $dy$ in (14.3)?

Solution 14.1 No, they are not. They mean different things but use the same notation. Representing different things with the same notation is called notation abuse. Sometimes, we abuse notation for convenience. One of the most striking abuses of notation occurs in differentiation, where $dx$ and $dy$ are used. We denote the derivative of the dependent variable $y$ with respect to the independent variable $x$ by $\frac{dy}{dx}$. Thus, when both are used together in this way, they indicate differentiation. On the other hand, $dy$ and $dx$ used as differentials are treated as small numbers. In (14.2), $dy$ corresponds to $y - f(c)$ and $dx$ to $x - c$. However, (14.3) can also be written as follows:
\[ \frac{dy}{dx} = f'(x). \]
Written in this way, the differentials and the Leibniz notation for differentiation represent the same thing. Therefore, they represent different things depending on the perspective, but in practice they denote the same thing. It is necessary to become accustomed to the difference between them and to use them interchangeably. □
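The relationship (14.3) treats $dy$ as a function of two inputs, the point $x$ and the increment $dx$. The following minimal sketch makes this explicit and compares $dy$ with the actual change of the function. The sample function $f(x) = x^3$, the point $x = 2$, and the helper name `differential` are illustrative assumptions and not part of the lecture.

```python
def differential(df, x, dx):
    """Differential dy = f'(x) dx, depending on both the point x and the increment dx."""
    return df(x) * dx


def f(x):
    """Sample function, assumed for illustration."""
    return x ** 3


def df(x):
    """Derivative of the sample function."""
    return 3 * x ** 2


x = 2.0
for dx in (0.5, 0.1, 0.01):
    dy = differential(df, x, dx)
    delta_y = f(x + dx) - f(x)   # actual change of the function
    print(f"dx={dx}: dy={dy:.6f}, actual change={delta_y:.6f}")
```

For small $dx$ the differential $dy$ and the actual change agree closely, which is why (14.3) is described as meaningful when $dx$ is sufficiently small.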
Problem 14.3. Given $x = 1$ and $dx = 0.2$, find the differential $dy$ of the variable $y = x^4 + 7x$.

Solution 14.3 Differentiating, we have $f'(x) = 4x^3 + 7$ and $f'(1) = 11$. Therefore, $dy = 11\,dx = 2.2$.

The above differential $dy$ pertains to the function $f(x)$. Instead of introducing $y$, we can write it directly as $df = f'(x)\,dx$.

Problem 14.4. Find the differential $df$ of the function $f(x) = 3x^2 - 6$.

Solution 14.4 $df = f'(x)\,dx = 6x\,dx$.

The calculation of differentials for sums, products, and composite functions follows the rules of differentiation.
1. $d(u + v) = du + dv$.
2. $d(uv) = u\,dv + v\,du$.
3. $d(f(u)) = f'(u)\,du$.

Question 14.2. Using the chain rule for differentiation, we obtain
\[ d(\sin(\sin(u))) = \cos(\sin(u))\,d(\sin(u)) = \cos(\sin(u))\cos(u)\,du. \]
What is the reason for the first equality?

Solution 14.2 First, by substituting $v = \sin(u)$, we obtain
\[ d(\sin(v)) = \cos(v)\,dv = \cos(\sin(u))\,d(\sin(u)). \]
Repeating this calculation, we obtain the second equality.

14.3 Differentials for linear approximation

Approximation using differentiation refers to approximation by linear functions. Now, let's examine linear approximation and its error in detail. Let $\Delta x = x - c$ denote the difference between the independent variable $x$ and the comparison point $c$. For such an independent variable, we set the differential $dx$ equal to the difference $\Delta x$. That is,
\[ dx = \Delta x = x - c. \]
If $dx$ is sufficiently small, then by the definition of the derivative, a differentiable function $f(x)$ satisfies
\[ f'(c) \cong \frac{f(c + dx) - f(c)}{dx}. \]
Therefore, at $x = c + dx$, we approximate the function value by a linear function:
\[ f(x) = f(c + dx) \cong f(c) + f'(c)\,dx. \]
Now let's compare the difference in function values
\[ \Delta y = f(x) - f(c) = f(c + dx) - f(c) \]
with the differential
\[ dy = f'(c)\,dx. \]
With this notation, where $\Delta y$ is the difference in function values and $dy$ is the product of $dx$ and the derivative $f'(c)$, we compare them as follows:
\[ \Delta y = f(c + dx) - f(c) = \frac{f(c + dx) - f(c)}{dx}\,dx \cong f'(c)\,dx = dy. \]
The actual difference $\Delta y$ and the differential $dy$ differ by
\[ \Delta y - dy = \left(\frac{f(c + dx) - f(c)}{dx} - f'(c)\right)dx. \]
Therefore, the difference between the actual change $\Delta y$ and the differential $dy$ decreases as $dx$ becomes smaller. More importantly, as $dx$ decreases, the ratio $\frac{f(c + dx) - f(c)}{dx}$ converges to the derivative $f'(c)$, so $\Delta y - dy$ converges to 0 faster than $dx$ does. That is,
\[ \frac{\Delta y - dy}{dx} \to 0 \quad\text{as}\quad dx \to 0. \]
Expressed with the little-oh notation introduced in Lecture 16, this reads
\[ \Delta y - dy = o(dx) \quad\text{as}\quad dx \to 0. \]

Problem 14.5. Given a circular disk with a radius of 10 cm, find the exact increase in area when the radius increases by 1% (to 10.1 cm), and compare it with the differential.

Solution 14.5 The area of a disk with radius $r$ is $A(r) = \pi r^2$. Then, $\Delta y = A(10.1) - A(10) = 2.01\pi$. Also, $A'(r) = 2\pi r$ and $A'(10) = 20\pi$. Therefore, $dy = A'(10)\,dx = 20\pi \times 0.1 = 2\pi$. Thus, the difference is $\Delta y - dy = 0.01\pi$.

Problem 14.6. Approximate the value of $(7.97)^{1/3}$ using $c = 8$, and predict the change using differentials. Compare the approximation with the actual value.

Solution 14.6 Let's use the function $f(x) = x^{1/3}$. Taking $c = 8$ as the comparison point, we have $f(8) = 8^{1/3} = 2$. Here, $dx = 7.97 - 8 = -0.03$, and $f'(8) = \frac{1}{3}\,8^{-2/3} = \frac{1}{12}$. Therefore, the differential $dy$ is
\[ dy = f'(c)\,dx = -\frac{1}{12} \times 0.03 = -0.0025. \]
Now, using differentials for approximation, we have
\[ (7.97)^{1/3} = f(7.97) \cong f(8) + f'(8)\,dx = 2 + \frac{1}{12}(-0.03) = 2 - 0.0025 = 1.9975. \]
The actual error is about 0.000003, which is not too bad.

Exercises
1.
Lecture 15 Inverse trigonometric and hyperbolic functions

In this lecture, we learn about inverse trigonometric functions and hyperbolic functions. However, functions like $\sin x$ and $\cos x$ are not one-to-one, so their inverse functions do not exist. What we are seeking is not the inverse functions of $\sin x$ and $\cos x$ themselves, but rather the inverse functions of their branches.

15.0.1 Inverse trigonometric functions

First, let's review the basic properties of inverse functions learned in Lecture 6. If a function $f : A \to B$ is one-to-one and onto, then an inverse function $g : B \to A$ exists, satisfying $g(f(x)) = x$ and $f(g(y)) = y$. If $f$ is differentiable and $f'(x) \neq 0$, then $g'(y) = \frac{1}{f'(x)}$ holds, where $y = f(x)$. We denote the inverse function $g$ by $f^{-1}$.

Sine function

The function $\sin\theta$ has the entire real line $\mathbb{R}$ as its domain and $[-1, 1]$ as its range. Since it is not one-to-one, its inverse function does not exist. Even though the sine function has no inverse, let's remember that we are considering the inverse function of a chosen branch of $\sin x$, that is, of its restriction to part of its domain.

Question 15.1. If we want to create a one-to-one function by taking some part of the domain of the function $\sin\theta$, what interval would be the best choice?

It is essential to include the most important angles, which are those from 0 degrees to 90 degrees, that is, from 0 to $\pi/2$. Considering the shape of the sine function, in order to include the cases where sine takes negative values, it is reasonable to include the interval from $-90$ degrees to 0 degrees as well. Therefore, the branch we should consider is as follows:
\[ \sin|_{[-\pi/2, \pi/2]} : [-\pi/2, \pi/2] \to [-1, 1]. \]
The inverse function is denoted by $\arcsin$ or $\sin^{-1}$. Even if we use the notation $\sin^{-1}$, let's not forget that it represents the inverse function of the branch $\sin|_{[-\pi/2, \pi/2]}$, which is the choice made not only by us but by everyone. Therefore, the inverse function is
\[ \arcsin : [-1, 1] \to [-\pi/2, \pi/2]. \]
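This branch convention is also the one followed by common numerical libraries. As a small illustration, the sketch below uses Python's standard `math.asin`, which returns the principal value in $[-\pi/2, \pi/2]$. The sample input $\sqrt{2}/2$ and the variable names are illustrative choices, not part of the lecture.

```python
import math

x = math.sqrt(2.0) / 2.0      # sample value in [-1, 1], chosen for illustration
theta = math.asin(x)          # principal value, lies in [-pi/2, pi/2]
other = math.pi - theta       # another angle with the same sine, outside the branch

print(theta, math.pi / 4)                      # the two values agree up to rounding
print(math.sin(theta), math.sin(other))        # both are approximately x
print(-math.pi / 2 <= other <= math.pi / 2)    # False: `other` is not the arcsin value
```

Both angles have the same sine, but only the one inside $[-\pi/2, \pi/2]$ is the value of $\arcsin$.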
If someone asks for $\arcsin(-0.5)$, the correct answer is the angle between $-\pi/2$ and $\pi/2$ that satisfies $\sin\theta = -0.5$. Providing a different angle would be an incorrect answer.

When we write the sine function, we sometimes use $\sin x$ and sometimes $\sin\theta$. Using $\theta$ makes it clear that the variable is an angle; of course, even in $\sin x$, the variable $x$ represents an angle. However, writing $\arcsin\theta$ is very misleading, because $\arcsin$ does not take an angle as its variable but a value of sine between $-1$ and $1$.

Problem 15.1. Find the following values:
(1) $\arcsin(0.5)$. (2) $\arcsin(-0.5)$. (3) $\arcsin(1)$. (4) $\arcsin(-1)$.

Solution 15.1 (1) $\arcsin(0.5)$ is the angle $\theta$ in the interval $[-\pi/2, \pi/2]$ such that $\sin\theta = 0.5$. Thus, the answer is 30 degrees, or $\pi/6$. □

Now let's find the derivative of $\arcsin$. Let $\theta = \arcsin x$, so that $x = \sin\theta$. Using the derivative of the inverse function, we have
$$\frac{d}{dx}\arcsin x = \frac{1}{\sin'\theta} = \frac{1}{\cos\theta}.$$
Of course, we shouldn't stop here. Since we differentiated with respect to $x$, the result should be a function of $x$ to be useful. Because $\cos\theta \geq 0$ on $[-\pi/2, \pi/2]$,
$$\cos\theta = \sqrt{1 - \sin^2\theta} = \sqrt{1 - \sin^2(\arcsin x)} = \sqrt{1 - x^2}.$$
Consequently, the derivative of $\arcsin$ is
$$\frac{d}{dx}\arcsin x = \frac{1}{\sqrt{1 - x^2}}, \qquad -1 < x < 1.$$
This formula is essential and should be remembered.

Cosine function

Now let's consider the inverse function of the cosine function. Its domain is the entire real line $\mathbb{R}$, and its co-domain is $[-1, 1]$. As before, we choose a branch of the cosine function and consider its inverse. When creating a one-to-one function by taking some part of the domain of $\cos\theta$, it is again essential to include the most important angles, from 0 to 90 degrees. Thinking about the shape of cosine, if we want to include the cases where cosine takes negative values, it is reasonable to include the interval from 90 degrees to 180 degrees as well. Therefore, the branch we should consider is
$$\cos|_{[0,\,\pi]} : [0, \pi] \to [-1, 1].$$
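One way to see why $[0, \pi]$ is a suitable branch for cosine is to sample the function and check monotonicity. The sketch below is only a sampled check, not a proof, and the helper `is_strictly_monotonic` is a hypothetical name introduced here.

```python
import math


def is_strictly_monotonic(f, a, b, n=1000):
    """Sampled check: is f strictly increasing or strictly decreasing on [a, b]?"""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    ys = [f(x) for x in xs]
    increasing = all(y1 < y2 for y1, y2 in zip(ys, ys[1:]))
    decreasing = all(y1 > y2 for y1, y2 in zip(ys, ys[1:]))
    return increasing or decreasing


# On [0, pi] cosine decreases from 1 to -1, so it is one-to-one there.
on_zero_to_pi = is_strictly_monotonic(math.cos, 0.0, math.pi)

# On [-pi/2, pi/2] cosine rises and then falls, so it is not one-to-one.
on_symmetric = is_strictly_monotonic(math.cos, -math.pi / 2, math.pi / 2)

print(on_zero_to_pi, on_symmetric)
```

The interval that works for sine fails for cosine, which is why the two functions use different branches.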
The inverse function is denoted by $\arccos$ or $\cos^{-1}$. Thus, the inverse function is
$$\arccos : [-1, 1] \to [0, \pi].$$
If someone asks for $\arccos(-0.5)$, the correct answer is the angle between 0 and $\pi$ that satisfies $\cos\theta = -0.5$.

Problem 15.2. Find the following values:
(1) $\arccos(0.5)$. (2) $\arccos(-0.5)$. (3) $\arccos(1)$. (4) $\arccos(-1)$.

Solution 15.2 (1) □

Now, let's find the derivative of $\arccos$. Let $\arccos x = \theta$. Since $\sin\theta \geq 0$ on $[0, \pi]$,
$$\frac{d}{dx}\arccos x = \frac{1}{\cos'\theta} = \frac{-1}{\sin\theta} = \frac{-1}{\sqrt{1 - \cos^2(\arccos x)}} = \frac{-1}{\sqrt{1 - x^2}}.$$
Thus, the derivative of $\arccos$ is
$$\frac{d}{dx}\arccos x = \frac{-1}{\sqrt{1 - x^2}}, \qquad -1 < x < 1.$$
This is exactly the negative of the derivative of $\arcsin x$ (indeed, $\arcsin x + \arccos x = \pi/2$). Looking at the graph makes the reason clearer.

Tangent function

Let's consider the inverse function of $\tan\theta$. First of all, the tangent function diverges at $\theta = \pm\pi/2$. The interval on which we define the branch is $(-\pi/2, \pi/2)$:
$$\tan|_{(-\pi/2,\,\pi/2)} : (-\pi/2, \pi/2) \to \mathbb{R}.$$
Its inverse function is denoted by $\arctan$,
$$\arctan : \mathbb{R} \to (-\pi/2, \pi/2).$$
Let's find the derivative of $\arctan$. Let $\arctan x = \theta$. Then
$$\frac{d}{dx}\arctan x = \frac{1}{\tan'\theta} = \cos^2\theta.$$
To express this in terms of $x$, we use $x = \tan\theta$:
$$1 + x^2 = 1 + \frac{\sin^2\theta}{\cos^2\theta} = \frac{\cos^2\theta + \sin^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta}.$$
Therefore,
$$\frac{d}{dx}\arctan x = \frac{1}{1 + x^2}.$$
Remember this. It appears frequently.

Other functions

We can also consider the inverse functions of the remaining three of the six trigonometric functions. Of course, we need to choose branches. The remaining three trigonometric functions are cotangent, secant, and cosecant. Their definitions are as follows:
$$\cot\theta = \frac{1}{\tan\theta}, \qquad \sec\theta = \frac{1}{\cos\theta}, \qquad \csc\theta = \frac{1}{\sin\theta}.$$
The four functions other than sine and cosine are unbounded. To select the inverse function, we choose the interval of definition as $[0, \pi]$ or $[-\pi/2, \pi/2]$, excluding the points where the function is undefined. The inverse functions of all six trigonometric functions are summarized as follows.
$$\begin{aligned}
\arcsin &: [-1, 1] \to [-\pi/2, \pi/2], & \arccos &: [-1, 1] \to [0, \pi],\\
\arctan &: \mathbb{R} \to (-\pi/2, \pi/2), & \operatorname{arccot} &: \mathbb{R} \to (0, \pi),\\
\operatorname{arcsec} &: \mathbb{R} \setminus (-1, 1) \to [0, \pi] \setminus \{\pi/2\}, & \operatorname{arccsc} &: \mathbb{R} \setminus (-1, 1) \to [-\pi/2, \pi/2] \setminus \{0\}.
\end{aligned} \tag{15.1}$$

Problem 15.3. (1) Show that if, when choosing branches to create inverse functions of trigonometric functions, we require that the angles between 0 and 90 degrees be included and that the domain be a connected interval, then the branch is uniquely determined only for the sine, cosine, tangent, and cotangent functions. (2) Confirm that for the secant and cosecant functions, there is no connected interval on which the function is one-to-one and takes all of its values. (3) The formula (15.1) simply aligns the range of arcsecant with that of arccosine and the range of arccosecant with that of arcsine.

Solution 15.3 This problem can be confirmed by drawing graphs. □

15.1 Hyperbolic functions

Let's first consider the derivatives of the sine and cosine functions:
$$\begin{cases} \sin' x = \cos x, & \cos' x = -\sin x,\\ \sin'' x = -\sin x, & \cos'' x = -\cos x. \end{cases}$$
Differentiating sine gives cosine, and differentiating cosine gives sine with a minus sign. Each function returns to itself after the second derivative, but with a minus sign. Hyperbolic functions have similar properties, but without the minus sign. Because of these differentiation properties, they have names similar to the sine and cosine functions, although their structures are very different. The hyperbolic sine and hyperbolic cosine functions are defined as follows:
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.$$
When computing their derivatives, we have
$$\sinh' x = \frac{e^x + e^{-x}}{2} = \cosh x, \qquad \cosh' x = \frac{e^x - e^{-x}}{2} = \sinh x.$$
That is, they become each other, so the second derivative of each is itself:
$$\sinh'' x = \sinh x, \qquad \cosh'' x = \cosh x.$$
Remembering these differentiation properties can be helpful in solving differential equations.

Problem 15.4.
Plot the graphs of the functions $e^x$ and $e^{-x}$, and use them to draw the graphs of $\sinh x$ and $\cosh x$.

Solution 15.4 □

Problem 15.5. Find functions that become (1) themselves and (2) their negative counterparts when differentiated once. This problem is to find the solutions of the first-order linear differential equations $y' = y$ and $y' = -y$.

Solution 15.5 (1) $y = e^x$. (2) $y = e^{-x}$. (It is also acceptable to answer with $y = 3e^x$ or $y = 4e^{-x}$ by multiplying by constants, but it looks a bit strange. It would be better to answer with $y = Ce^x$ and $y = Ce^{-x}$.) □

The hyperbolic sine function is not related to the sine function. It is not a periodic function. As seen in the graph above,
$\sinh : \mathbb{R} \to \mathbb{R}$ is a one-to-one and onto function;
$\cosh : \mathbb{R} \to [1, \infty)$ is not a one-to-one function.
The other hyperbolic functions are defined as follows:
$$\tanh x = \frac{\sinh x}{\cosh x}, \qquad \coth x = \frac{\cosh x}{\sinh x}, \qquad \operatorname{sech} x = \frac{1}{\cosh x}, \qquad \operatorname{csch} x = \frac{1}{\sinh x}.$$
The formulas for hyperbolic functions corresponding to some properties of the sine and cosine functions are as follows:
$$\cosh^2 x - \sinh^2 x = 1, \qquad \sinh 2x = 2\sinh x\cosh x,$$
$$\tanh' x = \operatorname{sech}^2 x, \qquad \coth' x = -\operatorname{csch}^2 x, \quad \cdots.$$

Problem 15.6. Find functions that become (1) themselves and (2) their negative counterparts when differentiated twice. This problem is to find the solutions of the second-order linear differential equations $y'' = y$ and $y'' = -y$. Since each is a second-order equation, we need to find two functions for each case.

Solution 15.6 (1) $\sinh x$ and $\cosh x$. (Or $e^x$ and $e^{-x}$ can be used as answers.) (2) $y = \sin x$ and $y = \cos x$. (Or $e^{ix}$ and $e^{-ix}$ can be used as answers.) □

From the answers above, we can see that $\sinh x$ and $\cosh x$ are in the same family as $e^x$ and $e^{-x}$, while $\sin x$ and $\cos x$ are in the same family as $e^{ix}$ and $e^{-ix}$.

Problem 15.7. Find functions that become (1) themselves and (2) their negative counterparts when differentiated three times.
Solution 15.7 We need to find three functions for each case. We know one of them ($e^x$ for (1) and $e^{-x}$ for (2)), but the others are not among the standard functions we have met. □

Problem 15.8. Find functions that become (1) themselves and (2) their negative counterparts when differentiated four times. This problem is to find the solutions of the fourth-order linear differential equations $y'''' = y$ and $y'''' = -y$. Since each is a fourth-order equation, we need to find four functions for each case.

Solution 15.8 (1) $\sin x$, $\cos x$, $\sinh x$, $\cosh x$. There are four functions. Expressed as exponential functions, they are $e^x$, $e^{-x}$, $e^{ix}$, $e^{-ix}$. (2) The solutions are not among the standard functions. □

Exercises

1. Find the angles.
(1) $\arctan(1)$ (2) $\arcsin(-0.5)$ (3) $\arccos(\sqrt{3}/2)$ (4) $\arcsin(1/\sqrt{2})$
(5) $\arctan(-1)$ (6) $\arctan(0.5)$ (7) $\arcsin(-\sqrt{3}/2)$ (8) $\arccos(0.5)$

2. Find the derivatives.
(1) $\sqrt{\arctan(x^2)}$ (2) $\arccos(1 - x)$ (3) $\arcsin(\cos\theta)$ (4) $\arctan(\ln x)$
(5) $|\arctan x|$ (6) $\ln(\arccos x)$ (7) $\operatorname{arcsec}(\cos\theta)$ (8) $\operatorname{arccsc}(\sin\theta)$

3. Find the integrals.
(1) $\displaystyle\int \frac{1}{\sqrt{1 - x^2}}\,dx$ (2) $\displaystyle\int \frac{1}{1 + x^2}\,dx$ (3) $\displaystyle\int \frac{1}{\sqrt{9 - x^2}}\,dx$
(4) $\displaystyle\int \frac{1}{9 + x^2}\,dx$ (5) $\displaystyle\int \frac{-1}{\sqrt{1 - x^2}}\,dx$ (6) $\displaystyle\int \frac{1}{\sqrt{1 - (x + 1)^2}}\,dx$

Lecture 16
L'Hopital's rule, big-oh, and little-oh

16.1 L'Hopital's rule

If the limits $\lim_{x\to a} f(x)$ and $\lim_{x\to a} g(x)$ exist and $\lim_{x\to a} g(x) \neq 0$, then the limit $\lim_{x\to a} \frac{f(x)}{g(x)}$ exists, and
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$$
holds. If this condition is not satisfied, the right-hand side does not have any meaning. However, the limit on the left-hand side may still exist. There are also cases where we can easily determine that the left-hand side diverges. For instance, the left-hand side diverges when
$$\lim_{x\to a} g(x) = 0 \quad\text{and}\quad \lim_{x\to a} f(x) = c \neq 0 \ \text{ or } \ \lim_{x\to a} f(x) = \pm\infty.$$

Problem 16.1. Explain the limit for the cases where convergence or divergence can be easily determined.

Solution 16.1 Let's consider the case when $(\lim_{x\to a} f(x), \lim_{x\to a} g(x)) = (c, 0)$.
If the limit value $c$ is a nonzero real number or $\pm\infty$, then the quotient diverges, to $+\infty$ or $-\infty$ depending on the signs of $c$ and of $g$ near $a$. Moreover, in the case $(\lim_{x\to a} f(x), \lim_{x\to a} g(x)) = (c, \pm\infty)$ with $c$ a real number, the quotient converges to 0. □

However, if both limits are 0 or both diverge, that is,
$$\lim_{x\to a} (f(x), g(x)) = (0, 0) \ \text{ or } \ (\pm\infty, \pm\infty),$$
we cannot determine the limit by inspection. In this case, a convenient method for finding the limit is L'Hopital's rule. Here, since we are dealing with limits, we are not concerned with the function values $f(a)$ and $g(a)$, but rather with the limits. When the function is continuous, the limit and the function value are the same. The above limits may be one-sided limits, and $a$ may be $\infty$ or $-\infty$ instead of a real number. Of course, the proof should be adapted accordingly. Although proofs are not provided for all cases, L'Hopital's rule is demonstrated for two representative cases.

Theorem 16.1 (L'Hopital's rule for $\frac{0}{0}$ and $a \in \mathbb{R}$). Suppose that $\lim_{x\to a} (f(x), g(x)) = (0, 0)$. If $f$ and $g$ are differentiable in $(a - \delta, a + \delta)$ for some $\delta > 0$, and $g'(x) \neq 0$ for $x \neq a$ in the interval, then
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}. \tag{16.1}$$

Before proving this, there are a few things to confirm. Equation (16.1) does not assume that the right-hand side converges. It holds even when the limit diverges. A single application of the rule may not settle the limit, but as long as the conditions of L'Hopital's rule are satisfied, it can be applied repeatedly.

Proof. Let $x \in (a - \delta, a + \delta)$, $x \neq a$. Since $f$ and $g$ are differentiable in $(a - \delta, a + \delta)$ and $g'(t) \neq 0$ for all $t \in (a - \delta, a) \cup (a, a + \delta)$, by Cauchy's Mean Value Theorem 3.3,
$$\frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)}$$
holds for some $c$ between $x$ and $a$. Since $f$ and $g$ are differentiable functions, they are continuous at the point $a$, and therefore $f(a) = g(a) = 0$. Thus, the above equation becomes
$$\frac{f(x)}{g(x)} = \frac{f'(c)}{g'(c)}.$$
Here, $c$ can be seen as a function of $x$ and lies between $a$ and $x$. When $x$ converges to $a$, $c$ also converges to $a$ by the sandwich theorem. Therefore,
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(c(x))}{g'(c(x))} = \lim_{x\to a} \frac{f'(x)}{g'(x)}.$$
The above proof does not assume that the limit converges and includes cases of divergence. □

Problem 16.2. Find the following limits.
(1) $\displaystyle\lim_{x\to 0} \frac{3x - \cos x}{x}$ (2) $\displaystyle\lim_{x\to 0} \frac{3x - \sin x}{x}$ (3) $\displaystyle\lim_{x\to 0} \frac{\sqrt{1 + x} - 1}{x}$
(4) $\displaystyle\lim_{x\to 0} \frac{x - \sin x}{x^3}$ (5) $\displaystyle\lim_{x\to 0} \frac{\sin x}{x^2}$

Solution 16.2 For (1), since the denominator converges to 0 and the numerator converges to $-1$, the limit does not exist: the right-hand limit is $-\infty$ and the left-hand limit is $+\infty$. For (2), since both the numerator and the denominator converge to 0, we can use L'Hopital's rule:
$$\lim_{x\to 0} \frac{3x - \sin x}{x} = \lim_{x\to 0} \frac{3 - \cos x}{1} = 2.$$
For (3), since both the numerator and the denominator converge to 0, we may use L'Hopital's rule and get
$$\lim_{x\to 0} \frac{\sqrt{1 + x} - 1}{x} = \lim_{x\to 0} \frac{0.5\,(1 + x)^{-0.5}}{1} = 0.5.$$
For (4), as both the numerator and the denominator converge to 0, L'Hopital's rule can be applied:
$$\lim_{x\to 0} \frac{x - \sin x}{x^3} = \lim_{x\to 0} \frac{1 - \cos x}{3x^2}.$$
However, we cannot conclude yet, because the obtained expression is again of the form $\frac{0}{0}$. By applying L'Hopital's rule repeatedly, we get
$$\lim_{x\to 0} \frac{x - \sin x}{x^3} = \lim_{x\to 0} \frac{1 - \cos x}{3x^2} = \lim_{x\to 0} \frac{\sin x}{6x} = \lim_{x\to 0} \frac{\cos x}{6} = \frac{1}{6}.$$
(We applied L'Hopital's rule from Theorem 16.1 three times consecutively.) For (5), we have
$$\lim_{x\to 0} \frac{\sin x}{x^2} = \lim_{x\to 0} \frac{\cos x}{2x},$$
and it diverges. □

Theorem 16.2 (L'Hopital's rule for $\frac{\infty}{\infty}$ and $a \in \mathbb{R}$). Suppose that $\lim_{x\to a+} (f(x), g(x)) = (\pm\infty, \pm\infty)$. If $f$ and $g$ are differentiable in $(a, b)$ and $g'(x) \neq 0$ for $x \in (a, b)$, then
$$\lim_{x\to a+} \frac{f(x)}{g(x)} = \lim_{x\to a+} \frac{f'(x)}{g'(x)}.$$
This theorem is stated for right-hand limits. The corresponding facts hold for left-hand limits under the corresponding conditions.

Proof. Since $f(x)$ and $g(x)$ tend to $\pm\infty$ as $x \to a+$, we can assume that they do not take the value 0. Otherwise, we can redefine $b$ to be closer to $a$. Now let $F(x) = \frac{1}{f(x)}$ and $G(x) = \frac{1}{g(x)}$. Define $F(a) = G(a) = 0$, so that $F$ and $G$ are right continuous at $a$. Applying Cauchy's Mean Value Theorem 3.3 to $F$ and $G$ as in the proof above, for all $x \in (a, b)$,
$$\frac{F(x)}{G(x)} = \frac{F'(c)}{G'(c)} \quad\Rightarrow\quad \frac{g(x)}{f(x)} = \frac{f'(c)\,g^2(c)}{g'(c)\,f^2(c)}$$
holds for some $c$ between $x$ and $a$. Here, $c$ can be seen as a function of $x$ and lies between $a$ and $x$. When $x$ converges to $a$, $c$ also converges to $a$ by the sandwich theorem. Therefore,
$$\lim_{x\to a+} \frac{g(x)}{f(x)} = \lim_{x\to a+} \frac{f'(x)}{g'(x)} \cdot \lim_{x\to a+} \frac{g^2(x)}{f^2(x)}$$
is obtained. Writing $L = \lim_{x\to a+} \frac{f(x)}{g(x)}$ and assuming $L$ is finite and nonzero, this reads $\frac{1}{L} = \frac{1}{L^2}\lim_{x\to a+} \frac{f'(x)}{g'(x)}$, and rewriting gives the relationship of the theorem. □

Problem 16.3 (L'Hopital's rule for $\frac{0}{0}$ and $\frac{\infty}{\infty}$ when $a = \pm\infty$). Theorems 16.1 and 16.2 were proved for the forms $\frac{0}{0}$ and $\frac{\pm\infty}{\pm\infty}$, with the limit point $a$ being a real number. Describe and prove the theorem corresponding to the case where the limit point is $\infty$.

Solution 16.3 The basic approach to proving Theorem 16.2 was to modify the situation so that it corresponds to Theorem 16.1. Recall that this was done by handling $f$ and $g$ indirectly, considering $F = \frac{1}{f}$ and $G = \frac{1}{g}$ instead. Now the issue is not with the function values but with the variable. Can it be resolved by a change of variable? □

Problem 16.4. Compute the following limits.
(1) $\displaystyle\lim_{x\to\infty} \frac{\ln x}{\sqrt{x}}$. (2) $\displaystyle\lim_{x\to\pi/2} \frac{\sec x}{1 + \tan x}$.

Solution 16.4 □

Computing the limit $\lim_{x\to a} f(x)$ directly may be difficult, so we introduce an indirect method. Let $\varphi$ be a continuous one-to-one function whose inverse $\varphi^{-1}$ is also continuous. Although it may be hard to compute $\lim_{x\to a} f(x)$ directly, there are cases where $\lim_{x\to a} \varphi(f(x))$ is easier to compute. Denote this limit by $A$. Since $\varphi$ is continuous, we have
$$A = \lim_{x\to a} \varphi(f(x)) = \varphi\big(\lim_{x\to a} f(x)\big).$$
Thus, by applying the inverse function, we obtain
$$\lim_{x\to a} f(x) = \varphi^{-1}(A).$$
Using this logic along with L'Hopital's rule, we can compute the following.

Problem 16.5. Compute the following limits.
(1) $\displaystyle\lim_{x\to 0} (1 + x)^{1/x}$. (2) $\displaystyle\lim_{x\to\infty} x^{1/x}$.

Solution 16.5 (1) Taking the natural logarithm simplifies the calculation:
$$\lim_{x\to 0} \ln (1 + x)^{1/x} = \lim_{x\to 0} \frac{\ln(1 + x)}{x} = \lim_{x\to 0} \frac{1/(x + 1)}{1} = 1.$$
Therefore, $\lim_{x\to 0} (1 + x)^{1/x} = e^1 = e$.
(2) Taking the natural logarithm also simplifies this calculation:
$$\lim_{x\to\infty} \ln x^{1/x} = \lim_{x\to\infty} \frac{\ln x}{x} = \lim_{x\to\infty} \frac{1/x}{1} = 0.$$
Therefore, $\lim_{x\to\infty} x^{1/x} = e^0 = 1$. □

16.2 Big-oh and Little-oh

This section introduces mathematical language for comparing the sizes of functions in a limit; a clear understanding of these concepts is helpful. Since it is about comparing sizes, we compare two positive-valued functions. Consider two positive functions $f(x)$ and $g(x)$.

Definition 16.1. We say $f(x) = o(g(x))$ (little-oh) as $x \to a$, for $a \in \mathbb{R}$ or $a = \pm\infty$, if
$$\lim_{x\to a} \frac{f(x)}{g(x)} = 0.$$
We say $f(x) = O(g(x))$ (big-oh) as $x \to a$, for $a \in \mathbb{R}$ or $a = \pm\infty$, if there exists an upper bound $M > 0$ such that
$$\frac{f(x)}{g(x)} \leq M$$
for $x$ close enough to $a$.

In other words, saying that the function $f$ is little-oh $o(g)$ as $x \to a$ means that $f$ is much smaller than $g$ near $a$, where "much smaller" means that the ratio tends to 0. Saying that $f$ is big-oh $O(g)$ as $x \to a$ means that, as $x$ approaches $a$, $f$ is at most a constant multiple of $g$; the ratio is allowed to be as small as 0. Therefore, $f$ can be much smaller than $g$, or the two functions can be of similar size up to a constant multiple. Note that under this definition, the set of functions that are $o(g)$ is a subset of the functions that are $O(g)$.

Problem 16.6. (1) Compare the sizes of the functions $f(x) = e^x$ and $g(x) = x^2$ as $x \to \infty$. (2) Compare the sizes of the functions $f(x) = x^2$ and $g(x) = |x|^3$ as $x \to 0$.

Solution 16.6 (1) Using L'Hopital's rule,
$$\lim_{x\to\infty} \frac{f(x)}{g(x)} = \lim_{x\to\infty} \frac{e^x}{x^2} = \infty.$$
Therefore, we can say $g = o(f)$ (or $x^2 = o(e^x)$) as $x \to \infty$. For (2),
$$\lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{x^2}{|x|^3} = \infty,$$
so $x^3 = o(x^2)$ as $x \to 0$.
If we consider negative $x$, since $x^3$ can be negative, it is better to say $|x|^3 = o(x^2)$ as $x \to 0$. □

Problem 16.7. Verify whether the following asymptotic comparisons are correct or incorrect.
(1) $\ln x = o(x)$ as $x \to \infty$ (2) $x^2 = o(x^3 + 1)$ as $x \to \infty$
(3) $x = o(e^x)$ as $x \to \infty$ (4) $e^x = o(x)$ as $x \to \infty$
(5) $x^3 = o(x^2)$ as $x \to 0+$ (6) $x = o(x^x)$ as $x \to 0+$

Solution 16.7 □

Exercises

1. Find the limit using L'Hopital's rule.
(1) $\displaystyle\lim_{x\to 0} \frac{\sin(x)}{x}$ (2) $\displaystyle\lim_{x\to\infty} \frac{x^2 + 3x + 1}{2x^2 + 5x - 3}$ (3) $\displaystyle\lim_{x\to\frac{\pi}{2}} \frac{\cos(x)}{\sin(x)}$

Lecture 17
Integration Techniques # 1

In this and the next lecture, we will learn several integration techniques. While differentiation can often be done easily using rules like the chain rule and the product rule, integration requires specific techniques for different types of functions, which must be learned and practiced. Even with these techniques, there are still many functions that cannot be integrated by hand. In such cases, numerical integration or computer software can be employed.

Derivatives of some functions

First, it is important to remember the derivatives of several special functions. Refer to the following list:
$$\begin{aligned}
\frac{d}{dx} e^x &= e^x,\\
\frac{d}{dx} a^x &= a^x \ln a,\\
\frac{d}{dx} \ln(|x|) &= \frac{1}{x},\\
\frac{d}{dx} \arcsin x &= \frac{1}{\sqrt{1 - x^2}},\\
\frac{d}{dx} \arctan x &= \frac{1}{1 + x^2}.
\end{aligned} \tag{17.1}$$
If you want to find the antiderivative of a function on the right-hand side of equation (17.1), you simply add a general constant $C$ to the function on the left-hand side. This is not a process of deriving the left-hand function from the right-hand one; it is simply a matter of memorization. Therefore, memory is important in integration. By combining these memorized functions with some integration techniques, you can integrate quite a variety of functions.

17.1 Substitution

The substitution technique involves using the chain rule in reverse.
If we substitute a function $u(x)$ for the variable $x$ in a function $f(x)$ from the left side of equation (17.1) to create the composite function $f(u(x))$, then its derivative is, according to the chain rule,
$$\frac{d}{dx} f(u(x)) = f'(u(x))\,u'(x).$$
Therefore, the indefinite integral of $f'(u(x))\,u'(x)$ is $f(u(x)) + C$. This principle is conveniently summarized as
$$u'\,dx = du. \tag{17.2}$$
Following the reverse chain rule as explained above,
$$\int f'(u(x))\,u'(x)\,dx = \int f'(u)\,du = f(u) + C = f(u(x)) + C.$$
The second integral indicates integrating with respect to $u$ instead of $x$. Equation (17.2) is more than a mere technique for substitution; it was used as the definition of the differential in Lecture 14. All of this is made possible by the chain rule.

To actually use it, the key is to identify what becomes $u$ and what becomes $u'$ (the derivative of $u$). For example, the integral of $\frac{1}{\sqrt{1 - x^2}}$ is $\arcsin(x) + C$. If instead we integrate $\frac{u'(x)}{\sqrt{1 - u(x)^2}}$, the answer becomes $\arcsin(u(x)) + C$. Integrating $\frac{1}{\sqrt{1 - u(x)^2}}$ directly might be more challenging, but integrating its product with $u'(x)$ becomes easy thanks to the substitution technique. Now, how do we integrate $\frac{x}{\sqrt{1 - x^4}}$? It is important to recognize here that we can use $u = x^2$.

Let's look at some examples of substitution. Through this process, you'll gain a clearer understanding.

Problem 17.1. Compute the following five indefinite integrals.
(1) $\displaystyle\int \frac{1}{x^2 + A}\,dx$ (2) $\displaystyle\int \frac{1}{\sqrt{a^2 - x^2}}\,dx$ (3) $\displaystyle\int \frac{1}{8x - x^2}\,dx$ (4) $\displaystyle\int \frac{2x - 3}{x^2 - 3x + 1}\,dx$ (5) $\displaystyle\int \frac{1}{1 - \sin x}\,dx$

Solution 17.1 (1) For $A > 0$,
$$\int \frac{1}{x^2 + A}\,dx = \int \frac{1/A}{(x/\sqrt{A})^2 + 1}\,dx = \frac{1}{\sqrt{A}} \int \frac{1/\sqrt{A}}{(x/\sqrt{A})^2 + 1}\,dx = \frac{\arctan(x/\sqrt{A})}{\sqrt{A}} + C.$$
□

Problem 17.2. Compute the following four indefinite integrals.
(1) $\displaystyle\int \frac{1}{\sqrt{3x + 2}}\,dx$ (2) $\displaystyle\int \frac{3x^2 - 7x}{(1 + x)^3}\,dx$ (3) $\displaystyle\int \frac{3x + 2}{\sqrt{1 - x^2}}\,dx$ (4) $\displaystyle\int x^3 \cos x\,dx$

Solution 17.2 □

Let's consider another example of substitution.
Since the derivative of sine is cosine and the derivative of cosine is minus sine, we can use this relationship effectively to integrate products of sine and cosine. For instance, to compute $\int \cos^k x \sin x\,dx$, we substitute $u = \cos x$. Then $du = -\sin x\,dx$, so
$$\int \cos^k x \sin x\,dx = -\int u^k\,du = -\frac{1}{k + 1} u^{k+1} + C = -\frac{1}{k + 1} \cos^{k+1} x + C.$$
In this way, if there is only one $\sin x$ and the rest are all $\cos x$, or vice versa, it is convenient to make the substitution. Even if that is not the case, when $\sin x$ or $\cos x$ is raised to an odd power, integration is still straightforward. Let's practice substitution with trigonometric functions in the following problems.

Problem 17.3. Compute the following integrals.
(i) $\displaystyle\int \sin^3 x \cos x\,dx$, (ii) $\displaystyle\int \sin^3 x \cos^2 x\,dx$, (iii) $\displaystyle\int \cos^5 x\,dx$, (iv) $\displaystyle\int \sin^2 x \cos^4 x\,dx$.

Solution 17.3 The most difficult integral is (iv); the others are relatively easy. In (iv), both $\sin x$ and $\cos x$ appear to even powers, so integration seems difficult. But we can use the double angle formulas
$$\sin^2 x = \frac{1 - \cos 2x}{2}, \qquad \cos^2 x = \frac{1 + \cos 2x}{2}.$$
Rewriting the problem,
$$\int \sin^2 x \cos^4 x\,dx = \int \frac{1 - \cos 2x}{2} \times \frac{1 + 2\cos 2x + \cos^2 2x}{4}\,dx.$$
Simplify, and integrate each power of cosine separately. □

17.2 Integration by parts

Perhaps the most commonly used and important integration technique is integration by parts. This technique involves using the product rule of differentiation in reverse. Recalling the product rule of differentiation,
$$(uv)' = u'v + uv' \quad\Rightarrow\quad u'v = (uv)' - uv'.$$
Integrating both sides, we get the following indefinite integral formula:
$$\int u'v\,dx = uv - \int uv'\,dx.$$
But the integration is not complete; there is another integral on the right-hand side. Is there an improvement? It depends on the problem. The integral on the right side should be simpler than the integral on the left. How do we achieve this? First, we need to view the integral as the product of two functions.
Then, to use integration by parts to integrate $\int f(x)g(x)\,dx$, we need to decide which of $f$ and $g$ will be $u'$ and which will be $v$. We decide based on which one is easier to integrate and which becomes simpler upon differentiation. For example, if $f$ is easy to integrate and $g$ becomes simpler when differentiated, then we set $u' = f$ and $v = g$. After applying integration by parts, $uv'$ should be simpler to integrate than $u'v$. Integration by parts essentially moves the derivative from one factor to the other. Let's look at some examples of integration by parts. Through these examples, you'll get a clearer understanding.

Problem 17.4. Use integration by parts to evaluate the following integrals.
(1) $\displaystyle\int x \cos x\,dx$ (2) $\displaystyle\int \ln x\,dx$ (3) $\displaystyle\int x^2 e^x\,dx$ (4) $\displaystyle\int e^x \sin x\,dx$

Solution 17.4 (1) Both $x$ and $\cos x$ can be integrated or differentiated, but differentiating $x$ simplifies it, so let $v = x$ and integrate $u' = \cos x$. Then $u = \sin x$ and $v' = 1$, so
$$\int x \cos x\,dx = x \sin x - \int \sin x\,dx = x \sin x + \cos x + C.$$
(2) In the second problem, only $\ln x$ is present, and its integral is unknown. Differentiating it gives $\frac{1}{x}$. Since we can differentiate it, we write $\int \ln x\,dx = \int 1 \cdot \ln x\,dx$, so that $\ln x$ and $1$ form a good pair for integration by parts. Let $u' = 1$ and $v = \ln x$; then $u = x$ and $v' = \frac{1}{x}$, so
$$\int \ln x\,dx = \int 1 \cdot \ln x\,dx = x \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - x + C.$$
(3) $e^x$ can be integrated or differentiated without changing, but $x^2$ becomes simpler upon differentiation. Specifically, its second derivative is a constant. Therefore, setting $u' = e^x$ and $v = x^2$ and applying integration by parts twice, we get
$$\int x^2 e^x\,dx = x^2 e^x - \int 2x e^x\,dx = x^2 e^x - 2x e^x + \int 2e^x\,dx = x^2 e^x - 2x e^x + 2e^x + C.$$
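The antiderivatives found in parts (1) to (3) can be checked numerically by differentiating them with a central difference and comparing with the integrand. This is a rough sketch; the step size `h` and the sample points are arbitrary choices made here for illustration.

```python
import math


def derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Pairs (antiderivative, integrand) taken from Solution 17.4 (1)-(3).
cases = [
    (lambda x: x * math.sin(x) + math.cos(x), lambda x: x * math.cos(x)),
    (lambda x: x * math.log(x) - x, lambda x: math.log(x)),
    (lambda x: (x * x - 2 * x + 2) * math.exp(x), lambda x: x * x * math.exp(x)),
]

sample_points = (0.5, 1.0, 2.0)  # positive, so that ln x is defined
max_err = max(
    abs(derivative(F, x) - f(x)) for F, f in cases for x in sample_points
)
print(f"largest mismatch: {max_err:.2e}")
```

A small mismatch (well below $10^{-6}$ here) suggests that each result differentiates back to its integrand; it does not replace the derivation.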
(4) Both $e^x$ and $\sin x$ are easy to integrate or differentiate, but neither becomes simpler. However, $e^x$ remains unchanged upon differentiation and integration, while $\sin x$ becomes $-\sin x$ when differentiated twice. So, we can integrate this by parts twice. We get
$$\int e^x \sin x\,dx = e^x \sin x - \int e^x \cos x\,dx = e^x \sin x - e^x \cos x - \int e^x \sin x\,dx.$$
Solving for $\int e^x \sin x\,dx$,
$$\int e^x \sin x\,dx = \frac{1}{2}\left(e^x \sin x - e^x \cos x\right) + C.$$
□

Integration by parts is very useful. Here's another way to use it.

Problem 17.5 (Reduction). Compute the integral $\int \cos^n x\,dx$.

Solution 17.5 If $n = 0$, the indefinite integral is $x + C$, and if $n = 1$, it is $\sin x + C$. Let's consider the cases $n \geq 2$. First, we write
$$\int \cos^n x\,dx = \int \cos^{n-1} x \cos x\,dx$$
and then use integration by parts. Let $u' = \cos x$ and $v = \cos^{n-1} x$. Then $u = \sin x$ and $v' = -(n - 1)\cos^{n-2} x \sin x$. Therefore,
$$\begin{aligned}
\int \cos^n x\,dx &= \cos^{n-1} x \sin x + (n - 1) \int \cos^{n-2} x \sin^2 x\,dx\\
&= \cos^{n-1} x \sin x + (n - 1) \int \cos^{n-2} x\,(1 - \cos^2 x)\,dx\\
&= \cos^{n-1} x \sin x + (n - 1) \int \cos^{n-2} x\,dx - (n - 1) \int \cos^n x\,dx.
\end{aligned}$$
Rearranging and simplifying, we get
$$n \int \cos^n x\,dx = \cos^{n-1} x \sin x + (n - 1) \int \cos^{n-2} x\,dx.$$
So,
$$\int \cos^n x\,dx = \frac{1}{n} \cos^{n-1} x \sin x + \frac{n - 1}{n} \int \cos^{n-2} x\,dx,$$
which gives a reduction formula. Starting from the integrals for $n = 0$ and $n = 1$, subsequent integrals can be obtained from the previous ones. □

Lecture 18
Integration Techniques # 2

18.1 Trigonometric substitution

Trigonometric substitution is a method to integrate functions involving $\sqrt{a^2 + x^2}$, $\sqrt{a^2 - x^2}$, and $\sqrt{x^2 - a^2}$. Let's consider each one with reference to the figure. To handle $\sqrt{a^2 + x^2}$, we utilize the following relationships provided by the first triangle in the figure:
$$x = a \tan\theta, \qquad dx = a \sec^2\theta\,d\theta, \qquad \theta = \arctan(x/a), \quad -\tfrac{\pi}{2} < \theta < \tfrac{\pi}{2}, \qquad \sqrt{a^2 + x^2} = a \sec\theta.$$

Problem 18.1. Find the integral $\displaystyle\int \frac{1}{\sqrt{4 + x^2}}\,dx$.
Solution 18.1 Substituting $a = 2$, $x = 2\tan\theta$, $dx = 2\sec^2\theta\,d\theta$, and $\sqrt{4 + x^2} = 2\sec\theta$, we get
$$\int \frac{1}{\sqrt{4 + x^2}}\,dx = \int \frac{2\sec^2\theta}{2\sec\theta}\,d\theta = \int \sec\theta\,d\theta,$$
which is an integral with respect to $\theta$ instead of $x$. Now using the integral of the secant function, we get
$$\int \sec\theta\,d\theta = \ln|\sec\theta + \tan\theta| + C = \ln\left|\frac{\sqrt{4 + x^2}}{2} + \frac{x}{2}\right| + C.$$
□

To handle $\sqrt{a^2 - x^2}$, we use the relationships provided by the second triangle in the figure:
$$x = a \sin\theta, \qquad dx = a \cos\theta\,d\theta, \qquad \theta = \arcsin(x/a), \quad -\tfrac{\pi}{2} \leq \theta \leq \tfrac{\pi}{2}, \qquad \sqrt{a^2 - x^2} = a \cos\theta.$$

Problem 18.2. Compute the integral $\displaystyle\int \frac{1}{\sqrt{4 - x^2}}\,dx$.

Solution 18.2 Substituting $a = 2$, $x = 2\sin\theta$, $dx = 2\cos\theta\,d\theta$, and $\sqrt{4 - x^2} = 2\cos\theta$, we get
$$\int \frac{1}{\sqrt{4 - x^2}}\,dx = \int \frac{2\cos\theta}{2\cos\theta}\,d\theta = \theta + C = \arcsin(x/2) + C.$$
□

To handle $\sqrt{x^2 - a^2}$, we use the relationships provided by the third triangle in the figure:
$$x = a \sec\theta, \qquad dx = a \sec\theta \tan\theta\,d\theta, \qquad \theta = \operatorname{arcsec}(x/a), \quad 0 \leq \theta \leq \pi,\ \theta \neq \tfrac{\pi}{2}, \qquad \sqrt{x^2 - a^2} = a|\tan\theta|.$$

Problem 18.3. $\displaystyle\int \frac{1}{\sqrt{x^2 - 4}}\,dx = \,?$

Solution 18.3
$$\int \frac{1}{\sqrt{x^2 - 4}}\,dx = \int \frac{2\sec\theta \tan\theta}{2|\tan\theta|}\,d\theta$$
□

18.2 Integration of rational functions

A rational function is a function whose numerator and denominator are polynomials. For example,
$$f(x) = \frac{x^4 + 2x^3 + x^2 + x + 1}{x^3 + 1}$$
is a rational function. In this section, we find integrals of such functions. First, if the degree of the numerator is greater than or equal to the degree of the denominator, we can divide and write
$$\frac{x^4 + 2x^3 + x^2 + x + 1}{x^3 + 1} = x + 2 + \frac{x^2 - 1}{x^3 + 1}.$$
Thus, every rational function can be expressed as the sum of a polynomial and a rational function whose numerator has smaller degree than its denominator. Since we know how to integrate polynomials, we only need to find integrals of rational functions whose numerator's degree is less than the denominator's degree. That is, we want to find the integral of
$$f(x) = \frac{q(x)}{p(x)}, \qquad \deg(q) < \deg(p).$$
We need to use a very important theorem in algebra.
Theorem 18.1 (Fundamental Theorem of Algebra). A polynomial $p(x)$ with real coefficients and leading coefficient 1 can be factored as
$$p(x) = p_1(x) \cdots p_k(x).$$
Here, the $p_i(x)$ are irreducible polynomials, which are either linear or quadratic, and each has leading coefficient 1.

A polynomial that cannot be factored further is called an irreducible polynomial. According to this theorem, polynomials of degree 3 or higher can always be factored, and among quadratic polynomials, some can be factored while others cannot.

Problem 18.4. Prove that if $a^2 - 4b < 0$, then the quadratic $x^2 + ax + b$ is irreducible.

Solution 18.4 If the quadratic could be factored into linear factors as $(x - \alpha)(x - \beta)$, then the equation $x^2 + ax + b = 0$ would have real roots $\alpha, \beta$, which requires $a^2 - 4b \geq 0$. Therefore, if the discriminant is less than 0, the quadratic is irreducible. □

Rewriting a quadratic in completed-square form, we get
$$x^2 + ax + b = \left(x + \frac{a}{2}\right)^2 + A, \qquad A = -\frac{a^2 - 4b}{4}.$$
So if the quadratic is irreducible, then the value $A$ at the vertex is positive, and the graph does not intersect the $x$-axis and has no real roots. Now, let's find the integrals of rational functions with irreducible denominators.

Problem 18.5 (Irreducible denominators). Show the following integrals. In the second equation, assume that the denominator is irreducible, i.e., $a^2 < 4b$.
$$\int \frac{b}{x + a}\,dx = b \ln|x + a| + C,$$
$$\int \frac{cx + d}{x^2 + ax + b}\,dx = \frac{c}{2} \ln|x^2 + ax + b| + \left(d - \frac{ac}{2}\right) \frac{\arctan\big((x + a/2)/\sqrt{A}\big)}{\sqrt{A}} + C,$$
where $A = \frac{4b - a^2}{4} > 0$.

Solution 18.5 The first equation is obtained by using the natural logarithm:
$$\int \frac{b}{x + a}\,dx = b \int \frac{1}{x + a}\,dx = b \ln|x + a| + C.$$
Let's derive the second equation. Considering that the derivative of the denominator is $(x^2 + ax + b)' = 2x + a$, we rewrite the numerator as follows:
$$\int \frac{cx + d}{x^2 + ax + b}\,dx = \int \frac{\frac{c}{2}(2x + a) + d - \frac{ac}{2}}{x^2 + ax + b}\,dx = \frac{c}{2} \int \frac{2x + a}{x^2 + ax + b}\,dx + \left(d - \frac{ac}{2}\right) \int \frac{1}{x^2 + ax + b}\,dx.$$
The first term is obtained using substitution:
$$\frac{c}{2} \int \frac{2x + a}{x^2 + ax + b}\,dx = \frac{c}{2} \ln|x^2 + ax + b|.$$
For the second term, rewriting the denominator in completed-square form,
$$\int \frac{1}{x^2 + ax + b}\,dx = \int \frac{1}{(x + a/2)^2 + A}\,dx = \frac{\arctan\big((x + a/2)/\sqrt{A}\big)}{\sqrt{A}},$$
where $A = -\frac{a^2 - 4b}{4} > 0$. Adding these, we get the second equation in the problem. □

Partial fraction

So far, we have found integrals for the cases where the denominator is linear and where the denominator is an irreducible quadratic. Now, we express any rational function as a sum of these two cases using partial fractions. Then, we can integrate all rational functions. Partial fraction decomposition expresses a rational function as a sum of rational functions with denominators of degree 1 or 2. Theorem 18.1 guarantees that any rational function can be written as
$$\frac{q(x)}{p(x)} = \frac{q(x)}{p_1(x) \cdots p_k(x)}.$$
Using this fact, we can perform partial fraction decomposition. First, arrange the factors according to degree, separating those of degree 1 from those of degree 2:
$$\frac{q(x)}{p_1(x) \cdots p_k(x)} = \frac{a_1}{p_1(x)} + \cdots + \frac{a_\ell}{p_\ell(x)} + \frac{a_{\ell+1} x + b_{\ell+1}}{p_{\ell+1}(x)} + \cdots + \frac{a_k x + b_k}{p_k(x)}. \tag{18.1}$$
We can find $a_i$ and $b_i$ that satisfy this equation. The core of partial fractions is expressing any rational function as a sum of rational functions with denominators of degree 1 or 2. Although we already know how to integrate each of them, there are cases that need to be treated differently, especially when $p_i = p_j$ for some $i \neq j$. Below, we will discuss how to perform partial fraction decomposition through examples.

Problem 18.6. Decompose the following rational function into partial fractions:
$$\frac{5x + 3}{x^2 + 2x - 3}.$$

Solution 18.6 Although the denominator is a quadratic, it is not irreducible. We can factor it as $x^2 + 2x - 3 = (x + 3)(x - 1)$. Then,
$$\frac{5x + 3}{(x + 3)(x - 1)} = \frac{A}{x + 3} + \frac{B}{x - 1} = \frac{Ax - A + Bx + 3B}{(x + 3)(x - 1)} = \frac{(A + B)x + (3B - A)}{(x + 3)(x - 1)}.$$
Comparing coefficients gives $A + B = 5$ and $3B - A = 3$, so $A = 3$ and $B = 2$, and thus
$$\frac{5x + 3}{x^2 + 2x - 3} = \frac{3}{x + 3} + \frac{2}{x - 1}.$$
□

Problem 18.7.
Integrate the rational function $\frac{5x+3}{(x+1)^2}$.

Solution 18.7 First, let's perform partial fraction decomposition. It's easy to see that it cannot be written in the form of (18.1). If we attempt to do so:
\[ \frac{5x+3}{(x+1)^2} = \frac{A}{x+1} + \frac{B}{x+1} = \frac{A+B}{x+1}, \]
which cannot be solved for $A$ and $B$. So, we take a different approach:
\[ \frac{5x+3}{(x+1)^2} = \frac{A}{x+1} + \frac{B}{(x+1)^2} = \frac{Ax + A + B}{(x+1)^2}. \]
Comparing the numerators of the first and last expressions gives $A = 5$ and $A + B = 3$. So, $A = 5$ and $B = -2$, and thus
\[ \frac{5x+3}{(x+1)^2} = \frac{5}{x+1} - \frac{2}{(x+1)^2}. \]
Now, we integrate the two terms:
\[ \int \frac{5}{x+1}\,dx = 5\ln|x+1| + C, \qquad -\int \frac{2}{(x+1)^2}\,dx = \frac{2}{x+1} + C. \]
Adding them up gives the final result. □

Problem 18.8. Integrate the following:
\[ \frac{3x^3 + 2x^2 + 2x + 1}{(x^2+1)(x+1)^2}. \]

Solution 18.8 We need to perform partial fraction decomposition first:
\[ \frac{3x^3 + 2x^2 + 2x + 1}{(x^2+1)(x+1)^2} = \frac{Ax+B}{x^2+1} + \frac{C}{x+1} + \frac{D}{(x+1)^2}. \]
We have to find $A$, $B$, $C$, and $D$ by comparing coefficients. Comparing, we get:
\[ A = \frac12, \quad B = -\frac12, \quad C = \frac52, \quad D = -1. \]
Now, we integrate the three terms:
\[ \int \frac{C}{x+1}\,dx = \frac52 \ln|x+1|, \qquad \int \frac{D}{(x+1)^2}\,dx = \frac{1}{x+1}, \qquad \int \frac{Ax+B}{x^2+1}\,dx = \frac14 \ln|x^2+1| - \frac12 \tan^{-1}(x). \]
Adding these up gives the final result. □

Lecture 19
Integration Techniques #3

19.1 Improper integrals

An ordinary integral is taken when the function $f$ is bounded and the integration interval is a finite interval $[a,b]$. In this case, the integral is denoted as $\int_a^b f(x)\,dx$ and is called a proper integral. Improper integrals, on the other hand, occur in two cases: when the function $f$ is unbounded or when the integration interval is not a finite interval, i.e., $[a,\infty)$, $(-\infty,a]$, or $(-\infty,\infty)$.

Improper integral of type #1

Let's consider the improper integral when the size of the integration interval is infinite. If the function $f$ is a continuous function defined on $\mathbb{R}$, then $\int_a^b f(x)\,dx$ is well-defined. However, when the integration interval has infinite size, the integral is not immediately defined.
In mathematics, we do not "add infinitely" or anything like that. We only consider limits. The improper integral is given by the following limits:
\[ \int_a^\infty f(x)\,dx = \lim_{b\to\infty}\int_a^b f(x)\,dx, \]
\[ \int_{-\infty}^b f(x)\,dx = \lim_{a\to-\infty}\int_a^b f(x)\,dx, \]
\[ \int_{-\infty}^\infty f(x)\,dx = \lim_{a\to-\infty}\int_a^0 f(x)\,dx + \lim_{b\to\infty}\int_0^b f(x)\,dx. \]

Question 19.1. Is the size of the universe infinite? Some say it's finite and expanding. If we refer to everything beyond as the universe, it wouldn't be incorrect to call it infinite. But what does it mean when the integration interval is infinite? Should we be concerned about it?

Problem 19.1. Find the following improper integrals.
(1) $\int_1^\infty \frac{\ln x}{x^2}\,dx$
(2) $\int_{-\infty}^\infty \frac{dx}{x^2+1}$
(3) $\int_1^\infty x^p\,dx$

Solution 19.1 (1) One should realize that using integration by parts is appropriate for this problem. Considering the derivative of $\ln x$ as $\frac1x$ simplifies the process. Therefore, let's choose $v = \ln x$ for differentiation and $u' = x^{-2}$ for integration. Then, since $v' = x^{-1}$ and $u = -x^{-1}$, we have
\[ \int_1^b u'v\,dx = \Big[-x^{-1}\ln x\Big]_1^b + \int_1^b x^{-2}\,dx = \Big[-x^{-1}\ln x - x^{-1}\Big]_1^b = -\frac{\ln b}{b} - \frac1b + 1. \]
Taking the limit as $b \to \infty$ yields 1.

(2) This problem is straightforward if one remembers the derivative of the arctangent function. It seems obvious that the answer is $\pi$ even without computation. Let's do it anyway.
\[ \int_0^b \frac{dx}{x^2+1} = \Big[\arctan x\Big]_0^b = \arctan b - 0 \to \frac{\pi}{2} \quad \text{as } b \to \infty, \]
\[ \int_b^0 \frac{dx}{x^2+1} = \Big[\arctan x\Big]_b^0 = 0 - \arctan b \to \frac{\pi}{2} \quad \text{as } b \to -\infty. \]
Adding them together yields the answer $\pi$.

(3) This problem may seem easy, but its significance is crucial and should be remembered. Let's consider the integrability of the function $x^p$ at infinity. If $p \neq -1$, then
\[ \int_1^b x^p\,dx = \Big[\frac{1}{p+1}x^{p+1}\Big]_1^b = \frac{1}{p+1}\big(b^{p+1} - 1\big). \]
Taking the limit as $b \to \infty$, if $p > -1$, it diverges to infinity, and if $p < -1$, it converges to $-\frac{1}{p+1}$. For $p = -1$, the integral equals $\ln b$, which also diverges to infinity. □

The last case of this problem is crucial and is summarized below.
\[ \int_1^\infty x^p\,dx = \begin{cases} \infty, & p \ge -1, \\ \dfrac{-1}{p+1}, & p < -1. \end{cases} \tag{19.1} \]
As we saw when introducing the natural logarithm $\ln x$, the boundary is $p = -1$.

Problem 19.2 (Comparison). Compare the magnitudes of $\int_1^\infty \frac{1}{1+x^2}\,dx$ and $\int_1^\infty \frac{1}{x^2}\,dx$.

Solution 19.2 Consider the following comparison.
\[ 0 \le f(x) \le g(x) \text{ in } [a,b] \quad\Rightarrow\quad 0 \le \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]
Therefore, $\int_1^\infty \frac{1}{1+x^2}\,dx \le \int_1^\infty \frac{1}{x^2}\,dx$. Also, $\int_1^\infty x^{-2}\,dx = 1$ by (19.1), and
\[ \int_1^b \frac{dx}{x^2+1} = \Big[\arctan x\Big]_1^b = \arctan b - \frac{\pi}{4} \to \frac{\pi}{4} \quad \text{as } b \to \infty. \]
Thus, we obtain $\frac{\pi}{4} < 1$, implying $\pi < 4$. (Not bad.) □

Improper integral of type #2

Let's consider the improper integral when the function's magnitude is infinite. This case requires a bit more attention. Suppose the function $f$ is defined on the finite interval $[a,b]$, except possibly at a point $x_0 \in (a,b)$, and becomes unbounded as $x$ approaches $x_0$. If $f$ is bounded on $[a,c]$ for every $c < x_0$ and on $[c,b]$ for every $c > x_0$, then the improper integral of $f$ on $[a,b]$ is defined as follows:
\[ \int_a^b f(x)\,dx = \lim_{c\to x_0-}\int_a^c f(x)\,dx + \lim_{c\to x_0+}\int_c^b f(x)\,dx. \]
If both limits exist, then the improper integral exists; otherwise, it does not.

Problem 19.3. Show the following:
\[ \int_0^1 x^p\,dx = \begin{cases} \dfrac{1}{p+1}, & p > -1, \\ \infty, & p \le -1. \end{cases} \tag{19.2} \]

Solution 19.3 If $p \neq -1$, then
\[ \int_b^1 x^p\,dx = \Big[\frac{1}{p+1}x^{p+1}\Big]_b^1 = \frac{1}{p+1}\big(1 - b^{p+1}\big). \]
Taking the limit as $b \to 0^+$, if $p < -1$, it diverges to infinity, and if $p > -1$, it converges to $\frac{1}{p+1}$. For $p = -1$, the integral equals $-\ln b$, and as we have learned from the natural logarithm, $-\ln b$ tends to infinity as $b \to 0^+$. □

Like in the case of (19.1), $p = -1$ forms the boundary in (19.2). However, the cases where it diverges to infinity are reversed. Only in the case of $p = -1$ do both integrals diverge.

Problem 19.4. Compute the following.
(1) $\int_0^2 \frac{1}{|x-1|^{1/2}}\,dx$
(2) $\int_0^2 \frac{1}{x-1}\,dx$

Solution 19.4 Both cases correspond to improper integrals since the functions diverge in the vicinity of $x = 1$.
(1) Let's use a change of variables.
Let $z = x - 1$, then $dx = dz$, and
\[ \int_0^2 \frac{1}{|x-1|^{1/2}}\,dx = \int_{-1}^1 |z|^{-1/2}\,dz = 2\int_0^1 z^{-1/2}\,dz = 4. \]
(2) Similarly for this case,
\[ \int_0^2 \frac{1}{x-1}\,dx = \int_0^1 \frac1z\,dz + \int_{-1}^0 \frac1z\,dz. \]
Using the variable transformation $y = -z$ for the last integral, where $dy = -dz$,
\[ \int_{-1}^0 \frac1z\,dz = -\int_1^0 \frac{1}{-y}\,dy = -\int_0^1 \frac1y\,dy. \]
Now substituting back,
\[ \int_0^2 \frac{1}{x-1}\,dx = \int_0^1 \frac1z\,dz - \int_0^1 \frac1y\,dy = 0. \]
However, this should not be the answer. Although $\int_0^1 \frac1z\,dz$ and $\int_0^1 \frac1y\,dy$ are the same, both diverge, and subtracting one from the other means subtracting infinity from infinity, which is not a valid operation. It is more appropriate to state that both integrals diverge, so the improper integral does not exist. □

19.2 Integration with software

Besides the integration methods we have learned, various other integration methods are available in computer software. In this section, we will explore how to use them through some examples.

Problem 19.5.

Solution 19.5 □

Problem 19.6.

Solution 19.6 □

Part IV
Approximation Techniques and Series

When dealing with real problems, most of the time, it's necessary to work with approximate values because handling true values is often not possible. For instance, since $\pi$ is an irrational number, computers cannot handle its true value. However, for activities like flying planes and putting satellites into orbit, extremely precise approximate values are required.

When using approximate values, there are two important factors to consider. What are they? One is that the precision should be high for a good approximation, and the other is that you need to know the maximum error range between the approximate value and the true value. Let's say there's an approximation method with excellent convergence but the error range is unknown, and another method with poor convergence but the error range is known. Which one would most bosses choose? Most bosses would choose the method where the error range is known. That's how important knowing the error range is.
Sequences and series might seem algebraic, so why are they dealt with in calculus? It's because calculus and analysis are the mathematics of approximation. Taylor expansion, in particular, is a method of approximating functions using differentiation. Typically, when approximating a function, a specific sequence of functions $\varphi_i$ is given first, and the target function $f(x)$ to be approximated is represented as a linear combination of the functions $\varphi_i$. In other words, the approximation has the form
\[ f(x) \cong \sum_{i=0}^\infty a_i \varphi_i(x). \]
The goal is to find the sequence $a_i$ corresponding to the given function $f(x)$, to determine whether this series of functions converges and on which interval of $x$ it converges, and, if possible, to find the maximum error range. In Taylor expansion, $\varphi_i(x) = x^i$ or $\varphi_i(x) = (x - x_0)^i$ is given, and the coefficients $a_i$ are found using differentiation. Then, the error range is calculated. There are various other approximation methods, and how the functions $\varphi_i$ are constructed is crucial. If a specific number is substituted for $x$, the right-hand side of the above equation becomes a series of numbers. For this reason, we first understand the properties of fundamental sequences and series before considering series of functions.

If approximating values using data and neural networks is AI, then finding a function for approximation using differentiation is Taylor expansion. There's something missing in AI compared to Taylor expansion: error estimation. If AI could provide the maximum error range for its approximations, we could use it confidently. However, it seems impossible to theorize about the error range of AI approximations.

Lecture 20
Numerical Integration

In the previous three lectures, various techniques of integration were learned. Nevertheless, depending on the task at hand, one may encounter more cases where integration is not possible using these methods than cases where it is. However, understanding the principles and the relationship with differentiation is important.
Even if we cannot obtain exact integrals by hand in practice, we can use computers to compute the integral values. In this process, our understanding of the integrals and derivatives we have already learned will guide us.

20.1 Numerical integration and Riemann sum

Partitioned integration is a task that is too time-consuming and tedious for humans to do directly, but it is very suitable for numerical calculation on computers. In this section, we will learn about these techniques.

Let's start by taking the test function as $f(x) = \sqrt{1 - x^2}$, and the integration interval as $[a,b] = [0,1]$. Then, as shown in the figure, the region under the graph is the part in the first quadrant of a circle with the origin as the center and a radius of 1, and its integral value is $\frac14$ of the area of the circle. In other words,
\[ \int_0^1 \sqrt{1 - x^2}\,dx = \frac{\pi}{4}. \]
Let's compare how well the numerical techniques perform against the exact integral value. First, let's decide on a partition to perform the integration. Let's simply set
\[ x_0 = \frac0n, \quad x_1 = \frac1n, \quad x_2 = \frac2n, \quad \cdots, \quad x_{n-1} = \frac{n-1}{n}, \quad x_n = \frac nn. \]
We divided the interval into a total of $n$ subintervals. Next, according to the partitioned integration method, for the $i$-th subinterval $[x_{i-1}, x_i]$, we choose a point $s_i \in [x_{i-1}, x_i]$ and compute the Riemann sum as
\[ R_n = \sum_{i=1}^n f(s_i)\,\Delta x_i = \sum_{i=1}^n \frac1n \sqrt{1 - s_i^2}, \]
which becomes a numerical integration. As $n$ increases, the above value gradually approaches the integral value. Below is the MATLAB code to calculate this:

    %% parameters
    n = 10;        % number of subintervals
    L = 1;         % integration domain is [0, L]
    %%
    dx = L/n;
    x = 0:dx:L;    % partition for [0,L] with mesh size dx
                   % i-th interval is [x(i),x(i+1)].
    %%
    R = 0;         % Riemann sum
    for i = 1:n
        s = x(i);  % s is the left point
        R = R + sqrt(1-s^2)*dx;
    end
    E = R - pi/4;  % approximation error

In many numerical computation languages like MATLAB, the index starts from 1 instead of 0, so the partition created above is $x_1, \cdots, x_{n+1}$.
Here, the $i$-th subinterval is $[x_i, x_{i+1}]$ and $x_i$ is the left endpoint.

Problem 20.1. When calculating the Riemann sum numerically, how should $s_i$ be chosen among the points in the interval $[x_i, x_{i+1}]$?

Solution 20.1 Three methods can be considered: the left point $s_i = x_i$, the midpoint $s_i = \frac{x_i + x_{i+1}}{2}$, and the right point $s_i = x_{i+1}$. We computed the approximation error for various numbers of subintervals and created Table 20.1. From this table, it can be seen that the case of using the midpoint has the smallest error. Then, can we say that using the midpoint is the best method? If the function is not $f(x) = \sqrt{1 - x^2}$ but another function, will using the midpoint still minimize the error? What does it mean for a method of integration to be good? □

From Table 20.1, it seems that using the midpoint yields the best results, followed by the Trapezoid rule. Using the left point appears to give the worst results.

Number of subintervals   Left point         Midpoint                        Trapezoid rule
                         $s_i = x_i$        $s_i = \frac{x_i+x_{i+1}}{2}$
n = 5                    0.07386            0.00760                         -0.02614
n = 10                   0.04073            0.00270                         -0.00927
n = 15                   0.02828            0.00147                         -0.00505
n = 20                   0.02172            0.00096                         -0.00328
n = 25                   0.01765            0.00069                         -0.00235
n = 30                   0.01488            0.00052                         -0.00179
n = 35                   0.01286            0.00042                         -0.00142
n = 40                   0.01134            0.00034                         -0.00116

Table 20.1 Integration approximation error for $\int_0^1 \sqrt{1 - x^2}\,dx$.

20.2 Convergence order

When discussing what constitutes a good numerical method, one commonly used criterion is the convergence order. If a function is continuous, as the size of the subintervals $\Delta x$ tends to zero, the Riemann sum converges to the integral value. However, how quickly it converges depends on the method used. One of the ways to indicate the rate of convergence is the convergence order.
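Before the rate of convergence is quantified, it may help to see how errors like those in Table 20.1 can be computed. The following MATLAB sketch is an illustration under assumptions, not necessarily the code used to produce the table: the names `f`, `ns`, and `E` are hypothetical, the sums are vectorized instead of looped, and `linspace` is used instead of `0:dx:L` so that the last partition point is exactly `L`.

```matlab
% Sketch (assumed names): errors of the left-point, midpoint, and
% trapezoid sums for f(x) = sqrt(1 - x^2) on [0, 1]; exact value pi/4.
f  = @(x) sqrt(1 - x.^2);      % test function, vectorized
L  = 1;                        % integration domain is [0, L]
ns = 5:5:40;                   % numbers of subintervals to try
E  = zeros(numel(ns), 3);      % columns: left, midpoint, trapezoid
for j = 1:numel(ns)
    n  = ns(j);
    dx = L/n;                  % mesh size
    x  = linspace(0, L, n+1);  % partition points x(1), ..., x(n+1)
    xl = x(1:n);               % left endpoints of the subintervals
    xr = x(2:n+1);             % right endpoints of the subintervals
    xm = (xl + xr)/2;          % midpoints of the subintervals
    E(j,1) = sum(f(xl))*dx - pi/4;             % left-point error
    E(j,2) = sum(f(xm))*dx - pi/4;             % midpoint error
    E(j,3) = sum(f(xl) + f(xr))*dx/2 - pi/4;   % trapezoid error
end
for j = 1:numel(ns)
    fprintf('n=%2d  %9.5f  %9.5f  %9.5f\n', ns(j), E(j,1), E(j,2), E(j,3));
end
```

Each row of `E` should be comparable with the corresponding row of Table 20.1; changing `f`, `L`, and the exact value lets one repeat the experiment for another function.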
If the function is continuous, as the mesh size $\Delta x$ tends to 0, the size of the approximation error decreases at a rate determined by the convergence order. This is typically expressed using big-oh notation. A convergence order $k$ means that
\[ \text{Approximation Error} = O(\Delta x^k) \quad \text{as } \Delta x \to 0. \]
A larger $k$ indicates faster convergence. So, how can we compute the convergence order based on numerical results?

A convenient way to compute the convergence order $k$ is to take the logarithm of the approximation error. If a function is given as a power function $F = y^k$ and we want to find $k$, we take the natural logarithm of both sides. Then, if $y = y_1$ corresponds to $F = F_1$ and $y = y_2$ corresponds to $F = F_2$:
\[ \ln F_1 - \ln F_2 = k \ln y_1 - k \ln y_2 \quad\Rightarrow\quad k = \frac{\ln F_1 - \ln F_2}{\ln y_1 - \ln y_2}. \]
Therefore, if the mesh sizes are $\Delta x_1$ and $\Delta x_2$, and the corresponding errors are $E_1$ and $E_2$, the convergence order $k$ is given by
\[ k = \frac{\ln |E_1| - \ln |E_2|}{\ln \Delta x_1 - \ln \Delta x_2}, \tag{20.1} \]
where the absolute values are needed because the errors may be negative. Of course, the error is not precisely given by a power function, so we should understand this as merely showing such a convergence order as $\Delta x$ decreases.

Problem 20.2. Compute the convergence order for each method using Table 20.1.

Solution 20.2 To compute the convergence order using equation (20.1) with the given data, it is recommended to write a small code to perform the calculations rather than doing them manually for each case, which would be time-consuming. The resulting table would be similar to Table 20.2. From this calculation, it seems that when using the left point, the convergence order approaches 1, while for the midpoint and the Trapezoid rule, it approaches about 1.5. □

Number of subintervals   Left point         Midpoint                        Trapezoid rule
                         $s_i = x_i$        $s_i = \frac{x_i+x_{i+1}}{2}$
n = 5 → 10               0.8587             1.4903                          1.4956
n = 10 → 15              0.8996             1.4945                          1.4975
n = 15 → 20              0.9181             1.4961                          1.4982
n = 20 → 25              0.9293             1.4970                          1.4986
n = 25 → 30              0.9369             1.4975                          1.4989
n = 30 → 35              0.9426             1.4979                          1.4991
n = 35 → 40              0.9469             1.4982                          1.4992
n = 40 → 45              0.9505             1.4984                          1.4993

Table 20.2 Convergence order for $\int_0^1 \sqrt{1 - x^2}\,dx$.

Question 20.1. When using the left point, the convergence order approaches 1, which is the known value. However, the midpoint and the Trapezoid rule are known to have convergence order 2, so why does the order approach about 1.5 here?

The derivative of the test function we chose, $f(x) = \sqrt{1 - x^2}$, diverges at $x = 1$. Graphically, the tangent line to the graph is vertical at $x = 1$. Note that even if good integration methods are used, the convergence order is not as high as theoretically expected if the derivative of the function to be integrated is not finite.

So let's now integrate a function whose derivatives are all bounded by 1 in absolute value, $f(x) = \sin x$, on the interval $[a,b] = [0, \pi/2]$. Then, the true value is as follows:
\[ \int_0^{\pi/2} \sin x\,dx = \Big[-\cos x\Big]_0^{\pi/2} = -\cos(\pi/2) + \cos(0) = 1. \]
Table 20.3 provides the convergence order. When using the left point, it appears to be close to 1, while when using the midpoint and the Trapezoid rule, it appears to be close to 2. We obtained results close to the known convergence orders. Remember that theoretical convergence orders apply only to functions that are sufficiently differentiable.

Trapezoid Rule

Integration involves calculating the areas between the x-axis and the graph of a function over intervals. In the Riemann sum, this is approximated by dividing the interval into smaller parts and summing up the areas of the rectangles corresponding to each part. Another approximation method involves using trapezoids to approximate the area.

Number of subintervals   Left point         Midpoint                        Trapezoid rule
                         $s_i = x_i$        $s_i = \frac{x_i+x_{i+1}}{2}$
n = 5 → 10               1.0364             2.0031                          2.0018
n = 10 → 15              1.0211             2.0010                          2.0006
n = 15 → 20              1.0149             2.0005                          2.0003
n = 20 → 25              1.0116             2.0003                          2.0002
n = 25 → 30              1.0095             2.0002                          2.0001
n = 30 → 35              1.0080             2.0001                          2.0001
n = 35 → 40              1.0070             2.0001                          2.0001
n = 40 → 45              1.0061             2.0001                          2.0000

Table 20.3 Convergence order for $\int_0^{\pi/2} \sin x\,dx$.
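Convergence orders like those in Table 20.3 can be obtained by applying formula (20.1) to the errors of consecutive mesh sizes. The MATLAB sketch below shows one way to do this for $\int_0^{\pi/2}\sin x\,dx$. It is an illustration only: the names `ns`, `E`, `dxs`, and `K` are assumptions, and absolute values of the errors are taken before the logarithm because the errors may be negative.

```matlab
% Sketch (assumed names): convergence orders from formula (20.1)
% for the integral of sin(x) on [0, pi/2], whose exact value is 1.
f = @(x) sin(x);
a = 0;  b = pi/2;  exact = 1;
ns = 5:5:45;                   % numbers of subintervals
E  = zeros(numel(ns), 3);      % columns: left, midpoint, trapezoid
for j = 1:numel(ns)
    n  = ns(j);
    dx = (b - a)/n;            % mesh size
    x  = linspace(a, b, n+1);  % partition points
    xl = x(1:n);               % left endpoints
    xr = x(2:n+1);             % right endpoints
    xm = (xl + xr)/2;          % midpoints
    E(j,1) = sum(f(xl))*dx - exact;
    E(j,2) = sum(f(xm))*dx - exact;
    E(j,3) = sum(f(xl) + f(xr))*dx/2 - exact;
end
dxs = (b - a)./ns(:);          % mesh sizes as a column vector
% k = (ln|E1| - ln|E2|) / (ln dx1 - ln dx2) for consecutive rows;
% diff subtracts in the same order in numerator and denominator.
K = diff(log(abs(E))) ./ repmat(diff(log(dxs)), 1, 3);
disp(K)                        % row j compares ns(j) with ns(j+1)
```

The first column of `K` should stay near 1 and the other two near 2. Replacing `f`, `a`, `b`, and `exact` by the data of $\int_0^1\sqrt{1-x^2}\,dx$ gives an experiment of the kind summarized in Table 20.2.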
In essence, the trapezoid rule averages the areas obtained from the left and right points of each interval. Thus, it can be expressed as:
\[ \int_a^b f(x)\,dx \cong \sum_{i=1}^n \frac{f(x_{i-1}) + f(x_i)}{2}\,\Delta x = \frac12\Big(\sum_{i=1}^n f(x_i)\,\Delta x + \sum_{i=1}^n f(x_{i-1})\,\Delta x\Big). \]
So, strictly speaking, the trapezoid rule is not a type of Riemann sum. However, it is interesting to note that it is the average of two Riemann sums, one using the left point and the other using the right point. While the trapezoid rule averages two Riemann sums with a convergence order of 1, its own convergence order is 2. It shares this convergence order with the midpoint method. One of the advantages of the trapezoid rule is that it comes with an error estimate, which we present here without proof.

Theorem 20.1 (Error Estimate for Trapezoid Rule). Let $f : [a,b] \to \mathbb{R}$ be twice differentiable, and let $T(\Delta x)$ be the estimate of the integral $\int_a^b f(x)\,dx$ using the trapezoid rule with uniform mesh size $\Delta x$. Then, there exists $\xi \in [a,b]$ such that
\[ \int_a^b f(x)\,dx - T(\Delta x) = -\frac{b-a}{12} f''(\xi)\,|\Delta x|^2. \]
Therefore, if the function $f$ is twice differentiable and its second derivative is bounded, the error is $O(|\Delta x|^2)$ as $\Delta x \to 0$, i.e., the convergence order is 2.

20.3 Numerical integrals and Gaussian quadrature

Is focusing on the midpoint in the previous Riemann integration the best way to reduce numerical errors? Yes. However, in Riemann integration, only one point $s_i$ is chosen in the $i$-th subinterval, whereas in numerical calculations it is possible to compute the area corresponding to the interval $[x_i, x_{i+1}]$ using more than one point. The Trapezoid rule can be considered a case where the two endpoints are selected. If more than one point is chosen, what points should be chosen?

In the Trapezoid rule, endpoints are used, but there is another method that achieves higher convergence rates. It is the method known as Gauss-Legendre quadrature, or simply Gaussian quadrature.
When one point is chosen, this method uses the midpoint. When multiple points are chosen, the points and the coefficients, also known as weights, are traditionally given for the reference interval $[-1, 1]$. The sum of these coefficients is 2, which is the size of the interval. Table 20.4 provides these reference points and weights. You can understand how these values are obtained by studying Legendre polynomials.

# of points   Points used                                                                                                                                   Weights
1             $s_1 = 0$                                                                                                                                     $w_1 = 2$
2             $s_1 = -\frac{1}{\sqrt3}$, $s_2 = \frac{1}{\sqrt3}$                                                                                           $w_1 = 1$, $w_2 = 1$
3             $s_1 = -\sqrt{\frac35}$, $s_2 = 0$, $s_3 = \sqrt{\frac35}$                                                                                    $w_1 = \frac59$, $w_2 = \frac89$, $w_3 = \frac59$
4             $s_{2,3} = \pm\sqrt{\frac37 - \frac27\sqrt{\frac65}}$, $s_{1,4} = \pm\sqrt{\frac37 + \frac27\sqrt{\frac65}}$                                  $w_{2,3} = \frac{18+\sqrt{30}}{36}$, $w_{1,4} = \frac{18-\sqrt{30}}{36}$

Table 20.4 Gauss-Legendre quadrature in interval $[-1, 1]$.

When the integration interval changes from $[-1, 1]$ to $[x_i, x_{i+1}]$, the positions and weights above need to be rescaled. Since the interval size has changed from 2 to $\Delta x$, the weight $w_j$ simply needs to be multiplied by $\frac{\Delta x}{2}$. The position $s_j$ is multiplied by $\frac{\Delta x}{2}$ and then moved to the right by $\frac{x_i + x_{i+1}}{2}$. Then we obtain Table 20.5. The number of points used can be increased, and the higher the number of points used, the higher the convergence order.

Problem 20.3. Using the Gaussian quadrature approximation given in Table 20.5, calculate the convergence order when the number of points used is 1, 2, and 3.

Solution 20.3 The calculated convergence order is given in Table 20.6. The 1-point approximation is the case of using the midpoint, and the cases where 2 and 3 points are used are new calculations. It can be seen that the convergence orders are even numbers, 4 and 6, respectively. However, for the 3-point approximation, the computed order fluctuates near 6. Why is that? The magnitude of the approximation error is compared in Table 20.7.
It can already be seen that the error reaches the MATLAB error limit: with 3 points the errors are around $10^{-14}$ or smaller, close to the roughly 16 significant digits of double-precision arithmetic, so rounding errors disturb the computed order. Increasing the number of significant digits reduces this phenomenon. Using a higher-order approximation may be more effective than increasing the number of intervals for integration. □

Number of points   Points used                                                                                          Weights
1                  $s_1 = \frac{x_i+x_{i+1}}{2}$                                                                        $w_1 = \Delta x$
2                  $s_{1,2} = \frac{x_i+x_{i+1}}{2} \pm \frac{1}{\sqrt3}\frac{\Delta x}{2}$                             $w_{1,2} = \frac{\Delta x}{2}$
3                  $s_{1,3} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac35}\,\frac{\Delta x}{2}$, $s_2 = \frac{x_i+x_{i+1}}{2}$   $w_{1,3} = \frac59\frac{\Delta x}{2}$, $w_2 = \frac89\frac{\Delta x}{2}$
4                  $s_{2,3} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac37 - \frac27\sqrt{\frac65}}\,\frac{\Delta x}{2}$     $w_{2,3} = \frac{18+\sqrt{30}}{36}\frac{\Delta x}{2}$
                   $s_{1,4} = \frac{x_i+x_{i+1}}{2} \pm \sqrt{\frac37 + \frac27\sqrt{\frac65}}\,\frac{\Delta x}{2}$     $w_{1,4} = \frac{18-\sqrt{30}}{36}\frac{\Delta x}{2}$

Table 20.5 Gauss-Legendre quadrature in interval $[x_i, x_{i+1}]$.

Number of subintervals   1 point approx.   2 points approx.   3 points approx.
n = 5 → 10               2.0031            4.0034             6.0036
n = 10 → 15              2.0010            4.0011             6.0013
n = 15 → 20              2.0005            4.0005             6.0033
n = 20 → 25              2.0003            4.0003             5.9793
n = 25 → 30              2.0002            4.0002             6.0257
n = 30 → 35              2.0001            4.0001             6.4575
n = 35 → 40              2.0001            4.0001             5.6449

Table 20.6 Convergence order for $\int_0^{\pi/2} \sin x\,dx$.

Number of subintervals   1 point approx.   2 points approx.   3 points approx.
n = 5                    4.1242e-03        -2.2619e-06        4.7849e-10
n = 10                   1.0288e-03        -1.4104e-07        7.4576e-12
n = 15                   4.5705e-04        -2.7847e-08        6.5437e-13
n = 20                   2.5707e-04        -8.8097e-09        1.1635e-13
n = 25                   1.6451e-04        -3.6082e-09        3.0642e-14
n = 30                   1.1424e-04        -9.3919e-10        1.0214e-14
n = 35                   8.3930e-05        -5.5053e-10        3.7748e-15

Table 20.7 Integration approximation error for $\int_0^{\pi/2} \sin x\,dx$.

Problem 20.4. When increasing the number of points used, the convergence order changes, but the computational complexity also increases. Instead of increasing the number of points, would increasing the number of subintervals be more effective? How can we compare what is more effective? Interested students may find it beneficial to practice conducting convergence tests in cases other than those mentioned in this lecture.

Problem 20.5. Is there a reason why the convergence order for $\int_0^1 \sqrt{1 - x^2}\,dx$ is 1.5?
Will the Gaussian quadrature technique with 2 or 3 points also yield approximately 1.5? Why is it 1.5?

Lecture 21
Sequences and series

Sequences and series are fundamentally the same. Given a sequence $a_n$, we can create its partial sums $s_n = \sum_{i=1}^n a_i$, which form another sequence. We call $s_n$ the sequence of partial sums of the sequence $a_n$, or simply the series of $a_n$. Likewise, the sequence $a_n$ can also be a series of some other sequence. If we define
\[ b_1 = a_1, \quad b_2 = a_2 - a_1, \quad b_3 = a_3 - a_2, \quad \cdots, \quad b_n = a_n - a_{n-1}, \quad \cdots, \]
then $a_n$ becomes the series of the sequence $b_n$. Given a series, the purpose of this lecture is to find the sequence that generates the series and use this information to find limits of the series. (Sometimes partial sums $s_n$ are also referred to as series.)

21.1 Sequence of real numbers

A collection of ordered numbers is called a sequence. We consider sequences composed of real numbers. The order is indicated by attaching an index. For example,
\[ a_1, a_2, a_3, a_4, \cdots. \]
Usually, we start indexing from 0 or 1, but it is not necessary. Depending on the situation, we can choose whichever is convenient. Unless stated otherwise, indices are considered as natural numbers $i \in \mathbb{N} = \{0, 1, 2, \ldots\}$. (Some include 0 in the natural numbers, and some don't. It's a matter of convenience; we include it here.) If a sequence $a_n$ is given by a general formula like $a_n = (-1)^{n-1}$ for $n = 1, 2, \cdots$, we can list the sequence in order as $\{1, -1, 1, -1, \cdots\}$.

Problem 21.1. Given the sequence $\{a_n\} = \{1, -\frac12, \frac13, -\frac14, \cdots\}$, find the general formula for $a_n$.

Solution 21.1 The general formula for this sequence is $a_n = \frac{(-1)^{n+1}}{n}$. □

Convergence and limits of sequences are handled more simply and easily than those of functions. When discussing limits, it's important to remember that the values of the initial terms of a sequence are essentially irrelevant; it's the terms that come later that matter.
The limit of a sequence is defined as follows.

Definition 21.1.
(1) We say a sequence $a_n$ converges to a number $L \in \mathbb{R}$ as $n \to \infty$ if for any given $\varepsilon > 0$, there exists $N$, which may depend on $\varepsilon$, such that $|a_n - L| < \varepsilon$ whenever $n \ge N$. We call $L$ the limit of $a_n$ as $n \to \infty$ and denote $\lim_{n\to\infty} a_n = L$.
(2) If there is no such number $L$, we say $a_n$ diverges as $n \to \infty$.
(3) If for any $M \in \mathbb{R}$, there exists $N > 0$ such that $a_n > M$ whenever $n \ge N$, we say $a_n$ diverges to infinity.
(4) If for any $M \in \mathbb{R}$, there exists $N > 0$ such that $a_n < M$ whenever $n \ge N$, we say $a_n$ diverges to negative infinity.

Saying that a sequence diverges covers two situations: it may grow to (positive or negative) infinity, or it may simply fail to approach any single value. When defining the continuity of a function, we used the $\varepsilon$-$\delta$ method. The definition here is essentially the same. We've just replaced $\delta$ with $N$. Therefore, while it may seem somewhat familiar, it's worth reconsidering its meaning.

Problem 21.2. Prove the following.
(1) $\lim_{n\to\infty} \frac1n = 0$.
(2) $\lim_{n\to\infty} k = k$.
(3) $a_n = (-1)^n$ diverges.

Solution 21.2 To prove this, it is necessary to have an understanding of the situation. Then, practice in writing clearly is needed. Through the process of writing, thoughts become clearer and more organized. Also, through efforts to write clearly, new ways of expression can be discovered. Thus, mathematics has been used as a means to learn logical expression.

(1) Let $\varepsilon > 0$ be given (or assume it is given). Let $N$ be an integer greater than $\frac1\varepsilon$. Then, for all $n > N$, the following holds:
\[ \Big|\frac1n - 0\Big| = \frac1n < \varepsilon. \]
We have found $N$ satisfying the properties required by the definition, so the proof is complete.

(2) This problem refers to the case where the sequence is given as $a_n = k$. Since the value does not depend on the index $n$, $k$ simply represents a constant. Therefore, for any given $\varepsilon > 0$, we can choose $N$ to be any integer greater than or equal to 1. Then, $|a_n - k| = |k - k| = 0 < \varepsilon$ for any $n > N$. We have found $N$ satisfying the properties required by the definition, so the proof is complete.
(3) Saying that it diverges means there is no converging value $L$. Generally, it is more difficult to show that something does not exist than to show that it does. Let's assume there is some $L$ it converges to. Then, for $\varepsilon = 0.5$, there exists $N$ such that for all $n > N$, $|a_n - L| < 0.5$ must hold. However, no matter how large $N$ is chosen, there are $n > N$ such that $a_n = 1$ and $m > N$ such that $a_m = -1$. Thus, we can create the following contradiction:
\[ 2 = |1 - (-1)| = |1 - L + L - (-1)| \le |a_n - L| + |L - a_m| < 0.5 + 0.5 = 1. \]
A contradiction arises with $2 < 1$, which was derived from the assumption that there exists a converging $L$. Therefore, the given sequence does not converge, i.e., it diverges. □

Problem 21.3. Let $\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$. Prove the following.
(1) $\lim_{n\to\infty} (a_n + b_n) = A + B$.
(2) $\lim_{n\to\infty} (a_n - b_n) = A - B$.
(3) $\lim_{n\to\infty} (k a_n) = kA$.
(4) $\lim_{n\to\infty} (a_n b_n) = AB$.
(5) $\lim_{n\to\infty} (a_n / b_n) = A/B$ if $B \neq 0$.

Solution 21.3 The proofs for the limits of sequences and functions are essentially the same.

(1) Let $\varepsilon > 0$ be given. Then, since $\lim_{n\to\infty} a_n = A$, there exists $N_1 > 0$ such that $|a_n - A| < \varepsilon/2$ whenever $n > N_1$. Similarly, since $\lim_{n\to\infty} b_n = B$, there exists $N_2 > 0$ such that $|b_n - B| < \varepsilon/2$ whenever $n > N_2$. Let $N = \max(N_1, N_2)$. Then,
\[ |(a_n + b_n) - (A + B)| = |a_n - A + b_n - B| \le |a_n - A| + |b_n - B| < 0.5\varepsilon + 0.5\varepsilon = \varepsilon \]
whenever $n > N$. Therefore, $\lim_{n\to\infty} (a_n + b_n) = A + B$.

(5) In this case, the condition $B \neq 0$ is additionally required. However, if one tries too hard to prove this, it may become analysis rather than calculus. Nevertheless, students interested in mathematics are encouraged to try. □

Problem 21.4 (Sandwich Theorem). Let $a_n \le b_n \le c_n$ and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = L$. Show that $\lim_{n\to\infty} b_n = L$.

If $a_n$ and $c_n$ tend to the same limit $L$, then any term $b_n$ squeezed between them also tends to $L$.

Problem 21.5 (Continuity). Let $\lim_{n\to\infty} a_n = L$ and $f(x)$ be continuous at $L$. Show that
\[ \lim_{n\to\infty} f(a_n) = f(L). \]
It's good to remember that if a function is continuous at a limit point, then the limit can be moved inside the function.

Problem 21.6 (Useful limits to remember).
(1) $\lim_{n\to\infty} x^{1/n} = 1$ if $x > 0$.
(2) $\lim_{n\to\infty} n^{1/n} = 1$.
(3) $\lim_{n\to\infty} \big(1 + \frac xn\big)^n = e^x$.
(4) $\lim_{n\to\infty} \frac{x^n}{n!} = 0$.
(5) $\lim_{n\to\infty} \big(\frac{n+1}{n-1}\big)^n = e^2$.

Solution 21.6 Let's compute these limits. We skip (1) since it can be done similarly to (2).

(2) Take the natural logarithm and then exponentiate (using $e^{\ln(n^{1/n})} = n^{1/n}$). We have
\[ \ln n^{1/n} = \frac{\ln n}{n} \to 0 \quad \text{as } n \to \infty. \]
Thus, using the continuity of $e^x$, we get
\[ \lim_{n\to\infty} n^{1/n} = \lim_{n\to\infty} e^{\ln n^{1/n}} = e^{\lim_{n\to\infty} \ln n^{1/n}} = e^0 = 1. \]
(Something invisible becomes visible when you take the logarithm and then exponentiate. Why does this happen? What is going on? Have we gained something profound?)

(3) Similarly, take the logarithm, compute the limit, and then take the exponential. Let $h = 1/n$, then
\[ \lim_{h\to0} \ln (1 + hx)^{1/h} = \lim_{h\to0} \frac{\ln(1 + hx)}{h} = \lim_{h\to0} \frac{\frac{x}{1+hx}}{1} = x. \]
We used L'Hopital's rule in the last step. Then, exponentiate to obtain $e^x$.

(4) Intuitively clear. The denominator grows much faster. For a proof, let's take the natural logarithm. Then,
\[ \ln \frac{x^n}{n!} = n \ln x - \ln(n!). \]
This approach doesn't work well. Instead, divide the numerator and denominator by $x^n$ (for $x \neq 0$; the case $x = 0$ is trivial) and take the limit. Then,
\[ \lim_{n\to\infty} \frac{x^n}{n!} = \lim_{n\to\infty} \frac{1}{\frac1x \cdot \frac2x \cdots \frac{n-1}{x} \cdot \frac nx} = 0, \]
because the absolute value of the denominator tends to infinity.

(5) Rewrite the expression so that (3) applies with $x = 2$:
\[ \lim_{n\to\infty} \Big(\frac{n+1}{n-1}\Big)^n = \lim_{n\to\infty} \Big(1 + \frac{2}{n-1}\Big)^{n-1}\Big(1 + \frac{2}{n-1}\Big) = e^2 \cdot 1. \]
□

Now, let's familiarize ourselves with the concepts of upper bound, supremum, and limit supremum.

Definition 21.2.
(1) A sequence $a_n$ is called bounded above if there exists $M$ such that $a_n \le M$ for all $n$. Such $M$ is called an upper bound of $a_n$.
(2) The smallest upper bound of $a_n$ is called the supremum of $a_n$ and denoted by $M = \sup a_n$.
(3) Let $M_k$ be the supremum of $\{a_n : n \ge k\}$.
Then, the limit $\lim_{k\to\infty} M_k$ is called the limit supremum of $a_n$ and is denoted by $M = \limsup a_n$.

We often focus on limits, and when examining the limit of a sequence, a finite number of initial values does not matter. Similarly, when considering the supremum, we might want to discard a finite number of initial values and consider the limit supremum. The following problem illustrates the difference between a maximum and a supremum.

Problem 21.7. The largest value among the values of a sequence $a_n$ is called the maximum, denoted by $\max a_n$. The smallest value is called the minimum, denoted by $\min a_n$.
(1) Find the maximum of the sequence $a_n = 9.8n - n^2$.
(2) Find the maximum of the sequence $a_n = -\frac1n$.

Solution 21.7 (1) Consider the sequence $a_n$ as a function, let's say $f(x) = 9.8x - x^2$. Taking the derivative, we get $f'(x) = 9.8 - 2x$. Thus, the maximum of $f$ occurs at $x = 4.9$. But $n$ needs to be an integer, and since the graph of $f$ is a parabola symmetric about $x = 4.9$, the sequence $a_n$ achieves its maximum value $a_5 = 49 - 25 = 24$ at the nearest integer $n = 5$.

(2) As $n$ increases, $a_n$ increases. It approaches zero but never attains zero. Therefore, there's no maximum value for $a_n$. While there's no maximum value, 0 acts as an upper bound, and no smaller number is an upper bound. Hence, the supremum is 0. A maximum may or may not exist, but the supremum of a sequence that is bounded above always exists. This is why we define the supremum instead of the maximum. □

Now, let's understand the concepts of lower bound, infimum, and limit infimum.

Definition 21.3.
(1) A sequence $a_n$ is called bounded below if there exists $M$ such that $a_n \ge M$ for all $n$. Such $M$ is called a lower bound of $a_n$.
(2) The largest lower bound of $a_n$ is called the infimum of $a_n$ and denoted by $M = \inf a_n$.
(3) Let $M_k$ be the infimum of $\{a_n : n \ge k\}$. Then, the limit $\lim_{k\to\infty} M_k$ is called the limit infimum of $a_n$ and is denoted by $M = \liminf a_n$.

Among sequences, monotonic sequences are the most manageable. They either continuously increase or decrease.

Definition 21.4. Let $a_n$ be a sequence of real numbers.
(1) It is called an increasing sequence if $a_n \le a_{n+1}$ for all $n$.
(2) It is called a decreasing sequence if $a_n \ge a_{n+1}$ for all $n$.
(3) It is called monotone if it is one of the two cases.

If a sequence is bounded above and increasing, it always converges.

Problem 21.8 (Bounded monotone sequence). If $a_n$ is bounded above and increasing, it converges.

Solution 21.8 The first step is to guess the limit. Let $L = \sup a_n$ and show that $a_n$ converges to $L$. Given $\varepsilon > 0$, we need to find $N$. First, note that $a_n \le L$ for all indices $n$; otherwise $L$ would not be an upper bound. Next, $L - \varepsilon$ is not an upper bound, since $L$ is the smallest one, so there is a term of the sequence between $L - \varepsilon$ and $L$. Denote the index of one such term by $N$. If $n > N$, then, since $a_n$ is increasing,
$$L - \varepsilon < a_N \le a_n \le L.$$
Therefore, $|a_n - L| < \varepsilon$ for all $n > N$. □

The proof above is very basic and simple, but it might seem difficult if you're not accustomed to the logical progression of such statements. Once you get used to expressing yourself mathematically and logically, it becomes easy.

21.2 Series of real numbers

Given a sequence $a_n$, we can create a new sequence using the partial sums of this sequence:
$$s_n = \sum_{i=1}^{n} a_i.$$
Such a sequence of partial sums is called a series, and its limit is written as
$$\lim_{n\to\infty} s_n = \sum_{n=1}^{\infty} a_n.$$
It is important to note that $\sum_{n=1}^{\infty} a_n$ does not mean adding infinitely many terms $a_n$. We never add an infinite number of terms, nor can we. We only create the sequence of partial sums $s_n$, find the limit of this sequence, and denote it by $\sum_{n=1}^{\infty} a_n$. We refer to the sequence thus obtained, or to its limit, as a series.

Problem 21.9. Show that if the series $\sum_{k=1}^{\infty} a_k$ converges, then $\lim_{k\to\infty} a_k = 0$.

Solution 21.9 Denote the limit by $\sum_{k=1}^{\infty} a_k = L$. Since $s_k - s_{k-1} = a_k$,
$$\lim_{k\to\infty} a_k = \lim_{k\to\infty} (s_k - s_{k-1}) = \lim_{k\to\infty} s_k - \lim_{k\to\infty} s_{k-1} = L - L = 0.$$
Hence, the sequence $a_k$ converges to 0.
□

The contrapositive of the above result states that if the sequence $a_n$ does not converge to 0, then the series $\sum_{k=1}^{\infty} a_k$ diverges. There are two possible cases to consider: either $a_n$ converges to a non-zero value, or it diverges.

Problem 21.10. Show that the series $\sum_{n=1}^{\infty} \frac{n+1}{n}$ diverges.

Solution 21.10 Since $\lim_{n\to\infty} \frac{n+1}{n} = 1 \neq 0$, the series diverges. □

However, $\lim_{k\to\infty} a_k = 0$ alone does not imply that the series $\sum_{n=1}^{\infty} a_n$ converges.

Problem 21.11. Show that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.

Solution 21.11 In this case, $\lim_{n\to\infty} a_n = \lim_{n\to\infty} \frac{1}{n} = 0$, but the series diverges. Specifically,
$$\begin{aligned}
&1 + \tfrac12 + \tfrac13 + \tfrac14 + \tfrac15 + \tfrac16 + \tfrac17 + \tfrac18 + \tfrac19 + \cdots + \tfrac1{16} + \cdots \\
&\quad\ge 1 + \tfrac12 + \left(\tfrac14 + \tfrac14\right) + \left(\tfrac18 + \tfrac18 + \tfrac18 + \tfrac18\right) + \left(\tfrac1{16} + \cdots + \tfrac1{16}\right) + \cdots \\
&\quad= 1 + \tfrac12 + \tfrac12 + \tfrac12 + \tfrac12 + \cdots
\end{aligned}$$
What this calculation shows is that we can insert $\frac12$ as many times as we want: there are four copies of $\frac18$, eight copies of $\frac1{16}$, sixteen copies of $\frac1{32}$, and so on, indefinitely. Therefore, the series diverges to infinity. □

The fact that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges is quite important and will be used frequently. It is a very important case at the boundary between convergence and divergence.

Question 21.1. If $\alpha < 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$ diverges. Why is that? On the other hand, if $\alpha > 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$ converges. Why is that? The former is obvious, but the latter is not.

21.3 Power series

From now on, we will mainly deal with series called power series, which have the following form:
$$\sum_{n=0}^{\infty} c_n (x - x_0)^{n}.$$
Here, $x_0$ is called the center, and $c_n$ is the coefficient of the $n$-th term. If we consider $x$ as a variable, then a power series becomes a function that resembles an infinite-degree polynomial in $x$. When calculating partial sums of power series, we start indexing from 0 instead of 1. This is because it is convenient to let $n = 0$ denote the constant term.
By translating the $x$-axis, it suffices to consider the case $x_0 = 0$. The convergence of the above power series depends on the coefficients $c_n$ and the magnitude $|x - x_0|$ of the variable. It is of primary interest to find the limit of the series and the region where it converges. In particular, if all coefficients are the same, $c_n = c_0$, we call it a geometric series.

Problem 21.12 (Geometric series). A sequence given by $a_n = c_0 x^{n}$, $n = 0, 1, \cdots$, is called a geometric sequence. Prove the following:
$$\sum_{n=0}^{\infty} c_0 x^{n} = \frac{c_0}{1 - x} \quad \text{if } |x| < 1,$$
and if $|x| \ge 1$, then the geometric series $\sum_{n=0}^{\infty} c_0 x^{n}$ diverges.

Solution 21.12 Denote the partial sum by $s_n = \sum_{i=0}^{n} c_0 x^{i}$. Then,
$$s_n - x s_n = \sum_{i=0}^{n} c_0 x^{i} - \sum_{i=0}^{n} c_0 x^{i+1} = \sum_{i=0}^{n} c_0 x^{i} - \sum_{i=1}^{n+1} c_0 x^{i} = c_0 - c_0 x^{n+1}.$$
The last equality holds because all intermediate terms cancel out, leaving only the first and the last term. Therefore, for $|x| < 1$,
$$\lim_{n\to\infty} s_n = \lim_{n\to\infty} \frac{c_0 (1 - x^{n+1})}{1 - x} = \frac{c_0}{1 - x}.$$
For $|x| \ge 1$ (and $c_0 \neq 0$), $a_n = c_0 x^{n}$ does not converge to 0. Therefore, the series $\sum_{n=0}^{\infty} c_0 x^{n}$ diverges. □

Problem 21.13. Find the limit of the following series.
(1) $\frac19 + \frac1{27} + \frac1{81} + \cdots$. (2) $\sum_{n=0}^{\infty} \frac{(-1)^{n} 4}{4^{n}}$. (3) $5.232323232323\cdots$. (4) $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$.

Solution 21.13 (1) In this case, the initial value is $a_0 = \frac19$ and the common ratio is $r = \frac13$. Therefore, the limit of the series is
$$\sum_{k=0}^{\infty} a_k = \frac{a_0}{1 - r} = \frac{1}{9\left(1 - \frac13\right)} = \frac19 \cdot \frac32 = \frac16.$$
(2) The initial value is $a_0 = 4$ and the common ratio is $r = -\frac14$. Therefore, the limit of the series is
$$\sum_{n=0}^{\infty} \frac{(-1)^{n} 4}{4^{n}} = \frac{4}{1 - \left(-\frac14\right)} = \frac{16}{5}.$$
(3) The idea is to interpret the repeating decimal as a series. Rewriting this repeating decimal as a series:
$$5 + 0.23 + 0.0023 + 0.000023 + \cdots = 5 + 0.23 + 0.23 \times 0.01 + 0.23 \times (0.01)^{2} + \cdots.$$
The 5 at the beginning is treated separately, so the rest is a geometric series with initial term $a_0 = 0.23$ and common ratio $r = 0.01$. Therefore, its limit is
$$5 + \frac{0.23}{1 - 0.01} = 5 + \frac{0.23}{0.99} = 5 + \frac{23}{99}.$$
(4) This is not a geometric series. We could think of it as a power series with $x = 1$ and $c_n = \frac{1}{n(n+1)}$, but then every series could be regarded as a power series. Let's just consider it as a general series starting from $n = 1$. The partial sum is
$$s_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left(\frac1k - \frac1{k+1}\right) = 1 - \frac{1}{n+1}.$$
(Everything in the middle cancelled out.) Therefore,
$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \lim_{n\to\infty} s_n = \lim_{n\to\infty} \left(1 - \frac{1}{n+1}\right) = 1. \qquad □$$

We have determined the convergence of geometric series, but we have not yet determined the convergence of general power series. For that we need several convergence tests, which will be learned in the next lecture.

Exercises
1.

Lecture 22 Tests for absolute convergence

We learn techniques to determine the convergence of series. An absolute convergence test checks whether the series $\sum_{n=1}^{\infty} |a_n|$ converges. If it does, then the series $\sum_{n=1}^{\infty} a_n$, without absolute values, automatically converges.

22.1 Integral test

The partial sums $s_n$ of a sequence with non-negative terms $a_n \ge 0$ form an increasing sequence.

Problem 22.1. If $a_n \ge 0$, then the convergence of the series $\sum_{n=1}^{\infty} a_n$ is equivalent to the existence of the supremum of $s_n$.

Solution 22.1 First, since $s_{n+1} - s_n = a_{n+1} \ge 0$, we have $s_{n+1} \ge s_n$. Thus, $s_n$ is an increasing sequence. We already know that the convergence of an increasing sequence is equivalent to the existence of its supremum (that is, to being bounded above). □

Using integrals to determine the convergence of series is called the integral test.

Problem 22.2 (Integral test). Three conditions are required: (1) $a_n \ge 0$, (2) $a_n = f(n) \ge 0$ for $n = 1, 2, \cdots$, (3) $f(x)$ is a monotonically decreasing function. Then, the convergence of the series $\sum_{n=1}^{\infty} a_n$ is equivalent to the convergence of the integral $\int_{1}^{\infty} f(x)\,dx$.
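As a numerical illustration of the statement above, the following sketch checks the sandwich $s_{n+1} - a_1 \le \int_1^{n+1} f(x)\,dx \le s_n$ on which the integral test rests. The choice $f(x) = 1/x^{2}$ and the helper names `partial_sum`, `integral_1_to`, and `sandwich` are assumptions made only for this example; they do not come from the lecture notes.

```python
def f(x):
    # Assumed test function f(x) = 1/x^2: positive and decreasing for x >= 1.
    return 1.0 / (x * x)


def partial_sum(n):
    # s_n = a_1 + ... + a_n with a_k = f(k).
    return sum(f(k) for k in range(1, n + 1))


def integral_1_to(b):
    # Exact value of the integral of 1/x^2 from 1 to b, namely 1 - 1/b.
    return 1.0 - 1.0 / b


def sandwich(n):
    # Returns (s_{n+1} - a_1, integral from 1 to n+1, s_n).
    return partial_sum(n + 1) - f(1), integral_1_to(n + 1), partial_sum(n)


for n in (1, 10, 100, 1000):
    lower, middle, upper = sandwich(n)
    print(n, lower, middle, upper)
```

In each printed row the middle value should lie between the other two. Because the integral stays below 1, the partial sums stay below $1 + a_1 = 2$, which is the boundedness that the test converts into convergence.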
Let's consider the meaning of this integral test before proving it. There are three main conditions. All three conditions are needed, and we should observe how they are used in the proof below. Moreover, we can construct counterexamples if any of these conditions is not satisfied.

Solution 22.2 Let's mark where the three conditions are used in the following proof. First,
$$\int_{1}^{n+1} f(x)\,dx = \sum_{k=1}^{n} \int_{k}^{k+1} f(x)\,dx \le \sum_{k=1}^{n} f(k) = \sum_{k=1}^{n} a_k = s_n,$$
$$\int_{1}^{n+1} f(x)\,dx = \sum_{k=1}^{n} \int_{k}^{k+1} f(x)\,dx \ge \sum_{k=1}^{n} f(k+1) = \sum_{k=1}^{n} a_{k+1} = s_{n+1} - a_1.$$
Therefore,
$$s_{n+1} - a_1 \le \int_{1}^{n+1} f(x)\,dx \le s_n$$
holds. At this step, we use that $f$ is a decreasing function. If $s_n$ converges, then $\int_{1}^{n+1} f(x)\,dx$ is bounded above. Since $f$ is monotonically decreasing and $f(n) \ge 0$ for all $n$, we have $f(x) \ge 0$. Therefore, $\int_{1}^{n+1} f(x)\,dx$ increases as $n$ increases. Thus, $\int_{1}^{n+1} f(x)\,dx$ converges as $n \to \infty$. Conversely, if $\int_{1}^{\infty} f(x)\,dx$ converges, then $s_{n+1}$ is at most $\int_{1}^{\infty} f(x)\,dx + a_1$. Therefore, $s_n$ is a bounded increasing sequence, and thus it converges. □

The series $\sum_{n=1}^{\infty} \frac1n$ is known to diverge. So, what about the series $\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$? If $\alpha < 1$, then each term is at least as large as when $\alpha = 1$, so the series certainly diverges. If $\alpha > 1$, then each term is smaller, so there is a possibility that the series converges. In the following problem, we show that the series converges for $\alpha > 1$. This means that $\alpha = 1$ serves as the boundary for convergence. We saw a similar phenomenon with integrals, and indeed, the reason behind it is the same. The proof also uses this fact.

Problem 22.3. (1) Show that for all $\alpha > 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$ converges. (2) Show that for all $\alpha \le 1$, the series $\sum_{n=1}^{\infty} \frac{1}{n^{\alpha}}$ diverges.

Solution 22.3 (1) Let $f(x) = \frac{1}{x^{\alpha}} = x^{-\alpha}$. Then $f$ is a positive, decreasing function for $x \ge 1$, and $f(n) = \frac{1}{n^{\alpha}}$. Therefore, we can apply the integral test. As $n \to \infty$, we have
$$\int_{1}^{n} f(x)\,dx = \int_{1}^{n} x^{-\alpha}\,dx = \left[\frac{x^{1-\alpha}}{1-\alpha}\right]_{1}^{n} = \frac{n^{1-\alpha}}{1-\alpha} + \frac{1}{\alpha - 1} \to \frac{1}{\alpha - 1}.$$
From this, we can conclude that the series converges for all $\alpha > 1$.
(2) We have already shown that the series diverges for $\alpha = 1$. For $\alpha < 1$, since each term is at least as large as when $\alpha = 1$, the series diverges even faster (this is known as the comparison test). □

Problem 22.4. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{1}{n^{2} + 1}$. (2) $\sum_{n=1}^{\infty} n e^{-n^{2}}$. (3) $\sum_{n=1}^{\infty} \frac{1}{2^{\ln n}}$.

Solution 22.4 We use the integral test. First, we check if the three conditions are satisfied. □

22.2 Comparison test

If we know the convergence of one series, we can often determine the convergence of a smaller or larger series.

Problem 22.5 (Comparison test). Let $0 \le a_n \le b_n$. If the larger series $\sum_{n=1}^{\infty} b_n$ converges, then the smaller series $\sum_{n=1}^{\infty} a_n$ also converges. If the smaller series $\sum_{n=1}^{\infty} a_n$ diverges, then the larger series $\sum_{n=1}^{\infty} b_n$ also diverges.

Convergence does not depend on a finite number of terms, no matter how large or small they are initially. What matters is the behavior as the index increases. In the comparison test, the comparison is only needed for sufficiently large indices. That is, $0 \le a_n \le b_n$ needs to hold only for sufficiently large $n$.

Solution 22.5 The proof is simple. If the terms are non-negative, then the partial sums form an increasing sequence. If the larger partial sum sequence converges, it is bounded above, and therefore the smaller partial sum sequence is also bounded above and hence converges. Conversely, if the smaller partial sum sequence is unbounded, then the larger partial sum sequence is also unbounded. This logic can be written out as a rigorous proof. □

Question 22.1. One thing to be careful of in the above conditions is that the comparison cannot be applied to sequences whose signs change. It only works when all terms are non-negative. A similar statement can be made for the case where all terms are non-positive, $a_n \le b_n \le 0$. Let's write this out.

Problem 22.6 (Limit comparison). Let $a_n, b_n \ge 0$ for $n \ge N$.
(1) Let $\lim_{n\to\infty} \frac{a_n}{b_n} = 0$. Then, if $\sum_{n=1}^{\infty} b_n$ converges, $\sum_{n=1}^{\infty} a_n$ converges. If $\sum_{n=1}^{\infty} a_n$ diverges, $\sum_{n=1}^{\infty} b_n$ diverges.
(2) Let $\lim_{n\to\infty} \frac{a_n}{b_n} = C \neq 0$. Then, $\sum_{n=1}^{\infty} a_n$ converges if and only if $\sum_{n=1}^{\infty} b_n$ converges.

If $\lim_{n\to\infty} \frac{a_n}{b_n} = \infty$, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$, so we can apply (1) with the roles of $a_n$ and $b_n$ exchanged. The convergence of a series is determined by its behavior as $n \to \infty$, so it is natural that it is determined by the limit $\lim_{n\to\infty} \frac{a_n}{b_n}$. However, note that this comparison does not work when the signs change.

Solution 22.6 □

Problem 22.7. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{5}{5n - 1}$. (2) $\sum_{n=1}^{\infty} \frac{1}{n!}$. (3) $\frac13 + \frac{1}{2 + \sqrt{2}} + \frac{1}{2 + \sqrt[3]{3}} + \frac{1}{2 + \sqrt[4]{4}} + \cdots$.

Solution 22.7 □

22.3 Ratio test

In the previous discussion, we considered sequences with non-negative values. Now we consider sequences and series that can have both positive and negative values. However, we do not perform a detailed test that treats positive and negative terms separately. Instead, we discuss the convergence of the series of absolute values. Let's start with the definition of absolute convergence.

Definition 22.1. We say $\sum_{n=1}^{\infty} a_n$ converges absolutely if $\sum_{n=1}^{\infty} |a_n|$ converges.

If the series $\sum_{n=1}^{\infty} a_n$ converges absolutely, then it can be easily shown to converge in the usual sense.

Problem 22.8. Show that if $\sum_{n=1}^{\infty} |a_n|$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.

The best way to prove convergence when the limit value is unknown is to use the concept of Cauchy sequences.

Solution 22.8 Let $s_n$ be the partial sum of $a_n$ and $v_n$ be the partial sum of $|a_n|$. Since $v_n$ converges, for any $\varepsilon > 0$, there exists $N$ such that $|v_n - v_{n'}| < \varepsilon$ whenever $n, n' > N$. Then, for $n > n' > N$,
$$|s_n - s_{n'}| = \left|\sum_{k=n'+1}^{n} a_k\right| \le \sum_{k=n'+1}^{n} |a_k| = v_n - v_{n'} < \varepsilon.$$
Thus $s_n$ is a Cauchy sequence, and therefore $s_n$ converges. □

Let's consider a few simple examples.

Problem 22.9. Show that the following series converge.
(1) $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^{2}}$. (2) $\sum_{n=1}^{\infty} \frac{\sin n}{n^{2}}$.

Solution 22.9 When absolute values are taken, the terms of both series are bounded by those of $\sum_{n=1}^{\infty} \frac{1}{n^{2}}$, which is known to converge. Therefore, by the comparison test, they converge absolutely. □

Next, we introduce two methods for determining absolute convergence, with the ratio test being the first. The Greek letter $\rho$ is read as "rho".

Problem 22.10 (Ratio test). Show the following when $\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = \rho$.
(1) If $\rho < 1$, $\sum_{n=1}^{\infty} a_n$ converges absolutely.
(2) If $\rho > 1$, $\sum_{n=1}^{\infty} a_n$ diverges.
(3) If $\rho = 1$, no conclusion.

If $a_n$ is a geometric sequence, then $\rho$ is the absolute value of the common ratio. Even if it is not a geometric sequence, as $n$ approaches infinity, $|a_n|$ tends to resemble a geometric sequence with ratio $\rho$. Therefore, it is quite natural that the series diverges if $\rho$ is greater than 1 and converges if $\rho$ is less than 1. The case $\rho = 1$ includes both convergent and divergent series, so the test is inconclusive there.

Solution 22.10 First, let's check (3). We already know that the series $\sum_{n=1}^{\infty} \frac1n$ diverges. Calculating $\rho$, we find $\rho = \lim_{n\to\infty} \frac{n}{n+1} = 1$. We also know that the series $\sum_{n=1}^{\infty} \frac{1}{n^{2}}$ converges. Calculating $\rho$, we find $\rho = \lim_{n\to\infty} \frac{n^{2}}{n^{2} + 2n + 1} = 1$. Thus, when $\rho = 1$, both convergent and divergent cases are included, making the test inconclusive.

Let's prove (1). To understand the principle, consider this: if $\rho < 1$, there exist a ratio $r$ between $\rho$ and 1 and an index $N$ such that $\frac{|a_{n+1}|}{|a_n|} < r$ holds for all $n > N$. Then, we can create a geometric series with ratio $r$ whose terms are larger than $|a_n|$. Therefore, by the comparison test, the series converges. Students serious about mathematics can take this logic, construct such a series explicitly, and complete the proof. The logic for (2) can be similarly constructed. □

Let's practice using the ratio test with a few examples.

Problem 22.11. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} \frac{2^{n} + 5}{3^{n}}$. (2) $\sum_{n=1}^{\infty} \frac{(2n)!}{n!\,n!}$. (3) $\sum_{n=1}^{\infty} \frac{4^{n}\, n!\, n!}{(2n)!}$.
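Before computing any limit by hand, it can help to tabulate the ratio $|a_{n+1}/a_n|$ numerically as a sanity check. The sketch below does this for the three series $a_n = \frac{2^{n}+5}{3^{n}}$, $a_n = \frac{(2n)!}{n!\,n!}$, and $a_n = \frac{4^{n}\,n!\,n!}{(2n)!}$, using exact rational arithmetic from Python's standard library. The function names `a1`, `a2`, `a3`, and `ratio` are chosen only for this illustration.

```python
from fractions import Fraction
from math import factorial


def a1(n):
    # a_n = (2^n + 5) / 3^n
    return Fraction(2**n + 5, 3**n)


def a2(n):
    # a_n = (2n)! / (n! n!)
    return Fraction(factorial(2 * n), factorial(n) ** 2)


def a3(n):
    # a_n = 4^n n! n! / (2n)!
    return Fraction(4**n * factorial(n) ** 2, factorial(2 * n))


def ratio(a, n):
    # Exact ratio a_{n+1} / a_n as a Fraction (all terms here are positive).
    return a(n + 1) / a(n)


for name, a in (("(1)", a1), ("(2)", a2), ("(3)", a3)):
    print(name, [float(ratio(a, n)) for n in (10, 100, 1000)])
```

A table of ratios only suggests the value of $\rho$; it does not replace computing the limit, and a ratio that approaches 1 still leaves the test inconclusive.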
Solution 22.11 For (1), the series looks like a geometric series with $r = \frac23$ for sufficiently large $n$. Thus, we might consider applying the ratio test. Then,
$$\rho = \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{2^{n+1} + 5}{3^{n+1}} \cdot \frac{3^{n}}{2^{n} + 5}.$$
Dividing both the numerator and the denominator by $2^{n} 3^{n}$, we get
$$\rho = \lim_{n\to\infty} \frac13 \cdot \frac{2 + 5/2^{n}}{1 + 5/2^{n}} = \frac23.$$
As expected, we find that $\rho = \frac23$, so the series converges by the ratio test.

For (2), although it doesn't look like a geometric series, many terms cancel out when we compute the ratio, simplifying the calculation. Let's see:
$$\rho = \lim_{n\to\infty} \frac{|a_{n+1}|}{|a_n|} = \lim_{n\to\infty} \frac{(2n+2)!}{(n+1)!\,(n+1)!} \cdot \frac{n!\,n!}{(2n)!} = \lim_{n\to\infty} \frac{(2n+2)(2n+1)}{(n+1)(n+1)} = 4.$$
Thus, the series diverges by the ratio test.

For (3), the ratio is the reciprocal of the ratio in the previous case multiplied by 4, so we get $\rho = 1$. Therefore, the ratio test is inconclusive. (In fact, $\frac{a_{n+1}}{a_n} = \frac{2n+2}{2n+1} > 1$, so the terms increase from $a_1 = 2$ and do not tend to 0; the series diverges by Problem 21.9.) □

22.4 Root test

Now, we introduce the root test, which is sometimes very useful.

Problem 22.12 (Root test). Let $\rho = \lim_{n\to\infty} \sqrt[n]{|a_n|} = \lim_{n\to\infty} |a_n|^{1/n}$. Show the following:
(1) If $\rho < 1$, $\sum_{n=1}^{\infty} a_n$ converges absolutely.
(2) If $\rho > 1$, $\sum_{n=1}^{\infty} a_n$ diverges.
(3) If $\rho = 1$, no conclusion.

Solution 22.12 The proof and logic are very similar to those for the ratio test. Let's first check (3). The series $\sum_{n=1}^{\infty} \frac1n$ and $\sum_{n=1}^{\infty} \frac{1}{n^{2}}$ both have $\rho = 1$. From Problem 21.6(2), we know that $\lim_{n\to\infty} n^{1/n} = 1$. Also,
$$\lim_{n\to\infty} (n^{2})^{1/n} = \lim_{n\to\infty} (n^{1/n})^{2} = \left(\lim_{n\to\infty} n^{1/n}\right)^{2} = 1^{2} = 1.$$
(The continuity of the function $f(x) = x^{2}$ was used. Where?) Therefore, the case $\rho = 1$ includes both convergent and divergent series, and the test is inconclusive.

Let's prove (1). To understand the principle, consider this: if $\rho < 1$, there exist a ratio $r$ between $\rho$ and 1 and an index $N$ such that $|a_n|^{1/n} < r$ holds for all $n > N$. Raising both sides to the power $n$, we get
$$|a_n| < r^{n}.$$
The right side forms a geometric series and converges.
Therefore, by the comparison test, $\sum_{n=1}^{\infty} a_n$ converges absolutely. The logic for (2) can be similarly constructed. □

Let's practice using the root test with a few examples.

Problem 22.13. Determine if the following series converge.
(1) $\sum_{n=1}^{\infty} \frac{2^{n}}{n^{3}}$. (2) $\sum_{n=1}^{\infty} \frac{2^{n}}{\sqrt{n!}}$. (3) $\sum_{n=1}^{\infty} \left(\frac{1}{n+1}\right)^{n}$.
(4) $\sum_{n=1}^{\infty} a_n$, where
$$a_n = \begin{cases} n 2^{-n}, & \text{if } n \text{ is odd}, \\ 2^{-n}, & \text{if } n \text{ is even}. \end{cases}$$

Whether to use the ratio test or the root test is something to be learned through practice. First make a prediction, and then learn from both failures and successes.

Solution 22.13 For (1), the ratio test seems appropriate. Then,
$$\rho = \lim_{n\to\infty} \frac{2^{n+1} n^{3}}{2^{n} (n+1)^{3}} = 2.$$
Thus, the series diverges by the ratio test.

For (2), let's try the ratio test as well. Then,
$$\rho = \lim_{n\to\infty} \frac{2^{n+1} \sqrt{n!}}{2^{n} \sqrt{(n+1)!}} = 2 \lim_{n\to\infty} \sqrt{\frac{1}{n+1}} = 0.$$
Thus, the series converges by the ratio test.

For (3), the root test seems appropriate. Then,
$$\rho = \lim_{n\to\infty} (a_n)^{1/n} = \lim_{n\to\infty} \frac{1}{n+1} = 0.$$
Thus, the series converges by the root test.

(4) is artificially constructed to demonstrate a case where the ratio test does not work well but the root test does. First, let's try the ratio test. Then,
$$\frac{a_{n+1}}{a_n} = \begin{cases} \dfrac{(n+1) 2^{-n-1}}{2^{-n}} = (n+1) 2^{-1} \to \infty & \text{if } n \text{ is even}, \\[2ex] \dfrac{2^{-n-1}}{n 2^{-n}} = \dfrac{2^{-1}}{n} \to 0 & \text{if } n \text{ is odd}. \end{cases}$$
The limit of the ratio does not exist, so the ratio test is not helpful. Let's try the root test. Then,
$$\lim_{n\to\infty} (n 2^{-n})^{1/n} = \lim_{n\to\infty} n^{1/n}\, 2^{-1} = 2^{-1}, \qquad \lim_{n\to\infty} (2^{-n})^{1/n} = 2^{-1}.$$
Both converge to 0.5, so $\rho = 2^{-1}$, and the series converges by the root test. □

Exercises
1.
2.

Lecture 23 Power series

In this lecture, we examine the convergence and the radius of convergence of power series using the root test, one of the absolute convergence tests we have covered. The convergence analysis of Taylor series is based on this approach. Toward the end of the lecture, we also introduce a test for conditional convergence.
23.1 Convergence of a power series

Let's develop an understanding of the convergence region of power series and its relationship with the coefficients through the following examples.

Problem 23.1. Find the convergence regions of the following power series.
(1) $\sum_{n=0}^{\infty} x^{n}$. (2) $\sum_{n=0}^{\infty} (-1)^{n} x^{n}$. (3) $\sum_{n=0}^{\infty} \left(-\frac12\right)^{n} x^{n}$. (4) $\sum_{n=0}^{\infty} n^{2} x^{n}$. (5) $\sum_{n=1}^{\infty} (-1)^{n} \frac{x^{n}}{n}$.

For power series, both the root test and the ratio test can often be applied. Of course, the coefficients must be taken into account. Cases (1), (2), and (3) are geometric series, so we can easily find their limits. In cases (4) and (5) the limits seem hard to find, but we can still determine convergence.

Solution 23.1 For (1), it's a geometric series with a common ratio of $x$. Therefore, it converges for all $|x| < 1$ and diverges for all $|x| \ge 1$.

For (2), it's also a geometric series, with a common ratio of $-x$. The convergence region is the same as in (1), i.e., $|x| < 1$. Although (2) looks like an alternating series, this depends on the sign of $x$: if $x$ is positive, (2) is alternating, and if $x$ is negative, (1) is alternating.

For (3), it's a geometric series with a common ratio of $-\frac{x}{2}$. The coefficients decrease geometrically, and the convergence region is $|x| < 2$, which is twice as large as in (1).

For (4), the coefficients grow like $n^{2}$. Applying the ratio test, we get
$$\lim_{n\to\infty} \left|\frac{(n+1)^{2} x^{n+1}}{n^{2} x^{n}}\right| = \lim_{n\to\infty} \frac{(n+1)^{2}}{n^{2}}\,|x| = |x|.$$
Therefore, it converges for $|x| < 1$. The boundary cases $x = 1$ and $x = -1$ both diverge, since the terms do not tend to 0. The convergence interval is the same as in (1). In conclusion, coefficients growing like $n^{2}$ do not affect the convergence region.

For (5), the coefficients decrease, but the radius of the convergence interval remains unchanged. Using the ratio test, we find
$$\lim_{n\to\infty} \left|\frac{n x^{n+1}}{(n+1) x^{n}}\right| = \lim_{n\to\infty} \frac{n}{n+1}\,|x| = |x|.$$
Therefore, it converges for $|x| < 1$. Among the boundary cases, $x = 1$ converges; we can use the Alternating Series Test. For $x = -1$, it diverges.
Since one boundary point is included in the convergence interval, the interval is $-1 < x \le 1$. □

Coefficients that grow like a fixed power of $n$ or decay like a fixed power of $\frac1n$ don't affect the radius of the convergence region of a power series. However, the behavior at the boundary points may vary. On the other hand, if the coefficients grow or decay like a geometric sequence, the convergence region adjusts accordingly. This is natural, since the terms of a power series themselves behave like a geometric sequence in $x$.

Now let's consider cases where the coefficients grow or decay faster than a geometric sequence.

Problem 23.2. Find the convergence regions of the following power series.
(1) $\sum_{n=0}^{\infty} \frac{x^{n}}{n!}$. (2) $\sum_{n=0}^{\infty} n!\, x^{n}$.

Solution 23.2 For (1), the coefficients are $\frac{1}{n!}$, which decrease rapidly. Since they decrease much faster than a geometric sequence, we expect a large convergence region. Let's verify this by the ratio test:
$$\rho = \lim_{n\to\infty} \left|\frac{x^{n+1}\, n!}{(n+1)!\, x^{n}}\right| = \lim_{n\to\infty} \frac{|x|}{n+1} = 0 < 1.$$
This means $\rho$ is less than 1 for any $x$, so the convergence interval is the entire real line. This series is well known; it converges to $e^{x}$. Differentiating it term by term yields the same series, which is consistent with the limit being $e^{x}$.

For (2), the coefficients grow much faster than a geometric sequence. We expect a very small convergence region. Let's confirm this with the ratio test:
$$\rho = \lim_{n\to\infty} \left|\frac{x^{n+1} (n+1)!}{n!\, x^{n}}\right| = \lim_{n\to\infty} (n+1)|x| = \begin{cases} 0 & \text{if } x = 0, \\ \infty & \text{otherwise}. \end{cases}$$
Therefore, the convergence region consists of a single point, $x = 0$. □

Let's use the root test to determine the convergence of the general power series $\sum_{n=0}^{\infty} c_n x^{n}$. Then,
$$\rho = \lim_{n\to\infty} \left(|c_n|\,|x|^{n}\right)^{1/n} = \lim_{n\to\infty} |c_n|^{1/n}\, |x|.$$
According to the root test, if $\rho > 1$, the series diverges; if $\rho < 1$, it converges; and if $\rho = 1$, it may either diverge or converge. Using this, we can find the radius of convergence as follows.

23.2 Radius of convergence

Problem 23.3 (Radius of convergence).
The radius of convergence $R$ of a given series $\sum_{n=0}^{\infty} c_n x^{n}$ is defined as follows:
$$R = \frac{1}{\lim_{n\to\infty} |c_n|^{1/n}}.$$
It satisfies the following:
1. If $|x| < R$, the series $\sum_{n=0}^{\infty} c_n x^{n}$ converges.
2. If $|x| > R$, the series $\sum_{n=0}^{\infty} c_n x^{n}$ diverges.
3. If $|x| = R$, the series $\sum_{n=0}^{\infty} c_n x^{n}$ may converge or diverge.

Solution 23.3 This follows from the root test applied to the terms $c_n x^{n}$: the root-test limit is $|x| \lim_{n\to\infty} |c_n|^{1/n} = |x|/R$, which is less than 1 exactly when $|x| < R$. □

Problem 23.4 (Differentiability of power series). Let $R$ be the radius of convergence of $\sum_{n=0}^{\infty} c_n x^{n}$, and let $f(x) = \sum_{n=0}^{\infty} c_n x^{n}$ for $|x| < R$. Then,
1. For all $|x| < R$, $f$ is differentiable, and $f'(x) = \sum_{n=1}^{\infty} n c_n x^{n-1}$.
2. For all $|x| < R$, $f$ is infinitely differentiable, and it satisfies
$$f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, c_n x^{n-k}.$$

Solution 23.4 The derivative of the partial sum $s_n(x) = \sum_{k=0}^{n} c_k x^{k}$ is given by $s_n'(x) = \sum_{k=1}^{n} k c_k x^{k-1}$. The radius of convergence of the differentiated series is also $R$. Since the differentiation operation is continuous,
$$f'(x) = \left(\lim_{n\to\infty} s_n(x)\right)' = \lim_{n\to\infty} \left(s_n(x)\right)' = \sum_{n=1}^{\infty} n c_n x^{n-1}. \qquad (23.1)$$
By repeatedly applying this process to the power series $\sum_{n=1}^{\infty} n c_n x^{n-1}$, we obtain all its derivatives. □

Question 23.1. After claiming that differentiation is continuous, we moved the limit outside the derivative in (23.1). Can you explain the connection between the claim that differentiation is continuous and moving the limit outside the derivative like this? Is the differentiation operation really continuous? What do you think?

Just as we can differentiate, we can also integrate. Integrate the partial sums, verify their radius of convergence, and then take the limit. The integration operation is also continuous.

Problem 23.5 (Integrability of power series). Let $R$ be the radius of convergence of $\sum_{n=0}^{\infty} c_n x^{n}$, and let $f(x) = \sum_{n=0}^{\infty} c_n x^{n}$ for $|x| < R$. Then,
$$F(x) = \int_{0}^{x} f(s)\,ds = \sum_{n=0}^{\infty} \frac{c_n}{n+1}\, x^{n+1}, \qquad |x| < R.$$

Solution 23.5 Omitted.
□

Given a power series $\sum_{n=0}^{\infty} c_n (x - x_0)^{n}$ with radius of convergence $R$, we can differentiate and integrate it term by term as many times as we want for $|x - x_0| < R$, and the result still converges. This means that a function represented by a power series is differentiable and integrable at any point within its radius of convergence.

23.3 Alternating series

A sequence $a_n$ is called an alternating sequence if the sign of each element alternates, and the series formed by it is called an alternating series. For example, a sequence whose odd-indexed terms are positive and whose even-indexed terms are negative is an alternating sequence. Such an alternating sequence can be written as follows:
$$a_n = (-1)^{n-1} b_n, \qquad b_n \ge 0.$$
Let's study the properties of series composed of such sequences.

Problem 23.6 (Alternating series test). Assume that $b_n \ge 0$, $b_n \to 0$ as $n \to \infty$, and $b_n$ is monotonically decreasing. Prove that the series $\sum_{k=1}^{\infty} (-1)^{k-1} b_k$ converges, and furthermore, show that for all $n > 0$,
$$s_{2n} \le \sum_{k=1}^{\infty} (-1)^{k-1} b_k \le s_{2n+1} \qquad (23.2)$$
holds.

It is necessary for the sequence $b_n$ to be decreasing and to converge to 0. If either of these conditions is not satisfied, a counterexample can be constructed where the series does not converge. Additionally, (23.2) can play a crucial role as an error estimate for convergence. Knowing not only that a series converges but also where its limit lies is very important.

Solution 23.6 The odd partial sums are expressed as follows:
$$s_{2n+1} = b_1 - b_2 + b_3 - b_4 + b_5 - \cdots - b_{2n} + b_{2n+1} = b_1 - (b_2 - b_3) - (b_4 - b_5) - \cdots - (b_{2n} - b_{2n+1}).$$
Since $b_n$ is a decreasing sequence, $b_2 - b_3 \ge 0$, $b_4 - b_5 \ge 0$, $b_6 - b_7 \ge 0$, $\cdots$. Therefore, the partial sum $s_{2n+1}$ is a decreasing sequence as $n$ increases. Furthermore, rewriting $s_{2n+1}$ yields
$$s_{2n+1} = b_1 - b_2 + (b_3 - b_4) + \cdots + (b_{2n-1} - b_{2n}) + b_{2n+1} \ge b_1 - b_2,$$
which has a lower bound (bounded below).
Therefore, $s_{2n+1}$ converges. Denote its limit by $L_1$. Similarly, considering $s_{2n}$, we have
$$s_{2n} = (b_1 - b_2) + (b_3 - b_4) + (b_5 - b_6) + \cdots + (b_{2n-1} - b_{2n}).$$
Each bracket is either 0 or positive, so $s_{2n}$ is an increasing sequence. Also, rewriting $s_{2n}$ gives
$$s_{2n} = b_1 - (b_2 - b_3) - (b_4 - b_5) - \cdots - (b_{2n-2} - b_{2n-1}) - b_{2n} \le b_1,$$
which has an upper bound (bounded above). Hence, $s_{2n}$ converges; denote its limit by $L_2$. Then, the difference between the two limits is
$$L_1 - L_2 = \lim_{n\to\infty} s_{2n+1} - \lim_{n\to\infty} s_{2n} = \lim_{n\to\infty} (s_{2n+1} - s_{2n}) = \lim_{n\to\infty} b_{2n+1} = 0.$$
Therefore, $L_1 = L_2$ and $\sum_{k=1}^{\infty} (-1)^{k-1} b_k$ converges. Since $s_{2n+1}$ is a decreasing sequence and $s_{2n}$ is an increasing sequence, both with the same limit, (23.2) is satisfied. □

Problem 23.7. Determine the convergence of the following series.
(1) $\sum_{n=1}^{\infty} (-1)^{n-1} \frac1n$. (2) $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{10n}{n^{2} + 16}$.

Solution 23.7 (1) Since $\frac1n$ is positive, decreasing, and converges to 0, the series $\sum_{n=1}^{\infty} (-1)^{n-1} \frac1n$ converges by the alternating series test. However, if we take absolute values, $\sum_{n=1}^{\infty} \left|(-1)^{n-1} \frac1n\right| = \sum_{n=1}^{\infty} \frac1n$ does not converge. Such cases, where the series converges but the series of absolute values diverges, are called conditional convergence.

(2) $b_n = \frac{10n}{n^{2} + 16}$ is positive and converges to 0 as $n$ approaches infinity. However, it is not a decreasing sequence: for small $n$, it increases. But for sufficiently large $n$, it decreases, and when applying the Alternating Series Test, the first few terms do not affect convergence. How do we show that it is a decreasing sequence for large $n$? Let $f(x) = \frac{10x}{x^{2} + 16}$ and analyze the sign of its derivative. We find
$$f'(x) = \frac{10(16 - x^{2})}{(x^{2} + 16)^{2}},$$
so $f'(x) < 0$ when $x > 4$. Therefore, $b_n$ is decreasing for $n \ge 4$, and the series converges by the alternating series test. □

(23.2) shows that partial sums can be used to estimate the location of the limit. Let's verify this through the following problem.
Problem 23.8. Estimate the value of $L = \sum_{n=1}^{\infty} (-1)^{n-1} 2^{-n}$ with an error of less than 0.01.

Solution 23.8 Since it is an alternating series and $a_n$ is positive when $n$ is odd, $s_{2n} < s_{2n+1}$. The limit lies between $s_{2n}$ and $s_{2n+1}$, which differ by $b_{2n+1}$. Therefore, we start by finding $n$ such that $b_{2n+1} = 2^{-2n-1} < 0.01$. This means
$$\ln 2^{-2n-1} < \ln 0.01 \;\Rightarrow\; -2n - 1 < \frac{\ln 0.01}{\ln 2} \;\Rightarrow\; n > 2.8219.$$
Thus, we take $n = 3$. Then, the estimation interval is $(s_6, s_7)$, where $s_6 = \frac{21}{64} \approx 0.3281$ and $s_7 = \frac{43}{128} \approx 0.3359$. Of course, since this is a geometric series, we know the value of $L$:
$$L = \frac{0.5}{1 - (-0.5)} = \frac13.$$
Therefore, it can be confirmed that $\frac13 \in (s_6, s_7)$. □

23.4 Rearrangement and conditional convergence

A sequence is a collection of numbers with a specified order. Sometimes, by changing the order of the terms, we can infer properties of the original sequence. This is called rearrangement. But how do we define it?

Definition 23.1 (Rearranged series). Let $\mathbb{N}$ be the set of natural numbers and $a_n$ be a given sequence for all $n \in \mathbb{N}$. A sequence $b_n$ is called a rearrangement of $a_n$ if there exists a one-to-one onto mapping $\varphi : \mathbb{N} \to \mathbb{N}$ such that
$$b_n = a_{\varphi(n)}, \qquad n \in \mathbb{N}.$$

Question 23.2. Can you explain whether this definition serves the intended purpose?

Problem 23.9. If the series $\sum_{n=1}^{\infty} a_n$ converges absolutely, then all rearrangements also converge absolutely, and their limits do not change.

This problem demonstrates that for absolutely convergent series, rearrangements do not make a difference. The proof is relatively simple but requires careful organization.

Solution 23.9 Let $\sum_{n=1}^{\infty} a_n = L$. Let's show that $\sum_{n=1}^{\infty} b_n = L$. Let $s_n$ be the partial sum of $a_n$, and $v_n$ be the partial sum of $b_n$. We need to show that for any given $\varepsilon > 0$, there exists an $N$ such that $|v_n - L| < \varepsilon$ for all $n > N$. Since $\sum_{n=1}^{\infty} a_n = L$, there exists $N_1$ such that $|s_n - L| < \frac{\varepsilon}{2}$ holds for all $n > N_1$. Moreover, since $\sum_{n=1}^{\infty} |a_n|$ converges, there exists $N_2 > N_1$ such that $\sum_{k=N_2}^{\infty} |a_k| < \frac{\varepsilon}{2}$. Now, we choose $N$ such that
$$N = \max\{\varphi^{-1}(n) : n \le N_2\}.$$
Then, for $n > N$, the sum $v_n$ contains all of $a_1, \ldots, a_{N_2}$, and every other term in it has index larger than $N_2$, so
$$|v_n - L| = \left|\sum_{k=1}^{n} b_k - L\right| \le \left|\sum_{k=1}^{N_2} a_k - L\right| + \sum_{k=N_2}^{\infty} |a_k| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Thus, $\sum_{n=1}^{\infty} b_n = L$. (The core is the first inequality.) □

Now let's consider another type of convergence.

Definition 23.2. We say $\sum_{n=1}^{\infty} a_n$ converges conditionally if $\sum_{n=1}^{\infty} a_n$ converges, but $\sum_{n=1}^{\infty} |a_n|$ diverges.

That the series $\sum_{n=1}^{\infty} a_n$ converges while the series of absolute values $\sum_{n=1}^{\infty} |a_n|$ diverges means that it contains both negative and positive terms that cancel each other out, resulting in convergence. Taking absolute values removes this cancellation and leads to divergence.

Problem 23.10. Verify that the following series converge conditionally.
(1) $\sum_{n=1}^{\infty} \frac{(-1)^{n}}{n^{\alpha}}$ with $0 < \alpha \le 1$. (2) $\sum_{n=2}^{\infty} \frac{(-1)^{n}\, n}{n^{2} - 2n + 1}$. (3) $\sum_{n=1}^{\infty} \frac{\sin(n)}{n}$.

Solution 23.10 (1) and (2) can both be shown to converge by the alternating series test. It can also be shown that taking absolute values leads to divergence. Hence, they converge conditionally. (3) is not precisely an alternating series. The sine function changes sign every $\pi$, that is, twice per period $2\pi$, so consecutive terms $\sin(n)$ often have the same sign. While it seems likely to converge conditionally due to approximately balanced positive and negative terms, there isn't a straightforward way to prove it. □

Conditional convergence means that adding only the positive terms or only the negative terms separately would lead to divergence. However, the series converges when they are combined in the given order. Would the convergence remain unchanged if we rearrange the order? Surprisingly, the answer is not what we might expect.

Problem 23.11. If a series $\sum_{n=1}^{\infty} a_n$ converges conditionally, then for any given number $L$, we can create a rearranged series $\sum_{n=1}^{\infty} b_n$ that converges to $L$.
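A minimal numerical sketch of this statement, under stated assumptions: take the alternating harmonic series $\sum_{k=1}^{\infty} (-1)^{k-1}/k$ as the conditionally convergent series, and build the rearrangement greedily by taking the next unused positive term while the running sum is at most $L$ and the next unused negative term otherwise. The function name `rearranged_partial_sums` and the greedy rule starting from an empty sum are choices made for this illustration only.

```python
def rearranged_partial_sums(target, n_terms):
    """Partial sums of a greedy rearrangement of sum_{k>=1} (-1)^(k-1)/k."""
    next_pos = 1  # next unused odd index; contributes +1/next_pos
    next_neg = 2  # next unused even index; contributes -1/next_neg
    total = 0.0
    sums = []
    for _ in range(n_terms):
        if total <= target:
            # At or below the target: use the next unused positive term.
            total += 1.0 / next_pos
            next_pos += 2
        else:
            # Above the target: use the next unused negative term.
            total -= 1.0 / next_neg
            next_neg += 2
        sums.append(total)
    return sums


for target in (2.0, -1.0, 0.0):
    sums = rearranged_partial_sums(target, 100_000)
    print(target, sums[-1])
```

Every term is used exactly once and in increasing order within each sign class, so the result is a genuine rearrangement. Each time the running sum crosses the target, it overshoots by at most the size of the term just added, and those sizes shrink to 0, which is why the printed values approach the chosen target.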
Solution 23.11 It may seem odd that a rearrangement can converge to any prescribed $L$, especially when the positive terms and the negative terms each sum to infinity on their own. This impression simply reflects our familiarity with finite sums and our lack of experience with infinite ones. Let's create such a rearranged series $b_n$. This means creating a one-to-one onto mapping $\varphi : \mathbb{N} \to \mathbb{N}$ so that the rearranged series converges to $L$. Recall that the sequence $a_n$ contains infinitely many positive and negative terms, and that the sum of each kind separately diverges. First, let $b_1 = a_1$. If the partial sum $\sum_{n=1}^{k} b_n$ exceeds $L$, we take the next unused negative term as $b_{k+1}$. If it is less than $L$, we take the next unused positive term. Because both sums diverge, the partial sums cross $L$ infinitely often and every term is eventually used; because $a_n \to 0$, the overshoot at each crossing shrinks to zero. Continuing this process, we ensure that eventually $\sum_{n=1}^{\infty} b_n = L$. □

Exercises

1. Determine the convergence intervals of the following power series.
\[ (1)\ \sum_{n=1}^{\infty} n x^n \qquad (2)\ \sum_{n=1}^{\infty} n^2 (3x-1)^n \qquad (3)\ \sum_{n=1}^{\infty} \frac{(2n)!}{n!} x^n \qquad (4)\ \sum_{n=1}^{\infty} \frac{3^n x^{2n}}{n^2} \]
\[ (5)\ \sum_{n=1}^{\infty} \frac{(-1)^n (2x-1)^n}{2n+2} \qquad (6)\ \sum_{n=1}^{\infty} \frac{(x^2-1)^n}{2 \cdot 4 \cdot 6 \cdots 2n} \qquad (7)\ \sum_{n=1}^{\infty} \frac{n^n x^n}{n!} \qquad (8)\ \sum_{n=1}^{\infty} \frac{n^{2n} x^n}{n!} \]

2.

Lecture 24 Taylor Series

In the previous lecture, we studied various properties and the convergence of sequences and series, which served as preliminary work for studying Taylor series. In this lecture, we introduce Taylor series and study their properties.

24.1 Taylor series

Let's approximate a function $f(x)$ that is differentiable $n$ times by a polynomial in powers of $(x - x_0)$, where $x_0$ is a point near which we want the approximation to be good. We start from the form
\[ f(x) \cong \sum_{s=0}^{n} c_s (x - x_0)^s \equiv p_n(x). \tag{24.1} \]
We denote the sum on the right-hand side simply by $p_n(x)$.

Question 24.1. We want to choose the coefficients $c_s$ so that the polynomial $p_n(x)$ on the right becomes a good approximation of the function $f(x)$ on the left. How should we determine the coefficients?
There are various methods to determine the coefficients $c_s$, but for the Taylor series we choose them so that all derivatives up to order $n$ agree at the single point of tangency $x_0$.

Problem 24.1. Suppose the function $f$ is differentiable up to order $n$ at $x = x_0$. Determine the coefficients $c_s$ such that the approximating function $p_n(x)$ in (24.1) and the target function $f(x)$ have equal derivatives from order 0 to $n$ at $x = x_0$.

Solution 24.1 Since there are $n + 1$ coefficients in total, we can match $n + 1$ derivatives, from order 0 to $n$. That is,
\[ f^{(k)}(x_0) = p_n^{(k)}(x_0), \quad k = 0, 1, \cdots, n. \]
Let's determine the coefficients $c_s$ so that the above equation holds. It is a system of $n + 1$ simultaneous equations with the $n + 1$ coefficients as unknowns. Moreover, the right-hand side is already diagonalized. By differentiating $p_n(x)$ $k$ times and substituting $x_0$ for $x$, we obtain
\[ p_n^{(k)}(x_0) = \sum_{s=k}^{n} s(s-1)\cdots(s-k+1)\, c_s (x - x_0)^{s-k} \Big|_{x = x_0} = c_k\, k!. \]
This calculation can be explained in detail. When we differentiate $p_n(x)$ $k$ times, all terms of degree less than $k$ become 0. Thus, the summation starts from $s = k$. For terms of degree $s \ge k$, the $k$th derivative of $c_s (x - x_0)^s$ is as shown above and has a factor of $(x - x_0)^{s-k}$. In particular, the $k$th derivative of the term $c_k (x - x_0)^k$ is the constant $k!\, c_k$. Substituting $x_0$ for $x$ yields 0 for all terms except this constant term. Therefore, the condition $f^{(k)}(x_0) = p_n^{(k)}(x_0)$ becomes $f^{(k)}(x_0) = k!\, c_k$, and the coefficients are given by
\[ c_k = \frac{f^{(k)}(x_0)}{k!}. \]
□

The polynomial $p_n(x)$ constructed with these coefficients is called the Taylor polynomial. The $n$th-degree Taylor polynomial of the function $f(x)$ centered at $x_0$ is
\[ p_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k. \]

Question 24.2. The Taylor polynomial is an approximating function built from derivative information at a single point. Therefore, while it can approximate $f(x)$ well near the point $x_0$, we have no reason to expect $p_n(x)$ to converge to $f(x)$ when $x$ is far from $x_0$. However, for many well-known functions it does converge. For example, for $\sin x$ it converges for all $x \in \mathbb{R}$. How is this possible?

Problem 24.2 (Taylor series). If the function $f(x)$ is differentiable an infinite number of times, we can create a series instead of a partial sum:
\[ p(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k. \]
This is called the Taylor series. Of course, it is meaningful only within its convergence interval. (1) When does this series converge? (2) If $f(x) = \sin x$, what is the convergence radius? (3) Can we say that $p(x)$ equals $f(x)$ on the convergence interval of the series?

Solution 24.2 (1) Since there is $k!$ in the denominator of the coefficients, a large convergence interval is possible. However, if $f^{(k)}(x_0)$ grows very quickly, this effect may be diminished. The ratio test seems useful. (2) If $f(x) = \sin x$, then $|f^{(k)}(x_0)| \le 1$ for every $k$. Therefore, comparing with $\sum_k |x - x_0|^k / k!$, which converges for every $x$ by the ratio test, we can show that the convergence interval is the entire real line. (3) If the series converges, is $p(x)$ equal to $f(x)$? In fact, there is no a priori reason for it. The approximating function $p(x)$ only has information about the original function $f(x)$ at the one point $x_0$. Therefore, although $p(x)$ may be close to $f(x)$ near $x_0$, there is no reason for $p(x)$ to equal $f(x)$ far from $x_0$. If they agree, it is quite surprising. However, for many functions they do. How can this be possible? How can we prove it? □

The convergence of a Taylor series alone does not indicate what its limit is. The real value of Taylor series lies in error estimation.

Theorem 24.1 (Taylor's theorem (Lagrange form)). Suppose that $f(x)$ is differentiable $n + 1$ times for all $x \in (a, b) \subset \mathbb{R}$, and $x_0 \in (a, b)$. Then, for each $x \in (a, b)$, there exists a point $c$ between $x$ and $x_0$ such that $f(x) = p_n(x) + R_n(x)$, where
\[ p_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k, \qquad R_n(x) := \frac{f^{(n+1)}(c)}{(n+1)!} (x - x_0)^{n+1}. \tag{24.2} \]

Proof. (The logic used in this proof is also employed in Problem 24.1.) Throughout the proof, $x$ and $x_0$ are fixed constants with $x \ne x_0$ (the case $x = x_0$ is trivial), and we use $s$ as the variable. Choose the constant $M$ so that
\[ f(x) = p_n(x) + M (x - x_0)^{n+1} \]
is satisfied; that is, simply let $M = \dfrac{f(x) - p_n(x)}{(x - x_0)^{n+1}}$. Now, we need to show that this constant $M$ has the form given in the theorem. Consider the difference between the left-hand side and the right-hand side as a function of $s$:
\[ E(s) = f(s) - p_n(s) - M (s - x_0)^{n+1}. \]
Then $E(s)$ is differentiable $n + 1$ times, and $E^{(k)}(x_0) = 0$ for all $0 \le k \le n$. Now, we use the Mean Value Theorem $n + 1$ times. Since $E(x_0) = E(x) = 0$, by the MVT there exists $c_1$ between $x$ and $x_0$ such that $E'(c_1) = 0$. Furthermore, $E'(x_0) = E'(c_1) = 0$, so there exists $c_2$ between $c_1$ and $x_0$ such that $E''(c_2) = 0$. Repeating this process $n + 1$ times, we obtain a point $c = c_{n+1}$ between $x_0$ and $c_n$ with $E^{(n+1)}(c) = 0$. Since $p_n(s)$ is a polynomial of degree $n$, its $(n+1)$st derivative is 0. Therefore,
\[ E^{(n+1)}(c) = f^{(n+1)}(c) - M (n+1)! = 0, \]
which gives $M = \dfrac{f^{(n+1)}(c)}{(n+1)!}$ for some $c$ between $x$ and $x_0$. □

Since the function $f(x)$ is differentiable $n + 1$ times, we could also approximate it with the $(n+1)$st-degree Taylor polynomial $p_{n+1}(x)$. However, in that case we would not know how large the error is. Error estimation is the essence of Taylor's theorem. The remainder term $R_n(x)$ is the approximation error between $f(x)$ and $p_n(x)$, and except that $f^{(n+1)}(c)$ appears in place of $f^{(n+1)}(x_0)$, it looks just like the $(n+1)$st term of the Taylor polynomial. Moreover, $c$ lies between $x$ and $x_0$.

Problem 24.3 (One point decides all). If $f(x) = \sin x$, prove that for all $x, x_0 \in \mathbb{R}$,
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n \tag{24.3} \]
holds; in other words, prove that $f(x) = p(x)$.
Solution 24.3 With error estimation available, we can easily solve this problem. Since every derivative of $\sin x$ has absolute value at most 1, the error term in (24.2) satisfies $|R_n(x)| \le |x - x_0|^{n+1} / (n+1)!$, which converges to 0 for all $x, x_0 \in \mathbb{R}$ as $n \to \infty$ by the ratio test. Therefore, the limit of the series $p(x)$ is equal to $f(x)$. Let's verify this by hand, going through the key points. □

The above result is very peculiar. The Taylor series is defined solely from the derivative values at one point. However, the above result says that the derivative information at one point determines the function values at all points. Let's rewrite the remainder term $R_n$:
\[ R_n(x) := \frac{f^{(n+1)}(c)}{(n+1)!} (x - x_0)^{n+1}. \]
If there exists a number $M > 0$ such that the numerator of the coefficient is bounded regardless of the degree $n$ and the point $c$, i.e., $|f^{(n+1)}(c)| < M$ holds, then $R_n(x)$ converges to 0 for any $x$ as $n$ approaches infinity. Of course, if $x$ is far from $x_0$, then $|x - x_0|$ is large, so $n$ must be much larger for $R_n(x)$ to be sufficiently small, but ultimately it becomes sufficiently small.

Problem 24.4. For the exponential function $f(x) = e^x$, (1) find the Taylor polynomial $p_n(x)$. (2) Find the interval of convergence. (3) Determine whether the limit $p(x)$ matches $f(x) = e^x$.

Solution 24.4 (1) To be completed. □

Problem 24.5 (Maclaurin series). If $x_0 = 0$, show that (24.3) can be written as follows:
\[ \sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}, \qquad \cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}. \]

Solution 24.5 The even-order derivatives of $\sin x$ are either $\sin x$ or $-\sin x$, both of which have the value 0 at $x = 0$. Therefore, $c_{2n} = 0$. The odd-order derivatives are either $\cos x$ or $-\cos x$, and they have the values $+1$ or $-1$ at $x = 0$. Hence, we obtain the above expression for $\sin x$. We can proceed similarly for $f(x) = \cos x$. □

The above special cases of the Taylor series, with the center point $x_0$ set to 0, are called Maclaurin series.
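The error estimate used in Solution 24.3 can be checked numerically. The following is a minimal sketch, assuming the Maclaurin series of $\sin x$ from Problem 24.5 and the bound $|R_n(x)| \le |x|^{n+1}/(n+1)!$ that follows from (24.2) when all derivatives are bounded by 1; the function names `sin_maclaurin` and `lagrange_bound` and the sample point $x = 3$ are illustrative choices, not part of the lecture.

```python
import math

def sin_maclaurin(x, n_terms):
    """Sum of the first n_terms nonzero terms of the Maclaurin series of sin x."""
    term = x            # first term: x^1 / 1!
    total = 0.0
    for n in range(n_terms):
        total += term
        # Ratio of consecutive terms: -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def lagrange_bound(x, degree):
    """Upper bound |x|^(degree+1) / (degree+1)! for |R_degree(x)| when |f^(k)| <= 1."""
    return abs(x) ** (degree + 1) / math.factorial(degree + 1)

x = 3.0
for n_terms in (2, 4, 6, 8, 10):
    degree = 2 * n_terms - 1   # highest power of x appearing in the partial sum
    error = abs(math.sin(x) - sin_maclaurin(x, n_terms))
    print(degree, error, lagrange_bound(x, degree))
```

In each printed row the actual error should not exceed the bound, and both columns shrink rapidly once the degree passes $|x|$, which is the behavior described after Solution 24.3.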
24.2 Applications

Theorem 24.2 (Binomial expansion). For all $|x| < 1$ and all $\alpha \in \mathbb{R}$, the following holds:
\[ (1 + x)^{\alpha} = \sum_{k=0}^{\infty} c_k x^k, \qquad c_k = \binom{\alpha}{k} := \frac{\alpha(\alpha - 1) \cdots (\alpha - k + 1)}{k!}. \tag{24.4} \]

Proof. The proof follows from Taylor's theorem. On the interval $x > -1$ we have $1 + x > 0$, so $(1 + x)^{\alpha}$ is well-defined for all $\alpha \in \mathbb{R}$, and its derivatives are also well-defined. Let $f(x) = (1 + x)^{\alpha}$; then $f^{(k)}(x) = \alpha(\alpha - 1) \cdots (\alpha - k + 1)(1 + x)^{\alpha - k}$. Therefore, by Taylor's theorem with $x_0 = 0$,
\[ (1 + x)^{\alpha} = \sum_{k=0}^{n} c_k x^k + \frac{\alpha(\alpha - 1) \cdots (\alpha - n)(1 + s)^{\alpha - n - 1}}{(n+1)!}\, x^{n+1} \]
holds. Here, $s$ is a number between 0 and $x$. For $0 \le x < 1$ we have $1 + s \ge 1$, and the remainder term converges to 0 as $n \to \infty$; for $-1 < x < 0$ the same conclusion holds but requires a finer estimate of the remainder, which we omit. Thus, the series converges, and (24.4) holds. □

Expanding the binomial $(1 + x)^n$ for a positive integer $n$ can be seen as simply multiplying it out, but the result is always in the form of a Taylor series. Therefore, viewing the expanded result as a Taylor polynomial is a good application of Taylor series. In this case, the numerator is 0 for $k > n$, so $c_k = 0$, and $c_k = \frac{n!}{k!(n-k)!}$ for $k \le n$. Thus,
\[ (1 + x)^n = \sum_{k=0}^{n} c_k x^k, \qquad c_k = \binom{n}{k} = \frac{n!}{k!(n-k)!} \]
holds. However, if $\alpha$ is not a positive integer, the coefficients $c_k$ do not vanish for large $k$, and therefore the expansion should be understood as a series.

Problem 24.6. Use the ratio test to prove the convergence of the series (24.4).

Solution 24.6 □

Problem 24.7. Find the Taylor expansions of the following functions and determine their convergence intervals.
\[ (1)\ \frac{1}{1 + x}. \qquad (2)\ \frac{1}{1 - x}. \qquad (3)\ \frac{1}{1 + x^2}. \qquad (4)\ \arctan x. \]

Solution 24.7 □

Problem 24.8. Find the 0th, 1st, and 2nd terms of the polynomial $(2 + 3x + x^2)^{10}$.

Solution 24.8 □

Problem 24.9. Find the Taylor expansion of $\ln x$.

Solution 24.9 An important point to note is that we cannot find a Taylor expansion centered at 0 because $\ln 0$ is undefined. Therefore, the next option is to center it at 1. Then, $\ln(1) = 0$ and $\ln^{(k)}(x) = (-1)^{k-1} (k-1)!\, x^{-k}$, so
\[ \ln x = \sum_{k=1}^{\infty} c_k (x - 1)^k, \qquad c_k = \frac{(-1)^{k-1} (k-1)!}{k!} = \frac{(-1)^{k-1}}{k}. \]
It is possible to choose a center other than 1 if necessary. □

There are different versions of Taylor's theorem, and one of them is as follows.

Theorem 24.3 (Taylor's theorem (Peano form)). Suppose that $f(x)$ is differentiable $n$ times for all $x \in (a, b) \subset \mathbb{R}$, and $x_0 \in (a, b)$. Then, there exists a function $h_n : (a, b) \to \mathbb{R}$ such that
\[ f(x) = p_n(x) + h_n(x)(x - x_0)^n, \quad a < x < b, \]
where $h_n(x) \to 0$ as $x \to x_0$.

Proof. Define the function $h_n$ as
\[ h_n(x) = \begin{cases} \dfrac{f(x) - p_n(x)}{(x - x_0)^n} & \text{if } x \ne x_0, \\[1ex] 0 & \text{if } x = x_0. \end{cases} \]
This definition satisfies the relationship in the theorem. Applying L'Hopital's Rule repeatedly shows that its limit as $x \to x_0$ is 0. □

Problem 24.10. (1) Explain the meaning of Theorem 24.3 and (2) compare it with Theorem 24.1.

Solution 24.10 (1) Theorem 24.3 describes the behavior in the limit $x \to x_0$. Expressing what the theorem says using little-oh notation, we have
\[ |f(x) - p_n(x)| = o(|x - x_0|^n) \quad \text{as } x \to x_0. \]
Therefore, increasing the degree improves the speed of convergence as $x \to x_0$. However, since the exact error is not given, it is hard to say how much it helps, as we cannot compare the sizes of $|h_n(x)|$ for different degrees $n$. (2) Theorem 24.3 does not address the convergence of the Taylor series. Even if $f$ is infinitely differentiable and the Taylor series can be constructed, the theorem says nothing about the limit as $n \to \infty$ for a fixed $x \ne x_0$. □

Exercises

1.

Appendix A Second Order Differential Equations

In this lecture, we find solutions to second-order linear equations. A second-order linear equation can be written as follows:
\[ y'' + a(x) y' + b(x) y = Q(x). \tag{A.1} \]
Solving a second-order differential equation yields two arbitrary constants, and two conditions are needed to determine them. For instance, we can provide two initial conditions as follows:
\[ y(x_0) = y_0, \qquad y'(x_0) = y_1. \]
Solving a second-order differential equation is much more difficult than solving a first-order one. It can be solved by hand only in special cases.
In this lecture, we find solutions only when $a$, $b$, and $Q$ are all constants. This case is particularly important in Newton's theory because the equations of celestial orbits arise as a special case of it.

A.1 Second-order homogeneous linear equation

If $Q = 0$, (A.1) is called a homogeneous differential equation. We specifically find solutions when it has constant coefficients:
\[ y'' + a y' + b y = 0. \tag{A.2} \]
The first objective is to find two nonzero solutions, denoted $y_1$ and $y_2$, that are linearly independent of each other. Of course, $y = 0$ satisfies the equation, but this is not helpful. Being linearly independent means that one cannot be expressed as a constant multiple of the other. In other words, if we find $y_1$ and $y_2$ but $y_2 = C y_1$ for some constant $C \in \mathbb{R}$, we haven't found the second solution yet.

Problem A.1. If $y_1$ and $y_2$ are solutions to (A.2), show that any linear combination of them, $y = C_1 y_1 + C_2 y_2$, is also a solution for all $C_1, C_2 \in \mathbb{R}$.

Solution A.1 We substitute the linear combination into the left-hand side of (A.1) with $Q = 0$, which includes (A.2) as the constant-coefficient case, and obtain
\[ (C_1 y_1 + C_2 y_2)'' + a(x)(C_1 y_1 + C_2 y_2)' + b(x)(C_1 y_1 + C_2 y_2) = C_1 (y_1'' + a(x) y_1' + b(x) y_1) + C_2 (y_2'' + a(x) y_2' + b(x) y_2) = 0. \]
Hence, the linear combination is also a solution to the homogeneous equation, and in particular to (A.2). □

The key here is that once we find two linearly independent solutions, we can write the general solution, which contains two arbitrary constants and represents all solutions. So, how do we find these two solutions?

When the coefficients $a$ and $b$ are constants, we can find solutions of the form $y = e^{\lambda x}$. To find the corresponding $\lambda$, we substitute $e^{\lambda x}$ into (A.2). Using the property of the exponential function
\[ (e^{\lambda x})' = \lambda e^{\lambda x}, \tag{A.3} \]
we obtain
\[ \lambda^2 e^{\lambda x} + a \lambda e^{\lambda x} + b e^{\lambda x} = 0. \]
Dividing by $e^{\lambda x}$ (which is nonzero), we get a quadratic equation for $\lambda$:
\[ \lambda^2 + a \lambda + b = 0. \tag{A.4} \]
This important equation is called the characteristic equation, and its solutions are
\[ \lambda_1 = \frac{-a + \sqrt{a^2 - 4b}}{2}, \qquad \lambda_2 = \frac{-a - \sqrt{a^2 - 4b}}{2}. \]
The nature of the solutions depends on the sign of the discriminant $a^2 - 4b$.

Remark A.1 (Exponential Function). The property used in obtaining the characteristic equation (A.4) is the property (A.3) of the exponential function $e^{\lambda x}$. This property is fundamental to the exponential function and is essential for us.

Case 1. $a^2 - 4b > 0$

If the discriminant is positive, the characteristic equation has two distinct real roots $\lambda_1, \lambda_2 \in \mathbb{R}$. Hence, the general solution is
\[ y = C_1 e^{\lambda_1 x} + C_2 e^{\lambda_2 x}. \tag{A.5} \]

Problem A.2. Describe the asymptotic behavior of the solutions given by (A.5) as $x \to \infty$, depending on the signs of the roots $\lambda_1$ and $\lambda_2$ of the characteristic equation.

Solution A.2 If either $\lambda_1$ or $\lambda_2$ is positive, the solutions with a nonzero coefficient on the corresponding exponential diverge as $x$ tends to infinity. If both are negative, the solutions converge to zero. If one of them is zero, the behavior depends on the other root. (If $x$ represents time and $y$ represents the distance between a planet and the sun, these solutions do not describe the orbit of a planet.) □

Case 2. $a^2 - 4b = 0$

If the discriminant is zero, the characteristic equation has a single real root:
\[ \lambda = -\frac{a}{2}. \]
This root is called a repeated root. Firstly, $y_1 = e^{\lambda x}$ is a solution. We need to find another solution. The second solution is
\[ y_2 = x y_1. \]
When $\lambda$ is not a repeated root, $x y_1$ is not a solution. However, when it is a repeated root, $x y_1$ becomes a solution. This can be verified as follows. Substituting $x y_1$ into equation (A.2), we get
\[ (x y_1)'' + a (x y_1)' + b x y_1 = x y_1'' + a x y_1' + b x y_1 + x'' y_1 + 2 x' y_1' + a x' y_1 = 2 \lambda y_1 + a y_1 = 0. \]
Thus, $x y_1$ is a solution. Therefore, the general solution is
\[ y = C_1 e^{\lambda x} + C_2 x e^{\lambda x}. \tag{A.6} \]

Problem A.3. Describe the asymptotic behavior of the solutions given by (A.6) as $x \to \infty$, depending on the sign of the root $\lambda$ of the characteristic equation.

Solution A.3 If $\lambda$ is positive, the nontrivial solutions diverge as $x$ tends to infinity.
If $\lambda$ is negative, the solutions converge to zero. If $\lambda$ is zero, the solutions are straight lines. (If $x$ represents time and $y$ represents the distance between a planet and the sun, these solutions do not describe the orbit of a planet.) □

Case 3. $a^2 - 4b < 0$

If the discriminant is negative, the characteristic equation has two complex roots:
\[ \lambda_1 = \alpha + \beta i, \qquad \lambda_2 = \alpha - \beta i. \]
Here, $\alpha = -\dfrac{a}{2}$ and $\beta = \dfrac{\sqrt{|a^2 - 4b|}}{2}$. The solutions are
\[ y_1 = e^{(\alpha + \beta i)x}, \qquad y_2 = e^{(\alpha - \beta i)x}. \]
Thus, the general solution we seek is
\[ y = C_1 e^{(\alpha + \beta i)x} + C_2 e^{(\alpha - \beta i)x}. \]
Let's review the definition of exponential functions with complex exponents. Firstly, consider the case where the exponent is purely imaginary:
\[ \begin{cases} e^{\beta i x} = \cos \beta x + i \sin \beta x, \\ e^{-\beta i x} = \cos(-\beta x) + i \sin(-\beta x) = \cos \beta x - i \sin \beta x. \end{cases} \tag{A.7} \]
When there is also a real part:
\[ e^{\alpha + \beta i x} = e^{\alpha} e^{\beta i x} = e^{\alpha} (\cos \beta x + i \sin \beta x). \]

Question A.1. Is it reasonable to call the function $e^{\beta i x}$ defined by the right-hand side of (A.7) an exponential function? What exactly is an exponential function?

To call the function defined by (A.7) an exponential function, it must satisfy the characteristic properties of exponential functions. What are they? For our purposes, it is essential that it satisfies (A.3), the property used to obtain the characteristic equation.

Problem A.4. For a real number $\beta \in \mathbb{R}$, prove that the function defined by (A.7) satisfies the property (A.3) of exponential functions.

Solution A.4 To verify that the definition in (A.7) makes sense, we need to check the characteristic property of exponential functions. By direct computation:
\[ (e^{\beta i x})' = (\cos \beta x + i \sin \beta x)' = -\beta \sin \beta x + i \beta \cos \beta x = \beta i \cos \beta x + i^2 \beta \sin \beta x = \beta i (\cos \beta x + i \sin \beta x) = \beta i e^{\beta i x}. \]
The second case, $(e^{-\beta i x})' = -\beta i e^{-\beta i x}$, can be shown similarly, but this is not necessary. It follows from the first case by replacing $\beta$ with $-\beta$, since the sine function is odd and the cosine function is even.
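The computation in Solution A.4 can also be checked numerically. The following is a minimal sketch, assuming the definition (A.7) and a symmetric difference quotient as the approximation of the derivative; the function names `cexp_imag` and `central_difference`, the step size `h`, and the sample values of $\beta$ and $x$ are illustrative choices, not part of the lecture.

```python
import math

def cexp_imag(beta, x):
    """e^{i beta x} as defined in (A.7): cos(beta x) + i sin(beta x)."""
    return complex(math.cos(beta * x), math.sin(beta * x))

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

beta, x = 2.5, 0.7
numeric = central_difference(lambda t: cexp_imag(beta, t), x)
exact = 1j * beta * cexp_imag(beta, x)   # right-hand side of (A.3) with lambda = beta i

# The difference is small and limited only by the finite-difference step.
print(abs(numeric - exact))
```

A small printed value is consistent with $(e^{\beta i x})' = \beta i e^{\beta i x}$; it is a sanity check at one sample point, not a substitute for the proof.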
□

Hence, the two solutions are written as
\[ y_1 = e^{(\alpha + \beta i)x} = e^{\alpha x} e^{\beta i x} = e^{\alpha x} (\cos \beta x + i \sin \beta x), \]
\[ y_2 = e^{(\alpha - \beta i)x} = e^{\alpha x} e^{-\beta i x} = e^{\alpha x} (\cos \beta x - i \sin \beta x). \]
The inconvenience of using these two lies in dealing with complex-valued functions. One can restrict to real-valued functions instead. Since a linear combination of the two solutions is also a solution,
\[ e^{\alpha x} \cos \beta x = \frac{y_1 + y_2}{2} \quad \text{and} \quad e^{\alpha x} \sin \beta x = \frac{y_1 - y_2}{2i} \]
are also solutions, and they are linearly independent. Thus, we can use these two solutions:
\[ y_1 = e^{\alpha x} \cos \beta x, \qquad y_2 = e^{\alpha x} \sin \beta x. \]
Using these, we can construct a general real-valued solution as follows:
\[ y = C_1 e^{\alpha x} \cos \beta x + C_2 e^{\alpha x} \sin \beta x. \tag{A.8} \]

Problem A.5. Describe the asymptotic behavior of the solutions given by (A.8) as $x \to \infty$, depending on the sign of the real part $\alpha$ of the roots of the characteristic equation.

Solution A.5 If the real part $\alpha$ is positive, the nontrivial solutions oscillate with growing amplitude and diverge as $x$ tends to infinity. If $\alpha$ is negative, the solutions converge to zero. If $\alpha$ is zero, the solutions are periodic functions. (If $x$ represents time and $y$ represents the distance between a planet and the sun, these solutions do not describe the orbit of a planet.) □

A.2 Second order inhomogeneous linear equation

To find all possible solutions of the inhomogeneous problem (A.1), we first need to find two linearly independent solutions $y_1$ and $y_2$ of the homogeneous problem with $Q = 0$. The work done in the previous section covers this. Next, we need to find one solution of the inhomogeneous problem (A.1) with the given $Q(x)$. This solution is called the particular solution and is denoted by $y_p$. Then, all solutions of (A.1) are given by
\[ y = C_1 y_1 + C_2 y_2 + y_p. \tag{A.9} \]

Problem A.6. Show that if $y_p$ satisfies (A.1) and $y_1, y_2$ are linearly independent solutions of the homogeneous problem, then $y$ given by (A.9) is a solution of (A.1) for all constants $C_1, C_2$.

Solution A.6 For linear problems, it is convenient to introduce a linear operator.
Defining $\mathcal{L}(y) = y'' + a y' + b y$, we can express (A.1) simply as $\mathcal{L}(y) = Q$. The answer to this problem can then be stated concisely:
\[ \mathcal{L}(y) = \mathcal{L}(C_1 y_1 + C_2 y_2 + y_p) = C_1 \mathcal{L}(y_1) + C_2 \mathcal{L}(y_2) + \mathcal{L}(y_p) = \mathcal{L}(y_p) = Q. \]
Thus, $y = C_1 y_1 + C_2 y_2 + y_p$ is a solution of (A.1). □

The technique for finding particular solutions varies depending on $Q$ and on $y_1, y_2$. However, in the case where the coefficients $a, b$ are constants with $b \ne 0$ and $Q$ is also a constant, it can be easily verified that the constant function $y_p = Q/b$ is a particular solution. That is,
\[ y_p'' + a y_p' + b y_p = (Q/b)'' + a (Q/b)' + b (Q/b) = Q. \]
Therefore, $y = C_1 y_1 + C_2 y_2 + Q/b$ is the general solution of (A.1).

A.3 Equation for two-body problem

The differential equation we need to solve to find the orbit of two celestial bodies, such as the Sun and the Earth, is as follows:
\[ u'' + u = K. \tag{A.10} \]
Deriving this equation is the main goal of Lecture 11. The constant $K$ on the right-hand side is given by $K = \dfrac{(m_1 + m_2) G}{L^2}$. Here, $m_1$ and $m_2$ are the masses of the two celestial bodies, $G$ is the gravitational constant, and $L$ is the angular momentum; all of these are constants. If $x_1$ and $x_2$ are the positions of the two celestial bodies, then $u = 1/r$ is the reciprocal of the distance between them, $r = \|x_1 - x_2\|$. Note that the derivatives in (A.10) are taken not with respect to the time variable $t$ but with respect to the angular variable $\theta$ in polar coordinates.

The general solution of the above problem (A.10) is
\[ u = C_1 \cos \theta + C_2 \sin \theta + K. \]
The coefficients $C_1$ and $C_2$ of the trigonometric parts are determined by the initial conditions. Assume they are not both zero. Rescaling them so that the sum of their squares is 1, we get
\[ u = \left( \frac{C_1}{\sqrt{C_1^2 + C_2^2}} \cos \theta + \frac{C_2}{\sqrt{C_1^2 + C_2^2}} \sin \theta + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]
Then there exists an angle $\theta_0$, called the phase offset, satisfying
\[ \cos(\theta_0) = \frac{C_1}{\sqrt{C_1^2 + C_2^2}}, \qquad \sin(\theta_0) = \frac{C_2}{\sqrt{C_1^2 + C_2^2}}. \]
Therefore, the above expression can be written as follows:
\[ u = \left( \cos(\theta_0) \cos \theta + \sin(\theta_0) \sin \theta + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]
Now, using the cosine difference formula, we rewrite the solution $u$ as
\[ u = \left( \cos(\theta - \theta_0) + \frac{K}{\sqrt{C_1^2 + C_2^2}} \right) \sqrt{C_1^2 + C_2^2}. \]
Simplified,
\[ u = (1 + e \cos(\theta - \theta_0)) K, \qquad e = \frac{\sqrt{C_1^2 + C_2^2}}{K}, \qquad K = \frac{(m_1 + m_2) G}{L^2}. \]
Here, $e$ is the eccentricity of the orbit. (People often use the letter $e$ for eccentricity, which is merely a tradition; it should be distinguished from the base $e$ of the exponential function based on context.) Thus, the distance $r$ between the two celestial bodies is given by
\[ r = \frac{L^2}{(1 + e \cos(\theta - \theta_0))(m_1 + m_2) G}. \]

Appendix B Elliptical orbits

B.1 Eccentricity and focus of an ellipse

The equation of an ellipse with center at the origin and major and minor axes along the $x$ and $y$ axes, respectively, is given by
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
An overview of the graph is provided in the figure. If $a = b$, the above ellipse becomes a circle. For convenience, we consider the case where $a > b$, making the $x$-axis the major axis. The foci of the ellipse are located on the major axis. The distance between the center and a focus is given by
\[ c = \sqrt{a^2 - b^2}. \]
Thus, the foci are at $(\pm c, 0)$. The eccentricity, which indicates how far the ellipse deviates from a circle, is given by
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \]
If $e = 0$, then $a = b$, and the ellipse becomes a circle. If $e = 1$, then $b = 0$, and the shape is no longer an ellipse. Therefore, the eccentricity of an ellipse lies between 0 and 1.

Problem B.1. Show that if a point $P(x, y)$ lies on the ellipse, then the sum of the distances between this point and the two foci is always constant.

Solution B.1 □

The equation of a hyperbola (a pair of branches) with center at the origin and foci on the $x$-axis is given by
\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \]
In this case, where the coefficient of $y^2$ is negative, the foci lie on the $x$-axis. Refer to the figure for an overview of the graph.
The distance between the center and a focus of the hyperbola is given by
\[ c = \sqrt{a^2 + b^2}. \]
Thus, the foci are at $(\pm c, 0)$. The eccentricity of the hyperbola is similarly defined as
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 + b^2}{a^2}}. \]
The eccentricity of a hyperbola is greater than 1.

Problem B.2. Show that if a point $P(x, y)$ lies on the hyperbola, then the difference between the distances from this point to the two foci is always constant.

Solution B.2 □

B.2 Directrices and ellipses

Consider the following figure. The line $x = k$ is called the directrix of the trajectory of the point $P(x, y)$ that we are going to obtain in this section. The length of the segment $OP$ is denoted by $r$ and given by $r = \sqrt{x^2 + y^2}$. Let $D$ be the foot of the perpendicular from $P$ to the directrix; the length of $PD$ is $k - x$. For a number $e$, we find the curve traced by the points $P(x, y)$ that satisfy
\[ r = e\, PD. \tag{B.1} \]
Then, it satisfies
\[ \sqrt{x^2 + y^2} = e (k - x) \;\Rightarrow\; x^2 + y^2 = e^2 (k^2 - 2kx + x^2). \]
It is written as
\[ (1 - e^2) x^2 + 2 k e^2 x + y^2 = e^2 k^2. \tag{B.2} \]
Depending on the value of $e$, we obtain three kinds of curves. We will soon see that $e$ is the eccentricity of these curves, which is why we denote it by $e$. We split the problem into three cases.

Problem B.3 (Case 1. $e = 1$). Show that if $e = 1$, the trajectory satisfying (B.2) is a parabola.

Solution B.3 If $e = 1$, (B.2) is written as
\[ x = \frac{k}{2} - \frac{1}{2k} y^2. \]
This is a parabola. We know that the eccentricity of a parabola is 1. □

Next, we assume $e \ne 1$. Then, dividing (B.2) by $1 - e^2$ and completing the square in $x$, we can write it as
\[ \left( x + \frac{k e^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{1 - e^2} + \frac{k^2 e^4}{(1 - e^2)^2}. \]
Simplifying the right side, we obtain
\[ \left( x + \frac{k e^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{(1 - e^2)^2}. \tag{B.3} \]

Problem B.4 (Case 2. $0 < e < 1$). Show that if $0 < e < 1$, the trajectory satisfying (B.2) is an ellipse, the origin is one of its two foci, and $e$ is the eccentricity of the ellipse.

Solution B.4 Suppose that $0 < e < 1$. Then, since $1 - e^2 > 0$, we may set
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2 (1 - e^2) = \frac{e^2 k^2}{1 - e^2}, \qquad c = \frac{k e^2}{1 - e^2} > 0. \tag{B.4} \]
Divide (B.3) by $a^2$ and obtain
\[ \frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \]
which is an ellipse. The center of the ellipse is $(-c, 0)$. The eccentricity of the ellipse is defined as $\sqrt{\frac{a^2 - b^2}{a^2}}$. We see that
\[ \frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2 (1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \tag{B.5} \]
Hence, the eccentricity of the ellipse is $e$. The distance from the center of an ellipse to a focus is $\sqrt{a^2 - b^2}$. Using (B.5), we obtain
\[ a^2 - b^2 = a^2 - a^2 (1 - e^2) = e^2 a^2 = \frac{k^2 e^4}{(1 - e^2)^2} = c^2. \]
Therefore, $c$ in (B.4) is this distance. Since the center is at $(-c, 0)$, the origin is one of the two foci of the ellipse. □

Problem B.5 (Case 3. $e > 1$). Show that if $e > 1$, the trajectory satisfying (B.2) is a branch of a hyperbola, the origin is one of its two foci, and $e$ is the eccentricity of the hyperbola.

Solution B.5 Suppose that $e > 1$. Then, since $1 - e^2 < 0$, we cannot use (B.4) as it stands. We take
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2 (e^2 - 1) = \frac{e^2 k^2}{e^2 - 1}, \qquad c = \frac{k e^2}{1 - e^2} < 0. \]
Divide (B.3) by $a^2$ and obtain
\[ \frac{(x + c)^2}{a^2} - \frac{y^2}{b^2} = 1. \]
This is a hyperbola. Here $e$ is still the eccentricity of the hyperbola, and $-c$ is the distance between the origin and the center, which equals the distance between the center and a focus. Hence the origin is a focus. □

B.3 Polar equations of an ellipse

One simple way of representing an elliptical orbit in polar coordinates is given by (B.1). In this case, $PD$ is equal to $k - r \cos \theta$, so using this expression, we obtain
\[ r = e (k - r \cos \theta). \]
Solving this equation for $r$, we get
\[ r = \frac{e k}{1 + e \cos \theta}. \]
This equation represents an ellipse with eccentricity $e$ when $0 < e < 1$. When $e = 1$ it represents a parabola, and when $e > 1$ a hyperbola. The position $k$ of the directrix depends on the eccentricity $e$. When the angular momentum $L$ and the eccentricity are given, it can be expressed as follows:
\[ k = \frac{L^2}{e G (m_1 + m_2)}. \]
Also, remember that when the total energy $E_{\text{total}}$ and the angular momentum $L$ are given, the eccentricity is given by (12.6). That is,
\[ e = \sqrt{1 + \frac{2 E_{\text{total}} L^2}{m_1 G^2 m_2^2}}. \]

Appendix C Numerical experiments for Taylor series

In this final lecture, we observe how Taylor series approximate actual functions through simple numerical coding. We also compare them with some other approximation methods.

Index

absolute convergence, 180
acceleration in polar coordinates, 71
big-oh, 139
bijection, 45
binomial expansion, 197
Cauchy’s Fundamental Theorem of Calculus, 42
Cauchy’s Mean Value Theorem, 25
center of ellipse, 209
chain rule, 29
co-domain, 45
comparison test, 179
conditional convergence, 190, 191
continuity, 12
decreasing sequence, 171
differential equation, 77
directrix, 72, 210
domain, 45
eccentricity of ellipse, 71, 209
eccentricity of hyperbola, 210
focus of ellipse, 71, 209
function, 45
Fundamental Theorem of Algebra, 149
fundamental theorem of calculus, 42
gauge, 40
hyperbola, 210
implicit differentiation, 34
increasing sequence, 171
infimum, 171
injection, 45
integrability, 41
integral, 41
integral test, 177
Intermediate Value Theorem, 25
inverse function, 45
Kepler problem, 96
L’Hopital’s rule, 135, 136
left continuity, 14
left limit, 13
limit, 11
limit comparison, 179
limit infimum, 171
limit supremum, 171
linearization, 122
little-oh, 139
local property, 122
lower bound, 171
Mean Growth Rate Theorem, 25
Mean Value Theorem, 25
monotonicity, 171
natural logarithm, 49
Newton’s Second Law of Motion, 26
one-to-one function, 45
onto function, 45
partition, 40
Position, velocity, acceleration using polar coordinates, 71
range, 45
ratio test, 181
rearrangement, 191
Riemann sum, 41
right continuity, 14
right limit, 14
root test, 182
rules of continuity, 7
sandwich theorem, 169
separation of variables, 80
slope field, 78
supremum, 171
surjection, 45
Taylor polynomial, 194
Taylor series, 194
upper bound, 171